
AD9364 TX/RX path latency

We have an application that is very sensitive to latency in both the TX and RX paths.

The AD9364 Reference Manual explains the latency due to the digital filters, but in addition to this latency, we are measuring an additional latency that is not explained in the data sheet or reference manual.

Our clock setup using the no-os drivers is as follows:
ad9361_set_trx_clock_chain: 960000000 480000000 240000000 120000000 60000000 60000000
ad9361_set_trx_clock_chain: 960000000 240000000 120000000 60000000 60000000 60000000

Looking at "DIGITAL Rx BLOCK DELAY" in the AD9364 user guide, it looks like I should be seeing a delay contribution from the digital filters of approximately:
           HB3        HB2        HB1
RX path  (2/240M) + (2/120M) + (7/60M) = 8.3ns + 16.7ns + 116.7ns = 141.7ns
TX path  (2/120M) + (2/60M)  + 0       = 16.7ns + 33.3ns          =  50.0ns

= approx. 190ns total delay for digital filters
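As a cross-check of the arithmetic above, here is a minimal Python sketch that sums the per-stage delays exactly as the post writes them (delay in samples divided by the stage clock). The stage lists and names are just a transcription of the two rows above; whether the output or the input rate of each stage is the right clock to use is a separate question that comes up later in the thread.

```python
# (stage, delay in samples, clock rate in Hz), transcribed from the rows above.
RX_STAGES = [("HB3", 2, 240e6), ("HB2", 2, 120e6), ("HB1", 7, 60e6)]
TX_STAGES = [("HB3", 2, 120e6), ("HB2", 2, 60e6)]  # HB1 bypassed, contributes 0


def path_delay_ns(stages):
    """Sum (delay in samples / clock rate) over all stages, in nanoseconds."""
    return sum(samples / rate_hz for _, samples, rate_hz in stages) * 1e9


rx_ns = path_delay_ns(RX_STAGES)
tx_ns = path_delay_ns(TX_STAGES)
total_ns = rx_ns + tx_ns
print(f"RX {rx_ns:.1f} ns, TX {tx_ns:.1f} ns, total {total_ns:.1f} ns")
```

With these inputs the sum comes to roughly 190 ns, far short of the ~800 ns measured.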

I am using an ILA (logic analyzer) in our FPGA to capture the TX and RX data just before clocking in/out to the AD9364.  I am seeing a delay of approx. 800ns.

 

I know that the 190ns is only the digital filter delay.  Is there a description somewhere of what the source might be for the additional ~600ns delay I am measuring?

Most importantly for our application, is there anything that can be done to reduce the latency below the 800ns we are currently seeing?

 

Thank you for any assistance.

  • Please refer to the post below:

    AD9361 latency  

    Also, as mentioned in that post, use the Filter Wizard to get the overall latency, including both the analog and digital paths.

  • Thanks, sripad,

    As mentioned in my post, I already know the digital filter latency.  What I am asking about is the additional ~600ns latency that I am measuring that is not accounted for by the digital filters.

    I understand the analog filters will add latency as well, but do not have any reference that tells me what that might be.  I currently set the rf_rx_bandwidth_hz/rf_tx_bandwidth_hz to 30000000 using the no-os drivers. Can the analog filter latency explain the 600ns latency I am seeing that is unaccounted for?

    I do not have MATLAB, so I cannot use the Filter Wizard.  If someone could run the Filter Wizard with the clock configuration I show and the RF bandwidth of 30000000, I would appreciate knowing what it says the latency should be.

    thanks,

  • To put this another way:

    What latency can I expect in addition to what the Filter Wizard shows for the digital and analog filters and is there anything that I can configure to reduce this additional latency?

    Also, if it is not clear from my description above, we have both the RX and TX FIR filters disabled.

    Thanks, 

  • Only the FIR can add more delay, because it runs at a lower rate and its delay depends on the number of taps.

    If the decimation is 1 but the FIR is not bypassed, which is possible, the data still goes through the FIR, and you may be seeing this extra delay.

    Calculated for your configuration:

    Stage      TX rate (Hz)  Interpolation  Filter order  Delay (s)   RX rate (Hz)  Decimation  Filter order  Delay (s)
    Data rate  6.00E+07                                               6.00E+07
    FIR        6.00E+07      1              37            3.08E-07    6.00E+07      1           37            3.08E-07
    HB1        6.00E+07      1              0             0.00E+00    1.20E+08      2           14            5.83E-08
    HB2        1.20E+08      2              6             2.50E-08    2.40E+08      2           6             1.25E-08
    HB3        2.40E+08      2              4             8.33E-09    4.80E+08      2           4             4.17E-09
    TX delay                                              3.4167E-07  RX delay                                3.83E-07

    You can see that the total delay comes to around 725 ns, which is close to your measured value.

    So please read registers 0x02 and 0x03 to make sure the FIR is bypassed.

  • The following register values were read from the device at run time:
    register[0x2]=0x58
    register[0x3]=0x5c

    So for TX filters (register 0x2) this is:
    D6 = 1 : TX channel enabled
    D5:4 = 01 : HB3 enabled with interpolate by 2
    D3 = 1 : HB2 enabled interpolate by 2
    D2 = 0 : HB1 bypassed interpolate by 1
    D1:0 = 00 : FIR filter bypassed interpolate by 1

    For RX filters (register 0x3) this is:
    D6 = 1 : RX channel enabled
    D5:4 = 01 : HB3 enabled with decimate by 2
    D3 = 1 : HB2 enabled decimate by 2
    D2 = 1 : HB1 enabled decimate by 2
    D1:0 = 00 : FIR bypassed decimate by 1
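The bit-by-bit reading above can be reproduced with a small Python sketch. The field positions (D6, D5:4, D3, D2, D1:0) are taken from the decoding in this post rather than from the register map itself, and `decode_filter_ctrl` and its field names are hypothetical, not part of the no-os driver.

```python
def decode_filter_ctrl(value):
    """Split a register 0x2 / 0x3 value into the bit fields listed in the post."""
    return {
        "channel_enable": (value >> 6) & 0x1,  # D6
        "hb3": (value >> 4) & 0x3,             # D5:4 (01 = enabled, by 2, per the post)
        "hb2": (value >> 3) & 0x1,             # D3
        "hb1": (value >> 2) & 0x1,             # D2
        "fir": value & 0x3,                    # D1:0 (00 = bypassed, per the post)
    }


tx_fields = decode_filter_ctrl(0x58)  # register[0x2]
rx_fields = decode_filter_ctrl(0x5C)  # register[0x3]
print("TX:", tx_fields)
print("RX:", rx_fields)
```

Both values decode with `fir` equal to 0, which agrees with the reading that the FIRs are bypassed; the only difference between the two registers is the `hb1` bit.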


    Just to be sure that the FIRs are configured as expected, I also intentionally enabled the RX and TX FIRs with 16 taps each and saw a corresponding increase in latency, so I am very certain the FIRs are being bypassed as expected.

    I am confused by the RX path delay calculation you show. Shouldn't it be the output clock rate from each filter that is used (after decimate)? That's what the datasheet shows. It looks like you are using the input filter rate (before decimate). In any case, I made the same mistake for the TX path I show in the original post so the delay should be even less than I show.

    So I think what it should be is :

    Filter    TX clock (Hz)  Interpolation  Filter order  Delay (s)  RX clock (Hz)  Decimation  Filter order  Delay (s)
    FIR       6.00E+07       1              37            0.00E+00   6.00E+07       1           37            0.00E+00
    HB1       6.00E+07       1              0             0.00E+00   1.20E+08       2           14            1.17E-07
    HB2       1.20E+08       2              6             2.50E-08   2.40E+08       2           6             2.50E-08
    HB3       2.40E+08       2              4             8.33E-09   4.80E+08       2           4             8.33E-09
    TX delay                                              3.33E-08   RX delay                                 1.50E-07

    This gives a grand total of 183ns, which is a bit less than I originally calculated.

    As mentioned, I don't have MATLAB, so I can't run the Filter Wizard.  What does it show the latency should be, including the analog filters?

    In our actual system, I am using a pi/4 QPSK modulator/demodulator in our baseband processor (FPGA).  I am instantiating an ILA in Vivado to capture the baseband data just as it is being received by / transmitted from the FPGA over an LVDS interface.  This uses axi_ad9361_lvds_if.v from the Analog Devices HDL reference design (AD9361 HDL Reference Designs [Analog Devices Wiki]), and the ILA is connected to this module's adc_data and dac_data nets.  The ILA captures the data signals in the AD9361 RX clock domain (120MHz in this case because of the LVDS interface).

    To test, I set both the RX and TX LO to the same frequency and send some number of 00 symbols (45-degree phase shift per symbol because of the pi/4), then send a 01 symbol and watch the phase change propagate out the TX path and then be received in the RX path.

    So the question is are there expected delays in addition to the filters that would explain the total latency I am seeing of approx. 800ns (approx. 600ns still unaccounted for)?

    Thanks for any assistance.

  • Updated table of delay calculations:

    TX Filter Path  Order          Interpolate  Fs after interpolate (Hz)  Delay (s)
    FIR             0 (bypassed)   1            6.00E+07                   0.00E+00
    HB1             15 (bypassed)  1            6.00E+07                   0.00E+00
    HB2             7              2            1.20E+08                   2.50E-08
    HB3             3              2            2.40E+08                   4.17E-09
    TX delay                                                               2.92E-08

    RX Filter Path  Order          Decimate     Fs before decimate (Hz)    Delay (s)
    FIR             0 (bypassed)   1            6.00E+07                   0.00E+00
    HB1             15             2            1.20E+08                   5.83E-08
    HB2             7              2            2.40E+08                   1.25E-08
    HB3             5              2            4.80E+08                   4.17E-09
    RX delay                                                               7.50E-08

    Total RX+TX delay                                                      1.04E-07

    The total delay is 104ns but we are measuring ~800ns.

    Where can the other ~700ns come from?
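The delays in the updated table are consistent with treating the "Order" column as a tap count N and using the linear-phase FIR group delay (N - 1) / (2 * Fs). That reading is an assumption inferred from the numbers, not something stated in the post. A minimal Python sketch under that assumption, with hypothetical names:

```python
def fir_group_delay_s(taps, fs_hz):
    """Group delay in seconds of a linear-phase FIR with `taps` coefficients at fs_hz."""
    if taps <= 1:
        return 0.0  # bypassed stage or single tap: no delay
    return (taps - 1) / (2.0 * fs_hz)


# (stage, taps, clock rate in Hz) for the enabled stages only, copied from the table.
TX_STAGES = [("HB2", 7, 120e6), ("HB3", 3, 240e6)]
RX_STAGES = [("HB1", 15, 120e6), ("HB2", 7, 240e6), ("HB3", 5, 480e6)]

tx_delay_s = sum(fir_group_delay_s(taps, fs) for _, taps, fs in TX_STAGES)
rx_delay_s = sum(fir_group_delay_s(taps, fs) for _, taps, fs in RX_STAGES)
total_ns = (tx_delay_s + rx_delay_s) * 1e9

measured_ns = 800.0  # approximate ILA measurement quoted in the thread
print(f"TX {tx_delay_s * 1e9:.1f} ns, RX {rx_delay_s * 1e9:.1f} ns, total {total_ns:.1f} ns")
print(f"unaccounted: {measured_ns - total_ns:.1f} ns")
```

This reproduces the 2.92E-08 s and 7.50E-08 s subtotals and leaves roughly 700 ns of the measured latency unexplained by the half-band filters.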

  • If I enable the BIST loopback in the AD9361 (TX->RX) then I measure a latency of 18 interface clock cycles or 150ns.

  • I hope it is now clear why we use the filter rates in the RX path; I can see that your latest calculation takes them into account.

    However, you are still using the wrong filter orders compared with UG-570.

    Recalculated:

    Stage      TX rate (Hz)  Interpolation  Filter order  Delay (s)   RX rate (Hz)  Decimation  Filter order  Delay (s)
    Data rate  6.00E+07                                               6.00E+07
    FIR        6.00E+07      1              0             0.00E+00    6.00E+07      1           0             0.00E+00
    HB1        6.00E+07      1              0             0.00E+00    1.20E+08      2           14            5.83E-08
    HB2        1.20E+08      2              6             2.50E-08    2.40E+08      2           6             1.25E-08
    HB3        1.20E+08      1              2             8.33E-09    4.80E+08      2           4             4.17E-09

    TX delay                                              3.3333E-08  RX delay                                7.5E-08
    Total                                                                                                     1.08E-07

    It is around 108 ns.

    Could you generate a pulsed signal on TX, capture it on RX after loopback, and check the delay?

    Please also share the ILA capture for this test.

    RX Filter setting and group delay

    TX Filter setting and group delay

    Even when the internal FIR is not enabled, the tool calculates the group delay on the assumption that the user will implement a FIR in the baseband, and it generates a FIR with a few taps.

    If we add the delay of a 19-tap FIR, the delay is around 192 ns for TX and 233 ns for RX, which matches the simulation results.

    Please try the same with a pulsed input and verify.

  • Here is the ILA capture using a simple impulse on TX and capturing RX.

    • DAC_I and DAC_Q are 12 bit slices of the axi_ad9361_lvds_if.v dac_data[47:0] nets
    • ADC_I and ADC_Q are 12 bit slices of the axi_ad9361_lvds_if.v adc_data[47:0] nets
    • tx_p_data_p is the 6 bit muxed tx data just as it is clocked into the ODDR/OBUFDS
    • rx_data_p is the 6 bit muxed rx data just as it is registered from the IDDR

    TX Impulse to RX using external RF loopback

    All signals are captured in the clock domain derived from the AD9364 DATA_CLK running at 120MHz.  We are using LVDS interface with DDR.

    The txp_data_p and rx_data_p are multiplexed between MSW/LSW I/Q.  The blue marker is placed at the start of the tx impulse as it is clocked into the ODDR.  The yellow marker is placed at the point where the data is clocked in from the IDDR.

    There is a delay of 72 clock cycles between them, which corresponds to a latency of 600ns.

    For reference, below is a similar capture with the BIST data loopback enabled

    TX Impulse to RX using BIST data loopback

    Even the BIST data loopback has a latency of 17 clock cycles (142ns).  This is what leads me to believe there are significant sources of latency in addition to what we are calculating for just the digital filters.

    We just need to know what these additional sources of latency are, and if there is anything we can do about them.
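For reference, the cycle counts read off the ILA convert to time as follows. This is a small Python sketch that assumes the 120MHz capture clock stated above; the helper name is hypothetical.

```python
DATA_CLK_HZ = 120e6  # ILA capture clock, as stated in the post


def cycles_to_ns(cycles, clk_hz=DATA_CLK_HZ):
    """Convert a count of clock cycles into nanoseconds."""
    return cycles / clk_hz * 1e9


rf_loopback_ns = cycles_to_ns(72)    # external RF loopback capture
bist_loopback_ns = cycles_to_ns(17)  # BIST data loopback capture
print(f"RF loopback:   {rf_loopback_ns:.1f} ns")
print(f"BIST loopback: {bist_loopback_ns:.1f} ns")
print(f"difference:    {rf_loopback_ns - bist_loopback_ns:.1f} ns")
```

The difference between the two captures, about 458 ns, is the part of the round trip that the BIST loopback path does not include.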

  • As an experiment, I also tried leaving the ADC / DAC data rates as is (480MS/s, 240MS/s respectively), but additionally bypassing RX-HB1 and TX-HB2 with a DATA_CLK now running at 240MHz vs 120MHz.

    After this change, the latency was cut in half, so it looks like whatever is introducing the additional latency is related to DATA_CLK and not the ADC/DAC clocks.

    The problem is, this would also seem to violate the maximum baseband rate of 61.44MHz even though UG-673 says DATA_CLK can go to 245.76MHz.
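The concern about the baseband rate can be checked with simple arithmetic. The sketch below assumes the converter rates stay at 480MS/s (ADC) and 240MS/s (DAC) and that each enabled half-band stage changes the rate by a factor of 2, as in the configuration described earlier in the thread; the function name is hypothetical.

```python
MAX_BASEBAND_HZ = 61.44e6  # maximum baseband rate quoted in the post


def baseband_rate_hz(converter_rate_hz, factors):
    """Divide the converter rate by each enabled stage's rate-change factor."""
    rate = converter_rate_hz
    for factor in factors:
        rate /= factor
    return rate


# Original setup: RX uses HB3, HB2, HB1; TX uses HB3, HB2 (HB1 bypassed).
rx_original = baseband_rate_hz(480e6, [2, 2, 2])
tx_original = baseband_rate_hz(240e6, [2, 2])

# Experiment: RX-HB1 and TX-HB2 additionally bypassed.
rx_experiment = baseband_rate_hz(480e6, [2, 2])
tx_experiment = baseband_rate_hz(240e6, [2])

for name, rate in [("RX original", rx_original), ("TX original", tx_original),
                   ("RX experiment", rx_experiment), ("TX experiment", tx_experiment)]:
    status = "within" if rate <= MAX_BASEBAND_HZ else "above"
    print(f"{name}: {rate / 1e6:.2f} MS/s ({status} the 61.44 MS/s limit)")
```

Under these assumptions the experimental configuration runs the baseband at 120MS/s in both directions, which is above the quoted limit.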