
Four AD9361 RF slots on a custom board: Tx on one slot influences Tx on the other slots

Hi,

Background:
1. I brought up a four-slot design on a custom board, based on the original zcu102 design with changes.
2. I made the necessary changes in the Linux device tree, and all 12 iio devices are detected:
slot 0 => ad9361-phy[ad9361 rfic] + cf-ad9361-dds-core-lpc [Tx DMA] + cf-ad9361-lpc [Rx DMA].
slot 1 => ad9361-phy0[ad9361 rfic] + cf-ad9361-dds-core-lpc0 [Tx DMA] + cf-ad9361-lpc0 [Rx DMA]
slot 2 => ad9361-phy1[ad9361 rfic] + cf-ad9361-dds-core-lpc1 [Tx DMA] + cf-ad9361-lpc1 [Rx DMA]
slot 3 => ad9361-phy2[ad9361 rfic] + cf-ad9361-dds-core-lpc2 [Tx DMA] + cf-ad9361-lpc2 [Rx DMA]
3. I am using libiio to configure the Tx waveform, Tx frequency, Rx frequency, bandwidth and other settings.
4. I have verified that when I run the commands to set bandwidth + FIR and send a waveform, they affect ONLY the attributes of the related iio:device<x>, not those of the other slots.
For example, slot 0 affects only the attributes of ad9361-phy [ad9361 rfic] + cf-ad9361-dds-core-lpc [Tx DMA] + cf-ad9361-lpc [Rx DMA] and nothing else.
5. In the Vivado design, I ensured that every block has its own clkdiv fed by its own axi_ad9361 lclk, and its own Tx path blocks (dac_dma, dac_upack and dac_fifo). The same applies to the receive path.
No slot path shares any clocks/signals with another slot's path [block diagram system.pdf attached].
In other words, each path is independent of the others.
6. All of the slots are detected as individual MASTERs, as in this kernel log:

[ 6.563475] cf_axi_dds 99024000.cf-ad9361-dds-core-lpc: Analog Devices CF_AXI_DDS_DDS MASTER (9.00.b) at 0x99024000 mapped to 0xffffff8008e8e000, probed DDS AD9361
[ 6.618076] cf_axi_dds 99d24000.cf-ad9361-dds-core-lpc0: Analog Devices CF_AXI_DDS_DDS MASTER (9.00.b) at 0x99D24000 mapped to 0xffffff8008eaa000, probed DDS AD9361

7. When I run the Tx DMA using libiio on each slot individually, each one works perfectly [no issues there].

Problem/issue:
8. When I keep one slot's Tx path running [say 15 MHz] and another Tx path running [5 MHz], the 5 MHz path influences the 15 MHz path: the 15 MHz signal is corrupted while, surprisingly, the 5 MHz signal looks clean. The same happens with other bandwidths, although the effect is different (a split constellation), so it is not isolated to the 15 MHz and 5 MHz combination.

This happens irrespective of which path's DMA I start first; the order does not matter.
9. I looked into the DMAC channel registers, using [Base (common to all cores) [Analog Devices Wiki] ]
and [High-Speed DMA Controller Peripheral [Analog Devices Wiki] ] as references.


15mhz path has the following
9c420414 0x7C100000 [start address]
9c420418 0x00383FFF [length]

5mhz path has the following
99d40414 0x7CD00000 [start address]
99d40418 0x0012BFFF [length]

I also confirmed that start address <= CURRENT_SRC_ADDRESS [AXI DMAC 0x438] < start address + length.

In other words, the DMA buffers are far apart and their sizes are multiples of 4 KB.
10. I attached an ILA and traced back to the 64-bit DMAC output; here is what I see. I am generating a pure sinusoidal signal on the 15 MHz path [slot 0] and a typical LTE signal on the 5 MHz path [slot 1].
The waveform is clean if I run the slot 0 Tx DMA alone [BEFORE case in pic] -> working case.
If the slot 0 Tx DMA runs along with the 5 MHz signal, gaps appear in the waveform [AFTER case in pic] -> non-working case.

A few notes:
1. The signal is still contiguous across the gap, i.e. the IQ samples just before and just after the idle period are consecutive.
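
The buffer check described in item 9 can be reproduced offline from the register values quoted above. This is only a sketch: it assumes the length register at 0x418 holds the transfer size in bytes minus one, and the helper names (`span`, `overlaps`, `in_range`) are made up for illustration.

```python
# Register values quoted in the post: (start address @0x414, length @0x418).
# Assumption: the length register stores "number of bytes - 1".
PAGE = 0x1000  # 4 KiB

buffers = {
    "slot0_15mhz": (0x7C100000, 0x00383FFF),
    "slot1_5mhz": (0x7CD00000, 0x0012BFFF),
}

def span(start, length_reg):
    """Half-open byte range [start, end) covered by one transfer."""
    return start, start + length_reg + 1

def overlaps(a, b):
    """True if two half-open ranges share at least one byte."""
    return a[0] < b[1] and b[0] < a[1]

def in_range(addr, rng):
    """True if addr lies inside the half-open range rng."""
    return rng[0] <= addr < rng[1]

spans = {name: span(*regs) for name, regs in buffers.items()}

for name, (start, end) in spans.items():
    size = end - start
    aligned = start % PAGE == 0 and size % PAGE == 0
    print(f"{name}: 0x{start:08X}..0x{end - 1:08X}, {size} bytes, 4K aligned: {aligned}")

print("buffers overlap:", overlaps(spans["slot0_15mhz"], spans["slot1_5mhz"]))
```

A value read back from CURRENT_SRC_ADDRESS (0x438) can be checked with `in_range(value, spans["slot0_15mhz"])`.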


A few questions:
1. To operate each path as its own MASTER, does anything specific need to be done in the dts? [I have attached the final output dts.]
2. Does the core_ID of each AXI_DMAC instance have to be different? Does that matter to Linux?
3. What can cause the DMAC to pause for a few clock cycles and then resume?
4. What is the effect of core_ID? [I read about it in these posts:]

https://ez.analog.com/message/276406-re-how-to-use-device-type-and-id-of-the-axiad9467?commentID=276406#comment-276406

https://wiki.analog.com/resources/fpga/docs/axi_ad9361

"ID Core ID should be unique for each AD9361 IP in the system 0"

IDLE CASE ILA waveforms: - 

Working case 15mhz

non working 15mhz:-

dts:-

plnx_aarch64-system.txt

bd: - 

system.pdf

Thanks



added dts this time
[edited by: ENGINEER at 6:14 PM (GMT -4) on 28 Aug 2018]
  • Hi,

    This sounds like a data throughput issue. Each DMA has its own datastream, but I can see in your design that all DMAs go through the same AXI interconnect. That means they share the same data bus and this data bus can only support a certain data throughput. If you go beyond what can be supported by the interconnect you'll see underflows and data corruption.

    One trick to avoid this is to spread the DMAs over multiple HPC ports on the UltraScale. Each port allows for 128 bits of data, so if you use two ports you double your maximum throughput, and if you use four ports you quadruple it. Since you have four independent AD9361 paths, you could consider allocating one HPC port to each pipeline.

    The other thing is that you can increase the clock speed for the interconnect and the MM DMA ports. Your block diagram doesn't show the clock rate, but it is connected to pl_clk0, which I assume runs at 100 MHz. The FPGA should be able to support a DMA clock of up to 300 MHz, which would also increase the maximum throughput by a factor of 3. You can, for example, do this by using the currently unused pl_clk2 for the DMA clock.

    Another thing: check that your interconnect is configured for maximum performance rather than minimum area. Looking at the waveforms, I'd assume it is configured for minimum area at the moment.

    - Lars
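
A rough way to put numbers on the throughput argument above is to compare the average stream rates with the theoretical peak of one port. The sketch below assumes the standard LTE sample rates for each bandwidth and one 64-bit word per sample (matching the 64-bit DMAC output mentioned in the question); neither figure is confirmed in the thread, and the function names are illustrative only.

```python
# Assumption: standard LTE sample rates in MSPS for each channel bandwidth in MHz.
LTE_RATE_MSPS = {5: 7.68, 10: 15.36, 15: 23.04, 20: 30.72}

# Assumption: one 64-bit DMAC word per sample clock.
BYTES_PER_SAMPLE = 8

def stream_mbytes_per_s(bw_mhz):
    """Average data rate of one Tx or Rx stream in MB/s."""
    return LTE_RATE_MSPS[bw_mhz] * BYTES_PER_SAMPLE

def port_peak_mbytes_per_s(width_bits, clk_mhz):
    """Theoretical peak of a memory-mapped port: one full-width beat per clock."""
    return width_bits / 8 * clk_mhz

def utilisation(bandwidths_mhz, width_bits=128, clk_mhz=100.0, directions=2):
    """Fraction of the port peak used when all listed slots share one port.

    directions=2 counts both the Tx and the Rx DMA of every slot.
    """
    demand = sum(stream_mbytes_per_s(bw) for bw in bandwidths_mhz) * directions
    return demand / port_peak_mbytes_per_s(width_bits, clk_mhz)

for combo in ([15, 5], [15, 20], [20, 20, 20, 20]):
    print(combo, f"{utilisation(combo):.2f} of a 128-bit port at 100 MHz")
```

Under these assumptions the 15 MHz + 5 MHz combination comes out at roughly 0.31 of the theoretical peak. This is only an average: it does not model arbitration, burst length or DDR latency, any of which can stall a DMA for a few cycles well below the peak figure.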

  • Hi Lars,

    Thanks for the quick answer. That helped indeed.

    I placed each path (ADC_DMA + DAC_DMA) on its own independent HP bus, like this:

     ADC_DMA0 + DAC_DMA0 on HP0
     ADC_DMA1 + DAC_DMA1 on HP1
     ADC_DMA2 + DAC_DMA2 on HP2
     ADC_DMA3 + DAC_DMA3 on HP3

    All are clocked at 100 MHz.

    Working combination with the new change: one slot at 10 MHz and one at 15 MHz.

    However, I am seeing the same issue at higher bandwidths [i.e. the combination of 15 MHz and 20 MHz], so I will now try your next suggestion of clocking at 300 MHz.

    Thanks

  • Hi,

    If it is one DMA per HP port and none of the ports are shared, in theory 100 MHz should be enough. But I see you also have an Ethernet DMA in there, which might be causing issues. Also, try to make sure that the interconnects are configured for maximum performance.

    - Lars

  • Hi Larsc,

    Thanks for the reply.

    Clocking did not help. When I looked at the internal diagram, I found that HP1 and HP2 share a data switch internally, and this is the cause of the bus contention.

    So I had to solve it like this:

     ADC_DMA0 + DAC_DMA0 on HP0
     ADC_DMA1 + DAC_DMA1 on HP1
     ADC_DMA2 + DAC_DMA2 on HPC1
     ADC_DMA3 + DAC_DMA3 on HP3

    All other peripherals use HPC0, and this works perfectly.

    I tried all combinations, and the above worked in all cases.

    HPC0 and HPC1 have 2 pipes to DDR, so there are no issues there.

    Thanks
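
The port assignment reasoning in this thread can be captured as a small sanity check. The sketch below only encodes what was reported here (HP1 and HP2 share an internal data switch); the group list, the slot names and `find_conflicts` are hypothetical helpers, not part of any Xilinx or ADI tool, and other devices may have different sharing rules.

```python
# Ports reported in this thread to share an internal data switch.
# Assumption: this list is incomplete and specific to the device used here.
SHARED_SWITCH_GROUPS = [{"HP1", "HP2"}]

def find_conflicts(mapping, groups=SHARED_SWITCH_GROUPS):
    """Return pairs of slots whose ports are identical or share a switch."""
    conflicts = []
    slots = sorted(mapping)
    for i, a in enumerate(slots):
        for b in slots[i + 1:]:
            pa, pb = mapping[a], mapping[b]
            if pa == pb or any(pa in g and pb in g for g in groups):
                conflicts.append((a, b))
    return conflicts

# First attempt from the thread: one HP port per slot.
first_try = {"slot0": "HP0", "slot1": "HP1", "slot2": "HP2", "slot3": "HP3"}

# Final assignment from the thread: slot 2 moved to HPC1.
final = {"slot0": "HP0", "slot1": "HP1", "slot2": "HPC1", "slot3": "HP3"}

print("first try conflicts:", find_conflicts(first_try))
print("final conflicts:", find_conflicts(final))
```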