I've been looking into the implementation of util_clkdiv and stumbled over this discussion on the Xilinx forums. In short, it's exactly about how you can safely divide a "fast" clock by some integer value to get a "slower" one.
The discussion stated that when using a BUFR after a BUFG (as seen in the ad9361 reference design) you get mesochronous clocks and need to use FIFOs for the datapath.
However, if you pair a BUFG with a BUFGCE (as on the diagram), you get synchronous clocks and don't have to worry about synchronization anymore.
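To illustrate why the BUFGCE output stays synchronous to the fast clock, here is a minimal behavioral sketch in Python (not HDL, and not taken from the reference design). It assumes the clock enable is driven by a simple modulo counter in the fast clock domain; the function name and the counter scheme are hypothetical and only there for illustration.

```python
def gated_clock_edges(num_fast_cycles, div):
    """Model a clock-enable divider: return the fast-clock cycle indices
    on which the gated (divided) clock produces a rising edge.

    Assumption: CE is high for exactly one fast-clock cycle out of every
    `div` cycles, generated by a modulo-`div` counter.
    """
    if div < 1:
        raise ValueError("div must be >= 1")
    edges = []
    count = 0
    for cycle in range(num_fast_cycles):
        ce = (count == 0)          # clock enable for this fast cycle
        if ce:
            edges.append(cycle)    # gated clock passes this fast-clock pulse
        count = (count + 1) % div
    return edges


# Every divided-clock edge is also a fast-clock edge, spaced `div` cycles apart.
print(gated_clock_edges(12, 4))
```

Since the divided clock is just the fast clock with pulses removed, each of its rising edges coincides with a fast-clock rising edge in this model, which is the property the forum discussion relies on.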
It seems like an easy patch for the reference design: remove the util_clkdiv IP and change ad_data_clk.v to implement the above diagram. But maybe I am missing something and this won't work at all?
P.S. I am by no means proposing these changes for the mainline repo. This is an experiment I would like to conduct for my own design.
Hi,
Those FIFOs do more than just clock domain crossing; more precisely, they also downscale/upscale the clock rate. Even if the connection between the axi_ad9361 and the FIFO is a parallel one, the data is sent one I/Q pair at a time (in a serial manner); this is because of how the interface operates. https://wiki.analog.com/resources/fpga/docs/util_rfifo
What you are proposing can positively impact the timing margin of the design.
Thank you for your suggestion!
Regards,
Andrei
For me the main issue with these FIFOs was their latency: https://ez.analog.com/fpga/f/q-a/112232/synchronous-dac-adc-dma-s-in-new-hdl-releases
lnagy said: The util_rfifo and util_wfifo does not have a fixed latency from enable to enable, it's better to remove it if you are going after a deterministic delay
So a project that needed a stable round-trip latency from tx to rx had to operate without them, running at the original high clock frequency.
When using the proposed BUFGCE method I still have to insert an IP core that would correct for the length of valid pulses:
I.e. on the rx side the adc_valid signal from the ad9361 may not be aligned with the posedge of the downscaled clock, so it has to be held for 2-4 fast-clock ticks, depending on the clock div value. However, this can be done with a stable latency, because there's no "real CDC" happening.
I have not yet implemented this idea in our project, but will post here once I have enough time to check everything in hardware.
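As a rough sanity check of the pulse-stretching idea, here is a small behavioral model in Python (not HDL, and not checked in hardware). It assumes adc_valid is one fast-clock cycle wide, that valid pulses are at least `div` fast cycles apart, and that the divided clock samples every `div`-th fast cycle; the function names and the `phase` parameter are hypothetical.

```python
def stretch_valid(valid, div):
    """Hold each one-cycle valid pulse high for `div` fast-clock cycles."""
    out = []
    remaining = 0
    for v in valid:
        if v:
            remaining = div        # (re)start the hold window
        out.append(1 if remaining > 0 else 0)
        if remaining > 0:
            remaining -= 1
    return out


def sample_on_slow_edges(signal, div, phase=0):
    """Sample `signal` on every `div`-th fast cycle, starting at `phase`."""
    return [signal[i] for i in range(phase, len(signal), div)]


# Two valid pulses in the fast domain, divider of 4.
valid = [0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0]
stretched = stretch_valid(valid, 4)

# Whatever the alignment of the divided clock, each pulse is seen exactly once.
for phase in range(4):
    assert sum(sample_on_slow_edges(stretched, 4, phase)) == 2
```

In this model a stretched pulse covers exactly `div` consecutive fast cycles, so exactly one divided-clock edge falls inside it regardless of alignment, and the delay from the pulse to that edge is fixed for a given alignment.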