In today’s FPGA-based digital signal processing (DSP) world, system performance is often limited not only by computation but also by how quickly and efficiently data can move between memory and custom logic blocks. Modern radar receivers, high-speed data acquisition systems, and wireless basebands frequently need to stream gigabytes of data per second, continuously. A CPU that copies this data itself is neither reliable enough nor fast enough to sustain that throughput.
This blog walks you through configuring AXI Direct Memory Access (DMA) in Vivado for both Memory Mapped to Stream (MM2S) and Stream to Memory Mapped (S2MM) modes. A successful implementation enables high-performance data movement between DDR memory and the FPGA fabric with low latency and minimal CPU involvement.
The MM2S path is ideal for driving pre-stored data into your signal processing pipeline with very low latency, whether that data is a high-speed sample stream, a precomputed sine lookup table, or buffered ADC captures. The S2MM path does the reverse: it takes the data your logic has processed, receives it over an AXI4-Stream interface, and writes it back into system memory, where it can be reused or handed to software for further processing. Think of the two paths as a two-way express lane into and out of the FPGA that raises throughput and improves the system’s overall responsiveness.
The features of both MM2S and S2MM are explained in the following table.
| Feature | MM2S | S2MM |
| --- | --- | --- |
| Direction | Memory to Stream | Stream to Memory |
| AXI Interface | AXI4 master (memory reads) | AXI4 master (memory writes) |
| Stream Interface | AXI4-Stream master (output) | AXI4-Stream slave (input) |
| Common Use | Send data to hardware (e.g., DAC) | Capture data from hardware (e.g., ADC) |
AXI Protocols in Brief
Before diving into design, let’s briefly recap the relevant AXI protocols:
| Protocol | Use Case | Key Feature |
| --- | --- | --- |
| AXI4 | Memory-mapped transfers | Burst support |
| AXI4-Lite | Low-throughput control | Simple register access |
| AXI4-Stream | High-speed streaming data | No address phase; handshake only |
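To make the "handshake only" behavior of AXI4-Stream concrete, here is a minimal behavioral sketch in Python. It is not RTL and not part of the Vivado flow; the function name `stream_transfer` and the `ready_pattern` argument are illustrative assumptions. It shows only the core rule: a beat moves in a cycle where TVALID and TREADY are both high, and TLAST marks the final beat of a packet.

```python
def stream_transfer(samples, ready_pattern):
    """Model one AXI4-Stream packet crossing a link.

    samples       -- data words the producer wants to send (TDATA values)
    ready_pattern -- per-cycle TREADY values driven by the consumer
    Returns (beats, cycles), where beats is a list of (tdata, tlast).
    """
    beats = []
    idx = 0
    cycles = 0
    for tready in ready_pattern:
        if idx >= len(samples):
            break                      # packet finished, stop counting cycles
        cycles += 1
        tvalid = True                  # producer holds TVALID until data runs out
        if tvalid and tready:          # a beat transfers only when both are high
            tlast = idx == len(samples) - 1
            beats.append((samples[idx], tlast))
            idx += 1
    return beats, cycles


samples = [10, 20, 30, 40]
ready = [True, False, True, True, False, True, True]
beats, cycles = stream_transfer(samples, ready)
print(beats, cycles)
```

With this pattern the consumer stalls twice, so four beats take six cycles. That back-pressure is how a slow downstream block throttles the DMA without any addressing.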
System Architecture Overview
The key components of this design include:
- Zynq PS (Processing System): Contains the processor and DDR memory controller.
- AXI DMA (PL): Handles data transfer between memory and AXI streaming interfaces.
- AXI4-Stream Data FIFO (PL): Used to create the loopback between MM2S and S2MM DMA streams.
Communication between components is facilitated via:
- AXI-Lite Interface: For processor to DMA configuration and status.
- AXI4 Memory-Mapped Interfaces (MM2S and S2MM): For data transfer to/from memory.
- AXI4-Stream Interfaces: For data streams without addressing.
Figure 1 illustrates how these interfaces connect the components in the system, forming the backbone of efficient data movement.
Figure 1: Configuration block design
Figure 2: RTL block design
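The AXI-Lite control path listed above is what software uses to start a transfer. The sketch below shows the usual order of register writes for the AXI DMA in direct register (simple) mode, run against a mock register file so it executes anywhere. The offsets and bit positions are those given in the AXI DMA product guide (PG021) as best I recall them, so check them against the guide for your IP version. `MockRegs`, `start_loopback`, `wait_idle`, and the buffer addresses are hypothetical names and values used for illustration only.

```python
# Register offsets for direct register mode (per PG021; verify for your IP version)
MM2S_DMACR, MM2S_DMASR, MM2S_SA, MM2S_LENGTH = 0x00, 0x04, 0x18, 0x28
S2MM_DMACR, S2MM_DMASR, S2MM_DA, S2MM_LENGTH = 0x30, 0x34, 0x48, 0x58
CR_RUN = 1 << 0       # DMACR.RS: run/stop
SR_HALTED = 1 << 0    # DMASR.Halted
SR_IDLE = 1 << 1      # DMASR.Idle


class MockRegs:
    """Hypothetical stand-in for the AXI-Lite register space (not a real driver)."""

    def __init__(self):
        self.mem = {MM2S_DMASR: SR_HALTED, S2MM_DMASR: SR_HALTED}
        self.log = []

    def write(self, off, val):
        self.log.append((off, val))
        self.mem[off] = val
        # Crude model: setting RS clears Halted on that channel
        status = {MM2S_DMACR: MM2S_DMASR, S2MM_DMACR: S2MM_DMASR}.get(off)
        if status is not None and val & CR_RUN:
            self.mem[status] &= ~SR_HALTED
        # Crude model: a LENGTH write "completes" instantly and sets Idle
        done = {MM2S_LENGTH: MM2S_DMASR, S2MM_LENGTH: S2MM_DMASR}.get(off)
        if done is not None:
            self.mem[done] |= SR_IDLE

    def read(self, off):
        return self.mem.get(off, 0)


def start_loopback(regs, src, dst, nbytes):
    """Arm S2MM first so the receive side is ready, then start MM2S."""
    regs.write(S2MM_DMACR, CR_RUN)
    regs.write(S2MM_DA, dst)
    regs.write(S2MM_LENGTH, nbytes)   # writing LENGTH launches the transfer
    regs.write(MM2S_DMACR, CR_RUN)
    regs.write(MM2S_SA, src)
    regs.write(MM2S_LENGTH, nbytes)


def wait_idle(regs, status_off, max_polls=1000):
    """Poll a status register until Idle is set or the poll budget runs out."""
    for _ in range(max_polls):
        if regs.read(status_off) & SR_IDLE:
            return True
    return False


regs = MockRegs()
start_loopback(regs, src=0x10000000, dst=0x10100000, nbytes=1024)  # example addresses
mm2s_done = wait_idle(regs, MM2S_DMASR)
s2mm_done = wait_idle(regs, S2MM_DMASR)
print(mm2s_done, s2mm_done)
```

On real hardware the same sequence would go through a memory-mapped register window, and the buffers would need to be physically contiguous and cache-coherent (or flushed and invalidated) before and after the transfer.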
A Perfect Solution
Congratulations! You have successfully created a loopback design using AXI DMA and a FIFO in Vivado for the ZedBoard. This setup serves as a template for more advanced data acquisition and processing designs, where the FIFO can be replaced with any custom AXI4-Stream IP, such as a filter, modulator, or signal analyzer.
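A quick way to reason about the loopback, and about swapping the FIFO for custom IP, is a small software model. The sketch below treats a `bytearray` as DDR, models MM2S and S2MM as functions that convert between memory and 32-bit stream words, and places a replaceable stage between them. All names (`mm2s`, `s2mm`, `fifo`, `invert`) and the 32-bit little-endian word format are assumptions for illustration, not the behavior of any specific IP configuration.

```python
from collections import deque

WORD = 4  # assumed 32-bit stream width


def mm2s(ddr, src, nbytes):
    """Read nbytes from 'memory' and emit them as 32-bit stream words."""
    for off in range(0, nbytes, WORD):
        yield int.from_bytes(ddr[src + off: src + off + WORD], "little")


def s2mm(ddr, dst, words):
    """Write incoming stream words back to 'memory'; return bytes written."""
    n = 0
    for w in words:
        ddr[dst + n: dst + n + WORD] = (w & 0xFFFFFFFF).to_bytes(WORD, "little")
        n += WORD
    return n


def fifo(words, depth=16):
    """Pass-through FIFO: buffers up to 'depth' words, preserves order."""
    buf = deque()
    for w in words:
        buf.append(w)
        if len(buf) == depth:
            yield buf.popleft()
    while buf:
        yield buf.popleft()


def invert(words):
    """Example custom stage standing in for a filter or modulator."""
    for w in words:
        yield w ^ 0xFFFFFFFF


ddr = bytearray(256)
ddr[0:64] = bytes(range(64))                      # test pattern in the source buffer

written = s2mm(ddr, 128, fifo(mm2s(ddr, 0, 64)))  # plain loopback through the FIFO
loopback_ok = ddr[0:64] == ddr[128:192]

s2mm(ddr, 192, invert(mm2s(ddr, 0, 64)))          # FIFO swapped for a custom stage
print(written, loopback_ok, hex(ddr[192]))
```

The software check on real hardware follows the same pattern: fill a source buffer, run the transfer, and compare the destination buffer with the expected result.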
Designing around an AXI DMA engine at the RTL level offers deep control and optimization potential in high-performance DSP systems. Combined with AXI4-Stream, it moves large volumes of data with minimal CPU intervention and low latency, making it well suited to radar, communications, and real-time acquisition applications.
Upcoming blogs will explore FFT, FIR filter design, DDS implementation, and real-world signal processing chains, all built on this DMA foundation.