
RTL Design and Verification of AXI DMA for Streaming Data

In today’s FPGA-based digital signal processing (DSP) systems, performance is often limited not only by computation but also by how quickly and efficiently data can move between memory and custom logic blocks. Modern radar receivers, high-speed data acquisition systems, and wireless basebands frequently need to stream gigabytes of data per second, continuously. Relying on a CPU to handle this data movement provides neither dependable timing nor sufficient throughput.

This blog walks you through configuring AXI Direct Memory Access (DMA) in Vivado for both the Memory Mapped to Stream (MM2S) and Stream to Memory Mapped (S2MM) channels. A working configuration enables high-throughput, low-latency data movement between DDR memory and the FPGA fabric with minimal CPU involvement.

The MM2S path is ideal for driving pre-stored samples into your signal processing pipeline with very low latency, whether they are high-speed streaming data, a precomputed sine lookup table, or buffered ADC captures. The S2MM path does the reverse: it takes the data your FPGA logic produces on its streaming interface and writes it back into system memory, where it can be reused or handed to software for further processing. Think of the two paths as a two-way express lane into and out of the FPGA that raises throughput and improves the system’s overall responsiveness.

 

The following table compares the MM2S and S2MM paths.

| Feature | MM2S | S2MM |
|---|---|---|
| Direction | Memory to Stream | Stream to Memory |
| AXI Interface | AXI4 memory-mapped master (reads) | AXI4 memory-mapped master (writes) |
| Stream Interface | AXI4-Stream master (output) | AXI4-Stream slave (input) |
| Common Use | Send data to hardware (e.g. DAC) | Capture data from hardware (e.g. ADC) |
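To make the two directions concrete, here is a minimal behavioral sketch in Python. It is not how the DMA IP is implemented; it only models the idea that MM2S turns a memory region into stream beats and S2MM turns stream beats back into memory writes. The function names, the 4-byte beat width, and the `ddr` buffer are assumptions chosen for illustration.

```python
def mm2s(memory, start, length, beat_bytes=4):
    """Model of MM2S: read `length` bytes from memory, yield (tdata, tlast) beats."""
    end = start + length
    for offset in range(start, end, beat_bytes):
        tdata = bytes(memory[offset:min(offset + beat_bytes, end)])
        tlast = offset + beat_bytes >= end  # mark the final beat of the packet
        yield tdata, tlast


def s2mm(beats, memory, dest):
    """Model of S2MM: write incoming beats to memory at `dest` until TLAST."""
    written = 0
    for tdata, tlast in beats:
        memory[dest + written:dest + written + len(tdata)] = tdata
        written += len(tdata)
        if tlast:
            break
    return written


# Hypothetical 32-byte "DDR": first half holds samples, second half is empty.
ddr = bytearray(range(16)) + bytearray(16)
bytes_written = s2mm(mm2s(ddr, 0, 16), ddr, 16)
```

After this runs, the second half of `ddr` holds a copy of the first half, which is what a loopback between the two channels is expected to produce.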

AXI Protocols in Brief

Before diving into design, let’s briefly recap the relevant AXI protocols:

| Protocol | Use Case | Key Feature |
|---|---|---|
| AXI4 | Memory-mapped transfers | Burst support |
| AXI4-Lite | Low-throughput control | Simple register access |
| AXI4-Stream | High-speed streaming data | No address phase; handshake only |
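The "handshake only" behavior of AXI4-Stream can be sketched with a small cycle-by-cycle model: a data beat moves only on a clock cycle where the source asserts TVALID and the sink asserts TREADY. The function below is an illustrative model, not RTL; its name and the way TREADY is supplied as a per-cycle list are assumptions.

```python
def stream_transfer(words, tready_per_cycle):
    """Simulate an AXI4-Stream link one clock cycle at a time.

    The source holds TVALID high while it still has data, so a beat is
    transferred on every cycle where the sink's TREADY is also high.
    Returns the received words and the number of cycles spent sending.
    """
    pending = list(words)
    received = []
    cycles_used = 0
    for tready in tready_per_cycle:
        tvalid = bool(pending)
        if not tvalid:
            break  # nothing left to send
        cycles_used += 1
        if tvalid and tready:
            received.append(pending.pop(0))  # handshake: beat accepted
        # otherwise the source keeps the same word and waits (backpressure)
    return received, cycles_used


# The sink stalls on the second cycle, so three words take four cycles.
received, cycles_used = stream_transfer([10, 20, 30], [1, 0, 1, 1, 1])
```

The stalled cycle shows backpressure: the source simply holds its data until the sink is ready, and no addressing is involved.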

System Architecture Overview

The key components of this design include:

  • Zynq PS (Processing System): Contains the processor and DDR memory controller.
  • AXI DMA (PL): Handles data transfer between memory and AXI streaming interfaces.
  • AXI4-Stream Data FIFO (PL): Used to create the loopback between MM2S and S2MM DMA streams.

Communication between components is facilitated via:

  • AXI4-Lite Interface: For processor to DMA configuration and status.
  • AXI4 Memory-Mapped Interfaces (MM2S and S2MM): For data transfer to/from memory.
  • AXI4-Stream Interfaces: For data streams without addressing.
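The role of the AXI4-Lite control interface can be sketched as a sequence of register writes. The example below assumes the DMA is configured in direct register mode (Scatter Gather disabled) and uses the register offsets documented for that mode in the AXI DMA product guide. `MockDmaRegs` is a hypothetical stand-in for the real bus that completes every transfer instantly; on hardware these would be memory-mapped reads and writes at the DMA's base address.

```python
# Register offsets and bits for direct register mode (Scatter Gather disabled).
MM2S_DMACR, MM2S_DMASR, MM2S_SA, MM2S_LENGTH = 0x00, 0x04, 0x18, 0x28
S2MM_DMACR, S2MM_DMASR, S2MM_DA, S2MM_LENGTH = 0x30, 0x34, 0x48, 0x58
DMACR_RS = 1 << 0      # run/stop control bit
DMASR_HALTED = 1 << 0  # channel is halted
DMASR_IDLE = 1 << 1    # channel finished its transfer


class MockDmaRegs:
    """Hypothetical register file; finishes transfers as soon as LENGTH is written."""

    def __init__(self):
        self.regs = {MM2S_DMASR: DMASR_HALTED, S2MM_DMASR: DMASR_HALTED}
        self.log = []

    def write(self, offset, value):
        self.log.append((offset, value))
        self.regs[offset] = value
        if offset in (MM2S_DMACR, S2MM_DMACR) and value & DMACR_RS:
            self.regs[offset + 0x04] &= ~DMASR_HALTED  # channel starts running
        if offset in (MM2S_LENGTH, S2MM_LENGTH):
            status = MM2S_DMASR if offset == MM2S_LENGTH else S2MM_DMASR
            self.regs[status] |= DMASR_IDLE  # mock: transfer done immediately

    def read(self, offset):
        return self.regs.get(offset, 0)


def start_simple_transfer(bus, src, dst, nbytes):
    """Arm S2MM first so it is ready to receive, then start MM2S."""
    bus.write(S2MM_DMACR, DMACR_RS)
    bus.write(S2MM_DA, dst)
    bus.write(S2MM_LENGTH, nbytes)  # writing LENGTH starts the channel
    bus.write(MM2S_DMACR, DMACR_RS)
    bus.write(MM2S_SA, src)
    bus.write(MM2S_LENGTH, nbytes)


def wait_idle(bus, status_offset, max_polls=1000):
    """Poll a status register until the Idle bit is set or polling gives up."""
    for _ in range(max_polls):
        if bus.read(status_offset) & DMASR_IDLE:
            return True
    return False


bus = MockDmaRegs()
start_simple_transfer(bus, src=0x10000000, dst=0x10100000, nbytes=1024)
done = wait_idle(bus, MM2S_DMASR) and wait_idle(bus, S2MM_DMASR)
```

The addresses and length above are placeholder values. The ordering is the point of the sketch: set the run bit, program the address, and write the length register last, because that write is what launches the transfer.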

Figure 1 illustrates how these interfaces connect the components in the system, forming the backbone of efficient data movement.




Figure 1: Configuration block design

[Image: instructions, step 1]

[Image: instructions, step 3]

[Image: Vivado block diagram]

Figure 2: RTL block design

[Image: instructions, steps 6-7]

[Image: code listing 1]

[Image: code listing 2]

[Image: code listing 3]

A Perfect Solution

Congratulations! You have successfully created a loopback design using AXI DMA and a FIFO in Vivado for the ZedBoard. This setup serves as a template for more advanced data acquisition and processing designs, where the FIFO can be replaced with any custom IP, such as a filter, modulator, or signal analyzer.
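A loopback design like this is typically checked by sending a known pattern out through MM2S and comparing what S2MM writes back. The sketch below models that check in plain Python, with a bounded FIFO between the two sides and an optional `stage` callback standing in for custom IP that might later replace the FIFO. The function name, the FIFO depth, and the drain rate are illustrative assumptions, not values taken from the design.

```python
from collections import deque


def run_loopback(tx_words, stage=None, fifo_depth=4, drain_every=2):
    """Push words through a bounded FIFO and collect what comes out.

    The sender writes one word per cycle while the FIFO has room; the
    receiver reads one word every `drain_every` cycles, so the FIFO fills
    and exercises backpressure. `stage` optionally transforms each word,
    standing in for custom processing logic.
    """
    pending = deque(tx_words)
    fifo = deque()
    rx_words = []
    cycle = 0
    while pending or fifo:
        if pending and len(fifo) < fifo_depth:
            fifo.append(pending.popleft())  # sender side accepted by FIFO
        if fifo and cycle % drain_every == 0:
            word = fifo.popleft()           # receiver side takes a word
            rx_words.append(stage(word) if stage else word)
        cycle += 1
    return rx_words


test_pattern = list(range(64))
loopback_ok = run_loopback(test_pattern) == test_pattern
scaled = run_loopback([1, 2, 3], stage=lambda w: w * 2)
```

With no `stage`, the received data should match the transmitted pattern exactly. Once a processing block is substituted, the comparison would instead be made against the expected output of that block.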

Designing an AXI DMA engine at the RTL level offers fine-grained control and room for optimization in high-performance DSP systems. AXI4-Stream moves large volumes of data with minimal CPU intervention and low latency, making it well suited to radar, communications, and real-time acquisition applications.

Upcoming blogs will explore FFT, FIR filter design, DDS implementation, and real-world signal processing chains, all built on this DMA foundation.