
EMDMA and System MMR Latencies

Thread Summary

The user inquired about calculating the core cycles for EMDMA System MMR Latencies on the ADSP-2156x SHARC+ Processor. The final answer clarified that the provided latencies are for core accesses to peripheral MMRs, not for EMDMA execution, and directed the user to refer to application notes EE-412 and EE-461 for detailed throughput data and memory transfer characteristics. The user was also advised to consider all possible overheads, such as DDR access delays, for a more accurate calculation.
AI Generated Content
Category: Hardware
Product Number: ADSP-21569

Hi.

Can you check my calculation of EMDMA timing based on the System MMR latencies?

I took the System MMR latencies from ADSP-2156x SHARC+ Processor System Optimization Techniques (EE-412).

I use EMDMA TCBs for Standard Circular DMA. Each TCB copies one word from L1 memory to external (DDR3) memory, and I have a list of 32 TCBs.

So the EMDMA needs to load 8 registers for Standard Circular DMA, which takes 8*43 = 344 core cycles.

Then the EMDMA must read the data from L1 memory and write it to external memory. I don't know how long this takes; I assume it is 43+44 = 87 core cycles.

Then the EMDMA must update the External Index register, which takes 1*44 = 44 core cycles.

So one TCB needs 344+87+44 = 475 core cycles.

32 TCBs: 32*475 = 15200 core cycles.
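The arithmetic above can be written out as a small C sketch. It only reproduces the poster's model, using the 43- and 44-cycle figures exactly as the question applies them; the constant and function names are hypothetical, and the reply further down explains why this model does not describe EMDMA execution time.

```c
#include <assert.h>

/* Poster's assumptions, taken from the question (not verified EMDMA costs).
 * The 43/44 figures are the EE-412 System MMR latencies as the poster used them. */
#define TCB_REGISTERS          8u   /* registers loaded per Standard Circular DMA TCB */
#define REG_LOAD_CCLK         43u   /* assumed cost per register load */
#define L1_READ_CCLK          43u   /* assumed cost of reading the word from L1 */
#define EXT_WRITE_CCLK        44u   /* assumed cost of writing the word to DDR3 */
#define EXT_INDEX_UPDATE_CCLK 44u   /* assumed cost of updating the External Index register */

/* Per-TCB estimate under the poster's model. */
unsigned per_tcb_estimate(void)
{
    unsigned load   = TCB_REGISTERS * REG_LOAD_CCLK;  /* 8 * 43 = 344 */
    unsigned xfer   = L1_READ_CCLK + EXT_WRITE_CCLK;  /* 43 + 44 = 87 */
    unsigned update = 1u * EXT_INDEX_UPDATE_CCLK;     /* 1 * 44 = 44 */
    return load + xfer + update;                      /* 475 */
}

/* Whole-chain estimate: simply N times the per-TCB figure. */
unsigned chain_estimate(unsigned tcb_count)
{
    assert(tcb_count > 0u);
    return tcb_count * per_tcb_estimate();
}
```

With the 32-TCB list from the question, `chain_estimate(32)` returns 15200, matching the hand calculation.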

Am I right?

Best regards.

  • Hi daim,

    You can measure the cycle counts of specific code using the method described in the help chapter "Measuring Cycle Counts." Refer to all topics in that chapter for more information.

    CrossCore® Embedded Studio 2.x.x > Blackfin® Development Tools Documentation > C/C++ Compiler and Library Manual for Blackfin® Processors > DSP Run-Time Library > DSP Run-Time Library Guide > Measuring Cycle Counts

    A basic example of this cycle count is as follows:

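    #include <cycle_count.h>
    #include <stdio.h>

    /* Placeholder for the code under test; replace with your own function. */
    void Some_Function_Or_Code_To_Measure(void);

    int main(void)
    {
        cycle_t start_count;
        cycle_t final_count;

        /* Sample the cycle counter before the code under test */
        START_CYCLE_COUNT(start_count)
        Some_Function_Or_Code_To_Measure();
        /* Sample again and store the elapsed cycles in final_count */
        STOP_CYCLE_COUNT(final_count, start_count)

        PRINT_CYCLES("Number of cycles: ", final_count)
        return 0;
    }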

    When using cycle counts, ensure the DO_CYCLE_COUNTS macro is defined in the project options. You can define it in one of two ways:

    1. Add DO_CYCLE_COUNTS under Project Options → Compile → Preprocessor → Preprocessor Definitions
    2. Or add -DDO_CYCLE_COUNTS under Project Options → Compile → Additional Options

    Also refer to the EMDMA_Throughput example from EE-412; it can be used to measure EMDMA throughput for different parameters on the ADSP-2156x. It is available at the link below:
    https://www.analog.com/media/en/technical-documentation/application-notes/ee412v02.zip

    Regards,
    Nandini C

  • Hi.

    I want to calculate a rough core-cycle count.

    Is it possible to roughly calculate the number of core cycles the EMDMA takes, without running tests?

    Best regards.

  • Hi Daim,

    The initial calculation was not accurate because these latencies describe core accesses to peripheral MMRs during configuration, not the cycles the EMDMA spends executing each TCB. It also did not account for the actual transfer cycles required for each TCB.

    Please refer to the Extended Memory DMA (EMDMA) and L3/External Memory Throughput sections in application notes EE-412 and EE-461, linked below:
    EE-412: https://www.analog.com/media/en/technical-documentation/application-notes/ee412v02.pdf
    EE-461: https://www.analog.com/media/en/technical-documentation/application-notes/ee461v03.pdf

    These application notes discuss architectural features that contribute to overall system bandwidth and latencies, including measured EMDMA throughput data and memory transfer characteristics.

    Regarding DDR access on the ADSP-21569: DDR is external to the processor, so accesses must go through the system crossbar and the memory controller. DDR also incurs additional overhead such as row activation, CAS latency, and periodic refresh cycles, and it operates in a different clock domain with a narrower bus. All of these factors add wait time compared to on-chip L1/L2 memory, so higher latency is expected when accessing DDR.
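    Because the DDR controller runs in its own clock domain, a latency that is specified or measured in DDR clock (DCLK) cycles has to be scaled by the CCLK/DCLK ratio before it can be added to a core-cycle estimate. The sketch below assumes both clock frequencies are known in MHz; the function name and the frequencies in the usage note are hypothetical examples, not ADSP-21569 settings.

```c
#include <assert.h>

/* Hypothetical helper: converts a latency expressed in DDR clock (DCLK)
 * cycles into core clock (CCLK) cycles. Rounds up (ceiling division),
 * because a partial core cycle of waiting still costs a whole core cycle. */
unsigned long long dclk_to_cclk(unsigned long long dclk_cycles,
                                unsigned long long cclk_mhz,
                                unsigned long long dclk_mhz)
{
    assert(cclk_mhz > 0u && dclk_mhz > 0u);
    return (dclk_cycles * cclk_mhz + dclk_mhz - 1u) / dclk_mhz;
}
```

    For example, with an assumed 1000 MHz CCLK and 400 MHz DCLK, a 3-DCLK latency rounds up to 8 core cycles: `dclk_to_cclk(3, 1000, 400)`.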

    A theoretical estimate of the approximate core cycles must account for all possible overheads; running the cycle-count example above will give you a more accurate figure.
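    One way to organize such a theoretical estimate is a simple additive model in which every overhead is an explicit input. This is a minimal sketch under stated assumptions: the type, field, and function names are hypothetical, the cost terms must be filled from the EE-412/EE-461 measured data or your own measurements, and it assumes each TCB completes before the next starts (no pipelining), so it gives a pessimistic upper bound rather than the actual EMDMA behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-TCB cost terms, all in core clock (CCLK) cycles.
 * None of these are ADSP-21569 specifications; supply measured values. */
typedef struct {
    unsigned long descriptor_fetch;  /* EMDMA loading the next TCB from memory */
    unsigned long source_read;       /* reading the data word from L1 */
    unsigned long dest_write;        /* writing the word to DDR, incl. DDR overheads */
    unsigned long misc_overhead;     /* e.g. arbitration, crossbar wait, refresh allowance */
} tcb_cost_t;

/* Pessimistic total: a one-time startup cost plus N fully serialized TCBs. */
unsigned long chain_upper_bound(const tcb_cost_t *cost,
                                unsigned long tcb_count,
                                unsigned long startup)
{
    assert(cost != NULL);
    unsigned long per_tcb = cost->descriptor_fetch + cost->source_read
                          + cost->dest_write + cost->misc_overhead;
    return startup + tcb_count * per_tcb;
}
```

    With made-up terms of 10, 5, 20 and 5 cycles and a 100-cycle startup, a 32-TCB chain gives 100 + 32*40 = 1380 cycles. Keeping every term as an input lets the same model be refilled from measured throughput data instead of from the MMR latency table.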

    Please refer to these documents for details on how core cycles are consumed, including the relevant overheads for different configurations.

    Regards,
    Nandini C