2009-04-13 15:04:03     memcpy vs. dma_memcpy

Document created by Aaronwu Employee on Aug 14, 2013

2009-04-13 15:04:03     memcpy vs. dma_memcpy

Chris Gourley (UNITED STATES)

Message: 72604   


We are using a custom BF561 board and have been doing some performance testing to optimize moving chunks of memory to and from any combination of L1, L2, and SDRAM.  I had assumed that using dma_memcpy() would be the fastest way to accomplish this, but memcpy() is showing a 2-5x speed increase over dma_memcpy().  For example, copying 4k of data from L2 to SDRAM takes about 22,000 cycles with dma_memcpy() but only about 10,000 cycles with memcpy().  Copying 4k from L1 to L1 takes 34,500 cycles using dma_memcpy() and 2,120 using memcpy() (which I would expect from both).  Should this be the case?  We are currently running 2009R1-pre-svn5690 with a CCLK of 525 MHz and an SCLK of 131.25 MHz.






2009-04-13 17:50:40     Re: memcpy vs. dma_memcpy


Message: 72611   




The DMA overhead of calling into the kernel, locking the DMA channels, setting up the DMA, waiting for it to be done, and then returning can be quite high.


If you are copying small pieces, memcpy might be better (as you found out).


Also, check out the custom non-blocking dma_memcpy. It will not wait for the DMA to complete before returning to userspace, so it should be faster.








2009-04-14 09:04:23     Re: memcpy vs. dma_memcpy

Chris Gourley (UNITED STATES)

Message: 72636   


Thanks Robin.


I will try the non-blocking call as well.  I have also tested larger memory sizes (100k from L2 to SDRAM, 1M from SDRAM to SDRAM, and 16M from SDRAM to SDRAM), and memcpy is always faster than dma_memcpy when copying to SDRAM.  The only place I have seen dma_memcpy beat memcpy is from SDRAM to L2 when copying more than 8k.