2009-04-13 15:04:03 memcpy vs. dma_memcpy
Chris Gourley (UNITED STATES)
Message: 72604
We are using a custom BF561 board and have been doing some performance testing to optimize moving chunks of memory to and from any combination of L1 cache, L2 cache, and SDRAM. I had assumed that using dma_memcpy() would be the fastest way to accomplish this, but memcpy() is consistently 2-5x faster than dma_memcpy(). For example, copying 4k of data from L2 to SDRAM takes about 22,000 cycles with dma_memcpy and about 10,000 cycles with memcpy. Copying 4k from L1 to L1 takes 34,500 cycles using dma_memcpy; 2,120 using memcpy (which I would expect from both). Should this be the case? We currently have 2009R1-pre-svn5690 running with a CCLK of 525MHz and SCLK of 131.25MHz.
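A comparison like this is typically timed by reading the core cycle counter around each copy. The following is a minimal sketch of such a harness, assuming it runs in kernel context (for example a small test module) where the Blackfin kernel's dma_memcpy() is visible via <asm/dma.h>; the static SDRAM buffers, the 4 KB size, and the module name are placeholders, not the code that produced the numbers above.

/*
 * copytest.c - minimal sketch, assuming kernel context on a Blackfin
 * tree where dma_memcpy() is declared in <asm/dma.h>.  Buffer placement
 * (here plain SDRAM) and size are placeholders.
 */
#include <linux/kernel.h>
#include <linux/module.h>
#include <linux/string.h>
#include <linux/timex.h>        /* get_cycles() */
#include <asm/dma.h>            /* dma_memcpy() on Blackfin (assumed) */

#define TEST_SIZE 4096

static char src_buf[TEST_SIZE];
static char dst_buf[TEST_SIZE];

static int __init copytest_init(void)
{
	cycles_t t0, t1, t2;

	t0 = get_cycles();
	memcpy(dst_buf, src_buf, TEST_SIZE);
	t1 = get_cycles();
	dma_memcpy(dst_buf, src_buf, TEST_SIZE);
	t2 = get_cycles();

	printk(KERN_INFO "memcpy:     %lu cycles\n", (unsigned long)(t1 - t0));
	printk(KERN_INFO "dma_memcpy: %lu cycles\n", (unsigned long)(t2 - t1));
	return 0;
}

static void __exit copytest_exit(void) { }

module_init(copytest_init);
module_exit(copytest_exit);
MODULE_LICENSE("GPL");

In a real test you would want to repeat each copy several times and run them separately, since the first copy warms the caches for the second and skews a single back-to-back measurement.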
2009-04-13 17:50:40 Re: memcpy vs. dma_memcpy
Robin Getz (UNITED STATES)
Message: 72611
Chris:
The DMA overhead - calling into the kernel, locking the DMA channels, setting up the DMA, waiting for it to be done, and then returning - can be quite high.
If you are copying small pieces, memcpy might be better (as you found out).
Also, check out the custom non-blocking DMA memcpy - it does not wait for the DMA to complete before returning to userspace, so it should be faster:
docs.blackfin.uclinux.org/doku.php?id=linux-kernel:drivers:bfin-dma
-Robin
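To illustrate where those cycles go, the outline below sketches the fixed per-call work a blocking DMA copy has to do. It is a conceptual sketch only, not the actual dma_memcpy() source, and the function name and step descriptions are illustrative.

/* Conceptual outline only - not the real dma_memcpy() implementation. */
static void *blocking_dma_copy_outline(void *dst, const void *src, size_t len)
{
	(void)src;
	(void)len;
	/* 1. Enter the kernel and lock the memory-DMA channel pair so
	 *    nothing else can reprogram it mid-transfer.                 */
	/* 2. Program the source and destination channels: start address,
	 *    transfer count, element size, direction.                    */
	/* 3. Kick off the transfer.                                      */
	/* 4. Wait for the DMA "done" status - for a small copy this
	 *    fixed setup-and-wait cost can dominate the transfer itself. */
	/* 5. Release the channel and return to the caller.               */
	return dst;
}

A non-blocking variant skips step 4, which only helps if the caller has other work to overlap with the transfer and checks for completion later.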
2009-04-14 09:04:23 Re: memcpy vs. dma_memcpy
Chris Gourley (UNITED STATES)
Message: 72636
Thanks Robin.
I will try the non-blocking call as well. I have also tested larger transfer sizes (100k from L2 to SRAM, 1M from SRAM to SRAM, and 16M from SRAM to SRAM), and memcpy is always faster than dma_memcpy when copying to SRAM. The only place I have seen dma_memcpy beat memcpy is copying from SRAM to L2 with more than 8k.
Chris