2009-04-07 02:00:29     Memory optimization

Document created by Aaronwu Employee on Aug 14, 2013

2009-04-07 02:00:29     Memory optimization

Glauber Tadeu (BRAZIL)

Message: 72290   




Blackfin is optimized to load 32 to 64 bits in each memory access. Is uClinux optimized for this? One possible optimization, for example, is to use only addresses that are multiples of 32 bits when accessing memory.








2009-04-07 02:12:47     Re: Memory optimization

Mike Frysinger (UNITED STATES)

Message: 72291   


that's really up to the code you write.  if you use 32bit sized types, then the compiler will generate 32bit aligned loads/stores.  if you use 16bit types, then the compiler will generate 16bit aligned loads/stores.
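To make that concrete, here is a minimal sketch (the function names are made up for illustration, not taken from the thread). The element type of the pointer is what tells the compiler how wide each access is and what alignment it may assume. An optimizing compiler is still free to merge or rewrite the loops, so treat this as a description of what the source asks for, not a guarantee of the generated code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clears n bytes through a uint8_t pointer.
 * The source asks for one 8-bit store per element; the compiler may
 * only assume byte alignment for dst. */
void clear_bytes(uint8_t *dst, size_t n)
{
    assert(dst != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = 0;
}

/* Hypothetical helper: clears n words through a uint32_t pointer.
 * The source asks for one 32-bit store per element, and the compiler
 * may assume dst is suitably aligned for uint32_t. */
void clear_words(uint32_t *dst, size_t n)
{
    assert(dst != NULL);
    for (size_t i = 0; i < n; i++)
        dst[i] = 0;
}
```

Both functions clear the same amount of memory when clear_words is given a quarter as many elements, but the second one needs a buffer that really is declared (or allocated) as 32-bit data.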





2009-04-07 02:37:14     Re: Memory optimization

Glauber Tadeu (BRAZIL)

Message: 72293   


Ok, thanks..


In my problem I have an image (a char matrix) and access each position in a loop. Will the Blackfin access it byte by byte, or load 4 bytes in a single access? I wrote the code in C, using the gcc cross-compiler. Below is a code example. How can I do this analysis?


int x, y;
unsigned char image[512][512];

for (x = 0; x < 512; x++) {
    for (y = 0; y < 512; y++) {
        image[y][x] = 0;
        image[y][x] = 255;
    }
}




Thanks again.




2009-04-07 02:41:51     Re: Memory optimization

Mike Frysinger (UNITED STATES)

Message: 72294   


why would it use 32bit stores ?  each element is 8bits wide so doing 32bit stores would completely break things.


if you want to know what kind of assembly is generated, then just ask gcc (use the -S option) or disassemble the object with objdump (and the -d option).
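As a sketch of that workflow, the loop from the question can be put into a small file of its own so the generated assembly is easy to find. The file name fill.c, the function name, and the -O2 level are assumptions for illustration; for the Blackfin, substitute the gcc and objdump from your cross toolchain for the host ones.

```c
/* fill.c (hypothetical file name)
 *
 * Inspect the generated code with, for example:
 *   gcc -O2 -S fill.c      writes the assembly to fill.s
 *   gcc -O2 -c fill.c      writes the object file fill.o
 *   objdump -d fill.o      disassembles the object file
 */
#include <assert.h>
#include <stddef.h>

#define IMG_DIM 512  /* matches the 512x512 image in the question */

/* Same loop order as in the question: x outer, y inner.
 * Every assignment is to an unsigned char element. */
void fill_image(unsigned char image[IMG_DIM][IMG_DIM], unsigned char value)
{
    assert(image != NULL);
    for (size_t x = 0; x < IMG_DIM; x++)
        for (size_t y = 0; y < IMG_DIM; y++)
            image[y][x] = value;
}
```

In Blackfin assembly, byte stores are written in the form B[P0] = R0; and 32-bit stores in the form [P0] = R0; (register names will differ), so the width of each access in the inner loop can be read straight from the listing.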




2009-04-07 02:50:04     Re: Memory optimization

Glauber Tadeu (BRAZIL)

Message: 72295   


"time is money"...


If I access 32 bits in a single cycle, my program will run faster... Am I correct?


My big problem is to explain why an Intel Celeron 1.8GHz (a general-purpose processor) running SUSE Linux is faster than a Blackfin 537 running uClinux (an audio and video processor) at an image processing task. The Celeron is 10x faster than the Blackfin (same code, compiled with gcc and the gcc cross-compiler).


Ok, frequency can be one reason... but 10x is a big difference, so there must be other reasons. I'm trying to understand what's happening.




2009-04-07 03:02:48     Re: Memory optimization

Mike Frysinger (UNITED STATES)

Message: 72297   


you're doing byte stores, not 32bit stores.  and considering the code in question, i dont really see how you could possibly expect anything different.  there are no instructions to do operations that load 32bits and then bit shift/test/assign each 8bits individually before doing a 32bit store.
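Since no single instruction does this, the load/shift/test/assign/store sequence has to be written by hand if you want it. A minimal sketch follows, assuming a thresholding operation (pixels at or above a limit become 255, the rest 0); that operation and the function name are invented for illustration and are not the poster's actual algorithm. memcpy is used for the 32-bit load and store to stay clear of alignment and aliasing problems; whether the compiler lowers it to a single 32-bit access depends on what it knows about the buffer's alignment.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical example: threshold n pixels in place, four at a time.
 * n must be a multiple of 4. */
void threshold_packed(unsigned char *pix, size_t n, unsigned char thresh)
{
    assert(pix != NULL && n % 4 == 0);
    for (size_t i = 0; i < n; i += 4) {
        uint32_t word;
        uint32_t out = 0;

        memcpy(&word, pix + i, sizeof word);           /* fetch 4 pixels as one word */
        for (unsigned lane = 0; lane < 4; lane++) {
            uint32_t p = (word >> (8 * lane)) & 0xFFu; /* shift out one pixel */
            if (p >= thresh)                           /* test it */
                out |= (uint32_t)0xFFu << (8 * lane);  /* assign 255 to the same lane */
        }
        memcpy(pix + i, &out, sizeof out);             /* write 4 pixels back as one word */
    }
}
```

Each result goes back into the lane it was read from, so the code does not depend on byte order. It trades memory accesses for extra shift and mask work, so it is not guaranteed to be faster; measure on the target before keeping it.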


the host PC not only has a clock frequency advantage over the Blackfin processor, but it also has much faster memory, much wider busses, and much bigger caches.  the BF537 only has a 16bit bus to external SDRAM.
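With small caches and a narrow external bus, the order in which memory is touched matters more on the Blackfin than on the PC. The loop in the question steps y in the inner loop while indexing image[y][x], so consecutive stores are 512 bytes apart. The sketch below (function names are hypothetical) shows that order next to the interchanged one, where consecutive stores land on adjacent bytes. Whether the interchange helps on a particular BF537 setup depends on how the caches are configured, so this is something to try and measure, not a guaranteed fix.

```c
#include <assert.h>
#include <stddef.h>

#define DIM 512  /* image size used in the question */

/* Loop order from the question: the inner loop steps y, so each store
 * is DIM bytes away from the previous one. */
void fill_column_order(unsigned char image[DIM][DIM], unsigned char v)
{
    assert(image != NULL);
    for (size_t x = 0; x < DIM; x++)
        for (size_t y = 0; y < DIM; y++)
            image[y][x] = v;
}

/* Interchanged loops: the inner loop steps x, so each store is to the
 * byte right after the previous one. */
void fill_row_order(unsigned char image[DIM][DIM], unsigned char v)
{
    assert(image != NULL);
    for (size_t y = 0; y < DIM; y++)
        for (size_t x = 0; x < DIM; x++)
            image[y][x] = v;
}
```

Both functions leave the image in the same state; only the order of the memory accesses differs.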