2009-04-07 02:00:29 Memory optimization
Glauber Tadeu (BRAZIL)
Message: 72290
Hi
Blackfin is optimized to load 32 to 64 bits in each memory access. Is uClinux optimized for this? One possible optimization, for example, is to use only 32-bit aligned addresses when accessing memory.
Regards,
Glauber
2009-04-07 02:12:47 Re: Memory optimization
Mike Frysinger (UNITED STATES)
Message: 72291
that's really up to the code you write. if you use 32bit sized types, then the compiler will generate 32bit aligned loads/stores. if you use 16bit types, then the compiler will generate 16bit aligned loads/stores.
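As a rough illustration of the point above, here is a minimal sketch with two functions that differ only in element type. The function names are hypothetical and not from the thread. With a typical compiler, the first loop reads memory one byte at a time and the second reads 32 bits at a time, though an optimizer is free to do something smarter on some targets.

```c
#include <stddef.h>
#include <stdint.h>

/* Element type is 8 bits wide, so each p[i] is a byte-sized access. */
uint32_t sum_bytes(const uint8_t *p, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}

/* Element type is 32 bits wide, so each p[i] is a 32-bit access. */
uint32_t sum_words(const uint32_t *p, size_t n)
{
    uint32_t total = 0;
    for (size_t i = 0; i < n; i++)
        total += p[i];
    return total;
}
```

The source code is the same shape in both cases; only the declared type tells the compiler how wide each access should be.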
2009-04-07 02:37:14 Re: Memory optimization
Glauber Tadeu (BRAZIL)
Message: 72293
Ok, thanks..
In my problem I have an image (a char matrix) and access each position in a loop. Will the Blackfin access it byte by byte, or load 4 bytes in a single access? I wrote the code in C, using the gcc cross-compiler. A code example is below. How can I do this analysis?
int x, y;
unsigned char image[512][512];
for(x=0; x<512; x++)
    for(y=0; y<512; y++)
        if(image[y][x]<127)
            image[y][x] = 0;
        else
            image[y][x] = 255;
Thanks again.
2009-04-07 02:41:51 Re: Memory optimization
Mike Frysinger (UNITED STATES)
Message: 72294
why would it use 32bit stores ? each element is 8bits wide so doing 32bit stores would completely break things.
if you want to know what kind of assembly is generated, then just ask gcc (use the -S option) or disassemble the object with objdump (and the -d option).
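One way to try this, sketched under some assumptions: put the loop in its own function in a small file so the generated code is easy to find. The file name threshold.c, the function name, and the -O2 flag are choices made for this sketch, not something from the thread. With a cross toolchain the same options apply to the prefixed tools (for example bfin-uclinux-gcc and bfin-uclinux-objdump, if that is how your toolchain is named).

```c
/*
 * threshold.c (hypothetical file name)
 *
 * To see the assembly the compiler generates:
 *   gcc -O2 -S threshold.c      (writes threshold.s)
 * or disassemble the compiled object:
 *   gcc -O2 -c threshold.c
 *   objdump -d threshold.o
 */
#define WIDTH  512
#define HEIGHT 512

/* Same loop as in the post above, wrapped in a function. */
void threshold(unsigned char image[HEIGHT][WIDTH])
{
    int x, y;

    for (x = 0; x < WIDTH; x++)
        for (y = 0; y < HEIGHT; y++)
            image[y][x] = (image[y][x] < 127) ? 0 : 255;
}
```

In the output, look at the load and store instructions inside the inner loop to see which access width the compiler picked.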
2009-04-07 02:50:04 Re: Memory optimization
Glauber Tadeu (BRAZIL)
Message: 72295
"time is money"...
If I access 32 bits in a single cycle, my program will run faster... Am I correct?
My big problem is to explain why an Intel Celeron 1.8GHz (a general-purpose processor) running SUSE Linux is faster than a Blackfin 537 running uClinux (an audio and video processor) at an image processing task. The Celeron is 10x faster than the Blackfin (the same code, compiled with gcc and the gcc cross-compiler).
OK, frequency can be one reason... but 10x is a big difference, so there must be other reasons. I'm trying to understand what's happening.
2009-04-07 03:02:48 Re: Memory optimization
Mike Frysinger (UNITED STATES)
Message: 72297
you're doing byte stores, not 32bit stores. and considering the code in question, i dont really see how you could possibly expect anything different. there are no instructions to do operations that load 32bits and then bit shift/test/assign each 8bits individually before doing a 32bit store.
the host PC not only has a frequency advantage over the Blackfin processor, but it also has much faster memory, much wider buses, and much bigger caches. the BF537 only has a 16bit bus to external SDRAM.
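For reference, the "load 32 bits, test each 8 bits, store 32 bits" sequence described above has to be spelled out by hand in C. The following is only a sketch under stated assumptions: the function name is hypothetical, the image is treated as a flat buffer walked in memory order, and nothing here is measured on a BF537. Whether it is actually faster than the byte loop depends on the compiler and the memory system, so it needs to be timed on the target.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Threshold n pixels in place, handling four pixels per 32-bit word. */
void threshold_words(unsigned char *pixels, size_t n)
{
    size_t i = 0;

    for (; i + 4 <= n; i += 4) {
        uint32_t word, out = 0;
        int b;

        /* Copy four pixels into one word. Compilers can often turn this
           into a single 32-bit load when the alignment is known. */
        memcpy(&word, pixels + i, sizeof word);

        /* Shift out and test each 8-bit lane, then set the same lane
           in the result, so byte order in memory does not matter. */
        for (b = 0; b < 4; b++) {
            uint32_t px = (word >> (8 * b)) & 0xFFu;
            if (px >= 127)
                out |= (uint32_t)0xFFu << (8 * b);
        }

        /* Write the four results back as one word. */
        memcpy(pixels + i, &out, sizeof out);
    }

    /* Any leftover pixels are handled one byte at a time. */
    for (; i < n; i++)
        pixels[i] = (pixels[i] < 127) ? 0 : 255;
}
```

For the 512x512 image from the earlier post this could be called as threshold_words(&image[0][0], 512 * 512). Walking the buffer in address order also touches consecutive bytes, which tends to be kinder to caches and SDRAM than stepping 512 bytes between accesses.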