2008-12-08 12:58:51 Cycle counts for the basic arithmetic operations are not consistent
Kiran Kumar B (INDIA)
Message: 66517
Hi all,
I was trying to profile the basic arithmetic operations on the Blackfin BF527 DSP under uClinux. For testing purposes I have only a single-threaded application that just multiplies variables of the same data type (float, long, long long, ushort, short, int, uint). I have included the floating point library options in the Makefile.
I observed that each time I run the application, the cycle count for float multiplication is just 2 cycles, while the cycle count for the other data types varies from 18 to 34+ (each run of the application gives a different cycle count).
Why is there no consistency in the cycle count? Are there any system calls made in between? Is there any way to run a bare metal app (but with the Linux thread scheduler)? Can anyone share any previously profiled results for basic arithmetic operations compiled under gcc / VDSP?
Thank you
Kiran
2008-12-08 13:42:05 Re: Cycle counts for the basic arithmetic operations are not consistent
Robin Getz (UNITED STATES)
Message: 66518
Kiran:
It's a multi-tasking operating system - of course other things can go on. Your application can be swapped out, and not come back until something else (higher priority) is complete.
-Robin
2008-12-08 15:36:11 Re: Cycle counts for the basic arithmetic operations are not consistent
Mike Frysinger (UNITED STATES)
Message: 66523
there is no way to run bare metal code under Linux. those two terms simply don't make sense together.
2008-12-09 07:38:28 Re: Cycle counts for the basic arithmetic operations are not consistent
Kiran Kumar B (INDIA)
Message: 66556
Robin: Thank you for the reply...
We have the text in L1. The I cache is configured for 16K and the D cache for 32K.
Does it mean that the cache will be flushed when the application is swapped out? Would this add to the cycle count for the arithmetic operations that I am interested in? Does a bare metal app (compiled with VDSP) run faster than code running under uClinux?
Is there a method to target/profile cache hits & misses?
What percentage improvement can we get by running a bare metal app ?
Thanks
Kiran
2008-12-09 07:45:30 Re: Cycle counts for the basic arithmetic operations are not consistent
Mike Frysinger (UNITED STATES)
Message: 66558
caches do not get flushed on context switches, but they'll certainly get polluted.
if you're using cycles, then every instruction executed gets added up ... the cycles registers do not differentiate between user/supervisor mode, nor do they know anything about processes.
Blackfin code will take exactly the same number of cycles whenever it is active. the only difference is that bare metal obviously won't have scheduling issues (unless you add scheduling of course).
there is no method at the moment to track cache hits/misses.
we've done no benchmarks for bare metal vs Linux userspace, nor do we plan to. it just doesn't make any sense to. there's very little overlap between people who use Linux and people who use bare metal.
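Since the counter keeps running while other tasks and the kernel execute, one common way to cut down the scheduling noise described above is to repeat the measurement many times and keep the smallest value seen. The following is only a minimal sketch of that idea, not something posted in this thread. It assumes a POSIX system and uses clock_gettime() as a portable stand-in for reading the Blackfin cycles registers; the names now_ns, mul_int and min_elapsed_ns are hypothetical.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Assumption: a monotonic clock stands in for the Blackfin cycles
 * registers, so the values are nanoseconds, not cycles. */
static uint64_t now_ns(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}

/* Operation under test. The volatile operands force the multiply to be
 * done at run time rather than folded into a constant by the compiler. */
static int32_t mul_int(void)
{
    volatile int32_t a = 1234, b = 5678;
    return a * b;
}

/* Time `op` `runs` times and return the smallest elapsed time.
 * Runs that were interrupted or swapped out show up as larger values
 * and are discarded by taking the minimum. */
static uint64_t min_elapsed_ns(int32_t (*op)(void), int runs, int32_t *result)
{
    uint64_t best = UINT64_MAX;

    assert(runs > 0);
    for (int i = 0; i < runs; i++) {
        uint64_t start = now_ns();
        int32_t r = op();
        uint64_t elapsed = now_ns() - start;

        if (elapsed < best)
            best = elapsed;
        *result = r;
    }
    return best;
}
```

A caller would pass the operation and a run count, for example min_elapsed_ns(mul_int, 1000, &r). The minimum still includes the cost of reading the clock twice, so it is an upper bound on the operation itself rather than an exact figure.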
2008-12-09 07:50:18 Re: Cycle counts for the basic arithmetic operations are not consistent
Robin Getz (UNITED STATES)
Message: 66561
Kiran:
The only thing I would add to Mike's comments is that comparing Linux to bare metal is independent of the toolchain, since both VDSP and bfin-elf-gcc can compile things without an OS.
-Robin