2010-09-02 05:03:07     how to use hardware loop counters in C language

Document created by Aaronwu Employee on Sep 26, 2013


Kishore Prahlad (INDIA)

Message: 93067   

 

I am trying to port an image processing algorithm to the BF537. It requires computing a 2D DFT (four nested for loops), and the disassembly of this C code does not make use of the hardware loop counters LC0 and LC1 available on the chip.

 

The program takes almost 10 minutes to run, and I found that around 8 minutes of that is spent in the for loops themselves.

 

Is there any way to make use of the hardware loop counters from C so that I can reduce the run time of the code (since hardware loop counters reduce the loop overhead)?

 

thanks


 

 

2010-09-02 08:03:22     Re: how to use hardware loop counters in C language

Steve Kilbane (UNITED KINGDOM)

Message: 93073   

 

Try with the options -O2 -funsafe-loop-optimizations -Wunsafe-loop-optimizations.

 

steve


 

 

2010-09-03 02:36:00     Re: how to use hardware loop counters in C language

Kishore Prahlad (INDIA)

Message: 93084   

 

I tried the O9 optimisation option yesterday (which enables all possible optimisations), but even then it takes 5-6 minutes to run.

 

The same program takes less than a minute in VDSP++.


 

 

2010-09-03 11:14:44     Re: how to use hardware loop counters in C language

Mike Frysinger (UNITED STATES)

Message: 93110   

 

there is no such thing as -O9.  as the documentation states, -O3 is the highest.

 

if you have some example code for us to look at and test with, we can investigate.  otherwise, there isn't much else for us to suggest.


 

 

2010-09-06 05:14:14     Re: how to use hardware loop counters in C language

Naught not (UNITED STATES)

Message: 93175   

 

But -O9 will do what -O3 does (as of now; this may change if further optimizations are introduced). Please provide code snippets. That will help in identifying the problem.


 

 

2010-09-06 10:52:40     Re: how to use hardware loop counters in C language

Mike Frysinger (UNITED STATES)

Message: 93182   

 

there hasn't been an -O4 in over a decade.  i doubt things are going to change.


 

 

2010-09-07 05:40:59     Re: how to use hardware loop counters in C language

Kishore Prahlad (INDIA)

Message: 93219   

 

The following code takes around 5-6 minutes (with -O3 optimisation enabled) in uClinux, whereas it takes less than a minute (45 seconds on average) in VDSP++.

 

My feeling was that if these loops could make use of the hardware loop counters, much of the overhead would be removed.

 

I have attached the C program for blurring a greyscale image of size 32x32 using a 6x6 averaging filter.

 

thanks

 

image1.dat

blur.c


 

 

2010-09-07 09:17:32     Re: how to use hardware loop counters in C language

Steve Kilbane (UNITED KINGDOM)

Message: 93224   

 

I haven't looked into this in detail, but I suspect that getting the hw loops working isn't going to win you all that much. These loops are heavily dominated by emulated floating-point operations, with floats being promoted to doubles, and that means that the VDSP++ implementation is doing a lot less work than the GCC implementation, out of the box. That's because VDSP++ defaults to -double-size-32, so the floating-point operations will be computed at 32-bit precision. GCC, on the other hand, will be doing operations at 64-bit precision. VDSP++ does appear to be producing better code within the loops, but I don't know how much of that is just down to GCC suffering from increased register pressure, due to dealing with 64-bit parameters/results when invoking the FP emulation.

 

Options you might want to consider: -pipe -ffast-math -mfast-fp. For details, see the GCC manual texts on Optimization and on Blackfin-specific options (these options are making assertions about your code, so they're not suitable for all circumstances).

 

http://gcc.gnu.org/onlinedocs/gcc-4.3.5/gcc/Optimize-Options.html#Optimize-Options

 

http://gcc.gnu.org/onlinedocs/gcc-4.3.5/gcc/Blackfin-Options.html#Blackfin-Options

 

steve

 

PS I'm looking at code produced by GCC 4.3.5.


 

 

2010-09-07 20:36:19     Re: how to use hardware loop counters in C language

Simon Brewer (AUSTRALIA)

Message: 93228   

 

A couple of other things to keep in mind.  Your code is calling the cos()/sin() functions.  These are going to be really slow on most architectures.  On a DSP, programmers normally use pre-computed sin/cos tables.

 

Additionally, functions like the DFT and the IDFT are very common and available as libraries.  See here for some Blackfin Linux DSP libraries that should do the job, e.g. the Nxn Point 2-D Real Input FFT:

 

http://docs.blackfin.uclinux.org/doku.php?id=toolchain:libbfdsp

 

This will require you to use 16 bit fractional arithmetic, but if you are doing image processing this should be enough precision.

 

So my recommendations are:

 

a/ convert arithmetic to 16 bit

 

b/ use built in DSP libraries

 

You should find that this code will probably run in less than 0.01 seconds!


 

 

2010-09-22 08:25:29     Re: how to use hardware loop counters in C language

Kishore Prahlad (INDIA)

Message: 93741   

 

I tried using lookup tables for the trigonometric functions, and now the code takes around 45-50 seconds.

 

Thanks a lot for your suggestions.
