2008-01-04 05:39:18     Performance loss (down to 1/4)

Document created by Aaronwu Employee on Aug 4, 2013

2008-01-04 05:39:18     Performance loss (down to 1/4)

Piggebu Piggebu (ANDORRA)

Message: 49251    Hi all

 

I encountered some strange behaviour while programming in C++ for the Blackfin BF549 on the BF548 EZ-Kit board.

In general, the programme does the following:

- read one frame each from two raw video files

- compare the frames

- display the result on the touch screen on the board

As this should be in real time, performance is crucial.

 

My problem lies in the routine where the frames are compared: see attachment lines 62-65.

Assigning a value to a dereferenced pointer seems to be a problem (??). Do you have any experience with this? Does it have to do with uClinux? Maybe I am just missing something small.

 

If it is a bigger issue, please let me know if you need the whole programme to reproduce the effect.

 

Kind regards

 

Piggebu

calcs_post.cpp


 

 

2008-01-04 08:10:38     Re: Performance loss (down to 1/4)

Robin Getz (UNITED STATES)

Message: 49255    Piggebu:

 

You didn't describe the problem you are seeing:

- it doesn't compile

- it has run time errors

- it runs, but is slow (compared to what)

 

?

 

-Robin


 

 

2008-01-04 08:47:51     Re: Performance loss (down to 1/4)

Piggebu Piggebu (ANDORRA)

Message: 49256    Sorry, I did not write it clearly. I described the problem only briefly in the source code file. More precisely:

 

In the source code file attached above I do some calculations, and at the end I write the result to a variable which is returned to the main programme.

 

The subroutine is:

void easy_c(float* calcframe,

            float* frame, unsigned short* framei,

            float* backgr, unsigned short* backgri, int X, int Y, float maxi)

{

    for (p_calcframe=calcframe to end of calcframe, p_calcframe++)

    {

        buffer = do some calculations

/**/        *p_calcframe = buffer

    }

}

 

where p_calcframe points to an element in the array calcframe and buffer is the result of the calculations. See the source file for more details. calcframe is defined in main as "float* calcframe = new float[const]".

 

Now: everything compiles fine, with no runtime errors.

But: when I execute the programme I notice a significant speed difference between commenting out the "*p_calcframe = buffer" line (then it runs fast) and enabling that line (then it runs a factor of 4 slower).

By the way, the main programme does no further calculations with calcframe, so for the rest of the programme it makes no speed difference whether calcframe is empty or filled with data.

 

Attached you will find the source code and the compiled programmes, once with the line enabled (*_slow) and once with it disabled (*_fast). They are compiled for the BF548 EZ-Kit; run with "./videoplay_[fast/slow] i.raw z.raw /dev/fb0".

To build from source, set the correct paths in the Makefile, compile with "make", and then use "videoplay", not "videoplay_thrd". The line under discussion is in calcs.cpp, line 88.

 

 

best regards

Piggebu

post.zip


 

 

2008-01-04 09:02:57     Re: Performance loss (down to 1/4)

Bernd Schmidt (GERMANY)

Message: 49257    Your loop probably gets optimized away if you eliminate the line that uses the value it computes.
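
A minimal sketch of the effect described above, assuming an optimizing build (for example -O2). The function names and the stand-in arithmetic are hypothetical and are not taken from calcs.cpp. When the computed value is never stored or otherwise observed, the compiler is free to drop the computation, so timing the loop without the store may measure a reduced or empty loop.

```cpp
#include <cstddef>

// Hypothetical stand-in for the per-pixel work; not the code from calcs.cpp.
static inline float compare_pixel(float a, float b) {
    return (a - b) * (a - b);
}

// The result is discarded: an optimizing compiler may remove the loop body.
void compare_discard(const float* frame, const float* backgr, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        float buffer = compare_pixel(frame[i], backgr[i]);
        (void)buffer;  // value is never observed
    }
}

// The result is stored where the caller can read it, so the work must be done.
void compare_store(float* calcframe, const float* frame,
                   const float* backgr, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        calcframe[i] = compare_pixel(frame[i], backgr[i]);
    }
}
```

Comparing the run time of these two variants says little about the cost of the store itself, because the first variant may not perform the calculation at all.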


 

 

2008-01-04 09:30:39     Re: Performance loss (down to 1/4)

Piggebu Piggebu (ANDORRA)

Message: 49258    Bernd

 

Thank you, this might indeed be what happens in the case above.

I tried the following; in pseudocode it would be:

 

void easy_c(float* calcframe,

            float* frame, unsigned short* framei,

            float* backgr, unsigned short* backgri, int X, int Y, float maxi)

{

    for (p_calcframe=calcframe to end of calcframe, p_calcframe++)

    {

        buffer += do some calculations

/**/        *p_calcframe = buffer

    }

    printf("output: %i",buffer);

}

 

Note the += applied to buffer and the printf call at the end: with these, the loop necessarily has to be executed. Without the *p_calcframe line the programme is still 4 times faster than with that line added. Therefore I don't think the loop gets optimized away in the case I posted.

 

kind regards and thank you for your help


 

 

2008-01-04 09:39:29     Re: Performance loss (down to 1/4)

Piggebu Piggebu (ANDORRA)

Message: 49259    Maybe a hint?

 

I tried different things using pointers and fixed-size arrays, with the following results (again in pseudocode):

 

void easy_c(float* calcframe,

            float* frame, unsigned short* framei,

            float* backgr, unsigned short* backgri, int X, int Y, float maxi)

{

    int buffer[100000];

    for (p_calcframe=calcframe to end of calcframe, p_calcframe++)

    {

        *buffer = do some calculations

    }

}

 

==> This runs fast. (By the way, I know the code does not make sense, as *buffer always refers to the first element; it is only for testing.)

 

 

void easy_c(float* calcframe,

            float* frame, unsigned short* framei,

            float* backgr, unsigned short* backgri, int X, int Y, float maxi)

{

    int* buffer=new int[100000];

    for (p_calcframe=calcframe to end of calcframe, p_calcframe++)

    {

        *buffer = do some calculations

    }

 

    delete[] buffer;

}

==> This runs 4 times slower than the first example! Why is there such a big difference? How can I solve the problem if it has to be a pointer, because the values have to be passed back to the main function?


 

 

2008-01-04 10:11:52     Re: Performance loss (down to 1/4)

Mike Frysinger (UNITED STATES)

Message: 49260    in the first case, you're allocating on the stack, which is trivial: you adjust the stack pointer by a constant.  in the second case, you're dynamically allocating a huge chunk of memory.  of course there are going to be speed differences.

 

btw, you cannot allocate something that large on the stack or you overflow the stack
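
One possible way to keep allocation cost out of the per-frame routine, sketched under assumptions: the caller allocates the output buffer once and passes it in for every frame. The names `process_frame` and `run_frames` and the stand-in arithmetic are hypothetical, and the sketch assumes `backgr` holds at least as many elements as `frame`.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical per-frame routine: writes into caller-owned storage
// and performs no allocation of its own.
void process_frame(float* calcframe, const float* frame,
                   const float* backgr, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        calcframe[i] = frame[i] - backgr[i];  // stand-in for the real comparison
    }
}

// Hypothetical driver: one heap allocation before the loop, reused for
// every frame. Returns the last element of the final result.
float run_frames(const std::vector<float>& frame,
                 const std::vector<float>& backgr, int frames) {
    std::vector<float> calcframe(frame.size());  // allocated once, not on the stack
    float last = 0.0f;
    for (int f = 0; f < frames; ++f) {
        process_frame(calcframe.data(), frame.data(), backgr.data(),
                      calcframe.size());
        if (!calcframe.empty()) last = calcframe.back();
    }
    return last;
}
```

With this layout the large buffer lives neither on the stack nor in a fresh allocation per call, and the results still reach the caller through a pointer.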


 

 

2008-01-07 03:47:48     Re: Performance loss (down to 1/4)

Piggebu Piggebu (ANDORRA)

Message: 49312    Mike,

 

Thank you for the programming crash course :-) I now see it a bit clearer.

I have now implemented a class in which I store the calculated data, allocating it on the stack. I then return only a pointer to the array. This seems to work at the desired speed (although it has not yet been tested on the board, only on the host machine).
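
A rough sketch of what such a class might look like; this is an assumption for illustration, not the class actually written for the project. `CalcFrame`, `fill`, and the stand-in arithmetic are hypothetical names and logic.

```cpp
#include <cstddef>

// Hypothetical result holder with a compile-time size.
template <std::size_t N>
class CalcFrame {
public:
    float* data() { return values_; }        // pointer into the object's own storage
    std::size_t size() const { return N; }
private:
    float values_[N];
};

// Fills the holder from two input frames of at least N elements each.
template <std::size_t N>
void fill(CalcFrame<N>& out, const float* frame, const float* backgr) {
    float* p = out.data();
    for (std::size_t i = 0; i < N; ++i) {
        p[i] = frame[i] - backgr[i];  // stand-in for the real comparison
    }
}
```

The pointer returned by `data()` is valid only while the `CalcFrame` object exists. If the object is a local variable of a function, returning that pointer to the caller leaves it dangling; declaring the object in main or with static storage duration avoids this, and a static object does not consume stack space.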

 

Concerning your "btw" note: how much can I allocate on the stack? Could you point me to the relevant literature describing the stack and other memory management issues (i.e. which memory on the board is used for the stack, where dynamically allocated memory is stored, etc.)?

 

Thank you for your valuable help

Piggebu


 

 

2008-01-07 10:41:31     Re: Performance loss (down to 1/4)

Mike Frysinger (UNITED STATES)

Message: 49318    on no-mmu, the stack size is fixed at link time ... without virtual memory, you cannot dynamically grow/shrink the stack

 

this document covers how to control the stack size:

http://docs.blackfin.uclinux.org/doku.php?id=debuging_applications
