34.5us 1024 Point Complex FFT for the ADSP-21479

Hello!

Question from customer: How can the 34.5us performance for a 1024 Point Complex FFT be reached on the ADSP-21479? Is there a code example or library available?

Customer's calculations (based on ADSP-214xx_HRM_rev0.3) show that 39168 cycles (294.5us) are required for a 1024-point FFT on signal lengths that are a power of 2.

Thanks,

Aleksey

  • Analog Employees
    on Jan 11, 2011 9:16 PM

    Hi,

    The 39168 cycles you calculated are for the hardware accelerator on the 21479; the algorithm it uses for large FFTs requires that many cycles. You can implement the FFT in software on the core to achieve higher performance.

    You can find library code for the complex FFT (CFFTN), optimized for SHARC SIMD and declared in filter.h, as described in the runtime library manual at the following link.

    http://www.analog.com/static/imported-files/software_manuals/50_21k_rtl_mn_rev_1.4.pdf

    You can also find radix-2 FFT benchmark code at the following link. Though these benchmarks are for the 21364, they are applicable to the 21479 as well.

    http://www.analog.com/en/embedded-processing-dsp/SHARC/products/code-examples/2136x_application_code_examples/resources/fca.html.

    Thanks,

    Divya
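For readers unfamiliar with what a radix-2 complex FFT does, here is a minimal scalar sketch in portable C. It is not the library's CFFTN implementation and uses no SIMD or hand tuning; the type `cplx_sample` and the function `fft_radix2` are hypothetical names chosen for illustration. The twiddle table is supplied by the caller, using the forward-transform convention tw[k] = exp(-2*pi*j*k/n), which is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved complex sample type (not the library's type). */
typedef struct {
    float re;
    float im;
} cplx_sample;

/* In-place radix-2 decimation-in-time FFT.
 *   x  : n complex samples, overwritten with the spectrum
 *   tw : n/2 twiddle factors, tw[k] = exp(-2*pi*j*k/n), supplied by the caller
 *   n  : transform length, must be a power of two
 */
void fft_radix2(cplx_sample *x, const cplx_sample *tw, size_t n)
{
    size_t i, j, len;

    assert(n >= 1 && (n & (n - 1)) == 0);

    /* Reorder the input into bit-reversed index order. */
    for (i = 1, j = 0; i < n; i++) {
        size_t bit = n >> 1;
        for (; j & bit; bit >>= 1) {
            j ^= bit;
        }
        j ^= bit;
        if (i < j) {
            cplx_sample t = x[i];
            x[i] = x[j];
            x[j] = t;
        }
    }

    /* log2(n) stages, each made of n/2 butterflies. */
    for (len = 2; len <= n; len <<= 1) {
        size_t half = len / 2;
        size_t stride = n / len;   /* step through the shared twiddle table */
        for (i = 0; i < n; i += len) {
            size_t k;
            for (k = 0; k < half; k++) {
                cplx_sample w = tw[k * stride];
                cplx_sample a = x[i + k];
                cplx_sample b = x[i + k + half];
                cplx_sample t;
                /* t = b * w (complex multiply) */
                t.re = b.re * w.re - b.im * w.im;
                t.im = b.re * w.im + b.im * w.re;
                x[i + k].re = a.re + t.re;
                x[i + k].im = a.im + t.im;
                x[i + k + half].re = a.re - t.re;
                x[i + k + half].im = a.im - t.im;
            }
        }
    }
}
```

For n = 1024 this structure runs 10 stages of 512 butterflies each, 5120 butterflies in total. Optimized implementations keep the same arithmetic but schedule it to suit the hardware.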

  • Thank you for the help.

    Though I still have a question. What is the purpose of the FFT accelerator if the core can solve this task with around 10 times higher performance? On the 21364, doing the FFT in the core is reasonable, because it doesn't have an accelerator. That is why I'm using the 21479 instead of the 21364 in my project. There are other tasks for the core that could be processed at the same time as the FFT.

    So adding an accelerator to the 21479 doesn't improve FFT performance compared to the 21364; it only frees the core, at the cost of calculating the 1024-point FFT about 10 times slower. Am I right?

  • Analog Employees
    on Jan 12, 2011 8:42 PM

    Hi,

    Yes, your understanding is right.

    Thanks,

    Divya

  • I am using the CFFT1024 function on the ADSP-21479 EZ Lite eval kit and getting a run time of 2.453572e-04 seconds.  I have also checked the "optimized for time" compile option in DSP++.  My code is below, just to make sure I have used the function correctly, but why would the run time be so slow if it's optimized for SIMD?  I have tried reusing the assembly code that you mentioned, but I am getting errors about nonexistent registers being used.  Did Analog actually rewrite the code to verify the benchmark, and if so, could you give me a copy to show our customer in house?

    #include <trans.h>
    #include <time.h>
    #include <stdio.h>

    #define N 1024

    extern int main( void ){

        float real_input[N], imag_input[N];
        float real_output[N], imag_output[N];
        volatile clock_t clock_start;
        volatile clock_t clock_stop;
        double secs;
        int p;
       
        printf("Starting Verification test \n");
        for(p=0;p<4;p++){
            //take initial time stamp
            clock_start = clock();
            cfft1024 (real_input, imag_input, real_output, imag_output);
            //take final time stamp
            clock_stop = clock();
       
            //print time taken
            secs = ((double) (clock_stop - clock_start))/ CLOCKS_PER_SEC;
            printf("Cycle %d's time taken is %e seconds\n",p,secs);   
        }
        printf("Verification test concluded \n");

        return 0;
    }

  • Analog Employees
    on Apr 7, 2011 12:14 AM

    Hi,

    The code you posted does not use the SIMD capability. To use it, call the CFFTN library function with complex interleaved input, as below.

    volatile clock_t clock_start;
    volatile clock_t clock_stop;
    double secs;
    int cyclecount;
    int p;
    complex_float input[N];
    complex_float output[N];

    printf("Starting Verification test \n");

    clock_start = clock();

    cfft1024(input, output);

    clock_stop = clock();

    cyclecount = clock_stop - clock_start;

    //print time taken
    //Frequency is not defined in this snippet; presumably the core clock in Hz
    secs = ((double) (clock_stop - clock_start))/ Frequency;
    printf(" time taken is %e seconds\n",secs);

    printf("Verification test concluded \n");

    With the above code you will get 1.740188e-04 seconds using the SIMD capability with the library function. The function uses the radix-2 FFT.
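The difference between the two calling styles in this thread is the data layout: separate real and imaginary arrays versus one array of interleaved complex samples. As a rough illustration, the sketch below repacks split arrays into interleaved form in portable C. The type `cplx_sample` and the function `interleave_complex` are hypothetical stand-ins for illustration, not library names.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved complex sample: re and im sit next to each other. */
typedef struct {
    float re;
    float im;
} cplx_sample;

/* Pack separate real/imaginary arrays into one interleaved array:
 * out = { re[0], im[0], re[1], im[1], ... } when viewed as floats.
 */
void interleave_complex(const float *re, const float *im,
                        cplx_sample *out, size_t n)
{
    size_t i;

    assert(re != NULL && im != NULL && out != NULL);

    for (i = 0; i < n; i++) {
        out[i].re = re[i];
        out[i].im = im[i];
    }
}
```

With interleaved data, the real and imaginary parts of each sample are adjacent in memory, which is the layout the reply above asks for.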

    The benchmark codes at the link given earlier in this thread are hand-tuned in assembly. I am attaching a single-channel 1024-point radix-4 FFT benchmark ported to the ADSP-21479. The cycle count for this benchmark is 16934 core clock cycles, which gives about 63.6us at 266 MHz.

    Thanks,

    Divya
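The cycle and time figures quoted in this thread can be cross-checked with a small helper. The sketch below is plain host-side C, not SHARC library code, and the function names are made up for illustration. It converts a cycle count to microseconds for a given clock, and back-computes the clock rate implied by a cycle count and an elapsed time.

```c
#include <assert.h>

/* Convert a cycle count to microseconds.
 * clock_hz is the clock the cycles were counted against, in Hz.
 */
double cycles_to_us(unsigned long cycles, double clock_hz)
{
    assert(clock_hz > 0.0);
    return (double)cycles * 1.0e6 / clock_hz;
}

/* Clock rate in Hz implied by a cycle count and an elapsed time in us. */
double implied_clock_hz(unsigned long cycles, double elapsed_us)
{
    assert(elapsed_us > 0.0);
    return (double)cycles * 1.0e6 / elapsed_us;
}
```

At 266 MHz, `cycles_to_us(16934, 266.0e6)` comes to roughly 63.66us, in line with the figure above. The 39168 cycles and 294.5us from the original question work out to a clock of about 133 MHz via `implied_clock_hz(39168, 294.5)`; the thread does not say which clock that figure was based on. By the same arithmetic, and assuming a 266 MHz core clock, 34.5us would correspond to about 9177 cycles.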