CMSIS DSP FFT functionality increases size of the project drastically

Hi all, my question is related to the CMSIS DSP library. I'm working on the ADuCM3029 EZ-KIT and using IAR v7.8.1 to build the project. In my application an ADC is interfaced with the controller; I acquire 60 samples from it and store them in a buffer. I intend to do math operations (an FFT, to be specific) on the acquired buffer, so I enabled the "Use CMSIS" option and checked "DSP library" in the project settings. However, the size of my code increases drastically as soon as I include the FFT operation, going from ~9 KB to ~83 KB. Is there anything I'm doing wrong here? Is there a better way to add FFT functionality to the project without such a large size increase?

Along with this, I can see that when the FFT function is executed, the array realCoefBQ15 overlaps one of the SPI descriptors, which causes a Bus Fault error. Since I'm using a 64-point radix-2 FFT, most of those twiddle factor coefficients are of no use. I understand it has been defined that way so that the same array can be used for FFTs of up to 4096 points.

Is there any way to avoid this?

Can you look into it and help me out on this?

Any help will be deeply appreciated. 

  • Hi,

    Regarding the size increase, this is an ARM matter. The CMSIS-DSP library is made by ARM, who created highly optimized code for each of the cores. I've made a small comparison:

    Core            Data size (bits)  Buffer size  IAR optimization  Code (bytes)  Data (bytes)  R/W data (bytes)
    Cortex M3       16                4096         None              21704         56936         25600
    Cortex M3       16                4096         Low               21700         56936         25600
    Cortex M3       16                4096         Medium            21700         56936         25600
    Cortex M3       16                4096         High balanced     21700         56936         25600
    Cortex M3       16                4096         High speed        21700         56936         25600
    Cortex M3       16                4096         High size         21700         56936         25600
    Cortex M3       32                4096         None              37080         97848         50176
    Cortex M3       16                1024         None              21704         56936         7168
    Cortex M4, FPU  16                4096         None              19514         56938         25600

    The first row is the reference; each of the other rows changes one parameter with respect to it.

    As you can see, changing the optimization level has almost no impact (the code size moves by 4 bytes). I guess this is because the library is already highly optimized.

    The data size affects both the code size and the data size.

    The number of samples (buffer size) only changes the r/w data size, not the code, as expected.

    Finally, a core with an FPU reduces the code size (19514 bytes versus 21704 here). This is because that core has more assembler instructions available for DSP tasks, so less code is needed to do the same work thanks to the hardware support.

    The code tested was:

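    //#define __FPU_PRESENT  1

    #include <arm_math.h>
    #include <math.h>
    #include <stdio.h>
    #include <stdint.h>
    #include <arm_const_structs.h>

    #define BUF_SIZE        4096   // real FFT length, in samples

    typedef q15_t data_FFT_t;      // 16-bit fixed-point samples

    static data_FFT_t input[BUF_SIZE] = {0};
    static data_FFT_t output[2*BUF_SIZE] = {0};   // twice the input length

    int main(void){
      arm_status status;
      arm_rfft_instance_q15 S;

      // forward transform (ifftFlagR = 0) with bit reversal (bitReverseFlag = 1)
      status = arm_rfft_init_q15(&S, BUF_SIZE, 0, 1 );
      if (status != ARM_MATH_SUCCESS) {
        return 1;   // the requested FFT length is not supported
      }
      arm_rfft_q15(&S, input, output);
      return 0;
    }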

    As you can see, all those functions are quite heavy in terms of memory footprint. If you want to take a closer look, the sources are located in C:\Program Files (x86)\IAR Systems\Embedded Workbench 7.5\arm\CMSIS\DSP_Lib\Source\TransformFunctions. Opening the init functions, you can see some huge constant coefficient arrays, which are partly responsible for this size. Therefore, I don't think you are doing anything wrong.
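
    The r/w data column of the table can be cross-checked against the two static buffers declared in the test code, input[BUF_SIZE] and output[2*BUF_SIZE]. The helper below is only an illustrative sketch (the name fft_buffer_bytes is made up here). In every row the reported figure is 1024 bytes larger than the buffer total; that remainder is presumably stack, heap or other project data, which is an assumption since the thread does not say.

```c
#include <assert.h>
#include <stddef.h>

/* Bytes of r/w data used by the two static buffers of the test program:
   input[n] plus output[2 * n], each element elem_bytes wide. */
size_t fft_buffer_bytes(size_t n, size_t elem_bytes)
{
    assert(elem_bytes == 2u || elem_bytes == 4u);   /* 16-bit or 32-bit samples */
    return (n + 2u * n) * elem_bytes;
}
```

    fft_buffer_bytes(4096, 2) gives 24576, which plus 1024 is the 25600 of the reference row. The 32-bit row gives 49152 + 1024 = 50176 and the 1024-sample row gives 6144 + 1024 = 7168. For a 64-point, 16-bit transform the same two buffers would need only 384 bytes.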

    Regarding the overwriting of other memory addresses, have you considered that the output array might need to be larger than the input? In the example above, if you create a delta signal by adding

    input[BUF_SIZE >> 1] = 100;

    you can see that the output array is written up to position 8190 (the output is twice the size of the input). You can also simulate a sine input signal and compare the result with MATLAB or an equivalent tool to find the most suitable format for you.
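
    Applied to a 64-point transform, the same rule gives an output buffer of 128 elements. Below is a minimal sketch of deriving both buffer sizes from one constant and placing a guard word right after the output, so that a contiguous overrun damages the guard first rather than an unrelated variable. The names FFT_LEN, GUARD_WORD, fft_out_guarded and the two helpers are hypothetical, and q15_t is redefined only so that the sketch compiles without arm_math.h.

```c
#include <assert.h>
#include <stdint.h>

typedef int16_t q15_t;                 /* stand-in for the typedef in arm_math.h */

#define FFT_LEN      64u               /* real FFT length, in samples */
#define FFT_OUT_LEN  (2u * FFT_LEN)    /* output holds twice as many elements */
#define GUARD_WORD   0x5A5Au           /* hypothetical canary value */

q15_t fft_in[FFT_LEN];

/* Output buffer followed directly by a canary word. */
struct fft_out_guarded {
    q15_t    data[FFT_OUT_LEN];
    uint16_t guard;
};

struct fft_out_guarded fft_out = { {0}, GUARD_WORD };

/* Returns 1 while the canary is intact, 0 once it has been overwritten. */
int fft_out_intact(void)
{
    return fft_out.guard == GUARD_WORD;
}

/* Debug-build check, meant to be called right after the transform. */
void fft_out_check(void)
{
    assert(fft_out_intact());
}
```

    In a debug build, fft_out_check() would be called after arm_rfft_q15(&S, fft_in, fft_out.data). The guard only catches writes that run straight past the end of the buffer, so it is a debugging aid, not a protection.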

    Best regards

    Hi,

    The most common way to use the CMSIS-DSP library is simply to include the whole library. As each function is in its own C file, the linker will include only the files needed. The problem is that those files can contain a huge amount of data, as we have seen. Since you are linking against the precompiled files, the size is what it is and there is not much we can do.

    In other cases, when you need something special or you want a newer version that is not included in IAR yet, you can include the source files instead of the precompiled files, as is done with the BSP drivers and others. In those cases you could prune the library as you describe. If this is your case, I think you are on the right track.
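
    To give an idea of how small the tables become once the length is fixed, here is a minimal sketch of a textbook radix-2 decimation-in-time FFT hard-wired to 64 points in Q15. It is not the CMSIS-DSP implementation, and the names fft64_q15, QSIN, tw_sin and tw_cos are made up for this example. It works on complex data, so real samples go in re[] with im[] set to zero. The only constant data is a 17-entry quarter-wave sine table (34 bytes). It has not been benchmarked or compared with the CMSIS routines for accuracy.

```c
#include <assert.h>
#include <stdint.h>

#define FFT_N 64u   /* fixed transform length */

/* Quarter-wave sine table in Q15: round(32768 * sin(pi * k / 32)), k = 0..16,
   with the last entry saturated to 32767. */
static const int16_t QSIN[17] = {
        0,  3212,  6393,  9512, 12540, 15447, 18205, 20788, 23170,
    25330, 27246, 28899, 30274, 31357, 32138, 32610, 32767
};

/* sin(2*pi*k/64) and cos(2*pi*k/64) for k = 0..31, from quarter-wave symmetry. */
static int32_t tw_sin(unsigned k) { return (k <= 16u) ? QSIN[k] : QSIN[32u - k]; }
static int32_t tw_cos(unsigned k) { return (k <= 16u) ? QSIN[16u - k] : -QSIN[k - 16u]; }

/* In-place forward FFT of FFT_N complex Q15 samples (re[], im[]).
   Every stage halves its results, so the output is DFT / 64; this is meant
   to keep values in Q15 range for inputs of complex magnitude below 1.0. */
void fft64_q15(int16_t *re, int16_t *im)
{
    assert(re != 0 && im != 0);

    /* Reorder the input into bit-reversed index order. */
    for (unsigned i = 0, j = 0; i < FFT_N; ++i) {
        if (i < j) {
            int16_t t = re[i]; re[i] = re[j]; re[j] = t;
            t = im[i]; im[i] = im[j]; im[j] = t;
        }
        unsigned bit = FFT_N >> 1;
        while (j & bit) { j ^= bit; bit >>= 1; }
        j |= bit;
    }

    /* log2(64) = 6 butterfly stages, sub-transform length 2, 4, ..., 64. */
    for (unsigned len = 2; len <= FFT_N; len <<= 1) {
        unsigned half = len >> 1;
        unsigned step = FFT_N / len;            /* stride into the twiddle table */
        for (unsigned start = 0; start < FFT_N; start += len) {
            for (unsigned k = 0; k < half; ++k) {
                int32_t wr = tw_cos(k * step);  /* twiddle W = cos - j*sin */
                int32_t wi = -tw_sin(k * step);
                unsigned a = start + k, b = a + half;
                int32_t tr = (wr * re[b] - wi * im[b]) / 32768;
                int32_t ti = (wr * im[b] + wi * re[b]) / 32768;
                int32_t ar = re[a], ai = im[a];
                re[a] = (int16_t)((ar + tr) / 2);
                im[a] = (int16_t)((ai + ti) / 2);
                re[b] = (int16_t)((ar - tr) / 2);
                im[b] = (int16_t)((ai - ti) / 2);
            }
        }
    }
}
```

    For example, an impulse of 16384 at index 0 (all other samples zero) comes out as 256 in the real part of every bin, which is the flat spectrum of a delta divided by 64.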

    Best regards.

  • In addition to the answer above, you can explicitly place your big arrays in RAM banks with enough free space, so that they do not overwrite other variables or the stack and heap sections.

    For how to explicitly declare the memory location of variables, you can refer to this link:
    Flash Memory ADuCM3029
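
    As a minimal sketch of such a placement with the IAR compiler: #pragma location puts the next variable into a named section. The section name .fft_buffers is hypothetical, and the linker configuration (.icf) file would need a matching line such as place in RAM_BANK_region { section .fft_buffers }; where the region name is also hypothetical and depends on your project. The #if guard only keeps the file compilable with other compilers, where the array simply gets default placement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define FFT_LEN      64u               /* real FFT length, in samples */
#define FFT_OUT_LEN  (2u * FFT_LEN)    /* output is twice the input length */

/* IAR only: place the next variable in the named section. */
#if defined(__ICCARM__)
#pragma location = ".fft_buffers"      /* hypothetical section name */
#endif
int16_t fft_output[FFT_OUT_LEN];

/* Number of elements a transform may write into fft_output. */
size_t fft_output_capacity(void)
{
    return sizeof fft_output / sizeof fft_output[0];
}

/* Debug-build check that a transform writing `needed` elements will fit. */
void fft_output_check(size_t needed)
{
    assert(needed <= fft_output_capacity());
}
```

    The map file generated by the linker shows where the section finally ended up, which is a quick way to confirm the array no longer sits next to the SPI descriptors.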


  • Hi

    Yes, I can see that the memory footprint is large. I just wanted to be sure I wasn't mishandling the library on my side. Apart from this, I was able to solve the memory overwrite issue.

    So, would stripping down the library and redefining the arrays for my application be the only way left to fix the size issue?

  • Hi,

    Thanks for the reply. The issue is fixed for now, but the information you shared is really helpful and I think it will be needed in the near future. Thanks.

    With Best Regards,

    Jay Shah