Error li1040 for large FFT sizes

I'm on CCES 2.10 with an SC589 EZ-Kit board. I'm compiling a tiny example program that just puts a sine wave through the FFT accelerator, and I noticed I'm getting linker errors for large FFT sizes (NUM_POINTS > 8192):

#include <sys/platform.h>
#include <sys/adi_core.h>
#include "adi_initialize.h"
#include "FFT_accel_test_Core1.h"
#include <stdio.h>

#include <math.h>
#define M_PI 3.14159265358979323846

#include <adi_fft_wrapper.h>
#include <complex.h>

#define NUM_POINTS 16384

#pragma align 32
float input[NUM_POINTS];
#pragma align 32
complex_float output[NUM_POINTS];

int main(void)
{
  int i;
  adi_initComponents();

  for (i=0; i<NUM_POINTS; i++)
    {
      input[i] = cos(2*M_PI*i*10/NUM_POINTS);
    }

  accel_rfft16384(input, output);

  for (i=0; i<NUM_POINTS; i++)
    {
      printf("%f %f\n",output[i].re, output[i].im);
    }

  return 0;
}

Errors:
li1040: Out of memory in output section 'dxe_block0_data_prio2_bw' in processor 'SC589_CORE_1' app.ldf /FFT_accel_test_Core1/system/startup_ldf
li1040: Out of memory in output section 'dxe_block0_data_prio3_bw' in processor 'SC589_CORE_1' app.ldf /FFT_accel_test_Core1/system/startup_ldf
li1040: Out of memory in output section 'dxe_block1_data_prio2_bw' in processor 'SC589_CORE_1' app.ldf /FFT_accel_test_Core1/system/startup_ldf
li1040: Out of memory in output section 'dxe_block1_data_prio3_bw' in processor 'SC589_CORE_1' app.ldf /FFT_accel_test_Core1/system/startup_ldf
li1040: Out of memory in output section 'dxe_block2_data_bw' in processor 'SC589_CORE_1' app.ldf /FFT_accel_test_Core1/system/startup_ldf
li1040: Out of memory in output section 'dxe_block3_data_bw' in processor 'SC589_CORE_1' app.ldf /FFT_accel_test_Core1/system/startup_ldf
li1040: Out of memory in output section 'dxe_l2_data_bw' in processor 'SC589_CORE_1' app.ldf /FFT_accel_test_Core1/system/startup_ldf

How and why am I getting these errors? This example seems so simple and it's as if I can't use the larger FFT options that are made available in the library manual.

  • A follow-up question I had after submitting this one: why do most of the accel_rfft examples in the library manual show both the input and output arrays aligned with a pragma, when the description section always says "The output array must be aligned to the cache line size..."? It never mentions the alignment of the input array.

    •  Analog Employees 
    on Aug 31, 2021 11:32 AM

    Hi,

    Please note that the error message "out of memory" (li1040) is reported when the linker is unable to map all the input sections into the named output section.

    The 16K and 32K FFT implementations that use the accelerator block do not fit in the internal memory of the ADSP-SC58x processor. The accelerator block supports only up to 8K FFT points.
    To generate larger FFTs, you may have to make use of external memory, so please enable the external memory option in your project to avoid the out-of-memory linker error.

    As your project exhausts the available internal memory, you need to make use of SDRAM, provided your target has external memory. Select "Use external memory (SDRAM)" under system.svc > Startupcode/LDF > LDF > External Memory to enable the size and partitioning controls.
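
To get a feel for how much memory the two buffers in the question need, a rough footprint calculation can help. The sketch below is plain C that can be run on a host PC. `complex_float_t` is a hypothetical stand-in for the library's complex_float, assumed here to be a pair of 32-bit floats, and the helper names are made up for illustration. It counts only the two user buffers, not any data the library itself needs.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical stand-in for the library's complex_float:
   assumed to be a pair of 32-bit floats (re, im). */
typedef struct { float re; float im; } complex_float_t;

/* Bytes needed for a real input buffer of n points. */
size_t rfft_input_bytes(size_t n)
{
    return n * sizeof(float);
}

/* Bytes needed for a complex output buffer of n points. */
size_t rfft_output_bytes(size_t n)
{
    return n * sizeof(complex_float_t);
}

/* Print the combined footprint of both buffers in KiB. */
void print_footprint(size_t n)
{
    size_t total = rfft_input_bytes(n) + rfft_output_bytes(n);
    printf("%zu points: %zu KiB\n", n, total / 1024);
}
```

With 4-byte floats, 16384 points come to 64 KiB for the input and 128 KiB for the output, so 192 KiB in total before anything else is placed in memory.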

    We would also recommend enabling linker elimination via Project->Properties->C/C++ Build->Settings->CrossCore SHARC Linker->Elimination->Eliminate Unused Objects. When elimination is enabled, the linker discards any objects (*.doj) within the library that do not contain any relevant code or data (i.e. the application makes no reference to any of the data or code symbols within that object). The linker can then examine the objects that are required for the link, and extract only those symbols required to resolve references from the application code, further stripping out unused data.

    Another option is to enable 'Individually map functions and data items' via the 'General' tab of the CrossCore SHARC Linker options. When enabled, this directs the linker to fill in fragmented memory with individual data objects that fit. When this option is selected, the default behavior of the linker (to place data blocks in consecutive memory addresses) is overridden.

    You can also enable the 'Generate symbol Map' option under the 'General' options for the linker. This will produce a "project_name.map.xml" file in the 'debug' folder of the project that can be opened in Internet Explorer. It shows all your memory sections and how much free/unused space there is in each.

    Regarding alignment:
    For the best performance, all the arrays need to be aligned to at least a 32-byte boundary. When "#pragma align 32" is used in a source file, the block of data in the corresponding object file (which contains all the variables) will be 32-byte aligned.

    All large FFTs (i.e. where the number of points exceeds 2048) require any input or output buffer of complex data to be aligned to at least an 8-byte boundary. If a buffer that does not meet this requirement is passed to the function, an error occurs and the function returns NULL.
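
A buffer's alignment can be checked before it is handed to the FFT function. The following is a minimal sketch in standard C, assuming byte-addressed pointers. `is_aligned` is a hypothetical helper written for this illustration and is not part of the ADI library.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns true when ptr lies on a boundary that is
   a multiple of `boundary` bytes. `boundary` must be a power of two. */
bool is_aligned(const void *ptr, size_t boundary)
{
    return ((uintptr_t)ptr & (boundary - 1)) == 0;
}
```

For example, `is_aligned(output, 8)` would be expected to hold for any complex buffer passed to a large FFT, and `is_aligned(output, 32)` for a buffer declared after "#pragma align 32".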

    Large FFTs require DMA transfers. A larger DMA transfer size (resulting in faster execution) will be used if the function is passed aligned input and output buffers, up to alignment on a 32-byte boundary. If any variable is given 32-byte alignment, the whole of the block is aligned to a 32-byte boundary.
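
To illustrate the idea that the usable transfer size depends on how well both buffers are aligned, the sketch below finds the largest power-of-two boundary, capped at 32 bytes, that two pointers share. It is only an illustration in standard C, assuming byte-addressed pointers. `common_alignment` is a hypothetical name, and this is not the library's actual selection logic.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: largest power-of-two boundary (at most 32 bytes)
   on which both the input and the output buffer are aligned. */
size_t common_alignment(const void *in, const void *out)
{
    /* A low address bit set in either pointer limits the shared alignment. */
    uintptr_t bits = (uintptr_t)in | (uintptr_t)out;
    size_t align = 32;

    while (align > 1 && (bits & (align - 1)) != 0) {
        align >>= 1;
    }
    return align;
}
```

If one buffer is 32-byte aligned and the other only 8-byte aligned, the result is 8, which shows why aligning both the input and the output array is worthwhile.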

    You can refer to the CCES help path below for more details:
    CrossCore® Embedded Studio 2.10.0 > SHARC® Development Tools Documentation > C/C++ Compiler Manual for SHARC® Processors > Compiler > C/C++ Compiler Language Extensions > Pragmas > Data Declaration Pragmas > #pragma align alignopt
    You can find more information and recommendations regarding the alignment of data buffers when using the SHARC+ FFTA accelerator in the section "Data Buffer Alignment" of the manual at the following link:
    https://www.analog.com/media/en/dsp-documentation/software-manuals/cces-SharcLibrary-manual.pdf

    Regards,
    Santhakumari.K