The FFT accelerator on ADSP-SC58x/ADSP-2158x is a completely new design compared to the FFT accelerator on ADSP-214xx processors. The major differences/features of the FFT accelerator on ADSP-SC58x/ADSP-2158x are:
Supports both complex and real FFT and IFFT operations.
Supports 64 to 2048 points in small FFT mode and 4096 to 4 M points in large FFT mode.
Supports the IEEE-754/854 single-precision floating-point data format, with round-to-even
Radix-4 butterfly efficiency at a radix-2 (integer power of two) point granularity
Automatic insertion of zeros for real FFTs
Supports automatic conjugating of the twiddle factors for IFFT
Supports automatic scaling of FFT and IFFT inputs.
Hardware support for windowing
Hardware support for magnitude squared FFT output
Hardware support for pipelined data flow
Dedicated high-speed DMA engines for data load and dump, with a 64-bit data width, clocked by SYSCLK
Supports data and coefficient access from both on-chip (L1/L2) and off-chip (L3) memories
Optional support for bypassing the compute engine to perform high speed memory to memory MDMA transfers
Compute engine clock division options for power reductions, supports 1:1, 1:2, 1:4 and 1:8 clock ratio modes
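
To make the point-size limits in the list above concrete, here is a minimal sketch in C that classifies a requested FFT size as small mode, large mode, or unsupported. The type and function names (`fft_mode_t`, `fft_mode_for_points`) are hypothetical and not part of any ADI driver API, and "4 M points" is assumed to mean 4 x 2^20 = 4194304 points.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names for illustration only; not an ADI driver API. */
typedef enum {
    FFT_MODE_INVALID = 0, /* size not supported by the accelerator */
    FFT_MODE_SMALL,       /* 64 to 2048 points */
    FFT_MODE_LARGE        /* 4096 to 4 M points */
} fft_mode_t;

/* True when n is an integer power of two (radix-2 point granularity). */
static bool is_power_of_two(uint32_t n)
{
    return n != 0u && (n & (n - 1u)) == 0u;
}

/* Map a point count to the FFT mode named in the feature list.
 * Assumption: "4 M" means 4 * 1024 * 1024 points. */
fft_mode_t fft_mode_for_points(uint32_t n)
{
    const uint32_t max_large = 4u * 1024u * 1024u;

    if (!is_power_of_two(n)) {
        return FFT_MODE_INVALID;
    }
    if (n >= 64u && n <= 2048u) {
        return FFT_MODE_SMALL;
    }
    if (n >= 4096u && n <= max_large) {
        return FFT_MODE_LARGE;
    }
    return FFT_MODE_INVALID;
}
```

For example, `fft_mode_for_points(1024)` yields `FFT_MODE_SMALL`, while a size such as 1000 is rejected because it is not a power of two.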
I am trying to find out how many cycles (MIPS) the accelerator takes to compute an FFT in single-shot mode when the data is placed in different memory blocks (L1, L2, or SDRAM), and also when the cache is enabled. What I am seeing is that with the L1 cache enabled, the single-shot accelerator FFT takes more cycles than with the cache disabled. Why is this happening?
Any information on this would be a great help.
Where can I get the FFT accelerator benchmark numbers?
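
For reference, the kind of measurement described above can be sketched as a small timing harness in C. This is only an illustration under stated assumptions: `read_ticks` uses the standard `clock()` as a portable stand-in for the core cycle counter that would be read on the target, and `measure_min_ticks` and `dummy_workload` are hypothetical names. On hardware, the workload would be the single-shot FFT call, including any setup the call performs for the chosen buffer placement.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <time.h>

/* Signature of the code under test; ctx carries its arguments. */
typedef void (*workload_fn)(void *ctx);

/* Stand-in tick source (assumption): on the target this would read
 * the core cycle counter instead of the C library clock(). */
static uint64_t read_ticks(void)
{
    return (uint64_t)clock();
}

/* Run the workload `runs` times and return the smallest elapsed tick
 * count, which reduces the influence of one-off effects. */
uint64_t measure_min_ticks(workload_fn fn, void *ctx, unsigned runs)
{
    assert(fn != NULL && runs > 0u);

    uint64_t best = UINT64_MAX;
    for (unsigned i = 0; i < runs; ++i) {
        uint64_t start = read_ticks();
        fn(ctx);
        uint64_t elapsed = read_ticks() - start;
        if (elapsed < best) {
            best = elapsed;
        }
    }
    return best;
}

/* Placeholder workload: does a little busy work and counts how many
 * times it was called. Replace with the FFT call being measured. */
void dummy_workload(void *ctx)
{
    unsigned *calls = (unsigned *)ctx;
    volatile uint32_t sink = 0;

    for (uint32_t i = 0; i < 10000u; ++i) {
        sink += i;
    }
    ++*calls;
}
```

Running the same harness once per configuration (data in L1, L2, or SDRAM, cache on or off) keeps the comparison consistent, since the same start and stop points are used for every case.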