I have ADSP 21368 based custom board How Can I do multipoint FFT on data (32K) sitting on External SDRAM? Is it possible to do it without copying it in internal memory?
Thanks in advance.
You cannot do multipoint FFT directly from external memory as SIMD mode is not supported for external memory. Please refer to the restrictions on External memory given on page 3-78 of the hardware reference manual at the following link.
That being said you could do FFT in non SIMD mode accessing single data point from external memory at a time for computation without copying it into internal memory, as the DAG data registers are capable of accessing external memory directly on 21368 unlike the 21364 and the 21362 processors. You could refer to the CFFT_nonsimd.asm available in the tools for this which implements radix-2 FFT. If you are planning to write your own code for FFT from external memory please keep in mind the restrictions on external memory on page 3-78 during implementation.
I've had to do this in the past on processors with even smaller internal memory (like 2k words). You'll need to do a two dimensional decomposition of the FFT. Essentially, you'll break the 32k-point FFT into 128- and 256-point FFTs. You'll need to move the shorter blocks of data into internal memory, FFT them, and then move them out. Since the main FFTs occur in internal memory, you can use the full capabilities of SIMD. Also, by doing this decomposition, you don't need to allocate 32kwords of internal storage. A good description can be found in the book "DFT, FFT, and Convolution Algorithms" by Burrus and Parks.
I have a similar wish to the original post: to perform an FFT (say 16384 point) on data held in SDRAM connected to a Sharc 21369.
Due to the limited internal memory on the 21369, I am in need of a FFT routine for which the input data, output data, and twiddle table are all held in SDRAM. Is there a library function for this?
I am finding that:
1) The non-SIMD version of cfftN() from filter.h requires N words of dmda at the link stage (presumably to hold its twiddle table?). The max size of FFT I can use is therefore limited by how much internal dmda memory I can make available (not enough...)
2) The non-SIMD version of cfft(N) from filter.h requires a length N twiddle table held in pmda. Again, the max size of FFT I can use is therefore limited by how much internal memory I can make available, this time pmda.
3) cfftN() from trans.h seems to require 0.75N words of pmda at the link stage, presenting similar problems to the filter.h functions.
Despite having oodles of SDRAM available, with only the above functions available, the maximum size of FFT I can perform is still limited by internal memory!
A faster solution is presumably to divide the large FFT into smaller ones that can all be computed in internal memory and use SIMD, as mentioned in a previous reply to this thread. Is there a library function for this?
The basic algorithm is known as the mixed-radix FFT algorithm. Information can be found at:
The large FFT is broken into a number of smaller algorithms which can execute efficiently from internal memory. I don't know of any commercial implementations of the algorithm on the SHARC. We've done related work on a consulting basis for other customers. Its a couple of days of effort.
I also would like to compute a large FFT (32k, 64k) with the new 21469. In Visual DSP I can find the library functions
complex_float *cfft65536 (complex_float dm input, complex_float dm output);
complex_float *cfft32768 (complex_float dm input, complex_float dm output);
which should do perform large FFTs, but in my program they always cause a memory error. After reading this thread, I believe that it is not very promising to play around with the lDF file. Is it right that cfft32768 cannot be used by today processor? Are there any other library function, which allow the computation of large FFTs in SIMD or SISD?
I put together some slides describing how the 2-dimensional decomposition of the FFT works. I also provided example MATLAB code showing how to do it in practice. The approach lets you decompose large FFTs into a series of smaller transforms each of which can fit in internal memory. Take a look at the PPT.
If someone is brave enough, and has the time, can you turn this into C/ASM that leverages the existing complex FFT functions for the SHARC? I've done the theory. I need some help with the implementation. Having an efficient large FFT function on the SHARC would sure be handy.
Retrieving data ...