
ADSP 21469 FFT accelerator

Question asked by johnatastrium on Mar 3, 2010
Latest reply on Mar 8, 2010 by DivyaS

I am thinking about how to use the FFT accelerator on this chip. Our customer wants a 4096-point FFT benchmark in which the FFTs run alone, with nothing else executing. There seem to be some practical issues, though, and I'm wondering if anyone has any thoughts. I should say first that I am a moderate novice.


We've had the cfftf DSP library routine working, which as I understand it is the fastest FFT routine in the DSP library (as opposed to cfft4096), and it uses the core resources. If I want to max out the DSP, I should try to use the accelerator. But the accelerator on its own would actually give me (fractionally) lower performance, because it runs at half the clock rate, and then you have to add the extra memory-access latencies for DMA to and from the accelerator. In other words, the accelerator is intended to offload the core so that it can do other kinds of work, rather than to boost speed.


So I look at whether I could use both core & accelerator simultaneously to boost the performance.


But for FFTs larger than 256 points, the core has to intervene substantially (in complexity, if not in MIPS) to set up all the DMAs in a chain of 256-point FFTs. What's more, that DMA traffic is going to consume a non-zero chunk of the core's memory bandwidth. So if I wanted to ping-pong 4096-point FFTs between the two, I find it difficult to believe that this is actually going to work, not least because there will be a constant stream of interrupts to the core's 4096-point FFT every time the accelerator needs 3 sets of DMAs set up, every 256 points, and it isn't clear how interruptible the core's optimised DSP library routines will be.


And there doesn't seem to be any one-shot library routine to use the accelerator. I have to code everything in the hardware reference manual myself, and get it right (not being very experienced at such things).


And the "ping-pong" between the two types of processing isn't going to be 1:1 or even deterministic, so I'd need some clever component to dynamically schedule between the two, and gather the results back up again in-order.
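As a rough sketch of what such a scheduling component might look like, the model below hands each frame to whichever engine goes idle first and works out when each result can be released in input order. Everything here is hypothetical: the names `dispatch_frames` and `frame_slot` are invented for illustration, and the per-frame costs are fixed, made-up tick counts, whereas real timings would vary with DMA contention and interrupts.

```c
#include <assert.h>
#include <stddef.h>

enum { ENGINE_CORE = 0, ENGINE_ACCEL = 1, NUM_ENGINES = 2 };

typedef struct {
    int           engine;   /* engine that processed this frame */
    unsigned long done;     /* tick at which the engine finished it */
    unsigned long release;  /* tick at which it can be output in order */
} frame_slot;

/* Hand each frame, in input order, to whichever engine goes idle first
   (ties go to the core), then compute when each result may be released
   so that outputs leave in the same order the inputs arrived. */
void dispatch_frames(const unsigned long cost[NUM_ENGINES],
                     frame_slot *slots, size_t nframes)
{
    unsigned long free_at[NUM_ENGINES] = { 0, 0 };
    unsigned long latest = 0;   /* latest completion seen so far */

    assert(cost[ENGINE_CORE] > 0 && cost[ENGINE_ACCEL] > 0);

    for (size_t i = 0; i < nframes; i++) {
        int e = (free_at[ENGINE_ACCEL] < free_at[ENGINE_CORE])
                    ? ENGINE_ACCEL : ENGINE_CORE;
        free_at[e] += cost[e];
        slots[i].engine = e;
        slots[i].done = free_at[e];
        /* A frame cannot leave before every earlier frame has finished. */
        if (slots[i].done > latest)
            latest = slots[i].done;
        slots[i].release = latest;
    }
}
```

With assumed costs of 10 ticks per frame on the core and 13 on the accelerator, and frames numbered from 0, frame 8 finishes on the core at tick 50 while frame 7 is still on the accelerator until tick 52, so frame 8 has to be held back. Frames 8 and 9 both land on the core, which shows the split drifting away from 1:1 even in this deterministic model.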


Am I missing something / over-worrying? Or has someone clever already got a chunk of code somewhere that does all this?


Thanks in advance if someone can help me.