21469 FFT library speed

ADI's benchmarks, and comments made in this forum, indicate that cfftf should take 11071 cycles for a 1024-point FFT, provided the real and imaginary data are in separate memory banks and the FFT input/output array is aligned to a 1024-word boundary. My testing on a 450 MHz 21469-based platform, however, shows cfftf taking about 29000 cycles (measured by reading the EMUCLK register before and after the call), which is roughly 2.6 times the benchmark figure.
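To put the two cycle counts in perspective, here is a small portable C sketch of the arithmetic. It is host-side code only, not SHARC code, and the helper names `cycles_to_us` and `slowdown` are made up for illustration. It assumes EMUCLK counts core clock cycles at the 450 MHz core rate.

```c
/* Hypothetical helpers: plain arithmetic, nothing from ADI's libraries. */

/* Convert a core-cycle count to microseconds at the given core clock (Hz). */
static double cycles_to_us(unsigned long cycles, double clock_hz)
{
    return (double)cycles / clock_hz * 1e6;
}

/* Ratio of the measured cycle count to the expected (benchmark) count. */
static double slowdown(unsigned long measured, unsigned long expected)
{
    return (double)measured / (double)expected;
}
```

With the numbers above, `cycles_to_us(11071, 450e6)` is about 24.6 us, `cycles_to_us(29000, 450e6)` is about 64.4 us, and `slowdown(29000, 11071)` is about 2.62.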

Apart from the audio DMAs (4 channels at 48000 Hz), very little else should be using memory bandwidth. My executable code is in internal memory.
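As a rough sanity check on the DMA load, the sketch below estimates how many audio DMA word transfers could land inside one FFT call. It is portable C with hypothetical helper names, and it assumes each channel moves one word per sample period, which is a simplification of how the real DMA engine behaves.

```c
/* Hypothetical helpers: a back-of-envelope estimate, not a model of the
   21469 DMA controller. Assumes one word per channel per sample period. */

/* Total DMA word transfers per second across all audio channels. */
static double dma_words_per_second(unsigned channels, double sample_rate_hz)
{
    return (double)channels * sample_rate_hz;
}

/* Expected number of DMA word transfers during a window of core cycles. */
static double dma_words_in_window(unsigned long cycles, double clock_hz,
                                  double words_per_second)
{
    double window_seconds = (double)cycles / clock_hz;
    return window_seconds * words_per_second;
}
```

Under these assumptions, `dma_words_per_second(4, 48000.0)` is 192000, and `dma_words_in_window(29000, 450e6, 192000.0)` is about 12.4 transfers per FFT call. Even if each transfer stalled the core for several cycles, that falls far short of the roughly 18000 extra cycles observed.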

Are there any other restrictions (e.g. caching, or which memory block the code resides in) that might affect the timing by such a large factor?