We're benchmarking the ADSP-24169 processor, and our results show that DSP algorithm execution is slower than we expected.
To give a specific example, we are running the FIR algorithm with 64 taps.
When running the FIR64 benchmark, the DSP should execute 1 multiply-accumulate per clock cycle in SIMD mode, which means 2 taps per clock cycle. With 64 taps, this should take 32 clock cycles per output sample. In addition, there should be a 1-clock-cycle int->float conversion on the input and a 1-clock-cycle float->int conversion on the output.
We operate on blocks of 1024 samples, which gives 1024 x (32 + 2) = 34816 clock cycles => about 87 microseconds (assuming a 400 MHz clock, i.e. 400,000,000 clock cycles per second).
However, we measure an average execution time of 62257 clock cycles => about 156 microseconds.
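For reference, the estimate above can be written out as a small C sketch. The function names here are hypothetical and are not taken from the attached project; the 400 MHz clock and the 2 conversion cycles per sample are the same assumptions stated above.

```c
#include <stdint.h>

/* Expected cycle count for one block (hypothetical helper).
 * Assumes 2 taps per clock cycle in SIMD mode, plus conv_cycles
 * per output sample for the int->float and float->int conversions. */
static uint32_t expected_block_cycles(uint32_t taps,
                                      uint32_t samples,
                                      uint32_t conv_cycles)
{
    return samples * (taps / 2u + conv_cycles);
}

/* Convert a cycle count to microseconds for a given core clock in Hz. */
static double cycles_to_us(uint32_t cycles, double clock_hz)
{
    return (double)cycles * 1.0e6 / clock_hz;
}
```

With these, expected_block_cycles(64, 1024, 2) gives 34816 cycles, and cycles_to_us(34816, 400.0e6) gives about 87 microseconds. The measured 62257 cycles works out to roughly 60.8 cycles per output sample against the expected 34, i.e. about 1.8 times slower than the estimate.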
I've attached our test application project, which when run will execute a test benchmarking this algorithm.
The benchmark is executed in the DspMain thread by the function dofn.
Have you any suggestions as to what is causing the slowdown?