I have finally got my prototype application up and running on a 21469 evaluation board.
I am currently running at 96 kHz. This is the profile (release build):
If one could squeeze some more juice out of fir() and fir_interp(), 192 kHz would not be far away!
Both functions use an internal delay line (the state parameter in fir() and the delay parameter in fir_interp()). This means they must copy the current data into the delay line on every iteration. If the input were a circular buffer, this copying could be avoided (at the expense of a circular buffer wrap test in the loop). I believe the circular buffer solution would be faster.
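To make the idea concrete, here is a minimal sketch in portable C of a FIR that keeps its history in a circular buffer, so each new sample overwrites the oldest one and nothing is shifted or copied. This is not the library's fir() implementation; the names circ_fir_t, circ_fir_init and circ_fir_sample are made up for illustration. The wrap test is written out explicitly here; on SHARC the address generators can do the wrap in hardware, which plain C does not show.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical state for a FIR whose history lives in a circular buffer.
   None of these names come from the vendor run-time library. */
typedef struct {
    const float *coeffs;   /* h[0] .. h[num_taps-1] */
    float       *history;  /* caller-provided storage, num_taps elements */
    size_t       num_taps;
    size_t       head;     /* index of the most recent sample */
} circ_fir_t;

void circ_fir_init(circ_fir_t *f, const float *coeffs,
                   float *history, size_t num_taps)
{
    assert(num_taps > 0);
    f->coeffs   = coeffs;
    f->history  = history;
    f->num_taps = num_taps;
    f->head     = 0;
    for (size_t i = 0; i < num_taps; ++i) {
        history[i] = 0.0f;              /* start from silence */
    }
}

/* Process one input sample and return y[n] = sum_k h[k] * x[n-k]. */
float circ_fir_sample(circ_fir_t *f, float x)
{
    /* Advance the head and overwrite the oldest sample in place:
       this replaces the per-call copy into a linear delay line. */
    f->head = (f->head + 1 == f->num_taps) ? 0 : f->head + 1;
    f->history[f->head] = x;

    float  acc = 0.0f;
    size_t idx = f->head;               /* newest sample first */
    for (size_t k = 0; k < f->num_taps; ++k) {
        acc += f->coeffs[k] * f->history[idx];
        /* Step back one sample in time, wrapping at the buffer start. */
        idx = (idx == 0) ? f->num_taps - 1 : idx - 1;
    }
    return acc;
}
```

Whether this actually beats the copy-based version depends on how cheaply the wrap can be done inside the inner loop, so it would need to be profiled on the target.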
The fir_interp() function doesn't seem to be using SIMD instructions. Implementing this function with SIMD would save about 10% of the processing time in my application.