Greetings. I've implemented a FIR filter in assembly language to be called from C. This is the prototype:
int16 fir(int16 new_sample, const int16 *coefficient, int16 *delay, int16 num_taps);
Nothing fancy here: each new sample is fed into the FIR as it arrives, and the output is used immediately. The assembly routine is simple: a zero-overhead loop with a circular buffer for the delay array (the circular buffer's state is also saved between calls). The MAC is used as follows:
A0 += R0.L * R1.L (IS) || R0.L = W[I1++] || R1.L = W[I2++];
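For reference, that single-MAC loop can be modeled in C roughly as follows. This is a sketch, not the actual routine: it assumes `int16` maps to `int16_t`, that the delay pointer steps back one halfword per sample so both pointers can post-increment, and it adds a hypothetical `head` parameter as a stand-in for the saved circular-buffer state. The raw sum is returned as `int64_t` (standing in for the 40-bit A0) because the post doesn't say how A0 is reduced to the 16-bit result.

```c
#include <assert.h>
#include <stdint.h>

/* C model of the single-MAC FIR (a sketch, not the poster's assembly).
   `head` is a hypothetical stand-in for the circular-buffer state saved
   between calls; int64_t stands in for the 40-bit accumulator A0. */
int64_t fir_ref(int16_t new_sample, const int16_t *coefficient,
                int16_t *delay, int num_taps, int *head)
{
    assert(num_taps > 0 && *head >= 0 && *head < num_taps);

    /* Step the write position back one slot (2 bytes), wrapping circularly,
       and store the new sample there. A forward walk from *head then visits
       x[n], x[n-1], ..., x[n-num_taps+1], so both pointers can post-increment. */
    *head = (*head + num_taps - 1) % num_taps;
    delay[*head] = new_sample;

    int64_t acc = 0;
    int idx = *head;
    for (int k = 0; k < num_taps; k++) {
        /* One MAC per tap: acc += coefficient[k] * x[n-k] */
        acc += (int32_t)coefficient[k] * delay[idx];
        idx = (idx + 1 == num_taps) ? 0 : idx + 1;   /* circular post-increment */
    }
    return acc;
}
```

Feeding an impulse (1, 0, 0, ...) through it with coefficients {1, 2, 3, 4} returns the coefficients one per call, which is a quick way to compare a model like this against the assembly output.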
Excluding overhead, the FIR takes one cycle per tap. This works fine, but I'd like to cut that in half by taking advantage of the Blackfin's dual MACs, perhaps with something like this:
A0 += R0.L * R1.L, A1 += R0.H * R1.H (IS) || R0 = [I1++] || R1 = [I2++];
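This would accumulate the low halfwords (taps 0, 2, 4, ...) in A0 and the high halfwords (taps 1, 3, 5, ...) in A1, with the two partial sums added together at the end, and it would run roughly twice as fast as the original routine. The only problem is alignment: each new sample moves the circular buffer pointer by 2 bytes, so on every other call the pointer sits at an address that is not a multiple of 4, and a 32-bit access through it is misaligned (Blackfin requires 32-bit data accesses to be 4-byte aligned).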
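To make the idea and the problem concrete, the proposed dual-MAC split can be modeled in C. This is a sketch under stated assumptions: an even tap count, delay and coefficient arrays that both start on a 4-byte boundary, and a hypothetical `head` index standing in for the circular pointer. The `assert` fires in exactly the case described above: when `head` is odd, the first pair would start 2 bytes past a word boundary.

```c
#include <assert.h>
#include <stdint.h>

/* C model of the proposed dual-MAC split (a sketch, not Blackfin code).
   A 32-bit load of delay[i], delay[i+1] puts delay[i] in the low half and
   delay[i+1] in the high half (Blackfin is little-endian), so A0 collects
   taps k = 0, 2, 4, ... and A1 collects taps k = 1, 3, 5, ... */

/* Nonzero when element `head` sits on a 4-byte boundary, assuming the
   buffer itself starts 4-byte aligned (elements are 2 bytes each). */
static int start_is_word_aligned(int head)
{
    return (head * 2) % 4 == 0;
}

int64_t fir_dual(const int16_t *coefficient, const int16_t *delay,
                 int num_taps, int head)
{
    /* Aligned 32-bit pair loads are only possible from an even index;
       an odd head is exactly the misalignment problem. */
    assert(num_taps % 2 == 0 && start_is_word_aligned(head));

    int64_t a0 = 0, a1 = 0;                 /* model A0 and A1 */
    for (int k = 0; k < num_taps; k += 2) {
        int i = (head + k) % num_taps;      /* even index: a pair never wraps */
        a0 += (int32_t)coefficient[k]     * delay[i];       /* low halves  */
        a1 += (int32_t)coefficient[k + 1] * delay[i + 1];   /* high halves */
    }
    return a0 + a1;   /* the two partial sums must be combined at the end */
}
```

With a 4-tap filter whose write position steps back one slot per sample, `head` cycles 0, 3, 2, 1, ..., so the aligned case holds only on every other call.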
Am I missing something about Blackfin programming? I'm brand new to Blackfin, so it wouldn't surprise me! I've looked at the FIR tutorial example, which would work fine except that it computes two outputs per pass through the FIR loop and therefore needs two new input samples at a time. I want a filtered output as soon as possible after each new sample arrives.
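For comparison, the two-outputs-per-pass structure can be sketched in C. This is a generic illustration of the idea, not the tutorial's actual code; `hist` is a hypothetical linear history stored newest first. Each coefficient is fetched once and feeds both accumulators, but `y[n]` cannot be produced until `x[n+1]` has arrived, which is the extra sample of latency the question is trying to avoid.

```c
#include <assert.h>
#include <stdint.h>

/* Generic sketch of computing two consecutive outputs per pass.
   hist[] (hypothetical) holds num_taps + 1 samples, newest first:
   hist[0] = x[n+1], hist[1] = x[n], ..., hist[num_taps] = x[n-num_taps+1]. */
void fir_two_outputs(const int16_t *coefficient, const int16_t *hist,
                     int num_taps, int64_t *y_n, int64_t *y_n1)
{
    assert(num_taps > 0);
    int64_t a0 = 0, a1 = 0;                    /* model A0 and A1 */
    for (int k = 0; k < num_taps; k++) {
        int32_t c = coefficient[k];            /* one coefficient fetch ... */
        a0 += c * hist[k + 1];                 /* ... used for y[n]   */
        a1 += c * hist[k];                     /* ... and for y[n+1]  */
    }
    *y_n = a0;
    *y_n1 = a1;
}
```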
Thanks for any ideas on this.