Hey!
I'm trying to implement a function that computes a simple dot product, similar to the ADI fir() function. I have an array coeffs[] that holds my filter coefficients and another array delay[] that holds my input samples. delay[] works as a circular buffer that is updated with every new input sample. What is special in my case is that the block size is 1 (I process one sample per call), so the start of the delay line moves forward by one element on every call and regularly falls on an odd address. According to your documentation, SIMD doesn't work with odd addresses, but we need SIMD in our application for efficiency.
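For reference, here is a minimal portable sketch of the circular-buffer FIR described above, assuming one sample per call and plain C in place of the SHARC pm/dm qualifiers and circindex(). The names fir_state, fir_init and fir_process are hypothetical. It shows why the start address keeps changing parity: the oldest sample moves forward by one slot on every call.

```c
#include <assert.h>

/* Hypothetical portable reference: circular-buffer FIR, one sample per call.
 * No SHARC extensions (pm/dm, circindex) are used. */
typedef struct {
    const float *coefs; /* coefs[0] multiplies the oldest sample */
    float *delay;       /* circular buffer of len samples */
    int len;
    int index;          /* slot the next input is written to */
} fir_state;

static void fir_init(fir_state *s, const float *coefs, float *delay, int len)
{
    assert(len > 0);
    s->coefs = coefs;
    s->delay = delay;
    s->len = len;
    s->index = 0;
    for (int i = 0; i < len; i++)
        delay[i] = 0.0f;
}

static float fir_process(fir_state *s, float input)
{
    s->delay[s->index] = input;
    /* The oldest sample sits right after the newest one. */
    int start = (s->index + 1) % s->len;
    float sum = 0.0f;
    int k = start;
    for (int n = 0; n < s->len; n++) {
        sum += s->coefs[n] * s->delay[k];
        k = (k + 1) % s->len; /* wraparound inside the loop body */
    }
    s->index = start; /* start advances by one slot per call */
    return sum;
}
```

With coefs = {1, 2, 3}, an impulse input produces 3, 2, 1, 0, because coefs[0] is applied to the oldest sample.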
I searched EngineerZone and found people describing a trick to enable SIMD: make delay[] one element longer than the filter. For example, with coeffs[LEN], delay[] has LEN+1 elements, and the value in delay[0] is copied into delay[LEN]. I've tried this with #pragma SIMD_for, and SIMD works as expected. However, this relies on undefined behavior, and the result might change with a different compiler. My code is below. Is there C code that produces the same results without relying on this undefined behavior?
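A common generalization of the one-extra-element trick is to mirror the whole delay line: allocate 2 × LEN elements and write each input twice, at w and w + LEN. The newest LEN samples are then always contiguous, so the dot product is an ordinary linear loop with no wraparound and no dependence on how the pragma treats circindex(). This is a portable C sketch with hypothetical names (fir_mirror, fir_mirror_init, fir_mirror_process). It costs one extra store per sample and twice the delay memory, and it does not change the fact that the window start alternates between even and odd addresses, so whether the SHARC compiler actually emits SIMD code for it would need to be checked in the generated assembly.

```c
#include <assert.h>

/* Hypothetical mirrored delay line: delay[] holds 2 * len floats and every
 * input is stored twice, at w and w + len, so the most recent len samples
 * are always contiguous and the dot product never wraps. */
typedef struct {
    const float *coefs; /* coefs[0] multiplies the oldest sample */
    float *delay;       /* 2 * len floats */
    int len;
    int w;              /* write position in [0, len) */
} fir_mirror;

static void fir_mirror_init(fir_mirror *s, const float *coefs, float *delay, int len)
{
    assert(len > 0);
    s->coefs = coefs;
    s->delay = delay;
    s->len = len;
    s->w = 0;
    for (int i = 0; i < 2 * len; i++)
        delay[i] = 0.0f;
}

static float fir_mirror_process(fir_mirror *s, float input)
{
    int len = s->len;
    s->delay[s->w] = input;       /* primary copy */
    s->delay[s->w + len] = input; /* mirror copy */
    /* Oldest-to-newest window: delay[w + 1] ... delay[w + len]. */
    const float *win = &s->delay[s->w + 1];
    float sum = 0.0f;
    for (int n = 0; n < len; n++) /* plain linear loop, no wraparound */
        sum += s->coefs[n] * win[n];
    s->w = (s->w + 1 == len) ? 0 : s->w + 1;
    return sum;
}
```

The caller only has to provide a buffer of 2 × LEN floats; the coefficient array is unchanged.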
Thanks in advance!
inline float processFir(float input, const pm float * coefs, float dm * delay, int len)
{
    // Read the delay line index from the last element of the delay line array
    int index = (int)delay[len];
    int index_prev = index;
    delay[index] = input; // Write the latest input value where the index points
    // coefs[0] stores the filter coefficient for the oldest value in the delay line
    // Increase index by 1 so that d[index] gets the oldest value in the delay line
    index = circindex(index, 1, len);
    delay[len] = delay[0]; // Copy the first element of the delay line into the last element so SIMD works with odd addresses
    float dm * d = &delay[0];
    float sum = 0.0f;
    #pragma SIMD_for
    for (int n = 0; n < len; n++)
    {
        sum += coefs[n] * d[index];
        index = circindex(index, 1, len);
    }
    delay[len] = circindex(index_prev, 1, len); // Increase index_prev by 1 to be ready for the next input sample
    return sum;
}
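Another way to remove the circular indexing from the loop in processFir() is to split the dot product at the wrap point into two contiguous loops: first the oldest samples from delay[start] to delay[len-1], then the wrapped samples starting at delay[0]. This sketch keeps the same buffer layout as processFir() (len samples plus the write index stored as a float in delay[len]), but it is plain C without pm/dm or circindex(), and process_fir_split() and dot() are hypothetical names. It has not been checked against the SHARC compiler; each loop is a simple linear dot product that the compiler may be able to vectorize, but odd start addresses and odd segment lengths may still limit that.

```c
#include <assert.h>

/* Plain linear dot product of two contiguous arrays. */
static float dot(const float *a, const float *b, int n)
{
    float sum = 0.0f;
    for (int i = 0; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Hypothetical portable variant of processFir(): same buffer layout
 * (delay[0..len-1] = samples, delay[len] = write index stored as float),
 * but the circular dot product is split into two contiguous loops so
 * no loop body ever wraps. */
static float process_fir_split(float input, const float *coefs, float *delay, int len)
{
    assert(len > 0);
    int index = (int)delay[len];                     /* slot for the newest sample */
    delay[index] = input;
    int start = (index + 1 == len) ? 0 : index + 1;  /* oldest sample */
    int head = len - start;                          /* samples before the wrap */
    float sum = dot(coefs, &delay[start], head)      /* oldest part */
              + dot(&coefs[head], delay, start);     /* wrapped part */
    delay[len] = (float)start;                       /* next write position */
    return sum;
}
```

Because the additions happen in a different order than in the single loop, results on non-integer data may differ from processFir() in the last bits; a SIMD build of the original loop also reorders the additions, so this is usually acceptable.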