I have written some code for ARM in which all my data types are double. When I tried to enable the NEON engine to optimize the code, I could not get double-precision calculations out of it. Could you suggest a way to optimize the following code efficiently on a Cortex-A5? The code is already compiled with VFP instructions enabled in CCES.
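The NEON limitation described above is expected. On ARMv7-A cores such as the Cortex-A5, NEON (Advanced SIMD) only operates on integers and single-precision floats, so double-precision arithmetic always stays on the scalar VFP unit. If single precision (about 7 significant decimal digits) is accurate enough for this filter, one option is to convert the data to float and give the compiler a loop it can vectorize. This is a minimal sketch, not NEON intrinsics. It assumes a 64 x 32 row-major coefficient table (inferred from the loop bounds) and uses hypothetical names (`filter_bank_f32`, `NROWS`, `NCOLS`).

```c
#define NROWS 64   /* outer loop bound from the question (hypothetical name) */
#define NCOLS 32   /* inner loop bound from the question (hypothetical name) */

_Static_assert(NCOLS % 4 == 0, "NCOLS must be a multiple of 4");

/* Single-precision version of the 64 x 32 loop:
   out[i] = sum over k of band[k] * filter[i][k].
   The inner loop keeps four partial sums, one per 32-bit lane of a
   128-bit NEON register, so each step is a 4-wide multiply-accumulate
   that a vectorizing compiler can map onto NEON. */
void filter_bank_f32(const float *band,
                     float filter[NROWS][NCOLS],
                     float *out)
{
    for (int i = 0; i < NROWS; i++) {
        float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};  /* one partial sum per lane */
        for (int k = 0; k < NCOLS; k += 4) {
            for (int j = 0; j < 4; j++)
                acc[j] += band[k + j] * filter[i][k + j];
        }
        /* Combine the lanes. This order differs from a plain left-to-right
           sum, so the last bits of the result may differ. */
        out[i] = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    }
}
```

With GCC-based toolchains, floating-point auto-vectorization to NEON is normally only done when -funsafe-math-optimizations (or -ffast-math) is also given, because ARMv7 NEON flushes denormals to zero. Whether CCES passes such a flag is something to check in its documentation, and the disassembly is the reliable way to confirm NEON instructions were actually emitted.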
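register double sum;
double *bandPtr, *bufOffsetPtr;    /* input band, output buffer */
double filter[64][32];             /* 64 x 32 coefficient table */
int i, k;
for (i = 0; i < 64; i++) {
    sum = 0.0;
    for (k = 0; k < 32; k++)
        sum += bandPtr[k] * filter[i][k];
    bufOffsetPtr[i] = sum;
}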
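If double precision has to stay, NEON cannot help on this core, but the scalar VFP loop can often still be sped up. In the loop above every iteration adds into the same `sum`, so each addition must wait for the previous one to complete. A common technique is to split the sum into several independent accumulators and combine them at the end. This is a minimal sketch assuming the same 64 x 32 layout and a hypothetical function name `filter_bank_f64`. The actual gain on Cortex-A5 depends on its VFP pipeline and should be measured, and the changed summation order can alter the last bits of the result.

```c
#define NROWS 64   /* outer loop bound from the question (hypothetical name) */
#define NCOLS 32   /* inner loop bound from the question (hypothetical name) */

_Static_assert(NCOLS % 4 == 0, "NCOLS must be a multiple of 4");

/* Double-precision version of the loop with four independent accumulators.
   Each partial sum only depends on its own previous value, which lets a
   pipelined FPU overlap consecutive multiply-adds instead of serializing
   them on a single 'sum'. 'restrict' tells the compiler that band, filter
   and out do not overlap, so loads can be scheduled more freely. */
void filter_bank_f64(const double *restrict band,
                     double (*restrict filter)[NCOLS],
                     double *restrict out)
{
    for (int i = 0; i < NROWS; i++) {
        const double *row = filter[i];           /* current coefficient row */
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        for (int k = 0; k < NCOLS; k += 4) {
            s0 += band[k]     * row[k];
            s1 += band[k + 1] * row[k + 1];
            s2 += band[k + 2] * row[k + 2];
            s3 += band[k + 3] * row[k + 3];
        }
        out[i] = (s0 + s1) + (s2 + s3);          /* combine partial sums */
    }
}
```

Unrolling by four also removes three out of four loop-counter updates and branches, which matters on a simple in-order core like the Cortex-A5.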
How can I optimize this loop? Thanks in advance.
Thanks & Regards,