Question:
What causes the difference in precision between the software library (core) and the hardware library (hardware accelerator)?
Answer:
>> Floating-point computations are sensitive to changes in summation order, which can cause small differences in the rounding of intermediate values.
Depending on which FIR library routine you are using, the library may use SIMD (which splits the summation into two parallel partial sums that are recombined at the end), or it may use 40-bit floating-point arithmetic, whereas the hardware accelerator uses 32-bit floating-point. The hardware accelerator may also use a different summation order to maximize parallel computation.
Note that a different summation order may change the exact results of the filter, but in general different summation orders are qualitatively as good as each other, so there is no reason to suspect that either the accelerator results or the library results are better than the other.
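
The effect of summation order can be reproduced with a small sketch. The Python below is for illustration only and is not the library or accelerator implementation: the function names (`f32`, `fir_sequential`, `fir_two_partial_sums`) are hypothetical, 32-bit rounding is emulated with the standard `struct` module, and the input magnitudes are deliberately exaggerated so the difference is visible rather than hidden in the last few bits.

```python
import struct

def f32(x):
    """Round a Python float (64-bit) to the nearest 32-bit float."""
    return struct.unpack("f", struct.pack("f", x))[0]

def fir_sequential(coeffs, samples):
    """One output sample: products accumulated in order, 32-bit rounding each step."""
    acc = 0.0
    for c, s in zip(coeffs, samples):
        acc = f32(acc + f32(c * s))
    return acc

def fir_two_partial_sums(coeffs, samples):
    """Same products, but even and odd taps are accumulated separately and
    recombined at the end (an assumed model of a two-way SIMD split)."""
    even = 0.0
    odd = 0.0
    for i, (c, s) in enumerate(zip(coeffs, samples)):
        p = f32(c * s)
        if i % 2 == 0:
            even = f32(even + p)
        else:
            odd = f32(odd + p)
    return f32(even + odd)

coeffs = [1.0, 1.0, 1.0, 1.0]

# The exact sum of both data sets is 2.0.
data_a = [1e8, 1.0, -1e8, 1.0]
data_b = [1e8, -1e8, 1.0, 1.0]

print(fir_sequential(coeffs, data_a), fir_two_partial_sums(coeffs, data_a))  # 1.0 2.0
print(fir_sequential(coeffs, data_b), fir_two_partial_sums(coeffs, data_b))  # 2.0 0.0
```

With the first data set the split summation happens to land on the exact answer, while with the second the sequential one does. Which order is closer depends on the data, which is why neither can be assumed to be the more accurate one.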