Not reaching 8 MACs per cycle in SHARC-FX core processing

Category: Hardware
Product Number: ADSP-SC835

Hello,

We are working on FIR filtering on the SHARC-FX core. After benchmarking the different FIR functions available in the DSP library, we found that none of them seems to reach the theoretical maximum of 8 MACs per cycle.

We prepared a simplified project (attached) to verify under which conditions the processor can reach this efficiency:

  • Hardware: ADSP-SC835W-EV-SOM + breakout board, running at 1 GHz, CCES 3.0.2. Performance is measured with GPIO toggles (their delay already accounted for) and CYC counters.
  • The project is built in the Release configuration, with all optimizations enabled.
  • firMacTest() executes 72 iterations of PDX_MULA_MXF32 (not a FIR filter, just MACs), with all signals located in L1 memory and aligned.
  • We expect 72 iterations of PDX_MULA_MXF32 to take approximately 72 cycles, achieving 8 MACs of 32-bit floats per cycle. In reality, the 72 iterations take 90 to 95 cycles on average, which translates to approximately 6 MACs of 32-bit floats per cycle.
  • In our complete scheme we are using a modified version of adi_s1fir_fastf, and overall we are getting around 5 MACs per cycle, which is understandable. That is why we ended up testing the simplest possible case specifically: looped MACs with PDX_MULA_MXF32.
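
For reference, the arithmetic behind these figures can be sketched in plain C. This is only an illustration: the helper name macs_per_cycle is hypothetical, the lane count of 8 is an assumption taken from the 8 MACs per cycle peak, and the code uses no PDX intrinsics.

```c
#include <stdint.h>

/* Hypothetical helper (not part of any ADI or Cadence library):
 * sustained MACs per cycle for a loop of vector MAC instructions.
 *   iterations: number of vector MAC instructions executed
 *   lanes:      float32 MACs per vector instruction (assumed to be 8)
 *   cycles:     measured cycle count, with measurement overhead removed */
static double macs_per_cycle(uint32_t iterations, uint32_t lanes, uint32_t cycles)
{
    if (cycles == 0u) {
        return 0.0; /* guard against division by zero on a bad measurement */
    }
    return ((double)iterations * (double)lanes) / (double)cycles;
}
```

With 72 iterations and 8 lanes, 72 cycles would give 8.0 MACs per cycle, while the measured 90 to 95 cycles give 6.4 down to roughly 6.06.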

Could you give us any hint as to why this is happening, or whether it is in fact the expected behaviour? Is there a limitation that we are not considering here?

Thanks,

Leopoldo

1157.8MAC_Core1.rar
