Why is the assembly dot product slower than hand-written C code?

Question asked by pereira on Nov 17, 2017
Latest reply on Dec 7, 2017 by Jithul_Janardhanan

Hi.

I found this assembly code for the dot product in the documentation:

---

/* dot(int n, dm float *x, pm float *y);
Computes the dot product of two floating-point vectors of length n. One is stored in dm and the other in pm. Length n must be greater than 2.*/

#include <asm_sprt.h>


.section/pm seg_pmco;

.GLOBAL _dotASM;
_dotASM:

leaf_entry;

r0=r4-1,i4=r8; /* Load first vector address into I register, and load r0 with length-1 */

r0=r0-1,i12=r12; /* Load second vector address into I register and load r0 with length-2 (because the 2 iterations outside the loop feed and drain the pipe) */

f12=f12-f12,f2=dm(i4,m6),f4=pm(i12,m14); /* Zero the register that will hold the result and start feeding the pipe */

f8=f2*f4, f2=dm(i4,m6),f4=pm(i12,m14); /* Second data set into the pipeline, also do the first multiply */

lcntr=r0, do dot_loop until lce; /* Loop length-2 times, three-stage pipeline: read, mult, add */

/* The loop-end label marks the last instruction of the loop, so the loop body is the single instruction below */
dot_loop:
   f8=f2*f4, f12=f8+f12,f2=dm(i4,m6),f4=pm(i12,m14);

f8=f2*f4, f12=f8+f12;
f0=f8+f12;
/* drain the pipe and end with the result in r0, where it'll be returned */

leaf_exit; /* restore the old frame pointer and return */

_dotASM.end:

---
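The assembly routine is software-pipelined: the instructions before the loop fill the read and multiply stages, the single-instruction loop body runs length-2 times, and the two instructions after the loop drain the multiply and add stages. As a sketch of that data flow, here is a portable C model that follows the same schedule step by step. The function name `dot_pipelined_model` is hypothetical, the dm/pm qualifiers are dropped so it builds with any C99 compiler, and it models only the order of operations, not timing on the DSP.

```c
#include <assert.h>

/* Hypothetical host-side model of the _dotASM schedule (plain C99, no dm/pm
   qualifiers). Variables are named after the SHARC registers; each commented
   step corresponds to one instruction of the assembly listing. */
float dot_pipelined_model(int n, const float *x, const float *y)
{
    const float *i4 = x;   /* first vector pointer, stepped like dm(i4,m6)    */
    const float *i12 = y;  /* second vector pointer, stepped like pm(i12,m14) */
    float f2, f4, f8, f12;
    int r0;

    assert(n > 2);         /* same precondition as the assembly routine */
    r0 = n - 2;            /* r0=r4-1; r0=r0-1;  loop count is length-2 */

    /* f12=f12-f12, f2=dm(i4,m6), f4=pm(i12,m14);  zero the sum, read pair 0 */
    f12 = 0.0f;
    f2 = *i4++;
    f4 = *i12++;

    /* f8=f2*f4, f2=dm(i4,m6), f4=pm(i12,m14);  multiply pair 0, read pair 1 */
    f8 = f2 * f4;
    f2 = *i4++;
    f4 = *i12++;

    /* dot_loop, executed r0 times. The operations in one SHARC multifunction
       instruction read their inputs before any result is written, so the
       model computes the new values first and assigns them afterwards. */
    for (int k = 0; k < r0; k++) {
        float product = f2 * f4;   /* multiply the pair read previously */
        float sum = f8 + f12;      /* add the previous product          */
        f2 = *i4++;                /* read the next pair                */
        f4 = *i12++;
        f8 = product;
        f12 = sum;
    }

    /* f8=f2*f4, f12=f8+f12;  drain: multiply the last pair, add the previous product */
    {
        float product = f2 * f4;
        f12 = f8 + f12;
        f8 = product;
    }

    /* f0=f8+f12;  add the last product; the result is returned in f0 */
    return f8 + f12;
}
```

For example, with x = {1, 2, 3, 4, 5} and y = {6, 7, 8, 9, 10} the model returns 130, the same as a straightforward loop. The loop body runs n-2 times and handles one element per iteration, which is why the routine requires n > 2.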

I wanted to compare this code with a hand-written C implementation of the dot product:

---

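/* Same interface as _dotASM: x is in data memory (dm), y in program memory (pm). */
float dotC(int n, dm float *x, pm float *y) {
    int i;
    float z = 0.0f;            /* running sum */

    for (i = 0; i < n; i++) {
        z += x[i] * y[i];      /* accumulate one product per element */
    }

    return z;
}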

---

The results are the same, but dotC is twice as fast as dotASM. I don't understand why: the ASM function seems to have far fewer instructions than the C one. Can you explain why the C version is faster?
You can have a look at my attached project.
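Counting the instructions the assembly routine executes, rather than the lines in its listing, gives a different picture. From the listing: five instructions before the loop (two setup, two pipeline fill, one loop setup), a loop body of one instruction executed n-2 times, and two drain instructions. The expansions of the `leaf_entry` and `leaf_exit` macros are not shown, so their combined instruction count is written here as an unknown constant c:

$$N_{\text{ASM}}(n) = 5 + (n - 2) + 2 + c = n + 5 + c$$

So even though the listing is short, dotASM still executes roughly one instruction per vector element for large n.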
