
Why is the assembly dot product slower than hand-written C code?

Question asked by pereira on Nov 17, 2017
Latest reply on Dec 18, 2017 by pereira


I found this code in the documentation. It computes the dot product in assembly:


/* dot(int n, dm float *x, pm float *y);
Computes the dot product of two floating-point vectors of length n. One is stored in dm and the other in pm. Length n must be greater than 2.*/

#include <asm_sprt.h>

.section/pm seg_pmco;
.global _dot;
_dot:
leaf_entry; /* set up the frame for a leaf function */
r0=r4-1,i4=r8; /* Load first vector address into I register, and load r0 with length-1 */
r0=r0-1,i12=r12; /* Load second vector address into I register and load r0 with length-2 (because the 2 iterations outside feed and drain the pipe) */
f12=f12-f12,f2=dm(i4,m6),f4=pm(i12,m14); /* Zero the register that will hold the result and start feeding pipe */
f8=f2*f4, f2=dm(i4,m6),f4=pm(i12,m14); /* Second data set into pipeline, also do first multiply */
lcntr=r0, do dot_loop until lce; /* Loop length-2 times, three-stage pipeline: read, mult, add */
dot_loop: f8=f2*f4, f12=f8+f12,f2=dm(i4,m6),f4=pm(i12,m14);
   f8=f2*f4, f12=f8+f12;
   f0=f8+f12;
/* drain the pipe and end with the result in r0, where it'll be returned */
leaf_exit; /* restore the old frame pointer and return */
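To make the schedule of the assembly routine easier to follow, here is a portable C model of it. Each loop iteration multiplies the pair loaded in the previous iteration, adds the product computed before that, and loads the next pair. That is why two iterations sit outside the loop and why n must be greater than 2. This is only a sketch for reading the assembly, not code for the DSP: the name dot_pipelined is hypothetical, and plain float pointers replace the SHARC-specific dm/pm qualifiers.

```c
#include <assert.h>

/* Hypothetical C model of the pipelined SHARC loop. Register names in the
 * comments refer to the assembly routine. Requires n > 2. */
float dot_pipelined(int n, const float *x, const float *y)
{
    assert(n > 2);
    float acc = 0.0f;            /* f12 = f12 - f12 */
    float a = x[0], b = y[0];    /* first read: f2 = dm(...), f4 = pm(...) */
    float prod = a * b;          /* f8 = f2 * f4, in parallel with ... */
    a = x[1]; b = y[1];          /* ... the second read */
    int i = 2;

    /* Loop body runs n - 2 times. In the single parallel instruction, the
     * add uses the OLD f8, so the new product is kept separately here. */
    for (int k = 0; k < n - 2; k++) {
        float next_prod = a * b; /* f8 = f2 * f4 */
        acc += prod;             /* f12 = f8 + f12 (previous product) */
        a = x[i]; b = y[i]; i++; /* f2 = dm(i4,m6), f4 = pm(i12,m14) */
        prod = next_prod;
    }

    float last = a * b;          /* drain: f8 = f2 * f4 ... */
    acc += prod;                 /* ... f12 = f8 + f12 */
    return acc + last;           /* f0 = f8 + f12 */
}
```

For x = {1, 2, 3, 4} and y = {5, 6, 7, 8} it adds 5 + 12 + 21 + 32 in the same order as a plain loop, giving 70.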




I wanted to compare it with a hand-written C implementation of the dot product:


float dotC(int n, dm float *x, pm float *y) {
    int i;
    float z = 0.0f;            /* running sum */
    for (i = 0; i < n; i++) {
        z += x[i] * y[i];      /* multiply-accumulate one pair */
    }
    return z;
}
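One thing worth checking is what the compiler actually generated for dotC. A plain loop like this can be unrolled, or run in SHARC SIMD mode, where a second processing element (PEy) executes each instruction on a second data stream, so two products are formed per instruction instead of one. The sketch below is a hypothetical C picture of that kind of transformation, using two independent partial sums for even and odd elements. It is not the compiler's real output, and dot_two_acc is an invented name. Because the additions are regrouped, the result can differ from dotC in the last bits for general float inputs.

```c
#include <assert.h>

/* Hypothetical illustration: the dotC loop split into two independent
 * partial sums, similar to how two lanes could each handle every other
 * element. Works for any n >= 0. */
float dot_two_acc(int n, const float *x, const float *y)
{
    assert(n >= 0);
    float even = 0.0f, odd = 0.0f;
    int i = 0;
    /* Two multiply-adds per iteration with no dependency between them. */
    for (; i + 1 < n; i += 2) {
        even += x[i] * y[i];
        odd  += x[i + 1] * y[i + 1];
    }
    /* Odd length: one element is left over. */
    if (i < n)
        even += x[i] * y[i];
    return even + odd;   /* combine the two partial sums */
}
```

Comparing the compiler's assembly listing for dotC with the documentation routine is the way to confirm or rule this out.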


The results are the same, but dotC runs about twice as fast as dotASM. I don't understand why: the assembly function seems to have far fewer instructions than the C one. Can you explain why the C version is faster?
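Instruction count in the source is not the same as cycle count, and a timing comparison can also be skewed by how it is measured. For example, if the first call to a routine pays for cache misses, whichever function runs first looks slower. Below is a minimal portable harness, assuming standard clock() from <time.h> as a stand-in for the DSP toolchain's own cycle counting. The names ticks_per_call and dot_ref are hypothetical, and plain pointers replace dm/pm.

```c
#include <assert.h>
#include <time.h>

/* Common signature for the routines being compared (plain pointers stand in
 * for the SHARC dm/pm qualifiers). */
typedef float (*dot_fn)(int n, const float *x, const float *y);

/* Hypothetical reference routine: the same loop as dotC. */
float dot_ref(int n, const float *x, const float *y)
{
    float z = 0.0f;
    for (int i = 0; i < n; i++)
        z += x[i] * y[i];
    return z;
}

/* Hypothetical helper: average clock() ticks per call of fn over reps calls,
 * after one untimed warm-up call so that first-call effects (such as a cold
 * cache) are not charged to whichever routine is measured first. */
double ticks_per_call(dot_fn fn, int n, const float *x, const float *y,
                      int reps, volatile float *sink)
{
    assert(reps > 0);
    *sink = fn(n, x, y);               /* warm-up call, not timed */
    clock_t start = clock();
    for (int r = 0; r < reps; r++)
        *sink = fn(n, x, y);           /* volatile store: every call is kept */
    clock_t stop = clock();
    return (double)(stop - start) / reps;
}
```

Running it with the same inputs for both routines, in both orders, shows whether the factor of two survives once the warm-up call is excluded.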
You can have a look at my attached project.