# 2010-12-23 01:53:55     How to transform 32bit*32bit multiply C code to ASM code

Document created by Aaronwu on Oct 16, 2013
2010-12-23 01:53:55     How to transform 32bit*32bit multiply C code to ASM code

Bill Xie (CHINA)

Message: 96962

Hi,

I want to convert the following C code to assembly to improve efficiency.

C code:

    static inline long mul(long x, long y)
    {
        short ah, bh;
        unsigned short al, bl;
        int result;
        ah = (x >> 16);
        al = (x & 0xffff);
        bh = y >> 16;
        bl = (y & 0xffff);
        result = (long)(ah*bh);
        result += (long)(ah*bl) >> 16;
        result += (long)(al*bh) >> 16;
        return result;
    }

    static inline long muls(long x, long y)
    {
        short ah, bh;
        unsigned short al, bl;
        int result;
        ah = (x >> 16);
        al = (x & 0xffff);
        bh = y >> 16;
        bl = (y & 0xffff);
        result = (long)(ah*bh) << 1;
        result += (long)(ah*bl) >> 15;
        result += (long)(al*bh) >> 15;
        return result;
    }
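
Read as arithmetic, mul() above approximates the high 32 bits of the 64-bit product: it drops the al*bl term and truncates each cross term separately. A minimal host-side sketch for checking any asm candidate against a 64-bit reference follows. The names mul_split, mul_ref and mul_error are hypothetical, fixed-width types stand in for the 32-bit long the original code assumes, and right shifts of negative values are assumed to be arithmetic (implementation-defined in C, but usual in practice).

```c
#include <stdint.h>

/* The poster's mul(), restated with fixed-width types so that it behaves
   the same on a 64-bit host as on a target where long is 32 bits. */
static int32_t mul_split(int32_t x, int32_t y)
{
    int16_t  ah = (int16_t)(x >> 16);       /* signed high halves  */
    int16_t  bh = (int16_t)(y >> 16);
    uint16_t al = (uint16_t)(x & 0xffff);   /* unsigned low halves */
    uint16_t bl = (uint16_t)(y & 0xffff);

    int32_t result = ah * bh;
    result += (ah * bl) >> 16;              /* assumes arithmetic shift */
    result += (al * bh) >> 16;
    return result;
}

/* Reference: high 32 bits of the exact 64-bit product. */
static int32_t mul_ref(int32_t x, int32_t y)
{
    return (int32_t)(((int64_t)x * (int64_t)y) >> 32);
}

/* Difference between the reference and the split version. Because of the
   dropped al*bl term and the two truncations it is 0, 1 or 2. */
static int mul_error(int32_t x, int32_t y)
{
    return (int)(mul_ref(x, y) - mul_split(x, y));
}
```

Running the same inputs through the asm version and through mul_split() shows whether the asm reproduces the C result exactly, which is a stricter check than comparing against mul_ref().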

C code with asm code:

    static inline long mul(long a, long b)
    {
        long result;
        __asm__ (
            "A1 = %1.H * %2.L;\n\t"
            "A1 = A1 >> 16;\n\t"
            "A1 += %1.H * %2.H, A0 = %1.L*%2.H;\n\t"
            "A0 = A0 >>16; \n\t"
            "%0 = (A0 += A1);\n\t"
            : "=d" (result)
            : "d" (a), "d" (b)
            : "A0", "A1"
        );
        return result;
    }

    static inline long muls(long a, long b)
    {
        long result;
        __asm__ (
            "A1 = %1.H * %2.H;\n\t"
            "A1 = A1 << 1;\n\t"
            "A0 = %1.L * %2.H; \n\t"
            "A0 += %1.H * %2.L;\n\t"
            "A0 = A0 >>15;\n\t"
            "%0 = (A0 += A1);\n\t"
            : "=d" (result)
            : "d" (a), "d" (b)
            : "A0", "A1"
        );
        return result;
    }

I have tried many times to convert the C code to assembly, but the asm version returns the wrong multiply result. Please help me solve the problem. Thank you.

2010-12-23 11:41:25     Re: How to transform 32bit*32bit multiply C code to ASM code

Mike Frysinger (UNITED STATES)

Message: 96972

you can see what gcc generates by compiling the .c file with -S.  then you may tweak the resulting assembly yourself.
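
A minimal sketch of that workflow, assuming a hypothetical file mul32.c that holds a plain C multiply. The file and function names are illustrative only, and the command in the comment uses the generic gcc driver; substitute the cross compiler for your target.

```c
/* mul32.c (hypothetical file name)
 *
 * Emit assembly instead of an object file, for example:
 *     gcc -O2 -S mul32.c -o mul32.s
 * For a cross build, use your target's compiler driver in place of gcc.
 */
#include <stdint.h>

/* Low 32 bits of the product; unsigned so that wraparound is well defined. */
uint32_t mul_lo(uint32_t x, uint32_t y)
{
    return x * y;
}
```

The generated mul32.s then shows which multiply instructions the compiler selects for the target, and can serve as a starting point for hand tuning.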

the accumulator is only 40 bits wide, so you won't be able to represent a 64-bit result in it.
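
To make the width limit concrete, here is a small host-side sketch. The helper names fits_signed and full_product are hypothetical, and the code models the accumulator only as a 40-bit signed range; it does not model saturation or any other hardware behaviour.

```c
#include <stdint.h>

/* Nonzero if v is representable as a signed two's-complement value of the
   given width. Valid for widths from 2 to 63 bits. */
static int fits_signed(int64_t v, int bits)
{
    int64_t lo = -((int64_t)1 << (bits - 1));
    int64_t hi = ((int64_t)1 << (bits - 1)) - 1;
    return v >= lo && v <= hi;
}

/* Exact product of two 32-bit values, computed in 64 bits on the host. */
static int64_t full_product(int32_t x, int32_t y)
{
    return (int64_t)x * (int64_t)y;
}
```

With these helpers, fits_signed(full_product(INT32_MAX, INT32_MAX), 40) is false, while a single signed-by-unsigned 16-bit partial product such as full_product(-32768, 65535) fits easily.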

2011-01-14 04:58:17     Re: How to transform 32bit*32bit multiply C code to ASM code

Steve Kilbane (UNITED KINGDOM)

Message: 97455

I see no reason for doing this, since the processor includes a 32-bit integer-by-integer multiply that takes three cycles, which is less than you have in your asms.

It's possible to improve performance by breaking the multiply up into 16-bit multiplies as you've done, but that only helps if you can schedule other instructions at the same time, or if you can omit some of the steps because you know one of the values has a zero high or low half. I don't see either of those happening in this case, with asms (unless GCC is particularly clever with the contents of asm statements).
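
As an illustration of omitting a step, here is a sketch in portable C rather than asm. It computes the low 32 bits of an unsigned product, which is a simpler case than the signed high-half routines in the question, and the function names are hypothetical. When the caller knows that y has a zero high half, one of the three partial products disappears.

```c
#include <stdint.h>

/* General case: low 32 bits of x*y from three 16x16 partial products.
   The xh*yh term only affects bits 32 and above, so it is not needed. */
static uint32_t mul_lo_split(uint32_t x, uint32_t y)
{
    uint32_t xl = x & 0xffffu, xh = x >> 16;
    uint32_t yl = y & 0xffffu, yh = y >> 16;
    return xl * yl + ((xh * yl + xl * yh) << 16);
}

/* Shortcut when y is known to fit in 16 bits (zero high half):
   the xl*yh term vanishes, leaving two multiplies. */
static uint32_t mul_lo_split_y16(uint32_t x, uint16_t y)
{
    uint32_t xl = x & 0xffffu, xh = x >> 16;
    return xl * y + ((xh * y) << 16);
}
```

Whether the shortcut pays off on real hardware depends on instruction scheduling, as the post above notes; the sketch only shows which term can be dropped.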

steve