Question asked by johnatastrium on Mar 18, 2010

Hi all,

I'm trying to work out how the ADSP-21469 performs parallel operations in terms of its execution units. First off, am I correct that it can ONLY execute two multiplications (or additions) per clock when it is in SIMD mode, i.e. it cannot do two arbitrary multiplications per clock? And that when working from C, the way to get this is typically to flag the loop with #pragma SIMD_for, and without that it just doesn't happen?

Now, I can't use function calls within SIMD_for. But does the compiler remove that restriction if I flag the function as "inline"? In other words, at what point is it inlined?

As I understand it, SIMD_for performs the same operation on an address and on address + 1, and only that? And the compiler cannot see through structs, even when the writer knows that this would be OK. This is relevant to me because I have lots of code operating on complex variables, declared as

typedef struct { float re; float im; } complex_float;

Does the address / address + 1 constraint rule out, for example, complex-by-complex multiplications in SIMD_for? If I wanted to do that, would I need to interleave my data even-odd in address space beforehand so that two can be done in parallel?
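To make the layout question concrete, here is a minimal sketch in plain C of one possible reading of "interleave beforehand": keep the real and imaginary parts in separate arrays, so that address and address + 1 always hold the same kind of value. The function name cmul_split and its parameters are hypothetical, and the sketch makes no claim that the SHARC compiler will accept or vectorize this loop; the pragma is only mentioned in a comment.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not from any library): multiply n complex numbers
   stored as separate real and imaginary arrays instead of an array of
   structs. Outputs are assumed not to overlap the inputs. */
void cmul_split(const float *a_re, const float *a_im,
                const float *b_re, const float *b_im,
                float *out_re, float *out_im, size_t n)
{
    size_t i;

    /* Assumption: an even element count, so elements pair up two at a time. */
    assert(n % 2 == 0);

    /* On the SHARC compiler, this is where #pragma SIMD_for would be tried. */
    for (i = 0; i < n; i++) {
        /* Every access is to index i of some array, so iterations i and
           i + 1 read and write consecutive addresses. */
        out_re[i] = a_re[i] * b_re[i] - a_im[i] * b_im[i];
        out_im[i] = a_re[i] * b_im[i] + a_im[i] * b_re[i];
    }
}
```

The cost of this layout is a conversion pass to and from the struct form wherever the rest of the code still expects complex_float.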

And the fact that it can't handle structs means that I have to re-code just to add two complex numbers, even though in principle I am doing out[0] = in1[0] + in2[0], out[1] = in1[1] + in2[1], etc.
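The re-coding for the addition case might look like the sketch below, which treats each array of n complex values as 2*n plain floats. The name cadd_flat is hypothetical. It assumes the struct is laid out as two adjacent floats with no padding, which is checked at run time but which the C standard does not strictly promise for this kind of pointer walk. Whether the compiler then accepts #pragma SIMD_for on the loop is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { float re; float im; } complex_float;

/* Hypothetical helper: add n complex numbers by walking the arrays as
   2 * n consecutive floats (re0, im0, re1, im1, ...). */
void cadd_flat(const complex_float *in1, const complex_float *in2,
               complex_float *out, size_t n)
{
    const float *a = (const float *)in1;
    const float *b = (const float *)in2;
    float *o = (float *)out;
    size_t k;

    /* Assumption: no padding inside or after the struct. */
    assert(sizeof(complex_float) == 2 * sizeof(float));

    /* On the SHARC compiler, this is where #pragma SIMD_for would be tried. */
    for (k = 0; k < 2 * n; k++) {
        o[k] = a[k] + b[k];   /* out[0] = in1[0] + in2[0], out[1] = ... */
    }
}
```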

As another example, I can't simply write

#pragma SIMD_for
for (i = 0; i < 300; i += 3)
{
    x[i]   = y[i] + y[i+1] + y[i+2];
    x[i+1] = 3*y[i] + y[i+2] + 2*y[i+3];
    x[i+2] = y[i] - y[i+1];
}

Instead, do I need to find a way to re-cast these expressions into intermediate expressions that are address-consecutive? Probably by first interleaving the input data 0,3,1,4,2,5,6,9,7,10..., and then de-interleaving the output again.
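One way to picture the re-casting, as a sketch only: instead of the pairwise interleave described above, this variant de-interleaves y into three "planes" (every third element), does the arithmetic plane by plane, and scatters the results back. That is a different reordering from the 0,3,1,4,... pattern, chosen here because it is simpler to write down. The names recast_planar and NBLK are hypothetical. Note that y must hold 3*NBLK + 1 values because the loop reads y[i+3], and that the p0[k + 1] read is offset by one from the other accesses, which may or may not be acceptable under SIMD_for.

```c
#include <assert.h>
#include <stddef.h>

#define NBLK 100   /* 300 outputs, 3 per block, as in the loop above */

/* Hypothetical sketch: same results as the 3-per-iteration loop, but the
   three arithmetic loops only touch consecutive addresses. */
void recast_planar(const float *y, float *x)
{
    float p0[NBLK + 1], p1[NBLK], p2[NBLK];   /* y[3k], y[3k+1], y[3k+2] */
    float q0[NBLK], q1[NBLK], q2[NBLK];       /* x[3k], x[3k+1], x[3k+2] */
    size_t k;

    assert(y != NULL && x != NULL);

    /* De-interleave the input into planes. */
    for (k = 0; k < NBLK; k++) {
        p0[k] = y[3 * k];
        p1[k] = y[3 * k + 1];
        p2[k] = y[3 * k + 2];
    }
    p0[NBLK] = y[3 * NBLK];                   /* needed for the y[i+3] term */

    /* Candidate loops for #pragma SIMD_for on the SHARC compiler. */
    for (k = 0; k < NBLK; k++) q0[k] = p0[k] + p1[k] + p2[k];
    for (k = 0; k < NBLK; k++) q1[k] = 3 * p0[k] + p2[k] + 2 * p0[k + 1];
    for (k = 0; k < NBLK; k++) q2[k] = p0[k] - p1[k];

    /* Re-interleave the output. */
    for (k = 0; k < NBLK; k++) {
        x[3 * k]     = q0[k];
        x[3 * k + 1] = q1[k];
        x[3 * k + 2] = q2[k];
    }
}
```

The gather and scatter passes are themselves strided, so any gain depends on how much arithmetic sits between them.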

Thanks if someone can confirm my understanding before I put lots of effort into re-coding stuff.