I need to translate 5M 10-bit values (stored in 16-bit words) through a lookup table as quickly as possible. That, of course, calls for an assembly language implementation of the C code below. This is my first project using Blackfin assembler, so some assistance will be appreciated. Is this the fastest way for me to do this? Later, I'll ask about how to alternate chunks through L2 cache while this munches on the other chunk. Thanks!
extern unsigned short LookupTable[];
void Translate( unsigned short *pIObits, int iLength )
{
    int i;
    for( i = 0; i < iLength; i++ )
        pIObits[i] = LookupTable[ pIObits[i] ];
}
Below is my assembly version, but the assembler warns
"Preg read after write which requires 4 extra cycles" at the assignment into R0.H.
P2 = R1; // Length
I0 = R0; // pIObits
P3.L = _LookupTable;
P3.H = _LookupTable;
R2.H = 0; // Clear high word
M0 = 4; // Advance 4 bytes (two 16-bit words) per iteration
// Translating two words per iteration
LSETUP( LoopTop, LoopBottom ) LC0 = P2 >> 1;
LoopTop:
R0 = [I0]; // Load two 10-bit values in separate words
R0 <<= 1; // Scale both values to byte offsets into a word table
R1 = PACK( R2.H, R0.L ); // Low half: sample N (lower address)
R0 = PACK( R2.H, R0.H ); // High half: sample N+1
P1 = R1;
P0 = R0;
R0.H = W[P0 ++ P3]; // Preg read after write which requires 4 extra cycles
R0.L = W[P1 ++ P3];
LoopBottom:
[I0++M0] = R0; // Store translated words
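For reference, here is a C sketch of what the loop above is meant to do: read two 16-bit samples as one 32-bit word, look up each half, repack, and store. It is only a model of the intent, not a tested Blackfin routine. The name TranslatePairs and the 1024-entry table definition are placeholders I made up for illustration, and it assumes iLength is even.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the real table: one entry per 10-bit value. */
unsigned short LookupTable[1024];

/* Hypothetical name. Assumes iLength is even and every input is < 1024. */
void TranslatePairs( unsigned short *pIObits, int iLength )
{
    int i;
    for( i = 0; i < iLength; i += 2 )
    {
        uint32_t pair, lo, hi;

        /* Like R0 = [I0]: fetch two 16-bit samples in one 32-bit load. */
        memcpy( &pair, &pIObits[i], sizeof pair );

        lo = pair & 0xFFFFu;   /* like R0.L */
        hi = pair >> 16;       /* like R0.H */

        /* Each half is replaced by its own table entry, so the result */
        /* lands back in the same half it came from.                   */
        pair = ( (uint32_t)LookupTable[hi] << 16 ) | LookupTable[lo];

        /* Like [I0++M0] = R0: store both translated samples at once. */
        memcpy( &pIObits[i], &pair, sizeof pair );
    }
}
```

Because each half goes back where it came from, the output should match the plain C loop no matter which half holds the earlier sample.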
Is this the best, fastest way to do this? What can I do to get rid of the warning?
Advice and suggestions will be appreciated!
Bruising my nose on the learning curve...