I am working with the SHARC ADSP-SC584.

I am running both the core FIR library function and the FIR accelerator, and I want to compare their execution times.

When I run **fir**(uiInputBuff1[ii], uiCoeffBuff, delayArray, 100); I get a cycle count of 191.

When I use the FIR accelerator instead, I get roughly 100 times that count, which suggests I am not configuring it correctly.

How should the TCB be configured for a single channel in **sample-based** processing mode on the FIR accelerator?

For sample-based processing I use a window size of 1.

This is my current configuration, based on the example I downloaded:

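```c
FIRA_TCB1[0]  = 0;  // CP: chain pointer (example used ((int)(FIRA_TCB3+12)>>2)|0xA000000)
FIRA_TCB1[1]  = TAP_LENGTH1;                                     // CL: coefficient length
FIRA_TCB1[2]  = 1;                                               // CM: coefficient modifier
FIRA_TCB1[3]  = ((int)CoeffBuff1 >> 2) | 0xA000000;              // CI: coefficient index
FIRA_TCB1[4]  = ((int)OutputBuff1 >> 2) | 0xA000000;             // OB: output base
FIRA_TCB1[5]  = WINDOW_SIZE1;                                    // OL: output length
FIRA_TCB1[6]  = 1;                                               // OM: output modifier
FIRA_TCB1[7]  = ((int)OutputBuff1 >> 2) | 0xA000000;             // OI: output index
FIRA_TCB1[8]  = ((int)InputBuff1 >> 2) | 0xA000000;              // IB: input base
FIRA_TCB1[9]  = TAP_LENGTH1 + WINDOW_SIZE1 - 1;                  // IL: input length
FIRA_TCB1[10] = 1;                                               // IM: input modifier
FIRA_TCB1[11] = ((int)InputBuff1 >> 2) | 0xA000000;              // II: input index
FIRA_TCB1[12] = (TAP_LENGTH1 - 1) | ((WINDOW_SIZE1 - 1) << 14);  // FIRCTL2: taps-1, window-1 at bit 14
```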

Hi,

Please note that sample-based processing on the FIR accelerator is expected to be slow, for the following reason:

"The FIR accelerator preloads the delay line with TAPS-1 samples and coefficients for each processing iteration using DMA. Thus, for sample based processing, even though the compute engine can work faster, the DMA overheads dominate the total number of CCLK cycles required for processing one iteration per channel."

So what you are seeing is expected, given the FIR accelerator architecture. I am attaching both the theoretical and measured cycle counts for the FIR accelerator for various TAP and WINDOW values. As you can see, with block-based processing the DMA overheads tend to become negligible and overall performance improves significantly (to around 1 CCLK cycle per tap, per sample).

Hope this helps.

Thanks,

Mitesh