2010-10-13 08:10:06     Can't get SRC algorithm to fly

Document created by Aaronwu Employee on Aug 23, 2013


Daniel Persson (SWEDEN)

Message: 94465   

 

Hello

 

I am using uClinux on a custom BF526 board with the core clock at 226 MHz. I need a sample rate converter (SRC), so I started from the VDSP++ SRC implementation by Jeff Sondermeyer described in EE183. On the osdir.com forum I found a port of that code that compiles with the gcc toolchain (osdir.com/ml/linux.hardware.blackfin.kernel.devel/2008-05/msg00016.html). The SRC works, but it is far too expensive: converting 16-bit audio from 44100 Hz to 48000 Hz takes 73 to 102 MIPS, so I suspect something is wrong.
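To put that figure in context, a cycle count can be turned into a MIPS number and a core-load fraction roughly as sketched below. The 226 MHz clock and the 73 to 102 MIPS range come from this post; the function names are made up for illustration, and the sketch assumes that "MIPS" here means millions of core cycles consumed per second of audio.

```c
#include <stdint.h>

/* Hypothetical helper: millions of core cycles spent per second of audio.
   cycles        - core cycles measured for the whole conversion
   audio_seconds - duration of the audio that was converted */
static double cycles_to_mips(uint64_t cycles, double audio_seconds)
{
    return (double)cycles / audio_seconds / 1e6;
}

/* Hypothetical helper: fraction of the core consumed at a given clock. */
static double core_load(double mips, double core_clock_mhz)
{
    return mips / core_clock_mhz;
}
```

Under that assumption, core_load(73.0, 226.0) is about 0.32 and core_load(102.0, 226.0) is about 0.45, so the converter would occupy roughly a third to just under half of the core.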

 

The SRC was originally developed for the BF53x. Is it a problem that I am using a BF526? Are there any considerations that have to be made because of that?

 

I have enabled the caches in the uClinux setup, and the conversion buffers are located in L1 RAM.

Blackfin Processor Options  --->

[*] Enable ICACHE

[*] Enable DCACHE
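One thing that may be worth checking: in the SRC.c listing further down, the intermediate stage buffers come from plain malloc (the SRAM flags are commented out), so as far as the listing shows they would not be in L1. A minimal sketch of an allocation wrapper follows. It assumes the sram_alloc and sram_free calls and the L1_DATA_A_SRAM flag declared in the bfin_sram.h header that SRC.c already includes; alloc_fast and free_fast are hypothetical names, and on a non-Blackfin host the sketch simply falls back to malloc so that it can be compiled anywhere.

```c
#include <stddef.h>
#include <stdlib.h>

#ifdef __bfin__
#include <bfin_sram.h>   /* assumed to declare sram_alloc() and sram_free() */
#endif

/* Hypothetical wrapper: request memory from L1 data bank A on Blackfin.
   Returns NULL when the request cannot be satisfied. */
static void *alloc_fast(size_t bytes)
{
#ifdef __bfin__
    return sram_alloc(bytes, L1_DATA_A_SRAM);
#else
    return malloc(bytes);            /* host fallback, for testing only */
#endif
}

/* Hypothetical wrapper: release memory obtained from alloc_fast(). */
static void free_fast(void *p)
{
#ifdef __bfin__
    sram_free(p);
#else
    free(p);
#endif
}
```

With the constants in src_441to48.h the four stage buffers add up to 2348 words (9392 bytes), so whether they fit in one L1 data bank depends on what else is placed there.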

 

Is there anything special that needs to be taken into consideration to get an algorithm to perform well under uClinux on Blackfin?

 

Are there any alternate open source src implementations for uClinux on Blackfin?

 

The attached archive contains the source files that are also listed below, plus some other files that just contain filter constants and input data constants.

 

I am using the following versions (checked out from blackfin.uclinux.org svn server replica):

TOOLCHAIN_VERSION=tags/toolchain_09r1.1_rc2

LINUX_KERNEL_VERSION=tags/2009R1.1-RC4

UCLINUX_DIST_VERSION=tags/2009R1.1-RC4

 

 

Kind Regards, Daniel Persson

 

 

Here is the source I'm using:

 

Makefile:

 

KERNELDIR      := ../../external/linux-kernel/modified/linux-2.6.x

TFTPBOOT       := /var/lib/tftpboot

PWD            := $(shell pwd)

ARCH           := blackfin

CROSS_COMPILE  := bfin-linux-uclibc-

EXTRA_CFLAGS   := -O3 -I$(KERNELDIR)/arch/blackfin/include -I$(KERNELDIR)/include

 

TARGETS        := SRC

 

all: install

 

install: $(TARGETS)

	cp -pf $^ $(TFTPBOOT)

 

SRC: SRC.c src_init.S src_flt.S

	$(CROSS_COMPILE)gcc $(EXTRA_CFLAGS) $^ -o $@

 

clean:

	rm -f *.o a.out $(TARGETS)

 

 

SRC.c:

 

/****************************************************************************

*  File:    SRC.c

*  Date:    Sept 26 2002

*   Created:   Jeff Sondermeyer

****************************************************************************/

 

/*

(C) Copyright 2002 - Analog Devices, Inc.  All rights reserved.

 

File Name:     SRC.c

 

Date Modified: 12/28/2005     Jeff Sondermeyer     Rev 0.5

 

Purpose:    The Sample Rate Converter (SRC) and Main Program Shell was developed using the ADSP-21535 EZ-KIT Lite

Evaluation Platform.  However, with Rev 0.5, I have verified the code works on the BF533 EZKIT and on ALL

Blackfins - BF53x.  I removed the LDF from the project so this code will work "out of box" with just about any

VDSP version.  Note that if the user would like to use the other "precanned" filters and include files please

apply changes to src_xxxtoxxx.h per "IPDC comment" in the active project directory.  I leave this as

an exercise to the user :-)

 

This C shell contains function calls and routines to initialize the state of the BF53x as well as the SRC.

 

This program assumes input data comes from a 16-bit buffer (initialized as 'x' in this shell).  This data is

copied into a 32-bit buffer 'in1' within src_flt.asm.  At the end of src_flt.asm, the last 32-bit

buffer 'inx' (where 'x' is the last stage) is copied into a 16-bit buffer ('y' in this shell).  These

16-bit input/output buffers can be eliminated to conserve data space.  In this case, you will need to

undefine 'BUFIN' and preload 'in1' with 32-bit data and then use the 32-bit output data from 'inx'.

 

The converter was designed to convert between any of the following rates:

48000, 44100, 32000, 22050, 16000, 11025, and 8000.  If you have the SRC program from Momentum Data

Systems you can generate coefficients for any SRC.  Follow #3 below.

 

One might use workspaces within VDSP to verify all necessary plots of the input/output stages as well in the

intermediate buffers.  You can look at the data in the time domain or apply the built-in FFT plotting

function to analyze  the frequency domain.  Load "*.vdw" for the example SRC.

 

I have generated a "SINE_xxxxx_16bit_1024.dat" input file to test every SRC.  This is a 16-bit, 1024-sample, 1KHz

sine wave at the input sample rate.  These were generated using MATLAB (see 'gen_sine_wave_comma_16.m').

It's easy to verify proper conversion by counting samples in one period at both the input rate (in the 'x'

plot) and the output rate (in the 'y' plot) in workspace #2.

 

Notes:         1. You can modify the size of NINPS and NOUTS in each 'src_xxxxtoxxxx.h' file.  However, it MUST be the

same multiple of the GCD.

 

2. Buffer sizes, NINPS and NOUTS must be at least half of the filter coefficient sizes times the INTPx

value to ensure   valid output data.

 

3. Do the following to convert the decimal filter coefficients from Momentum Data Systems SRC *.dsp

file to properly format this data as 32-bit hexadecimal values.  This is then read into the

corresponding variable at initialization:

a. Use Excel to import the *.dsp file (space delimited).  Select the "D" column and erase

everything else.  Save the file as a "Formatted Text (Space Delimited)(*.prn)" file.

b. Use the MATLAB program "dec_file_to_hex_file_converter.m".  This MATLAB program

will read in decimal (exponential) data from a file (*.prn) and convert to a 32-bit

hexadecimal format (*.dat file) suitable to be read by VisualDSP within a data

initialization section.

 

4. When 'BUFIN' is undefined, the program assumes that 'in1' is preloaded with 32-bit input data AFTER the

src_init is accomplished (buffer zeroing).  This requires that the shell program preload 'in1' from a 32-bit

source.  Define 'BUFIN' to include the 16-bit buffer transfer code within src_flt.asm. x and y 16-bit

buffers are nice for debug and prototyping but it is just another chunk of memory that is necessary.

 

5. To "zero" out filter delays, use the following equations as offsets to first valid data:

1st Offset = (LENG1-1)/(2*DOWN1)

2nd Offset = INTP2/DOWN2*1st Offset + (LENG2-1)/(2*DOWN2)

3rd Offset = INTP3/DOWN3*2nd Offset + (LENG3-1)/(2*DOWN3)

See the constants generated in the 'src_xxxxtoxxxx.h' files.

 

6. DOFSx (in src_xxxxtoxxxx.h) is the offset and also is the number of valid output data samples.  This will allow

you to figure how often this routine needs to be executed in a block-processed system.  Be careful with this number.

The preprocessor in VDSP will not generate fractional constants.  Therefore, depending on the math here, DOFSx

could have an error of 1.  For a particular SRC, check the first sample in 'y' and adjust the DOFSx accordingly.

 

7. One idea of reducing the number of intermediate buffers is to call a 'zero_buf' function that would rezero

the buffers between filter sections.  This would reduce the number of intermediate buffers to two at the expense

of more MIPs.  However, the MIPs increase would be negligible and is on the order of the size of the buffer.  These

two intermediate buffers should be sized to the maximum needed for any SRC.

 

8. If there is a big interpolation constant, this severely reduces the number of valid data samples in the

final output buffer.  For example, in the 44.1K to 48K case, there is an interpolation constant of 16 in the 3rd stage.

If we only use L1 data sections (max = 4096 bytes) we only get 111 valid data samples in the final output buffer.

However, if I use L2 and make this intermediate buffer as large as 4096 words (16K bytes), I can get a relatively

large number of valid output data samples.  The point here is that.. depending on interpolation constants, the limiting

factor appears to be the L1 section size.  I can maximize all my filters based on this L1 section size (4096 bytes or 1024

words) ...OR..  assume someone can use L2 and make the intermediate buffers larger.  In the latter case, the number of VALID

output data samples greatly increases.

 

9. The half band code was not implemented.  Therefore, the HALFB define is not used.

 

10. 11025to16, 16to2204, and 8to11025 produced corrupted data with 3-stage filters.  Had to use 2-stages.  MDS filter

generator produces corrupted 3rd stage output for close sample rate conversions that required up sampling???  Not sure why.

 

11. The latest revision of the code was debugged on a Momentum Systems Hawk PCI board.  All FileIO was done over the PCI bus.

Several things need to change in this code to work with the Hawk board:

a. Define 'HAWK'

b. Add idle.c and the basiccrt.s file for the Hawk board to the project.

 

 

*/

 

/* ------------------------------------------------------------------------ */

#include "fract_math.h"

//#include <defBF533.h>

#include "SRC_inc.h"

#include "src_441to48.h"

#include <stdlib.h>

#include <stdio.h>

#include <string.h>

 

/* ------------------------------------------------------------------------ */

 

// 16-bit input/output buffers

 

 

short __attribute__((l1_data_B)) x[NINPS];

 

short __attribute__((l1_data_B)) y[NOUTS];

 

FILE *inFile, *outFile;

 

// Filter Coefficients

 

 

fract32 __attribute__((l1_data_B)) filter_h1[MLEN1] =

   {

#include "441to48_32bit_flt1.dat"

};

 

#if STAGE>=2

fract32 __attribute__((l1_data_B)) filter_h2[MLEN2] =

{

#include "441to48_32bit_flt2.dat"

};

#endif

 

#if STAGE==3

fract32 __attribute__((l1_data_B)) filter_h3[MLEN3] =

{

#include "441to48_32bit_flt3.dat"

};

#endif

 

/* ------------------------------------------------------------------------ */

 

short sin_data[] =

   {

#include "sin_data.dat"

};

 

static void *init_first_stage( STAGE_ENTRY *V, void *buffers )

{

   V->in_s = buffers;

   V->in_z = SZIN1;

   buffers += SZIN1 * sizeof(int);

 

   V->out_s = buffers;

   V->out_z = SZIN2;

 

   V->h = filter_h1;

   V->plen = PLEN1 - 1;

   V->up = INTP1;

   V->dn = DOWN1;

   V->nis = NINP1;

   V->nos = NINP2;

   V->nshft = SHFT1;

   V->in_c = V->in_s;

   V->out_c = V->out_s;

 

   V->space = V->nis;

   V->available = 0;

   return buffers;

 

}

 

#if STAGE >= 2

static void *init_sec_stage(STAGE_ENTRY *M, void *buffers)

{

   M->in_s = buffers;

   M->in_z = SZIN2;

   buffers += SZIN2 * sizeof (int);

 

   M->out_s = buffers;

   M->out_z = SZIN3;

 

   M->h = filter_h2;

   M->plen = PLEN2 - 1;

   M->up = INTP2;

   M->dn = DOWN2;

   M->nis = NINP2;

   M->nos = NINP3;

   M->nshft = SHFT2;

   M->in_c = M->in_s;

   M->out_c = M->out_s;

 

   M->space = M->nis;

   M->available = 0;

   return buffers;

}

#endif

 

#if STAGE == 3

static void *init_last_stage(STAGE_ENTRY *L, void *buffers)

{

   L->in_s = buffers;

   L->in_z = SZIN3;

   buffers += SZIN3 * sizeof (int);

 

   L->out_s = buffers;

   L->out_z = SZIN4;

 

   L->h = filter_h3;

   L->plen = PLEN3 - 1;

   L->up = INTP3;

   L->dn = DOWN3;

   L->nis = NINP3;

   L->nos = NINP4;

   L->nshft = SHFT3;

   L->in_c = L->in_s;

   L->out_c = L->out_s;

 

   L->space = L->nis;

   L->available = 0;

   return buffers;

}

#endif

 

/* ------------------------------------------------------------------------ */

 

void init_src( FUNDAMENT_DATA_ENTRY *F, STAGE_HANDLE *S )

{

   F->S = S;

 

   F->half_band = HALFB;

   F->up_stage = NUPST;

   F->pivot_stage = PVTFL;

   F->down_stage = NDWNS;

   F->nstages = STAGE;

   F->ninputs = NINPS;

   F->noutputs = NOUTS;

 

   src_init( F );

}

 

#include <bfin_sram.h>

 

FUNDAMENT_DATA_ENTRY *alloc_441to48( void )

{

   void *p = malloc( sizeof(FUNDAMENT_DATA_ENTRY) + sizeof(STAGE_HANDLE)

   + STAGE * sizeof(STAGE_ENTRY)

   /*, L1_DATA_B_SRAM */);

   FUNDAMENT_DATA_ENTRY *F = p;

   STAGE_HANDLE *S = p + sizeof(FUNDAMENT_DATA_ENTRY);

   STAGE_ENTRY *E1 = (void *) S + sizeof(STAGE_HANDLE);

#if STAGE > 1

   STAGE_ENTRY *E2 = (void *)E1 + sizeof (STAGE_ENTRY);

#if STAGE > 2

   STAGE_ENTRY *E3 = (void *)E2 + sizeof (STAGE_ENTRY);

#endif

#endif

 

   void *buffers = malloc( (SZIN1 + SZIN2

#if STAGE > 1

            + SZIN3

#if STAGE > 2

            + SZIN4

#endif

#endif

             ) * sizeof(int) /*, L1_DATA_A_SRAM */);

 

   F->input = buffers;

 

   S->V = E1;

   buffers = init_first_stage( S->V, buffers );

 

   S->M = E2;

   buffers = init_sec_stage( S->M, buffers );

 

   S->L = E3;

   buffers = init_last_stage( S->L, buffers );

 

   F->output = buffers;

 

   init_src( F, S );

 

   return F;

}

 

void free_src( FUNDAMENT_DATA_ENTRY *F )

{

   free( F->input );

   free( F );

}

 

int cycles()

{

   int ret;

 

   __asm__ __volatile__

   (

   "%0 = CYCLES;\n\t"

   : "=&d" (ret)

   :

   : "R1"

   );

 

   return ret;

}

 

int main()

{

   FUNDAMENT_DATA_ENTRY* vfd;

   long unsigned int old_cycles, new_cycles, diff;

   long int count;

   STAGE_ENTRY *last_stage_entry;

 

   vfd = alloc_441to48();

 

   old_cycles = cycles();

 

   for ( count = 0; count < 44100; count += NINPS )

   {

      memcpy( x, &sin_data[count], NINPS * 2 );

 

      src_flt_bufin( x, vfd, 2, NINPS );

      last_stage_entry = src_flt( vfd );

      src_flt_bufout( y, last_stage_entry, 2, NOUTS );

   }

 

   new_cycles = cycles();

 

   if ( new_cycles > old_cycles )

      diff = new_cycles - old_cycles;

   else

      diff = 0xFFFFFFFF - old_cycles + new_cycles + 1;   /* +1: the counter wraps from 0xFFFFFFFF to 0 */

 

   printf( "cycles: %lu\n", diff );

}
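The wrap-around handling in main() can also be written without a branch. The sketch below assumes that only the low 32 bits of the cycle counter are read (as cycles() does) and that the measured interval is shorter than 2^32 cycles, which is about 19 seconds at 226 MHz; elapsed_cycles is a hypothetical helper name.

```c
#include <stdint.h>

/* Elapsed count between two readings of a free-running 32-bit counter.
   Unsigned subtraction is performed modulo 2^32, so a single wrap
   between the two readings is handled without any special case. */
static uint32_t elapsed_cycles(uint32_t start, uint32_t end)
{
    return end - start;
}
```

For example, a start reading of 0xFFFFFFF0 and an end reading of 0x00000010 give 0x20 elapsed cycles.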

 

 

src_441to48.h:

 

//Include file for 44.1KHz to 48KHz.  Dividing both rates by their greatest common divisor (300) gives the ratio 147/160.

 

#define HALFB 0                        // Half band flag

#define NUPST 2                        // Number of up stages

#define PVTFL 1                        // Pivot flag

#define NDWNS 0                        // Number of down stages

#define STAGE 3                        // Number of total stages

#define NINPS (147*1)                // Number of input samples (should be a whole multiple of 147)

#define NOUTS (160*1)                    // Number of output samples (should be the same multiple of 160)

 

#define INTP1 2

#define DOWN1 1

#define LENG1 509                    // LENG1 = length of stage filter

#define PLEN1 255                        // PLEN1 = MLEN1/INTP1 (polyphase length)

#define MLEN1 510                    // MLEN1 = LENG1 + enough to make even length for polyphase

#define SHFT1 0

#define NINP1 NINPS                    // NINP1 = NINPS = 147

#define SZIN1 (NINP1 + ((LENG1-1)/INTP1) + 1) // 147 + 508/2 + 1 = 402

 

 

#define INTP2 5

#define DOWN2 1

#define LENG2 61                    // LENG2 = length of stage filter

#define PLEN2 13                    // PLEN2 = MLEN2/INTP2 (polyphase length)

#define MLEN2 65                    // MLEN2 = LENG2 + enough to make even length for polyphase

#define SHFT2 1

#define NINP2 ((NINP1*INTP1)/DOWN1)    // (NINPx*INTPx)/DOWNx = 147*2/1 = 294

#define SZIN2 (NINP2 + ((LENG2-1)/INTP2) + 1) // 294 + 60/5 + 1 = 307

 

 

#define INTP3 16

#define DOWN3 147

#define LENG3 113                    // LENG3 = length of stage filter

#define PLEN3 8                    // PLEN3 = MLEN3/INTP3 (polyphase length)

#define MLEN3 128                    // MLEN3 = LENG3 + enough to make even length for polyphase

#define SHFT3 0

#define NINP3 ((NINP2*INTP2)/DOWN2)    // (NINPx*INTPx)/DOWNx = 294*5/1 = 1470

#define SZIN3 (NINP3 + ((LENG3-1)/INTP3) + 1) // 1470 + 112/16 + 1 = 1478

 

#define NINP4 ((NINP3*INTP3)/DOWN3)    // (NINPx*INTPx)/DOWNx = 1470*16/147 = 160

#define SZIN4 (NINP4 + 1)              // for last decimation stage only = 161

 

 

#define OFFS1 ((LENG1-1)/(2*DOWN1))    // 

#define OFFS2 ((LENG2-1)/(2*DOWN2))    // 

#define OFFS3 ((LENG3-1)/(2*DOWN3))    // 

 

 

#if OFFS3 < 1

#define OF2S3 1

#else

#define OF2S3 OFFS3

#endif

 

#define TOFS1 OFFS1                            //

#define TOFS2 ((INTP2*TOFS1)/DOWN2 + OFFS2)    // 

#define TOFS3 ((INTP3*TOFS2)/DOWN3 + OF2S3)    // 

 

/*********IPDC comment *******/

//#define DOFS3 (NOUTS-TOFS3)        // Used to strip filter delays off buffers

/******************************/

 

/*********IPDC addition*******/

#define DOFS3 NOUTS

/******************************/
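The sizes implied by these defines can be double-checked with a few lines of C. The sketch below recomputes the per-stage sample counts and buffer sizes with the same integer formulas as the header; the struct and function names are illustrative only, and the filter lengths and rate factors are copied from the defines above.

```c
#include <stddef.h>

/* One stage: interpolate by up, decimate by dn, filter length leng. */
struct stage_cfg { int up; int dn; int leng; };

/* Values copied from src_441to48.h (INTPx, DOWNx, LENGx). */
static const struct stage_cfg stages_441to48[3] = {
    {  2,   1, 509 },
    {  5,   1,  61 },
    { 16, 147, 113 },
};

/* Samples leaving a stage: (NINPx * INTPx) / DOWNx. */
static int stage_outputs(int ninp, const struct stage_cfg *s)
{
    return (ninp * s->up) / s->dn;
}

/* Input buffer size of a stage: NINPx + (LENGx - 1) / INTPx + 1. */
static int stage_buffer_size(int ninp, const struct stage_cfg *s)
{
    return ninp + (s->leng - 1) / s->up + 1;
}

/* Samples produced by the whole chain for a block of ninps inputs. */
static int chain_outputs(int ninps, const struct stage_cfg *s, size_t n)
{
    size_t i;

    for (i = 0; i < n; i++)
        ninps = stage_outputs(ninps, &s[i]);
    return ninps;
}
```

With NINPS = 147 the three stages see 147, 294 and 1470 input samples, the chain yields 160 output samples, and the first three buffers come out at 402, 307 and 1478 words.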

 

 

SRC_inc.h:

 

/****************************************************************************

*    File:        SRC_inc.h

*    Date:        Sept 26 2002

*   Created:    Jeff Sondermeyer

****************************************************************************/

 

#include <fract.h>

 

/* data structure for each stage */

 

typedef struct {

    int *in_s;        // input signal buffer

    int in_z;        // input signal buffer size

    int *out_s;        // output signal buffer

    int out_z;        // output signal buffer size

    fract32 *h;            // filter

    int plen;        // filter polyphase length

    int up;            // interpolation factor

    int dn;            // decimation factor

    int nis;        // number of inputs

    int nos;        // number of outputs

    int nshft;        // number of shifts

   

    int *in_c;        /* base address of input signal */

    int *out_c;        /* base address of output signal*/

 

    /* Members used for copying into input and out of output buffers.  */

    int space;        // Space left for copy into buffer.

    int available;        /* Number of valid samples in output buffer.  */

} STAGE_ENTRY;  // (nos * dn) = (nis * up) is required

 

/* fundamental structure for sample rate conversion */

 

typedef struct {

    STAGE_ENTRY    *V; // first stage

    STAGE_ENTRY    *M; // middle stage

    STAGE_ENTRY    *L; // last stage

} STAGE_HANDLE;

 

 

typedef struct {

    STAGE_HANDLE *S;

 

    int half_band;        // half band flag

    int up_stage;        // number of pure up stages

    int pivot_stage;        // pivot stage flag

    int down_stage;        // number of pure down stages

    int nstages;        // total number of stages

    int ninputs;        // number of input samples per block

    int noutputs;        // number of output samples per block

 

    int *input;

    int *output;

} FUNDAMENT_DATA_ENTRY;

 

 

/* Copies N samples from INPUTS into the input buffer of the sample rate

   converter described by F.  N should be less than or equal to the

   "space" field in the first stage of F.

   When reading INPUTS, STRIDE is used as an increment (in bytes) after

   each sample.  For normal mono data, STRIDE should be 2, while it should

   be 4 for stereo data which has the samples interleaved in the same

   buffer.  */

void src_flt_bufin (fract16 *inputs, FUNDAMENT_DATA_ENTRY *F, int stride, int n);

 

/* Perform sample rate conversion described by F.  This assumes that the

   input buffer has been filled, either using src_flt_bufin or manually.

   This function returns a pointer to the last processed stage, which can

   be passed to src_flt_bufout to retrieve the data.  */

STAGE_ENTRY *src_flt (FUNDAMENT_DATA_ENTRY *F);

 

/* Copies N samples from stage E (which should be obtained from the return

   value of src_flt) into the buffer OUTPUTS.  N should be less or equal to

   the "available" field in E.

   When writing OUTPUTS, STRIDE is used as an increment (in bytes) after

   each sample.  For normal mono data, STRIDE should be 2, while it should

   be 4 for stereo data which has the samples interleaved in the same

   buffer.  */

void src_flt_bufout (fract16 *outputs, STAGE_ENTRY *E, int stride, int n);

 

/* Allocate a sample rate converter for 44100Hz to 48000Hz conversion.  */

extern FUNDAMENT_DATA_ENTRY *alloc_441to48 (void);

 

/* Free a sample rate converter.  */

void free_src (FUNDAMENT_DATA_ENTRY *);

 

/* Initialize all buffers for sample rate converter F.  */

void src_init (FUNDAMENT_DATA_ENTRY *F);
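The stride convention described in the comments above can be modeled in portable C. This is only a sketch of the copy that src_flt_bufin appears to perform in src_flt.S, where each 16-bit sample is placed in the upper half of a 32-bit word and the lower half is cleared. bufin_model is a hypothetical name, and the circular-buffer and "space" bookkeeping of the real routine are left out.

```c
#include <stdint.h>
#include <string.h>

/* Copy n 16-bit samples from inputs into 32-bit words at dst.
   stride is the distance in bytes between consecutive samples:
   2 for mono data, 4 for one channel of interleaved stereo data. */
static void bufin_model(const int16_t *inputs, int32_t *dst,
                        int stride, int n)
{
    const unsigned char *p = (const unsigned char *)inputs;
    int i;

    for (i = 0; i < n; i++) {
        int16_t s;

        memcpy(&s, p, sizeof s);       /* read one 16-bit sample */
        dst[i] = (int32_t)s * 65536;   /* sample in upper half, lower half 0 */
        p += stride;                   /* advance by stride bytes */
    }
}
```

Called with a stride of 4 on an interleaved stereo buffer, the model picks up every second sample, that is, the channel that inputs points at.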

 

 

src_flt.S:

 

/*     File: src_flt.asm Version 0.1

 

 

fundamental structure order:

 

1.   stage data handle

2.   half band flag (0,1, or 2)

3.   number of up stages

4.   pivot flag (0 or 1)

5.   number of down stages

6.   number of stages (total)

7.   number of input samples per block

8.   number of output samples per block

 

 

    P0 -> fundamental structure

    P1 -> input samples

    P2 -> output samples

    P3 -> memory storage and retrieval

    P4 = temporary pointer

    P5 = loop counter

 

    R0 = Loop counters

    R1 = temporary storage

    R2 = Loop counters

    R3 = Shift count

    R4 = inner loop calculations

    R5 = inner loop calculations

    R6 = temporary storage

    R7 = temporary storage

 

    I0 = dedicated to input buffer 'inx'

    I1 = general use...reading 'inputData' plus others

    I2 = general use...reading 'inx' for output data

    I3 = general use...

 

 

   Input Data Structure (VAR_SIZE words)

       AIS: address of input signal (circular), updated after return,

       SIS: circular size of AIS,

       AOS: address of output signal (circular), updated after return,

       SOS: circular size of AOS,

       AFA: address of filter array,

       LEN: poly-phase filter length,

    UPR: up sample rate >= 2,

    DNR: down sample rate = 1 is assumed

       NIS: number of input signals

    NOS: number of output signals

    SHF: number of shift counter, 0 or 1

      

 

*/

 

#define STAGE_in_s 0

#define STAGE_in_z 4

#define STAGE_out_s 8

#define STAGE_out_z 12

#define STAGE_h 16

#define STAGE_plen 20

#define STAGE_up 24

#define STAGE_dn 28

#define STAGE_nis 32

#define STAGE_nos 36

#define STAGE_nshft 40

#define STAGE_in_c 44

#define STAGE_out_c 48

#define STAGE_space 52

#define STAGE_available 56

 

 

.text

 

.globl _src_flt_bufin;

_src_flt_bufin:

    P1 = R0;                // Address of input data

    P0 = R1;                // Address of fundamental structure

    M0 = R2;

 

    R0 = [sp + 12];

 

    p0 = [p0];                // stage handle

 

    [--SP]=(R7:4);

 

    p0 = [p0];            // p0 -> first data structure

 

    r6 = [p0 + STAGE_in_s];            // r6 -> first input buffer 'inx'

    i0 = r6;                // i0 -> first input buffer 'inx'

 

    r6 = [p0 + STAGE_in_c];

    b0 = r6;                // b0 -> base of first input circular buffer

    r6 = [p0 + STAGE_in_z];

    r6 = r6 << 2;            // convert length to bytes (4 bytes per word)

    l0 = r6;                // l0 = first input circular buffer size 'SZINx'

 

    r7 = [p0 + STAGE_space];

    r5 = [p0 + STAGE_nis];

 

    /* Update space left after this copy.  */

    r1 = r7 - r0;

    [p0 + STAGE_space] = r1;

 

    /* Compute destination pointer from buffer pointer and space

       left.  */

    r7 = r5 - r7;

    r7 = r7 << 2;

    m2 = r7;

    i0 += m2;

 

    p2 = r0;            // p2 =  number of input samples per block

 

    i1 = p1;            // load i1 with address of 'inputData'

    l1 = 0;

 

 

    r6.l = 0;

    LSETUP(READ_INPUTS_BEGIN, READ_INPUTS_END) LC0 = p2;

READ_INPUTS_BEGIN:

        r6.h = w[i1];            // read the input buffer 'inputData'

        i1 += m0;

READ_INPUTS_END:

        [i0++] = r6;            // write input into input buffer 'inx'

 

    (R7:4)=[SP++];    // Pop R7 ... R4

    L0=0;

    L1=0;

    L2=0;

    L3=0;

 

    RTS;

 

 

/* STACK LAYOUT

   Local variables: [0..16[

   Saved registers: SP + [16..44[

   Input args: SP + [44 .. 48[

   +44 FUNDAMENTAL STRUCTURE  */

 

#define OFF_PT_FUNDST 0

#define OFF_PT2_FUNDST 4

#define OFF_ST_HANDLE 8

 

#define OFF_FUNDST 44

 

   

.globl _src_flt;

_src_flt:

    P0 = R0;            // Address of fundamental structure

    [--SP]=(R7:4,P5:3);

    SP += -16;

 

    r7 = [p0++];

    [SP + OFF_ST_HANDLE] = r7;            // store stage data handle

 

    r6 = [p0++];        // r6 = half band flag (move past this for now)

 

    r2 = [p0++];        // r2 = # of up stages

    [SP + OFF_PT2_FUNDST] = p0;            // save pointer to current fundamental structure

 

 

    CC = r2 <= 0;

    IF CC JUMP over_upstage;        // if upstage = 0, jump over

 

UPSTAGE_BEGIN:

 

        p4 = [SP + OFF_ST_HANDLE];    // p4 -> current stage data handle

 

        r7 = [p4++];       

        p0 = r7;            // p0 -> stage data

 

        [SP + OFF_ST_HANDLE] = p4;    // save pointer to stage data handle

 

up_src:

        r7 = [p0 + STAGE_in_s];            // r7 -> input signal 'inx'

        r5 = [p0 + STAGE_in_c];

        b3 = r5;                // b3 set for circular buffering

 

        r5 = [p0 + STAGE_in_z];

        r5 = r5 << 2;            // convert length to bytes (4 bytes per word)

        l3 = r5;                // l3 = Size of Input Stage (SIS)

 

           

        r6 = [p0 + STAGE_out_s];

        i2 = r6;                // i2 -> output signal 'inx'+1 buffer (output buffer)

        r6 = [p0 + STAGE_out_c];

        b2 = r6;                // b2 set for circular buffering

 

        r6 = [p0 + STAGE_out_z];

        r6 = r6 << 2;            // convert output size to bytes (4 bytes per word)

        l2 = r6;            // l2 = Size of Output Stage (SOS)

 

        r3 = [p0 + STAGE_h];        // r3 -> the filter coefficients

        p3 = [p0 + STAGE_plen];        // p3 = poly-phase filter size

 

        p4 = 8;                // always skip over DNR (2*4bytes) in the up SRC

 

        p5 = [p0 + STAGE_up];        // p5 = Up Sample Rate (UPR)

        r0 = [p0 + STAGE_nis];        // r0 = NIS

        [p0 + STAGE_space] = r0;    // free up input space

 

        r6 = [p0 + STAGE_nshft];    // r6 = number of shifts (always an arithmetic left shift, i.e. an upshift)

        m2 = r6;            // Save in m2

 

UP_SRC_OUTER_BEGIN:

 

            i1 = r3;                // i1 -> filter coefficients

            l1 = 0;                    // linear addressing???

   

            LSETUP(UP_SAMPLE_BEGIN, UP_SAMPLE_END) LC0 = p5;

 

UP_SAMPLE_BEGIN:

                i3 = r7;            // i3 - > 'in' buffer       

                A1=A0=0 || R6=[I1++] || R5=[I3--]; // r6=filter coef, r5='inx' buffer

 

                LSETUP(POLY_PHASE_BEGIN, POLY_PHASE_END) LC1 = p3;

 

POLY_PHASE_BEGIN:    R4=(A0+=R6.H*R5.H), A1+=R6.H*R5.L (M);           

POLY_PHASE_END:        R1=(A1+=R5.H*R6.L) (M) || R6=[I1++] || R5=[I3--];

               

//                R1=R1>>16;

//                R4=R4+R1 (S);

 

                r5=m2;                    // load r5 with number of shifts

/******** IPDC comment     *******/

//                A1 = A1>>16;

/******************************/

 

/*********IPDC addition*******/

                A1=A1>>>15;

/******************************/

                A0+=A1;

                A0 = ASHIFT A0 BY r5.l;

                r4 = A0;                // high half-word extraction with 16-bit saturation.  Rounding cntrl by

                                        // RND_MOD.  0 = unbiased rounding = default

 

 

 

//                A0 = A0 >>> 1;

//                R4 = A0;

 

 

UP_SAMPLE_END:

                [i2++] = R4;            // save output into 'inx'+1

 

            i3 = r7;            // get input back at beginning of 'inx'

            m3 = 4;

            i3 += m3;            // increment by 1 word (4 bytes)

            r7 = i3;            // update r7 -> 'inx' buffer

 

UP_SRC_OUTER_END:

        r0 += -1;                        // Check number of input samples (NIS)

        CC = r0 <= 0;

        IF !CC JUMP UP_SRC_OUTER_BEGIN;    // if input samples remain (count > 0), jump to UP_SRC_OUTER_BEGIN

 

        [p0 + STAGE_in_s] = r7;                    // save the input signal address

 

        r6 = i2;

        [p0 + STAGE_out_s] = r6;                // save the output signal address

   

UPSTAGE_END:

    r2 += -1;                        // Check number of stages

    CC = r2 <= 0;

    IF !CC JUMP UPSTAGE_BEGIN;        // if upstage not equal to 0, jump to UPSTAGE_BEGIN

 

over_upstage:

 

    p0 = [SP + OFF_PT2_FUNDST];            // p0 -> fundamental structure

 

    r6 = [p0++];        // r6 = pivot flag

    [SP + OFF_PT2_FUNDST] = p0;            // save fundamental structure

 

    CC = r6 <= 0;

    IF CC JUMP over_pivotstage;        // if pivotstage = 0, jump over

 

    p4 = [SP + OFF_ST_HANDLE];             // p4 -> current stage data handle

 

    r7 = [p4++];       

    p0 = r7;            // p0 -> stage data

 

    [SP + OFF_ST_HANDLE] = p4;            // save pointer to stage data handle

 

pvt_src:

    r7 = [p0 + STAGE_in_s];            // r7 -> input signal 'inx'

    i3 = r7;

    r5 = [p0 + STAGE_in_c];

    b3 = r5;

 

    r5 = [p0 + STAGE_in_z];

    r5 = r5 << 2;            // convert length to bytes (4 bytes per word)

    l3 = r5;                // l3 = Size of Input Stage (SIS)

 

    r6 = [p0 + STAGE_out_s];

    i2 = r6;                // i2 -> output signal 'inx'+1 buffer (output buffer)

    r6 = [p0 + STAGE_out_c];

    b2 = r6;                // b2 set for circular buffering

 

    r6 = [p0 + STAGE_out_z];

    r6 = r6 << 2;            // convert output size to bytes (4 bytes per word)

    l2 = r6;                // l2 = Size of Output Stage (SOS)

 

    r3 = [p0 + STAGE_h];            // r3 -> the filter coefficients

    r6 = [p0 + STAGE_plen];            // r6 = poly-phase filter size

    p3 = r6;                // save poly-phase into p3

 

    r6 = [p0 + STAGE_up];            // r6 = UPR (filter step)

    r6 = r6 << 2;            // post increment is in bytes (4 bytes per word)

    m1 = r6;                // post increment set to UPR

                        // always skip over UPR (2*4bytes) in the up SRC

 

    r0 = [p0 + STAGE_nis];        // r0 = NIS

    [p0 + STAGE_space] = r0;    // free up input space

 

    r0 = [p0 + STAGE_dn];            // r0 = DNR

    r0 = r0 << 2;            // four bytes per word

    p5 = [p0 + STAGE_nos];            // p5 = Number of Outputs (NOS)

    [p0 + STAGE_available] = p5;

 

//    r1.l = w[p0 + STAGE_nshft];            // r1.l = Number of shifts (can be left shift=upshift or right shift=downshift)

    r6 = [p0 + STAGE_nshft];           

    m2 = r6;                // m2 = Number of shifts

 

    r2 = 0;                    // set poly index value to 0

    i1 = r3;                // i1 -> filter coefficients

 

//    CC = r6 <= 0;

//    IF !CC JUMP pvt_positive;        // if # of shifts > 0, jump over

//    CC = r6 < 0;

//    IF CC JUMP pvt_negative;        // if # of shifts < 0, jump over

 

    LSETUP(PVT_OUT_BEGIN, PVT_OUT_END) LC0 = p5;

PVT_OUT_BEGIN:

 

        m3 = i3;                 // save i3 into m3;   

        A1=A0=0 || R6=[I1] || R5=[I3--]; // r6=filter coef, r5='inx' buffer

//        i1 += m1; 

        LSETUP(PVT_FILTER_BEGIN, PVT_FILTER_END) LC1 = p3;

 

PVT_FILTER_BEGIN:

            R4=(A0+=R6.H*R5.H), A1+=R6.H*R5.L (M)||i1 += m1;   

PVT_FILTER_END:

            R1=(A1+=R5.H*R6.L) (M) || R6=[I1] || R5=[I3--];

            i1 += m1;

 

//        R1=R1>>16;

//        R4=R4+R1 (S);

 

        r5 = m2;

/******** IPDC comment     *******/

//        A1 = A1>>16;

/******************************/

 

/*********IPDC addition*******/

        A1=A1>>>15;

/******************************/

        A0+=A1;

        A0 = ASHIFT A0 BY r5.l;

        r6 = A0;                // high half-word extraction with 16-bit saturation.  Rounding cntrl by

                                // RND_MOD.  0 = unbiased rounding = default

        [i2++] = r6;            // save output into 'inx'+1

       

 

//        R6.H=(A1+=R6.L*R5.H)    ||  NOP  ||  NOP;

 

//        [i2++] = r4;                // save output into 'inx'+1

 

//new_poly:

 

        r7 = r2;                    // r7 = poly_index

        r7 = r7 + r0;                // r7 = poly_index + DNR

        i3 = m3;                    // restore i3

        r6 = m1;                    // r6 = UPR

 

test_address:

         r5 = r7 - r6;                // (poly_index + DNR)-UPR

        CC = r5 < 0;

        IF CC JUMP next_address;    // if true, jump over           

        r7 = r5;                    // r7 = new poly_index

        m0 = 4;

        i3 += m0;                    // increment by 1 word (4 bytes)

 

        JUMP test_address;            // test the new poly_index 

next_address:

 

        r2 = r7;                    // save the new address

        r7 = r7 + r3;                // r7 -> adjusted filter address

 

PVT_OUT_END:

        i1 = r7;                    // i1 -> poly-phase filter

 

pvt_return:

 

    r7 = i3;            // update r7 -> 'inx' buffer

    p4 = 8;                            // 2 words (2*4bytes per word)

    [p0++p4] = r7;                    // save the input signal

 

    r6 = i2;

    [p0] = r6;                        // save the output signal address

 

over_pivotstage:

 

    p0 = [SP + OFF_PT2_FUNDST];            // p0 -> fundamental structure

 

    r0 = [p0++];        // r0 = number of down stages

    CC = r0 <= 0;

 

    IF CC JUMP return_src_core;        // if number of down stages = 0, RTS

       

DOWNSTAGE_BEGIN:

 

    p4 = [SP + OFF_ST_HANDLE];             // p4 -> current stage data handle

 

    p0 = [p4++];        // p0 -> stage data

 

    [SP + OFF_ST_HANDLE] = p4;            // save pointer to stage data handle

 

dn_src:

 

    r7 = [p0 + STAGE_in_s];            // r7 -> input signal 'inx'

    i3 = r7;

    r5 = [p0 + STAGE_in_c];

    b3 = r5;

 

    r5 = [p0 + STAGE_in_z];

    r5 = r5 << 2;            // convert length from words to bytes (4 bytes per word)

    l3 = r5;                // l3 = Size of Input Stage (SIS)

 

    r6 = [p0 + STAGE_out_s];

    i2 = r6;                // i2 -> output signal 'inx'+1 buffer (output buffer)

    r6 = [p0 + STAGE_out_c];

    b2 = r6;                // b2 set for circular buffering

 

    r6 = [p0 + STAGE_out_z];

    r6 = r6 << 2;            // convert output size from words to bytes (4 bytes per word)

    l2 = r6;                // l2 = Size of Output Stage (SOS)

 

    p4 = 8;                    // always skip over DNR (2*4bytes) in the up SRC

 

    r3 = [p0 + STAGE_h];            // r3 -> the filter coefficients

    r6 = [p0 + STAGE_plen];            // r6 = filter length

    p3 = r6;

 

    r4 = [p0 + STAGE_dn];            // r4 = DNR

    r4 = r4 << 2;            // Four bytes per word

    m3 = r4;

 

    p5 = [p0 + STAGE_nos];            // p5 = number of outputs

    [p0 + STAGE_available] = p5;

 

    r2 = [p0 + STAGE_nshft];        // r2 = number of shifts

 

    LSETUP(DN_OUT_BEGIN, DN_OUT_END) LC0 = p5;

       

DN_OUT_BEGIN:

 

        i1 = r3;                // i1 -> filter coefficients

        m1 = i3;                // save i3 into m1

 

        A1=A0=0 || R6=[I1++] || R5=[I3--]; // r6=filter coef, r5='inx' buffer

        LSETUP(DOWN_FILTER_BEGIN, DOWN_FILTER_END) LC1 = p3;   

DOWN_FILTER_BEGIN:

            R4=(A0+=R6.H*R5.H), A1+=R6.H*R5.L (M);

DOWN_FILTER_END:

            R1=(A1+=R5.H*R6.L) (M) || R6=[I1++] || R5=[I3--];

 

//        R1=R1>>16;

//        R4=R4+R1 (S);

/******** IPDC comment     *******/

//        A1 = A1>>16;

/******************************/

 

/*********IPDC addition*******/

        A1=A1>>>15;

/******************************/

        A0+=A1;

        A0 = ASHIFT A0 BY r2.l;

        r6 = A0;                // high half-word extraction with 16-bit saturation.  Rounding cntrl by

                                // RND_MOD.  0 = unbiased rounding = default

        [i2++] = r6;            // save output into 'inx'+1

//        JUMP shiftDone;

 

//shiftPos:                        // Left Shift = Up shift = positive number

//        A1 = ASHIFT A1 BY r2.l;

//        r6.h = A1;                // high half-word extraction with 16-bit saturation.  Rounding cntrl by

                                // RND_MOD.  0 = unbiased rounding = default       

 

//        w[i2++] = r6.h;            // save output into 'inx'+1   

 

//shiftDone:

        i3 = m1;                // restore i3

 

DN_OUT_END:

        i3 += m3;                // advance input pointer by DNR words (m3 = DNR * 4 bytes)

    r7 = i3;

 

    p4 = 8;                    // 2 words (2*4bytes per word)

    [p0 + STAGE_in_s] = r7;            // save the input signal address

 

    r6 = i2;

    [p0 + STAGE_out_s] = r6;        // save the output signal address

 

 

DOWNSTAGE_END:

 

    r0 += -1;                        // Check number of downstages

    CC = r0 <= 0;

    IF !CC JUMP DOWNSTAGE_BEGIN;    // if down stages remain (> 0), loop back to DOWNSTAGE_BEGIN

 

return_src_core:

    /* Return a pointer to the last stage we processed.  */

    p0 = [SP + OFF_ST_HANDLE];

    p0 += -4;

    r0 = [p0];

 

    SP += 16;

    (R7:4,P5:3)=[SP++];    // Pop R7 ...P5

    L0=0;

    L1=0;

    L2=0;

    L3=0;

 

    RTS;

_src_flt.end:

 

.global _src_flt_bufout

_src_flt_bufout:

    p1 = r0;    // destination buffer

    p0 = r1;    // stage pointer

 

    [--sp] = (p5:5);

 

    r0 = [p0 + STAGE_out_c];

    b2 = r0;

    r0 = [p0 + STAGE_out_z];

    r0 = r0 << 2;            // convert output size from words to bytes (4 bytes per word)

    l2 = r0;                // l2 = Size of Output Stage (SOS)

 

    r0 = [p0 + STAGE_out_s];

    i2 = r0;

 

    p5 = [p0 + STAGE_available];

    p2 = p5 << 2;

    m0 = p2;

    i2 -= m0;

 

    p2 = [sp + 16];

    p5 -= p2;

    [p0 + STAGE_available] = p5;

 

    p0 = r2;   // stride

    LSETUP(READ_OUTS_BEGIN, READ_OUTS_END) LC0 = p2;

 

READ_OUTS_BEGIN:

        r0 = [i2++];                // get 32-bit output from buffer

 

READ_OUTS_END:

        w[p1 ++ p0] = r0.h;                // write 16-bit output to 'outputData'

 

    (P5:5)=[SP++];    // Pop P5

    L2=0;

 

    RTS;

 

 

 

src_init.S:

 

/*initial.asm: Sample Rate Conversion Version 0.1

 

    P0 -> a fundamental structure

 

    Registers used: P0, P1, P2, P5, R2, R3, R4, R5, R6, R7

 

*/

 

 

    .text;

.align 4;

 

.global _src_init;

/*

    initialize all the buffers (inputs and delay)

    P0 -> a fundamental structure

*/

 

_src_init:

 

        [--SP]=(R7:4,P5:3);        // Push R7-R4 and P5-P3

 

        P0 = R0;                // Address of fundamental structure

        p5 = 20;                // 5*4 bytes = 20 byte-wide increment   

        r6 = [p0++p5];            // Pointer to fundamental structure 'fs_x', post-increment of 5 32-bit words

//jws        p1 = r2;                // p1 = 32-bit pointer 'st_handle'

 

        r7 = [p0++];            // load number of stages

        p5 = r7;

        p1 = r6;                // p1 = 32-bit pointer 'st_handle'

 

        LSETUP(0f, 1f) LC0 = p5;

0:

            r2 = [p1++];

           

            p2 = r2;                // p2 -> 'datax'

            r3 = [p2++];            // r3 -> first element 'inx'

            r4 = [p2++];            // r4 = length 'SZINx'

 

            i0 = r3;                // i0 -> 'inx' buffer

            p5 = r4;

            r5 = 0;

            l0 = 0;                    // l0 = 0: linear (non-circular) addressing on i0 while zeroing 'inx'

            LSETUP(2f, 3f) LC1 = p5;

2:

3:

                [i0++] = r5;        // zero out a 32-bit word

 

1:

            nop;

 

        (R7:4,P5:3)=[SP++];    // Pop R7-R4 and P5-P3

 

        RTS;

       

_src_init.end:

 

SRC_test.tar.bz2


 

 

2010-10-13 09:42:46     Re: Can't get SRC algorithm to fly

Mike Frysinger (UNITED STATES)

Message: 94471   

 

all Blackfin cores are the same so any core behavior is going to be exactly the same across variants.  the only real difference you'd see would be due to diff in bus sizes or memory types, but the BF52x and BF53x are the same there.

 

you could try moving the funcs to l1 too with the l1_text attribute.
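A minimal sketch of that suggestion in C, assuming the Blackfin GCC port's l1_text function attribute. The macro SRC_L1_TEXT and the function dot_q15 are hypothetical names used only for illustration; they are not part of the posted SRC sources. On any other compiler the macro expands to nothing, so the file still builds.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: Blackfin GCC accepts __attribute__((l1_text)) and places the
 * function in L1 instruction SRAM. Elsewhere the macro is empty. */
#if defined(__ADSPBLACKFIN__) || defined(__bfin__)
#define SRC_L1_TEXT __attribute__((l1_text))
#else
#define SRC_L1_TEXT
#endif

/* Hypothetical hot-loop helper: Q15 dot product of coefficients and data. */
SRC_L1_TEXT int32_t dot_q15(const int16_t *coef, const int16_t *data, size_t n)
{
    int64_t acc = 0;

    assert(coef != NULL && data != NULL);
    for (size_t i = 0; i < n; i++)
        acc += (int32_t)coef[i] * data[i];   /* 16x16 -> 32-bit product */
    return (int32_t)(acc >> 15);             /* scale back from Q30 to Q15 */
}
```

Whether the function really landed in L1 is best confirmed in the linker map file.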


 

 

2010-10-13 12:29:22     Re: Can't get SRC algorithm to fly

Daniel Persson (SWEDEN)

Message: 94486   

 

Thanks for the reply.

 

I have tried putting all the code in L1 instruction RAM. Now I get a solid 73 MIPS (before, it alternated seemingly at random between 73 and 102 MIPS), which is still far more than I can afford.

 

Jeff Sondermeyer wrote in EE183 that the VDSP++ implementation consumed 2 MIPS. So I assume there has to be something really wrong with my build. Do you have any more ideas of what might be wrong?


 

 

2010-10-13 13:56:18     Re: Can't get SRC algorithm to fly

Mike Frysinger (UNITED STATES)

Message: 94488   

 

did you only change the C files ?  or did you also change the assembly from ".text" to ".section .l1.text" ?

 

otherwise, i really know nothing of signal processing.  someone else will have to field that aspect.


 

 

2010-10-13 20:14:29     Re: Can't get SRC algorithm to fly

Simon Brewer (AUSTRALIA)

Message: 94493   

 

I had a look at the code and can make another suggestion.  You need to ensure that the filter coefficients and the data are in separate L1 data banks (i.e. bank A and bank B), or else you will get extra stalls in the filter implementation.
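A minimal sketch of the bank split in C, assuming the Blackfin GCC variable attributes l1_data_A and l1_data_B. The macro names, the array names, the length SRC_TAPS and the helper src_reset_delay are hypothetical and chosen only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: Blackfin GCC accepts l1_data_A / l1_data_B and places the
 * variables in L1 data bank A / bank B. Elsewhere the macros are empty. */
#if defined(__ADSPBLACKFIN__) || defined(__bfin__)
#define SRC_BANK_A __attribute__((l1_data_A))
#define SRC_BANK_B __attribute__((l1_data_B))
#else
#define SRC_BANK_A
#define SRC_BANK_B
#endif

#define SRC_TAPS 64u   /* hypothetical length, not taken from the posted code */

SRC_BANK_A int32_t src_coeffs[SRC_TAPS];  /* filter coefficients in bank A */
SRC_BANK_B int32_t src_delay[SRC_TAPS];   /* sample delay line in bank B */

/* Zero the delay line and return the number of words cleared. */
size_t src_reset_delay(void)
{
    assert(sizeof src_coeffs == sizeof src_delay);
    for (size_t i = 0; i < SRC_TAPS; i++)
        src_delay[i] = 0;
    return SRC_TAPS;
}
```

With the two arrays in different banks, a coefficient fetch and a data fetch issued in the same cycle should not compete for one bank. The map file shows where each array actually ended up.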


 

 

2010-10-13 20:36:22     Re: Can't get SRC algorithm to fly

Simon Brewer (AUSTRALIA)

Message: 94494   

 

One other thing to keep in mind.  This code runs for about 0.14 seconds on my test board.  About 35 timer interrupts will occur during that period (assuming a 250 Hz timer interrupt rate).  The cycle count measures the total cycles for everything that happens in that 0.14-second period, not just the SRC.
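The interrupt estimate is simple arithmetic. A small sketch in C (the function name timer_ticks_during is hypothetical):

```c
#include <assert.h>

/* Number of periodic timer interrupts expected while a run of run_ms
 * milliseconds is in progress, for a timer firing tick_hz times per second. */
unsigned long timer_ticks_during(unsigned long run_ms, unsigned long tick_hz)
{
    assert(tick_hz > 0);
    return (run_ms * tick_hz) / 1000UL;
}
```

For a 140 ms run at 250 Hz, timer_ticks_during(140, 250) gives 35, matching the figure above.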


 

 

2010-10-14 05:02:52     Re: Can't get SRC algorithm to fly

Daniel Persson (SWEDEN)

Message: 94506   

 

Mike, I have put both the assembler and the C code in L1 instruction memory and verified it by looking in the map file.

 

Simon, thanks for the suggestion. I have tried putting the data and the filter in separate L1 banks, but unfortunately it didn't make any difference for my application. I verified that the data and filter ended up in separate L1 banks by looking at the map file.

 

You say that the code runs for 0.14 s on your test board, which is much faster than the time it takes to run through the application on my board:

 

real    0m 0.48s

user    0m 0.46s

sys    0m 0.02s

 

What core clock are you using? What cycle count do you get as output from the application?

 

As you say, there will be Linux overhead (the timer interrupts, I guess?), but I was hoping that the overhead would be small enough not to have any major impact on system performance, and that I would be able to perform real-time signal processing (like SRC for several parallel audio channels).

 

KR, Daniel


 

 

2010-10-15 02:03:13     Re: Can't get SRC algorithm to fly

Simon Brewer (AUSTRALIA)

Message: 94540   

 

Hi Daniel,

 

I had some time today to look at this in more detail.  Running on my board I get about 70MIPs as well.  I am running a core clock of 525 MHz.

 

I guess the point I was making about the cycle counter was that it can be useful, but could be extremely misleading in some cases (if the process is pre-empted for example).

 

Anyway, I dug into the algorithm a little more (by stepping through it in GDB).  The inner loops come to 2*0xfe*2 + 5*0xc*2 instructions per sample.  With a few other loops added in, the overall figure is 0x470 cycles per sample, which is about 55 MIPS (at 48 kHz).  So we are in the ballpark....
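The arithmetic can be checked with a short C sketch. The two products alone already sum to 0x470 (1136), and 1136 cycles at 48 kHz is about 54.5 MIPS. The function names are hypothetical, and the parameter names are only a guess at what the three factors in each product stand for; the post does not say.

```c
#include <assert.h>

/* One term of the estimate: groups * iterations * instructions per iteration.
 * The meaning of each factor is an assumption, not stated in the post. */
unsigned long loop_instructions(unsigned long groups,
                                unsigned long iterations,
                                unsigned long insns_per_iter)
{
    return groups * iterations * insns_per_iter;
}

/* The expression quoted above: 2*0xfe*2 + 5*0xc*2. */
unsigned long src_instructions_per_sample(void)
{
    return loop_instructions(2, 0xfe, 2) + loop_instructions(5, 0xc, 2);
}

/* Millions of cycles per second needed for a per-sample cycle cost. */
double mips_required(unsigned long cycles_per_sample,
                     unsigned long sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return (double)cycles_per_sample * (double)sample_rate_hz / 1e6;
}
```

Calling mips_required(src_instructions_per_sample(), 48000) yields roughly 54.5, i.e. the "about 55 MIPS" above.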

 

Working from the other direction, if the MIPS budget were 2, then at 48 kHz there would be about 41 cycles per sample available.  Given that this is 32-bit arithmetic (2 cycles per MAC), that leaves ~20 MACs per sample.  In other words, a 20-tap filter; this is not going to give very good performance... Using 32-bit arithmetic with a 20-tap FIR is, err, a little bit of a waste ;-)
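The same budget calculation as a C sketch (function names are hypothetical; integer division rounds down):

```c
#include <assert.h>

/* Cycles available per output sample for a given cycle budget per second. */
unsigned long cycles_per_sample_budget(unsigned long budget_hz,
                                       unsigned long sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return budget_hz / sample_rate_hz;
}

/* Longest FIR that fits, if every tap costs cycles_per_mac cycles. */
unsigned long max_fir_taps(unsigned long cycles, unsigned long cycles_per_mac)
{
    assert(cycles_per_mac > 0);
    return cycles / cycles_per_mac;
}
```

With a 2 MIPS budget, cycles_per_sample_budget(2000000, 48000) is 41, and max_fir_taps(41, 2) is 20.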

 

So in summary the SRC is probably working as expected.  I think the document is probably misleading when it mentions 2 MIPS.

 

Simon


 

 

2010-10-15 04:09:06     Re: Can't get SRC algorithm to fly

Daniel Persson (SWEDEN)

Message: 94553   

 

Hi Simon

 

Thanks for the time and expertise you have put in.

 

As you might have guessed I am new to digital signal processing, hence the following question.

 

The DSP requirement of my project is to be able, in the worst case, to handle two SRCs simultaneously, with an EQ filter on top of that (preferably several EQ filters). Would you say that it is possible to achieve this with a 150 MIPS budget? If you think it is possible, what kind of filter would you use for the SRCs and the EQ, and how many taps would they have? The audio quality I am currently aiming at is 16 bit at 44100 Hz. If you don't think it is possible, what kind of MIPS would be required to make it possible?

 

Daniel


 

 

2010-10-17 20:09:28     Re: Can't get SRC algorithm to fly

Simon Brewer (AUSTRALIA)

Message: 94639   

 

Hi Daniel,

 

how many frequency bands do you want for your EQ?

 

I think 150 MIPS is OK, i.e. around 1700 cycles per sample, although quality compromises may need to be made.
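For reference, 150 MIPS at 44.1 kHz is about 3401 cycles per sample in total. One reading that reproduces the 1700 figure is to split that budget across the two SRC streams Daniel mentioned; that interpretation is an assumption, not something stated in the post. A C sketch with a hypothetical function name:

```c
#include <assert.h>

/* Cycles available per sample for each of `streams` concurrent streams,
 * all running at sample_rate_hz, sharing budget_hz cycles per second. */
unsigned long cycles_per_sample_per_stream(unsigned long budget_hz,
                                           unsigned long sample_rate_hz,
                                           unsigned long streams)
{
    assert(sample_rate_hz > 0 && streams > 0);
    return budget_hz / (sample_rate_hz * streams);
}
```

cycles_per_sample_per_stream(150000000, 44100, 1) gives 3401, and with two streams it gives 1700.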

 

What sort of delay can you handle through the system?  What sort of audio quality are you expecting?  If the quality requirements are very high, then the filter lengths will need to be longer.

 

Simon


 

 

2010-10-25 09:27:18     Re: Can't get SRC algorithm to fly

Daniel Persson (SWEDEN)

Message: 95137   

 

Hi Simon

 

I will need a two-band EQ. The use case for the EQ is that the sound is received through a microphone and the environment may be noisy. The quality I aim for (if it is possible) is a sample rate of 44100 Hz and a word length of 16 bits, and since the user will hear himself via the microphone and headset, the latency has to be really short, around 10 ms (conflicting requirements, I know).

 

Daniel

 


 

 

2010-10-26 02:08:45     Re: Can't get SRC algorithm to fly

Simon Brewer (AUSTRALIA)

Message: 95157   

 

Hi

 

A couple of comments.  If your audio environment is noisy, it might be possible to compromise on filter quality.  A two-band EQ is not that expensive.  For example, you could implement an FIR-based QMF filter.
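A rough sketch of the QMF idea in C, under stated assumptions: the high-band filter is derived from the low-band prototype with the standard mirror relation h1[n] = (-1)^n h0[n], and floats are used for readability rather than the fixed-point arithmetic a Blackfin implementation would use. The function names are hypothetical, and the 2-tap prototype in the usage note is only the simplest possible example, not a recommended design.

```c
#include <assert.h>
#include <stddef.h>

/* Build the high-band filter from the low-band prototype:
 * h1[n] = (-1)^n * h0[n], which mirrors the response around fs/4. */
void qmf_mirror(const float *h0, float *h1, size_t taps)
{
    assert(h0 != NULL && h1 != NULL);
    for (size_t n = 0; n < taps; n++)
        h1[n] = (n & 1u) ? -h0[n] : h0[n];
}

/* One FIR output sample; x[0] is the newest input, x[taps-1] the oldest. */
float fir_sample(const float *h, const float *x, size_t taps)
{
    float acc = 0.0f;

    assert(h != NULL && x != NULL);
    for (size_t n = 0; n < taps; n++)
        acc += h[n] * x[n];
    return acc;
}
```

With h0 = {0.5, 0.5}, qmf_mirror produces h1 = {0.5, -0.5}; a constant input then appears entirely in the low band and not at all in the high band. A two-band EQ would scale each band by its own gain before adding them back together.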

 

Where in the system do you need the 44.1 -> 48 kHz conversion?

 

A latency of 10 ms at 44.1 kHz corresponds to a delay of 441 samples.  This is pretty aggressive, and probably not attainable.  I would do some measurements on your Linux system and figure out the delay through the audio paths.
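A small C sketch of the latency arithmetic. The second helper estimates the delay contributed by a block-based audio buffer; the period size and period count in the usage note are hypothetical illustration values, not measurements from this system.

```c
#include <assert.h>

/* Number of samples that fit in a latency of latency_ms milliseconds. */
unsigned long latency_samples(unsigned long latency_ms,
                              unsigned long sample_rate_hz)
{
    return (latency_ms * sample_rate_hz) / 1000UL;
}

/* Delay in microseconds of `periods` buffers of `frames_per_period` frames. */
unsigned long long buffered_latency_us(unsigned long frames_per_period,
                                       unsigned long periods,
                                       unsigned long sample_rate_hz)
{
    assert(sample_rate_hz > 0);
    return (unsigned long long)frames_per_period * periods * 1000000ULL
           / sample_rate_hz;
}
```

latency_samples(10, 44100) gives 441. For comparison, two periods of 256 frames at 44.1 kHz already add up to about 11.6 ms (buffered_latency_us(256, 2, 44100) is 11609), before any filter delay is counted.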

 

Simon

