
Simple audio in-out code for the ADSP-BF706 EZ-KIT

I have an audio in-out program in C for the ADSP-BF706 EZ-KIT Mini, about 80 lines long. It's simple and easy to understand. It's also completely self-contained: it doesn't use any of the header files that the "TalkThrough_BF706Mini.c" program (supplied with the kit) uses. It includes a basic TWI driver, SPORT0 setup, and a configuration routine for the on-board codec (ADAU1761). With some simple modifications it can be used for filtering, both FIR and IIR. The file is attached. All are welcome to use it freely.

PatrickG

BF706_audio_inout.pdf
  • Hi Patrick,

    Thanks for the assembly code for clock configuration!

    I think delay loops after TWI transmit are not safe enough.
    This is my proposal for ADAU1761 configuration:

    void write_CODEC_reg(int8_t codec_address, int16_t reg_address, uint8_t reg_data)
    {
        *pREG_TWI0_CTL = 0x8c;                  // Set prescale and enable TWI
        *pREG_TWI0_CLKDIV = 0x3232;             // Set duty cycle
        *pREG_TWI0_MSTRADDR = codec_address;    // Address of CODEC
        *pREG_TWI0_TXDATA8 = reg_address >> 8;  // Address of register to set, MSB
        *pREG_TWI0_MSTRCTL = 0xc1;              // Send three bytes and enable transmit
        while((*pREG_TWI0_MSTRCTL >> 6) != 2);  // Wait until first byte is transmitted
        *pREG_TWI0_TXDATA8 = reg_address;       // Address of register to set, LSB
        while((*pREG_TWI0_MSTRCTL >> 6) != 1);  // Wait until second byte is transmitted
        *pREG_TWI0_TXDATA8 = reg_data;          // Data to write
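        while (!(*pREG_TWI0_ISTAT & BITM_TWI_ISTAT_MCOMP));  // Wait until the master transfer is complete
        *pREG_TWI0_ISTAT |= BITM_TWI_ISTAT_TXSERV;           // Clear the transmit-service flag
        *pREG_TWI0_ISTAT |= BITM_TWI_ISTAT_MCOMP;            // Clear the master-complete flag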
    }

    void configure_CODEC()
    {
        write_CODEC_reg(0x38, 0x4000, 0x01);    // Enable master clock, disable PLL
        write_CODEC_reg(0x38, 0x400a, 0x0b);    // Set left line-in gain to 0 dB
        write_CODEC_reg(0x38, 0x400c, 0x0b);    // Set right line-in gain to 0 dB
        write_CODEC_reg(0x38, 0x4015, 0x01);    // Set serial port master mode
        write_CODEC_reg(0x38, 0x4017, 0x00);    // Set CODEC default sample rate, 48 kHz
        write_CODEC_reg(0x38, 0x4019, 0x63);    // Set ADC to on, both channels
        write_CODEC_reg(0x38, 0x401c, 0x21);    // Enable left channel mixer
        write_CODEC_reg(0x38, 0x401e, 0x41);    // Enable right channel mixer
        write_CODEC_reg(0x38, 0x4023, 0xe7);    // Set left headphone volume to 0 dB
        write_CODEC_reg(0x38, 0x4024, 0xe7);    // Set right headphone volume to 0 dB
        write_CODEC_reg(0x38, 0x4029, 0x03);    // Turn on power, both channels
        write_CODEC_reg(0x38, 0x402a, 0x03);    // Set both DACs on
        write_CODEC_reg(0x38, 0x40f2, 0x01);    // DAC gets L, R input from serial port
        write_CODEC_reg(0x38, 0x40f3, 0x01);    // ADC sends L, R input to serial port
        write_CODEC_reg(0x38, 0x40f9, 0x7f);    // Enable all clocks
        write_CODEC_reg(0x38, 0x40fa, 0x03);    // Enable all clocks
    }
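
    The busy-wait loops above never return if the CODEC does not respond. A minimal sketch of a bounded poll follows, assuming a hypothetical helper `wait_for_mask()` and a 16-bit register access (neither is taken from the attached code; adjust the type to match your headers):

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Hypothetical helper, not part of the attached code: poll a status
       register until all bits in `mask` are set, giving up after
       `max_polls` reads so that a missing response cannot hang the program. */
    static bool wait_for_mask(volatile const uint16_t *reg, uint16_t mask,
                              uint32_t max_polls)
    {
        assert(reg != NULL);
        for (uint32_t i = 0; i < max_polls; i++) {
            if ((*reg & mask) == mask) {
                return true;    /* condition met within the poll budget */
            }
        }
        return false;           /* timed out; the caller decides what to do */
    }
    ```

    A call such as `wait_for_mask(pREG_TWI0_ISTAT, BITM_TWI_ISTAT_MCOMP, 100000)` could then replace the unbounded wait for MCOMP; the poll limit of 100000 is an arbitrary illustration, not a measured value.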

  • Ah yes. I have only tried release mode in assembly - no compiler optimisation.

  • These are the fully tested versions of the sample-by-sample
    and block-processing examples for the BF706 EZ-KIT Mini.
    In the ADC control register, Address 0x4019, Bit 6 (ADCPOL)
    is active to avoid negative (inverted) polarity of the input signal.
    The block-processing version can be used for FFT based
    algorithms like overlap add and overlap save.
    The block size can be changed in config.h.
    The software overhead is < 5%. In release mode more than
    95% of the processor cycles can be used for audio processing.

    attachments.zip
  • I have added a UART DMA control application to complement @PatrickG's and @UweS's excellent audio examples. The discussion is here:

    BF706 UART DMA Terminal for Audio and FIR filter Control 

  • Hello Patrick,
    the DMA block in-out issue is finally solved (I hope). After several months of frustration,
    Mike Smith (in "BF706 2D DMA Audio Loopback") proposed a solution:
    2D-DMA for double-buffering. According to my tests this works fine for very large blocks or
    even sample by sample. One DMA-interrupt per block is generated and the audio processing
    can be done in the RX_interrupt_handler.

    I would not have solved this without your posts!
    Best regards
    Uwe

    BF706_Audio_Filter_2018.zip
  • Why is double buffering necessary? I have clearly shown a method in my
    samples that uses DMA in circular mode, where the DMA runs by itself without
    double-buffering assistance. Of course the DMA can also run in linked-list
    mode, but it's not as if you are submitting data that needs write-behind
    via page flipping, as you do in graphics. There is a way to set up
    the DMA to run circularly with very large chunks of audio, even 1 second
    long!

    On Sat, Jul 14, 2018, 9:29 AM UweS <analog@analog-vm.hosted.jivesoftware.com>

  • Hello Mario,

    Double buffering is not necessary as long as you do sample by sample processing.
    This is often a good choice producing the lowest possible audio latency.
    It is also quite efficient (<5% overhead). More than 400 biquad filters can
    be performed at 48kHz sampling rate (32 bit, C++, inline, release mode).
    Blackfin+ is a very powerful processor!

    However, if block processing is needed (to implement FFT based algorithms like
    overlap save) some sort of double buffering is necessary:

    While one audio buffer is being processed, new samples are stored in a separate buffer.
    When a block is full, the recording and processing buffers are swapped.
    The same is true for audio output. One buffer is sent to the output while a separate
    buffer is filled with input. I have implemented this manually in BF706_Block_Filter.zip (main.cpp).
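
    The buffer swap described above can be sketched in plain C as follows. This is only a host-side model, not the code from BF706_Block_Filter.zip: the type `pingpong_t`, the functions `pingpong_init()` and `pingpong_push()`, and the block size of 4 are all hypothetical.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define BLOCK_SIZE 4    /* illustrative only; the real value lives in config.h */

    /* Hypothetical ping-pong state: one buffer fills while the other is processed. */
    typedef struct {
        int32_t buf[2][BLOCK_SIZE];
        int     fill;       /* index of the buffer currently being filled */
        size_t  count;      /* samples stored in the filling buffer so far */
    } pingpong_t;

    static void pingpong_init(pingpong_t *pp)
    {
        assert(pp != NULL);
        pp->fill = 0;
        pp->count = 0;
    }

    /* Store one input sample. Returns a pointer to the completed block when
       the filling buffer becomes full (the roles are then swapped), or NULL
       while the block is still incomplete. */
    static const int32_t *pingpong_push(pingpong_t *pp, int32_t sample)
    {
        assert(pp != NULL);
        pp->buf[pp->fill][pp->count++] = sample;
        if (pp->count < BLOCK_SIZE) {
            return NULL;
        }
        const int32_t *full = pp->buf[pp->fill];
        pp->fill ^= 1;      /* swap: new samples now go to the other buffer */
        pp->count = 0;
        return full;
    }
    ```

    The returned block stays valid only until the other buffer fills up, so the processing of one block has to finish within BLOCK_SIZE sample periods.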

    2D-DMA is a clever way to do this automatically: BF706 2D DMA Audio Loopback 

    Did you find another way to implement this?

    Best regards
    Uwe

  • In my books on the TI TMS320C6713 by Rulph Chassaing, he mentions ping-pong
    and double buffering, but circular DMA is better because it works
    preemptively while you write behind the bytes getting preempted. I show
    this method in one of my samples; it might be the one where I show Fourier
    series summation on the BF706. That sample shows how to transfer signals on the
    fly via your DMA method, but I modified it to work with very large buffers,
    so you can do a write-behind directly into the circular buffer. I have done
    an FPGA design for a defense contractor showing that a 1024-point FFT can run
    continuously via an efficient stereo-pair FIFO buffer on inputs and outputs.
    It definitely can be done! You want to make sure you don't overrun your FIFO, so
    tune it properly by choosing an acceptable size so that the front does not
    catch up to the tail. The Blackfins also allow linked-list buffers,
    probably for graphics buffers, which is most likely what you are describing.
    Or perhaps cascading filter buffers...
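
    The "front must not catch up to the tail" condition can be illustrated with a small software model of a circular buffer. This is a sketch under assumptions, not the code from the samples mentioned above: on the real hardware the DMA engine advances the write position, whereas here the hypothetical `ring_put()` plays that role, and the size of 8 is arbitrary.

    ```c
    #include <assert.h>
    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    #define RING_SIZE 8     /* illustrative; must be a power of two for the mask */

    /* Hypothetical software model of a circular audio buffer. */
    typedef struct {
        int32_t data[RING_SIZE];
        size_t  head;       /* total samples written (producer, e.g. the DMA) */
        size_t  tail;       /* total samples read (consumer, e.g. processing) */
    } ring_t;

    /* Number of unread samples currently held in the buffer. */
    static size_t ring_fill(const ring_t *r)
    {
        return r->head - r->tail;
    }

    /* Write one sample; fails when the front has caught up with the tail. */
    static bool ring_put(ring_t *r, int32_t s)
    {
        assert(r != NULL);
        if (ring_fill(r) == RING_SIZE) {
            return false;   /* overrun: unread data would be overwritten */
        }
        r->data[r->head++ & (RING_SIZE - 1)] = s;
        return true;
    }

    /* Read the oldest unread sample; fails when the buffer is empty. */
    static bool ring_get(ring_t *r, int32_t *out)
    {
        assert(r != NULL && out != NULL);
        if (ring_fill(r) == 0) {
            return false;
        }
        *out = r->data[r->tail++ & (RING_SIZE - 1)];
        return true;
    }
    ```

    Choosing RING_SIZE larger than the worst-case processing delay (in samples) keeps `ring_put()` from ever reporting an overrun.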

    On Sun, Jul 15, 2018, 4:41 AM UweS <analog@analog-vm.hosted.jivesoftware.com>

  • It would be interesting to compare the efficiencies of block processing and sample-by-sample processing. In the example code I give, sample-by-sample allows up to 8000 32-bit FIR filter coefficients at a sample rate of 48 kHz. For 16-bit, it's about 13000. Anyone done any speed comparisons with block mode? I have not implemented block mode FIR yet - am I correct in assuming it would be better (time domain, not FFT based filtering)?

  • Not very much can be gained by block processing in this case, since sample-by-sample
    processing is already quite efficient.
    At a clock frequency of fc = 400 MHz and a sampling frequency of fs = 48 kHz, fc / fs ≈
    8333 cycles are available for one audio sample (or one left/right frame).
    The Blackfin+ is able to compute a 32 bit FIR filter of N coefficients in N cycles
    (or 2*N coefficients in N cycles for 16 bit filters).

    Therefore, a filter length of 8333 is the theoretical maximum. 8000 is 96% of that maximum.
    No significant improvement is possible unless other algorithms for block FIR are used.
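
    The cycle budget and the one-multiply-accumulate-per-coefficient structure can be sketched as follows. The functions `cycles_per_sample()` and `fir_sample()` are hypothetical and written in portable C; a plain C loop like this is not guaranteed to reach one coefficient per cycle, which depends on the compiler and on hand-tuned code.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    /* Cycles available per sample period: core clock / sampling rate. */
    static uint32_t cycles_per_sample(uint32_t fc_hz, uint32_t fs_hz)
    {
        assert(fs_hz != 0);
        return fc_hz / fs_hz;
    }

    /* Hypothetical direct-form FIR, one output sample per call.
       `state` is a circular delay line of length n and `*pos` is the
       index of the newest sample in it. */
    static int64_t fir_sample(const int32_t *coef, int32_t *state, size_t n,
                              size_t *pos, int32_t x)
    {
        assert(coef != NULL && state != NULL && pos != NULL && n > 0);
        *pos = (*pos + 1) % n;              /* advance to the oldest slot */
        state[*pos] = x;                    /* overwrite it with the new sample */
        int64_t acc = 0;
        size_t idx = *pos;
        for (size_t k = 0; k < n; k++) {    /* one multiply-accumulate per coefficient */
            acc += (int64_t)coef[k] * state[idx];
            idx = (idx == 0) ? n - 1 : idx - 1;
        }
        return acc;
    }
    ```

    With fc = 400 MHz and fs = 48 kHz, `cycles_per_sample()` gives 8333, so a filter of 8000 coefficients uses about 96% of the budget at one coefficient per cycle.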

    For some algorithms like biquad filters, the function call overhead can be an issue with
    sample by sample processing. In this case C/C++ inline functions can be advantageous.
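
    As an illustration of the inline approach, here is a minimal biquad sketch. It is an assumption-laden example, not the code from the attachments: the names `biquad_t`, `biquad_step()` and `biquad_cascade()` are hypothetical, the structure is Direct Form II transposed with a0 normalized to 1, and `float` is used only for readability (the figures quoted in this thread refer to 32-bit fixed-point code).

    ```c
    #include <assert.h>
    #include <stddef.h>

    /* Hypothetical biquad section; coefficients assume a0 == 1. */
    typedef struct {
        float b0, b1, b2;   /* feed-forward coefficients */
        float a1, a2;       /* feedback coefficients */
        float z1, z2;       /* state variables */
    } biquad_t;

    /* Process one sample through one section. Declared static inline so the
       compiler can remove the call overhead inside the per-sample loop. */
    static inline float biquad_step(biquad_t *f, float x)
    {
        float y = f->b0 * x + f->z1;
        f->z1 = f->b1 * x - f->a1 * y + f->z2;
        f->z2 = f->b2 * x - f->a2 * y;
        return y;
    }

    /* Run one sample through a cascade of n sections. */
    static inline float biquad_cascade(biquad_t *sections, size_t n, float x)
    {
        assert(sections != NULL || n == 0);
        for (size_t i = 0; i < n; i++) {
            x = biquad_step(&sections[i], x);
        }
        return x;
    }
    ```

    With a budget of about 8333 cycles per sample, 400 sections leave roughly 20 cycles each, which is why a function call per section would be a noticeable fraction of the cost.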