
SoC processor too slow for real-time processing

Dear all,

We are trying to capture data from an ADRV9009 using a ZCU102 board, and we use the No-OS project to capture the data in real time.

During our work, we observed that the No-OS code successfully captures data blocks at a very high rate (122.22 MSPS). However, the processor in the SoC is very slow and cannot process the data in the time between captured blocks. To estimate the available processing time, we performed the following experiment.

In the headless.c file, we put the axi_dmac_transfer() call in a while(1) loop, like this:

while (1) {
    axi_dmac_transfer(rx_dmac,
                      DDR_MEM_BASEADDR + 0x800000,
                      NUM_OF_SAMPLES_PER_CHANNEL * TALISE_NUM_CHANNELS *
                      DIV_ROUND_UP(talInit.jesd204Settings.framerA.Np, 8));
}

In the axi_dmac.c file, we added the following global counters:

uint32_t timeout0 = 0, timeout1 = 0, timeout2 = 0,  count_time = 0;

Then, in the axi_dmac_transfer() function, we incremented these counters as follows (the modified code after line 267 is shown):

if (dmac->flags & DMA_CYCLIC)
    return SUCCESS;
count_time++;
/* Wait until the new transfer is queued. */
do {
    timeout0++;
    axi_dmac_read(dmac, AXI_DMAC_REG_START_TRANSFER, &reg_val);
} while (reg_val == 1);

/* Wait until the current transfer is completed. */
do {
    timeout1++;
    axi_dmac_read(dmac, AXI_DMAC_REG_IRQ_PENDING, &reg_val);
    if (reg_val == (AXI_DMAC_IRQ_SOT | AXI_DMAC_IRQ_EOT))
        break;
} while (!dmac->big_transfer.transfer_done);
if (reg_val != (AXI_DMAC_IRQ_SOT | AXI_DMAC_IRQ_EOT))
    axi_dmac_write(dmac, AXI_DMAC_REG_IRQ_PENDING, reg_val);

/* Wait until the transfer with the ID transfer_id is completed. */
do {
    timeout2++;
    axi_dmac_read(dmac, AXI_DMAC_REG_TRANSFER_DONE, &reg_val);
} while ((reg_val & (1u << transfer_id)) != (1u << transfer_id));

if (count_time >= 1000) {
    count_time = 0;
}

We set a breakpoint inside the block below to read the values of these counters after 1000 executions of the axi_dmac_transfer() function:

if (count_time >= 1000) {
    count_time = 0;
}

We noted the following counter values for two different data sizes (we set the data size by varying NUM_OF_SAMPLES_PER_CHANNEL in the axi_dmac_transfer() call in the headless.c file).

If we set NUM_OF_SAMPLES_PER_CHANNEL to 128:

count_time = 1000

timeout1 = 4079

This suggests that the SoC processor can execute only about 4 instructions (timeout1 / count_time ≈ 4) between capturing two blocks of 128 samples.

If we set NUM_OF_SAMPLES_PER_CHANNEL to 3712:

count_time = 1000

timeout1 = 84909

This suggests that the SoC processor can execute only about 85 instructions (timeout1 / count_time ≈ 85) between capturing two blocks of 3712 samples.

Hence, we have concluded that the SoC processor is not able to process the data between the capture of two blocks.
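
As a rough cross-check of these counts, the duration of one block can be estimated from the sample rate, and dividing it by the number of loop iterations per transfer gives the average time per iteration of the timeout1 loop. The sketch below is only illustrative: it assumes each transfer is paced by the 122.22 MSPS data rate and that the timeout1 loop spans essentially the whole transfer, and the function names are hypothetical and not part of No-OS. Note that timeout1 counts polling-loop iterations, and each iteration includes a register read through axi_dmac_read().

```c
#include <assert.h>

/* Duration of one block in microseconds.
 * Assumption: all channels are sampled in parallel, so the block time
 * depends only on the samples per channel and the sample rate (MSPS). */
static double block_duration_us(unsigned samples_per_channel, double rate_msps)
{
    assert(rate_msps > 0.0);
    return (double)samples_per_channel / rate_msps;
}

/* Average time per polling-loop iteration in nanoseconds, given the
 * block duration, the number of transfers observed (count_time) and the
 * total number of loop iterations counted over them (timeout1). */
static double time_per_poll_ns(double block_us, unsigned long transfers,
                               unsigned long polls)
{
    assert(polls > 0);
    return block_us * 1000.0 * (double)transfers / (double)polls;
}
```

With the counter values above, block_duration_us(128, 122.22) is roughly 1.05 us and time_per_poll_ns() for 1000 transfers and 4079 iterations is about 0.26 us per iteration; for 3712 samples the block lasts roughly 30.4 us and each iteration takes about 0.36 us.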

Can anyone advise how we can perform real-time processing on the SoC?

Many thanks,

Best regards,

Avais

Top Replies

  • Hi,

    If I understand this correctly, I think you should look at timeout2 rather than timeout1 when a lot of waiting is happening. In any case, could you redo the test and show the values of all the variables (timeout0, timeout1, timeout2) when count_time reaches 1000?

  • These are all the counter values.
    For 3712 samples:
    count_time   =   1000
    timeout0     =   1000
    timeout1     =   84909
    timeout2     =   1000
    For 128 samples:
    count_time   =   1000
    timeout0     =   1000
    timeout1     =   4079
    timeout2     =   1000
    Many thanks,
    Best regards,
    Avais
  • Hello,

    One solution would be the use of interrupts and the transfer of large chunks of data, so that between the initiation of a DMA transfer and its completion, you have enough time to process the data from the previous one. The maximum supported transfer size is the max_length member of the axi_dmac structure (current no-OS DMA driver).

    You can take a look at the following thread:  https://ez.analog.com/microcontroller-no-os-drivers/f/q-a/90838/ad9361-no-os-rx-dma-cyclic-mode/393269

    Furthermore, you can use ping-pong buffers to ensure that you have enough time to process the data.
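
    A ping-pong scheme can be sketched as follows. This is a minimal illustration and not part of the No-OS driver: struct pingpong, the pp_* functions, and the buffer size are hypothetical, and the call that actually starts the DMA transfer into the fill buffer is left out. pp_on_transfer_done() is meant to be called from the transfer-complete interrupt handler; the overrun counter indicates that a block was not processed before the next one completed.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    #define PP_BUF_WORDS 1024u  /* hypothetical buffer size in 32-bit words */

    struct pingpong {
        uint32_t buf[2][PP_BUF_WORDS];
        volatile unsigned fill;      /* index of the buffer the DMA writes to */
        volatile int ready;          /* index of a completed buffer, or -1 */
        volatile unsigned overruns;  /* blocks completed before release */
    };

    static void pp_init(struct pingpong *pp)
    {
        assert(pp != NULL);
        pp->fill = 0;
        pp->ready = -1;
        pp->overruns = 0;
    }

    /* Buffer to hand to the DMA for the next transfer. */
    static uint32_t *pp_fill_buf(struct pingpong *pp)
    {
        return pp->buf[pp->fill];
    }

    /* Call when a transfer completes (for example from the DMA interrupt). */
    static void pp_on_transfer_done(struct pingpong *pp)
    {
        if (pp->ready >= 0)
            pp->overruns++;          /* previous block was not released in time */
        pp->ready = (int)pp->fill;
        pp->fill ^= 1u;              /* the DMA fills the other buffer next */
    }

    /* Main loop: returns the completed buffer, or NULL if none is pending. */
    static uint32_t *pp_take_ready(struct pingpong *pp)
    {
        return (pp->ready < 0) ? NULL : pp->buf[pp->ready];
    }

    /* Main loop: call after the completed buffer has been processed. */
    static void pp_release(struct pingpong *pp)
    {
        pp->ready = -1;
    }
    ```

    In the main loop, the application would start a transfer into pp_fill_buf(), process the buffer returned by pp_take_ready() while that transfer runs, and then call pp_release(). A growing overruns count means processing is slower than acquisition.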

    Regards,

    George

  • Dear George,

    Many thanks for your kind reply. I can foresee a problem with the proposed solution of using large chunks of data. With larger chunks we will certainly have more time between the reception of two consecutive chunks; however, the amount of data per chunk also increases, so the time required to process it increases as well. Hence, the problem remains the same.
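
    The argument above can be made concrete with a per-sample cycle budget. The sketch below assumes that the processing cost grows linearly with the number of samples and uses a hypothetical CPU clock of 1200 MHz (an assumption for illustration, not a measured value for this design); the function names are illustrative only.

    ```c
    #include <assert.h>

    /* CPU cycles available per sample (per channel) when processing must
     * keep up with the incoming data rate. */
    static double cycles_per_sample(double cpu_mhz, double rate_msps)
    {
        assert(rate_msps > 0.0);
        return cpu_mhz / rate_msps;
    }

    /* CPU cycles available while one block of n samples is being received. */
    static double cycles_per_block(unsigned n, double cpu_mhz, double rate_msps)
    {
        return (double)n * cycles_per_sample(cpu_mhz, rate_msps);
    }
    ```

    Under these assumptions the budget per block grows in proportion to the block size, while the budget per sample stays at about 9.8 cycles at 122.22 MSPS, regardless of the chunk size.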

    In our experiment, as described in the previous message:

    For number of samples = 128

    count_time = 1000,            timeout1  =  4079,            timeout1 / count_time = 4079 / 1000 = 4.079.

    For number of samples = 3712

    count_time = 1000,           timeout1  =  84909,          timeout1 / count_time = 84909 / 1000 = 84.909.

    From this experiment, we have deduced that the processor can execute at most about 4 instructions between capturing two blocks of 128 samples, and at most about 85 instructions between capturing two blocks of 3712 samples. This suggests that the number of instructions that can be executed between the reception of two data chunks is very low; therefore, we have concluded that the processor is very slow and is not capable of handling the data in real time.

    Could you please tell us whether our conclusion is correct, or whether we should perform some other experiment to measure the number of instructions that can be executed between the reception of two consecutive data chunks?

    Further, the idea of a ping-pong buffer will only work if one chunk of data can be completely processed while the next chunk is being received. Our experiment indicates that the number of instructions that can be executed between the reception of two data chunks is very small. Therefore, we think that a ping-pong buffer does not solve the problem. Our main problem is that the SoC processor is very slow.

  • Hello,

    I believe that computing the number of instructions between two moments in time is a more complex problem. However, the conclusion is valid: depending on the size of the chunk of read data, you will have more or less time for data processing.
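
    One way to measure the available time directly, instead of inferring it from loop counts, is to read a free-running timer before and after the wait. The following is a minimal sketch and not No-OS code: timed_wait(), the callback types, and the demo_* helpers are hypothetical, and on the target the tick source would be whatever hardware timer the platform provides.

    ```c
    #include <assert.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef uint64_t (*tick_fn)(void);  /* current value of a free-running counter */
    typedef int (*done_fn)(void *ctx);  /* nonzero once the awaited event occurred */

    struct wait_stats {
        uint64_t polls;  /* number of polling-loop iterations */
        uint64_t ticks;  /* timer ticks elapsed while waiting */
    };

    /* Poll done() until it reports completion, recording both the number
     * of iterations and the elapsed timer ticks. */
    static struct wait_stats timed_wait(done_fn done, void *ctx, tick_fn now)
    {
        struct wait_stats s = {0, 0};
        uint64_t start;

        assert(done != NULL && now != NULL);
        start = now();
        do {
            s.polls++;
        } while (!done(ctx));
        s.ticks = now() - start;
        return s;
    }

    /* Convert timer ticks to microseconds for a timer running at tick_hz. */
    static double ticks_to_us(uint64_t ticks, double tick_hz)
    {
        assert(tick_hz > 0.0);
        return (double)ticks * 1e6 / tick_hz;
    }

    /* Demo stand-ins for hardware: each poll advances a fake timer by 25
     * ticks and completes when the countdown in ctx reaches zero. */
    static uint64_t demo_tick;

    static uint64_t demo_now(void)
    {
        return demo_tick;
    }

    static int demo_done(void *ctx)
    {
        int *remaining = ctx;

        demo_tick += 25;
        return --(*remaining) == 0;
    }
    ```

    Dividing the elapsed time by the number of polls gives the cost of one polling iteration, and the elapsed time itself is the window that would be available for processing if the wait were replaced by useful work.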

    Further, the problem seems to be one of optimization, both in terms of hardware (number of processors, capacity for parallel execution, etc.) and software (multi-threading, multi-processing, code parallelization, etc.). These adjustments are specific to each application and should be carefully analyzed in the specification phase.

    Regards,

    George