2011-05-12 09:32:39 ATAPI driver
Philippe VERDIER (FRANCE)
Message: 100615
Dear Sir,
We have developed a multi-DSP application used to analyze and record analog
signals on hard disk. Up to now the recording process has used an ADSP21065L,
which we want to replace with a BF548.
The input data are read from a global shared SRAM memory connected to the BF548
via the external asynchronous memory interface. The hard disk is connected to
the BF548 via the ATAPI bus on the GPIO pins (not the asynchronous bus). For the
new design we are using uCLinux (release 2010R1) to develop our software
application. To access the global 32-bit SRAM memory we have developed a
simple char driver using DMA (memory DMA stream 2 of the DMAC1 DMA
controller). To access the hard disk we are using the ATAPI driver of uCLinux,
opening the device (/dev/sda) directly because we have our own file system.
We have developed a test program in order to measure the throughput from the
global memory to the hard disk. The results are as follows:
- read from external global SRAM to internal L1 buffer (DMA of 2048 longs): ~13.96 MByte/s
- write to disk from L1 buffer (in blocks of 2048 longs): 13 MByte/s
- read from SRAM to L1 buffer and write to disk (in blocks of 2048 longs): ~7.36 MByte/s
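NOTE: the last result is better than expected. If the two steps ran strictly
one after the other, the combined rate would be about 6.73 MByte/s (the
harmonic combination of the first two throughputs, not their arithmetic
average), so the ATAPI driver must be using some kind of non-blocking DMA.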
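For reference, the two bounds on the combined test can be worked out from the standalone rates. When two stages run strictly one after the other, their per-byte times add, so the rates combine as a*b/(a+b); when they overlap perfectly, the slower stage sets the rate. A minimal sketch in C, where the function names are illustrative and do not come from the original test program:

```c
/* Combined throughput when two stages run strictly one after the other.
 * The time per byte is 1/a + 1/b, so the rate is a*b / (a + b). */
double serial_rate(double a, double b)
{
    return (a * b) / (a + b);
}

/* Upper bound when the two stages overlap perfectly:
 * the slower stage limits the pipeline. */
double overlapped_rate(double a, double b)
{
    return a < b ? a : b;
}
```

With the figures above, `serial_rate(13.96, 13.0)` gives about 6.73 MByte/s and `overlapped_rate(13.96, 13.0)` gives 13 MByte/s, so a measured 7.36 MByte/s sits just above the fully serial case.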
In order to optimize the record process we have modified our global shared
memory driver by adding a non-blocking read mode. In this mode the read
function returns as soon as the DMA from the global SRAM memory to a local
buffer is started.
At the beginning of the function we have also added a call to "wait_for_completion_interruptible" in order to
wait for the completion of the previous non-blocking DMA transfer.
The operation is as follows:
- 2 buffers of 2048 longs are allocated in the internal L1 SRAM (one in the L1A
bank and the other in L1B)
- a first blocking read from the global memory is started to fill the first
buffer
- we switch the driver to non-blocking mode
- in a loop, for the specified record size (1 GByte):
    - non-blocking read from the global memory into the free buffer
    - write the full buffer to the hard disk
- wait for the end of the last global memory read
- write the last buffer to the hard disk
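The sequence above can be sketched in C as a two-buffer (ping-pong) loop. This is only an illustration under stated assumptions: the shared SRAM and the disk are simulated with plain arrays, and `struct sim`, `read_start`, `read_wait`, `disk_write` and `record_blocks` are hypothetical names, not the real driver or test program. In the real driver the wait is done at the start of the next read call; here it is written out explicitly.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_LONGS 2048u   /* block size used in the tests above */

/* Stand-in for the hardware: SRAM source and disk sink are plain arrays. */
struct sim {
    const uint32_t *sram;   /* simulated global shared SRAM */
    uint32_t *disk;         /* simulated disk contents */
    size_t rd_pos;          /* next long to read from sram */
    size_t wr_pos;          /* next long to write to disk */
    int pending;            /* 1 while a "DMA" read is outstanding */
};

/* Hypothetical non-blocking read: real hardware would only start the DMA. */
static void read_start(struct sim *s, uint32_t *dst)
{
    memcpy(dst, s->sram + s->rd_pos, BLOCK_LONGS * sizeof(uint32_t));
    s->rd_pos += BLOCK_LONGS;
    s->pending = 1;
}

/* Hypothetical wait: real hardware would block until the DMA completes. */
static void read_wait(struct sim *s)
{
    s->pending = 0;
}

/* Hypothetical blocking disk write of one block. */
static void disk_write(struct sim *s, const uint32_t *src)
{
    memcpy(s->disk + s->wr_pos, src, BLOCK_LONGS * sizeof(uint32_t));
    s->wr_pos += BLOCK_LONGS;
}

/* Record n_blocks blocks, alternating between the two buffers. */
void record_blocks(struct sim *s, uint32_t *buf[2], size_t n_blocks)
{
    int full = 0;           /* index of the buffer holding data to write */
    size_t i;

    if (n_blocks == 0)
        return;

    /* Prime the pipeline with one blocking read. */
    read_start(s, buf[full]);
    read_wait(s);

    for (i = 1; i < n_blocks; i++) {
        int idle = full ^ 1;

        read_start(s, buf[idle]);   /* refill the free buffer ...        */
        disk_write(s, buf[full]);   /* ... while the full one is written */
        read_wait(s);               /* wait for the pending read to end  */
        full = idle;
    }

    disk_write(s, buf[full]);       /* write the last buffer */
}
```

In this simulation the copy happens inside `read_start`, so nothing actually overlaps; the sketch only shows the ordering of the calls and which buffer is in use at each step.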
We expected to get a rate of around 13 MByte / s (the time of writing a block
of 2048 longs on the hard disk is comparable to the time of reading a block of
the same size in global memory).
The measured throughput is 8.77 MByte / s.
I sent this information to the Analog Devices "processor support" team to ask whether there is a hardware conflict between the global memory DMA and the ATAPI DMA. Their answer is no (see the attached file mailanalogpbdma).
I tried several uClinux kernel configurations but failed to improve the write speed to disk (changed the preemption model, changed the kernel timer frequency, enabled "DMA priority over core activity"...).
We are using the DASP signal of the IDE bus to drive an LED which indicates disk activity. When the data written to the disk are read directly from external DDR or from internal L1 memory, the LED stays on for the whole recording time. But if the data are first read from the global shared SRAM memory (with or without non-blocking DMA), the activity LED flashes at a few Hz, as if something were blocking the system. Everything works well when the two operations, read from shared memory to L1 and write from L1 to disk, are performed separately.
Is it possible that the ATAPI driver blocks the DMAC1 controller during disk access?
Or do you have another explanation and/or workaround?
Regards,
Philippe VERDIER
ConfiguClinuxBF548.txt
mailanalogpbdma.txt
bootlinuxbf548.txt
2011-06-01 04:30:49 Re: ATAPI driver
Sonic Zhang (CHINA)
Message: 100969
All read/write operations to ATA registers are blocking in DMA mode, which may add CPU overhead. However, data transfer between DRAM and the ATA device is not blocking.
2011-06-01 05:01:42 Re: ATAPI driver
Sonic Zhang (CHINA)
Message: 100970
Could you try running the following two of your test cases concurrently, against two different buffers in L1? The result will tell you whether it is an L1 access conflict.
- read from external global SRAM to internal L1 buffer (DMA of 2048 longs): ~13.96 MByte/s
- write to disk from L1 buffer (in blocks of 2048 longs): 13 MByte/s