2009-03-26 15:28:32 SPORT Latency test on BF527
Giuseppe Olivadoti (UNITED STATES)
Message: 71629
When testing the latency between two bursts of data out from the SPORT I see about 500 uSec of latency from when a SPORT transmission is finished until the time when the next burst begins.
The test software performs the following steps:
1. Transfer an 8-byte package via the SPORT interface using DMA.
2. When the transfer is done, set a programmable flag.
3. Reset the programmable flag.
4. Repeat 1-3
5. Sleep for 1 sec
6. Loop back to 1
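For readers without the attachment, steps 1-4 above can be sketched roughly as follows. This is not the attached SPORT_527Kit_main.cpp; the function name run_burst, the packet contents, the repeat count, and the convention of writing "1"/"0" to the GPIO descriptor are all assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/*
 * Hypothetical sketch of one burst (steps 1-4).
 * sport_fd and gpio_fd are already-open descriptors; how they are opened
 * and what the GPIO driver expects to be written are assumptions.
 * Returns 0 on success, -1 if any write fails or is short.
 */
static int run_burst(int sport_fd, int gpio_fd, int repeats)
{
    static const unsigned char pkt[8] = {0, 1, 2, 3, 4, 5, 6, 7};

    assert(repeats > 0);
    for (int i = 0; i < repeats; i++) {
        /* Step 1: blocking write of the 8-byte package; assumed to
         * return once the driver reports the DMA transfer as done. */
        if (write(sport_fd, pkt, sizeof pkt) != (ssize_t)sizeof pkt)
            return -1;
        /* Step 2: set the programmable flag (scope trigger). */
        if (write(gpio_fd, "1", 1) != 1)
            return -1;
        /* Step 3: reset the programmable flag. */
        if (write(gpio_fd, "0", 1) != 1)
            return -1;
    }
    return 0;
}
```

Steps 5 and 6 would then be an outer loop that calls run_burst() and sleeps for one second with sleep(1). Each iteration costs three system calls, so the gap seen on the scope includes that overhead as well as any delay inside the driver.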
The BF527 EZKit runs Kernel snapshot revision 7780 from 02/20/2009 and CPU clocks were set to provide the following frequencies:
· Core CLK 400 MHz
· System CLK 100 MHz
See attached scope plots for more detail.
The delay between when the actual transfer is done and when the CPU is ready to transfer the next package may not only be interrupt latency, but also SPORT driver / kernel latency. A similar experiment using the SPI instead of the SPORT yields latency on the order of 50 uSec, compared with 500 uSec on the SPORT.
I wonder if someone can point out why this behaves like this, and if I can modify my test to improve the latency.
Attached is the simple user space application that shows the issue. It is written so that you can trigger a scope on the GPIOs.
SPORT plot1.bmp
SPORT plot2.bmp
SPORT_527Kit_main.cpp
2009-03-26 15:56:03 Re: SPORT Latency test on BF527
Robin Getz (UNITED STATES)
Message: 71632
Giuseppe:
I'm not sure what you are doing wrong.
We did some benchmarks, using a function generator hooked up to a GPIO IRQ. In the ISR I wiggle a second TEST GPIO high for 1 us.
We measured the delay between GPIO IRQ asserted and TEST GPIO asserted.
The typical interrupt latency of an idle system is approx 2.7us (min 2.6us, max 5.2us).
Under high process and IO load (Ethernet ping flood and full UART activity), the average interrupt latency is approx 6us (min 2.6us, max 35us).
root:/> version
kernel: Linux release 2.6.28.7-ADI-2009R1-pre-svn6218, build #1439 Wed Mar 25 13:35:15 CET 2009
toolchain: bfin-uclinux-gcc release gcc version 4.1.2 (ADI svn)
user-dist: release svn-7922, build #2888 Wed Mar 25 13:34:32 CET 2009
root:/> cat /proc/cpuinfo
processor : 0
vendor_id : Analog Devices
cpu family : 0x27c8
model name : ADSP-BF537 500(MHz CCLK) 125(MHz SCLK) (mpu off)
stepping : 3
cpu MHz : 500.000/125.000
bogomips : 997.37
Calibration : 498688000 loops
cache size : 16 KB(L1 icache) 32 KB(L1 dcache-wb) 0 KB(L2 cache)
dbank-A/B : cache/cache
icache setup : 4 Sub-banks/4 Ways, 32 Lines/Way
dcache setup : 2 Super-banks/4 Sub-banks/2 Ways, 64 Lines/Way
board name : ADI BF537-STAMP
board memory : 65536 kB (0x00000000 -> 0x04000000)
kernel memory : 57336 kB (0x00001000 -> 0x037ff000)
2009-03-26 16:00:44 SPORT Latency test on BF527
Michael Hennerich (GERMANY)
Message: 71633
Hi Giuseppe,
In Linux, device drivers typically live in kernel (supervisor) space and not in user space apps.
So if you invoke a system call write(sport_fd, ...) and then do write(gpio_fd, ...), you pay the system call overhead several times over.
Interrupt latency on Blackfin Linux is somewhere between 1.6us and 30us: approx. 2.6us on average, and around 6us for a very busy system.
If you move your code into the kernel, you should see the next burst start typ. 2.6us after the DMA DONE interrupt of the previous transfer triggered.
The SPI driver does busy-waiting I/O, so it's much more likely that the system call returns to the invoking process.
-Michael
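To get a feel for how much of the gap is plain system call overhead, the cost of one write() can be timed from user space. The sketch below is only an illustration: the helper name avg_write_ns is hypothetical, and timing writes to /dev/null measures syscall entry/exit cost rather than anything specific to the SPORT or GPIO drivers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <time.h>
#include <unistd.h>

/*
 * Hypothetical helper: average wall-clock cost, in nanoseconds, of one
 * 1-byte write() on fd. Returns -1 if a clock read or a write fails.
 */
static int64_t avg_write_ns(int fd, int iterations)
{
    struct timespec t0, t1;
    const char byte = '0';

    assert(iterations > 0);
    if (clock_gettime(CLOCK_MONOTONIC, &t0) != 0)
        return -1;
    for (int i = 0; i < iterations; i++) {
        if (write(fd, &byte, 1) != 1)
            return -1;
    }
    if (clock_gettime(CLOCK_MONOTONIC, &t1) != 0)
        return -1;

    /* Total elapsed time in ns, then averaged over all writes. */
    int64_t ns = (int64_t)(t1.tv_sec - t0.tv_sec) * 1000000000LL
               + (int64_t)(t1.tv_nsec - t0.tv_nsec);
    return ns / iterations;
}
```

Calling it on a descriptor from open("/dev/null", O_WRONLY) gives a baseline per-call cost; multiplying by the number of write() calls per burst gives a rough lower bound on what the user space test adds on top of the driver.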
2009-03-30 11:22:30 Re: SPORT Latency test on BF527
Giuseppe Olivadoti (UNITED STATES)
Message: 71783
Thanks,
Looking deeper into the issue it turns out that the SPORT TX IRQ handler code has a 500 uSec delay. I assume this is to allow the SPORT TX buffer to drain after the DMA is done. Any advice on how much this could safely be shortened by? This would account for almost all of the latency differences between the GPIO tests you have run and the SPORT test listed above.
static irqreturn_t dma_tx_irq_handler(int irq, void *dev_id)
{
    struct sport_dev *dev = dev_id;
    unsigned int status;

    pr_debug("%s enter\n", __func__);
    status = get_dma_curr_irqstat(dev->dma_tx_chan);
    while (status & DMA_RUN) {
        status = get_dma_curr_irqstat(dev->dma_tx_chan);
        pr_debug("status:0x%04x\n", status);
    }

    status = dev->regs->stat;
    while (!(status & TXHRE)) {
        pr_debug("%s status:%x\n", __func__, status);
        udelay(1);
        status = *(volatile unsigned short *)&dev->regs->stat;
    }

    /* Wait for the last byte sent out */
    udelay(500); //??? Why do we need this delay
    pr_debug("%s status:%x\n", __func__, status);

    dev->regs->tcr1 &= ~TSPEN;
    SSYNC();
    disable_dma(dev->dma_tx_chan);

    dev->wait_con = 1;
    wake_up(&dev->waitq);

    /* Clear the interrupt status */
    clear_dma_irqstat(dev->dma_tx_chan);
    return IRQ_HANDLED;
}
2009-03-30 16:25:19 Re: SPORT Latency test on BF527
Robin Getz (UNITED STATES)
Message: 71792
Giuseppe:
All those udelays look like they are *bad*, and are working around the lack of tc_drain-type functionality on the SPORT. You know when the data goes into the buffer, but not when it comes out on the wire. Is there an equivalent in the SPORT that we missed?
Since those were added in 2006 (by someone who is no longer here), I'm not sure we are going to find out where 500 came from.
-Robin
2009-03-30 18:14:58 Re: SPORT Latency test on BF527
Mike Frysinger (UNITED STATES)
Message: 71800
if that's true, it's going to make implementing tcdrain() on a SPORT/UART messy as we will have to overestimate the character time and sleep for that long ...
2009-03-30 23:06:12 Re: SPORT Latency test on BF527
Sonic Zhang (CHINA)
Message: 71804
The SPORT clock in the common SPORT driver can be kept enabled for as long as the device is open. The delay in each write operation can then be dropped.
2009-03-31 06:56:24 Re: SPORT Latency test on BF527
Cliff Cai (CHINA)
Message: 71869
500us seems too long. When the TX hold register is empty, we just need to wait around 33 clocks to ensure that the last bit has been shifted out.
Cliff
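The arithmetic behind that suggestion can be sketched as follows. If roughly 33 serial clocks are enough after TXHRE is set, the wait in microseconds depends only on the SPORT serial clock rate. The helper name sport_drain_delay_us, the constant name, and passing the clock rate in Hz are assumptions for illustration and are not part of the existing driver; the result is rounded up so that the wait is never too short.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption taken from the post above: about 33 serial clocks are
 * needed after the TX hold register is empty. */
#define SPORT_DRAIN_CLOCKS 33u

/*
 * Hypothetical helper: microseconds to wait for the last word to shift
 * out, given the transmit serial clock in Hz. Computes
 * ceil(33 * 1e6 / tsclk_hz) in 64-bit math to avoid overflow.
 */
static inline uint32_t sport_drain_delay_us(uint32_t tsclk_hz)
{
    assert(tsclk_hz > 0);
    uint64_t num = (uint64_t)SPORT_DRAIN_CLOCKS * 1000000u;
    return (uint32_t)((num + tsclk_hz - 1) / tsclk_hz);
}
```

Under this assumption a 1 MHz serial clock would need 33 us, and the existing 500 us only becomes necessary for a serial clock of about 66 kHz.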