[#5721] Sometimes kernel locks while accessing SPI devices
Submitted By: Eduardo Tagle
Open Date: 2009-11-24 09:40:12
Priority: Medium High
Assignee: Barry Song
Status: Open
Fixed In Release: N/A
Found In Release: 2010R1
Release: 2010-RC1
Category: Drivers
Board: Custom
Processor: BF532
Silicon Revision: bf532-0.5
Is this bug repeatable?: Yes
Resolution: Fixed
Uboot version or rev.:
Toolchain version or rev.: gcc version 4.1.2 (ADI svn)
App binary format: N/A
Summary: Sometimes kernel locks while accessing SPI devices
Details:
Lately, I have been facing random kernel lockups while accessing an SPI device connected to the Blackfin SPI bus. The system seems to freeze, but it does not happen every time. I suspected the SPI driver, so I compiled a debug version of it to find out where the problem is.
However, when the SPI driver (spi_bfin5xx.c) is compiled with debugging messages, the random lockups disappear and the system works extremely reliably.
I started to suspect that the problem could be timing related, perhaps a race condition, so I commented out all debugging messages and selectively uncommented some of them, trying to reproduce the problem.
I found out that when another driver (using the SPI bus driver) queues a message, the message is always queued, but sometimes, instead of starting the message transfer, the hardware simply freezes.
I won't tell exactly how much time it took me to trace this problem (it is a long story, and it is meaningless here), but I finally found the piece of code that _sometimes_ works, _sometimes_ locks the system, and works reliably if debugging messages are turned on.
Every time a new message is started, the function bfin_spi_restore_state() is called. That is the last function executed before the system locks. If you look carefully, it disables the SPI, reconfigures it, and re-enables it. I found that if I comment out all debugging messages except the one in that function, the driver works reliably. But if I comment out that debugging message and uncomment all the others in the driver, the system randomly locks up.
So I started to suspect that the SPI peripheral perhaps requires some time before being re-enabled. I moved the debug message to the start of the function, and the random lockups were still there. I added a CSYNC() just after bfin_spi_disable() and the problem disappeared (and the driver started to work reliably).
Well, after all those experiments, I suspect the weak read/write ordering of the Blackfin could be causing this problem. So I have disassembled the function and commented it:
000001b8 <_bfin_spi_restore_state>:
1b8: 08 32 P1 = R0;
/* clear status and disable clock */
1ba: f8 63 R0 = -0x1 (X); /* R0=0xffffffff( -1) */
1bc: 8a ac P2 = [P1 + 0x8];
1be: 08 e5 1a 00 P0 = [P1 + 0x68];
1c2: 42 6c P2 += 0x8; /* ( 8) */
1c4: 10 97 W[P2] = R0; /* this is the write_STAT(drv_data, BIT_STAT_CLR); */
/* inlined bfin_spi_disable() - Disables SPI */
1c6: 8a ac P2 = [P1 + 0x8];
1c8: 10 95 R0 = W[P2] (Z);
1ca: 81 e1 ff bf R1 = 0xbfff (Z); /* R1=0xbfff(49151) */
1ce: 08 54 R0 = R0 & R1;
1d0: 10 97 W[P2] = R0; /* this is write_CTRL(drv_data, (read_CTRL(drv_data) & (~BIT_CTL_ENABLE))); */
/* Load the registers */
1d2: 8a ac P2 = [P1 + 0x8];
1d4: 00 95 R0 = W[P0] (Z);
1d6: 10 97 W[P2] = R0; /* write_CTRL(drv_data, chip->ctl_reg); */
1d8: 8a ac P2 = [P1 + 0x8];
1da: 40 a4 R0 = W[P0 + 0x2] (Z);
1dc: a2 6c P2 += 0x14; /* ( 20) */
1de: 10 97 W[P2] = R0; /* write_BAUD(drv_data, chip->baud); */
/* inlined bfin_spi_enable() - Reenables SPI */
1e0: 8a ac P2 = [P1 + 0x8];
1e2: 10 95 R0 = W[P2] (Z);
1e4: c0 42 R0 = R0.L (Z);
1e6: 70 4a BITSET (R0, 0xe); /* bit 14 */
1e8: 10 97 W[P2] = R0; /* write_CTRL(drv_data, (read_CTRL(drv_data) | BIT_CTL_ENABLE)); */
1ea: 41 30 R0 = P1;
1ec: 48 30 R1 = P0;
1ee: ff e2 cd ff JUMP.L 0x188 <_bfin_spi_cs_active>;
...
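The mask and bit number in the listing can be cross-checked on a host machine. The following is a minimal sketch, assuming BIT_CTL_ENABLE is 0x4000 (bit 14), a value inferred from the 0xbfff mask and the BITSET instruction above rather than copied from the driver headers. The helper names ctl_disable and ctl_enable are hypothetical.

```c
#include <stdint.h>

/* Assumption: inferred from the listing (mask 0xbfff, BITSET bit 14),
 * not taken from the driver headers. */
#define BIT_CTL_ENABLE 0x4000u

/* Mirrors "R0 = R0 & R1" with R1 = 0xbfff: clear the enable bit. */
static uint16_t ctl_disable(uint16_t ctl)
{
	return (uint16_t)(ctl & (uint16_t)~BIT_CTL_ENABLE);
}

/* Mirrors "BITSET (R0, 0xe)": set the enable bit. */
static uint16_t ctl_enable(uint16_t ctl)
{
	return (uint16_t)(ctl | BIT_CTL_ENABLE);
}
```

Both helpers only touch bit 14, which matches the read-modify-write pairs at 1c8..1d0 and 1e2..1e8 in the listing.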
I think the problem is the weak read/write ordering, but I'm not sure. In any case, adding a CSYNC() just after the bfin_spi_disable() function call solves the problem.
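To illustrate where the extra CSYNC() sits in the register sequence, here is a host-side sketch reconstructed from the comments in the listing above. It is not the code from spi_bfin5xx.c: struct mock_spi, the write_* helpers, mock_csync() and restore_state() are hypothetical stand-ins for the MMIO accessors and the CSYNC instruction, the BIT_* values are inferred from the listing, and the tail call to bfin_spi_cs_active() is left out.

```c
#include <stddef.h>
#include <stdint.h>

#define BIT_CTL_ENABLE 0x4000u /* bit 14, inferred from the listing */
#define BIT_STAT_CLR   0xFFFFu /* the listing writes -1 to the status register */

/* Each register access (and the sync) is recorded so the order can be inspected. */
enum op { OP_STAT, OP_CTL, OP_BAUD, OP_CSYNC };

struct mock_spi {
	uint16_t ctl, baud, stat;
	enum op trace[16];
	size_t n;
};

static void record(struct mock_spi *s, enum op o)
{
	if (s->n < 16)
		s->trace[s->n++] = o;
}

/* Assumption of this mock: status bits are modelled as write-1-to-clear. */
static void write_stat(struct mock_spi *s, uint16_t v)
{
	s->stat = (uint16_t)(s->stat & ~v);
	record(s, OP_STAT);
}

static void write_ctl(struct mock_spi *s, uint16_t v)  { s->ctl = v;  record(s, OP_CTL); }
static void write_baud(struct mock_spi *s, uint16_t v) { s->baud = v; record(s, OP_BAUD); }

/* Stand-in for CSYNC(); on the target this would be the core sync instruction. */
static void mock_csync(struct mock_spi *s) { record(s, OP_CSYNC); }

static void restore_state(struct mock_spi *s, uint16_t ctl_reg, uint16_t baud)
{
	write_stat(s, (uint16_t)BIT_STAT_CLR);                /* clear status */
	write_ctl(s, (uint16_t)(s->ctl & ~BIT_CTL_ENABLE));   /* bfin_spi_disable() */
	mock_csync(s);                                        /* the added sync */
	write_ctl(s, ctl_reg);                                /* chip->ctl_reg */
	write_baud(s, baud);                                  /* chip->baud */
	write_ctl(s, (uint16_t)(s->ctl | BIT_CTL_ENABLE));    /* bfin_spi_enable() */
}
```

The sketch only shows ordering: the sync falls between the write that clears the enable bit and the write that loads the new control value. It cannot show whether the real effect on hardware is ordering or simply added delay.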
Please, am I right? Is the problem related to the weak read/write ordering of the Blackfin? Why does the hardware sometimes lock up if I don't place the CSYNC() call? The only processor I have access to is a BF532-0.5, so I can't try this on other members of the Blackfin family.
Well, I really hope you can try to answer at least some of those questions. Thanks in advance,
Eduardo
Follow-ups
--- Barry Song 2009-11-29 22:54:07
Eduardo,
Thanks for digging. It makes sense that a sync or delay is required
between a back-to-back SPI disable and enable to avoid a possible SPI
controller hang.
-Barry
--- Barry Song 2009-11-29 23:10:18
And in fact, the register accesses to the SPI controller should happen in the
right order and will not be reordered. The main problem should be delay, I think.
--- Eduardo Tagle 2009-11-30 22:21:47
Yes, I also think the problem is some missing delay. It was very strange... I am
running the ADSP-BF532-0.5 387(MHz CCLK) 129(MHz SCLK) (mpu on)
On my board, I am using the SPI bus to access an MMC card and an SPI
addressable peripheral, attached to the same SPI bus, that manages a qwerty
keyboard, a touchscreen and some LEDs.
I have written several kernel drivers to be able to access those LEDs and the
keyboard through the Linux LED infrastructure and input event infrastructure,
and I use one of the LEDs to monitor MMC access, as supported by the Linux LED
infrastructure.
I suspect that when booting from the MMC card (with a custom bootloader written
by me) and telling the Linux kernel to use the MMC card filesystem as the root
filesystem, I am placing a very heavy load on the SPI bus driver. That's why I
have reported several problems and fixes for it, and have also extensively
debugged that driver. Under such strain, I have seen the described SPI
device lockups. If I try to reproduce the problem when the MMC card is not used
as the root filesystem, those problems do not seem to happen.
In fact, I have also seen another problem that I wasn't able to fix. If I
enable either SPI DMA or interrupt-driven PIO for my SPI device (the one
controlling the touchscreen/keyboard/LEDs), the data is transferred
as expected, but the interrupt that should fire when the transfer completes
never happens. I really don't know why.
I was thinking of some hardware problem in my design, but I doubt it could be
the cause, as I have written a program to stress the system (it is a memory test
program; basically, I ported the memtest suite to Blackfin), and I have left it
running for several days without any trouble.
I really don't know what's happening here; all I can tell is that adding that
CSYNC solves the problem for me :S
Regards, Eduardo