[#5721] Sometimes kernel locks while accessing SPI devices

Document created by Aaronwu Employee on Sep 5, 2013

Submitted By: Eduardo Tagle
Open Date: 2009-11-24 09:40:12
Priority: Medium High
Assignee: Barry Song
Status: Open
Fixed In Release: N/A
Found In Release: 2010R1
Release: 2010-RC1
Category: Drivers
Board: Custom
Processor: BF532
Silicon Revision: bf532-0.5
Is this bug repeatable?: Yes
Resolution: Fixed
Uboot version or rev.:
Toolchain version or rev.: gcc version 4.1.2 (ADI svn)
App binary format: N/A

Summary: Sometimes kernel locks while accessing SPI devices

Details:

 

Lately, I have been seeing random kernel lockups while accessing an SPI device connected to the Blackfin SPI bus. The system seems to freeze, but not every time. I suspected the SPI driver, so I compiled a debug version of it to find out where the problem is.

But when the SPI driver (spi_bfin5xx.c) is compiled with debugging messages, the random lockups disappear and the system works extremely reliably.

I started to suspect a timing-related problem, perhaps a race condition, so I commented out all debugging messages and selectively uncommented some of them, trying to reproduce the problem.

I found that when another driver (using the SPI bus driver) queues a message, the message is always queued, but sometimes, instead of starting the message transfer, the hardware simply freezes.

I won't say exactly how long it took me to trace this problem (it is a long story, and irrelevant here), but I finally found the piece of code that _sometimes_ works, _sometimes_ locks the system, and works reliably if debugging messages are turned on.

Every time a new message is started, the function bfin_spi_restore_state() is called. That is the last function executed before the system locks. If you look carefully, it disables SPI, reconfigures it, and re-enables it. I found that if I comment out all debugging messages except the one in that function, the driver works reliably. But if I comment out that debugging message and uncomment all the others in the driver, the system randomly locks up.

So I started to suspect that the SPI peripheral requires some time before being re-enabled. I moved the debug message to the start of the function, and the random lockups were still there. I then added a CSYNC() just after bfin_spi_disable(), and the problem disappeared (the driver started to work reliably).

After all those experiments, I suspect the weak read/write ordering of the Blackfin could be causing this problem, so I have disassembled the function and commented it:

 

000001b8 <_bfin_spi_restore_state>:

     1b8:    08 32           P1 = R0;

 

        /* clear status and disable clock */

     1ba:    f8 63           R0 = -0x1 (X);        /*        R0=0xffffffff( -1) */

     1bc:    8a ac           P2 = [P1 + 0x8];

     1be:    08 e5 1a 00     P0 = [P1 + 0x68];

     1c2:    42 6c           P2 += 0x8;        /* (  8) */

     1c4:    10 97           W[P2] = R0;  /* this is the write_STAT(drv_data, BIT_STAT_CLR); */

 

        /* inlined bfin_spi_disable() - Disables SPI */

     1c6:    8a ac           P2 = [P1 + 0x8];

     1c8:    10 95           R0 = W[P2] (Z);

     1ca:    81 e1 ff bf     R1 = 0xbfff (Z);        /*        R1=0xbfff(49151) */

     1ce:    08 54           R0 = R0 & R1;

     1d0:    10 97           W[P2] = R0;  /* this is  write_CTRL(drv_data, (read_CTRL(drv_data) & (~BIT_CTL_ENABLE))); */

 

        /* Load the registers */   

     1d2:    8a ac           P2 = [P1 + 0x8];

     1d4:    00 95           R0 = W[P0] (Z);

     1d6:    10 97           W[P2] = R0; /* write_CTRL(drv_data, chip->ctl_reg); */

     1d8:    8a ac           P2 = [P1 + 0x8];

     1da:    40 a4           R0 = W[P0 + 0x2] (Z);

     1dc:    a2 6c           P2 += 0x14;        /* ( 20) */

     1de:    10 97           W[P2] = R0; /* write_BAUD(drv_data, chip->baud); */

 

        /* inlined bfin_spi_enable() - Reenables SPI */

     1e0:    8a ac           P2 = [P1 + 0x8];

     1e2:    10 95           R0 = W[P2] (Z);

     1e4:    c0 42           R0 = R0.L (Z);

     1e6:    70 4a           BITSET (R0, 0xe);        /* bit 14 */

     1e8:    10 97           W[P2] = R0;  /* write_CTRL(drv_data, (read_CTRL(drv_data) | BIT_CTL_ENABLE)); */

     1ea:    41 30           R0 = P1;

     1ec:    48 30           R1 = P0;

     1ee:    ff e2 cd ff     JUMP.L 0x188 <_bfin_spi_cs_active>;

    ...

 

I think the problem is the weak read/write ordering, but I'm not sure. Still, adding a CSYNC() just after the bfin_spi_disable() call solves the problem.
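For reference, the sequence in the disassembly can be restated as a small host-side C model with the extra synchronization step placed right after the disable. This is a sketch under stated assumptions, not the driver source: `spi_regs_model`, `chip_model`, `restore_state_sketch` and `SPI_SYNC` are hypothetical names, the registers are plain memory rather than MMRs (so the write-to-clear behavior of the status register is not modeled), and the `csync` instruction is only emitted when building for Blackfin. The enable bit value 0x4000 is taken from the 0xbfff mask and the BITSET of bit 14 in the listing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BIT_CTL_ENABLE 0x4000u /* bit 14, as in BITSET (R0, 0xe) */
#define BIT_STAT_CLR   0xFFFFu /* value written to the status register */

/* Hypothetical wrapper: a real csync on Blackfin, otherwise only a
 * compiler barrier so the sketch still builds on a host machine. */
#if defined(__bfin__)
#define SPI_SYNC() __asm__ __volatile__("csync;" ::: "memory")
#else
#define SPI_SYNC() __asm__ __volatile__("" ::: "memory")
#endif

/* Hypothetical stand-in for the SPI control, status and baud registers. */
struct spi_regs_model {
    volatile uint16_t ctl;
    volatile uint16_t stat;
    volatile uint16_t baud;
};

/* Hypothetical stand-in for the per-device settings (ctl_reg, baud). */
struct chip_model {
    uint16_t ctl_reg;
    uint16_t baud;
};

void restore_state_sketch(struct spi_regs_model *regs,
                          const struct chip_model *chip)
{
    assert(regs != NULL && chip != NULL);

    /* Clear status. */
    regs->stat = BIT_STAT_CLR;

    /* Disable SPI (the inlined bfin_spi_disable() in the listing). */
    regs->ctl = (uint16_t)(regs->ctl & ~BIT_CTL_ENABLE);

    /* The added step: wait for the disable to complete before going on. */
    SPI_SYNC();

    /* Load the per-device control and baud settings. */
    regs->ctl = chip->ctl_reg;
    regs->baud = chip->baud;

    /* Re-enable SPI (the inlined bfin_spi_enable() in the listing). */
    regs->ctl = (uint16_t)(regs->ctl | BIT_CTL_ENABLE);
}
```

On a host build the barrier has no timing effect, so the model only documents the order of the register accesses; whether the synchronization matters on hardware can only be checked on the target.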

 

Am I right? Is the problem related to the weak read/write ordering of the Blackfin? Why does the hardware sometimes lock up if I don't place the CSYNC() call? The only processor I have access to is a BF532-0.5, so I can't try this on other members of the Blackfin family.

 

I really hope you can answer at least some of these questions. Thanks in advance,

Eduardo

 

 

Follow-ups

 

--- Barry Song                                               2009-11-29 22:54:07

Eduardo,

Thanks for digging. It makes sense that a sync or a delay is required between a back-to-back SPI disable and enable to avoid a possible SPI controller hang.

-Barry

 

--- Barry Song                                               2009-11-29 23:10:18

In fact, the register accesses to the SPI controller should already be in the right order and will not be reordered. I think the main problem is the missing delay.

 

 

--- Eduardo Tagle                                            2009-11-30 22:21:47

Yes, I also think the problem is a missing delay. It was very strange. I am running the ADSP-BF532-0.5 at 387 MHz CCLK and 129 MHz SCLK (MPU on).
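To put those clock figures in perspective, the arithmetic below works out the core-to-system clock ratio and the two clock periods. It is only a sketch: the helper names `cclk_per_sclk` and `period_ps` are hypothetical, and the only assumption is that the SPI peripheral is clocked from SCLK while the core executes from CCLK. With these numbers the core runs 3 cycles per system-clock cycle, so the few instructions between the disable write and the next control-register write in the disassembly could complete in roughly one or two SCLK periods.

```c
#include <assert.h>
#include <stdint.h>

/* Whole number of core clock cycles per system clock cycle. */
uint32_t cclk_per_sclk(uint32_t cclk_mhz, uint32_t sclk_mhz)
{
    assert(sclk_mhz != 0);
    return cclk_mhz / sclk_mhz;
}

/* Clock period in picoseconds for a frequency in MHz, rounded down. */
uint32_t period_ps(uint32_t freq_mhz)
{
    assert(freq_mhz != 0);
    return 1000000u / freq_mhz;
}
```

For the values reported here, `cclk_per_sclk(387, 129)` is 3, `period_ps(387)` is 2583 (about 2.6 ns per core cycle) and `period_ps(129)` is 7751 (about 7.8 ns per system cycle).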

On my board, I am using the SPI bus to access an MMC card and an SPI-addressable peripheral, attached to the same SPI bus, that manages a QWERTY keyboard, a touchscreen and some LEDs.

I have written several kernel drivers to access those LEDs and the keyboard through the Linux LED and input event infrastructures, and I use one of the LEDs to monitor MMC access, as supported by the Linux LED infrastructure.

I suspect that when booting from the MMC card (with a custom bootloader written by me) and telling the Linux kernel to use the MMC card filesystem as the root filesystem, I am placing a very heavy load on the SPI bus driver. That's why I have reported several problems and fixes for it, and why I have debugged that driver extensively. Under such strain, I have seen the described SPI device lockups. If I try to reproduce the problem when the MMC card is not used as the root filesystem, the lockups do not seem to happen.

In fact, I have also seen another problem that I wasn't able to fix. If I enable either SPI DMA or interrupt-driven PIO for my SPI device (the one controlling the touchscreen/keyboard/LEDs), data is transferred as expected, but the interrupt that should fire when the transfer completes never happens. I really don't know why.

I considered a hardware problem in my design, but I doubt that is the cause: I wrote a program to stress the system (a memory test program; basically, I ported the memtest suite to Blackfin) and left it running for several days without any trouble.

I really don't know what's happening here. All I can tell is that adding that CSYNC solves the problem for me :S

Regards, Eduardo

 

 

 
