FAQ:[#6014] bf537 and bf527 Processor stalls in SSYNC randomly under heavy network load(2010)

Document created by Aaronwu Employee on Sep 10, 2013
Version 1Show Document
  • View in full screen mode

[#6014] bf537 and bf527 Processor stalls in SSYNC randomly under heavy network load

Submitted By: Stefan Wanja

Open Date

2010-04-23 08:44:15    

Priority:

Medium     Assignee:

Nobody

Status:

Open     Fixed In Release:

N/A

Found In Release:

2009R1.1-RC4     Release:

Category:

Kernel Functions     Board:

N/A

Processor:

BF537     Silicon Revision:

0.3 (bf537), 0.2 (bf527)

Is this bug repeatable?:

Yes     Resolution:

N/A

Uboot version or rev.:

2008R1, ADI-2009R1-rc3     Toolchain version or rev.:

rev 3070 and rev 3792

App binary format:

N/A     

Summary: bf537 and bf527 Processor stalls in SSYNC randomly under heavy network load

Details:

 

When the bf527 and bf537 run under network load, sometimes they stall. They can only be reached by the gnICE debugger then and it shows, that the bf537 processor stays at a ssync instruction which was called from 'bfin_mac.c' at the function bfin_mac_hard_start_xmit after blackfin_dcache_flush_range was called. The bf527 seems to stay at the ssync as well, but the first entry in stack says its in an interrupt handler (one of the evt_evtX functions).

 

Steps to reproduce:

uClinux:

boot, run "ifconfig eth0 up <ip> netmask <netmask>", run "iperf -s &",

over the run time, make sure the log doesn't fill up the whole memory.

 

PC:

run "iperf -c <blackfin ip> -P 1 -f M -i <log interval> -d -t <runtime_in_s>"

(I have attached a windows binary, for linux it should be available on every distro)

 

Let it run for a COUPLE OF DAYS! If iperf logs that there is no more traffic,

check serial console of uclinux. If it is not alive anymore, check with gnICE+,

otherwise restart test.

It can really take a long time but can also happen after some seconds. Seems to be randomly, possibly related to interrupts.

 

 

This thread in the forum is related to this problem and has more information:

https://blackfin.uclinux.org/gf/project/uclinux-dist/forum/?action=ForumBrowse&forum_id=39&_forum_action=ForumMessageBrowse&thread_id=39883

 

Files attached are:

my kernel config         : .config

my uClinux system config : system_config

my rootfs                : rootfs.jffs2

my kernel                : vmImage

win iperf                : iperf.exe

 

Follow-ups

 

--- Stefan Wanja                                             2010-04-23 09:13:55

One important thing I forgot, this only seems to happen in WRITE THROUGH cache

mode.

 

--- Robin Getz                                               2010-04-23 11:14:26

Stefan:

 

When I look in 'bfin_mac.o' - there are only 2 SSYNCs.

 

Can you check yours?

 

One in <_mii_probe>:

 

b10:   13 97           W[P2] = R3;

b12:   30 00           CLI R0;

b14:   00 00           NOP;

b16:   00 00           NOP;

b18:   24 00           SSYNC;

b1a:   40 00           STI R0;

b1c:   20 00           IDLE;

 

One in _bfin_mac_hard_start_xmit

 

940:   ff e3 60 fb     CALL 0x0 <_bfin_mdiobus_reset>;

944:   30 00           CLI R0;

946:   00 00           NOP;

948:   00 00           NOP;

94a:   24 00           SSYNC;

94c:   40 00           STI R0;

94e:   69 91           P1 = [P5];

 

(other than the exact addresses), does that look close - and when things hang,

it's in the first, or 2nd?

 

-Robin

 

--- Stefan Wanja                                             2010-04-23 11:46:51

Robin: Yes mine looks similar. Looks close. Hangs on the second.

 

--- Robin Getz                                               2010-04-23 12:08:23

OK - after re-reading anomaly 244 & 312, and going to talk to the person who

wrote things - after a discussion - and whining (we missed some things), and he

didn't make things very clear....

 

I updated ./arch/blackfin/include/asm/blackfin.h with this patch.

===================================================================

--- arch/blackfin/include/asm/blackfin.h        (revision 8659)

+++ arch/blackfin/include/asm/blackfin.h        (working copy)

@@ -17,22 +17,16 @@

static inline void SSYNC(void)

{

        int _tmp;

-       if (ANOMALY_05000312)

+       if (ANOMALY_05000312 || ANOMALY_05000244)

                __asm__ __volatile__(

                        "cli %0;"

                        "nop;"

                        "nop;"

+                       "nop;"

                        "ssync;"

                        "sti %0;"

                        : "=d" (_tmp)

                );

-       else if (ANOMALY_05000244)

-               __asm__ __volatile__(

-                       "nop;"

-                       "nop;"

-                       "nop;"

-                       "ssync;"

-               );

        else

                __asm__ __volatile__("ssync;");

}

@@ -41,22 +35,16 @@

static inline void CSYNC(void)

{

        int _tmp;

-       if (ANOMALY_05000312)

+       if (ANOMALY_05000312 || ANOMALY_05000244)

                __asm__ __volatile__(

                        "cli %0;"

                        "nop;"

                        "nop;"

+                       "nop;"

                        "csync;"

                        "sti %0;"

                        : "=d" (_tmp)

                );

-       else if (ANOMALY_05000244)

-               __asm__ __volatile__(

-                       "nop;"

-                       "nop;"

-                       "nop;"

-                       "csync;"

-               );

        else

                __asm__ __volatile__("csync;");

}

@@ -73,14 +61,10 @@

#define ssync(x) SSYNC(x)

#define csync(x) CSYNC(x)

 

-#if ANOMALY_05000312

-#define SSYNC(scratch) cli scratch; nop; nop; SSYNC; sti scratch;

-#define CSYNC(scratch) cli scratch; nop; nop; CSYNC; sti scratch;

+#if ANOMALY_05000312 || ANOMALY_05000244

+#define SSYNC(scratch) cli scratch; nop; nop; nop; SSYNC; sti scratch;

+#define CSYNC(scratch) cli scratch; nop; nop; nop; CSYNC; sti scratch;

 

-#elif ANOMALY_05000244

-#define SSYNC(scratch) nop; nop; nop; SSYNC;

-#define CSYNC(scratch) nop; nop; nop; CSYNC;

-

#else

#define SSYNC(scratch) SSYNC;

#define CSYNC(scratch) CSYNC;

 

-------

 

Basically both need to be protected with 3 NOPS, and CLI/STI.

 

I'm building things up, and will let it run over the weekend - can you do the

same?

 

Thanks

-Robin

 

--- Robin Getz                                               2010-04-23 14:15:20

I committed something to trunk, since it seemed to be correct (and not

negatively effect anything).

 

-Robin

 

--- Robin Getz                                               2010-04-23 15:52:59

OK - things have been running a few hours without issues.

- iperf, pingflood, and userspace is running alterating dhrystone and

whetstone.

 

Based on the description - I don't think a few hours counts, and will let you

know on Wednesday when I'm back.

 

--- Stefan Wanja                                             2010-04-23 16:47:30

I will test that, but your changes won't fix it for the bf527 as ANOMALIES

05000244 and 05000312 are not active on it. Right?

 

--- Stefan Wanja                                             2010-04-27 08:57:58

Now we had a stall at a different place (same on bf527 and bf537). The patch you

proposed had been adopted in the bf537 test. We have seen the place where it

stalled before, but only once or twice. The only connection I can see is that

there is a flush instruction a few steps ahead.

 

Here is the info I can provide:

 

Registers:

r0   = 0x3454e0

r1   = 0x42

p2   = 0x3454e0

rets = 0xcb29a <bfin_mac_interrupt+202>

usp  = 0x3c7bc78

seqstat = 0x0

syscfg = 0x6

reti = 0x384ec9a

retx = 0x384ec9a

retn = 0x3c76000

rete = 0xf2a5a

pc   = 0xf2a5a <skb_put+10>

 

Disassembly around pc:

 

000f2a50 <_skb_put>:

   f2a50:    10 32           P2 = R0;

   f2a52:    78 05           [--SP] = (R7:7);

   f2a54:    00 e8 04 00     LINK 0x10;        /* (16) */

   f2a58:    11 30           R2 = R1;

>>>f2a5a:    10 e4 15 00     R0 = [P2 + 0x54];

   f2a5e:    00 0c           CC = R0 == 0x0;

   f2a60:    17 e4 22 00     R7 = [P2 + 0x88];

   f2a64:    13 10           IF !CC JUMP 0xf2a8a <_skb_put+0x3a>;

   f2a66:    00 00           NOP;

   f2a68:    0f 50           R0 = R7 + R1;

   f2a6a:    10 e6 22 00     [P2 + 0x88] = R0;

   f2a6e:    11 e4 14 00     R1 = [P2 + 0x50];

   f2a72:    51 50           R1 = R1 + R2;

   f2a74:    11 e6 14 00     [P2 + 0x50] = R1;

   f2a78:    11 e4 23 00     R1 = [P2 + 0x8c];

 

That is what the stack looked like:

 

* skb_put(skb=0x3453e0, len=66) at net/core/skbuff.c:958

* bfin_mac_interrupt(irq=<value optimized out>, dev_id=0x3c65000) at

drivers/net/bfin_mac.c:729

* handle_IRQ_event(irq=24, action=0x32adc0) at kernel/irq/handle.c:142

* handle_simple_ieq(irq=24, desc=0x1ec850) t kernel/irq/chip.c:327

* asm_do_IRQ (irq=24, regs=<value optimized out>) at

include/linux/irq.h:282

* do_irq(vec=<value optimized out>,fp=0x42) at

arch/blackfin/mach-common/ints-priority.c:1196

* _common_int_entry () at

/opt/uClinux/uclinux-2009R1-bf537/linux-2.6.x/arch/blackfin/include/asm/current.h:17

 

 

skb_put is called from bfin_mac_rx:

 

static void bfin_mac_rx(struct net_device *dev)

{

    struct sk_buff *skb, *new_skb;

    unsigned short len;

#if defined(BFIN_MAC_CSUM_OFFLOAD)

    unsigned int i;

    unsigned char fcs[ETH_FCS_LENGTH + 1];

#endif

    /* check if frame status word reports an error condition

     * we which case we simply drop the packet

     */

    if (current_rx_ptr->status.status_word & RX_ERROR_MASK) {

        printk(KERN_NOTICE DRV_NAME

               ": rx: receive error - packet dropped\n");

        dev->stats.rx_dropped++;

        goto out;

    }

 

    /* allocate a new skb for next time receive */

    skb = current_rx_ptr->skb;

    new_skb = dev_alloc_skb(PKT_BUF_SZ + NET_IP_ALIGN);

    if (!new_skb) {

        printk(KERN_NOTICE DRV_NAME

               ": rx: low on mem - packet dropped\n");

        dev->stats.rx_dropped++;

        goto out;

    }

    /* reserve 2 bytes for RXDWA padding */

    skb_reserve(new_skb, NET_IP_ALIGN);

    /* Invidate the data cache of skb->data range when it is write back

     * cache. It will prevent overwritting the new data from DMA

     */

    blackfin_dcache_invalidate_range((unsigned long)new_skb->head,

                     (unsigned long)new_skb->end);

 

    current_rx_ptr->skb = new_skb;

    current_rx_ptr->desc_a.start_addr = (unsigned long)new_skb->data - 2;

 

    len = (unsigned short)((current_rx_ptr->status.status_word) &

RX_FRLEN);

    /* Deduce Ethernet FCS length from Ethernet payload length */

    len -= ETH_FCS_LENGTH;

>>    skb_put(skb, len);

        ...

 

skb_put hangs at the SKB_LINEAR_ASSERT(skb) call.

 

A few lines before the call to skb_put there is a call to

'blackfin_dcache_invalidate_range' with a newly dev_alloc_skb'ed memory area.

Shouldn't there be a ssync afterwards anyways?

 

So to sum up, we have two stall places, one in the receive path of bfin_mac and

one in the transmit path. Reason unknown and I assume not completely fixed by

your patch since it doesn't affect bf527.

 

--- Stefan Wanja                                             2010-04-28 05:28:16

Hello again,

 

we have a slightly different place now where bf537 (with your SSYNC changes)

stalled. See below.

 

(gdb) target remote localhost:2000

Remote debugging using localhost:2000

0x000cb2a2 in bfin_mac_interrupt (irq=<value optimized out>,

dev_id=0x3c65000)

    at drivers/net/bfin_mac.c:731

731             dev->last_rx = jiffies;

(gdb) info registers

r0             0x544012 5521426

r1             0x544640 5523008

r2             0x42     66

r3             0x343648 3421768

r4             0x1e5d34 1989940

r5             0x42     66

r6             0x0      0

r7             0x1e5d34 1989940

p0             0x1e5d34 0x1e5d34

p1             0x32     0x32

p2             0x1eb648 0x1eb648

p3             0x3c65000        0x3c65000

p4             0x3a56e60        0x3a56e60

p5             0x3435c0 0x3435c0

sp             0x503670 0x503670

fp             0x503684 0x503684

i0             0x0      0

i1             0x33f504 3405060

i2             0x0      0

i3             0x0      0

m0             0x0      0

m1             0x0      0

m2             0x0      0

m3             0x0      0

b0             0x0      0

b1             0x0      0

b2             0x0      0

b3             0x0      0

l0             0x0      0

l1             0x0      0

l2             0x0      0

l3             0x0      0

a0x            0x0      0

a0w            0x7f     127

a1x            0x0      0

a1w            0x7f     127

astat          0x2002022        33562658

rets           0xcb29a  0xcb29a <bfin_mac_interrupt+202>

lc0            0x0      0

lt0            0xffa0131c       -6286564

lb0            0xffa0131c       -6286564

lc1            0x0      0

lt1            0x7036   28726

lb1            0x7036   28726

cycles         0x339b04b7       865797303

cycles2        0x25e6   9702

usp            0x50bc78 0x50bc78

seqstat        0x0      0

syscfg         0x6      6

reti           0xfbf98  1032088

retx           0x384ec9a        59042970

retn           0x504000 5259264

rete           0xcb2a2  832162

>>pc             0xcb2a2  0xcb2a2 <bfin_mac_interrupt+210>

cc             0x0      0

text_addr      0x0      0

text_end_addr  0x0      0

data_addr      0x0      0

fdpic_exec     0x0      0

fdpic_interp   0x0      0

ipend          0x0      0

 

 

(gdb) info stack

#0  0x000cb2a2 in bfin_mac_interrupt (irq=<value optimized out>,

dev_id=0x3c65000)

    at drivers/net/bfin_mac.c:731

#1  0x0002c87e in handle_IRQ_event (irq=24, action=0x32adc0) at

kernel/irq/handle.c:142

#2  0x0002e1ba in handle_simple_irq (irq=24, desc=0x1ec850) at

kernel/irq/chip.c:327

#3  0xffa001dc in asm_do_IRQ (irq=24, regs=<value optimized out>) at

include/linux/irq.h:282

#4  0xffa00dfc in do_irq (vec=<value optimized out>, fp=0x544640)

    at arch/blackfin/mach-common/ints-priority.c:1196

#5  0xffa00b4a in _common_int_entry ()

    at

/opt/uClinux/uclinux-2009R1-bf537/linux-2.6.x/arch/blackfin/include/asm/current.h:17

 

 

Disassembly:

 

   cb270:       28 e4 24 00     R0 = [P5 + 0x90];

   cb274:       f9 e3 d6 de     CALL 0x7020

<_blackfin_dcache_invalidate_range>;

   cb278:       04 32           P0 = R4;

   cb27a:       21 e1 ff 07     R1 = 0x7ff (X);         /*            

R1=0x7ff(2047) */

   cb27e:       42 91           P2 = [P0];

   cb280:       55 bc           [P2 + 0x4] = P5;

   cb282:       28 e4 25 00     R0 = [P5 + 0x94];

   cb286:       f0 67           R0 += -0x2;             /* ( -2) */

   cb288:       d0 b0           [P2 + 0xc] = R0;

   cb28a:       50 a2           R0 = [P2 + 0x24];

   cb28c:       08 54           R0 = R0 & R1;

   cb28e:       e0 67           R0 += -0x4;             /* ( -4) */

   cb290:       c5 42           R5 = R0.L (Z);

   cb292:       0d 30           R1 = R5;

   cb294:       44 30           R0 = P4;

   cb296:       01 e3 dd 3b     CALL 0xf2a50 <_skb_put>;

   cb29a:       4a e1 1e 00     P2.H = 0x1e;            /* ( 30)      

P2=0x1e02bc */

   cb29e:       0a e1 48 b6     P2.L = 0xb648;          /* (-18872)   

P2=0x1eb648 <_jiffies> */

>>   cb2a2:       10 91           R0 = [P2];

   cb2a4:       18 e6 4b 00     [P3 + 0x12c] = R0;

   cb2a8:       4b 30           R1 = P3;

   cb2aa:       44 30           R0 = P4;

   cb2ac:       01 e3 56 c4     CALL 0x103b58 <_eth_type_trans>;

   cb2b0:       21 e5 25 00     P1 = [P4 + 0x94];

   cb2b4:       60 e6 33 00     W[P4 + 0x66] = R0;

   cb2b8:       21 e1 45 00     R1 = 0x45 (X);          /*            

R1=0x45( 69) */

   cb2bc:       08 99           R0 = B[P1] (Z);

   cb2be:       08 08           CC = R0 == R1;

   cb2c0:       1c 18           IF CC JUMP 0xcb2f8

<_bfin_mac_interrupt+0x128>;

   cb2c2:       44 30           R0 = P4;

   cb2c4:       01 e3 5c 67     CALL 0xf817c <_netif_rx>;

   cb2c8:       18 e4 14 00     R0 = [P3 + 0x50];

   cb2cc:       08 64           R0 += 0x1;              /* (  1) */

   cb2ce:       18 e6 14 00     [P3 + 0x50] = R0;

   cb2d2:       18 e4 16 00     R0 = [P3 + 0x58];

   cb2d6:       28 50           R0 = R0 + R5;

   cb2d8:       18 e6 16 00     [P3 + 0x58] = R0;

   cb2dc:       a3 2f           JUMP.S 0xcb222

<_bfin_mac_interrupt+0x52>;

   cb2de:       4a e1 c0 ff     P2.H = 0xffc0;          /* (-64)      

P2=0xffc0b648(-4147640) */

 

This is later that in the last post. We have the suspicion, that it might have

to do with the combination of blackfin_dcache_flush_range in tx and

blackfin_dcache_invalidate_range in rx, or with interrupts interfering with

either or both.

 

--- Robin Getz                                               2010-04-28 13:44:43

OK - I'm back - check on things, and they are still running over 4 days..

 

dhrystone and whetstone are running, ping flood, iperf, everything...

 

PING 10.64.173.28 (10.64.173.28) 56(84) bytes of data.                        

                                                       .--- 10.64.173.28 ping

statistics ---

242234870 packets transmitted, 242234865 received, +5 duplicates, 0% packet

loss, time 430902219ms

rtt min/avg/max/mdev = 0.099/1.749/19.979/1.460 ms, pipe 2, ipg/ewma

1.778/1.442 ms

 

And Iperf still is shooting out...

[ ID] Interval       Transfer     Bandwidth

[  4] 428740.0-428750.0 sec    104 MBytes  10.4 MBytes/sec

 

I can't replicate things on trunk (with trunk 4.3 compiler).

 

What version of everything are you using? It indicates that it is 2009R1.1 -

there have been updates in the svn branch - do you have all those?

 

-Robin

 

--- Robin Getz                                               2010-04-28 13:48:34

My config:

 

root:/> uptime

04:09:45 up 5 days, 40 min, load average: 1.41, 1.84, 1.86

root:/> version

kernel:    Linux release 2.6.33.2-ADI-2010R1-pre-svn8659, build #639 Fri Apr 23

12:11:46 EDT 2010

toolchain: bfin-linux-uclibc-gcc release gcc version 4.3.4

(ADI-trunk/svn-3951)

user-dist: release svn-9497, build #307 Mon Mar 29 11:54:47 EDT 2010

root:/> cat /proc/interrupts

  6:  108611291      CORE  Blackfin CoreTimer

10:          1      INTN  rtc-bfin

18:        670      INTN  BFIN_UART_RX

19:   11768595      INTN  BFIN_UART_TX

24: 3429229526      INTN  EMAC_RX

NMI:          0      CORE  Non Maskable Interrupt

Err:          0

 

root:/> /sbin/ifconfig eth0

eth0      Link encap:Ethernet  HWaddr 00:E0:22:FE:45:D0

          inet addr:10.64.173.28  Bcast:10.64.173.255  Mask:255.255.255.0

          UP BROADCAST NOTRAILERS RUNNING MULTICAST  MTU:1500  Metric:1

          RX packets:3502511927 errors:0 dropped:0 overruns:0 frame:0

          TX packets:869923096 errors:0 dropped:0 overruns:0 carrier:0

          collisions:0 txqueuelen:1000

          RX bytes:3449487351 (3.2 GiB)  TX bytes:1290334026 (1.2 GiB)

 

-Robin

 

--- Stefan Wanja                                             2010-04-29 04:19:50

Hi,

 

first of all, we just had one board stall after 6 days of running, please leave

it running. We currently have 6 boards running to be able to see as much stalls

as possible.

 

We are using the latest svn of the 2009R1 branch. Kernel is: 2009R1 Rev 8664,

uclinux-dist is 2009R1.1 Rev 9586.

The compiler is  bfin-linux-uclibc-gcc (GCC) 4.1.2 (ADI svn) and from the

directory I can only tell that is was built from blackfin-toolchain-09r1.1.

 

I noticed that your iperf bandwidth is quite high, we only have half of it here

on all our boards, but in both directions. Are you actually using the -d (dual)

option on the pc for full duplex traffic?

 

So either the problem has just not yet appeared, or it depends on the compiler

version, or on the kernel version or it is because you didn't use iperf's dual

test mode. Maybe you should exclude dhry, whetstone and pingflood from the

test?

 

-Stefan

 

--- Robin Getz                                               2010-04-29 04:26:49

I'll continue to leave things running. I'll also start up a few more boards to

see -- I also may have been building for 0.2 - so I'll bump things to 0.3 and

try that too.

 

I ran: "./iperf -c 10.64.173.28 -P 1 -f M -i 10 -d -t 604800" from

trunk. (after I changed the make system to build both host and Blackfin

versions).

 

-Robin

 

--- Stefan Wanja                                             2010-04-29 04:44:58

Iperf should report 2 lines every 10 seconds in your case, one for outbound and

one for inbound traffic. It should open two connections. Can you verify that

(netstat/wireshark/...)?

 

If you can do a 10MByte/s transfer simultaneously in both directions, I would

be really interested, why we can't ;)!

 

-Stefan

 

--- Robin Getz                                               2010-04-29 13:41:21

I think it is dual - I see:

 

------------------------------------------------------------

[  5] local 10.64.173.47 port 50419 connected with 10.64.173.28 port 5001

[  4] local 10.64.173.47 port 5001 connected with 10.64.173.28 port 35118

[ ID] Interval       Transfer     Bandwidth

[  5]  0.0-60.0 sec    593 MBytes  9.88 MBytes/sec

[  4]  0.0-60.0 sec    234 MBytes  3.89 MBytes/sec

 

And wiresharc shows things going both ways...

-Robin

 

--- Stefan Wanja                                             2010-05-12 05:51:23

Hey Robin,

 

could you reproduce the problem by now?

 

-Stefan

 

--- Robin Getz                                               2010-05-13 10:28:28

Not with trunk - I'm backing out things to the release branch to see there.

 

-Robin

 

--- Stefan Wanja                                             2011-04-28 10:51:31

Hello again,

 

I'm back on this problem now... didn't disappear by itself unfortunately ;).

Checked with gcc version 4.3.3 which led to the same problem.

 

 

 

    Files

    Changes

    Commits

    Dependencies

    Duplicates

    Associations

    Tags

 

File Name     File Type     File Size     Posted By

config    application/xml    25881    Stefan Wanja

vmImage    application/octet-stream    1012999    Stefan Wanja

rootfs.jffs2    application/octet-stream    1703936    Stefan Wanja

system_config    application/octet-stream    46025    Stefan Wanja

iperf.exe    application/octet-stream    110592    Stefan Wanja

Outcomes