[#6014] bf537 and bf527 Processor stalls in SSYNC randomly under heavy network load
Submitted By: Stefan Wanja
Open Date
2010-04-23 08:44:15
Priority:
Medium Assignee:
Nobody
Status:
Open Fixed In Release:
N/A
Found In Release:
2009R1.1-RC4 Release:
Category:
Kernel Functions Board:
N/A
Processor:
BF537 Silicon Revision:
0.3 (bf537), 0.2 (bf527)
Is this bug repeatable?:
Yes Resolution:
N/A
Uboot version or rev.:
2008R1, ADI-2009R1-rc3 Toolchain version or rev.:
rev 3070 and rev 3792
App binary format:
N/A
Summary: bf537 and bf527 Processor stalls in SSYNC randomly under heavy network load
Details:
When the bf527 and bf537 run under network load, sometimes they stall. They can only be reached by the gnICE debugger then and it shows, that the bf537 processor stays at a ssync instruction which was called from 'bfin_mac.c' at the function bfin_mac_hard_start_xmit after blackfin_dcache_flush_range was called. The bf527 seems to stay at the ssync as well, but the first entry in stack says its in an interrupt handler (one of the evt_evtX functions).
Steps to reproduce:
uClinux:
boot, run "ifconfig eth0 up <ip> netmask <netmask>", run "iperf -s &",
over the run time, make sure the log doesn't fill up the whole memory.
PC:
run "iperf -c <blackfin ip> -P 1 -f M -i <log interval> -d -t <runtime_in_s>"
(I have attached a windows binary, for linux it should be available on every distro)
Let it run for a COUPLE OF DAYS! If iperf logs that there is no more traffic,
check serial console of uclinux. If it is not alive anymore, check with gnICE+,
otherwise restart test.
It can really take a long time but can also happen after some seconds. Seems to be randomly, possibly related to interrupts.
This thread in the forum is related to this problem and has more information:
Files attached are:
my kernel config : .config
my uClinux system config : system_config
my rootfs : rootfs.jffs2
my kernel : vmImage
win iperf : iperf.exe
Follow-ups
--- Stefan Wanja 2010-04-23 09:13:55
One important thing I forgot, this only seems to happen in WRITE THROUGH cache
mode.
--- Robin Getz 2010-04-23 11:14:26
Stefan:
When I look in 'bfin_mac.o' - there are only 2 SSYNCs.
Can you check yours?
One in <_mii_probe>:
b10: 13 97 W[P2] = R3;
b12: 30 00 CLI R0;
b14: 00 00 NOP;
b16: 00 00 NOP;
b18: 24 00 SSYNC;
b1a: 40 00 STI R0;
b1c: 20 00 IDLE;
One in _bfin_mac_hard_start_xmit
940: ff e3 60 fb CALL 0x0 <_bfin_mdiobus_reset>;
944: 30 00 CLI R0;
946: 00 00 NOP;
948: 00 00 NOP;
94a: 24 00 SSYNC;
94c: 40 00 STI R0;
94e: 69 91 P1 = [P5];
(other than the exact addresses), does that look close - and when things hang,
it's in the first, or 2nd?
-Robin
--- Stefan Wanja 2010-04-23 11:46:51
Robin: Yes mine looks similar. Looks close. Hangs on the second.
--- Robin Getz 2010-04-23 12:08:23
OK - after re-reading anomaly 244 & 312, and going to talk to the person who
wrote things - after a discussion - and whining (we missed some things), and he
didn't make things very clear....
I updated ./arch/blackfin/include/asm/blackfin.h with this patch.
===================================================================
--- arch/blackfin/include/asm/blackfin.h (revision 8659)
+++ arch/blackfin/include/asm/blackfin.h (working copy)
@@ -17,22 +17,16 @@
static inline void SSYNC(void)
{
int _tmp;
- if (ANOMALY_05000312)
+ if (ANOMALY_05000312 || ANOMALY_05000244)
__asm__ __volatile__(
"cli %0;"
"nop;"
"nop;"
+ "nop;"
"ssync;"
"sti %0;"
: "=d" (_tmp)
);
- else if (ANOMALY_05000244)
- __asm__ __volatile__(
- "nop;"
- "nop;"
- "nop;"
- "ssync;"
- );
else
__asm__ __volatile__("ssync;");
}
@@ -41,22 +35,16 @@
static inline void CSYNC(void)
{
int _tmp;
- if (ANOMALY_05000312)
+ if (ANOMALY_05000312 || ANOMALY_05000244)
__asm__ __volatile__(
"cli %0;"
"nop;"
"nop;"
+ "nop;"
"csync;"
"sti %0;"
: "=d" (_tmp)
);
- else if (ANOMALY_05000244)
- __asm__ __volatile__(
- "nop;"
- "nop;"
- "nop;"
- "csync;"
- );
else
__asm__ __volatile__("csync;");
}
@@ -73,14 +61,10 @@
#define ssync(x) SSYNC(x)
#define csync(x) CSYNC(x)
-#if ANOMALY_05000312
-#define SSYNC(scratch) cli scratch; nop; nop; SSYNC; sti scratch;
-#define CSYNC(scratch) cli scratch; nop; nop; CSYNC; sti scratch;
+#if ANOMALY_05000312 || ANOMALY_05000244
+#define SSYNC(scratch) cli scratch; nop; nop; nop; SSYNC; sti scratch;
+#define CSYNC(scratch) cli scratch; nop; nop; nop; CSYNC; sti scratch;
-#elif ANOMALY_05000244
-#define SSYNC(scratch) nop; nop; nop; SSYNC;
-#define CSYNC(scratch) nop; nop; nop; CSYNC;
-
#else
#define SSYNC(scratch) SSYNC;
#define CSYNC(scratch) CSYNC;
-------
Basically both need to be protected with 3 NOPS, and CLI/STI.
I'm building things up, and will let it run over the weekend - can you do the
same?
Thanks
-Robin
--- Robin Getz 2010-04-23 14:15:20
I committed something to trunk, since it seemed to be correct (and not
negatively effect anything).
-Robin
--- Robin Getz 2010-04-23 15:52:59
OK - things have been running a few hours without issues.
- iperf, pingflood, and userspace is running alterating dhrystone and
whetstone.
Based on the description - I don't think a few hours counts, and will let you
know on Wednesday when I'm back.
--- Stefan Wanja 2010-04-23 16:47:30
I will test that, but your changes won't fix it for the bf527 as ANOMALIES
05000244 and 05000312 are not active on it. Right?
--- Stefan Wanja 2010-04-27 08:57:58
Now we had a stall at a different place (same on bf527 and bf537). The patch you
proposed had been adopted in the bf537 test. We have seen the place where it
stalled before, but only once or twice. The only connection I can see is that
there is a flush instruction a few steps ahead.
Here is the info I can provide:
Registers:
r0 = 0x3454e0
r1 = 0x42
p2 = 0x3454e0
rets = 0xcb29a <bfin_mac_interrupt+202>
usp = 0x3c7bc78
seqstat = 0x0
syscfg = 0x6
reti = 0x384ec9a
retx = 0x384ec9a
retn = 0x3c76000
rete = 0xf2a5a
pc = 0xf2a5a <skb_put+10>
Disassembly around pc:
000f2a50 <_skb_put>:
f2a50: 10 32 P2 = R0;
f2a52: 78 05 [--SP] = (R7:7);
f2a54: 00 e8 04 00 LINK 0x10; /* (16) */
f2a58: 11 30 R2 = R1;
>>>f2a5a: 10 e4 15 00 R0 = [P2 + 0x54];
f2a5e: 00 0c CC = R0 == 0x0;
f2a60: 17 e4 22 00 R7 = [P2 + 0x88];
f2a64: 13 10 IF !CC JUMP 0xf2a8a <_skb_put+0x3a>;
f2a66: 00 00 NOP;
f2a68: 0f 50 R0 = R7 + R1;
f2a6a: 10 e6 22 00 [P2 + 0x88] = R0;
f2a6e: 11 e4 14 00 R1 = [P2 + 0x50];
f2a72: 51 50 R1 = R1 + R2;
f2a74: 11 e6 14 00 [P2 + 0x50] = R1;
f2a78: 11 e4 23 00 R1 = [P2 + 0x8c];
That is what the stack looked like:
* skb_put(skb=0x3453e0, len=66) at net/core/skbuff.c:958
* bfin_mac_interrupt(irq=<value optimized out>, dev_id=0x3c65000) at
drivers/net/bfin_mac.c:729
* handle_IRQ_event(irq=24, action=0x32adc0) at kernel/irq/handle.c:142
* handle_simple_ieq(irq=24, desc=0x1ec850) t kernel/irq/chip.c:327
* asm_do_IRQ (irq=24, regs=<value optimized out>) at
include/linux/irq.h:282
* do_irq(vec=<value optimized out>,fp=0x42) at
arch/blackfin/mach-common/ints-priority.c:1196
* _common_int_entry () at
/opt/uClinux/uclinux-2009R1-bf537/linux-2.6.x/arch/blackfin/include/asm/current.h:17
skb_put is called from bfin_mac_rx:
static void bfin_mac_rx(struct net_device *dev)
{
struct sk_buff *skb, *new_skb;
unsigned short len;
#if defined(BFIN_MAC_CSUM_OFFLOAD)
unsigned int i;
unsigned char fcs[ETH_FCS_LENGTH + 1];
#endif
/* check if frame status word reports an error condition
* we which case we simply drop the packet
*/
if (current_rx_ptr->status.status_word & RX_ERROR_MASK) {
printk(KERN_NOTICE DRV_NAME
": rx: receive error - packet dropped\n");
dev->stats.rx_dropped++;
goto out;
}
/* allocate a new skb for next time receive */
skb = current_rx_ptr->skb;
new_skb = dev_alloc_skb(PKT_BUF_SZ + NET_IP_ALIGN);
if (!new_skb) {
printk(KERN_NOTICE DRV_NAME
": rx: low on mem - packet dropped\n");
dev->stats.rx_dropped++;
goto out;
}
/* reserve 2 bytes for RXDWA padding */
skb_reserve(new_skb, NET_IP_ALIGN);
/* Invidate the data cache of skb->data range when it is write back
* cache. It will prevent overwritting the new data from DMA
*/
blackfin_dcache_invalidate_range((unsigned long)new_skb->head,
(unsigned long)new_skb->end);
current_rx_ptr->skb = new_skb;
current_rx_ptr->desc_a.start_addr = (unsigned long)new_skb->data - 2;
len = (unsigned short)((current_rx_ptr->status.status_word) &
RX_FRLEN);
/* Deduce Ethernet FCS length from Ethernet payload length */
len -= ETH_FCS_LENGTH;
>> skb_put(skb, len);
...
skb_put hangs at the SKB_LINEAR_ASSERT(skb) call.
A few lines before the call to skb_put there is a call to
'blackfin_dcache_invalidate_range' with a newly dev_alloc_skb'ed memory area.
Shouldn't there be a ssync afterwards anyways?
So to sum up, we have two stall places, one in the receive path of bfin_mac and
one in the transmit path. Reason unknown and I assume not completely fixed by
your patch since it doesn't affect bf527.
--- Stefan Wanja 2010-04-28 05:28:16
Hello again,
we have a slightly different place now where bf537 (with your SSYNC changes)
stalled. See below.
(gdb) target remote localhost:2000
Remote debugging using localhost:2000
0x000cb2a2 in bfin_mac_interrupt (irq=<value optimized out>,
dev_id=0x3c65000)
at drivers/net/bfin_mac.c:731
731 dev->last_rx = jiffies;
(gdb) info registers
r0 0x544012 5521426
r1 0x544640 5523008
r2 0x42 66
r3 0x343648 3421768
r4 0x1e5d34 1989940
r5 0x42 66
r6 0x0 0
r7 0x1e5d34 1989940
p0 0x1e5d34 0x1e5d34
p1 0x32 0x32
p2 0x1eb648 0x1eb648
p3 0x3c65000 0x3c65000
p4 0x3a56e60 0x3a56e60
p5 0x3435c0 0x3435c0
sp 0x503670 0x503670
fp 0x503684 0x503684
i0 0x0 0
i1 0x33f504 3405060
i2 0x0 0
i3 0x0 0
m0 0x0 0
m1 0x0 0
m2 0x0 0
m3 0x0 0
b0 0x0 0
b1 0x0 0
b2 0x0 0
b3 0x0 0
l0 0x0 0
l1 0x0 0
l2 0x0 0
l3 0x0 0
a0x 0x0 0
a0w 0x7f 127
a1x 0x0 0
a1w 0x7f 127
astat 0x2002022 33562658
rets 0xcb29a 0xcb29a <bfin_mac_interrupt+202>
lc0 0x0 0
lt0 0xffa0131c -6286564
lb0 0xffa0131c -6286564
lc1 0x0 0
lt1 0x7036 28726
lb1 0x7036 28726
cycles 0x339b04b7 865797303
cycles2 0x25e6 9702
usp 0x50bc78 0x50bc78
seqstat 0x0 0
syscfg 0x6 6
reti 0xfbf98 1032088
retx 0x384ec9a 59042970
retn 0x504000 5259264
rete 0xcb2a2 832162
>>pc 0xcb2a2 0xcb2a2 <bfin_mac_interrupt+210>
cc 0x0 0
text_addr 0x0 0
text_end_addr 0x0 0
data_addr 0x0 0
fdpic_exec 0x0 0
fdpic_interp 0x0 0
ipend 0x0 0
(gdb) info stack
#0 0x000cb2a2 in bfin_mac_interrupt (irq=<value optimized out>,
dev_id=0x3c65000)
at drivers/net/bfin_mac.c:731
#1 0x0002c87e in handle_IRQ_event (irq=24, action=0x32adc0) at
kernel/irq/handle.c:142
#2 0x0002e1ba in handle_simple_irq (irq=24, desc=0x1ec850) at
kernel/irq/chip.c:327
#3 0xffa001dc in asm_do_IRQ (irq=24, regs=<value optimized out>) at
include/linux/irq.h:282
#4 0xffa00dfc in do_irq (vec=<value optimized out>, fp=0x544640)
at arch/blackfin/mach-common/ints-priority.c:1196
#5 0xffa00b4a in _common_int_entry ()
at
/opt/uClinux/uclinux-2009R1-bf537/linux-2.6.x/arch/blackfin/include/asm/current.h:17
Disassembly:
cb270: 28 e4 24 00 R0 = [P5 + 0x90];
cb274: f9 e3 d6 de CALL 0x7020
<_blackfin_dcache_invalidate_range>;
cb278: 04 32 P0 = R4;
cb27a: 21 e1 ff 07 R1 = 0x7ff (X); /*
R1=0x7ff(2047) */
cb27e: 42 91 P2 = [P0];
cb280: 55 bc [P2 + 0x4] = P5;
cb282: 28 e4 25 00 R0 = [P5 + 0x94];
cb286: f0 67 R0 += -0x2; /* ( -2) */
cb288: d0 b0 [P2 + 0xc] = R0;
cb28a: 50 a2 R0 = [P2 + 0x24];
cb28c: 08 54 R0 = R0 & R1;
cb28e: e0 67 R0 += -0x4; /* ( -4) */
cb290: c5 42 R5 = R0.L (Z);
cb292: 0d 30 R1 = R5;
cb294: 44 30 R0 = P4;
cb296: 01 e3 dd 3b CALL 0xf2a50 <_skb_put>;
cb29a: 4a e1 1e 00 P2.H = 0x1e; /* ( 30)
P2=0x1e02bc */
cb29e: 0a e1 48 b6 P2.L = 0xb648; /* (-18872)
P2=0x1eb648 <_jiffies> */
>> cb2a2: 10 91 R0 = [P2];
cb2a4: 18 e6 4b 00 [P3 + 0x12c] = R0;
cb2a8: 4b 30 R1 = P3;
cb2aa: 44 30 R0 = P4;
cb2ac: 01 e3 56 c4 CALL 0x103b58 <_eth_type_trans>;
cb2b0: 21 e5 25 00 P1 = [P4 + 0x94];
cb2b4: 60 e6 33 00 W[P4 + 0x66] = R0;
cb2b8: 21 e1 45 00 R1 = 0x45 (X); /*
R1=0x45( 69) */
cb2bc: 08 99 R0 = B[P1] (Z);
cb2be: 08 08 CC = R0 == R1;
cb2c0: 1c 18 IF CC JUMP 0xcb2f8
<_bfin_mac_interrupt+0x128>;
cb2c2: 44 30 R0 = P4;
cb2c4: 01 e3 5c 67 CALL 0xf817c <_netif_rx>;
cb2c8: 18 e4 14 00 R0 = [P3 + 0x50];
cb2cc: 08 64 R0 += 0x1; /* ( 1) */
cb2ce: 18 e6 14 00 [P3 + 0x50] = R0;
cb2d2: 18 e4 16 00 R0 = [P3 + 0x58];
cb2d6: 28 50 R0 = R0 + R5;
cb2d8: 18 e6 16 00 [P3 + 0x58] = R0;
cb2dc: a3 2f JUMP.S 0xcb222
<_bfin_mac_interrupt+0x52>;
cb2de: 4a e1 c0 ff P2.H = 0xffc0; /* (-64)
P2=0xffc0b648(-4147640) */
This is later that in the last post. We have the suspicion, that it might have
to do with the combination of blackfin_dcache_flush_range in tx and
blackfin_dcache_invalidate_range in rx, or with interrupts interfering with
either or both.
--- Robin Getz 2010-04-28 13:44:43
OK - I'm back - check on things, and they are still running over 4 days..
dhrystone and whetstone are running, ping flood, iperf, everything...
PING 10.64.173.28 (10.64.173.28) 56(84) bytes of data.
.--- 10.64.173.28 ping
statistics ---
242234870 packets transmitted, 242234865 received, +5 duplicates, 0% packet
loss, time 430902219ms
rtt min/avg/max/mdev = 0.099/1.749/19.979/1.460 ms, pipe 2, ipg/ewma
1.778/1.442 ms
And Iperf still is shooting out...
[ ID] Interval Transfer Bandwidth
[ 4] 428740.0-428750.0 sec 104 MBytes 10.4 MBytes/sec
I can't replicate things on trunk (with trunk 4.3 compiler).
What version of everything are you using? It indicates that it is 2009R1.1 -
there have been updates in the svn branch - do you have all those?
-Robin
--- Robin Getz 2010-04-28 13:48:34
My config:
root:/> uptime
04:09:45 up 5 days, 40 min, load average: 1.41, 1.84, 1.86
root:/> version
kernel: Linux release 2.6.33.2-ADI-2010R1-pre-svn8659, build #639 Fri Apr 23
12:11:46 EDT 2010
toolchain: bfin-linux-uclibc-gcc release gcc version 4.3.4
(ADI-trunk/svn-3951)
user-dist: release svn-9497, build #307 Mon Mar 29 11:54:47 EDT 2010
root:/> cat /proc/interrupts
6: 108611291 CORE Blackfin CoreTimer
10: 1 INTN rtc-bfin
18: 670 INTN BFIN_UART_RX
19: 11768595 INTN BFIN_UART_TX
24: 3429229526 INTN EMAC_RX
NMI: 0 CORE Non Maskable Interrupt
Err: 0
root:/> /sbin/ifconfig eth0
eth0 Link encap:Ethernet HWaddr 00:E0:22:FE:45:D0
inet addr:10.64.173.28 Bcast:10.64.173.255 Mask:255.255.255.0
UP BROADCAST NOTRAILERS RUNNING MULTICAST MTU:1500 Metric:1
RX packets:3502511927 errors:0 dropped:0 overruns:0 frame:0
TX packets:869923096 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:3449487351 (3.2 GiB) TX bytes:1290334026 (1.2 GiB)
-Robin
--- Stefan Wanja 2010-04-29 04:19:50
Hi,
first of all, we just had one board stall after 6 days of running, please leave
it running. We currently have 6 boards running to be able to see as much stalls
as possible.
We are using the latest svn of the 2009R1 branch. Kernel is: 2009R1 Rev 8664,
uclinux-dist is 2009R1.1 Rev 9586.
The compiler is bfin-linux-uclibc-gcc (GCC) 4.1.2 (ADI svn) and from the
directory I can only tell that is was built from blackfin-toolchain-09r1.1.
I noticed that your iperf bandwidth is quite high, we only have half of it here
on all our boards, but in both directions. Are you actually using the -d (dual)
option on the pc for full duplex traffic?
So either the problem has just not yet appeared, or it depends on the compiler
version, or on the kernel version or it is because you didn't use iperf's dual
test mode. Maybe you should exclude dhry, whetstone and pingflood from the
test?
-Stefan
--- Robin Getz 2010-04-29 04:26:49
I'll continue to leave things running. I'll also start up a few more boards to
see -- I also may have been building for 0.2 - so I'll bump things to 0.3 and
try that too.
I ran: "./iperf -c 10.64.173.28 -P 1 -f M -i 10 -d -t 604800" from
trunk. (after I changed the make system to build both host and Blackfin
versions).
-Robin
--- Stefan Wanja 2010-04-29 04:44:58
Iperf should report 2 lines every 10 seconds in your case, one for outbound and
one for inbound traffic. It should open two connections. Can you verify that
(netstat/wireshark/...)?
If you can do a 10MByte/s transfer simultaneously in both directions, I would
be really interested, why we can't ;)!
-Stefan
--- Robin Getz 2010-04-29 13:41:21
I think it is dual - I see:
------------------------------------------------------------
[ 5] local 10.64.173.47 port 50419 connected with 10.64.173.28 port 5001
[ 4] local 10.64.173.47 port 5001 connected with 10.64.173.28 port 35118
[ ID] Interval Transfer Bandwidth
[ 5] 0.0-60.0 sec 593 MBytes 9.88 MBytes/sec
[ 4] 0.0-60.0 sec 234 MBytes 3.89 MBytes/sec
And wiresharc shows things going both ways...
-Robin
--- Stefan Wanja 2010-05-12 05:51:23
Hey Robin,
could you reproduce the problem by now?
-Stefan
--- Robin Getz 2010-05-13 10:28:28
Not with trunk - I'm backing out things to the release branch to see there.
-Robin
--- Stefan Wanja 2011-04-28 10:51:31
Hello again,
I'm back on this problem now... didn't disappear by itself unfortunately ;).
Checked with gcc version 4.3.3 which led to the same problem.
Files
Changes
Commits
Dependencies
Duplicates
Associations
Tags
File Name File Type File Size Posted By
config application/xml 25881 Stefan Wanja
vmImage application/octet-stream 1012999 Stefan Wanja
rootfs.jffs2 application/octet-stream 1703936 Stefan Wanja
system_config application/octet-stream 46025 Stefan Wanja
iperf.exe application/octet-stream 110592 Stefan Wanja