[#5120] trap_test can double fault and crash the kernel.
Submitted By: Robin Getz
Open Date: 2009-05-11 11:12:27
Close Date: 2009-05-29 07:38:37
Priority: Medium
Assignee: Robin Getz
Status: Closed
Fixed In Release: N/A
Found In Release: 2010R1
Release: trunk
Category: N/A
Board: N/A
Processor: ALL
Silicon Revision: all
Is this bug repeatable?: Yes
Resolution: Fixed
Uboot version or rev.: trunk
Toolchain version or rev.: trunk
App binary format: N/A
Summary: trap_test can double fault and crash the kernel.
Details:
trap_test has discovered a few problems; this bug will track all of them...
-Robin
Follow-ups
--- Robin Getz 2009-05-11 14:34:46
Issue #1: the stack-printing code in traps.c shouldn't try to print out bad stacks.
Running test 21 for exception 0x24: Stack set to odd address - misaligned address violation
... Data access misaligned address violation
- Attempted misaligned data memory or data cache access.
Deferred Exception context
CURRENT PROCESS:
COMM=traps_test PID=256
PROCESSOR STATE:
R0 : 87654321 R1 : 00000000 R2 : 00000080 R3 : 03a46994
R4 : 00000001 R5 : 00000015 R6 : 00000006 R7 : 00000001
P0 : 03be6f80 P1 : 03bf9f34 P2 : 00000001 P3 : 03be6f80
P4 : 00c1fda8 P5 : 00cafe14 FP : 87654321 SP : 00cb7f24
LB0: 039ae0d1 LT0: 039ae0c4 LC0: 00000000
LB1: 03a60327 LT1: 03a602e4 LC1: 00000000
B0 : 00000000 L0 : 00000000 M0 : 00000000 I0 : 03bfa590
B1 : 00000000 L1 : 00000000 M1 : 00000000 I1 : 00000000
B2 : 00000000 L2 : 00000000 M2 : 00000000 I2 : 00000005
B3 : 00000000 L3 : 00000000 M3 : 00000000 I3 : 00000000
A0.w: 00000000 A0.x: 00000000 A1.w: 00000000 A1.x: 00000000
USP : 87654321 ASTAT: 02003024
Userspace Stack
Stack info:
SP: [0x87654321] <0x87654321> /* kernel dynamic memory */
Double Fault
Kernel OOPS in progress
Deferred Exception context
CURRENT PROCESS:
COMM=traps_test PID=256
CPU = 0
TEXT = 0x03bf8000-0x03bfaab8 DATA = 0x03be6ab8-0x03be7060
BSS = 0x03be7060-0x00c00000 USER-STACK = 0x00c1fe90
return address: [0x000045fe]; contents of:
0x000045d0: 59fc 3055 e30a d6ee 6381 e430 0034 5408
0x000045e0: 0a06 1835 3228 6007 601d 6fe5 2008 6c25
0x000045f0: 3045 6420 0a06 1829 0000 0000 0000 [a068]
0x00004600: 4800 17f6 67f0 e3ff ff8d 4340 0c00 067d
bfin-elf-addr2line -f -e ./linux-2.6.x/vmlinux 0x000045fe
show_stack
arch/blackfin/kernel/traps.c:842
--- linux-2.6.x/arch/blackfin/kernel/traps.c (revision 6310)
+++ linux-2.6.x/arch/blackfin/kernel/traps.c (working copy)
@@ -831,6 +837,11 @@
 	decode_address(buf, (unsigned int)stack);
 	printk(KERN_NOTICE " SP: [0x%p] %s\n", stack, buf);
+	if (!access_ok(VERIFY_READ, stack, (unsigned int)endstack - (unsigned int)stack)) {
+		printk(KERN_NOTICE "Invalid stack pointer\n");
+		return;
+	}
+
Fixes this.
--- Robin Getz 2009-05-11 14:56:29
Issue #2: new anomaly 05-00-0461. The RETI register can't point to non-existent memory when returning from a HW Error.
Running test 47 for exception 0x3f: Jump to non-existent L1
... External Memory Addressing Error
HW Error context
CURRENT PROCESS:
COMM=traps_test PID=282
CPU = 0
TEXT = 0x03be4000-0x03be6ab8 DATA = 0x03bf4ab8-0x03bf5060
BSS = 0x03bf5060-0x00c00000 USER-STACK = 0x00c1fe90
return address: [0xffaffffc]; contents of:
SEQUENCER STATUS: Not tainted
SEQSTAT: 0000c03f IPEND: 0030 SYSCFG: 3bf4e88
HWERRCAUSE: 0x3
EXCAUSE : 0x3f
interrupts disabled
physical IVG5 asserted : <0xffa00b88> { _evt_ivhw + 0x0 }
RETE: <0x00000000> { _do_one_initcall + 0xfffff000 }
RETN: <0x00c96000> /* kernel dynamic memory */
RETX: <0x03be5eb4> [ /traps_test + 0x1eb4 ]
RETS: <0x03be5896> [ /traps_test + 0x1896 ]
PC : <0xffaffffc> /* kernel dynamic memory */
PROCESSOR STATE:
R0 : 00c93e14 R1 : 00000006 R2 : 03bf4f80 R3 : 00000000
R4 : 00000000 R5 : 00000000 R6 : 00000080 R7 : 03a46994
P0 : 00000001 P1 : 0000002f P2 : 00000006 P3 : 00000001
P4 : 03bf4f80 P5 : ffaffffc FP : 03bf4e88 SP : 00c95ef0
LB0: 039ae0d1 LT0: 039ae0c4 LC0: 00000000
LB1: 03a60327 LT1: 03a602e4 LC1: 00000000
B0 : 00000000 L0 : 00000000 M0 : 03be6890 I0 : 00c1fda8
B1 : 00000000 L1 : 00000000 M1 : 00000000 I1 : 00c93e14
B2 : 00000000 L2 : 00000000 M2 : 00000005 I2 : 00000000
B3 : 00000000 L3 : 00000000 M3 : 00000000 I3 : 00c1fd04
A0.w: 00000000 A0.x: 00000000 A1.w: 00000000 A1.x: 00000000
USP : 00c1fd04 ASTAT: 02003024
Hardware Trace:
0 Target : <0x00004c38> { _trap_c + 0x0 }
Source : <0xffa00c06> { _evt_ivhw + 0x7e } CALL pcrel
1 Target : <0xffa00b88> { _evt_ivhw + 0x0 }
Source : <0x03be5894> [ /traps_test + 0x1894 ] CALL (P1)
2 Target : <0x03be587c> [ /traps_test + 0x187c ]
Source : <0x03be5eb8> [ /traps_test + 0x1eb8 ] CALL (P1)
3 Target : <0x03be5eb4> [ /traps_test + 0x1eb4 ]
Source : <0xffa003e4> { _ex_dcplb_miss + 0x5c } RTX
External Memory Addressing Error
Kernel OOPS in progress
HW Error context
CURRENT PROCESS:
COMM=traps_test PID=282
CPU = 0
TEXT = 0x03be4000-0x03be6ab8 DATA = 0x03bf4ab8-0x03bf5060
BSS = 0x03bf5060-0x00c00000 USER-STACK = 0x00c1fe90
return address: [0xffa0031e]; contents of:
0xffa002f0: 304a 0061 0030 0000 0000 0023 0040 e14a
0xffa00300: ffe0 e10a 2108 9111 e120 7fff 5441 3001
0xffa00310: 67f8 5408 4280 0c00 1403 e300 0333 [e300]
0xffa00320: 0847 6c66 0127 932e 05b5 0010 c682 8027
Looks like this was a deferred error - sorry
Hardware Trace:
0 Target : <0x00004c38> { _trap_c + 0x0 }
Source : <0xffa00c06> { _evt_ivhw + 0x7e } CALL pcrel
1 Target : <0xffa00b88> { _evt_ivhw + 0x0 }
Source : <0xffa00994> { _evt14_softirq + 0x8 } RTS
2 Target : <0xffa0098c> { _evt14_softirq + 0x0 }
Source : <0xffa00988> { _lower_to_irq14 + 0x8 } RTI
3 Target : <0xffa00980> { _lower_to_irq14 + 0x0 }
Source : <0xffa0031a> { _asm_do_IRQ + 0x5e } CALL pcrel
4 Target : <0xffa002f4> { _asm_do_IRQ + 0x38 }
Source : <0xffa03008> { _handle_simple_irq + 0x70 } RTS
5 Target : <0xffa02ffc> { _handle_simple_irq + 0x64 }
Source : <0xffa03012> { _handle_simple_irq + 0x7a } JUMP.S
6 Target : <0xffa03012> { _handle_simple_irq + 0x7a }
Modules linked in:
Kernel panic - not syncing: Kernel exception
Fixed with:
--- linux-2.6.x/arch/blackfin/kernel/traps.c (revision 6344)
+++ linux-2.6.x/arch/blackfin/kernel/traps.c (working copy)
@@ -593,6 +593,9 @@
 		force_sig_info(sig, &info, current);
 	}
+	if (ANOMALY_05000461 && trapnr == VEC_HWERR &&
+	    !access_ok(VERIFY_READ, fp->pc, 8))
+		fp->pc = SAFE_USER_INSTRUCTION;
+
 	trace_buffer_restore(j);
 	return;
 }
Fixes this.
--- Robin Getz 2009-05-11 16:25:41
>/traps_test -d 0 -c 10000 49
(run while ping flooding the target) produces:
Running test 49 for exception 0x3f: Write non-existent L1
... External Memory Addressing Error
Kernel OOPS in progress
HW Error context
CURRENT PROCESS:
COMM=traps_test PID=1994
CPU = 0
TEXT = 0x00d10000-0x00d12ab8 DATA = 0x00d1aab8-0x00d1b060
BSS = 0x00d1b060-0x00d80000 USER-STACK = 0x00d9fea0
return address: [0xffa00b0e]; contents of:
0xffa00ae0: 0162 0163 0170 0173 0171 0174 0172 0175
0xffa00af0: 0166 0140 0167 31d3 0142 017c 017d 017e
0xffa00b00: 0179 0141 61f9 0041 017b 6001 3621 [3629]
0xffa00b10: 3631 3639 304e 6fa6 e300 0114 6c66 e3ff
Looks like this was a deferred error - sorry
It might be better to look around here :
-------------------------------------------
SEQUENCER STATUS: Not tainted
SEQSTAT: 0000c026 IPEND: 0000 SYSCFG: 0006
EXCAUSE : 0x26
RETE: <0x00000000> { _do_one_initcall + 0xfffff000 }
RETN: <0x00d76000> /* kernel dynamic memory */
RETX: <0x00d11eb4> [ /traps_test + 0x1eb4 ]
RETS: <0x00d11eba> [ /traps_test + 0x1eba ]
PC : <0x00d118a8> [ /traps_test + 0x18a8 ]
DCPLB_FAULT_ADDR: <0x00d61e78> /* kernel dynamic memory */
ICPLB_FAULT_ADDR: <0x0009efb4> { _sprintf + 0x0 }
PROCESSOR STATE:
R0 : 00000000 R1 : 00000000 R2 : 00000080 R3 : 03a46994
R4 : 00000001 R5 : 00000031 R6 : 00000005 R7 : 00000001
P0 : 00d1af80 P1 : 00d1189c P2 : ffaffffc P3 : 00d1af80
P4 : 00d9fdb8 P5 : 00d7be14 FP : 00000001 SP : 00d75f24
LB0: 039ce0d1 LT0: 039ce0c4 LC0: 00000000
LB1: 039ca4fb LT1: 039ca4fa LC1: 00000000
B0 : 00000000 L0 : 00000000 M0 : 00000000 I0 : 00d9ffe5
B1 : 00000000 L1 : 00000000 M1 : 00000000 I1 : 00d79378
B2 : 00000000 L2 : 00000000 M2 : 00000000 I2 : 00000005
B3 : 00000000 L3 : 00000000 M3 : 00000000 I3 : 00000000
A0.w: 00000000 A0.x: 00000000 A1.w: 00000000 A1.x: 00000000
USP : 00d9fd2c ASTAT: 02003024
-------------------------------------------
SEQUENCER STATUS: Not tainted
SEQSTAT: 0000c03f IPEND: 0830 SYSCFG: 0006
HWERRCAUSE: 0x3
EXCAUSE : 0x3f
interrupts disabled
physical IVG5 asserted : <0xffa00b88> { _evt_ivhw + 0x0 }
physical IVG11 asserted : <0xffa00cb4> { _evt_evt11 + 0x0 }
logical irq 6 mapped : <0xffa00374> { _timer_interrupt + 0x0 }
logical irq 10 mapped : <0x000ea5c0> { _bfin_rtc_interrupt + 0x0 }
logical irq 12 mapped : <0x0010773c> { _rx_handler + 0x0 }
logical irq 13 mapped : <0x001076e0> { _tx_handler + 0x0 }
logical irq 18 mapped : <0x000b03d4> { _bfin_serial_dma_rx_int + 0x0 }
logical irq 19 mapped : <0x000b00a0> { _bfin_serial_dma_tx_int + 0x0 }
logical irq 24 mapped : <0x000b9f68> { _bfin_mac_interrupt + 0x0 }
logical irq 45 mapped : <0x0010754c> { _err_handler + 0x0 }
RETE: <0x00000000> { _do_one_initcall + 0xfffff000 }
RETN: <0x00d76000> /* kernel dynamic memory */
RETX: <0x00d11eb4> [ /traps_test + 0x1eb4 ]
RETS: <0x00d11eba> [ /traps_test + 0x1eba ]
PC : <0xffa00b0e> { __common_int_entry + 0x56 }
PROCESSOR STATE:
R0 : 0000000b R1 : 00000000 R2 : 00d118a8 R3 : 03a46994
R4 : 00000001 R5 : 00000031 R6 : 00000005 R7 : 00000001
P0 : 00d1af80 P1 : 00d1189c P2 : ffaffffc P3 : 00d1af80
P4 : 00d9fdb8 P5 : 00d7be14 FP : 00000001 SP : 00d75e48
LB0: 039ce0d1 LT0: 039ce0c4 LC0: 00000000
LB1: 039ca4fb LT1: 039ca4fa LC1: 00000000
B0 : 00000000 L0 : 00000000 M0 : 00000000 I0 : 00d9ffe5
B1 : 00000000 L1 : 00000000 M1 : 00000000 I1 : 00d79378
B2 : 00000000 L2 : 00000000 M2 : 00000000 I2 : 00000005
B3 : 00000000 L3 : 00000000 M3 : 00000000 I3 : 00000000
A0.w: 00000000 A0.x: 00000000 A1.w: 00000000 A1.x: 00000000
USP : 00d9fd2c ASTAT: 02003024
Hardware Trace:
0 Target : <0x00004c38> { _trap_c + 0x0 }
Source : <0xffa00c06> { _evt_ivhw + 0x7e } CALL pcrel
1 Target : <0xffa00b88> { _evt_ivhw + 0x0 }
Source : <0xffa00b0c> { __common_int_entry + 0x54 } 0x3621
2 Target : <0xffa00ab8> { __common_int_entry + 0x0 }
Source : <0xffa00cbe> { _evt_evt11 + 0xa } JUMP.S
3 Target : <0xffa00cb4> { _evt_evt11 + 0x0 }
Source : <0x00d118a6> [ /traps_test + 0x18a6 ] 0x9310
4 Target : <0x00d1189c> [ /traps_test + 0x189c ]
Source : <0x00d11eb8> [ /traps_test + 0x1eb8 ] CALL (P1)
5 Target : <0x00d11eb4> [ /traps_test + 0x1eb4 ]
Source : <0xffa003e4> { _ex_dcplb_miss + 0x5c } RTX
6 Target : <0xffa003ae> { _ex_dcplb_miss + 0x26 }
Source : <0x00009b4e> { _dcplb_miss + 0x16e } RTS
7 Target : <0x00009ae2> { _dcplb_miss + 0x102 }
Source : <0x00009bba> { _dcplb_miss + 0x1da } JUMP.S
8 Target : <0x00009bb8> { _dcplb_miss + 0x1d8 }
Source : <0x00009abc> { _dcplb_miss + 0xdc } IF !CC JUMP
9 Target : <0x00009ab4> { _dcplb_miss + 0xd4 }
Source : <0x00009ac4> { _dcplb_miss + 0xe4 } IF CC JUMP
10 Target : <0x00009ab4> { _dcplb_miss + 0xd4 }
Source : <0x00009ac4> { _dcplb_miss + 0xe4 } IF CC JUMP
11 Target : <0x00009ab4> { _dcplb_miss + 0xd4 }
Source : <0x00009ac4> { _dcplb_miss + 0xe4 } IF CC JUMP
12 Target : <0x00009ab4> { _dcplb_miss + 0xd4 }
Source : <0x00009ac4> { _dcplb_miss + 0xe4 } IF CC JUMP
13 Target : <0x00009ab4> { _dcplb_miss + 0xd4 }
Source : <0x00009ac4> { _dcplb_miss + 0xe4 } IF CC JUMP
14 Target : <0x00009ab4> { _dcplb_miss + 0xd4 }
Source : <0x00009ac4> { _dcplb_miss + 0xe4 } IF CC JUMP
15 Target : <0x00009ab4> { _dcplb_miss + 0xd4 }
Source : <0x00009ac4> { _dcplb_miss + 0xe4 } IF CC JUMP
The Blackfin has weak ordering of loads and stores. Weak ordering means that
memory operations may actually complete at a different time, and even in a
different order, than the sequence in which they appear in the program source
code.
In this case, the write goes into a write buffer, but isn't actually flagged as
"bad" (when IRQ5 goes off) until we are in a different context (this time,
IRQ11, the Ethernet interrupt). Since we receive the IRQ5 event while in kernel
space, we assume the kernel caused the error, and OOPS...
The way to handle this is to SSYNC (which forces all pending writes to complete
and signals any resulting HW Error), check whether IRQ5 has gone off, handle
that error, and then handle the interrupt properly...
I'm committing this on trunk as a config option (normally turned off, but turned
on for our platforms); I just need to do a little more testing to make sure I've
covered all the corner cases...
--- Robin Getz 2009-05-11 16:42:03
Add hours
--- Robin Getz 2009-05-18 15:07:10
After testing all weekend, committed on trunk.
More noodling on other issues...
-Robin
--- Robin Getz 2009-05-29 07:38:37
Added (with Mike's help) code that ptraces the code under test, to simulate
a gdbserver session. As I expected, this uncovered another place in the
kernel that needed to be fixed. traps_test is committed with the updated tests,
and the kernel is fixed on trunk as well.
This is the last problem I can think of. If anyone can crash the kernel
from userspace (with CONFIG_EXACT_HWERR on), please open a new bug.
I will update the documentation today.
Closing, and marking as fixed.
-Robin
--- Robin Getz 2009-06-02 09:45:38
OK, stress testing found one more problem (which is now fixed on trunk).
Stress testing here means running traps_test, whetstone in one telnet session,
top in another telnet session, and a ping flood from the host.
After the recent change, everything runs overnight without problems. (Yeah!)
-Robin