[#5120] trap_test can double fault and crash the kernel.

Document created by Aaronwu Employee on Sep 3, 2013

Submitted By: Robin Getz

Open Date: 2009-05-11 11:12:27
Close Date: 2009-05-29 07:38:37
Priority: Medium
Assignee: Robin Getz
Status: Closed
Fixed In Release: N/A
Found In Release: 2010R1
Release: trunk
Category: N/A
Board: N/A
Processor: ALL
Silicon Revision: all
Is this bug repeatable?: Yes
Resolution: Fixed
Uboot version or rev.: trunk
Toolchain version or rev.: trunk
App binary format: N/A

Summary: trap_test can double fault and crash the kernel.

Details:

 

There are a few problems which trap_test has discovered - this bug tracker will track them all...

 

-Robin

 

Follow-ups

 

--- Robin Getz                                               2009-05-11 14:34:46

Issue #1: ./traps.c Stack printing code shouldn't try to print out bad stacks.

 

 

Running test 21 for exception 0x24: Stack set to odd address - misaligned address violation

... Data access misaligned address violation

- Attempted misaligned data memory or data cache access.

Deferred Exception context

CURRENT PROCESS:

COMM=traps_test PID=256

 

PROCESSOR STATE:

R0 : 87654321    R1 : 00000000    R2 : 00000080    R3 : 03a46994

R4 : 00000001    R5 : 00000015    R6 : 00000006    R7 : 00000001

P0 : 03be6f80    P1 : 03bf9f34    P2 : 00000001    P3 : 03be6f80

P4 : 00c1fda8    P5 : 00cafe14    FP : 87654321    SP : 00cb7f24

LB0: 039ae0d1    LT0: 039ae0c4    LC0: 00000000

LB1: 03a60327    LT1: 03a602e4    LC1: 00000000

B0 : 00000000    L0 : 00000000    M0 : 00000000    I0 : 03bfa590

B1 : 00000000    L1 : 00000000    M1 : 00000000    I1 : 00000000

B2 : 00000000    L2 : 00000000    M2 : 00000000    I2 : 00000005

B3 : 00000000    L3 : 00000000    M3 : 00000000    I3 : 00000000

A0.w: 00000000   A0.x: 00000000   A1.w: 00000000   A1.x: 00000000

USP : 87654321  ASTAT: 02003024

 

Userspace Stack

Stack info:

SP: [0x87654321] <0x87654321> /* kernel dynamic memory */

 

Double Fault

Kernel OOPS in progress

Deferred Exception context

CURRENT PROCESS:

COMM=traps_test PID=256

CPU = 0

TEXT = 0x03bf8000-0x03bfaab8        DATA = 0x03be6ab8-0x03be7060

BSS = 0x03be7060-0x00c00000  USER-STACK = 0x00c1fe90

 

return address: [0x000045fe]; contents of:

0x000045d0:  59fc  3055  e30a  d6ee  6381  e430  0034  5408

0x000045e0:  0a06  1835  3228  6007  601d  6fe5  2008  6c25

0x000045f0:  3045  6420  0a06  1829  0000  0000  0000 [a068]

0x00004600:  4800  17f6  67f0  e3ff  ff8d  4340  0c00  067d

 

bfin-elf-addr2line -f -e ./linux-2.6.x/vmlinux 0x000045fe

show_stack

arch/blackfin/kernel/traps.c:842

 

--- linux-2.6.x/arch/blackfin/kernel/traps.c    (revision 6310)

+++ linux-2.6.x/arch/blackfin/kernel/traps.c    (working copy)

@@ -831,6 +837,11 @@

        decode_address(buf, (unsigned int)stack);

        printk(KERN_NOTICE " SP: [0x%p] %s\n", stack, buf);

 

+       if (!access_ok(VERIFY_READ, stack, (unsigned int)endstack - (unsigned int)stack)) {

+               printk(KERN_NOTICE "Invalid stack pointer\n");

+               return;

+       }

+

 

Fixes this.
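
The idea behind the patch above can be sketched outside the kernel as a range check that runs before any stack word is read. This is only an illustration: `struct mem_region`, `range_ok()` and `dump_stack_checked()` are hypothetical names standing in for the kernel's `access_ok()` and `show_stack()`, and the region bounds used in the usage note are made up.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical description of one readable memory region (not a kernel type). */
struct mem_region {
    uintptr_t start;  /* first valid byte */
    uintptr_t end;    /* one past the last valid byte */
};

/* Stand-in for access_ok(): true if [addr, addr + size) lies inside the region. */
static bool range_ok(const struct mem_region *r, uintptr_t addr, size_t size)
{
    if (addr < r->start || addr > r->end)
        return false;
    return size <= r->end - addr;  /* compared this way to avoid overflow */
}

/*
 * Returns the number of 32-bit words that would be dumped,
 * or -1 if the stack pointer is rejected before any read happens.
 */
static long dump_stack_checked(const struct mem_region *r,
                               uintptr_t stack, uintptr_t endstack)
{
    if (endstack < stack || !range_ok(r, stack, endstack - stack)) {
        printf("Invalid stack pointer\n");
        return -1;
    }
    return (long)((endstack - stack) / sizeof(uint32_t));
}
```

With an assumed region of 0x00c00000-0x00d00000, a stack pointer such as 0x00cb7f24 passes the check, while the bogus 0x87654321 from test 21 is rejected before it can be dereferenced.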

 

--- Robin Getz                                               2009-05-11 14:56:29

Issue #2 - new anomaly -- 05-00-0461 The RETI register can't point to

non-existent memory when returning from a HW Error.

 

Running test 47 for exception 0x3f: Jump to non-existent L1

... External Memory Addressing Error

HW Error context

CURRENT PROCESS:

COMM=traps_test PID=282

CPU = 0

TEXT = 0x03be4000-0x03be6ab8        DATA = 0x03bf4ab8-0x03bf5060

BSS = 0x03bf5060-0x00c00000  USER-STACK = 0x00c1fe90

 

return address: [0xffaffffc]; contents of:

 

SEQUENCER STATUS:               Not tainted

SEQSTAT: 0000c03f  IPEND: 0030  SYSCFG: 3bf4e88

  HWERRCAUSE: 0x3

  EXCAUSE   : 0x3f

  interrupts disabled

  physical IVG5 asserted : <0xffa00b88> { _evt_ivhw + 0x0 }

RETE: <0x00000000> { _do_one_initcall + 0xfffff000 }

RETN: <0x00c96000> /* kernel dynamic memory */

RETX: <0x03be5eb4> [ /traps_test + 0x1eb4 ]

RETS: <0x03be5896> [ /traps_test + 0x1896 ]

PC  : <0xffaffffc> /* kernel dynamic memory */

 

PROCESSOR STATE:

R0 : 00c93e14    R1 : 00000006    R2 : 03bf4f80    R3 : 00000000

R4 : 00000000    R5 : 00000000    R6 : 00000080    R7 : 03a46994

P0 : 00000001    P1 : 0000002f    P2 : 00000006    P3 : 00000001

P4 : 03bf4f80    P5 : ffaffffc    FP : 03bf4e88    SP : 00c95ef0

LB0: 039ae0d1    LT0: 039ae0c4    LC0: 00000000

LB1: 03a60327    LT1: 03a602e4    LC1: 00000000

B0 : 00000000    L0 : 00000000    M0 : 03be6890    I0 : 00c1fda8

B1 : 00000000    L1 : 00000000    M1 : 00000000    I1 : 00c93e14

B2 : 00000000    L2 : 00000000    M2 : 00000005    I2 : 00000000

B3 : 00000000    L3 : 00000000    M3 : 00000000    I3 : 00c1fd04

A0.w: 00000000   A0.x: 00000000   A1.w: 00000000   A1.x: 00000000

USP : 00c1fd04  ASTAT: 02003024

 

Hardware Trace:

   0 Target : <0x00004c38> { _trap_c + 0x0 }

     Source : <0xffa00c06> { _evt_ivhw + 0x7e } CALL pcrel

   1 Target : <0xffa00b88> { _evt_ivhw + 0x0 }

     Source : <0x03be5894> [ /traps_test + 0x1894 ] CALL (P1)

   2 Target : <0x03be587c> [ /traps_test + 0x187c ]

     Source : <0x03be5eb8> [ /traps_test + 0x1eb8 ] CALL (P1)

   3 Target : <0x03be5eb4> [ /traps_test + 0x1eb4 ]

     Source : <0xffa003e4> { _ex_dcplb_miss + 0x5c } RTX

 

 

External Memory Addressing Error

Kernel OOPS in progress

HW Error context

CURRENT PROCESS:

COMM=traps_test PID=282

CPU = 0

TEXT = 0x03be4000-0x03be6ab8        DATA = 0x03bf4ab8-0x03bf5060

BSS = 0x03bf5060-0x00c00000  USER-STACK = 0x00c1fe90

 

return address: [0xffa0031e]; contents of:

0xffa002f0:  304a  0061  0030  0000  0000  0023  0040  e14a

0xffa00300:  ffe0  e10a  2108  9111  e120  7fff  5441  3001

0xffa00310:  67f8  5408  4280  0c00  1403  e300  0333 [e300]

0xffa00320:  0847  6c66  0127  932e  05b5  0010  c682  8027

Looks like this was a deferred error - sorry

 

 

Hardware Trace:

   0 Target : <0x00004c38> { _trap_c + 0x0 }

     Source : <0xffa00c06> { _evt_ivhw + 0x7e } CALL pcrel

   1 Target : <0xffa00b88> { _evt_ivhw + 0x0 }

     Source : <0xffa00994> { _evt14_softirq + 0x8 } RTS

   2 Target : <0xffa0098c> { _evt14_softirq + 0x0 }

     Source : <0xffa00988> { _lower_to_irq14 + 0x8 } RTI

   3 Target : <0xffa00980> { _lower_to_irq14 + 0x0 }

     Source : <0xffa0031a> { _asm_do_IRQ + 0x5e } CALL pcrel

   4 Target : <0xffa002f4> { _asm_do_IRQ + 0x38 }

     Source : <0xffa03008> { _handle_simple_irq + 0x70 } RTS

   5 Target : <0xffa02ffc> { _handle_simple_irq + 0x64 }

     Source : <0xffa03012> { _handle_simple_irq + 0x7a } JUMP.S

   6 Target : <0xffa03012> { _handle_simple_irq + 0x7a }

 

 

Modules linked in:

Kernel panic - not syncing: Kernel exception

 

Fixed with:

 

--- linux-2.6.x/arch/blackfin/kernel/traps.c    (revision 6344)

+++ linux-2.6.x/arch/blackfin/kernel/traps.c    (working copy)

@@ -593,6 +593,9 @@

                force_sig_info(sig, &info, current);

        }

 

+       if (ANOMALY_05000461 && trapnr == VEC_HWERR && !access_ok(VERIFY_READ, fp->pc, 8))

+               fp->pc = SAFE_USER_INSTRUCTION;

+

        trace_buffer_restore(j);

        return;

}

 

Fixes this.
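
A minimal user-space sketch of the same workaround, assuming a single readable window in place of the kernel's `access_ok()`: if the saved return address cannot be read, the return is redirected to a known-good address. `struct fake_regs`, `readable()`, `fixup_return_pc()` and the value of `SAFE_PC` are hypothetical and only model the `fp->pc` handling in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SAFE_PC 0x00001000u  /* hypothetical stand-in for SAFE_USER_INSTRUCTION */

struct fake_regs { uint32_t pc; };  /* hypothetical, models only fp->pc */

/* Stand-in for access_ok(): one readable window [lo, hi). */
static bool readable(uint32_t lo, uint32_t hi, uint32_t addr, uint32_t len)
{
    return addr >= lo && addr <= hi && len <= hi - addr;
}

/* On return from a HW error, never leave pc pointing at unreadable memory. */
static void fixup_return_pc(struct fake_regs *fp, bool is_hwerr,
                            uint32_t lo, uint32_t hi)
{
    if (is_hwerr && !readable(lo, hi, fp->pc, 8))
        fp->pc = SAFE_PC;
}
```

Using the TEXT range from the log as the readable window, a saved pc of 0xffaffffc would be replaced, while 0x03be5eb4 would be left alone.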

 

--- Robin Getz                                               2009-05-11 16:25:41

>/traps_test -d 0 -c 10000 49

(while ping flooding it)

 

gets:

 

Running test 49 for exception 0x3f: Write non-existent L1

... External Memory Addressing Error

 

Kernel OOPS in progress

HW Error context

CURRENT PROCESS:

COMM=traps_test PID=1994

CPU = 0

TEXT = 0x00d10000-0x00d12ab8        DATA = 0x00d1aab8-0x00d1b060

BSS = 0x00d1b060-0x00d80000  USER-STACK = 0x00d9fea0

 

return address: [0xffa00b0e]; contents of:

0xffa00ae0:  0162  0163  0170  0173  0171  0174  0172  0175

0xffa00af0:  0166  0140  0167  31d3  0142  017c  017d  017e

0xffa00b00:  0179  0141  61f9  0041  017b  6001  3621 [3629]

0xffa00b10:  3631  3639  304e  6fa6  e300  0114  6c66  e3ff

Looks like this was a deferred error - sorry

It might be better to look around here :

-------------------------------------------

 

SEQUENCER STATUS:               Not tainted

SEQSTAT: 0000c026  IPEND: 0000  SYSCFG: 0006

  EXCAUSE   : 0x26

RETE: <0x00000000> { _do_one_initcall + 0xfffff000 }

RETN: <0x00d76000> /* kernel dynamic memory */

RETX: <0x00d11eb4> [ /traps_test + 0x1eb4 ]

RETS: <0x00d11eba> [ /traps_test + 0x1eba ]

PC  : <0x00d118a8> [ /traps_test + 0x18a8 ]

DCPLB_FAULT_ADDR: <0x00d61e78> /* kernel dynamic memory */

ICPLB_FAULT_ADDR: <0x0009efb4> { _sprintf + 0x0 }

 

PROCESSOR STATE:

R0 : 00000000    R1 : 00000000    R2 : 00000080    R3 : 03a46994

R4 : 00000001    R5 : 00000031    R6 : 00000005    R7 : 00000001

P0 : 00d1af80    P1 : 00d1189c    P2 : ffaffffc    P3 : 00d1af80

P4 : 00d9fdb8    P5 : 00d7be14    FP : 00000001    SP : 00d75f24

LB0: 039ce0d1    LT0: 039ce0c4    LC0: 00000000

LB1: 039ca4fb    LT1: 039ca4fa    LC1: 00000000

B0 : 00000000    L0 : 00000000    M0 : 00000000    I0 : 00d9ffe5

B1 : 00000000    L1 : 00000000    M1 : 00000000    I1 : 00d79378

B2 : 00000000    L2 : 00000000    M2 : 00000000    I2 : 00000005

B3 : 00000000    L3 : 00000000    M3 : 00000000    I3 : 00000000

A0.w: 00000000   A0.x: 00000000   A1.w: 00000000   A1.x: 00000000

USP : 00d9fd2c  ASTAT: 02003024

 

-------------------------------------------

 

SEQUENCER STATUS:               Not tainted

SEQSTAT: 0000c03f  IPEND: 0830  SYSCFG: 0006

  HWERRCAUSE: 0x3

  EXCAUSE   : 0x3f

  interrupts disabled

  physical IVG5 asserted : <0xffa00b88> { _evt_ivhw + 0x0 }

  physical IVG11 asserted : <0xffa00cb4> { _evt_evt11 + 0x0 }

  logical irq   6 mapped  : <0xffa00374> { _timer_interrupt + 0x0 }

  logical irq  10 mapped  : <0x000ea5c0> { _bfin_rtc_interrupt + 0x0 }

  logical irq  12 mapped  : <0x0010773c> { _rx_handler + 0x0 }

  logical irq  13 mapped  : <0x001076e0> { _tx_handler + 0x0 }

  logical irq  18 mapped  : <0x000b03d4> { _bfin_serial_dma_rx_int + 0x0 }

  logical irq  19 mapped  : <0x000b00a0> { _bfin_serial_dma_tx_int + 0x0 }

  logical irq  24 mapped  : <0x000b9f68> { _bfin_mac_interrupt + 0x0 }

  logical irq  45 mapped  : <0x0010754c> { _err_handler + 0x0 }

RETE: <0x00000000> { _do_one_initcall + 0xfffff000 }

RETN: <0x00d76000> /* kernel dynamic memory */

RETX: <0x00d11eb4> [ /traps_test + 0x1eb4 ]

RETS: <0x00d11eba> [ /traps_test + 0x1eba ]

PC  : <0xffa00b0e> { __common_int_entry + 0x56 }

 

PROCESSOR STATE:

R0 : 0000000b    R1 : 00000000    R2 : 00d118a8    R3 : 03a46994

R4 : 00000001    R5 : 00000031    R6 : 00000005    R7 : 00000001

P0 : 00d1af80    P1 : 00d1189c    P2 : ffaffffc    P3 : 00d1af80

P4 : 00d9fdb8    P5 : 00d7be14    FP : 00000001    SP : 00d75e48

LB0: 039ce0d1    LT0: 039ce0c4    LC0: 00000000

LB1: 039ca4fb    LT1: 039ca4fa    LC1: 00000000

B0 : 00000000    L0 : 00000000    M0 : 00000000    I0 : 00d9ffe5

B1 : 00000000    L1 : 00000000    M1 : 00000000    I1 : 00d79378

B2 : 00000000    L2 : 00000000    M2 : 00000000    I2 : 00000005

B3 : 00000000    L3 : 00000000    M3 : 00000000    I3 : 00000000

A0.w: 00000000   A0.x: 00000000   A1.w: 00000000   A1.x: 00000000

USP : 00d9fd2c  ASTAT: 02003024

 

Hardware Trace:

   0 Target : <0x00004c38> { _trap_c + 0x0 }

     Source : <0xffa00c06> { _evt_ivhw + 0x7e } CALL pcrel

   1 Target : <0xffa00b88> { _evt_ivhw + 0x0 }

     Source : <0xffa00b0c> { __common_int_entry + 0x54 } 0x3621

   2 Target : <0xffa00ab8> { __common_int_entry + 0x0 }

     Source : <0xffa00cbe> { _evt_evt11 + 0xa } JUMP.S

   3 Target : <0xffa00cb4> { _evt_evt11 + 0x0 }

     Source : <0x00d118a6> [ /traps_test + 0x18a6 ] 0x9310

   4 Target : <0x00d1189c> [ /traps_test + 0x189c ]

     Source : <0x00d11eb8> [ /traps_test + 0x1eb8 ] CALL (P1)

   5 Target : <0x00d11eb4> [ /traps_test + 0x1eb4 ]

     Source : <0xffa003e4> { _ex_dcplb_miss + 0x5c } RTX

   6 Target : <0xffa003ae> { _ex_dcplb_miss + 0x26 }

     Source : <0x00009b4e> { _dcplb_miss + 0x16e } RTS

   7 Target : <0x00009ae2> { _dcplb_miss + 0x102 }

     Source : <0x00009bba> { _dcplb_miss + 0x1da } JUMP.S

   8 Target : <0x00009bb8> { _dcplb_miss + 0x1d8 }

     Source : <0x00009abc> { _dcplb_miss + 0xdc } IF !CC JUMP

   9 Target : <0x00009ab4> { _dcplb_miss + 0xd4 }

     Source : <0x00009ac4> { _dcplb_miss + 0xe4 } IF CC JUMP

  10 Target : <0x00009ab4> { _dcplb_miss + 0xd4 }

     Source : <0x00009ac4> { _dcplb_miss + 0xe4 } IF CC JUMP

  11 Target : <0x00009ab4> { _dcplb_miss + 0xd4 }

     Source : <0x00009ac4> { _dcplb_miss + 0xe4 } IF CC JUMP

  12 Target : <0x00009ab4> { _dcplb_miss + 0xd4 }

     Source : <0x00009ac4> { _dcplb_miss + 0xe4 } IF CC JUMP

  13 Target : <0x00009ab4> { _dcplb_miss + 0xd4 }

     Source : <0x00009ac4> { _dcplb_miss + 0xe4 } IF CC JUMP

  14 Target : <0x00009ab4> { _dcplb_miss + 0xd4 }

     Source : <0x00009ac4> { _dcplb_miss + 0xe4 } IF CC JUMP

  15 Target : <0x00009ab4> { _dcplb_miss + 0xd4 }

     Source : <0x00009ac4> { _dcplb_miss + 0xe4 } IF CC JUMP

 

The Blackfin has weak ordering of loads and stores. Weak ordering means that
the timing of the actual completion of memory operations, and even the order
in which they complete, may not match the order in which they appear in the
program source code.

 

In this case, the bad write gets sent to a write buffer, but it isn't actually
flagged as "bad" (when IRQ5 goes off) until we are in a different context
(this time irq11, the Ethernet interrupt). Since we receive the IRQ5 event
while in kernel space, we assume the error was caused by kernel code and
OOPS...

 

The way to handle this is to SSYNC (which forces all pending writes to
complete and signals any resulting HW Error), check whether IRQ5 has gone off,
handle the error, and then handle the interrupt properly...
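
The sequence described above can be modelled with a toy simulation. This is a rough sketch under simplifying assumptions, not the kernel's interrupt entry code: `struct toy_cpu`, `toy_store()`, `toy_ssync()`, `toy_irq_entry()` and `TOY_MEM_LIMIT` are all hypothetical, and the model holds only one buffered write.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum context { CTX_USER, CTX_KERNEL };

/* Toy model of the core: one buffered write plus a pending HW error flag. */
struct toy_cpu {
    bool     write_pending;  /* a store is sitting in the write buffer */
    uint32_t write_addr;     /* its target address */
    bool     hwerr_pending;  /* models IRQ5 being asserted */
};

/* Hypothetical address map: everything at or above this does not exist. */
#define TOY_MEM_LIMIT 0x04000000u

/* A store only enters the write buffer; no error is raised yet. */
static void toy_store(struct toy_cpu *cpu, uint32_t addr)
{
    cpu->write_pending = true;
    cpu->write_addr = addr;
}

/* Models SSYNC: drain the buffer so a bad write is reported now. */
static void toy_ssync(struct toy_cpu *cpu)
{
    if (cpu->write_pending && cpu->write_addr >= TOY_MEM_LIMIT)
        cpu->hwerr_pending = true;
    cpu->write_pending = false;
}

/*
 * Interrupt entry: sync first, then check for a pending HW error.
 * Returns the context the error is attributed to (the interrupted one),
 * or -1 if no error is pending and the interrupt can be handled normally.
 */
static int toy_irq_entry(struct toy_cpu *cpu, enum context interrupted)
{
    toy_ssync(cpu);
    if (cpu->hwerr_pending) {
        cpu->hwerr_pending = false;
        return interrupted;
    }
    return -1;
}
```

In this model a bad store issued from user space, followed by an interrupt, is attributed to `CTX_USER` at entry, because the sync forces the error out before the handler runs any further kernel code.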

 

Committing on trunk as an option (normally turned off, but turned on for our

platforms) -- just need to do a little more testing to make sure I have all the

corner cases...

 

 

--- Robin Getz                                               2009-05-11 16:42:03

Add hours

 

--- Robin Getz                                               2009-05-18 15:07:10

after testing all weekend - committed on trunk.

 

More noodling about other issues...

 

-Robin

 

--- Robin Getz                                               2009-05-29 07:38:37

Added (with Mike's help) code which ptraces the code under test, to simulate
a gdbserver session. This uncovered what I expected: another place in the
kernel which needed to be fixed. traps_test committed with updated tests.

 

Fixed the kernel (on trunk) as well.

 

This is the last problem I can think of. If anyone can get the kernel to crash

from userspace - (with CONFIG_EXACT_HWERR on) please open a new bug.

 

I will update the documentation today.

 

Closing, and marked fixed.

 

-Robin

 

--- Robin Getz                                               2009-06-02 09:45:38

OK - stress testing found one more problem (which is now fixed on trunk).

 

Stress testing is defined as: traps_test, telnet (whetstone), telnet (top), and
a ping flood from the host.

 

After the recent change - everything works overnight. (Yeah!)

 

-Robin

 

 

 
