2008-11-12 12:43:20     Double fault

Document created by Aaronwu Employee on Aug 8, 2013

Michael McTernan (UNITED KINGDOM)

Message: 65166   

 

Hi,

 

I'm using a BF537-0.3 on a custom board with 2008R1.5 patched to the top of SVN, and we're getting double faults, but only with no-MPU builds.  When built and run with MPU protection on, everything is fine and the system will stay up for ages; with no-MPU it resets after anywhere between minutes and hours.

 

For performance we really need the no-MPU builds, so I'm trying to debug it, and struggling, I'm afraid.

 

What I've done so far is disable the reset on double fault bit and then tried to see where the problem is coming from using a combination of very gnICE and stack backtracing.  I've found that just before the double fault, panic_cplb_error() is called with CPLB_NO_ADDR_MATCH.

 

Disabling the CPLBs prior to calling panic_cplb_error() stops the double fault and makes things a bit easier to debug - I'm not sure if this is right, but it seems to help.  I borrowed the code from the double fault handler:

 

Lcplb_error:

#if 1

    DEBUG_START_HWTRACE(p5, r7);

 

        /* Turn caches & protection off, to ensure we don't get a

         * double exception

         */

 

        P4.L = LO(IMEM_CONTROL);

        P4.H = HI(IMEM_CONTROL);

 

        R5 = [P4];              /* Control Register*/

        BITCLR(R5,ENICPLB_P);

        SSYNC;          /* SSYNC required before writing to IMEM_CONTROL. */

        .align 8;

        [P4] = R5;

        SSYNC;

 

        P4.L = LO(DMEM_CONTROL);

        P4.H = HI(DMEM_CONTROL);

        R5 = [P4];

        BITCLR(R5,ENDCPLB_P);

        SSYNC;          /* SSYNC required before writing to DMEM_CONTROL. */

        .align 8;

        [P4] = R5;

        SSYNC;

#endif

    R1 = sp;

    SP += -12;

    call _panic_cplb_error;

    SP += 12;

    JUMP.L _handle_bad_cplb;

 

 

Then I modified panic_cplb_error to dump some more info, including the CPLB tables and the switch tables via a modified cplbinfo_proc_output():

 

void panic_cplb_error(int cplb_panic, struct pt_regs *fp)

{

    int tflags;

 

    trace_buffer_save(tflags);

 

    switch (cplb_panic) {

    case CPLB_NO_UNLOCKED:

        printk(KERN_EMERG "All CPLBs are locked\n");

        break;

    case CPLB_PROT_VIOL:

        printk(KERN_EMERG "CPLB_PROT_VIOL\n");

        return;

    case CPLB_NO_ADDR_MATCH:

        printk(KERN_EMERG "CPLB_NO_ADDR_MATCH:\n");

        printk(KERN_EMERG "DCPLB_STATUS=%p\n", (void *)bfin_read_DCPLB_STATUS());

        printk(KERN_EMERG "ICPLB_STATUS=%p\n", (void *)bfin_read_ICPLB_STATUS());

        printk(KERN_EMERG "DCPLB_FAULT_ADDR=%p\n", (void *)bfin_read_DCPLB_FAULT_ADDR());

        printk(KERN_EMERG "ICPLB_FAULT_ADDR=%p\n", (void *)bfin_read_ICPLB_FAULT_ADDR());

        cplbinfo_proc_output(NULL);

        return;

 

    ....

 

Finally, I've been disabling the trace buffer around some of the code to try to get a more useful backtrace, showing entry to the kernel and some of the handling.  I had to disable it over the CPLB search since that was too jumpy.  Here is one of the traces:

 

CPLB_NO_ADDR_MATCH:

DCPLB_STATUS=00020000

ICPLB_STATUS=00020000

DCPLB_FAULT_ADDR=076c9f18

ICPLB_FAULT_ADDR=ffa014b0

------------------ CPLB Information ------------------

 

Instruction CPLB entry:

Address         Data    Size    Valid   Locked  Swapin  iCount  oCount

0x00000000      0x00083 1K      Y       Y        0      0       24

0xffa00000      0x30007 4M      Y       Y        1      0       0

0x00000000      0x31205 4M      Y       N       -1      23      0

0x00400000      0x31205 4M      Y       N        8      20      20

0x00800000      0x31205 4M      Y       N        3      21      21

0x00c00000      0x31205 4M      Y       N       -1      13      14

0x01000000      0x31205 4M      Y       N       -1      19      20

0x01400000      0x31205 4M      Y       N       12      21      21

0x01800000      0x31205 4M      Y       N        4      20      20

0x01c00000      0x31205 4M      Y       N       -1      0       1

0x02000000      0x31205 4M      Y       N       -1      0       1

0x02400000      0x31205 4M      Y       N       -1      0       1

0x02800000      0x31205 4M      Y       N       -1      0       1

0x02c00000      0x31205 4M      Y       N       -1      0       1

0x03000000      0x31205 4M      Y       N       -1      0       1

0x03400000      0x31205 4M      Y       N       -1      0       1

0x03800000      0x31205 4M      Y       N       -1      0       0

0x03c00000      0x31205 4M      Y       N       -1      0       0

0x04000000      0x31205 4M      Y       N       -1      2       2

0x04400000      0x31205 4M      Y       N       -1      20      20

0x04800000      0x31205 4M      Y       N        6      18      17

0x04c00000      0x31205 4M      Y       N       -1      5       5

0x05000000      0x31205 4M      Y       N       -1      8       8

0x05400000      0x31205 4M      Y       N       -1      5       5

0x05800000      0x31205 4M      Y       N       11      13      12

0x05c00000      0x31205 4M      Y       N        9      14      13

0x06000000      0x31205 4M      Y       N       13      22      21

0x06400000      0x31205 4M      Y       N       -1      1       1

0x06800000      0x31205 4M      Y       N        2      19      18

0x06c00000      0x31205 4M      Y       N        7      20      19

0x07000000      0x31205 4M      Y       N        5      20      19

0x07400000      0x31205 4M      Y       N       14      21      20

0x07800000      0x31205 4M      Y       N       -1      6       6

0x07c00000      0x21205 1M      Y       N       -1      0       0

0x07d00000      0x21205 1M      Y       N       10      15      14

0xef000000      0x21205 1M      Y       N       -1      0       0

Unused/mismatched CPLBs:

15: 0x07400000  0x00000 1K      N       N

 

Instruction CPLB switch table:

Address         Data    Size    Valid   Locked  Swapin  iCount  oCount

0x00000000      0x00083 1K      Y       Y        0      0       24

0xffa00000      0x30007 4M      Y       Y        1      0       0

0x00000000      0x31205 4M      Y       N        2      23      0

0x00400000      0x31205 4M      Y       N        3      20      20

0x00800000      0x31205 4M      Y       N        4      21      21

0x00c00000      0x31205 4M      Y       N        5      13      14

0x01000000      0x31205 4M      Y       N        6      19      20

0x01400000      0x31205 4M      Y       N        7      21      21

0x01800000      0x31205 4M      Y       N        8      20      20

0x01c00000      0x31205 4M      Y       N        9      0       1

0x02000000      0x31205 4M      Y       N       10      0       1

0x02400000      0x31205 4M      Y       N       11      0       1

0x02800000      0x31205 4M      Y       N       12      0       1

0x02c00000      0x31205 4M      Y       N       13      0       1

0x03000000      0x31205 4M      Y       N       14      0       1

0x03400000      0x31205 4M      Y       N       15      0       1

0x03800000      0x31205 4M      Y       N       16      0       0

0x03c00000      0x31205 4M      Y       N       17      0       0

0x04000000      0x31205 4M      Y       N       18      2       2

0x04400000      0x31205 4M      Y       N       19      20      20

0x04800000      0x31205 4M      Y       N       20      18      17

0x04c00000      0x31205 4M      Y       N       21      5       5

0x05000000      0x31205 4M      Y       N       22      8       8

0x05400000      0x31205 4M      Y       N       23      5       5

0x05800000      0x31205 4M      Y       N       24      13      12

0x05c00000      0x31205 4M      Y       N       25      14      13

0x06000000      0x31205 4M      Y       N       26      22      21

0x06400000      0x31205 4M      Y       N       27      1       1

0x06800000      0x31205 4M      Y       N       28      19      18

0x06c00000      0x31205 4M      Y       N       29      20      19

0x07000000      0x31205 4M      Y       N       30      20      19

0x07400000      0x31205 4M      Y       N       31      21      20

0x07800000      0x31205 4M      Y       N       32      6       6

0x07c00000      0x21205 1M      Y       N       33      0       0

0x07d00000      0x21205 1M      Y       N       34      15      14

0xef000000      0x21205 1M      Y       N       35      0       0

Data CPLB entry:

Address         Data    Size    Valid   Locked  Swapin  iCount  oCount

0x00000000      0x00083 1K      Y       Y        0      0       2333

0xff800000      0x3009f 4M      Y       Y        1      0       0

0x00000000      0x3d09d 4M      Y       N       -1      2332    0

0x00400000      0x3d09d 4M      Y       N       14      2212    2212

0x00800000      0x3d09d 4M      Y       N       -1      66      67

0x00c00000      0x3d09d 4M      Y       N       11      2049    2049

0x01000000      0x3d09d 4M      Y       N        8      1019    1019

0x01400000      0x3d09d 4M      Y       N       10      394     394

0x01800000      0x3d09d 4M      Y       N       -1      590     591

0x01c00000      0x3d09d 4M      Y       N        2      1812    1812

0x02000000      0x3d09d 4M      Y       N       -1      2       3

0x02400000      0x3d09d 4M      Y       N        4      1138    1138

0x02800000      0x3d09d 4M      Y       N       -1      0       1

0x02c00000      0x3d09d 4M      Y       N       -1      0       1

0x03000000      0x3d09d 4M      Y       N       -1      0       1

0x03400000      0x3d09d 4M      Y       N       -1      0       1

0x03800000      0x3d09d 4M      Y       N       -1      0       0

0x03c00000      0x3d09d 4M      Y       N       -1      0       0

0x04000000      0x3d09d 4M      Y       N        7      1961    1960

0x04400000      0x3d09d 4M      Y       N       -1      1299    1299

0x04800000      0x3d09d 4M      Y       N       -1      57      57

0x04c00000      0x3d09d 4M      Y       N        3      2082    2081

0x05000000      0x3d09d 4M      Y       N       12      1840    1839

0x05400000      0x3d09d 4M      Y       N       -1      1819    1819

0x05800000      0x3d09d 4M      Y       N       -1      1239    1239

0x05c00000      0x3d09d 4M      Y       N       -1      1300    1300

0x06000000      0x3d09d 4M      Y       N       -1      985     985

0x06400000      0x3d09d 4M      Y       N       -1      1714    1714

0x06800000      0x3d09d 4M      Y       N       -1      66      66

0x06c00000      0x3d09d 4M      Y       N       13      1055    1054

0x07000000      0x3d09d 4M      Y       N       15      493     492

0x07400000      0x3d09d 4M      Y       N        5      1994    1993

0x07800000      0x3d09d 4M      Y       N        6      2226    2225

0x07c00000      0x2d09d 1M      Y       N       -1      77      77

0x07d00000      0x2d09d 1M      Y       N        9      2211    2210

0x07e00000      0x2009d 1M      Y       N       -1      1077    1077

0x07f00000      0x2009d 1M      Y       N       -1      1       1

0x20000000      0x3009d 4M      Y       N       -1      111     111

0xef000000      0x2d09d 1M      Y       N       -1      0       0

 

Data CPLB switch table:

Address         Data    Size    Valid   Locked  Swapin  iCount  oCount

0x00000000      0x00083 1K      Y       Y        0      0       2333

0xff800000      0x3009f 4M      Y       Y        1      0       0

0x00000000      0x3d09d 4M      Y       N        2      2332    0

0x00400000      0x3d09d 4M      Y       N        3      2212    2212

0x00800000      0x3d09d 4M      Y       N        4      66      67

0x00c00000      0x3d09d 4M      Y       N        5      2049    2049

0x01000000      0x3d09d 4M      Y       N        6      1019    1019

0x01400000      0x3d09d 4M      Y       N        7      394     394

0x01800000      0x3d09d 4M      Y       N        8      590     591

0x01c00000      0x3d09d 4M      Y       N        9      1812    1812

0x02000000      0x3d09d 4M      Y       N       10      2       3

0x02400000      0x3d09d 4M      Y       N       11      1138    1138

0x02800000      0x3d09d 4M      Y       N       12      0       1

0x02c00000      0x3d09d 4M      Y       N       13      0       1

0x03000000      0x3d09d 4M      Y       N       14      0       1

0x03400000      0x3d09d 4M      Y       N       15      0       1

0x03800000      0x3d09d 4M      Y       N       16      0       0

0x03c00000      0x3d09d 4M      Y       N       17      0       0

0x04000000      0x3d09d 4M      Y       N       18      1961    1960

0x04400000      0x3d09d 4M      Y       N       19      1299    1299

0x04800000      0x3d09d 4M      Y       N       20      57      57

0x04c00000      0x3d09d 4M      Y       N       21      2082    2081

0x05000000      0x3d09d 4M      Y       N       22      1840    1839

0x05400000      0x3d09d 4M      Y       N       23      1819    1819

0x05800000      0x3d09d 4M      Y       N       24      1239    1239

0x05c00000      0x3d09d 4M      Y       N       25      1300    1300

0x06000000      0x3d09d 4M      Y       N       26      985     985

0x06400000      0x3d09d 4M      Y       N       27      1714    1714

0x06800000      0x3d09d 4M      Y       N       28      66      66

0x06c00000      0x3d09d 4M      Y       N       29      1055    1054

0x07000000      0x3d09d 4M      Y       N       30      493     492

0x07400000      0x3d09d 4M      Y       N       31      1994    1993

0x07800000      0x3d09d 4M      Y       N       32      2226    2225

0x07c00000      0x2d09d 1M      Y       N       33      77      77

0x07d00000      0x2d09d 1M      Y       N       34      2211    2210

0x07e00000      0x2009d 1M      Y       N       35      1077    1077

0x07f00000      0x2009d 1M      Y       N       36      1       1

0x20000000      0x3009d 4M      Y       N       37      111     111

0xef000000      0x2d09d 1M      Y       N       38      0       0

exception: 0x25, ipend=8030, reti=40de0, retx=40de0

Unrecoverable event

- For example, an exception generated while processing a previous exception.

Kernel OOPS in progress

Defered Exception context

CURRENT PROCESS:

COMM=watchdogd PID=236

TEXT = 0x07400040-0x074a38a0        DATA = 0x074a38a4-0x074e1d04

BSS = 0x074e1d04-0x074ea224  USER-STACK = 0x0756af64

 

return address: [0x00040de0]; contents of:

0x00040dc0:  0032  a088  4a30  b088  60f8  0802  1fc9  e14a

0x00040dd0:  0015  e10a  aec0  9110  0040  e120  fdfe  2efc

0x00040de0: [05e5] e800  0007  af3d  3038  3031  302a  0c45

0x00040df0:  1012  6002  6001  a2b8  b0f0  a2f8  b130  b171

 

SEQUENCER STATUS:               Not tainted

SEQSTAT: 00000025  IPEND: 8030  SYSCFG: 0006

  EXCAUSE   : 0x25

  physical IVG15 asserted : <0xffa011f8> { _evt_system_call + 0x0 }

  logical irq   6 mapped  : <0xffa00364> { _timer_interrupt + 0x0 }

  logical irq  12 mapped  : <0x00094ad8> { _dma_rx_irq_handler + 0x0 }

  logical irq  15 mapped  : <0xffa02314> { __etext_l1 + 0x0 }

  logical irq  18 mapped  : <0x00098be4> { _bfin_serial_rx_int + 0x0 }

  logical irq  19 mapped  : <0x00098de8> { _bfin_serial_tx_int + 0x0 }

  logical irq  24 mapped  : <0x000a0cc4> { _bf537mac_interrupt + 0x0 }

  logical irq  45 mapped  : <0x00094b2c> { _sport_err_handler + 0x0 }

  logical irq  46 mapped  : <0x07d8016c> { :devfpgasport:_init_module + 0x771996c }

  logical irq  67 mapped  : <0x00677060> { :devfpgairq:_init_module + 0x35d260 }

RETE: <0x00000000> /* Maybe null pointer? */

RETN: <0x076c9f0c> /* unknown address */

RETX: <0x00040de0> { _sys_pselect6 + 0x0 }

RETS: <0xffa00d0c> { _system_call + 0x68 }

PC  : <0x00040de0> { _sys_pselect6 + 0x0 }

DCPLB_FAULT_ADDR: <0x076c9f18> /* unknown address */

ICPLB_FAULT_ADDR: <0x00040de0> { _sys_pselect6 + 0x0 }

 

PROCESSOR STATE:

R0 : 00000008    R1 : 0779fc74    R2 : 00000000    R3 : 00000000

R4 : 0779fc48    R5 : 00000000    R6 : ffffe000    R7 : 00000000

P0 : 00000134    P1 : 00000000    P2 : 076c8000    P3 : 000000e0

P4 : 00000134    P5 : 00040de0    FP : 00000000    SP : 076c9e30

LB0: 0743badf    LT0: 0743bade    LC0: 00000000

LB1: 0743a7f1    LT1: 0743a7f0    LC1: 00000000

B0 : 00000000    L0 : 00000000    M0 : 00000000    I0 : 0779f060

B1 : 00000000    L1 : 00000000    M1 : 00000000    I1 : 00000000

B2 : 00000000    L2 : 00000000    M2 : 00000000    I2 : 00000000

B3 : 00000000    L3 : 00000000    M3 : 00000000    I3 : 00000000

A0.w: 00000282   A0.x: 00000000   A1.w: 00000282   A1.x: 00000000

USP : 0779fc14  ASTAT: 02002000

 

Hardware Trace:

   0 Target : <0x00004820> { _trap_c + 0x0 }

     Source : <0xffa00b34> { _exception_to_level5 + 0xb4 }

   1 Target : <0xffa00a80> { _exception_to_level5 + 0x0 }

     Source : <0xffa009dc> { _ex_trap_c + 0x68 }

   2 Target : <0x00004684> { _panic_cplb_error + 0x0 } <- trace stopped again here

     Source : <0xffa01390> { __cplb_hdr + 0x98 } <- trace restarted after CPLB handler

   3 Target : <0xffa01308> { __cplb_hdr + 0x10 } <- trace stopped around here

     Source : <0xffa01302> { __cplb_hdr + 0xa } <- cplbhdlr.S: .Lnot_data_write

   4 Target : <0xffa012f8> { __cplb_hdr + 0x0 }

     Source : <0xffa0084c> { _ex_dcplb_miss + 0x64 }

   5 Target : <0xffa007e8> { _ex_dcplb_miss + 0x0 } <- same as ex_icplb_miss

     Source : <0xffa00bd4> { _trap + 0x28 }

   6 Target : <0xffa00bac> { _trap + 0x0 }

     Source : <0xffa00d0a> { _system_call + 0x66 }  <- entry.S:~545:  call (P5), call syscall

   7 Target : <0xffa00ca4> { _system_call + 0x0 }

     Source : <0xffa0125a> { _evt_system_call + 0x62 }

   8 Target : <0xffa011f8> { _evt_system_call + 0x0 }

     Source : <0xffa008c8> { _ex_syscall + 0x14 }

   9 Target : <0xffa008b4> { _ex_syscall + 0x0 }

     Source : <0xffa00bd4> { _trap + 0x28 }

  10 Target : <0xffa00bac> { _trap + 0x0 }

     Source : <0x07437a2e> [ watchdogd + 0x379ee ]  <- libc/sysdeps/linux/common/select.c:18

  11 Target : <0x07437a1c> [ watchdogd + 0x379dc ]

     Source : <0x07437a8e> [ watchdogd + 0x37a4e ]

  12 Target : <0x07437a54> [ watchdogd + 0x37a14 ]

     Source : <0x07419ea2> [ watchdogd + 0x19e62 ]

  13 Target : <0x07419e30> [ watchdogd + 0x19df0 ]

     Source : <0x07419e4a> [ watchdogd + 0x19e0a ]

  14 Target : <0x07419e30> [ watchdogd + 0x19df0 ]

     Source : <0x07419e4a> [ watchdogd + 0x19e0a ]

  15 Target : <0x07419e30> [ watchdogd + 0x19df0 ]

     Source : <0x07419e4a> [ watchdogd + 0x19e0a ]

Stack from 076c9e0c:

        00016220 00000000 ffa00b38 ff8016a8 ff8016a8 ff8016a4 000b22bc 0000ffff

        076c9e4c 00040de0 00008030 00000025 00000000 076c9f0c 00040de0 00040de0

        ffa00d0c 00000008 02002000 0743a7f1 0743badf 0743a7f0 0743bade 00000000

        00000000 00000282 00000000 00000282 00000000 00000000 00000000 00000000

        00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

        00000000 00000000 00000000 00000000 0779f060 0779fc14 00000000 00040de0

 

Call Trace:

[<0000fffe>] _allow_signal+0x5e/0x78

[<00008000>] _show_mem+0xc8/0xe8

 

Modules linked in: tun devfpgasport devfpgareg devfpgairq

Kernel panic - not syncing: Kernel exception

 

It always seems to happen like this, but with different apps and system calls, somewhat randomly.  To me it looks like the jump to the system call has caused an ICPLB miss, which then can't find the entry in the switch table, but I'm not sure why.
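
One way to check this reading against the dump above is to work out which 4M page covers each reported address.  What follows is a minimal sketch in C, assuming naturally aligned 4M pages as in the SDRAM rows of the tables; page_base and show_fault_pages are hypothetical helpers for illustration, not kernel code.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: base address of the naturally aligned page of
 * page_size bytes (a power of two) that contains addr. */
uint32_t page_base(uint32_t addr, uint32_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1u)) == 0);
    return addr & ~(page_size - 1u);
}

/* Print the 4M page covering the two addresses reported in the trace. */
void show_fault_pages(void)
{
    const uint32_t four_meg = 0x00400000u;

    printf("RETX 0x00040de0 -> page 0x%08x\n",
           (unsigned)page_base(0x00040de0u, four_meg));
    printf("DCPLB_FAULT_ADDR 0x076c9f18 -> page 0x%08x\n",
           (unsigned)page_base(0x076c9f18u, four_meg));
}
```

For this trace the RETX address falls in page 0x00000000 and the data fault address in page 0x07400000.  In the Instruction CPLB entry table the 0x00000000 4M row shows Swapin -1, while the 0x07400000 row of the Data CPLB entry table shows Swapin 5.  Reading -1 as "not currently installed" is an assumption, but it would fit an ICPLB miss on the jump into the kernel.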

 

I'm also not quite sure of the address which is faulting.  The first read of ICPLB_FAULT_ADDR doesn't match that in the later dump or the reported RETX value, but faulting at _sys_pselect6 would match what the application is trying to do and the point at which the trap is taken after _system_call + 0x66.

 

Is there any debug code, or maybe a C implementation of the cplb handler, which may be easier to work with?

 

Regards,

 

Mike


 

 

2008-11-12 13:24:37     Re: Double fault

Michael McTernan (UNITED KINGDOM)

Message: 65168   

 

One other observation is that a couple of lookup tables are used by the cplb handler:

 

 

.data

.align 4;

_page_size_table:

.byte4    0x00000400;    /* 1K */

.byte4    0x00001000;    /* 4K */

.byte4    0x00100000;    /* 1M */

.byte4    0x00400000;    /* 4M */

 

.align 4;

_dcplb_preference:

.byte4    0x00000001;    /* valid bit */

.byte4    0x00000002;    /* lock bit */

 

 

Isn't there a risk that these could cause a double fault if they aren't mapped?  I'm playing with putting them into L1 SRAM, which seems to have a locked down entry so would avoid this.


 

 

2008-11-12 13:46:59     Re: Double fault

Mike Frysinger (UNITED STATES)

Message: 65169   

 

weird, the cplb entries that cover the kernel should be locked.  your cplbinfo output seems to indicate that they are not.  my trunk kernel correctly shows:

 

Instruction CPLB entry:

Address         Data    Size    Valid   Locked  Swapin  iCount  oCount

0x00000000      0x00083 1K      Y       Y        0      0       0

0xffa00000      0x30007 4M      Y       Y        1      0       0

0x00000000      0x31287 4M      Y       Y        3      0       0

0x00400000      0x31287 4M      Y       Y        4      0       0

0x00800000      0x31287 4M      Y       Y        5      0       0

 

Data CPLB entry:

Address         Data    Size    Valid   Locked  Swapin  iCount  oCount

0x00000000      0x00083 1K      Y       Y        0      0       0

0xff800000      0x3009f 4M      Y       Y        1      0       0

0x00000000      0x3d09f 4M      Y       Y        2      0       0

0x00400000      0x3d09f 4M      Y       Y        3      0       0

0x00800000      0x3d09f 4M      Y       Y        4      0       0

 

if the mappings covering the kernel get swapped out, then things will crash.  probably much like the way you describe.
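
The Locked column can be read straight off the Data column.  A minimal sketch in C, assuming the bit layout implied by the _dcplb_preference comments earlier in the thread (bit 0 valid, bit 1 lock) and a 2-bit page-size index at bits 17:16 into _page_size_table.  The field position is inferred from the dumps in this thread, and the names cplb_valid, cplb_locked, cplb_page_size and check_rows_from_thread are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Page sizes in the same order as _page_size_table in the thread. */
static const uint32_t page_size_table[4] = {
    0x00000400u, /* 1K */
    0x00001000u, /* 4K */
    0x00100000u, /* 1M */
    0x00400000u, /* 4M */
};

/* Bit 0: valid, bit 1: lock (per the _dcplb_preference comments). */
int cplb_valid(uint32_t data)  { return (data & 0x1u) != 0; }
int cplb_locked(uint32_t data) { return (data & 0x2u) != 0; }

/* Assumption: the page-size index lives in bits 17:16 of the data word. */
uint32_t cplb_page_size(uint32_t data)
{
    return page_size_table[(data >> 16) & 0x3u];
}

/* Cross-check against rows quoted in the thread. */
void check_rows_from_thread(void)
{
    assert(cplb_valid(0x31205u) && !cplb_locked(0x31205u)); /* Valid Y, Locked N */
    assert(cplb_valid(0x31287u) && cplb_locked(0x31287u));  /* Valid Y, Locked Y */
    assert(cplb_page_size(0x31205u) == 0x00400000u);        /* 4M */
    assert(cplb_page_size(0x21205u) == 0x00100000u);        /* 1M */
    assert(cplb_page_size(0x00083u) == 0x00000400u);        /* 1K */
}
```

Applied to the rows in this thread, 0x31205 decodes as valid, unlocked, 4M, while 0x31287 decodes as valid, locked, 4M, which matches the Locked columns shown.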


 

 

2008-11-12 13:50:47     Re: Double fault

Michael McTernan (UNITED KINGDOM)

Message: 65170   

 

Cool - moving the lookups to the locked L1 SRAM makes it work well.

 

I checked the function that checks the address to see if something is in the kernel or not, and it's always returning FALSE for me.  I'm wondering if it is something to do with which bits we have configured for L1 SRAM and the 128M SDRAM on board.

 

 


 

 

2008-11-12 17:19:37     Re: Double fault

Mike Frysinger (UNITED STATES)

Message: 65172   

 

i just booted a vanilla 2008R1.5 svn branch and it seemed OK to me:

 

Instruction CPLB entry:

Address         Data    Size    Valid   Locked  Swapin  iCount  oCount

0x00000000      0x00083 1K      Y       Y        0      0       0

0xffa00000      0x30007 4M      Y       Y        1      0       0

0x00000000      0x31287 4M      Y       Y        2      0       0

0x00400000      0x31287 4M      Y       Y        3      0       0

0x00800000      0x31287 4M      Y       Y        4      0       0

 

Data CPLB entry:

Address         Data    Size    Valid   Locked  Swapin  iCount  oCount

0x00000000      0x00083 1K      Y       Y        0      0       0

0xff800000      0x3009f 4M      Y       Y        1      0       0

0x00000000      0x3d09f 4M      Y       Y        2      0       0

0x00400000      0x3d09f 4M      Y       Y        3      0       0

0x00800000      0x3d09f 4M      Y       Y        4      0       0

 

print out the values that are in play with lock_kernel_check() ... start/end/_end/_stext.  see what's what ...


 

 

2008-11-13 05:57:07     Re: Double fault

Michael McTernan (UNITED KINGDOM)

Message: 65235   

 

Thanks for your output - with that I think I've got it:

 

Bytes transferred = 1888431 (1cd0af hex)

## Booting image at 01000000 ...

   Image Name:   Linux-2.6.22.19-ADI-2008R1.5-AIR

   Image Type:   Blackfin Linux Kernel Image (gzip compressed)

   Data Size:    1888367 Bytes =  1.8 MB

   Load Address: 00001000

   Entry Point:  00168000

   Verifying Checksum ... OK

   Uncompressing Kernel Image ... OK

Starting Kernel at = 168000

Linux version 2.6.22.19-ADI-2008R1.5-AIRV-0-svn9142 (bagside@localhost.localdomain) (gcc version 4.1.2 (ADI svn)) #379 Thu Nov 13 10:00:26 GMT 2008

early printk enabled on early_BFuart0

Hardware Trace Active and Enabled

Blackfin support (C) 2004-2007 Analog Devices, Inc.

Compiled for ADSP-BF537 Rev 0.3

Blackfin Linux support by http://blackfin.uclinux.org/

Processor Speed: 598 MHz core clock and 119 MHz System Clock

Board Memory: 128MB

Kernel Managed Memory: 128MB

Memory map:

  text      = 0x00001000-0x0010eb60

  rodata    = 0x0010f000-0x001568a8

  data      = 0x00157000-0x00168000

    stack   = 0x00158000-0x0015a000

  init      = 0x00168000-0x00335000

  bss       = 0x00335000-0x003452c4

  available = 0x003452c4-0x07dff000

  DMA Zone  = 0x07e00000-0x08000000

NOMPU: setting up cplb tables for global access

       MAX_SWITCH_I_CPLBS=94 MAX_SWITCH_D_CPLBS =104

Zero Pointer Guard Page: start=0 end=400 page=400

before: pos=0 size=32

Outside kernel: kern=1000<->3452c4  0 400

  after: pos=2 size=32

Outside kernel: kern=1000<->3452c4  0 400

Zero Pointer Guard Page: start=0 end=400 page=400

before: pos=0 size=94

Outside kernel: kern=1000<->3452c4  0 400

  after: pos=2 size=94

Outside kernel: kern=1000<->3452c4  0 400

L1 I-Memory: start=ffa00000 end=ffa0c000 page=400000

before: pos=2 size=32

Outside kernel: kern=1000<->3452c4  ffa00000 ffe00000

  after: pos=4 size=32

L1 I-Memory: start=ffa00000 end=ffa0c000 page=400000

before: pos=2 size=94

Outside kernel: kern=1000<->3452c4  ffa00000 ffe00000

  after: pos=4 size=94

Outside kernel: kern=1000<->3452c4  ff800000 ffc00000

Outside kernel: kern=1000<->3452c4  ff800000 ffc00000

Kernel Memory: start=0 a_start=0 a_end=7c00000 end=7dff000 page=0

before: pos=4 size=32

   mid: pos=4 size=32

Outside kernel: kern=1000<->3452c4  0 400000

Outside kernel: kern=1000<->3452c4  400000 800000

Outside kernel: kern=1000<->3452c4  800000 c00000

Outside kernel: kern=1000<->3452c4  c00000 1000000

 

Our kernel is quite small - less than the 4M page size - and additionally starts at 1k for the NULL pointer checking.  So the kernel check fails, as neither the start nor the end address of the first page falls inside the kernel, which runs from 0x1000 to 0x3452c4; hence it doesn't get its pages locked like yours.

 

I've submitted patch 4643 which is my attempt to fix this.  Please take a look and fix the 2008R1.5 branch.

 

config


 

 

2008-11-13 11:24:11     Re: Double fault

Mike Frysinger (UNITED STATES)

Message: 65246   

 

err, what ?  your load address is at *4k* not *1k* right ?  you should never set the load address below 4k ...

 

i'm trying to reproduce your behavior but i cant ... my kernel is def smaller than 4meg.  ive formatted the output to something (i think) is more readable.

 

you can see that my kernel starts at 0x1000 and ends at 0x164000.  the first 4 meg check for "Kernel Memory" returns true and so that gets locked (for both data and inst).

 

if we look at your settings, your kernel starts at 0x1000 and ends at 0x3452c4.  the check performed is:

    if (start >= (u32)_end || end <= (u32)_stext)

    if (0 >= 0x3452c4 || 0x400000 <= 0x1000)

so this region should be marked as "in kernel" and thus get locked ...
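
The substitution above can be run as a plain C sketch.  Assumptions: in_kernel is a hypothetical stand-in for the test quoted above, with _stext and _end passed as parameters instead of linker symbols, and the values are taken from the boot log earlier in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the quoted check: a page [start, end) is
 * treated as outside the kernel only if it lies entirely at or above
 * kern_end, or entirely at or below kern_start. */
bool in_kernel(uint32_t start, uint32_t end,
               uint32_t kern_start, uint32_t kern_end)
{
    if (start >= kern_end || end <= kern_start)
        return false;
    return true;
}

/* Values from the boot log: kern=1000<->3452c4. */
void check_first_pages(void)
{
    const uint32_t stext = 0x00001000u, kend = 0x003452c4u;

    assert(in_kernel(0x00000000u, 0x00400000u, stext, kend));   /* first 4M page */
    assert(!in_kernel(0x00400000u, 0x00800000u, stext, kend));  /* second 4M page */
    assert(!in_kernel(0x00000000u, 0x00000400u, stext, kend));  /* zero guard page */
}
```

Because the test is phrased as "no overlap means outside", a page that completely contains the kernel still counts as in-kernel.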

 

INST: Zero Pointer Guard Page:

0: _stext:0x1000 <= start: 0x0 <= _end: 0x164000 <= end:0x400

DATA: Zero Pointer Guard Page:

0: _stext:0x1000 <= start: 0x0 <= _end: 0x164000 <= end:0x400

INST: L1 I-Memory:

0: _stext:0x1000 <= start: 0xffa00000 <= _end: 0x164000 <= end:0xffe00000

DATA: L1 D-Memory:

0: _stext:0x1000 <= start: 0xff800000 <= _end: 0x164000 <= end:0xffc00000

INST: L2 Memory:

0: _stext:0x1000 <= start: 0xfeb00000 <= _end: 0x164000 <= end:0xfec00000

DATA: L2 Memory:

0: _stext:0x1000 <= start: 0xfeb00000 <= _end: 0x164000 <= end:0xfec00000

INST: Kernel Memory:

1: _stext:0x1000 <= start: 0x0 <= _end: 0x164000 <= end:0x400000

0: _stext:0x1000 <= start: 0x400000 <= _end: 0x164000 <= end:0x800000

0: _stext:0x1000 <= start: 0x800000 <= _end: 0x164000 <= end:0xc00000

0: _stext:0x1000 <= start: 0xc00000 <= _end: 0x164000 <= end:0x1000000

0: _stext:0x1000 <= start: 0x1000000 <= _end: 0x164000 <= end:0x1400000

0: _stext:0x1000 <= start: 0x1400000 <= _end: 0x164000 <= end:0x1800000

0: _stext:0x1000 <= start: 0x1800000 <= _end: 0x164000 <= end:0x1c00000

0: _stext:0x1000 <= start: 0x1c00000 <= _end: 0x164000 <= end:0x2000000

0: _stext:0x1000 <= start: 0x2000000 <= _end: 0x164000 <= end:0x2400000

0: _stext:0x1000 <= start: 0x2400000 <= _end: 0x164000 <= end:0x2800000

0: _stext:0x1000 <= start: 0x2800000 <= _end: 0x164000 <= end:0x2c00000

0: _stext:0x1000 <= start: 0x2c00000 <= _end: 0x164000 <= end:0x3000000

0: _stext:0x1000 <= start: 0x3000000 <= _end: 0x164000 <= end:0x3400000

0: _stext:0x1000 <= start: 0x3400000 <= _end: 0x164000 <= end:0x3800000

0: _stext:0x1000 <= start: 0x3800000 <= _end: 0x164000 <= end:0x3c00000

0: _stext:0x1000 <= start: 0x3c00000 <= _end: 0x164000 <= end:0x3d00000

0: _stext:0x1000 <= start: 0x3d00000 <= _end: 0x164000 <= end:0x3e00000

DATA: Kernel Memory:

1: _stext:0x1000 <= start: 0x0 <= _end: 0x164000 <= end:0x400000

0: _stext:0x1000 <= start: 0x400000 <= _end: 0x164000 <= end:0x800000

0: _stext:0x1000 <= start: 0x800000 <= _end: 0x164000 <= end:0xc00000

0: _stext:0x1000 <= start: 0xc00000 <= _end: 0x164000 <= end:0x1000000

0: _stext:0x1000 <= start: 0x1000000 <= _end: 0x164000 <= end:0x1400000

0: _stext:0x1000 <= start: 0x1400000 <= _end: 0x164000 <= end:0x1800000

0: _stext:0x1000 <= start: 0x1800000 <= _end: 0x164000 <= end:0x1c00000

0: _stext:0x1000 <= start: 0x1c00000 <= _end: 0x164000 <= end:0x2000000

0: _stext:0x1000 <= start: 0x2000000 <= _end: 0x164000 <= end:0x2400000

0: _stext:0x1000 <= start: 0x2400000 <= _end: 0x164000 <= end:0x2800000

0: _stext:0x1000 <= start: 0x2800000 <= _end: 0x164000 <= end:0x2c00000

0: _stext:0x1000 <= start: 0x2c00000 <= _end: 0x164000 <= end:0x3000000

0: _stext:0x1000 <= start: 0x3000000 <= _end: 0x164000 <= end:0x3400000

0: _stext:0x1000 <= start: 0x3400000 <= _end: 0x164000 <= end:0x3800000

0: _stext:0x1000 <= start: 0x3800000 <= _end: 0x164000 <= end:0x3c00000

0: _stext:0x1000 <= start: 0x3c00000 <= _end: 0x164000 <= end:0x3d00000

0: _stext:0x1000 <= start: 0x3d00000 <= _end: 0x164000 <= end:0x3e00000


 

 

2008-11-13 17:10:45     Re: Double fault

Mike Frysinger (UNITED STATES)

Message: 65257   

 

oh, i think i see now ... the branch does this:

 

static u16 __init lock_kernel_check(u32 start, u32 end)

{

    if ((end   <= (u32) _end && end   >= (u32)_stext) ||

        (start <= (u32) _end && start >= (u32)_stext))

        return IN_KERNEL;

    return 0;

}

...

        if (lock_kernel_check(start, start + block_size) == IN_KERNEL)

...

 

while trunk does this:

 

static bool __init lock_kernel_check(u32 start, u32 end)

{

    if (start >= (u32)__init_begin || end <= (u32)_stext)

        return false;

    return true;

}

...

        if (lock_kernel_check(start, start + block_size))

...

 

could you change your code to read the way it does in trunk to see if it fixes things ?


 

 

2008-11-13 18:59:19     Re: Double fault

Michael McTernan (UNITED KINGDOM)

Message: 65258   

 

Apologies - that should have been 4k, yes.  I was thinking 4k, reading 0x1000, then foolishly writing 1k.  The memory map listing is of course accurate:

 

Memory map:

  text      = 0x00001000-0x0011d8f0

  rodata    = 0x0011e000-0x0014e670

  data      = 0x0014f000-0x00160000

    stack   = 0x00150000-0x00152000

  init      = 0x00160000-0x0035a000

  bss       = 0x0035a000-0x0036a2c4

  available = 0x0036a2c4-0x07dff000

  DMA Zone  = 0x07e00000-0x08000000

 

So the problem condition is where the first page is checked as being from 0x0 to 0x400000 (but really this page should end at 0x3FFFFF as per the patch).

 

The following is the check in lock_kernel_check():

 

    if ((end   <= (u32) _end && end   >= (u32)_stext) ||

        (start <= (u32) _end && start >= (u32)_stext))

    {

        ....

 

Which expands to:

 

    if ((0x400000 <= 0x0036a2c4 && 0x400000 >= 0x00001000) ||

        (0x000000 <= 0x0036a2c4 && 0x000000 >= 0x00001000))

    {

        ....

 

Which is false.

 

This is the problem, which I think is now understood: the check is whether the page start or end is within kernel address space, but with this small kernel starting at 4k and not exceeding the 4M boundary, the page contains the whole kernel area, with both ends of the page outside the kernel address range.  The 2008R1.5 test misses this unlikely case.

 

The trunk test you point out is:

 

    if (start >= (u32)__init_begin || end <= (u32)_stext)

 

Expanding to:

 

    if (0x0 >= (u32)__init_begin || 0x400000 <= 0x00001000)

 

Not sure what value to take for __init_begin, but assuming it's not 0x0 when NULL pointer detection is on, this still won't pass?

 

So in the submitted patch the kernel end address gets rounded up to a 1M boundary minus 1 byte, which I think should work out in __fill_data_cplbtab(), where psize = 0 and hence 1M pages are used up to a 4M alignment, which should hit the end >= _end condition.  Certainly the kernel page now gets locked in and things are really stable:

 

------------------ CPLB Information ------------------

 

Instruction CPLB entry:

Address        Data    Size    Valid    Locked    Swapin    iCount    oCount

0x00000000    0x00083    1K    Y    Y     0    0    0

0xffa00000    0x30007    4M    Y    Y     1    0    0

0x00000000    0x31287    4M    Y    Y     2    0    0

0x00400000    0x31287    4M    Y    Y     3    0    0

0x00800000    0x31205    4M    Y    N     4    780    780

<snip>

 

Data CPLB entry:

Address        Data    Size    Valid    Locked    Swapin    iCount    oCount

0x00000000    0x00083    1K    Y    Y     0    0    0

0xff800000    0x3009f    4M    Y    Y     1    0    0

0x00000000    0x3109f    4M    Y    Y     2    0    0

0x00400000    0x3109f    4M    Y    Y     3    0    0

0x00800000    0x3109d    4M    Y    N    -1    1856    1857

<snip>

 

I'm not sure it is needed, but I think the check should be something like the following:

 

    if ((end   <= (u32) _end && end   >= (u32)_stext) || /* page end in kernel space */

        (start <= (u32) _end && start >= (u32)_stext) || /* page start in kernel space */

        ((u32)_stext >= start && (u32)_stext <= end) ||  /* kernel start within the page */

        ((u32)_end   >= start && (u32)_end   <= end))    /* kernel end within the page */

    {

        ....

 

This is just one-time init code that runs 120 or so times, so explicit and slow should be fine.  That said, I think rounding the kernel end boundary up to meet the page end also works, while the trunk code will still fail.

 

 


 

 

2008-11-14 05:46:56     Re: Double fault

Mike Frysinger (UNITED STATES)

Message: 65291   

 

__init_begin and _end are defined in the kernel linker script ... they basically work out to be the same thing.

 

the NULL page is not the one that matters.  it wouldnt matter whether it's enabled as the Kernel Memory check is always the same: 0 .... 4meg.

 

while your code is explicit, it also has pointless lines.  _stext will always be smaller than _end (__init_begin), and start will always be smaller than end.  i think the test that is in trunk now is correct ... ive backported that to the branch and now i get correct behavior there as well.

 

please svn up the branch and make sure it fixes your problem.


 

 

2008-11-18 15:45:55     Re: Double fault

Michael McTernan (UNITED KINGDOM)

Message: 65453   

 

Yup - that fixes it.  With the top of SVN 2008R1 I get:

 

Instruction CPLB entry:

Address         Data    Size    Valid   Locked  Swapin  iCount  oCount

0x00000000      0x00083 1K      Y       Y        0      0       0

0xffa00000      0x30007 4M      Y       Y        1      0       0

0x00000000      0x31287 4M      Y       Y        2      0       0

0x00400000      0x31205 4M      Y       N       -1      3       4

...

 

Data CPLB entry:

Address         Data    Size    Valid   Locked  Swapin  iCount  oCount

0x00000000      0x00083 1K      Y       Y        0      0       0

0xff800000      0x3009f 4M      Y       Y        1      0       0

0x00000000      0x3109f 4M      Y       Y        2      0       0

0x00400000      0x3109d 4M      Y       N        3      357     357

...

 

The system map is still the buggy case:

 

[bagside@bazalgette linux-2.6.x]$ grep " __end$\| __stext$" System.map

00001000 T __stext

003762c4 B __end

 

Regarding the patch I submitted, I still think there is a case for ensuring that all entries make it into the switch table rather than silently dropping records when it's full, presumably resulting in a CPLB miss much later on in execution.  But you can of course take from it what you want - I don't expect to see any problems with your fix now :-)

 

Many Thanks, Mike

 

 


 

 

2008-11-18 16:01:43     Re: Double fault

Mike Frysinger (UNITED STATES)

Message: 65454   

 

not sure i follow what you mean by "the buggy case" ... could you elaborate ?  the output you posted looks fine to me ...

 

having a sanity check to verify initial kernel coverage would be good, but i'd like to find a simpler method ... simpler implies less likely to be buggy


 

 

2008-11-18 16:09:35     Re: Double fault

Michael McTernan (UNITED KINGDOM)

Message: 65455   

 

'Buggy case' only means that the kernel still fits within the first 4M area and starts at 4k, i.e. the conditions under which the initial problem appeared were reproduced.

 

> the output you posted looks fine to me ...

 

Indeed it is.  I'm very happy about that :-)

 

> but i'd like to find a simpler method ...

 

Sounds like one for a rainy day - I'm happy for you to reject the patch to close it...

 

Thanks again, Mike
