2010-05-14 02:44:35     Weird problem

Document created by Aaronwu Employee on Sep 26, 2013

Thorsten Pohlmann (GERMANY)

Message: 89394   

 

Hi!

 

I'm having a severe problem; maybe it's a new or poorly handled processor bug:

 

BF537 0.3 on a prop. board. ucLinux 2009R1 RC6 with xenomai, Toolchain 09r1-10

 

The setup: both SPORTs send and receive data (different, unsynchronized external clocks). Every 125 usec a frame completes and the DMA generates an interrupt; both RX DMAs have the same IRQ level. This works for about 5 minutes, then the weird thing happens:

 

In Xenomai's shared interrupt handler (xnintr_shirq_handler, /kernel/xenomai/nucleus/intr.c) there is this code:

 

intr = shirq->handlers;

while (intr) {
  int ret;

  ret = intr->isr(intr);
  .....
  intr = intr->next;
}

 

the failing part is the loop:

 

The top-part including the isr() call: (objdump of intr.o at offset 0x930)

 

[037f98]    R0 = P5 ;
[037f9A]    P2 = [ P5 + 0x8 ] ;   <-- null-pointer access
[037f9C]    CALL ( P2 ) ;

 

the while() part was moved to the end of the loop: (offset 0x9b4)

 

[03801C]    NOP ;
[03801E]    P5 = [ P5 ] ;
[038020]    CC = P5 == 0 ;
[038022]    IF ! CC JUMP -138 /*0x37F98*/ ( BP ) ;

 

Interrupts are disabled at this point, but what happens is that P5 is zero when the loop is re-entered!!

 

As I said, this works for about 5 minutes; during that time xnintr_shirq_handler with this loop is called roughly 100000 to 150000 times. It is reproducible every time. If only one SPORT is running (the interrupt is not shared then), it does not happen. I guess the "05000245 - False Hardware Error from an Access in the Shadow of a Conditional Branch" bug is not handled well in this case?!

 

The hardware trace shows that no IRQ or anything else interrupts the loop:

 

NULL pointer access

Kernel OOPS in progress

Deferred Exception context

 

No Valid process in current context

return address: [0x00037f9a]; contents of:

0x00037f70:  b9e1  b9f2  0c00  bbb1  bbc2  10e3  0000  0000

0x00037f80:  0000  a2a0  6408  b2a0  915d  0c45  18b7  e144

0x00037f90:  002b  6005  e104  7184  3045 [acaa] 0062  4808

0x00037fa0:  5745  1c3d  a228  6408  b228  e522  008f  e422

 

ADSP-BF537-0.3 600(MHz CCLK) 120(MHz SCLK) (mpu off)

Linux version 2.6.28.10-ADI-2009R1

Built with gcc version 4.1.2 (ADI svn)

 

SEQUENCER STATUS:        Tainted: P        

SEQSTAT: 00000027  IPEND: 8230  SYSCFG: 0006

  EXCAUSE   : 0x27

  interrupts disabled

  physical IVG5 asserted : <0xffa00c94> { _evt_ivhw + 0x0 }

  physical IVG9 asserted : <0xffa00dcc> { _evt_evt9 + 0x0 }

  physical IVG15 asserted : <0xffa00e44> { _evt_system_call + 0x0 }

  logical irq   6 mapped  : <0xffa003a4> { _timer_interrupt + 0x0 }

  logical irq  10 mapped  : <0x0018c400> { _bfin_rtc_interrupt + 0x0 }

  logical irq  16 mapped  : <0x0018f49c> { _bfin_twi_interrupt_entry + 0x0 }

  logical irq  18 mapped  : <0x0017bd54> { _bfin_serial_rx_int + 0x0 }

  logical irq  19 mapped  : <0x0017bee8> { _bfin_serial_tx_int + 0x0 }

  logical irq  24 mapped  : <0x001856d0> { _bfin_mac_interrupt + 0x0 }

  logical irq  73 mapped  : <0x0017bb38> { _bfin_serial_mctrl_cts_int + 0x0 }

  logical irq  80 mapped  : <0xffa03514> /* kernel dynamic memory */

RETE: <0x00000000> /* Maybe null pointer? */

RETN: <0x002f9dc8> /* kernel dynamic memory */

RETX: <0x00000480> /* Maybe fixed code section */

RETS: <0x00037f9e> { _xnintr_shirq_handler + 0x6e }

PC  : <0x00037f9a> { _xnintr_shirq_handler + 0x6a }

DCPLB_FAULT_ADDR: <0x00000008> /* Maybe null pointer? */

ICPLB_FAULT_ADDR: <0x00037f9a> { _xnintr_shirq_handler + 0x6a }

 

PROCESSOR STATE:

R0 : 00000001    R1 : cbc744d9    R2 : 0000002a    R3 : 00000000

R4 : 002b7184    R5 : 00000002    R6 : cbc744d9    R7 : 0000002a

P0 : 03bcb5b8    P1 : 002b6f58    P2 : 002b7184    P3 : 002b6a8c

P4 : 002b6f48    P5 : 00000001    FP : 002f9de8    SP : 002f9cec

LB0: 00185588    LT0: 00185588    LC0: 00000000

LB1: 0018558a    LT1: 0018557e    LC1: 000001db

B0 : 00000000    L0 : 00000000    M0 : ffffffe8    I0 : 03dbab78

B1 : 00000000    L1 : 00000000    M1 : 00000000    I1 : 03dbaba4

B2 : 00000000    L2 : 00000000    M2 : 00000000    I2 : 00000000

B3 : 00000000    L3 : 00000000    M3 : 00000000    I3 : 00000000

A0.w: 00000000   A0.x: 00000000   A1.w: 00000000   A1.x: 00000000

USP : 002fa000  ASTAT: 02003005

 

Hardware Trace:

   0 Target : <0x00004ca0> { _trap_c + 0x0 }

     Source : <0xffa0068a> { _exception_to_level5 + 0xae } CALL pcrel

   1 Target : <0xffa005dc> { _exception_to_level5 + 0x0 }

     Source : <0xffa00498> { _bfin_return_from_exception + 0x18 } RTX

   2 Target : <0xffa00480> { _bfin_return_from_exception + 0x0 }

     Source : <0xffa00534> { _ex_trap_c + 0x6c } JUMP.S

   3 Target : <0xffa004c8> { _ex_trap_c + 0x0 }

     Source : <0xffa00754> { _trap + 0x58 } JUMP (P4)

   4 Target : <0xffa006fc> { _trap + 0x0 }

     Source : <0x00037f98> { _xnintr_shirq_handler + 0x68 } 0x3045

   5 Target : <0x00037f98> { _xnintr_shirq_handler + 0x68 }

     Source : <0x00038022> { _xnintr_shirq_handler + 0xf2 } IF CC JUMP

   6 Target : <0x00038006> { _xnintr_shirq_handler + 0xd6 }

     Source : <0x00037ff8> { _xnintr_shirq_handler + 0xc8 } IF !CC JUMP

   7 Target : <0x00037fc4> { _xnintr_shirq_handler + 0x94 }

     Source : <0x00037fc0> { _xnintr_shirq_handler + 0x90 } IF !CC JUMP

   8 Target : <0x00037f9e> { _xnintr_shirq_handler + 0x6e }

     Source : <0xffa0350c> /* kernel dynamic memory */ RTS

   9 Target : <0xffa03506> /* kernel dynamic memory */

     Source : <0x0005d354> { _rtdm_event_signal + 0x78 } RTS

  10 Target : <0x0005d34e> { _rtdm_event_signal + 0x72 }

     Source : <0x0005d346> { _rtdm_event_signal + 0x6a } IF CC JUMP

  11 Target : <0x0005d33e> { _rtdm_event_signal + 0x62 }

     Source : <0x0005d366> { _rtdm_event_signal + 0x8a } JUMP.S

  12 Target : <0x0005d366> { _rtdm_event_signal + 0x8a }

     Source : <0x0003a9be> { _xnpod_schedule + 0x36 } RTS

  13 Target : <0x0003a9b8> { _xnpod_schedule + 0x30 }

     Source : <0x0003a9aa> { _xnpod_schedule + 0x22 } IF !CC JUMP

  14 Target : <0x0003a992> { _xnpod_schedule + 0xa }

     Source : <0x001998fa> { _rthal_defer_switch_p + 0x26 } RTS

  15 Target : <0x001998d4> { _rthal_defer_switch_p + 0x0 }

     Source : <0x0003a98e> { _xnpod_schedule + 0x6 } CALL pcrel

 

thank you for your work, regards


 

 

2010-05-14 03:20:40     Re: Weird problem

Thorsten Pohlmann (GERMANY)

Message: 89400   

 

Hi!

 

Reading would have helped: P5 is not null but 0x1. I was blinded by the exception at 0x8, which in my eyes should be 0x9 ([P5 + 0x8])!?

 

So this is just a problem of something corrupting the intr struct. Sorry for that.

 

regards


 

 

2010-05-14 08:14:43     Re: Weird problem

Thorsten Pohlmann (GERMANY)

Message: 89407   

 

Ok, problem found and solved, which leads to another problem:

 

In the kernel module that implements the SPORT IRQ handlers, I included the header files from an older distribution (2009R1-rc2 instead of rc6), so the xnintr_t structs had different sizes because the build options for tracing were set in the kernel but not in the module.
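
A minimal sketch of how such a mismatch shifts field offsets. The two structs are hypothetical stand-ins, not the real xnintr_t layout; the only assumption taken from the report is that a tracing-related build option adds a member on one side:

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical view of the struct as compiled WITH the tracing option. */
struct intr_traced_view {
    struct intr_traced_view *next;
    unsigned long trace_hits;        /* present only when tracing is enabled */
    int (*isr)(void *cookie);
};

/* Hypothetical view of the same struct as compiled WITHOUT the option. */
struct intr_plain_view {
    struct intr_plain_view *next;
    int (*isr)(void *cookie);
};

/* Offset at which each side expects to find the isr pointer. */
size_t traced_isr_offset(void) { return offsetof(struct intr_traced_view, isr); }
size_t plain_isr_offset(void)  { return offsetof(struct intr_plain_view, isr); }

/* Returns 1 only if both builds agree on size and on the isr offset. */
int layouts_match(void)
{
    return sizeof(struct intr_traced_view) == sizeof(struct intr_plain_view) &&
           traced_isr_offset() == plain_isr_offset();
}

/* Prints both views so the disagreement is visible. */
void report_layouts(void)
{
    printf("traced: size=%zu isr@%zu\n",
           sizeof(struct intr_traced_view), traced_isr_offset());
    printf("plain : size=%zu isr@%zu\n",
           sizeof(struct intr_plain_view), plain_isr_offset());
}
```

Calling report_layouts() from a small main() shows that the two sides disagree about where isr lives, so one side ends up reading or writing a different member than the other intended.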

 

So why didn't insmod recognize the different versions of module and kernel?

 

 

 

regards


 

 

2010-05-14 10:39:53     Re: Weird problem

Robin Getz (UNITED STATES)

Message: 89411   

 

Thorsten:

 

Most likely because the kernel versions were the same.

 

It only checks the version - not the build number.

 

-Robin


 

 

2010-05-14 13:18:24     Re: Weird problem

Thorsten Pohlmann (GERMANY)

Message: 89416

Ok, it's getting a bit off topic now, but it would be a problem even in the same kernel. If the size of necessary and shared structs can change, there should be some kind of validation. Otherwise you end up in hell if you provide binary modules... Is there any trick I am missing?

Regards,


 

 

2010-05-14 13:33:06     Re: Weird problem

Robin Getz (UNITED STATES)

Message: 89417   

 

Thorsten:

 

Same happens on a desktop.

 

Since root is the only person who can install modules - it's expected that they know what they are doing.

 

-Robin


 

 

2010-05-16 14:53:35     Re: Weird problem

Mike Frysinger (UNITED STATES)

Message: 89451   

 

did you enable the kernel options related to module checksumming/versioning ?

 

if not, then it's a misconfiguration on your side.

 

if so, then those options try hard to detect mismatches, but ultimately (as Robin said), it's on the head of the guy loading things ...
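
Independent of those kernel options, a module author could add an explicit layout check of their own. The following is only a sketch of that idea in plain C: struct abi_stamp, ABI_STAMP and register_handler_checked are hypothetical names invented for illustration, not part of the kernel or Xenomai API, and both struct layouts are made up.

```c
#include <stddef.h>

/* Hypothetical layout fingerprint recorded by each side at build time. */
struct abi_stamp {
    size_t struct_size;   /* sizeof() of the shared struct */
    size_t isr_offset;    /* offsetof() of its isr member */
};

/* Two made-up layouts of the "same" struct from different build configs. */
struct irq_desc_traced {
    struct irq_desc_traced *next;
    unsigned long trace_hits;
    int (*isr)(void *cookie);
};

struct irq_desc_plain {
    struct irq_desc_plain *next;
    int (*isr)(void *cookie);
};

/* Builds a stamp from whatever layout the including code was compiled with. */
#define ABI_STAMP(type) \
    ((struct abi_stamp){ sizeof(type), offsetof(type, isr) })

/* Returns 1 when both sides agree on the layout, 0 otherwise. */
int abi_stamps_match(const struct abi_stamp *provider,
                     const struct abi_stamp *consumer)
{
    return provider->struct_size == consumer->struct_size &&
           provider->isr_offset == consumer->isr_offset;
}

/* Refuses registration (-1) on a layout mismatch instead of corrupting memory. */
int register_handler_checked(const struct abi_stamp *provider,
                             const struct abi_stamp *consumer)
{
    if (!abi_stamps_match(provider, consumer))
        return -1;
    /* real registration work would go here */
    return 0;
}
```

With such a check, the provider's stamp comes from its own build and the consumer passes the stamp from its build, so a mismatch fails at registration time rather than after minutes of running. It only covers the fields that are stamped, so it complements rather than replaces the kernel's own version checks.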
