2010-03-16 07:39:47     System Freezing

Document created by Aaronwu Employee on Aug 21, 2013. Last modified by Aaronwu Employee on Aug 21, 2013.

2010-03-16 07:39:47     System Freezing

Stefan Wanja (GERMANY)

Message: 87252   

 

Hello,

 

I have a strange problem on a custom bf527 board using the latest svn from 2009R1.1 and its kernel.

 

The problem is that at seemingly random times the whole system freezes completely (sometimes after a few minutes, sometimes after 3 days). I can connect with VisualDSP and JTAG, but I get only nonsense values for everything I check, and I can't even reset the chip with it.

 

I tried to get some hints using the "after death" method (checking the kernel log from U-Boot), but there was only normal info, nothing that pointed to a problem.

 

I kind of suspect the network, because it seems to be happening more (maybe only) when there is traffic (I use a custom app and iperf to see this).

 

Does anybody have a hint as to what I can do to figure the problem out, how to debug it, or what could cause such a freeze?

 

Can it be software at all?

 

Kind Regards,

 

Stefan


 

 

2010-03-16 10:06:31     Re: System Freezing

Robin Getz (UNITED STATES)

Message: 87258   

 

Stefan:

 

Do you have the watchdog turned on? and what is the state of the doublefault? (CONFIG_DEBUG_DOUBLEFAULT_RESET=y?)

 

-Robin


 

 

2010-03-16 10:25:46     Re: System Freezing

Stefan Wanja (GERMANY)

Message: 87262   

 

Hi Robin,

 

I had set CONFIG_DEBUG_DOUBLEFAULT_PRINT=y, but nothing was printed when it happened. I don't think I use a watchdog (CONFIG_WATCHDOG is not set, if that is it). How would that help me? I'll enable it now to see whether the behaviour changes.

 

Hoping for more help!

 

Stefan


 

 

2010-03-16 12:48:59     Re: System Freezing

Stefan Wanja (GERMANY)

Message: 87274   

 

Hi again,

 

I had another "freeze", and either the now-active watchdog or the DOUBLEFAULT_RESET option led to a reset. No info on the reason, though...

 

Kind Regards,

 

Stefan


 

 

2010-03-16 14:36:14     Re: System Freezing

Robin Getz (UNITED STATES)

Message: 87283   

 

Stefan:

 

Is the system freezing up while the watchdog is on and active? That is, with CONFIG_WATCHDOG=y and CONFIG_BFIN_WDT=y set, watchdogd:unknown:/bin/watchdogd -f -s in /etc/inittab, and the daemon actually running:

root:/> ps | grep watch

    4 root         0 SW   [watchdog/0]

  156 root       480 S    /bin/watchdogd -f -s

 

Can you define "freezing"?

 

-Robin
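
For readers wondering what the watchdog daemon contributes here: the hardware watchdog resets the chip unless userspace keeps feeding it, which is why the earlier freeze ended in a reset once the watchdog was active. The following is only a minimal sketch of such a feeder, assuming the generic Linux watchdog character-device interface (any write counts as a keepalive, writing 'V' before close asks the driver to disarm). The function name watchdog_feed and its parameters are made up for illustration; this is not the watchdogd source. Note that opening a real watchdog device arms it.

```c
#include <fcntl.h>
#include <unistd.h>

/*
 * Feed a Linux watchdog device `kicks` times, sleeping `interval_s`
 * seconds between kicks, then try to disarm it again.
 * Returns 0 on success, -1 if the device cannot be opened or written.
 * Hypothetical helper for illustration only.
 */
int watchdog_feed(const char *dev_path, int kicks, unsigned int interval_s)
{
    int fd = open(dev_path, O_WRONLY);
    if (fd < 0)
        return -1;  /* no such device, or it is already held by a daemon */

    int rc = 0;
    for (int i = 0; i < kicks; i++) {
        /* Writing any byte is treated as a keepalive by the generic interface. */
        if (write(fd, "\0", 1) != 1) {
            rc = -1;
            break;
        }
        if (i + 1 < kicks)
            sleep(interval_s);
    }

    /* "Magic close": writing 'V' asks the driver to disarm on close.
     * Drivers built with NOWAYOUT ignore this and stay armed. */
    if (write(fd, "V", 1) != 1)
        rc = -1;
    close(fd);
    return rc;
}
```

If the feeder (or the whole kernel) stops running, the keepalives stop and the watchdog hardware resets the board, which turns a silent freeze into a visible reboot.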


 

 

2010-03-16 19:24:35     Re: System Freezing

Stefan Wanja (GERMANY)

Message: 87293   

 

Hi Robin,

 

I am not really sure what your post means... am I missing something? Anyway, CONFIG_WATCHDOG=y and CONFIG_BFIN_WDT=y are set, and init does not complain about respawning the watchdog, so I assume it's running.

 

By "freezing" I mean what I posted first: the console doesn't react anymore, I can't ping, the audio samples in the buffer keep looping infinitely, and I can still connect with VisualDSP and the JTAG debugger but get only nonsense data (every register shows the same content). There is no log of any kind, not even with the "post mortem" diagnosis method.

 

Even with almost all kernel debug features enabled, no errors can be seen before it happens.

 

Kind Regards,

 

Stefan


 

 

2010-03-17 08:13:27     Re: System Freezing

Stefan Wanja (GERMANY)

Message: 87325   

 

Hi again,

 

Is there a way of seeing the call stack "after death"? That could help... In the meantime I have also seen the bf527 EZKit with an unmodified uClinux 2009R1.1 freeze up while running an iperf test for two days...

 

Kind Regards,

 

Stefan


 

 

2010-03-17 23:12:27     Re: System Freezing

Sonic Zhang (CHINA)

Message: 87356   

 

Attach a gnICE JTAG to your board. Start gdbproxy. Connect gdb to it. When kernel dies, stop gdb and see backtrace.

 

Please read the debug document in our wiki.


 

 

2010-03-31 06:34:32     Re: System Freezing

Stefan Wanja (GERMANY)

Message: 87914   

 

Hello and thanks for that advice,

 

We got ourselves a gnICE+ debugger, and now I can see a bit more.

 

Looking at the frozen board with bfin-elf-gdb the stack consists only of three entries:

 

    evt_timer() at /..../linux-2.6.x/arch/blackfin/include/asm/current.h:17

    bfin_mac_hard_start_xmit(skb=0x457288, dev=0x3d8b0a0) at drivers/net/bfin_mac.c:685

    ??()

 

Looking at the memory, there are zeros up to 0x20400000, where the kernel begins; from there on it's all 0xadadadad.

 

Looking at the registers there are only zeros except for

 

r7=0x3d8b0a0

p4=0x457288

sp=0x42fb10

fp=0x42fb10

rets=0xd4eec <bfin_mac_hard_start_xmit+388>

pc= 0xd4eec <bfin_mac_hard_start_xmit+388>

 

Within bfin_mac_hard_start_xmit it stopped at the call to blackfin_dcache_flush_range, with the parameter data being 0x0 (probably the value after the flush).

 

I don't know how the caching works, but we've had problems in the bfin_mac driver with caching before... I think there is a bug in there.

 

Because of that older, still unsolved bug we (still) use the WRITE_THROUGH cache policy as a workaround; it might be that the problem only occurs with that setting.

 

Sonic, can you see a bug in there?

 

Kind Regards,

 

Stefan


 

 

2010-03-31 15:13:24     Re: System Freezing

Robin Getz (UNITED STATES)

Message: 87936   

 

Stefan:

 

Just to confirm - you are seeing the problem in both write through and write back, or just one?

 

-Robin


 

 

2010-03-31 19:16:54     Re: System Freezing

Stefan Wanja (GERMANY)

Message: 87941   

 

Hi Robin,

 

I definitely see it in WRITE_THROUGH. I have not checked it with WRITE_BACK, because there is an error there that leads to corrupted packets and forces me to use WRITE_THROUGH (blackfin.uclinux.org/gf/project/uclinux-dist/forum/).

 

I will check whether it also happens in WRITE_BACK, but that can take a while; sometimes the freeze only shows up after days of running...

 

Kind Regards,

 

Stefan


 

 

2010-04-06 06:43:24     Re: System Freezing

Stefan Wanja (GERMANY)

Message: 88143   

 

Hi again,

 

Using WRITE_BACK I saw a Double Fault after about 38 hours of running iperf on the bf527 EZKit. Unfortunately the dump is not complete, but here is what I could see on the console:

 

  13 Target : <0x000b8bf0> { __raw_spin_unlock + 0x0 }

     Source : <0x0012c650> { __spin_unlock_irqrestore + 0x8 } CALL pcrel

  14 Target : <0x0012c648> { __spin_unlock_irqrestore + 0x0 }

     Source : <0x000128bc> { _release_console_sem + 0x1a4 } CALL pcrel

  15 Target : <0x000128b2> { _release_console_sem + 0x19a }

     Source : <0x00026ab2> { _up + 0x32 } RTS

Stack info:

SP: [0x03de7a90] <0x03de7a90> /* kernel dynamic memory */

FP: (0x03de7a94)

Memory from 0x03de7a90 to 03de8000

03de7a90:[00000000](03de7ab4)<00032ece> 00220b28  00220b40  00000000  00000000  09ec4d18

03de7ab0:<0034b048>(03de7adc)<00033030> 0021e2dc  00000006  00191b18  03de59c0  00000000

03de7ad0:<0012c6e0> 0021e2dc  03de7af0 (03de7af0)<0001ab48><0012c678> 0021e2dc  00191b18

03de7af0:(03de7b0c) ffa00460  00000000  00000000  001b2424  00000041  03de7b34 (03de7b24)

03de7b10:<00030a72> 0021b430  00001b60  03de7b34 <0012c678>(03de7b48)<000321a2> 0021e8b4

03de7b30: 0021b430  0021e8e8  00000009  00001b60  00000000  03de7b58 (03de7b70) ffa002dc

03de7b50: 00000006  001c4874  00001b60  00000000  00000000  00000000  03de7b98  ffa011fc

03de7b70:(03de7ba0) ffa011fc  00191b14  0021b15c <00346ef4> 03de59c0  03aa8ab8  10000000

03de7b90: 00000000  10000000  00000000  ffa00c82 (00000000) ffa00c82  03a87ecc  000306ce

03de7bb0: 00008050  000122d2  00008050  00000026  00000000  01c11d54  0004c056  000122d2

03de7bd0:<000122c0> 00000006  02003024  00009c1c  000122d2  00009c1a  000122d2  00000000

03de7bf0: 0002443c  00000008  00000000  00000008  00000000  03a81596  00000000  03c80d2c

03de7c10: 0000003f  00000000  00000000  00000000  00000000  00000000  0000001b  03c8252c

03de7c30: fffffffd  00000000  00000000  03aa8ab8  00000000  03cdfbd0  03de7ca4  00191b14

03de7c50: 0021b15c  00191b18  0007e7fc  0021e6c8  013aebf0 <00346ef4> 03de59c0  00000009

03de7c70: 0023aab0  00000000  000000fd  0007e7fc  00000000  00000000  013aebf0  00000006

03de7c90: 00191b3a  00191b3a  00191b3a  03de7cc0  03de7cc0  03de7cdc <00015274> 0021dfb8

03de7cb0: 03de59c0  03de8cd8  03de59c0  03de59c0  03de59c0  0000000d  03de7cdc  03de7cd4

03de7cd0: 00000001  03de7cd4  03de7cd4  03de7d10 <000152a8> 03de8cd8  03de6000  03de8cd8

03de7cf0: 03de6000  00000009  0023afb4  0023aab0  03de6000  03de7d44  03de7d20 <0012c634>

03de7d10: 03de7d44 <0001d53a> 00000009  00000008  03de7e60  0023afb4  03de7d4c <0000bb74>

03de7d30: 001916f8  001916dc  03de6000  03de6000  03de6000  03de7ef4 <0000300a> 03de6000

03de7d50: 000000be  03de7f24  00000002  ffffe000  00238a88  0000fffe  03de7d8c  03de7ee0

03de7d70: 03de7f24  00000000  03de5cf8  00243e28  03de59c0  00243e28  03de7db4  ffa01ea6

03de7d90: 03ca24c0  00243e28  00243e28  0000ffff <00020f14> 03de7dd0  03de6000  03de6000

03de7db0: 03de6000  03de7e14  ffa021f2  03de6000  03de7e9c  03de6000  7fffffff  7fffffff

03de7dd0: 03de7ea0  00000000  00004513  03f75351  aac9b4c0  00009445  03de7e14  03de7e14

03de7df0:<0000d3c2> 03de7e20 <0012c70e> 03de6000  03de7e9c  03de6000  03de7e20 <0012c634>

03de7e10: 03de6000  03de7e48  ffa01b42  03de7e48  03de7e48  ffa01ad6  0019169c  00000000

03de7e30: 000000a6  00000001  03de59c0  0000bf4c  00100100  00200200  03de7e78  ffa01c08

03de7e50: 03ca24c0  00000000  00000000  00004111  00000009  00000000  00000080  00000000

03de7e70: 00000000  5a5a5a5a  5a5a5a5a  5a5a5a5a <000158b8> 03cdfbd0  03de7f24  00000000

03de7e90: 00000000  03de91d0  00000000  00000000  00000001  dead4ead  ffffffff  ffffffff

03de7eb0: 03de7eb0  03de7eb0  03de7ef4 <00001618> ffa000e0  000000be  003a9940  00000000

03de7ed0: ffffe000  00238a88  ffffffff  00000000  00000000  00000000  00000000  00000000

03de7ef0: 00000000  00000000  ffa00996  03de6000  000000be  003a9940  00000002  ffffe000

03de7f10: 00238a88  0000fffe  ffffffff  ffffffff  0003c8e0  0034c126  00008000  00000000

03de7f30: 00000000  03de8000  0034c126  0034c126 <0022dd6a> ffa00fca  02002020  002357cf

03de7f50: 0035f7ed  002357ce  0035f7ea  00000000  00000000  00000000  00000000  00000000

03de7f70: 00000000  00000000  00000000  00000000  00000000  00000000  00000000  00000000

03de7f90: 00000000  00000000  00000000  00000000  00000000  00000000  00000000  003a9940

03de7fb0: 0022e5e8  03cdfbd0  03cdfcb4  00000000  00238bb4  003a9940  038ab004  0034c120

03de7fd0: 000000be  459accfe  459acd03  00238a88  ffffffff  ffffffff  000000ff  00000004

03de7ff0: 000000a6  00000004  000000be  00000006

Return addresses in stack:

   frame  1 : <0x00032ece> { ___rcu_pending + 0x42 }

NULL pointer access

Kernel OOPS in progress

Deferred Exception context

CURRENT PROCESS:

COMM=init PID=1

CPU = 0

invalid mm

return address: [0x000b8afa]; contents of:

0x000b8ad0:  e101  35d8  e3ff  ff80  2ff6  0000  0000  e800

0x000b8ae0:  0000  6008  e801  0000  0010  0000  3210  e141

0x000b8af0:  deaf  e101  1eed  e800  0003 [9110] 0808  1004

0x000b8b00:  e801  0000  0010  e141  0017  3042  e101  35d8

 

ADSP-BF527-0.0 525(MHz CCLK) 131(MHz SCLK) (mpu off)

Linux version 2.6.28.10-ADI-2009R1.1-svn8364

Built with gcc version 4.1.2 (ADI svn)

 

SEQUENCER STATUS:               Not tainted

SEQSTAT: 00000027  IPEND: 8070  SYSCFG: 0006

  EXCAUSE   : 0x27

  interrupts disabled

  physical IVG5 asserted : <0xffa00cec> { _evt_ivhw + 0x0 }

  physical IVG6 asserted : <0xffa00ddc> { _evt_timer + 0x0 }

  physical IVG15 asserted : <0xffa00f64> { _evt_system_call + 0x0 }

  logical irq   6 mapped  : <0xffa00374> { _timer_interrupt + 0x0 }

  logical irq  31 mapped  : <0x000cb0ec> { _bfin_serial_dma_rx_int + 0x0 }

  logical irq  32 mapped  : <0x000cb344> { _bfin_serial_dma_tx_int + 0x0 }

  logical irq  35 mapped  : <0x000d4b00> { _bfin_mac_interrupt + 0x0 }

RETE: <0x00000000> /* Maybe null pointer? */

RETN: <0x03de7780> /* kernel dynamic memory */

RETX: <0x00000480> /* Maybe fixed code section */

RETS: <0x0012c780> { __read_lock + 0x8 }

PC  : <0x000b8afa> { __raw_read_lock + 0xe }

DCPLB_FAULT_ADDR: <0x00000004> /* Maybe null pointer? */

ICPLB_FAULT_ADDR: <0x000b8afa> { __raw_read_lock + 0xe }

 

PROCESSOR STATE:

R0 : 00000004    R1 : deaf1eed    R2 : 00000100    R3 : 0034b048

R4 : 03de7808    R5 : 00000100    R6 : 03de77c0    R7 : 03de6000

P0 : 0021e5cc    P1 : 03de6000    P2 : 00000004    P3 : 03de79c8

P4 : 03ca2be8    P5 : 03dcb694    FP : 03de778c    SP : 03de76a4

LB0: 000122d2    LT0: 000122d2    LC0: 0002443c

LB1: 000abad8    LT1: 000abac8    LC1: 00000000

B0 : 0000001b    L0 : 00000000    M0 : 00000001    I0 : 03de798c

B1 : 56000000    L1 : 00000000    M1 : 03c8252c    I1 : 0000000f

B2 : ffffffff    L2 : 00000000    M2 : 0000001b    I2 : 00000000

B3 : 03a81596    L3 : 00000000    M3 : 00000000    I3 : 0013a0e4

A0.w: 00001143   A0.x: 00000000   A1.w: 000006c7   A1.x: 00000000

USP : 03cdfbd0  ASTAT: 02002020

 

Hardware Trace:

 

Kernel Stack

Stack info:

SP: [0x03de7b68] <0x03de7b68> /* kernel dynamic memory */

FP: (0x03de7ca4)

Memory from 0x03de7b60 to 03de8000

03de7b60: 00000000  00000000 [03de7b98] ffa011fc  03de7ba0  ffa011fc  00191b14  0021b15c

03de7b80:<00346ef4> 03de59c0  03aa8ab8  10000000  00000000  10000000  00000000  ffa00c82

03de7ba0: 00000000  ffa00c82  03a87ecc  000306ce  00008050  000122d2  00008050  00000026

03de7bc0: 00000000  01c11d54  0004c056  000122d2 <000122c0> 00000006  02003024  00009c1c

03de7be0: 000122d2  00009c1a  000122d2  00000000  0002443c  00000008  00000000  00000008

03de7c00: 00000000  03a81596  00000000  03c80d2c  0000003f  00000000  00000000  00000000

03de7c20: 00000000  00000000  0000001b  03c8252c  fffffffd  00000000  00000000  03aa8ab8

03de7c40: 00000000  03cdfbd0  03de7ca4  00191b14  0021b15c  00191b18  0007e7fc  0021e6c8

03de7c60: 013aebf0 <00346ef4> 03de59c0  00000009  0023aab0  00000000  000000fd  0007e7fc

03de7c80: 00000000  00000000  013aebf0  00000006  00191b3a  00191b3a  00191b3a  03de7cc0

03de7ca0: 03de7cc0 (03de7cdc)<00015274> 0021dfb8  03de59c0  03de8cd8  03de59c0  03de59c0

03de7cc0: 03de59c0  0000000d  03de7cdc  03de7cd4  00000001  03de7cd4  03de7cd4 (03de7d10)

03de7ce0:<000152a8> 03de8cd8  03de6000  03de8cd8  03de6000  00000009  0023afb4  0023aab0

03de7d00: 03de6000  03de7d44  03de7d20 <0012c634>(03de7d44)<0001d53a> 00000009  00000008

03de7d20: 03de7e60  0023afb4  03de7d4c <0000bb74> 001916f8  001916dc  03de6000  03de6000

03de7d40: 03de6000 (03de7ef4)<0000300a> 03de6000  000000be  03de7f24  00000002  ffffe000

03de7d60: 00238a88  0000fffe  03de7d8c  03de7ee0  03de7f24  00000000  03de5cf8  00243e28

03de7d80: 03de59c0  00243e28  03de7db4  ffa01ea6  03ca24c0  00243e28  00243e28  0000ffff

03de7da0:<00020f14> 03de7dd0  03de6000  03de6000  03de6000  03de7e14  ffa021f2  03de6000

03de7dc0: 03de7e9c  03de6000  7fffffff  7fffffff  03de7ea0  00000000  00004513  03f75351

03de7de0: aac9b4c0  00009445  03de7e14  03de7e14 <0000d3c2> 03de7e20 <0012c70e> 03de6000

03de7e00: 03de7e9c  03de6000  03de7e20 <0012c634> 03de6000  03de7e48  ffa01b42  03de7e48

03de7e20: 03de7e48  ffa01ad6  0019169c  00000000  000000a6  00000001  03de59c0  0000bf4c

03de7e40: 00100100  00200200  03de7e78  ffa01c08  03ca24c0  00000000  00000000  00004111

03de7e60: 00000009  00000000  00000080  00000000  00000000  5a5a5a5a  5a5a5a5a  5a5a5a5a

03de7e80:<000158b8> 03cdfbd0  03de7f24  00000000  00000000  03de91d0  00000000  00000000

03de7ea0: 00000001  dead4ead  ffffffff  ffffffff  03de7eb0  03de7eb0  03de7ef4 <00001618>

03de7ec0: ffa000e0  000000be  003a9940  00000000  ffffe000  00238a88  ffffffff  00000000

03de7ee0: 00000000  00000000  00000000  00000000  00000000 (00000000) ffa00996  03de6000

03de7f00: 000000be  003a9940  00000002  ffffe000  00238a88  0000fffe  ffffffff  ffffffff

03de7f20: 0003c8e0  0034c126  00008000  00000000  00000000  03de8000  0034c126  0034c126

03de7f40:<0022dd6a> ffa00fca  02002020  002357cf  0035f7ed  002357ce  0035f7ea  00000000

03de7f60: 00000000  00000000  00000000  00000000  00000000  00000000  00000000  00000000

03de7f80: 00000000  00000000  00000000  00000000  00000000  00000000  00000000  00000000

03de7fa0: 00000000  00000000  00000000  003a9940  0022e5e8  03cdfbd0  03cdfcb4  00000000

03de7fc0: 00238bb4  003a9940  038ab004  0034c120  000000be  459accfe  459acd03  00238a88

03de7fe0: ffffffff  ffffffff  000000ff  00000004  000000a6  00000004  000000be  00000006

Return addresses in stack:

 

Double Fault

Kernel OOPS in progress

Deferred Exception context

CURRENT PROCESS:

COMM=init PID=1

CPU = 0

invalid mm

return address: [0x000b8afa]; contents of:

0x000b8ad0:  e101  35d8  e3ff  ff80  2ff6  0000  0000  e800

0x000b8ae0:  0000  6008  e801  0000  0010  0000  3210  e141

0x000b8af0:  deaf  e101  1eed  e800  0003 [9110] 0808  1004

0x000b8b00:  e801  0000  0010  e141  0017  3042  e101  35d8

 

ADSP-BF527-0.0 525(MHz CCLK) 131(MHz SCLK) (mpu off)

Linux version 2.6.28.10-ADI-2009R1.1-svn8364

Built with gcc version 4.1.2 (ADI svn)

 

SEQUENCER STATUS:               Not tainted

SEQSTAT: 00060027  IPEND: 8078  SYSCFG: 0006

  EXCAUSE   : 0x27

  physical IVG3 asserted : <0xffa007bc> { _trap + 0x0 }

  interrupts disabled

  physical IVG5 asserted : <0xffa00cec> { _evt_ivhw + 0x0 }

  physical IVG6 asserted : <0xffa00ddc> { _evt_timer + 0x0 }

  physical IVG15 asserted : <0xffa00f64> { _evt_system_call + 0x0 }

  logical irq   6 mapped  : <0xffa00374> { _timer_interrupt + 0x0 }

  logical irq  31 mapped  : <0x000cb0ec> { _bfin_serial_dma_rx_int + 0x0 }

  logical irq  32 mapped  : <0x000cb344> { _bfin_serial_dma_tx_int + 0x0 }

  logical irq  35 mapped  : <0x000d4b00> { _bfin_mac_interrupt + 0x0 }

RETE: <0x00000000> /* Maybe null pointer? */

RETN: <0x03de72e4> /* kernel dynamic memory */

RETX: <0x000b8afa> { __raw_read_lock + 0xe }

RETS: <0x0012c780> { __read_lock + 0x8 }

PC  : <0x000b8afa> { __raw_read_lock + 0xe }

DCPLB_FAULT_ADDR: <0x00000004> /* Maybe null pointer?

ICPLB_FAULT_ADDR: <0x000b8afa> { __raw_read_lock + 0x

 

PROCESSOR STATE:

R0 : 00000004    R1 : deaf1eed    R2 : 00000100    R

R4 : 03de736c    R5 : 0000100d    R6 : 03de7324    R

P0 : 0021e5cc    P1 : 03de6000    P2 : 00000004    P

P4 : 03ca2be8    P5 : 03dcb694    FP : 03de72f0    S

LB0: 000122d2    LT0: 000122d2    LC0: 0002443c

LB1: 000abad8    LT1: 000abac8    LC1: 00000000

B0 : 0000001b    L0 : 00000000    M0 : 00000001    I

B1 : 56000000    L1 : 00000000    M1 : 03c8252c    I

B2 : ffffffff    L2 : 00000000    M2 : 0000001b    I

B3 : 03a81596    L3 : 00000000    M3 : 00000000    I

A0.w: 020d0000   A0.x: 00000000   A1.w: 000036da   A1

USP : 03cdfbd0  ASTAT: 02002020

 

Hardware Trace:

Kernel panic - not syncing: Double Fault - unrecovera

 

 

I don't know if that's a related problem or something different.

 

Another hint: running with DCACHE disabled, I have had no "freeze"/crash for about 4 days (running iperf) so far.

 

Kind Regards,

 

Stefan
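
As a reading aid for the SEQUENCER STATUS lines in the dump above, here is a small sketch of how the raw SEQSTAT and IPEND values break down into the fields the kernel prints. It assumes the usual Blackfin layout as I understand it (EXCAUSE in SEQSTAT bits 5..0, HWERRCAUSE in bits 18..14, IPEND bit 4 as the global interrupt disable, one bit per event level otherwise); the function names are made up for illustration and are not kernel APIs.

```c
#include <stdint.h>
#include <stdio.h>

/* EXCAUSE: exception cause, assumed to be SEQSTAT bits 5..0. */
unsigned int seqstat_excause(uint32_t seqstat)
{
    return seqstat & 0x3fu;
}

/* HWERRCAUSE: hardware error cause, assumed to be SEQSTAT bits 18..14. */
unsigned int seqstat_hwerrcause(uint32_t seqstat)
{
    return (seqstat >> 14) & 0x1fu;
}

/* Returns 1 if bit n of IPEND is set, 0 otherwise. */
int ipend_bit(uint32_t ipend, unsigned int n)
{
    return (int)((ipend >> n) & 1u);
}

/* Print a summary in the spirit of the kernel's SEQUENCER STATUS section. */
void print_sequencer_state(uint32_t seqstat, uint32_t ipend)
{
    printf("EXCAUSE=0x%02x HWERRCAUSE=0x%02x\n",
           seqstat_excause(seqstat), seqstat_hwerrcause(seqstat));
    if (ipend_bit(ipend, 4))
        printf("interrupts disabled\n");
    for (unsigned int n = 0; n < 16; n++) {
        if (n == 4)
            continue;   /* bit 4 is the global disable flag, not an event level */
        if (ipend_bit(ipend, n))
            printf("IVG%u active\n", n);
    }
}
```

Applied to the two dumps, IPEND 8070 has bits 4, 5, 6 and 15 set, matching the "interrupts disabled", IVG5, IVG6 and IVG15 lines, and IPEND 8078 additionally has bit 3 set, matching the extra IVG3 line in the Double Fault dump. Under the same layout assumption, SEQSTAT 00060027 carries EXCAUSE 0x27 plus a non-zero HWERRCAUSE field of 0x18, while the first dump's 00000027 has HWERRCAUSE 0.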


 

 

2010-04-13 13:15:00     Re: System Freezing

Stefan Wanja (GERMANY)

Message: 88377   

 

Hello again,

 

So far there is no news on whether it also happens with WRITE_BACK; the freeze/stall has not appeared yet in write-back mode.

 

I have seen that there is an anomaly, 05000443, which says that if IFLUSH is used at the end of a hardware loop, the processor stalls indefinitely. Using the gnICE+ I've seen that the processor stalls in blackfin_dcache_flush_range.

 

Maybe in special cases this anomaly also applies to FLUSH?


 

 

2010-04-14 11:30:04     Re: System Freezing

Stefan Wanja (GERMANY)

Message: 88421   

 

It does not seem to be like the IFLUSH problem. I disabled the workaround to see what happens, and, as described, the Blackfin stayed in a hardware loop, which I could see with the debugger.

 

In our case the Blackfin really stops working. To be precise, it's in bfin_mac_hard_start_xmit at the JUMP.S which comes in the assembler code after the return from blackfin_dcache_flush_range. It is followed by SSYNC().


 

 

2010-04-19 12:45:27     Re: System Freezing

Stefan Wanja (GERMANY)

Message: 88594   

 

We've had a test running with WRITE_BACK for 5 days without a freeze/stall, so it can be assumed that it is not a problem in WRITE_BACK mode. But as I said, there is still a bug in WRITE_BACK which causes packet loss, which is not tolerable in our application...

 

Is anyone actually reading this? Could you give me a sign that someone cares about this problem?

 

Kind Regards,

 

Stefan


 

 

2010-04-19 14:05:37     Re: System Freezing

Mike Frysinger (UNITED STATES)

Message: 88597   

 

what address exactly is being flushed ?  and what insn exactly is the JTAG showing it hanging at ?


 

 

2010-04-20 05:09:07     Re: System Freezing

Stefan Wanja (GERMANY)

Message: 88636   

 

Hello and thank you for the reply!

 

We made one special build where SSYNC is preceded by NOPs and interrupts are disabled, because we thought it might have to do with interrupts confusing SSYNC.

 

This is the output from gdb:

 

GNU gdb 6.6

Copyright (C) 2006 Free Software Foundation, Inc.

GDB is free software, covered by the GNU General Public License, and you are

welcome to change it and/or distribute copies of it under certain conditions.

Type "show copying" to see the conditions.

There is absolutely no warranty for GDB.  Type "show warranty" for details.

This GDB was configured as "--host=i686-pc-linux-gnu --target=bfin-elf"...

(gdb) target remote 10.0.10.14:2000

Remote debugging using 10.0.10.14:2000

0x000d48e4 in bfin_mac_hard_start_xmit (skb=0x30617e0, dev=0x3c84000)

    at /opt/uClinux-dist-2009R1-gwyx8/linux-2.6.x/arch/blackfin/include/asm/blackfin.h:18

18                      __asm__ __volatile__(

(gdb) info registers

r0             0xffff   65535

r1             0x3091680        50927232

r2             0x31     49

r3             0x0      0

r4             0x668000 6717440

r5             0x32bba0 3324832

r6             0x3c84000        63455232

r7             0x1f60ec 2056428

p0             0x3091680        0x3091680

p1             0x31     0x31

p2             0x3091076        0x3091076

p3             0x1f5688 0x1f5688

p4             0x30617e0        0x30617e0

p5             0x1f5674 0x1f5674

sp             0x669858 0x669858

fp             0x669864 0x669864

i0             0x669cf8 6724856

i1             0x3061830        50731056

i2             0x0      0

i3             0x0      0

m0             0x0      0

m1             0x0      0

m2             0x0      0

m3             0x0      0

b0             0x0      0

b1             0x0      0

b2             0x0      0

b3             0x0      0

l0             0x0      0

l1             0x0      0

l2             0x0      0

l3             0x0      0

a0x            0x0      0

a0w            0xf4221  999969

a1x            0x0      0

a1w            0x424e   16974

astat          0x2003024        33566756

rets           0xd49fc  0xd49fc <bfin_mac_hard_start_xmit+396>

lc0            0x0      0

lt0            0xffa012a6       -6286682

lb0            0xffa012a6       -6286682

lc1            0x0      0

lt1            0x7d26   32038

lb1            0x7d26   32038

cycles         0x222a8464       573211748

cycles2        0x1d     29

---Type <return> to continue, or q <return> to quit---

usp            0x663cc4 0x663cc4

seqstat        0x2000   8192

syscfg         0x6      6

reti           0x1269d2 1206738

retx           0x384d538        59036984

retn           0x66a000 6725632

rete           0xd48e4  870628

pc           0xd48e4  0xd48e4 <bfin_mac_hard_start_xmit+116>

cc             0x0      0

text_addr      0x0      0

text_end_addr  0x0      0

data_addr      0x0      0

fdpic_exec     0x0      0

fdpic_interp   0x0      0

ipend          0x0      0

 

(gdb) info stack

#0  0x000d48e4 in bfin_mac_hard_start_xmit (skb=0x30617e0, dev=0x3c84000)

    at /opt/uClinux-dist-2009R1-gwyx8/linux-2.6.x/arch/blackfin/include/asm/blackfin.h:18

putpkt: write failed: Broken pipe.

 

-------------

 

(gdb) disas 0x000d48ce 0xd4a0c

Dump of assembler code from 0xd48ce to 0xd4a0c:

0x000d48ce <bfin_mac_hard_start_xmit+94>:       IF !CC JUMP 0x0xd49d6 <bfin_mac_hard_start_xmit+358> (BP);

0x000d48d0 <bfin_mac_hard_start_xmit+96>:       R1 = [P4 + 0x50];

0x000d48d4 <bfin_mac_hard_start_xmit+100>:      R1 = R2 + R1;

0x000d48d6 <bfin_mac_hard_start_xmit+102>:      R1 += 0x2;              /* (  2) */

0x000d48d8 <bfin_mac_hard_start_xmit+104>:      R0 = R2;

0x000d48da <bfin_mac_hard_start_xmit+106>:      CALL 0x0x7d10 <blackfin_dcache_flush_range>;

0x000d48de <bfin_mac_hard_start_xmit+110>:      CLI R0;

0x000d48e0 <bfin_mac_hard_start_xmit+112>:      NOP;

0x000d48e2 <bfin_mac_hard_start_xmit+114>:      NOP;

0x000d48e4 <bfin_mac_hard_start_xmit+116>:      SSYNC;

0x000d48e6 <bfin_mac_hard_start_xmit+118>:      STI R0;

0x000d48e8 <bfin_mac_hard_start_xmit+120>:      P1 = [P3];

0x000d48ea <bfin_mac_hard_start_xmit+122>:      P0.H = 0x1f;            /* ( 31)        P0=0x0x1f0000 <map_pid_to_cmdline+124520> */

0x000d48ee <bfin_mac_hard_start_xmit+126>:      P2.H = 0xffc0;          /* (-64)        P2=0x0xffc00000(-4194304) */

0x000d48f2 <bfin_mac_hard_start_xmit+130>:      P0.L = 0x5688;          /* (22152)      P0=0x0x1f5688 <current_tx_ptr> */

0x000d48f6 <bfin_mac_hard_start_xmit+134>:      R0 = W[P1 + 0x10] (X);

0x000d48f8 <bfin_mac_hard_start_xmit+136>:      BITSET (R0, 0x0);               /* bit  0 */

0x000d48fa <bfin_mac_hard_start_xmit+138>:      W[P1 + 0x10] = R0;

0x000d48fc <bfin_mac_hard_start_xmit+140>:      P2.L = 0xca8;           /* (3240)       P2=0x0xffc00ca8(-4191064) */

0x000d4900 <bfin_mac_hard_start_xmit+144>:      R0 = W[P2] (Z);

0x000d4902 <bfin_mac_hard_start_xmit+146>:      CC = !BITTST (R0, 0x3);         /* bit  3 */

 

(gdb) disas 0xd49ec 0xd4a0a

Dump of assembler code from 0xd49ec to 0xd4a0c:

0x000d49ec <bfin_mac_hard_start_xmit+380>:      ILLEGAL

0x000d49ee <bfin_mac_hard_start_xmit+382>:      [P1 + 0xc] = R0;

0x000d49f0 <bfin_mac_hard_start_xmit+384>:      R1 = [P4 + 0x50];

0x000d49f4 <bfin_mac_hard_start_xmit+388>:      R1 = R0 + R1;

0x000d49f6 <bfin_mac_hard_start_xmit+390>:      R1 += 0x4;              /* (  4) */

0x000d49f8 <bfin_mac_hard_start_xmit+392>:      CALL 0x0x7d10 <blackfin_dcache_flush_range>;

0x000d49fc <bfin_mac_hard_start_xmit+396>:      JUMP.S 0x0xd48de <bfin_mac_hard_start_xmit+110>;

0x000d49fe <bfin_mac_hard_start_xmit+398>:      UNLINK;

0x000d4a02 <bfin_mac_hard_start_xmit+402>:      R0 = 0x1 (X);           /*              R0=0x1(  1) */

0x000d4a04 <bfin_mac_hard_start_xmit+404>:      (R7:6, P5:3) = [SP++];

0x000d4a06 <bfin_mac_hard_start_xmit+406>:      RTS;

0x000d4a08 <bfin_mac_hard_start_xmit+408>:      CC = R0 == 0x0;

0x000d4a0a <bfin_mac_hard_start_xmit+410>:      IF !CC JUMP 0x0xd493e <bfin_mac_hard_start_xmit+206> (BP);

End of assembler dump.

 

 

Our interpretation is that after one of the blackfin_dcache_flush_range calls, execution continues at 0x000d48de and proceeds to 0x000d48e4, where it stops.

 

Without interrupts disabled we usually hang in one of the interrupt handlers evt_evt[X]. For example we had it stop at the first instruction of evt_evt7:

 

0xffa00ca8 <_evt_evt7>: [--SP] = SYSCFG;

 

while next entry on stack was as above in bfin_mac_hard_start_xmit.

 

We cannot tell which range got flushed, because by the time we see it the value is 0x0, which is probably not what it really was.

 

Kind Regards,

 

Stefan
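
A note on narrowing down "one of the blackfin_dcache_flush_range calls": RETS holds the return address of the most recent CALL, so comparing it with the two call sites in the disassembly above suggests which one ran last. The sketch below does that comparison. The addresses are copied from the gdb output in this post; the helper name and the 4-byte CALL size (read off the listing, where the instruction after each CALL is 4 bytes further on) are assumptions for illustration, and the conclusion only holds if nothing else overwrote RETS afterwards.

```c
#include <stddef.h>
#include <stdint.h>

/* CALL sites of blackfin_dcache_flush_range, taken from the disassembly. */
static const uint32_t flush_call_sites[] = { 0x000d48dau, 0x000d49f8u };

/* Size of each CALL instruction as seen in the listing (next insn is +4). */
#define CALL_INSN_BYTES 4u

/*
 * Return the address of the call site whose return address equals `rets`,
 * or 0 if `rets` does not match any known call site.
 */
uint32_t call_site_for_rets(uint32_t rets)
{
    size_t n = sizeof(flush_call_sites) / sizeof(flush_call_sites[0]);
    for (size_t i = 0; i < n; i++) {
        if (flush_call_sites[i] + CALL_INSN_BYTES == rets)
            return flush_call_sites[i];
    }
    return 0;
}
```

With rets = 0xd49fc from the register dump, this points at the CALL at 0x000d49f8, i.e. the second call site (the one preceded by R1 += 0x4), whose return lands on the JUMP.S back to 0xd48de and from there on the CLI / NOP / NOP / SSYNC sequence.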


 

 

2010-04-20 05:32:09     Re: System Freezing

Mike Frysinger (UNITED STATES)

Message: 88638   

 

might be interesting to do the cli/sti around blackfin_dcache_flush_range, and do the SSYNC before the call to the flush range.  btw, you should be using the SSYNC() helper and not a direct SSYNC; as the entire point of the SSYNC() is to handle known anomalies.

 

assuming the processor is hung at the referenced pc (the SSYNC), this does appear to be a hardware anomaly.  but it would be better to see the exact debug info from an unmodified build and not one where you added arbitrary code.


 

 

2010-04-20 11:54:54     Re: System Freezing

Stefan Wanja (GERMANY)

Message: 88657   

 

Hi Mike,

 

We just enabled ANOMALY 05000312 manually; that's why there is a direct ssync and not SSYNC(). We will try your proposals...

 

Now we have also had the problem on a bf537 0.3 EZKit.

 

Here is where it stalled:

 

pc was at 0xcb5a4 <bfin_mac_hard_start_xmit+116>

 

   cb58c:       d2 b0           [P2 + 0xc] = R2;

   cb58e:       84 14           IF !CC JUMP 0xcb696 <_bfin_mac_hard_start_xmit+0x166> (BP);

   cb590:       21 e4 14 00     R1 = [P4 + 0x50];

   cb594:       4a 50           R1 = R2 + R1;

   cb596:       11 64           R1 += 0x2;              /* (  2) */

   cb598:       02 30           R0 = R2;

   cb59a:       f9 e3 45 dd     CALL 0x7024 <_blackfin_dcache_flush_range>;

   cb59e:       30 00           CLI R0;

   cb5a0:       00 00           NOP;

   cb5a2:       00 00           NOP;

   cb5a4:       24 00           SSYNC;

   cb5a6:       40 00           STI R0;

   cb5a8:       59 91           P1 = [P3];

   cb5aa:       48 e1 1e 00     P0.H = 0x1e;            /* ( 30)        P0=0x1e0004 */

   cb5ae:       4a e1 c0 ff     P2.H = 0xffc0;          /* (-64)        P2=0xffc098fc(-4155140) */

   cb5b2:       08 e1 38 5d     P0.L = 0x5d38;          /* (23864)      P0=0x1e5d38 <_current_tx_ptr> */

   cb5b6:       08 aa           R0 = W[P1 + 0x10] (X);

   cb5b8:       00 4a           BITSET (R0, 0x0);               /* bit  0 */

   cb5ba:       08 b6           W[P1 + 0x10] = R0;

   cb5bc:       0a e1 a8 0c     P2.L = 0xca8;           /* (3240)       P2=0xffc00ca8(-4191064) */

   cb5c0:       10 95           R0 = W[P2] (Z);

 

 

which is this code:

 

/* SSYNC implementation for C file */
static inline void SSYNC(void)
{
    int _tmp;
    if (ANOMALY_05000312)
        __asm__ __volatile__(
            "cli %0;"
            "nop;"
            "nop;"
            "ssync;"
            "sti %0;"
            : "=d" (_tmp)
        );
    else if (ANOMALY_05000244)
        __asm__ __volatile__(
            "nop;"
            "nop;"
            "nop;"
            "ssync;"
        );
    else
        __asm__ __volatile__("ssync;");
}

 

The difference here is that ANOMALY 05000312 is enabled by default. This is unmodified code. For completeness, the kernel config for this case is attached.

 

Maybe it's time to create a bug in the tracker?

 

Kind Regards,

 

Stefan

 

config


 

 

2010-04-20 16:15:01     Re: System Freezing

Mike Frysinger (UNITED STATES)

Message: 88666   

 

if you can reproduce with a bf537-ezkit, let's focus on that.  file a tracker item with the exact configs/binaries so we can download things and reproduce ourselves.  exact steps to reproduce would be good too.


 

 

2010-04-23 09:10:46     Re: System Freezing

Stefan Wanja (GERMANY)

Message: 88817   

 

I have created a bug for this now: blackfin.uclinux.org/gf/project/uclinux-dist/tracker/

 

@Mike: As this looks more and more like a hardware bug, how should we deal with it? Is there any use in contacting ADI, or would one of you guys do that in such a case?


 

 

2010-04-23 11:10:07     Re: System Freezing

Robin Getz (UNITED STATES)

Message: 88821   

 

Stefan:

 

Thanks for the bug report - if we can replicate things - we will report things to the hardware folks...

 

I posted a few followup questions in the tracker.

 

-Robin
