2010-07-28 09:59:30     rsh/rshd crashes

Document created by Aaronwu Employee on Aug 22, 2013

Stefan Pledl (GERMANY)

Message: 91814   

 

Hello,

 

on my bf537-stamp based board 'rsh' sometimes crashes.

Compiler:        2009R1.1 (ADI-09r1.1-2)

Kernel/uClinux   2009R1.1

 

To reproduce the problem i use two identical boards with different IP addresses.

Each board tries to execute a remote command on the other device after power on.

For testing i have used the following script, which is executed automatically

from '/etc/rc' after power on.

 

/etc/rc

...

/root/test.sh &

...

 

test.sh

----

#!/bin/sh

 

while true; do rsh -v -l root <REMOTE_BOARD_IP> date; sleep 2; done

----

 

When i power on both boards at the same time, sometimes one of the systems

crashes with the following output.

 

Any ideas what the reason for the crash is?

 

Thanks,

Stefan

 

 

Sun Jan  4 03:03:41 UTC 1970                                                                 

Data access CPLB miss                                                                                

- Used by the MMU to signal a CPLB miss on a data access.                                           

Kernel OOPS in progress                                                                              

Deferred Exception context                                                                           

CURRENT PROCESS:                                                                                     

COMM=rsh PID=200                                                                                     

CPU = 0                                                                                              

TEXT = 0x01df0040-0x01df9ee0        DATA = 0x01df9f00-0x01dfbdc0                                     

BSS = 0x01dfbdc0-0x01dfc410  USER-STACK = 0x01dfdf80                                                

                                                                                                     

return address: [0x0010e10a]; contents of:                                                           

0x0010e0e0:  190d  5013  e14a  001e  e628  0025  e10a  73d0                                          

0x0010e0f0:  9110  b120  e121  00da  e4a0  001d  5408  0c00                                          

0x0010e100:  1860  0000  0000  0000  ace7 [e53a] 002c  a0d0                                          

0x0010e110:  0c00  1807  0000  0000  0000  a318  0c00  1859                                          

                                                                                                     

ADSP-BF537-0.2 500(MHz CCLK) 100(MHz SCLK) (mpu off)                                                 

Linux version 2.6.28.10-ADI-2009R1                                                                   

Built with gcc version 4.1.2 (ADI svn)                                                               

                                                                                                     

SEQUENCER STATUS:               Not tainted                                                          

SEQSTAT: 00000026  IPEND: 8030  SYSCFG: 0006                                                        

  EXCAUSE   : 0x26                                                                                   

  interrupts disabled                                                                                

  physical IVG5 asserted : <0xffa00c90> { _evt_ivhw + 0x0 }                                          

  physical IVG15 asserted : <0xffa00fb4> { _evt_system_call + 0x0 }                                  

  logical irq   6 mapped  : <0xffa003b4> { _timer_interrupt + 0x0 }                                  

  logical irq  10 mapped  : <0x000fa7bc> { _bfin_rtc_interrupt + 0x0 }                               

  logical irq  18 mapped  : <0x000df2cc> { _bfin_serial_dma_rx_int + 0x0 }                           

  logical irq  19 mapped  : <0x000df4ac> { _bfin_serial_dma_tx_int + 0x0 }                           

  logical irq  24 mapped  : <0x000e8084> { _bfin_mac_interrupt + 0x0 }                               

RETE: <0x00000000> /* Maybe null pointer? */                                                        

RETN: <0x01de1be4> /* kernel dynamic memory */                                                      

RETX: <0x00000480> /* Maybe fixed code section */                                                   

RETS: <0x0011f94a> { _ip_queue_xmit + 0x162 }                                                       

PC  : <0x0010e10a> { _neigh_resolve_output + 0x5e }                                                 

DCPLB_FAULT_ADDR: <0x6f720124> /* kernel dynamic memory */                                           

ICPLB_FAULT_ADDR: <0x0010e10a> { _neigh_resolve_output + 0x5e }                                      

                                                                                                     

PROCESSOR STATE:                                                                                     

R0 : 00000008    R1 : 000000da    R2 : 0000000e    R3 : 01ce6876                                    

R4 : 000005a8    R5 : fffb85ed    R6 : 00000000    R7 : fffb85ed                                    

P0 : 01ce6884    P1 : 003208c4    P2 : 001e73d0    P3 : 0033db60                                    

P4 : 01dcd820    P5 : 01dceec0    FP : 6f720074    SP : 01de1b08                                    

LB0: ffa01542    LT0: ffa01542    LC0: 00000000                                                     

LB1: 000fb400    LT1: 000fb3fc    LC1: 00000000                                                     

B0 : 00000000    L0 : 00000000    M0 : 00000000    I0 : 01dcd6c8                                    

B1 : 00000000    L1 : 00000000    M1 : 00000000    I1 : 003208c4                                    

B2 : 00000000    L2 : 00000000    M2 : 00000000    I2 : 00000000                                    

B3 : 00000000    L3 : 00000000    M3 : 00000000    I3 : 00000000                                    

A0.w: 00000001   A0.x: 00000000   A1.w: 00000001   A1.x: 00000000                                    

USP : 01dcd7b8  ASTAT: 02003004                                                                      

                                                                                                     

Hardware Trace:                                                                                      

   0 Target : <0x00004dc0> { _trap_c + 0x0 }                                                         

     Source : <0xffa006ce> { _exception_to_level5 + 0xae }                                           

   1 Target : <0xffa00620> { _exception_to_level5 + 0x0 }                                            

     Source : <0xffa004dc> { _bfin_return_from_exception + 0x20 }                                    

   2 Target : <0xffa004bc> { _bfin_return_from_exception + 0x0 }                                     

     Source : <0xffa00578> { _ex_trap_c + 0x6c }                                                     

   3 Target : <0xffa003e8> { _ex_dcplb_miss + 0x0 }                                                  

     Source : <0xffa003e2> { _ex_workaround_261 + 0x1a }                                             

   4 Target : <0xffa003c8> { _ex_workaround_261 + 0x0 }                                              

     Source : <0xffa007ae> { _trap + 0x6e }                                                          

   5 Target : <0xffa0075e> { _trap + 0x1e }                                                          

     Source : <0xffa0075a> { _trap + 0x1a }                                                          

   6 Target : <0xffa00740> { _trap + 0x0 }                                                           

     Source : <0xffa004dc> { _bfin_return_from_exception + 0x20 }                                    

   7 Target : <0xffa004bc> { _bfin_return_from_exception + 0x0 }

     Source : <0xffa003da> { _ex_workaround_261 + 0x12 }

   8 Target : <0xffa003c8> { _ex_workaround_261 + 0x0 }

     Source : <0xffa007ae> { _trap + 0x6e }

   9 Target : <0xffa0075e> { _trap + 0x1e }

     Source : <0xffa0075a> { _trap + 0x1a }

  10 Target : <0xffa00740> { _trap + 0x0 }

     Source : <0x0010e108> { _neigh_resolve_output + 0x5c } 0xace7

  11 Target : <0x0010e0ac> { _neigh_resolve_output + 0x0 }

     Source : <0x0011f0d8> { _ip_finish_output + 0xb8 } JUMP (P2)

  12 Target : <0x0011f0ce> { _ip_finish_output + 0xae }

     Source : <0x00100ca4> { _skb_push + 0x2c } RTS

  13 Target : <0x00100c78> { _skb_push + 0x0 }

     Source : <0x0011f0ca> { _ip_finish_output + 0xaa } CALL pcrel

  14 Target : <0x0011f0c0> { _ip_finish_output + 0xa0 }

     Source : <0xffa01556> { _memcpy + 0x4e }

  15 Target : <0xffa01538> { _memcpy + 0x30 }

     Source : <0xffa0152a> { _memcpy + 0x22 }

 

Kernel Stack

Stack info:

SP: [0x01de1f24] <0x01de1f24> /* kernel dynamic memory */

Memory from 0x01de1f20 to 01de2000

01de1f20: 00000000 [01df133e] 00008000  00000000  00000000  01de2000  01df133e  01df133e

01de1f40:<01df0a18> ffa01018  02003004  01df1f85  01df28d9  01df1f84  01df28d8  00000000

01de1f60: 00000000  00000001  00000000  00000001  00000000  00000000  00000000  00000000

01de1f80: 00000000  00000000  00000000  00000000  00000000  00000000  00000000  00000000

01de1fa0: 00000000  00000000  00000000  01dfdf84  01dcd6c8  01dcd7b8  01dcd7c4  01dfdef4

01de1fc0: 00000003  01dfdf84  00000014  01dcd824  00000004  01dcd824  00000014  00000000

01de1fe0: 00000003  00000001  00000014  01dcd824  00000003  00000003  00000004  00000006

Return addresses in stack:

    address : <0x01df0a18> [ rsh + 0x9d8 ]

Modules linked in:

Kernel panic - not syncing: Kernel exception

 

 


 

 

2010-07-28 11:32:17     Re: rsh/rshd crashes

Mike Frysinger (UNITED STATES)

Message: 91817   

 

there were some alignment issues in the networking stack, but they should be fixed now.  search the forums for similar posts.


 

 

2010-07-28 17:09:06     Re: rsh/rshd crashes

Stefan Pledl (GERMANY)

Message: 91826   

 

Hi Mike,

 

i have searched the forum but have not found anything that matches.

Can you give me any hints?

 

You wrote that these problems have already been fixed.

Is it possible to backport these fixes to 2009R1?

 

Thanks,

Stefan


 

 

2010-07-28 17:48:47     Re: rsh/rshd crashes

Mike Frysinger (UNITED STATES)

Message: 91828   

 

try the 2009R1 svn branch


 

 

2010-07-28 18:16:42     Re: rsh/rshd crashes

Stefan Pledl (GERMANY)

Message: 91829   

 

we are already on branch 2009R1 (git)

 

bf51x: backport PH8 fixes from trunk

git-svn-id: svn://localhost/svn/linux-kernel/branches/2009R1@8921 526b6c2d-f592-4532-a319-5dd88ccb003d

 

any other ideas?


 

 

2010-07-28 19:37:05     Re: rsh/rshd crashes

Mike Frysinger (UNITED STATES)

Message: 91831   

 

then i guess we dont have any fixes for this atm.  i might be recalling the wrong bug as i believe the issues we looked at were about unaligned accesses while yours is a straight up miss.  that usually means bad pointer.
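The distinction between an unaligned access and a plain miss can be read off the SEQSTAT line of the oops above. The following is only a sketch: it assumes EXCAUSE occupies the low six bits of SEQSTAT, and it maps just the two cause codes relevant here. The function names `excause` and `describe_excause` are made up for illustration.

```shell
#!/bin/sh
# Extract EXCAUSE from a SEQSTAT value given as bare hex digits,
# assuming EXCAUSE is the low 6 bits of the register.
excause() {
    printf '0x%02x\n' $(( 0x$1 & 0x3f ))
}

# Map the two cause codes discussed in this thread to a description.
# Every other code is deliberately left undecoded.
describe_excause() {
    case "$1" in
        0x24) echo "data access misaligned address violation" ;;
        0x26) echo "data access CPLB miss" ;;
        *)    echo "other (see the Blackfin programming reference)" ;;
    esac
}
```

For the dump in the first post, `describe_excause "$(excause 00000026)"` reports a data access CPLB miss, which matches the first line of the oops.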

 

what exactly is your test code doing ?  it's running rsh locally on the Blackfin board to connect to its own rshd ?


 

 

2010-07-29 03:24:55     Re: rsh/rshd crashes

Stefan Pledl (GERMANY)

Message: 91865   

 

The local 'rsh' connects to the remote 'rshd' to execute the 'date' command on the remote device.

 

The code does not do anything useful, but with this script i can reproduce the problem in about 30% of the cases when i power on both devices at the same time.

 

The script starts automatically after power on and runs on both devices, each connecting to the other.

It is also possible that the problem depends on the commands being executed at nearly the same time.

If i delay one of the devices the probability of an error is very low.
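The delay mentioned above could be scripted so that both boards keep an identical test.sh. This is only a sketch of a way to offset the two boards for testing, not a fix for the crash. The helper name `stagger_delay` and the `STAGGER_SECS` variable are made up, and the 5 second default is an arbitrary assumption.

```shell
#!/bin/sh
# Hypothetical helper: derive a start delay from the last octet of the
# board's own IP address, so that boards with odd and even addresses
# start their first rsh at different times.
# STAGGER_SECS (assumed default: 5) is the offset in seconds.
stagger_delay() {
    last_octet="${1##*.}"
    echo $(( ($last_octet % 2) * ${STAGGER_SECS:-5} ))
}
```

With the addresses used in this thread, `stagger_delay 192.168.7.100` gives 0 and `stagger_delay 192.168.7.101` gives 5, so a line such as `sleep "$(stagger_delay 192.168.7.101)"` ahead of the loop in test.sh would hold back only the second board.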

 

Sometimes i also get the following output when running the test.

 

BINFMT_FLAT: reloc outside program 0xd10c (0 - 0xc3d0/0x9ea0), killing rsh!

SIGSEGV

 

 

btw, i have tried to repeat the test with kernel/uClinux from trunk (snapshot images), but with the new inetutils-1.6 rsh blocks and no output from the remote device is displayed.


 

 

2010-07-29 12:22:56     Re: rsh/rshd crashes

Mike Frysinger (UNITED STATES)

Message: 91876   

 

but what exactly is "local" and what is "remote" ?  what device is running `rsh` ?  the Blackfin board ?  what device is running `rshd` ?  a different Blackfin board ?

 

so you power them up and each tries to run `rsh` as quickly as possible to connect to `rshd` on the other device ?


 

 

2010-07-29 17:12:35     Re: rsh/rshd crashes

Stefan Pledl (GERMANY)

Message: 91889   

 

We use two identical Blackfin boards (based on bf537-stamp)

 

Board 1 (192.168.7.100)                    Board 2 (192.168.7.101)

-----------------------                    -----------------------

rsh 192.168.7.101 date  -----------------> rshd

rshd                    <----------------- rsh 192.168.7.100 date

 

>so you power them up and each tries to run `rsh` as quickly as possible to connect to `rshd` on the other device ?

that is exactly what we do


 

 

2010-07-29 17:51:34     Re: rsh/rshd crashes

Mike Frysinger (UNITED STATES)

Message: 91890   

 

ok, np.  i just wanted to be sure of the setup before i try replicating with the wrong thing.

 

presumably you're using the on-chip MAC with both ?

 

could you post your kernel .config as an attachment to your message ?


 

 

2010-07-29 18:34:25     Re: rsh/rshd crashes

Stefan Pledl (GERMANY)

Message: 91891   

 

We are using BF537 internal MAC and the two boards are identical.

 

Kernel config see attachment.

 

config


 

 

2010-08-03 02:56:08     Re: rsh/rshd crashes

Stefan Pledl (GERMANY)

Message: 92086   

 

Hi Mike,

 

any news about the problem with rsh/rshd?

 

-Stefan


 

 

2010-08-12 18:34:31     Re: rsh/rshd crashes

Mike Frysinger (UNITED STATES)

Message: 92389   

 

you dont need two boards to make it crash ... just have one attempt to rsh immediately upon startup to an IP that is unreachable and it'll crash.  however, i cant get 2010R1 to crash.  i know we've fixed some memory issues in trunk with the bfin_mac driver, but i dont know if those will help and/or are backportable to 2009R1.

 

ive opened a tracker item for the issue:

  blackfin.uclinux.org/gf/tracker/6179
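The single-board reproduction described in this post might look roughly like the script below. It is a sketch modelled on the original test.sh, not the script that was actually used. `TARGET_IP` is an assumed address with no host behind it, and the `RSH` and `DELAY` overrides exist only so the loop can be dry-run without a network.

```shell
#!/bin/sh
# Hypothetical single-board variant of test.sh.
# TARGET_IP is an assumption: any address that is unreachable from the board.
TARGET_IP="${TARGET_IP:-192.168.7.250}"
# RSH can be overridden (e.g. RSH=echo) to print the command instead of running it.
RSH="${RSH:-rsh}"

# Issue the same remote command as test.sh a fixed number of times,
# pausing DELAY seconds (default 2, as in test.sh) between attempts.
try_rsh() {
    count="$1"
    i=0
    while [ "$i" -lt "$count" ]; do
        $RSH -v -l root "$TARGET_IP" date
        i=$((i + 1))
        if [ "$i" -lt "$count" ]; then
            sleep "${DELAY:-2}"
        fi
    done
}
```

Appending a call such as `try_rsh 10` and starting the script from /etc/rc, as in the first post, would make the first attempt happen right after startup.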


 

 

2010-08-13 11:48:14     Re: rsh/rshd crashes

Stefan Pledl (GERMANY)

Message: 92415   

 

Thanks Mike,

 

we will wait for 2010R1, port our boards to it, and hope that the problem does not appear again.
