[#4184] pthread test case crash with CPLB miss error in SMP kernel

Document created by Aaronwu Employee on Aug 29, 2013

Submitted By: Vivi Li
Open Date: 2008-06-23 04:06:41
Close Date: 2008-11-19 03:21:43
Priority: Medium
Assignee: Graf Yang, Jie Zhang
Status: Closed
Fixed In Release: N/A
Found In Release: N/A
Release:
Category: N/A
Board: EZKIT Lite
Processor: BF561
Silicon Revision:
Is this bug repeatable?: Yes
Resolution: Duplicate
Uboot version or rev.:
Toolchain version or rev.: 08r1.5-11
App binary format: N/A

Summary: pthread test case crash with CPLB miss error in SMP kernel

Details:

 

The pthread test case crashes with a CPLB miss error.

The pthread test script and build script are located at uclinux-dist/testsuites/oprofile.

The source code of the pthread test is located at uclinux-dist/user/blkfin-test/pthread_test.

 

--

root:/> cd bin/

root:/bin> ./ex1

create a succeeded 0D

                     create btaucce ded a

CaaaaPaaaaLaaaaBaaaa maaaaiss

- Used by the MMU to signal a CPLB miss on a data access.

aaaDefered Exception context

CURRENT PROCESS:

COMM=ex1 PID=98

aTEXT = 0x031c0040-0x031ce640        DATA = 0x031ce644-0x031d4854

BSS = 0x031d4854-0x031d6ee4  USER-STACK = 0x031d7f88

 

areturn address: [0x031c095c]; contents of:

0x031c0930: a 05b4  e200  321b a 0000  04c3  3228 a e141  031d

a0x031c0940:  e101  00fc a e800  002a  e428  0066 a 4f20  5008

0x031c0950:  3210  a0d0  e14a  ffb0  e10a  0000 [9310] bc55

0x031c0960:  e300 a 251e  b168  e121 a 0064  3045  5048 a 6002

 

SEQUENCER STATUS:               Not tainted

SEQSTAT: 00000026  IPEND: 0030  SYSCFG: 0036

  HWERRCAUSE: 0x0

  EXCAUSE   : 0x26

a RETE: <0x00000000> /* Maybe null pointer? */

RETN: <0x031be000> /* unknown address */

RETX: <0x031c095c> [ ex1 + 0x91c ]

RETS: <0x031c5164> [ ex1 + 0x5124 ]

a PC  : <0x031c095c> [ ex1 + 0x91c ]

DCPLB_FAULT_ADDR: <0xffb00000> /* unknown address */

aICPLB_FAULT_ADDR: <0x031c095c> [ ex1 + 0x91c ]

 

PROCESSOR STATE:

a R0 : 031e8004    R1 : 031d00fc    R2 : 00000f21    R3 : 031ec004

R4 : 031ce6ac    R5 : 031c0140    R6 : 00000030    R7 : 00004000

a P0 : 031c0938    P1 : 031ebe24    P2 : ffb00000    P3 : 0000005f

P4 : 00000000    P5 : 031ebe24    FP : 031ebe04    SP : 031bdf24

LB0: 031c6679    LT0: 031c6678    LC0: 00000000

LB1: 031c0f6f    LT1: 031c0f6e    LC1: 00000000

B0 : 00000000    L0 : 00000000    M0 : 00000000    I0 : 037adeac

B1 : 00000000    L1 : 00000000    M1 : 00000000    I1 : 00000000

B2 : 00000000    L2 : 00000000    M2 : 00000000    I2 : 00000000

B3 : 00000000    L3 : 00000000    M3 : 00000000    I3 : 00000000

A0.w: 00000000   A0.x: 00000000   A1.w: 00000000   A1.x: 00000000

USP : 031ebd5c  ASTAT: 02002020

 

aNo trace since you do not have CONFIG_DEBUG_BFIN_NO_KERN_HWTRACE enabled

 

Stack from 031bdf04:

       a 00001686 00009d54 ff700028a ff700028 ff700024 e7306c08a e3fe0017 2f925f25

        031c095c 00000030 00000026 00000000 031be000 031c095c 031c095ca 031c5164

        031e8004 02002020 031c0f6fa 031c6679 031c0f6e 031c6678 00000000a 00000000

        00000000 00000000 00000000 00000000 00000000 00000000a 00000000 00000000

        00000000 00000000 00000000 00000000a 00000000 00000000 00000000a 00000000

        00000000 00000000 00000000 037adeac 031ebd5c 031ebe04 031ebe24 00000000

 

Call Trace:

[<00004000>]a _do_settimeofday+0x10/0xd4

 

aaaaaaaaaaaaaroot:/bin>

--

 

Follow-ups

 

--- Graf Yang                                                2008-06-23 06:06:30

When this thread runs on CoreB, it can't access address 0xffb00000.

 

--- Graf Yang                                                2008-06-25 02:43:02

This application calls libthread, which accesses the CoreA scratchpad. It fails
when run on CoreB.

 

--- Mike Frysinger                                           2008-06-25 07:44:50

then our toolchain needs updating

 

--- Graf Yang                                                2008-06-25 22:08:40

I think we should implement the following steps in the toolchain to access the scratchpad:

1. Detect which CPU we are running on.

2. Call sched_setaffinity() to bind the current thread to this CPU.

3. Access the scratchpad of the current CPU.

4. Call sched_setaffinity() to allow the thread to run on any CPU again.

 

Are there any suggestions?
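
The four steps above can be sketched in C roughly as follows. This is only an illustration under stated assumptions: it uses the Linux/glibc-style sched_getcpu(), sched_getaffinity() and sched_setaffinity() calls (the thread does not say whether the Blackfin uClibc of that time provided sched_getcpu()), the names run_pinned_to_current_cpu() and record_cpu() are hypothetical, and the callback only records the CPU number instead of touching any scratchpad. Step 4 restores the previously saved affinity mask rather than literally allowing every CPU.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* needed for sched_getcpu() and the CPU_* macros */
#endif
#include <assert.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical stand-in for the scratchpad access in step 3:
 * it only records which CPU the callback ran on. */
static void record_cpu(void *arg)
{
    *(int *)arg = sched_getcpu();
}

/* Hypothetical helper: run fn(arg) while the calling thread is pinned
 * to the CPU it is currently on, then restore the old affinity.
 * Returns 0 on success, -1 on failure. */
static int run_pinned_to_current_cpu(void (*fn)(void *), void *arg)
{
    cpu_set_t saved, pinned;
    int cpu;

    assert(fn != NULL);

    /* Remember the current mask so step 4 can undo the pinning. */
    if (sched_getaffinity(0, sizeof(saved), &saved) != 0)
        return -1;

    /* Step 1: detect which CPU we are running on. */
    cpu = sched_getcpu();
    if (cpu < 0)
        return -1;

    /* Step 2: bind the calling thread (pid 0) to that CPU. */
    CPU_ZERO(&pinned);
    CPU_SET(cpu, &pinned);
    if (sched_setaffinity(0, sizeof(pinned), &pinned) != 0)
        return -1;

    /* Step 3: per-CPU work; the thread cannot migrate while pinned. */
    fn(arg);

    /* Step 4: let the thread run on its original set of CPUs again. */
    return sched_setaffinity(0, sizeof(saved), &saved);
}
```

A caller would pass its own callback in place of record_cpu(), for example run_pinned_to_current_cpu(record_cpu, &cpu) with an int cpu to receive the result.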

 

--- Mike Frysinger                                           2008-06-26 00:26:51

part (3) would have to be done with the kernel.  the toolchain asks the kernel

"what is the currently valid scratch pad address".  this is tracked as

[#2566] already ...

 

--- Sonic Zhang                                              2008-07-08 23:37:46

Duplicate of bug [#2566]

 

--- Jean-Christian de Rivaz                                  2008-08-11 08:44:48

Can you test whether the patch below for the toolchain solves the problem?

diff --git a/uClibc/libc/sysdeps/linux/bfin/bfin_l1layout.h b/uClibc/libc/sysdeps/linux/bfin/bfin_l1layout.h
index 00efd23..ce7f565 100644
--- a/uClibc/libc/sysdeps/linux/bfin/bfin_l1layout.h
+++ b/uClibc/libc/sysdeps/linux/bfin/bfin_l1layout.h
@@ -1,4 +1,4 @@
-#define L1_SCRATCH_START       0xFFB00000
+#define L1_SCRATCH_START       0xFEB00010
 
 /* Data that is "mapped" into the process VM at the start of the L1 scratch
    memory, so that each process can access it at a fixed address.  Used for

 

--- Vivi Li                                                  2008-10-20 05:06:49

I don't see the CPLB miss error after switching to the latest toolchain.

But the pthread test results are still not right.

Sometimes a test gives the correct result, and sometimes it prints "CPU = 1"
or some other unexpected output.

Below is the log for each test case:

--

root:/> ex1

CPU = 1

root:/> ex2

CPU = 1

root:/> ex3

bearchinC PoU t=e n1

r = 4337...

root:/> ex4

keread 4C0P Ul o=a e1

   0

Thread 400: allocating buffer at 0x3297148

root:/> ex5

CPU = 1

root:/> ex6

CPU = 1

root:/> ex7

CPU = 1

root:/> ptest

PASS

 

 

root:/> ex1

Starting process a

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa

aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaacreate a succeeded

aaaaaaaaaaaaaaaaaaaaaaaaaaaa

root:/>    (right result)

 

root:/> ex2

0 --->

1 --->

2 --->

3 --->

4 --->

5 --->

6 --->

C P-U>

       =  -1

9 --->

10 --->

11 --->

12 --->

13 --->

14 --->

15 --->

root:/>        (wrong result)

 

root:/> ex3

Searching for the number = 192...

CPU = 1        (wrong result)

 

root:/> ex4

Thread 400: allocated key 0

thrCPdU4 0= a l1c

ng buffer at 0x5c8148

Thread 402: allocating buffer at 0x5c85d0

Thread 402: "Result of first thread"

Thread 402: freeing buffer at 0x5c85d0    (wrong result)

 

root:/> ex5

0 --->

1 --->

2 --->

3 --->

4 --->

5 --->C

-71    P U -=

>

8 --->

9 --->

10 --->

11 --->

12 --->

13 --->

14 --->

15 --->            (wrong result)

 

root:/> ex6

count = 0

count = 1

count = 2

count = 3

count = 4

count = 5

count = 6

count = 7

count = 8

count = 9

count = 10

count = 11

(...)

count = 1990

count = 1991

count = 1992

count = 1993

count = 1994

count = 1995

count = 1996

count = 1997

count = 1998

count = 1999        (right result)

 

root:/> ex7

waiting 0 ms ...

count = 0

waiting 100 ms ...

count = 1

waiting 200 ms ...

count = 2

(...)

count = 18

waiting 1900 ms ...

count = 19        (right result)

 

root:/> ptest

PASS

root:/>

--

 

--- Sonic Zhang                                              2008-10-20 05:36:45

Need to call sched_setaffinity() at the beginning of main().
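
As a rough sketch of that workaround, a test could pin itself to one CPU before creating any threads. The helper name pin_to_cpu() is hypothetical, and the sketch assumes the Linux/glibc-style sched_setaffinity() interface and that CPU 0 corresponds to CoreA, which the thread does not state explicitly.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* needed for sched_setaffinity() and CPU_* macros */
#endif
#include <assert.h>
#include <sched.h>

/* Hypothetical helper: restrict the calling thread to a single CPU.
 * Returns 0 on success, -1 on failure (errno is set by the kernel). */
static int pin_to_cpu(int cpu)
{
    cpu_set_t mask;

    /* CPU_SET silently ignores out-of-range values, so check first. */
    assert(cpu >= 0 && cpu < CPU_SETSIZE);

    CPU_ZERO(&mask);
    CPU_SET(cpu, &mask);

    /* pid 0 means "the calling thread". */
    return sched_setaffinity(0, sizeof(mask), &mask);
}
```

Calling pin_to_cpu(0) as the first statement of main() would have the effect described here; threads created afterwards inherit the affinity mask of the thread that creates them.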

 

--- Mike Frysinger                                           2008-10-20 05:40:55

really we should just fix the smp system already instead of hacking every single

one of our tests with changes that grace/vivi will have to simply remove once

the kernel is fixed

 

--- Vivi Li                                                  2008-10-22 06:00:05

The CPLB miss error happens again.

 

--

root:/bin> ./ex1
Data access CPLB miss
- Used by the MMU to signal a CPLB miss on a data access.
Deferred Exception context
CURRENT PROCESS:
COMM=ex1 PID=134
CPU = 1
TEXT = 0x03380040-0x0338eba0        DATA = 0x0338eba4-0x03394db4
BSS = 0x03394db4-0x03397444  USER-STACK = 0x03398f8c

return address: [0x03380b8a]; contents of:
0x03380b60:  3041  e300  1277  2fd7  05e3  e14a  0339  e800
0x03380b70:  0073  e10a  6f08  b278  e149  ffb0  9110  e14a
0x03380b80:  0339  e109  0000  e10a  6f0c [9308] 9110  b048
0x03380b90:  e14a  0339  e140  0339  e10a  4844  e100  488c

SEQUENCER STATUS:               Not tainted
SEQSTAT: 00060026  IPEND: 0030  SYSCFG: 0006
  EXCAUSE   : 0x26
RETE: <0x00000000> /* Maybe null pointer? */
RETN: <0x032ee000> /* kernel dynamic memory */
RETX: <0x00000480> /* Maybe fixed code section */
RETS: <0x033852bc> [ ex1 + 0x527c ]
PC  : <0x03380b8a> [ ex1 + 0xb4a ]
DCPLB_FAULT_ADDR: <0xffb00000> /* kernel dynamic memory */
ICPLB_FAULT_ADDR: <0x03380b8a> [ ex1 + 0xb4a ]

PROCESSOR STATE:
R0 : 032ea004    R1 : 032ebfe4    R2 : 00000f00    R3 : 00000001
R4 : 03380184    R5 : 00000001    R6 : 03396f08    R7 : 03396f0c
P0 : 03380b68    P1 : ffb00000    P2 : 03396f0c    P3 : 03396f00
P4 : 03394a28    P5 : 03397168    FP : 032ebfb4    SP : 032edf24
LB0: 03386739    LT0: 03386738    LC0: 00000000
LB1: 004123a5    LT1: 0041239e    LC1: 00000000
B0 : 00000000    L0 : 00000000    M0 : 00000000    I0 : 03398e6c
B1 : 00000000    L1 : 00000000    M1 : 00000000    I1 : 00000000
B2 : 00000000    L2 : 00000000    M2 : 00000000    I2 : 00000000
B3 : 00000000    L3 : 00000000    M3 : 00000000    I3 : 00000000
A0.w: 00000000   A0.x: 00000000   A1.w: 00000000   A1.x: 00000000
USP : 032ebde8  ASTAT: 02003025

No trace since you do not have CONFIG_DEBUG_BFIN_NO_KERN_HWTRACE enabled

Userspace Stack
Stack info:
SP: [0x032ebde8] <0x032ebde8> [ ex1 + 0x1de8 ]
FP: (0x032ebf00)
Memory from 0x032ebde0 to 032ec000
032ebde0: 005b3ca0  00000000 [00000007] 00000000  0338ec0c  005b3ef0  005b3ca0  032ebe80
032ebe00: 005c3e04  0001eb42  032587f8  3f3f3f3f  00000000  03380980  03380980  0001ec34
032ebe20: 005b3ca0  00000007  0378961c  00000000  0000ffbf  0010c934  000042c2 <000010be>
032ebe40: 00000007  0001f32a  00000000  0338ec00  005c3e04  0000489e  00004866  032ebf24
032ebe60: 00000007  00000000  00030001  03380968  037a62bc  037a62bc  00000000  037a6040
032ebe80: 00000007  00000000  00030001  03380968  037a62bc  037a62bc  00000000  037a6040
032ebea0: 00017bcc  032ebe24  000000ac  ffffff54  00000010  00000000  032ebed0  00000001
032ebec0: 00000000  00000000  00048404  00047ba2  00010b98  032ebed0 <033a65d8> 00017fcc
032ebee0: 032ebf04  00011236  005b3ca0  0057aaa0  0017e520  005c3e04  0000a6ae  032ea000
032ebf00:(00000000)<000010be> 00000002  00000030  0338013c  0000fffe  00000001  00000000
032ebf20: 03380968  03380968  03380968  00060026  00000000  032ec000  00000480  03380968
032ebf40:<033852bc> 0000a8e0  02002020  03380f99  03386739  03380f98  03386738  00000000
032ebf60: 00000000  00000000  00000000  00000000  00000000  00000000  00000000  00000000
032ebf80: 00000000  00000000  00000000  00000000  00000000  00000000  00000000  00000000
032ebfa0: 00000000  00000000  00000000  00000000  03273eac  00000000 <033852bc> 03397168
032ebfc0: 03394a28  03396f00  03396f0c  03396f08  00000001  03380184  00000003  0338013c
032ebfe0: 0338ec0c  005c4004  00000f21  0339065c  005c0004  00002000  03394d24  032deff4
032ec000: 03281980
Return addresses in stack:
    address : <0x000010be> { _init_post + 0xa6 }
    address : <0x033a65d8> /* kernel dynamic memory */
   frame  1 : <0x000010be> { _init_post + 0xa6 }
    address : <0x033852bc> [ ex1 + 0x527c ]
    address : <0x033852bc> [ ex1 + 0x527c ]
root:/bin>

--

 

--- Graf Yang                                                2008-10-22 22:58:57

"CPU = 1" is printed by the kernel. The kernel also wrote the full dump
message to /var/log/message, but it was not displayed on the console, which is
why the output looks strange.

 

--- Robin Getz                                               2008-10-23 07:39:43

I agree with Mike - We should not be changing test cases to make them work on

our version of "SMP".

 

 

--- Graf Yang                                                2008-10-23 22:56:51

This issue is the same as toolchain bug [#2566].

toolchain/uClibc/libc/sysdeps/linux/bfin/bfin_l1layout.h hardcodes
L1_SCRATCH_START as 0xFFB00000. Because the BF561 CoreB scratchpad is at
0xFF700000, a CPLB miss error occurs when pthread writes stack info to
L1_SCRATCH_START while running on CoreB.
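
To make the address mismatch concrete, here is a small C sketch of a per-core lookup. It is not the fix that was applied: the function l1_scratch_start_for_cpu() and the two macro names are hypothetical, and the mapping of CPU 0 to CoreA and CPU 1 to CoreB is an assumption (it is consistent with the "CPU = 1" dump above faulting on 0xffb00000). The two addresses are the ones given in this report.

```c
#include <assert.h>
#include <stdint.h>

/* L1 scratchpad base addresses on BF561, as stated in this bug report. */
#define COREA_L1_SCRATCH_START 0xFFB00000u
#define COREB_L1_SCRATCH_START 0xFF700000u

/* Hypothetical helper: return the scratchpad base for the given CPU.
 * Assumes CPU 0 is CoreA and CPU 1 is CoreB. */
static uint32_t l1_scratch_start_for_cpu(int cpu)
{
    assert(cpu == 0 || cpu == 1);
    return cpu == 0 ? COREA_L1_SCRATCH_START : COREB_L1_SCRATCH_START;
}
```

With a single hardcoded L1_SCRATCH_START, code running on CPU 1 uses the value this function returns for CPU 0, which is the address reported in DCPLB_FAULT_ADDR.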

 

 

--- Vivi Li                                                  2008-10-24 06:13:30

We now use taskset to force the test to run on CoreA, the same as in bug [#4540].

It works now.

 

 

 
