[#5974] CPLB fault or SIGABRT when throwing&catching C++ exception in static FDPIC ELF

Document created by Aaronwu Employee on Oct 17, 2013

Submitted By: Kolja Waschk
Open Date: 2010-03-17 04:02:52
Close Date: 2013-05-24 01:37:27
Priority: Medium High
Assignee: Steve Kilbane
Board: Custom
Silicon Revision: 0.3
Resolution: Fixed
Fixed In Release: N/A
Processor: BF537
Host Operating System: Ubuntu 9.10 (32 bit)
toolchain rev.: 2009R1, e.g. SVN4012
kernel rev.:
State: Closed
Found In Release: 2009R1.1_RC2
Is this bug repeatable?: yes

Summary: CPLB fault or SIGABRT when throwing&catching C++ exception in static FDPIC ELF

Details:

 

With toolchain 2009R1.1-RC2, with earlier versions (at least 2009R1-RC10), and with a newer 9r1 toolchain built from SVN (4012), using either gcc 4.1 or gcc 4.3, a simple application crashes when throwing a C++ exception. Sometimes it crashes with a Data access CPLB miss, both when started standalone and when started via gdbserver. When it does not hit the CPLB miss, a run under gdbserver always ends in SIGABRT.

 

Steps to reproduce:

 

1. Code test_bfin_ex.cc:

 

int main()
{
  try { throw "foo"; }
  catch (...) { };
}

 

2. Compile with -static (it didn't happen without -static)

 

bfin-linux-uclibc-g++ -static -o test_bfin_ex test_bfin_ex.cc

 

3. Run

 

There are two possible results on my test system. First is a Data access CPLB miss. The console shows

 

Data access CPLB miss

- Used by the MMU to signal a CPLB miss on a data access.

Deferred Exception context

CURRENT PROCESS:

COMM=test_bfin_ex PID=227

CPU = 0

TEXT = 0x01a00000-0x01a0bea8        DATA = 0x01978ea8-0x0197f010

BSS = 0x0197f010-0x01a20000  USER-STACK = 0x01a3fe70

 

return address: [0x01a07cb6]; contents of:

0x01a07c90:  fe63  0e18  1c19  63f8  b9f1  5408  0c00  1c08

0x01a07ca0:  b9c3  640b  9120  bbc3  0981  1402  9321  0000

0x01a07cb0:  916a  6c22  5b55 [916a] 0c42  17b1  b9c0  e801

0x01a07cc0:  0000  05a4  0010  4f18  6009  4081  3001  67f8

 

ADSP-BF537-0.3 533(MHz CCLK) 133(MHz SCLK) (mpu off)

Linux version 2.6.28.10-ADI-2009R1.1-svn8523

Built with gcc version 4.1.2 (ADI svn)

 

SEQUENCER STATUS:        Not tainted

SEQSTAT: 00000026  IPEND: 0030  SYSCFG: 0006

  EXCAUSE   : 0x26

  interrupts disabled

  physical IVG5 asserted : <0xffa00c48> { _evt_ivhw + 0x0 }

RETE: <0x00000000> /* Maybe null pointer? */

RETN: <0x01970000> [ gdbserver + 0x0 ]

RETX: <0x00000480> /* Maybe fixed code section */

RETS: <0x01a07c92> [ /tmp/test_bfin_ex + 0x7c92 ]

PC  : <0x01a07cb6> [ /tmp/test_bfin_ex + 0x7cb6 ]

DCPLB_FAULT_ADDR: <0x2f9c2f0c> /* kernel dynamic memory */

ICPLB_FAULT_ADDR: <0x01a07cb6> [ /tmp/test_bfin_ex + 0x7cb6 ]

 

PROCESSOR STATE:

R0 : 01a003a0    R1 : 001f2605    R2 : 001f2605    R3 : 00000003

R4 : 0197aa38    R5 : 01992ded    R6 : 00000000    R7 : 01992ded

P0 : 01992f14    P1 : 00000034    P2 : 2e030004    P3 : 0197aa38

P4 : 0197ad30    P5 : 2f9c2f0c    FP : 01a3f5fc    SP : 0196ff24

LB0: 01a08c8d    LT0: 01a08c8c    LC0: 00000000

LB1: 01a0023d    LT1: 01a00234    LC1: 00000000

B0 : 00000000    L0 : 00000000    M0 : 00000000    I0 : 00000000

B1 : 00000000    L1 : 00000000    M1 : 00000000    I1 : 00000000

B2 : 00000000    L2 : 00000000    M2 : 00000000    I2 : 00000000

B3 : 00000000    L3 : 00000000    M3 : 00000000    I3 : 00000000

A0.w: 00000000   A0.x: 00000000   A1.w: 00000000   A1.x: 00000000

USP : 01a3f5d8  ASTAT: 02002022

 

Hardware Trace:

   0 Target : <0x00004cf4> { _trap_c + 0x0 }

     Source : <0xffa00642> { _exception_to_level5 + 0xae }

   1 Target : <0xffa00594> { _exception_to_level5 + 0x0 }

     Source : <0xffa00450> { _bfin_return_from_exception + 0x18 }

   2 Target : <0xffa00438> { _bfin_return_from_exception + 0x0 }

     Source : <0xffa004ec> { _ex_trap_c + 0x6c }

   3 Target : <0xffa00370> { _ex_dcplb_miss + 0x0 }

     Source : <0xffa0070c> { _trap + 0x58 }

   4 Target : <0xffa006b4> { _trap + 0x0 }

     Source : <0x01a07cb4> [ /tmp/test_bfin_ex + 0x7cb4 ] 0x5b55

   5 Target : <0x01a07c92> [ /tmp/test_bfin_ex + 0x7c92 ]

     Source : <0x01a07986> [ /tmp/test_bfin_ex + 0x7986 ] RTS

   6 Target : <0x01a0797c> [ /tmp/test_bfin_ex + 0x797c ]

     Source : <0x01a0796c> [ /tmp/test_bfin_ex + 0x796c ] IF CC JUMP

   7 Target : <0x01a07954> [ /tmp/test_bfin_ex + 0x7954 ]

     Source : <0x01a07c8e> [ /tmp/test_bfin_ex + 0x7c8e ] CALL pcrel

   8 Target : <0x01a07c8a> [ /tmp/test_bfin_ex + 0x7c8a ]

     Source : <0x01a07aa8> [ /tmp/test_bfin_ex + 0x7aa8 ] RTS

   9 Target : <0x01a07a9a> [ /tmp/test_bfin_ex + 0x7a9a ]

     Source : <0x01a07a94> [ /tmp/test_bfin_ex + 0x7a94 ] IF CC JUMP

  10 Target : <0x01a07a60> [ /tmp/test_bfin_ex + 0x7a60 ]

     Source : <0x01a07a2a> [ /tmp/test_bfin_ex + 0x7a2a ] JUMP (P2)

  11 Target : <0x01a07a1e> [ /tmp/test_bfin_ex + 0x7a1e ]

     Source : <0x01a07a18> [ /tmp/test_bfin_ex + 0x7a18 ] IF CC JUMP

  12 Target : <0x01a079f8> [ /tmp/test_bfin_ex + 0x79f8 ]

     Source : <0x01a07c86> [ /tmp/test_bfin_ex + 0x7c86 ] CALL pcrel

  13 Target : <0x01a07c76> [ /tmp/test_bfin_ex + 0x7c76 ]

     Source : <0x01a07c64> [ /tmp/test_bfin_ex + 0x7c64 ] IF !CC JUMP

  14 Target : <0x01a07c48> [ /tmp/test_bfin_ex + 0x7c48 ]

     Source : <0x01a079da> [ /tmp/test_bfin_ex + 0x79da ] RTS

  15 Target : <0x01a079d0> [ /tmp/test_bfin_ex + 0x79d0 ]

     Source : <0x01a079bc> [ /tmp/test_bfin_ex + 0x79bc ] IF CC JUMP

 

 

 

In the CPLB miss case, the debugger bfin-linux-uclibc-gdb reports a SIGBUS. It always occurs at the same place:

 

Program received signal SIGBUS, Bus error.

0x01a57b56 in get_cie_encoding (cie=0x2c8500d) at /opt/bfin9r1/gcc-4.3/libgcc/../gcc/unwind-dw2-fde.c:273

273      if (aug[0] != 'z')

Current language:  auto; currently c

(gdb) backtrace

#0  0x01a57b56 in get_cie_encoding (cie=0x2c8500d) at /opt/bfin9r1/gcc-4.3/libgcc/../gcc/unwind-dw2-fde.c:273

#1  0x01a57c38 in classify_object_over_fdes (ob=0x1a3ad30, this_fde=0x1a53408) at /opt/bfin9r1/gcc-4.3/libgcc/../gcc/unwind-dw2-fde.c:620

#2  0x01a582ae in search_object (ob=0x1a3ad30, pc=0x1a575a7) at /opt/bfin9r1/gcc-4.3/libgcc/../gcc/unwind-dw2-fde.c:731

#3  0x01a5854c in _Unwind_Find_FDE (pc=0x1a575a7, bases=0x1a7fd04) at /opt/bfin9r1/gcc-4.3/libgcc/../gcc/unwind-dw2-fde.c:994

#4  0x01a568f4 in uw_frame_state_for (context=0x1a7fc2c, fs=0x1a7f740) at /opt/bfin9r1/gcc-4.3/libgcc/../gcc/unwind-dw2.c:1129

#5  0x01a570ee in uw_init_context_1 (context=0x1a7fc2c, outer_cfa=0x1a7fd78, outer_ra=0x1a51464) at /opt/bfin9r1/gcc-4.3/libgcc/../gcc/unwind-dw2.c:1435

#6  0x01a575a8 in _Unwind_RaiseException (exc=0x196d174) at /opt/bfin9r1/gcc-4.3/libgcc/../gcc/unwind.inc:93

#7  0x01a51464 in __cxa_throw (obj=0x1a38ef0, tinfo=<value optimized out>, dest=0x1a5044c <~__scoped_lock>) at /opt/bfin9r1/gcc-4.3/libstdc++-v3/libsupc++/eh_throw.cc:71

#8  0x01a503cc in main () at test_bfin_ex.cc:3

 

 

 

 

The other possible outcome is a SIGABRT reported by the debugger, and no data access CPLB miss.

 

Program received signal SIGABRT, Aborted.

0x0197a376 in *___GI_kill (pid=0, sig=6) at libc/sysdeps/linux/common/kill.c:16

16    static inline _syscall2(int, __syscall_kill, __kernel_pid_t, pid, int, sig);

Current language:  auto; currently c

(gdb) backtrace

#0  0x0197a376 in *___GI_kill (pid=0, sig=6) at libc/sysdeps/linux/common/kill.c:16

#1  0x019794de in *___GI_abort () at libc/stdlib/abort.c:85

#2  0x01977132 in uw_init_context_1 (context=0x1a7fc2c, outer_cfa=0x1a7fd78, outer_ra=0x1971464) at /opt/bfin9r1/gcc-4.3/libgcc/../gcc/unwind-dw2.c:1257

#3  0x019775a8 in _Unwind_RaiseException (exc=0x1a03174) at /opt/bfin9r1/gcc-4.3/libgcc/../gcc/unwind.inc:93

#4  0x01971464 in __cxa_throw (obj=0x25, tinfo=<value optimized out>, dest=<error reading variable>) at /opt/bfin9r1/gcc-4.3/libstdc++-v3/libsupc++/eh_throw.cc:71

#5  0x019703cc in main () at test_bfin_ex.cc:3

 

 

 

Single-stepping through unwind-dw2-fde.c suggests that there might be problems with CIE offsets or similar, but I'm not yet familiar enough with DWARF2 and its unwinding mechanisms to give a proper diagnosis.

 

Follow-ups

 

--- Kolja Waschk                                             2010-03-17 05:59:24

A note about the kernel used: It is the linux-2.6.x as it comes with

uClinux-dist 2009R1.1-RC4, with Xenomai/Adeos patch applied. Therefore it

reports "Linux version 2.6.28.10-ADI-2009R1.1-svn8523" as its version.

The problem appears independent of Xenomai/Adeos presence, though.

 

--- Kolja Waschk                                             2010-04-08 03:54:28

New observations: Stopping on SIGABRT seems to be an acceptable debugger reaction to

uncaught exceptions. The CPLB fault, however, definitely is not, and it turns out

that it also happens with dynamically linked applications, just less often.

 

--- Robin Getz                                               2010-04-20 08:16:03

Kolja:

 

Rather than a snippet - can you attach a file which is 100% compilable, which

shows the error? Thanks

 

--- Kolja Waschk                                             2010-04-20 14:49:57

Hi Robin, it looks ridiculous, but the above "snippet" is 100%

compilable. For convenience, I attach it as a separate file. Kolja

 

--- David Gibson                                             2010-05-19 09:16:45

This looks like it could be related to an old issue in the trunk, which should

be resolved in the current toolchain.

 

A good summary of the original problem can be found here

(https://bugzilla.redhat.com/show_bug.cgi?id=199788).

 

The problem relates to the alignment of eh_frame entries. It reports the same

stack trace that you encountered for the case where you crash in

get_cie_encoding. It's possible that this is a new and related issue as the

eh_frame size can be platform specific.

 

There are workarounds suggested. The first is to avoid the use of -static.

The second (and more practical) is to do the following:

 

Create an empty void function in a file, say eh_frame.c:

 

    void eh_dummy_func(void) { }

 

And ensure that this is linked into your application AFTER libc.

For example:

 

    gcc test.c -lc eh_frame.c

 

I have tested this with the example code that you provided and confirm that it

resolves the issue, and in more complex examples it allows exceptions to be

correctly thrown and caught.

 

In the meantime, I will continue to investigate the issue and locate the cause

of the crash.

 

--- Steve Kilbane                                            2010-08-18 06:45:16

This turns out to be a problem with relocating, but I haven't yet worked out

*where* the problem is.

 

In our test example, the eh_frame info from libstdc++.a(pointer_type_info.o) is

being added to the executable, to produce a CIE and eight FDEs. In the

executable, they look okay - the fourth FDE has a length of 24.

 

(for those playing along at home: this is the FDE for

__cxxabiv1::__pointer_type_info::__is_pointer_p().)

 

When the executable starts up, __self_reloc runs, and in the process, it

changes the length value of the fourth FDE from 24 to - well, to a largely

random number, as it happens. Later on, when the exception routines walk the

FDE list looking for a function that matches the current return address on the

stack, they use the length field to get the start of the next FDE in the chain,

and head off into the weeds.
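
To make the failure mode concrete, here is a minimal C++ sketch of a walk over .eh_frame-style records. It is not the libgcc code: the function names, the bounds check, and the test-data helper are assumptions made for illustration. Each record starts with a 4-byte length that counts the bytes after the length field, and a zero length ends the list, so a single corrupted length moves the walk to an arbitrary position.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Walk a buffer laid out like .eh_frame: each record starts with a 4-byte
// length that counts the bytes following the length field itself, and a
// zero length terminates the list. Returns the number of records visited,
// or -1 if a length field would carry the walk outside the buffer.
// (The 64-bit extended-length form, 0xffffffff, is simply treated as
// out of range here to keep the sketch short.)
int count_eh_records(const std::vector<std::uint8_t>& buf) {
    std::size_t pos = 0;
    int count = 0;
    while (pos + 4 <= buf.size()) {
        std::uint32_t length;
        std::memcpy(&length, &buf[pos], sizeof length);  // host byte order
        if (length == 0) return count;                   // terminator record
        if (length > buf.size() - pos - 4) return -1;    // would leave the buffer
        pos += 4 + length;                               // start of next record
        ++count;
    }
    return count;  // ran out of data without seeing a terminator
}

// Hypothetical helper for building test data: append one record with the
// given length field followed by that many zero bytes of payload.
void append_record(std::vector<std::uint8_t>& buf, std::uint32_t length) {
    std::uint8_t raw[4];
    std::memcpy(raw, &length, sizeof raw);
    buf.insert(buf.end(), raw, raw + 4);
    buf.insert(buf.end(), length, 0);
}
```

With records of length 16, 24 and 24 followed by a terminator, the walk visits three records. Overwriting the second length field with a large value makes the next step point outside the buffer, which this sketch reports as -1 instead of dereferencing it.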

 

What I haven't yet worked out is whether this is a problem in:

- the compiler output for pointer_type_info.cc

- the linker's output in producing the executable (and instantiating multiple

destructors)

- __self_reloc's idea of what needs relocating

- something else.

 

Investigations are currently hampered by the inability to just recompile

pointer_type_info.cc, since the build script sets up a bunch of links for the

various files in the bits directory under the uClibc tree, and then clears them

out again later.

 

--- Steve Kilbane                                            2010-08-20 08:55:25

Aha. The linker's trying to create a relocation for the personality function in

the CIE of eh_alloc.o. However, that CIE comes from a section that's been marked

as removed during the link. When we ask for the section's offset, we get -1 back

to indicate that it's been removed, but we weren't checking that, and were instead

using it as a valid section offset. That generated a reloc that updated a

random, misaligned address.
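
The missing check can be sketched in a few lines of C++. This is only an illustration of the pattern, not the actual linker code: kSectionRemoved, reloc_target and reloc_target_unchecked are hypothetical names, and 32-bit addresses are assumed because the target is a Blackfin.

```cpp
#include <cstdint>

// Hypothetical sentinel: the value a section-offset lookup returns when the
// containing section was dropped from the link (the "-1" mentioned above).
const std::uint32_t kSectionRemoved = static_cast<std::uint32_t>(-1);

// Hypothetical helper: compute the address a relocation should patch.
// Returns false, leaving *target untouched, if the section was removed,
// so the caller can skip emitting a relocation altogether.
bool reloc_target(std::uint32_t output_section_addr,
                  std::uint32_t offset_in_section,
                  std::uint32_t* target) {
    if (offset_in_section == kSectionRemoved) {
        return false;  // section discarded: nothing to relocate
    }
    *target = output_section_addr + offset_in_section;
    return true;
}

// The unchecked variant, shown for contrast. With the sentinel as input the
// unsigned sum wraps around and lands one byte below the section start.
std::uint32_t reloc_target_unchecked(std::uint32_t output_section_addr,
                                     std::uint32_t offset_in_section) {
    return output_section_addr + offset_in_section;
}
```

For a section at 0x1000, the unchecked variant turns the sentinel into 0x0fff, an odd address that has nothing to do with the data that was meant to be patched. The checked variant refuses to produce a target at all.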

 

--- Mingquan Pan                                             2013-05-24 02:47:51

close.

 

 

 


 

File Name     File Type     File Size     Posted By

test_bfin_ex.cc    text/x-c++src    58    Kolja Waschk
