[#5974] CPLB fault or SIGABRT when throwing&catching C++ exception in static FDPIC ELF
Submitted By: Kolja Waschk
Open Date: 2010-03-17 04:02:52
Close Date: 2013-05-24 01:37:27
Priority: Medium High
Assignee: Steve Kilbane
Board: Custom
Silicon Revision: 0.3
Resolution: Fixed
Fixed In Release: N/A
Processor: BF537
Host Operating System: Ubuntu 9.10 (32 bit)
toolchain rev.: 2009R1, e.g. SVN4012
kernel rev.:
State: Closed
Found In Release: 2009R1.1_RC2
Is this bug repeatable?: yes
Summary: CPLB fault or SIGABRT when throwing&catching C++ exception in static FDPIC ELF
Details:
With toolchain 2009R1.1-RC2, with earlier versions (at least 2009R1-RC10), and with a newer 9r1 toolchain built from SVN (4012), using either gcc 4.1 or gcc 4.3, a simple application crashes when it throws a C++ exception. Sometimes the crash is a Data access CPLB miss, which happens both when the program is started standalone and when it is started via gdbserver. In all other runs under gdbserver the result is a SIGABRT.
Steps to reproduce:
1. Code test_bfin_ex.cc:
int main()
{
try { throw "foo"; }
catch (...) { };
}
2. Compile with -static (it didn't happen without -static)
bfin-linux-uclibc-g++ -static -o test_bfin_ex test_bfin_ex.cc
3. Run
There are two possible results on my test system. First is a Data access CPLB miss. The console shows
Data access CPLB miss
- Used by the MMU to signal a CPLB miss on a data access.
Deferred Exception context
CURRENT PROCESS:
COMM=test_bfin_ex PID=227
CPU = 0
TEXT = 0x01a00000-0x01a0bea8 DATA = 0x01978ea8-0x0197f010
BSS = 0x0197f010-0x01a20000 USER-STACK = 0x01a3fe70
return address: [0x01a07cb6]; contents of:
0x01a07c90: fe63 0e18 1c19 63f8 b9f1 5408 0c00 1c08
0x01a07ca0: b9c3 640b 9120 bbc3 0981 1402 9321 0000
0x01a07cb0: 916a 6c22 5b55 [916a] 0c42 17b1 b9c0 e801
0x01a07cc0: 0000 05a4 0010 4f18 6009 4081 3001 67f8
ADSP-BF537-0.3 533(MHz CCLK) 133(MHz SCLK) (mpu off)
Linux version 2.6.28.10-ADI-2009R1.1-svn8523
Built with gcc version 4.1.2 (ADI svn)
SEQUENCER STATUS: Not tainted
SEQSTAT: 00000026 IPEND: 0030 SYSCFG: 0006
EXCAUSE : 0x26
interrupts disabled
physical IVG5 asserted : <0xffa00c48> { _evt_ivhw + 0x0 }
RETE: <0x00000000> /* Maybe null pointer? */
RETN: <0x01970000> [ gdbserver + 0x0 ]
RETX: <0x00000480> /* Maybe fixed code section */
RETS: <0x01a07c92> [ /tmp/test_bfin_ex + 0x7c92 ]
PC : <0x01a07cb6> [ /tmp/test_bfin_ex + 0x7cb6 ]
DCPLB_FAULT_ADDR: <0x2f9c2f0c> /* kernel dynamic memory */
ICPLB_FAULT_ADDR: <0x01a07cb6> [ /tmp/test_bfin_ex + 0x7cb6 ]
PROCESSOR STATE:
R0 : 01a003a0 R1 : 001f2605 R2 : 001f2605 R3 : 00000003
R4 : 0197aa38 R5 : 01992ded R6 : 00000000 R7 : 01992ded
P0 : 01992f14 P1 : 00000034 P2 : 2e030004 P3 : 0197aa38
P4 : 0197ad30 P5 : 2f9c2f0c FP : 01a3f5fc SP : 0196ff24
LB0: 01a08c8d LT0: 01a08c8c LC0: 00000000
LB1: 01a0023d LT1: 01a00234 LC1: 00000000
B0 : 00000000 L0 : 00000000 M0 : 00000000 I0 : 00000000
B1 : 00000000 L1 : 00000000 M1 : 00000000 I1 : 00000000
B2 : 00000000 L2 : 00000000 M2 : 00000000 I2 : 00000000
B3 : 00000000 L3 : 00000000 M3 : 00000000 I3 : 00000000
A0.w: 00000000 A0.x: 00000000 A1.w: 00000000 A1.x: 00000000
USP : 01a3f5d8 ASTAT: 02002022
Hardware Trace:
0 Target : <0x00004cf4> { _trap_c + 0x0 }
Source : <0xffa00642> { _exception_to_level5 + 0xae }
1 Target : <0xffa00594> { _exception_to_level5 + 0x0 }
Source : <0xffa00450> { _bfin_return_from_exception + 0x18 }
2 Target : <0xffa00438> { _bfin_return_from_exception + 0x0 }
Source : <0xffa004ec> { _ex_trap_c + 0x6c }
3 Target : <0xffa00370> { _ex_dcplb_miss + 0x0 }
Source : <0xffa0070c> { _trap + 0x58 }
4 Target : <0xffa006b4> { _trap + 0x0 }
Source : <0x01a07cb4> [ /tmp/test_bfin_ex + 0x7cb4 ] 0x5b55
5 Target : <0x01a07c92> [ /tmp/test_bfin_ex + 0x7c92 ]
Source : <0x01a07986> [ /tmp/test_bfin_ex + 0x7986 ] RTS
6 Target : <0x01a0797c> [ /tmp/test_bfin_ex + 0x797c ]
Source : <0x01a0796c> [ /tmp/test_bfin_ex + 0x796c ] IF CC JUMP
7 Target : <0x01a07954> [ /tmp/test_bfin_ex + 0x7954 ]
Source : <0x01a07c8e> [ /tmp/test_bfin_ex + 0x7c8e ] CALL pcrel
8 Target : <0x01a07c8a> [ /tmp/test_bfin_ex + 0x7c8a ]
Source : <0x01a07aa8> [ /tmp/test_bfin_ex + 0x7aa8 ] RTS
9 Target : <0x01a07a9a> [ /tmp/test_bfin_ex + 0x7a9a ]
Source : <0x01a07a94> [ /tmp/test_bfin_ex + 0x7a94 ] IF CC JUMP
10 Target : <0x01a07a60> [ /tmp/test_bfin_ex + 0x7a60 ]
Source : <0x01a07a2a> [ /tmp/test_bfin_ex + 0x7a2a ] JUMP (P2)
11 Target : <0x01a07a1e> [ /tmp/test_bfin_ex + 0x7a1e ]
Source : <0x01a07a18> [ /tmp/test_bfin_ex + 0x7a18 ] IF CC JUMP
12 Target : <0x01a079f8> [ /tmp/test_bfin_ex + 0x79f8 ]
Source : <0x01a07c86> [ /tmp/test_bfin_ex + 0x7c86 ] CALL pcrel
13 Target : <0x01a07c76> [ /tmp/test_bfin_ex + 0x7c76 ]
Source : <0x01a07c64> [ /tmp/test_bfin_ex + 0x7c64 ] IF !CC JUMP
14 Target : <0x01a07c48> [ /tmp/test_bfin_ex + 0x7c48 ]
Source : <0x01a079da> [ /tmp/test_bfin_ex + 0x79da ] RTS
15 Target : <0x01a079d0> [ /tmp/test_bfin_ex + 0x79d0 ]
Source : <0x01a079bc> [ /tmp/test_bfin_ex + 0x79bc ] IF CC JUMP
In the CPLB miss case, the debugger bfin-linux-uclibc-gdb reports a SIGBUS. It always occurs at the same place:
Program received signal SIGBUS, Bus error.
0x01a57b56 in get_cie_encoding (cie=0x2c8500d) at /opt/bfin9r1/gcc-4.3/libgcc/../gcc/unwind-dw2-fde.c:273
273 if (aug[0] != 'z')
Current language: auto; currently c
(gdb) backtrace
#0 0x01a57b56 in get_cie_encoding (cie=0x2c8500d) at /opt/bfin9r1/gcc-4.3/libgcc/../gcc/unwind-dw2-fde.c:273
#1 0x01a57c38 in classify_object_over_fdes (ob=0x1a3ad30, this_fde=0x1a53408) at /opt/bfin9r1/gcc-4.3/libgcc/../gcc/unwind-dw2-fde.c:620
#2 0x01a582ae in search_object (ob=0x1a3ad30, pc=0x1a575a7) at /opt/bfin9r1/gcc-4.3/libgcc/../gcc/unwind-dw2-fde.c:731
#3 0x01a5854c in _Unwind_Find_FDE (pc=0x1a575a7, bases=0x1a7fd04) at /opt/bfin9r1/gcc-4.3/libgcc/../gcc/unwind-dw2-fde.c:994
#4 0x01a568f4 in uw_frame_state_for (context=0x1a7fc2c, fs=0x1a7f740) at /opt/bfin9r1/gcc-4.3/libgcc/../gcc/unwind-dw2.c:1129
#5 0x01a570ee in uw_init_context_1 (context=0x1a7fc2c, outer_cfa=0x1a7fd78, outer_ra=0x1a51464) at /opt/bfin9r1/gcc-4.3/libgcc/../gcc/unwind-dw2.c:1435
#6 0x01a575a8 in _Unwind_RaiseException (exc=0x196d174) at /opt/bfin9r1/gcc-4.3/libgcc/../gcc/unwind.inc:93
#7 0x01a51464 in __cxa_throw (obj=0x1a38ef0, tinfo=<value optimized out>, dest=0x1a5044c <~__scoped_lock>) at /opt/bfin9r1/gcc-4.3/libstdc++-v3/libsupc++/eh_throw.cc:71
#8 0x01a503cc in main () at test_bfin_ex.cc:3
The other possible outcome is a SIGABRT reported by the debugger, and no data access CPLB miss.
Program received signal SIGABRT, Aborted.
0x0197a376 in *___GI_kill (pid=0, sig=6) at libc/sysdeps/linux/common/kill.c:16
16 static inline _syscall2(int, __syscall_kill, __kernel_pid_t, pid, int, sig);
Current language: auto; currently c
(gdb) backtrace
#0 0x0197a376 in *___GI_kill (pid=0, sig=6) at libc/sysdeps/linux/common/kill.c:16
#1 0x019794de in *___GI_abort () at libc/stdlib/abort.c:85
#2 0x01977132 in uw_init_context_1 (context=0x1a7fc2c, outer_cfa=0x1a7fd78, outer_ra=0x1971464) at /opt/bfin9r1/gcc-4.3/libgcc/../gcc/unwind-dw2.c:1257
#3 0x019775a8 in _Unwind_RaiseException (exc=0x1a03174) at /opt/bfin9r1/gcc-4.3/libgcc/../gcc/unwind.inc:93
#4 0x01971464 in __cxa_throw (obj=0x25, tinfo=<value optimized out>, dest=<error reading variable>) at /opt/bfin9r1/gcc-4.3/libstdc++-v3/libsupc++/eh_throw.cc:71
#5 0x019703cc in main () at test_bfin_ex.cc:3
Single-stepping in unwind-dw2-fde.c suggests that there might be problems with CIE offsets or similar, but I'm not yet familiar enough with DWARF2 and its unwinding mechanisms to give a proper diagnosis.
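For anyone re-running the test, a slightly more talkative variant of the reproducer makes it obvious whether the handler was actually reached. This is only a sketch: the function name throw_and_catch and the printed strings are assumptions and are not part of the attached test case.

```cpp
#include <cassert>
#include <cstdio>
#include <cstring>

// Hypothetical helper (not in the original test case): throws the same
// string literal as the reproducer and reports whether a handler ran.
bool throw_and_catch()
{
    std::puts("before throw");
    try {
        throw "foo";
    } catch (const char *msg) {
        // The thrown object should arrive in the handler unchanged.
        assert(std::strcmp(msg, "foo") == 0);
        std::printf("caught: %s\n", msg);
        return true;
    } catch (...) {
        std::puts("caught something unexpected");
        return true;
    }
    return false; // only reached if no handler ran
}
```

Calling throw_and_catch() from main() and using its result as the exit status gives a binary that can be linked with and without -static in the same way as test_bfin_ex.cc. On an affected toolchain it would be expected to die before printing the "caught" line.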
Follow-ups
--- Kolja Waschk 2010-03-17 05:59:24
A note about the kernel used: It is the linux-2.6.x as it comes with
uClinux-dist 2009R1.1-RC4, with Xenomai/Adeos patch applied. Therefore it
reports "Linux version 2.6.28.10-ADI-2009R1.1-svn8523" as its version.
The problem appears independent of Xenomai/Adeos presence, though.
--- Kolja Waschk 2010-04-08 03:54:28
New observations: Stopping on SIGABRT seems to be an acceptable debugger reaction to
uncaught exceptions. The CPLB fault, however, definitely is not, and it now turns out
that it also happens with dynamically linked applications, just less often.
--- Robin Getz 2010-04-20 08:16:03
Kolja:
Rather than a snippet - can you attach a file which is 100% compilable, which
shows the error? Thanks
--- Kolja Waschk 2010-04-20 14:49:57
Hi Robin, it looks ridiculous, but the above "snippet" is 100%
compilable. For convenience, I attach it as a separate file. Kolja
--- David Gibson 2010-05-19 09:16:45
This looks like it could be related to an old issue in the trunk, which should
be resolved in the current toolchain.
A good summary of the original problem can be found here
(https://bugzilla.redhat.com/show_bug.cgi?id=199788).
The problem relates to the alignment of eh_frame entries. It reports the same
stack trace that you encountered for the case where you crash in
get_cie_encoding. It's possible that this is a new and related issue as the
eh_frame size can be platform specific.
There are workarounds suggested. The first is to avoid the use of -static.
The second (and more practical) is to do the following:
Create an empty void function in a file, say eh_frame.c:
void eh_dummy_func(void) { }
And ensure that this is linked into your application AFTER libc.
For example:
gcc test.c -lc eh_frame.c
I have tested this with the example code that you provided and confirm that it
resolves the issue, and in more complex examples it allows exceptions to be
correctly thrown and caught.
In the meantime, I will continue to investigate the issue and locate the cause
of the crash.
--- Steve Kilbane 2010-08-18 06:45:16
This turns out to be a problem with relocating, but I haven't yet worked out
*where* the problem is.
In our test example, the eh_frame info from libstdc++.a(pointer_type_info.o) is
being added to the executable, to produce a CIE and eight FDEs. In the
executable, they look okay - the fourth FDE has a length of 24.
(For those playing along at home: this is the FDE for
__cxxabiv1::__pointer_type_info::__is_pointer_p().)
When the executable starts up, __self_reloc runs, and in the process, it
changes the length value of the fourth FDE from 24 to - well, to a largely
random number, as it happens. Later on, when the exception routines walk the
FDE list looking for a function that matches the current return address on the
stack, they use the length field to get the start of the next FDE in the chain,
and head off into the weeds.
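To illustrate why one bad length field is enough, here is a rough sketch of a length-prefixed record walk of the kind described above. It is not the libgcc code: the names WalkResult, walk_records, make_records and corrupt_length are made up for this example, the records carry dummy payloads instead of real CIE/FDE contents, and the bounds check exists only so the sketch can report the problem instead of faulting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Result of walking a buffer of length-prefixed records.
struct WalkResult {
    std::vector<std::size_t> record_offsets; // start offset of each record seen
    bool ran_off_end;                        // true if a length pointed outside the buffer
};

// Each record is a 32-bit length followed by that many bytes; a zero length
// terminates the list. The next record starts right after the current one.
WalkResult walk_records(const std::vector<unsigned char> &buf)
{
    WalkResult result{{}, false};
    std::size_t offset = 0;
    while (offset + sizeof(std::uint32_t) <= buf.size()) {
        std::uint32_t len;
        std::memcpy(&len, &buf[offset], sizeof len);
        if (len == 0)
            return result; // terminator reached normally
        result.record_offsets.push_back(offset);
        // 64-bit arithmetic so a bogus length cannot wrap around.
        std::uint64_t next = std::uint64_t(offset) + sizeof len + len;
        if (next > buf.size()) {
            result.ran_off_end = true; // a real walker would just keep going
            return result;
        }
        offset = static_cast<std::size_t>(next);
    }
    result.ran_off_end = true; // no terminator found
    return result;
}

// Build a buffer of records with the given payload lengths plus a terminator.
std::vector<unsigned char> make_records(const std::vector<std::uint32_t> &lengths)
{
    std::vector<unsigned char> buf;
    for (std::uint32_t len : lengths) {
        unsigned char raw[sizeof len];
        std::memcpy(raw, &len, sizeof len);
        buf.insert(buf.end(), raw, raw + sizeof len);
        buf.insert(buf.end(), static_cast<std::size_t>(len),
                   static_cast<unsigned char>(0xAA)); // dummy payload
    }
    buf.insert(buf.end(), sizeof(std::uint32_t), static_cast<unsigned char>(0));
    return buf;
}

// Overwrite the length field of the record at `offset`, as a stray
// relocation might.
void corrupt_length(std::vector<unsigned char> &buf, std::size_t offset,
                    std::uint32_t bogus)
{
    assert(offset + sizeof bogus <= buf.size());
    std::memcpy(&buf[offset], &bogus, sizeof bogus);
}
```

With payload lengths {16, 16, 16, 24, 16}, walk_records finds five records and stops at the terminator. After corrupt_length overwrites the fourth record's length with an arbitrary large value, the walk sees four records and then computes a next offset far outside the buffer, which is the point where an unchecked walker would start reading unrelated memory.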
What I haven't yet worked out is whether this is a problem in:
- the compiler output for pointer_type_info.cc
- the linker's output in producing the executable (and instantiating multiple
destructors)
- __self_reloc's idea of what needs relocating
- something else.
Investigations are currently hampered by the inability to just recompile
pointer_type_info.cc, since the build script sets up a bunch of links for the
various files in the bits directory under the uClibc tree, and then clears them
out again later.
--- Steve Kilbane 2010-08-20 08:55:25
Aha. The linker's trying to create a relocation for the personality function in
the CIE of eh_alloc.o. However, that CIE comes from a section that's been marked
as removed during the link. When we ask for the section's offset, we get -1 back
to indicate that it's been removed, but we weren't checking that, and were instead
using it as a valid section offset. That generated a reloc that updated a
random, misaligned address.
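The shape of that mistake can be sketched as follows. This is not the linker source: InputEntry, output_offset_for, reloc_address_unchecked and reloc_address_checked are hypothetical names, and the lookup table stands in for whatever the linker really uses to map input offsets to output offsets. The only thing taken from the description above is the convention that -1 means "removed".

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Sentinel meaning "this piece of input was dropped from the output".
constexpr std::uint64_t kRemoved = static_cast<std::uint64_t>(-1);

// Hypothetical mapping entry from an input offset to an output offset.
struct InputEntry {
    std::uint64_t input_offset;
    bool removed;
    std::uint64_t output_offset;
};

// Hypothetical lookup: returns the output offset, or kRemoved if the entry
// was discarded (or is unknown).
std::uint64_t output_offset_for(const std::vector<InputEntry> &map,
                                std::uint64_t input_offset)
{
    for (const InputEntry &e : map)
        if (e.input_offset == input_offset)
            return e.removed ? kRemoved : e.output_offset;
    return kRemoved;
}

// The buggy pattern: the sentinel is treated as a real offset, so the
// unsigned addition wraps around and yields an address one byte below
// the section base.
std::uint64_t reloc_address_unchecked(const std::vector<InputEntry> &map,
                                      std::uint64_t section_base,
                                      std::uint64_t input_offset)
{
    return section_base + output_offset_for(map, input_offset);
}

// The checked pattern: report "no relocation needed" for removed entries.
bool reloc_address_checked(const std::vector<InputEntry> &map,
                           std::uint64_t section_base,
                           std::uint64_t input_offset,
                           std::uint64_t *address)
{
    assert(address != nullptr);
    std::uint64_t off = output_offset_for(map, input_offset);
    if (off == kRemoved)
        return false; // skip: nothing in the output to relocate
    *address = section_base + off;
    return true;
}
```

For a section base of 0x1000 and a removed entry, the unchecked version returns 0x0fff, an odd address outside the section, while the checked version returns false and emits nothing. In this sketch the bad address is predictable; in the real link it would depend on where the section ends up.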
--- Mingquan Pan 2013-05-24 02:47:51
close.
Files
File Name         File Type       File Size   Posted By
test_bfin_ex.cc   text/x-c++src   58          Kolja Waschk