[#4917] KGDB single-steps into the middle of a 4-byte instruction on BF561 after a software breakpoint is hit.
Submitted By: Sonic Zhang
Open Date: 2009-02-19 00:00:29
Close Date: 2010-01-14 02:16:24
Priority: Medium
Assignee: Mike Frysinger, Sonic Zhang
Status: Closed
Fixed In Release: N/A
Found In Release: 2009R1-RC6
Release:
Category: N/A
Board: N/A
Processor: BF561
Silicon Revision:
Is this bug repeatable?: Yes
Resolution: Fixed
Uboot version or rev.:
Toolchain version or rev.: SVN trunk 4.1
App binary format: N/A
Summary: KGDB single-steps into the middle of a 4-byte instruction on BF561 after a software breakpoint is hit.
Details:
KGDB single-steps into the middle of a 4-byte instruction on BF561 after a software breakpoint placed on that 4-byte instruction is hit. Hardware breakpoints are not affected.
The configuration is the default one with KGDB and KGDB_TESTS enabled:
CONFIG_KGDB=y
CONFIG_KGDB_TESTS=y
CONFIG_SCHED_HRTICK=y
CONFIG_TICK_ONESHOT=y
# CONFIG_NO_HZ is not set
CONFIG_HIGH_RES_TIMERS=y
Wrong output after the software breakpoint is hit:
(gdb) c
Continuing.
cat /proc/kgdbtest
L1(before change) : data variable addr = 0xff800000, data value is 0
L1 : code function addr = 0xffa003dc
L1(after change) : data variable addr = 0xff800000, data value is 10
L2(before change) : data variable addr = 0xfeb00084, data value is 0
L2 : code function addr = 0xfeb00000
L2(after change) : data variable addr = 0xfeb00084, data value is 20
L1(before change) : data variable addr = 0xff800000, data value is 10
L1 : code function addr = 0xffa003dc
L1(after change) : data variable addr = 0xff800000, data value is 20
L2(before change) : data variable addr = 0xfeb00084, data value is 20
L2 : code function addr = 0xfeb00000
L2(after change) : data variable addr = 0xfeb00084, data value is 40
Correct output without a software breakpoint:
(gdb) c
Continuing.
cat /proc/kgdbtest
L1(before change) : data variable addr = 0xff800000, data value is 0
L1 : code function addr = 0xffa003dc
L1(after change) : data variable addr = 0xff800000, data value is 10
L2(before change) : data variable addr = 0xfeb00084, data value is 0
L2 : code function addr = 0xfeb00000
L2(after change) : data variable addr = 0xfeb00084, data value is 20
Follow-ups
--- Sonic Zhang 2009-02-23 04:22:58
If ICACHE is turned off, this bug disappears.
--- Sonic Zhang 2009-02-23 05:38:04
This bug is confirmed to be an issue in the icache invalidate code. If the following
sequence occurs, the wrong instruction is loaded:
1) An instruction address is cached in the icache.
2) This instruction is changed in SDRAM.
3) IFLUSH [P0] is executed only once in blackfin_icache_flush_range().
4) This instruction is then executed.
This issue was properly worked around until Mike cleaned up the cache flush code in
commit 5419.
------------------------------------------------------------------------
r5419 | vapier | 2008-10-15 03:05:48 +0800 (Wed, 15 Oct 2008) | 1 line
unify/cleanup cache code to (1) be correct wrt to end ranges and (2) be optimal
with a one-instruction hardware loop
------------------------------------------------------------------------
The workaround should be added back.
--- Sonic Zhang 2009-02-23 05:52:26
Fixed.
--- Mike Frysinger 2009-02-23 11:23:39
this doesn't make sense. either the icache gets flushed, or it doesn't.
executing IFLUSH twice on the same address should make no difference at all.
you've changed the code so it does:
IFLUSH[start];
start = start & -L1_CACHE_BYTES;
IFLUSH[start];
so even if start is in the middle of a cache line (which it rarely is), the
IFLUSH should still hit the same cache line.
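The alignment argument above can be checked with a small host-side sketch. It is not the kernel's cache code: the helper names `cache_line_base()` and `same_cache_line()` are made up for illustration, and the 32-byte line size is an assumption about the L1 cache.

```c
#include <stdint.h>

/* Assumption: 32-byte L1 cache lines. */
#define L1_CACHE_BYTES 32u

/* Round an address down to the start of its cache line.
 * For a power-of-two line size, -L1_CACHE_BYTES is the same
 * mask as ~(L1_CACHE_BYTES - 1). */
static uint32_t cache_line_base(uint32_t addr)
{
	return addr & -L1_CACHE_BYTES;
}

/* Return 1 when both addresses fall inside the same cache line. */
static int same_cache_line(uint32_t a, uint32_t b)
{
	return cache_line_base(a) == cache_line_base(b);
}
```

With these assumptions, an unaligned start and its masked value always name the same line, which is why the second IFLUSH would not be expected to touch anything the first one missed.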
--- Bernd Schmidt 2009-02-23 11:38:46
Were there any other workarounds in that file that were lost in r5419?
--- Mike Frysinger 2009-02-23 11:49:22
this isn't a documented issue. this is "it seems to work" but there's
no real information.
other csync/ssync's were dropped, but those all applied to old Blackfins we do
not support.
--- Robin Getz 2009-02-23 12:21:12
Yeah, I'm not sure I follow how this is a fix.
-Robin
--- Sonic Zhang 2009-02-23 23:16:40
The double IFLUSH is the only workaround found to have been lost in r5419.
It is a workaround for this bug. I haven't found a documented anomaly yet.
Yes, the 2 IFLUSHes hit the same cache line. The bug occurs if this cache line is
flushed only once. Any idea on the real root cause?
--- Mike Frysinger 2009-02-23 23:27:51
can you describe exactly what's going on so it can be reproduced w/out linux, let
alone kgdb?
--- Sonic Zhang 2009-02-23 23:49:36
You can reproduce it with the following steps:
0) Configure DCACHE in WT mode and enable ICACHE.
1) In the following kgdb_test() function, replace the first 2 bytes of the 4-byte
instruction at address 0x8866 with the 2-byte instruction 0xa1.
2) Call kgdb_test() for the first time.
3) The exception is hit.
4) In trap.c, replace the 2-byte instruction 0xa1 with the 2 backup bytes.
5) Invalidate the instruction from addr to addr + 2, executing IFLUSH only once.
5) Subtract 2 from the PC on the saved register stack and return from the exception.
6) The first instruction loaded after return is the 2-byte instruction 0xa1, not
the original 4-byte instruction.
Dump of assembler code for function kgdb_test:
0x0000885c <kgdb_test+0>: LINK 0x14; /* (20) */
0x00008860 <kgdb_test+4>: [FP + 0x8] = R0;
0x00008862 <kgdb_test+6>: [FP + 0xc] = R1;
0x00008864 <kgdb_test+8>: [FP + 0x10] = R2;
0x00008866 <kgdb_test+10>: R1.H = 0x10; /* ( 16) R1=0x0x100000 <unix_dgram_poll+56>(1048576) */
0x0000886a <kgdb_test+14>: R1.L = 0x688c; /* (26764) R1=0x0x10688c(1075340) */
0x0000886e <kgdb_test+18>: R2 = [FP + 0xc];
0x00008870 <kgdb_test+20>: R0 = [FP + 0x10];
0x00008872 <kgdb_test+22>: [SP + 0xc] = R0;
0x00008874 <kgdb_test+24>: R0 = [FP + 0x14];
0x00008876 <kgdb_test+26>: [SP + 0x10] = R0;
0x00008878 <kgdb_test+28>: R0 = R1;
0x0000887a <kgdb_test+30>: R1 = R2;
0x0000887c <kgdb_test+32>: R2 = [FP + 0x8];
0x0000887e <kgdb_test+34>: P2.H = 0x1; /* ( 1) P2=0x0x10000 <do_fork+476> */
0x00008882 <kgdb_test+38>: P2.L = 0xcd8; /* (3288) P2=0x0x10cd8 <printk> */
0x00008886 <kgdb_test+42>: CALL (P2);
0x00008888 <kgdb_test+44>: R0 = [FP + 0x14];
0x0000888a <kgdb_test+46>: [FP + 0x10] = R0;
0x0000888c <kgdb_test+48>: R0 = [FP + 0x10];
0x0000888e <kgdb_test+50>: UNLINK;
0x00008892 <kgdb_test+54>: RTS;
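The breakpoint bookkeeping in steps 1, 4 and 5 above can be sketched on a host machine as follows. This is only an illustration under stated assumptions: `struct soft_bp`, `bp_plant()` and `bp_remove()` are hypothetical names, not the KGDB or trap.c code; a plain byte buffer stands in for SDRAM; and no icache is modelled, so the sketch shows the intended memory and PC state, not the failure itself.

```c
#include <stdint.h>
#include <string.h>

#define BP_LEN 2 /* the breakpoint instruction in the report is 2 bytes long */

/* Hypothetical record for one software breakpoint. */
struct soft_bp {
	uint32_t offset;        /* byte offset of the patched instruction */
	uint8_t  saved[BP_LEN]; /* original bytes hidden by the breakpoint */
};

/* Step 1: back up the first BP_LEN bytes at offset and overwrite them
 * with the breakpoint opcode supplied by the caller. */
static void bp_plant(uint8_t *text, struct soft_bp *bp, uint32_t offset,
		     const uint8_t opcode[BP_LEN])
{
	bp->offset = offset;
	memcpy(bp->saved, text + offset, BP_LEN);
	memcpy(text + offset, opcode, BP_LEN);
}

/* Steps 4 and 5: restore the backup bytes and return the PC at which
 * execution should resume. trap_pc is the saved PC, which points just
 * past the 2-byte breakpoint. On real hardware the icache range
 * [offset, offset + BP_LEN) would have to be invalidated here. */
static uint32_t bp_remove(uint8_t *text, const struct soft_bp *bp,
			  uint32_t trap_pc)
{
	memcpy(text + bp->offset, bp->saved, BP_LEN);
	return trap_pc - BP_LEN;
}
```

After `bp_remove()` the buffer again holds the full original 4-byte instruction and the returned PC points at its first byte. The report describes the core fetching something else at that point, which is why the thread turns to the instruction cache.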
--- Mike Frysinger 2009-02-23 23:53:00
thanks, i think that's enough for me to try and reproduce with like u-boot/jtag
--- Sonic Zhang 2009-02-23 23:59:58
Correction to one line:
6) The first instruction loaded after return is a 2-byte instruction. It may not be
0xa1, but it is also not the original 4-byte instruction.
--- Robin Getz 2009-02-24 07:07:28
I know there were some undocumented issues about 32-bit or 64-bit instructions
getting replaced by a 16-bit breakpoint, with the remaining
"tail" opcodes being decoded in the pipeline and causing issues like
05000281.
But I don't think a flush would fix that. It would be interesting to know whether
a CSYNC also fixes things.
--- Sonic Zhang 2009-02-25 00:41:21
Neither CSYNC nor SSYNC helps.
--- Mike Frysinger 2009-02-27 02:12:19
does this only occur on the bf561-ezkit ? or have you checked other boards ?
--- Sonic Zhang 2009-02-27 03:06:04
No. This was initially discovered on BF533-STAMP and BF537-STAMP.
--- Mike Frysinger 2009-02-27 16:55:00
so how exactly is this test working on the board ? the original summary has a
dearth of information ...
to try to reproduce, i took a BF537-EZKIT with 0.3 silicon and:
$ make AnalogDevices/BF537-STAMP_default
$ make linux_menuconfig
<enable CONFIG_KGDB and CONFIG_KGDB_TESTS>
$ make
then i booted the kernel and ran:
root:/> cat /proc/kgdbtest
L1(before change) : data variable addr = 0xff800000, data value is 0
L1 : code function addr = 0xffa00390
L1(after change) : data variable addr = 0xff800000, data value is 10
then i ran the same `cat` command over and over and every time i would get the
same three lines of output
presumably something else needs to be happening here ... what is that ?
--- Mike Frysinger 2009-02-28 14:35:42
ok, i read through the kgdb expect script and found out how to reproduce this
my tests though indicate this is a problem with kgdb_disable_hw_debug(). if i
remove the writes of 0 to WPIACTL and WPDCTL, things seem to work.
this might also explain why when i connect with gdb over jtag things work one
time and then crash: gdb over jtag set the PWR bit in those registers and then
kgdb cleared them. so writing 0 to the PWR bit when it's already 0 looks like
it causes misbehavior.
i'll check with design
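The hypothesis above (a write of 0 to a watchpoint control register whose PWR bit is already 0 is what misbehaves) could be tested with a guarded disable along these lines. This is a simulation with made-up names, not the kgdb_disable_hw_debug() implementation and not a fix taken from this report: `struct wp_regs`, `wp_write()` and `wp_disable_if_powered()` are hypothetical, plain struct fields stand in for WPIACTL and WPDCTL, and the PWR bit position is an assumption.

```c
#include <stdint.h>

#define WP_PWR 0x1u /* assumed position of the PWR bit */

/* Simulated watchpoint control registers plus a write counter. */
struct wp_regs {
	uint32_t wpiactl;
	uint32_t wpdctl;
	int writes; /* number of register writes performed */
};

/* Stand-in for a memory-mapped register write. */
static void wp_write(struct wp_regs *r, uint32_t *reg, uint32_t val)
{
	*reg = val;
	r->writes++;
}

/* Clear each control register only if its PWR bit is currently set,
 * so a register that is already powered down is never written. */
static void wp_disable_if_powered(struct wp_regs *r)
{
	if (r->wpiactl & WP_PWR)
		wp_write(r, &r->wpiactl, 0);
	if (r->wpdctl & WP_PWR)
		wp_write(r, &r->wpdctl, 0);
}
```

Whether skipping the redundant write actually avoids the misbehavior on silicon is exactly the open question being taken to the design team.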