[#5274] enable "CONFIG_DEBUG_BFIN_HWTRACE_EXPAND" hangs kernel
Submitted By: Yi Li
Open Date: 2009-06-18 05:56:07
Close Date: 2009-07-01 14:47:57
Priority: Medium High
Assignee: Mike Frysinger
Status: Closed
Fixed In Release: N/A
Found In Release: N/A
Release:
Category: N/A
Board: STAMP
Processor: BF537
Silicon Revision:
Is this bug repeatable?: Yes
Resolution: Fixed
Uboot version or rev.:
Toolchain version or rev.: 2009R1-RC7
App binary format: N/A
Summary: enable "CONFIG_DEBUG_BFIN_HWTRACE_EXPAND" hangs kernel
Details:
Using the 2009R1 branch. Please see the attached config.
The kernel hangs while booting:
"## Booting image at 01000000 ...
Image Name: Linux-2.6.28.10-ADI-2009R1-svn67
Created: 2009-06-18 9:43:36 UTC
Image Type: Blackfin Linux Kernel Image (gzip compressed)
Data Size: 4700379 Bytes = 4.5 MB
Load Address: 00001000
Entry Point: 0018c62c
Verifying Checksum ... OK
Uncompressing Kernel Image ... OK
Starting Kernel at = 18c62c
"
GDB shows the kernel hanging in early_dma_memcpy():
	while (1) {
		if (!src_ch || src_ch == (struct dma_register *)MDMA_S1_NEXT_DESC_PTR) {
			dst_ch = (struct dma_register *)MDMA_D0_NEXT_DESC_PTR;
			src_ch = (struct dma_register *)MDMA_S0_NEXT_DESC_PTR;
		} else {
			dst_ch = (struct dma_register *)MDMA_D1_NEXT_DESC_PTR;
			src_ch = (struct dma_register *)MDMA_S1_NEXT_DESC_PTR;
		}
		if (!bfin_read16(&src_ch->cfg)) {
			break;
		} else {
			if (bfin_read16(&src_ch->irq_status) & DMA_DONE)
				bfin_write16(&src_ch->cfg, 0);
		}
	}
Hope this is reproducible..
Follow-ups
--- Yi Li 2009-06-21 23:16:47
I would like to investigate this bug -- since it is easier for me to reproduce
and I need the expanded HW trace to investigate another bug.
--- Robin Getz 2009-06-22 00:15:15
Let me know how far you get - I was planning on looking at this in the morning.
-Robin
--- Yi Li 2009-06-22 05:49:42
Please have a look, because I have made no progress on this bug.
All I have found is that the DMA never finishes in bfin_relocate_l1_mem(){
	/* if necessary, copy _stext_l1 to _etext_l1 to L1 instruction SRAM */
	l1_code_length = _etext_l1 - _stext_l1;
	if (l1_code_length)
		early_dma_memcpy(_stext_l1, _l1_lma_start, l1_code_length);
	/* if necessary, copy _sdata_l1 to _sbss_l1 to L1 data bank A SRAM */
	l1_data_a_length = _sbss_l1 - _sdata_l1;
	if (l1_data_a_length)
		early_dma_memcpy(_sdata_l1, _l1_lma_start + l1_code_length,
			l1_data_a_length);
[Yi: DMA_DONE never gets set, so the following early_dma_memcpy() hangs]
	/* if necessary, copy _sdata_b_l1 to _sbss_b_l1 to L1 data bank B SRAM */
	l1_data_b_length = _sbss_b_l1 - _sdata_b_l1;
	if (l1_data_b_length)
		early_dma_memcpy(_sdata_b_l1, _l1_lma_start + l1_code_length +
			l1_data_a_length, l1_data_b_length);
--- Yi Li 2009-06-22 06:16:44
It looks like I have found a fix for this bug. I will test it some more and
check it in tomorrow:
Index: kernel/setup.c
===================================================================
--- kernel/setup.c (revision 6818)
+++ kernel/setup.c (working copy)
@@ -170,11 +170,15 @@ void __init bfin_relocate_l1_mem(void)
l1_code_length = _etext_l1 - _stext_l1;
if (l1_code_length)
early_dma_memcpy(_stext_l1, _l1_lma_start, l1_code_length);
+
+ early_dma_memcpy_done();
/* if necessary, copy _sdata_l1 to _sbss_l1 to L1 data bank A SRAM */
l1_data_a_length = _sbss_l1 - _sdata_l1;
if (l1_data_a_length)
early_dma_memcpy(_sdata_l1, _l1_lma_start + l1_code_length,
l1_data_a_length);
+
+ early_dma_memcpy_done();
/* if necessary, copy _sdata_b_l1 to _sbss_b_l1 to L1 data bank B SRAM */
l1_data_b_length = _sbss_b_l1 - _sdata_b_l1;
--- Yi Li 2009-06-22 06:31:23
We do not see this bug unless "CONFIG_DEBUG_BFIN_HWTRACE_EXPAND" is enabled,
because in the usual case l1_data_b_length ("_sbss_b_l1 - _sdata_b_l1") is 0,
so the third early_dma_memcpy() call is skipped.
--- Mike Frysinger 2009-06-22 08:40:38
i think that is just a workaround. you're still relying on things working
"by accident". i think the real fix is:
--- a/arch/blackfin/mach-common/head.S
+++ b/arch/blackfin/mach-common/head.S
@@ -90,9 +90,7 @@ ENTRY(__start)
[p0 + (ITEST_COMMAND - DTEST_COMMAND)] = R0;
CSYNC;
- trace_buffer_init(p0,r0);
- P0 = R1;
- R0 = R1;
+ trace_buffer_stop(p0, r0);
/* Turn off the icache */
p0.l = LO(IMEM_CONTROL);
@@ -198,6 +196,9 @@ ENTRY(__start)
sp = usp; /* usp hasnt been touched, so restore from there */
#endif
+ /* Now that we have all of our kernel code in L1, start up trace */
+ trace_buffer_init(p0, r0);
+
/* This section keeps the processor in supervisor mode
* during kernel boot. Switches to user mode at end of boot.
* See page 3-9 of Hardware Reference manual for documentation.
--- Robin Getz 2009-06-22 13:24:15
Yeah - I think Mike is more on the right track...
We need to make sure the vector is installed properly (both that the function
exists in the right place, and that we programmed the EVT3 properly) before we
enable the trace buffer...
The other option is to re-write arch/blackfin/include/asm/trace.h - so that the
trace_buffer_init - just turns things on (not expands past 16), and have
something new in setup_arch turn it on after the exceptions vectors have been
programmed.
That way - even if we get a really early crash (before we move things to L1) -
at least we have 16 traces...
-Robin
--- Yi Li 2009-06-22 23:09:40
I tested with Mike's patch, but the kernel still hangs in
"bfin_relocate_l1_mem()". Also, sorry, but I do not understand why
trace_buffer_init()/trace_buffer_stop() would affect DMA.
--- Robin Getz 2009-06-23 00:33:14
When the trace buffer is full - it throws an exception - (EVT3, EXCAUSE =
0x11).
If the exception entry is in L1 (CONFIG_EXCPT_IRQ_SYSC_L1) - if the trace
buffer is started, and hits 16 before the dma is complete - the kernel will
crash.
I don't think it is the DMA per se - but that far into the boot process.
-Robin
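To make the failure mode described above concrete, here is a small host-side model, not kernel code. It assumes a 16-entry hardware buffer (the figure used in this thread); the struct, its fields and the function names are all invented for illustration, and the "overflow exception" flag only stands in for whatever TBUFCTL setting enables the exception.

```c
#include <assert.h>
#include <stddef.h>

#define HW_TRACE_DEPTH 16	/* depth quoted in this thread */

/* Hypothetical model state; none of these names are kernel symbols. */
struct trace_model {
	unsigned int hw_used;		/* entries held in the hardware buffer */
	int overflow_exception_on;	/* raise an exception when the buffer fills */
	int handler_ready;		/* exception entry code is in place */
	unsigned int sw_saved;		/* entries drained to the software buffer */
	int crashed;			/* exception taken with no handler */
};

/* Record one change of flow. Returns 0 on success, -1 once the model crashed. */
static int trace_record(struct trace_model *t)
{
	assert(t != NULL);
	if (t->crashed)
		return -1;
	if (t->hw_used < HW_TRACE_DEPTH)
		t->hw_used++;	/* when already full, the oldest entry is overwritten */
	if (t->hw_used == HW_TRACE_DEPTH && t->overflow_exception_on) {
		if (!t->handler_ready) {
			t->crashed = 1;	/* exception with nowhere valid to go */
			return -1;
		}
		t->sw_saved += t->hw_used;	/* handler drains the hardware buffer */
		t->hw_used = 0;
	}
	return 0;
}

/* Record up to n changes of flow; returns how many were recorded successfully. */
static unsigned int trace_run(struct trace_model *t, unsigned int n)
{
	unsigned int ok = 0;

	while (ok < n && trace_record(t) == 0)
		ok++;
	return ok;
}
```

In this model, asking for 20 changes of flow with the overflow exception on and no handler in place stops after 15 recorded entries: the 16th fills the buffer and triggers the unhandled exception. With the exception off, the buffer simply wraps and nothing crashes.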
--- Yi Li 2009-06-23 02:40:19
But, as I understand it, when HWTRACE_EXPAND is enabled, the trace buffer is
only configured to generate an exception in "init_IRQ()". While in
"__start" and in "bfin_relocate_l1_mem()", a trace buffer overflow
will not cause an exception.
#ifdef CONFIG_DEBUG_BFIN_HWTRACE_EXPAND
/* Now that evt_ivhw is set up, turn this on */
trace_buff_offset = 0;
bfin_write_TBUFCTL(BFIN_TRACE_ON);
printk(KERN_INFO "Hardware Trace expanded to %ik\n",
1 << CONFIG_DEBUG_BFIN_HWTRACE_EXPAND_LEN);
#endif
#ifdef CONFIG_DEBUG_BFIN_HWTRACE_EXPAND
#define BFIN_TRACE_ON (BFIN_TRACE_INIT | (CONFIG_DEBUG_BFIN_HWTRACE_EXPAND << 2))
#else
#define BFIN_TRACE_ON (BFIN_TRACE_INIT)
#endif
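As a worked example of what these macros evaluate to, the sketch below repeats the two expressions (BFIN_TRACE_ON from the snippet above, BFIN_TRACE_INIT as quoted from asm/trace.h elsewhere in this thread) as plain functions. The function names are invented, and the input values are assumptions for illustration only: compression level 0, and the EXPAND option seen as 1 by the preprocessor, which is how a boolean Kconfig option set to "y" normally appears. The real values come from the attached config.

```c
#include <assert.h>

/* Same expression as BFIN_TRACE_INIT: compression field plus the low two bits. */
static unsigned int example_trace_init(unsigned int compression)
{
	return (compression << 4) | 0x03;
}

/* Same expression as BFIN_TRACE_ON: BFIN_TRACE_INIT with the expand flag at bit 2. */
static unsigned int example_trace_on(unsigned int compression, unsigned int expand)
{
	assert(expand <= 1);	/* modelled as a boolean option */
	return example_trace_init(compression) | (expand << 2);
}
```

Under those assumptions, example_trace_init(0) is 0x03 and example_trace_on(0, 1) is 0x07, so the only difference between the early write in head.S and the later write in init_IRQ() is bit 2 (0x04).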
--- Mike Frysinger 2009-06-23 07:13:12
why would it not generate an exception ? u-boot lowers itself to EVT15 which
means the kernel initializes in EVT15 which means taking an exception to process
trace buffer overflow is entirely acceptable (from the hardware point of view).
--- Robin Getz 2009-06-23 14:31:23
I just booted up on trunk - and it works fine.
root:/> dmesg | grep -i trace
Hardware Trace Active and Enabled
Hardware Trace expanded to 2k
root:/> traps_test -v 5
Running test 5 for exception 0x06: EXCPT 0x06
[snip]
Hardware Trace:
WARNING: Expanded trace turned on - can not trace exceptions
0 Target : <0x00004fb4> { _trap_c + 0x0 }
Source : <0xffa00678> { _exception_to_level5 + 0x94 } CALL pcrel
1 Target : <0xffa005e4> { _exception_to_level5 + 0x0 }
Source : <0xffa004bc> { _bfin_return_from_exception + 0x20 } RTX
2 Target : <0x03e8d9dc> [ /bin/traps_test + 0x19dc ]
[snip]
260 Target : <0x00017bca> { _update_process_times + 0x1a }
Source : <0x0000b06a> { _account_process_tick + 0x2e } RTS
261 Target : <0x0000b066> { _account_process_tick + 0x2a }
Source : <0x0000b0ce> { _account_process_tick + 0x92 } JUMP.S
262 Target : <0x0000b0b6> { _account_process_tick + 0x7a }
Source : <0x0000b08e> { _account_process_tick + 0x52 } IF !CC JUMP
-- so it seems to work for me.
I'll quickly try it out on the branch.
root:/> version
kernel: Linux release 2.6.30-ADI-2010R1-pre-svn6833, build #173 Tue Jun 23 14:26:48 EDT 2009
toolchain: bfin-linux-uclibc-gcc release gcc version 4.3.3 (ADI-trunk/svn-3407)
user-dist: release svn-8345, build #280 Mon Jun 22 10:06:56 EDT 2009
root:/> cat /proc/cpuinfo
processor : 0
vendor_id : Analog Devices
cpu family : 0x27c8
model name : ADSP-BF537 500(MHz CCLK) 100(MHz SCLK) (mpu off)
stepping : 3 (Compiled for Rev 2)
--- Mike Frysinger 2009-06-23 16:10:16
i think the issue may be dependent on (1) the size of the L1 regions being
loaded and (2) the speed of the core. after all, the problem we're basically
talking about is, will the core make 16 changes in code flow from the time it is
turned on early to L1 being fully initialized. i dont think we can guarantee
that sanely (if CCLK is maxed and there's a lot of L1 to load, the core will
spin in the early dma waiting code for a while).
--- Robin Getz 2009-06-24 00:49:07
Ok - in head.S - we call:
trace_buffer_init(p0,r0);
which is defined in arch/blackfin/include/asm/trace.h:
#define trace_buffer_init(preg, dreg) \
preg.L = LO(TBUFCTL); \
preg.H = HI(TBUFCTL); \
dreg = BFIN_TRACE_INIT; \
[preg] = dreg;
#define BFIN_TRACE_INIT ((CONFIG_DEBUG_BFIN_HWTRACE_COMPRESSION << 4) | 0x03)
Which should not cause _any_ exceptions. (As Yi stated - TBUFOVF isn't set
until arch/blackfin/kernel/irqchip.c:init_IRQ() - which I think is after
arch/blackfin/kernel/irqchip.c:init_exception_vectors())
- but to make sure, we can move it.
Index: arch/blackfin/kernel/irqchip.c
===================================================================
--- arch/blackfin/kernel/irqchip.c (revision 6833)
+++ arch/blackfin/kernel/irqchip.c (working copy)
@@ -34,7 +34,6 @@
#include <linux/kallsyms.h>
#include <linux/interrupt.h>
#include <linux/irq.h>
-#include <asm/trace.h>
#include <asm/pda.h>
static atomic_t irq_err_count;
@@ -162,12 +161,4 @@
void __init init_IRQ(void)
{
init_arch_irq();
-
-#ifdef CONFIG_DEBUG_BFIN_HWTRACE_EXPAND
- /* Now that evt_ivhw is set up, turn this on */
- trace_buff_offset = 0;
- bfin_write_TBUFCTL(BFIN_TRACE_ON);
- printk(KERN_INFO "Hardware Trace expanded to %ik\n",
- 1 << CONFIG_DEBUG_BFIN_HWTRACE_EXPAND_LEN);
-#endif
}
Index: arch/blackfin/kernel/setup.c
===================================================================
--- arch/blackfin/kernel/setup.c (revision 6833)
+++ arch/blackfin/kernel/setup.c (working copy)
@@ -33,6 +33,7 @@
#include <asm/cpu.h>
#include <asm/fixed_code.h>
#include <asm/early_printk.h>
+#include <asm/trace.h>
u16 _bfin_swrst;
EXPORT_SYMBOL(_bfin_swrst);
@@ -970,6 +971,14 @@
platform_init_cpus();
#endif
init_exception_vectors();
+
+#ifdef CONFIG_DEBUG_BFIN_HWTRACE_EXPAND
+ /* Now that the exception vectors are set up, turn this on */
+ trace_buff_offset = 0;
+ bfin_write_TBUFCTL(BFIN_TRACE_ON);
+ printk(KERN_INFO "Hardware Trace expanded to %ik\n",
+ 1 << CONFIG_DEBUG_BFIN_HWTRACE_EXPAND_LEN);
+#endif
bfin_cache_init(); /* Initialize caches for the boot CPU */
}
-Robin
--- Yi Li 2009-06-24 03:16:15
Robin,
Can you reproduce this bug on the 2009R1 branch? Or, as Mike says, does this
bug only happen by accident?
--- Mike Frysinger 2009-06-29 18:43:15
i think the bug here is that early_dma_memcpy() is checking the src registers
and not the dst registers. i.e. the fix is:
--- arch/blackfin/kernel/bfin_dma_5xx.c (revision 6863)
+++ arch/blackfin/kernel/bfin_dma_5xx.c (working copy)
@@ -270,11 +270,11 @@ void __init early_dma_memcpy(void *pdst,
src_ch = (struct dma_register *)MDMA_S1_NEXT_DESC_PTR;
}
- if (!bfin_read16(&src_ch->cfg)) {
+ if (!bfin_read16(&dst_ch->cfg)) {
break;
} else {
- if (bfin_read16(&src_ch->irq_status) & DMA_DONE)
- bfin_write16(&src_ch->cfg, 0);
+ if (bfin_read16(&dst_ch->irq_status) & DMA_DONE)
+ bfin_write16(&dst_ch->cfg, 0);
}
}
this isnt directly related to the hwtrace expand option
when it hangs at boot, you can see the early_dma_memcpy() function spinning
forever looking for a dma channel that is done ... but the src irqstat doesnt
get marked done, the dst does
(gdb) _show_dma 0xFFC00F00 /* MDMA D0 */
desc: curr: 0xeeec2804 next: 0x3ef79cf7
addr: curr: 0xffa02d70 start: 0xffa00000
X: curr: 0x0000 count: 0x0b5c mod: 0x0004 (4)
Y: curr: 0x4ce5 count: 0x4ce5 mod: 0xe6f6 (-6410)
dma config: 0x008b (enabled write 1D interrupt stop)
irq status: 0x0001 (done )
(gdb) _show_dma 0xFFC00F40 /* MDMA S0 */
desc: curr: 0xbabbffe8 next: 0x3ae94b91
addr: curr: 0x004960bc start: 0x0049334c
X: curr: 0x0000 count: 0x0b5c mod: 0x0004 (4)
Y: curr: 0x8ff1 count: 0x8ff1 mod: 0x4f7a (20346)
dma config: 0x0009 (enabled read 1D stop)
irq status: 0x0000 ()
(gdb) _show_dma 0xFFC00F80 /* MDMA D1 */
desc: curr: 0x75618662 next: 0xee29c953
addr: curr: 0xff8000c0 start: 0xff800000
X: curr: 0x0000 count: 0x0030 mod: 0x0004 (4)
Y: curr: 0xef38 count: 0xef38 mod: 0xdde1 (-8735)
dma config: 0x008b (enabled write 1D interrupt stop)
irq status: 0x0001 (done )
(gdb) _show_dma 0xFFC00FC0 /* MDMA S1 */
desc: curr: 0xbeaa824a next: 0xee71616a
addr: curr: 0x0049617c start: 0x004960bc
X: curr: 0x0000 count: 0x0030 mod: 0x0004 (4)
Y: curr: 0xb3dc count: 0xb3dc mod: 0x3d15 (15637)
dma config: 0x0009 (enabled read 1D stop)
irq status: 0x0000 ()
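The effect of polling the wrong side can be reproduced with a small host-side model fed with the register values from the dump above. This is only a sketch: the structs and function names are invented stand-ins (not the kernel's struct dma_register), the registers are static snapshots rather than live hardware, and the poll limit exists only so the model can report "never found a channel" instead of spinning forever as the real loop does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define DMA_DONE 0x0001	/* matches "irq status: 0x0001 (done)" in the dump */

/* Hypothetical stand-ins for one side of an MDMA channel. */
struct model_regs {
	uint16_t cfg;
	uint16_t irq_status;
};

struct model_channel {
	struct model_regs src;
	struct model_regs dst;
};

/*
 * Shaped like the channel search in early_dma_memcpy(): alternate between
 * channel 0 and channel 1, take the first one whose cfg reads 0, and clear
 * cfg once DMA_DONE is seen. check_dst selects which side is polled.
 * Returns the channel index, or -1 if none was free within max_polls.
 */
static int model_find_free_channel(struct model_channel *ch, int check_dst,
				   unsigned int max_polls)
{
	unsigned int i;

	assert(ch != NULL);
	for (i = 0; i < max_polls; i++) {
		int idx = (int)(i & 1);
		struct model_regs *r = check_dst ? &ch[idx].dst : &ch[idx].src;

		if (!r->cfg)
			return idx;
		if (r->irq_status & DMA_DONE)
			r->cfg = 0;
	}
	return -1;
}

/* Load the cfg and irq status values shown for MDMA S0/D0 and S1/D1. */
static void model_load_dump(struct model_channel *ch)
{
	int i;

	assert(ch != NULL);
	for (i = 0; i < 2; i++) {
		ch[i].src.cfg = 0x0009;
		ch[i].src.irq_status = 0x0000;
		ch[i].dst.cfg = 0x008b;
		ch[i].dst.irq_status = 0x0001;
	}
}
```

With the dump values loaded into a two-element array, polling the source side never finds a free channel, while polling the destination side clears both channels and then returns channel 0 on the third poll.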
--- Yi Li 2009-06-30 03:17:53
Tested, and I think the bug is fixed, although it looks to me as if checking
only dst_ch->irq_status should be enough to fix the problem:
@@ -273,7 +273,7 @@ void __init early_dma_memcpy(void *pdst,
if (!bfin_read16(&src_ch->cfg)) {
break;
} else {
- if (bfin_read16(&src_ch->irq_status) & DMA_DONE)
+ if (bfin_read16(&dst_ch->irq_status) & DMA_DONE)
bfin_write16(&src_ch->cfg, 0);
}
This should be kept in sync with the check in early_dma_memcpy_done():
void __init early_dma_memcpy_done(void)
{
while ((bfin_read_MDMA_S0_CONFIG() &&
!(bfin_read_MDMA_D0_IRQ_STATUS() & DMA_DONE)) ||
(bfin_read_MDMA_S1_CONFIG() &&
!(bfin_read_MDMA_D1_IRQ_STATUS() & DMA_DONE)))
continue;
Expanded HW trace happens to use L1_data_b, and exposed this bug.
--- Mike Frysinger 2009-07-01 14:47:57
thanks, ive reverted that part, but then fixed up the code some more as i
noticed it had other problems (it did not do ssync after the reset of the
channel that used to be utilized)
Files
File Name File Type File Size Posted By
bugreport.tar.gz application/x-gzip 18715 Yi Li