[#5745] An SMP cache coherence issue may lead to kernel boot failure
Submitted By: Yi Li
Open Date: 2009-12-04 01:37:17
Priority: Medium
Assignee: Yi Li
Status: Open
Fixed In Release: N/A
Found In Release: 2010R1
Release:
Category: N/A
Board: N/A
Processor: BF561
Silicon Revision:
Is this bug repeatable?: Yes
Resolution: Fixed
Uboot version or rev.:
Toolchain version or rev.: 2009R1-rc10
App binary format: N/A
Summary: An SMP cache coherence issue may lead to kernel boot failure
Details:
When I tested an SMP patch, the kernel sometimes failed to boot. The kernel triggers an exception while initializing slab in mm/slab.c: do_tune_cpucache().
I found that we need to use smp_rmb() in ipi_call_function() and smp_wmb() in
smp_call_function() to ensure cache coherence on both cores when
smp_call_function() is asked to wait.
Without these barriers, the code below may expose a cache coherence bug
during kmem_cache_create() -> do_tune_cpucache(), which sometimes makes
the kernel fail to boot.
mm/slab.c:
/* Always called with the cache_chain_mutex held */
static int do_tune_cpucache(struct kmem_cache *cachep, int limit,
				int batchcount, int shared, gfp_t gfp)
{
	struct ccupdate_struct *new;
	int i;

	new = kzalloc(sizeof(*new), gfp);
	if (!new)
		return -ENOMEM;

	for_each_online_cpu(i) {
		new->new[i] = alloc_arraycache(cpu_to_node(i), limit,
						batchcount, gfp);
		if (!new->new[i]) {
			for (i--; i >= 0; i--)
				kfree(new->new[i]);
			kfree(new);
			return -ENOMEM;
		}
	}
	new->cachep = cachep;

	/* Yi: on each CPU, call do_ccupdate_local(); core A will change
	   new->new[0], core B will change new->new[1] */
	on_each_cpu(do_ccupdate_local, (void *)new, 1);

	check_irq_on();
	cachep->batchcount = batchcount;
	cachep->limit = limit;
	cachep->shared = shared;

	for_each_online_cpu(i) {
		/* Yi: here, if core A reads new->new[1], it may read from its
		   D-cache instead of SDRAM: a cache coherence issue */
		struct array_cache *ccold = new->new[i];
		if (!ccold)
			continue;
		spin_lock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
		free_block(cachep, ccold->entry, ccold->avail,
			   cpu_to_node(i));
		spin_unlock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
		kfree(ccold);
	}
	kfree(new);
	return alloc_kmemlist(cachep, gfp);
}
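
The stale read described in the comments above can be reproduced with a small userspace toy model. This is only a sketch under stated assumptions: each core has a private write-through D-cache, there is no hardware coherence, and all names (toy_read, toy_write, toy_invalidate, sdram_slot) are hypothetical and not part of the kernel. Slot i stands in for new->new[i].

```c
#include <assert.h>
#include <stdbool.h>

#define NCORES 2   /* core A = 0, core B = 1 */
#define NSLOTS 2   /* slot i stands in for new->new[i] */

/* Toy model (hypothetical): shared SDRAM plus one private write-through
 * D-cache per core, with no hardware coherence between the caches. */
static int sdram_slot[NSLOTS];

struct toy_dcache {
	int  data[NSLOTS];
	bool valid[NSLOTS];
};

static struct toy_dcache dcache[NCORES];

/* Read: hit in the private cache if the line is valid, else fill from SDRAM. */
static int toy_read(int core, int slot)
{
	assert(core >= 0 && core < NCORES && slot >= 0 && slot < NSLOTS);
	struct toy_dcache *c = &dcache[core];

	if (!c->valid[slot]) {
		c->data[slot] = sdram_slot[slot];
		c->valid[slot] = true;
	}
	return c->data[slot];
}

/* Write-through: update the writer's own cache and SDRAM.
 * The other core's cache is deliberately left untouched. */
static void toy_write(int core, int slot, int value)
{
	assert(core >= 0 && core < NCORES && slot >= 0 && slot < NSLOTS);
	dcache[core].data[slot] = value;
	dcache[core].valid[slot] = true;
	sdram_slot[slot] = value;
}

/* Drop every line in one core's D-cache so later reads refill from SDRAM. */
static void toy_invalidate(int core)
{
	assert(core >= 0 && core < NCORES);
	for (int i = 0; i < NSLOTS; i++)
		dcache[core].valid[i] = false;
}
```

In this model, if core 0 writes slot 1, core 1 then overwrites slot 1 (as do_ccupdate_local() would on core B), and core 0 reads slot 1 again, core 0 still gets its old cached value until toy_invalidate(0) is called.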
------------------------
Kernel failed to boot:
Starting Kernel at = 001aae7c
Linux version 2.6.31.6-ADI-2010R1-pre-svn7910 (adam@adam-laptop) (gcc version 4.1.2 (ADI svn)) #73 SMP Fri Dec 4 11:03:15 CST 2009
register early platform devices
bootconsole [early_shadow0] enabled
bootconsole [early_BFuart0] enabled
early printk enabled on early_BFuart0
Limiting kernel memory to 56MB due to anomaly 05000263
Board Memory: 64MB
Kernel Managed Memory: 64MB
Memory map:
fixedcode = 0x00000400-0x00000490
text = 0x00001000-0x0011f790
rodata = 0x0011f7a0-0x0017a788
bss = 0x0017b000-0x0018d488
data = 0x0018d4a0-0x001a0000
stack = 0x0019e000-0x001a0000
init = 0x001a0000-0x007ac000
available = 0x007ac000-0x03800000
DMA Zone = 0x03f00000-0x04000000
Hardware Trace Active and Enabled
Boot Mode: 0
Blackfin support (C) 2004-2009 Analog Devices, Inc.
Compiled for ADSP-BF561 Rev 0.3
Warning: Compiled for Rev 3, but running on Rev 5
Blackfin Linux support by http://blackfin.uclinux.org/
Processor Speed: 600 MHz core clock and 100 MHz System Clock
NOMPU: setting up cplb tables
NOMPU: setting up cplb tables
Instruction Cache Enabled for CPU0
External memory: cacheable in instruction cache
L2 SRAM : uncacheable in instruction cache
Data Cache Enabled for CPU0
External memory: cacheable (write-through) in data cache
L2 SRAM : uncacheable in data cache
Built 1 zonelists in Zone order, mobility grouping off. Total pages: 14224
Kernel command line: root=/dev/mtdblock0 rw clkin_hz=30000000 earlyprintk=serial,uart0,57600 console=ttyBF0,57600 ip=192.168.0.7:192.168.0.3:192.168.0.1:255.255.255.0:bf561-ezkit:eth0:off
PID hash table entries: 256 (order: 8, 1024 bytes)
Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)
Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
Memory available: 48892k/65536k RAM, (6192k init code, 1145k kernel code, 515k data, 1024k dma, 7768k reserved)
NR_IRQS:121
Configuring Blackfin Priority Driven Interrupts
console [ttyBF0] enabled, bootconsole disabled
console [ttyBF0] enabled, bootconsole disabled
Calibrating delay loop... 1191.93 BogoMIPS (lpj=2383872)
Security Framework initialized
Mount-cache hash table entries: 512
CoreB bootstrap code to SRAM ff600000 via DMA.
Booting Core B.
Instruction Cache Enabled for CPU1
External memory: cacheable in instruction cache
L2 SRAM : uncacheable in instruction cache
Data Cache Enabled for CPU1
External memory: cacheable (write-through) in data cache
L2 SRAM : uncacheable in data cache
Calibrating delay loop...
Brought up 2 CPUs
SMP: Total of 2 processors activated (4.09 BogoMIPS).
Follow-ups
--- Yi Li 2009-12-04 01:40:03
GDB shows that the exception happens on core A:
(gdb) info threads
2 Thread 2 (Core B DBGSTAT [0x0050]) 0x0000be72 in get_core_lock () at
include/linux/interrupt.h:118
* 1 Thread 1 (Core A DBGSTAT [0x0058]) 0x0000be70 in get_core_lock () at
include/linux/interrupt.h:118
(gdb) thread 2
[Switching to thread 2 (Thread 2)]#0 0x0000be72 in get_core_lock () at
include/linux/interrupt.h:118
118 {
(gdb) bt
#0 0x0000be72 in get_core_lock () at include/linux/interrupt.h:118
#1 0x0002d30e in hrtimer_run_pending () at kernel/hrtimer.c:1415
#2 0x00020c6a in run_timer_softirq (h=<value optimized out>) at
kernel/timer.c:1175
#3 0x0001cfd4 in __do_softirq () at kernel/softirq.c:219
#4 0x0001d2a4 in irq_exit () at kernel/softirq.c:303
#5 0x00005c82 in asm_do_IRQ (irq=6, regs=<value optimized out>) at
arch/blackfin/kernel/irqchip.c:134
#6 0x0000b138 in do_irq (vec=<value optimized out>, fp=0xfffffffe) at
arch/blackfin/mach-common/ints-priority.c:1179
#7 0x0000ab94 in _common_int_entry ()
(gdb) thread 1
[Switching to thread 1 (Thread 1)]#0 0x0000be70 in get_core_lock () at
include/linux/interrupt.h:118
118 {
(gdb) bt
#0 0x0000be70 in get_core_lock () at include/linux/interrupt.h:118
#1 0x00018edc in vprintk (fmt=0x14f990 "<5>NULL pointer
access\n", args=0x202bce4) at kernel/printk.c:683
#2 0x000191a2 in printk (
fmt=0x3f
"3\2143�3�3�3�3�3�3�;�3�;�\023�3�3�3�3�3�3�s�3�3�#�3�3�3l3�3�3�3�3�3�3�3�7�3�3̳�3�3�3�3�3�3�3�3�3�3�3�3�3�3�3�3�3�3�3�3�3�3�3�3�3�9�3�1�3�\023�3\2143�3�3�3�3�;�3̷�7�3�3�3�3�3�3�3�\023�3�3�3�3L3�3�3�3�3�1�3�3�3�3�3�"...)
at kernel/printk.c:588
#3 0x0000572a in trap_c (fp=0x202bd94) at arch/blackfin/kernel/traps.c:602
#4 0x0000a5c4 in exception_to_level5 ()
#5 0x0004c826 in do_tune_cpucache (cachep=0x8008, limit=<value optimized
out>, batchcount=<value optimized out>, shared=8,
gfp=39) at mm/slab.c:3927
#6 0x0004c9e6 in enable_cpucache (cachep=0x2003d40, gfp=208) at
mm/slab.c:3983
#7 0x0011cf4c in setup_cpu_cache (cachep=0x2003d40, gfp=208) at
mm/slab.c:2006
#8 0x0004cea0 in kmem_cache_create (name=0x1546c0
"sram_piece_cache", size=<value optimized out>, align=0,
flags=262144,
ctor=0) at mm/slab.c:2338
#9 0x001aaad2 in bfin_sram_init () at arch/blackfin/mm/sram-alloc.c:217
#10 0x00001040 in do_one_initcall (fn=0x1aaab4 <bfin_sram_init>) at
init/main.c:753
#11 0x001a04ba in do_initcalls () at init/main.c:793
#12 0x001a04e8 in do_basic_setup () at init/main.c:815
#13 0x001a06f2 in kernel_init (unused=<value optimized out>) at
init/main.c:911
#14 0x0000160e in kernel_thread_helper ()
at
/home/adam/workspace/local_svn/kernel/linux-kernel/arch/blackfin/include/asm/thread_info.h:75
--- Yi Li 2009-12-04 05:17:59
I've checked in a fix (revision #7934) that adds smp_wmb() and smp_rmb() to
smp_call_function() and ipi_call_function() to fix this bug.
However, I think this is not a perfect fix. Since we don't have HW cache
coherence, every "for_each_online_cpu(i) { ... }" is potentially
dangerous.
See the scenario below:
/* Always called with the cache_chain_mutex held */
static int do_tune_cpucache(struct kmem_cache *cachep, int limit,
				int batchcount, int shared, gfp_t gfp)
{
	struct ccupdate_struct *new;
	int i;

	new = kzalloc(sizeof(*new), gfp);
	if (!new)
		return -ENOMEM;

	for_each_online_cpu(i) {
		new->new[i] = alloc_arraycache(cpu_to_node(i), limit,
						batchcount, gfp);
		if (!new->new[i]) {
			for (i--; i >= 0; i--)
				kfree(new->new[i]);
			kfree(new);
			return -ENOMEM;
		}
	}
	new->cachep = cachep;

	/* 1. execution on core A */
	on_each_cpu(do_ccupdate_local, (void *)new, 1);
	/* 2. D-cache on core A is invalidated, but D-cache on core B is not.
	 * However, core A changed new->new[0], so new->new[0] now differs
	 * from the copy cached on core B.
	 */
	check_irq_on();
	cachep->batchcount = batchcount;
	cachep->limit = limit;
	cachep->shared = shared;

	/* 3. For some reason, this task is migrated to core B (in this code,
	   this will not happen) */
	for_each_online_cpu(i) {
		/* 4. core B wrongly reads new->new[0] from its own cache */
		struct array_cache *ccold = new->new[i];
		if (!ccold)
			continue;
		spin_lock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
		free_block(cachep, ccold->entry, ccold->avail, cpu_to_node(i));
		spin_unlock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
		kfree(ccold);
	}

#ifdef CONFIG_SMP
/*
 * Call a function on all processors
 */
int on_each_cpu(void (*func) (void *info), void *info, int wait)
{
	int ret = 0;

	preempt_disable();
	ret = smp_call_function(func, info, wait);
	local_irq_disable();
	func(info);
	local_irq_enable();
	preempt_enable();
	return ret;
}
EXPORT_SYMBOL(on_each_cpu);
#endif
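
For reference, the barrier pairing between smp_call_function() and ipi_call_function() can be sketched in portable C11. This is not the code from revision #7934: the mailbox structure and all toy_* names are hypothetical, and C11 release/acquire fences stand in for smp_wmb()/smp_rmb(). Note the assumption that fences only order memory accesses; on hardware without cache coherence, as described in this ticket, the barriers must additionally make the caches consistent.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical mailbox standing in for the IPI call data. */
struct toy_call_data {
	void (*func)(void *info);
	void *info;
	bool wait;
	atomic_bool pending;	/* set by the caller, cleared by the handler */
	atomic_bool done;	/* set by the handler when wait is requested */
};

static struct toy_call_data mailbox;

/* Caller side: publish func/info, then raise the "IPI". */
static void toy_smp_call_function(void (*func)(void *), void *info, bool wait)
{
	mailbox.func = func;
	mailbox.info = info;
	mailbox.wait = wait;
	atomic_store_explicit(&mailbox.done, false, memory_order_relaxed);
	/* Write barrier: the stores above are ordered before 'pending'. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&mailbox.pending, true, memory_order_relaxed);
}

/* Handler side: returns true if a pending call was run. */
static bool toy_ipi_call_function(void)
{
	if (!atomic_load_explicit(&mailbox.pending, memory_order_relaxed))
		return false;
	/* Read barrier: func/info are read only after 'pending' was seen. */
	atomic_thread_fence(memory_order_acquire);
	assert(mailbox.func != NULL);
	mailbox.func(mailbox.info);
	atomic_store_explicit(&mailbox.pending, false, memory_order_relaxed);
	if (mailbox.wait) {
		/* Publish the callback's side effects before reporting completion. */
		atomic_thread_fence(memory_order_release);
		atomic_store_explicit(&mailbox.done, true, memory_order_relaxed);
	}
	return true;
}

/* Caller side, wait path: true once the handler has finished. */
static bool toy_call_finished(void)
{
	if (!atomic_load_explicit(&mailbox.done, memory_order_relaxed))
		return false;
	/* Read barrier: the callback's writes are read only after 'done'. */
	atomic_thread_fence(memory_order_acquire);
	return true;
}

/* Example callback: increments the int that info points to. */
static void toy_bump(void *info)
{
	*(int *)info += 1;
}
```

The wait path is the part relevant to this bug: the caller must not read data modified by the callback (here, the counter passed to toy_bump) until toy_call_finished() has returned true.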
--- Yi Li 2009-12-04 05:24:03
In the above comment I said "Since we don't have HW cache
coherence, every "for_each_online_cpu(i) { ... }" is potentially
dangerous."
I would change that to "Since we don't have HW cache
coherence, any access to shared data without the protection of a spinlock is
potentially dangerous".
--- Yi Li 2009-12-30 06:41:07
Fixed this bug by forcefully invalidating the D-cache on each CPU. However, on
Blackfin, we assume cache coherency only for shared data protected by
spin_lock() (since spin_lock() invalidates the whole D-cache). Code that does
not follow this rule is potentially dangerous on Blackfin.
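
The locking rule above can be illustrated with a toy model. This is a sketch only: it takes from this ticket the assumption that acquiring the lock invalidates the whole local D-cache, and every name in it (toy_load, toy_store, toy_spin_trylock, toy_locked_load) is hypothetical rather than a kernel API.

```c
#include <assert.h>
#include <stdbool.h>

#define NCORES 2

static int  shared_word;		/* the value held in SDRAM */
static int  cached_word[NCORES];	/* each core's private D-cache copy */
static bool cached_valid[NCORES];
static int  lock_owner = -1;		/* -1 means unlocked */

/* Unprotected read: served from the private cache when a copy exists. */
static int toy_load(int core)
{
	assert(core >= 0 && core < NCORES);
	if (!cached_valid[core]) {
		cached_word[core] = shared_word;
		cached_valid[core] = true;
	}
	return cached_word[core];
}

/* Write-through store: updates the writer's copy and SDRAM only. */
static void toy_store(int core, int value)
{
	assert(core >= 0 && core < NCORES);
	cached_word[core] = value;
	cached_valid[core] = true;
	shared_word = value;
}

/* Assumption from the ticket: taking the lock invalidates the local D-cache. */
static bool toy_spin_trylock(int core)
{
	if (lock_owner != -1)
		return false;
	lock_owner = core;
	cached_valid[core] = false;
	return true;
}

static void toy_spin_unlock(int core)
{
	assert(lock_owner == core);
	lock_owner = -1;
}

/* Read under the lock: always refills from SDRAM, so it is never stale. */
static int toy_locked_load(int core)
{
	while (!toy_spin_trylock(core))
		;	/* spin; never loops in a single-threaded demo */
	int value = toy_load(core);
	toy_spin_unlock(core);
	return value;
}
```

In this model a plain toy_load() on one core can keep returning an old value after the other core has stored a new one, while toy_locked_load() returns the value currently in SDRAM.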