[#5745] An SMP cache coherence issue may lead to kernel boot failure

Document created by Aaronwu Employee on Sep 5, 2013

Submitted By: Yi Li
Open Date: 2009-12-04 01:37:17
Priority: Medium
Assignee: Yi Li
Status: Open
Fixed In Release: N/A
Found In Release: 2010R1
Release:
Category: N/A
Board: N/A
Processor: BF561
Silicon Revision:
Is this bug repeatable?: Yes
Resolution: Fixed
Uboot version or rev.:
Toolchain version or rev.: 2009R1-rc10
App binary format: N/A

Summary: An SMP cache coherence issue may lead to kernel boot failure

Details:

 

When I tested an SMP patch, the kernel sometimes failed to boot. The kernel triggers an exception while initializing the slab allocator, in do_tune_cpucache() in mm/slab.c.

I found that we need smp_rmb() in ipi_call_function() and smp_wmb() in smp_call_function() to ensure cache coherence on both cores when smp_call_function() is called with wait set.

Without these barriers, the code below can expose a cache coherence bug in kmem_cache_create() -> do_tune_cpucache(), which sometimes makes the kernel fail to boot.

 

mm/slab.c:

/* Always called with the cache_chain_mutex held */
static int do_tune_cpucache(struct kmem_cache *cachep, int limit,
                                int batchcount, int shared, gfp_t gfp)
{
        struct ccupdate_struct *new;
        int i;

        new = kzalloc(sizeof(*new), gfp);
        if (!new)
                return -ENOMEM;

        for_each_online_cpu(i) {
                new->new[i] = alloc_arraycache(cpu_to_node(i), limit,
                                                batchcount, gfp);
                if (!new->new[i]) {
                        for (i--; i >= 0; i--)
                                kfree(new->new[i]);
                        kfree(new);
                        return -ENOMEM;
                }
        }
        new->cachep = cachep;

        /* Yi: on each CPU, call do_ccupdate_local(): core A will change
         * new->new[0], core B will change new->new[1] */
        on_each_cpu(do_ccupdate_local, (void *)new, 1);

        check_irq_on();
        cachep->batchcount = batchcount;
        cachep->limit = limit;
        cachep->shared = shared;

        for_each_online_cpu(i) {
                /* Yi: here, if core A reads new->new[1], it may read from its
                 * D-cache instead of SDRAM: a cache coherence issue */
                struct array_cache *ccold = new->new[i];
                if (!ccold)
                        continue;

                spin_lock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
                free_block(cachep, ccold->entry, ccold->avail, cpu_to_node(i));
                spin_unlock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
                kfree(ccold);
        }
        kfree(new);
        return alloc_kmemlist(cachep, gfp);
}
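
The stale read described in the comments above can be reproduced with a small user-space model. This is only a sketch under stated assumptions: struct sim, the sim_* helpers, and replay_stale_read() are hypothetical names invented for illustration, and the model (one shared SDRAM array plus one private write-through cache per core, with no hardware coherence) is a simplification, not the BF561 cache implementation.

```c
#include <assert.h>
#include <string.h>

#define NCORES 2
#define NWORDS 4   /* size of the toy shared memory, in words */

/* Toy model (an assumption for illustration): one shared SDRAM and one
 * private write-through data cache per core, with no hardware coherence. */
struct sim {
    int sdram[NWORDS];
    int cache[NCORES][NWORDS];
    int valid[NCORES][NWORDS];  /* 1 if the cached copy may be used */
};

/* Write-through store: updates the writer's cache and SDRAM, but leaves
 * the other core's cache untouched. */
static void sim_write(struct sim *s, int core, int addr, int val)
{
    assert(addr >= 0 && addr < NWORDS);
    s->cache[core][addr] = val;
    s->valid[core][addr] = 1;
    s->sdram[addr] = val;
}

/* Load: a hit returns the cached copy, even if SDRAM has changed since. */
static int sim_read(struct sim *s, int core, int addr)
{
    assert(addr >= 0 && addr < NWORDS);
    if (!s->valid[core][addr]) {
        s->cache[core][addr] = s->sdram[addr];
        s->valid[core][addr] = 1;
    }
    return s->cache[core][addr];
}

/* Drop every line in one core's cache so the next load goes to SDRAM. */
static void sim_invalidate(struct sim *s, int core)
{
    memset(s->valid[core], 0, sizeof(s->valid[core]));
}

/* Replays the sequence from the listing: core A (0) fills slot 1, core B (1)
 * overwrites it, then core A reads it back. Returns the value core A sees.
 * If invalidate_first is nonzero, core A invalidates its cache before reading. */
int replay_stale_read(int invalidate_first)
{
    struct sim s;

    memset(&s, 0, sizeof(s));
    sim_write(&s, 0, 1, 100);   /* core A: new->new[1] = newly allocated array */
    sim_write(&s, 1, 1, 200);   /* core B: swaps the old array into new->new[1] */
    if (invalidate_first)
        sim_invalidate(&s, 0);
    return sim_read(&s, 0, 1);
}
```

In this model, replay_stale_read(0) returns the value core A wrote itself (100), while replay_stale_read(1) returns core B's update (200). In the real code the stale value would be a pointer to an array_cache that is now in use, which is then passed to free_block() and kfree().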

 

 

------------------------

Kernel failed to boot:

 

Starting Kernel at = 001aae7c
Linux version 2.6.31.6-ADI-2010R1-pre-svn7910 (adam@adam-laptop) (gcc version 4.1.2 (ADI svn)) #73 SMP Fri Dec 4 11:03:15 CST 2009
register early platform devices
bootconsole [early_shadow0] enabled
bootconsole [early_BFuart0] enabled
early printk enabled on early_BFuart0
Limiting kernel memory to 56MB due to anomaly 05000263
Board Memory: 64MB
Kernel Managed Memory: 64MB
Memory map:
  fixedcode = 0x00000400-0x00000490
  text      = 0x00001000-0x0011f790
  rodata    = 0x0011f7a0-0x0017a788
  bss       = 0x0017b000-0x0018d488
  data      = 0x0018d4a0-0x001a0000
    stack   = 0x0019e000-0x001a0000
  init      = 0x001a0000-0x007ac000
  available = 0x007ac000-0x03800000
  DMA Zone  = 0x03f00000-0x04000000
Hardware Trace Active and Enabled
Boot Mode: 0
Blackfin support (C) 2004-2009 Analog Devices, Inc.
Compiled for ADSP-BF561 Rev 0.3
Warning: Compiled for Rev 3, but running on Rev 5
Blackfin Linux support by http://blackfin.uclinux.org/
Processor Speed: 600 MHz core clock and 100 MHz System Clock
NOMPU: setting up cplb tables
NOMPU: setting up cplb tables
Instruction Cache Enabled for CPU0
  External memory: cacheable in instruction cache
  L2 SRAM        : uncacheable in instruction cache
Data Cache Enabled for CPU0
  External memory: cacheable (write-through) in data cache
  L2 SRAM        : uncacheable in data cache
Built 1 zonelists in Zone order, mobility grouping off.  Total pages: 14224
Kernel command line: root=/dev/mtdblock0 rw clkin_hz=30000000 earlyprintk=serial,uart0,57600 console=ttyBF0,57600 ip=192.168.0.7:192.168.0.3:192.168.0.1:255.255.255.0:bf561-ezkit:eth0:off
PID hash table entries: 256 (order: 8, 1024 bytes)
Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)
Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
Memory available: 48892k/65536k RAM, (6192k init code, 1145k kernel code, 515k data, 1024k dma, 7768k reserved)
NR_IRQS:121
Configuring Blackfin Priority Driven Interrupts
console [ttyBF0] enabled, bootconsole disabled
console [ttyBF0] enabled, bootconsole disabled
Calibrating delay loop... 1191.93 BogoMIPS (lpj=2383872)
Security Framework initialized
Mount-cache hash table entries: 512
CoreB bootstrap code to SRAM ff600000 via DMA.
Booting Core B.
Instruction Cache Enabled for CPU1
  External memory: cacheable in instruction cache
  L2 SRAM        : uncacheable in instruction cache
Data Cache Enabled for CPU1
  External memory: cacheable (write-through) in data cache
  L2 SRAM        : uncacheable in data cache
Calibrating delay loop...
Brought up 2 CPUs
SMP: Total of 2 processors activated (4.09 BogoMIPS).

 

 

 

Follow-ups

 

--- Yi Li                                                    2009-12-04 01:40:03

GDB shows that an exception occurs on core A:

 

(gdb) info threads
  2 Thread 2 (Core B DBGSTAT [0x0050])  0x0000be72 in get_core_lock () at include/linux/interrupt.h:118
* 1 Thread 1 (Core A DBGSTAT [0x0058])  0x0000be70 in get_core_lock () at include/linux/interrupt.h:118
(gdb) thread 2
[Switching to thread 2 (Thread 2)]#0  0x0000be72 in get_core_lock () at include/linux/interrupt.h:118
118    {
(gdb) bt
#0  0x0000be72 in get_core_lock () at include/linux/interrupt.h:118
#1  0x0002d30e in hrtimer_run_pending () at kernel/hrtimer.c:1415
#2  0x00020c6a in run_timer_softirq (h=<value optimized out>) at kernel/timer.c:1175
#3  0x0001cfd4 in __do_softirq () at kernel/softirq.c:219
#4  0x0001d2a4 in irq_exit () at kernel/softirq.c:303
#5  0x00005c82 in asm_do_IRQ (irq=6, regs=<value optimized out>) at arch/blackfin/kernel/irqchip.c:134
#6  0x0000b138 in do_irq (vec=<value optimized out>, fp=0xfffffffe) at arch/blackfin/mach-common/ints-priority.c:1179
#7  0x0000ab94 in _common_int_entry ()
(gdb) thread 1
[Switching to thread 1 (Thread 1)]#0  0x0000be70 in get_core_lock () at include/linux/interrupt.h:118
118    {
(gdb) bt
#0  0x0000be70 in get_core_lock () at include/linux/interrupt.h:118
#1  0x00018edc in vprintk (fmt=0x14f990 "<5>NULL pointer access\n", args=0x202bce4) at kernel/printk.c:683
#2  0x000191a2 in printk (fmt=0x3f <garbage bytes elided>...) at kernel/printk.c:588
#3  0x0000572a in trap_c (fp=0x202bd94) at arch/blackfin/kernel/traps.c:602
#4  0x0000a5c4 in exception_to_level5 ()
#5  0x0004c826 in do_tune_cpucache (cachep=0x8008, limit=<value optimized out>, batchcount=<value optimized out>, shared=8, gfp=39) at mm/slab.c:3927
#6  0x0004c9e6 in enable_cpucache (cachep=0x2003d40, gfp=208) at mm/slab.c:3983
#7  0x0011cf4c in setup_cpu_cache (cachep=0x2003d40, gfp=208) at mm/slab.c:2006
#8  0x0004cea0 in kmem_cache_create (name=0x1546c0 "sram_piece_cache", size=<value optimized out>, align=0, flags=262144, ctor=0) at mm/slab.c:2338
#9  0x001aaad2 in bfin_sram_init () at arch/blackfin/mm/sram-alloc.c:217
#10 0x00001040 in do_one_initcall (fn=0x1aaab4 <bfin_sram_init>) at init/main.c:753
#11 0x001a04ba in do_initcalls () at init/main.c:793
#12 0x001a04e8 in do_basic_setup () at init/main.c:815
#13 0x001a06f2 in kernel_init (unused=<value optimized out>) at init/main.c:911
#14 0x0000160e in kernel_thread_helper () at /home/adam/workspace/local_svn/kernel/linux-kernel/arch/blackfin/include/asm/thread_info.h:75

 

--- Yi Li                                                    2009-12-04 05:17:59

I've checked in a fix (revision #7934) that adds smp_wmb() and smp_rmb() to smp_call_function() and ipi_call_function() to fix this bug.
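
For readers unfamiliar with the write-barrier/read-barrier pairing, here is a generic user-space sketch in C11. It is an analogy only, not the code from revision #7934: struct mailbox and the mailbox_* functions are hypothetical names, and the C11 release/acquire fences only approximate smp_wmb()/smp_rmb(). On a machine without hardware cache coherence, ordering alone is not sufficient and the barriers must also make the data visible across cores.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical one-shot mailbox standing in for the message that the
 * calling core hands to the other core. */
struct mailbox {
    int payload;        /* plain data, written before the flag is raised */
    atomic_int ready;   /* 0 = empty, 1 = payload published */
};

void mailbox_init(struct mailbox *m)
{
    m->payload = 0;
    atomic_init(&m->ready, 0);
}

/* Sender side (the role smp_call_function() plays in the report):
 * write the data, issue a write barrier, then raise the flag. */
void mailbox_publish(struct mailbox *m, int value)
{
    m->payload = value;
    atomic_thread_fence(memory_order_release);   /* rough analogue of smp_wmb() */
    atomic_store_explicit(&m->ready, 1, memory_order_relaxed);
}

/* Receiver side (the role ipi_call_function() plays in the report):
 * check the flag, issue a read barrier, then read the data.
 * Returns false and leaves *out untouched if nothing was published. */
bool mailbox_poll(struct mailbox *m, int *out)
{
    if (atomic_load_explicit(&m->ready, memory_order_relaxed) == 0)
        return false;
    atomic_thread_fence(memory_order_acquire);   /* rough analogue of smp_rmb() */
    *out = m->payload;
    return true;
}
```

The point of the pairing is that the barrier on the sending side sits between the data write and the flag write, and the barrier on the receiving side sits between the flag read and the data read. Dropping either one lets the receiver observe the flag without the data.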

 

However, I think this is not a perfect fix. Since we don't have hardware cache coherence, every "for_each_online_cpu(i) { ... }" loop is potentially dangerous.

See the scenario below:

 

 

/* Always called with the cache_chain_mutex held */
static int do_tune_cpucache(struct kmem_cache *cachep, int limit,
                                int batchcount, int shared, gfp_t gfp)
{
        struct ccupdate_struct *new;
        int i;

        new = kzalloc(sizeof(*new), gfp);
        if (!new)
                return -ENOMEM;

        for_each_online_cpu(i) {
                new->new[i] = alloc_arraycache(cpu_to_node(i), limit,
                                                batchcount, gfp);
                if (!new->new[i]) {
                        for (i--; i >= 0; i--)
                                kfree(new->new[i]);
                        kfree(new);
                        return -ENOMEM;
                }
        }
        new->cachep = cachep;

        /* 1. Execution starts on core A */

        on_each_cpu(do_ccupdate_local, (void *)new, 1);

        /* 2. The D-cache on core A is invalidated, but the D-cache on core B
         * is not. Core A has changed new->new[0], so new->new[0] now differs
         * from the copy cached on core B.
         */

        check_irq_on();
        cachep->batchcount = batchcount;
        cachep->limit = limit;
        cachep->shared = shared;

        /* 3. For some reason, this task is migrated to core B (in this code,
         * this will not happen) */

        for_each_online_cpu(i) {

                /* 4. Core B wrongly reads new->new[0] from its cache */

                struct array_cache *ccold = new->new[i];
                if (!ccold)
                        continue;
                spin_lock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
                free_block(cachep, ccold->entry, ccold->avail, cpu_to_node(i));
                spin_unlock_irq(&cachep->nodelists[cpu_to_node(i)]->list_lock);
                kfree(ccold);
        }

        /* remainder of do_tune_cpucache() as in the listing above */


#ifdef CONFIG_SMP
/*
 * Call a function on all processors
 */
int on_each_cpu(void (*func) (void *info), void *info, int wait)
{
        int ret = 0;

        preempt_disable();
        ret = smp_call_function(func, info, wait);
        local_irq_disable();
        func(info);
        local_irq_enable();
        preempt_enable();
        return ret;
}
EXPORT_SYMBOL(on_each_cpu);
#endif

 

--- Yi Li                                                    2009-12-04 05:24:03

In the comment above I said: "Since we don't have HW cache coherence, every "for_each_online_cpu(i) { ... }" is potentially dangerous."

I would change that to: "Since we don't have HW cache coherence, any access to shared data without the protection of a spinlock is potentially dangerous".

 

--- Yi Li                                                    2009-12-30 06:41:07

Fixed this bug by forcibly invalidating the D-cache on each CPU. However, on Blackfin we assume cache coherency by using spin_lock() to protect shared data (since spin_lock() invalidates the whole D-cache). Code that does not follow this rule is potentially dangerous on Blackfin.
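
The rule above (shared data is only safe when accessed under a lock whose acquire path invalidates the local D-cache) can be illustrated with a single-threaded toy model. Everything here is an assumption made for illustration: struct machine, model_spin_lock(), model_spin_unlock(), and count_up() are hypothetical names, the caches are modeled as write-through with no hardware coherence, and the "lock" never contends because the two cores are simulated in turn.

```c
#include <assert.h>
#include <string.h>

#define NCORES 2
#define NWORDS 4

/* Toy machine: shared SDRAM, one private write-through cache per core. */
struct machine {
    int sdram[NWORDS];
    int cache[NCORES][NWORDS];
    int valid[NCORES][NWORDS];
    int lock_owner;             /* -1 when the modeled lock is free */
};

static void store(struct machine *m, int core, int addr, int val)
{
    m->cache[core][addr] = val;
    m->valid[core][addr] = 1;
    m->sdram[addr] = val;       /* write-through; other cache is not updated */
}

static int load(struct machine *m, int core, int addr)
{
    if (!m->valid[core][addr]) {
        m->cache[core][addr] = m->sdram[addr];
        m->valid[core][addr] = 1;
    }
    return m->cache[core][addr];
}

/* Modeled lock: taking it invalidates the caller's whole cache, which is
 * the behaviour the report attributes to spin_lock() on Blackfin. */
static void model_spin_lock(struct machine *m, int core)
{
    assert(m->lock_owner == -1);    /* cores run in turn, so never contended */
    m->lock_owner = core;
    memset(m->valid[core], 0, sizeof(m->valid[core]));
}

static void model_spin_unlock(struct machine *m, int core)
{
    assert(m->lock_owner == core);
    m->lock_owner = -1;
}

/* Both cores take turns incrementing a shared counter at address 0.
 * Returns the final value in SDRAM. */
int count_up(int rounds, int use_lock)
{
    struct machine m;

    memset(&m, 0, sizeof(m));
    m.lock_owner = -1;
    for (int i = 0; i < rounds; i++) {
        for (int core = 0; core < NCORES; core++) {
            if (use_lock)
                model_spin_lock(&m, core);
            store(&m, core, 0, load(&m, core, 0) + 1);
            if (use_lock)
                model_spin_unlock(&m, core);
        }
    }
    return m.sdram[0];
}
```

With the modeled lock, count_up(3, 1) counts all six increments. Without it, each core keeps incrementing its own stale cached copy and count_up(3, 0) ends at 4, which is the kind of silent divergence the comment above warns about for code that touches shared data outside a spinlock.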

 

 

 
