[#5874] malloc performance frequently fails on bf561 SMP kernel

Document created by Aaronwu Employee on Sep 10, 2013

Submitted By: Mingquan Pan
Open Date: 2010-01-28 23:18:07
Close Date: 2010-02-11 03:51:10
Priority: Medium
Assignee: Graf Yang
Status: Closed
Fixed In Release: N/A
Found In Release: 2010R1
Release:
Category: N/A
Board: N/A
Processor: BF561
Silicon Revision:
Is this bug repeatable?: Yes
Resolution: Rejected
Uboot version or rev.:
Toolchain version or rev.: 4.3.4 (ADI-trunk/svn-3771)
App binary format: N/A

Summary: malloc performance frequently fails on bf561 SMP kernel

Details:

 

The malloc performance test (malloc-perf) now frequently fails on the bf561 SMP kernel.

 

Linux version 2.6.32.2-ADI-2010R1-pre-svn8124 (test@uclinux65-561-SMP) (gcc version 4.3.4 (ADI-trunk/svn-3771) ) #40 SMP Thu Jan 7 11:54:37 GMT 2010

register early platform devices

bootconsole [early_shadow0] enabled

bootconsole [early_BFuart0] enabled

early printk enabled on early_BFuart0

Board Memory: 64MB

Kernel Managed Memory: 64MB

Memory map:

  fixedcode = 0x00000400-0x00000490

  text      = 0x00001000-0x0010c010

  rodata    = 0x0010c020-0x0015ebf0

  bss       = 0x0015f000-0x001714c8

  data      = 0x001714e0-0x00182000

    stack   = 0x00180000-0x00182000

  init      = 0x00182000-0x006d6000

  available = 0x006d6000-0x03f00000

  DMA Zone  = 0x03f00000-0x04000000

Hardware Trace Active and Enabled

Boot Mode: 0

Reset caused by Software reset

Blackfin support (C) 2004-2009 Analog Devices, Inc.

Compiled for ADSP-BF561 Rev 0.5

Blackfin Linux support by http://blackfin.uclinux.org/

Processor Speed: 600 MHz core clock and 100 MHz System Clock

NOMPU: setting up cplb tables

NOMPU: setting up cplb tables

Instruction Cache Enabled for CPU0

  External memory: cacheable in instruction cache

  L2 SRAM        : uncacheable in instruction cache

Data Cache Enabled for CPU0

  External memory: cacheable (write-through) in data cache

  L2 SRAM        : uncacheable in data cache

Built 1 zonelists in Zone order, mobility grouping off.  Total pages: 16002

Kernel command line: root=/dev/mtdblock0 rw ip=10.100.4.50 earlyprintk=serial,uart0,57600 console=ttyBF0,57600 ip=10.100.4.50:10.100.4.174:10.100.4.174:255.255.255.0:bf561-ezkit:eth0:off

PID hash table entries: 256 (order: -2, 1024 bytes)

Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)

Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)

Memory available: 56916k/65536k RAM, (5456k init code, 1068k kernel code, 472k data, 1024k dma, 600k reserved)

Hierarchical RCU implementation.

NR_IRQS:121

Configuring Blackfin Priority Driven Interrupts

console [ttyBF0] enabled, bootconsole disabled

console [ttyBF0] enabled, bootconsole disabled

Calibrating delay loop... 1187.84 BogoMIPS (lpj=2375680)

Mount-cache hash table entries: 512

CoreB bootstrap code to SRAM ff600000 via DMA.

Booting Core B.

Instruction Cache Enabled for CPU1

  External memory: cacheable in instruction cache

  L2 SRAM        : uncacheable in instruction cache

Data Cache Enabled for CPU1

  External memory: cacheable (write-through) in data cache

  L2 SRAM        : uncacheable in data cache

Brought up 2 CPUs

Calibrating delay loop...

SMP: Total of 2 processors activated (4.09 BogoMIPS).

Blackfin Scratchpad data SRAM: 4 KB

Blackfin Scratchpad data SRAM: 4 KB

Blackfin L1 Data A SRAM: 16 KB (16 KB free)

Blackfin L1 Data A SRAM: 16 KB (16 KB free)

Blackfin L1 Data B SRAM: 16 KB (16 KB free)

Blackfin L1 Data B SRAM: 16 KB (16 KB free)

Blackfin L1 Instruction SRAM: 16 KB (15 KB free)

Blackfin L1 Instruction SRAM: 16 KB (15 KB free)

Blackfin L2 SRAM: 128 KB (127 KB free)

NET: Registered protocol family 16

Blackfin DMA Controller

ezkit_init(): registering device resources

bio: create slab <bio-0> at 0

Switching to clocksource jiffies

NET: Registered protocol family 2

IP route cache hash table entries: 1024 (order: 0, 4096 bytes)

TCP established hash table entries: 2048 (order: 2, 16384 bytes)

TCP bind hash table entries: 2048 (order: 2, 16384 bytes)

TCP: Hash tables configured (established 2048 bind 2048)

2138.11 BogoMIPS (lpj=4276224)

TCP reno registered

NET: Registered protocol family 1

msgmni has been set to 111

io scheduler noop registered

io scheduler anticipatory registered (default)

bfin-uart: Blackfin serial driver

bfin-uart.0: ttyBF0 at MMIO 0xffc00400 (irq = 35) is a BFIN-UART

brd: module loaded

bfin-spi bfin-spi.0: Blackfin on-chip SPI Controller Driver, Version 1.0, regs_base@ffc00500, dma channel@16

smc91x.c: v1.1, sep 22 2004 by Nicolas Pitre <nico@fluxnic.net>

eth0: SMC91C11xFD (rev 2) at 2c010300 IRQ 82 [nowait]

eth0: Ethernet addr: 00:e0:22:fe:ba:2a

bfin-wdt: initialized: timeout=20 sec (nowayout=0)

TCP cubic registered

NET: Registered protocol family 17

eth0: link down

IP-Config: Complete:

     device=eth0, addr=10.100.4.50, mask=255.255.255.0, gw=10.100.4.174,

     host=bf561-ezkit, domain=, nis-domain=(none),

     bootserver=10.100.4.174, rootserver=10.100.4.174, rootpath=

Freeing unused kernel memory: 5456k freed

dma_alloc_init: dma_page @ 0x02786000 - 256 pages at 0x03f00000

eth0: link up, 100Mbps, full-duplex, lpa 0x41E1

                           _____________________________________

        a8888b.           / Welcome to the uClinux distribution \

       d888888b.         /       _     _                         \

       8P"YP"Y88        /       | |   |_|            __  __ (TM)  |

       8|o||o|88  _____/        | |    _ ____  _   _ \ \/ /       |

       8'    .88       \        | |   | |  _ \| | | | \  /        |

       8`._.' Y8.       \       | |__ | | | | | |_| | /  \        |

      d/      `8b.       \      \____||_|_| |_|\____|/_/\_\       |

     dP   .    Y8b.       \   For embedded processors including   |

    d8:'  "  `::88b        \    the Analog Devices Blackfin      /

   d8"         'Y88b        \___________________________________/

  :8P    '      :888

   8a.   :     _a88P         For further information, check out:

._/"Yaa_:   .| 88P|            - http://blackfin.uclinux.org/

\    YP"    `| 8P  `.          - http://docs.blackfin.uclinux.org/

/     \.___.d|    .'           - http://www.uclinux.org/

`--..__)8888P`._.'  jgs/a:f    - http://www.analog.com/blackfin

 

Have a lot of fun...

 

 

BusyBox v1.15.3 (2010-01-07 04:40:08 GMT) hush - the humble shell

 

root:/> version

kernel:    Linux release 2.6.32.2-ADI-2010R1-pre-svn8124, build #40 SMP Thu Jan 7 11:54:37 GMT 2010

toolchain: bfin-uclinux-gcc release gcc version 4.3.4 (ADI-trunk/svn-3771)

user-dist: release svn-9347, build #461 Thu Jan 7 11:53:29 GMT 2010

root:/> successful boot attempt

************** STEP 3: Start testing.

 

uname -a

Linux blackfin 2.6.32.2-ADI-2010R1-pre-svn8124 #40 SMP Thu Jan 7 11:54:37 GMT 2010 blackfin GNU/Linux

root:/> malloc-perf 120

00004k : 0x02772004  000000 000000 000000

00008k : 0x020a0004  000000 000000 000000

00012k : 0x020b4004  000000 000031 004000

00016k : 0x02948004  000000 000000 000000

00020k : 0x02950004  000000 000000 000000

00024k : 0x02958004  000000 000000 000000

00028k : 0x02960004  000000 000000 000000

00032k : 0x02968004  000000 000000 000000

00036k : 0x02970004  000000 000062 004000

00040k : 0x02970004  000000 000093 004000

00044k : 0x02970004  000000 000031 004000

00048k : 0x02970004  000000 000031 004000

00052k : 0x02970004  000000 000031 004000

00056k : 0x02970004  000000 000031 004000

00060k : 0x02970004  000000 000031 004000

00064k : 0x02970004  000000 000000 000000

00068k : 0x02a00004  000000 000062 004000

00072k : 0x02a00004  000000 000062 004000

00076k : 0x02a00004  000000 000062 004000

00080k : 0x02a00004  000000 000031 004000

00084k : 0x02a00004  000000 000031 004000

00088k : 0x02a00004  000000 000062 004000

00092k : 0x02a00004  000000 000031 004000

00096k : 0x02a00004  000000 000031 004000

00100k : 0x02a00004  000000 000031 004000

00104k : 0x02a00004  000000 000062 004000

00108k : 0x02a00004  000000 000062 004000

00112k : 0x02a00004  000000 000031 004000

00116k : 0x02a00004  000000 000062 004000

00120k : 0x02a00004  000000 000062 004000

00124k : 0x02a00004  000000 000031 004000

00128k : 0x02a00004  000000 000031 004000

00256k : 0x02a00004  000000 000031 004000

00384k : 0x02a00004  000000 000125 004000

00512k : 0x02a00004  000000 000031 004000

00640k : 0x02a00004  000000 000125 004000

00768k : 0x02a00004  000000 000156 004000

00896k : 0x02a00004  000000 000125 004000

01024k : 0x02a00004  000000 000156 004000

TEST FAIL

 

Follow-ups

 

--- Graf Yang                                                2010-01-30 07:47:22

The SMP kernel's performance is a bit lower than the UP kernel's. If it needs 20%-30% more time to finish the malloc performance test, I think that is normal.

Can you point out roughly when the failures started to become frequent?

 

--- Mingquan Pan                                             2010-02-01 04:29:00

The log from Nov 27 shows a much better result: it passed 9 out of 10 times.

 

root:/> version

kernel:    Linux release 2.6.31.6-ADI-2010R1-pre-svn7883, build #105 SMP Fri Nov 27 09:24:48 GMT 2009

toolchain: bfin-uclinux-gcc release gcc version 4.3.4 (ADI-trunk/svn-3679)

user-dist: release svn-9185, build #1192 Fri Nov 27 09:23:32 GMT 2009

root:/> successful boot attempt

************** STEP 3: Start testing.

 

root:/> malloc-perf 120

00004k : 0x02bfb004  000000 000000 000000

00008k : 0x02b6e004  000000 000000 000000

00012k : 0x02bec004  000000 000000 000000

00016k : 0x020cc004  000000 000000 000000

00020k : 0x02ae8004  000000 000000 000000

00024k : 0x02a68004  000000 000000 000000

00028k : 0x02960004  000000 000000 000000

00032k : 0x029e8004  000000 000031 004000

00036k : 0x02a70004  000000 000031 004000

00040k : 0x02a70004  000000 000000 000000

00044k : 0x02a70004  000000 000062 004000

00048k : 0x02a70004  000000 000062 004000

00052k : 0x02a70004  000000 000062 004000

00056k : 0x02a70004  000000 000031 004000

00060k : 0x02a70004  000000 000000 000000

00064k : 0x02a70004  000000 000031 004000

00068k : 0x02840004  000000 000031 004000

00072k : 0x02840004  000000 000062 004000

00076k : 0x02840004  000000 000062 004000

00080k : 0x02840004  000000 000062 004000

00084k : 0x02840004  000000 000031 004000

00088k : 0x02840004  000000 000031 004000

00092k : 0x02840004  000000 000000 000000

00096k : 0x02840004  000000 000093 004000

00100k : 0x02840004  000000 000093 004000

00104k : 0x02840004  000000 000031 004000

00108k : 0x02840004  000000 000062 004000

00112k : 0x02840004  000000 000093 004000

00116k : 0x02840004  000000 000093 004000

00120k : 0x02840004  000000 000000 000000

00124k : 0x02840004  000000 000000 000000

00128k : 0x02840004  000000 000000 000000

00256k : 0x02800004  000000 000062 004000

00384k : 0x02c00004  000000 000000 000000

00512k : 0x02c00004  000000 000125 004000

00640k : 0x02c00004  000000 000187 004000

00768k : 0x02c00004  000000 000000 000000

00896k : 0x02c00004  000000 000062 004000

01024k : 0x02c00004  000000 000000 000000

TEST PASS

 

--- Graf Yang                                                2010-02-01 22:43:23

I found that the new SMP kernel has lower malloc performance than the old one (2009R1, for example). I will dig into the reason.

 

--- Graf Yang                                                2010-02-03 06:26:16

Not a bug.

Since the kernel was updated to 2.6.32, the function prep_new_page(struct page *page, int order, gfp_t gfp_flags) checks every page of the allocation rather than only the first page. As a result, malloc-perf takes double the time when allocating memory.

So I suggest doubling the threshold to 200:

 

malloc-perf 200

 

BTW, this test needs at least one CLOCKSOURCE option enabled.
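
The per-page check described above can be illustrated with a toy model. This is a sketch in Python, not kernel code: `pages_checked` and `order_for_kb` are hypothetical helper names, the 4 KB page size is inferred from the boot log (64 MB, about 16000 pages), and it assumes each request is served from a power-of-two block of 2**order pages. It only counts how many pages would be checked per allocation and says nothing about the real cost of each check.

```python
def order_for_kb(size_kb: int, page_kb: int = 4) -> int:
    """Smallest order whose 2**order pages cover size_kb (assumed 4 KB pages)."""
    pages = -(-size_kb // page_kb)  # ceiling division
    order = 0
    while (1 << order) < pages:
        order += 1
    return order


def pages_checked(order: int, check_every_page: bool) -> int:
    """Pages checked for one allocation of 2**order pages.

    check_every_page=False models the older behaviour described in the
    comment (only the first page is checked); True models the behaviour
    described for 2.6.32 (every page in the block is checked).
    """
    if order < 0:
        raise ValueError("order must be non-negative")
    return (1 << order) if check_every_page else 1
```

For the largest size in the test, 1024k, `order_for_kb(1024)` gives order 8, so the model counts 256 checks per allocation when every page is checked and 1 when only the first page is checked.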

 

--- Graf Yang                                                2010-02-03 21:16:13

With the current kernel, a threshold of 180 for SMP and 90 for UP should pass the test.

 

--- Mingquan Pan                                             2010-02-11 03:50:33

After increasing the parameter, it passes now.

 

root:/> malloc-perf 180

00004k : 0x02775004  000003 000004 000046

00008k : 0x027ea004  000003 000004 000040

00012k : 0x0289c004  000003 000004 000039

00016k : 0x027ea004  000003 000004 000046

00020k : 0x027ea004  000003 000003 000006

00024k : 0x027ea004  000003 000004 000030

00028k : 0x028c0004  000004 000004 000054

00032k : 0x028c8004  000031 000033 000086

00036k : 0x028d0004  000034 000037 000105

00040k : 0x028d0004  000034 000037 000097

00044k : 0x028d0004  000035 000037 000070

00048k : 0x028d0004  000034 000038 000108

00052k : 0x028d0004  000034 000046 001117

00056k : 0x028d0004  000034 000037 000103

00060k : 0x028d0004  000035 000037 000092

00064k : 0x028d0004  000035 000037 000106

00068k : 0x028e0004  000041 000044 000104

00072k : 0x028e0004  000041 000045 000193

00076k : 0x028e0004  000041 000046 000110

00080k : 0x028e0004  000041 000045 000111

00084k : 0x028e0004  000041 000044 000113

00088k : 0x028e0004  000041 000044 000101

00092k : 0x028e0004  000040 000044 000102

00096k : 0x028e0004  000040 000043 000113

00100k : 0x028e0004  000041 000045 000115

00104k : 0x028e0004  000041 000044 000098

00108k : 0x028e0004  000041 000045 000118

00112k : 0x028e0004  000041 000045 000110

00116k : 0x028e0004  000041 000045 000112

00120k : 0x028e0004  000041 000045 000138

00124k : 0x028e0004  000041 000045 000103

00128k : 0x028e0004  000041 000045 000104

00256k : 0x02a00004  000055 000068 001129

00384k : 0x02a00004  000081 000091 000164

00512k : 0x02a00004  000081 000091 000169

00640k : 0x02a00004  000134 000145 000206

00768k : 0x02a00004  000134 000148 000214

00896k : 0x02a00004  000134 000150 000243

01024k : 0x02a00004  000134 000149 000276

TEST PASS

root:/> malloc-perf pass
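
To compare runs like the ones in this report, the malloc-perf output rows can be parsed mechanically. The following is a minimal sketch in Python, with hypothetical names (`PerfRow`, `parse_row`). The report does not say what the three numeric columns mean, so they are kept as an unnamed tuple rather than labelled. The parser also tolerates a trailing carriage return or a literal `^M` marker, as can appear in captured serial logs.

```python
import re
from typing import NamedTuple, Optional, Tuple


class PerfRow(NamedTuple):
    size_kb: int                   # allocation size from the first column
    address: int                   # address printed for the allocation
    values: Tuple[int, int, int]   # three numeric columns, meaning not documented here


ROW_RE = re.compile(
    r"^(\d+)k\s*:\s*0x([0-9a-fA-F]+)\s+(\d+)\s+(\d+)\s+(\d+)$"
)


def parse_row(line: str) -> Optional[PerfRow]:
    """Parse one malloc-perf output row; return None for non-data lines."""
    line = line.strip()
    if line.endswith("^M"):        # literal carriage-return marker in some logs
        line = line[:-2].rstrip()
    m = ROW_RE.match(line)
    if m is None:
        return None
    values = (int(m.group(3)), int(m.group(4)), int(m.group(5)))
    return PerfRow(int(m.group(1)), int(m.group(2), 16), values)
```

Lines such as `TEST PASS`, shell prompts, and blank lines return `None`, so a whole captured log can be fed through `parse_row` line by line and filtered.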

 

 

 

 
