[#5874] malloc performance frequently fails on bf561 SMP kernel
Submitted By: Mingquan Pan
Open Date: 2010-01-28 23:18:07
Close Date: 2010-02-11 03:51:10
Priority: Medium
Assignee: Graf Yang
Status: Closed
Fixed In Release: N/A
Found In Release: 2010R1
Release:
Category: N/A
Board: N/A
Processor: BF561
Silicon Revision:
Is this bug repeatable?: Yes
Resolution: Rejected
Uboot version or rev.:
Toolchain version or rev.: 4.3.4 (ADI-trunk/svn-3771)
App binary format: N/A
Summary: malloc performance frequently fails on bf561 SMP kernel
Details:
The malloc performance test (malloc-perf) now fails frequently on the bf561 SMP kernel.
Linux version 2.6.32.2-ADI-2010R1-pre-svn8124 (test@uclinux65-561-SMP) (gcc version 4.3.4 (ADI-trunk/svn-3771) ) #40 SMP Thu Jan 7 11:54:37 GMT 2010
register early platform devices
bootconsole [early_shadow0] enabled
bootconsole [early_BFuart0] enabled
early printk enabled on early_BFuart0
Board Memory: 64MB
Kernel Managed Memory: 64MB
Memory map:
fixedcode = 0x00000400-0x00000490
text = 0x00001000-0x0010c010
rodata = 0x0010c020-0x0015ebf0
bss = 0x0015f000-0x001714c8
data = 0x001714e0-0x00182000
stack = 0x00180000-0x00182000
init = 0x00182000-0x006d6000
available = 0x006d6000-0x03f00000
DMA Zone = 0x03f00000-0x04000000
Hardware Trace Active and Enabled
Boot Mode: 0
Reset caused by Software reset
Blackfin support (C) 2004-2009 Analog Devices, Inc.
Compiled for ADSP-BF561 Rev 0.5
Blackfin Linux support by http://blackfin.uclinux.org/
Processor Speed: 600 MHz core clock and 100 MHz System Clock
NOMPU: setting up cplb tables
NOMPU: setting up cplb tables
Instruction Cache Enabled for CPU0
External memory: cacheable in instruction cache
L2 SRAM : uncacheable in instruction cache
Data Cache Enabled for CPU0
External memory: cacheable (write-through) in data cache
L2 SRAM : uncacheable in data cache
Built 1 zonelists in Zone order, mobility grouping off. Total pages: 16002
Kernel command line: root=/dev/mtdblock0 rw ip=10.100.4.50 earlyprintk=serial,uart0,57600 console=ttyBF0,57600 ip=10.100.4.50:10.100.4.174:10.100.4.174:255.255.255.0:bf561-ezkit:eth0:off
PID hash table entries: 256 (order: -2, 1024 bytes)
Dentry cache hash table entries: 8192 (order: 3, 32768 bytes)
Inode-cache hash table entries: 4096 (order: 2, 16384 bytes)
Memory available: 56916k/65536k RAM, (5456k init code, 1068k kernel code, 472k data, 1024k dma, 600k reserved)
Hierarchical RCU implementation.
NR_IRQS:121
Configuring Blackfin Priority Driven Interrupts
console [ttyBF0] enabled, bootconsole disabled
console [ttyBF0] enabled, bootconsole disabled
Calibrating delay loop... 1187.84 BogoMIPS (lpj=2375680)
Mount-cache hash table entries: 512
CoreB bootstrap code to SRAM ff600000 via DMA.
Booting Core B.
Instruction Cache Enabled for CPU1
External memory: cacheable in instruction cache
L2 SRAM : uncacheable in instruction cache
Data Cache Enabled for CPU1
External memory: cacheable (write-through) in data cache
L2 SRAM : uncacheable in data cache
Brought up 2 CPUs
Calibrating delay loop...
SMP: Total of 2 processors activated (4.09 BogoMIPS).
Blackfin Scratchpad data SRAM: 4 KB
Blackfin Scratchpad data SRAM: 4 KB
Blackfin L1 Data A SRAM: 16 KB (16 KB free)
Blackfin L1 Data A SRAM: 16 KB (16 KB free)
Blackfin L1 Data B SRAM: 16 KB (16 KB free)
Blackfin L1 Data B SRAM: 16 KB (16 KB free)
Blackfin L1 Instruction SRAM: 16 KB (15 KB free)
Blackfin L1 Instruction SRAM: 16 KB (15 KB free)
Blackfin L2 SRAM: 128 KB (127 KB free)
NET: Registered protocol family 16
Blackfin DMA Controller
ezkit_init(): registering device resources
bio: create slab <bio-0> at 0
Switching to clocksource jiffies
NET: Registered protocol family 2
IP route cache hash table entries: 1024 (order: 0, 4096 bytes)
TCP established hash table entries: 2048 (order: 2, 16384 bytes)
TCP bind hash table entries: 2048 (order: 2, 16384 bytes)
TCP: Hash tables configured (established 2048 bind 2048)
2138.11 BogoMIPS (lpj=4276224)
TCP reno registered
NET: Registered protocol family 1
msgmni has been set to 111
io scheduler noop registered
io scheduler anticipatory registered (default)
bfin-uart: Blackfin serial driver
bfin-uart.0: ttyBF0 at MMIO 0xffc00400 (irq = 35) is a BFIN-UART
brd: module loaded
bfin-spi bfin-spi.0: Blackfin on-chip SPI Controller Driver, Version 1.0, regs_base@ffc00500, dma channel@16
smc91x.c: v1.1, sep 22 2004 by Nicolas Pitre <nico@fluxnic.net>
eth0: SMC91C11xFD (rev 2) at 2c010300 IRQ 82 [nowait]
eth0: Ethernet addr: 00:e0:22:fe:ba:2a
bfin-wdt: initialized: timeout=20 sec (nowayout=0)
TCP cubic registered
NET: Registered protocol family 17
eth0: link down
IP-Config: Complete:
device=eth0, addr=10.100.4.50, mask=255.255.255.0, gw=10.100.4.174,
host=bf561-ezkit, domain=, nis-domain=(none),
bootserver=10.100.4.174, rootserver=10.100.4.174, rootpath=
Freeing unused kernel memory: 5456k freed
dma_alloc_init: dma_page @ 0x02786000 - 256 pages at 0x03f00000
eth0: link up, 100Mbps, full-duplex, lpa 0x41E1
_____________________________________
a8888b. / Welcome to the uClinux distribution \
d888888b. / _ _ \
8P"YP"Y88 / | | |_| __ __ (TM) |
8|o||o|88 _____/ | | _ ____ _ _ \ \/ / |
8' .88 \ | | | | _ \| | | | \ / |
8`._.' Y8. \ | |__ | | | | | |_| | / \ |
d/ `8b. \ \____||_|_| |_|\____|/_/\_\ |
dP . Y8b. \ For embedded processors including |
d8:' " `::88b \ the Analog Devices Blackfin /
d8" 'Y88b \___________________________________/
:8P ' :888
8a. : _a88P For further information, check out:
._/"Yaa_: .| 88P| - http://blackfin.uclinux.org/
\ YP" `| 8P `. - http://docs.blackfin.uclinux.org/
/ \.___.d| .' - http://www.uclinux.org/
`--..__)8888P`._.' jgs/a:f - http://www.analog.com/blackfin
Have a lot of fun...
BusyBox v1.15.3 (2010-01-07 04:40:08 GMT) hush - the humble shell
root:/> version
kernel: Linux release 2.6.32.2-ADI-2010R1-pre-svn8124, build #40 SMP Thu Jan 7 11:54:37 GMT 2010
toolchain: bfin-uclinux-gcc release gcc version 4.3.4 (ADI-trunk/svn-3771)
user-dist: release svn-9347, build #461 Thu Jan 7 11:53:29 GMT 2010
root:/> successful boot attempt
************** STEP 3: Start testing.
uname -a
Linux blackfin 2.6.32.2-ADI-2010R1-pre-svn8124 #40 SMP Thu Jan 7 11:54:37 GMT 2010 blackfin GNU/Linux
root:/> malloc-perf 120
00004k : 0x02772004 000000 000000 000000
00008k : 0x020a0004 000000 000000 000000
00012k : 0x020b4004 000000 000031 004000
00016k : 0x02948004 000000 000000 000000
00020k : 0x02950004 000000 000000 000000
00024k : 0x02958004 000000 000000 000000
00028k : 0x02960004 000000 000000 000000
00032k : 0x02968004 000000 000000 000000
00036k : 0x02970004 000000 000062 004000
00040k : 0x02970004 000000 000093 004000
00044k : 0x02970004 000000 000031 004000
00048k : 0x02970004 000000 000031 004000
00052k : 0x02970004 000000 000031 004000
00056k : 0x02970004 000000 000031 004000
00060k : 0x02970004 000000 000031 004000
00064k : 0x02970004 000000 000000 000000
00068k : 0x02a00004 000000 000062 004000
00072k : 0x02a00004 000000 000062 004000
00076k : 0x02a00004 000000 000062 004000
00080k : 0x02a00004 000000 000031 004000
00084k : 0x02a00004 000000 000031 004000
00088k : 0x02a00004 000000 000062 004000
00092k : 0x02a00004 000000 000031 004000
00096k : 0x02a00004 000000 000031 004000
00100k : 0x02a00004 000000 000031 004000
00104k : 0x02a00004 000000 000062 004000
00108k : 0x02a00004 000000 000062 004000
00112k : 0x02a00004 000000 000031 004000
00116k : 0x02a00004 000000 000062 004000
00120k : 0x02a00004 000000 000062 004000
00124k : 0x02a00004 000000 000031 004000
00128k : 0x02a00004 000000 000031 004000
00256k : 0x02a00004 000000 000031 004000
00384k : 0x02a00004 000000 000125 004000
00512k : 0x02a00004 000000 000031 004000
00640k : 0x02a00004 000000 000125 004000
00768k : 0x02a00004 000000 000156 004000
00896k : 0x02a00004 000000 000125 004000
01024k : 0x02a00004 000000 000156 004000
TEST FAIL
Follow-ups
--- Graf Yang 2010-01-30 07:47:22
SMP kernel performance is a bit lower than that of the UP kernel. If the SMP
kernel needs 20%-30% more time to finish the malloc performance test, I think
that is normal.
Can you point out roughly when the failures started to become frequent?
--- Mingquan Pan 2010-02-01 04:29:00
The log from Nov 27 shows a much better result: the test passed 9 out of 10 times.
root:/> version
kernel: Linux release 2.6.31.6-ADI-2010R1-pre-svn7883, build #105 SMP Fri Nov 27 09:24:48 GMT 2009
toolchain: bfin-uclinux-gcc release gcc version 4.3.4 (ADI-trunk/svn-3679)
user-dist: release svn-9185, build #1192 Fri Nov 27 09:23:32 GMT 2009
root:/> successful boot attempt
************** STEP 3: Start testing.
root:/> malloc-perf 120
00004k : 0x02bfb004 000000 000000 000000
00008k : 0x02b6e004 000000 000000 000000
00012k : 0x02bec004 000000 000000 000000
00016k : 0x020cc004 000000 000000 000000
00020k : 0x02ae8004 000000 000000 000000
00024k : 0x02a68004 000000 000000 000000
00028k : 0x02960004 000000 000000 000000
00032k : 0x029e8004 000000 000031 004000
00036k : 0x02a70004 000000 000031 004000
00040k : 0x02a70004 000000 000000 000000
00044k : 0x02a70004 000000 000062 004000
00048k : 0x02a70004 000000 000062 004000
00052k : 0x02a70004 000000 000062 004000
00056k : 0x02a70004 000000 000031 004000
00060k : 0x02a70004 000000 000000 000000
00064k : 0x02a70004 000000 000031 004000
00068k : 0x02840004 000000 000031 004000
00072k : 0x02840004 000000 000062 004000
00076k : 0x02840004 000000 000062 004000
00080k : 0x02840004 000000 000062 004000
00084k : 0x02840004 000000 000031 004000
00088k : 0x02840004 000000 000031 004000
00092k : 0x02840004 000000 000000 000000
00096k : 0x02840004 000000 000093 004000
00100k : 0x02840004 000000 000093 004000
00104k : 0x02840004 000000 000031 004000
00108k : 0x02840004 000000 000062 004000
00112k : 0x02840004 000000 000093 004000
00116k : 0x02840004 000000 000093 004000
00120k : 0x02840004 000000 000000 000000
00124k : 0x02840004 000000 000000 000000
00128k : 0x02840004 000000 000000 000000
00256k : 0x02800004 000000 000062 004000
00384k : 0x02c00004 000000 000000 000000
00512k : 0x02c00004 000000 000125 004000
00640k : 0x02c00004 000000 000187 004000
00768k : 0x02c00004 000000 000000 000000
00896k : 0x02c00004 000000 000062 004000
01024k : 0x02c00004 000000 000000 000000
TEST PASS
--- Graf Yang 2010-02-01 22:43:23
I found that the new SMP kernel has lower malloc performance than the old one
(2009R1, for example). I will dig into the reason.
--- Graf Yang 2010-02-03 06:26:16
Not a bug.
Since the kernel was updated to 2.6.32, the function prep_new_page(struct page *page,
int order, gfp_t gfp_flags) checks every page of an allocation rather than only
the first page. As a result, malloc-perf takes about twice as long when
allocating memory.
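The effect of checking every page instead of only the first can be pictured with a toy model. This is only a sketch, not kernel code: it assumes 4 KB pages (consistent with "Total pages: 16002" for 64 MB in the boot log), assumes each request is rounded up to a power-of-two number of pages, and ignores any allocator header overhead. The names alloc_order and pages_checked are hypothetical.

```python
PAGE_SIZE = 4096  # assumption: 4 KB pages


def alloc_order(size_bytes: int) -> int:
    """Smallest order such that 2**order pages cover size_bytes."""
    pages = -(-size_bytes // PAGE_SIZE)  # ceiling division
    return (pages - 1).bit_length()


def pages_checked(size_bytes: int, every_page: bool) -> int:
    """Number of per-page checks in this toy model (not real kernel accounting)."""
    return (1 << alloc_order(size_bytes)) if every_page else 1


for kb in (4, 32, 128, 1024):
    size = kb * 1024
    print(f"{kb:5d}k order={alloc_order(size)} "
          f"first-page-only={pages_checked(size, False)} "
          f"every-page={pages_checked(size, True)}")
```

In this model the number of checks grows with the allocation size once every page is checked, which is at least consistent with larger requests taking longer in the logs. The measured times do not scale linearly with page count, so the model only illustrates the direction of the change.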
So I suggest roughly doubling the threshold, to 200:
malloc-perf 200
BTW, this test needs at least one CLOCKSOURCE option enabled.
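One possible reading of the CLOCKSOURCE remark: in the two earlier logs the last column is always 0 or 4000, and the boot log shows "Switching to clocksource jiffies", which suggests the timer only advanced in coarse steps. The sketch below checks that idea under two assumptions that are not confirmed anywhere in this report: the clock moves in 4000-unit steps, and the middle column is an integer average over 128 timed calls. TICK and SAMPLES are hypothetical names and values.

```python
TICK = 4000    # assumption: clock granularity, taken from the last column of the early logs
SAMPLES = 128  # assumption: hypothetical number of timed calls averaged per row


def averaged(ticks: int) -> int:
    """Integer average when `ticks` clock steps are spread over SAMPLES calls."""
    return ticks * TICK // SAMPLES


# Distinct non-zero middle-column values seen in the two earlier logs.
observed = [31, 62, 93, 125, 156, 187]
predicted = [averaged(k) for k in range(1, 7)]

print(predicted)
print(predicted == observed)
```

If the assumptions hold, the values in those logs reflect the timer granularity more than the real allocation cost, which would explain why a finer clocksource is needed for the results to be meaningful.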
--- Graf Yang 2010-02-03 21:16:13
With the current kernels, a threshold of 180 for SMP and 90 for UP should pass the test.
--- Mingquan Pan 2010-02-11 03:50:33
After increasing the parameter, the test passes now.
root:/> malloc-perf 180
00004k : 0x02775004 000003 000004 000046
00008k : 0x027ea004 000003 000004 000040
00012k : 0x0289c004 000003 000004 000039
00016k : 0x027ea004 000003 000004 000046
00020k : 0x027ea004 000003 000003 000006
00024k : 0x027ea004 000003 000004 000030
00028k : 0x028c0004 000004 000004 000054
00032k : 0x028c8004 000031 000033 000086
00036k : 0x028d0004 000034 000037 000105
00040k : 0x028d0004 000034 000037 000097
00044k : 0x028d0004 000035 000037 000070
00048k : 0x028d0004 000034 000038 000108
00052k : 0x028d0004 000034 000046 001117
00056k : 0x028d0004 000034 000037 000103
00060k : 0x028d0004 000035 000037 000092
00064k : 0x028d0004 000035 000037 000106
00068k : 0x028e0004 000041 000044 000104
00072k : 0x028e0004 000041 000045 000193
00076k : 0x028e0004 000041 000046 000110
00080k : 0x028e0004 000041 000045 000111
00084k : 0x028e0004 000041 000044 000113
00088k : 0x028e0004 000041 000044 000101
00092k : 0x028e0004 000040 000044 000102
00096k : 0x028e0004 000040 000043 000113
00100k : 0x028e0004 000041 000045 000115
00104k : 0x028e0004 000041 000044 000098
00108k : 0x028e0004 000041 000045 000118
00112k : 0x028e0004 000041 000045 000110
00116k : 0x028e0004 000041 000045 000112
00120k : 0x028e0004 000041 000045 000138
00124k : 0x028e0004 000041 000045 000103
00128k : 0x028e0004 000041 000045 000104
00256k : 0x02a00004 000055 000068 001129
00384k : 0x02a00004 000081 000091 000164
00512k : 0x02a00004 000081 000091 000169
00640k : 0x02a00004 000134 000145 000206
00768k : 0x02a00004 000134 000148 000214
00896k : 0x02a00004 000134 000150 000243
01024k : 0x02a00004 000134 000149 000276
TEST PASS
root:/> malloc-perf pass
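To compare runs like the ones pasted in this report, the result rows can be summarised with a short script. This is a minimal sketch: the row layout (size, address, three numeric columns) is taken from the logs above, but the meaning of the three columns and the exact pass criterion of malloc-perf are not stated here, so the script only reports the largest value per column. The names parse_rows and column_maxima are hypothetical.

```python
import re

# One result row, e.g. "00004k : 0x02775004 000003 000004 000046"
ROW = re.compile(r"^(\d+)k : (0x[0-9a-fA-F]+) (\d+) (\d+) (\d+)$")


def parse_rows(log: str):
    """Return (size_kb, address, col1, col2, col3) tuples for each result row."""
    rows = []
    for line in log.splitlines():
        # Tolerate a literal "^M" left over from carriage returns in pasted logs.
        m = ROW.match(line.replace("^M", "").strip())
        if m:
            size, addr, a, b, c = m.groups()
            rows.append((int(size), int(addr, 16), int(a), int(b), int(c)))
    return rows


def column_maxima(rows):
    """Largest value seen in each of the three numeric columns."""
    return tuple(max(r[i] for r in rows) for i in (2, 3, 4))


# Three rows copied from the passing run above.
sample = """\
root:/> malloc-perf 180
00004k : 0x02775004 000003 000004 000046
00256k : 0x02a00004 000055 000068 001129
01024k : 0x02a00004 000134 000149 000276
TEST PASS
"""
rows = parse_rows(sample)
print(len(rows), column_maxima(rows))
```

Feeding the full output of two runs through parse_rows makes it easier to see which sizes regressed between kernel versions.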