2007-08-28 17:06:28     Kernel panic while exercising compact flash.

Document created by Aaronwu Employee on Aug 5, 2013
Version 1Show Document
  • View in full screen mode

2007-08-28 17:06:28     Kernel panic while exercising compact flash.

Greg Ferrari (UNITED STATES)

Message: 43516    First some background.  I have a BF537 with a compact flash.  It is connected to a network and is mounting a disk from another machine.  The compact flash has three slices: hda1 contains the kernel image from which I boot, hda2 contains an ext2 filesystem and is mounted read/write, hda3 also contains an ext2 filesystem that I use during various tests.  I am running with the 2007R1.1-RC3 kernel and toolchain.

 

I have a test script that exercises the compact flash to make sure it is working properly (there have been issues in the past).  The script places a filesystem on hda3, mounts it to /mnt, copies a bunch of files of various sizes one at a time from the network drive to /mnt, unmounts /mnt, performs an MD5 checksum on hda1 (the slice containing my kernel image), and remounts hda3 to /mnt.  At various points during the file copies a sync is performed.  It does this in a loop forever or until I hit ctrl-C.

 

In previous versions of the kernel the MD5 checksum would change with each iteration even though hda1 was never modified.  This issues appears to have been fixed recently in an SVN version of the kernel and the latest 2007R1.1-RC3 kernel.  But I am now seeing kernel panics while running my script.  Sometimes the kernel panics in the first iteration of my script, other times it makes it through several iterations before panicing.  When it panics I've seen the dump show kswapd and cp as the process that caused problems.  There are probably others.  Here is some output from my latest test that didn't show a process at all but did indicate a "kernel BUG"...

 

 

Hardware Trace:

0 Target : <0x00004618> { _dump_stack + 0x0 }

   Source : <0x00031b7a> { _free_pages_bulk + 0x1c6 }

1 Target : <0x00031b7a> { _free_pages_bulk + 0x1c6 }

   Source : <0x00031a8c> { _free_pages_bulk + 0xd8 }

2 Target : <0x000319dc> { _free_pages_bulk + 0x28 }

   Source : <0x00031b20> { _free_pages_bulk + 0x16c }

3 Target : <0x00031ae4> { _free_pages_bulk + 0x130 }

   Source : <0x00031bac> { _free_pages_bulk + 0x1f8 }

4 Target : <0x00031ba8> { _free_pages_bulk + 0x1f4 }

   Source : <0x00031a5a> { _free_pages_bulk + 0xa6 }

5 Target : <0x000319dc> { _free_pages_bulk + 0x28 }

   Source : <0x00031b20> { _free_pages_bulk + 0x16c }

6 Target : <0x00031ae4> { _free_pages_bulk + 0x130 }

   Source : <0x00031bac> { _free_pages_bulk + 0x1f8 }

7 Target : <0x00031ba8> { _free_pages_bulk + 0x1f4 }

   Source : <0x00031a5a> { _free_pages_bulk + 0xa6 }

8 Target : <0x000319dc> { _free_pages_bulk + 0x28 }

   Source : <0x00031b20> { _free_pages_bulk + 0x16c }

9 Target : <0x00031ae4> { _free_pages_bulk + 0x130 }

   Source : <0x00031bac> { _free_pages_bulk + 0x1f8 }

10 Target : <0x00031ba8> { _free_pages_bulk + 0x1f4 }

   Source : <0x00031a5a> { _free_pages_bulk + 0xa6 }

11 Target : <0x000319dc> { _free_pages_bulk + 0x28 }

   Source : <0x00031b20> { _free_pages_bulk + 0x16c }

12 Target : <0x00031ae4> { _free_pages_bulk + 0x130 }

   Source : <0x00031bac> { _free_pages_bulk + 0x1f8 }

13 Target : <0x00031ba8> { _free_pages_bulk + 0x1f4 }

   Source : <0x00031a5a> { _free_pages_bulk + 0xa6 }

14 Target : <0x000319b4> { _free_pages_bulk + 0x0 }

   Source : <0x00031c8a> { _free_hot_cold_page + 0xce }

15 Target : <0x00031c7e> { _free_hot_cold_page + 0xc2 }

   Source : <0x00031c62> { _free_hot_cold_page + 0xa6 }

Stack from 00f73c44:

        ffa010da 00031b7e 001b3dd0 00000001 002fb960 0000001b 0002cb84 00000004

        00178f24 00031c8e 00145a34 0027d770 00f73d88 0000ffff 00145a18 00180c8c

        00f73d5c 0008b01e 00000007 00145a40 00000000 00031cca 00f73cf8 00f73cd8

        00000002 00000009 002ffee0 002f6c10 00000000 0002cc92 0027d798 00f73d18

        00000009 00000009 00000000 00000000 00000000 00000009 00000000 001ad278

        002851a0 002851c8 002a5cc0 002a5ce8 001ad930 001ad958 0027d770 0027d798

Call Trace:

[<0005037c>] _try_to_free_buffers+0x44/0xa0

[<0002d0c0>] ___pagevec_release+0x18/0x24

[<0002d834>] _truncate_inode_pages_range+0x1d4/0x2b8

[<0002d92a>] _truncate_inode_pages+0x12/0x1c

[<00046034>] _dispose_list+0x8c/0xa8

[<0004628a>] _invalidate_inodes+0x96/0xac

[<0003730c>] _generic_shutdown_super+0x5c/0x114

[<000373de>] _kill_block_super+0x1a/0x2c

[<000479c0>] _mntput_no_expire+0x34/0x84

[<0003747e>] _deactivate_super+0x42/0x5c

[<00037484>] _deactivate_super+0x48/0x5c

[<00048b86>] _sys_umount+0x112/0x1a4

[<00048aa2>] _sys_umount+0x2e/0x1a4

[<0004b07c>] _seq_release+0x18/0x24

[<00002000>] _arch_ptrace+0x8c/0x460

[<00034114>] _filp_close+0x40/0x68

[<00034114>] _filp_close+0x40/0x68

[<0003874c>] _sys_readlink+0x1c/0x24

[<00048a74>] _sys_umount+0x0/0x1a4

[<00134380>] _init_currently_empty_zone+0xc/0x130

[<00134380>] _init_currently_empty_zone+0xc/0x130

[<00008000>] _irq_panic+0x4c/0x144

[<00001000>] __start+0x0/0x164

 

 

kernel BUG at mm/page_alloc.c:362!

Kernel panic - not syncing: BUG!

 

 

Here's another panic that occurred in the sixth iteration of my loop.  In this case it shows cp as being the culprit and the output continuously displays.  It looks like the kernel got stuck in an infinite loop displaying the panic.

 

 

NULL pointer access (probably)

 

CURRENT PROCESS:

 

COMM=cp PID=836

TEXT = 0x00500040-0x0054a3a0  DATA = 0x0054a3a4-0x0055dd04

BSS = 0x0055dd04-0x00563ed4   USER-STACK = 0x0057fd30

 

return address: 0x00028730; contents of [PC-16...PC+8]:

 

 

RETE:  00000000  RETN: 0601fc40  RETX: 00028730  RETS: 00028690

IPEND: 8030  SYSCFG: 0036

SEQSTAT: 00060027    SP: 0601fb64

R0: 00000000    R1: 0000ffff    R2: 00000000    R3: 0000003f

R4: 00001000    R5: 0601fc8c    R6: 0093f1d0    R7: 0000000d

P0: 063f1468    P1: 00000020    P2: 00000000    P3: 00145a18

P4: 063f1450    P5: 063f1470    FP: 00178f24

A0.w: 00000000    A0.x: 00000000    A1.w: 00000000    A1.x: 00000000

LB0: ffa01e88  LT0: ffa01e88  LC0: 00000000

LB1: 00050b30  LT1: 00050b2c  LC1: 00000000

B0: cccccccd  L0: 00000000  M0: 00000001  I0: 0601fab8

B1: 009ff9b8  L1: 00000000  M1: 00000000  I1: 00000000

B2: 00000001  L2: 00000000  M2: 00180c68  I2: 00000000

B3: 0704bbb0  L3: 00000000  M3: 00000001  I3: 00000001

 

USP: 0057ea50   ASTAT: 02003024

DCPLB_FAULT_ADDR=00000004

ICPLB_FAULT_ADDR=00028730

 

 

Hardware Trace:

0 Target : <0x000046d4> { _trap_c + 0x0 }

   Source : <0xffa00b3c> { _exception_to_level5 + 0xb4 }

1 Target : <0xffa00a88> { _exception_to_level5 + 0x0 }

   Source : <0xffa00a86> { _ex_trap_c + 0x4e }

2 Target : <0xffa00a38> { _ex_trap_c + 0x0 }

   Source : <0xffa00bdc> { _trap + 0x28 }

3 Target : <0xffa00bb4> { _trap + 0x0 }

   Source : <0xffa01142> { __common_int_entry + 0xd8 }

4 Target : <0xffa010e0> { __common_int_entry + 0x76 }

   Source : <0xffa00ea8> { _return_from_int + 0x58 }

5 Target : <0xffa00ea8> { _return_from_int + 0x58 }

   Source : <0xffa00e7e> { _return_from_int + 0x2e }

6 Target : <0xffa00e50> { _return_from_int + 0x0 }

   Source : <0xffa010dc> { __common_int_entry + 0x72 }

7 Target : <0xffa010da> { __common_int_entry + 0x70 }

   Source : <0xffa0048a> { _asm_do_IRQ + 0x76 }

8 Target : <0xffa00482> { _asm_do_IRQ + 0x6e }

   Source : <0x00010d2e> { __local_bh_enable + 0x3e }

9 Target : <0x00010cf0> { __local_bh_enable + 0x0 }

   Source : <0x00010fa8> { ___do_softirq + 0x94 }

10 Target : <0x00010fa0> { ___do_softirq + 0x8c }

   Source : <0x00010f80> { ___do_softirq + 0x6c }

11 Target : <0x00010f72> { ___do_softirq + 0x5e }

   Source : <0x000be4e4> { _net_rx_action + 0xa0 }

12 Target : <0x000be4b4> { _net_rx_action + 0x70 }

   Source : <0x000be440> { _process_backlog + 0x120 }

13 Target : <0x000be432> { _process_backlog + 0x112 }

   Source : <0x000be42c> { _process_backlog + 0x10c }

14 Target : <0x000be3e8> { _process_backlog + 0xc8 }

   Source : <0x000be36e> { _process_backlog + 0x4e }

15 Target : <0x000be368> { _process_backlog + 0x48 }

   Source : <0x000be3c8> { _process_backlog + 0xa8 }

Stack from 0601fb44:

        00000008 ffa00b40 0014270c 0014270c 0014270c 0087a858 00050490 061528e0

        00028730 00008030 00060027 00000000 0601fc40 00028730 00028730 00028690

        00000000 02003024 00050b30 ffa01e88 00050b2c ffa01e88 00000000 00000000

        00000000 00000000 00000000 00000000 0704bbb0 00000001 009ff9b8 cccccccd

        00000000 00000000 00000000 00000000 00000001 00180c68 00000000 00000001

        00000001 00000000 00000000 0601fab8 0057ea50 00178f24 063f1470 063f1450

Call Trace:

[<00001000>] __start+0x0/0x164

[<0000ffff>] _do_setitimer+0x25b/0x520

[<00051708>] ___block_prepare_write+0x270/0x420

[<00029922>] _generic_file_buffered_write+0x3b2/0x52c

[<00001000>] __start+0x0/0x164

[<000283ec>] _sync_page+0x2c/0x34

[<00001000>] __start+0x0/0x164

[<00001000>] __start+0x0/0x164

[<00001000>] __start+0x0/0x164

[<00103c24>] _ext2_aops+0x0/0x38

[<00001000>] __start+0x0/0x164

[<00028d5c>] _do_generic_mapping_read+0x334/0x444

[<00010bdc>] _current_fs_time+0x38/0x40

[<00001000>] __start+0x0/0x164

[<00029c72>] ___generic_file_aio_write_nolock+0x1d6/0x400

[<00001000>] __start+0x0/0x164

[<00001000>] __start+0x0/0x164

[<00001000>] __start+0x0/0x164

[<0002a59e>] _generic_file_aio_read+0xa6/0x18c

[<00029ef0>] _generic_file_aio_write+0x54/0xe0

[<000357c4>] _do_sync_write+0xb0/0xe8

[<00001000>] __start+0x0/0x164

[<0001d530>] _autoremove_wake_function+0x0/0x30

[<0001d530>] _autoremove_wake_function+0x0/0x30

[<00001000>] __start+0x0/0x164

[<00001000>] __start+0x0/0x164

[<00001000>] __start+0x0/0x164

[<00035fc6>] _vfs_write+0x82/0x14c

[<00001000>] __start+0x0/0x164

[<00036532>] _sys_write+0x32/0x64

[<00001000>] __start+0x0/0x164

[<00036500>] _sys_write+0x0/0x64

[<00001000>] __start+0x0/0x164

[<00008000>] _irq_panic+0x4c/0x144

[<00001000>] __start+0x0/0x164

[<00001000>] __start+0x0/0x164

[<00001000>] __start+0x0/0x164

 

 

I suspect that my kernel panics are not really related to exercising the compact flash, but are really caused by some memory management problems in the kernel.  Copying all these files around is just a fast way of running into the bug.  I also see similar kernel panics with some of my applications that consume relatively large amounts of memory.  These applications run fine on my desktop machine and don't cause valgrind any heartburn.  I was not running any of my own applications during the tests mentioned above however.

 

Does anyone out there have any idea what could be going wrong?  Any advise on how to better diagnose the problem?

QuoteReplyEditDelete

 

 

2007-08-28 20:03:23     Re: Kernel panic while exercising compact flash.

Robin Getz (UNITED STATES)

Message: 43521    Greg:

 

The only major difference that I can think of in the new R1.1 is the page cache limiter, Can you try various values (100, 50, 10) in /proc/sys/vm/pagecache_ratio to check?

 

Thanks

-Robin

QuoteReplyEditDelete

 

 

2007-08-28 20:04:55     Re: Kernel panic while exercising compact flash.

Robin Getz (UNITED STATES)

Message: 43522    Greg:

 

If you could detail a little more of your tests, we can try to replicate the failure & ensure we don't run into the same things in the future.

 

-Robin

QuoteReplyEditDelete

 

 

2007-08-29 09:53:58     Re: Kernel panic while exercising compact flash.

Greg Ferrari (UNITED STATES)

Message: 43570    Robin:

 

It's funny that you mention the page cache limiter.  I did indeed turn that on in this release because I was seeing my applications running out of memory.  After investigating the cause I discovered that the kernel was holding on to memory way to aggressively and starving my applications.  So I turned on the limiter, set the total page cache limit to 10MB, and set the page cache limit for every file to 32 pages.  That solved my problem.

 

I tried the three page cache ratio values you suggested by editing /proc/sys/vm/pagecache_ratio file directly (I hope that was right).  I still saw kernel panics.

 

I have attached two files with this reply.  The first is the script I use to test the CF and cause kernel panics.  The second is a text file with details of my environment.

 

Please let me know if you are successful at recreating the problem.  I will continue to try and isolate the problem to a very small and specific test case.  If there is anything else you would like me to try, please let me know.

 

Thanks for your help.

 

Greg

cfTest.txt

cfTest.adi

QuoteReplyEditDelete

 

 

2007-08-29 11:26:32     Re: Kernel panic while exercising compact flash.

Greg Ferrari (UNITED STATES)

Message: 43578    Robin:

 

Here's a simpler version of my script that is still causing kernel panics.  In this version I have removed all of the file copies from my NFS disk and replaced it with a loop that copies all the files in /bin to my CF.  This eliminates all the NFS traffic and focuses on writing to CF.  I got the same "kernel BUG" mentioned earlier.

 

Greg

cfTest.noNFS

QuoteReplyEditDelete

 

 

2007-08-29 14:57:59     Re: Kernel panic while exercising compact flash.

Greg Ferrari (UNITED STATES)

Message: 43582    Robin:

 

Here's even more information.  I have attached the kernel and user configurations from 'make menuconfig' as well as the boot sequence of my development board.  Note that I am running on a 128MB development board (part #ADZS-BF537-EZLITE) with silicon rev 0.3 and an ADI compact flash daughter card.  Could the fact that I have 128MB instead of the more common 64MB have anything to do with it?

 

Greg

bootsequence.txt

user.config

kernel.config

QuoteReplyEditDelete

 

 

2007-09-28 11:44:22     Re: Kernel panic while exercising compact flash.

Michael Hennerich (GERMANY)

Message: 44692    Greg,

 

   [*]   Limit page caches

    (10)    Total page cache limit in MB unit

    (32)    Page cache limit for every file in page unit

 

You should NOT set this option.

Its buggy and causing the issues that you are seeing.

 

 

Simply do.

root:~> echo 10 > /proc/sys/vm/pagecache_ratio

 

I also noticed that you are running SCLK @ 100MHz.

This can also cause issues with some CF cards.

 

I also checked some patches for IDE driver into svn.

(svn trunk and branch 2007R1)

 

Your test script runs now since hours on two boards without any faults.

 

Please update your kernel sources.

 

-Michael

QuoteReplyEditDelete

 

 

2008-01-30 21:32:51     Re: Kernel panic while exercising compact flash.

Wang Xiang (CHINA)

Message: 50408   

 

Hello,Michael

 

1) Could you tell me how to switch off the set below?

 

[*]   Limit page caches

    (10)    Total page cache limit in MB unit

    (32)    Page cache limit for every file in page unit

 

I do not know where the set is in menuconfig?

 

2)  I can not find the pagecache_ratio in /proc/sys/vm; Which should be set in menuconfig?

 

 

 

Regards

 

Wang Xiang

 

 

QuoteReplyEditDelete

 

 

2008-01-31 03:13:00     Re: Kernel panic while exercising compact flash.

Yi Li (CHINA)

Message: 50417    > 1) Could you tell me how to switch off the set below?

These options does not exist in newer kernel (e.g, linux-2.6.22 and later).

> 2)  I can not find the pagecache_ratio in /proc/sys/vm; Which should be set in menuconfig?

Please use newer release- e.g, 2008R1 is using linux-2.6.22.

Here is mine:

root:/> ls /proc/sys/vm/                                                                                  

block_dump                 nr_pdflush_threads                                                             

dirty_background_ratio     overcommit_memory                                                              

dirty_expire_centisecs     overcommit_ratio                                                               

dirty_ratio                page-cluster                                                                   

dirty_writeback_centisecs  pagecache_ratio                                                                

drop_caches                panic_on_oom                                                                   

laptop_mode                percpu_pagelist_fraction                                                       

lowmem_reserve_ratio       swappiness                                                                     

min_free_kbytes            vfs_cache_pressure   

Attachments

    Outcomes