[#5968] __alloc_pages_internal() may loop endlessly under certain conditions
Submitted By: Enrik Berkhan
Open Fixed In Release:
Found In Release:
BF561 Silicon Revision:
Is this bug repeatable?:
Uboot version or rev.:
Toolchain version or rev.:
App binary format:
Summary: __alloc_pages_internal() may loop endlessly under certain conditions
During system load testing, our systems sometimes hung forever in __alloc_pages_internal() even though plenty of memory was free. The hanging processes could be made to work again by "some external event" such as a telnet login.
The hanging processes called __alloc_pages_internal() from ext4 code that intentionally clears __GFP_FS in gfp_mask. At first I suspected ext4, so you can find some details on the ext4 list: http://marc.info/?l=linux-ext4&m=126597928719941&w=2
I think one of the reasons for this behavior is the call to drop_pagecache() (a Blackfin-specific addition, which generally helps a lot, BTW) in __alloc_pages_internal(), which can cause try_to_free_pages() to return 0 repeatedly. That in turn can trigger endless retrying in __alloc_pages_internal().
My current fix is to call get_page_from_freelist() again after each call to drop_pagecache(), because the probability of getting the pages at that point seems to be very high.
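The suspected mechanism can be illustrated with a small user-space model. This is only a sketch under assumptions, not the kernel code: every model_* name, the struct fields, the batch size of 8 and the bounded retry count are hypothetical, and the loop structure is a simplification of what is described in this report. The model assumes that the free list is re-checked only after reclaim progress or, with __GFP_FS set, in the out-of-memory branch.

```c
/* Toy user-space model of the retry loop described in this report.
 * NOT kernel code: all model_* names, fields and constants are hypothetical. */
#include <stdbool.h>

#define MODEL_GFP_FS 0x1u       /* stand-in for __GFP_FS */

struct model_mm {
    unsigned free_pages;        /* pages currently on the free list */
    unsigned cache_pages;       /* droppable page cache pages */
};

/* Stand-in for get_page_from_freelist(): take one page if any is free. */
static bool model_get_page_from_freelist(struct model_mm *mm)
{
    if (mm->free_pages == 0)
        return false;
    mm->free_pages--;
    return true;
}

/* Stand-in for the Blackfin drop_pagecache(): frees the whole page cache. */
static void model_drop_pagecache(struct model_mm *mm)
{
    mm->free_pages += mm->cache_pages;
    mm->cache_pages = 0;
}

/* Stand-in for try_to_free_pages(): reclaims up to 8 cache pages and
 * returns how many it reclaimed (0 means "no progress"). */
static unsigned model_try_to_free_pages(struct model_mm *mm)
{
    unsigned reclaimed = mm->cache_pages < 8 ? mm->cache_pages : 8;

    mm->cache_pages -= reclaimed;
    mm->free_pages += reclaimed;
    return reclaimed;
}

/* Returns 0 if the fast path succeeded, the 1-based retry pass on which a
 * page was obtained, or -1 if all max_retries passes failed. The real loop
 * has no such bound; it is here only so that the model terminates. */
static int model_alloc_page(struct model_mm *mm, unsigned gfp_mask,
                            bool recheck_after_drop, int max_retries)
{
    if (model_get_page_from_freelist(mm))
        return 0;

    for (int pass = 1; pass <= max_retries; pass++) {
        model_drop_pagecache(mm);

        /* The workaround from this report: look at the free list again. */
        if (recheck_after_drop && model_get_page_from_freelist(mm))
            return pass;

        /* The cache is already empty, so reclaim reports no progress. */
        unsigned progress = model_try_to_free_pages(mm);

        if (progress) {
            if (model_get_page_from_freelist(mm))
                return pass;
        } else if (gfp_mask & MODEL_GFP_FS) {
            /* Modeled OOM branch: only reached with __GFP_FS set. */
            if (model_get_page_from_freelist(mm))
                return pass;
        }
        /* No progress and __GFP_FS cleared: retry without ever
         * noticing the pages that drop_pagecache() just freed. */
    }
    return -1;
}
```

In this model, a caller starting with 0 free pages and 100 cache pages, __GFP_FS cleared and no re-check, never gets a page although 100 pages end up on the free list. With the re-check enabled, or with MODEL_GFP_FS set, the same call succeeds on the first retry pass.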
--- Sonic Zhang 2010-03-29 06:44:01
Your suggested workaround introduces an extra performance drop.
get_page_from_freelist() is already called if your gfp_mask has __GFP_FS set.
How about calling get_page_from_freelist() after drop_pagecache() only when
__GFP_FS is not set?
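The difference between the two variants can be sketched by counting free-list lookups in a toy model. This is an illustration under assumptions, not the kernel code or either attached patch: all sk_* names are hypothetical, one "pass" stands for one trip through the slow path, and the model assumes try_to_free_pages() reports no progress after the cache has been dropped. It only shows where a redundant lookup would occur and says nothing about its real cost.

```c
/* Toy model comparing an unconditional re-check after drop_pagecache()
 * with a re-check only when __GFP_FS is cleared.
 * NOT kernel code: all sk_* names and fields are hypothetical. */
#include <stdbool.h>

#define SK_GFP_FS 0x1u          /* stand-in for __GFP_FS */

enum sk_recheck {
    SK_RECHECK_ALWAYS,          /* re-check after every drop_pagecache() */
    SK_RECHECK_IF_NO_FS         /* re-check only when __GFP_FS is cleared */
};

struct sk_state {
    unsigned free_pages;        /* pages on the free list */
    unsigned cache_pages;       /* droppable page cache pages */
    unsigned freelist_lookups;  /* modeled get_page_from_freelist() calls */
};

/* Stand-in for get_page_from_freelist(), with a call counter. */
static bool sk_get_page(struct sk_state *s)
{
    s->freelist_lookups++;
    if (s->free_pages == 0)
        return false;
    s->free_pages--;
    return true;
}

/* One pass through the modeled slow path; true if a page was obtained. */
static bool sk_slow_path_pass(struct sk_state *s, unsigned gfp_mask,
                              enum sk_recheck policy)
{
    /* Modeled drop_pagecache(): move all cache pages to the free list. */
    s->free_pages += s->cache_pages;
    s->cache_pages = 0;

    bool recheck = policy == SK_RECHECK_ALWAYS || !(gfp_mask & SK_GFP_FS);
    if (recheck && sk_get_page(s))
        return true;

    /* Modeled try_to_free_pages() finds nothing to reclaim, so only the
     * __GFP_FS branch looks at the free list again. */
    if ((gfp_mask & SK_GFP_FS) && sk_get_page(s))
        return true;

    return false;
}
```

In this model, a failing pass with SK_GFP_FS set costs two lookups under SK_RECHECK_ALWAYS but only one under SK_RECHECK_IF_NO_FS, while a caller with __GFP_FS cleared still gets its page on the first lookup after the cache is dropped.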
--- Sonic Zhang 2010-03-29 06:58:28
I think it is better to call drop_pagecache() after try_to_free_pages() in
__alloc_pages_internal(), so that the endless loop is avoided in the first place.
Could you try the attached patch?
File Name                                            File Type      File Size   Posted By
nommu_drop_pagecache_after_try_to_free_pages.patch   text/x-patch   1256        Sonic Zhang
ext4-oom-endless-loop-workaround.diff                text/x-diff    984         Enrik Berkhan