2010-12-17 06:55:09 Page alloc failure since upgrade to 2010R1-RC5
Kolja Waschk (GERMANY)
Message: 96843
Hi,
I experience another problem after upgrading my dist base from 2009R1.1 to 2010R1-RC5.
Under certain circumstances, an application tries to allocate a large block of memory (1.2 MB). On the old platform this always succeeded, even when the test was repeated many times. Since the upgrade, it fails as early as the second or third test run. Even worse, the outcome is not just a NULL pointer from malloc or an exception from new[] (it is a C++ app), but a kernel page allocation failure. Afterwards, the serial console on which the application was started is unusable (it no longer accepts or echoes characters sent to it), although the system still behaves well when contacted via telnet. All the memory occupied by the application is freed (according to /proc/meminfo).
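A minimal sketch of checking both failure paths mentioned above (a NULL pointer from malloc, an exception from new[]), so that a failed request is reported instead of being dereferenced. The helper names `alloc_with_malloc` and `alloc_with_new` are hypothetical and not taken from the application. This only keeps the application from misbehaving after a failed request; it does not make the allocation itself more likely to succeed.

```cpp
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <new>

// Deleter so that memory obtained from malloc is released with free.
struct FreeDeleter {
    void operator()(void* p) const noexcept { std::free(p); }
};
using MallocBuffer = std::unique_ptr<unsigned char, FreeDeleter>;

// Hypothetical helper: C-style path. The result is empty (nullptr)
// when malloc fails, so the caller must test it before use.
MallocBuffer alloc_with_malloc(std::size_t bytes) {
    return MallocBuffer(static_cast<unsigned char*>(std::malloc(bytes)));
}

// Hypothetical helper: C++ path. std::bad_alloc from new[] is caught
// and turned into an empty pointer, so both paths are checked alike.
std::unique_ptr<unsigned char[]> alloc_with_new(std::size_t bytes) {
    try {
        return std::unique_ptr<unsigned char[]>(new unsigned char[bytes]);
    } catch (const std::bad_alloc&) {
        return nullptr;
    }
}
```

A caller would test the returned pointer (`if (!buf) { /* report and back off */ }`) before touching the buffer.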
This happens with Xenomai (in the kernel and in use by the application), and I'm currently examining whether it also happens without it. But maybe someone already has an idea about possible causes? So far I have tried the SLAB and SLOB allocators, but that didn't make any difference. Output from /proc/meminfo and /proc/stats looks much the same on the old and new systems.
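The totals in /proc/meminfo do not show how fragmented the free memory is, and on a no-MMU kernel a large request needs one contiguous run of pages. /proc/buddyinfo lists the number of free blocks per order, so comparing it before each test run may be more telling. The following is only a sketch of reading one such line; the function names are hypothetical, and the 4096-byte page size used in the usage note is an assumption about the target.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Parse one line of /proc/buddyinfo, for example
//   "Node 0, zone      DMA      5      3      1      0"
// and return the free block count per order (index 0 = single pages).
std::vector<unsigned long> parse_buddyinfo_line(const std::string& line) {
    std::istringstream in(line);
    std::string word;
    // Skip the four leading fields: "Node", "<n>,", "zone", "<name>".
    for (int i = 0; i < 4; ++i) {
        if (!(in >> word)) return {};
    }
    std::vector<unsigned long> counts;
    unsigned long n = 0;
    while (in >> n) counts.push_back(n);
    return counts;
}

// Highest order that still has at least one free block, or -1 if none.
int largest_free_order(const std::vector<unsigned long>& counts) {
    for (std::size_t i = counts.size(); i > 0; --i) {
        if (counts[i - 1] > 0) return static_cast<int>(i - 1);
    }
    return -1;
}

// Smallest order whose block (page_size << order) can hold `bytes`.
// Returns -1 for a zero page size.
int order_for_bytes(std::size_t bytes, std::size_t page_size) {
    if (page_size == 0) return -1;
    int order = 0;
    std::size_t block = page_size;
    while (block < bytes) {
        block <<= 1;
        ++order;
    }
    return order;
}
```

With 4096-byte pages, `order_for_bytes(1200 * 1024, 4096)` gives 9 (a 2 MB block), so the request can only be satisfied while `largest_free_order` reports 9 or higher for some zone.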
Thanks in advance
Kolja
2010-12-17 08:27:49 Re: Page alloc failure since upgrade to 2010R1-RC5
Kolja Waschk (GERMANY)
Message: 96844
The "crash" probably is/was caused by bad error handling in my application. The question remains why the allocation fails more often on the new dist, and what I could do to avoid it...
2010-12-17 09:04:42 Re: Page alloc failure since upgrade to 2010R1-RC5
Mike Frysinger (UNITED STATES)
Message: 96846
your app probably changed the stdio behavior, or part of it is still hung. use `reset` or kill/logout of the console shell to have it restart.
try playing with the NOMMU_INITIAL_TRIM_EXCESS option in the kernel to see if that makes things perform better.
2010-12-20 15:36:22 Re: Page alloc failure since upgrade to 2010R1-RC5
Kolja Waschk (GERMANY)
Message: 96886
Hi,
Disabling INITIAL_TRIM_EXCESS doesn't seem to affect the behaviour. I've rewritten the parts of the application that allocated such large chunks at once, but one large buffer still needs to be allocated this way at startup (as a memory pool to minimize fragmentation...), and under certain circumstances that still fails when the same app is started a second time (after the first instance has terminated). Because this bites me so hard now and so unexpectedly (I never experienced anything similar with the previous dist), I feel there must be a problem somewhere in the environment, not necessarily in the allocator. There's also a lot of writing to ramdisk and JFFS2, and lots of network traffic involved; I'll try to minimize the application tomorrow to isolate the cause.
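For reference, a minimal sketch of the kind of startup pool described above: one large block is requested once, and later requests are carved out of it so that no further large allocations reach the system. The class name `StartupPool` and its interface are hypothetical, not the application's actual code, and a simple bump allocator is assumed here for brevity.

```cpp
#include <cstddef>
#include <cstdlib>

// Hypothetical pool: a single malloc at startup, then bump allocation.
class StartupPool {
public:
    explicit StartupPool(std::size_t bytes)
        : base_(static_cast<unsigned char*>(std::malloc(bytes))),
          size_(base_ ? bytes : 0),
          used_(0) {}
    ~StartupPool() { std::free(base_); }
    StartupPool(const StartupPool&) = delete;
    StartupPool& operator=(const StartupPool&) = delete;

    // False if the single large allocation failed at startup.
    bool ok() const { return base_ != nullptr; }

    // Hand out `bytes` from the pool, or nullptr when it is exhausted.
    // `align` must be a power of two no larger than malloc's alignment.
    void* allocate(std::size_t bytes,
                   std::size_t align = alignof(std::max_align_t)) {
        std::size_t start = (used_ + align - 1) & ~(align - 1);
        if (start > size_ || bytes > size_ - start) return nullptr;
        used_ = start + bytes;
        return base_ + start;
    }

    // Release everything at once; individual frees are not supported.
    void reset() { used_ = 0; }

    std::size_t remaining() const { return size_ - used_; }

private:
    unsigned char* base_;
    std::size_t size_;
    std::size_t used_;
};
```

The application would construct the pool once, check `ok()`, and exit with a clear message if the initial block could not be obtained.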
It may well be related to the other problem I reported about the irritating Xenomai "kernel is anterior to 2.5.2" warning; on the other hand, it occurred without Xenomai as well. Maybe the problems are related the other way around.
The problems with terminal output occur especially whenever I interrupt the app with Ctrl-C. Sometimes it seems to emit a page full of text that had already been printed just before. None of this happened with the 2009R1.1 dist as a base.
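One way to narrow down the Ctrl-C behaviour is to have the application catch SIGINT, set a flag, and leave its main loop normally, so that cleanup and stdio flushing happen exactly once on the regular exit path. The sketch below shows only that pattern; the names are hypothetical, and whether it has any bearing on the repeated output is an assumption that would need testing.

```cpp
#include <csignal>

// Set from the signal handler; only sig_atomic_t is safe to write there.
volatile std::sig_atomic_t g_stop_requested = 0;

// Handler does nothing except record that Ctrl-C was pressed.
extern "C" void on_sigint(int) {
    g_stop_requested = 1;
}

// Install the handler; returns false if std::signal reports an error.
bool install_sigint_handler() {
    return std::signal(SIGINT, on_sigint) != SIG_ERR;
}

// Polled by the main loop to decide when to shut down cleanly.
bool stop_requested() {
    return g_stop_requested != 0;
}
```

The main loop would then run as `while (!stop_requested()) { ... }` and return from `main` afterwards instead of being terminated mid-output.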
I'll report my findings here (or in the bug tracker, if appropriate)
Kolja