2010-05-13 15:50:31 Crazy problem with FAT filesystem and pthreads
Ian Jeffray (UNITED KINGDOM)
Message: 89388
This post is a little vague, I'm afraid. I've been working all week on a bug involving FAT/FAT32 filesystems accessed from pthreads on a BF527 board, and would welcome opinions and input if anyone has clues.
I've reduced the problem as much as I can, to this: I have a FAT or FAT32 filesystem with one folder on it, containing a few files. Nothing special. I've reproduced this with both a USB memory stick and a SPI SD card, so it's not the physical layer causing the issue. I've built my code with mudflapth, stack checking etc. and every possible debug option in the kernel enabled (except the Blackfin MPU, which just seems to make the kernel die early on). I create several pthreads, basically just sitting there doing for(;;) nanosleep(10) to keep the scheduler busy. I then create another pthread and have it do a simple two-level opendir()/readdir() walk of the contents of the FS. I generally get garbage at the end of the buffer in the first readdir() call (garbage returned from the getdents() syscall), or the system crashes and corrupts one of the other pthread stacks. Without a bunch of pthreads running, or if I do the opendir()/readdir() work from the main thread rather than a pthread, all is well.
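For readers who want to try something similar, here is a minimal sketch of the kind of test described above. It is not the original test program: the thread count, the 10 ms sleep interval, and the names run_repro(), idle_thread() and walker_thread() are all assumptions made for illustration.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

#define IDLE_THREADS 4            /* assumption: the post only says "several" */
#define PATH_BUF 1024             /* assumption: long enough for the test paths */

static atomic_int keep_running;
static char path_buf[PATH_BUF];   /* static, so the walker's own stack use stays small */

/* Idle thread: sleep in a loop so the scheduler always has threads to switch between. */
static void *idle_thread(void *arg)
{
    struct timespec ts = { 0, 10 * 1000 * 1000 };   /* 10 ms, an assumed interval */
    (void)arg;
    while (atomic_load(&keep_running))
        nanosleep(&ts, NULL);
    return NULL;
}

/* Walker thread: list root and every directory directly below it.
   Returns the number of entries seen, or -1 if root cannot be opened. */
static void *walker_thread(void *arg)
{
    const char *root = arg;
    DIR *top = opendir(root);
    struct dirent *e;
    intptr_t count = 0;

    if (top == NULL)
        return (void *)(intptr_t)-1;
    while ((e = readdir(top)) != NULL) {
        DIR *sub;
        if (strcmp(e->d_name, ".") == 0 || strcmp(e->d_name, "..") == 0)
            continue;
        count++;
        snprintf(path_buf, sizeof path_buf, "%s/%s", root, e->d_name);
        sub = opendir(path_buf);          /* NULL for plain files: skip them */
        if (sub == NULL)
            continue;
        while (readdir(sub) != NULL)      /* second level, "." and ".." included */
            count++;
        closedir(sub);
    }
    closedir(top);
    return (void *)count;
}

/* Start the idle threads, run one walker to completion, then stop everything. */
static long run_repro(const char *root)
{
    pthread_t idle[IDLE_THREADS], walker;
    void *result = (void *)(intptr_t)-1;
    int i, started = 0;

    assert(root != NULL);
    atomic_store(&keep_running, 1);
    for (i = 0; i < IDLE_THREADS; i++)
        if (pthread_create(&idle[started], NULL, idle_thread, NULL) == 0)
            started++;
    if (pthread_create(&walker, NULL, walker_thread, (void *)root) == 0)
        pthread_join(walker, &result);
    atomic_store(&keep_running, 0);
    for (i = 0; i < started; i++)
        pthread_join(idle[i], NULL);
    return (long)(intptr_t)result;
}
```

Calling run_repro() on the mount point of the FAT volume from main() would exercise the same pattern: several sleeping threads plus one thread walking two directory levels. On a system that does not show the problem it simply returns an entry count.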
This only happens with FAT and FAT32 filesystems, only with large cluster sizes (32K, 64K), and only on Blackfin uClinux as far as I can tell so far. I certainly can't get it to happen on an ARM or x86 system, and can't get it to happen with small-cluster FATs.
Sounds crazy I know... but it's a pretty severe issue for me.
I've already had issues with the FAT code on Blackfin - doesn't really work at all when built with the 64bit toolchain for some reason - never got to the bottom of that. Can't help but wonder if the two issues are actually the same - some crazy little bug in the FAT code that somehow only bites on Blackfin in certain scenarios.
Suggestions most welcomed.
2010-05-14 07:06:58 Re: Crazy problem with FAT filesystem and pthreads
Ian Jeffray (UNITED KINGDOM)
Message: 89405
To follow up on my own posting: after help from friends, we think this is actually a bug (feature) of uClibc. The implementation of readdir() causes a monstrous stack alloca() of the filesystem block size to happen... which is why we only see the problem with large-cluster FAT filesystems, why it's worse when called from a pthread, and why I don't see it on systems that can do stack extension.
This seems a very poor design for uClibc to me (I've opened a thread on the toolchain forum about it)... but at least everything is making some sense now, and it's not a kernel bug :-)
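A rough way to check whether a given mount is at risk is to compare the block size the filesystem reports with the stack a thread gets. The sketch below assumes the size in question is the one stat() reports as st_blksize (expected to follow the cluster size on FAT, but worth verifying on the target); the helper names and the idea of a safety margin are illustrative assumptions, not uClibc internals.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <pthread.h>
#include <stddef.h>
#include <sys/stat.h>

/* Block size reported for path, or -1 if stat() fails. */
static long dir_block_size(const char *path)
{
    struct stat st;

    assert(path != NULL);
    if (stat(path, &st) != 0)
        return -1;
    return (long)st.st_blksize;
}

/* Stack size a default-initialised thread attribute reports; 0 on error. */
static size_t default_thread_stack_size(void)
{
    pthread_attr_t attr;
    size_t size = 0;

    if (pthread_attr_init(&attr) != 0)
        return 0;
    if (pthread_attr_getstacksize(&attr, &size) != 0)
        size = 0;
    pthread_attr_destroy(&attr);
    return size;
}

/* Returns 1 if a stack buffer of blksize bytes plus margin bytes of
   headroom (an assumed allowance for everything else on the stack)
   fits within stack_size, 0 otherwise. */
static int block_buffer_fits(long blksize, size_t stack_size, size_t margin)
{
    if (blksize < 0)
        return 0;
    return (size_t)blksize + margin <= stack_size;
}
```

For example, block_buffer_fits(dir_block_size("/mnt/sd"), default_thread_stack_size(), 4096) would flag a 64K-cluster volume when thread stacks are small; the path /mnt/sd and the 4096-byte margin are placeholders.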
2010-05-14 07:52:33 Re: Crazy problem with FAT filesystem and pthreads
Pawel Zdunczuk (POLAND)
Message: 89406
I had the same problem, but it is solved now.
My platform: BF527, 32MB RAM, Linux release 2.6.28.10-ADI-2009R1.1
The problem is with FAT32 only:
File sizes are reported wrongly - some files show the same size as the previous file on the disk;
Sometimes the system automatically changes file names from long to short - after a 'sync' command;
Sometimes an additional file appears, like 'a1'.
An SD card used with my Linux board was no longer recognized by Windows after some files had been created on it.
In other words - it looks like corruption of the file allocation table.
I tried working with both mmcspi and a RAM disk - the effect is the same.
The problem also appears with very small volumes: 4MB ram disk (FAT12, 8159 sectors, 4 sectors per cluster).
And the solution: using the i386 toolchain rather than the x86_64 one cures the vfat problem.
Now everything works fine. I'm currently working with a 16GB uSD card.
I think the problem is in the x86_64 toolchain (and sometimes - very rarely - I have trouble reproducing the same compilation - but that is another story).
Best Regards.
2010-05-17 02:40:11 Re: Crazy problem with FAT filesystem and pthreads
Wolfgang Muees (GERMANY)
Message: 89468
Ian,
the problem with the alloca() call in uClibc is old and has been discussed several times in this forum, but nobody from AD has fixed it in the tree up to now. I have posted a fix, but **** - how do you grep for all posts of an author at this forum?
regards
Wolfgang
2010-05-17 04:03:51 Re: Crazy problem with FAT filesystem and pthreads
Wolfgang Muees (GERMANY)
Message: 89475
Ian,
read this:
blackfin.uclinux.org/gf/forummessage/86389
regards
Wolfgang
2010-05-17 04:13:06 Re: Crazy problem with FAT filesystem and pthreads
Wolfgang Muees (GERMANY)
Message: 89477
I have opened a bug tracker now:
https://blackfin.uclinux.org/gf/tracker/6031
2010-05-19 04:50:44 Re: Crazy problem with FAT filesystem and pthreads
Xin Xin (CHINA)
Message: 89535
Yes, use the i386 toolchain instead of x64. I ran into the problem last year...
2010-05-20 19:03:52 Re: Crazy problem with FAT filesystem and pthreads
Ian Jeffray (UNITED KINGDOM)
Message: 89604
There are two separate problems, Xin. The alloca() problem is what hurts me most... it should be quite simple to fix. I would even submit a patch if anyone wished.
The second problem, with vfat returning garbage when built with a 64bit compiler, is definitely a completely separate problem, because I still get this issue when using my own calls to getdents() to avoid the first problem. I've spent quite a bit of time now trying to track this issue down, but cannot pinpoint it clearly yet. Mike has replicated this compiler issue and apparently opened a tracker for it (but I couldn't see where?)
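For illustration, calling getdents directly with a heap buffer might look roughly like the sketch below. This is not the code referred to in the post: it assumes the getdents64 syscall is available on the target, uses the record layout documented in the Linux getdents(2) man page, and the function name count_entries_heap() is made up for the example.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Record layout returned by getdents64, as given in getdents(2). */
struct linux_dirent64 {
    uint64_t       d_ino;
    int64_t        d_off;
    unsigned short d_reclen;   /* length of this record in bytes */
    unsigned char  d_type;
    char           d_name[];
};

/* Count the entries in the directory at path, reading them through a
   malloc()ed buffer of bufsize bytes instead of a stack buffer.
   Returns the entry count ("." and ".." included), or -1 on error. */
static long count_entries_heap(const char *path, size_t bufsize)
{
    long total = 0;
    char *buf;
    int fd;

    assert(path != NULL && bufsize >= sizeof(struct linux_dirent64) + 256);
    fd = open(path, O_RDONLY | O_DIRECTORY);
    if (fd < 0)
        return -1;
    buf = malloc(bufsize);
    if (buf == NULL) {
        close(fd);
        return -1;
    }
    for (;;) {
        long n = syscall(SYS_getdents64, fd, buf, (unsigned int)bufsize);
        long off = 0;

        if (n < 0) {               /* syscall failed */
            total = -1;
            break;
        }
        if (n == 0)                /* end of directory */
            break;
        while (off < n) {          /* step through the variable-length records */
            struct linux_dirent64 *d = (struct linux_dirent64 *)(buf + off);
            total++;
            off += d->d_reclen;
        }
    }
    free(buf);
    close(fd);
    return total;
}
```

Because the buffer lives on the heap, its size no longer competes with the thread stack, which is why this kind of call sidesteps the alloca() problem while leaving the separate 64bit-compiler issue visible.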
2010-05-21 09:04:42 Re: Crazy problem with FAT filesystem and pthreads
Pawel Zdunczuk (POLAND)
Message: 89637
I'm really interested in the alloca() patch.
Does the alloca() problem exist only on the BF527, or on other Blackfin parts too?
2010-05-21 09:56:36 Re: Crazy problem with FAT filesystem and pthreads
Wolfgang Muees (GERMANY)
Message: 89638
This alloca() bug is in uClibc. It is a toolchain bug and is not related to any particular hardware.
It is present in ALL Blackfin uClinux userlands.
2010-05-21 13:17:15 Re: Crazy problem with FAT filesystem and pthreads
Robin Getz (UNITED STATES)
Message: 89643
Just to be specific - it is not Blackfin - it is all uClibc based toolchains.
-Robin
2010-05-24 07:24:55 Re: Crazy problem with FAT filesystem and pthreads
Pawel Zdunczuk (POLAND)
Message: 89801
A few years ago I stumbled on a stack overflow problem when readdir() was used on vfat.
I got around it by using vfs_readdir() in kernel mode to list the files in a directory.
Today I made a little test - I used readdir() again - and the result is:
Instruction fetch CPLB miss
- CPLB miss on an instruction fetch.
Deferred Exception context
But when I increase the stack size for my user-mode application and for the thread in this application that uses readdir(), the problem disappears.
In sum - I'm waiting for the bug tracker result (https://blackfin.uclinux.org/gf/tracker/6031).
Do you think the problem with getdents() and alloca() in uClibc also affects other calls such as read(), write() or open() on vfat?