tech-kern archive

Re: Lost file-system story



On Tue, Dec 13, 2011 at 4:09 PM, Greg A. Woods <woods%planix.ca@localhost> 
wrote:
> At Wed, 14 Dec 2011 09:06:23 +1030, Brett Lymn 
> <brett.lymn%baesystems.com@localhost> wrote:
> Subject: Re: Lost file-system story
>>
>> On Tue, Dec 13, 2011 at 01:38:57PM +0100, Joerg Sonnenberger wrote:
>> >
>> > fsck is supposed to handle *all* corruptions to the file system that can
>> > occur as part of normal file system operation in the kernel. It is doing
>> > best effort for others. It's a bug if it doesn't do the former and a
>> > potential missing feature for the latter.
>> >
>>
>> There are a lot of slips twixt cup and lip.  If you are really unlucky
>> you can get an outage at just the wrong time that will cause the
>> filesystem to be hosed so badly that fsck cannot recover it.  Sure, fsck
>> can run to completion but all you have is most of your FS in lost+found
>> which you have to be really really desperate to sort through.  I have
>> been working with UNIX for over 20 years now and I have only seen this
>> happen once and it was with a commercial UNIX.
>
> I've seen that happen more than once, unfortunately.  SunOS-4 once, I think.
>
> I agree 100% with Joerg here though.
>
> I'm pretty sure that at least some of the times I've seen fsck do more
> harm than good, it was due to one or more kernel bugs breaking
> assumptions about ordered operations.
>
> There have of course also been some pretty serious bugs in various fsck
> implementations across the years and vendors.
>

I'd be suspicious of fsck failing on a regularly mounted disk with
corruption that can't otherwise be traced to outside influences (bad
RAM, bad disk cache, etc.). I've seen some bizarre things happen on RAM
errors over the years, for instance.

James

