Subject: Re: Corrupt data when reading filesystems under Linux guest
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: Thor Lancelot Simon <tls@rek.tjls.com>
List: port-xen
Date: 06/10/2005 20:39:41
On Sat, Jun 11, 2005 at 12:52:11AM +0200, Manuel Bouyer wrote:
> On Fri, Jun 10, 2005 at 01:05:03AM +0000, Jed Davis wrote:
> >
> > The less-easy next thing: gluing together consecutive requests where
> > applicable; e.g., a 64k transfer broken into 44k and 20k parts.
>
> I'm not sure we want to go that far. The request may be split for a
> valid reason, like write-ordering requirements of the filesystem
> (although the underlying disk will probably break this anyway :(
> Or if we want to go this route, it would probably be better to do it at
> another level, so that other subsystem parts benefit from it.
I don't think that's right at all, for several reasons:
1) We cannot do this in a machine-independent (MI) way in the only obvious
place in the system to do it, which is disksort(), because on some
architectures the mapping operations required to glue the transfers
together are far too expensive (which is why changes doing exactly that
were rejected when they were offered on the mailing lists long ago). But
the Xen backend is inherently tied to the current architecture (and
perhaps a small number of related ones), and it is reasonable to do the
mapping operations there.
2) Not doing it *wrecks* performance by doubling the number of IOPS
needed to handle a client OS doing the perfectly reasonable thing
and sending us 64K writes on the assumption that, just like a Linux
domain0, we will merge them.
3) If you only merge forward in the ring, you can't break filesystem
ordering constraints, but you _will_ fix the problem where 64K from
the client turns into 44K + 20K.
--
Thor Lancelot Simon tls@rek.tjls.com
"The inconsistency is startling, though admittedly, if consistency is to be
abandoned or transcended, there is no problem." - Noam Chomsky