Subject: Re: Corrupt data when reading filesystems under Linux guest
To: Thor Lancelot Simon <tls@rek.tjls.com>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: port-xen
Date: 06/11/2005 11:51:05
On Fri, Jun 10, 2005 at 08:39:41PM -0400, Thor Lancelot Simon wrote:
> > (although the underlying disk will probably break this anyway :(
> > Or if we want to go this route, it would probably be better to do it at
> > another level, so that other subsystem parts benefit from it.
>
> I don't think that's right at all, for several reasons:
>
> 1) We cannot do this in an MI way in the only obvious place in the system
> to do it, which is disksort(), because on some architectures the
> mapping operations required to glue the transfers together are far
> too expensive (which is why the changes to do exactly that that were
> offered on the mailing lists were rejected long ago). But the Xen
> backend is inherently tied to the current architecture (and a small
> number of related ones, perhaps) and it's reasonable to do the
> mapping operations there.
I've got some rough thoughts about this, and I think we may want to change
the current buffer model to something more mbuf-like. This would allow a
transfer to be described as a list of physical address/length pairs instead
of the single large contiguous virtual address range we can describe now.
This would fit the hardware limitations better than MAXPHYS does
now, and could give us a zero-copy NFS server.
But this is the wrong list to discuss that.
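
To make the idea a bit more concrete, here is a minimal sketch of what a
transfer described as a list of physical address/length segments could look
like. Every name here (xfer_seg, xfer_desc, xfer_add_seg, XFER_MAXSEGS) is
hypothetical and made up for illustration; nothing like this exists in the
tree, and a real design would have to deal with DMA constraints this ignores.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical names throughout; not an existing NetBSD structure. */
#define XFER_MAXSEGS 32		/* assumed per-transfer segment limit */

struct xfer_seg {
	uint64_t pa;		/* physical address of the segment */
	size_t   len;		/* segment length in bytes */
};

struct xfer_desc {
	int             nsegs;	/* number of valid entries in segs[] */
	struct xfer_seg segs[XFER_MAXSEGS];
};

/*
 * Append a segment to the transfer. If it starts exactly where the
 * previous segment ends, extend that segment instead of adding a new one.
 * Returns 0 on success, -1 if the segment table is full.
 */
static int
xfer_add_seg(struct xfer_desc *x, uint64_t pa, size_t len)
{
	if (len == 0)
		return 0;
	if (x->nsegs > 0) {
		struct xfer_seg *last = &x->segs[x->nsegs - 1];
		if (last->pa + last->len == pa) {
			last->len += len;
			return 0;
		}
	}
	if (x->nsegs == XFER_MAXSEGS)
		return -1;
	x->segs[x->nsegs].pa = pa;
	x->segs[x->nsegs].len = len;
	x->nsegs++;
	return 0;
}

/* Total number of bytes described by the transfer. */
static size_t
xfer_total_len(const struct xfer_desc *x)
{
	size_t total = 0;
	for (int i = 0; i < x->nsegs; i++)
		total += x->segs[i].len;
	return total;
}
```

With such a description the limit on a transfer becomes the number of
segments the hardware can take, rather than a fixed byte count.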
>
> 2) Not doing it *wrecks* performance by doubling the number of IOPS
> needed to handle a client OS doing the perfectly reasonable thing
> and sending us 64K writes on the assumption that, just like a Linux
> domain0, we will merge them.
>
> 3) If you only merge forward in the ring, you can't break filesystem
> ordering constraints, but you _will_ fix the problem where 64K from
> the client turns into 44K + 20K.
Of course, if we only do forward merging it's fine. And I had forgotten
that the Xen interface only allows 44K per request.
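
For illustration, forward-only merging over a batch of ring requests could
be sketched as below. This is not the backend code: the ring_req structure,
merge_forward(), the 512-byte sector size and the 64K cap are assumptions
for the example, and it only does the sector bookkeeping, not the mapping
operations needed to glue the data buffers together.

```c
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE 512			/* assumed sector size */
#define MAX_XFER    (64 * 1024)		/* assumed cap, e.g. MAXPHYS */

/* Hypothetical, simplified view of one request taken from the ring. */
struct ring_req {
	uint64_t sector;	/* first sector on the backing device */
	uint32_t nsectors;	/* length in sectors */
	int      write;		/* nonzero for a write, zero for a read */
};

/*
 * Merge each request into the one just before it when it has the same
 * direction, starts where the previous one ends, and the result still
 * fits in MAX_XFER. Requests are never reordered, so the order in which
 * the guest issued them is preserved. Compacts reqs[] in place and
 * returns the new number of requests.
 */
static size_t
merge_forward(struct ring_req *reqs, size_t n)
{
	size_t out = 0;

	for (size_t i = 0; i < n; i++) {
		if (out > 0) {
			struct ring_req *prev = &reqs[out - 1];
			uint64_t merged =
			    (uint64_t)prev->nsectors + reqs[i].nsectors;

			if (prev->write == reqs[i].write &&
			    prev->sector + prev->nsectors == reqs[i].sector &&
			    merged * SECTOR_SIZE <= MAX_XFER) {
				prev->nsectors = (uint32_t)merged;
				continue;
			}
		}
		reqs[out++] = reqs[i];
	}
	return out;
}
```

With a 64K guest write arriving as 44K (88 sectors) followed by 20K
(40 sectors), the two entries collapse back into a single 128-sector
request, while an unrelated request after them is left alone.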
--
Manuel Bouyer <bouyer@antioche.eu.org>
NetBSD: 26 years of experience will always make the difference
--