Subject: Re: Corrupt data when reading filesystems under Linux guest
To: Thor Lancelot Simon <tls@rek.tjls.com>
From: Manuel Bouyer <bouyer@antioche.eu.org>
List: port-xen
Date: 06/11/2005 11:51:05
On Fri, Jun 10, 2005 at 08:39:41PM -0400, Thor Lancelot Simon wrote:
> > (although the underlying disk will probably break this anyway :(
> > Or if we want to go this route, it would probably be better to do it at
> > another level, so that other subsystem parts benefit from it.
> 
> I don't think that's right at all, for several reasons:
> 
> 1) We cannot do this in an MI way in the only obvious place in the system
>    to do it, which is disksort(), because on some architectures the
>    mapping operations required to glue the transfers together are far
>    too expensive (which is why the changes to do exactly that that were
>    offered on the mailing lists were rejected long ago).  But the Xen
>    backend is inherently tied to the current architecture (and a small
>    number of related ones, perhaps) and it's reasonable to do the
>    mapping operations there.

I've had some vague thoughts about this, and I think we may want to change
the current buffer model to something more mbuf-like. This would allow a
transfer to be described as a list of physical address/length pairs instead
of the single large contiguous virtual address range we can describe now.
This would fit hardware limitations better than MAXPHYS does
now, and could give us a zero-copy NFS server.
But this is the wrong list to discuss that.
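
To make the mbuf-like idea concrete, here is a minimal sketch of a transfer
described as a chain of physical address/length segments. Every name in it
(struct xfer_seg, xfer_total_len, xfer_seg_count) is hypothetical and made up
for illustration; none of this is an existing NetBSD interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical descriptor: one physically contiguous piece of a transfer. */
struct xfer_seg {
	uint64_t         pa;    /* physical address of this segment */
	size_t           len;   /* length of this segment in bytes */
	struct xfer_seg *next;  /* next segment in the chain, or NULL */
};

/* Total number of bytes described by the chain. */
static size_t
xfer_total_len(const struct xfer_seg *seg)
{
	size_t total = 0;

	for (; seg != NULL; seg = seg->next) {
		assert(seg->len > 0);   /* an empty segment makes no sense */
		total += seg->len;
	}
	return total;
}

/* Number of segments in the chain. */
static unsigned int
xfer_seg_count(const struct xfer_seg *seg)
{
	unsigned int n = 0;

	for (; seg != NULL; seg = seg->next)
		n++;
	return n;
}
```

With a description like this, a driver could check a transfer against its
hardware's segment count and per-segment size limits directly, rather than
against a single global byte limit.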

> 
> 2) Not doing it *wrecks* performance by doubling the number of IOPS
>    needed to handle a client OS doing the perfectly reasonable thing
>    and sending us 64K writes on the assumption that, just like a Linux
>    domain0, we will merge them.
> 
> 3) If you only merge forward in the ring, you can't break filesystem
>    ordering constraints, but you _will_ fix the problem where 64K from
>    the client turns into 44K + 20K.

Of course, if we only do forward merging, it's fine. And I had forgotten
that the Xen interface only allows 44K per request.
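
A rough sketch of what forward-only merging could look like, assuming
requests are taken from the ring in order and a request is only ever glued
onto the one immediately before it. The structure, the function names and
the 64K cap are assumptions for illustration, not the actual backend code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SECTOR_SIZE      512u
#define MAX_MERGED_BYTES (64u * 1024u)  /* assumed cap on a merged transfer */

/* Hypothetical, simplified view of one request taken from the ring. */
struct ring_req {
	bool     write;   /* true for a write, false for a read */
	uint64_t sector;  /* first sector on the backing device */
	uint32_t nbytes;  /* length in bytes, a multiple of SECTOR_SIZE */
};

/* b may be appended to a only if it continues a on disk, in the same direction. */
static bool
can_merge_forward(const struct ring_req *a, const struct ring_req *b)
{
	assert(a->nbytes % SECTOR_SIZE == 0);
	return a->write == b->write &&
	    a->sector + a->nbytes / SECTOR_SIZE == b->sector &&
	    a->nbytes + b->nbytes <= MAX_MERGED_BYTES;
}

/*
 * Coalesce reqs[0..n-1] in place and return the new count.  Each request
 * is compared only with the most recently emitted one, so requests are
 * never reordered relative to the ring.
 */
static size_t
merge_forward(struct ring_req *reqs, size_t n)
{
	size_t out = 0;

	if (n == 0)
		return 0;
	for (size_t i = 1; i < n; i++) {
		if (can_merge_forward(&reqs[out], &reqs[i]))
			reqs[out].nbytes += reqs[i].nbytes;
		else
			reqs[++out] = reqs[i];
	}
	return out + 1;
}
```

With this, a 64K guest write that arrives as a 44K request followed by a
contiguous 20K request goes to the disk as one 64K transfer.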

-- 
Manuel Bouyer <bouyer@antioche.eu.org>
     NetBSD: 26 years of experience will always make the difference
--