Subject: Re: Corrupt data when reading filesystems under Linux guest
To: Jed Davis <jdev@panix.com>
From: Manuel Bouyer <bouyer@antioche.lip6.fr>
List: port-xen
Date: 06/14/2005 15:32:42
On Mon, Jun 13, 2005 at 03:58:07AM +0000, Jed Davis wrote:
> In article <d8aovu$a59$1@sea.gmane.org>, Jed Davis <jdev@panix.com> wrote:
> > In article <d88qle$s6r$1@sea.gmane.org>, Jed Davis <jdev@panix.com> wrote:
> > > 
> > > So I might even be able to fix this myself, if no-one more knowledgeable
> > > is working on it.
> > 
> > And I think I have
> 
> No, not really.  That is, the patch I sent has, I'm pretty sure, a
> serious bug: if a pool allocation or xen_shm fails, xbdback_io will bail
> out after having potentially already enqueued several IOs.  I think it
> won't mismanage memory, but it will still defer an xbd request while
> also performing part of it and then sending the guest a completion
> message.  This is wrong in a number of ways.

I think we can deal with this by sending a partial xfer to the drive; it
shouldn't break anything. But obviously the completion message has to be
sent only once the whole request has been handled.
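
Concretely, something along these lines (just a sketch; the names and
fields below are made up for illustration, not the actual xbdback ones):
keep a per-request count of I/Os still in flight, and only answer the
guest once it drops to zero, whether or not part of the request had to
be deferred:

/* hypothetical per-request state; not the real xbdback layout */
struct xbdback_request {
	int	rq_iocount;	/* I/Os submitted, not yet completed */
	int	rq_error;	/* sticky error for the whole request */
	/* ... */
};

/* illustrative only: whatever sends the blkif response to the guest */
void	xbdback_reply(struct xbdback_request *, int);

/*
 * Called from each individual I/O's completion path.  The guest only
 * sees the completion message once everything, including any deferred
 * part of the request, has actually finished.
 */
static void
xbdback_iodone_one(struct xbdback_request *rq, int error)
{
	if (error != 0)
		rq->rq_error = error;
	if (--rq->rq_iocount == 0)
		xbdback_reply(rq, rq->rq_error);
}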

> 
> And the other changes I'm making don't, so far as I know, sidestep this
> issue.  I think I'll have to chain the actual IOs together, toss them
> if a pool_get fails, run them all at the end of the segment loop, and
> adjust the xenshm callback to match.  Except that the callback can fail
> to install for want of memory, it looks like.  That's... annoying.

If it's really an issue, we can preallocate the xen_shm_callback_entry in
the xbdback_request and adjust the xen_shm interface accordingly. This
would, at least, fix this problem.
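
Roughly, it could look like this (only a sketch; the names and fields are
illustrative, and locking around the callback list is omitted): the entry
would live in the xbdback_request, so registering the callback could no
longer fail for want of memory:

#include <sys/queue.h>

/*
 * Callback entry, now embedded in the caller's state instead of being
 * pool_get'd inside xen_shm.
 */
struct xen_shm_callback_entry {
	SIMPLEQ_ENTRY(xen_shm_callback_entry) xshmc_entries;
	int	(*xshmc_callback)(void *);
	void	*xshmc_arg;
};

SIMPLEQ_HEAD(, xen_shm_callback_entry) xen_shm_callbacks =
    SIMPLEQ_HEAD_INITIALIZER(xen_shm_callbacks);

/*
 * Adjusted interface: the caller supplies the entry, so this cannot
 * fail (spl/lock protection of the list omitted in this sketch).
 */
void
xen_shm_callback_prealloc(struct xen_shm_callback_entry *xshmc,
    int (*cb)(void *), void *arg)
{
	xshmc->xshmc_callback = cb;
	xshmc->xshmc_arg = arg;
	SIMPLEQ_INSERT_TAIL(&xen_shm_callbacks, xshmc, xshmc_entries);
}

/* xbdback would then just carry one entry per request: */
struct xbdback_request {
	/* ... existing fields ... */
	struct xen_shm_callback_entry rq_xshmc;
};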

--
Manuel Bouyer, LIP6, Universite Paris VI.           Manuel.Bouyer@lip6.fr
     NetBSD: 26 years of experience will always make the difference