Port-xen archive


Re: cache sync ioctl, freezes



mlelstv%serpens.de@localhost (Michael van Elst) writes:

> gdt%lexort.com@localhost (Greg Troxel) writes:
>
>>Things are mostly ok, except:
>
>>  - I get spurious DIOCCACHESYNC failed messages (below).  I think
>>    that's because zvols don't support cache sync.  I don't know if this
>>    is harmless or not.
>
> That is mostly harmless, but may leave a disk in a less consistent
> state than necessary when the system crashes.
>
> For FreeBSD the code implements DIOCGFLUSH (the same operation),
> so it looks trivial to also add DIOCCACHESYNC like:
>
> diff -p -u -r1.13 zvol.c
> --- zvol.c      29 Feb 2020 17:03:33 -0000      1.13
> +++ zvol.c      2 Nov 2024 07:15:44 -0000
> @@ -3638,6 +3638,10 @@ zvol_ioctl(dev_t dev, int cmd, intptr_t 
>                 break;
>         }
>  
> +       case DIOCGSYNC:
> +               zil_commit(zv->zv_zilog, ZVOL_OBJ);
> +               break;
> +
>         default:
>                 dprintf("unknown disk_ioctl called\n");
>                 error = ENOTTY;

I'm now running with that in the dom0 (but with the netbsd ioctl name).
I don't get the DIOCCACHESYNC errors any more.  Interestingly, the
performance of my stress script is vastly better.  It took 7s to create
10000 files and 29s to remove them (the script does a sleep 10 after the
sync and before the rm).  Removal still feels sluggish, but before this
change it was, I think, more like 5 minutes.

stress-zfs: 1730642097 00-CREATE-START
stress-zfs: 1730642104 01-CREATE-DONE
stress-zfs: 1730642104 02-CREATE-sync
stress-zfs: 1730642114 10-REMOVE-START
stress-zfs: 1730642143 10-REMOVE-DONE
stress-zfs: 1730642143 10-REMOVE-sync
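
The stress script itself is not included in this message.  As a rough
illustration only, a test with the same shape could be sketched in C as
below.  The function names (mark, each_file, stress_run), the file
naming, and the exact placement of the markers around sync are all
assumptions, not a copy of the real script.

```c
/* Hypothetical reconstruction of a create/sync/remove stress test.
 * The real stress-zfs script is not shown in this thread. */
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Print a marker in the same "tag: epoch label" shape as the log above. */
static void
mark(const char *label)
{
	printf("stress-zfs: %lld %s\n", (long long)time(NULL), label);
	fflush(stdout);
}

/* Create (do_create != 0) or remove n files named f0..f(n-1) in dir.
 * Returns the number of files handled successfully. */
static int
each_file(const char *dir, int n, int do_create)
{
	char path[1024];
	int i, fd, ok = 0;

	for (i = 0; i < n; i++) {
		snprintf(path, sizeof(path), "%s/f%d", dir, i);
		if (do_create) {
			fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0644);
			if (fd == -1)
				continue;
			close(fd);
			ok++;
		} else if (unlink(path) == 0)
			ok++;
	}
	return ok;
}

/* Run one create/sync/pause/remove/sync cycle.
 * Returns 0 if every file was created and removed, -1 otherwise. */
int
stress_run(const char *dir, int n, unsigned pause_s)
{
	int created, removed;

	mark("00-CREATE-START");
	created = each_file(dir, n, 1);
	mark("01-CREATE-DONE");
	sync();				/* assumption: marker follows the sync */
	mark("02-CREATE-sync");
	sleep(pause_s);
	mark("10-REMOVE-START");
	removed = each_file(dir, n, 0);
	mark("10-REMOVE-DONE");
	sync();
	mark("10-REMOVE-sync");
	return (created == n && removed == n) ? 0 : -1;
}
```

Calling stress_run(dir, 10000, 10) on a directory inside the pool would
match the file count and pause described above; subtracting the START
timestamp from the DONE timestamp gives the per-phase time.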

I am guessing that the error return caused some bad behavior.

I have committed this to current.
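
For anyone who wants to poke at a device by hand, here is a minimal
sketch of issuing the same ioctl from userland.  It assumes NetBSD's
<sys/dkio.h>, where DIOCCACHESYNC takes a pointer to an int "force"
flag; the function name disk_cache_sync is made up for illustration,
and on other systems the function just reports ENOTSUP.  It is not the
code path the Xen backend uses.

```c
/* Sketch: ask a disk device to flush its write cache from userland.
 * DIOCCACHESYNC and <sys/dkio.h> are NetBSD-specific; elsewhere this
 * function only reports ENOTSUP. */
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>
#include <sys/ioctl.h>
#ifdef __NetBSD__
#include <sys/dkio.h>
#endif

/* Returns 0 on success, -1 with errno set on failure. */
int
disk_cache_sync(const char *devpath)
{
	int fd, rv, saved;
	int force = 0;	/* assumption: 0 asks for a normal, non-forced sync */

	/* Opened read-write on the assumption that the driver wants a
	 * writable descriptor for a cache flush. */
	fd = open(devpath, O_RDWR);
	if (fd == -1)
		return -1;
#ifdef DIOCCACHESYNC
	rv = ioctl(fd, DIOCCACHESYNC, &force);
#else
	(void)force;
	errno = ENOTSUP;
	rv = -1;
#endif
	saved = errno;		/* keep the ioctl's errno across close() */
	close(fd);
	errno = saved;
	return rv;
}
```

Against a zvol without the change quoted above, the ioctl would fall
into the default case and come back with ENOTTY.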

>>  - If I run the system out of ram (by mallocing 8GB in a system with
>>    6GB of RAM, and then writing to each page), the system freezes, by
>>    which I mean an ssh session stops responding to CR and if I do e.g
>>       echo 204500; date
>>    typing that at 204500, then the date printed is usually 2 minutes
>>    plus later.  (This happens without zfs loaded!)
>
> Doesn't sound like a "freeze". Maybe system is (slowly) swapping?

Probably it is.  I am 99% sure I was having genuine freezes earlier, but
I now think I have not yet reproduced them.

>>  - When I run a script that creates 10000 files in zfs, it runs
>>    quickly, in seconds, maybe 10.  Trying to rm them takes over 5
>>    minutes.   Keep in mind that the disk in the zpool in the domU is
>>    backed by a zvol on the dom0.
>
> That is an issue with our ZFS code. Removing files is a mostly
> synchronous operation and that's not related to Xen or on what
> device the data is backed, except that it's obviously worse on
> slower devices.

Is there any hope of fixing this, or understanding of why NetBSD zfs is
slower than FreeBSD zfs?

But what I am seeing now is just somewhat slow, not horrific.

