Subject: Re: buffer priority [Re: unified buffers and responsibility]
To: Manuel Bouyer <bouyer@antioche.eu.org>
From: Milos Urbanek <urbanek@openbsd.cz>
List: tech-kern
Date: 06/13/2002 14:07:20
On Thu, Jun 13, 2002 at 12:42:57AM +0200, Manuel Bouyer wrote:
> Hi,
> I've experimented a bit with this problem of X freezing while a large
> cp is running and the target disk is the same as the system disk.
>
> One of the reasons for the problem is that some data gets paged out
> when it shouldn't be (I see activity on the system disk when doing
> a large cp to another disk, clearly related to the cp).
> Even setting filemax low (under 10%) doesn't help, and top still reports
> about 30M allocated to files (of 128M - 70M when kernel and buffer cache
> are allocated).
Is that because most RAM pages are already assigned to file buffers and
cannot be freed until they are written to the disk and freed in biodone(),
so the pages of running processes end up being swapped out?
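
To make the question concrete, here is a toy model of what I suspect
happens (plain C, purely an illustration; the names are made up and
nothing here is the real UVM or buffer cache code):

    /*
     * Toy model only: an in-flight write keeps a page "busy", so the
     * page daemon cannot reclaim it until the completion handler runs
     * (biodone() in the kernel).
     */
    #include <stdbool.h>
    #include <stddef.h>

    struct toy_page {
        bool busy;      /* write to disk still in flight */
        bool dirty;     /* modified, must be written before reuse */
    };

    /* The "page daemon": it can only reclaim clean, idle pages. */
    static size_t
    reclaim_pages(struct toy_page *pages, size_t n)
    {
        size_t i, freed = 0;

        for (i = 0; i < n; i++) {
            if (pages[i].busy || pages[i].dirty)
                continue;   /* must wait for the write to finish */
            freed++;        /* this page could go back to a process */
        }
        return freed;
    }

    /* Stand-in for the I/O completion path (biodone() in the kernel). */
    static void
    toy_io_done(struct toy_page *p)
    {
        p->dirty = false;
        p->busy = false;    /* now the page daemon may reclaim it */
    }

While the cp keeps the queue full, almost every file page stays busy or
dirty, which would explain why the page daemon falls back to paging out
process memory instead.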
>
> The second problem is I/O priority: buffers of a large, batch I/O have
> the same priority as a one-buffer I/O on which a process is blocked.
> This also kills interactive performance (and the disksort() routines
> probably make this even worse).
> On my system the test partition is the last one in the disklabel, so I
> replaced disksort() with this simple algorithm: the lower the partition
> number, the higher the priority of the buffer.
> +void
> +disksort_pri(struct buf_queue *bufq, struct buf *bp)
> +{
> +	int part = DISKPART(bp->b_dev);
> +	struct buf *bq, *nbq;
> +
> +	bq = BUFQ_FIRST(bufq);
> +	if (bq == NULL) {
> +		BUFQ_INSERT_TAIL(bufq, bp);
> +		return;
> +	}
> +
> +	/* Insert before the first later buffer from a higher-numbered partition. */
> +	while ((nbq = BUFQ_NEXT(bq)) != NULL) {
> +		if (part < DISKPART(nbq->b_dev))
> +			goto insert;
> +		bq = nbq;
> +	}
> +insert:
> +	BUFQ_INSERT_AFTER(bufq, bq, bp);
> +}
>
> This helps a lot. There is still some slowdown, but the system is now usable
> when a cp is running (without this, X will freeze completely until the cp
> completes).
>
> So I think we need something to prioritize I/O at the disk level (not the
> partition level). Even for server use I'm afraid this can cause problems
> (I'm thinking of my mail server, on which some users have mailboxes of more
> than 100M).
Isn't the issue that the disk is unable to drain buffers from the queue fast
enough? I do not think that prioritizing would help in the case of one process
doing a 'cp huge_file somewhere' when there are no other interactive processes
performing I/O (which, I think, is exactly the situation in which I observe
this problem myself).
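
That said, if someone wants to experiment with disk-level priorities, I
could imagine something like two queues per disk, with batch/async writes
kept behind the requests a process is actually sleeping on. A minimal
sketch of the idea (none of these names exist in the tree):

    /*
     * Sketch only: per-disk queues with two priorities, so a buffer a
     * process is sleeping on is never stuck behind a long run of
     * asynchronous batch writes.
     */
    #include <sys/queue.h>
    #include <stddef.h>

    struct toy_buf {
        TAILQ_ENTRY(toy_buf) tb_entry;
        int tb_async;                   /* batch/async write? */
    };

    TAILQ_HEAD(toy_bufq, toy_buf);

    struct toy_disk_queue {
        struct toy_bufq tdq_sync;       /* reads and synchronous writes */
        struct toy_bufq tdq_async;      /* large batch writes */
    };

    static void
    toy_bufq_init(struct toy_disk_queue *q)
    {
        TAILQ_INIT(&q->tdq_sync);
        TAILQ_INIT(&q->tdq_async);
    }

    static void
    toy_enqueue(struct toy_disk_queue *q, struct toy_buf *bp)
    {
        if (bp->tb_async)
            TAILQ_INSERT_TAIL(&q->tdq_async, bp, tb_entry);
        else
            TAILQ_INSERT_TAIL(&q->tdq_sync, bp, tb_entry);
    }

    /*
     * The strategy routine takes from the sync queue first; the async
     * queue is only looked at when nothing urgent is pending.
     */
    static struct toy_buf *
    toy_dequeue(struct toy_disk_queue *q)
    {
        struct toy_buf *bp;

        if ((bp = TAILQ_FIRST(&q->tdq_sync)) == NULL)
            bp = TAILQ_FIRST(&q->tdq_async);
        if (bp != NULL)
            TAILQ_REMOVE(bp->tb_async ? &q->tdq_async : &q->tdq_sync,
                bp, tb_entry);
        return bp;
    }

A real version would also have to make sure the async queue still makes
progress, otherwise the cp itself would be starved instead.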
Milos
>
> Now I don't have much of an idea of what algorithm to use, nor
> how to implement it. Probably something like the process scheduler, but
> for I/O: processes doing a lot of I/O would have their I/O priority lowered.
> At which level it should be implemented is another problem. Maybe doing it
> in the pagedaemon would be enough, as it seems the problem is mostly caused by
> writes (sequential reads probably can't lead to large buf queues at the
> disk level).
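
If it were done like the process scheduler, I imagine the accounting could
look roughly like this (again only a sketch with made-up names, decaying a
per-process I/O count the way the scheduler decays p_estcpu):

    /*
     * Sketch only: scheduler-like accounting of I/O per process.  The
     * recent-I/O count is decayed periodically, so a process doing a
     * lot of I/O lately gets a lower I/O priority.
     */
    struct toy_ioacct {
        unsigned int io_recent;         /* decayed count of recent I/O */
    };

    #define TOY_IO_PRI_LEVELS   4

    /* Charge one I/O to the process when one of its buffers is queued. */
    static void
    toy_charge_io(struct toy_ioacct *ia)
    {
        ia->io_recent++;
    }

    /* Called periodically, like schedcpu(): halve the recent-I/O count. */
    static void
    toy_decay_io(struct toy_ioacct *ia)
    {
        ia->io_recent /= 2;
    }

    /*
     * Map the recent-I/O count to a queue priority: the more I/O a
     * process has done lately, the lower (numerically larger) its
     * priority when its buffers are queued at the disk.
     */
    static int
    toy_io_priority(const struct toy_ioacct *ia)
    {
        unsigned int pri = ia->io_recent / 16;

        if (pri >= TOY_IO_PRI_LEVELS)
            pri = TOY_IO_PRI_LEVELS - 1;
        return (int)pri;
    }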
>
> I already faced this problem in 1.5.x (on the mail server mentioned above),
> but it's worse with UBC because:
> - the buf queue can be larger
> - a sequential write can push program data out of RAM
>
> Any ideas?
>
> --
> Manuel Bouyer <bouyer@antioche.eu.org>
> --
>
--