Subject: Re: I/O priorities
To: Wojciech Puchar <wojtek@chylonia.3miasto.net>
From: Greywolf <greywolf@starwolf.com>
List: tech-kern
Date: 06/20/2002 12:07:27
As a non-kernel non-engineer (I know just enough to be dangerous), it
does strike me that ordering writes per partition is asking for trouble;
should the writes not be ordered per physical device?
Take this into account: another I/O is currently holding up the
write queue long enough for the following two writes to accumulate:

  - a write scheduled for blocks 229-383 of Nd0h (absolute
    blocks 32999829-32999983);
  - a write scheduled for blocks 128-255 of Nd0a (absolute
    blocks 128-255 plus the partition's starting offset, since
    Nd0a[0] != Nd0[0]).
If we order per partition -- at least by default, with no priorities
given to partitions -- we have to seek to the middle or near the end
of the disk to scribble something (that time may be anywhere from
negligible to non-trivial), and then we have to seek BACK to near the
beginning of the disk (for which the time will NOT be trivial).
Actually, that will happen whether or not we assign priorities to
different partitions, but I'm leaving it in for the sake of the
train of thought.
Anyway, wouldn't it still be best to order the writes per disk rather
than per partition?
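
To make that concrete, here's a rough sketch of what I mean -- NOT
the kernel's actual disksort(9), just an illustration with made-up
struct and function names: queue everything for a physical device on
the ABSOLUTE block number (partition offset + partition-relative
block), so the Nd0a and Nd0h writes above drain in one sweep:

#include <stdio.h>
#include <stdlib.h>

struct ioreq {
	unsigned long	 abs_blkno;	/* absolute block on the device */
	struct ioreq	*next;
};

/* one queue per physical device, kept sorted by absolute block */
static struct ioreq *devq;

static void
enqueue(unsigned long part_offset, unsigned long rel_blkno)
{
	struct ioreq *req, **pp;

	req = malloc(sizeof(*req));
	if (req == NULL)
		abort();
	/* translate the partition-relative block to an absolute one */
	req->abs_blkno = part_offset + rel_blkno;

	/* insert in sorted order: one low-to-high sweep of the disk */
	for (pp = &devq; *pp != NULL; pp = &(*pp)->next)
		if ((*pp)->abs_blkno > req->abs_blkno)
			break;
	req->next = *pp;
	*pp = req;
}

int
main(void)
{
	struct ioreq *req;

	/* the two writes from the example above, in arrival order
	   (partition offsets are invented for illustration): */
	enqueue(32999600UL, 229UL);	/* Nd0h write: abs 32999829 */
	enqueue(63UL, 128UL);		/* Nd0a write: abs 191 */

	/* drained in physical order: 191 first, then 32999829 */
	while ((req = devq) != NULL) {
		devq = req->next;
		printf("write abs block %lu\n", req->abs_blkno);
		free(req);
	}
	return 0;
}

Queued per partition, those same two writes would drain in arrival
order and cost us the long seek back to the front of the disk.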
As for an I/O scheduler: would that be something like the process
scheduler, where the more contiguous I/O time a writing process has
had, the lower its I/O priority gets?
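
Something like this is what I'm picturing -- purely hypothetical, by
analogy with the 4.4BSD process scheduler's estcpu decay; every name
and constant below is invented:

#include <stdio.h>

struct ioprio {
	int	recent;	/* recent I/O charged to this process */
	int	base;	/* static base priority (lower = better) */
};

/* charge one completed request: heavy I/O users sink in priority */
static void
io_charge(struct ioprio *p)
{
	p->recent += 4;
}

/* called periodically: old activity is forgiven over time */
static void
io_decay(struct ioprio *p)
{
	p->recent = (p->recent * 2) / 3;
}

/* effective priority: the more contiguous I/O time a process has
   had lately, the worse (higher) its priority number becomes */
static int
io_effective(const struct ioprio *p)
{
	return p->base + p->recent;
}

int
main(void)
{
	struct ioprio hog = { 0, 10 }, light = { 0, 10 };
	int i;

	for (i = 0; i < 20; i++)
		io_charge(&hog);	/* e.g. dd if=/dev/zero */
	io_charge(&light);

	printf("hog: %d, light: %d\n",
	    io_effective(&hog), io_effective(&light));
	io_decay(&hog);
	printf("after decay, hog: %d\n", io_effective(&hog));
	return 0;
}

So a hog like dd if=/dev/zero sinks in priority while it's busy and
floats back up once it quiets down, the same way a CPU hog does.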
Personally, I think it should apply to hard drives only; if you do
something like that with a CD-RW, you run the risk of a buffer
underrun.
But, then, what do I know?
On Thu, 20 Jun 2002, Wojciech Puchar wrote:
# > > about this: 1) it's only a problem when the kernel allocates buffers
# > > for its own internal use, during read()s and write()s, but not mmap().
# >
# > mmap can probably trigger the problem as well. Anything that will create
# > a large buffered write will.
#
# and mmap will be used more and more often, especially with Thorpej's patch
# for direct memory to ethernet transfers.
#
# > > seconds for dd if=/dev/zero to flush all of its buffers.
# >
# > An I/O scheduler probably does make sense. Other mechanisms can
# > help too. First we probably need a per-partition I/O queue,
# > instead of a per-device one.
#
# i don't think partitions have anything to do with that. doing that
# will give us the linux-like case (at least with linux 2.2 kernels)
# where lots of partitions means even slower operation and lots of
# disk seeks, as it sorts requests per partition, not per real device.
--*greywolf;
--
NetBSD: Servers' choice!