Subject: Re: Disk scheduling policy (Re: NEW_BUFQ_STRATEGY)
To: None <tls@rek.tjls.com>
From: Jason Thorpe <thorpej@wasabisystems.com>
List: tech-kern
Date: 12/01/2003 16:14:54
On Dec 1, 2003, at 3:57 PM, Thor Lancelot Simon wrote:
> The way I read the SGI text, they do N requests from queue A, then
> N requests from queue B, and so forth. A simple implementation of
> this seems like it might disrupt the elevator sort quite badly, so I
> wonder if they actually did something more clever.
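A naive reading of the "N from queue A, then N from queue B" scheme can be sketched as follows. It assumes each queue is already elevator-sorted and is modelled as an array of block numbers. The `struct rq` type and `rr_dispatch()` are hypothetical and are not the SGI or NetBSD code. The function takes up to `quantum` requests from each queue in turn until every queue is drained:

```c
#include <stddef.h>

/* Hypothetical request queue: pending requests as sorted block numbers. */
struct rq {
    const long *blkno;   /* pending requests, already elevator-sorted */
    size_t      len;     /* number of pending requests */
    size_t      head;    /* index of the next request to dispatch */
};

/*
 * Dispatch up to 'quantum' requests from each queue in turn, cycling
 * through the queues until all are drained.  The dispatch order is
 * written to 'out', which must hold the sum of all queue lengths.
 * Returns the number of requests written.
 */
size_t
rr_dispatch(struct rq *qs, size_t nq, size_t quantum, long *out)
{
    size_t n = 0;
    int progress = 1;

    while (progress) {
        progress = 0;
        for (size_t i = 0; i < nq; i++) {
            struct rq *q = &qs[i];
            /* Take at most 'quantum' requests from this queue. */
            for (size_t k = 0; k < quantum && q->head < q->len; k++) {
                out[n++] = q->blkno[q->head++];
                progress = 1;
            }
        }
    }
    return n;
}
```

Take queue A holding blocks 10, 20, 30, 40 and queue B holding 15, 25, with a quantum of 2. The dispatch order is then 10, 20, 15, 25, 30, 40. The head has to move backwards from 20 to 15, which is exactly the kind of disruption to the elevator sort described above.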
They probably didn't have to do anything more clever. SGI systems
almost exclusively used SCSI disks (tuned a certain way), and could
thus rely on the disk to mitigate any disruption of the elevator sort
(through command reordering).
Really, it's not clear that the elevator sort buys you much anyway,
when you're talking to raw disks, because disks don't really expose
their real geometry anymore.
That said, elevator sort could potentially be VERY useful on RAID
systems. I know of a RAID card vendor whose firmware sets a timer when
it receives a write request for a block within a given stripe, and then
buffers the write ("stalls" it from the OS's perspective). If, before
the timer expires, writes that fill out the rest of the stripe are
received, the firmware skips the r/m/w cycle for the stripe. This can
greatly improve performance.
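The stripe-fill behaviour described above can be modelled roughly as follows. This is a hypothetical sketch of the idea, not any vendor's firmware. The 4-block stripe, the `hold` interval, and all names are assumptions. Writes may arrive in any order within the stripe. If every data block shows up before the deadline, the stripe is flushed without a read/modify/write. Otherwise the timer forces an r/m/w:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stripe width; must be less than 32 for the bitmap below. */
#define STRIPE_DATA_BLOCKS 4
#define STRIPE_FULL_MASK ((UINT32_C(1) << STRIPE_DATA_BLOCKS) - 1)

enum flush_kind { FLUSH_NONE, FLUSH_FULL_STRIPE, FLUSH_RMW };

struct stripe_pending {
    uint32_t filled;     /* bit i set => data block i has been written */
    long     deadline;   /* time at which we stop waiting for the rest */
    bool     active;     /* a write for this stripe is being held */
};

/* Record a write to data block 'blk' (< STRIPE_DATA_BLOCKS) at time 'now'. */
enum flush_kind
stripe_write(struct stripe_pending *sp, unsigned blk, long now, long hold)
{
    if (!sp->active) {
        /* First write for this stripe: start the timer. */
        sp->active = true;
        sp->filled = 0;
        sp->deadline = now + hold;
    }
    sp->filled |= UINT32_C(1) << blk;
    if (sp->filled == STRIPE_FULL_MASK) {
        /* Whole stripe present: parity comes from the new data alone. */
        sp->active = false;
        return FLUSH_FULL_STRIPE;
    }
    return FLUSH_NONE;
}

/* Called periodically; flushes a partial stripe once its timer expires. */
enum flush_kind
stripe_tick(struct stripe_pending *sp, long now)
{
    if (sp->active && now >= sp->deadline) {
        sp->active = false;
        return FLUSH_RMW;   /* old data and parity must be read first */
    }
    return FLUSH_NONE;
}
```

A dd(1)-style writer waits for each write to complete before issuing the next. It therefore only ever has one block of the stripe outstanding. In this model it always ends up on the `FLUSH_RMW` path and also pays the full hold time on every write.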
This, of course, makes this card look horrible if you use dd(1) to test
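RAID-5 write performance, since each of the writes from the dd program
is issued in lock-step (each one waits for the previous one to complete,
so the rest of the stripe never arrives before the timer expires).
However, if the disk queue sorting algorithm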
can arrange to group writes for a stripe together (not even necessarily
in sequential order), then you can potentially have a major positive
impact on overall system performance.
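One way a disk queue sort could keep a stripe's writes together is to change how a one-way (C-LOOK style) elevator decides whether a request belongs to the current sweep: it compares stripe numbers rather than block numbers. The sketch below assumes a fixed number of data blocks per stripe. It uses file-scope variables because standard `qsort()` takes no context argument. All names are hypothetical:

```c
#include <stdlib.h>

/* Hypothetical parameters, set before sorting. */
static long g_stripe_blocks;  /* data blocks per stripe */
static long g_head_stripe;    /* stripe the head is currently in */

/* 0 if the request is in the current sweep, 1 if it waits for the wrap. */
static int
sweep_of(long blkno)
{
    return blkno / g_stripe_blocks < g_head_stripe;
}

static int
cmp_stripe_elevator(const void *a, const void *b)
{
    long x = *(const long *)a, y = *(const long *)b;
    int sx = sweep_of(x), sy = sweep_of(y);

    if (sx != sy)
        return sx - sy;            /* current sweep first */
    long tx = x / g_stripe_blocks, ty = y / g_stripe_blocks;
    if (tx != ty)
        return tx < ty ? -1 : 1;   /* ascending stripe within a sweep */
    return (x > y) - (x < y);      /* ascending block within a stripe */
}

/* Sort pending block numbers for a head at block 'head'. */
void
stripe_elevator_sort(long *blk, size_t n, long head, long stripe_blocks)
{
    g_stripe_blocks = stripe_blocks;
    g_head_stripe = head / stripe_blocks;
    qsort(blk, n, sizeof blk[0], cmp_stripe_elevator);
}
```

Take 4 blocks per stripe, the head at block 6, and pending requests 5, 9, 1, 4, 7, 2. These come out as 4, 5, 7, 9, 1, 2, so all of stripe 1 is issued together. A plain block-number elevator starting from block 6 would instead issue 7, 9, 1, 2, 4, 5. That leaves blocks 4 and 5 of stripe 1 for the next sweep.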
I would also like to see a disk sorting algorithm that could coalesce
adjacent writes or reads into single requests (perhaps enqueueing an
uber-buf that pointed to a list of sub-bufs that were treated as s/g
elements, or something). As part of this, I'd really like to add a
bus_dmamap_load_buf() that could handle various different data
representations within "struct buf" (I have a project I'm currently
planning that could really make use of attaching mbuf chains to bufs,
rather than simple linear buffers).
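The coalescing idea can be sketched roughly as follows, assuming the queue is already sorted by block number. Adjacent requests in the same direction are merged into one "uber" request, and its sub-requests serve as scatter/gather elements. `struct subreq` stands in for `struct buf`, and the 16-entry s/g limit and all names are hypothetical. A loader along the lines of the proposed bus_dmamap_load_buf() (which does not exist yet) might then walk `sg[]` to build the DMA segment list:

```c
#include <stdbool.h>
#include <stddef.h>

#define MAX_SG 16   /* hypothetical per-transfer scatter/gather limit */

/* Hypothetical stand-in for a struct buf: one contiguous request. */
struct subreq {
    long  blkno;     /* starting block */
    long  nblks;     /* length in blocks */
    bool  is_write;  /* direction; reads and writes never merge */
    void *data;      /* this request's own buffer */
};

/* One merged transfer: a block range plus its s/g elements. */
struct uberreq {
    long   blkno;
    long   nblks;
    bool   is_write;
    size_t nsg;
    const struct subreq *sg[MAX_SG];  /* sub-requests, in block order */
};

/*
 * Coalesce a queue already sorted by blkno.  Adjacent requests in the
 * same direction are merged while the s/g limit allows.  'out' must
 * have room for 'n' entries.  Returns the number of merged transfers.
 */
size_t
coalesce(const struct subreq *q, size_t n, struct uberreq *out)
{
    size_t nout = 0;

    for (size_t i = 0; i < n; i++) {
        struct uberreq *cur = nout > 0 ? &out[nout - 1] : NULL;

        if (cur != NULL && cur->is_write == q[i].is_write &&
            cur->blkno + cur->nblks == q[i].blkno && cur->nsg < MAX_SG) {
            /* Contiguous with the previous transfer: add an s/g element. */
            cur->nblks += q[i].nblks;
            cur->sg[cur->nsg++] = &q[i];
        } else {
            /* Not mergeable: start a new transfer. */
            cur = &out[nout++];
            cur->blkno = q[i].blkno;
            cur->nblks = q[i].nblks;
            cur->is_write = q[i].is_write;
            cur->nsg = 1;
            cur->sg[0] = &q[i];
        }
    }
    return nout;
}
```

Reads and writes are kept apart. A new transfer is started once the s/g limit is reached, since a controller can only accept a bounded number of segments per command.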
-- Jason R. Thorpe <thorpej@wasabisystems.com>