Subject: Re: Low AAC performance but only when tested through the file system
To: Olaf Seibert <rhialto@polderland.nl>
From: Greg 'groggy' Lehey <grog@NetBSD.org>
List: port-i386
Date: 12/03/2003 13:43:30

On Thursday, 27 November 2003 at 18:02:52 +0100, Olaf Seibert wrote:
> I am still struggling with the performance of an Adaptec 2120S RAID
> controller (aac0). When testing with bonnie++ (from pkgsrc), write
> performance is only around 4M/sec. The number of xfers/second, as shown
> by `systat vmstat', is only around 60 while it is writing and only
> around 100 when reading. This is surprisingly low.

As a disk performance test, bonnie is surprisingly misleading.
bonnie++ is probably better, but it still requires a lot of
interpretation.

> However, when I test with a benchmark called RawIO (see
> http://www.acnc.com/benchmarks.html) on the (unused) raw swap partition,
> I get much better results. Sequential writing is around 35M/sec, and the
> number of xfers/sec is over 1000, peaking at some 1900.

This is surprisingly high.  You can assume that you're not getting
valid results.

On Saturday, 29 November 2003 at 15:45:00 +0100, Olaf Seibert wrote:
>
> Output from rawio:
>
> bash-2.05b# ./rawio -s 1g 1g -a /dev/rld0b
>            Random read  Sequential read    Random write Sequential write
> ID          K/sec  /sec    K/sec  /sec     K/sec  /sec     K/sec  /sec
> ld0b      15587.8   950  32273.5  1970   15127.1   933   31824.6  1942

Look at the -s parameter there.  You're only accessing 1 GB of the
total disk space.  IIRC these things have a cache memory of 128 MB, so
you're probably hitting the cache most of the time.
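
To get numbers that reflect the disks rather than the controller's
cache, it would be worth rerunning over a transfer area several times
the size of the cache, keeping the same invocation but with larger
sizes (assuming the partition is big enough), something like:

  ./rawio -s 8g 8g -a /dev/rld0b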

rawio is in fact a pretty pessimistic benchmark.  Even the sequential
I/O tests tend to give poor results, because there are several
processes accessing different parts of the disk in parallel.  As a
result, you still get the seek latency.  That's what accounts for the
low numbers.  Here are some results done with Vinum on some rather old
disks with varying stripe sizes.  They don't look very good, but in
fact they're about as much as the hardware can handle.  Note
particularly that the sequential I/O rate is not significantly higher
than the random rate.  These tests were done with other stuff going on
on the machine, which will probably explain some of the anomalies in
the values (sequential read for stripe size 512 kB, for example).

           Random read  Sequential read    Random write Sequential write
ID          K/sec  /sec    K/sec  /sec     K/sec  /sec     K/sec  /sec

s.1k        1476.6    90   1341.3    82    1469.6    90    1310.7    80
s.2k        1543.3    92   1598.3    98    1498.7    92    1303.3    80
s.4k        2728.2   162   2103.5   128    2717.5   165    2477.6   151
s.8k        3859.4   227   3792.5   231    3835.8   227    3900.2   238
s.16k       4631.8   280   5454.9   333    4527.0   283    5202.0   318
s.32k       5196.8   314   6515.4   398    5270.0   317    6491.2   396
s.64k       5730.2   347   5833.6   356    5644.9   347    7685.2   469
s.128k      5961.6   365   8233.7   503    6012.6   358    8772.9   535
s.256k      5879.5   352   6767.5   413    6018.0   355    6701.5   409
s.512k      5799.6   347   3195.0   195    5973.6   358    7956.1   486
s.1024k     5984.8   368   5031.5   307    6191.2   372    4376.1   267
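
To give an idea of what I mean by parallel access, the load rawio puts
on the device is roughly what the sketch below produces.  It's not the
real rawio source; the device name, sizes and counts are just examples.

#include <sys/types.h>
#include <sys/wait.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

#define NPROC     4                     /* concurrent reader processes */
#define XFERSIZE  16384                 /* bytes per transfer */
#define NXFER     1000                  /* transfers per process */
#define SPAN      (1024 * 1024 * 1024)  /* 1 GB test area, as with -s 1g */

int
main(void)
{
    int i;

    for (i = 0; i < NPROC; i++) {
        if (fork() == 0) {
            int fd = open("/dev/rld0b", O_RDONLY);      /* example device */
            char *buf = malloc(XFERSIZE);
            off_t base = (off_t) i * (SPAN / NPROC);    /* this child's slice */
            int j;

            if (fd < 0 || buf == NULL)
                _exit(1);
            for (j = 0; j < NXFER; j++) {
                /* Sequential within the slice, but the other children are
                 * working through their own slices at the same time, so
                 * the device still seeks between requests. */
                if (pread(fd, buf, XFERSIZE,
                          base + (off_t) j * XFERSIZE) != XFERSIZE)
                    perror("pread");
            }
            _exit(0);
        }
    }
    while (wait(NULL) > 0)
        ;
    return 0;
}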

On Tuesday,  2 December 2003 at 16:45:30 +0100, Olaf Seibert wrote:
> On Mon 01 Dec 2003 at 17:57:30 -0800, Bill Studenmund wrote:
>> On Mon, Dec 01, 2003 at 02:50:52AM +0100, Olaf Seibert wrote:
>>> On Sun 30 Nov 2003 at 15:07:33 -0800, Bill Studenmund wrote:
>>>> What's your stripe depth? For optimal performance, you want it to be 16k.
>>>> The file system will do 64k i/o's. With 4 data drives & a 16k stripe
>>>> depth, a 64k i/o (on a 64k boundary) will hit all 4 drives at once.
>>>
>>> It's the default 64k. In the previous hardware I tried 16k also but it
>>> made no difference, so I never tried it on this one.
>>
>> With a stripe depth of 64k (stripe width of 256k) and 64k i/os you will
>> get poor performance. If you are performing random i/o, each of those 64k
>> writes means READING 3 * 64k = 192k then writing 128k.
>>
>> Assuming the stripe depth really is 64k, each write by the OS will
>> turn into a read/modify/write operation in the RAID card. You'll be
>> getting worse performance than if you just used one drive.
>
> Isn't that simply always the case?

Pretty much.

> I'm not sure how write size and stripe size influence this unless
> the RAID controller is exceedingly stupid and re-creates the parity
> for the whole stripe if only a single sector of it changes.

I'd be interested to know if that's the case.  I think it might be
possible.
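
For what it's worth, a controller doesn't need to touch the rest of the
stripe to update parity for a small write; the old copy of the data
being overwritten and the old parity are enough.  The sketch below is
just the textbook XOR identity, not anything I know about the 2120S
firmware:

#include <stddef.h>

/*
 * RAID-5 small-write parity update:
 * new parity = old parity XOR old data XOR new data.
 */
void
raid5_small_write_parity(const unsigned char *old_data,
                         const unsigned char *new_data,
                         unsigned char *parity,     /* old parity in, new parity out */
                         size_t len)
{
    size_t i;

    for (i = 0; i < len; i++)
        parity[i] ^= old_data[i] ^ new_data[i];
}

So even a one-sector update costs two reads and two writes (old data
and old parity in, new data and new parity out), but it never needs a
re-read of the whole stripe unless the firmware really is that naive.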

I did a lot of investigation of UFS I/O behaviour a few years back
when I was writing Vinum.  That's one of the things I considered when
writing rawio: unless you're doing things like sequential file copies,
your I/O is surprisingly seldom an exact block.  It could be any
number of sectors with any alignment.  The only way a RAID controller
can optimize that is by coalescing requests, as you suggest.  I found
it so unlikely to make any difference that I didn't implement it in
Vinum.
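
To make "coalescing" concrete, it's no more than merging queued
requests which turn out to be contiguous on the device before issuing
them.  The sketch below is illustrative only; the structure isn't
Vinum's or the aac driver's, and as I say, Vinum doesn't do this:

#include <sys/types.h>
#include <stddef.h>

struct ioreq {
    off_t         offset;               /* byte offset on the device */
    size_t        length;               /* transfer length in bytes */
    int           write;                /* non-zero for a write request */
    struct ioreq *next;                 /* next request, in offset order */
};

/*
 * Walk a queue sorted by offset and merge requests which are physically
 * contiguous and going in the same direction into one larger transfer.
 */
void
coalesce(struct ioreq *q)
{
    while (q != NULL && q->next != NULL) {
        struct ioreq *n = q->next;

        if (q->write == n->write
            && q->offset + (off_t) q->length == n->offset) {
            q->length += n->length;     /* grow the first request */
            q->next = n->next;          /* and drop the second */
            /* a real driver would also chain the data buffers and
             * return n to its pool here */
        } else
            q = n;
    }
}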

Greg
--
See complete headers for address and phone numbers.
