NetBSD-Users archive
Re: Beating a dead horse
- Subject: Re: Beating a dead horse
- From: "William A. Mahaffey III" <wam%hiwaay.net@localhost>
- Date: Tue, 24 Nov 2015 21:57:50 -0553.75
On 11/24/15 19:08, Robert Elz wrote:
Date: Mon, 23 Nov 2015 11:18:48 -0553.75
From: "William A. Mahaffey III" <wam%hiwaay.net@localhost>
Message-ID: <5653492E.1090102%hiwaay.net@localhost>
Much of what you wanted to know has been answered already I think, but
not everything, so....
(in a different order than they were in your message)
| Also, why did my fdisk
| choose those values when his chose apparently better ones ?
There's a size threshold - drives smaller than it get the offset 63 stuff,
and drives larger, 2048 ... the assumption is that on small drives
you don't want to waste too much space, but on big ones a couple of
thousand sectors is really irrelevant...
I suspect the threshold is somewhere between your 1TB and the other 2TB
drives.
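To see which layout a particular drive actually got, you can just run fdisk
on it and look at the "start" value of the NetBSD partition (wd0 below is
only an example device name):

    # fdisk wd0    # start 63 means the old layout, start 2048 the newer one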
| If so, is there any way to redo that w/o a complete reinstall :-) ?
Since you're using raid, there is, but it would take forever, so you may
not want to do it... The upside is that the correction would just
slow down your system slightly, leaving it operational the whole time.
I suspect you're not going to want to bother, so I won't give all the
steps, but the basic strategy would be to use your spare disc - stop that
being a spare for now, and repartition it the way that you want all of the
drives partitioned. Then add that back as a hot spare for the raid array.
Then use raidctl to "fail" one of the other drives - raidframe will then
reconstruct the "failed" drive onto the hot spare (the one that is now
correctly divided up). Once that process finishes, the one that was failed
is no longer in use, and can be repartitioned. Do that, then add it as a
hot spare, and then fail another of the drives. Repeat until done...
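Per drive the commands would look roughly like this (raid2 and the wd*a
names below are only placeholders - substitute your actual RAID5 set and
its components):

    # raidctl -a /dev/wd4a raid2   # add the repartitioned disc as a hot spare
    # raidctl -F /dev/wd0a raid2   # fail a component; reconstruction onto the spare begins
    # raidctl -S raid2             # watch the reconstruction progress
    # (when it finishes, repartition the failed disc and repeat with the next one)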
Expect the whole process to take a week (not continuously, but you're
only likely to do one drive a day; once the reconstruct starts you'll
just leave it to work, and go do other stuff - on that system or whatever).
Actual human time would be about 10 minutes per drive.
If it was me, I wouldn't even think to look and see if it was finished till
the next day...
Doing a complete reinstall of everything would probably be done in a
few hours, so ...
Note that none of this is relevant unless you really decide that it
is needed, and you work out first exactly what all the numbers should
be. Also note that (ab)using raidframe this way will only fix the
alignment of the raid arrays; if any of the raidframe params, or the
filesystem(s) built on the raid array, ought to be altered, then that method
won't help at all, and starting again is definitely the best option (and
doing that before you get too invested in data on all those TBs).
| The machine works well except for horribly slow I/O to the RAID5 I setup
What is your definition of "horribly slow", and are we talking read, write,
or both?
4256EE1 # time dd if=/dev/zero of=/home/testfile bs=16k count=32768
32768+0 records in
32768+0 records out
536870912 bytes transferred in 22.475 secs (23887471 bytes/sec)
23.28 real 0.10 user 2.38 sys
4256EE1 #
i.e. about 24 MB/s. When I zero out parts of these drives to reinitialize
them, I see ~120 MB/s for one drive. RAID5 stripes I/O onto the data
drives, so I expect ~4X I/O speed w/ 4 data drives. With various
overheads/inefficiencies, I (think I) expect 350-400 MB/s writes. I
posted a variation of this question a while back, w/ a larger amount of
I/O, & someone else replied that they tried the same command & saw ~20X
faster I/O than mine reported.
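For completeness, a read-side check along the same lines would be something
like the following (I haven't timed these here; re-reading the file straight
away will mostly hit the buffer cache, the raw-device read avoids that):

    4256EE1 # time dd if=/home/testfile of=/dev/null bs=16k
    4256EE1 # time dd if=/dev/rdk0 of=/dev/null bs=1m count=512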
Raid5 is not intended to be fast for writes, and it never will be; it should
be reasonable for reads.
What really matters is not some random benchmark result (my filesystem is
faster than yours...) but whether it actually gets your workload done
well enough or not - I have used raid5 for home filesystems, and pkgsrc
distfiles, and other stuff like that (mostly read, occasionally write)
and have never even wondered about its speed - that's all simply irrelevant.
I use raid1 for filesystems with lots of writes (/usr/obj, ...) where
I want write speed to be good.
| Partitions aligned to 2048 sector boundaries, offset 2048 <------ *DING
| DING DING* !!!!
Note that that "2048" is an internal fdisk default, for how it will
help you align stuff. Of itself, it doesn't mean anything to partitions
that have already been made. And:
| When I used fdisk to check my drives (well, 1 of them, all are
| identically fdisk-ed & sliced), I see the following:
|
| Partition table:
| 0: NetBSD (sysid 169)
| start 2048, size 1953523120 (953869 MB, Cyls 0/32/33-121601/80/63),
What really counts there is the "start 2048". That's what you want.
2048 is a nice multiple of 8 ... 1953523120 is also a multiple of 8,
so everything there should be nicely set up for 4K blocks. What the
default alignment would be if you were to change things is irrelevant.
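A quick way to verify that, if you like: both values modulo 8 must come out
0, since 8 512-byte sectors make up one 4 KiB physical block.

    # echo $((2048 % 8)) $((1953523120 % 8))
    0 0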
That is, there is absolutely no need to repartition the drives, the current
layout is fine, and is not the problem (if there even really is a problem).
So, now if your I/O really is slower than it should be, and slower than
raidframe's raid5 can reasonably be expected to achieve, I think the issue
must be with either the raidframe or ffs parameters.
Those you haven't given (here anyway; I don't remember them from when this
discussion was going on earlier).
What is your raidframe layout, and what are your ffs parameters?
kre
ffs data from dumpfs for that FS (RAID5 mounted as /home):
4256EE1 # cat dumpfs.OUTPUT.head.txt
file system: /dev/rdk0
format FFSv2
endian little-endian
location 65536 (-b 128)
magic 19540119 time Tue Nov 24 21:43:06 2015
superblock location 65536 id [ 5593845d ee5eb3c ]
cylgrp dynamic inodes FFSv2 sblock FFSv2 fslevel 5
nbfree 74726702 ndir 139822 nifree 228674724 nffree 9720
ncg 4964 size 943207067 blocks 928593007
bsize 32768 shift 15 mask 0xffff8000
fsize 4096 shift 12 mask 0xfffff000
frag 8 shift 3 fsbtodb 3
bpg 23753 fpg 190024 ipg 46848
minfree 5% optim time maxcontig 2 maxbpg 4096
symlinklen 120 contigsumsize 2
maxfilesize 0x000800800805ffff
nindir 4096 inopb 128
avgfilesize 16384 avgfpdir 64
sblkno 24 cblkno 32 iblkno 40 dblkno 2968
sbsize 4096 cgsize 32768
csaddr 2968 cssize 81920
cgrotor 0 fmod 0 ronly 0 clean 0x02
wapbl version 0x1 location 2 flags 0x0
wapbl loc0 3773140352 loc1 131072 loc2 512 loc3 3
flags wapbl
fsmnt /home
volname swuid 0
cs[].cs_(nbfree,ndir,nifree,nffree):
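If the raw parameters are easier to digest as newfs flags, something like
the following should print them as an equivalent newfs command line (output
not included here):

    4256EE1 # dumpfs -m /home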
4256EE1 # raidctl -s dk0
raidctl: ioctl (RAIDFRAME_GET_INFO) failed: Inappropriate ioctl for device
4256EE1 # raidctl -s raid0a
Components:
/dev/wd0a: optimal
/dev/wd1a: optimal
No spares.
Component label for /dev/wd0a:
Row: 0, Column: 0, Num Rows: 1, Num Columns: 2
Version: 2, Serial Number: 10, Mod Counter: 123
Clean: No, Status: 0
sectPerSU: 32, SUsPerPU: 1, SUsPerRU: 1
Queue size: 100, blocksize: 512, numBlocks: 33554368
RAID Level: 1
Autoconfig: Yes
Root partition: Yes
Last configured as: raid0
Component label for /dev/wd1a:
Row: 0, Column: 1, Num Rows: 1, Num Columns: 2
Version: 2, Serial Number: 10, Mod Counter: 123
Clean: No, Status: 0
sectPerSU: 32, SUsPerPU: 1, SUsPerRU: 1
Queue size: 100, blocksize: 512, numBlocks: 33554368
RAID Level: 1
Autoconfig: Yes
Root partition: Yes
Last configured as: raid0
Parity status: clean
Reconstruction is 100% complete.
Parity Re-write is 100% complete.
Copyback is 100% complete.
4256EE1 # df -h
Filesystem     Size   Used  Avail  %Cap  Mounted on
/dev/raid0a     16G   210M    15G    1%  /
/dev/raid1a     63G   1.1G    59G    1%  /usr
/dev/dk0       3.5T   1.2T   2.1T   37%  /home
kernfs         1.0K   1.0K     0B  100%  /kern
ptyfs          1.0K   1.0K     0B  100%  /dev/pts
procfs         4.0K   4.0K     0B  100%  /proc
tmpfs          8.0G   4.0K   8.0G    0%  /tmp
4256EE1 #
Because of its size (> 2 TB) it was set up using dkctl, & raidctl won't
report anything about it (see the failed ioctl above); how can I get that
info for you? Thanks & TIA.
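Perhaps something along these lines would dig it out, assuming the RAID5 set
is, say, raid2 underneath the wedge (I haven't tried this yet):

    4256EE1 # sysctl hw.disknames      # lists the wd*, raid* and dk* devices
    4256EE1 # dkctl raid2 listwedges   # shows which wedges sit on raid2
    4256EE1 # raidctl -s raid2         # component/label info for the RAID5 set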
--
William A. Mahaffey III
----------------------------------------------------------------------
"The M1 Garand is without doubt the finest implement of war
ever devised by man."
-- Gen. George S. Patton Jr.