On 05/13/15 13:14, David Brownlee wrote:
> On 13 May 2015 at 16:03, William A. Mahaffey III <wam%hiwaay.net@localhost> wrote:
>> On 05/13/15 08:48, David Brownlee wrote:
>>> On 12 May 2015 at 16:01, William A. Mahaffey III <wam%hiwaay.net@localhost> wrote:
>>>> On 05/12/15 02:32, David Brownlee wrote:
>>>>> On 11 May 2015 at 23:46, William A. Mahaffey III <wam%hiwaay.net@localhost> wrote:
>>>>>
>>>>> If you are using RAID5 I would strongly recommend keeping to "power-of-two + 1" components, to keep the stripe size as a nice power of two, otherwise performance is... significantly impaired.
>>>>
>>>> Hmmmm .... Could you amplify on that point a bit ? I am intending to maximize available storage & have already procured the mbd & 6 drives, but I could rethink things if my possibly hasty choices would be too burdensome.
>>>
>>> For RAID5 to perform efficiently, data should be written in units which are aligned with the RAID stripes and are a multiple of the stripe in size; otherwise a simple write changes into a read of the stripe, modification of the affected part, and then a write. Filesystems tend to have sectors and blocks which are powers of two, so the easiest way to arrange this for ffs is for the filesystem block size to be a multiple of the stripe size ("1" is a fine multiple in this case). This is similar to the issue with drives which have 4K sectors but present them as 512-byte sectors - if a filesystem is not 4K aligned then write performance suffers horribly.
>>
>> Hmmmmm .... OK, I think I have it. The stripe size is N * <some underlying (disk|RAID) block size>, & if N (the # of active drives) is odd or prime (or both, as in my case), we would have/need bizarre filesystem block sizes (for alignment w/ RAID stripes) or unaligned FS blocks/sectors, which give crappy performance, right ? Could you estimate how crappy crappy really is ? 25% slower ? 50%, 100%, more ?
>> Me scots sensibilities hate having almost 1 TiB of drive sitting around idle (although I do crave speed enough to override) :-/ ....
>
> If you manage 25% of the performance (that is "only" a 75% hit) I would be surprised. I'd also be curious to see what number you do get :) - I'm quite fond of pkgsrc/benchmarks/bonnie++ to get simple comparable numbers. If you are testing, some things to vary:
> - Number of drives (5, 6)
> - Stripe size, eg 4K per drive or 8K per drive
> - Filesystem block size 32K, 64K (may not be able to use 64K for boot partitions)
> - mounting with '-o log' or not (generally you want this :)
> Remember to ensure you have good (at least 4K) alignment on the base partitions. If you have a modern '4K under the covers' drive and start at sector 63... it's not a good place to be.
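Working David's alignment point through with some throwaway Python (the 4K stripe unit per data disk and the 512-byte sector size are just assumptions for illustration, not measurements from any real array):

```python
# Rough RAID5 stripe-size arithmetic (illustrative numbers only).

def stripe_size(components, su_bytes=4096):
    """RAID5 stripe = (components - 1) data disks * stripe-unit size."""
    return (components - 1) * su_bytes

def is_power_of_two(n):
    return n > 0 and (n & (n - 1)) == 0

# "power-of-two + 1" components: 5 disks -> 4 data disks -> 16K stripe,
# which a 16K/32K/64K ffs block size can be an exact multiple of.
print(stripe_size(5), is_power_of_two(stripe_size(5)))   # 16384 True

# 6 disks -> 5 data disks -> 20K stripe: no power-of-two ffs block size
# is a multiple of it, so most writes become read-modify-write.
print(stripe_size(6), is_power_of_two(stripe_size(6)))   # 20480 False

# The 4K-sector point: a partition starting at the classic sector 63
# is not 4K-aligned; sector 64 (or 2048) is.
for start in (63, 64, 2048):
    print(start, (start * 512) % 4096 == 0)
```

Same logic works for any stripe-unit size; swap in 8192 for the 8K-per-drive case David suggests testing.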
OK, sold. I guess the same arguments might apply to other RAID levels as well, no ? My original config (below) has a 2-drive RAID1 for /, a 4-drive RAID10 for /usr; I guess both of those would be good w/ HDD blocks of 4K, RAID1 blocks of 4K as well, & FFS block sizes of 4K for / & 8K for /usr, no ? I recall from days of yore on SGI's, they always recommended tailoring HDD block sizes to what you were storing there: small block sizes for filesystems w/ lots of small files, larger block sizes for mostly larger files & better I/O performance w/ those files.
For that matter, how do I check those parameters on this machine (FreeBSD 9.3R-p13) ? I know I went w/ 4K blocks on the raw HDD's using GPT, but I don't recall what I did w/ RAID0 parameters; there might be some room for improvement here as well. A bit OT, of course ....
>>> If you want to maximise space with some redundancy then, as you say, RAID5 is the way to go for the bulk of the storage. A while back I set up a machine with 5 * 2TB disks with netbsd-6, with small RAID1 partitions for root and the bulk as RAID5: http://abs0d.blogspot.co.uk/2011/08/setting-up-8tb-netbsd-file-server.html (wow, was that really four years ago). In your position I might keep one 1TB as a scratch/build space and then RAID up the rest. If you have time definitely experiment, get a feel for the different performance available from the different options.
>>
>> *Wow*, another fabulous resource. Your blog documents almost verbatim what I have in mind. I am going w/ 6 drives (already procured, 6 SATA3 slots on the mbd, done deal), but philosophically very close to what you describe. 1 question: if you were doing this again today, would it be fdisk or GPT ?
>
> If I had >2TB drives it would be TB :) If not, I would still stick with fdisk. The complexity of gpt setup and wedge autoconfiguration is still greater than fdisk and disklabel. I know I'm going to have to move to it at some point, but I'm going to hold off until I need to.
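For anyone following along: the RAIDframe setup in that blog post is driven by a plain-text config file handed to raidctl(8). A rough sketch of what the RAID5 part might look like for a 5-component set (device names, partition letters & stripe-unit size below are placeholders to adapt, not taken from the post):

```
START array
# numRow numCol numSpare
1 5 0

START disks
/dev/wd0e
/dev/wd1e
/dev/wd2e
/dev/wd3e
/dev/wd4e

START layout
# sectPerSU SUsPerParityUnit SUsPerReconUnit RAID_level
# 8 sectors/SU = 4K per component (512-byte sectors),
# so stripe = 4 data disks * 4K = 16K, a nice power of two
8 1 1 5

START queue
fifo 100
```

Check the raidctl(8) man page for the exact incantations to configure the set and initialize parity; the layout numbers above are exactly the knobs worth varying in the benchmarking David suggests.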
I meant to ask earlier, what is TB ? I think I will stick to fdisk. I used GPT for this box; it is apparently well supported & I found numerous very detailed tutorials on how to set up what I wanted, so I went w/ it. All is going fabulously there, BTW ....
>>>> I think I am looking at 4 partitions per drive, ~16 GB for / (RAID1, 2 drives) & /usr (4 drives, RAID10), 16 GB for swap (kernel driver, all 6 drives), 16 - 32 GB for /var (RAID5, all 6 drives), & the rest for /home (RAID5, all 6 drives). TIA & thanks again.
>>>
>>> I would definitely hold off on RAID5 for everything except the large /home. RAID1 is much simpler and performs better for writes. I would also try to avoid configuring multiple RAID5s across overlapping sets of disks; while it theoretically provides more IO bandwidth, that bandwidth will be having to compete with all the other filesystems and swap usage on the system. If you wanted to use all six disks:
>>> - 32G (RAID1 root+usr)   910G (non raid scratch space)
>>> - 32G (RAID1 root+usr)   910G (RAID5 home)
>>> - 32G (RAID1 var)        910G (RAID5 home)
>>> - 32G (RAID1 var)        910G (RAID5 home)
>>> - 32G (RAID1 swap+spare) 910G (RAID5 home)
>>> - 32G (RAID1 swap+spare) 910G (RAID5 home)
>>> 32GB space notes:
>>> - This gives you three 32GB RAID1 'pools' to allocate everything outside of /home
>>> - Can adjust the 32G up or down before partitioning, but all should be the same
>>> - In the suggestion, root+usr are kept on the same RAID (and could be a single partition), so that the system can have all of the userland available with only one disk attached, and a 'spare' partition is left in case of later moderate additional space needs - maybe an extra partition for /usr/pkg?, or for /var/pgsql, etc
>>> - Obviously allocate usage within pools to taste - could put /usr on a separate raid to provide more IO bandwidth for root & usr
>>
>> This is interesting. I kinda wanted swap spread out over all 6 drives for better swap I/O performance, an issue I am having with another box which is laid out sorta like this, with swap 'on top of' a RAID0 block (admittedly under Linux, not *BSD, but still): swap performance is horrible, several min. to page in 200-300 MB worth of paged-out VM.
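Sanity-checking the arithmetic of that suggested split with a quick script (the 32G and 910G figures are the round numbers from the suggestion, not exact partition sizes):

```python
# Space accounting for the suggested 6-disk layout (round numbers only).
DISKS = 6
SMALL_GB = 32    # per-disk slice for the RAID1 'pools'
BIG_GB = 910     # per-disk slice for the large partitions

# The six 32G slices pair up into three 2-disk RAID1 pools;
# a mirror of two 32G slices yields 32G usable.
raid1_pools = DISKS // 2
usable_raid1_gb = raid1_pools * SMALL_GB

# Five of the 910G slices go into one RAID5 for /home (N-1 data disks),
# and the sixth stays out as non-raided scratch space.
raid5_components = 5
usable_home_gb = (raid5_components - 1) * BIG_GB
scratch_gb = BIG_GB

print(raid1_pools, usable_raid1_gb)   # 3 96
print(usable_home_gb, scratch_gb)     # 3640 910
```

So everything outside /home has to fit in three 32G pools, which matches David's note about adjusting the 32G figure up or down before partitioning.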
>> I was planning on as much parallelization of each RAID as possible for max performance, & swap handled by the kernel driver. Others have suggested swap on a RAID 'partition'; is that more de rigueur for NetBSD, or the other BSD's for that matter ? This box, under FreeBSD 9.3R-p13, has 4 swap partitions under straight kernel management, & seems very spry, although it also has a lot of RAM & doesn't swap much (on purpose, BTW) ....
>
> Separate swap devices will give the best performance; RAID1 or 5 will give robustness in the face of a single component failure. You pays your money... Of course, if you have dedicated partitions on the disk which you could RAID, then you can even change your mind after install: swapctl off the swap, mess with the partitions and away you go (nerves of steel advised, though not required :)
From a reliability/robustness stand-point, if I had a HDD failure, would I be able to reboot w/ 1 lost swap partition w/o intervention ? I'm thinking not ....
>>> 910GB space notes:
>>> - This gives 5 * 910GB RAID5, which provides 4 * 910G (or 3640G) of space
>>> - One disk is not included in the RAID5. This could be saved as a spare for a RAID5 component failure (though a better approach might be to have a disk on the desk next to the machine :), or used as non-raided scratch space. If it will not be active, then probably best to put it on one of the components for the heaviest used 32G, or the most important 32G
>>
>> Your assessments are quite persuasive; I think I now like the 5-drive RAID5 for /home, with that last partition as nebulous scratch space.
For that matter I could use the extra large HDD partition as /var & eliminate 1 group of partitions altogether, i.e. split each drive up as 16 GB for root (2X, RAID1) & /usr (4X, RAID10), 16 GB for swap (6X, raw, kernel driver), the rest for RAID5 /home (5X) & /var (raw, 1X). I'm liking this more & more as I go along ....
>>> Note in the above that IO to /home will hit (almost) all disks, and will affect all of the 32GB pools, so if you have heavy IO to /home do not expect blistering performance from any filesystem. On the other hand, when /home has very light IO then you should have relatively nice multi-spindle performance from the other filesystems.
>>
>> Yeah, but it would speed up access to *just* /home, right ? This box will be backing up other boxen on my LAN, initially to a directory under /home, so I want that I/O to be as swift as possible. I am maxing out the RAM (also already procured), so I hope I don't have too much contention between I/O to /home & swap ....
>
> If you want home to be as fast as possible, then you really want to prefer RAID1 to RAID5 (which conflicts with the space.. I know). I run dirvish overnight from some machines to a RAID5 setup pretty much identical to the one in my post, and it works well enough.
>
>>> Having said all that, if I had the time to play I would install onto a USB key, then script up the building and partitioning of the system in many different forms, and then chroot into the result and run some tests to see how it performs.
>>
>> I *definitely* want to script the partitioning, both for repeatability in the event of drive failure (or setting up another box) & to avoid fat-fingered screw-ups !!!! Thanks again for a fabulously informative reply.
>
> Having discussed all this RAID5 goodness I feel obliged to comment that when I finally run out of space and need a new build I'm probably going to go for 4TB disks, with six of them in RAID1 pairs for 12TB, with flexibility for adding more space (in 4TB units). If I *needed* to get 16TB out of them I'd go the RAID5 route again, but I'm willing to trade off the extra space for speed and simplicity. Of course I'm really holding off having to actually *buy* six 4TB disks for as long as humanly possible (by which point they may be 6TB disks, but who can tell :)
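Working out the capacity trade-off David describes for that hypothetical future build (illustrative arithmetic only; the 5-component RAID5 figure assumes one of the six disks is kept as a spare):

```python
# RAID1 pairs vs RAID5 capacity for a hypothetical six-disk, 4TB build.
DISK_TB = 4
N = 6

raid1_pairs_tb = (N // 2) * DISK_TB   # three mirrored pairs
raid5_five_tb = (5 - 1) * DISK_TB     # 5-component RAID5, 1 disk spare
raid5_six_tb = (N - 1) * DISK_TB      # all six disks in one RAID5

print(raid1_pairs_tb)  # 12
print(raid5_five_tb)   # 16
print(raid5_six_tb)    # 20
```

That is the 12TB-vs-16TB choice in David's closing paragraph: RAID1 pairs give up a third of the raw space in exchange for speed, simplicity, and growth in 4TB units.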
-- William A. Mahaffey III ---------------------------------------------------------------------- "The M1 Garand is without doubt the finest implement of war ever devised by man." -- Gen. George S. Patton Jr.