NetBSD-Users archive
RAIDframe write performance below expectations on a RAID-1 of two magnetic disks on NetBSD/amd64 9.1
Hello all,
this is about the write performance of RAIDframe. There is a lot to read
about this on these mailing lists, and I have been busy trying out
everything I could get my hands on: different alignment methods,
changing the write strategy of the drives, and experimenting with the
file system parameters. Unfortunately, I have now reached a point where
I have been before. At that time my "solution" was to abandon NetBSD and
use FreeBSD with ZFS instead. This time I don't want to give up so fast
:-) So I'll try to describe my setup in as much detail as possible.
Maybe someone sees my obvious mistake and can give me the crucial tip.
The root filesystem is on a separate disk set (also on RAIDframe but SSD
storage) and is not the subject of this problem. The problem concerns
two identical magnetic hard disks (each 1 TB, 4 KB physical sector size),
from which I want to form a RAID-1 with RAIDframe. To do this, I first
created a partition for RAIDframe on each of the two disks via GPT:
# gpt create wd2
# gpt create wd3
# gpt add -l raid1cmp0 -a 4k -t raid wd2
# gpt add -l raid1cmp1 -a 4k -t raid wd3
Then I initialized the RAID with the following parameter file:
START array
1 2 0
START disks
NAME=raid1cmp0
NAME=raid1cmp1
START layout
128 1 1 1
START queue
fifo 100
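For reference, the first value in the layout line is the stripe unit size in sectors. The following is only a small sketch of the arithmetic, assuming RAIDframe counts in the 512-byte logical sectors that the disks report; the variable names are made up for illustration, and the 16k block size is the one passed to newfs further down.

```sh
#!/bin/sh
# Convert the RAIDframe stripe unit (in sectors) to bytes and relate it
# to the FFS block size. All variable names are illustrative.

sect_per_su=128         # first field of the "START layout" line
sector_bytes=512        # ASSUMPTION: logical sector size used by RAIDframe
ffs_block_bytes=16384   # from "newfs -b 16k"

su_bytes=$((sect_per_su * sector_bytes))
blocks_per_su=$((su_bytes / ffs_block_bytes))

echo "stripe unit: ${su_bytes} bytes ($((su_bytes / 1024)) KiB)"
echo "FFS blocks per stripe unit: ${blocks_per_su}"
```

Under that assumption the stripe unit is 64 KiB, i.e. four 16 KiB file system blocks.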
The speed of the parity rewrite had given me hope at first. I had
already made several attempts with obviously wrong alignment, and run
times of approx. 10 hours were the result. With the correct alignment,
the parity rewrite runs in about 2 hours which, according to my
research, should be a good average for the disk size.
On the RAID device (/dev/raid1 for me) I then created another GPT
partition table and created a 4k-aligned partition in it as well:
# gpt create raid1
# gpt add -l data -a 4k -t ffs raid1
# newfs -O 2 -b 16k -f 2k NAME=data
This was formatted with an FFS filesystem (with the recommended
parameters from [1]) and mounted with the mount option "log".
However, the write throughput remains well below my expectations and I
am despairing. When writing a 1 GB file, I achieve write rates of about
2 MB/s.
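A rough way to reproduce such a measurement is sketched below. This is a hypothetical helper, not necessarily the exact method used for the figure above; the function name and the target path are assumptions.

```sh
#!/bin/sh
# write_test FILE MIB: write MIB mebibytes of zeros to FILE and print a
# rough throughput figure. Hypothetical helper for illustration only.
write_test() {
    file=$1
    mib=$2
    start=$(date +%s)
    # bs is given in bytes so the command does not depend on dd's
    # platform-specific size suffixes.
    dd if=/dev/zero of="$file" bs=1048576 count="$mib" 2>/dev/null
    sync    # ask the kernel to flush cached data before stopping the clock
    end=$(date +%s)
    elapsed=$((end - start))
    [ "$elapsed" -gt 0 ] || elapsed=1   # avoid division by zero on tiny runs
    echo "$((mib / elapsed)) MiB/s (${mib} MiB in ${elapsed} s)"
}
```

A call such as `write_test /mnt/data/testfile 1024` (the mount point is an assumption) writes 1 GiB. The resolution is whole seconds, so only longer runs give a meaningful number.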
To me, such a low rate looks a bit like the hard drives are operating in
the wrong mode in general. I suspected that PIO mode is being used
instead of DMA, but I haven't found a reliable way to check that.
Regardless of this, the disks achieve significantly higher write rates
(80 MB/s and more) on their own (i.e. without RAIDframe). This is what
dmesg says:
```
jupiter$ dmesg|grep wd2
[ 2.660025] wd2 at atabus2 drive 0
[ 2.660025] wd2: <ST1000LM048-2E7172>
[ 2.660025] wd2: drive supports 16-sector PIO transfers, LBA48 addressing
[ 2.660025] wd2: 931 GB, 1938021 cyl, 16 head, 63 sec, 512 bytes/sect x 1953525168 sectors (0 bytes/physsect; first aligned sector: 8)
[ 2.850025] wd2: GPT GUID: 01d01c56-2caf-4370-ac48-634c4c211de7
[ 2.850025] dk3 at wd2: "raid1cmp0", 1953525088 blocks at 40, type: raidframe
[ 3.370025] wd2: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133), WRITE DMA FUA, NCQ (32 tags)
[ 3.400024] wd2(ahcisata0:2:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA), WRITE DMA FUA EXT
jupiter$ dmesg|grep wd3
[ 3.400024] wd3 at atabus3 drive 0
[ 3.400024] wd3: <ST1000LM048-2E7172>
[ 3.400024] wd3: drive supports 16-sector PIO transfers, LBA48 addressing
[ 3.400024] wd3: 931 GB, 1938021 cyl, 16 head, 63 sec, 512 bytes/sect x 1953525168 sectors (0 bytes/physsect; first aligned sector: 8)
[ 3.520024] wd3: GPT GUID: aabb5ee0-c30f-4654-9380-3ab8ca81cd9b
[ 3.520024] dk4 at wd3: "raid1cmp1", 1953525088 blocks at 40, type: raidframe
[ 3.530024] wd3: drive supports PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133), WRITE DMA FUA, NCQ (32 tags)
[ 3.560024] wd3(ahcisata0:3:0): using PIO mode 4, DMA mode 2, Ultra-DMA mode 6 (Ultra/133) (using DMA), WRITE DMA FUA EXT
```
That doesn't look bad at first glance. The hard disks are identified as follows:
```
jupiter$ doas atactl wd2 identify
Model: ST1000LM048-2E7172, Rev: SDM1, Serial #: WES22ZJS
World Wide Name: 5000C5009D54CC56
Device type: ATA, fixed
Capacity 1000 Gbytes, 1953525168 sectors, 512 bytes/sector
Cylinders: 16383, heads: 16, sec/track: 63
Physical sector size: 4096 bytes
First physically aligned sector: 8
Command queue depth: 32
Device capabilities:
DMA
LBA
ATA standby timer values
IORDY operation
IORDY disabling
Device supports following standards:
ATA-4 ATA-5 ATA-6 ATA-7 ATA-8
Command set support:
NOP command (enabled)
READ BUFFER command (enabled)
WRITE BUFFER command (enabled)
Host Protected Area feature set (enabled)
Look-ahead (enabled)
Write cache (disabled)
Power Management feature set (enabled)
Security Mode feature set (disabled)
SMART feature set (enabled)
FLUSH CACHE EXT command (enabled)
FLUSH CACHE command (enabled)
Device Configuration Overlay feature set (enabled)
48-bit Address feature set (enabled)
SET MAX security extension (disabled)
SET FEATURES required to spin-up after power-up (enabled)
Power-Up In Standby feature set (disabled)
Advanced Power Management feature set (enabled)
DOWNLOAD MICROCODE command (enabled)
World Wide Name
WRITE DMA/MULTIPLE FUA EXT commands
General Purpose Logging feature set
SMART self-test
SMART error logging
Serial ATA capabilities:
1.5Gb/s signaling
3.0Gb/s signaling
6.0Gb/s signaling
Native Command Queuing
Host-Initiated Interface Power Management
PHY Event Counters
Serial ATA features:
DMA Setup Auto Activate (disabled)
Device-Initiated Interface Power Managment (disabled)
Software Settings Preservation (enabled)
jupiter$ doas atactl wd3 identify
Model: ST1000LM048-2E7172, Rev: SDM1, Serial #: WES23Y53
World Wide Name: 5000C5009D54C4C7
Device type: ATA, fixed
Capacity 1000 Gbytes, 1953525168 sectors, 512 bytes/sector
Cylinders: 16383, heads: 16, sec/track: 63
Physical sector size: 4096 bytes
First physically aligned sector: 8
Command queue depth: 32
Device capabilities:
DMA
LBA
ATA standby timer values
IORDY operation
IORDY disabling
Device supports following standards:
ATA-4 ATA-5 ATA-6 ATA-7 ATA-8
Command set support:
NOP command (enabled)
READ BUFFER command (enabled)
WRITE BUFFER command (enabled)
Host Protected Area feature set (enabled)
Look-ahead (enabled)
Write cache (enabled)
Power Management feature set (enabled)
Security Mode feature set (disabled)
SMART feature set (enabled)
FLUSH CACHE EXT command (enabled)
FLUSH CACHE command (enabled)
Device Configuration Overlay feature set (enabled)
48-bit Address feature set (enabled)
SET MAX security extension (disabled)
SET FEATURES required to spin-up after power-up (enabled)
Power-Up In Standby feature set (disabled)
Advanced Power Management feature set (enabled)
DOWNLOAD MICROCODE command (enabled)
World Wide Name
WRITE DMA/MULTIPLE FUA EXT commands
General Purpose Logging feature set
SMART self-test
SMART error logging
Serial ATA capabilities:
1.5Gb/s signaling
3.0Gb/s signaling
6.0Gb/s signaling
Native Command Queuing
Host-Initiated Interface Power Management
PHY Event Counters
Serial ATA features:
DMA Setup Auto Activate (disabled)
Device-Initiated Interface Power Managment (disabled)
Software Settings Preservation (enabled)
```
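Since the two listings are long, comparing individual feature lines by script is less error-prone than reading them side by side. The sketch below assumes the identify output of each disk has been saved to a file; the function names and file paths are hypothetical.

```sh
#!/bin/sh
# feature_state FILE FEATURE: print the state ("enabled" or "disabled")
# of FEATURE as listed in a saved "atactl <disk> identify" output.
feature_state() {
    sed -n "s/^[[:space:]]*$2 (\([a-z]*\))\$/\1/p" "$1"
}

# compare_feature FEATURE FILE_A FILE_B: report whether FEATURE has the
# same state in both saved outputs.
compare_feature() {
    a=$(feature_state "$2" "$1")
    b=$(feature_state "$3" "$1")
    if [ "$a" = "$b" ]; then
        echo "$1: same ($a)"
    else
        echo "$1: differs ($a vs $b)"
    fi
}

# Example (paths are assumptions):
#   atactl wd2 identify > /tmp/wd2.id
#   atactl wd3 identify > /tmp/wd3.id
#   compare_feature "Write cache" /tmp/wd2.id /tmp/wd3.id
```

In the two listings above, the "Write cache" line is the one feature line that differs: it reads disabled for wd2 and enabled for wd3. Whether that accounts for the slow writes is not established here.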
The identify output confirms that they do indeed have 4 KB physical
sectors. To be on the safe side, I also checked the SMART values. They
look good to me, or am I wrong?
```
jupiter$ doas atactl wd2 smart status
SMART supported, SMART enabled
id value thresh crit collect reliability description raw
1 83 6 yes online positive Raw read error rate 206855044
3 99 0 yes online positive Spin-up time 0
4 91 20 no online positive Start/stop count 9956
5 100 36 yes online positive Reallocated sector count 0
7 78 45 yes online positive Seek error rate 17436089838
9 85 0 no online positive Power-on hours count 188209761891359
10 100 97 yes online positive Spin retry count 0
12 98 20 no online positive Device power cycle count 2909
184 100 99 no online positive End-to-end error 0
187 100 0 no online positive Reported Uncorrectable Errors 0
188 100 0 no online positive Command Timeout 0
189 100 0 no online positive High Fly Writes 0
190 50 40 no online positive Airflow Temperature 50
Lifetime min/max 34/0
191 100 0 no online positive G-sense error rate 179
192 100 0 no online positive Power-off retract count 27
193 1 0 no online positive Load cycle count 465174
194 50 0 no online positive Temperature 50
Lifetime min/max 0/12
197 100 0 no online positive Current pending sector 0
198 100 0 no offline positive Offline uncorrectable 0
199 200 0 no online positive Ultra DMA CRC error count 0
240 100 0 no offline positive Head flying hours 43877385902262
241 100 0 no offline positive Total LBAs Written 29912118872
242 100 0 no offline positive Total LBAs Read 25001009691
254 100 0 no online positive Free Fall Sensor 0
jupiter$ doas atactl wd3 smart status
SMART supported, SMART enabled
id value thresh crit collect reliability description raw
1 76 6 yes online positive Raw read error rate 41698710
3 99 0 yes online positive Spin-up time 0
4 100 20 no online positive Start/stop count 14
5 100 36 yes online positive Reallocated sector count 0
7 69 45 yes online positive Seek error rate 7545197
9 100 0 no online positive Power-on hours count 1155346202744
10 100 97 yes online positive Spin retry count 0
12 100 20 no online positive Device power cycle count 14
184 100 99 no online positive End-to-end error 0
187 100 0 no online positive Reported Uncorrectable Errors 0
188 100 0 no online positive Command Timeout 1
189 100 0 no online positive High Fly Writes 0
190 58 40 no online positive Airflow Temperature 42
Lifetime min/max 39/0
191 100 0 no online positive G-sense error rate 0
192 100 0 no online positive Power-off retract count 5
193 100 0 no online positive Load cycle count 256
194 42 0 no online positive Temperature 42
Lifetime min/max 0/22
197 100 0 no online positive Current pending sector 0
198 100 0 no offline positive Offline uncorrectable 0
199 200 0 no online positive Ultra DMA CRC error count 0
240 100 0 no offline positive Head flying hours 125413045043241
241 100 0 no offline positive Total LBAs Written 4269631011
242 100 0 no offline positive Total LBAs Read 6033874459
254 100 0 no online positive Free Fall Sensor 0
```
The partition tables on the raw disks look like this:
```
jupiter$ doas gpt show -a wd2
start size index contents
0 1 PMBR
1 1 Pri GPT header
2 32 Pri GPT table
34 6 Unused
40 1953525088 1 GPT part - NetBSD RAIDFrame component
Type: raid
TypeID: 49f48daa-b10e-11dc-b99b-0019d1879648
GUID: c9e7c689-5708-482d-a7bc-9f622d596fb1
Size: 932 G
Label: raid1cmp0
Attributes: None
1953525128 7 Unused
1953525135 32 Sec GPT table
1953525167 1 Sec GPT header
jupiter$ doas gpt show -a wd3
start size index contents
0 1 PMBR
1 1 Pri GPT header
2 32 Pri GPT table
34 6 Unused
40 1953525088 1 GPT part - NetBSD RAIDFrame component
Type: raid
TypeID: 49f48daa-b10e-11dc-b99b-0019d1879648
GUID: 106b5ce0-3a27-4b4d-8c5f-c8b45fac7651
Size: 932 G
Label: raid1cmp1
Attributes: None
1953525128 7 Unused
1953525135 32 Sec GPT table
1953525167 1 Sec GPT header
```
The partition table on the RAID looks like this:
```
jupiter$ doas gpt show raid1
start size index contents
0 1 PMBR
1 1 Pri GPT header
2 32 Pri GPT table
34 6 Unused
40 1953524912 1 GPT part - NetBSD FFSv1/FFSv2
1953524952 7 Unused
1953524959 32 Sec GPT table
1953524991 1 Sec GPT header
```
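The start sectors shown by gpt can be checked against the 4096-byte physical sector size with a little arithmetic. This is a minimal sketch: the helper name is made up, and the number of sectors RAIDframe reserves at the front of each component is an assumption on my part.

```sh
#!/bin/sh
# aligned SECTOR: succeed if SECTOR (counted in 512-byte logical sectors)
# falls on a 4096-byte physical sector boundary.
aligned() {
    [ $(( ($1 * 512) % 4096 )) -eq 0 ]
}

component_start=40   # start of raid1cmp0/raid1cmp1 on wd2/wd3 (gpt show)
raid_reserved=64     # ASSUMPTION: sectors RAIDframe reserves per component
data_start=40        # start of the data partition inside raid1 (gpt show)

# Absolute position of the file system's first sector on the raw disk.
fs_start=$((component_start + raid_reserved + data_start))

for s in "$component_start" "$fs_start"; do
    if aligned "$s"; then
        echo "sector $s: aligned"
    else
        echo "sector $s: NOT aligned"
    fi
done
```

With these values both the component start (sector 40) and the file system start (sector 144 on the raw disk) fall on 4096-byte boundaries, so the nested GPT does not appear to break the alignment.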
What can I try next? Have I made an obvious mistake?
Kind regards
Matthias
[1] https://zhadum.org.uk/2008/07/25/raid-and-file-system-performance-tuning/