also by Lloyd: diglloyd.com photography and WindInMyFace.com
When a Hard Drive Fails: Rebuilding a RAID-5 in SoftRAID

One bad hard drive causes very confusing failures

I’ve been having random drive blips that would take down my RAID. The same drive was always involved, as identified by its ID in SoftRAID. Sometimes things would run fine for hours, and sometimes (yesterday) I could provoke the error in 30 seconds just by doing a Finder copy or a diglloydTools IntegrityChecker verify. I was beginning to worry that the problem was my 2017 iMac 5K, but that worry was baseless, as I was able to reproduce the issue on the 2016 MacBook Pro.

I spent the better part of my day tracking down the problem to this one drive. I tried 3 different cables, two different Macs, two different OWC Thunderbay 4 enclosures (and both ports on the enclosure), macOS 10.13 and 10.14, daisy-chained and alone, verified the file system—even a different power outlet in a different room.

In all cases, the same drive was noted as the culprit that failed to complete an I/O, which somehow had the effect of disconnecting all drives in the enclosure. This is a little strange, so I hope the cause really is just that drive. Still, with that drive removed, I could not provoke the problem at all.

Replacing a failed drive in a RAID-5 or RAID-4

RAID-5 and RAID-4 both provide fault tolerance by storing parity information from which the actual data can be reconstructed. With the loss of one drive, the RAID-5 in effect degrades to a RAID-0: all data remains available, but no redundancy is left. So if one drive fails, nothing is lost and you can keep working.
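
The parity idea can be illustrated with a minimal sketch, assuming simple byte-wise XOR parity across equal-sized blocks (the usual textbook description of RAID-4 and RAID-5). This is not SoftRAID's actual implementation or on-disk layout, and the function names and block contents are hypothetical:

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of a list of equal-length byte strings."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

def make_parity(data_blocks):
    """Parity block for one stripe: XOR of all data blocks."""
    return xor_blocks(data_blocks)

def reconstruct_missing(surviving_blocks, parity):
    """Recover the one missing block from the survivors plus parity."""
    return xor_blocks(surviving_blocks + [parity])

# Three data blocks plus one parity block (illustrative values only).
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = make_parity(data)

# Simulate losing the second drive and rebuilding its block.
rebuilt = reconstruct_missing([data[0], data[2]], parity)
print(rebuilt == data[1])  # prints True
```

Because XOR is its own inverse, any single missing block in a stripe can be recovered this way, which is why one drive failure costs no data but a second one does.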

With perhaps fifty (50) failures over the past two days (the one bad drive going AWOL causing it all), I watched SoftRAID perform flawlessly with its smart rebuild capability, which kept rebuilds to a minute or so. But this bad drive just did not play nice, so I physically removed it.

Bringing the RAID-5 back up to full fault tolerance means replacing the failed drive. The following screen shots show the process.

The degraded RAID-5

Here, “degraded” means that one of the drives has gone away and the RAID-5 is now a RAID-0 stripe—another failure and everything is toast, but that’s what backups are for. In this case I physically hot-removed the problem drive, given all the issues it was causing as discussed above.

SoftRAID: degraded RAID-5

Adding a replacement drive

The bad drive being physically removed, I bolted a replacement into a drive sled and hot-inserted it into the OWC Thunderbay 4. I then chose Add Disk… to tell SoftRAID to add this disk to the RAID-5.

SoftRAID: choose which drive to use to replace the failed RAID-5 drive

A dialog then asks for confirmation of the choice above:

SoftRAID: confirm use of new drive

With the replacement drive in place, SoftRAID goes to work rebuilding the RAID-5 for full fault tolerance. With 12TB drives (about 10.9 TiB), this takes a while (about 11 hours), since the entire capacity of each drive has to be read in order to generate the appropriate data to go on the replacement. There is no downtime, however: the volumes remain fully usable during the rebuild, albeit with some performance loss.
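
As a rough sanity check on the rebuild time, here is a back-of-the-envelope calculation. It assumes the rebuild runs at a steady rate and covers the full 12 TB (decimal) of each drive. The function name is hypothetical, and the result is an implied average derived from the approximate 11-hour figure, not a measured speed:

```python
def average_rebuild_rate_mb_s(capacity_tb, hours):
    """Average per-drive throughput (MB/s, decimal units) implied by a rebuild time."""
    capacity_mb = capacity_tb * 1_000_000  # 1 TB = 1,000,000 MB
    seconds = hours * 3600
    return capacity_mb / seconds

rate = average_rebuild_rate_mb_s(12, 11)
print(f"{rate:.0f} MB/s")  # prints 303 MB/s
```

The same function can be used the other way around in spirit: a larger drive or a slower sustained rate scales the rebuild time proportionally, which is why rebuilds of high-capacity drives run for many hours.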

SoftRAID: rebuilding RAID-5 with newly inserted drive
diglloyd.com | Terms of Use | PRIVACY POLICY
Contact | About Lloyd Chambers | Consulting | Photo Tours
Mailing Lists | RSS Feeds | Twitter
Copyright © 2019 diglloyd Inc, all rights reserved.