All Posts by Date or last 15, 30, 90 or 180 days.
also by Lloyd: diglloyd.com photography and WindInMyFace.com

Thank you for buying via links and ads on this site,
which earn me advertising fees or commissions.
As an Amazon Associate I earn from qualifying purchases.

Other World Computing...
B&H Photo...
Amazon
As an Amazon Associate I earn from qualifying purchases.
Upgrade the memory of your 2020 iMac up to 128GB
Memory Upgrades for 2019 Mac Pro - Save Up to 65% vs Factory Costs

When a Hard Drive Fails: Rebuilding a RAID-5 in SoftRAID

One bad hard drive causes very confusing failures

I’ve been having random drive blips that would take down my RAID. The same drive was always involved, determined by its ID in SoftRAID. Sometimes things would run fine for hours, and sometimes (yesterday) I could provoke the error in 30 seconds just by doing Finder copy or diglloydTools IntegrityChecker verify. I was beginning to worry that if it was my 2017 iMac 5K, but that was baseless as I was able to reproduce the issue on the 2016 MacBook Pro.

I spent the better part of my day tracking down the problem to this one drive. I tried 3 different cables, two different Macs, two different OWC Thunderbay 4 enclosures (and both ports on the enclosure), macOS 10.13 and 10.14, daisy-chained and alone, verified the file system—even a different power outlet in a different room.

In all cases, the same drive was always noted as the culprit that failed to complete an I/O, somehow having the effect of disconnecting all drives in the enclosure. This is a little strange, and so I hope it is just that drive. Still, when I removed it and tried to provoke a problem, no problem.

Replacing a failed drive in a RAID-5 or RAID-4

RAID-5 (or RAID-4) both provide fault tolerance by storing parity information with which the actual data can be reconstructed. With the loss of one drive, the RAID-5 in effect degrades to a RAID-0. So if one drive fails, nothing is lost and you can keep working.

With perhaps fifty (50 failures over the past two days (the one bad drive going AWOL causing it all), I watched SoftRAID perform flawlessly with its smart rebuild capability, which kept rebuilds to a minute or so. But this bad drive just did not play nice, so I physically removed it.

Bringing the RAID-5 back up to full fault tolerance means replacing the failed drive. The following screen shots show the process.

The degraded RAID-5

Here, “degraded” means that one of the drives has gone away and the RAID-5 is now a RAID-0 stripe—another failure and everything is toast, but that’s what backups are for. In this case I physically hot-removed the problem drive, given all the issues it was causing as discussed above.

SoftRAID: degraded RAID-5

Adding a replacement drive

The bad drive being physically removed, I bolted a replacement into a drive sled and hot-inserted it into the OWC Thunderbay 4. I then chose Add Disk… to tell SoftRAID to add this disk to the RAID-5.

SoftRAID: choose which drive to use to replace the failed RAID-5 drive

A confirmation dialog confirms the choice above:

SoftRAID: confirm use of new drive

With the replacement drive in place, SoftRAID goes to work rebuilding the RAID-5 for full fault tolerance. With 12TB drives (11.2 TiB), this takes a while (about 11 hours), since the entire capacity has to be read of each drive, in order to generate the appropriate data to go on the replacement. There is no downtime however—the volumes can keep being used, albeit with a performance loss because of the rebuild process but totally usable.

SoftRAID: rebuilding RAID-5 with newly inserted drive
View all handpicked deals...

Voigtlander APO-LANTHAR 35mm f/2 Aspherical Lens for Sony E
$1149 $949
SAVE $200

diglloyd.com | Terms of Use | PRIVACY POLICY
Contact | About Lloyd Chambers | Consulting | Photo Tours
Mailing Lists | RSS Feeds | Twitter
Copyright © 2020 diglloyd Inc, all rights reserved.
Display info: __RETINA_INFO_STATUS__