also by Lloyd: diglloyd.com photography and WindInMyFace.com

Detecting Data Corruption Caused by Bit Rot or Bad Drives or Software Bugs with diglloydTools IntegrityChecker

Data integrity is an increasingly interesting issue in the era of hard drives that hit 14TB over a year ago, and are heading to 20TB.

Error rates in modern hard drives are extremely low, but at today’s capacities they fall within the range of real-world concern.

For example, the Toshiba 14TB enterprise hard drives specify “Nonrecoverable Read Errors” as 1 in 10^16 bits read, which equates to 1 bit error for every 1250 terabytes read [calculated as 10^16/8/(1000^4)]. That’s a tiny chance: in theory you’d need to read the full contents of about 89 drives of size 14TB to encounter one error. Big data centers of course have to concern themselves with that!
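
As a quick check on that arithmetic, here is a minimal Python sketch. It assumes decimal terabytes (1000^4 bytes), as in the bracketed calculation above; the variable names are just for illustration.

```python
# Back-of-envelope check of the nonrecoverable read error figure.
BITS_PER_ERROR = 10**16   # spec: 1 error per 10^16 bits read
BYTES_PER_TB = 1000**4    # decimal terabyte, as in the calculation above
DRIVE_TB = 14             # capacity of one drive

# Terabytes that must be read, on average, per expected bit error.
tb_per_error = BITS_PER_ERROR / 8 / BYTES_PER_TB

# How many complete reads of a 14TB drive that amounts to.
full_drive_reads = tb_per_error / DRIVE_TB

print(f"{tb_per_error:.0f} TB read per expected error")
print(f"{full_drive_reads:.1f} full reads of a {DRIVE_TB}TB drive")
```

Note that the figure is about data read, not drives owned: reading one 14TB drive end to end about 89 times gives the same exposure as reading 89 drives once.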

But there is no telling what happens to error rates 4+ years out in a drive’s life. Or whether data transfer is always reliable.

Counting on your backups without proving they are good is a bad way to operate as a professional. For myself, I don’t want my photos or my spreadsheets or anything else going bad today or next year or five years from now.

The critical step is to validate your data, both originals and backups, using diglloydTools IntegrityChecker, which can validate data on any media on any platform that supports Java (Mac, Windows, Linux, etc).

Flaky hard drives? Or maybe not.

Which leads me to today’s findings. After 5 days of Thunderbolt 3 hell, I finally had most of my backups made and proceeded to validate one large RAID-4 backup volume with about 13.6TB of data on it. This backup was created using Carbon Copy Cloner, which I have found to be highly reliable. The data in question was a clone of a backup clone—two generations away from the original.

The elegant thing about IntegrityChecker is that validating the Nth-generation copy proves that all copies were valid, at least when they were copied. If a corrupt file pops up, one can validate intermediate copies to see which devices introduced the error, then replace those unreliable devices. More important, the warning lets you know that your backup data is at risk.
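
The idea of walking back through the generations can be sketched in a few lines of Python. This is not how IntegrityChecker is implemented; it is only an illustration that assumes a hash recorded for the original and uses SHA-1 via `hashlib`. The helper names `file_digest` and `first_bad_generation` and the temporary demo files are hypothetical.

```python
import hashlib
import tempfile
from pathlib import Path

def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def first_bad_generation(expected_digest, copies):
    """Return the index of the first copy whose hash differs from the
    recorded one, or None if every copy in the chain still matches.

    `copies` is ordered: original, first clone, clone of the clone, ...
    """
    for index, path in enumerate(copies):
        if file_digest(path) != expected_digest:
            return index
    return None

# Demo with throwaway files standing in for the three generations.
with tempfile.TemporaryDirectory() as tmp:
    original = Path(tmp, "original.bin")
    clone1 = Path(tmp, "clone1.bin")
    clone2 = Path(tmp, "clone2.bin")
    original.write_bytes(b"spreadsheet data")
    clone1.write_bytes(b"spreadsheet data")
    clone2.write_bytes(b"spreadsheet dat\x00")  # simulated corruption

    recorded = file_digest(original)
    culprit = first_bad_generation(recorded, [original, clone1, clone2])
    print("first bad generation:", culprit)
```

If the last copy in the chain still matches the recorded hash, every earlier copy must have delivered good data when it was read, which is why checking only the Nth generation is usually enough.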

After about 24 hours (it takes a long time to validate 13.6TB), I was disturbed to find that a few files were flagged as having changed.

One file in particular was interesting. When I checked it against the original and the first clone, I found that the 2nd copy (the clone of the clone) had indeed changed. Even more curious, when I inspected it (hexdump -C), I saw the file contents change at least twice while reading the file, indicating that the file could not be read reliably! A group of 10 bytes or so was different each time it was read! Then things stabilized and the same results were seen each time, but always incorrect (corrupt).

Hard drives are supposed to detect bit errors. But were there actually errors? This seems vanishingly unlikely.

Extended attributes can cause a file to be modified?

Investigating further, I found that the file in question ("TrainingAndRaces") had one curious thing going on: 3 extended attributes, as follows:

diglloyd-iMac:Training lloyd$ xattr -l TrainingAndRaces 
com.apple.FinderInfo:
00000000  58 4C 53 38 58 43 45 4C 01 00 00 00 00 80 00 00  |XLS8XCEL........|
00000010  00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  |................|
00000020
com.apple.lastuseddate#PS:
00000000  69 80 07 5E 00 00 00 00 87 67 AE 1D 00 00 00 00  |i..^.....g......|
00000010
com.apple.quarantine: 0082;5e07806d;Microsoft Excel;
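
For readers curious what those bytes mean, the dump above can be decoded with a short Python sketch. The layouts are assumptions, not something taken from Apple documentation: the first 8 bytes of FinderInfo are treated as the classic type and creator codes, com.apple.lastuseddate#PS is assumed to hold two little-endian 64-bit integers (seconds and nanoseconds), and the quarantine string is assumed to be flags;hex timestamp;agent.

```python
import struct
from datetime import datetime, timezone

# Byte values copied from the xattr -l output above.
finder_info = bytes.fromhex(
    "584C5338 5843454C 01000000 00800000"
    "00000000 00000000 00000000 00000000"
)
last_used = bytes.fromhex("6980075E 00000000 8767AE1D 00000000")
quarantine = "0082;5e07806d;Microsoft Excel;"

# Assumption: classic four-character type and creator codes come first.
file_type = finder_info[0:4].decode("ascii")
creator = finder_info[4:8].decode("ascii")

# Assumption: seconds and nanoseconds as little-endian 64-bit integers.
seconds, nanoseconds = struct.unpack("<qq", last_used)
last_used_at = datetime.fromtimestamp(seconds, tz=timezone.utc)

# Assumption: fields are flags, hex Unix timestamp, agent name.
flags, stamp_hex, agent = quarantine.split(";")[:3]
quarantined_at = datetime.fromtimestamp(int(stamp_hex, 16), tz=timezone.utc)

print("type/creator:", file_type, creator)
print("last used:   ", last_used_at.isoformat())
print("quarantined: ", quarantined_at.isoformat(), "by", agent)
```

Under these assumptions the attributes are plain metadata (an Excel type/creator pair and two timestamps a few seconds apart), which says nothing by itself about how the file contents came to differ.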

So now I am thinking that one of the extended attributes can actually cause a file’s contents to be modified, either during cloning by rsync or afterwards for unknown reasons.

This is exactly the sort of software risk that makes IntegrityChecker valuable—who knows what weird stuff can corrupt your files in addition to bit rot or other hardware issues?

IntegrityChecker output

Brief excerpt, showing the file flagged as corrupt: its file dates are the same but the hash of its contents is different, so the file has been changed (corrupted). That is confirmed by doing a shasum of the original and the cloned copy.

diglloyd-iMac:DIGLLOYD lloyd$ icj verify /Volumes/AtticClone_R4/_MasterClone/MyData/
# icj version 1.1 b2 @ 2019-12-27 19:00
....
3%: 78 files 76 MiB @ 37 MiB/sec, 00:02.045
TrainingAndRaces   20480 HASH_CHANGED_DATE_UNCHANGED
8%: 258 files 191 MiB @ 47 MiB/sec, 00:04.045
...
CONTENT-CHANGED FILES for /Volumes/AtticClone_R4/_MasterClone/MyData/Training
TrainingAndRaces

And the shasum results. It turns out that ALL backups of this file are corrupt: the two earlier clones match each other but not the original, and the clone of the clone differs yet again. So the behavior is clearly some rsync or similar bug related to extended attributes, a behavior that actually changes the file contents.

diglloyd-iMac:Training lloyd$ shasum /Master/MyData/Training/TrainingAndRaces \
/Volumes/AtticClone_R4/_MasterClone/MyData/Training/TrainingAndRaces \
/Volumes/Attic.MasterClone/MyData/Training/TrainingAndRaces \
/Volumes/EVP_MasterClone/MyData/Training/TrainingAndRaces
cb0e75cd20d6bec6ce82ed5cab730323762107b6  /Master/MyData/Training/TrainingAndRaces
00f7c06b6b4861e0ec947015e0656a412931f994  /Volumes/AtticClone_R4/_MasterClone/MyData/Training/TrainingAndRaces
b86583d32ba2af186b4ea5a68f6743b344b5621b  /Volumes/Attic.MasterClone/MyData/Training/TrainingAndRaces
b86583d32ba2af186b4ea5a68f6743b344b5621b  /Volumes/EVP_MasterClone/MyData/Training/TrainingAndRaces
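
With more than a handful of copies, eyeballing hashes gets tedious. A minimal Python sketch, assuming the first line of the shasum output is the original, groups the paths by digest so it is obvious which copies agree. The digests and paths are the ones shown above; the variable names are illustrative.

```python
from collections import defaultdict

# shasum output copied from above; the first line is the original.
shasum_output = """\
cb0e75cd20d6bec6ce82ed5cab730323762107b6  /Master/MyData/Training/TrainingAndRaces
00f7c06b6b4861e0ec947015e0656a412931f994  /Volumes/AtticClone_R4/_MasterClone/MyData/Training/TrainingAndRaces
b86583d32ba2af186b4ea5a68f6743b344b5621b  /Volumes/Attic.MasterClone/MyData/Training/TrainingAndRaces
b86583d32ba2af186b4ea5a68f6743b344b5621b  /Volumes/EVP_MasterClone/MyData/Training/TrainingAndRaces
"""

# Map each digest to the list of paths that produced it.
groups = defaultdict(list)
for line in shasum_output.splitlines():
    digest, path = line.split(maxsplit=1)
    groups[digest].append(path)

# The digest of the original is the reference for comparison.
reference = shasum_output.split()[0]

for digest, paths in groups.items():
    label = "matches original" if digest == reference else "DIFFERS"
    print(f"{digest[:8]}  {label}")
    for path in paths:
        print(f"    {path}")
```

Here it reports three distinct digests: the original alone, the two earlier clones together, and the clone of the clone on its own.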

diglloyd.com | Terms of Use | PRIVACY POLICY
Contact | About Lloyd Chambers | Consulting | Photo Tours
Mailing Lists | RSS Feeds | Twitter
Copyright © 2020 diglloyd Inc, all rights reserved.