IntegrityChecker — verifying file integrity

Last updated January 8, 2010

IntegrityChecker™ provides ultra-efficient validation of your data, including your originals and backups. Even single-bit errors are detected, anywhere in a file.

You can validate the integrity of your files at any time, even backups on CDs or DVDs!

It’s the perfect tool when moving data (backing up, switching systems, etc.), because you can verify that every file is bit-for-bit identical to the original. You can do so a day, a week or a month later, which is an important consideration if a system meltdown requires restoring all your data.

See “Why would I want to use it?”, below.

Get IntegrityChecker (beta version)...

How does it work?

IntegrityChecker™ creates one hidden file per folder (“.ic”), which contains the integrity-validation information for each file in that folder. There is no “database”, and there is no modification to your files.

You choose which folder(s) you want to validate and run IntegrityChecker (“ic”) on them. Later, the originals or any copies of the originals can be validated, because the .ic files “go along for the ride” when the folder(s) are copied or backed up.

Of course, files often legitimately change, and here IntegrityChecker provides a report on which files changed and how, including flagging suspicious files that appear to be damaged. When files change, you can run IntegrityChecker on just the folders containing the changed files, to update their status appropriately. Validation can also detect flaky hardware that sporadically generates bit errors in files, as well as damage caused by misbehaving software.

The “.ic” file contains the integrity-validation information for each file in that folder, which is based on SHA-1, the 160-bit cryptographic Secure Hash Algorithm. If even a single bit changes, the odds of missing it are incredibly small; you’re far more likely to win the lottery.
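To make the idea concrete, here is a minimal Python sketch of hashing every file in one folder with SHA-1. It illustrates the general technique only: the function names are hypothetical, and it does not reproduce IntegrityChecker's actual “.ic” file format or implementation.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, opened read-only."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        # Read in chunks so large files never have to fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_folder(folder):
    """Map each file name directly inside `folder` to its SHA-1 digest.

    A file named ".ic" is skipped, on the assumption that it would hold
    the validation data itself rather than user data.
    """
    return {
        entry.name: sha1_of_file(entry)
        for entry in sorted(Path(folder).iterdir())
        if entry.is_file() and entry.name != ".ic"
    }
```

Flipping a single bit anywhere in a file produces a completely different 40-character digest, which is what makes this kind of check sensitive to even the smallest corruption.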

Mac OS X compatibility

IntegrityChecker is a single-file universal binary that runs on 32-bit or 64-bit systems, Mac OS X 10.6 (Snow Leopard) or 10.5 (Leopard). There are no plans to support Mac OS X 10.4 or earlier.

Command line interface

IntegrityChecker is a command-line tool that you run in Terminal. This suits the demanding task it addresses and is more useful for automated environments than a GUI.

See the IntegrityChecker User Manual for information on usage.

Is it safe?

Absolutely. IntegrityChecker only reads your files, and always opens them read-only (not writable). Only its own “.ic” files are written to, when updating the validation information. The validation process writes no files at all; it’s strictly read-only.

Is it fast?

IntegrityChecker’s sophisticated multi-threaded design can make full use of all CPU cores, scaling to 16 CPU cores or more. It can process data at approximately 2800 MB/sec, if you are unusual enough to have a disk array that can run that fast. Well before Apple’s Grand Central, IntegrityChecker was written efficiently using robust “pthreads” technology. Of course, large numbers of small files do not run that fast, due to system overhead. Processing speed depends on how fast the disks are, not on the speed of the CPU, unless you have a very fast disk and a very slow CPU.
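The general idea of hashing several files at once can be sketched in Python with a thread pool. This is only an illustration of the technique, not IntegrityChecker's pthreads implementation; the function names and the default worker count are assumptions.

```python
import hashlib
import os
from concurrent.futures import ThreadPoolExecutor


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, opened read-only."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_files_parallel(paths, workers=None):
    """Hash many files concurrently and return {path: digest}.

    The worker count defaults to the number of CPU cores (an assumption
    for this sketch, not a documented IntegrityChecker setting).
    """
    paths = list(paths)
    workers = workers or os.cpu_count() or 1
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so digests line up with paths.
        digests = list(pool.map(sha1_of_file, paths))
    return dict(zip(paths, digests))
```

Threads are a reasonable fit here because Python's hashlib releases the interpreter lock while hashing large buffers, so several files can be read and hashed at the same time. As the text notes, the disks usually become the limit before the CPU does.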

Shown below are five copies of IntegrityChecker™ (“ic”) running on a Mac Pro Nehalem 2.93GHz 8-core. Four striped RAID arrays were configured to provide the 1.4 gigabytes per second of disk I/O for testing. The Mac Pro CPUs are still only about 50% utilized; it could accept nearly 3GB/sec before “maxing out”.

Mac Pro Nehalem 8-core 2.93GHz processing 1.4 gigabytes per second

Why would I want to use it?

This section details use cases for IntegrityChecker. It is part of a serious workflow where data integrity matters.

Note that IntegrityChecker validates only those files that were previously hashed; it cannot validate new files that were never before processed.

Backup validation

Before making a backup, run IntegrityChecker on all the folders to be backed up. Later, when the backup is complete, run IntegrityChecker on the backup to verify that all files are 100% accurate. You can repeat this process a day or a month or a year later to validate older backups, including those on CD or DVD. Since validation reads all the files, it is also a good “sanity check” on the health of the drive or CD or DVD itself.

When restoring from a backup, you can verify the backup first, or simply restore your files and then verify the restored files. In this manner, you can know for certain that the files you’ve restored are perfect.
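The verification step can be pictured as comparing freshly computed hashes against the ones recorded earlier. The Python sketch below is a hedged illustration of that comparison: the report categories and function names are hypothetical, and IntegrityChecker's real report may classify files differently.

```python
import hashlib
from pathlib import Path


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the SHA-1 hex digest of a file, opened read-only."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_folder(folder, recorded):
    """Compare files in `folder` against `recorded` ({name: sha1 digest}).

    Returns a report with four illustrative categories:
      ok       - digest matches the recorded one
      changed  - file exists but its digest differs
      missing  - recorded file is no longer present
      unhashed - file was never recorded, so it cannot be validated
    """
    folder = Path(folder)
    report = {"ok": [], "changed": [], "missing": [], "unhashed": []}
    for name, expected in sorted(recorded.items()):
        path = folder / name
        if not path.is_file():
            report["missing"].append(name)
        elif sha1_of_file(path) == expected:
            report["ok"].append(name)
        else:
            report["changed"].append(name)
    for entry in sorted(folder.iterdir()):
        if entry.is_file() and entry.name not in recorded and entry.name != ".ic":
            report["unhashed"].append(entry.name)
    return report
```

The “unhashed” category mirrors the note above: a file that was never processed has no recorded hash, so nothing can be said about its integrity.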

Data transfer

Data transfer is essentially the same as backup, but it’s more about moving files to another computer, to another drive, etc. Follow the same process as for backup, and be assured that your files are perfect after the transfer.

What’s changed on the system

Especially with multiple users and/or large data sets, you might want to see which files have changed. Changes include not just legitimately-changed files, but insidious corruption by software or flaky hardware, or a CD or DVD that is damaged.

IntegrityChecker can also be used on system software, but you’ll have to run with privileges using the 'sudo' command, to allow IntegrityChecker access to protected folders (it’s as easy as typing sudo in front of the command).

Example usage

Volume “Master”

This is a brief overview; see the user manual for details. Here, Master is the name of a volume, as seen in the Finder on the desktop. Your drive(s) will likely have different names.

Create hashes for all files on Master: ic update-all Master

Verify all files on Master: ic verify Master

Create hashes for changed files on Master: ic update Master

Create hashes for new files on Master: ic update-new Master

Show status of all files on Master: ic status Master

Remove all .ic files on Master: ic clean Master

Of course, you can operate on any files or folders, not just an entire volume, so you can check a specific folder or folders in a single command. You can even operate on many volumes at once, though it’s fastest to run one copy of IntegrityChecker for each volume (so they run in parallel rather than sequentially): start one Terminal window for each volume.
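For automated environments, the one-copy-per-volume approach could be scripted rather than typed into separate Terminal windows. The Python sketch below assumes the `ic <command> <volume>` form shown in the examples above; the helper names are hypothetical, and “Backup” in the usage note is an illustrative second volume name.

```python
import subprocess


def build_commands(action, targets):
    """Build one `ic` command line per target, e.g. ["ic", "verify", "Master"]."""
    return [["ic", action, target] for target in targets]


def run_in_parallel(commands):
    """Start every command at once, then wait for all of them.

    Returns the exit codes in the same order as `commands`.
    """
    processes = [subprocess.Popen(command) for command in commands]
    return [process.wait() for process in processes]
```

For example, `run_in_parallel(build_commands("verify", ["Master", "Backup"]))` would start two verifications side by side, provided `ic` is on the PATH. How IntegrityChecker reports failures through its exit code is not covered here, so check the user manual before relying on the returned values.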

IntegrityChecker emits a summary report when done, as well as flagging any suspicious files.

FAQ (frequently asked questions)

Why is it better than a backup program that checks files?

IntegrityChecker can check any backup containing the validation (“.ic”) file at any time. No special format or database is required, and it works on any media, including CD or DVD. Since the validation information is carried in the same folder as the files themselves, it “goes along for the ride”.

IntegrityChecker also runs as fast as your drives can go, so validating your files is not a tedious experience.


Copyright © 2008-2010 diglloyd Inc, all rights reserved | Terms of Use