How to More than Double I/O Speed for Validation of Data Integrity of Large Amounts of Data (IntegrityChecker)

As discussed in Years Later, macOS Big Sur leaves Major Disk I/O Performance Bug Unfixed, Apple cannot seem to do more with each release of macOS than create an onslaught of new bugs, emojis and glittery features, while fixing more bugs than are created and improving performance and reliability sit way down the list.

So all you can do to deal with Apple’s problems is to work around them, when a workaround exists. Here is one such workaround.

I regularly verify data integrity of my work, which totals ~16TB at this point. It is all on fast SSDs, and much of it on the OWC Accelsior 4M2, which is capable of sustained speeds of nearly 7GB/sec.

But that speed is more than cut in half due to the caching bug in macOS referenced above, at least on systems with a lot of memory. And the more memory you have, the greater the performance loss. That’s right: the more memory you have and the faster the SSD, the greater the performance loss—thank you Apple!

Speed on hard drives and slow SSDs will not be affected much or at all; this issue applies only to very fast SSDs (more than 3GB/sec).

Short version

You can get that lost SSD speed back at least when using diglloydTools IntegrityChecker Java (icj): run 'icj' with 'sudo' as in:
sudo icj verify ...

Diagnosing the performance bug

As files get cached, the algorithm used by macOS incurs more and more overhead—very poor software engineering. Shocking really, given that my Mac Pro has 384GB memory, but can accept 1536GB, where the problem would be 4X worse. And maybe much worse, depending on the algorithm involved.

Below, IntegrityChecker is using only about 10 CPU cores on a 28-core machine. It is being starved for data that should be arriving at 6.9GB/sec, but which macOS has degraded to 3GB/sec because of the caching overhead.

Poor CPU utilization on 28-core Mac Pro, with excessive kernel usage also

Below, the CPU history shows an excessive amount of system (kernel) CPU time being used (red), with the green stuff being the application usage. The reason is inefficient caching, so inefficient that caching overhead takes more time than just reading the data!

Excessive system-level CPU usage caused by inefficient macOS caching algorithm

IntegrityChecker log excerpt —  impacted by caching bug

The caching problems have throttled the speed to 2971 MiB/sec, taking ~43 minutes to complete for 7451.4 GiB (about 7.28 TiB) of data. That’s a performance loss of 55%.
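As a rough check of those figures, here is a minimal Python sketch of the arithmetic. It assumes the 6.9GB/sec figure quoted earlier is in decimal gigabytes and is the baseline for the 55% loss; the variable names are illustrative only.

```python
# Figures taken from the text and the log excerpt.
MIB = 1024 ** 2                      # bytes per MiB
expected_gb_per_sec = 6.9            # what the SSD should deliver (decimal GB, assumed)
observed_mib_per_sec = 2971          # final average rate reported by icj
total_gib = 7451.4                   # amount of data hashed

# Put both rates in the same unit (MiB/sec) before comparing them.
expected_mib_per_sec = expected_gb_per_sec * 1e9 / MIB
loss_percent = (1 - observed_mib_per_sec / expected_mib_per_sec) * 100

# Time needed to read everything at the observed average rate.
seconds = total_gib * 1024 / observed_mib_per_sec
minutes, secs = divmod(round(seconds), 60)

print(f"expected {expected_mib_per_sec:.0f} MiB/sec, loss {loss_percent:.0f}%")
print(f"predicted hashing time {minutes:02d}:{secs:02d}")
```

The predicted time can be compared against the final progress line of the log excerpt.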

Why not set a “no cache” flag? Because no such flag is available in the Java APIs. You have to descend way down into POSIX APIs to do so—not available in Java. The old Apple Carbon APIs had such a flag, and DiskTester and the native 'ic' IntegrityChecker could use them. But those APIs are deprecated and slated for removal.
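For illustration only, here is roughly what such a low-level "no cache" request looks like outside Java. This sketch assumes the flag in question is the macOS-specific `fcntl` command `F_NOCACHE` (the text does not name it), and uses SHA-256 as a stand-in hash since the algorithm icj uses is not stated here. The function name `hash_file_uncached` is hypothetical.

```python
import fcntl
import hashlib
import os


def hash_file_uncached(path, chunk_size=8 * 1024 * 1024):
    """Return the SHA-256 hex digest of a file, asking the OS not to cache the reads."""
    digest = hashlib.sha256()
    fd = os.open(path, os.O_RDONLY)
    try:
        # F_NOCACHE exists only on macOS; on other systems this step is skipped
        # and the file is read through the normal cache.
        nocache = getattr(fcntl, "F_NOCACHE", None)
        if nocache is not None:
            fcntl.fcntl(fd, nocache, 1)
        while True:
            chunk = os.read(fd, chunk_size)
            if not chunk:
                break
            digest.update(chunk)
    finally:
        os.close(fd)
    return digest.hexdigest()
```

Whether this would avoid the overhead described here on a given macOS version is untested; it only shows where the flag lives.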

Read on to see how this can be overcome/fixed by using 'sudo'.

diglloydMP:DIGLLOYD lloyd$ icj verify Work
# icj 2.0 2021-03-09 10:00
# ©2020 DIGLLOYD INC. All Rights Reserved. Valid license required.  https://diglloydtools.com
# 2021-10-26 11:30:34 {USER=lloyd homeDir=/Users/lloyd OS=macOS}
Process folder: /Volumes/Work
...
# Hash data for 11773 folders containing 154269 files loaded in 3136 ms.
Hashing 154269 files totaling 7451.4 GiB in 11773 folders... 
0%: 505 files 18.8 GiB @ 6406 MiB/sec, 00:03.000
0%: 985 files 38.6 GiB @ 6579 MiB/sec, 00:06.000
0%: 1543 files 58.1 GiB @ 6613 MiB/sec, 00:09.001
1%: 2120 files 77.9 GiB @ 6642 MiB/sec, 00:12.0
...
5%: 8252 files 375.2 GiB @ 6399 MiB/sec, 01:00
...
6%: 8571 files 449.8 GiB @ 5900 MiB/sec, 01:18
...
8%: 12725 files 600.6 GiB @ 4996 MiB/sec, 02:03
...
10%: 15662 files 749.8 GiB @ 4407 MiB/sec, 02:54
...
12%: 19939 files 904.0 GiB @ 4109 MiB/sec, 03:45
...
15%: 25153 files 1121.3 GiB @ 3823 MiB/sec, 05:00
...
20%: 32835 files 1494.7 GiB @ 3437 MiB/sec, 07:25
...
99%: 153926 files 7427.8 GiB @ 2971 MiB/sec, 42:39
99%: 153982 files 7435.6 GiB @ 2971 MiB/sec, 42:42
99%: 154080 files 7443.6 GiB @ 2971 MiB/sec, 42:45
Waiting for 25 of 154269 files to finish...
100%: 154269 files 7451.4 GiB @ 2971 MiB/sec, 42:48

Checking overall status for 11773 folders... done.
========================================================================================================================
2021-10-26 12:13:27 : 11773 folders totaling 7451.4 GiB
/Volumes/Work
========================================================================================================================
# With hash: 154269
# Without hash: 0
# Hashed: 154269
# Missing Files: 0
# Missing Folders: 0
# Changed size: 0
# Changed date: 0
# Changed content + date, size unchanged: 0
# Total files differing: 0
# Num ignored folders: 5
# Num ignored files: 435
# SUSPICIOUS files: 0
icj done at Tue Oct 26 12:13:27 PDT 2021 runtime 42:53
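
The progress lines in the log above are regular enough to tabulate the slowdown with a few lines of Python. This is a sketch that assumes the line format is exactly as shown in this excerpt, with elapsed time as minutes:seconds; the function name `parse_progress` is hypothetical and not part of icj.

```python
import re

# Matches lines such as "20%: 32835 files 1494.7 GiB @ 3437 MiB/sec, 07:25".
PROGRESS = re.compile(
    r"(?P<pct>\d+)%: (?P<files>\d+) files (?P<gib>[\d.]+) GiB "
    r"@ (?P<rate>\d+) MiB/sec, (?P<min>\d+):(?P<sec>\d+(?:\.\d+)?)"
)


def parse_progress(line):
    """Return the fields of one progress line as a dict, or None if it is not one."""
    m = PROGRESS.match(line)
    if m is None:
        return None
    return {
        "percent": int(m["pct"]),
        "files": int(m["files"]),
        "gib": float(m["gib"]),
        "mib_per_sec": int(m["rate"]),
        "elapsed_sec": int(m["min"]) * 60 + float(m["sec"]),
    }


sample = [
    "5%: 8252 files 375.2 GiB @ 6399 MiB/sec, 01:00",
    "10%: 15662 files 749.8 GiB @ 4407 MiB/sec, 02:54",
    "20%: 32835 files 1494.7 GiB @ 3437 MiB/sec, 07:25",
    "...",
]
for line in sample:
    p = parse_progress(line)
    if p is not None:
        print(f"{p['gib']:8.1f} GiB read, average so far {p['mib_per_sec']} MiB/sec")
```

Because the reported rate is a running average, the instantaneous rate late in the run is lower than the figure on each line.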

Fixing the performance bug

Application developers cannot fix macOS kernel bugs. But sometimes a workaround exists.

When I discovered this issue with my 2019 Mac Pro and its 384GB memory, I also found that running the 'purge' command to keep the amount of caching down greatly improved speed. Therefore, IntegrityChecker Java (icj) automatically execs 'sudo purge' frequently and continually. To do so, it must be run with 'sudo', because 'purge' is a privileged command.
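
The general idea can be sketched in Python as a background thread that runs 'purge' at a fixed interval while the real work proceeds. This is not icj's implementation, which is Java and not shown here. The class name `PeriodicPurge` and its parameters are hypothetical, and the 5-second default is taken from the log excerpt further down.

```python
import subprocess
import threading


class PeriodicPurge:
    """Run a command at a fixed interval on a background thread until stopped."""

    def __init__(self, interval_seconds=5.0, command=("purge",), runner=subprocess.run):
        self.interval_seconds = interval_seconds
        self.command = list(command)
        self.runner = runner            # injectable so the loop can be tried without root
        self.runs = 0
        self._stop = threading.Event()
        self._thread = threading.Thread(target=self._loop, daemon=True)

    def _loop(self):
        # Purge once immediately, then again after every interval until stopped.
        while True:
            self.runner(self.command, check=False)
            self.runs += 1
            if self._stop.wait(self.interval_seconds):
                break

    def __enter__(self):
        self._thread.start()
        return self

    def __exit__(self, exc_type, exc, tb):
        self._stop.set()
        self._thread.join()
        return False
```

Typical use would be to wrap the hashing work in `with PeriodicPurge():`. On macOS the whole script would then need to be run with 'sudo', for the same reason given above.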

With the macOS caching kept under control, data now arrives at about 6.9GB/sec, and CPU usage by icj leaps accordingly from ~10 CPU cores to 18 to 24 CPU cores, as shown here.

Much improved CPU utilization on 28-core Mac Pro with macOS caching kept under control

With the macOS caching kept under control, system CPU usage (red) drops dramatically while the application CPU usage (green) doubles. Since the 'purge' is also system CPU usage, the real reduction of caching overhead is even larger than it appears here. But the purge chews up an entire CPU core almost continually.

System-level CPU usage greatly reduced with macOS caching kept under control

IntegrityChecker log excerpt

With the macOS caching bug under control, IntegrityChecker has maxed out the read capabilities of the OWC Accelsior 4M2 PCIe SSD, delivering 6635 MiB/sec = 6957 MB/sec. That is all the SSD can do at its very best!
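
The unit conversion and the overall speedup work out as follows. This is a small Python sketch using only the two average rates reported by icj; the variable names are illustrative.

```python
MIB = 1024 ** 2                  # bytes per MiB
MB = 1000 ** 2                   # bytes per MB

with_purge_mib = 6635            # average rate when run with sudo
without_purge_mib = 2971         # average rate when run without sudo

# Convert binary MiB/sec to decimal MB/sec.
with_purge_mb = with_purge_mib * MIB / MB

# Ratio of the two average rates.
speedup = with_purge_mib / without_purge_mib

print(f"{with_purge_mib} MiB/sec = {with_purge_mb:.0f} MB/sec")
print(f"speedup: {speedup:.2f}x")
```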

IntegrityChecker on this 28-core Mac Pro could perhaps deliver 9000 to 11000 MB/sec, were the SSD fast enough.

diglloydMP:DIGLLOYD lloyd$ sudo icj verify Work
# icj 2.0 fc6 2020-10-01 08:20
# ©2020 DIGLLOYD INC. All Rights Reserved. Valid license required.  https://diglloydtools.com
# 2021-10-26 12:22:42 {USER=lloyd as root homeDir=/Users/lloyd OS=macOS isRoot=true}
Process folder: /Volumes/Work
# Hash data for 11773 folders containing 154269 files loaded in 2197 ms.
# Purging operating system file cache...3608 ms. Subsequent purges every 5 seconds.
Hashing 154269 files totaling 7451.4 GiB in 11773 folders... 
0%: 508 files 18.9 GiB @ 6444 MiB/sec, 00:03.000
0%: 982 files 38.4 GiB @ 6556 MiB/sec, 00:06.000
0%: 1539 files 58.0 GiB @ 6600 MiB/sec, 00:09.000 P 
1%: 2116 files 77.7 GiB @ 6629 MiB/sec, 00:12.0
...
5%: 8543 files 407.9 GiB @ 6625 MiB/sec, 01:03
...
8%: 12991 files 608.1 GiB @ 6482 MiB/sec, 01:36
...
10%: 16035 files 764.8 GiB @ 6522 MiB/sec, 02:00
...
12%: 19867 files 901.6 GiB @ 6544 MiB/sec, 02:21
...
15%: 25529 files 1136.9 GiB @ 6573 MiB/sec, 02:57
...
20%: 33013 files 1504.4 GiB @ 6579 MiB/sec, 03:54
...
99%: 153755 files 7409.9 GiB @ 6633 MiB/sec, 19:03
99%: 153935 files 7429.5 GiB @ 6634 MiB/sec, 19:06
99%: 154175 files 7449.2 GiB @ 6634 MiB/sec, 19:09 P 
Waiting for 68 of 154269 files to finish...
100%: 154269 files 7451.4 GiB @ 6635 MiB/sec, 19:10
Checking overall status for 11773 folders... done.
========================================================================================================================
2021-10-26 12:41:59 : 11773 folders totaling 7451.4 GiB
/Volumes/Work
========================================================================================================================
# With hash: 154269
# Without hash: 0
# Hashed: 154269
# Missing Files: 0
# Missing Folders: 0
# Changed size: 0
# Changed date: 0
# Changed content + date, size unchanged: 0
# Total files differing: 0
# Num ignored folders: 5
# Num ignored files: 435
# SUSPICIOUS files: 0
Purging operating system file cache...done purging.
icj done at Tue Oct 26 12:42:05 PDT 2021 runtime 19:22

Copyright © 2020 diglloyd Inc, all rights reserved.