Nehalem Cores vs. Clock Speed: Digital Camera RAW-file Processing
This test measures digital camera RAW-file processing speed with various converters. All tests were repeated to cross-check the results.
CaptureONE Pro takes 21% less processing time on the 8-core 2.93 GHz model; put the other way, it takes 26% longer on the 3.33 GHz quad-core.
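The two percentages describe the same measurement from opposite baselines. Writing t_4 for the quad-core time and t_8 for the 8-core time (symbols introduced here for illustration), the arithmetic is:

```latex
t_8 = (1 - 0.21)\,t_4 = 0.79\,t_4
\quad\Longrightarrow\quad
\frac{t_4}{t_8} = \frac{1}{0.79} \approx 1.266
```

So the quad-core needs roughly 26% more time than the 8-core, matching the figure above once rounding is taken into account.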
Lightroom 2.7 is 40% slower on the 8-core! More is less. This is apparently due to a memory leak bug, which I have previously observed on an 8-core Mac Pro: LR 2.7 scarfed up 2 GB more memory than on the quad-core (LR 3 did not). I repeated the test 3 times, with the same results each time. LR 2.7 also slows down as the number of files grows; when the bug occurs, I estimate 2 to 3 days to process ~5000 files.
Lightroom 3 beta ekes out a measly 3% gain on the 8-core. While better than LR 2.7, this is very disappointing, and it shows that Lightroom is not engineered to use more than 4 cores.
Aperture 3 is 19% slower on the 8-core.
Digital Photo Professional is essentially the same speed on both models.
RAW Developer is 9% slower on the 8-core, roughly in line with the clock speed difference. This is not surprising, since it is single-threaded.
Observe also the absolute time for the conversions: LR 3 beta is very fast compared to the other converters, even though it does not use an 8-core machine efficiently. It’s a pity it does not scale to all 8 cores.
None of the RAW-file converters make full use of CPU resources. Put simply, this is the result of poor software engineering, notwithstanding the lame excuses you’re bound to hear from software vendors.
For the Mac-bashers out there: this is not an OS X limitation at all. There are well written programs that do make full use of all available cores, and two of them are included in this report.
Even the simple solution is not used
In the simplest approach, programs could process one RAW file per CPU core, with a worker thread delivering I/O services, but none of them do. Their algorithm is brain-dead: process one file at a time, in sequence. It’s an idiotic algorithm for a multi-core world.
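The "one file per core, plus an I/O thread" scheme described above can be sketched in Python. This is a minimal illustration, not how any of the tested converters work: convert_raw is a hypothetical stand-in (a CPU-bound hash instead of real demosaicing), and the names read_raw, convert_all and executor_cls are invented for the example. Processes are used by default because CPython threads do not run pure-Python CPU work in parallel.

```python
import os
from concurrent.futures import Executor, ProcessPoolExecutor, ThreadPoolExecutor


def read_raw(path: str) -> tuple[str, bytes]:
    """I/O step: load one file's bytes (runs on the single I/O thread)."""
    with open(path, "rb") as f:
        return path, f.read()


def convert_raw(path: str, data: bytes) -> tuple[str, int]:
    """Hypothetical stand-in for converting one RAW file.

    A real converter would demosaic and render an image; here a
    CPU-bound rolling hash over the bytes plays that role.
    """
    acc = 0
    for b in data:
        acc = (acc * 31 + b) & 0xFFFFFFFF
    return path, acc


def convert_all(paths: list[str], workers: int = 0,
                executor_cls: type[Executor] = ProcessPoolExecutor) -> dict[str, int]:
    """Convert many files concurrently, one file per worker.

    workers=0 means one worker per CPU core.
    """
    workers = workers or os.cpu_count() or 1
    results: dict[str, int] = {}
    with ThreadPoolExecutor(max_workers=1) as io_pool, \
            executor_cls(max_workers=workers) as cpu_pool:
        # io_pool.map queues every read up front; the single I/O thread
        # reads files in order while the CPU workers convert earlier ones.
        futures = [cpu_pool.submit(convert_raw, path, data)
                   for path, data in io_pool.map(read_raw, paths)]
        for fut in futures:
            path, digest = fut.result()
            results[path] = digest
    return results
```

Called as convert_all(paths) from inside an `if __name__ == "__main__":` guard, this keeps every core busy with a different file while one thread handles disk reads. Passing executor_cls=ThreadPoolExecutor only makes sense when the conversion is native code that releases the GIL. A production version would also cap how many files are held in memory at once.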
Most of the programs use multiple threads, but with disappointing efficiency; they just do not scale beyond a few threads. The inefficiency is not related to disk I/O: I observed no I/O bottleneck, and these tests were run on the fastest possible internal disk setup.
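One standard way to reason about why multithreaded code stops scaling after a few threads is Amdahl's law: if only a fraction p of the work runs in parallel, the serial remainder caps the speedup no matter how many cores are added. The sketch below is a general model, not a measurement of these converters; the parallel fractions used with it are illustrative assumptions.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Ideal speedup on `cores` cores when only `parallel_fraction` of the
    work can run in parallel (Amdahl's law); the rest stays serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial = 1.0 - parallel_fraction
    return 1.0 / (serial + parallel_fraction / cores)


def gain_from_doubling(parallel_fraction: float, cores: int) -> float:
    """Relative speedup from going from `cores` to 2 * `cores` cores."""
    before = amdahl_speedup(parallel_fraction, cores)
    after = amdahl_speedup(parallel_fraction, 2 * cores)
    return after / before - 1.0
```

With half the work parallel, going from 4 to 8 cores gains only about 11%; with 95% parallel it gains about 70%. Since the 8-core model here also runs at a lower clock (2.93 GHz versus 3.33 GHz), a small gain from extra cores can easily be cancelled outright, and the model is optimistic because it assumes the parallel part scales perfectly.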
This lack of attention to efficiency is engineering misfeasance in today’s market of multi-core computers, where time is money for many professionals. Witness the stupid trick for DPP and the stupid trick for Lightroom, which prove that performance could be raised natively if only there were the will to do so.