Optimizing Canon Digital Photo Professional
Canon’s Digital Photo Professional (DPP) has made minimal progress in improving its utilization of multiple CPU cores over the years. It’s a little better than version 1.0, but not much. This article covers version 3.5.1, the latest version as of December 2008.
The solution to faster batch processing should be obvious to even a junior engineer, yet Canon hasn’t done it. Building the user-level workaround described here into DPP would probably take one day: start multiple workers, one worker per CPU core.
Update September 1, 2010
By manually starting multiple workers, the 12-core Mac Pro can make short work of CR2 RAW files.
Multiple workers with DPP
In Digital Photo Professional (DPP), multiple images may be selected, and processed as a group (“batch”).
A test*** was done with 128 Canon EOS 1Ds Mark III RAW files, averaging 24.7MB each, processed to 120.5MB 16-bit TIF files.
The Mac Pro system used two striped RAIDs, one for the CR2 files and one for output.
DPP Process 128 CR2 files to 16-bit TIF (Canon 1Ds Mark III 21MP files)
|Workers||Mac Pro 3.0GHz quad-core||MBP 2.4GHz||MBP 2.8GHz||Comments|
|1||1252||2108||1700||Single worker is a losing proposition.|
|2||775||1552||1386||MBP 2.4GHz is ~maxed-out with 2 workers.|
|3||702||1536||1235||MBP 2.8GHz can use a 3rd worker for a 12% gain over 2 workers.|
|4||660||-||-||4 workers is optimal for the quad-core Mac Pro.|
|8||682||-||-||Using even twice as many workers as the number of cores has minimal downside.|
On a Mac Pro with 3 workers, the available CPU cores are almost fully utilized, but 4 workers provide the best results, i.e. one worker per CPU core. With 8 workers on the 4-core Mac Pro, the additional overhead adds only a 3% penalty, so there should be no hesitation in using whatever number of workers is convenient, keeping in mind that each worker uses ~500MB of real memory. On systems with only 4GB memory, limiting workers to at most four (4) seems wise. This is another reason to avoid a dead-end Mac.
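The percentages quoted in this article follow from the first timing column of the table (the quad-core Mac Pro). Here is a minimal Python sketch of that arithmetic; the helper name `percent_longer` is made up for illustration, and the timings are copied from the table.

```python
# Quad-core Mac Pro timings from the table, keyed by number of workers.
mac_pro_times = {1: 1252, 2: 775, 3: 702, 4: 660, 8: 682}


def percent_longer(slower, faster):
    """Return how much longer `slower` takes than `faster`, in percent."""
    return (slower / faster - 1.0) * 100.0


best = mac_pro_times[4]
for workers, elapsed in sorted(mac_pro_times.items()):
    extra = percent_longer(elapsed, best)
    print(f"{workers} worker(s): {extra:5.1f}% longer than 4 workers")
```

The same helper reproduces the MacBook Pro figure: `percent_longer(1386, 1235)` is roughly 12, the gain from adding a third worker on the 2.8GHz model.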
Rule of thumb: use at least 3 workers on a quad-core system, 4 is optimal, but more will have little negative impact. On an 8-core system, 6-8 workers seems likely to produce best results (not tested). On dual-core systems, two workers is adequate.
Unlike Canon’s DPP, Nikon’s Capture NX is so poorly engineered that it not only makes poor use of CPU cores, but actively defeats a workaround like this one, by not allowing more than one batch at a time! There are indeed different levels of brain-deadness.
*** Test used a 3.0GHz quad-core Mac Pro with Mac OS X 10.5.5. On the Mac Pro, a striped RAID volume was used for the RAW files and a separate striped RAID was used for the 120.5MB 16-bit TIF output files. In other words, disk I/O speed was not a factor (and generally won’t be with DPP).
Shown below is the DPP main window.
Be sure to view image icons as small thumbnails (via the corresponding DPP view command) in order to fit more thumbnails into the main window. You can shift-click to select a group of images.
When multiple folders of images are involved, it’s quite natural to start more than one batch process. But if you have a large number of images in a single folder, simply shift-click to select a sub-group of images, and do this for multiple sub-groups, starting a batch for each one.
Starting a batch
To start a batch process, use cmd-B or choose the corresponding menu command. Each time you do so, a new “batch worker” is started. By starting more than one batch, you create more than one worker, each of which will use about 150% of a CPU core (100% means one full CPU core).
On a quad-core system, 3-4 batches exploit all cores; on an 8-core system you’ll need 5-6 workers.
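Those worker counts are consistent with each worker using about 1.5 cores. This rough sketch shows the estimate; the function name `workers_to_saturate` is hypothetical, and the 1.5 figure is simply the "about 150%" quoted above.

```python
import math


def workers_to_saturate(cores, cores_per_worker=1.5):
    """Smallest worker count whose combined CPU demand covers all cores."""
    return math.ceil(cores / cores_per_worker)


for cores in (2, 4, 8):
    print(f"{cores} cores -> at least {workers_to_saturate(cores)} workers")
```

Treat the result as a lower bound: the measured timings show that a fourth worker still helps on a quad-core machine, since a worker does not hold 150% of a core at every moment.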
We can see just how effective multiple workers are by viewing CPU usage in Activity Monitor. With a single worker, the four cores of the Mac Pro are grossly underutilized (green and red mean active use, black means unused).
A single worker takes 60% longer than two workers, and 90% longer than four workers.
With two workers, CPU utilization is much better. Still, two workers take 17% longer than four workers.
Three workers further improves the situation, and for small jobs, 2 or 3 workers makes sense (lower user involvement/less mental effort). But for big jobs, that black area shown below represents wasted CPU time.
With four workers, CPU utilization is almost 400% (100% per core). On 8-core systems, more workers would be needed.
With an 8-core Mac Pro, it’s more of a challenge; 6-7 workers are needed to keep all 8 cores busy, which should look something like this:
It would be trivial to improve DPP’s batch processing; perhaps a Canon engineer will read this section and take action! But that hasn't happened for 2.5 years so far.
The existing logic used for batch-processing with Digital Photo Professional is brain-dead:
allFiles = getListOfFilesToProcess();
start1Worker( allFiles );
DPP is already capable of launching multiple workers via user action, e.g. select some files, start a batch, select more files, start another batch, and so on. The following logic shows how DPP could instantly improve batch-processing performance on multi-CPU/multi-core machines with a very simple change:
allFiles = getListOfFilesToProcess();
fileLists = divideListIntoOneListForEachCPU( allFiles );
for ( each fileList in fileLists ) startWorker( fileList );
There are plenty of other improvements that could be made in efficiency (including threading the processing of individual files in interactive use), but this approach is “low hanging fruit” that could be implemented in a day or two by a competent engineer.
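To make the idea concrete, here is a minimal Python sketch of the divide-and-launch logic. It is not DPP's code, whose internals are not public. The function names mirror the pseudocode, `process_list` is a placeholder that only prints what a real converter would do, and the round-robin split is one assumed way of dividing the list so that sub-lists differ in length by at most one file.

```python
import os
from multiprocessing import Process


def divide_list_into_one_list_per_cpu(all_files, cpu_count=None):
    """Deal files round-robin into one list per CPU, dropping empty lists."""
    n = cpu_count or os.cpu_count() or 1
    file_lists = [all_files[i::n] for i in range(n)]
    return [file_list for file_list in file_lists if file_list]


def process_list(file_list):
    """Placeholder worker: a real one would convert each RAW file."""
    for path in file_list:
        print(f"worker {os.getpid()} would process {path}")


def start_workers(all_files, cpu_count=None):
    """Start one worker process per sub-list and wait for all to finish."""
    file_lists = divide_list_into_one_list_per_cpu(all_files, cpu_count)
    workers = [Process(target=process_list, args=(fl,)) for fl in file_lists]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()


# Show how 10 hypothetical file names would be split across 4 CPUs.
files = [f"IMG_{i:04d}.CR2" for i in range(10)]
for file_list in divide_list_into_one_list_per_cpu(files, 4):
    print(file_list)
```

To actually launch the workers, call `start_workers(files)` from a script, under an `if __name__ == "__main__":` guard so that it also works on platforms that spawn rather than fork child processes.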
Memory usage of workers
Each batch worker takes about 500MB of real memory (peak). In order to run 4 workers and DPP itself, you’ll need close to 2GB of free memory. If memory is constrained, you can quit DPP itself after starting the batch workers. It also means that on 4GB systems, using more than 4 workers is not advised.
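The memory constraint can be folded into the worker-count rule. The sketch below assumes the ~500MB peak per worker quoted above; the function name `recommended_workers` and its parameters are illustrative, not part of DPP.

```python
def recommended_workers(cores, free_memory_mb, per_worker_mb=500):
    """Cap the one-worker-per-core rule by what fits in free memory."""
    fits_in_memory = free_memory_mb // per_worker_mb
    # Always allow at least one worker, even on a very constrained system.
    return max(1, min(cores, fits_in_memory))


print(recommended_workers(cores=4, free_memory_mb=2048))
print(recommended_workers(cores=8, free_memory_mb=2048))
print(recommended_workers(cores=8, free_memory_mb=6144))
```

With about 2GB free, both the 4-core and the 8-core case come out at four workers; only with more free memory does the 8-core machine get one worker per core.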
For batch jobs, major gains in workflow speed can be made by using as few as two batch workers, easily done by selecting files and starting two separate jobs, even on files in the same folder.
In general, use N or N+1 workers for fastest processing, where N is the number of CPU cores, but watch the memory usage on systems with limited memory.