diglloyd Mac Performance Guide

The Speed You Need


Optimizing Canon Digital Photo Professional

Last updated June 01, 2009

Canon’s Digital Photo Professional (DPP) has made little progress over the years in using multiple CPU cores. It’s a little better than version 1.0, but not much. This article covers version 3.5.1, the latest version as of December 2008.

The solution to faster batch processing should be obvious to even a junior engineer, yet Canon hasn’t done it. Building in the user-level workaround described here (start multiple workers, one worker per CPU core) would probably take about a day.

Update September 1, 2010

By manually starting multiple workers, the 12-core Mac Pro can make short work of CR2 RAW files.

On a 12-core Mac Pro, cut the time by 2/3!

Multiple workers with DPP

In Digital Photo Professional (DPP), multiple images can be selected and processed as a group (a “batch”).

Test results

A test*** was done with 128 Canon EOS 1Ds Mark III RAW files, averaging 24.7MB each, processed to 120.5MB 16-bit TIF files.

The Mac Pro system used two striped RAIDs, one for the CR2 files and one for output.

DPP: process 128 CR2 files to 16-bit TIF (Canon 1Ds Mark III, 21MP files)
Time in seconds, lower is faster

Workers | Mac Pro quad-core 3GHz | MBP 2.4GHz | MBP 2.8GHz 2008 unibody | Comments
1       | 1252                   | 2108       | 1700                    | A single worker is a losing proposition.
2       | 775                    | 1552       | 1386                    | The MBP 2.4GHz is nearly maxed out with 2 workers.
3       | 702                    | 1536       | 1235                    | The MBP 2.8GHz gains 12% from a 3rd worker compared with 2 workers.
4       | 660                    | -          | -                       | 4 workers (one per core) is optimal for the quad-core Mac Pro.
8       | 682                    | -          | -                       | Even twice as many workers as cores has minimal downside.

On the Mac Pro with 3 workers, the available CPU cores are almost fully utilized, but 4 workers (i.e. one worker per CPU core) provide the best results. With 8 workers on the 4-core Mac Pro, the additional overhead costs only about 3%, so there is no reason to hesitate to use whatever number of workers is convenient, keeping in mind that each worker uses ~500MB of real memory. On systems with only 4GB of memory, limiting workers to at most four (4) seems wise. This is another reason to avoid a dead-end Mac.
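The percentages quoted in this article (a single worker taking about 60% longer than two and 90% longer than four, two workers taking 17% longer than four, and 8 workers costing about 3% more than 4) follow directly from the Mac Pro column of the table. A small Python sketch that recomputes them; the only inputs are the measured times, and `percent_longer` is just an illustrative helper name:

```python
# Mac Pro (quad-core 3GHz) batch times in seconds, from the test table.
mac_pro_seconds = {1: 1252, 2: 775, 3: 702, 4: 660, 8: 682}


def percent_longer(slower: float, faster: float) -> float:
    """Percent by which the slower run exceeded the faster run's time."""
    return (slower / faster - 1.0) * 100.0


# Each pair is (workers in the slower run, workers in the faster run).
for slow, fast in [(1, 2), (1, 4), (2, 4), (8, 4)]:
    pct = percent_longer(mac_pro_seconds[slow], mac_pro_seconds[fast])
    print(f"{slow} worker(s) took {pct:.0f}% longer than {fast} workers")
```

The 1-versus-2 comparison works out to about 62%, which the text rounds to 60%; the other three match the quoted figures.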

Rule of thumb: use at least 3 workers on a quad-core system; 4 is optimal, but more will have little negative impact. On an 8-core system, 6-8 workers seem likely to produce the best results (not tested). On dual-core systems, two workers are adequate.

Unlike Canon’s DPP, Nikon’s Capture NX is so poorly engineered that it not only makes poor use of CPU cores but also actively defeats a workaround like this one by not allowing more than one batch at a time! There are indeed different levels of brain-deadness.

*** The test used a 3.0GHz quad-core Mac Pro with Mac OS X 10.5.5, with one striped RAID volume for the RAW files and a separate striped RAID for the 120.5MB 16-bit TIF output files. In other words, disk I/O speed was not a factor (and generally won’t be with DPP).
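A quick back-of-the-envelope check on the claim that disk I/O was not a factor: divide the total data moved by the fastest run time (660 seconds, 4 workers on the Mac Pro). A Python sketch using only the file counts, sizes and times stated in this article:

```python
# Figures from the test description: 128 files, ~24.7MB CR2 in, 120.5MB TIF out.
FILES = 128
CR2_MB = 24.7
TIF_MB = 120.5
BEST_SECONDS = 660  # quad-core Mac Pro with 4 workers, the fastest run

read_mb = FILES * CR2_MB   # total RAW data read
write_mb = FILES * TIF_MB  # total TIF data written

print(f"read {read_mb:.0f}MB, wrote {write_mb:.0f}MB")
print(f"average read rate:  {read_mb / BEST_SECONDS:.1f} MB/s")
print(f"average write rate: {write_mb / BEST_SECONDS:.1f} MB/s")
```

The result is an average of under 5MB/s read and roughly 23MB/s written, a small fraction of what a striped RAID of that era could typically sustain. Averages can hide short bursts, so treat this as a sanity check rather than a measurement.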

Selecting images

Shown below is the DPP main window.

Be sure to view image icons as small thumbnails in order to fit more thumbnails into the main window. Use the View => Small thumbnail command. You can shift-click to select a group of images.

DPP main window: select files to be batch-processed

When multiple folders of images are involved, it’s natural to start more than one batch process. But if a single folder contains a large number of images, simply shift-click to select a sub-group of images, start a batch for it, and repeat for each remaining sub-group.
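Splitting one folder into sub-groups amounts to dividing the thumbnails into contiguous ranges of about equal size. The Python sketch below, using a hypothetical helper `shift_click_ranges`, prints which thumbnail to click and which to shift-click for each batch; it assumes thumbnails are counted 1, 2, 3, ... in the order DPP displays them.

```python
def shift_click_ranges(num_images: int, num_batches: int) -> list[tuple[int, int]]:
    """Split thumbnails 1..num_images into contiguous, nearly equal ranges.

    Each (first, last) pair is one selection: click thumbnail `first`,
    shift-click thumbnail `last`, then start a batch for that selection.
    """
    if num_images < 1 or num_batches < 1:
        raise ValueError("need at least one image and one batch")
    num_batches = min(num_batches, num_images)  # never create an empty batch
    base, extra = divmod(num_images, num_batches)
    ranges = []
    first = 1
    for i in range(num_batches):
        # The first `extra` ranges get one more image to absorb the remainder.
        size = base + (1 if i < extra else 0)
        ranges.append((first, first + size - 1))
        first += size
    return ranges


for first, last in shift_click_ranges(130, 4):
    print(f"click #{first}, shift-click #{last}, then start a batch")
```

For 130 images and 4 batches this gives ranges of 33, 33, 32 and 32 images.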

Starting a batch

To start a batch process, use cmd-B or choose File => Batch Process. Each time you do so, a new “batch worker” will be started. By starting more than one batch, you create more than one worker, each of which will use about 150% of a CPU core (100% means one full CPU core).

On a quad-core system, 3-4 batches exploit all cores; on an 8-core system you’ll need 5-6 workers.
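These worker counts follow from the observation that each worker uses about 150% of one core: the number of workers needed to fill every core is the core count times 100%, divided by 150%, rounded up. A Python sketch of that arithmetic; the 150% figure is this article's observation for DPP 3.5.1 and is treated here as a fixed assumption, and `workers_to_saturate` is an illustrative name:

```python
import math


def workers_to_saturate(cores: int, cpu_pct_per_worker: float = 150.0) -> int:
    """Smallest worker count whose combined CPU use reaches 100% on every core.

    Assumes each DPP batch worker uses ~150% of one core (100% = one full core);
    actual usage will vary with the files and the machine.
    """
    return math.ceil(cores * 100.0 / cpu_pct_per_worker)


for cores in (2, 4, 8, 12):
    print(cores, "cores ->", workers_to_saturate(cores), "workers")
```

This gives 2 workers for dual-core, 3 for quad-core, 6 for 8-core and 8 for 12-core machines. It is a lower bound for keeping the CPU busy; in the tests, one worker per core (4 on the quad-core Mac Pro, 660 seconds) was still faster than 3 workers (702 seconds).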

CPU utilization

We can see just how effective multiple workers are by viewing CPU usage in Activity Monitor. With a single worker, the four cores of the Mac Pro are grossly underutilized (green and red mean active use, black means unused).

A single worker takes 60% longer than two workers, and 90% longer than four workers.

CPU utilization with one (1) batch worker

With two workers, CPU utilization is much better. Still, two workers take 17% longer than four workers.

CPU utilization with two (2) batch workers

Three workers improve the situation further, and for small jobs, 2 or 3 workers make sense (less user involvement and less mental effort). But for big jobs, the black area shown below represents wasted CPU time.

CPU utilization with three (3) batch workers

With four workers, CPU utilization is almost 400% (100% per core). On 8-core systems, more workers would be needed.

CPU utilization with four (4) batch workers

With an 8-core Mac Pro, it’s more of a challenge; 6-7 workers are needed to keep all 8 cores busy, which should look something like this:

CPU utilization with seven (7) batch workers

Fire (off) more workers!

It would be trivial to improve DPP’s batch processing; perhaps a Canon engineer will read this section and take action! But after 2.5 years, that still hasn’t happened.

The existing logic used for batch-processing with Digital Photo Professional is brain-dead:

// Current DPP logic: every selected file goes to a single worker.
allFiles = getListOfFilesToProcess();
start1Worker( allFiles );

DPP is already capable of launching multiple workers through user action (select some files and start a batch, select more files and start another batch, and so on). The following logic shows how DPP could immediately improve batch-processing performance on multi-CPU/multi-core machines with a very simple change:

// Proposed logic: split the files into one list per CPU core, one worker per list.
allFiles = getListOfFilesToProcess();
fileLists = divideListIntoOneListForEachCPU( allFiles );
for ( each fileList in fileLists ) startWorker( fileList );
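To make the proposal concrete, here is a runnable Python sketch of the same logic. It is not Canon’s code: `convert_raw_to_tiff` is a hypothetical stand-in for DPP’s per-file conversion, and each worker is a separate process created with the standard library’s `concurrent.futures.ProcessPoolExecutor`. The pseudocode’s `divideListIntoOneListForEachCPU` and `startWorker` become `divide_list_into_one_list_per_cpu` and `start_worker`; the split shown is round-robin, one of several reasonable ways to divide the list.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def divide_list_into_one_list_per_cpu(all_files: list[str], num_cpus: int) -> list[list[str]]:
    """Round-robin split: list i gets files i, i + num_cpus, i + 2*num_cpus, ..."""
    lists = [all_files[i::num_cpus] for i in range(num_cpus)]
    return [files for files in lists if files]  # drop empty lists (fewer files than CPUs)


def convert_raw_to_tiff(path: str) -> str:
    # Hypothetical placeholder for the real CR2 -> 16-bit TIF conversion.
    return os.path.splitext(path)[0] + ".TIF"


def start_worker(file_list: list[str]) -> list[str]:
    # One worker converts its own list sequentially, like one DPP batch.
    return [convert_raw_to_tiff(path) for path in file_list]


def batch_process(all_files: list[str], num_cpus: int = 0) -> list[str]:
    # num_cpus == 0 means "use however many CPUs the OS reports".
    num_cpus = num_cpus or os.cpu_count() or 1
    file_lists = divide_list_into_one_list_per_cpu(all_files, num_cpus)
    if not file_lists:
        return []
    # One process per file list, all running concurrently.
    with ProcessPoolExecutor(max_workers=len(file_lists)) as pool:
        per_worker = list(pool.map(start_worker, file_lists))
    return [out for outputs in per_worker for out in outputs]


if __name__ == "__main__":
    files = [f"IMG_{n:04d}.CR2" for n in range(1, 11)]
    print(batch_process(files, num_cpus=4))
```

A static split like this mirrors what a user does by hand with several batches. Its weakness is that a worker that finishes early sits idle; handing out one file at a time from a shared pool (for example `pool.map(convert_raw_to_tiff, all_files)`) would balance the load better, at the cost of departing from the one-list-per-CPU design described here.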

There are plenty of other efficiency improvements that could be made (including threading the processing of individual files in interactive use), but this approach is “low-hanging fruit” that a competent engineer could implement in a day or two.

Memory usage of workers

Each batch worker takes about 500MB of real memory (peak). To run 4 workers plus DPP itself, you’ll need roughly 2GB or more of free memory (4 × 500MB for the workers, plus DPP). If memory is constrained, you can quit DPP itself after starting the batch workers. This also means that on 4GB systems, using more than 4 workers is not advised.

Conclusions

For batch jobs, major gains in workflow speed can be made by using as few as two batch workers, easily done by selecting files and starting two separate jobs, even on files in the same folders.

In general, use N or N+1 workers for fastest processing, where N is the number of CPU cores, but watch the memory usage on systems with limited memory.
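This rule of thumb, combined with the ~500MB-per-worker figure, fits in a few lines. A Python sketch with a hypothetical helper `recommended_workers`; the free-memory figure is an input you would read yourself (for example from Activity Monitor), and 500MB is the peak measured here for DPP 3.5.1 with 21MP files, so it is only an assumption for other versions or cameras:

```python
def recommended_workers(cores: int, free_memory_mb: int,
                        mb_per_worker: int = 500, extra_workers: int = 0) -> int:
    """Worker count of N (or N + extra_workers) for N CPU cores, capped by free memory.

    Assumes each DPP batch worker peaks at about mb_per_worker MB of real memory.
    """
    by_cpu = cores + extra_workers
    by_memory = free_memory_mb // mb_per_worker
    return max(1, min(by_cpu, by_memory))  # always run at least one worker


print(recommended_workers(4, 8000))                   # quad-core, plenty of memory
print(recommended_workers(4, 8000, extra_workers=1))  # the N + 1 variant
print(recommended_workers(8, 2000))                   # 8-core, memory-limited
```

In the 8-core example, memory rather than core count sets the limit: 2000MB of free memory supports only 4 workers at ~500MB each.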



Copyright © 2008-2014 diglloyd Inc, all rights reserved.