diglloyd Mac Performance Guide

Mac Pro Nehalem Scalability

Last updated June 01, 2009

This page investigates how well applications scale.

We’ll start with Adobe Photoshop CS4, and perhaps add others later.

CPU core enabling/disabling

Figure: CHUD Tools Processor Palette

Apple’s CHUD Tools package was used to selectively disable CPU cores via its Processor Palette window.

As displayed, CPUs 1/2/3/4 are on physical CPU chip #1, and CPUs 5/6/7/8 are on physical CPU chip #2. Each CPU chip contains 8MB of on-chip L3 cache shared by the 4 cores on that chip, which is critical to performance.

For best performance, cores were enabled on alternating CPU chips first, e.g. CPU 1, 5, 2, 6, 3, 7, 4, 8, in that order. Hyperthreading was left enabled in all cases.

Other combinations are possible (hyperthreading on or off, and which CPUs are enabled in which order), but these were not explored.
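The enabling order described above (alternate between the two chips so that each newly enabled core lands on the less-loaded chip and its L3 cache) can be sketched in a few lines. This is only an illustration of the numbering; the function name `enable_order` and its parameters are made up for this example and have nothing to do with CHUD Tools itself.

```python
def enable_order(chips=2, cores_per_chip=4):
    """Return 1-based CPU numbers in the order they would be enabled.

    Assumes CPUs are numbered chip by chip (1-4 on chip #1, 5-8 on chip #2),
    as shown in the Processor Palette. Hypothetical helper for illustration.
    """
    order = []
    for slot in range(cores_per_chip):      # 1st core on each chip, then 2nd, ...
        for chip in range(chips):           # alternate between the chips
            order.append(chip * cores_per_chip + slot + 1)
    return order


if __name__ == "__main__":
    print(enable_order())  # [1, 5, 2, 6, 3, 7, 4, 8]
```

Taking the first N entries of the list gives the set of CPUs enabled for an N-core test run.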


Photoshop scalability using diglloydSpeed1

Figure: 16 cores mostly idle

Please see the test results for diglloydSpeed1 comparing different Macs.

Here we investigate how the number of enabled CPU cores affects execution time of the diglloydSpeed1 benchmark. Timed with a stopwatch, results were consistent to within 0.1 second across multiple trials for every core count except 8, where they were consistent to within 0.3 seconds; three trials were run for the 8-core case. Results were averaged in all cases.

The times shown below are seconds to execute the test. There is a nice speedup going from a single CPU core to two, a slowdown with three, then incremental improvements up to 5 cores, beyond which only trivial improvements are seen. The improvements beyond 2 cores are small, indicating poor scalability. The likely cause is contention for shared synchronization locks, since memory bandwidth should not be the limit at 3 cores (a triple-channel memory configuration was used). Certain Photoshop filters may improve upon this showing, but the diglloydSpeed1 benchmark uses the most common operations.

Ideally, 8 cores would run in slightly more than 1/8 the time of a single core. But even if each core were only good for half of its potential speed, we could see execution time in the 10-12 second range: equivalent to running a ~12GHz Mac Pro!
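The arithmetic behind that estimate can be sketched as follows. This is a deliberately simple model (time divided by cores times per-core efficiency), not a measurement; the function name `projected_seconds` and the 40-second single-core time are placeholders chosen for illustration only, not figures from the test results.

```python
def projected_seconds(single_core_seconds, cores, efficiency=1.0):
    """Projected run time if `cores` cores each deliver `efficiency` (0-1]
    of a single core's speed. Simplified model; hypothetical helper."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return single_core_seconds / (cores * efficiency)


if __name__ == "__main__":
    t1 = 40.0  # placeholder single-core time in seconds, NOT a measured value
    print(projected_seconds(t1, 8))        # ideal scaling: 1/8 of the time
    print(projected_seconds(t1, 8, 0.5))   # each core at half its potential
```

With 8 cores at half efficiency the model gives one quarter of the single-core time, which is the scenario the paragraph above describes.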

Figure: Scalability of Photoshop with diglloydSpeed1

Checking the thread count in Activity Monitor during the test run shows that Photoshop creates 3 threads for each virtual CPU core, e.g. 48 threads on an 8-core (16 virtual core) machine. A baseline of about 5 additional threads brings the total to 5 + 16*3 = 53:

1  core: 11 threads
2 cores: 17 threads
4 cores: 29 threads
8 cores: 53 threads
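The observed counts fit the pattern "5 baseline threads plus 3 per virtual core" at every core count tested, which a quick check confirms. The constants below come from the figures above; the function name `expected_threads` is just for this sketch, and the formula is an inference from the observations, not anything documented by Adobe.

```python
BASELINE_THREADS = 5          # inferred from the observed totals
THREADS_PER_VIRTUAL_CORE = 3  # inferred from the observed totals
VIRTUAL_PER_PHYSICAL = 2      # Hyperthreading enabled in all tests


def expected_threads(physical_cores):
    """Thread count predicted by the inferred pattern."""
    virtual_cores = physical_cores * VIRTUAL_PER_PHYSICAL
    return BASELINE_THREADS + THREADS_PER_VIRTUAL_CORE * virtual_cores


# Thread counts observed in Activity Monitor, keyed by enabled physical cores.
observed = {1: 11, 2: 17, 4: 29, 8: 53}

if __name__ == "__main__":
    for cores, seen in observed.items():
        print(cores, "cores:", expected_threads(cores), "predicted,", seen, "observed")
```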

Lots of threads running, not much work getting done per thread. Adobe has some work to do here to bring out the promise of current-generation machines.

Conclusions

Most of the computing power of the MP09 (the 2009 Mac Pro) goes unused when running commonly used Photoshop CS4 operations.

See How to Choose and Buy a Mac.

Copyright © 2008-2015 diglloyd Inc, all rights reserved.