Mac Pro Nehalem Scalability
This page investigates how well applications scale.
We’ll start with Adobe Photoshop CS4, and perhaps add others later.
CPU core enabling/disabling
An Apple utility was used to selectively disable CPU cores via its window.
As displayed, CPUs 1/2/3/4 are on physical CPU chip #1, and CPUs 5/6/7/8 are on physical CPU chip #2. Each CPU chip contains 8MB of on-chip L3 cache memory shared by the 4 cores on that chip, critical to performance.
For best performance, cores were enabled on alternating physical chips first, e.g. CPU 1, 5, 2, 6, 3, 7, 4, 8, in that order. Hyperthreading was left enabled in all cases.
For example, enabling CPUs 1 and 5 as shown at right uses the on-chip cache of both physical chips; enabling CPUs 1 and 2 would mean those two cores share the cache on one physical chip, i.e. half the amount of cache. Performance was observed to be much faster with the cores enabled optimally.
Other combinations are possible (hyperthreading on or off, which CPUs are enabled and in which order), but these were not explored.
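The enabling order described above can be sketched in a few lines of Python. This is only an illustration of the reasoning, not the tool used for the tests: the function names and the `chips` layout are assumptions made for the example, based on the CPU numbering and the 8MB-per-chip L3 figure given on this page.

```python
# Illustrative sketch only: the names below are hypothetical, not part of any Apple tool.

def interleaved_enable_order(chips):
    """Return CPU numbers in round-robin order across physical chips."""
    order = []
    # zip(*chips) pairs the 1st core of each chip, then the 2nd, and so on.
    for cores in zip(*chips):
        order.extend(cores)
    return order


def l3_cache_available_mb(enabled, chips, l3_per_chip_mb=8):
    """Total L3 cache reachable by the enabled CPUs (8MB per chip, per this page)."""
    chips_in_use = sum(1 for cores in chips if any(c in enabled for c in cores))
    return chips_in_use * l3_per_chip_mb


# CPUs 1-4 sit on physical chip #1, CPUs 5-8 on physical chip #2.
chips = [[1, 2, 3, 4], [5, 6, 7, 8]]

order = interleaved_enable_order(chips)
print(order)                                   # [1, 5, 2, 6, 3, 7, 4, 8]
print(l3_cache_available_mb({1, 5}, chips))    # 16: one core on each chip
print(l3_cache_available_mb({1, 2}, chips))    # 8: both cores share one chip
```

With two cores enabled, the interleaved choice (CPUs 1 and 5) has twice the L3 cache available compared with CPUs 1 and 2, which is the effect described above.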
Photoshop scalability using diglloydSpeed1
Please see the test results for diglloydSpeed1 comparing different Macs.
Here we investigate the effect of the number of CPU cores on execution of the diglloydSpeed1 benchmark. Using a stopwatch, it was possible to obtain times consistent to within 0.1 second over multiple trials for all CPU counts except 8, where the times were consistent to within 0.3 seconds over 3 trials. Results were averaged in all cases.
The times shown below are seconds to execute the test. There is a nice speedup going from a single CPU core to two, a slowdown with three, then incremental improvements until 5 cores, beyond which only trivial improvements are seen. The improvements beyond 2 cores are small, indicating poor scalability. The likely cause is contention for shared synchronization locks rather than memory bandwidth, which should not be a limit with 3 cores since a triple-channel memory configuration was used. Certain Photoshop filters may improve upon this showing, but the diglloydSpeed1 benchmark uses the most common operations.
Ideally, 8 cores would run in slightly more than 1/8 the time of a single core. But even if each core were only good for half of its potential speed, we could see execution time in the 10-12 second range: equivalent to running a ~12GHz Mac Pro!
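The scaling arithmetic behind that estimate can be written out as a small Python sketch. The function names are made up for illustration, and the 100-second single-core time is a placeholder chosen for round numbers, not a measured diglloydSpeed1 result.

```python
# Illustrative sketch: names and the sample time are hypothetical placeholders.

def expected_time(single_core_seconds, cores, efficiency=1.0):
    """Run time if each core delivers `efficiency` (0-1) of its potential speed."""
    if cores < 1:
        raise ValueError("cores must be at least 1")
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return single_core_seconds / (cores * efficiency)


def parallel_efficiency(single_core_seconds, measured_seconds, cores):
    """Fraction of ideal speedup actually achieved with `cores` cores."""
    speedup = single_core_seconds / measured_seconds
    return speedup / cores


t1 = 100.0  # placeholder single-core time in seconds, NOT a measured result

ideal = expected_time(t1, 8)                       # perfect scaling: 1/8 the time
half = expected_time(t1, 8, efficiency=0.5)        # each core at half its potential
print(ideal, half)                                 # 12.5 25.0
print(parallel_efficiency(t1, half, 8))            # 0.5
```

Feeding the measured single-core and 8-core times into `parallel_efficiency` gives a single number for how much of the machine is actually being used.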
Checking the number of threads via Activity Monitor during the test run, it is clear that Photoshop creates 3 threads for each virtual CPU core, e.g. 48 threads for an 8-core (16 virtual core) machine. Some baseline number of threads is used as well, apparently 5, giving 5 + 16*3 = 53 threads with 8 cores:
1 core: 11 threads
2 cores: 17 threads
4 cores: 29 threads
8 cores: 53 threads
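The thread counts above fit a simple formula, which the following Python sketch checks against the observed values. The baseline of 5 threads and the factor of 3 per virtual core are inferred from the Activity Monitor counts, not documented Photoshop behavior, and the constant and function names are chosen for this example.

```python
# Illustrative check of the inferred formula: threads = 5 + 3 * virtual cores.
BASELINE_THREADS = 5            # inferred baseline, an assumption
THREADS_PER_VIRTUAL_CORE = 3    # inferred from the observed counts
VIRTUAL_PER_PHYSICAL = 2        # hyperthreading left enabled


def predicted_threads(physical_cores):
    """Predicted Photoshop thread count for a given number of enabled cores."""
    virtual_cores = physical_cores * VIRTUAL_PER_PHYSICAL
    return BASELINE_THREADS + THREADS_PER_VIRTUAL_CORE * virtual_cores


# Thread counts observed via Activity Monitor (from the list above).
observed = {1: 11, 2: 17, 4: 29, 8: 53}

for cores, threads in observed.items():
    print(cores, threads, predicted_threads(cores))
```

Each observed count matches the prediction, which supports the reading that the thread count grows linearly with the number of virtual cores.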
Lots of threads running, not much work getting done per thread. Adobe has some work to do here to bring out the promise of current-generation machines.
Most of the computing power of the MP09 goes unused with commonly used Photoshop CS4 operations.