Mac Pro Nehalem Scalability
This page investigates how well applications scale.
We’ll start with Adobe Photoshop CS4, and perhaps add others later.
CPU core enabling/disabling

Apple’s Processor Palette was used to selectively disable CPU cores using its window. As displayed, CPUs 1/2/3/4 are on physical CPU chip #1, and CPUs 5/6/7/8 are on physical CPU chip #2. Each CPU chip contains 8MB of on-chip L3 cache memory shared by the 4 cores on that chip, which is critical to performance.
For best performance, cores were enabled on different CPU chips first, e.g. CPU 1, 5, 2, 6, 3, 7, 4, 8, in that order. Hyperthreading was left enabled in all cases.
Other combinations are possible (with or without hyperthreads, and with the CPUs enabled in a different order), but these were not explored.
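The enabling order described above (alternate between the two chips so that each newly enabled core gets as much L3 cache to itself as possible) can be sketched in a few lines of Python. This is only an illustration of the ordering, not a tool that was used for the tests; the function names are made up for this sketch, and the CPU numbering simply follows the Processor Palette layout described above.

```python
def interleaved_enable_order(chips=2, cores_per_chip=4):
    """Return 1-based CPU numbers, alternating between physical chips.

    Assumes CPUs are numbered chip by chip (1-4 on chip #1, 5-8 on chip #2),
    as described in the text. Function name is hypothetical.
    """
    order = []
    for slot in range(cores_per_chip):      # first core of each chip, then second, ...
        for chip in range(chips):           # visit every chip before moving on
            order.append(chip * cores_per_chip + slot + 1)
    return order


def enabled_cores(count, chips=2, cores_per_chip=4):
    """Return which CPUs are switched on when `count` cores are enabled."""
    return interleaved_enable_order(chips, cores_per_chip)[:count]


print(interleaved_enable_order())  # [1, 5, 2, 6, 3, 7, 4, 8]
print(enabled_cores(3))            # [1, 5, 2]
```

With 3 cores enabled, for example, two of them share the L3 cache on chip #1 while the third has the cache on chip #2 to itself.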
Photoshop scalability using diglloydSpeed1

Please see the test results for diglloydSpeed1 comparing different Macs.
Here we investigate how the number of CPU cores affects execution time of the diglloydSpeed1 benchmark. Timed with a stopwatch, multiple trials were consistent to within 0.1 second for all CPU counts except 8, where they were consistent to within 0.3 seconds; 3 trials were run for the 8-core case. Results were averaged in all cases.
The times shown below are seconds to execute the test. There is a nice speedup going from a single CPU core to two, a slowdown with three, then incremental improvements up to 5 cores, beyond which only trivial improvements are seen. The improvements beyond 2 cores are small, indicating poor scalability. The probable cause is contention for shared synchronization locks, since memory bandwidth is not an issue with 3 cores (a triple-channel memory configuration was used). Certain Photoshop filters may improve upon this showing, but the diglloydSpeed1 benchmark uses the most common operations.
Ideally, 8 cores would run in slightly more than 1/8 the time of a single core. But even if each core were only good for half of its potential speed, we could see execution time in the 10-12 second range: equivalent to running a ~12GHz Mac Pro!
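The arithmetic behind that estimate can be sketched as follows. Note the assumptions: the single-core time of 44 seconds is a placeholder chosen only so that the result lands in the 10-12 second range quoted above (it is not a measured result), and the 2.93GHz base clock is likewise assumed for illustration. The function names are hypothetical.

```python
def projected_time(single_core_seconds, cores, efficiency=1.0):
    """Run time if each core contributes `efficiency` of its potential speed."""
    if cores < 1 or not 0 < efficiency <= 1:
        raise ValueError("cores must be >= 1 and efficiency in (0, 1]")
    return single_core_seconds / (cores * efficiency)


def equivalent_clock_ghz(base_clock_ghz, cores, efficiency):
    """Clock speed a single core would need to match that throughput."""
    return base_clock_ghz * cores * efficiency


T1 = 44.0          # PLACEHOLDER single-core time in seconds, not a measurement
BASE_CLOCK = 2.93  # ASSUMED base clock in GHz, for illustration only

print(projected_time(T1, cores=8))                  # perfect scaling: 5.5
print(projected_time(T1, cores=8, efficiency=0.5))  # half efficiency: 11.0
print(round(equivalent_clock_ghz(BASE_CLOCK, 8, 0.5), 2))
```

At half efficiency, 8 cores behave like 4 fully used cores, which is where the "~12GHz" comparison comes from under the assumed clock speed.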

Checking the thread count in Activity Monitor during the test run makes it clear that Photoshop creates 3 threads for each virtual CPU core, e.g. 48 threads on an 8-core (16 virtual core) machine. Some baseline number of threads is used as well, apparently 5, giving 5 + 16*3 = 53:
1 core: 11 threads
2 cores: 17 threads
4 cores: 29 threads
8 cores: 53 threads
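As a quick check, the observed counts can be compared against the simple model of 5 baseline threads plus 3 threads per virtual core, with 2 virtual cores per physical core because hyperthreading is on. The baseline of 5 is inferred from the counts above rather than documented by Adobe, and the names below are made up for this sketch.

```python
BASELINE_THREADS = 5           # inferred from the observations, not documented
THREADS_PER_VIRTUAL_CORE = 3   # as observed in Activity Monitor
VIRTUAL_PER_PHYSICAL = 2       # hyperthreading enabled in all cases


def expected_threads(physical_cores):
    """Thread count predicted by the baseline + 3-per-virtual-core model."""
    virtual_cores = physical_cores * VIRTUAL_PER_PHYSICAL
    return BASELINE_THREADS + THREADS_PER_VIRTUAL_CORE * virtual_cores


# Thread counts listed above, keyed by number of enabled physical cores.
observed = {1: 11, 2: 17, 4: 29, 8: 53}

for cores, threads in observed.items():
    predicted = expected_threads(cores)
    status = "matches" if predicted == threads else "differs"
    print(f"{cores} cores: observed {threads}, predicted {predicted} ({status})")
```

All four observations fit the model exactly.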
Lots of threads running, not much work getting done per thread. Adobe has some work to do here to bring out the promise of current-generation machines.
Conclusions
Most of the computing power of the MP09 goes unused with commonly used Photoshop CS4 operations.