Mac Pro Nehalem Tests: Memory and Compute Speed
This page covers memory speed and raw compute power.
Compute speed
Raw compute speed was measured with 16 threads using MemoryTester 1.2 beta 2 (available as part of DAP) and this command:
dlt compute --num-threads 16 --buffer-size 32M
All threads compute the SHA1 hash, which is compute-intensive with modest memory access requirements (triple vs dual channel speeds are the same). For both MP09 and MP08 a total of 16 threads were used; this actually improved the MP08 results slightly over 8 threads, even though it doesn’t have 16 cores.
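For readers who want to try the idea without the tool, here is a minimal sketch of a threaded SHA1 throughput test. It is not how dlt is implemented; the function names, default thread count, buffer size and round count are all assumptions chosen for illustration. It relies on CPython's hashlib releasing the interpreter lock while hashing large buffers, so the threads can actually occupy multiple cores.

```python
import hashlib
import threading
import time

MB = 1024 * 1024  # same megabyte definition the article uses


def sha1_worker(buffer, rounds, start_event, digests, index):
    """Hash `buffer` `rounds` times once the shared start signal fires."""
    start_event.wait()
    h = hashlib.sha1()
    for _ in range(rounds):
        h.update(buffer)
    digests[index] = h.hexdigest()


def run_compute_test(num_threads=4, buffer_size=MB, rounds=8):
    """Return (aggregate MB/sec hashed, list of per-thread digests)."""
    buffer = bytes(buffer_size)  # zero-filled; the content is irrelevant here
    start_event = threading.Event()
    digests = [None] * num_threads
    threads = [
        threading.Thread(
            target=sha1_worker,
            args=(buffer, rounds, start_event, digests, i),
        )
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    began = time.perf_counter()
    start_event.set()  # release all threads together
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - began
    total_mb = num_threads * rounds * buffer_size / MB
    return total_mb / elapsed, digests


if __name__ == "__main__":
    rate, _ = run_compute_test(num_threads=16, buffer_size=32 * MB, rounds=4)
    print(f"aggregate SHA1 throughput: {rate:.0f} MB/sec")
```

Running it with more threads than cores, as was done for the MP08, is a quick way to see whether oversubscription helps or hurts on a given machine.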
These results suggest that the 16 virtual cores of the MP09 have about 58% more raw computing power than the 8 cores of the 2.8GHz MP08, or 38% more than a 3.2GHz MP08. That is not particularly impressive on the face of it, but very welcome for the rare program that can use all cores!
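The 38% figure is consistent with scaling the measured 58% advantage by the clock ratio of the two older models. The text does not say it was derived this way, so treat the linear-scaling-with-clock assumption and the helper name below as illustrative only.

```python
def rescale_advantage(ratio_vs_base, base_ghz, target_ghz):
    """Convert a speed ratio measured against a machine clocked at `base_ghz`
    into a ratio against the same design at `target_ghz`, assuming compute
    speed scales linearly with clock (an assumption, not a measurement)."""
    return ratio_vs_base * base_ghz / target_ghz


# 58% faster than the 2.8GHz machine means a ratio of 1.58.
ratio = rescale_advantage(1.58, base_ghz=2.8, target_ghz=3.2)
print(f"{(ratio - 1) * 100:.0f}% faster than the 3.2GHz model")
```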
Single vs dual CPU: This is a compute-bound test and it shows, as expected, that the dual-CPU machine is twice as fast as the single-CPU machine when all cores are used. What’s interesting is the relatively poor showing of the single-CPU MP09 relative to the previous-generation Mac Pro; this is probably the difference between its 8 virtual cores vs 8 real cores in the dual-CPU MP08.
Memory bandwidth
A garden hose delivers a trickle compared to a fire hose. It’s the same idea for memory: you want a “fire hose” delivering data to CPUs that need it.
Memory bandwidth is measured in megabytes per second (MB/sec), where a megabyte is 1024 × 1024 bytes. With Mac Pro models prior to the March 2009 Nehalem, memory bandwidth has been a serious bottleneck for some programs, causing the CPU cores to “stall” waiting for memory, making 8 cores no faster than 6. That appears to no longer be the case with the Mac Pro Nehalem.
Do not confuse memory bandwidth with real-world application performance.
The difference between dual channel (8 modules) and triple channel (6 modules) generally amounts to no difference at all. Vastly more important is having enough memory to eliminate the need for disk access, and to allow caching.
Fast enough now
While memory bandwidth is important, most programs are relatively insensitive to memory bandwidth because of the principle of locality of reference, which allows the CPU to keep frequently used data in the on-chip caches, mitigating any actual memory speed difference. It’s even less important because most programs use only a fraction of the available 16 virtual cores in the MP09.
Memory bandwidth is influenced by the number of modules, where they are installed, and whether they are single- or double-sided. More on this below. The system automatically notifies the user whether the installation location is optimal or not, each time the memory configuration is changed.
Virtual cores and memory
With 16 virtual cores in the new 2009 Nehalem Mac Pro, memory bandwidth is of significant importance to applications programmed to use the massive parallelism they offer. However, most programs will see little or no effect from memory bandwidth, because they use only a handful of the available cores.
For users running specialized software that actually uses 16 cores, the new Mac Pro is a clear win, regardless of memory bandwidth.
Some programs manipulate large amounts of memory. Examples include large images in Photoshop. Even so, measured differences are nil in most cases, and no more than 1% in the diglloydSpeed1 case.
Memory test results
The memory copy speed results shown below were obtained via MemoryTester 1.2 using the command dlt stress, a machine and memory stress test that emits statistics on both an aggregate and a per-thread basis.
Single vs dual CPU: The single-CPU MP09 cannot compete on memory bandwidth; it has only one bank of memory for its single CPU. But with 1/2 the cores of the dual-CPU MP09, it has bandwidth per core on par with or better than the dual-CPU machine.
The results shown above represent the real-world ability to copy memory using memcpy(), a system function. They represent actual achievable rates with a real program subject to the usual context-switching and scheduling and overhead of the operating system (Mac OS X). The threads are synchronized via an operating system semaphore (event) and are started together, then run forever, with the numbers quickly converging on the sustainable memory throughput. In short, the numbers reflect reality with all its warts, not theoretical memory bandwidth measured on a test bench.
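The structure described above (threads parked on a shared event, released together, each copying memory in a loop while throughput is tallied) can be sketched as follows. This is an illustration under stated assumptions, not the dlt source: it uses ctypes.memmove as a stand-in for memcpy(), runs for a fixed duration rather than forever, and the function names and defaults are invented for the example. The ctypes call should release the interpreter lock during the copy, but verify that on your own Python build before trusting the absolute numbers.

```python
import ctypes
import threading
import time

MB = 1024 * 1024


def copy_worker(size, duration, start_event, copied, index):
    """Copy a private `size`-byte buffer repeatedly for about `duration` seconds."""
    src = ctypes.create_string_buffer(size)
    dst = ctypes.create_string_buffer(size)
    start_event.wait()  # all threads begin copying together
    deadline = time.perf_counter() + duration
    total = 0
    while True:
        ctypes.memmove(dst, src, size)  # stand-in for memcpy()
        total += size
        if time.perf_counter() >= deadline:
            break
    copied[index] = total


def run_copy_test(num_threads=4, size=8 * MB, duration=0.5):
    """Return (aggregate MB/sec, list of per-thread MB/sec)."""
    start_event = threading.Event()
    copied = [0] * num_threads
    threads = [
        threading.Thread(
            target=copy_worker,
            args=(size, duration, start_event, copied, i),
        )
        for i in range(num_threads)
    ]
    for t in threads:
        t.start()
    began = time.perf_counter()
    start_event.set()
    for t in threads:
        t.join()
    elapsed = time.perf_counter() - began
    per_thread = [c / MB / elapsed for c in copied]
    return sum(per_thread), per_thread


if __name__ == "__main__":
    total, per_thread = run_copy_test(num_threads=16, size=32 * MB, duration=2.0)
    print(f"aggregate: {total:.0f} MB/sec")
    for i, rate in enumerate(per_thread):
        print(f"  thread {i}: {rate:.0f} MB/sec")
```

Because the timing includes scheduling and thread start-up, the figures it reports are of the "warts and all" kind described above rather than a theoretical peak.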
- Memory copy speed is 52% faster with six modules instead of eight (triple vs dual channel);
- Double-sided modules are about 10% faster than single-sided modules;
- Six modules is 68% faster than 3 modules (this might not matter if only a few cores are in use, but consider the 4-core Mac Pro with only 4 memory slots);
- Even the slowest memory configuration is faster than the previous-generation 2008 Mac Pro!
How much memory
Whenever it’s a case of disk access versus memory, more memory is always as good or better (see Optimizing Photoshop). This applies not just to Photoshop, but to the ability of the system to use unused memory for caching, which speeds up all programs.
Bottom line: forget about 8 modules vs 6 and get what you need, e.g. 16GB vs 12GB.
Memory configurations (details for geeks)
This section really doesn't need to be read: just buy whatever amount of memory your own observations tell you is needed.
Installing 8 modules instead of 6 modules drops the memory bandwidth from triple-channel to dual-channel speed, and this shows up in memory testing quite clearly, as graphed above.
Optimal configuration for the MP09 is always six modules. But except (possibly) for extremely specialized applications which thrash all 16 virtual CPU cores, the difference is unlikely to ever matter by more than a few percent, at most. More likely is that such programs will suffer by not having an adequate amount of memory, leading to more disk activity, which might be 100X slower.
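A back-of-the-envelope model makes the trade-off concrete. The only number taken from the text is the 100X disk penalty; the 3% memory penalty and the 1% paged fraction are made-up inputs for illustration, and the function is a deliberately crude linear model rather than a measurement.

```python
def relative_runtime(memory_penalty, paged_fraction, disk_slowdown=100.0):
    """Runtime relative to an ideal run entirely in the fastest memory.

    memory_penalty: fractional slowdown from a slower memory configuration
                    (0.03 means 3% slower).
    paged_fraction: share of the work that ends up going to disk instead.
    disk_slowdown:  how much slower that share runs (100X per the text).
    """
    in_memory = (1.0 - paged_fraction) * (1.0 + memory_penalty)
    on_disk = paged_fraction * disk_slowdown
    return in_memory + on_disk


# Slower memory layout, but everything fits in RAM (hypothetical 3% penalty).
print(f"{relative_runtime(0.03, 0.0):.2f}x")

# Fastest memory layout, but 1% of the work spills to disk (hypothetical).
print(f"{relative_runtime(0.0, 0.01):.2f}x")
```

Under these assumed inputs the first case costs 3% while the second nearly doubles the runtime, which is the sense in which capacity matters far more than channel configuration.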
Memory module density
Different density memory modules do seem to influence bandwidth. Notice the especially strong showing of the eight OWC 4GB modules, with bandwidth about 18% faster than eight 2GB modules. This is consistent with the fact that six 2GB modules are faster than six 1GB modules, and might be influenced by the number of memory chips on the module (OWC 2GB modules have twice the chips of Apple 1GB modules, and OWC 4GB modules have twice the chips of 2GB modules). Higher density chips may influence this situation in the future, and it’s unclear whether Apple’s sky-high prices for 4GB modules result from chips with twice the density per chip.
Speed claims made with the optimal modules
Apple’s marketing claims for improved memory performance note in the fine-print footnote that “systems were configured with 6GB of RAM for Mac Pro 8-core 2.93GHz and 8GB of RAM for Mac Pro 8-core 3.2GHz.” Ummm... since when is 6GB the same as 8GB? Still, it appears to matter little for real apps.
Memory speed conclusions
Memory speed is strongly influenced by the number and type of modules, but this has little effect on real applications.