Scalability — multi-core performance

Last updated 2009-06-01

This page discusses what it means to “scale” in the context of multiple CPU cores.

In a perfect world, 8 cores would complete a task in exactly half the time that 4 cores requires.

In the real world that almost never happens: every useful task accesses memory and/or a hard drive or the network. There is also the overhead of coordinating multiple “threads” (workers), generally one per CPU core. Yet there are well-written programs that can approach perfect scaling.

For examples of scalability or non-scalability, see:

What full CPU usage looks like

We will use Genuine Fractals 6.0 as the “poster child” for very good scalability from a commercial software application.

Genuine Fractals 6.0 CPU usage on 8-core Mac Pro

On Mac OS X, Activity Monitor can be used to view CPU usage.

This graph shows almost total usage of the CPU cores (800% at 100% per core), but understand that full use doesn’t mean full efficiency.

That’s right: even though the CPU cores show as busy, they could actually be mostly stalled, twiddling their thumbs (so to speak) while they compete for access to the same memory.

Genuine Fractals scalability

Fortunately, Genuine Fractals 6.0 does mostly computing (calculation), with minimal disk access and moderate memory access requirements, so the CPU cores are actually being used with high efficiency. The black inverse spikes indicate times when the CPU cores are idle; monitoring shows that these are times when the disk is being used.

Genuine Fractals 6.0 CPU usage on 8-core Mac Pro

To determine the scalability of Genuine Fractals, I timed the same task with 1, 2, 4 and 8 cores active, which is easy to do with Apple’s developer tools (CHUD) via the menu or the Processor Palette. This is not a perfect test: the program still thinks there are 8 cores, even if some of them are disabled, so it creates 8 “threads” rather than one thread per active core.
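The thread-count caveat above can be illustrated with a minimal sketch of sizing the work to the cores actually available. This is not how Genuine Fractals does it; `active_core_count` and `split_rows` are hypothetical names, and whether a given OS call reflects cores disabled through developer tools depends on the platform.

```python
import os

def active_core_count():
    """Best-effort count of the CPU cores this process may run on."""
    # os.sched_getaffinity is available on Linux only; elsewhere fall back
    # to os.cpu_count(), which reports logical CPUs known to the OS.
    if hasattr(os, "sched_getaffinity"):
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1

def split_rows(total_rows, workers):
    """Divide total_rows into contiguous (start, stop) bands, one per worker."""
    workers = max(1, min(workers, total_rows))
    base, extra = divmod(total_rows, workers)
    bands, start = [], 0
    for i in range(workers):
        # The first `extra` bands get one additional row each.
        stop = start + base + (1 if i < extra else 0)
        bands.append((start, stop))
        start = stop
    return bands

if __name__ == "__main__":
    # Hypothetical job: an image with 10800 rows, one band per active core.
    print(split_rows(10800, active_core_count()))
```

A program that hard-codes 8 bands on a machine with 2 active cores still gets the job done, but the extra threads only add scheduling overhead.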

The job was to scale an image to 40X30" at 360 dpi. Times were recorded on a 2.8GHz 8-core Mac Pro (2008) with 32GB memory on Mac OS X 10.5.6. Scalability is very good, but not perfect; disk I/O causes brief pauses on a regular basis (the black areas in the CPU usage graph above).

With some engineering effort, the Genuine Fractals engineers might be able to overlap the disk I/O with computation so as to eliminate the regular (though short) periods of idle CPU usage; visually, that disk I/O appears to be responsible for a significant part of the less-than-perfect scalability.

Bottom line: with Genuine Fractals 6.0, an 8-core Mac Pro is effectively a 7.4-core machine in actual results. That’s not perfect, but it’s very very good.

| # Cores | Seconds | Speedup | Comments |
|---|---|---|---|
| 1 | 1474 | 1.0X | With a single core, inefficiencies do creep in; disk I/O potentially stops all computing activity until done. Also, any background activity (Mac OS X itself, other programs, etc) takes away from the single active core’s ability to get its job done. |
| 2 | 729 | 2.02X | With two cores, one core might continue computing while the other is idle doing disk I/O, and contention for memory access is still relatively low. |
| 4 | 385 | 3.82X | Best possible is 4.0X. With 4 cores, contention for memory access rises. |
| 8 | 200 | 7.37X | Best possible is 8.0X. With 8 cores, contention for memory access rises further. |
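For anyone who wants to run the same arithmetic on their own timings, here is a minimal sketch. The timings are the ones recorded above; `scaling_report` is a hypothetical helper name. Speedup is the single-core time divided by the N-core time, and efficiency is speedup divided by N.

```python
# Seconds recorded for the Genuine Fractals job, keyed by active core count.
timings = {1: 1474, 2: 729, 4: 385, 8: 200}

def scaling_report(timings):
    """Return (cores, speedup, efficiency) rows relative to the 1-core time."""
    base = timings[1]
    rows = []
    for cores in sorted(timings):
        speedup = base / timings[cores]   # how many times faster than 1 core
        efficiency = speedup / cores      # 1.0 would be perfect scaling
        rows.append((cores, speedup, efficiency))
    return rows

if __name__ == "__main__":
    for cores, speedup, efficiency in scaling_report(timings):
        print(f"{cores} cores: {speedup:.2f}X speedup, {efficiency:.0%} efficiency")
```

The 8-core row works out to roughly 92% efficiency, which is the “7.4-core machine” figure stated another way. Rounded values may differ in the last digit from those shown in the table.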

MemoryTester test-compute-speed

The MemoryTester test-compute-speed command computes a SHA1 hash, using all active CPU cores, with a mix of pure computation and moderate memory access (most of which can be cached by the CPU on-chip cache). MemoryTester is smart enough to recognize how many active CPU cores are present.

Let’s see how it scales, keeping in mind that with a single core, background system activity scarfs up a wee bit of processing power, which is larger as a percentage for a single core than with more than one core.

We can see here perfect scalability, the ideal situation. Almost certainly test-compute-speed would scale very well to 16 cores. Most applications have no hope of scaling this well, either from poor engineering, or simply the nature of the work to be done.

| # Cores | MB/sec | Speedup |
|---|---|---|
| 1 | 201.2 | 1.0X |
| 2 | 403 | 2.00X |
| 4 | 809 | 4.02X |
| 8 | 1615 | 8.03X |
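To get a feel for this kind of test, here is a rough sketch of a SHA1 throughput measurement across a chosen number of workers. It is not MemoryTester’s implementation; the function names, buffer size and round count are all assumptions. It relies on CPython’s `hashlib` releasing the global interpreter lock while hashing large buffers, so plain threads can keep several cores busy.

```python
import hashlib
import time
from concurrent.futures import ThreadPoolExecutor

def hash_chunk(chunk, rounds):
    """Feed the same buffer through SHA1 `rounds` times; return the hex digest."""
    h = hashlib.sha1()
    for _ in range(rounds):
        h.update(chunk)  # large updates run with the GIL released in CPython
    return h.hexdigest()

def throughput_mb_per_sec(workers, chunk_mb=4, rounds=8):
    """Hash chunk_mb * rounds megabytes on each worker; return (MB/sec, digests)."""
    chunk = bytes(chunk_mb * 1024 * 1024)  # zero-filled buffer, shared read-only
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        digests = list(pool.map(lambda _: hash_chunk(chunk, rounds), range(workers)))
    elapsed = time.perf_counter() - start
    total_mb = workers * chunk_mb * rounds
    return total_mb / elapsed, digests

if __name__ == "__main__":
    for workers in (1, 2, 4, 8):
        rate, _ = throughput_mb_per_sec(workers)
        print(f"{workers} workers: {rate:.0f} MB/sec")
```

Each worker does the same amount of work, so perfect scaling shows up as MB/sec doubling each time the worker count doubles, up to the number of active cores. Results on any particular machine will vary with cache size and background activity.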

Limitations on scalability

The limiting factor for scalability is usually software. It requires top-notch engineers to design and build correct and efficient applications; it’s hard stuff.

Specialty programs like Genuine Fractals 6.0 have engineered efficient use of 8 cores. Other programs use multiple CPU cores to some degree, but might not consider the “low hanging fruit”; for example Adobe Photoshop CS4 is single-threaded while opening and saving files, and won’t even allow the user to work on something else while an open or save is in progress; 7 of 8 cores on a Mac Pro sit idle during the process.

Memory bandwidth

When disk I/O is involved, there are techniques to overlap disk I/O with simultaneous computation, so even programs that use the disk heavily can exploit multiple cores efficiently, at least so that the disk alone becomes the limiting factor (CPU speed having little effect).
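One common way to overlap disk I/O with computation is to let a reader thread fetch the next chunk while the current one is being processed. The sketch below is a generic illustration under stated assumptions, not any particular application’s code; `read_chunks`, `process_file` and the `compute` callback are hypothetical names.

```python
import queue
import threading

def read_chunks(path, chunk_size, out_queue):
    """Reader thread: pull chunks from disk and hand them to the consumer."""
    try:
        with open(path, "rb") as f:
            while chunk := f.read(chunk_size):
                out_queue.put(chunk)  # blocks when the consumer falls behind
    finally:
        out_queue.put(None)           # sentinel: no more data (or read failed)

def process_file(path, compute, chunk_size=1 << 20, depth=2):
    """Run compute(chunk) on each chunk while the next one is being read."""
    # A bounded queue keeps the reader at most `depth` chunks ahead,
    # so memory use stays fixed no matter how large the file is.
    chunks = queue.Queue(maxsize=depth)
    reader = threading.Thread(
        target=read_chunks, args=(path, chunk_size, chunks), daemon=True
    )
    reader.start()
    results = []
    while (chunk := chunks.get()) is not None:
        results.append(compute(chunk))
    reader.join()
    return results
```

With this arrangement the computing thread only waits when the disk genuinely cannot keep up, which is the case where the disk alone is the limiting factor.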

For compute-bound applications, the main bottleneck is memory access. This is why we keep seeing faster and faster memory (and larger on-chip caches) in each generation of computer; fast CPUs aren’t so fast when they’re forced to compete for access to memory. The 2008 Mac Pro improved memory speed to 800MHz from 667MHz (20% faster), and a 2009 Mac Pro will almost certainly move to 1066MHz or faster memory.

 


Copyright © 2019 diglloyd Inc, all rights reserved.