Pathological Network Performance in Apple OS X

2015-01-04

This is a technical piece safely skipped by most readers.

Apple throws resources at eye candy frippery in the OS, while leaving critical areas in serious “AWOL reliability” territory. More Apple Core Rot.

Your author spent about 14 hours tracking down an OS X performance bug while testing very high server loads: 48 client threads, running on two machines with 12 cores in total, hitting a highly optimized Tomcat web server over a local gigabit LAN. The test scenario involved 5000 to 15,000 client hits against the server per second, reaching up to 87MB/sec in delivering ~2K to ~40K HTML files to the client machines.

In a nutshell, with the default networking buffer size (ncl=131072 seems to be the default, which equals 256MB of memory), the OS X networking stack enters a pathological state that essentially shuts down all networking capability for ~30 seconds at a time (“AWOL ~30 seconds”). The bug was reproduced with the server running on an 8-core 3.3 GHz Mac Pro, a 2-core MacBook Pro, a 4-core MacBook Pro and a 4-core MacBook Pro Retina (16GB for the laptops, 64GB for the Mac Pro; total memory is not relevant, since there was ample to spare). It was observed on OS X 10.10.1 and 10.8.5, so it is not a new bug.

When the system locks up its networking stack, netstat shows something like this (100% in use was also seen).

diglloydMP:MPG lloyd$ netstat -m

24615/24615 mbufs in use:
24565 mbufs allocated to data
50 mbufs allocated to socket names and addresses
712/712 mbuf 2KB clusters in use
19884/19884 mbuf 4KB clusters in use
2730/2730 mbuf 16KB clusters in use
131754 KB allocated to network (99.8% in use)
0 KB returned to the system
0 requests for memory denied
1038 requests for memory delayed
226 calls to drain routines
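
As a rough aid for spotting this condition, the netstat -m output above can be checked mechanically. The Python sketch below is illustrative only: the names parse_netstat_m and looks_exhausted, the 99% threshold, and the rule "pool nearly full plus delayed memory requests" are assumptions made for this example, not part of the original testing. The regular expressions are written against the output format shown above and may need adjusting on other OS X versions.

```python
import re

# Abbreviated sample copied from the netstat -m output shown above.
SAMPLE = """\
24615/24615 mbufs in use:
131754 KB allocated to network (99.8% in use)
0 requests for memory denied
1038 requests for memory delayed
226 calls to drain routines
"""


def parse_netstat_m(text):
    """Extract a few counters from `netstat -m` text; missing lines are skipped."""
    stats = {}
    m = re.search(r"(\d+)/(\d+) mbufs in use", text)
    if m:
        stats["mbufs_in_use"] = int(m.group(1))
        stats["mbufs_total"] = int(m.group(2))
    m = re.search(r"(\d+) KB allocated to network \(([\d.]+)% in use\)", text)
    if m:
        stats["network_kb"] = int(m.group(1))
        stats["percent_in_use"] = float(m.group(2))
    for key, label in (
        ("denied", "requests for memory denied"),
        ("delayed", "requests for memory delayed"),
        ("drains", "calls to drain routines"),
    ):
        m = re.search(r"(\d+) " + re.escape(label), text)
        if m:
            stats[key] = int(m.group(1))
    return stats


def looks_exhausted(stats, threshold=99.0):
    """Assumed heuristic: pool nearly full AND memory requests were delayed."""
    return (
        stats.get("percent_in_use", 0.0) >= threshold
        and stats.get("delayed", 0) > 0
    )


if __name__ == "__main__":
    print(looks_exhausted(parse_netstat_m(SAMPLE)))
```

On a Mac, the input text could be captured with subprocess.run(["netstat", "-m"], capture_output=True, text=True).stdout. By this heuristic, the 99.6% sample shown further below, which has zero delayed requests, would not be flagged; that agrees with the stack staying responsive in that run.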

After ruling out many things and tearing out much hair, your author concluded that the problem was in the OS itself. Much experimentation found that increasing the networking buffer memory to 512MB (ncl=262144) resolved the issue, at least with 48 client threads over local gigabit LAN hitting the server from a total of 12 cores on 2 clients.

Doubling the memory for the networking buffers almost entirely (but not quite) solves the problem:

sudo nvram boot-args="ncl=262144" (reboot required)

Note that ncl is a maximum and that the system dynamically allocates memory as needed up to that maximum, so that netstat -mm will show much smaller memory usage until a load is applied. Attempting to use 384K buffers hosed the networking stack. ncl=262144 might be the hard limit.
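
The relationship between ncl and the maximum buffer memory is simple arithmetic, assuming 2 KB per cluster (the figure this article uses). A minimal Python sketch follows; ncl_to_mb is a hypothetical helper name, and the "384K buffers" case is translated to ncl=393216 purely by multiplying 384 × 1024, which is an inference rather than a value quoted from the test notes.

```python
CLUSTER_BYTES = 2048  # assumption: 2 KB per cluster, per the article's arithmetic


def ncl_to_mb(ncl, cluster_bytes=CLUSTER_BYTES):
    """Return the maximum buffer memory in MB implied by an ncl value."""
    return ncl * cluster_bytes / (1024 * 1024)


if __name__ == "__main__":
    cases = (
        ("apparent default", 131072),
        ("doubled", 262144),
        ("384K (failed)", 384 * 1024),
    )
    for label, ncl in cases:
        print(f"{label}: ncl={ncl} -> {ncl_to_mb(ncl):.0f} MB")
```

This reproduces the 256MB and 512MB figures quoted in the text and puts the failed 384K attempt at 768MB.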

With the larger buffers in place, the system was able to handle the test load, but attempting to use more buffer space kills the networking stack entirely. In short, OS X can barely handle gigabit Ethernet speeds with a high volume of relatively small requests (4K to 40K typical). For serious use, it is a toy OS. This explains some head-scratchers MPG has seen in the past: a fundamentally broken OS X networking stack that goes AWOL for ~30 seconds at a time if the load is too high.

With ncl=262144 (256K buffers × 2K per buffer = 512MB memory) and 48 client threads over local gigabit LAN, 99.6% utilization was seen, with no AWOL networking stack. The figures shown below are not the highest utilization observed, but are close.

netstat -mm
class        buf   active   ctotal    total cache   cached uncached    memory
name        size     bufs     bufs     bufs state     bufs     bufs     usage
---------- ----- -------- -------- -------- ----- -------- -------- ---------
mbuf         256    83190    14688    86000    on      345     2465    3.6 MB
cl          2048    19213      609    19822 purge        0      609    1.2 MB
bigcl       4096    52099        0    52099 purge        0        0       0
16kcl      16384    10922        0    10922    on        0        0       0
mbuf_cl     2304    19213    19213    19213 purge        0        0   42.2 MB
mbuf_bigcl  4352    52099    52099    52099 purge        0        0  216.2 MB
mbuf_16kcl 16640    10922    10922    10922    on        0        0  173.3 MB
17654/83190 mbufs in use:
17307 mbufs allocated to data
347 mbufs allocated to packet headers
65536 mbufs allocated to caches
19213/19822 mbuf 2KB clusters in use
52099/52099 mbuf 4KB clusters in use
10922/10922 mbuf 16KB clusters in use
447022 KB allocated to network (99.6% in use)
0 KB returned to the system
0 requests for memory denied
0 requests for memory delayed
4 calls to drain routines
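
As a sanity check on the netstat -mm table above, the memory usage column appears to equal buf size × ctotal bufs for each class, and the per-class figures add up to the "447022 KB allocated to network" line. This is an observation drawn from the numbers shown here, not documented netstat behavior. A short Python sketch, with the rows copied by hand from the table (the helper names are hypothetical):

```python
# (class name, buf size in bytes, ctotal bufs), copied from the table above.
ROWS = [
    ("mbuf", 256, 14688),
    ("cl", 2048, 609),
    ("bigcl", 4096, 0),
    ("16kcl", 16384, 0),
    ("mbuf_cl", 2304, 19213),
    ("mbuf_bigcl", 4352, 52099),
    ("mbuf_16kcl", 16640, 10922),
]


def class_bytes(rows):
    """Bytes per class, assuming memory usage = buf size * ctotal bufs."""
    return {name: size * ctotal for name, size, ctotal in rows}


def total_kb(rows):
    """Total across all classes, in whole KB (1 KB = 1024 bytes)."""
    return sum(class_bytes(rows).values()) // 1024


if __name__ == "__main__":
    for name, nbytes in class_bytes(ROWS).items():
        print(f"{name:<11}{nbytes / 2**20:7.1f} MB")
    print(total_kb(ROWS), "KB total")  # -> 447022 KB total
```

The per-class results round to the 3.6, 1.2, 42.2, 216.2 and 173.3 MB shown in the table.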

Z C writes:

About your article, and what things are shaping into, it seems lessons have not been learned.

I worked for over 25 years in mainframe datacenters. When IBM introduced z/OS, replacing MVS and consolidating the move to 64-bit architecture, they too came out with very frequent upgrades to their OS. Many BIG PROBLEMS emerged in enterprises and companies that invest millions of $$$ in IT, and we, the tech systems guys, were struggling with stupid bugs and serious performance/workload issues. The icing on the cake came when we upgraded our CPU and it came with microcode so advanced that it did not support our current OS version (about 2 releases, 1.5 years behind the then latest version)… What was to be a simple 16-hour weekend intervention turned into a nonstop 72-hour party, with weeks of aftermath…
This was the beginning of the end for me in the IT business...

It seems that the need to push people to buy new hardware is driving this? I understand that maintaining legacy products is expensive, and that change is good. But abandoning CD/DVD drives is one thing; forcing changes in the OS so as to sell new hardware is another. I think everybody who is or has been in the IT world is now worried about what Apple will do next with OS X.

MPG: history repeats itself in core issues. OS X Yosemite is not exactly “Vista”, but maybe we’re headed that way.