Thunderbolt Bug: Drives Disconnect at Random and Intermittent Times
See also: Thunderbolt on OS X: Spontaneous Drive Disconnect and 2013 Apple Mac Pro: Cables and Rotating Chassis.
It is bad enough that poor Apple hardware design can toast a $700 optical Thunderbolt 2 cable.
Back in 2016 in Thunderbolt on OS X: Spontaneous Drive Disconnect, I wrote about a frustrating problem of drives going AWOL at random times. Well, it seems that this issue persists with Thunderbolt 3 as well, a situation I’ve been monitoring and investigating for months. I’ve held off to be certain, but it is now time to discuss it, since I am certain it is a real and disturbing low-level bug (hardware or software I cannot say).
Here is what I know:
- Occurs on 2017 iMac 5K and 2017 iMac Pro. I have no other Thunderbolt 3 Macs to test, and since it is so intermittent, testing would require weeks to make sense of things. But since I saw similar issues on Thunderbolt 2 on a 2013 Mac Pro, I very much doubt that it is machine specific; I think it is a fundamental Thunderbolt bug, or (perhaps) yet one more Apple Core Rot bug.
- I have 'intel' (I can say no more here) that says that this is a real issue, and probably a hardware one, that has nothing to do with my machines or my Thunderbolt 3 enclosures. Moreover, it happened with both Thunderbolt 2 and Thunderbolt 3 devices on different machines.
- Apple and Intel are mum on the issue. It feels like a cover-up to me, perhaps some intractable hardware bug. But maybe not, and maybe it is fixable in software, if Apple can ever get its quality mojo back, when it’s not busy damaging minds of every age with iPhones.
- It is intermittent. I’ve had no trouble for a week, then it might happen 3 times in a day.
- It can happen coming out of sleep, or it can happen spontaneously while working actively at the computer.
- The drives disconnect then immediately reconnect. But the damage is done—this can screw up all sorts of things.
I’m not happy about this at all. I’ve been putting up with it for years, and now it persists with Thunderbolt 3 on all-new hardware. I am not the only one seeing this sort of issue.
Kobi E writes:
Aha, so it wasn't just me. The transition to the 2013 Mac Pro was almost uniformly bad -- don't get me started on just how poorly the Trash Can served my needs. But what pushed me over the brink was having my external SSD and RAID arrays go offline at random times.
I gave up trying to diagnose the issue, and wondered vaguely if the weight of the Thunderbolt cable was pulling it out of the socket enough to cause a glitch. I sold the thing and spent the money on a badass Hackintosh with room to house all my spinners.
I honestly can't say I'm particularly happy with running macOS on unsanctioned hardware, and it's a huge pain to upgrade the software, but my RAID arrays never disappear. I should be grateful for that little mercy, I guess.
MPG: Apple can fix this, but will it happen or not? All great companies decline (have there been any exceptions?!) and suck at some point. Apple started to suck in 2013, notwithstanding the addiction of the iPhone, which is making Apple untold billions in profit.
Joshua H writes:
I have the same problem on Thunderbolt 3 on a Windows laptop for music production. With brand new hardware and brand new cables. Everything updated. The computer manufacturer thinks the Thunderbolt firmware is to blame. Who knows if he’s right!
MPG: “firmware to blame” seems consistent with the issue.
I read the article about Thunderbolt drives intermittently disconnecting. I’d say it is not actually intermittent. I believe it’s a connector reliability problem.
My Thunderbolt 2 external drives do the same thing, but I’m pretty sure it’s because even slight movement of the cable or the computer causes an interruption of one of the electrical signals. It wouldn’t have to be longer than a couple of bit periods.
If I try to remove the adjacent T-bolt 2, it’s almost sure to cause the disconnect. If I bump the MBP (2015), that can do it, too.
OTOH, hands off the computer and cables, and it’s okay…
Now, for the other Thunderbolt 2 problem. If the computer is put to sleep by menu, waking from sleep is problematic. It can take 1-½ minutes to turn on the display. OTOH, when I remove the Thunderbolt drive, the display comes on immediately, along with the finger-wagging Drive not put away...
I should re-phrase the issue - it’s a connection reliability issue. It seems that periodically cleaning the connectors with IPA and treating with DeOxit reduces the sensitivity… for a while. What’s needed is a way to solidly lock the connector to the computer and drive case.
IIRC, TB3 and 4 aren’t higher bit rate; they just have paralleled 10 Gb/s bit streams. But maintaining two or four synchronized signals only makes it worse.
At 10 Gb/s, the bit period is 0.1 nanoseconds. The receiver clock recovery circuit will maintain phase lock for only a short period if the signal is momentarily interrupted, after which the decoded bitstream becomes garbage. So it only takes an interruption of a nanosecond or less before the hardware doesn’t recognize the ‘data’.
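To make the arithmetic above concrete, here is a minimal Python sketch. It only restates the relationship between line rate and bit period. The function names are illustrative, and the rates in the loop are the 1, 10 and 25 Gb/s figures mentioned in this letter, not Thunderbolt lane rates.

```python
def bit_period_ns(rate_gbps: float) -> float:
    """Duration of one bit in nanoseconds at a line rate given in Gb/s.

    1 Gb/s is one bit per nanosecond, so the period is simply 1 / rate.
    """
    return 1.0 / rate_gbps


def bit_periods_spanned(rate_gbps: float, interruption_ns: float) -> float:
    """Number of bit periods covered by an interruption of the given length."""
    return interruption_ns * rate_gbps


for rate in (1.0, 10.0, 25.0):
    print(f"{rate:5.1f} Gb/s: bit period {bit_period_ns(rate):.3f} ns, "
          f"a 1 ns glitch spans {bit_periods_spanned(rate, 1.0):.0f} bit periods")
```

At 10 Gb/s a one-nanosecond glitch already covers ten bit periods, which is why a contact interruption far too brief to notice by hand could still matter to the receiver.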
After I wrote the previous reply, it occurred to me that my former work was applicable. I qualified 1, 10 and 25 Gb/s optical transceivers used in high end routers. Those links, whether optical or electrical, all require re-timers in the link to reduce the accumulated phase jitter so the receiver can maintain phase lock with the incoming bit stream.
FYI, this is a rough outline of what's involved in moving 10 Gb/s from one place to another over (in this case) a simple optical fiber system. All-electrical links have similar constraints.
A transmission link actually comprises five sections of serial data transfer. In each of these, the signal accumulates jitter. If jitter exceeds about 0.3 bit period, bit errors start to occur, so the signal is re-timed before jitter reaches a critical level.
The link portions are:
a) From the generating electrical circuit, propagating on a circuit board to the transceiver electrical input (in some cases, re-timed here).
b) Conversion from an electrical to an optical signal, which becomes the transceiver optical output.
c) Propagation through the transmission medium to the receiver on the next router port. This is where the signal suffers the greatest degradation because its bandwidth is usually sacrificed in favor of more lenient requirements for other parameters. The signal-to-noise ratio degrades due to attenuation in the medium (fiber or cable). (The highest signal frequencies are also attenuated by the circuit board materials and capacitance, but in a predictable way.)
d) At the receiver optical input, it is converted back to an electrical signal, re-timed, and becomes the receiver electrical output.
e) The receiver output propagates on the circuit board to the intended functional circuit.
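The five sections and the roughly 0.3 bit period limit can be sketched as a toy jitter budget in Python. Everything numeric here except the 0.3 limit is an assumption made up for illustration: the per-section jitter values are hypothetical, and jitter is summed linearly, which is a simplification (real budgets treat random and deterministic jitter separately). The names `SECTIONS` and `walk_link` are likewise illustrative.

```python
JITTER_LIMIT_UI = 0.3  # from the text: bit errors start above about 0.3 bit period

# Hypothetical jitter added by each section, in unit intervals (bit periods).
SECTIONS = [
    ("a) board trace to transceiver input", 0.05),
    ("b) electrical-to-optical conversion", 0.05),
    ("c) transmission medium", 0.15),
    ("d) optical-to-electrical conversion", 0.02),
    ("e) board trace to functional circuit", 0.05),
]


def walk_link(sections, retime_after=(), residual_ui=0.0):
    """Return (name, peak jitter in UI, within_limit) for each section.

    Jitter accumulates by simple addition. If a section's index is in
    retime_after, a re-timer at the end of that section resets the
    accumulated jitter to residual_ui before the next section begins.
    """
    total = 0.0
    report = []
    for index, (name, added) in enumerate(sections):
        total += added
        report.append((name, total, total <= JITTER_LIMIT_UI))
        if index in retime_after:
            total = residual_ui
    return report


for label, retimers in (("no re-timing", ()), ("re-timed at d)", (3,))):
    print(label)
    for name, jitter, ok in walk_link(SECTIONS, retime_after=retimers):
        print(f"  {name:40s} {jitter:.2f} UI  {'ok' if ok else 'OVER LIMIT'}")
```

With these made-up numbers, the link without re-timing goes over the limit only in the last section, while re-timing at the receiver in d) keeps every section inside it. The point is the shape of the budget, not the specific values.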
MPG: these problems are not confined to Thunderbolt 2; they plague Thunderbolt 3 and Thunderbolt 4 also.