The breakneck rush to 'advance' technology is never a good idea. Anything can be over-extended, and even apparently good things can have surprising side effects.

Now, after 50+ years of pushing to make integrated circuits smaller and faster, some inherent problems are cropping up. Most people have probably forgotten the 'Spectre' panic, where flawed CPU designs opened the door to exploits that could break security measures. This has not (yet) been exploited in the real world, but it's still a potential risk, since not all of the vulnerabilities can be completely fixed.

But now it looks like the sheer smallness of circuitry in newer CPUs has reached a reliability limit. A small percentage of CPUs that seem 'good' when tested in the factory turn out to produce strange errors in real operation over time. Occasional 'jumps' of electrons between microscopic transistors can cause things to malfunction. For now this is a problem only 'at scale', meaning it shows up across very large populations of CPUs. But that means the users of huge numbers of CPUs, the "cloud providers", will see more and more transient errors in their systems. And what have we been doing for decades now? Moving more and more of our lives, not only economic data but important things like medical data, into these mass-computing environments. This looks like another crisis in the making...



Computer chips have advanced to the point that they're no longer reliable: they've become "mercurial," as Google puts it, and may not perform their calculations in a predictable manner.

Not that they were ever completely reliable. CPU errors have been around as long as CPUs themselves. They arise not only from design oversights but also from environmental conditions and from physical system failures that produce faults.

But these errors have tended to be rare enough that, as long as systems appear to be operating as expected, only the most sensitive calculations are subjected to extensive verification. Mostly, computer chips are treated as trustworthy.

Lately, however, two of the world's larger CPU stressors, Google and Facebook, have been detecting CPU misbehavior more frequently, enough that they're now urging technology companies to work together to better understand how to spot these errors and remediate them.


