Understanding the nuances of strong, self-reinforcing, accelerating trends in technology is the recurring theme of this blog. I find the implications of these trends to be the strongest axioms for making predictions about the future of our civilisation.

Contents:

  1. The Speed Up of History

  2. The Shape of Discovery

  3. The Death of Moore’s Law

  4. The Future of Accelerated Computing

  5. Other Waymarks for Technological Innovation


I. The Speed Up of History

Human progress has been moving faster and faster as scientific knowledge compounds; the “fast history” discussed by Clayton Christensen in 1997 is fast becoming an understatement. This principle is the “Law of Accelerating Returns”. [1]

According to Ray Kurzweil, the progress of the entire 20th century would have happened in only 20 years at the rate of advancement of the new millennium. In other words, the rate of progress at the end of the 20th century was five times the century’s average. This rate has only continued to accelerate: it took just 14 years (2000–2014) to achieve another 20th-century’s-worth of growth, and only seven more (2014–2021) for the next. Put plainly, history speeds up.

This progress has no theoretical limit, or at least none worth worrying about yet; at this rate, within a couple of decades a 20th-century’s-worth of growth will happen multiple times in the same year, and later still, multiple times per month. If this holds, the 21st century will achieve orders of magnitude more progress than the 20th.
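To make the arithmetic concrete, here is a minimal sketch of this model in Python. The rate-doubling period of a decade is an assumption in the spirit of Kurzweil’s estimates, not a figure taken from him; one “unit” is a 20th-century’s-worth of progress, which by construction takes 20 years at the year-2000 rate.

```python
import math

# Toy model: the rate of progress r(t) = r0 * exp(k*t) doubles every T years.
# One "unit" of progress is a full 20th-century's-worth, which takes
# 20 years at the year-2000 rate r0.

T = 10.0                 # assumed rate-doubling period, in years
k = math.log(2) / T      # continuous growth constant
r0 = 1 / 20              # year-2000 rate: one unit per 20 years

def years_to_next_unit(t):
    """Solve: integral of r0*exp(k*s) from t to t+d equals 1 unit; return d."""
    return math.log(1 + k / (r0 * math.exp(k * t))) / k

t = 0.0
for unit in range(1, 6):
    d = years_to_next_unit(t)
    print(f"unit {unit}: {d:4.1f} years (years {t:5.1f} to {t + d:5.1f})")
    t += d
```

Under these assumptions, each successive “century” of progress arrives in a markedly shorter interval than the last — the same compression described above.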

But this doesn’t seem obvious, and one naturally hesitates at the concept. Tim Urban attributes this hesitation to three factors.

  1. When it comes to history, we think in straight lines.

  2. The trajectory of very recent history often tells a distorted story.

  3. Our own experience makes us unwilling to accept drastic change, so we become ‘stubborn old men’ about the future.

When we try to imagine the progress of the next 30 years, we look back at the previous 30 years as an indicator of what is likely to be achieved. Wiser predictions would forecast the next 30 years not from the previous 30, but from the current rate of progress. They would be more accurate, yet still way off (see Figure 1). Thinking linearly is intuitive; thinking exponentially serves us better.

Graphs
Figure 1. Two important observations of the same exponential curve, visualised by Urban. [2]
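The gap between the three forecasting strategies can be made concrete with a toy calculation. Assume cumulative progress follows 2^(t/10) in arbitrary units (the ten-year doubling period is purely illustrative) and compare forecasts for the next 30 years:

```python
import math

T = 10.0  # assumed doubling period of cumulative progress, in years

def progress(t):
    """Cumulative progress in arbitrary units; t = 0 is now."""
    return 2 ** (t / T)

look_back = progress(0) - progress(-30)              # repeat the last 30 years
current_rate = (math.log(2) / T) * progress(0) * 30  # extend today's slope
exponential = progress(30) - progress(0)             # what the curve delivers

print(f"look-back forecast:    {look_back:5.2f} units")
print(f"current-rate forecast: {current_rate:5.2f} units")
print(f"exponential outcome:   {exponential:5.2f} units")
```

The look-back forecaster expects under one unit, the current-rate forecaster just over two, and the curve itself delivers seven: accurate-but-still-way-off, in exactly the ordering described above.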

II. The Shape of Discovery

In the real world, exponential growth isn’t perfectly smooth and uniform; it is driven by an unpredictable force: discovery. If we instead read the y-axis of Figure 1 as “The Body of Human Knowledge”, then progress is driven by a growing stream of new scientific discoveries, published in journals.

Scientific discoveries are either fundamental breakthroughs or incremental refinements to existing knowledge. Our exponential curve is therefore made up of separate “S-curves”, each corresponding to the present paradigm of “doing technology”. In the case of modern computing, for example, that paradigm is the integrated circuit.

An S-curve begins when a new technological advancement sweeps the world, each time repeating three phases: initially, slow growth (below the general rate of exponential growth), followed by rapid growth (the explosive phase of the exponential), and finally, a levelling off as the paradigm matures.
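A logistic function is one simple stand-in for such an S-curve (an illustrative choice, not a claim about any particular technology). The three phases fall straight out of its shape:

```python
import math

def s_curve(t, ceiling=1.0, steepness=1.0, midpoint=0.0):
    """Logistic stand-in for one paradigm's growth curve."""
    return ceiling / (1 + math.exp(-steepness * (t - midpoint)))

# Crude text plot: slow start, explosive middle, plateau at maturity.
for t in range(-6, 7):
    level = s_curve(t)
    print(f"t={t:+d} |{'#' * int(40 * level):<40}| {level:.2f}")
```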

There are many other disruptive paradigms to illustrate this.

Powered flight, for example, was first achieved in 1903 with the Wright Flyer: a fundamental breakthrough in “doing technology” that laid the path for an entirely new industry. News quickly spread and inspired a revolution in flying machines. The first 12-second flight of 120 feet (37m) was beaten later that day with a 59-second, 852-foot (260m) personal best. The Wright brothers achieved flight at around 14.4kph; two years later, the Flyer III flew 39km in 39 minutes, averaging 60kph.

The following nine years are considered the “Pioneer Era” of aeronautics. The characteristic aeroplane tail was devised in 1909, a technology that enabled military use: in 1911, Italy used planes to drop bombs on Ottoman forces. In 1923 came the first successful helicopter flight, and in 1927 the Spirit of St. Louis made the first solo nonstop trans-Atlantic flight of 5,800 km, averaging 175kph. Commercial jetliners arrived in 1949, kickstarting commercial air travel, and supersonic airliners proved possible in 1969. Arguably the greatest advancement in flight performance remains the 1966 SR-71 Blackbird, with a top speed of around 3,500kph.

Planes
Figure 2. The timeline of aeronautics.

In the case of the aeronautics industry, the S-curve arguably peaked with the retirements of Concorde and the Blackbird. No substantial improvements have been made to flight effectiveness since, only to efficiency. If Boom Supersonic manage a working, price-competitive airliner in the next few years, there is potential for a new golden age of flight. [3]

III. The Death of Moore’s Law

Another important example of the S-curve is transistor density, a waymark of exponential progress that has set the pace of the technology industry since 1961. Gordon Moore’s 1965 “wild extrapolation” that transistor density would “double every year” (revised in 1975 to every two years) held mostly true until the last decade. [4]

Transistor scaling and miniaturisation, specifically the MOSFET scaling rules known as “Dennard scaling”, have been the driving force behind Moore’s Law since the 70s. Other factors are the exponential increase in die sizes and a decrease in defect densities (meaning semiconductor manufacturers could work with larger areas without sacrificing yield). Finer minimum dimensions have also enabled more effective circuit architectures. [5]

Since 1965, when transistors cost $10 each, computing power has increased by 550 million times. In 1971, the first CPU cost $60 and had 2,300 transistors; in 2022, the Apple M1 Ultra packed 114 billion transistors into consumer desktops selling for a few thousand pounds. Dennard scaling ended in the mid-2000s, which led to a shift of focus from semiconductor scaling towards software and architectural improvements that could achieve an effective rate of increase in line with or greater than Moore’s Law.
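The numbers above imply a doubling period strikingly close to Moore’s revised two years; a quick check, using only the figures quoted in this paragraph:

```python
import math

# Transistor counts quoted above.
t0, n0 = 1971, 2_300             # Intel's first CPU
t1, n1 = 2022, 114_000_000_000   # Apple M1 Ultra

doublings = math.log2(n1 / n0)
period = (t1 - t0) / doublings
print(f"{doublings:.1f} doublings in {t1 - t0} years "
      f"-> one every {period:.2f} years")
```

That works out to roughly 25.6 doublings in 51 years, or one every two years.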

Computations
Figure 3. Falling cost of computation since 1956.

In 2015, the CEO of Intel announced that their cadence was closer to two-and-a-half years than two. As node sizes approach atomic scales, architectural improvements have slowed, with single-core performance increasing only 3% in 2017 and new performance coming from higher clock rates and larger caches.

Despite this, TSMC and Samsung have both claimed to keep pace, with 10nm and 7nm nodes in mass production, 5nm nodes in risk production (used in the M1), and 3nm and 2nm nodes on the horizon for 2024. [6]

IV. The Future of Accelerated Computing

The slowdown of Moore’s Law was anticipated; the integrated circuit was not the first but the fifth paradigm to bring exponential growth to the price-performance of computing (measured in calculations per second per constant dollar, not clock speed or transistor density). As renowned physicist Max Tegmark has argued, the trend of exponential growth could continue for another 200 years (roughly 33 orders of magnitude). [7]
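Those two figures are mutually consistent, as a quick check shows: 33 orders of magnitude spread over 200 years implies a doubling period of just under two years.

```python
import math

years = 200
orders_of_magnitude = 33
doublings = orders_of_magnitude * math.log2(10)  # ~109.6 doublings
print(f"implied doubling period: {years / doublings:.2f} years")
```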

The potential for accelerated computing spans the entire stack, from processor to algorithms to software. This end-to-end perspective makes it clear that there are fewer limits on computing power than the layman may perceive. While Moore’s Law tracks advancements in CPU transistors, “Huang’s Law” observes that GPU performance more than doubles every two years; as nascent deep learning software becomes more widely available, improvements in GPU scaling will keep lifting the performance of modern software stacks. [8]

Jensen Huang, CEO of Nvidia, described this process as universal: “the only way to solve a problem, in retail, telecommunications, automotive, is to re-engineer the stack… from the top to the bottom. Let’s say you have an airplane that has to deliver a package [but] takes 12 hours to deliver it. Instead of making the plane go faster, concentrate on how to deliver the package faster, look at 3D printing at the destination.”

Accelerated computing is not about making the chip go faster; it is about delivering the goal faster. The death of Moore’s Law simply means that developers cannot expect their code to get faster without effort, and as we have seen, even that isn’t a certainty. Deep learning optimises the algorithmic layer of the stack, automatically speeding up high-level code in the programs where it is used.
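A toy illustration of the stack’s leverage (my example, not Nvidia’s): the same dot product on the same chip, first as a naive interpreter loop, then through NumPy’s optimised, BLAS-backed routine. The hardware never changes; only the stack does.

```python
import time
import numpy as np

n = 1_000_000
a, b = np.random.rand(n), np.random.rand(n)

t0 = time.perf_counter()
slow = sum(x * y for x, y in zip(a, b))  # naive loop, one element at a time
t1 = time.perf_counter()
fast = float(a @ b)                      # optimised stack, same hardware
t2 = time.perf_counter()

print(f"naive: {t1 - t0:.3f}s  optimised: {t2 - t1:.5f}s  "
      f"speed-up: ~{(t1 - t0) / (t2 - t1):.0f}x")
```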

Scientists and business leaders have highlighted a few interesting areas worth looking at for the advancement of computing.

  • Switching to III-V semiconductor materials - These compounds pair group III elements (three valence electrons) with group V elements (five), giving them high electron mobility and enabling high-frequency and high-power applications. They are already used to enhance MOSFETs, but are not easily integrated onto silicon. [8]

  • Switching to 3-D computing - “MOSFETs” are metal-oxide-semiconductor field-effect transistors: MOS describes the material, FET the function. A field-effect transistor simply uses the voltage applied to its input terminal (called the gate) to control the current flowing from source to drain.

There are other forms of transistor: JFETs are junction field-effect transistors; v-TFETs are vertical tunnel field-effect transistors. They use voltage differently and can offer improvements over traditional methods of manufacture.

If Moore’s Law was achieved by shrinking transistor sizes on a flat integrated circuit, Ray Kurzweil believes there is room for growth in the third dimension. 3D nanosystem chips offer both data storage and microprocessing by stacking layers. The design below replaces silicon with carbon nanotubes (sheets of 2D graphene rolled into nanocylinders), with cells of Resistive Random-Access Memory (RRAM) embedded within the structure. These microchips are called carbon-nanotube field-effect transistors (CNFETs), and they scale beyond the limit of silicon MOSFETs. [9]

CNFETs
Figure 4. CNFETs. If this looks complex, that’s precisely the point.

  • Switching to analogue computing - A different paradigm to conventional “digital” computation is offered by analogue. Where digital logic holds bits as discrete values (1 or 0), analogue circuits can encode a continuous range of values in the current flowing across a gate, and store them when switched off. Mythic has developed a microchip that computes at 25 million MIPS on just 3 watts, and plans to outdo this to offer market-level performance at lower cost.

  • Switching to quantum computing - Also a different paradigm to conventional digital computation. Quantum machines control the uncertainty of “qubits”, which can hold a superposition of 1 and 0 rather than one definite value (see the sketch after this list). Whether these will eventually be cheap enough to replace personal computers, only time will tell.

  • Deep learning efficiencies - New software systems can make applications run faster independent of the chip, a dynamic Nvidia generally refers to as “Huang’s Law”. This drove much of the Google Brain team’s speed-up of TensorFlow. [8]
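As flagged in the quantum computing item above, here is a minimal simulation of what superposition adds (an illustration of the mathematics, not of any vendor’s hardware): a qubit’s state is a pair of complex amplitudes, and measurement collapses it to 0 or 1 with the corresponding probabilities.

```python
import numpy as np

rng = np.random.default_rng(0)

state = np.array([1, 0], dtype=complex)       # qubit starts in |0>
H = np.array([[1, 1], [1, -1]]) / np.sqrt(2)  # Hadamard gate
state = H @ state                             # equal superposition of 0 and 1

probs = np.abs(state) ** 2                    # Born rule: |amplitude|^2
samples = rng.choice([0, 1], size=10, p=probs)
print(f"P(0)={probs[0]:.2f}  P(1)={probs[1]:.2f}  measurements: {samples}")
```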

V. Other Waymarks for Technological Innovation

Each successive paradigm is obvious only in hindsight, for discovery and innovation are inherently unpredictable. A new advancement in computing power is unlikely to form a perfect growth curve atop the existing wave; even within Moore’s Law, etching capability rarely doubled in a predictable way, or it would have happened sooner.

The growth of silicon computing (processing) power described by the last 60 years of Moore’s Law is important, but it isn’t the only thing to bear in mind when it comes to new entrant paradigms of “doing technology”. Jeff Bezos presented the following graph at a shareholder meeting in 2006, as Amazon began looking at web storage services. [10]

Amazon Shareholders
Figure 5. 2006 trends in technological growth presented at Amazon shareholder meeting.

The price-performance of data storage doubles every twelve months, and networking speed over optical fibre doubles every nine. When technology gets cheap, price elasticity begins to steer the market; consumers buy more than they did before and can choose from a wider range of close substitutes. We see this in the form of wearables, RFID tags on luggage, smart homes, and smart devices.
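Compounded over a decade, those doubling periods diverge dramatically; a quick calculation, assuming the quoted rates hold and taking two years as the conventional Moore’s Law baseline for compute:

```python
# Growth factor over ten years for each doubling period quoted above.
for name, months_per_doubling in [("compute (Moore)", 24),
                                  ("storage", 12),
                                  ("networking", 9)]:
    factor = 2 ** (120 / months_per_doubling)
    print(f"{name:16s} x{factor:>8,.0f} in a decade")
```

Storage and networking outrun raw compute by two to three orders of magnitude over ten years, which is presumably the asymmetry the graph above highlights.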

Arguably the most important application lies in machine learning. We humans often base our ideas about the world on previous experience, and this blinds us to upcoming events. Viewers who watched Deep Blue beat Garry Kasparov at chess in 1997 and went ‘Holy shit!’ would be fools to watch AlphaGo beat Lee Sedol nearly 20 years later and express the exact same degree of surprise. Humans tend to reset their expectations in line with John McCarthy’s prophetic mantra that “as soon as it works, no one calls it AI anymore.”

“If the most advanced species on a planet keeps making larger leaps forward at an ever-faster rate, at some point we’ll make a leap so great that it completely alters life as we know it and the perception we have of what it means to be human.” - Tim Urban.

This will be the subject of my next essay.


Notes

[1] - The principle was coined by Ray Kurzweil; read more on it here.

In The Innovator’s Dilemma, Harvard professor Clayton Christensen attributes the term “fast history” to his colleague.

[2] - For Urban’s take on the singularity, this is a brilliant starting place.

[3] - Boom Supersonic’s mission to not go bankrupt… here. Theoretically, Boom should work; restoring the curve to its former glory might be more of a challenge. Musk’s comments on transnational rocket flights (here) are certainly interesting, though…

[4] - Moore’s Law is in fact a perfect eponym, as “no scientific discovery is named after its original discoverer” (Stigler’s Law of Eponymy, here). Likewise, it is far from a law of nature, and rather a principle. I have more essays tracking the principle’s history and studying its nuances.

[5] - This, according to Roger Stough’s 1996 paper which gives a great insight into the early days of exponential growth from an architectural POV.

[6] - The case (A) against the deceleration of Moore’s Law

TSMC microprocessors blogpost and article.

And this statement from TSMC themselves.

The case (B) pro-deceleration of Moore’s Law in 2018 here. And a great breakdown here.

[7] - Max Tegmark’s estimation is covered in this brilliant interview with Sam Harris.

[8] - Huang’s Law is covered in depth here by Bharath Ramsundar.

[9] - See Kurzweil’s thoughts on “Skyscraper Microchips” here.

[10] - More github blogs!