GPT-7: When AI Becomes a State Affair
If the scaling hypothesis is correct, only states will be able to afford superhuman machine intelligence
Large Language Models (LLMs) like GPT-4 have faced criticism from many quarters, not least from within the AI research community. As discussed previously, the critique cuts to the core: these models do not resemble human intelligence, and they lack real understanding, functioning merely as a sophisticated form of auto-complete.
Yet there are others in the community with a very different opinion. When GPT-2 came onto the scene, some sensed that something significant was happening within deep learning. In particular, with the transformer architecture on which ChatGPT is built, the larger these models became and the more data they were fed, the more they displayed "emergent" properties they were never trained for.
When we say "larger models," we generally mean models with more parameters. A model's parameters are the weights and biases of the connections between its artificial neurons, values learned from the training data. Think of them as the model's internal dials, shaping its behaviour and capability.
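For the technically curious, here is a minimal sketch in Python of what "parameters" means for a single layer of a neural network. The layer sizes are arbitrary choices of mine, not taken from any real model:

```python
# Illustrative only: one fully connected layer, to show what
# "parameters" are. The layer sizes here are arbitrary examples.
import numpy as np

n_inputs, n_outputs = 1024, 1024
weights = np.random.randn(n_inputs, n_outputs)  # one weight per connection
biases = np.zeros(n_outputs)                    # one bias per output neuron

# Parameter count = all weights + all biases
n_params = weights.size + biases.size
print(f"{n_params:,} parameters in this single layer")  # 1,049,600

# For scale: GPT-3 has about 175 billion such parameters,
# spread across many stacked layers.
```

Everything the model "knows" is encoded in these numbers, which is why parameter count is the standard shorthand for model size.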
Sam Altman of OpenAI, Dario Amodei of Anthropic and Demis Hassabis of Google DeepMind all believe that LLMs possess a type of understanding, and suggest that with sufficient data and parameters they could achieve superhuman intelligence. We can call this way of thinking the "scaling hypothesis of intelligence." It rests on two straightforward ideas: a larger "brain" correlates with greater intelligence, and artificial neurons are functionally similar to natural ones. The hypothesis has been substantiated in scholarly articles, and the findings are impressively consistent: expanding a model's parameter count while amplifying its training data improves its performance.
The neat thing about this point of view is that, if it is true, reaching superhuman AI will require few new scientific breakthroughs; it is mainly an engineering and financing feat. A group of influential AI researchers accordingly argues that this is just the beginning for LLMs and that we are entering an era of exponential improvement. Some go even further and postulate that consciousness might emerge as a by-product once a certain scale is reached.
Gary Marcus, a well-known sceptic, points to a critical observation: a year and a half after ChatGPT's release, Google, Anthropic, and recently Meta have unveiled models that may surpass GPT-4 in some respects, yet they all sit within the same range of performance. If we were witnessing exponential growth, he argues, at least one of them should have demonstrated a significant leap forward by now, a more profound leap than the progression from GPT-3 to GPT-4. He thus asks: have LLMs hit a performance ceiling?
Marcus may well be right, but there is another possibility. The scaling hypothesis itself might explain why we are not seeing a massive leap: it posits an exponential need for inputs to achieve linear improvements, so each leap becomes harder to pull off.
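To put that claim in a formula: in the published scaling-law literature (Kaplan et al., 2020), a model's loss falls as a power law in its size. A stylized version, with the parameter exponent reported in that paper, looks like this:

```latex
% Stylized neural scaling law; exponent from Kaplan et al. (2020).
% L is the model's loss, N its parameter count, N_c a fitted constant.
L(N) \approx \left(\frac{N_c}{N}\right)^{\alpha_N}, \qquad \alpha_N \approx 0.076
```

Because the exponent is so small, cutting the loss by even a modest fixed fraction requires multiplying N by a large fixed factor. Steady, linear-looking gains on benchmarks therefore demand exponentially growing models and data.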
The hypothesis holds unambiguously that training more advanced LLMs with the transformer architecture requires exponentially more resources, and we have seen that in the past. GPT-2's training was relatively modest, costing around $45,000 in compute power. GPT-3, roughly 10 times bigger in both data and parameter count, incurred a cost 100 times greater, soaring above $4 million.
The specifics of GPT-4 remain under wraps, yet it is speculated that it grew by a factor of nearly 10 in both parameter count and data used, with a training expense estimated between $50 and $100 million. Regarding data volume, GPT-4 is believed to have been trained on an amount comparable to the holdings of the US Library of Congress, approximately 10 trillion words. Yet although GPT-4 is considerably better than GPT-3, and GPT-3 better than GPT-2, the improvement on benchmarks has been linear, not exponential.
You can see where this is heading. To get another improvement of the same magnitude on the benchmarks, GPT-5 will probably have to be trained on something like 10 times the data GPT-4 was, be a 10 times larger model, and cost 100 times more to train: somewhere in the billions of dollars.
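As a back-of-envelope sketch of that arithmetic, here is the common "roughly 6 FLOPs per parameter per training token" rule of thumb from the scaling-law literature, applied to the article's rough figures. The GPT-4 parameter count below is a purely hypothetical placeholder, since the real number is unpublished:

```python
# Back-of-envelope sketch of the scaling arithmetic. All figures are
# loose estimates; the GPT-4 parameter count is a placeholder assumption.

def training_flops(n_params: float, n_tokens: float) -> float:
    """Approximate training compute: ~6 FLOPs per parameter per token."""
    return 6 * n_params * n_tokens

gpt4 = training_flops(n_params=1e12, n_tokens=1e13)  # hypothetical GPT-4
gpt5 = training_flops(n_params=1e13, n_tokens=1e14)  # 10x both, per the pattern

ratio = gpt5 / gpt4
print(f"Compute ratio, GPT-5 vs GPT-4: {ratio:.0f}x")  # 100x

# Scale the article's $50-100M GPT-4 cost estimate by the same factor:
low, high = 50e6 * ratio, 100e6 * ratio
print(f"Implied GPT-5 training cost: ${low/1e9:.0f}-{high/1e9:.0f} billion")  # $5-10 billion
```

Because compute grows with the product of parameters and data, multiplying both by 10 multiplies the bill by 100. That is the treadmill the hypothesis puts you on.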
That is an eye-watering sum, though still within the budgets of the big tech companies that typically fund these training runs. Getting that amount of quality data together, however, is not trivial. Epoch AI projects that the pool of high-quality text data will be exhausted between 2024 and 2026. They estimate that approximately 10 trillion words of such data are available, an amount akin to what was used to train GPT-4, drawn from sources like books, news articles, scientific papers, Wikipedia, and curated web content. On the pattern above, GPT-5 would need roughly ten times more than is thought to exist in high-quality form.
This view may be overly pessimistic. For instance, Anthropic's Claude 3 model partially relies on internally generated data for training, a form of synthetic data. Some fear this is a kind of intellectual inbreeding, but Claude 3 works well. Similarly, ChatGPT is creating a substantial amount of data through user interactions, already thought to be equivalent to the volume it was trained on. However, with heightened awareness of data's value and the flurry of copyright litigation, procuring sufficient data might become a complex and costly endeavour.
No wonder, then, that GPT-5 is taking a while to arrive. When it does, I predict it will be noticeably better than GPT-4, but not by enough to silence the detractors. And if the scaling hypothesis remains the only pathway to machine intelligence, expect a new set of players to come to the fore by the time we get to GPT-6: states, with their massive financing abilities. And GPT-7?
Only two people on the planet will be able to afford to build GPT-7: Xi and Biden, or their successors.
This piece first appeared in the Vrye Weekblad, where I write a bi-weekly column on AI.