Scale Is All You Need? Part 1

Introduction

AGI is generally regarded as the holy grail of the AI scene. Along with ASI and the singularity, it is considered the most far-reaching and world-changing milestone, because it is the necessary prerequisite for the others in this chain. It will be a decisive moment because AGI will influence every area of human life. Narrow artificial intelligence (i.e. today's common language models) has already found its way into many areas of people's lives and work. For example, excellent translations no longer have to be painstakingly produced by professional translators; AI now translates almost perfectly and automatically. E-mails can be written and sent with a simple prompt. An estimated 41% of all code published on the internet is already AI-generated1, and first-level support is increasingly handled by AI instead of real people (the latest example being the fintech Klarna)2.

AGI, in turn, will build significantly on this. The eponymous Artificial GENERAL Intelligence will, on the one hand, enable even broader application, as its generality encompasses even more fields of work, and, on the other hand, deliver better output through more precise reasoning. In addition, it is widely accepted that an AGI can also act and operate largely autonomously:

It is widely accepted that an AGI is a highly autonomous artificial system whose cognitive and intellectual abilities are at least equal to, or better than, those of humans in all areas. It understands and performs any intellectual task that a human can understand and perform.


We therefore see AGI as one of the most significant milestones, at least in the tech scene, but presumably much more so in the history of humanity as a whole. It is very irritating, though, that to date there is no uniform agreement on a basic definition of AGI! There are different approaches, and probably the most sophisticated one comes from Google DeepMind (see below), but a generally accepted idea of AGI does not yet exist - only attempts to reach an agreement.3

This is a problem: if we don't know exactly what AGI is, we can say even less about how to get there. After all, we first need a clear goal in mind before we can get behind the wheel and start the journey. With the help of a definition, we can classify which milestones toward general intelligence we have already reached and what still needs to be achieved. In the absence of a definition, it is not surprising that diverging views and questions are raised, such as whether we can achieve AGI at all with the current architecture (Yann LeCun)4.

In short, we need a definition and must then clarify the challenges that still have to be overcome. At present, numerous experts say that we are still a long way from the end of the exponential development of language models and that “scale” is all we need on the road to AGI. However, for the reasons mentioned above, it is not that simple.

In this four-part series, we will approach the question of what AGI is and how we can achieve it. I don't want to leave it at that, however, and will also highlight the biggest problems on the road to the singularity and offer a possible outlook on the post-AGI world. Part 1 is therefore dedicated to the definition of AGI and the various approaches to it. Part 2 is about the journey to general intelligence and what we need to get there. Part 3 focuses on the biggest challenges (bottlenecks), and Part 4 provides an outlook on post-AGI society.

In summary, the overall analysis can be boiled down to the question circulating in the AI scene: “Is scale all you need?”5 - a deliberate allusion to the origin of modern AI, the Google research paper “Attention Is All You Need”, which can be regarded as a modern primary source. Could it be that, around seven years after that breakthrough and the introduction of the transformer architecture, we now only need more compute to achieve AGI? It is probably the most important question in the history of humanity at the moment.

What is AGI?

An Artificial General Intelligence is universally applicable and not limited to a specific range of tasks like weak AI. It covers the full spectrum of human abilities. (...) Strong AI is full artificial intelligence, or AGI, capable of carrying out tasks at the human cognitive level despite having only limited background knowledge.

AGI is the acronym for artificial general intelligence. While we already operate every day with the first and last words of the acronym, and artificial intelligence has even become a cultural concept, its generality (1) represents a novelty that has not yet been achieved. AGI is therefore an AI that is not limited to a single specific field but can be applied generally, across the board. It must be designed in such a way that its underlying training enables it to be used widely; the model must be able to philosophize just as well as it performs physical calculations, translate just as well as it codes, and master the Harvard law entrance test just as well as the metalworker's exam. It must therefore be an expert in all areas of human life, a general expert.

In addition to general applicability, another qualitative criterion is self-learning (2), i.e. the ability of a model to acquire new knowledge and capabilities on its own.

Scientists at Temple University in Philadelphia summarized this as follows:

AGI systems should not be designed for specific problems, but they are general-purposed systems without designers' or developers' specifying problems to be solved by the systems; (...) AGI systems are general systems instead of general algorithms, meaning that after deployed, no human developers need to intervene in the source code of the system, by contrast, an algorithm is general in the sense that human developers can apply it to various problems by generating a solver instance for each problem.

The scientists noted that, despite years of AGI research, no consensus exists, and their paper aims to establish a minimal agreement.6

Scientists at Amazon's cloud division AWS, in turn, specified the field of application in more detail and placed the human level at the center.

Artificial General Intelligence (AGI) is a field of theoretical AI research that attempts to develop software with human-like intelligence and the ability for self-study. The goal is for the software to be able to perform tasks for which it has not necessarily been trained or developed. (...).

In addition to generality (1), self-learning (2) is therefore a focus of attention. However, there is another essential criterion that is increasingly being built into LLMs: multimodality. Multimodality refers to a model's ability to process inputs from different modalities, such as text, audio, images and video. For example, a model can process a photo of a well-stocked refrigerator and use it to suggest recipes. That is what multimodality (3) means. Let's consult Amazon's scientists again.

According to researchers, general artificial intelligence will make use of various AI concepts and technologies. These technologies include, for example, machine learning, deep learning, artificial neural networks (ANN), transfer learning, natural language processing, computer vision and hybrid AI (a mix of symbolic and sub-symbolic AI). For movement and interaction in physical environments, robotics is used.

Multimodality enables embodiment. The general-purpose system is given the ability to act in a human-like body through vision, hearing and natural language processing. It is human-like because today's robots are modeled on humans, simply because the world has been built to human standards and is geared to human needs. A model in a body, in turn, would have the opportunity to acquire completely new knowledge and completely new training data. It could go out into the world and try things out, training and developing on its own far more easily. Just as a child develops through daily life experiences by trying things out through play, an embodied model could use multimodality to engage in self-learning.
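To make the refrigerator example above concrete, here is a minimal sketch of multimodal input, assuming the OpenAI Python SDK and a vision-capable chat model; the model name and image URL are placeholders, and any comparable multimodal API would work the same way.

```python
# Minimal multimodality sketch: one text instruction plus one image in a single
# request to a vision-capable chat model. Model name and image URL are
# placeholders; this only illustrates the concept of multimodal input.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

response = client.chat.completions.create(
    model="gpt-4o",  # any vision-capable model would do
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Suggest a recipe I can cook with what you see."},
            {"type": "image_url", "image_url": {"url": "https://example.com/fridge.jpg"}},
        ],
    }],
)
print(response.choices[0].message.content)
```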

Former OpenAI employee Andrej Karpathy recently reiterated his definition of AGI, urging consensus on a single definition. His definition, which aligns with OpenAI's, remains regrettably vague.

“AGI: a highly autonomous system that outperforms humans at most economically valuable work.

For ‘most economically valuable work’ I like to reference the index of all occupations from the U.S. Bureau of Labor Statistics.”


General expert level (1), self-learning (2) and multimodality (3), in conjunction with autonomous action (agency), seem to have emerged as the essential criteria for AGI in the broader discussion.7 That is, the ability of a model to independently solve problems at human expert level without necessarily having been trained on them beforehand, and to draw conclusions for its further actions.

Artificial general intelligence (AGI) is the representation of generalized human cognitive abilities in software so that, faced with an unfamiliar task, the AGI system could find a solution. The intention of an AGI system is to perform any task that a human being is capable of.

As already mentioned, Google DeepMind introduced this table as a discussion proposal to delineate the various stages on the way to AGI. But again: this is not a universally accepted definition, but in my opinion it is an excellent working basis. Let's take a closer look at some of the aspects.


Differentiation: ANI, AGI and ASI

In principle, a distinction is made between weak and strong AI. Weak AI is also called “narrow AI” (artificial narrow intelligence, ANI)8; the terms are synonymous. ANI is narrow in that, unlike artificial general intelligence, it can only be trained on and applied to a few narrow areas of knowledge. ANI is currently the most widespread form of artificial intelligence. It specializes in solving specific tasks efficiently without developing a comprehensive understanding or consciousness. The most common example is Amazon's simple voice assistant Alexa, insofar as it 'understands' simple questions, reacts, and, for example, retrieves information from the internet. AI-based image recognition systems, such as those used in social media or medical diagnostics, are also examples of ANI, in that they identify and mark faces in photos or recognize objects and scenes in images.

Artificial intelligence comes in different forms. In the following, the terms General Artificial Intelligence (AGI), Artificial Intelligence (AI) and Artificial Super Intelligence (ASI) will be distinguished from each other. When we talk about artificial intelligence, we often mean narrow artificial intelligence. Another term for this is artificial narrow intelligence (ANI). Narrow artificial intelligence is limited to one area of application and the completion of individual tasks. It is able to perform certain tasks, but fails at tasks from other areas. The AI systems and generative AI models from the text, video or image area that are available and actively used today are usually weak AI.

“By contrast, narrow AI or weak AI are AI systems that are limited to computational specifications”

The table above, created by Google DeepMind, shows that the Google researchers have added various “levels” to the categories ANI, AGI and ASI in order to subdivide the respective degrees of capability. The researchers place my example, Amazon's Alexa, at Level 2 and already rate this artificial intelligence as “competent”. Overall, the table provides a good breakdown and gradation to help navigate the different AI categories.

AGI and ASI

I have already explained the specifics of AGI in some detail above. In my opinion, it can be defined by three or more criteria: general expert level (1), self-learning (2) and multimodality (3), plus working as an autonomous agent, provided that this is treated as a fourth criterion and not as an integral part of the other three. One more thing needs to be mentioned here, because it is not talked about enough: having autonomous agents at expert level also means embodying them. As autonomously acting robots, they have access to the outside world and thus to a practically unlimited amount of additional training data, since they participate in the real world. Compared to “narrow” AI, it is therefore easy to see which criteria must be met to make the transition from “narrow” to “strong” AI. However, the first subtle differences can already be seen here. Thanks to its fine-grained level scheme, DeepMind already describes GPT-4 as emerging AGI. This is something Elon Musk also cited in his lawsuit against OpenAI, which not least illustrates the disagreement regarding the definition of AGI and asks a court to settle it.9 So far, Musk's lawsuit is the clearest evidence of precisely this dispute and discord within the scientific community over what AGI is and how to determine it.

However, there is disagreement in some areas and details of the definition of a general artificial intelligence. For example, there are different opinions about whether an AGI must exhibit some form of consciousness or sentience. Some definitions extend into the realm of philosophy.

ASI, on the other hand, is the presumably necessary consequence and causal successor of general intelligence (“strong AI”); its abbreviation stands for artificial superintelligence. In short, ASI is a hypothetical form of artificial intelligence that significantly surpasses human intelligence in almost all areas.

Ilya Sutskever, formerly chief scientist and co-founder of OpenAI and probably a familiar name to everyone in the AI community, is a highly regarded, if sometimes overlooked, protégé of AI founding father Geoffrey Hinton, and is now dedicating himself to the question and research of ASI.

After leaving OpenAI, Ilya founded the company Safe Superintelligence Inc. together with Daniel Gross and Daniel Levy to develop a safe ASI. The fact that Ilya is moving directly to research on ASI instead of first developing AGI is telling in itself. While we can only speculate, the implication is that 1) Ilya is convinced that AGI will definitely be achieved (or has already been achieved internally) and 2) he knows the necessary steps and can therefore go straight to research into ASI. It is impressive that his company ssi.inc raised $1b in its first round of funding from well-known venture capitalists such as Andreessen Horowitz without being able to present a model.10 Ilya and Co. define ASI as follows:

Artificial Superintelligence is the enhancement and evolution of General Artificial Intelligence. It is comprehensively more intelligent than humans and surpasses their abilities and achievements in all areas.

Building safe superintelligence (SSI) is the most important technical problem of our time. We have started the world's first straight-shot SSI lab, with one goal and one product: a safe superintelligence.

Conclusion

Based on these criteria, the question naturally arises as to how an AGI model should be structured, in terms of both model size and architecture. The architecture question in particular is still the subject of heated discussion. AI luminaries like Yann LeCun (Meta) have unequivocally stated that AGI cannot be achieved with today's transformer architecture and the corresponding large language models. LeCun argues that this architecture does not allow a model to create new knowledge (i.e. knowledge that is not already contained in the training data) or to plan ahead.

So first of all, it's not going to be an event. The idea, somehow, which is popularized by science fiction and Hollywood, that somebody is going to discover the secret to AGI or human-level AI or AMI, whatever you want to call it, and then turn on a machine and then we have AI - that's just not going to happen. It's not going to be an event. It's going to be gradual progress. Are we going to have systems that can learn from video how the world works and learn good world representations? Yeah. Before we get them to the scale and performance that we observe in humans, it's going to take quite a while. (...) It's not going to happen in one day. Having systems that can learn hierarchical planning, hierarchical representations, systems that can be configured for a lot of different situations at hand the way the human brain can - all of this is going to take at least a decade and probably much more, because there are a lot of problems that we're not seeing right now, that we have not encountered, and so we don't know if there is an easy solution within this framework. I've been hearing people for the last 12, 15 years claiming that AGI is just around the corner and being systematically wrong, and I knew they were wrong when they were saying it. I call this bullshit.

“Certainly scale is necessary but not sufficient, so we're certainly still far, in terms of compute power, from what we would need to match the compute power of the human brain.”

What's more, LeCun “confidently predicted that LLMs will never be able to draw basic spatial conclusions”.

OpenAI, on the other hand, claims that scale and compute are currently the most essential things you need: Sam Altman repeatedly points to compute and electricity as the bottlenecks to AGI, but says hardly a word about a new architecture. Compute continues to be provided by Microsoft Azure, which also makes OpenAI somewhat dependent on Microsoft (remember Satya Nadella's words when Sam Altman was briefly fired last year), and this compute partnership has recently been expanded.

It should be noted that algorithmic breakthroughs such as Q* would develop language models further in such a way that independent planning, process subdivision and System 2 thinking would also become possible on the basis of transformers (among other things, using chain-of-thought, CoT). So could OpenAI's Project Q* be the key to AGI, the holy grail we've been searching for for so long? And if so, Sam Altman might be right after all that, given Q*, scale is all we still need to achieve AGI.

We can see, then, that it is difficult to give a clear answer to the question of where we stand and what we still need to achieve AGI. If you agree with LeCun's basic thesis that we still need significant changes in architecture and reasoning, AGI seems very far away. If, however, we take as a basis that, as Elon Musk has said, GPT-4 already shows the first signs of AGI (and has made this a reason for his lawsuit, see above), then only minor developments are needed. If we follow this second thesis and try to build on today's narrow artificial intelligence, what is presumably required is the addition of independent planning, autonomous execution (agency) and self-learning. Recently, there have been big headlines about Project Strawberry/Q*. If I am right, both self-learning and independent planning through so-called System 2 thinking, in Daniel Kahneman's sense, could soon become reality.
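To illustrate what “process subdivision” and System 2 thinking via CoT can look like at the prompt level, here is a minimal sketch; the prompt wording is my own illustration, not OpenAI's method.

```python
# Illustrative only: a "System 2"-style prompt that forces explicit planning,
# step-by-step work and a self-check before the final answer. This is a generic
# chain-of-thought prompt, not OpenAI's internal Q*/Strawberry technique.
TASK = "A train leaves at 14:10 and the journey takes 2 h 55 min. When does it arrive?"

system2_prompt = (
    "Solve the task below.\n"
    "1. Write a short plan.\n"
    "2. Work through each step of the plan explicitly.\n"
    "3. Check the intermediate results for errors.\n"
    "4. Only then state the final answer on its own line.\n\n"
    f"Task: {TASK}"
)

print(system2_prompt)  # send this to any chat-capable language model
```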

Presumably, this particular algorithm works roughly as follows:

Judging from the name and the plausibility of the technology, I still think it is a combination of Q-learning, a form of A* search, chain-of-thought (CoT) and process reward models (PRM). But I could be wrong. I am firmly convinced, however, that planning and System 2 thinking were key guiding principles behind Q* and account for its success. Q* will probably produce superior accuracy in its output. Through self-learning, pathfinding and process subdivision it will achieve at least similar, if not better, results than Google DeepMind's AlphaProof and AlphaGeometry 2 (which recently achieved silver at the International Mathematical Olympiad). Q* is probably the closest thing to AGI. Whether and how much compute and energy this requires remains unclear (though it seems a great deal of both is needed). [All further details can be found in the article.]
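For readers unfamiliar with the first of those ingredients, here is a minimal sketch of plain tabular Q-learning; it only shows the update rule the name alludes to and says nothing about how OpenAI might actually combine it with search, CoT or PRMs.

```python
import random
from collections import defaultdict

# Minimal tabular Q-learning sketch: the agent keeps a value estimate Q(s, a)
# and nudges it toward reward + discounted best next value after every step.
ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.2    # learning rate, discount, exploration rate

Q = defaultdict(float)                   # Q[(state, action)] -> estimated value

def choose_action(state, actions):
    """Epsilon-greedy selection over the current Q estimates."""
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def q_update(state, action, reward, next_state, actions):
    """One Q-learning step: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])
```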


Should OpenAI succeed in creating with Q* an algorithm that achieves self-learning, independent planning and cross-checking, only the fundamental agentic capability of AGI still stands in the way. And presumably that will be the least of their problems; judging by the rumors, OpenAI is well advanced in this regard too.

The well-known leaker Jimmy Apples still has “WAGMI”, “we are gonna make it”, presumably in reference to AGI 2025, in his profile. Considering the rapid pace of development, this is not too far-fetched.

But what else we need, what the current state of play is on the road to AGI, and how realistic 2025 really seems - we will address these essential questions in the second part.

Continue Reading Part 2: The Journey to AGI

About the author

Kim Isenberg

_Part 1_ What is AGI, anyway_ Footnotes & Sources.pdf
