Beyond Parroting: The Intelligence Behind Language Models

Computers, we’ve been told, operate using just zeroes and ones. On the face of it, for most of us, that might sound ridiculous. Hard to fathom. How can all information be reduced to just zeroes and ones?

But yes, it is possible. The smallest unit of data is a bit, short for binary digit: a simple true/false, yes/no, 1/0 dichotomy. It is the signal a mole (a spy) delivered from atop a castle in medieval Europe to show readiness for an invasion. It is the color of the smoke from the Vatican that announces the much-awaited papal decision. Indeed, within the computer itself, it is the presence or absence of an electrical signal at a given moment that represents the zero or the one. String enough bits together and, by agreed conventions, they can represent numbers, letters, images and sounds.
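
To see how ordinary text ends up as zeroes and ones, here is a minimal Python sketch. It assumes the UTF-8 text encoding, which is common but not the only convention, and the function names `to_bits` and `from_bits` are purely illustrative.

```python
# Assumption: text is encoded as UTF-8, one common convention among several.

def to_bits(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated groups of 8 bits."""
    return " ".join(format(byte, "08b") for byte in text.encode("utf-8"))


def from_bits(bits: str) -> str:
    """Reverse to_bits: turn groups of 8 bits back into text."""
    return bytes(int(group, 2) for group in bits.split()).decode("utf-8")


encoded = to_bits("Hi")
print(encoded)             # 01001000 01101001
print(from_bits(encoded))  # Hi
```

Each character here becomes one group of eight bits (a byte); characters outside basic English, such as "é", take more than one byte in UTF-8.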

A consequence of this is what can be called deterministic behavior. While we do see randomness and unpredictability in computing devices, traditional software is fundamentally deterministic: given the same inputs, it produces the same outputs. The code a programmer writes, which is ultimately converted into those zeroes and ones, is rules-based; its rules seek to specify precisely how the machine should behave in each anticipated scenario.

In many cases this is a good thing, because we want deterministic behavior: expected outcomes that are accurate, consistent and reliable. Your calculator always tells you 2+2=4. You expect a cash machine to dispense exactly the amount you requested, and certainly not less! You expect an aircraft's instruments to show exact readings so the pilot is correctly informed.
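
Here is a minimal sketch of the rules-based style described above, using the cash machine example. The note denominations and the `dispense` function are hypothetical, made up for illustration; real cash machines are far more involved.

```python
# Hypothetical note denominations for an illustrative cash machine.
DENOMINATIONS = (50, 20, 10)


def dispense(amount: int) -> dict[int, int]:
    """Return {note value: count} for `amount`, largest notes first.

    Every case is covered by an explicit rule, so the same request always
    yields the same notes and the total is always exact.
    """
    # Rule 1: refuse anything the notes cannot pay exactly.
    if amount <= 0 or amount % 10 != 0:
        raise ValueError("amount must be a positive multiple of 10")
    notes = {}
    remaining = amount
    # Rule 2: hand out as many of the largest note as fit, then move down.
    for note in DENOMINATIONS:
        count, remaining = divmod(remaining, note)
        if count:
            notes[note] = count
    return notes


print(dispense(80))  # {50: 1, 20: 1, 10: 1}
```

Because every case is decided by explicit rules, calling `dispense(80)` a thousand times gives the same answer a thousand times. Anything the rules did not anticipate, such as a request for 15, is simply rejected.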

However, there are many cases where a rules-based approach is not possible. It is simply not feasible to anticipate every scenario, foresee every eventuality and write code for each one. This is where we turn to systems that, given large amounts of data, can learn from that data and develop the behaviors needed.

Algorithms that can carry out such learning, many of them loosely inspired by the workings of the human brain, are what we call Artificial Intelligence. And perhaps the strongest use case for such a learning machine is the handling of human language, the very device we use to express our perception of reality in all its nuance and complexity.

If you have seen traditional computer code, you'll have noticed that it follows a very rigid format, with a limited set of keywords laid out in a very particular way. Even a missing comma can wreak havoc; just ask a programmer!

But that is part of the deterministic space traditional computing lives in. AI such as the large language models (LLMs) that generate convincingly human-sounding language is an entirely different phenomenon. In fact, it is the very opposite of deterministic: it is probabilistic, or stochastic.

LLMs, which are typically based on the transformer architecture, are just that. Given a series of words ("last but not the...?"), they estimate how likely each possible next word is and then choose one, usually among the most likely. (Strictly, they work with tokens, which are whole words or fragments of words.) They haven't been given any set of rules that specify which word comes next, and that is a deliberate choice. After all, in real life, how can you know for certain what word your interlocutor is going to utter next?
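
The idea of guessing the next word can be illustrated with a deliberately tiny stand-in. This sketch is not how an LLM works internally: instead of a transformer network trained on vast amounts of text, it simply counts which word follows which in a made-up corpus (a so-called bigram model). What it shares with an LLM is the kind of output, a probability distribution over possible next words from which one word is chosen. The corpus and function names are illustrative assumptions.

```python
import random
from collections import Counter, defaultdict

# Illustrative corpus (made up). A real LLM is a transformer trained on vast
# amounts of text; simple word-pair counting stands in for it here.
corpus = (
    "last but not the least . "
    "last but not the least . "
    "last but not the end . "
    "not the point ."
).split()

# "Training": count which word follows each word (a bigram model).
following = defaultdict(Counter)
for word, nxt in zip(corpus, corpus[1:]):
    following[word][nxt] += 1


def next_word_probs(word: str) -> dict[str, float]:
    """Turn the follow-on counts for `word` into probabilities summing to 1."""
    counts = following.get(word)
    if not counts:
        raise KeyError(f"no continuation seen for {word!r}")
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def sample_next(word: str, rng: random.Random) -> str:
    """Draw the next word at random, weighted by its probability."""
    probs = next_word_probs(word)
    return rng.choices(list(probs), weights=list(probs.values()))[0]


probs = next_word_probs("the")
print(probs)                      # {'least': 0.5, 'end': 0.25, 'point': 0.25}
print(max(probs, key=probs.get))  # least  (greedy: always the top word)

rng = random.Random(42)
print(Counter(sample_next("the", rng) for _ in range(1000)))
# roughly 500 'least', 250 'end' and 250 'point'; exact counts depend on the seed
```

Always taking the top word (`least`) would make the output deterministic; drawing at random according to the probabilities is what makes it stochastic. Chat models commonly sample in this way, often with settings such as temperature that control how adventurous the choice is.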

Conventional software has been, and continues to be, useful for the reasons already mentioned. But with the advent of architectures such as neural networks, loosely modeled on the human brain, and hardware to match their computing needs, this probabilistic, effectively guesswork-based technology is becoming an increasingly viable way of modeling aspects of our reality. Perhaps the most striking example, as noted, is these ‘chatbots’ generating human-like language.

Alan Turing, the British mathematician of the Second World War era, is most famously associated with this question: can a machine converse convincingly enough to pass for a human? This is the Turing Test. If you have used ChatGPT, Claude or a similar chat model, you may well say that the Turing Test has finally been passed.

Alan Turing also mused on whether all of reality can be described in a deterministic, fully predictable way, and if so, whether there is such a thing as free will. That's for another day.

Continue to the next article in this series here.

About the author

Ash Stuart
