👾 Ask And You Shall Receive

Previously we explored the concept of inference, and the notion that the large language models underlying popular AI chatbots are in fact next-word predictors. Let's explore this idea further: how they go from simply predicting the next word to holding a useful conversation.

But let's start with a joke.

Man finds bottle, rubs it, and out comes a genie, who grants him one wish (this one's a bit stingy). Man says, I'd like a million dollars please.

Sure, says the genie, and lo and behold, man has in front of him a million Zimbabwean dollars.

Ok, hold on to the joke, for it'll soon become relevant.

LLMs are implementations of the so-called transformer architecture. At its most basic, given a sequence of words, an LLM outputs that sequence plus one more word - the word it considers most likely to come next. Observe:

  • To be or not to > To be or not to be

  • The cat sat on the > The cat sat on the mat

The next word in these two examples is rather obvious, so the probability score of the predicted word is quite high. In most real-life scenarios, however, the next word is not so obvious. But an LLM, having imbibed a large body of text (practically most of the content on the Internet and other publicly available sources), can, by way of a complex algorithm, compute a probability for every word in its vocabulary as a contender for the next word in that sequence, and then output the most probable one. (This is a somewhat simplistic explanation, but it should be good enough here.)
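To make the idea concrete, here is a toy sketch in Python. It is not how a transformer works internally: it simply counts which word tends to follow each pair of words in a tiny made-up corpus, turns those counts into probabilities, and outputs the most probable contender. The corpus and the function names (`build_counts`, `next_word_probabilities`, `predict_next`) are all invented for this illustration.

```python
from collections import Counter, defaultdict

# A tiny stand-in corpus (made up for illustration); a real LLM
# learns from vastly more text and uses a neural network, not counts.
CORPUS = [
    "to be or not to be",
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the cat sat on the mat again",
]


def build_counts(sentences, context_size=2):
    """Count which word follows each run of `context_size` words."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for i in range(context_size, len(words)):
            context = tuple(words[i - context_size:i])
            counts[context][words[i]] += 1
    return counts


def next_word_probabilities(counts, prompt, context_size=2):
    """Return a probability for every word seen after the prompt's last words."""
    context = tuple(prompt.split()[-context_size:])
    followers = counts.get(context, Counter())
    total = sum(followers.values())
    if total == 0:
        return {}  # this context never appeared in the corpus
    return {word: n / total for word, n in followers.items()}


def predict_next(counts, prompt, context_size=2):
    """Pick the single most probable next word, or None if there is no data."""
    probs = next_word_probabilities(counts, prompt, context_size)
    if not probs:
        return None
    return max(probs, key=probs.get)


COUNTS = build_counts(CORPUS)

if __name__ == "__main__":
    for prompt in ("to be or not to", "the cat sat on the"):
        print(prompt, ">", prompt, predict_next(COUNTS, prompt))
```

In this toy corpus "mat" follows "on the" twice and "sofa" once, so "mat" wins with a probability of two thirds. A real model does the same kind of ranking, only over its whole vocabulary and with far richer context.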

"Large language models have practically swallowed the whole Internet"

Bringing a transformer model to this level of capability is called pre-training - that's the P in GPT: Generative Pre-trained Transformer. Such a model is usually called a base model, not something most of us would have seen or need to bother with. The more fun fella is the one who can chat with us - like a chatbot.

To get to this conversational stage, base models are subjected to further training - processes such as fine-tuning come into play. Let's look at how OpenAI's GPT base models are transformed into what you see in, for example, ChatGPT. In its simplest explanation, people curate a set of 'ideal-world' inputs to the LLM and the outputs expected from it.

Thus, the sentence 'What is the tallest mountain in the world?' should lead to the output 'The tallest mountain in the world is Mt. Everest.' (Whereas a pure base model might come up with something like 'What is the longest river in the world?'!)
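As a rough sketch of what such a curated pair could look like as data, here is one way to write it down in Python. The `Example` class, its field names and the "User:" / "Assistant:" layout are all hypothetical, chosen only for illustration; real fine-tuning datasets use their own formats.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Example:
    """One curated 'ideal-world' pair (hypothetical structure)."""
    prompt: str
    ideal_response: str


# The single pair below is the one from the text above.
EXAMPLES = [
    Example(
        prompt="What is the tallest mountain in the world?",
        ideal_response="The tallest mountain in the world is Mt. Everest.",
    ),
]


def to_training_text(example):
    """Join a pair into one piece of text the model could learn to continue.

    The "User:" / "Assistant:" labels are a made-up layout for this sketch.
    """
    return f"User: {example.prompt}\nAssistant: {example.ideal_response}"


if __name__ == "__main__":
    print(to_training_text(EXAMPLES[0]))
```

The point of the layout is that the model is still only predicting the next word: once it has seen many texts shaped like this, the most likely continuation of a question becomes an answer rather than another question.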

The model is further trained with such sample sets, and people also rate its answers; those ratings form the reward that steers the model. This technique is called Reinforcement Learning from Human Feedback (RLHF). (The ratings are typically used to train a separate reward model, which then scores the main model's answers during training.) The result is what's sometimes called an instruct model, but in essence, it is the chat model you are used to chatting with.

So is that it? Does this all-singing, all-dancing chat model know what I want, and will it respond to any of my whims and desires exactly as I'd expect? Well, you can see from the joke above where I am going with this.

The problem is, human language is inherently ambiguous.

We use the same words to mean different things, and different words to mean the same things. Some information is implicit, some intent may be assumed, and so on. (We touched upon this when we discussed structured vs unstructured data). 

So the last piece of the puzzle is a skill that falls back to the human user of these systems: how to ask well so you shall receive well! 

This activity is called prompting the language model, and as of now, it's a lot more of an art than a science. Getting it right has proved so hard that a new, highly sought-after job title has emerged: prompt engineer.

Try it out yourself: fire up Perplexity.AI (or Claude or ChatGPT) in your browser, and ask a non-trivial open-ended question. You can even try a non-trivial statement. You will likely get what seems a plausible response. Now open up a new conversation (so it has no context of the other conversation) and copy-paste the same question or statement. It's highly likely that you'll get a slightly, or even hugely, different answer!
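Why would the same question produce different answers? One common reason, stated here as an assumption since it is not covered above, is that chat systems usually do not always output the single most probable word; they pick the next word at random, weighted by its probability. The sketch below contrasts the two approaches. The probabilities are made-up numbers, and the function names are invented for this illustration.

```python
import random

# Hypothetical next-word probabilities for some prompt (made-up numbers).
NEXT_WORD_PROBS = {"mat": 0.6, "sofa": 0.3, "roof": 0.1}


def pick_greedy(probs):
    """Always return the single most probable word."""
    return max(probs, key=probs.get)


def pick_sampled(probs, rng):
    """Return a word at random, weighted by its probability."""
    words = list(probs)
    weights = [probs[word] for word in words]
    return rng.choices(words, weights=weights, k=1)[0]


def run_conversations(n, seed):
    """Simulate n fresh 'conversations', each sampling one next word."""
    rng = random.Random(seed)  # seeded so a given run can be repeated
    return [pick_sampled(NEXT_WORD_PROBS, rng) for _ in range(n)]


if __name__ == "__main__":
    print("greedy :", [pick_greedy(NEXT_WORD_PROBS) for _ in range(5)])
    print("sampled:", run_conversations(5, seed=None))
```

The greedy picker gives the same word every time, while the sampled one can differ from run to run. Since every chosen word changes the context for the next one, small differences early on can snowball into quite different answers.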

This might be fine in a casual conversation you might have with a chatbot over a drink (try it out if you haven't, it can be fun!). But for serious professional use, in actual enterprise environments, this will not do. Hence the need for this new wizardry called prompt engineering.

Despite their marvelous capabilities, these language models are still just machines. They do not embody the experience of the messiness of human life and the complexity of the real world; you may say they lack common sense. Things will no doubt improve as the technology progresses, but perhaps it'll still be on us to know what we want and how best to ask AI so we shall receive. We'll have a lot more to say on this, so stay tuned.

About the author

Ash Stuart

Engineer | Technologist | Hacker | Linguist | Polyglot | Wordsmith | Futuristic Historian | Nostalgic Futurist | Time-traveler
