AI and LLMs: 9 Key Concepts and Ideas

In the last 9 articles, we touched upon some of the important concepts underlying Artificial Intelligence, particularly as they relate to large language models. Let's have a brief recap and summary.

We all have knowingly or unknowingly used machines powered by software. Conventional software is rules-based - these rules are specified using a specialized language with a very rigid format (or syntax), resulting in a computer program written in a programming language, or code.
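To make that concrete, here is a toy Python sketch of rules-based logic - the rules and thresholds are invented purely for the example:

```python
# A toy illustration of rules-based software: every behaviour the program
# exhibits is an explicit rule a programmer wrote, in rigid syntax.
# The thresholds below are invented for the example.
def loan_decision(credit_score: int, income: float) -> str:
    if credit_score >= 700 and income >= 50_000:
        return "approved"
    if credit_score >= 650:
        return "manual review"
    return "declined"

print(loan_decision(720, 80_000.0))  # always "approved" for these inputs
```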

In contrast, with AI we have a neural net or a similar computational device that mimics the human brain: given a large amount of data, it discerns patterns, develops an understanding of the domain, and is able to reason within the bounds of the knowledge provided to it.

In other words, conventional software is top-down, whilst AI is bottom-up, emergent. Correspondingly, while conventional software is deterministic, language models are by definition probabilistic, or stochastic.
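A tiny sketch of that contrast, with made-up words and weights standing in for a model's learned probabilities:

```python
import random

# Deterministic: the same input always produces the same output.
def double(x: int) -> int:
    return 2 * x

print(double(21), double(21))  # 42 42, on every run

# Stochastic: sampling from a probability distribution, as a language
# model does, can produce different outputs for the same input.
# The words and weights here are invented for the example.
print(random.choices(["sat", "slept", "ran"], weights=[0.6, 0.3, 0.1])[0])
```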

We next discussed how to convey, represent or encode information. The smallest unit of information is a bit - a binary digit. A simple yes/no, true/false, good/bad. In an electronic system, this state is captured by the presence or absence of an electrical charge at a given point in time - the essence of how a transistor works.
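In code terms, a quick sketch:

```python
# A bit is a single yes/no state; strings of bits encode richer values.
light_is_on = True      # one bit of information
print(int(light_is_on)) # 1

print(bin(5))           # '0b101' - the number 5 as three bits
print(int("101", 2))    # 5 - and back again
```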

From here we can expand to more complex forms of representing knowledge, which allows us to get to how a language model can encode meaning. 

Beyond a bit, we come to a one-dimensional setup - the number line. We then add a second axis to it to come up with the 2-dimensional Cartesian plane, with X and Y axes. Next, of course, is 3-D space, with X, Y, and Z. We are able to both visualize these using geometry and manipulate points in each case using algebraic equations, noting that the two representations are fully interchangeable.
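A small sketch of that interchangeability - the same algebraic distance formula works whether the points live on the line, the plane, or in 3-D space:

```python
import math

# One distance formula for 1-D, 2-D, and 3-D points - no drawing required.
def distance(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

print(distance((3,), (7,)))            # 1-D: 4.0
print(distance((0, 0), (3, 4)))        # 2-D: 5.0
print(distance((1, 2, 3), (4, 6, 3)))  # 3-D: 5.0
```

Notice that nothing in distance() depends on how many coordinates a point has - which is precisely why the same algebra extends to any number of dimensions.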

But while humans cannot visualize more than 3 dimensions, we can still carry out algebraic representations of higher dimensions - in fact, of any number of dimensions. This is what allows us to computationally encode meaning, because meaning - any idea, point, concept, or matter - has several dimensions to it: in a sense, several connotations and implications.

Thus meaning in a language model resides in an n-dimensional space, where n is typically a few hundred, or in some cases, more than a thousand. This we've called the semantic space.

Different points in such an n-dimensional space have different relationships with other points in the space - in general, the closer two points are to each other, the closer their meanings. Thus we saw that ‘cat’ is closer in semantic space to ‘dog’ than to ‘car’, despite the literal resemblance between ‘cat’ and ‘car’ (cats and dogs both carry the connotation of animalness, whilst a car doesn't).
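Here is a toy sketch of that idea, with invented 4-dimensional vectors standing in for real embeddings:

```python
import math

# Toy 4-dimensional "embeddings", invented purely for illustration - real
# models use hundreds of dimensions learned from data. Each axis loosely
# stands for a connotation: [animal-ness, pet-ness, has-wheels, is-vehicle]
vectors = {
    "cat": [0.9, 0.8, 0.0, 0.0],
    "dog": [0.9, 0.9, 0.0, 0.1],
    "car": [0.0, 0.0, 1.0, 0.9],
}

# Cosine similarity: ~1.0 means pointing the same way, ~0.0 means unrelated.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

print(cosine(vectors["cat"], vectors["dog"]))  # ~0.99 - close in meaning
print(cosine(vectors["cat"], vectors["car"]))  # 0.0   - far apart
```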

Such a system consequently allows us to capture unstructured data for the very first time. In the realm of conventional software, just as the code written to give the machine instructions must follow a rigid structure, the data fed to such a program must perforce be structured too - examples being data in a spreadsheet, a database table, or even your email address.

Given the ability of LLMs to handle meaning, however, we can have them imbibe unstructured data (normal text such as this, laid out in sentences and paragraphs) and create semantic representations of such text within their semantic space. Such representations are called embeddings (or vectors), and they are stored in what are called vector databases or vectorstores.
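As a rough sketch, with a made-up embed() function standing in for a real embedding model:

```python
# A minimal in-memory vector store sketch. The embed() function is a
# hypothetical stand-in - in practice an embedding model (or an API)
# turns text into a vector of a few hundred to a few thousand numbers.
def embed(text: str) -> list[float]:
    return [text.lower().count(c) / len(text) for c in "aeiounst"]

store: list[tuple[str, list[float]]] = []

def add_document(text: str) -> None:
    store.append((text, embed(text)))  # keep the text alongside its vector

add_document("Cats are small domesticated animals.")
add_document("A car is a wheeled motor vehicle.")
print(len(store), "documents embedded and stored")
```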

Given such a setup, it is then possible to retrieve parts of that information as and when relevant. This opens up the possibility of querying large tracts of data (text) for specific pieces of information, something that was nearly impossible with unstructured data in the pre-LLM world. This technique is called retrieval-augmented generation (RAG).
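Putting the pieces together, here is a bare-bones retrieval sketch - the embed() stand-in, the documents, and the question are all invented for illustration:

```python
import math

# A bare-bones RAG sketch: embed the question, rank stored documents by
# similarity, and place the best matches into the prompt.
def embed(text: str) -> list[float]:
    return [text.lower().count(c) / len(text) for c in "aeiounst"]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

documents = [
    "Cats are small domesticated animals.",
    "A car is a wheeled motor vehicle used for transport.",
    "Dogs were among the first domesticated animals.",
]
index = [(doc, embed(doc)) for doc in documents]

def retrieve(question: str, k: int = 2) -> list[str]:
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]

question = "Which animals were domesticated?"
context = "\n".join(retrieve(question))
# The retrieved context is then placed into the prompt sent to the LLM.
print(f"Context:\n{context}\n\nQuestion: {question}")
```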

We dug deeper into the probabilistic nature of language models, touching upon the fact that they are essentially next-word predictors. Given a sequence of words, they simply predict the next word, and then the next, and so on, cumulatively. Such a system is then enhanced using reinforcement learning from human feedback (RLHF) to make it capable of useful conversation with us.
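A toy sketch of next-word prediction, with a hand-written probability table in place of a trained model:

```python
import random

# A toy next-word predictor. The probability table is hand-written here;
# a real language model learns such probabilities over a vocabulary of
# tens of thousands of tokens.
next_word_probs = {
    "the": {"cat": 0.5, "dog": 0.4, "car": 0.1},
    "cat": {"sat": 0.7, "slept": 0.3},
    "dog": {"barked": 0.6, "slept": 0.4},
    "car": {"stopped": 1.0},
}

words = ["the"]
while words[-1] in next_word_probs:  # stop when no prediction exists
    options = next_word_probs[words[-1]]
    choice = random.choices(list(options), weights=list(options.values()))[0]
    words.append(choice)             # predict, append, repeat

print(" ".join(words))  # e.g. "the dog barked" - runs can differ
```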

A consequence of this probabilistic setup is that language models confidently come up with a response or answer even when they do not know about the subject matter in question. This is described as hallucination.

The RAG system mentioned above is part of the solution for mitigating hallucination in language models. A vital ingredient is the human skill of framing the question to the LLM most effectively in the first place (including supplying additional context via RAG), so that the language model has the best chance of providing the most desirable response. This, more an art than a science, is called Prompt Engineering.
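A sketch of what a carefully framed prompt might look like - all wording here is illustrative:

```python
# A sketch of prompt engineering: the same question, but framed with a
# role, constraints, and retrieved context. All wording is illustrative.
retrieved_context = "Cats are small domesticated animals."  # e.g. via RAG

prompt = f"""You are a careful assistant. Answer using only the context
below. If the context is not sufficient, say "I don't know".

Context:
{retrieved_context}

Question: What kind of animal is a cat?
Answer:"""

print(prompt)  # this string is what would actually be sent to the model
```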

Finally, we touched upon how to assess the outputs as part of ensuring good prompt engineering and RAG. This is quite a challenging process, given the non-deterministic nature of language model outputs. Evaluation (or evals), as it is called, is an ongoing challenge in the generative AI industry.
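For a flavour of the idea, here is a minimal keyword-based check - the test cases are invented, and real evals are far more involved:

```python
# A minimal eval sketch: check model outputs for expected key facts.
# The cases and keyword checks are invented for illustration; real evals
# are far more involved (and often use another LLM as a judge).
def fake_llm(question: str) -> str:
    # Hypothetical stand-in for a real model call.
    return ("A cat is a small domesticated animal."
            if "cat" in question else "A car is a wheeled vehicle.")

cases = [
    ("What kind of animal is a cat?", "domesticated"),
    ("What is a car?", "vehicle"),
]

passed = sum(keyword in fake_llm(q) for q, keyword in cases)
print(f"{passed}/{len(cases)} checks passed")
```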

Are you ready to move on to the next stage in this exciting adventure?

About the author

Ash Stuart

Engineer | Technologist | Hacker | Linguist | Polyglot | Wordsmith | Futuristic Historian | Nostalgic Futurist | Time-traveler
