Large Language Models (LLMs) are general-purpose language models that are pre-trained on broad data and can then be fine-tuned for specific purposes. They are trained to solve common language tasks such as text classification, question answering, document summarization, machine translation, and text generation. Typical training corpora for LLMs include natural language (e.g., web data) as well as other kinds of language data (e.g., programming languages).
For instance, ChatGPT, which surged to public prominence upon its release by OpenAI in November 2022, is a chatbot powered by a series of LLMs. It is capable of generating natural language and code in a dialogue format for a variety of tasks. More advanced versions of such models support multimodal inputs and outputs (e.g., text, images, audio, and videos).
Word prediction
LLMs predict the probability of the next word (token) given an input string of text, based on the statistical properties of the language in the training data. The key insight of large language modeling is that almost any NLP task can be framed as word prediction, and that a sufficiently powerful language model can solve such tasks with a high degree of accuracy.
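As a rough illustration, the sketch below uses the Hugging Face transformers library with the small, publicly available GPT-2 checkpoint (an assumption chosen for illustration, not a model discussed above) to print the most likely next tokens for a prompt.

    # A minimal sketch of next-token prediction, assuming the Hugging Face
    # `transformers` library and the public GPT-2 checkpoint are available.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "The capital of France is"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch, sequence_length, vocab_size)

    # Probability distribution over the vocabulary for the token that follows the prompt.
    next_token_probs = torch.softmax(logits[0, -1], dim=-1)
    top = torch.topk(next_token_probs, k=5)
    for prob, token_id in zip(top.values, top.indices):
        print(f"{tokenizer.decode(int(token_id))!r}  p={prob.item():.3f}")

Sampling or greedily picking from this distribution, one token at a time, is how the same mechanism yields text generation.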
LLMs are large
LLMs are large not only in the size of their training data but also in their number of parameters. They display behaviors different from those of smaller models, with important implications for those who develop and use AI systems.
First, LLMs can solve complex tasks with minimal task-specific data through in-context learning. Second, LLMs are accessed primarily through a prompting interface, which requires users to understand how LLMs behave and to format tasks in a way that LLMs can follow. Third, the development of LLMs no longer separates research from engineering: training an LLM requires extensive hands-on experience in processing large amounts of data and in distributed parallel training, so researchers must address complex engineering issues and work alongside engineers or have engineering expertise themselves.
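To make the first point concrete, here is a minimal sketch of in-context (few-shot) learning: the task is specified entirely through examples placed in the prompt, with no parameter updates. The example reviews and the commented-out generate call are illustrative assumptions, not a specific product API.

    # A minimal sketch of in-context (few-shot) learning via prompting.
    # The labeled examples and the `generate` call are illustrative assumptions;
    # any chat or completion endpoint could stand in for it.
    examples = [
        ("The movie was a delight from start to finish.", "positive"),
        ("I want my two hours back.", "negative"),
    ]

    def build_few_shot_prompt(examples, query):
        lines = ["Classify the sentiment of each review as positive or negative.", ""]
        for text, label in examples:
            lines.append(f"Review: {text}")
            lines.append(f"Sentiment: {label}")
            lines.append("")
        lines.append(f"Review: {query}")
        lines.append("Sentiment:")
        return "\n".join(lines)

    prompt = build_few_shot_prompt(examples, "Solid acting, but the plot drags.")
    # response = generate(prompt)  # hypothetical call to an LLM completion endpoint
    print(prompt)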
Scaling laws
A fundamental principle in LLM development is the existence of "scaling laws". Roughly speaking, the performance of a large language model scales as a power law with each of the following properties of model training: model size (the number of parameters), dataset size (the amount of training data), and the amount of compute used for training.
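One widely cited functional form (Chinchilla-style) expresses predicted loss as the sum of an irreducible term, a model-size term, and a data-size term. The sketch below shows that shape; the constants are illustrative placeholders, not fitted values from any particular study.

    # A sketch of a Chinchilla-style parametric scaling law:
    #   L(N, D) = E + A / N**alpha + B / D**beta
    # where N is the number of parameters and D the number of training tokens.
    # The constants below are illustrative placeholders, not fitted values.
    def predicted_loss(N, D, E=1.7, A=400.0, B=410.0, alpha=0.34, beta=0.28):
        return E + A / N**alpha + B / D**beta

    # Growing the model without adding data (or compute) gives diminishing returns:
    print(predicted_loss(N=1e9, D=2e11))   # 1B parameters, 200B tokens
    print(predicted_loss(N=2e9, D=2e11))   # 2B parameters, same data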
Multimodality
Some LLMs incorporate additional modalities. Multimodal models can process and understand multiple data types, such as text, images, audio, and video. This means users can prompt a model with virtually any input to generate virtually any content type. These models can perform tasks such as image generation, visual question answering, image captioning, image classification, image search, and more.
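A hypothetical sketch of what a multimodal prompt looks like in practice: a single request mixes an image and a text question for visual question answering. The Part structure and the chat placeholder are assumptions for illustration, not a real SDK; actual APIs differ in naming and message format.

    # A hypothetical sketch of visual question answering with a multimodal model.
    # `Part` and `chat` are illustrative stand-ins, not a real SDK.
    from dataclasses import dataclass

    @dataclass
    class Part:
        kind: str   # "text" or "image"
        value: str  # the text itself, or a path/URL to the image

    def chat(parts):
        """Placeholder for a call to a multimodal model endpoint."""
        raise NotImplementedError("wire this up to an actual multimodal API")

    prompt = [
        Part("image", "receipt.jpg"),
        Part("text", "What is the total amount on this receipt, in euros?"),
    ]
    # answer = chat(prompt)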
Reasoning models
Reasoning models are LLMs trained with reinforcement learning to perform reasoning tasks. They are trained to enhance their capabilities in reasoning-intensive tasks such as coding, mathematics, science, logical reasoning, and multi-step planning for agentic workflows.
Reasoning models think before they answer, producing a long internal chain of thought before responding to the user.
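As a rough illustration of the "think, then answer" pattern, the sketch below assumes the model's raw output wraps its internal reasoning in <think> ... </think> tags (an assumed output format used by some models, not a universal convention) and strips that reasoning before showing the final answer to the user.

    import re

    # A sketch of separating internal reasoning from the user-facing answer,
    # assuming the model wraps its chain of thought in <think>...</think> tags.
    # The tag convention is an assumption for illustration; formats vary by model.
    def split_reasoning(raw_output: str):
        match = re.search(r"<think>(.*?)</think>", raw_output, flags=re.DOTALL)
        reasoning = match.group(1).strip() if match else ""
        answer = re.sub(r"<think>.*?</think>", "", raw_output, flags=re.DOTALL).strip()
        return reasoning, answer

    raw = "<think>9.11 has a smaller tenths digit than 9.9, so 9.9 is larger.</think>9.9 is larger than 9.11."
    reasoning, answer = split_reasoning(raw)
    print(answer)  # only the final answer is shown to the user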
Agents
LLMs serve as the foundation for building AI agents, which are software systems that perform tasks autonomously on behalf of users. The LLM acts as the "brain" of an agent, enabling it to make decisions, learn, and adapt. The key capabilities of an AI agent are reasoning and acting.
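A minimal sketch of that reason-and-act loop follows, assuming a hypothetical call_llm function and a single calculator tool; real agent frameworks add planning, memory, tool schemas, and error handling.

    # A minimal sketch of a reason-and-act agent loop. `call_llm` and the tool
    # registry are hypothetical stand-ins, not a specific framework's API.
    TOOLS = {
        "calculator": lambda expression: str(eval(expression)),  # toy tool; unsafe for untrusted input
    }

    def call_llm(transcript: str) -> str:
        """Placeholder for an LLM call that returns either
        'ACTION: <tool> <input>' or 'FINAL: <answer>'."""
        raise NotImplementedError

    def run_agent(task: str, max_steps: int = 5) -> str:
        transcript = f"Task: {task}\n"
        for _ in range(max_steps):
            decision = call_llm(transcript)              # reason: the LLM decides what to do next
            if decision.startswith("FINAL:"):
                return decision.removeprefix("FINAL:").strip()
            _, tool_name, tool_input = decision.split(" ", 2)
            observation = TOOLS[tool_name](tool_input)   # act, then observe the result
            transcript += f"{decision}\nObservation: {observation}\n"
        return "Stopped: step limit reached."

Each iteration feeds the tool's observation back into the transcript, so the model can revise its plan before committing to a final answer.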