Large Language Models (LLMs) are general-purpose language models that are pre-trained on broad data and can then be fine-tuned for specific purposes. They are trained to solve common language tasks such as text classification, question answering, document summarization, machine translation, and text generation. Typical training corpora for LLMs include natural language (e.g., web data) as well as other kinds of language data (e.g., programming languages).
For instance, ChatGPT, which surged to public prominence upon its release by OpenAI in November 2022, is a chatbot powered by a series of LLMs. It is capable of generating natural language and code in a dialogue format for a variety of tasks. More advanced versions of such models support multimodal inputs and outputs (e.g., text, images, audio, and videos).
Word prediction
LLMs predict the probability of the next word (token) given an input string of text, based on the statistical properties of the language in the training data. The key insight of large language modeling is that almost any NLP task can be framed as word prediction, and that a sufficiently powerful language model can solve such tasks with a high degree of accuracy.
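As a rough illustration, the sketch below uses the Hugging Face transformers library with the small, publicly available GPT-2 checkpoint (an assumption chosen for illustration, not a model discussed above) to print the most likely next tokens for a prompt.

    # A minimal sketch of next-token prediction, assuming the Hugging Face
    # `transformers` library and the public GPT-2 checkpoint are available.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "The capital of France is"
    inputs = tokenizer(prompt, return_tensors="pt")

    with torch.no_grad():
        logits = model(**inputs).logits  # shape: (batch, sequence_length, vocab_size)

    # Probability distribution over the vocabulary for the token that follows the prompt.
    next_token_probs = torch.softmax(logits[0, -1], dim=-1)
    top = torch.topk(next_token_probs, k=5)
    for prob, token_id in zip(top.values, top.indices):
        print(f"{tokenizer.decode(int(token_id))!r}  p={prob.item():.3f}")

Sampling or greedily picking from this distribution, one token at a time, is how the same mechanism yields text generation.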
LLMs are large
LLMs are large not only in the size of their training data but also in their number of parameters. They display behaviors different from those of smaller models, with important implications for those who develop and use AI systems.
First, LLMs can solve complex tasks with minimal task-specific data through in-context learning. Second, LLMs are accessed primarily through a prompting interface, which requires users to understand how LLMs behave and to format tasks in a way that LLMs can follow. Third, the development of LLMs no longer separates research from engineering: training an LLM requires extensive hands-on experience in processing large amounts of data and in distributed parallel training, so researchers must address complex engineering issues and work alongside engineers or have engineering expertise themselves.
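To make the first point concrete, here is a minimal sketch of in-context (few-shot) learning: the task is specified entirely through examples placed in the prompt, with no parameter updates. The example reviews and the commented-out generate call are illustrative assumptions, not a specific product API.

    # A minimal sketch of in-context (few-shot) learning via prompting.
    # The labeled examples and the `generate` call are illustrative assumptions;
    # any chat or completion endpoint could stand in for it.
    examples = [
        ("The movie was a delight from start to finish.", "positive"),
        ("I want my two hours back.", "negative"),
    ]

    def build_few_shot_prompt(examples, query):
        lines = ["Classify the sentiment of each review as positive or negative.", ""]
        for text, label in examples:
            lines.append(f"Review: {text}")
            lines.append(f"Sentiment: {label}")
            lines.append("")
        lines.append(f"Review: {query}")
        lines.append("Sentiment:")
        return "\n".join(lines)

    prompt = build_few_shot_prompt(examples, "Solid acting, but the plot drags.")
    # response = generate(prompt)  # hypothetical call to an LLM completion endpoint
    print(prompt)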
Scaling laws
A fundamental principle in LLM development is the existence of "scaling laws". Roughly speaking, the performance of a large language model scales as a power law with each of the following properties of model training: model size (the number of parameters), dataset size (the amount of training data), and the amount of compute used for training.
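One widely cited functional form (Chinchilla-style) expresses predicted loss as the sum of an irreducible term, a model-size term, and a data-size term. The sketch below shows that shape; the constants are illustrative placeholders, not fitted values from any particular study.

    # A sketch of a Chinchilla-style parametric scaling law:
    #   L(N, D) = E + A / N**alpha + B / D**beta
    # where N is the number of parameters and D the number of training tokens.
    # The constants below are illustrative placeholders, not fitted values.
    def predicted_loss(N, D, E=1.7, A=400.0, B=410.0, alpha=0.34, beta=0.28):
        return E + A / N**alpha + B / D**beta

    # Growing the model without adding data (or compute) gives diminishing returns:
    print(predicted_loss(N=1e9, D=2e11))   # 1B parameters, 200B tokens
    print(predicted_loss(N=2e9, D=2e11))   # 2B parameters, same data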
Multimodality
Some LLMs incorporate additional modalities. Multimodal models can process and understand multiple data types, such as text, images, audio, and video. This means users can prompt a model with virtually any input to generate virtually any content type. These models can perform tasks such as image generation, visual question answering, image captioning, image classification, image search, and more.
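A hypothetical sketch of what a multimodal prompt looks like in practice: a single request mixes an image and a text question for visual question answering. The Part structure and the chat placeholder are assumptions for illustration, not a real SDK; actual APIs differ in naming and message format.

    # A hypothetical sketch of visual question answering with a multimodal model.
    # `Part` and `chat` are illustrative stand-ins, not a real SDK.
    from dataclasses import dataclass

    @dataclass
    class Part:
        kind: str   # "text" or "image"
        value: str  # the text itself, or a path/URL to the image

    def chat(parts):
        """Placeholder for a call to a multimodal model endpoint."""
        raise NotImplementedError("wire this up to an actual multimodal API")

    prompt = [
        Part("image", "receipt.jpg"),
        Part("text", "What is the total amount on this receipt, in euros?"),
    ]
    # answer = chat(prompt)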
Reasoning models
Reasoning models are LLMs trained with reinforcement learning to perform reasoning tasks. They are trained to enhance their capabilities in reasoning-intensive tasks such as coding, mathematics, science, logical reasoning, and multi-step planning for agentic workflows.
Reasoning models think before they answer, producing a long internal chain of thought before responding to the user.
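As a rough illustration of the "think, then answer" pattern, the sketch below assumes the model's raw output wraps its internal reasoning in <think> ... </think> tags (an assumed output format used by some models, not a universal convention) and strips that reasoning before showing the final answer to the user.

    import re

    # A sketch of separating internal reasoning from the user-facing answer,
    # assuming the model wraps its chain of thought in <think>...</think> tags.
    # The tag convention is an assumption for illustration; formats vary by model.
    def split_reasoning(raw_output: str):
        match = re.search(r"<think>(.*?)</think>", raw_output, flags=re.DOTALL)
        reasoning = match.group(1).strip() if match else ""
        answer = re.sub(r"<think>.*?</think>", "", raw_output, flags=re.DOTALL).strip()
        return reasoning, answer

    raw = "<think>9.11 has a smaller tenths digit than 9.9, so 9.9 is larger.</think>9.9 is larger than 9.11."
    reasoning, answer = split_reasoning(raw)
    print(answer)  # only the final answer is shown to the user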
Agents
LLMs serve as the foundation for building AI agents, which are software systems that perform tasks autonomously on behalf of users. The LLM acts as the "brain" of an agent, enabling it to make decisions, learn, and adapt. The key capabilities of an AI agent are reasoning and acting.
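A minimal sketch of that reason-and-act loop follows, assuming a hypothetical call_llm function and a single calculator tool; real agent frameworks add planning, memory, tool schemas, and error handling.

    # A minimal sketch of a reason-and-act agent loop. `call_llm` and the tool
    # registry are hypothetical stand-ins, not a specific framework's API.
    TOOLS = {
        "calculator": lambda expression: str(eval(expression)),  # toy tool; unsafe for untrusted input
    }

    def call_llm(transcript: str) -> str:
        """Placeholder for an LLM call that returns either
        'ACTION: <tool> <input>' or 'FINAL: <answer>'."""
        raise NotImplementedError

    def run_agent(task: str, max_steps: int = 5) -> str:
        transcript = f"Task: {task}\n"
        for _ in range(max_steps):
            decision = call_llm(transcript)              # reason: the LLM decides what to do next
            if decision.startswith("FINAL:"):
                return decision.removeprefix("FINAL:").strip()
            _, tool_name, tool_input = decision.split(" ", 2)
            observation = TOOLS[tool_name](tool_input)   # act, then observe the result
            transcript += f"{decision}\nObservation: {observation}\n"
        return "Stopped: step limit reached."

Each iteration feeds the tool's observation back into the transcript, so the model can revise its plan before committing to a final answer.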