Practical knowledge of how LLMs work makes it easier to get high-quality outputs from them.
Prompt
Prompting is the way we talk to LLMs, or the way we program the models using natural language.
A prompt is a text input that a user gives to a language model to get the model to do something useful. It provides the model with context, instructions, and examples that help it understand user intent. The process of iteratively designing and optimizing prompts to guide LLMs towards generating the desired responses for a task is known as prompt engineering.
Tokens
LLMs process input and output texts by breaking them down into smaller units called tokens. Tokens can be words, chunks of words, or single characters. The models infer statistical relationships between tokens and predict the next token in a sequence of tokens.
Tokenization is the task of splitting character sequences into tokens. To see this process in action, The Tokenizer Playground on Hugging Face visualizes how text is tokenized by different LLMs.
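For a programmatic view, here is a minimal sketch using OpenAI's tiktoken library (assumptions: tiktoken is installed, and the cl100k_base encoding used by GPT-4-era models is a reasonable default):

```python
# pip install tiktoken
import tiktoken

# Load the byte-pair encoding used by GPT-4-era OpenAI models.
enc = tiktoken.get_encoding("cl100k_base")

text = "Tokenization splits character sequences into tokens."
token_ids = enc.encode(text)                   # text -> integer token IDs
tokens = [enc.decode([t]) for t in token_ids]  # the text chunk behind each ID

print(token_ids)
print(tokens)  # words, chunks of words, or single characters
print(len(token_ids), "tokens for", len(text), "characters")
```

Note that the same text tokenizes differently under different models' tokenizers, which is one reason token counts (and costs) vary across providers.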
Context length / context window
Context length / context window is the maximum number of tokens (not words) the model can "remember" in a conversation with the user. Different models have different context window sizes.
A context window typically includes the user prompt, the model's output, the conversation history, system instructions, and the contents of any attached documents or web pages the model searches.
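Because all of these share the same budget, it can help to track how much of the window a conversation consumes. A rough sketch (the tiktoken encoding and the 128,000-token limit are illustrative assumptions; check your model's documentation):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
CONTEXT_LIMIT = 128_000  # illustrative; varies by model

# Everything below competes for space in the same context window.
system_instructions = "You are a helpful research assistant."
history = [
    "User: Summarize the attached report.",
    "Assistant: Here is a summary of the key findings...",
]
new_prompt = "Now list three follow-up questions."

used = sum(len(enc.encode(part)) for part in [system_instructions, *history, new_prompt])
print(f"{used} tokens used; {CONTEXT_LIMIT - used} left for documents and the reply")
```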
LLM features
Features of LLMs can include thinking or reasoning, deep research, file uploads, Internet search, code interpreter, and multimodality. Several of these are described below.
Internet search: LLMs can be instructed to perform web searches, or can recognize the need to do so automatically, retrieving information from web pages and incorporating it into the context window to answer questions about recent topics beyond the cut-off date of the training data. Perplexity.ai is a tool built around this.
Deep research: an agent that combines Internet search with reasoning over an extended period (tens of minutes) to generate multi-step, multi-source reports on complex topics at the level of a research analyst. This feature is provided by OpenAI, Gemini, and Perplexity.
File uploads: Users can upload documents (e.g., texts, PDFs, images) to load their content into the context window, allowing the LLM to answer questions about the specific document.
Code interpreter: LLMs can write and execute code (e.g., Python, JavaScript) through an integrated interpreter. This is essential for accurate calculations beyond simple arithmetic, data analysis, plotting, and software development.
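To see why this matters, consider exact large-number arithmetic: a model answering from learned token statistics is unreliable here, while executed code is not. A hypothetical snippet of the kind an interpreter might run:

```python
# Exact arithmetic an LLM would likely get wrong if answered "from memory".
import math

n = 52
orderings = math.factorial(n)  # number of ways to order a deck of cards
print(f"52! = {orderings}")
print(f"roughly {orderings:.3e}, a {len(str(orderings))}-digit number")
```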
Model selection
Be mindful of the specific model version being used, as capabilities, knowledge, and costs vary significantly. Compare, for instance, OpenAI's GPT-series and o-series models, Google's Gemini variants, and Anthropic's Claude family to understand their strengths and limitations.
There are various prompting techniques that improve the performance of language models on many tasks by tapping into emergent abilities of large language models that are not present in small models.
Chain-of-thought prompting (CoT) instructs the model to explain its reasoning through intermediate steps to derive the final answer. Note that reasoning models now perform chain-of-thought reasoning by default, even if users do not explicitly structure their prompts to get stepwise explanations or logical breakdowns. In fact, explicitly asking for chain-of-thought in addition to these models' built-in reasoning can sometimes cause confusion or degrade performance. For instance, there are differences to consider when prompting a reasoning model versus prompting a GPT model. While reasoning models will provide better results on tasks with only high-level guidance, GPT models benefit from very precise instructions.
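As a sketch, the two prompts below differ only in the request for intermediate steps. The openai client usage and the model name are illustrative assumptions; the prompts themselves work with any chat-capable (non-reasoning) model:

```python
# pip install openai  (reads the OPENAI_API_KEY environment variable)
from openai import OpenAI

client = OpenAI()
question = (
    "A bat and a ball cost $1.10 in total. The bat costs $1.00 more "
    "than the ball. How much does the ball cost?"
)

# Direct prompt: the model may jump straight to a (possibly wrong) answer.
# CoT prompt: the same question, plus a request for intermediate steps.
cot = question + "\nWork through the problem step by step, then give the final answer."

for prompt in (question, cot):
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    print(response.choices[0].message.content, "\n---")
```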
Few-shot prompting gives multiple input/output examples of a task in the prompt. Transformer-based LLMs can learn a new task from a few examples without the need for any new training data. This ability is referred to as in-context learning, a concept popularized by the GPT-3 paper.
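A minimal few-shot sketch (same illustrative client and model as above): the input/output examples inside the prompt define the task, with no additional training:

```python
from openai import OpenAI

client = OpenAI()

# Two solved examples teach the task and the output format in-context.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The battery lasts all day and the screen is gorgeous."
Sentiment: Positive

Review: "It stopped working after two days. Total waste of money."
Sentiment: Negative

Review: "Setup took five minutes and everything just worked."
Sentiment:"""

response = client.chat.completions.create(
    model="gpt-4o",  # illustrative model name
    messages=[{"role": "user", "content": few_shot_prompt}],
)
print(response.choices[0].message.content)  # expected: Positive
```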
Zero-shot prompting provides the model with a direct instruction or question without any additional context or examples. This strategy is used in scenarios such as idea generation, summarization, or translation.
Zero-shot CoT prompting combines zero-shot and chain-of-thought prompting by simply adding "Let's think step by step" before each answer.
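The difference between the two is a single appended sentence, as this sketch shows (illustrative prompts; either string can be sent to a model exactly as in the examples above):

```python
# Zero-shot: a bare instruction with no examples or added context.
zero_shot = "Translate into French: 'The meeting is postponed until Thursday.'"

# Zero-shot CoT: a bare question plus the trigger phrase.
zero_shot_cot = (
    "If a train travels 120 km in 1.5 hours, what is its average speed in km/h?\n"
    "Let's think step by step."
)

print(zero_shot, zero_shot_cot, sep="\n\n")
```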
For strategies and best practices on crafting effective prompts, please refer to the guides and documentation specific to your chosen LLM.
Before you begin, consider defining clear success criteria for your use case and identifying ways to empirically test against those criteria. Additionally, prepare an initial draft of the prompt you wish to improve.
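A minimal sketch of such empirical testing, assuming a hypothetical ask() helper built on the same illustrative client as above; the draft prompt, test cases, and pass criterion are all placeholders for your own:

```python
from openai import OpenAI

client = OpenAI()

def ask(prompt: str) -> str:
    """Hypothetical helper: send one prompt, return the model's text."""
    response = client.chat.completions.create(
        model="gpt-4o",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# Success criterion (illustrative): summaries must be at most 20 words.
draft_prompt = "Summarize in at most 20 words: {text}"
test_texts = [
    "Tokens are the units LLMs read and write; context windows cap how many fit.",
    "Prompt engineering iterates on wording, examples, and format to improve results.",
]

passed = sum(len(ask(draft_prompt.format(text=t)).split()) <= 20 for t in test_texts)
print(f"{passed}/{len(test_texts)} test cases met the success criterion")
```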
A prompt can include several elements, such as the objective, instructions, persona or role, constraints, examples, reasoning steps, output format, and input data or documents.
In general, an effective prompt sets clear goals, gives specific and explicit instructions, and provides context behind the instructions. Prompts can be improved through iterative experimentation: refining phrasing, specificity, detail, length, and tone.
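Putting these elements together, a sketch of a prompt template (the section labels and wording are illustrative, not a required format):

```python
# One prompt combining several of the elements listed above.
prompt_template = """Persona: You are an experienced technical editor.

Objective: Rewrite the passage below for a general audience.

Constraints:
- Keep it under 100 words.
- Preserve all numbers and proper nouns.

Example: "Utilize the aforementioned methodology." -> "Use this method."

Output format: Return only the rewritten passage, with no commentary.

Passage: {passage}"""

print(prompt_template.format(passage="LLMs infer statistical relationships between tokens."))
```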