Large Language Models (LLMs) are large, general-purpose language models that can be pre-trained and then fine-tuned for specific purposes. They are trained to solve common language problems, such as text classification, question answering, document summarization, and text generation. The models can then be adapted to solve specific problems in different fields by fine-tuning on relatively small domain-specific datasets.
The ability of LLMs to take knowledge learned from one task and apply it to another is enabled by transfer learning. Pre-training is the dominant approach to transfer learning in deep learning.
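The pre-train-then-fine-tune workflow can be made concrete with a short sketch. The snippet below is a minimal illustration, assuming the Hugging Face `transformers` and `datasets` libraries; the `bert-base-uncased` checkpoint and the `imdb` dataset are stand-ins for any pre-trained model and small task-specific corpus.

```python
# Minimal sketch: adapt a pre-trained model to a downstream task by
# fine-tuning it on a small labeled dataset (model/dataset names illustrative).
from datasets import load_dataset
from transformers import (AutoTokenizer, AutoModelForSequenceClassification,
                          Trainer, TrainingArguments)

dataset = load_dataset("imdb")                       # small task-specific corpus
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

tokenized = dataset.map(tokenize, batched=True)

# Start from pre-trained weights; only the classification head is newly initialized.
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=8),
    # A small subset is enough to adapt the model to the new task.
    train_dataset=tokenized["train"].shuffle(seed=42).select(range(2000)),
)
trainer.train()
```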
LLMs predict the probability of the next word (token), given an input string of text, based on the statistical properties of the language in their training data. Typical training corpora for LLMs consist of natural language (e.g. web data), but LLMs can also be trained on other kinds of language (e.g. programming languages).
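This next-token prediction can be observed directly by inspecting a model's output distribution. The following is a minimal sketch, assuming the Hugging Face `transformers` library and the publicly released `gpt2` checkpoint; any causal language model would behave analogously.

```python
# Minimal sketch: compute the probability distribution over the next token.
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

text = "Large language models are trained to"
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits          # shape: (1, seq_len, vocab_size)

# Distribution over the vocabulary for the position after the last input token.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)

# Show the five most likely next tokens and their probabilities.
top = torch.topk(next_token_probs, k=5)
for prob, token_id in zip(top.values, top.indices):
    print(f"{tokenizer.decode(int(token_id)):>12s}  {prob.item():.3f}")
```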
LLMs are large not only because of the size of their training data but also because of their number of parameters. They display different behaviors from smaller models, with important implications for those who develop and use AI systems. First, LLMs can solve complex tasks with minimal task-specific data through in-context learning. Second, LLMs are accessed primarily through a prompting interface, which requires humans to understand how LLMs function and to format tasks in a way that LLMs can comprehend. Third, the development of LLMs no longer separates research from engineering: training LLMs requires extensive hands-on experience with large-scale data processing and distributed parallel training, so to develop effective LLMs, researchers must address complex engineering issues and work alongside engineers or have engineering expertise themselves.
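As an illustration of in-context learning through a prompting interface, the sketch below formats a sentiment-classification task entirely inside the prompt, with a few worked examples and no parameter updates. The review texts, labels, and the use of the `transformers` text-generation pipeline with `gpt2` are illustrative assumptions, not a prescribed setup.

```python
# Minimal sketch: few-shot in-context learning via a prompt; the task is
# specified in text only, with no gradient updates to the model.
from transformers import pipeline

few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: The battery lasts all day and the screen is gorgeous.
Sentiment: Positive

Review: It stopped working after a week and support never replied.
Sentiment: Negative

Review: Setup took five minutes and it has run flawlessly since.
Sentiment:"""

# Send the formatted task to a causal LLM (gpt2 used here only as an example).
generator = pipeline("text-generation", model="gpt2")
completion = generator(few_shot_prompt, max_new_tokens=3, do_sample=False)
print(completion[0]["generated_text"])
```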