Machines and Society

A growing guide on the latest in data-driven research and emerging technologies at the intersection of society, information and technology.

Overview

Large Language Models (LLMs) are general-purpose language models that are pre-trained and can then be fine-tuned for specific purposes. They are trained to solve common language problems, such as text classification, question answering, document summarization, and text generation. The models can then be adapted to specific problems in different fields by fine-tuning on relatively small, field-specific datasets.

The ability of LLMs to take knowledge learned from one task and apply it to another is enabled by transfer learning, and pre-training is the dominant approach to transfer learning in deep learning.
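As a concrete illustration of the pre-train-then-fine-tune pattern, the sketch below adapts a small pre-trained model to a text-classification task. It assumes the Hugging Face transformers and datasets libraries; the distilbert-base-uncased checkpoint and the IMDB dataset are illustrative stand-ins for whatever model and field-specific data a project actually uses.

from datasets import load_dataset
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

# Start from a small pre-trained checkpoint; only the classification head is new.
tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "distilbert-base-uncased", num_labels=2)

# A relatively small, field-specific dataset stands in for domain data.
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="finetune-out", num_train_epochs=1),
    train_dataset=dataset["train"].shuffle(seed=0).select(range(2000)),
    eval_dataset=dataset["test"].select(range(500)),
)
trainer.train()  # transfer learning: reuse the pre-trained weights, adapt them to the new task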

Given an input string of text, LLMs predict the probability of the next word (token) based on the statistical properties of the language in the training data. Typical training corpora consist of natural language (e.g., web data), but LLMs can also be trained on other kinds of language (e.g., programming languages).
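The snippet below shows this next-token prediction directly. It is a minimal sketch, assuming the Hugging Face transformers library and the small GPT-2 checkpoint as a stand-in for an arbitrary LLM; it prints the model's five most likely next tokens, with their probabilities, for a short prompt.

import torch
from transformers import GPT2LMHeadModel, GPT2TokenizerFast

tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

prompt = "The capital of France is"
inputs = tokenizer(prompt, return_tensors="pt")

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, sequence_length, vocab_size)

# Turn the final position's logits into a probability distribution over the vocabulary.
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
top_probs, top_ids = next_token_probs.topk(5)
for prob, token_id in zip(top_probs, top_ids):
    print(f"{tokenizer.decode(token_id.item())!r}: {prob.item():.3f}")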

LLMs are large not only in the size of their training data but also in their number of parameters. They display behaviors that differ from those of smaller models, with important implications for those who develop and use A.I. systems. First, LLMs can solve complex tasks with minimal task-specific training data through in-context learning. Second, LLMs are accessed primarily through a prompting interface, which requires people to understand how LLMs function and to format tasks in a way the models can handle. Third, the development of LLMs no longer distinguishes between research and engineering: training an LLM requires extensive hands-on experience in processing large amounts of data and in distributed parallel training. To develop effective LLMs, researchers must address complex engineering issues and either work alongside engineers or have engineering expertise themselves.
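The short sketch below illustrates in-context (few-shot) learning through a prompting interface: the only "training data" the model sees are a few labeled examples embedded in the prompt itself, and the task is solved without any gradient updates or fine-tuning. The prompt text is illustrative and could be sent to any completion or chat API.

# A hypothetical few-shot prompt; the examples steer the model toward the task.
few_shot_prompt = """Classify the sentiment of each review as Positive or Negative.

Review: "The plot was predictable and the acting wooden."
Sentiment: Negative

Review: "A moving, beautifully shot film I would watch again."
Sentiment: Positive

Review: "Two hours of my life I will never get back."
Sentiment:"""

# An LLM's completion of this prompt (ideally "Negative") is the task output.
print(few_shot_prompt)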


References

Zhao, W. X., et al. (2023). A Survey of Large Language Models. https://doi.org/10.48550/arXiv.2303.18223. [GitHub]

 

Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., ... & Liang, P. (2021). On the Opportunities and Risks of Foundation Models. https://doi.org/10.48550/arXiv.2108.07258

 

Google Cloud Tech. (May 9, 2023). Introduction to large language models (YouTube video).

 

Microsoft Developer. (May 25, 2023). State of GPT.

 

Wolfram, S. (Feb 14, 2023). What Is ChatGPT Doing … and Why Does It Work? (Article; YouTube video).

 

Weidinger, L., Mellor, J., Rauh, M., Griffin, C., Uesato, J., Huang, P. S., ... & Gabriel, I. (2021). Ethical and Social Risks of Harm from Language Models. https://doi.org/10.48550/arXiv.2112.04359 

 

Bowman, S. R. (2023). Eight Things to Know about Large Language Models. https://doi.org/10.48550/arXiv.2304.00612

Capabilities, Limitations, and Future Directions

A growing reading list.
 

Bender, E. M., Gebru, T., McMillan-Major, A., & Shmitchell, S. (2021). On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? 🦜. Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, 610–623. https://doi.org/10.1145/3442188.3445922

 

Bommasani, R., Hudson, D. A., Adeli, E., Altman, R., Arora, S., von Arx, S., ... & Liang, P. (2021). On the Opportunities and Risks of Foundation Models. https://doi.org/10.48550/arXiv.2108.07258

 

Bubeck, S., Chandrasekaran, V., Eldan, R., Gehrke, J., Horvitz, E., Kamar, E., ... & Zhang, Y. (2023). Sparks of Artificial General Intelligence: Early Experiments with GPT-4. https://doi.org/10.48550/arXiv.2303.12712. [YouTube]

 

Kaddour, J., Harris, J., Mozes, M., Bradley, H., Raileanu, R., & McHardy, R. (2023). Challenges and Applications of Large Language Models. https://doi.org/10.48550/arXiv.2307.10169

 

Tamkin, A., Brundage, M., Clark, J., & Ganguli, D. (2021). Understanding the Capabilities, Limitations, and Societal Impact of Large Language Models. https://doi.org/10.48550/arXiv.2102.02503