In this section, we explore ways of using generative AI as a research assistant or research method in data-driven research. Generative AI has the potential to enhance multiple stages of scientific research, provided it is used with caution and in conjunction with human evaluation and interpretation.
The major incentive for integrating generative AI into the research process is that it lets researchers concentrate on core tasks by delegating supplementary work to the model.
However, when conducting research with generative AI, it is crucial to evaluate the model's limitations and potential variability in performance. In the end, it is domain expertise, the ability to generate unique insights, and ethical considerations that will be essential in distinguishing research outcomes.
One of the most apparent use cases is using generative AI as a coding assistant across various stages of the data workflow. Researchers can use the tool to write, debug, explain, and document code, or to translate code between languages and packages.
Note that generative AI's performance in these cases depends on the level of support for the particular language or software package. Additionally, researchers should be cautious about using generative AI for anything beyond coding assistance. Even tasks as seemingly straightforward as data cleaning require making many small decisions based on clearly defined objectives, and relying on generative AI for more complex tasks such as model selection without evaluating the outputs is not recommended.
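As a minimal sketch of what coding assistance can look like in practice, the snippet below asks a model to draft a pandas data-cleaning helper that the researcher then reviews before use. It assumes access to the OpenAI Python SDK and an API key; the model name, prompt, and column names are illustrative placeholders, not a prescribed workflow.

```python
# Minimal sketch: asking an LLM to draft a data-cleaning helper.
# Assumes the OpenAI Python SDK (pip install openai) and an API key in the
# OPENAI_API_KEY environment variable; the model name and prompt are illustrative.
from openai import OpenAI

client = OpenAI()

prompt = (
    "Write a pandas function that takes a DataFrame, strips whitespace from "
    "string columns, parses the 'date' column as datetime, and drops rows "
    "where 'respondent_id' is missing. Include a docstring."
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # placeholder; use whichever model you have access to
    messages=[{"role": "user", "content": prompt}],
)

# The generated code is only a draft: read it, test it on a sample of the data,
# and confirm that each cleaning decision matches your own objectives.
print(response.choices[0].message.content)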
Researchers may also leverage generative AI as a natural language processing tool for tasks such as labeling topics, extracting entities, and assessing sentiment in text data (a minimal example is sketched below).
There are some experiments using ChatGPT for tasks with potential for feature generation, including
Note that this technique may help researchers obtain a preliminary understanding of their data and documents at the initial stages of research. However, the generation process is not transparent and is arguably not reproducible when the full technical details of the model are unavailable, making it challenging to make informed decisions along the way.
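To make the labeling use case concrete, here is a minimal sketch of zero-shot sentiment labeling with an LLM. The model name, label set, and example texts are illustrative, and outputs should always be spot-checked against human coding before use.

```python
# Minimal sketch: zero-shot sentiment labeling with an LLM.
# Assumes the OpenAI Python SDK and an API key in OPENAI_API_KEY;
# the model name, label set, and example texts are illustrative.
from openai import OpenAI

client = OpenAI()

texts = [
    "The new policy made the application process much easier.",
    "I waited three hours and nobody could answer my question.",
]

labels = []
for text in texts:
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[
            {
                "role": "user",
                "content": (
                    "Label the sentiment of the following text as exactly one of "
                    "'positive', 'negative', or 'neutral'. Reply with the label only.\n\n"
                    f"Text: {text}"
                ),
            }
        ],
        temperature=0,  # reduce run-to-run variation, though results may still differ
    )
    labels.append(response.choices[0].message.content.strip().lower())

# Validate a sample of labels against human annotation before using them
# as features or measurements in downstream analysis.
print(list(zip(texts, labels)))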
By virtue of their training on vast amounts of human-generated data, LLMs are implicit computational models of humans (often referred to as homo silicus). This makes them well suited to building autonomous agents that simulate individuals and societies in single-agent setups, multi-agent systems, or human-AI interactions.
LLM agents are widely used in social simulations. They are constructed to explore social dynamics, develop or test theories of human behavior, or populate virtual spaces with realistic social phenomena. They provide ethical, scalable alternatives to real-world human studies, including for topics that are difficult to examine or populations that are difficult to access. Common application areas include:
Economics
Social computing
Social theories
LLM agents use modular components to enhance human-like behavior in dynamic settings: a profiling module to define agent roles, a memory module to recall past behaviors, a planning module to decide on future actions, and an action module to translate the agent's decisions into specific outputs.
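The sketch below illustrates this modular structure in plain Python. The class, method names, and prompt wording are hypothetical, and call_llm is a stand-in for any chat-completion API; the point is how a profile, a memory, a planning step, and an action step compose into one agent loop.

```python
# Minimal sketch of a modular LLM agent: profile, memory, planning, action.
# The structure is illustrative; call_llm stands in for a real chat-completion call.
from dataclasses import dataclass, field

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM call (e.g., a chat-completion request)."""
    return "(model-generated decision would appear here)"

@dataclass
class Agent:
    profile: str                                  # profiling module: who the agent is
    memory: list = field(default_factory=list)    # memory module: past observations and behaviors

    def plan(self, observation: str) -> str:
        """Planning module: decide what to do next given the profile and recent memory."""
        context = "\n".join(self.memory[-5:])     # recall only the most recent memories
        prompt = (
            f"You are {self.profile}.\n"
            f"Recent memory:\n{context}\n"
            f"New observation: {observation}\n"
            "Describe, in one sentence, what you do next."
        )
        return call_llm(prompt)

    def act(self, observation: str) -> str:
        """Action module: turn the plan into an output and store it in memory."""
        decision = self.plan(observation)
        self.memory.append(f"Observed: {observation} | Did: {decision}")
        return decision

# Usage: agents with different profiles can be placed in a shared loop to simulate
# interactions, with each agent's output becoming the others' observations.
agent = Agent(profile="a 35-year-old teacher deciding how to vote in a local election")
print(agent.act("A neighbor asks your opinion on the school budget proposal."))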
Evaluation strategies for LLM agents include subjective assessments by human judges, who score or rank agent outputs or try to differentiate them from human outputs, as well as quantitative metrics and standardized benchmarks.
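As one small example of a quantitative check, the snippet below computes how often human judges correctly identify which response in a pair was written by the agent. The judgment values are fabricated placeholders for illustration only; an identification rate near chance suggests the agent outputs were hard to distinguish from human ones.

```python
# Minimal sketch of one quantitative evaluation: how often human judges correctly
# identify the agent-written response in an agent-vs-human pair.
# The judgments below are fabricated placeholders for illustration only.

judgments = [  # True = judge correctly identified the agent-written response
    True, False, True, True, False, False, True, False, False, False,
]

accuracy = sum(judgments) / len(judgments)
print(f"Judge identification accuracy: {accuracy:.2f}")
# Accuracy near 0.5 (chance) means agent outputs were hard to tell apart from
# human outputs; accuracy near 1.0 means they were easy to spot.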
Deploying LLM agents for social simulations requires careful consideration, as model limitations may affect the accuracy of outputs or lead to unintended consequences.
Bias: LLMs tend to give responses that are not representative of the diverse public (see Bias section for more details).
Alignment: LLMs are usually fine-tuned to align with human values, and as a byproduct of fine-tuning, some models tend to be overly agreeable. However, a realistic social simulation of real-world problems may require representing negative human behaviors, which aligned models often restrict.
Low variance of response distributions: LLMs generate less diverse responses than humans would; a simple way to probe this is sketched after this list.
Temporal gaps: The temporal information in LLM training data (e.g., from the internet) is often lost, making it risky to simulate historical contexts or current populations accurately if there's a gap between the model's training data cutoff and the period being modeled.
Cross-linguistic influence: If a model has been trained on a mixture of languages, knowledge and attitudes from one socio-linguistic system may affect others in the model. For instance, the internal representations can be partially language-agnostic, partially biased toward English-centric reasoning, and partially differentiated by language.
Lack of sensory experience: LLMs lack embodied experiences, limiting their understanding of real-world context.
Alien cognition: LLMs may at times deviate from natural human behavior, generating misleading human simulations. For instance, surprises that emerge from analysis may be misconstrued as discoveries when they are mere errors in simulation.
Knowledge boundary: LLMs' vast knowledge can be disadvantageous when simulating scenarios requiring agents to operate with limited or specific knowledge, as they might make decisions based on information real users wouldn't have.
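To probe the low-variance limitation noted above, one option is to sample the same prompt many times and examine how concentrated the responses are. The sketch below assumes the OpenAI Python SDK; the survey-style prompt, model name, and sample size are illustrative.

```python
# Minimal sketch: probing response diversity by repeatedly sampling one prompt.
# Assumes the OpenAI Python SDK and an API key in OPENAI_API_KEY;
# the survey-style prompt, model name, and sample size are illustrative.
from collections import Counter
from openai import OpenAI

client = OpenAI()

prompt = (
    "You are a randomly selected adult respondent. In one word, "
    "what is the most important problem facing the country today?"
)

answers = []
for _ in range(30):
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder model name
        messages=[{"role": "user", "content": prompt}],
        temperature=1.0,
    )
    answers.append(response.choices[0].message.content.strip().lower())

# A human sample would typically spread across many answers; a heavily
# concentrated distribution here signals the low-variance limitation.
print(Counter(answers).most_common())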
Yun Dai
Data Services
yun.dai@nyu.edu