
Evaluating Generative AI Tools for Academic Research

Support for critical approaches to GenAI tools for academic research.

AI Tools for Discovery

GenAI tools are marketed to perform a number of tasks in the academic research workflow, including brainstorming, searching, summarizing, coding, and more. Evaluating their usefulness starts with understanding the tools themselves (how they are built and how they work), followed by careful validation and human oversight to ensure currency, reliability, authority, purpose (point of view/bias), and truthfulness, weighed against the time it takes to verify outputs.

All search tools that use Generative AI can return misinformation, including fake facts and fabricated citations.

Below is a brief overview of the types of tools/tasks currently available.

GenAI-enhanced Search Tools

This type of feature has been added to tools you are already familiar with:

  • Search engines/sites like Google.com and DuckDuckGo “crawl” the internet, index pages, and rank matches to your search terms by criteria such as relevance and currency. Now, the results may also include an AI-generated summary.

AI summaries do not appear with every search. These AI features are separate functions powered by large language models and other tools (e.g. Google AI Overview uses Google’s Gemini LLM and is “supplemented with aspects of the company’s Search system, like the Knowledge Graph,” according to a May 16, 2024 article in Wired by Reece Rogers).

  • Some Library databases now include generative AI features, including search/keyword suggestions, article summaries, suggestions for research questions, and related resources (e.g. Statista Research AI Assistant, JSTOR). These tools use commercial LLMs as underlying infrastructure together with retrieval-augmented generation (RAG) grounded in the database’s corpus to focus the outputs; this allows AI-generated responses to cite sources from within the database only, or to generate outputs that are relevant to the subject domain of the platform. These features may still generate false or misleading information, and their outputs can be harmful. Outputs must always be independently verified and evaluated.

    NYU Libraries provides access to approximately 1,500 databases. AI features in these resources are evolving rapidly and we are evaluating them as they develop. If you are unsure about whether a specific NYU Libraries database uses AI or about how this will affect your research process, contact a librarian.

Note: ProQuest’s AI Research Assistant was evaluated by NYU librarians in December 2024 and was found to be unsuitable for inclusion in NYU Libraries databases. The tool will be re-evaluated and re-considered at a future date.
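For readers curious about the mechanism, the sketch below shows the basic RAG pattern described above in miniature: retrieve the most relevant passages from a fixed corpus, then build a prompt that instructs the model to answer only from those passages. All names, the keyword-overlap scoring, and the toy corpus are hypothetical simplifications; real database assistants use vector search and a commercial LLM.

```python
# Toy sketch of retrieval-augmented generation (RAG).
# The corpus, scoring method, and prompt format are illustrative only.

def retrieve(query, corpus, k=2):
    """Rank documents by simple keyword overlap with the query."""
    terms = set(query.lower().split())
    scored = sorted(
        corpus,
        key=lambda doc: len(terms & set(doc["text"].lower().split())),
        reverse=True,
    )
    return scored[:k]

def build_prompt(query, docs):
    """Constrain the model to answer only from the retrieved passages."""
    context = "\n".join(f"[{d['id']}] {d['text']}" for d in docs)
    return (
        "Answer using ONLY the sources below and cite their IDs.\n"
        f"{context}\nQuestion: {query}"
    )

corpus = [
    {"id": "S1", "text": "market size of streaming video in 2023"},
    {"id": "S2", "text": "history of the printing press"},
]
docs = retrieve("streaming video market", corpus)
prompt = build_prompt("streaming video market", docs)
```

Note that grounding only narrows what the model sees; as the guide stresses, the generated answer can still misstate what the retrieved sources say, which is why outputs must be verified.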

Chatbots

Chatbots are sometimes referred to as "answer engines" (as opposed to search engines). They generate a synthesized response (output) to a user query (input) using a probabilistic, statistical model. The responses appear as conversational summaries. Some chatbots have conversational memory and can build on and follow up on previous questions and queries.
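The "probabilistic" point above can be made concrete with a toy example: text is generated by repeatedly sampling a likely next word, not by looking up facts. The word table below is hand-written and hypothetical; a real LLM learns such probabilities over a huge vocabulary.

```python
import random

# Toy illustration of probabilistic text generation: the "model" is a
# hand-written table of next-word probabilities, not a real LLM.
next_word_probs = {
    "the": [("study", 0.5), ("citation", 0.3), ("author", 0.2)],
    "study": [("found", 0.7), ("suggests", 0.3)],
}

def sample_next(word, rng):
    """Pick the next word by its probability, not by its truthfulness."""
    words, weights = zip(*next_word_probs[word])
    return rng.choices(words, weights=weights, k=1)[0]

rng = random.Random(0)
sentence = ["the"]
while sentence[-1] in next_word_probs:
    sentence.append(sample_next(sentence[-1], rng))
print(" ".join(sentence))
```

Because each word is chosen for likelihood rather than accuracy, a fluent-sounding output can still be wrong, which is the root of the misinformation risk this guide describes.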

Chatbots can:

  • Have their own platform (e.g. ChatGPT, Claude, Microsoft Copilot, Perplexity), which can be multimodal, offering different GenAI tools on the same platform.
  • Be an add-on/extension (e.g. Perplexity Chrome extension)
  • Be a feature within a library resource (e.g. Statista Research AI Assistant)

Some of these tools generate outputs with hyperlinks to sources/citations to help users verify information. Please note that it is still possible for the outputs to include misinformation. Many of these tools do not cite scholarly, academic sources. The number of sources may be limited. The accuracy and relevance of sources must be verified and evaluated. 

Personalized Knowledge Management and Research Organization Tools

Digital AI Notebooks and Research Organization tools allow users to upload and query documents ranging from PDFs of articles, bibliographies, citations, and links to personal notes and paper drafts written by the user (e.g. Elicit, ChatGPT, Google NotebookLM). The tool then provides summaries, suggests connections between the uploaded materials, and/or allows users to "query" the materials in a chat exchange similar to the tools described above.

These tools use RAG, and their results are subject to the same risks of false, synthetic information and confabulations as other AI-generated outputs.

Note: Exercise caution when uploading material into these tools as there are privacy and legal concerns to consider. 

The materials you upload may include content that should remain private and/or may be subject to restrictive licensing agreements. For instance, PDFs of articles retrieved via NYU Libraries’ resources are for non-commercial, academic purposes only; some companies reserve the right to add uploaded material to their training data or use it for other commercial purposes, and/or the outputs generated may violate licensed use. It is the responsibility of each user to carefully review the terms of service and related policies.

Google Gemini and NotebookLM at NYU

As of February 2025, NYU provides access to Google Gemini and Google NotebookLM to current students, faculty, and staff. The latest information on the NYU instances of these tools including data privacy settings and how to access them can be found via NYU Generative AI (GenAI) Services.

Consider the note above regarding data privacy and restrictions on uploading NYU Libraries resources, along with the rubric below, when integrating these tools into your research workflow.

Citation Chasing and Visualization

Citation chasing is the process of using an article that you have in hand as a starting point, then tracking down other sources connected to that work as you create a literature review (e.g. Elicit, Research Rabbit, Inciteful). Connections are made via citations (i.e. a paper is cited in or cites a connected paper), shared authors, shared subject matter, etc. Some tools generate a visualization that maps these connections.
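Citation chasing can be pictured as walking a graph in which papers are nodes and citations are edges. The sketch below uses a hypothetical four-paper graph and a breadth-first walk; real tools build comparable graphs from large citation indexes.

```python
from collections import deque

# Hypothetical citation graph: each paper maps to the papers it cites.
cites = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}

def chase(start, graph, depth=2):
    """Breadth-first walk outward from a paper in hand, up to `depth` hops."""
    seen, queue = {start}, deque([(start, 0)])
    while queue:
        paper, d = queue.popleft()
        if d == depth:
            continue
        for cited in graph.get(paper, []):
            if cited not in seen:
                seen.add(cited)
                queue.append((cited, d + 1))
    return seen

print(sorted(chase("A", cites)))  # both paths converge on paper "D"
```

In this toy graph, starting from paper "A" surfaces "D" through two different routes, which is exactly the kind of convergence a visualization tool would highlight as a potentially important source.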

Evaluating AI Tools for Discovery

Many researchers select a tool based on what is commonly used by peers in a department or field. As a researcher, it is your responsibility to investigate tools independently. In addition to identifying your own research or accessibility requirements, there are a number of other things to consider.

The table below provides a rubric of criteria and considerations for evaluating AI tools for discovery.

Accuracy and Verification

GenAI tools are based on statistical probability and predicting text, not retrieving information. GenAI has been shown to produce responses with false summaries and/or imaginary citations and sources that do not exist. The reliability and authenticity of every piece of information must be independently verified, regardless of the GenAI tool(s) you are using.
Sources/Training Data

Are the training data sources transparent? Have the sources and data been ethically sourced?

What are the sources of information the tool draws on or what is the training data (these can be different)? Are the subjects and disciplines relevant to your search included?

Does the tool provide verifiable citations in the form of links to information sources? Does it use scholarly databases? Peer-reviewed material? The open web? 

Note: Confabulations occur even in tools that draw data from verified scholarly datasets or databases. 
Privacy and Data Collection

Is the tool cloud-based or run locally on your device? How are data transfer and privacy managed in either case?

How will your inputs (queries, uploaded files, etc.) be used? Are chatbot conversations, research queries, and personal data protected? Will your data be used to train the tool? Are your search queries archived or preserved, or will they be lost if you stop subscribing to the tool?

Licensing

Do you have permission to provide your inputs/files to the platform/tool? (See the note above regarding uploading materials.)
Access and Equity

Does it cost money to use this tool? If so, how much and how much are you willing/able to pay? Are there different tiers with different fees? Is the tool easy or difficult to access?

Note: Free tools might come with trade-offs (e.g. a lack of transparency about the tool, its data retention policy, and/or its usability). If the tool is free and proprietary (as opposed to open source), be sure to read the terms of service closely.

Bias and Ethics

Have the sources and data been ethically sourced? Was the tool trained on copyrighted data? Are the training data sources used with the knowledge and permission of the original creators? Are the results biased or skewed? How will you assess bias and ethical issues in the outputs or the tools themselves?

Currency

How current are the data sources for the tool? Many tools have training-data cut-off dates, and updates to the training data may occur on an ad hoc or regular schedule. How does the tool handle outdated or retracted information? Are you getting the most recent sources and research studies? Does the training data go back far enough?
Relevance

How well do the AI features work for the purpose of your research project? Was it designed for the type of task you are performing? 

Are the sources used for training data, or cited in outputs, of poor quality? Are responses vague, irrelevant, or lacking in detail? Are low-quality sources mixed in with high-quality ones, with no distinction? Is the number of citations generated limited?

The Research Process and Learning Goals

Research is not just a search for answers. The research process should help you craft your research question. It should lead you to new knowledge and new ideas. Does the tool that you're using help, or hurt, this process? Does it take more time and effort to verify outputs than it would to perform the task yourself?

Additional Resources

New to AI? Learn More with These Free Resources: