Research Guides: Evaluating Generative AI Tools for Academic Research: For Transcription

AI Tools for Transcription

AI transcription tools are quickly becoming the norm in academic settings. Sometimes referred to as speech-to-text (STT) or automatic speech recognition (ASR) tools, they may be used in qualitative research to help a researcher capture, reflect on, and analyze the spoken content of audio data. They may also support accessibility in class by helping students transcribe, take notes on, and/or summarize a lecture. Whatever the purpose, think carefully about the ethical and legal implications of using these tools, and get consent ahead of time from all parties to record, transcribe, and store their comments. Although New York State is a one-party consent state, researchers need to follow guidelines from the NYU Institutional Review Board when using these tools in their projects.

Please note that AI transcription tools have been known to add text that was not part of the source audio, attribute text to the wrong speaker, and/or incorrectly transcribe phrases or full sentences. In some cases, these errors can be harmful and offensive (see: Careless Whisper: Speech-to-Text Hallucination Harms by Allison Koenecke et al., 2024), or can create confusion and misrepresent speakers. It is your responsibility to review outputs to ensure accuracy.

Evaluating AI Tools for Transcription

Many researchers select a tool based on what is commonly used by peers in a department or field. As a researcher or student, it is your responsibility to investigate tools independently. In addition to identifying your own research or accessibility requirements, there are a number of other things to consider.

The table below provides a rubric for evaluating AI tools for transcription.
Criteria	Considerations
Privacy and Security	Whether or not a recording or transcription contains personally identifying information (PII) or someone else's intellectual property, the privacy and security of your data should be carefully considered. Do you know how the platform will store and secure your data? What are the company’s data retention and privacy policies? (See Did You Read the Terms of Service? below.) Is the company transparent about any use or sale of your data? Does it reuse your data to help train its tool?
Data Storage	Where your data is stored may or may not relate to privacy and security issues, but it has further implications. For instance, do you want to store data in the cloud or locally on your device? If locally, does your computer have the storage space and processing power to run an automated transcription tool? Note that high-quality audio will lead to a better transcription but will require more processing power. If in the cloud, can you find information about where the company’s servers are located? Does the service limit the size or length of audio files?
Accuracy	Accuracy may depend on factors such as language and sound quality. Understanding how accurate a transcription is can help you assess how much work you will need to do to correct it and how easily that can be done. Can you find information about the tool's “word error rate” (WER), that is, the number of word-level errors (substitutions, omissions, and insertions) relative to the number of words in a human transcription? Does the tool provide an editor you can use to correct the transcript? As stated in the introduction above, AI transcription tools not only transcribe incorrectly but can also append text that was not part of the source audio. In some cases, these errors can be harmful and offensive (see: Careless Whisper: Speech-to-Text Hallucination Harms by Allison Koenecke et al., 2024). It is your responsibility to review outputs to ensure accuracy.
Features	Does the tool provide features useful to your research, such as: languages (transcription quality may vary dramatically across languages); commenting or highlighting; time stamping or indexing; speaker identification; summarizing; sharing transcripts or transcription credits with collaborators?
User Experience	Do you find the tool’s interface easy to navigate and understand? If you have a problem, can you access tech support? How long does the tool take to transcribe your audio? How does the tool export text, and can you customize the format of the transcript?
Cost	Is the tool free and/or open source? Note that free tools might come with trade-offs (e.g., a lack of transparency, privacy, and/or usability). If it is free and proprietary, be sure to read the terms of service especially closely. If there is a cost, how is the cost calculated and how much is it? What happens to your account and data if you stop paying? Does the service have a limited free plan you can use?
Access	NYU provides access to some platforms and tools that help with transcription including Zoom, NYU Stream, or the online version of Microsoft Word. Details on these platforms and more can be found in our Qualitative Data Analysis Research Guide.
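To make the “word error rate” criterion in the table concrete: WER is conventionally computed as (S + D + I) / N, where S, D, and I are the word substitutions, deletions, and insertions needed to turn a tool's transcript into a human reference transcript, and N is the number of words in the reference. The Python sketch below is a minimal illustration of that calculation, not any vendor's published method. It assumes words are simply lowercased and split on whitespace, and it does no punctuation normalization (so “noon.” and “noon” count as different words). The function name word_error_rate is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Sketch of WER = (S + D + I) / N using word-level edit distance.

    Assumption: words are lowercased and split on whitespace only;
    punctuation is not stripped or normalized.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # dist[i][j] = fewest edits to turn the first i reference words
    # into the first j hypothesis words
    dist = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        dist[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        dist[0][j] = j  # insert all j hypothesis words

    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion (word omitted by the tool)
                dist[i][j - 1] + 1,         # insertion (word added by the tool)
                dist[i - 1][j - 1] + cost,  # substitution, or a match if cost is 0
            )

    # Total edits divided by the number of reference words
    return dist[len(ref)][len(hyp)] / len(ref)


reference = "the interview began at noon"
print(word_error_rate(reference, "the interview started at noon"))  # 0.2
print(word_error_rate(reference, "the interview began at noon and ended with a prayer"))  # 1.0
```

The second example shows why a WER figure alone does not rule out hallucinations: the added words count only as insertions, so a transcript that reproduces every spoken word correctly can still contain invented sentences, and those insertions can push WER to 1.0 or higher.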

Did You Read the Terms of Service?

Cloud-based services often have lengthy and dense terms of service. When evaluating transcription tools, you may be interested in finding information about the service’s data security, retention, and privacy practices. You may also want to know whether the service can use your data to improve its AI, and whether you can opt out of this practice.

Some companies have dedicated policies separate from the Terms of Service. For example, Transcribe by Wreally's policies highlight issues of relevance to researchers.
- Transcribe's Data Retention Policy
- Transcribe's Privacy Policy
You can often request more information about a company’s security and privacy practices. For example, Rev asks you to contact their sales team.
In some cases you can ensure your data is not used to train a platform’s AI. Rev allows you to opt out of AI training by emailing support@rev.com.

Third-Party AI Assistants

Third-party AI assistants like OtterPilot (from otter.ai), Read.ai, or Fireflies.ai may automatically record and transcribe your online meetings. Acting as meeting participants, these “assistants” automatically join an online meeting and record the conversation. While these tools may be used for legitimate purposes, it’s good practice to ask meeting attendees for explicit consent to record and transcribe the meeting. We also encourage users to review the Terms of Service carefully before signing up for these tools, which are often free but lack transparency in their privacy policies and data use.

Additional Resources

"Transcription and Qualitative Methods: Implications for Third Sector Research" by C. McMullin (2023)
Abstract: While there is a vast literature that considers the collection and analysis of qualitative data, there has been limited attention to audio transcription as part of this process. In this paper, I address this gap by discussing the main considerations, challenges and implications of audio transcription for qualitative research on the third sector. I present a framework for conducting audio transcription for researchers and transcribers, as well as recommendations for writing up transcription in qualitative research articles.
Citation: McMullin, C. Transcription and Qualitative Methods: Implications for Third Sector Research. Voluntas 34, 140–153 (2023). https://doi.org/10.1007/s11266-021-00400-3
"How to Stop Your Data From Being Used to Train AI" by M. Burgess (2024, April 10; Wired Magazine)
"When the Terms of Service Change to Make Way for A.I. Training" by E. Tan (2024, June 26; The New York Times)
"Third-party AI “Assistants” in Zoom" by San Jose State University
"Careless Whisper: Speech-to-Text Hallucination Harms" by A. Koenecke et al. (2024)
Abstract: Speech-to-text services aim to transcribe input audio as accurately as possible. They increasingly play a role in everyday life, for example in personal voice assistants or in customer-company interactions. We evaluate Open AI’s Whisper, a state-of-the-art automated speech recognition service outperforming industry competitors, as of 2023. While many of Whisper’s transcriptions were highly accurate, we find that roughly 1% of audio transcriptions contained entire hallucinated phrases or sentences which did not exist in any form in the underlying audio. We thematically analyze the Whisper-hallucinated content, finding that 38% of hallucinations include explicit harms such as perpetuating violence, making up inaccurate associations, or implying false authority. We then study why hallucinations occur by observing the disparities in hallucination rates between speakers with aphasia (who have a lowered ability to express themselves using speech and voice) and a control group. We find that hallucinations disproportionately occur for individuals who speak with longer shares of non-vocal durations—a common symptom of aphasia. We call on industry practitioners to ameliorate these language-model-based hallucinations in Whisper, and to raise awareness of potential biases amplified by hallucinations in downstream applications of speech-to-text models.
Citation: Allison Koenecke, Anna Seo Gyeong Choi, Katelyn X. Mei, Hilke Schellmann, and Mona Sloane. 2024. Careless Whisper: Speech-to-Text Hallucination Harms. In The 2024 ACM Conference on Fairness, Accountability, and Transparency (FAccT ’24), June 03–06, 2024, Rio de Janeiro, Brazil. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3630106.3658996
"Lomax & Whisper.cpp" by Matt Miller (2024)
Miller is a librarian and researcher at the Library of Congress. He tested a version of OpenAI's Whisper transcription tool on Alan Lomax’s 1938 Midwest Folk Song Collection at the Library of Congress.
"Researchers say an AI-powered transcription tool used in hospitals invents things no one ever said" by Garance Burke & Hilke Schellmann (2024, October 26; The Associated Press)
"Tech behemoth OpenAI has touted its artificial intelligence-powered transcription tool Whisper as having near “human level robustness and accuracy.”
But Whisper has a major flaw: It is prone to making up chunks of text or even entire sentences, according to interviews with more than a dozen software engineers, developers and academic researchers. Those experts said some of the invented text — known in the industry as hallucinations — can include racial commentary, violent rhetoric and even imagined medical treatments."
