AI transcription tools are quickly becoming the norm in academic settings. Sometimes referred to as speech-to-text (STT) or automatic speech recognition (ASR) tools, they may be used in qualitative research to help a researcher capture, reflect on, and analyze spoken aspects of audio data, or to support accessibility in class by helping students transcribe, take notes on, or summarize a lecture. No matter the purpose, you should think carefully about the ethical and legal implications of using these types of tools and get consent from all parties ahead of time to record, transcribe, and store their comments. Although New York State is a one-party consent state, researchers need to follow guidelines from the NYU Institutional Review Board when using these tools for their project(s).
Please note that AI transcription tools have been known to add text that was not part of the source audio, attribute text to the wrong speaker, and incorrectly transcribe phrases or full sentences. In some cases, these errors can be harmful and offensive (see: Careless Whisper: Speech-to-Text Hallucination Harms by Allison Koenecke et al., 2024), or can create confusion and misrepresent speakers. It is your responsibility to review outputs to ensure accuracy.
Many researchers select a tool based on what is commonly used by peers in a department or field. As a researcher or student, it is your responsibility to investigate tools independently. In addition to identifying your own research or accessibility requirements, there are a number of other things to consider.
Criteria | Considerations |
---|---|
Privacy and Security | Whether a recording or transcription contains personally identifying information (PII) or the intellectual property of someone else, the privacy and security of your data should be carefully considered. |
Data Storage | Data storage may or may not relate to privacy and security issues, but where your data is stored has further implications. For instance, do you want to store data on the cloud, or locally on your device? |
Accuracy | Accuracy may depend on some of the features, including language or sound quality, but an understanding of how accurate a transcription is can help you assess how much work you may need to do to correct it and how easily that can be done. As stated in the introduction above, AI transcription tools not only incorrectly transcribe, but can also append text that was not part of the source audio. In some cases, these errors can be harmful and offensive (see: Careless Whisper: Speech-to-Text Hallucination Harms by Allison Koenecke et al., 2024). It is your responsibility to review outputs to ensure accuracy. |
Features | Does the tool provide features useful to your research such as: |
User Experience | |
Cost | |
Access | NYU provides access to some platforms and tools that help with transcription including Zoom, NYU Stream, or the online version of Microsoft Word. Details on these platforms and more can be found in our Qualitative Data Analysis Research Guide. |
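For the Accuracy criterion in the table above, one rough way to gauge how much correction a tool's output may need is to correct a short sample by hand and compare it with the tool's version using word error rate (WER): the number of word substitutions, deletions, and insertions divided by the number of words in the corrected reference. The Python sketch below is an illustration only, not a method this guide prescribes. The function name `word_error_rate` and the sample sentences are hypothetical, and the comparison assumes simple lowercasing and whitespace splitting, so punctuation differences count as errors.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    reference:  a transcript you have corrected by hand (assumed accurate)
    hypothesis: the tool's output for the same audio
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic dynamic-programming table,
    # kept one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word missing from the output
                curr[j - 1] + 1,     # extra word added by the tool
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


# Hypothetical sample: one substituted word and two words the tool added.
corrected = "the participant described her first year"
tool_output = "the participant describes her first year thank you"
print(word_error_rate(corrected, tool_output))  # 0.5
```

A single number like this does not separate harmless slips from harmful insertions, so it can only supplement, not replace, a careful read-through of the transcript.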
Cloud-based services often have lengthy and dense terms of service. When evaluating transcription tools, look for information about the service’s data security, retention, and privacy practices. You may also want to know whether the service can use your data to improve its AI, and whether you can opt out of this practice.
Third-party AI assistants like OtterPilot (from otter.ai), Read.ai, or Fireflies.ai may automatically record and transcribe your online meetings. Acting as a user, these “assistants” will automatically join an online meeting and record the conversation. While these tools may be used for legitimate purposes, it’s good practice to ask for explicit consent from meeting attendees to record and transcribe the meeting. We also encourage users to review the Terms of Service carefully before signing up for these tools, which are often free but lack transparency in their privacy policies and data use.