Translation, which is distinct from transliteration (writing the words of one language in the alphabet of another), sits at the intersection of several fields: “sociology, psychology, computer science, information science, and linguistics, its birthplace” (Erton, 2020, p. 1910). Although it’s easy to think of translation as simply finding the “same meaning” in a source and a target text, meaning can be idiomatic or embedded in context (temporal, political, social, and so on). These overlapping factors make translation difficult, and the standard by which a translation is judged is ultimately set by humans. As a result, translation is better understood as an interpretive practice than a definitive one, which makes developing and evaluating machine translation tools particularly challenging.
Machine translation (MT) has existed for decades. As an umbrella term, it refers to the use of computers to translate words and phrases from one language to another. MT tools can differ in several ways:
- Translation can be unidirectional (e.g., from Spanish to Korean only) or bidirectional (e.g., from Spanish to Korean and from Korean to Spanish).
- The input for MT is typically text or audio. This can include the audio track of a video (or a transcription of that audio), or images, where optical character recognition (OCR) converts text that appears in an image into machine-readable characters.
Please note that generative AI (GenAI) translation tools can produce inaccurate output, including errors in translating individual words and in conveying context. It is your responsibility to review outputs to ensure accuracy. Researchers using these tools in their projects must also follow the guidelines of the NYU Institutional Review Board.
Many researchers choose a tool based on what their peers in a department or field commonly use. As a researcher or student, however, it is your responsibility to investigate tools independently. In addition to identifying your own research and accessibility requirements, consider the following criteria.
| Criteria | Considerations |
|---|---|
| Privacy and Security | If a recording or transcription contains personally identifiable information (PII) or someone else's intellectual property, consider the privacy and security of your data carefully. |
| Data Storage | Where your data is stored may or may not raise privacy and security issues, but it has further implications. For instance, do you want to store data in the cloud or locally on your device? |
| Accuracy | Accuracy may depend on the source material and the languages available. Understanding how accurate a tool's translations are can help you estimate how much work you will need to do to correct the output, and how easily that can be done. Because translation is interpretive, as described above, the accuracy of any AI translation tool is open to debate. It is your responsibility to review outputs to ensure accuracy. |
| Features | Does the tool provide features useful to your research, such as: |
| User Experience | |
| Cost | |
Cloud-based services often have long, dense terms of service. When evaluating translation tools, look for information about each service's data security, retention, and privacy practices. You may also want to know whether the service can use your data to improve its AI, and whether you can opt out of this practice.