In addition to the English-Corpora web interface, NYU users have access to the full raw file archives for computational analysis. The package includes the following corpora: iWeb (the Intelligent Web Corpus); NOW (News on the Web); the Coronavirus Corpus; COCA (Corpus of Contemporary American English); GloWbE (Global Web-based English); the Wikipedia Corpus; COHA (Corpus of Historical American English); the TV Corpus; the Movies Corpus; the SOAP Corpus; the Corpus del Español; and the Corpus do Português. More information on the corpora is available at https://www.corpusdata.org/corpora.asp
The data can be accessed and downloaded from Research Workspace. Further documentation about the collection is available on its home record on the UltraViolet research repository at https://doi.org/10.58153/c927w-hjr36.
Research Workspace Access Instructions
NYU researchers with a valid NetID can mount the cloud-based ds_collections Research Workspace share on any local computer on the NYU network (i.e., on campus, or on the NYU VPN if off campus). Follow the instructions for how to access Research Workspace, using ds_collections as the project name. The English-Corpora collection will be found at ds_collections/english-corpora.
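Once the share is mounted, a quick inventory of the collection can help you decide what to copy for analysis. The following is a minimal Python sketch, not an official NYU tool. It assumes the share is mounted at /Volumes/ds_collections, which is a hypothetical path; the actual mount point depends on your operating system and on how you connected. The function name summarize_archive is likewise illustrative. The sketch makes no assumptions about the file formats inside the collection and simply counts files by extension.

```python
from collections import Counter
from pathlib import Path


def summarize_archive(root):
    """Count the files under `root` (recursively) by lowercase extension."""
    root = Path(root)
    if not root.is_dir():
        raise FileNotFoundError(f"{root} is not a mounted directory")
    counts = Counter()
    for path in root.rglob("*"):
        if path.is_file():
            # Files without an extension are grouped under a placeholder key.
            counts[path.suffix.lower() or "(no extension)"] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical mount point: replace with wherever the share is mounted.
    mount_point = Path("/Volumes/ds_collections")
    try:
        summary = summarize_archive(mount_point / "english-corpora")
    except FileNotFoundError as err:
        print(f"Share not available: {err}")
    else:
        for suffix, count in summary.most_common():
            print(f"{suffix}\t{count}")
```

If the script reports that the share is not available, check that you are on the NYU network or VPN and that the mount point in the script matches your own.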
Questions?
Contact data.services@nyu.edu for questions about this data source.