In addition to the English-Corpora web interface, NYU users have access to the full raw file archives for computational analysis. The corpora in this package are: iWeb (the Intelligent Web Corpus); NOW (News on the Web); the Coronavirus Corpus; COCA (Corpus of Contemporary American English); GloWbE (Global Web-based English); the Wikipedia Corpus; COHA (Corpus of Historical American English); the TV Corpus; the Movies Corpus; the SOAP Corpus; and the Corpus del Español and Corpus do Português. More information on the corpora is available at https://www.corpusdata.org/corpora.asp
The data can be accessed and downloaded from Research Workspace.
Research Workspace Access Instructions
NYU researchers with a valid NetID can mount the cloud-based ds_collections Research Workspace share on any local computer on the NYU network (i.e., on campus, or on the NYU VPN if off campus). Follow the instructions for how to access Research Workspace, using ds_collections as the project name. The English-Corpora collection will be found at
Contact firstname.lastname@example.org for questions about this data source.
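Once the share is mounted, a useful first step before any computational analysis is to take stock of what the archive contains. The following is a minimal Python sketch of that step, not an official procedure. The function name `inventory` and the mount path in the usage comment are hypothetical; the actual path depends on your operating system and on where the collection sits within the share.

```python
from collections import Counter
from pathlib import Path


def inventory(root):
    """Count files and total bytes per file extension under `root`.

    `root` is whatever local folder you mounted the share at (an
    assumption; this guide does not fix a particular location).
    """
    counts = Counter()
    sizes = Counter()
    for path in Path(root).rglob("*"):
        if path.is_file():
            # Group files with no extension under a readable label.
            ext = path.suffix.lower() or "(none)"
            counts[ext] += 1
            sizes[ext] += path.stat().st_size
    return counts, sizes


# Hypothetical usage; replace the path with your own mount point:
# counts, sizes = inventory("/path/to/mounted/share")
# for ext, n in counts.most_common():
#     print(f"{ext}: {n} files, {sizes[ext] / 1e9:.2f} GB")
```

Walking a large network share can be slow, so it may be worth pointing the function at a single corpus folder first rather than at the whole collection.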