Research Guides: Text Data Mining: ProQuest Congressional Record TDM

Congressional Record Text-as-Data Collection

The ProQuest Congressional Record text-as-data collection consists of machine-readable files capturing the full text and a small number of metadata fields for a full run of the Congressional Record between 1789 and 2005. Metadata fields include the date of publication, subjects (for issues for which such information exists in the ProQuest system), and URLs linking the full text to the canonical online record for that issue on the ProQuest Congressional platform. A total of 31,952 issues are available.

The collection is restricted to members of the NYU community.

The data are arranged in JSON format, with each file encompassing a single issue. The files are split into three parts:

Part A: years 1789 to 1997
Part B: years 1998 to 2001
Part C: years 2002 to 2005
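
To locate the part that holds a given issue, the year ranges above can be expressed as a small lookup. This is only a sketch: the function name part_for_year is hypothetical, and it returns the part letter rather than a directory name, because the on-disk naming of the three parts is not described here (check the README).

```python
def part_for_year(year: int) -> str:
    """Return the part letter (A, B, or C) covering the given year.

    The ranges mirror the list in this guide; how each part is named
    on disk is not specified here, so only the letter is returned.
    """
    if not 1789 <= year <= 2005:
        raise ValueError(f"{year} is outside the collection's 1789-2005 coverage")
    if year <= 1997:
        return "A"
    if year <= 2001:
        return "B"
    return "C"
```

For example, part_for_year(1999) returns "B", and a year after 2005 raises a ValueError because the collection does not cover it.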

Note that this collection is not updated with the latest issues. The complete full-text Congressional Record, including the most recent issues, can be accessed through ProQuest TDM Studio; however, those full-text materials remain on the ProQuest platform and cannot be downloaded for local use.

The PDF files from which the JSON text was extracted, one per issue, are available on request. Contact data.services@nyu.edu to request access.

The data can be accessed through two options:

Research Workspace

NYU researchers with a valid NetID can mount the cloud-based ds_collections Research Workspace share on any local computer on the NYU network (that is, on campus, or on the NYU VPN if off campus). Follow the instructions for how to access Research Workspace, using ds_collections as the project name. The collection is located at ds_collections/proquest/proquest_congressional_record. A README file with further information about using the files is available there.

NYU High Performance Computing

The files can be found and used directly for batch jobs on the NYU HPC at /scratch/work/public/proquest/proquest_congressional_record. A README file with further information about using the files is available there. To request an HPC account (faculty sponsorship is required), visit the HPC homepage.
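
As a starting point for working with the files, the following sketch walks a directory of issue files and counts the top-level keys it finds. It makes several assumptions that the README should confirm: that the files end in .json, that they may sit in subdirectories, and that each file holds a single JSON object. The metadata field names are not listed in this guide, so the code does not assume any; it only reports the keys that are actually present. The function names iter_issues and summarize_keys are hypothetical.

```python
import json
from collections import Counter
from pathlib import Path

# HPC location given in this guide; on Research Workspace, use the
# path of the mounted share instead.
COLLECTION_DIR = Path("/scratch/work/public/proquest/proquest_congressional_record")


def iter_issues(root, limit=None):
    """Yield (path, parsed JSON) for each issue file under root.

    Assumes files end in .json and may be nested in subdirectories.
    """
    count = 0
    for path in sorted(Path(root).rglob("*.json")):
        if limit is not None and count >= limit:
            break
        with path.open(encoding="utf-8") as fh:
            yield path, json.load(fh)
        count += 1


def summarize_keys(root, limit=100):
    """Count how often each top-level key appears in the first `limit` files."""
    counts = Counter()
    for _path, issue in iter_issues(root, limit):
        if isinstance(issue, dict):  # skip anything that is not a JSON object
            counts.update(issue.keys())
    return counts
```

Calling summarize_keys(COLLECTION_DIR, limit=100) on a small sample is a quick way to discover the actual field names before writing a full batch job over all 31,952 issues.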

Questions?

Contact libraries-tdm@nyu.edu for questions about this data source.
