Data Sources: Data Sources for Instructional purposes
This list contains data products that are especially valuable for instructional purposes.
These sources are particularly useful for teaching because of their presentation, ease of access, relatively "clean" data, and topical and subject diversity. Many provide thorough documentation, detailed code books, and contextualizing materials, and some even provide pre-made lessons that accompany the datasets. These collections have been licensed by the NYU Libraries and have datasets that go beyond what is freely available on the web in general, with resources that have been selected, manipulated, and/or harmonized to increase their utility.
Nearly all of these sources offer download of data in at least .csv or .xls format, and often in a statistical package-ready (SPSS, STATA, etc.) format.
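As a rough illustration of what working with a downloaded .csv file might look like in a classroom exercise, the sketch below reads rows with Python's standard library and averages one column by group. The sample data, the column names (state, year, median_income), and the helper functions load_rows and mean_by_group are all hypothetical and do not come from any of the sources listed here.

```python
import csv
import io
from collections import defaultdict

# Hypothetical sample standing in for a downloaded .csv file;
# the column names and values are invented for illustration only.
SAMPLE_CSV = """state,year,median_income
NY,2019,72000
NY,2020,74000
CA,2019,80000
CA,2020,82000
"""


def load_rows(text_stream):
    """Read CSV rows into a list of dicts keyed by the header row."""
    return list(csv.DictReader(text_stream))


def mean_by_group(rows, group_col, value_col):
    """Average a numeric column within each value of group_col."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for row in rows:
        totals[row[group_col]] += float(row[value_col])
        counts[row[group_col]] += 1
    return {key: totals[key] / counts[key] for key in totals}


rows = load_rows(io.StringIO(SAMPLE_CSV))
averages = mean_by_group(rows, "state", "median_income")
print(averages)
```

To try this with a real download, the io.StringIO(SAMPLE_CSV) argument could be swapped for a file opened with open(path, newline=""), with the column names adjusted to match the file's header row.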
This easy-to-use resource contains datasets sourced from reputable public and private organizations, arranged into topics across 16 broad subject areas, including education; population and income; industry, commerce, and trade; and housing and construction. Datasets can be filtered, manipulated, and downloaded for offline analysis or lesson building.
Statista aggregates data on markets, industry, politics, economics and population. Sources include trade publications, market and opinion research institutions, government sources, business and economic databases, reference publications, media sources and scholarly articles. It has an international scope and covers over 600 industries, and includes data from hard-to-find industries like sports, entertainment, health, and pharmaceuticals. Sources are downloadable for offline analysis.
ICPSR is a data repository of curated datasets drawn from the social science disciplines. In addition to detailed codebooks and descriptions of methodology, it offers downloads in multiple file formats for offline analysis. ICPSR is one of the few tools that provide variable-level searching, which allows for discovery of interesting and unique datasets. ICPSR also includes data-driven learning guides for many of its datasets, which can help instructors incorporate these sources into teaching.
Statistical Insight provides indexing and abstracts for federal, state, and international statistical publications as well as selected business and professional publications containing statistics. This resource includes all content from a number of classic statistical indexes like the American Statistics Index.
The Social Science Electronic Data Library (SSEDL) contains datasets organized into nine topical data archives centered on health and the family: AIDS/STDs, Disability, the American Family, Adolescent Pregnancy and Pregnancy Prevention, Aging, Maternal Drug Abuse, Child Well-Being and Poverty, Complementary and Alternative Medicine, and Contextual (Geographic) Data. Sources are downloadable and include full codebooks and documentation.
To access the data in the Social Science Electronic Data Library (SSEDL), click on "Browse" and select "Data."
This exhibition index dataset was compiled by a project team from the MoMA Archives as part of their work to preserve, describe, and open to the public over 22,000 folders of exhibition records dating from 1929 to 1989 from its registrar and curatorial departments.
Social Explorer provides access to U.S. Census data dating back to 1790. Users can interact with Social Explorer via tables or maps. All materials are downloadable for offline analysis in multiple formats.
To access Social Explorer, create a user account after signing in to NYU's SSO. If you have questions about authenticating your NYU Social Explorer account from off-campus, please contact Data Services.
Data.Census.Gov, the interface that replaces American FactFinder, is the portal to US Census data and includes data from the American Community Survey, Population Estimates, Economic Census, and Annual Economic Surveys. Users can generate simple visualizations, extract reports, and explore census data by theme or topic.
The Statistical Abstract of the United States is the authoritative and comprehensive summary of statistics on the social, political, and economic organization of the United States. The Census Bureau ceased production of the Statistical Abstract in 2012; newer versions are compiled and distributed by ProQuest. Sources of data include the Census Bureau, Bureau of Labor Statistics, Bureau of Economic Analysis, and many other Federal agencies and private organizations. Most materials are downloadable for offline analysis.
"The National Center for Health Statistics (NCHS)... offers downloadable public-use data files through the Centers for Disease Control and Prevention's (CDC) FTP file server. Users of this service have access to data sets, documentation, and questionnaires from NCHS surveys and data collection systems." (From the site)
These data provide insights into the performance of schools eligible to receive federal financial aid, and offer a look at the outcomes of students at those schools. The dataset includes indicators on earning potential after college, average student debt, and more. It comes with a robust API.
UNData contains over 55 million data points covering a wide range of themes including Agriculture, Education, Employment, Energy, Environment, Health, HIV/AIDS, Human Development, Industry, Information and Communication Technology, National Accounts, Population, Refugees, Tourism, Trade, as well as the Millennium Development Goals indicators.
OECD iLibrary is the online portal for the Organization for Economic Co-operation and Development. The iLibrary includes book collections, policy reports, statistical abstracts, and the OECD.Stat tool, with which users can create custom extracts from the OECD data warehouse.
Political Risk Yearbook provides political risk reports for the current year for many countries. CountryData allows users to generate exportable tables with political risk rankings and economic indicators for current and historical years, as well as current forecasts, for various countries.
Country Data is available on campus only. Use NYU VPN for off-campus access.
Other interesting datasets
There are a number of other datasets of interest listed here that particularly lend themselves to teaching advanced quantitative analysis and data manipulation techniques.
Users can access Gallup's U.S. Daily tracking and World Poll data to compare residents' responses region by region and nation by nation to questions on topics such as economic conditions, government and business, health and wellbeing, infrastructure, and education. Materials are downloadable as .xls for offline analysis.
Data from RealtyTrac covers property foreclosures that took place in the United States between January 1, 2005 and January 2016. The data, drawn from information released in public tax records across the United States, is not available to the public. It includes information on property characteristics, location, ownership, and use.
These sources contain pre-built instructional materials.
There are a handful of resources that specialize in presenting datasets along with added instructional materials. These products contain contextual documentation and often lesson plans and curriculum building tools.
ICPSR has a special section devoted to teaching resources. These resources use ICPSR's vast repository of data as raw materials to teach concepts and methods in the social sciences. Click on "Teaching and Learning with ICPSR" or go directly to their teaching resources, or go to their Resources for Teachers list for links to videos and other resources for instructional purposes.
SAGE Research Methods datasets collection includes datasets alongside sample lessons on various methods and topic areas, and high-quality, short videos to introduce research concepts. Click on "Content" and then "Datasets" or "Videos."
TeachingWithData.org is a portal where faculty can find resources and ideas to reduce the challenges of bringing real data into post-secondary classes. Using real data is a great way for students to become more engaged in the content of a course, but significant barriers, largely in terms of instructor preparation, can make using data a challenge.
Interact with microdata from several national surveys, including the General Social Survey, American National Election Studies, and others, using the online Survey Documentation & Analysis (SDA) interface.
"These pages contains examples (often hypothetical) illustrating the application of different statistical analysis techniques using different statistical packages. Each page provides a handful of examples of when the analysis might be used along with sample data, an example analysis and an explanation of the output, followed by references for more information. These pages merely introduce the essence of the technique and do not provide a comprehensive description of how to use it" (from the site).
An online video training library with over one hundred thousand expert-led videos and thousands of courses. lynda.com includes a number of videos on data cleaning and analysis, as well as on many common software platforms like Excel. Access to lynda.com is available in the Academics and Work tabs of NYUHome as well as through this link.
Formerly Lynda.com, LinkedIn Learning provides the same services and access to resources via the LinkedIn platform.
This OSF project page collects case studies of research data-related events hosted or co-hosted by academic libraries. Case studies collected for the panel describe four flavors of event: the Center for Open Science Workshop on Reproducible Research, Data Carpentry, Software Carpentry, and Day of Data. These events are potentially useful for developing learning opportunities with data.
data-8 is an introduction to Data Science course at UC Berkeley. The course combines three perspectives: inferential thinking, computational thinking, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? All course materials, including slides, are posted online.
PolicyMap is a data extraction tool that offers a range of public demographic data. These resources include several discipline-based exercises, sample data sets, lesson plans, and ideas for teaching with data.