
Data Sources

An overview of data sources at NYU and beyond

Browse the tabs on this guide to learn about many of the data sources available at NYU and on the web.

Keep in mind that what is described here is usually at the resource level, rather than at the dataset level or the variable level. Think about how your topic might be classified. For example, you won't find any mention of firefighters on this guide, but you will find the Bureau of Labor Statistics, which provides data about people in different professions, including firefighting.

Search the literature

This will give you clues about which data sets others are using to investigate your topic, and even if no specific data sets are mentioned, you might learn about the organizations, government entities, or others that are likely to collect related data. (Need help with your literature search? Check out our other research guides, find your subject librarian, or use Ask a Librarian.) Another great resource is the ICPSR Bibliography of Data-Related Literature, which allows you to search for articles and other publications first, and then link directly to the datasets used by those articles.

Who cares about this information? Statistics cost a lot to collect. Who cares enough about the information to collect it? Some of the most likely stats collectors include governments, marketers, trade groups, and advocacy associations. Depending on your subject area, finding useful statistics can be very challenging.

Here are a few things to think about when trying to find a statistic:

  • The most recent statistic may not be from this year. Because statistics take time and money to collect and disseminate, the most recent ones may sometimes be a few years old.
  • Follow the trail. Finding statistics can sometimes be an exercise in detective work. Always look at the source of the statistic. If you read an article and it cites a source, e.g., the CDC or Pew Research, consult that source. It may provide additional statistics or context that wasn't referenced in the article.
  • Evaluate the source. As with all information, you should evaluate the source providing the statistic. Is the source biased? Is the group or website reliable? Does it provide access to the data that the statistic came from?
  • Read the statistic carefully. Pay close attention to any information provided about how the statistic was collected. You don't want to misrepresent the statistic or its significance in your own writing.

Data and Statistics are related, but not exactly the same. Understanding the difference is helpful as you conduct your search!

Here is an oversimplified explanation of the difference:

Data

  • Data is the raw material produced by research, administrative record-keeping, scientific instruments, or other collection methods.
  • Most often: a row represents a record/observation/case and a column represents a variable. 
  • When dealing with data, the unit of analysis is important: what was being studied? What is represented in each row? A person? A household? A country?
  • There are also different types of variables. Sometimes a number is just as it seems, but often a number is a code that stands for something.
    • For example, in the "Age" column, "40" means that person is age 40. But in the "Gender" column, maybe "1" means "female" and "2" means "male."
  • Data is designed to be read by a machine. For a human to make sense of a dataset, we need documentation or a codebook, which tells us what the rows represent, what the variable names mean, what the codes mean, how the data was collected, and everything else we need to understand the data.
  • Data usually looks something like this:
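
As a rough illustration of the points above, here is a minimal Python sketch of a tiny dataset and its codebook. The file contents, the column names (`person_id`, `age`, `gender`), and the codes are invented for this example; a real dataset's documentation defines its own.

```python
import csv
import io

# Hypothetical raw data file: one row per person, one column per variable.
RAW = """person_id,age,gender
1,40,1
2,27,2
3,63,1
"""

# Hypothetical codebook: "age" is a plain number, "gender" is a coded value.
CODEBOOK = {"gender": {"1": "female", "2": "male"}}

def decode(row, codebook):
    """Replace coded values with their labels; leave other variables unchanged."""
    return {
        name: codebook.get(name, {}).get(value, value)
        for name, value in row.items()
    }

# csv.DictReader yields one dict per row, keyed by the column names.
records = [decode(row, CODEBOOK) for row in csv.DictReader(io.StringIO(RAW))]
for record in records:
    print(record)
```

Without the codebook, the "1" and "2" in the gender column would be unreadable to a human, which is why documentation matters as much as the data file itself.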



Statistics

  • Statistics are an aggregated description of a dataset; they interpret or summarize the dataset. 
  • Statistics are produced by some kind of analysis of a dataset.
  • When you find statistics, they're usually "ready to go" -- you can use them immediately in their current form. They're made to be read by humans!
  • Statistics usually look something like this:


Source: QuickFacts from the United States Census Bureau

 

Here's an example:

According to the United States Census Bureau, 50.8% of the population of the United States is female: this is a statistic.

The statistic mentioned above was calculated from the Census Bureau's Decennial Census SF1 dataset, which has 311,591,917 cases/rows—one representing each person in the U.S.—each of which has an entry in the Sex variable/column.

Conveniently, someone at the Census Bureau analyzed the SF1 dataset to produce the statistic above; if that's what you were looking for, then great! However, if you wanted to do your own analysis, for example, examining Sex, Occupation, and Age (and there wasn't another statistic available to tell you what you wanted to know), then you'd need to use the dataset.
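
The difference can be sketched in Python under some loud assumptions: the five rows, the sex codes, and the occupations below are made up for illustration and do not come from any Census Bureau file. The first number is the kind of ready-made statistic you might find published; the second result is a custom breakdown that can only be produced from the dataset itself.

```python
from collections import defaultdict

SEX_LABELS = {1: "female", 2: "male"}  # hypothetical codebook

# Hypothetical person-level rows: (sex code, occupation, age).
ROWS = [
    (1, "teacher", 40),
    (2, "teacher", 27),
    (1, "engineer", 63),
    (2, "engineer", 35),
    (1, "teacher", 51),
]

def percent_female(rows):
    """Share of rows coded female, in percent, rounded to one decimal."""
    females = sum(1 for sex, _, _ in rows if SEX_LABELS[sex] == "female")
    return round(100 * females / len(rows), 1)

# A ready-made statistic: one number summarizing the whole dataset.
overall = percent_female(ROWS)

# A custom analysis that needs the dataset itself: percent female by occupation.
by_occupation = defaultdict(list)
for row in ROWS:
    by_occupation[row[1]].append(row)
custom = {occ: percent_female(group) for occ, group in by_occupation.items()}

print(overall)  # 60.0
print(custom)   # {'teacher': 66.7, 'engineer': 50.0}
```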

 

Currently Offered Classes in Data Finding

NYU Data Services provides a number of introductory classes and training resources for understanding the principles of data discovery. To request a specialized or customized class session, please fill out our request form.