
Health Statistics: Search Strategies

A guide to locating numerical data and using health statistics.

Overview of Data Search Strategies

Searching the literature may help you locate statistics embedded in a study or associated with an organization or government entity that collects and aggregates data. It can also give you clues about which data sets others are using to investigate your topic, even when no specific data set is named. Start with the Published Works tab above.

Who cares about this information? Statistics cost a lot to collect, so ask who cares enough about the information to collect it. The most likely collectors include governments, marketers, trade groups, and advocacy associations. Depending on your subject area, finding useful statistics can be very challenging.

Here are a few things to think about when trying to find a statistic:

  • The most recent statistic may not be from this year. Because statistics take time and money to collect and disseminate, the most recent ones may be a few years old.
  • Follow the trail. Finding statistics can be an exercise in detective work. Always look at the source of the statistic. If you read an article and it cites a source, e.g., the CDC or Pew Research, consult that source. It may provide additional statistics or context that wasn't referenced in the article.
  • Evaluate the source. As with all information, you should evaluate the source providing the statistic. Is it biased? Is the group or website reliable? Does it provide access to the data that the statistic came from?
  • Read the statistic carefully. Pay close attention to any information provided about how the statistic was collected. You don't want to misrepresent the statistic or its significance in your own writing.

Data and statistics are related, but they are not exactly the same. Understanding the difference is helpful as you conduct your search!

Here is an over-simplified explanation of the difference:

Data

  • Data is the raw material produced by research, administrative record-keeping, scientific instruments, or other collection methods.
  • Most often, a row represents a record/observation/case and a column represents a variable.
  • When dealing with data, the unit of analysis is important: what was being studied? What is represented in each row? A person? A household? A country?
  • There are also different types of variables. Sometimes a number is just as it seems, but often a number is a code that stands for something.
    • For example, in the Age column, "40" means that person is age 40. But in the "Gender" column, maybe "1" means "female" and "2" means "male." 
  • Data is designed to be read by a machine. For a human to make sense of a dataset, we need documentation or a codebook, which tells us what the rows represent, what the variable names mean, what the codes mean, how the data was collected, and everything else we need to understand the data.
  • For a more detailed explanation, visit the UCLA Data Archive resource, About Social Science Data.
  • Data usually looks something like this:
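
As a rough illustration of the points above, here is a minimal Python sketch of a tiny dataset and its codebook. The rows, the variable names, and the `decode` helper are all hypothetical and made up for this example; the gender codes follow the "1 means female, 2 means male" example in the list above. Real datasets are far larger and come with their own documentation.

```python
# Hypothetical codebook: maps each coded variable to the meaning of its codes.
CODEBOOK = {
    "gender": {1: "female", 2: "male"},
}

# Hypothetical dataset: each row is one person (the unit of analysis),
# and each key is a variable.
ROWS = [
    {"id": 1, "age": 40, "gender": 1},
    {"id": 2, "age": 27, "gender": 2},
    {"id": 3, "age": 63, "gender": 1},
]


def decode(row, codebook):
    """Return a copy of row with coded values replaced by their labels."""
    decoded = {}
    for variable, value in row.items():
        labels = codebook.get(variable)
        # Variables without a codebook entry (such as age) are literal values.
        decoded[variable] = labels[value] if labels else value
    return decoded


for row in ROWS:
    print(decode(row, CODEBOOK))
```

Without the codebook, the "1" in the gender column could not be told apart from a literal number, which is why documentation matters when working with raw data.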



Statistics

  • Statistics are an aggregated description of a dataset; they interpret or summarize the dataset. 
  • Statistics are produced by some kind of analysis of a dataset.
  • When you find statistics, they're usually "ready to go" -- you can use them immediately in their current form. They're made to be read by humans!
  • Statistics usually look something like this:


Source: QuickFacts from the United States Census Bureau

 

Here's an example:

According to the United States Census Bureau, 50.8% of the population of the United States is female: this is a statistic.

The statistic mentioned above was calculated from the Census Bureau's Decennial Census SF1 dataset, which has 311,591,917 cases/rows—one representing each person in the U.S.—each of which has an entry in the Sex variable/column.

Conveniently, someone at the Census Bureau analyzed the SF1 dataset to produce the statistic above; if that's what you were looking for, then great! However, if you wanted to do your own analysis, for example, examining Sex, Occupation, and Age (and there wasn't another statistic available to tell you what you wanted to know), then you'd need to use the dataset.
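
To make the distinction concrete, here is a minimal Python sketch under stated assumptions: the five records below are invented for illustration and are not Census data, and the function names are hypothetical. The first function produces a ready-made statistic (percent female); the second is the kind of custom analysis (here, mean age by sex and occupation) that requires access to the underlying rows.

```python
from collections import defaultdict

# Hypothetical person-level records; the values are made up for illustration.
PEOPLE = [
    {"sex": "female", "occupation": "nurse", "age": 34},
    {"sex": "male", "occupation": "nurse", "age": 41},
    {"sex": "female", "occupation": "teacher", "age": 52},
    {"sex": "female", "occupation": "nurse", "age": 28},
    {"sex": "male", "occupation": "teacher", "age": 45},
]


def percent_female(rows):
    """A statistic: the share of rows whose sex is 'female', as a percentage."""
    females = sum(1 for row in rows if row["sex"] == "female")
    return 100 * females / len(rows)


def mean_age_by_sex_and_occupation(rows):
    """A custom analysis: mean age for each (sex, occupation) group."""
    ages = defaultdict(list)
    for row in rows:
        ages[(row["sex"], row["occupation"])].append(row["age"])
    return {group: sum(values) / len(values) for group, values in ages.items()}


print(f"{percent_female(PEOPLE):.1f}% female")
for group, mean_age in sorted(mean_age_by_sex_and_occupation(PEOPLE).items()):
    print(group, mean_age)
```

A published statistic answers only the first kind of question. Any question that combines variables in a way nobody has already tabulated needs the row-level dataset.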

 

