Data Services Class Descriptions

Information, materials, and schedules for all currently offered Data Services classes.
An essential first step in working with your data is processing it to fit your analysis or visualization needs. This tutorial introduces key concepts and techniques that fall under the umbrella of data cleaning, transformation, preparation, munging, wrangling, and tidying. Focusing on tabular data, we'll consider the implications of processing choices and ways to document our activities. We will look at spreadsheet files in Google Sheets and use OpenRefine, an open-source tool for fast cleanup of tabular data in preparation for analysis.
Software: OpenRefine, Google Sheets
Duration: 120 min

Room description:

Some tutorials are held remotely and require an NYU sign-on to access, while others are held in person, without a remote component. Please note the correct modality and location of the tutorial when registering.

Prerequisites:

None

Skills Taught / Learning Outcomes:
  • Perform mass edits on data syntax to enable accurate data analysis
  • Perform automated transformations to save time in cleaning data
  • Split and join cells and columns
  • Perform built-in transformations (changing case, removing leading/trailing whitespace)
  • Be introduced to regular expressions and GREL for advanced transformations
  • Learn how to use the OpenRefine interface and how to import and export datasets
  • Understand how OpenRefine documents changes to datasets to enable reproducible scholarship
  • Reflect on the implications of processing choices and best practices for documentation
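
For readers who want a feel for these transformations before the session, the sketch below shows the same kinds of operations (trimming whitespace, changing case, splitting multi-valued cells, and a regular-expression rewrite) in plain Python. It is only an illustration of the ideas, not OpenRefine or GREL itself; the column names, sample rows, the "+" separator, and the MMDDYYYY date layout are all assumptions made up for this example and are not taken from the class dataset.

```python
import re


def clean_cell(value: str) -> str:
    """Trim leading/trailing whitespace and collapse internal runs of whitespace."""
    return re.sub(r"\s+", " ", value.strip())


def split_cell(value: str, sep: str = "+") -> list[str]:
    """Split a multi-valued cell into parts, cleaning each part and skipping empties."""
    return [clean_cell(part) for part in value.split(sep) if part.strip()]


def normalize_date(value: str) -> str:
    """Rewrite an MMDDYYYY string as YYYY-MM-DD; leave any other value unchanged."""
    match = re.fullmatch(r"(\d{2})(\d{2})(\d{4})", value.strip())
    if not match:
        return value
    month, day, year = match.groups()
    return f"{year}-{month}-{day}"


# Hypothetical rows, invented for illustration only.
rows = [
    {"fur_color": "  gray ", "colors_seen": "Gray+Cinnamon", "date": "10142018"},
    {"fur_color": "GRAY", "colors_seen": "Black", "date": "10062018"},
]

# Apply the same transformation to every row, as a mass edit would.
cleaned = [
    {
        "fur_color": clean_cell(row["fur_color"]).title(),
        "colors_seen": split_cell(row["colors_seen"]),
        "date": normalize_date(row["date"]),
    }
    for row in rows
]

for row in cleaned:
    print(row)
```

Keeping each transformation in a small named function is one way to document what was done to the data, which is similar in spirit to the record of changes that OpenRefine keeps for a project.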

Class Materials:

2018 Squirrel Census data

Related Classes:

Data Cleaning and Management Using Python

Data Wrangling in R

Data Wrangling in Stata

Additional Training Materials:

Websites:

Exercises/Projects:

Feedback: bit.ly/feedbackds

Upcoming sessions for this tutorial