
Data Services Class Descriptions

Information, materials, and schedules for all currently offered Data Services classes.
This session is an intermediate-level class that examines ways to perform data cleaning, transformation, and management using Python. We will look at efficient ways to load data and parse it into a container that is easy to work with in Python, to store it in useful formats, and to perform basic cleaning and transformations typical of mixed string-and-numeric data. Finally, we'll put it all together using a dataset from the NYC Open Data portal.
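The load, parse, store, and clean workflow described above can be sketched with the Python standard library alone. This is a minimal, hypothetical example rather than the class's own material. The column names `borough`, `permit_count`, and `fee`, the sample rows, and the helpers `to_number` and `load_rows` are all invented for illustration. The class itself works with dataframes, and pandas offers `read_csv` and `read_json` for the same first step.

```python
import csv
import io
import json

# Hypothetical sample data with typical mixed string-and-numeric problems:
# stray whitespace, currency symbols, thousands separators, and placeholders.
RAW_CSV = """borough,permit_count,fee
Manhattan, 1200 ,"$1,250.50"
Brooklyn,N/A,$980
Queens,310,
"""

MISSING_MARKERS = {"", "N/A", "NA", "null"}


def to_number(value):
    """Strip whitespace, '$', and ',' and return a float.

    Returns None for empty fields, placeholder markers, or unparseable text.
    """
    cleaned = value.strip().replace("$", "").replace(",", "")
    if cleaned in MISSING_MARKERS:
        return None
    try:
        return float(cleaned)
    except ValueError:
        return None


def load_rows(text):
    """Parse CSV text into a list of dicts, cleaning the numeric columns."""
    reader = csv.DictReader(io.StringIO(text))
    rows = []
    for record in reader:
        rows.append({
            "borough": record["borough"].strip(),
            "permit_count": to_number(record["permit_count"]),
            "fee": to_number(record["fee"]),
        })
    return rows


rows = load_rows(RAW_CSV)
# JSON is one convenient format for storing the cleaned container;
# None is written as null.
as_json = json.dumps(rows, indent=2)
print(as_json)
```

Returning `None` instead of `0` for missing values keeps "not recorded" distinct from a real zero, which matters once the data is summarized. Note that removing every comma assumes US-style thousands separators; data using commas as decimal marks would need a different rule.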
Software: Computer workstations with Anaconda Python/Jupyter Notebook are available for in-person tutorials in Bobst 617. For remote tutorials, some patrons treat the session as a demonstration of the software, while others prefer a more “hands-on” approach and wish to interact with the software during the tutorial. If you plan to follow along, we recommend consulting our supported software page for information on accessing the software before the tutorial.
Duration: 120 min

Room description:

Some tutorials are held remotely and require NYU sign-on to access, while others are held in person without a remote component. Please note the correct modality and location of the tutorial when registering.

Prerequisites:
  • Ability to set and understand the object type of a variable
  • Familiarity with foundational object types (lists, strings, numbers, dictionaries) in Python
  • Familiarity with common data storage file types such as JSON and CSV
  • Comfort with, or willingness to learn more about, dataframe and array objects in Python
  • Comfort with using Jupyter Notebooks for writing code
Skills Taught / Learning Outcomes:
  • Transforming common formats for distributing data (CSV, JSON) into arrays and dataframes for cleaning and analysis
  • Building simple but robust environments in SQLite using Python’s sqlite3 to store and query larger datasets
  • Syntactic data cleaning and refactoring to enable accurate data analysis
  • Parsing incomplete or ill-formed datasets from open sources for robust research use
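For the SQLite skill listed above, a minimal sketch using Python's built-in `sqlite3` module might look like the following. The table name `permits`, its columns, the helper `build_database`, and the rows are hypothetical, chosen to resemble cleaned records where missing values are `None`. An in-memory database stands in for a file on disk; passing a filename instead would persist the data.

```python
import sqlite3

# Hypothetical cleaned records; None marks a missing value.
ROWS = [
    {"borough": "Manhattan", "permit_count": 1200.0, "fee": 1250.5},
    {"borough": "Brooklyn", "permit_count": None, "fee": 980.0},
    {"borough": "Queens", "permit_count": 310.0, "fee": None},
]


def build_database(rows, path=":memory:"):
    """Create a hypothetical 'permits' table and bulk-insert the rows."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS permits ("
        "borough TEXT NOT NULL, permit_count REAL, fee REAL)"
    )
    # 'with conn' commits the inserts on success and rolls back on error.
    with conn:
        # Named placeholders pull each value from the dict by key,
        # so no values are pasted into the SQL string by hand.
        conn.executemany(
            "INSERT INTO permits (borough, permit_count, fee) "
            "VALUES (:borough, :permit_count, :fee)",
            rows,
        )
    return conn


conn = build_database(ROWS)
# Rows with a recorded fee, largest first.
top = conn.execute(
    "SELECT borough, fee FROM permits WHERE fee IS NOT NULL ORDER BY fee DESC"
).fetchall()
# Python None is stored as SQL NULL, so gaps can be counted with IS NULL.
missing = conn.execute(
    "SELECT COUNT(*) FROM permits WHERE permit_count IS NULL"
).fetchone()[0]
conn.close()
print(top, missing)
```

Because incomplete records are kept as `NULL` rather than dropped or zero-filled, later queries can decide explicitly how to treat them.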
Class Materials:
Related Classes:

Data Visualization with Tableau

Data Cleaning Using OpenRefine

Introduction to Jupyter Notebooks

Introduction to Python

Additional Training Materials:

Python for Essential Data Science Training via LinkedIn Learning (NYU NetID required)

Working with SQLite Databases using Python and Pandas

Feedback: bit.ly/feedbackds

 
