Research Guides: Data Services Class Descriptions: Data Cleaning and Management Using Python

This session is an intermediate-level class that examines ways to perform data cleaning, transformation, and management using Python. We will look at efficient ways to load data and parse it into a container for ease of use in Python, to store it in helpful formats, and to perform the basic cleaning and transformations typical for mixed string-and-numeric formats. Finally, we'll put it all together using a dataset from the NYC Open Data portal.
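As a rough illustration of the kind of mixed string-and-numeric cleaning described above, the following is a minimal sketch using only the Python standard library. The sample data, the column names, and the helper names `to_number` and `clean_rows` are all invented for this example; they are not taken from the class materials or from any NYC Open Data dataset.

```python
import csv
import io

# Hypothetical sample data, invented for illustration only.
RAW_CSV = """borough,complaints,avg_response_hours
Brooklyn,"1,204",3.5 hrs
 Queens ,987,N/A
BRONX,,2.25 hrs
"""


def to_number(text):
    """Return a float parsed from a messy string, or None if no number is present."""
    # Remove surrounding whitespace and thousands separators.
    cleaned = text.strip().replace(",", "")
    # Drop a trailing unit such as "hrs" by keeping only the first token.
    token = cleaned.split()[0] if cleaned else ""
    try:
        return float(token)
    except ValueError:
        # Empty strings and placeholders such as "N/A" become missing values.
        return None


def clean_rows(raw_text):
    """Parse CSV text into a list of dictionaries with normalized values."""
    reader = csv.DictReader(io.StringIO(raw_text))
    rows = []
    for record in reader:
        rows.append({
            "borough": record["borough"].strip().title(),
            "complaints": to_number(record["complaints"]),
            "avg_response_hours": to_number(record["avg_response_hours"]),
        })
    return rows


rows = clean_rows(RAW_CSV)
for row in rows:
    print(row)
```

Missing or unparseable values are kept as `None` rather than replaced with zero, so that later analysis can tell "no data" apart from a true zero.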

Software:	Computer workstations with Anaconda Python/Jupyter Notebook are available for in-person tutorials in Bobst 617. For remote tutorials, some patrons treat the session as a demonstration of the software, while others prefer a hands-on approach and want to work with the software during the tutorial. If you plan to follow along, we recommend consulting our supported software page before the tutorial for information on accessing the software.
Duration:	120 min
Room description:	Some tutorials are held remotely and require NYU sign-on to access, while others are held in person, without a remote component. Please note the correct modality and location of the tutorial when registering.
Prerequisites:	Ability to set and understand the object type of a variable; familiarity with foundational object types (lists, strings, numbers, dictionaries) in Python; familiarity with common data storage file types such as JSON and CSV; comfort with, or willingness to learn more about, dataframe and array objects in Python; comfort with using Jupyter Notebooks for writing code.
Skills Taught / Learning Outcomes:	Transforming common formats for distributing data (CSV, JSON) into arrays and dataframes for cleaning and analysis; building simple but robust environments in SQLite using Python’s sqlite3 to store and query larger datasets; syntactic data cleaning and refactoring to enable accurate data analysis; parsing incomplete or ill-formed datasets from open sources for robust research use.
Class Materials:
Related Classes:	Data Visualization with Tableau; Data Cleaning Using OpenRefine; Introduction to Jupyter Notebooks; Introduction to Python
Additional Training Materials:	Python for Essential Data Science Training via LinkedIn Learning (NYU NetID required); Working with SQLite Databases using Python and Pandas
Feedback:	bit.ly/feedbackds
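
One of the learning outcomes listed above is storing and querying data in SQLite through Python's sqlite3 module. The following is a minimal sketch of that workflow, assuming a small set of JSON records. The records, the table name `complaints`, and its columns are hypothetical and chosen only for illustration; the class itself may structure this differently.

```python
import json
import sqlite3

# Hypothetical JSON records, invented for illustration only.
RAW_JSON = """
[
  {"id": 1, "borough": "Brooklyn", "complaints": 1204},
  {"id": 2, "borough": "Queens", "complaints": 987},
  {"id": 3, "borough": "Brooklyn", "complaints": 310}
]
"""

# Parse the JSON text into a list of dictionaries.
records = json.loads(RAW_JSON)

# An in-memory database keeps the example self-contained;
# pass a file path instead to persist the data on disk.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE complaints ("
    "id INTEGER PRIMARY KEY, borough TEXT NOT NULL, complaints INTEGER)"
)

# Named placeholders let sqlite3 bind values safely from each dictionary.
conn.executemany(
    "INSERT INTO complaints (id, borough, complaints) "
    "VALUES (:id, :borough, :complaints)",
    records,
)
conn.commit()

# Aggregate in SQL rather than in Python, which scales better for larger tables.
totals = conn.execute(
    "SELECT borough, SUM(complaints) FROM complaints "
    "GROUP BY borough ORDER BY borough"
).fetchall()
print(totals)

conn.close()
```

Using placeholders instead of building SQL strings by hand avoids quoting errors when the source data contains apostrophes or other special characters.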
