R is a programming language for statistical analysis of data. This tutorial will introduce you to the basic elements of R, to working with data sets in R, to visualizing them, and to implementing common statistical procedures.
An introduction to managing, annotating, organizing, archiving, and publishing research data using the Open Science Framework.
This class provides a brief overview of what Hadoop is and the various components of the Hadoop ecosystem. A hands-on showcase demonstrates how to use the Dumbo (Hadoop) cluster to run basic MapReduce jobs, and several hands-on exercises are included to give users a better understanding.
This workshop introduces the basic concepts of Git version control. Whether you're new to version control or just need an explanation of Git and GitHub, this two hour tutorial will help you understand the concepts of distributed version control. Get to know basic Git concepts and GitHub workflows through step-by-step lessons. We'll even rewrite a bit of history, and touch on how to undo (almost) anything with Git. This is a class for users who are comfortable with a command-line interface.
Accessing U.S. Census Data is intended to be an overview of sources and concepts for accessing census data and its related products and surveys at NYU and beyond.
This session will introduce basic principles of data visualization and demonstrate use of Tableau - a popular interactive data visualization tool.
This session covers the basics of cleaning and managing data in R as well as working with strings, dates, and writing your own functions.
SPSS is a software package for statistical analysis of data. This tutorial will introduce you to the basics of SPSS and will cover importing, creating and editing data sets, and provide an overview of commonly used statistical procedures.
Digital Tools for QDA is designed to address the advantages of using qualitative software tools for social science researchers. We introduce researchers to the logic of QDA software by comparing software design elements and highlighting the methodological implications of using QDA software for text analysis. This class is a general overview of QDA software tools featuring Atlas.ti, MAXQDA, and Taguette.
This session is an intermediate-to-advanced level class that offers some ideas for how to approach common data wrangling needs in research. This course focuses on obtaining data, often via a web interface and especially an API, and loading it into a suitable data "container" for analysis; parsing data retrieved via an API and turning it into a useful object for manipulation and analysis; and performing some basic summary counts of records in a dataset and working up quick visualizations.
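As a rough illustration of the kind of task this session describes, the sketch below parses a JSON payload into a list of records and produces a summary count. It uses only the Python standard library; the payload, the `borough` field, and the `count_by` helper are hypothetical stand-ins for a real API response, not material taken from the class.

```python
import json
from collections import Counter

# Hypothetical JSON text standing in for the body of an API response.
raw = (
    '[{"id": 1, "borough": "Brooklyn"},'
    ' {"id": 2, "borough": "Queens"},'
    ' {"id": 3, "borough": "Brooklyn"}]'
)


def count_by(records, key):
    """Count how many records share each value of `key`."""
    return Counter(record.get(key) for record in records)


# Parse the text into a list of dictionaries (the data "container").
records = json.loads(raw)

# Basic summary count of records per borough.
counts = count_by(records, "borough")
print(counts.most_common())
```

A list of dictionaries is only one possible container; the same records could be loaded into a table-like structure if more analysis is planned.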
This session is an intermediate-level class that will examine ways to perform data cleaning, transformation, and management using Python. We will look at some efficient ways to load data and parse it into a container for ease of use in Python, to store it in helpful formats, and to perform some basic cleaning and transformations typical for mixed string-and-numeric formats. Finally, we'll try putting it all together using a dataset from the NYC Open Data portal.
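To give a sense of what cleaning a mixed string-and-numeric format can involve, here is a minimal sketch using only the Python standard library. The sample rows, field names, and the `to_number` helper are assumptions made for illustration and are not drawn from the class or from the NYC Open Data portal.

```python
def to_number(value):
    """Convert strings such as ' $1,250.50 ' to float; return None if not numeric."""
    if value is None:
        return None
    # Strip whitespace, currency symbols, and thousands separators.
    text = str(value).strip().replace("$", "").replace(",", "")
    if text == "":
        return None
    try:
        return float(text)
    except ValueError:
        # Placeholders such as "N/A" are treated as missing values.
        return None


# Hypothetical rows with inconsistent spacing, casing, and number formats.
rows = [
    {"name": " Alice ", "amount": "$1,250.50"},
    {"name": "bob", "amount": "N/A"},
    {"name": "Carol", "amount": "300"},
]

# Normalize names and convert amounts to numbers where possible.
cleaned = [
    {"name": row["name"].strip().title(), "amount": to_number(row["amount"])}
    for row in rows
]
print(cleaned)
```

Returning `None` for unparseable values keeps missing data explicit instead of silently turning it into zero.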
This course offers an introduction to extracting and organizing textual and tabular data using the Optical Character Recognition (OCR) software tools Tesseract and ABBYY FineReader. Use of OCR can significantly cut down on data entry and enables digital analysis of non-digital materials.
This session covers creation of charts with base R functions and using the popular ggplot2 package.
Stata is a software package for statistical analysis of data. This tutorial will introduce you to the basics of Stata and will cover importing, creating and editing data sets, and provide an overview of commonly used statistical procedures.
Esri Story Maps & ArcGIS Online are web-based applications for geospatial and non-geospatial analysis and storytelling. This tutorial covers the definition of GIS, data finding using NYU’s Spatial Data Repository (SDR), visualizing spatial data in a thematic web map, and exporting that map into a multimedia presentation including images, videos, text, and web pages.
This class covers the basics of writing a successful data management plan for federal funding agencies such as the NEH, NSF, NIH, NASA, and others. Attendees will learn about the different requirements funding agencies have for your research data as well as how to best meet those obligations within your lab or research group.
An introduction to creation, collection, distribution, and user management using the NYU REDCap data collection tool.
SAS is a software package for statistical analysis of data. This tutorial will introduce you to the basics of SAS and will cover importing, creating and editing data sets, and provide an overview of commonly used statistical procedures.
An introduction to the creation, collection, distribution, and analysis of online surveys using Qualtrics, an online survey platform. This is a hands-on class taught with practical examples in a computer lab.
MAXQDA is a qualitative data analysis software package that provides researchers with tools to code and organize their data materials. This hands-on tutorial reviews how to start a QDA project, from uploading text, classifying and coding transcripts, and writing memos to producing analyses.
ArcGIS Pro is a software package for geospatial analysis. This tutorial covers the definition of GIS, data finding using NYU’s Spatial Data Repository (SDR), some differences between ArcGIS Pro and the older ArcMap software, visualizing spatial data in a thematic map, and exporting that map in different file formats.
This interactive session covers overarching strategies for finding data sources. Instead of cycling through lists of data portals, sites, and sources, this class models inductive thinking about data itself: who provides it, who is responsible for gathering it, and who has an incentive to release it? Working with several prepared data searching questions, we will explore strategies for finding data available at NYU Libraries and beyond. Participants will be invited to submit their own data questions in advance of the session. Come prepared to converse with fellow data seekers about your process.
Getting Started with Python Pandas is an intermediate-to-advanced level class that offers basic strategies for reading, cleaning, and visualizing data with the Pandas Python library.