Data Services Class Descriptions

Information, materials, and schedules for all currently offered Data Services classes.
This class provides a brief overview of Hadoop and the main components of the Hadoop ecosystem. It includes a hands-on demonstration of how to use the Dumbo (Hadoop) cluster to run basic MapReduce jobs, along with hands-on exercises to reinforce the material.
Software: Hadoop Framework
Duration: 120 min

Room description:

Some tutorials are held remotely and require NYU sign-on to access, while others are held in person with no remote component. Please note the correct modality and location of the tutorial when registering.

Prerequisites:
Skills Taught / Learning Outcomes:
  • Hadoop Framework and its main services
  • MapReduce Framework
  • Overview of NYU HPC Dumbo cluster
  • HDFS filesystem
  • Running classic Java MapReduce jobs
  • Running MapReduce jobs written in Python
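For a sense of what a MapReduce job written in Python looks like, here is a minimal word-count sketch. It is an illustration only, not taken from the class materials: the function names (mapper, reducer, run_local) are hypothetical, and the map, shuffle, and reduce phases are simulated in a single process rather than run on the Dumbo cluster.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1


def reducer(sorted_pairs):
    """Reduce phase: sum the counts for each word.

    Expects pairs sorted by key, which is how a MapReduce framework
    delivers them to the reducer after the shuffle step.
    """
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce locally (no cluster needed)."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    sample = ["the quick brown fox", "the lazy dog", "The fox"]
    for word, count in sorted(run_local(sample).items()):
        print(f"{word}\t{count}")
```

On a real cluster, Python jobs of this kind are typically run through Hadoop Streaming, where the mapper and reducer are separate scripts that read lines from standard input and write tab-separated key/value lines to standard output. The exact submission commands for Dumbo are covered in the class and are not reproduced here.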
Class Materials:
Related Classes:

Using Slurm on Greene Cluster

Introduction to Unix/Linux and the Shell

Big Data Tutorial 2: Using Hive

Big Data Tutorial 3: Introduction to Spark

Additional Training Materials:

Hadoop Fundamentals available via LinkedIn Learning (NYU NetID required)

Feedback: bit.ly/feedbackds

 

Upcoming sessions for this tutorial