Data Services Class Descriptions

Information, materials, and schedules for all currently offered Data Services classes
This class provides a brief overview of Hadoop and the components that make up the Hadoop ecosystem. It includes a hands-on demonstration of how to use the Dumbo (Hadoop) cluster to run basic MapReduce jobs, along with several hands-on exercises to reinforce the material.
Software: Hadoop Framework
Duration: 120 min

Room description:

During the Fall 2021 semester, some tutorials are held remotely and require NYU sign-on to access, while others are held in person with no remote component. Please note the correct modality and location of the tutorial when registering.

Prerequisites:
Skills Taught / Learning Outcomes:
  • Hadoop Framework and its main services
  • MapReduce Framework
  • Overview of NYU HPC Dumbo cluster
  • HDFS filesystem
  • Running classic Java MapReduce jobs
  • Running MapReduce jobs written in Python
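
To give a sense of what a MapReduce job written in Python can look like, here is a minimal word-count sketch. It is not taken from the class materials: the function names (`mapper`, `reducer`, `run_local`) and the sample input are hypothetical, and the whole pipeline is simulated in a single process rather than run on the Dumbo cluster. On a real Hadoop cluster, the map and reduce steps would typically be separate scripts that read from standard input and write to standard output, with the framework handling the shuffle and sort between them.

```python
from itertools import groupby


def mapper(lines):
    """Map step: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.strip().lower().split():
            yield word, 1


def reducer(sorted_pairs):
    """Reduce step: sum the counts for each word.

    Expects pairs sorted by key, which is how the shuffle/sort
    phase would deliver them to a reducer.
    """
    for word, group in groupby(sorted_pairs, key=lambda pair: pair[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce in one process (illustration only)."""
    return dict(reducer(sorted(mapper(lines))))


# Hypothetical sample input standing in for a file stored in HDFS.
sample = ["the quick brown fox", "the lazy dog", "The fox"]
counts = run_local(sample)

for word, total in sorted(counts.items()):
    print(f"{word}\t{total}")
```

The `sorted` call stands in for the shuffle and sort phase: because `groupby` only merges adjacent items, the reducer relies on all pairs for a given word arriving together.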
Class Materials:
Related Classes:

Using Slurm on Greene Cluster

Introduction to Unix/Linux and the Shell

Big Data Tutorial 2: Using Hive

Big Data Tutorial 3: Introduction to Spark

Additional Training Materials:

Hadoop Fundamentals available via LinkedIn Learning (NYU NetID required)

Feedback: bit.ly/feedbackds

 
