
Data Services Class Descriptions

Information, materials, and schedules for all currently offered Data Services classes
This tutorial provides a basic understanding of Apache Spark and its use in the Hadoop ecosystem. It includes hands-on examples of using Apache Spark and step-by-step instructions for running Spark jobs on NYU's Dumbo (Hadoop) cluster.
Software: Apache Spark
Duration: 120 min

Room description:

During the Fall 2021 semester, some tutorials are held remotely and require NYU sign-on to access, while others are held in person with no remote component. Please note the correct modality and location of the tutorial when registering.

Prerequisites:
Skills Taught / Learning Outcomes:
  • Brief overview of the Spark ecosystem
  • RDDs, transformations, and actions
  • Running spark-shell and pyspark
  • Compiling Java code with Maven
  • Submitting jobs with spark-submit
  • Spark SQL
  • Accessing a Hive database from Spark
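
The distinction between transformations and actions listed above can be previewed without a cluster. The following is a minimal plain-Python sketch, not Spark itself and not taken from the class materials: `ToyRDD`, `flat_map`, and `tokenize` are hypothetical names invented for illustration (PySpark's real method is `flatMap`, and real RDDs are partitioned across a cluster). It only models the idea that transformations describe work lazily, while actions trigger the computation.

```python
from functools import reduce as _reduce


class ToyRDD:
    """Toy, single-machine stand-in for an RDD (hypothetical; not Spark's API)."""

    def __init__(self, source):
        # `source` is a zero-argument callable returning a fresh iterator,
        # so every action re-evaluates the whole pipeline from the start.
        self._source = source

    @classmethod
    def parallelize(cls, items):
        data = list(items)
        return cls(lambda: iter(data))

    # Transformations: return a new ToyRDD and compute nothing yet.
    def map(self, fn):
        return ToyRDD(lambda: (fn(x) for x in self._source()))

    def filter(self, pred):
        return ToyRDD(lambda: (x for x in self._source() if pred(x)))

    def flat_map(self, fn):
        return ToyRDD(lambda: (y for x in self._source() for y in fn(x)))

    # Actions: run the pipeline and return a plain Python value.
    def collect(self):
        return list(self._source())

    def count(self):
        return sum(1 for _ in self._source())

    def reduce(self, fn):
        return _reduce(fn, self._source())


calls = []  # records each time the mapped function actually runs


def tokenize(line):
    calls.append(line)
    return line.lower().split()


lines = ToyRDD.parallelize(["Spark is fast", "Hive on Spark"])
words = lines.flat_map(tokenize)                     # transformation: lazy
spark_words = words.filter(lambda w: w == "spark")   # transformation: lazy
calls_before_action = len(calls)                     # tokenize has not run yet
spark_count = spark_words.count()                    # action: triggers evaluation
calls_after_action = len(calls)                      # tokenize ran once per line
```

In this toy model, chaining `flat_map` and `filter` only builds up a description of the work; `tokenize` is first called when `count()` runs. Real Spark follows the same lazy pattern but distributes the work across the cluster.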
Class Materials: Link to Class Materials
Related Classes:

Introduction to Unix/Linux and the Shell

Big Data Tutorial 1: MapReduce

Big Data Tutorial 2: Using Hive

Additional Training Materials:

Available via LinkedIn Learning (NYU NetID required):

Feedback: bit.ly/feedbackds

 
