
Data Services Class Descriptions

Information, materials, and schedules for all currently offered Data Services classes.
This tutorial provides a basic understanding of Apache Spark and its usage on the HPC cluster. It includes hands-on examples of using Apache Spark and step-by-step instructions for running Spark jobs.
Software: None
Duration: 120 min

Room description:

Some tutorials are held remotely and require NYU sign-on to access, while others are held in person without a remote component. Please note the correct modality and location of the tutorial when registering.

Prerequisites:
Skills Taught / Learning Outcomes:
  • Brief overview of the Spark ecosystem
  • RDDs, transformations, and actions
  • Running spark-shell and pyspark
  • Compiling Java code with Maven
  • spark-submit
  • Spark SQL
  • Accessing a Hive database from Spark
Class Materials: Link to Class Materials
Related Classes:

Introduction to HPC

Advanced HPC

Globus and Data Transfers

Additional Training Materials:

Available via LinkedIn Learning (NYU NetID required):

Feedback: bit.ly/feedbackds


Upcoming sessions for this tutorial