Data Services Class Descriptions

Information, materials, and schedules for all currently offered Data Services classes.
This tutorial introduces Apache Hive and its role in the Hadoop ecosystem. It includes hands-on examples of using Hive and step-by-step instructions for running Hive jobs on NYU's Dumbo (Hadoop) cluster.
Software: Apache Hive
Duration: 120 min

Room description:

Some tutorials are held remotely and require NYU sign-on to access, while others are held in person with no remote component. Please note the correct modality and location of the tutorial when registering.

Prerequisites:
Skills Taught / Learning Outcomes:
  • Hadoop framework and its main services
  • Overview of Hive
  • Overview of NYU HPC Dumbo cluster
  • Internal tables
  • External tables
  • Static partitions
  • Dynamic partitions
  • Bucketing
Class Materials: Link to Class Materials
Related Classes:

Introduction to Unix/Linux and the Shell

Big Data Tutorial 1: MapReduce

Big Data Tutorial 3: Introduction to Spark

Additional Training Materials:

Analyzing Big Data with Hive available via LinkedIn Learning (NYU NetID required)

Feedback: bit.ly/feedbackds
