Data Services Class Descriptions

Information, materials, and schedules for all currently offered Data Services classes
This tutorial introduces Apache Hive and its role in the Hadoop ecosystem. It includes hands-on examples of using Hive and step-by-step instructions for running Hive jobs on NYU's Dumbo (Hadoop) cluster.
Software: Apache Hive
Duration: 120 min

Room description:

During the Fall 2021 semester, some tutorials are held remotely and require NYU sign-on to access, while others are held in person with no remote component. Please note the correct modality and location of the tutorial when registering.

Prerequisites:
Skills Taught / Learning Outcomes:
  • Hadoop Framework and its main services
  • Overview of Hive 
  • Overview of NYU HPC Dumbo cluster
  • Internal tables
  • External tables
  • Static partitions
  • Dynamic partitions
  • Bucketing
Class Materials: Link to Class Materials
Related Classes:

Introduction to Unix/Linux and the Shell

Big Data Tutorial 1: MapReduce

Big Data Tutorial 3: Introduction to Spark

Additional Training Materials:

Analyzing Big Data with Hive available via LinkedIn Learning (NYU NetID required)

Feedback: bit.ly/feedbackds

Upcoming sessions for this tutorial