8404 Reviews (4.5 out of 5 stars)

Building an Open Data Lakehouse Using Apache Iceberg

Duration 4 days
Course Code DENG-251
Available Formats Classroom

Overview

The Open Data Lakehouse is a modern data architecture that enables versatile analytics on streaming and stored data within cloud-native object stores. This architecture can span hybrid and multi-cloud environments. This course introduces Apache Ozone, a hybrid storage service addressing the limitations of HDFS. You'll also explore Apache Iceberg, an open table format optimized for petabyte-scale datasets. The course covers Iceberg's benefits, architecture, read/write operations, streaming, and advanced features like time travel, partition evolution, and Data-as-Code. Over 25 hands-on labs and a capstone project will equip you with the skills to build an efficient, performant Open Data Lakehouse within your own environment.

Skills Gained

Open Data Lakehouse Fundamentals

  • Understand core Open Data Lakehouse concepts and benefits.
  • Get introduced to Apache Ozone and its integration within the CDP ecosystem.

Who Can Benefit

This course is designed for data professionals within organizations using Cloudera Data Warehouse or Cloudera Data Engineering solutions. If you're building an Open Data Lakehouse powered by Apache Iceberg, this course will provide the knowledge and skills you need. Ideal roles include Data Engineers, Hive/Impala SQL Developers, Kafka Streaming Engineers, Data Scientists, and CDP Admins.

Prerequisites

A basic understanding of HDFS and experience with Hive and Spark are prerequisites.

Course Details

Apache Ozone Mastery

  • Configure Ozone, use CLI commands, and transfer data between HDFS and Ozone.
  • Integrate Ozone into applications.

Apache Iceberg Expertise

  • Explore Iceberg's integration with CDP, architecture, and data lakehouse design principles.
  • Master data management, governance, and optimization best practices.
  • Understand snapshots and time travel queries.
  • Design tables strategically (external/managed, copy-on-write, merge-on-read).
  • Employ advanced features: change data capture (CDC), schema/partition evolution, hidden partitions.

Iceberg Administration

  • Perform table maintenance tasks.
  • Configure and manage access control settings.

Capstone Project

  • Apply all concepts by implementing an Open Data Lakehouse use case in CDP.
  • Develop a comprehensive Open Data Lakehouse implementation runbook.

Day 1

  • Iceberg Introduction
  • Data Lake Concepts
  • Open Lakehouse
  • Hive Architecture and Tables
  • Introducing and working with Ozone
  • Transfer data between HDFS & Ozone
  • Ozone Application Integration
  • Iceberg Architecture
  • Iceberg Spark, SQL Setup
  • Iceberg Catalog Review
  • Iceberg Tables: Managed & External
  • Table Design and Practice
  • Iceberg Table Tuning for Read vs. Write

Day 2

  • Schema Evaluation: understand data type issues between Hive and Iceberg during migration
  • Hidden Partitioning: how partitioning works in Iceberg tables, compared with Hive partitioning
  • Time Travel: the various ways to time travel and how it helps with testing
  • Data-as-Code: Write-Audit-Publish (WAP) for ETL; branches and tags for zero-copy clones used in testing, QA, and ML
  • Iceberg Metadata for Maintenance.
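
To make the time travel topic above more concrete, here is a minimal toy model in Python. It is not the Iceberg API: the class `ToySnapshotTable` and its methods are hypothetical names invented for illustration, and each commit stores a full copy of the rows, whereas Iceberg snapshots reference shared data files through metadata. The sketch only shows the core idea that every commit produces an immutable snapshot and that a reader can ask for the table as of an earlier point in time.

```python
from bisect import bisect_right
from typing import List, Sequence, Tuple


class ToySnapshotTable:
    """Toy table (hypothetical, not Iceberg): each commit keeps an immutable snapshot."""

    def __init__(self) -> None:
        self._timestamps: List[int] = []             # commit times, ascending
        self._snapshots: List[Tuple[str, ...]] = []  # rows as of each commit

    def commit(self, ts: int, rows: Sequence[str]) -> None:
        """Record the full table contents as a new snapshot at time ts."""
        if self._timestamps and ts <= self._timestamps[-1]:
            raise ValueError("commit timestamps must strictly increase")
        self._timestamps.append(ts)
        self._snapshots.append(tuple(rows))

    def read(self) -> Tuple[str, ...]:
        """Return the rows of the latest snapshot."""
        return self._snapshots[-1]

    def read_as_of(self, ts: int) -> Tuple[str, ...]:
        """Return the rows of the newest snapshot committed at or before ts."""
        idx = bisect_right(self._timestamps, ts)
        if idx == 0:
            raise LookupError("no snapshot exists at or before that time")
        return self._snapshots[idx - 1]


table = ToySnapshotTable()
table.commit(100, ["order-1"])
table.commit(200, ["order-1", "order-2"])

current_rows = table.read()           # latest state of the table
earlier_rows = table.read_as_of(150)  # state before the second commit
```

Because older snapshots are never modified, a test can compare `earlier_rows` with `current_rows` to see exactly what a later commit changed.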

Day 3

  • Change Data Capture (CDC)
  • Rollback Data
  • Migration: practice various Hive-to-Iceberg migrations
  • Shallow Migration
  • In-Place Migration
  • Hybrid Migration
  • Snapshot migration for testing
  • Late-arriving data migration
  • Runbook build
  • Table Maintenance
  • Streaming

Day 4

The capstone project aims to create a Type 2 table data flow, which is a system for managing historical changes to data in a database table. In a Type 2 table, each record maintains historical information, allowing users to track changes over time. This is crucial for data warehousing and analytics, where historical data is often required for analysis and reporting purposes.
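
The Type 2 idea described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the capstone solution: the `CustomerVersion` schema, the `valid_from`/`valid_to`/`is_current` columns, and the helper names `apply_change` and `city_as_of` are all hypothetical. When a tracked value changes, the current row is closed and a new row is appended, so earlier values remain queryable.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class CustomerVersion:
    """One historical version of a record (hypothetical Type 2 schema)."""
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None means this version is still open
    is_current: bool = True


def apply_change(table: List[CustomerVersion], customer_id: int,
                 new_city: str, change_date: date) -> None:
    """Close the current version if the value changed, then append a new one."""
    for row in table:
        if row.customer_id == customer_id and row.is_current:
            if row.city == new_city:
                return  # nothing changed, so history stays as-is
            row.valid_to = change_date
            row.is_current = False
            break
    table.append(CustomerVersion(customer_id, new_city, change_date))


def city_as_of(table: List[CustomerVersion], customer_id: int,
               as_of: date) -> Optional[str]:
    """Return the city that was valid for the customer on the given date."""
    for row in table:
        if row.customer_id != customer_id:
            continue
        still_open = row.valid_to is None or as_of < row.valid_to
        if row.valid_from <= as_of and still_open:
            return row.city
    return None


history: List[CustomerVersion] = []
apply_change(history, 1, "Austin", date(2023, 1, 1))
apply_change(history, 1, "Denver", date(2024, 6, 1))

city_2023 = city_as_of(history, 1, date(2023, 12, 31))
city_2024 = city_as_of(history, 1, date(2024, 7, 1))
```

After the two changes, `history` holds two rows for customer 1: a closed "Austin" version and an open "Denver" version, which is what lets reports reconstruct the state on any past date.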

FAQ

Does the course schedule include a lunch break?

Classes typically include a 1-hour lunch break around midday. However, the exact break times and duration can vary depending on the specific class. Your instructor will provide detailed information at the start of the course.

What languages are used to deliver training?

Most courses are conducted in English, unless otherwise specified. Some courses will have the word "FRENCH" marked in red beside the scheduled date(s) indicating the language of instruction.

What does GTR stand for?

GTR stands for Guaranteed to Run; if you see a course with this status, it means this event is confirmed to run. View our GTR page to see our full list of Guaranteed to Run courses.

Does Ascendient Learning deliver group training?

Yes, we provide training for groups and individuals, as well as private on-site sessions. View our group training page for more information.

What does vendor-authorized training mean?

As a vendor-authorized training partner, we offer a curriculum that our partners have vetted. We use the same course materials and facilitate the same labs as our vendor-delivered training. These courses are considered the gold standard and, as such, are priced accordingly.

Is the training too basic, or will you go deep into technology?

It depends on your requirements, your role in your company, and your depth of knowledge. The good news is that many of our learning paths let you progress from the fundamentals to highly specialized training.

How up-to-date are your courses and support materials?

We continuously work with our vendors to evaluate and refresh course material to reflect the latest training courses and best practices.

Are your instructors seasoned trainers who have deep knowledge of the training topic?

Ascendient Learning instructors have an average of 27 years of practical IT experience and have also served as consultants for an average of 15 years. To stay current, instructors spend at least 25 percent of their time learning new, emerging technologies and courses.

Do you provide hands-on training and exercises in an actual lab environment?

Lab access is dependent on the vendor and the type of training you sign up for. However, many of our top vendors will provide lab access to students to test and practice. The course description will specify lab access.

Will you customize the training for our company’s specific needs and goals?

We will work with you to identify training needs and areas of growth. We offer a variety of training methods, such as private group training, on-site training at a location of your choice, and virtual delivery. We provide courses and certifications that are aligned with your business goals.

How do I get started with certification?

Getting started on a certification pathway depends on your goals and the vendor you choose to get certified in. Many vendors offer entry-level IT certification to advanced IT certification that can boost your career. To get access to certification vouchers and discounts, please contact info@ascendientlearning.com.

Will I get access to content after I complete a course?

You will get access to the PDF of course books and guides, but access to the recording and slides will depend on the vendor and type of training you receive.

How do I request a W9 for Ascendient Learning?

View our filing status and how to request a W9.

Reviews

It is very good, with very simple instructions. Almost too much hand-holding.

This class was informative and made me think about certifying for the SUSE Manager cert.

Good training materials and lecture. The hands-on lab and the instructor's guidance were good too.

Good training. A lot to take in for the short amount of time we have, though.

Very good course, and again we would like to see more videos on removing FRUs.