8340 Reviews, rated 4.5 out of 5 stars

Data Engineering and Data Analytics using Azure Databricks

$1,495 USD
Duration 2 days
Course Code WA3714
Available Formats Classroom

Overview

In this data engineering and analytics course, participants explore cloud data engineering with Azure Databricks. They cover fundamental Big Data principles, practical applications of Apache Spark, and hands-on use of Azure Databricks for data engineering and analysis. Through instruction and hands-on labs, students work with data lake storage integration, database and table management, Delta Lake fundamentals, and advanced data analysis techniques. The course also covers pipeline and job automation, along with monitoring strategies for optimizing performance.

Skills Gained

  • Understand the fundamental principles of Big Data and its significance in modern data management
  • Navigate the Azure Databricks platform effectively, including its architecture, portal, and cluster management functionalities
  • Develop practical skills in working with databases and tables within Azure Databricks, utilizing both SQL and PySpark for data manipulation
  • Learn advanced data analysis techniques, including querying, visualization, and exploratory data analysis (EDA), to derive meaningful insights from large datasets
  • Explore pipeline and workflow automation strategies to streamline data processing tasks
  • Implement effective monitoring techniques to optimize performance and ensure reliable data processing workflows

Who Can Benefit

This course is designed for data engineers, analysts, and other professionals, from beginners to intermediate-level learners, who want to build their cloud data engineering skills with Azure Databricks.

Prerequisites

A basic understanding of SQL and Python is helpful.

Course Details

Software Requirements

  • A computer with an internet connection is required
  • A remote lab VM with an Azure account will be provided to the participants

Cloud Data Engineering Fundamentals

  • Big Data Overview
  • On-Premises vs. Cloud Data Management Contrasts
  • Data Engineering Essentials
  • Business-driven Data Processing
  • Introduction to Apache Spark
  • Spark’s Practical Applications

Azure Databricks Basics

  • Spark and Azure Databricks
  • Azure Databricks Architecture Overview
  • Navigating the Azure Databricks Portal
  • Cluster Creation Process
  • Cluster Management Essentials

Azure Databricks Development Environment

  • Overview of Development Environment
  • Notebooks Functionality
  • Practical Notebook Utilization

File Systems and Data Lake Integration

  • Understanding DBFS
  • Accessing DBFS via Databricks UI
  • Uploading Data to DBFS
  • dbutils for DBFS Interaction
  • Azure Data Lake Storage Integration
  • Utilizing dbutils for Data Lake Mounting

Database and Table Management in Azure Databricks

  • Understanding Databases and Tables
  • Creating and Managing Databases
  • Working with Tables
  • Using SQL with Tables
  • Using PySpark with Tables
  • Table Features Exploration
  • Understanding Partitioned Tables
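Partitioned tables are listed here as a topic rather than explained. As a rough illustration, the pure-Python sketch below mimics the Hive-style directory layout Spark produces when a table is written with `partitionBy` (one `column=value` folder level per partition column) and shows why a filter on a partition column lets the engine skip whole directories. The helper names, the `/mnt/sales` path, and the sample rows are hypothetical; real Spark also escapes special characters in values and handles null partition values, which this sketch ignores.

```python
from collections import defaultdict

# Hypothetical helpers illustrating Hive-style partition layout (not a Spark API).
# In PySpark the real write would look roughly like:
#   df.write.partitionBy("year", "region").saveAsTable("sales")


def partition_path(table_root, partition_cols, row):
    """Directory a row would be written to: one 'col=value' level per column."""
    parts = [f"{col}={row[col]}" for col in partition_cols]
    return "/".join([table_root.rstrip("/")] + parts)


def group_by_partition(table_root, partition_cols, rows):
    """Map each partition directory to the rows stored in it."""
    layout = defaultdict(list)
    for row in rows:
        layout[partition_path(table_root, partition_cols, row)].append(row)
    return dict(layout)


def prune(layout, col, value):
    """Partition pruning: keep only directories whose path has 'col=value'."""
    needle = f"/{col}={value}/"
    return [path for path in layout if needle in path + "/"]


# Hypothetical sample data
rows = [
    {"id": 1, "year": 2023, "region": "EU"},
    {"id": 2, "year": 2024, "region": "EU"},
    {"id": 3, "year": 2024, "region": "US"},
]

layout = group_by_partition("/mnt/sales", ["year", "region"], rows)
for path in sorted(layout):
    print(path)

# A query filtering on year = 2024 only has to read these directories:
print(prune(layout, "year", 2024))
```

Because the partition value is encoded in the directory name, a filter such as `year = 2024` can be answered by listing folders, without opening the files for 2023 at all.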

Views in Azure Databricks

  • Understanding Views
  • Using SQL with Views
  • Temporary and Global Views
  • Using PySpark with Views

Data Analysis in Azure Databricks

  • Querying, Visualizing, and EDA
  • SQL Data Querying
  • PySpark Data Querying
  • Multi-Table Joins
  • Exploratory Data Analysis
  • Table Visualization Techniques
  • Using Charts
  • Data Profiling

JDBC Integration in Azure Databricks

  • Advantages of JDBC Usage
  • Data Source Addition via JDBC
  • JDBC URL and Connection Parameters
  • Query Execution via JDBC
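The JDBC topics above center on building a connection URL and passing it, together with connection parameters, to Spark. The sketch below assumes the source is an Azure SQL Database reached through the Microsoft JDBC Driver for SQL Server; the course outline does not name a specific database, and the server, database, table, and secret-scope names are placeholders. It builds the URL in plain Python and, in comments, shows how the URL would typically be handed to Spark's JDBC data source.

```python
def sqlserver_jdbc_url(server, database, port=1433, **props):
    """Build a Microsoft SQL Server JDBC URL for an Azure SQL Database.

    Format: jdbc:sqlserver://host:port;property=value;property=value
    Extra keyword arguments add or override connection properties.
    """
    host = f"{server}.database.windows.net"
    # Default connection properties; dict order is preserved in the URL
    settings = {
        "database": database,
        "encrypt": "true",
        "trustServerCertificate": "false",
        "loginTimeout": "30",
    }
    settings.update(props)
    suffix = "".join(f";{key}={value}" for key, value in settings.items())
    return f"jdbc:sqlserver://{host}:{port}{suffix}"


url = sqlserver_jdbc_url("myserver", "salesdb")  # placeholder names
print(url)

# In a Databricks notebook the URL would then be passed to Spark's JDBC
# source, keeping credentials out of the URL (scope/key names are placeholders):
#   df = (spark.read.format("jdbc")
#         .option("url", url)
#         .option("dbtable", "dbo.orders")
#         .option("user", dbutils.secrets.get(scope="jdbc", key="user"))
#         .option("password", dbutils.secrets.get(scope="jdbc", key="password"))
#         .load())
```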

Delta Lake in Azure Databricks

  • Introduction to Delta Lake
  • Delta Lake Architecture
  • Features and Advantages of Delta Lake
  • Using Delta Lake for Reliable Data Lakes

Pipeline and Workflow Automation in Azure Databricks

  • Introduction to Pipelines and Workflow Automation
  • Creating and Managing Pipelines
  • Defining Dependencies and Triggers
  • Incorporating Data Processing
  • Implementing Error Handling
  • Scheduling Execution

Monitoring and Optimization

  • Spark UI Monitoring
  • Storage Performance Analysis
  • Worker Node and Executor Evaluation
  • Performance Metrics Utilization


FAQ

How do I get a Microsoft exam voucher?

Pearson VUE exam vouchers can be requested and ordered with your course purchase, or ordered separately.

  • Vouchers are non-refundable and non-returnable. Vouchers expire 12 months from the date they are issued unless otherwise specified in the terms and conditions.
  • Voucher expiration dates cannot be extended. The exam must be taken by the expiration date printed on the voucher.

Do Microsoft courses come with post lab access?

Most official Microsoft courses include post-class lab access for 30 to 180 calendar days after the instructor-led delivery. Each attendee receives a lab training key in class that can be used to keep connecting to the remote lab environment.

Does the course schedule include a lunch break?

Lunch is normally one hour long and starts after the first 3 to 3.5 hours of the class day.

What languages are used to deliver training?

Microsoft courses are conducted in English unless otherwise specified.

Reviews

They are very good and made sure we had all the appropriate materials for class.

Both the course material and the instructor demonstrated a sound foundation in Maximo.

I think the platform is very good and look forward to taking my next course in early October.

Very good company. I've done technical trainings at their facility in downtown Montreal in the past and I've always appreciated them.

The training was good, but it required basic Maximo skills before getting deep into its configuration.