Course Outline
Cloud Data Engineering Fundamentals
- Big Data Overview
- On-Premises vs. Cloud Data Management
- Data Engineering Essentials
- Business-driven Data Processing
- Introduction to Apache Spark
- Spark's Practical Applications
Azure Databricks Basics
- Spark and Azure Databricks
- Azure Databricks Architecture Overview
- Navigating the Azure Databricks Portal
- Cluster Creation Process
- Cluster Management Essentials
Azure Databricks Development Environment
- Overview of Development Environment
- Notebook Functionality
- Practical Notebook Utilization
File Systems and Data Lake Integration
- Understanding DBFS
- Accessing DBFS via Databricks UI
- Uploading Data to DBFS
- dbutils for DBFS Interaction
- Azure Data Lake Storage Integration
- Utilizing dbutils for Data Lake Mounting
Database and Table Management in Azure Databricks
- Understanding Databases and Tables
- Creating and Managing Databases
- Working with Tables
- Using SQL with Tables
- Using PySpark with Tables
- Table Features Exploration
- Understanding Partitioned Tables
Views in Azure Databricks
- Understanding Views
- Using SQL with Views
- Temporary and Global Views
- Using PySpark with Views
Data Analysis in Azure Databricks
- Querying, Visualization, and EDA
- SQL Data Querying
- PySpark Data Querying
- Multi-Table Joins
- Exploratory Data Analysis
- Table Visualization Techniques
- Using Charts
- Data Profiling
JDBC Integration in Azure Databricks
- Advantages of JDBC Usage
- Adding Data Sources via JDBC
- JDBC URL and Connection Parameters
- Query Execution via JDBC
Delta Lake in Azure Databricks
- Introduction to Delta Lake
- Delta Lake Architecture
- Features and Advantages of Delta Lake
- Using Delta Lake for Reliable Data Lakes
Pipeline and Workflow Automation in Azure Databricks
- Introduction to Pipelines and Workflow Automation
- Creating and Managing Pipelines
- Defining Dependencies and Triggers
- Incorporating Data Processing
- Implementing Error Handling
- Scheduling Execution
Monitoring and Optimization
- Spark UI Monitoring
- Storage Performance Analysis
- Worker Node and Executor Evaluation
- Performance Metrics Utilization