8280  Reviews star_rate star_rate star_rate star_rate star_half

Advanced Data Analytics with PySpark

When you feel constrained by the computing power of a single computer, you can leverage the Apache Spark platform's massively parallel processing capabilities using PySpark, a Python-based language...

Read More
$1,460 USD
Duration 2 days
Course Code WA2936
Available Formats Classroom, Virtual

Overview

When you feel constrained by the computing power of a single computer, you can leverage the Apache Spark platform's massively parallel processing capabilities using PySpark, a Python-based language supported by Spark. Along with introducing PySpark, this course covers Spark Shell to interactively explore and manipulate data. Spark SQL is introduced for a uniform programming API to work with structured data. The course ends with covering Pandas for data manipulation and analysis and data visualization with seaborn.

Skills Gained

  • Learn PySpark Shell Environment
  • Understand Spark DataFrames
  • Process Data with the PySpark DataFrame API
  • Work with Pivot Tables in PySpark
  • Perform Data Visualization and Exploratory Data Analysis (EDA) in PySpark

Who Can Benefit

  • Business Analysts who want a scalable platform for solving SQL-centric problem

Prerequisites

Knowledge of SQL, familiarity with Python (or the ability to learn the basics of a new language)

Course Details

Outline

Chapter 1. Introduction to Apache Spark

  • What is Apache Spark
  • The Spark Platform
  • Spark vs Hadoop's MapReduce (MR)
  • Common Spark Use Cases
  • Languages Supported by Spark
  • Running Spark on a Cluster
  • The Spark Application Architecture
  • The Driver Process
  • The Executor and Worker Processes
  • Spark Shell
  • Jupyter Notebook Shell Environment
  • Spark Applications
  • The spark-submit Tool
  • The spark-submit Tool Configuration
  • Interfaces with Data Storage Systems
  • Project Tungsten
  • The Resilient Distributed Dataset (RDD)
  • Datasets and DataFrames
  • Spark SQL, DataFrames, and Catalyst Optimizer
  • Spark Machine Learning Library
  • GraphX
  • Extending Spark Environment with Custom Modules and Files
  • Summary

Chapter 2. The Spark Shell

  • The Spark Shell
  • The Spark v.2 + Command-Line Shells
  • The Spark Shell UI
  • Spark Shell Options
  • Getting Help
  • Jupyter Notebook Shell Environment
  • Example of a Jupyter Notebook Web UI (Databricks Cloud)
  • The Spark Context (sc) and Spark Session (spark)
  • Creating a Spark Session Object in Spark Applications
  • The Shell Spark Context Object (sc)
  • The Shell Spark Session Object (spark)
  • Loading Files
  • Saving Files
  • Summary

Chapter 3. Introduction to Spark SQL

  • What is Spark SQL?
  • Uniform Data Access with Spark SQL
  • Hive Integration
  • Hive Interface
  • Integration with BI Tools
  • What is a DataFrame?
  • Creating a DataFrame in PySpark
  • Commonly Used DataFrame Methods and Properties in PySpark
  • Grouping and Aggregation in PySpark
  • The "DataFrame to RDD" Bridge in PySpark
  • The SQLContext Object
  • Examples of Spark SQL / DataFrame (PySpark Example)
  • Converting an RDD to a DataFrame Example
  • Example of Reading / Writing a JSON File
  • Using JDBC Sources
  • JDBC Connection Example
  • Performance, Scalability, and Fault-tolerance of Spark SQL
  • Summary

Chapter 4. Practical Introduction to Pandas

  • What is pandas?
  • The Series Object
  • Accessing Values and Indexes in Series
  • Setting Up Your Own Index
  • Using the Series Index as a Lookup Key
  • Can I Pack a Python Dictionary into a Series?
  • The DataFrame Object
  • The DataFrame's Value Proposition
  • Creating a pandas DataFrame
  • Getting DataFrame Metrics
  • Accessing DataFrame Columns
  • Accessing DataFrame Rows
  • Accessing DataFrame Cells
  • Using iloc
  • Using loc
  • Examples of Using loc
  • DataFrames are Mutable via Object Reference!
  • Deleting Rows and Columns
  • Adding a New Column to a DataFrame
  • Appending / Concatenating DataFrame and Series Objects
  • Example of Appending / Concatenating DataFrames
  • Re-indexing Series and DataFrames
  • Getting Descriptive Statistics of DataFrame Columns
  • Getting Descriptive Statistics of DataFrames
  • Applying a Function
  • Sorting DataFrames
  • Reading From CSV Files
  • Writing to the System Clipboard
  • Writing to a CSV File
  • Fine-Tuning the Column Data Types
  • Changing the Type of a Column
  • What May Go Wrong with Type Conversion
  • Summary

Chapter 5. Data Visualization with seaborn in Python

  • Data Visualization
  • Data Visualization in Python
  • Matplotlib
  • Getting Started with matplotlib
  • Figures
  • Saving Figures to a File
  • Seaborn
  • Getting Started with seaborn
  • Histograms and KDE
  • Plotting Bivariate Distributions
  • Scatter plots in seaborn
  • Pair plots in seaborn
  • Heatmaps
  • Summary

Chapter 6. (Optional) Quick Introduction to Python for Data Engineers

  • What is Python?
  • Additional Documentation
  • Which version of Python am I running?
  • Python Dev Tools and REPLs
  • IPython
  • Jupyter
  • Jupyter Operation Modes
  • Jupyter Common Commands
  • Anaconda
  • Python Variables and Basic Syntax
  • Variable Scopes
  • PEP8
  • The Python Programs
  • Getting Help
  • Variable Types
  • Assigning Multiple Values to Multiple Variables
  • Null (None)
  • Strings
  • Finding Index of a Substring
  • String Splitting
  • Triple-Delimited String Literals
  • Raw String Literals
  • String Formatting and Interpolation
  • Boolean
  • Boolean Operators
  • Numbers
  • Looking Up the Runtime Type of a Variable
  • Divisions
  • Assignment-with-Operation
  • Comments:
  • Relational Operators
  • The if-elif-else Triad
  • An if-elif-else Example
  • Conditional Expressions (a.k.a. Ternary Operator)
  • The While-Break-Continue Triad
  • The for Loop
  • try-except-finally
  • Lists
  • Main List Methods
  • Dictionaries
  • Working with Dictionaries
  • Sets
  • Common Set Operations
  • Set Operations Examples
  • Finding Unique Elements in a List
  • Enumerate
  • Tuples
  • Unpacking Tuples
  • Functions
  • Dealing with Arbitrary Number of Parameters
  • Keyword Function Parameters
  • The range Object
  • Random Numbers
  • Python Modules
  • Importing Modules
  • Installing Modules
  • Listing Methods in a Module
  • Creating Your Own Modules
  • Creating a Runnable Application
  • List Comprehension
  • Zipping Lists
  • Working with Files
  • Reading and Writing Files
  • Reading Command-Line Parameters
  • Accessing Environment Variables
  • What is Functional Programming (FP)?
  • Terminology: Higher-Order Functions
  • Lambda Functions in Python
  • Example: Lambdas in the Sorted Function
  • Other Examples of Using Lambdas
  • Regular Expressions
  • Using Regular Expressions Examples
  • Python Data Science-Centric Libraries
  • Summary

Lab Exercises

  • Lab 1. Learning the Databricks Community Cloud Lab Environment
  • Lab 2. Learning PySpark Shell Environment
  • Lab 3. Understanding Spark DataFrames
  • Lab 4. Learning the PySpark DataFrame API
  • Lab 5. Processing Data in PySpark using the DataFrame API (Project)
  • Lab 6. Working with Pivot Tables in PySpark (Project)
  • Lab 7. Data Visualization and EDA in PySpark
  • Lab 8. Data Visualization and EDA in PySpark (Project)
|
View Full Schedule

Schedule

4 options available

  • Apr 22, 2025 - Apr 23, 2025 (2 days)
    Virtual | 10:00 AM 6:00 PM EST
    Language English
    Select from 1 options below
    Virtual |10:00 AM 6:00 PM EST
    Virtual | 10:00 AM 6:00 PM EST
    Enroll
    Enroll Add to quote
  • Apr 28, 2025 - Apr 29, 2025 (2 days)
    Virtual | 10:00 AM 6:00 PM EST
    Language English
    Select from 1 options below
    Virtual |10:00 AM 6:00 PM EST
    Virtual | 10:00 AM 6:00 PM EST
    Enroll
    Enroll Add to quote
  • Jun 2, 2025 - Jun 3, 2025 (2 days)
    Virtual | 10:00 AM 6:00 PM EST
    Language English
    Select from 1 options below
    Virtual |10:00 AM 6:00 PM EST
    Virtual | 10:00 AM 6:00 PM EST
    Enroll
    Enroll Add to quote
  • Jun 9, 2025 - Jun 10, 2025 (2 days)
    Virtual | 10:00 AM 6:00 PM EST
    Language English
    Select from 1 options below
    Virtual |10:00 AM 6:00 PM EST
    Virtual | 10:00 AM 6:00 PM EST
    Enroll
    Enroll Add to quote

FAQ

Does the course schedule include a Lunchbreak?

Classes typically include a 1-hour lunch break around midday. However, the exact break times and duration can vary depending on the specific class. Your instructor will provide detailed information at the start of the course.

What languages are used to deliver training?

Most courses are conducted in English, unless otherwise specified. Some courses will have the word "FRENCH" marked in red beside the scheduled date(s) indicating the language of instruction.

What does GTR stand for?

GTR stands for Guaranteed to Run; if you see a course with this status, it means this event is confirmed to run. View our GTR page to see our full list of Guaranteed to Run courses.

Does Ascendient Learning deliver group training?

Yes, we provide training for groups, individuals and private on sites. View our group training page for more information.

What does vendor-authorized training mean?

As a vendor-authorized training partner, we offer a curriculum that our partners have vetted. We use the same course materials and facilitate the same labs as our vendor-delivered training. These courses are considered the gold standard and, as such, are priced accordingly.

Is the training too basic, or will you go deep into technology?

It depends on your requirements, your role in your company, and your depth of knowledge. The good news about many of our learning paths, you can start from the fundamentals to highly specialized training.

How up-to-date are your courses and support materials?

We continuously work with our vendors to evaluate and refresh course material to reflect the latest training courses and best practices.

Are your instructors seasoned trainers who have deep knowledge of the training topic?

Ascendient Learning instructors have an average of 27 years of practical IT experience and have also served as consultants for an average of 15 years. To stay current, instructors spend at least 25 percent of their time learning new, emerging technologies and courses.

Do you provide hands-on training and exercises in an actual lab environment?

Lab access is dependent on the vendor and the type of training you sign up for. However, many of our top vendors will provide lab access to students to test and practice. The course description will specify lab access.

Will you customize the training for our company’s specific needs and goals?

We will work with you to identify training needs and areas of growth.  We offer a variety of training methods, such as private group training, on-site of your choice, and virtually. We provide courses and certifications that are aligned with your business goals.

How do I get started with certification?

Getting started on a certification pathway depends on your goals and the vendor you choose to get certified in. Many vendors offer entry-level IT certification to advanced IT certification that can boost your career. To get access to certification vouchers and discounts, please contact info@ascendientlearning.com.

Will I get access to content after I complete a course?

You will get access to the PDF of course books and guides, but access to the recording and slides will depend on the vendor and type of training you receive.

How do I request a W9 for Ascendient Learning?

View our filing status and how to request a W9.

Reviews

it was good and very informative. Instructure covered everything in detail.

great class and packed with material, would have lived to spread it more into many days but overall very informative.

Good training materials and lecture. And hands on lab and the instructor guiding was good.

You get detailed labs to guide you through the technical material giving you a hands on method of learning otherwise difficult material.

The training was very good to understand the concepts and how to set up things .

Prerequisites