
Introduction to Python and PySpark

$2,090 USD
Duration 3 days
Course Code WA2914
Available Formats Classroom

Overview

This Python and PySpark training course teaches learners the fundamentals of Python, including data types, variables, functions, and classes. Students also learn how to use Python to create powerful scripts and applications, and are introduced to distributed data processing with PySpark.

Skills Gained

  • Code in Python
  • Create Python Scripts
  • Create and use variables in Python
  • Work with Python Collections
  • Write and use control statements and loops in Python
  • Define and use functions in Python
  • Read and write text files in Python
  • Learn about functional programming in Python
  • Use the Databricks Community Cloud Lab Environment
  • Use pandas and seaborn for data visualization and EDA
  • Use the PySpark Shell Environment
  • Understand Spark DataFrames
  • Learn the PySpark DataFrame API
  • Repair and normalize data in PySpark
  • Use Spark SQL with PySpark

Who Can Benefit

  • Developers and/or Data Analysts

Prerequisites

  • Programming and/or scripting experience in a modern programming language.

Course Details

Outline

Chapter 1 - Introduction to Python

  • What is Python
  • Uses of Python
  • Installing Python
  • Python Package Manager (PIP)
  • Using the Python Shell
  • Python Code Conventions
  • Importing Modules
  • The help(object) Command
  • The Help Prompt
  • Summary

Chapter 2 - Python Scripts

  • Executing Python Code
  • Python Scripts
  • Writing Scripts
  • Running Python Scripts
  • Self Executing Scripts
  • Accepting Command-Line Parameters
  • Accepting Interactive Input
  • Retrieving Environment Settings
  • Summary

Chapter 3 - Data Types and Variables

  • Creating Variables
  • Displaying Variables
  • Basic Concatenation
  • Data Types
  • Strings
  • Strings as Arrays
  • String Methods
  • Combining Strings and Numbers
  • Numeric Types
  • Integer Types
  • Floating Point Types
  • Boolean Types
  • Checking Data Type
  • Summary
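The topics in this chapter can be illustrated with a minimal sketch (the variable names are invented for illustration):

```python
# Creating and displaying variables
name = "Ada"
age = 36

# Strings behave like arrays of characters
first_initial = name[0]          # "A"

# String methods, and combining strings with numbers
greeting = name.upper() + " is " + str(age)

# Numeric and boolean types
pi = 3.14159                     # float
is_adult = age >= 18             # bool

# Checking a value's data type
print(type(age).__name__)        # int
```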

Chapter 4 - Python Collections

  • Python Collections
  • List Type
  • Modifying Lists
  • Sorting a List
  • Tuple Type
  • Python Sets
  • Modifying Sets
  • Dictionary (Map) Type
  • Dictionary Methods
  • Sequences
  • Summary
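A quick sketch of the four collection types covered above (sample values invented for illustration):

```python
# List: ordered and mutable
langs = ["Python", "Scala", "Java"]
langs.append("R")
langs.sort()

# Tuple: ordered and immutable
point = (3, 4)

# Set: unordered, unique elements only
tags = {"spark", "python"}
tags.add("sql")

# Dictionary (map): key/value pairs
course = {"code": "WA2914", "days": 3}
course["format"] = "Classroom"
```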

Chapter 5 - Control Statements and Looping

  • If Statement
  • elif Keyword
  • Boolean Conditions
  • Single Line If Statements
  • For-in Loops
  • Looping over an Index
  • Range Function
  • Nested Loops
  • While Loops
  • Exception Handling
  • Built-in Exceptions
  • Exceptions thrown by Built-In Functions
  • Summary
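The control-flow constructs above can be sketched in a few lines (the `classify` function is a hypothetical example, not course material):

```python
def classify(n):
    # if / elif / else with boolean conditions
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"

# A for-in loop over range(), with a nested loop
pairs = []
for i in range(2):
    for j in range(2):
        pairs.append((i, j))

# Exception handling around a built-in function that may throw
try:
    value = int("not a number")
except ValueError:
    value = -1
```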

Chapter 6 - Functions in Python

  • Defining Functions
  • Naming Functions
  • Using Functions
  • Function Parameters
  • Named Parameters
  • Variable Length Parameter List
  • How Parameters are Passed
  • Variable Scope
  • Returning Values
  • Docstrings
  • Best Practices
  • Single Responsibility
  • Returning a Value
  • Function Length
  • Pure and Idempotent Functions
  • Summary
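A small sketch tying several of these ideas together (the `describe` function is invented for illustration):

```python
def describe(name, *scores, sep=", "):
    """Return a one-line summary of a student's scores.

    Docstrings like this one document a function's purpose.
    """
    # *scores collects a variable-length parameter list;
    # sep is a named (keyword) parameter with a default value.
    avg = sum(scores) / len(scores)
    return name + sep + str(round(avg, 1))

line = describe("Ada", 90, 95, 100)
```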

Chapter 7 - Working With Data in Python

  • Data Type Conversions
  • Conversions from other Types to Integer
  • Conversions from other Types to Float
  • Conversions from other Types to String
  • Conversions from other Types to Boolean
  • Converting Between Set, List and Tuple Data Structures
  • Modifying Tuples
  • Combining Set, List and Tuple Data Structures
  • Creating Dictionaries from other Data Structures
  • Summary
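The conversions listed above look like this in practice (sample values invented for illustration):

```python
# Conversions between basic types
n = int("42")           # str -> int
f = float(n)            # int -> float
s = str(f)              # float -> str, "42.0"
b = bool("")            # empty string -> False

# Converting between set, list, and tuple
items = [1, 2, 2, 3]
unique = set(items)          # duplicates removed
as_tuple = tuple(sorted(unique))

# Building a dictionary from two parallel sequences
keys = ["a", "b"]
vals = [1, 2]
mapping = dict(zip(keys, vals))
```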

Chapter 8 - Reading and Writing Text Files

  • Opening a File
  • Writing a File
  • Reading a File
  • Appending to a File
  • File Operations Using the With Statement
  • File and Directory Operations
  • Reading JSON
  • Writing JSON
  • Summary
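A self-contained sketch of the file operations above, using a temporary directory so it leaves no files behind:

```python
import json
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "notes.txt")

# Writing, appending, and reading with the with statement
with open(path, "w") as f:          # write (creates/truncates)
    f.write("line 1\n")
with open(path, "a") as f:          # append
    f.write("line 2\n")
with open(path) as f:               # read
    lines = f.read().splitlines()

# Writing and reading JSON
json_path = path + ".json"
with open(json_path, "w") as f:
    json.dump({"course": "WA2914"}, f)
with open(json_path) as f:
    data = json.load(f)
```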

Chapter 9 - Functional Programming Primer

  • What is Functional Programming?
  • Benefits of Functional Programming
  • Functions as Data
  • Using Map Function
  • Using Filter Function
  • Lambda expressions
  • List.sort() Using Lambda Expression
  • Difference Between Simple Loops and map/filter Type Functions
  • Additional Functions
  • General Rules for Creating Functions
  • Summary
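The map/filter/lambda material above can be sketched alongside the equivalent simple loop:

```python
# Functions as data: map, filter, and lambda expressions
nums = [1, 2, 3, 4, 5]

squares = list(map(lambda x: x * x, nums))
evens = list(filter(lambda x: x % 2 == 0, nums))

# list.sort() with a lambda expression as the key
words = ["spark", "py", "python"]
words.sort(key=lambda w: len(w))

# The same result as map(), written as a simple loop
squares_loop = []
for x in nums:
    squares_loop.append(x * x)
```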

Chapter 10 - Introduction to Apache Spark

  • What is Apache Spark
  • A Short History of Spark
  • Where to Get Spark?
  • The Spark Platform
  • Spark Logo
  • Common Spark Use Cases
  • Languages Supported by Spark
  • Running Spark on a Cluster
  • The Driver Process
  • Spark Applications
  • Spark Shell
  • The spark-submit Tool
  • The spark-submit Tool Configuration
  • The Executor and Worker Processes
  • The Spark Application Architecture
  • Interfaces with Data Storage Systems
  • Limitations of Hadoop's MapReduce
  • Spark vs MapReduce
  • Spark as an Alternative to Apache Tez
  • The Resilient Distributed Dataset (RDD)
  • Datasets and DataFrames
  • Spark Streaming (Micro-batching)
  • Spark SQL
  • Example of Spark SQL
  • Spark Machine Learning Library
  • GraphX
  • Spark vs R
  • Summary

Chapter 11 - The Spark Shell

  • The Spark Shell
  • The Spark v2+ Command-Line Shells
  • The Spark Shell UI
  • Spark Shell Options
  • Getting Help
  • Jupyter Notebook Shell Environment
  • Example of a Jupyter Notebook Web UI (Databricks Cloud)
  • The Spark Context (sc) and Spark Session (spark)
  • Creating a Spark Session Object in Spark Applications
  • The Shell Spark Context Object (sc)
  • The Shell Spark Session Object (spark)
  • Loading Files
  • Saving Files
  • Summary

Chapter 12 - Spark RDDs

  • The Resilient Distributed Dataset (RDD)
  • Ways to Create an RDD
  • Supported Data Types
  • RDD Operations
  • RDDs are Immutable
  • Spark Actions
  • RDD Transformations
  • Other RDD Operations
  • Chaining RDD Operations
  • RDD Lineage
  • The Big Picture
  • What May Go Wrong
  • Checkpointing RDDs
  • Local Checkpointing
  • Parallelized Collections
  • More on parallelize() Method
  • The Pair RDD
  • Where do I use Pair RDDs?
  • Example of Creating a Pair RDD with Map
  • Example of Creating a Pair RDD with keyBy
  • Miscellaneous Pair RDD Operations
  • RDD Caching
  • RDD Persistence
  • Summary

Chapter 13 - Parallel Data Processing with Spark

  • Running Spark on a Cluster
  • Data Partitioning
  • Data Partitioning Diagram
  • Single Local File System RDD Partitioning
  • Multiple File RDD Partitioning
  • Special Cases for Small-sized Files
  • Parallel Data Processing of Partitions
  • Spark Application, Jobs, and Tasks
  • Stages and Shuffles
  • The "Big Picture"
  • Summary

Chapter 14 - Shared Variables in Spark

  • Shared Variables in Spark
  • Broadcast Variables
  • Creating and Using Broadcast Variables
  • Example of Using Broadcast Variables
  • Problems with Global Variables
  • Example of the Closure Problem
  • Accumulators
  • Creating and Using Accumulators
  • Example of Using Accumulators (Scala Example)
  • Example of Using Accumulators (Python Example)
  • Custom Accumulators
  • Summary

Chapter 15 - Introduction to Spark SQL

  • What is Spark SQL?
  • Uniform Data Access with Spark SQL
  • Using JDBC Sources
  • Hive Integration
  • What is a DataFrame?
  • Creating a DataFrame in PySpark
  • Commonly Used DataFrame Methods and Properties in PySpark
  • Grouping and Aggregation in PySpark
  • The "DataFrame to RDD" Bridge in PySpark
  • The SQLContext Object
  • Examples of Spark SQL / DataFrame (PySpark Example)
  • Converting an RDD to a DataFrame Example
  • Example of Reading / Writing a JSON File
  • Performance, Scalability, and Fault-tolerance of Spark SQL
  • Summary

Chapter 16 - Repairing and Normalizing Data

  • Repairing and Normalizing Data
  • Dealing with Missing Data
  • Sample Data Set
  • Getting Info on Null Data
  • Dropping a Column
  • Interpolating Missing Data in pandas
  • Replacing the Missing Values with the Mean Value
  • Scaling (Normalizing) the Data
  • Data Preprocessing with scikit-learn
  • Scaling with the scale() Function
  • The MinMaxScaler Object
  • Summary
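A pandas sketch of the repair and normalization steps above (the sample data set is invented; the min-max formula shown by hand matches what scikit-learn's MinMaxScaler computes):

```python
import pandas as pd

# A small sample data set with missing values
df = pd.DataFrame({"age": [25.0, None, 40.0],
                   "score": [10.0, 20.0, None]})

# Getting info on null data
null_counts = df.isnull().sum()

# Replacing missing values with the column mean
repaired = df.fillna(df.mean())

# Min-max scaling (normalizing) each column to the [0, 1] range
scaled = (repaired - repaired.min()) / (repaired.max() - repaired.min())
```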

Chapter 17 - Data Grouping and Aggregation in Python

  • Data Aggregation and Grouping
  • Sample Data Set
  • The pandas.core.groupby.SeriesGroupBy Object
  • Grouping by Two or More Columns
  • Emulating SQL's WHERE Clause
  • Pivot Tables
  • Cross-Tabulation
  • Summary
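A pandas sketch of the grouping, filtering, and pivoting techniques above (the sample data is invented for illustration):

```python
import pandas as pd

df = pd.DataFrame({
    "team": ["dev", "dev", "qa", "qa"],
    "lang": ["py", "scala", "py", "py"],
    "score": [90, 80, 70, 60],
})

# Grouping and aggregation
avg_by_team = df.groupby("team")["score"].mean()

# Emulating SQL's WHERE clause with a boolean mask
py_only = df[df["lang"] == "py"]

# A pivot table: teams as rows, languages as columns
pivot = df.pivot_table(index="team", columns="lang",
                       values="score", aggfunc="mean")
```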

Lab Exercises

  • Lab 1. Introduction to Python
  • Lab 2. Creating Scripts
  • Lab 3. Variables in Python
  • Lab 4. Collections
  • Lab 5. Control Statements and Loops
  • Lab 6. Functions in Python
  • Lab 7. Reading and Writing Text Files
  • Lab 8. Functional Programming
  • Lab 9. Learning the Databricks Community Cloud Lab Environment
  • Lab 10. Data Visualization and EDA with pandas and seaborn
  • Lab 11. Learning PySpark Shell Environment
  • Lab 12. Understanding Spark DataFrames
  • Lab 13. Learning the PySpark DataFrame API
  • Lab 14. Data Repair and Normalization in PySpark
  • Lab 15. Spark SQL with PySpark

Schedule

FAQ

Does the course schedule include a lunch break?

Classes typically include a 1-hour lunch break around midday. However, the exact break times and duration can vary depending on the specific class. Your instructor will provide detailed information at the start of the course.

What languages are used to deliver training?

Most courses are conducted in English, unless otherwise specified. Some courses will have the word "FRENCH" marked in red beside the scheduled date(s) indicating the language of instruction.

What does GTR stand for?

GTR stands for Guaranteed to Run; if you see a course with this status, it means this event is confirmed to run. View our GTR page to see our full list of Guaranteed to Run courses.

Does Ascendient Learning deliver group training?

Yes, we provide training for groups, individuals, and private on-site sessions. View our group training page for more information.

What does vendor-authorized training mean?

As a vendor-authorized training partner, we offer a curriculum that our partners have vetted. We use the same course materials and facilitate the same labs as our vendor-delivered training. These courses are considered the gold standard and, as such, are priced accordingly.

Is the training too basic, or will you go deep into technology?

It depends on your requirements, your role in your company, and your depth of knowledge. The good news is that many of our learning paths let you start with the fundamentals and progress to highly specialized training.

How up-to-date are your courses and support materials?

We continuously work with our vendors to evaluate and refresh course material to reflect the latest training courses and best practices.

Are your instructors seasoned trainers who have deep knowledge of the training topic?

Ascendient Learning instructors have an average of 27 years of practical IT experience and have also served as consultants for an average of 15 years. To stay current, instructors spend at least 25 percent of their time learning new, emerging technologies and courses.

Do you provide hands-on training and exercises in an actual lab environment?

Lab access is dependent on the vendor and the type of training you sign up for. However, many of our top vendors will provide lab access to students to test and practice. The course description will specify lab access.

Will you customize the training for our company’s specific needs and goals?

We will work with you to identify training needs and areas of growth. We offer a variety of training methods, including private group training delivered on-site at a location of your choice or virtually. We provide courses and certifications that are aligned with your business goals.

How do I get started with certification?

Getting started on a certification pathway depends on your goals and the vendor you choose to get certified in. Many vendors offer entry-level IT certification to advanced IT certification that can boost your career. To get access to certification vouchers and discounts, please contact info@ascendientlearning.com.

Will I get access to content after I complete a course?

You will get access to the PDF of course books and guides, but access to the recording and slides will depend on the vendor and type of training you receive.

How do I request a W9 for Ascendient Learning?

View our filing status and how to request a W9.

Reviews

ExitCertified provided us with a great opportunity to learn more about React in an easy-to-follow way.

They are very good and made sure we had all the appropriate materials for class.

Simply great training provider that I can go for updating/acquiring my skill sets.

This was a good program to get prepared for the solutions architect associate exam.

Very good course, and again we would like to see more videos on removing FRUs.