8296 Reviews (rated 4.5 out of 5 stars)

Introduction to Spark 3 in Scala with Scala Primer

Duration 4 days
Course Code SPRK-108
Available Formats Classroom

Overview

This Spark 3 in Scala with Scala Primer training course gives attendees a solid technical introduction to the Spark architecture and how Spark works. After getting quickly ramped up on Scala, participants learn how to leverage Spark SQL, DataFrames, and DataSets, which are now the preferred programming API. In addition, students explore possible performance issues and strategies for optimization. The course also covers more advanced topics, including the use of Spark Streaming to process streaming data and Kafka server integration.

Skills Gained

All students will:

  • Receive a thorough introduction to Scala
  • Understand the need for Spark in data processing
  • Understand the Spark architecture and how it distributes computations to cluster nodes
  • Be familiar with basic installation/setup/layout of Spark
  • Use the Spark shell for interactive and ad-hoc operations
  • Understand RDDs (Resilient Distributed Datasets), and data partitioning, pipelining, and computations
  • Understand and use RDD operations such as map(), filter(), and others
  • Understand and use Spark SQL and the DataFrame/DataSet API
  • Understand DataSet/DataFrame capabilities, including the Catalyst query optimizer and Tungsten memory/CPU optimizations.
  • Be familiar with performance issues, and use the DataSet/DataFrame and Spark SQL for efficient computations
  • Understand Spark’s data caching and use it for efficient data transfer
  • Write/run standalone Spark programs with the Spark API
  • Use Spark Structured Streaming to process streaming (real-time) data
  • Ingest streaming data from Kafka, and process via Spark Structured Streaming
  • Understand performance implications and optimizations when using Spark

Prerequisites

All attendees must have object-oriented programming knowledge. No previous Scala knowledge is presumed.

Course Details

Training Materials

All Spark training attendees receive comprehensive courseware.

Software Requirements

  • Windows, Mac, or Linux PCs with the current Chrome or Firefox browser.
    • In most class activities, students create Spark code and visualizations in a browser-based notebook environment. The class also details how to export these notebooks and how to run code outside of this environment.
  • Internet access

Outline

  • Introduction
  • Scala Ramp Up
    • Scala Introduction, Variables, Data Types, Control Flow
    • The Scala Interpreter
    • Collections and their Standard Methods (e.g. map())
    • Functions, Methods, Function Literals
    • Class, Object, Trait, Case Class
  • Introduction to Spark
    • Overview, Motivations, Spark Systems
    • Spark Ecosystem
    • Spark vs. Hadoop
    • Acquiring and Installing Spark
    • The Spark Shell, SparkContext
  • RDDs and Spark Architecture
    • RDD Concepts, Lifecycle, Lazy Evaluation
    • RDD Partitioning and Transformations
    • Working with RDDs - Creating and Transforming (map, filter, etc.)
  • Spark SQL, DataFrames, and DataSets
    • Overview
    • SparkSession, Loading/Saving Data, Data Formats (JSON, CSV, Parquet, text, etc.)
    • Introducing DataFrames and DataSets (Creation and Schema Inference)
    • Supported Data Formats (JSON, Text, CSV, Parquet)
    • Working with the DataFrame (untyped) Query DSL (Column, Filtering, Grouping, Aggregation)
    • SQL-based Queries
    • Working with the DataSet (typed) API
    • Mapping and Splitting (flatMap(), explode(), and split())
    • DataSets vs. DataFrames vs. RDDs
  • Shuffling Transformations and Performance
    • Grouping, Reducing, Joining
    • Shuffling, Narrow vs. Wide Dependencies, and Performance Implications
    • Exploring the Catalyst Query Optimizer (explain(), Query Plans, Issues with lambdas)
    • The Tungsten Optimizer (Binary Format, Cache Awareness, Whole-Stage Code Gen)
  • Performance Tuning
    • Caching - Concepts, Storage Type, Guidelines
    • Minimizing Shuffling for Increased Performance
    • Using Broadcast Variables and Accumulators
    • General Performance Guidelines
  • Creating Standalone Applications
    • Core API, SparkSession.Builder
    • Configuring and Creating a SparkSession
    • Building and Running Applications - sbt/build.sbt and spark-submit
    • Application Lifecycle (Driver, Executors, and Tasks)
    • Cluster Managers (Standalone, YARN, Mesos)
    • Logging and Debugging
  • Spark Streaming
    • Introduction and Streaming Basics
    • Structured Streaming
      • Continuous Applications
      • Table Paradigm, Result Table
      • Steps for Structured Streaming
      • Sources and Sinks
    • Consuming Kafka Data
      • Kafka Overview
      • Structured Streaming - "Kafka" format
      • Processing the Stream
  • Conclusion
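
As a rough illustration of the "Collections and their Standard Methods" topic in the Scala Ramp Up module, the sketch below applies map(), filter(), and flatMap() with split() to an ordinary Scala List. It is not taken from the courseware: the object name CollectionsPrimer and the sample sentences are invented for this example, and it uses only the Scala standard library, with no Spark dependency. RDDs and DataSets expose methods with the same names, which is one reason the primer comes before the Spark modules.

```scala
// A minimal sketch, assuming only the Scala standard library (no Spark).
// The object name and the sample data are hypothetical.
object CollectionsPrimer {
  // Invented sample input: two short lines of text.
  val lines: List[String] = List(
    "spark makes big data simple",
    "scala makes spark concise"
  )

  // flatMap() with split(): each line yields several words, flattened into one list.
  val words: List[String] = lines.flatMap(line => line.split(" ").toList)

  // filter() with a function literal: keep words longer than four characters.
  val longWords: List[String] = words.filter(_.length > 4)

  // map(): transform each word into its length.
  val lengths: List[Int] = words.map(_.length)

  // groupBy() followed by map(): count the occurrences of each word.
  val wordCounts: Map[String, Int] =
    words.groupBy(identity).map { case (word, group) => (word, group.size) }

  def main(args: Array[String]): Unit = {
    println(s"words:      $words")
    println(s"long words: $longWords")
    println(s"lengths:    $lengths")
    println(s"counts:     $wordCounts")
  }
}
```

The object can be pasted into the Scala interpreter or compiled and run on its own; calling CollectionsPrimer.main(Array.empty) prints the intermediate collections.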

FAQ

Does the course schedule include a lunch break?

Classes typically include a 1-hour lunch break around midday. However, the exact break times and duration can vary depending on the specific class. Your instructor will provide detailed information at the start of the course.

What languages are used to deliver training?

Most courses are conducted in English, unless otherwise specified. Some courses will have the word "FRENCH" marked in red beside the scheduled date(s) indicating the language of instruction.

What does GTR stand for?

GTR stands for Guaranteed to Run; if you see a course with this status, it means this event is confirmed to run. View our GTR page to see our full list of Guaranteed to Run courses.

Does Ascendient Learning deliver group training?

Yes, we provide training for groups and individuals, as well as private on-site training. View our group training page for more information.

What does vendor-authorized training mean?

As a vendor-authorized training partner, we offer a curriculum that our partners have vetted. We use the same course materials and facilitate the same labs as our vendor-delivered training. These courses are considered the gold standard and, as such, are priced accordingly.

Is the training too basic, or will you go deep into technology?

It depends on your requirements, your role in your company, and your depth of knowledge. The good news is that many of our learning paths let you progress from the fundamentals to highly specialized training.

How up-to-date are your courses and support materials?

We continuously work with our vendors to evaluate and refresh course material to reflect the latest training courses and best practices.

Are your instructors seasoned trainers who have deep knowledge of the training topic?

Ascendient Learning instructors have an average of 27 years of practical IT experience and have also served as consultants for an average of 15 years. To stay current, instructors spend at least 25 percent of their time learning new, emerging technologies and courses.

Do you provide hands-on training and exercises in an actual lab environment?

Lab access is dependent on the vendor and the type of training you sign up for. However, many of our top vendors will provide lab access to students to test and practice. The course description will specify lab access.

Will you customize the training for our company’s specific needs and goals?

We will work with you to identify training needs and areas of growth. We offer a variety of training methods, such as private group training, on-site training at a location of your choice, and virtual delivery. We provide courses and certifications that are aligned with your business goals.

How do I get started with certification?

Getting started on a certification pathway depends on your goals and the vendor you choose to get certified with. Many vendors offer certifications ranging from entry-level to advanced that can boost your career. To get access to certification vouchers and discounts, please contact info@ascendientlearning.com.

Will I get access to content after I complete a course?

You will get access to the PDF of course books and guides, but access to the recording and slides will depend on the vendor and type of training you receive.

How do I request a W9 for Ascendient Learning?

View our filing status and how to request a W9.

Reviews

Some labs are very good, but some steps ask you to update something that is already updated. Overall, it's very good training.

Simply great training provider that I can go for updating/acquiring my skill sets.

ExitCertified provided great learning material and the instructor was great.

It was good and very informative. The instructor covered everything in detail.

The training was very good for understanding the concepts and how to set things up.