Recommended Databricks Training
The IT Skills Taught in Databricks Training
Big data came to the forefront of technology in the mid- to late 2000s. As organizations started to gather more and more data, they struggled to find ways to store and process it, which may ultimately have resulted in fewer major innovations. Although the Apache Hadoop project gained popularity quickly, many organizations faced technical challenges in working with the distributed processing framework. But when a group of UC Berkeley students built Apache Spark, it was quickly recognized as the next major step in big data processing. Those students went on to start the company Databricks, leveraging their intimate knowledge of the Apache Spark framework to build a web-based platform for working with Spark. Databricks offers automated cluster management and IPython-style notebooks that make working with data very easy for data scientists and developers alike. Databricks also developed Delta Lake, the open source storage layer for data lakes. Due to its ability to execute and the completeness of its vision, Databricks has been named a leader in Gartner’s Magic Quadrant for Data Science and Machine Learning for three years in a row.
To help meet its customers’ growing needs for training, Databricks chose Ascendient Learning to be its sole certified training partner in North America. So if you’re looking for an in-depth three-day Apache Spark Programming class or are interested in Data Engineering with Databricks or Scalable Machine Learning with Apache Spark, we have you covered. Explore the Databricks training courses below and get started on your learning journey.
Databricks Certified Data Analyst Associate
The Databricks Certified Data Analyst Associate certification exam assesses an individual’s ability to use the Databricks SQL service to complete introductory data analysis tasks. This includes an understanding of the Databricks SQL service and its capabilities, an ability to manage data with Databricks tools following best practices, using SQL to complete data tasks in the Lakehouse, creating production-grade data visualizations and dashboards, and developing analytics applications to solve common data analytics problems. Individuals who pass this certification exam can be expected to complete basic data analysis tasks using Databricks SQL and its associated capabilities.
The exam covers:
Databricks SQL – 22%
Data Management – 20%
SQL – 29%
Data Visualization and Dashboards – 18%
Analytics Applications – 11%
Assessment Details
Type: Proctored certification
Total number of questions: 45
Time limit: 90 minutes
Registration fee: $200
Question types: Multiple choice
Test aids: None allowed
Languages: English
Delivery method: Online proctored
Prerequisites: None, but related training highly recommended
Recommended experience: 6+ months of hands-on experience performing the data analysis tasks outlined in the exam guide
Validity period: 2 years
Recertification: Recertification is required every two years to maintain your certified status. To recertify, you must take the current version of the exam. Please review the “Getting Ready for the Exam” section below to prepare for your recertification exam.
Unscored content: Exams may include unscored items to gather statistical information for future use. These items are not identified on the form and do not impact your score. Additional time is factored into the exam to account for this content.
Related Training
Instructor-led: Data Analysis with Databricks
Self-paced (available in Databricks Academy): Data Analysis with Databricks. This self-paced course will soon be replaced with the following two modules.
AI/BI for Data Analysts
SQL Analytics on Databricks
Getting Ready for the Exam
Review the Data Analyst Associate Exam Guide to understand what will be on the exam
Take the related training
Register for the exam
Review the technical requirements for taking an online proctored exam and run a system check
Review the exam guide again to identify any gaps
Study to fill in the gaps
Take your exam!
The certification exam will assess the tester’s ability to use SQL. In all cases, the SQL in this certification exam adheres to ANSI SQL standards.
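As a rough illustration of the kind of standard SQL aggregation a data analyst works with, the sketch below runs a GROUP BY query against an in-memory SQLite database. SQLite is only a stand-in so the example runs anywhere. The `sales` table, its columns, and its rows are hypothetical, and nothing here is taken from the exam or from Databricks SQL itself.

```python
import sqlite3

# Hypothetical data: table name, columns, and rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO sales (region, amount) VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 75.0)],
)

# Standard SQL: aggregate per group, filter the groups, order the result.
query = """
    SELECT region, SUM(amount) AS total_amount
    FROM sales
    GROUP BY region
    HAVING SUM(amount) > 60
    ORDER BY region
"""
totals = conn.execute(query).fetchall()
conn.close()

for region, total in totals:
    print(region, total)
```

The same SELECT, GROUP BY, HAVING, and ORDER BY clauses are part of standard SQL, so a query of this shape should carry over to other SQL engines with little or no change.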
Databricks Certified Data Engineer Associate
The Databricks Certified Data Engineer Associate certification exam assesses an individual’s ability to use the Databricks Lakehouse Platform to complete introductory data engineering tasks. This includes an understanding of the Lakehouse Platform and its workspace, its architecture, and its capabilities. It also assesses the ability to perform multi-hop architecture ETL tasks using Apache Spark™ SQL and Python in both batch and incrementally processed paradigms. Finally, the exam assesses the tester’s ability to put basic ETL pipelines and Databricks SQL queries and dashboards into production while maintaining entity permissions. Individuals who pass this certification exam can be expected to complete basic data engineering tasks using Databricks and its associated tools.
The exam covers:
Databricks Lakehouse Platform – 24%
ELT With Spark SQL and Python – 29%
Incremental Data Processing – 22%
Production Pipelines – 16%
Data Governance – 9%
Assessment Details
Type: Proctored certification
Total number of questions: 45
Time limit: 90 minutes
Registration fee: $200
Question types: Multiple choice
Test aids: None allowed
Languages: English, 日本語, Português BR, 한국어
Delivery method: Online proctored
Prerequisites: None, but related training highly recommended
Recommended experience: 6+ months of hands-on experience performing the data engineering tasks outlined in the exam guide
Validity period: 2 years
Recertification: Recertification is required every two years to maintain your certified status. To recertify, you must take the current version of the exam. Please review the “Getting Ready for the Exam” section below to prepare for your recertification exam.
Unscored content: Exams may include unscored items to gather statistical information for future use. These items are not identified on the form and do not impact your score. Additional time is factored into the exam to account for this content.
Related Training
Instructor-led: Data Engineering With Databricks
Self-paced (available in Databricks Academy):
Data Ingestion with Delta Lake
Deploy Workloads with Databricks Workflows
Build Data Pipelines with Delta Live Tables
Data Management and Governance with Unity Catalog
Getting Ready for the Exam
Review the Data Engineer Associate Exam Guide to understand what will be on the exam
Take the related training
Register for the exam
Review the technical requirements for taking an online proctored exam and run a system check
Review the exam guide again to identify any gaps
Study to fill in the gaps
Take your exam!
Data manipulation code in this exam is provided in SQL when possible. In all other cases, code will be in Python.
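The multi-hop pattern named in the exam description is commonly described as refining data through successive stages, often called bronze (raw), silver (cleaned), and gold (aggregated). The plain-Python sketch below only illustrates that flow. It is not Spark or Delta Lake code, and the record fields and function names are hypothetical.

```python
from collections import defaultdict

# Bronze: raw records as ingested (hypothetical fields, some malformed).
bronze = [
    {"user": "a", "amount": "10.5"},
    {"user": "b", "amount": "oops"},   # unparseable amount
    {"user": "a", "amount": "4.5"},
    {"user": None, "amount": "3.0"},   # missing user
]

def to_silver(rows):
    """Clean step: drop rows with no user or a non-numeric amount."""
    cleaned = []
    for row in rows:
        if row["user"] is None:
            continue
        try:
            amount = float(row["amount"])
        except ValueError:
            continue
        cleaned.append({"user": row["user"], "amount": amount})
    return cleaned

def to_gold(rows):
    """Aggregate step: total amount per user."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["user"]] += row["amount"]
    return dict(totals)

silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

Keeping each hop as its own function mirrors the idea that every stage can be inspected and rerun independently of the others.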
Databricks Certified Data Engineer Professional
The Databricks Certified Data Engineer Professional certification exam assesses an individual’s ability to use Databricks to perform advanced data engineering tasks. This includes an understanding of the Databricks platform and developer tools like Apache Spark™, Delta Lake, MLflow, and the Databricks CLI and REST API. It also assesses the ability to build optimized and cleaned ETL pipelines. Additionally, the ability to model data into a lakehouse using knowledge of general data modeling concepts will be assessed. Finally, being able to ensure that data pipelines are secure, reliable, monitored and tested before deployment will also be included in this exam. Individuals who pass this certification exam can be expected to complete advanced data engineering tasks using Databricks and its associated tools.
The exam covers:
Databricks Tooling – 20%
Data Processing – 30%
Data Modeling – 20%
Security and Governance – 10%
Monitoring and Logging – 10%
Testing and Deployment – 10%
Assessment Details
Type: Proctored certification
Total number of questions: 60
Time limit: 120 minutes
Registration fee: $200
Question types: Multiple choice
Test aids: None allowed
Languages: English
Delivery method: Online proctored
Prerequisites: None, but related training highly recommended
Recommended experience: 1+ years of hands-on experience performing the data engineering tasks outlined in the exam guide
Validity period: 2 years
Recertification: Recertification is required every two years to maintain your certified status. To recertify, you must take the current version of the exam. Please review the “Getting Ready for the Exam” section below to prepare for your recertification exam.
Unscored content: Exams may include unscored items to gather statistical information for future use. These items are not identified on the form and do not impact your score. Additional time is factored into the exam to account for this content.
Related Training
Instructor-led: Advanced Data Engineering With Databricks
Self-paced (available in Databricks Academy):
Databricks Streaming and Delta Live Tables - Delta Dawn
Databricks Data Privacy
Databricks Performance Optimization Delta Dawn
Automated Testing and Deployment with Databricks Asset Bundle
Getting Ready for the Exam
Review the Data Engineer Professional Exam Guide to understand what will be on the exam
Take the related training
Register for the exam
Review the technical requirements and run a system check
Review the exam guide again to identify any gaps
Study to fill in the gaps
Take your exam!
Code examples in this exam will primarily be in Python. However, any and all references to Delta Lake functionality will be made in SQL.
Databricks Certified Machine Learning Associate
The Databricks Certified Machine Learning Associate certification exam assesses an individual’s ability to use Databricks to perform basic machine learning tasks. This includes an ability to understand and use Databricks and its machine learning capabilities like AutoML, Unity Catalog and select features of MLflow. It also assesses the ability to explore data and perform feature engineering. Additionally, the exam assesses model building through training, tuning, evaluation, and selection. Finally, an ability to deploy machine learning models is assessed. Individuals who pass this certification exam can be expected to complete basic machine learning tasks using Databricks and its associated tools.
This exam covers:
Databricks Machine Learning – 38%
ML Workflows – 19%
Model Development – 31%
Model Deployment – 12%
Assessment Details
Type: Proctored certification
Total number of questions: 48
Time limit: 90 minutes
Registration fee: $200
Question types: Multiple choice
Test aids: None allowed
Languages: English, 日本語, Português BR, 한국어
Delivery method: Online proctored
Prerequisites: None, but related training highly recommended
Recommended experience: 6+ months of hands-on experience performing the machine learning tasks outlined in the exam guide
Validity period: 2 years
Recertification: Recertification is required every two years to maintain your certified status. To recertify, you must take the current version of the exam. Please review the “Getting Ready for the Exam” section below to prepare for your recertification exam.
Unscored content: Exams may include unscored items to gather statistical information for future use. These items are not identified on the form and do not impact your score. Additional time is factored into the exam to account for this content.
Related Training
Instructor-led: Machine Learning With Databricks
Self-paced (available in Databricks Academy):
Data Preparation for Machine Learning
Machine Learning Model Deployment
Machine Learning Model Development
Machine Learning Ops
Getting Ready for the Exam
Review the Machine Learning Associate Exam Guide to understand what will be on the exam
Take the related training
Register for the exam
Review the technical requirements for taking an online proctored exam and run a system check
Review the exam guide again to identify any gaps
Study to fill in the gaps
Take your exam!
All machine learning code within this exam will be in Python. In the case of workflows or code not specific to machine learning tasks, data manipulation code could be provided in SQL.
Databricks Certified Machine Learning Professional
The Databricks Certified Machine Learning Professional certification exam assesses an individual’s ability to use Databricks Machine Learning and its capabilities to perform advanced machine learning in production tasks. This includes the ability to track, version, and manage machine learning experiments and manage the machine learning model lifecycle. In addition, the certification exam assesses the ability to implement strategies for deploying machine learning models. Finally, test-takers will also be assessed on their ability to build monitoring solutions to detect data drift. Individuals who pass this certification exam can be expected to perform advanced machine learning engineering tasks using Databricks Machine Learning.
This exam covers:
Experimentation – 30%
Model Lifecycle Management – 30%
Model Deployment – 25%
Solution and Data Monitoring – 15%
Assessment Details
Type: Proctored certification
Total number of questions: 60
Time limit: 120 minutes
Registration fee: $200
Question types: Multiple choice
Test aids: None allowed
Languages: English
Delivery method: Online proctored
Prerequisites: None, but related training highly recommended
Recommended experience: 1+ years of hands-on experience performing the machine learning tasks outlined in the exam guide
Validity period: 2 years
Recertification: Recertification is required every two years to maintain your certified status. To recertify, you must take the current version of the exam. Please review the “Getting Ready for the Exam” section below to prepare for your recertification exam.
Unscored content: Exams may include unscored items to gather statistical information for future use. These items are not identified on the form and do not impact your score. Additional time is factored into the exam to account for this content.
Related Training
Instructor-led: Machine Learning in Production
Self-paced (available in Databricks Academy):
Machine Learning at Scale
Advanced ML Ops
Getting Ready for the Exam
Review the Machine Learning Professional Exam Guide to understand what will be on the exam
Take the related training
Register for the exam
Review the technical requirements and run a system check
Review the exam guide again to identify any gaps
Study to fill in the gaps
Take your exam!
The certification exam will assess the tester’s ability to use SQL. In all cases, the SQL in this certification exam adheres to ANSI SQL standards.
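The data-drift monitoring mentioned in the exam description can be illustrated with a very simple check: compare the mean of a feature in recent data with its mean in reference data, measured in reference standard deviations. This is only one of many possible approaches and is not presented as the method Databricks uses or the exam tests. The sample values, function name, and threshold are hypothetical.

```python
from statistics import mean, stdev

def mean_shift_score(reference, current):
    """Shift of the current mean, measured in reference standard deviations."""
    ref_std = stdev(reference)
    if ref_std == 0:
        raise ValueError("reference data has zero variance")
    return abs(mean(current) - mean(reference)) / ref_std

# Hypothetical feature values: a training-time sample and two recent samples.
reference = [10.0, 11.0, 9.0, 10.0, 10.0]
stable = [10.0, 9.0, 11.0, 10.0, 10.0]
shifted = [14.0, 15.0, 13.0, 14.0, 14.0]

THRESHOLD = 2.0  # illustrative cutoff, not a recommended value

for name, sample in [("stable", stable), ("shifted", shifted)]:
    score = mean_shift_score(reference, sample)
    print(name, round(score, 2), "drift" if score > THRESHOLD else "ok")
```

A mean comparison catches only shifts in location. Changes in spread or shape would need a different statistic.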
Databricks Certified Generative AI Engineer Associate
The Databricks Certified Generative AI Engineer Associate certification exam assesses an individual’s ability to design and implement LLM-enabled solutions using Databricks. This includes problem decomposition to break down complex requirements into manageable tasks as well as choosing appropriate models, tools and approaches from the current generative AI landscape for developing comprehensive solutions. It also assesses Databricks-specific tools such as Vector Search for semantic similarity searches, Model Serving for deploying models and solutions, MLflow for managing a solution lifecycle, and Unity Catalog for data governance. Individuals who pass this exam can be expected to build and deploy performant RAG applications and LLM chains that take full advantage of Databricks and its toolset.
The exam covers:
Design Applications – 14%
Data Preparation – 14%
Application Development – 30%
Assembling and Deploying Apps – 22%
Governance – 8%
Evaluation and Monitoring – 12%
Assessment Details
Type: Proctored certification
Total number of questions: 45
Time limit: 90 minutes
Registration fee: $200
Question types: Multiple choice
Test aids: None allowed
Languages: English, 日本語, Português BR, 한국어
Delivery method: Online proctored
Prerequisites: None, but related training highly recommended
Recommended experience: 6+ months of hands-on experience performing the generative AI solutions tasks outlined in the exam guide
Validity period: 2 years
Recertification: Recertification is required every two years to maintain your certified status. To recertify, you must take the current version of the exam. Please review the “Getting Ready for the Exam” section below to prepare for your recertification exam.
Unscored content: Exams may include unscored items to gather statistical information for future use. These items are not identified on the form and do not impact your score. Additional time is factored into the exam to account for this content.
Related Training
Instructor-led: Generative AI Engineering With Databricks
Self-paced (available in Databricks Academy): Generative AI Engineering with Databricks. This self-paced course will soon be replaced with the following four modules.
Generative AI Solution Development (RAG)
Generative AI Application Development (Agents)
Generative AI Application Evaluation and Governance
Generative AI Application Deployment and Monitoring
Getting Ready for the Exam
Review the Databricks Certified Generative AI Engineer Associate Exam Guide to understand what will be on the exam
Take the related training
Register for the exam
Review the technical requirements and run a system check
Review the exam guide again to identify any gaps
Study to fill in the gaps
Take your exam!
All machine learning code within this exam will be in Python. In the case of workflows or code not specific to machine learning tasks, data manipulation code could be provided in SQL.
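Semantic similarity search, which the exam description associates with Vector Search, generally ranks documents by how close their embedding vectors are to a query vector. The minimal sketch below uses cosine similarity for that ranking. The three-dimensional "embeddings" and document names are hypothetical stand-ins for real model output, and the code does not use or represent the Databricks Vector Search API.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical toy embeddings; real ones come from an embedding model.
documents = {
    "doc_spark": [0.9, 0.1, 0.0],
    "doc_sql": [0.1, 0.9, 0.0],
    "doc_ml": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]

# Rank documents from most to least similar to the query.
ranked = sorted(
    documents,
    key=lambda name: cosine_similarity(query, documents[name]),
    reverse=True,
)
print(ranked)
```

In a retrieval-augmented (RAG) application, the top-ranked documents would typically be passed to the language model as context for its answer.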
Databricks Certified Associate Developer for Apache Spark
The Databricks Certified Associate Developer for Apache Spark certification exam assesses the understanding of the Spark DataFrame API and the ability to apply the Spark DataFrame API to complete basic data manipulation tasks within a Spark session. These tasks include selecting, renaming and manipulating columns; filtering, dropping, sorting, and aggregating rows; handling missing data; combining, reading, writing and partitioning DataFrames with schemas; and working with UDFs and Spark SQL functions. In addition, the exam will assess the basics of the Spark architecture like execution/deployment modes, the execution hierarchy, fault tolerance, garbage collection, and broadcasting. Individuals who pass this certification exam can be expected to complete basic Spark DataFrame tasks using Python or Scala.
Learning Pathway
This certification is part of the Apache Spark learning pathway.
Exam Details
Key details about the certification exam are provided below.
Minimally Qualified Candidate
The minimally qualified candidate should be able to:
Understand the basics of the Spark architecture, including Adaptive Query Execution
Apply the Spark DataFrame API to complete individual data manipulation tasks, including:
selecting, renaming and manipulating columns
filtering, dropping, sorting, and aggregating rows
joining, reading, writing and partitioning DataFrames
working with UDFs and Spark SQL functions
While it will not be explicitly tested, the candidate must have a working knowledge of either Python or Scala. The exam is available in both languages.
Duration
Testers will have 120 minutes to complete the certification exam.
Questions
There are 60 multiple-choice questions on the certification exam. The questions will be distributed by high-level topic in the following way:
Apache Spark Architecture Concepts – 17% (10/60)
Apache Spark Architecture Applications – 11% (7/60)
Apache Spark DataFrame API Applications – 72% (43/60)
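The percentages above follow from the question counts. The short check below recomputes each topic's share of the 60 questions, using illustrative variable names. Since 7/60 works out to about 11.7%, the published figures appear to have been rounded so that they total 100%.

```python
# Question counts per topic, as listed above.
counts = {
    "Apache Spark Architecture Concepts": 10,
    "Apache Spark Architecture Applications": 7,
    "Apache Spark DataFrame API Applications": 43,
}

total = sum(counts.values())

# Exact share of the exam for each topic, as a percentage to one decimal.
shares = {topic: round(100 * n / total, 1) for topic, n in counts.items()}

for topic, share in shares.items():
    print(f"{topic}: {counts[topic]}/{total} = {share}%")
```

Either way, roughly seven of every ten questions concern applying the DataFrame API, which suggests where most preparation time should go.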
Cost
Each attempt at the certification exam costs $200. Testers might be subject to taxes depending on their location. Testers may retake the exam as many times as they like, but they must pay $200 for each attempt.
Test Aids
The following test aids will be available to be used by candidates during the exam:
Apache Spark API documentation for the language in which they’re taking the exam. An example of these test aids is available here: Python/Scala.
A digital notepad to use during the active exam time – candidates will not be able to bring notes to the exam or take notes away from the exam
Programming Language
This certification exam is available in Python and Scala.
Preparation
In order to learn the content assessed by the certification exam, candidates should take one of the following Databricks Academy courses:
Instructor-led: Apache Spark Programming with Databricks
Self-paced: Apache Spark Programming with Databricks (available in Databricks Academy)
In addition, candidates can learn more about the certification exam by taking the Certification Overview: Databricks Certified Associate Developer for Apache Spark Exam course.
Before taking the exam, it is recommended that you complete the practice exam for your language of choice: Python or Scala.