8293 Reviews (4.5 out of 5 stars)

Analyzing Big Data with R Programming

Price: $3,020 USD
Duration: 4 days
Course Code: ACCEL-R-ABDP
Available Formats: Classroom

Overview

Ascendient Learning's Analyzing Big Data with R Programming training teaches attendees how to perform in-memory, on-disk, and distributed analysis using H2O, Hadoop, and Apache Spark, and how to integrate Microsoft Machine Learning Server with R.

Skills Gained

  • Understand how R works with big data sets
  • Manage big data in memory with data.table
  • Conduct exploratory data analysis with data.table
  • Learn big data management strategies such as sampling, chunk-and-pull, and pushing compute to the database
  • Run SQL queries directly against R dataframes using DuckDB
  • Use DuckDB as an out-of-memory backend for R dataframes
  • Perform machine learning operations using mlr3
  • Interface with Apache Spark using Sparklyr or SparkR
  • Use H2O for data munging and machine learning

Prerequisites

In addition to their professional experience, students who attend this course should have:

  • Programming experience using R, and familiarity with common R packages
  • Knowledge of common statistical methods and data analysis best practices
  • Basic knowledge of the Microsoft Windows operating system and its core functionality

Course Details

Training Materials

All R training students receive comprehensive courseware.

Software Requirements

  • A recent release of R 4.x
  • IDE or text editor of your choice (RStudio recommended)

Outline

  • Introduction
    • Does R work with big datasets?
    • What challenges does big data introduce when using R?
    • ETL and descriptive data tasks
    • Modeling tasks, optimization challenges
  • In-memory Big Data: data.table
    • Why do we need data.table?
    • The i and the j arguments in data.table
    • Renaming columns
    • Adding new columns
    • Binning data (continuous to categorical)
    • Combining categorical values
    • Transforming variables
    • Group-by functions with data.table
    • Chaining commands with data.table
    • data.table special symbols .N and .SD, and the .SDcols argument
    • Handling missing data
  • EDA with data.table
    • Data subsetting, splitting, and merging
    • Managing datasets
    • Long to wide and back
    • Merging datasets together
    • Stacking datasets together (concatenation)
    • Data summarization
      • Numerical summaries
      • Categorical summaries
      • Multivariate summaries
    • Creating visualizations
  • Big Three Strategies for dealing with Big Data in R
    • https://rviews.rstudio.com/2019/07/17/3-big-data-strategies-for-r/
    • 1. Sampling
    • 2. Chunk-and-pull
    • 3. Push compute to DB
  • DuckDB 
    • Overview: how DuckDB integrates with R
    • Basic SQL commands for working with DuckDB
    • Understanding query performance optimizations
    • Using dbplyr to work with DuckDB
  • mlr3 for Machine Learning in R
    • Overview of mlr3
    • Goals of machine learning
    • mlr3's R6 object-oriented design and methods
    • Defining a task
    • Assigning roles to data
    • Performing a classification
    • Performing a regression
    • Visualization with mlr3
    • Pipelines
    • Model assessment
    • Model optimization
    • Implementing general linear models
    • Establishing and leveraging partitions/clusters
    • Fitting regression models and making predictions
    • Decision trees and random forests
    • Naïve Bayes
    • Implementing stacked models via pipelines
    • Implementing an AutoML model via pipelines
    • Managing resource utilization through parallelization
  • Apache Spark
    • Overview of Spark
    • APIs to use Apache Spark with R
    • Sparklyr versus SparkR
    • R, Python, Java and Scala APIs to Spark
    • Applied Examples using SparkR
    • Spark and H2O together: Sparkling Water
    • Data import and manipulation in Spark(R)
    • The Spark machine learning library MLlib:
      • General linear models
      • Random forest
      • Naïve Bayes
    • Data Munging and Machine Learning via H2O
      • Intro to H2O
      • Launching the cluster, checking status
      • Data import and manipulation in H2O
      • Fitting models in H2O
      • Generalized linear models
      • Naïve Bayes
      • Random forest
      • Gradient boosting machine (GBM)
      • Ensemble model building
      • AutoML
      • Methods for explaining modeling output
  • Conclusion

FAQ

Does the course schedule include a lunch break?

Classes typically include a 1-hour lunch break around midday. However, the exact break times and duration can vary depending on the specific class. Your instructor will provide detailed information at the start of the course.

What languages are used to deliver training?

Most courses are conducted in English, unless otherwise specified. Some courses will have the word "FRENCH" marked in red beside the scheduled date(s) indicating the language of instruction.

What does GTR stand for?

GTR stands for Guaranteed to Run; if you see a course with this status, it means this event is confirmed to run. View our GTR page to see our full list of Guaranteed to Run courses.

Does Ascendient Learning deliver group training?

Yes, we provide training for groups and individuals, including private on-site training. View our group training page for more information.

What does vendor-authorized training mean?

As a vendor-authorized training partner, we offer a curriculum that our partners have vetted. We use the same course materials and facilitate the same labs as our vendor-delivered training. These courses are considered the gold standard and, as such, are priced accordingly.

Is the training too basic, or will you go deep into technology?

It depends on your requirements, your role in your company, and your depth of knowledge. The good news is that many of our learning paths let you progress from the fundamentals to highly specialized training.

How up-to-date are your courses and support materials?

We continuously work with our vendors to evaluate and refresh course material to reflect the latest training courses and best practices.

Are your instructors seasoned trainers who have deep knowledge of the training topic?

Ascendient Learning instructors have an average of 27 years of practical IT experience and have also served as consultants for an average of 15 years. To stay current, instructors spend at least 25 percent of their time learning new, emerging technologies and courses.

Do you provide hands-on training and exercises in an actual lab environment?

Lab access is dependent on the vendor and the type of training you sign up for. However, many of our top vendors will provide lab access to students to test and practice. The course description will specify lab access.

Will you customize the training for our company’s specific needs and goals?

We will work with you to identify training needs and areas of growth. We offer a variety of delivery options, including private group training, on-site training at a location of your choice, and virtual training. We provide courses and certifications that are aligned with your business goals.

How do I get started with certification?

Getting started on a certification pathway depends on your goals and the vendor you choose to get certified in. Many vendors offer certifications ranging from entry-level to advanced that can boost your career. To get access to certification vouchers and discounts, please contact info@ascendientlearning.com.

Will I get access to content after I complete a course?

You will get access to the PDF of course books and guides, but access to the recording and slides will depend on the vendor and type of training you receive.

How do I request a W9 for Ascendient Learning?

View our filing status and instructions for requesting a W9.

Reviews

The tool provided to practice the course teachings is very functional and easy to use.

They were very good. They made sure everyone was able to get into the training and got all of the material needed for class.

Very good and specific course, and above all a very good instructor. In a few days I learned a lot.

Great class, packed with material. I would have liked to spread it over more days, but overall very informative.

The class and material are good. I think some of the software needs to be updated.