
Deploying RAG Pipelines for Production at Scale

$500 USD
Duration 1 day
Course Code NV-RAG-PS
Available Formats Classroom

Overview

Retrieval-Augmented Generation (RAG) pipelines are revolutionizing enterprise operations. However, most existing tutorials stop at proof-of-concept implementations that falter when scaling. This workshop aims to bridge that gap, focusing on building scalable, production-ready RAG pipelines powered by NVIDIA NIM microservices and Kubernetes. Participants will gain hands-on experience deploying, monitoring, and scaling RAG pipelines with the NIM Operator and learn best practices for infrastructure optimization, performance monitoring, and handling high traffic volumes.

The workshop begins by building a simple RAG pipeline using the NVIDIA API catalog. Participants will deploy and test individual components in a local environment using Docker Compose. Once familiar with the basics, the focus will shift to deploying NIMs, such as LLM, NeMo Retriever Text Embedding, and NeMo Retriever Text Reranking, in a Kubernetes cluster using the NIM Operator. This will include managing the deployment, monitoring, and scalability of NVIDIA NIM microservices.

Building on these deployments, the workshop will cover constructing a production-grade RAG pipeline using the deployed NIMs and explore NVIDIA's blueprint for PDF ingestion, learning how to integrate it into the RAG pipeline. To ensure operational efficiency, the workshop will introduce Prometheus and Grafana for monitoring pipeline performance, cluster health, and resource utilization. Scalability will be addressed through the Kubernetes Horizontal Pod Autoscaler (HPA), which dynamically scales NIMs based on custom metrics in conjunction with the NIM Operator. Custom dashboards will be created to visualize key metrics and interpret performance insights.
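As a rough sketch of the local, Docker Compose stage described above — the image names, ports, and environment variables here are illustrative placeholders, not the workshop's actual configuration — a minimal compose file wiring a RAG app to an API-hosted model might look like:

```yaml
# docker-compose.yaml -- illustrative only; images, ports, and variable
# names are placeholders, not the workshop's configuration.
services:
  rag-app:
    image: example/rag-app:latest        # hypothetical application container
    ports:
      - "8080:8080"
    environment:
      # At this stage the app calls a hosted model from the NVIDIA API
      # catalog rather than running one locally, so no GPU is required.
      NVIDIA_API_KEY: ${NVIDIA_API_KEY}
  vector-db:
    image: milvusdb/milvus:latest        # example vector store; any store works
    ports:
      - "19530:19530"
```

Running `docker compose up` would then bring up both components locally, which is the proof-of-concept baseline the rest of the workshop hardens for production.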

Skills Gained

By participating in this workshop, participants will be equipped to:

  • Build a simple RAG pipeline using API endpoints, deployed locally with Docker Compose.
  • Deploy a variety of NVIDIA NIM microservices in a Kubernetes cluster using the NIM Operator.
  • Combine NIMs into a cohesive, production-grade RAG pipeline and integrate advanced data ingestion workflows.
  • Monitor RAG pipelines and Kubernetes clusters with Prometheus and Grafana.
  • Scale NIMs to handle high traffic using the NIM Operator.
  • Create, deploy, and scale RAG pipelines for a variety of agentic workflows, including PDF ingestion.

Prerequisites

  • Familiarity working with LLM-based applications
  • Familiarity with RAG pipelines
  • Familiarity working with Kubernetes
  • Familiarity working with Helm

Course Details

Topics Covered

In service of teaching and demonstrating how to deploy enterprise-scale, LLM-based agentic and RAG applications, this workshop will cover the following topics and technologies:

  • The current landscape of enterprise generative AI applications
  • NVIDIA NIM microservices
  • Components and architecture of enterprise-grade RAG applications
  • At-scale inference considerations and optimizations
  • Kubernetes, Helm, and the NVIDIA NIM Operator to deploy, manage, and scale RAG application services
  • Prometheus and Grafana for cluster-wide application behavior and performance visibility
  • Techniques for deploying and scaling multimodal RAG applications at scale

Production Deployment of Generative AI Applications

  • Survey the current landscape of generative AI applications across a wide variety of capabilities.
  • Review the many constituent parts of enterprise-grade generative AI applications, including RAG pipelines.
  • Understand the challenges that enterprises face when transitioning from naive to enterprise-grade generative AI applications.
  • Learn about the capabilities of NVIDIA NIM microservices for deploying LLMs and other generative AI application components.
  • Discuss techniques and technologies for improving the performance of LLM-based application inference at scale.

Core Concepts of Enterprise-scale Generative AI DevOps

  • Survey the current landscape of available DevOps tools for enterprise-scale containerized application deployment.
  • Learn about the value of the Kubernetes container orchestration platform.

Deploying Simple RAG Applications

  • Learn how to access remote LLMs and RAG services over API calls.
  • Review the core LangChain programming patterns used in engineering RAG applications.
  • Build a simple RAG application using API-hosted services, and deploy it with Docker Compose.
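To make the retrieve-augment-generate pattern concrete, here is a dependency-free Python sketch. The toy bag-of-words "embedding", the document strings, and the prompt template are all illustrative stand-ins for the API-hosted embedding and LLM services used in the workshop:

```python
from math import sqrt

def embed(text, vocab):
    """Toy 'embedding': bag-of-words counts over a fixed vocabulary.
    A real pipeline would call a hosted embedding service instead."""
    words = text.lower().split()
    return [words.count(w) for w in vocab]

def cosine(a, b):
    """Cosine similarity between two vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = sqrt(sum(x * x for x in a))
    nb = sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query, docs, vocab, k=1):
    """Return the k documents most similar to the query."""
    q = embed(query, vocab)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d, vocab)), reverse=True)
    return ranked[:k]

def build_prompt(query, context):
    """Augment the query with retrieved context before calling an LLM."""
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

docs = [
    "NIM microservices package models for inference.",
    "Kubernetes schedules containers across a cluster.",
]
vocab = sorted({w for d in docs for w in d.lower().split()})
top = retrieve("What schedules containers?", docs, vocab)[0]
prompt = build_prompt("What schedules containers?", top)
```

In the workshop's version of this pattern, `embed` and the final generation step become HTTP calls to hosted services, but the retrieve-then-augment control flow is the same.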

Kubernetes Core Concepts

  • Learn the core concepts and techniques required for working with Kubernetes clusters.
  • Familiarize yourself with the interactive multi-node Kubernetes cluster programming environment provided to you in this workshop.
  • Interactively utilize `kubectl` to deploy, manage, and monitor container-based applications across a cluster.
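As a sketch of the kind of object managed with `kubectl` — the name and image below are placeholders — a minimal Deployment manifest, applied with `kubectl apply -f deployment.yaml`, looks like:

```yaml
# deployment.yaml -- minimal illustrative Deployment; name and image
# are placeholders, not workshop artifacts.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: rag-frontend
spec:
  replicas: 2                  # Kubernetes keeps two pods of this app running
  selector:
    matchLabels:
      app: rag-frontend
  template:
    metadata:
      labels:
        app: rag-frontend
    spec:
      containers:
        - name: rag-frontend
          image: example/rag-frontend:latest
          ports:
            - containerPort: 8080
```

Commands like `kubectl get pods` and `kubectl describe deployment rag-frontend` then let you monitor what the cluster actually did with this declaration.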

Deploying Self-hosted RAG Applications

  • Deploy and coordinate a variety of containerized microservices in service of a cluster-based RAG application.
  • Learn about and utilize the NIM operator to manage and scale a variety of RAG microservices.
  • Configure storage on your cluster for various model caches.
  • Deploy LLM, text embedding, and text retriever services onto your cluster.
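The NIM Operator introduces custom resources for exactly this. As an illustrative sketch — the exact API group, version, and field names depend on the operator release, so treat everything below as an assumption to verify against the NIM Operator documentation — an LLM NIM with a PVC-backed model cache might be declared like:

```yaml
# nimservice.yaml -- illustrative NIMService custom resource; verify the
# apiVersion and spec fields against your installed NIM Operator version.
apiVersion: apps.nvidia.com/v1alpha1
kind: NIMService
metadata:
  name: meta-llama3-8b-instruct
spec:
  image:
    repository: nvcr.io/nim/meta/llama3-8b-instruct
    tag: latest
  replicas: 1
  storage:
    pvc:
      create: true             # PVC caches downloaded model weights so
      size: 50Gi               # restarts don't re-pull the model
  resources:
    limits:
      nvidia.com/gpu: 1        # one GPU per replica
```

The embedding and reranking NIMs follow the same shape with different images, which is what makes the operator a uniform way to manage the whole set of RAG microservices.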

Monitoring GPU Utilization

  • Use NVIDIA Data Center GPU Manager (DCGM) to monitor and manage GPU resources across a Kubernetes cluster.
  • Deploy Prometheus and Grafana onto the cluster to better monitor and visualize cluster resources.
  • Use the DCGM Exporter to export GPU utilization metrics to Prometheus, where they can be visualized in Grafana.
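A hedged sketch of the wiring: once the DCGM Exporter is running, a ServiceMonitor (a Prometheus Operator resource) tells Prometheus to scrape it; the selector labels and namespace below are placeholders for however the exporter is labeled in your cluster. A Grafana panel can then chart a DCGM metric such as `DCGM_FI_DEV_GPU_UTIL`:

```yaml
# dcgm-servicemonitor.yaml -- illustrative; selector labels and namespace
# are placeholders for your cluster's DCGM Exporter deployment.
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: dcgm-exporter
  namespace: monitoring
spec:
  selector:
    matchLabels:
      app: dcgm-exporter       # must match the exporter Service's labels
  endpoints:
    - port: metrics
      interval: 15s
# Example PromQL for a Grafana panel over the scraped metrics:
#   avg by (gpu) (DCGM_FI_DEV_GPU_UTIL)
```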

Autoscaling NIM Microservices

  • Use Prometheus service monitors to extract custom metrics from the NIM microservices running on the cluster.
  • Create HorizontalPodAutoscalers to automatically scale the cluster's services based on custom metrics.
  • Test and observe automatic horizontal autoscaling by performing load tests using Locust.
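Putting these pieces together, an illustrative HorizontalPodAutoscaler over a custom metric might look like the following. The metric name `num_requests_running` and the target value are assumptions standing in for whichever NIM metric you expose, and serving custom metrics to the HPA requires a metrics adapter (e.g. prometheus-adapter) in the cluster:

```yaml
# nim-hpa.yaml -- illustrative; metric name, targets, and workload name
# are assumptions, not workshop artifacts.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: llm-nim
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: llm-nim
  minReplicas: 1
  maxReplicas: 4
  metrics:
    - type: Pods
      pods:
        metric:
          name: num_requests_running   # hypothetical per-pod NIM metric
        target:
          type: AverageValue
          averageValue: "10"           # scale out above ~10 in-flight requests/pod
```

A Locust load test that pushes the average above the target should then trigger visible scale-out, which is exactly what the lab observes.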

Building Multimodal RAG Pipelines

  • Learn how to isolate different modalities, such as text, figures, and tables, from multimodal PDF documents.
  • Practice various strategies for chunking extracted text.
  • Perform table and image extraction from PDF documents.
  • Use the NV-YOLOX model to identify PDF page elements.
  • Perform end-to-end multimodal data extraction on the ChipNeMo technical paper.
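One of the chunking strategies above — fixed-size chunks with overlap, so content at chunk boundaries stays visible to the retriever from both neighbors — can be sketched in plain Python. The sizes here are arbitrary; production pipelines typically chunk by tokens or document structure rather than raw characters:

```python
def chunk_text(text: str, size: int = 40, overlap: int = 10) -> list[str]:
    """Split text into fixed-size character chunks with overlap.

    Each chunk repeats the last `overlap` characters of the previous
    one, so a sentence fragment split at a boundary is retrievable
    from either side.
    """
    if overlap >= size:
        raise ValueError("overlap must be smaller than size")
    chunks = []
    start = 0
    step = size - overlap
    while start < len(text):
        chunks.append(text[start:start + size])
        start += step
    return chunks
```

For example, `chunk_text("abcdefghij" * 10)` yields four chunks, and each adjacent pair shares a 10-character overlap.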

Using Generative AI to Represent Extracted Modalities

  • Create detailed, contextual descriptions for extracted elements such as text, figures, charts, and tables.
  • Perform image transform using VLMs.
  • Use a state-of-the-art context-aware chart element detection (CHART) model to detect and classify basic chart elements, including plot elements.
  • Combine the use of LLMs and VLMs in an end-to-end example on the ChipNeMo technical paper.

Multimodal Embedding, Storing, and Retrieving

  • Convert all extracted modalities into a common format for use in a universal RAG pipeline.
  • Construct an end-to-end multimodal RAG pipeline.
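The "common format" idea can be sketched as follows: every modality, whatever its origin, is reduced to a text description plus metadata, so a single embedding model and vector store can serve all of them. The record shape and field names here are illustrative, not the workshop's schema:

```python
from dataclasses import dataclass

@dataclass
class RetrievalChunk:
    """Uniform record for any modality once reduced to text."""
    modality: str      # "text", "table", or "chart"
    description: str   # the text actually embedded and retrieved
    source: str        # originating document, for citations

def to_common_format(text_blocks, table_summaries, chart_captions, source):
    """Flatten per-modality extractions into one retrievable list."""
    chunks = [RetrievalChunk("text", t, source) for t in text_blocks]
    chunks += [RetrievalChunk("table", t, source) for t in table_summaries]
    chunks += [RetrievalChunk("chart", t, source) for t in chart_captions]
    return chunks
```

Downstream, the pipeline only ever embeds `description`, which is what makes the RAG pipeline "universal" across modalities.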

Workshop Assessment

  • Use everything you've learned in the workshop to deploy and scale your own enterprise-grade RAG pipeline and earn a certificate of competency for the workshop.


FAQ

Does the course schedule include a lunch break?

Classes typically include a 1-hour lunch break around midday. However, the exact break times and duration can vary depending on the specific class. Your instructor will provide detailed information at the start of the course.

What languages are used to deliver training?

Most courses are conducted in English, unless otherwise specified. Some courses will have the word "FRENCH" marked in red beside the scheduled date(s) indicating the language of instruction.

What does GTR stand for?

GTR stands for Guaranteed to Run; if you see a course with this status, it means this event is confirmed to run. View our GTR page to see our full list of Guaranteed to Run courses.

Does Ascendient Learning deliver group training?

Yes, we provide training for groups, individuals, and private on-site sessions. View our group training page for more information.

What does vendor-authorized training mean?

As a vendor-authorized training partner, we offer a curriculum that our partners have vetted. We use the same course materials and facilitate the same labs as our vendor-delivered training. These courses are considered the gold standard and, as such, are priced accordingly.

Is the training too basic, or will you go deep into technology?

It depends on your requirements, your role in your company, and your depth of knowledge. The good news is that many of our learning paths let you start from the fundamentals and progress to highly specialized training.

How up-to-date are your courses and support materials?

We continuously work with our vendors to evaluate and refresh course material to reflect the latest training courses and best practices.

Are your instructors seasoned trainers who have deep knowledge of the training topic?

Ascendient Learning instructors have an average of 27 years of practical IT experience and have also served as consultants for an average of 15 years. To stay current, instructors spend at least 25 percent of their time learning new, emerging technologies and courses.

Do you provide hands-on training and exercises in an actual lab environment?

Lab access is dependent on the vendor and the type of training you sign up for. However, many of our top vendors will provide lab access to students to test and practice. The course description will specify lab access.

Will you customize the training for our company’s specific needs and goals?

We will work with you to identify training needs and areas of growth. We offer a variety of training methods, such as private group training, on-site delivery at a location of your choice, and virtual delivery. We provide courses and certifications that are aligned with your business goals.

How do I get started with certification?

Getting started on a certification pathway depends on your goals and the vendor you choose to get certified in. Many vendors offer entry-level IT certification to advanced IT certification that can boost your career. To get access to certification vouchers and discounts, please contact info@ascendientlearning.com.

Will I get access to content after I complete a course?

You will get access to the PDF of course books and guides, but access to the recording and slides will depend on the vendor and type of training you receive.

How do I request a W9 for Ascendient Learning?

View our filing status and how to request a W9.
