8781 Reviews, rated 4.5 out of 5 stars

Building an Enterprise AI Platform

Duration: 3 days
Course Code: GAI-2702
Available Formats: Classroom

Overview

Course Description

Design an enterprise AI platform that tenants build on without rebuilding the plumbing. Platform thinking and multi-tenant control planes apply the platform mindset to AI through golden-path design, opinion versus optionality, tenancy models, namespace and quota design, isolation across compute, data, and identity, GPU capacity planning, and noisy-neighbour patterns. Model serving, routing, and the shared data plane span inference gateways, model registries, traffic-splitting, fallback routing, autoscaling and continuous batching, shared retrieval, vector and feature stores with index lifecycle, memory services, and evaluation tooling that tenants consume. Identity, secrets, entitlements, FinOps, and sustainability draw on per-tenant identity, data-access entitlements, gateway and runtime policy, cost attribution per tenant and per workload, model-mix optimisation, request-level cost telemetry, and sustainability levers. Developer experience and platform-as-product address SDK and CLI design, scaffolding templates, discovery, intake, roadmaps, service-level expectations, and the metrics that prove return on the platform, with positioning across cloud, on-prem, and hybrid deployment. Hands-on labs produce tenancy designs, routing exercises, entitlement modelling, FinOps attribution worksheets, golden paths, and a platform-product canvas. The course is designed for platform, AI, and infrastructure architects plus technical leads building shared AI capability.

Skills Gained

By the end of this course, participants will be able to:

  • Design a multi-tenant control plane that isolates compute, data, and identity for AI workloads
  • Architect model serving and routing layers that decouple tenants from underlying model choices
  • Specify the shared data, memory, and evaluation plane that tenants consume rather than rebuild
  • Enforce identity, entitlements, and policy across the AI platform surface
  • Apply FinOps and sustainability patterns to attribute and reduce AI workload cost
  • Run an AI platform as a product, with discovery, golden paths, and adoption metrics

Who Can Benefit

This course is designed for:

  • Platform Architects
  • AI Architects
  • Infrastructure Architects
  • Technical Leads

Prerequisites

Participants should enter this course with:

  • GAI-1701 or equivalent
  • Familiarity with cloud platform engineering and multi-tenant systems

Organizational Objectives

This course assists organizations to:

  • Reduce duplicated AI infrastructure investment by consolidating shared capability into a platform
  • Lower compliance and entitlement risk through centralized policy enforcement on the AI gateway
  • Improve developer velocity through golden paths that remove repeated platform decisions
  • Build cost transparency for AI workloads through per-tenant and per-request FinOps attribution
  • Establish a platform-as-product discipline that proves return on the AI platform investment

Software

All attendees must have a modern web browser and an Internet connection.

Course Details

Thinking in Platforms and Paved Roads

By the end of this module, you will be able to apply the platform-engineering mindset to AI, design golden paths that remove repeated decisions for tenant teams, and distinguish opinion from optionality at the right altitude.

  • Platform mindset - enabling tenant teams without rebuilding their plumbing
  • Golden paths and paved roads for AI workloads
  • Opinion versus optionality - when to constrain, when to leave choice
  • Team Topologies platform-team mode - X-as-a-Service plus collaboration during onboarding
  • Anti-patterns - platform-as-policy-only, platform-as-IT-2.0, optionality-by-default
  • Maturity progression - enabling team to product team
  • Hands-on Lab: Define the paved-road slice for a candidate AI platform by writing the opinion-versus-optionality matrix and naming three decisions the platform will own on behalf of tenants.

Designing Multi-Tenant Control Planes and Isolation

By the end of this module, you will be able to design a multi-tenant control plane for AI workloads, isolate compute, data, and identity across tenants, and plan GPU capacity that holds under noisy-neighbour conditions.

  • Tenancy models - silo, pool, bridge, and the AI-specific trade-offs
  • Namespace and quota design for AI workloads
  • Isolation across compute, data, and identity layers
  • GPU capacity planning - fractional sharing, gang scheduling, partitioning, quota fairness
  • Schedulers for AI - open-source and commercial offerings
  • Noisy-neighbour patterns and how the control plane absorbs them
  • Hands-on Lab: Run a tenancy-design workshop on a candidate AI platform that picks one tenancy model, draws the isolation boundaries across compute, data, and identity, and names the GPU scheduling approach.

Serving Models and Routing AI Workloads Across Tenants

By the end of this module, you will be able to architect model serving and routing layers that decouple tenants from underlying model choices, choose an inference gateway, and design autoscaling that matches the latency profile of an LLM workload.

  • Inference gateways - open-source, commercial, and cloud-native
  • Decision axes - data residency, p99 latency, native guardrails, cost attribution granularity, semantic caching
  • Model registries, traffic-splitting, and fallback routing across providers
  • Inference autoscaling - continuous batching, queue-based scaling, cold-start engineering
  • Batch versus online serving as separate platform contracts
  • Per-tenant priority queues and request shaping
  • Hands-on Lab: Run a routing-strategy exercise on a candidate platform that picks an inference gateway, defines the routing policy across providers, and names the autoscaling signal the platform will use.
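
As a rough illustration of the fallback-routing topic in this module, the sketch below tries providers in priority order and reports which one served the request. It is only a minimal sketch under stated assumptions: the `Route` type, the `route_with_fallback` function, and the two stand-in providers are hypothetical names invented for this example, not part of any specific gateway product or of the course materials.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class Route:
    """Hypothetical routing target: a label plus a callable that serves a prompt."""
    name: str
    call: Callable[[str], str]


class AllRoutesFailed(RuntimeError):
    """Raised when every route in the policy has failed."""


def route_with_fallback(prompt: str, routes: Sequence[Route]) -> tuple[str, str]:
    """Try each route in priority order and return (route name, completion)."""
    errors = []
    for route in routes:
        try:
            return route.name, route.call(prompt)
        except Exception as exc:  # a real gateway would catch narrower error types
            errors.append(f"{route.name}: {exc}")
    raise AllRoutesFailed("; ".join(errors))


# Stand-in providers (assumptions for illustration only).
def flaky_primary(prompt: str) -> str:
    raise TimeoutError("upstream timeout")


def steady_secondary(prompt: str) -> str:
    return f"echo: {prompt}"


routes = [Route("primary", flaky_primary), Route("secondary", steady_secondary)]
served_by, completion = route_with_fallback("hello", routes)
```

Because tenants call only `route_with_fallback`, the ordered route list can change without tenant code changing, which is one way to read "decoupling tenants from underlying model choices".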

Sharing the Data, Memory, and Evaluation Plane

By the end of this module, you will be able to specify the shared data, memory, and evaluation plane that tenants consume rather than rebuild, design a vector-index lifecycle that respects ACLs and freshness, and run evaluation as a first-class platform service.

  • Shared retrieval infrastructure and the boundary with tenant-owned data
  • Vector store choices - managed services, open-source databases, and SQL-extension indexes
  • Vector-index lifecycle - rebuild, backfill, freshness, ACL sync
  • Feature stores for AI as platform tenancy patterns
  • Memory services as platform primitives, not per-tenant stores
  • Evaluation-as-a-service - golden-set governance, CI gating, judge calibration
  • Hands-on Lab: Specify the shared data and evaluation plane for a candidate platform including the vector store choice, the index-lifecycle policy, and the evaluation contract tenants will consume.

Enforcing Identity, Secrets, Entitlements, and Policy

By the end of this module, you will be able to enforce identity, entitlements, and policy across the AI platform surface, design authorisation-before-retrieval for RAG, and pick the policy engine that fits the platform’s enforcement points.

  • Per-tenant identity rooted in the IdP - JWT, workload identity, SPIFFE/SPIRE
  • Authorisation before retrieval - identity-aware filters before vector search
  • Fine-grained authorisation engines - OPA, AuthZEN, plus vendor-offered policy engines
  • Per-document ACL sync into vector metadata with versioning
  • Gateway-layer policy versus runtime-layer policy
  • Secret management for foundation-model credentials and tenant tokens
  • Hands-on Lab: Draft an entitlement-modelling design for a candidate platform that names the IdP, the authorisation engine, the ACL-sync path into vector metadata, and the policy boundary at gateway and runtime.
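
To make "authorisation before retrieval" concrete, here is a minimal sketch that applies tenant and group filters before any similarity ranking, so chunks the caller cannot see are never scored. Everything in it is an assumption made for illustration: the `Chunk` type, the `authorised_search` function, the in-memory index, and the two-dimensional embeddings stand in for a real vector store and ACL metadata.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    """Hypothetical indexed chunk carrying ACL metadata next to its embedding."""
    doc_id: str
    tenant: str
    allowed_groups: frozenset
    embedding: tuple


def cosine(a, b) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def authorised_search(query, chunks, tenant, groups, k=3):
    """Filter by identity first, then rank only the chunks the caller may see."""
    visible = [
        c for c in chunks
        if c.tenant == tenant and c.allowed_groups & groups
    ]
    ranked = sorted(visible, key=lambda c: cosine(query, c.embedding), reverse=True)
    return [c.doc_id for c in ranked[:k]]


# Toy index (assumed data): two tenants, two groups.
index = [
    Chunk("hr-policy", "acme", frozenset({"hr"}), (1.0, 0.0)),
    Chunk("eng-runbook", "acme", frozenset({"eng"}), (0.9, 0.1)),
    Chunk("other-tenant", "globex", frozenset({"eng"}), (1.0, 0.0)),
]
results = authorised_search((1.0, 0.0), index, tenant="acme", groups=frozenset({"eng"}))
```

In this toy index the closest match to the query is a document the caller is not entitled to, so filtering after ranking could let it crowd permitted results out of the top k. That is the usual argument for filtering first.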

Running FinOps and Sustainability for AI Workloads

By the end of this module, you will be able to apply FinOps patterns that attribute and reduce AI workload cost, name the cost levers the platform owns versus the tenant owns, and report sustainability against current AI-energy standards.

  • FinOps for AI - cost attribution per tenant, per workload, per feature
  • Cost levers - model-router cascade, prompt and semantic caching, batching, output-token discipline
  • Platform-owned versus tenant-owned levers - who pulls each at which layer
  • Quantisation, distillation, and continuous batching as platform decisions
  • Sustainability standards - Green Software Foundation SCI for AI plus model-energy scoring frameworks
  • Cost-per-feature reporting and the path to budget-aware platform decisions
  • Hands-on Lab: Build a FinOps attribution worksheet for a candidate platform that maps each cost lever to its owner, its layer, and its reporting line, and adds the sustainability metric the platform will publish.
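
The per-tenant attribution idea in this module can be sketched as a roll-up of request-level telemetry. This is a minimal sketch under stated assumptions: the price table, the model tier names, the `Request` fields, and the cost units are all hypothetical and do not reflect any provider's real pricing.

```python
from collections import defaultdict
from dataclasses import dataclass

# Hypothetical prices in cost units per 1,000 tokens (not real provider pricing).
PRICE_PER_1K = {
    "small": {"input": 0.5, "output": 1.5},
    "large": {"input": 5.0, "output": 15.0},
}


@dataclass(frozen=True)
class Request:
    """Hypothetical request-level telemetry record."""
    tenant: str
    model: str
    input_tokens: int
    output_tokens: int


def request_cost(req: Request) -> float:
    """Cost of a single request from its token counts and the model's price."""
    price = PRICE_PER_1K[req.model]
    return (req.input_tokens * price["input"]
            + req.output_tokens * price["output"]) / 1000


def cost_by_tenant(requests) -> dict:
    """Sum request costs per tenant."""
    totals = defaultdict(float)
    for req in requests:
        totals[req.tenant] += request_cost(req)
    return dict(totals)


telemetry = [
    Request("tenant-a", "small", 1000, 500),
    Request("tenant-a", "large", 2000, 1000),
    Request("tenant-b", "small", 4000, 2000),
]
totals = cost_by_tenant(telemetry)
```

Grouping by `(tenant, model)` or by a feature tag instead of by tenant alone would give the per-workload and per-feature views that the module also lists.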

Designing Developer Experience and Golden Paths

By the end of this module, you will be able to design the developer experience that drives platform adoption, scaffold AI workloads through golden-path templates, and pick the discovery surface that lets tenants find what the platform offers.

  • SDK and CLI design for AI platforms
  • Scaffolding templates that bake in tenancy, identity, and observability defaults
  • Catalogue and discovery surfaces - Backstage and AI-specific extensions
  • Golden paths versus optional paths - documenting the trade-off
  • DevEx metrics - time-to-first-call, time-to-production, support volume
  • Anti-patterns - documentation-as-DevEx, template-without-runtime, abandoned scaffolds
  • Hands-on Lab: Design the golden path for a candidate AI platform that names the scaffolding template, the discovery surface, the SDK shape, and the time-to-production target the platform commits to.

Running the Platform as a Product

By the end of this module, you will be able to run an AI platform as a product with intake, roadmaps, and adoption metrics that prove return, write the tenant contract artefacts the platform commits to, and position the platform across cloud, on-prem, and hybrid deployment.

  • Intake, roadmaps, and the discovery process for new tenant requests
  • Service-level expectations as a written contract per tier
  • Adoption metrics - tenant count, time-to-production, retention, support load
  • Deprecation and EOL policy for platform features
  • Shared-responsibility model - where platform ends and tenant begins
  • Positioning across cloud, on-prem, and hybrid deployment paths
  • Hands-on Lab: Produce a capstone platform-product canvas for a candidate AI platform that includes the SLA matrix, the deprecation policy, the shared-responsibility statement, and the adoption metrics the platform reports.

FAQ

Does the course schedule include a lunch break?

Classes typically include a 1-hour lunch break around midday. However, the exact break times and duration can vary depending on the specific class. Your instructor will provide detailed information at the start of the course.

What languages are used to deliver training?

Most courses are conducted in English, unless otherwise specified. Some courses will have the word "FRENCH" marked in red beside the scheduled date(s) indicating the language of instruction.

What does GTR stand for?

GTR stands for Guaranteed to Run; if you see a course with this status, it means this event is confirmed to run. View our GTR page to see our full list of Guaranteed to Run courses.

Does Ascendient Learning deliver group training?

Yes, we provide training for groups and individuals, as well as private on-site sessions. View our group training page for more information.

What does vendor-authorized training mean?

As a vendor-authorized training partner, we offer a curriculum that our partners have vetted. We use the same course materials and facilitate the same labs as our vendor-delivered training. These courses are considered the gold standard and, as such, are priced accordingly.

Is the training too basic, or will you go deep into technology?

It depends on your requirements, your role in your company, and your depth of knowledge. The good news is that many of our learning paths let you progress from the fundamentals to highly specialized training.

How up-to-date are your courses and support materials?

We continuously work with our vendors to evaluate and refresh course material to reflect the latest training courses and best practices.

Are your instructors seasoned trainers who have deep knowledge of the training topic?

Ascendient Learning instructors have an average of 27 years of practical IT experience and have also served as consultants for an average of 15 years. To stay current, instructors spend at least 25 percent of their time learning new, emerging technologies and courses.

Do you provide hands-on training and exercises in an actual lab environment?

Lab access is dependent on the vendor and the type of training you sign up for. However, many of our top vendors will provide lab access to students to test and practice. The course description will specify lab access.

Will you customize the training for our company’s specific needs and goals?

We will work with you to identify training needs and areas of growth. We offer a variety of training methods, such as private group training, on-site training at a location of your choice, and virtual delivery. We provide courses and certifications that are aligned with your business goals.

How do I get started with certification?

Getting started on a certification pathway depends on your goals and the vendor you choose to get certified with. Many vendors offer certifications ranging from entry level to advanced that can boost your career. To get access to certification vouchers and discounts, please contact info@ascendientlearning.com.

Will I get access to content after I complete a course?

You will get access to the PDF of course books and guides, but access to the recording and slides will depend on the vendor and type of training you receive.

How do I request a W9 for Ascendient Learning?

View our filing status and how to request a W9.

Reviews

This was a good program to get prepared for the solutions architect associate exam.

Quick to sign-up to course, and was able to garner some information from the course.

The class was very fast paced; however, the teacher was very good at checking in on us while giving us time to complete the labs.

Course was great and the instructor had an answer for anything that was asked during the course.

ExitCertified provided a very organized way to learn and provided materials to follow along.