Measuring the Value in Generative AI Projects

Anne Fernandez | Wednesday, July 2, 2025

Measuring the Value in Generative AI Projects

Measuring the Value in Generative AI Projects

As Generative AI (GenAI) continues to permeate various industries, understanding and measuring its value is becoming increasingly crucial. In a recent webinar, Dr. Dan Grahn outlined a structured approach to effectively evaluate and make a case for GenAI initiatives. This article summarizes the key takeaways from the webinar, providing a roadmap for managers, leaders, or anyone looking to demonstrate the impact of their GenAI projects to stakeholders.

The GenAI "Tech Buzz Lifecycle"

The "Tech Buzz Lifecycle" describes the typical evolution of technologies like AI. Initially, there's a surge of excitement and hype surrounding a new technology, often driven by its novelty and perceived potential. However, this early phase eventually gives way to a more pragmatic period where the focus shifts squarely towards demonstrating tangible results and concrete value. Dr. Grahn addresses this critical transition, providing a framework for moving beyond the initial buzz and into practical application and measurable impact.

A Four-Step Process for Making a Case for AI Projects

Using this four-step process, you will be able to effectively present and evaluate GenAI projects, ensuring they are well-understood, and their benefits are clearly articulated:

  1. Define the Problem: Before diving into solutions, it's paramount to clearly articulate the core problem you're trying to solve. This involves identifying the underlying need, rather than just the symptoms. Techniques like the "Five Whys" can be incredibly helpful here, allowing you to drill down to the root cause of the problem by repeatedly asking "why." The "Five Whys" is an iterative interrogative technique used to explore the cause-and-effect relationships underlying a particular problem. The goal is to determine the root cause of a defect or problem by repeating the question "Why?" typically five times, but sometimes more or less, until the underlying issue is uncovered. Once the problem is thoroughly understood, you must also define the desired outcome and identify all relevant stakeholders who will be impacted by or contribute to the solution. For example, if a company wants to use GenAI for customer service, the problem might be "long customer wait times," the desired outcome "reduced average handling time," and stakeholders would include customers, call center agents, and management.
  2. Align with Business Goals: Any GenAI project should not exist in a vacuum; it must directly contribute to the overarching business goals of the organization. The project should aim to make things either better, faster, cheaper, or enable something new. To achieve this alignment, a deep understanding of the company's mission, vision, and how success is ultimately measured is essential. For instance, a GenAI solution that automates report generation might align with the goal of making operations "faster" and "cheaper" by reducing manual effort and turnaround time, freeing up employees for higher-value tasks.
  3. Define Scope and Objectives: With the problem and business alignment established, the next step is to clearly define the project's scope and its measurable objectives. This involves selecting relevant Key Performance Indicators (KPIs) that directly link to the project's success before starting, it's crucial to establish baselines for these KPIs to accurately measure improvement. Define clear, measurable objectives using frameworks like SMART (Specific, Measurable, Achievable, Relevant, Timebound) or FAST (Frequently discussed, Ambitious, Specific, Transparent). For example, a SMART objective for a GenAI content creation tool might be: "Increase blog post production by 20% within the next quarter while maintaining current quality standards, as measured by reader engagement metrics."
  4. Make Your Case: Even the most brilliant GenAI project won't gain traction if its value isn't effectively communicated. You must tell a compelling story that resonates with your audience, emphasizing the value proposition in their terms. Avoid technical jargon or overwhelming stakeholders with excessive information. Instead, focus on the impact and benefits. Consider framing your presentation around the problem, your GenAI solution, and the tangible results it will deliver for the business.

Deep Dive into Generative AI Metrics

The webinar further delved into specific metrics for evaluating GenAI outputs, introducing a novel and intriguing method called "LLM as a Judge." This cutting-edge approach involves using a Large Language Model itself to evaluate or even create metrics for GenAI outputs. This is achieved by breaking down complex tasks into discrete, evaluable steps, allowing the LLM to assess performance against defined criteria. An example provided was calculating a "toxicity score" by having an LLM analyze generated text for harmful content. Key categories of metrics discussed include:

  • Conversational Metrics: These metrics are vital for evaluating GenAI models designed for interactive dialogue. They assess aspects such as the logical coherence of the conversation (does it flow naturally?), adherence to a defined role (does the AI consistently act as a customer service agent?), overall helpfulness to the user, and the completeness of the conversation (was the user's query fully addressed?). For example, a good conversational metric might measure how often a chatbot successfully resolves a customer issue without human intervention.
  • Knowledge/RAG Metrics: For Retrieval Augmented Generation (RAG) systems, where GenAI combines its generative capabilities with external knowledge retrieval, these metrics are crucial. They evaluate the answer completeness (does it provide all necessary information?), correctness (is the information accurate?), and the relevance of the retrieved information (is the external data used truly pertinent to the query?). An example would be assessing a medical GenAI system's ability to provide a complete and accurate answer to a patient's question, citing relevant information from a verified medical database.
  • Safety Metrics: As AI becomes more pervasive, ensuring its safety is paramount. These metrics are designed to identify and quantify undesirable outputs such as bias (unfair or prejudiced responses), toxicity (abusive or harmful language), ungrounded attributes (information presented as fact without basis), and other forms of harmful content. Regular monitoring of these metrics is essential to maintain ethical AI deployment.
  • Agentic Metrics: For AI agents designed to perform specific actions or tasks, these metrics measure their effectiveness and efficiency. This includes assessing task completion (did the agent successfully perform the requested action?), efficiency (how quickly and resource-effectively did it complete the task?), and specificity (was the action precisely what was intended?). An example might be measuring how many steps an AI agent takes to book a flight and whether all parameters (dates, destination, etc.) were correctly applied.
  • Other Metrics: Beyond the core categories, a range of other metrics can be employed. These include instruction following (how well does the AI adhere to given prompts?), summarization quality (how accurate and concise are generated summaries?), style/tone adherence (does the output match the desired voice, e.g., formal, casual, humorous?), and custom metrics tailored to unique project needs.

Cost-effectiveness, a valuable method often borrowed from fields like healthcare (where concepts like Quality Adjusted Life Years are used), can also be applied. This involves a direct comparison of different interventions or solutions by evaluating the cost incurred versus the value generated.

For a GenAI project, this could mean comparing the cost of developing and running the AI solution against the savings from increased efficiency or revenue from new capabilities.

Considerations for Effective Metrics

For metrics to be truly effective in evaluating GenAI projects, they must possess several key characteristics:

  • Align with business goals: Metrics should directly reflect the impact on the organization's strategic objectives.
  • Be transparent and measurable: The methodology for calculating metrics should be clear, and the data points should be quantifiable.
  • Provide a complete picture: A holistic view is crucial; relying on a single metric might lead to a skewed understanding of performance.
  • Be easy to understand: Metrics should be comprehensible to both technical and non-technical stakeholders to facilitate informed decision-making.

Conclusion

Successfully implementing and demonstrating the value of GenAI projects requires a thoughtful, data-driven approach that aligns technological capabilities with clear business objectives. By defining the problem, aligning with business goals, establishing clear scope and objectives, and effectively communicating the value through well-chosen metrics, practitioners can build a compelling case for their GenAI initiatives.

You can view the original 1-hour webinar recording here for more information and examples. If you would like this (or any other) webinar topic to be customized for your team, contact us.

Generative AI Training

Ascendient Learning offers Generative AI training and upskilling programs for teams and organizations. Contact us for more information and customization requests.

Building Generative AI Applications
Deploying and Scaling Generative AI Applications
Managing Generative AI Projects
What is Explainable AI (XAI)?

What is Explainable AI (XAI)?

AI Explainability is a key component of a robust overall AI Governance strategy. Discover the five ways you can incorporate XAI into various sectors with examples.

Best Agile Certifications

Best Agile Certifications

If you want to advance in your career operating in an Agile framework, then obtaining an Agile certification is the right step forward. Learn which Agile certifications are right for you in our blog post.