Building a Realistic Pathway to Production-Ready GenAI

BMC HelixGPT helps overcome GenAI challenges in service management and AIOps by addressing data fragmentation, automation risks and security concerns.

Oct 29th, 2024 11:08am by Henry Bassey


Featured image by Getty Images for Unsplash+.

Generative AI (GenAI) is everywhere. It promises to transform IT operations (ITOps) by automating workflows, resolving production issues faster and streamlining our work by eliminating manual tasks. But is this vision of GenAI (especially in service management and AIOps) delivering on the promises? Beneath the excitement lies a fundamental question about the nature of GenAI itself.

At its core, GenAI operates differently from the way humans approach problem-solving. When humans reason, they start with first principles: truths derived from deep understanding and logic. GenAI, however, doesn’t create solutions based on these principles. Instead, it processes vast amounts of existing data and generates output from the patterns it finds, without understanding the concepts behind them.

This distinction becomes critical in service management and AIOps, where precision and accuracy are non-negotiable. Relying on a system that cannot grasp the contextual meaning of data could lead to errors. Even worse, given the sensitivity of IT data, GenAI’s approach raises concerns about security and privacy. Can IT teams trust an AI that doesn’t fully understand the data it handles? And how do they ensure it won’t introduce risks through data leaks or inaccuracies?

With the growing pressure to automate, IT teams need solutions that remediate GenAI’s inherent limitations while harnessing its full potential.

Challenges of Adopting GenAI in Production

GenAI holds the potential to transform IT operations, but we must address several pressing challenges, particularly around data accuracy, security and automation. These obstacles can limit GenAI’s effectiveness if they’re not managed thoroughly in production environments.

Data Accuracy and Integration Issues

GenAI models depend heavily on clean, structured and up-to-date data. In IT environments where data such as metrics, events and logs are often inconsistent, noisy or incomplete, the accuracy of AI-driven insights can be severely compromised. Fragmented data sets from isolated systems only add complexity, as inconsistencies between formats and timestamps increase the likelihood of errors.
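To make the timestamp problem concrete, here is a minimal Python sketch of reconciling events from isolated systems onto one UTC timeline before any model sees them. The function names, the event shape (a dict with a `ts` key) and the epoch-milliseconds heuristic are assumptions made for illustration, not part of any BMC product.

```python
from datetime import datetime, timezone

def to_utc(raw):
    """Normalize an epoch number or ISO-8601 string to an aware UTC datetime."""
    if isinstance(raw, (int, float)):
        # Assumption: values above 1e11 are epoch milliseconds, not seconds.
        seconds = raw / 1000 if raw > 1e11 else raw
        return datetime.fromtimestamp(seconds, tz=timezone.utc)
    text = raw.strip()
    if text.endswith("Z"):
        # Older Python versions do not parse the "Z" suffix directly.
        text = text[:-1] + "+00:00"
    parsed = datetime.fromisoformat(text)
    if parsed.tzinfo is None:
        # Assumption: naive timestamps are already expressed in UTC.
        parsed = parsed.replace(tzinfo=timezone.utc)
    return parsed.astimezone(timezone.utc)

def merge_events(*sources):
    """Merge event lists from several systems into one UTC-ordered timeline."""
    merged = []
    for events in sources:
        for event in events:
            merged.append({**event, "ts": to_utc(event["ts"])})
    return sorted(merged, key=lambda e: e["ts"])
```

Calling `merge_events(log_events, metric_events)` returns a single list ordered by normalized time, so an event stamped in local time no longer appears to happen an hour after the log line it actually preceded.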

AIOps and observability amplify these challenges, requiring real-time data processing across high-velocity streams from multiple sources. When key data points are missing or incorrectly labeled, GenAI may fail to detect emerging issues, such as bottlenecks or performance degradations, which could escalate into systemwide failures.

Security and Privacy Concerns

As GenAI models handle vast amounts of operational data, they become prime targets for adversarial attacks and data breaches. In production environments, a single breach of sensitive information can lead to severe operational, legal and financial consequences. GenAI models are vulnerable to adversarial inputs, where attackers feed manipulated data to alter the model’s behavior, potentially leading to incorrect decisions.

Automation and the Risk of Inaccurate Actions

Automation, driven by GenAI, promises faster incident resolution and operational efficiency. However, this same automation introduces risks if it’s not closely monitored. GenAI can trigger inappropriate or ineffective responses, particularly in dynamic IT environments, when it misinterprets data patterns.

Trust and Accountability Issues

One of the most significant challenges we face with GenAI is its opacity. AI models often operate as black boxes, providing conclusions without clear explanations, which can erode trust, especially during critical incidents requiring quick, decisive action.

Lack of interpretability not only erodes trust but introduces accountability concerns. When something goes wrong, we need a clear audit trail to understand the decision flow that led to the failure. Building trust in GenAI systems requires ensuring that AI-driven insights are explainable and backed by clear reasoning, allowing teams to validate model outputs against their understanding and empowering them to act decisively and with confidence.

How BMC Helix Addresses These Challenges

To mitigate the challenges of deploying GenAI in mission-critical environments, BMC employs a multilayered approach grounded in reliable data management, security and intelligent AI model design. BMC HelixGPT is BMC’s strategic approach to generative AI, enabling simplified, actionable insights and automated resolutions across the BMC Helix platform. BMC HelixGPT helps close the gap between the promise of GenAI and practical, secure and effective implementation in enterprise IT operations.

Data Integration and Curation: The Foundation of HelixGPT

At the heart of BMC HelixGPT’s effectiveness is the BMC Helix platform, which aggregates and curates observability and service management data across multiple sources, such as topology, metrics, logs, incidents and changes. This platform creates a reliable foundation for AI services to thrive.

Through native ingestion, open integrations, service modeling and data reconciliation, the platform addresses data fragmentation challenges, providing GenAI with accurate and up-to-date information and improving the overall effectiveness of AI-driven automation and remediation. BMC HelixGPT’s design approach involves:

A Composite AI Architecture

The architecture supporting HelixGPT consists of a Composite AI framework integrating specialized AI modules that operate in a pipeline to produce focused, task-specific outputs. This modular approach narrows the scope of GenAI, enabling HelixGPT to tackle precise operational challenges rather than generating broad, generalized insights. Similar to how the human brain processes different types of information, Composite AI combines multiple data types — such as logs, metrics and changes — to detect anomalies and identify root causes, providing more accurate and actionable results.

Figure: BMC Composite AI framework
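The pipeline idea can be illustrated with a small Python sketch. It is not BMC’s implementation: the stage names, the z-score anomaly test and the shape of the shared `ctx` dictionary are all hypothetical. The point is only that specialized stages narrow the input before a generative step receives it.

```python
from statistics import mean, pstdev

def detect_anomalies(ctx, threshold=2.0):
    """Flag metric samples more than `threshold` standard deviations from the mean."""
    values = ctx["metrics"]
    mu, sigma = mean(values), pstdev(values)
    ctx["anomalies"] = [
        i for i, v in enumerate(values)
        if sigma > 0 and abs(v - mu) / sigma > threshold
    ]
    return ctx

def correlate_changes(ctx, window=2):
    """Keep changes applied at most `window` samples before an anomaly."""
    ctx["suspects"] = [
        change for change in ctx["changes"]
        if any(0 <= a - change["at"] <= window for a in ctx["anomalies"])
    ]
    return ctx

def build_prompt(ctx):
    """Hand the generative step only the curated findings, not the raw data."""
    names = ", ".join(c["id"] for c in ctx["suspects"]) or "none"
    ctx["prompt"] = f"Anomalies at samples {ctx['anomalies']}; suspect changes: {names}."
    return ctx

def run_pipeline(ctx, stages):
    """Run each stage in order, passing the shared context along."""
    for stage in stages:
        ctx = stage(ctx)
    return ctx
```

Each stage reads and extends the same context, so a stage can be swapped or tested in isolation without touching the others.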

Data Curation and Actionability

AI modules such as the ranking expert drive data curation within the platform. This module evaluates ticket data to measure various critical factors, including the actionability of the ticket, the human pain factor (based on bounce rates, duration and sentiment), and the risk level (e.g., for change requests). These insights allow BMC HelixGPT to prioritize and generate more meaningful, context-aware outputs.
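As a rough illustration of how such factors might be combined, the Python sketch below scores tickets on actionability, pain and risk. The field names, normalization caps and weights are hypothetical, chosen only to show the shape of the calculation; the article does not describe how the ranking expert actually computes its scores.

```python
from dataclasses import dataclass

@dataclass
class Ticket:
    ticket_id: str
    actionability: float  # 0..1, how clearly the ticket can be acted on
    bounces: int          # number of reassignments between teams
    hours_open: float     # how long the ticket has been open
    sentiment: float      # -1 (very negative) .. 1 (very positive)
    risk: float           # 0..1, e.g. for change requests

def pain_factor(t, max_bounces=5, max_hours=72.0):
    """Blend bounce count, duration and sentiment into a 0..1 pain score."""
    bounce = min(t.bounces / max_bounces, 1.0)
    duration = min(t.hours_open / max_hours, 1.0)
    negativity = (1.0 - t.sentiment) / 2.0
    return (bounce + duration + negativity) / 3.0

def priority(t, weights=(0.4, 0.4, 0.2)):
    """Weighted sum of actionability, pain and risk (weights are illustrative)."""
    w_act, w_pain, w_risk = weights
    return w_act * t.actionability + w_pain * pain_factor(t) + w_risk * t.risk

def rank(tickets):
    """Return tickets ordered from highest to lowest priority."""
    return sorted(tickets, key=priority, reverse=True)
```

A ticket that has bounced between teams for days with an unhappy requester rises to the top even if its stated severity is modest, which is the kind of context a raw ticket queue does not expose.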

As BMC’s chief AI architect Erhan Giral said, “Any insightful, interesting aggregation or correlation we do for data is now so much more accessible to the end user,” making GenAI highly responsive to the needs of IT teams.

Fine-Tuning and Model Iteration

BMC HelixGPT continuously learns from the outcomes of the resolutions it processes. For instance, the system automatically compares the resolutions suggested by its best action recommendation (BAR) capability with those logged by IT teams.

Giral explained, “If BAR was ignored and the actual resolution was different… we now have a pair. This information will allow us to use direct preference optimization (DPO) to refine the model to generate insights that align more closely with IT operations’ practical realities.”
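For readers unfamiliar with DPO, the following Python sketch computes the standard published DPO loss for one preference pair. It is a generic illustration, not BMC’s training code. The inputs are assumed to be summed log-probabilities of the chosen and rejected responses under the model being trained (the policy) and under a frozen reference model, and `beta=0.1` is simply an illustrative value.

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """Per-pair DPO loss from sequence log-probabilities.

    loss = -log(sigmoid(beta * margin)), where margin measures how much more
    the policy prefers the chosen response than the reference model does.
    """
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    x = beta * margin
    # -log(sigmoid(x)) equals log(1 + exp(-x)); this form avoids overflow.
    return max(-x, 0.0) + math.log1p(math.exp(-abs(x)))
```

When the policy and the reference model agree, the loss equals log 2. It falls as the policy shifts probability toward the chosen response (here, the resolution the IT team actually used) and rises if it drifts toward the rejected one.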

Continuous Learning Without Human Input

A major strength of BMC HelixGPT is its ability to autotune based on implicit feedback, keeping the model evolving without requiring continuous human intervention. Traditional models often rely on explicit feedback mechanisms, where users are asked to rate the quality of the AI’s suggestions.

BMC HelixGPT, on the other hand, continually uses the outcomes of actual IT processes — such as whether a suggested action was followed or ignored — to improve its understanding of the environment. Here’s how implicit feedback works behind the scenes.

Situation fingerprinting: BMC HelixGPT generates a BAR for each IT situation it identifies. The system then tracks whether the IT team follows the recommended steps or chooses a different approach.
Feedback integration: If the resolution differs from the BAR, the AI learns from this discrepancy, effectively using the ignored recommendation as negative feedback to refine future suggestions. This mechanism allows the AI to capture and learn from real-world decision-making, improving its recommendations without explicit feedback.
DPO-based training: These comparisons feed into DPO training over time, allowing BMC HelixGPT to rapidly learn from successes and failures. This iterative process helps the model stay in tune with the latest operational strategies of the IT team, making it more effective at resolving future incidents.
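The steps above can be sketched in a few lines of Python. This is an assumption-laden illustration: the record fields (`situation`, `bar`, `actual_resolution`) are hypothetical, and comparing resolutions by exact text is a deliberate simplification of whatever matching a production system would use.

```python
def build_preference_pairs(records):
    """Turn resolved situations into (prompt, chosen, rejected) training pairs."""
    pairs = []
    for r in records:
        bar = r["bar"].strip()
        actual = r["actual_resolution"].strip()
        if not actual or bar.lower() == actual.lower():
            # The BAR was followed, or nothing was logged: no contrast to learn from.
            continue
        pairs.append({
            "prompt": r["situation"],
            "chosen": actual,   # what the IT team actually did
            "rejected": bar,    # the ignored recommendation
        })
    return pairs
```

Only situations where the team ignored the recommendation produce a pair, which matches the implicit-feedback idea: nobody is asked to rate anything, yet each divergence becomes a training signal.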

How Do We Get Machines to Reason Like an SRE?

A significant innovation in BMC HelixGPT is the application of tenant-specific fine-tuning. While large language models (LLMs) offer a baseline of generative capabilities, they often fall short in real-world IT operations, where the nuances of individual systems and workflows matter. Rather than relying solely on general knowledge derived from LLMs, HelixGPT learns from tenant-specific historical data, including ticket logs and chat sessions.

This specialized training allows AI to mimic how human experts, such as site reliability engineers (SREs), diagnose and resolve complex IT issues. Unlike a generic LLM, which tries to apply broad computer science principles, HelixGPT models real-world human reasoning. Giral noted, “When we process ticket logs, runbooks and chat/bridge sessions, we learn how actual humans reason about the proposed problem.”

This approach enables BMC HelixGPT to create more precise and actionable insights, drawing from practical problem-solving methods observed in real-world environments.

Data Security and Privacy: Protecting Sensitive Information

One of the most pressing concerns with GenAI deployment is potential data security and privacy risk. BMC HelixGPT includes several key mechanisms designed to protect tenant-specific data:

Data isolation: Tenant data is always processed in isolation, so sensitive information never crosses the boundaries of an individual tenant’s environment. This practice shields each organization’s data from potential leaks or model contamination.
Separation of raw data and curriculum: An essential aspect of the fine-tuning process is the creation of a curriculum, a structured set of data points that guides the AI’s learning process. The curriculum is separated from the raw data, allowing for the safe and efficient training of models without exposing personally identifiable information (PII).
Compliance with global regulations: Meeting the highest standards for data protection, including compliance with GDPR, HIPAA and other regional regulations, allows enterprises to deploy GenAI solutions confidently, knowing that their data is handled in line with the strictest legal requirements.
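The first two mechanisms can be pictured with a short Python sketch. It is a simplified illustration under stated assumptions: the ticket fields are hypothetical, and regex redaction of email addresses and IPv4 addresses stands in for whatever PII handling a real system performs, which would need to be far more thorough.

```python
import re
from collections import defaultdict

# Illustrative patterns only; real PII detection needs much broader coverage.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")

def redact(text):
    """Replace email addresses and IPv4 addresses with placeholder tokens."""
    text = EMAIL.sub("<EMAIL>", text)
    return IPV4.sub("<IP>", text)

def build_curricula(raw_tickets):
    """Build one redacted curriculum per tenant; examples never cross tenants."""
    curricula = defaultdict(list)
    for t in raw_tickets:
        curricula[t["tenant"]].append({
            "problem": redact(t["description"]),
            "resolution": redact(t["resolution"]),
        })
    return dict(curricula)
```

Training would then read only from a single tenant’s curriculum, leaving the raw tickets, with their names and addresses, outside the fine-tuning path.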

Wrapping Up

BMC Helix offers a structured, multilayered approach to harnessing GenAI’s power in IT service management and operations. By addressing data fragmentation, integrating AI/ML for anomaly detection and root cause isolation, and ensuring security and privacy, BMC HelixGPT provides a balanced pathway to production-ready GenAI.

BMC HelixGPT, with its composite AI architecture and tenant-specific fine-tuning, revolutionizes traditional incident management and response, using real-time data processing, accurate insights and continuous learning to drive effective outcomes.

Learn more about BMC HelixGPT or book a demo with BMC’s sales team to explore how it can elevate your IT operations.

Henry Bassey holds an MBA from the Quantic School of Business and Technology and has a solid technical background. He’s a strong advocate for innovation and thought leadership in all the content he creates.