USE CASE

Creating a Custom Rovo Agent to Streamline Incident Management

Case study at a glance

The Challenge

  • Significant delays in root cause identification and resolution, as engineers must manually investigate DevOps incidents
  • Complex AWS environments require deep technical expertise to identify the exact component causing an issue
  • Lack of a centralised, conversational interface for incident investigation
  • Frustrated users and reduced operational efficiencies
  • Communication gaps and slower decision-making
  • Increased risk of human error during high-pressure incidents due to engineers manually gathering and interpreting data from disparate sources

The Solution

  • Developed a custom Forge-built Rovo Agent to automate and consolidate root cause analysis for AWS incidents within Jira Service Management (JSM).
  • The Agent connects securely to AWS, gathering and interpreting diagnostic data to detect failures and anomalies.
  • Provides a conversational interface that allows engineers to ask questions and receive clear, actionable summaries and recommendations directly within the Jira ticket.
  • Updates incident tickets in real time with new findings, ensuring all stakeholders have immediate access to the latest information.

The Benefit

  • Faster Incident Resolution
    Dramatically reduces mean time to resolution (MTTR) by automating initial root cause analysis and delivering concise, actionable insights.
  • Improved Reliability
    Minimises service disruptions and downtime, directly enhancing the availability and reliability of business-critical systems.
  • Operational Efficiency
    Frees up engineering time by eliminating manual data gathering, allowing teams to focus on resolving issues rather than diagnosis.
  • Transparency & Communication
    Shared visibility, clear summaries and real-time ticket updates promote informed, data-driven decision-making.
  • Reduced Human Error
    Automation lowers the risk of mistakes during high-pressure incidents, improving the accuracy and confidence of incident response.

The full story: In depth

Background

Here at Automation Consultants (AC), our team wanted to address the challenges faced by DevOps and Cloud operations teams when an AWS service or resource experiences an issue.

To tackle this, we developed a custom, Forge-built Rovo DevOps Agent – designed to consolidate Root Cause Analysis (RCA) and streamline incident management.

The Challenge

When an AWS service or resource experiences an issue, it often places a heavy burden on IT teams. Engineers are forced to sift through multiple metrics and consoles to identify the root cause, a time-consuming, manual effort that can delay incident resolution.

The additional complexity of some AWS environments, which often consist of numerous interconnected services and resources, makes it even more difficult to identify the specific component causing an issue.

This process requires expertise, patience, and a deep understanding of system architecture. Without a centralised, conversational interface for incident investigation, teams struggle to collaborate and share information effectively.

The Solution

At Automation Consultants, we recognised these pain points as a significant barrier to operational efficiency and effective incident management.

The solution? Developing an intelligent, custom Rovo Agent to streamline root cause analysis by consolidating all relevant information into a single interface.

Built in Forge, and harnessing powerful AI, the Agent automates the initial stages of incident investigation by quickly providing engineers with a summary of the root cause and actionable next steps to resolve the issue.

How this custom Rovo Agent works

When an incident is raised in Jira Service Management (JSM), the Agent springs into action by identifying which AWS resource (such as an EC2 instance or Lambda function) is linked to the incident. It then connects to AWS through secure API calls and gathers a wealth of diagnostic data, including instance states, CloudWatch metrics, error logs, and other relevant data.
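To illustrate the first step, here is a minimal sketch of how an incident could be matched to AWS resources. It is not the Agent's actual code: Forge apps are written in JavaScript or TypeScript, and this Python example only shows the logic. It assumes the resource identifier appears somewhere in the ticket text, whereas a real integration might look it up in Assets instead. The names `ResourceRef` and `find_resources` are hypothetical.

```python
import re
from dataclasses import dataclass

# EC2 instance IDs are "i-" followed by 17 (current) or 8 (legacy) hex characters.
EC2_ID = re.compile(r"\bi-(?:[0-9a-f]{17}|[0-9a-f]{8})\b")
# Lambda function ARNs: arn:aws:lambda:<region>:<account-id>:function:<name>
LAMBDA_ARN = re.compile(
    r"\barn:aws:lambda:[a-z0-9-]+:\d{12}:function:[A-Za-z0-9_-]+"
)


@dataclass(frozen=True)
class ResourceRef:
    """Hypothetical record of one AWS resource mentioned in an incident."""
    kind: str        # "ec2" or "lambda"
    identifier: str  # instance ID or function ARN


def find_resources(incident_text: str) -> list[ResourceRef]:
    """Return resources mentioned in the text: EC2 instances first, then Lambda functions."""
    refs = [ResourceRef("ec2", m) for m in EC2_ID.findall(incident_text)]
    refs += [ResourceRef("lambda", m) for m in LAMBDA_ARN.findall(incident_text)]
    # Drop duplicates while preserving first-seen order.
    return list(dict.fromkeys(refs))
```

Each `ResourceRef` returned would then tell the Agent which AWS API calls to make for that resource, for example fetching the instance state for an EC2 entry.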

What makes this Agent unique?

This Agent does not just collect technical information; it also analyses it to detect failures and anomalies (such as a stopped EC2 instance, a spike in error rates, or a malfunctioning Lambda function).

It then structures these findings into a clear, digestible summary, pinpointing the likely root cause and highlighting the affected components. This summary is delivered directly to the DevOps team through the Jira ticket, ensuring that engineers have immediate access to actionable insights without having to manually trawl through multiple AWS dashboards.
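The analysis step can be pictured with a simple rule-based stand-in. The sketch below is an illustration only and assumes the diagnostic data has already been fetched from AWS. The Agent described here uses AI to interpret the data, so these fixed rules are a simplification. The `Diagnostics` structure, the function names and the `spike_factor` threshold are all hypothetical.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Optional


@dataclass
class Diagnostics:
    """Hypothetical snapshot of data already collected for one AWS resource."""
    resource_id: str
    instance_state: Optional[str] = None  # e.g. "running" or "stopped"; None if not EC2
    error_counts: list[int] = field(default_factory=list)  # per period, oldest first


def detect_findings(d: Diagnostics, spike_factor: float = 3.0) -> list[str]:
    """Flag a non-running instance and a spike in the latest error count."""
    findings = []
    if d.instance_state is not None and d.instance_state != "running":
        findings.append(f"{d.resource_id} is in state '{d.instance_state}'")
    if len(d.error_counts) >= 2:
        *history, latest = d.error_counts
        baseline = mean(history)
        # Assumed rule: a spike is spike_factor times the earlier average (at least 1).
        if latest >= spike_factor * max(baseline, 1):
            findings.append(
                f"{d.resource_id} error count spiked to {latest} (baseline {baseline:.1f})"
            )
    return findings


def summarise(findings: list[str]) -> str:
    """Turn findings into a short plain-text summary suitable for a ticket comment."""
    if not findings:
        return "No anomalies detected in the collected diagnostics."
    return "\n".join(["Likely root cause candidates:"] + [f"- {f}" for f in findings])
```

For a stopped instance with error counts of 2, 3, 1 and then 30, `detect_findings` would report both the instance state and the error spike, and `summarise` would list them under a single heading ready to post to the incident.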

Conversational interface

This Agent also supports a range of conversational prompts, allowing users to ask questions. In response, it can produce detailed reports on the cause of service breakdowns, interpret information from JSM, Assets, and AWS, and even provide tailored recommendations for remediation.

Additionally, the Agent can update the incident ticket with new findings as more data becomes available, keeping all stakeholders informed in real time.

The Benefits

By automating the initial stages of root cause analysis, we anticipate that the Agent would dramatically reduce an incident’s mean time to resolution (MTTR).

Engineers would no longer need to manually gather and interpret data from disparate AWS consoles. Instead, they would receive a concise, accurate summary of the problem with suggested next steps, all without ever having to leave the JSM incident ticket.

This streamlined process would improve the Agent’s accuracy in identifying problems and enhance user satisfaction with its insights.

This acceleration in incident response would lead to fewer and shorter service disruptions, directly improving the reliability and availability of business-critical systems.

The reduction in manual AWS investigations would both free up valuable engineering time and minimise the risk of human error during high-pressure incidents. Teams would be able to focus their expertise on resolving the underlying issue, rather than on data gathering and initial diagnosis, further contributing to operational efficiency.

The Agent also enhances transparency and collaboration. By updating the Jira ticket with detailed findings and explanations, it ensures that all relevant stakeholders, from engineers to IT managers, have access to the same up-to-date information. This shared visibility fosters better communication, more informed decision-making, and a stronger culture of accountability.

SOLUTIONS

Rovo AI Accelerator

If a custom Rovo Agent can dramatically cut incident MTTR, imagine what a full AI adoption strategy could do for the rest of your business. With AC’s Rovo AI Accelerator, we help you identify high-impact use cases, design bespoke agents, and roll out Rovo successfully across your teams.

Ready to explore what Rovo Agents could do for your team?

If you’d like guidance on streamlining your incident management processes or want to explore building a custom Rovo Agent, our experts are here to help.