The Future of AI in Software Quality: How Autonomous Platforms Are Transforming DevOps

Continuous integration and continuous delivery (CI/CD) pipelines have transformed how software is delivered, improving quality and shortening release cycles. The infrastructure that supports CI/CD lets teams shift many test and verification tasks left, to earlier stages of development. In doing so, they can avoid costly fixes that would otherwise need to be applied late in the project. Agentic artificial intelligence (AI) now presents an opportunity to take these workflows even further.

Many bugs can be eliminated early on through more effective unit testing and source code validation. Proven over decades of use, static analysis tools can scan the entire codebase of each project and assess whether it meets quality objectives and coding standards. Compliance with coding standards not only improves maintainability and quality but also helps avoid safety issues and security vulnerabilities, such as those caused by buffer overflows.

Other tools focus on integration testing, checking how modules interact with each other and with external services through published APIs. Coverage tools fed with data from those tests keep track of how well the growing test suite exercises the overall body of code.

CI/CD environments such as Jenkins provide a means to automate many of the tasks that revolve around the use of code quality tools. For example, scripts can run regression tests on each build when a programmer checks in a revised module. But developers still spend a great deal of time parsing and triaging the outputs of those automated steps, and then fixing the problems they reveal. This is where generative AI can make invaluable contributions.
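To give a sense of the parsing and triage work involved, the following is a minimal sketch that groups failing tests by module. It assumes the regression run writes a report in the widely used JUnit-style XML layout; the sample report, the module names, and the function name `failing_tests_by_module` are all hypothetical and chosen only for illustration.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical sample report in a JUnit-style XML layout (an assumption,
# not the output of any specific tool named in this article).
SAMPLE_REPORT = """\
<testsuite name="regression" tests="3" failures="1" errors="1">
  <testcase classname="billing.test_invoice" name="test_total"/>
  <testcase classname="billing.test_invoice" name="test_rounding">
    <failure message="expected 10.00, got 9.99"/>
  </testcase>
  <testcase classname="auth.test_login" name="test_expired_token">
    <error message="TimeoutError"/>
  </testcase>
</testsuite>
"""


def failing_tests_by_module(report_xml: str) -> dict[str, list[str]]:
    """Group failed or errored test cases by their classname attribute."""
    root = ET.fromstring(report_xml)
    grouped: dict[str, list[str]] = defaultdict(list)
    # iter() also finds test cases nested under a <testsuites> wrapper.
    for case in root.iter("testcase"):
        problem = case.find("failure")
        if problem is None:
            problem = case.find("error")
        if problem is None:
            continue  # the test passed or was skipped
        message = problem.get("message", "")
        module = case.get("classname", "unknown")
        grouped[module].append(f"{case.get('name')}: {message}")
    return dict(grouped)


if __name__ == "__main__":
    for module, problems in failing_tests_by_module(SAMPLE_REPORT).items():
        print(module, problems)
```

A grouped summary like this is the kind of structured input that either a developer or an automated assistant could start triaging from.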

Test-time scaling, the gain in capability that large language models (LLMs) show when they are allowed to iterate on a problem, has led to the creation of agentic AI solutions focused specifically on code quality. These solutions are enabling novel workflows to be built around CI/CD pipelines.

One known issue with agentic AI deployed on generic business tasks is that there are few inherent checks on the correctness of the LLM outputs. Though agents could be tasked with analyzing and fixing code, it is difficult to guarantee that the raw LLM outputs will always be correct. This is where the formalized structure of the CI/CD pipeline and the capabilities of existing code quality tools play vital roles. The same tools that highlight quality issues in source code, or that execute unit tests and analyze their outcomes, can help agents generate results that are correct. Rerunning those checks confirms whether an agent's changes actually resolve the problems the initial tests found.

Using the Model Context Protocol (MCP) to connect LLMs with code analysis tools provides the means to build agents that can iteratively find and fix issues inside source code as soon as developers check in individual modules. In doing so, they help shift-left strategies for early verification and validation succeed. In such a pipeline, one or more agents inspect test reports and direct AI modules to triage and fix problems, then submit the altered code to tests that determine whether the module is fit to check in.
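The iterative find-and-fix cycle described here can be sketched as a bounded loop. This is only an illustration under stated assumptions: `analyze` and `propose_fix` are hypothetical callables standing in for an analysis tool reached through MCP and for an LLM-backed fixer, and the toy versions below use simple string matching so that the example runs on its own.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class LoopResult:
    accepted: bool   # True only if the final analysis reported no findings
    source: str      # verified code, or the untouched original if rejected
    attempts: int    # number of fix rounds that were applied
    history: list[list[str]] = field(default_factory=list)


def fix_until_clean(
    source: str,
    analyze: Callable[[str], list[str]],
    propose_fix: Callable[[str, list[str]], str],
    max_attempts: int = 3,
) -> LoopResult:
    """Re-run the analysis after every proposed fix, within a fixed budget."""
    candidate = source
    history: list[list[str]] = []
    for attempt in range(max_attempts + 1):
        findings = analyze(candidate)
        history.append(findings)
        if not findings:
            return LoopResult(True, candidate, attempt, history)
        if attempt < max_attempts:
            candidate = propose_fix(candidate, findings)
    # Budget exhausted: return the original so nothing unverified is checked in.
    return LoopResult(False, source, max_attempts, history)


# Toy stand-ins (assumptions for illustration, not real tool integrations).
def toy_analyze(code: str) -> list[str]:
    return [name for name in ("strcpy", "gets") if f"{name}(" in code]


def toy_fix(code: str, findings: list[str]) -> str:
    # Addresses only the first finding per round, so several passes are needed.
    replacements = {"strcpy": "safe_copy", "gets": "safe_read"}
    first = findings[0]
    return code.replace(f"{first}(", f"{replacements[first]}(")


if __name__ == "__main__":
    print(fix_until_clean("strcpy(a, b); gets(buf);", toy_analyze, toy_fix))
```

The key design choice is that the loop never trusts the fixer's output directly: acceptance depends only on what the analysis step reports, and an exhausted budget leaves the original code in place for a human to review.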

The workflows enabled by agentic AI are potentially extremely powerful. But organizations need to build the agentic AI workflows that best suit their development needs. Some will want humans in the loop to check progress and confirm that the agents do not miss important criteria. Others will want to maximize automation while ensuring that the AI works on practical issues rather than making unnecessary changes. In either case, incremental adoption of tool-supported agentic AI is important.

The introduction of agentic AI can start small, focusing on specific tasks at first until the team builds enough confidence in the automation to expand its role. A good place to start is with fixes aimed at code quality and security improvements. Static analysis tools highlight deviations from coding guidelines for quality and security. Agents can then step in and attempt to fix those issues automatically. Existing unit tests, combined with further runs of static analysis, check the agent's changes for errors. If all the test results are green, the agent can check in the new code and generate a report of its changes and the reasons for making them. That provides an audit trail that human developers can follow to determine how well the agent is performing.
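As a rough illustration of that check-in gate and audit trail, the sketch below approves a change set only when the unit tests pass and no static analysis findings remain, and it always emits a JSON report. The `AgentChange` fields, the rule identifier, and the report layout are assumptions made for this example rather than the format of any particular tool.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class AgentChange:
    file: str
    rule: str    # hypothetical identifier of the guideline that was violated
    reason: str  # the agent's stated rationale for the change


def gate_and_report(
    changes: list[AgentChange],
    unit_tests_passed: bool,
    remaining_findings: int,
) -> tuple[bool, str]:
    """Return (ok_to_check_in, JSON audit report) for a set of agent changes."""
    green = unit_tests_passed and remaining_findings == 0
    report = {
        "decision": "check_in" if green else "hold_for_review",
        "unit_tests_passed": unit_tests_passed,
        "remaining_findings": remaining_findings,
        "changes": [asdict(change) for change in changes],
    }
    return green, json.dumps(report, indent=2)


if __name__ == "__main__":
    example = [AgentChange("src/parser.c", "RULE-1", "bounded the copy length")]
    ok, audit = gate_and_report(example, unit_tests_passed=True, remaining_findings=0)
    print(ok)
    print(audit)
```

Writing the report even when the gate fails keeps a record of what the agent attempted, which is useful when a team is still deciding how much to trust the automation.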

Once this first stage is working reliably, agents can be prompted to design unit tests for code modules, which then run within the CI/CD pipeline as regression tests. MCP connections to code coverage tools can help determine how far these tests go toward meeting project metrics such as overall code coverage. For example, if test coverage falls below 80%, agents can look at the codebase to determine where holes need to be filled. This strategy leads to an automated CI/CD pipeline that is dynamic and goal-driven.
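The 80% example can be made concrete with a small calculation. The sketch below assumes a coverage tool can supply covered and total line counts per file; the file names, the numbers, and the function name `coverage_gaps` are hypothetical. When overall coverage is below the target, it ranks files by uncovered lines so that an agent knows where new tests would help most.

```python
def coverage_gaps(
    per_file: dict[str, tuple[int, int]],
    target: float = 80.0,
) -> tuple[float, list[tuple[str, int]]]:
    """Return overall line coverage (%) and files ranked by uncovered lines.

    per_file maps a file name to (covered_lines, total_lines).
    The ranked list is empty when the target is already met.
    """
    covered = sum(c for c, _ in per_file.values())
    total = sum(t for _, t in per_file.values())
    overall = 100.0 * covered / total if total else 100.0
    if overall >= target:
        return overall, []
    gaps = [(name, t - c) for name, (c, t) in per_file.items() if t > c]
    # Largest gap first; ties are broken alphabetically for stable output.
    gaps.sort(key=lambda item: (-item[1], item[0]))
    return overall, gaps


if __name__ == "__main__":
    # Hypothetical per-file data, as (covered_lines, total_lines).
    data = {"parser.py": (90, 100), "net.py": (20, 100), "util.py": (40, 50)}
    overall, gaps = coverage_gaps(data)
    print(f"{overall:.1f}%", gaps)
```

With the sample data, 150 of 250 lines are covered, so overall coverage is 60% and `net.py`, with 80 uncovered lines, is ranked first.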

When implementing an agentic workflow, organizations need to consider how best to deploy the components of each agent. Some use cases will allow for cloud-hosted LLMs. But privacy and security concerns may require some or all of the agent infrastructure to be confined to on-premises or self-hosted cloud environments.

As the system scales up, managers should also consider how best to track progress. A combination of reporting and dashboards will provide the level of information needed to decide if agent prompts need further optimization or where additional tasks can be assigned to LLMs. Reports provide the vital audit trails that track changes of code in the project. Dashboards provide important information on quality trends, showing if agents need to be refined to maintain targets such as code coverage levels.

By harnessing agentic AI together with the rigor of proven software analysis tools connected through MCP, development teams can substantially improve productivity and continue the trend of shifting verification and validation tasks left. In doing so, they can deliver both higher quality and lower project costs.