How Static Analysis Can Save Your Software

Neglecting static analysis in code review, especially for complex C++ projects, could compromise your project’s success.

Mar 8th, 2024 12:00pm by Abbas Sabra

At the start of my career, I quickly learned that for developers, efficiency is key. I witnessed so much productivity lost to tooling issues that I and other developers could have easily avoided by using the right processes. We put the vast majority of our time — I’d estimate 80% — toward debugging.

Discovering a bug that took me two days to fix opened my eyes to numerous other instances where similar issues occurred. But the solution was as easy as writing a simple script. That solution saved me, and probably other developers, from other multi-day debugging sessions.

The entire ordeal inspired me to look into productivity enhancements and automated solutions geared toward detecting code issues or identifying patterns that could potentially cause them. That’s how I became hooked on static analysis.

Coders can’t continue to ignore static analysis if they want to churn out clean code — code that is consistent, intentional, adaptable and responsible.

What Is Static Analysis?

According to OWASP, static analysis tools “attempt to highlight possible vulnerabilities within ‘static’ (non-running) source code.” It’s usually done as part of code review, during the implementation phase of the development life cycle.

Static analysis helps you find bugs and the patterns leading to them, all without running your code. It contrasts with dynamic analysis, which works at runtime. Static analysis can help you understand why an issue exists and if there’s a better way to handle it.

Think of it as the ultimate “shift left” move: Deal with problems early and they won’t become costly later. I specialize in C++, which is a fast-moving target, and even the most experienced developers may struggle to keep up. But static analysis helps. Using it shouldn’t be up for debate, as it’s simply a better way to create top-quality code.

The Difficulty of C++ Tooling

C++ is a lot more complicated than most other programming languages. Using static analysis tools, code inspection tools, integrated development environments (IDEs), or even a syntax highlighter or code formatter can introduce problems that other programming languages don’t experience. C++ developers understand that parsing the language is time-intensive and complex due to its grammatical particularities. Adding on a preprocessor and compiler extensions only increases the complexity.

Clang has helped somewhat, but it’s not a magic bullet. While Clang tools can get small, limited-scope projects pretty far, more complex and performance-sensitive tools with a wide range of use cases have extra complexities that Clang tooling can’t fully cover.

For example, beyond the fact that Clang can be slow to implement new language features, tools have to account for incomplete code and exotic compiler extensions. Clang assumes your code is complete and compiles on that basis, so unfinished code produces a compiler error. That is fine for a build, but a tool that needs to understand the code while you’re still writing it has an added layer of complexity to handle. Clang’s performance characteristics also differ from what interactive, IDE-based tools require.

Languages with a more regular syntax than C++ are easier to work with. That’s why developers often find IDEs for Java or C# simpler and more productive to use. And with “modern” C++, things are even worse. Tools now need a full-blown C++ interpreter just to parse the code. If you’re using C++ tooling, expect frequent bottlenecks, and know in advance that backward compatibility means there’s no moving forward.
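One way to see why parsing can require evaluating code is the sketch below. It is an illustration under my own assumptions, not an example from any particular tool: `Probe`, `is_small` and the two functions are hypothetical names. The same token sequence, `...::member * p`, is a multiplication in one function and a pointer declaration in the other, and the only way to tell is to evaluate a constexpr function and pick the matching template specialization.

```cpp
#include <cassert>

// All names here are hypothetical and exist only for illustration.

// Primary template: `member` names a type.
template <bool Flag>
struct Probe {
    using member = int;
};

// Specialization for false: `member` names a value.
template <>
struct Probe<false> {
    static constexpr int member = 6;
};

// An ordinary constexpr function that must be evaluated
// before the right specialization can be chosen.
constexpr bool is_small(int n) { return n < 10; }

int as_expression() {
    int p = 7;
    // is_small(42) is false, so `member` is the value 6 and
    // `member * p` is a multiplication.
    int product = Probe<is_small(42)>::member * p;
    return product;
}

bool as_declaration() {
    // is_small(3) is true, so `member` is the type int and
    // `member * p` declares a pointer named p.
    Probe<is_small(3)>::member * p = nullptr;
    return p == nullptr;
}
```

A syntax highlighter or formatter that wants to classify `p` correctly in both functions has to do the same evaluation the compiler does.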

The Problem of Path Explosion

While static analysis is a means of pattern detection, finding an actual bug (for example, dereferencing a null pointer) is much harder, albeit possible. The number of possible program states grows exponentially with each branch, and tracking all of them quickly becomes infeasible. We call this “path explosion.”
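As a rough sketch of the arithmetic, assume each branch is independent of the others, so every `if` doubles the number of paths. The function names below are hypothetical. `lookup` has three branches and therefore eight paths, and the pointer is null at the dereference on only two of them.

```cpp
#include <cassert>
#include <cstdint>

// Upper bound on paths through `branches` independent two-way branches.
// Saturates at UINT64_MAX instead of overflowing the shift.
std::uint64_t path_count(unsigned branches) {
    return branches < 64 ? (std::uint64_t{1} << branches) : UINT64_MAX;
}

// Hypothetical function with three independent branches (8 paths).
int lookup(bool a, bool b, bool c) {
    static const int table[2] = {10, 20};
    const int* p = &table[0];
    int bonus = 0;
    if (a) p = nullptr;     // pointer lost on half of the paths
    if (b) p = &table[1];   // restored on half of those
    if (c) bonus = 1;       // unrelated to p, but still doubles the path count
    return *p + bonus;      // null dereference whenever a && !b
}
```

With 20 such branches, `path_count(20)` is already 1,048,576, which is why analyzers have to merge or prune paths rather than enumerate them.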

Say you’re writing code that, given two integers, divides one by the other, and there are various failure modes depending on the integers’ values. What if the denominator is zero? That results in undefined behavior, and it means you need to look at where those integers came from, their possible values and what branches they took along the way.

If you can see that the denominator is checked against zero before the division — and branches away if it is — you should be safe from division-by-zero issues. This theoretical stepping through stages of code is called “symbolic execution.” It’s not too complicated if the checkpoint is fairly close to the division process, but the further away it gets, the more branches you must account for.
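Here is a minimal sketch of both situations, using hypothetical function names. In `checked_divide` the checks sit directly before the division, so it is easy to convince yourself (or an analyzer) that it is safe. In `share_per_worker` a single branch sits between the check and the division, and that is enough to reintroduce the problem on one path.

```cpp
#include <cassert>
#include <limits>

// The guards are adjacent to the division, so every path reaching
// the `/` has a non-zero denominator and cannot overflow.
bool checked_divide(int numerator, int denominator, int& result) {
    if (denominator == 0) return false;
    // INT_MIN / -1 overflows, which is also undefined behavior.
    if (numerator == std::numeric_limits<int>::min() && denominator == -1) {
        return false;
    }
    result = numerator / denominator;
    return true;
}

// The check and the division are separated by a branch that changes
// the value. Divides by zero when workers == -1 and include_manager
// is true, even though workers was checked against zero earlier.
int share_per_worker(int total, int workers, bool include_manager) {
    if (workers == 0) return 0;
    if (include_manager) workers += 1;
    return total / workers;
}
```

To rule out the bug in the second function, the analysis has to track the possible values of `workers` through every branch between the check and the division, not just notice that a check exists.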

Crossing the function boundary gets even trickier. And once you have calls from other translation units, the problem becomes intractable in the general case. Some specific cases may allow a whole-program analysis to catch cross-translation-unit issues, but generally, that’s unrealistic. You’d have to execute the whole program in the analyzer to find all possible ranges of inputs. And you might not even have all the source code.
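A small sketch of the function-boundary problem, with hypothetical names: the division lives in one function and the check lives in its callers. For the sake of a self-contained example everything is in one file, but imagine `average_unchecked` being compiled in a different translation unit.

```cpp
#include <cassert>

// Callee: divides without checking. Its safety depends entirely on
// what every caller passes in (assumed precondition: count != 0).
int per_item(int total, int count) {
    return total / count;
}

// Caller that upholds the precondition before calling.
int average_or_zero(int total, int count) {
    if (count == 0) return 0;
    return per_item(total, count);
}

// Caller that forwards whatever it receives. If this lived in another
// translation unit, an analyzer looking only at per_item's file could
// not know whether count is ever zero here.
int average_unchecked(int total, int count) {
    return per_item(total, count);
}
```

Looking at `per_item` alone, there is no way to decide whether the division is safe; the answer depends on code that may not be visible to the analysis at all.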

Despite these issues, symbolic execution does have value. It can, and does, detect complex bugs in established codebases, and it’s one of the many techniques the development community uses.

You can run dynamic analysis tools like Valgrind and Clang sanitizers (including MemorySanitizer, AddressSanitizer and UndefinedBehaviorSanitizer) alongside static analysis, but they can typically only detect issues on the code paths that actually execute at runtime. This is where the value of static analysis comes in. Detecting buggy patterns early in the process is critical to prevent more sizable, expensive issues that can derail your project entirely.
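To illustrate the difference, consider this sketch of a hypothetical function with an off-by-one bug hidden behind a rarely used flag. A sanitizer build (for example, compiling with `-fsanitize=address` in Clang or GCC) can only report the out-of-bounds read if some test actually calls the function with `legacy_mode` set and `day` equal to 6. A static analyzer that tracks the range of `day` has a chance of flagging the read without running anything.

```cpp
#include <cassert>

// Hypothetical function; the bug is only reachable on one path.
int weekday_offset(int day, bool legacy_mode) {
    static const int offsets[7] = {0, 1, 2, 3, 4, 5, 6};
    if (day < 0 || day > 6) return -1;   // day is now known to be in [0, 6]
    if (legacy_mode) {
        return offsets[day + 1];         // bug: index 7 when day == 6
    }
    return offsets[day];
}
```

A test suite that never exercises `weekday_offset(6, true)` passes cleanly under every sanitizer, and the bug ships.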

Automatic Analysis Boosts Server-Based Tools

Server-based tools are fantastic solutions, but they require configuring, integrating into your toolchain and maintaining over time. Because C++ is so complicated, those processes are often more involved. Having dedicated DevOps resources makes this a non-issue, but it can be a blocker if this is a part-time responsibility for a developer or if they’re an open source author.

My team at Sonar wanted to eliminate that complexity and provide a zero-configuration option for systematically incorporating static analysis across a project. I thought C++ would make this an impossible task. But last year, we had a breakthrough — and I can say that our automatic analysis for C++ has exceeded our expectations.

We give SonarCloud access to source code and ask it to perform an analysis. It helps us figure out the most likely build options and dependencies, then analyzes the code on that basis. This minute-long process is now our default recommendation for C++ analysis. Our user data proves the value: the accuracy is a whopping 95%. Only 100% is good enough for compilation, but this is still an incredible feat.

Static Analysis Is a Necessity, Not an Option

With these technologies available, employing static analysis could save your project. I’m excited to see how it further opens up the development community and what it does for open source projects.

Neglecting static analysis, particularly in C++, could compromise your success. Employing this technology to enable your best work isn’t an option; it’s something developers must do to create top-quality software.

Abbas Sabra is a Principal Software Engineer at Sonar, where he has discovered the ideal platform to pursue his passion for C++ development, development processes and tooling. His career began in the financial industry, where he identified inefficiencies within the C++...