The Halting Problem, Rice’s Theorem,and the Walls They Build

Eleftheria DrosopoulouApril 17th, 2026Last Updated: April 14th, 2026

0 147 9 minutes read

Why no algorithm — and no AI model — can fully decide what programs do, and what that means for every static analysis tool you’ve ever shipped.

In 1936, Alan Turing proved something extraordinary: no algorithm can exist that reliably determines whether an arbitrary program will eventually stop running or loop forever. This was not a temporary gap in our knowledge — it was a proof that the gap is permanent. Eighty-nine years later, that result still explains why your linter produces false positives, why your AI code reviewer misses certain bugs, and why formal verification tools ask you to write invariants by hand.

This article is for developers who want the theory behind the tools — not just what these limits are, but why they exist and, critically, what they imply for the static analysis landscape of 2026.

1. The Setup: What Does “Decidable” Actually Mean?

Before we get to the proof itself, it’s worth being precise about what we’re claiming. A problem is decidable if there exists an algorithm that, given any valid input, terminates in finite time and outputs a correct yes or no. A problem is undecidable if no such algorithm can possibly exist — not because we haven’t found one yet, but because it can be proven that one cannot exist.

Notice the strength of that claim. We’re not talking about computational complexity — about problems that are solvable in principle but slow in practice. Undecidability is a categorical wall. Throwing more hardware, more memory, or more model parameters at an undecidable problem doesn’t help. The impossibility is mathematical, not technological.

Turing formalised computation itself in order to make this kind of proof possible. He invented a theoretical device — now called a Turing machine — which is a minimal model of computation powerful enough to simulate any algorithm. Once you have a formal model, you can ask rigorous questions about what that model can and cannot compute.

2. The Halting Problem: Turing’s Original Proof

The Halting Problem asks: given a program P and an input I, does P eventually halt when run on I? Turing proved in his 1936 paper that no algorithm can answer this for all possible (P, I) pairs. The proof is a masterpiece of diagonal reasoning — a technique also used by Cantor to show that the real numbers are uncountable.

The Halting Problem — Proof by Contradiction
Assume a decider HALT(P, I) exists that always returns YES if P halts on I, and NO otherwise.

Now construct a new program D(P):

If HALT(P, P) returns YES → loop forever.
If HALT(P, P) returns NO → halt immediately.

Ask: what does D(D) do? If D halts when run on itself, HALT(D, D) returns YES, so D loops forever — contradiction. If D loops forever, HALT(D, D) returns NO, so D halts — contradiction.

Conclusion: HALT cannot exist. The assumption leads to a logical impossibility regardless of which branch we follow. QED.

The key move is self-reference: we construct a program that takes another program as its own input, then feeds itself to itself. This diagonal argument creates a situation with no consistent answer, proving that no general decider can handle it.

Importantly, the proof says nothing about specific programs. We can absolutely determine that many individual programs halt — compilers do this all the time for simple loops. The undecidability is about the general case: there is no algorithm that works for every possible program.

3. Rice’s Theorem: The Full Generalisation

If the Halting Problem feels like a narrow edge case, Rice’s Theorem removes that comfort entirely. Proved by H. Gordon Rice in 1953, it says something much broader:

Rice’s Theorem (informal): For any non-trivial semantic property of programs — any question about what a program does rather than what it looks like — there is no general algorithm that decides whether an arbitrary program has that property.

A semantic property is one that depends on the program’s behaviour, not its syntax. “Does this program ever output the number 42?” is semantic. “Does this program contain a for-loop?” is syntactic. Rice’s Theorem covers all semantic properties.

“Non-trivial” means the property is true of some programs and false of others. The theorem doesn’t cover the vacuous cases of “true of all programs” or “true of no programs.” Every interesting question you’d want to ask about program behaviour is non-trivial.

Together, the two results form a complete boundary. The Halting Problem is one instance of an undecidable semantic property. Rice’s Theorem tells you that all semantic properties are undecidable — without exception.

Question about a Program	Type	Decidable?
“Does it contain a try-catch block?”	Syntactic	Yes — parse the AST
“Does it always terminate?”	Semantic	No — Halting Problem
“Does it ever throw a NullPointerException?”	Semantic	No — Rice’s Theorem
“Does it produce the same output as program Q?”	Semantic	No — Rice’s Theorem
“Does it ever access an uninitialised variable?”	Semantic	No — Rice’s Theorem
“Is it free of all security vulnerabilities?”	Semantic	No — Rice’s Theorem
“Does it conform to this type signature?”	Semantic (restricted)	Sometimes — with language restrictions

That last row is worth pausing on. Strongly typed functional languages like Haskell and languages with dependent types like Coq or Lean deliberately restrict what programs can express, specifically so that certain semantic properties become decidable. The price is expressiveness; the reward is provability. That trade-off is the entire philosophy behind type theory.

4. What This Means for Static Analysis Tools

Static analysis is the practice of examining code without running it, in order to find bugs, enforce style, or verify correctness. Every major IDE ships with static analysis. So does every CI pipeline worth its salt. But given Rice’s Theorem, what are these tools actually doing?

The answer is that they operate by restricting the problem. Since the general case is undecidable, all sound static analysis tools approximate — and every approximation has one of two failure modes:

Analysis Strategy	What It Does	False Positives?	False Negatives?	Example Tools
Over-approximation (Sound)	Reports all possible errors, including impossible ones	Yes	Never	Astree, Polyspace, Infer
Under-approximation (Complete)	Only reports confirmed real errors	Never	Yes	Fuzzing, bounded model checking
Heuristic (Neither)	Balances usability against both error types	Yes	Yes	ESLint, SonarQube, Semgrep, CodeQL

Facebook’s Infer, for example, uses a technique called bi-abduction derived from separation logic to reason about heap memory. It is sound for the properties it checks — it won’t miss a null dereference it has committed to checking — but it produces false positives and doesn’t cover all possible properties. That’s not a bug in Infer; it’s the only mathematically coherent choice available.

Meanwhile, tools like ESLint operate with heuristics: pattern-matching on ASTs, dataflow through a bounded number of steps, known-bad code signatures. These are neither sound nor complete, but they are useful — and usefulness, not completeness, is the engineering goal.

Static Analysis Tools: Soundness vs. False Positive Rate

Bubble size represents approximate deployment breadth. Tools closer to the origin are more practically useful day-to-day; tools toward the top-right sacrifice noise tolerance for guarantees.

5. The AI Problem: Why Larger Models Don’t Change the Math

In 2026, the most discussed static analysis tools are increasingly AI-powered: GitHub Copilot’s review suggestions, Amazon CodeGuru, Cursor’s inline diagnostics, and a growing constellation of LLM-backed linters. A natural question follows: do these tools escape the limits imposed by Rice’s Theorem?

The answer is no — and understanding why clarifies what AI code analysis can and cannot do.

Rice’s Theorem applies to any computational procedure, not just traditional algorithms. A transformer model is a fixed function mapping inputs to outputs. Running it constitutes a computation. Therefore, if it claimed to decide an arbitrary semantic property of programs — with no false positives and no false negatives — it would constitute a decider for that property, which Rice’s Theorem proves cannot exist.

What this means practically: An AI model that claimed to reliably detect all possible null dereferences in all possible programs would be claiming to solve an undecidable problem. Either it misses some (false negatives), flags non-issues (false positives), or both. This is not a current limitation of AI capability — it is a permanent mathematical boundary.

What AI does change is the nature of the approximation. Traditional static analysis tools use handcrafted rules and formal abstractions. AI tools use patterns learned from billions of lines of code. Neither approach is complete, but they have very different failure profiles:

Dimension	Traditional Static Analysis	AI-Powered Analysis (2026)
Theoretical basis	Formal semantics, abstract interpretation, dataflow	Statistical patterns from training data
Error type bias	Predictable (over- or under-approximation by design)	Unpredictable — depends on training distribution
Novel bug classes	Misses bugs not in its rule set	May generalise; may hallucinate new patterns
Auditability	High — rules are inspectable	Low — weights are not human-readable
Rice’s Theorem status	Subject to it; acknowledged explicitly	Subject to it; rarely acknowledged

The practical implication is that AI analysis tools are best understood as extremely sophisticated heuristics. They can catch bugs that pattern-matching rules miss. They can suggest fixes with context sensitivity that no rule system achieves. But they cannot, even in principle, achieve completeness. Every benchmark that evaluates them — including SWE-bench and emerging security benchmarks — captures an approximation of capability, not a proof of coverage.

Furthermore, AI models introduce a subtly different failure mode: confident incorrectness. A formal sound analyser that flags a false positive at least knows it’s making a conservative approximation. An LLM that hallucinates a security vulnerability explanation may do so with high fluency and apparent certainty. For safety-critical applications, this distinction matters enormously.

6. Formal Verification: Escaping Undecidability Through Proof

If undecidability prevents algorithms from automatically deciding semantic properties, how do formal verification tools like Coq, Lean, and Isabelle actually work? The answer is that they shift the burden — they don’t decide properties automatically, they help humans construct machine-checked proofs of those properties.

The key distinction is between verification and decision. Deciding whether a property holds requires an algorithm that works on any program. Verifying a specific property of a specific program, with a human-written proof, is a different task — one that can succeed even for undecidable properties, as long as a proof exists and someone writes it.

Real-world example: The seL4 microkernel is a formally verified OS kernel whose correctness — including memory safety, absence of deadlocks, and adherence to its security policy — has been mechanically checked in Isabelle. The verification took approximately 11 person-years of proof engineering. Undecidability didn’t prevent this; it just meant a human had to specify and prove the invariants, rather than a tool discovering them automatically.

This is precisely why formal verification requires human-specified invariants, loop contracts, and pre/post conditions. These annotations are the human’s contribution to the proof. Without them, the tool has no way to know what you intend the program to do — and by Rice’s Theorem, it cannot infer that from the code alone.

Assurance Level vs. Engineering Effort Across Verification Approaches

Effort is normalised relative to writing the program itself. Assurance reflects the theoretical coverage guarantee, not bug-finding rate in practice.

7. Code Coverage and the Completeness Illusion

One final implication is worth spelling out directly, because it catches even experienced engineers off guard. 100% code coverage does not mean your test suite covers all program behaviours.

Code coverage measures whether a line or branch was executed during a test. It says nothing about whether the execution explored all reachable paths, encountered all possible inputs, or triggered all observable behaviours. A function can have 100% branch coverage and still behave incorrectly on inputs your tests never considered.

Rice’s Theorem provides the theoretical underpinning for why this gap is irreducible. Deciding whether a test suite covers all semantically distinct behaviours of a program is itself an undecidable problem. No coverage tool, however sophisticated, can close that gap in general — which is exactly why coverage is a proxy metric, not a correctness certificate.

Practical takeaway: Use coverage as a floor, not a ceiling. 80% line coverage is a meaningful minimum signal. But “we have 100% branch coverage” should not be read as “we have verified correctness.” The two claims belong to entirely different theoretical categories.

Demonstrating the limits: a halting program a simple checker gets wrong

The following Python snippet is self-contained, requires Python 3.x, and illustrates why even straightforward-looking termination is hard to determine statically. It implements the Collatz sequence — a computation no one has yet proven always halts, despite being expressible in four lines of code:

# Python 3.x required — no external packages needed
# Run with: python collatz.py

def collatz(n):
    """
    Collatz conjecture: for any positive integer n,
    this sequence is conjectured to always reach 1.
    No general proof exists. A static analyser cannot
    decide termination for this function — neither can
    any human, as of 2026.
    """
    steps = 0
    while n != 1:
        n = n // 2 if n % 2 == 0 else 3 * n + 1
        steps += 1
    return steps

# These all terminate quickly
for start in [6, 27, 871]:
    print(f"collatz({start}) → {collatz(start)} steps")

# Output:
# collatz(6)   →  8 steps
# collatz(27)  →  111 steps
# collatz(871) →  178 steps

No static analysis tool, no AI model, and no existing mathematical proof can tell you that collatz(n) terminates for every positive integer n. This is not a gap in tooling maturity — it is an open problem in mathematics that Rice’s Theorem tells us cannot, in general, be mechanically decided.

8. What We’ve Learned

The Halting Problem proves — by contradiction, via Turing’s diagonal argument — that no algorithm can decide whether an arbitrary program halts for all possible inputs. This is a permanent mathematical boundary, not a temporary engineering gap.
Rice’s Theorem generalises this completely: every non-trivial semantic property of programs — anything about what a program does — is undecidable. Security, termination, output correctness, null safety: all of them, without exception.
Static analysis tools escape undecidability by restricting the problem: sound tools over-approximate (false positives, no false negatives), complete tools under-approximate (false negatives, no false positives), and heuristic tools like ESLint and SonarQube trade formal guarantees for practical usefulness.
AI-powered code analysis is subject to exactly the same theoretical limits. Larger models change the shape of the approximation — and can be dramatically better heuristics — but cannot achieve completeness for any non-trivial semantic property.
Formal verification (Coq, Lean, Isabelle, seL4) sidesteps undecidability by having humans write proofs rather than asking tools to discover them automatically. Human-specified invariants are not a workaround — they are a necessary contribution to the proof.
Code coverage is a proxy for testing effort, not a correctness certificate. 100% coverage cannot, in principle, guarantee all semantic behaviours are tested — Rice’s Theorem explains precisely why.

The Halting Problem, Rice’s Theorem,and the Walls They Build

1. The Setup: What Does “Decidable” Actually Mean?

2. The Halting Problem: Turing’s Original Proof

3. Rice’s Theorem: The Full Generalisation

4. What This Means for Static Analysis Tools

5. The AI Problem: Why Larger Models Don’t Change the Math

6. Formal Verification: Escaping Undecidability Through Proof

7. Code Coverage and the Completeness Illusion

Demonstrating the limits: a halting program a simple checker gets wrong

8. What We’ve Learned

Thank you!

Eleftheria Drosopoulou

Thank you!

1. The Setup: What Does “Decidable” Actually Mean?

2. The Halting Problem: Turing’s Original Proof

3. Rice’s Theorem: The Full Generalisation

4. What This Means for Static Analysis Tools

5. The AI Problem: Why Larger Models Don’t Change the Math

6. Formal Verification: Escaping Undecidability Through Proof

7. Code Coverage and the Completeness Illusion

Demonstrating the limits: a halting program a simple checker gets wrong

8. What We’ve Learned

Thank you!

Related Articles

Thank you!