Design of the Kalman Covariance Threshold kcc_recal_p_est_thresh
Abstract
When kcc_probe_rtt_decouple is active and the Kalman filter expires, the KCC algorithm compares the error covariance estimate p_est against the threshold kcc_recal_p_est_thresh to decide whether to trigger a PROBE_RTT drain. This article explains why the threshold is chosen as 25000, based on a precise analysis of the covariance steady state and the covariance-matched noise estimator. The goal is to trigger the expensive recovery only when the filter’s process model has genuinely failed, avoiding false positives caused by real measurement noise.
1. Trigger Condition
When filter_expired fires and kcc_probe_rtt_decouple is active, KCC performs a Kalman health check: if p_est > kcc_recal_p_est_thresh, the filter is deemed to have lost confidence, and a traditional PROBE_RTT drain is triggered (inflight drops to 4 packets, causing a throughput cliff); otherwise the probe is suppressed and the filter is considered healthy.
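The health check above can be sketched as a small decision function. This is a minimal illustration, not the KCC source: the names `ExpiryAction` and `on_filter_expiry` are hypothetical, and the behaviour when the decouple is inactive (always drain on expiry) is an assumption, since the article only describes the decoupled path.

```python
from enum import Enum

KCC_RECAL_P_EST_THRESH = 25000  # threshold discussed in this article


class ExpiryAction(Enum):
    NONE = "none"                # filter has not expired, nothing to decide
    SUPPRESS_PROBE = "suppress"  # filter considered healthy, skip the drain
    PROBE_RTT_DRAIN = "drain"    # traditional PROBE_RTT drain


def on_filter_expiry(filter_expired: bool,
                     probe_rtt_decouple: bool,
                     p_est: float,
                     thresh: float = KCC_RECAL_P_EST_THRESH) -> ExpiryAction:
    """Hypothetical sketch of the Kalman health check on filter expiry."""
    if not filter_expired:
        return ExpiryAction.NONE
    if not probe_rtt_decouple:
        # Assumption: without the decouple, expiry always triggers the drain.
        return ExpiryAction.PROBE_RTT_DRAIN
    # Strict comparison, matching "p_est > kcc_recal_p_est_thresh" in the text.
    if p_est > thresh:
        return ExpiryAction.PROBE_RTT_DRAIN
    return ExpiryAction.SUPPRESS_PROBE


print(on_filter_expiry(True, True, 1700.0))   # healthy filter: probe suppressed
print(on_filter_expiry(True, True, 30000.0))  # covariance too large: drain
```

Because the comparison is strict, a p_est exactly equal to the threshold still suppresses the probe in this sketch.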
2. Dynamics of p_est
The Kalman covariance update is
p_pred = p_est + Q (prediction: add process noise)
p_new = p_pred * R / (p_pred + R) (posterior: shrink toward R)
p_est converges to a steady state determined by Q and R:
p_ss = (−Q + √(Q² + 4·Q·R)) / 2
This steady state is self-limiting. On the pure heuristic noise path, Q ≤ 2000 and R ≤ 3200 (r_max_boost × base_R), so the steady state is at most p_ss ≈ 1700, more than an order of magnitude (about 15×) below 25000.
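As a quick numerical check of this section, the sketch below iterates the two-step covariance update and compares the result with the closed-form p_ss. The function names are illustrative, and Q and R are held constant at the heuristic caps quoted above, which is a simplification of the adaptive behaviour.

```python
import math


def covariance_step(p_est: float, q: float, r: float) -> float:
    """One predict/update cycle of the scalar covariance recursion."""
    p_pred = p_est + q                 # prediction: add process noise
    return p_pred * r / (p_pred + r)   # posterior: shrink toward R


def steady_state(q: float, r: float) -> float:
    """Closed-form fixed point p_ss = (-Q + sqrt(Q^2 + 4QR)) / 2."""
    return (-q + math.sqrt(q * q + 4.0 * q * r)) / 2.0


def iterate(p0: float, q: float, r: float, steps: int = 200) -> float:
    """Run the recursion from p0 for a fixed number of steps."""
    p = p0
    for _ in range(steps):
        p = covariance_step(p, q, r)
    return p


Q_MAX = 2000.0   # heuristic cap on Q quoted in the text
R_MAX = 3200.0   # heuristic cap on R quoted in the text

p_closed = steady_state(Q_MAX, R_MAX)
p_from_low = iterate(0.0, Q_MAX, R_MAX)
p_from_high = iterate(1e6, Q_MAX, R_MAX)
print(f"closed form: {p_closed:.1f}")
print(f"iterated from 0: {p_from_low:.1f}, from 1e6: {p_from_high:.1f}")
```

Both starting points settle on the same value (about 1720), which illustrates why the steady state is described as self-limiting: the fixed point depends only on Q and R, not on the initial covariance.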
3. The Only Path That Pushes p_est Up: Covariance-Matched Noise Estimation
KCC defaults to noise_mode = 1, which adds a covariance-matched estimator:
matched_r_est += α * max(0, innov² − p_pred) (α = 0.1, upper cap 1e9)
The effective R is max(heuristic_r, matched_r_est). As long as large innovations persist, matched_r_est keeps growing (bounded only by its 1e9 cap), raising the effective R and therefore the steady-state p_ss. For example, with Q = 2000, matched_r_est ≈ 250k gives p_ss ≈ 21k; at 500k, p_ss ≈ 31k.
The following mechanisms do not push p_est above the threshold: directional update (positive innovations skip the state update), outlier gating (rejected samples still converge toward the heuristic R), and Q-boost (resets p_est to 1000). Only accepted samples with large innovations inflate the effective noise model through the matched-R path.
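The matched-R path in this section can be sketched as follows. This is a hedged illustration built only from the update rule and the max() selection quoted above; the function names are hypothetical, the heuristic R is pinned at its 3200 cap, and any decay or reset logic the real estimator may have is not modelled because the article does not describe it.

```python
import math

ALPHA = 0.1            # gain quoted in the text
MATCHED_R_CAP = 1e9    # upper cap quoted in the text


def update_matched_r(matched_r_est: float, innov: float, p_pred: float) -> float:
    """Apply matched_r_est += alpha * max(0, innov^2 - p_pred), then cap."""
    excess = max(0.0, innov * innov - p_pred)
    return min(MATCHED_R_CAP, matched_r_est + ALPHA * excess)


def effective_r(heuristic_r: float, matched_r_est: float) -> float:
    """Effective measurement noise is the larger of the two estimates."""
    return max(heuristic_r, matched_r_est)


def steady_state(q: float, r: float) -> float:
    """Closed-form fixed point of the covariance recursion."""
    return (-q + math.sqrt(q * q + 4.0 * q * r)) / 2.0


Q_MAX = 2000.0
HEURISTIC_R_CAP = 3200.0

for matched in (0.0, 250_000.0, 500_000.0):
    r_eff = effective_r(HEURISTIC_R_CAP, matched)
    p_ss = steady_state(Q_MAX, r_eff)
    print(f"matched_r_est={matched:>9.0f}  effective R={r_eff:>9.0f}  p_ss={p_ss:>8.0f}")
```

Small innovations (innov² below p_pred) leave matched_r_est unchanged in this sketch, so only sustained large innovations raise the effective R.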
4. Evaluation of Candidate Thresholds
Using the maximum adaptive Q = 2000 (constrained by q_scale_cap), the R required to reach a given threshold p is derived from the steady-state equation:
R = (p² + p·Q) / Q
The table below lists candidate p_ss values, the required R, and its ratio to the heuristic cap of 3200:
| p_ss | R(p_ss) | R / 3200 |
|---|---|---|
| 5000 | 17.5 k | 5.5 |
| 7500 | 35.6 k | 11.1 |
| 10000 | 60.0 k | 18.8 |
| 12500 | 90.6 k | 28.3 |
| 15000 | 127.5 k | 39.8 |
| 17500 | 170.6 k | 53.3 |
| 20000 | 220.0 k | 68.8 |
| 22500 | 275.6 k | 86.1 |
| 25000 | 337.5 k | 105.5 |
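The table can be reproduced directly from R = (p² + p·Q) / Q. The short sketch below assumes Q = 2000 and the heuristic cap of 3200, as stated in the text; `required_r` is an illustrative name.

```python
Q_MAX = 2000            # maximum adaptive Q from the text
R_HEURISTIC_CAP = 3200  # heuristic cap on R from the text


def required_r(p_ss: float, q: float = Q_MAX) -> float:
    """Measurement noise R needed for the steady state to equal p_ss."""
    return (p_ss * p_ss + p_ss * q) / q


# Candidate thresholds on the same 2500-step grid as the table.
rows = [
    (p, required_r(p), required_r(p) / R_HEURISTIC_CAP)
    for p in range(5000, 25001, 2500)
]

print(f"{'p_ss':>6} | {'R(p_ss)':>9} | {'R / 3200':>8}")
for p, r, ratio in rows:
    print(f"{p:>6} | {r / 1000:>7.1f} k | {ratio:>8.1f}")
```

For p much larger than Q the required R grows roughly as p²/Q, which is why the ratio column climbs faster than the threshold itself.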
Analysis:
- 5000–17500: The required R ranges from 17.5k to about 171k, which falls within the normal noise levels of WiFi bursts and multi-tenant VPS contention. The filter is actually tracking measurement noise correctly; triggering a drain would be a false positive.
- 20000 (R=220k): Borderline. Most shared-VPS links stay below this, but links with persistent competing flows (cellular, congested APs) can occasionally reach it — still a false positive.
- 22500 (R=276k): Above the typical shared-VPS noise ceiling, but still at the edge of extreme yet real high-jitter paths.
- 25000 (R≈338k): The matched noise model expands to 105 times the heuristic cap, two orders of magnitude beyond. Such a level of “noise” cannot be produced by any real measurement process; the most parsimonious explanation is that the filter’s process model has failed.
5. The Choice of 25000: A Structural Boundary
An elevated p_est can arise from two distinct causes:
(a) Measurement noise is genuinely high (jitter, queue oscillations) — the filter is tracking correctly, min_rtt is accurate, and a drain would be a false positive.
(b) The process model has broken (path rerouted, NIC changed) — the filter state no longer corresponds to the physical path, and a drain is needed to re-acquire the ground truth.
The sole purpose of the threshold is to separate (a) from (b). When R must exceed the heuristic budget by 100 times to explain p_est, the reasonable diagnosis is no longer “the data is noisy” but “the filter model is wrong.” With Q = 2000, R reaches 100 × 3200 = 320k at p_ss ≈ 24.3k, so 25000 is the first candidate on the 2500-step grid, and the nearest round value, above the 100× mark, forming a clean structural boundary.
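As a rough cross-check of where the 100× mark falls, the sketch below solves the steady-state equation for R = 100 × 3200 and then scans the 2500-step candidate grid used in the table. It assumes Q = 2000 throughout; the variable names are illustrative.

```python
import math

Q_MAX = 2000.0
R_HEURISTIC_CAP = 3200.0


def steady_state(q: float, r: float) -> float:
    """Closed-form fixed point p_ss = (-Q + sqrt(Q^2 + 4QR)) / 2."""
    return (-q + math.sqrt(q * q + 4.0 * q * r)) / 2.0


def required_r(p_ss: float, q: float = Q_MAX) -> float:
    """Inverse relation: R needed for a given steady state."""
    return (p_ss * p_ss + p_ss * q) / q


r_100x = 100.0 * R_HEURISTIC_CAP          # 320 000
p_cross = steady_state(Q_MAX, r_100x)     # exact p_ss at the 100x mark

# First candidate on the table's grid whose required R exceeds 100x the cap.
first_grid = next(
    p for p in range(5000, 25001, 2500) if required_r(p) > r_100x
)

print(f"p_ss at R = 100 x cap: {p_cross:.0f}")
print(f"first grid candidate above the 100x mark: {first_grid}")
```

The exact crossing sits a little below 25000 (about 24.3k), and 25000 is the first grid candidate above it.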
On benign noisy paths (WiFi bursts, VPS contention), matched_r_est typically stabilizes below 200k (corresponding to p_ss below about 19k at Q = 2000), so the 25000 threshold leaves the decouple room to correctly suppress unnecessary drains.
From an attacker’s perspective, forcing a drain requires pushing the matched R to 338k, which demands sustained large-innovation injection over approximately 20 RTTs (given the EWMA with α = 0.1) and an innovation amplitude of roughly 25 ms (assuming a 200 ms RTT). Because the required R grows roughly as p²/Q and each update adds a term proportional to innov², the innovation amplitude an attacker must sustain grows roughly linearly with the threshold, so a higher threshold also acts as an anti-DoS barrier.
6. Conclusion
kcc_recal_p_est_thresh = 25000 is the first candidate threshold at which the required measurement noise R exceeds 100 times the heuristic cap, that is, two orders of magnitude. At that level, real measurement noise is no longer a plausible explanation for the inflated p_est, so a PROBE_RTT drain is triggered only when the Kalman process model has most likely failed. The value balances false-positive resistance against a practical cost for attackers.





