Critical Linux CUBIC Bug Causes Permanent QUIC Congestion Collapse at Cloudflare

Breaking: Cloudflare Engineers Uncover Permanent Congestion Collapse Bug in Linux CUBIC Algorithm

Cloudflare engineers have identified a critical bug in the CUBIC congestion control logic that the company's QUIC implementation ported from the Linux kernel, where CUBIC is the default algorithm. Affected connections become stuck at minimum throughput and never recover. The flaw sits in an algorithm that governs a significant share of public internet traffic, and it was discovered during routine integration testing.

Source: blog.cloudflare.com

Cloudflare's open-source QUIC library, quiche, relies on CUBIC as its default congestion controller. Under specific loss patterns early in a connection, the congestion window (cwnd) becomes permanently pinned at its minimum value, effectively halting bandwidth recovery. The bug was present in approximately 61% of test runs under heavy loss scenarios.

Background: CUBIC's Role and the RFC 9438 Change

CUBIC, standardized in RFC 9438, governs how most TCP and QUIC connections on the public internet probe for bandwidth. It adjusts the congestion window (cwnd), the sender-side cap on bytes in flight, growing it while the network is healthy and shrinking it when loss is detected.
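For readers who want to see the growth curve, here is a minimal sketch of the cubic window function as RFC 9438 defines it. It is not code from quiche or the Linux kernel. The function and parameter names are illustrative, windows are measured in segments, and the sketch assumes `w_max` is at least `cwnd_epoch`.

```python
# Constants from RFC 9438; windows are in segments, time is in seconds.
C = 0.4  # CUBIC scaling constant


def cubic_k(w_max: float, cwnd_epoch: float) -> float:
    """Seconds needed for the curve to climb from cwnd_epoch back to w_max.

    Assumes w_max >= cwnd_epoch (the usual case right after a loss).
    """
    return ((w_max - cwnd_epoch) / C) ** (1.0 / 3.0)


def w_cubic(t: float, w_max: float, cwnd_epoch: float) -> float:
    """Target window t seconds into the current congestion-avoidance epoch."""
    return C * (t - cubic_k(w_max, cwnd_epoch)) ** 3 + w_max
```

With `w_max = 100` and `cwnd_epoch = 70`, the curve starts at 70 segments, flattens out as it approaches 100, and then accelerates past it to probe for more bandwidth.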

The bug originated from a Linux kernel change intended to align CUBIC with the app-limited exclusion described in RFC 9438 §4.2-12. While this fix addressed a real TCP issue, porting it to Cloudflare's QUIC stack triggered unexpected behaviors in the cwnd state machine.

“This was a subtle bug that only manifested under very specific loss patterns early in a connection,” said a Cloudflare network engineer who asked not to be named. “Recovery after congestion collapse is an uncommon regime, but it's exactly what a congestion controller must handle. It flew under the radar because most tests focus on steady-state growth.”

The Symptom: 61% Test Failure Rate

The investigation began after the ingress proxy integration test pipeline started failing irregularly. Engineers traced the failures to tests where CUBIC experienced heavy loss during the initial phase of a connection. The congestion window would collapse to its minimum and never recover, even after network conditions improved.
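To illustrate how quickly early loss can drive a window to its floor, the sketch below applies CUBIC's multiplicative decrease factor of 0.7 (RFC 9438) across back-to-back loss events. The 1200-byte datagram size and the 12000-byte starting window are assumed values for illustration, and the floor of two datagrams follows the minimum window recommended in RFC 9002. The function names are hypothetical and do not come from quiche.

```python
MAX_DATAGRAM_SIZE = 1200          # bytes; assumed for illustration
MIN_CWND = 2 * MAX_DATAGRAM_SIZE  # minimum window recommended by RFC 9002


def on_congestion_event(cwnd: int) -> int:
    """Shrink cwnd by a factor of 0.7 (integer math), never below MIN_CWND."""
    return max(cwnd * 7 // 10, MIN_CWND)


def collapse(cwnd: int, loss_events: int) -> list[int]:
    """Return the cwnd trajectory across a run of back-to-back loss events."""
    trajectory = [cwnd]
    for _ in range(loss_events):
        cwnd = on_congestion_event(cwnd)
        trajectory.append(cwnd)
    return trajectory


print(collapse(12000, 6))
# [12000, 8400, 5880, 4116, 2881, 2400, 2400]
```

Five loss events are enough to pin this window at the floor. Whether it can climb back out once losses stop is exactly the regime where the bug appeared.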

“Most congestion control tests exercise steady-state and growth phases; far fewer probe what happens at minimum cwnd after the connection has been beaten down,” the engineer added. “Bugs in this corner of the state space are invisible in throughput dashboards but catastrophic for reliability.”

The Fix: A One-Line Solution

Cloudflare engineers identified the root cause in the CUBIC logic's handling of the app-limited exclusion. When a connection became app-limited (i.e., not sending due to lack of application data) at the exact moment of recovery, the cwnd update path was incorrectly bypassed, leaving it stuck at minimum.
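The failure mode described above can be pictured with a toy model. This is a hypothetical sketch, not Cloudflare's code and not the actual patch. It assumes the sender is flagged app-limited on every ACK round (the article does not detail how the flag came to be set) and uses simple slow-start-style growth in place of the real CUBIC update.

```python
MIN_CWND = 2400  # bytes; assumed floor of two 1200-byte datagrams


def on_ack(cwnd: int, acked_bytes: int, app_limited: bool,
           skip_when_app_limited: bool) -> int:
    """Toy ACK handler: grow cwnd by the acked bytes unless growth is skipped."""
    if app_limited and skip_when_app_limited:
        return cwnd  # growth path bypassed; cwnd is left untouched
    return cwnd + acked_bytes


def simulate(rounds: int, skip_when_app_limited: bool) -> int:
    """Run ACK rounds from MIN_CWND with the sender flagged app-limited each round."""
    cwnd = MIN_CWND
    for _ in range(rounds):
        cwnd = on_ack(cwnd, acked_bytes=cwnd, app_limited=True,
                      skip_when_app_limited=skip_when_app_limited)
    return cwnd


print(simulate(5, skip_when_app_limited=True))   # 2400
print(simulate(5, skip_when_app_limited=False))  # 76800
```

In this model, a window that skips growth whenever the app-limited flag is set never leaves the floor, while the same window without the bypass doubles each round. The real interaction is subtler, but the shape of the trap is the same: the only path that could raise cwnd is the one being skipped.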

The fix proved remarkably simple: a one-line code change that breaks the cycle. The patch has been deployed across Cloudflare's infrastructure, and the tests now pass consistently.

What This Means for Internet Performance

This incident highlights how fragile congestion control algorithms can be when ported from TCP to QUIC, a transport that typically runs in user space rather than in the kernel. The bug had the potential to degrade performance for millions of QUIC connections, particularly under lossy network conditions.

“The lesson is that kernel optimizations, even when correct for TCP, can introduce subtle regressions in other protocols,” said a transport protocol researcher at a major university. “Cloudflare's work here is a valuable contribution to the QUIC ecosystem.”

Cloudflare has published the full technical analysis and encourages other implementers to audit their CUBIC integration. The fix reinforces the importance of testing congestion control algorithms in every phase, including the rarely explored minimum-cwnd regime.

This is a developing story. Check back for updates on broader impacts to the Linux networking stack.
