Fix tc class/filter leaks on the bridge and reconcile them periodically#289
Fix tc class/filter leaks on the bridge and reconcile them periodically#289hiroTamada wants to merge 3 commits into
Conversation
Per-VM HTB classes and basic filters leaked on busy hosts, and the filter list is walked linearly for every VM-egress packet, degrading all VM network throughput as leaks accumulate. Root causes fixed: - Cleanup paths with no persisted class ID fell back to the hash-derived ID, deleting the wrong class/filter whenever the original add had been collision-probed to a different ID. The class is now resolved from the bridge filter table via the TAP's ifindex (the authoritative tap->class link); when nothing resolves, cleanup defers to the reconciler instead of guessing. - removeVMClass swallowed every delete error; failures are now logged and counted (hypeman_network_tc_cleanup_failures_total). - CleanupOrphanedClasses only ran at process startup and never swept dangling filters. It now runs every TAP GC tick under the tc mutex, sweeps filters not anchored to a live TAP, and counts removals (hypeman_network_tc_orphans_swept_total). It bails if no rt_iif matches parse, so an unexpected tc output format can't trigger a sweep of live state. - The persisted classid is cleared after instance release so the routine stop-then-delete double release doesn't re-delete the class. scripts/sweep-stale-tc.sh applies the same sweep out-of-band (dry-run by default) to recover already-polluted hosts without a restart. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
|
Created a monitoring plan for this PR. What this PR does: Stops VM network throughput from degrading on busy hosts by fixing stale tc class/filter leaks on the hypeman bridge and running periodic reconciliation every 60s. Intended effect:
Risks:
Status updates will be posted automatically on this PR as monitoring progresses. |
A persisted classID can go stale (saveClassID failing after a collision-probed add), in which case deleting by it removes the wrong class — or worse, one belonging to another live VM if the stale ID collides. Resolve the class from the TAP's own filter when one exists, and skip flowid-matched deletes whose anchor is a live interface. Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
| "bridge", bridgeName, "tap", tapName, "class", fullClassID, | ||
| "error", err, "output", strings.TrimSpace(string(output))) | ||
| m.recordTCCleanupFailure(ctx, "class_del") | ||
| } |
There was a problem hiding this comment.
Filter list failure skips safety
Medium Severity
When listBridgeFilters fails in removeVMClass, cleanup still proceeds to delete the HTB class using only the persisted classID. That bypasses the new filter-table checks (including classClaimedByOther) meant to avoid removing another VM’s class after collision probing, reintroducing the wrong-ID delete this change is meant to fix.
Reviewed by Cursor Bugbot for commit 4ea5f81. Configure here.
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes using high effort and found 2 potential issues.
There are 3 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit 9a7d193. Configure here.
| } | ||
| m.recordTCOrphanSwept(ctx, "filter") | ||
| deleted++ | ||
| } |
There was a problem hiding this comment.
Unparsed rt_iif treated as stale
Medium Severity
The CleanupOrphanedClasses and removeVMClass functions might incorrectly delete filters for active VMs. When the rt_iif value from tc filter show output isn't fully parsed, filters for active TAPs are mistakenly identified as orphaned. This can disrupt upload shaping or, in removeVMClass, lead to deleting another VM's filter.
Reviewed by Cursor Bugbot for commit 9a7d193. Configure here.
| return | ||
| } | ||
| if deleted := m.networkManager.CleanupOrphanedTAPs(ctx, preserve, tapGCMinAge); deleted > 0 { | ||
| log.DebugContext(ctx, "TAP GC: no preserve candidates, skipping TAP pass") |
There was a problem hiding this comment.
TC sweep skipped on list failure
Medium Severity
Periodic bridge tc reconciliation runs inside reconcileTAPs only after ListInstances succeeds. When listing fails, the function returns early and never calls CleanupOrphanedClasses, so stale classes and filters are not swept on that 60s tick despite the PR’s goal of periodic cleanup.
Reviewed by Cursor Bugbot for commit 9a7d193. Configure here.


Summary
Per-VM HTB classes and
basicfilters leak on the bridge on busy hosts. Since the bridge filter list is evaluated linearly for every VM-egress packet, accumulated leaks degrade network throughput for every VM on the host (observed: ~1,880 stale class/filter pairs vs ~6 live VMs on one production host).Leak mechanisms fixed
Wrong-ID deletes. Cleanup paths that have no persisted class ID (periodic TAP reaper, orphaned-TAP cleanup, recreate-existing-TAP, TAP-create failure) fell back to the hash-derived class ID. Whenever the original
tc class addhad been collision-probed tohash+N, these paths deleted the wrong ID and leaked the real class and its filter — a feedback loop, since more stale classes cause more probing.removeVMClassnow resolves the class from the bridge filter table via the TAP's ifindex (meta(rt_iif eq N)is the authoritative tap→class link). When neither a persisted ID nor a filter resolves, it deletes nothing and defers to the reconciler instead of guessing.Silent delete failures.
removeVMClassran everytcdelete best-effort with no logging. Failures are now logged and counted viahypeman_network_tc_cleanup_failures_total{operation}.Reconcile never ran and missed filters.
CleanupOrphanedClassesonly ran at process startup and iterated classes only, so dangling filters (class deleted, filter left, or TAP gone) were invisible to it. It now:rt_iifematch parses, so an unexpectedtcoutput format can't cause a sweep of live statehypeman_network_tc_orphans_swept_total{kind}Double-release noise. The persisted classid file is cleared after an instance-scoped release, so the routine stop→delete double release no longer re-deletes the class.
Out-of-band recovery
scripts/sweep-stale-tc.shapplies the same sweep logic standalone (dry-run by default,--applyto delete) to recover already-polluted hosts immediately, without waiting for a deploy/restart. Candidates are snapshotted before keep-sets so anything allocated mid-run is never considered stale.Test plan
go build/go vetonlib/networkandlib/instancesgo test ./lib/network/(includes newparseBridgeFiltersunit tests)bash -n+ dry-run ofscripts/sweep-stale-tc.shmake test-linuxintegration suite — not run locally (requires KVM + HTB kernel modules not available in my environment); needs CI or a metal hosttc class show dev vmbr0 | wc -l, then--applyand confirmhypeman_network_tc_class_collisions_totaland softirq CPU drop🤖 Generated with Claude Code
Note
Medium Risk
Changes live Linux bridge traffic-control delete and reconciliation logic under
tcMu; incorrect sweeps could affect VM upload shaping, though multiple safety bail-outs limit that risk.Overview
Fixes leaked per-VM HTB classes and
basicfilters on the bridge, which linearly slow every VM’s egress when they accumulate.Release and delete paths now resolve the real class via the bridge filter table (
meta(rt_iif eq N)) instead of guessing the hash-derived ID after collision probing; they skip deleting when a persisted ID appears owned by another live TAP. Failedtcdeletes are logged and counted. Instance-scoped release clears the persistedclassidfile after a successful delete so stop→delete doesn’t repeat cleanup.CleanupOrphanedClassesis extended to sweep dangling filters first, then unreferenced classes (live TAP ifindexes, filter flowids, and stored allocation class IDs), holdstcMuto avoid racing async rate-limit setup, and bails if filter output can’t be parsed safely. It runs on every 60s TAP GC tick (even when the TAP pass is skipped), not only at startup.Adds
hypeman_network_tc_cleanup_failures_totalandhypeman_network_tc_orphans_swept_total, unit tests forparseBridgeFilters, andscripts/sweep-stale-tc.shfor dry-run /--applyrecovery on already-polluted hosts.Reviewed by Cursor Bugbot for commit 9a7d193. Bugbot is set up for automated code reviews on this repo. Configure here.