xhcid Device-Level Improvement Plan
Purpose
This document defines the implementation sequence for hardening xhcid at the device level in
Red Bear OS.
It is a focused companion to local/docs/USB-IMPLEMENTATION-PLAN.md. The USB plan remains the
subsystem-wide authority; this document narrows scope to the xhcid device lifecycle,
configuration, teardown, PM behavior, enumerator robustness, and bounded proof coverage.
Scope
In scope:
- recipes/core/base/source/drivers/usb/xhcid/src/xhci/device_enumerator.rs
- recipes/core/base/source/drivers/usb/xhcid/src/xhci/mod.rs
- recipes/core/base/source/drivers/usb/xhcid/src/xhci/scheme.rs
- recipes/core/base/source/drivers/usb/xhcid/src/xhci/irq_reactor.rs
- bounded QEMU validation scripts under local/scripts/
- canonical USB documentation under local/docs/
Out of scope:
- generic USB redesign
- unrelated class-driver feature work
- hardware-validation claims beyond what the repo can currently prove
Repo-Fit Note
Technical implementation targets live in upstream-owned source under
recipes/core/base/source/..., but durable Red Bear preservation belongs in
local/patches/base/. This plan names where the technical work happens; it does not recommend
leaving work stranded only in upstream-owned trees.
Current Audited Findings
The current xhcid tree has already improved materially:
- lifecycle gating exists through PortLifecycle and PortOperationGuard
- configure_endpoints_once() is now transactional relative to earlier behavior
- detach waits before removing published state
- a bounded QEMU lifecycle proof exists
Remaining risks:
- partial attach visibility still exists around publication timing
- detach can still depend on bounded-but-incomplete purge semantics
- suspend/resume is still mainly software gating
- rollback failure is not yet a fully hardened degraded-state path
- enumerator logic still relies on timing- and assumption-heavy behavior
- proof coverage is still QEMU-bounded and misses key interleavings
Design Invariants
The implementation should satisfy these invariants:
- No half-attached device is publicly usable.
- No new work is admitted after detach begins.
- Detach always reaches a bounded terminal outcome.
- Failed configure leaves either the old config intact or the device explicitly degraded/reset-required.
- PM transitions reflect actual usable state, not only software policy.
- Enumerator behavior is bounded and diagnosable, not panic-driven.
- Validation claims match what scripts actually prove.
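The first three invariants can be read as an admission rule on a small state machine. The following is a minimal sketch of that rule, not the actual PortLifecycle in xhcid: the names SketchLifecycle, SketchState, and AdmitError are hypothetical and exist only for illustration.

```rust
use std::sync::Mutex;

// Hypothetical states; the real PortLifecycle may use different ones.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum SketchState {
    Staging,   // attach in progress, not yet published
    Attached,  // published and usable
    Detaching, // teardown started, no new work admitted
    Detached,  // terminal
}

#[derive(Debug, PartialEq, Eq)]
pub enum AdmitError {
    NotPublished,
    DetachInProgress,
    Gone,
}

pub struct SketchLifecycle {
    // (current state, number of in-flight operations)
    inner: Mutex<(SketchState, usize)>,
}

impl SketchLifecycle {
    pub fn new() -> Self {
        Self { inner: Mutex::new((SketchState::Staging, 0)) }
    }

    /// Staging -> Attached. Returns false from any other state.
    pub fn publish(&self) -> bool {
        let mut inner = self.inner.lock().unwrap();
        if inner.0 == SketchState::Staging {
            inner.0 = SketchState::Attached;
            true
        } else {
            false
        }
    }

    /// Admit new work only while the device is fully attached.
    pub fn begin_operation(&self) -> Result<(), AdmitError> {
        let mut inner = self.inner.lock().unwrap();
        let state = inner.0;
        match state {
            SketchState::Staging => Err(AdmitError::NotPublished),
            SketchState::Attached => {
                inner.1 += 1;
                Ok(())
            }
            SketchState::Detaching => Err(AdmitError::DetachInProgress),
            SketchState::Detached => Err(AdmitError::Gone),
        }
    }

    pub fn end_operation(&self) {
        let mut inner = self.inner.lock().unwrap();
        inner.1 = inner.1.saturating_sub(1);
    }

    /// Close admission and report how many operations are still in flight.
    pub fn begin_detaching(&self) -> usize {
        let mut inner = self.inner.lock().unwrap();
        inner.0 = SketchState::Detaching;
        inner.1
    }

    /// Record the terminal outcome once teardown has finished or been forced.
    pub fn finish_detaching(&self) {
        self.inner.lock().unwrap().0 = SketchState::Detached;
    }
}
```

Keeping the state and the in-flight count under one lock means admission and the start of detach cannot interleave, which is the property the invariants ask for.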
Phase 1 — Proof-First Expansion
Goal
Make the current blind spots reproducible before changing behavior.
Work
- extend test-xhci-device-lifecycle-qemu.sh
- extend test-usb-qemu.sh
- extend test-xhci-irq-qemu.sh
- add bounded injection hooks in xhcid for configure-failure and attach/detach timing cases
Required Cases
- repeated attach/detach
- detach during storage startup
- transfer-during-detach surrogate
- configure failure injection
- suspend/resume admission checks
- rapid event ordering cases
Per-File Focus
local/scripts/test-xhci-device-lifecycle-qemu.sh
- add repeated HID/storage attach-detach loops
- add detach-during-driver-start for storage
- add storage attach long enough to exercise startup/read activity before unplug
- require explicit attach-entered, attach-finished, detach-completed evidence
local/scripts/test-usb-qemu.sh
- separate boot progress from proof failure
- keep result lines distinct for xHCI init, HID spawn, SCSI spawn, bounded readback, and crash scan
- add repeated full-stack run mode or bounded loop count if needed for ordering-sensitive regressions
local/scripts/test-xhci-irq-qemu.sh
- verify interrupt-mode evidence still holds under actual attached-device pressure, not only empty-controller boot
xhci test hooks
- add bounded test-only failure hooks in scheme.rs / mod.rs for:
  - fail after CONFIGURE_ENDPOINT
  - fail after SET_CONFIGURATION
  - optional delay before final attach commit
- current bounded implementation uses one-shot guest-side commands written to /tmp/xhcid-test-hook, consumed by xhcid on the next matching lifecycle point
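As an illustration of the one-shot behavior, a hook consumer might look like the sketch below. The command strings, the TestHook enum, and the function names are assumptions made for this example; only the file path and the one-shot, consume-at-matching-point behavior come from this plan.

```rust
use std::fs;
use std::io;
use std::path::Path;

// Hypothetical hook points mirroring the three cases listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TestHook {
    FailAfterConfigureEndpoint,
    FailAfterSetConfiguration,
    DelayBeforeAttachCommit,
}

/// Parse a hook command. The command spellings are assumptions.
pub fn parse_hook(text: &str) -> Option<TestHook> {
    match text.trim() {
        "fail_after_configure_endpoint" => Some(TestHook::FailAfterConfigureEndpoint),
        "fail_after_set_configuration" => Some(TestHook::FailAfterSetConfiguration),
        "delay_before_attach_commit" => Some(TestHook::DelayBeforeAttachCommit),
        _ => None,
    }
}

/// Returns Ok(true) and deletes the file only when it names `point`.
/// A missing file or a hook for a different point leaves everything as-is.
pub fn take_hook(path: &Path, point: TestHook) -> io::Result<bool> {
    let text = match fs::read_to_string(path) {
        Ok(text) => text,
        Err(err) if err.kind() == io::ErrorKind::NotFound => return Ok(false),
        Err(err) => return Err(err),
    };
    if parse_hook(&text) != Some(point) {
        return Ok(false);
    }
    // Removing the file is what makes the hook one-shot.
    fs::remove_file(path)?;
    Ok(true)
}
```

A lifecycle point would call take_hook with its own TestHook value and inject the failure or delay only when the call returns Ok(true).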
Exit Criteria
- scripts are syntax-clean
- new cases fail meaningfully on current gaps
- failures identify the specific missed milestone
Phase 2 — Atomic Attach Publication
Goal
Prevent half-built devices from becoming publicly reachable.
Work
- refactor Xhci::attach_device
- split attach staging from published PortState
PortState - narrow lifecycle exposure so scheme paths cannot reach a device before final commit
- make attach cleanup direct for prepublication failure
Key Targets
- xhci/mod.rs::Xhci::attach_device
- xhci/mod.rs::PortLifecycle::*
- xhci/device_enumerator.rs::DeviceEnumerator::run
Per-File Focus
xhci/mod.rs
- stop inserting into port_states before all attach substeps complete
- keep slot, input context, EP0 ring, quirks, and descriptors in a private staging carrier
- commit published PortState in one final block
- keep prepublication cleanup separate from detach_device() where possible
xhci/device_enumerator.rs
- ensure duplicate connect handling still treats EAGAIN or equivalent as "already published" rather than "half-built staging state"
Exit Criteria
- no public state before attach commit
- attach failure leaves no published device and no child driver
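The staging-then-commit shape described in this phase can be sketched as follows. This is an illustration under assumptions, not the xhcid code: Staged, PublishedPort, Ports, and AttachError are hypothetical stand-ins for the private staging carrier, PortState, port_states, and the driver's error type.

```rust
use std::collections::BTreeMap;

#[derive(Debug, PartialEq, Eq)]
pub enum AttachError {
    AlreadyPublished,          // duplicate connect for a committed port
    StepFailed(&'static str),  // a prepublication substep failed
    Incomplete,                // staging finished without all required parts
}

// Private staging carrier: nothing here is visible to scheme paths.
#[derive(Default)]
pub struct Staged {
    pub slot: Option<u8>,
    pub ep0_ring: bool,
    pub descriptors: bool,
}

pub struct PublishedPort {
    pub slot: u8,
}

#[derive(Default)]
pub struct Ports {
    published: BTreeMap<u8, PublishedPort>,
}

impl Ports {
    pub fn new() -> Self {
        Self::default()
    }

    pub fn is_published(&self, port: u8) -> bool {
        self.published.contains_key(&port)
    }

    pub fn attach<F>(&mut self, port: u8, build: F) -> Result<(), AttachError>
    where
        F: FnOnce(&mut Staged) -> Result<(), AttachError>,
    {
        if self.published.contains_key(&port) {
            return Err(AttachError::AlreadyPublished);
        }
        let mut staged = Staged::default();
        // Every fallible substep mutates only the staging carrier. On error
        // the carrier is dropped and the published map is never touched.
        build(&mut staged)?;
        let slot = match (staged.slot, staged.ep0_ring, staged.descriptors) {
            (Some(slot), true, true) => slot,
            _ => return Err(AttachError::Incomplete),
        };
        // Single commit point: the device becomes reachable only here.
        self.published.insert(port, PublishedPort { slot });
        Ok(())
    }
}
```

Because the duplicate check reads only the published map, a second connect event can report "already published" but never observes a half-built staging state.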
Phase 3 — Bounded Detach and Purge
Goal
Make teardown bounded, dominant, and safe against stale completions.
Work
- bound PortLifecycle::begin_detaching()
- reject all new work immediately once detach starts
- purge or tombstone pending transfer/reactor state
- separate graceful drain from forced teardown
- preserve correct slot-disable/remove ordering
- ensure child-driver shutdown cannot wedge detach
Key Targets
- xhci/mod.rs
- xhci/irq_reactor.rs
- transfer bookkeeping in xhci/scheme.rs
Per-File Focus
xhci/mod.rs
- add timeout or bounded wait to detach drain logic
- distinguish graceful drain from forced teardown
- keep port_states.remove(...) after terminal teardown outcome
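One way to bound the drain and still distinguish the two outcomes is a condition variable with a deadline. The sketch below assumes a simple in-flight counter; InFlight and DrainOutcome are hypothetical names, and the real drain logic in xhcid may track work differently.

```rust
use std::sync::{Condvar, Mutex};
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub enum DrainOutcome {
    Graceful,                    // all in-flight work finished in time
    Forced { abandoned: usize }, // deadline hit; caller must force teardown
}

pub struct InFlight {
    count: Mutex<usize>,
    idle: Condvar,
}

impl InFlight {
    pub fn new(count: usize) -> Self {
        Self { count: Mutex::new(count), idle: Condvar::new() }
    }

    /// Called by an operation when it completes.
    pub fn finish_one(&self) {
        let mut count = self.count.lock().unwrap();
        *count = count.saturating_sub(1);
        if *count == 0 {
            self.idle.notify_all();
        }
    }

    /// Wait for in-flight work to reach zero, but never past `limit`.
    pub fn drain(&self, limit: Duration) -> DrainOutcome {
        let deadline = Instant::now() + limit;
        let mut count = self.count.lock().unwrap();
        while *count > 0 {
            let now = Instant::now();
            if now >= deadline {
                return DrainOutcome::Forced { abandoned: *count };
            }
            // Re-check the counter after every wakeup, spurious or not.
            let (guard, _timed_out) = self.idle.wait_timeout(count, deadline - now).unwrap();
            count = guard;
        }
        DrainOutcome::Graceful
    }
}
```

Either outcome is terminal, so the caller can run port_states.remove(...) afterwards in both cases and log which path was taken.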
xhci/irq_reactor.rs
- add per-port invalidation or tombstone behavior so stale completions cannot target removed state
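A common way to implement this kind of invalidation is a per-port generation number carried by every pending transfer. The sketch below illustrates that technique under assumptions; CompletionTable and TransferTag are hypothetical and do not describe the current reactor state.

```rust
use std::collections::HashMap;

/// Tag stamped on each transfer when it is queued.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct TransferTag {
    pub port: u8,
    pub generation: u64,
}

pub struct CompletionTable {
    next_generation: u64,
    live: HashMap<u8, u64>, // port -> generation of the currently attached device
}

impl CompletionTable {
    pub fn new() -> Self {
        Self { next_generation: 0, live: HashMap::new() }
    }

    /// Start a new device generation on `port` and return its tag.
    pub fn attach(&mut self, port: u8) -> TransferTag {
        self.next_generation += 1;
        self.live.insert(port, self.next_generation);
        TransferTag { port, generation: self.next_generation }
    }

    /// Tombstone the port: every tag issued so far for it becomes stale.
    pub fn detach(&mut self, port: u8) {
        self.live.remove(&port);
    }

    /// A completion is delivered only if its tag matches the live generation.
    pub fn accept(&self, tag: TransferTag) -> bool {
        self.live.get(&tag.port) == Some(&tag.generation)
    }
}
```

Because generations never repeat, a completion queued before a detach is still rejected after a new device attaches on the same port.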
xhci/scheme.rs
- ensure operation-entry helpers fail immediately once detach starts
Exit Criteria
- detach cannot hang forever
- no stale completion can target removed device state
- unload-under-activity proof passes
Phase 4 — Configure Rollback Hardening
Goal
Make configuration changes fully transactional and recoverable.
Work
- formalize stage/program/commit boundaries
- ensure snapshots cover all mutated controller-facing state
- promote rollback failure into explicit degraded-state handling
- define deterministic behavior for post-SET_CONFIGURATION failure
- keep alternate/config bookkeeping coherent after rollback
- quarantine or reset on unrecoverable ambiguity
Key Targets
- xhci/scheme.rs::configure_endpoints_once
- restore_configure_input_context
- configure_endpoints
- set_configuration
- set_interface
Per-File Focus
xhci/scheme.rs
- keep endpoint/ring state staged until commit
- verify snapshots cover every mutated slot/endpoint field
- treat rollback failure as a first-class degraded state
- ensure post-failure descriptor and alternate bookkeeping still reflect live state
Exit Criteria
- injected configure failure preserves old state or explicitly degrades/resets device
- no staged endpoint state leaks into live software state
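The stage/program/commit boundary and the degraded-state rule can be sketched as below. This is a simplified model under assumptions: EndpointConfig, Device, DeviceHealth, and the two closures are hypothetical stand-ins for the controller-facing steps, not the signatures used in scheme.rs.

```rust
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct EndpointConfig {
    pub configuration: u8,
    pub endpoints: Vec<u8>, // endpoint addresses, for illustration only
}

#[derive(Debug, PartialEq, Eq)]
pub enum DeviceHealth {
    Healthy,
    Degraded, // rollback failed; device needs a reset before further use
}

pub struct Device {
    pub live: EndpointConfig,
    pub health: DeviceHealth,
}

/// `program` applies the wanted config to the controller/device;
/// `rollback` restores the snapshot after a programming failure.
pub fn configure<P, R>(
    dev: &mut Device,
    wanted: EndpointConfig,
    program: P,
    rollback: R,
) -> Result<(), &'static str>
where
    P: FnOnce(&EndpointConfig) -> Result<(), &'static str>,
    R: FnOnce(&EndpointConfig) -> Result<(), &'static str>,
{
    if dev.health == DeviceHealth::Degraded {
        return Err("device requires reset");
    }
    // Stage: snapshot the live state before anything is programmed.
    let snapshot = dev.live.clone();
    match program(&wanted) {
        // Commit: live software state changes only after programming succeeded.
        Ok(()) => {
            dev.live = wanted;
            Ok(())
        }
        Err(cause) => {
            // Live state was never touched; only the hardware needs restoring.
            if rollback(&snapshot).is_err() {
                dev.health = DeviceHealth::Degraded;
            }
            Err(cause)
        }
    }
}
```

The staged configuration is owned by the call and dropped on failure, so it cannot leak into live software state.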
Phase 5 — Real PM Sequencing
Goal
Replace software-only PM gating with meaningful quiesce/resume semantics.
Work
- define richer PM transition states
- quiesce before suspend
- tie resume to controller/device validity
- define PM interaction with detach
- define PM interaction with configure
- add bounded PM proof cases
Key Targets
- xhci/scheme.rs::suspend_device
- xhci/scheme.rs::resume_device
- xhci/scheme.rs::ensure_port_active
- supporting helpers in xhci/mod.rs
Exit Criteria
- suspend blocks new I/O only after quiesce starts
- resume only returns success from a genuinely usable state
- PM/detach/configure interleavings are deterministic
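A minimal sketch of the PM transitions these criteria imply follows. The state names and the PortPm type are assumptions for illustration; the richer transition states this phase calls for are still to be defined, and the device_valid flag stands in for whatever controller/device validity check the implementation ends up using.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PmState {
    Active,
    Quiescing, // suspend requested; draining in-flight I/O
    Suspended,
}

#[derive(Debug, PartialEq, Eq)]
pub enum PmError {
    WrongState(PmState),
    Busy,          // in-flight I/O has not drained yet
    DeviceInvalid, // device or controller not usable on resume
}

pub struct PortPm {
    state: PmState,
    in_flight: usize,
}

impl PortPm {
    pub fn new() -> Self {
        Self { state: PmState::Active, in_flight: 0 }
    }

    pub fn state(&self) -> PmState {
        self.state
    }

    /// New I/O is admitted only while Active, so it is blocked from the
    /// moment quiesce starts and not before.
    pub fn admit_io(&mut self) -> Result<(), PmError> {
        if self.state != PmState::Active {
            return Err(PmError::WrongState(self.state));
        }
        self.in_flight += 1;
        Ok(())
    }

    pub fn complete_io(&mut self) {
        self.in_flight = self.in_flight.saturating_sub(1);
    }

    pub fn begin_suspend(&mut self) -> Result<(), PmError> {
        if self.state != PmState::Active {
            return Err(PmError::WrongState(self.state));
        }
        self.state = PmState::Quiescing;
        Ok(())
    }

    pub fn finish_suspend(&mut self) -> Result<(), PmError> {
        if self.state != PmState::Quiescing {
            return Err(PmError::WrongState(self.state));
        }
        if self.in_flight > 0 {
            return Err(PmError::Busy);
        }
        self.state = PmState::Suspended;
        Ok(())
    }

    /// Resume succeeds only when the caller has verified the device is usable.
    pub fn resume(&mut self, device_valid: bool) -> Result<(), PmError> {
        if self.state != PmState::Suspended {
            return Err(PmError::WrongState(self.state));
        }
        if !device_valid {
            return Err(PmError::DeviceInvalid);
        }
        self.state = PmState::Active;
        Ok(())
    }
}
```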
Phase 6 — Enumerator Cleanup and Timing Hardening
Goal
Remove panic-style and magic-delay behavior from the enumerator path.
Work
- remove panic-class assumptions from DeviceEnumerator::run
- replace fixed sleeps with bounded readiness checks
- make duplicate/out-of-order event handling explicit
- align enumerator decisions with the new attach/detach state machine
- improve logging for reset/attach/detach milestones
Key Targets
- xhci/device_enumerator.rs
- supporting interactions in xhci/mod.rs
Exit Criteria
- no ordinary event path panics
- no unnecessary fixed sleep remains
- rapid event-order tests pass in QEMU
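Replacing a fixed sleep with a bounded readiness check usually means polling a condition until a deadline and returning an error instead of panicking. The helper below is a generic sketch of that pattern; wait_until and WaitError are hypothetical names, and the readiness predicate would be whatever port or controller status the enumerator actually needs.

```rust
use std::thread;
use std::time::{Duration, Instant};

#[derive(Debug, PartialEq, Eq)]
pub enum WaitError {
    TimedOut { polls: u32 },
}

/// Poll `ready` every `interval` until it returns true or `limit` elapses.
/// Returns the number of polls taken, which is useful for diagnostics.
pub fn wait_until<F>(mut ready: F, interval: Duration, limit: Duration) -> Result<u32, WaitError>
where
    F: FnMut() -> bool,
{
    let deadline = Instant::now() + limit;
    let mut polls = 0u32;
    loop {
        polls += 1;
        if ready() {
            return Ok(polls);
        }
        if Instant::now() >= deadline {
            // Bounded, diagnosable failure instead of a panic or endless wait.
            return Err(WaitError::TimedOut { polls });
        }
        thread::sleep(interval);
    }
}
```

Unlike a fixed sleep, this returns as soon as the condition holds, and a timeout surfaces as an error the enumerator can log and handle.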
Phase 7 — Final Validation, Docs, and Preservation
Goal
Close the loop with evidence, canonical docs, and durable patch carriers.
Work
- rerun the full bounded proof matrix on a rebuilt image
- run source-level verification (lsp_diagnostics, cargo check, cargo test)
- update canonical docs:
  - local/docs/USB-IMPLEMENTATION-PLAN.md
  - local/docs/USB-VALIDATION-RUNBOOK.md
- refresh durable patch carriers under local/patches/base/
- delete only clearly stale, superseded docs after link sweep
Exit Criteria
- all bounded USB/xHCI proofs pass on a fresh image
- changed files are diagnostics-clean
- canonical docs match actual proof scope
- patch carrier is refreshed and reapplicable
Validation Matrix
Required final proofs:
- bash ./local/scripts/test-xhci-device-lifecycle-qemu.sh --check <tracked-target>
- bash ./local/scripts/test-usb-qemu.sh --check <tracked-target>
- bash ./local/scripts/test-xhci-irq-qemu.sh --check
- bash ./local/scripts/test-usb-maturity-qemu.sh <tracked-target>
Required source checks:
- lsp_diagnostics on all changed files
- cargo check / cargo test for xhcid
- cargo check for any touched class daemon or helper crate
Commit Strategy
- proof/harness expansion
- atomic attach publication
- bounded detach and purge
- configure rollback hardening
- PM sequencing
- enumerator cleanup
- docs, patch preservation, stale-doc cleanup
Canonical Doc Authority
Authoritative docs after cleanup:
- local/docs/USB-IMPLEMENTATION-PLAN.md
- local/docs/USB-VALIDATION-RUNBOOK.md
This xhcid plan is a focused implementation document beneath those subsystem-level authorities.
Completion Standard
This work is complete only when:
- all seven phases are done in order
- no changed-file diagnostics remain
- xhcid builds/tests cleanly
- canonical docs are synchronized
- durable patch carrier is refreshed
- remaining gaps, if any, are explicitly documented as future or hardware-only work