6689f751d9
Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
304 lines
11 KiB
Markdown
304 lines
11 KiB
Markdown
# Hardware Quirks Improvement Plan
|
|
|
|
## Purpose
|
|
|
|
This plan replaces vague “quirks support” follow-up work with a concrete path to:
|
|
|
|
1. keep quirks data and reporting honest,
|
|
2. integrate quirks into real runtime driver behavior,
|
|
3. reduce duplicated quirk logic,
|
|
4. leave DMI and USB device quirks in a maintainable state.
|
|
|
|
## Current status snapshot
|
|
|
|
Completed from this plan:
|
|
|
|
- runtime DMI TOML loading in `redox-driver-sys`,
|
|
- subsystem-gated PCI TOML matching in both the canonical path and `pcid-spawner`,
|
|
- shipped DMI TOML overrides in the brokered `pcid-spawner` env-var path,
|
|
- direct canonical `redox-driver-sys` quirk lookup from `pcid-spawner` instead of a separate in-tree PCI quirk engine,
|
|
- real USB device quirk consumption in `xhcid`,
|
|
- first real linux-kpi quirk consumption in the Red Bear amdgpu path.
|
|
|
|
Still open after this implementation wave:
|
|
|
|
- no remaining implementation items in the current quirks scope.
|
|
|
|
The runtime-behavior milestone from this plan is now implemented. The remaining work is
|
|
maintenance, validation depth, and future refinement rather than missing quirks behavior for the
|
|
shipped paths.
|
|
|
|
It is based on the current in-tree state of:
|
|
|
|
- `redox-driver-sys` as the canonical quirks library,
|
|
- `pcid-spawner` as an upstream-owned PCI launch broker that now brokers canonical quirks,
|
|
- `redox-drm`, `xhcid`, and the amdgpu Redox glue/runtime path as real runtime PCI quirk consumers,
|
|
- `lspci`, `lsusb`, and `redbear-info` as reporting surfaces.
|
|
|
|
## Reassessment Summary
|
|
|
|
### What is real today
|
|
|
|
- `redox-driver-sys` owns the canonical PCI/USB quirk flag definitions and lookup helpers.
|
|
- `redox-drm` consumes PCI quirks for interrupt fallback and `DISABLE_ACCEL`.
|
|
- `xhcid` consumes PCI controller quirks via `PCI_QUIRK_FLAGS` for IRQ mode selection and reset delay.
|
|
- `linux-kpi` exposes `pci_get_quirk_flags()` / `pci_has_quirk()` for C drivers, and amdgpu now consumes them in its Redox init path.
|
|
- `lspci` and `lsusb` surface active PCI/USB quirk flags for discovered devices.
|
|
- `redbear-info --quirks` reports configured TOML entries and DMI rule counts.
|
|
|
|
### What is still weak
|
|
|
|
- USB quirks now have a first real runtime consumer in `xhcid`, but broader USB-driver adoption is still missing.
|
|
- The `linux-kpi` bridge now has a first real in-tree C consumer: amdgpu uses it for firmware gating and quirk-aware IRQ expectation logging. Broader C-driver adoption is still missing.
|
|
- `pcid-spawner` still synthesizes a partial `PciDeviceInfo` instead of reusing a richer canonical PCI object, because it operates as an upstream-owned broker with a narrow interface.
|
|
|
|
### What should not be “fixed” in the wrong layer
|
|
|
|
- `firmware-loader` should stay a generic scheme service. `NEED_FIRMWARE` belongs in device driver policy, not in the firmware scheme daemon.
|
|
- `redbear-info` should describe configured and observable state; it should not pretend to prove runtime quirk application.
|
|
|
|
## Target Architecture
|
|
|
|
### Upstream-preference policy
|
|
|
|
When upstream Redox already provides the same functionality, the upstream path wins by default
|
|
unless the Red Bear-local implementation is materially better. For quirks and driver support,
|
|
this means the canonical path should converge on `redox-driver-sys` instead of preserving
|
|
lower-quality duplicate quirk engines as a steady state.
|
|
|
|
### Canonical rule
|
|
|
|
`redox-driver-sys` remains the authoritative quirks model:
|
|
|
|
- flag definitions,
|
|
- compiled-in tables,
|
|
- TOML parsing semantics,
|
|
- DMI matching behavior.
|
|
|
|
All other code should either:
|
|
|
|
1. call the canonical lookup directly, or
|
|
2. receive lookup results from a single broker that is guaranteed to use the same semantics.
|
|
|
|
### Driver integration rule
|
|
|
|
- **Rust PCI drivers using `redox-driver-sys`** should call `info.quirks()` directly.
|
|
- **C drivers using `linux-kpi`** should call `pci_has_quirk()` / `pci_get_quirk_flags()` directly in probe/init paths.
|
|
- **Upstream base drivers that cannot depend on `redox-driver-sys`** may continue using brokered quirk bits from `pcid-spawner`, but only if that broker is made semantically identical to the canonical library.
|
|
- **USB device quirks** should be consumed inside `xhcid` device enumeration/configuration logic, not only in tooling.
|
|
|
|
## Concrete Work Plan
|
|
|
|
### Wave 1 — Cleanup and truthfulness
|
|
|
|
#### Task 1.1: Keep docs and reporting surfaces honest
|
|
|
|
Scope:
|
|
|
|
- `local/docs/QUIRKS-SYSTEM.md`
|
|
- `local/recipes/system/redbear-info/source/src/main.rs`
|
|
- related AGENTS references if needed
|
|
|
|
Goals:
|
|
|
|
- separate reporting surfaces from real runtime consumers,
|
|
- remove claims that imply driver integration where only tooling exists,
|
|
- keep “not yet implemented” items explicit.
|
|
|
|
QA:
|
|
|
|
- `cargo test` in `local/recipes/system/redbear-info/source`
|
|
- review `redbear-info --help` text and `--quirks` output strings
|
|
|
|
#### Task 1.2: Remove stale equivalence claims from extraction/documentation
|
|
|
|
Scope:
|
|
|
|
- `local/scripts/extract-linux-quirks.py`
|
|
- `local/docs/QUIRKS-SYSTEM.md`
|
|
|
|
Goals:
|
|
|
|
- avoid mapping Linux flags to incorrect Red Bear flags,
|
|
- clearly mark heuristic extraction limits for PCI handler-name mode.
|
|
|
|
QA:
|
|
|
|
- run the script on a small synthetic USB/PXI input sample,
|
|
- confirm output omits unsupported PCI flag mappings instead of inventing equivalents.
|
|
|
|
### Wave 2 — Unify PCI quirk semantics
|
|
|
|
#### Task 2.1: Eliminate semantic drift between `pcid-spawner` and `redox-driver-sys`
|
|
|
|
Constraint:
|
|
|
|
- `pcid-spawner` is upstream-owned base code, so any convergence work must be implemented as upstream-base changes carried by Red Bear patching until upstream absorbs them.
|
|
|
|
Best approach:
|
|
|
|
- make `pcid-spawner` consume generated/shared quirk data instead of hand-maintained duplicated tables and flag maps.
|
|
|
|
Preferred implementation options, in order:
|
|
|
|
1. **Shared generated data module** used by both `redox-driver-sys` and `pcid-spawner`.
|
|
2. **Protocol extension** where a single canonical broker calculates quirk bits and hands them to drivers.
|
|
3. Keep duplication only as a short-term fallback if generation is not yet practical.
|
|
|
|
Do **not** continue manually editing two separate PCI quirk engines long-term.
|
|
|
|
Success criteria:
|
|
|
|
- one authoritative source for compiled PCI quirk entries and flag name mapping,
|
|
- subsystem matching behavior aligned,
|
|
- explicit decision on whether DMI is brokered by `pcid-spawner` or left to driver-local lookup.
|
|
|
|
QA:
|
|
|
|
- compare quirk outputs for the same synthetic PCI info through both paths,
|
|
- verify `PCI_QUIRK_FLAGS` emitted by `pcid-spawner` matches canonical lookup for representative devices.
|
|
|
|
#### Task 2.2: Decide DMI ownership clearly
|
|
|
|
Decision needed:
|
|
|
|
- either `pcid-spawner` becomes DMI-aware and brokers the final PCI quirk bitmask,
|
|
- or `pcid-spawner` remains PCI/TOML-only and DMI stays driver-local in `redox-driver-sys` consumers.
|
|
|
|
Recommendation:
|
|
|
|
- near term: document the split clearly,
|
|
- medium term: move toward one brokered result for upstream base drivers.
|
|
|
|
QA:
|
|
|
|
- one design note added to the docs explaining the chosen ownership model.
|
|
|
|
### Wave 3 — Real driver integration
|
|
|
|
#### Task 3.1: Integrate USB device quirks in `xhcid`
|
|
|
|
Best integration points:
|
|
|
|
- after device descriptor read,
|
|
- before SetConfiguration,
|
|
- before enabling LPM/U1/U2 or USB3-specific behavior,
|
|
- after reset paths where extra delay or reset-after-probe is needed.
|
|
|
|
Minimum runtime behaviors to wire first:
|
|
|
|
- `NO_SET_CONFIG`
|
|
- `NEED_RESET`
|
|
- `NO_LPM`
|
|
- `NO_U1U2`
|
|
- `BAD_DESCRIPTOR`
|
|
|
|
Success criteria:
|
|
|
|
- `xhcid` calls `lookup_usb_quirks()` for enumerated devices,
|
|
- these flags alter runtime behavior in concrete branches,
|
|
- tooling and runtime logic agree on the same device-level quirks.
|
|
|
|
QA:
|
|
|
|
- unit/integration tests for selector logic where possible,
|
|
- manual logging proof that a known vendor/product entry triggers the expected path.
|
|
|
|
#### Task 3.2: Consume linux-kpi quirks in `amdgpu`
|
|
|
|
Best integration points:
|
|
|
|
- probe path,
|
|
- IRQ mode selection,
|
|
- firmware gating,
|
|
- memory/power-management setup.
|
|
|
|
First flags to consume:
|
|
|
|
- `NO_MSI`
|
|
- `NO_MSIX`
|
|
- `NEED_FIRMWARE`
|
|
- `NO_ASPM`
|
|
- `NEED_IOMMU`
|
|
|
|
Success criteria:
|
|
|
|
- at least one real C driver uses `pci_has_quirk()` in production code,
|
|
- runtime logs show quirk-informed decision making.
|
|
|
|
Current state:
|
|
|
|
- `local/recipes/gpu/amdgpu/source/amdgpu_redox_main.c` now queries linux-kpi PCI quirks in the real Redox runtime path,
|
|
- `PCI_QUIRK_NEED_FIRMWARE` turns missing DMCUB firmware into an init failure instead of a warning-only fallback,
|
|
- logs now show the active quirk bitmask plus the implied IRQ fallback policy.
|
|
|
|
QA:
|
|
|
|
- `grep` shows real in-tree call sites in amdgpu,
|
|
- build passes for linux-kpi + amdgpu recipe path.
|
|
|
|
#### Task 3.3: Keep firmware policy in drivers, not firmware-loader
|
|
|
|
Action:
|
|
|
|
- when a driver has `NEED_FIRMWARE`, the driver should gate initialization until the firmware load succeeds.
|
|
- `firmware-loader` remains a transport/provider only.
|
|
|
|
Success criteria:
|
|
|
|
- docs stop implying that firmware-loader interprets quirk flags,
|
|
- driver init paths own the policy decision.
|
|
|
|
QA:
|
|
|
|
- driver code path shows firmware gating tied to quirks or explicit device rules.
|
|
|
|
### Wave 4 — DMI completion
|
|
|
|
#### Task 4.1: DMI TOML runtime loading
|
|
|
|
Scope:
|
|
|
|
- `toml_loader.rs` parses `[[dmi_system_quirk]]`,
|
|
- matching uses live DMI info served by `acpid` at `/scheme/acpi/dmi`,
|
|
- resulting PCI quirk overrides flow through the canonical `redox-driver-sys` DMI path.
|
|
|
|
Success criteria:
|
|
|
|
- `50-system.toml` entries are no longer config-only,
|
|
- runtime DMI TOML behavior is testable and documented through the live `acpid` DMI scheme.
|
|
|
|
QA:
|
|
|
|
- tests for TOML parsing,
|
|
- one mock DMI input path proving a TOML DMI rule applies flags.
|
|
|
|
#### Task 4.2: ACPI blacklist/override layer
|
|
|
|
Current state:
|
|
|
|
- `acpid` now supports narrow `[[acpi_table_quirk]]` skip rules, optionally gated by the same
|
|
DMI-style `match.*` fields used elsewhere.
|
|
- The implementation is intentionally limited to table suppression during ACPI table load; it is
|
|
not a broad AML patching or firmware replacement framework.
|
|
|
|
## Suggested Immediate Deliverables
|
|
|
|
If work resumes right away, the next concrete implementation sequence should be:
|
|
|
|
1. clean remaining stale quirks docs/reporting text,
|
|
2. write a design note for canonical PCI quirk ownership,
|
|
3. integrate `lookup_usb_quirks()` into `xhcid` enumeration/configuration,
|
|
4. add first real `pci_has_quirk()` use in `amdgpu`,
|
|
5. validate and extend shipped DMI TOML coverage as needed.
|
|
|
|
## Exit Criteria For The Next Quirks Milestone
|
|
|
|
The next milestone is complete when all are true:
|
|
|
|
- `pcid-spawner` and `redox-driver-sys` no longer drift semantically,
|
|
- `xhcid` consumes USB device quirks at runtime,
|
|
- at least one real C driver consumes linux-kpi quirks,
|
|
- docs distinguish clearly between reporting, infrastructure, and true runtime behavior,
|
|
- DMI TOML entries are either runtime-applied or removed from shipped config.
|