docs: final stale doc cleanup — 22 archived, 18 active
Archived: IOMMU-SPEC, KERNEL-IPC, KERNEL-SCHEDULER, PROFILE-MATRIX, QUIRKS-IMPROVEMENT, RELIBC-IPC, repo-governance, SCHEDULER-REVIEW, SCRIPT-BEHAVIOR, USB-VALIDATION, XHCID-DEVICE-IMPROVEMENT. Active: all implementation plans + 3 audits + governance docs.
@@ -0,0 +1,348 @@
# Hardware Quirks Improvement Plan

## Purpose

This plan replaces vague “quirks support” follow-up work with a concrete path to:

1. keep quirks data and reporting honest,
2. integrate quirks into real runtime driver behavior,
3. reduce duplicated quirk logic,
4. leave DMI and USB device quirks in a maintainable state.

## Current status snapshot

Completed from this plan:

- runtime DMI TOML loading in `redox-driver-sys`,
- subsystem-gated PCI TOML matching in both the canonical path and `pcid-spawner`,
- shipped DMI TOML overrides in the brokered `pcid-spawner` env-var path,
- direct canonical `redox-driver-sys` quirk lookup from `pcid-spawner` instead of a separate in-tree PCI quirk engine,
- real USB device quirk consumption in `xhcid`,
- first real linux-kpi quirk consumption in the Red Bear amdgpu path,
- canonical GPU quirk policy moved to the Rust driver boundary in `redox-drm`, so Intel and AMD now consume one shared quirk source for init-time policy,
- PCI quirk extraction upgraded from handler-name guessing to explicit handler-body evidence in `local/scripts/extract-linux-quirks.py`.
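
The subsystem-gated matching listed above can be pictured roughly as follows. This is a minimal Rust sketch, not the `redox-driver-sys` implementation: `PciQuirkRule`, `PciId`, `lookup_flags`, the field names, and the flag bit values are all hypothetical. It assumes that an omitted subsystem field acts as a wildcard and that flags from every matching rule are OR-ed together.

```rust
/// Hypothetical quirk rule: vendor/device are required, subsystem IDs are optional gates.
#[derive(Debug, Clone, Copy)]
pub struct PciQuirkRule {
    pub vendor: u16,
    pub device: u16,
    pub subsystem_vendor: Option<u16>,
    pub subsystem_device: Option<u16>,
    pub flags: u64,
}

/// Minimal PCI identity used for matching (illustrative, not the real `PciDeviceInfo`).
#[derive(Debug, Clone, Copy)]
pub struct PciId {
    pub vendor: u16,
    pub device: u16,
    pub subsystem_vendor: u16,
    pub subsystem_device: u16,
}

impl PciQuirkRule {
    /// A rule matches when vendor/device agree and every subsystem gate that
    /// is present also agrees. Absent gates are treated as wildcards (assumption).
    pub fn matches(&self, id: &PciId) -> bool {
        self.vendor == id.vendor
            && self.device == id.device
            && self.subsystem_vendor.map_or(true, |v| v == id.subsystem_vendor)
            && self.subsystem_device.map_or(true, |d| d == id.subsystem_device)
    }
}

/// OR together the flags of all matching rules (assumed merge semantics).
pub fn lookup_flags(rules: &[PciQuirkRule], id: &PciId) -> u64 {
    rules
        .iter()
        .filter(|rule| rule.matches(id))
        .fold(0, |acc, rule| acc | rule.flags)
}
```

Under these assumptions a board-specific rule only adds flags on boards whose subsystem vendor matches, while the ungated rule applies to every instance of the device.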

Still open after this implementation wave:

- document the provenance of existing AMD `need_firmware` entries in `quirks.d/10-gpu.toml`,
- keep AMD device-specific GPU quirk growth review-gated on Linux-backed evidence,
- keep Intel GPU quirk expansion deferred until Red Bear has a real Intel-side firmware/runtime
  policy surface that can honestly consume additional flags.

Current naming/source split:

- PCI vendor/device **names** now come from the shipped `pciids` database (`/usr/share/misc/pci.ids`).
- PCI/USB/storage **quirk flags** still come from Red Bear’s canonical quirk path: compiled tables,
  shipped TOML files, and conservative Linux-source extraction where applicable.
- The `extract-linux-quirks.py` script remains a quirk-mining tool, not the source of human-readable
  PCI device names.

The runtime-behavior milestone from this plan is now implemented. The remaining work is
maintenance, validation depth, and future refinement rather than missing quirks behavior for the
shipped paths.

This plan is based on the current in-tree state of:

- `redox-driver-sys` as the canonical quirks library,
- `pcid-spawner` as an upstream-owned PCI launch broker that now brokers canonical quirks,
- `redox-drm`, `xhcid`, and the amdgpu Redox glue/runtime path as real runtime PCI quirk consumers,
- `lspci`, `lsusb`, and `redbear-info` as reporting surfaces.

## Reassessment Summary

### What is real today

- `redox-driver-sys` owns the canonical PCI/USB quirk flag definitions and lookup helpers.
- `redox-drm` consumes PCI quirks for interrupt fallback and `DISABLE_ACCEL`.
- `xhcid` consumes PCI controller quirks via `PCI_QUIRK_FLAGS` for IRQ mode selection and reset delay.
- `linux-kpi` exposes `pci_get_quirk_flags()` / `pci_has_quirk()` for C drivers, and amdgpu now consumes them in its Redox init path.
- `lspci` and `lsusb` surface active PCI/USB quirk flags for discovered devices.
- `redbear-info --quirks` reports configured TOML entries and DMI rule counts.
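
To make the brokered `PCI_QUIRK_FLAGS` path concrete, here is a hedged Rust sketch of how a driver might turn that environment variable into an IRQ mode choice. The hexadecimal encoding, the bit positions of `NO_MSI` / `NO_MSIX`, and the names `parse_quirk_flags`, `select_irq_mode`, and `IrqMode` are assumptions made for illustration; the actual `xhcid` code may differ.

```rust
use std::env;

// Hypothetical bit positions; the real values live in `redox-driver-sys`.
pub const NO_MSI: u64 = 1 << 0;
pub const NO_MSIX: u64 = 1 << 1;

#[derive(Debug, PartialEq, Eq)]
pub enum IrqMode {
    MsiX,
    Msi,
    Legacy,
}

/// Parse a brokered flag string such as "0x3" or "3".
/// Hexadecimal encoding is an assumption of this sketch.
pub fn parse_quirk_flags(raw: &str) -> Option<u64> {
    let s = raw.trim();
    let digits = s
        .strip_prefix("0x")
        .or_else(|| s.strip_prefix("0X"))
        .unwrap_or(s);
    u64::from_str_radix(digits, 16).ok()
}

/// Read the brokered flags, treating a missing or malformed value as "no quirks".
pub fn flags_from_env() -> u64 {
    env::var("PCI_QUIRK_FLAGS")
        .ok()
        .and_then(|value| parse_quirk_flags(&value))
        .unwrap_or(0)
}

/// Prefer MSI-X, then MSI, then legacy interrupts, skipping any mode
/// that a quirk flag forbids or the device does not advertise.
pub fn select_irq_mode(flags: u64, has_msix: bool, has_msi: bool) -> IrqMode {
    if has_msix && (flags & NO_MSIX) == 0 {
        IrqMode::MsiX
    } else if has_msi && (flags & NO_MSI) == 0 {
        IrqMode::Msi
    } else {
        IrqMode::Legacy
    }
}
```

Falling back to zero flags on a parse failure keeps a broken broker value from disabling interrupts outright; whether the real driver prefers that or a hard error is a policy choice this sketch does not settle.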

### What is still weak

- USB quirks now have a first real runtime consumer in `xhcid`, but broader USB-driver adoption is still missing.
- The `linux-kpi` bridge now has a first real in-tree C consumer: amdgpu uses it for quirk-aware IRQ expectation logging. Broader C-driver adoption is still missing.
- `pcid-spawner` still synthesizes a partial `PciDeviceInfo` instead of reusing a richer canonical PCI object, because it operates as an upstream-owned broker with a narrow interface.

### What should not be “fixed” in the wrong layer

- `firmware-loader` should stay a generic scheme service. `NEED_FIRMWARE` belongs in device driver policy, not in the firmware scheme daemon.
- `redbear-info` should describe configured and observable state; it should not pretend to prove runtime quirk application.

## Target Architecture

### Upstream-preference policy

When upstream Redox already provides the same functionality, the upstream path wins by default
unless the Red Bear-local implementation is materially better. For quirks and driver support,
this means the canonical path should converge on `redox-driver-sys` instead of preserving
lower-quality duplicate quirk engines as a steady state.

### Canonical rule

`redox-driver-sys` remains the authoritative quirks model:

- flag definitions,
- compiled-in tables,
- TOML parsing semantics,
- DMI matching behavior.

All other code should either:

1. call the canonical lookup directly, or
2. receive lookup results from a single broker that is guaranteed to use the same semantics.

### Driver integration rule

- **Rust PCI drivers using `redox-driver-sys`** should call `info.quirks()` directly.
- **C drivers using `linux-kpi`** should call `pci_has_quirk()` / `pci_get_quirk_flags()` directly in probe/init paths.
- **Upstream base drivers that cannot depend on `redox-driver-sys`** may continue using brokered quirk bits from `pcid-spawner`, but only if that broker is made semantically identical to the canonical library.
- **USB device quirks** should be consumed inside `xhcid` device enumeration/configuration logic, not only in tooling.
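
For the first rule, a Rust driver's init path might look roughly like the sketch below. Only the `info.quirks()` call and the `DISABLE_ACCEL` flag name come from this plan; the stand-in `PciDeviceInfo` type, the plain `u64` return type, the bit positions, and `init_policy` are hypothetical and exist only to keep the example self-contained.

```rust
// Hypothetical bit positions for illustration only.
pub const DISABLE_ACCEL: u64 = 1 << 0;
pub const NO_MSIX: u64 = 1 << 1;

/// Stand-in for the canonical device-info type (not the real definition).
pub struct PciDeviceInfo {
    pub vendor: u16,
    pub device: u16,
    quirk_bits: u64,
}

impl PciDeviceInfo {
    pub fn new(vendor: u16, device: u16, quirk_bits: u64) -> Self {
        Self { vendor, device, quirk_bits }
    }

    /// In the real library this would perform the canonical lookup.
    pub fn quirks(&self) -> u64 {
        self.quirk_bits
    }
}

#[derive(Debug, PartialEq, Eq)]
pub enum InitError {
    AccelDisabled,
}

#[derive(Debug, PartialEq, Eq)]
pub struct InitPolicy {
    pub allow_msix: bool,
}

/// Read quirks once at init and turn them into explicit policy decisions,
/// so no later code path has to re-derive them.
pub fn init_policy(info: &PciDeviceInfo) -> Result<InitPolicy, InitError> {
    let quirks = info.quirks();
    if (quirks & DISABLE_ACCEL) != 0 {
        return Err(InitError::AccelDisabled);
    }
    Ok(InitPolicy {
        allow_msix: (quirks & NO_MSIX) == 0,
    })
}
```

Resolving quirks into a small policy struct at one place makes it easier to log and test the decision than scattering bit tests through the driver.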

## Concrete Work Plan

### Wave 1 — Cleanup and truthfulness

#### Task 1.1: Keep docs and reporting surfaces honest

Scope:

- `local/docs/QUIRKS-SYSTEM.md`
- `local/recipes/system/redbear-info/source/src/main.rs`
- related AGENTS references if needed

Goals:

- separate reporting surfaces from real runtime consumers,
- remove claims that imply driver integration where only tooling exists,
- keep “not yet implemented” items explicit.

QA:

- `cargo test` in `local/recipes/system/redbear-info/source`
- review `redbear-info --help` text and `--quirks` output strings

#### Task 1.2: Remove stale equivalence claims from extraction/documentation

Scope:

- `local/scripts/extract-linux-quirks.py`
- `local/docs/QUIRKS-SYSTEM.md`

Goals:

- avoid mapping Linux flags to incorrect Red Bear flags,
- clearly mark the supported explicit PCI extraction patterns and the limits of unsupported handlers.

QA:

- run the script on a small synthetic USB/PCI input sample,
- confirm output omits unsupported PCI flag mappings instead of inventing equivalents.
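
The "omit instead of invent" rule can be illustrated independently of the extractor's language. The Rust sketch below is not a port of `extract-linux-quirks.py`: the function names and the Red Bear flag names on the right-hand side of the mapping are hypothetical, and the set of supported Linux flags is an assumption. It only shows the shape of a mapping that returns nothing for unsupported input.

```rust
/// Map a Linux `PCI_DEV_FLAGS_*` name to a Red Bear flag name.
/// Only explicitly supported flags map; everything else yields `None`.
/// The right-hand names are hypothetical placeholders.
pub fn map_linux_flag(linux_flag: &str) -> Option<&'static str> {
    match linux_flag {
        "PCI_DEV_FLAGS_NO_D3" => Some("NO_D3"),
        "PCI_DEV_FLAGS_NO_BUS_RESET" => Some("NO_BUS_RESET"),
        _ => None,
    }
}

/// Split the flags found in a handler body into mapped Red Bear flags and
/// skipped Linux flags. Skipped flags are reported, never approximated.
pub fn map_handler_flags<'a>(found: &[&'a str]) -> (Vec<&'static str>, Vec<&'a str>) {
    let mut mapped = Vec::new();
    let mut skipped = Vec::new();
    for &name in found {
        match map_linux_flag(name) {
            Some(flag) => mapped.push(flag),
            None => skipped.push(name),
        }
    }
    (mapped, skipped)
}
```

Returning the skipped names alongside the mapped ones lets a QA run assert both halves: that supported flags survive and that unsupported ones never appear in the output.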

Current state:

- `local/scripts/extract-linux-quirks.py` no longer guesses PCI quirks from handler names.
- PCI extraction now maps only explicit handler-body evidence for supported `PCI_DEV_FLAGS_*`
  assignments plus `pci_d3cold_disable(...)`.
- Running the upgraded extractor on Linux 7.0 `drivers/pci/quirks.c` currently yields only a
  very small high-confidence PCI subset and no directly usable modern Intel/AMD DRM GPU entries.
- This is intentional: false negatives are preferred over wrong GPU quirk claims.
- The existing AMD `need_firmware` entries in `quirks.d/10-gpu.toml` are manually reviewed policy
  entries, not extractor-produced Linux facts. Future extraction runs will not refresh those flags
  automatically.
- Intel firmware classes should be treated explicitly: DMC for display power management, GuC for
  scheduling/power, HuC for media offload, and GSC for newer authentication flows.
- Red Bear now has a bounded Intel DMC startup manifest/preload path for the first supported Intel
  device families, but Intel `need_firmware` must still stay out of the canonical GPU quirk set
  until the broader Intel runtime policy surface is real and validated.
- AMD device-specific GPU quirk growth remains review-gated on explicit Linux-backed evidence.
- Intel GPU quirk expansion is deferred until Red Bear has a real Intel-side firmware/runtime
  policy surface that can honestly consume additional flags.

### Wave 2 — Unify PCI quirk semantics

#### Task 2.1: Eliminate semantic drift between `pcid-spawner` and `redox-driver-sys`

Constraint:

- `pcid-spawner` is upstream-owned base code, so any convergence work must be implemented as upstream-base changes carried by Red Bear patching until upstream absorbs them.

Best approach:

- make `pcid-spawner` consume generated/shared quirk data instead of hand-maintained duplicated tables and flag maps.

Preferred implementation options, in order:

1. **Shared generated data module** used by both `redox-driver-sys` and `pcid-spawner`.
2. **Protocol extension** where a single canonical broker calculates quirk bits and hands them to drivers.
3. Keep duplication only as a short-term fallback if generation is not yet practical.

Do **not** continue manually editing two separate PCI quirk engines long-term.

Success criteria:

- one authoritative source for compiled PCI quirk entries and flag name mapping,
- subsystem matching behavior aligned,
- explicit decision on whether DMI is brokered by `pcid-spawner` or left to driver-local lookup.

QA:

- compare quirk outputs for the same synthetic PCI info through both paths,
- verify `PCI_QUIRK_FLAGS` emitted by `pcid-spawner` matches canonical lookup for representative devices.
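
One way to phrase the parity check from the QA items is as a small harness that runs the same synthetic devices through both paths and reports any disagreement. The Rust sketch below assumes the broker emits flags as a hexadecimal string; `canonical_lookup`, `decode_env`, `parity_mismatches`, the device IDs, and the flag values are all hypothetical.

```rust
/// (vendor, device) pair; a real check would also carry subsystem IDs.
pub type DeviceId = (u16, u16);

/// Stand-in for the canonical lookup with a purely synthetic table.
pub fn canonical_lookup(id: DeviceId) -> u64 {
    match id {
        (0xAAAA, 0x0001) => 0b0001,
        (0xBBBB, 0x0002) => 0b0110,
        _ => 0,
    }
}

/// Decode a brokered value such as "0x6" (hex encoding is assumed).
pub fn decode_env(raw: &str) -> Option<u64> {
    let s = raw.trim();
    let digits = s.strip_prefix("0x").unwrap_or(s);
    u64::from_str_radix(digits, 16).ok()
}

/// Return every device for which the brokered value, once decoded,
/// differs from the canonical result (or cannot be decoded at all).
pub fn parity_mismatches<C, B>(devices: &[DeviceId], canonical: C, brokered_env: B) -> Vec<DeviceId>
where
    C: Fn(DeviceId) -> u64,
    B: Fn(DeviceId) -> String,
{
    devices
        .iter()
        .copied()
        .filter(|&id| decode_env(&brokered_env(id)) != Some(canonical(id)))
        .collect()
}
```

Passing the two paths in as closures keeps the harness usable whether the broker side is a direct function call in a unit test or a captured environment value from a spawned process.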

#### Task 2.2: Decide DMI ownership clearly

Decision needed:

- either `pcid-spawner` becomes DMI-aware and brokers the final PCI quirk bitmask,
- or `pcid-spawner` remains PCI/TOML-only and DMI stays driver-local in `redox-driver-sys` consumers.

Recommendation:

- near term: document the split clearly,
- medium term: move toward one brokered result for upstream base drivers.

QA:

- one design note added to the docs explaining the chosen ownership model.

### Wave 3 — Real driver integration

#### Task 3.1: Integrate USB device quirks in `xhcid`

Best integration points:

- after device descriptor read,
- before SetConfiguration,
- before enabling LPM/U1/U2 or USB3-specific behavior,
- after reset paths where extra delay or reset-after-probe is needed.

Minimum runtime behaviors to wire first:

- `NO_SET_CONFIG`
- `NEED_RESET`
- `NO_LPM`
- `NO_U1U2`
- `BAD_DESCRIPTOR`

Success criteria:

- `xhcid` calls `lookup_usb_quirks()` for enumerated devices,
- these flags alter runtime behavior in concrete branches,
- tooling and runtime logic agree on the same device-level quirks.

QA:

- unit/integration tests for selector logic where possible,
- manual logging proof that a known vendor/product entry triggers the expected path.
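
The "selector logic" that the QA items want tested could be isolated as a pure function from quirk bits to an enumeration plan. The following Rust sketch uses the five flag names from this task, but the bit positions, the `EnumerationPlan` type, `plan_enumeration`, and the interpretation of `BAD_DESCRIPTOR` as "parse descriptors leniently" are assumptions rather than `xhcid` behavior.

```rust
// Hypothetical bit positions for the flags named in this task.
pub const NO_SET_CONFIG: u32 = 1 << 0;
pub const NEED_RESET: u32 = 1 << 1;
pub const NO_LPM: u32 = 1 << 2;
pub const NO_U1U2: u32 = 1 << 3;
pub const BAD_DESCRIPTOR: u32 = 1 << 4;

/// Decisions the enumeration code would act on, derived once per device.
#[derive(Debug, PartialEq, Eq)]
pub struct EnumerationPlan {
    pub send_set_configuration: bool,
    pub reset_after_probe: bool,
    pub enable_lpm: bool,
    pub enable_u1_u2: bool,
    pub lenient_descriptor_parsing: bool,
}

/// Translate device quirk bits into concrete enumeration decisions.
/// U1/U2 link states only apply to USB3 devices, so they stay off otherwise.
pub fn plan_enumeration(quirks: u32, is_usb3: bool) -> EnumerationPlan {
    EnumerationPlan {
        send_set_configuration: (quirks & NO_SET_CONFIG) == 0,
        reset_after_probe: (quirks & NEED_RESET) != 0,
        enable_lpm: (quirks & NO_LPM) == 0,
        enable_u1_u2: is_usb3 && (quirks & NO_U1U2) == 0,
        lenient_descriptor_parsing: (quirks & BAD_DESCRIPTOR) != 0,
    }
}
```

Because the function has no side effects, each flag's branch can be unit tested without hardware, and the resulting plan is also a natural thing to log as the manual proof described above.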

#### Task 3.2: Consume linux-kpi quirks in `amdgpu`

Best integration points:

- probe path,
- IRQ mode selection,
- firmware gating,
- memory/power-management setup.

First flags to consume:

- `NO_MSI`
- `NO_MSIX`
- `NEED_FIRMWARE`
- `NO_ASPM`
- `NEED_IOMMU`

Success criteria:

- at least one real C driver uses `pci_has_quirk()` in production code,
- runtime logs show quirk-informed decision making.

Current state:

- `local/recipes/gpu/amdgpu/source/amdgpu_redox_main.c` now queries linux-kpi PCI quirks in the real Redox runtime path,
- logs now show the active quirk bitmask plus the implied IRQ fallback policy,
- firmware policy has been pulled back to the Rust-side driver boundary so the C backend does not
  re-enforce `NEED_FIRMWARE` independently.

QA:

- `grep` shows real in-tree call sites in amdgpu,
- build passes for linux-kpi + amdgpu recipe path.

#### Task 3.3: Keep firmware policy in drivers, not firmware-loader

Action:

- when a driver has `NEED_FIRMWARE`, the driver should gate initialization until the firmware load succeeds.
- `firmware-loader` remains a transport/provider only.

Success criteria:

- docs stop implying that firmware-loader interprets quirk flags,
- driver init paths own the policy decision.

QA:

- driver code path shows firmware gating tied to quirks or explicit device rules.

Current state:

- `local/recipes/gpu/redox-drm/source/src/drivers/intel/mod.rs` now reads the canonical
  `info.quirks()` policy during init, rejects `DISABLE_ACCEL`, and explicitly warns if
  `NEED_FIRMWARE` appears on Intel instead of silently ignoring quirk policy.
- `local/recipes/gpu/redox-drm/source/src/main.rs` now makes firmware preload expectations explicit
  at the Rust-side driver boundary, reports whether preload is quirk-required, and summarizes
  missing candidate blobs when preload cannot satisfy the current policy.
- `local/recipes/gpu/amdgpu/source/amdgpu_redox_main.c` still consumes linux-kpi quirks for
  runtime expectations, but it no longer owns the final firmware gating decision.
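
The division of labor in this task, where the driver owns the policy and the loader is only a provider, can be sketched as below. This is an illustrative Rust outline, not the `redox-drm` code: the `FirmwareProvider` trait, `gate_on_firmware`, `MapProvider`, the blob names, and the bit position of `NEED_FIRMWARE` are hypothetical, and the sketch assumes firmware is simply skipped when the quirk is absent.

```rust
use std::collections::HashMap;

// Hypothetical bit position.
pub const NEED_FIRMWARE: u64 = 1 << 0;

/// Transport-only view of a firmware source: it returns bytes or nothing
/// and knows nothing about quirk flags.
pub trait FirmwareProvider {
    fn load(&self, name: &str) -> Option<Vec<u8>>;
}

#[derive(Debug, PartialEq, Eq)]
pub enum InitError {
    /// Lists every candidate blob that was tried and not found.
    MissingFirmware(Vec<String>),
}

/// Driver-side policy: only when the quirk requires firmware does a failed
/// load block initialization. The first candidate that loads wins.
pub fn gate_on_firmware(
    quirks: u64,
    provider: &dyn FirmwareProvider,
    candidates: &[&str],
) -> Result<Option<Vec<u8>>, InitError> {
    if (quirks & NEED_FIRMWARE) == 0 {
        return Ok(None);
    }
    for &name in candidates {
        if let Some(blob) = provider.load(name) {
            return Ok(Some(blob));
        }
    }
    Err(InitError::MissingFirmware(
        candidates.iter().map(|name| name.to_string()).collect(),
    ))
}

/// In-memory provider used only for illustration and tests.
pub struct MapProvider(pub HashMap<String, Vec<u8>>);

impl FirmwareProvider for MapProvider {
    fn load(&self, name: &str) -> Option<Vec<u8>> {
        self.0.get(name).cloned()
    }
}
```

Keeping the provider behind a trait that never sees the flags makes the layering rule checkable in code review: any quirk test that appears inside a provider implementation is in the wrong layer.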

### Wave 4 — DMI completion

#### Task 4.1: DMI TOML runtime loading

Scope:

- `toml_loader.rs` parses `[[dmi_system_quirk]]`,
- matching uses live DMI info served by `acpid` at `/scheme/acpi/dmi`,
- resulting PCI quirk overrides flow through the canonical `redox-driver-sys` DMI path.

Success criteria:

- `50-system.toml` entries are no longer config-only,
- runtime DMI TOML behavior is testable and documented through the live `acpid` DMI scheme.

QA:

- tests for TOML parsing,
- one mock DMI input path proving a TOML DMI rule applies flags.
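
A mock-DMI test of the kind described in the QA items mostly needs the matching step to be a pure function. The Rust sketch below shows one possible shape under stated assumptions: the `DmiInfo` and `DmiRule` types, their field names, exact-string matching, and the rule that a selector-less entry never matches are all hypothetical and not taken from `toml_loader.rs`.

```rust
/// Mockable snapshot of DMI identity strings (field names are illustrative).
#[derive(Debug, Default, Clone)]
pub struct DmiInfo {
    pub sys_vendor: String,
    pub product_name: String,
    pub board_name: String,
}

/// One parsed DMI rule: optional selectors plus the PCI flags it contributes.
#[derive(Debug, Default, Clone)]
pub struct DmiRule {
    pub sys_vendor: Option<String>,
    pub product_name: Option<String>,
    pub board_name: Option<String>,
    pub pci_flags: u64,
}

/// An absent selector is a wildcard; a present one must match exactly (assumption).
fn field_matches(want: &Option<String>, have: &str) -> bool {
    want.as_deref().map_or(true, |w| w == have)
}

impl DmiRule {
    /// Require at least one selector so an empty rule cannot match every system.
    pub fn matches(&self, dmi: &DmiInfo) -> bool {
        let has_selector =
            self.sys_vendor.is_some() || self.product_name.is_some() || self.board_name.is_some();
        has_selector
            && field_matches(&self.sys_vendor, &dmi.sys_vendor)
            && field_matches(&self.product_name, &dmi.product_name)
            && field_matches(&self.board_name, &dmi.board_name)
    }
}

/// OR together the PCI flags contributed by every matching rule.
pub fn dmi_overrides(rules: &[DmiRule], dmi: &DmiInfo) -> u64 {
    rules
        .iter()
        .filter(|rule| rule.matches(dmi))
        .fold(0, |acc, rule| acc | rule.pci_flags)
}
```

With the live `/scheme/acpi/dmi` read kept outside this function, a test can construct a `DmiInfo` by hand and assert on the resulting flag mask directly.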

#### Task 4.2: ACPI blacklist/override layer

Current state:

- `acpid` now supports narrow `[[acpi_table_quirk]]` skip rules, optionally gated by the same
  DMI-style `match.*` fields used elsewhere.
- The implementation is intentionally limited to table suppression during ACPI table load; it is
  not a broad AML patching or firmware replacement framework.
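
The narrow scope described here, suppression at load time and nothing else, amounts to a filter over discovered tables. The Rust sketch below is a simplified illustration, not the `acpid` implementation: `AcpiTableSkipRule`, its fields, and matching on the 4-byte table signature plus a single `sys_vendor` gate are assumptions, and a real rule would likely need finer selectors than the signature alone.

```rust
/// Hypothetical skip rule: a table signature, optionally gated on a DMI vendor string.
#[derive(Debug, Clone)]
pub struct AcpiTableSkipRule {
    pub signature: [u8; 4],
    pub sys_vendor: Option<String>,
}

/// A table is skipped when any rule names its signature and the rule's
/// DMI gate, if present, matches the running system.
pub fn should_skip(rules: &[AcpiTableSkipRule], signature: [u8; 4], sys_vendor: &str) -> bool {
    rules.iter().any(|rule| {
        rule.signature == signature
            && rule.sys_vendor.as_deref().map_or(true, |v| v == sys_vendor)
    })
}

/// Filter the discovered table signatures down to those that should be loaded.
/// Nothing is patched or replaced; tables are only kept or dropped.
pub fn tables_to_load(
    rules: &[AcpiTableSkipRule],
    found: &[[u8; 4]],
    sys_vendor: &str,
) -> Vec<[u8; 4]> {
    found
        .iter()
        .copied()
        .filter(|signature| !should_skip(rules, *signature, sys_vendor))
        .collect()
}
```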

## Suggested Immediate Deliverables

If work resumes right away, the next concrete implementation sequence should be:

1. clean remaining stale quirks docs/reporting text,
2. write a design note for canonical PCI quirk ownership,
3. integrate `lookup_usb_quirks()` into `xhcid` enumeration/configuration,
4. add first real `pci_has_quirk()` use in `amdgpu`,
5. validate and extend shipped DMI TOML coverage as needed.

## Exit Criteria For The Next Quirks Milestone

The next milestone is complete when all are true:

- `pcid-spawner` and `redox-driver-sys` no longer drift semantically,
- `xhcid` consumes USB device quirks at runtime,
- at least one real C driver consumes linux-kpi quirks,
- docs distinguish clearly between reporting, infrastructure, and true runtime behavior,
- DMI TOML entries are either runtime-applied or removed from shipped config.