Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-openagent) Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>
27 KiB
Red Bear OS ACPI Improvement Plan
Purpose
This document turns the current ACPI assessment into a concrete execution plan.
It does not replace local/docs/ACPI-FIXES.md. That file remains the historical record for the
P0 bring-up work and the current table-by-table status snapshot. This document is the forward-looking
plan for improving completeness, robustness, ownership clarity, consumer integration,
and validation quality.
The goal is not to treat ACPI as a generic checklist of table parsers. The goal is to make the Red Bear ACPI stack:
- correct enough to survive bad firmware,
- clear enough that ownership boundaries stay maintainable,
- observable enough that failures are diagnosable,
- and validated enough that "complete" means more than "boots on one machine".
Scope
This plan covers the Red Bear ACPI stack and its direct dependency chain:
- kernel ACPI discovery and early table handling,
acpidas the main ACPI/AML/FADT/DMI/power daemon,iommuas the IVRS/AMD-Vi runtime owner,pcid//configas the MCFG replacement path,- DMI-backed quirks flowing through
acpidandredox-driver-sys, - ACPI-consuming services such as
redbear-sessiondandredbear-info.
Primary focus is the current x86_64 path, because that is the active Red Bear hardware target and the area where the current implementation and validation debt is concentrated. ARM64 ACPI support remains in scope only where kernel ownership decisions or generic parser quality would affect it.
Evidence Model
This plan uses five evidence buckets and does not treat them as equivalent:
- source-visible — behavior is visible in the current checked-in source tree
- patch-carried — behavior exists through
local/patches/*rather than plain upstream source - build-visible — code compiles and stages successfully in the current build
- runtime-validated — behavior has been exercised successfully in real boot/runtime paths
- negative-result-documented — failure modes and platform gaps are recorded explicitly
This matters because the current ACPI stack has already crossed the bring-up threshold, but still has meaningful gaps between implemented, robust, and trusted.
Ownership Model
The long-term ownership split should be:
- Kernel ACPI — minimum early discovery and unavoidable early platform setup
acpid— ACPI table serving, AML execution, FADT power/reboot logic, DMI exposure, power-state exposure, ACPI table quirk filteringiommudaemon — IVRS runtime parsing and AMD-Vi controller ownership- future Intel IOMMU owner — DMAR runtime handling, not
acpid pcid— PCI config space access replacing broken MCFG-in-acpid stubs- consumers — query ACPI-exposed services; do not parse ACPI firmware directly unless they are the designated owner
This ownership split is not fully enforced today. The plan below is designed to move the current tree from transitional ownership to explicit ownership without destabilizing the working bring-up path.
Current State Summary
What is strong today
- Kernel RSDP/RSDT/XSDT/MADT handling exists and is sufficient for current boot bring-up.
acpidowns FADT parsing, AML integration, DMI exposure, and ACPI-backed power state exposure.- IVRS was correctly removed from the broken
acpidstub path and moved to theiommudaemon. - MCFG ownership was correctly removed from
acpidand replaced with thepcid /configpath. - DMI-backed quirks are integrated through
/scheme/acpi/dmiandredox-driver-sys. acpidstartup uses typedStartupErrorwith explicit error messages and clean exit paths (Wave 1 boot-path hardening partially complete).- AML mutex state has real tracked implementation with handle-based acquire/release semantics in
aml_physmem.rs(Wave 2 AML mutex work partially complete). - EC access width is handled via
read_bytes/write_bytesbyte-transaction sequences for u16/u32/u64 accesses (Wave 2 EC width work partially complete). - DMAR table parsing module exists in
acpidbut is not wired into the startup path; DMAR ownership is effectively deferred to theiommudaemon (Wave 3 DMAR separation partially complete). - Shutdown eventing uses
/scheme/kernel.acpi/kstopas the kernel-to-userspace shutdown signal;redbear-sessiondlistens on this path forPrepareForShutdownD-Bus signals. - Kernel registers the
kstopscheme at boot and ACPI subsystem shutdown uses PM1a/PM1b CNT writes with\_S5sleep types.
What is still weak today
- Sleep state transitions (
\_Sxmethods beyond\_S5) and sleep eventing remain unsupported; there is no/scheme/acpi/sleepor event-driven sleep contract. - AML opregion error propagation still has some silent failure paths; not all correctness-critical reads return error to caller.
AmlSymbolsinitialization order is still tied to PCI FD registration timing; AML initialization is not fully deterministic.SLP_TYPbhandling remains unimplemented for sleep states beyond\_S5.- DMAR table parsing module is present but unused; the module itself has not been removed, creating latent confusion about ownership.
- Docs still risk equating "implemented" with "validated" without explicit evidence qualification.
Honest status statement
Red Bear ACPI is materially complete for the historical P0 boot goal, but it is not yet complete
for robustness, ownership cleanliness, sleep state support, or broad platform confidence. Sleep
state eventing is a known gap. The shutdown eventing contract via kstop is implemented but only
validated in QEMU; bare-metal validation is still outstanding.
Canonical Related Documents
Read these alongside this plan:
local/docs/ACPI-FIXES.md— current status ledger and historical P0 fixeslocal/docs/IRQ-AND-LOWLEVEL-CONTROLLERS-ENHANCEMENT-PLAN.md— controller-level validation and quality contextlocal/docs/IOMMU-SPEC-REFERENCE.md— IVRS/DMAR technical referencelocal/docs/QUIRKS-SYSTEM.md— DMI-backed quirks and ACPI table blacklist behaviorlocal/docs/AMD-FIRST-INTEGRATION.md— historical AMD-first framing and hardware contextlocal/docs/BAREMETAL-LOG.md— real-machine failure notes and negative results
Work Classification
Every task in this plan is tagged by its main purpose:
- Completeness — functionality exists but is still missing or partial
- Robustness — behavior exists but is too fragile under bad firmware or runtime stress
- Quality — ownership, observability, maintainability, or docs are below target
Wave 0 — Contracts, truthfulness, and degraded-mode policy
Goal
Stop treating ACPI as a loose cluster of working code and instead define:
- who owns what,
- which failures are fatal versus degradable,
- and what status words in the docs actually mean.
Why this wave is first
Without an explicit contract, later hardening work turns into hidden rewrites and docs drift.
Scope
local/docs/ACPI-FIXES.mdlocal/docs/IRQ-AND-LOWLEVEL-CONTROLLERS-ENHANCEMENT-PLAN.md- this file
- related references in
README.md,docs/02-GAP-ANALYSIS.md, andAGENTS.mdif needed
Status: Wave 0 execution partially complete
Tasks 0.1, 0.2, and 0.3 are partially executed in this documentation pass. The degraded-mode matrix
and normalized vocabulary are new in this pass. Ownership boundaries are partially documented below;
the canonical statement still lives in local/docs/ACPI-FIXES.md.
Vocabulary normalization (Task 0.2 — partially executed)
Replace ambiguous wording such as "complete" with one of:
- implemented — behavior exists in the current source tree
- validated in QEMU — behavior has been exercised in QEMU/OVMF but not on real hardware
- validated on bounded real hardware — behavior verified on specific hardware that was tested
- still transitional — behavior exists but ownership or robustness is not yet clean
- known gap — functionality is absent or broken; the gap is documented
ACPI degraded-mode matrix (Task 0.1 — new)
This matrix documents the expected system behavior for ACPI failure cases. All entries reflect implemented behavior visible in the current source tree.
| Condition | Kernel behavior | Userspace (acpid) behavior |
Session impact |
|---|---|---|---|
| Bad RSDP checksum | Warns, continues with best-effort RSDP parse | No ACPI init if RSDT/XSDT unreadable; exits cleanly | No ACPI services |
| Bad SDT checksum | Logs warning per table, continues | Table skipped; other tables still served | Reduced ACPI surface |
| Truncated FADT | FADT fields fall back to zero defaults | Uses zero defaults for PM registers; acpi_shutdown may not fire |
Shutdown may fall back to keyboard controller |
| Truncated DMAR | N/A (DMAR not used by kernel) | Logs error, continues without DMAR; iommu daemon not started |
No Intel IOMMU via DMAR |
| Truncated IVRS | N/A (IVRS not used by kernel) | No effect on acpid (IVRS owned by iommu daemon) |
No AMD-Vi via IVRS |
| AML interpreter init failure | N/A | acpid exits with typed error; no ACPI scheme |
No AML, no power methods |
| EC timeout | N/A | Returns AmlError::MutexAcquireTimeout to AML interpreter |
AML opregion access fails gracefully |
| EC unsupported width access | N/A | Wider accesses split into byte transactions via read_bytes/write_bytes |
Works on byte-access ECs only |
| Missing DMAR on Intel | N/A | acpid logs DMAR absent; iommu daemon not started |
No Intel VT-d |
| Missing IVRS on AMD | N/A | No effect (IVRS owned by iommu daemon) |
No AMD-Vi |
Missing \_S5 sleep types |
N/A | acpi_shutdown logs error and returns without writing PM registers |
Shutdown may fall back to keyboard controller |
Missing /scheme/kernel.acpi/kstop |
Kernel does not register kstop scheme | acpid exits with error on startup |
No kernel-orchestrated shutdown |
Sleep state transition (\_Sx) |
N/A | Not implemented; no event-driven sleep contract | Sleep states not available |
redbear-sessiond shutdown watcher |
Kernel signals kstop on shutdown |
acpi_watcher.rs reads kstop and emits D-Bus PrepareForShutdown |
Login1 session manager informed |
Ownership boundaries (Task 0.3 — partially documented)
This section documents the current ownership split as visible in the source tree. Items marked transitional indicate the ownership boundary is not yet fully enforced by code.
| Component | Owner | Status |
|---|---|---|
| Early table discovery (RSDP, RSDT, XSDT) | Kernel | implemented |
| MADT, HPET, SPCR, GTDT parsing | Kernel | implemented |
FADT parsing, \_S5 sleep types, PM registers |
acpid |
implemented |
| AML interpreter initialization and execution | acpid |
implemented |
| EC access (byte-wide and widened via byte transactions) | acpid |
implemented |
| AML mutex state tracking | acpid (aml_physmem.rs) |
implemented (real tracked state, not placeholder) |
| FADT shutdown/reboot via PM1a/PM1b CNT | acpid |
implemented |
| Keyboard controller fallback reboot | acpid |
implemented |
| DMAR table parsing | acpid (module present) |
transitional — module not wired; effectively owned by iommu daemon |
| IVRS ownership | iommu daemon |
implemented |
| MCFG/PCI config space | pcid /config endpoint |
implemented |
| DMI exposure and quirks | acpid via /scheme/acpi/dmi |
implemented |
Power methods (\_PS0/\_PS3/\_PPC) |
acpid |
implemented |
Sleep state transitions (\_Sx beyond \_S5) |
none | known gap |
| Sleep eventing | none | known gap |
Shutdown event via kstop |
Kernel + acpid + redbear-sessiond |
implemented (QEMU-validated; bare-metal validation outstanding) |
| DMAR runtime ownership (Intel VT-d) | iommu daemon |
transitional — not yet fully separated from acpid DMAR module |
Acceptance criteria
- one canonical ownership statement exists — partially met (this table, plus
ACPI-FIXES.md) - one degraded-mode matrix exists — met (this pass)
- all high-level ACPI status claims use the same vocabulary — partially met (normalized in this pass)
Validation
- doc review only,
- no code changes required for vocabulary and matrix,
- Wave 0 should be treated as ongoing; new evidence may require matrix updates
Wave 1 — Boot-path hardening and parser strictness
Goal
Remove catastrophic or silent failure behavior from the boot-critical ACPI path.
Main files
recipes/core/base/source/drivers/acpid/src/main.rsrecipes/core/base/source/drivers/acpid/src/acpi.rsrecipes/core/base/source/drivers/acpid/src/scheme.rsrecipes/core/kernel/source/src/acpi/mod.rs- kernel ACPI submodules for RSDP/RSDT/XSDT/MADT/HPET/SPCR/GTDT
Status: Task 1.1 partially executed
acpid main.rs now uses a typed StartupError enum covering:
ReadRootTable— failed to read/scheme/kernel.acpi/rxsdtParseRootTable— failed to parse[R|X]SDTUnexpectedRootTableSignature— wrong root table signatureMalformedRootTableEntries— malformed entry areaInitializeAcpi— failed ACPI context init- port I/O rights acquisition failure
- shutdown pipe open failure (
/scheme/kernel.acpi/kstop) - event queue creation failure
- scheme socket creation failure
- event queue subscription failure
- scheme registration failure
Each failure path logs a human-readable error and calls std::process::exit(1). No panic!
remains on these paths. Empty RSDT (no ACPI) causes a clean exit(0) after daemon.ready().
Tasks 1.2 and 1.3 remain open.
Tasks
Task 1.1 — Replace panic-grade startup failures in acpid — partially done
Typed StartupError enum is implemented in main.rs. The following failure classes are now handled:
- hard fail with typed error message and exit code 1,
- soft fail with degraded behavior (ACPI absent → clean exit 0),
- or early clean exit when
/scheme/kernel.acpi/rxsdtis empty.
Task 1.2 — Make table rejection policy explicit — open
For kernel and acpid table use, define when a bad length/checksum/revision:
- is logged and ignored,
- is logged and downgraded,
- or is fatal.
This policy must be table-specific, not one global "warn and continue" convention.
Task 1.3 — Improve parser observability — open
Every accepted or rejected table should leave enough evidence to reconstruct why:
- table signature,
- physical address if known,
- length/revision/checksum status,
- consumer that requested it,
- fallback path chosen.
Acceptance criteria
- no
panic!/expect()remains on firmware-origin or optional-service startup paths inacpid— partially met (Tasks 1.2 and 1.3 still open), - malformed-table decisions are deterministic and documented — open,
- degraded boot still succeeds in all cases classified as degradable by Wave 0 — open.
Validation
- negative tests for malformed checksums and table lengths,
- QEMU validation with intentionally damaged tables if feasible,
- one AMD and one Intel bounded hardware boot recheck,
- evidence captured in
local/docs/BAREMETAL-LOG.mdor a successor log.
Wave 2 — AML, opregions, EC, and power-state correctness
Goal
Close the biggest runtime-correctness gaps in the ACPI stack.
Main files
recipes/core/base/source/drivers/acpid/src/acpi.rsrecipes/core/base/source/drivers/acpid/src/aml_physmem.rsrecipes/core/base/source/drivers/acpid/src/ec.rs
Status: Tasks 2.1, 2.2, and 2.5 partially executed
Task 2.1 — Remove placeholder AML mutex behavior — partially done
AmlMutexState in aml_physmem.rs implements real tracked state:
AmlMutexState::create_handle()generates unique handles via incrementingnext_handleAmlMutexState::statesis aFxHashMap<Handle, bool>tracking locked/unlocked statelock_aml_mutexes()wraps the state map with properMutexguard and poisoned-state recovery- The
acquire()method looks up the handle in the map, sets it totrueon success, and returnsAmlError::MutexAcquireTimeouton timeout or unknown handle
This is no longer a placeholder implementation. Remaining work: timeout semantics documentation and concurrent acquire/release stress testing.
Task 2.2 — Eliminate silent zero-on-failure physical reads — partially done
EC reads via read_bytes now propagate AmlError::MutexAcquireTimeout on EC timeout rather than
returning zero. Kernel-physmem reads still have some silent failure paths; this task is not fully
closed.
Task 2.5 — Decide and validate EC width behavior — partially done
ec.rs now implements read_u16, read_u32, read_u64, write_u16, write_u32, and write_u64
on Ec via byte-transaction sequences in read_bytes/write_bytes:
ensure_access()validates the access fits in a u8 addressable rangeread_bytes<const N: usize>loops over individual byte reads with timeout per bytewrite_bytes()loops over individual byte writes with timeout per byte
Wider accesses are emulated through byte transactions rather than being rejected. This is implemented behavior, not a placeholder. Validation on real EC hardware remains outstanding.
Tasks still open
Task 2.3 — Finish AmlSymbols initialization contract — open
AmlSymbols initialization order is still tied to PCI FD registration timing. AML initialization
is not fully deterministic. The upstream TODO to "use these parsed tables for the rest of acpid"
remains.
Task 2.4 — Fix power-state completeness gaps — open
SLP_TYPb handling remains unimplemented. Sleep state transitions beyond \_S5 are not supported.
Sleep eventing is not implemented. These are documented as known gaps.
Acceptance criteria
- AML synchronization is no longer placeholder-driven — partially met (2.1 done; 2.3 open),
- physmem failures do not silently fabricate correctness-critical values — partially met (EC done; kernel-physmem still open),
- AML initialization order is reproducible and documented — open (Task 2.3),
- sleep-state handling is explicit for both implemented and out-of-scope states — open (Task 2.4),
- EC behavior is either implemented or honestly bounded — met (byte transactions for wider widths)
Validation
- targeted AML method execution tests,
- shutdown/reboot proof on QEMU and bounded real hardware,
- EC timeout/error-path tests where possible,
- concurrent ACPI scheme reads while AML methods run.
Wave 3 — Ownership cleanup: reduce kernel ACPI scope and remove DMAR from acpid
Goal
Move from transitional ownership to architecture that is easier to maintain.
Main files
recipes/core/kernel/source/src/acpi/mod.rs- kernel ACPI submodules as needed
recipes/core/base/source/drivers/acpid/src/acpi/dmar/mod.rsrecipes/core/base/source/drivers/acpid/src/scheme.rslocal/recipes/system/iommu/source/src/*
Status: Tasks 3.1 and 3.2 partially executed
Task 3.1 — Define the minimum kernel ACPI surface — open
The kernel still carries TODOs for kernel ACPI scope reduction. No staged migration contract has been written yet.
Task 3.2 — Move DMAR to the correct owner — partially done
The acpi/dmar/mod.rs module remains present in acpid source but is not imported or called from
main.rs startup. The DMAR parsing code itself is not executed at daemon startup. However, the
module has not been removed from the source tree, creating latent confusion about ownership.
The iommu daemon is responsible for IVRS/DMAR runtime handling. DMAR is not initialized by
acpid. The exit path from acpid for DMAR is therefore effectively achieved, but the cleanup
(task: remove the unused module or move it to the iommu crate) is not complete.
Task 3.3 — Ensure handoff paths are explicit — open
Handoff paths for table discovery and CPU/topology are not yet documented as a staged migration contract.
Acceptance criteria
- the minimal kernel ACPI contract is written down — open,
- DMAR has a concrete exit path from
acpid— partially met (not wired; module still present), - ownership reductions are staged and do not break current bring-up — open
Validation
- before/after boot regressions,
- Intel-specific validation for DMAR path changes,
- AMD regression checks proving IVRS ownership remains isolated in
iommu.
Wave 4 — Consumer integration and eventing quality
Goal
Make ACPI consumers correct and low-friction, not just functional.
Main files
local/recipes/system/redbear-sessiond/source/src/acpi_watcher.rsrecipes/core/base/source/drivers/acpid/src/scheme.rs- DMI/quirk consumers under
redox-driver-sysand reporting surfaces
Status: Task 4.1 partially executed; sleep eventing still a gap
Task 4.1 — Replace polling-based ACPI state consumption — partially done
Shutdown eventing is now event-driven via /scheme/kernel.acpi/kstop:
acpidopenskstopat startup and subscribes to it viaRawEventQueue- When the kernel triggers shutdown,
acpidreceives an event on thekstopfile descriptor redbear-sessiond'sacpi_watcher.rsopenskstopand reads one byte in a blockingspawn_blockingcall, then emits D-BusPrepareForShutdown(true)signal
Sleep eventing (\_Sx transitions) remains unsupported. There is no /scheme/acpi/sleep surface
and no event-driven sleep contract. This is a known gap.
Task 4.2 — Bound DMI quirk authority — open
Task 4.3 — Improve operator-facing observability — open
Acceptance criteria
- no periodic polling remains for core ACPI power/session transitions if eventing is feasible — partially met (shutdown done; sleep still polling/absent),
- quirk precedence is documented — open,
- consumer-visible behavior is diagnosable from logs and status outputs — open
Validation
- repeated shutdown/sleep edge tests,
- DMI quirk application checks on known systems,
- race checks with multiple simultaneous consumers of
/scheme/acpi/*.
Wave 5 — Validation closure and release gate
Goal
Convert the current implementation from bring-up evidence into release-grade trust.
Validation matrix
At minimum, require:
- QEMU/OVMF boot with ACPI active,
- modern AMD hardware,
- modern Intel hardware,
- one platform that exercises EC-backed AML behavior,
- malformed-table or degraded-mode evidence where feasible.
Tasks
Task 5.1 — Publish a platform matrix
For each validated platform, record:
- firmware mode,
- key ACPI tables detected,
- APIC mode,
- whether shutdown/reboot worked,
- whether DMI and power exposure worked,
- whether any AML/EC failure was observed.
Task 5.2 — Capture negative results
Do not hide unsupported AML opcodes, partial EC behavior, or platform-specific regressions behind a generic "works on tested hardware" label.
Task 5.3 — Define the ACPI release gate
Before calling ACPI complete for current Red Bear goals, require:
- clean boot on the bounded matrix,
- explicit degraded-mode behavior for known bad firmware cases,
- documented ownership state,
- and current docs that distinguish implemented vs validated.
Acceptance criteria
- one bounded but honest validation matrix exists,
- negative results are documented,
- ACPI status claims are tied to explicit evidence rather than inference.
Upstream vs Red Bear Work Split
Upstream-first work
These are generic ACPI correctness or architecture issues and should be solved upstream whenever possible, with temporary Red Bear patch carriers only if necessary:
acpidstartup hardening- AML mutex semantics in
aml_physmem.rs SLP_TYPbcompletion- EC error typing and possibly EC access-width handling
- using parsed tables for the rest of
acpid - DMAR leaving
acpid - kernel ACPI scope reduction TODOs
- generic parser quality for kernel ACPI modules
Red Bear-owned work
These remain Red Bear responsibilities even if upstream code improves:
- honest status/phase documentation
- bounded validation matrix and operator runbooks
redbear-sessiondevent consumption quality- DMI quirk governance and integration policy
- temporary durable patch carriers in
local/patches/* - coordination between
acpid,iommu,pcid, and downstream consumers
Sequencing Constraints
- Wave 0 must come first so later changes do not drift architecturally.
- Wave 1 must come before Wave 2 so runtime correctness sits on a hardened startup path.
- Wave 2 should come before Wave 4 because consumer contracts should depend on correct AML and power behavior.
- Wave 3 should not start until Waves 1 and 2 are at least partially complete; ownership moves are dangerous if the runtime behavior is still fragile.
- Wave 5 closes the work; it must not be used as a substitute for architecture.
Main Risks
- stricter parser/error handling may expose machines that currently boot only by luck,
- AML/EC changes may uncover hidden ordering assumptions with PCI registration,
- reducing kernel ownership too early may regress early platform bring-up,
- moving DMAR out of
acpidmay create Intel-only regressions if the replacement contract is vague, - DMI quirks can become a crutch if they are allowed to override runtime facts indiscriminately.
Deliverable Order
If work from this plan is executed, the recommended order is:
- documentation and degraded-mode contract,
- startup hardening,
- AML/EC correctness,
- ownership cleanup,
- consumer/eventing quality,
- validation closure.
Definition of Done for the Current ACPI Plan
This plan can be considered substantially complete only when:
- ownership boundaries are explicit — partially met (this doc; module-level cleanup still needed)
- boot-critical panic/silent-fallback paths are removed or justified — partially met (Task 1.1 done; Tasks 1.2 and 1.3 open)
- AML and EC behavior are no longer TODO-grade — partially met (mutex state and EC width done; AML init order and SLP_TYPb open)
- DMAR and IVRS ownership are cleanly separated — partially met (DMAR not wired; module still present in acpid)
- ACPI consumers are event-driven or explicitly bounded — partially met (shutdown done via kstop; sleep not implemented)
- sleep state transitions and eventing are implemented or explicitly documented as known gaps — open
- the repo contains platform evidence that supports its status claims — open (QEMU validated; bare-metal evidence still needed)
Current truthful status for Red Bear ACPI:
materially complete for historical bring-up, but still under active robustness, ownership, sleep-state, and validation improvement. Shutdown eventing is implemented via kstop. Sleep state transitions are a known gap. EC width support is implemented via byte transactions. AML mutex state is real-tracked, not placeholder. DMAR is not initialized by acpid. Bare-metal validation for the full ACPI surface is still outstanding.