# Red Bear OS — Low-Level Infrastructure Reassessment & Updated Plan

**Version**: 1.0 (2026-05-21)

**Supersedes**: Fragmentary assessments in `COMPREHENSIVE-SYSTEM-ASSESSMENT-AND-IMPROVEMENT-PLAN.md` §2–§4 for ACPI/IRQ/PCI/driver topics

**Canonical adjacent plans** (remain authoritative for subsystem detail):

- `ACPI-IMPROVEMENT-PLAN.md` — ACPI waves W0–W7
- `IRQ-AND-LOWLEVEL-CONTROLLERS-ENHANCEMENT-PLAN.md` — PCI/IRQ/MSI-X waves W1–W6
- `BOOT-PROCESS-HARDWARE-DETECTION-PLAN.md` — Boot detection waves W0–W6
- `SMP-SCHEDULER-IMPROVEMENT-PLAN.md` — SMP bottlenecks B1–B7

---

## 1. Executive Summary

This document is a **code-grounded reassessment** of four interdependent low-level subsystems: ACPI/acpid, IRQ/PCI, enumeration/driver binding, and driver infrastructure. It is based on direct source inspection (file paths and line numbers provided throughout), cross-referenced against existing plans.

### Bottom-line verdict

| Subsystem | Verdict | Blocking Bare Metal? |
|-----------|---------|---------------------|
| **ACPI boot** | Boot-baseline complete, not release-grade | Partial — shutdown timing fragile |
| **ACPI shutdown** | S5 derivation works, timing-dependent on PCI | Yes — pre-PCI shutdown degrades weakly |
| **ACPI thermal/fan** | Discovery exists, no runtime backend | No — thermal safety gap |
| **ACPI C-states** | Discovery exists, **no kernel cpuidle** | **Yes** — root cause of heat |
| **IRQ delivery** | Architecturally strong, QEMU-proven only | Partial — no HW validation |
| **MSI/MSI-X** | Code complete, **IOMMU validation stubbed** | **Yes** — `iommu_validate_msi_irq()` returns `true` |
| **PCI enumeration** | Userspace-only (correct), pcid complete | No |
| **Driver binding** | Manual class-code matching, no ACPI _HID/_CID | Partial — limited device coverage |
| **redox-driver-sys** | Production quality, zero stubs | No |
| **linux-kpi** | Structurally complete for GPU+Wi-Fi | No |
| **GPU drivers** | Compile-only, synthetic EDID everywhere | **Yes** — no real display detection |
| **Wi-Fi** | Compile+host-test only | Yes — no HW validation |
| **USB** | xhcid only, no EHCI/UHCI/OHCI | **Yes** — legacy USB keyboards unreachable |

### What changed since last assessment (2026-05-20)

1. **Critical stub discovered**: `iommu_validate_msi_irq()` at `kernel/src/scheme/irq.rs:231` unconditionally returns `true` — this was not flagged as a blocking item in the IRQ enhancement plan (all 6 waves marked "complete").
2. **Critical stub discovered**: `aml_physmem.rs:195` and `:274` fabricate zero values on physical memory access failure — affects all AML runtime evaluation.
3. **Dual AML interpreter architecture** identified as a maintenance risk — kernel `acpi_ext` crate and userspace `acpi` crate parse DSDT/SSDT independently.
4. **APIC timer disabled** (`local_apic.rs:81`) — not flagged in any existing plan as a blocker.
5. **Synthetic EDID used in all GPU drivers** — blocks real display detection on bare metal.
6. **40 total TODOs** in ACPI code (16 kernel + 24 userspace) — higher than previously documented.

---

## 2. ACPI / acpid Reassessment

### 2.1 Architecture

The ACPI subsystem has **three operational levels**:

```
Bootloader → KernelArgs.hwdesc_base (RSDP pointer)
    │
    ▼
Kernel ACPI (src/acpi/ + src/scheme/acpi.rs + src/arch/x86_shared/sleep.rs)
    ├── RSDP→RSDT/XSDT→SDT enumeration (MADT, SRAT, SLIT, HPET)
    ├── Export via /scheme/kernel.acpi/{rxsdt, kstop, sleep}
    └── Kernel-side AML interpreter (acpi_ext crate) for S3/S5 sleep
    │
    ▼
Userspace acpid (drivers/acpid/src/)
    ├── Reads rxsdt, loads SDTs from physical memory
    ├── Userspace AML interpreter (acpi crate) — SEPARATE from kernel's
    ├── Exports /scheme/acpi/{dmi, tables, symbols, thermal, fan, cstates}
    └── Shutdown via kstop pipe + PM1a/PM1b write
```

### 2.2 What Is Working

| Component | File | Evidence |
|-----------|------|----------|
| RSDP discovery + dual checksum | `acpi/rsdp.rs` | ACPI 1.0 + 2.0+ validation, 62 lines |
| MADT parsing (10 entry types) | `acpi/madt/mod.rs` | Types 0x0–0xA + aarch64 GICC/GICD, 340 lines |
| x2APIC support | `acpi/madt/mod.rs` | Types 0x9/0xA, `P20–P22` patches |
| IOAPIC init from MADT | `device/ioapic.rs` | GSI resolution, source overrides, affinity, 502 lines |
| LAPIC/x2APIC | `device/local_apic.rs` | MSR + MMIO dual path, 312 lines |
| SRAT/SLIT NUMA | `acpi/srat.rs`, `acpi/slit.rs` | Affinity + distance matrix |
| HPET timer | `acpi/hpet.rs` | Init from ACPI tables |
| Kernel scheme export | `scheme/acpi.rs` | rxsdt, kstop, sleep — 398 lines |
| acpid SDT loading | `acpid/src/acpi.rs:162–217` | Page-span handling, PhysmapGuard |
| acpid FADT parsing | `acpid/src/acpi.rs:965–1122` | ACPI 2.0 extended fields |
| acpid EC handler | `acpid/src/ec.rs` | Full protocol (RD_EC/WR_EC/BE_EC/BD_EC/QR_EC), 317 lines |
| acpid S5 derivation | `acpid/src/acpi.rs:754–813` | FADT + AML `\_S5`, cached |
| acpid DMI | `acpid/src/dmi.rs` | SMBIOS 32/64-bit entry points, 350 lines |
| acpid thermal/fan/cstate discovery | `thermal.rs`, `fan.rs`, `cstate.rs` | AML-backed `\_TZ`, `\_PR` namespace |
| hwd ACPI backend | `hwd/backend/acpi.rs` | `_CID`/`_HID` device discovery, 119 lines |

### 2.3 Critical Stubs

| Location | Line | Issue | Severity |
|----------|------|-------|----------|
| `acpid/src/aml_physmem.rs` | 195 | `read_phys_or_fault()` returns `T::zero()` on failure — **fabricates data** | 🔴 CRITICAL |
| `acpid/src/aml_physmem.rs` | 274 | `map_physical_region()` falls back to **zero page** on failure — writes lost | 🔴 CRITICAL |
| `kernel/src/arch/x86_shared/sleep.rs` | 257–276 | `read_pci_u8/u16/u32` always return **0**; `write_pci_*` are no-ops | 🔴 CRITICAL |
| `kernel/src/arch/x86_shared/sleep.rs` | 275 | `nanos_since_boot()` returns **0** — broken AML timing | 🟠 HIGH |
| `kernel/src/arch/x86_shared/sleep.rs` | 294–298 | `acquire()`/`release()` for AML mutexes are **no-ops** | 🟠 HIGH |
| `acpid/src/acpi.rs` | 545 | `Dmar::init(&this)` **commented out** — "TODO (hangs on real hardware)" | 🟠 HIGH |
| `hwd/backend/legacy.rs` | 13 | `LegacyBackend::probe()` is a **TODO no-op** | 🟠 HIGH |
| `acpid/src/acpi.rs` | 820–822 | `set_global_s_state(state)` returns `Ok` for any state != 5 | 🟡 MEDIUM |

### 2.4 Architectural Risks

1. **Dual AML interpreters**: Kernel `sleep.rs` uses `acpi_ext` crate; userspace `acpid` uses `acpi` crate. They parse the same DSDT/SSDT independently with different handler implementations. Bug fixes in one do not affect the other.
2. **RSDP_ADDR contract**: acpid AML init requires `RSDP_ADDR` environment variable (from `hwd` via `KernelArgs.hwdesc_base`). x86 has BIOS fallback; non-x86 paths are unresolved.
3. **S5 derivation timing**: Depends on AML readiness which depends on PCI registration. Pre-PCI shutdown falls back gracefully but the degraded contract is weak.
4. **DMAR orphaned**: 533 lines of Intel VT-d parsing code exist but are not wired into startup.
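
The two `aml_physmem.rs` stubs listed in §2.3 hand fabricated zeroes to the AML evaluator when a physical access fails. The sketch below shows the alternative shape, where the read returns a `Result` and the caller decides what to do. All names here (`PhysmemError`, `PhysRegion`, `read_u32`) are hypothetical and chosen for illustration; they are not the acpid API, and the region is backed by a `Vec<u8>` rather than a real physical mapping.

```rust
/// Hypothetical error type; the real acpid code may shape this differently.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PhysmemError {
    /// The requested range is not covered by the mapped region.
    OutOfBounds { addr: usize, len: usize },
}

/// Illustrative stand-in for a mapped physical window.
/// Backed by a Vec here so the example runs without real hardware.
pub struct PhysRegion {
    base: usize,
    bytes: Vec<u8>,
}

impl PhysRegion {
    pub fn new(base: usize, bytes: Vec<u8>) -> Self {
        Self { base, bytes }
    }

    /// Read a little-endian u32 at `addr`.
    /// A failed access yields an error, never a made-up zero.
    pub fn read_u32(&self, addr: usize) -> Result<u32, PhysmemError> {
        let oob = PhysmemError::OutOfBounds { addr, len: 4 };
        let offset = addr.checked_sub(self.base).ok_or(oob)?;
        let end = offset.checked_add(4).ok_or(oob)?;
        let slice = self.bytes.get(offset..end).ok_or(oob)?;
        let mut buf = [0u8; 4];
        buf.copy_from_slice(slice);
        Ok(u32::from_le_bytes(buf))
    }
}

/// Illustrative caller: an AML field read that aborts on failure
/// instead of continuing with invented data.
pub fn eval_field_low16(region: &PhysRegion, addr: usize) -> Result<u32, PhysmemError> {
    let raw = region.read_u32(addr)?; // the error propagates to the evaluator
    Ok(raw & 0xFFFF)
}
```

With this shape, a battery or thermal method that touches an unmapped region fails visibly, and the consumer can report "unknown" rather than a plausible-looking zero.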
### 2.5 TODO Inventory

- **Kernel ACPI**: 16 TODOs (`madt` arch variants, `hpet` x86 assumption, `spcr` type support, `scheme/acpi` context switch, `gtdt`)
- **Userspace acpid**: 24 TODOs (`acpi.rs`: 10, `dmar/`: 9, `main.rs`: 3, `scheme.rs`: 1, `aml_physmem.rs`: 1)
- **Total**: 40 TODOs

### 2.6 Alignment with ACPI-IMPROVEMENT-PLAN.md

| Wave | Plan Status | Code Reality | Delta |
|------|-------------|--------------|-------|
| W0 Contracts | ~80% | Truth statement accurate | — |
| W1 Startup hardening | ~60% | P19 patch removed panic-grade expects; remaining `expect()` in firmware-origin paths | Underdocumented |
| W2 AML ordering/shutdown | ~50% | S5 derivation improved (P24); explicit error types exist; timing still coupled to PCI | Underdocumented |
| W3 Honest power surface | Open | Battery/AC probing exists but not trustworthy; thermal/fan discovery real but no backend action | — |
| W4 Physmem/EC/fault handling | ~40% | **Two critical stubs at lines 195, 274 not flagged in plan** | **New finding** |
| W5 Ownership cleanup | Open | DMAR still orphaned; dual interpreters unresolved | — |
| W6 Consumer integration | ~60% | kstop→sessiond path works | — |
| W7 Validation closure | Open | No bare-metal validation matrix executed | — |

---

## 3. IRQ / PCI Reassessment

### 3.1 Architecture

```
PCI Device → MSI/MSI-X message (address 0xFEE0_0xxx + data)
    │
    ▼
APIC (local or I/O) → Vector delivery to target CPU
    │
    ▼
Kernel IDT → generic_irq handler (vec 32–255)
    │
    ▼
scheme/irq.rs → irq_trigger(irq, token)
    ├── iommu_validate_msi_irq(irq)  ← STUB: returns true unconditionally
    ├── increment COUNTS[irq]
    ├── walk HANDLES for matching fd
    └── trigger EVENT_READ
    │
    ▼
Userspace driver → IrqHandle::wait() returns with count
```

### 3.2 What Is Working

| Component | File | Evidence |
|-----------|------|----------|
| IDT (256 entries) | `arch/x86_shared/idt.rs` | 224 generic vectors, legacy IRQ bindings, IPI handlers, 374 lines |
| 8259 PIC | `arch/x86_shared/device/pic.rs` | Master/slave init, mask, ack, ISR query, 98 lines |
| I/O APIC | `arch/x86_shared/device/ioapic.rs` | MADT-parsed, GSI resolution, affinity reprogramming, 502 lines |
| LAPIC/x2APIC | `arch/x86_shared/device/local_apic.rs` | MMIO + MSR dual path, IPI, EOI, ESR, 312 lines |
| IRQ dispatch | `arch/x86_shared/interrupt/irq.rs` | PIC/APIC switching, spurious accounting, 352 lines |
| IRQ scheme | `scheme/irq.rs` | Registration, delivery, affinity, per-CPU listing, 650 lines |
| MSI kernel code | `arch/x86_shared/device/msi.rs` | Message composition, validation, capability parsing, 183 lines |
| Vector allocator | `arch/x86_shared/device/vector.rs` | CAS bitmap for 224 vectors, 53 lines |
| redox-driver-sys IRQ | `redox-driver-sys/src/irq.rs` | MSI-X table mapping, vector allocation, affinity, 491 lines, **zero TODOs** |
| redox-driver-sys PCI | `redox-driver-sys/src/pci.rs` | Config space, BAR probing, MSI-X enable, 1446 lines, **zero TODOs** |
| pcid daemon | `drivers/pcid/src/` | Enumeration, scheme:pci, driver spawn, ~1400 lines total |
| driver-manager | `driver-manager/src/main.rs` | PciBus + AcpiBus binding, boot timeline, 553 lines |

### 3.3 Critical Stubs

| Location | Line | Issue | Severity |
|----------|------|-------|----------|
| `kernel/src/scheme/irq.rs` | 231 | `iommu_validate_msi_irq(_irq) -> bool { true }` — **zero IOMMU validation** | 🔴 CRITICAL |
| `kernel/src/arch/x86_shared/device/local_apic.rs` | 81 | `//self.setup_timer();` — **APIC timer disabled** | 🟠 HIGH |
| `kernel/src/arch/x86_shared/interrupt/irq.rs` | 307 | `println!("Local apic timer interrupt");` — debug artifact | 🟡 MEDIUM |
| `kernel/src/arch/x86_shared/device/ioapic.rs` | 329–331 | `.unwrap()` on cpuid — panic risk | 🟡 MEDIUM |
| `drivers/pcid/src/driver_interface/irq_helpers.rs` | — | "FIXME for cpu_id >255 need IOMMU IRQ remapping" | 🟠 HIGH |
| `drivers/pcid/src/driver_interface/irq_helpers.rs` | — | "FIXME allow allocating multiple interrupt vectors" | 🟠 HIGH |

### 3.4 Patch-Backed Code

The following kernel code does **not exist in upstream** — it is entirely Red Bear patches:

- `msi.rs` (+183 lines) — added by `P8-msi.patch` (281 lines, 12 hunks)
- `vector.rs` (+53 lines) — added by `P8-msi.patch`
- IOAPIC affinity — `P9-ioapic-irq-affinity.patch`
- IRQ affinity wiring — `P10-irq-affinity-wiring.patch`
- x2APIC ICR fix — `P20-x2apic-icr-mode-fix.patch`
- x2APIC SMP fix — `P21-x2apic-smp-fix.patch`
- x2APIC MADT fallback — `P22-x2apic-madt-fallback.patch`

**Risk**: If upstream kernel rebases, these patches must be rebased. The MSI/MSI-X subsystem is entirely patch-dependent.

### 3.5 Alignment with IRQ Enhancement Plan

The plan reports all 6 Waves as **✅ Complete**. Code inspection confirms the Waves addressed panic hardening and code quality. However, **6 priority areas remain entirely open**, and the plan does not flag the following:

- `iommu_validate_msi_irq()` stub (CRITICAL — not mentioned)
- APIC timer disabled (not mentioned)
- Single-vector-per-device limit (mentioned as FIXME but not prioritized)

---

## 4. Enumeration / Driver Binding Reassessment

### 4.1 Current Flow

```
pcid enumerates PCI bus → /scheme/pci/{segment}--{bus}--{device}.{function}/
    │
    ▼
driver-manager (or pcid-spawner legacy) reads /scheme/pci/
    │
    ▼
For each device: query config space (vendor, device, class, subclass)
    │
    ▼
Match against driver config (PCI class/vendor/device ID lookup)
    │
    ▼
Spawn driver daemon with PCID_CLIENT_CHANNEL env var
    │
    ▼
Driver opens /scheme/pci/{addr}/config and /scheme/irq/{irq}
```

### 4.2 Limitations

1. **No ACPI _HID/_CID matching**: Non-PCI devices (ACPI-enumerated GPIO, I2C, etc.) are not bound through the driver-manager.
2. **No modalias generation**: Drivers are matched by simple class-code or vendor/device ID — no automatic alias generation from PCI class/subclass/prog-if.
3. **LegacyBackend is a stub**: `hwd/backend/legacy.rs:13` — "TODO: handle driver spawning from legacy backend" — any non-ACPI, non-DTB platform gets no hardware discovery.
4. **Initfs transitional**: `hwd` and `acpid` live on the initfs boot path, not under a stable rootfs service contract.

### 4.3 Alignment with Boot-Process-Hardware-Detection-Plan.md

| Wave | Plan Status | Code Reality |
|------|-------------|--------------|
| W0 Boot stage definitions | ✅ Done | Config-only |
| W1 ACPI bus in driver-manager | ✅ Done | `AcpiBus` exists |
| W2 Resource parser (_CRS, _PRT) | ✅ Done | Parsed |
| W2b ACPI device binding | ✅ Done | Wired |
| W2c GPIO/I2C configs | Partial | Runtime _CRS evaluation **not started** |
| W3 Service rewiring | ✅ Done | Stage targets wired |
| W4 Dead /etc/pcid.d/ removal | ✅ Done | Removed |
| W5 Deferred probing | ✅ Already had | Scheme-aware |
| W6 USB topology enumeration | **Not started** | Depends on xHCI IRQ stability |

---

## 5. Driver Infrastructure Reassessment

### 5.1 redox-driver-sys

**Status: ✅ Production quality, zero stubs, zero TODOs**

- **Schemes**: memory (physical mapping, cache type control), irq (registration, wait, affinity), pci (enumeration, config space, BARs, MSI-X)
- **Quirks**: 3-layer (compiled-in 11 entries + TOML runtime + DMI/SMBIOS 8 rules), 22 PCI flags, 21 USB flags
- **MSI-X**: Full `MsixTable` with validated x86 message programming, vector allocation, CPU round-robin
- **DMA**: `DmaBuffer` (phys-contiguous), `IommuDmaAllocator` (MAP/UNMAP protocol)
- **Tests**: 30+ unit tests in `pci.rs`

### 5.2 linux-kpi

**Status: ✅ Structurally complete for GPU + Wi-Fi, 119 tests passing, zero stubs**

- **17 Rust modules**, **32 C headers**
- **Full implementations**: pci (777 lines), net (809), wireless (1002), mac80211 (959), irq (228), firmware (277), drm_shim (374)
- **No `todo!()`/`unimplemented!()`** in any audited module
- **C header coverage**: pci.h, skbuff.h, interrupt.h, firmware.h, netdevice.h, ieee80211.h, nl80211.h, cfg80211.h, mac80211.h, drm*.h, atomic.h, spinlock.h, mutex.h, workqueue.h, timer.h, wait.h, list.h, slab.h, mm.h, io.h, types.h, errno.h, compiler.h, export.h, printk.h, module.h, refcount.h, jiffies.h, kernel.h, idr.h, bug.h

### 5.3 firmware-loader

**Status: ✅ Production quality**

- `scheme:firmware` daemon with `SchemeSync` impl
- MANIFEST generation (BLAKE3), `--probe`, `--request-nowait`
- Path traversal prevention, 64MB blob cap, cache with source signature validation
- AMD GPU: 17 firmware keys expected; Intel: per-generation DMC firmware

### 5.4 GPU Drivers

| Driver | Status | Key Gap |
|--------|--------|---------|
| redox-drm (AMD) | 🟡 Compiles, 616 lines | `synthetic_edid()` fallback — no real DDC/I²C |
| redox-drm (Intel) | 🟡 Compiles, 693 lines | `synthetic_edid()` fallback — no real DDC/I²C |
| redox-drm (VirtIO) | 🟡 Compiles | `synthetic_edid()` fallback |
| amdgpu (C port) | 🟡 Compiles, ~1487 lines | Hardcoded 4 connector descriptors, no real HPD |

**All three `redox-drm` GPU drivers use `synthetic_edid()`** at `redox-drm/src/kms/connector.rs:35` — a hardcoded 128-byte EDID 1.4 block for 1920×1080@60Hz. This blocks real display detection on bare metal.

### 5.5 Wi-Fi

**Status: 🟡 Compiles + host-tested, zero hardware validation**

- `redbear-iwlwifi`: C transport layer (~2450 lines) + Rust daemon (~1550 lines)
- 8 host tests pass
- Commands time out without real firmware — by design
- No Intel Wi-Fi device ever exercised

### 5.6 USB

**Status: 🟡 xhcid builds + QEMU proofs pass, bare-metal incomplete**

- xhcid: Red Bear patched, QEMU IRQ delivery proven
- usbscsid: USB mass storage with inline quirks (214 storage quirks)
- usbhubd: Hub port management
- **Gap**: No EHCI, UHCI, or OHCI drivers — legacy USB keyboards on companion controllers are unreachable on bare metal

---

## 6. Cross-Cutting Critical Gaps (Updated Priority)

### Gap 1 — IOMMU MSI Validation (CRITICAL)

**File**: `kernel/src/scheme/irq.rs:231`

```rust
fn iommu_validate_msi_irq(_irq: u8) -> bool {
    true
}
```

Every MSI/MSI-X interrupt bypasses IOMMU remapping validation. This is a security and correctness gap. The hook exists but has zero logic.

**Root cause**: IOMMU daemon (`iommu`) provides AMD-Vi runtime but no Intel VT-d. The validation function needs remapping table data from the IOMMU daemon, or validation must move to userspace via a scheme call.

**Action**: Implement real validation against IOMMU remapping tables, or explicitly document that MSI/MSI-X without IOMMU is only safe on trusted buses.

### Gap 2 — AML Physical Memory Stubs (CRITICAL)

**Files**: `acpid/src/aml_physmem.rs:195`, `:274`

- `read_phys_or_fault()` returns `T::zero()` on failure — fabricates data
- `map_physical_region()` falls back to zero page — silent data loss

**Impact**: Any AML method accessing a physical memory region that fails to map will see fabricated zeroes. This can cause:

- Incorrect battery/thermal readings
- Silent EC communication failures
- Wrong power state transitions

**Action**: Propagate `Result` errors to AML evaluation callers instead of fabricating values.

### Gap 3 — Kernel Sleep Path PCI Stubs (CRITICAL)

**File**: `kernel/src/arch/x86_shared/sleep.rs:257–276`

- `read_pci_u8/u16/u32` always return 0
- `write_pci_*` are no-ops

**Impact**: Any AML code using PCI config space access in the kernel S3/S5 sleep path gets fabricated values. This is only safe if the sleep path guarantees no PCI-dependent AML methods are evaluated.

**Action**: Either wire real PCI config space access in the kernel sleep path, or explicitly scope the kernel AML interpreter to exclude PCI-dependent methods.

### Gap 4 — APIC Timer Disabled (HIGH)

**File**: `kernel/src/arch/x86_shared/device/local_apic.rs:81`

- `setup_timer()` commented out
- System uses PIT fallback for all timer interrupts

**Impact**: No per-CPU timer interrupts (all CPUs share PIT on BSP), no TSC deadline mode for modern CPUs, potential timer skew on SMP.

**Action**: Re-enable APIC timer with calibration against PIT or TSC. Required for per-CPU timer distribution.

### Gap 5 — Synthetic EDID in All GPU Drivers (HIGH)

**File**: `redox-drm/src/kms/connector.rs:35`

- All three drivers (AMD, Intel, VirtIO) use hardcoded EDID
- No real DDC/I²C display detection

**Impact**: Display will not work on bare metal with non-1080p panels, multi-monitor setups, or displays with non-standard timings.

**Action**: Implement I²C-over-DDC EDID retrieval in `redox-drm`, or at minimum implement a real connector detection path that queries HPD + DDC before falling back to synthetic.
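
The "query DDC before falling back to synthetic" path in Gap 5 needs a way to decide whether bytes read over DDC are a usable EDID. A minimal sketch follows, relying on two properties of the EDID base block: it opens with the fixed header `00 FF FF FF FF FF FF 00`, and its 128 bytes sum to zero modulo 256. The names `validate_edid_block`, `select_edid`, `EdidError`, and `EdidSource` are hypothetical and are not part of `redox-drm`; the actual DDC transfer is out of scope here and is represented by an `Option<Vec<u8>>`.

```rust
/// Fixed 8-byte header that opens every EDID base block.
const EDID_HEADER: [u8; 8] = [0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00];
/// Length of the EDID base block in bytes.
const EDID_BLOCK_LEN: usize = 128;

/// Hypothetical error type for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EdidError {
    TooShort,
    BadHeader,
    BadChecksum,
}

/// Check the base block: length, header magic, then checksum.
pub fn validate_edid_block(block: &[u8]) -> Result<(), EdidError> {
    if block.len() < EDID_BLOCK_LEN {
        return Err(EdidError::TooShort);
    }
    let base = &block[..EDID_BLOCK_LEN];
    if !base.starts_with(&EDID_HEADER) {
        return Err(EdidError::BadHeader);
    }
    // All 128 bytes, including the trailing checksum byte, must sum to 0 mod 256.
    let sum = base.iter().fold(0u8, |acc, b| acc.wrapping_add(*b));
    if sum != 0 {
        return Err(EdidError::BadChecksum);
    }
    Ok(())
}

/// Where the EDID handed to the connector came from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum EdidSource {
    Ddc,
    Synthetic,
}

/// Prefer a valid DDC read; otherwise use the synthetic block.
/// `ddc` is None when the (hypothetical) DDC transfer failed outright.
pub fn select_edid(ddc: Option<Vec<u8>>, synthetic: Vec<u8>) -> (EdidSource, Vec<u8>) {
    match ddc {
        Some(bytes) if validate_edid_block(&bytes).is_ok() => (EdidSource::Ddc, bytes),
        _ => (EdidSource::Synthetic, synthetic),
    }
}
```

Returning the `EdidSource` alongside the bytes lets the driver log which path was taken, which helps distinguish "QEMU-validated" from "bare-metal-validated" results.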
### Gap 6 — Dual AML Interpreters (HIGH)

**Files**: `kernel/src/arch/x86_shared/sleep.rs` (acpi_ext crate) + `acpid/src/acpi.rs` (acpi crate)

- Two independent parsers for the same DSDT/SSDT
- Different handler implementations (kernel has PCI stubs, userspace has physmem stubs)
- Bug fixes in one do not affect the other

**Impact**: Maintenance risk, correctness divergence, two surfaces for AML security issues.

**Action**: Converge on a single canonical interpreter. Recommendation: userspace (acpid) since all drivers are userspace per project model. Kernel sleep path should delegate to userspace or use a shared, read-only AML namespace.

### Gap 7 — No EHCI/UHCI/OHCI Drivers (HIGH)

**Impact**: Legacy USB keyboards on companion controller paths unreachable on bare metal. Only xHCI-native USB devices work.

**Action**: Implement EHCI driver (highest priority — covers most USB 2.0 controllers with xHCI companion). UHCI/OHCI are lower priority (very old hardware).

### Gap 8 — No C-State Kernel Backend (HIGH)

**Impact**: CPUs run at full frequency constantly on bare metal. Thermal throttling only.

**Action**: Implement `cpuidle`/`cpufreq` kernel backend using MWAIT or HLT. Discovery exists in acpid (`cstate.rs`) but kernel has no idle driver.

### Gap 9 — DMAR Orphaned (MEDIUM)

**File**: `acpid/src/acpi.rs:545`

- 533 lines of Intel VT-d parsing code
- `Dmar::init()` commented out — "hangs on real hardware"

**Action**: Either fix the hang and assign a runtime owner (iommu daemon), or remove the orphaned code until ready.

### Gap 10 — >256 CPU MSI Remapping (MEDIUM)

**File**: `drivers/pcid/src/driver_interface/irq_helpers.rs`

- 8-bit APIC destination field limits MSI target selection
- IOMMU interrupt remapping required for >256 CPUs

**Action**: Gated on IOMMU maturity (Gap 1).

---

## 7. Updated Execution Plan

### Phase 1: Critical Stub Removal (2–3 weeks)

**Goal**: Remove all CRITICAL-severity stubs before any hardware validation.

| # | Task | File | Effort | Owner |
|---|------|------|--------|-------|
| 1.1 | Fix `read_phys_or_fault()` zero-return | `acpid/src/aml_physmem.rs:195` | 2 days | — |
| 1.2 | Fix `map_physical_region()` zero-page fallback | `acpid/src/aml_physmem.rs:274` | 2 days | — |
| 1.3 | Fix kernel sleep path PCI read stubs | `kernel/src/arch/x86_shared/sleep.rs:257–276` | 3 days | — |
| 1.4 | Document kernel PCI stub scope | `sleep.rs` | 1 day | — |
| 1.5 | Remove `println!` debug artifact | `kernel/src/arch/x86_shared/interrupt/irq.rs:307` | 1 hour | — |

**Gate**: All CRITICAL stubs removed + `cargo check` clean on affected modules.

### Phase 2: IOMMU + MSI Validation (3–4 weeks)

**Goal**: Make MSI/MSI-X delivery trustworthy.

| # | Task | File | Effort | Owner |
|---|------|------|--------|-------|
| 2.1 | Implement `iommu_validate_msi_irq()` real logic | `kernel/src/scheme/irq.rs:231` | 1 week | — |
| 2.2 | Wire IOMMU remapping table read into kernel | `iommu` daemon ↔ `scheme/irq` | 1 week | — |
| 2.3 | QEMU validation: MSI-X with IOMMU enabled | `test-msix-qemu.sh` | 2 days | — |
| 2.4 | Fix or remove orphaned DMAR code | `acpid/src/acpi.rs:545` | 2 days | — |

**Gate**: `test-msix-qemu.sh` passes with IOMMU enabled + no `iommu_validate_msi_irq()` stub.

### Phase 3: Timer + CPU Power (2–3 weeks)

**Goal**: Enable per-CPU timers and basic CPU idle.

| # | Task | File | Effort | Owner |
|---|------|------|--------|-------|
| 3.1 | Re-enable APIC timer with calibration | `kernel/src/arch/x86_shared/device/local_apic.rs:81` | 3 days | — |
| 3.2 | Implement kernel cpuidle backend (MWAIT/HLT) | New file: `kernel/src/arch/x86_shared/cpuidle.rs` | 1 week | — |
| 3.3 | Wire acpid C-state discovery to kernel idle | `acpid/src/cstate.rs` → kernel | 3 days | — |
| 3.4 | QEMU validation: timer + idle | `test-timer-qemu.sh` | 2 days | — |

**Gate**: `test-timer-qemu.sh` passes with APIC timer + CPU idle active.
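
Task 3.1 calls for calibrating the APIC timer against a reference clock. The arithmetic can be sketched as a pure function: start the local APIC timer in one-shot mode from a known initial count, wait a known interval on the PIT, read the current-count register, and divide the elapsed ticks by the interval. The sketch below assumes that flow and uses hypothetical names (`calibrate`, `choose_timer`, `ApicCalibration`, `TimerSource`); the register access and the PIT wait are left out, so the inputs are plain integers.

```rust
/// Outcome of one calibration pass (illustrative).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct ApicCalibration {
    pub ticks_per_ms: u32,
}

/// Hypothetical failure modes for this sketch.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CalibrationError {
    /// The reference interval was zero.
    ZeroInterval,
    /// The down-counter reached zero; the window was too long to measure.
    CounterExpired,
    /// The current count exceeded the initial count.
    InvalidReading,
    /// Fewer than one tick per millisecond was observed.
    NoTicksObserved,
}

/// Derive ticks per millisecond from one measurement window.
/// `initial_count` is the value written before the wait, `current_count`
/// the value read back after `interval_ms` of reference time.
pub fn calibrate(
    initial_count: u32,
    current_count: u32,
    interval_ms: u32,
) -> Result<ApicCalibration, CalibrationError> {
    if interval_ms == 0 {
        return Err(CalibrationError::ZeroInterval);
    }
    if current_count == 0 {
        return Err(CalibrationError::CounterExpired);
    }
    let elapsed = initial_count
        .checked_sub(current_count)
        .ok_or(CalibrationError::InvalidReading)?;
    let ticks_per_ms = elapsed / interval_ms;
    if ticks_per_ms == 0 {
        return Err(CalibrationError::NoTicksObserved);
    }
    Ok(ApicCalibration { ticks_per_ms })
}

/// Initial-count value for a periodic tick at `hz`, if it fits in 32 bits.
pub fn periodic_initial_count(cal: ApicCalibration, hz: u32) -> Option<u32> {
    if hz == 0 {
        return None;
    }
    let count = (cal.ticks_per_ms as u64 * 1000) / hz as u64;
    if count == 0 || count > u32::MAX as u64 {
        None
    } else {
        Some(count as u32)
    }
}

/// Which timer the boot path ends up using.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum TimerSource {
    Apic(ApicCalibration),
    Pit,
}

/// Keep the PIT path when calibration fails, rather than panicking.
pub fn choose_timer(result: Result<ApicCalibration, CalibrationError>) -> TimerSource {
    match result {
        Ok(cal) => TimerSource::Apic(cal),
        Err(_) => TimerSource::Pit,
    }
}
```

Keeping the arithmetic separate from the hardware access makes the failure cases unit-testable on the host, and the explicit `TimerSource::Pit` arm preserves the existing fallback if calibration fails on a particular CPU.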

### Phase 4: Display Detection (4–6 weeks)

**Goal**: Replace synthetic EDID with real display detection.

| # | Task | File | Effort | Owner |
|---|------|------|--------|-------|
| 4.1 | Implement I²C-over-DDC EDID retrieval | `redox-drm/src/kms/ddc.rs` (new) | 2 weeks | — |
| 4.2 | Wire HPD interrupt to connector detection | `redox-drm/src/drivers/amd/mod.rs`, `intel/mod.rs` | 1 week | — |
| 4.3 | Replace `synthetic_edid()` with real → fallback | `redox-drm/src/kms/connector.rs:35` | 3 days | — |
| 4.4 | QEMU validation: EDID readback | `test-drm-display-runtime.sh` | 2 days | — |
| 4.5 | Bare-metal validation: AMD GPU display | `test-amd-gpu.sh` | 1 week | — |
| 4.6 | Bare-metal validation: Intel GPU display | `test-intel-gpu.sh` | 1 week | — |

**Gate**: Real EDID retrieved from at least one display on bare metal (AMD or Intel).

### Phase 5: USB Legacy Controllers (3–4 weeks)

**Goal**: Enable USB keyboard on non-xHCI paths.

| # | Task | File | Effort | Owner |
|---|------|------|--------|-------|
| 5.1 | Implement EHCI host controller driver | `local/recipes/drivers/ehcid/` (new) | 2 weeks | — |
| 5.2 | Wire EHCI into driver-manager PCI binding | `driver-manager/src/main.rs` | 3 days | — |
| 5.3 | QEMU validation: EHCI keyboard | `test-usb-qemu.sh` | 2 days | — |
| 5.4 | UHCI/OHCI assessment | — | 1 week | — |

**Gate**: USB keyboard works via EHCI in QEMU.

### Phase 6: AML Convergence (3–4 weeks)

**Goal**: Resolve dual AML interpreter risk.
| # | Task | File | Effort | Owner | |---|------|------|--------|-------| | 6.1 | Evaluate kernel sleep.rs → userspace delegation | `kernel/src/arch/x86_shared/sleep.rs` | 1 week | — | | 6.2 | Implement kernel→userspace S3/S5 sleep RPC | `scheme/kernel.acpi/sleep` → `acpid` | 1 week | — | | 6.3 | Remove kernel `acpi_ext` crate if delegated | `kernel/src/arch/x86_shared/sleep.rs` | 3 days | — | | 6.4 | QEMU validation: sleep/wake cycle | `test-sleep-qemu.sh` | 2 days | — | **Gate**: S5 shutdown works with single AML interpreter (userspace only). ### Phase 7: Hardware Validation Matrix (4–6 weeks, parallel with 4–6) **Goal**: Evidence-based support claims. | # | Task | Hardware | Effort | |---|------|----------|--------| | 7.1 | Class A1 validation (AMD desktop + discrete GPU) | Ryzen 5000/7000 + AMD GPU | 1 week | | 7.2 | Class A2 validation (Intel desktop + iGPU) | Core 12th–14th Gen | 1 week | | 7.3 | Class A3 validation (AMD laptop) | Ryzen Mobile | 1 week | | 7.4 | Class A4 validation (Intel laptop) | Core Mobile | 1 week | | 7.5 | Regression test suite on all 4 classes | All | 2 weeks | **Gate**: All 4 hardware classes pass boot, shutdown, USB keyboard, and display detection. --- ## 8. 
Timeline Synthesis ``` Week 1–3: Phase 1 — Critical stub removal Week 4–7: Phase 2 — IOMMU + MSI validation Week 7–9: Phase 3 — Timer + CPU power (parallel with Phase 2 week 7) Week 10–15: Phase 4 — Display detection (parallel with Phase 5) Week 10–13: Phase 5 — USB legacy controllers (parallel with Phase 4) Week 14–17: Phase 6 — AML convergence Week 14–19: Phase 7 — Hardware validation matrix (parallel with Phase 6) Total: 19 weeks (≈4.5 months) with 2 developers ``` ### What the existing plans said vs this plan | Plan | Claimed Timeline | Reality | |------|-----------------|---------| | COMPREHENSIVE P1 (bare-metal hardening) | 6–8 weeks | Understated — no critical stub removal phase | | COMPREHENSIVE P2 (USB) | 4–6 weeks | Realistic for EHCI only | | COMPREHENSIVE P3 (IRQ/IOMMU) | 4–6 weeks | Realistic if focused on Gap 1 only | | IRQ plan Waves 1–6 | "Complete" | Code quality complete, validation not started | | ACPI plan Waves 0–7 | W0–W4 partial, W5–W7 open | Accurate, but two critical stubs not flagged | | SMP plan bottlenecks | 11–18 days | Realistic for B1–B2 only | ### Dependencies ``` Phase 1 (stub removal) │ ├── required by ──► Phase 2 (IOMMU validation) │ ├── required by ──► Phase 3 (timer + idle) │ └── required by ──► Phase 4 (display detection) Phase 2 (IOMMU) └── required by ──► Phase 7 (hardware validation — safe MSI) Phase 3 (timer + idle) └── required by ──► Phase 7 (hardware validation — no overheating) Phase 4 (display) └── required by ──► Phase 7 (hardware validation — working console) Phase 5 (USB EHCI) └── required by ──► Phase 7 (hardware validation — keyboard input) Phase 6 (AML convergence) └── not blocking ──► Phase 7 (can validate with dual interpreters) ``` --- ## 9. 
Risk Register | # | Risk | Likelihood | Impact | Mitigation | |---|------|-----------|--------|------------| | R1 | `aml_physmem` stub fix reveals deeper AML memory access issues | Medium | High | Fix with comprehensive error propagation; add fallback to kernel scheme for problematic regions | | R2 | IOMMU validation implementation requires kernel ABI change | Medium | High | Prototype in userspace first via `scheme:iommu` call; only promote to kernel if performance requires it | | R3 | APIC timer calibration fails on specific CPU models | Medium | Medium | Keep PIT fallback path; detect calibration failure and degrade gracefully | | R4 | DDC/I²C implementation requires GPIO/I2C subsystem not yet built | High | High | Scope Phase 4 to "query EDID via ACPI _DDC method first, then direct I²C"; fallback to synthetic still acceptable for initial bring-up | | R5 | EHCI driver requires IRQ/MSI-X fixes first | Medium | Medium | Phase 5 starts after Phase 2 gate; use legacy IRQ for EHCI if MSI-X not ready | | R6 | AML convergence breaks S3 sleep path | Medium | High | Keep kernel sleep.rs as fallback during transition; remove only after S3 validated via userspace path | | R7 | No bare-metal hardware available for validation | Medium | Critical | Prioritize QEMU proofs for all phases; document "QEMU-validated" vs "bare-metal-validated" per subsystem | --- ## 10. 
Verification Gates

### Gate A: Boot-Baseline Ready (end of Phase 1)

- [ ] `aml_physmem.rs:195` returns `Result` instead of `T::zero()`
- [ ] `aml_physmem.rs:274` propagates mapping errors instead of zero-page fallback
- [ ] `sleep.rs:257–276` either wired to real PCI or explicitly scoped out
- [ ] `cargo check` clean on `acpid`, `kernel`, `redox-drm`
- [ ] `repo validate-patches kernel` passes
- [ ] `repo validate-patches base` passes

### Gate B: IRQ/IOMMU Trustworthy (end of Phase 2)

- [ ] `iommu_validate_msi_irq()` performs real validation
- [ ] `test-msix-qemu.sh` passes with IOMMU enabled
- [ ] `test-iommu-qemu.sh` passes
- [ ] No unconditional `true` returns in IRQ validation path

### Gate C: Timer + Power (end of Phase 3)

- [ ] APIC timer fires and calibrates correctly in QEMU
- [ ] CPU idle backend enters C1/C2 via MWAIT or HLT
- [ ] `test-timer-qemu.sh` passes
- [ ] No PIT-only fallback in boot log

### Gate D: Display Detection (end of Phase 4)

- [ ] `synthetic_edid()` is fallback, not primary
- [ ] Real EDID retrieved from at least one display in QEMU
- [ ] `test-drm-display-runtime.sh` passes

### Gate E: USB Legacy (end of Phase 5)

- [ ] EHCI driver enumerates devices in QEMU
- [ ] USB keyboard functional via EHCI in QEMU
- [ ] `test-usb-qemu.sh` passes

### Gate F: Single AML Interpreter (end of Phase 6)

- [ ] S5 shutdown works with userspace AML only
- [ ] Kernel `acpi_ext` crate removed or explicitly deprecated
- [ ] `test-sleep-qemu.sh` passes (S3 + S5)

### Gate G: Hardware Validation (end of Phase 7)

- [ ] Class A1 (AMD desktop) boots, shuts down, displays, accepts USB keyboard
- [ ] Class A2 (Intel desktop) boots, shuts down, displays, accepts USB keyboard
- [ ] Class A3 (AMD laptop) boots, shuts down, displays, accepts USB keyboard
- [ ] Class A4 (Intel laptop) boots, shuts down, displays, accepts USB keyboard
- [ ] Validation artifacts committed to `local/docs/HARDWARE-VALIDATION-MATRIX.md`

---

## 11. Appendix: Key File Reference

### ACPI

- `recipes/core/kernel/source/src/acpi/mod.rs` — Kernel ACPI orchestrator
- `recipes/core/kernel/source/src/acpi/rsdp.rs` — RSDP discovery
- `recipes/core/kernel/source/src/acpi/madt/mod.rs` — MADT parser
- `recipes/core/kernel/source/src/scheme/acpi.rs` — Kernel ACPI scheme
- `recipes/core/kernel/source/src/arch/x86_shared/sleep.rs` — Kernel AML interpreter for sleep
- `recipes/core/kernel/source/src/arch/x86_shared/stop.rs` — Shutdown orchestrator
- `recipes/core/base/source/drivers/acpid/src/main.rs` — acpid daemon entry
- `recipes/core/base/source/drivers/acpid/src/acpi.rs` — Core ACPI context
- `recipes/core/base/source/drivers/acpid/src/aml_physmem.rs` — AML physmem handler (stubs at :195, :274)
- `recipes/core/base/source/drivers/acpid/src/ec.rs` — Embedded Controller handler
- `recipes/core/base/source/drivers/acpid/src/thermal.rs` — Thermal zone discovery
- `recipes/core/base/source/drivers/acpid/src/fan.rs` — Fan device discovery
- `recipes/core/base/source/drivers/acpid/src/cstate.rs` — C-state discovery
- `recipes/core/base/source/drivers/acpid/src/dmi.rs` — SMBIOS DMI parser
- `recipes/core/base/source/drivers/hwd/src/backend/acpi.rs` — hwd ACPI backend
- `recipes/core/base/source/drivers/hwd/src/backend/legacy.rs` — LegacyBackend stub (:13)

### IRQ / PCI

- `recipes/core/kernel/source/src/scheme/irq.rs` — IRQ scheme (stub at :231)
- `recipes/core/kernel/source/src/arch/x86_shared/interrupt/irq.rs` — IRQ dispatch
- `recipes/core/kernel/source/src/arch/x86_shared/device/ioapic.rs` — I/O APIC
- `recipes/core/kernel/source/src/arch/x86_shared/device/local_apic.rs` — LAPIC (timer disabled at :81)
- `recipes/core/kernel/source/src/arch/x86_shared/device/msi.rs` — MSI code (patch-based)
- `recipes/core/kernel/source/src/arch/x86_shared/device/vector.rs` — Vector allocator (patch-based)
- `recipes/core/kernel/source/src/arch/x86_shared/device/pic.rs` — 8259 PIC
- `recipes/core/kernel/source/src/arch/x86_shared/idt.rs` — IDT setup
- `local/recipes/drivers/redox-driver-sys/source/src/irq.rs` — Userspace IRQ handling
- `local/recipes/drivers/redox-driver-sys/source/src/pci.rs` — Userspace PCI abstraction
- `recipes/core/base/source/drivers/pcid/src/main.rs` — pcid daemon
- `recipes/core/base/source/drivers/pcid/src/scheme.rs` — PciScheme
- `recipes/core/base/source/drivers/pcid/src/driver_interface/irq_helpers.rs` — IRQ helper FIXMEs
- `local/recipes/system/driver-manager/source/src/main.rs` — Driver manager

### Driver Infrastructure

- `local/recipes/drivers/redox-driver-sys/source/src/lib.rs` — Core library
- `local/recipes/drivers/redox-driver-sys/source/src/quirks/mod.rs` — Quirks API
- `local/recipes/drivers/linux-kpi/source/src/lib.rs` — linux-kpi crate
- `local/recipes/drivers/linux-kpi/source/src/rust_impl/pci.rs` — PCI KPI (777 lines)
- `local/recipes/drivers/linux-kpi/source/src/rust_impl/drm_shim.rs` — DRM GEM shim
- `local/recipes/drivers/linux-kpi/source/src/rust_impl/mac80211.rs` — mac80211 KPI (959 lines)
- `local/recipes/drivers/linux-kpi/source/src/rust_impl/wireless.rs` — cfg80211 KPI (1002 lines)
- `local/recipes/system/firmware-loader/source/src/main.rs` — firmware-loader daemon
- `local/recipes/gpu/redox-drm/source/src/main.rs` — DRM daemon
- `local/recipes/gpu/redox-drm/source/src/drivers/amd/mod.rs` — AMD GPU driver
- `local/recipes/gpu/redox-drm/source/src/drivers/intel/mod.rs` — Intel GPU driver
- `local/recipes/gpu/redox-drm/source/src/kms/connector.rs` — Connector + synthetic EDID (:35)
- `local/recipes/gpu/amdgpu/source/amdgpu_redox_main.c` — Bounded AMD display C port
- `local/recipes/gpu/amdgpu/source/redox_glue.h` — Linux→Redox C glue
- `local/recipes/gpu/amdgpu/source/redox_stubs.c` — Kernel emulation stubs

### Patches

- `local/patches/kernel/redbear-consolidated.patch` — Consolidated mega-patch
- `local/patches/kernel/P8-msi.patch` — MSI + vector allocator
- `local/patches/kernel/P9-ioapic-irq-affinity.patch` — IRQ affinity
- `local/patches/kernel/P10-irq-affinity-wiring.patch` — Affinity wiring
- `local/patches/kernel/P20-x2apic-icr-mode-fix.patch` — x2APIC ICR
- `local/patches/kernel/P21-x2apic-smp-fix.patch` — x2APIC SMP
- `local/patches/kernel/P22-x2apic-madt-fallback.patch` — x2APIC MADT fallback
- `local/patches/kernel/P24-cstate-mwait-idle.patch` — C-state MWAIT
- `local/patches/kernel/P25-cpuidle-deep-cstates.patch` — Deep C-states
- `local/patches/base/P19-acpid-startup-hardening.patch` — acpid startup
- `local/patches/base/P24-acpi-s5-derivation-shutdown-semantics.patch` — S5 derivation
- `local/patches/base/P44-acpid-thermal-zones.patch` — Thermal zones
- `local/patches/base/P48-acpid-fan-support.patch` — Fan support
- `local/patches/base/P52-acpid-cstates.patch` — C-state discovery

---

## 12. Document Authority

This document is a **cross-cutting reassessment** that references but does not replace the canonical subsystem plans:

- For ACPI wave-level execution detail, see `ACPI-IMPROVEMENT-PLAN.md`
- For IRQ/PCI wave-level execution detail, see `IRQ-AND-LOWLEVEL-CONTROLLERS-ENHANCEMENT-PLAN.md`
- For boot detection wave detail, see `BOOT-PROCESS-HARDWARE-DETECTION-PLAN.md`
- For SMP bottleneck detail, see `SMP-SCHEDULER-IMPROVEMENT-PLAN.md`
- For desktop path blockers, see `CONSOLE-TO-KDE-DESKTOP-PLAN.md`

**When this document conflicts with a canonical subsystem plan**, the **canonical plan** wins on subsystem-specific details, and this document wins on cross-cutting prioritization and inter-subsystem dependencies.

**This document should be updated** after each phase gate is reached, or when new critical stubs are discovered.
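
**Illustrative note on "critical stubs".** The stubs that Gate A targets in `aml_physmem.rs` are ones where a failed read silently becomes a zero value. The sketch below shows, under stated assumptions, what "returns `Result` instead of `T::zero()`" could look like for a single 32-bit read. It is not the acpid code: `PhysmemError`, `MappedRegion`, `read_u32`, and `read_u32_or_zero` are hypothetical names invented for this example, and a byte slice stands in for a real physical-memory mapping.

```rust
use std::fmt;

/// Hypothetical error type for this sketch; the real acpid error type
/// is not shown in this document.
#[derive(Debug, PartialEq, Eq)]
pub enum PhysmemError {
    /// The address lies below the start of the mapped region.
    Unmapped { addr: usize },
    /// The access would run past the end of the mapped region.
    OutOfBounds { addr: usize, len: usize },
}

impl fmt::Display for PhysmemError {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        match self {
            PhysmemError::Unmapped { addr } => {
                write!(f, "address {addr:#x} is not mapped")
            }
            PhysmemError::OutOfBounds { addr, len } => {
                write!(f, "{len}-byte access at {addr:#x} exceeds the mapping")
            }
        }
    }
}

impl std::error::Error for PhysmemError {}

/// Stand-in for a mapped physical region: a base address plus its bytes.
pub struct MappedRegion<'a> {
    base: usize,
    bytes: &'a [u8],
}

impl<'a> MappedRegion<'a> {
    pub fn new(base: usize, bytes: &'a [u8]) -> Self {
        Self { base, bytes }
    }

    /// Read a little-endian u32 at `addr`, reporting failures to the caller.
    pub fn read_u32(&self, addr: usize) -> Result<u32, PhysmemError> {
        let offset = addr
            .checked_sub(self.base)
            .ok_or(PhysmemError::Unmapped { addr })?;
        let word = offset
            .checked_add(4)
            .and_then(|end| self.bytes.get(offset..end))
            .ok_or(PhysmemError::OutOfBounds { addr, len: 4 })?;
        let mut buf = [0u8; 4];
        buf.copy_from_slice(word);
        Ok(u32::from_le_bytes(buf))
    }
}

/// The stub-style behaviour, kept only for contrast: a failed read
/// cannot be told apart from a register that really contains zero.
pub fn read_u32_or_zero(region: &MappedRegion<'_>, addr: usize) -> u32 {
    region.read_u32(addr).unwrap_or(0)
}
```

With the `Result`-returning form, a caller such as an AML interpreter can abort the method evaluation on `Err` rather than continuing with a fabricated zero, which is the distinction the first two Gate A checkboxes are meant to enforce.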