Red Bear OS — Low-Level Infrastructure Reassessment & Updated Plan

Version: 1.0 (2026-05-21)
Supersedes: Fragmentary assessments in COMPREHENSIVE-SYSTEM-ASSESSMENT-AND-IMPROVEMENT-PLAN.md §2–§4 for ACPI/IRQ/PCI/driver topics
Canonical adjacent plans (remain authoritative for subsystem detail):

  • ACPI-IMPROVEMENT-PLAN.md: ACPI waves W0–W7
  • IRQ-AND-LOWLEVEL-CONTROLLERS-ENHANCEMENT-PLAN.md: PCI/IRQ/MSI-X waves W1–W6
  • BOOT-PROCESS-HARDWARE-DETECTION-PLAN.md: Boot detection waves W0–W6
  • SMP-SCHEDULER-IMPROVEMENT-PLAN.md: SMP bottlenecks B1–B7

1. Executive Summary

This document is a code-grounded reassessment of four interdependent low-level subsystems: ACPI/acpid, IRQ/PCI, enumeration/driver binding, and driver infrastructure. It is based on direct source inspection (file paths and line numbers provided throughout), cross-referenced against existing plans.

Bottom-line verdict

| Subsystem | Verdict | Blocking Bare Metal? |
| --- | --- | --- |
| ACPI boot | Boot-baseline complete, not release-grade | Partial: shutdown timing fragile |
| ACPI shutdown | S5 derivation works, timing-dependent on PCI | Yes: pre-PCI shutdown degrades weakly |
| ACPI thermal/fan | Discovery exists, no runtime backend | No: thermal safety gap |
| ACPI C-states | Discovery exists, no kernel cpuidle | Yes: root cause of heat |
| IRQ delivery | Architecturally strong, QEMU-proven only | Partial: no HW validation |
| MSI/MSI-X | Code complete, IOMMU validation stubbed | Yes: iommu_validate_msi_irq() returns true |
| PCI enumeration | Userspace-only (correct), pcid complete | No |
| Driver binding | Manual class-code matching, no ACPI _HID/_CID | Partial: limited device coverage |
| redox-driver-sys | Production quality, zero stubs | No |
| linux-kpi | Structurally complete for GPU+Wi-Fi | No |
| GPU drivers | Compile-only, synthetic EDID everywhere | Yes: no real display detection |
| Wi-Fi | Compile+host-test only | Yes: no HW validation |
| USB | xhcid only, no EHCI/UHCI/OHCI | Yes: legacy USB keyboards unreachable |

What changed since last assessment (2026-05-20)

  1. Critical stub discovered: iommu_validate_msi_irq() at kernel/src/scheme/irq.rs:231 unconditionally returns true — this was not flagged as a blocking item in the IRQ enhancement plan (all 6 waves marked "complete").
  2. Critical stub discovered: aml_physmem.rs:195 and :274 fabricate zero values on physical memory access failure — affects all AML runtime evaluation.
  3. Dual AML interpreter architecture identified as a maintenance risk — kernel acpi_ext crate and userspace acpi crate parse DSDT/SSDT independently.
  4. APIC timer disabled (local_apic.rs:81) — not flagged in any existing plan as a blocker.
  5. Synthetic EDID used in all GPU drivers — blocks real display detection on bare metal.
  6. 40 total TODOs in ACPI code (16 kernel + 24 userspace) — higher than previously documented.

2. ACPI / acpid Reassessment

2.1 Architecture

The ACPI subsystem has three operational levels:

Bootloader → KernelArgs.hwdesc_base (RSDP pointer)
    │
    ▼
Kernel ACPI (src/acpi/ + src/scheme/acpi.rs + src/arch/x86_shared/sleep.rs)
    ├── RSDP→RSDT/XSDT→SDT enumeration (MADT, SRAT, SLIT, HPET)
    ├── Export via /scheme/kernel.acpi/{rxsdt, kstop, sleep}
    └── Kernel-side AML interpreter (acpi_ext crate) for S3/S5 sleep
    │
    ▼
Userspace acpid (drivers/acpid/src/)
    ├── Reads rxsdt, loads SDTs from physical memory
    ├── Userspace AML interpreter (acpi crate) — SEPARATE from kernel's
    ├── Exports /scheme/acpi/{dmi, tables, symbols, thermal, fan, cstates}
    └── Shutdown via kstop pipe + PM1a/PM1b write

2.2 What Is Working

| Component | File | Evidence |
| --- | --- | --- |
| RSDP discovery + dual checksum | acpi/rsdp.rs | ACPI 1.0 + 2.0+ validation, 62 lines |
| MADT parsing (10 entry types) | acpi/madt/mod.rs | Types 0x0–0xA + aarch64 GICC/GICD, 340 lines |
| x2APIC support | acpi/madt/mod.rs | Types 0x9/0xA, P20–P22 patches |
| IOAPIC init from MADT | device/ioapic.rs | GSI resolution, source overrides, affinity, 502 lines |
| LAPIC/x2APIC | device/local_apic.rs | MSR + MMIO dual path, 312 lines |
| SRAT/SLIT NUMA | acpi/srat.rs, acpi/slit.rs | Affinity + distance matrix |
| HPET timer | acpi/hpet.rs | Init from ACPI tables |
| Kernel scheme export | scheme/acpi.rs | rxsdt, kstop, sleep; 398 lines |
| acpid SDT loading | acpid/src/acpi.rs:162–217 | Page-span handling, PhysmapGuard |
| acpid FADT parsing | acpid/src/acpi.rs:965–1122 | ACPI 2.0 extended fields |
| acpid EC handler | acpid/src/ec.rs | Full protocol (RD_EC/WR_EC/BE_EC/BD_EC/QR_EC), 317 lines |
| acpid S5 derivation | acpid/src/acpi.rs:754–813 | FADT + AML \_S5, cached |
| acpid DMI | acpid/src/dmi.rs | SMBIOS 32/64-bit entry points, 350 lines |
| acpid thermal/fan/cstate discovery | thermal.rs, fan.rs, cstate.rs | AML-backed \_TZ, \_PR namespace |
| hwd ACPI backend | hwd/backend/acpi.rs | _CID/_HID device discovery, 119 lines |
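The dual-checksum rule credited to acpi/rsdp.rs can be illustrated as follows. This is a minimal sketch based on the RSDP layout in the ACPI specification, not a copy of the kernel code; the names `sums_to_zero` and `rsdp_is_valid` are illustrative.

```rust
/// ACPI checksum rule: all covered bytes must sum to zero modulo 256.
fn sums_to_zero(bytes: &[u8]) -> bool {
    bytes.iter().fold(0u8, |acc, b| acc.wrapping_add(*b)) == 0
}

/// Validates a raw RSDP image (hypothetical helper, not the kernel's API).
/// ACPI 1.0 checksums the first 20 bytes. Revision >= 2 adds an extended
/// checksum over the whole structure, whose length is stored at offset 20.
fn rsdp_is_valid(raw: &[u8]) -> bool {
    const V1_LEN: usize = 20;
    const V2_MIN_LEN: usize = 36;
    if raw.len() < V1_LEN || &raw[..8] != b"RSD PTR " {
        return false;
    }
    if !sums_to_zero(&raw[..V1_LEN]) {
        return false;
    }
    // The revision byte lives at offset 15.
    if raw[15] < 2 {
        return true;
    }
    if raw.len() < V2_MIN_LEN {
        return false;
    }
    // Little-endian u32 length field at offset 20.
    let len = u32::from_le_bytes([raw[20], raw[21], raw[22], raw[23]]) as usize;
    len >= V2_MIN_LEN && len <= raw.len() && sums_to_zero(&raw[..len])
}
```

A revision 0 table only has to pass the 20-byte check, while a revision 2 table has to pass both, which is why a corrupted extended area is caught even when the legacy checksum still matches.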

2.3 Critical Stubs

| Location | Line | Issue | Severity |
| --- | --- | --- | --- |
| acpid/src/aml_physmem.rs | 195 | read_phys_or_fault() returns T::zero() on failure; fabricates data | 🔴 CRITICAL |
| acpid/src/aml_physmem.rs | 274 | map_physical_region() falls back to zero page on failure; writes lost | 🔴 CRITICAL |
| kernel/src/arch/x86_shared/sleep.rs | 257–276 | read_pci_u8/u16/u32 always return 0; write_pci_* are no-ops | 🔴 CRITICAL |
| kernel/src/arch/x86_shared/sleep.rs | 275 | nanos_since_boot() returns 0; broken AML timing | 🟠 HIGH |
| kernel/src/arch/x86_shared/sleep.rs | 294–298 | acquire()/release() for AML mutexes are no-ops | 🟠 HIGH |
| acpid/src/acpi.rs | 545 | Dmar::init(&this) commented out: "TODO (hangs on real hardware)" | 🟠 HIGH |
| hwd/backend/legacy.rs | 13 | LegacyBackend::probe() is a TODO no-op | 🟠 HIGH |
| acpid/src/acpi.rs | 820–822 | set_global_s_state(state) returns Ok for any state != 5 | 🟡 MEDIUM |

2.4 Architectural Risks

  1. Dual AML interpreters: Kernel sleep.rs uses acpi_ext crate; userspace acpid uses acpi crate. They parse the same DSDT/SSDT independently with different handler implementations. Bug fixes in one do not affect the other.
  2. RSDP_ADDR contract: acpid AML init requires RSDP_ADDR environment variable (from hwd via KernelArgs.hwdesc_base). x86 has BIOS fallback; non-x86 paths are unresolved.
  3. S5 derivation timing: Depends on AML readiness which depends on PCI registration. Pre-PCI shutdown falls back gracefully but the degraded contract is weak.
  4. DMAR orphaned: 533 lines of Intel VT-d parsing code exist but are not wired into startup.

2.5 TODO Inventory

  • Kernel ACPI: 16 TODOs (madt arch variants, hpet x86 assumption, spcr type support, scheme/acpi context switch, gtdt)
  • Userspace acpid: 24 TODOs (acpi.rs: 10, dmar/: 9, main.rs: 3, scheme.rs: 1, aml_physmem.rs: 1)
  • Total: 40 TODOs

2.6 Alignment with ACPI-IMPROVEMENT-PLAN.md

| Wave | Plan Status | Code Reality | Delta |
| --- | --- | --- | --- |
| W0 Contracts | ~80% | Truth statement accurate | |
| W1 Startup hardening | ~60% | P19 patch removed panic-grade expects; remaining expect() in firmware-origin paths | Underdocumented |
| W2 AML ordering/shutdown | ~50% | S5 derivation improved (P24); explicit error types exist; timing still coupled to PCI | Underdocumented |
| W3 Honest power surface | Open | Battery/AC probing exists but not trustworthy; thermal/fan discovery real but no backend action | |
| W4 Physmem/EC/fault handling | ~40% | Two critical stubs at lines 195, 274 not flagged in plan | New finding |
| W5 Ownership cleanup | Open | DMAR still orphaned; dual interpreters unresolved | |
| W6 Consumer integration | ~60% | kstop→sessiond path works | |
| W7 Validation closure | Open | No bare-metal validation matrix executed | |

3. IRQ / PCI Reassessment

3.1 Architecture

PCI Device → MSI/MSI-X message (address 0xFEE0_0xxx + data)
    │
    ▼
APIC (local or I/O) → Vector delivery to target CPU
    │
    ▼
Kernel IDT → generic_irq handler (vec 32–255)
    │
    ▼
scheme/irq.rs → irq_trigger(irq, token)
    ├── iommu_validate_msi_irq(irq)  ← STUB: returns true unconditionally
    ├── increment COUNTS[irq]
    ├── walk HANDLES for matching fd
    └── trigger EVENT_READ
    │
    ▼
Userspace driver → IrqHandle::wait() returns with count
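The MSI message at the top of this flow can be composed roughly as below. This is a minimal sketch assuming physical destination mode, fixed delivery mode and edge triggering; `compose_msi`, `MsiMessage` and `MsiError` are illustrative names and not the API of msi.rs.

```rust
/// x86 MSI message as written into a device's MSI/MSI-X entry.
#[derive(Debug, PartialEq, Eq)]
struct MsiMessage {
    address: u32,
    data: u32,
}

#[derive(Debug, PartialEq, Eq)]
enum MsiError {
    /// Vectors 0..=31 are reserved for CPU exceptions.
    ReservedVector(u8),
    /// The non-remapped MSI format carries only an 8-bit destination ID.
    DestinationNeedsRemapping(u32),
}

/// Hypothetical helper: builds the address/data pair for one vector.
fn compose_msi(apic_id: u32, vector: u8) -> Result<MsiMessage, MsiError> {
    if vector < 32 {
        return Err(MsiError::ReservedVector(vector));
    }
    if apic_id > 0xFF {
        return Err(MsiError::DestinationNeedsRemapping(apic_id));
    }
    // Address: 0xFEE in bits 31:20, destination ID in bits 19:12.
    // Redirection hint and destination mode stay 0 (physical, no hint).
    let address = 0xFEE0_0000 | (apic_id << 12);
    // Data: vector in bits 7:0; delivery mode 000 (fixed), edge-triggered.
    let data = u32::from(vector);
    Ok(MsiMessage { address, data })
}
```

Rejecting destinations above 0xFF mirrors the irq_helpers.rs FIXME about cpu_id > 255: such targets cannot be expressed without IOMMU interrupt remapping.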

3.2 What Is Working

| Component | File | Evidence |
| --- | --- | --- |
| IDT (256 entries) | arch/x86_shared/idt.rs | 224 generic vectors, legacy IRQ bindings, IPI handlers, 374 lines |
| 8259 PIC | arch/x86_shared/device/pic.rs | Master/slave init, mask, ack, ISR query, 98 lines |
| I/O APIC | arch/x86_shared/device/ioapic.rs | MADT-parsed, GSI resolution, affinity reprogramming, 502 lines |
| LAPIC/x2APIC | arch/x86_shared/device/local_apic.rs | MMIO + MSR dual path, IPI, EOI, ESR, 312 lines |
| IRQ dispatch | arch/x86_shared/interrupt/irq.rs | PIC/APIC switching, spurious accounting, 352 lines |
| IRQ scheme | scheme/irq.rs | Registration, delivery, affinity, per-CPU listing, 650 lines |
| MSI kernel code | arch/x86_shared/device/msi.rs | Message composition, validation, capability parsing, 183 lines |
| Vector allocator | arch/x86_shared/device/vector.rs | CAS bitmap for 224 vectors, 53 lines |
| redox-driver-sys IRQ | redox-driver-sys/src/irq.rs | MSI-X table mapping, vector allocation, affinity, 491 lines, zero TODOs |
| redox-driver-sys PCI | redox-driver-sys/src/pci.rs | Config space, BAR probing, MSI-X enable, 1446 lines, zero TODOs |
| pcid daemon | drivers/pcid/src/ | Enumeration, scheme:pci, driver spawn, ~1400 lines total |
| driver-manager | driver-manager/src/main.rs | PciBus + AcpiBus binding, boot timeline, 553 lines |
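The "CAS bitmap for 224 vectors" entry describes a lock-free allocation technique that can be sketched as follows. This is an illustration of the general approach using `std` atomics, not the contents of vector.rs; `VectorBitmap` and its methods are hypothetical names.

```rust
use std::sync::atomic::{AtomicU64, Ordering};

const FIRST_VECTOR: u8 = 32; // vectors below 32 are CPU exceptions
const VECTOR_COUNT: usize = 224; // vectors 32..=255

/// Lock-free bitmap: bit i set means vector (32 + i) is in use.
struct VectorBitmap {
    // 4 * 64 = 256 bits, of which only the first 224 are handed out.
    words: [AtomicU64; 4],
}

impl VectorBitmap {
    fn new() -> Self {
        Self {
            words: [AtomicU64::new(0), AtomicU64::new(0), AtomicU64::new(0), AtomicU64::new(0)],
        }
    }

    /// Claims the lowest free vector with a compare-and-swap retry loop.
    fn allocate(&self) -> Option<u8> {
        for (w, word) in self.words.iter().enumerate() {
            let mut current = word.load(Ordering::Relaxed);
            loop {
                let bit = (!current).trailing_zeros() as usize;
                let index = w * 64 + bit;
                if bit == 64 || index >= VECTOR_COUNT {
                    break; // this word is full, or only padding bits remain
                }
                let wanted = current | (1u64 << bit);
                match word.compare_exchange_weak(current, wanted, Ordering::AcqRel, Ordering::Relaxed) {
                    Ok(_) => return Some(FIRST_VECTOR + index as u8),
                    Err(observed) => current = observed, // lost a race; retry
                }
            }
        }
        None
    }

    /// Releases a previously allocated vector; out-of-range input is ignored.
    fn free(&self, vector: u8) {
        if vector < FIRST_VECTOR {
            return;
        }
        let index = usize::from(vector - FIRST_VECTOR);
        self.words[index / 64].fetch_and(!(1u64 << (index % 64)), Ordering::AcqRel);
    }
}
```

Because every state change is a single atomic operation on one word, callers on different CPUs never need a lock, and a failed compare-and-swap simply retries against the freshly observed value.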

3.3 Critical Stubs

| Location | Line | Issue | Severity |
| --- | --- | --- | --- |
| kernel/src/scheme/irq.rs | 231 | iommu_validate_msi_irq(_irq) -> bool { true }: zero IOMMU validation | 🔴 CRITICAL |
| kernel/src/arch/x86_shared/device/local_apic.rs | 81 | //self.setup_timer(); leaves the APIC timer disabled | 🟠 HIGH |
| kernel/src/arch/x86_shared/interrupt/irq.rs | 307 | println!("Local apic timer interrupt"): debug artifact | 🟡 MEDIUM |
| kernel/src/arch/x86_shared/device/ioapic.rs | 329–331 | .unwrap() on cpuid: panic risk | 🟡 MEDIUM |
| drivers/pcid/src/driver_interface/irq_helpers.rs | | "FIXME for cpu_id >255 need IOMMU IRQ remapping" | 🟠 HIGH |
| drivers/pcid/src/driver_interface/irq_helpers.rs | | "FIXME allow allocating multiple interrupt vectors" | 🟠 HIGH |

3.4 Patch-Backed Code

The following kernel code does not exist in upstream — it is entirely Red Bear patches:

  • msi.rs (+183 lines) — added by P8-msi.patch (281 lines, 12 hunks)
  • vector.rs (+53 lines) — added by P8-msi.patch
  • IOAPIC affinity — P9-ioapic-irq-affinity.patch
  • IRQ affinity wiring — P10-irq-affinity-wiring.patch
  • x2APIC ICR fix — P20-x2apic-icr-mode-fix.patch
  • x2APIC SMP fix — P21-x2apic-smp-fix.patch
  • x2APIC MADT fallback — P22-x2apic-madt-fallback.patch

Risk: If upstream kernel rebases, these patches must be rebased. The MSI/MSI-X subsystem is entirely patch-dependent.

3.5 Alignment with IRQ Enhancement Plan

The plan reports all 6 waves as complete. Code inspection confirms that the waves addressed panic hardening and code quality. However, 6 priority areas remain entirely open, and the plan does not flag the following:

  • iommu_validate_msi_irq() stub (CRITICAL — not mentioned)
  • APIC timer disabled (not mentioned)
  • Single-vector-per-device limit (mentioned as FIXME but not prioritized)

4. Enumeration / Driver Binding Reassessment

4.1 Current Flow

pcid enumerates PCI bus → /scheme/pci/{segment}--{bus}--{device}.{function}/
    │
    ▼
driver-manager (or pcid-spawner legacy) reads /scheme/pci/
    │
    ▼
For each device: query config space (vendor, device, class, subclass)
    │
    ▼
Match against driver config (PCI class/vendor/device ID lookup)
    │
    ▼
Spawn driver daemon with PCID_CLIENT_CHANNEL env var
    │
    ▼
Driver opens /scheme/pci/{addr}/config and /scheme/irq/{irq}
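The matching step in this flow can be sketched as a rule table with wildcard fields. This is a minimal sketch under stated assumptions: `PciIdentity`, `DriverRule` and `select_driver` are hypothetical names, the weighting that prefers vendor/device matches over class matches is an illustrative choice, and prog-if is left out because the flow above lists only vendor, device, class and subclass.

```rust
#[derive(Debug, Clone, Copy)]
struct PciIdentity {
    vendor: u16,
    device: u16,
    class: u8,
    subclass: u8,
}

/// One entry of a driver config. `None` acts as a wildcard.
struct DriverRule {
    name: &'static str,
    vendor: Option<u16>,
    device: Option<u16>,
    class: Option<u8>,
    subclass: Option<u8>,
}

fn field_ok<T: PartialEq>(rule: Option<T>, actual: T) -> bool {
    rule.map_or(true, |wanted| wanted == actual)
}

impl DriverRule {
    fn matches(&self, dev: &PciIdentity) -> bool {
        field_ok(self.vendor, dev.vendor)
            && field_ok(self.device, dev.device)
            && field_ok(self.class, dev.class)
            && field_ok(self.subclass, dev.subclass)
    }

    /// ID fields weigh more than class fields, so an exact vendor/device
    /// rule beats a generic class rule when both match.
    fn specificity(&self) -> usize {
        4 * usize::from(self.vendor.is_some())
            + 4 * usize::from(self.device.is_some())
            + usize::from(self.class.is_some())
            + usize::from(self.subclass.is_some())
    }
}

/// Returns the name of the most specific matching driver, if any.
fn select_driver(rules: &[DriverRule], dev: &PciIdentity) -> Option<&'static str> {
    rules
        .iter()
        .filter(|rule| rule.matches(dev))
        .max_by_key(|rule| rule.specificity())
        .map(|rule| rule.name)
}
```

A device that matches no rule yields `None`, which corresponds to the "limited device coverage" outcome: nothing is spawned for it.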

4.2 Limitations

  1. No ACPI _HID/_CID matching: Non-PCI devices (ACPI-enumerated GPIO, I2C, etc.) are not bound through the driver-manager.
  2. No modalias generation: Drivers are matched by simple class-code or vendor/device ID — no automatic alias generation from PCI class/subclass/prog-if.
  3. LegacyBackend is a stub: hwd/backend/legacy.rs:13 — "TODO: handle driver spawning from legacy backend" — any non-ACPI, non-DTB platform gets no hardware discovery.
  4. Initfs transitional: hwd and acpid live on initfs boot path, not under stable rootfs service contract.

4.3 Alignment with Boot-Process-Hardware-Detection-Plan.md

| Wave | Plan Status | Code Reality |
| --- | --- | --- |
| W0 Boot stage definitions | Done | Config-only |
| W1 ACPI bus in driver-manager | Done | AcpiBus exists |
| W2 Resource parser (_CRS, _PRT) | Done | Parsed |
| W2b ACPI device binding | Done | Wired |
| W2c GPIO/I2C configs | Partial | Runtime _CRS evaluation not started |
| W3 Service rewiring | Done | Stage targets wired |
| W4 Dead /etc/pcid.d/ removal | Done | Removed |
| W5 Deferred probing | Already had | Scheme-aware |
| W6 USB topology enumeration | Not started | Depends on xHCI IRQ stability |

5. Driver Infrastructure Reassessment

5.1 redox-driver-sys

Status: Production quality, zero stubs, zero TODOs

  • Schemes: memory (physical mapping, cache type control), irq (registration, wait, affinity), pci (enumeration, config space, BARs, MSI-X)
  • Quirks: 3-layer (compiled-in 11 entries + TOML runtime + DMI/SMBIOS 8 rules), 22 PCI flags, 21 USB flags
  • MSI-X: Full MsixTable with validated x86 message programming, vector allocation, CPU round-robin
  • DMA: DmaBuffer (phys-contiguous), IommuDmaAllocator (MAP/UNMAP protocol)
  • Tests: 30+ unit tests in pci.rs

5.2 linux-kpi

Status: Structurally complete for GPU + Wi-Fi, 119 tests passing, zero stubs

  • 17 Rust modules, 32 C headers
  • Full implementations: pci (777 lines), net (809), wireless (1002), mac80211 (959), irq (228), firmware (277), drm_shim (374)
  • No todo!()/unimplemented!() in any audited module
  • C header coverage: pci.h, skbuff.h, interrupt.h, firmware.h, netdevice.h, ieee80211.h, nl80211.h, cfg80211.h, mac80211.h, drm*.h, atomic.h, spinlock.h, mutex.h, workqueue.h, timer.h, wait.h, list.h, slab.h, mm.h, io.h, types.h, errno.h, compiler.h, export.h, printk.h, module.h, refcount.h, jiffies.h, kernel.h, idr.h, bug.h

5.3 firmware-loader

Status: Production quality

  • scheme:firmware daemon with SchemeSync impl
  • MANIFEST generation (BLAKE3), --probe, --request-nowait
  • Path traversal prevention, 64MB blob cap, cache with source signature validation
  • AMD GPU: 17 firmware keys expected; Intel: per-generation DMC firmware
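The path traversal prevention and blob cap listed above can be illustrated with a small sketch. It assumes the cap means 64 MiB inclusive and uses a made-up firmware root in the usage; `resolve_firmware_path`, `check_blob_size` and `FirmwareError` are hypothetical names, not the firmware-loader API.

```rust
use std::path::{Component, Path, PathBuf};

const MAX_BLOB_BYTES: u64 = 64 * 1024 * 1024; // assumed: 64 MiB, inclusive

#[derive(Debug, PartialEq, Eq)]
enum FirmwareError {
    UnsafePath,
    TooLarge(u64),
}

/// Joins a requested firmware key onto the firmware root, rejecting anything
/// that could escape it: absolute paths, `..` components and empty requests.
fn resolve_firmware_path(root: &Path, request: &str) -> Result<PathBuf, FirmwareError> {
    let mut resolved = root.to_path_buf();
    let mut depth = 0usize;
    for component in Path::new(request).components() {
        match component {
            Component::Normal(part) => {
                resolved.push(part);
                depth += 1;
            }
            Component::CurDir => {}
            // RootDir, ParentDir and Windows prefixes are all refused.
            _ => return Err(FirmwareError::UnsafePath),
        }
    }
    if depth == 0 {
        return Err(FirmwareError::UnsafePath);
    }
    Ok(resolved)
}

/// Enforces the blob size cap before any data is read into memory.
fn check_blob_size(len: u64) -> Result<(), FirmwareError> {
    if len > MAX_BLOB_BYTES {
        Err(FirmwareError::TooLarge(len))
    } else {
        Ok(())
    }
}
```

Checking components rather than searching the string for ".." avoids false positives on names such as `fw..bin` while still refusing every form of upward traversal.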

5.4 GPU Drivers

| Driver | Status | Key Gap |
| --- | --- | --- |
| redox-drm (AMD) | 🟡 Compiles, 616 lines | synthetic_edid() fallback; no real DDC/I²C |
| redox-drm (Intel) | 🟡 Compiles, 693 lines | synthetic_edid() fallback; no real DDC/I²C |
| redox-drm (VirtIO) | 🟡 Compiles | synthetic_edid() fallback |
| amdgpu (C port) | 🟡 Compiles, ~1487 lines | Hardcoded 4 connector descriptors, no real HPD |

All three redox-drm drivers (AMD, Intel, VirtIO) use synthetic_edid() at redox-drm/src/kms/connector.rs:35, a hardcoded 128-byte EDID 1.4 block for 1920×1080@60Hz. This blocks real display detection on bare metal.

5.5 Wi-Fi

Status: 🟡 Compiles + host-tested, zero hardware validation

  • redbear-iwlwifi: C transport layer (~2450 lines) + Rust daemon (~1550 lines)
  • 8 host tests pass
  • Commands time out without real firmware — by design
  • No Intel Wi-Fi device ever exercised

5.6 USB

Status: 🟡 xhcid builds + QEMU proofs pass, bare-metal incomplete

  • xhcid: Red Bear patched, QEMU IRQ delivery proven
  • usbscsid: USB mass storage with inline quirks (214 storage quirks)
  • usbhubd: Hub port management
  • Gap: No EHCI, UHCI, or OHCI drivers — legacy USB keyboards on companion controllers are unreachable on bare metal

6. Cross-Cutting Critical Gaps (Updated Priority)

Gap 1 — IOMMU MSI Validation (CRITICAL)

File: kernel/src/scheme/irq.rs:231

fn iommu_validate_msi_irq(_irq: u8) -> bool {
    true
}

Every MSI/MSI-X interrupt bypasses IOMMU remapping validation. This is a security and correctness gap. The hook exists but has zero logic.

Root cause: IOMMU daemon (iommu) provides AMD-Vi runtime but no Intel VT-d. The validation function needs remapping table data from the IOMMU daemon, or validation must move to userspace via a scheme call.

Action: Implement real validation against IOMMU remapping tables, or explicitly document that MSI/MSI-X without IOMMU is only safe on trusted buses.
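One possible shape for the real validation is sketched below. It assumes the kernel can obtain a snapshot of which IRQs have a remapping entry and that a system without an active IOMMU is treated as a trusted bus; `RemapSnapshot` and its methods are hypothetical and do not exist in the current code.

```rust
/// Hypothetical view of the interrupt remapping state that the kernel would
/// need from the IOMMU daemon. Bit n of `remapped` set means IRQ n has a
/// valid remapping entry.
struct RemapSnapshot {
    iommu_active: bool,
    remapped: [u64; 4],
}

impl RemapSnapshot {
    fn new(iommu_active: bool) -> Self {
        Self { iommu_active, remapped: [0; 4] }
    }

    /// Records that the IOMMU daemon programmed an entry for `irq`.
    fn mark_remapped(&mut self, irq: u8) {
        self.remapped[usize::from(irq) / 64] |= 1u64 << (irq % 64);
    }

    /// Candidate replacement for the unconditional `true`.
    fn validate_msi_irq(&self, irq: u8) -> bool {
        if !self.iommu_active {
            // Assumed policy: without an IOMMU the bus is treated as trusted,
            // which is the documented-fallback option from the action above.
            return true;
        }
        // Fail closed: only IRQs with a remapping entry are delivered.
        self.remapped[usize::from(irq) / 64] & (1u64 << (irq % 64)) != 0
    }
}
```

The key property is that an active IOMMU makes the check fail closed, so an interrupt without a remapping entry is dropped instead of being delivered to a driver.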

Gap 2 — AML Physical Memory Stubs (CRITICAL)

Files: acpid/src/aml_physmem.rs:195, :274

  • read_phys_or_fault() returns T::zero() on failure — fabricates data
  • map_physical_region() falls back to zero page — silent data loss

Impact: Any AML method accessing a physical memory region that fails to map will see fabricated zeroes. This can cause:

  • Incorrect battery/thermal readings
  • Silent EC communication failures
  • Wrong power state transitions

Action: Propagate Result<T> errors to AML evaluation callers instead of fabricating values.
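The proposed change from fabricated zeroes to propagated errors could look like the sketch below. `PhysWindow` is a stand-in for the real mapping layer, and `PhysmemError` and `read_masked_field` are hypothetical names; the actual handler signatures in aml_physmem.rs are not reproduced here.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum PhysmemError {
    Unmapped { addr: usize, len: usize },
}

/// Stand-in for the mapping layer: one readable window of "physical" memory.
struct PhysWindow {
    base: usize,
    bytes: Vec<u8>,
}

impl PhysWindow {
    /// Reads a little-endian u32, or reports which access failed.
    fn read_u32(&self, addr: usize) -> Result<u32, PhysmemError> {
        let err = PhysmemError::Unmapped { addr, len: 4 };
        let start = addr.checked_sub(self.base).ok_or(err)?;
        let end = start.checked_add(4).ok_or(err)?;
        let raw = self.bytes.get(start..end).ok_or(err)?;
        Ok(u32::from_le_bytes([raw[0], raw[1], raw[2], raw[3]]))
    }
}

/// Caller on the AML side: `?` forwards the failure to the interpreter
/// instead of continuing with a made-up value.
fn read_masked_field(mem: &PhysWindow, addr: usize, mask: u32) -> Result<u32, PhysmemError> {
    Ok(mem.read_u32(addr)? & mask)
}
```

With this shape a failed mapping is distinguishable from a register that genuinely reads as zero, which is exactly the distinction the current stubs erase.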

Gap 3 — Kernel Sleep Path PCI Stubs (CRITICAL)

File: kernel/src/arch/x86_shared/sleep.rs:257–276

  • read_pci_u8/u16/u32 always return 0
  • write_pci_* are no-ops

Impact: Any AML code using PCI config space access in the kernel S3/S5 sleep path gets fabricated values. This is only safe if the sleep path guarantees no PCI-dependent AML methods are evaluated.

Action: Either wire real PCI config space access in the kernel sleep path, or explicitly scope the kernel AML interpreter to exclude PCI-dependent methods.
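If real access is wired in, the address arithmetic for legacy PCI configuration mechanism #1 (ports 0xCF8/0xCFC) is sketched below. This covers only segment 0 and only the pure computation; whether port I/O is acceptable in the sleep path is an open design question, and the function names are illustrative.

```rust
/// Builds the value written to CONFIG_ADDRESS (port 0xCF8) for PCI
/// configuration mechanism #1. Returns None for out-of-range inputs.
fn pci_config_address(bus: u8, device: u8, function: u8, offset: u8) -> Option<u32> {
    if device > 31 || function > 7 {
        return None;
    }
    Some(
        0x8000_0000 // enable bit
            | (u32::from(bus) << 16)
            | (u32::from(device) << 11)
            | (u32::from(function) << 8)
            | u32::from(offset & 0xFC), // dword-aligned register offset
    )
}

/// Extracts the addressed byte from the dword read at CONFIG_DATA (0xCFC).
fn extract_u8(dword: u32, offset: u8) -> u8 {
    (dword >> (8 * u32::from(offset & 3))) as u8
}

/// Extracts the addressed 16-bit word (offset must be 2-byte aligned).
fn extract_u16(dword: u32, offset: u8) -> u16 {
    (dword >> (8 * u32::from(offset & 2))) as u16
}
```

A `read_pci_u8` built on these helpers would write the computed address to 0xCF8, read a dword from 0xCFC and then call `extract_u8`, replacing the current constant 0.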

Gap 4 — APIC Timer Disabled (HIGH)

File: kernel/src/arch/x86_shared/device/local_apic.rs:81

  • setup_timer() commented out
  • System uses PIT fallback for all timer interrupts

Impact: No per-CPU timer interrupts (all CPUs share PIT on BSP), no TSC deadline mode for modern CPUs, potential timer skew on SMP.

Action: Re-enable APIC timer with calibration against PIT or TSC. Required for per-CPU timer distribution.
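The calibration arithmetic behind this action can be sketched independently of the hardware access. The sketch assumes the LAPIC timer is left free-running while a reference timer (PIT or TSC) measures a known interval; `Calibration`, `calibrate` and `initial_count` are hypothetical names.

```rust
/// Result of counting LAPIC timer ticks across a known reference interval.
#[derive(Debug, PartialEq, Eq)]
struct Calibration {
    ticks_per_ms: u64,
}

/// `elapsed_ticks` is how far the LAPIC current-count register moved while
/// the reference timer measured `reference_us` microseconds.
fn calibrate(elapsed_ticks: u64, reference_us: u64) -> Option<Calibration> {
    if reference_us == 0 {
        return None;
    }
    let ticks_per_ms = elapsed_ticks.checked_mul(1000)? / reference_us;
    if ticks_per_ms == 0 {
        // Timer did not count: the caller should keep the PIT fallback.
        return None;
    }
    Some(Calibration { ticks_per_ms })
}

impl Calibration {
    /// Initial-count value for a periodic tick every `period_us` microseconds,
    /// or None if it is zero or does not fit the 32-bit register.
    fn initial_count(&self, period_us: u64) -> Option<u32> {
        let ticks = self.ticks_per_ms.checked_mul(period_us)? / 1000;
        if ticks == 0 || ticks > u64::from(u32::MAX) {
            None
        } else {
            Some(ticks as u32)
        }
    }
}
```

Returning `None` on a failed calibration gives the boot path an explicit signal to stay on the PIT, in line with the mitigation listed for risk R3.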

Gap 5 — Synthetic EDID in All GPU Drivers (HIGH)

File: redox-drm/src/kms/connector.rs:35

  • All three drivers (AMD, Intel, VirtIO) use hardcoded EDID
  • No real DDC/I²C display detection

Impact: Display will not work on bare metal with non-1080p panels, multi-monitor setups, or displays with non-standard timings.

Action: Implement EDID retrieval over DDC (I²C) in redox-drm, or at minimum implement a real connector detection path that queries HPD + DDC before falling back to synthetic.
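The "real first, synthetic as fallback" ordering can be sketched with the standard EDID base-block checks (fixed 8-byte header, 128-byte length, checksum summing to zero). The bus read itself is out of scope here and is represented by an `Option`; `edid_block_is_valid`, `choose_edid` and `EdidSource` are hypothetical names.

```rust
const EDID_BLOCK_LEN: usize = 128;
const EDID_HEADER: [u8; 8] = [0x00, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0x00];

/// Checks length, the fixed header and the block checksum (sum == 0 mod 256).
fn edid_block_is_valid(block: &[u8]) -> bool {
    block.len() == EDID_BLOCK_LEN
        && block[..8] == EDID_HEADER
        && block.iter().fold(0u8, |acc, b| acc.wrapping_add(*b)) == 0
}

#[derive(Debug, PartialEq, Eq)]
enum EdidSource {
    Ddc,
    Synthetic,
}

/// Prefers a valid block read from the display; otherwise uses the
/// synthetic block and reports that it did so.
fn choose_edid(ddc_read: Option<Vec<u8>>, synthetic: Vec<u8>) -> (EdidSource, Vec<u8>) {
    match ddc_read {
        Some(block) if edid_block_is_valid(&block) => (EdidSource::Ddc, block),
        _ => (EdidSource::Synthetic, synthetic),
    }
}
```

Returning the source alongside the data lets the connector code log or expose whether a mode list came from the panel or from the hardcoded block.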

Gap 6 — Dual AML Interpreters (HIGH)

Files: kernel/src/arch/x86_shared/sleep.rs (acpi_ext crate) + acpid/src/acpi.rs (acpi crate)

  • Two independent parsers for the same DSDT/SSDT
  • Different handler implementations (kernel has PCI stubs, userspace has physmem stubs)
  • Bug fixes in one do not affect the other

Impact: Maintenance risk, correctness divergence, two surfaces for AML security issues.

Action: Converge on a single canonical interpreter. Recommendation: userspace (acpid) since all drivers are userspace per project model. Kernel sleep path should delegate to userspace or use a shared, read-only AML namespace.

Gap 7 — No EHCI/UHCI/OHCI Drivers (HIGH)

Impact: Legacy USB keyboards on companion controller paths unreachable on bare metal. Only xHCI-native USB devices work.

Action: Implement EHCI driver (highest priority — covers most USB 2.0 controllers with xHCI companion). UHCI/OHCI are lower priority (very old hardware).
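For binding purposes, the four USB host controller types are distinguished by the PCI programming interface byte under class 0x0C, subclass 0x03. The sketch below shows that classification; `UsbHostController`, `classify_usb_controller` and `has_driver_today` are illustrative names, and the last one simply encodes the statement above that only xHCI is covered.

```rust
#[derive(Debug, PartialEq, Eq)]
enum UsbHostController {
    Uhci,
    Ohci,
    Ehci,
    Xhci,
}

/// Classifies a PCI function as a USB host controller from its class code.
/// Class 0x0C is serial bus, subclass 0x03 is USB; prog-if picks the type.
fn classify_usb_controller(class: u8, subclass: u8, prog_if: u8) -> Option<UsbHostController> {
    if class != 0x0C || subclass != 0x03 {
        return None;
    }
    match prog_if {
        0x00 => Some(UsbHostController::Uhci),
        0x10 => Some(UsbHostController::Ohci),
        0x20 => Some(UsbHostController::Ehci),
        0x30 => Some(UsbHostController::Xhci),
        _ => None,
    }
}

/// Reflects the current state described in this document: only xHCI
/// controllers have a driver (xhcid).
fn has_driver_today(controller: &UsbHostController) -> bool {
    matches!(controller, UsbHostController::Xhci)
}
```

A binding rule for a future EHCI driver would therefore need to match on prog-if 0x20 in addition to class and subclass, since all four controller types share the same class code.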

Gap 8 — No C-State Kernel Backend (HIGH)

Impact: CPUs run at full frequency constantly on bare metal; hardware thermal throttling is the only thing limiting heat.

Action: Implement cpuidle/cpufreq kernel backend using MWAIT or HLT. Discovery exists in acpid (cstate.rs) but kernel has no idle driver.
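The selection policy of such an idle backend could be as simple as the sketch below: pick the deepest discovered state whose exit latency fits a budget, and fall back to HLT otherwise. The types and the latency-budget policy are assumptions for illustration; they are not the acpid cstate.rs data model.

```rust
/// Hypothetical summary of one discovered C-state.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
struct CState {
    level: u8,
    exit_latency_us: u32,
}

#[derive(Debug, PartialEq, Eq)]
enum IdleAction {
    /// Plain HLT: always available, shallowest idle.
    Hlt,
    /// MWAIT targeting the given C-state level.
    Mwait { level: u8 },
}

/// Chooses the deepest state that can wake up within `latency_budget_us`.
fn pick_idle_action(states: &[CState], latency_budget_us: u32, mwait_supported: bool) -> IdleAction {
    if !mwait_supported {
        return IdleAction::Hlt;
    }
    states
        .iter()
        .filter(|state| state.exit_latency_us <= latency_budget_us)
        .max_by_key(|state| state.level)
        .map(|state| IdleAction::Mwait { level: state.level })
        .unwrap_or(IdleAction::Hlt)
}
```

Keeping HLT as the unconditional fallback means the idle path still works on CPUs without MWAIT and when acpid has not yet published any states.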

Gap 9 — DMAR Orphaned (MEDIUM)

File: acpid/src/acpi.rs:545

  • 533 lines of Intel VT-d parsing code
  • Dmar::init() commented out — "hangs on real hardware"

Action: Either fix the hang and assign a runtime owner (iommu daemon), or remove the orphaned code until ready.

Gap 10 — >256 CPU MSI Remapping (MEDIUM)

File: drivers/pcid/src/driver_interface/irq_helpers.rs

  • 8-bit APIC destination field limits MSI target selection
  • IOMMU interrupt remapping required for >256 CPUs

Action: Gated on IOMMU maturity (Gap 1).


7. Updated Execution Plan

Phase 1: Critical Stub Removal (2–3 weeks)

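Goal: Remove the CRITICAL-severity ACPI stubs (Gaps 2 and 3) before any hardware validation. The remaining CRITICAL stub, iommu_validate_msi_irq() (Gap 1), is handled in Phase 2.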

| # | Task | File | Effort | Owner |
| --- | --- | --- | --- | --- |
| 1.1 | Fix read_phys_or_fault() zero-return | acpid/src/aml_physmem.rs:195 | 2 days | |
| 1.2 | Fix map_physical_region() zero-page fallback | acpid/src/aml_physmem.rs:274 | 2 days | |
| 1.3 | Fix kernel sleep path PCI read stubs | kernel/src/arch/x86_shared/sleep.rs:257–276 | 3 days | |
| 1.4 | Document kernel PCI stub scope | sleep.rs | 1 day | |
| 1.5 | Remove println! debug artifact | kernel/src/arch/x86_shared/interrupt/irq.rs:307 | 1 hour | |

Gate: CRITICAL stubs from Gaps 2 and 3 removed (or, for the sleep-path PCI stubs, explicitly scoped out) + cargo check clean on affected modules.

Phase 2: IOMMU + MSI Validation (3–4 weeks)

Goal: Make MSI/MSI-X delivery trustworthy.

| # | Task | File | Effort | Owner |
| --- | --- | --- | --- | --- |
| 2.1 | Implement iommu_validate_msi_irq() real logic | kernel/src/scheme/irq.rs:231 | 1 week | |
| 2.2 | Wire IOMMU remapping table read into kernel | iommu daemon ↔ scheme/irq | 1 week | |
| 2.3 | QEMU validation: MSI-X with IOMMU enabled | test-msix-qemu.sh | 2 days | |
| 2.4 | Fix or remove orphaned DMAR code | acpid/src/acpi.rs:545 | 2 days | |

Gate: test-msix-qemu.sh passes with IOMMU enabled + no iommu_validate_msi_irq() stub.

Phase 3: Timer + CPU Power (2–3 weeks)

Goal: Enable per-CPU timers and basic CPU idle.

| # | Task | File | Effort | Owner |
| --- | --- | --- | --- | --- |
| 3.1 | Re-enable APIC timer with calibration | kernel/src/arch/x86_shared/device/local_apic.rs:81 | 3 days | |
| 3.2 | Implement kernel cpuidle backend (MWAIT/HLT) | New file: kernel/src/arch/x86_shared/cpuidle.rs | 1 week | |
| 3.3 | Wire acpid C-state discovery to kernel idle | acpid/src/cstate.rs → kernel | 3 days | |
| 3.4 | QEMU validation: timer + idle | test-timer-qemu.sh | 2 days | |

Gate: test-timer-qemu.sh passes with APIC timer + CPU idle active.

Phase 4: Display Detection (4–6 weeks)

Goal: Replace synthetic EDID with real display detection.

| # | Task | File | Effort | Owner |
| --- | --- | --- | --- | --- |
| 4.1 | Implement EDID retrieval over DDC (I²C) | redox-drm/src/kms/ddc.rs (new) | 2 weeks | |
| 4.2 | Wire HPD interrupt to connector detection | redox-drm/src/drivers/amd/mod.rs, intel/mod.rs | 1 week | |
| 4.3 | Replace synthetic_edid() with real → fallback | redox-drm/src/kms/connector.rs:35 | 3 days | |
| 4.4 | QEMU validation: EDID readback | test-drm-display-runtime.sh | 2 days | |
| 4.5 | Bare-metal validation: AMD GPU display | test-amd-gpu.sh | 1 week | |
| 4.6 | Bare-metal validation: Intel GPU display | test-intel-gpu.sh | 1 week | |

Gate: Real EDID retrieved from at least one display on bare metal (AMD or Intel).

Phase 5: USB Legacy Controllers (3–4 weeks)

Goal: Enable USB keyboard on non-xHCI paths.

| # | Task | File | Effort | Owner |
| --- | --- | --- | --- | --- |
| 5.1 | Implement EHCI host controller driver | local/recipes/drivers/ehcid/ (new) | 2 weeks | |
| 5.2 | Wire EHCI into driver-manager PCI binding | driver-manager/src/main.rs | 3 days | |
| 5.3 | QEMU validation: EHCI keyboard | test-usb-qemu.sh | 2 days | |
| 5.4 | UHCI/OHCI assessment | | 1 week | |

Gate: USB keyboard works via EHCI in QEMU.

Phase 6: AML Convergence (3–4 weeks)

Goal: Resolve dual AML interpreter risk.

| # | Task | File | Effort | Owner |
| --- | --- | --- | --- | --- |
| 6.1 | Evaluate kernel sleep.rs → userspace delegation | kernel/src/arch/x86_shared/sleep.rs | 1 week | |
| 6.2 | Implement kernel→userspace S3/S5 sleep RPC | scheme/kernel.acpi/sleep – acpid | 1 week | |
| 6.3 | Remove kernel acpi_ext crate if delegated | kernel/src/arch/x86_shared/sleep.rs | 3 days | |
| 6.4 | QEMU validation: sleep/wake cycle | test-sleep-qemu.sh | 2 days | |

Gate: S5 shutdown works with single AML interpreter (userspace only).

Phase 7: Hardware Validation Matrix (4–6 weeks, parallel with Phases 4–6)

Goal: Evidence-based support claims.

| # | Task | Hardware | Effort |
| --- | --- | --- | --- |
| 7.1 | Class A1 validation (AMD desktop + discrete GPU) | Ryzen 5000/7000 + AMD GPU | 1 week |
| 7.2 | Class A2 validation (Intel desktop + iGPU) | Core 12th–14th Gen | 1 week |
| 7.3 | Class A3 validation (AMD laptop) | Ryzen Mobile | 1 week |
| 7.4 | Class A4 validation (Intel laptop) | Core Mobile | 1 week |
| 7.5 | Regression test suite on all 4 classes | All | 2 weeks |

Gate: All 4 hardware classes pass boot, shutdown, USB keyboard, and display detection.


8. Timeline Synthesis

Week  1–3:  Phase 1 - Critical stub removal
Week  4–7:  Phase 2 - IOMMU + MSI validation
Week  7–9:  Phase 3 - Timer + CPU power (parallel with Phase 2 week 7)
Week 10–15: Phase 4 - Display detection (parallel with Phase 5)
Week 10–13: Phase 5 - USB legacy controllers (parallel with Phase 4)
Week 14–17: Phase 6 - AML convergence
Week 14–19: Phase 7 - Hardware validation matrix (parallel with Phase 6)

Total: 19 weeks (≈4.5 months) with 2 developers

What the existing plans said vs this plan

| Plan | Claimed Timeline | Reality |
| --- | --- | --- |
| COMPREHENSIVE P1 (bare-metal hardening) | 6–8 weeks | Understated: no critical stub removal phase |
| COMPREHENSIVE P2 (USB) | 4–6 weeks | Realistic for EHCI only |
| COMPREHENSIVE P3 (IRQ/IOMMU) | 4–6 weeks | Realistic if focused on Gap 1 only |
| IRQ plan Waves 1–6 | "Complete" | Code quality complete, validation not started |
| ACPI plan Waves 0–7 | W0–W4 partial, W5–W7 open | Accurate, but two critical stubs not flagged |
| SMP plan bottlenecks | 11–18 days | Realistic for B1–B2 only |

Dependencies

Phase 1 (stub removal)
    │
    ├── required by ──► Phase 2 (IOMMU validation)
    │
    ├── required by ──► Phase 3 (timer + idle)
    │
    └── required by ──► Phase 4 (display detection)

Phase 2 (IOMMU)
    └── required by ──► Phase 7 (hardware validation — safe MSI)

Phase 3 (timer + idle)
    └── required by ──► Phase 7 (hardware validation — no overheating)

Phase 4 (display)
    └── required by ──► Phase 7 (hardware validation — working console)

Phase 5 (USB EHCI)
    └── required by ──► Phase 7 (hardware validation — keyboard input)

Phase 6 (AML convergence)
    └── not blocking ──► Phase 7 (can validate with dual interpreters)

9. Risk Register

| # | Risk | Likelihood | Impact | Mitigation |
| --- | --- | --- | --- | --- |
| R1 | aml_physmem stub fix reveals deeper AML memory access issues | Medium | High | Fix with comprehensive error propagation; add fallback to kernel scheme for problematic regions |
| R2 | IOMMU validation implementation requires kernel ABI change | Medium | High | Prototype in userspace first via scheme:iommu call; only promote to kernel if performance requires it |
| R3 | APIC timer calibration fails on specific CPU models | Medium | Medium | Keep PIT fallback path; detect calibration failure and degrade gracefully |
| R4 | DDC/I²C implementation requires GPIO/I2C subsystem not yet built | High | High | Scope Phase 4 to "query EDID via ACPI _DDC method first, then direct I²C"; fallback to synthetic still acceptable for initial bring-up |
| R5 | EHCI driver requires IRQ/MSI-X fixes first | Medium | Medium | Phase 5 starts after Phase 2 gate; use legacy IRQ for EHCI if MSI-X not ready |
| R6 | AML convergence breaks S3 sleep path | Medium | High | Keep kernel sleep.rs as fallback during transition; remove only after S3 validated via userspace path |
| R7 | No bare-metal hardware available for validation | Medium | Critical | Prioritize QEMU proofs for all phases; document "QEMU-validated" vs "bare-metal-validated" per subsystem |

10. Verification Gates

Gate A: Boot-Baseline Ready (end of Phase 1)

  • aml_physmem.rs:195 returns Result<T> instead of T::zero()
  • aml_physmem.rs:274 propagates mapping errors instead of zero-page fallback
  • sleep.rs:257–276 either wired to real PCI or explicitly scoped out
  • cargo check clean on acpid, kernel, redox-drm
  • repo validate-patches kernel passes
  • repo validate-patches base passes

Gate B: IRQ/IOMMU Trustworthy (end of Phase 2)

  • iommu_validate_msi_irq() performs real validation
  • test-msix-qemu.sh passes with IOMMU enabled
  • test-iommu-qemu.sh passes
  • No unconditional true returns in IRQ validation path

Gate C: Timer + Power (end of Phase 3)

  • APIC timer fires and calibrates correctly in QEMU
  • CPU idle backend enters C1/C2 via MWAIT or HLT
  • test-timer-qemu.sh passes
  • No PIT-only fallback in boot log

Gate D: Display Detection (end of Phase 4)

  • synthetic_edid() is fallback, not primary
  • Real EDID retrieved from at least one display in QEMU
  • test-drm-display-runtime.sh passes

Gate E: USB Legacy (end of Phase 5)

  • EHCI driver enumerates devices in QEMU
  • USB keyboard functional via EHCI in QEMU
  • test-usb-qemu.sh passes

Gate F: Single AML Interpreter (end of Phase 6)

  • S5 shutdown works with userspace AML only
  • Kernel acpi_ext crate removed or explicitly deprecated
  • test-sleep-qemu.sh passes (S3 + S5)

Gate G: Hardware Validation (end of Phase 7)

  • Class A1 (AMD desktop) boots, shuts down, displays, accepts USB keyboard
  • Class A2 (Intel desktop) boots, shuts down, displays, accepts USB keyboard
  • Class A3 (AMD laptop) boots, shuts down, displays, accepts USB keyboard
  • Class A4 (Intel laptop) boots, shuts down, displays, accepts USB keyboard
  • Validation artifacts committed to local/docs/HARDWARE-VALIDATION-MATRIX.md

11. Appendix: Key File Reference

ACPI

  • recipes/core/kernel/source/src/acpi/mod.rs — Kernel ACPI orchestrator
  • recipes/core/kernel/source/src/acpi/rsdp.rs — RSDP discovery
  • recipes/core/kernel/source/src/acpi/madt/mod.rs — MADT parser
  • recipes/core/kernel/source/src/scheme/acpi.rs — Kernel ACPI scheme
  • recipes/core/kernel/source/src/arch/x86_shared/sleep.rs — Kernel AML interpreter for sleep
  • recipes/core/kernel/source/src/arch/x86_shared/stop.rs — Shutdown orchestrator
  • recipes/core/base/source/drivers/acpid/src/main.rs — acpid daemon entry
  • recipes/core/base/source/drivers/acpid/src/acpi.rs — Core ACPI context
  • recipes/core/base/source/drivers/acpid/src/aml_physmem.rs — AML physmem handler (stubs at :195, :274)
  • recipes/core/base/source/drivers/acpid/src/ec.rs — Embedded Controller handler
  • recipes/core/base/source/drivers/acpid/src/thermal.rs — Thermal zone discovery
  • recipes/core/base/source/drivers/acpid/src/fan.rs — Fan device discovery
  • recipes/core/base/source/drivers/acpid/src/cstate.rs — C-state discovery
  • recipes/core/base/source/drivers/acpid/src/dmi.rs — SMBIOS DMI parser
  • recipes/core/base/source/drivers/hwd/src/backend/acpi.rs — hwd ACPI backend
  • recipes/core/base/source/drivers/hwd/src/backend/legacy.rs — LegacyBackend stub (:13)

IRQ / PCI

  • recipes/core/kernel/source/src/scheme/irq.rs — IRQ scheme (stub at :231)
  • recipes/core/kernel/source/src/arch/x86_shared/interrupt/irq.rs — IRQ dispatch
  • recipes/core/kernel/source/src/arch/x86_shared/device/ioapic.rs — I/O APIC
  • recipes/core/kernel/source/src/arch/x86_shared/device/local_apic.rs — LAPIC (timer disabled at :81)
  • recipes/core/kernel/source/src/arch/x86_shared/device/msi.rs — MSI code (patch-based)
  • recipes/core/kernel/source/src/arch/x86_shared/device/vector.rs — Vector allocator (patch-based)
  • recipes/core/kernel/source/src/arch/x86_shared/device/pic.rs — 8259 PIC
  • recipes/core/kernel/source/src/arch/x86_shared/idt.rs — IDT setup
  • local/recipes/drivers/redox-driver-sys/source/src/irq.rs — Userspace IRQ handling
  • local/recipes/drivers/redox-driver-sys/source/src/pci.rs — Userspace PCI abstraction
  • recipes/core/base/source/drivers/pcid/src/main.rs — pcid daemon
  • recipes/core/base/source/drivers/pcid/src/scheme.rs — PciScheme
  • recipes/core/base/source/drivers/pcid/src/driver_interface/irq_helpers.rs — IRQ helper FIXMEs
  • local/recipes/system/driver-manager/source/src/main.rs — Driver manager

Driver Infrastructure

  • local/recipes/drivers/redox-driver-sys/source/src/lib.rs — Core library
  • local/recipes/drivers/redox-driver-sys/source/src/quirks/mod.rs — Quirks API
  • local/recipes/drivers/linux-kpi/source/src/lib.rs — linux-kpi crate
  • local/recipes/drivers/linux-kpi/source/src/rust_impl/pci.rs — PCI KPI (777 lines)
  • local/recipes/drivers/linux-kpi/source/src/rust_impl/drm_shim.rs — DRM GEM shim
  • local/recipes/drivers/linux-kpi/source/src/rust_impl/mac80211.rs — mac80211 KPI (959 lines)
  • local/recipes/drivers/linux-kpi/source/src/rust_impl/wireless.rs — cfg80211 KPI (1002 lines)
  • local/recipes/system/firmware-loader/source/src/main.rs — firmware-loader daemon
  • local/recipes/gpu/redox-drm/source/src/main.rs — DRM daemon
  • local/recipes/gpu/redox-drm/source/src/drivers/amd/mod.rs — AMD GPU driver
  • local/recipes/gpu/redox-drm/source/src/drivers/intel/mod.rs — Intel GPU driver
  • local/recipes/gpu/redox-drm/source/src/kms/connector.rs — Connector + synthetic EDID (:35)
  • local/recipes/gpu/amdgpu/source/amdgpu_redox_main.c — Bounded AMD display C port
  • local/recipes/gpu/amdgpu/source/redox_glue.h — Linux→Redox C glue
  • local/recipes/gpu/amdgpu/source/redox_stubs.c — Kernel emulation stubs

Patches

  • local/patches/kernel/redbear-consolidated.patch — Consolidated mega-patch
  • local/patches/kernel/P8-msi.patch — MSI + vector allocator
  • local/patches/kernel/P9-ioapic-irq-affinity.patch — IRQ affinity
  • local/patches/kernel/P10-irq-affinity-wiring.patch — Affinity wiring
  • local/patches/kernel/P20-x2apic-icr-mode-fix.patch — x2APIC ICR
  • local/patches/kernel/P21-x2apic-smp-fix.patch — x2APIC SMP
  • local/patches/kernel/P22-x2apic-madt-fallback.patch — x2APIC MADT fallback
  • local/patches/kernel/P24-cstate-mwait-idle.patch — C-state MWAIT
  • local/patches/kernel/P25-cpuidle-deep-cstates.patch — Deep C-states
  • local/patches/base/P19-acpid-startup-hardening.patch — acpid startup
  • local/patches/base/P24-acpi-s5-derivation-shutdown-semantics.patch — S5 derivation
  • local/patches/base/P44-acpid-thermal-zones.patch — Thermal zones
  • local/patches/base/P48-acpid-fan-support.patch — Fan support
  • local/patches/base/P52-acpid-cstates.patch — C-state discovery

12. Document Authority

This document is a cross-cutting reassessment that references but does not replace the canonical subsystem plans:

  • For ACPI wave-level execution detail, see ACPI-IMPROVEMENT-PLAN.md
  • For IRQ/PCI wave-level execution detail, see IRQ-AND-LOWLEVEL-CONTROLLERS-ENHANCEMENT-PLAN.md
  • For boot detection wave detail, see BOOT-PROCESS-HARDWARE-DETECTION-PLAN.md
  • For SMP bottleneck detail, see SMP-SCHEDULER-IMPROVEMENT-PLAN.md
  • For desktop path blockers, see CONSOLE-TO-KDE-DESKTOP-PLAN.md

When this document conflicts with a canonical subsystem plan, the canonical plan wins on subsystem-specific details, and this document wins on cross-cutting prioritization and inter-subsystem dependencies.

This document should be updated after each phase gate is reached, or when new critical stubs are discovered.