Replace 5 stale planning docs with unified assessment: - New: COMPREHENSIVE-SYSTEM-ASSESSMENT-AND-IMPROVEMENT-PLAN.md (12-subsystem audit vs Linux 7.1, 6 phases of work) - Removed: IMPLEMENTATION-MASTER-PLAN, SUBSYSTEM-ASSESSMENT-2026-05, SMP-BOOT-HARDENING-PLAN, CPU-DMA-IRQ-MSI-SCHEDULER-FIX-PLAN, COMPREHENSIVE-BOOT-IMPROVEMENT-PLAN
Red Bear OS — Comprehensive System Assessment & Improvement Plan
Version: 1.0 (2026-05-20)
Reference: Linux kernel 7.1 (local/reference/linux-7.1/)
Supersedes: IMPLEMENTATION-MASTER-PLAN.md, SUBSYSTEM-ASSESSMENT-2026-05.md,
SMP-BOOT-HARDENING-PLAN.md, CPU-DMA-IRQ-MSI-SCHEDULER-FIX-PLAN.md,
COMPREHENSIVE-BOOT-IMPROVEMENT-PLAN.md
Canonical adjacent plans (remain authoritative for subsystem detail):
- `ACPI-IMPROVEMENT-PLAN.md`: ACPI waves W0–W7
- `IRQ-AND-LOWLEVEL-CONTROLLERS-ENHANCEMENT-PLAN.md`: PCI/IRQ/MSI-X
- `USB-IMPLEMENTATION-PLAN.md`: USB phases U0–U6
- `CONSOLE-TO-KDE-DESKTOP-PLAN.md`: desktop path
- `DRM-MODERNIZATION-EXECUTION-PLAN.md`: GPU stack
1. Executive Summary
Red Bear OS is architecturally sound but has significant gaps in hardware-facing subsystems. The system boots to a login prompt in QEMU with working console, networking, and basic device enumeration. However, the boot log and codebase audit reveal that bare-metal usability is limited: the system runs hot (no C-states, no thermal backend), may not see all CPU cores (AP startup races), may lose USB keyboard (only xHCI exists), and has minimal observability for operators.
This document is a truthful, evidence-based assessment of every low-level subsystem, grounded in source code inspection, boot log analysis, and comparison against Linux 7.1 reference source. It replaces five stale/duplicate planning documents with one canonical assessment and forward plan.
Bottom-line verdicts
| Subsystem | Verdict |
|---|---|
| SMP | Real in kernel, but AP startup races and no bare-metal validation |
| CPU power (C-states) | Completely missing — root cause of heat on bare metal |
| CPU power (P-states) | Partial — cpufreqd exists but fragile |
| Thermal / sensors | Daemon exists but no backend — runs with empty surface |
| ACPI boot | Boot-baseline complete, not release-grade |
| ACPI thermal/fan | Missing — not implemented in acpid |
| USB xHCI | Real, QEMU-validated only |
| USB EHCI/UHCI/OHCI | No drivers exist — bare-metal USB keyboard unreliable |
| PCI / IRQ / MSI-X | Architecturally strong, low adoption in drivers |
| IOMMU AMD-Vi | Real, QEMU first-use proof only |
| IOMMU Intel VT-d | Missing — orphaned DMAR parsing only |
| Firmware loading | Real, on-demand, async |
| Memory management | Basic frame allocator — no swap/NUMA/hotplug |
| Logging | Append-only /var/log/system.log — no rotation/structured storage |
| Udev | Real but limited — polling hotplug, hardcoded rules |
2. Assessment by Subsystem
2.1 SMP / CPU Bring-up
Status: 🟡 Implemented, QEMU-proven, bare-metal unvalidated
Linux 7.1 equivalent: arch/x86/kernel/smpboot.c, arch/x86/kernel/apic/,
kernel/smp.c
What is real
The kernel has a complete AP bring-up path:
- AP trampoline with INIT/SIPI sequencing (`madt/arch/x86.rs`)
- x2APIC/LocalApic branching with zero-extended ID fallback (`local_apic.rs`)
- `multi_core` feature enabled by default (`Cargo.toml`)
- Per-CPU data structures (`percpu.rs`)
- IPI support for TLB shootdowns and scheduler wakeups
- CPU set tracking (`cpu_set.rs`)
Source files inspected:
- `recipes/core/kernel/source/src/acpi/madt/arch/x86.rs`
- `recipes/core/kernel/source/src/arch/x86_shared/device/local_apic.rs`
- `recipes/core/kernel/source/src/startup/mod.rs`
- `recipes/core/kernel/source/src/cpu_set.rs`
Why you see "SMP: 1 CPUs online"
The boot log shows:
kernel::acpi::madt::arch:INFO -- SMP: 1 CPUs online (max 256)
This can happen for three reasons:
- QEMU i440fx exposes only 1 vCPU to the guest (most likely in this boot)
- AP startup timeout: `AP_SPIN_LIMIT=1_000_000` spin counts vary by clock speed; on slow or heavily loaded bare metal, APs may not signal readiness in time
- Firmware MADT only exposes 1 processor entry: rare but possible on broken firmware
On real bare metal with an AMD Ryzen or Intel Core system, if the firmware exposes multiple LocalApic entries and AP startup succeeds, the kernel will bring up all cores. But this has never been validated on the project's hardware matrix.
Critical weaknesses (38 kernel issues found)
SMP-BOOT-HARDENING-PLAN.md (2026-05-16) documented 54 issues across kernel
and userspace boot. The most critical kernel-side items are:
| Issue | Severity | File | Description |
|---|---|---|---|
| AP startup LogicalCpuId race | Critical | `madt/arch/x86.rs:153,244,276,365` | Two APs load CPU_COUNT simultaneously → same ID |
| AP_READY dual-mechanism race | Critical | `madt/arch/x86.rs:174-225` | Trampoline u64 write + static AtomicBool; inconsistent ordering |
| TLB shootdown range race | Critical | `percpu.rs:134-137` | Concurrent shootdowns overwrite range between flag set and IPI |
| MCS lock missing fences | Critical | `sync/mcs.rs:74-101` | No Release/Acquire on MCS lock handoff |
| Unbounded priority inversion | Critical | `sync/mcs.rs:126-145` | PI donation one level only |
| Scheduler panic flag leak | Critical | `switch.rs:164,298` | `in_context_switch` stays true on panic → CPU lockup |
| Missing SIPI delays | High | `madt/arch/x86.rs:192-337` | Spin-count delays, not TSC-based. Intel SDM requires 10ms INIT→SIPI |
| NUMA node set after CPU visible | High | `madt/arch/x86.rs:244,253` | `CPU_COUNT.fetch_add()` before `numa_node.set()` |
| MAX_CPU_COUNT=128 too small | High | `cpu_set.rs:44` | AMD EPYC has 128C/256T, Threadripper PRO 96C/192T |
| Global IRQ count lock | High | `scheme/irq.rs:67` | `COUNTS.lock()` is global spinlock on hot path |
These are not theoretical. The LogicalCpuId race means two APs can claim the same CPU ID, leading to corrupted per-CPU data. The missing SIPI delays mean APs may fail to start on real hardware with strict firmware timing requirements.
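To make the SIPI-delay finding concrete, here is a minimal sketch of a delay derived from a calibrated counter frequency instead of a fixed spin count. The names `cycles_for_micros` and `delay_micros`, and the idea of passing the counter read in as a closure, are assumptions for illustration; the kernel's actual timing helpers are not shown in this document.

```rust
use std::hint::spin_loop;

/// Convert a wall-clock delay into counter ticks for a counter running at
/// `tsc_hz`. The multiplication is done in u128 so large frequencies cannot
/// overflow.
pub fn cycles_for_micros(tsc_hz: u64, micros: u64) -> u64 {
    ((tsc_hz as u128 * micros as u128) / 1_000_000) as u64
}

/// Busy-wait until at least `micros` microseconds have elapsed according to
/// `read_counter`. In a kernel the closure would wrap a TSC read; it is a
/// parameter here (hypothetical design) so the logic can be exercised
/// without hardware access.
pub fn delay_micros(mut read_counter: impl FnMut() -> u64, tsc_hz: u64, micros: u64) {
    let start = read_counter();
    let wait = cycles_for_micros(tsc_hz, micros);
    // wrapping_sub keeps the comparison correct if the counter wraps around.
    while read_counter().wrapping_sub(start) < wait {
        spin_loop();
    }
}
```

Unlike a spin-count limit, the wait here scales with the measured counter frequency, so a 10 ms INIT→SIPI gap would be `delay_micros(read_tsc, tsc_hz, 10_000)` regardless of clock speed, assuming `tsc_hz` has been calibrated beforehand.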
Gaps vs Linux 7.1
| Feature | Linux 7.1 | Red Bear |
|---|---|---|
| Robust AP bring-up | `smpboot.c` with TSC delays, online checks | Spin-count delays, race conditions |
| CPU hotplug | Full hot-add/hot-remove | Not implemented |
| CPU isolation | `isolcpus`, `nohz_full` | Not implemented |
| NUMA | Node-aware scheduling, memory policies | No NUMA awareness |
| Per-CPU idle threads | `cpuhp/`, idle thread per CPU | APs enter idle loop directly |
| x2APIC fallback | Clean fallback with explicit disable | Fallback works but warns |
Verdict: SMP infrastructure is real but has critical races that must be fixed before bare-metal multi-core can be trusted. No hardware validation exists.
2.2 CPU Power Management (P-states / C-states)
Status: 🟡 P-states partial, C-states missing entirely
Linux 7.1 equivalent: drivers/cpufreq/, drivers/cpuidle/,
drivers/acpi/processor.c, arch/x86/kernel/acpi/cstate.c
P-states (frequency scaling)
cpufreqd is a real userspace daemon that:
- Reads ACPI `_PSS` (Performance States) tables
- Samples CPU load periodically
- Writes `IA32_PERF_CTL` MSR to change P-state
- Supports governors: Ondemand, Performance, Powersave
- Exposes `/scheme/cpufreq`
Source: local/recipes/system/cpufreqd/source/src/main.rs
But it is fragile:
- `write_msr()` ignores its `msr` parameter and writes only the value to `/dev/cpu/<n>/msr`. This suggests it depends on a Linux-style MSR driver that uses file offset as the MSR index. No such driver was found in the Red Bear tree.
- The daemon reads MSR temperature via `IA32_THERM_STATUS` but has no actionable thermal policy: it can request "powersave" from cpufreqd itself, but there is no thermal trip point logic.
- In the boot log: `cpufreqd: CPU0: 4 P-states (2400 - 1200 kHz)` followed by `cpufreqd: CPU0: MSR write failed (1/1)`. The P-state change is failing.
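For comparison, a Linux-style MSR device expects the MSR index as the file offset of an 8-byte positional read or write. The sketch below shows that convention only; it assumes a device with those semantics exists (none was found in the Red Bear tree) and that `std::os::unix::fs::FileExt` is available on the target. The function names are illustrative, not the ones in cpufreqd.

```rust
use std::fs::File;
use std::io;
use std::os::unix::fs::FileExt;

/// IA32_PERF_CTL, the P-state request register on Intel CPUs.
pub const IA32_PERF_CTL: u32 = 0x199;

/// Write `value` to MSR `msr` through an already-open MSR device file.
/// Assumption: the device treats the file offset as the MSR index, as the
/// Linux msr driver does. The key point is that `msr` is actually used.
pub fn write_msr(dev: &File, msr: u32, value: u64) -> io::Result<()> {
    dev.write_all_at(&value.to_le_bytes(), u64::from(msr))
}

/// Read MSR `msr` back using the same offset convention.
pub fn read_msr(dev: &File, msr: u32) -> io::Result<u64> {
    let mut buf = [0u8; 8];
    dev.read_exact_at(&mut buf, u64::from(msr))?;
    Ok(u64::from_le_bytes(buf))
}
```

A positional write avoids a separate seek, so the index and the value travel in one call and cannot be separated by a concurrent user of the same file handle.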
C-states (idle power states)
This is completely missing and is the single largest contributor to system heat on bare metal.
What exists:
- The kernel has a normal `hlt` instruction in the idle loop when no threads are runnable
- No dedicated cpuidle subsystem
- No ACPI `_CST` (C-state) table parsing
- No `mwait`/`monitor` usage for deeper C-states
- No C1E, C3, C6, C7 support
What Linux 7.1 has:
- `drivers/cpuidle/` with multiple drivers: `acpi_idle`, `intel_idle`, `amd_idle`
- `_CST` table parsing in ACPI processor driver
- `mwait` hint selection based on C-state depth
- Latency and power measurements per C-state
- Scheduler integration: `cpuidle_enter()` called from idle loop
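The core decision a cpuidle layer makes can be sketched in a few lines: pick the deepest state whose exit latency is tolerable and whose target residency fits the predicted idle time. The struct, the example table, and the numbers below are hypothetical and do not come from the Red Bear kernel or from any real `_CST` data.

```rust
/// One entry of a hypothetical idle-state table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct IdleState {
    pub name: &'static str,
    /// Worst-case time to return to running, in microseconds.
    pub exit_latency_us: u32,
    /// Minimum idle duration for which entering the state pays off.
    pub target_residency_us: u32,
}

/// Illustrative values only, ordered shallowest to deepest.
pub const EXAMPLE_STATES: [IdleState; 3] = [
    IdleState { name: "C1", exit_latency_us: 1, target_residency_us: 1 },
    IdleState { name: "C3", exit_latency_us: 50, target_residency_us: 150 },
    IdleState { name: "C6", exit_latency_us: 200, target_residency_us: 1000 },
];

/// Return the deepest state that satisfies both constraints, or `None` if
/// no state qualifies. `states` must be ordered shallowest to deepest.
pub fn select_state(
    states: &[IdleState],
    predicted_idle_us: u32,
    latency_limit_us: u32,
) -> Option<&IdleState> {
    states.iter().rev().find(|s| {
        s.exit_latency_us <= latency_limit_us && s.target_residency_us <= predicted_idle_us
    })
}
```

With the example table, a predicted idle period of 500 µs selects C3, while a latency limit of 10 µs forces C1 no matter how long the CPU is expected to sleep.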
Verdict: cpufreqd is real but MSR writes are failing. C-states are completely absent. On bare metal, CPUs run at full power even when idle. This is why the system is "very hot."
2.3 Thermal Management / Sensors / Hardware Monitoring
Status: 🔴 Thermal daemon exists but no backend; sensors missing; hwmon
absent
Linux 7.1 equivalent: drivers/thermal/, drivers/hwmon/,
drivers/acpi/thermal.c, drivers/acpi/fan.c
thermald
thermald is real code, not a stub. It:
- Attempts to read ACPI thermal zones
- Reads CPU MSR temperature (`IA32_THERM_STATUS`)
- Can request powersave from cpufreqd
- Can request ACPI sleep
- Exposes `/scheme/thermal`
Source: local/recipes/system/thermald/source/src/main.rs
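As a reference for the MSR temperature path, the sketch below decodes an Intel core temperature from two raw register values. It assumes the documented Intel layout: `IA32_THERM_STATUS` carries a "degrees below TjMax" readout in bits 22:16 with a validity flag in bit 31, and `MSR_TEMPERATURE_TARGET` carries TjMax in bits 23:16. The function name is illustrative and is not taken from thermald.

```rust
/// Architectural MSR holding the per-core digital thermal readout.
pub const IA32_THERM_STATUS: u32 = 0x19C;
/// MSR holding the TjMax reference temperature on Intel CPUs.
pub const MSR_TEMPERATURE_TARGET: u32 = 0x1A2;

/// Decode a core temperature in degrees Celsius from raw MSR values.
/// Returns `None` when the reading-valid bit is clear or when the readout
/// exceeds TjMax, which would indicate inconsistent inputs.
pub fn core_temp_celsius(therm_status: u64, temperature_target: u64) -> Option<u32> {
    // Bit 31: reading valid.
    if therm_status & (1 << 31) == 0 {
        return None;
    }
    // Bits 22:16: degrees below TjMax.
    let readout = ((therm_status >> 16) & 0x7F) as u32;
    // Bits 23:16 of the target MSR: TjMax in degrees Celsius.
    let tj_max = ((temperature_target >> 16) & 0xFF) as u32;
    tj_max.checked_sub(readout)
}
```

Keeping the decode separate from the MSR access makes it testable on any host and lets the same function serve both thermald and a future `coretemp`-style sensor driver.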
But it runs with an empty surface:
- ACPI thermal zone enumeration is missing from acpid. The ACPI daemon's scheme surface (`/scheme/acpi`) has no thermal or fan nodes.
- `thermald` expects `/scheme/acpi/thermal` and `/scheme/acpi/fan` to exist, but they do not.
- `fan.rs` exists in the thermald source tree but is orphaned: it is not wired into `main.rs` (`mod fan;` is absent).
The boot log shows:
[ OK ] Started Thermal management daemon
2026-05-20T09-13-44.583Z [@thermald:19 INFO] thermald: started
And then nothing. No thermal zones found, no temperature readings, no fan control.
Hardware sensors (hwmon)
There is no hwmon infrastructure in Red Bear OS.
What is missing:
- No `/sys/class/hwmon` equivalent
- No `/scheme/hwmon`
- No sensor drivers
Linux 7.1 has 100+ hwmon drivers covering:
- CPU temperature: `coretemp` (Intel), `k10temp` (AMD)
- Motherboard sensors: `nct6775`, `it87`, `f71882fg`
- Voltage regulators: `ina2xx`, `ltc2947`
- Fan speed monitors: various Super-I/O chips
Red Bear has none of these.
SMBIOS / DMI
SMBIOS parsing exists in acpid/src/dmi.rs, but the boot log shows:
2026-05-20T09-12-40.920Z [@acpid::dmi:124 WARN] SMBIOS entry point not found in 0xF0000-0xFFFFF
This means DMI-based quirks and system identification are best-effort only. On systems without a valid SMBIOS entry point, the quirk system falls back to PCI/USB device ID matching only.
Verdict: thermald is real but powerless. No hwmon, no sensor drivers, no ACPI thermal backend. The system has zero thermal awareness.
2.4 ACPI Stack
Status: 🟡 Boot-baseline complete, not release-grade
Linux 7.1 equivalent: drivers/acpi/, include/acpi/
What is strong
- Kernel early ACPI discovery: RSDP, RSDT, XSDT
- MADT parsing: LocalApic, IoApic, IntSrcOverride, NMI
- x2APIC fallback with zero-extended IDs
- FADT parsing, PM1a/PM1b register access
- AML interpreter v6.1.1 with real mutex tracking
- EC (Embedded Controller) byte-transaction access
- `_S5` shutdown derivation (though timing is fragile)
- `kstop` kernel shutdown eventing consumed by `redbear-sessiond`
- DMI exposure via `/scheme/acpi/dmi`
Source files:
- `recipes/core/kernel/source/src/acpi/`
- `recipes/core/base/source/drivers/acpid/src/`
What is weak
| Area | Status | Detail |
|---|---|---|
| acpid startup | Fragile | Active panic-grade expect() paths on firmware-origin data |
| `_S5` timing | Fragile | Derived after PCI registration; pre-PCI shutdown reports "AML not ready" |
| DMAR | Orphaned | Parsing exists in acpid/src/dmar/mod.rs but not wired; Intel VT-d has no owner |
| Sleep beyond S5 | Missing | set_global_s_state() is S5-only; S3 suspend not validated |
| Thermal zones | Missing | No ACPI thermal zone enumeration |
| Fan devices | Missing | No ACPI fan device support |
| Battery/power | Provisional | power_snapshot() does real AML-backed probing but bootstrap preconditions are weak |
| AML fault handling | Partial | aml_physmem.rs has "log then fabricate 0" paths |
| SMBIOS | Best-effort | Entry point missing on many systems |
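The "panic-grade `expect()` on firmware-origin data" row is easiest to see by contrast. The sketch below validates an ACPI system description table header and returns an error instead of panicking. It relies only on the standard header layout (4-byte signature, little-endian length at offset 4, 36-byte header, all bytes summing to zero modulo 256); the type and function names are hypothetical and are not acpid's.

```rust
/// Reasons a firmware-provided table can be rejected without panicking.
#[derive(Debug, PartialEq, Eq)]
pub enum SdtError {
    TooShort,
    BadLength,
    BadChecksum,
}

/// Size of the common ACPI system description table header.
pub const SDT_HEADER_LEN: usize = 36;

/// Validate the header of a mapped table and return its signature.
/// Every check that could index out of bounds on hostile or broken
/// firmware data becomes an `Err` instead of a panic.
pub fn validate_sdt(bytes: &[u8]) -> Result<[u8; 4], SdtError> {
    if bytes.len() < SDT_HEADER_LEN {
        return Err(SdtError::TooShort);
    }
    let length = u32::from_le_bytes([bytes[4], bytes[5], bytes[6], bytes[7]]) as usize;
    if length < SDT_HEADER_LEN || length > bytes.len() {
        return Err(SdtError::BadLength);
    }
    // The checksum byte is chosen so that the whole table sums to zero.
    let sum = bytes[..length].iter().fold(0u8, |acc, b| acc.wrapping_add(*b));
    if sum != 0 {
        return Err(SdtError::BadChecksum);
    }
    Ok([bytes[0], bytes[1], bytes[2], bytes[3]])
}
```

A caller can then log the error and skip the table, which keeps a single malformed table from taking down the daemon during startup.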
The ACPI improvement plan (ACPI-IMPROVEMENT-PLAN.md) tracks 8 waves of work
(W0–W7). Current status:
- W0 (Contracts): partially complete
- W1 (Startup hardening): partially complete
- W2 (AML ordering/shutdown): partially complete
- W3 (Honest power surface): open
- W4 (Physmem/EC/fault): partially complete
- W5 (Ownership cleanup): open
- W6 (Consumer integration): partially complete
- W7 (Validation closure): open
Verdict: ACPI is the most mature low-level subsystem, but it is still boot-baseline complete, not release-grade. Thermal and fan support are completely absent.
2.5 PCI / IRQ / MSI-X
Status: 🟡 Architecturally strong, adoption-incomplete
Linux 7.1 equivalent: drivers/pci/, arch/x86/kernel/apic/,
drivers/iommu/
What is real
- `pcid` enumerates PCI devices via config space (I/O ports 0xCF8/0xCFC fallback when no ECAM/MCFG)
- Capability parsing: MSI, MSI-X, power management, vendor-specific
- `driver-manager` matches TOML configs by bus/class/vendor and spawns drivers
- Kernel MSI message composition and validation (`msi.rs`, `vector.rs`)
- MSI-X table mapping and vector allocation
- `redox-driver-sys` provides IRQ handle abstractions, affinity helpers
- IOAPIC routing with interrupt source overrides
- Legacy PIC fallback
Source files:
- `recipes/core/base/source/drivers/pcid/`
- `local/recipes/system/driver-manager/`
- `recipes/core/kernel/source/src/arch/x86_shared/device/msi.rs`
- `local/recipes/drivers/redox-driver-sys/source/src/irq.rs`
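For readers unfamiliar with what "MSI message composition" involves on x86, the following sketch builds the address/data pair for a fixed-delivery, edge-triggered interrupt aimed at one local APIC in physical destination mode. It follows the xAPIC layout from the Intel SDM; the struct and function names are assumptions and do not mirror the kernel's `msi.rs`.

```rust
/// Address/data pair a device writes to raise a message-signaled interrupt.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MsiMessage {
    pub address: u64,
    pub data: u32,
}

/// Base of the x86 MSI address window.
pub const MSI_ADDRESS_BASE: u64 = 0xFEE0_0000;

/// Compose a fixed-delivery, edge-triggered MSI for `dest_apic_id`.
/// Vectors below 32 are reserved for CPU exceptions on x86, so they are
/// rejected rather than programmed into a device.
pub fn compose_msi(dest_apic_id: u8, vector: u8) -> Option<MsiMessage> {
    if vector < 32 {
        return None;
    }
    Some(MsiMessage {
        // Destination APIC ID occupies address bits 19:12.
        address: MSI_ADDRESS_BASE | (u64::from(dest_apic_id) << 12),
        // Vector in data bits 7:0; delivery mode 000 (fixed) and
        // trigger mode 0 (edge) leave the remaining bits zero.
        data: u32::from(vector),
    })
}
```

Returning `Option` pushes the validation decision to the caller, which is the kind of check an IOMMU-aware gate would extend with interrupt-remapping state.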
What is weak
| Issue | Detail |
|---|---|
| Legacy IRQ dominance | e1000d and ided still use legacy IRQ (IRQ 11, IRQ 14/15) |
| MSI-X adoption | Only ixgbed and GPU paths use MSI-X; most drivers on legacy INTx |
| IOMMU MSI gate | iommu_validate_msi_irq() is a stub — always returns true |
| IRQ affinity | Available in API but not widely used |
| pcid helper fragility | Some paths still treat malformed capabilities as invariants |
| Hardware validation | MSI-X proven in QEMU only; no real hardware vector validation |
The IRQ/low-level plan (IRQ-AND-LOWLEVEL-CONTROLLERS-ENHANCEMENT-PLAN.md)
correctly identifies that the architecture is sound but the runtime proof is
thin. Priority 1 is "MSI-X runtime validation on real devices."
Verdict: The PCI/IRQ substrate is one of the strongest parts of the stack, but it is not yet release-grade because MSI-X is not widely adopted and hardware validation is missing.
2.6 IOMMU / DMA
Status: 🟡 AMD-Vi real but unvalidated; Intel VT-d missing
Linux 7.1 equivalent: drivers/iommu/amd/, drivers/iommu/intel/,
drivers/iommu/dma-iommu.c
AMD-Vi
The iommu daemon is real, not a stub:
- `AmdViUnit::init()` maps MMIO, programs device tables, command buffer, event log, interrupt remap table (IRTE)
- QEMU first-use proof passes: discovers units, initializes, drains events
- Self-test path exists: `redbear-phase-iommu-check`
Source: local/recipes/system/iommu/source/src/amd_vi.rs
But:
- The boot log shows: `iommu: no AMD-Vi units found (source=none, kernel_acpi_status=empty, ivrs_path=none)`
- This happens because the IVRS table is absent on this platform (QEMU i440fx does not provide IVRS)
- When zero units are found, the daemon registers `scheme:iommu` and exits
- Real AMD hardware validation: NONE
Intel VT-d
- DMAR parsing exists in `acpid/src/dmar/mod.rs` but is orphaned
- No Intel VT-d runtime daemon
- No DMA remapping for Intel platforms
- `iommu` daemon is AMD-Vi only
DMA integration
- DMA allocation exists in `redox-driver-sys`
- But IOMMU integration is incomplete: `iommu_validate_msi_irq()` is a no-op, and there is no enforced DMA map/unmap with IOMMU translation
- Linux 7.1 has `dma-iommu.c` which handles IOMMU-aware DMA mapping for all devices behind an IOMMU
Verdict: AMD-Vi is implemented but unvalidated. Intel VT-d is missing. DMA/IOMMU integration is incomplete.
2.7 USB Stack
Status: 🟡 xHCI real but QEMU-only; EHCI/UHCI/OHCI missing
Linux 7.1 equivalent: drivers/usb/host/, drivers/usb/core/,
drivers/hid/usbhid/
xHCI
The xHCI driver (xhcid) is real and substantial:
- ~6,000 lines of Rust
- 88+ error handling fixes applied via Red Bear patch
- Interrupt-driven path restored (MSI/MSI-X/INTx)
- Event ring growth implemented (ring doubling)
- BOS/SuperSpeed descriptor fetching
- Speed detection for hub children
- USB 3 hub endpoint configuration
- Suspend/resume API skeleton
Source: recipes/core/base/source/drivers/usb/xhcid/
But:
- Only QEMU-validated — no real hardware testing
- ~57 TODO/FIXME comments remain
- Some `panic!()` sites remain in device enumerator
Missing host controllers
No EHCI, UHCI, or OHCI drivers exist in the Red Bear tree.
| Controller | Speed | Why it matters |
|---|---|---|
| EHCI | USB 2.0 High Speed | Most USB 2.0 keyboards/mice |
| OHCI | USB 1.1 Full/Low Speed | AMD/VIA legacy USB |
| UHCI | USB 1.1 Full/Low Speed | Intel legacy USB |
Linux 7.1 has full implementations for all three:
- `drivers/usb/host/ehci-hcd.c` (~4,500 lines)
- `drivers/usb/host/ohci-hcd.c` (~3,500 lines)
- `drivers/usb/host/uhci-hcd.c` (~2,800 lines)
The USB implementation plan honestly states:
"External USB keyboard input is reliably available only when the keyboard is reached through the `xHCI -> usbhubd/usbhidd -> inputd` path."
On many bare-metal systems, USB keyboards route through EHCI or OHCI, not xHCI. Red Bear cannot claim reliable USB keyboard boot fallback.
Class drivers
| Driver | Status | Quality |
|---|---|---|
| `usbhubd` | Real | Good: interrupt-driven change detection, graceful per-port errors |
| `usbhidd` | Real | Good: HID report parsing, named producers, no panics in loop |
| `usbscsid` | Real | Good: BOT transport, stall recovery, ReadCapacity16 |
Verdict: xHCI is real but QEMU-only. The absence of EHCI/UHCI/OHCI is a critical bare-metal gap.
2.8 Firmware Loading
Status: 🟢 Real and functional
Linux 7.1 equivalent: drivers/base/firmware_loader/
The firmware-loader daemon is one of the most complete subsystems:
- On-demand blob loading via `scheme:firmware`
- Indexes `/lib/firmware` at startup
- Persistent cache with fallback chains
- Async `request_firmware_nowait()` with timeout and retry
- Emits uevents for consumers
- Read-only scheme with mmap support
Source: local/recipes/system/firmware-loader/source/
The boot log does not show firmware loading activity because no device requested firmware during this boot (no GPU, no Wi-Fi).
Verdict: This subsystem is production-ready architecturally. Needs hardware validation when GPU/Wi-Fi drivers are active.
2.9 Memory Management
Status: 🟡 Basic but functional; advanced features missing
Linux 7.1 equivalent: mm/, arch/x86/mm/
What is real
- Frame allocator / buddy-like free list
- Kernel page-table setup (4-level on x86_64)
- Device-memory mapping for MMIO
- Explicit memory-region handling
- Early boot memory map parsing from ACPI/firmware
- 7,092 MB detected in boot log
Source:
- `recipes/core/kernel/source/src/memory/mod.rs`
- `recipes/core/kernel/source/src/startup/memory.rs`
What is missing
| Feature | Linux 7.1 | Red Bear |
|---|---|---|
| Swap | Full swap with page reclaim | Not implemented |
| NUMA | Node-aware allocation, migrate pages | No NUMA awareness |
| Memory hotplug | Add/remove memory at runtime | Not implemented |
| Reclaim/compaction | `kswapd`, memory pressure handling | Not implemented |
| OOM killer | `out_of_memory()` kills processes | Not implemented |
| Huge pages | THP, hugetlbfs | Not implemented |
| Memory cgroups | `memcg` resource limits | Not implemented |
| Demand paging | Lazy allocation on fault | Basic but no swap backing |
Verdict: Sufficient for current boot and userspace needs, but not production-grade for memory-intensive workloads.
2.10 Logging Infrastructure
Status: 🟡 Basic append-only; no rotation, no structured storage
Linux 7.1 equivalent: No direct equivalent; compare to systemd-journald,
rsyslog, syslog-ng
What is real
- `logd` daemon serves `scheme:log`
- Persists to `/var/log/system.log`
- Prepends startup banner, backfills new sinks
- Mirrors kernel log input
- relibc syslog API (`syslog()`, `openlog()`) writes to `/scheme/log`
Source:
- `recipes/core/base/source/logd/src/main.rs`
- `recipes/core/base/source/logd/src/scheme.rs`
What is weak
| Issue | Detail |
|---|---|
| Append-only | /var/log/system.log grows forever |
| No rotation | No size-based or time-based truncation |
| No retention | Old logs never deleted |
| No structured format | Plain text only; no JSON or binary journal |
| read path TODO | scheme.rs has a TODO for reading log history |
| Console dominance | Most daemon output still goes to console timestamps |
| No per-service logs | All logs in one file |
The boot log shows console timestamps because daemons write to stderr, which
init captures and logs. The persistent /var/log/system.log exists but is
append-only with no management.
Verdict: Functional for debugging but not suitable for production observability. Needs rotation, structured format, and per-service separation.
2.11 Udev / Device Discovery
Status: 🟡 Real but limited
Linux 7.1 equivalent: drivers/base/core.c, lib/kobject_uevent.c, udev/
What is real
udev-shim is a real implementation, not a placeholder:
- Enumerates PCI devices via `pcid` scheme
- Classifies devices by class/subclass/vendor
- Creates `/dev` nodes and symlinks
- Writes `/etc/udev/rules.d/50-default.rules`
- Exposes `scheme:udev`
- Polls for changes (not event-driven)
Source: local/recipes/system/udev-shim/source/
The boot log shows:
[ OK ] Started udev compatibility shim
[INFO] udev-shim: enumerated 1 PCI device(s)
[INFO] udev-shim: wrote default rules to /etc/udev/rules.d/50-default.rules
What is weak
| Issue | Detail |
|---|---|
| Hardcoded rules | Only 3 rules: net naming (enp*), NVMe by-id, SATA by-id |
| Polling hotplug | Polls every N seconds; not event-driven like Linux udev/netlink |
| No rules engine | Cannot parse Linux udev rules; rules are compiled-in |
| libudev-stub TODO | local/recipes/libs/libudev-stub/recipe.toml explicitly marked TODO |
| Limited coverage | Only PCI devices; no USB, no ACPI, no platform devices |
| No persistent db | Device state not saved across reboots |
Linux 7.1 udev:
- Event-driven via netlink `NETLINK_KOBJECT_UEVENT`
- Full rules engine with `MATCH`, `ACTION`, `ENV`, `RUN`
- Persistent database in `/run/udev/`
- `udevadm` tool for querying and triggering
- Integrates with `systemd` for device units
Verdict: Functional for basic PCI device naming but far from a full udev replacement. Polling hotplug is inefficient.
2.12 Input Stack
Status: 🟡 Real but uneven quality
Linux 7.1 equivalent: drivers/input/, drivers/hid/, drivers/serio/
What is real
| Component | Status | Detail |
|---|---|---|
| `ps2d` | Real | PS/2 keyboard + mouse; kernel serio byte queues |
| `usbhidd` | Real | HID report parsing, named producers |
| `inputd` | Real | Producer/consumer scheme, VT switching, keymaps |
| `evdevd` | Real | evdev scheme, orbclient→evdev translation |
| `i2c-hidd` | Real | ACPI PNP0C50 scan, _CRS parsing |
| `intel-thc-hidd` | Partial | PCI init works; main loop sleeps 5s, so no input streaming |
The boot log shows PS/2 and evdev working:
[ OK ] Started PS/2 driver
[ OK ] Started Evdev input daemon
[INFO] evdevd: registered scheme:evdev
Gaps vs Linux 7.1
| Gap | Severity | Linux Reference |
|---|---|---|
| intel-thc-hidd no streaming | High | drivers/hid/intel-thc-hid/ full probe+report |
| No multitouch/ABS_MT | High | drivers/input/input-mt.c |
| No libinput acceleration | High | libinput: velocity curves, palm detection |
| No PS/2 extended protocols | Medium | libps2.c ImPS/2 scroll, Explorer 5-btn |
| No HID quirks table | Medium | hid-quirks.c 4000+ entries |
| No input hotplug | Medium | udev + inotify on /dev/input/ |
Verdict: The input stack exists and works for basic keyboard/mouse. Touch and advanced HID are incomplete.
3. Root Cause Analysis
Why the system runs hot on bare metal
- No C-state management → CPUs never enter deeper low-power idle states (C1E, C3, C6, C7). The idle loop only executes `hlt`, so idle cores stay in the shallowest state.
- No ACPI thermal zones → `acpid` does not enumerate thermal zones, so `thermald` has no temperature data to act on.
- No hwmon sensor drivers → No temperature sensors are readable. The system is "flying blind."
- No ACPI fan control → Fan devices are not enumerated, so `thermald` cannot turn on cooling.
- cpufreqd MSR writes failing → Even P-state throttling is not working reliably (`MSR write failed` in boot log).
Fix priority: C-states (immediate heat reduction) > ACPI thermal zones (enables thermald) > hwmon sensors (operator visibility) > fan control (active cooling).
Why only 1 CPU shows online
- QEMU i440fx exposes only 1 vCPU by default (most likely in the provided boot log)
- AP startup races — LogicalCpuId race, missing SIPI delays, AP_READY dual mechanism can cause APs to fail startup on real hardware
- MAX_CPU_COUNT=128 too small for high-core-count AMD EPYC
- No bare-metal validation means we don't know which of these is the real blocker on actual hardware
Why USB keyboard may not work on bare metal
- Only xHCI exists — no EHCI/UHCI/OHCI drivers
- Many systems route USB 2.0 keyboards through EHCI
- Some AMD/VIA systems use OHCI for legacy ports
- Some Intel systems use UHCI for legacy ports
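- No companion-controller support: on EHCI systems, low- and full-speed devices such as keyboards are handed off to UHCI/OHCI companion controllers or reached through a hub's transaction translator, and neither path exists here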
4. Honest Status Matrix
| Subsystem | Status | Linux 7.1 Parity | Evidence Class |
|---|---|---|---|
| SMP bring-up | 🟡 Partial | ~30% | Source + QEMU; bare metal unvalidated |
| C-states (cpuidle) | 🔴 Missing | 0% | No subsystem exists |
| P-states (cpufreq) | 🟡 Partial | ~20% | Daemon real but MSR writes failing |
| Thermal management | 🔴 Missing backend | ~10% | thermald exists but no ACPI backend |
| Hardware sensors (hwmon) | 🔴 Missing | 0% | No infrastructure, no drivers |
| ACPI boot / shutdown | 🟢 Baseline | ~40% | Boots, shutdown works, sleep partial |
| ACPI thermal / fan | 🔴 Missing | 0% | Not implemented in acpid |
| PCI enumeration | 🟢 Working | ~60% | Real, robust, driver-manager binds |
| MSI/MSI-X infrastructure | 🟡 Real | ~40% | Kernel real, driver adoption low |
| IOMMU AMD-Vi | 🟡 Real, unvalidated | ~30% | QEMU proof only |
| IOMMU Intel VT-d | 🔴 Missing | 0% | Orphaned DMAR parsing only |
| USB xHCI | 🟡 Real, QEMU-only | ~30% | No hardware validation |
| USB EHCI/UHCI/OHCI | 🔴 Missing | 0% | No drivers |
| Firmware loading | 🟢 Real | ~70% | On-demand, async, validated in build |
| Memory management | 🟡 Basic | ~30% | Frame allocator; no swap/NUMA/hotplug |
| Logging | 🟡 Basic | ~20% | Append-only, no rotation |
| Udev | 🟡 Limited | ~25% | Polling, hardcoded rules |
| Input (PS/2, USB HID) | 🟢 Working | ~50% | Real but touch/advanced HID missing |
| Input (I2C HID, THC) | 🟡 Partial | ~20% | i2c-hidd real; intel-thc-hidd non-functional |
| D-Bus system bus | 🟢 Working | ~60% | Real, services wired |
| D-Bus session bus | 🟡 Partial | ~30% | Partially wired |
| Network (wired) | 🟢 Working | ~60% | e1000d, virtio-net work |
| Network (Wi-Fi) | 🟡 Host-tested | ~20% | Intel stack builds; no hardware validation |
| Bluetooth | 🟡 Experimental | ~15% | BLE controller probe works; limited |
5. New Improvement Plan
This plan is ordered by impact on bare-metal usability and dependency chain. Earlier phases unblock later ones.
Phase 1: Bare-Metal Boot Hardening (6–8 weeks)
Goal: Boot reliably on diverse bare metal with all cores, reasonable temperature, and working USB keyboard.
1.1 Fix SMP AP Startup (2 weeks)
- Fix K1 (LogicalCpuId race): use `fetch_add` before AP reads ID
- Fix K2 (AP_READY dual mechanism): consolidate to single atomic
- Fix K7 (missing SIPI delays): add TSC-based 10ms INIT→SIPI delay per Intel SDM
- Increase MAX_CPU_COUNT to 256
- Validate on AMD Ryzen and Intel Core bare metal
- Capture boot log showing `SMP: N CPUs online` where N > 1
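The K1 fix can be illustrated with a small allocator: each caller claims its ID with a single atomic `fetch_add`, so no two callers can observe the same value, which is not true of a separate load followed by a store. `CpuIdAllocator` and its methods are hypothetical names for this sketch, not the kernel's API.

```rust
use std::sync::atomic::{AtomicU32, Ordering};

/// Hands out logical CPU IDs. Each call to `allocate` returns a distinct
/// ID because the read and the increment happen as one atomic operation.
pub struct CpuIdAllocator {
    next: AtomicU32,
    max: u32,
}

impl CpuIdAllocator {
    /// `first` is the first ID to hand out (0 would normally be the boot
    /// CPU); `max` is the exclusive upper bound, e.g. the CPU-set capacity.
    pub const fn new(first: u32, max: u32) -> Self {
        Self { next: AtomicU32::new(first), max }
    }

    /// Claim the next ID, or `None` once the capacity is exhausted.
    /// Relaxed ordering is enough for uniqueness; publishing per-CPU data
    /// to other cores would need separate Release/Acquire synchronization.
    pub fn allocate(&self) -> Option<u32> {
        let id = self.next.fetch_add(1, Ordering::Relaxed);
        if id < self.max { Some(id) } else { None }
    }
}
```

Whether the boot CPU allocates the ID before sending SIPI or the AP allocates it on entry, the uniqueness guarantee is the same; allocating on the boot CPU has the added benefit that the ID is known before the AP touches any per-CPU state.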
1.2 Implement Basic C-states (2 weeks)
- Add `cpuidle` framework in kernel: idle state table, enter/exit hooks
- Parse ACPI `_CST` table in acpid, expose via `/scheme/acpi/cstates`
- Implement `hlt`-based idle (C1): immediate heat reduction
- Add `mwait`-based C1E/C3 for Intel; add AMD C1E support
- Wire to scheduler idle path: call `cpuidle_enter()` when no runnable threads
- Validate temperature drop on bare metal
1.3 Enable ACPI Thermal Zones (2 weeks)
- Add thermal zone enumeration to acpid (`_TZ` namespace walk)
- Expose `/scheme/acpi/thermal` with zone temperatures and trip points
- Wire thermald to read from `/scheme/acpi/thermal`
- Add passive cooling policy: throttle cpufreqd when trip point exceeded
- Add ACPI fan device support (`_FAN` objects)
- Wire thermald fan control
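A passive cooling policy of this kind could look like the sketch below. It assumes zone temperatures arrive in tenths of a kelvin, the unit ACPI uses for `_TMP` and trip-point objects, and adds a hysteresis band so throttling does not flap around the trip point. `PassivePolicy` and its fields are hypothetical; the real thermald interface may differ.

```rust
/// Convert an ACPI temperature (tenths of a kelvin) to millidegrees
/// Celsius, using the ACPI convention that 2732 corresponds to 0 °C.
pub fn deci_kelvin_to_milli_celsius(tmp_dk: u32) -> i64 {
    (i64::from(tmp_dk) - 2732) * 100
}

/// Passive-trip policy with hysteresis. All temperatures are in tenths
/// of a kelvin.
pub struct PassivePolicy {
    pub passive_trip_dk: u32,
    pub hysteresis_dk: u32,
    throttling: bool,
}

impl PassivePolicy {
    pub fn new(passive_trip_dk: u32, hysteresis_dk: u32) -> Self {
        Self { passive_trip_dk, hysteresis_dk, throttling: false }
    }

    /// Feed one temperature sample and return whether throttling should be
    /// active. Throttling starts at the trip point and is released only
    /// after the zone has cooled by at least the hysteresis amount.
    pub fn update(&mut self, tmp_dk: u32) -> bool {
        if tmp_dk >= self.passive_trip_dk {
            self.throttling = true;
        } else if tmp_dk.saturating_add(self.hysteresis_dk) <= self.passive_trip_dk {
            self.throttling = false;
        }
        self.throttling
    }
}
```

In this design the daemon would request "powersave" from cpufreqd while `update` returns `true` and restore the previous governor once it returns `false`.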
1.4 Add Basic Sensor Drivers (2 weeks)
- Create `scheme:hwmon` or extend `/scheme/acpi/thermal`
- Port `coretemp` driver (Intel CPU temperature MSR)
- Port `k10temp` driver (AMD CPU temperature, read through northbridge PCI/SMN registers rather than an MSR)
- Add temperature readout to `redbear-info`
- Validate sensor readings on bare metal
Phase 2: USB Completeness (4–6 weeks)
Goal: USB keyboard and storage work on all bare metal.
2.1 EHCI Host Controller (3 weeks)
- Implement EHCI HCD based on Linux `drivers/usb/host/ehci-hcd.c`
- Support USB 2.0 high-speed keyboards, mice, storage
- Integrate with driver-manager config
- Validate on Intel and AMD bare metal
2.2 OHCI/UHCI Fallback (2 weeks)
- Implement OHCI for AMD/VIA systems
- Implement UHCI for Intel legacy systems
- Add companion controller topology support
2.3 USB Boot Resilience (1 week)
- Ensure USB keyboard available before login prompt on all profiles
- Add USB storage boot support
- Hot-plug stress testing on real hardware
Phase 3: IRQ / IOMMU / MSI-X Hardening (4–6 weeks)
Goal: Production-grade interrupt and DMA safety.
3.1 MSI-X Adoption (2 weeks)
- Migrate `e1000d` to MSI-X
- Migrate `ided` to MSI-X (or document legacy-IRQ-only rationale)
- Add MSI-X fallback logging to all PCI drivers
- Validate on real hardware
3.2 IOMMU Hardware Validation (2 weeks)
- AMD-Vi validation on real AMD hardware
- Implement Intel VT-d daemon (migrate from orphaned acpid DMAR)
- Replace `iommu_validate_msi_irq()` stub with real validation
- DMA map/unmap with IOMMU translation
3.3 IRQ Quality (2 weeks)
- IRQ affinity validation per driver
- Interrupt coalescing for network/storage
- Spurious IRQ accounting improvement
Phase 4: Observability & Logging (2–4 weeks)
Goal: Operator can diagnose system health.
4.1 Structured Logging (2 weeks)
- Add JSON-structured log format option to logd
- Per-service log files in `/var/log/<service>/`
- Size-based log rotation (e.g., 10 MB per file)
- Time-based log retention (e.g., 7 days)
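Size-based rotation can be small. The sketch below renames `system.log` to `system.log.1`, shifts older generations up by one, and lets the oldest fall off the end. It uses only the Rust standard library; the function names, the numbered-suffix scheme, and the `keep` parameter are assumptions for illustration and do not describe logd's current behavior.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Build `<path>.<n>`, e.g. `system.log.1`.
fn numbered(path: &Path, n: u32) -> PathBuf {
    let mut s = path.as_os_str().to_owned();
    s.push(format!(".{n}"));
    PathBuf::from(s)
}

/// Rotate `path` if it has reached `max_bytes`, keeping at most `keep`
/// old generations. Returns `Ok(true)` when a rotation happened. The
/// caller is expected to reopen the log file afterwards.
pub fn rotate_if_needed(path: &Path, max_bytes: u64, keep: u32) -> io::Result<bool> {
    let len = match fs::metadata(path) {
        Ok(m) => m.len(),
        Err(e) if e.kind() == io::ErrorKind::NotFound => return Ok(false),
        Err(e) => return Err(e),
    };
    if len < max_bytes {
        return Ok(false);
    }
    let keep = keep.max(1);
    // Shift .(keep-1) -> .keep, ..., .1 -> .2; the rename replaces the
    // oldest generation, which is how old logs get dropped.
    for n in (1..keep).rev() {
        let from = numbered(path, n);
        if from.exists() {
            fs::rename(&from, numbered(path, n + 1))?;
        }
    }
    fs::rename(path, numbered(path, 1))?;
    Ok(true)
}
```

With a 10 MB limit this would be called as `rotate_if_needed(Path::new("/var/log/system.log"), 10 * 1024 * 1024, 5)` before each append or on a timer; time-based retention would be a separate pass over the numbered files.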
4.2 Udev Rules Engine (2 weeks)
- Replace hardcoded rules with subset of Linux udev rules parser
- Event-driven hotplug via scheme notifications (replace polling)
- Persistent device database across reboots
4.3 System Health Dashboard (1 week)
- `redbear-info` thermal/CPU/fan display tab
- Boot timeline persistence across switchroot
- Real-time CPU/memory/network metrics
Phase 5: Hardware Validation Matrix (4–6 weeks)
Goal: Evidence-based support claims.
5.1 Define Validation Targets
Minimum 4 hardware classes:
- AMD desktop (Ryzen, discrete GPU)
- Intel desktop (Core, integrated GPU)
- AMD laptop (Ryzen mobile)
- Intel laptop (Core mobile)
5.2 Per-Target Checklist
For each target, validate and record:
- Boots to login prompt
- All CPU cores online (`SMP: N CPUs online` matches hardware)
- USB keyboard works at boot
- USB storage mounts
- Network (wired) obtains DHCP lease
- Temperature readable via `redbear-info`
- Shutdown succeeds cleanly
- Reboot succeeds cleanly
5.3 Negative-Result Capture
- Document failures per target (e.g., "AMD X670E: AP startup timeout", "Intel Raptor Lake: SMBIOS missing")
- Update this assessment with validation evidence
Phase 6: Desktop Stack Continuation (Parallel)
Goal: Continue the CONSOLE-TO-KDE path on top of hardened substrate.
This phase is orthogonal to the low-level work above. It depends on:
- Qt6Quick/QML downstream proof (unblocks kirigami)
- Real KWin build
- GPU CS ioctl backend + Mesa HW cross-compile
See CONSOLE-TO-KDE-DESKTOP-PLAN.md for detailed desktop path planning.
6. Stale Documents — Remove
The following documents are superseded by this assessment and should be
removed from local/docs/:
| File | Reason |
|---|---|
| `IMPLEMENTATION-MASTER-PLAN.md` | Master plan role now covered by CONSOLE-TO-KDE v4.1 and this doc |
| `SUBSYSTEM-ASSESSMENT-2026-05.md` | Assessment consolidated here with broader scope |
| `SMP-BOOT-HARDENING-PLAN.md` | SMP issues and fixes incorporated here; detailed issue list can be referenced from git history |
| `CPU-DMA-IRQ-MSI-SCHEDULER-FIX-PLAN.md` | MSI Phase 1 is complete; remaining DMA/scheduler work tracked here |
| `COMPREHENSIVE-BOOT-IMPROVEMENT-PLAN.md` | Boot issues consolidated into this assessment |
Canonical documents that remain authoritative:
- `ACPI-IMPROVEMENT-PLAN.md`: detailed ACPI wave execution
- `IRQ-AND-LOWLEVEL-CONTROLLERS-ENHANCEMENT-PLAN.md`: PCI/IRQ/MSI-X details
- `USB-IMPLEMENTATION-PLAN.md`: USB phase execution
- `CONSOLE-TO-KDE-DESKTOP-PLAN.md`: desktop path
- `DRM-MODERNIZATION-EXECUTION-PLAN.md`: GPU stack
- `WIFI-IMPLEMENTATION-PLAN.md`: Wi-Fi architecture
- `BLUETOOTH-IMPLEMENTATION-PLAN.md`: Bluetooth stack
- `DBUS-INTEGRATION-PLAN.md`: D-Bus architecture
- `GREETER-LOGIN-IMPLEMENTATION-PLAN.md`: greeter design
- `QUIRKS-SYSTEM.md`: quirk infrastructure
- `PATCH-GOVERNANCE.md`: patch workflow
- `BUILD-SYSTEM-HARDENING-PLAN.md`: build system
7. Evidence Model
This assessment uses the same evidence vocabulary as the canonical subsystem plans:
| Class | Meaning |
|---|---|
| Source-visible | Behavior visible in checked-in source |
| Build-visible | Code compiles and stages in current build |
| QEMU-validated | Behavior exercised successfully in QEMU |
| Runtime-validated | Behavior exercised in real boot/runtime |
| Hardware-validated | Behavior proven on named bare-metal hardware |
| Negative-result-documented | Failures and gaps are explicitly recorded |
No subsystem in this assessment is marked "hardware-validated" because no
component has been proven on real bare metal with the rigor defined in
ACPI-IMPROVEMENT-PLAN.md Wave 7.
8. Definition of Done
This plan is complete when:
- SMP brings up all cores reliably on AMD and Intel bare metal
- C-states reduce idle power consumption measurably
- ACPI thermal zones are readable and thermald responds to trip points
- At least 2 sensor drivers report temperature on bare metal
- EHCI driver enables USB keyboard on systems without xHCI routing
- MSI-X is adopted by all new PCI drivers; legacy IRQ is documented fallback
- IOMMU validates on at least one AMD and one Intel platform
- Logging has rotation and per-service separation
- Udev-shim supports event-driven hotplug
- A validation matrix with 4+ hardware targets is published and maintained
End of assessment.