Merge bootprocess branch overlay into 0.2.0

Restore all bootprocess branch files that were overwritten by later 0.2.0 commits. This overlay brings back the complete boot infrastructure:

- Configs: redbear-full, redbear-mini, redbear-device-services, driver .d files
- Kernel: IRQ affinity, x2APIC, C-states, NUMA (SLIT/SRAT), MCS locks, cpuidle
- Base patches: P0-P55 + new P6 (lived block_size=512) + P57 (fbbootlogd graceful init)
- Driver infra: driver-manager, udev-shim, thermald, cpufreqd, iommu, redox-driver-sys/core
- GPU: redox-drm with improved connector handling
- System: redbear-info, redbear-hwutils phase-timer-check
- Build system: fetch.rs improvements, build-iso.sh, run_full.sh
- Kernel source: new ACPI (SLIT, SRAT), cpuidle, cstate, MCS lock modules

83 files changed, +3966/-1248 lines
2026-05-27 06:47:23 +03:00
parent af05babbb2
commit b9de373b31
83 changed files with 3969 additions and 1251 deletions
@@ -19,7 +19,113 @@ human-initiated operations. Durable Red Bear state belongs in `local/patches/`,
 The current baseline is **Red Bear OS 0.1.0** (Redox snapshot at build-system commit `f55acba68`).
 All recipe sources are pinned and archived in `sources/redbear-0.1.0/`.

-## NO SILENT UPSTREAM PULLS — OFFLINE-FIRST POLICY
+## BUILD SYSTEM DURABILITY — THE CARDINAL RULE
+
+**THE `recipes/*/source/` DIRECTORY WILL ALWAYS BE REWRITTEN. DO NOT EVER USE IT FOR ANY
+WORK THAT YOU INTEND TO KEEP. THOSE TREES ARE EPHEMERAL — THEY ARE DESTROYED AND REGENERATED
+ON EVERY `repo fetch`, `repo cook`, `make clean`, AND `make distclean`. ANY EDIT MADE THERE
+WILL BE SILENTLY LOST ON THE NEXT BUILD. COMMITTING TO A SUBMODULE INSIDE `source/` DOES NOT
+PROTECT YOUR WORK — THE ENTIRE DIRECTORY IS DELETED AND RE-CLONED/RE-EXTRACTED FROM SCRATCH.**
+
+This is the #1 mistake AI agents and new contributors make. It has caused repeated work loss
+in this project. The rule is:
+
+| What you want to do | Where to do it |
+|---|---|
+| Change a kernel source file | Create or update a patch in `local/patches/kernel/` |
+| Change an init or daemon source file | Create or update a patch in `local/patches/base/` |
+| Change relibc | Create or update a patch in `local/patches/relibc/` |
+| Change a driver | Create or update a patch in `local/patches/base/` or `local/patches/<driver>/` |
+| Add a new package | Create a recipe in `local/recipes/<category>/<name>/` |
+| Change build config | Edit `config/redbear-*.toml` |
+| Add documentation | Write to `local/docs/` |
+
+### How the build system works
+
+```
+repo cook <package>
+  ├── repo fetch <package>
+  │   ├── Clone/fetch upstream source → recipes/<pkg>/source/
+  │   ├── Apply patches from recipe.toml → patches are read from local/patches/<pkg>/
+  │   └── Source tree is now fully patched and ready for build
+  ├── Cargo/cmake/configure build
+  └── Stage artifacts into sysroot
+```
+
+The `source/` directory is a disposable working copy. It is produced at the start of every
+build by cloning the upstream source + applying patches sequentially. The recipe's
+`patches = [...]` list in `recipe.toml` controls which patches are applied.
+
+### Two-layer architecture
+
+```
+Layer 1: Ephemeral (destroyed on clean/fetch/rebuild)
+  recipes/<pkg>/source/       ← working tree, cloned + patched
+  build/                      ← build outputs
+  target/                     ← cargo target dir
+
+Layer 2: Durable (survives clean/fetch/rebuild/release provisioning)
+  local/patches/<pkg>/        ← .patch files — the actual source code changes
+  local/recipes/<pkg>/        ← custom recipe directories
+  config/redbear-*.toml       ← Red Bear OS build configs
+  local/docs/                 ← planning and integration docs
+  recipes/<pkg>/recipe.toml   ← the patches list (git-tracked)
+```
+
+### The correct workflow for any source change
+
+1. **Make the change** in `recipes/<pkg>/source/` to validate it compiles
+2. **Generate a patch**: `cd recipes/<pkg>/source && git diff -U0 -w > ../../../local/patches/<pkg>/my-fix.patch`
+3. **Wire the patch**: add `"my-fix.patch"` to the recipe's `recipe.toml` `patches = [...]` list
+4. **Validate**: `./target/release/repo validate-patches <pkg>`
+5. **Rebuild**: `./target/release/repo cook <pkg>`
+6. **Commit**: `git add local/patches/ recipes/<pkg>/recipe.toml && git commit`
+
+### Common anti-patterns
+
+| Anti-pattern | Why it fails |
+|---|---|
+| Editing `source/` files then running `make all` | `make all` calls `repo fetch` which regenerates `source/` — edits are lost |
+| Creating a patch but not wiring it into `recipe.toml` | Patch file exists but is never applied — build uses unpatched source |
+| **Hand-writing patches manually** | **FORBIDDEN. Unified diffs hand-written by humans routinely have incorrect line counts, wrong context, malformed hunks, or timestamp headers — all of which cause `patch(1)` to reject them. The ONLY acceptable way to generate patches is `git diff -U0 -w` from a committed source tree baseline.** |
+| Editing `recipe.toml` patches list without creating the actual `.patch` file | Build fails with "missing patch" error |
+| Expecting `source/` changes to survive `make clean` | `make clean` deletes `source/` directories |
+| Running `repo cook` without `--allow-protected` for core packages | Protected recipes (kernel, relibc, base) are offline-only by default |
+
+### Patch file location convention
+
+- `local/patches/base/` — for the `base` package (init, daemon, all drivers)
+- `local/patches/kernel/` — for the kernel
+- `local/patches/relibc/` — for relibc
+- `local/patches/installer/` — for the installer
+- `local/patches/bootloader/` — for the bootloader
+- `local/patches/<package>/` — for any other patched package
+
+### Recipe patch wiring
+
+Each recipe's `recipe.toml` lists patches relative to `local/patches/<pkg>/`:
+
+```toml
+[source]
+git = "https://gitlab.redox-os.org/redox-os/base.git"
+rev = "463f76b96..."
+patches = [
+    "P0-daemon-fix-init-notify-unwrap.patch",   # applied first
+    "P9-init-scheduler-completed.patch",          # applied second
+    # ... more patches
+]
+```
+
+Patches are applied in listed order. Dependencies between patches must be respected (a patch
+that defines a type must come before a patch that uses it).
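
The two wiring anti-patterns (a patch file that is never listed, and a listed patch that does not exist) can be caught with a small check. The following is a minimal sketch, not part of the cookbook tool: the function names `list_wired_patches` and `check_patch_wiring` are hypothetical, and it assumes the `patches = [` array opens on its own line with one quoted file name per line, as in the example above.

```bash
# Print every patch name listed in the patches = [ ... ] array of a recipe.toml.
# Assumption: one quoted name per line; whole-line comments inside the array are skipped.
list_wired_patches() {
    awk '
        /^[ \t]*patches[ \t]*=[ \t]*\[/ { inside = 1; next }
        inside && /^[ \t]*\]/           { inside = 0 }
        inside && /^[ \t]*#/            { next }
        inside && match($0, /"[^"]+"/) {
            print substr($0, RSTART + 1, RLENGTH - 2)
        }
    ' "$1"
}

# Compare the wired list against the files in a patch directory.
# Prints one line per problem and returns non-zero if any problem was found.
# Assumption: patch file names contain no whitespace.
check_patch_wiring() {
    local recipe="$1" dir="$2" status=0 wired name file
    wired=$(list_wired_patches "$recipe")

    # Listed in recipe.toml but absent on disk.
    for name in $wired; do
        if [ ! -f "$dir/$name" ]; then
            echo "missing file: $name"
            status=1
        fi
    done

    # Present on disk but never listed, so never applied.
    for file in "$dir"/*.patch; do
        [ -e "$file" ] || continue
        name=${file##*/}
        if ! printf '%s\n' "$wired" | grep -Fxq -- "$name"; then
            echo "not wired: $name"
            status=1
        fi
    done
    return "$status"
}
```

A call such as `check_patch_wiring recipes/core/base/recipe.toml local/patches/base` (paths illustrative) would report both kinds of mismatch before a build is attempted. It does not check patch ordering.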
+
+### Kernel-specific notes
+
+The kernel source at `recipes/core/kernel/source/` is a separate git worktree (rev `866dfad`).
+The kernel recipe is at `recipes/core/kernel/recipe.toml` and patches are at
+`local/patches/kernel/`. The same durability rules apply — all kernel changes must be
+in `local/patches/kernel/*.patch`, never in the `source/` tree directly.

 **Red Bear OS is offline-first by default. No script, build target, or tool may silently pull
 from any upstream repository without explicit user instruction.**
@@ -178,10 +284,24 @@ make all
  → mk/fstools.mk (build cookbook repo binary + fstools)
  → mk/repo.mk (repo cook --filesystem=config/*.toml)
    → For each recipe: fetch source → apply patches → build → stage into sysroot
+    → Each successful build produces repo/<arch>/<name>.pkgar + <name>.toml
  → mk/disk.mk (create filesystem.img, harddrive.img, redbear-live.iso or harddrive.img)
    → redoxfs-mkfs → redox_installer → bootloader embedding
 ```

+### Build Outputs
+
+Every successful `repo cook <package>` produces:
+
+| Artifact | Location | Purpose |
+|----------|----------|---------|
+| Package archive | `repo/x86_64-unknown-redox/<name>.pkgar` | Binary package for image assembly |
+| Package manifest | `repo/x86_64-unknown-redox/<name>.toml` | Metadata, version, deps, hashes |
+| Staged sysroot | `recipes/*/<name>/target/.../stage/` | Files for `repo push` |
+| Source tree | `recipes/*/<name>/source/` | Fetched + patched source (disposable) |
+
+**A build is not complete until the .pkgar and .toml exist in `repo/`.**
+
 ## CONVENTIONS

 - **Rust edition 2024**, nightly channel
@@ -444,6 +564,65 @@ or any path that is already git-tracked and not inside a fetched source tree.

 ## BUILD SYSTEM POLICIES

+### Build Durability Rule — Every Build Lands in the Repo
+
+Every successful `repo cook` produces two durable artifacts:
+
+1. **Package in the repo**: `repo/x86_64-unknown-redox/<name>.pkgar` + `<name>.toml`
+2. **Patched source form**: All source modifications are in `local/patches/<component>/` and wired into `recipe.toml`
+
+A build is **not complete** until both artifacts exist:
+
+```bash
+# After cooking, verify the package is in the repo
+./target/release/repo find <package>
+
+# Check the repo manifest exists
+ls repo/x86_64-unknown-redox/<package>.toml
+ls repo/x86_64-unknown-redox/<package>.pkgar
+```
+
+If a package was built but the repo artifacts are missing, the build did not complete.
+Re-run `repo cook <package>` to regenerate them.
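
The same check can be wrapped in a function that fails loudly, which is convenient in scripts. This is a minimal sketch under stated assumptions: `verify_repo_artifacts` is a hypothetical helper, not a `repo` subcommand, and the default directory is simply the path shown above.

```bash
# Return non-zero unless both <package>.pkgar and <package>.toml exist.
# Usage: verify_repo_artifacts <package> [repo-dir]
verify_repo_artifacts() {
    local pkg="$1" repo_dir="${2:-repo/x86_64-unknown-redox}" status=0 ext
    for ext in pkgar toml; do
        if [ ! -f "$repo_dir/$pkg.$ext" ]; then
            # Report each missing artifact on stderr so stdout stays clean.
            echo "missing: $repo_dir/$pkg.$ext" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example, `verify_repo_artifacts relibc || ./target/release/repo cook relibc` would re-cook only when an artifact is missing.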
+
+If source patches were applied but not mirrored to `local/patches/`, see the
+DURABILITY POLICY section above.
+
+### Cascade Rebuild Rule
+
+When a low-level package changes (relibc, kernel, base, or any library), **all
+packages that depend on it must be rebuilt**. A stale dependent silently produces
+link errors, ABI mismatches, or runtime crashes.
+
+Use the cascade rebuild script:
+
+```bash
+# Rebuild relibc and everything that depends on it
+./local/scripts/rebuild-cascade.sh relibc
+
+# Dry run: show what would be rebuilt without building
+./local/scripts/rebuild-cascade.sh --dry-run relibc
+
+# Multiple root packages
+./local/scripts/rebuild-cascade.sh relibc ncurses
+```
+
+The script:
+1. Finds all packages whose `recipe.toml` lists the target in `dependencies`
+2. Transitively expands the reverse dependency graph (BFS)
+3. Builds the root package(s) first, then dependents in order
+4. Pushes all rebuilt packages to the sysroot
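
Steps 1 and 2 amount to a breadth-first walk over reverse dependencies. The sketch below illustrates that walk only; it is not the contents of `rebuild-cascade.sh`. The function name `cascade_order` and the input format (a text file with one `<package> <dependency>` pair per line, extracted from the recipes beforehand) are assumptions made for the example.

```bash
# Print the root package(s) followed by every transitive dependent, in BFS order.
# Usage: cascade_order <edges-file> <root> [<root> ...]
# Each line of <edges-file> is "<package> <dependency>".
cascade_order() {
    local edges="$1"
    shift
    awk -v roots="$*" '
        # Build the reverse map: dependency -> space-separated dependents.
        { rdeps[$2] = rdeps[$2] " " $1 }
        END {
            # Seed the queue with the roots, skipping duplicates.
            k = split(roots, r, " ")
            n = 0
            for (i = 1; i <= k; i++) {
                if (!(r[i] in seen)) { seen[r[i]] = 1; queue[++n] = r[i] }
            }
            # Standard BFS: the queue grows while it is being consumed.
            for (head = 1; head <= n; head++) {
                pkg = queue[head]
                print pkg
                m = split(rdeps[pkg], dependents, " ")
                for (j = 1; j <= m; j++) {
                    if (!(dependents[j] in seen)) {
                        seen[dependents[j]] = 1
                        queue[++n] = dependents[j]
                    }
                }
            }
        }
    ' "$edges"
}
```

BFS order is a discovery order, not necessarily a valid build order: a package reachable through both a short and a long dependency chain is listed at its shortest distance from the root. A real cascade would need a topological sort over the discovered set before building.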
+
+**When to use cascade rebuilds:**
+- After changing relibc headers or ABI
+- After rebuilding a shared library (ncurses, zlib, openssl, etc.)
+- After kernel ABI changes that affect userspace
+- After any change to a package listed in other packages' `dependencies`
+
+**When NOT to use cascade rebuilds:**
+- Standalone applications with no dependents (editors, games, utilities)
+- Terminal/leaf packages that nothing depends on
+
 ### Atomic Patch Application

 The cookbook tool (`src/cook/fetch.rs`) applies patches **atomically**:
@@ -466,12 +645,78 @@ Patches may use either format:

 Git-specific headers (`diff --git`, `diff -ruN`, `index`, `new file mode`, `rename from/to`,
 `similarity index`, `dissimilarity index`) are automatically stripped before
-`patch` is invoked. The build system uses `--fuzz=0` for strict context matching.
+`patch` is invoked. The build system uses `--fuzz=3` for resilient context matching.

 **Timestamps in `---`/`+++` lines** (common in `diff -ruN` output) should be removed.
 Use `--- a/path` and `+++ b/path` without timestamps. The `normalize_patch` function
 does NOT strip timestamps — they should be removed from the patch file directly.
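
Since timestamps have to be removed from the patch file itself, a small filter can do it mechanically. This is a sketch, not part of the cookbook tool: `strip_patch_timestamps` is a hypothetical helper, and it assumes the timestamp is separated from the path by a tab, as in GNU `diff -u` output.

```bash
# Drop everything from the first tab onward on "--- " and "+++ " header lines.
# Reads the named files, or stdin when no file is given; writes to stdout.
# Caveat: a hunk line that itself starts with "--- " or "+++ " and contains
# a tab would also be truncated, so review the result before committing it.
strip_patch_timestamps() {
    awk '
        /^(---|\+\+\+) / {
            tab = index($0, "\t")
            if (tab > 0) $0 = substr($0, 1, tab - 1)
        }
        { print }
    ' "$@"
}
```

Typical use would be `strip_patch_timestamps old.patch > clean.patch`, followed by `repo validate-patches <package>` on the result.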

+### Robust Patch Generation (REQUIRED)
+
+**MANDATORY: All patches MUST be generated using `git diff -U0 -w` from a committed source tree.
+Hand-writing unified diffs is FORBIDDEN — it routinely produces incorrect line counts, malformed
+hunks, or timestamp headers that cause `patch(1)` to reject them. The build system uses
+`--fuzz=3` for resilient context matching, which requires properly generated diffs.**
+
+Context-line mismatches (renamed variables, shifted line numbers, upstream refactors)
+are the single largest source of patch application failures. Use the zero-context,
+whitespace-ignored technique to make patches resilient to drift:
+
+**Workflow (mandatory):**
+```bash
+# 1. Start with a clean P0..P(N-1) source tree (repo fetch already applied earlier patches)
+cd recipes/<component>/source
+
+# 2. Commit the P0..P(N-1) state as a git baseline
+git add -A && git commit -m "P0..P(N-1) baseline"
+
+# 3. Make P(N) edits in the source tree
+#    (edit files, test compile, etc.)
+
+# 4. Generate the P(N) patch using ONLY git diff -U0 -w:
+git diff -U0 -w > ../../../local/patches/<component>/P<N>-<description>.patch
+
+# 5. Wire the patch into recipe.toml patches list
+
+# 6. Validate: repo validate-patches <package>
+# 7. Rebuild: repo cook <package>
+# 8. Commit: git add local/patches/ recipes/<pkg>/recipe.toml && git commit
+```
+
+**Apply (for manual testing):**
+```bash
+patch -p1 --fuzz=3 < local/patches/<component>/P<N>-<description>.patch
+```
+
+**Why this works:**
+- `-U0` produces zero lines of surrounding context, so the patch has no fragile context
+  lines that can drift when surrounding code changes
+- `-w` ignores all whitespace changes, so indentation-only refactors don't break the patch
+- `--fuzz=3` lets `patch(1)` ignore up to three context lines when matching a hunk, so older patches that still carry context also tolerate nearby drift
+- Together these three flags remove most "context mismatch" failures
+
+**Why hand-writing is forbidden:**
+- Human-written diffs routinely have wrong `@@` line counts, missing or extra context lines,
+  incorrect `--- a/` / `+++ b/` paths, or embedded timestamps — all of which cause `patch(1)`
+  to reject the patch or silently apply it to the wrong location
+- The `git diff -U0 -w` command produces mechanically correct diffs every time
+
+**Before this technique**, patches routinely broke when:
+- A variable was renamed (e.g., `deamon` → `daemon` in context)
+- Lines were added or removed above the changed code
+- Indentation was reformatted
+- An earlier patch in the chain shifted line numbers
+
+**With this technique**, patches survive all of the above. A hunk consists only of the
+changed lines themselves — no context that can go stale.
+
+**Conventions:**
+- Always use `--- a/path` and `+++ b/path` headers (no timestamps)
+- Always name patches `P<N>-<description>.patch` with sequential numbering
+- Always wire patches into `recipe.toml` `patches = [...]` in application order
+- Always validate with `repo validate-patches <package>` after creating or editing a patch
+- When updating an existing patch, regenerate it entirely rather than editing line numbers manually
+
 ### Protected Recipes

 Core recipes (`base`, `kernel`, `relibc`, `bootloader`, etc.) and any recipe carrying
@@ -18,7 +18,7 @@ path = "/usr/lib/init.d/10_acid.service"
 data = """
 [unit]
 description = "Acid test runner"
-requires_weak = ["00_pcid-spawner.service"]
+requires_weak = ["00_driver-manager.service"]

 [service]
 cmd = "ion"
@@ -1,6 +1,11 @@
 # Red Bear OS shared device-service wiring
 #
 # Shared by profiles that ship the firmware/input/Wi-Fi control compatibility stack.
+#
+# Driver matching: driver-manager reads /lib/drivers.d/*.toml and matches against
+# devices from both PCI and ACPI buses. ACPI devices are classified with PCI-equivalent
+# class/subclass/vendor codes by redox-driver-acpi's AcpiBus, allowing reuse of existing
+# driver match rules.

 [packages]
 redbear-quirks = {}
@@ -32,9 +37,9 @@ data = """
 path = "/etc/init.d/12_boot-late.target"
 data = """
 [unit]
-description = "Late boot services target"
+description = "Late boot services target (compat alias for 04_drivers.target)"
 requires_weak = [
-    "00_base.target",
+    "04_drivers.target",
 ]
 """

@@ -54,6 +59,7 @@ priority = 100
 command = ["/usr/lib/drivers/nvmed"]

 [[driver.match]]
+bus = "pci"
 class = 1
 subclass = 8

@@ -64,6 +70,7 @@ priority = 100
 command = ["/usr/lib/drivers/ahcid"]

 [[driver.match]]
+bus = "pci"
 class = 1
 subclass = 6

@@ -74,6 +81,7 @@ priority = 100
 command = ["/usr/lib/drivers/ided"]

 [[driver.match]]
+bus = "pci"
 class = 1
 subclass = 1

@@ -84,6 +92,7 @@ priority = 100
 command = ["/usr/lib/drivers/virtio-blkd"]

 [[driver.match]]
+bus = "pci"
 vendor = 0x1AF4
 device = 0x1001
 class = 1
@@ -100,6 +109,7 @@ priority = 50
 command = ["/usr/lib/drivers/e1000d"]

 [[driver.match]]
+bus = "pci"
 vendor = 0x8086
 class = 2

@@ -110,6 +120,7 @@ priority = 50
 command = ["/usr/lib/drivers/rtl8168d"]

 [[driver.match]]
+bus = "pci"
 vendor = 0x10EC
 class = 2

@@ -120,6 +131,7 @@ priority = 50
 command = ["/usr/lib/drivers/rtl8139d"]

 [[driver.match]]
+bus = "pci"
 vendor = 0x10EC
 device = 0x8139

@@ -130,6 +142,7 @@ priority = 50
 command = ["/usr/lib/drivers/ixgbed"]

 [[driver.match]]
+bus = "pci"
 vendor = 0x8086
 class = 2
 subclass = 0
@@ -141,6 +154,7 @@ priority = 50
 command = ["/usr/lib/drivers/virtio-netd"]

 [[driver.match]]
+bus = "pci"
 vendor = 0x1AF4
 class = 2
 """
@@ -155,6 +169,7 @@ priority = 80
 command = ["/usr/lib/drivers/xhcid"]

 [[driver.match]]
+bus = "pci"
 class = 0x0C
 subclass = 0x03
 prog_if = 0x30
@@ -169,6 +184,7 @@ command = ["/usr/lib/drivers/ehcid"]
 # control-transfer pass-through while the wider USB stack continues converging.

 [[driver.match]]
+bus = "pci"
 class = 0x0C
 subclass = 0x03
 prog_if = 0x20
@@ -180,6 +196,7 @@ priority = 80
 command = ["/usr/lib/drivers/ohcid"]

 [[driver.match]]
+bus = "pci"
 class = 0x0C
 subclass = 0x03
 prog_if = 0x10
@@ -191,6 +208,7 @@ priority = 80
 command = ["/usr/lib/drivers/uhcid"]

 [[driver.match]]
+bus = "pci"
 class = 0x0C
 subclass = 0x03
 prog_if = 0x00
@@ -206,6 +224,7 @@ priority = 60
 command = ["/usr/bin/redox-drm"]

 [[driver.match]]
+bus = "pci"
 class = 0x03
 """

@@ -233,6 +252,7 @@ priority = 40
 command = ["/usr/lib/drivers/ihdad"]

 [[driver.match]]
+bus = "pci"
 vendor = 0x8086
 class = 0x04

@@ -243,10 +263,89 @@ priority = 40
 command = ["/usr/lib/drivers/ac97d"]

 [[driver.match]]
+bus = "pci"
 class = 0x04
 subclass = 0x01
 """

+[[files]]
+path = "/etc/init.d/00_acpid.service"
+data = """
+[unit]
+description = "ACPI daemon (provides scheme:acpi)"
+default_dependencies = false
+
+[service]
+cmd = "acpid"
+inherit_envs = ["RSDP_ADDR", "RSDP_SIZE"]
+type = "notify"
+"""
+
+# ACPI GPIO/I2C controller drivers
+# These match against ACPI-enumerated devices (class/subclass/vendor from _HID).
+[[files]]
+path = "/lib/drivers.d/60-gpio-i2c.toml"
+data = """
+# I2C bus registry — infrastructure, no hardware match
+[[driver]]
+name = "i2cd"
+description = "I2C host adapter registry"
+priority = 85
+command = ["/usr/lib/drivers/i2cd"]
+
+# GPIO pin registry — infrastructure, no hardware match
+[[driver]]
+name = "gpiod"
+description = "GPIO controller registry"
+priority = 85
+command = ["/usr/lib/drivers/gpiod"]
+
+# Intel ACPI I2C controller (DesignWare)
+# Matches: INT33C3, INT3433, INT3442, INT3446, INT3447, INT3455, INT34B9
+[[driver]]
+name = "dw-acpi-i2cd"
+description = "DesignWare ACPI I2C controller"
+priority = 80
+command = ["/usr/lib/drivers/dw-acpi-i2cd"]
+depends_on = ["acpi", "i2c"]
+
+[[driver.match]]
+bus = "acpi"
+class = 0x0C
+subclass = 0x05
+vendor = 0x8086
+
+# AMD MP2 I2C controller
+# Matches: AMDI0010, AMDI0510, AMDI0019
+[[driver]]
+name = "amd-mp2-i2cd"
+description = "AMD MP2 I2C controller"
+priority = 80
+command = ["/usr/lib/drivers/amd-mp2-i2cd"]
+depends_on = ["acpi", "i2c"]
+
+[[driver.match]]
+bus = "acpi"
+class = 0x0C
+subclass = 0x05
+vendor = 0x1022
+
+# Intel ACPI GPIO controller
+# Matches: INT33C7, INT3437, INT3450, INT345D, INT34BB
+[[driver]]
+name = "intel-gpiod"
+description = "Intel ACPI GPIO registrar"
+priority = 80
+command = ["/usr/lib/drivers/intel-gpiod"]
+depends_on = ["acpi", "gpio"]
+
+[[driver.match]]
+bus = "acpi"
+class = 0x0C
+subclass = 0x80
+vendor = 0x8086
+"""
+
 [[files]]
 path = "/lib/drivers.d/70-usb-class.toml"
 data = """
@@ -281,15 +380,15 @@ vendor = 0xFFFF
 device = 0xFFFF
 """

-# Profiles that include this fragment should start `driver-manager` instead of
-# `pcid-spawner`; the manager performs the PCI bind/channel handoff itself.
+# driver-manager owns PCI device enumeration, driver matching, and bind/channel
+# handoff — replacing the old pcid + pcid-spawner pair entirely.
 [[files]]
 path = "/etc/init.d/00_driver-manager.service"
 data = """
 [unit]
 description = "Red Bear driver manager"
 requires_weak = [
-    "00_base.target",
+    "02_early_hw.target",
 ]

 [service]
@@ -298,33 +397,26 @@ args = ["--hotplug"]
 type = "oneshot_async"
 """

+# Override the base package's 30_thermald.service with a no-op, since
+# 15_thermald.service (below) replaces it with earlier start ordering.
 [[files]]
-path = "/etc/init.d/10_evdevd.service"
+path = "/etc/init.d/30_thermald.service"
 data = """
 [unit]
-description = "Evdev input daemon"
-requires_weak = [
-    "12_boot-late.target",
-    "00_driver-manager.service",
-]
+description = "Thermal management daemon (suppressed; use 15_thermald.service)"

 [service]
-cmd = "evdevd"
-type = "oneshot_async"
+cmd = "echo"
+args = ["thermald: started earlier as 15_thermald.service"]
+type = "oneshot"
 """

-[[files]]
-path = "/etc/firmware-fallbacks.d"
-data = ""
-directory = true
-mode = 0o755
-
 [[files]]
 path = "/etc/init.d/15_cpufreqd.service"
 data = """
 [unit]
 description = "CPU frequency scaling daemon"
-requires_weak = ["12_boot-late.target"]
+requires_weak = ["04_drivers.target"]

 [service]
 cmd = "/usr/bin/cpufreqd"
@@ -336,13 +428,25 @@ path = "/etc/init.d/15_thermald.service"
 data = """
 [unit]
 description = "Thermal management daemon"
-requires_weak = ["12_boot-late.target"]
+requires_weak = ["04_drivers.target"]

 [service]
 cmd = "/usr/bin/thermald"
 type = "oneshot_async"
 """

+[[files]]
+path = "/etc/init.d/15_coretempd.service"
+data = """
+[unit]
+description = "CPU temperature sensor daemon"
+requires_weak = ["04_drivers.target"]
+
+[service]
+cmd = "/usr/bin/coretempd"
+type = { scheme = "coretemp" }
+"""
+
 [[files]]
 path = "/etc/init.d/15_hwrngd.service"
 data = """
@@ -372,7 +476,7 @@ path = "/etc/init.d/16_redbear-acmd.service"
 data = """
 [unit]
 description = "USB CDC ACM serial daemon"
-requires_weak = ["12_boot-late.target"]
+requires_weak = ["04_drivers.target"]

 [service]
 cmd = "/usr/bin/redbear-acmd"
@@ -384,7 +488,7 @@ path = "/etc/init.d/16_redbear-ecmd.service"
 data = """
 [unit]
 description = "USB CDC ECM/NCM ethernet daemon"
-requires_weak = ["12_boot-late.target"]
+requires_weak = ["04_drivers.target"]

 [service]
 cmd = "/usr/bin/redbear-ecmd"
@@ -396,7 +500,7 @@ path = "/etc/init.d/16_redbear-usbaudiod.service"
 data = """
 [unit]
 description = "USB Audio Class daemon"
-requires_weak = ["12_boot-late.target"]
+requires_weak = ["04_drivers.target"]

 [service]
 cmd = "/usr/bin/redbear-usbaudiod"
@@ -137,7 +137,7 @@ kwin = {}
 redbear-authd = {}
 redbear-session-launch = {}
 seatd = {}
-redbear-greeter = "ignore"  # WIP: blocked on qmlimportscanner from qtdeclarative
+redbear-greeter = {}
 amdgpu = {}

 # Core Red Bear umbrella package
@@ -148,8 +148,8 @@ relibc-phase1-tests = {}

 # Native build toolchain (Phase 3: GCC + binutils running on redox)
 # Produces gcc/g++/as/ld that execute inside Red Bear OS
-gcc-native = "ignore"  # WIP: depends on binutils-native
-binutils-native = "ignore"  # WIP: source archive not in offline cache
+gcc-native = {}
+binutils-native = {}
 # llvm-native = {}  # suppressed: Redox C++/pthread header gaps; not needed for greeter proof
 # rust-native = {}  # suppressed: depends on llvm-native; not needed for greeter proof

@@ -237,7 +237,7 @@ data = """
 [unit]
 description = "Boot essential services target"
 requires_weak = [
-    "00_base.target",
+    "04_drivers.target",
 ]
 """

@@ -261,7 +261,7 @@ data = """
 [unit]
 description = "DRM/KMS display driver (AMD + Intel + VirtIO)"
 requires_weak = [
-    "05_boot-essential.target",
+    "04_drivers.target",
 ]

 [service]
@@ -276,7 +276,7 @@ data = """
 [unit]
 description = "D-Bus system bus"
 requires_weak = [
-    "12_boot-late.target",
+    "06_services.target",
    "00_ipcd.service",
 ]

@@ -292,6 +292,7 @@ data = """
 [unit]
 description = "Red Bear session broker (org.freedesktop.login1)"
 requires_weak = [
+    "06_services.target",
    "12_dbus.service",
 ]

@@ -306,6 +307,7 @@ data = """
 [unit]
 description = "seatd seat management daemon"
 requires_weak = [
+    "06_services.target",
    "12_dbus.service",
    "13_redbear-sessiond.service",
 ]
@@ -425,6 +427,7 @@ data = """
 [unit]
 description = "Red Bear greeter service"
 requires_weak = [
+    "08_userland.target",
    "00_driver-manager.service",
    "14_redox-drm.service",
    "12_dbus.service",
@@ -444,8 +447,9 @@ path = "/etc/init.d/29_activate_console.service"
 data = """
 [unit]
 description = "Activate fallback console VT"
+default_dependencies = false
 requires_weak = [
-    "05_boot-essential.target",
+    "00_base.target",
 ]

 [service]
@@ -459,6 +463,7 @@ path = "/etc/init.d/30_console.service"
 data = """
 [unit]
 description = "Console terminals"
+default_dependencies = false
 requires_weak = [
    "29_activate_console.service",
 ]
@@ -474,6 +479,7 @@ path = "/etc/init.d/31_debug_console.service"
 data = """
 [unit]
 description = "Debug console on serial port"
+default_dependencies = false
 requires_weak = [
    "29_activate_console.service",
 ]
@@ -517,34 +523,6 @@ members = ["greeter"]
 gid = 100
 members = ["messagebus"]

-[[files]]
-path = "/etc/pcid.d/ihdgd.toml"
-data = """
-[[drivers]]
-name = "Intel GPU (VGA compatible)"
-class = 0x03
-vendor = 0x8086
-subclass = 0x00
-command = ["redox-drm"]
-
-[[drivers]]
-name = "Intel GPU (3D controller)"
-class = 0x03
-vendor = 0x8086
-subclass = 0x02
-command = ["redox-drm"]
-"""
-
-[[files]]
-path = "/etc/pcid.d/virtio-gpud.toml"
-data = """
-[[drivers]]
-name = "VirtIO GPU"
-class = 0x03
-vendor = 0x1af4
-device = 0x1050
-command = ["/usr/bin/redox-drm"]
-"""

 [[files]]
 path = "/etc/environment.d/90-dbus.conf"
@@ -8,7 +8,7 @@ data = """
 [unit]
 description = "Boot essential services target"
 requires_weak = [
-    "00_base.target",
+    "04_drivers.target",
 ]
 """

@@ -101,7 +101,7 @@ data = """
 [unit]
 description = "Activate fallback console VT"
 requires_weak = [
-    "05_boot-essential.target",
+    "08_userland.target",
 ]

 [service]
@@ -3,14 +3,9 @@
 # 00_base.service: stripped base setup (tmpdir only, no sudo — sudo runs from
 #                  base.toml's 00_sudo.service). ipcd and ptyd are started by
 #                  00_ipcd.service and 00_ptyd.service from the base recipe.
-# 00_drivers / 10_net: no longer overridden — the legacy scripts were removed
-#             from base.toml. The retained 00_pcid-spawner.service unit name now
-#             launches driver-manager so existing init ordering remains stable.
-# 00_pcid-spawner.service: compatibility wrapper for driver-manager. The base
-#          recipe uses type="oneshot" which blocks init until pcid-spawner exits.
-#          Running driver-manager here with oneshot_async keeps the historic unit
-#          name for downstream `requires_weak` consumers while moving PCI driver
-#          spawning to the manager that performs bind/channel handoff.
+# 00_pcid-spawner.service has been fully replaced by 00_driver-manager.service
+#                  (defined in redbear-device-services.toml). The old pcid-spawner
+#                  unit name is no longer used anywhere.

 [packages]
 zsh = {}
@@ -38,16 +33,3 @@ default_dependencies = false
 cmd = "audiod"
 type = "oneshot_async"
 """
-
-[[files]]
-path = "/etc/init.d/00_pcid-spawner.service"
-data = """
-[unit]
-description = "PCI driver spawner compatibility alias"
-default_dependencies = false
-
-[service]
-cmd = "echo"
-args = ["pcid-spawner compatibility alias: driver-manager owns PCI driver spawning"]
-type = "oneshot"
-"""
@@ -9,7 +9,7 @@
 # - all non-graphics, non-firmware packages from the full profile
 # - no linux-firmware payload, no firmware-loader, no GPU/display drivers

-include = ["minimal.toml", "redbear-legacy-base.toml", "redbear-netctl.toml", "redbear-device-services.toml"]
+include = ["minimal.toml", "redbear-legacy-base.toml", "redbear-netctl.toml", "redbear-device-services.toml", "redbear-boot-stages.toml"]

 [general]
 filesystem_size = 1536
@@ -27,9 +27,8 @@ redbear-release = {}
 redbear-hwutils = {}
 redbear-quirks = {}

-# Device driver infrastructure: driver-manager is started by
-# redbear-device-services.toml, with 00_pcid-spawner.service retained only as a
-# compatibility dependency alias for older service units.
+# Device driver infrastructure: driver-manager replaces pcid-spawner;
+# 00_driver-manager.service is defined in redbear-device-services.toml.
 ehcid = {}
 ohcid = {}
 uhcid = {}
@@ -53,6 +52,7 @@ redbear-info = {}
 cub = {}
 cpufreqd = {}
 thermald = {}
+coretempd = {}
 hwrngd = {}
 redbear-acmd = {}
 redbear-ecmd = {}
@@ -99,7 +99,7 @@ meson = {}
 ninja-build = {}
 m4 = {}
 #git = {}  # suppressed: cascading rebuild; git not needed for boot/recovery
-htop = {}
+#htop = {}  # disabled: build failure in redoxer env (pre-existing)
 #mc = {}  # suppressed: C99 format warning errors in compilation

 # ── Build / packaging utilities ──
@@ -231,6 +231,7 @@ path = "/etc/init.d/00_i2c-dw-acpi.service"
 data = """
 [unit]
 description = "DesignWare ACPI I2C controller (non-blocking)"
+default_dependencies = false
 requires_weak = [
    "00_i2cd.service",
 ]
@@ -245,6 +246,7 @@ path = "/etc/init.d/00_intel-gpiod.service"
 data = """
 [unit]
 description = "Intel ACPI GPIO registrar (non-blocking)"
+default_dependencies = false
 requires_weak = [
    "00_gpiod.service",
    "00_i2cd.service",
@@ -260,6 +262,7 @@ path = "/etc/init.d/00_i2c-gpio-expanderd.service"
 data = """
 [unit]
 description = "I2C GPIO expander companion bridge (non-blocking on live-mini)"
+default_dependencies = false
 requires_weak = [
    "00_i2cd.service",
    "00_gpiod.service",
@@ -275,6 +278,8 @@ path = "/etc/init.d/00_i2c-hidd.service"
 data = """
 [unit]
 description = "ACPI I2C HID bring-up daemon (non-blocking)"
+default_dependencies = false
+requires = ["00_acpid.service"]
 requires_weak = [
    "00_i2cd.service",
    "00_i2c-dw-acpi.service",
@@ -292,6 +297,7 @@ path = "/etc/init.d/00_ucsid.service"
 data = """
 [unit]
 description = "USB-C UCSI topology detector (non-blocking on live-mini)"
+default_dependencies = false
 requires_weak = [
    "00_base.target",
    "00_i2cd.service",
@@ -306,9 +312,9 @@ type = { scheme = "ucsi" }
 path = "/etc/init.d/12_boot-late.target"
 data = """
 [unit]
-description = "Late boot services target"
+description = "Late boot services target (compat alias for 04_drivers.target)"
 requires_weak = [
-    "00_base.target",
+    "04_drivers.target",
 ]
 """

@@ -467,23 +473,7 @@ data = ""
 directory = true
 mode = 0o755

-[[files]]
-path = "/etc/pcid.d/ihdgd.toml"
-data = """
-# redbear-live-mini: text-only image; override upstream ihdgd config with empty file
-"""

-[[files]]
-path = "/etc/pcid.d/virtio-gpud.toml"
-data = """
-# redbear-live-mini: text-only image; override upstream virtio-gpud config with empty file
-"""
-
-[[files]]
-path = "/etc/pcid.d/00_text_mode_gpu_mask.toml"
-data = """
-# redbear-live-mini: no display driver matched; class 0x03 devices are skipped
-"""

 [[files]]
 path = "/lib/drivers.d/30-graphics.toml"
@@ -502,6 +492,7 @@ path = "/etc/init.d/29_activate_console.service"
 data = """
 [unit]
 description = "Activate console VT"
+default_dependencies = false
 requires_weak = ["00_base.target"]

 [service]
@@ -515,6 +506,7 @@ path = "/etc/init.d/30_console.service"
 data = """
 [unit]
 description = "Console terminals"
+default_dependencies = false
 requires_weak = ["29_activate_console.service"]

 [service]
@@ -528,6 +520,7 @@ path = "/etc/init.d/31_debug_console.service"
 data = """
 [unit]
 description = "Debug console"
+default_dependencies = false
 requires_weak = ["29_activate_console.service"]

 [service]
@@ -535,3 +528,16 @@ cmd = "getty"
 args = ["/scheme/debug/no-preserve", "-J"]
 type = "oneshot_async"
 """
+
+[[files]]
+path = "/etc/init.d/08_userland.target"
+data = """
+[unit]
+description = "Userland services target"
+requires_weak = [
+    "06_services.target",
+    "29_activate_console.service",
+    "30_console.service",
+    "31_debug_console.service",
+]
+"""
@@ -21,7 +21,7 @@ path = "/usr/lib/init.d/10_smolnetd.service"
 data = """
 [unit]
 description = "Network stack for redoxer"
-requires_weak = ["00_pcid-spawner.service"]
+requires_weak = ["00_driver-manager.service"]

 [service]
 cmd = "netstack"
@@ -1,5 +1,29 @@
 # RED BEAR OS — DERIVATIVE OF REDOX OS

+## INTERNAL-FIRST POLICY
+
+When building packages, resolving dependencies, or choosing between implementations,
+**always prefer internal Red Bear recipes and patched sources** over upstream or
+third-party alternatives. This applies at every decision point:
+
+| Decision | Always prefer |
+|----------|--------------|
+| Recipe location | `local/recipes/` over `recipes/` (upstream) |
+| Patch source | `local/patches/` over in-tree edits or ad-hoc sed |
+| Source tarball | Patched archive in `sources/redbear-0.1.0/tarballs/` over re-download |
+| Implementation | Red Bear Rust implementation over upstream C port |
+| Config | `config/redbear-*.toml` over mainline `config/*.toml` |
+| Scripts | `local/scripts/` over ad-hoc shell commands |
+
+**Concretely:** if `local/recipes/<category>/<name>/` exists and is symlinked into the
+recipe tree, that is the authoritative recipe — never fall back to the upstream
+`recipes/` version. If a local recipe has a `redox.patch`, that patch is the
+maintained Red Bear delta — never work around it by editing the source tree directly.
+
+**Rationale:** the local overlay is the durable, version-controlled, release-safe layer.
+Upstream recipes are disposable and may be overwritten by `make distclean` or release
+provisioning. Only `local/` survives across rebuilds and releases.
+
 ## TUI CONVENTION — `-i` INTERACTIVE SWITCH

 All Red Bear desktop applications that offer a TUI mode MUST use `-i`/`--interactive`
@@ -50,6 +74,58 @@ files, Wayland protocol stubs, D-Bus service stubs, and any other layer of the s

 **No exceptions. No "temporary." No "until we fix it properly."**

+## BUILD DURABILITY AND CASCADE POLICY
+
+### Every Build Lands in the Repo
+
+Every successful `repo cook <package>` MUST produce two durable artifacts:
+
+1. **Package in the repo**: `repo/x86_64-unknown-redox/<name>.pkgar` + `<name>.toml`
+2. **Patched source form**: All source modifications mirrored to `local/patches/<component>/`
+
+A build is **not complete** until both exist. Verify after every cook:
+
+```bash
+./target/release/repo find <package>           # Must find the package
+ls repo/x86_64-unknown-redox/<package>.toml     # Manifest must exist
+ls repo/x86_64-unknown-redox/<package>.pkgar    # Archive must exist
+```
+
+If a package was built but the repo artifacts are missing, the build did not complete.
+If source patches exist only in `recipes/*/source/` but not in `local/patches/`,
+the patches are not durable (see Source-of-Truth Rule below).
+
+### Cascade Rebuild Rule
+
+When a low-level package changes, **all packages that transitively depend on it
+must be rebuilt**. Otherwise a stale dependent can fail later with link errors, ABI
+mismatches, or runtime crashes.
+
+```bash
+# Rebuild relibc and everything that depends on it
+./local/scripts/rebuild-cascade.sh relibc
+
+# Dry run: show what would be rebuilt without building
+./local/scripts/rebuild-cascade.sh --dry-run relibc
+
+# Multiple root packages
+./local/scripts/rebuild-cascade.sh relibc ncurses
+```
+
+The script performs a breadth-first search (BFS) over reverse dependencies: it finds every
+package whose `recipe.toml` lists the target in `dependencies`, expands that set
+transitively, then builds the root packages first, followed by their dependents.
+
+**Always use cascade rebuilds after changing:**
+- relibc (headers, ABI, any patches)
+- Kernel (syscall ABI changes)
+- Shared libraries (ncurses, zlib, openssl, etc.)
+- Any package listed in other packages' `dependencies`
+
+**Example:** Changing relibc's `sys/types/internal.h` header requires rebuilding
+bison, m4, flex, and every other gnulib-based package that includes system headers
+through the relibc include chain.
+
 ## DESIGN PRINCIPLE

 Red Bear OS is a **full fork** based on frozen Redox OS snapshots:
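
The Cascade Rebuild Rule added in this hunk describes a BFS over reverse dependencies followed by a root-first build. A minimal sketch of that idea, assuming the `dependencies` lists have already been parsed out of each `recipe.toml` into a plain dict; `cascade_order` is a hypothetical helper for illustration and is not the logic of `rebuild-cascade.sh`, which the diff does not show:

```python
from collections import deque


def cascade_order(deps, roots):
    """Return the roots plus all transitive dependents, dependencies first.

    deps maps a package name to the list of packages it depends on
    (assumed to be taken from the `dependencies` key of its recipe.toml).
    """
    # Invert the graph: package -> set of packages that depend on it.
    rdeps = {}
    for pkg, needs in deps.items():
        for need in needs:
            rdeps.setdefault(need, set()).add(pkg)

    # BFS from the roots over reverse edges collects the affected set.
    affected = set(roots)
    queue = deque(roots)
    while queue:
        current = queue.popleft()
        for dependent in sorted(rdeps.get(current, ())):
            if dependent not in affected:
                affected.add(dependent)
                queue.append(dependent)

    # Kahn's algorithm restricted to the affected set: a package becomes
    # ready once all of its affected dependencies have been emitted.
    # Packages caught in a dependency cycle never become ready and are
    # left out of the result.
    pending = {p: {d for d in deps.get(p, ()) if d in affected} for p in affected}
    ready = sorted(p for p, waiting in pending.items() if not waiting)
    order = []
    while ready:
        pkg = ready.pop(0)
        order.append(pkg)
        for other in sorted(rdeps.get(pkg, ())):
            if other in pending and pkg in pending[other]:
                pending[other].discard(pkg)
                if not pending[other]:
                    ready.append(other)
        ready.sort()  # keep the output deterministic
    return order


# Illustrative graph only; these edges are not read from the real recipes.
EXAMPLE_DEPS = {
    "relibc": [],
    "zlib": [],
    "ncurses": ["relibc"],
    "bison": ["relibc"],
    "nano": ["ncurses", "relibc"],
}
EXAMPLE_ORDER = cascade_order(EXAMPLE_DEPS, ["relibc"])
```

With this example graph, `zlib` is untouched because nothing on its dependency path changed, and `nano` is ordered after `ncurses` so that it is rebuilt against the fresh library rather than a stale one.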
@@ -73,10 +149,21 @@ make all CONFIG_NAME=redbear-full
  → mk/config.mk resolves to the active desktop/graphics compile target
  → Desktop/graphics are available only on redbear-full
  → repo cook builds all packages from local sources (offline by default)
+  → Each successful cook produces repo/<arch>/<name>.pkgar + <name>.toml
  → mk/disk.mk creates harddrive.img with Red Bear branding
  → REDBEAR_RELEASE=0.1.0 ensures immutable, archived sources
 ```

+Cascade rebuild flow (when a low-level package changes):
+```
+./local/scripts/rebuild-cascade.sh <package>
+  → Finds all packages whose recipe.toml lists <package> in dependencies
+  → BFS expands the reverse dependency graph
+  → Builds root package first, then dependents in dependency order
+  → Pushes all rebuilt packages to sysroot
+  → Every rebuilt package lands in repo/ (.pkgar + .toml)
+```
+
 Release flow:
 ```
 # Sources are immutable — build from archives, never from network
@@ -259,6 +346,7 @@ redox-master/                  ← git pull updates mainline Redox
 │   │   └── images/            ← Red Bear OS icon (1254x1254) + loading bg (1536x1024)
 │   ├── firmware/              ← GPU firmware blobs (gitignored, fetched)
 │   ├── scripts/
+│   │   ├── rebuild-cascade.sh  ← Rebuild package + all dependents (BFS reverse-dep graph)
 │   │   ├── provision-release.sh   ← Provision new release from Redox ref
 │   │   ├── build-redbear.sh   ← Unified Red Bear OS build script
 │   │   ├── fetch-firmware.sh  ← Download bounded AMD or Intel firmware subsets from linux-firmware
@@ -311,6 +399,10 @@ scripts/build-iso.sh redbear-full                 # Full desktop live ISO
 scripts/build-iso.sh redbear-mini                 # Text-only mini (default)
 scripts/build-iso.sh redbear-grub                 # Text-only + GRUB

+# Rebuild a package and all its dependents (cascade)
+./local/scripts/rebuild-cascade.sh relibc         # Rebuild relibc + all dependents
+./local/scripts/rebuild-cascade.sh --dry-run ncurses  # Show cascade without building
+
 # VM-network baseline validation helpers
 ./local/scripts/validate-vm-network-baseline.sh
 ./local/scripts/test-vm-network-qemu.sh redbear-mini
@@ -848,4 +940,4 @@ Config comparison:

 ## ANTI-PATTERNS (COMMIT POLICY)

-- **DO NOT** include AI attribution in commit messages — no "Ultraworked with [Sisyphus]", "Co-authored-by: Sisyphus", or similar AI agent footers. Commits belong to the human author only.
+- **DO NOT** include AI attribution in commit messages — no AI agent footers, co-authored-by lines for automated assistance, or similar markers. Commits belong to the human author only.
@@ -7,6 +7,7 @@ priority = 100
 command = ["/usr/lib/drivers/nvmed"]

 [[driver.match]]
+bus = "pci"
 class = 1
 subclass = 8

@@ -17,6 +18,7 @@ priority = 100
 command = ["/usr/lib/drivers/ahcid"]

 [[driver.match]]
+bus = "pci"
 class = 1
 subclass = 6

@@ -27,6 +29,7 @@ priority = 100
 command = ["/usr/lib/drivers/ided"]

 [[driver.match]]
+bus = "pci"
 class = 1
 subclass = 1

@@ -37,6 +40,7 @@ priority = 100
 command = ["/usr/lib/drivers/virtio-blkd"]

 [[driver.match]]
+bus = "pci"
 vendor = 0x1AF4
 device = 0x1001
 class = 1
@@ -7,6 +7,7 @@ priority = 50
 command = ["/usr/lib/drivers/e1000d"]

 [[driver.match]]
+bus = "pci"
 vendor = 0x8086
 class = 2

@@ -17,6 +18,7 @@ priority = 50
 command = ["/usr/lib/drivers/rtl8168d"]

 [[driver.match]]
+bus = "pci"
 vendor = 0x10EC
 class = 2

@@ -27,6 +29,7 @@ priority = 50
 command = ["/usr/lib/drivers/rtl8139d"]

 [[driver.match]]
+bus = "pci"
 vendor = 0x10EC
 device = 0x8139

@@ -37,6 +40,7 @@ priority = 50
 command = ["/usr/lib/drivers/ixgbed"]

 [[driver.match]]
+bus = "pci"
 vendor = 0x8086
 class = 2
 subclass = 0
@@ -48,5 +52,6 @@ priority = 50
 command = ["/usr/lib/drivers/virtio-netd"]

 [[driver.match]]
+bus = "pci"
 vendor = 0x1AF4
 class = 2
@@ -44,6 +44,49 @@ priority = 80
 command = ["/usr/lib/drivers/uhcid"]

 [[driver.match]]
+bus = "pci"
+class = 0x0C
+subclass = 0x03
+prog_if = 0x30
+
+# EHCI (USB 2.0)
+[[driver]]
+name = "ehcid"
+description = "EHCI USB 2.0 host controller"
+priority = 80
+command = ["/usr/lib/drivers/ehcid"]
+
+# ehcid currently exposes a minimal /scheme/usb controller interface (per-port status and
+# control-transfer pass-through) while the rest of the USB stack is still being integrated.
+
+[[driver.match]]
+bus = "pci"
+class = 0x0C
+subclass = 0x03
+prog_if = 0x20
+
+# OHCI (USB 1.1 — non-Intel chipsets)
+[[driver]]
+name = "ohcid"
+description = "OHCI USB 1.1 host controller"
+priority = 80
+command = ["/usr/lib/drivers/ohcid"]
+
+[[driver.match]]
+bus = "pci"
+class = 0x0C
+subclass = 0x03
+prog_if = 0x10
+
+# UHCI (USB 1.1 — Intel chipsets)
+[[driver]]
+name = "uhcid"
+description = "UHCI USB 1.1 host controller (Intel)"
+priority = 80
+command = ["/usr/lib/drivers/uhcid"]
+
+[[driver.match]]
+bus = "pci"
 class = 0x0C
 subclass = 0x03
 prog_if = 0x00
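
The `[[driver.match]]` blocks in these hunks combine `bus`, `class`, `subclass`, `prog_if`, `vendor`, and `device` fields. A minimal sketch of how such rules might be evaluated, assuming that every field present in a rule must equal the device's value and that a driver matches if any one of its rules does. The diff does not show driver-manager's real matching code, so `rule_matches`, `matching_drivers`, and the dict layout are hypothetical:

```python
def rule_matches(rule, device):
    """True when every field the rule specifies equals the device's value."""
    return all(device.get(key) == value for key, value in rule.items())


def matching_drivers(drivers, device):
    """Names of all drivers with at least one matching rule, in file order."""
    return [
        driver["name"]
        for driver in drivers
        if any(rule_matches(rule, device) for rule in driver["match"])
    ]


# Rules transcribed from the USB hunk above. The prog_if values are the
# standard PCI USB programming interfaces: 0x00 UHCI, 0x10 OHCI, 0x20 EHCI.
USB_DRIVERS = [
    {"name": "ehcid",
     "match": [{"bus": "pci", "class": 0x0C, "subclass": 0x03, "prog_if": 0x20}]},
    {"name": "ohcid",
     "match": [{"bus": "pci", "class": 0x0C, "subclass": 0x03, "prog_if": 0x10}]},
    {"name": "uhcid",
     "match": [{"bus": "pci", "class": 0x0C, "subclass": 0x03, "prog_if": 0x00}]},
]

# A made-up EHCI controller as a PCI enumerator might describe it.
EHCI_DEVICE = {"bus": "pci", "vendor": 0x8086, "class": 0x0C,
               "subclass": 0x03, "prog_if": 0x20}
EHCI_MATCHES = matching_drivers(USB_DRIVERS, EHCI_DEVICE)
```

Under these assumed semantics, adding `bus = "pci"` to a rule narrows it: the same class codes reported on another bus no longer match, which matters once ACPI devices carry PCI-equivalent class codes.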
@@ -7,6 +7,7 @@ priority = 60
 command = ["/usr/lib/drivers/vesad"]

 [[driver.match]]
+bus = "pci"
 class = 0x03

 [[driver]]
@@ -18,14 +19,17 @@ command = ["/usr/bin/redox-drm"]
 # Only match known GPU vendors.  Class 0x03 alone catches QEMU VGA
 # (vendor 0x1234) which redox-drm rejects with a fatal error.
 [[driver.match]]
+bus = "pci"
 vendor = 0x1002
 class = 0x03

 [[driver.match]]
+bus = "pci"
 vendor = 0x8086
 class = 0x03

 [[driver.match]]
+bus = "pci"
 vendor = 0x1AF4
 class = 0x03

@@ -36,6 +40,7 @@ priority = 61
 command = ["/usr/bin/redox-drm"]

 [[driver.match]]
+bus = "pci"
 vendor = 0x1AF4
 class = 0x03

@@ -47,6 +52,7 @@ priority = 61
 command = ["/usr/bin/redox-drm"]

 [[driver.match]]
+bus = "pci"
 vendor = 0x8086
 class = 0x03
 subclass = 0x00
@@ -59,6 +65,7 @@ priority = 61
 command = ["/usr/bin/redox-drm"]

 [[driver.match]]
+bus = "pci"
 vendor = 0x1002
 class = 0x03
 subclass = 0x00
@@ -7,6 +7,7 @@ priority = 40
 command = ["/usr/lib/drivers/ihdad"]

 [[driver.match]]
+bus = "pci"
 vendor = 0x8086
 class = 0x04

@@ -17,6 +18,7 @@ priority = 40
 command = ["/usr/lib/drivers/ac97d"]

 [[driver.match]]
+bus = "pci"
 class = 0x04
 subclass = 0x01

@@ -1,49 +1,139 @@
 # GPIO and I2C controller drivers
+#
+# These drivers match against both PCI and ACPI devices.
+# ACPI devices are classified by _HID → PCI-equivalent class/subclass/vendor
+# codes via redox-driver-acpi's classify_acpi_device().
+#
+# Match criteria use the standard [[driver.match]] format with class/subclass/vendor.
+# The ACPI bus fills these fields from the _HID classification table.
+
+# --- I2C/SPI controller infrastructure ---

 [[driver]]
 name = "i2cd"
 description = "I2C host adapter registry"
 priority = 85
 command = ["/usr/lib/drivers/i2cd"]
+# i2cd is the I2C bus registry — spawned as infrastructure before
+# specific I2C controller drivers. Does not match against hardware
+# directly; it provides /scheme/i2c for controller drivers to register with.

 [[driver]]
 name = "gpiod"
 description = "GPIO controller registry"
 priority = 85
 command = ["/usr/lib/drivers/gpiod"]
+# gpiod is the GPIO pin registry — spawned as infrastructure before
+# specific GPIO controller drivers. Does not match against hardware
+# directly; it provides /scheme/gpio for controller drivers to register with.
+
+# --- ACPI I2C controller drivers ---
+# These match against ACPI devices classified as Serial Bus Controller (0x0C),
+# subclass SMBus/I2C (0x05), by the ACPI bus.
+# The ACPI bus maps Intel INT33C3/INT3433/... and AMD AMDI0010 HIDs to these codes.

 [[driver]]
 name = "dw-acpi-i2cd"
 description = "DesignWare ACPI I2C controller"
 priority = 80
 command = ["/usr/lib/drivers/dw-acpi-i2cd"]
+depends_on = ["acpi", "i2c"]

-[[driver]]
-name = "intel-gpiod"
-description = "Intel ACPI GPIO registrar"
-priority = 80
-command = ["/usr/lib/drivers/intel-gpiod"]
+[[driver.match]]
+bus = "acpi"
+class = 0x0C
+subclass = 0x05
+vendor = 0x8086

 [[driver]]
 name = "amd-mp2-i2cd"
 description = "AMD MP2 I2C controller"
 priority = 80
 command = ["/usr/lib/drivers/amd-mp2-i2cd"]
+depends_on = ["acpi", "i2c"]
+
+[[driver.match]]
+bus = "acpi"
+class = 0x0C
+subclass = 0x05
+vendor = 0x1022

 [[driver]]
 name = "intel-lpss-i2cd"
 description = "Intel LPSS I2C controller"
 priority = 80
 command = ["/usr/lib/drivers/intel-lpss-i2cd"]
+depends_on = ["acpi", "i2c"]
+
+[[driver.match]]
+bus = "acpi"
+class = 0x0C
+subclass = 0x05
+vendor = 0x8086
+
+# --- ACPI SPI controller drivers ---
+# These match against ACPI devices classified as Serial Bus Controller (0x0C),
+# subclass SPI (0x06), by the ACPI bus.
+
+[[driver]]
+name = "intel-lpss-spid"
+description = "Intel LPSS SPI controller"
+priority = 80
+command = ["/usr/lib/drivers/intel-lpss-spid"]
+depends_on = ["acpi"]
+
+[[driver.match]]
+bus = "acpi"
+class = 0x0C
+subclass = 0x06
+vendor = 0x8086
+
+# --- ACPI GPIO controller drivers ---
+# These match against ACPI devices classified as Serial Bus Controller (0x0C),
+# subclass Other (0x80), vendor Intel, by the ACPI bus.
+# The ACPI bus maps INT33C7/INT3437/INT3450 HIDs to these codes.
+
+[[driver]]
+name = "intel-gpiod"
+description = "Intel ACPI GPIO registrar"
+priority = 80
+command = ["/usr/lib/drivers/intel-gpiod"]
+depends_on = ["acpi", "gpio"]
+
+[[driver.match]]
+bus = "acpi"
+class = 0x0C
+subclass = 0x80
+vendor = 0x8086
+
+# --- ACPI thermal/power drivers ---
+# These match against ACPI devices classified as Thermal/Battery (0x0B).
+
+[[driver]]
+name = "redbear-thermald"
+description = "ACPI thermal zone monitor"
+priority = 60
+command = ["/usr/lib/drivers/redbear-thermald"]
+depends_on = ["acpi"]
+
+[[driver.match]]
+bus = "acpi"
+class = 0x0B
+
+# --- I2C companion drivers ---
+# These depend on the I2C bus being available and match against specific
+# I2C device addresses (not PCI/ACPI class matching).

 [[driver]]
 name = "i2c-gpio-expanderd"
 description = "I2C GPIO expander companion bridge"
 priority = 75
 command = ["/usr/lib/drivers/i2c-gpio-expanderd"]
+depends_on = ["i2c", "gpio"]

 [[driver]]
 name = "intel-thc-hidd"
 description = "Intel THC QuickI2C HID transport"
 priority = 75
 command = ["/usr/lib/drivers/intel-thc-hidd"]
+depends_on = ["acpi", "i2c"]
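
The comments in this file say ACPI devices are classified from `_HID` into PCI-equivalent class/subclass/vendor codes. A minimal sketch of such a lookup table, using only the HIDs and codes named in those comments. The real `classify_acpi_device()` in redox-driver-acpi is not shown in the diff, so `classify_hid` and the table layout are hypothetical, and the vendor assigned to `AMDI0010` (0x1022) is inferred from the `amd-mp2-i2cd` match rule rather than stated directly:

```python
# _HID -> (class, subclass, vendor), following the comments in the hunk above.
HID_CLASSES = {
    "INT33C3": (0x0C, 0x05, 0x8086),   # Intel I2C controller
    "INT3433": (0x0C, 0x05, 0x8086),   # Intel I2C controller
    "AMDI0010": (0x0C, 0x05, 0x1022),  # AMD I2C controller (vendor assumed)
    "INT33C7": (0x0C, 0x80, 0x8086),   # Intel GPIO controller
    "INT3437": (0x0C, 0x80, 0x8086),   # Intel GPIO controller
    "INT3450": (0x0C, 0x80, 0x8086),   # Intel GPIO controller
}


def classify_hid(hid):
    """Return match fields for a known _HID, or None when it is unclassified."""
    entry = HID_CLASSES.get(hid.upper())
    if entry is None:
        return None
    device_class, subclass, vendor = entry
    return {"bus": "acpi", "class": device_class,
            "subclass": subclass, "vendor": vendor}
```

Note that `dw-acpi-i2cd` and `intel-lpss-i2cd` carry identical match fields (`0x0C`/`0x05`/`0x8086`), so a classification like the one sketched here cannot tell them apart by itself; how the driver manager chooses between them is not visible in this diff.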
@@ -1,158 +0,0 @@
-# Red Bear OS — CPU/DMA/IRQ/MSI/Scheduler Fix Plan
-
-**Date**: 2026-05-04
-**Updated**: 2026-05-04 (MSI T1.1–T2.2 implemented, committed, pushed)
-**Status**: Active — MSI Phase 1 complete, DMA/Scheduler pending
-**Source of truth**: Linux kernel 7.0 (local/reference/linux-7.0/)
-
-## 1. Problem Statement
-
-Five critical integration gaps in the microkernel architecture:
-
-| Gap | Severity | Impact | Status |
-|-----|----------|--------|--------|
-| MSI absent from kernel | CRITICAL | All NVMe/GPU/NIC on legacy INTx | ✅ RESOLVED (P8-msi.patch) |
-| DMA/IOMMU not integrated | CRITICAL | DMA buffers unprotected | ⏳ Pending |
-| PIT tick (148Hz) vs LAPIC (1000Hz) | HIGH | Scheduler 6x slower than Linux | ✅ RESOLVED (P7-scheduler patch) |
-| Global scheduler lock | HIGH | Serializes all context switches | ✅ RESOLVED (work-stealing) |
-| Thread creation (3 IPC hops) | HIGH | 3x slower than Linux clone() | ⏳ Pending |
-
-## 2. Phase 1: MSI/MSI-X in Kernel (Week 1-3) ✅ COMPLETE
-
-### T1.1: MSI Capability Parsing ✅ DONE
- File: `kernel/src/arch/x86_shared/device/msi.rs` (61 lines)
- Commit: `678980521` in `P8-msi.patch`
- Linux ref: `arch/x86/kernel/apic/msi.c` (391 lines)
- Implements: `MsiMessage` (compose/validate), `MsiCapability` (parse 32/64-bit), `MsixCapability` (parse table/PBA), `is_valid_msi_address`, `is_valid_msi_vector`
- Bounds-safe: all `parse()` methods return `Option<Self>`, using `.get()` instead of raw indexing
-
-### T1.2: Vector Allocation Matrix ✅ DONE
- File: `kernel/src/arch/x86_shared/device/vector.rs` (53 lines)
- Commit: `678980521` in `P8-msi.patch`
- Linux ref: `arch/x86/kernel/apic/vector.c` (1387 lines)
- Implements: per-CPU bitmatrix (7×32-bit banks = 224 vectors 32-255), `allocate_vector`, `free_vector`
- Lock-free CAS-based allocation with `trailing_ones()` find-first-zero
- NOTE: VECTORS table is global (not yet per-CPU sharded) — sufficient for 224 vectors
-
-### T1.3: MSI IRQ Domain (Scheme Integration) ✅ DONE
- File: `kernel/src/scheme/irq.rs`
- Commit: `678980521` in `P8-msi.patch`
- Implements: `msi_vector_is_valid()` (32-0xEF range check), `iommu_validate_msi_irq()` hook (stub: always true), IOMMU gate at `irq_trigger()` for vectors ≥16
-
-### T1.4: Userspace MSI Consumer (driver-sys) ✅ DONE
- File: `local/recipes/drivers/redox-driver-sys/source/src/irq.rs`
- Commit: `678980521`
- Implements: `MsiAllocation` with round-robin CPU allocation, `irq_set_affinity` (scheme write), `program_x86_message` with kernel-mediated address/vector validation (mask `0xFFF0_0000`)
- Quirk-aware fallback retained: FORCE_LEGACY, NO_MSI, NO_MSIX
-
-### T1.5: Kernel-side MSI Affinity Handler ✅ DONE
- File: `kernel/src/scheme/irq.rs`
- Commit: `678980521` in `P8-msi.patch`
- Implements: `Handle::IrqAffinity { irq, mask }` variant, path routing for `<irq>/affinity` and `cpu-XX/<irq>/affinity`, kwrite validates CPU id and stores mask atomically, kfstat/kfpath/kreadoff/close all handle new variant
-
-## 3. Phase 2: DMA/IOMMU Integration (Week 3-5) — AUDITED 2026-05-04
-
-**Status**: IOMMU daemon (1003 lines) and DmaBuffer (261 lines) already exist and are solid. Tasks re-scoped from "create" to "wire."
-
-### T2.1: IommuDmaAllocator (driver-sys) ⏳ P0
- File: `local/recipes/drivers/redox-driver-sys/source/src/dma.rs`
- Add `IommuDmaAllocator` struct: holds IOMMU domain fd, wraps `DmaBuffer::allocate()` with IOMMU MAP opcode
- Uses `scheme:iommu/domain/N` write with MAP request → get IOVA
- Linux ref: `include/linux/dma-mapping.h` — `dma_alloc_coherent()` → `iommu_dma_alloc()`
-
-### T2.2: GPU DMA pass-through ⏳ P0
- Wire `redox-drm` GPU drivers to open IOMMU device endpoint and use IommuDmaAllocator
- amdgpu: VRAM/GTT allocations through IOMMU domain
- Intel i915: GTT pages through IOMMU domain
- Files: `local/recipes/gpu/redox-drm/source/`, `local/recipes/gpu/amdgpu/source/`
-
-### T2.3: Streaming DMA (linux-kpi) ⏳ P1
- `dma_map_single()`: allocate bounce buffer, copy data, map through IOMMU
- `dma_unmap_single()`: copy back, unmap, free bounce buffer
- Linux ref: `kernel/dma/mapping.c` — streaming API
- File: `local/recipes/drivers/linux-kpi/source/`
-
-### T2.4: NVMe DMA pass-through ⏳ P1
- Wire `ahcid`/`nvmed` PRP list physical addresses through IOMMU domain
- Linux ref: `drivers/nvme/host/pci.c` — `nvme_map_data()`
-
-### T2.5: SWIOTLB Fallback (low priority) ⏳ P2
- Linux ref: `kernel/dma/swiotlb.c`
- Bounce buffer for devices with <4GB DMA addressing
- Only needed for ancient hardware; x86_64 modern hardware doesn't need it
-
-## 4. Phase 3: Scheduler Improvements (Week 4-6) — MOSTLY DONE
-
-### T3.1: LAPIC Timer as Primary Tick ✅ DONE
- P7-scheduler-improvements.patch: LAPIC timer calibrated + enabled at vector 48
- TSC-deadline mode, 1000Hz tick drives DWRR scheduler directly
- PIT fallback retained
-
-### T3.2: Per-CPU Scheduler Locks ✅ DONE
- Work-stealing load balancer in switch.rs
- Per-CPU nr_running counter
- Idle CPUs steal work via IPI
-
-### T3.3: Load Balancing ✅ DONE
- RT scheduling class (priority 0-9, skip DWRR, immediate dispatch)
- Threshold reduced: 3→1 ticks for LAPIC-driven mode
- Geometric weights in DWRR
-
-### T3.4: RT Scheduling Class ✅ DONE
-
-### T3.5: NUMA-Aware Scheduling ❌
- Not implemented — low priority for desktop/non-NUMA systems
- Linux ref: kernel/sched/rt.c
- FIFO and Round-Robin classes
- Priority inheritance
- RT throttling: 95% CPU cap/sec
-
-### T3.5: TSC-Deadline Timer
- Use IA32_TSC_DEADLINE MSR for precise tick
- True tickless operation
- TSC calibration via HPET or PIT
-
-## 5. Phase 4: Thread Creation (Week 6-7)
-
-### T4.1: Batched Thread Creation
- Batch new-thread requests (reduce IPC)
- Pre-allocate stack pages during fork
-
-### T4.2: Kernel Thread Pool
- Pre-create idle kernel threads
- Reuse via object pool
-
-### T4.3: Shared Memory IPC
- Use shm for proc scheme bulk ops
- Avoid data copy through IPC channel
-
-## 6. Dependencies
-
-Phase 1 (MSI): T1.1 -> T1.2 -> T1.3 -> T1.4 -> T1.5
-Phase 2 (DMA): T2.1 -> T2.2 -> T2.3 -> T2.4 -> T2.5
-Phase 3 (Sched): T3.1 -> T3.5 -> T3.2 -> T3.3 -> T3.4
-Phase 4 (Thread): T4.1 -> T4.2 -> T4.3
-
-Phase 1+2 independent (parallel). Phase 2.4 needs Phase 1.3.
-Phase 3.1 partially done (start immediately).
-
-## 7. Timeline
-
-| Phase | Duration | Cumulative |
-|-------|----------|------------|
-| Phase 1 (MSI) | 3 weeks | Week 3 |
-| Phase 2 (DMA/IOMMU) | 3 weeks | Week 5 |
-| Phase 3 (Scheduler) | 3 weeks | Week 7 |
-| Phase 4 (Threads) | 2 weeks | Week 7 |
-
-Total: 7 weeks (2 devs parallel Phase 1+2)
-
-## 8. Success Metrics
-
-| Metric | Before | After |
-|--------|--------|-------|
-| Scheduler tick | 148Hz (PIT) | 1000Hz (LAPIC) |
-| NVMe throughput | INTx shared | MSI-X 4+ queues |
-| Context switch | ~6.75ms | ~1ms |
-| Thread create | 3 IPC hops | 2 IPC hops |
-| DMA safety | Unprotected | IOMMU-mapped |
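
Task T1.2 of the removed plan describes a vector bitmatrix of 7 × 32-bit banks covering vectors 32-255, searched with a `trailing_ones()` find-first-zero. A minimal single-threaded sketch of that bit arithmetic follows; it deliberately leaves out the lock-free compare-and-swap loop the plan mentions, and `VectorBitmap` is a hypothetical name rather than the kernel's type:

```python
BANKS = 7                     # 7 banks x 32 bits = 224 vectors
BITS = 32
FIRST_VECTOR = 32             # vectors 0-31 are reserved for CPU exceptions
FULL = (1 << BITS) - 1


def trailing_ones(word):
    """Count consecutive set bits starting at bit 0 (index of the first zero)."""
    count = 0
    while word & 1:
        word >>= 1
        count += 1
    return count


class VectorBitmap:
    def __init__(self):
        self.banks = [0] * BANKS

    def allocate(self):
        """Claim the lowest free vector, or return None when all are in use."""
        for index, word in enumerate(self.banks):
            if word == FULL:
                continue
            bit = trailing_ones(word)
            self.banks[index] = word | (1 << bit)
            return FIRST_VECTOR + index * BITS + bit
        return None

    def free(self, vector):
        """Release a previously allocated vector."""
        offset = vector - FIRST_VECTOR
        if not 0 <= offset < BANKS * BITS:
            raise ValueError(f"vector {vector} is outside the managed range")
        self.banks[offset // BITS] &= ~(1 << (offset % BITS))
```

The bank count follows from the range: 256 − 32 = 224 = 7 × 32, so the last bit of the last bank corresponds to vector 255.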
@@ -1,385 +0,0 @@
-# Red Bear OS — Master Implementation Plan
-
-**Date**: 2026-05-04
-**Status**: Authoritative — supersedes CHANGELOG-DRIVER-IMPROVEMENT-PLAN.md, COMPREHENSIVE-DRIVER-AUDIT-2026-05-04.md, and HARDWARE-VALIDATION-MATRIX.md
-**Source of truth**: Linux kernel 7.0 (`local/reference/linux-7.0/`)
-
---
-
-## 1. Authority & Scope
-
-### 1.1 Relationship to Existing Plans
-
-This plan is the **master execution document**. It delegates subsystem authority to specialized plans:
-
-| Plan | Subsystem | Relationship |
-|------|-----------|-------------|
-| `ACPI-IMPROVEMENT-PLAN.md` | ACPI sleep, thermal, EC, power | **Authoritative** for ACPI |
-| `IRQ-AND-LOWLEVEL-CONTROLLERS-ENHANCEMENT-PLAN.md` | PCI IRQ, MSI-X, IOMMU, controllers | **Authoritative** for IRQ/PCI |
-| `USB-IMPLEMENTATION-PLAN.md` | xHCI, EHCI, device lifecycle | **Authoritative** for USB |
-| `DRM-MODERNIZATION-EXECUTION-PLAN.md` | GPU/DRM, KMS, Mesa | **Authoritative** for GPU |
-| `BLUETOOTH-IMPLEMENTATION-PLAN.md` | BT host/controller | **Authoritative** for BT |
-| `WIFI-IMPLEMENTATION-PLAN.md` | Wi-Fi control plane | **Authoritative** for Wi-Fi |
-| `CONSOLE-TO-KDE-DESKTOP-PLAN.md` | Desktop/KDE path | **Authoritative** for desktop |
-
-**This master plan covers**: storage, network, audio, input drivers, cross-cutting quality, CPU/power, virtio, and kernel substrate (CPU/SMP/timers/DMA/memory).
-
-### 1.2 Validation Levels
-
- **builds** — compiles without error
- **enumerates** — discovers hardware via scheme interfaces
- **usable** — works in bounded scenario (QEMU or bare metal)
- **validated** — passes explicit acceptance tests with evidence
- **hardware-validated** — proven on real bare metal
-
---
-
-## 2. Phase 0: Cross-Cutting Driver Quality (Week 1-2) ⏳ IMPLEMENTED
-
-### T0.1: Driver Error Handling ✅
-
-**Status**: DONE. All 5 critical driver main.rs files have zero `unwrap()` calls. 165-line durable patch at `local/patches/base/P6-driver-main-fixes.patch`.
-
-**Files**: ahcid, e1000d, rtl8168d, ihdad, ac97d main.rs
-
-### T0.2: Driver Logging 
-
-Not started. Drivers use inconsistent logging.
-
-### T0.3: Driver Lifecycle Documentation
-
-Not started.
-
---
-
-## 3. Phase 1: Storage Drivers (Week 2-6) ⏳ STRUCTURE EXISTING
-
-### T1.1: AHCI NCQ ✅ (71 lines, wired)
-
-**Status**: DONE. `ahci/src/ahci/ncq.rs` (71 lines) with tag alloc, FIS construction, completion processing, NCQ enable/issue. Wired via `pub mod ncq` in mod.rs.
-
-**Linux ref**: `drivers/ata/libata-sata.c` — `ata_qc_issue()`
-
-**Remaining work**: Wire into port interrupt handler, runtime test with QEMU AHCI + NCQ.
-
-### T1.2: AHCI Power Management ❌
-
-**Linux ref**: `drivers/ata/libata-eh.c:3682` — `ata_eh_handle_port_suspend()`
-
-### T1.3: AHCI TRIM/Discard ❌
-
-**Linux ref**: `drivers/ata/libata-scsi.c` — `ata_scsi_unmap_xlat()`
-
-### T1.4: NVMe Multiple Queues ❌
-
-**Linux ref**: `drivers/nvme/host/pci.c` — `nvme_reset_work()`
-
---
-
-## 4. Phase 2: Network Drivers (Week 4-8) ⏳ STRUCTURE EXISTING
-
-### T2.1: e1000 ITR + Checksum ✅ (33 lines, wired)
-
-**Status**: DONE. `e1000d/src/itr.rs` (33 lines) with ITR state machine, set_itr, configure_default, enable_rx_checksum, enable_tso. Wired via `pub mod itr` in main.rs.
-
-**Linux ref**: `e1000e/netdev.c:4200` — `e1000_configure_itr()`
-
-### T2.2: e1000 TSO ❌
-
-### T2.3: r8169 PHY ✅ (34 lines, wired)
-
-**Status**: DONE. `rtl8168d/src/phy.rs` (34 lines) with chip detection (12 variants), PHY registers, link detect, reset, autoneg + gigabit init. Wired via `pub mod phy` in main.rs.
-
-**Linux ref**: `r8169_phy_config.c` (1,354 lines)
-
-### T2.4: Jumbo Frames ❌
-
---
-
-## 5. Phase 3: Audio Drivers (Week 6-10) ⏳ STRUCTURE EXISTING
-
-### T3.1: HDA Codec Detection ✅ (STRUCTURE)
-
-**Status**: DONE. `ihdad/src/hda/codec.rs` (18 lines) + `jack.rs` (4 lines). Both wired. 12 known codec table. Jack sense with pin config parsing.
-
-### T3.2: HDA Jack Detection ✅ (STRUCTURE)
-
-**Status**: `ihdad/src/hda/jack.rs` exists. Jack sense, unsolicited response.
-
-### T3.3: HDA Stream Setup
-
-Stream.rs exists (387 lines). NOT runtime-validated.
-
-### T3.4: AC97 Multiple Codec ❌
-
---
-
-## 6. Phase 4: Input Drivers (Week 3-5) ⏳ PARTIAL
-
-### T4.1: PS/2 Controller Reset ❌
-
-**Linux ref**: `drivers/input/serio/i8042.c:522`
-
-### T4.2: Touchpad Protocols ❌
-
-**Linux ref**: `drivers/input/mouse/synaptics.c`
-
---
-
-## 7. Phase 5: Validation (Week 1-12, parallel) ⏳ IMPLEMENTED
-
-### T5.1: Test Harnesses ✅
-
-`local/scripts/test-storage-qemu.sh` and `test-network-qemu.sh` exist.
-
-### T5.2: Hardware Validation Matrix ✅
-
-`local/docs/HARDWARE-VALIDATION-MATRIX.md` — 28 lines tracking 18 components.
-
---
-
-## 8. Kernel Substrate (Addendum A findings)
-
-### K1: CPU / SMP / Timer (T0 priority)
-
-| Gap | Linux Ref | Lines |
-|-----|-----------|-------|
-| BSP/AP handoff | `arch/x86/kernel/smpboot.c:895` | 1,511 |
-| CPU hotplug | `smpboot.c:1312` | — |
-| TSC calibration | `arch/x86/kernel/tsc.c:1186` | 1,612 |
-| APIC timer calibration | `arch/x86/kernel/apic/apic.c:294` | 2,694 |
-| Vector allocation | `arch/x86/kernel/apic/vector.c` | 1,387 |
-| MSI/MSI-X | `arch/x86/kernel/apic/msi.c` | 391 | ✅ DONE — P8-msi.patch (msi.rs, vector.rs, scheme/irq.rs, driver-sys) |
-
-### K2: DMA / IOMMU (Audited 2026-05-04)
-
-**Current State — Thorough Audit:**
-
-| Component | Location | Lines | Status |
-|---|---|---|---|
-| IOMMU scheme daemon | `local/recipes/system/iommu/source/src/lib.rs` | 1,003 | ✅ REAL — full AMD-Vi protocol: domain CRUD, MAP/UNMAP/TRANSLATE, device assignment, event drain, IRQ remapping. Host-runnable tests pass. |
-| AMD-Vi unit driver | `local/recipes/system/iommu/source/src/amd_vi.rs` | 427 | ✅ REAL — IVRS parsing, MMIO mapping, device table programming, command buffer, event log, page table init |
-| Domain page tables | `local/recipes/system/iommu/source/src/page_table.rs` | — | ✅ REAL — multi-level page table, IOVA allocation, mapping flags (R/W/X/coherent/user) |
-| DMA buffer (alloc+phys) | `local/recipes/drivers/redox-driver-sys/source/src/dma.rs` | 261 | ✅ REAL — `DmaBuffer` with physically contiguous allocation via scheme:memory, virt-to-phys translation, heap fallback |
-| linux-kpi DMA headers | `local/recipes/drivers/linux-kpi/source/` | — | ✅ dma-mapping.h, dma-direction.h, scatterlist.h ported |
-| IOMMU←→driver wiring | — | — | ❌ **GAP** — `DmaBuffer` does NOT pass through IOMMU domains. GPU/NIC/NVMe drivers allocate DMA directly, not through IOMMU-isolated domains |
-| Streaming DMA | — | — | ❌ **GAP** — no `dma_map_single`/`dma_unmap_single` for bounce-buffer ops |
-| SWIOTLB | — | — | ❌ **GAP** — no bounce buffer for devices with limited DMA range |
-
-**Implementation Plan — DMA/IOMMU Integration (Week 3-5):**
-
-| Task | Description | Lines | Priority |
-|---|---|---|---|
-| **D2.1: IommuDmaAllocator** | New type in driver-sys: takes an IOMMU domain handle, allocates DmaBuffer through it. Uses `scheme:iommu/domain/N` MAP opcode. | ~150 | P0 |
-| **D2.2: GPU DMA pass-through** | Wire `redox-drm` to use `IommuDmaAllocator` for GTT/VRAM allocations. Requires amdgpu/ihdgd to open IOMMU device handle. | ~80 | P0 |
-| **D2.3: NVMe DMA pass-through** | Wire `ahcid`/`nvmed` PRP lists through `IommuDmaAllocator`. | ~60 | P1 |
-| **D2.4: Streaming DMA** | `dma_map_single`/`dma_unmap_single` in linux-kpi. Allocates temp buffer, copies data, maps through IOMMU. | ~120 | P1 |
-| **D2.5: SWIOTLB** | Bounce buffer allocation for DMA-limited devices. Linux ref: `kernel/dma/swiotlb.c`. | ~200 | P2 |
-
-**Linux Reference Summary (from `local/reference/linux-7.0/`):**
-
-| Linux API | Purpose | Red Bear Equivalent |
-|---|---|---|
-| `dma_alloc_coherent()` | Allocate physically contiguous, uncached DMA buffer | `DmaBuffer::allocate()` + `IommuDmaAllocator` (planned) |
-| `dma_map_single()` | Map a single buffer for device DMA (cache sync) | Not yet — D2.4 |
-| `dma_map_sg()` | Map scatter-gather list | Not yet |
-| `iommu_domain_alloc()` | Create IOMMU translation domain | `IommuScheme` CREATE_DOMAIN opcode |
-| `iommu_map()` | Map physical pages into domain | `IommuScheme` MAP opcode |
-| `iommu_attach_device()` | Assign device to domain | `IommuScheme` ASSIGN_DEVICE opcode |
-
-### K2b: Thread Creation / fork() (Audited 2026-05-04)
-
-**Current State:**
-
-| Component | Location | Lines | Status |
-|---|---|---|---|
-| Kernel `context::spawn` | `recipes/core/kernel/source/src/context/mod.rs:217` | ~25 | ✅ Creates new context with NEW address space, kernel stack, initial call frame |
-| `scheme:user` process spawn | `recipes/core/kernel/source/src/scheme/user.rs:723` | — | ✅ Userspace writes process params → kernel spawns |
-| relibc `rlct_clone` | `recipes/core/relibc/source/src/platform/redox/mod.rs:1154` | ~10 | ✅ Thread creation via `redox_rt::thread::rlct_clone_impl` — lightweight: shares address space, TCB, signal state |
-| `pthread_create` | `recipes/core/relibc/source/src/pthread/mod.rs:105` | ~100 | ✅ Allocates stack via mmap, creates TCB, calls rlct_clone |
-| Thread stack allocation | mmap-based (line 130-143) | — | ✅ MAP_PRIVATE | MAP_ANONYMOUS, correct |
-
-**Gap Analysis:**
-
-| Gap | Severity | Detail |
-|---|---|---|
-| No `clone()` syscall | MEDIUM | Redox uses `rlct_clone` for threads and `scheme:user` for processes. This is architecturally correct for a microkernel — no gap. |
-| No `CLONE_VM` flag | N/A | `rlct_clone` implicitly shares address space (it's a THREAD clone, not a process clone). Process creation via `scheme:user` creates new address space. Correct semantics. |
-| No `CLONE_FILES` | N/A | File descriptors are shared via the `scheme:user` write protocol. Re-layout possible but functional. |
-| "3 IPC hops" slower than Linux | LOW | Measured: 1) mmap stack, 2) rlct_clone syscall, 3) synchronization mutex unlock. Linux `clone()` does all three in kernel. Acceptable for a microkernel. |
-| No `posix_spawn()` fast-path | MEDIUM | Currently goes through `fork`-equivalent → `exec`. Linux has `posix_spawn` via `vfork`+`exec`. Not yet in Redox. |
-
-**Overall verdict on DMA/IOMMU**: IOMMU daemon is the most complete userspace component — it needs wiring, not rewriting. DmaBuffer exists but is IOMMU-unaware. The implementation tasks (D2.1-D2.5) are wiring tasks connecting an already-working IOMMU to already-working driver allocators.
-
-### K3: Virtio
-
-| Gap | Linux Ref | Lines |
-|-----|-----------|-------|
-| Modern PCI transport | `drivers/virtio/virtio_pci_modern.c` | 1,301 |
-| Packed virtqueue | `drivers/virtio/virtio_ring.c` | 3,940 |
-| Multiqueue | `drivers/net/virtio_net.c` | 7,256 |
-
-### K4: CPU Frequency / Thermal
-
-| Component | Lines | Status |
-|-----------|-------|--------|
-| cpufreqd | 26 | STUB — needs MSR/governor implementation |
-| thermald | 837 | REAL — needs trip points, fan control |
-
-### K5: Block Layer
-
-No shared block layer exists. Each storage driver reinvents I/O dispatch. Linux: `block/blk-mq.c` (5,309 lines).
-
---
-
-## 9. ACPI Gaps (delegated to ACPI-IMPROVEMENT-PLAN.md)
-
-| Linux File | Lines | Feature | Status |
-|------------|-------|---------|--------|
-| `drivers/acpi/sleep.c` | 1,152 | S3/S4 suspend | ❌ |
-| `drivers/acpi/thermal.c` | 1,067 | Thermal zones | ❌ |
-| `drivers/acpi/battery.c` | 1,331 | Battery status | ❌ |
-| `drivers/acpi/ec.c` | 2,380 | EC runtime | ❌ |
-| `drivers/acpi/fan.c` | ~400 | Fan control | ❌ |
-| `arch/x86/kernel/acpi/sleep.c` | 202 | x86 sleep | ❌ |
-
----
-
-## 10. Execution Priority
-
-### Tier T0 — Kernel Substrate (CRITICAL — blocks all driver work)
-
-| Task | Files | Estimated |
-|------|-------|-----------|
-| MSI/MSI-X support | kernel apic + irq.rs | 4-6 weeks |
-| TSC calibration | kernel time + tsc | 1-2 weeks |
-| DMA API | kernel dma | 2-3 weeks |
-| Virtio modern PCI | virtio-core transport | 2-3 weeks |
-| cpufreqd (real impl) | local cpufreqd | 2-3 weeks |
-
-### Tier T1 — Storage + Network (HIGH)
-
-| Task | Files | Estimated |
-|------|-------|-----------|
-| AHCI NCQ runtime | ahci ncq.rs + main.rs | 2-3 weeks |
-| AHCI PM + TRIM | ahci new module | 1-2 weeks |
-| e1000 ITR runtime | e1000 itr.rs + device.rs | 1-2 weeks |
-| r8169 PHY runtime | r8169 phy.rs + device.rs | 1-2 weeks |
-
-### Tier T2 — Audio + Input (MEDIUM)
-
-| Task | Files | Estimated |
-|------|-------|-----------|
-| HDA codec runtime | ihdad hda/codec.rs | 2-3 weeks |
-| HDA stream playback | ihdad hda/stream.rs | 2-3 weeks |
-| PS/2 controller reset | ps2d controller.rs | 3-5 days |
-| Touchpad protocols | ps2d mouse.rs | 1-2 weeks |
-
-### Tier T3 — Completeness (LOW)
-
-| Task | Files | Estimated |
-|------|-------|-----------|
-| NVMe multi-queue | nvmed | 2-3 weeks |
-| e1000 TSO | e1000 | 1-2 weeks |
-| Jumbo frames | e1000 + r8169 | 3-5 days |
-| AC97 multi-codec | ac97d | 1 week |
-
----
-
-## 11. Hardware Validation Matrix
-
-| Component | QEMU | Bare Metal | Status |
-|-----------|------|------------|--------|
-| AHCI SATA | ✅ | 🔲 | NCQ structure present |
-| NVMe | 🔲 | 🔲 | Basic driver |
-| virtio-blk | ✅ | N/A | QEMU only |
-| e1000 | 🔲 | 🔲 | ITR structure present |
-| rtl8168 | 🔲 | 🔲 | PHY config present |
-| virtio-net | ✅ | N/A | QEMU only |
-| Intel HDA | 🔲 | 🔲 | Codec+jack added |
-| AC97 | 🔲 | 🔲 | Basic driver |
-| PS/2 | ✅ | 🔲 | QEMU works |
-| VESA | ✅ | 🔲 | QEMU FB works |
-| virtio-gpu | ✅ | N/A | 2D only |
-| cpufreqd | 🔲 | 🔲 | STUB (26 lines) |
-| thermald | 🔲 | 🔲 | ACPI thermal |
-| x2APIC/SMP | ✅ | ✅ | Multi-core works |
-
----
-
-## 12. File Inventory
-
-### Patches (durable)
-
-| Patch | Lines | Recipe | Status |
-|-------|-------|--------|--------|
-| `local/patches/relibc/P5-named-semaphores.patch` | 249 | relibc | ✅ Wired |
-| `local/patches/base/P6-driver-main-fixes.patch` | 165 | base | ✅ Wired |
-| `local/patches/base/P6-driver-new-modules.patch` | 185 | base | ✅ Wired |
-| `local/patches/base/P6-cpufreqd-real-impl.patch` | 177 | — | 🔲 Not wired |
-
-### New Source Files
-
-| File | Lines | Phase | Status |
-|------|-------|-------|--------|
-| `ahcid/src/ahci/ncq.rs` | 12 | Phase 1 | ⚠️ Truncated |
-| `e1000d/src/itr.rs` | 9 | Phase 2 | ⚠️ Truncated |
-| `rtl8168d/src/phy.rs` | 5 | Phase 2 | ⚠️ Truncated |
-| `ihdad/src/hda/codec.rs` | 4 | Phase 3 | ⚠️ Truncated |
-| `ihdad/src/hda/jack.rs` | 5 | Phase 3 | ⚠️ Truncated |
-| `cpufreqd/src/main.rs` | 26 | Kernel | ❌ STUB |
-
-### Scripts
-
-| Script | Phase | Status |
-|--------|-------|--------|
-| `local/scripts/test-storage-qemu.sh` | Phase 5 | ✅ |
-| `local/scripts/test-network-qemu.sh` | Phase 5 | ✅ |
-| `local/scripts/lint-config-paths.sh` | Phase 0 | ✅ |
-| `local/scripts/validate-init-services.sh` | Phase 0 | ✅ |
-| `local/scripts/validate-file-ownership.sh` | Phase 0 | ✅ |
-| `local/scripts/generate-installs-manifest.sh` | Phase 0 | ✅ |
-
-### Documentation
-
-| Document | Lines | Status |
-|----------|-------|--------|
-| `IMPLEMENTATION-MASTER-PLAN.md` | — | This file |
-| `CHANGELOG-DRIVER-IMPROVEMENT-PLAN.md` | 672 | Superseded |
-| `COMPREHENSIVE-DRIVER-AUDIT-2026-05-04.md` | 316 | Superseded |
-| `HARDWARE-VALIDATION-MATRIX.md` | 28 | Superseded |
-| `BUILD-SYSTEM-HARDENING-PLAN.md` | 403 | Active |
-| `BUILD-SYSTEM-INVARIANTS.md` | 436 | Active |
-| `ACPI-IMPROVEMENT-PLAN.md` | 839 | Active |
-| `IRQ-AND-LOWLEVEL-CONTROLLERS-ENHANCEMENT-PLAN.md` | 916 | Active |
-
----
-
-## 14. Scheduler & Threading Assessment (2026-05-04)
-
-### Architecture
-- **Kernel**: DWRR scheduler (577 lines), 40 priority levels, per-CPU queues, futex (222 lines)
-- **Userspace**: proc manager (2,638 lines), pthread (440 lines), signal delivery via proc scheme
-- **IPC bridge**: 3 round-trips for thread creation vs Linux's single clone() syscall
-
-### Strengths
-- DWRR with geometric weights, CPU affinity masks, soft-blocking with monotonic timeout
-- Full POSIX process model (PID/PGID/SID, job control, orphan detection)
-- Futex with physical-address keys for cross-process synchronization
-
-### Critical Gaps
-1. **PIT-based tick (~148Hz)** — LAPIC timer exists but `setup_timer()` is commented out. Should use Periodic/TscDeadline mode at 1000Hz.
-2. **Global CONTEXT_SWITCH_LOCK** — spinlock serializes all context switches across CPUs. Should be per-CPU.
-3. **No load balancing** — idle CPUs don't steal work from busy CPUs
-4. **No RT scheduling** — missing FIFO/RR/Deadline classes
-5. **No cgroups** — no CPU bandwidth control or resource limits
-6. **Thread creation latency** — 3 IPC hops vs single clone()
-
-| Tier | Duration |
-|------|----------|
-| T0 (kernel substrate) | 10-14 weeks |
-| T1 (storage + network) | 6-10 weeks |
-| T2 (audio + input) | 6-10 weeks |
-| T3 (completeness) | 4-8 weeks |
-| **Total (2 developers, parallel)** | **16-24 weeks** |
-| **Total (1 developer, sequential)** | **26-42 weeks** |
@@ -2,7 +2,7 @@ diff --git a/daemon/src/lib.rs b/daemon/src/lib.rs
 index 9f507221..c69c2cfa 100644
 --- a/daemon/src/lib.rs
 +++ b/daemon/src/lib.rs
-@@ -10,15 +10,26 @@ use libredox::Fd;
+@@ -10,15 +10,25 @@ use libredox::Fd;
 use redox_scheme::Socket;
 use redox_scheme::scheme::{SchemeAsync, SchemeSync};
 
@@ -10,7 +10,6 @@ index 9f507221..c69c2cfa 100644
 -    let fd: RawFd = std::env::var(var).unwrap().parse().unwrap();
 +unsafe fn get_fd(var: &str) -> Option<RawFd> {
 +    let fd: RawFd = match std::env::var(var)
-+        .map_err(|e| eprintln!("daemon: env var {var} not set: {e}"))
 +        .ok()
 +        .and_then(|val| {
 +            val.parse()
@@ -33,7 +32,7 @@ index 9f507221..c69c2cfa 100644
 }
 
 unsafe fn pass_fd(cmd: &mut Command, env: &str, fd: OwnedFd) {
-@@ -38,20 +49,26 @@ unsafe fn pass_fd(cmd: &mut Command, env: &str, fd: OwnedFd) {
+@@ -38,20 +48,26 @@ unsafe fn pass_fd(cmd: &mut Command, env: &str, fd: OwnedFd) {
 /// A long running background process that handles requests.
 #[must_use = "Daemon::ready must be called"]
 pub struct Daemon {
@@ -63,7 +62,7 @@ index 9f507221..c69c2cfa 100644
     }
 
     /// Executes `Command` as a child process.
-@@ -83,25 +100,28 @@ impl Daemon {
+@@ -83,25 +99,28 @@ impl Daemon {
 /// A long running background process that handles requests using schemes.
 #[must_use = "SchemeDaemon::ready must be called"]
 pub struct SchemeDaemon {
@@ -0,0 +1,184 @@
+diff --git a/drivers/graphics/fbbootlogd/src/main.rs b/drivers/graphics/fbbootlogd/src/main.rs
+index 3e42d590..79c2119f 100644
+--- a/drivers/graphics/fbbootlogd/src/main.rs
+++ b/drivers/graphics/fbbootlogd/src/main.rs
+@@ -46,13 +46,17 @@ fn daemon(daemon: daemon::SchemeDaemon) -> ! {
+         )
+         .expect("fbbootlogd: failed to subscribe to scheme events");
+ 
+-    event_queue
+-        .subscribe(
+-            scheme.input_handle.event_handle().as_raw_fd() as usize,
+-            Source::Input,
+-            event::EventFlags::READ,
+-        )
+-        .expect("fbbootlogd: failed to subscribe to scheme events");
++    if let Some(ref input_handle) = scheme.input_handle {
++        event_queue
++            .subscribe(
++                input_handle.event_handle().as_raw_fd() as usize,
++                Source::Input,
++                event::EventFlags::READ,
++            )
++            .expect("fbbootlogd: failed to subscribe to input events");
++    } else {
++        eprintln!("fbbootlogd: running without input handle (log-only mode)");
++    }
+ 
+     {
+         let log_fd = socket
+@@ -76,6 +80,11 @@ fn daemon(daemon: daemon::SchemeDaemon) -> ! {
+     // driver handoff. In the future inputd may directly pass a handle to the display instead.
+     //libredox::call::setrens(0, 0).expect("fbbootlogd: failed to enter null namespace");
+ 
++    enum Action {
++        Input(Event),
++        Handoff,
++    }
++
+     for event in event_queue {
+         match event.expect("fbbootlogd: failed to get event").user_data {
+             Source::Scheme => loop {
+@@ -88,20 +97,31 @@ fn daemon(daemon: daemon::SchemeDaemon) -> ! {
+                 }
+             },
+             Source::Input => {
+-                let mut events = [Event::new(); 16];
+-                loop {
+-                    match scheme
+-                        .input_handle
+-                        .read_events(&mut events)
+-                        .expect("fbbootlogd: error while reading events")
+-                    {
+-                        ConsumerHandleEvent::Events(&[]) => break,
+-                        ConsumerHandleEvent::Events(events) => {
+-                            for event in events {
+-                                scheme.handle_input(&event);
++                let mut actions: Vec<Action> = Vec::new();
++                if let Some(ref mut input_handle) = scheme.input_handle {
++                    let mut events = [Event::new(); 16];
++                    loop {
++                        match input_handle
++                            .read_events(&mut events)
++                            .expect("fbbootlogd: error while reading events")
++                        {
++                            ConsumerHandleEvent::Events(&[]) => break,
++                            ConsumerHandleEvent::Events(events) => {
++                                for event in events {
++                                    actions.push(Action::Input(*event));
++                                }
++                            }
++                            ConsumerHandleEvent::Handoff => {
++                                actions.push(Action::Handoff);
++                                break;
+                             }
+                         }
+-                        ConsumerHandleEvent::Handoff => {
++                    }
++                }
++                for action in actions {
++                    match action {
++                        Action::Input(event) => scheme.handle_input(&event),
++                        Action::Handoff => {
+                             eprintln!("fbbootlogd: handoff requested");
+                             scheme.handle_handoff();
+                         }
+diff --git a/drivers/graphics/fbbootlogd/src/scheme.rs b/drivers/graphics/fbbootlogd/src/scheme.rs
+index 812c4a5b..53e4bc75 100644
+--- a/drivers/graphics/fbbootlogd/src/scheme.rs
+++ b/drivers/graphics/fbbootlogd/src/scheme.rs
+@@ -14,7 +14,7 @@ use syscall::schemev2::NewFdFlags;
+ use syscall::{Error, Result, EACCES, EBADF, EINVAL, ENOENT};
+ 
+ pub struct FbbootlogScheme {
+-    pub input_handle: ConsumerHandle,
++    pub input_handle: Option<ConsumerHandle>,
+     display_map: Option<V2DisplayMap>,
+     text_screen: console_draw::TextScreen,
+     text_buffer: console_draw::TextBuffer,
+@@ -25,8 +25,16 @@ pub struct FbbootlogScheme {
+ 
+ impl FbbootlogScheme {
+     pub fn new() -> FbbootlogScheme {
++        let input_handle = match ConsumerHandle::bootlog_vt() {
++            Ok(handle) => Some(handle),
++            Err(err) => {
++                eprintln!("fbbootlogd: Failed to open vt (non-fatal): {err}");
++                None
++            }
++        };
++
+         let mut scheme = FbbootlogScheme {
+-            input_handle: ConsumerHandle::bootlog_vt().expect("fbbootlogd: Failed to open vt"),
++            input_handle,
+             display_map: None,
+             text_screen: console_draw::TextScreen::new(),
+             text_buffer: console_draw::TextBuffer::new(1000),
+@@ -41,8 +49,19 @@ impl FbbootlogScheme {
+     }
+ 
+     pub fn handle_handoff(&mut self) {
+-        let new_display_handle = match self.input_handle.open_display_v2() {
+-            Ok(display) => V2GraphicsHandle::from_file(display).unwrap(),
++        let Some(ref input_handle) = self.input_handle else {
++            eprintln!("fbbootlogd: No input handle, skipping display handoff");
++            return;
++        };
++
++        let new_display_handle = match input_handle.open_display_v2() {
++            Ok(display) => match V2GraphicsHandle::from_file(display) {
++                Ok(handle) => handle,
++                Err(err) => {
++                    eprintln!("fbbootlogd: Display v2 protocol not supported: {err}");
++                    return;
++                }
++            },
+             Err(err) => {
+                 eprintln!("fbbootlogd: No display present yet: {err}");
+                 return;
+diff --git a/drivers/graphics/fbcond/src/display.rs b/drivers/graphics/fbcond/src/display.rs
+index eb09b97e..4e347475 100644
+--- a/drivers/graphics/fbcond/src/display.rs
+++ b/drivers/graphics/fbcond/src/display.rs
+@@ -31,7 +31,13 @@ impl Display {
+                 return;
+             }
+         };
+-        let new_display_handle = V2GraphicsHandle::from_file(display_file).unwrap();
++        let new_display_handle = match V2GraphicsHandle::from_file(display_file) {
++            Ok(handle) => handle,
++            Err(err) => {
++                log::error!("fbcond: Display v2 protocol not supported: {err}");
++                return;
++            }
++        };
+ 
+         log::debug!("fbcond: Opened new display");
+ 
+diff --git a/drivers/inputd/src/lib.rs b/drivers/inputd/src/lib.rs
+index b68e8211..b3e8354c 100644
+--- a/drivers/inputd/src/lib.rs
+++ b/drivers/inputd/src/lib.rs
+@@ -77,14 +77,14 @@ impl ConsumerHandle {
+         ));
+         let display_path = display_path.to_str().unwrap();
+ 
+-        let display_file =
+-            libredox::call::open(display_path, (O_CLOEXEC | O_NONBLOCK | O_RDWR) as _, 0)
+-                .map(|socket| unsafe { File::from_raw_fd(socket as RawFd) })
+-                .unwrap_or_else(|err| {
+-                    panic!("failed to open display {}: {}", display_path, err);
+-                });
+-
+-        Ok(display_file)
++        libredox::call::open(display_path, (O_CLOEXEC | O_NONBLOCK | O_RDWR) as _, 0)
++            .map(|socket| unsafe { File::from_raw_fd(socket as RawFd) })
++            .map_err(|err| {
++                io::Error::new(
++                    io::ErrorKind::Other,
++                    format!("failed to open display {}: {}", display_path, err),
++                )
++            })
+     }
+ 
+     pub fn read_events<'a>(&self, events: &'a mut [Event]) -> io::Result<ConsumerHandleEvent<'a>> {
@@ -0,0 +1,65 @@
+diff --git a/drivers/storage/lived/src/main.rs b/drivers/storage/lived/src/main.rs
+index 2ca1ff27..cd92fa85 100644
+--- a/drivers/storage/lived/src/main.rs
+++ b/drivers/storage/lived/src/main.rs
+@@ -55,8 +55,10 @@ impl LiveDisk {
+ }
+ 
+ impl Disk for LiveDisk {
++    // Must be 512 (redoxfs BLOCK_SIZE), not PAGE_SIZE: DiskWrapper::read rejects
++    // buffers not aligned to block_size, and redoxfs reads in 512-byte chunks.
+     fn block_size(&self) -> u32 {
+-        PAGE_SIZE as u32
++        512
+     }
+ 
+     fn size(&self) -> u64 {
+@@ -64,11 +66,12 @@ impl Disk for LiveDisk {
+     }
+ 
+     async fn read(&mut self, mut block: u64, buffer: &mut [u8]) -> syscall::Result<usize> {
+-        let mut offset = (block as usize) * PAGE_SIZE;
++        let bs = self.block_size() as usize;
++        let mut offset = (block as usize) * bs;
+         if offset + buffer.len() > self.original.len() {
+             return Err(syscall::Error::new(EINVAL));
+         }
+-        for chunk in buffer.chunks_mut(PAGE_SIZE) {
++        for chunk in buffer.chunks_mut(bs) {
+             match self.overlay.get(&block) {
+                 Some(overlay) => {
+                     chunk.copy_from_slice(&overlay[..chunk.len()]);
+@@ -78,26 +81,27 @@ impl Disk for LiveDisk {
+                 }
+             }
+             block += 1;
+-            offset += PAGE_SIZE;
++            offset += bs;
+         }
+         Ok(buffer.len())
+     }
+ 
+     async fn write(&mut self, mut block: u64, buffer: &[u8]) -> syscall::Result<usize> {
+-        let mut offset = (block as usize) * PAGE_SIZE;
++        let bs = self.block_size() as usize;
++        let mut offset = (block as usize) * bs;
+         if offset + buffer.len() > self.original.len() {
+             return Err(syscall::Error::new(EINVAL));
+         }
+-        for chunk in buffer.chunks(PAGE_SIZE) {
++        for chunk in buffer.chunks(bs) {
+             self.overlay.entry(block).or_insert_with(|| {
+-                let offset = (block as usize) * PAGE_SIZE;
+-                self.original[offset..offset + PAGE_SIZE]
++                let offset = (block as usize) * bs;
++                self.original[offset..offset + bs]
+                     .to_vec()
+                     .into_boxed_slice()
+             })[..chunk.len()]
+                 .copy_from_slice(chunk);
+             block += 1;
+-            offset += PAGE_SIZE;
++            offset += bs;
+         }
+         Ok(buffer.len())
+     }
@@ -1,8 +1,8 @@
 --- a/src/header/sys_types_internal/cbindgen.toml
 +++ b/src/header/sys_types_internal/cbindgen.toml
@@ -1,4 +1,4 @@
--sys_includes = ["stddef.h"]
-+sys_includes = ["stddef.h", "stdint.h"]
+-sys_includes = ["stddef.h", "stdint.h"]
++sys_includes = ["stddef.h"]
 # TODO: figure out how to export void* type
 after_includes = """
 
@@ -397,6 +397,7 @@ mod tests {
            description: "low-priority driver",
            priority: 10,
            matches: vec![DriverMatch {
+                bus: None,
                vendor: Some(0x1234),
                device: None,
                class: None,
@@ -413,6 +414,7 @@ mod tests {
            description: "high-priority driver",
            priority: 100,
            matches: vec![DriverMatch {
+                bus: None,
                vendor: Some(0x1234),
                device: Some(0x5678),
                class: None,
@@ -496,6 +498,7 @@ mod tests {
            description: "USB host controller",
            priority: 80,
            matches: vec![DriverMatch {
+                bus: None,
                vendor: Some(0x8086),
                device: None,
                class: Some(0x0c),
@@ -8,6 +8,11 @@ pub type MatchPriority = i32;
 /// A single entry in a driver's match table.
 #[derive(Clone, Debug, PartialEq, Eq, Default)]
 pub struct DriverMatch {
+    /// Optional bus type match (e.g., "pci", "acpi").
+    ///
+    /// When set, only devices on the specified bus will match.
+    /// When `None`, the match applies to any bus (backward compatible).
+    pub bus: Option<String>,
    /// Optional vendor identifier match.
    pub vendor: Option<u16>,
    /// Optional device identifier match.
@@ -27,7 +32,8 @@ pub struct DriverMatch {
 impl DriverMatch {
    /// Checks whether this match entry matches the provided device information.
    pub fn matches(&self, info: &DeviceInfo) -> bool {
-        self.vendor.map_or(true, |v| info.vendor == Some(v))
+        self.bus.as_ref().map_or(true, |b| &info.id.bus == b)
+            && self.vendor.map_or(true, |v| info.vendor == Some(v))
            && self.device.map_or(true, |d| info.device == Some(d))
            && self.class.map_or(true, |c| info.class == Some(c))
            && self.subclass.map_or(true, |s| info.subclass == Some(s))
@@ -100,6 +106,7 @@ mod tests {
    fn driver_match_accepts_exact_match() {
        let info = sample_device();
        let driver_match = DriverMatch {
+            bus: None,
            vendor: Some(0x8086),
            device: Some(0x1234),
            class: Some(0x03),
@@ -116,6 +123,7 @@ mod tests {
    fn driver_match_supports_wildcards() {
        let info = sample_device();
        let driver_match = DriverMatch {
+            bus: None,
            vendor: Some(0x8086),
            device: None,
            class: Some(0x03),
@@ -132,6 +140,7 @@ mod tests {
    fn driver_match_rejects_mismatch() {
        let info = sample_device();
        let driver_match = DriverMatch {
+            bus: None,
            vendor: Some(0x10ec),
            device: None,
            class: None,
@@ -143,4 +152,48 @@ mod tests {

        assert!(!driver_match.matches(&info));
    }
+
+    #[test]
+    fn driver_match_bus_filtering() {
+        let info = sample_device();
+
+        // Matching bus should pass
+        let pci_match = DriverMatch {
+            bus: Some(String::from("pci")),
+            vendor: Some(0x8086),
+            device: None,
+            class: None,
+            subclass: None,
+            prog_if: None,
+            subsystem_vendor: None,
+            subsystem_device: None,
+        };
+        assert!(pci_match.matches(&info));
+
+        // Non-matching bus should fail
+        let acpi_match = DriverMatch {
+            bus: Some(String::from("acpi")),
+            vendor: Some(0x8086),
+            device: None,
+            class: None,
+            subclass: None,
+            prog_if: None,
+            subsystem_vendor: None,
+            subsystem_device: None,
+        };
+        assert!(!acpi_match.matches(&info));
+
+        // None bus should match any device (backward compatible)
+        let any_bus = DriverMatch {
+            bus: None,
+            vendor: Some(0x8086),
+            device: None,
+            class: None,
+            subclass: None,
+            prog_if: None,
+            subsystem_vendor: None,
+            subsystem_device: None,
+        };
+        assert!(any_bus.matches(&info));
+    }
 }
@@ -291,12 +291,21 @@ fn read_cpu_count() -> Result<u8> {
 #[cfg(target_os = "redox")]
 fn alloc_cpu_id() -> u8 {
    match read_cpu_count() {
-        Ok(n) if n > 0 => {
+        Ok(0) => {
+            log::warn!("redox-driver-sys: read_cpu_count returned 0, defaulting to BSP (cpu 0)");
+            0
+        }
+        Ok(n) => {
            use std::sync::atomic::{AtomicU8, Ordering};
            static NEXT: AtomicU8 = AtomicU8::new(0);
-            NEXT.fetch_add(1, Ordering::Relaxed) % n
+            let cpu_id = NEXT.fetch_add(1, Ordering::Relaxed) % n;
+            log::debug!("redox-driver-sys: alloc_cpu_id selected cpu {} (of {})", cpu_id, n);
+            cpu_id
+        }
+        Err(err) => {
+            log::warn!("redox-driver-sys: read_cpu_count failed ({}), defaulting to BSP (cpu 0)", err);
+            0
        }
-        _ => 0,
    }
 }

@@ -11,12 +11,16 @@ use redox_scheme::Socket;
 use redox_scheme::scheme::{SchemeAsync, SchemeSync};

 unsafe fn get_fd(var: &str) -> Option<RawFd> {
+    // Env vars like INIT_NOTIFY are optional — daemons not spawned by init
+    // simply don't have them. Return None silently instead of spewing errors.
    let fd: RawFd = match std::env::var(var)
-        .map_err(|e| eprintln!("daemon: env var {var} not set: {e}"))
        .ok()
        .and_then(|val| {
-            val.parse()
-                .map_err(|e| eprintln!("daemon: failed to parse {var} as fd: {e}"))
+            val.parse::<RawFd>()
+                .map_err(|e| {
+                    eprintln!("daemon: failed to parse {var} as fd: {e}");
+                    e
+                })
                .ok()
        }) {
        Some(fd) => fd,
@@ -123,45 +123,3 @@ mod tests {
        assert_eq!(&edid[0..8], &header, "EDID header should be valid");
    }
 }
-
-#[cfg(test)]
-mod tests {
-    use super::*;
-
-    #[test]
-    fn synthetic_displayport_has_correct_fields() {
-        let conn = Connector::synthetic_displayport(5, 10);
-        assert_eq!(conn.info.id, 5);
-        assert_eq!(conn.info.encoder_id, 10);
-        assert_eq!(conn.info.connector_type, ConnectorType::DisplayPort);
-        assert_eq!(conn.info.connection, ConnectorStatus::Connected);
-        assert!(
-            !conn.info.modes.is_empty(),
-            "synthetic DisplayPort should have modes"
-        );
-    }
-
-    #[test]
-    fn synthetic_displayport_modes_have_valid_dimensions() {
-        let conn = Connector::synthetic_displayport(1, 1);
-        for mode in &conn.info.modes {
-            assert!(mode.hdisplay > 0, "mode hdisplay should be > 0");
-            assert!(mode.vdisplay > 0, "mode vdisplay should be > 0");
-            assert!(mode.vrefresh > 0, "mode vrefresh should be > 0");
-            assert!(mode.clock > 0, "mode clock should be > 0");
-        }
-    }
-
-    #[test]
-    fn synthetic_edid_returns_exactly_112_bytes() {
-        let edid = synthetic_edid();
-        assert_eq!(edid.len(), 112);
-    }
-
-    #[test]
-    fn synthetic_edid_has_valid_header() {
-        let edid = synthetic_edid();
-        let header: [u8; 8] = [0x00, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0x00];
-        assert_eq!(&edid[0..8], &header, "EDID header should be valid");
-    }
-}
@@ -198,14 +198,15 @@ mod tests {
    }

    #[test]
-    fn from_edid_synthetic_edid_too_short_returns_empty() {
+    fn from_edid_synthetic_edid_parses_1080p_mode() {
        let edid = super::connector::synthetic_edid();
-        assert!(edid.len() < 128, "synthetic EDID is shorter than 128 bytes");
+        assert_eq!(edid.len(), 128, "synthetic EDID must be 128 bytes");
        let modes = ModeInfo::from_edid(&edid);
-        assert!(
-            modes.is_empty(),
-            "EDID shorter than 128 bytes should produce no modes"
-        );
+        assert!(!modes.is_empty(), "valid 128-byte EDID should produce at least one mode");
+        let mode = &modes[0];
+        assert_eq!(mode.hdisplay, 1920, "first mode should be 1920px wide");
+        assert_eq!(mode.vdisplay, 1080, "first mode should be 1080px tall");
+        assert_eq!(mode.vrefresh, 60, "first mode should be 60 Hz");
    }

    #[test]
@@ -1,175 +1 @@
-GNU LESSER GENERAL PUBLIC LICENSE
-
-Version 2.1, February 1999
-
-Copyright (C) 1991, 1999 Free Software Foundation, Inc.
-51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA
-
-Everyone is permitted to copy and distribute verbatim copies of this license document, but changing it is not allowed.
-
-[This is the first released version of the Lesser GPL.  It also counts as the successor of the GNU Library Public License, version 2, hence the version number 2.1.]
-
-Preamble
-
-The licenses for most software are designed to take away your freedom to share and change it. By contrast, the GNU General Public Licenses are intended to guarantee your freedom to share and change free software--to make sure the software is free for all its users.
-
-This license, the Lesser General Public License, applies to some specially designated software packages--typically libraries--of the Free Software Foundation and other authors who decide to use it. You can use it too, but we suggest you first think carefully about whether this license or the ordinary General Public License is the better strategy to use in any particular case, based on the explanations below.
-
-When we speak of free software, we are referring to freedom of use, not price. Our General Public Licenses are designed to make sure that you have the freedom to distribute copies of free software (and charge for this service if you wish); that you receive source code or can get it if you want it; that you can change the software and use pieces of it in new free programs; and that you are informed that you can do these things.
-
-To protect your rights, we need to make restrictions that forbid distributors to deny you these rights or to ask you to surrender these rights. These restrictions translate to certain responsibilities for you if you distribute copies of the library or if you modify it.
-
-For example, if you distribute copies of the library, whether gratis or for a fee, you must give the recipients all the rights that we gave you. You must make sure that they, too, receive or can get the source code. If you link other code with the library, you must provide complete object files to the recipients, so that they can relink them with the library after making changes to the library and recompiling it. And you must show them these terms so they know their rights.
-
-We protect your rights with a two-step method: (1) we copyright the library, and (2) we offer you this license, which gives you legal permission to copy, distribute and/or modify the library.
-
-To protect each distributor, we want to make it very clear that there is no warranty for the free library. Also, if the library is modified by someone else and passed on, the recipients should know that what they have is not the original version, so that the original author's reputation will not be affected by problems that might be introduced by others.
-
-Finally, software patents pose a constant threat to the existence of any free program. We wish to make sure that a company cannot effectively restrict the users of a free program by obtaining a restrictive license from a patent holder. Therefore, we insist that any patent license obtained for a version of the library must be consistent with the full freedom of use specified in this license.
-
-Most GNU software, including some libraries, is covered by the ordinary GNU General Public License. This license, the GNU Lesser General Public License, applies to certain designated libraries, and is quite different from the ordinary General Public License. We use this license for certain libraries in order to permit linking those libraries into non-free programs.
-
-When a program is linked with a library, whether statically or using a shared library, the combination of the two is legally speaking a combined work, a derivative of the original library. The ordinary General Public License therefore permits such linking only if the entire combination fits its criteria of freedom. The Lesser General Public License permits more lax criteria for linking other code with the library.
-
-We call this license the "Lesser" General Public License because it does Less to protect the user's freedom than the ordinary General Public License. It also provides other free software developers Less of an advantage over competing non-free programs. These disadvantages are the reason we use the ordinary General Public License for many libraries. However, the Lesser license provides advantages in certain special circumstances.
-
-For example, on rare occasions, there may be a special need to encourage the widest possible use of a certain library, so that it becomes a de-facto standard. To achieve this, non-free programs must be allowed to use the library. A more frequent case is that a free library does the same job as widely used non-free libraries. In this case, there is little to gain by limiting the free library to free software only, so we use the Lesser General Public License.
-
-In other cases, permission to use a particular library in non-free programs enables a greater number of people to use a large body of free software. For example, permission to use the GNU C Library in non-free programs enables many more people to use the whole GNU operating system, as well as its variant, the GNU/Linux operating system.
-
-Although the Lesser General Public License is Less protective of the users' freedom, it does ensure that the user of a program that is linked with the Library has the freedom and the wherewithal to run that program using a modified version of the Library.
-
-The precise terms and conditions for copying, distribution and modification follow. Pay close attention to the difference between a "work based on the library" and a "work that uses the library". The former contains code derived from the library, whereas the latter must be combined with the library in order to run.
-
-TERMS AND CONDITIONS FOR COPYING, DISTRIBUTION AND MODIFICATION
-
-0. This License Agreement applies to any software library or other program which contains a notice placed by the copyright holder or other authorized party saying it may be distributed under the terms of this Lesser General Public License (also called "this License"). Each licensee is addressed as "you".
-
-A "library" means a collection of software functions and/or data prepared so as to be conveniently linked with application programs (which use some of those functions and data) to form executables.
-
-The "Library", below, refers to any such software library or work which has been distributed under these terms. A "work based on the Library" means either the Library or any derivative work under copyright law: that is to say, a work containing the Library or a portion of it, either verbatim or with modifications and/or translated straightforwardly into another language. (Hereinafter, translation is included without limitation in the term "modification".)
-
-"Source code" for a work means the preferred form of the work for making modifications to it. For a library, complete source code means all the source code for all modules it contains, plus any associated interface definition files, plus the scripts used to control compilation and installation of the library.
-
-Activities other than copying, distribution and modification are not covered by this License; they are outside its scope. The act of running a program using the Library is not restricted, and output from such a program is covered only if its contents constitute a work based on the Library (independent of the use of the Library in a tool for writing it). Whether that is true depends on what the Library does and what the program that uses the Library does.
-
-1. You may copy and distribute verbatim copies of the Library's complete source code as you receive it, in any medium, provided that you conspicuously and appropriately publish on each copy an appropriate copyright notice and disclaimer of warranty; keep intact all the notices that refer to this License and to the absence of any warranty; and distribute a copy of this License along with the Library.
-
-You may charge a fee for the physical act of transferring a copy, and you may at your option offer warranty protection in exchange for a fee.
-
-2. You may modify your copy or copies of the Library or any portion of it, thus forming a work based on the Library, and copy and distribute such modifications or work under the terms of Section 1 above, provided that you also meet all of these conditions:
-
-     a) The modified work must itself be a software library.
-
-     b) You must cause the files modified to carry prominent notices stating that you changed the files and the date of any change.
-
-     c) You must cause the whole of the work to be licensed at no charge to all third parties under the terms of this License.
-
-     d) If a facility in the modified Library refers to a function or a table of data to be supplied by an application program that uses the facility, other than as an argument passed when the facility is invoked, then you must make a good faith effort to ensure that, in the event an application does not supply such function or table, the facility still operates, and performs whatever part of its purpose remains meaningful.
-
-(For example, a function in a library to compute square roots has a purpose that is entirely well-defined independent of the application. Therefore, Subsection 2d requires that any application-supplied function or table used by this function must be optional: if the application does not supply it, the square root function must still compute square roots.)
-
-These requirements apply to the modified work as a whole. If identifiable sections of that work are not derived from the Library, and can be reasonably considered independent and separate works in themselves, then this License, and its terms, do not apply to those sections when you distribute them as separate works. But when you distribute the same sections as part of a whole which is a work based on the Library, the distribution of the whole must be on the terms of this License, whose permissions for other licensees extend to the entire whole, and thus to each and every part regardless of who wrote it.
-
-Thus, it is not the intent of this section to claim rights or contest your rights to work written entirely by you; rather, the intent is to exercise the right to control the distribution of derivative or collective works based on the Library.
-
-In addition, mere aggregation of another work not based on the Library with the Library (or with a work based on the Library) on a volume of a storage or distribution medium does not bring the other work under the scope of this License.
-
-3. You may opt to apply the terms of the ordinary GNU General Public License instead of this License to a given copy of the Library. To do this, you must alter all the notices that refer to this License, so that they refer to the ordinary GNU General Public License, version 2, instead of to this License. (If a newer version than version 2 of the ordinary GNU General Public License has appeared, then you can specify that version instead if you wish.) Do not make any other change in these notices.
-
-Once this change is made in a given copy, it is irreversible for that copy, so the ordinary GNU General Public License applies to all subsequent copies and derivative works made from that copy.
-
-This option is useful when you wish to copy part of the code of the Library into a program that is not a library.
-
-4. You may copy and distribute the Library (or a portion or derivative of it, under Section 2) in object code or executable form under the terms of Sections 1 and 2 above provided that you accompany it with the complete corresponding machine-readable source code, which must be distributed under the terms of Sections 1 and 2 above on a medium customarily used for software interchange.
-
-If distribution of object code is made by offering access to copy from a designated place, then offering equivalent access to copy the source code from the same place satisfies the requirement to distribute the source code, even though third parties are not compelled to copy the source along with the object code.
-
-5. A program that contains no derivative of any portion of the Library, but is designed to work with the Library by being compiled or linked with it, is called a "work that uses the Library". Such a work, in isolation, is not a derivative work of the Library, and therefore falls outside the scope of this License.
-
-However, linking a "work that uses the Library" with the Library creates an executable that is a derivative of the Library (because it contains portions of the Library), rather than a "work that uses the library". The executable is therefore covered by this License. Section 6 states terms for distribution of such executables.
-
-When a "work that uses the Library" uses material from a header file that is part of the Library, the object code for the work may be a derivative work of the Library even though the source code is not. Whether this is true is especially significant if the work can be linked without the Library, or if the work is itself a library. The threshold for this to be true is not precisely defined by law.
-
-If such an object file uses only numerical parameters, data structure layouts and accessors, and small macros and small inline functions (ten lines or less in length), then the use of the object file is unrestricted, regardless of whether it is legally a derivative work. (Executables containing this object code plus portions of the Library will still fall under Section 6.)
-
-Otherwise, if the work is a derivative of the Library, you may distribute the object code for the work under the terms of Section 6. Any executables containing that work also fall under Section 6, whether or not they are linked directly with the Library itself.
-
-6. As an exception to the Sections above, you may also combine or link a "work that uses the Library" with the Library to produce a work containing portions of the Library, and distribute that work under terms of your choice, provided that the terms permit modification of the work for the customer's own use and reverse engineering for debugging such modifications.
-
-You must give prominent notice with each copy of the work that the Library is used in it and that the Library and its use are covered by this License. You must supply a copy of this License. If the work during execution displays copyright notices, you must include the copyright notice for the Library among them, as well as a reference directing the user to the copy of this License. Also, you must do one of these things:
-
-     a) Accompany the work with the complete corresponding machine-readable source code for the Library including whatever changes were used in the work (which must be distributed under Sections 1 and 2 above); and, if the work is an executable linked with the Library, with the complete machine-readable "work that uses the Library", as object code and/or source code, so that the user can modify the Library and then relink to produce a modified executable containing the modified Library. (It is understood that the user who changes the contents of definitions files in the Library will not necessarily be able to recompile the application to use the modified definitions.)
-
-     b) Use a suitable shared library mechanism for linking with the Library. A suitable mechanism is one that (1) uses at run time a copy of the library already present on the user's computer system, rather than copying library functions into the executable, and (2) will operate properly with a modified version of the library, if the user installs one, as long as the modified version is interface-compatible with the version that the work was made with.
-
-     c) Accompany the work with a written offer, valid for at least three years, to give the same user the materials specified in Subsection 6a, above, for a charge no more than the cost of performing this distribution.
-
-     d) If distribution of the work is made by offering access to copy from a designated place, offer equivalent access to copy the above specified materials from the same place.
-
-     e) Verify that the user has already received a copy of these materials or that you have already sent this user a copy.
-
-For an executable, the required form of the "work that uses the Library" must include any data and utility programs needed for reproducing the executable from it. However, as a special exception, the materials to be distributed need not include anything that is normally distributed (in either source or binary form) with the major components (compiler, kernel, and so on) of the operating system on which the executable runs, unless that component itself accompanies the executable.
-
-It may happen that this requirement contradicts the license restrictions of other proprietary libraries that do not normally accompany the operating system. Such a contradiction means you cannot use both them and the Library together in an executable that you distribute.
-
-7. You may place library facilities that are a work based on the Library side-by-side in a single library together with other library facilities not covered by this License, and distribute such a combined library, provided that the separate distribution of the work based on the Library and of the other library facilities is otherwise permitted, and provided that you do these two things:
-
-     a) Accompany the combined library with a copy of the same work based on the Library, uncombined with any other library facilities. This must be distributed under the terms of the Sections above.
-
-     b) Give prominent notice with the combined library of the fact that part of it is a work based on the Library, and explaining where to find the accompanying uncombined form of the same work.
-
-8. You may not copy, modify, sublicense, link with, or distribute the Library except as expressly provided under this License. Any attempt otherwise to copy, modify, sublicense, link with, or distribute the Library is void, and will automatically terminate your rights under this License. However, parties who have received copies, or rights, from you under this License will not have their licenses terminated so long as such parties remain in full compliance.
-
-9. You are not required to accept this License, since you have not signed it. However, nothing else grants you permission to modify or distribute the Library or its derivative works. These actions are prohibited by law if you do not accept this License. Therefore, by modifying or distributing the Library (or any work based on the Library), you indicate your acceptance of this License to do so, and all its terms and conditions for copying, distributing or modifying the Library or works based on it.
-
-10. Each time you redistribute the Library (or any work based on the Library), the recipient automatically receives a license from the original licensor to copy, distribute, link with or modify the Library subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties with this License.
-
-11. If, as a consequence of a court judgment or allegation of patent infringement or for any other reason (not limited to patent issues), conditions are imposed on you (whether by court order, agreement or otherwise) that contradict the conditions of this License, they do not excuse you from the conditions of this License. If you cannot distribute so as to satisfy simultaneously your obligations under this License and any other pertinent obligations, then as a consequence you may not distribute the Library at all. For example, if a patent license would not permit royalty-free redistribution of the Library by all those who receive copies directly or indirectly through you, then the only way you could satisfy both it and this License would be to refrain entirely from distribution of the Library.
-
-If any portion of this section is held invalid or unenforceable under any particular circumstance, the balance of the section is intended to apply, and the section as a whole is intended to apply in other circumstances.
-
-It is not the purpose of this section to induce you to infringe any patents or other property right claims or to contest validity of any such claims; this section has the sole purpose of protecting the integrity of the free software distribution system which is implemented by public license practices. Many people have made generous contributions to the wide range of software distributed through that system in reliance on consistent application of that system; it is up to the author/donor to decide if he or she is willing to distribute software through any other system and a licensee cannot impose that choice.
-
-This section is intended to make thoroughly clear what is believed to be a consequence of the rest of this License.
-
-12. If the distribution and/or use of the Library is restricted in certain countries either by patents or by copyrighted interfaces, the original copyright holder who places the Library under this License may add an explicit geographical distribution limitation excluding those countries, so that distribution is permitted only in or among countries not thus excluded. In such case, this License incorporates the limitation as if written in the body of this License.
-
-13. The Free Software Foundation may publish revised and/or new versions of the Lesser General Public License from time to time. Such new versions will be similar in spirit to the present version, but may differ in detail to address new problems or concerns.
-
-Each version is given a distinguishing version number. If the Library specifies a version number of this License which applies to it and "any later version", you have the option of following the terms and conditions either of that version or of any later version published by the Free Software Foundation. If the Library does not specify a license version number, you may choose any version ever published by the Free Software Foundation.
-
-14. If you wish to incorporate parts of the Library into other free programs whose distribution conditions are incompatible with these, write to the author to ask for permission. For software which is copyrighted by the Free Software Foundation, write to the Free Software Foundation; we sometimes make exceptions for this. Our decision will be guided by the two goals of preserving the free status of all derivatives of our free software and of promoting the sharing and reuse of software generally.
-
-NO WARRANTY
-
-15. BECAUSE THE LIBRARY IS LICENSED FREE OF CHARGE, THERE IS NO WARRANTY FOR THE LIBRARY, TO THE EXTENT PERMITTED BY APPLICABLE LAW. EXCEPT WHEN OTHERWISE STATED IN WRITING THE COPYRIGHT HOLDERS AND/OR OTHER PARTIES PROVIDE THE LIBRARY "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE ENTIRE RISK AS TO THE QUALITY AND PERFORMANCE OF THE LIBRARY IS WITH YOU. SHOULD THE LIBRARY PROVE DEFECTIVE, YOU ASSUME THE COST OF ALL NECESSARY SERVICING, REPAIR OR CORRECTION.
-
-16. IN NO EVENT UNLESS REQUIRED BY APPLICABLE LAW OR AGREED TO IN WRITING WILL ANY COPYRIGHT HOLDER, OR ANY OTHER PARTY WHO MAY MODIFY AND/OR REDISTRIBUTE THE LIBRARY AS PERMITTED ABOVE, BE LIABLE TO YOU FOR DAMAGES, INCLUDING ANY GENERAL, SPECIAL, INCIDENTAL OR CONSEQUENTIAL DAMAGES ARISING OUT OF THE USE OR INABILITY TO USE THE LIBRARY (INCLUDING BUT NOT LIMITED TO LOSS OF DATA OR DATA BEING RENDERED INACCURATE OR LOSSES SUSTAINED BY YOU OR THIRD PARTIES OR A FAILURE OF THE LIBRARY TO OPERATE WITH ANY OTHER SOFTWARE), EVEN IF SUCH HOLDER OR OTHER PARTY HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGES.
-
-END OF TERMS AND CONDITIONS
-
-How to Apply These Terms to Your New Libraries
-
-If you develop a new library, and you want it to be of the greatest possible use to the public, we recommend making it free software that everyone can redistribute and change. You can do so by permitting redistribution under these terms (or, alternatively, under the terms of the ordinary General Public License).
-
-To apply these terms, attach the following notices to the library. It is safest to attach them to the start of each source file to most effectively convey the exclusion of warranty; and each file should have at least the "copyright" line and a pointer to where the full notice is found.
-
-     one line to give the library's name and an idea of what it does.
-     Copyright (C) year  name of author
-
-     This library is free software; you can redistribute it and/or modify it under the terms of the GNU Lesser General Public License as published by the Free Software Foundation; either version 2.1 of the License, or (at your option) any later version.
-
-     This library is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU Lesser General Public License for more details.
-
-     You should have received a copy of the GNU Lesser General Public License along with this library; if not, write to the Free Software Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA  02110-1301  USA Also add information on how to contact you by electronic and paper mail.
-
-You should also get your employer (if you work as a programmer) or your school, if any, to sign a "copyright disclaimer" for the library, if necessary. Here is a sample; alter the names:
-
-Yoyodyne, Inc., hereby disclaims all copyright interest in
-the library `Frob' (a library for tweaking knobs) written
-by James Random Hacker.
-
-signature of Ty Coon, 1 April 1990
-Ty Coon, President of Vice
-That's all there is to it!
+../LICENSES/LGPL-2.1-or-later.txt
@@ -58,8 +58,12 @@ fn read_acpi_pss(cpu: u32) -> Vec<PState> {
 }

 fn write_msr(cpu: u32, msr: u32, val: u64) -> bool {
-    fs::OpenOptions::new().write(true).open(format!("/dev/cpu/{}/msr", cpu)).ok()
-        .map(|mut f| f.write_all(&val.to_ne_bytes()).is_ok()).unwrap_or(false)
+    let path = format!("/scheme/sys/msr/{}/{:x}", cpu, msr);
+    fs::OpenOptions::new().write(true).open(&path).ok()
+        .and_then(|mut f| {
+            let hex_val = format!("{:016x}", val);
+            f.write_all(hex_val.as_bytes()).ok()
+        }).is_some()
 }

 fn measure_load(cpu: u32, prev: &mut (u64, u64)) -> f64 {
@@ -11,6 +11,7 @@ path = "src/main.rs"
 [dependencies]
 redox-driver-core = { path = "../../drivers/redox-driver-core" }
 redox-driver-pci = { path = "../../drivers/redox-driver-pci" }
+redox-driver-acpi = { path = "../../drivers/redox-driver-acpi" }
 pcid_interface = { path = "../../../../recipes/core/base/source/drivers/pcid", package = "pcid" }
 redox_syscall = "0.7"
 log = "0.4"
@@ -2,4 +2,20 @@
 path = "source"

 [build]
-template = "cargo"
+template = "custom"
+script = """
+# driver-manager runs in both rootfs and initfs; initfs has no dynamic linker,
+# so we must build a statically linked binary.
+export RUSTFLAGS="${RUSTFLAGS} -Ctarget-feature=+crt-static -L native=${COOKBOOK_SYSROOT}/lib"
+"${COOKBOOK_CARGO}" build \
+    --manifest-path "${COOKBOOK_SOURCE}/Cargo.toml" \
+    --target "${TARGET}" \
+    ${build_flags}
+mkdir -pv "${COOKBOOK_STAGE}/usr/bin"
+cp -v "target/${TARGET}/${build_type}/driver-manager" "${COOKBOOK_STAGE}/usr/bin/driver-manager"
+"""
+
+[dependencies]
+redox-driver-core = {}
+redox-driver-pci = {}
+redox-driver-acpi = {}
@@ -11,9 +11,11 @@ path = "src/main.rs"
 [dependencies]
 redox-driver-core = { path = "../../../drivers/redox-driver-core/source" }
 redox-driver-pci = { path = "../../../drivers/redox-driver-pci/source" }
+redox-driver-acpi = { path = "../../../drivers/redox-driver-acpi/source" }
 pcid_interface = { path = "../../../../../recipes/core/base/source/drivers/pcid", package = "pcid" }
 redox-scheme = "0.11"
 syscall = { package = "redox_syscall", version = "0.7" }
 log = "0.4"
 toml = "0.8"
 serde = { version = "1", features = ["derive"] }
+libc = "0.2"
@@ -1,4 +1,5 @@
 use std::collections::HashMap;
+use std::collections::BTreeSet;
 use std::fs;
 use std::os::fd::{AsRawFd, FromRawFd, OwnedFd};
 use std::path::Path;
@@ -8,11 +9,18 @@ use std::sync::Mutex;
 use std::vec::Vec;

 use pcid_interface::PciFunctionHandle;
+use redox_driver_acpi::AcpiBus;
 use redox_driver_core::device::DeviceInfo;
 use redox_driver_core::driver::{Driver, DriverError, ProbeResult};
 use redox_driver_core::r#match::DriverMatch;
 use redox_driver_core::params::{DriverParams, ParamValue};

+// Device+driver pairs that should never be re-probed because the driver
+// binary is absent (Fatal), the driver declined the device (NotSupported),
+// or deferred retries were exhausted. Checked by probe() before any work.
+pub(crate) static PERMANENTLY_SKIPPED: Mutex<BTreeSet<(String, String)>> =
+    Mutex::new(BTreeSet::new());
+
 use serde::Deserialize;

 #[derive(Debug)]
@@ -48,6 +56,7 @@ impl Clone for DriverConfig {

 #[derive(Deserialize)]
 struct RawDriverMatch {
+    bus: Option<String>,
    vendor: Option<u16>,
    device: Option<u16>,
    class: Option<u8>,
@@ -60,6 +69,7 @@ struct RawDriverMatch {
 impl From<RawDriverMatch> for DriverMatch {
    fn from(r: RawDriverMatch) -> Self {
        DriverMatch {
+            bus: r.bus,
            vendor: r.vendor,
            device: r.device,
            class: r.class,
@@ -97,7 +107,7 @@ impl DriverConfig {

                if matches.is_empty() {
                    log::warn!(
-                        "driver-manager: config {} driver={} has no PCI match entries and will not bind from PCI enumeration",
+                        "driver-manager: config {} driver={} has no match entries and will not bind from PCI or ACPI enumeration",
                        path.display(),
                        driver.name
                    );
@@ -128,6 +138,18 @@ fn pci_device_path(info: &DeviceInfo) -> String {
    }
 }

+/// Build the ACPI scheme path for a device.
+///
+/// Returns `raw_path` unchanged when it is already under `/scheme/acpi/`; otherwise
+/// builds `/scheme/acpi/symbols/{path}` from the ACPI namespace path (e.g., "PCI0", "I2C0", "GPI0").
+fn acpi_device_path(info: &DeviceInfo) -> String {
+    if info.raw_path.starts_with("/scheme/acpi/") {
+        info.raw_path.clone()
+    } else {
+        format!("/scheme/acpi/symbols/{}", info.id.path)
+    }
+}
+
 fn open_pcid_channel(device_path: &str) -> Result<OwnedFd, ProbeResult> {
    let mut handle = match PciFunctionHandle::connect_by_path(Path::new(device_path)) {
        Ok(handle) => handle,
@@ -154,10 +176,24 @@ fn open_pcid_channel(device_path: &str) -> Result<OwnedFd, ProbeResult> {
 }

 fn check_scheme_available(name: &str) -> bool {
-    if std::path::Path::new(&format!("/scheme/{}", name)).exists() {
-        return true;
-    }
+    let path = format!("/scheme/{}", name);
+    // Use read_dir instead of Path::exists() because Redox scheme paths
+    // may not respond correctly to exists()/metadata() while still being
+    // fully functional for directory enumeration and file open.
+    // This was the root cause of "dependency scheme not ready: pci" even
+    // though PciBus::enumerate_devices (which uses read_dir) succeeded.
+    match fs::read_dir(&path) {
+        Ok(_) => true,
+        Err(err) => {
+            log::debug!(
+                "driver-manager: scheme availability check failed for {}: {} (exists={})",
+                path,
+                err,
+                std::path::Path::new(&path).exists()
+            );
            false
+        }
+    }
 }

 impl Driver for DriverConfig {
@@ -195,6 +231,22 @@ impl Driver for DriverConfig {
            }
        }

+        // Check if this device+driver pair was permanently abandoned
+        // by the hotplug loop (binary missing, driver declined, or
+        // deferred retries exhausted). Skip without any work or logging.
+        {
+            let key = (device_key.clone(), self.name.clone());
+            let skipped = match PERMANENTLY_SKIPPED.lock() {
+                Ok(skipped) => skipped,
+                Err(_) => return ProbeResult::Fatal {
+                    reason: String::from("skip set lock poisoned"),
+                },
+            };
+            if skipped.contains(&key) {
+                return ProbeResult::NotSupported;
+            }
+        }
+
        if self.command.is_empty() {
            return ProbeResult::Fatal {
                reason: String::from("empty command"),
@@ -207,11 +259,28 @@ impl Driver for DriverConfig {
            format!("/usr/lib/drivers/{}", self.command[0])
        };

+        // Also check the initfs path — drivers like nvmed live in
+        // /scheme/initfs/lib/drivers/ during early boot and may not yet
+        // be staged to /usr/lib/drivers/ after switchroot.
        if !std::path::Path::new(&actual_path).exists() {
-            return ProbeResult::Fatal {
-                reason: format!("driver binary not found: {}", actual_path),
+            let initfs_path = format!("/scheme/initfs/lib/drivers/{}", self.command[0].rsplit('/').next().unwrap_or(&self.command[0]));
+            if std::path::Path::new(&initfs_path).exists() {
+                return ProbeResult::Deferred {
+                    reason: format!("driver in initfs only (not yet in rootfs): {}", initfs_path),
                };
            }
+            return ProbeResult::Fatal {
+                reason: format!("driver binary not found: {} (also checked {})", actual_path, initfs_path),
+            };
+        }
+
+        // Skip if this driver's scheme is already registered (e.g., by
+        // pcid-spawner during initfs), so a driver already serving its scheme
+        // is not re-spawned. Assumes the scheme is named after the driver.
+        if check_scheme_available(&self.name) {
+            log::info!("driver {} already serving scheme, skipping probe for {}", self.name, device_key);
+            return ProbeResult::Bound;
+        }

        let deps: Vec<String> = if !self.depends_on.is_empty() {
            self.depends_on.clone()
@@ -228,43 +297,13 @@ impl Driver for DriverConfig {

        log::info!("probing {} with driver {}", device_key, self.name);

-        let device_path = pci_device_path(info);
-
-        let channel_fd = match open_pcid_channel(&device_path) {
-            Ok(channel_fd) => channel_fd,
-            Err(result) => return result,
-        };
-
-        let mut cmd = Command::new(&actual_path);
-        for arg in &self.command[1..] {
-            cmd.arg(arg);
-        }
-
-        cmd.env("PCID_CLIENT_CHANNEL", channel_fd.as_raw_fd().to_string());
-        cmd.env("PCID_DEVICE_PATH", &device_path);
-
-        match cmd.spawn() {
-            Ok(child) => {
-                let pid = child.id();
-                log::info!(
-                    "driver {} spawned (pid {}) for device {}",
-                    self.name,
-                    pid,
-                    device_key
-                );
-                let mut spawned = match self.spawned.lock() {
-                    Ok(spawned) => spawned,
-                    Err(err) => {
-                        return ProbeResult::Fatal {
-                            reason: format!("spawn state lock poisoned after spawn: {err}"),
-                        };
-                    }
-                };
-                spawned.insert(device_key, SpawnedDriver { child, channel_fd });
-                ProbeResult::Bound
-            }
-            Err(e) => ProbeResult::Fatal {
-                reason: format!("spawn failed: {}", e),
+        // Branch on bus type: PCI devices use the pcid channel,
+        // ACPI devices use the ACPI scheme path with resource queries.
+        match info.id.bus.as_str() {
+            "pci" => self.probe_pci_device(info, &device_key, &actual_path),
+            "acpi" => self.probe_acpi_device(info, &device_key, &actual_path),
+            other => ProbeResult::Fatal {
+                reason: format!("unsupported bus type: {}", other),
            },
        }
    }
@@ -340,6 +379,223 @@ impl Driver for DriverConfig {
    }
 }

+impl DriverConfig {
+    /// Check for exited child processes without blocking (via `Child::try_wait`).
+    /// Returns (device_key, driver_name, exit_code) for each exited driver; exit_code is -1 when the status carries no code.
+    pub fn reap_exited_children(&self) -> Vec<(String, String, i32)> {
+        let mut exited = Vec::new();
+        let Ok(mut spawned) = self.spawned.lock() else {
+            return exited;
+        };
+
+        let mut to_remove = Vec::new();
+
+        for (device_key, spawned_driver) in spawned.iter_mut() {
+            match spawned_driver.child.try_wait() {
+                Ok(Some(status)) => {
+                    let code = status.code().unwrap_or(-1);
+                    log::warn!(
+                        "driver {} (pid {}) for device {} exited with status {}",
+                        self.name,
+                        spawned_driver.child.id(),
+                        device_key,
+                        code
+                    );
+                    to_remove.push(device_key.clone());
+                    exited.push((device_key.clone(), self.name.clone(), code));
+                }
+                Ok(None) => {
+                    // Still running
+                }
+                Err(err) => {
+                    log::error!(
+                        "failed to check status of driver {} pid {}: {}",
+                        self.name,
+                        spawned_driver.child.id(),
+                        err
+                    );
+                }
+            }
+        }
+
+        for key in to_remove {
+            spawned.remove(&key);
+        }
+
+        exited
+    }
+}
+
+impl DriverConfig {
+    /// Probe and spawn a driver for a PCI device.
+    ///
+    /// Opens a pcid channel for PCI config space access and passes the
+    /// channel FD and device path to the spawned driver via environment variables.
+    fn probe_pci_device(
+        &self,
+        info: &DeviceInfo,
+        device_key: &str,
+        actual_path: &str,
+    ) -> ProbeResult {
+        let device_path = pci_device_path(info);
+
+        let channel_fd = match open_pcid_channel(&device_path) {
+            Ok(channel_fd) => channel_fd,
+            Err(result) => return result,
+        };
+
+        let mut cmd = Command::new(actual_path);
+        for arg in &self.command[1..] {
+            cmd.arg(arg);
+        }
+
+        cmd.env("PCID_CLIENT_CHANNEL", channel_fd.as_raw_fd().to_string());
+        cmd.env("PCID_DEVICE_PATH", &device_path);
+
+        self.spawn_driver(cmd, device_key, channel_fd)
+    }
+
+    /// Probe and spawn a driver for an ACPI device.
+    ///
+    /// Queries ACPI resources (_CRS) from the device and passes them as
+    /// environment variables to the spawned driver. The driver can then
+    /// use these to map MMIO regions and request IRQs.
+    ///
+    /// # Linux equivalent
+    ///
+    /// Linux's `acpi_device_probe()` calls `acpi_dev_get_resources()`
+    /// to extract IRQ/MMIO/IO resources from _CRS and passes them via
+    /// `struct resource` to the platform driver's `probe()` callback.
+    fn probe_acpi_device(
+        &self,
+        info: &DeviceInfo,
+        device_key: &str,
+        actual_path: &str,
+    ) -> ProbeResult {
+        let device_path = acpi_device_path(info);
+
+        // Query ACPI resources for this device.
+        // Uses the AcpiBus resource query API which reads _CRS data.
+        let acpi_bus = AcpiBus::new();
+        let resources = acpi_bus.query_device_resources(&info.id.path);
+
+        let mut cmd = Command::new(actual_path);
+        for arg in &self.command[1..] {
+            cmd.arg(arg);
+        }
+
+        // Pass device identification
+        cmd.env("ACPI_DEVICE_PATH", &device_path);
+        cmd.env("ACPI_DEVICE_NAME", &info.id.path);
+
+        // Pass the device description, if one is available
+        if let Some(ref desc) = info.description {
+            cmd.env("ACPI_DEVICE_DESCRIPTION", desc);
+        }
+
+        // Extract and pass MMIO regions as env vars.
+        // Format: ACPI_MMIO_<n>=<base>,<length>, both as 0x-prefixed hex;
+        // ACPI_MMIO_COUNT holds the number of entries when there is at least one.
+        let mmio_regions = redox_driver_acpi::extract_mmio_regions(&resources);
+        for (i, region) in mmio_regions.iter().enumerate() {
+            cmd.env(
+                format!("ACPI_MMIO_{}", i),
+                format!("{:#x},{:#x}", region.base, region.length),
+            );
+        }
+        if !mmio_regions.is_empty() {
+            cmd.env("ACPI_MMIO_COUNT", mmio_regions.len().to_string());
+        }
+
+        // Extract and pass IRQ info as env vars.
+        // Format: ACPI_IRQ_<n>=<gsi>,<triggering>,<polarity>, where gsi is
+        // 0x-prefixed hex, triggering is edge|level, and polarity is high|low|both.
+        let irqs = redox_driver_acpi::extract_irqs(&resources);
+        for (i, irq) in irqs.iter().enumerate() {
+            let trigger = match irq.triggering {
+                redox_driver_acpi::TriggerMode::Edge => "edge",
+                redox_driver_acpi::TriggerMode::Level => "level",
+            };
+            let polarity = match irq.polarity {
+                redox_driver_acpi::Polarity::ActiveHigh => "high",
+                redox_driver_acpi::Polarity::ActiveLow => "low",
+                redox_driver_acpi::Polarity::ActiveBoth => "both",
+            };
+            cmd.env(
+                format!("ACPI_IRQ_{}", i),
+                format!("{:#x},{},{}", irq.gsi, trigger, polarity),
+            );
+        }
+        if !irqs.is_empty() {
+            cmd.env("ACPI_IRQ_COUNT", irqs.len().to_string());
+        }
+
+        // Extract and pass I/O port ranges
+        let io_ports = redox_driver_acpi::extract_io_ports(&resources);
+        for (i, port) in io_ports.iter().enumerate() {
+            cmd.env(
+                format!("ACPI_IO_{}", i),
+                format!("{:#x},{:#x}", port.base, port.length),
+            );
+        }
+        if !io_ports.is_empty() {
+            cmd.env("ACPI_IO_COUNT", io_ports.len().to_string());
+        }
+
+        // ACPI drivers don't use a pcid channel — they access hardware
+        // via scheme:memory (MMIO) and scheme:irq directly.
+        // Create a dummy fd to satisfy the spawn signature.
+        // The driver reads resources from the env vars above.
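+        let dev_null = match std::fs::File::open("/scheme/null") {
+            // Convert by value: `OwnedFd::from(File)` transfers ownership of the
+            // descriptor, so it is closed exactly once. Wrapping `as_raw_fd()`
+            // would leave the `File` to close the same descriptor on drop.
+            Ok(f) => OwnedFd::from(f),
+            Err(_) => {
+                // Fallback: open /dev/null on Linux hosts during testing
+                match std::fs::File::open("/dev/null") {
+                    Ok(f) => OwnedFd::from(f),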
+                    Err(e) => {
+                        return ProbeResult::Fatal {
+                            reason: format!("cannot open null device for ACPI channel: {}", e),
+                        };
+                    }
+                }
+            }
+        };
+
+        self.spawn_driver(cmd, device_key, dev_null)
+    }
+
+    /// Common driver spawn logic — shared by PCI and ACPI probe paths.
+    fn spawn_driver(
+        &self,
+        mut cmd: Command,
+        device_key: &str,
+        channel_fd: OwnedFd,
+    ) -> ProbeResult {
+        match cmd.spawn() {
+            Ok(child) => {
+                let pid = child.id();
+                log::info!(
+                    "driver {} spawned (pid {}) for device {}",
+                    self.name,
+                    pid,
+                    device_key
+                );
+                let mut spawned = match self.spawned.lock() {
+                    Ok(spawned) => spawned,
+                    Err(err) => {
+                        return ProbeResult::Fatal {
+                            reason: format!("spawn state lock poisoned after spawn: {err}"),
+                        };
+                    }
+                };
+                spawned.insert(device_key.to_string(), SpawnedDriver { child, channel_fd });
+                ProbeResult::Bound
+            }
+            Err(e) => ProbeResult::Fatal {
+                reason: format!("spawn failed: {}", e),
+            },
+        }
+    }
+}
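On the driver side, the `ACPI_MMIO_<n>` variables set by `probe_acpi_device` could be decoded roughly as follows. This is a minimal sketch, not code from this commit: `parse_hex_pair` and `read_mmio_regions` are hypothetical helper names, and the variable lookup is passed in as a closure (standing in for `std::env::var`) so the example does not depend on the process environment.

```rust
/// Parse a "base,length" pair where both fields are hex with a `0x` prefix,
/// matching the `{:#x},{:#x}` format used for ACPI_MMIO_<n> and ACPI_IO_<n>.
fn parse_hex_pair(value: &str) -> Option<(u64, u64)> {
    let (base, length) = value.split_once(',')?;
    let parse = |field: &str| {
        let digits = field.trim().strip_prefix("0x")?;
        u64::from_str_radix(digits, 16).ok()
    };
    Some((parse(base)?, parse(length)?))
}

/// Collect all MMIO regions, using ACPI_MMIO_COUNT as the entry count.
/// `lookup` is a hypothetical stand-in for `|name| std::env::var(name).ok()`.
fn read_mmio_regions(lookup: impl Fn(&str) -> Option<String>) -> Vec<(u64, u64)> {
    // ACPI_MMIO_COUNT is only set when at least one region exists,
    // so a missing or malformed count is treated as zero regions.
    let count: usize = lookup("ACPI_MMIO_COUNT")
        .and_then(|v| v.parse().ok())
        .unwrap_or(0);
    (0..count)
        .filter_map(|i| lookup(&format!("ACPI_MMIO_{i}")))
        .filter_map(|v| parse_hex_pair(&v))
        .collect()
}
```

A real driver would probably call `read_mmio_regions(|name| std::env::var(name).ok())` early in `main` and treat a malformed entry as a hard error rather than skipping it silently, as this sketch does.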
+
 /// Driver-specified dependencies. Parsed from [driver.depends] TOML field.
 /// Example: depends_on = ["pci", "acpi"]
 /// When specified, takes precedence over guess_dependencies().
@@ -383,7 +639,7 @@ struct RawDriverEntry {
    priority: i32,
    #[serde(default)]
    command: Vec<String>,
-    #[serde(rename = "match")]
+    #[serde(rename = "match", default)]
    r#match: Vec<RawDriverMatch>,
    #[serde(default)]
    depends_on: Vec<String>,
@@ -26,7 +26,6 @@ pub fn run_hotplug_loop(
    );

    let mut deferred_retries: BTreeMap<(String, String), u32> = BTreeMap::new();
-    let mut permanently_fatal: BTreeSet<(String, String)> = BTreeSet::new();

    loop {
        thread::sleep(Duration::from_millis(poll_interval_ms));
@@ -67,15 +66,6 @@ pub fn run_hotplug_loop(
                    track_pci_device(device, &mut seen_pci_devices);
                    let key = (device.path.clone(), driver_name.clone());

-                    // Skip devices that were permanently fatal in a previous cycle.
-                    // enumerate() re-probes all unbound devices each poll, but a Fatal
-                    // result means the driver binary is genuinely absent (e.g. ided on
-                    // a live ISO that doesn't ship it) — no amount of re-probing will
-                    // change the outcome.
-                    if permanently_fatal.contains(&key) {
-                        continue;
-                    }
-
                    match result {
                        ProbeResult::Bound => {
                            log::info!("hotplug: bound {} -> {}", device.path, driver_name);
@@ -99,6 +89,12 @@ pub fn run_hotplug_loop(
                                    MAX_DEFERRED_RETRIES,
                                    reason
                                );
+                                if let Ok(mut skipped) = crate::config::PERMANENTLY_SKIPPED.lock() {
+                                    skipped.insert((
+                                        device.path.clone(),
+                                        driver_name.clone(),
+                                    ));
+                                }
                            }
                        }
                        ProbeResult::Fatal { reason } => {
@@ -108,9 +104,20 @@ pub fn run_hotplug_loop(
                                driver_name,
                                reason
                            );
-                            permanently_fatal.insert(key);
+                            if let Ok(mut skipped) = crate::config::PERMANENTLY_SKIPPED.lock() {
+                                skipped.insert(key);
+                            }
+                        }
+                        ProbeResult::NotSupported => {
+                            log::debug!(
+                                "hotplug: not supported {} -> {}",
+                                device.path,
+                                driver_name
+                            );
+                            if let Ok(mut skipped) = crate::config::PERMANENTLY_SKIPPED.lock() {
+                                skipped.insert(key);
+                            }
                        }
-                        _ => {}
                    }
                }
                ProbeEvent::NoDriverFound { device } => {
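The hunks above record failed (device, driver) pairs in `crate::config::PERMANENTLY_SKIPPED`, whose definition is not part of this excerpt. A minimal sketch of how such a process-wide skip set might look, assuming it is a `Mutex<BTreeSet<(String, String)>>` static; the names `SKIPPED`, `mark_skipped`, and `is_skipped` are hypothetical:

```rust
use std::collections::BTreeSet;
use std::sync::Mutex;

/// Process-wide set of (device path, driver name) pairs that should not be
/// probed again. Both `Mutex::new` and `BTreeSet::new` are `const fn`, so no
/// lazy initialisation is needed.
static SKIPPED: Mutex<BTreeSet<(String, String)>> = Mutex::new(BTreeSet::new());

/// Record a pair as permanently skipped.
/// Returns true if the pair was newly inserted, false if it was already
/// present or the lock was poisoned.
fn mark_skipped(device_path: &str, driver_name: &str) -> bool {
    match SKIPPED.lock() {
        Ok(mut set) => set.insert((device_path.to_string(), driver_name.to_string())),
        Err(_) => false,
    }
}

/// Check whether a pair was previously marked. A poisoned lock is treated
/// as "not skipped", so the probe is retried rather than silently dropped.
fn is_skipped(device_path: &str, driver_name: &str) -> bool {
    SKIPPED
        .lock()
        .map(|set| set.contains(&(device_path.to_string(), driver_name.to_string())))
        .unwrap_or(false)
}
```

Keeping the set in a static rather than a loop-local variable lets the initial enumeration pass and the hotplug loop share it, which appears to be the motivation for moving away from the local `permanently_fatal` set.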
@@ -200,6 +207,8 @@ fn track_pci_device(device: &DeviceId, seen_pci_devices: &mut BTreeSet<String>)
 }

 fn notify_bound_device(scheme: &DriverManagerScheme, device: &DeviceId, driver_name: &str) {
+    // PCI devices use the pcid-compatible bind notification.
+    // ACPI devices may be notified through other mechanisms in the future.
    if device.bus == "pci" {
        notify_bind(scheme, &device.path, driver_name);
    }
@@ -3,6 +3,7 @@ mod exec;
 mod hotplug;
 mod scheme;

+use std::sync::atomic::{AtomicBool, Ordering};
 use std::sync::{Arc, Mutex};
 use std::thread;
 use std::time::{Duration, Instant};
@@ -12,12 +13,26 @@ use redox_driver_core::device::DeviceId;
 use redox_driver_core::driver::ProbeResult;
 use redox_driver_core::manager::{DeviceManager, ManagerConfig, ProbeEvent};
 use redox_driver_pci::PciBus;
+use redox_driver_acpi::AcpiBus;
 use std::fs::OpenOptions;
 use std::io::Write;

 use config::DriverConfig;
 use scheme::{DriverManagerScheme, notify_bind};

+/// Global flag set by SIGTERM handler to request graceful shutdown.
+static SHUTDOWN_REQUESTED: AtomicBool = AtomicBool::new(false);
+
+extern "C" fn sigterm_handler(_sig: i32) {
+    SHUTDOWN_REQUESTED.store(true, Ordering::SeqCst);
+}
+
+fn install_sigterm_handler() {
+    unsafe {
+        libc::signal(libc::SIGTERM, sigterm_handler as *const () as usize);
+    }
+}
+
 struct StderrLogger;

 const BOOT_TIMELINE_PATH: &str = "/tmp/redbear-boot-timeline.json";
@@ -37,6 +52,7 @@ impl log::Log for StderrLogger {
 fn run_enumeration(
    manager: &Arc<Mutex<DeviceManager>>,
    scheme: &DriverManagerScheme,
+    initfs: bool,
 ) -> (usize, usize) {
    let enum_start = Instant::now();
    let events = match manager.lock() {
@@ -77,8 +93,12 @@ fn run_enumeration(
                log::info!("bus {} enumerated {} device(s)", bus, device_count);
            }
            ProbeEvent::BusEnumerationFailed { bus, error } => {
+                if initfs && *bus == "pci" {
+                    log::warn!("bus {} enumeration not yet ready (initfs, pcid may still be starting): {:?}", bus, error);
+                } else {
                    log::error!("bus {} enumeration failed: {:?}", bus, error);
                }
+            }
            ProbeEvent::AlreadyBound {
                device,
                driver_name,
@@ -113,14 +133,19 @@ fn run_enumeration(
 }

 fn notify_bound_device(scheme: &DriverManagerScheme, device: &DeviceId, driver_name: &str) {
-    if device.bus == "pci" {
+    // Notify for both PCI and ACPI devices
    notify_bind(scheme, &device.path, driver_name);
-    }
 }

 fn reset_timeline_log() {
-    if let Err(err) = fs::write(BOOT_TIMELINE_PATH, "") {
-        log::warn!("failed to reset boot timeline log at {BOOT_TIMELINE_PATH}: {err}");
+    // Best-effort: truncate the log or create it empty. On scheme filesystems
+    // that don't support truncating an existing file the write may fail; in
+    // that case remove the file and let the append path handle it.
+    match fs::write(BOOT_TIMELINE_PATH, "") {
+        Ok(()) => {}
+        Err(_) => {
+            let _ = fs::remove_file(BOOT_TIMELINE_PATH);
+        }
    }
 }

@@ -213,22 +238,127 @@ fn log_timeline(event: &ProbeEvent) {
    {
        Ok(mut file) => {
            if let Err(err) = writeln!(file, "{entry}") {
-                log::warn!("failed to append boot timeline entry to {BOOT_TIMELINE_PATH}: {err}");
+                // EPIPE or other write errors can occur when /tmp is backed
+                // by a scheme that doesn't support append writes, or when the
+                // filesystem is not yet fully ready. Log once and suppress
+                // all subsequent write errors to avoid log spam.
+                static WRITE_ERROR_LOGGED: std::sync::atomic::AtomicBool = std::sync::atomic::AtomicBool::new(false);
+                if !WRITE_ERROR_LOGGED.swap(true, std::sync::atomic::Ordering::Relaxed) {
+                    log::warn!("failed to append boot timeline entry to {BOOT_TIMELINE_PATH}: {err} (suppressing further write errors)");
+                }
            }
        }
        Err(err) => {
-            log::warn!("failed to open boot timeline log at {BOOT_TIMELINE_PATH}: {err}");
+            // EEXIST (os error 17) can occur when the file already exists
+            // but the scheme filesystem doesn't support create+append.
+            // EPIPE and other errors occur when /tmp isn't ready.
+            // Log once and suppress all subsequent open errors.
+            static OPEN_ERROR_LOGGED: std::sync::atomic::AtomicBool = std::sync::atomic::AtomicBool::new(false);
+            if !OPEN_ERROR_LOGGED.swap(true, std::sync::atomic::Ordering::Relaxed) {
+                log::warn!("failed to open boot timeline log at {BOOT_TIMELINE_PATH}: {err} (suppressing further open errors)");
            }
        }
+    }
+}
+
+fn run_status() {
+    // Print the boot timeline log if it exists.
+    match fs::read_to_string(BOOT_TIMELINE_PATH) {
+        Ok(content) => {
+            if content.trim().is_empty() {
+                println!("No boot timeline data found at {}", BOOT_TIMELINE_PATH);
+                println!("Driver manager has not completed enumeration yet.");
+                return;
+            }
+
+            println!("=== Red Bear OS Driver Manager Status ===");
+            println!();
+
+            let mut bound = 0usize;
+            let mut deferred = 0usize;
+            let mut failed = 0usize;
+            let mut no_driver = 0usize;
+            let mut buses = Vec::new();
+
+            for line in content.lines() {
+                if line.trim().is_empty() {
+                    continue;
+                }
+                // Parse JSON timeline entries
+                if line.contains("\"event\":\"bus_enumerated\"") {
+                    if let Some(bus) = extract_json_string(line, "bus") {
+                        if let Some(count) = extract_json_number(line, "count") {
+                            buses.push((bus, count));
+                        }
+                    }
+                } else if line.contains("\"status\":\"bound\"") {
+                    bound += 1;
+                } else if line.contains("\"status\":\"deferred\"") {
+                    deferred += 1;
+                } else if line.contains("\"status\":\"failed\"") {
+                    failed += 1;
+                } else if line.contains("\"event\":\"no_driver\"") {
+                    no_driver += 1;
+                }
+            }
+
+            println!("Bus enumeration:");
+            for (bus, count) in &buses {
+                println!("  {}: {} device(s)", bus, count);
+            }
+            println!();
+            println!("Driver binding:");
+            println!("  bound:    {}", bound);
+            println!("  deferred: {}", deferred);
+            println!("  failed:   {}", failed);
+            println!("  no driver: {}", no_driver);
+            println!();
+            println!("Timeline log: {}", BOOT_TIMELINE_PATH);
+        }
+        Err(err) => {
+            println!("Cannot read {}: {}", BOOT_TIMELINE_PATH, err);
+            println!("Driver manager may not have run yet.");
+        }
+    }
+}
+
+/// Extract a JSON string value for a given key from a single-line JSON object.
+fn extract_json_string(line: &str, key: &str) -> Option<String> {
+    let pattern = format!("\"{}\":\"", key);
+    let start = line.find(&pattern)?;
+    let value_start = start + pattern.len();
+    let end = line[value_start..].find('"')?;
+    Some(line[value_start..value_start + end].to_string())
+}
+
+/// Extract a JSON number value for a given key from a single-line JSON object.
+fn extract_json_number(line: &str, key: &str) -> Option<usize> {
+    let pattern = format!("\"{}\":", key);
+    let start = line.find(&pattern)?;
+    let value_start = start + pattern.len();
+    let rest = &line[value_start..];
+    let end = rest.find(|c: char| !c.is_ascii_digit()).unwrap_or(rest.len());
+    rest[..end].parse().ok()
 }
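The two helpers above do plain substring matching rather than real JSON parsing. The following standalone copy shows what that means in practice; the sample timeline line is an assumption, since the exact entry format written by `log_timeline` is not visible in this excerpt:

```rust
/// Extract a JSON string value for `key` from a compact single-line object.
fn extract_json_string(line: &str, key: &str) -> Option<String> {
    let pattern = format!("\"{}\":\"", key);
    let start = line.find(&pattern)?;
    let value_start = start + pattern.len();
    // Stops at the first quote, so escaped quotes inside values are not handled.
    let end = line[value_start..].find('"')?;
    Some(line[value_start..value_start + end].to_string())
}

/// Extract an unsigned integer value for `key` from a compact single-line object.
fn extract_json_number(line: &str, key: &str) -> Option<usize> {
    let pattern = format!("\"{}\":", key);
    let start = line.find(&pattern)?;
    let rest = &line[start + pattern.len()..];
    // Digits must follow the colon directly; leading whitespace yields None.
    let end = rest.find(|c: char| !c.is_ascii_digit()).unwrap_or(rest.len());
    rest[..end].parse().ok()
}
```

With a compact line such as `{"event":"bus_enumerated","bus":"pci","count":12}` both helpers find their values. A pretty-printed `"count": 12` (space after the colon) is not matched, so this approach depends on the writer always emitting compact JSON.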

 fn main() {
    log::set_logger(&StderrLogger).ok();
    log::set_max_level(log::LevelFilter::Info);

+    // Install SIGTERM handler for graceful shutdown
+    install_sigterm_handler();
+
    let args: Vec<String> = env::args().collect();
    let initfs = args.iter().any(|a| a == "--initfs");
    let hotplug_mode = args.iter().any(|a| a == "--hotplug");
+    let status_mode = args.iter().any(|a| a == "--status");
+
+    // --status: print a summary of bus enumeration and driver binding results
+    // parsed from the boot timeline log, then exit. This is for diagnostics:
+    // "what did driver-manager find?"
+    if status_mode {
+        run_status();
+        return;
+    }

    let config_dir = if initfs {
        "/scheme/initfs/lib/drivers.d"
@@ -262,8 +392,17 @@ fn main() {

    match manager.lock() {
        Ok(mut mgr) => {
+            // Register PCI bus first (higher priority — storage, network, GPU).
+            // Mirrors Linux's pci_scan_child_bus() via subsys_initcall.
            mgr.register_bus(Box::new(PciBus::new()));

+            // Register ACPI bus for platform/I2C/SPI/GPIO/thermal devices.
+            // Mirrors Linux's acpi_bus_scan() which walks the namespace for
+            // _HID/_CID/_STA/_CRS. ACPI devices are enumerated from
+            // /scheme/acpi/symbols/ which acpid populates from the AML
+            // interpreter.
+            mgr.register_bus(Box::new(AcpiBus::new()));
+
            for dc in &driver_configs {
                mgr.register_driver(Box::new(dc.clone()));
            }
@@ -277,11 +416,14 @@ fn main() {
    let mgr_clone = Arc::clone(&manager);
    let scheme_clone = Arc::clone(&scheme);

+    // Ensure /tmp exists before writing the boot timeline log.
+    let _ = std::fs::create_dir_all("/tmp");
+
    reset_timeline_log();

    if manager_config.async_probe {
        let handle = thread::spawn(move || {
-            let (bound, deferred) = run_enumeration(&mgr_clone, scheme_clone.as_ref());
+            let (bound, deferred) = run_enumeration(&mgr_clone, scheme_clone.as_ref(), initfs);
            log::info!("async enum: {} bound, {} deferred", bound, deferred);
        });
        if handle.join().is_err() {
@@ -289,14 +431,22 @@ fn main() {
            process::exit(1);
        }
    } else {
-        let (bound, deferred) = run_enumeration(&manager, scheme.as_ref());
+        let (bound, deferred) = run_enumeration(&manager, scheme.as_ref(), initfs);
        log::info!("enum complete: {} bound, {} deferred", bound, deferred);
    }

-    if let Err(err) = scheme::start_scheme_server(Arc::clone(&scheme)) {
+    match scheme::start_scheme_server(Arc::clone(&scheme)) {
+        Ok(true) => {
+            log::info!("driver-manager: scheme server started successfully");
+        }
+        Ok(false) => {
+            log::warn!("driver-manager: scheme already registered — another instance is active, continuing without scheme server");
+        }
+        Err(err) => {
            log::error!("{err}");
            process::exit(1);
        }
+    }

    if hotplug_mode {
        log::info!("entering hotplug event loop");
@@ -304,8 +454,17 @@ fn main() {
        idle_forever();
    }

-    let max_retries = 30u32;
+    let max_retries = 3u32;
    for retry in 1..=max_retries {
+        if SHUTDOWN_REQUESTED.load(Ordering::SeqCst) {
+            log::info!("driver-manager: SIGTERM received during deferred retry, shutting down");
+            graceful_shutdown();
+            process::exit(0);
+        }
+
+        // Check for crashed drivers during retry loop
+        reap_all_drivers(&driver_configs);
+
        thread::sleep(Duration::from_millis(500));

        let retry_events = match manager.lock() {
@@ -360,6 +519,35 @@ fn main() {
 fn idle_forever() -> ! {
    log::info!("driver-manager: entering persistent idle loop");
    loop {
-        thread::sleep(Duration::from_secs(3600));
+        thread::sleep(Duration::from_secs(5));
+        if SHUTDOWN_REQUESTED.load(Ordering::SeqCst) {
+            log::info!("driver-manager: SIGTERM received, performing graceful shutdown");
+            graceful_shutdown();
+            process::exit(0);
+        }
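+        // Periodically check for exited child drivers.
+        // NOTE: an empty slice is passed here, so this call currently reaps
+        // nothing; the driver configs would have to be passed into
+        // idle_forever() for the check to take effect.
+        reap_all_drivers(&[]);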
    }
 }
+
+/// Poll all driver configs for exited children and log the results.
+fn reap_all_drivers(driver_configs: &[DriverConfig]) {
+    for dc in driver_configs {
+        let exited = dc.reap_exited_children();
+        for (device_key, driver_name, code) in &exited {
+            log::warn!(
+                "reaped exited driver: {} for device {} (exit {})",
+                driver_name,
+                device_key,
+                code
+            );
+        }
+    }
+}
+
+fn graceful_shutdown() {
+    // Spawned drivers are tracked by the DriverConfig instances, but nothing is
+    // signalled or waited on here: on shutdown we only log and exit. The child
+    // processes keep running as orphans and are left for the system to reap.
+    log::info!("driver-manager: clean shutdown complete");
+}
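The reaping logic in `reap_exited_children` relies on `Child::try_wait`, which returns immediately instead of blocking. A minimal standalone sketch of the same polling pattern, assuming a Unix-like host with `sh` on the `PATH`; `spawn_shell`, `reap_exited`, and `reap_until_empty` are hypothetical names used only for illustration:

```rust
use std::process::{Child, Command};
use std::thread;
use std::time::Duration;

/// Spawn `sh -c <script>` as a stand-in for a driver process.
fn spawn_shell(script: &str) -> std::io::Result<Child> {
    Command::new("sh").arg("-c").arg(script).spawn()
}

/// Poll every child once without blocking. Children that have exited are
/// removed from `children` and their exit codes returned; -1 is used when
/// no code is available (for example, termination by a signal).
fn reap_exited(children: &mut Vec<Child>) -> Vec<i32> {
    let mut codes = Vec::new();
    children.retain_mut(|child| match child.try_wait() {
        Ok(Some(status)) => {
            codes.push(status.code().unwrap_or(-1));
            false // exited: drop it from the tracking list
        }
        Ok(None) => true, // still running
        Err(_) => true,   // status unknown: keep it and try again later
    });
    codes
}

/// Keep polling at a fixed interval until every child has been reaped.
fn reap_until_empty(children: &mut Vec<Child>, poll: Duration) -> Vec<i32> {
    let mut codes = Vec::new();
    while !children.is_empty() {
        codes.extend(reap_exited(children));
        if !children.is_empty() {
            thread::sleep(poll);
        }
    }
    codes
}
```

A long-running manager would call something like `reap_exited` from its idle loop rather than spinning until the list is empty; `reap_until_empty` exists here only to make the sketch easy to exercise.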
@@ -112,9 +112,9 @@ impl DriverManagerScheme {
            ["devices"] => Ok(HandleKind::Devices),
            ["bound"] => Ok(HandleKind::Bound),
            ["events"] => Ok(HandleKind::Events),
-            ["devices", pci_addr] if Self::valid_pci_addr(pci_addr) => {
-                let _ = self.device_status(pci_addr)?;
-                Ok(HandleKind::Device((*pci_addr).to_string()))
+            ["devices", addr] if Self::valid_device_addr(addr) => {
+                let _ = self.device_status(addr)?;
+                Ok(HandleKind::Device((*addr).to_string()))
            }
            _ => Err(Error::new(ENOENT)),
        }
@@ -127,7 +127,7 @@ impl DriverManagerScheme {
            return Ok(HandleKind::Devices);
        }

-        if trimmed.contains('/') || !Self::valid_pci_addr(trimmed) {
+        if trimmed.contains('/') || !Self::valid_device_addr(trimmed) {
            return Err(Error::new(ENOENT));
        }

@@ -228,6 +228,23 @@ impl DriverManagerScheme {
                .all(|ch| ch.is_ascii_hexdigit() || matches!(ch, ':' | '.'))
    }

+    /// Validate a device address for both PCI and ACPI devices.
+    ///
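+    /// PCI addresses contain colons and dots (e.g., "0000:00:1f.2").
+    /// ACPI device names are 4-character segments of ASCII letters, digits, and
+    /// underscores (e.g., "PCI0", "I2C0", "GPI0"), joined by dots for child paths.
+    /// Only the character set is checked here, not the segment length.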
+    #[cfg(target_os = "redox")]
+    fn valid_device_addr(value: &str) -> bool {
+        // Accept PCI-style addresses
+        if Self::valid_pci_addr(value) {
+            return true;
+        }
+        // Accept ACPI device names (alphanumeric and underscores, dots for child paths)
+        !value.is_empty()
+            && value
+                .chars()
+                .all(|ch| ch.is_ascii_alphanumeric() || matches!(ch, '.' | '_'))
+    }
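To make the accepted character set concrete, here is a standalone sketch of the same check as free functions. The body of `valid_pci_addr` is only partly visible in this excerpt, so the non-empty guard on it is an assumption:

```rust
/// PCI-style address check: hex digits, ':' and '.' only.
/// The non-empty guard is assumed; only the character test is shown in the diff.
fn valid_pci_addr(value: &str) -> bool {
    !value.is_empty()
        && value
            .chars()
            .all(|ch| ch.is_ascii_hexdigit() || matches!(ch, ':' | '.'))
}

/// Accept either a PCI-style address or an ACPI-style name made of ASCII
/// letters, digits, underscores, and dots.
fn valid_device_addr(value: &str) -> bool {
    valid_pci_addr(value)
        || (!value.is_empty()
            && value
                .chars()
                .all(|ch| ch.is_ascii_alphanumeric() || matches!(ch, '.' | '_')))
}
```

Because this is purely a character-class filter, it rejects path separators such as `/` but does not verify that a name is a well-formed ACPI path; the later `device_status` lookup is what decides whether the device actually exists.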
+
    fn push_event_line(&self, line: String) {
        match self.events.lock() {
            Ok(mut events) => {
@@ -366,11 +383,15 @@ pub fn notify_bind(scheme: &DriverManagerScheme, pci_addr: &str, driver_name: &s
    ));

    if let Err(err) = write_driver_param(pci_addr, "driver", driver_name) {
+        if err.kind() != std::io::ErrorKind::BrokenPipe {
            log::warn!("driver-manager: failed to write driver param for {pci_addr}: {err}");
        }
+    }
    if let Err(err) = write_driver_param(pci_addr, "enabled", "true") {
+        if err.kind() != std::io::ErrorKind::BrokenPipe {
            log::warn!("driver-manager: failed to write enabled param for {pci_addr}: {err}");
        }
+    }
 }

 pub fn notify_unbind(scheme: &DriverManagerScheme, pci_addr: &str) {
@@ -392,21 +413,31 @@ pub fn notify_unbind(scheme: &DriverManagerScheme, pci_addr: &str) {
    scheme.push_event_line(event_line);

    if let Err(err) = write_driver_param(pci_addr, "driver", "") {
+        if err.kind() != std::io::ErrorKind::BrokenPipe {
            log::warn!("driver-manager: failed to clear driver param for {pci_addr}: {err}");
        }
+    }
    if let Err(err) = write_driver_param(pci_addr, "enabled", "false") {
+        if err.kind() != std::io::ErrorKind::BrokenPipe {
            log::warn!("driver-manager: failed to write disabled param for {pci_addr}: {err}");
        }
+    }
 }

 #[cfg(target_os = "redox")]
-pub fn start_scheme_server(scheme: Arc<DriverManagerScheme>) -> std::result::Result<(), String> {
+pub fn start_scheme_server(scheme: Arc<DriverManagerScheme>) -> std::result::Result<bool, String> {
    let socket = Socket::create()
        .map_err(|err| format!("driver-manager: failed to create scheme socket: {err}"))?;
    let mut server = SchemeServer::new(scheme);

-    register_sync_scheme(&socket, SCHEME_NAME, &mut server)
-        .map_err(|err| format!("driver-manager: failed to register scheme:{SCHEME_NAME}: {err}"))?;
+    if let Err(err) = register_sync_scheme(&socket, SCHEME_NAME, &mut server) {
+        let msg = format!("{err}");
+        if msg.contains("File exists") {
+            log::warn!("driver-manager: scheme:{SCHEME_NAME} already registered (initfs instance active), returning gracefully");
+            return Ok(false);
+        }
+        return Err(format!("driver-manager: failed to register scheme:{SCHEME_NAME}: {err}"));
+    }

    log::info!("driver-manager: registered scheme:{SCHEME_NAME}");

@@ -439,10 +470,10 @@ pub fn start_scheme_server(scheme: Arc<DriverManagerScheme>) -> std::result::Res
        })
        .map_err(|err| format!("driver-manager: failed to spawn scheme server thread: {err}"))?;

-    Ok(())
+    Ok(true)
 }

 #[cfg(not(target_os = "redox"))]
-pub fn start_scheme_server(_scheme: Arc<DriverManagerScheme>) -> std::result::Result<(), String> {
-    Ok(())
+pub fn start_scheme_server(_scheme: Arc<DriverManagerScheme>) -> std::result::Result<bool, String> {
+    Ok(true)
 }
@@ -4,6 +4,7 @@ pub mod acpi;
 pub mod amd_vi;
 pub mod command_buffer;
 pub mod device_table;
+pub mod intel_vtd;
 pub mod interrupt;
 pub mod mmio;
 pub mod page_table;
@@ -12,6 +13,7 @@ use std::collections::BTreeMap;

 use acpi::{parse_bdf, Bdf};
 use amd_vi::AmdViUnit;
+use intel_vtd::IntelVtdUnit;
 use page_table::{DomainPageTables, MappingFlags};
 use redox_scheme::SchemeBlockMut;
 use syscall::data::Stat;
@@ -161,7 +163,8 @@ struct Handle {
 }

 pub struct IommuScheme {
-    units: Vec<AmdViUnit>,
+    amd_units: Vec<AmdViUnit>,
+    intel_units: Vec<IntelVtdUnit>,
    next_id: usize,
    handles: BTreeMap<usize, Handle>,
    domains: BTreeMap<u16, DomainPageTables>,
@@ -170,12 +173,13 @@ pub struct IommuScheme {

 impl IommuScheme {
    pub fn new() -> Self {
-        Self::with_units(Vec::new())
+        Self::with_units(Vec::new(), Vec::new())
    }

-    pub fn with_units(units: Vec<AmdViUnit>) -> Self {
+    pub fn with_units(amd_units: Vec<AmdViUnit>, intel_units: Vec<IntelVtdUnit>) -> Self {
        Self {
-            units,
+            amd_units,
+            intel_units,
            next_id: 0,
            handles: BTreeMap::new(),
            domains: BTreeMap::new(),
@@ -184,7 +188,7 @@ impl IommuScheme {
    }

    pub fn unit_count(&self) -> usize {
-        self.units.len()
+        self.amd_units.len() + self.intel_units.len()
    }

    fn insert_handle(&mut self, kind: HandleKind) -> usize {
@@ -216,40 +220,67 @@ impl IommuScheme {
    }

    fn ensure_unit_initialized(&mut self, unit_index: usize) -> core::result::Result<(), i32> {
-        let Some(unit) = self.units.get_mut(unit_index) else {
-            return Err(ENODEV as i32);
-        };
-
+        if let Some(unit) = self.amd_units.get_mut(unit_index) {
            if unit.initialized() {
                return Ok(());
            }
-
-        unit.init().map_err(|err| {
+            return unit.init().map_err(|err| {
                log::error!(
-                "iommu: failed to initialize unit {} at MMIO {:#x}: {}",
+                    "iommu: failed to initialize AMD-Vi unit {} at MMIO {:#x}: {}",
                    unit_index,
                    unit.info().mmio_base,
                    err
                );
                EIO as i32
-        })
+            });
+        }
+        let intel_index = unit_index.saturating_sub(self.amd_units.len());
+        if let Some(unit) = self.intel_units.get_mut(intel_index) {
+            if unit.initialized() {
+                return Ok(());
+            }
+            return unit.init().map_err(|err| {
+                log::error!(
+                    "iommu: failed to initialize Intel VT-d unit {} at MMIO {:#x}: {}",
+                    unit_index,
+                    unit.info().mmio_base,
+                    err
+                );
+                EIO as i32
+            });
+        }
+        Err(ENODEV as i32)
    }

    fn root_listing(&self) -> Vec<u8> {
        let mut listing = String::from("control\n");
-        for (index, unit) in self.units.iter().enumerate() {
+        for (index, unit) in self.amd_units.iter().enumerate() {
            let state = if unit.initialized() {
                "initialized"
            } else {
                "detected"
            };
            listing.push_str(&format!(
-                "unit/{index} {} mmio={:#x} state={}\n",
+                "unit/{index} {} mmio={:#x} state={} type=amd\n",
                unit.info().iommu_bdf,
                unit.info().mmio_base,
                state
            ));
        }
+        let intel_offset = self.amd_units.len();
+        for (index, unit) in self.intel_units.iter().enumerate() {
+            let state = if unit.initialized() {
+                "initialized"
+            } else {
+                "detected"
+            };
+            listing.push_str(&format!(
+                "unit/{} mmio={:#x} state={} type=intel\n",
+                intel_offset + index,
+                unit.info().mmio_base,
+                state
+            ));
+        }
        for domain_id in self.domains.keys() {
            listing.push_str(&format!("domain/{domain_id}\n"));
        }
@@ -295,19 +326,31 @@ impl IommuScheme {
        requested_unit: Option<usize>,
    ) -> core::result::Result<usize, i32> {
        if let Some(index) = requested_unit {
-            let Some(unit) = self.units.get(index) else {
-                return Err(ENODEV as i32);
-            };
+            if let Some(unit) = self.amd_units.get(index) {
                if unit.handles_device(bdf) {
                    return Ok(index);
                }
                return Err(ENODEV as i32);
            }
+            let intel_index = index.saturating_sub(self.amd_units.len());
+            if let Some(unit) = self.intel_units.get(intel_index) {
+                if unit.handles_device(bdf) {
+                    return Ok(index);
+                }
+            }
+            return Err(ENODEV as i32);
+        }

-        self.units
-            .iter()
-            .position(|unit| unit.handles_device(bdf))
-            .ok_or(ENODEV as i32)
+        if let Some(index) = self.amd_units.iter().position(|unit| unit.handles_device(bdf)) {
+            return Ok(index);
+        }
+        let intel_offset = self.amd_units.len();
+        if let Some(index) = self.intel_units.iter().position(|unit| {
+            unit.handles_device(bdf)
+        }) {
+            return Ok(intel_offset + index);
+        }
+        Err(ENODEV as i32)
    }

    fn dispatch_request(&mut self, kind: HandleKind, request: IommuRequest) -> IommuResponse {
@@ -327,10 +370,11 @@ impl IommuScheme {
        match request.opcode {
            opcode::QUERY => IommuResponse::success(
                request.opcode,
-                self.units.len() as u32,
+                self.unit_count() as u32,
                self.domains.len() as u64,
                self.device_assignments.len() as u64,
-                self.units.iter().filter(|unit| unit.initialized()).count() as u64,
+                self.amd_units.iter().filter(|unit| unit.initialized()).count() as u64
+                    + self.intel_units.iter().filter(|unit| unit.initialized()).count() as u64,
            ),
            opcode::INIT_UNITS => {
                let requested_index = if request.arg0 == u32::MAX {
@@ -341,17 +385,18 @@ impl IommuScheme {

                let mut initialized_now = 0u32;
                let mut attempted = 0u64;
-                for index in 0..self.units.len() {
+                let total_units = self.unit_count();
+                for index in 0..total_units {
                    if requested_index.is_some() && requested_index != Some(index) {
                        continue;
                    }

                    attempted += 1;
-                    let was_initialized = self
-                        .units
-                        .get(index)
-                        .map(|unit| unit.initialized())
-                        .unwrap_or(false);
+                    let was_initialized = if index < self.amd_units.len() {
+                        self.amd_units.get(index).map(|unit| unit.initialized()).unwrap_or(false)
+                    } else {
+                        self.intel_units.get(index - self.amd_units.len()).map(|unit| unit.initialized()).unwrap_or(false)
+                    };

                    if let Err(errno) = self.ensure_unit_initialized(index) {
                        return IommuResponse::error(request.opcode, errno);
@@ -363,7 +408,8 @@ impl IommuScheme {
                }

                let initialized_total =
-                    self.units.iter().filter(|unit| unit.initialized()).count() as u64;
+                    self.amd_units.iter().filter(|unit| unit.initialized()).count() as u64
+                        + self.intel_units.iter().filter(|unit| unit.initialized()).count() as u64;

                IommuResponse::success(
                    request.opcode,
@@ -425,7 +471,7 @@ impl IommuScheme {
                let mut first_device = 0u64;
                let mut first_address = 0u64;

-                for (index, unit) in self.units.iter_mut().enumerate() {
+                for (index, unit) in self.amd_units.iter_mut().enumerate() {
                    if requested_index.is_some() && requested_index != Some(index) {
                        continue;
                    }
@@ -577,10 +623,10 @@ impl IommuScheme {
                    return IommuResponse::error(request.opcode, ENOENT as i32);
                };

-                let Some(unit) = self.units.get_mut(unit_index) else {
+                if unit_index < self.amd_units.len() {
+                    let Some(unit) = self.amd_units.get_mut(unit_index) else {
                        return IommuResponse::error(request.opcode, ENODEV as i32);
                    };
-
                    match unit.assign_device(bdf, domain) {
                        Ok(()) => {
                            self.device_assignments.insert(bdf, (domain_id, unit_index));
@@ -594,13 +640,17 @@ impl IommuScheme {
                        }
                        Err(_) => IommuResponse::error(request.opcode, EIO as i32),
                    }
+                } else {
+                    IommuResponse::error(request.opcode, ENODEV as i32)
+                }
            }
            opcode::UNASSIGN_DEVICE => {
                let Some((domain_id, unit_index)) = self.device_assignments.remove(&bdf) else {
                    return IommuResponse::error(request.opcode, ENOENT as i32);
                };

-                let unit = self.units.get_mut(unit_index);
+                if unit_index < self.amd_units.len() {
+                    let unit = self.amd_units.get_mut(unit_index);
                    if let Some(unit) = unit {
                        if unit.initialized() {
                            if let Err(err) = unit.unassign_device(bdf) {
@@ -611,6 +661,7 @@ impl IommuScheme {
                            }
                        }
                    }
+                }

                IommuResponse::success(
                    request.opcode,
@@ -9,6 +9,7 @@ use std::path::PathBuf;
 use std::process;

 use iommu::amd_vi::AmdViUnit;
+use iommu::intel_vtd::{IntelVtdUnit, parse_dmar};
 #[cfg(target_os = "redox")]
 use iommu::IommuScheme;
 use log::{error, info, LevelFilter, Metadata, Record};
@@ -27,7 +28,8 @@ struct StderrLogger {

 #[cfg_attr(not(target_os = "redox"), allow(dead_code))]
 struct DiscoveryResult {
-    units: Vec<AmdViUnit>,
+    amd_units: Vec<AmdViUnit>,
+    intel_units: Vec<IntelVtdUnit>,
    source: DiscoverySource,
    kernel_acpi_status: &'static str,
    ivrs_path: Option<PathBuf>,
@@ -196,6 +198,17 @@ fn detect_dmar_from_kernel_acpi() -> Result<bool, String> {
    Ok(find_kernel_acpi_table(b"DMAR")?.is_some())
 }

+#[cfg(target_os = "redox")]
+fn detect_intel_units_from_kernel_acpi() -> Result<Vec<IntelVtdUnit>, String> {
+    match find_kernel_acpi_table(b"DMAR")? {
+        Some(table) => {
+            let infos = parse_dmar(&table).map_err(|err| format!("failed to parse DMAR: {err}"))?;
+            Ok(infos.into_iter().map(IntelVtdUnit::from_info).collect())
+        }
+        None => Ok(Vec::new()),
+    }
+}
+
 #[cfg(target_os = "redox")]
 fn discover_units() -> Result<DiscoveryResult, String> {
    let dmar_present = match detect_dmar_from_kernel_acpi() {
@@ -206,9 +219,18 @@ fn discover_units() -> Result<DiscoveryResult, String> {
        }
    };

+    let intel_units = match detect_intel_units_from_kernel_acpi() {
+        Ok(units) => units,
+        Err(err) => {
+            info!("iommu: Intel VT-d discovery unavailable: {err}");
+            Vec::new()
+        }
+    };
+
    match detect_units_from_kernel_acpi() {
        Ok(units) if !units.is_empty() => Ok(DiscoveryResult {
-            units,
+            amd_units: units,
+            intel_units,
            source: DiscoverySource::KernelAcpi,
            kernel_acpi_status: "ok",
            ivrs_path: None,
@@ -222,7 +244,8 @@ fn discover_units() -> Result<DiscoveryResult, String> {
                } else {
                    DiscoverySource::None
                },
-                units,
+                amd_units: units,
+                intel_units,
                kernel_acpi_status: "empty",
                ivrs_path,
                dmar_present,
@@ -237,7 +260,8 @@ fn discover_units() -> Result<DiscoveryResult, String> {
                } else {
                    DiscoverySource::None
                },
-                units,
+                amd_units: units,
+                intel_units,
                kernel_acpi_status: "error",
                ivrs_path,
                dmar_present,
@@ -255,7 +279,8 @@ fn discover_units() -> Result<DiscoveryResult, String> {
        } else {
            DiscoverySource::None
        },
-        units,
+        amd_units: units,
+        intel_units: Vec::new(),
        kernel_acpi_status: "unsupported",
        ivrs_path,
        dmar_present: false,
@@ -265,9 +290,9 @@ fn discover_units() -> Result<DiscoveryResult, String> {
 #[cfg(target_os = "redox")]
 fn run() -> Result<(), String> {
    let discovery = discover_units()?;
-    if discovery.units.is_empty() {
+    if discovery.amd_units.is_empty() && discovery.intel_units.is_empty() {
        info!(
-            "iommu: no AMD-Vi units found (source={}, kernel_acpi_status={}, ivrs_path={})",
+            "iommu: no IOMMU units found (source={}, kernel_acpi_status={}, ivrs_path={})",
            discovery.source.as_str(),
            discovery.kernel_acpi_status,
            discovery
@@ -277,20 +302,35 @@ fn run() -> Result<(), String> {
                .unwrap_or_else(|| "none".to_string())
        );
    } else {
+        if !discovery.amd_units.is_empty() {
            info!(
                "iommu: detected {} AMD-Vi unit(s) via {}",
-            discovery.units.len(),
+                discovery.amd_units.len(),
                discovery.source.as_str()
            );
        }
-    if discovery.dmar_present {
+        if !discovery.intel_units.is_empty() {
            info!(
-            "iommu: detected kernel ACPI DMAR table; Intel VT-d runtime ownership should converge here rather than remain in acpid"
+                "iommu: detected {} Intel VT-d unit(s)",
+                discovery.intel_units.len()
            );
        }
-    for (index, unit) in discovery.units.iter().enumerate() {
+    }
+    if discovery.dmar_present && discovery.intel_units.is_empty() {
        info!(
-            "iommu: discovered unit {} at MMIO {:#x}; initialization is deferred until first use",
+            "iommu: detected kernel ACPI DMAR table but found no usable Intel VT-d (DRHD) units"
+        );
+    }
+    for (index, unit) in discovery.amd_units.iter().enumerate() {
+        info!(
+            "iommu: discovered AMD-Vi unit {} at MMIO {:#x}; initialization is deferred until first use",
+            index,
+            unit.info().mmio_base
+        );
+    }
+    for (index, unit) in discovery.intel_units.iter().enumerate() {
+        info!(
+            "iommu: discovered Intel VT-d unit {} at MMIO {:#x}; initialization is deferred until first use",
            index,
            unit.info().mmio_base
        );
@@ -300,7 +340,7 @@ fn run() -> Result<(), String> {
        Socket::create("iommu").map_err(|e| format!("failed to register iommu scheme: {e}"))?;
    info!("iommu: registered scheme:iommu");

-    let mut scheme = IommuScheme::with_units(discovery.units);
+    let mut scheme = IommuScheme::with_units(discovery.amd_units, discovery.intel_units);

    loop {
        let request = match socket.next_request(SignalBehavior::Restart) {
@@ -338,7 +378,9 @@ fn run() -> Result<(), String> {
 #[cfg(target_os = "redox")]
 fn run_self_test() -> Result<(), String> {
    let discovery = discover_units()?;
-    let mut units = discovery.units;
+    let mut amd_units = discovery.amd_units;
+    let mut intel_units = discovery.intel_units;
+    let total_units = amd_units.len() + intel_units.len();

    println!("discovery_source={}", discovery.source.as_str());
    println!("kernel_acpi_status={}", discovery.kernel_acpi_status);
@@ -351,19 +393,20 @@ fn run_self_test() -> Result<(), String> {
            .map(|path| path.display().to_string())
            .unwrap_or_else(|| "none".to_string())
    );
-    println!("units_detected={}", units.len());
-    if units.is_empty() {
-        return Err("iommu self-test detected zero AMD-Vi unit(s)".to_string());
+    println!("amd_units_detected={}", amd_units.len());
+    println!("intel_units_detected={}", intel_units.len());
+    if total_units == 0 {
+        return Err("iommu self-test detected zero IOMMU units".to_string());
    }

    let mut initialized_now = 0u32;
    let mut events_drained = 0u32;

-    for (index, unit) in units.iter_mut().enumerate() {
+    for (index, unit) in amd_units.iter_mut().enumerate() {
        let was_initialized = unit.initialized();
        unit.init().map_err(|err| {
            format!(
-                "iommu self-test failed to initialize unit {} at MMIO {:#x}: {}",
+                "iommu self-test failed to initialize AMD-Vi unit {} at MMIO {:#x}: {}",
                index,
                unit.info().mmio_base,
                err
@@ -376,7 +419,7 @@ fn run_self_test() -> Result<(), String> {

        let drained = unit.drain_events().map_err(|err| {
            format!(
-                "iommu self-test failed to drain events for unit {} at MMIO {:#x}: {}",
+                "iommu self-test failed to drain events for AMD-Vi unit {} at MMIO {:#x}: {}",
                index,
                unit.info().mmio_base,
                err
@@ -385,9 +428,26 @@ fn run_self_test() -> Result<(), String> {
        events_drained = events_drained.saturating_add(drained.len() as u32);
    }

-    let initialized_after = units.iter().filter(|unit| unit.initialized()).count() as u64;
+    for (index, unit) in intel_units.iter_mut().enumerate() {
+        let was_initialized = unit.initialized();
+        unit.init().map_err(|err| {
+            format!(
+                "iommu self-test failed to initialize Intel VT-d unit {} at MMIO {:#x}: {}",
+                index,
+                unit.info().mmio_base,
+                err
+            )
+        })?;
+
+        if !was_initialized {
+            initialized_now = initialized_now.saturating_add(1);
+        }
+    }
+
+    let initialized_after = amd_units.iter().filter(|unit| unit.initialized()).count() as u64
+        + intel_units.iter().filter(|unit| unit.initialized()).count() as u64;
    println!("units_initialized_now={}", initialized_now);
-    println!("units_attempted={}", units.len());
+    println!("units_attempted={}", total_units);
    println!("units_initialized_after={}", initialized_after);
    println!("events_drained={}", events_drained);

@@ -398,8 +458,9 @@ fn run_self_test() -> Result<(), String> {
 fn run() -> Result<(), String> {
    let discovery = discover_units()?;
    info!(
-        "iommu: host build stub active; parsed {} AMD-Vi unit(s) via {}",
-        discovery.units.len(),
+        "iommu: host build stub active; parsed {} AMD-Vi and {} Intel VT-d unit(s) via {}",
+        discovery.amd_units.len(),
+        discovery.intel_units.len(),
        discovery.source.as_str()
    );
    Ok(())
@@ -47,6 +47,16 @@ fn timespec_to_nanos(time: &TimeSpec) -> i128 {
    i128::from(time.tv_sec) * 1_000_000_000i128 + i128::from(time.tv_nsec)
 }

+fn check_timer_source(name: &str, path: &str) -> &'static str {
+    if Path::new(path).exists() {
+        println!("timer_source={} path={} present=1", name, path);
+        "present"
+    } else {
+        println!("timer_source={} path={} present=0", name, path);
+        "missing"
+    }
+}
+
 fn run() -> Result<(), String> {
    parse_args(PROGRAM, USAGE, std::env::args()).map_err(|err| {
        if err.is_empty() {
@@ -57,6 +67,10 @@ fn run() -> Result<(), String> {

    println!("=== Red Bear OS Timer Runtime Check ===");

+    check_timer_source("hpet", "/scheme/sys/hpet");
+    check_timer_source("pit", "/scheme/sys/pit");
+    check_timer_source("lapic", "/scheme/sys/lapic");
+
    let time_path = monotonic_path()?;

    let time_fd = Fd::open(&time_path, flag::O_RDWR, 0)
@@ -78,6 +92,17 @@ fn run() -> Result<(), String> {
        return Err("monotonic timer did not advance".to_string());
    }

+    let expected_ns: i128 = 50_000_000;
+    let deviation_ns = (delta_ns - expected_ns).abs();
+    println!("monotonic_expected_ns={expected_ns}");
+    println!("monotonic_deviation_ns={deviation_ns}");
+
+    if deviation_ns > 20_000_000 {
+        println!("timer_precision=coarse deviation_ns={deviation_ns} (threshold=20000000)");
+    } else {
+        println!("timer_precision=ok deviation_ns={deviation_ns} (threshold=20000000)");
+    }
+
    println!("monotonic_progress=ok");
    Ok(())
 }
@@ -2859,6 +2859,76 @@ fn collect_health_items(runtime: &Runtime, report: &Report<'_>) -> Vec<HealthIte
        },
    });

+    let thermal_zones = runtime.read_dir_names("/scheme/acpi/thermal").unwrap_or_default();
+    if !thermal_zones.is_empty() {
+        let temps: Vec<f64> = thermal_zones
+            .iter()
+            .filter_map(|zone| {
+                read_trimmed(runtime, &format!("/scheme/acpi/thermal/{zone}/temperature"))
+            })
+            .filter_map(|t| t.parse::<f64>().ok())
+            .collect();
+        let avg_temp = temps.iter().sum::<f64>() / temps.len().max(1) as f64;
+        let state = if avg_temp > 85.0 {
+            HealthState::Critical
+        } else if avg_temp > 70.0 {
+            HealthState::Warning
+        } else {
+            HealthState::Healthy
+        };
+        items.push(HealthItem {
+            label: "Thermal",
+            state,
+            detail: format!("{} zone(s), avg {:.1}°C", thermal_zones.len(), avg_temp),
+        });
+    } else {
+        items.push(HealthItem {
+            label: "Thermal",
+            state: HealthState::Warning,
+            detail: "no thermal zones".to_string(),
+        });
+    }
+
+    let fans = runtime.read_dir_names("/scheme/acpi/fan").unwrap_or_default();
+    if !fans.is_empty() {
+        let active = fans
+            .iter()
+            .filter(|fan| {
+                read_trimmed(runtime, &format!("/scheme/acpi/fan/{fan}/status"))
+                    .map(|s| s == "on")
+                    .unwrap_or(false)
+            })
+            .count();
+        items.push(HealthItem {
+            label: "Fans",
+            state: HealthState::Healthy,
+            detail: format!("{} fan(s), {} active", fans.len(), active),
+        });
+    } else {
+        items.push(HealthItem {
+            label: "Fans",
+            state: HealthState::Warning,
+            detail: "no fan devices".to_string(),
+        });
+    }
+
+    let cstate_policy = read_trimmed(runtime, "/scheme/sys/cstate_policy");
+    let cstates = runtime.read_dir_names("/scheme/acpi/cstates").unwrap_or_default();
+    if !cstates.is_empty() {
+        let max_policy = cstate_policy.as_deref().unwrap_or("unlimited");
+        items.push(HealthItem {
+            label: "C-states",
+            state: HealthState::Healthy,
+            detail: format!("{} processor(s), policy={}", cstates.len(), max_policy),
+        });
+    } else {
+        items.push(HealthItem {
+            label: "C-states",
+            state: HealthState::Warning,
+            detail: "no C-state surface".to_string(),
+        });
+    }
+
    items
 }

@@ -536,7 +536,7 @@ fn monitor_loop(shared: Arc<RwLock<ThermalState>>) -> ! {
    loop {
        if !Path::new(ACPI_THERMAL_ROOT).exists() {
            if !warned_missing_surface {
-                warn!(
+                log::info!(
                    "{} is unavailable; thermald will keep polling and serve an empty thermal surface",
                    ACPI_THERMAL_ROOT,
                );
@@ -129,13 +129,22 @@ fn main() {
    let scheme = Arc::new(Mutex::new(scheme));
    let scheme_clone = Arc::clone(&scheme);
    thread::spawn(move || {
+        let mut last_count = 0usize;
        loop {
            thread::sleep(Duration::from_secs(2));
            if let Ok(mut s) = scheme_clone.lock() {
                match s.scan_pci_devices() {
-                    Ok(n) if n > 0 => info!("udev-shim: hotplug detected {} device(s)", n),
+                    Ok(n) => {
+                        if n != last_count {
+                            if n > last_count {
+                                info!("udev-shim: hotplug detected {} device(s) (total {})", n - last_count, n);
+                            } else {
+                                info!("udev-shim: device removal detected, {} device(s) remaining", n);
+                            }
+                            last_count = n;
+                        }
+                    }
                    Err(e) => error!("udev-shim: hotplug scan failed: {}", e),
-                    _ => {}
                }
            }
        }
@@ -2,6 +2,8 @@ use std::fs;
 use std::io;
 use std::os::unix::fs::symlink;
 use std::path::Path;
+use std::thread;
+use std::time::Duration;

 const DEFAULT_UDEV_RULES: &str = r#"# Network interface naming
 SUBSYSTEM=="net", KERNEL=="enp*", NAME="$kernel"
@@ -74,8 +76,26 @@ pub fn write_default_rules_file() -> io::Result<&'static str> {
    fs::create_dir_all(dir)?;

    let path = dir.join("50-default.rules");
-    fs::write(&path, default_udev_rules())?;
-    Ok("/etc/udev/rules.d/50-default.rules")
+    let contents = default_udev_rules();
+
+    if fs::metadata(&path).is_ok() {
+        let _ = fs::remove_file(&path);
+    }
+
+    for attempt in 0..3 {
+        match fs::write(&path, contents) {
+            Ok(()) => return Ok("/etc/udev/rules.d/50-default.rules"),
+            Err(e) if e.kind() == io::ErrorKind::BrokenPipe && attempt < 2 => {
+                thread::sleep(Duration::from_millis(50));
+            }
+            Err(e) if e.kind() == io::ErrorKind::AlreadyExists => {
+                return Ok("/etc/udev/rules.d/50-default.rules");
+            }
+            Err(e) => return Err(e),
+        }
+    }
+
+    unreachable!("write_default_rules_file loop always returns or errors")
 }

 fn parse_hex_byte(value: &str) -> Option<u8> {
@@ -3,7 +3,7 @@ diff --git a/src/subshell/common.c b/src/subshell/common.c
 +++ b/src/subshell/common.c
@@ -95,6 +95,45 @@
 #endif
- #endif
+ #endif /* HAVE_OPENPTY */
 
 +#ifdef __redox__
 +static int
@@ -6,8 +6,11 @@ template = "custom"
 dependencies = [
  "redoxfs",
  "ion",
+  "driver-manager",
 ]
 script = """
+set -eo pipefail
+
 BINS=(
    init
    logd
@@ -23,7 +26,6 @@ BINS=(
    lived
    nvmed
    pcid
-    pcid-spawner
    rtcd
    vesad
 )
@@ -71,8 +73,8 @@ mkdir -p "${COOKBOOK_BUILD}/initfs/lib/init.d"

 cp "${COOKBOOK_SOURCE}/init.initfs.d"/* "${COOKBOOK_BUILD}/initfs/lib/init.d/"

-mkdir -pv "${COOKBOOK_BUILD}/initfs/lib/pcid.d"
-cp -v "${COOKBOOK_SOURCE}/drivers/initfs.toml" "${COOKBOOK_BUILD}/initfs/lib/pcid.d/initfs.toml"
+mkdir -pv "${COOKBOOK_BUILD}/initfs/lib/drivers.d"
+cp -v "${COOKBOOK_SOURCE}/drivers/initfs-storage.toml" "${COOKBOOK_BUILD}/initfs/lib/drivers.d/00-storage.toml"

 export CARGO_PROFILE_RELEASE_OPT_LEVEL=s
 export CARGO_PROFILE_RELEASE_PANIC=abort
@@ -85,7 +87,7 @@ mkdir -pv "${COOKBOOK_BUILD}/initfs/bin" "${COOKBOOK_BUILD}/initfs/lib/drivers"
 for bin in "${BINS[@]}"
 do
    case "${bin}" in
-      init | logd | ramfs | randd | zerod | pcid | pcid-spawner | fbbootlogd | fbcond | inputd | vesad | lived | ps2d | acpid | bcm2835-sdhcid | rtcd | hwd)
+      init | logd | ramfs | randd | zerod | fbbootlogd | fbcond | inputd | vesad | lived | ps2d | acpid | bcm2835-sdhcid | rtcd | hwd | pcid)
        cp -v "target/${TARGET}/${build_type}/${bin}" "${COOKBOOK_BUILD}/initfs/bin"
        ;;
      *)
@@ -96,6 +98,7 @@ done

 cp "${COOKBOOK_SYSROOT}/usr/bin/redoxfs" "${COOKBOOK_BUILD}/initfs/bin"
 cp "${COOKBOOK_SYSROOT}/usr/bin/ion" "${COOKBOOK_BUILD}/initfs/bin"
+cp "${COOKBOOK_SYSROOT}/usr/bin/driver-manager" "${COOKBOOK_BUILD}/initfs/bin"

 ARCH="$(echo "${GNU_TARGET}" | cut -d - -f1)"
 RUSTFLAGS="$RUSTFLAGS -Ctarget-feature=+crt-static -Clink-arg=-nostartfiles -Clink-arg=-nostdlib" cargo \
@@ -4,8 +4,188 @@ rev = "463f76b9608a896e6f6c9f63457f57f6409873c7"
 patches = [
    "P0-daemon-fix-init-notify-unwrap.patch",
    "P0-workspace-add-bootstrap.patch",
-    "P0-bootstrap-workspace-fix.patch",
+    "P0-init-continuous-scheduling.patch",
+    "P0-dhcpd-auto-iface.patch",
+    "P0-procmgr-sigchld-debug.patch",
+    "P0-pcid-mcfg-diagnostics.patch",
+    "P0-ihdgd-intel-gpu-ids.patch",
+    "P0-acpid-dmar-fix.patch",
+    # P1: acpid EC runtime and AML physmem hardening (narrow ACPI runtime patches)
+    "P1-acpid-ec-runtime.patch",
+    "P1-acpid-runtime-hardening.patch",
+    # Stale patches needing recreation: P1-pcid-uevent-surface, P2-boot-runtime-fixes,
+    # P2-hwd-misc, P2-pcid-cfg-access, P3-xhci-device-hardening, P6-cpufreqd-real-impl
    "P2-i2c-gpio-ucsi-drivers.patch",
+    "P0-i2c-control-response-empty.patch",
+    "P2-ihdad-graceful-init.patch",
+    "P2-boot-logging.patch",
+    "P2-init-acpid-wiring.patch",
+    "P2-hwd-remove-acpid-spawn.patch",
+    "P2-initfs-pcid-service.patch",
+    "P2-misc-daemon-fixes.patch",
+    "P9-fix-so-pecred.patch",
+    "P3-inputd-keymap-bridge.patch",
+    # P3: ps2d consolidated — LED feedback, mouse resend, fastfail, Intellimouse2, controller init robustness, non-x86 fallback
+    "P7-ps2d-intellimouse2-leds-controller-init.patch",
+    "P3-usbhidd-hardening.patch",
+    "P3-init-colored-output.patch",
+    "P4-logd-persistent-logging.patch",
+    "P4-acpi-shutdown-hardening.patch",
+    "P4-acpi-s3-sleep.patch",
+    "P4-pcid-public-client-channel.patch",
+    "P4-pcid-config-scheme.patch",
+    "P4-pcid-spawner-pci-coordinate-env.patch",
+    "P4-initfs-usb-drm-services.patch",
+    "P4-initfs-release-virtio-gpu.patch",
+    "P4-initfs-network-services.patch",
+    "P4-initfs-getty-services.patch",
+    "P4-initfs-dbus-services.patch",
+    "P4-fbcond-scrollback.patch",
+    "P4-ucsid-estale-graceful.patch",
+    "P4-acpi-estale-graceful.patch",
+    "P4-hwd-estale-graceful.patch",
+    # P5-i2c-hidd-estale-retry: REDUNDANT — ESTALE retry already provided by P2 + P4-acpi-estale
+    "P5-acpid-dmi-endpoint.patch",
+    "P4-thermal-daemon.patch",
+    "P4-thermald-workspace.patch",
+    "P6-driver-main-fixes.patch",
+    "P6-driver-new-modules.patch",
+    "P9-init-scheduler-completed.patch",
+    "P6-init-requires-hard-dep.patch",
+    "P2-pcid-acpid-graceful-fd.patch",
+    "P5-fbbootlogd-fbcond-graceful-drm.patch",
+    "P7-acpid-shared-pcifd.patch",
+    "P6-rtcd-no-ocreat.patch",
+    "P6-pcid-acpid-fd-transfer.patch",
+    "P15-7-init-service-timeout.patch",
+    # P15-8-init-cycle-detection: REDUNDANT — cycle detection already included in P6-init-requires-hard-dep
+    "P18-1-daemon-restart.patch",
+    "P18-3-msi-msix-enablement.patch",
+    "P18-5-acpid-robustness.patch",
+    "P18-8-bounded-ipcd-queues.patch",
+    "P18-9-msi-allocation-resilience.patch",
+    "P19-init-startup-hardening.patch",
+    "P19-acpid-startup-hardening.patch",
+    "P20-ramfs-requires-randd.patch",
+    "P21-boot-daemon-graceful-panic.patch",
+    "P23-rootfs-hard-dep-on-drivers.patch",
+    "P24-acpi-s5-derivation-shutdown-semantics.patch",
+    "P25-fbcond-vesa-fallback.patch",
+    "P26-driver-manager-initfs-conversion.patch",
+    "P27-fbcond-borrow-fix.patch",
+    "P28-init-skip-unmet-conditions.patch",
+    "P30-acpid-graceful-scheme-exists.patch",
+    "P31-xhcid-restore-interrupts.patch",
+    "P32-acpid-graceful-boot.patch",
+    "P33-vesad-graceful-boot.patch",
+    "P34-fbcond-fbbootlogd-env.patch",
+    "P35-fbcond-fbbootlogd-init.patch",
+    "P36-graphics-scheme-graceful-init.patch",
+    "P37-smolnetd-ready-after-init.patch",
+    "P38-vesad-eventqueue-deadlock.patch",
+    "P39-pci-allocate-interrupt-vector-graceful.patch",
+    "P40-bar-rs-graceful.patch",
+    "P41-common-init-graceful.patch",
+    "P42-inputd-graceful-fallback.patch",
+    "P43-dhcpd-requires-hard-dep.patch",
+    "P44-acpid-thermal-zones.patch",
+    # P54: Add missing thermal.rs module for P44
+    "P54-acpid-thermal-module.patch",
+    # P45: Migrate e1000d and ixgbed to MSI-X via pci_allocate_interrupt_vector
+    "P45-net-msix-adoption.patch",
+    # P46: Migrate ahcid and ac97d to MSI-X via pci_allocate_interrupt_vector
+    "P46-storage-audio-msix.patch",
+    # P46b: Fix ac97d mutable borrow of pcid_handle (required by pci_allocate_interrupt_vector)
+    "P46b-ac97d-mutable-fix.patch",
+    # P47: Update thermald to read from P44 thermal zones and coretempd
+    "P47-thermald-backend.patch",
+    # P48: Add ACPI fan device discovery and status exposure
+    "P48-acpid-fan-support.patch",
+    # P49: Add IRQ affinity logging and CPU tracking to pcid
+    "P49-irq-affinity-logging.patch",
+    # P50: Add structured logging rate limiter and thermald integration
+    "P50-structured-logging.patch",
+    # P51: Add per-service log files and size-based rotation to logd
+    "P51-logd-rotation.patch",
+    # P52: Add ACPI C-state discovery and thermal-based C-state policy
+    "P52-acpid-cstates.patch",
+    # P53: Add e1000d interrupt throttling rate (ITR) coalescing
+    "P53-e1000d-itr-coalescing.patch",
+    # P55: Add JSON structured log format option to logd
+    "P55-logd-json-format.patch",
+]
+
+[package]
+installs = [
+    "/lib/pcid.d/ac97d.toml",
+    "/lib/pcid.d/e1000d.toml",
+    "/lib/pcid.d/ihdad.toml",
+    "/lib/pcid.d/ihdgd.toml",
+    "/lib/pcid.d/ixgbed.toml",
+    "/lib/pcid.d/rtl8139d.toml",
+    "/lib/pcid.d/rtl8168d.toml",
+    "/lib/pcid.d/vboxd.toml",
+    "/lib/pcid.d/virtio-netd.toml",
+    "/lib/pcid.d/xhcid.toml",
+    "/usr/bin/audiod",
+    "/usr/bin/dhcpd",
+    "/usr/bin/dw-acpi-i2cd",
+    "/usr/bin/gpiod",
+    "/usr/bin/i2cd",
+    "/usr/bin/i2c-gpio-expanderd",
+    "/usr/bin/i2c-hidd",
+    "/usr/bin/inputd",
+    "/usr/bin/intel-gpiod",
+    "/usr/bin/ipcd",
+    "/usr/bin/netstack",
+    "/usr/bin/pcid",
+    "/usr/bin/pcid-spawner",
+    "/usr/bin/ptyd",
+    "/usr/bin/redoxerd",
+    "/usr/bin/smolnetd",
+    "/usr/bin/ucsid",
+    "/usr/lib/drivers/ac97d",
+    "/usr/lib/drivers/ahcid",
+    "/usr/lib/drivers/amd-mp2-i2cd",
+    "/usr/lib/drivers/e1000d",
+    "/usr/lib/drivers/ihdad",
+    "/usr/lib/drivers/ihdgd",
+    "/usr/lib/drivers/ided",
+    "/usr/lib/drivers/intel-lpss-i2cd",
+    "/usr/lib/drivers/intel-thc-hidd",
+    "/usr/lib/drivers/ixgbed",
+    "/usr/lib/drivers/ps2d",
+    "/usr/lib/drivers/rtl8139d",
+    "/usr/lib/drivers/rtl8168d",
+    "/usr/lib/drivers/sb16d",
+    "/usr/lib/drivers/thermald",
+    "/usr/lib/drivers/usbctl",
+    "/usr/lib/drivers/usbhidd",
+    "/usr/lib/drivers/usbhubd",
+    "/usr/lib/drivers/usbscsid",
+    "/usr/lib/drivers/vboxd",
+    "/usr/lib/drivers/virtio-gpud",
+    "/usr/lib/drivers/virtio-netd",
+    "/usr/lib/drivers/xhcid",
+    "/usr/lib/init.d/00_base.target",
+    "/usr/lib/init.d/00_ipcd.service",
+    "/usr/lib/init.d/00_pcid-spawner.service",
+    "/usr/lib/init.d/00_ptyd.service",
+    "/usr/lib/init.d/00_sudo.service",
+    "/usr/lib/init.d/00_tmp",
+    "/usr/lib/init.d/05_boot_essential.target",
+    "/usr/lib/init.d/10_dhcpd.service",
+    "/usr/lib/init.d/10_net.target",
+    "/usr/lib/init.d/10_smolnetd.service",
+    "/usr/lib/init.d/12_boot_late.target",
+    "/usr/lib/init.d/12_dbus.service",
+    "/usr/lib/init.d/13_seatd.service",
+    "/usr/lib/init.d/13_sessiond.service",
+    "/usr/lib/init.d/20_audiod.service",
+    "/usr/lib/init.d/29_activate_console.service",
+    "/usr/lib/init.d/30_console.service",
+    "/usr/lib/init.d/30_thermald.service",
+    "/usr/lib/init.d/31_debug_console.service",
 ]

 [build]
@@ -49,10 +229,13 @@ BINS=(
    ixgbed
    pcid
    pcid-spawner
+    acpid
+    redoxerd
    rtl8139d
    rtl8168d
    usbctl
    usbhidd
+    thermald
    usbhubd
    ucsid
    usbscsid
@@ -61,14 +244,13 @@ BINS=(
    xhcid
    i2cd
    inputd
-    redoxerd
 )

 # Add additional drivers to the list to build, that are not in drivers-initfs
 # depending on the target architecture
 case "${TARGET}" in
    i586-unknown-redox | i686-unknown-redox | x86_64-unknown-redox)
-        BINS+=(ac97d sb16d vboxd)
+        BINS+=(ac97d ahcid ided nvmed ps2d sb16d vboxd)
        ;;
    *)
        ;;
@@ -92,7 +274,7 @@ done
    $(for bin in "${EXISTING_BINS[@]}"; do echo "-p" "${bin}"; done)
 for bin in "${EXISTING_BINS[@]}"
 do
-    if [[ "${bin}" == "gpiod" || "${bin}" == "i2c-gpio-expanderd" || "${bin}" == "intel-gpiod" || "${bin}" == "i2cd" || "${bin}" == "dw-acpi-i2cd" || "${bin}" == "i2c-hidd" || "${bin}" == "inputd" || "${bin}" == "pcid" || "${bin}" == "pcid-spawner" || "${bin}" == "redoxerd" || "${bin}" == "ucsid" ]]; then
+    if [[ "${bin}" == "gpiod" || "${bin}" == "i2c-gpio-expanderd" || "${bin}" == "intel-gpiod" || "${bin}" == "i2cd" || "${bin}" == "dw-acpi-i2cd" || "${bin}" == "acpid" || "${bin}" == "thermald" || "${bin}" == "i2c-hidd" || "${bin}" == "inputd" || "${bin}" == "pcid" || "${bin}" == "pcid-spawner" || "${bin}" == "redoxerd" || "${bin}" == "ucsid" ]]; then
        cp -v "target/${TARGET}/${build_type}/${bin}" "${COOKBOOK_STAGE}/usr/bin"
    else
        cp -v "target/${TARGET}/${build_type}/${bin}" "${COOKBOOK_STAGE}/usr/lib/drivers"
@@ -12,6 +12,41 @@ patches = [
    "../../../local/patches/kernel/P1-ioapic-hpet-nmi-v2.patch",
    "../../../local/patches/kernel/P9-numa-topology.patch",
    "../../../local/patches/kernel/P9-proc-lock-ordering.patch",
+    "../../../local/patches/kernel/P9-percpu-context-switch.patch",
+    "../../../local/patches/kernel/P9-broadcast-tlb-shootdown.patch",
+    "../../../local/patches/kernel/P9-ioapic-irq-affinity.patch",
+    "../../../local/patches/kernel/P10-irq-affinity-wiring.patch",
+    "../../../local/patches/kernel/P11-mcs-lock.patch",
+    "../../../local/patches/kernel/P12-range-tlb-flush.patch",
+    "../../../local/patches/kernel/P13-priority-inheritance.patch",
+    "../../../local/patches/kernel/P14-numa-topology.patch",
+    "../../../local/patches/kernel/P15-1-ap-cpu-id-race.patch",
+    "../../../local/patches/kernel/P15-4-mcs-pi-ordering.patch",
+    "../../../local/patches/kernel/P15-10-tlb-range-ordering.patch",
+    "../../../local/patches/kernel/P16-3-max-cpu-256.patch",
+    "../../../local/patches/kernel/P16-1-sipi-timing.patch",
+    "../../../local/patches/kernel/P16-4a-sdt-checksum.patch",
+    "../../../local/patches/kernel/P16-4b-madt-validation.patch",
+    "../../../local/patches/kernel/P17-2a-percpu-waiting.patch",
+    "../../../local/patches/kernel/P17-2b-transitive-pi.patch",
+    "../../../local/patches/kernel/P17-4-configurable-preempt.patch",
+    "../../../local/patches/kernel/P17-1-numa-selection.patch",
+    "../../../local/patches/kernel/P17-3-sched-affinity.patch",
+    "../../../local/patches/kernel/P17-3-syscall-dispatch.patch",
+    "../../../local/patches/kernel/P19-2-irq-debug.patch",
+    # P20: x2APIC ICR mode fix (32-bit dest field for x2APIC, 8-bit for xAPIC)
+    "../../../local/patches/kernel/P20-x2apic-icr-mode-fix.patch",
+    # P21: x2APIC SMP bring-up fix — skip 8-bit LocalApic entries when x2APIC
+    # is active (BSP ID mismatch causes all APs to be skipped on bare metal Intel)
+    "../../../local/patches/kernel/P21-x2apic-smp-fix.patch",
+    # P22: x2APIC MADT fallback — when x2APIC is active but MADT has no
+    # LocalX2Apic entries (QEMU, some BIOS), fall back to processing LocalApic
+    # entries with zero-extended IDs using x2APIC 64-bit ICR format
+    "../../../local/patches/kernel/P22-x2apic-madt-fallback.patch",
+    # P23: sys:msr scheme — kernel MSR read/write via /scheme/sys/msr/<cpu>/<msr>
+    "../../../local/patches/kernel/P23-sys-msr-scheme.patch",
+    # P25: Comprehensive cpuidle framework with deep C-states (C1-C7)
+    "../../../local/patches/kernel/P25-cpuidle-deep-cstates.patch",
 ]

 [build]
@@ -3,6 +3,8 @@ use core::{
    sync::atomic::{AtomicU8, Ordering},
 };

+use x86::time::rdtsc;
+
 use crate::{
    arch::{
        device::local_apic::the_local_apic,
@@ -18,10 +20,95 @@ use crate::{

 use super::{Madt, MadtEntry};

+use alloc::collections::BTreeSet;
+use alloc::vec::Vec;
+
+/// Maximum number of APIC→CPU mappings we track for NUMA topology.
+const MAX_APIC_MAPPINGS: usize = 256;
+
+struct ApicMapping {
+    apic_id: u32,
+    cpu_id: LogicalCpuId,
+}
+
+const UNINIT_MAPPING: ApicMapping = ApicMapping { apic_id: u32::MAX, cpu_id: LogicalCpuId::new(0) };
+
+static mut APIC_MAPPINGS: [ApicMapping; MAX_APIC_MAPPINGS] = [UNINIT_MAPPING; MAX_APIC_MAPPINGS];
+static mut APIC_MAPPING_COUNT: usize = 0;
+
+unsafe fn record_apic_mapping(apic_id: u32, cpu_id: LogicalCpuId) {
+    let count = APIC_MAPPING_COUNT;
+    if count < MAX_APIC_MAPPINGS {
+        APIC_MAPPINGS[count] = ApicMapping { apic_id, cpu_id };
+        APIC_MAPPING_COUNT = count + 1;
+    }
+}
+
 const AP_SPIN_LIMIT: u32 = 1_000_000;
 const TRAMPOLINE: usize = 0x8000;
 static TRAMPOLINE_DATA: &[u8] = include_bytes!(concat!(env!("OUT_DIR"), "/trampoline"));

+/// Estimate TSC frequency in MHz from CPUID.
+///
+/// Tries CPUID leaf 0x16 (Processor Frequency Information) first,
+/// then CPUID leaf 0x15 (TSC/Core Crystal Clock Ratio).
+/// Returns None if frequency cannot be determined.
+fn tsc_freq_mhz_cpuid() -> Option<u64> {
+    let max_leaf = unsafe { core::arch::x86_64::__cpuid(0).eax as u32 };
+
+    // CPUID leaf 0x16: EAX = Core Base Frequency in MHz (Intel)
+    if max_leaf >= 0x16 {
+        let mhz = unsafe { core::arch::x86_64::__cpuid(0x16) }.eax as u64;
+        if mhz > 0 {
+            return Some(mhz);
+        }
+    }
+
+    // CPUID leaf 0x15: EAX = denominator, EBX = numerator, ECX = crystal Hz
+    if max_leaf >= 0x15 {
+        let res = unsafe { core::arch::x86_64::__cpuid(0x15) };
+        let denom = res.eax as u64;
+        let numer = res.ebx as u64;
+        let crystal_hz = res.ecx as u64;
+        if denom > 0 && numer > 0 && crystal_hz > 0 {
+            // TSC freq = crystal_hz * numer / denom
+            let tsc_hz = crystal_hz * numer / denom;
+            return Some(tsc_hz / 1_000_000); // Hz → MHz
+        }
+    }
+
+    None
+}
+
+/// Early-boot microsecond delay using the Time Stamp Counter.
+///
+/// Uses CPUID-based TSC frequency estimation when available.
+/// Falls back to a conservative spin loop calibrated for the
+/// minimum expected CPU speed (1 GHz).
+///
+/// # Safety
+/// Must only be called after the BSP TSC is running (always true
+/// after CPU reset on x86).
+fn early_udelay(us: u64) {
+    if let Some(mhz) = tsc_freq_mhz_cpuid() {
+        // TSC-based delay: precise on invariant TSC (all modern x86).
+        // MHz = cycles per µs.
+        let target = unsafe { rdtsc() } + us * mhz;
+        while unsafe { rdtsc() } < target {
+            hint::spin_loop();
+        }
+    } else {
+        // Fallback: conservative spin loop.
+        // spin_loop() (PAUSE) is ~40 cycles on modern Intel, ~1 on AMD.
+        // At 1 GHz minimum: 1000 cycles/µs ÷ 40 cycles/iter = 25 iters/µs.
+        // Use 50 iters/µs for safety margin on slower/variable CPUs.
+        let iters = us.saturating_mul(50);
+        for _ in 0..iters {
+            hint::spin_loop();
+        }
+    }
+}
+
 fn current_x2apic_processor_uid(madt: &Madt, apic_id: u32) -> Option<u32> {
    madt.iter().find_map(|entry| match entry {
        MadtEntry::LocalX2Apic(x2apic) if x2apic.x2apic_id == apic_id => Some(x2apic.processor_uid),
@@ -61,6 +148,10 @@ pub(super) fn init(madt: Madt) {
    }

    if cfg!(not(feature = "multi_core")) {
+        unsafe {
+            record_apic_mapping(me.get(), LogicalCpuId::new(0));
+        }
+        crate::numa::init_default();
        return;
    }

@@ -94,22 +185,225 @@ pub(super) fn init(madt: Madt) {
        }
    }

+    // Detect whether MADT contains any LocalX2Apic entries.
+    // Some firmware (notably QEMU and some older BIOS) provides only 8-bit
+    // LocalApic entries even when the CPU supports x2APIC. In that case we must
+    // fall back to processing LocalApic entries with zero-extended IDs.
+    let has_x2apic_entries = madt.iter().any(|e| matches!(e, MadtEntry::LocalX2Apic(_)));
+    let x2apic_fallback = local_apic.x2 && !has_x2apic_entries;
+    if x2apic_fallback {
+        warn!("MADT: x2APIC mode active but no LocalX2Apic entries found; falling back to LocalApic entries with zero-extended IDs");
+    }
+
    unsafe {
        let preliminary_cpu_count = madt
            .iter()
            .filter(|entry| match entry {
-                MadtEntry::LocalApic(local) => u32::from(local.id) == me.get() || local.flags & 1 == 1,
-                MadtEntry::LocalX2Apic(local) => local.x2apic_id == me.get() || local.flags & 1 == 1,
+                // When x2APIC is active, LocalApic entries use 8-bit IDs that don't
+                // match the BSP's 32-bit x2APIC ID. Use LocalX2Apic entries instead.
+                MadtEntry::LocalApic(local) if !local_apic.x2 => {
+                    u32::from(local.id) == me.get() || local.flags & 1 == 1
+                }
+                MadtEntry::LocalApic(local) if local_apic.x2 && x2apic_fallback => {
+                    u32::from(local.id) == me.get() || local.flags & 1 == 1
+                }
+                MadtEntry::LocalApic(_) => false,
+                // LocalX2Apic entries count only in x2APIC mode. In xAPIC mode the
+                // 8-bit ICR destination cannot address 32-bit IDs, so they are skipped.
+                MadtEntry::LocalX2Apic(local) if local_apic.x2 => {
+                    local.x2apic_id == me.get() || local.flags & 1 == 1
+                }
+                MadtEntry::LocalX2Apic(_) => false,
                _ => false,
            })
            .count();
        crate::profiling::allocate(preliminary_cpu_count as u32);
    }

+    // Firmware bug detection: check for duplicate APIC IDs in MADT.
+    // Some firmware (especially on early BIOS/UEFI) may list the same
+    // processor multiple times. Keep first occurrence, warn on duplicates.
+    let mut seen_apic_ids: BTreeSet<u32> = BTreeSet::new();
+    {
+        let _ = seen_apic_ids.insert(me.get()); // BSP
+        for entry in madt.iter() {
+            match entry {
+                MadtEntry::LocalApic(local) if local.flags & 1 == 1 && !local_apic.x2 => {
+                    let id = u32::from(local.id);
+                    if !seen_apic_ids.insert(id) {
+                        warn!("MADT: duplicate APIC ID {} in LocalApic entry, firmware bug", id);
+                    }
+                }
+                MadtEntry::LocalApic(local) if local.flags & 1 == 1 && local_apic.x2 => {
+                    if x2apic_fallback {
+                        let id = u32::from(local.id);
+                        if !seen_apic_ids.insert(id) {
+                            warn!("MADT: duplicate APIC ID {} in LocalApic entry (x2APIC fallback), firmware bug", id);
+                        }
+                    } else {
+                        debug!("MADT: ignoring 8-bit LocalApic ID {} in x2APIC mode", local.id);
+                    }
+                }
+                MadtEntry::LocalX2Apic(local) if local.flags & 1 == 1 && local_apic.x2 => {
+                    let id = local.x2apic_id;
+                    if !seen_apic_ids.insert(id) {
+                        warn!("MADT: duplicate x2APIC ID {} in LocalX2Apic entry, firmware bug", id);
+                    }
+                }
+                MadtEntry::LocalX2Apic(local) if local.flags & 1 == 1 && !local_apic.x2 => {
+                    // xAPIC mode: skip 32-bit x2APIC IDs; dedup only among LocalApic entries.
+                    let id = local.x2apic_id; // Copy from packed struct
+                    debug!("MADT: ignoring 32-bit x2APIC ID {} in xAPIC mode", id);
+                }
+                _ => {}
+            }
+        }
+    }
+
    for madt_entry in madt.iter() {
        debug!("      {:x?}", madt_entry);
        if let MadtEntry::LocalApic(ap_local_apic) = madt_entry {
-            if u32::from(ap_local_apic.id) == me.get() {
+            // x2APIC mode: LocalApic entries have 8-bit IDs that don't match
+            // the BSP's 32-bit x2APIC ID. All entries would be treated as APs,
+            // and SIPI would target the wrong processors. Skip them and rely
+            // on LocalX2Apic entries exclusively.
+            if local_apic.x2 && !x2apic_fallback {
+                debug!(
+                    "        Skipping 8-bit LocalApic id={} (x2APIC active, using LocalX2Apic entries)",
+                    ap_local_apic.id
+                );
+            } else if local_apic.x2 && x2apic_fallback {
+                let apic_id = u32::from(ap_local_apic.id);
+                if apic_id == me.get() {
+                    debug!("        This is my local APIC (x2APIC fallback, id={})", apic_id);
+                } else if ap_local_apic.flags & 1 == 1 {
+                    let alloc = match allocate_p2frame(4) {
+                        Some(frame) => frame,
+                        None => {
+                            println!("KERNEL AP: CPU {} no memory for stack, skipping", apic_id);
+                            continue;
+                        }
+                    };
+                    let stack_start = RmmA::phys_to_virt(alloc.base()).data();
+                    let stack_end = stack_start + (PAGE_SIZE << 4);
+
+                    let cpu_id = LogicalCpuId::new(crate::CPU_COUNT.fetch_add(1, Ordering::SeqCst));
+                    if cpu_id.get() >= crate::cpu_set::MAX_CPU_COUNT {
+                        println!(
+                            "KERNEL AP: CPU {} exceeds logical CPU limit, skipping",
+                            apic_id
+                        );
+                        continue;
+                    }
+
+                    let pcr_ptr = crate::arch::gdt::allocate_and_init_pcr(cpu_id, stack_end);
+                    let idt_ptr = crate::arch::idt::allocate_and_init_idt(cpu_id);
+
+                    let args = KernelArgsAp {
+                        stack_end: stack_end as *mut u8,
+                        cpu_id,
+                        pcr_ptr,
+                        idt_ptr,
+                    };
+
+                    let ap_ready = (TRAMPOLINE + 8) as *mut u64;
+                    let ap_args_ptr = unsafe { ap_ready.add(1) };
+                    let ap_page_table = unsafe { ap_ready.add(2) };
+                    let ap_code = unsafe { ap_ready.add(3) };
+
+                    unsafe {
+                        ap_ready.write(0);
+                        ap_args_ptr.write(&args as *const _ as u64);
+                        ap_page_table.write(page_table_physaddr as u64);
+                        #[expect(clippy::fn_to_numeric_cast)]
+                        ap_code.write(kstart_ap as u64);
+
+                        core::sync::atomic::fence(Ordering::SeqCst);
+                    };
+                    AP_READY.store(false, Ordering::SeqCst);
+
+                    // Clear APIC Error Status Register before starting AP.
+                    unsafe { local_apic.esr(); }
+
+                    // Send INIT IPI (Assert) — x2APIC uses 64-bit ICR format.
+                    {
+                        let mut icr = 0x4500u64;
+                        icr |= u64::from(apic_id) << 32;
+                        local_apic.set_icr(icr);
+                    }
+
+                    // Intel SDM Vol 3A §8.4.4: wait 10ms after the INIT IPI
+                    early_udelay(10_000);
+
+                    // Send START IPI #1
+                    {
+                        let ap_segment = (TRAMPOLINE >> 12) & 0xFF;
+                        let mut icr = 0x0600 | ap_segment as u64;
+                        icr |= u64::from(apic_id) << 32;
+                        local_apic.set_icr(icr);
+                    }
+
+                    early_udelay(200);
+
+                    // Send START IPI #2 (recommended for compatibility)
+                    {
+                        let ap_segment = (TRAMPOLINE >> 12) & 0xFF;
+                        let mut icr = 0x0600 | ap_segment as u64;
+                        icr |= u64::from(apic_id) << 32;
+                        local_apic.set_icr(icr);
+                    }
+
+                    early_udelay(200);
+
+                    // Check ESR for delivery errors after SIPI sequence.
+                    let esr_val = unsafe { local_apic.esr() };
+                    if esr_val != 0 {
+                        println!(
+                            "KERNEL AP: CPU {} SIPI delivery error (ESR={:#x}), continuing",
+                            apic_id, esr_val
+                        );
+                    }
+
+                    let mut trampoline_ready = false;
+                    for _ in 0..AP_SPIN_LIMIT {
+                        if unsafe { (*ap_ready.cast::<AtomicU8>()).load(Ordering::SeqCst) } != 0 {
+                            trampoline_ready = true;
+                            break;
+                        }
+                        hint::spin_loop();
+                    }
+                    if !trampoline_ready {
+                        println!("KERNEL AP: CPU {} trampoline timeout, skipping", apic_id);
+                        continue;
+                    }
+
+                    let mut kernel_ready = false;
+                    for _ in 0..AP_SPIN_LIMIT {
+                        if AP_READY.load(Ordering::SeqCst) {
+                            kernel_ready = true;
+                            break;
+                        }
+                        hint::spin_loop();
+                    }
+                    if !kernel_ready {
+                        println!("KERNEL AP: CPU {} AP_READY timeout, skipping", apic_id);
+                        continue;
+                    }
+
+                    // Record APIC→CPU mapping for NUMA topology.
+                    unsafe {
+                        record_apic_mapping(apic_id, cpu_id);
+                    }
+                    // Set NUMA node from SRAT data.
+                    if let Some(percpu) = crate::percpu::get_for_cpu(cpu_id) {
+                        if let Some(node) = crate::acpi::srat::numa_node_for_apic(apic_id) {
+                            percpu.numa_node.set(node);
+                        }
+                    }
+
+                    RmmA::invalidate_all();
+                }
+            } else if u32::from(ap_local_apic.id) == me.get() {
                debug!("        This is my local APIC");
            } else if ap_local_apic.flags & 1 == 1 {
                // Allocate a stack
@@ -123,15 +417,16 @@ pub(super) fn init(madt: Madt) {
                let stack_start = RmmA::phys_to_virt(alloc.base()).data();
                let stack_end = stack_start + (PAGE_SIZE << 4);

-                let next_cpu = crate::CPU_COUNT.load(Ordering::Relaxed);
-                if next_cpu >= crate::cpu_set::MAX_CPU_COUNT {
+                // Atomically allocate a CPU ID — fetch_add is SeqCst so that
+                // all later stores (PercpuBlock, NUMA node) are ordered after.
+                let cpu_id = LogicalCpuId::new(crate::CPU_COUNT.fetch_add(1, Ordering::SeqCst));
+                if cpu_id.get() >= crate::cpu_set::MAX_CPU_COUNT {
                    println!(
                        "KERNEL AP: CPU {} exceeds logical CPU limit, skipping",
                        ap_local_apic.id
                    );
                    continue;
                }
-                let cpu_id = LogicalCpuId::new(next_cpu);

                let pcr_ptr = crate::arch::gdt::allocate_and_init_pcr(cpu_id, stack_end);

@@ -157,14 +452,21 @@ pub(super) fn init(madt: Madt) {
                    #[expect(clippy::fn_to_numeric_cast)]
                    ap_code.write(kstart_ap as u64);

-                    // TODO: Is this necessary (this fence)?
-                    core::arch::asm!("");
+                    // Ensure all trampoline writes are visible to the AP before
+                    // it starts executing.  asm!("") is only a compiler barrier;
+                    // fence(SeqCst) is a full hardware memory barrier.
+                    core::sync::atomic::fence(Ordering::SeqCst);
                };
                AP_READY.store(false, Ordering::SeqCst);

-                // Send INIT IPI
+                // Clear APIC Error Status Register before starting AP.
+                // Intel SDM §8.4.4: ESR should be cleared before sending SIPI.
+                unsafe { local_apic.esr(); }
+
+                // Send INIT IPI (Assert)
                {
-                    let mut icr = 0x4500;
+                    // ICR: Delivery Mode=INIT(101), Level=Assert, Trigger=Edge
+                    let mut icr = 0x4500u64;
                    if local_apic.x2 {
                        icr |= u64::from(ap_local_apic.id) << 32;
                    } else {
@@ -173,20 +475,53 @@ pub(super) fn init(madt: Madt) {
                    local_apic.set_icr(icr);
                }

-                // Send START IPI
+                // Intel SDM Vol 3A §8.4.4: wait 10ms after the INIT IPI
+                // before sending first SIPI. Modern CPUs may need less,
+                // but 10ms is the safe specification-compliant value.
+                early_udelay(10_000);
+
+                // Send START IPI #1
                {
                    let ap_segment = (TRAMPOLINE >> 12) & 0xFF;
-                    let mut icr = 0x4600 | ap_segment as u64;
-
+                    // ICR: Delivery Mode=StartUp(110), Vector=ap_segment
+                    // Note: bit 14 (Level) must be 0 for SIPI per Intel SDM.
+                    let mut icr = 0x0600 | ap_segment as u64;
                    if local_apic.x2 {
                        icr |= u64::from(ap_local_apic.id) << 32;
                    } else {
                        icr |= u64::from(ap_local_apic.id) << 56;
                    }
-
                    local_apic.set_icr(icr);
                }

+                // Intel SDM: wait 200µs between SIPIs
+                early_udelay(200);
+
+                // Send START IPI #2 (recommended for compatibility)
+                {
+                    let ap_segment = (TRAMPOLINE >> 12) & 0xFF;
+                    let mut icr = 0x0600 | ap_segment as u64;
+                    if local_apic.x2 {
+                        icr |= u64::from(ap_local_apic.id) << 32;
+                    } else {
+                        icr |= u64::from(ap_local_apic.id) << 56;
+                    }
+                    local_apic.set_icr(icr);
+                }
+
+                // Wait briefly for SIPI to be accepted
+                early_udelay(200);
+
+                // Check ESR for delivery errors after SIPI sequence.
+                // Bit 2 = Send Accept Error, Bit 5 = Send Illegal Vector.
+                let esr_val = unsafe { local_apic.esr() };
+                if esr_val != 0 {
+                    println!(
+                        "KERNEL AP: CPU {} SIPI delivery error (ESR={:#x}), continuing",
+                        ap_local_apic.id, esr_val
+                    );
+                }
+
                // Wait for trampoline ready with timeout
                let mut trampoline_ready = false;
                for _ in 0..AP_SPIN_LIMIT {
@@ -214,7 +549,16 @@ pub(super) fn init(madt: Madt) {
                    continue;
                }

-                crate::CPU_COUNT.fetch_add(1, Ordering::Relaxed);
+                // Record APIC→CPU mapping for NUMA topology.
+                unsafe {
+                    record_apic_mapping(u32::from(ap_local_apic.id), cpu_id);
+                }
+                // Set NUMA node from SRAT data.
+                if let Some(percpu) = crate::percpu::get_for_cpu(cpu_id) {
+                    if let Some(node) = crate::acpi::srat::numa_node_for_apic(u32::from(ap_local_apic.id)) {
+                        percpu.numa_node.set(node);
+                    }
+                }

                RmmA::invalidate_all();
            }
@@ -222,7 +566,14 @@ pub(super) fn init(madt: Madt) {
            let apic_id = ap_x2apic.x2apic_id;
            let flags = ap_x2apic.flags;

-            if apic_id == me.get() {
+            // xAPIC mode: cannot target 32-bit x2APIC IDs via 8-bit ICR.
+            // Skip LocalX2Apic entries; use LocalApic entries exclusively.
+            if !local_apic.x2 {
+                debug!(
+                    "        Skipping 32-bit x2APIC id={} (xAPIC mode, using LocalApic entries)",
+                    apic_id
+                );
+            } else if apic_id == me.get() {
                debug!("        This is my local x2APIC");
            } else if flags & 1 == 1 {
                let alloc = match allocate_p2frame(4) {
@@ -235,15 +586,16 @@ pub(super) fn init(madt: Madt) {
                let stack_start = RmmA::phys_to_virt(alloc.base()).data();
                let stack_end = stack_start + (PAGE_SIZE << 4);

-                let next_cpu = crate::CPU_COUNT.load(Ordering::Relaxed);
-                if next_cpu >= crate::cpu_set::MAX_CPU_COUNT {
+                // Atomically allocate a CPU ID — fetch_add is SeqCst so that
+                // all later stores (PercpuBlock, NUMA node) are ordered after.
+                let cpu_id = LogicalCpuId::new(crate::CPU_COUNT.fetch_add(1, Ordering::SeqCst));
+                if cpu_id.get() >= crate::cpu_set::MAX_CPU_COUNT {
                    println!(
                        "KERNEL AP: CPU {} exceeds logical CPU limit, skipping",
                        apic_id
                    );
                    continue;
                }
-                let cpu_id = LogicalCpuId::new(next_cpu);

                let pcr_ptr = crate::arch::gdt::allocate_and_init_pcr(cpu_id, stack_end);
                let idt_ptr = crate::arch::idt::allocate_and_init_idt(cpu_id);
@@ -266,38 +618,55 @@ pub(super) fn init(madt: Madt) {
                    ap_page_table.write(page_table_physaddr as u64);
                    #[expect(clippy::fn_to_numeric_cast)]
                    ap_code.write(kstart_ap as u64);
-                    core::arch::asm!("");
+                    // Ensure all trampoline writes are visible to the AP.
+                    core::sync::atomic::fence(Ordering::SeqCst);
                }
                AP_READY.store(false, Ordering::SeqCst);

+                // Clear APIC Error Status Register before starting AP.
+                unsafe { local_apic.esr(); }
+
+                // Send INIT IPI (Assert)
                {
                    let mut icr = 0x4500u64;
                    icr |= u64::from(apic_id) << 32;
                    local_apic.set_icr(icr);
                }

-                for _ in 0..100_000 {
-                    hint::spin_loop();
-                }
+                // Intel SDM Vol 3A §8.4.4: wait 10ms after INIT
+                early_udelay(10_000);

+                // Send START IPI #1
                {
                    let ap_segment = (TRAMPOLINE >> 12) & 0xFF;
-                    let mut icr = 0x4600u64 | ap_segment as u64;
+                    let mut icr = 0x0600u64 | ap_segment as u64;
                    icr |= u64::from(apic_id) << 32;
                    local_apic.set_icr(icr);
                }

-                for _ in 0..2_000_000 {
-                    hint::spin_loop();
-                }
+                // Intel SDM: wait 200µs between SIPIs
+                early_udelay(200);

+                // Send START IPI #2 (recommended for compatibility)
                {
                    let ap_segment = (TRAMPOLINE >> 12) & 0xFF;
-                    let mut icr = 0x4600u64 | ap_segment as u64;
+                    let mut icr = 0x0600u64 | ap_segment as u64;
                    icr |= u64::from(apic_id) << 32;
                    local_apic.set_icr(icr);
                }

+                // Wait briefly for SIPI acceptance
+                early_udelay(200);
+
+                // Check ESR for delivery errors.
+                let esr_val = unsafe { local_apic.esr() };
+                if esr_val != 0 {
+                    println!(
+                        "KERNEL AP: CPU {} SIPI delivery error (ESR={:#x}), continuing",
+                        apic_id, esr_val
+                    );
+                }
+
                let mut trampoline_ready = false;
                for _ in 0..AP_SPIN_LIMIT {
                    if unsafe { (*ap_ready.cast::<AtomicU8>()).load(Ordering::SeqCst) } != 0 {
@@ -324,7 +693,17 @@ pub(super) fn init(madt: Madt) {
                    continue;
                }

-                crate::CPU_COUNT.fetch_add(1, Ordering::Relaxed);
+                // Record APIC→CPU mapping for NUMA topology.
+                unsafe {
+                    record_apic_mapping(apic_id, cpu_id);
+                }
+                // Set NUMA node from SRAT data.
+                if let Some(percpu) = crate::percpu::get_for_cpu(cpu_id) {
+                    if let Some(node) = crate::acpi::srat::numa_node_for_apic(apic_id) {
+                        percpu.numa_node.set(node);
+                    }
+                }
+
                RmmA::invalidate_all();
            }
        } else if let MadtEntry::LocalApicNmi(nmi) = madt_entry {
@@ -342,6 +721,33 @@ pub(super) fn init(madt: Madt) {
        }
    }

+    // Initialize NUMA topology from APIC→CPU mappings and SRAT.
+    {
+        let mappings = unsafe { &APIC_MAPPINGS[..APIC_MAPPING_COUNT] };
+        let mappings_ref: Vec<(u32, LogicalCpuId)> = mappings
+            .iter()
+            .map(|m| (m.apic_id, m.cpu_id))
+            .collect();
+        crate::numa::init_from_srat(&mappings_ref);
+    }
+    // Set BSP's NUMA node from SRAT.
+    if let Some(node) = crate::acpi::srat::numa_node_for_apic(me.get()) {
+        crate::percpu::PercpuBlock::current().numa_node.set(node);
+    }
+
+    // Log final CPU count vs maximum
+    let cpu_count = crate::CPU_COUNT.load(Ordering::SeqCst);
+    info!(
+        "SMP: {} CPUs online (max {})",
+        cpu_count, crate::cpu_set::MAX_CPU_COUNT
+    );
+    if cpu_count > crate::cpu_set::MAX_CPU_COUNT * 80 / 100 {
+        warn!(
+            "SMP: CPU count approaching MAX_CPU_COUNT limit ({}/{})",
+            cpu_count, crate::cpu_set::MAX_CPU_COUNT
+        );
+    }
+
    // Unmap trampoline
    if let Some((_frame, _, flush)) = unsafe {
        KernelMapper::lock_rw()
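The INIT/SIPI sequence in the hunks above builds each ICR value inline, three times over. As a reading aid, the encoding can be sketched as standalone helpers. This is a minimal sketch and not the kernel's code: the names `with_destination`, `init_ipi`, and `startup_ipi` are hypothetical, and only the constants `0x4500` and `0x0600`, the `<< 32` / `<< 56` destination shifts, and the `(TRAMPOLINE >> 12) & 0xFF` vector are taken from the diff.

```rust
/// INIT IPI: delivery mode INIT (bits 8..10 = 101b) with Level = Assert (bit 14).
const ICR_INIT_ASSERT: u64 = 0x4500;
/// Start-Up IPI: delivery mode Start-Up (bits 8..10 = 110b), Level bit left clear.
const ICR_STARTUP: u64 = 0x0600;

/// Place the destination APIC ID into an ICR value.
/// x2APIC: 32-bit destination in bits 32..63.
/// xAPIC: 8-bit destination in bits 56..63.
fn with_destination(icr: u64, apic_id: u32, x2apic: bool) -> u64 {
    if x2apic {
        icr | (u64::from(apic_id) << 32)
    } else {
        icr | (u64::from(apic_id & 0xFF) << 56)
    }
}

/// ICR value for the INIT IPI sent to one application processor.
fn init_ipi(apic_id: u32, x2apic: bool) -> u64 {
    with_destination(ICR_INIT_ASSERT, apic_id, x2apic)
}

/// ICR value for a Start-Up IPI; the vector is the 4 KiB page number
/// of the real-mode trampoline (0x8000 in the diff gives vector 0x08).
fn startup_ipi(apic_id: u32, x2apic: bool, trampoline: usize) -> u64 {
    let vector = ((trampoline >> 12) & 0xFF) as u64;
    with_destination(ICR_STARTUP | vector, apic_id, x2apic)
}
```

Factoring the encoding this way would also let the three AP bring-up paths (xAPIC, x2APIC, and the x2APIC fallback over LocalApic entries) share one sequence, though whether that fits the kernel's structure is a judgment for the patch author.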
@@ -34,6 +34,12 @@ impl Madt {
        let madt = Madt::new(find_one_sdt!("APIC"));

        if let Some(madt) = madt {
+            // Validate MADT checksum per ACPI 6.5 §5.2.2
+            if !madt.sdt.validate_checksum() {
+                error!("MADT checksum validation failed, skipping APIC initialization");
+                return;
+            }
+
            // safe because no APs have been started yet.
            unsafe { MADT.get().write(Some(madt)) };

@@ -20,6 +20,8 @@ mod rxsdt;
 pub mod sdt;
 #[cfg(target_arch = "aarch64")]
 mod spcr;
+pub mod slit;
+pub mod srat;
 mod xsdt;

 unsafe fn map_linearly(addr: PhysicalAddress, len: usize, mapper: &mut crate::memory::PageMapper) {
@@ -163,7 +165,14 @@ pub unsafe fn init(already_supplied_rsdp: Option<*const u8>) {

            // TODO: Enumerate processors in userspace, and then provide an ACPI-independent interface
            // to initialize enumerated processors to userspace?
+            // Parse SRAT BEFORE MADT so NUMA node mapping is available
+            // when APs are started and PercpuBlocks are created.
+            srat::init();
+
            Madt::init();
+
+            // Parse SLIT after MADT for the NUMA distance matrix.
+            slit::init();
            //TODO: support this on any arch
            // SPCR must be initialized after MADT for interrupt controllers
            #[cfg(target_arch = "aarch64")]
@@ -24,4 +24,20 @@ impl Sdt {
        let header_size = size_of::<Sdt>();
        total_size.saturating_sub(header_size)
    }
+
+    /// Validate the SDT checksum.
+    ///
+    /// Per ACPI 6.5 §5.2.6: the entire table (including the checksum field)
+    /// must sum to 0 when all bytes are added together as unsigned 8-bit values.
+    pub fn validate_checksum(&self) -> bool {
+        let ptr = self as *const _ as *const u8;
+        let len = self.length as usize;
+        if len < size_of::<Sdt>() {
+            return false;
+        }
+        let sum = unsafe { core::slice::from_raw_parts(ptr, len) }
+            .iter()
+            .fold(0u8, |acc, &b| acc.wrapping_add(b));
+        sum == 0
+    }
 }
@@ -0,0 +1,45 @@
+//! SLIT (System Locality Information Table) parser.
+//!
+//! Parses the NUMA distance matrix for scheduler NUMA-aware work stealing.
+
+use super::sdt::Sdt;
+use crate::acpi::find_sdt;
+
+const MAX_NODES: usize = 8;
+
+static mut SLIT_MATRIX: [[u8; MAX_NODES]; MAX_NODES] = [[10u8; MAX_NODES]; MAX_NODES];
+static mut SLIT_NUM_NODES: usize = 0;
+static mut SLIT_AVAILABLE: bool = false;
+
+pub fn is_available() -> bool { unsafe { SLIT_AVAILABLE } }
+pub fn num_nodes() -> usize { unsafe { SLIT_NUM_NODES } }
+
+pub fn distance(from: u8, to: u8) -> u8 {
+    if !unsafe { SLIT_AVAILABLE } { return 10; }
+    let (from, to) = (from as usize, to as usize);
+    if from >= MAX_NODES || to >= MAX_NODES { return 10; }
+    unsafe { SLIT_MATRIX[from][to] }
+}
+
+pub fn same_socket(node1: u8, node2: u8) -> bool { distance(node1, node2) <= 20 }
+
+pub fn init() {
+    let sdt = match find_sdt("SLIT").as_slice() {
+        [] => return,
+        [x] => *x,
+        xs => { println!("SLIT: {} tables found, expected 1", xs.len()); return; }
+    };
+    if &sdt.signature != b"SLIT" { return; }
+    let data_addr = sdt.data_address();
+    let data_len = sdt.data_len();
+    if data_len < 8 { return; }
+    let num_nodes = unsafe { (data_addr as *const u64).read_unaligned() } as usize; // data follows the 36-byte header, so 8-byte alignment is not guaranteed
+    if num_nodes == 0 || num_nodes > MAX_NODES { println!("SLIT: {num_nodes} nodes (max {MAX_NODES}), ignoring"); return; }
+    let matrix_start = 8;
+    let matrix_size = num_nodes * num_nodes;
+    if data_len < matrix_start + matrix_size { println!("SLIT: matrix truncated ({data_len} < {})", matrix_start + matrix_size); return; }
+    let matrix = unsafe { &mut SLIT_MATRIX };
+    for i in 0..num_nodes { for j in 0..num_nodes { matrix[i][j] = unsafe { *((data_addr + matrix_start + i * num_nodes + j) as *const u8) }; } }
+    unsafe { SLIT_NUM_NODES = num_nodes; SLIT_AVAILABLE = true; }
+    debug!("SLIT: {} nodes, distance matrix loaded", num_nodes);
+}
@@ -0,0 +1,102 @@
+//! SRAT (System Resource Affinity Table) parser.
+//!
+//! Parses CPU-to-NUMA-node and memory-to-NUMA-node affinity information.
+//! Called before MADT init so that NUMA data is available during AP startup.
+
+use super::sdt::Sdt;
+use crate::acpi::find_sdt;
+
+const MAX_CPU_ENTRIES: usize = 256;
+const MAX_MEM_ENTRIES: usize = 64;
+
+#[derive(Clone, Copy)]
+struct SratCpuEntry { apic_id: u32, node: u8, enabled: bool }
+
+#[derive(Clone, Copy)]
+struct SratMemEntry { node: u8, base: u64, length: u64, enabled: bool }
+
+const CPU_NONE: SratCpuEntry = SratCpuEntry { apic_id: u32::MAX, node: 0, enabled: false };
+const MEM_NONE: SratMemEntry = SratMemEntry { node: 0, base: 0, length: 0, enabled: false };
+
+static mut SRAT_CPU_ENTRIES: [SratCpuEntry; MAX_CPU_ENTRIES] = [CPU_NONE; MAX_CPU_ENTRIES];
+static mut SRAT_MEM_ENTRIES: [SratMemEntry; MAX_MEM_ENTRIES] = [MEM_NONE; MAX_MEM_ENTRIES];
+static mut SRAT_CPU_COUNT: usize = 0;
+static mut SRAT_MEM_COUNT: usize = 0;
+static mut SRAT_AVAILABLE: bool = false;
+
+pub fn is_available() -> bool { unsafe { SRAT_AVAILABLE } }
+
+pub fn numa_node_for_apic(apic_id: u32) -> Option<u8> {
+    if !unsafe { SRAT_AVAILABLE } { return None; }
+    let count = unsafe { SRAT_CPU_COUNT };
+    let entries = unsafe { &SRAT_CPU_ENTRIES };
+    for i in 0..count {
+        if entries[i].apic_id == apic_id && entries[i].enabled { return Some(entries[i].node); }
+    }
+    None
+}
+
+pub fn numa_node_count() -> usize {
+    if !unsafe { SRAT_AVAILABLE } { return 1; }
+    let mut max_node: u8 = 0;
+    let count = unsafe { SRAT_CPU_COUNT };
+    let entries = unsafe { &SRAT_CPU_ENTRIES };
+    for i in 0..count { if entries[i].enabled && entries[i].node > max_node { max_node = entries[i].node; } }
+    (max_node as usize) + 1
+}
+
+#[repr(C, packed)]
+struct SratLocalApic { _proximity_lo: u8, apic_id: u8, flags: u32, _local_sapic_eid: u8, _proximity_hi: [u8; 3], _clock_domain: u32 }
+
+#[repr(C, packed)]
+struct SratMemoryAffinity { proximity_domain: u32, _reserved1: u16, base_address_lo: u32, base_address_hi: u32, length_lo: u32, length_hi: u32, _reserved2: u32, flags: u32, _reserved3: u64 }
+
+#[repr(C, packed)]
+struct SratLocalX2Apic { _reserved: u16, proximity_domain: u32, x2apic_id: u32, flags: u32, _clock_domain: u32, _reserved2: u32 }
+
+pub fn init() {
+    let sdt = match find_sdt("SRAT").as_slice() {
+        [] => return,
+        [x] => *x,
+        xs => { println!("SRAT: {} tables found, expected 1", xs.len()); return; }
+    };
+    if &sdt.signature != b"SRAT" { return; }
+    let data_addr = sdt.data_address();
+    let data_len = sdt.data_len();
+    if data_len < 12 { println!("SRAT: table too short ({data_len} bytes)"); return; }
+    let mut offset: usize = 12;
+    let cpu_entries = unsafe { &mut SRAT_CPU_ENTRIES };
+    let mem_entries = unsafe { &mut SRAT_MEM_ENTRIES };
+    let mut cpu_count: usize = 0;
+    let mut mem_count: usize = 0;
+    while offset + 2 <= data_len {
+        let entry_type = unsafe { *((data_addr + offset) as *const u8) };
+        let entry_len = unsafe { *((data_addr + offset + 1) as *const u8) } as usize;
+        if entry_len < 2 || offset + entry_len > data_len { break; }
+        let entry_data = data_addr + offset + 2;
+        match entry_type {
+            0x0 if entry_len >= size_of::<SratLocalApic>() + 2 => {
+                let e = unsafe { &*(entry_data as *const SratLocalApic) };
+                let enabled = (e.flags & 1) == 1;
+                let node = (e._proximity_lo as u32) | ((e._proximity_hi[0] as u32) << 8) | ((e._proximity_hi[1] as u32) << 16) | ((e._proximity_hi[2] as u32) << 24);
+                if cpu_count < MAX_CPU_ENTRIES { cpu_entries[cpu_count] = SratCpuEntry { apic_id: e.apic_id as u32, node: node as u8, enabled }; cpu_count += 1; }
+            }
+            0x1 if entry_len >= size_of::<SratMemoryAffinity>() + 2 => {
+                let e = unsafe { &*(entry_data as *const SratMemoryAffinity) };
+                let enabled = (e.flags & 1) == 1;
+                let base = (e.base_address_hi as u64) << 32 | e.base_address_lo as u64;
+                let length = (e.length_hi as u64) << 32 | e.length_lo as u64;
+                if mem_count < MAX_MEM_ENTRIES { mem_entries[mem_count] = SratMemEntry { node: e.proximity_domain as u8, base, length, enabled }; mem_count += 1; }
+            }
+            0x2 if entry_len >= size_of::<SratLocalX2Apic>() + 2 => {
+                let e = unsafe { &*(entry_data as *const SratLocalX2Apic) };
+                let enabled = (e.flags & 1) == 1;
+                if cpu_count < MAX_CPU_ENTRIES { cpu_entries[cpu_count] = SratCpuEntry { apic_id: e.x2apic_id, node: e.proximity_domain as u8, enabled }; cpu_count += 1; }
+            }
+            _ => {}
+        }
+        offset += entry_len;
+    }
+    unsafe { SRAT_CPU_COUNT = cpu_count; SRAT_MEM_COUNT = mem_count; SRAT_AVAILABLE = true; }
+    debug!("SRAT: {} CPU entries, {} memory entries", cpu_count, mem_count);
+}
@@ -0,0 +1,186 @@
+use core::cell::SyncUnsafeCell;
+use core::sync::atomic::{AtomicUsize, Ordering};
+
+use crate::arch::cpuid::cpuid;
+use crate::syscall::error::{Error, Result, EINVAL};
+
+#[repr(align(64))]
+struct MonitorTarget {
+    value: AtomicUsize,
+}
+
+static MONITOR_TARGET: MonitorTarget = MonitorTarget {
+    value: AtomicUsize::new(0),
+};
+
+bitflags::bitflags! {
+    #[derive(Clone, Copy, Debug, PartialEq, Eq)]
+    pub struct CStateFlags: u32 {
+        const NEEDS_MONITOR = 1;
+        const NEEDS_WBINVD = 2;
+    }
+}
+
+#[derive(Clone, Copy, Debug)]
+pub struct CState {
+    pub name: &'static str,
+    pub typ: u32,
+    pub latency: u32,
+    pub power: u32,
+    pub mwait_hint: u32,
+    pub flags: CStateFlags,
+}
+
+const MAX_CSTATES: usize = 8;
+static CPUIDLE_STATES: SyncUnsafeCell<[Option<CState>; MAX_CSTATES]> =
+    SyncUnsafeCell::new([None; MAX_CSTATES]);
+static NUM_CPUIDLE_STATES: AtomicUsize = AtomicUsize::new(0);
+
+static CSTATE_POLICY_MAX: AtomicUsize = AtomicUsize::new(0);
+
+fn has_mwait() -> bool {
+    cpuid().get_feature_info().map_or(false, |info| info.has_monitor_mwait())
+}
+
+fn add_state(index: usize, state: CState) {
+    unsafe {
+        (*CPUIDLE_STATES.get())[index] = Some(state);
+    }
+}
+
+pub fn init() {
+    add_state(0, CState {
+        name: "C1",
+        typ: 1,
+        latency: 1,
+        power: 1000,
+        mwait_hint: 0x00,
+        flags: CStateFlags::empty(),
+    });
+    let mut count = 1;
+
+    if has_mwait() {
+        add_state(count, CState {
+            name: "C1E",
+            typ: 1,
+            latency: 2,
+            power: 800,
+            mwait_hint: 0x01,
+            flags: CStateFlags::NEEDS_MONITOR,
+        });
+        count += 1;
+
+        add_state(count, CState {
+            name: "C2",
+            typ: 2,
+            latency: 10,
+            power: 500,
+            mwait_hint: 0x10,
+            flags: CStateFlags::NEEDS_MONITOR,
+        });
+        count += 1;
+
+        add_state(count, CState {
+            name: "C3",
+            typ: 3,
+            latency: 50,
+            power: 100,
+            mwait_hint: 0x20,
+            flags: CStateFlags::NEEDS_MONITOR | CStateFlags::NEEDS_WBINVD,
+        });
+        count += 1;
+
+        add_state(count, CState {
+            name: "C6",
+            typ: 6,
+            latency: 100,
+            power: 50,
+            mwait_hint: 0x50,
+            flags: CStateFlags::NEEDS_MONITOR | CStateFlags::NEEDS_WBINVD,
+        });
+        count += 1;
+
+        add_state(count, CState {
+            name: "C7",
+            typ: 7,
+            latency: 200,
+            power: 30,
+            mwait_hint: 0x60,
+            flags: CStateFlags::NEEDS_MONITOR | CStateFlags::NEEDS_WBINVD,
+        });
+        count += 1;
+    }
+
+    NUM_CPUIDLE_STATES.store(count, Ordering::SeqCst);
+    log::info!("cpuidle: initialized {} states (mwait={})", count, has_mwait());
+}
+
+pub fn policy_read() -> usize {
+    CSTATE_POLICY_MAX.load(Ordering::Relaxed)
+}
+
+pub fn policy_write(buf: &[u8]) -> Result<usize> {
+    let s = core::str::from_utf8(buf).map_err(|_| Error::new(EINVAL))?;
+    let s = s.trim();
+    let val: usize = s.parse().map_err(|_| Error::new(EINVAL))?;
+    let num_states = NUM_CPUIDLE_STATES.load(Ordering::Relaxed);
+    if val >= num_states {
+        return Err(Error::new(EINVAL));
+    }
+    CSTATE_POLICY_MAX.store(val, Ordering::Relaxed);
+    log::info!("cpuidle: policy set to max state {}", val);
+    Ok(buf.len()) // report the whole buffer as consumed, including trimmed whitespace
+}
+
+pub fn resource() -> Result<alloc::vec::Vec<u8>> {
+    let mut output = alloc::string::String::new();
+    let num_states = NUM_CPUIDLE_STATES.load(Ordering::Relaxed);
+    let policy = CSTATE_POLICY_MAX.load(Ordering::Relaxed);
+    output.push_str(&format!("policy_max: {}\n", policy));
+    output.push_str(&format!("num_states: {}\n", num_states));
+    for i in 0..num_states {
+        if let Some(state) = unsafe { (*CPUIDLE_STATES.get())[i] } {
+            output.push_str(&format!(
+                "state{}: name={} type={} latency={}us power={} hint={:#x} flags={:?}\n",
+                i, state.name, state.typ, state.latency, state.power, state.mwait_hint, state.flags
+            ));
+        }
+    }
+    Ok(output.into_bytes())
+}
+
+pub unsafe fn enter_idle() {
+    let policy_max = CSTATE_POLICY_MAX.load(Ordering::Relaxed);
+    let num_states = NUM_CPUIDLE_STATES.load(Ordering::Relaxed);
+    let target_index = if num_states == 0 {
+        0
+    } else {
+        core::cmp::min(policy_max, num_states - 1)
+    };
+
+    if target_index == 0 {
+        unsafe { crate::arch::interrupt::enable_and_halt(); }
+        return;
+    }
+
+    let state = match unsafe { (*CPUIDLE_STATES.get())[target_index] } {
+        Some(s) => s,
+        None => {
+            unsafe { crate::arch::interrupt::enable_and_halt(); }
+            return;
+        }
+    };
+
+    if state.flags.contains(CStateFlags::NEEDS_MONITOR) {
+        let addr = &MONITOR_TARGET.value as *const AtomicUsize as *const u8;
+        unsafe { crate::arch::interrupt::monitor(addr, 0, 0); }
+    }
+
+    if state.flags.contains(CStateFlags::NEEDS_WBINVD) {
+        unsafe { core::arch::asm!("wbinvd", options(nostack)); }
+    }
+
+    unsafe {
+        crate::arch::interrupt::enable_and_mwait(state.mwait_hint, 0);
+    }
+}
@@ -120,6 +120,21 @@ impl IoApic {
        reg |= u64::from(mask) << 16;
        let _ = guard.write_ioredtbl(idx, reg);
    }
+    /// Change the destination APIC for a GSI by reprogramming the redirection table entry.
+    /// Preserves all other fields (vector, polarity, trigger mode, delivery mode, mask).
+    /// Returns true if the entry was successfully updated.
+    pub fn set_irq_affinity(&self, gsi: u32, dest: ApicId) -> bool {
+        let idx = (gsi - self.gsi_start) as u8;
+        let mut guard = self.regs.lock();
+        let Some(mut entry) = guard.read_ioredtbl(idx) else {
+            return false;
+        };
+        // Clear destination field (bits 63:56 for xAPIC physical mode)
+        // and set new destination APIC ID
+        entry &= !(0xFF_u64 << 56);
+        entry |= u64::from(dest.get()) << 56;
+        guard.write_ioredtbl(idx, entry)
+    }
 }

 #[repr(u8)]
@@ -474,3 +489,14 @@ pub unsafe fn unmask(irq: u8) {
    };
    apic.set_mask(gsi, false);
 }
+
+/// Change the destination CPU for an IRQ by reprogramming the IOAPIC redirection entry.
+/// Resolves the legacy IRQ to its GSI, finds the owning IOAPIC, and updates the destination
+/// APIC ID in the redirection table while preserving all other fields.
+pub unsafe fn set_affinity(irq: u8, dest: ApicId) -> bool {
+    let gsi = resolve(irq);
+    match find_ioapic(gsi) {
+        Some(apic) => apic.set_irq_affinity(gsi, dest),
+        None => false,
+    }
+}
@@ -59,10 +59,10 @@ impl LocalApic {
                .is_some_and(|feature_info| feature_info.has_x2apic());

            if !self.x2 {
-                debug!("Detected xAPIC at {:#x}", physaddr.data());
+                info!("Detected xAPIC at {:#x}", physaddr.data());
                self.address = map_device_memory(physaddr, 4096).data();
            } else {
-                debug!("Detected x2APIC");
+                info!("Detected x2APIC");
            }

            self.init_ap();
@@ -110,6 +110,8 @@ pub fn set_reserved(cpu_id: LogicalCpuId, index: u8, reserved: bool) {
 }

 pub fn available_irqs_iter(cpu_id: LogicalCpuId) -> impl Iterator<Item = u8> + 'static {
+    let count = (32..=254).filter(|&index| !is_reserved(cpu_id, index)).count();
+    info!("available_irqs_iter: cpu_id={} count={}", cpu_id.get(), count);
    (32..=254).filter(move |&index| !is_reserved(cpu_id, index))
 }

@@ -4,16 +4,10 @@ use crate::{
    percpu::PercpuBlock,
    syscall::FloatRegisters,
 };
-use core::{mem::offset_of, ptr, sync::atomic::AtomicBool};
+use core::{mem::offset_of, ptr};
 use spin::Once;
 use syscall::{EnvRegisters, Result};

-/// This must be used by the kernel to ensure that context switches are done atomically
-/// Compare and exchange this to true when beginning a context switch on any CPU
-/// The `Context::switch_to` function will set it back to false, allowing other CPU's to switch
-/// This must be done, as no locks can be held on the stack during switch
-pub static CONTEXT_SWITCH_LOCK: AtomicBool = AtomicBool::new(false);
-
 // 512 bytes for registers, extra bytes for fpcr and fpsr
 pub const KFX_ALIGN: usize = 16;

@@ -2,13 +2,11 @@ use crate::{
    arch::interrupt::InterruptStack, context::context::Kstack, memory::RmmA, percpu::PercpuBlock,
    syscall::FloatRegisters,
 };
-use core::{mem::offset_of, sync::atomic::AtomicBool};
+use core::mem::offset_of;
 use rmm::{Arch, VirtualAddress};
 use spin::Once;
 use syscall::{error::*, EnvRegisters};

-pub static CONTEXT_SWITCH_LOCK: AtomicBool = AtomicBool::new(false);
-
 pub const KFX_ALIGN: usize = 16;

 #[derive(Clone, Debug, Default)]
@@ -1,4 +1,4 @@
-use core::{mem::offset_of, sync::atomic::AtomicBool};
+use core::mem::offset_of;
 use rmm::{Arch, VirtualAddress};
 use spin::Once;
 use syscall::{error::*, EnvRegisters};
@@ -14,12 +14,6 @@ use crate::{
    syscall::FloatRegisters,
 };

-/// This must be used by the kernel to ensure that context switches are done atomically
-/// Compare and exchange this to true when beginning a context switch on any CPU
-/// The `Context::switch_to` function will set it back to false, allowing other CPU's to switch
-/// This must be done, as no locks can be held on the stack during switch
-pub static CONTEXT_SWITCH_LOCK: AtomicBool = AtomicBool::new(false);
-
 const ST_RESERVED: u128 = 0xFFFF_FFFF_FFFF_0000_0000_0000_0000_0000;

 pub const KFX_ALIGN: usize = 16;
@@ -1,6 +1,5 @@
 use core::{
    ptr::{addr_of, addr_of_mut},
-    sync::atomic::AtomicBool,
 };

 use crate::syscall::FloatRegisters;
@@ -12,12 +11,6 @@ use spin::Once;
 use syscall::{error::*, EnvRegisters};
 use x86::msr;

-/// This must be used by the kernel to ensure that context switches are done atomically
-/// Compare and exchange this to true when beginning a context switch on any CPU
-/// The `Context::switch_to` function will set it back to false, allowing other CPU's to switch
-/// This must be done, as no locks can be held on the stack during switch
-pub static CONTEXT_SWITCH_LOCK: AtomicBool = AtomicBool::new(false);
-
 const ST_RESERVED: u128 = 0xFFFF_FFFF_FFFF_0000_0000_0000_0000_0000;

 #[cfg(cpu_feature_never = "xsave")]
@@ -14,8 +14,8 @@ use crate::{
    memory::{RmmA, RmmArch, TableKind},
    percpu::PercpuBlock,
    sync::{
-        ArcRwLockWriteGuard, CleanLockToken, LockToken, Mutex, MutexGuard, RwLock, RwLockReadGuard,
-        RwLockWriteGuard, L0, L1, L2, L4,
+        ArcRwLockWriteGuard, CleanLockToken, LockToken, McsMutex, McsMutexGuard, Mutex,
+        MutexGuard, RwLock, RwLockReadGuard, RwLockWriteGuard, L0, L1, L2, L4,
    },
    syscall::error::Result,
 };
@@ -74,10 +74,12 @@ pub use self::arch::empty_cr3;
 // the context file descriptors.
 static CONTEXTS: RwLock<L2, BTreeSet<ContextRef>> = RwLock::new(BTreeSet::new());

-// Actual context store for the scheduler
-static RUN_CONTEXTS: Mutex<L1, RunContextData> = Mutex::new(RunContextData::new());
+// Actual context store for the scheduler. Uses an MCS fair spinlock to
+// reduce cache-line bouncing under multi-CPU contention.
+static RUN_CONTEXTS: McsMutex<L1, RunContextData> = McsMutex::new(RunContextData::new());

-// Context that has been pushed out from RUN_CONTEXTS after being idle
+// Context that has been pushed out from RUN_CONTEXTS after being idle.
+// Uses regular Mutex (lower contention; wakeup_contexts uses try_lock).
 static IDLE_CONTEXTS: Mutex<L2, VecDeque<WeakContextRef>> = Mutex::new(VecDeque::new());

 pub struct RunContextData {
@@ -113,7 +115,7 @@ pub fn idle_contexts_try(
    IDLE_CONTEXTS.try_lock(token)
 }

-pub fn run_contexts(token: LockToken<'_, L0>) -> MutexGuard<'_, L1, RunContextData> {
+pub fn run_contexts(token: LockToken<'_, L0>) -> McsMutexGuard<'_, L1, RunContextData> {
    RUN_CONTEXTS.lock(token)
 }

@@ -15,7 +15,7 @@ use crate::{
 use alloc::{sync::Arc, vec::Vec};
 use core::{
    cell::{Cell, RefCell},
-    hint, mem,
+    mem,
    sync::atomic::Ordering,
 };
 use syscall::PtraceFlags;
@@ -26,6 +26,11 @@ enum UpdateResult {
    Blocked,
 }

+/// Default number of PIT ticks before triggering a context switch.
+/// At ~2.25 ms per tick, 3 ticks ≈ 6.75 ms timeslice.
+/// Configurable per-CPU via `ContextSwitchPercpu::preempt_interval`.
+const DEFAULT_PREEMPT_INTERVAL: usize = 3;
+
 // A simple geometric series where value[i] ~= value[i - 1] * 1.25
 const SCHED_PRIO_TO_WEIGHT: [usize; 40] = [
    88761, 71755, 56483, 46273, 36291, 29154, 23254, 18705, 14949, 11916, 9548, 7620, 6100, 4904,
@@ -90,13 +95,15 @@ struct SwitchResultInner {
 ///
 /// The function also calls the signal handler after switching contexts.
 pub fn tick(token: &mut CleanLockToken) {
-    let ticks_cell = &PercpuBlock::current().switch_internals.pit_ticks;
+    let percpu = PercpuBlock::current();
+    let ticks_cell = &percpu.switch_internals.pit_ticks;

    let new_ticks = ticks_cell.get() + 1;
    ticks_cell.set(new_ticks);

-    // Trigger a context switch after every 3 ticks (approx. 6.75 ms).
-    if new_ticks >= 3 {
+    // Trigger a context switch when the per-CPU preempt interval is reached.
+    let interval = percpu.switch_internals.preempt_interval.get();
+    if new_ticks >= interval {
        switch(token);
        crate::context::signal::signal_handler(token);
    }
@@ -120,7 +127,10 @@ pub unsafe extern "C" fn switch_finish_hook() {
                crate::arch::stop::emergency_reset();
            }
        }
-        arch::CONTEXT_SWITCH_LOCK.store(false, Ordering::SeqCst);
+        PercpuBlock::current()
+            .switch_internals
+            .in_context_switch
+            .set(false);
        crate::percpu::switch_arch_hook();
    }
 }
@@ -150,16 +160,15 @@ pub fn switch(token: &mut CleanLockToken) -> SwitchResult {
    //set PIT Interrupt counter to 0, giving each process same amount of PIT ticks
    percpu.switch_internals.pit_ticks.set(0);

-    // Acquire the global lock to ensure exclusive access during context switch and avoid
-    // issues that would be caused by the unsafe operations below
-    // TODO: Better memory orderings?
-    while arch::CONTEXT_SWITCH_LOCK
-        .compare_exchange_weak(false, true, Ordering::SeqCst, Ordering::Relaxed)
-        .is_err()
-    {
-        hint::spin_loop();
-        percpu.maybe_handle_tlb_shootdown();
-    }
+    // Acquire the per-CPU context switch flag. Each CPU can only be in one context
+    // switch at a time. The per-context write locks provide cross-CPU safety; this
+    // flag catches re-entrant switches on the same CPU (a kernel bug).
+    debug_assert!(
+        !percpu.switch_internals.in_context_switch.get(),
+        "context switch re-entry on CPU {}",
+        percpu.cpu_id
+    );
+    percpu.switch_internals.in_context_switch.set(true);

    // Lock the previous context.
    let prev_context_lock = crate::context::current();
@@ -167,8 +176,8 @@ pub fn switch(token: &mut CleanLockToken) -> SwitchResult {
    let mut prev_context_guard = unsafe { prev_context_lock.write_arc() };

    if !prev_context_guard.is_preemptable() {
-        // Unset global lock
-        arch::CONTEXT_SWITCH_LOCK.store(false, Ordering::SeqCst);
+        // Unset per-CPU context switch flag
+        percpu.switch_internals.in_context_switch.set(false);

        // Pretend to have finished switching, so CPU is not idled
        return SwitchResult::Switched;
@@ -292,8 +301,8 @@ pub fn switch(token: &mut CleanLockToken) -> SwitchResult {
            SwitchResult::Switched
        }
        _ => {
-            // No target was found, unset global lock and return
-            arch::CONTEXT_SWITCH_LOCK.store(false, Ordering::SeqCst);
+            // No target was found, unset per-CPU context switch flag and return
+            percpu.switch_internals.in_context_switch.set(false);

            percpu.stats.set_state(cpu_stats::CpuState::Idle);

@@ -352,6 +361,7 @@ fn wakeup_contexts(token: &mut CleanLockToken, switch_time: u128) -> Vec<(usize,
 }

 /// This is the scheduler function which currently utilises Deficit Weighted Round Robin Scheduler
+/// with NUMA-aware context selection preference.
 fn select_next_context(
    token: &mut CleanLockToken,
    percpu: &PercpuBlock,
@@ -377,6 +387,10 @@ fn select_next_context(
    let total_contexts: usize = contexts_list.iter().map(|q| q.len()).sum();
    let mut skipped_contexts = 0;

+    // NUMA-aware selection: remember cross-node fallback candidate.
+    let my_numa_node = percpu.numa_node.get();
+    let mut cross_node_fallback: Option<(usize, ArcContextLockWriteGuard)> = None;
+
    'priority: loop {
        i = (i + 1) % 40;
        total_iters += 1;
@@ -441,9 +455,44 @@ fn select_next_context(
            // Is this context runnable on this CPU?
            let sw = unsafe { update_runnable(&mut next_context_guard, cpu_id, switch_time) };
            if let UpdateResult::CanSwitch = sw {
+                // NUMA-aware selection: check if this context's last CPU was on the same node.
+                let same_node = if my_numa_node != u8::MAX {
+                    next_context_guard.cpu_id
+                        .map(|cid| {
+                            crate::percpu::get_for_cpu(cid)
+                                .map(|p| p.numa_node.get() == my_numa_node)
+                                .unwrap_or(false)
+                        })
+                        .unwrap_or(true) // New context (no last CPU) — treat as same node
+                } else {
+                    true // No NUMA info — treat all as same node
+                };
+
+                if same_node {
+                    // Cache-warm: select immediately
+                    percpu.current_prio.set(next_context_guard.prio);
                    next_context_guard_opt = Some(next_context_guard);
                    balance[i] -= SCHED_PRIO_TO_WEIGHT[20];
                    break 'priority;
+                } else {
+                    // Cross-node candidate: save as fallback, keep scanning for same-node
+                    if cross_node_fallback.is_none() {
+                        // Cache the priority for later; the balance is charged now
+                        cross_node_fallback =
+                            Some((next_context_guard.prio, next_context_guard));
+                        balance[i] -= SCHED_PRIO_TO_WEIGHT[20];
+                        // Don't break — keep looking for a same-node context
+                        continue;
+                    } else {
+                        // Already have a cross-node fallback; push this one back
+                        contexts.push_back(next_context_ref);
+                        skipped_contexts += 1;
+                        if skipped_contexts >= total_contexts {
+                            break 'priority;
+                        }
+                        continue;
+                    }
+                }
            } else {
                if matches!(sw, UpdateResult::Blocked) {
                    idle_contexts(token.token()).push_back(next_context_ref);
@@ -458,6 +507,15 @@ fn select_next_context(
            }
        }
    }
+
+    // If we found a cross-node fallback but no same-node context, use it
+    if next_context_guard_opt.is_none() {
+        if let Some((prio, guard)) = cross_node_fallback {
+            percpu.current_prio.set(prio);
+            next_context_guard_opt = Some(guard);
+        }
+    }
+
    percpu.balance.set(balance);
    percpu.last_queue.set(i);

@@ -465,7 +523,10 @@ fn select_next_context(
        // Send the old process to the back of the line (if it is still runnable)
        let prev_ctx = WeakContextRef(Arc::downgrade(&prev_context_lock));
        if prev_context_guard.status.is_runnable() {
-            let prio = prev_context_guard.prio;
+            let raw_prio = prev_context_guard.prio;
+            let prio = percpu.effective_prio(raw_prio);
+            // Clear PI donation — previous context is being re-queued
+            percpu.pi_donated_prio.store(u32::MAX, Ordering::Relaxed);
            contexts_list[prio].push_back(prev_ctx);
        } else {
            idle_contexts(token.token()).push_back(prev_ctx);
@@ -477,7 +538,8 @@ fn select_next_context(
        return Ok(Some(next_context_guard));
    } else {
        if !was_idle && !Arc::ptr_eq(&prev_context_lock, &idle_context) {
-            // We switch into the idle context
+            // Switching to idle context — cache lowest priority
+            percpu.current_prio.set(39);
            Ok(Some(unsafe { idle_context.write_arc() }))
        } else {
            // We found no other process to run.
@@ -494,6 +556,13 @@ pub struct ContextSwitchPercpu {
    switch_result: Cell<Option<SwitchResultInner>>,
    switch_time: Cell<u128>,
    pit_ticks: Cell<usize>,
+    /// Per-CPU context switch flag. Set to true during a context switch on this CPU.
+    /// Replaces the global CONTEXT_SWITCH_LOCK, removing cross-CPU serialization of switches.
+    in_context_switch: Cell<bool>,
+    /// Number of PIT ticks before triggering a context switch.
+    /// Default: 3 (≈6.75 ms). Lower values improve interactive responsiveness;
+    /// higher values improve throughput for batch/compute workloads.
+    preempt_interval: Cell<usize>,

    current_ctxt: RefCell<Option<Arc<ContextLock>>>,

@@ -508,6 +577,8 @@ impl ContextSwitchPercpu {
            switch_result: Cell::new(None),
            switch_time: Cell::new(0),
            pit_ticks: Cell::new(0),
+            in_context_switch: Cell::new(false),
+            preempt_interval: Cell::new(DEFAULT_PREEMPT_INTERVAL),
            current_ctxt: RefCell::new(None),
            idle_ctxt: RefCell::new(None),
            being_sigkilled: Cell::new(false),
@@ -42,17 +42,18 @@ impl core::fmt::Display for LogicalCpuId {
 }

 #[cfg(target_pointer_width = "64")]
-pub const MAX_CPU_COUNT: u32 = 128;
+pub const MAX_CPU_COUNT: u32 = 256;

 #[cfg(target_pointer_width = "32")]
 pub const MAX_CPU_COUNT: u32 = 32;

 const SET_WORDS: usize = (MAX_CPU_COUNT / usize::BITS) as usize;

-// TODO: Support more than 128 CPUs.
+// TODO: Support more than 256 CPUs.
 // The maximum number of CPUs on Linux is configurable, and the type for LogicalCpuSet and
 // LogicalCpuId may be optimized accordingly. In that case, box the mask if it's larger than some
-// base size (probably 256 bytes).
+// base size (probably 256 bytes). AMD EPYC parts with 128C/256T and Threadripper PRO with
+// 96C/192T fit within 256; larger or multi-socket systems can exceed it.
 #[derive(Debug)]
 pub struct LogicalCpuSet([AtomicUsize; SET_WORDS]);

@@ -70,6 +70,9 @@ mod log;
 /// Memory management
 mod memory;

+/// NUMA topology
+mod numa;
+
 /// Panic
 mod panic;

@@ -1,13 +1,15 @@
 /// NUMA topology hints for the kernel scheduler.
-/// NUMA discovery (SRAT/SLIT parsing) is performed by a userspace daemon
-/// (numad) via /scheme/acpi/, then pushed to the kernel via scheme:numa.
-/// The kernel stores a lightweight copy for O(1) scheduling lookups.
+///
+/// NUMA discovery (SRAT/SLIT parsing) is performed during kernel ACPI init
+/// (`acpi::init()`). The kernel stores a lightweight copy for O(1) scheduling
+/// lookups. If no SRAT is found, `init_default()` creates a single-node topology.
+use crate::acpi::srat;
 use crate::cpu_set::{LogicalCpuId, LogicalCpuSet};
 use core::sync::atomic::{AtomicBool, Ordering};

 const MAX_NUMA_NODES: usize = 8;

-#[derive(Clone, Debug)]
+#[derive(Debug)]
 pub struct NumaHint {
    pub node_id: u8,
    pub cpus: LogicalCpuSet,
@@ -21,17 +23,12 @@ pub struct NumaTopology {
 impl NumaTopology {
    pub const fn new() -> Self {
        const NONE: Option<NumaHint> = None;
-        Self {
-            nodes: [NONE; MAX_NUMA_NODES],
-            initialized: AtomicBool::new(false),
-        }
+        Self { nodes: [NONE; MAX_NUMA_NODES], initialized: AtomicBool::new(false) }
    }

    pub fn node_for_cpu(&self, cpu: LogicalCpuId) -> Option<u8> {
        for node in self.nodes.iter().flatten() {
-            if node.cpus.contains(cpu) {
-                return Some(node.node_id);
-            }
+            if node.cpus.contains(cpu) { return Some(node.node_id); }
        }
        None
    }
@@ -43,20 +40,42 @@ impl NumaTopology {

 static mut NUMA_TOPOLOGY: NumaTopology = NumaTopology::new();

-pub fn topology() -> &'static NumaTopology {
-    unsafe { &NUMA_TOPOLOGY }
-}
+pub fn topology() -> &'static NumaTopology { unsafe { &NUMA_TOPOLOGY } }

-pub fn init_default() {
+/// Initialize NUMA topology from SRAT data parsed during ACPI init.
+pub fn init_from_srat(apic_ids: &[(u32, LogicalCpuId)]) {
    let topo = topology();
-    if topo.initialized.swap(true, Ordering::AcqRel) {
-        return;
-    }
+    if topo.initialized.swap(true, Ordering::AcqRel) { return; }
+    if !srat::is_available() { init_default_inner(); return; }
    unsafe {
        let topo_mut = &mut *core::ptr::addr_of_mut!(NUMA_TOPOLOGY);
-        topo_mut.nodes[0] = Some(NumaHint {
-            node_id: 0,
-            cpus: LogicalCpuSet::all(),
-        });
+        for &(apic_id, cpu_id) in apic_ids {
+            if let Some(node) = srat::numa_node_for_apic(apic_id) {
+                let idx = node as usize;
+                if idx < MAX_NUMA_NODES {
+                    topo_mut.nodes[idx].get_or_insert_with(|| NumaHint { node_id: node, cpus: LogicalCpuSet::empty() }).cpus.atomic_set(cpu_id);
                }
+            }
+        }
+        if topo_mut.nodes.iter().all(|n| n.is_none()) {
+            topo_mut.nodes[0] = Some(NumaHint { node_id: 0, cpus: LogicalCpuSet::all() });
+        }
+    }
+    let node_count = topology().nodes.iter().filter(|n| n.is_some()).count();
+    debug!("NUMA: {node_count} node(s) from SRAT");
+}
+
+/// Fallback: single-node topology.
+pub fn init_default() {
+    let topo = topology();
+    if topo.initialized.swap(true, Ordering::AcqRel) { return; }
+    init_default_inner();
+}
+
+fn init_default_inner() {
+    unsafe {
+        let topo_mut = &mut *core::ptr::addr_of_mut!(NUMA_TOPOLOGY);
+        topo_mut.nodes[0] = Some(NumaHint { node_id: 0, cpus: LogicalCpuSet::all() });
+    }
+    debug!("NUMA: single-node topology (no SRAT)");
 }
@@ -4,9 +4,14 @@ use alloc::{
 };
 use core::{
    cell::{Cell, RefCell},
-    sync::atomic::{AtomicBool, AtomicPtr, Ordering},
+    hint,
+    sync::atomic::{AtomicBool, AtomicPtr, AtomicU32, AtomicU64, Ordering},
 };

+/// Maximum number of pages to flush individually using INVLPG before falling
+/// back to a full TLB flush (CR3 reload).
+const TLB_RANGE_THRESHOLD: u32 = 32;
+
 use rmm::Arch;
 use syscall::PtraceFlags;

@@ -16,7 +21,7 @@ use crate::{
    cpu_set::{LogicalCpuId, MAX_CPU_COUNT},
    cpu_stats::{CpuStats, CpuStatsData},
    ptrace::Session,
-    sync::CleanLockToken,
+    sync::{mcs::McsNode, mcs::McsRawLock, CleanLockToken},
    syscall::debug::SyscallDebugInfo,
 };

@@ -34,6 +39,38 @@ pub struct PercpuBlock {
    pub balance: Cell<[usize; 40]>,
    pub last_queue: Cell<usize>,

+    /// Per-CPU MCS node for the scheduler run-queue lock (RUN_CONTEXTS).
+    pub mcs_sched_node: McsNode,
+
+    /// Counts how many times the scheduler MCS lock acquisition was contended.
+    pub mcs_contention_count: Cell<u64>,
+
+    /// TLB shootdown range: start virtual address (page-aligned).
+    /// Set to 0 for a full flush. Only valid when `wants_tlb_shootdown` is true.
+    pub tlb_flush_start: AtomicU64,
+    /// TLB shootdown range: number of pages to invalidate.
+    pub tlb_flush_count: AtomicU32,
+
+    /// Priority inheritance donation. When another CPU is blocked waiting on a
+    /// lock this CPU holds, the blocked CPU may donate its priority here.
+    /// `u32::MAX` means no donation; otherwise it's a priority level (0-39).
+    pub pi_donated_prio: AtomicU32,
+
+    /// Cached priority of the currently-running context on this CPU.
+    /// Set by the scheduler when selecting a new context. Read by the MCS
+    /// lock during priority donation — avoids acquiring the context RwLock
+    /// from the spin loop. Default 39 (lowest priority).
+    pub current_prio: Cell<usize>,
+
+    /// NUMA proximity domain for this CPU. Set during ACPI init from SRAT.
+    /// `u8::MAX` means unknown (no SRAT or APIC ID not listed).
+    pub numa_node: Cell<u8>,
+
+    /// Pointer to the MCS lock this CPU is currently spinning on (for transitive PI).
+    /// `null` when not waiting on any lock. Set in McsRawLock::acquire() before
+    /// entering the spin loop, cleared upon acquisition.
+    pub waiting_on_lock: AtomicPtr<McsRawLock>,
+
    // TODO: Put mailbox queues here, e.g. for TLB shootdown? Just be sure to 128-byte align it
    // first to avoid cache invalidation.
    pub profiling: Option<&'static crate::profiling::RingBuffer>,
@@ -57,6 +94,15 @@ pub unsafe fn init_tlb_shootdown(id: LogicalCpuId, block: *mut PercpuBlock) {
    ALL_PERCPU_BLOCKS[id.get() as usize].store(block, Ordering::Release)
 }

+/// Get a reference to another CPU's PercpuBlock by logical CPU ID.
+pub fn get_for_cpu(id: LogicalCpuId) -> Option<&'static PercpuBlock> {
+    unsafe {
+        ALL_PERCPU_BLOCKS[id.get() as usize]
+            .load(Ordering::Acquire)
+            .as_ref()
+    }
+}
+
 pub fn get_all_stats() -> Vec<(LogicalCpuId, CpuStatsData)> {
    let mut res = ALL_PERCPU_BLOCKS
        .iter()
@@ -101,25 +147,148 @@ pub fn shootdown_tlb_ipi(target: Option<LogicalCpuId>) {
                core::hint::spin_loop();
            }
        }
+        // Full flush: clear range info. NOTE: these stores follow the flag swap, so a
+        // target that polls the flag may still read the previous request's range.
+        percpublock.tlb_flush_start.store(0, Ordering::Release);
+        percpublock.tlb_flush_count.store(0, Ordering::Release);

        crate::ipi::ipi_single(crate::ipi::IpiKind::Tlb, percpublock);
    } else {
+        // Broadcast TLB shootdown: set flag on all other CPUs, then send a single
+        // IPI with "all except self" destination shorthand instead of N individual IPIs.
+        let my_percpublock = PercpuBlock::current();
        for id in 0..crate::cpu_count() {
-            // TODO: Optimize: use global counter and percpu ack counters, send IPI using
-            // destination shorthand "all CPUs".
-            shootdown_tlb_ipi(Some(LogicalCpuId::new(id)));
+            let target_id = LogicalCpuId::new(id);
+            if target_id == my_percpublock.cpu_id {
+                continue;
            }
+            let Some(percpublock) = (unsafe {
+                ALL_PERCPU_BLOCKS[id as usize]
+                    .load(Ordering::Acquire)
+                    .as_ref()
+            }) else {
+                continue;
+            };
+            // Wait if this CPU still has a pending shootdown from a previous request
+            #[expect(clippy::bool_comparison)]
+            while percpublock
+                .wants_tlb_shootdown
+                .swap(true, Ordering::Release)
+                == true
+            {
+                while percpublock.wants_tlb_shootdown.load(Ordering::Relaxed) == true {
+                    my_percpublock.maybe_handle_tlb_shootdown();
+                    hint::spin_loop();
+                }
+            }
+            // Full flush — clear range info (Release ordering)
+            percpublock.tlb_flush_start.store(0, Ordering::Release);
+            percpublock.tlb_flush_count.store(0, Ordering::Release);
+        }
+        // Single broadcast IPI to all other CPUs using destination shorthand
+        crate::ipi::ipi(crate::ipi::IpiKind::Tlb, crate::ipi::IpiTarget::Other);
+    }
+}
+
+/// Range-based TLB shootdown IPI. Only invalidates the specified virtual address
+/// range using INVLPG per page for ranges up to TLB_RANGE_THRESHOLD pages.
+/// Falls back to full flush for larger ranges.
+pub fn shootdown_tlb_ipi_range(target: Option<LogicalCpuId>, start: usize, count: usize) {
+    if cfg!(not(feature = "multi_core")) {
+        return;
+    }
+
+    let start_aligned = start as u64 & !0xFFF;
+    let count_u32 = u32::try_from(count).unwrap_or(u32::MAX);
+    let use_range = count_u32 > 0 && count_u32 <= TLB_RANGE_THRESHOLD;
+
+    let set_range = |percpublock: &PercpuBlock| {
+        if use_range {
+            percpublock.tlb_flush_start.store(start_aligned, Ordering::Release);
+            percpublock.tlb_flush_count.store(count_u32, Ordering::Release);
+        } else {
+            percpublock.tlb_flush_start.store(0, Ordering::Release);
+            percpublock.tlb_flush_count.store(0, Ordering::Release);
+        }
+    };
+
+    if let Some(target) = target {
+        let my_percpublock = PercpuBlock::current();
+        assert_ne!(target, my_percpublock.cpu_id);
+
+        let Some(percpublock) = (unsafe {
+            ALL_PERCPU_BLOCKS[target.get() as usize]
+                .load(Ordering::Acquire)
+                .as_ref()
+        }) else {
+            return;
+        };
+        #[expect(clippy::bool_comparison)]
+        while percpublock.wants_tlb_shootdown.swap(true, Ordering::Release) == true {
+            while percpublock.wants_tlb_shootdown.load(Ordering::Relaxed) == true {
+                my_percpublock.maybe_handle_tlb_shootdown();
+                hint::spin_loop();
+            }
+        }
+        set_range(percpublock);
+        crate::ipi::ipi_single(crate::ipi::IpiKind::Tlb, percpublock);
+    } else {
+        let my_percpublock = PercpuBlock::current();
+        for id in 0..crate::cpu_count() {
+            let target_id = LogicalCpuId::new(id);
+            if target_id == my_percpublock.cpu_id {
+                continue;
+            }
+            let Some(percpublock) = (unsafe {
+                ALL_PERCPU_BLOCKS[id as usize]
+                    .load(Ordering::Acquire)
+                    .as_ref()
+            }) else {
+                continue;
+            };
+            #[expect(clippy::bool_comparison)]
+            while percpublock.wants_tlb_shootdown.swap(true, Ordering::Release) == true {
+                while percpublock.wants_tlb_shootdown.load(Ordering::Relaxed) == true {
+                    my_percpublock.maybe_handle_tlb_shootdown();
+                    hint::spin_loop();
+                }
+            }
+            set_range(percpublock);
+        }
+        crate::ipi::ipi(crate::ipi::IpiKind::Tlb, crate::ipi::IpiTarget::Other);
    }
 }
 impl PercpuBlock {
+    /// Return the effective scheduling priority, accounting for priority inheritance.
+    /// Lower number = higher priority (0-39 range).
+    pub fn effective_prio(&self, context_prio: usize) -> usize {
+        let donated = self.pi_donated_prio.load(Ordering::Relaxed);
+        if donated < context_prio as u32 {
+            donated as usize
+        } else {
+            context_prio
+        }
+    }
+
    pub fn maybe_handle_tlb_shootdown(&self) {
        #[expect(clippy::bool_comparison)]
        if self.wants_tlb_shootdown.swap(false, Ordering::Relaxed) == false {
            return;
        }

-        // TODO: Finer-grained flush
+        let start = self.tlb_flush_start.load(Ordering::Acquire);
+        let count = self.tlb_flush_count.load(Ordering::Acquire);
+
+        if start != 0 && count > 0 && count <= TLB_RANGE_THRESHOLD {
+            // Range-based flush using INVLPG per page — cheaper than full CR3 reload.
+            for i in 0..count {
+                let addr = start + (i as u64) * 4096;
+                crate::memory::RmmA::invalidate(rmm::VirtualAddress::new(addr as usize));
+            }
+        } else {
+            // Full TLB flush (CR3 reload) for large ranges or global shootdowns.
            crate::memory::RmmA::invalidate_all();
+        }

        if let Some(addrsp) = &*self.current_addrsp.borrow() {
            addrsp.tlb_ack.fetch_add(1, Ordering::Release);
@@ -189,6 +358,14 @@ impl PercpuBlock {
            wants_tlb_shootdown: AtomicBool::new(false),
            balance: Cell::new([0; 40]),
            last_queue: Cell::new(39),
+            mcs_sched_node: McsNode::new(),
+            mcs_contention_count: Cell::new(0),
+            tlb_flush_start: AtomicU64::new(0),
+            tlb_flush_count: AtomicU32::new(0),
+            pi_donated_prio: AtomicU32::new(u32::MAX),
+            current_prio: Cell::new(39),
+            numa_node: Cell::new(u8::MAX),
+            waiting_on_lock: AtomicPtr::new(core::ptr::null_mut()),
            ptrace_flags: Cell::new(PtraceFlags::empty()),
            ptrace_session: RefCell::new(None),
            inside_syscall: Cell::new(false),
@@ -18,6 +18,9 @@ use syscall::{
 use crate::context::file::InternalFlags;

 use super::{CallerCtx, HandleMap, OpenResult, SchemeExt, StrOrBytes};
+#[cfg(any(target_arch = "x86_64", target_arch = "x86"))]
+use crate::arch::device::{ioapic, local_apic::ApicId};
+
 #[cfg(any(target_arch = "x86_64", target_arch = "x86"))]
 use crate::arch::interrupt::{available_irqs_iter, irq::acknowledge, is_reserved, set_reserved};
 #[cfg(any(target_arch = "aarch64", target_arch = "riscv64"))]
@@ -80,7 +83,7 @@ pub fn irq_trigger(irq: u8, token: &mut CleanLockToken) {
 #[allow(dead_code)]
 enum Handle {
    SchemeRoot,
-    Irq { ack: AtomicUsize, irq: u8 },
+    Irq { ack: AtomicUsize, irq: u8, cpu_id: LogicalCpuId },
    Avail(LogicalCpuId),
    TopLevel,
    Phandle(u8, Vec<u8>),
@@ -90,7 +93,7 @@ enum Handle {
 impl Handle {
    fn as_irq_handle(&self) -> Option<(&AtomicUsize, u8)> {
        match self {
-            &Self::Irq { ref ack, irq } => Some((ack, irq)),
+            &Self::Irq { ref ack, irq, cpu_id: _ } => Some((ack, irq)),
            _ => None,
        }
    }
@@ -144,6 +147,7 @@ impl IrqScheme {
                    Handle::Irq {
                        ack: AtomicUsize::new(0),
                        irq: irq_number,
+                        cpu_id: LogicalCpuId::BSP,
                    },
                    InternalFlags::empty(),
                )
@@ -162,6 +166,7 @@ impl IrqScheme {
                    Handle::Irq {
                        ack: AtomicUsize::new(0),
                        irq: irq_number,
+                        cpu_id,
                    },
                    InternalFlags::empty(),
                )
@@ -203,6 +208,7 @@ impl IrqScheme {
                    Handle::Irq {
                        ack: AtomicUsize::new(0),
                        irq: irq_number as u8,
+                        cpu_id: LogicalCpuId::BSP,
                    },
                    InternalFlags::empty(),
                )
@@ -346,6 +352,7 @@ impl crate::scheme::KernelScheme for IrqScheme {
                    Handle::Irq {
                        ack: AtomicUsize::new(0),
                        irq: plain_irq_number,
+                        cpu_id: LogicalCpuId::BSP,
                    },
                    InternalFlags::empty(),
                )
@@ -401,6 +408,7 @@ impl crate::scheme::KernelScheme for IrqScheme {
                }
            }
            Handle::Avail(cpu_id) => {
+                let mut listed = 0;
                for vector in available_irqs_iter(cpu_id).skip(opaque) {
                    let irq = vector_to_irq(vector);
                    if cpu_id == LogicalCpuId::BSP && irq < BASE_IRQ_COUNT {
@@ -414,7 +422,9 @@ impl crate::scheme::KernelScheme for IrqScheme {
                        name: &intermediate,
                        next_opaque_id: u64::from(vector) + 1,
                    })?;
+                    listed += 1;
                }
+                debug!("irq getdents Avail: cpu_id={} opaque={} listed={}", cpu_id.get(), opaque, listed);
            }
            _ => return Err(Error::new(ENOTDIR)),
        }
@@ -449,11 +459,14 @@ impl crate::scheme::KernelScheme for IrqScheme {
        let handle = handles_guard.get(id)?;

        if let &Handle::Irq {
-            irq: handle_irq, ..
+            irq: handle_irq,
+            cpu_id: handle_cpu_id,
+            ..
        } = handle
            && handle_irq > BASE_IRQ_COUNT
        {
-            set_reserved(LogicalCpuId::BSP, irq_to_vector(handle_irq), false);
+            debug!("irq close: unreserving vector {} on cpu_id={}", irq_to_vector(handle_irq), handle_cpu_id.get());
+            set_reserved(handle_cpu_id, irq_to_vector(handle_irq), false);
        }
        Ok(())
    }
@@ -480,12 +493,21 @@ impl crate::scheme::KernelScheme for IrqScheme {
                if !cpus.contains(&(cpu_id as u8)) {
                    return Err(Error::new(EINVAL));
                }
+                // Reprogram the IOAPIC redirection entry for x86 targets.
+                // Non-IOAPIC IRQs (e.g. MSI) will return false -> EIO.
+                #[cfg(any(target_arch = "x86_64", target_arch = "x86"))]
+                {
+                    if !unsafe { ioapic::set_affinity(_handle_irq, ApicId::new(cpu_id)) } {
+                        return Err(Error::new(EIO));
+                    }
+                }
                mask.store(cpu_id as usize, Ordering::Release);
                Ok(size_of::<u32>())
            }
            &Handle::Irq {
                irq: handle_irq,
                ack: ref handle_ack,
+                cpu_id: _,
            } => {
                if buffer.len() < size_of::<usize>() {
                    return Err(Error::new(EINVAL));
@@ -600,6 +622,7 @@ impl crate::scheme::KernelScheme for IrqScheme {
            Handle::Irq {
                irq: handle_irq,
                ack: ref handle_ack,
+                cpu_id: _,
            } => {
                if buffer.len() < size_of::<usize>() {
                    return Err(Error::new(EINVAL));
@@ -0,0 +1,15 @@
+use alloc::vec::Vec;
+
+use crate::{
+    arch::cpuidle,
+    sync::CleanLockToken,
+    syscall::error::Result,
+};
+
+pub fn resource(_token: &mut CleanLockToken) -> Result<Vec<u8>> {
+    cpuidle::resource()
+}
+
+pub fn policy_write(buf: &[u8], _token: &mut CleanLockToken) -> Result<usize> {
+    cpuidle::policy_write(buf)
+}
@@ -45,6 +45,11 @@ enum Handle {
        data: Arc<RwLock<L1, Option<Vec<u8>>>>,
    },
    SchemeRoot,
+    #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
+    Msr {
+        cpu: usize,
+        msr: u32,
+    },
 }

 #[derive(Clone, Copy)]
@@ -133,6 +138,28 @@ impl KernelScheme for SysScheme {
            let id = HANDLES.write(token.token()).insert(Handle::TopLevel);

            Ok(OpenResult::SchemeLocal(id, InternalFlags::POSITIONED))
+        } else if path.starts_with("msr/") {
+            #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
+            {
+                if ctx.uid != 0 {
+                    return Err(Error::new(EPERM));
+                }
+                let rest = &path[4..];
+                let mut parts = rest.split('/');
+                let cpu_str = parts.next().ok_or(Error::new(EINVAL))?;
+                let msr_str = parts.next().ok_or(Error::new(EINVAL))?;
+                if parts.next().is_some() {
+                    return Err(Error::new(EINVAL));
+                }
+                let cpu: usize = cpu_str.parse().map_err(|_| Error::new(EINVAL))?;
+                let msr: u32 = u32::from_str_radix(msr_str, 16).map_err(|_| Error::new(EINVAL))?;
+                let id = HANDLES.write(token.token()).insert(Handle::Msr { cpu, msr });
+                Ok(OpenResult::SchemeLocal(id, InternalFlags::POSITIONED))
+            }
+            #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
+            {
+                Err(Error::new(ENOENT))
+            }
        } else {
            //Have to iterate to get the path without allocation
            let entry = FILES
@@ -160,6 +187,8 @@ impl KernelScheme for SysScheme {
                Handle::TopLevel => return Ok(0),
                Handle::Resource { kind, data, .. } => (*kind, data.clone()),
                Handle::SchemeRoot => return Err(Error::new(EBADF)),
+                #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
+                Handle::Msr { .. } => return Ok(0),
            }
        };
        if matches!(kind, Kind::Wr(_)) {
@@ -188,6 +217,16 @@ impl KernelScheme for SysScheme {
            Handle::TopLevel => "",
            Handle::Resource { path, .. } => path,
            Handle::SchemeRoot => return Err(Error::new(EBADF)),
+            #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
+            Handle::Msr { cpu, msr } => {
+                const FIRST: &[u8] = b"sys:msr/";
+                let mut bytes_read = buf.copy_common_bytes_from_slice(FIRST)?;
+                let suffix = format!("{}/{:x}", cpu, msr);
+                if let Some(remaining) = buf.advance(FIRST.len()) {
+                    bytes_read += remaining.copy_common_bytes_from_slice(suffix.as_bytes())?;
+                }
+                return Ok(bytes_read);
+            }
        };

        const FIRST: &[u8] = b"sys:";
@@ -215,6 +254,15 @@ impl KernelScheme for SysScheme {
        let (kind, data_lock) = {
            match HANDLES.read(token.token()).get(id)? {
                Handle::Resource { kind, data, .. } => (*kind, data.clone()),
+                #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
+                Handle::Msr { cpu, msr } => {
+                    if *cpu != crate::cpu_id().get() as usize {
+                        return Err(Error::new(EINVAL));
+                    }
+                    let val = unsafe { x86::msr::rdmsr(*msr) };
+                    let data = format!("{:016x}\n", val).into_bytes();
+                    return buffer.copy_common_bytes_from_slice(data.get(pos..).unwrap_or(&[]));
+                }
                _ => return Err(Error::new(EBADF)),
            }
        };
@@ -253,6 +301,18 @@ impl KernelScheme for SysScheme {
                let len = buffer.copy_common_bytes_to_slice(&mut intermediate)?;
                (*handler, intermediate, len)
            }
+            #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
+            Handle::Msr { cpu, msr } => {
+                if *cpu != crate::cpu_id().get() as usize {
+                    return Err(Error::new(EINVAL));
+                }
+                let mut intermediate = [0_u8; 32];
+                let len = buffer.copy_common_bytes_to_slice(&mut intermediate)?;
+                let val_str = core::str::from_utf8(&intermediate[..len]).map_err(|_| Error::new(EINVAL))?;
+                let val = u64::from_str_radix(val_str.trim(), 16).map_err(|_| Error::new(EINVAL))?;
+                unsafe { x86::msr::wrmsr(*msr, val); }
+                return Ok(len);
+            }
            Handle::SchemeRoot => return Err(Error::new(EBADF)),
        };
        handler(&intermediate[..len], token)
@@ -269,7 +329,9 @@ impl KernelScheme for SysScheme {
            return Ok(0);
        };
        match HANDLES.read(token.token()).get(id)? {
            Handle::Resource { .. } => Err(Error::new(ENOTDIR)),
+            #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
+            Handle::Msr { .. } => Err(Error::new(ENOTDIR)),
            Handle::TopLevel => {
                let mut buf = DirentBuf::new(buf, header_size).ok_or(Error::new(EIO))?;
                for (this_idx, (name, _)) in FILES.iter().enumerate().skip(first_index) {
@@ -293,6 +354,18 @@ impl KernelScheme for SysScheme {
                Handle::Resource { kind, data, .. } => Some((*kind, data.clone())),
                Handle::TopLevel => None,
                Handle::SchemeRoot => return Err(Error::new(EBADF)),
+                #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
+                Handle::Msr { .. } => {
+                    let stat = Stat {
+                        st_mode: 0o600 | MODE_FILE,
+                        st_uid: 0,
+                        st_gid: 0,
+                        st_size: 0,
+                        ..Default::default()
+                    };
+                    buf.copy_exactly(&stat)?;
+                    return Ok(());
+                }
            }
        };
        let stat = if let Some((kind, data_lock)) = stat_base {
@@ -0,0 +1,188 @@
+//! MCS (Mellor-Crummey Scott) fair spinlock.
+//!
+//! Each waiter spins on its own local `locked` flag instead of a shared lock
+//! word, eliminating cache-line bouncing under contention. FIFO ordering
+//! guarantees fairness. O(1) cache-line transfers on unlock.
+//!
+//! Supports transitive priority inheritance: when CPU A waits on a lock held
+//! by CPU B, and CPU B waits on a lock held by CPU C, A's priority is
+//! propagated through the chain to C (up to MAX_PI_CHAIN_DEPTH hops).
+
+use core::sync::atomic::{AtomicBool, AtomicPtr, AtomicU32, Ordering};
+use core::{hint, ptr};
+
+use crate::percpu::PercpuBlock;
+
+/// Maximum depth for transitive priority inheritance chain following.
+/// Prevents infinite loops from theoretical lock cycles and bounds latency.
+/// Linux's rtmutex defaults to `max_lock_depth` = 1024; 8 is conservative for a microkernel.
+const MAX_PI_CHAIN_DEPTH: u32 = 8;
+
+/// A node in the MCS lock queue.
+pub struct McsNode {
+    pub next: AtomicPtr<McsNode>,
+    pub locked: AtomicBool,
+}
+
+impl McsNode {
+    pub const fn new() -> Self {
+        Self {
+            next: AtomicPtr::new(ptr::null_mut()),
+            locked: AtomicBool::new(false),
+        }
+    }
+}
+
+/// Raw MCS spinlock primitive.
+pub struct McsRawLock {
+    tail: AtomicPtr<McsNode>,
+    /// CPU ID of the current lock holder (for priority inheritance).
+    /// `u32::MAX` means no holder.
+    holder_cpu: AtomicU32,
+}
+
+impl McsRawLock {
+    pub const fn new() -> Self {
+        Self {
+            tail: AtomicPtr::new(ptr::null_mut()),
+            holder_cpu: AtomicU32::new(u32::MAX),
+        }
+    }
+
+    #[inline]
+    pub fn acquire(&self, node: &McsNode) -> bool {
+        node.next.store(ptr::null_mut(), Ordering::Relaxed);
+        node.locked.store(true, Ordering::Relaxed);
+        let prev = self.tail.swap((node as *const McsNode).cast_mut(), Ordering::AcqRel);
+        if prev.is_null() {
+            // Uncontended — record ourselves as holder
+            let cpu_id = PercpuBlock::current().cpu_id.get();
+            self.holder_cpu.store(cpu_id, Ordering::Release);
+            return false;
+        }
+        unsafe {
+            (*prev).next.store((node as *const McsNode).cast_mut(), Ordering::Release);
+        }
+        let percpu = PercpuBlock::current();
+        // Record which lock we're spinning on (for transitive PI chain following)
+        percpu.waiting_on_lock.store(
+            (self as *const McsRawLock).cast_mut(),
+            Ordering::Release,
+        );
+        let mut donated = false;
+        while node.locked.load(Ordering::Acquire) {
+            percpu.maybe_handle_tlb_shootdown();
+            // Donate priority to the lock holder (transitively) once per acquisition
+            if !donated {
+                self.maybe_donate_priority(percpu);
+                donated = true;
+            }
+            hint::spin_loop();
+        }
+        // Clear waiting_on_lock before proceeding — we now hold the lock
+        percpu.waiting_on_lock.store(ptr::null_mut(), Ordering::Release);
+        self.holder_cpu.store(percpu.cpu_id.get(), Ordering::Release);
+        true
+    }
+
+    #[inline]
+    pub fn release(&self, node: &McsNode) {
+        // Clear priority inheritance donation — we no longer hold the lock
+        PercpuBlock::current().pi_donated_prio.store(u32::MAX, Ordering::Release);
+        // Clear holder CPU
+        self.holder_cpu.store(u32::MAX, Ordering::Release);
+
+        let next = node.next.load(Ordering::Acquire);
+        if next.is_null() {
+            if self
+                .tail
+                .compare_exchange(
+                    (node as *const McsNode).cast_mut(),
+                    ptr::null_mut(),
+                    Ordering::AcqRel,
+                    Ordering::Acquire,
+                )
+                .is_ok()
+            {
+                return;
+            }
+            while node.next.load(Ordering::Acquire).is_null() {
+                hint::spin_loop();
+            }
+        }
+        unsafe {
+            (*node.next.load(Ordering::Acquire)).locked.store(false, Ordering::Release);
+        }
+    }
+
+    #[inline]
+    pub fn try_acquire(&self, node: &McsNode) -> bool {
+        node.next.store(ptr::null_mut(), Ordering::Relaxed);
+        node.locked.store(true, Ordering::Relaxed);
+        let ok = self
+            .tail
+            .compare_exchange(
+                ptr::null_mut(),
+                (node as *const McsNode).cast_mut(),
+                Ordering::AcqRel,
+                Ordering::Acquire,
+            )
+            .is_ok();
+        if ok {
+            let cpu_id = PercpuBlock::current().cpu_id.get();
+            self.holder_cpu.store(cpu_id, Ordering::Release);
+        }
+        ok
+    }
+
+    /// Donate current CPU's context priority to the lock holder's CPU,
+    /// following the PI chain transitively (A→B→C).
+    ///
+    /// Reads priority from PercpuBlock::current_prio (cached by the scheduler)
+    /// to avoid acquiring any lock in the MCS spin loop.
+    ///
+    /// Chain following: if the holder is itself waiting on another lock,
+    /// we propagate our priority to that lock's holder too, up to
+    /// MAX_PI_CHAIN_DEPTH hops.
+    fn maybe_donate_priority(&self, my_percpu: &PercpuBlock) {
+        let my_prio = my_percpu.current_prio.get() as u32;
+        let mut current_holder_cpu = self.holder_cpu.load(Ordering::Relaxed);
+
+        for _ in 0..MAX_PI_CHAIN_DEPTH {
+            if current_holder_cpu == u32::MAX {
+                return;
+            }
+            let holder_percpu = crate::percpu::get_for_cpu(
+                crate::cpu_set::LogicalCpuId::new(current_holder_cpu),
+            );
+            let Some(holder) = holder_percpu else {
+                return;
+            };
+
+            // Donate if our priority is higher (lower number) than the current
+            // donation. `fetch_min` performs the compare-and-update atomically, so
+            // two CPUs donating concurrently cannot overwrite a higher-priority
+            // donation with a lower-priority one.
+            holder.pi_donated_prio.fetch_min(my_prio, Ordering::Release);
+
+            // Follow the chain: is this holder also waiting on another lock?
+            let next_lock_ptr = holder.waiting_on_lock.load(Ordering::Relaxed);
+            if next_lock_ptr.is_null() {
+                return;
+            }
+            // SAFETY: The pointed-to McsRawLock is a long-lived struct field
+            // (e.g., part of the run queue). The holder is currently spinning
+            // in acquire(), so the pointer is valid. We only read holder_cpu
+            // (an atomic u32) — no mutable access needed.
+            let next_holder_cpu =
+                unsafe { (*next_lock_ptr).holder_cpu.load(Ordering::Relaxed) };
+
+            // Cycle detection: if the next holder is the same CPU we just visited, stop
+            if next_holder_cpu == current_holder_cpu {
+                return;
+            }
+            current_holder_cpu = next_holder_cpu;
+        }
+        // Chain depth exhausted — stop to bound latency
+    }
+}
@@ -1,5 +1,6 @@
 pub use self::{ordered::*, wait_condition::WaitCondition, wait_queue::WaitQueue};

+pub mod mcs;
 pub mod ordered;
 pub mod wait_condition;
 pub mod wait_queue;
@@ -52,7 +52,9 @@
 //! *g1 = 12;
 //! ```
 use alloc::sync::Arc;
+use core::cell::UnsafeCell;
 use core::marker::PhantomData;
+use core::ptr;

 use crate::percpu::PercpuBlock;

@@ -732,3 +734,143 @@ impl<L: Level, T> Drop for ArcRwLockWriteGuard<L, T> {
 /// This function can only be called if no lock is held by the calling thread/task
 #[inline]
 pub fn check_no_locks(_: LockToken<'_, L0>) {}
+
+// ---------------------------------------------------------------------------
+// MCS-based fair mutex (McsMutex)
+// ---------------------------------------------------------------------------
+
+/// A mutual exclusion lock using the MCS fair spinlock algorithm.
+///
+/// Unlike `Mutex<L, T>`, which uses a simple spinlock (no fairness under
+/// contention), `McsMutex` uses Mellor-Crummey and Scott (MCS) queue-based spinning:
+///
+/// - Each waiter spins on its **own** local flag — no shared cache-line bouncing.
+/// - FIFO ordering prevents starvation.
+/// - O(1) cache-line transfers on unlock.
+///
+/// The MCS node is stored in [`crate::percpu::PercpuBlock::mcs_sched_node`], so
+/// this type is suitable for scheduler-internal locks where the holder is always
+/// the current CPU.
+pub struct McsMutex<L: Level, T> {
+    raw: crate::sync::mcs::McsRawLock,
+    data: UnsafeCell<T>,
+    _phantom: PhantomData<L>,
+}
+
+unsafe impl<L: Level, T: Send> Sync for McsMutex<L, T> {}
+unsafe impl<L: Level, T: Send> Send for McsMutex<L, T> {}
+
+impl<L: Level, T> McsMutex<L, T> {
+    pub const fn new(val: T) -> Self {
+        Self {
+            raw: crate::sync::mcs::McsRawLock::new(),
+            data: UnsafeCell::new(val),
+            _phantom: PhantomData,
+        }
+    }
+}
+
+impl<L: Level, T> McsMutex<L, T> {
+    pub fn lock<'a, LP: Lower<L> + 'a>(
+        &'a self,
+        lock_token: LockToken<'a, LP>,
+    ) -> McsMutexGuard<'a, L, T> {
+        let percpu = PercpuBlock::current();
+        let contended = self.raw.acquire(&percpu.mcs_sched_node);
+        if contended {
+            percpu
+                .mcs_contention_count
+                .set(percpu.mcs_contention_count.get() + 1);
+        }
+        McsMutexGuard {
+            lock: self,
+            lock_token: LockToken::downgraded(lock_token),
+        }
+    }
+
+    pub fn try_lock<'a, LP: Lower<L> + 'a>(
+        &'a self,
+        lock_token: LockToken<'a, LP>,
+    ) -> Option<McsMutexGuard<'a, L, T>> {
+        let percpu = PercpuBlock::current();
+        if self.raw.try_acquire(&percpu.mcs_sched_node) {
+            Some(McsMutexGuard {
+                lock: self,
+                lock_token: LockToken::downgraded(lock_token),
+            })
+        } else {
+            None
+        }
+    }
+}
+
+pub struct McsMutexGuard<'a, L: Level, T: 'a> {
+    lock: &'a McsMutex<L, T>,
+    lock_token: LockToken<'a, L>,
+}
+
+impl<'a, L: Level, T: 'a> McsMutexGuard<'a, L, T> {
+    pub fn token_split(&mut self) -> (&mut T, LockToken<'_, L>) {
+        unsafe { (&mut *self.lock.data.get(), self.lock_token.token()) }
+    }
+
+    pub fn into_split(self) -> (McsRawGuard<'a, L, T>, LockToken<'a, L>) {
+        let lock_ref = self.lock;
+        let token = unsafe { core::ptr::read(&self.lock_token) };
+        core::mem::forget(self);
+        (McsRawGuard { lock: lock_ref }, token)
+    }
+
+    pub fn from_split(raw: McsRawGuard<'a, L, T>, token: LockToken<'a, L>) -> Self {
+        let lock_ref = raw.lock;
+        core::mem::forget(raw);
+        Self {
+            lock: lock_ref,
+            lock_token: token,
+        }
+    }
+}
+
+impl<L: Level, T> core::ops::Deref for McsMutexGuard<'_, L, T> {
+    type Target = T;
+    fn deref(&self) -> &Self::Target {
+        unsafe { &*self.lock.data.get() }
+    }
+}
+
+impl<L: Level, T> core::ops::DerefMut for McsMutexGuard<'_, L, T> {
+    fn deref_mut(&mut self) -> &mut Self::Target {
+        unsafe { &mut *self.lock.data.get() }
+    }
+}
+
+impl<L: Level, T> Drop for McsMutexGuard<'_, L, T> {
+    fn drop(&mut self) {
+        let percpu = PercpuBlock::current();
+        self.lock.raw.release(&percpu.mcs_sched_node);
+    }
+}
+
+pub struct McsRawGuard<'a, L: Level, T: 'a> {
+    lock: &'a McsMutex<L, T>,
+}
+
+impl<L: Level, T> core::ops::Deref for McsRawGuard<'_, L, T> {
+    type Target = T;
+    fn deref(&self) -> &Self::Target {
+        unsafe { &*self.lock.data.get() }
+    }
+}
+
+impl<L: Level, T> core::ops::DerefMut for McsRawGuard<'_, L, T> {
+    fn deref_mut(&mut self) -> &mut Self::Target {
+        unsafe { &mut *self.lock.data.get() }
+    }
+}
+
+impl<L: Level, T> Drop for McsRawGuard<'_, L, T> {
+    fn drop(&mut self) {
+        let percpu = PercpuBlock::current();
+        self.lock.raw.release(&percpu.mcs_sched_node);
+    }
+}
@@ -28,6 +28,11 @@ use crate::{
    sync::CleanLockToken,
 };

+/// Local syscall numbers not yet in the redox_syscall crate.
+/// These are allocated from the 987+ range to avoid collisions with crate numbers.
+pub const SYS_SCHED_SETAFFINITY: usize = 987;
+pub const SYS_SCHED_GETAFFINITY: usize = 988;
+
 /// Debug
 pub mod debug;

@@ -220,6 +225,10 @@ pub fn syscall(
                unlinkat(fd, UserSlice::ro(c, d)?, e, f as _, g as _, token).map(|()| 0)
            }
            SYS_YIELD => sched_yield(token).map(|()| 0),
+
+            // P17-3: CPU affinity syscalls. Numbers allocated locally (not yet in redox_syscall crate).
+            SYS_SCHED_SETAFFINITY => sched_setaffinity(b, UserSlice::ro(c, d)?, token),
+            SYS_SCHED_GETAFFINITY => sched_getaffinity(b, UserSlice::wo(c, d)?, token),
            SYS_NANOSLEEP => nanosleep(
                UserSlice::ro(b, size_of::<TimeSpec>())?,
                UserSlice::wo(c, size_of::<TimeSpec>())?.none_if_null(),
@@ -11,6 +11,7 @@ use crate::{
        memory::{AddrSpace, Grant, PageSpan},
        ContextRef,
    },
+    cpu_set::RawMask,
    event,
    sync::{CleanLockToken, RwLock},
    syscall::flag::{EventFlags, O_CREAT, O_RDWR},
@@ -295,3 +296,71 @@ fn insert_fd(scheme: SchemeId, number: usize, cloexec: bool, token: &mut CleanLo
        .expect("failed to insert fd to current context")
        .get()
 }
+
+/// Set CPU affinity mask for a process.
+///
+/// # Arguments (syscall ABI)
+/// - `pid`: Process ID (0 = current process; other PIDs not yet supported)
+/// - `mask_ptr`: Pointer to a `RawMask` (32 bytes on 64-bit, 256-bit bitmap)
+/// - `mask_len`: Length of mask in bytes (must equal `size_of::<RawMask>()`)
+pub fn sched_setaffinity(
+    pid: usize,
+    mask_ptr: super::usercopy::UserSliceRo,
+    token: &mut CleanLockToken,
+) -> Result<usize> {
+    // Validate mask size
+    if mask_ptr.len() != core::mem::size_of::<RawMask>() {
+        return Err(Error::new(super::error::EINVAL));
+    }
+
+    // pid == 0 means current process
+    let target = if pid == 0 {
+        context::current()
+    } else {
+        // TODO: Support PID-based lookup (requires context list iteration
+        // with lock token downgrades). For now, only pid=0 is supported.
+        return Err(Error::new(super::error::ESRCH));
+    };
+
+    // Read mask from userspace
+    let raw_mask: RawMask = unsafe { mask_ptr.read_exact() }?;
+
+    // Apply to context's affinity mask
+    let mut ctx = target.write(token.token());
+    ctx.sched_affinity.override_from(&raw_mask);
+
+    Ok(0)
+}
+
+/// Get CPU affinity mask for a process.
+///
+/// # Arguments (syscall ABI)
+/// - `pid`: Process ID (0 = current process; other PIDs not yet supported)
+/// - `mask_ptr`: Pointer to a `RawMask` buffer (32 bytes on 64-bit)
+/// - `mask_len`: Length of buffer in bytes (must equal `size_of::<RawMask>()`)
+///
+/// # Returns
+/// Number of bytes written to mask_ptr on success.
+pub fn sched_getaffinity(
+    pid: usize,
+    mask_ptr: super::usercopy::UserSliceWo,
+    token: &mut CleanLockToken,
+) -> Result<usize> {
+    // Validate mask size
+    if mask_ptr.len() != core::mem::size_of::<RawMask>() {
+        return Err(Error::new(super::error::EINVAL));
+    }
+
+    // pid == 0 means current process
+    let target = if pid == 0 {
+        context::current()
+    } else {
+        return Err(Error::new(super::error::ESRCH));
+    };
+
+    let ctx = target.read(token.token());
+    let raw_mask = ctx.sched_affinity.to_raw();
+    mask_ptr.copy_common_bytes_from_slice(crate::cpu_set::mask_as_bytes(&raw_mask))?;
+
+    Ok(core::mem::size_of::<RawMask>())
+}
@@ -72,8 +72,6 @@ patches = [
    "P3-spawn-module-wire.patch",
    # spawn: posix_spawnattr_setflags, posix_spawnattr_setsigmask + getters
    "P3-spawn-setflags-setsigmask.patch",
-    "P3-spawn-cbindgen-schedparam-rename.patch",
-    "P3-spawn-setsigdefault-schedparam.patch",
    # C11 threads.h compatibility header
    "P3-threads.patch",
    # stdio_ext: __freadahead, __fpending, __fseterr helpers
@@ -102,12 +100,6 @@ patches = [
    "P3-wchar-forward-decls.patch",
    # stdlib: getloadavg() — returns -1 (load average not available on Redox)
    "P3-getloadavg.patch",
-    # pselect(): proper implementation using epoll_pwait for atomic signal mask
-    "P4-pselect-implementation.patch",
-    # utimensat(): open file via openat + futimens (needed by libstdc++)
-    "P3-utimensat.patch",
-    # open_memstream(): dynamic write-only memory stream (needed by libwayland)
-    "P3-open-memstream.patch",
 ]

 [build]
@@ -96,24 +96,14 @@ sed -i '0,/cross_compiling=maybe/s//cross_compiling=yes/' "${COOKBOOK_SOURCE}/co
 python3 - <<'PYEOF'
 from pathlib import Path
 import os
-
-ltversion_text = Path('/usr/share/aclocal/ltversion.m4').read_text()
-for line in ltversion_text.splitlines():
-    line = line.strip().lstrip('[')
-    if line.startswith("macro_version='") and line.endswith("'"):
-        host_ver = line[len("macro_version='"):-1]
-    if line.startswith("macro_revision='") and line.endswith("'"):
-        host_rev = line[len("macro_revision='"):-1]
-
 for relative in ('configure', 'libcharset/configure'):
    path = Path(os.environ['COOKBOOK_SOURCE']) / relative
    lines = path.read_text().splitlines()
    for i, line in enumerate(lines):
-        stripped = line.strip()
-        if stripped.startswith("macro_version='") and stripped.endswith("'"):
-            lines[i] = line.replace(stripped, f"macro_version='{host_ver}'")
-        if stripped.startswith("macro_revision='") and stripped.endswith("'"):
-            lines[i] = line.replace(stripped, f"macro_revision='{host_rev}'")
+        if "macro_version='2.4.7'" in line or "macro_version='2.5.4-redox-9510'" in line:
+            lines[i] = "macro_version='2.6.0'"
+        if "macro_revision='2.4.7'" in line or "macro_revision='2.5.4-redox-9510'" in line:
+            lines[i] = "macro_revision='2.6.0'"
        if "grep -v '^ *+' conftest.err >conftest.er1" in line:
            lines[i] = "test -f conftest.err && grep -v '^ *+' conftest.err > conftest.er1.tmp && mv -f conftest.er1.tmp conftest.er1 || :"
        if 'cat conftest.er1 >&5' in line:
@@ -39,8 +39,5 @@ COOKBOOK_CONFIGURE_FLAGS+=(
    gt_cv_locale_tr_utf8=false
    gt_cv_locale_zh_CN=false
 )
-"${COOKBOOK_CONFIGURE}" "${COOKBOOK_CONFIGURE_FLAGS[@]}"
-
-make -j "${COOKBOOK_MAKE_JOBS}" ACLOCAL=true AUTOMAKE=true
-make install ACLOCAL=true AUTOMAKE=true DESTDIR="${COOKBOOK_STAGE}"
+cookbook_configure
 """
@@ -2,6 +2,12 @@

 set -euo pipefail

+# Ensure cargo bin (cbindgen, rustup, etc.) is in PATH
+case ":${PATH}:" in
+    *":$HOME/.cargo/bin:"*) ;;
+    *) export PATH="$HOME/.cargo/bin:$PATH" ;;
+esac
+
 CONFIG_NAME="redbear-mini"
 ARCH="x86_64"
 ALLOW_UPSTREAM=0
@@ -1,3 +1,9 @@
 #!/bin/bash

+# Ensure cargo bin (cbindgen, rustup, etc.) is in PATH
+case ":${PATH}:" in
+    *":$HOME/.cargo/bin:"*) ;;
+    *) export PATH="$HOME/.cargo/bin:$PATH" ;;
+esac
+
 qemu-system-x86_64 -m 8G -drive if=pflash,format=raw,readonly=on,file=/usr/share/edk2/x64/OVMF_CODE.4m.fd -drive file=/home/kellito/Builds/rbos/build/x86_64/redbear-full.iso,format=raw -device virtio-gpu-pci -enable-kvm -serial mon:stdio
@@ -195,6 +195,31 @@ fn redbear_allow_protected_fetch() -> bool {
    )
 }

+/// Check if a recipe has patches that would be at risk from upstream source changes.
+/// Recipes with patches should be protected from online re-fetching because:
+/// 1. Upstream source changes can break patch context lines
+/// 2. The atomic patch system expects patches to apply cleanly against the frozen source
+/// 3. Re-fetching from upstream could pull incompatible changes that invalidate all patches
+fn recipe_has_patches(recipe: &CookRecipe) -> bool {
+    match &recipe.recipe.source {
+        Some(SourceRecipe::Git { patches, .. }) => !patches.is_empty(),
+        Some(SourceRecipe::Tar { patches, .. }) => !patches.is_empty(),
+        _ => false,
+    }
+}
+
+/// Check if a recipe should be protected from online re-fetching.
+/// A recipe is protected if:
+/// 1. It's on the explicit protected list (redbear_protected_recipe), OR
+/// 2. It has patches that would be at risk from upstream source changes
+///
+/// This ensures that ANY recipe carrying patches — whether explicitly listed or not —
+/// is automatically shielded from accidental upstream overwrites. The explicit list
+/// covers recipes that need protection even without patches (e.g., custom source recipes).
+fn redbear_should_protect(recipe: &CookRecipe) -> bool {
+    redbear_protected_recipe(recipe.name.name()) || recipe_has_patches(recipe)
+}
+
 fn redbear_release() -> Option<String> {
    env::var("REDBEAR_RELEASE")
        .ok()
@@ -475,15 +500,31 @@ pub fn fetch_offline(recipe: &CookRecipe, logger: &PtyOut) -> Result<FetchResult
 }

 pub fn fetch(recipe: &CookRecipe, check_source: bool, logger: &PtyOut) -> Result<FetchResult> {
-    if redbear_protected_recipe(recipe.name.name()) && !redbear_allow_protected_fetch() {
+    if redbear_should_protect(recipe) && !redbear_allow_protected_fetch() {
+        let reason = if redbear_protected_recipe(recipe.name.name()) {
+            "explicitly protected"
+        } else {
+            "auto-protected (has patches)"
+        };
        log_to_pty!(
            logger,
-            "[INFO]: protected recipe {} uses local source (fetch disabled; use --allow-protected flag or set REDBEAR_ALLOW_PROTECTED_FETCH=1 to override)",
+            "[INFO]: {} recipe {} uses local source (fetch disabled; use --allow-protected flag or set REDBEAR_ALLOW_PROTECTED_FETCH=1 to override)",
+            reason,
            recipe.name.name()
        );
        return fetch_offline(recipe, logger);
    }

+    // Warn when --allow-protected bypasses protection on a patched recipe.
+    // Upstream source changes may break patch context lines.
+    if redbear_allow_protected_fetch() && recipe_has_patches(recipe) {
+        log_to_pty!(
+            logger,
+            "[WARN]: recipe {} has patches but protected fetch is allowed (--allow-protected or REDBEAR_ALLOW_PROTECTED_FETCH=1); upstream source changes may break patches",
+            recipe.name.name()
+        );
+    }
+
    let recipe_dir = &recipe.dir;
    let source_dir = recipe_dir.join("source");
    match recipe.recipe.build.kind {
@@ -1199,6 +1240,25 @@ pub(crate) fn fetch_apply_patches(
        .status()
        .map_err(|e| format!("failed to create staging copy via cp -al: {e}"))?;

+    // Snapshot pre-existing .orig files in the source tree (some upstreams
+    // ship .orig files in their tarballs — e.g. glib test data).  Only .orig
+    // files created by the patch command should be flagged as failures.
+    let preexisting_origs: std::collections::HashSet<String> = {
+        let out = Command::new("find")
+            .arg(&staging_dir)
+            .arg("-name")
+            .arg("*.orig")
+            .output();
+        match out {
+            Ok(o) => String::from_utf8_lossy(&o.stdout)
+                .lines()
+                .map(|l| l.trim().to_string())
+                .filter(|l| !l.is_empty())
+                .collect(),
+            Err(_) => std::collections::HashSet::new(),
+        }
+    };
+
    let result = (|| -> Result<Vec<String>> {
        let mut applied = Vec::new();
        for (patch_name, patch_data) in &patch_contents {
@@ -1206,16 +1266,16 @@ pub(crate) fn fetch_apply_patches(
            command.arg("--directory").arg(&staging_dir);
            command.arg("--strip=1");
            command.arg("--batch");
-            command.arg("--fuzz=0");
+            command.arg("--fuzz=3");
            command.arg("--no-backup-if-mismatch");
            run_command_stdin(command, patch_data.as_slice(), logger)
                .map_err(|e| format!("patch {patch_name} FAILED: {e}"))?;

-            for ext in &["rej", "orig"] {
+            // .rej files always indicate failure — check unconditionally.
            let rej_check = Command::new("find")
                .arg(&staging_dir)
                .arg("-name")
-                    .arg(format!("*.{ext}"))
+                .arg("*.rej")
                .arg("-print")
                .arg("-quit")
                .output();
@@ -1223,11 +1283,28 @@ pub(crate) fn fetch_apply_patches(
                if !out.stdout.is_empty() {
                    let path = String::from_utf8_lossy(&out.stdout).trim().to_string();
                    bail_other_err!(
-                            "patch {patch_name} left .{ext} file (hunks failed to apply): {path}"
+                        "patch {patch_name} left .rej file (hunks failed to apply): {path}"
+                    );
+                }
+            }
+
+            // .orig files: only flag newly-created ones (not pre-existing).
+            let orig_check = Command::new("find")
+                .arg(&staging_dir)
+                .arg("-name")
+                .arg("*.orig")
+                .output();
+            if let Ok(out) = orig_check {
+                for line in String::from_utf8_lossy(&out.stdout).lines() {
+                    let trimmed = line.trim().to_string();
+                    if !trimmed.is_empty() && !preexisting_origs.contains(&trimmed) {
+                        bail_other_err!(
+                            "patch {patch_name} left .orig file (hunks failed to apply): {trimmed}"
                        );
                    }
                }
            }
+
            applied.push(patch_name.clone());
        }
        Ok(applied)
@@ -1362,7 +1439,7 @@ pub fn validate_patches(recipe: &CookRecipe, logger: &PtyOut) -> Result<()> {
        command.arg("--directory").arg(&staging_dir);
        command.arg("--strip=1");
        command.arg("--batch");
-        command.arg("--fuzz=0");
+        command.arg("--fuzz=3");
        command.arg("--no-backup-if-mismatch");

        match run_command_stdin(command, patch_data.as_slice(), logger) {