Driver Discovery and Dynamic Hardware Mapping Plan

Status: Draft (implementation pending)
Date: 2026-05-27
Supersedes: Ad-hoc pcid-spawner + hardcoded lived disk paths
Author: Red Bear OS team


1. Problem Statement

Red Bear OS has two critical gaps in hardware discovery:

  1. lived's disk fallback is broken: The live ISO boot daemon (lived) tries hardcoded paths /scheme/disk/0 and /scheme/usbscsi/0 to find the physical boot disk. But no disk driver registers those exact scheme names — they register disk.pci-00-1F-2_ahci, disk.usb-xhci+1-scsi, etc. The fallback never works.

  2. No dynamic hardware mapping: The system does not distinguish between "hardware present" and "driver needed." On bare metal with no virtio devices, the system should not try to load virtio-blkd. On QEMU with no real AHCI controller, the system should not try to load ahcid. Today, the driver-manager loads whatever matches its static config files regardless of whether the hardware exists.

Linux solves both problems with a two-stage model:

  • Stage 1 (initramfs): Enumerate PCI bus, load ONLY the storage driver matching the boot controller, mount rootfs.
  • Stage 2 (rootfs): Full enumeration, udev + modprobe dynamically load all remaining drivers based on actual hardware.

2. Current Architecture

2.1 Boot Sequence (Initfs Phase)

Bootstrap (PID 1) → init → services start in dependency order:

  00_runtime.target    randd, nulld, zerod, rtcd, logd
  10_inputd.service    VT input multiplexer
  10_lived.service     Live disk daemon (RAM preload + disk fallback)
  20_graphics.target   vesad (FB handoff), fbcond, fbbootlogd
  41_acpid.service     ACPI interpreter → scheme:acpi
  40_hwd.service       Hardware manager → spawns pcid internally
                     pcid → enumerates PCI bus → registers scheme:pci
  00_driver-manager-initfs.service   (if P26 applied)
                     Loads /scheme/initfs/lib/drivers.d/00-storage.toml
                     Only: ahcid, ided, nvmed, virtio-blkd
  40_drivers.target   All initfs drivers
  50_rootfs.service   Mount rootfs (hard dep on drivers.target)
  90_initfs.target    Trigger switchroot

2.2 Driver Registration Contract

All disk drivers using driver_block::DiskScheme register schemes starting with "disk":

| Driver | Scheme Name Pattern | Match Criteria |
|---|---|---|
| ided | disk.pci-XX-XX-X_ide | PCI class 0x01, subclass 0x01 |
| ahcid | disk.pci-XX-XX-X_ahci | PCI class 0x01, subclass 0x06 |
| nvmed | disk.pci-XX-XX-X-nvme | PCI class 0x01, subclass 0x08 |
| virtio-blkd | disk.pci-XX-XX-X_virtio_blk | PCI vendor 0x1AF4, device 0x1001 |
| usbscsid | disk.usb-xhci+PORT-scsi | USB SCSI transport |
| lived | disk.live | RAM-backed (our daemon) |

The DiskScheme::new() assertion (assert!(scheme_name.starts_with("disk"))) is the contract that enables dynamic discovery: any consumer can find all disk schemes by listing /scheme/ and filtering for the "disk" prefix.
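The consumer side of this contract can be sketched as a pure filter over scheme names. This is a minimal illustration, not code from the tree: the function names `is_physical_disk_scheme` and `physical_disk_schemes` are hypothetical, and sorting the result is an added assumption so that the probe order is deterministic.

```rust
/// Returns true for scheme names that satisfy the driver-block naming
/// contract ("disk" prefix), excluding the RAM-backed scheme owned by lived.
/// Hypothetical helper, not part of driver_block.
pub fn is_physical_disk_scheme(name: &str) -> bool {
    name.starts_with("disk") && name != "disk.live"
}

/// Filters a list of scheme names (for example, the entries of /scheme/)
/// down to the physical disk schemes. The result is sorted so that callers
/// probe schemes in a stable order (an assumption of this sketch).
pub fn physical_disk_schemes<'a, I>(names: I) -> Vec<&'a str>
where
    I: IntoIterator<Item = &'a str>,
{
    let mut found: Vec<&str> = names
        .into_iter()
        .filter(|n| is_physical_disk_scheme(n))
        .collect();
    found.sort_unstable();
    found
}
```

Given the names pci, disk.live, disk.pci-00-1F-2_ahci and acpi, only disk.pci-00-1F-2_ahci survives the filter.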

2.3 The Two Driver-Loading Paths

| Path | Mechanism | Config Source | Drivers |
|---|---|---|---|
| Initfs | driver-manager --initfs | /scheme/initfs/lib/drivers.d/00-storage.toml | Storage only (4 drivers) |
| Rootfs | driver-manager --hotplug | /lib/drivers.d/*.toml | All categories (40+ drivers) |

2.4 How Linux Does It (Reference)

Linux uses a two-tier ordering:

Tier 1 — Initcall levels (include/linux/init.h):

Level 0: pure_initcall     (architecture setup)
Level 2: postcore_initcall  (PCI subsystem registers here)
Level 4: subsys_initcall    (SCSI, networking subsystems)
Level 6: device_initcall    (module_init → all built-in drivers)
Level 7: late_initcall      (late-stage platform drivers)

Tier 2 — Link order within device_initcall (drivers/Makefile):

Line 49: obj-y += virtio/        # VirtIO before block
Line 68: obj-y += gpu/           # GPU (listed before block/, so link order alone does not put storage first)
Line 76: obj-y += block/         # Block devices (storage)
Line 84: obj-y += nvme/          # NVMe
Line 85: obj-y += ata/           # ATA/AHCI
Line 92: obj-y += net/           # Network

The critical principle: Storage must be usable before GPU drivers can finish initializing, not because of PCI ordering or link order, but because GPU drivers need firmware blobs from /lib/firmware/, which requires a mounted filesystem. Storage drivers are needed to mount that filesystem.

Dynamic loading (after rootfs mount): MODULE_DEVICE_TABLE entries in every driver generate modules.alias patterns. udev receives kernel uevents with MODALIAS=pci:v00001AF4d00001001..., calls modprobe, which looks up the alias and loads the matching .ko module.
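To make the alias lookup concrete, the sketch below builds the MODALIAS string that Linux emits for a PCI function. The field layout (v, d, sv, sd, bc, sc, i) follows the Linux PCI modalias format; the `PciId` struct and `pci_modalias` function are hypothetical names used only for illustration and are not part of Red Bear OS.

```rust
/// Identification fields read from a PCI function's configuration space.
/// Hypothetical type for this sketch.
pub struct PciId {
    pub vendor: u16,
    pub device: u16,
    pub subvendor: u16,
    pub subdevice: u16,
    pub class: u8,
    pub subclass: u8,
    pub prog_if: u8,
}

/// Formats the identification fields the way Linux does for PCI uevents:
/// vendor/device/subsystem IDs as 8 hex digits, class bytes as 2 hex digits.
pub fn pci_modalias(id: &PciId) -> String {
    format!(
        "pci:v{:08X}d{:08X}sv{:08X}sd{:08X}bc{:02X}sc{:02X}i{:02X}",
        id.vendor, id.device, id.subvendor, id.subdevice, id.class, id.subclass, id.prog_if
    )
}
```

modprobe then matches this string against the glob patterns in modules.alias, which is the role the TOML match tables play for driver-manager.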


3. Design: Two-Stage Dynamic Hardware Discovery

3.1 Stage 1 — Initfs Boot (Storage-Only)

Goal: Load exactly the storage driver(s) needed to mount the root filesystem. No more, no less.

Mechanism: driver-manager --initfs already exists and does PCI class/vendor matching. The missing piece is the P26 patch, which creates 00_driver-manager-initfs.service and initfs-storage.toml: it is wired into recipe.toml, but whether it is actually applied during the build still needs to be verified.

Initfs driver config (initfs-storage.toml):

# Only storage drivers — needed to mount rootfs
# GPU/display deliberately excluded (handled by rootfs DRM/KMS stack)

[[driver]]
name = "nvmed"
description = "NVMe storage driver"
priority = 100
command = ["/scheme/initfs/lib/drivers/nvmed"]

[[driver.match]]
bus = "pci"
class = 1
subclass = 8

[[driver]]
name = "ahcid"
description = "AHCI SATA driver"
priority = 100
command = ["/scheme/initfs/lib/drivers/ahcid"]

[[driver.match]]
bus = "pci"
class = 1
subclass = 6

[[driver]]
name = "ided"
description = "PATA IDE driver"
priority = 100
command = ["/scheme/initfs/lib/drivers/ided"]

[[driver.match]]
bus = "pci"
class = 1
subclass = 1

[[driver]]
name = "virtio-blkd"
description = "VirtIO block device driver"
priority = 100
command = ["/scheme/initfs/lib/drivers/virtio-blkd"]

[[driver.match]]
bus = "pci"
vendor = 0x1AF4
device = 0x1001

How this is already dynamic: The driver-manager only spawns a driver when the PCI bus actually reports a matching device. If QEMU has no AHCI controller, ahcid is never spawned. If bare metal has no VirtIO devices, virtio-blkd is never spawned. The TOML match table is a candidate list, not a must-load list.
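The candidate-list behaviour can be sketched as follows. This is an illustration under stated assumptions, not the driver-manager implementation: `MatchRule`, `PciDevice`, `DriverEntry` and `drivers_to_spawn` are hypothetical names, and the rule that an unset field matches any value is assumed from the shape of the TOML entries above.

```rust
/// One [[driver.match]] entry. A field left as None matches any value
/// (assumed semantics for this sketch).
#[derive(Default)]
pub struct MatchRule {
    pub vendor: Option<u16>,
    pub device: Option<u16>,
    pub class: Option<u8>,
    pub subclass: Option<u8>,
}

/// The subset of PCI identification data the rules look at.
pub struct PciDevice {
    pub vendor: u16,
    pub device: u16,
    pub class: u8,
    pub subclass: u8,
}

impl MatchRule {
    /// True when every field that is set in the rule equals the device's value.
    pub fn matches(&self, dev: &PciDevice) -> bool {
        self.vendor.map_or(true, |v| v == dev.vendor)
            && self.device.map_or(true, |d| d == dev.device)
            && self.class.map_or(true, |c| c == dev.class)
            && self.subclass.map_or(true, |s| s == dev.subclass)
    }
}

/// One [[driver]] entry with its match rules.
pub struct DriverEntry {
    pub name: &'static str,
    pub rules: Vec<MatchRule>,
}

/// Returns the drivers for which at least one enumerated device satisfies
/// at least one rule. Drivers with no matching hardware are never returned.
pub fn drivers_to_spawn(candidates: &[DriverEntry], bus: &[PciDevice]) -> Vec<&'static str> {
    candidates
        .iter()
        .filter(|drv| bus.iter().any(|dev| drv.rules.iter().any(|r| r.matches(dev))))
        .map(|drv| drv.name)
        .collect()
}
```

With ahcid and virtio-blkd as candidates and a bus that contains only a 1AF4:1001 device and an e1000 NIC, the function returns just virtio-blkd.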

What's needed: Ensure P26 is applied, ensure virtio-blkd is in the BINS list, and ensure the initfs binary staging includes all 4 storage drivers.

3.2 Stage 2 — Rootfs (Full Hardware Discovery)

Goal: After rootfs is mounted, dynamically discover and load ALL remaining drivers based on actual hardware.

Mechanism: driver-manager --hotplug already reads /lib/drivers.d/*.toml (8 config files, 40+ drivers), enumerates PCI + ACPI buses, and spawns matching drivers. It also runs a hotplug loop for device add/remove.

The existing driver configs are already data-driven and dynamic:

| Config File | Category | Priority | Matching |
|---|---|---|---|
| 00-storage.toml | Storage | 100 | PCI class-based |
| 10-network.toml | Network | 50 | PCI vendor + class |
| 20-usb.toml | USB | 80 | PCI class + prog_if |
| 30-graphics.toml | GPU/Display | 60 | PCI class 0x03 |
| 40-input.toml | Input | 40 | Sentinel (vendor=0xFFFF) |
| 50-audio.toml | Audio | 40 | PCI vendor + class |
| 60-gpio-i2c.toml | GPIO/I2C | 30 | ACPI bus matching |
| 70-usb-class.toml | USB class | 20 | Sentinel (vendor=0xFFFF) |

Key property: Priority ordering ensures storage (100) > USB (80) > GPU (60) > network (50) > audio (40). This mirrors the Linux principle described in Section 2.4: storage comes up before drivers that depend on a mounted filesystem.
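A minimal sketch of how such priorities could translate into a spawn order is shown below. It assumes that higher priority values are spawned first and that ties keep config-file order; both are assumptions of this example, and `Candidate` and `spawn_order` are hypothetical names rather than driver-manager APIs.

```rust
/// A matched driver together with the priority from its config file.
/// Hypothetical type for this sketch.
pub struct Candidate {
    pub name: &'static str,
    pub priority: u32,
}

/// Orders matched drivers from highest to lowest priority.
/// `sort_by` is a stable sort, so drivers with equal priority keep the
/// order in which they were read from the config files.
pub fn spawn_order(mut candidates: Vec<Candidate>) -> Vec<&'static str> {
    candidates.sort_by(|a, b| b.priority.cmp(&a.priority));
    candidates.into_iter().map(|c| c.name).collect()
}
```

Feeding it e1000d (50), redox-drm (60), ahcid (100), xhcid (80) and ihdad (40) yields ahcid, xhcid, redox-drm, e1000d, ihdad.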

3.3 lived Disk Fallback Fix

Current bug: lived tries /scheme/disk/0 — but real schemes are named disk.pci-00-1F-2_ahci, never just disk.

Fix: Replace hardcoded paths with RedoxFS-style dynamic scheme discovery (same pattern as filesystem_by_uuid in redoxfs/src/bin/mount.rs):

fn try_open_disk(&self) -> Result<File, String> {
    for attempt in 0..DISK_OPEN_MAX_RETRIES {
        // List /scheme/ to find all registered disk schemes
        if let Ok(entries) = std::fs::read_dir("/scheme") {
            for entry in entries.flatten() {
                let name = entry.file_name();
                let name_str = name.to_string_lossy();

                // All disk schemes start with "disk." (driver-block contract)
                // Skip our own "disk.live" scheme
                if name_str.starts_with("disk.") && name_str != "disk.live" {
                    // Try opening disk 0 on this scheme
                    let path = format!("/scheme/{}/0", name_str);
                    if let Ok(file) = File::open(&path) {
                        eprintln!("lived: opened physical disk at {} (attempt {})",
                            path, attempt + 1);
                        return Ok(file);
                    }
                }
            }
        }

        if attempt < DISK_OPEN_MAX_RETRIES - 1 {
            std::thread::sleep(std::time::Duration::from_millis(
                DISK_OPEN_RETRY_INTERVAL_MS
            ));
        }
    }

    Err(format!("no disk scheme found after {} retries", DISK_OPEN_MAX_RETRIES))
}

This follows the same discovery pattern RedoxFS uses in filesystem_by_uuid(). It:

  1. Lists /scheme/ (all registered schemes)
  2. Filters to names starting with "disk." (the driver-block contract)
  3. Skips disk.live (our own RAM-backed scheme)
  4. Tries opening disk 0 on each discovered scheme

Boot timing: lived starts at service 10, before disk drivers. The retry loop (60 attempts, 500 ms apart, roughly 30 s in total) gives driver-manager and storage drivers time to load and register their schemes. As soon as any storage driver registers a disk.* scheme whose disk 0 can be opened, lived finds it.


4. What Needs to Change

4.1 Patches Required

| Component | Patch | What It Does |
|---|---|---|
| base | P60 (new) | Add virtio-blkd to BINS + staged files; update lived's try_open_disk() with dynamic scheme discovery |
| kernel | P26 (existing) | DebugDisplay scrolling fix (already done) |
| base | P26-driver-manager-initfs-conversion.patch (existing, wired but needs application verification) | Replaces pcid-spawner with driver-manager in initfs |

4.2 Changes to recipes/core/base/recipe.toml

  1. Add virtio-blkd to BINS (already done in working tree)
  2. Add virtio-blkd to staged files list (already done in working tree)
  3. No changes to driver configs: initfs-storage.toml already lists all 4 storage drivers

4.3 Changes to recipes/core/base/source/drivers/storage/lived/src/main.rs

Replace the hardcoded candidates array in try_open_disk() with /scheme/ directory enumeration that discovers disk schemes dynamically.

4.4 No Changes Needed

  • driver-manager — already does dynamic PCI matching
  • initfs-storage.toml — already has the right 4 storage drivers
  • Driver configs (/lib/drivers.d/*.toml) — already data-driven with vendor/class matching
  • pcid — already enumerates PCI bus correctly
  • Boot service order — already correct (lived at 10, driver-manager-initfs at 00, rootfs at 50)

5. Verification Plan

5.1 QEMU with IDE (default)

timeout 60 qemu-system-x86_64 \
  -drive file=build/x86_64/redbear-full.iso,format=raw \
  -m 4G -smp 4 -serial stdio -no-reboot

Expected: lived finds disk.pci-00-01-1_ide scheme from ided, mounts rootfs.

5.2 QEMU with virtio-blk

timeout 60 qemu-system-x86_64 \
  -device virtio-blk-pci,drive=drive0 \
  -drive id=drive0,file=build/x86_64/redbear-full.iso,format=raw,if=none \
  -m 4G -smp 4 -serial stdio -no-reboot

Expected: lived finds disk.pci-00-XX-X_virtio_blk scheme from virtio-blkd, mounts rootfs.

5.3 Bare Metal USB Boot

Expected: lived finds disk.usb-xhci+PORT-scsi scheme from usbscsid, mounts rootfs.

5.4 No Unnecessary Drivers

On QEMU with only virtio-blk (no AHCI), ahcid should NOT be spawned. Verify via boot log:

driver-manager: no driver found for pci 0000:00:01.1   # IDE controller — no match
driver-manager: bound: 0000:00:04.0 -> virtio-blkd     # VirtIO block — matched
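This check can be automated against a captured serial log. The sketch below assumes the log line format shown above ("driver-manager: bound: <address> -> <driver>") and nothing else about driver-manager's output; `bound_drivers` is a hypothetical helper, not an existing tool.

```rust
/// Extracts driver names from lines of the form
/// "driver-manager: bound: <pci address> -> <driver>".
/// Lines in any other format are ignored.
pub fn bound_drivers(log: &str) -> Vec<&str> {
    log.lines()
        .filter_map(|line| line.trim().strip_prefix("driver-manager: bound:"))
        .filter_map(|rest| rest.split_once("->"))
        .map(|(_, driver)| {
            // Drop any trailing "# ..." annotation, then surrounding spaces.
            driver.split('#').next().unwrap_or("").trim()
        })
        .collect()
}
```

A verification script would then assert that the returned list contains virtio-blkd and does not contain ahcid.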

6. PCI Class Code Reference

From Linux include/linux/pci_ids.h and our driver configs:

| Class | Subclass | Prog IF | Device Type | Red Bear Driver |
|---|---|---|---|---|
| 0x01 | 0x01 | | IDE/PATA | ided |
| 0x01 | 0x06 | 0x01 | AHCI SATA | ahcid |
| 0x01 | 0x08 | 0x02 | NVMe | nvmed |
| 0x01 | 0x00 | | VirtIO Block (vendor 0x1AF4, device 0x1001) | virtio-blkd |
| 0x02 | | | Ethernet | e1000d, rtl8168d, etc. |
| 0x03 | | | Display/GPU | redox-drm |
| 0x04 | 0x03 | | Audio (HDA) | ihdad |
| 0x0C | 0x03 | 0x30 | xHCI USB | xhcid |
| 0x0C | 0x03 | 0x00 | UHCI USB | uhcid |
| 0x0C | 0x03 | 0x10 | OHCI USB | ohcid |
| 0x0C | 0x03 | 0x20 | EHCI USB | ehcid |
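The class-based rows of this table can be expressed as a lookup over the three class-code bytes. The sketch is illustrative only: `decode_class` and `driver_for_class` are hypothetical names, the register layout is the standard PCI configuration dword at offset 0x08 (class, subclass, prog IF, revision from high byte to low byte), and rows that match by vendor/device or by class alone (VirtIO block, Ethernet, Display) are deliberately left out.

```rust
/// Splits the PCI configuration dword at offset 0x08 into
/// (class, subclass, prog_if). The low byte is the revision ID and is dropped.
pub fn decode_class(reg: u32) -> (u8, u8, u8) {
    ((reg >> 24) as u8, (reg >> 16) as u8, (reg >> 8) as u8)
}

/// Maps a class triple to the driver named in the reference table.
/// Only the rows that are fully determined by class codes are covered.
pub fn driver_for_class(class: u8, subclass: u8, prog_if: u8) -> Option<&'static str> {
    match (class, subclass, prog_if) {
        (0x01, 0x01, _) => Some("ided"),
        (0x01, 0x06, 0x01) => Some("ahcid"),
        (0x01, 0x08, 0x02) => Some("nvmed"),
        (0x04, 0x03, _) => Some("ihdad"),
        (0x0C, 0x03, 0x00) => Some("uhcid"),
        (0x0C, 0x03, 0x10) => Some("ohcid"),
        (0x0C, 0x03, 0x20) => Some("ehcid"),
        (0x0C, 0x03, 0x30) => Some("xhcid"),
        _ => None,
    }
}
```

For example, a register value of 0x0C033001 decodes to (0x0C, 0x03, 0x30), which selects xhcid.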

7. Boot Timeline (Target State)

T+0ms    Bootstrap starts, creates initfs/procmgr/namespace schemes
T+50ms   init starts, launches 00_randd → 00_logd → 00_runtime.target
T+200ms  lived starts (service 10), loads 128 MiB preload
T+300ms  vesad starts (FB handoff for text console)
T+400ms  acpid starts → ACPI interpreter → scheme:acpi
T+500ms  hwd starts → spawns pcid → PCI bus scan → scheme:pci
         driver-manager --initfs starts:
           Loads 00-storage.toml (4 storage drivers)
           Enumerates PCI bus via /scheme/pci/
           QEMU: finds 8086:7010 (IDE) → spawns ided
                  finds 1234:1111 (QEMU std VGA) → no storage match, skipped
                  finds 1AF4:1000 (virtio-net) → no storage match, skipped
T+1500ms ided registers disk.pci-00-01-1_ide
         lived discovers disk.pci-00-01-1_ide via /scheme/ enumeration
         lived disk fallback succeeds
T+2000ms redoxfs mounts rootfs from lived
T+2500ms switchroot → rootfs init starts
T+3000ms driver-manager --hotplug starts (rootfs):
           Loads all /lib/drivers.d/*.toml configs
           Detects ided already bound → skips
           Finds 1234:1111 (display class 0x03) → spawns redox-drm
           Finds 8086:100E (network class 0x02) → spawns e1000d
           Finds 1AF4:1000 (virtio-net) → spawns virtio-netd
T+5000ms All drivers bound, system fully operational

8. Principles

  1. Data-driven, not hardcoded: Driver matching via TOML configs with vendor/device/class fields. No binary name hardcoding, no path guessing.

  2. Enumerate first, match second: PCI bus scan produces ALL devices. Driver matching filters to supported ones. Unknown hardware is logged but doesn't block boot.

  3. Priority ordering: Storage (100) before USB (80) before GPU (60) before network (50) before audio (40). Mirrors the Linux principle that storage comes up before drivers that need a mounted filesystem.

  4. Stage 1 = minimum viable set: Initfs loads ONLY storage drivers. Everything else waits for rootfs.

  5. Dynamic scheme discovery: lived discovers disk schemes by reading /scheme/ and filtering for the "disk." prefix — the same contract that driver-block enforces.

  6. No unnecessary drivers: If hardware doesn't exist, the driver is never spawned. driver-manager only calls probe() for devices that actually exist on the PCI/ACPI bus.

  7. Deferred retry for timing: Drivers that start before their dependencies are ready get retried (3 times in initfs, 5 times in hotplug). After max retries, the device is permanently skipped with a logged reason.
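The deferred-retry principle can be sketched as a bounded probe loop. This is an assumption-laden illustration: `ProbeOutcome`, `FinalState` and `probe_with_retry` are hypothetical names, the limit is interpreted as a maximum number of attempts, and the wait between attempts that a real implementation would need is omitted to keep the example short.

```rust
/// Result of a single probe attempt (hypothetical type).
pub enum ProbeOutcome {
    /// The driver bound to the device.
    Bound,
    /// A dependency (for example, a scheme) is not registered yet.
    NotReady,
    /// The probe failed for a reason that retrying will not fix.
    Failed(String),
}

/// Final state of a device after the retry budget is spent (hypothetical type).
#[derive(Debug, PartialEq)]
pub enum FinalState {
    Bound { attempts: u32 },
    Skipped { reason: String },
}

/// Calls `probe` up to `max_attempts` times. NotReady triggers another
/// attempt; Bound and Failed end the loop immediately. When the budget is
/// exhausted, the device is skipped with a reason suitable for logging.
pub fn probe_with_retry<F>(max_attempts: u32, mut probe: F) -> FinalState
where
    F: FnMut(u32) -> ProbeOutcome,
{
    for attempt in 1..=max_attempts {
        match probe(attempt) {
            ProbeOutcome::Bound => return FinalState::Bound { attempts: attempt },
            ProbeOutcome::Failed(reason) => return FinalState::Skipped { reason },
            ProbeOutcome::NotReady => {}
        }
    }
    FinalState::Skipped {
        reason: format!("dependencies not ready after {} attempts", max_attempts),
    }
}
```

Under this reading, the initfs path would call it with a budget of 3 and the hotplug path with a budget of 5.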