# Driver Discovery and Dynamic Hardware Mapping Plan
**Status**: Draft — implementation pending
**Date**: 2026-05-27
**Supersedes**: Ad-hoc pcid-spawner + hardcoded lived disk paths
**Author**: Red Bear OS team
---
## 1. Problem Statement
Red Bear OS has two critical gaps in hardware discovery:
1. **lived's disk fallback is broken**: The live ISO boot daemon (`lived`) tries hardcoded paths `/scheme/disk/0` and `/scheme/usbscsi/0` to find the physical boot disk. But no disk driver registers those exact scheme names — they register `disk.pci-00-1F-2_ahci`, `disk.usb-xhci+1-scsi`, etc. The fallback **never works**.
2. **No dynamic hardware mapping**: The system does not distinguish between "hardware present" and "driver needed." On bare metal with no virtio devices, the system should not try to load `virtio-blkd`. On QEMU with no real AHCI controller, the system should not try to load `ahcid`. Today, the driver-manager loads whatever matches its static config files regardless of whether the hardware exists.
Linux solves both problems with a two-stage model:
- **Stage 1 (initramfs)**: Enumerate PCI bus, load ONLY the storage driver matching the boot controller, mount rootfs.
- **Stage 2 (rootfs)**: Full enumeration, udev + modprobe dynamically load all remaining drivers based on actual hardware.
---
## 2. Current Architecture
### 2.1 Boot Sequence (Initfs Phase)
```
Bootstrap (PID 1) → init → services start in dependency order:
  00_runtime.target                  randd, nulld, zerod, rtcd, logd
  10_inputd.service                  VT input multiplexer
  10_lived.service                   Live disk daemon (RAM preload + disk fallback)
  20_graphics.target                 vesad (FB handoff), fbcond, fbbootlogd
  41_acpid.service                   ACPI interpreter → scheme:acpi
  40_hwd.service                     Hardware manager → spawns pcid internally
    pcid → enumerates PCI bus → registers scheme:pci
  00_driver-manager-initfs.service   (if P26 applied)
    Loads /scheme/initfs/lib/drivers.d/00-storage.toml
    Only: ahcid, ided, nvmed, virtio-blkd
  40_drivers.target                  All initfs drivers
  50_rootfs.service                  Mount rootfs (hard dep on drivers.target)
  90_initfs.target                   Trigger switchroot
```
### 2.2 Driver Registration Contract
All disk drivers using `driver_block::DiskScheme` register schemes starting with `"disk"`:
| Driver | Scheme Name Pattern | Match Criteria |
|--------|---------------------|----------------|
| ided | `disk.pci-XX-XX-X_ide` | PCI class 0x01, subclass 0x01 |
| ahcid | `disk.pci-XX-XX-X_ahci` | PCI class 0x01, subclass 0x06 |
| nvmed | `disk.pci-XX-XX-X_nvme` | PCI class 0x01, subclass 0x08 |
| virtio-blkd | `disk.pci-XX-XX-X_virtio_blk` | PCI vendor 0x1AF4, device 0x1001 |
| usbscsid | `disk.usb-xhci+PORT-scsi` | USB SCSI transport |
| lived | `disk.live` | RAM-backed (our daemon) |
The `DiskScheme::new()` assertion (`assert!(scheme_name.starts_with("disk"))`) is the **contract** that enables dynamic discovery: any consumer can find all disk schemes by listing `/scheme/` and filtering for the `"disk"` prefix.
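As a rough illustration of how a consumer could rely on this contract, the sketch below classifies scheme names by the patterns in the table. The enum `DiskSchemeKind` and both functions are hypothetical names introduced here, not part of `driver-block`, and the suffix checks are assumptions derived from the table rather than from driver source.

```rust
/// Kinds of disk scheme listed in the table above (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
enum DiskSchemeKind {
    Ide,
    Ahci,
    Nvme,
    VirtioBlk,
    UsbScsi,
    Live,
    Other,
}

/// Classify a scheme name. Returns `None` if the name does not carry the
/// `"disk"` prefix and is therefore not a disk scheme at all.
fn classify_scheme(name: &str) -> Option<DiskSchemeKind> {
    if !name.starts_with("disk") {
        return None;
    }
    let kind = if name == "disk.live" {
        DiskSchemeKind::Live
    } else if name.starts_with("disk.usb-") && name.ends_with("-scsi") {
        DiskSchemeKind::UsbScsi
    } else if name.ends_with("_virtio_blk") {
        DiskSchemeKind::VirtioBlk
    } else if name.ends_with("_ahci") {
        DiskSchemeKind::Ahci
    } else if name.ends_with("_ide") {
        DiskSchemeKind::Ide
    } else if name.ends_with("nvme") {
        // Accepts any separator before "nvme" (assumption).
        DiskSchemeKind::Nvme
    } else {
        DiskSchemeKind::Other
    };
    Some(kind)
}

/// Keep only disk schemes that are backed by a driver other than `lived`.
fn physical_disk_schemes<'a>(names: &[&'a str]) -> Vec<&'a str> {
    names
        .iter()
        .copied()
        .filter(|n| !matches!(classify_scheme(n), None | Some(DiskSchemeKind::Live)))
        .collect()
}
```

Given the names `pci`, `disk.live` and `disk.pci-00-04-0_virtio_blk`, `physical_disk_schemes` would keep only the last one.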
### 2.3 The Two Driver-Loading Paths
| Path | Mechanism | Config Source | Drivers |
|------|-----------|---------------|---------|
| **Initfs** | `driver-manager --initfs` | `/scheme/initfs/lib/drivers.d/00-storage.toml` | Storage only (4 drivers) |
| **Rootfs** | `driver-manager --hotplug` | `/lib/drivers.d/*.toml` | All categories (40+ drivers) |
### 2.4 How Linux Does It (Reference)
Linux uses a two-tier ordering:
**Tier 1 — Initcall levels** (include/linux/init.h):
```
Level 0: pure_initcall (architecture setup)
Level 2: postcore_initcall (PCI subsystem registers here)
Level 4: subsys_initcall (SCSI, networking subsystems)
Level 6: device_initcall (module_init → all built-in drivers)
Level 7: late_initcall (late-stage platform drivers)
```
**Tier 2 — Link order** within device_initcall (drivers/Makefile):
```
Line 49: obj-y += virtio/ # VirtIO before block
Line 68: obj-y += gpu/ # GPU (linked before block/; its firmware still needs a mounted filesystem)
Line 76: obj-y += block/ # Block devices (storage)
Line 84: obj-y += nvme/ # NVMe
Line 85: obj-y += ata/ # ATA/AHCI
Line 92: obj-y += net/ # Network
```
**The critical principle**: Storage must load before GPU not because of PCI ordering, but because GPU drivers need firmware blobs from `/lib/firmware/` — which requires a mounted filesystem. Storage drivers are needed to mount that filesystem.
**Dynamic loading** (after rootfs mount): `MODULE_DEVICE_TABLE` entries in every driver generate `modules.alias` patterns. udev receives kernel uevents with `MODALIAS=pci:v00001AF4d00001001...`, calls `modprobe`, which looks up the alias and loads the matching `.ko` module.
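The alias lookup can be sketched in a few lines. The block below builds a PCI `MODALIAS` string in the kernel's `pci:v…d…sv…sd…bc…sc…i…` layout and checks it against a `*` wildcard pattern. It is an illustration only, not how `modprobe` is implemented; `PciId`, `pci_modalias` and `glob_match` are names made up for this example, and the matcher supports only `*`.

```rust
/// Identification fields encoded into a PCI MODALIAS string.
struct PciId {
    vendor: u32,
    device: u32,
    subvendor: u32,
    subdevice: u32,
    class: u8,
    subclass: u8,
    prog_if: u8,
}

/// Format the fields the way the kernel reports them in a uevent.
fn pci_modalias(id: &PciId) -> String {
    format!(
        "pci:v{:08X}d{:08X}sv{:08X}sd{:08X}bc{:02X}sc{:02X}i{:02X}",
        id.vendor, id.device, id.subvendor, id.subdevice, id.class, id.subclass, id.prog_if
    )
}

/// Minimal glob matcher: `*` matches any run of characters, everything
/// else must match literally. Backtracks to the most recent `*` on mismatch.
fn glob_match(pattern: &str, text: &str) -> bool {
    let (p, t) = (pattern.as_bytes(), text.as_bytes());
    let (mut pi, mut ti) = (0usize, 0usize);
    // (pattern index just after the last '*', text index that '*' has consumed up to)
    let mut star: Option<(usize, usize)> = None;
    while ti < t.len() {
        if pi < p.len() && p[pi] == b'*' {
            star = Some((pi + 1, ti));
            pi += 1;
        } else if pi < p.len() && p[pi] == t[ti] {
            pi += 1;
            ti += 1;
        } else if let Some((sp, st)) = star {
            // Let the last '*' swallow one more character and retry.
            pi = sp;
            ti = st + 1;
            star = Some((sp, st + 1));
        } else {
            return false;
        }
    }
    // Trailing '*' in the pattern may match the empty string.
    while pi < p.len() && p[pi] == b'*' {
        pi += 1;
    }
    pi == p.len()
}
```

A device with vendor `0x1AF4` and device `0x1001` would then match an alias pattern such as `pci:v00001AF4d00001001sv*sd*bc*sc*i*` (an illustrative pattern, not copied from a real `modules.alias`).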
---
## 3. Design: Two-Stage Dynamic Hardware Discovery
### 3.1 Stage 1 — Initfs Boot (Storage-Only)
**Goal**: Load exactly the storage driver(s) needed to mount the root filesystem. No more, no less.
**Mechanism**: `driver-manager --initfs` already exists and matches on PCI class and vendor. The missing piece is P26, the patch that creates `00_driver-manager-initfs.service` and `initfs-storage.toml`: it is wired into `recipe.toml`, but it still has to be confirmed that the patch is actually applied at build time.
**Initfs driver config** (`initfs-storage.toml`):
```toml
# Only storage drivers — needed to mount rootfs
# GPU/display deliberately excluded (handled by rootfs DRM/KMS stack)
[[driver]]
name = "nvmed"
description = "NVMe storage driver"
priority = 100
command = ["/scheme/initfs/lib/drivers/nvmed"]
[[driver.match]]
bus = "pci"
class = 1
subclass = 8
[[driver]]
name = "ahcid"
description = "AHCI SATA driver"
priority = 100
command = ["/scheme/initfs/lib/drivers/ahcid"]
[[driver.match]]
bus = "pci"
class = 1
subclass = 6
[[driver]]
name = "ided"
description = "PATA IDE driver"
priority = 100
command = ["/scheme/initfs/lib/drivers/ided"]
[[driver.match]]
bus = "pci"
class = 1
subclass = 1
[[driver]]
name = "virtio-blkd"
description = "VirtIO block device driver"
priority = 100
command = ["/scheme/initfs/lib/drivers/virtio-blkd"]
[[driver.match]]
bus = "pci"
vendor = 0x1AF4
device = 0x1001
```
**How this is already dynamic**: The driver-manager only spawns a driver when the PCI bus actually reports a matching device. If QEMU has no AHCI controller, `ahcid` is never spawned. If bare metal has no VirtIO devices, `virtio-blkd` is never spawned. The TOML match table is a **candidate list**, not a **must-load list**.
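The candidate-list behaviour can be modelled as follows. This is a simplified sketch, not the driver-manager's code: `PciDevice`, `MatchRule`, `DriverEntry`, `storage_candidates` and `drivers_to_spawn` are hypothetical names, and the rule semantics (an omitted field matches anything) are an assumption based on the TOML above.

```rust
/// A PCI function as reported by bus enumeration (illustrative subset of fields).
#[derive(Clone, Copy, Debug)]
struct PciDevice {
    vendor: u16,
    device: u16,
    class: u8,
    subclass: u8,
}

/// One `[[driver.match]]` table. A field left as `None` matches anything.
#[derive(Default)]
struct MatchRule {
    vendor: Option<u16>,
    device: Option<u16>,
    class: Option<u8>,
    subclass: Option<u8>,
}

impl MatchRule {
    fn matches(&self, dev: &PciDevice) -> bool {
        self.vendor.map_or(true, |v| v == dev.vendor)
            && self.device.map_or(true, |d| d == dev.device)
            && self.class.map_or(true, |c| c == dev.class)
            && self.subclass.map_or(true, |s| s == dev.subclass)
    }
}

struct DriverEntry {
    name: &'static str,
    rules: Vec<MatchRule>,
}

/// The four entries of `initfs-storage.toml`, in file order.
fn storage_candidates() -> Vec<DriverEntry> {
    let by_class = |class, subclass| MatchRule {
        class: Some(class),
        subclass: Some(subclass),
        ..Default::default()
    };
    vec![
        DriverEntry { name: "nvmed", rules: vec![by_class(1, 8)] },
        DriverEntry { name: "ahcid", rules: vec![by_class(1, 6)] },
        DriverEntry { name: "ided", rules: vec![by_class(1, 1)] },
        DriverEntry {
            name: "virtio-blkd",
            rules: vec![MatchRule {
                vendor: Some(0x1AF4),
                device: Some(0x1001),
                ..Default::default()
            }],
        },
    ]
}

/// A driver is selected only if some enumerated device satisfies one of its rules.
fn drivers_to_spawn(candidates: &[DriverEntry], bus: &[PciDevice]) -> Vec<&'static str> {
    candidates
        .iter()
        .filter(|d| bus.iter().any(|dev| d.rules.iter().any(|r| r.matches(dev))))
        .map(|d| d.name)
        .collect()
}
```

With a bus that contains only a VirtIO block device, the selection is `virtio-blkd` alone; with an empty bus nothing is selected.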
**What's needed**: Ensure P26 is applied, ensure `virtio-blkd` is in the BINS list, and ensure the initfs binary staging includes all 4 storage drivers.
### 3.2 Stage 2 — Rootfs (Full Hardware Discovery)
**Goal**: After rootfs is mounted, dynamically discover and load ALL remaining drivers based on actual hardware.
**Mechanism**: `driver-manager --hotplug` already reads `/lib/drivers.d/*.toml` (8 config files, 40+ drivers), enumerates PCI + ACPI buses, and spawns matching drivers. It also runs a hotplug loop for device add/remove.
**The existing driver configs are already data-driven and dynamic**:
| Config File | Category | Priority | Matching |
|-------------|----------|----------|----------|
| `00-storage.toml` | Storage | 100 | PCI class-based |
| `10-network.toml` | Network | 50 | PCI vendor + class |
| `20-usb.toml` | USB | 80 | PCI class + prog_if |
| `30-graphics.toml` | GPU/Display | 60 | PCI class 0x03 |
| `40-input.toml` | Input | 40 | Sentinel (vendor=0xFFFF) |
| `50-audio.toml` | Audio | 40 | PCI vendor + class |
| `60-gpio-i2c.toml` | GPIO/I2C | 30 | ACPI bus matching |
| `70-usb-class.toml` | USB class | 20 | Sentinel (vendor=0xFFFF) |
**Key property**: Priority ordering ensures storage (100) > USB (80) > GPU (60) > network (50) > audio (40). This mirrors Linux's link-order principle.
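One way to picture the ordering: once matching has produced a set of pending drivers, a stable sort by descending priority yields the spawn order. The sketch below assumes exactly that; `Pending` and `spawn_order` are hypothetical names, and whether driver-manager breaks ties by config-file order is an assumption.

```rust
use std::cmp::Reverse;

/// A matched driver waiting to be spawned (hypothetical type).
struct Pending {
    name: &'static str,
    priority: u32,
}

/// Highest priority first. `sort_by_key` is stable, so drivers with equal
/// priority keep the order in which they were matched.
fn spawn_order(mut pending: Vec<Pending>) -> Vec<&'static str> {
    pending.sort_by_key(|p| Reverse(p.priority));
    pending.into_iter().map(|p| p.name).collect()
}
```

For example, `e1000d` (50), `redox-drm` (60), `ihdad` (40), `xhcid` (80) and `ahcid` (100) would be spawned as `ahcid`, `xhcid`, `redox-drm`, `e1000d`, `ihdad`.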
### 3.3 lived Disk Fallback Fix
**Current bug**: `lived` tries `/scheme/disk/0` — but real schemes are named `disk.pci-00-1F-2_ahci`, never just `disk`.
**Fix**: Replace hardcoded paths with RedoxFS-style dynamic scheme discovery (same pattern as `filesystem_by_uuid` in `redoxfs/src/bin/mount.rs`):
```rust
fn try_open_disk(&self) -> Result<File, String> {
    for attempt in 0..DISK_OPEN_MAX_RETRIES {
        // List /scheme/ to find all registered disk schemes
        if let Ok(entries) = std::fs::read_dir("/scheme") {
            for entry in entries.flatten() {
                let name = entry.file_name();
                let name_str = name.to_string_lossy();
                // All disk schemes start with "disk." (driver-block contract)
                // Skip our own "disk.live" scheme
                if name_str.starts_with("disk.") && name_str != "disk.live" {
                    // Try opening disk 0 on this scheme
                    let path = format!("/scheme/{}/0", name_str);
                    if let Ok(file) = File::open(&path) {
                        eprintln!("lived: opened physical disk at {} (attempt {})",
                            path, attempt + 1);
                        return Ok(file);
                    }
                }
            }
        }
        if attempt < DISK_OPEN_MAX_RETRIES - 1 {
            std::thread::sleep(std::time::Duration::from_millis(
                DISK_OPEN_RETRY_INTERVAL_MS
            ));
        }
    }
    Err(format!("no disk scheme found after {} retries", DISK_OPEN_MAX_RETRIES))
}
```
**This is the exact pattern RedoxFS uses** in `filesystem_by_uuid()`. It:
1. Lists `/scheme/` (all registered schemes)
2. Filters to names starting with `"disk."` (the `driver-block` contract)
3. Skips `disk.live` (our own RAM-backed scheme)
4. Tries opening disk 0 on each discovered scheme
**Boot timing**: lived starts at service 10, before disk drivers. The retry loop (60 × 500ms = 30s) gives driver-manager and storage drivers time to load and register their schemes. As soon as ANY storage driver registers `disk.*`, lived finds it.
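The timing budget can be checked with a small model of the loop. The constant values below are inferred from the "60 × 500ms" figure and are assumptions, as are the helper names `worst_case_sleep` and `retry`. Because the loop sleeps only between attempts, the sleeps alone add up to slightly less than 30 s; time spent listing `/scheme/` comes on top.

```rust
use std::time::Duration;

// Assumed values, inferred from the "60 × 500ms" figure in the text.
const DISK_OPEN_MAX_RETRIES: u32 = 60;
const DISK_OPEN_RETRY_INTERVAL_MS: u64 = 500;

/// Total sleep time if every attempt fails. The loop sleeps between
/// attempts only, so there is one sleep fewer than there are attempts.
fn worst_case_sleep(max_retries: u32, interval_ms: u64) -> Duration {
    Duration::from_millis(interval_ms * u64::from(max_retries.saturating_sub(1)))
}

/// Same control flow as the retry loop, with the attempt and the sleep
/// injected as closures so the logic can be exercised without real delays.
fn retry<T>(
    max_retries: u32,
    mut attempt: impl FnMut(u32) -> Option<T>,
    mut sleep: impl FnMut(),
) -> Option<T> {
    for i in 0..max_retries {
        if let Some(value) = attempt(i) {
            return Some(value);
        }
        if i + 1 < max_retries {
            sleep();
        }
    }
    None
}
```

If a storage driver registers its scheme just before the fourth attempt, `retry` returns after three sleeps, roughly 1.5 s with the assumed interval.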
---
## 4. What Needs to Change
### 4.1 Patches Required
| Component | Patch | What It Does |
|-----------|-------|--------------|
| **base** | P60 (new) | Add `virtio-blkd` to BINS + staged files; update lived's `try_open_disk()` with dynamic scheme discovery |
| **kernel** | P26 (existing) | DebugDisplay scrolling fix (already done) |
| **base** | P26-driver-manager-initfs-conversion.patch (existing, wired but needs application verification) | Replaces pcid-spawner with driver-manager in initfs |
### 4.2 Changes to `recipes/core/base/recipe.toml`
1. **Add `virtio-blkd` to BINS** (already done in working tree)
2. **Add `virtio-blkd` to staged files list** (already done in working tree)
3. **No changes to driver configs**: `initfs-storage.toml` already lists all 4 storage drivers
### 4.3 Changes to `recipes/core/base/source/drivers/storage/lived/src/main.rs`
Replace the hardcoded `candidates` array in `try_open_disk()` with `/scheme/` directory enumeration that discovers disk schemes dynamically.
### 4.4 No Changes Needed
- **driver-manager** — already does dynamic PCI matching
- **initfs-storage.toml** — already has the right 4 storage drivers
- **Driver configs** (`/lib/drivers.d/*.toml`) — already data-driven with vendor/class matching
- **pcid** — already enumerates PCI bus correctly
- **Boot service order** — already correct (lived at 10, driver-manager-initfs at 00, rootfs at 50)
---
## 5. Verification Plan
### 5.1 QEMU with IDE (default)
```bash
timeout 60 qemu-system-x86_64 \
-drive file=build/x86_64/redbear-full.iso,format=raw \
-m 4G -smp 4 -serial stdio -no-reboot
```
Expected: lived finds `disk.pci-00-01-1_ide` scheme from `ided`, mounts rootfs.
### 5.2 QEMU with virtio-blk
```bash
timeout 60 qemu-system-x86_64 \
-device virtio-blk-pci,drive=drive0 \
-drive id=drive0,file=build/x86_64/redbear-full.iso,format=raw,if=none \
-m 4G -smp 4 -serial stdio -no-reboot
```
Expected: lived finds `disk.pci-00-XX-X_virtio_blk` scheme from `virtio-blkd`, mounts rootfs.
### 5.3 Bare Metal USB Boot
Expected: lived finds `disk.usb-xhci+PORT-scsi` scheme from `usbscsid`, mounts rootfs.
### 5.4 No Unnecessary Drivers
On QEMU with only virtio-blk (no AHCI), `ahcid` should NOT be spawned. Verify via boot log:
```
driver-manager: bound: 0000:00:01.1 -> ided          # IDE controller of the default QEMU machine: matched by ided
driver-manager: bound: 0000:00:04.0 -> virtio-blkd   # VirtIO block: matched
```
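This check could be automated by scanning the captured serial log. The sketch below assumes the `driver-manager: bound: <address> -> <driver>` line format shown in the example above; the actual log format may differ, and `bound_drivers` and `was_spawned` are hypothetical helper names.

```rust
/// Extract (PCI address, driver name) pairs from lines of the assumed form
/// `driver-manager: bound: <address> -> <driver>`. Anything after `#` is ignored.
fn bound_drivers(log: &str) -> Vec<(String, String)> {
    log.lines()
        .filter_map(|line| {
            let rest = line.trim().strip_prefix("driver-manager: bound:")?;
            let rest = rest.split('#').next()?;
            let (addr, driver) = rest.split_once("->")?;
            Some((addr.trim().to_string(), driver.trim().to_string()))
        })
        .collect()
}

/// True if the log shows the given driver being bound to any device.
fn was_spawned(log: &str, driver: &str) -> bool {
    bound_drivers(log).iter().any(|(_, d)| d == driver)
}
```

A test harness could then assert `!was_spawned(&log, "ahcid")` and `was_spawned(&log, "virtio-blkd")` for the virtio-blk boot.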
---
## 6. PCI Class Code Reference
From Linux `include/linux/pci_ids.h` and our driver configs:
| Class | Subclass | Prog IF | Device Type | Red Bear Driver |
|-------|----------|---------|-------------|-----------------|
| 0x01 | 0x01 | — | IDE/PATA | `ided` |
| 0x01 | 0x06 | 0x01 | AHCI SATA | `ahcid` |
| 0x01 | 0x08 | 0x02 | NVMe | `nvmed` |
| 0x01 | 0x00 | — | VirtIO Block (vendor 0x1AF4, device 0x1001) | `virtio-blkd` |
| 0x02 | — | — | Ethernet | `e1000d`, `rtl8168d`, etc. |
| 0x03 | — | — | Display/GPU | `redox-drm` |
| 0x04 | 0x03 | — | Audio (HDA) | `ihdad` |
| 0x0C | 0x03 | 0x30 | xHCI USB | `xhcid` |
| 0x0C | 0x03 | 0x00 | UHCI USB | `uhcid` |
| 0x0C | 0x03 | 0x10 | OHCI USB | `ohcid` |
| 0x0C | 0x03 | 0x20 | EHCI USB | `ehcid` |
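The table translates directly into a lookup on the three bytes of the PCI class code. The following is a sketch of that mapping only: `split_class_code` and `driver_for_class` are hypothetical names, and rows that need vendor or device IDs to pick a driver (Ethernet, VirtIO block) are deliberately left out and return `None`.

```rust
/// Split a 24-bit PCI class code into (class, subclass, prog_if).
fn split_class_code(code: u32) -> (u8, u8, u8) {
    (
        ((code >> 16) & 0xFF) as u8,
        ((code >> 8) & 0xFF) as u8,
        (code & 0xFF) as u8,
    )
}

/// Driver chosen purely from the class code, following the table above.
/// Devices that need vendor/device matching are not covered here.
fn driver_for_class(class: u8, subclass: u8, prog_if: u8) -> Option<&'static str> {
    match (class, subclass, prog_if) {
        (0x01, 0x01, _) => Some("ided"),
        (0x01, 0x06, 0x01) => Some("ahcid"),
        (0x01, 0x08, 0x02) => Some("nvmed"),
        (0x03, _, _) => Some("redox-drm"),
        (0x04, 0x03, _) => Some("ihdad"),
        (0x0C, 0x03, 0x00) => Some("uhcid"),
        (0x0C, 0x03, 0x10) => Some("ohcid"),
        (0x0C, 0x03, 0x20) => Some("ehcid"),
        (0x0C, 0x03, 0x30) => Some("xhcid"),
        _ => None,
    }
}
```

For instance, class code `0x010601` splits into `(0x01, 0x06, 0x01)` and maps to `ahcid`, while `0x0C0330` maps to `xhcid`.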
---
## 7. Boot Timeline (Target State)
```
T+0ms     Bootstrap starts, creates initfs/procmgr/namespace schemes
T+50ms    init starts, launches 00_randd → 00_logd → 00_runtime.target
T+200ms   lived starts (service 10), loads 128 MiB preload
T+300ms   vesad starts (FB handoff for text console)
T+400ms   acpid starts → ACPI interpreter → scheme:acpi
T+500ms   hwd starts → spawns pcid → PCI bus scan → scheme:pci
          driver-manager --initfs starts:
            Loads 00-storage.toml (4 storage drivers)
            Enumerates PCI bus via /scheme/pci/
            QEMU: finds 8086:7010 (IDE) → spawns ided
                  finds 1234:1111 (QEMU std VGA) → no storage match, skipped
                  finds 1AF4:1000 (virtio-net) → no storage match, skipped
T+1500ms  ided registers disk.pci-00-01-1_ide
          lived discovers disk.pci-00-01-1_ide via /scheme/ enumeration
          lived disk fallback succeeds
T+2000ms  redoxfs mounts rootfs from lived
T+2500ms  switchroot → rootfs init starts
T+3000ms  driver-manager --hotplug starts (rootfs):
            Loads all /lib/drivers.d/*.toml configs
            Detects ided already bound → skips
            Finds 1234:1111 (display class 0x03) → spawns redox-drm
            Finds 8086:100E (network class 0x02) → spawns e1000d
            Finds 1AF4:1000 (virtio-net) → spawns virtio-netd
T+5000ms  All drivers bound, system fully operational
```
---
## 8. Principles
1. **Data-driven, not hardcoded**: Driver matching via TOML configs with vendor/device/class fields. No binary name hardcoding, no path guessing.
2. **Enumerate first, match second**: PCI bus scan produces ALL devices. Driver matching filters to supported ones. Unknown hardware is logged but doesn't block boot.
3. **Priority ordering**: Storage (100) before USB (80) before GPU (60) before network (50) before audio (40). Mirrors Linux's link-order principle.
4. **Stage 1 = minimum viable set**: Initfs loads ONLY storage drivers. Everything else waits for rootfs.
5. **Dynamic scheme discovery**: lived discovers disk schemes by reading `/scheme/` and filtering for the `"disk."` prefix — the same contract that `driver-block` enforces.
6. **No unnecessary drivers**: If hardware doesn't exist, the driver is never spawned. `driver-manager` only calls `probe()` for devices that actually exist on the PCI/ACPI bus.
7. **Deferred retry for timing**: Drivers that start before their dependencies are ready get retried (3 times in initfs, 5 times in hotplug). After max retries, the device is permanently skipped with a logged reason.
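Principle 7 can be sketched as a bounded retry around a probe call. The retry counts come from the text; everything else is an assumption made for illustration, including the names `Stage`, `ProbeState` and `probe_with_retries` and the reading that "retried N times" means one initial attempt plus N retries.

```rust
/// Which driver-manager mode is running (hypothetical type).
#[derive(Clone, Copy, Debug, PartialEq)]
enum Stage {
    Initfs,
    Hotplug,
}

impl Stage {
    /// Retry budget per stage, as stated in principle 7.
    fn max_retries(self) -> u32 {
        match self {
            Stage::Initfs => 3,
            Stage::Hotplug => 5,
        }
    }
}

/// Final outcome for one device.
#[derive(Debug, PartialEq)]
enum ProbeState {
    Bound,
    /// Permanently skipped, carrying the last error as the logged reason.
    Skipped { reason: String },
}

/// Run the probe once, then retry up to the stage's budget while it fails.
fn probe_with_retries(
    stage: Stage,
    mut probe: impl FnMut() -> Result<(), String>,
) -> ProbeState {
    let mut last_err = match probe() {
        Ok(()) => return ProbeState::Bound,
        Err(e) => e,
    };
    for _ in 0..stage.max_retries() {
        match probe() {
            Ok(()) => return ProbeState::Bound,
            Err(e) => last_err = e,
        }
    }
    ProbeState::Skipped { reason: last_err }
}
```

Under this reading, a device whose probe never succeeds in initfs is attempted four times in total and then skipped with the last error as the reason.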