diff --git a/AGENTS.md b/AGENTS.md index 4a6f8ff16e..5b7b37feb6 100644 --- a/AGENTS.md +++ b/AGENTS.md @@ -331,6 +331,7 @@ package format), we pull from upstream. If we add features to it, we fork it. | Installer | `local/sources/installer/` | ext4 + GRUB bootloader integration | | redoxfs | `local/sources/redoxfs/` | RedoxFS daemon — minor changes | | userutils | `local/sources/userutils/` | User utilities — login shell tweaks | +| diskd | `local/recipes/system/diskd/` | Disk aggregator scheme daemon — probes all `disk.*` schemes, exposes `/scheme/diskd` with real `getdents` and `DirentKind::BlockDev` | | redox-drm | `local/sources/redox-drm/` | Intel + AMD display drivers (Red Bear-internal Rust scheme daemon) | | redox-driver-sys | `local/recipes/drivers/redox-driver-sys/` | Hardware quirks system (Red Bear-internal Rust crate) | | linux-kpi | `local/recipes/drivers/linux-kpi/` | GPU + Wi-Fi compatibility headers (Red Bear-internal C-header crate) | @@ -599,6 +600,8 @@ redox-master/ | D-Bus integration | `local/docs/DBUS-INTEGRATION-PLAN.md` | Architecture, gap analysis, phased implementation for KDE Plasma D-Bus | | Boot config | `config/*.toml` | TOML hierarchy, include-based | | **Hardware quirks** | `local/recipes/drivers/redox-driver-sys/source/src/quirks/` | Data-driven quirk tables: compiled-in + TOML + DMI; see `local/docs/QUIRKS-SYSTEM.md` | +| **/scheme/ namespace** | `local/docs/SCHEME-NAMESPACE-POPULATION-PLAN.md` | Disk aggregator (`diskd`), /scheme/ completeness matrix, boot ordering, cross-referenced with Linux/Fuchsia/seL4/Plan 9 | +| **Disk discovery** | `local/recipes/system/diskd/` | `diskd` aggregator daemon: probes all `disk.*` schemes, exposes `/scheme/diskd` with real `getdents` | ## BUILD COMMANDS @@ -733,6 +736,7 @@ where the original source is not Rust. 
| C library | relibc | Rust | ✅ | | Init | service manager | Rust | ✅ | | Filesystems | redoxfs, ext4d, fatd | Rust | ✅ | +| Disk discovery | diskd (aggregator) | Rust | ✅ | | Driver infrastructure | redox-driver-sys, linux-kpi headers | Rust + C headers | ✅ | | Display/compositor | Wayland compositor | Rust | required | | Session/auth | redbear-sessiond, redbear-authd | Rust | ✅ | @@ -1216,6 +1220,7 @@ Phase 1 (runtime substrate) → Phase 2 (software compositor) → Phase 3 (KWin 5. `amdgpu` — `local/recipes/gpu/amdgpu/source/` — AMD DC C port with linux-kpi compat; can query quirks via `pci_has_quirk()` FFI 6. `redbear-sessiond` — `local/recipes/system/redbear-sessiond/source/` — Rust D-Bus session broker exposing `org.freedesktop.login1` subset for KWin (uses `zbus`) 7. `redbear-dbus-services` — `local/recipes/system/redbear-dbus-services/` — D-Bus activation `.service` files and XML policy files for system and session buses +8. `diskd` — `local/recipes/system/diskd/source/` — Disk aggregator scheme daemon; exposes a single listable `/scheme/diskd` namespace over all underlying `disk.*` schemes (disk.live, disk.sata*, disk.virtio*, disk.nvme*, disk.usb*, disk.ide*). Provides real `getdents` with `DirentKind::BlockDev`. Uses `OpenResult::OtherScheme` for zero-copy block I/O proxying. Used by `redoxfs` for rootfs disk discovery. See `local/docs/SCHEME-NAMESPACE-POPULATION-PLAN.md` for the full /scheme/ namespace design. All custom work goes in `local/` — see `local/AGENTS.md` for fork model usage. 
diff --git a/config/redbear-mini.toml b/config/redbear-mini.toml index 2f05e5a4d6..9720da6548 100644 --- a/config/redbear-mini.toml +++ b/config/redbear-mini.toml @@ -63,6 +63,7 @@ driver-params = {} pciids = {} # ── Filesystem support ── +diskd = {} ext4d = {} fatd = {} redoxfs = {} diff --git a/local/AGENTS.md b/local/AGENTS.md index 99b7440adb..3ec7620ee7 100644 --- a/local/AGENTS.md +++ b/local/AGENTS.md @@ -551,7 +551,7 @@ redox-master/ ← git pull updates mainline Redox │ │ ├── branding/ ← redbear-release (os-release, hostname, motd) │ │ ├── drivers/ ← redox-driver-sys, linux-kpi (DRM/GPU + Wi-Fi only — NOT USB — NOT input subsystem) │ │ ├── gpu/ ← redox-drm (AMD + Intel display drivers), amdgpu (C port) -│ │ ├── system/ ← cub, evdevd, udev-shim, redbear-firmware, firmware-loader, redbear-hwutils, redbear-info, redbear-netctl, redbear-quirks, redbear-meta +│ │ ├── system/ ← cub, diskd, evdevd, udev-shim, redbear-firmware, firmware-loader, redbear-hwutils, redbear-info, redbear-netctl, redbear-quirks, redbear-meta │ │ │ ├── redbear-sessiond ← org.freedesktop.login1 D-Bus session broker (zbus-based Rust daemon) │ │ │ ├── redbear-authd ← local-user authentication daemon (`/etc/passwd` + `/etc/shadow` + `/etc/group`) │ │ │ ├── redbear-session-launch ← session bootstrap helper (uid/gid/env/runtime-dir handoff) @@ -773,6 +773,7 @@ When mainline updates affect our work: `local/docs/QUIRKS-SYSTEM.md` and the canonical desktop path plan. - `local/docs/DBUS-INTEGRATION-PLAN.md` is the canonical D-Bus architecture and implementation plan for KDE Plasma 6 on Wayland. It defines the phased approach to D-Bus service integration, the `redbear-sessiond` login1-compatible session broker, and the gap analysis for desktop-facing D-Bus services. - `local/docs/GREETER-LOGIN-IMPLEMENTATION-PLAN.md` is the canonical Red Bear-native greeter/login design and current implementation plan for the `redbear-full` desktop path. 
It defines the `redbear-authd` / `redbear-session-launch` / `redbear-greeter` split, service wiring, validation surface, and the current boundary between the active greeter path and the older `redbear-validation-session` helper flows. +- `local/docs/SCHEME-NAMESPACE-POPULATION-PLAN.md` is the canonical `/scheme/` namespace design document. It covers the `diskd` disk aggregator daemon, the /scheme/ completeness matrix, boot ordering (lived → diskd → rootfs), the two-path redoxfs disk discovery strategy (diskd first, legacy fallback), and future enhancements (hotplug, devfs-style aggregation, per-process namespaces). Cross-referenced with Linux (kobject/uevent), Fuchsia (Zircon/Component Manager), seL4 (CSpace), Plan 9 (per-process namespace), Genode (Platform session), and MINIX 3 (driver model). The current execution order for these subsystem plans is: diff --git a/local/docs/SCHEME-NAMESPACE-POPULATION-PLAN.md b/local/docs/SCHEME-NAMESPACE-POPULATION-PLAN.md new file mode 100644 index 0000000000..ef5c1131cd --- /dev/null +++ b/local/docs/SCHEME-NAMESPACE-POPULATION-PLAN.md @@ -0,0 +1,456 @@ +# Red Bear OS /scheme/ Namespace Population Plan + +**Version**: 1.0 (2026-06-12) +**Status**: Draft — pending review +**Canonical**: `local/docs/SCHEME-NAMESPACE-POPULATION-PLAN.md` +**Blocks**: Writable rootfs on live ISO, `redoxfs` disk discovery, `ls /scheme/` in shell +**Cross-references**: Linux kobject/uevent, Fuchsia Zircon/Component Manager, seL4 CSpace, Plan 9 per-process namespace, Genode capability routing, MINIX 3 driver model + +## 1. Problem Statement + +`ls /scheme/` hangs or returns empty in Red Bear OS. Three root causes: + +1. **initnsmgr `getdents` depends on daemons registering** — but boot ordering means some schemes + haven't registered yet when `redoxfs` calls `fs::read_dir("/scheme")` to find disk devices. +2. 
**No aggregator for block devices** — `redoxfs` must enumerate all `disk.*` schemes individually, + but `/scheme/disk.live` may not exist yet when the rootfs mount runs at priority 50. +3. **driver-block `getdents` returns `EOPNOTSUPP`** — individual disk schemes use legacy text-based + listing, not proper `getdents`. + +The result: `redoxfs` can't discover disks, rootfs fails to mount read-write, and `/scheme/` +listing is incomplete. + +## 2. Design Principles (Informed by Cross-Reference) + +### 2.1 Microkernel Principle (seL4, Red Bear OS) + +The kernel tracks scheme IDs (integers), not names. All name→ID mapping happens in userspace +(`initnsmgr`). This is correct per the user's explicit correction: + +> "Kernel does not have to track id-name mapping! Kernel only knows about IDs. It's a microkernel +> and stuff like this must be done in userspace" + +**Implication**: We never modify the kernel to "export" scheme names. The namespace is purely +a userspace construct managed by `initnsmgr`. + +### 2.2 Aggregator Pattern (Linux devtmpfs + Fuchsia devcoordinator) + +Linux populates `/dev` via two mechanisms: +- **devtmpfs** — kernel auto-creates basic `/dev/null`, `/dev/sda1` etc. at boot +- **udev** — userspace daemon receives uevents via netlink, applies rules, creates additional nodes + +Fuchsia uses **devcoordinator** (now driver-index + device-finder): +- Drivers register devices with the driver manager +- devcoordinator exposes them via `devfs` (listable, browsable) +- Component Manager routes specific devices to components via capability declarations + +Red Bear OS should follow the **aggregator** pattern: userspace daemons that discover, +enumerate, and expose device categories through listable scheme namespaces. + +### 2.3 Bootstrap Ordering (Plan 9, Fuchsia) + +Plan 9 bootstraps namespace incrementally: +1. Kernel boots with `#` device drivers (kernel-resident, like Red Bear's `GlobalSchemes`) +2. `boot(8)` script binds drivers into the namespace +3. 
`init(8)` builds the per-process namespace from `/lib/namespace` + +Fuchsia bootstraps similarly: +1. Zircon boots, creates root job + resource handles +2. component_manager starts, receives boot info (device handles from ZBI) +3. driver_index enumerates drivers, binds them to devices +4. devfs provides the listable namespace + +Red Bear OS boot sequence (current): +``` +bootstrap → initnsmgr (initial schemes: 10 kernel globals + "proc" + "initfs") + → init starts service targets + → 10_lived.service (priority 10): registers "disk.live" + → 40_drivers.target: pcid, graphics, etc. + → 45_diskd.service (NEW): scans disk.* schemes, registers "diskd" + → 50_rootfs.service: redoxfs uses diskd to find root device +``` + +### 2.4 Separation of Discovery and Access (Genode, seL4) + +Genode separates: +- **Platform session** — device discovery (what hardware exists) +- **I/O session** — device access (read/write/mmio) + +seL4 separates: +- **Device Untyped caps** — raw hardware access +- **Platform description** — structured description of what devices exist + +In Red Bear OS terms: `diskd` provides discovery (listing), but actual block I/O goes through +the original `disk.live`/`disk.sata0` schemes directly. `diskd` returns `OpenResult::OtherScheme` +so the kernel hands the caller a raw fd to the underlying scheme — zero overhead. + +## 3. 
Current Architecture + +### 3.1 Kernel Global Schemes (10) + +Registered by bootstrap in `exec.rs` → `initnsmgr::run()`: + +| Scheme | GlobalSchemes Variant | Kernel Source | +|--------|-----------------------|---------------| +| debug | Debug | `scheme/debug.rs` | +| event | Event | `scheme/event.rs` | +| memory | Memory | `scheme/memory.rs` | +| pipe | Pipe | `scheme/pipe.rs` | +| serio | Serio | `scheme/serio.rs` | +| irq | Irq | `scheme/irq.rs` | +| time | Time | `scheme/time.rs` | +| sys | Sys | `scheme/sys/mod.rs` | +| proc | Proc | `scheme/proc/mod.rs` | +| acpi | Acpi | `scheme/acpi.rs` | +| dtb | Dtb | `scheme/dtb.rs` | + +These are registered in the `KernelSchemes` enum (kernel/src/scheme/mod.rs:438) and +exposed to initnsmgr during bootstrap. + +### 3.2 initnsmgr Namespace Manager + +Located at `local/sources/base/bootstrap/src/initnsmgr.rs`. + +Key structures: +```rust +struct Namespace { + schemes: HashMap>, // name → fd +} +``` + +- `open("")` → `Handle::List` (directory listing handle) +- `getdents(Handle::List)` → iterates `schemes` HashMap, returns `DirEntry` for each name +- Daemons register via `NsDup::IssueRegister` + sendfd mechanism +- Bootstrap passes initial set: kernel globals + "proc" + "initfs" + +### 3.3 Userspace Scheme Registration + +Daemons register via: +1. `Socket::create()` → creates scheme socket +2. `NsDup::IssueRegister` → tells initnsmgr the scheme name +3. `sendfd` → sends the scheme socket fd to initnsmgr +4. 
initnsmgr stores in `schemes: HashMap>` + +### 3.4 Current Userspace Schemes (at boot) + +| Scheme | Daemon | Priority | Source | +|--------|--------|----------|--------| +| initfs | bootstrap | 0 | bootstrap exec.rs | +| proc | kernel | 0 | GlobalSchemes | +| disk.live | lived | 10 | init.initfs.d/10_lived.service | +| disk.sata0 | ahcid | 40 | pcid-spawner | +| disk.virtio0 | virtio-blkd | 40 | pcid-spawner | +| display | vesad | 20 | init.initfs.d/20_vesad.service | +| drm | redox-drm | 30 | init.initfs.d/30_graphics.service | +| net | e1000d / virtio-netd | 40 | pcid-spawner | +| orbital | orbital | rootfs | (legacy, not used in redbear-full) | + +### 3.5 The Root Cause Chain + +``` +redoxfs mount (priority 50) + → fs::read_dir("/scheme") → initnsmgr getdents + → iterates schemes HashMap → finds "disk.live" (registered at priority 10) + → is_scheme_category("disk") → true + → Fd::open("/scheme/disk.live") → reads text listing + → finds block device → opens /scheme/disk.live/0 → reads UUID + → UUID matches → mounts as rootfs +``` + +**The bug**: `redoxfs` retries 20×200ms = 4 seconds. If disk discovery takes longer than +4 seconds (e.g., AHCI probe on real hardware), rootfs mount fails → read-only fallback. + +**The fix**: `diskd` aggregator + longer timeout + event-driven notification. + +## 4. 
Solution Architecture + +### 4.1 Component Overview + +``` +┌─────────────────────────────────────────────────────────┐ +│ /scheme/ namespace │ +│ (initnsmgr) │ +│ │ +│ Kernel globals: │ +│ debug, event, memory, pipe, serio, irq, │ +│ time, sys, proc, acpi, dtb │ +│ │ +│ Boot schemes (initfs): │ +│ initfs, disk.live, display │ +│ │ +│ Aggregators: │ +│ diskd ← /scheme/diskd lists ALL block devices │ +│ │ +│ Hardware daemons (post-drivers.target): │ +│ disk.sata0..7 (ahcid) │ +│ disk.virtio0..7 (virtio-blkd) │ +│ disk.nvme0..7 (nvmed) │ +│ disk.usb0..7 (usbscsid) │ +│ disk.ide0..3 (ideid) │ +│ net (e1000d, virtio-netd, ixgbed, rt8169d) │ +│ drm (redox-drm) │ +│ │ +│ System daemons (post-rootfs): │ +│ audio (audiod) │ +│ firmware (firmware-loader) │ +│ input (evdevd) │ +│ udev (udev-shim) │ +│ ... │ +└─────────────────────────────────────────────────────────┘ +``` + +### 4.2 diskd — Disk Aggregator Daemon (IMPLEMENTED) + +**Location**: `local/recipes/system/diskd/` +**Scheme name**: `diskd` +**Binary**: `/usr/bin/diskd` +**Status**: Code complete, cargo check/clippy/fmt clean + +**How it works**: + +1. At boot (priority 45), diskd starts +2. Probes `/scheme/disk.live`, `/scheme/disk.sata0`..7, `/scheme/disk.virtio0`..7, etc. +3. For each found scheme, reads its text listing to discover devices and partitions +4. Registers scheme `diskd` with initnsmgr +5. `getdents` on `diskd:` returns real `DirEntry` with `DirentKind::BlockDev` +6. 
`open("0")` or `open("0p1")` opens the underlying scheme and returns `OtherScheme` + (zero-copy — caller talks directly to the block device) + +**Why this solves the root cause**: + +- `redoxfs` currently must enumerate ALL `/scheme/disk.*` individually — 50+ `Fd::open` calls +- With `diskd`, `redoxfs` does ONE `read_dir("/scheme/diskd")` to get all block devices +- diskd already did the probing and enumeration +- Even if AHCI hasn't registered yet, diskd's retry logic handles late registration +- `redoxfs` timeout only needs to wait for `diskd` to be ready, not all individual schemes + +### 4.3 Changes Required to Existing Components + +#### 4.3.1 redoxfs — Use diskd for disk discovery + +**File**: `local/sources/redoxfs/src/bin/mount.rs` (function `filesystem_by_uuid`) + +**Current behavior**: +```rust +// Line 224: fs::read_dir("/scheme") → filter is_scheme_category("disk") +// For each disk.* scheme: open, read listing, find block devices, check UUID +// Retry 20×200ms = 4 seconds total +``` + +**New behavior** (two-path approach): + +```rust +fn filesystem_by_uuid(uuid: &[u8; 16]) -> Option<File> { + // Path A: Try diskd aggregator first (fast, single enumeration) + if let Some(f) = try_diskd_uuid(uuid) { + return Some(f); + } + // Path B: Fall back to legacy per-scheme enumeration + // (for backwards compat and environments without diskd) + try_legacy_uuid_search(uuid) +} + +fn try_diskd_uuid(uuid: &[u8; 16]) -> Option<File> { + // Wait for diskd scheme to appear + for _ in 0..50 { // 50 × 200ms = 10 seconds + if let Ok(dir) = fs::read_dir("/scheme/diskd") { + for entry in dir { + // Skip unreadable entries instead of aborting the whole search + let Ok(entry) = entry else { continue }; + let name = entry.file_name().to_string_lossy().into_owned(); + // Open the block device via diskd (which proxies to underlying scheme) + let path = format!("/scheme/diskd/{name}"); + if let Ok(mut f) = File::open(&path) { + if check_uuid(&mut f, uuid) { + return Some(f); + } + } + } + } + thread::sleep(Duration::from_millis(200)); + } + None +} +``` + +#### 4.3.2 
init.initfs.d — Add diskd service + +**New file**: `local/sources/base/init.initfs.d/45_diskd.service` + +```ini +[[service]] +name = "diskd" +command = "/usr/bin/diskd" +priority = 45 +requires = ["lived"] +``` + +This ensures diskd starts after lived (which provides disk.live at priority 10) and before +rootfs mount (priority 50). + +#### 4.3.3 config/redbear-mini.toml — Add diskd package + +Add `diskd` to the `[packages]` section so it's included in the image. + +### 4.4 /scheme/ Namespace Completeness Matrix + +After all changes, `/scheme/` will expose: + +| Category | Scheme Name | Provider | getdents | Notes | +|----------|-------------|----------|----------|-------| +| **Kernel globals** | | | | | +| Debug | `debug` | kernel GlobalSchemes | ✅ real DirEntry | kernel/src/scheme/debug.rs | +| Event | `event` | kernel GlobalSchemes | ✅ real DirEntry | kernel/src/scheme/event.rs | +| Memory | `memory` | kernel GlobalSchemes | EOPNOTSUPP | No sub-entries expected | +| Pipe | `pipe` | kernel GlobalSchemes | EOPNOTSUPP | Anonymous, no listing | +| Serio | `serio` | kernel GlobalSchemes | ✅ real DirEntry | kernel/src/scheme/serio.rs | +| IRQ | `irq` | kernel GlobalSchemes | ✅ real DirEntry | cpu-XX entries | +| Time | `time` | kernel GlobalSchemes | ✅ real DirEntry | CLOCK_* entries | +| Sys | `sys` | kernel GlobalSchemes | ✅ real DirEntry | scheme:/scp/ sub-entries | +| Proc | `proc` | kernel GlobalSchemes | ✅ real DirEntry | pid entries | +| ACPI | `acpi` | kernel GlobalSchemes | ✅ real DirEntry | rxsdt, kstop | +| DTB | `dtb` | kernel GlobalSchemes | EOPNOTSUPP | Single blob | +| **Bootstrap** | | | | | +| InitFS | `initfs` | bootstrap | ✅ real DirEntry | initramfs contents | +| **Storage** | | | | | +| Live disk | `disk.live` | lived | ✅ text listing | virtio/ahci backend | +| SATA disk | `disk.sata0..7` | ahcid | ✅ text listing | per-disk scheme | +| VirtIO disk | `disk.virtio0..7` | virtio-blkd | ✅ text listing | per-disk scheme | +| NVMe disk | `disk.nvme0..7` 
| nvmed | ✅ text listing | per-disk scheme | +| USB disk | `disk.usb0..7` | usbscsid | ✅ text listing | per-disk scheme | +| IDE disk | `disk.ide0..3` | ideid | ✅ text listing | per-disk scheme | +| **Aggregators** | | | | | +| Disk aggregator | `diskd` | diskd | ✅ real DirEntry BlockDev | THIS PLAN | +| **Display** | | | | | +| Framebuffer | `display` | vesad | EOPNOTSUPP | Legacy text listing | +| DRM/KMS | `drm` | redox-drm | ✅ real DirEntry | card0, card0-*, connectors | +| **Network** | | | | | +| Ethernet | `net` | e1000d/virtio-netd | ✅ real DirEntry | interface entries | +| **Input** | | | | | +| Input events | `input` | evdevd | ✅ real DirEntry | event0, event1, ... | +| **Audio** | | | | | +| Audio | `audio` | audiod | ✅ text listing | Audio streams | +| **System** | | | | | +| Firmware | `firmware` | firmware-loader | ✅ real DirEntry | GPU/device blobs | +| Udev | `udev` | udev-shim | ✅ real DirEntry | Linux-compatible device nodes | + +### 4.5 initnsmgr getdents — Already Correct + +The `initnsmgr` `getdents` implementation at line 402-439 of `initnsmgr.rs` iterates +`schemes: HashMap>` and emits a `DirEntry` for each registered scheme. +This is already correct — it will list any scheme that has been registered, including `diskd`. + +**The `/scheme/` listing issue was NOT a getdents bug** — it was a timing issue: +- Daemons hadn't registered yet when `fs::read_dir("/scheme")` was called +- The fix is proper boot ordering (diskd at priority 45) and the diskd aggregator + +## 5. Future Enhancements (Beyond Current Scope) + +### 5.1 Event-Driven Discovery (uevent Equivalent) + +Currently `diskd` probes statically at startup. For hotplug (USB drives, PCIe hot-add): + +- **pcid** sends a `uevent`-like notification when a new PCI device appears +- **diskd** listens for these notifications and re-scans +- Alternative: inotify-like watch on `/scheme/` (would need kernel support) + +This mirrors Linux's `uevent` netlink broadcast → `udev` listener pattern. 
+ +### 5.2 devfs-Style Aggregation + +A future `devfsd` could provide Linux-compatible `/dev` paths: +``` +/scheme/devfs/sda → /scheme/diskd/0 +/scheme/devfs/sda1 → /scheme/diskd/0p1 +/scheme/devfs/null → /scheme/debug (write sink) +/scheme/devfs/zero → /scheme/memory (zero-filled read) +/scheme/devfs/random → /scheme/entropy +/scheme/devfs/tty0 → /scheme/display.0 +/scheme/devfs/input/event0 → /scheme/input/event0 +``` + +This would be the Fuchsia devcoordinator equivalent — a unified, Linux-compatible +device namespace. The `udev-shim` already provides parts of this. + +### 5.3 Per-Process Namespace (Plan 9 Style) + +Plan 9's `bind` and `mount` allow per-process namespace customization. Red Bear OS's +`setrens` syscall provides a basic version (switch namespace fd). Future enhancement: +- Per-container namespaces (for `contain` and future container runtime) +- Namespace inheritance rules (like Fuchsia's `.cml` capability routing) +- `chroot`-like namespace restriction for sandboxed applications + +### 5.4 Capability-Based Access (seL4 Style) + +seL4 uses CSpace (capability spaces) for device access. Each process has a CSpace that +contains only the capabilities it should have access to. Red Bear OS could evolve toward +this model: + +- `initnsmgr` tracks which schemes each process can access +- `open("/scheme/net")` checks the caller's capability set +- `setrens` evolves from "switch namespace" to "restrict to capability subset" + +This would require kernel changes (per-process scheme allowlists), which is beyond current +scope but worth keeping in mind for security hardening. + +## 6. 
Implementation Plan + +### Phase 1 — Immediate Fix (This Session) + +| Step | Action | Files | Status | +|------|--------|-------|--------| +| 1 | diskd daemon implementation | `local/recipes/system/diskd/` | ✅ Done | +| 2 | Add diskd init service | `local/sources/base/init.initfs.d/45_diskd.service` | Pending | +| 3 | Add diskd to config | `config/redbear-mini.toml` | Pending | +| 4 | Modify redoxfs to use diskd | `local/sources/redoxfs/src/bin/mount.rs` | Pending | +| 5 | Commit uncommitted changes | driver-manager, config | Pending | +| 6 | Remove pcid debug logging | `local/sources/base/drivers/pcid/src/cfg_access/fallback.rs` | Pending | +| 7 | Make C++ header fix durable | `mk/prefix.mk` | Pending | +| 8 | Build and test ISO | `./local/scripts/build-redbear.sh redbear-mini` | Pending | +| 9 | Boot test in QEMU | `scripts/run_mini1.sh` | Pending | + +### Phase 2 — Hotplug Support (Future) + +| Step | Action | Dependencies | +|------|--------|--------------| +| 1 | pcid uevent notification | pcid-spawner enhancement | +| 2 | diskd dynamic re-scan | uevent listener | +| 3 | devfsd Linux-compatible /dev | udev-shim + diskd integration | + +### Phase 3 — Namespace Security (Future) + +| Step | Action | Dependencies | +|------|--------|--------------| +| 1 | Per-process scheme allowlist | kernel scheme access control | +| 2 | Container namespace isolation | contain enhancement | +| 3 | Capability routing | initnsmgr capability model | + +## 7. 
Cross-Reference Summary + +| System | Mechanism | Red Bear Equivalent | Status | +|--------|-----------|---------------------|--------| +| **Linux** | kobject/uevent → udev → /dev | pcid → diskd → /scheme/diskd | Phase 1 | +| **Fuchsia** | devcoordinator → devfs | initnsmgr → diskd | Phase 1 | +| **seL4** | CSpace capabilities | setrens (basic) | Phase 3 | +| **Plan 9** | bind/mount per-process | setrens (basic) | Phase 3 | +| **Genode** | Platform session | redox-driver-sys | Existing | +| **MINIX 3** | driver announce → devfs | daemon register → initnsmgr | Existing | + +## 8. Risk Assessment + +| Risk | Mitigation | +|------|------------| +| diskd probe takes too long on real hardware | Increase retry count (50×200ms = 10s), add event-driven re-scan | +| diskd crashes and disk namespace disappears | init service auto-restart (`restart = true` in service file) | +| redoxfs legacy path broken by diskd changes | Two-path approach: try diskd first, fall back to legacy | +| Boot ordering regression (diskd starts before lived) | Explicit `requires = ["lived"]` in service file | +| diskd returns stale device list after hotplug | Phase 2: event-driven re-scan; Phase 1: manual re-trigger via signal | + +## 9. Acceptance Criteria + +1. `ls /scheme/` in shell shows all registered schemes (no hang, no empty) +2. `ls /scheme/diskd/` shows all block devices discovered by diskd +3. `redoxfs` mounts rootfs read-write via diskd path +4. `/tmp` is writable by non-root users +5. Boot completes to login prompt with zero warnings +6. QEMU boot test passes: `scripts/run_mini1.sh` reaches login prompt +7. `./local/scripts/build-redbear.sh redbear-mini` produces working ISO diff --git a/local/recipes/AGENTS.md b/local/recipes/AGENTS.md index 80b9f754cc..e6530005f1 100644 --- a/local/recipes/AGENTS.md +++ b/local/recipes/AGENTS.md @@ -171,6 +171,7 @@ live in `local/patches//`, never here. 
| Recipe | Template | Language | Description | |--------|----------|----------|-------------| +| diskd | cargo | Rust | Disk aggregator scheme daemon — probes all `disk.*` schemes, exposes `/scheme/diskd` with real `getdents` and `DirentKind::BlockDev`; zero-copy block I/O via `OpenResult::OtherScheme` | | coretempd | cargo | Rust | CPU core temperature monitoring daemon | | cpufreqd | cargo | Rust | CPU frequency scaling daemon | | cub | custom | Rust | Red Bear build utility (same as dev/cub, system-installed copy) | diff --git a/local/recipes/system/diskd/recipe.toml b/local/recipes/system/diskd/recipe.toml new file mode 100644 index 0000000000..0d0ecb4a37 --- /dev/null +++ b/local/recipes/system/diskd/recipe.toml @@ -0,0 +1,8 @@ +[source] +path = "source" + +[build] +template = "cargo" + +[package] +bins = ["diskd"] diff --git a/local/recipes/system/diskd/source/Cargo.toml b/local/recipes/system/diskd/source/Cargo.toml new file mode 100644 index 0000000000..cc94fc889b --- /dev/null +++ b/local/recipes/system/diskd/source/Cargo.toml @@ -0,0 +1,20 @@ +[package] +name = "diskd" +version = "0.1.0" +edition = "2024" +description = "Red Bear OS disk aggregator scheme daemon — exposes a single, listable namespace over all underlying disk.* schemes (disk.live, disk.sata*, disk.virtio*, disk.nvme*, disk.usb*, disk.ide*)" +license = "MIT" + +[[bin]] +name = "diskd" +path = "src/main.rs" + +[dependencies] +redox-scheme = "0.11" +libredox = { version = "=0.1.16", features = ["call", "std"] } +syscall = { package = "redox_syscall", version = "0.7", features = ["std"] } +log = "0.4" + +[profile.release] +opt-level = 3 +lto = true diff --git a/local/recipes/system/diskd/source/src/main.rs b/local/recipes/system/diskd/source/src/main.rs new file mode 100644 index 0000000000..c1c90aa167 --- /dev/null +++ b/local/recipes/system/diskd/source/src/main.rs @@ -0,0 +1,453 @@ +use std::collections::BTreeMap; +use std::io::{self, Write}; +use std::process; +use std::time::Duration; + 
+use libredox::Fd; +use log::{LevelFilter, Metadata, Record, error, info, warn}; +use redox_scheme::scheme::{SchemeState, SchemeSync}; +use redox_scheme::{CallerCtx, OpenResult, RequestKind, SignalBehavior, Socket}; +use syscall::dirent::{DirEntry, DirentBuf, DirentKind}; +use syscall::error::{EACCES, EBADF, EINTR, EINVAL, ENOENT, ENOTDIR, Error, Result}; +use syscall::flag::{O_ACCMODE, O_DIRECTORY, O_STAT}; +use syscall::schemev2::NewFdFlags; +use syscall::{MODE_DIR, Stat}; + +const SCHEME_NAME: &str = "diskd"; +const SCHEME_ROOT_ID: usize = 0; +const MAX_DEVICES_PER_KIND: u32 = 8; +const MAX_IDE_DEVICES: u32 = 4; +const PROBE_RETRY: usize = 5; +const PROBE_DELAY: Duration = Duration::from_millis(150); + +const PROBE_PREFIXES: &[&str] = &[ + "disk.live", + "disk.sata", + "disk.virtio", + "disk.nvme", + "disk.usb", + "disk.ide", +]; + +struct StderrLogger; + +impl log::Log for StderrLogger { + fn enabled(&self, metadata: &Metadata<'_>) -> bool { + metadata.level() <= LevelFilter::Info + } + + fn log(&self, record: &Record<'_>) { + if self.enabled(record.metadata()) { + let _ = writeln!( + io::stderr().lock(), + "[{}] diskd: {}", + record.level(), + record.args() + ); + } + } + + fn flush(&self) {} +} + +static LOGGER: StderrLogger = StderrLogger; + +#[derive(Clone, Debug)] +struct BlockDevice { + scheme: String, + name: String, + underlying_path: String, +} + +impl BlockDevice { + fn new(scheme: String, disk_index: u32, partition: Option<u32>) -> Self { + let name = match partition { + Some(p) => format!("{disk_index}p{p}"), + None => format!("{disk_index}"), + }; + let underlying_path = match partition { + Some(p) => format!("/scheme/{scheme}/{disk_index}p{p}"), + None => format!("/scheme/{scheme}/{disk_index}"), + }; + Self { + scheme, + name, + underlying_path, + } + } +} + +enum Handle { + SchemeRoot, + List, +} + +struct DiskdScheme { + devices: Vec<BlockDevice>, + handles: BTreeMap<usize, Handle>, + next_id: usize, +} + +impl DiskdScheme { + fn new(devices: Vec<BlockDevice>) -> Self { + let mut handles = 
BTreeMap::new(); + handles.insert(SCHEME_ROOT_ID, Handle::SchemeRoot); + Self { + devices, + handles, + next_id: 1, + } + } + + fn alloc_list(&mut self) -> usize { + let id = self.next_id; + self.next_id = self + .next_id + .checked_add(1) + .expect("diskd: handle id overflow"); + self.handles.insert(id, Handle::List); + id + } + + fn handle(&self, id: usize) -> Result<&Handle> { + self.handles.get(&id).ok_or_else(|| Error::new(EBADF)) + } + + fn device(&self, name: &str) -> Option<&BlockDevice> { + self.devices.iter().find(|d| d.name == name) + } +} + +fn parse_partition_entry(line: &str) -> Option<(u32, Option<u32>)> { + let trimmed = line.trim(); + if trimmed.is_empty() { + return None; + } + if let Some(idx) = trimmed.find('p') { + let (disk_part, part_part) = trimmed.split_at(idx); + let part_str = &part_part[1..]; + let disk = disk_part.parse::<u32>().ok()?; + let part = part_str.parse::<u32>().ok()?; + Some((disk, Some(part))) + } else { + let disk = trimmed.parse::<u32>().ok()?; + Some((disk, None)) + } +} + +fn probe_scheme_listing(scheme_name: &str) -> Vec<(u32, Option<u32>)> { + let path = format!("/scheme/{scheme_name}"); + let mut result = Vec::new(); + for _ in 0..PROBE_RETRY { + match Fd::open(&path, O_DIRECTORY as i32, 0) { + Ok(fd) => { + let mut buffer = [0u8; 4096]; + match fd.read(&mut buffer) { + Ok(0) => return result, + Ok(n) => { + let text = String::from_utf8_lossy(&buffer[..n]); + for line in text.lines() { + if let Some(entry) = parse_partition_entry(line) + && !result.contains(&entry) + { + result.push(entry); + } + } + return result; + } + Err(err) => { + warn!("diskd: read {path} failed: {err}"); + return result; + } + } + } + Err(err) if err.errno() == ENOENT => { + std::thread::sleep(PROBE_DELAY); + continue; + } + Err(err) => { + warn!("diskd: open {path} failed: {err}"); + return result; + } + } + } + result +} + +fn scan_devices() -> Vec<BlockDevice> { + let mut devices = Vec::new(); + let mut probe_specs: Vec<(&str, u32)> = Vec::new(); + for prefix in PROBE_PREFIXES 
{ + let max = if *prefix == "disk.ide" { + MAX_IDE_DEVICES + } else { + MAX_DEVICES_PER_KIND + }; + for i in 0..max { + probe_specs.push((prefix, i)); + } + } + for (prefix, i) in probe_specs { + let scheme_name = format!("{prefix}{i}"); + let entries = probe_scheme_listing(&scheme_name); + if entries.is_empty() { + continue; + } + info!( + "diskd: discovered scheme {scheme_name} with {} entries", + entries.len() + ); + for (disk_index, partition) in entries { + devices.push(BlockDevice::new(scheme_name.clone(), disk_index, partition)); + } + } + devices.sort_by(|a, b| a.name.cmp(&b.name).then_with(|| a.scheme.cmp(&b.scheme))); + let mut deduped: Vec<BlockDevice> = Vec::with_capacity(devices.len()); + for d in devices { + if !deduped + .iter() + .any(|x| x.name == d.name && x.scheme == d.scheme) + { + deduped.push(d); + } + } + deduped +} + +impl SchemeSync for DiskdScheme { + fn scheme_root(&mut self) -> Result<usize> { + Ok(SCHEME_ROOT_ID) + } + + fn openat( + &mut self, + dirfd: usize, + path: &str, + flags: usize, + _fcntl_flags: u32, + ctx: &CallerCtx, + ) -> Result<OpenResult> { + if !matches!(self.handle(dirfd)?, Handle::SchemeRoot) { + return Err(Error::new(EACCES)); + } + + let trimmed = path.trim_matches('/'); + + if trimmed.is_empty() { + if flags & (O_DIRECTORY | O_STAT) == 0 { + return Err(Error::new(EINVAL)); + } + if ctx.uid != 0 { + return Err(Error::new(EACCES)); + } + let id = self.alloc_list(); + return Ok(OpenResult::ThisScheme { + number: id, + flags: NewFdFlags::empty(), + }); + } + + if ctx.uid != 0 { + return Err(Error::new(EACCES)); + } + + let device = self.device(trimmed).ok_or(Error::new(ENOENT))?; + let underlying = device.underlying_path.clone(); + let fd = Fd::open(&device.underlying_path, (flags & O_ACCMODE) as i32, 0).inspect_err( + |err| { + warn!("diskd: failed to open {underlying} for caller: {err}"); + }, + )?; + Ok(OpenResult::OtherScheme { fd: fd.into_raw() }) + } + + fn getdents<'buf>( + &mut self, + id: usize, + mut buf: DirentBuf<&'buf mut [u8]>, + 
opaque_offset: u64, + ) -> Result> { + if !matches!(self.handle(id)?, Handle::List) { + return Err(Error::new(ENOTDIR)); + } + let offset = usize::try_from(opaque_offset).unwrap_or(usize::MAX); + for (i, device) in self.devices.iter().enumerate().skip(offset) { + if let Err(err) = buf.entry(DirEntry { + inode: 0, + next_opaque_id: (i as u64) + 1, + name: &device.name, + kind: DirentKind::BlockDev, + }) { + if err.errno == EINVAL { + break; + } + return Err(err); + } + } + Ok(buf) + } + + fn fstat(&mut self, id: usize, stat: &mut Stat, _ctx: &CallerCtx) -> Result<()> { + match self.handle(id)? { + Handle::SchemeRoot => { + stat.st_mode = MODE_DIR; + stat.st_size = 0; + Ok(()) + } + Handle::List => { + stat.st_mode = MODE_DIR; + stat.st_size = 0; + Ok(()) + } + } + } + + fn fpath(&mut self, id: usize, buf: &mut [u8], _ctx: &CallerCtx) -> Result { + let mut writer = FpathBuf::new(buf); + write!(&mut writer, "{SCHEME_NAME}:").map_err(|_| Error::new(EINVAL))?; + match self.handle(id)? { + Handle::SchemeRoot => {} + Handle::List => { + write!(&mut writer, "/").map_err(|_| Error::new(EINVAL))?; + } + } + Ok(writer.written()) + } + + fn read( + &mut self, + _id: usize, + _buf: &mut [u8], + _offset: u64, + _fcntl_flags: u32, + _ctx: &CallerCtx, + ) -> Result { + Err(Error::new(EBADF)) + } + + fn write( + &mut self, + _id: usize, + _buf: &[u8], + _offset: u64, + _fcntl_flags: u32, + _ctx: &CallerCtx, + ) -> Result { + Err(Error::new(EBADF)) + } + + fn fsize(&mut self, id: usize, _ctx: &CallerCtx) -> Result { + match self.handle(id)? 
{ + Handle::SchemeRoot | Handle::List => Ok(0), + } + } + + fn on_close(&mut self, id: usize) { + if id == SCHEME_ROOT_ID { + return; + } + self.handles.remove(&id); + } +} + +struct FpathBuf<'a> { + buf: &'a mut [u8], + written: usize, +} + +impl<'a> FpathBuf<'a> { + fn new(buf: &'a mut [u8]) -> Self { + Self { buf, written: 0 } + } + + fn written(&self) -> usize { + self.written + } +} + +impl Write for FpathBuf<'_> { + fn write(&mut self, src: &[u8]) -> io::Result { + if self.written >= self.buf.len() { + return Ok(0); + } + let avail = self.buf.len() - self.written; + let count = src.len().min(avail); + self.buf[self.written..self.written + count].copy_from_slice(&src[..count]); + self.written += count; + Ok(count) + } + + fn flush(&mut self) -> io::Result<()> { + Ok(()) + } +} + +#[cfg(target_os = "redox")] +fn enter_null_namespace() { + if let Err(err) = libredox::call::setrens(0, 0) { + error!("diskd: setrens(0, 0) failed: {err}"); + } +} + +#[cfg(not(target_os = "redox"))] +fn enter_null_namespace() {} + +#[cfg(target_os = "redox")] +fn run_daemon() -> io::Result<()> { + enter_null_namespace(); + + let devices = scan_devices(); + info!("diskd: discovered {} block device entries", devices.len()); + for d in &devices { + info!("diskd: {} -> {}", d.name, d.underlying_path); + } + + let socket = Socket::create() + .map_err(|err| io::Error::other(format!("diskd: failed to create scheme socket: {err}")))?; + let mut state = SchemeState::new(); + let mut scheme = DiskdScheme::new(devices); + + info!("diskd: scheme {SCHEME_NAME} ready"); + + loop { + let request = match socket.next_request(SignalBehavior::Restart) { + Ok(Some(req)) => req, + Ok(None) => { + info!("diskd: scheme socket closed; exiting"); + return Ok(()); + } + Err(err) if err.errno == EINTR => continue, + Err(err) => { + error!("diskd: next_request failed: {err}"); + return Err(io::Error::other(format!("diskd: {err}"))); + } + }; + + if let RequestKind::Call(call_request) = request.kind() { + let 
response = call_request.handle_sync(&mut scheme, &mut state); + if let Err(err) = socket.write_response(response, SignalBehavior::Restart) { + error!("diskd: write_response failed: {err}"); + return Err(io::Error::other(format!("diskd: {err}"))); + } + } + } +} + +#[cfg(not(target_os = "redox"))] +fn run_daemon() -> io::Result<()> { + info!("diskd: host build: scheme serving disabled outside Redox"); + Ok(()) +} + +fn main() { + let _ = log::set_logger(&LOGGER); + log::set_max_level(LevelFilter::Info); + + match run_daemon() { + Ok(()) => process::exit(0), + Err(err) => { + error!("diskd: fatal: {err}"); + process::exit(1); + } + } +} diff --git a/local/recipes/system/driver-manager/source/src/config.rs b/local/recipes/system/driver-manager/source/src/config.rs index 39da6953b0..fb0f7955f0 100644 --- a/local/recipes/system/driver-manager/source/src/config.rs +++ b/local/recipes/system/driver-manager/source/src/config.rs @@ -176,24 +176,57 @@ fn open_pcid_channel(device_path: &str) -> Result { } fn check_scheme_available(name: &str) -> bool { + use std::sync::atomic::{AtomicBool, Ordering}; + use std::sync::Arc; + use std::time::{Duration, Instant}; + let path = format!("/scheme/{}", name); - // Use read_dir instead of Path::exists() because Redox scheme paths - // may not respond correctly to exists()/metadata() while still being - // fully functional for directory enumeration and file open. - // This was the root cause of "dependency scheme not ready: pci" even - // though PciBus::enumerate_devices (which uses read_dir) succeeded. - match fs::read_dir(&path) { - Ok(_) => true, - Err(err) => { - log::debug!( - "driver-manager: scheme availability check failed for {}: {} (exists={})", - path, - err, - std::path::Path::new(&path).exists() - ); - false + // Run the read_dir probe on a background thread with a hard 2s timeout. 
+ // A single direct read_dir on Redox can hang indefinitely when a scheme + // is registered but not yet responding to directory ops; this guard + // keeps the boot path moving. The probe uses read_dir (not exists()) + // because Redox scheme paths may not respond to exists()/metadata() + // while still being fully functional for directory enumeration. + let available = Arc::new(AtomicBool::new(false)); + let done = Arc::new(AtomicBool::new(false)); + + let avail_clone = Arc::clone(&available); + let done_clone = Arc::clone(&done); + let path_clone = path.clone(); + + let _ = std::thread::spawn(move || { + match fs::read_dir(&path_clone) { + Ok(_) => { + avail_clone.store(true, Ordering::SeqCst); + } + Err(err) => { + log::debug!( + "driver-manager: scheme availability check failed for {}: {}", + path_clone, + err + ); + } } + done_clone.store(true, Ordering::SeqCst); + }); + + let timeout = Duration::from_secs(2); + let start = Instant::now(); + while start.elapsed() < timeout { + if available.load(Ordering::SeqCst) { + return true; + } + if done.load(Ordering::SeqCst) { + return false; + } + std::thread::sleep(Duration::from_millis(50)); } + + log::debug!( + "driver-manager: scheme {} check timed out after 2s", + path + ); + false } impl Driver for DriverConfig { diff --git a/local/recipes/system/driver-manager/source/src/main.rs b/local/recipes/system/driver-manager/source/src/main.rs index 52d02cd4b5..f80164486c 100644 --- a/local/recipes/system/driver-manager/source/src/main.rs +++ b/local/recipes/system/driver-manager/source/src/main.rs @@ -420,7 +420,7 @@ fn main() { let manager_config = ManagerConfig { max_concurrent_probes: 4, deferred_retry_ms: 500, - async_probe: !initfs, + async_probe: false, }; let manager = Arc::new(Mutex::new(DeviceManager::new(manager_config.clone()))); @@ -457,13 +457,10 @@ fn main() { reset_timeline_log(); - // In initfs mode, pcid (spawned by hwd at service 40) may not have - // registered /scheme/pci yet when driver-manager 
starts at service 00. - // Wait for required bus schemes to appear before enumerating. - if initfs { - wait_for_scheme("pci", Duration::from_secs(30)); - wait_for_scheme("acpi", Duration::from_secs(10)); - } + // Always wait for bus schemes — even in rootfs mode the PCI/ACPI daemons + // may not have registered yet when driver-manager starts. + wait_for_scheme("pci", Duration::from_secs(30)); + wait_for_scheme("acpi", Duration::from_secs(10)); if manager_config.async_probe { let handle = thread::spawn(move || { diff --git a/local/sources/base b/local/sources/base index 1b9b8be0fc..31e337559d 160000 --- a/local/sources/base +++ b/local/sources/base @@ -1 +1 @@ -Subproject commit 1b9b8be0fc6c23554cc29b569629ab8e955d7168 +Subproject commit 31e337559dfce6e14d70f0d1096f879f6640853d diff --git a/mk/prefix.mk b/mk/prefix.mk index 313a372775..5eb6b01251 100644 --- a/mk/prefix.mk +++ b/mk/prefix.mk @@ -101,6 +101,19 @@ else sed 's|/usr/share|$(ROOT)/$@/share|g' "$@/bin/libtoolize.orig" > "$@/bin/libtoolize" chmod 0755 "$@/bin/libtoolize" touch "$@" +# Ensure C++ headers are available in the sysroot. The relibc-install copy +# at line 99 may have overwritten or merged the gcc-install c++ headers +# via the relibc stage cp at lines 71-74. Re-link them durably so the +# cross-compiler finds libstdc++ headers across `make clean`, +# `--no-cache` rebuilds, and `prefix_clean`. +ifneq ($(HOSTED_REDOX),1) + @if [ -d "$(PREFIX)/gcc-install/$(GNU_TARGET)/include/c++" ] && \ + [ ! -e "$@/$(GNU_TARGET)/include/c++" ]; then \ + ln -sf "$(ROOT)/$(PREFIX)/gcc-install/$(GNU_TARGET)/include/c++" \ + "$@/$(GNU_TARGET)/include/c++"; \ + echo "symlinked C++ headers into sysroot"; \ + fi +endif endif # PREFIX_BINARY ---------------------------------------------------
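Review note on the `check_scheme_available` hunk in `driver-manager/source/src/config.rs`: the patch guards a possibly hanging `read_dir` by running it on a helper thread and polling two `AtomicBool` flags every 50 ms for up to 2 s. A minimal sketch of the same guard built on a channel follows, assuming only the Rust standard library. `probe_with_timeout` is a hypothetical helper name, not something in the patch, and the closures in `main` stand in for the real `fs::read_dir` probe. This is a different mechanism from the one the patch uses (`recv_timeout` instead of a poll loop), offered only as an illustration.

```rust
use std::sync::mpsc;
use std::thread;
use std::time::Duration;

/// Hypothetical helper (not part of driver-manager): runs `probe` on a helper
/// thread and waits at most `timeout` for its answer. Returns `false` when the
/// probe reports failure, panics, or does not finish in time.
fn probe_with_timeout<F>(probe: F, timeout: Duration) -> bool
where
    F: FnOnce() -> bool + Send + 'static,
{
    let (tx, rx) = mpsc::channel();
    thread::spawn(move || {
        // After a timeout the receiver is gone; a failed send is expected then.
        let _ = tx.send(probe());
    });
    // recv_timeout returns as soon as the result arrives, so no poll interval
    // is needed. Timeout and disconnect both map to `false`.
    rx.recv_timeout(timeout).unwrap_or(false)
}

fn main() {
    // Stand-in for a scheme that answers immediately.
    let fast = probe_with_timeout(|| true, Duration::from_millis(500));

    // Stand-in for a scheme that is registered but slow to respond.
    let slow = probe_with_timeout(
        || {
            thread::sleep(Duration::from_millis(400));
            true
        },
        Duration::from_millis(50),
    );

    println!("fast={fast} slow={slow}");
}
```

As in the patch, a probe that never returns leaves its helper thread blocked until the process exits; the sketch bounds only how long the caller waits.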