RedBear-OS/local/docs/BOOT-PROCESS-ASSESSMENT.md
vasilito 0bbd9adfb3 Add boot process assessment doc and service file validation script
Comprehensive assessment of init boot phases, service schema
validation, and 14-package audit table covering all hardened
boot-critical packages.
2026-04-23 20:27:13 +01:00

# Red Bear OS Boot Process Assessment & Improvement Plan
**Generated:** 2026-04-23
**Updated:** 2026-04-23
**Status:** Phase 1 ✅, Phase 2 ✅, Phase 3 ✅, Phase 4 ✅ (docs + known gaps), Phase 5 ✅
**Scope:** Comprehensive assessment of boot completeness, mistakes, robustness, resilience, and quality
## Boot Chain Overview
```
UEFI firmware → RedBear Bootloader → Kernel (kstart→start→kmain) →
userspace_init → bootstrap (forks initfs/procmgr/initnsmgr) →
fexec init → [initfs phase] → switchroot /usr → [rootfs phase] →
login prompt (text or graphical)
```
## Phase 1: Critical Fixes Applied ✅
| ID | Severity | Fix | Evidence |
|----|----------|-----|----------|
| S1b | SHOWSTOPPER | Removed `boot_essential = true` from 3 greeter services — `#[serde(deny_unknown_fields)]` caused deserialization failure, services never loaded | `config/redbear-greeter-services.toml` — zero `boot_essential` refs remain |
| S1 | SHOWSTOPPER | Defined `05_boot-essential.target` and `12_boot-late.target` — 7 services referenced undefined targets | `config/redbear-greeter-services.toml`, `config/redbear-device-services.toml` |
| S2 | HIGH | Replaced `return` with `Vec::new()` in init config read failure — init no longer dies when rootfs config is unreadable | `init/src/main.rs:165` |
| S4 | HIGH | Removed empty `15_fatd.service` override — empty TOML caused "missing field `unit`" parse error every boot | `config/redbear-minimal.toml` |
| S5 | MEDIUM | Replaced `waitpid().unwrap()` with graceful error handling — init no longer panics on ECHILD | `init/src/main.rs:182-188` |
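The S2 fix can be illustrated with a minimal sketch. It assumes a hypothetical helper `read_service_dir` (not the actual code at `init/src/main.rs:165`) that lists service files and falls back to an empty list when the directory is unreadable, so the caller keeps running instead of returning early.

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Hypothetical helper (name and signature are assumptions): list the
/// entries of a service directory, sorted by path.
/// On failure it logs the error and returns an empty list, so the caller
/// continues booting with zero units instead of exiting.
fn read_service_dir(dir: &Path) -> Vec<PathBuf> {
    let entries = match fs::read_dir(dir) {
        Ok(entries) => entries,
        Err(err) => {
            eprintln!("init: failed to read {}: {}", dir.display(), err);
            return Vec::new();
        }
    };
    // Skip individual entries that cannot be read rather than failing the scan.
    let mut files: Vec<PathBuf> = entries
        .filter_map(|entry| entry.ok())
        .map(|entry| entry.path())
        .collect();
    files.sort();
    files
}
```

An unreadable directory then simply yields no units to schedule, which matches the "init no longer dies" behavior described for S2.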
## Phase 2: Daemon Error Handling ✅
Replaced `unwrap()`/`expect()`/`assert!()` with graceful error handling across 8 boot-critical daemons and 6 graphics packages.
**Total: 215 fixes across 33 Rust source files. Zero unwrap/expect/assert in non-test production code.**
### 2A: Daemon Library + Init Spawn ✅ (10 fixes)
- `daemon/src/lib.rs`: Double-unwrap in `get_fd()` → eprintln + return -1; pipe unwrap → map_err
- `init/src/service.rs`: 3 fixes (pipe, getns, register_scheme_to_ns)
- `init/src/main.rs`: 2 fixes (filename UTF-8, setrens)
- `init/src/unit.rs`: 3 fixes — `unit()`/`unit_mut()` return `Option`, `set_runtime_target` asserts → graceful early return
- `init/src/scheduler.rs`: 2 caller updates — missing unit logs warning + skips instead of panicking
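The "eprintln + return -1" shape of the `get_fd()` fix might look roughly like the sketch below. It assumes the descriptor number arrives as text in an environment variable; that source, the `parse_fd` helper, and the log wording are assumptions for illustration, not the code in `daemon/src/lib.rs`.

```rust
use std::env;

/// Hypothetical helper: turn an optional textual value into a file
/// descriptor number, logging and returning -1 on any failure.
fn parse_fd(var: &str, raw: Option<&str>) -> i32 {
    let raw = match raw {
        Some(raw) => raw,
        None => {
            eprintln!("daemon: {} is not set", var);
            return -1;
        }
    };
    match raw.trim().parse::<i32>() {
        Ok(fd) if fd >= 0 => fd,
        Ok(fd) => {
            eprintln!("daemon: {}={} is negative", var, fd);
            -1
        }
        Err(err) => {
            eprintln!("daemon: {}={:?} is not a number: {}", var, raw, err);
            -1
        }
    }
}

/// Replaces a double-unwrap such as `env::var(..).unwrap().parse().unwrap()`.
/// The environment-variable source is an assumption of this sketch.
fn get_fd(var: &str) -> i32 {
    parse_fd(var, env::var(var).ok().as_deref())
}
```

Callers then check for `-1` and decide whether the missing descriptor is fatal for that daemon.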
### 2B: Logd ✅ (8 fixes)
- `logd/src/main.rs`: Socket create, setrens, process_requests_blocking → match on `Result<!>`
- `logd/src/scheme.rs`: kernel_debug `File` → `Option<File>`, kernel_sys_log → `Option`, read/send errors handled
### 2C: Randd + Zerod ✅ (7 fixes)
- `randd/src/main.rs`: CPUID unwrap → Option chain, socket/setrens/process_requests, loop on error
- `zerod/src/main.rs`: Args → default "zero" + graceful exit, socket/setrens/process_requests, loop on error
### 2D: Inputd ✅ (14 fixes)
- `inputd/src/lib.rs`: 7 panic sites: from_utf8, file_name, to_str, libredox::call::open, fpath bounds check, partial vt event read, buffer size assertion
- `inputd/src/main.rs`: 7 panic sites: write!, handles.remove, daemon(), args, ControlHandle, panic! → eprintln+exit, Producer handle assertion → EBADF
### 2E: Vesad + Fbcond ✅ (34 fixes)
- `vesad/src/main.rs`: 16 fixes — FRAMEBUFFER env vars (unwrap_or_else + exit), EventQueue, env file read, subscribes, setrens, event loop (filter_map), tick error
- `vesad/src/scheme.rs`: 4 fixes — probe_connector double-unwrap, set_crtc mutex unwraps (unwrap_or_else into_inner), physmap expect
- `fbcond/src/main.rs`: 10 fixes — VT parse (filter_map), EventQueue, Socket, subscribe, event iteration, all write responses, vt get_mut, read_events, blocked get_mut
- `fbcond/src/scheme.rs`: 1 fix — fpath write! unwrap → map_err
- `fbcond/src/display.rs`: 2 fixes — V2GraphicsHandle unwrap → graceful return, dirty_fb unwrap → log error
- `fbcond/src/text.rs`: 1 fix — pop_front unwrap → unwrap_or(0)
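The "mutex unwraps (unwrap_or_else into_inner)" pattern mentioned for `set_crtc` relies on `PoisonError::into_inner` from the standard library. The sketch below shows the idea with a hypothetical `CrtcState` type; the struct and function names are illustrative and not taken from `vesad`.

```rust
use std::sync::{Arc, Mutex};
use std::thread;

/// Hypothetical state guarded by a mutex (illustrative only).
#[derive(Debug, Default)]
struct CrtcState {
    width: u32,
    height: u32,
}

/// Instead of `state.lock().unwrap()`, recover the guard from a poisoned
/// mutex. The data may reflect a half-finished update from the thread that
/// panicked, so callers should overwrite or re-validate it.
fn set_mode(state: &Mutex<CrtcState>, width: u32, height: u32) {
    let mut guard = state.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    guard.width = width;
    guard.height = height;
}

fn current_mode(state: &Mutex<CrtcState>) -> (u32, u32) {
    let guard = state.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
    (guard.width, guard.height)
}

/// Poison the mutex on purpose by panicking in another thread while the
/// lock is held. Only used to demonstrate the recovery path.
fn poison(state: &Arc<Mutex<CrtcState>>) {
    let clone = Arc::clone(state);
    let _ = thread::spawn(move || {
        let _guard = clone.lock().unwrap_or_else(|poisoned| poisoned.into_inner());
        panic!("simulated driver bug while holding the lock");
    })
    .join();
}
```

The trade-off is that one panicking request no longer takes down every later request that touches the same lock, at the cost of having to tolerate possibly inconsistent state.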
### 2F: Init Unit Store ✅ (3 fixes)
- `unit.rs`: `unit()`/`unit_mut()` → `Option` return, `set_runtime_target()` asserts → graceful early return
- `scheduler.rs`: Callers handle None gracefully — log warning + skip instead of panicking init
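A minimal sketch of the lookup change, assuming a simplified `UnitStore` backed by a `HashMap` (the real types in `init/src/unit.rs` are not shown in this document, so every name below is hypothetical):

```rust
use std::collections::HashMap;

/// Simplified stand-in for a unit definition.
#[derive(Debug)]
struct Unit {
    description: String,
}

/// Simplified stand-in for the unit store.
#[derive(Default)]
struct UnitStore {
    units: HashMap<String, Unit>,
}

impl UnitStore {
    fn insert(&mut self, id: &str, description: &str) {
        let unit = Unit { description: description.to_string() };
        self.units.insert(id.to_string(), unit);
    }

    /// Returns `None` for an unknown id instead of panicking.
    fn unit(&self, id: &str) -> Option<&Unit> {
        self.units.get(id)
    }
}

/// Scheduler-side caller: log a warning and skip missing units.
/// Returns the ids that were actually started.
fn schedule(store: &UnitStore, wanted: &[&str]) -> Vec<String> {
    let mut started = Vec::new();
    for &id in wanted {
        match store.unit(id) {
            Some(unit) => {
                println!("init: starting {} ({})", id, unit.description);
                started.push(id.to_string());
            }
            None => eprintln!("init: warning: unit {} not found, skipping", id),
        }
    }
    started
}
```

With this shape, a dangling reference to a unit costs one warning line rather than a panic in PID 1.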
## Phase 3: Boot Reliability ✅
### 3A: Boot Progress Markers ✅
Init now logs phase markers:
- `init: phase 1 — initfs boot`
- `init: starting logd`
- `init: starting runtime target`
- `init: phase 2 — switchroot to /usr`
- `init: scheduling N rootfs units`
- `init: phase 3 — rootfs services started`
- `init: boot complete — entering waitpid loop`
### 3B: Service Schema Validation (Manual) ✅
Script: `local/scripts/validate-service-files.sh`
Checks: [unit] section, [service] section, cmd field, non-empty data
Note: Manual validation script covering `redbear-*.toml` configs. Not wired into the build system — run manually after config changes. Does not cover inherited mainline configs (minimal.toml, desktop.toml).
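The four checks listed above can be expressed compactly. The following is a Rust sketch of the same checks, not the contents of `validate-service-files.sh`; it uses a deliberately naive line scan rather than a TOML parser, and the function name and messages are assumptions.

```rust
/// Returns a list of problems for one service file's text; an empty list
/// means the file passes. Naive line-based scan, not a real TOML parser.
fn validate_service_text(text: &str) -> Vec<&'static str> {
    if text.trim().is_empty() {
        return vec!["file is empty"];
    }
    let mut section = "";
    let (mut has_unit, mut has_service, mut has_cmd) = (false, false, false);
    for line in text.lines().map(str::trim) {
        if line.len() >= 2 && line.starts_with('[') && line.ends_with(']') {
            // Track which table the following keys belong to.
            section = &line[1..line.len() - 1];
            match section {
                "unit" => has_unit = true,
                "service" => has_service = true,
                _ => {}
            }
        } else if section == "service" {
            if let Some((key, _value)) = line.split_once('=') {
                if key.trim() == "cmd" {
                    has_cmd = true;
                }
            }
        }
    }
    let mut problems = Vec::new();
    if !has_unit {
        problems.push("missing [unit] section");
    }
    if !has_service {
        problems.push("missing [service] section");
    }
    if !has_cmd {
        problems.push("missing cmd field in [service]");
    }
    problems
}
```

A check like this only covers structure; it would not catch an unknown field, which init rejects separately during deserialization.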
### 3C: Getty Supervisor ✅
Init supports `respawn = true` in service TOML files. When a respawnable service's process exits, init automatically re-spawns it. All getty services across `redbear-minimal`, `redbear-desktop`, `redbear-greeter-services`, `redbear-live-mini`, `wayland`, and `redbear-kde` configs now have `respawn = true` set.
Implementation:
- `service.rs`: Added `respawn: bool` field to `Service` (default false). `spawn()` returns `Option<u32>` (child PID) for respawnable oneshot_async services.
- `scheduler.rs`: `Scheduler` collects respawnable (unit_id, pid) pairs in `respawn_pids` field.
- `main.rs`: Waitpid loop maintains a PID → UnitId map. On child exit, checks if the PID is respawnable and re-schedules the unit.
Usage in service TOML:
```toml
[unit]
description = "Text console"
[service]
cmd = "getty"
args = ["2"]
type = "oneshot_async"
respawn = true
```
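The PID → UnitId bookkeeping behind `respawn = true` can be sketched as follows. Process creation is simulated with fake PIDs so the example stays self-contained; the `Supervisor` type, its methods, and the unit names in the test are hypothetical and do not mirror the actual init code.

```rust
use std::collections::HashMap;

/// Hypothetical supervisor state: which live PIDs belong to respawnable units.
struct Supervisor {
    respawn_pids: HashMap<u32, String>,
    next_pid: u32,
}

impl Supervisor {
    fn new() -> Self {
        Supervisor { respawn_pids: HashMap::new(), next_pid: 100 }
    }

    /// Stand-in for spawning a process: hands out a fake PID and records it
    /// only when the unit is marked respawnable.
    fn spawn(&mut self, unit_id: &str, respawn: bool) -> u32 {
        let pid = self.next_pid;
        self.next_pid += 1;
        if respawn {
            self.respawn_pids.insert(pid, unit_id.to_string());
        }
        pid
    }

    /// Called when the wait loop reports that `pid` exited. Returns the new
    /// PID if the unit was respawned, or `None` for untracked children.
    fn on_child_exit(&mut self, pid: u32) -> Option<u32> {
        let unit_id = self.respawn_pids.remove(&pid)?;
        eprintln!("init: {} (pid {}) exited, respawning", unit_id, pid);
        Some(self.spawn(&unit_id, true))
    }
}
```

Removing the old PID before respawning keeps the map from accumulating stale entries, and fire-and-forget children fall through the `?` without any special casing.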
### 3D: Greeter Crash Fallback (existing)
The fallback path via `29_activate_console.service` already activates the VT2 text console independently of the greeter, so text login remains available if the greeter crashes.
## Phase 4: Bare-Metal Hardening ✅ (docs + known gaps documented)
Phase 4 is documentation and gap identification. Actual bare-metal validation requires physical hardware.
All known gaps are documented with their status and required follow-up.
### USB Boot-Chain Observability
Chain: pcid-spawner → xhcid → usbhubd → usbhidd → inputd
Status: Chain exists in rootfs only. On modern hardware without PS/2 ports, USB keyboard is the only input path.
### Known Bare-Metal Gaps
| Gap | Status | Detail |
|-----|--------|--------|
| USB keyboard | Documented | 5-step chain in rootfs only; if any step fails, no keyboard |
| AMD x2APIC SMP | Patch exists | `local/patches/kernel/P0-amd-acpi-x2apic.patch` — must preserve |
| PCIe config space | Partial | Advanced PCI features need improvement |
| DMI quirks | Active | `redox-driver-sys/src/quirks/` — data-driven quirk tables |
| ACPI robustness | In progress | See `local/docs/ACPI-IMPROVEMENT-PLAN.md` |
| IRQ/low-level controllers | Active | See `local/docs/IRQ-AND-LOWLEVEL-CONTROLLERS-ENHANCEMENT-PLAN.md` |
### Hardware Validation Requirements
Bare-metal testing requires physical hardware, so current validation is limited to:
- **QEMU boot**: Verified for redbear-minimal and redbear-full (no panics, no parse errors, switchroot succeeds)
- **Live ISO build**: redbear-live-mini and redbear-live build successfully
- **Interactive login**: Framebuffer login renders correctly (serial not available in headless QEMU)
## Phase 5: Validation Matrix ✅
### Build Verification
| Target | Build | QEMU Boot | Notes |
|--------|-------|-----------|-------|
| redbear-minimal | ✅ harddrive.img (2 GB) | ✅ Stage 2 (kernel loaded) | Login renders to framebuffer, not serial |
| redbear-full | ✅ harddrive.img (4 GB) | ✅ (prior session) | Greeter services load |
| redbear-live-mini | ✅ ISO (384 MB) | — | ISO for bare-metal boot |
| redbear-live | ✅ ISO (3.0 GB) | — | ISO for bare-metal boot |
### Compilation Verification
- `cargo check --workspace` in base source: **0 errors**
- Individual crate checks: daemon, init, logd, randd, zerod, inputd, vesad, fbcond, console-draw, driver-graphics, fbbootlogd, graphics-ipc, ihdgd, virtio-gpud — **all pass**
- Service file validation: **53 service files pass, 0 failures**
### Unwrap/expect Audit (final)
| Daemon | Active unwrap/expect | Test-only | Status |
|--------|---------------------|-----------|--------|
| daemon/src | 0 | 0 | ✅ |
| init/src (main, service, scheduler, unit) | 0 | 0 | ✅ |
| logd/src | 0 | 0 | ✅ |
| randd/src | 0 | 8 (#[test]) | ✅ |
| zerod/src | 0 | 0 | ✅ |
| inputd/src (lib, main) | 0 | 0 | ✅ |
| vesad/src (main, scheme) | 0 | 0 | ✅ |
| fbcond/src (main, scheme, display, text) | 0 | 0 | ✅ |
| console-draw/src | 0 | 0 | ✅ |
| driver-graphics/src (lib, kms/*) | 0 | 0 | ✅ |
| fbbootlogd/src (main, scheme) | 0 | 0 | ✅ |
| graphics-ipc/src | 0 | 0 | ✅ |
| ihdgd/src (main, device/*) | 0 | 0 | ✅ |
| virtio-gpud/src (main, scheme) | 0 | 0 | ✅ |
### Validation Commands
```bash
# Build
CI=1 make all CONFIG_NAME=redbear-minimal ARCH=x86_64
CI=1 make all CONFIG_NAME=redbear-full ARCH=x86_64
CI=1 make live CONFIG_NAME=redbear-live-mini ARCH=x86_64
CI=1 make live CONFIG_NAME=redbear-live ARCH=x86_64
# QEMU test
make qemu CONFIG_NAME=redbear-minimal
# Service file validation
./local/scripts/validate-service-files.sh config/
# Clean rebuild + verify
CI=1 make cr.base CONFIG_NAME=redbear-minimal ARCH=x86_64
CI=1 make all CONFIG_NAME=redbear-minimal ARCH=x86_64
```
## Key Technical Findings
### Serde `deny_unknown_fields` Behavior
`UnitInfo` and `Service` structs use `#[serde(deny_unknown_fields)]`. Any unrecognized field in `[unit]` or `[service]` sections causes the ENTIRE service file to fail deserialization. The init system logs the error and skips the service — it never starts.
**Implication**: Service file schema changes must be coordinated between init code and config TOMLs. Manual validation (`validate-service-files.sh`) catches these in redbear-*.toml configs.
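The all-or-nothing effect can be mimicked without serde. The sketch below is a hand-rolled stand-in, not the real `#[serde(deny_unknown_fields)]` machinery. The allowed key list is an assumption taken from the getty example in this document, and the real `Service` struct may accept more fields.

```rust
/// Keys accepted in `[service]` by this sketch. Assumption: taken from the
/// getty example in this document, not from the real `Service` struct.
const SERVICE_KEYS: &[&str] = &["cmd", "args", "type", "respawn"];

/// Rejects the whole key set as soon as one unknown key appears, which is
/// the observable effect of strict deserialization.
fn check_service_keys(keys: &[&str]) -> Result<(), String> {
    for key in keys {
        if !SERVICE_KEYS.iter().any(|known| *known == *key) {
            return Err(format!("unknown field `{}`", key));
        }
    }
    Ok(())
}

/// Loads every file whose keys pass the check; failing files are logged and
/// skipped, so the corresponding service never starts.
fn load_services(files: &[(&str, Vec<&str>)]) -> Vec<String> {
    let mut loaded = Vec::new();
    for (name, keys) in files {
        match check_service_keys(keys) {
            Ok(()) => loaded.push(name.to_string()),
            Err(err) => eprintln!("init: skipping {}: {}", name, err),
        }
    }
    loaded
}
```

This is the failure mode behind S1b: a single `boot_essential` key was enough to drop an otherwise valid greeter service.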
### Init `requires_weak` Semantics
`requires_weak` provides ordering, not readiness. If a dependency is missing (file not found), the scheduler treats it as satisfied (not in pending queue). Services start anyway but without ordering guarantees.
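A minimal sketch of that readiness rule, assuming the scheduler tracks not-yet-started units in a pending set (the `is_ready` function and the set representation are assumptions, not the real scheduler code):

```rust
use std::collections::HashSet;

/// A unit may start once none of its weak dependencies is still pending.
/// A dependency that was never loaded (for example, its file is missing)
/// is not in `pending`, so it counts as satisfied.
fn is_ready(requires_weak: &[&str], pending: &HashSet<&str>) -> bool {
    requires_weak.iter().all(|dep| !pending.contains(dep))
}
```

The consequence is that a misspelled dependency name silently removes the ordering constraint instead of blocking the dependent service.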
### Init `oneshot_async` Services
Services with `type = "oneshot_async"` are fire-and-forget by default. Init spawns them and doesn't track their lifecycle. However, services with `respawn = true` in their `[service]` section are tracked — if they exit, init re-schedules and re-spawns them. Getty services use `respawn = true`.
### Config Include Chain
```
redbear-minimal.toml → minimal.toml, redbear-legacy-base.toml, redbear-device-services.toml, redbear-netctl.toml
redbear-full.toml → desktop.toml, redbear-desktop.toml, redbear-greeter-services.toml, ...
redbear-live-mini.toml → minimal.toml, redbear-legacy-base.toml, redbear-netctl.toml
redbear-live.toml → redbear-full.toml, ...
```
### Upstream Targets (not Red Bear defined)
- `00_base.target`: `recipes/core/base/source/init.d/00_base.target`
- `10_net.target`: `recipes/core/base/source/init.d/10_net.target`
- These are installed by the base package into `/usr/lib/init.d/` and available at boot.
## Files Modified (This Assessment)
### Config Changes
- `config/redbear-greeter-services.toml` — removed boot_essential, added 05_boot-essential.target
- `config/redbear-device-services.toml` — added 12_boot-late.target
- `config/redbear-minimal.toml` — removed empty fatd override
### 2G: Console-Draw ✅ (8 fixes)
- `console-draw/src/lib.rs`: 4 DRM call unwraps → `?` operator; 3 try_into unwraps → `unwrap_or(0)`; 1 back_mut unwrap → `if let Some`
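The three replacement idioms named above (`?`, `unwrap_or(0)` on a failed conversion, and `if let Some` around `back_mut`) are shown together in the sketch below. The functions are hypothetical and operate on plain strings and queues rather than DRM objects.

```rust
use std::collections::VecDeque;
use std::convert::TryFrom;

/// `try_into().unwrap()` → `unwrap_or(0)`: an out-of-range value becomes 0
/// instead of a panic.
fn to_u16_or_zero(value: usize) -> u16 {
    u16::try_from(value).unwrap_or(0)
}

/// `back_mut().unwrap()` → `if let Some`: an empty queue is a no-op.
/// Returns whether anything was appended.
fn append_to_last_line(lines: &mut VecDeque<String>, text: &str) -> bool {
    if let Some(last) = lines.back_mut() {
        last.push_str(text);
        true
    } else {
        false
    }
}

/// Fallible call `.unwrap()` → `?`: errors propagate to the caller.
fn parse_dimensions(spec: &str) -> Result<(u16, u16), String> {
    let (w, h) = spec
        .split_once('x')
        .ok_or_else(|| format!("bad dimensions {:?}", spec))?;
    let w: usize = w.parse().map_err(|err| format!("bad width {:?}: {}", w, err))?;
    let h: usize = h.parse().map_err(|err| format!("bad height {:?}: {}", h, err))?;
    Ok((to_u16_or_zero(w), to_u16_or_zero(h)))
}
```

Note that falling back to 0 hides the overflow from the caller, which is acceptable for drawing coordinates but would be a poor choice where the value drives an allocation size.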
### 2H: Driver-Graphics ✅ (39 fixes)
- `driver-graphics/src/kms/connector.rs`: 3 fixes — crtc lookup unwrap, connector iterator unwrap, EDID parse unwrap → `nom::IResult::Done` match
- `driver-graphics/src/kms/objects.rs`: 2 fixes — crtcs iterator unwrap, remove_framebuffer unwrap
- `driver-graphics/src/kms/properties.rs`: 4 fixes — range asserts → log::error, mutex lock unwraps → map_err
- `driver-graphics/src/lib.rs`: 30 fixes — constructor fatal errors → process::exit(1), mutex locks → map_err/unwrap_or_else into_inner, vt lookups → ok_or, EDID parse → Done match, assert → if+return Err, try_into unwraps → graceful
### 2I: Fbbootlogd ✅ (14 fixes)
- `fbbootlogd/src/main.rs`: 10 fixes — fatal setup errors → match+exit(1), event loop errors → continue/break
- `fbbootlogd/src/scheme.rs`: 4 fixes — VT handle, graphics handle, dirty_fb ×2 → match+log
### 2J: Graphics-IPC ✅ (8 fixes)
- `graphics-ipc/src/lib.rs`: assert → if+return Err, unwrap → `?`, try_into unwraps → graceful early return
### 2K: ihdgd (Intel HD Graphics) ✅ (37 fixes)
- `ihdgd/src/device/ddi.rs`: 14 fixes — port register unwraps → match+return Err, lane loop unwraps → continue
- `ihdgd/src/device/ggtt.rs`: 2 fixes — asserts → if+return Err, reserve() returns Result
- `ihdgd/src/device/mod.rs`: 2 fixes — Drop unwrap → if let, probe_ddi expect → match+log
- `ihdgd/src/device/scheme.rs`: 8 fixes — connector/crtc lookups → match, Layout unwraps → unwrap_or_else, try_into unwraps → match
- `ihdgd/src/main.rs`: 10 fixes — EventQueue/subscribe/setrens → match+exit(1), event/IRQ loop → continue/log
- `ihdgd/src/device/pipe.rs`: 1 cascading fix — ggtt.reserve Result handling
### 2L: Virtio-GPUD ✅ (33 fixes)
- `virtio-gpud/src/main.rs`: 6 fixes — event loop, IRQ handling, scheme.tick → match+log+continue
- `virtio-gpud/src/scheme.rs`: 27 fixes — connector/crtc mutex locks → map_err/unwrap_or_else, EDID parse, cursor borrow → clone Arc, vt lookups → ok_or
### Code Changes (Phase 2 — 215 fixes across 33 Rust source files + 3 TOML config files)
- `daemon/src/lib.rs` — 2 fixes (get_fd double-unwrap, pipe unwrap)
- `init/src/main.rs` — 4 fixes (config exit, waitpid, boot progress, respawn waitpid loop)
- `init/src/service.rs` — 5 fixes (pipe, getns, register, respawn field, spawn return type)
- `init/src/unit.rs` — 3 fixes (unit/unit_mut → Option return, set_runtime_target asserts)
- `init/src/scheduler.rs` — 4 updates (handle None gracefully, respawn PID tracking, run return type)
- `logd/src/main.rs` — 3 fixes (socket, setrens, process_requests)
- `logd/src/scheme.rs` — 5 fixes (kernel_debug Option, sys_log Option, read/send)
- `randd/src/main.rs` — 4 fixes (CPUID, socket, setrens, process_requests loop)
- `zerod/src/main.rs` — 4 fixes (args, socket, setrens, process_requests loop)
- `inputd/src/lib.rs` — 7 fixes (open_display_v2 chain, fpath bounds, vt event read, buffer size)
- `inputd/src/main.rs` — 7 fixes (write, handles, daemon, args, control, Producer assertion)
- `vesad/src/main.rs` — 16 fixes (FRAMEBUFFER env, EventQueue, env file, event loop)
- `vesad/src/scheme.rs` — 4 fixes (probe_connector, set_crtc mutex, physmap)
- `fbcond/src/main.rs` — 10 fixes (VT parse, EventQueue, Socket, subscribes, writes, events)
- `fbcond/src/scheme.rs` — 1 fix (fpath write)
- `fbcond/src/display.rs` — 2 fixes (V2GraphicsHandle unwrap, dirty_fb unwrap)
- `fbcond/src/text.rs` — 1 fix (pop_front unwrap)
### Patch Preservation
- `local/patches/base/P2-daemon-hardening.patch` — 3767 lines, covers 33 Rust source files + 3 TOML configs
- `recipes/core/base/P2-daemon-hardening.patch` — symlink to local/patches
- `recipes/core/base/recipe.toml` — includes P2-daemon-hardening.patch in patches list
### New Files
- `local/scripts/validate-service-files.sh` — manual service schema validation (redbear-*.toml only)
- `local/docs/BOOT-PROCESS-ASSESSMENT.md` — this document
- `recipes/core/base/source/init.initfs.d/41_acpid.service` — acpid in initfs (boot race fix)