# Red Bear OS — CachyOS-Class Boot Experience Implementation Plan
**Version:** 1.0 · 2026-06-11 · Branch: `0.2.3`
**Status:** Canonical plan for boot visual quality, display handoff, and boot speed
**Depends on:** existing `redox-drm`, `inputd`, `vesad`, `fbbootlogd`, `fbcond`, `bootloader`
**Supersedes:** boot-comfort fragments in `CONSOLE-TO-KDE-DESKTOP-PLAN.md` (boot pipeline layer only)
---
## 0. Architecture Decision
**The Linux model is correct: once the DRM driver becomes available, handoff happens automatically.**
No daemon-side config awareness. No polling. No inter-daemon handshakes. When `redox-drm` registers
`scheme:drm/card0`, the display path switches through the existing `inputd` ESTALE mechanism. Init
orchestrates the lifecycle — staging the splash, detecting DRM, withdrawing the earlyfb, forwarding
traffic to the new path.
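As an illustration of the consumer side of that mechanism, here is a minimal sketch of a display client that reopens its handle when a write fails with ESTALE. The `DisplayHandle` trait, the `present_with_reconnect` helper, and the numeric value of `ESTALE` are assumptions made for this example; they are not taken from the `inputd` sources.

```rust
use std::io;

/// Assumed errno value for ESTALE (116 on Linux; assumed to match on Redox).
const ESTALE: i32 = 116;

/// Hypothetical abstraction over a display handle obtained through `inputd`.
trait DisplayHandle: Sized {
    /// Open (or re-open) whichever display is currently primary.
    fn open_current() -> io::Result<Self>;
    /// Push one frame; fails with ESTALE once the handle has been superseded.
    fn present(&mut self, frame: &[u32]) -> io::Result<()>;
}

/// Present a frame, reopening the handle if the old one went stale.
/// Returns the number of reopens that were needed.
fn present_with_reconnect<H: DisplayHandle>(
    handle: &mut H,
    frame: &[u32],
    max_reopens: usize,
) -> io::Result<usize> {
    let mut reopens = 0;
    loop {
        match handle.present(frame) {
            Ok(()) => return Ok(reopens),
            // Stale handle: the primary display changed, so reopen and retry.
            Err(e) if e.raw_os_error() == Some(ESTALE) && reopens < max_reopens => {
                *handle = H::open_current()?;
                reopens += 1;
            }
            // Any other error, or too many reopens, is passed to the caller.
            Err(e) => return Err(e),
        }
    }
}
```

The client never needs to know which driver is behind the handle; it only reacts to the stale error, which is the property the decision above relies on.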
### Target Pipeline (Post-Plan)
```
UEFI GOP framebuffer (bootloader paints Red Bear logo)
→ kernel boots, passes FB env vars to init
→ init starts vesad (20_vesad.service) ← registers display.vesa (earlyfb)
→ init starts redbear-bootanim (20_bootanim.service) ← paints splash on earlyfb
→ init starts fbbootlogd (quiet mode, hidden behind splash)
→ init starts fbcond (VT 2, behind splash)
→ redox-drm loads (04_drivers.target), registers scheme:drm/card0
→ inputd signals ESTALE on all display.* handles
→ 50_drm-handoff.service runs ← atomic swap: vesad → DRM
• bootanim re-parents onto DRM FB (memcpy, no redraw)
• fbbootlogd/fbcond reconnect to DRM
• vesad releases bootloader FB, exits
→ SDDM/KWin start (08_userland.target)
→ bootanim fades out as greeter paints
Visible result: black → red bear logo + spinner → silent handoff → SDDM fade-in
No log text unless user presses Esc. No flicker. No blank screen.
```
### Linux Mechanism Mapping
| CachyOS / Linux | Red Bear equivalent |
|---|---|
| `simpledrm` (kernel) | `vesad` earlyfb + bootanim mmap |
| `Plymouth` (userspace splash) | `redbear-bootanim` (Rust, per AGENTS.md "system-critical must be Rust") |
| Plymouth two-step (pre-DRM → post-DRM) | bootanim `Surface::Vesad` → `Surface::Drm` state machine |
| `drm_aperture_remove_conflicting_framebuffers()` | init-managed via `50_drm-handoff.service` + `98_release_vesad.service` |
| `CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER` | bootanim holds firmware FB visible until DRM handoff completes |
| Plymouth Esc-to-reveal | bootanim SIGUSR2 → fbbootlogd reconnects, paints log overlay |
| Plymouth fade-out on greeter ready | bootanim SIGTERM → 200ms fade → exit |
---
## 1. Current State Assessment
### What Exists
| Component | Location | Scheme | Status |
|---|---|---|---|
| Bootloader | `local/sources/bootloader/` | UEFI GOP text menu | Text-only, no logo/splash |
| Kernel debug display | `local/sources/kernel/src/devices/graphical_debug/` | `scheme:debug` | Immediately overwrites bootloader FB |
| vesad | `local/sources/base/drivers/graphics/vesad/` | `display.vesa` | ✅ Registers earlyfb. No handoff code. Stays alive. |
| fbbootlogd | `local/sources/base/drivers/graphics/fbbootlogd/` | `fbbootlog` | ✅ Overwrites FB with log text immediately. Has handoff path. VT 1. |
| fbcond | `local/sources/base/drivers/graphics/fbcond/` | `fbcon` | ✅ Text console VTs. Handoff with 4-retry limit. VT 2+. |
| inputd | `local/sources/base/drivers/inputd/` | `scheme:input` | ✅ Display/input multiplexer. Signals ESTALE on handoff. |
| redox-drm | `local/recipes/gpu/redox-drm/source/` | `scheme:drm` | 🚧 Registers DRM. Calls inputd/handle/ to announce itself. |
| virtio-gpud | `local/sources/base/drivers/graphics/virtio-gpud/` | `display.virtio-gpu` | ⚠️ Legacy, uses old GraphicsScheme API |
| ihdgd | `local/sources/base/drivers/graphics/ihdgd/` | `display.ihdg.*` | ⚠️ Legacy Intel driver |
| Branding assets | `local/Assets/images/` | n/a | PNGs exist, NOT integrated anywhere |
### What's Missing (Gap Analysis)
| # | Gap | Impact |
|---|-----|--------|
| 1 | No boot splash/logo | User sees raw kernel/init log text from the first millisecond |
| 2 | fbbootlogd overwrites bootloader FB immediately | Any bootloader-painted pixels are destroyed within milliseconds |
| 3 | No smooth display handoff | vesad stays alive, doesn't release FB memory, no coordinated transition |
| 4 | No "quiet boot" mode | Kernel/init log is always shown, no way to hide it behind splash |
| 5 | Boot is slow (4 barrier syncs before SDDM) | 00→02→04→06→08 target chain; each waits for all services |
| 6 | No progress indicator | No animated spinner or progress bar during boot |
| 7 | No bootloader branding | UEFI bootloader shows text mode selection menu only |
| 8 | vesad doesn't release FB on DRM handoff | Bootloader FB stays mapped, wasting ~8MB memory |
| 9 | `29_activate_console` is a mess | Overridden to no-op in legacy-base, then overridden again in mini. 200ms sleep hack. |
| 10 | fbcond gives up after 4 handoff retries | If DRM is slow (firmware load), console silently stops |
| 11 | Legacy virtio-gpud/ihdgd may conflict | Could race with redox-drm for display scheme |
### Init Service Order (Current)
```
INITFS STAGE:
00_runtime.target → 10_inputd → 20_vesad → 20_fbbootlogd → 20_fbcond
→ 40_drivers.target → 50_rootfs → 90_initfs.target → switch_root
ROOTFS STAGE:
00_base.target → 02_early_hw.target → 04_drivers.target → 06_services.target
→ 08_userland.target → 29_activate_console → 30_console (getty 2) → login
For redbear-full:
Same + 12_sddm → kwin_wayland → KDE Plasma
```
---
## 2. Phased Implementation Plan
### PHASE 1 — Branding Infrastructure
**Goal:** Single source of truth for Red Bear visual assets with deterministic conversion.
**Effort:** 1–4 hours
**Files:**
| Path | Type | Purpose |
|---|---|---|
| `local/Assets/scripts/render-assets.sh` | script | PNG → BMP/RAW conversion via `imagemagick` (host-side) |
| `local/Assets/MANIFEST.sha256` | text | Deterministic checksums for all generated assets |
| `local/recipes/system/redbear-assets/recipe.toml` | recipe (Rule 1) | Stages assets to `/usr/share/redbear/assets/` |
| `local/sources/redbear-assets/` | source (Rule 1) | Trivial install crate |
| `local/docs/BOOT-BRANDING-SPEC.md` | doc | Resolution policy, color profile, animation budget |
**Generated assets (from existing PNGs):**
| Asset | Format | Resolution | Consumer |
|---|---|---|---|
| `bootlogo-1080p.bmp` | 32-bit BGRA BMP | 1920×1080 | Bootloader UEFI `Blt()` |
| `bootlogo-720p.bmp` | 32-bit BGRA BMP | 1280×720 | Bootloader fallback |
| `bootlogo-tiny.bmp` | 32-bit BGRA BMP | 640×480 | VESA-only firmware |
| `splash-1080p.raw` | Raw BGRA scanout | 1920×1080 | bootanim direct mmap |
| `splash-1080p.anim.json` | JSON | n/a | Animation timeline |
**Verification:**
- `render-assets.sh` produces all assets, byte-identical across rebuilds
- `redbear-assets` recipe stages them into sysroot
---
### PHASE 2 — `redbear-bootanim`: Plymouth Equivalent
**Goal:** Rust userspace daemon that owns the framebuffer from vesad registration until greeter focus,
rendering the Red Bear brand consistently across both earlyfb and DRM.
**Effort:** 1–2 days
**Files:**
| Path | Type | Purpose |
|---|---|---|
| `local/sources/redbear-bootanim/` | source (Rule 1) | Bootanim daemon source |
| `local/sources/redbear-bootanim/src/main.rs` | Rust | Daemon entry, signal handlers |
| `local/sources/redbear-bootanim/src/surface.rs` | Rust | Surface abstraction over vesad earlyfb + DRM |
| `local/sources/redbear-bootanim/src/anim.rs` | Rust | Animation loop (logo + spinner + progress) |
| `local/sources/redbear-bootanim/src/progress.rs` | Rust | Unix datagram socket for progress updates from init |
| `local/recipes/system/redbear-bootanim/recipe.toml` | recipe (Rule 1) | Depends on redbear-assets, inputd |
| `config/redbear-bootanim.toml` | config fragment | 20_bootanim.service + 50_drm-handoff + 98_release_vesad |
**Service wiring:**
```toml
# 20_bootanim.service — runs on earlyfb, transitions to DRM
[[files]]
path = "/etc/init.d/20_bootanim.service"
data = """
[unit]
description = "Red Bear boot animation (splash)"
requires_weak = ["10_inputd.service", "20_vesad.service"]
[service]
cmd = "/usr/bin/redbear-bootanim"
args = ["--surface=vesad", "--vt=1"]
type = "simple"
respawn = false
"""
```
**Behavior:**
| State | Surface | Renders | Input |
|---|---|---|---|
| `Surface::Vesad` | mmap'd bootloader FB | Logo + spinner + progress | Pass-through to fbcond |
| `Surface::Drm` | `/scheme/drm/card0` | Same pixels (memcpy, no redraw) | Pass-through |
| `Reveal` (SIGUSR2/Esc) | Both | Translucent log overlay on splash | Log scrollback |
| `Exit` (SIGTERM) | n/a | 200ms fade to black, exit | n/a |
**Key design property:** Handoff is a memcpy, not a redraw. bootanim holds a cached `Box<[u32]>` of the last frame (~8MB). On handoff, it copies this to the DRM FB. Both surfaces end up pixel-identical — zero flicker.
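A minimal sketch of that property, assuming both framebuffers have the same dimensions and are exposed as `u32` pixel slices. `BootAnim`, `draw`, and `handoff_to_drm` are hypothetical names for this example; only the `Surface::Vesad` and `Surface::Drm` states come from the plan.

```rust
/// Which surface bootanim currently scans out to (state names from the plan).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Surface {
    Vesad,
    Drm,
}

/// Hypothetical bootanim core state: active surface plus the last rendered frame.
struct BootAnim {
    surface: Surface,
    /// Last frame as 32-bit pixels, width * height entries.
    cached: Box<[u32]>,
}

impl BootAnim {
    fn new(width: usize, height: usize) -> Self {
        BootAnim {
            surface: Surface::Vesad,
            cached: vec![0u32; width * height].into_boxed_slice(),
        }
    }

    /// Render into the cache first, then copy the cache to the live framebuffer.
    /// Only the overlapping prefix is copied if the lengths differ.
    fn draw(&mut self, fb: &mut [u32], pixel: u32) {
        self.cached.fill(pixel);
        let n = fb.len().min(self.cached.len());
        fb[..n].copy_from_slice(&self.cached[..n]);
    }

    /// Handoff: copy the cached frame to the DRM framebuffer without redrawing.
    /// Returns false (and stays on vesad) if the sizes differ.
    fn handoff_to_drm(&mut self, drm_fb: &mut [u32]) -> bool {
        if drm_fb.len() != self.cached.len() {
            return false;
        }
        drm_fb.copy_from_slice(&self.cached);
        self.surface = Surface::Drm;
        true
    }
}
```

At 1920×1080 the cache holds 2,073,600 pixels, or 8,294,400 bytes, which matches the ~8MB figure above. The size-mismatch branch is where a resampling step would be needed.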
**Verification:**
- `redbear-mini`: logo appears in UEFI FB, continues through init, transitions to fbbootlogd
- `redbear-full`: logo → smooth DRM handoff → SDDM fade-in (no blank gap >1 frame)
- Esc reveals log; Esc again hides it
---
### PHASE 3 — Atomic DRM Handoff (Linux `drm_aperture` Equivalent)
**Goal:** One-shot helper that orchestrates vesad → DRM transition in a single transaction.
**Effort:** 4–8 hours
**Files:**
| Path | Type | Purpose |
|---|---|---|
| `local/sources/redbear-bootanim/src/bin/handoff.rs` | Rust | Handoff orchestrator binary |
| `local/sources/redbear-bootanim/src/bin/release_fb.rs` | Rust | Sends RELEASE_EARLYFB to vesad |
**Handoff sequence (in `handoff.rs`):**
```
1. Send PREPARE_HANDOFF to bootanim → bootanim flushes scanout, snapshots frame, pauses animation
2. bootanim opens /scheme/drm/card0, performs ModeSetCrtc + first present
3. bootanim returns HANDOFF_READY
4. Send RELEASE_EARLYFB to vesad → vesad munmaps bootloader FB, signals ESTALE, exits
5. Send POST_HANDOFF to bootanim → bootanim resumes animation on DRM surface exclusively
6. Send REBIND_DISPLAY drm to inputd → promotes DRM to primary, ESTALE to remaining consumers
7. Exit 0
```
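The ordering above could be sketched as a fixed list of steps that stops at the first failure. The `Transport` trait, the `Peer` enum, and `run_handoff` are hypothetical names for this example, and the sketch folds steps 1 to 3 into a single acknowledged send. The real wire format between the helper and the daemons is not specified in this plan.

```rust
/// Control messages named in the handoff sequence.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Msg {
    PrepareHandoff,
    ReleaseEarlyFb,
    PostHandoff,
    RebindDisplayDrm,
}

/// Recipients of those messages.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Peer {
    Bootanim,
    Vesad,
    Inputd,
}

/// Hypothetical transport; a real helper would talk to a scheme or socket.
trait Transport {
    /// Send `msg` to `peer` and wait for its acknowledgement.
    fn send(&mut self, peer: Peer, msg: Msg) -> Result<(), String>;
}

/// Run the steps in order and stop at the first failure.
/// Returns the process exit code (0 on success, as in step 7).
fn run_handoff<T: Transport>(t: &mut T) -> i32 {
    let steps = [
        // Steps 1-3: bootanim snapshots, sets the DRM mode, reports HANDOFF_READY.
        (Peer::Bootanim, Msg::PrepareHandoff),
        // Step 4: vesad unmaps the bootloader FB and exits.
        (Peer::Vesad, Msg::ReleaseEarlyFb),
        // Step 5: bootanim resumes on the DRM surface only.
        (Peer::Bootanim, Msg::PostHandoff),
        // Step 6: inputd promotes DRM to primary.
        (Peer::Inputd, Msg::RebindDisplayDrm),
    ];
    for (peer, msg) in steps {
        if let Err(err) = t.send(peer, msg) {
            eprintln!("handoff: {:?} -> {:?} failed: {}", msg, peer, err);
            return 1;
        }
    }
    0
}
```

Because vesad is only asked to release after bootanim has acknowledged the first step, a failure before step 4 leaves the earlyfb path intact.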
**Why a separate binary:** Init can enforce ordering and timeout. If handoff hangs, init moves on — user still gets a working system (stuck splash, compositor paints over it).
**Timeout/fallback:** If `redox-drm` doesn't register within 30s, the handoff helper keeps the splash on vesad and shows a "GPU driver did not load" overlay.
**Linux mapping:**
| Linux | Red Bear |
|---|---|
| `drm_aperture_remove_conflicting_framebuffers()` | Init via `handoff.rs` (driver doesn't do implicit aperture management) |
| `CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER` | bootanim holds firmware FB visible until handoff step 4 |
| Plymouth `show-splash` / `hide-splash` | bootanim exit + sessiond Seat transition signal |
**Verification:**
- `redbear-full` QEMU: screen never black for >1 frame during handoff
- Disable redox-drm: fallback message appears, user can still log in via getty
- Kill bootanim mid-handoff: handoff helper detects and recovers
---
### PHASE 4 — Quiet Boot (Log Suppression Behind Splash)
**Goal:** Normal boot shows only splash. Kernel/init log hidden unless user presses Esc or boot fails.
**Effort:** 1 day
**Files to modify:**
| Path | Change |
|---|---|
| `local/sources/base/drivers/graphics/fbbootlogd/src/main.rs` | Add `--quiet` flag (don't open display, write to logd only) |
| `local/sources/base/drivers/graphics/fbbootlogd/src/scheme.rs` | Quiet mode: no display painting until SIGUSR2 |
| `local/sources/base/drivers/inputd/src/main.rs` | Separate "log sink" consumer role from "display" consumer |
| `config/redbear-full.toml` | fbbootlogd args `["--quiet"]` |
| `config/redbear-mini.toml` | fbbootlogd args `[]` (no quiet — text target shows log) |
| `local/docs/QUIET-BOOT-SPEC.md` | Kernel cmdline `redbear_quiet=0|1`, key bindings, failure modes |
**Reveal key:** Esc (configurable in `/etc/redbear/bootanim.toml`) → bootanim sends SIGUSR2 to fbbootlogd → fbbootlogd connects to display, paints log. Esc again → disconnects, clears overlay.
**Force-reveal conditions (always show log, no quiet):**
- Kernel panic
- `redox-drm` register timeout
- Init restart loop > 2 times
- `redbear_quiet=0` kernel cmdline
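The conditions above reduce to a single predicate. This is a sketch only: `BootState`, `must_reveal_log`, and `parse_quiet` are hypothetical names, and the default to use when `redbear_quiet` is absent from the command line is left to the caller because the plan does not state it.

```rust
/// Parse `redbear_quiet=0|1` from a kernel command line.
/// Returns None if the parameter is absent or has any other value.
fn parse_quiet(cmdline: &str) -> Option<bool> {
    cmdline.split_whitespace().find_map(|tok| {
        match tok.strip_prefix("redbear_quiet=")? {
            "0" => Some(false),
            "1" => Some(true),
            _ => None,
        }
    })
}

/// Hypothetical snapshot of the facts the force-reveal rules depend on.
struct BootState {
    kernel_panic: bool,
    drm_register_timed_out: bool,
    init_restarts: u32,
    quiet_requested: bool,
}

/// True when the log must be shown regardless of quiet mode.
fn must_reveal_log(s: &BootState) -> bool {
    s.kernel_panic
        || s.drm_register_timed_out
        || s.init_restarts > 2
        || !s.quiet_requested
}
```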
**Verification:**
- `redbear-full`: no log text during normal boot. Esc reveals, Esc hides.
- `redbear-mini`: log always visible (no quiet).
- Daemon crash during boot: log auto-reveals for 5s.
---
### PHASE 5 — Boot Speed: Flatten the Stage Graph
**Goal:** Parallelize display path with hardware enumeration. Remove the 200ms sleep hack.
**Effort:** 1–2 days
**Current chain (4 barrier syncs):**
```
00_base → 02_early_hw → 04_drivers → 06_services → 08_userland → SDDM
```
**Proposed chain (parallel branches):**
```
00_base.target (10_inputd is the ONLY hard dep)
├─ [branch A — display] [branch B — hardware]
│ 10_bootanim 50_rootfs
│ 20_vesad 02_early_hw.target
│ 20_fbbootlogd 04_drivers.target
│ 20_fbcond redox-drm, xhcid, e1000d, ...
│ 06_services.target
│ dbus, sessiond, dhcpd
└──────────────┬───────────────────┘
08_userland.target
12_sddm (requires 50_drm-handoff, not 04_drivers.target)
29_activate_console (no sleep — waits on handoff FD)
30_console (getty 2)
```
**Key changes:**
- Display services and driver services run in parallel
- `29_activate_console` uses FD-barrier instead of `sleep 0.2` (the FD-handoff pattern from existing pcid patches)
- SDDM requires `50_drm-handoff.service`, not `04_drivers.target`
- fbcond retry limit removed — handoff helper retries DRM internally with exponential backoff (30s budget)
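One way the retry schedule in the last item could look, as a sketch: delays double on each attempt and the final delay is trimmed so the total never exceeds the budget. The function name `backoff_delays` and the 100ms starting delay in the usage note are assumptions for this example; the plan only fixes the 30s budget.

```rust
use std::time::Duration;

/// Delays between DRM open attempts: each delay doubles, and the last one is
/// trimmed so that the sum of all delays never exceeds `budget`.
fn backoff_delays(initial: Duration, budget: Duration) -> Vec<Duration> {
    let mut delays = Vec::new();
    let mut next = initial;
    let mut spent = Duration::ZERO;
    // A zero initial delay would never make progress, so it yields no delays.
    while spent < budget && !next.is_zero() {
        let d = next.min(budget - spent);
        delays.push(d);
        spent += d;
        next = next.saturating_mul(2);
    }
    delays
}
```

With a 100ms initial delay and a 30s budget this yields nine waits (100ms, 200ms, and so on up to 12.8s, then a trimmed 4.5s), so a slow firmware load gets the full budget without a fixed retry count.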
**Benchmark targets:**
| Metric | QEMU target | Bare-metal target |
|---|---|---|
| kernel_entry → bootanim started | < 300ms | < 200ms |
| bootanim → SDDM visible | < 2.0s | < 4.0s |
| kernel_entry → SDDM painted | < 5.0s | < 7.0s |
| Regression threshold | >10% fails CI | >10% fails CI |
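The regression rule in the last row could be checked as below. This sketch assumes the 10% is measured against a previously recorded baseline for each stage rather than against the target itself; `is_regression` and `failing_stages` are hypothetical names.

```rust
/// True when `measured_ms` exceeds `baseline_ms` by more than `threshold_pct` percent.
/// Integer arithmetic avoids floating-point rounding at the boundary.
fn is_regression(baseline_ms: u64, measured_ms: u64, threshold_pct: u64) -> bool {
    measured_ms * 100 > baseline_ms * (100 + threshold_pct)
}

/// Given rows of (stage name, baseline ms, measured ms), return the stages that regressed.
fn failing_stages<'a>(rows: &[(&'a str, u64, u64)], threshold_pct: u64) -> Vec<&'a str> {
    rows.iter()
        .filter(|(_, base, got)| is_regression(*base, *got, threshold_pct))
        .map(|(name, _, _)| *name)
        .collect()
}
```

A measurement of exactly 110% of the baseline passes; only values strictly above the threshold fail.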
**Verification:**
- `measure-boot-stages.sh` produces CSV of stage timestamps
- QEMU video recording: splash from start to SDDM, no black gap
- `redbear-mini` unchanged (speedup is redbear-full specific)
---
### PHASE 6 — Bootloader Branding & Live Progress
**Goal:** Red Bear logo visible from UEFI handoff. Branded boot menu with auto-boot countdown.
**Effort:** 1–2 days
**Files to add/modify:**
| Path | Change |
|---|---|
| `local/sources/bootloader/src/os/uefi/boot_logo.rs` | New module: `Blt()` bootlogo BMP at native resolution |
| `local/sources/bootloader/src/os/uefi/display.rs` | Extend Output to support `Blt()` with 32-bit BGRA |
| `local/sources/bootloader/src/os/uefi/video_mode.rs` | Prefer largest available mode, paint bootlogo |
| `local/sources/bootloader/src/main.rs` | Add `--quiet` (default on), `--menu-timeout=3` config |
| `local/sources/bootloader/mk/uefi.mk` | Embed BMPs at compile time via `include_bytes!` |
| `recipes/core/bootloader/recipe.toml` | Add redbear-assets as dependency |
| `local/docs/BOOTLOADER-BRANDING-SPEC.md` | Menu layout, timeout, key bindings, text fallback |
**Bootloader progress bar:**
- Logo + thin progress bar at bottom (0% at start)
- Bar fills to 10% when kernel is read from disk
- Bar fills to 100% when kernel entry is reached
- Same logo persists through kernel → init transition (no visible gap)
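A small sketch of the fill computation for such a bar, assuming the bar is one row of 32-bit pixels. `bar_fill_px` and `paint_bar` are hypothetical helpers for this example and do not touch UEFI; in the bootloader the painted row would still have to be handed to `Blt()`.

```rust
/// Number of filled pixels for a bar `bar_width` pixels wide at `percent` percent.
/// Percentages above 100 are clamped.
fn bar_fill_px(bar_width: u32, percent: u32) -> u32 {
    let p = percent.min(100) as u64;
    (bar_width as u64 * p / 100) as u32
}

/// Paint one row of the bar: filled colour up to the fill mark, empty colour after it.
fn paint_bar(row: &mut [u32], percent: u32, filled: u32, empty: u32) {
    let fill = bar_fill_px(row.len() as u32, percent) as usize;
    for (i, px) in row.iter_mut().enumerate() {
        *px = if i < fill { filled } else { empty };
    }
}
```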
**Fallback:** If UEFI GOP doesn't support `Blt()`, bootloader falls back to text mode. Splash from Phase 2 still works.
**Verification:**
- `redbear-full` ISO in QEMU: red bear logo in UEFI FB, 3s menu, smooth transition to kernel FB
- Bare metal AMD + Intel: same behavior
- Firmware without Blt(): text fallback works
---
### PHASE 7 — Early Graphical Greeter
**Goal:** Something graphical appears before full SDDM/KWin is ready (~2s splash → ~3s minimal greeter).
**Effort:** 1–2 days
**Files:**
| Path | Type | Purpose |
|---|---|---|
| `local/recipes/wayland/redbear-compositor/source/src/bin/mini.rs` | Rust | Minimal Wayland greeter (user selector on black bg) |
| `config/redbear-greeter-services.toml` | config | `11_mini-greeter.service` between handoff and SDDM |
**The mini greeter:**
- Tiny Wayland compositor (few hundred lines Rust)
- Shows one selector entry per configured user
- Owns the `wl_display` before KWin
- On user selection: calls `org.freedesktop.login1.Manager.SwitchToUser(uid)`, exits
- Init then starts `12_sddm` which inherits the Wayland display
**Verification:**
- `redbear-full`: splash → mini greeter (~500ms) → user selection → KWin/Plasma
- Total time < 7s on QEMU
- `redbear-mini`: unchanged
---
### PHASE 8 — Clean FB Resource Management
**Goal:** vesad releases bootloader FB on handoff. Memory accounting is auditable.
**Effort:** 4–8 hours
**Files to modify:**
| Path | Change |
|---|---|
| `local/sources/base/drivers/graphics/vesad/src/main.rs` | On RELEASE_EARLYFB: munmap FB, close FD, log freed bytes, exit 0 |
| `local/sources/base/drivers/graphics/vesad/src/scheme.rs` | Track FB lifetime in `Resource` struct |
| `local/sources/base/drivers/inputd/src/main.rs` | On handoff: query vesad resource, log freed bytes, 30s kill watchdog |
| `config/redbear-bootanim.toml` | Add vesad-release-timeout watchdog service |
| `local/docs/FB-RESOURCE-LIFECYCLE.md` | Full lifecycle diagram with byte counts |
**FB lifecycle:**
```
Bootloader → vesad mmap (8MB) → redox-drm allocates DRM FB (8MB)
→ handoff: both mapped briefly (16MB) → release vesad → only DRM (8MB)
```
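The byte counts in that lifecycle follow from the mode. The sketch below assumes a 1920×1080 mode at 4 bytes per pixel and two equally sized framebuffers. `Stage`, `fb_bytes`, and `mapped_bytes` are hypothetical names for this example.

```rust
/// Points in the framebuffer lifecycle described above.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Stage {
    EarlyFbOnly,
    BothMapped,
    DrmOnly,
}

/// Size in bytes of one framebuffer.
fn fb_bytes(width: u64, height: u64, bytes_per_pixel: u64) -> u64 {
    width * height * bytes_per_pixel
}

/// Total framebuffer bytes mapped at a given stage, for equally sized buffers.
fn mapped_bytes(stage: Stage, one_fb: u64) -> u64 {
    match stage {
        Stage::EarlyFbOnly | Stage::DrmOnly => one_fb,
        Stage::BothMapped => 2 * one_fb,
    }
}
```

For 1920×1080 at 32 bits per pixel, one buffer is 8,294,400 bytes and the brief overlap during handoff is 16,588,800 bytes, which is where the 8MB and 16MB figures come from.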
**Verification:**
- `/var/log/logd` shows FB byte counts through lifecycle
- Watchdog kills vesad if release hangs >30s
- `redbear-mini`: vesad stays alive (no DRM, no release)
---
## 3. Dependency Graph
```
Phase 1 (branding assets) ← everything downstream
Phase 2 (bootanim daemon) ← needs Phase 1 assets
Phase 3 (atomic handoff) ← needs Phase 2 state machine
Phase 4 (quiet boot) ← independent, parallelizable
Phase 5 (boot speed graph) ← needs Phase 3 (handoff is the barrier)
Phase 6 (bootloader branding) ← independent, parallelizable
Phase 7 (mini greeter) ← needs Phase 3 + Phase 5
Phase 8 (FB resource mgmt) ← needs Phase 3 (release step)
Critical path: 1 → 2 → 3 → 5 → 7
Parallelizable: 4, 6 (any time); 8 (once Phase 3 lands)
```
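The critical path can be checked mechanically from the "needs" edges in the graph. This sketch encodes only those explicit edges (Phases 4 and 6 are treated as having no prerequisites, as the graph states); `DEPS` and `chain_len` are hypothetical names.

```rust
/// Direct prerequisites per phase, taken from the graph above (index 0 is unused).
const DEPS: [&[usize]; 9] = [
    &[],     // unused
    &[],     // Phase 1
    &[1],    // Phase 2 needs 1
    &[2],    // Phase 3 needs 2
    &[],     // Phase 4 independent
    &[3],    // Phase 5 needs 3
    &[],     // Phase 6 independent
    &[3, 5], // Phase 7 needs 3 and 5
    &[3],    // Phase 8 needs 3
];

/// Number of phases on the longest prerequisite chain ending at `phase`
/// (1 means the phase has no prerequisites).
fn chain_len(phase: usize) -> usize {
    1 + DEPS[phase].iter().map(|&d| chain_len(d)).max().unwrap_or(0)
}
```

Phase 7 has the longest chain (five phases: 1, 2, 3, 5, 7), which agrees with the critical path listed in the graph.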
---
## 4. Effort Summary
| Phase | Effort | Risk | Rollback |
|---|---|---|---|
| 1. Branding assets | 1–4 h | Trivial (host-side imagemagick) | Delete recipe + config |
| 2. bootanim daemon | 1–2 d | Handoff correctness is subtle | Disable service; log/console still works |
| 3. Atomic handoff | 4–8 h | Low (thin orchestrator) | Fallback to vesad if handoff fails |
| 4. Quiet boot | 1 d | Reveal key must work pre-fbcond | Per-config opt-in; mini unchanged |
| 5. Boot speed | 1–2 d | Invasive stage graph restructure | Revert config; one git checkout |
| 6. Bootloader branding | 1–2 d | UEFI Blt() varies by firmware | Text mode fallback preserved |
| 7. Mini greeter | 1–2 d | New UI; keyboard handling | Opt-in per config; SDDM still works |
| 8. FB resource mgmt | 4–8 h | Force-killing vesad could break consumers | Disable watchdog service |
**Total: ~7–10 working days** for a single engineer to land all 8 phases.
**First visible improvement:** Phase 1 + Phase 2 (~2 days) → bootloader logo + splash on earlyfb.
**Full CachyOS-class experience:** All 8 phases.
---
## 5. Watch-Outs
1. **Bootloader `Blt()` is firmware-dependent.** Test on ≥2 bare-metal firmwares + QEMU OVMF. If GOP doesn't support `Blt()`, text fallback kicks in.
2. **Resolution mismatch on handoff.** If DRM mode differs from vesad earlyfb, bootanim resamples the cached frame (Lanczos). Worst case: Intel i915 at 1366×768 panel + 1920×1080 DRM mode.
3. **Init FD-handoff semantics** assumed by Phase 5 (`pass_fds = [3]`) must be verified in init source before restructuring the boot graph.
4. **No patches in `local/patches/`.** All changes are direct edits in `local/sources/<component>/` (Rule 1) or tracked config fragments.
5. **Actual source paths:** `local/sources/base/drivers/graphics/<daemon>/`, not `local/sources/base/src/daemon/`. Verify before editing.
6. **KWin QML gate:** If full Plasma can't boot, Phase 7's mini greeter is the graceful degradation. Working graphical session without Plasma is better than stuck boot.
7. **Legacy virtio-gpud/ihdgd conflict:** Verify `config/redbear-full.toml` excludes these. If they ship alongside redox-drm, they'll race for the display scheme.
---
## 6. Immediate Next Steps (Blocking Issues)
Before starting Phase 1, fix these existing issues that block a clean boot:
1. **Init stops at thermald**, which is why console services (29–31) never start. Runtime debug output from init is needed to diagnose this.
2. **`29_activate_console.service` is a no-op:** `redbear-legacy-base.toml` overrides it to `cmd = "true"`, so VT 2 is never activated.
3. **Remove temporary debug code** from init's `main.rs` (`INIT_LOG_LEVEL=DEBUG`, the `debug_log` function).
4. **Fix `00_acpid.service` reference:** `00_driver-manager.service` references the non-existent `00_acpid.service` (should be `30_acpid.service`).