Red Bear OS — CachyOS-Class Boot Experience Implementation Plan

Version: 1.0 · 2026-06-11 · Branch: 0.2.3
Status: Canonical plan for boot visual quality, display handoff, and boot speed
Depends on: existing redox-drm, inputd, vesad, fbbootlogd, fbcond, bootloader
Supersedes: boot-comfort fragments in CONSOLE-TO-KDE-DESKTOP-PLAN.md (boot pipeline layer only)


0. Architecture Decision

The Linux model is correct: once the DRM driver becomes available, the handoff happens automatically.

No daemon-side config awareness. No polling. No inter-daemon handshakes. When redox-drm registers scheme:drm/card0, the display path switches through the existing inputd ESTALE mechanism. Init orchestrates the lifecycle — staging the splash, detecting DRM, withdrawing the earlyfb, forwarding traffic to the new path.

Target Pipeline (Post-Plan)

UEFI GOP framebuffer (bootloader paints Red Bear logo)
  → kernel boots, passes FB env vars to init
  → init starts vesad (20_vesad.service)           ← registers display.vesa (earlyfb)
  → init starts redbear-bootanim (20_bootanim.service) ← paints splash on earlyfb
  → init starts fbbootlogd (quiet mode, hidden behind splash)
  → init starts fbcond (VT 2, behind splash)
  → redox-drm loads (04_drivers.target), registers scheme:drm/card0
  → inputd signals ESTALE on all display.* handles
  → 50_drm-handoff.service runs                    ← atomic swap: vesad → DRM
      • bootanim re-parents onto DRM FB (memcpy, no redraw)
      • fbbootlogd/fbcond reconnect to DRM
      • vesad releases bootloader FB, exits
  → SDDM/KWin start (08_userland.target)
  → bootanim fades out as greeter paints

Visible result: black → red bear logo + spinner → silent handoff → SDDM fade-in
No log text unless user presses Esc. No flicker. No blank screen.

Linux Mechanism Mapping

| CachyOS / Linux | Red Bear equivalent |
| --- | --- |
| simpledrm (kernel) | vesad earlyfb + bootanim mmap |
| Plymouth (userspace splash) | redbear-bootanim (Rust, per AGENTS.md "system-critical must be Rust") |
| Plymouth two-step (pre-DRM → post-DRM) | bootanim Surface::Vesad → Surface::Drm state machine |
| drm_aperture_remove_conflicting_framebuffers() | init-managed via 50_drm-handoff.service + 98_release_vesad.service |
| CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER | bootanim holds firmware FB visible until DRM handoff completes |
| Plymouth Esc-to-reveal | bootanim SIGUSR2 → fbbootlogd reconnects, paints log overlay |
| Plymouth fade-out on greeter ready | bootanim SIGTERM → 200ms fade → exit |

1. Current State Assessment

What Exists

| Component | Location | Scheme | Status |
| --- | --- | --- | --- |
| Bootloader | local/sources/bootloader/ | UEFI GOP text menu | Text-only, no logo/splash |
| Kernel debug display | local/sources/kernel/src/devices/graphical_debug/ | scheme:debug | Immediately overwrites bootloader FB |
| vesad | local/sources/base/drivers/graphics/vesad/ | display.vesa | Registers earlyfb. No handoff code. Stays alive. |
| fbbootlogd | local/sources/base/drivers/graphics/fbbootlogd/ | fbbootlog | Overwrites FB with log text immediately. Has handoff path. VT 1. |
| fbcond | local/sources/base/drivers/graphics/fbcond/ | fbcon | Text console VTs. Handoff with 4-retry limit. VT 2+. |
| inputd | local/sources/base/drivers/inputd/ | scheme:input | Display/input multiplexer. Signals ESTALE on handoff. |
| redox-drm | local/recipes/gpu/redox-drm/source/ | scheme:drm | 🚧 Registers DRM. Calls inputd/handle/ to announce itself. |
| virtio-gpud | local/sources/base/drivers/graphics/virtio-gpud/ | display.virtio-gpu | ⚠️ Legacy, uses old GraphicsScheme API |
| ihdgd | local/sources/base/drivers/graphics/ihdgd/ | display.ihdg.* | ⚠️ Legacy Intel driver |
| Branding assets | local/Assets/images/ | n/a | PNGs exist, NOT integrated anywhere |

What's Missing (Gap Analysis)

| # | Gap | Impact |
| --- | --- | --- |
| 1 | No boot splash/logo | User sees raw kernel/init log text from the first millisecond |
| 2 | fbbootlogd overwrites bootloader FB immediately | Any bootloader-painted pixels are destroyed within milliseconds |
| 3 | No smooth display handoff | vesad stays alive, doesn't release FB memory, no coordinated transition |
| 4 | No "quiet boot" mode | Kernel/init log is always shown, no way to hide it behind splash |
| 5 | Boot is slow (4 barrier syncs before SDDM) | 00→02→04→06→08 target chain; each waits for all services |
| 6 | No progress indicator | No animated spinner or progress bar during boot |
| 7 | No bootloader branding | UEFI bootloader shows text mode selection menu only |
| 8 | vesad doesn't release FB on DRM handoff | Bootloader FB stays mapped, wasting ~8MB memory |
| 9 | 29_activate_console is a mess | Overridden to no-op in legacy-base, then overridden again in mini. 200ms sleep hack. |
| 10 | fbcond gives up after 4 handoff retries | If DRM is slow (firmware load), console silently stops |
| 11 | Legacy virtio-gpud/ihdgd may conflict | Could race with redox-drm for display scheme |

Init Service Order (Current)

INITFS STAGE:
  00_runtime.target → 10_inputd → 20_vesad → 20_fbbootlogd → 20_fbcond
  → 40_drivers.target → 50_rootfs → 90_initfs.target → switch_root

ROOTFS STAGE:
  00_base.target → 02_early_hw.target → 04_drivers.target → 06_services.target
  → 08_userland.target → 29_activate_console → 30_console (getty 2) → login

  For redbear-full:
  Same + 12_sddm → kwin_wayland → KDE Plasma

2. Phased Implementation Plan

PHASE 1 — Branding Infrastructure

Goal: Single source of truth for Red Bear visual assets with deterministic conversion.

Effort: 1–4 hours

Files:

| Path | Type | Purpose |
| --- | --- | --- |
| local/Assets/scripts/render-assets.sh | script | PNG → BMP/RAW conversion via imagemagick (host-side) |
| local/Assets/MANIFEST.sha256 | text | Deterministic checksums for all generated assets |
| local/recipes/system/redbear-assets/recipe.toml | recipe (Rule 1) | Stages assets to /usr/share/redbear/assets/ |
| local/sources/redbear-assets/ | source (Rule 1) | Trivial install crate |
| local/docs/BOOT-BRANDING-SPEC.md | doc | Resolution policy, color profile, animation budget |

Generated assets (from existing PNGs):

| Asset | Format | Resolution | Consumer |
| --- | --- | --- | --- |
| bootlogo-1080p.bmp | 32-bit BGRA BMP | 1920×1080 | Bootloader UEFI Blt() |
| bootlogo-720p.bmp | 32-bit BGRA BMP | 1280×720 | Bootloader fallback |
| bootlogo-tiny.bmp | 32-bit BGRA BMP | 640×480 | VESA-only firmware |
| splash-1080p.raw | Raw BGRA scanout | 1920×1080 | bootanim direct mmap |
| splash-1080p.anim.json | JSON | n/a | Animation timeline |
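
To make the "Raw BGRA scanout" format concrete, here is a minimal sketch of the channel swap that turns decoded RGBA pixels into BGRA, plus the expected file size for a given resolution. The plan performs the real conversion host-side with imagemagick in render-assets.sh; the function names below are hypothetical and only illustrate the byte layout, assuming tightly packed 8-bit channels with no row padding.

```rust
/// Convert tightly packed RGBA8 pixels to BGRA8 in place by swapping the
/// red and blue channels. Illustrative only: the plan's actual conversion
/// is done by imagemagick, not by this function.
pub fn rgba_to_bgra_in_place(pixels: &mut [u8]) -> Result<(), String> {
    if pixels.len() % 4 != 0 {
        return Err(format!(
            "buffer length {} is not a multiple of 4",
            pixels.len()
        ));
    }
    for px in pixels.chunks_exact_mut(4) {
        px.swap(0, 2); // R <-> B; G and A keep their positions
    }
    Ok(())
}

/// Expected size in bytes of a raw scanout image at 4 bytes per pixel
/// (assumes no row padding).
pub fn raw_size_bytes(width: usize, height: usize) -> usize {
    width * height * 4
}
```

Under these assumptions, `raw_size_bytes(1920, 1080)` gives 8,294,400 bytes, which a build step could compare against the size of splash-1080p.raw as a cheap sanity check.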

Verification:

  • render-assets.sh produces all assets, byte-identical across rebuilds
  • redbear-assets recipe stages them into sysroot

PHASE 2 — redbear-bootanim: Plymouth Equivalent

Goal: Rust userspace daemon that owns the framebuffer from vesad registration until greeter focus, rendering the Red Bear brand consistently across both earlyfb and DRM.

Effort: 1–2 days

Files:

| Path | Type | Purpose |
| --- | --- | --- |
| local/sources/redbear-bootanim/ | source (Rule 1) | Bootanim daemon source |
| local/sources/redbear-bootanim/src/main.rs | Rust | Daemon entry, signal handlers |
| local/sources/redbear-bootanim/src/surface.rs | Rust | Surface abstraction over vesad earlyfb + DRM |
| local/sources/redbear-bootanim/src/anim.rs | Rust | Animation loop (logo + spinner + progress) |
| local/sources/redbear-bootanim/src/progress.rs | Rust | Unix datagram socket for progress updates from init |
| local/recipes/system/redbear-bootanim/recipe.toml | recipe (Rule 1) | Depends on redbear-assets, inputd |
| config/redbear-bootanim.toml | config fragment | 20_bootanim.service + 50_drm-handoff + 98_release_vesad |

Service wiring:

# 20_bootanim.service — runs on earlyfb, transitions to DRM
[[files]]
path = "/etc/init.d/20_bootanim.service"
data = """
[unit]
description = "Red Bear boot animation (splash)"
requires_weak = ["10_inputd.service", "20_vesad.service"]

[service]
cmd = "/usr/bin/redbear-bootanim"
args = ["--surface=vesad", "--vt=1"]
type = "simple"
respawn = false
"""

Behavior:

| State | Surface | Renders | Input |
| --- | --- | --- | --- |
| Surface::Vesad | mmap'd bootloader FB | Logo + spinner + progress | Pass-through to fbcond |
| Surface::Drm | /scheme/drm/card0 | Same pixels (memcpy, no redraw) | Pass-through |
| Reveal (SIGUSR2/Esc) | Both | Translucent log overlay on splash | Log scrollback |
| Exit (SIGTERM) | n/a | 200ms fade to black, exit | n/a |
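
The behavior table can be read as a small state machine. The sketch below is one possible encoding, not the daemon's actual types: the `Event` and `State` names are hypothetical, and it assumes that the log reveal is a toggle available on either surface and that SIGTERM is terminal.

```rust
/// Which framebuffer bootanim is currently drawing on.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Surface {
    Vesad,
    Drm,
}

/// Hypothetical events, derived from the signals named in the table.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Event {
    DrmHandoff,   // handoff helper moved us to the DRM surface
    ToggleReveal, // SIGUSR2 / Esc
    Terminate,    // SIGTERM
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    Running { surface: Surface, log_revealed: bool },
    Exiting, // 200ms fade, then exit
}

impl State {
    /// Boot starts on the vesad earlyfb with the log hidden.
    pub fn initial() -> Self {
        State::Running { surface: Surface::Vesad, log_revealed: false }
    }

    /// Pure transition function: no I/O, so it is easy to test.
    pub fn on_event(self, ev: Event) -> Self {
        match (self, ev) {
            (State::Exiting, _) => State::Exiting,
            (State::Running { .. }, Event::Terminate) => State::Exiting,
            (State::Running { log_revealed, .. }, Event::DrmHandoff) => {
                State::Running { surface: Surface::Drm, log_revealed }
            }
            (State::Running { surface, log_revealed }, Event::ToggleReveal) => {
                State::Running { surface, log_revealed: !log_revealed }
            }
        }
    }
}
```

Keeping the transition function free of side effects means the signal handlers only need to enqueue an `Event`, and the reveal state survives the surface switch.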

Key design property: Handoff is a memcpy, not a redraw. bootanim holds a cached Box<[u32]> of the last frame (~8MB). On handoff, it copies this to the DRM FB. Both surfaces end up pixel-identical — zero flicker.
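
As a rough illustration of the memcpy handoff, the sketch below copies a tightly packed cached frame into a destination buffer whose rows may be padded. The function names and the stride-in-pixels convention are assumptions for this example; how the DRM framebuffer is actually mapped is outside its scope.

```rust
/// Bytes needed to cache one frame at 4 bytes per pixel.
pub fn frame_bytes(width: usize, height: usize) -> usize {
    width * height * 4
}

/// Copy a tightly packed cached frame into `dst`, whose rows are
/// `dst_stride` pixels apart (stride >= width allows row padding).
/// Pixels in the padding area are left untouched.
pub fn copy_frame(
    cached: &[u32],
    width: usize,
    height: usize,
    dst: &mut [u32],
    dst_stride: usize,
) -> Result<(), &'static str> {
    if dst_stride < width {
        return Err("destination stride is narrower than the frame");
    }
    if cached.len() != width * height {
        return Err("cached frame has unexpected size");
    }
    if height > 0 && dst.len() < dst_stride * (height - 1) + width {
        return Err("destination buffer is too small");
    }
    for row in 0..height {
        let src = &cached[row * width..(row + 1) * width];
        let start = row * dst_stride;
        dst[start..start + width].copy_from_slice(src);
    }
    Ok(())
}
```

At 1920×1080 the cached frame is 8,294,400 bytes (about 7.9 MiB), which matches the ~8MB figure above. When the destination stride equals the width, the loop degenerates into one contiguous copy.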

Verification:

  • redbear-mini: logo appears in UEFI FB, continues through init, transitions to fbbootlogd
  • redbear-full: logo → smooth DRM handoff → SDDM fade-in (no blank gap >1 frame)
  • Esc reveals log; Esc again hides it

PHASE 3 — Atomic DRM Handoff (Linux drm_aperture Equivalent)

Goal: One-shot helper that orchestrates vesad → DRM transition in a single transaction.

Effort: 4–8 hours

Files:

| Path | Type | Purpose |
| --- | --- | --- |
| local/sources/redbear-bootanim/src/bin/handoff.rs | Rust | Handoff orchestrator binary |
| local/sources/redbear-bootanim/src/bin/release_fb.rs | Rust | Sends RELEASE_EARLYFB to vesad |

Handoff sequence (in handoff.rs):

1. Send PREPARE_HANDOFF to bootanim → bootanim flushes scanout, snapshots frame, pauses animation
2. bootanim opens /scheme/drm/card0, performs ModeSetCrtc + first present
3. bootanim returns HANDOFF_READY
4. Send RELEASE_EARLYFB to vesad → vesad munmaps bootloader FB, signals ESTALE, exits
5. Send POST_HANDOFF to bootanim → bootanim resumes animation on DRM surface exclusively
6. Send REBIND_DISPLAY drm to inputd → promotes DRM to primary, ESTALE to remaining consumers
7. Exit 0
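
The sequence above could be sketched as follows. The plan does not specify the transport between the helper and the daemons, so the `Control` trait, the `Peer` enum, and the reply strings other than HANDOFF_READY are hypothetical; the `Recorder` type is a test double that only records what was sent.

```rust
/// Daemons the helper talks to (names from the plan).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Peer {
    Bootanim,
    Vesad,
    Inputd,
}

/// Hypothetical transport: send a command, block until the peer replies.
pub trait Control {
    fn request(&mut self, peer: Peer, command: &str) -> Result<String, String>;
}

/// Run the handoff in order. Any failure aborts before the next step,
/// so vesad is never asked to release the earlyfb unless bootanim has
/// already presented on DRM.
pub fn run_handoff<C: Control>(ctl: &mut C) -> Result<(), String> {
    // Steps 1-3: bootanim snapshots, opens DRM, presents, reports ready.
    let reply = ctl.request(Peer::Bootanim, "PREPARE_HANDOFF")?;
    if reply != "HANDOFF_READY" {
        return Err(format!("bootanim not ready: {}", reply));
    }
    // Step 4: vesad unmaps the bootloader FB and exits.
    ctl.request(Peer::Vesad, "RELEASE_EARLYFB")?;
    // Step 5: bootanim resumes animation on DRM only.
    ctl.request(Peer::Bootanim, "POST_HANDOFF")?;
    // Step 6: inputd promotes DRM to primary.
    ctl.request(Peer::Inputd, "REBIND_DISPLAY drm")?;
    Ok(())
}

/// Test double: records every command and fakes the replies.
pub struct Recorder {
    pub log: Vec<(Peer, String)>,
    pub bootanim_ready: bool,
}

impl Control for Recorder {
    fn request(&mut self, peer: Peer, command: &str) -> Result<String, String> {
        self.log.push((peer, command.to_string()));
        match (peer, command) {
            (Peer::Bootanim, "PREPARE_HANDOFF") if self.bootanim_ready => {
                Ok("HANDOFF_READY".to_string())
            }
            (Peer::Bootanim, "PREPARE_HANDOFF") => Ok("BUSY".to_string()),
            _ => Ok("OK".to_string()),
        }
    }
}
```

Putting the transport behind a trait lets the ordering guarantee be tested without any running daemons: with `bootanim_ready: false`, the recorder sees exactly one command and RELEASE_EARLYFB is never sent.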

Why a separate binary: Init can enforce ordering and timeout. If handoff hangs, init moves on — user still gets a working system (stuck splash, compositor paints over it).

Timeout/fallback: If redox-drm doesn't register within 30s, handoff helper falls back to keeping splash on vesad, shows "GPU driver did not load" overlay.

Linux mapping:

| Linux | Red Bear |
| --- | --- |
| drm_aperture_remove_conflicting_framebuffers() | Init via handoff.rs (driver doesn't do implicit aperture management) |
| CONFIG_FRAMEBUFFER_CONSOLE_DEFERRED_TAKEOVER | bootanim holds firmware FB visible until handoff step 4 |
| Plymouth show-splash / hide-splash | bootanim exit + sessiond Seat transition signal |

Verification:

  • redbear-full QEMU: screen never black for >1 frame during handoff
  • Disable redox-drm: fallback message appears, user can still log in via getty
  • Kill bootanim mid-handoff: handoff helper detects and recovers

PHASE 4 — Quiet Boot (Log Suppression Behind Splash)

Goal: Normal boot shows only splash. Kernel/init log hidden unless user presses Esc or boot fails.

Effort: 1 day

Files to modify:

| Path | Change |
| --- | --- |
| local/sources/base/drivers/graphics/fbbootlogd/src/main.rs | Add --quiet flag (don't open display, write to logd only) |
| local/sources/base/drivers/graphics/fbbootlogd/src/scheme.rs | Quiet mode: no display painting until SIGUSR2 |
| local/sources/base/drivers/inputd/src/main.rs | Separate "log sink" consumer role from "display" consumer |
| config/redbear-full.toml | fbbootlogd args ["--quiet"] |
| config/redbear-mini.toml | fbbootlogd args [] (no quiet; text target shows log) |
| local/docs/QUIET-BOOT-SPEC.md | Kernel cmdline redbear_quiet=0 |

Reveal key: Esc (configurable in /etc/redbear/bootanim.toml) → bootanim sends SIGUSR2 to fbbootlogd → fbbootlogd connects to display, paints log. Esc again → disconnects, clears overlay.

Force-reveal conditions (always show log, no quiet):

  • Kernel panic
  • redox-drm register timeout
  • Init restart loop > 2 times
  • redbear_quiet=0 kernel cmdline
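
The force-reveal conditions reduce to a single predicate. The sketch below assumes a hypothetical `BootStatus` struct that init (or bootanim) would fill in; where those values actually come from is not specified in this plan.

```rust
/// Hypothetical snapshot of the boot state relevant to quiet mode.
pub struct BootStatus {
    pub kernel_panic: bool,
    pub drm_register_timed_out: bool,
    pub init_restart_count: u32,
    pub cmdline: String,
}

/// True when the kernel command line contains the exact token
/// `redbear_quiet=0`.
pub fn quiet_disabled_on_cmdline(cmdline: &str) -> bool {
    cmdline.split_whitespace().any(|arg| arg == "redbear_quiet=0")
}

/// True when the log must be shown regardless of quiet mode.
pub fn must_reveal_log(s: &BootStatus) -> bool {
    s.kernel_panic
        || s.drm_register_timed_out
        || s.init_restart_count > 2 // "restart loop > 2 times"
        || quiet_disabled_on_cmdline(&s.cmdline)
}
```

Matching whole whitespace-separated tokens avoids false positives from arguments that merely contain the string, such as a hypothetical `foo_redbear_quiet=0`.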

Verification:

  • redbear-full: no log text during normal boot. Esc reveals, Esc hides.
  • redbear-mini: log always visible (no quiet).
  • Daemon crash during boot: log auto-reveals for 5s.

PHASE 5 — Boot Speed: Flatten the Stage Graph

Goal: Parallelize display path with hardware enumeration. Remove the 200ms sleep hack.

Effort: 1–2 days

Current chain (4 barrier syncs):

00_base → 02_early_hw → 04_drivers → 06_services → 08_userland → SDDM

Proposed chain (parallel branches):

00_base.target (10_inputd is the ONLY hard dep)
  ├─ [branch A — display]          [branch B — hardware]
  │   10_bootanim                    50_rootfs
  │   20_vesad                       02_early_hw.target
  │   20_fbbootlogd                  04_drivers.target
  │   20_fbcond                        redox-drm, xhcid, e1000d, ...
  │                                  06_services.target
  │                                    dbus, sessiond, dhcpd
  │
  └──────────────┬───────────────────┘
                 │
           08_userland.target
             12_sddm (requires 50_drm-handoff, not 04_drivers.target)
             29_activate_console (no sleep — waits on handoff FD)
             30_console (getty 2)

Key changes:

  • Display services and driver services run in parallel
  • 29_activate_console uses FD-barrier instead of sleep 0.2 (the FD-handoff pattern from existing pcid patches)
  • SDDM requires 50_drm-handoff.service, not 04_drivers.target
  • fbcond retry limit removed — handoff helper retries DRM internally with exponential backoff (30s budget)
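
The retry policy in the last bullet could look like the sketch below, which computes a doubling delay schedule that never exceeds a total budget. The 100ms initial delay and 5s cap used in the usage note are illustrative assumptions; the plan fixes only the 30s budget.

```rust
use std::time::Duration;

/// Delays for exponential backoff: start at `initial`, double after each
/// attempt, cap each delay at `max_delay`, and stop once the sum of all
/// delays reaches `budget`. The last delay is shortened to fit exactly.
pub fn backoff_delays(
    initial: Duration,
    max_delay: Duration,
    budget: Duration,
) -> Vec<Duration> {
    let mut delays = Vec::new();
    let mut next = initial.min(max_delay);
    let mut spent = Duration::ZERO;
    while !next.is_zero() && spent < budget {
        // `spent < budget` holds here, so the subtraction cannot underflow.
        let delay = next.min(budget - spent);
        delays.push(delay);
        spent += delay;
        next = (next * 2).min(max_delay);
    }
    delays
}
```

With `backoff_delays(Duration::from_millis(100), Duration::from_secs(5), Duration::from_secs(30))`, the helper would sleep 100ms, 200ms, 400ms and so on up to the 5s cap, with the delays summing to exactly 30s before it gives up and falls back to vesad.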

Benchmark targets:

| Metric | QEMU target | Bare-metal target |
| --- | --- | --- |
| kernel_entry → bootanim started | < 300ms | < 200ms |
| bootanim → SDDM visible | < 2.0s | < 4.0s |
| kernel_entry → SDDM painted | < 5.0s | < 7.0s |
| Regression threshold | >10% fails CI | >10% fails CI |
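
A CI gate for these targets could be as small as the sketch below. It assumes the regression threshold is measured relative to a previously recorded baseline for the same metric, which the table does not state explicitly, and it uses integer milliseconds to avoid floating-point edge cases at exactly 10%.

```rust
/// True when `measured_ms` is more than 10% slower than `baseline_ms`.
/// Integer arithmetic: measured > baseline * 1.10.
pub fn is_regression(baseline_ms: u64, measured_ms: u64) -> bool {
    measured_ms * 100 > baseline_ms * 110
}

/// True when a measurement meets an absolute target such as "< 5.0s".
pub fn within_target(measured_ms: u64, target_ms: u64) -> bool {
    measured_ms < target_ms
}
```

For example, against a 2000ms baseline a 2200ms run passes (exactly 10% slower) while 2201ms fails.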

Verification:

  • measure-boot-stages.sh produces CSV of stage timestamps
  • QEMU video recording: splash from start to SDDM, no black gap
  • redbear-mini unchanged (speedup is redbear-full specific)

PHASE 6 — Bootloader Branding & Live Progress

Goal: Red Bear logo visible from UEFI handoff. Branded boot menu with auto-boot countdown.

Effort: 1–2 days

Files to add/modify:

| Path | Change |
| --- | --- |
| local/sources/bootloader/src/os/uefi/boot_logo.rs | New module: Blt() bootlogo BMP at native resolution |
| local/sources/bootloader/src/os/uefi/display.rs | Extend Output to support Blt() with 32-bit BGRA |
| local/sources/bootloader/src/os/uefi/video_mode.rs | Prefer largest available mode, paint bootlogo |
| local/sources/bootloader/src/main.rs | Add --quiet (default on), --menu-timeout=3 config |
| local/sources/bootloader/mk/uefi.mk | Embed BMPs at compile time via include_bytes! |
| recipes/core/bootloader/recipe.toml | Add redbear-assets as dependency |
| local/docs/BOOTLOADER-BRANDING-SPEC.md | Menu layout, timeout, key bindings, text fallback |

Bootloader progress bar:

  • Logo + thin progress bar at bottom (0% at start)
  • Bar fills to 10% when kernel is read from disk
  • Bar fills to 100% when kernel entry is reached
  • Same logo persists through kernel → init transition (no visible gap)

Fallback: If UEFI GOP doesn't support Blt(), bootloader falls back to text mode. Splash from Phase 2 still works.

Verification:

  • redbear-full ISO in QEMU: red bear logo in UEFI FB, 3s menu, smooth transition to kernel FB
  • Bare metal AMD + Intel: same behavior
  • Firmware without Blt(): text fallback works

PHASE 7 — Early Graphical Greeter

Goal: Something graphical appears before full SDDM/KWin is ready (~2s splash → ~3s minimal greeter).

Effort: 1–2 days

Files:

| Path | Type | Purpose |
| --- | --- | --- |
| local/recipes/wayland/redbear-compositor/source/src/bin/mini.rs | Rust | Minimal Wayland greeter (user selector on black bg) |
| config/redbear-greeter-services.toml | config | 11_mini-greeter.service between handoff and SDDM |

The mini greeter:

  • Tiny Wayland compositor (few hundred lines Rust)
  • Shows a single user selector with one entry per configured user
  • Owns the wl_display before KWin
  • On user selection: calls org.freedesktop.login1.Manager.SwitchToUser(uid), exits
  • Init then starts 12_sddm which inherits the Wayland display

Verification:

  • redbear-full: splash → mini greeter (~500ms) → user selection → KWin/Plasma
  • Total time < 7s on QEMU
  • redbear-mini: unchanged

PHASE 8 — Clean FB Resource Management

Goal: vesad releases bootloader FB on handoff. Memory accounting is auditable.

Effort: 4–8 hours

Files to modify:

| Path | Change |
| --- | --- |
| local/sources/base/drivers/graphics/vesad/src/main.rs | On RELEASE_EARLYFB: munmap FB, close FD, log freed bytes, exit 0 |
| local/sources/base/drivers/graphics/vesad/src/scheme.rs | Track FB lifetime in Resource struct |
| local/sources/base/drivers/inputd/src/main.rs | On handoff: query vesad resource, log freed bytes, 30s kill watchdog |
| config/redbear-bootanim.toml | Add vesad-release-timeout watchdog service |
| local/docs/FB-RESOURCE-LIFECYCLE.md | Full lifecycle diagram with byte counts |

FB lifecycle:

Bootloader → vesad mmap (8MB) → redox-drm allocates DRM FB (8MB)
→ handoff: both mapped briefly (16MB) → release vesad → only DRM (8MB)

Verification:

  • /var/log/logd shows FB byte counts through lifecycle
  • Watchdog kills vesad if release hangs >30s
  • redbear-mini: vesad stays alive (no DRM, no release)

3. Dependency Graph

Phase 1 (branding assets)          ← everything downstream
  │
Phase 2 (bootanim daemon)          ← needs Phase 1 assets
  │
Phase 3 (atomic handoff)           ← needs Phase 2 state machine
  │
Phase 4 (quiet boot)               ← independent, parallelizable
  │
Phase 5 (boot speed graph)         ← needs Phase 3 (handoff is the barrier)
  │
Phase 6 (bootloader branding)      ← independent, parallelizable
  │
Phase 7 (mini greeter)             ← needs Phase 3 + Phase 5
  │
Phase 8 (FB resource mgmt)         ← needs Phase 3 (release step)

Critical path: 1 → 2 → 3 → 5 → 7
Parallelizable: 4, 6 (independent); 8 (once Phase 3 has landed)

4. Effort Summary

| Phase | Effort | Risk | Rollback |
| --- | --- | --- | --- |
| 1. Branding assets | 1–4 h | Trivial (host-side imagemagick) | Delete recipe + config |
| 2. bootanim daemon | 1–2 d | Handoff correctness is subtle | Disable service; log/console still works |
| 3. Atomic handoff | 4–8 h | Low (thin orchestrator) | Fallback to vesad if handoff fails |
| 4. Quiet boot | 1 d | Reveal key must work pre-fbcond | Per-config opt-in; mini unchanged |
| 5. Boot speed | 1–2 d | Invasive stage graph restructure | Revert config; one git checkout |
| 6. Bootloader branding | 1–2 d | UEFI Blt() varies by firmware | Text mode fallback preserved |
| 7. Mini greeter | 1–2 d | New UI; keyboard handling | Opt-in per config; SDDM still works |
| 8. FB resource mgmt | 4–8 h | Force-killing vesad could break consumers | Disable watchdog service |

Total: ~7–10 working days for a single engineer to land all 8 phases.
First visible improvement: Phase 1 + Phase 2 (~2 days) → bootloader logo + splash on earlyfb.
Full CachyOS-class experience: All 8 phases.


5. Watch-Outs

  1. Bootloader Blt() is firmware-dependent. Test on ≥2 bare-metal firmwares + QEMU OVMF. If GOP doesn't support Blt(), text fallback kicks in.
  2. Resolution mismatch on handoff. If DRM mode differs from vesad earlyfb, bootanim resamples the cached frame (Lanczos). Worst case: Intel i915 at 1366×768 panel + 1920×1080 DRM mode.
  3. Init FD-handoff semantics assumed by Phase 5 (pass_fds = [3]) must be verified in init source before restructuring the boot graph.
  4. No patches in local/patches/. All changes are direct edits in local/sources/<component>/ (Rule 1) or tracked config fragments.
  5. Actual source paths: local/sources/base/drivers/graphics/<daemon>/, not local/sources/base/src/daemon/. Verify before editing.
  6. KWin QML gate: If full Plasma can't boot, Phase 7's mini greeter is the graceful degradation. Working graphical session without Plasma is better than stuck boot.
  7. Legacy virtio-gpud/ihdgd conflict: Verify config/redbear-full.toml excludes these. If they ship alongside redox-drm, they'll race for the display scheme.
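
For watch-out 2 (resolution mismatch on handoff), the sketch below shows where a resampling step would sit. It deliberately swaps in nearest-neighbor sampling, which is much simpler than the Lanczos filter the plan names and gives visibly blockier results; it is meant only to illustrate the shape of the operation, and the function name is hypothetical.

```rust
/// Resample a tightly packed 32-bit frame from (sw x sh) to (dw x dh)
/// using nearest-neighbor sampling. NOT Lanczos: a simpler stand-in
/// chosen to keep the example short.
pub fn resample_nearest(
    src: &[u32],
    sw: usize,
    sh: usize,
    dw: usize,
    dh: usize,
) -> Vec<u32> {
    assert_eq!(src.len(), sw * sh, "source size mismatch");
    if sw == 0 || sh == 0 {
        // Nothing to sample from: return a black frame of the target size.
        return vec![0; dw * dh];
    }
    let mut dst = Vec::with_capacity(dw * dh);
    for y in 0..dh {
        let sy = y * sh / dh; // nearest source row (rounded down)
        for x in 0..dw {
            let sx = x * sw / dw; // nearest source column (rounded down)
            dst.push(src[sy * sw + sx]);
        }
    }
    dst
}
```

When source and destination sizes match, the function returns an exact copy, so the common no-mismatch case stays pixel-identical.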

6. Immediate Next Steps (Blocking Issues)

Before starting Phase 1, fix these existing issues that block a clean boot:

  1. Init stops at thermald, which is why console services (29-31) never start. Need runtime debug output from init.
  2. 29_activate_console.service no-op — redbear-legacy-base.toml overrides to cmd = "true". VT 2 never activated.
  3. Remove temporary debug code from init main.rs (INIT_LOG_LEVEL=DEBUG, debug_log function).
  4. Fix 00_acpid.service reference: 00_driver-manager.service references non-existent 00_acpid.service (should be 30_acpid.service).