Fix boot-to-login: override pcid-spawner to oneshot_async, add U3 input producers, Intel HDA phases A-D
- Override 00_pcid-spawner.service to oneshot_async in redbear-legacy-base.toml: rootfs phase no longer blocks on PCI driver init; getty/login starts immediately. Confirmed working on both QEMU and bare metal (redbear-live-mini). - Clean up 00_base legacy script: remove dead notify ipcd/ptyd calls, keep sudo --daemon. - Add U3 named input producers: inputd supports per-device named producers with fan-out to both device-specific consumers and legacy VT consumers. Migrate ps2d and usbhidd to InputProducer trait. RESERVED_DEVICE_NAMES deduplicated. - Add Intel HDA audio driver phases A-D: ihdad error handling (37 fixes), audio quirks, codec path enumeration, mixer/volume control. - Add init service start/readiness logging (always visible, not debug-gated). - Update BOOT-PROCESS-ASSESSMENT.md: Phase 6 complete, boot procedure documented, validation matrix updated with confirmed boot status. - Update USB-IMPLEMENTATION-PLAN.md and INPUT-SCHEME-ENHANCEMENT.md for U2/U3 status.
This commit is contained in:
@@ -1,8 +1,8 @@
|
||||
# Red Bear OS Boot Process Assessment & Improvement Plan
|
||||
|
||||
**Generated:** 2026-04-23
|
||||
**Updated:** 2026-04-23
|
||||
**Status:** Phase 1 ✅, Phase 2 ✅, Phase 3 ✅, Phase 4 ✅ (docs + known gaps), Phase 5 ✅
|
||||
**Updated:** 2026-04-24
|
||||
**Status:** Phase 1 ✅, Phase 2 ✅, Phase 3 ✅, Phase 4 ✅ (docs + known gaps), Phase 5 ✅, Phase 6 ✅ (boot to login confirmed)
|
||||
**Scope:** Comprehensive assessment of boot completeness, mistakes, robustness, resilience, and quality
|
||||
|
||||
## Boot Chain Overview
|
||||
@@ -128,12 +128,12 @@ Bare-metal testing requires physical hardware. Current validation is:
|
||||
## Phase 5: Validation Matrix ✅
|
||||
|
||||
### Build Verification
|
||||
| Target | Build | QEMU Boot | Notes |
|
||||
|--------|-------|-----------|-------|
|
||||
| redbear-minimal | ✅ harddrive.img (2 GB) | ✅ Stage 2 (kernel loaded) | Login renders to framebuffer, not serial |
|
||||
| redbear-full | ✅ harddrive.img (4 GB) | ✅ (prior session) | Greeter services load |
|
||||
| redbear-live-mini | ✅ ISO (384 MB) | — | ISO for bare-metal boot |
|
||||
| redbear-live | ✅ ISO (3.0 GB) | — | ISO for bare-metal boot |
|
||||
| Target | Build | QEMU Boot | Bare-Metal Boot | Notes |
|
||||
|--------|-------|-----------|-----------------|-------|
|
||||
| redbear-mini | ✅ harddrive.img (2 GB) | ✅ Login prompt | — | Framebuffer console login |
|
||||
| redbear-full | ✅ harddrive.img (4 GB) | ✅ Login prompt | — | Desktop packages included |
|
||||
| redbear-live-mini | ✅ ISO (384 MB) | — | ✅ Login prompt | ISO for bare-metal boot |
|
||||
| redbear-live-full | ✅ ISO (3.0 GB) | — | — | ISO for bare-metal boot |
|
||||
|
||||
### Compilation Verification
|
||||
- `cargo check --workspace` in base source: **0 errors**
|
||||
@@ -179,6 +179,82 @@ CI=1 make all CONFIG_NAME=redbear-minimal ARCH=x86_64
|
||||
|
||||
## Key Technical Findings
|
||||
|
||||
### Bare-Metal Boot Log Analysis (2026-04-24)
|
||||
|
||||
AMD machine boot log shows initfs phase starts but never completes:
|
||||
- Kernel boots: ACPI, IOAPIC, timer, memory all OK
|
||||
- vesad initializes: 1280x1024 at 0xA0000000 (FRAMEBUFFER_* from UEFI bootloader)
|
||||
- fbbootlogd maps display
|
||||
- ps2d: keyboard works, mouse BAT fails (no PS/2 mouse port — expected on modern hardware)
|
||||
- pcid begins PCI enumeration
|
||||
- acpid starts, AML interpreter initializes
|
||||
- **MISSING**: "init: initfs drivers target step() complete" — scheduler.step() never returns
|
||||
- **MISSING**: "init: phase 2 — switchroot to /usr" — rootfs phase never starts
|
||||
- **MISSING**: any getty or login output
|
||||
|
||||
Root cause hypothesis (unproven): a service with `type = "notify"`, `type = { scheme = "..." }`,
|
||||
or `type = "oneshot"` in the initfs phase does not signal readiness or does not exit,
|
||||
causing init's scheduler.step() to block forever. All three service types wait synchronously
|
||||
in `service.rs`. Possible blockers include:
|
||||
- A `notify` service that hangs before calling `daemon::Daemon::ready()`
|
||||
- A `scheme` service that hangs before calling `daemon::SchemeDaemon::ready_*()`
|
||||
- An `oneshot` service like `pcid-spawner --initfs` that hangs during PCI enumeration
|
||||
With the new per-service logging (Phase 6A + 6C), the next boot will show exactly which
|
||||
service blocks — the last `init: starting ...` line before the hang identifies the blocker.
|
||||
|
||||
### Bare-Metal/QEMU Boot Log Analysis (2026-04-24, second test with Phase 6 logging)
|
||||
|
||||
The enhanced logging proved the initfs phase completes successfully. The actual blocker is
|
||||
in the rootfs phase:
|
||||
|
||||
- Initfs phase: ✅ all services start and signal readiness/exit correctly
|
||||
- `init: phase 2 - switchroot to /usr` ✅
|
||||
- `init: scheduling 22 rootfs units` ✅
|
||||
- `init: starting PCI driver spawner (pcid-spawner)` ← **BLOCKS HERE**
|
||||
- pcid-spawner (rootfs, `type = "oneshot"`) spawns e1000d (ok), ihdad (fails with RIRB timeout)
|
||||
- Then hangs — no further output for 30+ seconds while system is alive (keyboard works)
|
||||
- Init never reaches `30_console` → getty → login
|
||||
|
||||
Root cause (confirmed): rootfs `00_pcid-spawner.service` uses `type = "oneshot"`, which
|
||||
causes init to block until pcid-spawner exits. On real hardware and QEMU, pcid-spawner
|
||||
can hang waiting for a PCI device driver that never responds, blocking the entire rootfs
|
||||
phase including getty/login.
|
||||
|
||||
Fix: override `00_pcid-spawner.service` to `type = "oneshot_async"` in
|
||||
`config/redbear-legacy-base.toml`. Drivers spawn in the background while init proceeds
|
||||
to start console services. Network services that depend on specific drivers handle their
|
||||
own timing (they connect to driver schemes when ready).
|
||||
|
||||
**Confirmed working**: Both QEMU and bare-metal boot to login prompt after this fix.
|
||||
|
||||
### Phase 6: Boot Visibility & Service Cleanup ✅
|
||||
|
||||
**Status: Confirmed working — system boots to login prompt on both QEMU and bare metal.**
|
||||
|
||||
**6A: Init service start logging (always visible)**
|
||||
`init/src/scheduler.rs`: Service and target start messages promoted from DEBUG to always-visible.
|
||||
Every service now logs `init: starting <description> (<cmd>)` before spawning and
|
||||
`init: started <description> (pid <N>)` after a respawnable process is created.
|
||||
|
||||
**6B: Legacy init script cleanup**
|
||||
`config/redbear-legacy-base.toml`:
|
||||
- `00_base`: Removed dead `notify ipcd` / `notify ptyd` calls.
|
||||
The `notify` binary does not exist anywhere in the build tree — these calls always failed
|
||||
silently. ipcd and ptyd are started by the base recipe's systemd-style services
|
||||
(`00_ipcd.service`, `00_ptyd.service`). sudo --daemon is kept because `00_sudo.service`
|
||||
exists in the base recipe but is not wired into any target that gets scheduled.
|
||||
The script now does tmpdir setup + sudo --daemon.
|
||||
- `00_drivers`: Blanked (was redundant — pcid-spawner starts via `00_pcid-spawner.service`).
|
||||
|
||||
**6C: Service readiness completion logging**
|
||||
`init/src/service.rs`: Added success log after each blocking wait completes:
|
||||
- `notify` services: `init: <cmd> ready (notify)` after readiness byte received
|
||||
- `scheme` services: `init: <cmd> ready (scheme <name>)` after scheme registered
|
||||
- `oneshot` services: `init: <cmd> done (oneshot)` after process exits successfully
|
||||
Combined with 6A's `init: starting ...` before spawn, the boot log now shows the full
|
||||
lifecycle of every blocking service — any gap between "starting" and "ready/done" pinpoints
|
||||
the blocker.
|
||||
|
||||
### Serde `deny_unknown_fields` Behavior
|
||||
`UnitInfo` and `Service` structs use `#[serde(deny_unknown_fields)]`. Any unrecognized field in `[unit]` or `[service]` sections causes the ENTIRE service file to fail deserialization. The init system logs the error and skips the service — it never starts.
|
||||
|
||||
@@ -192,10 +268,18 @@ Services with `type = "oneshot_async"` are fire-and-forget by default. Init spaw
|
||||
|
||||
### Config Include Chain
|
||||
```
|
||||
redbear-minimal.toml → minimal.toml, redbear-legacy-base.toml, redbear-device-services.toml, redbear-netctl.toml
|
||||
redbear-full.toml → desktop.toml, redbear-desktop.toml, redbear-greeter-services.toml, ...
|
||||
redbear-live-full.toml → redbear-live.toml
|
||||
redbear-live.toml → redbear-full.toml
|
||||
redbear-full.toml → desktop.toml, redbear-legacy-base.toml, redbear-legacy-desktop.toml,
|
||||
redbear-device-services.toml, redbear-netctl.toml, redbear-greeter-services.toml
|
||||
desktop.toml → desktop-minimal.toml, server.toml
|
||||
desktop-minimal.toml → minimal.toml
|
||||
server.toml → minimal.toml
|
||||
minimal.toml → base.toml
|
||||
|
||||
redbear-live-mini.toml → minimal.toml, redbear-legacy-base.toml, redbear-netctl.toml
|
||||
redbear-live.toml → redbear-full.toml, ...
|
||||
redbear-mini → redbear-minimal.toml → minimal.toml, redbear-legacy-base.toml,
|
||||
redbear-device-services.toml, redbear-netctl.toml
|
||||
```
|
||||
|
||||
### Upstream Targets (not Red Bear defined)
|
||||
@@ -266,3 +350,117 @@ redbear-live.toml → redbear-full.toml, ...
|
||||
- `local/scripts/validate-service-files.sh` — manual service schema validation (redbear-*.toml only)
|
||||
- `local/docs/BOOT-PROCESS-ASSESSMENT.md` — this document
|
||||
- `recipes/core/base/source/init.initfs.d/41_acpid.service` — acpid in initfs (boot race fix)
|
||||
|
||||
## Boot Procedure
|
||||
|
||||
### Supported compile targets
|
||||
|
||||
| Target | Purpose | Output |
|
||||
|--------|---------|--------|
|
||||
| `redbear-mini` | Minimal non-desktop (QEMU + bare metal) | `build/x86_64/harddrive.img` |
|
||||
| `redbear-live-mini` | Minimal live ISO (bare metal only) | `build/x86_64/redbear-live-mini.iso` |
|
||||
| `redbear-full` | Desktop/graphics (QEMU + bare metal) | `build/x86_64/harddrive.img` |
|
||||
| `redbear-live-full` / `redbear-live` | Desktop/graphics live ISO (bare metal only) | `build/x86_64/redbear-live-full.iso` |
|
||||
|
||||
### Build commands
|
||||
|
||||
```bash
|
||||
# Minimal target (QEMU testing)
|
||||
CI=1 make all CONFIG_NAME=redbear-mini ARCH=x86_64
|
||||
|
||||
# Minimal live ISO (bare-metal boot)
|
||||
CI=1 make live CONFIG_NAME=redbear-live-mini ARCH=x86_64
|
||||
|
||||
# Desktop/graphics target (QEMU testing)
|
||||
CI=1 make all CONFIG_NAME=redbear-full ARCH=x86_64
|
||||
|
||||
# Desktop/graphics live ISO (bare-metal boot)
|
||||
CI=1 make live CONFIG_NAME=redbear-live-full ARCH=x86_64
|
||||
```
|
||||
|
||||
### QEMU boot (harddrive.img)
|
||||
|
||||
```bash
|
||||
# Boot the minimal target in QEMU
|
||||
make qemu CONFIG_NAME=redbear-mini
|
||||
|
||||
# Boot with more RAM
|
||||
make qemu CONFIG_NAME=redbear-mini QEMUFLAGS="-m 4G"
|
||||
|
||||
# Boot desktop target
|
||||
make qemu CONFIG_NAME=redbear-full
|
||||
```
|
||||
|
||||
QEMU boots from `harddrive.img` (not the live ISO). The `-serial mon:stdio` flag provides
|
||||
the serial console, but Red Bear uses the framebuffer console for login — type at the
|
||||
graphical console, not serial.
|
||||
|
||||
### Bare-metal boot (live ISO)
|
||||
|
||||
1. **Build the ISO:**
|
||||
```bash
|
||||
CI=1 make live CONFIG_NAME=redbear-live-mini ARCH=x86_64
|
||||
```
|
||||
|
||||
2. **Write ISO to USB drive:**
|
||||
```bash
|
||||
sudo dd if=build/x86_64/redbear-live-mini.iso of=/dev/sdX bs=4M status=progress && sync
|
||||
```
|
||||
Replace `/dev/sdX` with your USB device. Use `lsblk` to identify it.
|
||||
|
||||
3. **Boot from USB:**
|
||||
- Insert USB into target machine
|
||||
- Power on, enter UEFI boot menu (typically F12, F8, or Esc)
|
||||
- Select the USB device as boot target
|
||||
- Red Bear OS boots from UEFI → bootloader → kernel → init → login prompt
|
||||
|
||||
4. **Login:**
|
||||
- Default user: `root`, no password
|
||||
- The framebuffer console displays the login prompt after boot completes
|
||||
|
||||
### What happens during boot
|
||||
|
||||
```
|
||||
UEFI firmware
|
||||
→ Red Bear bootloader (loaded from EFI system partition)
|
||||
→ Kernel (kstart → start → kmain)
|
||||
→ userspace_init → bootstrap (forks initfs/procmgr/initnsmgr)
|
||||
→ Initfs phase:
|
||||
logd, inputd, vesad (framebuffer), fbcond, fbbootlogd,
|
||||
ps2d (keyboard), acpid, pcid-spawner-initfs (initfs PCI drivers), lived, redoxfs
|
||||
→ switchroot /usr
|
||||
→ Rootfs phase:
|
||||
00_base (tmpdir + sudo --daemon)
|
||||
00_ipcd.service, 00_ptyd.service
|
||||
00_pcid-spawner.service (async — spawns PCI drivers in background)
|
||||
30_console (getty with respawn)
|
||||
→ Login prompt on framebuffer console
|
||||
```
|
||||
|
||||
### Boot log markers
|
||||
|
||||
The init system logs the following always-visible markers. If boot hangs, the last visible
|
||||
marker identifies the blocker:
|
||||
|
||||
```
|
||||
init: phase 1 — initfs boot
|
||||
init: starting <description> (<cmd>) # before each service spawn
|
||||
init: <cmd> ready (notify) # notify-type service ready
|
||||
init: <cmd> ready (scheme <name>) # scheme-type service ready
|
||||
init: <cmd> done (oneshot) # oneshot service exited
|
||||
init: phase 2 — switchroot to /usr
|
||||
init: scheduling N rootfs units
|
||||
init: reached target <description>
|
||||
init: phase 3 — rootfs services started
|
||||
init: boot complete — entering waitpid loop
|
||||
```
|
||||
|
||||
### Troubleshooting
|
||||
|
||||
| Symptom | Likely cause | Fix |
|
||||
|---------|-------------|-----|
|
||||
| No display output | UEFI framebuffer not provided | Try different USB port or disable CSM in UEFI settings |
|
||||
| Boot hangs after "scheduling N rootfs units" | A blocking service hangs | Check last "starting" line; `pcid-spawner` was previously the blocker |
|
||||
| Keyboard not working | PS/2 unavailable, USB not ready | Modern hardware uses USB — ensure xHCI controller is functional |
|
||||
| No login prompt | Getty not starting | Check `30_console` service in config; verify getty respawn is set |
|
||||
| "missing field `unit`" parse error | Invalid service TOML | Run `./local/scripts/validate-service-files.sh config/` |
|
||||
|
||||
Reference in New Issue
Block a user