RedBear-OS/local/docs/BUILD-SYSTEM-HARDENING-PLAN.md
vasilito f31522130f fix: comprehensive boot warnings and exceptions — fixable silenced, unfixable diagnosed
Build system (5 gaps hardened):
- COOKBOOK_OFFLINE defaults to true (fork-mode)
- normalize_patch handles diff -ruN format
- New 'repo validate-patches' command (25/25 relibc patches)
- 14 patched Qt/Wayland/display recipes added to protected list
- relibc archive regenerated with current patch chain

Boot fixes (fixable):
- Full ISO EFI partition: 16 MiB → 1 MiB (matches mini, BIOS hardcoded 2 MiB offset)
- D-Bus system bus: absolute /usr/bin/dbus-daemon path (was skipped)
- redbear-sessiond: absolute /usr/bin/redbear-sessiond path (was skipped)
- daemon framework: silenced spurious INIT_NOTIFY warnings for oneshot_async services (P0-daemon-silence-init-notify.patch)
- udev-shim: demoted INIT_NOTIFY warning to INFO (expected for oneshot_async)
- relibc: comprehensive named semaphores (sem_open/close/unlink) replacing upstream todo!() stubs
- greeterd: Wayland socket timeout 15s → 30s (compositor DRM wait)
- greeter-ui: built and linked (header guard unification, sem_compat stubs removed)
- mc: un-ignored in both configs, fixed glib/libiconv/pcre2 transitive deps
- greeter config: removed stale keymapd dependency from display/greeter services
- prefix toolchain: relibc headers synced, _RELIBC_STDLIB_H guard unified

Unfixable (diagnosed, upstream):
- i2c-hidd: abort on no-I2C-hardware (QEMU) — process::exit → relibc abort
- kded6/greeter-ui: page fault 0x8 — Qt library null deref
- Thread panics fd != -1 — Rust std library on Redox
- DHCP timeout / eth0 MAC — QEMU user-mode networking
- hwrngd/thermald — no hardware RNG/thermal in VM
- live preload allocation — BIOS memory fragmentation, continues on demand
2026-05-05 20:20:37 +01:00

# Build System Hardening Plan
**Date:** 2026-05-03
**Status:** Implemented
**Scope:** Installer file-layer collision detection, config-layer path enforcement,
recipe file-ownership tracking, validation gates, and architectural documentation.
**Triggering incident:** 40 init service files in `config/redbear-*.toml` used
`/usr/lib/init.d/` paths. The `base` package installs to the same directory.
Package staging silently overwrote config overrides. The init scheduler blocked
on `scheme`-type services that were supposed to be overridden to `oneshot_async`,
preventing D-Bus and 20+ services from ever starting.
**Fix applied:** Changed all config `[[files]]` init service paths from
`/usr/lib/init.d/` to `/etc/init.d/`. The init system's `config_for_dirs()`
BTreeMap gives `/etc/init.d/` priority over `/usr/lib/init.d/` for the same
filename, so config overrides now survive package installation and take effect
at runtime.
**Goal:** Prevent this class of silent file collision from recurring by adding
build-time detection, installer awareness, and architectural documentation.
---
## Phase 1: Config-Layer Path Enforcement (1–2 days)
**Objective:** Ensure config `[[files]]` entries for init services always use
`/etc/init.d/` paths. Detect violations at build time.
### 1.1 Add a build-time lint for init service path violations
Create `scripts/lint-config-paths.sh` that:
- Parses all `config/redbear-*.toml` files
- Finds `[[files]]` entries with `path = "/usr/lib/init.d/..."`
- Reports violations with file, line number, and path
- Returns non-zero if any violations found
- Can be integrated into the build as a pre-build step
**Why a script, not Rust:** Config parsing is already TOML-based and a shell
script with `grep`/`awk` is sufficient for this lint. Adding it to the cookbook
Rust tool would require rebuilding the tool for lint-only changes. A script is
cheaper to iterate on and can run without a Rust toolchain rebuild.
**Acceptance:**
```bash
scripts/lint-config-paths.sh # exits 0 when clean, 1 + report when violations found
```
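The matching rule itself is small. The plan calls for a shell script, so the following Rust function is only an illustration of the rule, not the lint's implementation. It scans TOML text line by line, the way a `grep`/`awk` pass would, and the names `Violation` and `lint_init_paths` are hypothetical.

```rust
/// One lint finding: 1-based line number and the offending path.
#[derive(Debug, PartialEq)]
pub struct Violation {
    pub line: usize,
    pub path: String,
}

/// Reports `path = "/usr/lib/init.d/..."` keys that appear inside `[[files]]` tables.
/// Limitation of the line-based approach: a multi-line string that contains a line
/// starting with `[` ends the current table as far as this scanner is concerned.
pub fn lint_init_paths(toml_text: &str) -> Vec<Violation> {
    let mut in_files = false;
    let mut found = Vec::new();
    for (idx, raw) in toml_text.lines().enumerate() {
        let line = raw.trim();
        if line.starts_with('[') {
            // Any table header closes the previous table; only [[files]] matters here.
            in_files = line == "[[files]]";
            continue;
        }
        if !in_files {
            continue;
        }
        if let Some((key, value)) = line.split_once('=') {
            if key.trim() == "path" {
                let path = value.trim().trim_matches('"');
                if path.starts_with("/usr/lib/init.d/") {
                    found.push(Violation { line: idx + 1, path: path.to_string() });
                }
            }
        }
    }
    found
}
```

A caller would exit non-zero whenever the returned vector is non-empty, which matches the acceptance criterion above.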
### 1.2 Document the init service layer convention
Add to AGENTS.md (project root) a clear rule:
> **Init service file ownership:**
> - Packages own `/usr/lib/init.d/` — the default service files installed by recipe staging
> - Config overrides own `/etc/init.d/` — override files created by `[[files]]` entries
> - The init system's `config_for_dirs()` gives `/etc/init.d/` priority via BTreeMap dedup
> - Config `[[files]]` entries MUST NOT use `/usr/lib/init.d/` paths for init services
### 1.3 Add Makefile integration
In `mk/config.mk` or `mk/depends.mk`, add a pre-build lint step:
```makefile
# Lint config files for init service path violations
lint-config:
	@scripts/lint-config-paths.sh

# Hook into the build before repo cook
repo: lint-config
```
---
## Phase 2: Installer Collision Detection (2–3 days)
**Objective:** The installer detects when a config `[[files]]` entry would be
silently overwritten by package staging, and warns or errors accordingly.
### 2.1 Track file provenance during `install_dir()`
Modify `install_dir()` in `installer.rs` to track which layer created each file:
```rust
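use std::collections::BTreeMap;
use std::path::PathBuf;

/// Records which install layer last wrote each destination path.
struct InstallTracker {
    /// Map from destination path to the layer that created it
    files: BTreeMap<PathBuf, FileProvenance>,
}

/// The installer layer that created a file.
enum FileProvenance {
    ConfigPreInstall,  // Created by [[files]] with postinstall=false
    Package,           // Created by install_packages()
    ConfigPostInstall, // Created by [[files]] with postinstall=true
}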
```
Implementation points:
- Before `file.create(&output_dir)`, record the path and layer
- Before `install_packages()`, snapshot existing files
- After `install_packages()`, diff to find new/overwritten files
- After postinstall `[[files]]`, record new files
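Whether package writes are recorded directly or derived from a before/after snapshot diff, the bookkeeping comes down to noticing when a path that already has a provenance is written again. The sketch below shows that bookkeeping in isolation. The `record`, `provenance`, and `overwrites` methods and the `Overwrite` struct are assumptions for illustration; the actual patch may be organised differently.

```rust
use std::collections::BTreeMap;
use std::path::{Path, PathBuf};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum FileProvenance {
    ConfigPreInstall,
    Package,
    ConfigPostInstall,
}

/// A path that was written by one layer and later written again by another.
#[derive(Debug, PartialEq, Eq)]
pub struct Overwrite {
    pub path: PathBuf,
    pub previous: FileProvenance,
    pub current: FileProvenance,
}

#[derive(Default)]
pub struct InstallTracker {
    files: BTreeMap<PathBuf, FileProvenance>,
    overwrites: Vec<Overwrite>,
}

impl InstallTracker {
    /// Records that `layer` wrote `path`. If an earlier layer had already written
    /// the same path, the event is kept so it can be reported later.
    pub fn record(&mut self, path: impl Into<PathBuf>, layer: FileProvenance) {
        let path = path.into();
        if let Some(previous) = self.files.insert(path.clone(), layer) {
            self.overwrites.push(Overwrite { path, previous, current: layer });
        }
    }

    /// Returns the layer that most recently wrote `path`, if any.
    pub fn provenance(&self, path: &Path) -> Option<FileProvenance> {
        self.files.get(path).copied()
    }

    /// All overwrite events seen so far, in the order they happened.
    pub fn overwrites(&self) -> &[Overwrite] {
        &self.overwrites
    }
}
```

With the snapshot approach, the installer would call `record` once per file found in the diff after `install_packages()`, tagging each as `Package`.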
### 2.2 Detect and report collisions
During the diff after `install_packages()`:
1. If a file existed from `ConfigPreInstall` and was overwritten by `Package`:
   - **WARN** (default): Print a warning showing the collision
   - **ERROR** (strict mode via `STRICT_COLLISION=1` env): Fail the build
2. For init service files specifically (`/usr/lib/init.d/*.service`,
   `/etc/init.d/*.service`):
   - Always **ERROR**: Init service collisions are never acceptable because they
     silently break the boot sequence
3. For other file types:
   - **WARN** by default: Some collisions may be intentional (e.g., default
     configs that packages override with versioned copies)
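The three rules above collapse into one small decision function. This is a minimal sketch of that policy; `Severity`, `is_init_service`, `collision_severity`, and `strict_from_env` are hypothetical names, and treating only the exact value `1` as strict mode is an assumption based on the `STRICT_COLLISION=1` wording.

```rust
#[derive(Debug, PartialEq, Eq)]
pub enum Severity {
    Warn,
    Error,
}

/// True for `*.service` files directly under either init directory.
pub fn is_init_service(path: &str) -> bool {
    (path.starts_with("/usr/lib/init.d/") || path.starts_with("/etc/init.d/"))
        && path.ends_with(".service")
}

/// Init service collisions always fail the build; everything else
/// warns unless strict mode is enabled.
pub fn collision_severity(path: &str, strict: bool) -> Severity {
    if is_init_service(path) || strict {
        Severity::Error
    } else {
        Severity::Warn
    }
}

/// Reads strict mode from the environment (assumption: only "1" enables it).
pub fn strict_from_env() -> bool {
    std::env::var("STRICT_COLLISION").map(|v| v == "1").unwrap_or(false)
}
```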
### 2.3 Collision report format
```
[COLLISION] /usr/lib/init.d/10_evdevd.service
Created by: config redbear-mini.toml (pre-install)
Overwritten by: package base
Impact: init service override lost
Fix: Change config [[files]] path from /usr/lib/init.d/ to /etc/init.d/
```
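One way to keep this format in a single place is a `Display` implementation on a report struct. The sketch below is illustrative only: `CollisionReport` and its field names are hypothetical, and the line layout simply mirrors the sample above.

```rust
use std::fmt;

/// The fields of one collision report entry.
pub struct CollisionReport<'a> {
    pub path: &'a str,
    pub created_by: &'a str,
    pub overwritten_by: &'a str,
    pub impact: &'a str,
    pub fix: &'a str,
}

impl fmt::Display for CollisionReport<'_> {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        // One labelled line per field, in the same order as the sample report.
        writeln!(f, "[COLLISION] {}", self.path)?;
        writeln!(f, "Created by: {}", self.created_by)?;
        writeln!(f, "Overwritten by: {}", self.overwritten_by)?;
        writeln!(f, "Impact: {}", self.impact)?;
        write!(f, "Fix: {}", self.fix)
    }
}
```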
### 2.4 Implementation location
Patch against `recipes/core/installer/source/src/installer.rs`:
- New module `src/tracker.rs` with `InstallTracker`
- Modify `install_dir()` to use tracker
- Patch stored in `local/patches/installer/`
**Acceptance:**
- Build with a known collision (revert the /etc/init.d/ fix temporarily) should
produce clear error output
- Build with current configs should produce zero collisions
---
## Phase 3: Recipe File-Ownership Manifests (3–5 days)
**Objective:** Recipes declare what paths they install, enabling build-time
conflict detection between packages and between packages and config layers.
### 3.1 Add optional `installs` field to recipe.toml
```toml
[package]
# Optional: declare what paths this recipe installs into the image
# Used for collision detection and build validation
installs = [
"/usr/lib/init.d/10_evdevd.service",
"/usr/lib/init.d/11_udev.service",
"/usr/bin/evdevd",
"/usr/lib/libevdev.so",
]
```
This is **optional** — existing recipes without `installs` work as before.
New recipes and frequently-updated recipes should declare their installs.
### 3.2 Build-time ownership registry
The `repo cook` command builds an in-memory registry:
```
path → recipe_name
```
When multiple recipes claim the same path:
- **WARN** for non-critical paths (shared headers, etc.)
- **ERROR** for init service paths (`.service` files in `init.d/`)
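A minimal sketch of such a registry follows, assuming each recipe's `installs` list is already parsed into string slices. `OwnershipRegistry`, `Conflict`, and `claim` are hypothetical names; the real cookbook types are not shown in this plan.

```rust
use std::collections::BTreeMap;

/// Two recipes declaring the same installed path.
#[derive(Debug, PartialEq, Eq)]
pub struct Conflict {
    pub path: String,
    pub first: String,
    pub second: String,
    /// True for `.service` files under an `init.d/` directory (build error),
    /// false for everything else (warning).
    pub is_error: bool,
}

#[derive(Default)]
pub struct OwnershipRegistry {
    owners: BTreeMap<String, String>, // path -> recipe name
}

impl OwnershipRegistry {
    /// Registers the paths a recipe declares and returns any conflicts with
    /// earlier recipes. The first claimant keeps ownership of a contested path.
    pub fn claim(&mut self, recipe: &str, installs: &[&str]) -> Vec<Conflict> {
        let mut conflicts = Vec::new();
        for &path in installs {
            let existing = self.owners.get(path).cloned();
            match existing {
                Some(owner) if owner != recipe => conflicts.push(Conflict {
                    path: path.to_string(),
                    first: owner,
                    second: recipe.to_string(),
                    is_error: path.contains("/init.d/") && path.ends_with(".service"),
                }),
                Some(_) => {} // the same recipe listed the path twice
                None => {
                    self.owners.insert(path.to_string(), recipe.to_string());
                }
            }
        }
        conflicts
    }
}
```

Recipes without an `installs` field simply never call `claim`, which keeps the field optional as described in 3.1.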
### 3.3 Auto-generation tool
Create `scripts/generate-installs-manifest.sh`:
- Inspects recipe stage directory after build
- Lists all installed files relative to sysroot root
- Outputs suggested `installs = [...]` for recipe.toml
- Can be run as `make manifest.<recipe>`
### 3.4 Implementation location
Patch against `src/cook/package.rs` and recipe parsing in `src/`:
- Parse `installs` field from `[package]` section
- Build registry during `repo cook --with-package-deps`
- Check for conflicts before staging
---
## Phase 4: Post-Image Validation Gates (2–3 days)
**Objective:** After the image is created, validate that init service files
match expectations and no config overrides were silently lost.
### 4.1 Init service validation script
Create `scripts/validate-init-services.sh`:
```bash
# Mount image, inspect init.d directories, validate:
# 1. Every /etc/init.d/*.service file has different content from /usr/lib/init.d/ counterpart
# (if they exist in both — if identical, the override is redundant)
# 2. No /usr/lib/init.d/*.service file was supposed to be overridden but wasn't
# 3. All scheme-type services have corresponding scheme daemons in the image
# 4. Service dependency graph has no missing dependencies
# 5. Service dependency graph has no cycles
```
Validation checks:
1. **Override verification**: For each file in `/etc/init.d/`, verify it differs
from the corresponding `/usr/lib/init.d/` file (if any). If identical, warn
about redundant override.
2. **Missing override detection**: For each config `[[files]]` entry targeting
`/etc/init.d/`, verify the file actually exists in the mounted image and
matches the config content.
3. **Scheme service audit**: List all services with `type = { scheme = "..." }`.
For each, verify the scheme binary exists in `/usr/bin/`. Warn about scheme
services that may block the scheduler if the daemon isn't guaranteed to start.
4. **Dependency cycle check**: Parse all service files, build a dependency graph,
detect cycles.
5. **Missing dependency check**: For each `requires`/`requires_weak` entry,
verify the referenced target/service file exists.
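Checks 4 and 5 are plain graph algorithms once the service files have been parsed. The plan puts them in a shell script; the Rust sketch below only illustrates the logic, assuming the `requires`/`requires_weak` entries have already been collected into a map from service name to dependency names. `Graph`, `missing_dependencies`, and `find_cycle` are hypothetical names.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Service name -> names it depends on.
pub type Graph = BTreeMap<String, Vec<String>>;

/// Returns every (service, dependency) pair whose dependency has no entry in the graph.
pub fn missing_dependencies(graph: &Graph) -> BTreeSet<(String, String)> {
    let mut missing = BTreeSet::new();
    for (service, deps) in graph {
        for dep in deps {
            if !graph.contains_key(dep) {
                missing.insert((service.clone(), dep.clone()));
            }
        }
    }
    missing
}

enum Mark {
    InProgress,
    Done,
}

/// Depth-first search; returns one cycle as a path that starts and ends
/// on the same service, or None if the graph is acyclic.
pub fn find_cycle(graph: &Graph) -> Option<Vec<String>> {
    let mut marks: BTreeMap<&str, Mark> = BTreeMap::new();
    let mut stack: Vec<&str> = Vec::new();
    for start in graph.keys() {
        if let Some(cycle) = visit(graph, start, &mut marks, &mut stack) {
            return Some(cycle);
        }
    }
    None
}

fn visit<'a>(
    graph: &'a Graph,
    node: &'a str,
    marks: &mut BTreeMap<&'a str, Mark>,
    stack: &mut Vec<&'a str>,
) -> Option<Vec<String>> {
    match marks.get(node) {
        Some(Mark::Done) => return None,
        Some(Mark::InProgress) => {
            // The node is already on the current DFS path: that path segment is a cycle.
            let pos = stack.iter().position(|n| *n == node).unwrap_or(0);
            let mut cycle: Vec<String> = stack[pos..].iter().map(|n| n.to_string()).collect();
            cycle.push(node.to_string());
            return Some(cycle);
        }
        None => {}
    }
    marks.insert(node, Mark::InProgress);
    stack.push(node);
    if let Some(deps) = graph.get(node) {
        for dep in deps {
            if let Some(cycle) = visit(graph, dep, marks, stack) {
                return Some(cycle);
            }
        }
    }
    stack.pop();
    marks.insert(node, Mark::Done);
    None
}
```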
### 4.2 Makefile integration
Add to `mk/disk.mk`:
```makefile
# Validate init services in the built image
validate-init: $(BUILD)/harddrive.img
	@scripts/validate-init-services.sh $(BUILD)/harddrive.img

# Full validation gate
validate: validate-init
	@echo "Build validation passed"
```
### 4.3 CI integration
No `.gitlab-ci.yml` exists in the repository yet. When CI is added, include:
```yaml
validate:
  stage: validate
  script:
    - make validate CONFIG_NAME=redbear-full
    - make validate CONFIG_NAME=redbear-mini
The `make validate` target is intended to run `lint-config`, `validate-init`, and
`validate-file-ownership` in sequence; the `mk/disk.mk` fragment in 4.2 shows only the
`validate-init` prerequisite. It requires a built image (`harddrive.img`) to exist.
---
## Phase 5: Architectural Documentation (1 day)
**Objective:** Document the file ownership hierarchy, installer ordering, and
init system override mechanism so future contributors understand the constraints.
### 5.1 Update AGENTS.md (project root)
Add a section "Installer File Layering" covering:
1. **Layer ordering during `install_dir()`:**
   ```
   Layer 1: Config pre-install [[files]] (postinstall = false)
   Layer 2: Package staging (install_packages())
   Layer 3: Config post-install [[files]] (postinstall = true)
   Layer 4: User/group creation (passwd, shadow, group)
   ```
2. **Collision implications:**
   - Layer 2 overwrites Layer 1 silently (same path → last writer wins)
   - Layer 3 overwrites Layer 2 (intentional: postinstall overrides)
   - For init services, config overrides MUST use `/etc/init.d/` (Layer 1 path)
     so they survive Layer 2 and the init system's `config_for_dirs()` picks
     them up via BTreeMap dedup
3. **Init system override mechanism:**
   - `config_for_dirs(["/usr/lib/init.d", "/etc/init.d"])` → BTreeMap
   - Same filename: `/etc/init.d/` entry overwrites `/usr/lib/init.d/` entry
   - This is the intended override path: packages own `/usr/lib/init.d/`,
     configs own `/etc/init.d/`
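The dedup behaviour described in item 3 can be sketched as follows. This is not the init system's actual `config_for_dirs()`, whose signature is not shown in this plan; it is a minimal in-memory model, with the hypothetical names `InitDir` and `resolve_service_files`, of how inserting by filename into a `BTreeMap` lets a later directory win.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for one init directory: its path and the file names in it.
pub struct InitDir<'a> {
    pub path: &'a str,
    pub files: &'a [&'a str],
}

/// Maps each service file name to the full path that takes effect.
/// Directories are processed in order, and `BTreeMap::insert` replaces any
/// earlier entry with the same key, so a later directory overrides an earlier one.
pub fn resolve_service_files(dirs: &[InitDir<'_>]) -> BTreeMap<String, String> {
    let mut resolved = BTreeMap::new();
    for dir in dirs {
        for name in dir.files {
            resolved.insert(name.to_string(), format!("{}/{}", dir.path, name));
        }
    }
    resolved
}
```

Passing `/usr/lib/init.d` first and `/etc/init.d` second gives the priority the plan relies on: a file present in both directories resolves to the `/etc/init.d/` copy, and files present only in `/usr/lib/init.d/` are unaffected.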
### 5.2 Update BUILD-SYSTEM-INVARIANTS.md
Add new invariants:
> **Invariant I1: Init Service Path Separation**
>
> Config `[[files]]` entries that create or override init service files MUST use
> `/etc/init.d/` paths. Package-owned service files go in `/usr/lib/init.d/`.
> The installer does not detect file collisions between layers.
> **Invariant I2: Config Override Survival**
>
> Any file created by config `[[files]]` that must survive package installation
> MUST use a path that packages do not install to. The init system's
> `config_for_dirs()` mechanism provides this for init services via the
> `/etc/init.d/` override directory.
> **Invariant I3: Post-Install is the Override Layer**
>
> `[[files]]` entries with `postinstall = true` run AFTER package installation
> and are guaranteed to overwrite any package-provided file. Use this for files
> that must always reflect the config's content regardless of package content.
> Prefer `/etc/` directory overrides over postinstall for init services, because
> postinstall requires all overrides to be explicitly marked and is easy to miss.
### 5.3 Update local/AGENTS.md
Add a "Build System Safety" section referencing this plan and the invariants.
---
## Implementation Order
| Phase | Duration | Dependencies | Risk | Value |
|-------|----------|-------------|------|-------|
| Phase 1 | 1–2 days | None | Low | Prevents recurrence immediately |
| Phase 5 | 1 day | None | Low | Knowledge preservation |
| Phase 2 | 2–3 days | Phase 1 | Medium | Catches future collisions |
| Phase 4 | 2–3 days | Phase 1 | Medium | Validates built images |
| Phase 3 | 3–5 days | Phase 2 | Higher | Full ownership tracking |
**Recommended execution order:** Phase 1 → Phase 5 → Phase 2 → Phase 4 → Phase 3
Phases 1 and 5 are documentation and linting: low risk, immediate value.
Phase 2 is the core installer improvement. Phase 4 adds validation on top.
Phase 3 is the most ambitious and can be deferred.
---
## Quick Wins (Do First)
These can be done immediately without touching installer or cookbook code:
1. **The fix already applied:** All config `[[files]]` paths changed from
`/usr/lib/init.d/` to `/etc/init.d/` — verified working (40 services,
D-Bus operational).
2. **Add lint script** (Phase 1.1): ~30 minutes of work.
3. **Update AGENTS.md** (Phase 5.1): ~1 hour of documentation.
4. **Update BUILD-SYSTEM-INVARIANTS.md** (Phase 5.2): ~30 minutes.
---
## File Change Summary
| File | Change | Phase |
|------|--------|-------|
| `scripts/lint-config-paths.sh` | New — lint for /usr/lib/init.d/ in config files | 1 |
| `mk/depends.mk` | Add lint-config target | 1 |
| `AGENTS.md` | Add installer file layering section | 5 |
| `local/docs/BUILD-SYSTEM-INVARIANTS.md` | Add invariants I1–I3 | 5 |
| `local/patches/installer/collision-detection.patch` | New — installer collision detection | 2 |
| `recipes/core/installer/recipe.toml` | Wire collision detection patch | 2 |
| `scripts/validate-init-services.sh` | New — post-image init validation | 4 |
| `mk/disk.mk` | Add validate-init target | 4 |
| `src/cook/package.rs` | Parse installs field from recipe.toml | 3 |
| `src/recipe.rs` (or equivalent) | Add installs field to recipe struct | 3 |
---
## Scope Boundaries
**In scope:**
- Init service file path enforcement and collision detection
- Installer file-layer collision detection
- Post-image validation for init services
- Recipe file-ownership manifests (optional field)
- Architectural documentation
**Out of scope:**
- Init system redesign (scheduler, service types, dependency resolution)
- Package manager changes (pkgar format, dependency resolution)
- Build system Makefile restructuring
- Runtime validation of service startup order
- General file-conflict detection across all filesystem paths
(init service paths are the critical path; general detection is Phase 3)
---
## Relationship to Existing Plans
- **BUILD-SYSTEM-INVARIANTS.md**: This plan adds invariants I1–I3 to the existing
surface-ownership model. Phases 1–4 implement enforcement of these new invariants.
- **PATCH-GOVERNANCE.md**: Unchanged. Patch governance covers source-tree durability;
this plan covers installer file-layer collisions — orthogonal concerns.
- **CONSOLE-TO-KDE-DESKTOP-PLAN.md**: This plan is infrastructure, not a desktop
feature. It prevents build-system regressions that could block the desktop path.
- **DBUS-INTEGRATION-PLAN.md**: The triggering incident was a D-Bus regression caused
by init service file collisions. This plan prevents recurrence of the root cause.
---
## Phase 6: Patch Integrity and Source Protection (2026-05)
**Triggering incident:** The relibc patch chain (mega-patch at `absorbed/redox.patch`)
was created by diffing a manually-edited source tree, resulting in 3x code duplication,
syntax errors, and stale context lines. When patches failed, the temptation was to
create stubs instead of rebasing, causing cascading downstream failures.
**Gaps identified and fixed:**
### Gap 1: COOKBOOK_OFFLINE defaults to false
Red Bear OS is a fork with frozen sources. Defaulting `COOKBOOK_OFFLINE` to `false`
allowed the build system to contact upstream repositories for non-protected recipes,
potentially clobbering patched sources.
**Fix:** Changed the default from `false` to `true` in `src/config.rs:111`. Protected
recipes were already forced offline; this change makes every recipe default to
offline. Set `COOKBOOK_OFFLINE=false` explicitly to opt in to online fetching.
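The resulting decision can be sketched as below. This is not the code in `src/config.rs`; the function names are hypothetical, and accepting `0` as a synonym for `false` is an assumption. Unset and unrecognised values both fall back to offline, which is the fork-safe default.

```rust
use std::env;

/// Interprets a raw COOKBOOK_OFFLINE value. Only an explicit "false"
/// (or "0", an assumption) opts in to online fetching.
pub fn offline_from_value(raw: Option<&str>) -> bool {
    match raw.map(|v| v.trim().to_ascii_lowercase()) {
        Some(v) if v == "false" || v == "0" => false,
        _ => true,
    }
}

/// Protected recipes stay offline regardless of the environment setting.
pub fn recipe_is_offline(protected: bool, raw: Option<&str>) -> bool {
    protected || offline_from_value(raw)
}

/// Convenience wrapper that reads the process environment.
pub fn cookbook_offline_from_env() -> bool {
    offline_from_value(env::var("COOKBOOK_OFFLINE").ok().as_deref())
}
```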
### Gap 2: normalize_patch only handled diff --git
Patches in `diff -ruN` format (produced by `diff -ruN old/ new/`) were not normalized,
leaving format-specific headers that `patch` cannot handle. This caused opaque
"malformed patch" errors during atomic application.
**Fix:** Added `diff -ruN` and `diff -r` header stripping to `normalize_patch()`
in `src/cook/fetch.rs`. The function now strips equivalent headers from both
`diff --git` and `diff -ruN` formats.
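As an illustration of the header stripping, the sketch below drops the per-file command line that each format emits and keeps everything else. It is not the body of `normalize_patch()`, which may strip more than this; `strip_diff_headers` is a hypothetical name.

```rust
/// Removes `diff --git`, `diff -ruN`, and `diff -r` command lines from a patch.
/// Hunk content lines always begin with ' ', '+', '-', or '\', so a line that
/// starts with "diff " at column 0 cannot be part of a hunk body.
pub fn strip_diff_headers(patch: &str) -> String {
    let mut out = String::with_capacity(patch.len());
    for line in patch.lines() {
        let is_header = line.starts_with("diff --git ")
            || line.starts_with("diff -ruN ")
            || line.starts_with("diff -r ");
        if !is_header {
            out.push_str(line);
            out.push('\n');
        }
    }
    out
}
```

The `---`, `+++`, and `@@` lines pass through unchanged, so the result is still a plain unified diff.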
### Gap 3: No patch validation before building
Patches were only tested during full `repo cook` builds. A stale patch could fail
after minutes-to-hours of compilation of unrelated packages, with no quick way to
validate the patch chain against clean upstream source.
**Fix:** Added `repo validate-patches <recipe>` command. It:
1. Restores clean upstream source from release archives
2. Creates a temporary staging copy (same filesystem, `cp -al` hard links)
3. Resets to pristine upstream state (`git clean -ffdx && git reset --hard`)
4. Applies each patch in order with `--fuzz=0`
5. Reports `[PASS]` or `[FAIL]` for each patch
6. Cleans up the staging directory without touching the live source tree
Usage:
```bash
./target/release/repo validate-patches relibc
./target/release/repo validate-patches base
```
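Steps 4 and 5 can be sketched as follows. This is a hedged illustration rather than the command's implementation: `patch_applies` and `validate_chain` are hypothetical names, the `-p1` strip level is an assumption, and continuing past a failed patch instead of stopping at it is also an assumption.

```rust
use std::path::Path;
use std::process::Command;

/// Applies one patch inside the staging copy with GNU patch and no fuzz.
/// `patch_file` should be an absolute path, because the command runs with
/// `staging` as its working directory.
pub fn patch_applies(staging: &Path, patch_file: &Path) -> std::io::Result<bool> {
    let status = Command::new("patch")
        .arg("-p1") // assumption: patches are relative to the source root
        .arg("--fuzz=0")
        .arg("-i")
        .arg(patch_file)
        .current_dir(staging)
        .status()?;
    Ok(status.success())
}

/// Runs `apply` on each patch in order and returns one report line per patch.
/// `apply` is injected so the reporting logic can be exercised without a source tree.
pub fn validate_chain<F>(patches: &[&str], mut apply: F) -> Vec<String>
where
    F: FnMut(&str) -> bool,
{
    patches
        .iter()
        .map(|&name| {
            let tag = if apply(name) { "PASS" } else { "FAIL" };
            format!("[{}] {}", tag, name)
        })
        .collect()
}
```

In real use the closure passed to `validate_chain` would call `patch_applies` against the staging directory and treat an I/O error as a failure.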
### Gap 4: Qt and patched packages not in protected list
Recipes carrying Red Bear patches (qtbase, qtwayland, mesa, libdrm, etc.) were
not in the `redbear_protected_recipe()` list. On non-offline builds, these could
be re-fetched from upstream, potentially introducing mismatched source versions.
**Fix:** Added 14 recipes to the protected list, including `qtbase`, `qtwayland`,
`qtdeclarative`, `qtbase-compat`, `libdrm`, `mesa`, `libwayland`, `libevdev`, `libinput`,
`dbus`, and `glib`. The existing protected recipes were preserved.
### Gap 5: Stale pre-patched archives
The relibc archive at `sources/redbear-0.1.0/tarballs/core-relibc-v861bbb0-patched.tar.gz`
was built with an older patch chain. When the archive was restored and patches were
re-applied, the build system correctly detected staleness and reset the source, but
the archive itself wasted disk space and slightly increased build time.
**Fix:** Regenerated the archive from the current patched source (minus `target/`
build artifacts). Updated `BLAKE3SUMS` with the new checksum.
### Acceptance
- [x] `repo validate-patches relibc` passes all 25 patches
- [x] `make all CONFIG_NAME=redbear-full` completes successfully
- [x] QEMU boots to login prompt with virtio-gpu (1280×800) and vesad console (1280×720)
- [x] All protected recipes use only archived sources
- [x] `diff -ruN` patches apply correctly after normalization