RedBear-OS/local/docs/BUILD-SYSTEM-HARDENING-PLAN.md
vasilito f31522130f fix: comprehensive boot warnings and exceptions — fixable silenced, unfixable diagnosed
Build system (5 gaps hardened):
- COOKBOOK_OFFLINE defaults to true (fork-mode)
- normalize_patch handles diff -ruN format
- New 'repo validate-patches' command (25/25 relibc patches)
- 14 patched Qt/Wayland/display recipes added to protected list
- relibc archive regenerated with current patch chain

Boot fixes (fixable):
- Full ISO EFI partition: 16 MiB → 1 MiB (matches mini, BIOS hardcoded 2 MiB offset)
- D-Bus system bus: absolute /usr/bin/dbus-daemon path (was skipped)
- redbear-sessiond: absolute /usr/bin/redbear-sessiond path (was skipped)
- daemon framework: silenced spurious INIT_NOTIFY warnings for oneshot_async services (P0-daemon-silence-init-notify.patch)
- udev-shim: demoted INIT_NOTIFY warning to INFO (expected for oneshot_async)
- relibc: comprehensive named semaphores (sem_open/close/unlink) replacing upstream todo!() stubs
- greeterd: Wayland socket timeout 15s → 30s (compositor DRM wait)
- greeter-ui: built and linked (header guard unification, sem_compat stubs removed)
- mc: un-ignored in both configs, fixed glib/libiconv/pcre2 transitive deps
- greeter config: removed stale keymapd dependency from display/greeter services
- prefix toolchain: relibc headers synced, _RELIBC_STDLIB_H guard unified

Unfixable (diagnosed, upstream):
- i2c-hidd: abort on no-I2C-hardware (QEMU) — process::exit → relibc abort
- kded6/greeter-ui: page fault 0x8 — Qt library null deref
- Thread panics fd != -1 — Rust std library on Redox
- DHCP timeout / eth0 MAC — QEMU user-mode networking
- hwrngd/thermald — no hardware RNG/thermal in VM
- live preload allocation — BIOS memory fragmentation, continues on demand
2026-05-05 20:20:37 +01:00


Build System Hardening Plan

Date: 2026-05-03
Status: Implemented
Scope: Installer file-layer collision detection, config-layer path enforcement, recipe file-ownership tracking, validation gates, and architectural documentation.

Triggering incident: 40 init service files in config/redbear-*.toml used /usr/lib/init.d/ paths. The base package installs to the same directory. Package staging silently overwrote config overrides. The init scheduler blocked on scheme-type services that were supposed to be overridden to oneshot_async, preventing D-Bus and 20+ services from ever starting.

Fix applied: Changed all config [[files]] init service paths from /usr/lib/init.d/ to /etc/init.d/. The init system's config_for_dirs() BTreeMap gives /etc/init.d/ priority over /usr/lib/init.d/ for the same filename, so config overrides now survive package installation and take effect at runtime.
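The override rule can be modelled in a few lines. This is a minimal sketch, assuming config_for_dirs() visits the directories in the order given and inserts each service file into a BTreeMap keyed by filename, so an entry from a later directory replaces one from an earlier directory. merge_service_dirs is a hypothetical name, and the real function reads the directories from disk rather than taking filename lists.

```rust
use std::collections::BTreeMap;

/// Hypothetical model of config_for_dirs(): `dirs` is in priority order
/// (lowest first), each paired with the service filenames it contains.
/// Returns filename -> directory whose copy wins.
fn merge_service_dirs<'a>(dirs: &[(&'a str, &[&'a str])]) -> BTreeMap<&'a str, &'a str> {
    let mut services = BTreeMap::new();
    for (dir, files) in dirs {
        for file in files.iter() {
            // insert() replaces any entry from an earlier directory, which is
            // what gives /etc/init.d/ priority when it is listed last.
            services.insert(*file, *dir);
        }
    }
    services
}
```

With /usr/lib/init.d listed first and /etc/init.d second, a filename present in both resolves to the /etc/init.d copy, while package-only files still resolve to /usr/lib/init.d.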

Goal: Prevent this class of silent file collision from recurring by adding build-time detection, installer awareness, and architectural documentation.


Phase 1: Config-Layer Path Enforcement (1-2 days)

Objective: Ensure config [[files]] entries for init services always use /etc/init.d/ paths. Detect violations at build time.

1.1 Add a build-time lint for init service path violations

Create scripts/lint-config-paths.sh that:

  • Parses all config/redbear-*.toml files
  • Finds [[files]] entries with path = "/usr/lib/init.d/..."
  • Reports violations with file, line number, and path
  • Returns non-zero if any violations found
  • Can be integrated into the build as a pre-build step

Why a script, not Rust: Config parsing is already TOML-based and a shell script with grep/awk is sufficient for this lint. Adding it to the cookbook Rust tool would require rebuilding the tool for lint-only changes. A script is cheaper to iterate on and can run without a Rust toolchain rebuild.

Acceptance:

scripts/lint-config-paths.sh  # exits 0 when clean, 1 + report when violations found
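For reference, the matching rule the script needs is small enough to state as code. The sketch below is in Rust only for consistency with the other examples in this plan (the plan itself calls for a shell script, and this could serve as a reference when testing it); find_violations and Violation are hypothetical names, and the sketch assumes each path key sits on its own line, as in the existing configs.

```rust
/// One config line that places a file under /usr/lib/init.d/.
#[derive(Debug, PartialEq)]
struct Violation {
    line: usize, // 1-based line number in the config file
    path: String,
}

/// Hypothetical reference for the lint rule: report `path = "/usr/lib/init.d/..."`
/// keys inside [[files]] tables. Multi-line `"""` strings (service file bodies)
/// are skipped so their own section headers are not mistaken for TOML tables;
/// `'''` literal strings are not handled.
fn find_violations(config: &str) -> Vec<Violation> {
    let mut in_files_table = false;
    let mut in_multiline = false;
    let mut found = Vec::new();
    for (idx, raw) in config.lines().enumerate() {
        let delimiters = raw.matches("\"\"\"").count();
        if in_multiline {
            // An odd number of """ on this line closes the string.
            if delimiters % 2 == 1 {
                in_multiline = false;
            }
            continue;
        }
        if delimiters % 2 == 1 {
            in_multiline = true; // e.g. data = """
        }
        let line = raw.trim();
        if line.starts_with('[') {
            // Any table header ends the previous table.
            in_files_table = line == "[[files]]";
            continue;
        }
        if !in_files_table {
            continue;
        }
        // Match `path = "<value>"`, tolerating spaces around '='.
        let Some(rest) = line.strip_prefix("path") else { continue };
        let Some(value) = rest.trim_start().strip_prefix('=') else { continue };
        let Some(quoted) = value.trim_start().strip_prefix('"') else { continue };
        let Some(end) = quoted.find('"') else { continue };
        let path = &quoted[..end];
        if path.starts_with("/usr/lib/init.d/") {
            found.push(Violation { line: idx + 1, path: path.to_string() });
        }
    }
    found
}
```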

1.2 Document the init service layer convention

Add to AGENTS.md (project root) a clear rule:

Init service file ownership:

  • Packages own /usr/lib/init.d/ — the default service files installed by recipe staging
  • Config overrides own /etc/init.d/ — override files created by [[files]] entries
  • The init system's config_for_dirs() gives /etc/init.d/ priority via BTreeMap dedup
  • Config [[files]] entries MUST NOT use /usr/lib/init.d/ paths for init services

1.3 Add Makefile integration

In mk/config.mk or mk/depends.mk, add a pre-build lint step:

# Lint config files for init service path violations
.PHONY: lint-config
lint-config:
	@scripts/lint-config-paths.sh

# Hook into the build before repo cook
repo: lint-config

Phase 2: Installer Collision Detection (2-3 days)

Objective: The installer detects when a config [[files]] entry would be silently overwritten by package staging, and warns or errors accordingly.

2.1 Track file provenance during install_dir()

Modify install_dir() in installer.rs to track which layer created each file:

use std::collections::BTreeMap;
use std::path::PathBuf;

struct InstallTracker {
    /// Map from destination path to the layer that created it
    files: BTreeMap<PathBuf, FileProvenance>,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum FileProvenance {
    ConfigPreInstall,   // Created by [[files]] with postinstall = false
    Package,            // Created by install_packages()
    ConfigPostInstall,  // Created by [[files]] with postinstall = true
}

Implementation points:

  • Before file.create(&output_dir), record the path and layer
  • Before install_packages(), snapshot existing files
  • After install_packages(), diff to find new/overwritten files
  • After postinstall [[files]], record new files
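One way to implement the snapshot-and-diff step, assuming the installer can walk the output directory before and after install_packages(). The plan does not say how an overwrite is recognised, so this sketch assumes a content fingerprint per file (std's DefaultHasher over the bytes); snapshot, overwritten, and added are hypothetical helpers, not existing installer functions.

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::BTreeMap;
use std::fs;
use std::hash::{Hash, Hasher};
use std::io;
use std::path::{Path, PathBuf};

/// Path (relative to the image root) -> fingerprint of the file contents.
type Snapshot = BTreeMap<PathBuf, u64>;

/// Recursively fingerprint every regular file under `root`.
fn snapshot(root: &Path) -> io::Result<Snapshot> {
    let mut out = Snapshot::new();
    let mut pending = vec![root.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let path = entry.path();
            let file_type = entry.file_type()?;
            if file_type.is_dir() {
                pending.push(path);
            } else if file_type.is_file() {
                let mut hasher = DefaultHasher::new();
                fs::read(&path)?.hash(&mut hasher);
                let rel = path.strip_prefix(root).expect("path is under root").to_path_buf();
                out.insert(rel, hasher.finish());
            }
        }
    }
    Ok(out)
}

/// Files that existed before package staging and whose contents changed.
fn overwritten(before: &Snapshot, after: &Snapshot) -> Vec<PathBuf> {
    before
        .iter()
        .filter(|(path, old)| after.get(*path).map_or(false, |new| new != *old))
        .map(|(path, _)| path.clone())
        .collect()
}

/// Files that package staging created.
fn added(before: &Snapshot, after: &Snapshot) -> Vec<PathBuf> {
    after.keys().filter(|p| !before.contains_key(*p)).cloned().collect()
}
```

A package that writes byte-identical content is not reported, which matches the goal: nothing from the config layer was lost.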

2.2 Detect and report collisions

During the diff after install_packages():

  1. If a file existed from ConfigPreInstall and was overwritten by Package:

    • WARN (default): Print a warning showing the collision
    • ERROR (strict mode via STRICT_COLLISION=1 env): Fail the build
  2. For init service files specifically (/usr/lib/init.d/*.service, /etc/init.d/*.service):

    • Always ERROR: Init service collisions are never acceptable because they silently break the boot sequence
  3. For other file types:

    • WARN by default: Some collisions may be intentional (e.g., default configs that packages override with versioned copies)
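The severity rules above reduce to a small decision function. In this sketch (hypothetical names throughout), classify() is called only for paths already known to be a pre-install file overwritten by a package, and strict mode is read from STRICT_COLLISION, treating only the value 1 as enabled.

```rust
/// Outcome for one detected collision.
#[derive(Debug, PartialEq, Eq)]
enum Severity {
    Warn,
    Error,
}

/// Is this path an init service file in either init.d directory?
fn is_init_service(path: &str) -> bool {
    (path.starts_with("/usr/lib/init.d/") || path.starts_with("/etc/init.d/"))
        && path.ends_with(".service")
}

/// Init service collisions always fail; other collisions warn unless
/// strict mode is enabled.
fn classify(path: &str, strict: bool) -> Severity {
    if is_init_service(path) || strict {
        Severity::Error
    } else {
        Severity::Warn
    }
}

/// Strict mode from the environment: only STRICT_COLLISION=1 enables it.
fn strict_from_env() -> bool {
    std::env::var("STRICT_COLLISION").map_or(false, |v| v == "1")
}
```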

2.3 Collision report format

[COLLISION] /usr/lib/init.d/10_evdevd.service
  Created by: config redbear-mini.toml (pre-install)
  Overwritten by: package base
  Impact: init service override lost
  Fix: Change config [[files]] path from /usr/lib/init.d/ to /etc/init.d/
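A sketch of a formatter producing the report shape shown above. The Collision struct and render() are hypothetical; the Impact and Fix lines are emitted only for /usr/lib/init.d/ service files, since that is the only case the plan spells out.

```rust
use std::fmt::Write;

/// Fields shown in one collision report entry.
struct Collision<'a> {
    path: &'a str,
    config: &'a str,  // config file that created the path
    package: &'a str, // package whose staging overwrote it
}

/// Render one collision in the report format.
fn render(c: &Collision) -> String {
    let mut out = String::new();
    // writeln! into a String cannot fail, so unwrap() is safe here.
    writeln!(out, "[COLLISION] {}", c.path).unwrap();
    writeln!(out, "  Created by: config {} (pre-install)", c.config).unwrap();
    writeln!(out, "  Overwritten by: package {}", c.package).unwrap();
    if c.path.starts_with("/usr/lib/init.d/") && c.path.ends_with(".service") {
        writeln!(out, "  Impact: init service override lost").unwrap();
        writeln!(out, "  Fix: Change config [[files]] path from /usr/lib/init.d/ to /etc/init.d/").unwrap();
    }
    out
}
```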

2.4 Implementation location

Patch against recipes/core/installer/source/src/installer.rs:

  • New module src/tracker.rs with InstallTracker
  • Modify install_dir() to use tracker
  • Patch stored in local/patches/installer/

Acceptance:

  • A build with a known collision (temporarily revert the /etc/init.d/ fix) should produce clear error output
  • A build with the current configs should produce zero collisions

Phase 3: Recipe File-Ownership Manifests (3-5 days)

Objective: Recipes declare what paths they install, enabling build-time conflict detection between packages and between packages and config layers.

3.1 Add optional installs field to recipe.toml

[package]
# Optional: declare what paths this recipe installs into the image
# Used for collision detection and build validation
installs = [
    "/usr/lib/init.d/10_evdevd.service",
    "/usr/lib/init.d/11_udev.service",
    "/usr/bin/evdevd",
    "/usr/lib/libevdev.so",
]

This is optional — existing recipes without installs work as before. New recipes and frequently-updated recipes should declare their installs.

3.2 Build-time ownership registry

The repo cook command builds an in-memory registry:

path → recipe_name

When multiple recipes claim the same path:

  • WARN for non-critical paths (shared headers, etc.)
  • ERROR for init service paths (.service files in init.d/)
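The registry and the two severity levels might look like this. It is a sketch under the assumption that each recipe's installs list has already been parsed into strings; build_registry, conflicts, and Conflict are hypothetical names, and a path counts as an init service when it contains /init.d/ and ends in .service.

```rust
use std::collections::BTreeMap;

/// Build path -> recipes claiming it, in recipe order.
fn build_registry<'a>(recipes: &[(&'a str, Vec<&'a str>)]) -> BTreeMap<&'a str, Vec<&'a str>> {
    let mut registry: BTreeMap<&str, Vec<&str>> = BTreeMap::new();
    for (recipe, installs) in recipes {
        for path in installs {
            registry.entry(*path).or_default().push(*recipe);
        }
    }
    registry
}

#[derive(Debug, PartialEq)]
enum Conflict<'a> {
    Warn(&'a str, Vec<&'a str>),
    Error(&'a str, Vec<&'a str>),
}

/// Paths claimed by more than one recipe; init.d .service files are errors.
fn conflicts<'a>(registry: &BTreeMap<&'a str, Vec<&'a str>>) -> Vec<Conflict<'a>> {
    registry
        .iter()
        .filter(|(_, owners)| owners.len() > 1)
        .map(|(path, owners)| {
            if path.contains("/init.d/") && path.ends_with(".service") {
                Conflict::Error(*path, owners.clone())
            } else {
                Conflict::Warn(*path, owners.clone())
            }
        })
        .collect()
}
```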

3.3 Auto-generation tool

Create scripts/generate-installs-manifest.sh:

  • Inspects recipe stage directory after build
  • Lists all installed files relative to sysroot root
  • Outputs suggested installs = [...] for recipe.toml
  • Can be run as make manifest.<recipe>
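The core of the manifest generator is a directory walk plus formatting. It is shown in Rust for consistency with the other examples, although the plan calls for a shell script; the sketch assumes the stage directory mirrors the image root (stage/usr/bin/evdevd becomes /usr/bin/evdevd), and staged_paths and render_installs are hypothetical names.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Collect every staged entry that is not a directory as an absolute
/// image path, sorted.
fn staged_paths(stage: &Path) -> io::Result<Vec<String>> {
    let mut paths = Vec::new();
    let mut pending = vec![stage.to_path_buf()];
    while let Some(dir) = pending.pop() {
        for entry in fs::read_dir(&dir)? {
            let entry = entry?;
            let path = entry.path();
            if entry.file_type()?.is_dir() {
                pending.push(path);
            } else {
                let rel = path.strip_prefix(stage).expect("entry is under stage");
                paths.push(format!("/{}", rel.display()));
            }
        }
    }
    paths.sort();
    Ok(paths)
}

/// Render the suggested `installs = [...]` snippet for recipe.toml.
fn render_installs(paths: &[String]) -> String {
    let mut out = String::from("installs = [\n");
    for p in paths {
        out.push_str(&format!("    \"{}\",\n", p));
    }
    out.push(']');
    out
}
```

Symlinks (for example versioned .so links) are listed as files because DirEntry::file_type() does not follow them, which is what a manifest of installed paths needs.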

3.4 Implementation location

Patch against src/cook/package.rs and recipe parsing in src/:

  • Parse installs field from [package] section
  • Build registry during repo cook --with-package-deps
  • Check for conflicts before staging

Phase 4: Post-Image Validation Gates (2-3 days)

Objective: After the image is created, validate that init service files match expectations and no config overrides were silently lost.

4.1 Init service validation script

Create scripts/validate-init-services.sh:

# Mount image, inspect init.d directories, validate:
# 1. Every /etc/init.d/*.service file differs from its /usr/lib/init.d/ counterpart
#    (when both exist; identical files mean the override is redundant)
# 2. No /usr/lib/init.d/*.service file was supposed to be overridden but wasn't
# 3. All scheme-type services have corresponding scheme daemons in the image
# 4. Service dependency graph has no missing dependencies
# 5. Service dependency graph has no cycles

Validation checks:

  1. Override verification: For each file in /etc/init.d/, verify it differs from the corresponding /usr/lib/init.d/ file (if any). If identical, warn about redundant override.

  2. Missing override detection: For each config [[files]] entry targeting /etc/init.d/, verify the file actually exists in the mounted image and matches the config content.

  3. Scheme service audit: List all services with type = { scheme = "..." }. For each, verify the scheme binary exists in /usr/bin/. Warn about scheme services that may block the scheduler if the daemon isn't guaranteed to start.

  4. Dependency cycle check: Parse all service files, build a dependency graph, detect cycles.

  5. Missing dependency check: For each requires/requires_weak entry, verify the referenced target/service file exists.
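Checks 4 and 5 are ordinary graph algorithms once the service files are parsed. A minimal sketch, assuming parsing has already produced a map from service name to the names in its requires and requires_weak lists; Graph, missing_deps, and find_cycle are hypothetical, and the cycle search is a standard depth-first search that reports the first cycle it meets.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Service name -> names listed in its requires/requires_weak entries.
type Graph<'a> = BTreeMap<&'a str, Vec<&'a str>>;

/// Check 5: (service, dependency) pairs whose dependency has no service file.
fn missing_deps<'a>(graph: &Graph<'a>) -> Vec<(&'a str, &'a str)> {
    let mut missing = Vec::new();
    for (service, deps) in graph {
        for dep in deps {
            if !graph.contains_key(dep) {
                missing.push((*service, *dep));
            }
        }
    }
    missing
}

/// Check 4: return one dependency cycle, if any, as a closed path.
fn find_cycle<'a>(graph: &Graph<'a>) -> Option<Vec<&'a str>> {
    fn visit<'a>(
        node: &'a str,
        graph: &Graph<'a>,
        done: &mut BTreeSet<&'a str>,
        stack: &mut Vec<&'a str>,
    ) -> Option<Vec<&'a str>> {
        if done.contains(node) {
            return None; // fully explored earlier, no cycle through it
        }
        if let Some(pos) = stack.iter().position(|n| *n == node) {
            // node is already on the current path: stack[pos..] is a cycle.
            let mut cycle = stack[pos..].to_vec();
            cycle.push(node);
            return Some(cycle);
        }
        stack.push(node);
        for dep in graph.get(node).into_iter().flatten() {
            if let Some(cycle) = visit(*dep, graph, done, stack) {
                return Some(cycle);
            }
        }
        stack.pop();
        done.insert(node);
        None
    }

    let mut done = BTreeSet::new();
    for node in graph.keys() {
        let mut stack = Vec::new();
        if let Some(cycle) = visit(*node, graph, &mut done, &mut stack) {
            return Some(cycle);
        }
    }
    None
}
```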

4.2 Makefile integration

Add to mk/disk.mk:

# Validate init services in the built image
.PHONY: validate-init validate
validate-init: $(BUILD)/harddrive.img
	@scripts/validate-init-services.sh $(BUILD)/harddrive.img

# Full validation gate
validate: lint-config validate-init
	@echo "Build validation passed"

4.3 CI integration

No .gitlab-ci.yml exists in the repository yet. When CI is added, include:

validate:
  stage: validate
  script:
    - make validate CONFIG_NAME=redbear-full
    - make validate CONFIG_NAME=redbear-mini

The make validate target is intended to run lint-config, validate-init, and (once Phase 3 defines it) validate-file-ownership. It requires a built image (harddrive.img) to exist.


Phase 5: Architectural Documentation (1 day)

Objective: Document the file ownership hierarchy, installer ordering, and init system override mechanism so future contributors understand the constraints.

5.1 Update AGENTS.md (project root)

Add a section "Installer File Layering" covering:

  1. Layer ordering during install_dir():

    Layer 1: Config pre-install [[files]]    (postinstall = false)
    Layer 2: Package staging                  (install_packages())
    Layer 3: Config post-install [[files]]    (postinstall = true)
    Layer 4: User/group creation              (passwd, shadow, group)
    
  2. Collision implications:

    • Layer 2 overwrites Layer 1 silently (same path → last writer wins)
    • Layer 3 overwrites Layer 2 (intentional — postinstall overrides)
    • For init services, config overrides MUST use /etc/init.d/, a directory packages do not install into, so the Layer 1 files survive Layer 2 and the init system's config_for_dirs() picks them up via BTreeMap dedup
  3. Init system override mechanism:

    • config_for_dirs(["/usr/lib/init.d", "/etc/init.d"]) → BTreeMap
    • Same filename: /etc/init.d/ entry overwrites /usr/lib/init.d/ entry
    • This is the intended override path: packages own /usr/lib/init.d/, configs own /etc/init.d/
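The collision implications follow from one rule: last writer wins. This toy model replays install layers in order into a single map; apply_layers is a hypothetical name, and the real installer writes files to disk rather than into a map.

```rust
use std::collections::BTreeMap;

/// Apply install layers in order; a later write to the same path replaces
/// the earlier one ("same path -> last writer wins").
fn apply_layers<'a>(layers: &[&[(&'a str, &'a str)]]) -> BTreeMap<&'a str, &'a str> {
    let mut image = BTreeMap::new();
    for layer in layers {
        for (path, content) in layer.iter() {
            image.insert(*path, *content);
        }
    }
    image
}
```

Feeding it a Layer 1 that writes /usr/lib/init.d/10_dbus.service and a Layer 2 that ships the same path leaves the package version in the image, while a Layer 1 file under /etc/init.d/ survives untouched.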

5.2 Update BUILD-SYSTEM-INVARIANTS.md

Add new invariants:

Invariant I1: Init Service Path Separation

Config [[files]] entries that create or override init service files MUST use /etc/init.d/ paths. Package-owned service files go in /usr/lib/init.d/. The installer does not detect file collisions between layers.

Invariant I2: Config Override Survival

Any file created by config [[files]] that must survive package installation MUST use a path that packages do not install to. The init system's config_for_dirs() mechanism provides this for init services via the /etc/init.d/ override directory.

Invariant I3: Post-Install is the Override Layer

[[files]] entries with postinstall = true run AFTER package installation and are guaranteed to overwrite any package-provided file. Use this for files that must always reflect the config's content regardless of package content. Prefer /etc/ directory overrides over postinstall for init services, because postinstall requires all overrides to be explicitly marked and is easy to miss.

5.3 Update local/AGENTS.md

Add a "Build System Safety" section referencing this plan and the invariants.


Implementation Order

| Phase | Duration | Dependencies | Risk | Value |
|---|---|---|---|---|
| Phase 1 | 1-2 days | None | Low | Prevents recurrence immediately |
| Phase 5 | 1 day | None | Low | Knowledge preservation |
| Phase 2 | 2-3 days | Phase 1 | Medium | Catches future collisions |
| Phase 4 | 2-3 days | Phase 1 | Medium | Validates built images |
| Phase 3 | 3-5 days | Phase 2 | Higher | Full ownership tracking |

Recommended execution order: Phase 1 → Phase 5 → Phase 2 → Phase 4 → Phase 3

Phases 1 and 5 are documentation and linting: low risk, immediate value. Phase 2 is the core installer improvement. Phase 4 adds validation on top. Phase 3 is the most ambitious and can be deferred.


Quick Wins (Do First)

These need no changes to the cookbook or installer code and can be done immediately (item 1 is already done):

  1. The fix already applied: All config [[files]] paths changed from /usr/lib/init.d/ to /etc/init.d/ — verified working (40 services, D-Bus operational).

  2. Add lint script (Phase 1.1): ~30 minutes of work.

  3. Update AGENTS.md (Phase 5.1): ~1 hour of documentation.

  4. Update BUILD-SYSTEM-INVARIANTS.md (Phase 5.2): ~30 minutes.


File Change Summary

| File | Change | Phase |
|---|---|---|
| scripts/lint-config-paths.sh | New: lint for /usr/lib/init.d/ in config files | 1 |
| mk/depends.mk | Add lint-config target | 1 |
| AGENTS.md | Add installer file layering section | 5 |
| local/docs/BUILD-SYSTEM-INVARIANTS.md | Add invariants I1-I3 | 5 |
| local/patches/installer/collision-detection.patch | New: installer collision detection | 2 |
| recipes/core/installer/recipe.toml | Wire collision detection patch | 2 |
| scripts/validate-init-services.sh | New: post-image init validation | 4 |
| mk/disk.mk | Add validate-init target | 4 |
| src/cook/package.rs | Parse installs field from recipe.toml | 3 |
| src/recipe.rs (or equivalent) | Add installs field to recipe struct | 3 |

Scope Boundaries

In scope:

  • Init service file path enforcement and collision detection
  • Installer file-layer collision detection
  • Post-image validation for init services
  • Recipe file-ownership manifests (optional field)
  • Architectural documentation

Out of scope:

  • Init system redesign (scheduler, service types, dependency resolution)
  • Package manager changes (pkgar format, dependency resolution)
  • Build system Makefile restructuring
  • Runtime validation of service startup order
  • General file-conflict detection across all filesystem paths (init service paths are the critical path; general detection is Phase 3)

Relationship to Existing Plans

  • BUILD-SYSTEM-INVARIANTS.md: This plan adds invariants I1-I3 to the existing surface-ownership model. Phases 1-4 implement enforcement of these new invariants.
  • PATCH-GOVERNANCE.md: Unchanged. Patch governance covers source-tree durability; this plan covers installer file-layer collisions — orthogonal concerns.
  • CONSOLE-TO-KDE-DESKTOP-PLAN.md: This plan is infrastructure, not a desktop feature. It prevents build-system regressions that could block the desktop path.
  • DBUS-INTEGRATION-PLAN.md: The triggering incident was a D-Bus regression caused by init service file collisions. This plan prevents recurrence of the root cause.

Phase 6: Patch Integrity and Source Protection (2026-05)

Triggering incident: The relibc patch chain (mega-patch at absorbed/redox.patch) was created by diffing a manually-edited source tree, resulting in 3x code duplication, syntax errors, and stale context lines. When patches failed, the temptation was to create stubs instead of rebasing, causing cascading downstream failures.

Gaps identified and fixed:

Gap 1: COOKBOOK_OFFLINE defaults to false

Red Bear OS is a fork with frozen sources. Defaulting COOKBOOK_OFFLINE to false allowed the build system to contact upstream repositories for non-protected recipes, potentially clobbering patched sources.

Fix: Changed the default from false to true in src/config.rs:111. Protected recipes were already forced offline; this change makes ALL recipes default to offline. Set COOKBOOK_OFFLINE=false explicitly to opt in to online fetching.
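The resulting decision can be summarised as a pure function. This is a hypothetical model, not the code at src/config.rs:111: it assumes protected recipes ignore the variable entirely and that only the value false (case-insensitive) opts in to online fetching.

```rust
/// `env_value` is the raw COOKBOOK_OFFLINE value, if the variable is set.
fn effective_offline(is_protected: bool, env_value: Option<&str>) -> bool {
    if is_protected {
        return true; // protected recipes are always offline
    }
    match env_value {
        // Assumed spelling: only an explicit "false" enables online fetching.
        Some(v) => !v.eq_ignore_ascii_case("false"),
        None => true, // new default: offline
    }
}

/// Read the variable from the process environment.
fn offline_from_env(is_protected: bool) -> bool {
    let value = std::env::var("COOKBOOK_OFFLINE").ok();
    effective_offline(is_protected, value.as_deref())
}
```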

Gap 2: normalize_patch only handled diff --git

Patches in diff -ruN format (produced by diff -ruN old/ new/) were not normalized, leaving format-specific headers that patch cannot handle. This caused opaque "malformed patch" errors during atomic application.

Fix: Added diff -ruN and diff -r header stripping to normalize_patch() in src/cook/fetch.rs. The function now strips equivalent headers from both diff --git and diff -ruN formats.
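A minimal sketch of the header stripping, assuming it only needs to drop the per-file command and index lines that precede the ---/+++ pair. strip_diff_headers is hypothetical and not the actual normalize_patch(), which may rewrite other lines too.

```rust
/// Drop per-file header lines emitted by git and GNU diff before the
/// ---/+++ pair. Hunk lines always start with ' ', '+', '-' or '\',
/// so a line starting with "diff " or "index " can only be a header.
fn strip_diff_headers(patch: &str) -> String {
    let mut out = String::with_capacity(patch.len());
    // split_inclusive keeps each line's original ending (LF or CRLF).
    for line in patch.split_inclusive('\n') {
        let is_header = line.starts_with("diff --git ")
            || line.starts_with("diff -ruN ")
            || line.starts_with("diff -r ")
            || line.starts_with("index ");
        if !is_header {
            out.push_str(line);
        }
    }
    out
}
```

Using split_inclusive rather than lines() matters here: lines() would silently drop the \r from CRLF content inside hunks and change what the patch applies.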

Gap 3: No patch validation before building

Patches were only tested during full repo cook builds. A stale patch could fail only after minutes to hours spent compiling unrelated packages, and there was no quick way to validate the patch chain against clean upstream source.

Fix: Added repo validate-patches <recipe> command. It:

  1. Restores clean upstream source from release archives
  2. Creates a temporary staging copy (same filesystem, cp -al hard links)
  3. Resets to pristine upstream state (git clean -ffdx && git reset --hard)
  4. Applies each patch in order with --fuzz=0
  5. Reports [PASS] or [FAIL] for each patch
  6. Cleans up the staging directory without touching the live source tree
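Steps 4 and 5 amount to applying the chain in order and recording a verdict per patch. In this sketch, apply_with_patch shells out to GNU patch with --fuzz=0 (plus -p1 and --batch, which are assumptions about how the patches are rooted and run non-interactively), and validate_chain stops at the first failure because later patches in a chain usually build on earlier ones; the plan does not say whether the real command stops. Both names are hypothetical.

```rust
use std::path::{Path, PathBuf};
use std::process::Command;

/// Apply one patch in `dir`, refusing fuzzy matches. Pass an absolute
/// `patch` path, since the working directory is changed to `dir`.
fn apply_with_patch(dir: &Path, patch: &Path) -> Result<(), String> {
    let status = Command::new("patch")
        .current_dir(dir)
        .args(["-p1", "--fuzz=0", "--batch", "-i"])
        .arg(patch)
        .status()
        .map_err(|e| format!("could not run patch: {e}"))?;
    if status.success() {
        Ok(())
    } else {
        Err(format!("patch exited with {status}"))
    }
}

/// Apply patches in order, collecting one [PASS]/[FAIL] line per patch.
/// Returns (all passed, report).
fn validate_chain<F>(patches: &[PathBuf], mut apply: F) -> (bool, Vec<String>)
where
    F: FnMut(&Path) -> Result<(), String>,
{
    let mut report = Vec::new();
    for patch in patches {
        let name = patch.display();
        match apply(patch.as_path()) {
            Ok(()) => report.push(format!("[PASS] {name}")),
            Err(why) => {
                report.push(format!("[FAIL] {name}: {why}"));
                return (false, report);
            }
        }
    }
    (true, report)
}
```

In real use the closure would be something like |p| apply_with_patch(&staging_dir, p), with staging_dir being the temporary copy from step 2; taking the applier as a closure keeps the reporting logic testable without a patch binary.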

Usage:

./target/release/repo validate-patches relibc
./target/release/repo validate-patches base

Gap 4: Qt and patched packages not in protected list

Recipes carrying Red Bear patches (qtbase, qtwayland, mesa, libdrm, etc.) were not in the redbear_protected_recipe() list. On non-offline builds, these could be re-fetched from upstream, potentially introducing mismatched source versions.

Fix: Added 14 recipes to the protected list, including qtbase, qtwayland, qtdeclarative, qtbase-compat, libdrm, mesa, libwayland, libevdev, libinput, dbus, and glib; the existing protected recipes were preserved.

Gap 5: Stale pre-patched archives

The relibc archive at sources/redbear-0.1.0/tarballs/core-relibc-v861bbb0-patched.tar.gz was built with an older patch chain. When the archive was restored and patches were re-applied, the build system correctly detected staleness and reset the source, but the archive itself wasted disk space and slightly increased build time.

Fix: Regenerated the archive from the current patched source (minus target/ build artifacts). Updated BLAKE3SUMS with the new checksum.

Acceptance

  • repo validate-patches relibc passes all 25 patches
  • make all CONFIG_NAME=redbear-full completes successfully
  • QEMU boots to login prompt with virtio-gpu (1280×800) and vesad console (1280×720)
  • All protected recipes use only archived sources
  • diff -ruN patches apply correctly after normalization