Comprehensive 6-tier plan to address the 1.5h full-rebuild pathology when making small config changes. Covers content-hash output fingerprinting, per-crate granularity, public API surface tracking, restat / equivalence caching, and developer-experience tools. Synthesizes techniques from Nix, Buildroot, Yocto, GN/Ninja, Cargo, and Bazel adapted to Red Bear OS's Rust cookbook. Triggered by: 2-line edit to local/sources/base/Cargo.toml caused 1.5h full rebuild of redbear-mini. Root cause: cookbook tracks at recipe granularity (one stage.pkgar for 45-member Cargo workspace) instead of crate granularity.
RED BEAR OS — BUILD SYSTEM ROBUSTNESS PLAN
Generated: 2026-06-08
Trigger: A 2-line config change (local/sources/base/Cargo.toml — added [patch.crates-io]
entry, changed one path dep to absolute) caused a full 1.5-hour rebuild of the entire OS image.
That is not normal. The build system must be made of independent packages with surgical
rebuild semantics, and the cookbook must distinguish "source changed" from "no actual output change".
THE CORE PROBLEM
Red Bear OS's cookbook treats a Cargo workspace as a single recipe:
- base is one recipe (recipes/core/base/recipe.toml) but contains a 45-member Cargo workspace (local/sources/base/Cargo.toml with members = ["audiod", ..., "drivers/pcid", ..., "drivers/graphics/driver-graphics", ...]).
- A 1-line change to local/sources/base/Cargo.toml invalidates the recipe (modified_dir_ignore_git walks the entire source tree).
- Cargo recompiles all 45 workspace members because the workspace config changed.
- The recipe then stages all 45 binaries into one stage.pkgar.
- Every package that lists base in its [build] dependencies sees a newer .pkgar mtime and rebuilds.
- Result: a 2-line change rebuilds the entire OS.
This violates the "red bear custom work survives changes" principle. We need surgical rebuild semantics: a change to a single driver should rebuild only that driver, not 45 others, and the only downstream rebuilds should be packages that actually consume the changed driver's public output.
WHAT MATURE SYSTEMS DO
Synthesis of Nix, Buildroot, Yocto, Chromium GN/Ninja, Cargo, and Bazel:
| System | Granularity | Cache key | Cascade behavior |
|---|---|---|---|
| Nix | Per-derivation | Hash of all inputs (content-addressed) | Only downstream whose input hash changed rebuilds. Quotient hashing avoids mass rebuilds when fixed inputs change |
| Buildroot | Per-package (stamp file) | Stamp mtime | Manual — user must know when to cascade |
| Yocto | Per-task (siginfo) | Hash of all recipe variables | sstate cache; equivalence server avoids redundant rebuilds |
| GN/Ninja | Per-target (explicit deps) | mtime + restat + gn analyze | gn analyze prunes tree to affected targets; public_deps distinguish API vs implementation |
| Cargo | Per-unit (fingerprint) | Hash of rustc version + features + target + profile + dep fingerprints | Only units with changed fingerprints rebuild; dep-fingerprint cascade |
| Bazel | Per-action (declared inputs) | Hash of action inputs + command line + env | Skyframe does reverse-transitive-closure; "resurrection" reverts if rebuild produces identical output |
The four core techniques Red Bear OS is missing:
- Content-addressed outputs (Nix, Bazel) — store by hash, not by name
- Per-unit fingerprints with dep cascade (Cargo) — only rebuild units whose fingerprint changes
- Public vs private API boundary (GN) — only propagate dirty when public surface changes
- Restat / equivalence caching (Ninja, Yocto) — if rebuilt output is byte-identical, treat dependents as clean instead of dirty
TIER 1 — IMMEDIATE WINS (low effort, high impact)
T1.1 — Content-hash stage.pkgar to detect "no actual change"
Problem: When base rebuilds, it produces a new stage.pkgar (different mtime), even if the
pkgar content is byte-identical to the previous one. Downstream sees the mtime change and
rebuilds.
Fix: After rebuild, compute a content hash of the new stage.pkgar. If it matches the
previous hash, do not bump the mtime (or set the new pkgar's mtime to the old one).
Downstream mtime comparison will see no change → no cascade.
Implementation (cookbook, src/cook/cook_build.rs):
    // After packaging stage → stage.pkgar.
    // old_mtime is the previous stage.pkgar's mtime, captured before the rebuild overwrote it.
    let new_hash = blake3::hash(&std::fs::read(&stage_pkgar)?).to_hex().to_string();
    let old_hash_path = stage_dir.join("stage.pkgar.hash");
    if let Ok(old_hash) = std::fs::read_to_string(&old_hash_path) {
        if old_hash.trim() == new_hash {
            // Content unchanged: restore the old mtime so downstream sees no change
            preserve_mtime(&stage_pkgar, old_mtime)?;
            return Ok(());
        }
    }
    std::fs::write(&old_hash_path, &new_hash)?;
Impact: A config change that doesn't affect output (e.g., adding a comment, reordering members) will no longer cascade.
Effort: 1 day (cookbook + recipe-side blake3 dep if not present).
T1.2 — repo cook --since=<git-ref> incremental mode
Problem: repo cook rebuilds everything that's "dirty" by mtime. For a developer iterating
on a single file, this can be over-inclusive.
Fix: Add --since=<ref> flag that uses git diff --name-only <ref>..HEAD to find changed
files, then walks the reverse dep graph to find affected recipes.
Implementation: New src/cook/cook_incremental.rs:
// 1. git diff --name-only <ref>..HEAD → list of changed files
// 2. For each changed file, find recipes whose source contains it
// 3. Build reverse dep graph (BFS)
// 4. Build root-first, then dependents
Impact: A 1-line change in one file rebuilds only that file's recipe + cascade, not the whole source-modified set.
Effort: 3-5 days (git plumbing + BFS + integration with build-redbear.sh).
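Steps 2 and 3 above can be sketched as follows. This is a minimal illustration, not the cookbook's code: the `Recipe` struct and `affected_recipes` function are hypothetical names, the changed-file list is assumed to come from `git diff --name-only`, and build ordering (step 4) is left out.

```rust
use std::collections::{BTreeSet, HashMap, VecDeque};
use std::path::Path;

/// Hypothetical recipe description: name, source directory, and the names
/// of the recipes it depends on.
pub struct Recipe {
    pub name: &'static str,
    pub source_dir: &'static str,
    pub deps: Vec<&'static str>,
}

/// Returns every recipe that owns a changed file, plus everything that
/// transitively depends on one of those recipes.
pub fn affected_recipes(recipes: &[Recipe], changed_files: &[&str]) -> BTreeSet<&'static str> {
    // Reverse edges: dependency -> recipes that depend on it.
    let mut dependents: HashMap<&'static str, Vec<&'static str>> = HashMap::new();
    for r in recipes {
        for d in &r.deps {
            dependents.entry(*d).or_default().push(r.name);
        }
    }
    // Seed the queue with recipes whose source directory contains a changed file.
    let mut queue: VecDeque<&'static str> = recipes
        .iter()
        .filter(|r| changed_files.iter().any(|f| Path::new(f).starts_with(r.source_dir)))
        .map(|r| r.name)
        .collect();
    let mut seen: BTreeSet<&'static str> = queue.iter().copied().collect();
    // Breadth-first walk over the reverse edges.
    while let Some(name) = queue.pop_front() {
        for dep in dependents.get(name).into_iter().flatten() {
            if seen.insert(*dep) {
                queue.push_back(*dep);
            }
        }
    }
    seen
}
```

`Path::starts_with` compares whole path components, so a change under local/sources/base2/ would not be attributed to a recipe rooted at local/sources/base.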
T1.3 — Fix cascade script to use Cargo workspace member detection
Problem: local/scripts/rebuild-cascade.sh uses text grep
(grep -q "dependencies.*=.*\[.*${target}.*\]") which misses:
- Cargo workspace member-to-member dependencies (e.g., pcid and pcid-spawner in same workspace)
- dev-dependencies
- Conditional dependencies behind features
Fix: Augment with Cargo workspace member parsing. For each recipe, if it's a Cargo recipe,
parse Cargo.toml for the members array under [workspace] and add member-to-member edges.
Implementation (rebuild-cascade.sh, augment with cargo-aware pass):
    # After text-grep pass, add cargo workspace members
    for recipe_toml in $(find recipes/ local/recipes/ -name "recipe.toml"); do
        source_dir=$(toml_get "$recipe_toml" source.path)
        if [ -f "$source_dir/Cargo.toml" ]; then
            # Parse workspace members
            members=$(grep -A100 '^\[workspace\]' "$source_dir/Cargo.toml" | \
                grep -E '^\s*"[^"]+",?\s*$' | tr -d '",' | xargs)
            for member in $members; do
                # Each member is a potential dependent if it has Cargo deps on the target
                ...
            done
        fi
    done
Impact: Cascade detection is accurate; no missed rebuilds, no false rebuilds.
Effort: 2-3 days.
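If the member detection moved from the shell script into the cookbook, it could look roughly like the sketch below. The function name `workspace_members` is hypothetical, and the parser is deliberately naive (no glob patterns, no escaped quotes); a real implementation would more likely use a TOML parser or the JSON printed by `cargo metadata --no-deps --format-version 1`.

```rust
/// Extracts the string entries of the `members = [...]` array that follows a
/// `[workspace]` header. Naive on purpose: no globs, no escapes, and no `#`
/// characters inside member names.
pub fn workspace_members(manifest: &str) -> Vec<String> {
    let mut in_workspace = false;
    let mut in_members = false;
    let mut members = Vec::new();
    for raw in manifest.lines() {
        // Drop trailing comments and surrounding whitespace.
        let line = raw.split('#').next().unwrap_or("").trim();
        if line.starts_with('[') && !in_members {
            // A new table header: only `[workspace]` is interesting.
            in_workspace = line == "[workspace]";
            continue;
        }
        if !in_workspace {
            continue;
        }
        let mut rest = line;
        if !in_members {
            match line.strip_prefix("members") {
                Some(after) if after.trim_start().starts_with('=') => {
                    in_members = true;
                    rest = after;
                }
                _ => continue,
            }
        }
        // Every second piece of a split on '"' is the inside of a quoted string.
        members.extend(rest.split('"').skip(1).step_by(2).map(String::from));
        if rest.contains(']') {
            in_members = false;
        }
    }
    members
}
```

The returned entries are member paths such as drivers/pcid, not package names, so mapping them to dependency edges still requires reading each member's own Cargo.toml.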
T1.4 — Per-source-hash invalidation in modified_dir_ignore_git
Problem: fs.rs:160-167 walks the ENTIRE source tree to find the newest file mtime. A
single .swp file or build artifact can invalidate the cache.
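Fix: Use the git tree hash for source modification detection. If the source is a git repo
(most local sources are), use git rev-parse HEAD:./path to get a content hash, and mark the
recipe dirty only when the hash changes. The HEAD tree hash covers committed content only, so
uncommitted edits need a separate check (for example, git status --porcelain on the same path).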
Implementation (fs.rs):
    pub fn source_fingerprint(dir: &Path) -> Result<String> {
        if is_git_repo(dir) {
            // git ls-tree -r HEAD lists mode, blob hash, and path for tracked files only.
            // Uncommitted edits are not reflected here and need a separate check.
            let output = Command::new("git")
                .arg("-C")
                .arg(dir)
                .args(["ls-tree", "-r", "HEAD"])
                .output()?;
            // The listing already contains per-file content hashes, so hash it whole.
            let mut hasher = blake3::Hasher::new();
            hasher.update(&output.stdout);
            Ok(hasher.finalize().to_hex().to_string())
        } else {
            // Fallback: hash all files. hash_dir_contents is a helper the cookbook
            // would have to provide; the blake3 crate has no directory-hashing call.
            Ok(hash_dir_contents(dir)?.to_hex().to_string())
        }
    }
Impact: Untracked files (.swp, target/, Cargo.lock if not tracked) no longer trigger
rebuilds. mtime changes with no content change (e.g., from touch) no longer trigger rebuilds either.
Effort: 2 days.
TIER 2 — PER-CRATE GRANULARITY (medium effort, very high impact)
T2.1 — Split base workspace into per-binary sub-recipes
Problem: base is one recipe with 45 workspace members. A 1-line change rebuilds all 45.
Fix: Two options:
Option A (simpler): Keep base as a Cargo workspace, but change the cookbook's build()
to track per-binary stage.pkgar files. Each binary gets its own pkgar; downstream depends on
specific binaries.
Option B (cleaner): Split base into one recipe per binary. Each recipe:
- Has its own recipe.toml with template = "cargo" and a single -p filter
- Stages its own binary
- Other packages depend on specific binaries (e.g., pcid-bin, usbhidd-bin)
Recommended: Option A first (smaller diff), then migrate to Option B as cleanup.
Implementation (cookbook, cook_build.rs):
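    // Detect workspace members
    let members = parse_workspace_members(&source_manifest)?;
    for member in members {
        let member_pkgar = stage_dir.join(format!("{member}.pkgar"));
        // A missing pkgar counts as infinitely old, so the member gets built
        let member_pkgar_modified = modified(&member_pkgar).unwrap_or(SystemTime::UNIX_EPOCH);
        // Per-member mtime + per-member hash
        let member_source_dir = source_dir.join(&member);
        let member_modified = modified_dir_ignore_git(&member_source_dir)?;
        let member_deps = member_dependencies(&member, &source_manifest)?;
        // Per-member cache check
        if member_pkgar_modified < member_modified
            || member_pkgar_modified < deps_modified_for(&member_deps)
        {
            // Rebuild this member only. -p expects the package name, which can
            // differ from the member path; the recipe's usual flags are omitted here.
            Command::new("cargo").args(["build", "-p", &member]).status()?;
        }
    }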
Impact: A change to audiod rebuilds only audiod, not all 45 base binaries. A change to
pcid rebuilds only pcid. This is the single biggest win.
Effort: 1-2 weeks (cookbook refactor + recipe updates).
T2.2 — Restructure kernel and mesa recipes similarly
Same pattern as T2.1 for:
- kernel recipe (single kernel.elf output, but multiple internal stages)
- mesa recipe (single libGL.so etc., but multiple internal sub-libraries)
- llvm21 recipe (single clang binary, but many internal components)
Impact: A change to one mesa component rebuilds only that component, not the whole mesa build (which takes 20+ minutes).
Effort: 1 week each.
T2.3 — Per-binfmt_pkg output tracking
Problem: A recipe's stage/ directory contains many files, all bundled into one
stage.pkgar. The cookbook doesn't know which file in stage/ corresponds to which binary.
Fix: Add installs = [...] to recipes (already partially supported), and use it to track
per-output mtime.
Implementation: When a recipe declares installs = ["/usr/bin/foo", "/usr/lib/libbar.so"],
the cookbook:
- Tracks mtime per output path
- Computes per-output hash
- Lets downstream depend on specific output paths
Impact: When libdrm.so changes but libdrm_intel.so doesn't, only consumers of
libdrm.so rebuild.
Effort: 1-2 weeks.
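The per-output comparison could be sketched like this. The type alias and function names are hypothetical, and the hashes are assumed to be computed elsewhere (for example with the T1.1 hashing code).

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Maps an installed path (e.g. "/usr/lib/libdrm.so") to a content hash.
pub type OutputHashes = BTreeMap<String, String>;

/// Returns the output paths that were added, removed, or whose hash changed.
pub fn changed_outputs(old: &OutputHashes, new: &OutputHashes) -> BTreeSet<String> {
    let mut changed = BTreeSet::new();
    for (path, hash) in new {
        if old.get(path) != Some(hash) {
            changed.insert(path.clone());
        }
    }
    for path in old.keys() {
        if !new.contains_key(path) {
            changed.insert(path.clone());
        }
    }
    changed
}

/// A consumer must rebuild only if it uses at least one changed output.
pub fn must_rebuild(consumed: &[&str], changed: &BTreeSet<String>) -> bool {
    consumed.iter().any(|p| changed.contains(*p))
}
```

With these two functions, a consumer that lists only libdrm_intel.so is left alone when only libdrm.so changes.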
TIER 3 — OUTPUT FINGERPRINTING (medium-high effort, very high impact)
T3.1 — Hash the sysroot content of each recipe
Problem: Currently the cookbook only checks mtime of stage.pkgar, not its content. Two
builds that produce identical pkgar content still cascade downstream.
Fix: Compute BLAKE3 hash of the staged sysroot artifacts; cache it; use it as part of the package fingerprint.
Implementation (src/cook/cook_build.rs):
// After staging files into stage/
let stage_fingerprint = compute_stage_fingerprint(&stage_dir)?;
let fp_file = stage_dir.join("stage.fingerprint");
let new_fp = blake3::hash(stage_fingerprint.as_bytes()).to_hex().to_string();
if let Ok(old_fp) = std::fs::read_to_string(&fp_file) {
if old_fp.trim() == new_fp {
// Stage contents identical — preserve mtime
preserve_mtime_recursive(&stage_dir)?;
}
}
std::fs::write(fp_file, new_fp)?;
Impact: A rebuild that produces identical output (e.g., due to deterministic compiler output for unchanged sources) doesn't cascade.
Effort: 3-5 days.
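The snippet above calls compute_stage_fingerprint without defining it. One possible shape is sketched below, under two assumptions: FNV-1a is used only as a dependency-free stand-in for BLAKE3, and symlinks and file permissions are ignored. The key property is that entries are visited in sorted order and mtimes are never read.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// FNV-1a (64-bit), used here only as a dependency-free stand-in for BLAKE3.
fn fnv1a(mut state: u64, bytes: &[u8]) -> u64 {
    for b in bytes {
        state ^= u64::from(*b);
        state = state.wrapping_mul(0x0000_0100_0000_01b3);
    }
    state
}

/// Hashes relative paths and file contents in sorted order, so the result
/// does not depend on mtimes or on directory iteration order.
pub fn compute_stage_fingerprint(stage_dir: &Path) -> io::Result<String> {
    fn walk(root: &Path, dir: &Path, state: &mut u64) -> io::Result<()> {
        let mut entries = fs::read_dir(dir)?.collect::<io::Result<Vec<_>>>()?;
        entries.sort_by_key(|e| e.file_name());
        for entry in entries {
            let path = entry.path();
            if entry.file_type()?.is_dir() {
                walk(root, &path, state)?;
            } else {
                let rel = path.strip_prefix(root).unwrap_or(path.as_path());
                *state = fnv1a(*state, rel.to_string_lossy().as_bytes());
                *state = fnv1a(*state, &[0]); // separator between name and data
                *state = fnv1a(*state, &fs::read(&path)?);
            }
        }
        Ok(())
    }
    let mut state: u64 = 0xcbf2_9ce4_8422_2325; // FNV-1a 64-bit offset basis
    walk(stage_dir, stage_dir, &mut state)?;
    Ok(format!("{state:016x}"))
}
```

Any file stored inside the hashed directory contributes to the result, so a fingerprint cache file should live outside the directory being fingerprinted.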
T3.2 — Cascade invalidation only when downstream input hash changes
Problem: Cascade currently triggers on mtime. Mtime can change without content change.
Fix: Instead of the mtime test stage_modified < deps_modified, compare hashes. Hashes have no
ordering, so the test becomes an inequality: rebuild when
stored_input_fingerprint != current_input_fingerprint.
Where:
- Each recipe declares a fingerprint_inputs = [...] list (paths it consumes)
- On each rebuild, hash the contents of those paths
- Store the hash as part of the recipe's fingerprint
Implementation:
    # Recipe declares (recipe.toml):
    [package]
    fingerprint_inputs = ["/usr/lib/libdrm.so", "/usr/include/libdrm/drm.h"]

    // Cookbook computes (hash_paths is a helper to be written, not a blake3 API):
    let input_fingerprint = hash_paths(&fingerprint_inputs)?;
Impact: When libdrm.so content doesn't change (e.g., internal implementation), consumers
don't rebuild.
Effort: 1-2 weeks.
T3.3 — Yocto-style equivalence cache for ABI-stable rebuilds
Problem: When a recipe's source changes but its output is byte-identical, the recipe rebuilds but downstream should not.
Fix: Implement an "equivalence cache" — a database mapping old content hash → new content hash for ABI-equivalent outputs. When the new content hash matches an old one (within the equivalence class), downstream is not invalidated.
Implementation: SQLite-backed equivalence cache at .redbear/equivalence.db. Keyed by
input hash + build flags; value is the set of "equivalent" output hashes.
Impact: Even non-deterministic builds (e.g., embedded timestamps) can be marked equivalent.
Effort: 2-3 weeks.
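A minimal in-memory sketch of the equivalence idea follows. It assumes a HashMap in place of the SQLite database, and the `EquivalenceCache` type and its methods are hypothetical. It also handles only outputs whose hashes match exactly; treating builds with embedded timestamps as equivalent would require normalizing the output before hashing.

```rust
use std::collections::HashMap;

#[derive(Default)]
pub struct EquivalenceCache {
    /// output hash -> canonical id of its equivalence class
    by_output: HashMap<String, String>,
    /// input hash -> canonical id that downstream recipes key on
    by_input: HashMap<String, String>,
}

impl EquivalenceCache {
    /// Records a finished build and returns the id downstream should key on.
    /// If the output matches an earlier build, the earlier id is reused, so
    /// downstream fingerprints stay unchanged.
    pub fn report(&mut self, input_hash: &str, output_hash: &str) -> String {
        let id = self
            .by_output
            .entry(output_hash.to_string())
            .or_insert_with(|| input_hash.to_string())
            .clone();
        self.by_input.insert(input_hash.to_string(), id.clone());
        id
    }

    /// Looks up the id for an input hash without building.
    pub fn lookup(&self, input_hash: &str) -> Option<&str> {
        self.by_input.get(input_hash).map(String::as_str)
    }
}
```

Downstream recipes would fold the returned id, rather than the raw input hash, into their own fingerprints.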
TIER 4 — PUBLIC API TRACKING (high effort, high impact for kernel/relibc)
T4.1 — Distinguish public headers from internal sources
Problem: relibc changes cascade to all C/C++ packages. But only changes to relibc's public
headers (in local/sources/relibc/include/) should cascade. Internal changes to
local/sources/relibc/src/ should not.
Fix: Each recipe declares public_api = [...] — the list of paths that constitute its
public API. Only mtime/hash changes to those paths trigger cascade.
Implementation:
    let public_api = recipe.public_api_paths();
    let public_api_modified = public_api.iter()
        .map(|p| modified(p))
        .max()?;
    if stage_modified < public_api_modified {
        cascade_to_dependents();
    }
Impact: A change to relibc/src/header/errno/mod.rs (internal) doesn't cascade, unless it alters a generated public header.
A change to relibc/include/sys/errno.h (public) does.
Effort: 1-2 weeks per "API surface" (relibc, kernel, mesa).
T4.2 — Track ABI via .so version files / SONAME bumps
Problem: Even if headers change, if the ABI version is unchanged, downstream can use the new library without recompilation.
Fix: Parse .so files for SONAME. Compare SONAME between old and new build. If
SONAME unchanged, no cascade.
Implementation: ELF SONAME parser in cookbook. SONAME = readelf -d *.so | grep SONAME.
Impact: Relibc ABI-preserving changes don't cascade to C/C++ packages.
Effort: 1 week.
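Before writing a full ELF parser, the comparison can be prototyped against the text that readelf -d prints. The sketch below assumes the GNU readelf line format `(SONAME) Library soname: [name]`; both function names are hypothetical.

```rust
/// Extracts the SONAME from the text printed by `readelf -d`.
pub fn soname_from_readelf(output: &str) -> Option<&str> {
    output
        .lines()
        .find(|line| line.contains("(SONAME)"))
        .and_then(|line| {
            let start = line.find('[')? + 1;
            let end = line[start..].find(']')? + start;
            Some(&line[start..end])
        })
}

/// The cascade rule from T4.2: dependents rebuild only when the SONAME differs.
pub fn soname_changed(old_readelf: &str, new_readelf: &str) -> bool {
    soname_from_readelf(old_readelf) != soname_from_readelf(new_readelf)
}
```

A library with no SONAME entry yields None on both sides and is reported as unchanged, so such libraries need a different signal (for example the content hash from T3.1).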
T4.3 — Header dependency graph for C/C++ packages
Problem: A C/C++ package's includes cascade is "any header in any include path", which is over-inclusive. The actual cascade should be "headers this file actually includes".
Fix: Use gcc -M / clang -M to generate per-file header dependencies. Hash the
resulting .d file. Cascade only when those specific headers change.
Impact: A change to errno.h doesn't cascade to packages that don't include errno.h.
Effort: 1-2 weeks.
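The compiler's -M output is a Makefile rule (`target: prerequisites`, with backslash line continuations), so extracting the header list takes little code. A sketch with hypothetical function names, assuming paths without escaped spaces:

```rust
/// Parses Makefile-style dependency output (`target: dep dep \` ...) and
/// returns the prerequisites. Paths containing escaped spaces are not handled.
pub fn parse_depfile(text: &str) -> Vec<String> {
    // Join continuation lines, then drop the `target:` prefix of each rule.
    let joined = text.replace("\\\n", " ");
    let mut deps = Vec::new();
    for rule in joined.lines() {
        if let Some((_target, prereqs)) = rule.split_once(": ") {
            deps.extend(prereqs.split_whitespace().map(String::from));
        }
    }
    deps
}

/// True if any changed header is one the translation unit actually includes.
pub fn needs_rebuild(deps: &[String], changed_headers: &[&str]) -> bool {
    deps.iter().any(|d| changed_headers.contains(&d.as_str()))
}
```

The prerequisite list includes the source file itself, which is harmless here because only header paths are looked up.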
TIER 5 — RESTAT / OUTPUT STABILITY (medium effort, medium impact)
T5.1 — After rebuild, check if installed files differ from previous
Problem: Currently, every rebuild produces a new stage.pkgar regardless of whether
content changed.
Fix: After cargo build and before pkgar packaging, diff the new sysroot against the
old sysroot. If all files are byte-identical, copy old stage.pkgar mtime to new files.
Implementation: diff -r or content-hash comparison.
Impact: Idempotent builds don't cascade.
Effort: 3-5 days.
T5.2 — Idempotent packaging
Problem: pkgar files include timestamps, so identical content produces different pkgar files.
Fix: Make pkgar packaging deterministic (sort entries, zero timestamps, fixed compression).
Impact: Identical content → identical pkgar → no cascade.
Effort: 1 week (upstream pkgar changes).
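The two properties that matter, sorted entries and no timestamps, can be shown with a toy archive layout. This is not the pkgar format; the layout and the function name `pack_deterministic` are invented purely for illustration.

```rust
/// A toy archive layout (NOT the pkgar format): for each entry, the path
/// length, the path, the data length, and the data, with no timestamps.
pub fn pack_deterministic(entries: &[(&str, &[u8])]) -> Vec<u8> {
    // Sort by path so that input order cannot influence the output bytes.
    let mut sorted = entries.to_vec();
    sorted.sort_by(|a, b| a.0.cmp(b.0));
    let mut out = Vec::new();
    for (path, data) in sorted {
        out.extend_from_slice(&(path.len() as u64).to_le_bytes());
        out.extend_from_slice(path.as_bytes());
        out.extend_from_slice(&(data.len() as u64).to_le_bytes());
        out.extend_from_slice(data);
    }
    out
}
```

Because the bytes depend only on paths and contents, hashing the archive (T1.1) gives the same result for every rebuild of unchanged inputs.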
TIER 6 — DEPENDENCY GRAPH ANALYSIS (low effort, medium impact)
T6.1 — Add repo graph to show full dependency graph
Problem: Hard to know what rebuilds when X changes.
Fix: Add repo graph that emits the full dep graph in DOT format. Visualize with
xdot or similar.
Implementation: New src/bin/repo_graph.rs (or subcommand in repo.rs).
Effort: 2-3 days.
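DOT emission itself is small. A sketch, assuming the dependency map has already been loaded from the recipes (the function name `to_dot` is hypothetical):

```rust
use std::collections::BTreeMap;

/// Renders `recipe -> dependency` edges as a Graphviz DOT digraph.
pub fn to_dot(deps: &BTreeMap<&str, Vec<&str>>) -> String {
    let mut out = String::from("digraph recipes {\n");
    for (recipe, its_deps) in deps {
        // Emit the node itself so recipes without dependencies still appear.
        out.push_str(&format!("    \"{recipe}\";\n"));
        for dep in its_deps {
            out.push_str(&format!("    \"{recipe}\" -> \"{dep}\";\n"));
        }
    }
    out.push_str("}\n");
    out
}
```

A BTreeMap keeps the output order stable between runs, which makes the graph easy to diff.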
T6.2 — Add repo cook --since=<commit> to only rebuild affected packages
Problem: When you git pull or merge a branch, you want to rebuild only what the merge
touched.
Fix: Use git diff --name-only between old HEAD and new HEAD, walk reverse deps.
Implementation: New --since flag in repo cook. Falls back to --changed for tracked
files.
Effort: 3-5 days.
T6.3 — Add repo why <pkg> to show what triggers rebuilds
Problem: When pkg rebuilds, why? What cascaded into it?
Fix: Reverse-dep analysis — show the path from each changed source to the target recipe.
Implementation: BFS from changed source paths, through recipe deps and Cargo workspace members, to target recipe.
Effort: 2-3 days.
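The path search can be sketched as a BFS that records parent links. The function name `why` and the shape of the `dependents` map (recipe to the recipes that depend on it) are assumptions; Cargo workspace edges would be added to the same map.

```rust
use std::collections::{HashMap, VecDeque};

/// `dependents` maps a recipe to the recipes that depend on it. Returns the
/// shortest chain `changed -> ... -> target`, or `None` if `target` is not
/// affected by `changed`.
pub fn why<'a>(
    dependents: &HashMap<&'a str, Vec<&'a str>>,
    changed: &'a str,
    target: &'a str,
) -> Option<Vec<&'a str>> {
    let mut parent: HashMap<&'a str, Option<&'a str>> = HashMap::new();
    parent.insert(changed, None);
    let mut queue = VecDeque::from([changed]);
    while let Some(node) = queue.pop_front() {
        if node == target {
            // Walk the parent links back to the start, then reverse.
            let mut chain = vec![node];
            let mut cur = node;
            while let Some(Some(prev)) = parent.get(cur) {
                chain.push(*prev);
                cur = *prev;
            }
            chain.reverse();
            return Some(chain);
        }
        for next in dependents.get(node).into_iter().flatten() {
            if !parent.contains_key(*next) {
                parent.insert(*next, Some(node));
                queue.push_back(*next);
            }
        }
    }
    None
}
```

Running it once per changed recipe and printing each chain gives the explanation that repo why is meant to show.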
PRIORITIZED ROADMAP
| Tier | Effort | Impact | Risk | Priority |
|---|---|---|---|---|
| T1.1 Content-hash stage.pkgar | 1 day | High (catches all no-op rebuilds) | Low | P0 — DO FIRST |
| T1.4 Per-source-hash via git tree | 2 days | High (eliminates spurious dirty) | Low | P0 |
| T1.2 --since flag | 3-5 days | High (developer workflow) | Medium | P1 |
| T1.3 Cascade script cargo-aware | 2-3 days | Medium | Low | P1 |
| T2.1 Split base per-binary | 1-2 weeks | Very high (45 → 1 rebuild) | Medium (breaking) | P1 |
| T3.1 Sysroot fingerprint | 3-5 days | High | Low | P1 |
| T2.2 Split kernel/mesa | 1 week each | High | Medium | P2 |
| T3.2 Downstream-input hash | 1-2 weeks | Very high | Medium | P2 |
| T6.1 repo graph | 2-3 days | Medium (devx) | Low | P2 |
| T6.2 --since commit | 3-5 days | High (devx) | Low | P2 |
| T5.1 Restat diff | 3-5 days | Medium | Low | P3 |
| T3.3 Equivalence cache | 2-3 weeks | High | Medium (cache coherency) | P3 |
| T4.1 Public API surface | 1-2 weeks | High (relibc) | Medium (semantics) | P3 |
| T4.2 SONAME tracking | 1 week | Medium | Low | P4 |
| T4.3 Header dep graph | 1-2 weeks | Medium | Medium | P4 |
| T5.2 Idempotent pkgar | 1 week | Medium | Medium | P4 |
| T2.3 Per-binfmt_pkg | 1-2 weeks | High | Medium | P4 |
| T6.3 repo why | 2-3 days | Low (devx) | Low | P4 |
PHASED IMPLEMENTATION
Phase A (1-2 weeks) — Stop the bleeding
- T1.1 content-hash stage.pkgar
- T1.4 git-tree source fingerprint
- T1.3 cargo-aware cascade
- Result: 2-line Cargo.toml change no longer cascades if output is identical
Phase B (2-4 weeks) — Per-crate granularity
- T2.1 split base workspace (1-2 weeks)
- T2.2 split kernel/mesa (1 week each)
- T3.1 sysroot fingerprint
- T3.2 downstream-input hash cascade
- Result: A change to one driver rebuilds only that driver
Phase C (2-4 weeks) — API surface tracking
- T4.1 public API surface for relibc + kernel
- T4.2 SONAME tracking
- T4.3 header dep graph
- T5.1 restat diff
- T5.2 idempotent pkgar
- Result: Internal implementation changes don't cascade
Phase D (1-2 weeks) — Developer experience
- T6.1 repo graph
- T6.2 --since commit
- T6.3 repo why
- T1.2 --since=<ref> incremental
- Result: Developer can answer "what rebuilds" instantly
METRICS & SUCCESS CRITERIA
The build system is healthy when:
| Metric | Current | Target |
|---|---|---|
| 1-line Cargo.toml change rebuilds | Full OS (1.5h) | < 5 min (only changed recipe) |
| make after no source change | Full OS (1.5h) | 0 sec (idempotent, no-op) |
| 1-line kernel source change | Full OS (1.5h) | < 10 min (kernel + kernel consumers) |
| 1-line relibc internal change | Full OS (1.5h) | < 5 min (relibc + 0 consumers if API unchanged) |
| repo cook --since=v0.1.0 | Full OS | < 1 min (1-2 packages) |
| repo why mesa | N/A | < 1 sec (printed graph) |
DESIGN CONSTRAINTS
These constraints are non-negotiable:
- Offline-first — REPO_OFFLINE=1 must remain default. All changes must work without network access.
- Determinism — Outputs must be byte-identical for identical inputs (modulo timestamps).
- Backward compat — Existing recipes must continue to work without modification.
- No new build dependencies — Only use crates already in the workspace.
- Performance — Fingerprint computation must be O(source) or O(staged-output), not O(n²).
- Durability — Fingerprint caches must survive make distclean (in local/.cache/).
NON-GOALS
We will NOT:
- Replace Cargo (too invasive, too risky)
- Migrate to Bazel or Nix (would require months of work)
- Add remote artifact caching (out of scope; we have local sstate)
- Rewrite the build in a different language
- Add a distributed build cluster
WHY THIS PLAN WILL WORK
Mature systems (Nix, Cargo, Yocto) already implement these patterns. The techniques are proven. Red Bear OS only needs to add output fingerprinting and per-crate granularity, both of which are well-understood in the broader build systems literature.
The hardest part is T2.1 (per-crate granularity for base) because it requires cookbook
changes. But the rest can be implemented incrementally and tested with --no-cache for
correctness.
NEXT STEPS
- Implement T1.1 (content-hash stage.pkgar) — 1 day, low risk
- Implement T1.4 (git-tree source fingerprint) — 2 days, low risk
- Implement T1.3 (cargo-aware cascade) — 2-3 days, low risk
- Test: 1-line Cargo.toml change should rebuild only base
- Implement T2.1 (per-binary base) — 1-2 weeks
- Test: 1-line pcid source change should rebuild only pcid
- Implement T3.1, T3.2 (output fingerprinting + cascade-by-hash)
- Test: rebuild with identical source produces no cascade
- Phase D devx improvements (graph, why, since)
REFERENCES
- Nix Quotient Hashing: https://nix.dev/manual/nix/2.34/store/derivation/outputs/content-address
- Cargo Fingerprint Module: https://doc.rust-lang.org/stable/nightly-rustc/cargo/core/compiler/fingerprint/index.html
- GN Analyze: https://gn.googlesource.com/gn/+/HEAD/docs/reference.md
- Yocto sstate: https://docs.yoctoproject.org/5.0.9/overview-manual/concepts.html
- Bazel Skyframe: https://preview.bazel.build/reference/skyframe
- Buildroot Rebuilding: https://buildroot.org/downloads/manual/rebuilding-packages.txt