Build Cache & Meta-Package Implementation Plan
Created: 2026-06-30
Status: Phase 1–3 complete
Scope: Hash-based cache invalidation, binary store lookup, config-level package groups
Problem Statement
The build system's cache invalidation (src/cook/cook_build.rs:285-303) is purely
timestamp (mtime) based. When ANY low-level component is rebuilt, ALL dependent
recipes are force-rebuilt — even if the dependency's binary output is bit-identical.
Components that trigger cascade rebuilds
| Component | Change Frequency | Cascade Scope Today |
|---|---|---|
| relibc | Weekly (POSIX gaps) | All C/C++ recipes (via auto-deps: libc.so.6) |
| kernel | Periodic | All kernel-dependent recipes |
| base | Frequent (driver work) | All base-dependent recipes |
| redoxfs | Occasional | All redoxfs-dependent recipes |
| prefix toolchain | On relibc/kernel change | ALL C/C++ recipes (real ABI change — correct cascade) |
Cost
A single low-level change triggers a 4-6 hour rebuild of the entire Qt+KDE stack (60+ recipes: qtbase, qtdeclarative, qtsvg, qtwayland, 32 KF6 frameworks, kwin, kdecoration, etc.). This happens weekly during POSIX gap-filling and driver work.
Root Cause
// src/cook/cook_build.rs:285-303 (CURRENT, BROKEN)
if stage_modified < source_modified
    || stage_modified < deps_modified // mtime of dep .pkgar files
    || stage_modified < deps_host_modified
    || !auto_deps_file.is_file()
{
    // FORCE REBUILD, even if the dep binary is bit-identical
}
The deps_modified timestamp changes whenever a dependency's .pkgar file is
touched on disk — regardless of whether the content actually changed. The system
already computes BLAKE3 hashes and stores them in stage.toml metadata, but
never consults them for cache decisions.
Solution Architecture
Phase 1: Hash-Based Cache Invalidation (CRITICAL)
Replace mtime comparison with BLAKE3 hash comparison from existing stage.toml
metadata.
How it works:
- At build time: for each build dependency, read the dep's stage.toml → extract the blake3 hash → store it in dep_hashes.toml alongside auto_deps.toml
- At cache check time: read each dep's current stage.toml blake3 → compare against the stored hash in dep_hashes.toml
- If ALL hashes match → cache hit (skip rebuild)
- If ANY hash differs → rebuild (dependency content actually changed)
- If dep_hashes.toml doesn't exist → fall back to mtime (backward compat)
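The comparison step above can be sketched roughly as follows. This is a minimal illustration, not the code in cook_build.rs: the names DepHashMap and hashes_changed are hypothetical, and ignoring dependencies that were removed from the recipe is an assumption made here.

```rust
use std::collections::BTreeMap;

/// Dependency name -> BLAKE3 hex digest (hypothetical alias for this sketch).
type DepHashMap = BTreeMap<String, String>;

/// Returns true when a rebuild is needed: some current dependency is either
/// absent from the stored map (newly added) or has a different hash.
/// Assumption: dependencies that only appear in `stored` (removed from the
/// recipe) are ignored here.
fn hashes_changed(stored: &DepHashMap, current: &DepHashMap) -> bool {
    current
        .iter()
        .any(|(name, hash)| stored.get(name) != Some(hash))
}
```

Under this sketch, a dependency rolled back to its old content produces the old digest again, so the stored and current maps compare equal and the recipe stays cached.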
dep_hashes.toml format:
# Generated by cookbook. Do not edit manually.
# Stores the BLAKE3 hash of each build dependency's PKGAR at the time
# this recipe was last built. Used for content-based cache invalidation.
relibc = "7a75a52121a27577fa23c18d662a38029447a4df9b8fb5ab55aee2698c514440"
dbus = "3b8c..."
mesa = "9f2e..."
zlib = "a1b2..."
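For illustration, the flat name = "hash" layout shown above can be written and read back with the standard library alone. This is a sketch under assumptions: render_dep_hashes and parse_dep_hashes are hypothetical names, only this flat subset of TOML is handled, and the real cookbook code may well use a TOML library instead.

```rust
use std::collections::BTreeMap;

/// Renders one `name = "hash"` line per dependency, sorted by name.
fn render_dep_hashes(hashes: &BTreeMap<String, String>) -> String {
    let mut out = String::from("# Generated by cookbook. Do not edit manually.\n");
    for (name, hash) in hashes {
        out.push_str(&format!("{name} = \"{hash}\"\n"));
    }
    out
}

/// Parses the flat format back into a map. Blank lines, comment lines, and
/// lines that are not `key = "value"` are skipped.
fn parse_dep_hashes(text: &str) -> BTreeMap<String, String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| {
            let (name, value) = line.split_once('=')?;
            let hash = value.trim().strip_prefix('"')?.strip_suffix('"')?;
            Some((name.trim().to_string(), hash.to_string()))
        })
        .collect()
}
```

Using a sorted map keeps the rendered file stable between builds, so the file itself does not change when the dependency set and hashes are unchanged.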
Implementation (Phase 1 — COMPLETE):
| File | Change | Status |
|---|---|---|
| src/cook/cook_build.rs | DepHashes struct with read/write; collect_current_dep_hashes() reads blake3 from dep .toml metadata; dep_hashes_changed() compares stored vs current; replaces mtime at cache check | ✅ Done |
| src/cook/cook_build.rs | Mtime fallback when dep_hashes.toml absent | ✅ Done |
| src/bin/repo.rs | --force-rebuild CLI flag bypasses hash caching | ✅ Done |
| src/config.rs | force_rebuild field in CookConfig/CookConfigOpt | ✅ Done |
| src/cook/cook_build.rs | build_deps_dir() update (deferred): sysroot rebuild uses mtime, not hash. Does NOT cause cascade rebuilds because the recipe's stage.pkgar is the cache key, not the sysroot. | 📋 Deferred |
Why PKGAR BLAKE3 (not ELF symbol extraction):
- The hash already exists in stage.toml, so there is zero computation cost
- BLAKE3 is deterministic and collision-resistant
- Conservative by design: any file change in the dep triggers rebuild (same false-positive rate as mtime, so no regression)
- ELF symbol extraction would be ~500+ LOC for marginal gain, with risk of false negatives (catastrophic: stale binaries linked against missing symbols)
Edge cases:
| Case | Behavior |
|---|---|
| First build (no dep_hashes.toml) | Mtime fallback (existing behavior) |
| Dependency removed from recipe | New dep_hashes.toml omits it; no special handling |
| Dependency added to recipe | Not in stored hashes → treated as "changed" → rebuild |
| Rollback of dependency | Old BLAKE3 matches stored hash → cache hit (correct) |
| --force-rebuild flag | Bypass hash check, always rebuild |
Note on build_deps_dir(): the original plan called for updating this function
(lines 566-653 of cook_build.rs) in the same change, because it builds the
per-recipe sysroot from dependency PKGARs and also uses mtime comparison. As
implemented, that update is deferred: the sysroot rebuild still uses mtime, but
this does not cause cascade rebuilds because the recipe's stage.pkgar, not the
sysroot, is the cache key.
Phase 2: Binary Store Cache Lookup (HIGH ROI)
The repo/<arch>/ directory stores built PKGARs that survive make clean. But
the cook path never consults them — it only checks the recipe's target/ dir.
Fix: Before building, check if repo/<arch>/<name>.pkgar exists and its
dependency hashes match:
// Pseudocode for the repo lookup in cook_build.rs
let repo_pkgar = repo_dir.join(format!("{}.pkgar", recipe_name));
let repo_toml = repo_dir.join(format!("{}.toml", recipe_name));
if repo_pkgar.is_file() && repo_toml.is_file() {
    let repo_meta = Package::from_file(&repo_toml)?;
    if dep_hashes_match(&repo_toml, &dep_pkgars)? {
        // Restore cached binary from repo store
        fs::copy(&repo_pkgar, &stage_pkgar)?;
        return Ok(BuildResult::cached(stage_dirs, auto_deps));
    }
}
Implementation (Phase 2 — COMPLETE):
| File | Change | Status |
|---|---|---|
| src/cook/cook_build.rs | Repo binary store restore: when target/ is missing but repo/<arch>/<name>.pkgar + .toml exist, extracts the PKGAR to the stage dir, copies .toml + dep_hashes.toml, and auto-generates auto_deps.toml from the repo depends field | ✅ Done |
| src/bin/repo_builder.rs | Publishes <name>.dep_hashes.toml alongside .pkgar and .toml in repo/<arch>/ during publish_packages() (main package only, not optional sub-packages) | ✅ Done |
Total: ~70 LOC
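As a rough sketch of the store probe described in the table above, the following checks which of the three published files exist for a recipe. RepoEntry and find_repo_entry are hypothetical names, and treating a missing <name>.dep_hashes.toml as optional (leaving the caller to fall back to mtime) is an assumption of this sketch rather than documented behavior.

```rust
use std::path::{Path, PathBuf};

/// Files found for one recipe in repo/<arch>/ (hypothetical helper type).
#[derive(Debug)]
struct RepoEntry {
    pkgar: PathBuf,
    toml: PathBuf,
    /// Present only if <name>.dep_hashes.toml was published.
    dep_hashes: Option<PathBuf>,
}

/// Returns an entry only when both the archive and its metadata exist;
/// the dep-hashes file is looked up separately and may be absent.
fn find_repo_entry(repo_dir: &Path, name: &str) -> Option<RepoEntry> {
    let pkgar = repo_dir.join(format!("{name}.pkgar"));
    let toml = repo_dir.join(format!("{name}.toml"));
    if !(pkgar.is_file() && toml.is_file()) {
        return None;
    }
    let dep_hashes = repo_dir.join(format!("{name}.dep_hashes.toml"));
    let dep_hashes = dep_hashes.is_file().then_some(dep_hashes);
    Some(RepoEntry { pkgar, toml, dep_hashes })
}
```

Requiring both the archive and its metadata before restoring avoids acting on a half-published package.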
Phase 3: Config-Level Package Groups (COMPLETE)
Meta-packages are an installer/runtime concern. The build system continues to produce one PKGAR per recipe. The installer groups them logically.
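Config format (the table key is package_groups with an underscore, NOT a hyphen, because it has to match the name serde deserializes into the package_groups field; the group names themselves may contain hyphens):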
# config/redbear-full.toml
[package_groups.qt6-core]
description = "Qt 6 Core modules"
packages = ["qtbase", "qtdeclarative", "qtsvg"]
[package_groups.qt6-extras]
description = "Qt 6 Wayland integration"
packages = ["qtwayland", "qt6-wayland-smoke", "qt6-sensors"]
[package_groups.kf6-frameworks]
description = "KDE Frameworks 6 (all 38)"
packages = ["kf6-kcoreaddons", "kf6-kconfig", "kf6-ki18n", ...]
[package_groups.kde-desktop]
description = "Complete KDE Plasma desktop session"
packages = ["graphics-core", "qt6-core", "qt6-extras", "kf6-frameworks", "kwin", "sddm"]
Groups can reference other groups; the installer resolves them recursively and detects cycles.
Explicit [packages] entries override group membership.
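Recursive resolution with cycle detection can be sketched as below. This is an illustration only, not the installer's resolve_package_groups() or expand_group(): the function name expand_group_sketch, the plain map of group name to member list, and the error format are all assumptions.

```rust
use std::collections::{BTreeSet, HashMap};

/// Expands `name` into concrete package names. A member that is itself a
/// group is expanded recursively; `visiting` holds the groups on the current
/// path so that a cycle is reported instead of recursing forever.
fn expand_group_sketch(
    name: &str,
    groups: &HashMap<String, Vec<String>>,
    visiting: &mut Vec<String>,
    out: &mut BTreeSet<String>,
) -> Result<(), String> {
    if visiting.iter().any(|g| g == name) {
        return Err(format!(
            "cycle in package groups: {} -> {}",
            visiting.join(" -> "),
            name
        ));
    }
    let Some(members) = groups.get(name) else {
        // Not a group name, so treat it as a plain package.
        out.insert(name.to_string());
        return Ok(());
    };
    visiting.push(name.to_string());
    for member in members {
        expand_group_sketch(member, groups, visiting, out)?;
    }
    visiting.pop();
    Ok(())
}
```

Collecting into a set means a package reachable through several groups is listed once. How explicit [packages] entries take priority over group members is not modelled here.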
Implementation (Phase 3 — COMPLETE):
| File | Change | Status |
|---|---|---|
| local/sources/installer/src/config/mod.rs | PackageGroup struct, package_groups field on Config, resolve_package_groups() + expand_group() with cycle detection, recursive group resolution, explicit-overrides-group priority. Called at end of Config::from_file() | ✅ Done |
| local/sources/installer/src/config/package.rs | PartialEq derive on PackageConfig for dedup during merge | ✅ Done |
| config/redbear-full.toml | 9 groups defined: graphics-core, input-stack, dbus-services, firmware-stack, qt6-core, qt6-extras, kf6-frameworks, desktop-session, kde-desktop | ✅ Done |
| Cargo.toml | Switch redox_installer from upstream git to local fork (path = "local/sources/installer") | ✅ Done |
| pkg crate (runtime) | pkg install qt6-core resolves group | 📋 Future (runtime package manager) |
Total: ~170 LOC
Why NOT merged PKGARs or build-system grouping:
- Merged PKGARs break the one-recipe-one-PKGAR model and make caching harder
- BuildKind::None metapackages (existing redbear-meta pattern) work for dependency sets but don't group binaries
- Config-level grouping is the lightest touch: no build graph changes, no PKGAR format changes, no upstream divergence
What Was Rejected
| Approach | Why Rejected |
|---|---|
| Nix-style content-addressed store | Requires hermetic sandboxed builds. Cookbook uses system cargo/cmake/make. ~2000+ LOC for marginal gain over BLAKE3 comparison. |
| Splitting monolithic packages (base) | Would diverge from upstream Redox structure. User decision: stay aligned with upstream. |
| ELF symbol-level ABI hashing | ~500+ LOC to parse .so files inside LZMA2 PKGARs. Risk of false negatives (stale binary = crash). PKGAR BLAKE3 is adequate. |
| Merged PKGAR meta-packages | Breaks one-build-one-PKGAR model. Makes caching harder. Each component has its own deps and ABI surface. |
| Fine-grained header dependency tracking | Massive complexity, compiler-dependent. Not worth it for a source-built OS. |
Implementation Status
| Phase | Effort | Status | Verification |
|---|---|---|---|
| Phase 1: Hash-based caching | ~175 LOC | ✅ Complete | Touch xz.pkgar → libxml2 stays cached (mtime changed, BLAKE3 same); --force-rebuild bypasses |
| Phase 2: Binary store lookup | ~70 LOC | ✅ Complete | Removed libxml2 target/ → re-cook → restored from repo/ binary store → cache hit |
| Phase 3: Package groups | ~170 LOC | ✅ Complete | 3 unit tests pass (nested groups, explicit override, no-groups compat); mini build verified |
Cache Flow (As Implemented)
Recipe build requested
│
├─ force_rebuild? ────────────────────────────────► REBUILD
│
├─ target/ dir exists with dep_hashes.toml?
│ ├─ YES: Read stored hashes
│ │ Collect current dep PKGAR blake3 from .toml files
│ │ All match? ──► CACHE HIT (skip build)
│ │ Any differ? ──► REBUILD
│ │
│ └─ NO: Fall back to mtime comparison (backward compat)
│ stage < source? ──► REBUILD
│ stage < deps? ──► REBUILD
│ Otherwise ──► CACHE HIT
│
├─ target/ dir MISSING but repo/<arch>/ has pkgar + toml?
│ └─ YES: Restore stage artifacts from binary store
│ Extract pkgar → stage dir
│ Copy .toml + .dep_hashes.toml
│ Generate auto_deps.toml from repo depends
│ Then run cache check above (likely hit)
│
└─ Build complete → write dep_hashes.toml with current hashes
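The first two branches of the flow above can be condensed into a small decision function. This is a sketch that follows the diagram literally: the names Mtimes, Decision, and cache_decision are hypothetical, the binary-store restore step is left out, and whether the source mtime is still consulted on the hash path is not stated in this document, so the sketch does not check it there.

```rust
use std::time::SystemTime;

/// Modification times used by the mtime fallback (hypothetical helper type).
struct Mtimes {
    stage: SystemTime,
    source: SystemTime,
    deps: SystemTime,
}

#[derive(Debug, PartialEq)]
enum Decision {
    Rebuild,
    CacheHit,
}

/// `hashes_changed` is None when dep_hashes.toml does not exist, which
/// selects the mtime fallback; Some(changed) selects the hash path.
fn cache_decision(force_rebuild: bool, hashes_changed: Option<bool>, mtimes: &Mtimes) -> Decision {
    if force_rebuild {
        return Decision::Rebuild;
    }
    let stale = match hashes_changed {
        Some(changed) => changed,
        None => mtimes.stage < mtimes.source || mtimes.stage < mtimes.deps,
    };
    if stale {
        Decision::Rebuild
    } else {
        Decision::CacheHit
    }
}
```

With stored hashes present, a dependency whose .pkgar was merely touched (newer mtime, same hash) yields a cache hit, while the same timestamps without dep_hashes.toml still force a rebuild.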
Risk Analysis
| Risk | Severity | Mitigation |
|---|---|---|
| False cache hit (stale binary) | High | BLAKE3 is collision-resistant; --force-rebuild escape hatch; mtime fallback |
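| build_deps_dir() not updated | High | Planned for the same PR as Phase 1; deferred as implemented (sysroot rebuild stays on mtime, which does not cascade because stage.pkgar is the cache key) |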
| Header-only dep false positives | Low | Same rate as current mtime; no regression |
| First-build confusion | Low | Mtime fallback; transparent to user |
| Prefix staleness (separate issue) | N/A | Prefix provides real ABI (libc.a, headers). Cascade is CORRECT. build-redbear.sh warns about stale prefix. |
Testing Plan
- Hash-based caching verification:
  - Build qtbase from clean → record dep_hashes.toml
  - touch a relibc source file but DON'T change ABI → rebuild relibc → run make r.qtbase
  - Expected: cache hit (relibc PKGAR BLAKE3 didn't change because the binary is the same)
- Real ABI change verification:
  - Add a new function to relibc → rebuild relibc → run make r.qtbase
  - Expected: rebuild (relibc PKGAR BLAKE3 changed)
- Cross-component verification:
  - Change a base driver → rebuild base → run make r.mesa
  - Expected: cache hit (mesa doesn't depend on base)
- --force-rebuild verification:
  - make r.qtbase after touch without real change → cache hit
  - repo cook qtbase --force-rebuild → rebuild regardless
- Binary store verification:
  - Build qtbase → make clean → make r.qtbase
  - Expected: restored from repo/ (no source rebuild)
Relationship to Existing Systems
- auto_deps.toml: Unchanged. Still records runtime auto-discovered dependencies.
- source_info.toml: Unchanged. Records commit/time identifiers for provenance.
- stage.toml: Unchanged. Package metadata with BLAKE3 hash.
- .patches-state: Unchanged. Tracks patch application state.
- repo/<arch>/repo.toml: Unchanged. Central manifest of package name → BLAKE3.
- build-redbear.sh prefix warning: Unchanged. Prefix staleness is a real ABI change.
- AGENTS.md durability policy: Unchanged. All new code is in src/ (git-tracked).
Glossary
| Term | Definition |
|---|---|
| PKGAR | Ed25519-signed package archive format used by Redox/Red Bear OS |
| BLAKE3 | Cryptographic hash used to fingerprint PKGAR content |
| stage.toml | Per-recipe metadata file with BLAKE3, deps, version, identifiers |
| auto_deps.toml | Per-recipe file with runtime auto-discovered dependencies (ELF DT_NEEDED) |
| dep_hashes.toml | NEW — Per-recipe file storing BLAKE3 hashes of build deps at build time |
| repo/ | Per-arch directory storing built PKGARs + metadata + central manifest |
| prefix/ | Cross-compiler toolchain (Clang/LLVM + relibc sysroot) |
| Cascade rebuild | A low-level change triggering rebuild of all dependents transitively |
| False cascade | A rebuild triggered by mtime change despite identical binary content |