RedBear-OS/local/docs/BUILD-CACHE-PLAN.md
vasilito 7d62a7c0ab docs: document content-hash cache system, binary store, package groups
- AGENTS.md: add cache system to STRUCTURE, WHERE TO LOOK, BUILD FLOW,
  BUILD COMMANDS (--force-rebuild), and CONVENTIONS (dep_hashes.toml,
  binary store restore, package_groups syntax)
- CHANGELOG.md: comprehensive entry for Phase 1-3 + kernel MWAIT +
  ninja-build Redox support
- local/AGENTS.md: note installer fork adds package groups support
- BUILD-CACHE-PLAN.md: fix TOML syntax (underscores not hyphens),
  update all phases to COMPLETE with implementation details, add cache
  flow diagram, add verification results
2026-06-30 17:39:35 +03:00


Build Cache & Meta-Package Implementation Plan

Created: 2026-06-30
Status: Phases 1-3 complete
Scope: Hash-based cache invalidation, binary store lookup, config-level package groups

Problem Statement

The build system's cache invalidation (src/cook/cook_build.rs:285-303) is purely timestamp (mtime) based. When ANY low-level component is rebuilt, ALL dependent recipes are force-rebuilt — even if the dependency's binary output is bit-identical.

Components that trigger cascade rebuilds

| Component | Change Frequency | Cascade Scope Today |
|---|---|---|
| relibc | Weekly (POSIX gaps) | All C/C++ recipes (via auto-deps: libc.so.6) |
| kernel | Periodic | All kernel-dependent recipes |
| base | Frequent (driver work) | All base-dependent recipes |
| redoxfs | Occasional | All redoxfs-dependent recipes |
| prefix toolchain | On relibc/kernel change | ALL C/C++ recipes (real ABI change; correct cascade) |

Cost

A single low-level change triggers a 4-6 hour rebuild of the entire Qt+KDE stack (60+ recipes: qtbase, qtdeclarative, qtsvg, qtwayland, 32 KF6 frameworks, kwin, kdecoration, etc.). This happens weekly during POSIX gap-filling and driver work.

Root Cause

// src/cook/cook_build.rs:285-303 (CURRENT — BROKEN)
if stage_modified < source_modified
    || stage_modified < deps_modified      // mtime of dep .pkgar files
    || stage_modified < deps_host_modified
    || !auto_deps_file.is_file()
{
    // FORCE REBUILD — even if dep binary is bit-identical
}

The deps_modified timestamp changes whenever a dependency's .pkgar file is touched on disk — regardless of whether the content actually changed. The system already computes BLAKE3 hashes and stores them in stage.toml metadata, but never consults them for cache decisions.

Solution Architecture

Phase 1: Hash-Based Cache Invalidation (CRITICAL)

Replace mtime comparison with BLAKE3 hash comparison from existing stage.toml metadata.

How it works:

  1. At build time: for each build dependency, read the dep's stage.toml → extract blake3 hash → store in dep_hashes.toml alongside auto_deps.toml
  2. At cache check time: read each dep's current stage.toml blake3 → compare against stored hash in dep_hashes.toml
  3. If ALL hashes match → cache hit (skip rebuild)
  4. If ANY hash differs → rebuild (dependency content actually changed)
  5. If dep_hashes.toml doesn't exist → fall back to mtime (backward compat)
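The five steps above can be sketched as a small pure function. This is a minimal illustration, not the code in cook_build.rs: the names `DepHashes`, `DepCheck`, and `check_dep_hashes` are hypothetical, and hashes are treated as opaque hex strings. Dependencies that appear only in the stored map are ignored here, which is one reading of the "no special handling" case for removed dependencies.

```rust
use std::collections::BTreeMap;

/// Dependency name -> hex-encoded BLAKE3 hash of its PKGAR.
pub type DepHashes = BTreeMap<String, String>;

/// Outcome of the dependency part of the cache check (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
pub enum DepCheck {
    /// Every current dependency hash equals the stored one.
    Unchanged,
    /// These dependencies are new or have a different hash.
    Changed(Vec<String>),
    /// No stored hashes exist; the caller falls back to mtime.
    FallBackToMtime,
}

/// Compare stored hashes (from dep_hashes.toml, if present) with current ones.
pub fn check_dep_hashes(stored: Option<&DepHashes>, current: &DepHashes) -> DepCheck {
    // Step 5: no dep_hashes.toml means we cannot decide by content.
    let Some(stored) = stored else {
        return DepCheck::FallBackToMtime;
    };
    // Steps 2-4: a dep counts as changed if it is missing from the stored
    // map (newly added) or its stored hash differs from the current one.
    let changed: Vec<String> = current
        .iter()
        .filter(|(name, hash)| stored.get(*name) != Some(*hash))
        .map(|(name, _)| name.clone())
        .collect();
    if changed.is_empty() {
        DepCheck::Unchanged
    } else {
        DepCheck::Changed(changed)
    }
}
```

Returning the list of changed dependencies rather than a bare boolean makes it easy to log why a rebuild was triggered.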

dep_hashes.toml format:

# Generated by cookbook. Do not edit manually.
# Stores the BLAKE3 hash of each build dependency's PKGAR at the time
# this recipe was last built. Used for content-based cache invalidation.
relibc = "7a75a52121a27577fa23c18d662a38029447a4df9b8fb5ab55aee2698c514440"
dbus = "3b8c..."
mesa = "9f2e..."
zlib = "a1b2..."
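Because the file is a flat list of `name = "hash"` pairs, reading and writing it is simple. The sketch below uses a deliberately small line-based parser as a stand-in for a real TOML library; the function names `parse_dep_hashes` and `render_dep_hashes` are hypothetical and do not describe how the cookbook actually serializes the file.

```rust
use std::collections::BTreeMap;

/// Parse the flat `name = "hash"` layout shown above.
/// Comments and blank lines are skipped; anything else must be a quoted pair.
pub fn parse_dep_hashes(text: &str) -> Result<BTreeMap<String, String>, String> {
    let mut hashes = BTreeMap::new();
    for (idx, raw) in text.lines().enumerate() {
        let line = raw.trim();
        if line.is_empty() || line.starts_with('#') {
            continue;
        }
        let (key, value) = line
            .split_once('=')
            .ok_or_else(|| format!("line {}: expected `name = \"hash\"`", idx + 1))?;
        // Strip exactly one pair of surrounding double quotes.
        let value = value
            .trim()
            .strip_prefix('"')
            .and_then(|v| v.strip_suffix('"'))
            .ok_or_else(|| format!("line {}: hash must be a quoted string", idx + 1))?;
        hashes.insert(key.trim().to_string(), value.to_string());
    }
    Ok(hashes)
}

/// Render the map back to text, sorted by dependency name for stable diffs.
pub fn render_dep_hashes(hashes: &BTreeMap<String, String>) -> String {
    let mut out = String::from("# Generated by cookbook. Do not edit manually.\n");
    for (name, hash) in hashes {
        out.push_str(&format!("{name} = \"{hash}\"\n"));
    }
    out
}
```

A sorted map keeps the output deterministic, so rewriting the file with unchanged hashes produces byte-identical content.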

Implementation (Phase 1 — COMPLETE):

| File | Change | Status |
|---|---|---|
| src/cook/cook_build.rs | DepHashes struct with read/write; collect_current_dep_hashes() reads blake3 from dep .toml metadata; dep_hashes_changed() compares stored vs current; replaces mtime at cache check | Done |
| src/cook/cook_build.rs | Mtime fallback when dep_hashes.toml absent | Done |
| src/bin/repo.rs | --force-rebuild CLI flag bypasses hash caching | Done |
| src/config.rs | force_rebuild field in CookConfig/CookConfigOpt | Done |
| src/cook/cook_build.rs | build_deps_dir() update (deferred): sysroot rebuild uses mtime, not hash. Does NOT cause cascade rebuilds because the recipe's stage.pkgar is the cache key, not the sysroot. | 📋 Deferred |

Why PKGAR BLAKE3 (not ELF symbol extraction):

  • The hash already exists in stage.toml — zero computation cost
  • BLAKE3 is deterministic and collision-resistant
  • Conservative by design: any file change in the dep triggers rebuild (same false-positive rate as mtime — no regression)
  • ELF symbol extraction would be ~500+ LOC for marginal gain, with risk of false negatives (catastrophic — stale binaries linked against missing symbols)

Edge cases:

| Case | Behavior |
|---|---|
| First build (no dep_hashes.toml) | Mtime fallback (existing behavior) |
| Dependency removed from recipe | New dep_hashes.toml omits it; no special handling |
| Dependency added to recipe | Not in stored hashes → treated as "changed" → rebuild |
| Rollback of dependency | Old BLAKE3 matches stored hash → cache hit (correct) |
| --force-rebuild flag | Bypass hash check, always rebuild |

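Note on build_deps_dir(): this function (lines 566-653 of cook_build.rs) builds the per-recipe sysroot from dependency PKGARs, and it also uses mtime comparison. The original plan required updating it in the same change as the hash check, out of concern that the sysroot would still cascade-rebuild even when the hash check reports the recipe as cached. As implemented, that update is deferred: the sysroot rebuild stays on mtime, which does not cause cascade rebuilds because the recipe's stage.pkgar is the cache key, not the sysroot.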

Phase 2: Binary Store Cache Lookup (HIGH ROI)

The repo/<arch>/ directory stores built PKGARs that survive make clean. But the cook path never consults them — it only checks the recipe's target/ dir.

Fix: Before building, check if repo/<arch>/<name>.pkgar exists and its dependency hashes match:

// Pseudocode for the repo lookup in cook_build.rs
let repo_pkgar = repo_dir.join(format!("{}.pkgar", recipe_name));
let repo_toml = repo_dir.join(format!("{}.toml", recipe_name));
if repo_pkgar.is_file() && repo_toml.is_file() {
    let repo_meta = Package::from_file(&repo_toml)?;
    if dep_hashes_match(&repo_toml, &dep_pkgars)? {
        // Restore cached binary from repo store
        fs::copy(&repo_pkgar, &stage_pkgar)?;
        return Ok(BuildResult::cached(stage_dirs, auto_deps));
    }
}

Implementation (Phase 2 — COMPLETE):

| File | Change | Status |
|---|---|---|
| src/cook/cook_build.rs | Repo binary store restore: when target/ missing but repo/<arch>/<name>.pkgar + .toml exist, extracts PKGAR to stage dir, copies .toml + dep_hashes.toml, auto-generates auto_deps.toml from repo depends field | Done |
| src/bin/repo_builder.rs | Publishes <name>.dep_hashes.toml alongside .pkgar and .toml in repo/<arch>/ during publish_packages() (main package only, not optional sub-packages) | Done |

Total: ~70 LOC

Phase 3: Config-Level Package Groups (COMPLETE)

Meta-packages are an installer/runtime concern. The build system continues to produce one PKGAR per recipe. The installer groups them logically.

Config format (the table key is package_groups, with an underscore and NOT a hyphen, because serde requires it; group names themselves may contain hyphens):

# config/redbear-full.toml
[package_groups.qt6-core]
description = "Qt 6 Core modules"
packages = ["qtbase", "qtdeclarative", "qtsvg"]

[package_groups.qt6-extras]
description = "Qt 6 Wayland integration"
packages = ["qtwayland", "qt6-wayland-smoke", "qt6-sensors"]

[package_groups.kf6-frameworks]
description = "KDE Frameworks 6 (all 38)"
packages = ["kf6-kcoreaddons", "kf6-kconfig", "kf6-ki18n", ...]

[package_groups.kde-desktop]
description = "Complete KDE Plasma desktop session"
packages = ["graphics-core", "qt6-core", "qt6-extras", "kf6-frameworks", "kwin", "sddm"]

Groups can reference other groups — installer resolves recursively with cycle detection. Explicit [packages] entries override group membership.
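Recursive resolution with cycle detection can be sketched as a depth-first walk that tracks the chain of groups currently being expanded. This is an illustration only: `Groups`, `resolve_group`, and `visit` are hypothetical names, not the installer's resolve_package_groups() or expand_group(), and the sketch assumes any name that is not a group is a plain package. The explicit [packages] override is not modelled.

```rust
use std::collections::{BTreeMap, BTreeSet};

/// Group name -> members, where each member is a package or another group.
pub type Groups = BTreeMap<String, Vec<String>>;

/// Expand `name` into the flat, de-duplicated set of packages it contains.
pub fn resolve_group(name: &str, groups: &Groups) -> Result<BTreeSet<String>, String> {
    let mut packages = BTreeSet::new();
    let mut path = Vec::new();
    visit(name, groups, &mut path, &mut packages)?;
    Ok(packages)
}

fn visit(
    name: &str,
    groups: &Groups,
    path: &mut Vec<String>,
    out: &mut BTreeSet<String>,
) -> Result<(), String> {
    let Some(members) = groups.get(name) else {
        // Not a group, so treat the name as a plain package.
        out.insert(name.to_string());
        return Ok(());
    };
    // A group that is already on the current expansion path is a cycle.
    if path.iter().any(|seen| seen.as_str() == name) {
        return Err(format!(
            "cycle in package groups: {} -> {}",
            path.join(" -> "),
            name
        ));
    }
    path.push(name.to_string());
    for member in members {
        visit(member, groups, path, out)?;
    }
    // Pop so that a group shared by two parents is not mistaken for a cycle.
    path.pop();
    Ok(())
}
```

Tracking the active path instead of a global visited set lets two groups share a third group without triggering a false cycle error.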

Implementation (Phase 3 — COMPLETE):

| File | Change | Status |
|---|---|---|
| local/sources/installer/src/config/mod.rs | PackageGroup struct, package_groups field on Config, resolve_package_groups() + expand_group() with cycle detection, recursive group resolution, explicit-overrides-group priority. Called at end of Config::from_file() | Done |
| local/sources/installer/src/config/package.rs | PartialEq derive on PackageConfig for dedup during merge | Done |
| config/redbear-full.toml | 9 groups defined: graphics-core, input-stack, dbus-services, firmware-stack, qt6-core, qt6-extras, kf6-frameworks, desktop-session, kde-desktop | Done |
| Cargo.toml | Switch redox_installer from upstream git to local fork (path = "local/sources/installer") | Done |
| pkg crate (runtime) | pkg install qt6-core resolves group | 📋 Future (runtime package manager) |

Total: ~170 LOC

Why NOT merged PKGARs or build-system grouping:

  • Merged PKGARs break the one-recipe-one-PKGAR model and make caching harder
  • BuildKind::None metapackages (existing redbear-meta pattern) work for dependency sets but don't group binaries
  • Config-level grouping is the lightest touch — no build graph changes, no PKGAR format changes, no upstream divergence

What Was Rejected

| Approach | Why Rejected |
|---|---|
| Nix-style content-addressed store | Requires hermetic sandboxed builds. Cookbook uses system cargo/cmake/make. ~2000+ LOC for marginal gain over BLAKE3 comparison. |
| Splitting monolithic packages (base) | Would diverge from upstream Redox structure. User decision: stay aligned with upstream. |
| ELF symbol-level ABI hashing | ~500+ LOC to parse .so files inside LZMA2 PKGARs. Risk of false negatives (stale binary = crash). PKGAR BLAKE3 is adequate. |
| Merged PKGAR meta-packages | Breaks one-build-one-PKGAR model. Makes caching harder. Each component has its own deps and ABI surface. |
| Fine-grained header dependency tracking | Massive complexity, compiler-dependent. Not worth it for a source-built OS. |

Implementation Status

| Phase | Effort | Status | Verification |
|---|---|---|---|
| Phase 1: Hash-based caching | ~175 LOC | Complete | Touch xz.pkgar → libxml2 stays cached (mtime changed, BLAKE3 same); --force-rebuild bypasses |
| Phase 2: Binary store lookup | ~70 LOC | Complete | Removed libxml2 target/ → re-cook → restored from repo/ binary store → cache hit |
| Phase 3: Package groups | ~170 LOC | Complete | 3 unit tests pass (nested groups, explicit override, no-groups compat); mini build verified |

Cache Flow (As Implemented)

Recipe build requested
  │
  ├─ force_rebuild? ────────────────────────────────► REBUILD
  │
  ├─ target/ dir exists with dep_hashes.toml?
  │   ├─ YES: Read stored hashes
  │   │       Collect current dep PKGAR blake3 from .toml files
  │   │       All match? ──► CACHE HIT (skip build)
  │   │       Any differ? ──► REBUILD
  │   │
  │   └─ NO: Fall back to mtime comparison (backward compat)
  │          stage < source? ──► REBUILD
  │          stage < deps? ──► REBUILD
  │          Otherwise ──► CACHE HIT
  │
  ├─ target/ dir MISSING but repo/<arch>/ has pkgar + toml?
  │   └─ YES: Restore stage artifacts from binary store
  │           Extract pkgar → stage dir
  │           Copy .toml + .dep_hashes.toml
  │           Generate auto_deps.toml from repo depends
  │           Then run cache check above (likely hit)
  │
  └─ Build complete → write dep_hashes.toml with current hashes
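The branch order in the diagram can be expressed as a pure decision function over facts the caller has already gathered. This is a sketch under stated assumptions: `CacheInputs`, `Action`, and `decide` are hypothetical names, and the function mirrors the diagram literally, including the fact that the diagram shows no source-mtime check on the hash path. A missing target/ with nothing in the binary store is assumed to mean a rebuild.

```rust
/// Facts gathered by the caller before deciding (hypothetical struct).
#[derive(Debug, Clone, Copy)]
pub struct CacheInputs {
    pub force_rebuild: bool,
    /// The recipe's target/ dir exists.
    pub target_exists: bool,
    /// repo/<arch>/ holds both the pkgar and its toml.
    pub repo_has_artifacts: bool,
    /// Some(true): all stored hashes match; Some(false): at least one
    /// differs; None: dep_hashes.toml is absent.
    pub hashes_match: Option<bool>,
    pub stage_older_than_source: bool,
    pub stage_older_than_deps: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum Action {
    Rebuild,
    CacheHit,
    /// Restore from the binary store, then run the cache check again.
    RestoreThenRecheck,
}

pub fn decide(i: CacheInputs) -> Action {
    if i.force_rebuild {
        return Action::Rebuild;
    }
    if !i.target_exists {
        return if i.repo_has_artifacts {
            Action::RestoreThenRecheck
        } else {
            Action::Rebuild
        };
    }
    match i.hashes_match {
        Some(true) => Action::CacheHit,
        Some(false) => Action::Rebuild,
        // No dep_hashes.toml: fall back to the mtime comparison.
        None if i.stage_older_than_source || i.stage_older_than_deps => Action::Rebuild,
        None => Action::CacheHit,
    }
}
```

Keeping the decision free of filesystem access makes each branch of the diagram easy to unit-test in isolation.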

Risk Analysis

| Risk | Severity | Mitigation |
|---|---|---|
| False cache hit (stale binary) | High | BLAKE3 is collision-resistant; --force-rebuild escape hatch; mtime fallback |
| build_deps_dir() not updated | High | Planned for the same PR as Phase 1; deferred as implemented, since the sysroot still uses mtime but is not the cache key |
| Header-only dep false positives | Low | Same rate as current mtime; no regression |
| First-build confusion | Low | Mtime fallback; transparent to user |
| Prefix staleness (separate issue) | N/A | Prefix provides real ABI (libc.a, headers). Cascade is CORRECT. build-redbear.sh warns about stale prefix. |

Testing Plan

  1. Hash-based caching verification:

    • Build qtbase from clean → record dep_hashes.toml
    • touch a relibc source file but DON'T change ABI → rebuild relibc → run make r.qtbase
    • Expected: cache hit (relibc PKGAR BLAKE3 didn't change because binary is same)
  2. Real ABI change verification:

    • Add a new function to relibc → rebuild relibc → run make r.qtbase
    • Expected: rebuild (relibc PKGAR BLAKE3 changed)
  3. Cross-component verification:

    • Change a base driver → rebuild base → run make r.mesa
    • Expected: cache hit (mesa doesn't depend on base)
  4. --force-rebuild verification:

    • make r.qtbase after touch without real change → cache hit
    • repo cook qtbase --force-rebuild → rebuild regardless
  5. Binary store verification:

    • Build qtbase → make clean → make r.qtbase
    • Expected: restored from repo/ (no source rebuild)

Relationship to Existing Systems

  • auto_deps.toml: Unchanged. Still records runtime auto-discovered dependencies.
  • source_info.toml: Unchanged. Records commit/time identifiers for provenance.
  • stage.toml: Unchanged. Package metadata with BLAKE3 hash.
  • .patches-state: Unchanged. Tracks patch application state.
  • repo/<arch>/repo.toml: Unchanged. Central manifest of package name → BLAKE3.
  • build-redbear.sh prefix warning: Unchanged. Prefix staleness is a real ABI change.
  • AGENTS.md durability policy: Unchanged. All new code is in src/ (git-tracked).

Glossary

| Term | Definition |
|---|---|
| PKGAR | Ed25519-signed package archive format used by Redox/Red Bear OS |
| BLAKE3 | Cryptographic hash used to fingerprint PKGAR content |
| stage.toml | Per-recipe metadata file with BLAKE3, deps, version, identifiers |
| auto_deps.toml | Per-recipe file with runtime auto-discovered dependencies (ELF DT_NEEDED) |
| dep_hashes.toml | NEW: per-recipe file storing BLAKE3 hashes of build deps at build time |
| repo/ | Per-arch directory storing built PKGARs + metadata + central manifest |
| prefix/ | Cross-compiler toolchain (Clang/LLVM + relibc sysroot) |
| Cascade rebuild | A low-level change triggering rebuild of all dependents transitively |
| False cascade | A rebuild triggered by mtime change despite identical binary content |