diff --git a/local/docs/BUILD-SYSTEM-IMPROVEMENTS.md b/local/docs/BUILD-SYSTEM-IMPROVEMENTS.md
new file mode 100644
index 0000000000..6091c787d2
--- /dev/null
+++ b/local/docs/BUILD-SYSTEM-IMPROVEMENTS.md
@@ -0,0 +1,269 @@
+# Build System Improvements: v6.0 Post-Mortem (2026-06-12)
+
+This document analyzes the build system gaps that surfaced during the v6.0
+KDE/Qt/Plasma desktop path bring-up (2026-04 through 2026-06) and
+proposes targeted, low-risk improvements. Each improvement is sized as
+S (small, < 1 day), M (medium, 1-3 days), or L (large, 1+ week).
+
+## Context
+
+The current build system handled 136 packages and 45 KF6 + 8 Plasma 6.6
+cook batches over ~2 days of wall-clock time on the desktop path. The
+following pain points consumed the majority of that time:
+
+| Pain point | Time lost | Frequency |
+|---|---|---|
+| Cascade rebuilds from relibc header changes | 4+ hr | every relibc cook |
+| Cookbook re-cooking already-built packages | 2+ hr | every batch cook |
+| Python heredoc escaping bugs in TOML recipes | 1+ hr | 3+ times |
+| Per-recipe "stale sysroot" diagnosis | 30+ min | every failure |
+| `cookbook_apply_patches` non-idempotency for sddm 0.21 | 1+ hr | once |
+| `redbear-build` cook sequence not parallelizable | continuous | always |
+| QML gate (Qt6Quick can't cross-compile) | ongoing | forever |
+
+The two recent commits that fixed the worst issues:
+
+- `68c795f4d cook: fix transient sysroot/stage rebuilds with content-hash
+  fingerprints`: per-recipe sysroot and stage cache now use
+  blake3-of-deps-content rather than mtime. A relibc pkgar bump no longer
+  cascades into every downstream per-recipe sysroot.
+- `04c979942 rebuild-cascade: walk [build].dependencies and [build].dev_dependencies`:
+  rebuild-cascade.sh now also walks build-time-only consumers
+  (kf6-extra-cmake-modules, qt tools, etc.) that were previously invisible.
+
+## Proposed improvements (priority order)
+
+### 1. Parallel-safe cook pool (M, ~2 days)
+
+**Problem.** `cook A B C D` runs strictly serially. A KF6 batch of 15 cooks
+takes ~2 hours wall-clock. The cookbook has no parallel-cook mode.
+
+**Proposal.** Add `repo cook --jobs=N` that runs N independent cookbook
+invocations in parallel, each writing to its own `target//build/`
+and `target//stage.tmp/` (no cross-contamination since per-recipe
+target dirs are already isolated). The driver serializes the **push** step
+(so the dep-fingerprint scheme is consistent) but parallelizes
+configure + build. Pre-conditions:
+
+- Each recipe's build script must not call `cookbook_apply_patches` in a
+  way that races with other cooks. (Current patches are per-recipe, so
+  this is OK.)
+- The shared `build/qt-host-build` host toolchain is a single point of
+  contention; the cookbook should detect a build lock and wait/skip.
+
+**Expected gain.** 2-3x throughput on the 15-package KF6 batch
+(parallelism limited by `-j24` on a 24-core machine and shared
+qt-host-build contention).
+
+**Risk.** Medium: could expose races in the cookbook's stage.tmp
+handling. Pilot on a 4-package batch first.
+
+### 2. `cook --repair` mode (S, ~0.5 day)
+
+**Problem.** When a cook fails mid-build, the user's only options are
+`repo cook ` (which often re-runs the configure step from scratch)
+or `rm -rf target//build target//stage.tmp` (which
+re-pushes deps). Both are slow.
+
+**Proposal.** Add `repo cook --repair ` that:
+1. Keeps the existing source dir + sysroot
+2. Re-runs the cookbook's build script with the existing `build/` dir
+3. Skips the configure step if `CMakeCache.txt` is newer than the
+   source dir
+4. Only re-pushes the pkgar if the build artifact changed (use
+   `.deps-fingerprint` to gate the push)
+
+**Expected gain.** Cut per-failure recovery from 5-20 minutes to
+30-60 seconds. Critical when iterating on a single recipe.
+
+**Risk.** Low: purely additive. Falls back to full cook on any error.
+
+### 3. Per-recipe patch idempotency auditor (S, ~0.5 day)
+
+**Problem.** External patches in `local/patches//*.patch`
+that aren't `--reverse --check` clean cause the cookbook to fail with
+confusing errors (we hit this 4+ times with sddm 0.21.0). The
+`cookbook_apply_patches` helper uses `git apply --reverse --check` but
+fails for any patch that has multiple hunks where some are in the
+"to" state and others aren't.
+
+**Proposal.** Add a `validate-patches.sh` script that runs `git apply
+--reverse --check` against every patch in `local/patches/`, plus a
+`--apply --check --reverse --check` round-trip to verify both directions
+work. Add a CI hook (or a `make lint` target) that runs this.
+
+**Expected gain.** Catch patch issues at lint time, not in a 2-hour
+cook. The sddm 0.21.0 patch was 8+ hours of debugging.
+
+**Risk.** None.
+
+### 4. Cookbook-cached `repo cook` TUI status (M, ~1 day)
+
+**Problem.** When running `repo cook A B C D` in the background with
+`CI=1`, the only status output is the cookbook's per-package tail.
+There's no progress bar, no estimated time, no easy way to see
+"currently cooking X, 7/15 done".
+
+**Proposal.** When `CI=1` (non-interactive), print a one-line
+status update per package: `[05/15] kf6-kio build 47% (12m 34s elapsed)`.
+Parse ninja's output for `[X/Y]` build progress. Print to stdout,
+flushing after each line.
+
+**Expected gain.** Better UX for long cooks. Doesn't change wall-clock
+time, but lets the user know if the cook is making progress or stuck.
+
+**Risk.** None.
+
+### 5. Build-time recipe lint in `make lint` (M, ~1 day)
+
+**Problem.** Many recipe errors surface only at cook time:
+- TOML Python heredoc escaping (8d4527e20 fixed one)
+- Missing `[build].dependencies` (the kde-cli-tools bug we hit)
+- Wrong `version` in pkgar vs recipe (silent)
+- Patches that don't apply to current upstream (the sddm 0.21 issue)
+
+**Proposal.** Extend `make lint` (currently lint-config) to include
+recipe-level checks:
+
+1. For every recipe, parse `recipe.toml` and verify `[build].dependencies`
+   lists every `[package].dependencies` member. (Currently a 1:1 mismatch
+   is a common bug.)
+2. For every recipe with a `[source].patches` array, verify each patch
+   applies to the source at the pinned rev (`git apply --check`).
+3. For every recipe, verify the resulting `.pkgar` is in `repo/` with
+   matching `version =` in the toml.
+4. For every recipe with `[build].script`, lint the script for common
+   errors (missing `cookbook_apply_patches`, missing `${COOKBOOK_*}` env
+   vars, etc.).
+
+**Expected gain.** Catch issues at `make lint` time, not 2 hours into
+a cook. The kde-cli-tools missing-dep bug alone cost 30+ minutes.
+
+**Risk.** None. Lint is a separate step.
+
+### 6. `recipes/kf6-*` recipe dep audit (S, ~0.5 day)
+
+**Problem.** The 45 KF6 recipes have grown over time and their
+`[build].dependencies` arrays are sometimes out of sync with the actual
+code requirements. Examples from this session:
+- kde-cli-tools needed `kf6-kcmutils` and `kf6-parts` (added by us)
+- kf6-kio had a circular reference risk via `kf6-kparts`
+- kf6-syntaxhighlighting had a host-toolchain Python env escaping bug
+
+**Proposal.** Run a one-time `audit-recipe-deps.sh` that, for each KF6
+recipe, downloads the source, parses the CMakeLists.txt + *.cmake
+files, extracts `find_package(KF6 ... COMPONENTS ...)` calls, and
+verifies every component is in `[build].dependencies`. Report any
+mismatches as warnings.
+
+**Expected gain.** Prevents future "missing dep" failures. No runtime
+impact.
+
+**Risk.** None.
+
+### 7. QML gate: make Qt6Quick host-targetable (L, ~2 weeks)
+
+**Problem.** Qt6Quick/QML cross-compilation is broken on Redox. This
+blocks KWin, plasma-framework, plasma-desktop, and plasma-workspace,
+i.e. the entire KDE desktop path. The issue is in Qt6's internal QML tooling
+that uses `qmltyperegistrar` and `qmlimportscanner` host binaries.
+
+**Proposal.** Two-track approach:
+
+A. **Short term (S).** Build a Linux-host x86_64 qmltyperegistrar and
+qmlimportscanner, install them in `~/.redoxer/x86_64-unknown-redox/toolchain/bin/`,
+and add them to the toolchain. The KF6 recipes' cmake already supports
+`QT_HOST_PATH` for this purpose.
+
+B. **Long term (L).** Add a Redox-host qmltyperegistrar implementation.
+This requires re-implementing ~2000 lines of Qt internal C++, which is out
+of scope for "complex fixes" and needs its own sub-project.
+
+**Expected gain.** Track A unblocks the entire KDE desktop path. Track B
+is a long-term maintainability win.
+
+**Risk.** Track A is low risk (it's how upstream Redox already handles
+it). Track B is high risk (substantial new code).
+
+### 8. `redbear_qt_link_sysroot_dirs` should be a no-op when not needed (S, ~0.25 day)
+
+**Problem.** Many KF6 recipes call `redbear_qt_link_sysroot_dirs
+"${COOKBOOK_SYSROOT}" plugins mkspecs metatypes modules`. This is
+needed for qtbase's CMake configs to find the right paths. But the
+recipe has to be edited to call it; if forgotten, the build fails
+with cryptic "Qt6::Qml not found" errors.
+
+**Proposal.** Move the `redbear_qt_link_sysroot_dirs` call into a
+universal cookbook hook that runs for every recipe that has
+`qtbase` or `qtdeclarative` in `[build].dependencies`. The hook
+auto-detects Qt deps and applies the symlinks.
+
+**Expected gain.** Removes a common footgun. New KF6 recipes just work.
+
+**Risk.** Low: purely additive.
+
+### 9. Cookbook build-failure classifier (M, ~1 day)
+
+**Problem.** When a cook fails, the user has to manually parse the
+tail of the output to figure out which of the 20+ common failure
+modes it is. We hit at least 8 distinct failure modes this session:
+- GLESv2 / Qt6Gui visibility
+- Python3 development headers missing
+- LibMount missing
+- relibc `` not found
+- C++20 std::ranges not declared
+- C++ qfloat16 (`__extendhfdf2`) missing
+- Stale sysroot (KF6CoreAddons 6.10 vs 6.26)
+- gettext gnulib rebuild loop
+
+**Proposal.** Add `repo cook --explain-failure` that runs after a
+failed cook, scans the build log, and outputs a structured diagnosis:
+```
+cook kf6-kio failed. Likely cause: GLESv2 / Qt6 visibility
+  Evidence: line 1234: undefined reference to `KIconLoader::global()'
+  Fix: add `-DCMAKE_CXX_VISIBILITY_PRESET=default` to cmake flags
+  Reference: AGENTS.md §"COMPLEX FIX CHECKLIST (v6.0-impl17)" entry 10
+```
+
+**Expected gain.** Cut per-failure diagnosis from 5-10 minutes to
+10-30 seconds. Critical for new contributors.
+
+**Risk.** None: read-only analysis.
+
+### 10. Cookbook scratch-build system (L, ~1 week)
+
+**Problem.** When something goes deeply wrong (e.g. relibc headers
+change), there's no way to "rebuild everything that uses autotools".
+`build-redbear.sh` has stale detection, but it only triggers on
+relibc/kernel/base source commits, not on dep pkgar changes.
+
+**Proposal.** Add `make scratch-rebuild` that:
+1. Identifies all packages using autotools (pcre2, gettext, libiconv, etc.)
+2. For each, deletes `target//build` and `target//sysroot`
+3. Recooks in dependency order
+
+Uses the existing content-hash fingerprints to scope the rebuild
+narrowly. Most useful after a toolchain or relibc change.
+
+**Expected gain.** Predictable, narrow rebuild after low-level changes.
+Eliminates the "delete and pray" pattern.
+
+**Risk.** Medium: needs to be tested against real cascades.
+
+## Summary
+
+| # | Title | Size | Gain | Risk |
+|---|---|---|---|---|
+| 1 | Parallel-safe cook pool | M | 2-3x | Medium |
+| 2 | `cook --repair` mode | S | 5-10x per-failure | Low |
+| 3 | Per-recipe patch idempotency auditor | S | Catch at lint | None |
+| 4 | Cook TUI status | M | UX | None |
+| 5 | Build-time recipe lint | M | Catch at lint | None |
+| 6 | KF6 recipe dep audit | S | Prevent bugs | None |
+| 7 | QML gate | L | Unblock KDE | A: Low, B: High |
+| 8 | Auto-link Qt sysroot dirs | S | Fewer bugs | Low |
+| 9 | Failure classifier | M | 5-10x diagnosis | None |
+| 10 | Scratch-rebuild system | L | Predictable | Medium |
+
+Recommended order: #3, #6, #8 (S-sized, low risk, quick wins), then #2,
+#5, #9 (S/M-sized, real productivity wins), then #4, #7A, #10, #1
+(bigger), then #7B as a separate project.
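
Step 3 of proposal #10 ("Recooks in dependency order") amounts to a topological sort over the affected recipes. The following is a minimal Python sketch of that idea, not the cookbook's actual logic: `BUILD_DEPS`, `recook_order`, and the dependency edges between the recipes are all hypothetical and chosen only for illustration (the recipe names themselves come from the proposal).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical map: recipe -> recipes that must be cooked before it.
# The edges are illustrative assumptions, not taken from real recipe.toml files.
BUILD_DEPS = {
    "relibc": set(),
    "libiconv": {"relibc"},
    "pcre2": {"relibc"},
    "gettext": {"relibc", "libiconv"},
}


def recook_order(deps, affected):
    """Return the affected recipes ordered so that dependencies come first."""
    affected = set(affected)
    # Keep only edges between affected recipes so the rebuild stays narrow.
    sub = {name: set(deps.get(name, ())) & affected for name in sorted(affected)}
    return list(TopologicalSorter(sub).static_order())


if __name__ == "__main__":
    # libiconv is always listed before gettext; pcre2 may appear anywhere.
    print(" ".join(recook_order(BUILD_DEPS, {"pcre2", "gettext", "libiconv"})))
```

A cycle in the map makes `static_order()` raise `graphlib.CycleError`, so the same pass could also surface circular references such as the kf6-kio / kf6-kparts risk noted in proposal #6.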