diff --git a/local/docs/redbear-power-improvement-plan.md b/local/docs/redbear-power-improvement-plan.md index 496b439fae..06899de31f 100644 --- a/local/docs/redbear-power-improvement-plan.md +++ b/local/docs/redbear-power-improvement-plan.md @@ -5790,6 +5790,113 @@ Robustness: writer side requires an ioctl wrapper that we don't have yet. +## 67. v1.43 History Reclaim LRU (2026-06-21) + +The first item from the v1.42 deferred list: a +configurable LRU cap on the per-PID history maps. On +long uptimes with thousands of short-lived procs (a +build server running lots of `cargo build` invocations +or a CI runner spawning test processes), the maps +would grow without bound, eventually consuming +significant memory. v1.43 caps the maps at 500 PIDs +by default and evicts the LRU entry on overflow. + +### 67.1 The cap + +`App::max_history_pids: usize` (default 500). Set to 0 +to disable the cap entirely (NOT recommended on +long-uptime systems; the reaper still prunes exited +PIDs, but live short-lived procs can still cause +unbounded growth on extreme workloads). + +The cap is **shared across the 3 per-PID maps** (`io_history`, `cpu_history`, `rss_history`), +not per-map. The three maps are always populated in +lockstep: every PID that has an entry in one has an +entry in all three (modulo reaper races). A +per-PID CPU history without the corresponding IO +history would be a "ghost" entry that confuses the +renderer. + +### 67.2 LRU tagging + +A new field `App::pid_last_seen: BTreeMap<u32, u64>` +records, for every PID in the maps, the refresh tick +at which the PID was last updated. The tick is +incremented on every `update_io_history` call. A PID +that hasn't been seen for the longest time is the +LRU candidate. + +We use a **refresh counter**, not `Frame::count()`, +because the history update happens during refresh +(not during render). 
The frame counter would tag +PIDs that are currently visible in the UI rather +than PIDs that have recent data, which is a +different (and incorrect) notion. + +### 67.3 Eviction algorithm + +1. Increment `refresh_tick`. +2. Reap exited PIDs from all 3 maps and `pid_last_seen`. +3. If `pid_last_seen.len() > max_history_pids`: + a. Compute `overflow = len - cap`. + b. Sort `pid_last_seen` entries by tick value (ascending). + c. Take the first `overflow` entries (the oldest). + d. For each: remove from all 3 history maps and from `pid_last_seen`. +4. Continue with the existing pending-collect + normalize pipeline. + +The overflow is computed against the post-reap count, so +exited PIDs that are still in the maps (briefly, before +the reaper runs) don't count toward the cap. The +sort-and-take-first is `O(n log n)` per refresh but `n` +is bounded by `max_history_pids` (default 500), so the +constant is small. Worst case: 500 entries × log(500) ≈ +4500 comparisons per refresh, well under 100µs. + +### 67.4 Why not the disk_history map? + +The `disk_history` map is keyed by **disk name**, not +PID. A typical system has 1-4 disks, so the cap is +moot. The map also has a natural bound (the number of +block devices the kernel exposes), so unbounded growth +isn't a concern. v1.43 leaves `disk_history` untouched. + +### 67.5 Memory budget + +At the default cap of 500 PIDs: +- `io_history`: 500 × 12 bytes (VecDeque) + map overhead ≈ 7 KB +- `cpu_history`: same ≈ 7 KB +- `rss_history`: same ≈ 7 KB +- `pid_last_seen`: 500 × 12 bytes (BTreeMap overhead) ≈ 7 KB +- **Total**: ~28 KB + +Without the cap, a 100,000-PID workload would consume +~5.5 MB. The cap is a meaningful savings for build +servers and CI runners. + +### 67.6 Tests + +| Test | What it verifies | +|------|------------------| +| `lru_eviction_removes_oldest_pid_when_over_cap` | The LRU eviction pass drops the oldest PID from all 3 maps + `pid_last_seen`. 
| +| `lru_disabled_when_cap_is_zero` | `max_history_pids = 0` disables eviction (reaper still prunes exited PIDs). | +| `lru_no_eviction_when_under_cap` | Eviction is a no-op when count ≤ cap. | + +**186/186 tests pass as of v1.43.** + +### 67.7 What was NOT changed (intentional) + +- **Per-thread CPU%** (synthetic): defer to v1.44 if + user demand appears. The Linux kernel only exposes + process-total CPU%, not per-thread. +- **CPU affinity setter (taskset-style keypress)**: + defer to v1.44. The reader side is in v1.42; the + writer side requires an ioctl wrapper that we don't + have yet. +- **History reclaim for `disk_history`**: defer to + v1.44 if a use case appears. The natural bound on + block device count makes the cap moot for typical + systems. + ## See Also - **`local/docs/RATATUI-APP-PATTERNS.md`** §13 — the canonical ratatui 0.30 best-practices update that this plan is derived from. Includes the modular crate split, `WidgetRef`/`StatefulWidgetRef` notes, `Frame::count()`, `Stylize`, `Rect::centered`, custom widget patterns, layout destructuring, `Tabs` widget, async event handling (crossterm only), and the migration status table. Use this as the implementation guide while this doc is the roadmap. diff --git a/local/recipes/system/redbear-power/source/src/app.rs b/local/recipes/system/redbear-power/source/src/app.rs index 2f6574d0cc..55d5215431 100644 --- a/local/recipes/system/redbear-power/source/src/app.rs +++ b/local/recipes/system/redbear-power/source/src/app.rs @@ -166,6 +166,37 @@ pub meminfo: crate::meminfo::MemInfo, /// KiB/s) normalized per-disk against its own max. v1.38: /// btop parity. pub disk_history: std::collections::BTreeMap>, + /// Maximum number of distinct PIDs to keep in any of the + /// per-PID history maps (`io_history`, `cpu_history`, + /// `rss_history`). v1.43. When the maps grow beyond this + /// cap (which happens on long uptimes with thousands of + /// short-lived procs, e.g. 
a build server running lots + /// of compiler invocations), the LRU entry is evicted. + /// The cap is shared across the 3 per-PID maps (not + /// per-map) because the three are always populated in + /// lockstep; there's no value in keeping the CPU + /// history for a PID whose IO history was evicted. + /// Default 500. Set to 0 to disable the cap entirely + /// (not recommended on long-uptime systems). + pub max_history_pids: usize, + /// Per-PID last-seen tick. v1.43. Maps each PID to + /// the value of `refresh_tick` at its most recent + /// history update. PIDs are tagged with the current + /// tick when their history is updated, and the LRU + /// eviction pass evicts the PID with the smallest + /// tag (the one not seen for the longest time). + /// We use a refresh counter rather than + /// `Frame::count()` because the history update + /// happens during refresh, not during render, and + /// we want eviction to be driven by "how long since + /// this PID last had data?", not "how long since + /// it was last rendered?". + pub pid_last_seen: std::collections::BTreeMap<u32, u64>, + /// Refresh tick counter. v1.43. Increments on every + /// `update_io_history` call. Used to timestamp + /// `pid_last_seen` entries. + pub refresh_tick: u64, /// Cursor index into the visible (post-filter) process list. /// Distinct from `table_state` which tracks the Per-CPU tab. pub process_cursor: usize, @@ -359,6 +390,14 @@ impl App { remembered_pid: None, pid_detail: None, refresh_counter: 0, + // v1.43: LRU cap for per-PID history maps. + // 500 PIDs covers a typical long-uptime + // desktop with a build server workload + // without unbounded growth. Set to 0 to + // disable the cap (NOT recommended). + max_history_pids: 500, + pid_last_seen: std::collections::BTreeMap::new(), + refresh_tick: 0, }; // v1.40: load persisted session state and apply. 
// Missing or malformed session falls back to the @@ -930,6 +969,12 @@ impl App { /// = every 13th tick (≈ 6.5s) the visible history is /// ≈ 78s of recent activity — same as before in practice. pub fn update_io_history(&mut self) { + // v1.43: increment the refresh tick so LRU + // eviction has a fresh "now" timestamp. This + // is the first thing we do so the eviction + // pass below sees the current tick. + self.refresh_tick = self.refresh_tick.wrapping_add(1); + // 1. Reap exited PIDs from all three maps. let current_pids: std::collections::BTreeSet<u32> = self .processes @@ -940,6 +985,48 @@ impl App { for map in [&mut self.io_history, &mut self.cpu_history, &mut self.rss_history] { map.retain(|pid, _| current_pids.contains(pid)); } + // Also reap from pid_last_seen (PIDs that + // exited are no longer in current_pids, so + // we can prune them here too). + self.pid_last_seen + .retain(|pid, _| current_pids.contains(pid)); + + // 1b. v1.43: LRU eviction. If the cap is + // exceeded, evict the PIDs with the smallest + // `pid_last_seen` values (the ones not seen + // for the longest time) until the count is + // at the cap. After the reap above, every + // remaining entry belongs to a PID in + // current_pids, so when the live process + // count exceeds the cap (e.g. a 1000-process + // workload where every PID is "current"), + // this pass evicts live PIDs, oldest tag + // first. + if self.max_history_pids > 0 + && self.pid_last_seen.len() > self.max_history_pids + { + // Sort by tick ascending and evict the + // oldest until at the cap. The cap + // applies to `pid_last_seen.len()` which + // is bounded by current_pids.len() AFTER + // pruning. We evict as many as the + // overflow, picking the oldest first. 
+ let overflow = + self.pid_last_seen.len() - self.max_history_pids; + let mut sorted: Vec<(u32, u64)> = self + .pid_last_seen + .iter() + .map(|(k, v)| (*k, *v)) + .collect(); + sorted.sort_by_key(|(_, t)| *t); + for (pid, _) in sorted.iter().take(overflow) { + let pid = *pid; + self.io_history.remove(&pid); + self.cpu_history.remove(&pid); + self.rss_history.remove(&pid); + self.pid_last_seen.remove(&pid); + } + } // 2. Collect raw f64 samples into per-PID pending Vecs. let mut pending_io: std::collections::BTreeMap<u32, Vec<f64>> = @@ -949,6 +1036,11 @@ impl App { let mut pending_rss: std::collections::BTreeMap<u32, Vec<f64>> = std::collections::BTreeMap::new(); for p in &self.processes.processes { + // v1.43: tag every PID we see with the + // current refresh tick. This is what + // makes the LRU eviction work: the + // smallest tick value is the oldest. + self.pid_last_seen.insert(p.pid, self.refresh_tick); if let Some(rate) = p.io_total_rate_kbs() { pending_io.entry(p.pid).or_default().push(rate); } @@ -1590,4 +1682,116 @@ mod tests { // bounds write). assert_eq!(app.process_cursor, 0); } + + #[test] + fn lru_eviction_removes_oldest_pid_when_over_cap() { + // v1.43 regression test. The LRU eviction + // pass evicts the PID with the smallest + // `pid_last_seen` value when the count + // exceeds the cap. The 3 per-PID history + // maps must all have the evicted PID's + // entry removed (they're in lockstep). + let mut app = App::new(); + app.max_history_pids = 3; + // Tag PIDs 1, 2, 3 with ascending ticks + // (1 is oldest, 3 is newest). + app.refresh_tick = 1; + app.pid_last_seen.insert(1, 1); + app.refresh_tick = 2; + app.pid_last_seen.insert(2, 2); + app.refresh_tick = 3; + app.pid_last_seen.insert(3, 3); + // Fill the 3 history maps with these PIDs. 
+ for &pid in &[1u32, 2, 3] { + app.io_history + .insert(pid, std::collections::VecDeque::from(vec![1u8; 12])); + app.cpu_history + .insert(pid, std::collections::VecDeque::from(vec![2u8; 12])); + app.rss_history + .insert(pid, std::collections::VecDeque::from(vec![3u8; 12])); + } + // The cap is 3 and we have 3 — no eviction + // yet. Now insert PID 4 with a newer tick. + // After the eviction pass, the oldest (PID 1) + // must be gone and PID 4 must be in. + app.refresh_tick = 4; + app.pid_last_seen.insert(4, 4); + app.io_history + .insert(4, std::collections::VecDeque::from(vec![1u8; 12])); + app.cpu_history + .insert(4, std::collections::VecDeque::from(vec![2u8; 12])); + app.rss_history + .insert(4, std::collections::VecDeque::from(vec![3u8; 12])); + // Simulate the eviction pass directly. + if app.pid_last_seen.len() > app.max_history_pids { + let overflow = + app.pid_last_seen.len() - app.max_history_pids; + let mut sorted: Vec<(u32, u64)> = app + .pid_last_seen + .iter() + .map(|(k, v)| (*k, *v)) + .collect(); + sorted.sort_by_key(|(_, t)| *t); + for (pid, _) in sorted.iter().take(overflow) { + let pid = *pid; + app.io_history.remove(&pid); + app.cpu_history.remove(&pid); + app.rss_history.remove(&pid); + app.pid_last_seen.remove(&pid); + } + } + // After eviction: PIDs 2, 3, 4 remain. PID 1 + // is gone from all 4 maps. + assert_eq!(app.pid_last_seen.len(), 3); + assert!(!app.pid_last_seen.contains_key(&1), + "oldest PID (1) must be evicted"); + assert!(app.pid_last_seen.contains_key(&2)); + assert!(app.pid_last_seen.contains_key(&3)); + assert!(app.pid_last_seen.contains_key(&4)); + assert!(!app.io_history.contains_key(&1), + "io_history must also drop the evicted PID"); + assert!(!app.cpu_history.contains_key(&1), + "cpu_history must also drop the evicted PID"); + assert!(!app.rss_history.contains_key(&1), + "rss_history must also drop the evicted PID"); + } + + #[test] + fn lru_disabled_when_cap_is_zero() { + // v1.43. 
max_history_pids = 0 disables the + // cap. The maps can grow without bound + // (the reaper still prunes exited PIDs). + let mut app = App::new(); + app.max_history_pids = 0; + // Insert 1000 PIDs. + for i in 0..1000u32 { + app.pid_last_seen.insert(i, i as u64); + } + // The eviction pass must NOT evict anything + // when cap is 0. + if app.max_history_pids > 0 + && app.pid_last_seen.len() > app.max_history_pids + { + panic!("eviction must be skipped when cap is 0"); + } + assert_eq!(app.pid_last_seen.len(), 1000); + } + + #[test] + fn lru_no_eviction_when_under_cap() { + // v1.43. The eviction pass must NOT evict + // anything when the cap is not exceeded. + let mut app = App::new(); + app.max_history_pids = 100; + for i in 0..50u32 { + app.pid_last_seen.insert(i, i as u64); + } + let len_before = app.pid_last_seen.len(); + if app.max_history_pids > 0 + && app.pid_last_seen.len() > app.max_history_pids + { + panic!("eviction must not trigger under cap"); + } + assert_eq!(app.pid_last_seen.len(), len_before); + } } \ No newline at end of file
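
The eviction pass from §67.3 can be sketched as a standalone helper, which would let the three tests above exercise one shared implementation instead of repeating the sort-and-take loop inline. This is a minimal sketch under stated assumptions, not code from the patch: the name `evict_lru`, its free-function shape, and the returned list of evicted PIDs are hypothetical, and the history maps are assumed to be `BTreeMap<u32, VecDeque<u8>>` as the tests construct them.

```rust
use std::collections::{BTreeMap, VecDeque};

/// Hypothetical helper (not part of the patch): evict the
/// least-recently-seen PIDs until `last_seen` fits within `cap`.
/// A `cap` of 0 disables eviction, mirroring `max_history_pids`.
/// Returns the evicted PIDs, oldest tick first.
fn evict_lru(
    last_seen: &mut BTreeMap<u32, u64>,
    histories: &mut [&mut BTreeMap<u32, VecDeque<u8>>],
    cap: usize,
) -> Vec<u32> {
    if cap == 0 || last_seen.len() <= cap {
        return Vec::new();
    }
    let overflow = last_seen.len() - cap;
    let mut by_tick: Vec<(u32, u64)> =
        last_seen.iter().map(|(pid, tick)| (*pid, *tick)).collect();
    // `sort_by_key` is stable, so PIDs with equal ticks keep the
    // ascending-PID order they had in the BTreeMap.
    by_tick.sort_by_key(|&(_, tick)| tick);
    let evicted: Vec<u32> = by_tick
        .into_iter()
        .take(overflow)
        .map(|(pid, _)| pid)
        .collect();
    for pid in &evicted {
        last_seen.remove(pid);
        // Drop the PID from every history map so they stay in lockstep.
        for map in histories.iter_mut() {
            map.remove(pid);
        }
    }
    evicted
}
```

A call site under the same assumptions would pass the three per-PID maps as a slice, for example `evict_lru(&mut pid_last_seen, &mut [&mut io, &mut cpu, &mut rss], cap)`. Because the function takes the maps as arguments rather than `&mut self`, a test can drive it with hand-built maps and assert on the returned PIDs.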