RedBear-OS/local/docs/RAPL-IMPLEMENTATION-PLAN.md
vasilito ee086ded2d redbear-power: RAPL MSR constants, unit parsing, MSR-based energy reading
- msr.rs: add all Intel RAPL MSR addresses (0x606-0x64D) and AMD Zen
  equivalents (0xC0010299-0xC001029B), RaplUnit struct for unit register
  parsing with energy_to_uj/power_to_w conversion, read_rapl_energy()
  and read_rapl_energy_uj() functions
- acpi.rs: read_rapl_package_energy() now tries MSR first (Intel then
  AMD PKG energy MSRs) with unit-based µJ conversion, falls back to
  Linux powercap sysfs
- local/docs/RAPL-IMPLEMENTATION-PLAN.md: comprehensive 3-phase plan
  based on Linux 7.1 kernel analysis, Intel SDM, Fuchsia RFC-0203
  patterns. Documents P0 blocker: /scheme/sys/msr/ not implemented
  in kernel
2026-06-28 16:55:51 +03:00


RedBear OS RAPL Power Monitoring — Comprehensive Implementation Plan

Version: 1.1 (2026-06-28)
Status: Draft, awaiting review
Linux Reference: local/reference/linux-7.1/drivers/powercap/

P0 Blocker: Kernel MSR Scheme Does Not Exist

The /scheme/sys/msr/{cpu}/0x{msr_hex} path used by redbear-power and cpufreqd is NOT implemented in the kernel. The kernel's sys: scheme handler has modules for cpu, exe, irq, block, syscall, context, uname, fdstat, iostat, log, stat — but NO msr module. The kernel uses rdmsr/wrmsr internally (via the x86 crate) but never exposes MSRs to userspace.

This is the single blocking gap for ALL RAPL work. Without it:

  • read_msr(0x611) returns None on bare-metal Redox
  • redbear-power can only read RAPL on Linux hosts via sysfs or /dev/cpu/*/msr
  • cpufreqd cannot write IA32_PERF_CTL on Redox bare metal
  • thermald cannot read thermal status MSRs

Resolution options (pick one before Phase 1):

  1. Kernel sys:msr handler — Add src/scheme/sys/msr.rs to kernel, expose as /scheme/sys/msr/{cpu}/0x{msr_hex}. Requires CAP_SYS_MSR.
  2. Userspace msrd daemon — Register scheme:msr via redox-scheme. More portable, easier to iterate, no kernel rebuild.
  3. Linux-compatible /dev/cpu/*/msr device — Matches what thermald already uses. Requires VFS-level char device emulation.

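Recommendation: Option 2 (userspace daemon). It is the fastest path to iterate on and keeps kernel changes small. One caveat: RDMSR/WRMSR are privileged instructions that fault outside ring 0, and MSRs are not mapped into I/O port space, so iopl(3) with inl/outl cannot reach them. msrd therefore still needs a kernel-mediated way to execute rdmsr (for example the x86 crate's rdmsr behind a capability scheme, if ring 0 access can be granted that way).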

Executive Summary

Implement hardware-accurate CPU/package/DRAM power monitoring in RedBear OS using the Intel RAPL (Running Average Power Limit) and AMD energy counter MSRs. This replaces the current redbear-power approach of relying on P-state table power estimates (static, inaccurate) and sysfs powercap fallback (Linux-only).

Total scope: 3 phases, ~8-12 weeks with 1 developer. Prerequisite: a working MSR access path on Redox. The /scheme/sys/msr/ scheme is not implemented yet (see the P0 blocker above).


1. RAPL Architecture (what we're implementing)

1.1 Power Domains

Intel RAPL exposes energy counters for 5 domains:

| Domain | MSR (read) | Description | Typical accuracy |
|---|---|---|---|
| PKG (package) | 0x611 | Entire CPU socket/package power | ±2-5% |
| PP0 (core) | 0x639 | All CPU cores collectively | ±5-10% |
| PP1 (graphics) | 0x641 | GPU uncore (client CPUs only) | N/A on server |
| DRAM | 0x619 | DRAM controller power | ±10-20% |
| PSys (platform) | 0x64D | Entire platform (CPU + PCH + VR losses) | ±5-15% |

Not all domains are available on all CPUs. Server CPUs typically have PKG + DRAM. Client CPUs typically have PKG + PP0 + PP1. PSys is rare (some server platforms).

1.2 Energy Counter Semantics

Energy (J)  = RAW_COUNTER × ENERGY_UNIT
Energy (µJ) = Energy (J) × 10^6

where ENERGY_UNIT = 1.0 / (2 ^ exponent)   (joules per count)
      exponent   = (MSR_RAPL_POWER_UNIT >> 8) & 0x1F
  • 32-bit counter at each MSR, monotonically increasing
  • Wraps at 2^32 counts. With the 15.3 µJ unit that is 65,536 J, so roughly 655 s (~11 minutes) at a sustained 100 W; lower-power domains take proportionally longer
  • Linux mitigation: hrtimer polls every ~2.5ms to accumulate into 64-bit software counters before the 32-bit hardware counter wraps (overflow detection via max_energy_range_uj)
  • Read semantics: poll at interval Δt, compute P = ΔE/Δt
  • Low overhead: each read is a single RDMSR instruction; on Redox the scheme round trip is likely to cost more than the instruction itself
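The counter arithmetic above can be sketched in a few lines of Rust. This is a minimal illustration, not the redbear-power code: the function names (raw_to_uj, delta_raw, power_w, wrap_seconds) are hypothetical, and it assumes at most one wrap between two reads.

```rust
/// Convert a raw RAPL counter value to microjoules.
/// `energy_exponent` is bits 12:8 of the unit register.
pub fn raw_to_uj(raw: u64, energy_exponent: u8) -> f64 {
    // One count is 1 / 2^exponent joules; multiply by 1e6 for microjoules.
    raw as f64 * 1_000_000.0 / (1u64 << energy_exponent) as f64
}

/// Raw-count delta between two reads of a 32-bit counter.
/// Modular subtraction equals (2^32 - prev) + curr when curr < prev,
/// so a single wrap is handled without a branch.
pub fn delta_raw(prev: u32, curr: u32) -> u32 {
    curr.wrapping_sub(prev)
}

/// Average power in watts over an interval of `dt_secs` seconds.
pub fn power_w(prev: u32, curr: u32, energy_exponent: u8, dt_secs: f64) -> f64 {
    let joules = raw_to_uj(delta_raw(prev, curr) as u64, energy_exponent) / 1_000_000.0;
    joules / dt_secs
}

/// Seconds until the 32-bit counter wraps at a constant power draw.
pub fn wrap_seconds(energy_exponent: u8, watts: f64) -> f64 {
    raw_to_uj(1u64 << 32, energy_exponent) / 1_000_000.0 / watts
}
```

With exponent 16, wrap_seconds(16, 100.0) comes out at about 655 seconds, which bounds how long a poller may sleep before it can miss a wrap.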

1.3 Unit Register (MSR 0x606)

MSR_RAPL_POWER_UNIT (0x606):
  Bits [3:0]   — Power unit (Watts = RAW × 2^-exponent)
  Bits [12:8]  — Energy unit (Joules = RAW × 2^-exponent)  
  Bits [19:16] — Time unit (Seconds = RAW × 2^-exponent)

Default values (Sandy Bridge through Alder Lake):
  Power unit:  0.125 W  (exponent = 3)
  Energy unit: 15.3 µJ  (exponent = 16)
  Time unit:   976 µs   (exponent = 10)
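As a sketch of how the bit fields above might be decoded, the following Rust mirrors the RaplUnit struct used later in this plan. The from_msr constructor and the *_unit_* helper names are assumptions for illustration, not the existing msr.rs API.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct RaplUnit {
    pub power_exponent: u8,  // bits 3:0
    pub energy_exponent: u8, // bits 12:8
    pub time_exponent: u8,   // bits 19:16
}

impl RaplUnit {
    /// Decode the raw 64-bit value of the unit register.
    pub fn from_msr(raw: u64) -> Self {
        Self {
            power_exponent: (raw & 0xF) as u8,
            energy_exponent: ((raw >> 8) & 0x1F) as u8,
            time_exponent: ((raw >> 16) & 0xF) as u8,
        }
    }

    /// Watts per power count.
    pub fn power_unit_w(&self) -> f64 {
        1.0 / (1u64 << self.power_exponent) as f64
    }

    /// Microjoules per energy count.
    pub fn energy_unit_uj(&self) -> f64 {
        1_000_000.0 / (1u64 << self.energy_exponent) as f64
    }

    /// Seconds per time count.
    pub fn time_unit_s(&self) -> f64 {
        1.0 / (1u64 << self.time_exponent) as f64
    }
}
```

Decoding 0x000A1003 this way yields exponents 3, 16 and 10, which correspond to the 0.125 W, 15.3 µJ and 976 µs defaults listed above.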

1.4 AMD Energy Counters

AMD Zen 2+ (Family 17h+, Model 30h+) exposes RAPL-style energy counters with the same bit layout as Intel, but at AMD-specific MSR addresses:

  • MSR_AMD_RAPL_POWER_UNIT = 0xC0010299 (AMD-specific unit register)
  • Core energy counter at 0xC001029A (per-core)
  • Package energy counter at 0xC001029B
  • PKG domain always available on Zen 2+
  • Core energy available on Zen 2+
  • No DRAM or PSys on most AMD platforms
  • Unit conversion: AMD uses (raw >> 8) & 0x1F for energy exponent (same bit layout)

2. Implementation Plan

Phase 1: Core RAPL Reader (4-6 weeks)

Goal: A userspace daemon (rapld) that reads RAPL energy counters and exposes them through a scheme for consumption by redbear-power.

1.1 MSR Access Layer (local/recipes/drivers/redox-driver-sys/source/src/)

Add RAPL MSR constants and read helpers to redox-driver-sys:

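// New file: src/rapl.rs or extend src/msr.rs

use std::time::Instant; // needed by RaplDomain::last_timestamp

// Intel RAPL MSRs
pub const MSR_RAPL_POWER_UNIT: u32 = 0x606;
pub const MSR_PKG_ENERGY_STATUS: u32 = 0x611;
pub const MSR_PKG_PERF_STATUS: u32 = 0x613;
pub const MSR_PKG_POWER_INFO: u32 = 0x614;
pub const MSR_DRAM_ENERGY_STATUS: u32 = 0x619;
pub const MSR_PP0_ENERGY_STATUS: u32 = 0x639;
pub const MSR_PP1_ENERGY_STATUS: u32 = 0x641;
pub const MSR_PLATFORM_ENERGY_STATUS: u32 = 0x64D;

// AMD Zen MSRs: unit register, per-core energy, package energy
pub const MSR_AMD_RAPL_POWER_UNIT: u32 = 0xC0010299;
pub const MSR_AMD_CORE_ENERGY_STATUS: u32 = 0xC001029A;
pub const MSR_AMD_PKG_ENERGY_STATUS: u32 = 0xC001029B;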

pub struct RaplUnit {
    pub power_exponent: u8,   // bits 3:0
    pub energy_exponent: u8,  // bits 12:8
    pub time_exponent: u8,    // bits 19:16
}

pub struct RaplDomain {
    pub name: &'static str,
    pub energy_msr: u32,
    pub last_energy: u64,
    pub last_timestamp: Instant,
    pub power_w: f64,          // computed: ΔE/Δt
    pub available: bool,
}

pub struct RaplPackage {
    pub id: u32,
    pub units: RaplUnit,
    pub domains: Vec<RaplDomain>,
}

Key functions:

  • rapl_read_unit(lead_cpu: u32) -> Option<RaplUnit> — reads MSR 0x606 (or 0xC0010299 on AMD)
  • rapl_read_energy(lead_cpu: u32, msr: u32) -> Option<u64> — reads 32-bit energy counter
  • rapl_detect_domains(cpu: u32, unit: &RaplUnit) -> Vec<RaplDomain> — probe which MSRs exist
  • rapl_compute_power(domain: &mut RaplDomain, interval: Duration) -> f64 — ΔE/Δt

AMD detection: Use CPUID vendor string. If AuthenticAMD:

  • Use MSR_AMD_RAPL_POWER_UNIT instead of MSR_RAPL_POWER_UNIT
  • Package energy at MSR 0xC001029B, available on Zen 2+ (Family >= 0x17)
  • Core energy at MSR 0xC001029A (per-core), available on Zen 2+
  • DRAM/PSys typically absent

Error handling (crucial for heterogeneous hardware):

  • Attempt to read each MSR; if #GP(0) → domain unavailable → mark available = false
  • On QEMU: RAPL MSRs are typically unimplemented → graceful degradation
  • On virtualized guests: RAPL may be passed through or absent → check before assuming
  • Wrap detection: if current < previous → counter wrapped → ΔE = (2^32 - prev) + current
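The probing rule in the list above (a failed read marks the domain unavailable) could look roughly like this. The sketch assumes the MSR backend is injected as a closure returning None on any failure, including a #GP fault surfaced by the kernel or daemon; Domain and probe_domains are illustrative names only.

```rust
pub struct Domain {
    pub name: &'static str,
    pub energy_msr: u32,
    pub available: bool,
}

/// Probe each candidate energy MSR once and record whether it could be read.
/// `read_msr` stands in for whatever MSR backend is eventually chosen.
pub fn probe_domains<F>(candidates: &[(&'static str, u32)], mut read_msr: F) -> Vec<Domain>
where
    F: FnMut(u32) -> Option<u64>,
{
    candidates
        .iter()
        .map(|&(name, energy_msr)| Domain {
            name,
            energy_msr,
            // A failed read (for example #GP on QEMU) means "domain absent".
            available: read_msr(energy_msr).is_some(),
        })
        .collect()
}
```

Injecting the reader keeps the probe testable on a host without RAPL hardware: a fake closure can simulate QEMU by returning None for every address.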

1.2 rapld Daemon (local/recipes/system/rapld/)

Architecture:

rapld (userspace daemon, Rust)
├── MSR polling (every 500ms default, configurable)
├── Energy-to-power delta computation
├── Per-domain statistics (avg, max, min over sliding window)
├── Scheme registration: scheme:rapl (serves the /scheme/rapl/ paths)
└── Optional D-Bus export for desktop integration

Scheme paths:

/scheme/rapl/package/0/power_w       — current package power in watts
/scheme/rapl/package/0/energy_uj     — cumulative energy in µJ
/scheme/rapl/package/0/max_w         — max observed power
/scheme/rapl/package/0/avg_w         — average over sliding window
/scheme/rapl/core/0/power_w          — PP0 (core) power
/scheme/rapl/dram/0/power_w          — DRAM power
/scheme/rapl/units                   — unit coefficients
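The max_w and avg_w paths imply per-domain statistics over a sliding window. One way to keep them is a bounded VecDeque, sketched below. PowerWindow and its methods are hypothetical names, and the window size is left to the caller because the plan does not fix it.

```rust
use std::collections::VecDeque;

/// Fixed-capacity window of recent power samples in watts.
pub struct PowerWindow {
    samples: VecDeque<f64>,
    capacity: usize,
}

impl PowerWindow {
    pub fn new(capacity: usize) -> Self {
        let capacity = capacity.max(1); // a zero-sized window would hold nothing
        Self { samples: VecDeque::with_capacity(capacity), capacity }
    }

    /// Add a sample, evicting the oldest one once the window is full.
    pub fn push(&mut self, watts: f64) {
        if self.samples.len() == self.capacity {
            self.samples.pop_front();
        }
        self.samples.push_back(watts);
    }

    pub fn avg(&self) -> Option<f64> {
        if self.samples.is_empty() {
            None
        } else {
            Some(self.samples.iter().sum::<f64>() / self.samples.len() as f64)
        }
    }

    pub fn max(&self) -> Option<f64> {
        self.samples.iter().copied().reduce(f64::max)
    }

    pub fn min(&self) -> Option<f64> {
        self.samples.iter().copied().reduce(f64::min)
    }
}
```

Returning Option from the accessors lets the scheme handler distinguish "no samples yet" from a genuine 0 W reading.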

Polling cadence:

  • Default 500ms (matches redbear-power refresh rate)
  • Configurable via /scheme/rapl/interval_ms
  • Adaptive: can speed up during high load, slow down during idle

1.3 redbear-power Integration

Modify local/recipes/system/redbear-power/source/src/ to use rapld data:

  1. Add rapl_available flag to platform probe: try reading /scheme/rapl/package/0/power_w
  2. Replace RAPL sysfs fallback with scheme read:
    // Before: read_rapl_package_energy() → /sys/class/powercap/...
    // After:  read("/scheme/rapl/package/0/power_w") → f64
    
  3. Add per-domain display to System/Info tabs:
    • Package power: always shown
    • Core power: shown if available
    • DRAM power: shown if available
  4. Use energy_uj for more accurate power when available (avoid double-read of power_w + scheme)

Fallback chain (Linux host):

1. /scheme/rapl/package/0/power_w          (rapld — Redox native)
2. /sys/class/powercap/intel-rapl:0/energy_uj  (RAPL sysfs — current implementation)
3. MSR direct read via /dev/cpu/*/msr           (existing MSR path)
4. P-state table power estimate                  (current fallback, inaccurate)

Fallback chain (bare metal Redox):

1. /scheme/rapl/package/0/power_w    (rapld)
2. MSR direct read via /scheme/sys/msr/ (only once the P0 blocker is resolved; not implemented today)
3. P-state table power estimate
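Both fallback chains reduce to "try sources in order, stop at the first one that answers". A minimal sketch, assuming each source is identified by a name and read through a caller-supplied closure; first_available is a hypothetical helper, not an existing redbear-power function:

```rust
/// Try each named source in order and return the first one that yields a
/// power reading. `read` is only called until a source succeeds, so later,
/// less accurate sources are not touched unnecessarily.
pub fn first_available(
    names: &[&'static str],
    mut read: impl FnMut(&str) -> Option<f64>,
) -> Option<(&'static str, f64)> {
    names
        .iter()
        .find_map(|&name| read(name).map(|watts| (name, watts)))
}
```

Returning the source name alongside the value lets the UI label a reading as measured (rapld, MSR) or estimated (P-state table).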

Phase 2: Power Limiting & Thermal Integration (4-6 weeks)

Goal: Expose RAPL power limits for read/write and integrate with thermald.

2.1 Power Limit Registers

| Register / field | MSR | Bits | Description |
|---|---|---|---|
| PKG_POWER_LIMIT | 0x610 | [14:0] PL1, [46:32] PL2 | Package long/short-term power limits |
| PP0_POWER_LIMIT | 0x638 | [14:0] PL1 | Core power limit |
| PL1_ENABLE | | Bit 15 | Enable PL1 enforcement |
| PL1_CLAMP | | Bit 16 | Allow frequency clamping below OS request |
| TIME_WINDOW1 | | [23:17] | PL1 averaging window (in time units) |
| POWER_LIMIT_LOCK | | Bit 63 | Hardware lock (write-once) |
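The bit positions in the table can be decoded as follows. This is a sketch under the layout listed above; PkgPowerLimit, decode_pkg_power_limit and limit_watts are illustrative names, and the time window is left as its raw 7-bit field because its encoding is not described in this plan.

```rust
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct PkgPowerLimit {
    pub pl1_raw: u16,         // bits 14:0, in power units
    pub pl1_enabled: bool,    // bit 15
    pub pl1_clamp: bool,      // bit 16
    pub time_window1_raw: u8, // bits 23:17, still encoded
    pub pl2_raw: u16,         // bits 46:32, in power units
    pub locked: bool,         // bit 63
}

pub fn decode_pkg_power_limit(raw: u64) -> PkgPowerLimit {
    PkgPowerLimit {
        pl1_raw: (raw & 0x7FFF) as u16,
        pl1_enabled: (raw & (1 << 15)) != 0,
        pl1_clamp: (raw & (1 << 16)) != 0,
        time_window1_raw: ((raw >> 17) & 0x7F) as u8,
        pl2_raw: ((raw >> 32) & 0x7FFF) as u16,
        locked: (raw >> 63) != 0,
    }
}

/// Convert a raw limit field to watts using the power exponent
/// from the unit register (bits 3:0).
pub fn limit_watts(raw_limit: u16, power_exponent: u8) -> f64 {
    raw_limit as f64 / (1u64 << power_exponent) as f64
}
```

With the default power exponent of 3, a raw PL1 of 760 decodes to 95 W and a raw PL2 of 1000 to 125 W.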

2.2 rapld Power Limit Extension

Add scheme paths for power limit control:

/scheme/rapl/package/0/limit1_w         — PL1 (long-term) power limit in watts (RW)
/scheme/rapl/package/0/limit2_w         — PL2 (short-term) power limit in watts (RW)
/scheme/rapl/package/0/time_window1_s   — PL1 time window in seconds (RW)
/scheme/rapl/package/0/locked           — true if limits are hardware-locked (RO)

Safety guards:

  • CAP_SYS_MSR required for writes
  • Validate against PKG_POWER_INFO (max/min power, max time window)
  • Never write below THERMAL_SPEC_POWER unless explicitly forced
  • Log all limit changes
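The guards above can be expressed as a pure validation step that runs before any MSR write. The sketch assumes the PKG_POWER_INFO fields have already been converted to watts; PowerInfo, LimitError and validate_limit are hypothetical names, and the capability check and logging are left to the caller.

```rust
/// Bounds taken from PKG_POWER_INFO, already converted to watts.
pub struct PowerInfo {
    pub thermal_spec_w: f64,
    pub min_w: f64,
    pub max_w: f64,
}

#[derive(Debug, PartialEq)]
pub enum LimitError {
    Locked,
    NotFinite,
    BelowMinimum,
    AboveMaximum,
    BelowThermalSpec,
}

/// Check a requested power limit; returns the value to write on success.
/// `force` only overrides the thermal-spec floor, never the hardware bounds.
pub fn validate_limit(
    requested_w: f64,
    info: &PowerInfo,
    locked: bool,
    force: bool,
) -> Result<f64, LimitError> {
    if locked {
        return Err(LimitError::Locked);
    }
    if !requested_w.is_finite() {
        return Err(LimitError::NotFinite);
    }
    if requested_w < info.min_w {
        return Err(LimitError::BelowMinimum);
    }
    if requested_w > info.max_w {
        return Err(LimitError::AboveMaximum);
    }
    if requested_w < info.thermal_spec_w && !force {
        return Err(LimitError::BelowThermalSpec);
    }
    Ok(requested_w)
}
```

Keeping validation separate from the write path means the same checks can be unit-tested on a host with no RAPL hardware.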

2.3 thermald Integration

Modify thermald (already exists in base drivers) to:

  1. Read current package power from /scheme/rapl/package/0/power_w
  2. If power exceeds configurable threshold (default: TDP), lower P-state or trigger PL1 clamp
  3. Use energy_uj counter for accurate long-term energy accounting
  4. Expose thermal + power budget decisions via scheme

Integration point: thermald already reads /scheme/thermal for temperature. Add /scheme/rapl/ as a secondary data source for power-aware throttling.


Phase 3: redbear-power Full Integration (2-4 weeks)

Goal: Complete redbear-power RAPL integration with real-time power graphs.

3.1 Per-CPU Power Column

The current PkgW column shows P-state power estimate (or n/a). Replace with:

  CPU   Freq/MHz  PkgW(W)  CoreW(W)  DRAM(W)  Temp°C
▶ E0    2436      65.2      42.1      8.3      67

Data flow:

rapld (polls MSR 0x611 every 500ms)
  → writes to /scheme/rapl/package/0/power_w
    → redbear-power reads scheme (one syscall per refresh, not per-CPU)
      → displays single PKG value replicated across all CPU rows

Package power is socket-wide, so the value is the same for all CPU rows but the column stays for visual consistency. Core power (PP0) can be shown as a per-package aggregate or normalized per-core.

3.2 Power History Graph

Add a mini sparkline for package power (like the load sparkline):

PkgW(W) History (30s)
████▇▇▆▅▅▄▄▃ 65.2

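Store the last 30 samples of power_w in a VecDeque<f64> and render them as Unicode block characters, normalized against the maximum observed power. Note that 30 samples span 30 s only at a 1 s sampling interval; at the 500 ms default they cover 15 s, so either the sample count or the label needs adjusting.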
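A possible rendering routine for the sparkline, assuming the eight Unicode block characters from U+2581 to U+2588 and normalization against the largest sample in the slice. The sparkline function is a sketch, not the existing load-sparkline code in redbear-power.

```rust
/// Render samples as one block character each, scaled so the largest
/// sample maps to a full block. Returns an empty string for no samples.
pub fn sparkline(samples: &[f64]) -> String {
    const BLOCKS: [char; 8] = ['▁', '▂', '▃', '▄', '▅', '▆', '▇', '█'];
    let max = samples.iter().copied().fold(0.0_f64, f64::max);
    samples
        .iter()
        .map(|&s| {
            if max <= 0.0 {
                // All-zero input: draw the lowest block for every sample.
                return BLOCKS[0];
            }
            let level = ((s / max).clamp(0.0, 1.0) * 7.0).round() as usize;
            BLOCKS[level]
        })
        .collect()
}
```

Callers holding a VecDeque<f64> can pass a contiguous slice obtained with make_contiguous(), or collect into a Vec first.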

3.3 System Tab Power Summary

Add to the System tab:

Power:
  Package:  65.2 W (avg: 58.1, max: 89.4)
  Cores:    42.1 W (avg: 38.2, max: 55.0)
  DRAM:      8.3 W
  Limit PL1: 95 W (TDP)
  Limit PL2: 125 W (boost)

3. CPU Support Matrix

3.1 Intel

Generation Arch PKG PP0 PP1 DRAM PSys Notes
Sandy Bridge (2nd) SNB First RAPL implementation
Ivy Bridge (3rd) IVB
Haswell (4th) HSW Client: PKG+PP0+PP1, Server: PKG+DRAM
Broadwell (5th) BDW
Skylake (6th) SKL PSys on server only
Kaby Lake (7th) KBL
Coffee Lake (8-9th) CFL
Comet Lake (10th) CML
Ice Lake (10th) ICL
Tiger Lake (11th) TGL
Alder Lake (12th) ADL Hybrid: PKG covers P+E cores
Raptor Lake (13-14th) RPL
Meteor Lake (Ultra) MTL New SoC architecture
Sapphire Rapids SPR Server: different PL bit layout

3.2 AMD

Generation Family PKG PP0 Notes
Zen 1 (Ryzen 1000) 17h No RAPL support
Zen+ (Ryzen 2000) 17h
Zen 2 (Ryzen 3000) 17h RAPL via MSR 0xC001029B (PKG) / 0xC001029A (core)
Zen 3 (Ryzen 5000) 19h
Zen 4 (Ryzen 7000) 19h
Zen 5 (Ryzen 9000) 1Ah

AMD uses MSR_AMD_RAPL_POWER_UNIT (0xC0010299) for unit conversion instead of Intel's MSR_RAPL_POWER_UNIT (0x606). The energy counters also live at AMD-specific addresses: 0xC001029B for package energy and 0xC001029A for per-core energy.

⚠️ Critical AMD vs Intel difference: AMD reports core energy per-core (MSR 0xC001029A), while Intel reports PP0 as all-cores aggregate (MSR 0x639). This means AMD core energy must be summed across all cores to get equivalent PP0 data, while Intel returns the total directly. The amd_energy driver on Linux runs a kernel thread every 100 seconds to accumulate per-core counters into 64-bit software counters to handle 32-bit wrap-around.
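To get a PP0-equivalent figure on AMD, the per-core counters have to be summed, reading one logical CPU per physical core so SMT siblings are not counted twice. A minimal sketch, assuming the caller supplies that list of CPUs and an MSR reader closure; amd_core_energy_total_uj is a hypothetical name. It sums one snapshot; a real poller would accumulate per-core deltas to survive 32-bit wraps.

```rust
/// Sum the per-core energy counters (MSR 0xC001029A) across physical cores
/// and convert the total to microjoules.
/// `core_leaders` holds one logical CPU id per physical core.
/// Returns None if any core cannot be read.
pub fn amd_core_energy_total_uj<F>(
    core_leaders: &[u32],
    energy_exponent: u8,
    mut read_core_energy: F,
) -> Option<f64>
where
    F: FnMut(u32) -> Option<u64>,
{
    let mut total_raw: u64 = 0;
    for &cpu in core_leaders {
        // Only the low 32 bits hold the counter.
        total_raw += read_core_energy(cpu)? & 0xFFFF_FFFF;
    }
    Some(total_raw as f64 * 1_000_000.0 / (1u64 << energy_exponent) as f64)
}
```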

3.3 QEMU / Virtualized

  • QEMU: RAPL MSRs typically unimplemented → #GP(0) → graceful "n/a"
  • KVM with -overcommit cpu-pm=on: may expose RAPL to guest (rare)
  • VMware/Hyper-V: no RAPL passthrough known

Detection strategy: Try reading MSR 0x611 from CPU 0. If it returns an error (#GP fault), RAPL is unavailable. Cache this result so subsequent reads skip the MSR access entirely.
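The "probe once, then cache" strategy might be implemented like this. RaplProbe is a hypothetical type, and the reader closure stands in for an attempt to read MSR 0x611 on CPU 0 through whichever backend is available.

```rust
/// Remembers the outcome of the first availability probe.
pub struct RaplProbe {
    cached: Option<bool>,
}

impl RaplProbe {
    pub const fn new() -> Self {
        Self { cached: None }
    }

    /// Returns whether RAPL is available. `read_msr` is invoked only on the
    /// first call; later calls return the cached answer without any MSR access.
    pub fn available(&mut self, read_msr: impl FnOnce() -> Option<u64>) -> bool {
        *self.cached.get_or_insert_with(|| read_msr().is_some())
    }
}
```

The cache is deliberately sticky: once a read has failed, no further MSR access is attempted until a new RaplProbe is created, for example on daemon restart.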


4. File Layout & Implementation Order

New files to create:

local/recipes/drivers/redox-driver-sys/source/src/rapl.rs     ← RAPL MSR constants + read primitives
local/recipes/system/rapld/                                    ← New RAPL daemon
local/recipes/system/rapld/recipe.toml                        ← Cargo template, depends: redox-driver-sys
local/recipes/system/rapld/source/Cargo.toml                  
local/recipes/system/rapld/source/src/main.rs                 ← Daemon: poll MSRs, serve scheme
local/recipes/system/rapld/source/src/units.rs                ← Unit conversion (power/energy/time)
local/recipes/system/rapld/source/src/domains.rs              ← Domain detection + power computation
local/recipes/system/rapld/source/src/scheme.rs               ← scheme:rapl handler
local/docs/RAPL-IMPLEMENTATION-PLAN.md                        ← This document

local/recipes/system/redbear-power/source/src/rapl.rs         ← RAPL integration for redbear-power

Files to modify:

local/recipes/system/redbear-power/source/src/acpi.rs         ← Replace read_rapl_package_energy()
local/recipes/system/redbear-power/source/src/app.rs          ← Add rapl_domains, pkg_power_w from scheme
local/recipes/system/redbear-power/source/src/render.rs       ← Add CoreW/DRAM columns, power history
local/recipes/system/redbear-power/source/src/main.rs         ← Wire keybindings for power view
config/redbear-full.toml                                       ← Add rapld to [packages]
config/redbear-mini.toml                                       ← Optionally add rapld

Implementation order:

  1. Week 1-2: rapl.rs in redox-driver-sys — MSR constants, read/write primitives, unit parsing
  2. Week 3-4: rapld daemon — scheme server, domain detection, energy polling
  3. Week 5-6: redbear-power integration — scheme reader, per-domain columns, power history
  4. Week 7-8: Power limiting + thermald integration (Phase 2)
  5. Week 9-12: Testing on real hardware (Intel + AMD), QEMU graceful degradation, polish

5. Testing Strategy

5.1 Unit Tests (host, no hardware needed)

use std::time::{Duration, Instant};

// Test unit conversion from known MSR values
#[test]
fn test_rapl_unit_parsing() {
    // MSR_RAPL_POWER_UNIT = 0x000A1003
    // Power exponent = 3, Energy exponent = 0x10 = 16, Time exponent = 0xA = 10
    let unit = RaplUnit::from_msr(0x000A1003);
    assert_eq!(unit.power_exponent, 3);
    assert_eq!(unit.energy_exponent, 16);
    assert_eq!(unit.time_exponent, 10);
}

// Test energy-to-power conversion
#[test]
fn test_energy_delta() {
    // Derive both timestamps from one Instant so the interval is exactly 1 s.
    let t0 = Instant::now();
    let prev = (1_000_000u64, t0);
    let curr = (66_200_000u64, t0 + Duration::from_secs(1));
    // ΔE = 65_200_000 µJ (65.2 J) over 1 second = 65.2 W
    let (w, _) = rapl_power_watts(curr, prev);
    assert!((w - 65.2).abs() < 0.1);
}

// Test counter wrap
#[test]
fn test_energy_wrap() {
    let t0 = Instant::now();
    let prev = (0xFFFF_FFF0u64, t0);
    let curr = (0x0000_0010u64, t0 + Duration::from_secs(1));
    // Δraw = (2^32 - 0xFFFFFFF0) + 0x10 = 16 + 16 = 32 counts
    let (w, _) = rapl_power_watts_wrap(curr, prev);
    assert!(w > 0.0);
}

5.2 Integration Tests (QEMU)

# Test graceful degradation when RAPL unavailable (QEMU default)
make qemu CONFIG_NAME=redbear-full
# Inside guest: rapld should start, detect no RAPL, and report the domain as unavailable
cat /scheme/rapl/package/0/power_w  # Expected: "unavailable\n"

# Test with RAPL-capable host (bare metal only)
# On bare metal Intel/AMD:
cat /scheme/rapl/package/0/power_w  # Expected: "65.2\n" (or similar)

5.3 Hardware Validation Matrix

Platform CPU RAPL domains expected
Desktop Intel 12th-gen i5-12600K PKG, PP0, PP1, DRAM
Desktop Intel 13th-gen i9-13900K PKG, PP0, PP1, DRAM
Laptop Intel 11th-gen i7-1165G7 PKG, PP0, PP1, DRAM
Desktop AMD Zen 3 Ryzen 5950X PKG, PP0
Desktop AMD Zen 4 Ryzen 7950X PKG, PP0
Server Intel Xeon SPR PKG, PP0, DRAM, PSys
QEMU (any) None (graceful n/a)

6. Risks & Mitigations

| Risk | Likelihood | Mitigation |
|---|---|---|
| MSR #GP fault on unsupported CPU | High | Catch fault, mark domain unavailable, cache decision |
| Energy counter wraps between polls | Low (~11 min at 100 W with the 15.3 µJ unit) | Wrap detection in delta computation |
| Multi-socket power aggregation wrong | Medium | Per-package MSR reads on lead CPU of each package |
| AMD energy unit different from Intel | High | Auto-detect via CPUID vendor, use correct unit MSR |
| Kernel MSR scheme latency | Medium | Batch reads (read all domains in one scheme transaction) |
| Power limit writes brick hardware | Low | Validate against PKG_POWER_INFO; never write below THERMAL_SPEC |
| thermald + rapld race on MSR writes | Low | rapld owns energy reads; thermald owns limit writes; use scheme as synchronization point |

7. References

Linux Kernel Source (local/reference/linux-7.1/)

File Lines Content
include/linux/intel_rapl.h 269 Core data structures, domain/primitive enums
drivers/powercap/intel_rapl_common.c ~2000 Domain definitions, powercap zone registration, energy read
drivers/powercap/intel_rapl_msr.c 602 MSR-based RAPL access (most CPUs use this)
drivers/powercap/intel_rapl_tpmi.c ~300 TPMI-based RAPL (newer Intel platforms)
drivers/powercap/powercap_sys.c ~600 Powercap framework (sysfs interface)
arch/x86/include/asm/msr-index.h L490-522 MSR address definitions
arch/x86/events/rapl.c ~800 Perf events RAPL PMU

Intel Documentation

  • Intel SDM Volume 3, Chapter 14.9: "Running Average Power Limit (RAPL)"
  • Intel SDM Volume 4: MSR definitions (0x606, 0x610-0x619, 0x638-0x641, 0x64D)

Existing RedBear Infrastructure

File Content
local/recipes/system/redbear-power/source/src/msr.rs MSR read/write via /scheme/sys/msr/ + /dev/cpu/*/msr
local/recipes/system/redbear-power/source/src/acpi.rs read_rapl_package_energy() — current sysfs fallback
local/recipes/system/redbear-power/source/src/app.rs pkg_power_w, rapl_prev — current RAPL state
local/recipes/system/redbear-power/source/src/platform.rs Platform probe (Redox vs Linux MSR backend selection)
local/recipes/system/cpufreqd/source/src/main.rs MSR 0x199 writes via /scheme/sys/msr/
local/recipes/system/thermald/source/src/main.rs MSR 0x19C/0x1A2 via Linux /dev/cpu/*/msr only

External References

Source Key Insight
Intel SDM Vol.3B §14.9 Domain hierarchy, unit register layout, power limit semantics
Intel SDM Vol.4 All MSR addresses (0x606-0x64D) and bit-field definitions
Fuchsia RFC-0203 Microkernel energy monitoring via zx_system_energy_info() syscall — kernel owns MSR access, userspace gets cooked µJ values
Schöne et al. 2021 AMD Zen 2 RAPL accuracy analysis: package domain ±5%, 1ms update rate
rapl-read.c (deater/uarch-configure) Reference userspace RAPL reader: unit parsing, counter wrap detection
AMD amd_energy driver README Per-core accumulation thread, 100s wake interval, SMT dedup

8. Deliverables

Phase Deliverable Success Criteria
1 rapld daemon serving /scheme/rapl/ Reads PKG energy on real Intel/AMD hardware, serves power in watts
1 redbear-power reads from rapld PkgW column shows real hardware power (not n/a) on supported CPUs
1 Graceful degradation on QEMU RAPL domain "unavailable" instead of crash
2 Power limit read/write /scheme/rapl/package/0/limit1_w returns TDP; writable with CAP_SYS_MSR
2 thermald power-aware throttling thermald reads RAPL power, throttles when exceeding configurable threshold
3 redbear-power multi-domain display PKG, Core, DRAM columns with real values
3 Power history sparkline 30-sample power history visible in Per-CPU tab