
Red Bear OS — Deferred GPU Features Implementation Plan

Created: 2026-06-06 · Cross-referenced: Linux 7.1 i915, Redox kernel, relibc, redox-drm · Status: Analysis complete; concrete implementation plans for 7 deferred features

Architecture Context

Red Bear OS is a microkernel OS. GPU drivers run as userspace daemons communicating via schemes:

  • scheme:memory/physical — MMIO and DMA buffer allocation
  • scheme:irq — interrupt delivery
  • scheme:pci — PCI device enumeration and config
  • scheme:firmware — firmware blob serving
  • scheme:event — inter-process event queues + eventfd
  • scheme:drm — DRM/KMS ioctl surface (our daemon)

All GPU memory is allocated in the daemon's address space via DmaBuffer::allocate(), which is backed by scheme:memory/physical@wb?phys_contiguous. There is no kernel swap, no kswapd, and no memory pressure notifier. PRIME buffer sharing is limited to in-process token exchange; there is currently no cross-process buffer sharing.


Feature 1: GEM Shrinker / Eviction

Linux Pattern

Two trigger paths:

  • GTT allocation failure: Walk bound_list in scan-order LRU, skip pinned/active/scanout, evict to make room
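  • Memory pressure: the shrinker scans purge_list (objects marked I915_MADV_DONTNEED) before shrink_list; I915_SHRINK_BOUND, I915_SHRINK_UNBOUND and I915_SHRINK_ACTIVE select whether bound, unbound and still-active objects are eligible (active objects require GPU idle + context eviction)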

Redox Assessment

Not applicable in its Linux form. Redox has no kernel swap, no kswapd, no memory pressure callback, and no /proc/meminfo. GPU memory is allocated directly from physical pages via scheme:memory/physical, so there is no page reclaim mechanism to hook into.

Redox Alternative

Instead of a shrinker, implement a configurable hard cap with LRU eviction within the DRM daemon:

redox-drm GemManager:
├── MAX_GEM_BYTES: u64 = 256 * 1024 * 1024  (configurable via recipe.toml)
├── eviction_queue: VecDeque<(GemHandle, Instant)>  // insertion order: a FIFO approximation of LRU
├── on alloc when total > MAX_GEM_BYTES:
│   ├── Walk eviction_queue oldest-first
│   ├── Skip: pinned objects (fb-bound), active objects (GPU has fence)
│   ├── Drop DmaBuffer (frees physical pages)
│   └── Until total < watermark (75% of MAX_GEM_BYTES)
└── No kernel changes required
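A minimal Rust sketch of that eviction pass follows. It assumes a simplified per-object record with pinned/active flags and a plain u32 handle; GemHandle, GemObject, alloc() and evict_until() are hypothetical names, not the current redox-drm API, and the Instant timestamps are left out because only queue order matters here. The sketch makes room before inserting, so a new allocation can never evict itself, and it hands the object back if pinned or active buffers leave no space.

```rust
use std::collections::{HashMap, VecDeque};

/// Hypothetical handle type; redox-drm's real GemHandle may differ.
pub type GemHandle = u32;

/// Bookkeeping the eviction pass needs for each GEM object.
#[derive(Debug)]
pub struct GemObject {
    pub size: u64,
    /// Bound to a scanout framebuffer: never evicted.
    pub pinned: bool,
    /// GPU still holds an unsignaled fence: never evicted.
    pub active: bool,
}

pub struct GemManager {
    max_bytes: u64,
    total: u64,
    objects: HashMap<GemHandle, GemObject>,
    /// Oldest allocation at the front (FIFO approximation of LRU).
    eviction_queue: VecDeque<GemHandle>,
}

impl GemManager {
    pub fn new(max_bytes: u64) -> Self {
        Self { max_bytes, total: 0, objects: HashMap::new(), eviction_queue: VecDeque::new() }
    }

    pub fn total(&self) -> u64 {
        self.total
    }

    /// Low watermark: 75% of the cap.
    fn watermark(&self) -> u64 {
        self.max_bytes / 4 * 3
    }

    /// Registers a new object, evicting old ones first if the cap would be exceeded.
    /// Returns the evicted handles, or gives the object back if there is still no room.
    pub fn alloc(&mut self, handle: GemHandle, obj: GemObject) -> Result<Vec<GemHandle>, GemObject> {
        let evicted = if self.total + obj.size > self.max_bytes {
            // Evict until the new object would also fit under the watermark.
            let target = self.watermark().saturating_sub(obj.size);
            self.evict_until(target)
        } else {
            Vec::new()
        };
        if self.total + obj.size > self.max_bytes {
            return Err(obj);
        }
        self.total += obj.size;
        self.objects.insert(handle, obj);
        self.eviction_queue.push_back(handle);
        Ok(evicted)
    }

    /// Walks the queue oldest-first, dropping idle, unpinned objects until total <= target.
    fn evict_until(&mut self, target: u64) -> Vec<GemHandle> {
        let mut evicted = Vec::new();
        let mut skipped = VecDeque::new();
        while self.total > target {
            let Some(handle) = self.eviction_queue.pop_front() else { break };
            let busy = self.objects.get(&handle).map_or(false, |o| o.pinned || o.active);
            if busy {
                skipped.push_back(handle);
                continue;
            }
            if let Some(obj) = self.objects.remove(&handle) {
                // In redox-drm this is where the DmaBuffer would be dropped, freeing its pages.
                self.total -= obj.size;
                evicted.push(handle);
            }
        }
        // Skipped objects keep their place at the old end of the queue.
        skipped.append(&mut self.eviction_queue);
        self.eviction_queue = skipped;
        evicted
    }
}
```

Because eviction drops the DmaBuffer outright rather than moving it to backing storage, an evicted object's contents are gone; the sketch only returns the evicted handles so the caller can invalidate them.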

Effort: ~150 lines in gem.rs. Self-contained, no new schemes. Priority: P1 — prevents OOM on memory-constrained systems.


Feature 2: dma-resv / Cross-Driver Fences

Linux Pattern

Four-function minimal API:

  • dma_resv_reserve_fences(obj, num) — pre-allocate fence slots
  • dma_resv_add_fence(obj, fence, usage) — add a fence with a usage class (KERNEL, WRITE, READ, BOOKKEEP; older kernels called these shared/exclusive)
  • dma_resv_wait_timeout(obj, usage, intr, timeout) — block until signaled
  • dma_resv_test_signaled(obj, usage) — non-blocking check

Fence de-duplication: same context + later-or-same seqno → replace old fence.
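The de-duplication rule can be modeled in a few lines of Rust. FenceRef and Reservation are hypothetical simplifications: a fence is just (context, seqno, signaled), seqnos are compared as plain u64 values, and the usage-class check that Linux's dma_resv_add_fence() also applies is omitted. As in Linux, a slot whose fence has already signaled is reused as well.

```rust
/// Simplified fence reference: just enough state for de-duplication.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FenceRef {
    pub context: u64,
    pub seqno: u64,
    pub signaled: bool,
}

/// Simplified reservation object holding one slot per fence.
#[derive(Default)]
pub struct Reservation {
    pub fences: Vec<FenceRef>,
}

impl Reservation {
    /// Adds `fence`, replacing a slot from the same context whose seqno is not
    /// newer, or any slot whose fence has already signaled.
    pub fn add_fence(&mut self, fence: FenceRef) {
        for slot in self.fences.iter_mut() {
            let superseded = slot.context == fence.context && fence.seqno >= slot.seqno;
            if superseded || slot.signaled {
                *slot = fence;
                return;
            }
        }
        self.fences.push(fence);
    }
}
```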

Redox Assessment

Partially feasible now. The scheme:event infrastructure provides the raw synchronization primitive. The SyncobjManager already has in-process fence tracking with FD interop. What's missing is cross-process sharing.

Redox Implementation

Phase 1 — In-process (feasible now, ~200 lines):

redox-drm dma_fence crate:
├── FenceContext = u64 atomic counter (dma_fence_context_alloc)
├── FenceSeqno = u64 per-context monotonic
├── FenceState: UNSIGNALED | SIGNALED | ERROR
├── FenceOps trait: get_driver_name, get_timeline_name, enable_signaling, release
└── Fence::signal() → sets SIGNALED, wakes waiters
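A minimal sketch of the Phase 1 fence object in Rust, using a Mutex plus Condvar for waiters. The names mirror the tree above but are hypothetical; the FenceOps trait is omitted, and the blocking wait stands in for whatever scheme:event wakeup the daemon ends up using.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Condvar, Mutex};
use std::time::Duration;

/// Next unused timeline id; 0 is never handed out.
static NEXT_CONTEXT: AtomicU64 = AtomicU64::new(1);

/// Analogue of dma_fence_context_alloc(): returns a unique fence context.
pub fn fence_context_alloc() -> u64 {
    NEXT_CONTEXT.fetch_add(1, Ordering::Relaxed)
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FenceState {
    Unsignaled,
    Signaled,
    Error,
}

pub struct Fence {
    pub context: u64,
    /// Monotonic within `context`.
    pub seqno: u64,
    state: Mutex<FenceState>,
    waiters: Condvar,
}

impl Fence {
    pub fn new(context: u64, seqno: u64) -> Self {
        Self { context, seqno, state: Mutex::new(FenceState::Unsignaled), waiters: Condvar::new() }
    }

    /// Marks the fence signaled and wakes every waiter.
    pub fn signal(&self) {
        self.complete(FenceState::Signaled);
    }

    /// Completes the fence with an error (e.g. after a GPU hang) and wakes every waiter.
    pub fn set_error(&self) {
        self.complete(FenceState::Error);
    }

    /// A fence completes at most once; later calls are ignored.
    fn complete(&self, final_state: FenceState) {
        let mut state = self.state.lock().unwrap();
        if *state == FenceState::Unsignaled {
            *state = final_state;
            self.waiters.notify_all();
        }
    }

    /// Non-blocking check: true once the fence has completed either way.
    pub fn is_signaled(&self) -> bool {
        *self.state.lock().unwrap() != FenceState::Unsignaled
    }

    /// Blocks until the fence completes or `timeout` elapses.
    /// Returns the final state, or None on timeout.
    pub fn wait_timeout(&self, timeout: Duration) -> Option<FenceState> {
        let guard = self.state.lock().unwrap();
        let (state, _) = self
            .waiters
            .wait_timeout_while(guard, timeout, |s| *s == FenceState::Unsignaled)
            .unwrap();
        match *state {
            FenceState::Unsignaled => None,
            done => Some(done),
        }
    }
}
```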

Phase 2 — Cross-process (needs scheme:syncobj, ~500 lines):

scheme:syncobj daemon:
├── Global syncobj registry (handle → state mapping)
├── Export: daemon calls scheme:syncobj/export → gets FD
├── Import: other daemon calls scheme:syncobj/import with FD → gets local handle
├── Wait: scheme:syncobj/{handle}/wait (blocks via scheme:event)
└── Signal: scheme:syncobj/{handle}/signal
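On the daemon side, each of those operations arrives as a path opened under the scheme. The parser below is a hypothetical sketch of that dispatch, assuming the daemon sees the part after the scheme name (export, import, {handle}/wait, {handle}/signal) and that handles are u32; FD passing for export/import is not modeled.

```rust
/// Operations a client can request from the hypothetical scheme:syncobj daemon.
#[derive(Debug, PartialEq, Eq)]
pub enum SyncobjRequest {
    Export,
    Import,
    Wait(u32),
    Signal(u32),
}

/// Maps a path under scheme:syncobj to an operation; None means "no such path".
pub fn parse_syncobj_path(path: &str) -> Option<SyncobjRequest> {
    let path = path.trim_matches('/');
    match path {
        "export" => return Some(SyncobjRequest::Export),
        "import" => return Some(SyncobjRequest::Import),
        _ => {}
    }
    // Remaining forms are "{handle}/wait" and "{handle}/signal".
    let (handle, op) = path.split_once('/')?;
    let handle: u32 = handle.parse().ok()?;
    match op {
        "wait" => Some(SyncobjRequest::Wait(handle)),
        "signal" => Some(SyncobjRequest::Signal(handle)),
        _ => None,
    }
}
```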

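Priority: P0 for Phase 1, since everything else (PSR, FBC, GuC submission) depends on proper fence synchronization; Phase 2 is P5 in the priority matrix because cross-process FD passing is not yet available. Effort: Phase 1 ~200 lines, Phase 2 ~500 lines plus a new daemon.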


Feature 3: GuC/HuC Firmware Loading

Linux Pattern

DMA engine upload sequence:

  1. Write DMA_ADDR_0 (source GGTT address) + DMA_ADDR_1 (WOPCM dest with DMA_ADDRESS_SPACE_WOPCM)
  2. Write DMA_COPY_SIZE (CSS header + uCode size)
  3. Write DMA_CTRL = START_DMA | flags
  4. Poll DMA_CTRL for START_DMA clear (timeout 100ms)
  5. Write SOFT_SCRATCH(n) with H2G action + params
  6. Write GUC_SEND_INTERRUPT (Gen9) or GEN11_GUC_HOST_INTERRUPT (Gen11+)
  7. Poll GUC_STATUS[16] for GS_MIA_CORE_STATE

Key registers: GUC_STATUS (0xc000), SOFT_SCRATCH(n) (0xc180+), DMA_ADDR_0/1 (0xc300-0xc30c), DMA_COPY_SIZE (0xc310), DMA_CTRL (0xc314), DMA_GUC_WOPCM_OFFSET (0xc340), GUC_WOPCM_SIZE (0xc050).
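Steps 1-4 of that sequence can be sketched in Rust against a hypothetical Mmio trait (redox-drm's MmioRegion API may differ). The LOW/HIGH split of DMA_ADDR_0/1 within 0xc300-0xc30c, the START_DMA bit, the masked-register write to DMA_CTRL and the DMA_ADDRESS_SPACE_WOPCM value are assumed to follow i915 and should be checked against the PRM. The poll is bounded by an iteration count instead of the 100 ms timeout, and the H2G handshake in steps 5-7 is not shown.

```rust
// Offsets from the register list above; the 64-bit DMA addresses are split into
// LOW/HIGH halves inside the 0xc300-0xc30c range (layout assumed from i915).
const DMA_ADDR_0_LOW: u32 = 0xc300;
const DMA_ADDR_0_HIGH: u32 = 0xc304;
const DMA_ADDR_1_LOW: u32 = 0xc308;
const DMA_ADDR_1_HIGH: u32 = 0xc30c;
const DMA_COPY_SIZE: u32 = 0xc310;
const DMA_CTRL: u32 = 0xc314;
// Assumed bit values (from i915); verify against the platform PRM.
const START_DMA: u32 = 1 << 0;
const DMA_ADDRESS_SPACE_WOPCM: u32 = 7 << 16;

/// Hypothetical MMIO accessor; redox-drm's MmioRegion API may differ.
pub trait Mmio {
    fn read32(&self, reg: u32) -> u32;
    fn write32(&mut self, reg: u32, val: u32);
}

/// Masked-register write: the upper 16 bits select which lower 16 bits change.
fn masked_bit_enable(bits: u32) -> u32 {
    (bits << 16) | bits
}

#[derive(Debug, PartialEq, Eq)]
pub enum GucUploadError {
    DmaTimeout,
}

/// Steps 1-4: copy CSS header + uCode from GGTT to WOPCM, then wait for START_DMA to clear.
/// `copy_size` is the CSS header size plus the uCode size, parsed from the blob by the caller.
pub fn guc_dma_upload(
    mmio: &mut impl Mmio,
    src_ggtt: u64,
    dst_wopcm: u32,
    copy_size: u32,
    dma_flags: u32,
    max_polls: u32,
) -> Result<(), GucUploadError> {
    mmio.write32(DMA_ADDR_0_LOW, src_ggtt as u32);
    mmio.write32(DMA_ADDR_0_HIGH, (src_ggtt >> 32) as u32);
    mmio.write32(DMA_ADDR_1_LOW, dst_wopcm);
    mmio.write32(DMA_ADDR_1_HIGH, DMA_ADDRESS_SPACE_WOPCM);
    mmio.write32(DMA_COPY_SIZE, copy_size);
    mmio.write32(DMA_CTRL, masked_bit_enable(dma_flags | START_DMA));
    // Linux bounds this poll at 100 ms; a real driver should sleep between reads.
    for _ in 0..max_polls {
        if (mmio.read32(DMA_CTRL) & START_DMA) == 0 {
            return Ok(());
        }
    }
    Err(GucUploadError::DmaTimeout)
}
```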

Redox Assessment

Fully feasible now. All prerequisites met:

  • scheme:firmware daemon already serves GPU firmware blobs
  • DMC firmware already loaded via same path
  • DMA engine registers known and accessible via MMIO
  • GGTT mapping infrastructure exists in our driver

Redox Implementation (~300 lines)

redox-drm intel/guc.rs:
├── GucFirmware struct with mmio: Arc<MmioRegion>
├── upload(firmware: &[u8]) → parse CSS header → DMA transfer → poll GUC_STATUS
├── Wire into IntelDriver::new() after DMC upload
└── Add guc_fw_key field to info.rs device table per platform

Prerequisites:
├── Firmware blobs in /lib/firmware/i915/ (add to fetch-firmware.sh)
├── GGTT mapping of firmware blob (alloc 2MB below GUC_GGTT_TOP = 0xFEE00000)
└── WOPCM size register programmed before upload

Priority: P2 — needed for Gen9+ GPU scheduling. Not required for display-only. Effort: ~300 lines + firmware blob packaging.


Feature 4: PSR (Panel Self Refresh)

Linux Pattern

Dual-side enable:

  • Sink (via DP AUX): Write DP_PSR_EN_CFG = DP_PSR_ENABLE | link_standby | CRC verify
  • Source (MMIO): Write EDP_PSR_CTL = EDP_PSR_ENABLE | idle_frames | TP times | max_sleep

PSR2 adds EDP_PSR2_CTL with selective update tracking (SU_TRACK_ENABLE, Y_COORDINATE).

Frontbuffer tracking origins: ORIGIN_CS (GPU command-stream write → exit PSR), ORIGIN_DIRTYFB (userspace DIRTYFB ioctl after a CPU write).

Key registers: EDP_PSR_CTL (0x60800 + transcoder*0x1000), EDP_PSR_STATUS (0x60840 for transcoder A), TRANS_EXITLINE (0x70034).

Redox Assessment

Feasible now. Prerequisites:

  • eDP panel that advertises PSR support in its DPCD (DP_PSR_SUPPORT); DP_PSR_EN_CFG is the sink register used to turn it on
  • DMC firmware loaded (required for PSR)
  • VBT timing data (tp1_wakeup_time_us, tp2_tp3_wakeup_time_us, idle_frames)
  • No interlaced mode, no per-pixel alpha

Redox Implementation (~200 lines)

redox-drm intel/display_psr.rs:
├── PsrState struct with mmio, enabled: bool, psr2: bool
├── enable() → AUX write DP_PSR_EN_CFG → MMIO write EDP_PSR_CTL
├── disable() → MMIO clear EDP_PSR_ENABLE → wait for PSR idle → AUX write DP_PSR_EN_CFG=0 (source before sink, as in Linux)
├── flush() → AUX exit → wait vblank → re-enable
└── Wire into set_crtc (enable after modeset on eDP) and page_flip (flush)
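A sketch of the PsrState enable/disable/flush flow in Rust follows. It assumes a hypothetical PsrHw trait that bundles DPCD writes and MMIO access for one eDP port, and a caller that has already derived the extra sink bits and the source control fields (idle frames, TP times, max sleep) from DPCD and VBT. DP_PSR_EN_CFG (0x170) and DP_PSR_ENABLE come from the DisplayPort DPCD map; EDP_PSR_ENABLE as bit 31 of transcoder A's EDP_PSR_CTL follows i915 and is an assumption to verify per platform. Enable programs the sink before the source, and disable clears the source before the sink, which is the order Linux uses.

```rust
// DPCD register and bit from the DisplayPort DPCD map (also in Linux's drm_dp.h).
const DP_PSR_EN_CFG: u32 = 0x170;
const DP_PSR_ENABLE: u8 = 1 << 0;
// Transcoder A instance and enable bit, assumed from i915; verify per platform.
const EDP_PSR_CTL_A: u32 = 0x60800;
const EDP_PSR_ENABLE: u32 = 1 << 31;

/// Hypothetical hardware access for one eDP port: DP AUX plus MMIO.
pub trait PsrHw {
    fn dpcd_write(&mut self, addr: u32, val: u8);
    fn read32(&self, reg: u32) -> u32;
    fn write32(&mut self, reg: u32, val: u32);
}

pub struct PsrState {
    pub enabled: bool,
    pub psr2: bool,
    /// Extra DP_PSR_EN_CFG bits (link standby, CRC verify) chosen from the sink's DPCD.
    sink_bits: u8,
    /// EDP_PSR_CTL fields (idle frames, TP times, max sleep) derived from VBT.
    source_bits: u32,
}

impl PsrState {
    pub fn new(sink_bits: u8, source_bits: u32) -> Self {
        Self { enabled: false, psr2: false, sink_bits, source_bits }
    }

    /// Sink first, then source.
    pub fn enable(&mut self, hw: &mut impl PsrHw) {
        if self.enabled {
            return;
        }
        hw.dpcd_write(DP_PSR_EN_CFG, DP_PSR_ENABLE | self.sink_bits);
        hw.write32(EDP_PSR_CTL_A, EDP_PSR_ENABLE | self.source_bits);
        self.enabled = true;
    }

    /// Source first, then sink.
    pub fn disable(&mut self, hw: &mut impl PsrHw) {
        if !self.enabled {
            return;
        }
        let ctl = hw.read32(EDP_PSR_CTL_A);
        hw.write32(EDP_PSR_CTL_A, ctl & !EDP_PSR_ENABLE);
        // A real driver waits for EDP_PSR_STATUS to report idle here.
        hw.dpcd_write(DP_PSR_EN_CFG, 0);
        self.enabled = false;
    }

    /// Forces a PSR exit after a frontbuffer update, then re-arms the source.
    pub fn flush(&mut self, hw: &mut impl PsrHw) {
        if !self.enabled {
            return;
        }
        let ctl = hw.read32(EDP_PSR_CTL_A);
        hw.write32(EDP_PSR_CTL_A, ctl & !EDP_PSR_ENABLE);
        // A real driver waits for the next vblank before re-enabling.
        hw.write32(EDP_PSR_CTL_A, ctl | EDP_PSR_ENABLE);
    }
}
```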

Prerequisites:
├── eDP connector detection (already working)
├── DP AUX channel (already working)
└── DMC firmware loaded (already working)

Priority: P2 — power savings for laptop/embedded use. Not needed for desktop. Effort: ~200 lines.


Feature 5: FBC (FrameBuffer Compression)

Linux Pattern

Compression trigger on primary plane commit:

  1. Clear FBC tags
  2. Program DPFC_CB_BASE = stolen memory offset (4k aligned)
  3. Write DPFC_CONTROL = DPFC_CTL_EN | DPFC_CTL_LIMIT_1X | DPFC_CTL_PLANE(pipe) | fence
  4. Poll DPFC_STATUS for FBC_STAT_COMPRESSING clear

Nuke on frontbuffer modify: rewrite DSPADDR to trigger re-compression.

Key registers: DPFC_CB_BASE (0x3200), DPFC_CONTROL (0x3208), DPFC_STATUS (0x3210), DPFC_FENCE_YOFF (0x3218). ILK+: ILK_DPFC_CONTROL(fbc_id) (0x43208).

Redox Assessment

Feasible now. Prerequisites:

  • Stolen memory reservation (~2048KB at 4k alignment)
  • Primary plane with linear or X-tiled buffer
  • Stride 512-byte aligned (SKL+), no rotation, no interlaced
  • Fence register for nuke-on-dirty
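The constraint checks listed above could look roughly like this in Rust. FbInfo, Tiling and FbcReject are hypothetical types, and the rules encode only the prerequisites stated here (linear or X-tiled, 512-byte stride on SKL+, no rotation, no interlacing, 4 KiB-aligned compressed buffer); the real i915 checks are more extensive and vary by generation.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Tiling {
    Linear,
    X,
    Y,
}

/// Hypothetical summary of the framebuffer bound to the primary plane.
#[derive(Clone, Copy, Debug)]
pub struct FbInfo {
    pub stride: u32,
    pub tiling: Tiling,
    pub rotation_deg: u16,
    pub interlaced: bool,
}

#[derive(Debug, PartialEq, Eq)]
pub enum FbcReject {
    Tiling,
    Stride,
    Rotation,
    Interlaced,
    CfbAlignment,
}

/// The compressed framebuffer base in stolen memory must be 4 KiB aligned.
const CFB_ALIGN: u64 = 4096;

/// Returns Ok(()) if FBC may be enabled for `fb` with its CFB at `cfb_base`.
pub fn fbc_check(fb: &FbInfo, cfb_base: u64, skl_plus: bool) -> Result<(), FbcReject> {
    if !matches!(fb.tiling, Tiling::Linear | Tiling::X) {
        return Err(FbcReject::Tiling);
    }
    if skl_plus && fb.stride % 512 != 0 {
        return Err(FbcReject::Stride);
    }
    if fb.rotation_deg != 0 {
        return Err(FbcReject::Rotation);
    }
    if fb.interlaced {
        return Err(FbcReject::Interlaced);
    }
    if cfb_base % CFB_ALIGN != 0 {
        return Err(FbcReject::CfbAlignment);
    }
    Ok(())
}
```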

Redox Implementation (~200 lines)

redox-drm intel/display_fbc.rs:
├── FbcState struct with mmio, enabled: bool, cfb_base: u64
├── enable(fb_info) → check constraints → program DPFC_CB_BASE → DPFC_CONTROL
├── disable() → clear DPFC_CTL_EN
├── nuke() → rewrite DSPADDR → poll DPFC_STATUS
└── Wire into page_flip (enable on new FB, nuke on modify)

Prerequisites:
├── Stolen memory reservation in GGTT
├── Fence register setup (already have GGTT infrastructure)
└── Plane format/stride constraint checking

Priority: P3 — memory bandwidth savings. Optimization, not required for enablement. Effort: ~200 lines.


Feature 6: DP MST (Multi-Stream Transport)

Linux Pattern

DP AUX sideband messaging for topology discovery and stream allocation:

  • LINK_ADDRESS requests and replies for topology enumeration
  • CONNECTION_STATUS_NOTIFY for hotplug
  • ALLOCATE_PAYLOAD for virtual channel allocation
  • REMOTE_DPCD_READ/WRITE for remote sink access

Redox Assessment

Feasible but protocol-heavy. Prerequisites:

  • DP AUX channel (already working)
  • Sideband message parsing (new protocol layer)
  • Topology manager (new state machine)

Redox Implementation (~500 lines)

redox-drm intel/dp_mst.rs:
├── MstTopology struct: Vec<MstPort> tree
├── MstPort: port_number, peer_device_type, dpcd_rev, mst_cap
├── enumerate() → sideband LINK_ADDRESS requests → build topology tree
├── allocate_stream(port, bw) → ALLOCATE_PAYLOAD message → virtual channel
└── Wire into connector detection for DP sinks with MST_CAP
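Topology enumeration is a recursive walk: send LINK_ADDRESS to a branch, record its ports, and descend into every port whose peer is another MST branching unit. The Rust sketch below models the sideband transaction as a caller-supplied closure (a hypothetical stand-in for the real AUX sideband layer) and keys each port by its relative address (RAD). The peer device type values 2 (MST branching) and 3 (SST sink) follow the DisplayPort MST definitions used by Linux's drm_dp_mst_helper; depth_left is a caller-chosen guard against malformed replies.

```rust
/// Peer device types from LINK_ADDRESS replies (values as in Linux's drm_dp_mst_helper).
pub const DP_PEER_DEVICE_MST_BRANCHING: u8 = 2;
pub const DP_PEER_DEVICE_SST_SINK: u8 = 3;

/// One port entry from a parsed LINK_ADDRESS reply (hypothetical, trimmed down).
#[derive(Clone, Debug)]
pub struct PortReply {
    pub port_number: u8,
    pub peer_device_type: u8,
}

#[derive(Debug)]
pub struct MstPort {
    pub port_number: u8,
    pub peer_device_type: u8,
    /// Relative address: port numbers along the path from the root branch.
    pub rad: Vec<u8>,
    pub children: Vec<MstPort>,
}

/// Enumerates the branch at `rad`; `link_address` sends LINK_ADDRESS to that branch
/// and returns its parsed port list.
pub fn enumerate<F>(rad: &[u8], link_address: &mut F, depth_left: u8) -> Vec<MstPort>
where
    F: FnMut(&[u8]) -> Vec<PortReply>,
{
    let mut ports = Vec::new();
    for reply in link_address(rad) {
        let mut port_rad = rad.to_vec();
        port_rad.push(reply.port_number);
        // Only MST branching units have ports of their own to enumerate.
        let children = if reply.peer_device_type == DP_PEER_DEVICE_MST_BRANCHING && depth_left > 0 {
            enumerate(&port_rad, link_address, depth_left - 1)
        } else {
            Vec::new()
        };
        ports.push(MstPort {
            port_number: reply.port_number,
            peer_device_type: reply.peer_device_type,
            rad: port_rad,
            children,
        });
    }
    ports
}

/// Counts SST sinks anywhere in the tree (each one is a candidate connector).
pub fn count_sinks(ports: &[MstPort]) -> usize {
    ports
        .iter()
        .map(|p| usize::from(p.peer_device_type == DP_PEER_DEVICE_SST_SINK) + count_sinks(&p.children))
        .sum()
}
```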

Prerequisites:
├── DP AUX channel (already working)
└── Sideband message handler (new ~300 lines)

Priority: P4 — multi-monitor support. Important but not urgent. Effort: ~500 lines.


Feature 7: HDMI/DP Audio

Linux Pattern

Three-layer audio stack:

  • HDA controller: CORB/RIRB command rings, stream descriptors, DMA engine
  • ELD (EDID-Like Data): retrieved from display sink, programs audio infoframe
  • Audio infoframe: HDMI/DP specific, carries channel count, sample rate, speaker allocation
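The audio infoframe in the third layer is small enough to pack by hand. The sketch below follows the CTA-861 HDMI audio InfoFrame layout as packed by Linux's hdmi_audio_infoframe_pack(): type 0x84, version 1, length 10, a checksum byte that makes all bytes sum to zero modulo 256, the channel count minus one in PB1 and the speaker allocation in PB4. Coding type, sample rate and sample size are left at zero ("refer to stream header"); DP carries equivalent data in an SDP with a different header, which is not shown.

```rust
/// HDMI audio InfoFrame header values (CTA-861).
const AUDIO_INFOFRAME_TYPE: u8 = 0x84;
const AUDIO_INFOFRAME_VERSION: u8 = 0x01;
const AUDIO_INFOFRAME_PAYLOAD_LEN: u8 = 10;

/// Packs a 14-byte HDMI audio InfoFrame: 3 header bytes, a checksum (PB0)
/// and payload bytes PB1..PB10.
pub fn pack_audio_infoframe(channels: u8, speaker_allocation: u8) -> [u8; 14] {
    assert!((2..=8).contains(&channels), "this sketch handles 2 to 8 channels");
    let mut frame = [0u8; 14];
    frame[0] = AUDIO_INFOFRAME_TYPE;
    frame[1] = AUDIO_INFOFRAME_VERSION;
    frame[2] = AUDIO_INFOFRAME_PAYLOAD_LEN;
    // PB1: coding type (bits 7:4) left at 0 = "refer to stream header";
    // channel count field CC (bits 2:0) holds channels - 1.
    frame[4] = (channels - 1) & 0x07;
    // PB2 and PB3 stay 0, so sample rate and size come from the stream header.
    // PB4: CA, the speaker allocation code (normally derived from the sink's ELD).
    frame[7] = speaker_allocation;
    // PB0: checksum chosen so that all 14 bytes sum to 0 modulo 256.
    let sum = frame.iter().fold(0u8, |acc, &b| acc.wrapping_add(b));
    frame[3] = 0u8.wrapping_sub(sum);
    frame
}
```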

Redox Assessment

Partially feasible. Prerequisites partially met:

  • Intel HDA driver (ihdad) exists in local/sources/base/drivers/audio/ihdad/
  • audiod mixer daemon exists with scheme:audio and scheme:audiohw
  • USB Audio daemon (redbear-usbaudiod) is a stub — must be replaced with real UAC driver
  • No ALSA compatibility layer

Redox Implementation

Not recommended for immediate implementation. The audio stack needs:

  1. Real USB Audio Class driver (replace redbear-usbaudiod stub) — ~500 lines
  2. Audio infoframe programming in HDMI/DP output path — ~200 lines
  3. ELD retrieval from display sink via DP AUX — ~100 lines
  4. Integration with existing audiod mixer

Priority: P5 — blocked on USB audio driver completion. Effort: ~800 lines across multiple daemons.


Implementation Priority Matrix

| Priority | Feature | Est. lines | New schemes | Prerequisites met? | Impact |
|---|---|---|---|---|---|
| P0 | dma-fence (in-process) | 200 | None | All met | Everything depends on fences |
| P1 | GEM LRU eviction | 150 | None | All met | Prevents OOM |
| P2 | GuC firmware | 300 | None | All met | Enables Gen9+ GPU scheduling |
| P2 | PSR | 200 | None | DMC + eDP + AUX | Laptop power savings |
| P3 | FBC | 200 | None | Stolen mem + fence | Memory bandwidth savings |
| P4 | DP MST | 500 | None | DP AUX | Multi-monitor support |
| P5 | dma-fence (cross-proc) | 500 | scheme:syncobj | No cross-proc fd passing | Cross-driver sync |
| P5 | HDMI/DP audio | 800 | None (uses existing) | USB audio is stub | Audio output |

Total P0-P3 effort: ~1,050 lines (about 900 lines in 4 new modules plus ~150 lines in the existing gem.rs). All of it is feasible now with no new scheme infrastructure.