Red Bear OS — Deferred GPU Features Implementation Plan
Created: 2026-06-06 · Cross-referenced: Linux 7.1 i915, Redox kernel, relibc, redox-drm
Status: Analysis complete, with concrete implementation plans for 7 deferred features
Architecture Context
Red Bear OS is a microkernel OS. GPU drivers run as userspace daemons communicating via schemes:
- scheme:memory/physical (MMIO and DMA buffer allocation)
- scheme:irq (interrupt delivery)
- scheme:pci (PCI device enumeration and config)
- scheme:firmware (firmware blob serving)
- scheme:event (inter-process event queues + eventfd)
- scheme:drm (DRM/KMS ioctl surface, our daemon)
All GPU memory is allocated from the daemon's address space via DmaBuffer::allocate(), which
is backed by scheme:memory/physical@wb?phys_contiguous. There is no kernel swap, no kswapd,
and no memory pressure notifier. Buffer sharing is currently limited to in-process PRIME
token exchange; there is no cross-process sharing path yet.
Feature 1: GEM Shrinker / Eviction
Linux Pattern
Two trigger paths:
- GTT allocation failure: Walk bound_list in scan-order LRU, skip pinned/active/scanout, evict to make room
- Memory pressure: I915_SHRINK_BOUND on purge_list (MADV_DONTNEED), I915_SHRINK_UNBOUND for unbound objects, I915_SHRINK_ACTIVE for GPU idle + context eviction
Redox Assessment
Not applicable in its Linux form. Redox has no kernel swap, no kswapd, no memory pressure callback, and no /proc/meminfo. GPU memory is direct physical allocation via scheme:memory/physical — there's no page reclaim mechanism.
Redox Alternative
Instead of a shrinker, implement a configurable hard cap with LRU eviction within the DRM daemon:
redox-drm GemManager:
├── MAX_GEM_BYTES: u64 = 256 * 1024 * 1024 (configurable via recipe.toml)
├── eviction_queue: VecDeque<(GemHandle, Instant)> // insertion-order LRU
├── on alloc when total > MAX_GEM_BYTES:
│ ├── Walk eviction_queue oldest-first
│ ├── Skip: pinned objects (fb-bound), active objects (GPU has fence)
│ ├── Drop DmaBuffer (frees physical pages)
│ └── Until total < watermark (75% of MAX_GEM_BYTES)
└── No kernel changes required
Effort: ~150 lines in gem.rs. Self-contained, no new schemes.
Priority: P1 — prevents OOM on memory-constrained systems.
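The cap-and-evict behaviour outlined above could look roughly like the following. This is a minimal sketch, not the redox-drm code: GemObject, GemHandle, evict_down_to, and the alloc signature are hypothetical names, and the real object would own a DmaBuffer whose drop frees the physical pages. The sketch evicts before inserting the new object so that the new allocation can never evict itself.

```rust
use std::collections::{HashMap, VecDeque};

/// Default cap from the plan above (256 MiB).
pub const MAX_GEM_BYTES: u64 = 256 * 1024 * 1024;

/// Hypothetical handle type; the real redox-drm handle may differ.
pub type GemHandle = u32;

/// Stand-in for a GEM object. The real object would own a DmaBuffer,
/// so removing it from the map would free its physical pages.
pub struct GemObject {
    pub size: u64,
    pub pinned: bool, // bound to a framebuffer
    pub active: bool, // GPU still holds an unsignaled fence
}

pub struct GemManager {
    max_bytes: u64,
    total: u64,
    next_handle: GemHandle,
    objects: HashMap<GemHandle, GemObject>,
    eviction_queue: VecDeque<GemHandle>, // oldest allocation at the front
}

impl GemManager {
    pub fn new(max_bytes: u64) -> Self {
        Self {
            max_bytes,
            total: 0,
            next_handle: 1,
            objects: HashMap::new(),
            eviction_queue: VecDeque::new(),
        }
    }

    pub fn total(&self) -> u64 {
        self.total
    }

    pub fn contains(&self, handle: GemHandle) -> bool {
        self.objects.contains_key(&handle)
    }

    /// Evicts oldest-first until `total <= target`, skipping pinned and active objects.
    fn evict_down_to(&mut self, target: u64) {
        let mut kept = VecDeque::with_capacity(self.eviction_queue.len());
        while let Some(handle) = self.eviction_queue.pop_front() {
            let evictable = self.total > target
                && self.objects.get(&handle).map_or(false, |o| !o.pinned && !o.active);
            if evictable {
                if let Some(obj) = self.objects.remove(&handle) {
                    self.total -= obj.size;
                }
            } else {
                kept.push_back(handle);
            }
        }
        self.eviction_queue = kept;
    }

    /// Allocates `size` bytes, evicting first if the cap would be exceeded.
    /// Returns `None` when the request cannot fit even after eviction.
    pub fn alloc(&mut self, size: u64, pinned: bool) -> Option<GemHandle> {
        if size > self.max_bytes {
            return None;
        }
        if self.total + size > self.max_bytes {
            let watermark = self.max_bytes / 4 * 3; // 75% of the cap
            self.evict_down_to(watermark.saturating_sub(size));
            if self.total + size > self.max_bytes {
                return None;
            }
        }
        let handle = self.next_handle;
        self.next_handle += 1;
        self.objects.insert(handle, GemObject { size, pinned, active: false });
        self.eviction_queue.push_back(handle);
        self.total += size;
        Some(handle)
    }
}
```

A caller would construct the manager with GemManager::new(MAX_GEM_BYTES) and treat a None from alloc as an out-of-memory error to report through the ioctl path.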
Feature 2: dma-resv / Cross-Driver Fences
Linux Pattern
Four-function minimal API:
- dma_resv_reserve_fences(obj, num): pre-allocate fence slots
- dma_resv_add_fence(obj, fence, usage): add shared/exclusive fence
- dma_resv_wait_timeout(obj, usage, intr, timeout): block until signaled
- dma_resv_test_signaled(obj, usage): non-blocking check
Fence de-duplication: same context + later-or-same seqno → replace old fence.
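The de-duplication rule can be illustrated with a small sketch. FenceId and FenceList are hypothetical names, and the usage (shared/exclusive) dimension is left out for brevity. One simplification is an assumption of this sketch rather than Linux behaviour: a fence with an older seqno on an already-tracked context is simply ignored.

```rust
/// Identifies a fence by timeline (context) and position on it (seqno).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct FenceId {
    pub context: u64,
    pub seqno: u64,
}

/// Per-object fence list holding at most one fence per context.
#[derive(Default)]
pub struct FenceList {
    fences: Vec<FenceId>,
}

impl FenceList {
    /// Adds a fence, replacing an existing fence from the same context
    /// when the new seqno is later or equal.
    pub fn add(&mut self, new: FenceId) {
        for slot in self.fences.iter_mut() {
            if slot.context == new.context {
                if new.seqno >= slot.seqno {
                    *slot = new;
                }
                // Sketch-level choice: an older seqno is dropped, since
                // the newer fence on the same timeline already covers it.
                return;
            }
        }
        self.fences.push(new);
    }

    pub fn fences(&self) -> &[FenceId] {
        &self.fences
    }
}
```

Keeping one slot per context bounds the list by the number of timelines touching the object rather than by the number of submissions.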
Redox Assessment
Partially feasible now. The scheme:event infrastructure provides the raw synchronization primitive. The SyncobjManager already has in-process fence tracking with FD interop. What's missing is cross-process sharing.
Redox Implementation
Phase 1 — In-process (feasible now, ~200 lines):
redox-drm dma_fence crate:
├── FenceContext = u64 atomic counter (dma_fence_context_alloc)
├── FenceSeqno = u64 per-context monotonic
├── FenceState: UNSIGNALED | SIGNALED | ERROR
├── FenceOps trait: get_driver_name, get_timeline_name, enable_signaling, release
└── Fence::signal() → sets SIGNALED, wakes waiters
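A possible shape for the Phase 1 fence, using only the Rust standard library, is sketched here. The FenceOps trait is omitted, and fence_context_alloc and wait_timeout are illustrative names chosen for this example; the real crate may expose a different API or block through scheme:event instead of a Condvar.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::time::Duration;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum FenceState {
    Unsignaled,
    Signaled,
    Error,
}

static NEXT_CONTEXT: AtomicU64 = AtomicU64::new(1);

/// Hands out a fresh timeline id, mirroring dma_fence_context_alloc.
pub fn fence_context_alloc() -> u64 {
    NEXT_CONTEXT.fetch_add(1, Ordering::Relaxed)
}

pub struct Fence {
    pub context: u64,
    pub seqno: u64,
    state: Mutex<FenceState>,
    waiters: Condvar,
}

impl Fence {
    pub fn new(context: u64, seqno: u64) -> Arc<Self> {
        Arc::new(Self {
            context,
            seqno,
            state: Mutex::new(FenceState::Unsignaled),
            waiters: Condvar::new(),
        })
    }

    pub fn state(&self) -> FenceState {
        *self.state.lock().unwrap()
    }

    /// Marks the fence signaled (if it was still pending) and wakes all waiters.
    pub fn signal(&self) {
        let mut state = self.state.lock().unwrap();
        if *state == FenceState::Unsignaled {
            *state = FenceState::Signaled;
        }
        drop(state);
        self.waiters.notify_all();
    }

    /// Blocks until the fence leaves Unsignaled or the timeout expires,
    /// then returns the state observed at that point.
    pub fn wait_timeout(&self, timeout: Duration) -> FenceState {
        let guard = self.state.lock().unwrap();
        let (guard, _timed_out) = self
            .waiters
            .wait_timeout_while(guard, timeout, |s| *s == FenceState::Unsignaled)
            .unwrap();
        *guard
    }
}
```

Returning the observed state rather than a boolean lets a caller distinguish a timeout (still Unsignaled) from a fence that completed with Error.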
Phase 2 — Cross-process (needs scheme:syncobj, ~500 lines):
scheme:syncobj daemon:
├── Global syncobj registry (handle → state mapping)
├── Export: daemon calls scheme:syncobj/export → gets FD
├── Import: other daemon calls scheme:syncobj/import with FD → gets local handle
├── Wait: scheme:syncobj/{handle}/wait (blocks via scheme:event)
└── Signal: scheme:syncobj/{handle}/signal
Priority: P0. Everything else (PSR, FBC, GuC submission) depends on proper fence synchronization.
Effort: Phase 1 ~200 lines, Phase 2 ~500 lines + new daemon.
Feature 3: GuC/HuC Firmware Loading
Linux Pattern
DMA engine upload sequence:
- Write DMA_ADDR_0 (source GGTT address) + DMA_ADDR_1 (WOPCM dest with DMA_ADDRESS_SPACE_WOPCM)
- Write DMA_COPY_SIZE (CSS header + uCode size)
- Write DMA_CTRL = START_DMA | flags
- Poll DMA_CTRL for START_DMA clear (timeout 100ms)
- Write SOFT_SCRATCH(n) with H2G action + params
- Write GUC_SEND_INTERRUPT (Gen9) or GEN11_GUC_HOST_INTERRUPT (Gen11+)
- Poll GUC_STATUS[16] for GS_MIA_CORE_STATE
Key registers: GUC_STATUS (0xc000), SOFT_SCRATCH(n) (0xc180+), DMA_ADDR_0/1 (0xc300-0xc30c), DMA_COPY_SIZE (0xc310), DMA_CTRL (0xc314), DMA_GUC_WOPCM_OFFSET (0xc340), GUC_WOPCM_SIZE (0xc050).
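The DMA portion of the upload sequence (the first four steps) can be sketched against an abstract MMIO trait. The register offsets come from the list above. The bit values for START_DMA and DMA_ADDRESS_SPACE_WOPCM are assumptions recalled from the Linux i915 headers and should be checked against them. The Mmio trait, FakeMmio, and the poll-count timeout are hypothetical stand-ins for this example; a real driver would bound the poll by wall-clock time (100 ms).

```rust
use std::collections::HashMap;

// Offsets from the register list above.
pub const DMA_ADDR_0_LOW: u32 = 0xc300;
pub const DMA_ADDR_0_HIGH: u32 = 0xc304;
pub const DMA_ADDR_1_LOW: u32 = 0xc308;
pub const DMA_ADDR_1_HIGH: u32 = 0xc30c;
pub const DMA_COPY_SIZE: u32 = 0xc310;
pub const DMA_CTRL: u32 = 0xc314;

// Assumed bit values (verify against the i915 headers before use).
pub const START_DMA: u32 = 1 << 0;
pub const DMA_ADDRESS_SPACE_WOPCM: u32 = 7 << 16;

/// Hypothetical MMIO abstraction; the real driver uses its own MmioRegion.
pub trait Mmio {
    fn read32(&mut self, offset: u32) -> u32;
    fn write32(&mut self, offset: u32, value: u32);
}

#[derive(Debug, PartialEq, Eq)]
pub enum UploadError {
    DmaTimeout,
}

/// Programs source, destination, and size, starts the DMA engine, and
/// polls until START_DMA clears or `max_polls` reads have been made.
pub fn guc_dma_upload<M: Mmio>(
    mmio: &mut M,
    src_ggtt: u64,
    wopcm_offset: u32,
    size: u32,
    flags: u32,
    max_polls: u32,
) -> Result<(), UploadError> {
    mmio.write32(DMA_ADDR_0_LOW, src_ggtt as u32);
    mmio.write32(DMA_ADDR_0_HIGH, ((src_ggtt >> 32) as u32) & 0xffff);
    mmio.write32(DMA_ADDR_1_LOW, wopcm_offset);
    mmio.write32(DMA_ADDR_1_HIGH, DMA_ADDRESS_SPACE_WOPCM);
    mmio.write32(DMA_COPY_SIZE, size);
    mmio.write32(DMA_CTRL, START_DMA | flags);
    for _ in 0..max_polls {
        if (mmio.read32(DMA_CTRL) & START_DMA) == 0 {
            return Ok(());
        }
    }
    Err(UploadError::DmaTimeout)
}

/// Test double: stores writes and clears START_DMA after `busy_reads` reads.
pub struct FakeMmio {
    pub regs: HashMap<u32, u32>,
    pub busy_reads: u32,
}

impl Mmio for FakeMmio {
    fn read32(&mut self, offset: u32) -> u32 {
        if offset == DMA_CTRL {
            if self.busy_reads == 0 {
                let ctrl = self.regs.entry(DMA_CTRL).or_insert(0);
                *ctrl &= !START_DMA;
            } else {
                self.busy_reads -= 1;
            }
        }
        *self.regs.get(&offset).unwrap_or(&0)
    }

    fn write32(&mut self, offset: u32, value: u32) {
        self.regs.insert(offset, value);
    }
}
```

Keeping the sequence generic over Mmio means the register programming can be exercised with FakeMmio on a development host before it touches hardware.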
Redox Assessment
Fully feasible now. All prerequisites met:
- scheme:firmware daemon already serves GPU firmware blobs
- DMC firmware already loaded via same path
- DMA engine registers known and accessible via MMIO
- GGTT mapping infrastructure exists in our driver
Redox Implementation (~300 lines)
redox-drm intel/guc.rs:
├── GucFirmware struct with mmio: Arc<MmioRegion>
├── upload(firmware: &[u8]) → parse CSS header → DMA transfer → poll GUC_STATUS
├── Wire into IntelDriver::new() after DMC upload
└── Add guc_fw_key field to info.rs device table per platform
Prerequisites:
├── Firmware blobs in /lib/firmware/i915/ (add to fetch-firmware.sh)
├── GGTT mapping of firmware blob (alloc 2MB below GUC_GGTT_TOP = 0xFEE00000)
└── WOPCM size register programmed before upload
Priority: P2. Needed for Gen9+ GPU scheduling; not required for display-only use.
Effort: ~300 lines + firmware blob packaging.
Feature 4: PSR (Panel Self Refresh)
Linux Pattern
Dual-side enable:
- Sink (via DP AUX): Write DP_PSR_EN_CFG = DP_PSR_ENABLE | link_standby | CRC verify
- Source (MMIO): Write EDP_PSR_CTL = EDP_PSR_ENABLE | idle_frames | TP times | max_sleep
PSR2 adds EDP_PSR2_CTL with selective update tracking (SU_TRACK_ENABLE, Y_COORDINATE).
Frontbuffer tracking origins: ORIGIN_CS (GPU write → exit PSR), ORIGIN_DIRTYFB (CPU dirty).
Key registers: EDP_PSR_CTL (0x60800 + transcoder*0x100), EDP_PSR_STATUS (0x60840), TRANS_EXITLINE (0x70034).
Redox Assessment
Feasible now. Prerequisites:
- eDP panel with DP_PSR_EN_CFG support in DPCD
- DMC firmware loaded (required for PSR)
- VBT timing data (tp1_wakeup_time_us, tp2_tp3_wakeup_time_us, idle_frames)
- No interlaced mode, no per-pixel alpha
Redox Implementation (~200 lines)
redox-drm intel/display_psr.rs:
├── PsrState struct with mmio, enabled: bool, psr2: bool
├── enable() → AUX write DP_PSR_EN_CFG → MMIO write EDP_PSR_CTL
├── disable() → AUX write DP_PSR_EN_CFG=0 → MMIO disable
├── flush() → AUX exit → wait vblank → re-enable
└── Wire into set_crtc (enable after modeset on eDP) and page_flip (flush)
Prerequisites:
├── eDP connector detection (already working)
├── DP AUX channel (already working)
└── DMC firmware loaded (already working)
Priority: P2. Power savings for laptop/embedded use; not needed for desktop.
Effort: ~200 lines.
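The dual-side enable and disable order from the plan can be sketched as follows. The Aux and Mmio traits and their fake implementations are hypothetical. The DPCD address 0x170 and the bit positions (DP_PSR_ENABLE as bit 0, EDP_PSR_ENABLE as bit 31, idle frames in bits 3:0) are assumptions taken from memory of the DisplayPort and i915 definitions. The register address uses the stride given in the key-registers line above. TP times, max_sleep, PSR2, and flush handling are left out.

```rust
use std::collections::HashMap;

// Assumed DPCD address and bit (check against the DP spec / drm_dp.h).
pub const DP_PSR_EN_CFG: u32 = 0x170;
pub const DP_PSR_ENABLE: u8 = 1 << 0;
// Assumed source-side bits (check against the i915 register headers).
pub const EDP_PSR_ENABLE: u32 = 1 << 31;
pub const EDP_PSR_IDLE_FRAMES_MASK: u32 = 0xf;

/// Register address using the per-transcoder stride quoted in the text.
pub fn edp_psr_ctl(transcoder: u32) -> u32 {
    0x60800 + transcoder * 0x100
}

#[derive(Debug, PartialEq, Eq)]
pub struct AuxError;

/// Hypothetical DP AUX abstraction.
pub trait Aux {
    fn dpcd_write(&mut self, addr: u32, value: u8) -> Result<(), AuxError>;
}

/// Hypothetical MMIO abstraction.
pub trait Mmio {
    fn read32(&mut self, offset: u32) -> u32;
    fn write32(&mut self, offset: u32, value: u32);
}

pub struct PsrState {
    pub transcoder: u32,
    pub enabled: bool,
}

impl PsrState {
    /// Sink first (AUX), then source (MMIO), as in the plan above.
    pub fn enable(
        &mut self,
        aux: &mut impl Aux,
        mmio: &mut impl Mmio,
        idle_frames: u32,
    ) -> Result<(), AuxError> {
        aux.dpcd_write(DP_PSR_EN_CFG, DP_PSR_ENABLE)?;
        let value = EDP_PSR_ENABLE | (idle_frames & EDP_PSR_IDLE_FRAMES_MASK);
        mmio.write32(edp_psr_ctl(self.transcoder), value);
        self.enabled = true;
        Ok(())
    }

    /// Clears the sink config, then clears the source enable bit.
    pub fn disable(&mut self, aux: &mut impl Aux, mmio: &mut impl Mmio) -> Result<(), AuxError> {
        aux.dpcd_write(DP_PSR_EN_CFG, 0)?;
        let reg = edp_psr_ctl(self.transcoder);
        let value = mmio.read32(reg);
        mmio.write32(reg, value & !EDP_PSR_ENABLE);
        self.enabled = false;
        Ok(())
    }
}

/// Test double that records every DPCD write.
#[derive(Default)]
pub struct FakeAux {
    pub writes: Vec<(u32, u8)>,
}

impl Aux for FakeAux {
    fn dpcd_write(&mut self, addr: u32, value: u8) -> Result<(), AuxError> {
        self.writes.push((addr, value));
        Ok(())
    }
}

/// Test double backed by a register map.
#[derive(Default)]
pub struct FakeMmio {
    pub regs: HashMap<u32, u32>,
}

impl Mmio for FakeMmio {
    fn read32(&mut self, offset: u32) -> u32 {
        *self.regs.get(&offset).unwrap_or(&0)
    }
    fn write32(&mut self, offset: u32, value: u32) {
        self.regs.insert(offset, value);
    }
}
```

If the AUX write fails, enable returns early and leaves the source side untouched, so the two ends never disagree about whether PSR is on.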
Feature 5: FBC (FrameBuffer Compression)
Linux Pattern
Compression trigger on primary plane commit:
- Clear FBC tags
- Program DPFC_CB_BASE = stolen memory offset (4k aligned)
- Write DPFC_CONTROL = DPFC_CTL_EN | DPFC_CTL_LIMIT_1X | DPFC_CTL_PLANE(pipe) | fence
- Poll DPFC_STATUS for FBC_STAT_COMPRESSING clear
Nuke on frontbuffer modify: rewrite DSPADDR to trigger re-compression.
Key registers: DPFC_CB_BASE (0x3200), DPFC_CONTROL (0x3208), DPFC_STATUS (0x3210), DPFC_FENCE_YOFF (0x3218). ILK+: ILK_DPFC_CONTROL(fbc_id) (0x43208).
Redox Assessment
Feasible now. Prerequisites:
- Stolen memory reservation (~2048KB at 4k alignment)
- Primary plane with linear or X-tiled buffer
- Stride 512-byte aligned (SKL+), no rotation, no interlaced
- Fence register for nuke-on-dirty
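The prerequisites above lend themselves to a pure constraint check that runs before any register is touched. The sketch below encodes only the constraints listed here (4k-aligned compressed buffer base, linear or X tiling, 512-byte stride alignment, no rotation, no interlace). FbInfo, Tiling, FbcReject, and fbc_check are hypothetical names for this example, and compressed-buffer sizing is not modelled.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Tiling {
    Linear,
    X,
    Y,
}

/// Hypothetical summary of the primary plane's framebuffer.
pub struct FbInfo {
    pub stride_bytes: u32,
    pub tiling: Tiling,
    pub rotated: bool,
    pub interlaced: bool,
}

/// Why FBC cannot be enabled for a given framebuffer.
#[derive(Debug, PartialEq, Eq)]
pub enum FbcReject {
    CfbMisaligned,
    UnsupportedTiling,
    StrideMisaligned,
    Rotated,
    Interlaced,
}

/// Returns Ok(()) when every listed constraint holds, otherwise the
/// first violated constraint.
pub fn fbc_check(fb: &FbInfo, cfb_base: u64) -> Result<(), FbcReject> {
    if cfb_base % 4096 != 0 {
        return Err(FbcReject::CfbMisaligned);
    }
    if !matches!(fb.tiling, Tiling::Linear | Tiling::X) {
        return Err(FbcReject::UnsupportedTiling);
    }
    if fb.stride_bytes % 512 != 0 {
        return Err(FbcReject::StrideMisaligned);
    }
    if fb.rotated {
        return Err(FbcReject::Rotated);
    }
    if fb.interlaced {
        return Err(FbcReject::Interlaced);
    }
    Ok(())
}
```

Returning a specific rejection reason makes it straightforward to log why FBC stayed off for a given mode.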
Redox Implementation (~200 lines)
redox-drm intel/display_fbc.rs:
├── FbcState struct with mmio, enabled: bool, cfb_base: u64
├── enable(fb_info) → check constraints → program DPFC_CB_BASE → DPFC_CONTROL
├── disable() → clear DPFC_CTL_EN
├── nuke() → rewrite DSPADDR → poll DPFC_STATUS
└── Wire into page_flip (enable on new FB, nuke on modify)
Prerequisites:
├── Stolen memory reservation in GGTT
├── Fence register setup (already have GGTT infrastructure)
└── Plane format/stride constraint checking
Priority: P3. Memory bandwidth savings; an optimization, not required for enablement.
Effort: ~200 lines.
Feature 6: DP MST (Multi-Stream Transport)
Linux Pattern
DP AUX sideband messaging for topology discovery and stream allocation:
- PATH_REPLY messages for topology enumeration
- CONNECTION_STATUS_NOTIFY for hotplug
- ALLOCATE_PAYLOAD for virtual channel allocation
- REMOTE_DPCD_READ/WRITE for remote sink access
Redox Assessment
Feasible but protocol-heavy. Prerequisites:
- DP AUX channel (already working)
- Sideband message parsing (new protocol layer)
- Topology manager (new state machine)
Redox Implementation (~500 lines)
redox-drm intel/dp_mst.rs:
├── MstTopology struct: Vec<MstPort> tree
├── MstPort: port_number, peer_device_type, dpcd_rev, mst_cap
├── enumerate() → sideband PATH_REPLY messages → build topology tree
├── allocate_stream(port, bw) → ALLOCATE_PAYLOAD message → virtual channel
└── Wire into connector detection for DP sinks with MST_CAP
Prerequisites:
├── DP AUX channel (already working)
└── Sideband message handler (new ~300 lines)
Priority: P4. Multi-monitor support; important but not urgent.
Effort: ~500 lines.
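The topology tree itself is plain data, so its shape can be sketched independently of the sideband protocol. The example below assumes each MstPort owns its downstream ports in a children vector (a field not listed in the plan) and shows a depth-first walk that yields the port-number path to every sink. The PeerDeviceType variants follow the peer device types in the DisplayPort spec as I recall them. sink_paths is a hypothetical helper.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum PeerDeviceType {
    None,
    SourceOrSst,
    MstBranching,
    SstSink,
    DpLegacyConv,
}

pub struct MstPort {
    pub port_number: u8,
    pub peer_device_type: PeerDeviceType,
    pub dpcd_rev: u8,
    pub mst_cap: bool,
    /// Downstream ports; only populated for MST branch devices.
    pub children: Vec<MstPort>,
}

/// Returns, for every sink in the tree, the list of port numbers leading
/// to it from the root, in depth-first order.
pub fn sink_paths(ports: &[MstPort]) -> Vec<Vec<u8>> {
    fn walk(port: &MstPort, prefix: &mut Vec<u8>, out: &mut Vec<Vec<u8>>) {
        prefix.push(port.port_number);
        match port.peer_device_type {
            PeerDeviceType::SstSink | PeerDeviceType::DpLegacyConv => {
                out.push(prefix.clone());
            }
            PeerDeviceType::MstBranching => {
                for child in &port.children {
                    walk(child, prefix, out);
                }
            }
            // Unconnected or upstream-facing ports carry no sink.
            PeerDeviceType::None | PeerDeviceType::SourceOrSst => {}
        }
        prefix.pop();
    }

    let mut out = Vec::new();
    let mut prefix = Vec::new();
    for port in ports {
        walk(port, &mut prefix, &mut out);
    }
    out
}
```

Each returned path could then serve as the address for a stream allocation request toward that sink.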
Feature 7: HDMI/DP Audio
Linux Pattern
Three-layer audio stack:
- HDA controller: CORB/RIRB command rings, stream descriptors, DMA engine
- ELD (EDID-Like Data): retrieved from display sink, programs audio infoframe
- Audio infoframe: HDMI/DP specific, carries channel count, sample rate, speaker allocation
Redox Assessment
Partially feasible. Prerequisites partially met:
- Intel HDA driver (ihdad) exists in local/sources/base/drivers/audio/ihdad/
- audiod mixer daemon exists with scheme:audio and scheme:audiohw
- USB Audio daemon (redbear-usbaudiod) is a stub and must be replaced with a real UAC driver
- No ALSA compatibility layer
Redox Implementation
Not recommended for immediate implementation. The audio stack needs:
- Real USB Audio Class driver (replace redbear-usbaudiod stub): ~500 lines
- Audio infoframe programming in HDMI/DP output path: ~200 lines
- ELD retrieval from display sink via DP AUX: ~100 lines
- Integration with existing audiod mixer
Priority: P5. Blocked on USB audio driver completion.
Effort: ~800 lines across multiple daemons.
Implementation Priority Matrix
| Priority | Feature | Lines | New Schemes | Prerequisites Met? | Impact |
|---|---|---|---|---|---|
| P0 | dma-fence (in-process) | 200 | None | ✅ All met | Everything depends on fences |
| P1 | GEM LRU eviction | 150 | None | ✅ All met | Prevents OOM |
| P2 | GuC firmware | 300 | None | ✅ All met | Enables Gen9+ GPU scheduling |
| P2 | PSR | 200 | None | ✅ DMC + eDP + AUX | Laptop power savings |
| P3 | FBC | 200 | None | ✅ Stolen mem + fence | Memory bandwidth savings |
| P4 | DP MST | 500 | None | ✅ DP AUX | Multi-monitor support |
| P5 | dma-fence (cross-proc) | 500 | scheme:syncobj | ❌ No cross-proc fd passing | Cross-driver sync |
| P5 | HDMI/DP audio | 800 | None (uses existing) | ❌ USB audio is stub | Audio output |
Total P0-P3 effort: ~1,050 lines (200 + 150 + 300 + 200 + 200) across 4 new modules plus the eviction changes in gem.rs. All feasible now with zero new scheme infrastructure.