Red Bear OS: Hardware-Accelerated 3D Assessment
Date: 2026-04-16
Scope: AMD + Intel GPU hardware OpenGL/Vulkan for KDE Plasma desktop
Bottom Line
PRIME/DMA-BUF cross-process buffer sharing is now implemented at the scheme level. GEM
allocation, PRIME export/import, and zero-copy mmap via FmapBorrowed all work through the
redox-drm scheme daemon and libdrm. The remaining gaps for hardware 3D are GPU command
submission (CS ioctl), GPU fence/signaling, and Mesa hardware Gallium driver enablement.
These are tracked separately in local/docs/DMA-BUF-IMPROVEMENT-PLAN.md.
Capability Stack
Application (KDE Plasma / Qt6 / Wayland compositor)
↓
EGL / GBM / Wayland protocol
↓
Mesa (Gallium state tracker → hardware driver) ← ONLY swrast (CPU), Redox winsys scaffolding exists
↓
libdrm (userspace DRM wrapper) ← __redox__ PRIME dispatch ✅, opens /scheme/drm
↓
DRM scheme ioctls (GEM, PRIME, render) ← GEM ✅, PRIME ✅ (DmaBuf nodes), render ❌
↓
redox-drm (userspace DRM/KMS daemon) ← display ✅, buffer sharing ✅, render ❌
↓
Kernel (FmapBorrowed, sendfd, GPU interrupts) ← buffer sharing ✅, GPU fences ❌
↓
GPU hardware (AMD RDNA / Intel Gen)
Layer-by-Layer Status
1. GPU Hardware Drivers (redox-drm + amdgpu + linux-kpi)
| Component | Status | Lines | What's Implemented |
|---|---|---|---|
| DRM/KMS modesetting | ✅ Code complete | ~500 | 16 KMS ioctls, CRTC/connector/encoder/plane |
| AMD Display Core | ✅ Compiles | ~1400 | DC init, CRTC programming, firmware loading, HPD |
| Intel Display Driver | ✅ Compiles | ~800 | Display pipe, GGTT, forcewake |
| GEM buffer management | ✅ Full | ~350 | create/close/mmap with DmaBuffer |
| GEM scheme ioctls | ✅ Wired | ~100 | GEM_CREATE, GEM_CLOSE, GEM_MMAP |
| PRIME scheme ioctls | ✅ Implemented | ~120 | PRIME_HANDLE_TO_FD + PRIME_FD_TO_HANDLE via DmaBuf nodes + export refcounting |
| libdrm PRIME dispatch | ✅ Implemented | ~30 | redox wrappers: open dmabuf path + fpath-based GEM handle extraction |
| Mesa Redox winsys | 🚧 Scaffolding | ~4 files | Directory structure + stubs in src/gallium/winsys/redox/drm/ |
| Render command submission | ❌ Missing | 0 | No CS ioctl, no ring buffer programming |
| GPU context management | ❌ Missing | 0 | No context create/destroy |
| Fence/sync objects | ❌ Missing | 0 | No GPU fence signaling |
| AMD ring buffer | ⚠️ Partial | ~100 | Page flip only, no general command submission |
2. Mesa Build Configuration
| Setting | Current Value | Needed for HW 3D |
|---|---|---|
| gallium-drivers | swrast | swrast,radeonsi (AMD) or swrast,iris (Intel) |
| vulkan-drivers | swrast | swrast,amd (RADV) or swrast,intel (ANV) |
| platforms | redox | redox (same) |
| EGL | enabled | enabled (same) |
| GBM | enabled | enabled (same) |
| gallium-winsys | none (swrast doesn't need one) | New Redox winsys for radeonsi/iris |
| egl/platform_redox.c | 540 lines, Orbital-backed | Needs DRM backend for HW buffers |
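The driver rows of the table can be expressed as a small helper that produces the Meson options for each GPU family. This is only a sketch: the `GpuFamily` enum and `mesa_driver_options` function are hypothetical names, not part of the Red Bear OS recipe, and the option values are copied from the "Needed for HW 3D" column.

```rust
/// GPU families targeted by this assessment (hypothetical helper type).
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum GpuFamily {
    Amd,
    Intel,
}

/// Build the Meson driver options for a hardware-3D Mesa build.
/// swrast stays in both lists so the software fallback remains available.
pub fn mesa_driver_options(family: GpuFamily) -> Vec<String> {
    let (gallium, vulkan) = match family {
        GpuFamily::Amd => ("swrast,radeonsi", "swrast,amd"),
        GpuFamily::Intel => ("swrast,iris", "swrast,intel"),
    };
    vec![
        format!("-Dgallium-drivers={}", gallium),
        format!("-Dvulkan-drivers={}", vulkan),
        // The platform does not change between software and hardware builds.
        "-Dplatforms=redox".to_string(),
    ]
}
```

Passing `GpuFamily::Amd` yields the radeonsi/RADV combination and `GpuFamily::Intel` the iris/ANV one. Neither build is expected to succeed until a Redox winsys exists.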
3. Kernel Infrastructure
| Feature | Status | Impact |
|---|---|---|
| PCI enumeration | ✅ | GPU devices discovered |
| Memory scheme (phys mmap) | ✅ | GPU register access works |
| IRQ scheme (MSI-X) | ✅ | GPU interrupts can be delivered |
| DMA-BUF fd passing | ✅ Scheme-level | FmapBorrowed + sendfd + DmaBuf nodes enable zero-copy cross-process sharing |
| GPU fence/wait | ❌ | No GPU completion signaling |
| IOMMU/GPU page tables for imports | ❌ | Imported buffers can't be mapped into GPU GTT |
The Render Path Gap
For hardware OpenGL, the data path is:
Mesa Gallium (radeonsi)
→ libdrm open("drm:card0")
→ DRM_IOCTL_GEM_CREATE (allocate GPU buffer) ← EXISTS
→ DRM_IOCTL_PRIME_HANDLE_TO_FD (export for sharing) ← ✅ IMPLEMENTED (DmaBuf node + scheme fd)
→ DRM_IOCTL_AMDGPU_CS (submit commands to GPU) ← DOES NOT EXIST
→ fence wait (GPU completion) ← DOES NOT EXIST
→ present via KMS (PAGE_FLIP) ← EXISTS
The first two ioctl steps (GEM_CREATE and PRIME_HANDLE_TO_FD) now have full scheme support, with cross-process buffer sharing via DmaBuf scheme nodes, sendfd, and FmapBorrowed, so compositors and clients can already share GPU buffers zero-copy. The next two steps (command submission and fence wait) remain the critical gaps: without them, Mesa cannot ask the GPU to render into those buffers.
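The same data path can be written as a table of steps with their status, which makes the first blocking step easy to query. This is an illustrative model only; `Step`, `Status`, `RENDER_PATH`, and `first_blocker` are hypothetical names, and the statuses are taken from the listing above.

```rust
/// Whether a step of the render path is available in the DRM scheme today.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Status {
    Exists,
    Missing,
}

/// One step of the hardware OpenGL data path.
pub struct Step {
    pub name: &'static str,
    pub status: Status,
}

/// The ioctl-level steps from the listing above, in submission order.
pub const RENDER_PATH: [Step; 5] = [
    Step { name: "DRM_IOCTL_GEM_CREATE", status: Status::Exists },
    Step { name: "DRM_IOCTL_PRIME_HANDLE_TO_FD", status: Status::Exists },
    Step { name: "DRM_IOCTL_AMDGPU_CS", status: Status::Missing },
    Step { name: "fence wait", status: Status::Missing },
    Step { name: "KMS PAGE_FLIP", status: Status::Exists },
];

/// Return the first step that would stop a frame from being rendered.
pub fn first_blocker(path: &[Step]) -> Option<&'static str> {
    path.iter()
        .find(|step| step.status == Status::Missing)
        .map(|step| step.name)
}
```

With the statuses as listed, `first_blocker(&RENDER_PATH)` points at command submission, which is why it is treated as the next item of work.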
What Was Implemented
| Change | Before | After |
|---|---|---|
| DRM_IOCTL_GEM_CREATE | Not in scheme | Full ioctl handler: allocate GEM buffer, track ownership |
| DRM_IOCTL_GEM_CLOSE | Not in scheme | Full ioctl handler with ownership check |
| DRM_IOCTL_GEM_MMAP | Not in scheme | Full ioctl handler: return virtual address |
| DRM_IOCTL_PRIME_HANDLE_TO_FD | EOPNOTSUPP | Full implementation: opaque export tokens, prime_exports map, dmabuf fd creation |
| DRM_IOCTL_PRIME_FD_TO_HANDLE | EOPNOTSUPP | Full implementation: accepts export token (from redox_fpath), resolves via prime_exports |
| libdrm __redox__ PRIME | Not present | drmPrimeHandleToFD opens dmabuf path via export token; drmPrimeFDToHandle extracts token via redox_fpath |
| NodeKind::DmaBuf | Not present | DmaBuf node with mmap_prep returning GEM virtual address (enables FmapBorrowed) |
| gem_export_refs tracking | Not present | BTreeMap refcount for shared GEM objects, prevents premature gem_close |
| Mesa winsys scaffolding | Not present | src/gallium/winsys/redox/drm/ stub directory structure |
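The interaction between export tokens and export refcounting can be sketched as follows. This is not the redox-drm code: the `PrimeTable` type and its methods are hypothetical, and only the field names `prime_exports` and `gem_export_refs` are borrowed from the table above. It assumes tokens are simple counters and that closing a dmabuf fd releases exactly one export.

```rust
use std::collections::BTreeMap;

pub type GemHandle = u32;
pub type ExportToken = u64;

/// Hypothetical bookkeeping for PRIME exports.
#[derive(Default)]
pub struct PrimeTable {
    next_token: ExportToken,
    /// Export token -> GEM handle.
    prime_exports: BTreeMap<ExportToken, GemHandle>,
    /// GEM handle -> number of live exports.
    gem_export_refs: BTreeMap<GemHandle, usize>,
}

impl PrimeTable {
    /// PRIME_HANDLE_TO_FD analogue: hand out a fresh opaque token.
    pub fn export(&mut self, handle: GemHandle) -> ExportToken {
        self.next_token += 1;
        let token = self.next_token;
        self.prime_exports.insert(token, handle);
        *self.gem_export_refs.entry(handle).or_insert(0) += 1;
        token
    }

    /// PRIME_FD_TO_HANDLE analogue: resolve a token back to its handle.
    pub fn import(&self, token: ExportToken) -> Option<GemHandle> {
        self.prime_exports.get(&token).copied()
    }

    /// Drop one export (for example when its dmabuf fd is closed).
    /// Returns false if the token was unknown.
    pub fn release(&mut self, token: ExportToken) -> bool {
        let handle = match self.prime_exports.remove(&token) {
            Some(handle) => handle,
            None => return false,
        };
        let remaining = match self.gem_export_refs.get_mut(&handle) {
            Some(count) => {
                *count -= 1;
                *count
            }
            None => return true,
        };
        if remaining == 0 {
            self.gem_export_refs.remove(&handle);
        }
        true
    }

    /// gem_close may free the buffer only when no exports remain.
    pub fn can_free(&self, handle: GemHandle) -> bool {
        !self.gem_export_refs.contains_key(&handle)
    }
}
```

In this model a buffer exported twice stays alive until both exports are released, which is the property the refcount is there to guarantee.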
What Remains (Ordered by Dependency)
Tier 1: Can be done without kernel changes
- Mesa Gallium hardware driver enablement: Change the recipe from -Dgallium-drivers=swrast to include radeonsi or iris. This will fail to build without a winsys, but the attempt reveals the exact Mesa-side gaps.
- Redox Mesa winsys: Scaffolding exists at src/gallium/winsys/redox/drm/ (compile-time stubs). It needs a real implementation of buffer allocation, PRIME export/import, and mmap. PRIME ioctls are now implemented in redox-drm, and libdrm has __redox__ dispatch.
- libdrm Redox backend: libdrm already has __redox__ conditional handling, opens /scheme/drm, and dispatches PRIME ioctls via redox_fpath() and dmabuf path opening. The remaining gap is GPU-family-specific command submission ioctls.
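The token round trip that libdrm performs (open a dmabuf path built from an export token, then recover the token from the fd's path) can be sketched as a pair of pure functions. The path layout below is an assumption made for illustration; the actual prefix and token encoding used by redox-drm are not stated in this document, and `dmabuf_path` and `token_from_fpath` are hypothetical names.

```rust
/// Hypothetical path prefix for DmaBuf nodes; the real layout may differ.
pub const DMABUF_PREFIX: &str = "/scheme/drm/card0/dmabuf/";

/// Build the path an exporter would open for a given export token.
pub fn dmabuf_path(token: u64) -> String {
    format!("{}{:016x}", DMABUF_PREFIX, token)
}

/// Recover the export token from a path such as one returned by an
/// fpath-style lookup. Returns None for paths that are not DmaBuf nodes.
pub fn token_from_fpath(path: &str) -> Option<u64> {
    let hex = path.strip_prefix(DMABUF_PREFIX)?;
    u64::from_str_radix(hex, 16).ok()
}
```

The point of the design is that the fd itself carries the token in its path, so the importing process needs no side channel beyond the fd it received.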
Tier 2: Requires kernel work
- GPU command submission: The amdgpu and Intel drivers need ring buffer programming for 3D command submission, not just page flip. This is GPU-family-specific:
  - AMD: GFX ring, compute ring, SDMA ring
  - Intel: render ring, blitter ring
- GPU fence/signaling: After submitting commands, the kernel needs to signal completion back to userspace. This requires IRQ handling that maps GPU interrupts to fence objects.
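The two items above fit together as follows: each submission writes commands into a ring and receives a sequence number, and the interrupt path later reports the last sequence number the GPU finished. The sketch below is a software model under those assumptions, not driver code; `Ring`, `Fence`, `submit`, and `on_irq` are hypothetical names, and real hardware rings also involve doorbells, read/write pointer registers, and family-specific packet formats.

```rust
use std::collections::VecDeque;

/// A fence is just the sequence number of the submission it tracks.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Fence(pub u64);

/// Software model of a command ring with seqno-based fences.
pub struct Ring {
    buf: Vec<u32>,
    wptr: usize,                     // next dword slot the CPU writes
    in_flight: usize,                // dwords written but not yet retired
    next_seqno: u64,
    completed: u64,                  // last seqno reported done
    pending: VecDeque<(u64, usize)>, // (seqno, dword count) per submission
}

impl Ring {
    pub fn new(capacity: usize) -> Self {
        Ring {
            buf: vec![0; capacity],
            wptr: 0,
            in_flight: 0,
            next_seqno: 0,
            completed: 0,
            pending: VecDeque::new(),
        }
    }

    /// Copy commands into the ring, wrapping at the end.
    /// Returns None if the ring does not have enough free space.
    pub fn submit(&mut self, cmds: &[u32]) -> Option<Fence> {
        if cmds.is_empty() || cmds.len() > self.buf.len() - self.in_flight {
            return None;
        }
        for &dword in cmds {
            self.buf[self.wptr] = dword;
            self.wptr = (self.wptr + 1) % self.buf.len();
        }
        self.in_flight += cmds.len();
        self.next_seqno += 1;
        self.pending.push_back((self.next_seqno, cmds.len()));
        Some(Fence(self.next_seqno))
    }

    /// Interrupt path: the GPU reports the last seqno it completed.
    /// Retires every submission up to that seqno and frees its ring space.
    pub fn on_irq(&mut self, completed: u64) {
        while let Some(&(seqno, len)) = self.pending.front() {
            if seqno > completed {
                break;
            }
            self.pending.pop_front();
            self.in_flight -= len;
            self.completed = seqno;
        }
    }

    /// A fence is signaled once its seqno has been retired.
    pub fn is_signaled(&self, fence: Fence) -> bool {
        fence.0 <= self.completed
    }
}
```

A userspace fence wait would block until `is_signaled` becomes true. Mapping the real GPU interrupt to the `on_irq` call is the part the assessment lists as missing.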
Tier 3: Requires significant new code
- GTT/PPGTT population for imported buffers: When Mesa imports a DMA-BUF into the GPU, the buffer's physical pages must be mapped into the GPU's address space. Currently only internally-allocated GEM objects get GTT mappings.
- Mesa EGL platform extension: platform_redox.c currently uses Orbital for buffer management. It needs an alternative path that uses DRM GEM for hardware-accelerated surfaces.
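What "mapping an imported buffer's physical pages into the GPU's address space" involves can be sketched with a flat, single-level translation table. This is a simplified model: the `Gtt` type, the 4 KiB page size, and the single valid bit are assumptions for illustration, and real AMD and Intel page table entry formats differ and are multi-level for PPGTT.

```rust
pub const PAGE_SIZE: u64 = 4096;
/// Hypothetical PTE layout: bit 0 marks the entry valid.
const PTE_VALID: u64 = 1;

/// Flat model of a GPU translation table, one entry per GPU page.
pub struct Gtt {
    entries: Vec<u64>,
}

impl Gtt {
    pub fn new(num_pages: usize) -> Self {
        Gtt { entries: vec![0; num_pages] }
    }

    /// Map the physical pages of an imported buffer into the first free
    /// contiguous range and return the GPU address of that range.
    pub fn map_pages(&mut self, phys_pages: &[u64]) -> Option<u64> {
        let n = phys_pages.len();
        if n == 0 || n > self.entries.len() {
            return None;
        }
        let start = (0..=self.entries.len() - n).find(|&i| {
            self.entries[i..i + n].iter().all(|&e| (e & PTE_VALID) == 0)
        })?;
        for (slot, &phys) in self.entries[start..start + n].iter_mut().zip(phys_pages) {
            // Store the page-aligned physical address plus the valid bit.
            *slot = (phys & !(PAGE_SIZE - 1)) | PTE_VALID;
        }
        Some((start as u64) * PAGE_SIZE)
    }

    /// Translate a GPU address to a physical address, as the GPU would.
    pub fn translate(&self, gpu_addr: u64) -> Option<u64> {
        let pte = *self.entries.get((gpu_addr / PAGE_SIZE) as usize)?;
        if (pte & PTE_VALID) == 0 {
            return None;
        }
        Some((pte & !(PAGE_SIZE - 1)) | (gpu_addr % PAGE_SIZE))
    }
}
```

The physical pages need not be contiguous. Only the GPU-side range is, which is why an imported buffer can be mapped without copying once its page list is known.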
Estimated Effort (2 developers)
| Tier | Duration | Deliverable |
|---|---|---|
| Tier 1 (userspace) | 8-16 weeks | Mesa builds with radeonsi, winsys talks to DRM scheme |
| Tier 2 (kernel/driver) | 12-20 weeks | GPU command submission, fences, VRAM placement |
| Tier 3 (integration) | 6-12 weeks | Hardware-accelerated OpenGL applications |
| Total | 26-48 weeks | Hardware 3D on AMD |
Intel (iris) is expected to take less time to enable than AMD (radeonsi is ~6M lines vs iris ~400k), but both are equal-priority Red Bear OS targets. The order of enablement is driven by driver complexity, not platform priority.
Relationship to Other Plans
- local/docs/CONSOLE-TO-KDE-DESKTOP-PLAN.md: Phase 5 covers hardware GPU enablement
- local/docs/AMD-FIRST-INTEGRATION.md: AMD-specific GPU driver details
- local/docs/P2-AMD-GPU-DISPLAY.md: Display driver code-complete status
- docs/04-LINUX-DRIVER-COMPAT.md: linux-kpi architecture reference