diff --git a/local/docs/IRQ-AND-LOWLEVEL-CONTROLLERS-ENHANCEMENT-PLAN.md b/local/docs/IRQ-AND-LOWLEVEL-CONTROLLERS-ENHANCEMENT-PLAN.md new file mode 100644 index 00000000..cd96ce75 --- /dev/null +++ b/local/docs/IRQ-AND-LOWLEVEL-CONTROLLERS-ENHANCEMENT-PLAN.md @@ -0,0 +1,560 @@ +# Red Bear OS IRQ and Low-Level Controllers Enhancement Plan + +## Purpose + +This document assesses the current IRQ and low-level controller implementation in Red Bear OS for +completeness and quality, then defines the next enhancement plan in execution order. + +It is grounded in the current repository state, especially: + +- `local/recipes/drivers/redox-driver-sys/` +- `local/recipes/drivers/linux-kpi/` +- `local/recipes/gpu/redox-drm/` +- `local/recipes/system/iommu/` +- `recipes/core/kernel/source/src/acpi/` +- `recipes/core/base/source/drivers/acpid/` +- `local/docs/IOMMU-SPEC-REFERENCE.md` +- `local/docs/ACPI-FIXES.md` +- `docs/04-LINUX-DRIVER-COMPAT.md` + +The goal is not to restate that these pieces compile, but to separate: + +- what exists architecturally, +- what is only build-validated, +- what is runtime-validated, +- and what still needs focused enhancement work. + +## Evidence Model + +This plan uses four different evidence buckets and does **not** treat them as equivalent: + +- **Checked-in source** — what is visible directly in the current source tree. +- **Local patch state** — behavior carried by `local/patches/*` that may not be visible in the + unpacked upstream source snapshot until patches are applied. +- **Build-validated** — code or recipes compile successfully. +- **Runtime-validated** — behavior has been exercised in a real boot/runtime path. + +Where a statement depends on local patches instead of the visible source snapshot, that is called +out explicitly below. 
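The buckets above combine rather than rank: an area can be checked-in source plus build-validated without being runtime-validated. A minimal Rust sketch of that idea follows. It assumes evidence is written as `+`-joined strings such as `source + build evidence`; the type and function names (`Evidence`, `parse_evidence`, `is_runtime_proven`) are hypothetical and do not exist in the repository.

```rust
use std::collections::BTreeSet;

/// The four evidence buckets from the Evidence Model (illustrative names).
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
pub enum Evidence {
    CheckedInSource,
    LocalPatch,
    BuildValidated,
    RuntimeValidated,
}

/// Parse a cell such as "source + local patch + runtime evidence".
/// Unknown tokens are reported as errors instead of being silently dropped,
/// so a vague label cannot slip through as evidence.
pub fn parse_evidence(cell: &str) -> Result<BTreeSet<Evidence>, String> {
    let mut set = BTreeSet::new();
    for token in cell.split('+') {
        let token = token.trim().to_lowercase();
        let evidence = match token.as_str() {
            "source" | "checked-in source" => Evidence::CheckedInSource,
            "local patch" => Evidence::LocalPatch,
            "build evidence" => Evidence::BuildValidated,
            "runtime evidence" | "boot/runtime evidence" => Evidence::RuntimeValidated,
            other => return Err(format!("unknown evidence token: {other:?}")),
        };
        set.insert(evidence);
    }
    Ok(set)
}

/// A claim counts as runtime-proven only when runtime evidence is listed
/// explicitly; source, patch, and build evidence never imply it.
pub fn is_runtime_proven(set: &BTreeSet<Evidence>) -> bool {
    set.contains(&Evidence::RuntimeValidated)
}
```

Modeling the buckets as a set keeps "builds" from being read as a weaker form of "validated": the only way to satisfy `is_runtime_proven` is to list runtime evidence explicitly.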
+ +## Controller Inventory and Ownership + +| Area | Primary owner | Main entry points | Current evidence class | +|---|---|---|---| +| LAPIC / xAPIC / x2APIC | kernel | `recipes/core/kernel/source/src/acpi/madt/`, `arch/x86_shared/device/local_apic.rs` | source + local patch + boot/runtime evidence | +| IOAPIC / IRQ overrides | kernel | `recipes/core/kernel/source/src/arch/x86_shared/device/ioapic.rs`, MADT ISO parsing | source | +| Legacy PIC | kernel | `arch/x86_shared/device/pic.rs` | source | +| ACPI power/reset methods | userspace `acpid` | `recipes/core/base/source/drivers/acpid/src/acpi.rs` plus local base patch | source + local patch + runtime evidence | +| HPET / timer tables | kernel | `recipes/core/kernel/source/src/acpi/hpet.rs` | source | +| PIT fallback timer | kernel | `recipes/core/kernel/source/src/arch/x86_shared/device/mod.rs`, `pit.rs` | source | +| PCI interrupt plumbing | userspace `pcid` / driver layer | `recipes/core/base/source/drivers/pci/`, `scheme:irq`, `scheme:pci` | source + runtime evidence | +| Driver IRQ abstraction | `redox-driver-sys` | `local/recipes/drivers/redox-driver-sys/source/src/irq.rs` | source | +| Linux IRQ compatibility | `linux-kpi` | `local/recipes/drivers/linux-kpi/source/` headers | source | +| GPU MSI/MSI-X usage | `redox-drm` | `local/recipes/gpu/redox-drm/source/` | source + build evidence | +| IOMMU / interrupt remapping | `iommu` daemon | `local/recipes/system/iommu/source/src/main.rs`, `local/docs/IOMMU-SPEC-REFERENCE.md` | source + build evidence | +| Kernel serio / PS2 path | kernel `serio` + userspace `ps2d` | `recipes/core/kernel/source/src/scheme/serio.rs`, `recipes/core/base/source/drivers/input/ps2d/src/main.rs` | source | +| Input controller path | `inputd` / `evdevd` / `udev-shim` | base driver + local system recipes | source + runtime evidence | +| USB xHCI host controller | userspace `xhcid` | `recipes/core/base/source/drivers/usb/xhcid/src/main.rs` | source + build evidence | +| Port I/O / legacy 
controller access | kernel + `redox-driver-sys` | `iopl`, `io.rs`, legacy driver code | source | +| Legacy IRQ dispatch / ownership map | kernel | `recipes/core/kernel/source/src/arch/x86_shared/interrupt/irq.rs` | source | + +## Current State Summary + +### What is already in place + +Red Bear OS already has a meaningful low-level controller and interrupt foundation: + +- ACPI boot, FADT power control, visible MADT parsing for LAPIC/IOAPIC/interrupt overrides, and + HPET initialization are in place in the checked-in source. +- Additional MADT x2APIC / NMI / power-method handling exists in the local patch set and in prior + runtime validation notes, but that behavior should not be conflated with the unpatched source + snapshot. +- `redox-driver-sys` provides userspace driver primitives for MMIO, DMA, PCI access, IRQ handles, + MSI-X table mapping, and IRQ affinity control. +- `linux-kpi` exposes Linux-style IRQ, PCI, memory, and synchronization APIs on top of + `redox-driver-sys`. +- `redox-drm` already contains a shared interrupt abstraction with MSI-X-first and legacy-IRQ + fallback paths for GPU drivers. +- The AMD-Vi / Intel VT-d reference material and the in-tree `iommu` daemon establish a serious + implementation direction for IOMMU and interrupt-remapping work. + +### What is still weak + +The dominant weakness is not missing abstractions. It is missing runtime proof and uneven +controller-specific validation. + +- MSI-X support exists architecturally but is still weak on hardware validation. +- IOMMU support is specification-rich and code-rich, but still unvalidated on real hardware. +- IRQ routing quality-of-service remains primitive: raw wait handles exist, but balancing, + coalescing, and validation of affinity behavior remain thin. +- Input stacks (`inputd`, `evdevd`, `udev-shim`) now exist as a runtime substrate, but the exact + end-to-end interrupt-to-consumer path still needs sustained validation discipline. 
+- Low-level controller quality is uneven: ACPI/APIC are much further along than IOMMU, MSI-X, and + controller-specific runtime characterization. + +## Architectural Assessment + +### 1. IRQ delivery architecture + +The project’s IRQ delivery model is fundamentally sound. + +- Kernel/platform side routes interrupts through APIC/x2APIC infrastructure. +- Userspace consumes interrupts through `scheme:irq` handles. +- MSI-X vector allocation is already modeled per CPU via the IRQ scheme. + +This is the right design for Red Bear OS. The main enhancement need is validation and quality, not +an architectural rewrite. + +### 2. PCI and MSI/MSI-X + +The PCI and MSI-X model is one of the strongest parts of the current stack. + +- Config-space access exists. +- Capability parsing exists. +- MSI-X table mapping exists. +- GPU drivers already use the abstraction. + +The gap is that the repository still talks too often in “compiles” language instead of “validated on +hardware with real interrupts firing” language. + +Current runtime-proof entrypoint now present in-tree: + +- `local/scripts/test-msix-qemu.sh` — QEMU/UEFI boot path that verifies live `virtio-net` + initialization reporting `virtio: using MSI-X` + +### 3. IOMMU and interrupt remapping + +IOMMU is the most important low-level controller area that is still incomplete in practice. + +- The implementation direction is correct. +- The data structures and register model are already documented deeply. +- But the hardware-validation story is still effectively open, and current daemon discovery is still + only partially integrated: the daemon now searches common IVRS table locations automatically, but + full platform-native discovery and hardware validation are still open. +- The current QEMU path now reaches AMD-Vi unit detection and `scheme:iommu` registration without + crashing at daemon startup, but unit initialization is still deferred and real hardware validation + remains open. 
+- The current guest-driven first-use proof now reaches AMD-Vi MMIO reads in QEMU (`control=0x0`, + `status=0x0`), but still dies during the completion path with a CPU-side page fault while touching + the completion-store region. That narrows the remaining blocker to DMA mapping/page-coverage + behavior rather than to missing discovery, missing scheme wiring, or unreadable MMIO registers. + +This makes IOMMU the highest-value long-term controller enhancement area after basic MSI-X runtime +validation. + +### 4. Input/controller path + +The input/controller path is no longer missing. It is now a quality and observability problem. + +- `inputd` exists. +- `evdevd` exists. +- `udev-shim` exists. +- Phase 3 validation helpers exist. + +The enhancement task is to keep turning these from “service present” into “interrupt path proven,” +especially under real runtime scenarios. + +## Completeness Assessment by Area + +### ACPI / APIC / x2APIC + +**State**: materially complete for current platform bring-up goals. + +**Important source note**: the checked-in MADT parser in +`recipes/core/kernel/source/src/acpi/madt/mod.rs` visibly handles `LocalApic`, `IoApic`, + `IntSrcOverride`, `Gicc`, and `Gicd`. Additional x2APIC/NMI support referenced elsewhere in the + repo is currently evidenced through the local patch set and prior validation notes rather than the + plain source snapshot alone. + +Strengths: + +- MADT entries for xAPIC/x2APIC/NMI are handled (x2APIC/NMI through the local patch set, per the source note above). +- ACPI reboot/shutdown/power methods exist. +- x2APIC and SMP platform bring-up have already crossed the foundational threshold. + +Open enhancement items: + +- Better controller/runtime characterization on diverse hardware. +- Clearer documentation for what is kernel-complete versus only tested on limited platforms. + +### IOAPIC / interrupt source override routing + +**State**: present in ACPI parsing, but less explicitly validated than LAPIC/x2APIC paths. 
+ +Concrete checked-in owner: + +- `recipes/core/kernel/source/src/arch/x86_shared/device/ioapic.rs` +- `recipes/core/kernel/source/src/acpi/madt/mod.rs` + +Open enhancement items: + +- explicit validation of interrupt source overrides on more real machines +- repo-visible test notes for IOAPIC routing behavior + +### HPET / timer controller surface + +**State**: present, but still thinly characterized. + +Concrete checked-in owner: + +- `recipes/core/kernel/source/src/acpi/hpet.rs` + +Open enhancement items: + +- runtime verification beyond “initialized from ACPI” +- clearer single-HPET limitation documentation + +### PIT fallback timer path + +**State**: explicit checked-in fallback controller path. + +Concrete checked-in owner: + +- `recipes/core/kernel/source/src/arch/x86_shared/device/mod.rs` +- `recipes/core/kernel/source/src/arch/x86_shared/device/pit.rs` + +Current behavior: + +- the kernel prefers HPET when available +- if HPET initialization fails or is unavailable, it falls back to PIT +- PIT interrupt ticks currently drive timeout and scheduler timing paths + +Open enhancement items: + +- document runtime characterization of PIT-only boots +- clarify timer-source selection evidence in validation notes + +### PCI interrupt plumbing / MSI / MSI-X + +**State**: architecturally strong, validation-incomplete. + +Open enhancement items: + +- real hardware MSI-X proof for AMD and Intel GPU paths +- controller-level observability for vector allocation and affinity behavior +- testable records of fallback behavior between MSI-X and legacy IRQs + +Current runtime-validation surface now present in-tree: + +- `local/scripts/test-msix-qemu.sh` — boots a Red Bear image and confirms a live MSI-X path via + `virtio-net` log evidence in QEMU + +### IOMMU / interrupt remapping + +**State**: the biggest completeness gap. 
+ +Concrete checked-in owner: + +- `local/recipes/system/iommu/source/src/main.rs` +- `local/docs/IOMMU-SPEC-REFERENCE.md` + +Open enhancement items: + +- real AMD-Vi initialization validation +- event log and fault-path validation +- interrupt remapping validation under device load +- explicit distinction between “daemon builds” and “controller works” +- replacement of `IOMMU_IVRS_PATH`-only discovery with real system discovery/integration +- diagnosis/fix for the remaining QEMU first-use blocker where completion-store CPU access faults + even after MMIO reads and multiple completion-store placement strategies succeed structurally + +Current implementation improvement: + +- the daemon no longer depends only on `IOMMU_IVRS_PATH`; it now searches common IVRS table paths + automatically before falling back to the environment variable override +- daemon startup now defers AMD-Vi unit initialization until first scheme use, which keeps the + QEMU validation path alive long enough to prove detection plus `scheme:iommu` registration +- a guest-driven self-test path now exists (`/usr/bin/iommu --self-test-init` via + `redbear-phase-iommu-check` / `test-iommu-qemu.sh`) and proves that the remaining failure is in + runtime completion/DMA-page handling, not in daemon startup or bare MMIO readability + +### Legacy IRQ ownership and dispatch map + +**State**: explicit checked-in kernel ownership exists, but it is under-documented in higher-level +controller discussions. 
+ +Concrete checked-in owner: + +- `recipes/core/kernel/source/src/arch/x86_shared/interrupt/irq.rs` + +Current covered paths include: + +- PIT timer interrupt handling +- keyboard and mouse interrupt delivery +- serial COM1/COM2 delivery +- PIC/APIC mask, acknowledge, and EOI behavior +- spurious IRQ accounting for IRQ7 and IRQ15 + +Open enhancement items: + +- document legacy IRQ ownership and routing expectations explicitly in validation notes +- record PIC-vs-APIC runtime behavior on more hardware classes + +### Kernel `serio` / PS2 controller path + +**State**: present and important, but easy to miss if input work is described only in terms of the +later `evdevd`/`udev-shim` stack. + +Concrete checked-in owner: + +- `recipes/core/kernel/source/src/scheme/serio.rs` +- `recipes/core/base/source/drivers/input/ps2d/src/main.rs` + +Current behavior: + +- the kernel owns the serio byte queues to avoid PS/2 controller races +- `ps2d` consumes `/scheme/serio/0` and `/scheme/serio/1` +- that path then feeds the broader input producer chain + +Open enhancement items: + +- keep validation language explicit about the PS/2 path versus the later generic input stack +- add platform notes for systems that still rely on PS/2 keyboard/mouse delivery + +### USB xHCI controller interrupt path + +**State**: present, but not honestly interrupt-complete in the checked-in source. 
+ +Concrete checked-in owner: + +- `recipes/core/base/source/drivers/usb/xhcid/src/main.rs` + +Current behavior: + +- xHCI has MSI/MSI-X and legacy INTx detection logic in source +- the hardwired polling override in `xhcid` has been removed, and the driver now uses the existing + MSI-X / MSI / INTx selection logic again +- `local/scripts/test-xhci-irq-qemu.sh --check` now provides a repo-visible runtime proof path by + booting a Red Bear image in QEMU and checking the xHCI interrupt-mode log output +- `redox-driver-sys` now logs allocated MSI-X vectors so interrupt selection is more observable in + runtime logs + +Open enhancement items: + +- validate the restored interrupt path beyond early boot/logging, especially event-ring behavior +- validate the checked-in event-ring growth path under sustained runtime/device activity + +### Port I/O / legacy controller support + +**State**: exists, but under-characterized. + +Concrete current consumers/owners include: + +- legacy PIC handling in `recipes/core/kernel/source/src/arch/x86_shared/device/pic.rs` +- port-I/O wrappers in `local/recipes/drivers/redox-driver-sys/source/src/io.rs` +- ACPI reset fallback via keyboard-controller port writes in the base/acpid patch path documented in + `local/docs/ACPI-FIXES.md` + +Open enhancement items: + +- determine which real devices still need the port-I/O path +- validate that the current wrappers are sufficient for those devices + +## Quality Assessment + +### Strong points + +- The layering is correct: kernel/platform routing below, userspace schemes and driver wrappers + above. +- The repository already has serious implementation artifacts, not just speculative plans. +- The low-level controller work is documented more deeply than many higher-level desktop areas. +- ACPI and early-platform work is significantly more mature than the rest of the low-level stack. + +### Weak points + +- Validation language is still inconsistent across docs. 
“builds” and “validated” are too often + treated as adjacent states when they are not. +- IOMMU progress is easy to overread because the spec reference is detailed, but the runtime proof + and discovery story are not there yet. +- Some controller areas are rich in abstractions but poor in operator-facing validation procedures. +- Hardware-controller quality is still under-documented in terms of negative results and known + failure modes. +- Earlier summaries in the repo can blur checked-in source, local patches, and validated runtime + behavior; this document should be used to keep those categories separate. +- Broad category labels can hide concrete controller owners unless PIT, `serio`/PS2, legacy IRQ + dispatch, and xHCI are named explicitly. + +## Enhancement Priorities + +## Priority 1 — MSI-X runtime validation on real devices + +Goal: move MSI-X from “implemented abstraction” to “repeatedly proven behavior.” + +Deliverables: + +- explicit AMD GPU MSI-X validation notes +- explicit Intel GPU MSI-X validation notes +- verified fallback behavior to legacy IRQs when MSI-X is unavailable +- logged CPU/vector affinity behavior in real runs + +Why first: + +This is the lowest-level controller feature that already exists in the main runtime driver path and +blocks confidence in GPU/display work above it. + +## Priority 2 — IOMMU hardware bring-up and fault-path validation + +Goal: move IOMMU from spec-driven implementation to actual controller bring-up. + +Deliverables: + +- validated AMD-Vi daemon initialization on real hardware +- device table / command buffer / event log validation +- explicit interrupt-remapping validation notes +- negative-result documentation if hardware still fails + +Why second: + +It is the largest remaining low-level completeness gap, and it affects the safety and correctness of +userspace driver DMA. + +## Priority 3 — IRQ quality-of-service and observability + +Goal: make IRQ behavior easier to reason about in production. 
+ +Deliverables: + +- better logging/telemetry around allocated IRQs and vectors +- explicit affinity-validation procedures +- measured notes on whether current userspace IRQ wait behavior is good enough for display/input + latency needs + +Why third: + +This improves reliability without changing the underlying architecture. + +## Priority 4 — input/controller runtime proof + +Goal: continue turning the existing input substrate into a well-proven low-level controller path. + +Deliverables: + +- sustained validation of `inputd` → `evdevd` → consumer path +- documentation of real interrupt-backed input evidence, not only service existence +- explicit known limitations for consumer nodes and path expectations + +Why fourth: + +The architecture is there. What remains is proof quality. + +## Priority 5 — timer/controller characterization + +Goal: reduce uncertainty around HPET/APIC-timer behavior and controller assumptions. + +Deliverables: + +- a compact validation note for HPET behavior on real hardware +- notes on timer-controller assumptions and known limits + +Why fifth: + +Important, but less immediately blocking than MSI-X and IOMMU. + +## Priority 6 — xHCI interrupt restoration + +This is Priority 6 **within the low-level controller plan itself**, not within the repository-wide +subsystem order. At the repo-wide level, low-level controller quality remains ahead of USB/Wi-Fi/ +Bluetooth because these later subsystems depend on the controller/runtime proof work documented +here. + +Goal: move USB host-controller operation from polling back to real interrupt-driven behavior. 
+ +Deliverables: + +- restore the actual `get_int_method` path in `xhcid` +- validate MSI/MSI-X or INTx behavior for xHCI on real hardware and/or QEMU +- update docs so USB controller quality is not overstated while polling remains active + +Why sixth: + +This is a real completeness gap in an important low-level controller, but it is narrower in scope +than the cross-cutting MSI-X and IOMMU priorities above. + +## Execution Plan + +### Step A — Establish validation vocabulary in all related docs + +For every low-level controller area, use the same four states consistently: + +- builds +- boots +- validated +- experimental + +Do not mark controller infrastructure “complete” unless the claimed runtime behavior is actually +proven. + +### Step B — Add dedicated validation notes for MSI-X and IOMMU + +The project already has enough code to justify dedicated runtime-validation docs for: + +- GPU MSI-X behavior +- IOMMU bring-up and fault handling + +There is now also an in-tree generic MSI-X runtime proof helper: + +- `local/scripts/test-msix-qemu.sh` + +These should record both successful and failed hardware runs. + +### Step C — Expand runtime-proof tooling where signal is weak + +The project already has a good pattern for this in the Phase 3/4/5 validation helpers. + +Use the same pattern for low-level controllers: + +- one host-side launcher/check path +- one guest-side runtime check path +- one doc entry that records what “passing” actually means + +### Step D — Keep the controller plan separate from higher-level desktop work + +Do not let IRQ/IOMMU/controller planning get absorbed into generic Wayland/KDE roadmaps. + +Controller quality must remain measurable at its own layer. 
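Step C asks for a doc entry that records what "passing" actually means. One way to make that concrete is to define a pass as "every required marker appears in the captured guest log". The sketch below assumes that definition. The function name `missing_markers` is hypothetical, and nothing here describes how the existing `local/scripts/*.sh` helpers are implemented; the only string taken from this plan is the `virtio: using MSI-X` marker.

```rust
/// Return every required marker that never appears on any line of the log.
/// Under this sketch's definition, an empty result means the run passes.
pub fn missing_markers<'a>(log: &str, required: &[&'a str]) -> Vec<&'a str> {
    required
        .iter()
        .copied()
        .filter(|marker| !log.lines().any(|line| line.contains(*marker)))
        .collect()
}
```

A host-side check for the MSI-X path could then call `missing_markers(&captured_log, &["virtio: using MSI-X"])` and treat any non-empty result as a failure, which also gives the validation note an exact list of what was not observed.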
+ +## Recommended New Documentation Work + +The current project docs should eventually include dedicated runtime-validation companion documents +for: + +- MSI-X validation +- IOMMU bring-up and fault validation +- timer/controller characterization +- input/controller runtime evidence + +This document is the umbrella enhancement plan; those would be the execution/validation companions. + +## Current Validation Entry Points + +The following in-tree validation paths now exist and should be treated as the current controller +runtime-evidence surface: + +- `local/scripts/test-xhci-irq-qemu.sh --check` — xHCI interrupt-mode proof from QEMU boot logs +- `local/scripts/test-msix-qemu.sh` — live MSI-X proof via `virtio-net` +- `local/scripts/test-iommu-qemu.sh --check` — AMD IOMMU device visibility plus guest boot reachability +- `local/scripts/test-usb-storage-qemu.sh` — USB mass-storage autospawn probe (currently still an + active blocker path) + +## Bottom Line + +Red Bear OS does **not** need a new IRQ/controller architecture. + +It already has the correct architectural direction: + +- scheme-based userspace IRQ delivery +- safe Rust driver wrappers +- PCI/MSI-X support +- IOMMU direction +- ACPI/APIC groundwork + +What it needs now is disciplined completion work in this order: + +1. MSI-X runtime proof +2. IOMMU hardware validation +3. IRQ observability and affinity proof +4. input/controller runtime evidence +5. timer/controller characterization +6. xHCI interrupt restoration + +The main quality risk is no longer missing design. It is over-claiming readiness before low-level +controller runtime evidence exists. 
diff --git a/local/docs/PHASE-0-3-REASSESSMENT.md b/local/docs/PHASE-0-3-REASSESSMENT.md new file mode 100644 index 00000000..71c858b4 --- /dev/null +++ b/local/docs/PHASE-0-3-REASSESSMENT.md @@ -0,0 +1,264 @@ +# Red Bear OS Phase 0–3 Reassessment + +## Purpose + +This document reconciles the current public execution plan in `docs/07-RED-BEAR-OS-IMPLEMENTATION-PLAN.md` +with the older hardware-oriented roadmap in `local/docs/AMD-FIRST-INTEGRATION.md`. + +The goal is to make Phase 0 through Phase 3 readable in terms of **what is built**, **what is +boot/runtime wired**, and **what is actually validated**. + +## Validation States + +- **builds** — code or profile compiles successfully +- **boots** — image or service path reaches a usable boot/runtime state +- **validated** — behavior has been exercised with real evidence for the claimed scope +- **experimental** — available for bring-up but not support-promised + +This repo should not treat “compiles” as equivalent to “validated”. + +## Why this reassessment exists + +Two active documents describe the early Red Bear roadmap differently: + +- `docs/07-RED-BEAR-OS-IMPLEMENTATION-PLAN.md` is the canonical public execution plan. +- `local/docs/AMD-FIRST-INTEGRATION.md` is the older AMD-first technical roadmap. + +They are both useful, but they number phases differently: + +- `docs/07` uses a product-enablement framing (`Phase 1` repository/profile structure, `Phase 2` + minimal-system baseline, `Phase 3` driver/runtime substrate). +- `AMD-FIRST` uses a hardware-enablement framing (`P0` ACPI boot, `P1` driver infrastructure, + `P2` AMD display, `P3` input + POSIX). + +This document is the bridge for Phase 0–3 discussions. 
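Because the two roadmaps reuse the same phase numbers for different scopes, a bare "Phase 3" is ambiguous. The following Rust sketch shows one way to force every phase reference to name its roadmap. The labels are copied from the framing described above; the `Roadmap` enum and both functions are hypothetical helpers, not existing repository code.

```rust
/// Which roadmap a phase number comes from.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Roadmap {
    /// `docs/07-RED-BEAR-OS-IMPLEMENTATION-PLAN.md`, the canonical public plan.
    Docs07,
    /// `local/docs/AMD-FIRST-INTEGRATION.md`, the older hardware roadmap.
    AmdFirst,
}

/// Framing label for a phase number, or `None` where the text above names none.
pub fn phase_label(roadmap: Roadmap, phase: u8) -> Option<&'static str> {
    match (roadmap, phase) {
        (Roadmap::Docs07, 1) => Some("repository/profile structure"),
        (Roadmap::Docs07, 2) => Some("minimal-system baseline"),
        (Roadmap::Docs07, 3) => Some("driver/runtime substrate"),
        (Roadmap::AmdFirst, 0) => Some("ACPI boot"),
        (Roadmap::AmdFirst, 1) => Some("driver infrastructure"),
        (Roadmap::AmdFirst, 2) => Some("AMD display"),
        (Roadmap::AmdFirst, 3) => Some("input + POSIX"),
        _ => None,
    }
}

/// Render an unambiguous reference, e.g. "docs/07 Phase 3 (driver/runtime substrate)".
pub fn qualified(roadmap: Roadmap, phase: u8) -> Option<String> {
    let label = phase_label(roadmap, phase)?;
    Some(match roadmap {
        Roadmap::Docs07 => format!("docs/07 Phase {phase} ({label})"),
        Roadmap::AmdFirst => format!("AMD-FIRST P{phase} ({label})"),
    })
}
```

Returning `None` for `(Docs07, 0)` mirrors the fact that the public plan has no Phase 0; the sketch deliberately does not guess a mapping the text does not state.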
+ +## Phase 0 — Bare-metal boot and ACPI baseline + +### Source of truth + +- `local/docs/AMD-FIRST-INTEGRATION.md` +- Root `AGENTS.md` status summary + +### Scope + +- AMD bare-metal bootability +- ACPI checksums and table handling +- shutdown/reboot/power-method support +- SMP/x2APIC-era platform readiness + +### Current status + +- **builds** — yes +- **boots** — yes +- **validated** — yes, at the platform/boot level described in the AMD-first notes + +### Notes + +Phase 0 is not part of the public `docs/07` numbering, but it remains a real prerequisite in the +AMD-first implementation history and should stay visible when discussing early Red Bear progress. + +## Phase 1 — Repository discipline and profile reproducibility + +### Source of truth + +- `docs/07-RED-BEAR-OS-IMPLEMENTATION-PLAN.md` +- `local/docs/repo-governance.md` +- `local/docs/PROFILE-MATRIX.md` + +### Scope + +- tracked profile definitions +- shared config fragments instead of duplicated wiring +- helper scripts aligned with tracked profiles +- support-language and validation-language rules + +### Current status + +- **builds** — yes +- **boots** — indirectly supported by later profile builds +- **validated** — partially, in the sense that `redbear-minimal` and `redbear-desktop` were used as + reproducibility targets during the Phase 1 cleanup + +### Implemented evidence + +- `config/redbear-*.toml` shared fragment refactor +- `local/docs/repo-governance.md` +- `local/docs/PROFILE-MATRIX.md` +- `local/scripts/build-redbear.sh` profile coverage updates + +### Remaining caution + +Phase 1 is structurally in good shape, but support labels still need to be used consistently in +phase-level docs. 
+ +## Phase 2 — Minimal-system baseline + +### Source of truth + +- `docs/07-RED-BEAR-OS-IMPLEMENTATION-PLAN.md` +- `local/docs/NETWORKING-RTL8125-NETCTL.md` +- `local/docs/REDBEAR-INFO-RUNTIME-REPORT.md` + +### Scope + +- bootable minimal profile +- package-management baseline +- VM networking baseline + +### Current status + +- **builds** — yes +- **boots** — helper and validation surfaces now exist for the VM path +- **validated** — partially; the repo now has explicit validation helpers, but this still needs + continued real runtime use to graduate from baseline bring-up to stronger support claims + +### Implemented evidence + +- `redbear-minimal` enables `wired-dhcp` by default +- `redbear-info` reports VirtIO VM networking visibility +- `local/scripts/validate-vm-network-baseline.sh` +- `local/scripts/test-vm-network-qemu.sh` +- `local/scripts/test-vm-network-runtime.sh` + +### Remaining caution + +Phase 2 should continue to be described as a **baseline**. It now has build-time, launch-time, and +runtime check paths, but that is still not the same as broad hardware validation. + +## Phase 3 — Driver and runtime substrate + +### Source of truth + +- `docs/07-RED-BEAR-OS-IMPLEMENTATION-PLAN.md` +- `local/docs/AMD-FIRST-INTEGRATION.md` + +### Correct framing + +The public plan's wording is the correct top-level framing: + +> **Driver and runtime substrate** + +The AMD-first wording remains useful as a lower-level technical breakdown: + +> **Input + POSIX** + +These are not competing scopes. The second explains the concrete components that fulfill the first. 
+ +### Scope + +- shared driver substrate already built in-tree +- firmware loading available as runtime infrastructure +- input/runtime prerequisites such as `evdevd` and `udev-shim` +- relibc POSIX surfaces required by downstream consumers + +### Current status + +- **builds** — yes for the major in-tree Phase 3 components +- **boots** — partially wired via profile/service configuration +- **validated** — not yet at the level needed to call the substrate runtime-proven end to end + +### Built evidence already in tree + +- `local/recipes/drivers/redox-driver-sys/` +- `local/recipes/drivers/linux-kpi/` +- `local/recipes/system/firmware-loader/` +- `local/recipes/system/evdevd/` +- `local/recipes/system/udev-shim/` +- `local/patches/relibc/P3-*.patch` + +### Real remaining work + +The main remaining Phase 3 task is not “invent the substrate” — it already exists in-tree. The +real gap is **runtime and downstream-consumer validation**: + +- prove the relibc POSIX surfaces against actual consumers +- prove the input path from Redox input sources through `evdevd` and `udev-shim` +- keep Phase 3 distinct from later graphics/Wayland/KDE work + +### Current runtime-validation helpers + +- `./local/scripts/test-phase3-runtime-substrate.sh` — in-guest runtime check for + `firmware-loader`, `udev-shim`, `evdevd`, and their scheme surfaces +- `redbear-info --verbose` — passive runtime evidence for installed/active integrations + +### Runtime evidence gathered during reassessment + +- `redbear-desktop` was booted successfully in QEMU with x86_64 UEFI firmware and reached a real + login prompt over the serial console. +- `pcid-spawner` successfully spawned `virtio-netd` during the guest boot sequence. +- `firmware-loader` registered `scheme:firmware` without crashing, even with an empty + `/usr/firmware/` directory. +- `evdevd` registered `scheme:evdev` and `udev-shim` registered `scheme:udev` during the same + guest boot. 
+- `redbear-info --json` inside the guest reported `virtio_net_present: true`, a configured + `eth0` address, and live firmware/udev integration evidence. + +## Recommended interpretation going forward + +When discussing the roadmap publicly: + +- use `docs/07` phase numbering as canonical +- treat `AMD-FIRST` phase numbering as historical hardware-roadmap context +- always attach validation language (`builds`, `boots`, `validated`, `experimental`) to claims + +## Summary + +Phase 0 is the AMD-first bare-metal boot foundation. + +Phase 1 is structurally implemented and largely cleaned up. + +Phase 2 now has an actual VM-network baseline with repo, launch, and in-guest validation helpers. + +One practical caveat surfaced during reassessment: the QEMU launch helper also depends on usable +x86_64 UEFI firmware on the host. When that firmware is missing, the failure mode is a host-side +SeaBIOS/iPXE fallback rather than a guest-side Red Bear runtime failure, so the helper now checks +for that prerequisite explicitly. + +Phase 3 should be understood as **runtime-substrate validation and wiring**, not as a brand-new +infrastructure buildout from zero. + +## Quality Assessment + +### Planning quality + +**Strong points** + +- The public plan in `docs/07` is clearer and more execution-oriented than the older roadmap. +- Phase 1 and Phase 2 now have concrete helper scripts and docs instead of relying on implicit + operator knowledge. +- The profile matrix and governance docs substantially reduce ambiguity about what each tracked + profile is supposed to represent. + +**Weak points** + +- Historical phase numbering from `AMD-FIRST-INTEGRATION.md` still differs from the newer public + plan, which can confuse progress reporting if the bridge document is not consulted. +- Some status language across the repo still tends to overvalue “builds” relative to “validated”. 
+ +### Implementation quality + +**Strong points** + +- Shared Red Bear config fragments reduced duplication in tracked profiles. +- The VM-network baseline now has layered validation surfaces: repo-level, launcher-level, and + in-guest runtime checks. +- `redbear-info` remains aligned with real integration changes instead of becoming stale. + +**Weak points** + +- Runtime validation is still thinner than build validation across the early phases. +- Some local operating docs needed follow-up cleanup to reflect the newer scripts and profile set. + +### Recommendation + +For Phase 0–3 work, prefer closing validation gaps and documentation drift before adding new scope. +The early-phase codebase is in a much better structural state now; the main quality risk is no +longer missing packages, but overstating readiness before runtime evidence exists. + +## Phase 4 Handoff Note + +Phase 4 should begin from the existing `wayland.toml` profile, not by jumping straight to KWin. +The current repo already contains the `smallvil`, `cosmic-comp`, `qtwayland`, and Mesa software + rendering pieces; the highest-value next work is validating the `orbital-wayland` → `smallvil` + runtime path on QEMU/VirtIO and only then widening to heavier compositor/session stacks. diff --git a/local/docs/RELIBC-COMPLETENESS-AND-ENHANCEMENT-PLAN.md b/local/docs/RELIBC-COMPLETENESS-AND-ENHANCEMENT-PLAN.md new file mode 100644 index 00000000..d881e8ff --- /dev/null +++ b/local/docs/RELIBC-COMPLETENESS-AND-ENHANCEMENT-PLAN.md @@ -0,0 +1,631 @@ +# Red Bear OS relibc Completeness and Enhancement Plan + +## Purpose + +This document assesses relibc in Red Bear OS for **strengths**, **deficiencies**, **subsystem-facing +gaps**, and **overall quality**, then defines a practical plan for improving it. + +The goal is not to treat relibc as a generic libc project. 
The goal is to describe: + +- what is already strong, +- what exists only through local patch carriers, +- what is still incomplete or weak, +- what downstream subsystems still depend on relibc improvement, +- and what order of work best improves real system capability. + +This is a Red Bear-specific document. It is grounded in the current repo state rather than older, +pre-correction roadmap assumptions. + +## Evidence Model + +This plan uses four evidence buckets and does **not** treat them as equivalent: + +- **source-visible** — behavior visible directly in the current relibc source tree +- **patch-carried** — behavior carried in `local/patches/relibc/P3-*.patch` +- **build-visible downstream** — downstream packages now compile because the libc surface exists +- **runtime-validated** — behavior has been exercised successfully in real downstream/runtime paths + +This distinction matters because relibc’s current problem is often **not** “API absent,” but the gap +between **implemented**, **patch-carried**, **build-proven**, and **runtime-trusted**. + +## Upstream vs Red Bear ownership + +For relibc, the ownership boundary must stay explicit: + +- `recipes/core/relibc/source/` is the live upstream-owned working tree used for actual build and + validation +- `local/patches/relibc/P3-*.patch` is the Red Bear-owned durable carrier for relibc changes +- `local/docs/...` is the durable explanation of what those changes mean and how to reapply them + +That means a relibc change is not truly preserved until it exists in **both** places: + +1. the live relibc source tree, so the current build can prove it +2. the `local/patches/relibc/` carrier set, so the same result can be recreated after an upstream refresh + +The repo standard for success is not merely “the current source tree builds.” The standard is: + +> we can fetch fresh upstream relibc sources, reapply the Red Bear relibc patch carriers, and still +> rebuild the same working result. 
+ +Any relibc work that exists only under `recipes/core/relibc/source/` should therefore be treated as +validated-but-not-yet-preserved. + +Because relibc is also one of the fastest-moving upstream areas, Red Bear should apply one more +rule here: + +> if a Red Bear relibc patch solves a problem that upstream has already solved, prefer the upstream +> solution and retire or reduce the local patch. + +The goal is durable compatibility, not a permanent relibc fork. + +## Current Repo State + +> **Implementation note (current Red Bear tree):** this repo pass moved several relibc items from +> patch-carried-only or downstream-workaround status into source-visible libc behavior. The current +> tree now contains source-visible `signalfd`, `timerfd`, `eventfd`, `open_memstream`, +> `F_DUPFD_CLOEXEC`, `MSG_NOSIGNAL`, a bounded `waitid()` path, bounded `RLIMIT_NOFILE` / +> `RLIMIT_MEMLOCK` behavior, a bounded `eth0`-backed `net/if.h` / `ifaddrs.h` view, a source-visible +> `resolv.h` plus bounded `res_query()` / `res_search()` compatibility paths with receive/send +> timeout hardening, a first named-semaphore implementation on top of the existing shm path, and +> bounded `sys/ipc.h` / `sys/shm.h` surfaces for the `IPC_PRIVATE` / `shmget` / `shmat` / +> `shmdt` / `shmctl(IPC_RMID)` workflow. + +> **Downstream validation note (current Red Bear tree):** `libwayland` now cooks successfully +> against the updated relibc, and qtbase now configures, builds, and stages with +> `FEATURE_process=ON`, `FEATURE_sharedmemory=ON`, and `FEATURE_systemsemaphore=ON` in the current +> tree. The relibc `tests/` harness also now builds focused Redox-target binaries for `eventfd`, +> `waitid`, `res_init`, `res_query`, `sem_open`, and `shmget`, and the host-target variants of those same +> focused tests now execute successfully under the relibc-built host sysroot. 
That does not mean +> relibc is complete, but it does mean the implementation has crossed real downstream build/stage +> gates and direct execution-level proof rather than remaining an isolated libc-only pass. The +> current host-side `res_query` proof is still bounded: it compiles, runs, and fails fast under the +> relibc sysroot instead of hanging, but it is not yet a runtime-trusted downstream DNS proof. +> +> **Additional downstream proof (current Red Bear tree):** the in-tree `openssh` recipe now cooks +> successfully against the relibc resolver surface after switching the recipe to the rebuilt relibc +> headers/libraries and removing stale Redox-specific resolver fallbacks from the OpenSSH patch. +> That is still build/stage proof rather than runtime SSH validation, but it demonstrates that real +> consumers can now compile and link `res_init`, `res_query`, and `dn_expand` from relibc. +> +> **Fresh revalidation pass (current Red Bear tree):** the focused host-side relibc proofs were +> rerun for `eventfd`, `waitid`, `res_init`, `res_query`, `sem_open`, and `shmget`; the binaries all +> built, and the executions succeeded for `eventfd`, `waitid`, `res_init`, `sem_open`, and `shmget` +> with the bounded `res_query` test still failing fast rather than hanging. The main downstream +> consumers previously used as evidence were also rerun successfully: `CI=1 ./target/release/repo cook libwayland`, +> `CI=1 ./target/release/repo cook qtbase`, and `CI=1 ./target/release/repo cook openssh` now all +> succeed in the current tree. +> +> **Additional focused coverage (current Red Bear tree):** integrated relibc tests were also added +> for `open_memstream`, SysV semaphores via `semget`/`semop`/`semctl`, `timerfd`, and `signalfd`. +> On the host-side relibc sysroot, `open_memstream` and `semget` execute successfully, while the +> `timerfd` and `signalfd` tests currently report bounded unavailability in that host environment +> rather than hanging or crashing. 
That still falls short of Redox runtime proof for those two +> non-POSIX APIs, but it moves them from source-visible-only status into the explicit test harness. +> +> **Fresh-upstream reapply proof (current Red Bear tree):** a fresh `repo unfetch relibc` → +> `repo fetch relibc` cycle was used to reconstruct the relibc source tree from upstream-owned +> sources, the durable `local/patches/relibc/` carrier set was reapplied to that fresh tree, and the +> resulting rebuild again supported successful downstream `libwayland` and `qtbase` cooks. That is +> the current proof that Red Bear’s relibc work is not only buildable in-place, but also recoverable +> after a fresh upstream source refresh. + +> **Current reconstructed-state proof set:** with the refreshed source tree rebuilt from the local +> relibc overlay set, the repo now has successful cookbook evidence for all three layers in order: +> `CI=1 ./target/release/repo cook relibc`, then `CI=1 ./target/release/repo cook libwayland`, then +> `CI=1 ./target/release/repo cook qtbase`. This is the strongest current proof that the relibc +> compatibility work is preserved in the right place for long-term maintenance. + +### Summary + +relibc is one of Red Bear’s strongest foundational subsystems, but it is not complete. 
+ +The current repo shows a relibc that is already strong in: + +- broad header/libc surface coverage +- real Redox-native platform integration +- source-visible implementations of the historical Wayland-facing P3 APIs, with patch carriers still retained as sync/upstream artifacts +- enough maturity to unlock major build-side progress in Wayland, Qt, and KDE +- a substantial generic upstream-style test tree + +The current repo also shows relibc is still weak in: + +- shared memory / SysV IPC completeness +- named semaphores +- process/runtime quality for some downstreams +- networking/resolver/interface completeness +- Redox-target and downstream-runtime validation depth + +### Status Matrix + +| Area | State | Notes | +|---|---|---| +| Core POSIX/header breadth | **strong / partial** | Large header surface exists, but many TODO headers and feature gaps remain | +| Wayland-facing P3 APIs | **implemented / source-visible / runtime-unproven** | `signalfd`, `timerfd`, `eventfd`, `open_memstream`, socket flags, and `F_DUPFD_CLOEXEC` now exist in the relibc source tree; runtime proof still trails build integration | +| Networking/libc socket surface | **usable / partial** | AF_INET/AF_UNIX paths exist, but interface/reporting/resolver behavior remains narrow | +| Qt/KDE downstream unblockers | **build-side improved / multiple gates crossed** | `QProcess`, `QSharedMemory`, and `QSystemSemaphore` now configure, build, and stage on in-tree qtbase; broader runtime validation is still needed | +| Shared memory / semaphore completeness | **partial** | `shm_open` exists through the Redox shm path, but SysV IPC/shared-memory and named semaphore completeness remain open | +| Process/runtime completeness | **partial** | Some process-facing functionality still uses stubs or downstream workarounds | +| Dedicated test surface | **present / Redox-specific coverage still thin** | relibc has a substantial `source/tests/` tree, but the Red Bear-visible Redox/P3/runtime validation story 
is still weaker than the generic libc test surface | +| Runtime validation against real consumers | **insufficient** | Still weaker than build-side evidence | + +## Strong Points + +### 1. relibc already exposes a broad libc/header surface + +`recipes/core/relibc/source/src/header/mod.rs` shows a broad libc/header tree with networking, +threading, polling, stdio, locale, signal, socket, time, and many Unix-facing modules already +present. + +That means Red Bear should treat relibc work as **quality and completeness hardening**, not as a +greenfield libc effort. + +### 2. The historical P3 Wayland-facing API bridge is now source-visible + +The local relibc patch carriers documented the APIs that historically blocked Wayland and downstream +consumers. Some of those fixes are still Red Bear-owned overlays; others are now present upstream and +should no longer be carried locally. + +- `local/patches/relibc/P3-signalfd.patch` +- `local/patches/relibc/P3-timerfd.patch` +- `local/patches/relibc/P3-eventfd.patch` +- `local/patches/relibc/P3-waitid.patch` + +The remaining Red Bear-owned relibc carriers currently add or complete: + +- `signalfd` / `signalfd4` +- `timerfd_create` / `timerfd_settime` / `timerfd_gettime` +- `eventfd` / `eventfd_read` / `eventfd_write` +- bounded `waitid()` +- bounded `sys/ipc.h`, `sys/sem.h`, and `sys/shm.h` compatibility layers +- focused relibc IPC tests needed to keep those overlays validated after upstream refresh + +The upstream-first policy still applies here, but the durable patch-carrier set should be trimmed +only when a fresh upstream refetch plus reapply plus downstream rebuild actually proves the upstream +coverage is sufficient. In the current Red Bear tree, `open_memstream`, `F_DUPFD_CLOEXEC`, and the +socket flag work still need to remain in the relibc overlay set because the clean reconstructed +consumer path still depends on them. 
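As an illustration of what a consumer of this P3 surface does with it, the following is a minimal C sketch of an `eventfd` / `eventfd_write` / `eventfd_read` round trip. It assumes the default (non-semaphore) counter semantics documented for Linux, where a read returns the accumulated counter and resets it. The helper name `eventfd_roundtrip` is hypothetical and not part of relibc, and the sketch does not claim that the same behavior has been runtime-validated on Redox.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/eventfd.h>
#include <unistd.h>

/* Hypothetical helper (not part of relibc): add two values to an eventfd
 * counter, then read the accumulated total back.
 * Assumes Linux-style non-semaphore semantics: one read returns the sum.
 * Returns the value read, or -1 on any failure. */
int64_t eventfd_roundtrip(uint64_t first, uint64_t second)
{
    int fd = eventfd(0, 0);              /* counter starts at 0, no flags */
    if (fd < 0)
        return -1;

    eventfd_t value = 0;
    int64_t result = -1;

    if (eventfd_write(fd, first) == 0 &&
        eventfd_write(fd, second) == 0 &&
        eventfd_read(fd, &value) == 0)
        result = (int64_t)value;         /* accumulated counter */

    close(fd);
    return result;
}
```

A caller would invoke `eventfd_roundtrip(3, 4)` and compare the result with the sum of the two writes. A negative result means that creating, writing to, or reading from the descriptor failed, which is a useful signal when probing a bounded implementation.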
+ +This is one of relibc’s strongest current points: Red Bear already has the exact P3 compatibility +surface that older docs used to describe as absent. + +The local patches still matter as provenance and sync-upstream carriers for the gaps upstream does +not yet solve, but they should be retired as soon as upstream makes them redundant. + +### 3. Downstream build progress proves relibc is materially useful + +The current docs consistently show that relibc has already enabled substantial downstream progress: + +- `docs/02-GAP-ANALYSIS.md` now marks the P3 bridge as implemented in-tree, with runtime validation still pending +- `docs/03-WAYLAND-ON-REDOX.md` says the build-side relibc/libwayland bridge is restored and that the remaining blocker is runtime validation, not basic POSIX availability +- `local/docs/QT6-PORT-STATUS.md` treats many earlier relibc blockers as moved from “missing” to “present but still needs downstream validation” + +This is a major quality signal: relibc is already strong enough to unlock real build-side subsystem work. + +### 4. relibc already has a substantial generic test surface + +`recipes/core/relibc/source/tests/` is real and large. It already covers many libc-facing areas such +as: + +- `fcntl/` +- `net/` and `netdb/` +- `pthread/` +- `stdio/` +- `sys_mman/` +- `sys_socket/` +- `sys_resource/` +- `time/` +- `unistd/` + +That is a genuine strength and should be documented as one. + +The remaining weakness is narrower: Red Bear still lacks a strong **Redox-target / P3 API / +downstream-runtime** validation story that is as visible and deliberate as this generic relibc test +tree. + +### 5. 
The current relibc problem is no longer one single blocker + +The downstream evidence shows that relibc now has **multiple completeness fronts**: + +- Wayland-facing POSIX/event APIs +- Qt/KDE shared memory and semaphore support +- process-facing behavior such as `waitid()` +- networking/resolver completeness +- legacy but still-consumed items such as `sigjmp_buf` and locale/runtime edges + +That means the right enhancement plan is no longer “finish one missing API and unblock everything.” +The work has to be triaged by downstream impact. + +### 6. The Redox networking model is reflected in relibc + +`recipes/core/relibc/source/src/platform/redox/socket.rs` shows a real Redox-native socket/path +model instead of a pure stub implementation. That is another strong point: relibc already knows +about Redox-native runtime behavior. + +## Deficiencies and Gaps + +### 1. Header coverage is still incomplete in visible source + +`recipes/core/relibc/source/src/header/mod.rs` still contains a meaningful backlog of TODO or absent +header surfaces, including examples such as: + +- `iconv.h` +- `mqueue.h` +- `spawn.h` +- `sys/msg.h` +- `threads.h` +- `wordexp.h` + +Some of these are lower-value than others, but they still show that relibc has real completeness work left. + +### 2. Named semaphores are now source-visible, but still incomplete + +`recipes/core/relibc/source/src/header/semaphore/mod.rs` is still a clear example of partial completeness. + +Basic unnamed semaphore paths exist (`sem_init`, `sem_post`, `sem_wait`, `sem_timedwait`, etc.), +and the named semaphore path is now source-visible too: + +- `sem_open` +- `sem_close` +- `sem_unlink` + +These are now implemented on top of the existing shm path instead of left as raw `todo!()` stubs. 
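The named-semaphore calls listed above can be exercised with a small C sketch like the one below. It is an illustration under stated assumptions, not a relibc test: the helper name `named_sem_roundtrip` and the semaphore name passed to it are hypothetical, and the sketch assumes standard POSIX behavior for `sem_trywait` (failing with `EAGAIN` when the count is zero). Whether the bounded relibc implementation matches that behavior on Redox still needs runtime confirmation.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <semaphore.h>

/* Hypothetical helper (not part of relibc): create a named semaphore with an
 * initial count of 1 and check that it behaves like a binary lock.
 * Returns 0 when every step behaves as POSIX describes, -1 otherwise. */
int named_sem_roundtrip(const char *name)
{
    sem_unlink(name);                    /* drop any stale object; failure is fine */

    sem_t *sem = sem_open(name, O_CREAT | O_EXCL, 0600, 1);
    if (sem == SEM_FAILED)
        return -1;

    int ok = 0;
    if (sem_wait(sem) == 0) {            /* count: 1 -> 0 */
        errno = 0;
        /* With the count at zero, a non-blocking wait must fail with EAGAIN. */
        int contended = (sem_trywait(sem) == -1 && errno == EAGAIN);
        int released = (sem_post(sem) == 0);       /* count: 0 -> 1 */
        int reacquired = (sem_trywait(sem) == 0);  /* count: 1 -> 0 */
        ok = contended && released && reacquired;
    }

    sem_close(sem);
    sem_unlink(name);                    /* remove the name so reruns start clean */
    return ok ? 0 : -1;
}
```

Calling `named_sem_roundtrip("/redbear_doc_demo_sem")` should return `0` on a conforming implementation. The helper unlinks the name before and after use so that a crashed earlier run does not make `O_EXCL` creation fail.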
+ +The remaining weakness is semantic and validation depth, not pure absence: + +- broader POSIX semaphore semantics are still not strongly runtime-validated +- downstream configure/runtime behavior still needs continued confirmation +- the SysV semaphore surface remains thinner than a full Unix implementation + +This directly affects downstream consumers such as `QSystemSemaphore`. + +### 3. Shared memory is present, but not complete enough for downstream GUI/runtime work + +The current relibc source already exposes one meaningful shared-memory path: + +- `recipes/core/relibc/source/src/header/sys_mman/mod.rs` provides `shm_open()` and `shm_unlink()` +- on Redox, that path resolves to `/scheme/shm/` +- `recipes/core/base/source/ipcd/src/shm.rs` implements the backing shared-memory scheme + +That is a real strength and should not be described as “shared memory absent.” + +The real gap is that shared-memory completeness is still insufficient for broader downstream use: + +- the source tree now has visible `sys/shm.h` / `sys/ipc.h` / `sys/sem.h` modules, but they remain bounded rather than comprehensive +- Qt/KDE-facing docs still treat `shm_open()` / `shmget()`-class behavior as unresolved enough to block full `QSharedMemory` confidence +- the current repo still lacks a strong end-to-end validation story for these paths in desktop consumers + +### 4. Resolver and interface-networking completeness are still uneven + +The downstream scan shows that networking-facing userland still hits relibc gaps beyond raw socket +basics. 
+ +Examples from downstream recipes and docs: + +- `recipes/wip/qt/qtbase/recipe.toml` still leaves QtNetwork disabled because of broader networking/runtime concerns such as `in6_pktinfo` and richer interface semantics, even though minimal `resolv.h` and `arpa/nameser.h` surfaces now exist +- `recipes/net/openssh/recipe.toml` and its patch history still call out `resolv.h` +- `recipes/wip/terminal/tmux/redox.patch` comments out `resolv.h` +- `recipes/libs/glib/redox.patch` still touches resolver-facing includes + +### 5. The networking surface is narrower than generic Unix software expects + +The current source still shows important limits that should be named directly: + +- `recipes/core/relibc/source/src/platform/redox/socket.rs` has AF_INET / AF_UNIX socket handling +- `recipes/core/relibc/source/src/header/net_if/mod.rs` now exposes a bounded `eth0`-backed interface view instead of a permanent `stub` +- `recipes/core/relibc/source/src/header/ifaddrs/mod.rs` now provides a bounded `eth0`-backed `getifaddrs()` path instead of pure `ENOSYS` +- source-visible `resolv.h` / `arpa/nameser.h` plus bounded `res_query()` / `res_search()` compatibility are now present, and at least one real downstream (`openssh`) now builds against them, but broader resolver compatibility is still incomplete + +That is enough to support the current Red Bear native network path in a bounded sense, but it is not +yet strong enough to claim broad interface-aware compatibility for higher-level consumers. Resolver/ +header gaps and interface-model assumptions still show up in ports such as QtNetwork, OpenSSH, +tmux, glib, curl, and libuv. + +### 6. 
Process/runtime completeness is still uneven + +The repo still has process/runtime unevenness, but one meaningful consumer-facing gap has now moved: + +- relibc now provides a bounded `waitid()` implementation over the existing `waitpid` path +- the old Qt-side injected `waitid()` stub has been retired from the Qt recipe layer + +The source state needs to be classified carefully: + +- `sigjmp_buf` exists in `recipes/core/relibc/source/include/setjmp.h`, so older downstream comments treating it as absent are better read as compatibility/staleness signals rather than primary source truth +- `getgroups()` has a Redox implementation path in `platform/redox/mod.rs` +- `getrlimit()` is no longer a pure placeholder for all consumers: Red Bear now has bounded `RLIMIT_NOFILE` and `RLIMIT_MEMLOCK` behavior, but broader resource-limit completeness is still weak + +So process/runtime completeness should be treated as a real subsystem-quality track, but the plan +must distinguish **missing**, **implemented but weak**, and **stale downstream complaint**. + +### 7. Source quality still contains many TODO / unimplemented branches + +The current source has a large amount of unfinished or explicitly deferred behavior across: + +- `pthread` +- `time` +- `unistd` +- `platform/redox` +- `epoll` +- `ptrace` +- locale and stdio internals + +This does not mean relibc is unusable. It means completeness and quality work now needs a stronger +triage model instead of treating all missing items as equally important. + +### 8. Redox-target and downstream validation remain thin relative to subsystem importance + +The current repo already contains a substantial generic relibc test tree, but the Red Bear-visible +validation story is still thin in the areas that matter most for current subsystem unblockers. 
+ +Right now much of relibc’s confidence in the Red Bear docs still comes from: + +- source inspection +- patch carriers +- build-side downstream success +- limited runtime validation via downstream stacks + +That is not enough for a component as central as libc, especially for the Redox-target and +downstream-consumer paths Red Bear depends on. + +## Downstream-Blocking Gaps by Subsystem + +### Wayland + +The old “basic POSIX APIs are missing” story is no longer the main one. + +Current state: + +- `signalfd`, `timerfd`, `eventfd`, `open_memstream`, and key socket flags are now source-visible in relibc and still tracked by patch carriers for sync/upstream purposes +- the current bounded `waitid()` path is also preserved as a relibc patch carrier so it can be reapplied after upstream refresh +- `libwayland` now rebuilds with a much smaller Redox patch + +Remaining blocker: + +- runtime validation of the full relibc -> libwayland -> compositor path + +So the current relibc task for Wayland is primarily **runtime proof and patch reduction**, not just +adding obvious libc symbols. + +Current Red Bear evidence is stronger than before: `libwayland` now cooks successfully against the +updated relibc tree, which means the generated `sys/signalfd.h`, `sys/timerfd.h`, `sys/eventfd.h`, +and `stdio.h`/`sys/socket.h` surfaces are now sufficient for at least one major downstream consumer. + +### Qt / KDE + +The Qt/KDE-facing relibc backlog is still substantial. 
+ +The biggest libc-facing gaps are: + +- shared memory (`shm_open` / `shmget`) for `QSharedMemory` +- named/system semaphores (`sem_open` / `semget`) for `QSystemSemaphore` +- stronger process/runtime behavior for `QProcess` +- runtime validation of QtNetwork against the current relibc networking surface +- resolver/header completeness (`resolv.h`) and network-interface semantics for QtNetwork +- broader process/runtime validation after the new bounded `waitid()` path + +This makes Qt/KDE the clearest downstream consumer pushing relibc from “build-capable” toward +“desktop-capable”. + +Current Red Bear evidence is stronger than before here too: qtbase now configures, builds, and +stages with +`FEATURE_process=ON`, `FEATURE_sharedmemory=ON`, and `FEATURE_systemsemaphore=ON` in the current +tree. The remaining work is therefore less about “make the feature visible at all” and more about +runtime semantics, broader compatibility, and downstream cleanup. + +### Networking and interface-aware software + +The current relibc networking model is usable, but still narrow enough that higher-level consumers +keep carrying workarounds or disabled features. + +The newer bounded `eth0`-backed `net_if` / `ifaddrs` work improves the source-visible story, but it +is still only a first Red Bear-shaped interface view, not a full generic Unix interface model. + +This is why the plan should treat networking as **usable but still validation-heavy**, not “done”. + +### General userland / server software + +The downstream scan also shows relibc gaps outside graphics: + +- PostgreSQL and some libraries still carry `sigjmp_buf`-related downstream notes that need revalidation against current headers +- SQLite still notes `getrlimit()` / `getgroups()` gaps, even though the current source state now splits those two differently +- Apache and other ports still touch semaphore or IPC assumptions + +That is important because it means relibc completeness is not only about desktop bring-up. 
It also +affects core application/server breadth. + +### Desktop/session path + +Session and desktop work depends less on one dramatic relibc gap than on overall libc quality: + +- process semantics +- IPC completeness +- synchronization primitives +- runtime interaction with D-Bus/Qt/Wayland consumers + +This is why relibc should be treated as a cross-cutting runtime-quality subsystem, not just a POSIX checklist. + +## Quality Assessment + +### What relibc is good at now + +- broad visible libc/header coverage +- practical Redox-native integration rather than fake stubs everywhere +- concrete P3 compatibility work for real downstreams +- enough maturity to unlock major subsystem builds +- a substantial generic test tree + +### What relibc is bad at now + +- uneven implementation depth +- too many TODO/unimplemented branches for a component this central +- patch-carried functionality that is still not strongly reflected in visible source snapshots +- too little Redox-target and downstream-runtime validation relative to the generic test tree +- too much downstream confidence still derived from “compiles” instead of “runtime-proven” + +## Enhancement Plan + +### Phase R0 — Evidence and Ownership Cleanup + +**Goal**: Make relibc status honest before widening scope. + +**What to do**: + +- explicitly track relibc claims as `source-visible`, `patch-carried`, `build-proven`, or `runtime-validated` +- keep the P3 patch carriers discoverable and documented as canonical until upstreamed +- stop describing relibc gaps with outdated “missing basics” language where the code already exists + +**Exit criteria**: + +- subsystem docs consistently distinguish between missing, patch-carried, and runtime-proven relibc behavior + +--- + +### Phase R1 — Stabilize the newly source-visible P3 APIs + +**Goal**: Keep the newly source-visible P3 APIs aligned with their patch-carrier and downstream expectations. 
+ +**What to do**: + +- keep `signalfd`, `timerfd`, `eventfd`, `open_memstream`, socket flags, and `F_DUPFD_CLOEXEC` visible and maintained as canonical relibc behavior +- reduce downstream assumptions that these APIs are still absent +- ensure generated/exported headers stay aligned with the source-visible implementation set + +**Exit criteria**: + +- the repo consistently treats these P3 APIs as source-visible functionality that now needs validation and downstream cleanup rather than invention + +--- + +### Phase R2 — Close the shared-memory and semaphore completeness gap + +**Goal**: Unlock the next meaningful Qt/KDE-facing libc surface. + +**What to do**: + +- keep the existing `shm_open` / `/scheme/shm/` path explicit and documented +- implement the missing SysV IPC/shared-memory side or document a deliberate non-goal if Red Bear does not want full SysV compatibility +- harden and validate the now source-visible named semaphore support (`sem_open`, `sem_close`, `sem_unlink`) +- close the specific `QSharedMemory` and `QSystemSemaphore` blockers identified in the Qt docs + +**Exit criteria**: + +- the Qt/KDE docs no longer list shared memory and named semaphores as unresolved relibc blockers + +--- + +### Phase R3 — Process/runtime correctness for desktop consumers + +**Goal**: Reduce downstream process workarounds. + +**What to do**: + +- strengthen process-facing libc/runtime behavior enough to remove targeted workarounds such as the Qt `waitid()` shim path +- close or intentionally document the remaining `sigjmp_buf` / `getrlimit()` / `getgroups()` quality gaps that still force downstream patches +- validate process semantics against real downstream consumers, not only isolated libc expectations + +**Current implementation note:** the bounded `waitid()` path is now source-visible, the old Qt-side +`waitid()` shim is gone, and qtbase now configures/builds/stages with process support enabled. 
The +remaining work is broader process/runtime validation and cleanup, not the old total absence of `waitid()`. + +**Exit criteria**: + +- downstream process workarounds are reduced or eliminated for the current desktop stack + +--- + +### Phase R4 — Networking/runtime validation + +**Goal**: Turn the current networking surface from “present” into “trusted”. + +**What to do**: + +- validate QtNetwork and similar consumers against the current relibc socket/ioctl/interface model +- close the highest-value resolver/header gaps such as `resolv.h` where they are still forcing downstream stubs or disabled modules +- evolve the new bounded `eth0`-backed interface-reporting path into a better general Redox interface model where needed +- document which current networking semantics are intentionally Redox-specific and which are intended to mimic broader Unix behavior + +**Exit criteria**: + +- at least one meaningful higher-level network consumer is validated against the current relibc networking surface + +--- + +### Phase R5 — Dedicated relibc validation expansion + +**Goal**: Improve libc confidence without waiting for whole desktop stacks. + +**What to do**: + +- build a stronger dedicated Redox-target and P3/downstream validation layer on top of the existing generic relibc test tree +- ensure new APIs and bugfixes come with focused libc-level tests where practical +- keep downstream consumer tests, but stop relying on them as the only quality signal + +**Exit criteria**: + +- relibc has explicit Redox-target and downstream-runtime validation beyond the generic upstream-style test tree + +--- + +### Phase R6 — General completeness triage + +**Goal**: Attack the remaining TODO/unimplemented backlog by priority rather than by random header count. 
+ +**What to do**: + +- rank remaining TODO/unimplemented items by downstream subsystem impact +- prioritize IPC, synchronization, process, time, and networking correctness over obscure or deprecated headers +- keep deprecated/low-value gaps documented, but do not let them drive the roadmap ahead of higher-value runtime work + +**Exit criteria**: + +- relibc backlog is organized by real system impact instead of undifferentiated TODO volume + +## Recommended Order of Work + +The current best order is: + +1. evidence cleanup and canonicalization of what already exists +2. shared memory and named semaphores +3. process/runtime correctness +4. networking/runtime validation +5. Redox-target and downstream validation expansion +6. broader backlog triage and cleanup + +That order matches the current downstream blocker chain better than a generic “finish all missing headers” strategy. + +## Support-Language Guidance + +Until the runtime-validation phases are materially complete, Red Bear should avoid saying: + +- “relibc POSIX gaps are solved” +- “Qt/Wayland blockers are fully gone” +- “network/process/shared-memory support is complete” + +Prefer language such as: + +- “consumer-visible P3 APIs are now present, with runtime validation still needed” +- “relibc is materially stronger, but desktop-facing completeness work remains” +- “the remaining relibc problem is now quality and downstream proof, not just symbol absence” + +## Summary + +relibc is one of Red Bear’s strongest foundational subsystems, but it is not complete. 
+ +Its strongest current qualities are: + +- broad libc/header coverage +- real Redox-native platform integration +- concrete source-visible and patch-backed solutions to the historical P3 Wayland-facing blockers +- clear downstream build progress because of those fixes +- a substantial generic test surface + +Its largest remaining weaknesses are: + +- incomplete shared memory and named semaphore support +- process/runtime unevenness +- networking/resolver/interface completeness gaps +- too many TODO/unimplemented branches in central paths +- too little Redox-target and downstream-runtime validation relative to the generic test tree + +The correct relibc roadmap is therefore **not** “hunt random missing symbols.” It is to turn the +current build-capable libc into a runtime-trusted subsystem by closing the high-value desktop/runtime +gaps, strengthening validation, and reducing patch-carried ambiguity. diff --git a/local/docs/RELIBC-IPC-ASSESSMENT-AND-IMPROVEMENT-PLAN.md b/local/docs/RELIBC-IPC-ASSESSMENT-AND-IMPROVEMENT-PLAN.md new file mode 100644 index 00000000..bf55dbef --- /dev/null +++ b/local/docs/RELIBC-IPC-ASSESSMENT-AND-IMPROVEMENT-PLAN.md @@ -0,0 +1,393 @@ +# Red Bear OS relibc IPC Assessment and Improvement Plan + +## Purpose + +This document assesses the current **IPC-related relibc surface** in Red Bear OS and turns that +assessment into a concrete improvement plan. + +The focus here is narrower than the general relibc plan: + +- POSIX shared memory and semaphores +- System V shared memory and semaphores +- missing System V / POSIX IPC areas such as message queues +- IPC-adjacent descriptor/event primitives that downstream software treats as part of the same + coordination substrate: `eventfd`, `signalfd`, and `timerfd` +- the downstream subsystem pressure created by Qt, KDE, Wayland, and related userland + +This is not a generic libc-compliance document. It is grounded in the current repository state. 
+
+## Evidence Model
+
+This assessment distinguishes four evidence levels:
+
+- **source-visible** — behavior exists in relibc source now
+- **test-visible** — behavior is exercised by focused relibc tests
+- **build-visible downstream** — real consumers compile/link against it
+- **runtime-validated** — behavior has been exercised in real Redox or consumer runtime paths
+
+The key IPC problem in the current tree is not simple absence. It is the gap between what is
+**source-visible** (often with deliberately bounded semantics), what is **build-visible
+downstream**, and what is **runtime-validated**.
+
+## Upstream vs Red Bear separation
+
+For this IPC work, keep the storage model explicit:
+
+- the live implementation under `recipes/core/relibc/source/src/header/` is the working upstream
+  tree used for builds and tests
+- the durable Red Bear ownership boundary is `local/patches/relibc/` plus `local/docs/`
+
+So an IPC change is only truly safe when:
+
+1. the upstream-owned relibc source tree builds with the change now, and
+2. the same delta is preserved in `local/patches/relibc/` so a fresh upstream refetch can recover it
+
+This repo should be able to pull renewed upstream sources every day and still rebuild after
+reapplying the local relibc patch carriers. That requirement is part of the IPC improvement plan,
+not an afterthought.
+
+This separation also implies an upstream-preference policy:
+
+- when upstream relibc already provides the same IPC fix, prefer upstream
+- keep Red Bear IPC patches only for gaps that upstream still does not solve adequately
+- review patch carriers regularly and delete or shrink ones made obsolete by upstream evolution
+
+## Current Implementation Note
+
+This repo pass did not just assess the IPC surface; it also restored the missing relibc IPC modules
+that the drafted Red Bear docs were already assuming existed in-tree.
+
+The current tree now contains source-visible implementations for:
+
+- `sys/eventfd.h` / `eventfd()` / `eventfd_read()` / `eventfd_write()`
+- `sys/timerfd.h` / `timerfd_create()` / `timerfd_settime()` / `timerfd_gettime()`
+- `sys/signalfd.h` / `signalfd()` / `signalfd4()`
+- `open_memstream()`
+- bounded `sys/ipc.h`, `sys/shm.h`, and `sys/sem.h` compatibility layers
+- a bounded `waitid()` path sufficient to satisfy current Qt process-side linking
+
+This pass also added focused relibc tests for:
+
+- `stdio/open_memstream`
+- `sys_sem/semget`
+- `sys_timerfd/timerfd`
+- `sys_signalfd/signalfd`
+
+Current manual verification in this repo pass:
+
+- `cargo check --target x86_64-unknown-linux-gnu` passes for relibc
+- host-side focused IPC tests execute successfully for `open_memstream` and `semget`
+- host-side focused `timerfd` and `signalfd` tests report bounded unavailability rather than hanging
+- `CI=1 ./target/release/repo cook relibc` completes successfully after clearing a stale stage-dir collision
+- `CI=1 ./target/release/repo cook qtbase` now succeeds after exporting `eventfd_t` and restoring a bounded `waitid()` path
+- a fresh `repo unfetch relibc` → `repo fetch relibc` cycle plus reapplication of
+  `local/patches/relibc/` again supports successful downstream `libwayland` and `qtbase` builds,
+  which is the current proof that the relibc IPC overlay is recoverable from refreshed upstream
+  source, not only from the previously edited working tree
+
+In other words, the current relibc IPC work is no longer just “working in the checked-out source
+tree”. It is now proven as an overlay workflow:
+
+1. refresh upstream relibc source
+2. reapply the local relibc compatibility overlays
+3. rebuild relibc
+4. rebuild real downstream consumers (`libwayland`, `qtbase`)
+
+## Scope Map
+
+### In scope in relibc today
+
+| Area | State | Primary evidence |
+|---|---|---|
+| `shm_open()` / `shm_unlink()` | implemented | `recipes/core/relibc/source/src/header/sys_mman/mod.rs` |
+| POSIX unnamed semaphores | implemented | `recipes/core/relibc/source/src/header/semaphore/mod.rs` |
+| POSIX named semaphores | implemented but bounded | `recipes/core/relibc/source/src/header/semaphore/mod.rs` |
+| SysV shared memory | implemented but bounded | `recipes/core/relibc/source/src/header/sys_shm/mod.rs` |
+| SysV semaphores | implemented but bounded | `recipes/core/relibc/source/src/header/sys_sem/mod.rs` |
+| `eventfd` | implemented; stronger than the other descriptor-event APIs | `recipes/core/relibc/source/src/header/sys_eventfd/mod.rs` |
+| `signalfd` | implemented, but runtime-thin and not broadly Redox-runtime-trusted yet | `recipes/core/relibc/source/src/header/signal/signalfd.rs` |
+| `timerfd` | implemented, but semantically narrow and not broadly Redox-runtime-trusted yet | `recipes/core/relibc/source/src/header/sys_timerfd/mod.rs` |
+
+### Explicitly incomplete or absent
+
+| Area | Current state | Evidence |
+|---|---|---|
+| POSIX message queues | absent | `recipes/core/relibc/source/src/header/mod.rs` still has `TODO: mqueue.h` |
+| SysV message queues | absent | `recipes/core/relibc/source/src/header/mod.rs` still has `TODO: sys/msg.h` |
+| `threads.h` / other broader libc completeness | outside this IPC focus, still incomplete | `recipes/core/relibc/source/src/header/mod.rs` |
+
+## Current Implementation Assessment
+
+### 1. Strong spots
+
+The strongest IPC-related point is that relibc is no longer missing its core coordination substrate.
+The current tree has real, source-visible implementations for POSIX shm, POSIX semaphores, SysV
+shared memory, SysV semaphores, `eventfd`, `signalfd`, and `timerfd`. This is already enough to
+move several downstreams from patch-side workarounds to actual libc usage.
+
+`shm_open()` and `shm_unlink()` are cleanly tied to the Redox-native `/scheme/shm/` path in
+`sys_mman/mod.rs`. That is a good architectural fit: Red Bear is not pretending to have a Linux
+kernel IPC model under the hood, but it still exposes familiar libc entry points on top of Redox
+schemes.
+
+The second strong point is that the IPC work is not just source-visible anymore. The focused relibc
+tests already cover `sem_open`, `shmget`, `open_memstream`, `semget`, `eventfd`, and the bounded
+host-side `timerfd` / `signalfd` cases. The broader relibc plan also records successful downstream
+builds for `libwayland`, `qtbase`, and `openssh`, which means real consumers are already benefiting
+from this work, but those consumers do **not** all prove IPC depth equally.
+
+### 2. Weak spots
+
+The biggest weakness is **boundedness masquerading as compatibility**. The SysV layers exist, but
+they are deliberately thin wrappers over `/scheme/shm/` and relibc-local bookkeeping, not a broad
+Unix-complete implementation.
+
+In `sys_shm/mod.rs`, `shmat()` rejects non-null attach addresses with `ENOSYS`, `SHM_RND` is
+defined but not meaningfully implemented, and `shmctl()` only meaningfully supports `IPC_RMID` and
+`IPC_STAT`. This is good enough for simple `IPC_PRIVATE` workflows and current compile-time
+consumers, but it is not strong enough to claim general SysV shared-memory completeness.
+
+In `sys_sem/mod.rs`, `semget()` rejects any `nsems != 1`, so the implementation is effectively a
+single-semaphore set model rather than a full semaphore-set model. `semop()` supports multiple
+operations in one call, but only for semaphore number 0, and there is no `semtimedop()` support.
+`SEM_UNDO` is defined but not actually implemented. Compared with the standard `semop(2)` model,
+this means the current layer matches only the narrowest downstream cases.
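
The bounded `semget()` / `semop()` behavior described above can be probed from the consumer side. The following C sketch is illustrative only: `probe_sysv_sem` is a hypothetical helper, not part of relibc or its test suite, and it assumes the caller has to define `union semun` itself, as on Linux with glibc. It creates an `IPC_PRIVATE` set, performs a post and a non-blocking wait on one semaphore in a single `semop()` call, and returns the `errno` of the first failing call.

```c
#include <errno.h>
#include <sys/types.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Assumption: the libc headers do not define union semun (true for glibc),
 * so the caller provides it, as semctl(2) describes. */
union semun {
    int val;
    struct semid_ds *buf;
    unsigned short *array;
};

/*
 * Hypothetical probe: create a private set of `nsems` semaphores, then post
 * and wait once on semaphore `index` within one semop() call.
 * Returns 0 on success, or the errno of the first call that failed.
 */
int probe_sysv_sem(int nsems, unsigned short index)
{
    int id = semget(IPC_PRIVATE, nsems, IPC_CREAT | 0600);
    if (id < 0)
        return errno;

    int result = 0;
    union semun arg = { .val = 0 };

    if (semctl(id, index, SETVAL, arg) < 0) {
        result = errno;
    } else {
        /* Two operations applied in array order: post (+1), then wait (-1). */
        struct sembuf ops[2] = {
            { .sem_num = index, .sem_op = 1,  .sem_flg = 0 },
            { .sem_num = index, .sem_op = -1, .sem_flg = IPC_NOWAIT },
        };
        if (semop(id, ops, 2) < 0)
            result = errno;
    }

    /* Remove the set regardless of the outcome so nothing leaks. */
    semctl(id, 0, IPC_RMID);
    return result;
}
```

On a full SysV implementation, `probe_sysv_sem(1, 0)` and `probe_sysv_sem(2, 1)` should both return 0. Given the limits described above, the second call would be expected to fail at `semget()` on the current relibc layer; the exact error code is not stated here and would need to be checked against `sys_sem/mod.rs`.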
+
+Named POSIX semaphores are also present but still bounded. `sem_open()` is implemented on top of
+`shm_open()`, which is a practical Redox-native strategy, but the current code comments already mark
+it as a bounded Redox path rather than a full Linux/glibc-equivalent semantic model.
+
+The descriptor-event primitives are in a similar state. `eventfd` is in comparatively good shape,
+including a host fallback for Linux test execution. `signalfd` and `timerfd` are weaker. The host
+tests for both currently report bounded unavailability instead of successful execution, which is
+better than a hang or crash but still leaves them short of runtime trust. `timerfd` in particular
+supports only `TFD_CLOEXEC`, `TFD_NONBLOCK`, and `TFD_TIMER_ABSTIME`; Linux-style
+`TFD_TIMER_CANCEL_ON_SET` semantics are still absent, and downstream KWin code explicitly wants
+that flag.
+
+### 3. Missing areas
+
+The obvious missing IPC area is message queues. Both `mqueue.h` and `sys/msg.h` remain TODOs in the
+header tree, which means relibc currently has no story at all for POSIX message queues or SysV
+message queues. That is not necessarily today’s highest-value blocker, but it is still a real IPC
+gap and should be named directly instead of being buried under generic TODO volume.
+
+## Downstream Subsystem Assessment
+
+### Qt / KDE
+
+Qt and KDE are the clearest subsystem forcing IPC depth rather than just IPC surface area.
+
+`local/docs/QT6-PORT-STATUS.md` already treats `QSharedMemory`, `QSystemSemaphore`, and `QProcess`
+as moved from “missing libc surface” to “present, but still needs runtime validation”. That is the
+right framing. The libc surface is no longer the primary blocker; confidence and semantics are.
+
+The strongest concrete consumers in-tree are:
+
+- `local/recipes/kde/kf6-kservice/source/src/sycoca/kmemfile.cpp` — heavy `QSharedMemory` usage
+- `local/recipes/kde/kf6-solid/source/src/solid/devices/backends/udisks2/udisksopticaldisc.cpp` —
+  `QSharedMemory` plus `QSystemSemaphore`
+- `local/recipes/kde/kf6-kio/source/src/gui/previewjob.cpp` — direct SysV `shmget` / `shmat`
+- `local/recipes/kde/kwin/source/src/utils/xcbutils.cpp` — direct `shmget`
+- `local/recipes/kde/kwin/source/src/core/syncobjtimeline.cpp` and kio scoped-process code —
+  `eventfd`
+- `local/recipes/kde/kwin/source/src/plugins/nightlight/clockskewnotifierengine_linux.cpp` —
+  `timerfd` with `TFD_TIMER_CANCEL_ON_SET`
+
+This matters because it shows two different downstream classes:
+
+1. **Qt abstractions** (`QSharedMemory`, `QSystemSemaphore`) that can tolerate bounded underlying
+   libc behavior if their common paths work.
+2. **Direct Unix/Linux-style callers** (KIO/KWin) that expose the places where the current relibc
+   SysV and timerfd layers are still semantically narrower than software expects.
+
+### Wayland stack
+
+Wayland is less about classic shared-memory IPC completeness now and more about the descriptor-event
+side of the same subsystem family. The repo’s existing docs correctly show that `signalfd`,
+`timerfd`, `eventfd`, and `open_memstream` were the historical blockers and are now source-visible.
+`libwayland` cooking successfully is strong build-side proof, but the remaining work is runtime
+behavior under a compositor/session stack.
+
+### Secondary consumers: OpenSSH / GLib / tmux
+
+These are weaker IPC drivers and stronger networking/resolver drivers. They still matter because they
+show a pattern: once relibc exports the needed surface, downstream recipes can drop fake fallbacks,
+but runtime validation still trails source visibility. For an IPC-focused roadmap, they are useful
+secondary evidence, not primary IPC blockers.
+
+The downstream proof should therefore be read this way:
+
+- `qtbase` is the strongest IPC-facing downstream because it directly pressures shared memory,
+  semaphores, and process behavior.
+- KDE consumers on top of Qt are the strongest subsystem evidence for where IPC semantics still need
+  runtime trust.
+- `libwayland` is strongest as descriptor-event proof (`signalfd`, `timerfd`, `eventfd`,
+  `open_memstream`) rather than SysV IPC proof.
+- `openssh`, `glib`, and `tmux` are useful proof that relibc header/export cleanup is helping real
+  ports, but they should not be over-counted as core IPC validation.
+
+## Main Blockers
+
+### Blocker 1 — SysV layers are intentionally narrower than their API surface suggests
+
+This is the highest-value blocker because it affects both direct consumers and Qt/KDE confidence.
+
+Current examples:
+
+- `semget()` only supports one semaphore per set
+- `semop()` only supports semaphore number 0
+- `SEM_UNDO` is not implemented
+- `semtimedop()` is absent
+- `shmat()` does not support non-null attach addresses
+- `shmctl()` does not cover the broader control matrix
+- SysV message queues are absent entirely
+
+None of these invalidate the current build work. But together they mean “API present” is still not
+the same as “subsystem-complete”.
+
+### Blocker 2 — Runtime validation is still shallower than subsystem importance
+
+The IPC surface is better-tested than before, but runtime validation still trails the subsystem’s
+importance.
+
+Current test story:
+
+- host-side focused execution exists for `sem_open`, `shmget`, `open_memstream`, `semget`, and
+  `eventfd`
+- `signalfd` and `timerfd` are in the test harness, but host execution currently reports bounded
+  unavailability
+- downstream build evidence exists for `libwayland`, `qtbase`, and `openssh`
+
+What is still missing is stronger Redox-target or consumer-runtime proof for Qt/KDE and Wayland
+paths that actually exercise shared memory, semaphores, and timer/signal descriptor behavior in a
+live session.
+
+The strongest safe claim today is therefore:
+
+- **source-visible** across the major IPC surfaces,
+- **test-visible** for focused host-side cases,
+- **build-visible downstream** for meaningful consumers,
+- but **not yet broadly runtime-trusted on Redox**.
+
+### Blocker 3 — Descriptor-event semantics are still narrower than Linux-oriented callers expect
+
+KWin’s timer code wants `TFD_TIMER_CANCEL_ON_SET`. The current relibc timerfd layer does not support
+that flag. This is a concrete example of a downstream expectation gap that is not solved by simply
+having `timerfd_create()` present.
+
+Likewise, `signalfd` support is visible and exported, but its current confidence story is still too
+thin for broad claims about desktop/runtime readiness.
+
+### Blocker 4 — Message queues remain a completely open IPC front
+
+`mqueue.h` and `sys/msg.h` are still absent. This is not the first blocker to fix for today’s
+desktop stack, but it is the clearest “IPC truly not implemented yet” gap left in relibc.
+
+## Current Non-Goals / Not Yet Claimed
+
+The current tree should **not** be described as claiming any of the following:
+
+- full SysV semaphore-set semantics
+- full SysV shared-memory semantics
+- full Linux-equivalent `timerfd` semantics
+- broad Redox-runtime trust for `signalfd` or `timerfd`
+- any POSIX message queue support
+- any SysV message queue support
+
+## Recommended Improvement Plan
+
+### Phase I1 — Reclassify the IPC support language
+
+**Goal:** Make subsystem docs accurately describe the current state.
+
+**Do:**
+
+- describe POSIX shm and semaphores as implemented
+- describe SysV shm and semaphores as **bounded compatibility layers**, not comprehensive support
+- describe `eventfd` as stronger than `signalfd` / `timerfd`
+- describe message queues as still absent
+
+**Exit criteria:** repo docs stop using broad phrases that imply complete IPC compatibility.
+
+### Phase I2 — Harden the bounded SysV compatibility layers
+
+**Goal:** Make the existing SysV support less misleading and more useful.
+
+**Do:**
+
+- decide whether Red Bear wants full semaphore-set support or an intentionally limited single-set model
+- if limited, document that choice explicitly in relibc and subsystem docs
+- otherwise extend `semget` / `semop` / `semctl` beyond the current semaphore-0-only model
+- implement or explicitly reject `SEM_UNDO`
+- add `semtimedop()` if downstreams need it
+- expand `shmctl()` and `shmat()` support where real consumers need more than the current `IPC_PRIVATE`
+  attach workflow
+
+**Exit criteria:** the SysV shm/sem layers either become materially broader or are clearly documented
+as intentionally bounded Redox compatibility shims.
+
+### Phase I3 — Close the Qt/KDE runtime-proof gap
+
+**Goal:** Move the IPC story from build-visible to desktop-visible.
+
+**Do:**
+
+- validate `QSharedMemory` under real Qt/KDE usage paths
+- validate `QSystemSemaphore` in KDE consumers such as Solid
+- validate KIO / KWin direct SysV shm paths
+- record exactly which Qt/KDE IPC paths are now runtime-trusted versus merely build-capable
+
+**Exit criteria:** Qt/KDE docs stop listing shared memory and semaphore support as unresolved relibc
+confidence gaps.
+
+### Phase I4 — Improve descriptor-event completeness for compositor/session code
+
+**Goal:** Turn the current `eventfd` / `signalfd` / `timerfd` set into a more trustworthy runtime layer.
+
+**Do:**
+
+- keep `eventfd` on the current stable path
+- validate `signalfd` in real event-loop style consumers
+- extend `timerfd` semantics where current downstream code expects more than `TFD_TIMER_ABSTIME`
+  (notably `TFD_TIMER_CANCEL_ON_SET`)
+- build targeted Redox-target tests where host behavior is inherently not representative
+
+**Exit criteria:** at least one meaningful compositor/session consumer is runtime-validated against
+the current descriptor-event path.
+
+### Phase I5 — Triage message queues explicitly
+
+**Goal:** Stop leaving message queues as unprioritized TODOs.
+
+**Do:**
+
+- determine whether any current Red Bear subsystem actually needs POSIX or SysV message queues
+- if not, mark them as lower-priority completeness debt
+- if yes, create a dedicated implementation plan rather than burying them in generic header backlog
+
+**Exit criteria:** `mqueue.h` and `sys/msg.h` are either on a concrete roadmap or explicitly treated
+as non-blocking backlog.
+
+## Recommended Order
+
+The current best order is:
+
+1. documentation cleanup and accurate IPC classification
+2. SysV shm/sem hardening or explicit non-goal documentation
+3. Qt/KDE runtime validation
+4. descriptor-event runtime validation and timerfd semantic expansion
+5. message queue triage
+
+That order matches the current subsystem pressure better than a generic “finish all missing IPC
+headers” strategy.
+
+## Bottom Line
+
+relibc IPC in Red Bear OS is no longer a story of missing primitives. It is now a story of **real
+surface area with bounded compatibility depth**.
+
+The strongest parts are POSIX shm, POSIX semaphores, `eventfd`, and the fact that major downstreams
+already build. The weakest parts are the narrow SysV semantics, the lack of message queues, and the
+runtime-proof gap for the desktop/session stack. The right next step is not random header work; it
+is to harden and validate the IPC layers that current Qt/KDE and Wayland-adjacent consumers are
+already trying to use.