Red Bear OS — CPU/DMA/IRQ/MSI/Scheduler Fix Plan

Date: 2026-05-04
Updated: 2026-05-04 (MSI T1.1–T2.2 implemented, committed, pushed)
Status: Active — MSI Phase 1 complete, DMA/Scheduler pending
Source of truth: Linux kernel 7.0 (local/reference/linux-7.0/)

1. Problem Statement

Five critical integration gaps in the microkernel architecture:

| Gap | Severity | Impact | Status |
| --- | --- | --- | --- |
| MSI absent from kernel | CRITICAL | All NVMe/GPU/NIC on legacy INTx | RESOLVED (P8-msi.patch) |
| DMA/IOMMU not integrated | CRITICAL | DMA buffers unprotected | Pending |
| PIT tick (148Hz) vs LAPIC (1000Hz) | HIGH | Scheduler 6x slower than Linux | RESOLVED (P7-scheduler patch) |
| Global scheduler lock | HIGH | Serializes all context switches | RESOLVED (work-stealing) |
| Thread creation (3 IPC hops) | HIGH | 3x slower than Linux clone() | Pending |

2. Phase 1: MSI/MSI-X in Kernel (Week 1-3) COMPLETE

T1.1: MSI Capability Parsing DONE

  • File: kernel/src/arch/x86_shared/device/msi.rs (61 lines)
  • Commit: 678980521 in P8-msi.patch
  • Linux ref: arch/x86/kernel/apic/msi.c (391 lines)
  • Implements: MsiMessage (compose/validate), MsiCapability (parse 32/64-bit), MsixCapability (parse table/PBA), is_valid_msi_address, is_valid_msi_vector
  • Bounds-safe: all parse() methods return Option<Self>, using .get() instead of raw indexing
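
For illustration, a minimal sketch of the compose/validate split, assuming the standard x86 MSI layout from the Intel SDM; the struct and helper names mirror the list above but are not the kernel's exact code:

```rust
// Sketch only — the field layout follows the x86 MSI format, but the
// names and helpers here are illustrative, not the kernel's actual API.
pub struct MsiMessage {
    pub address: u64,
    pub data: u32,
}

impl MsiMessage {
    /// Compose a fixed-delivery, physical-destination MSI message.
    pub fn compose(dest_apic_id: u8, vector: u8) -> Option<Self> {
        if !is_valid_msi_vector(vector) {
            return None;
        }
        Some(Self {
            // Bits 31:20 must be 0xFEE; bits 19:12 carry the destination APIC ID.
            address: 0xFEE0_0000 | ((dest_apic_id as u64) << 12),
            // Bits 7:0 carry the vector; delivery mode 000b = fixed.
            data: vector as u32,
        })
    }
}

/// MSI writes must target the 0xFEE0_0000 window (mask matches T1.4 below).
pub fn is_valid_msi_address(address: u64) -> bool {
    address & 0xFFF0_0000 == 0xFEE0_0000
}

/// Usable external vectors: 0x20 (32) through 0xEF.
pub fn is_valid_msi_vector(vector: u8) -> bool {
    (0x20..=0xEF).contains(&vector)
}
```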

T1.2: Vector Allocation Matrix DONE

  • File: kernel/src/arch/x86_shared/device/vector.rs (53 lines)
  • Commit: 678980521 in P8-msi.patch
  • Linux ref: arch/x86/kernel/apic/vector.c (1387 lines)
  • Implements: per-CPU bitmatrix (7×32-bit banks = 224 vectors, covering 32-255), allocate_vector, free_vector
  • Lock-free CAS-based allocation with trailing_ones() find-first-zero
  • NOTE: VECTORS table is global (not yet per-CPU sharded) — sufficient for 224 vectors
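
A minimal sketch of the lock-free bank scan, assuming a flat 7-bank table; the real layout and names live in vector.rs:

```rust
use core::sync::atomic::{AtomicU32, Ordering};

// 7 banks × 32 bits = 224 vectors, covering 32..=255.
const FIRST_VECTOR: u32 = 32;
static VECTORS: [AtomicU32; 7] = [
    AtomicU32::new(0), AtomicU32::new(0), AtomicU32::new(0), AtomicU32::new(0),
    AtomicU32::new(0), AtomicU32::new(0), AtomicU32::new(0),
];

/// Find and claim the first free vector; returns None if all 224 are taken.
pub fn allocate_vector() -> Option<u32> {
    for (bank, word) in VECTORS.iter().enumerate() {
        loop {
            let current = word.load(Ordering::Acquire);
            // trailing_ones() gives the index of the first zero bit.
            let bit = current.trailing_ones();
            if bit == 32 {
                break; // bank full, try the next one
            }
            let updated = current | (1 << bit);
            if word
                .compare_exchange(current, updated, Ordering::AcqRel, Ordering::Acquire)
                .is_ok()
            {
                return Some(FIRST_VECTOR + (bank as u32) * 32 + bit);
            }
            // Lost the CAS race; retry the same bank.
        }
    }
    None
}

pub fn free_vector(vector: u32) {
    assert!(vector >= FIRST_VECTOR);
    let idx = vector - FIRST_VECTOR;
    let (bank, bit) = ((idx / 32) as usize, idx % 32);
    VECTORS[bank].fetch_and(!(1 << bit), Ordering::AcqRel);
}
```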

T1.3: MSI IRQ Domain (Scheme Integration) DONE

  • File: kernel/src/scheme/irq.rs
  • Commit: 678980521 in P8-msi.patch
  • Implements: msi_vector_is_valid() (32-0xEF range check), iommu_validate_msi_irq() hook (stub: always true), IOMMU gate at irq_trigger() for vectors ≥16

T1.4: Userspace MSI Consumer (driver-sys) DONE

  • File: local/recipes/drivers/redox-driver-sys/source/src/irq.rs
  • Commit: 678980521
  • Implements: MsiAllocation with round-robin CPU allocation, irq_set_affinity (scheme write), program_x86_message with kernel-mediated address/vector validation (mask 0xFFF0_0000)
  • Quirk-aware fallback retained: FORCE_LEGACY, NO_MSI, NO_MSIX
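
A hedged sketch of the userspace side, assuming the affinity node is exposed at a path like /scheme/irq/&lt;irq&gt;/affinity and takes a little-endian CPU mask; the actual path format and encoding are whatever driver-sys implements:

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};

/// Spread MSI-X vectors across CPUs round-robin (sketch; the real
/// MsiAllocation type in driver-sys owns this policy).
fn target_cpu(vector_index: usize, cpu_count: usize) -> usize {
    vector_index % cpu_count
}

/// Write a CPU mask to the per-IRQ affinity node. The path below is an
/// assumption based on the scheme layout described in this plan.
fn irq_set_affinity(irq: usize, cpu_mask: u64) -> io::Result<()> {
    let mut file = OpenOptions::new()
        .write(true)
        .open(format!("/scheme/irq/{}/affinity", irq))?;
    file.write_all(&cpu_mask.to_le_bytes())
}
```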

T1.5: Kernel-side MSI Affinity Handler DONE

  • File: kernel/src/scheme/irq.rs
  • Commit: 678980521 in P8-msi.patch
  • Implements: Handle::IrqAffinity { irq, mask } variant, path routing for <irq>/affinity and cpu-XX/<irq>/affinity, kwrite validates CPU id and stores mask atomically, kfstat/kfpath/kreadoff/close all handle new variant
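
A rough sketch of the handle variant and path routing, with assumed field types; the real enum and parser in scheme/irq.rs differ in detail:

```rust
/// Sketch of the scheme-handle variant; the actual enum has more variants
/// and different field types.
enum Handle {
    IrqAffinity { irq: usize, mask: core::sync::atomic::AtomicU64 },
    // ... existing variants elided
}

/// Route "<irq>/affinity" and "cpu-XX/<irq>/affinity" to the affinity handle.
fn parse_affinity_path(path: &str) -> Option<usize> {
    let rest = path.strip_suffix("/affinity")?;
    // An optional "cpu-XX/" prefix selects a per-CPU subtree; the IRQ number
    // is always the last component before "/affinity".
    let irq_str = rest.rsplit('/').next()?;
    irq_str.parse::<usize>().ok()
}
```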

3. Phase 2: DMA/IOMMU Integration (Week 3-5) — AUDITED 2026-05-04

Status: IOMMU daemon (1003 lines) and DmaBuffer (261 lines) already exist and are solid. Tasks re-scoped from "create" to "wire."

T2.1: IommuDmaAllocator (driver-sys) P0

  • File: local/recipes/drivers/redox-driver-sys/source/src/dma.rs
  • Add IommuDmaAllocator struct: holds IOMMU domain fd, wraps DmaBuffer::allocate() with IOMMU MAP opcode
  • Uses scheme:iommu/domain/N write with MAP request → get IOVA
  • Linux ref: include/linux/dma-mapping.h → dma_alloc_coherent() → iommu_dma_alloc()
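
A hedged sketch of the allocate/map flow for T2.1. The opcode value, request encoding, and field order below are placeholders; the real constants are the ones inlined in dma.rs:

```rust
use std::fs::File;
use std::io::{self, Read, Write};

const IOMMU_OP_MAP: u32 = 1; // placeholder opcode; the real value is inlined in dma.rs

/// Hypothetical wire format for a MAP request: opcode, physical address, length.
fn encode_map_request(phys: u64, len: u64) -> [u8; 20] {
    let mut buf = [0u8; 20];
    buf[0..4].copy_from_slice(&IOMMU_OP_MAP.to_le_bytes());
    buf[4..12].copy_from_slice(&phys.to_le_bytes());
    buf[12..20].copy_from_slice(&len.to_le_bytes());
    buf
}

pub struct IommuDmaAllocator {
    domain: File, // open fd on the IOMMU domain endpoint
}

impl IommuDmaAllocator {
    /// Map a physically contiguous range through the IOMMU domain and return
    /// the IOVA the device should use for DMA.
    pub fn map(&mut self, phys: u64, len: u64) -> io::Result<u64> {
        self.domain.write_all(&encode_map_request(phys, len))?;
        let mut iova = [0u8; 8];
        self.domain.read_exact(&mut iova)?;
        Ok(u64::from_le_bytes(iova))
    }
}
```

DmaBuffer::allocate() would supply the physical address; unmap() mirrors this with an UNMAP opcode and the IOVA to release.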

T2.2: GPU DMA pass-through P0

  • Wire redox-drm GPU drivers to open IOMMU device endpoint and use IommuDmaAllocator
  • amdgpu: VRAM/GTT allocations through IOMMU domain
  • Intel i915: GTT pages through IOMMU domain
  • Files: local/recipes/gpu/redox-drm/source/, local/recipes/gpu/amdgpu/source/

T2.3: Streaming DMA (linux-kpi) P1

  • dma_map_single(): allocate bounce buffer, copy data, map through IOMMU
  • dma_unmap_single(): copy back, unmap, free bounce buffer
  • Linux ref: kernel/dma/mapping.c — streaming API
  • File: local/recipes/drivers/linux-kpi/source/
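
A sketch of the intended map/unmap semantics, assuming an IOMMU-backed bounce buffer; map_bounce_buffer/unmap_bounce_buffer stand in for the IommuDmaAllocator calls from T2.1:

```rust
/// DMA direction, mirroring Linux's enum dma_data_direction.
pub enum DmaDirection {
    ToDevice,
    FromDevice,
}

/// Streaming-DMA state: a bounce buffer plus the IOVA handed to the device.
pub struct StreamingMapping {
    bounce: Vec<u8>,
    pub iova: u64,
}

/// dma_map_single(): copy the caller's data into a DMA-safe bounce buffer
/// and map it through the IOMMU.
pub fn dma_map_single(data: &[u8], dir: DmaDirection) -> StreamingMapping {
    let mut bounce = vec![0u8; data.len()];
    if matches!(dir, DmaDirection::ToDevice) {
        bounce.copy_from_slice(data); // the device will read this
    }
    let iova = map_bounce_buffer(&bounce);
    StreamingMapping { bounce, iova }
}

/// dma_unmap_single(): copy device-written data back (`out` must match the
/// mapping length), then unmap and drop the bounce buffer.
pub fn dma_unmap_single(mapping: StreamingMapping, out: &mut [u8], dir: DmaDirection) {
    if matches!(dir, DmaDirection::FromDevice) {
        out.copy_from_slice(&mapping.bounce);
    }
    unmap_bounce_buffer(mapping.iova);
}

// Placeholders for the IOMMU MAP/UNMAP calls described in T2.1.
fn map_bounce_buffer(_buf: &[u8]) -> u64 { 0 }
fn unmap_bounce_buffer(_iova: u64) {}
```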

T2.4: NVMe DMA pass-through P1

  • Wire ahcid/nvmed PRP list physical addresses through IOMMU domain
  • Linux ref: drivers/nvme/host/pci.c → nvme_map_data()
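
For illustration, a sketch of building PRP entries from IOMMU-translated addresses (IOVAs) rather than raw physical addresses; the helper name and fixed page size are assumptions:

```rust
const PAGE_SIZE: u64 = 4096;

/// Build the PRP entries for one transfer from IOMMU-translated addresses.
/// `iova` is the device-visible address returned by the IOMMU MAP call (T2.1);
/// with the IOMMU wired in, raw physical addresses must never reach the PRP list.
fn build_prp_entries(iova: u64, len: u64) -> Vec<u64> {
    let mut entries = Vec::new();
    // PRP1 may start at any offset inside its page ...
    entries.push(iova);
    // ... but every following entry must be page-aligned. In real NVMe,
    // anything beyond the first two entries lives in a separate PRP list
    // page (omitted here).
    let mut next = (iova & !(PAGE_SIZE - 1)) + PAGE_SIZE;
    while next < iova + len {
        entries.push(next);
        next += PAGE_SIZE;
    }
    entries
}
```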

T2.5: SWIOTLB Fallback (low priority) P2

  • Linux ref: kernel/dma/swiotlb.c
  • Bounce buffer for devices with <4GB DMA addressing
  • Only needed for legacy devices; modern x86_64 hardware doesn't need it

4. Phase 3: Scheduler Improvements (Week 4-6) — MOSTLY DONE

T3.1: LAPIC Timer as Primary Tick DONE

  • P7-scheduler-improvements.patch: LAPIC timer calibrated + enabled at vector 48
  • TSC-deadline mode, 1000Hz tick drives DWRR scheduler directly
  • PIT fallback retained

T3.2: Per-CPU Scheduler Locks DONE

  • Work-stealing load balancer in switch.rs
  • Per-CPU nr_running counter
  • Idle CPUs steal work via IPI
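
A minimal sketch of the steal decision, with an assumed fixed CPU count and counter layout:

```rust
use core::sync::atomic::{AtomicUsize, Ordering};

const MAX_CPUS: usize = 8; // illustrative; the kernel sizes this at boot

/// Per-CPU runnable-task counters (the real ones live next to each CPU's run queue).
static NR_RUNNING: [AtomicUsize; MAX_CPUS] = [
    AtomicUsize::new(0), AtomicUsize::new(0), AtomicUsize::new(0), AtomicUsize::new(0),
    AtomicUsize::new(0), AtomicUsize::new(0), AtomicUsize::new(0), AtomicUsize::new(0),
];

/// Threshold of 1 tick: an idle CPU looks for work as soon as another CPU has
/// more than one runnable task (reduced from 3 for the LAPIC-driven tick).
const STEAL_THRESHOLD: usize = 1;

/// Called from the idle path: pick the busiest CPU worth stealing from,
/// then (in the real kernel) send it an IPI to hand over a task.
fn find_steal_victim(this_cpu: usize) -> Option<usize> {
    let mut victim = None;
    let mut busiest = STEAL_THRESHOLD;
    for cpu in 0..MAX_CPUS {
        if cpu == this_cpu {
            continue;
        }
        let load = NR_RUNNING[cpu].load(Ordering::Relaxed);
        if load > busiest {
            busiest = load;
            victim = Some(cpu);
        }
    }
    victim
}
```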

T3.3: Load Balancing DONE

  • Threshold reduced: 3→1 ticks for LAPIC-driven mode
  • Geometric weights in DWRR

T3.4: RT Scheduling Class DONE

  • RT scheduling class (priority 0-9, skip DWRR, immediate dispatch)
  • Linux ref: kernel/sched/rt.c
  • FIFO and Round-Robin classes
  • Priority inheritance
  • RT throttling: 95% CPU cap/sec

T3.5: TSC-Deadline Timer

  • Use IA32_TSC_DEADLINE MSR for precise tick
  • True tickless operation
  • TSC calibration via HPET or PIT

T3.6: NUMA-Aware Scheduling

  • Not implemented — low priority for desktop/non-NUMA systems
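
For the TSC-deadline tick in T3.5, a minimal sketch of arming the next interrupt; the MSR index comes from the Intel SDM, and the helper name is illustrative. It assumes the LAPIC LVT timer entry is already in TSC-deadline mode (T3.1):

```rust
use core::arch::asm;

const IA32_TSC_DEADLINE: u32 = 0x6E0;

/// Arm the next timer interrupt `delta` TSC cycles from now.
#[cfg(target_arch = "x86_64")]
unsafe fn arm_tsc_deadline(delta: u64) {
    let deadline = core::arch::x86_64::_rdtsc() + delta;
    let (lo, hi) = (deadline as u32, (deadline >> 32) as u32);
    // wrmsr takes the MSR index in ECX and the value in EDX:EAX.
    asm!("wrmsr", in("ecx") IA32_TSC_DEADLINE, in("eax") lo, in("edx") hi,
         options(nostack, preserves_flags));
}
```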

5. Phase 4: Thread Creation (Week 6-7)

T4.1: Batched Thread Creation

  • Batch new-thread requests (reduce IPC)
  • Pre-allocate stack pages during fork

T4.2: Kernel Thread Pool

  • Pre-create idle kernel threads
  • Reuse via object pool

T4.3: Shared Memory IPC

  • Use shm for proc scheme bulk ops
  • Avoid data copy through IPC channel
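
A hedged sketch of the bulk-IPC idea in T4.3, assuming POSIX shm_open/mmap are available through relibc; on Redox the proc/memory schemes would back this, and all names here are illustrative:

```rust
use std::{io, ptr, slice};

/// Create a shared-memory region and return (fd, mapping). Only the fd travels
/// over the IPC channel, so the bulk payload is never copied through it.
fn create_bulk_buffer(name: &str, len: usize) -> io::Result<(i32, &'static mut [u8])> {
    let c_name = std::ffi::CString::new(name).unwrap();
    unsafe {
        let fd = libc::shm_open(c_name.as_ptr(), libc::O_CREAT | libc::O_RDWR, 0o600);
        if fd < 0 {
            return Err(io::Error::last_os_error());
        }
        if libc::ftruncate(fd, len as libc::off_t) < 0 {
            return Err(io::Error::last_os_error());
        }
        let map = libc::mmap(
            ptr::null_mut(),
            len,
            libc::PROT_READ | libc::PROT_WRITE,
            libc::MAP_SHARED,
            fd,
            0,
        );
        if map == libc::MAP_FAILED {
            return Err(io::Error::last_os_error());
        }
        // The consumer maps the same fd to see the data without a copy.
        Ok((fd, slice::from_raw_parts_mut(map as *mut u8, len)))
    }
}
```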

6. Dependencies

Phase 1 (MSI): T1.1 -> T1.2 -> T1.3 -> T1.4 -> T1.5
Phase 2 (DMA): T2.1 -> T2.2 -> T2.3 -> T2.4 -> T2.5
Phase 3 (Sched): T3.1 -> T3.5 -> T3.2 -> T3.3 -> T3.4
Phase 4 (Thread): T4.1 -> T4.2 -> T4.3

Phases 1 and 2 are independent and can run in parallel. T2.4 depends on T1.3. T3.1 is already done, so Phase 3 work can continue immediately.

7. Timeline

| Phase | Duration | Cumulative |
| --- | --- | --- |
| Phase 1 (MSI) | 3 weeks | Week 3 |
| Phase 2 (DMA/IOMMU) | 3 weeks | Week 5 |
| Phase 3 (Scheduler) | 3 weeks | Week 7 |
| Phase 4 (Threads) | 2 weeks | Week 7 |

Total: 7 weeks (two developers, with Phases 1 and 2 in parallel)

8. Success Metrics

| Metric | Before | After |
| --- | --- | --- |
| Scheduler tick | 148Hz (PIT) | 1000Hz (LAPIC) |
| NVMe throughput | INTx (shared) | MSI-X, 4+ queues |
| Context switch | ~6.75ms | ~1ms |
| Thread create | 3 IPC hops | 2 IPC hops |
| DMA safety | Unprotected | IOMMU-mapped |