029472d5e3
dma.rs: IommuDmaAllocator (145 lines)
- New struct wires the existing IOMMU daemon (1003 lines) to the existing DmaBuffer (261 lines)
- allocate(): phys-contiguous alloc via scheme:memory, then MAP through the IOMMU domain
- unmap(): sends UNMAP to the IOMMU domain, releases the IOVA
- Inlined IOMMU protocol constants — no new crate dependency
- encode_iommu_request/decode_iommu_response for the scheme write/read cycle

Documentation updates:
- IMPLEMENTATION-MASTER-PLAN.md: K2 DMA/IOMMU section expanded from a 3-line gap list to a full audit with component inventory, gap analysis, implementation plan (D2.1-D2.5), and a Linux reference table. Added K2b thread/fork audit.
- CPU-DMA-IRQ-MSI-SCHEDULER-FIX-PLAN.md: Phase 1 (MSI) marked complete with per-task status. Phase 2 (DMA) re-scoped from "create" to "wire" based on the audit. Phase 3 (scheduler) marked mostly done.
- IRQ-AND-LOWLEVEL-CONTROLLERS-ENHANCEMENT-PLAN.md: kernel MSI support noted as materially strong, with a P8-msi.patch reference.

Audit findings:
- IOMMU daemon is solid: 1003-line lib.rs with the full scheme protocol, 427-line amd_vi.rs, host-runnable tests. Needs wiring, not rewriting.
- DmaBuffer exists but is IOMMU-unaware — IommuDmaAllocator bridges this.
- relibc rlct_clone is correct for threads (shares the address space implicitly). The "3 IPC hops" claim is microkernel-architectural, not a real perf issue.
- No stale docs to archive at this time.
Red Bear OS — CPU/DMA/IRQ/MSI/Scheduler Fix Plan
Date: 2026-05-04
Updated: 2026-05-04 (MSI T1.1–T2.2 implemented, committed, pushed)
Status: Active — MSI Phase 1 complete, DMA/Scheduler pending
Source of truth: Linux kernel 7.0 (`local/reference/linux-7.0/`)
1. Problem Statement
Five critical integration gaps in the microkernel architecture:
| Gap | Severity | Impact | Status |
|---|---|---|---|
| MSI absent from kernel | CRITICAL | All NVMe/GPU/NIC on legacy INTx | ✅ RESOLVED (P8-msi.patch) |
| DMA/IOMMU not integrated | CRITICAL | DMA buffers unprotected | ⏳ Pending |
| PIT tick (148Hz) vs LAPIC (1000Hz) | HIGH | Scheduler 6x slower than Linux | ✅ RESOLVED (P7-scheduler patch) |
| Global scheduler lock | HIGH | Serializes all context switches | ✅ RESOLVED (work-stealing) |
| Thread creation (3 IPC hops) | HIGH | 3x slower than Linux clone() | ⏳ Pending |
2. Phase 1: MSI/MSI-X in Kernel (Week 1-3) ✅ COMPLETE
T1.1: MSI Capability Parsing ✅ DONE
- File: `kernel/src/arch/x86_shared/device/msi.rs` (61 lines)
- Commit: `678980521` in `P8-msi.patch`
- Linux ref: `arch/x86/kernel/apic/msi.c` (391 lines)
- Implements: `MsiMessage` (compose/validate), `MsiCapability` (parse 32/64-bit), `MsixCapability` (parse table/PBA), `is_valid_msi_address`, `is_valid_msi_vector`
- Bounds-safe: all `parse()` methods return `Option<Self>`, using `.get()` instead of raw indexing (see the sketch below)
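A minimal sketch of that bounds-safe pattern, assuming the standard PCI MSI capability layout (capability ID 0x05, 16-bit message control at offset 2, 64-bit flag in control bit 7). The struct shape and names here are illustrative, not the actual `msi.rs` definitions:

```rust
/// Illustrative MSI capability, parsed from raw config-space bytes.
#[derive(Debug)]
pub struct MsiCapability {
    pub message_control: u16,
    pub message_address: u64,
    pub message_data: u16,
}

impl MsiCapability {
    /// Returns None instead of panicking when `cap` is truncated.
    pub fn parse(cap: &[u8]) -> Option<Self> {
        // Capability ID 0x05 identifies MSI.
        if *cap.get(0)? != 0x05 {
            return None;
        }
        let control = u16::from_le_bytes([*cap.get(2)?, *cap.get(3)?]);
        let addr_lo =
            u32::from_le_bytes([*cap.get(4)?, *cap.get(5)?, *cap.get(6)?, *cap.get(7)?]);
        // Bit 7 of message control selects the 64-bit address format,
        // which shifts the message-data field from offset 8 to 12.
        let is_64bit = control & (1 << 7) != 0;
        let (addr_hi, data_off) = if is_64bit {
            let hi =
                u32::from_le_bytes([*cap.get(8)?, *cap.get(9)?, *cap.get(10)?, *cap.get(11)?]);
            (hi, 12)
        } else {
            (0, 8)
        };
        let data = u16::from_le_bytes([*cap.get(data_off)?, *cap.get(data_off + 1)?]);
        Some(MsiCapability {
            message_control: control,
            message_address: ((addr_hi as u64) << 32) | addr_lo as u64,
            message_data: data,
        })
    }
}
```

Returning `Option<Self>` pushes truncated-capability handling to the caller instead of indexing past the end of config space in the kernel.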
T1.2: Vector Allocation Matrix ✅ DONE
- File: `kernel/src/arch/x86_shared/device/vector.rs` (53 lines)
- Commit: `678980521` in `P8-msi.patch`
- Linux ref: `arch/x86/kernel/apic/vector.c` (1387 lines)
- Implements: per-CPU bitmatrix (7 × 32-bit banks = 224 vectors, covering 32-255), `allocate_vector`, `free_vector`
- Lock-free CAS-based allocation with `trailing_ones()` find-first-zero (sketched below)
- NOTE: the VECTORS table is currently global, not per-CPU sharded — sufficient for 224 vectors
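A sketch of the lock-free claim loop under the same layout (7 banks × 32 bits, vectors based at 32, per the bullets above). The statics and helper names are illustrative:

```rust
use core::sync::atomic::{AtomicU32, Ordering};

const BANKS: usize = 7; // 7 x 32 bits = 224 vectors, numbered 32..=255
static VECTORS: [AtomicU32; BANKS] = [
    AtomicU32::new(0), AtomicU32::new(0), AtomicU32::new(0), AtomicU32::new(0),
    AtomicU32::new(0), AtomicU32::new(0), AtomicU32::new(0),
];

/// Find the lowest free bit via trailing_ones() and claim it with CAS.
pub fn allocate_vector() -> Option<u8> {
    for (bank, word) in VECTORS.iter().enumerate() {
        loop {
            let current = word.load(Ordering::Relaxed);
            // The count of consecutive 1-bits from the LSB is exactly
            // the index of the lowest 0-bit.
            let bit = current.trailing_ones();
            if bit >= 32 {
                break; // bank full, try the next one
            }
            let claimed = current | (1 << bit);
            if word
                .compare_exchange(current, claimed, Ordering::AcqRel, Ordering::Relaxed)
                .is_ok()
            {
                // Vectors start at 32; below that are CPU exceptions.
                return Some(32 + (bank as u32 * 32 + bit) as u8);
            }
            // Lost the CAS race; reload this bank and retry.
        }
    }
    None
}

pub fn free_vector(vec: u8) {
    let idx = (vec - 32) as usize;
    VECTORS[idx / 32].fetch_and(!(1 << (idx % 32)), Ordering::AcqRel);
}
```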
T1.3: MSI IRQ Domain (Scheme Integration) ✅ DONE
- File: `kernel/src/scheme/irq.rs`
- Commit: `678980521` in `P8-msi.patch`
- Implements: `msi_vector_is_valid()` (32-0xEF range check), `iommu_validate_msi_irq()` hook (stub: always true), IOMMU gate at `irq_trigger()` for vectors ≥ 16
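For reference, the range check amounts to the following sketch; treating 0xF0-0xFF as reserved for system vectors is an assumption about the kernel's vector layout:

```rust
/// Vectors below 32 are CPU exceptions; values above 0xEF are assumed
/// reserved for kernel/system IPIs, so MSI may only target 32..=0xEF.
fn msi_vector_is_valid(vector: usize) -> bool {
    (32..=0xEF).contains(&vector)
}
```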
T1.4: Userspace MSI Consumer (driver-sys) ✅ DONE
- File: `local/recipes/drivers/redox-driver-sys/source/src/irq.rs`
- Commit: `678980521`
- Implements: `MsiAllocation` with round-robin CPU allocation, `irq_set_affinity` (scheme write), `program_x86_message` with kernel-mediated address/vector validation (mask `0xFFF0_0000`)
- Quirk-aware fallback retained: FORCE_LEGACY, NO_MSI, NO_MSIX
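The `0xFFF0_0000` mask reflects the architectural x86 MSI address window: valid message addresses sit in the LAPIC's `0xFEEx_xxxx` range. A sketch of the check (the function name echoes T1.1; the logic is a plausible reading of the mask, not the committed code):

```rust
const X86_MSI_ADDR_BASE: u32 = 0xFEE0_0000; // LAPIC MSI address window
const X86_MSI_ADDR_MASK: u32 = 0xFFF0_0000; // top 12 bits must match

fn is_valid_msi_address(addr: u64) -> bool {
    // Addresses above 4 GiB can never hit the LAPIC window on x86.
    addr <= u32::MAX as u64 && (addr as u32) & X86_MSI_ADDR_MASK == X86_MSI_ADDR_BASE
}
```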
T1.5: Kernel-side MSI Affinity Handler ✅ DONE
- File: `kernel/src/scheme/irq.rs`
- Commit: `678980521` in `P8-msi.patch`
- Implements: `Handle::IrqAffinity { irq, mask }` variant; path routing for `<irq>/affinity` and `cpu-XX/<irq>/affinity` (see the sketch below); `kwrite` validates the CPU id and stores the mask atomically; `kfstat`/`kfpath`/`kreadoff`/`close` all handle the new variant
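A sketch of the path routing described above, assuming textual scheme paths of the two listed shapes; parsing details are illustrative:

```rust
/// Resolve "<irq>/affinity" or "cpu-XX/<irq>/affinity" to the IRQ
/// number that should map to Handle::IrqAffinity { irq, .. }.
fn parse_affinity_path(path: &str) -> Option<usize> {
    // Strip an optional "cpu-XX/" prefix.
    let rest = match path.strip_prefix("cpu-") {
        Some(r) => r.split_once('/')?.1,
        None => path,
    };
    let (irq, leaf) = rest.split_once('/')?;
    if leaf != "affinity" {
        return None;
    }
    irq.parse().ok()
}
```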
3. Phase 2: DMA/IOMMU Integration (Week 3-5) — AUDITED 2026-05-04
Status: IOMMU daemon (1003 lines) and DmaBuffer (261 lines) already exist and are solid. Tasks re-scoped from "create" to "wire."
T2.1: IommuDmaAllocator (driver-sys) ⏳ P0
- File: `local/recipes/drivers/redox-driver-sys/source/src/dma.rs`
- Add `IommuDmaAllocator` struct: holds the IOMMU domain fd, wraps `DmaBuffer::allocate()` with the IOMMU MAP opcode (see the sketch below)
- Uses a write to `scheme:iommu/domain/N` carrying a MAP request → gets back an IOVA
- Linux ref: `include/linux/dma-mapping.h` — `dma_alloc_coherent()` → `iommu_dma_alloc()`
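A sketch of the allocator's shape and its MAP round-trip over the domain endpoint. The opcode values, request encoding, and response layout are placeholder assumptions standing in for the inlined protocol constants; only the overall write-then-read scheme cycle is taken from the audit:

```rust
use std::fs::File;
use std::io::{Read, Write};

// Placeholder opcodes; the real constants are inlined in dma.rs.
const IOMMU_OP_MAP: u8 = 1;
const IOMMU_OP_UNMAP: u8 = 2;

/// Holds the IOMMU domain endpoint; allocate() composes
/// DmaBuffer::allocate() (phys-contiguous) with map() below.
pub struct IommuDmaAllocator {
    domain: File, // opened from the scheme:iommu/domain/N endpoint
}

impl IommuDmaAllocator {
    /// MAP a phys-contiguous range into the domain; returns the IOVA.
    pub fn map(&mut self, phys: u64, len: u64) -> std::io::Result<u64> {
        let mut req = Vec::with_capacity(17);
        req.push(IOMMU_OP_MAP);
        req.extend_from_slice(&phys.to_le_bytes());
        req.extend_from_slice(&len.to_le_bytes());
        self.domain.write_all(&req)?; // scheme write...

        let mut resp = [0u8; 8];
        self.domain.read_exact(&mut resp)?; // ...then scheme read
        Ok(u64::from_le_bytes(resp))
    }

    /// UNMAP releases the IOVA from the domain.
    pub fn unmap(&mut self, iova: u64, len: u64) -> std::io::Result<()> {
        let mut req = Vec::with_capacity(17);
        req.push(IOMMU_OP_UNMAP);
        req.extend_from_slice(&iova.to_le_bytes());
        req.extend_from_slice(&len.to_le_bytes());
        self.domain.write_all(&req)
    }
}
```

Keeping the protocol constants inline, rather than in a shared crate, matches the commit's no-new-dependency constraint.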
T2.2: GPU DMA pass-through ⏳ P0
- Wire `redox-drm` GPU drivers to open the IOMMU device endpoint and use `IommuDmaAllocator`
- amdgpu: VRAM/GTT allocations through the IOMMU domain
- Intel i915: GTT pages through the IOMMU domain
- Files: `local/recipes/gpu/redox-drm/source/`, `local/recipes/gpu/amdgpu/source/`
T2.3: Streaming DMA (linux-kpi) ⏳ P1
- `dma_map_single()`: allocate a bounce buffer, copy data, map through the IOMMU
- `dma_unmap_single()`: copy back, unmap, free the bounce buffer
- Linux ref: `kernel/dma/mapping.c` — streaming API
- File: `local/recipes/drivers/linux-kpi/source/`
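A sketch of the streaming pair built on the T2.1 allocator sketch; `DmaBuffer` below is a stand-in for the existing 261-line type, and all of its accessors are hypothetical:

```rust
// Stand-in for the existing DmaBuffer (methods hypothetical; the real
// type allocates phys-contiguous pages through scheme:memory).
pub struct DmaBuffer { bytes: Vec<u8>, phys: u64 }

impl DmaBuffer {
    pub fn allocate(len: usize) -> std::io::Result<Self> {
        Ok(DmaBuffer { bytes: vec![0; len], phys: 0 /* placeholder */ })
    }
    pub fn physical(&self) -> u64 { self.phys }
    pub fn as_slice(&self) -> &[u8] { &self.bytes }
    pub fn as_mut_slice(&mut self) -> &mut [u8] { &mut self.bytes }
}

/// Streaming map: bounce-buffer copy, then IOMMU MAP.
pub fn dma_map_single(
    alloc: &mut IommuDmaAllocator,
    data: &[u8],
) -> std::io::Result<(DmaBuffer, u64)> {
    // 1. Allocate a phys-contiguous bounce buffer.
    let mut bounce = DmaBuffer::allocate(data.len())?;
    // 2. Copy the caller's bytes in (the "streaming" copy).
    bounce.as_mut_slice()[..data.len()].copy_from_slice(data);
    // 3. Map through the IOMMU; the device addresses the returned IOVA.
    let iova = alloc.map(bounce.physical(), data.len() as u64)?;
    Ok((bounce, iova))
}

/// Streaming unmap: copy device-written data back, then drop the mapping.
pub fn dma_unmap_single(
    alloc: &mut IommuDmaAllocator,
    bounce: &DmaBuffer,
    iova: u64,
    out: &mut [u8],
) -> std::io::Result<()> {
    out.copy_from_slice(&bounce.as_slice()[..out.len()]);
    alloc.unmap(iova, out.len() as u64)
}
```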
T2.4: NVMe DMA pass-through ⏳ P1
- Wire `ahcid`/`nvmed` PRP list physical addresses through the IOMMU domain (see the sketch below)
- Linux ref: `drivers/nvme/host/pci.c` — `nvme_map_data()`
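The wiring amounts to translating each PRP entry from a physical address to an IOVA before the command is posted. A sketch, reusing the T2.1 allocator sketch and assuming 4 KiB pages:

```rust
/// Rewrite a PRP list in place so the controller addresses IOVAs
/// instead of raw physical addresses (page size assumed 4 KiB).
fn map_prp_list(
    alloc: &mut IommuDmaAllocator,
    prps: &mut [u64],
) -> std::io::Result<()> {
    for entry in prps.iter_mut() {
        *entry = alloc.map(*entry, 4096)?;
    }
    Ok(())
}
```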
T2.5: SWIOTLB Fallback (low priority) ⏳ P2
- Linux ref: `kernel/dma/swiotlb.c`
- Bounce buffer for devices with < 4 GB DMA addressing
- Only needed for ancient hardware; modern x86_64 hardware doesn't need it
4. Phase 3: Scheduler Improvements (Week 4-6) — MOSTLY DONE
T3.1: LAPIC Timer as Primary Tick ✅ DONE
- P7-scheduler-improvements.patch: LAPIC timer calibrated + enabled at vector 48
- TSC-deadline mode, 1000Hz tick drives DWRR scheduler directly
- PIT fallback retained
T3.2: Per-CPU Scheduler Locks ✅ DONE
- Work-stealing load balancer in switch.rs
- Per-CPU nr_running counter
- Idle CPUs steal work via IPI
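A conceptual sketch of the idle-CPU side of that balancer: scan peers' `nr_running` counters and pick the busiest victim (names hypothetical; the real handoff goes through an IPI in `switch.rs`):

```rust
use core::sync::atomic::{AtomicUsize, Ordering};

/// Per-CPU run-queue length: written by the owning CPU, read by stealers.
pub struct PerCpu {
    pub nr_running: AtomicUsize,
}

/// An idle CPU returns the busiest peer worth stealing from, or None.
/// The actual task handoff (not shown) is requested via IPI.
pub fn pick_steal_target(cpus: &[PerCpu], me: usize) -> Option<usize> {
    cpus.iter()
        .enumerate()
        .filter(|(i, _)| *i != me)
        .map(|(i, c)| (i, c.nr_running.load(Ordering::Relaxed)))
        .filter(|&(_, n)| n > 1) // leave at least one task on the victim
        .max_by_key(|&(_, n)| n)
        .map(|(i, _)| i)
}
```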
T3.3: Load Balancing ✅ DONE
- Threshold reduced: 3→1 ticks for LAPIC-driven mode
- Geometric weights in DWRR
T3.4: RT Scheduling Class ✅ DONE
- RT scheduling class (priority 0-9, skips DWRR, immediate dispatch)
- Linux ref: `kernel/sched/rt.c`
- FIFO and Round-Robin classes
- Priority inheritance
- RT throttling: 95% CPU cap/sec
T3.5: NUMA-Aware Scheduling ❌
- Not implemented — low priority for desktop/non-NUMA systems
T3.6: TSC-Deadline Timer
- Use IA32_TSC_DEADLINE MSR for precise tick
- True tickless operation
- TSC calibration via HPET or PIT
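A sketch of arming one tick in TSC-deadline mode. The MSR index (`0x6E0`) is architectural; the `wrmsr` helper and the calibration inputs are illustrative:

```rust
// IA32_TSC_DEADLINE: the LAPIC timer fires when TSC >= programmed value.
const IA32_TSC_DEADLINE: u32 = 0x6E0;

/// Write an MSR via the standard x86_64 wrmsr instruction.
unsafe fn wrmsr(msr: u32, value: u64) {
    let lo = value as u32;
    let hi = (value >> 32) as u32;
    core::arch::asm!("wrmsr", in("ecx") msr, in("eax") lo, in("edx") hi);
}

/// Arm the next tick 1/hz seconds from now, given a calibrated TSC
/// frequency (e.g. measured against HPET or PIT).
unsafe fn arm_tick(tsc_hz: u64, hz: u64) {
    let now = core::arch::x86_64::_rdtsc();
    wrmsr(IA32_TSC_DEADLINE, now + tsc_hz / hz);
}
```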
5. Phase 4: Thread Creation (Week 6-7)
T4.1: Batched Thread Creation
- Batch new-thread requests (reduce IPC)
- Pre-allocate stack pages during fork
T4.2: Kernel Thread Pool
- Pre-create idle kernel threads
- Reuse via object pool
T4.3: Shared Memory IPC
- Use shm for proc scheme bulk ops
- Avoid data copy through IPC channel
6. Dependencies
Phase 1 (MSI): T1.1 -> T1.2 -> T1.3 -> T1.4 -> T1.5
Phase 2 (DMA): T2.1 -> T2.2 -> T2.3 -> T2.4 -> T2.5
Phase 3 (Sched): T3.1 -> T3.6 -> T3.2 -> T3.3 -> T3.4
Phase 4 (Thread): T4.1 -> T4.2 -> T4.3
Phases 1 and 2 are independent (parallelizable). T2.4 depends on T1.3 (MSI IRQ domain). T3.1 is already complete.
7. Timeline
| Phase | Duration | Cumulative |
|---|---|---|
| Phase 1 (MSI) | 3 weeks | Week 3 |
| Phase 2 (DMA/IOMMU) | 3 weeks | Week 5 |
| Phase 3 (Scheduler) | 3 weeks | Week 6 |
| Phase 4 (Threads) | 2 weeks | Week 7 |
Total: 7 weeks (2 devs parallel Phase 1+2)
8. Success Metrics
| Metric | Before | After |
|---|---|---|
| Scheduler tick | 148Hz (PIT) | 1000Hz (LAPIC) |
| NVMe throughput | INTx shared | MSI-X 4+ queues |
| Scheduling latency (tick period) | ~6.75 ms (1/148 Hz) | ~1 ms (1/1000 Hz) |
| Thread create | 3 IPC hops | 2 IPC hops |
| DMA safety | Unprotected | IOMMU-mapped |