# Red Bear OS — CPU/DMA/IRQ/MSI/Scheduler Fix Plan
**Date**: 2026-05-04
**Updated**: 2026-05-04 (MSI T1.1-T2.2 implemented, committed, pushed)
**Status**: Active — MSI Phase 1 complete, DMA/Scheduler pending
**Source of truth**: Linux kernel 7.0 (local/reference/linux-7.0/)
## 1. Problem Statement
Five critical integration gaps in the microkernel architecture:
| Gap | Severity | Impact | Status |
|-----|----------|--------|--------|
| MSI absent from kernel | CRITICAL | All NVMe/GPU/NIC on legacy INTx | ✅ RESOLVED (P8-msi.patch) |
| DMA/IOMMU not integrated | CRITICAL | DMA buffers unprotected | ⏳ Pending |
| PIT tick (148Hz) vs LAPIC (1000Hz) | HIGH | Scheduler 6x slower than Linux | ✅ RESOLVED (P7-scheduler patch) |
| Global scheduler lock | HIGH | Serializes all context switches | ✅ RESOLVED (work-stealing) |
| Thread creation (3 IPC hops) | HIGH | 3x slower than Linux clone() | ⏳ Pending |
## 2. Phase 1: MSI/MSI-X in Kernel (Week 1-3) ✅ COMPLETE
### T1.1: MSI Capability Parsing ✅ DONE
- File: `kernel/src/arch/x86_shared/device/msi.rs` (61 lines)
- Commit: `678980521` in `P8-msi.patch`
- Linux ref: `arch/x86/kernel/apic/msi.c` (391 lines)
- Implements: `MsiMessage` (compose/validate), `MsiCapability` (parse 32/64-bit), `MsixCapability` (parse table/PBA), `is_valid_msi_address`, `is_valid_msi_vector`
- Bounds-safe: all `parse()` methods return `Option<Self>`, using `.get()` instead of raw indexing
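A minimal, self-contained sketch of that bounds-safe parsing pattern (not the kernel's `msi.rs`, just an illustration): every config-space read goes through `.get()`, so the whole parse collapses to `None` on a short or malformed buffer. The byte layout follows the standard 32-bit MSI capability (ID 0x05, control at +2, address at +4, data at +8).

```rust
#[derive(Debug)]
struct MsiCapabilitySketch {
    message_control: u16,
    message_address: u32,
    message_data: u16,
}

impl MsiCapabilitySketch {
    /// `cfg` is the device's config space, `offset` the capability start.
    /// Every access goes through `.get()`, so a truncated or hostile
    /// config space yields None instead of a panic.
    fn parse(cfg: &[u8], offset: usize) -> Option<Self> {
        let rd16 = |o: usize| -> Option<u16> {
            Some(u16::from_le_bytes([*cfg.get(o)?, *cfg.get(o + 1)?]))
        };
        let rd32 = |o: usize| -> Option<u32> {
            Some(u32::from_le_bytes([
                *cfg.get(o)?, *cfg.get(o + 1)?, *cfg.get(o + 2)?, *cfg.get(o + 3)?,
            ]))
        };
        // Capability ID 0x05 = MSI.
        if *cfg.get(offset)? != 0x05 {
            return None;
        }
        Some(Self {
            message_control: rd16(offset + 2)?,
            message_address: rd32(offset + 4)?,
            message_data: rd16(offset + 8)?,
        })
    }
}

fn main() {
    let cfg = [
        0x05, 0x00,             // cap ID = MSI, next pointer
        0x01, 0x00,             // Message Control: MSI enable bit set
        0x00, 0x00, 0xE0, 0xFE, // Message Address = 0xFEE00000
        0x31, 0x00,             // Message Data = vector 0x31
        0x00, 0x00,
    ];
    println!("{:?}", MsiCapabilitySketch::parse(&cfg, 0));
}
```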
### T1.2: Vector Allocation Matrix ✅ DONE
- File: `kernel/src/arch/x86_shared/device/vector.rs` (53 lines)
- Commit: `678980521` in `P8-msi.patch`
- Linux ref: `arch/x86/kernel/apic/vector.c` (1387 lines)
- Implements: per-CPU bitmatrix (7×32-bit banks = 224 vectors, covering 32-255), `allocate_vector`, `free_vector`
- Lock-free CAS-based allocation with `trailing_ones()` find-first-zero
- NOTE: VECTORS table is global (not yet per-CPU sharded) — sufficient for 224 vectors
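To make the allocation idea concrete, here is a hedged sketch of the CAS-plus-`trailing_ones()` scheme under the same 7×32 layout; the bank array, memory orderings, and helper names are illustrative, not the contents of `vector.rs`.

```rust
use core::sync::atomic::{AtomicU32, Ordering};

// 7 banks x 32 bits = 224 vectors, mapped onto IDT vectors 32..=255.
static BANKS: [AtomicU32; 7] = [
    AtomicU32::new(0), AtomicU32::new(0), AtomicU32::new(0), AtomicU32::new(0),
    AtomicU32::new(0), AtomicU32::new(0), AtomicU32::new(0),
];
const VECTOR_BASE: u8 = 32;

/// Returns an IDT vector in 32..=255, or None if all 224 are taken.
fn allocate_vector() -> Option<u8> {
    for (bank_idx, bank) in BANKS.iter().enumerate() {
        loop {
            let cur = bank.load(Ordering::Relaxed);
            // trailing_ones() is the index of the lowest clear bit.
            let bit = cur.trailing_ones();
            if bit == 32 {
                break; // bank full, try the next one
            }
            let new = cur | (1 << bit);
            if bank
                .compare_exchange(cur, new, Ordering::AcqRel, Ordering::Relaxed)
                .is_ok()
            {
                return Some(VECTOR_BASE + (bank_idx as u8) * 32 + bit as u8);
            }
            // Lost the CAS race: reload and retry this bank.
        }
    }
    None
}

fn free_vector(vector: u8) {
    let idx = (vector - VECTOR_BASE) as usize;
    BANKS[idx / 32].fetch_and(!(1 << (idx % 32)), Ordering::AcqRel);
}

fn main() {
    let v = allocate_vector().unwrap();
    println!("allocated vector {v}");
    free_vector(v);
}
```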
### T1.3: MSI IRQ Domain (Scheme Integration) ✅ DONE
- File: `kernel/src/scheme/irq.rs`
- Commit: `678980521` in `P8-msi.patch`
- Implements: `msi_vector_is_valid()` (32-0xEF range check), `iommu_validate_msi_irq()` hook (stub: always true), IOMMU gate at `irq_trigger()` for vectors ≥16
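Illustrative shape only (the real handlers live in `kernel/src/scheme/irq.rs`): the 32-0xEF range check, the stubbed IOMMU hook, and the gate for vectors ≥16 in `irq_trigger()` as the plan describes them.

```rust
fn msi_vector_is_valid(vector: u8) -> bool {
    (32..=0xEF).contains(&vector)
}

fn iommu_validate_msi_irq(_vector: u8) -> bool {
    true // stub today, per the plan; a real source check lands with Phase 2 wiring
}

fn irq_trigger(vector: u8) -> bool {
    // Legacy lines below 16 pass straight through; anything higher is gated
    // on the IOMMU hook before the handler would be dispatched.
    vector < 16 || iommu_validate_msi_irq(vector)
}

fn main() {
    assert!(msi_vector_is_valid(48));
    assert!(!msi_vector_is_valid(0xF0));
    assert!(irq_trigger(48));
    println!("gate checks pass");
}
```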
### T1.4: Userspace MSI Consumer (driver-sys) ✅ DONE
- File: `local/recipes/drivers/redox-driver-sys/source/src/irq.rs`
- Commit: `678980521`
- Implements: `MsiAllocation` with round-robin CPU allocation, `irq_set_affinity` (scheme write), `program_x86_message` with kernel-mediated address/vector validation (mask `0xFFF0_0000`)
- Quirk-aware fallback retained: FORCE_LEGACY, NO_MSI, NO_MSIX
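A sketch of what that validation plausibly amounts to, assuming the standard x86 MSI address format (the `0xFFF0_0000` mask keeps the address in the `0xFEExxxxx` LAPIC window) and the 32-0xEF vector window from T1.3; the function bodies are illustrative, not the driver-sys code.

```rust
const MSI_ADDR_BASE_MASK: u32 = 0xFFF0_0000;
const MSI_ADDR_BASE: u32 = 0xFEE0_0000;

fn is_valid_msi_address(addr: u32) -> bool {
    (addr & MSI_ADDR_BASE_MASK) == MSI_ADDR_BASE
}

fn is_valid_msi_vector(vector: u8) -> bool {
    (32..=0xEF).contains(&vector)
}

/// Compose an x86 address/data pair for a target LAPIC id and vector,
/// refusing anything that would fail the kernel-side checks.
fn program_x86_message(apic_id: u8, vector: u8) -> Option<(u32, u16)> {
    let addr = MSI_ADDR_BASE | ((apic_id as u32) << 12); // destination id in bits 19:12
    let data = vector as u16; // fixed delivery mode, edge-triggered
    (is_valid_msi_address(addr) && is_valid_msi_vector(vector)).then_some((addr, data))
}

fn main() {
    println!("{:?}", program_x86_message(2, 48)); // Some((0xFEE02000, 0x30))
    println!("{:?}", program_x86_message(2, 16)); // None: below vector 32
}
```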
### T1.5: Kernel-side MSI Affinity Handler ✅ DONE
- File: `kernel/src/scheme/irq.rs`
- Commit: `678980521` in `P8-msi.patch`
- Implements: `Handle::IrqAffinity { irq, mask }` variant, path routing for `<irq>/affinity` and `cpu-XX/<irq>/affinity`, kwrite validates CPU id and stores mask atomically, kfstat/kfpath/kreadoff/close all handle new variant
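A small sketch of the path routing, purely to show the two accepted forms; the actual kernel handler and its `Handle::IrqAffinity` plumbing are not reproduced here.

```rust
/// Recognise "<irq>/affinity" and "cpu-XX/<irq>/affinity", returning the
/// optional CPU id and the IRQ number, or None for anything else.
fn parse_affinity_path(path: &str) -> Option<(Option<u8>, u32)> {
    let mut parts = path.split('/');
    let first = parts.next()?;

    let (cpu, irq_part) = if let Some(cpu_str) = first.strip_prefix("cpu-") {
        (Some(cpu_str.parse::<u8>().ok()?), parts.next()?)
    } else {
        (None, first)
    };

    let irq = irq_part.parse::<u32>().ok()?;
    if parts.next()? != "affinity" || parts.next().is_some() {
        return None;
    }
    Some((cpu, irq))
}

fn main() {
    assert_eq!(parse_affinity_path("48/affinity"), Some((None, 48)));
    assert_eq!(parse_affinity_path("cpu-02/48/affinity"), Some((Some(2), 48)));
    assert_eq!(parse_affinity_path("48/bogus"), None);
}
```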
## 3. Phase 2: DMA/IOMMU Integration (Week 3-5) — AUDITED 2026-05-04
**Status**: IOMMU daemon (1003 lines) and DmaBuffer (261 lines) already exist and are solid. Tasks re-scoped from "create" to "wire."
### T2.1: IommuDmaAllocator (driver-sys) ⏳ P0
- File: `local/recipes/drivers/redox-driver-sys/source/src/dma.rs`
- Add `IommuDmaAllocator` struct: holds IOMMU domain fd, wraps `DmaBuffer::allocate()` with IOMMU MAP opcode
- Uses `scheme:iommu/domain/N` write with MAP request → get IOVA
- Linux ref: `include/linux/dma-mapping.h` → `dma_alloc_coherent()` → `iommu_dma_alloc()`
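As a sketch of the intended shape (the request encoding, opcode value, and IOVA policy below are invented for illustration; the real allocator talks to `scheme:iommu/domain/N` and reuses `DmaBuffer`):

```rust
use std::io::{self, Write};

const IOMMU_OP_MAP: u8 = 1; // assumed opcode, for illustration only

struct DmaRegion {
    phys: u64,  // physical address of the contiguous buffer
    iova: u64,  // device-visible address returned by the IOMMU domain
    size: usize,
}

struct IommuDmaAllocatorSketch<W: Write> {
    domain: W,      // stands in for the open IOMMU domain handle
    next_iova: u64, // toy IOVA policy: bump allocator
}

impl<W: Write> IommuDmaAllocatorSketch<W> {
    /// Pretend `phys` came from a contiguous physical allocation; send a MAP
    /// request for it and hand back the IOVA the caller should program into
    /// the device instead of the raw physical address.
    fn map(&mut self, phys: u64, size: usize) -> io::Result<DmaRegion> {
        let iova = self.next_iova;
        self.next_iova += size as u64;

        // Toy wire format: [opcode][iova][phys][size], little-endian.
        let mut req = Vec::with_capacity(1 + 8 + 8 + 8);
        req.push(IOMMU_OP_MAP);
        req.extend_from_slice(&iova.to_le_bytes());
        req.extend_from_slice(&phys.to_le_bytes());
        req.extend_from_slice(&(size as u64).to_le_bytes());
        self.domain.write_all(&req)?;

        Ok(DmaRegion { phys, iova, size })
    }
}

fn main() -> io::Result<()> {
    // Use an in-memory sink in place of the real domain handle.
    let mut alloc = IommuDmaAllocatorSketch { domain: Vec::new(), next_iova: 0x1000 };
    let region = alloc.map(0x1234_0000, 4096)?;
    println!("phys {:#x} mapped at iova {:#x} ({} bytes)", region.phys, region.iova, region.size);
    Ok(())
}
```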
### T2.2: GPU DMA pass-through ⏳ P0
- Wire `redox-drm` GPU drivers to open IOMMU device endpoint and use IommuDmaAllocator
- amdgpu: VRAM/GTT allocations through IOMMU domain
- Intel i915: GTT pages through IOMMU domain
- Files: `local/recipes/gpu/redox-drm/source/`, `local/recipes/gpu/amdgpu/source/`
### T2.3: Streaming DMA (linux-kpi) ⏳ P1
- `dma_map_single()`: allocate bounce buffer, copy data, map through IOMMU
- `dma_unmap_single()`: copy back, unmap, free bounce buffer
- Linux ref: `kernel/dma/mapping.c` — streaming API
- File: `local/recipes/drivers/linux-kpi/source/`
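An illustration of the copy-in/copy-out flow, with a plain `Vec` standing in for the IOMMU-backed bounce buffer so only the control flow is shown:

```rust
#[derive(Clone, Copy)]
enum DmaDirection {
    ToDevice,
    FromDevice,
}

struct BounceMapping {
    bounce: Vec<u8>, // stands in for the IOMMU-mapped DMA buffer
    dir: DmaDirection,
}

fn dma_map_single(data: &[u8], dir: DmaDirection) -> BounceMapping {
    let mut bounce = vec![0u8; data.len()];
    if matches!(dir, DmaDirection::ToDevice) {
        bounce.copy_from_slice(data); // copy caller data into the DMA buffer
    }
    BounceMapping { bounce, dir }
}

fn dma_unmap_single(mapping: BounceMapping, data: &mut [u8]) {
    if matches!(mapping.dir, DmaDirection::FromDevice) {
        data.copy_from_slice(&mapping.bounce); // copy device writes back out
    }
    // Dropping `mapping.bounce` frees the bounce buffer.
}

fn main() {
    let mut rx = vec![0u8; 4];
    let mut m = dma_map_single(&rx, DmaDirection::FromDevice);
    m.bounce.copy_from_slice(b"data"); // pretend the device DMA'd into it
    dma_unmap_single(m, &mut rx);
    assert_eq!(&rx[..], b"data");
}
```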
### T2.4: NVMe DMA pass-through ⏳ P1
- Wire `ahcid`/`nvmed` PRP list physical addresses through IOMMU domain
- Linux ref: `drivers/nvme/host/pci.c`, `nvme_map_data()`
### T2.5: SWIOTLB Fallback (low priority) ⏳ P2
- Linux ref: `kernel/dma/swiotlb.c`
- Bounce buffer for devices with <4GB DMA addressing
- Only needed for ancient hardware; modern x86_64 hardware doesn't need it
## 4. Phase 3: Scheduler Improvements (Week 4-6) — MOSTLY DONE
### T3.1: LAPIC Timer as Primary Tick ✅ DONE
- P7-scheduler-improvements.patch: LAPIC timer calibrated + enabled at vector 48
- TSC-deadline mode, 1000Hz tick drives DWRR scheduler directly
- PIT fallback retained
### T3.2: Per-CPU Scheduler Locks ✅ DONE
- Work-stealing load balancer in switch.rs
- Per-CPU nr_running counter
- Idle CPUs steal work via IPI
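A toy model of the per-CPU counters plus stealing (no IPIs, one plain `Mutex` per queue) to show why no global lock is needed; names are illustrative, not `switch.rs`:

```rust
use std::collections::VecDeque;
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::Mutex;

struct CpuRunQueue {
    nr_running: AtomicUsize,     // cheap for other CPUs to read
    tasks: Mutex<VecDeque<u64>>, // per-CPU lock, not a global one
}

impl CpuRunQueue {
    fn new() -> Self {
        Self { nr_running: AtomicUsize::new(0), tasks: Mutex::new(VecDeque::new()) }
    }
    fn push(&self, task: u64) {
        self.tasks.lock().unwrap().push_back(task);
        self.nr_running.fetch_add(1, Ordering::Relaxed);
    }
    fn steal(&self) -> Option<u64> {
        let task = self.tasks.lock().unwrap().pop_front()?;
        self.nr_running.fetch_sub(1, Ordering::Relaxed);
        Some(task)
    }
}

/// An idle CPU scans the nr_running counters and steals from the busiest peer.
fn steal_work(queues: &[CpuRunQueue], me: usize) -> Option<u64> {
    let victim = (0..queues.len())
        .filter(|&i| i != me)
        .max_by_key(|&i| queues[i].nr_running.load(Ordering::Relaxed))?;
    queues[victim].steal()
}

fn main() {
    let queues: Vec<CpuRunQueue> = (0..4).map(|_| CpuRunQueue::new()).collect();
    queues[0].push(101);
    queues[0].push(102);
    println!("CPU 3 stole task {:?}", steal_work(&queues, 3));
}
```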
### T3.3: Load Balancing ✅ DONE
- Threshold reduced: 3→1 ticks for LAPIC-driven mode
- Geometric weights in DWRR
### T3.4: RT Scheduling Class ✅ DONE
- RT scheduling class (priority 0-9, skip DWRR, immediate dispatch)
- Linux ref: `kernel/sched/rt.c`
- FIFO and Round-Robin classes
- Priority inheritance
- RT throttling: 95% CPU cap/sec
### T3.5: TSC-Deadline Timer
- Use IA32_TSC_DEADLINE MSR for precise tick (see the sketch at the end of this section)
- True tickless operation
- TSC calibration via HPET or PIT
### T3.6: NUMA-Aware Scheduling ❌
- Not implemented; low priority for desktop/non-NUMA systems
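To close out this phase, a ring-0-only sketch of the TSC-deadline arming referenced in T3.1 and T3.5: MSR `0x6E0` is `IA32_TSC_DEADLINE` per the Intel SDM; the calibration input and tick-rate plumbing are illustrative. x86_64-only, and `wrmsr` faults outside ring 0.

```rust
#![allow(dead_code)]
use core::arch::asm;
use core::arch::x86_64::_rdtsc;

const IA32_TSC_DEADLINE: u32 = 0x6E0;

/// Write an MSR (privileged: traps with #GP outside ring 0).
unsafe fn wrmsr(msr: u32, value: u64) {
    let lo = value as u32;
    let hi = (value >> 32) as u32;
    asm!("wrmsr", in("ecx") msr, in("eax") lo, in("edx") hi,
         options(nostack, preserves_flags));
}

/// Arm the LAPIC timer (in TSC-deadline mode) for one tick from now.
/// `tsc_hz` comes from calibration against HPET or the PIT.
unsafe fn arm_next_tick(tsc_hz: u64, tick_hz: u64) {
    let now = _rdtsc();
    wrmsr(IA32_TSC_DEADLINE, now + tsc_hz / tick_hz);
}

fn main() {
    // Safe to run in userspace: only shows the deadline arithmetic.
    let now = unsafe { _rdtsc() };
    let tsc_hz = 3_000_000_000u64; // assumed 3 GHz, for illustration
    println!("next 1000Hz deadline would be {}", now + tsc_hz / 1000);
}
```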
## 5. Phase 4: Thread Creation (Week 6-7)
### T4.1: Batched Thread Creation
- Batch new-thread requests (reduce IPC)
- Pre-allocate stack pages during fork
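A purely illustrative sketch of the batching idea, with an invented request type and a closure standing in for the proc-scheme round trip:

```rust
struct ThreadRequest {
    entry: usize,
    stack_top: usize,
}

struct BatchedSpawner {
    pending: Vec<ThreadRequest>,
}

impl BatchedSpawner {
    fn new() -> Self {
        Self { pending: Vec::new() }
    }
    fn queue(&mut self, entry: usize, stack_top: usize) {
        self.pending.push(ThreadRequest { entry, stack_top });
    }
    /// One IPC hop for the whole batch; returns one thread id per request.
    fn flush(&mut self, send_batch: impl FnOnce(&[ThreadRequest]) -> Vec<u64>) -> Vec<u64> {
        let ids = send_batch(&self.pending);
        self.pending.clear();
        ids
    }
}

fn main() {
    let mut spawner = BatchedSpawner::new();
    spawner.queue(0x1000, 0x7fff_0000);
    spawner.queue(0x1000, 0x7ffe_0000);
    // Stand-in for the real proc-scheme call: log each request, assign ids.
    let ids = spawner.flush(|reqs| {
        reqs.iter()
            .enumerate()
            .map(|(i, r)| {
                println!("create thread: entry={:#x} stack_top={:#x}", r.entry, r.stack_top);
                i as u64
            })
            .collect()
    });
    println!("spawned threads {:?}", ids);
}
```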
### T4.2: Kernel Thread Pool
- Pre-create idle kernel threads
- Reuse via object pool
### T4.3: Shared Memory IPC
- Use shm for proc scheme bulk ops
- Avoid data copy through IPC channel
## 6. Dependencies
- Phase 1 (MSI): T1.1 -> T1.2 -> T1.3 -> T1.4 -> T1.5
- Phase 2 (DMA): T2.1 -> T2.2 -> T2.3 -> T2.4 -> T2.5
- Phase 3 (Sched): T3.1 -> T3.5 -> T3.2 -> T3.3 -> T3.4
- Phase 4 (Thread): T4.1 -> T4.2 -> T4.3
Phases 1 and 2 are independent (parallel). T2.4 needs T1.3.
Phase 3 is mostly done (T3.1-T3.4 complete).
## 7. Timeline
| Phase | Duration | Cumulative |
|-------|----------|------------|
| Phase 1 (MSI) | 3 weeks | Week 3 |
| Phase 2 (DMA/IOMMU) | 3 weeks | Week 5 |
| Phase 3 (Scheduler) | 3 weeks | Week 7 |
| Phase 4 (Threads) | 2 weeks | Week 7 |
Total: 7 weeks (2 devs parallel Phase 1+2)
## 8. Success Metrics
| Metric | Before | After |
|--------|--------|-------|
| Scheduler tick | 148Hz (PIT) | 1000Hz (LAPIC) |
| NVMe throughput | INTx shared | MSI-X 4+ queues |
| Scheduling latency (one tick) | ~6.75ms | ~1ms |
| Thread create | 3 IPC hops | 2 IPC hops |
| DMA safety | Unprotected | IOMMU-mapped |