029472d5e3
dma.rs: IommuDmaAllocator (145 lines)
- New struct wires the existing IOMMU daemon (1003 lines) to the existing DmaBuffer (261 lines)
- allocate(): phys-contiguous alloc via scheme:memory, then MAP through the IOMMU domain
- unmap(): sends UNMAP to the IOMMU domain, releases the IOVA
- Inlined IOMMU protocol constants — no new crate dependency
- encode_iommu_request/decode_iommu_response for the scheme write/read cycle

Documentation updates:
- IMPLEMENTATION-MASTER-PLAN.md: K2 DMA/IOMMU section expanded from a 3-line gap list to a full audit with component inventory, gap analysis, implementation plan (D2.1-D2.5), and a Linux reference table. Added K2b thread/fork audit.
- CPU-DMA-IRQ-MSI-SCHEDULER-FIX-PLAN.md: Phase 1 (MSI) marked complete with per-task status. Phase 2 (DMA) re-scoped from "create" to "wire" based on the audit. Phase 3 (scheduler) marked mostly done.
- IRQ-AND-LOWLEVEL-CONTROLLERS-ENHANCEMENT-PLAN.md: kernel MSI support noted as materially strong, with a P8-msi.patch reference.

Audit findings:
- IOMMU daemon is solid: 1003-line lib.rs with the full scheme protocol, 427-line amd_vi.rs, host-runnable tests. Needs wiring, not rewriting.
- DmaBuffer exists but is IOMMU-unaware — IommuDmaAllocator bridges this.
- relibc rlct_clone is correct for threads (shares the address space implicitly). The "3 IPC hops" claim is microkernel-architectural, not a real perf issue.
- No stale docs to archive at this time.
Red Bear OS — CPU/DMA/IRQ/MSI/Scheduler Fix Plan
Date: 2026-05-04
Updated: 2026-05-04 (MSI T1.1–T2.2 implemented, committed, pushed)
Status: Active — MSI Phase 1 complete, DMA/Scheduler pending
Source of truth: Linux kernel 7.0 (`local/reference/linux-7.0/`)
1. Problem Statement
Five critical integration gaps in the microkernel architecture:
| Gap | Severity | Impact | Status |
|---|---|---|---|
| MSI absent from kernel | CRITICAL | All NVMe/GPU/NIC on legacy INTx | ✅ RESOLVED (P8-msi.patch) |
| DMA/IOMMU not integrated | CRITICAL | DMA buffers unprotected | ⏳ Pending |
| PIT tick (148Hz) vs LAPIC (1000Hz) | HIGH | Scheduler 6x slower than Linux | ✅ RESOLVED (P7-scheduler patch) |
| Global scheduler lock | HIGH | Serializes all context switches | ✅ RESOLVED (work-stealing) |
| Thread creation (3 IPC hops) | HIGH | 3x slower than Linux clone() | ⏳ Pending |
2. Phase 1: MSI/MSI-X in Kernel (Week 1-3) ✅ COMPLETE
T1.1: MSI Capability Parsing ✅ DONE
- File: `kernel/src/arch/x86_shared/device/msi.rs` (61 lines)
- Commit: `678980521` in `P8-msi.patch`
- Linux ref: `arch/x86/kernel/apic/msi.c` (391 lines)
- Implements: `MsiMessage` (compose/validate), `MsiCapability` (parse 32/64-bit), `MsixCapability` (parse table/PBA), `is_valid_msi_address`, `is_valid_msi_vector`
- Bounds-safe: all `parse()` methods return `Option<Self>`, using `.get()` instead of raw indexing (see the sketch below)
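A minimal sketch of that bounds-safe pattern, assuming the standard PCI MSI capability layout (capability ID 0x05, 16-bit message control at offset 2, 64-bit flag in control bit 7). The struct shape and names here are illustrative, not the actual `msi.rs` definitions:

```rust
/// Illustrative MSI capability, parsed from raw config-space bytes.
#[derive(Debug)]
pub struct MsiCapability {
    pub message_control: u16,
    pub message_address: u64,
    pub message_data: u16,
}

impl MsiCapability {
    /// Returns None instead of panicking when `cap` is truncated.
    pub fn parse(cap: &[u8]) -> Option<Self> {
        // Capability ID 0x05 identifies MSI.
        if *cap.get(0)? != 0x05 {
            return None;
        }
        let control = u16::from_le_bytes([*cap.get(2)?, *cap.get(3)?]);
        let addr_lo =
            u32::from_le_bytes([*cap.get(4)?, *cap.get(5)?, *cap.get(6)?, *cap.get(7)?]);
        // Bit 7 of message control selects the 64-bit address format,
        // which shifts the message-data field from offset 8 to 12.
        let is_64bit = control & (1 << 7) != 0;
        let (addr_hi, data_off) = if is_64bit {
            let hi =
                u32::from_le_bytes([*cap.get(8)?, *cap.get(9)?, *cap.get(10)?, *cap.get(11)?]);
            (hi, 12)
        } else {
            (0, 8)
        };
        let data = u16::from_le_bytes([*cap.get(data_off)?, *cap.get(data_off + 1)?]);
        Some(MsiCapability {
            message_control: control,
            message_address: ((addr_hi as u64) << 32) | addr_lo as u64,
            message_data: data,
        })
    }
}
```

Returning `Option<Self>` pushes truncated-capability handling to the caller instead of indexing past the end of config space in the kernel.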
T1.2: Vector Allocation Matrix ✅ DONE
- File: `kernel/src/arch/x86_shared/device/vector.rs` (53 lines)
- Commit: `678980521` in `P8-msi.patch`
- Linux ref: `arch/x86/kernel/apic/vector.c` (1387 lines)
- Implements: per-CPU bitmatrix (7 × 32-bit banks = 224 vectors, covering 32-255), `allocate_vector`, `free_vector`
- Lock-free CAS-based allocation with `trailing_ones()` find-first-zero (sketched below)
- NOTE: the VECTORS table is currently global, not per-CPU sharded — sufficient for 224 vectors
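A sketch of the lock-free claim loop under the same layout (7 banks × 32 bits, vectors based at 32, per the bullets above). The statics and helper names are illustrative:

```rust
use core::sync::atomic::{AtomicU32, Ordering};

const BANKS: usize = 7; // 7 x 32 bits = 224 vectors, numbered 32..=255
static VECTORS: [AtomicU32; BANKS] = [
    AtomicU32::new(0), AtomicU32::new(0), AtomicU32::new(0), AtomicU32::new(0),
    AtomicU32::new(0), AtomicU32::new(0), AtomicU32::new(0),
];

/// Find the lowest free bit via trailing_ones() and claim it with CAS.
pub fn allocate_vector() -> Option<u8> {
    for (bank, word) in VECTORS.iter().enumerate() {
        loop {
            let current = word.load(Ordering::Relaxed);
            // The count of consecutive 1-bits from the LSB is exactly
            // the index of the lowest 0-bit.
            let bit = current.trailing_ones();
            if bit >= 32 {
                break; // bank full, try the next one
            }
            let claimed = current | (1 << bit);
            if word
                .compare_exchange(current, claimed, Ordering::AcqRel, Ordering::Relaxed)
                .is_ok()
            {
                // Vectors start at 32; below that are CPU exceptions.
                return Some(32 + (bank as u32 * 32 + bit) as u8);
            }
            // Lost the CAS race; reload this bank and retry.
        }
    }
    None
}

pub fn free_vector(vec: u8) {
    let idx = (vec - 32) as usize;
    VECTORS[idx / 32].fetch_and(!(1 << (idx % 32)), Ordering::AcqRel);
}
```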
T1.3: MSI IRQ Domain (Scheme Integration) ✅ DONE
- File: `kernel/src/scheme/irq.rs`
- Commit: `678980521` in `P8-msi.patch`
- Implements: `msi_vector_is_valid()` (32-0xEF range check), `iommu_validate_msi_irq()` hook (stub: always true), IOMMU gate at `irq_trigger()` for vectors ≥ 16
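For reference, the range check amounts to the following sketch; treating 0xF0-0xFF as reserved for system vectors is an assumption about the kernel's vector layout:

```rust
/// Vectors below 32 are CPU exceptions; values above 0xEF are assumed
/// reserved for kernel/system IPIs, so MSI may only target 32..=0xEF.
fn msi_vector_is_valid(vector: usize) -> bool {
    (32..=0xEF).contains(&vector)
}
```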
T1.4: Userspace MSI Consumer (driver-sys) ✅ DONE
- File: `local/recipes/drivers/redox-driver-sys/source/src/irq.rs`
- Commit: `678980521`
- Implements: `MsiAllocation` with round-robin CPU allocation, `irq_set_affinity` (scheme write), `program_x86_message` with kernel-mediated address/vector validation (mask `0xFFF0_0000`)
- Quirk-aware fallback retained: FORCE_LEGACY, NO_MSI, NO_MSIX
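The `0xFFF0_0000` mask reflects the architectural x86 MSI address window: valid message addresses sit in the LAPIC's `0xFEEx_xxxx` range. A sketch of the check (the function name echoes T1.1; the logic is a plausible reading of the mask, not the committed code):

```rust
const X86_MSI_ADDR_BASE: u32 = 0xFEE0_0000; // LAPIC MSI address window
const X86_MSI_ADDR_MASK: u32 = 0xFFF0_0000; // top 12 bits must match

fn is_valid_msi_address(addr: u64) -> bool {
    // Addresses above 4 GiB can never hit the LAPIC window on x86.
    addr <= u32::MAX as u64 && (addr as u32) & X86_MSI_ADDR_MASK == X86_MSI_ADDR_BASE
}
```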
T1.5: Kernel-side MSI Affinity Handler ✅ DONE
- File: `kernel/src/scheme/irq.rs`
- Commit: `678980521` in `P8-msi.patch`
- Implements: `Handle::IrqAffinity { irq, mask }` variant; path routing for `<irq>/affinity` and `cpu-XX/<irq>/affinity` (see the sketch below); `kwrite` validates the CPU id and stores the mask atomically; `kfstat`/`kfpath`/`kreadoff`/`close` all handle the new variant
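A sketch of the path routing described above, assuming textual scheme paths of the two listed shapes; parsing details are illustrative:

```rust
/// Resolve "<irq>/affinity" or "cpu-XX/<irq>/affinity" to the IRQ
/// number that should map to Handle::IrqAffinity { irq, .. }.
fn parse_affinity_path(path: &str) -> Option<usize> {
    // Strip an optional "cpu-XX/" prefix.
    let rest = match path.strip_prefix("cpu-") {
        Some(r) => r.split_once('/')?.1,
        None => path,
    };
    let (irq, leaf) = rest.split_once('/')?;
    if leaf != "affinity" {
        return None;
    }
    irq.parse().ok()
}
```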
3. Phase 2: DMA/IOMMU Integration (Week 3-5) — AUDITED 2026-05-04
Status: IOMMU daemon (1003 lines) and DmaBuffer (261 lines) already exist and are solid. Tasks re-scoped from "create" to "wire."
T2.1: IommuDmaAllocator (driver-sys) ⏳ P0
- File: `local/recipes/drivers/redox-driver-sys/source/src/dma.rs`
- Add `IommuDmaAllocator` struct: holds the IOMMU domain fd, wraps `DmaBuffer::allocate()` with the IOMMU MAP opcode (see the sketch below)
- Uses a write to `scheme:iommu/domain/N` carrying a MAP request → gets back an IOVA
- Linux ref: `include/linux/dma-mapping.h` — `dma_alloc_coherent()` → `iommu_dma_alloc()`
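A sketch of the allocator's shape and its MAP round-trip over the domain endpoint. The opcode values, request encoding, and response layout are placeholder assumptions standing in for the inlined protocol constants; only the overall write-then-read scheme cycle is taken from the audit:

```rust
use std::fs::File;
use std::io::{Read, Write};

// Placeholder opcodes; the real constants are inlined in dma.rs.
const IOMMU_OP_MAP: u8 = 1;
const IOMMU_OP_UNMAP: u8 = 2;

/// Holds the IOMMU domain endpoint; allocate() composes
/// DmaBuffer::allocate() (phys-contiguous) with map() below.
pub struct IommuDmaAllocator {
    domain: File, // opened from the scheme:iommu/domain/N endpoint
}

impl IommuDmaAllocator {
    /// MAP a phys-contiguous range into the domain; returns the IOVA.
    pub fn map(&mut self, phys: u64, len: u64) -> std::io::Result<u64> {
        let mut req = Vec::with_capacity(17);
        req.push(IOMMU_OP_MAP);
        req.extend_from_slice(&phys.to_le_bytes());
        req.extend_from_slice(&len.to_le_bytes());
        self.domain.write_all(&req)?; // scheme write...

        let mut resp = [0u8; 8];
        self.domain.read_exact(&mut resp)?; // ...then scheme read
        Ok(u64::from_le_bytes(resp))
    }

    /// UNMAP releases the IOVA from the domain.
    pub fn unmap(&mut self, iova: u64, len: u64) -> std::io::Result<()> {
        let mut req = Vec::with_capacity(17);
        req.push(IOMMU_OP_UNMAP);
        req.extend_from_slice(&iova.to_le_bytes());
        req.extend_from_slice(&len.to_le_bytes());
        self.domain.write_all(&req)
    }
}
```

Keeping the protocol constants inline, rather than in a shared crate, matches the commit's no-new-dependency constraint.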
T2.2: GPU DMA pass-through ⏳ P0
- Wire `redox-drm` GPU drivers to open the IOMMU device endpoint and use `IommuDmaAllocator`
- amdgpu: VRAM/GTT allocations through the IOMMU domain
- Intel i915: GTT pages through the IOMMU domain
- Files: `local/recipes/gpu/redox-drm/source/`, `local/recipes/gpu/amdgpu/source/`
T2.3: Streaming DMA (linux-kpi) ⏳ P1
- `dma_map_single()`: allocate a bounce buffer, copy data, map through the IOMMU
- `dma_unmap_single()`: copy back, unmap, free the bounce buffer
- Linux ref: `kernel/dma/mapping.c` — streaming API
- File: `local/recipes/drivers/linux-kpi/source/`
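A sketch of the streaming pair built on the T2.1 allocator sketch; `DmaBuffer` below is a stand-in for the existing 261-line type, and all of its accessors are hypothetical:

```rust
// Stand-in for the existing DmaBuffer (methods hypothetical; the real
// type allocates phys-contiguous pages through scheme:memory).
pub struct DmaBuffer { bytes: Vec<u8>, phys: u64 }

impl DmaBuffer {
    pub fn allocate(len: usize) -> std::io::Result<Self> {
        Ok(DmaBuffer { bytes: vec![0; len], phys: 0 /* placeholder */ })
    }
    pub fn physical(&self) -> u64 { self.phys }
    pub fn as_slice(&self) -> &[u8] { &self.bytes }
    pub fn as_mut_slice(&mut self) -> &mut [u8] { &mut self.bytes }
}

/// Streaming map: bounce-buffer copy, then IOMMU MAP.
pub fn dma_map_single(
    alloc: &mut IommuDmaAllocator,
    data: &[u8],
) -> std::io::Result<(DmaBuffer, u64)> {
    // 1. Allocate a phys-contiguous bounce buffer.
    let mut bounce = DmaBuffer::allocate(data.len())?;
    // 2. Copy the caller's bytes in (the "streaming" copy).
    bounce.as_mut_slice()[..data.len()].copy_from_slice(data);
    // 3. Map through the IOMMU; the device addresses the returned IOVA.
    let iova = alloc.map(bounce.physical(), data.len() as u64)?;
    Ok((bounce, iova))
}

/// Streaming unmap: copy device-written data back, then drop the mapping.
pub fn dma_unmap_single(
    alloc: &mut IommuDmaAllocator,
    bounce: &DmaBuffer,
    iova: u64,
    out: &mut [u8],
) -> std::io::Result<()> {
    out.copy_from_slice(&bounce.as_slice()[..out.len()]);
    alloc.unmap(iova, out.len() as u64)
}
```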
T2.4: NVMe DMA pass-through ⏳ P1
- Wire `ahcid`/`nvmed` PRP list physical addresses through the IOMMU domain (see the sketch below)
- Linux ref: `drivers/nvme/host/pci.c` — `nvme_map_data()`
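The wiring amounts to translating each PRP entry from a physical address to an IOVA before the command is posted. A sketch, reusing the T2.1 allocator sketch and assuming 4 KiB pages:

```rust
/// Rewrite a PRP list in place so the controller addresses IOVAs
/// instead of raw physical addresses (page size assumed 4 KiB).
fn map_prp_list(
    alloc: &mut IommuDmaAllocator,
    prps: &mut [u64],
) -> std::io::Result<()> {
    for entry in prps.iter_mut() {
        *entry = alloc.map(*entry, 4096)?;
    }
    Ok(())
}
```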
T2.5: SWIOTLB Fallback (low priority) ⏳ P2
- Linux ref: `kernel/dma/swiotlb.c`
- Bounce buffer for devices with < 4 GB DMA addressing
- Only needed for ancient hardware; modern x86_64 hardware doesn't need it
4. Phase 3: Scheduler Improvements (Week 4-6) — MOSTLY DONE
T3.1: LAPIC Timer as Primary Tick ✅ DONE
- P7-scheduler-improvements.patch: LAPIC timer calibrated + enabled at vector 48
- TSC-deadline mode, 1000Hz tick drives DWRR scheduler directly
- PIT fallback retained
T3.2: Per-CPU Scheduler Locks ✅ DONE
- Work-stealing load balancer in switch.rs
- Per-CPU nr_running counter
- Idle CPUs steal work via IPI
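A conceptual sketch of the idle-CPU side of that balancer: scan peers' `nr_running` counters and pick the busiest victim (names hypothetical; the real handoff goes through an IPI in `switch.rs`):

```rust
use core::sync::atomic::{AtomicUsize, Ordering};

/// Per-CPU run-queue length: written by the owning CPU, read by stealers.
pub struct PerCpu {
    pub nr_running: AtomicUsize,
}

/// An idle CPU returns the busiest peer worth stealing from, or None.
/// The actual task handoff (not shown) is requested via IPI.
pub fn pick_steal_target(cpus: &[PerCpu], me: usize) -> Option<usize> {
    cpus.iter()
        .enumerate()
        .filter(|(i, _)| *i != me)
        .map(|(i, c)| (i, c.nr_running.load(Ordering::Relaxed)))
        .filter(|&(_, n)| n > 1) // leave at least one task on the victim
        .max_by_key(|&(_, n)| n)
        .map(|(i, _)| i)
}
```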
T3.3: Load Balancing ✅ DONE
- Threshold reduced: 3→1 ticks for LAPIC-driven mode
- Geometric weights in DWRR
T3.4: RT Scheduling Class ✅ DONE
- RT scheduling class (priority 0-9, skips DWRR, immediate dispatch)
- Linux ref: `kernel/sched/rt.c`
- FIFO and Round-Robin classes
- Priority inheritance
- RT throttling: 95% CPU cap/sec
T3.5: NUMA-Aware Scheduling ❌
- Not implemented — low priority for desktop/non-NUMA systems
T3.6: TSC-Deadline Timer
- Use IA32_TSC_DEADLINE MSR for precise tick
- True tickless operation
- TSC calibration via HPET or PIT
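A sketch of arming one tick in TSC-deadline mode. The MSR index (`0x6E0`) is architectural; the `wrmsr` helper and the calibration inputs are illustrative:

```rust
// IA32_TSC_DEADLINE: the LAPIC timer fires when TSC >= programmed value.
const IA32_TSC_DEADLINE: u32 = 0x6E0;

/// Write an MSR via the standard x86_64 wrmsr instruction.
unsafe fn wrmsr(msr: u32, value: u64) {
    let lo = value as u32;
    let hi = (value >> 32) as u32;
    core::arch::asm!("wrmsr", in("ecx") msr, in("eax") lo, in("edx") hi);
}

/// Arm the next tick 1/hz seconds from now, given a calibrated TSC
/// frequency (e.g. measured against HPET or PIT).
unsafe fn arm_tick(tsc_hz: u64, hz: u64) {
    let now = core::arch::x86_64::_rdtsc();
    wrmsr(IA32_TSC_DEADLINE, now + tsc_hz / hz);
}
```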
5. Phase 4: Thread Creation (Week 6-7)
T4.1: Batched Thread Creation
- Batch new-thread requests (reduce IPC)
- Pre-allocate stack pages during fork
T4.2: Kernel Thread Pool
- Pre-create idle kernel threads
- Reuse via object pool
T4.3: Shared Memory IPC
- Use shm for proc scheme bulk ops
- Avoid data copy through IPC channel
6. Dependencies
Phase 1 (MSI): T1.1 -> T1.2 -> T1.3 -> T1.4 -> T1.5
Phase 2 (DMA): T2.1 -> T2.2 -> T2.3 -> T2.4 -> T2.5
Phase 3 (Sched): T3.1 -> T3.6 -> T3.2 -> T3.3 -> T3.4
Phase 4 (Thread): T4.1 -> T4.2 -> T4.3
Phases 1 and 2 are independent (parallelizable). T2.4 depends on T1.3 (MSI IRQ domain). T3.1 is already complete.
7. Timeline
| Phase | Duration | Cumulative |
|---|---|---|
| Phase 1 (MSI) | 3 weeks | Week 3 |
| Phase 2 (DMA/IOMMU) | 3 weeks | Week 5 |
| Phase 3 (Scheduler) | 3 weeks | Week 6 |
| Phase 4 (Threads) | 2 weeks | Week 7 |
Total: 7 weeks (2 devs parallel Phase 1+2)
8. Success Metrics
| Metric | Before | After |
|---|---|---|
| Scheduler tick | 148Hz (PIT) | 1000Hz (LAPIC) |
| NVMe throughput | INTx shared | MSI-X 4+ queues |
| Scheduling latency (tick period) | ~6.75 ms (1/148 Hz) | ~1 ms (1/1000 Hz) |
| Thread create | 3 IPC hops | 2 IPC hops |
| DMA safety | Unprotected | IOMMU-mapped |