# Red Bear OS — Driver & Hardware Improvement Plan

**Date**: 2026-05-04
**Status**: In Progress — Phase 0 ✅, Phase 1 ✅, Phase 2 ✅, Phase 3 ✅, Phase 4 partial, Phase 5 ✅, Addendum A + B added (kernel + daemon audit with precise Linux 7.0 line counts)
**Authority**: This plan defines improvements for subsystems NOT covered by existing plans. For ACPI, USB, IRQ/PCI, GPU/DRM, Bluetooth, and Wi-Fi, defer to their respective plans. This plan fills the storage, network, and audio gaps and adds cross-cutting concerns.

**Source of truth**: Linux kernel 7.0 (`local/reference/linux-7.0/`). When in doubt, Linux behavior is authoritative. Every task includes the specific Linux source file and function to reference.

---

## Relationship to Existing Plans

This plan is **subordinate** to the following plans for their respective subsystems. Tasks here do not duplicate, override, or conflict with them:

| Plan Document | Subsystem | Status |
|---------------|-----------|--------|
| `ACPI-IMPROVEMENT-PLAN.md` | ACPI sleep, thermal, EC, power states | Active |
| `IRQ-AND-LOWLEVEL-CONTROLLERS-ENHANCEMENT-PLAN.md` | PCI IRQ, MSI-X, IOMMU, controllers | Active |
| `USB-IMPLEMENTATION-PLAN.md` | xHCI, EHCI, device lifecycle | Active |
| `DRM-MODERNIZATION-EXECUTION-PLAN.md` | GPU/DRM display, KMS, Mesa | Active |
| `BLUETOOTH-IMPLEMENTATION-PLAN.md` | BT host/controller | Active |
| `WIFI-IMPLEMENTATION-PLAN.md` | Wi-Fi control plane | Active |
| `CONSOLE-TO-KDE-DESKTOP-PLAN.md` | Desktop/KDE path | Active |

**New coverage by this plan**: Storage drivers (AHCI, NVMe), Network drivers (e1000, r8168), Audio drivers (HDA, AC97), Input completeness (PS/2, HID), and cross-cutting driver quality (error handling, logging, lifecycle).

---

## Validation States

All tasks use these validation levels, consistent with existing plans:

- **builds** — compiles without error against the target toolchain
- **enumerates** — discovers hardware and reports it through scheme interfaces
- **usable** — works in a bounded real scenario (QEMU or bare metal)
- **validated** — passes explicit acceptance tests with captured evidence
- **hardware-validated** — proven on real bare metal, not just QEMU

---

## Phase 0: Cross-Cutting Driver Quality (Weeks 1-2)

These improvements apply to ALL drivers and must be done first to establish the quality baseline for subsequent phases.

### T0.1: Driver Error Handling Audit

**Problem**: Many drivers use `unwrap()`/`expect()` on hardware operations (I/O port reads, MMIO, PCI config space). Hardware failures produce panics instead of graceful degradation.

**Task**: Audit all drivers in `recipes/core/base/source/drivers/` and `local/recipes/drivers/` for:
1. `unwrap()`/`expect()` on hardware I/O — replace with proper `Result` propagation
2. Missing error logging for hardware failures — add `log::error!()` before error returns
3. Infinite retry loops without backoff — add bounded retry with exponential backoff

**Linux reference**: `drivers/ata/libata-eh.c` — `ata_eh_link_autopsy()` for error classification pattern. Linux distinguishes transient errors (retry), permanent errors (fail), and protocol errors (reset).

**File paths**:
- `recipes/core/base/source/drivers/storage/ahcid/src/main.rs`
- `recipes/core/base/source/drivers/net/e1000d/src/device.rs`
- `recipes/core/base/source/drivers/net/rtl8168d/src/device.rs`
- `recipes/core/base/source/drivers/audio/ihdad/src/main.rs`
- `recipes/core/base/source/drivers/audio/ac97d/src/device.rs`
- `local/recipes/drivers/ehcid/source/src/`, `ohcid/`, `uhcid/`

**Acceptance**: `grep -rE '\.(unwrap|expect)\(' recipes/core/base/source/drivers/ local/recipes/drivers/` returns zero matches on hardware I/O paths. Each removed `unwrap()`/`expect()` is replaced by error propagation with a `log::error!()` call before the error return.
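
The bounded retry and the transient/permanent/protocol split described above can be sketched as follows. This is a minimal illustration, not code from any existing driver: `HwError` and `retry_with_backoff` are hypothetical names, and a real driver would likely use a scheme-aware timer rather than `std::thread::sleep`.

```rust
use std::thread::sleep;
use std::time::Duration;

/// Hypothetical error classes mirroring the transient / permanent /
/// protocol split described in the task.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum HwError {
    /// Worth retrying (for example, device busy or timed out).
    Transient,
    /// Retrying cannot help (for example, device absent).
    Permanent,
    /// Link or protocol error; the caller should reset the device.
    Protocol,
}

/// Runs `op` at most `max_attempts` times. After each transient failure
/// the delay doubles; permanent and protocol errors return immediately.
pub fn retry_with_backoff<T>(
    max_attempts: u32,
    base_delay: Duration,
    mut op: impl FnMut() -> Result<T, HwError>,
) -> Result<T, HwError> {
    let mut delay = base_delay;
    for attempt in 1..=max_attempts {
        match op() {
            Ok(value) => return Ok(value),
            // Only sleep when another attempt will actually follow.
            Err(HwError::Transient) if attempt < max_attempts => {
                sleep(delay);
                delay = delay.saturating_mul(2);
            }
            Err(err) => return Err(err),
        }
    }
    // Reached only when `max_attempts` is zero.
    Err(HwError::Transient)
}
```

A caller would wrap one register-level operation (for example a port reset) in the closure and propagate the returned `Result` with `?` instead of calling `unwrap()`.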

### T0.2: Driver Logging Standardization

**Problem**: Drivers use inconsistent logging: some use `println!`, some `eprintln!`, some `log::info!`, and some do not log at all. This makes debugging hardware issues on bare metal nearly impossible.

**Task**: Standardize all drivers to use the `log` crate with logd integration:
1. Replace `println!`/`eprintln!` with `log::info!`/`log::warn!`/`log::error!`
2. Log every hardware initialization step (PCI probe, BAR mapping, IRQ registration)
3. Log every error with the hardware register values that caused it
4. Add `log::debug!` for register read/write traces (behind a feature flag or compile-time config)

**Linux reference**: `drivers/net/ethernet/intel/e1000e/netdev.c` — `e_err()` macro with per-driver message prefix. Linux uses `netdev_err()`, `netdev_warn()`, `netdev_info()` with device context.

**Acceptance**: Every driver produces at minimum: one `info!` on start, one `info!` on successful init, one `error!` per failure path with register dump. Verified by booting in QEMU and checking serial output.
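
One way to get a consistent "error plus register dump" message is a small formatting helper whose output is handed to `log::error!`. The sketch below uses only the standard library so it stands alone; `format_hw_error` is a hypothetical name and the message layout is an assumption, not an existing convention in the tree.

```rust
use std::fmt::Write;

/// Builds a one-line message of the form
/// `<driver>: <what> NAME=0x........ NAME=0x........`.
/// The driver prefix plays the role of the per-driver prefix that the
/// Linux `e_err()` family adds.
pub fn format_hw_error(driver: &str, what: &str, regs: &[(&str, u32)]) -> String {
    let mut msg = format!("{driver}: {what}");
    for (name, value) in regs {
        // Writing into a String cannot fail, so the result is ignored.
        let _ = write!(msg, " {name}={value:#010x}");
    }
    msg
}
```

In a driver this would be used as `log::error!("{}", format_hw_error("ahcid", "port 0 task file error", &[("PxIS", is), ("PxTFD", tfd)]))`, assuming the `log` crate is already a dependency.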

### T0.3: Driver Lifecycle Documentation

**Problem**: No documentation exists for driver initialization sequences, required resources, or expected behavior. New contributors cannot understand or debug drivers.

**Task**: For each driver category (storage, network, audio), create a brief `DRIVERS.md` in the driver directory documenting:
1. Hardware initialization sequence (PCI probe → BAR mapping → device reset → capability enumeration → ready)
2. Required kernel schemes (scheme:memory, scheme:irq, scheme:pci)
3. Known hardware quirks
4. Linux source file(s) to cross-reference

**Acceptance**: `DRIVERS.md` exists in `recipes/core/base/source/drivers/storage/`, `drivers/net/`, `drivers/audio/` with the above sections.

---

## Phase 1: Storage Drivers (Weeks 2-6)

### T1.1: AHCI NCQ Support

**Problem**: ahcid is 109 lines and supports only basic PIO/DMA read/write. It has no NCQ, so SSD throughput is 3-5x lower than the hardware allows.

**Linux reference**: `drivers/ata/libata-sata.c` — `ata_tf_to_fis()` for FIS construction and `ata_eh_analyze_ncq_error()` for NCQ error handling. `drivers/ata/libahci.c` — `ahci_qc_prep()` for FIS/command table setup.

**Implementation**:
1. Add command queue structure to `ahcid/src/ahci/` — track up to 32 pending commands per port
2. Implement `ahci_qc_issue()` modeled on Linux `ata_qc_issue()`:
   - Allocate a command slot from the port's command list
   - Fill command FIS (Frame Information Structure) with READ/WRITE FPDMA QUEUED command
   - Set PRDT (Physical Region Descriptor Table) for DMA scatter-gather
   - Set the tag's bit in PxSACT, then issue the command via a PxCI (Port Command Issue) register write
3. Implement `ahci_port_intr()` modeled on Linux `ahci_port_intr()`:
   - Read PxIS (Port Interrupt Status)
   - Handle D2H Register FIS (command completion)
   - Handle SDB FIS (NCQ completion with per-tag status)
   - Handle PIO Setup FIS (for ATAPI)
   - Handle Device-to-Host FIS errors
4. Add per-tag completion tracking using `PxSACT` (SActive) register

**Files to modify/create**:
- `recipes/core/base/source/drivers/storage/ahcid/src/main.rs` — NCQ enable in `ahci_init()`
- `recipes/core/base/source/drivers/storage/ahcid/src/ahci/` — new `ncq.rs`, `fis.rs`

**Acceptance**:
- `fio` random read test on SSD shows ≥3x improvement over the current non-NCQ path
- NCQ depth 32 verified via `PxSACT` register dump in debug output
- QEMU with `-device ahci,id=ahci`, `-drive file=...,if=none,id=drive0`, and `-device ide-hd,drive=drive0,bus=ahci.0` produces NCQ completions
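
The tag tracking and FIS construction steps above can be sketched as below. The FIS byte layout follows the Register Host-to-Device FIS as we understand it from the SATA/AHCI specifications; `NcqSlots` and `fpdma_fis` are hypothetical names, not existing ahcid items.

```rust
/// Tracks which of the 32 NCQ tags are in flight on one port.
#[derive(Debug, Default)]
pub struct NcqSlots {
    issued: u32,
}

impl NcqSlots {
    /// Claims the lowest free tag, or `None` when all 32 are busy.
    pub fn alloc(&mut self) -> Option<u8> {
        let tag = (!self.issued).trailing_zeros();
        if tag >= 32 {
            return None;
        }
        self.issued |= 1u32 << tag;
        Some(tag as u8)
    }

    /// Takes the current PxSACT value and returns a bitmask of tags that
    /// were issued but are no longer active, releasing them for reuse.
    pub fn complete(&mut self, sact: u32) -> u32 {
        let done = self.issued & !sact;
        self.issued &= !done;
        done
    }
}

/// Builds a 20-byte Register H2D FIS for READ/WRITE FPDMA QUEUED.
pub fn fpdma_fis(write: bool, lba: u64, sectors: u16, tag: u8) -> [u8; 20] {
    let mut fis = [0u8; 20];
    fis[0] = 0x27; // FIS type: Register Host to Device
    fis[1] = 0x80; // C bit: this FIS carries a command
    fis[2] = if write { 0x61 } else { 0x60 }; // WRITE / READ FPDMA QUEUED
    fis[3] = sectors as u8; // Features 7:0 hold the sector count low byte
    fis[4] = lba as u8;
    fis[5] = (lba >> 8) as u8;
    fis[6] = (lba >> 16) as u8;
    fis[7] = 0x40; // Device register: LBA mode
    fis[8] = (lba >> 24) as u8;
    fis[9] = (lba >> 32) as u8;
    fis[10] = (lba >> 40) as u8;
    fis[11] = (sectors >> 8) as u8; // Features 15:8 hold the count high byte
    fis[12] = (tag & 0x1f) << 3; // Count 7:3 carry the NCQ tag
    fis
}
```

The interrupt handler would read PxSACT once per interrupt, pass it to `complete()`, and finish every request whose bit is set in the returned mask.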

### T1.2: AHCI Power Management

**Problem**: No power management. Laptops drain battery with disk constantly powered.

**Linux reference**: `drivers/ata/libata-eh.c:3682` — `ata_eh_handle_port_suspend()`. `drivers/ata/libahci.c` — `ahci_set_lpm()` for Partial/Slumber link power management.

**Implementation**:
1. Add link power management to `ahci_init()`:
   - Set PxCMD.ICC (Interface Communication Control) to Slumber after idle
   - Set PxSCTL.DET to disable PHY when port is idle
   - Restore on new command arrival
2. Add ALPM (Aggressive Link Power Management):
   - Check CAP.SALP (Supports Aggressive Link Power Management) and set PxCMD.ALPE/ASP when it is available
   - Enable HIPM (Host Initiated Power Management) and DIPM (Device Initiated)
3. Add device sleep (DevSlp) for SATA 3.2+ devices when CAP2.SDS (Supports Device Sleep) is set

**Acceptance**: After 5 seconds of idle, PxSSTS.DET reports 0x4 (PHY offline). New command wakes the link within 100ms. Verified on bare metal with SATA SSD.

### T1.3: AHCI TRIM/Discard

**Problem**: SSDs degrade over time without TRIM. Write amplification increases.

**Linux reference**: `drivers/ata/libata-scsi.c` — `ata_scsi_unmap_xlat()` maps SCSI UNMAP to ATA DATA SET MANAGEMENT with TRIM bit.

**Implementation**:
1. Add TRIM command support using ATA DATA SET MANAGEMENT (opcode 0x06) with TRIM bit
2. Implement range list construction (LBA + sector count per entry, up to 64 entries per 512-byte payload block)
3. Wire into filesystem TRIM/discard path via scheme discard operation

**Acceptance**: `fstrim /` (or redoxfs equivalent) issues DATA SET MANAGEMENT commands visible in AHCI debug output. SSD wear leveling counters show improvement after TRIM.
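
Range list construction can be sketched as below, assuming the usual DATA SET MANAGEMENT payload format: each entry is 8 bytes, little-endian, with a 48-bit starting LBA in the low bits and a 16-bit sector count in the high bits. `encode_trim_ranges` is a hypothetical helper name.

```rust
/// Largest sector count one range entry can describe (16-bit field).
const MAX_RANGE_SECTORS: u64 = 0xFFFF;
/// Entries per 512-byte payload block (8 bytes each).
const ENTRIES_PER_BLOCK: usize = 64;

/// Converts `(start_lba, sector_count)` pairs into 512-byte TRIM payload
/// blocks. Ranges longer than 65,535 sectors are split into several
/// entries; unused entries in the last block stay zero.
pub fn encode_trim_ranges(ranges: &[(u64, u64)]) -> Vec<[u8; 512]> {
    let mut entries: Vec<u64> = Vec::new();
    for &(mut lba, mut count) in ranges {
        while count > 0 {
            let chunk = count.min(MAX_RANGE_SECTORS);
            entries.push((lba & 0xFFFF_FFFF_FFFF) | (chunk << 48));
            lba += chunk;
            count -= chunk;
        }
    }
    entries
        .chunks(ENTRIES_PER_BLOCK)
        .map(|chunk| {
            let mut block = [0u8; 512];
            for (i, entry) in chunk.iter().enumerate() {
                block[i * 8..i * 8 + 8].copy_from_slice(&entry.to_le_bytes());
            }
            block
        })
        .collect()
}
```

The number of returned blocks is what goes into the command's count field, and the blocks themselves form the DMA payload.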

### T1.4: NVMe Multiple Queue Support

**Problem**: The NVMe driver uses a single I/O queue. NVMe supports up to 64K queues for parallelism.

**Linux reference**: `drivers/nvme/host/pci.c` — `nvme_reset_work()` for controller initialization with queue count negotiation.

**Implementation**:
1. Implement `nvme_create_io_queues()` modeled on Linux:
   - Read controller capabilities for maximum queue count
   - Create one admin submission + completion queue pair
   - Create N I/O submission + completion queue pairs
   - Configure interrupt vectors for MSI-X per-queue
2. Implement round-robin queue selection for I/O submission

**Acceptance**: NVMe device in QEMU reports ≥4 I/O queues. `fio` shows throughput scaling with queue count.
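
Queue count negotiation and round-robin selection can be sketched as below. The decoding assumes the Set Features / Number of Queues command (feature ID 0x07), whose completion dword 0 reports the allocated submission and completion queue counts as zero-based values. `granted_io_queues` and `QueueSelector` are hypothetical names.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Decodes completion dword 0 of Set Features / Number of Queues.
/// Bits 15:0 are the allocated submission queues and bits 31:16 the
/// allocated completion queues, both zero-based. The usable number of
/// queue pairs is the smallest of the two and what was requested.
pub fn granted_io_queues(dword0: u32, requested: u16) -> u16 {
    let nsqa = (dword0 & 0xFFFF) + 1;
    let ncqa = (dword0 >> 16) + 1;
    nsqa.min(ncqa).min(requested as u32) as u16
}

/// Hands out I/O queue IDs in rotation. Queue ID 0 is the admin queue,
/// so the returned IDs run from 1 to `count` inclusive.
pub struct QueueSelector {
    next: AtomicUsize,
    count: usize,
}

impl QueueSelector {
    pub fn new(count: usize) -> Self {
        assert!(count > 0, "at least one I/O queue is required");
        Self { next: AtomicUsize::new(0), count }
    }

    pub fn pick(&self) -> usize {
        self.next.fetch_add(1, Ordering::Relaxed) % self.count + 1
    }
}
```

Round-robin is the simplest policy that spreads load; a per-CPU mapping, as Linux uses, would avoid the shared counter but needs CPU affinity information the driver may not have yet.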

---

## Phase 2: Network Drivers (Weeks 4-8)

### T2.1: e1000 Interrupt Moderation + Checksum Offload

**Problem**: e1000d is 458 lines with no hardware offloads. Every packet triggers an interrupt. Throughput is limited by interrupt rate (~10K pps max).

**Linux reference**: `drivers/net/ethernet/intel/e1000e/netdev.c:4200` — `e1000_configure_itr()`. `e1000e/netdev.c` — `e1000_tx_csum()`, `e1000_rx_checksum()`.

**Implementation**:
1. **Interrupt moderation** (ITR):
   - Program E1000_ITR register with dynamic moderation
   - Implement `e1000_update_itr()` modeled on Linux: increase ITR under high load, decrease under low load
   - Target: reduce interrupts from 10K/s to 1K/s under full load
2. **TX checksum offload**:
   - Set E1000_TXD_CMD_IP plus the context descriptor's IPCSS/IPCSO/IPCSE fields for the IP header checksum
   - Set E1000_TXD_CMD_TCP (clear for UDP) plus TUCSS/TUCSO/TUCSE for the TCP/UDP checksum
   - Write the context descriptor carrying these checksum parameters ahead of the data descriptors
3. **RX checksum offload**:
   - Parse E1000_RXD_STAT_IPCS/TCPCS status bits
   - Pass checksum status to netstack

**Files to modify**:
- `recipes/core/base/source/drivers/net/e1000d/src/device.rs` — add ITR, checksum methods
- `recipes/core/base/source/drivers/net/e1000d/src/main.rs` — wire into TX/RX paths

**Acceptance**: `iperf3` TCP throughput ≥5x improvement. Interrupt rate drops from ~10K/s to ≤2K/s under load. Wireshark capture shows valid checksums on TX packets.
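
The two register-level calculations in this task can be sketched as below. The sketch assumes the ITR register counts the minimum inter-interrupt interval in 256 ns units and uses the legacy RX descriptor status/error bit values from the Linux e1000 headers as we recall them; `itr_from_rate`, `rx_checksum`, and `RxChecksum` are hypothetical names.

```rust
/// The ITR register counts in units of 256 ns.
const ITR_UNIT_NS: u64 = 256;

/// Converts a target interrupt rate into an ITR register value.
/// A rate of 0 is treated as "no limit" and yields 0, which turns
/// moderation off.
pub fn itr_from_rate(ints_per_sec: u32) -> u16 {
    if ints_per_sec == 0 {
        return 0;
    }
    let interval = 1_000_000_000u64 / (ints_per_sec as u64 * ITR_UNIT_NS);
    interval.min(u16::MAX as u64) as u16
}

/// Checksum verdict handed to the network stack.
#[derive(Debug, PartialEq, Eq)]
pub enum RxChecksum {
    /// Hardware did not verify anything; software must check.
    NotChecked,
    Good,
    Bad,
}

/// Classifies a received frame from the descriptor status and error bytes.
pub fn rx_checksum(status: u8, errors: u8) -> RxChecksum {
    const STAT_IXSM: u8 = 0x04; // ignore checksum indication
    const STAT_TCPCS: u8 = 0x20; // TCP/UDP checksum was calculated
    const STAT_IPCS: u8 = 0x40; // IP checksum was calculated
    const ERR_TCPE: u8 = 0x20; // TCP/UDP checksum error
    const ERR_IPE: u8 = 0x40; // IP checksum error

    if status & STAT_IXSM != 0 || status & (STAT_IPCS | STAT_TCPCS) == 0 {
        return RxChecksum::NotChecked;
    }
    if errors & (ERR_IPE | ERR_TCPE) != 0 {
        RxChecksum::Bad
    } else {
        RxChecksum::Good
    }
}
```

For the acceptance threshold above, `itr_from_rate(2000)` gives the register value that caps the device at roughly 2K interrupts per second.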

### T2.2: e1000 TSO/GSO

**Problem**: TCP segmentation is done in software. Large sends require per-packet overhead.

**Linux reference**: `drivers/net/ethernet/intel/e1000e/netdev.c:5305` — `e1000_tso()`.

**Implementation**:
1. Implement `e1000_tso()` modeled on Linux:
   - Parse GSO descriptor from netstack
   - Set E1000_TXD_CMD_TSE (TCP Segmentation Enable)
   - Set MSS (Maximum Segment Size) in context descriptor
   - Set header length in context descriptor
   - Hardware will segment one large buffer into MSS-sized packets
2. Implement `e1000_tx_csum()` for combined TSO + checksum offload

**Acceptance**: TCP send of 64KB buffer produces hardware-segmented packets (verified via packet capture on the host side of the emulated NIC). Throughput for large sends ≥2x improvement.

### T2.3: r8169 PHY Configuration

**Problem**: rtl8168d has no per-chip PHY initialization. Works on QEMU's default r8169 but fails on many real chips.

**Linux reference**: `drivers/net/ethernet/realtek/r8169_phy_config.c` (1,354 lines of per-chip init sequences).

**Implementation**:
1. Identify chip version from the XID bits of the TxConfig register (Linux: `rtl8169_get_mac_version()`)
2. Add PHY init sequences for common chip versions:
   - RTL_GIGA_MAC_VER_34 (RTL8168EP/8111EP)
   - RTL_GIGA_MAC_VER_44 (RTL8168FP/8111FP)
   - RTL_GIGA_MAC_VER_51 (RTL8168H/8111H)
3. Implement MDIO register read/write for PHY access
4. Add PHY status polling for link detection

**Files to modify**:
- `recipes/core/base/source/drivers/net/rtl8168d/src/device.rs` — chip detection, PHY init
- `recipes/core/base/source/drivers/net/rtl8168d/src/main.rs` — init sequence

**Acceptance**: RTL8168 NIC in real hardware enumerates, links up, and passes `ping`. Multiple chip versions tested.
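
Chip identification and MDIO access come down to a few bit manipulations, sketched below. The XID mask and the PHYAR word layout reflect our reading of the Linux r8169 driver and should be checked against `local/reference/linux-7.0/` before use; all function names here are hypothetical.

```rust
/// Extracts the chip identifier (XID) from a TxConfig register value.
/// Linux matches this value against a table of (mask, value) pairs to
/// pick the RTL_GIGA_MAC_VER_* constant.
pub fn xid_from_txconfig(txconfig: u32) -> u16 {
    ((txconfig >> 20) & 0xfcf) as u16
}

/// Flag bit in PHYAR: set by software to start a write, set by hardware
/// when read data is ready.
const PHYAR_FLAG: u32 = 0x8000_0000;

/// Word to write to PHYAR to store `value` in PHY register `reg`.
/// Completion is signalled when the hardware clears the flag bit.
pub fn phyar_write_word(reg: u8, value: u16) -> u32 {
    PHYAR_FLAG | ((reg as u32 & 0x1f) << 16) | value as u32
}

/// Word to write to PHYAR to request a read of PHY register `reg`.
pub fn phyar_read_word(reg: u8) -> u32 {
    (reg as u32 & 0x1f) << 16
}

/// Interprets a PHYAR value polled after a read request. Returns the
/// 16-bit register contents once the flag bit is set, `None` while the
/// read is still in progress.
pub fn phyar_read_result(phyar: u32) -> Option<u16> {
    if phyar & PHYAR_FLAG != 0 {
        Some(phyar as u16)
    } else {
        None
    }
}
```

The polling loop around `phyar_read_result` should be bounded, in line with the retry rules from T0.1.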

### T2.4: Jumbo Frame Support (e1000 + r8169)

**Problem**: MTU limited to 1500. Jumbo frames (9000 bytes) reduce per-packet overhead for bulk transfers.

**Linux reference**: `e1000e/netdev.c` — `e1000_change_mtu()`. `r8169_main.c:4352` — `rtl_jumbo_config()`.

**Implementation**:
1. Configure RX buffer size for jumbo frames (up to 9KB)
2. Set MAX_FRAME_SIZE register
3. Update TX descriptor buffer size
4. Expose MTU configuration through scheme interface

**Acceptance**: `ifconfig eth0 mtu 9000` succeeds. `iperf3` with 9KB MTU shows reduced CPU usage per Gbps.

---

## Phase 3: Audio Drivers (Weeks 6-10)

### T3.1: HDA Codec Auto-Detection

**Problem**: ihdad (143 lines) has no codec detection. Audio works on zero real machines.

**Linux reference**: `sound/hda/hda_codec.c` — `snd_hda_codec_new()` for codec discovery. `sound/hda/hda_generic.c` for generic codec parser.

**Implementation**:
1. Implement HDA controller initialization:
   - Read GCAP (Global Capabilities) register for stream/IRQ info
   - Reset controller via GCTL.CRST
   - Set up CORB/RIRB (Command/Response Ring Buffers) for codec communication
2. Implement codec discovery:
   - Read STATESTS register for codec presence bitmap
   - For each present codec, send the GET_PARAMETER verb (F00) to read:
     - Vendor/Device ID (parameter 00)
     - Revision ID (parameter 02)
     - Node count (parameter 04)
     - Function group type (parameter 05)
   - Read the Subsystem ID with the separate Get Subsystem ID verb (F20)
3. Implement codec parsing:
   - Walk widget tree starting from AFG (Audio Function Group) node
   - Parse each widget's parameters (amp capabilities, connection list, pin config)
   - Build internal topology representation
4. Add codec table for common codecs:
   - Realtek ALC887/ALC888/ALC892 (most common desktop)
   - Realtek ALC269/ALC282/ALC283 (most common laptop)
   - Conexant CX20561/CX20585
   - IDT 92HD73C1/92HD81B1C5

**Files to modify/create**:
- `recipes/core/base/source/drivers/audio/ihdad/src/main.rs` — controller init
- `recipes/core/base/source/drivers/audio/ihdad/src/hda/` — new `codec.rs`, `widget.rs`, `codecs/`
- `recipes/core/base/source/drivers/audio/ihdad/src/hda/registers.rs` — register definitions

**Acceptance**: Real hardware with Intel HDA controller enumerates codecs. `lspci` shows HD Audio device with driver attached. Codec dump shows vendor/device IDs matching known codecs.
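
Codec discovery is mostly bit packing, sketched below. The layout assumed is the 32-bit CORB entry from the HD Audio specification (codec address in bits 31:28, node ID in bits 27:20, 12-bit verb in bits 19:8, 8-bit payload in bits 7:0). The function names are hypothetical.

```rust
pub const VERB_GET_PARAMETER: u32 = 0xF00;
pub const VERB_GET_SUBSYSTEM_ID: u32 = 0xF20;

pub const PARAM_VENDOR_ID: u8 = 0x00;
pub const PARAM_REVISION_ID: u8 = 0x02;
pub const PARAM_NODE_COUNT: u8 = 0x04;
pub const PARAM_FUNCTION_GROUP_TYPE: u8 = 0x05;

/// Packs a 12-bit verb and 8-bit payload into a CORB entry.
pub fn corb_verb(codec: u8, nid: u8, verb: u32, payload: u8) -> u32 {
    ((codec as u32 & 0xF) << 28)
        | ((nid as u32) << 20)
        | ((verb & 0xFFF) << 8)
        | payload as u32
}

/// Lists the codec addresses whose presence bit is set in STATESTS
/// (bits 14:0, one per codec address).
pub fn present_codecs(statests: u16) -> Vec<u8> {
    (0u8..15).filter(|i| statests & (1u16 << *i) != 0).collect()
}

/// Splits a Node Count parameter response into
/// (starting node ID, number of nodes).
pub fn node_range(response: u32) -> (u8, u8) {
    (((response >> 16) & 0xFF) as u8, (response & 0xFF) as u8)
}
```

Discovery then becomes: for each address from `present_codecs`, send `corb_verb(addr, 0, VERB_GET_PARAMETER, PARAM_NODE_COUNT)` to the root node and walk the function groups that `node_range` reports.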

### T3.2: HDA Mixer Controls + Jack Detection

**Problem**: No volume control, no muting, no jack detection. Audio output is fixed-volume or silent.

**Linux reference**: `sound/hda/hda_generic.c` — `create_mute_volume_ctl()`. `sound/hda/hda_jack.c` — `snd_hda_jack_detect()`.

**Implementation**:
1. Add mixer controls for each output path:
   - Volume control (AMP-OUT mute + gain on pin widget)
   - Capture control (AMP-IN mute + gain on ADC widget)
   - Master volume (combined output volume)
2. Implement jack detection:
   - Enable unsolicited response for jack-sense pin widgets
   - Handle unsolicited response in CORB/RIRB interrupt
   - Report jack state (plugged/unplugged) via scheme
3. Wire mixer controls to audiod for system-wide volume management

**Files to modify**:
- `recipes/core/base/source/drivers/audio/ihdad/src/hda/codec.rs` — mixer controls
- `recipes/core/base/source/drivers/audio/ihdad/src/hda/jack.rs` — jack detection (new)
- `recipes/core/base/source/drivers/audio/audiod/src/scheme.rs` — volume interface

**Acceptance**: Volume control changes audible output level. Plugging/unplugging headphones triggers jack event (visible in debug output). Headphone and speaker paths are independent.
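
The volume and mute controls reduce to the 16-bit payload of the Set Amplifier Gain/Mute verb. The sketch below assumes the payload layout from the HD Audio specification (bit 15 output amp, bit 14 input amp, bit 13 left, bit 12 right, bits 11:8 index, bit 7 mute, bits 6:0 gain). Both function names are hypothetical, and the linear percent mapping is a simplification.

```rust
/// Builds the payload for Set Amplifier Gain/Mute, addressing both the
/// left and right channels of either the output or the input amplifier.
pub fn amp_gain_payload(output: bool, index: u8, mute: bool, gain: u8) -> u16 {
    let direction: u16 = if output { 1 << 15 } else { 1 << 14 };
    let both_channels: u16 = (1 << 13) | (1 << 12);
    direction
        | both_channels
        | ((index as u16 & 0xF) << 8)
        | ((mute as u16) << 7)
        | (gain as u16 & 0x7F)
}

/// Maps a 0-100 volume percentage onto the amplifier's step range.
/// `num_steps` is the NumSteps field of the widget's amplifier
/// capabilities; values above 100 percent are clamped.
pub fn gain_for_percent(percent: u8, num_steps: u8) -> u8 {
    let p = percent.min(100) as u32;
    (p * num_steps as u32 / 100) as u8
}
```

A perceptually even volume slider would map percent through the amplifier's step size in dB rather than linearly; the linear form is only the shortest correct starting point.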

### T3.3: HDA Stream Setup and PCM Playback

**Problem**: No actual PCM audio output. HDA hardware is configured but no audio data flows.

**Linux reference**: `sound/hda/hda_controller.c` — `azx_pcm_open()` / `azx_pcm_prepare()` / `azx_pcm_trigger()`.

**Implementation**:
1. Implement stream (PCM) management:
   - Allocate stream descriptor from controller (SD0-SDn)
   - Configure stream format (sample rate, bits, channels)
   - Set BDL (Buffer Descriptor List) for DMA
   - Read the stream position in the buffer from the read-only LPIB register
2. Implement PCM playback path:
   - `pcm_open(format)` — allocate stream, configure format
   - `pcm_write(data)` — write audio samples to DMA buffer
   - `pcm_start()` — set RUN bit in stream control
   - `pcm_stop()` — clear RUN bit
3. Implement CORB/RIRB interrupt handling for unsolicited responses
4. Implement stream interrupt handling for buffer completion (BCIS)

**Files to modify**:
- `recipes/core/base/source/drivers/audio/ihdad/src/hda/stream.rs` — stream management (new)
- `recipes/core/base/source/drivers/audio/ihdad/src/hda/dma.rs` — BDL setup (new)
- `recipes/core/base/source/drivers/audio/audiod/src/` — PCM routing

**Acceptance**: `aplay` (or redox equivalent) plays a WAV file and produces audible output. `parec` captures from microphone. Loopback (output → input) works without distortion.
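
Stream format encoding and BDL construction can be sketched as below. The sketch assumes the stream format register layout from the HD Audio specification (bit 14 base rate, bits 13:11 multiplier, bits 10:8 divisor, bits 6:4 sample size, bits 3:0 channels minus one) and 16-byte BDL entries with the interrupt-on-completion flag in bit 0. `stream_format`, `BdlEntry`, and `build_bdl` are hypothetical names, and only a few common sample rates are covered.

```rust
/// Encodes a PCM format into the 16-bit stream format register value.
/// Returns `None` for combinations this sketch does not cover.
pub fn stream_format(rate_hz: u32, bits: u8, channels: u8) -> Option<u16> {
    let rate: u16 = match rate_hz {
        48_000 => 0,
        44_100 => 1 << 14,            // 44.1 kHz base rate
        96_000 => 1 << 11,            // 48 kHz base, multiplier x2
        192_000 => 3 << 11,           // 48 kHz base, multiplier x4
        _ => return None,
    };
    let size: u16 = match bits {
        8 => 0,
        16 => 1,
        20 => 2,
        24 => 3,
        32 => 4,
        _ => return None,
    };
    if channels == 0 || channels > 16 {
        return None;
    }
    Some(rate | (size << 4) | (channels as u16 - 1))
}

/// One Buffer Descriptor List entry as the controller reads it.
#[repr(C)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct BdlEntry {
    pub addr: u64,
    pub len: u32,
    pub flags: u32,
}

/// Interrupt on completion: raise BCIS when this buffer is finished.
pub const BDL_IOC: u32 = 1;

/// Splits one physically contiguous DMA buffer into equal periods,
/// each raising an interrupt on completion.
pub fn build_bdl(base: u64, period_bytes: u32, periods: usize) -> Vec<BdlEntry> {
    (0..periods)
        .map(|i| BdlEntry {
            addr: base + i as u64 * period_bytes as u64,
            len: period_bytes,
            flags: BDL_IOC,
        })
        .collect()
}
```

The same format value must also be sent to the codec's converter widget, otherwise the controller and codec disagree about the stream layout.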

### T3.4: AC97 Multiple Codec + Mixer Support

**Problem**: ac97d supports only a single codec at a fixed configuration. No volume/mute.

**Linux reference**: `sound/pci/ac97/ac97_codec.c` (3,134 lines) — multi-codec architecture.

**Implementation**:
1. Add codec slot detection (AC97 supports up to 4 codecs on one controller)
2. Add mixer register read/write for volume/mute
3. Add record source selection

**Acceptance**: Desktop with AC97 audio codec produces audible output with adjustable volume.

---

## Phase 4: Input Completeness (Weeks 3-5)

### T4.1: PS/2 i8042 Controller Reset

**Problem**: ps2d assumes the controller is ready. Real hardware may need a reset sequence.

**Linux reference**: `drivers/input/serio/i8042.c:522` — `i8042_controller_check()`.

**Implementation**:
1. Add controller self-test: Write 0xAA to command register, expect 0x55 response
2. Add controller initialization: disable devices, flush buffer, enable
3. Add AUX (mouse) port detection
4. Add timeout handling for missing ACK from controller

**Files to modify**:
- `recipes/core/base/source/drivers/input/ps2d/src/controller.rs`

**Acceptance**: PS/2 keyboard and mouse work on real hardware after cold boot. No "LED command ACK timeout" warnings.
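
The self-test with timeout handling can be sketched against a small port abstraction, which also makes the logic testable without hardware. The trait, the error type, and `FakePort` are hypothetical; in ps2d the trait would be backed by the real 0x60/0x64 port accessors, and a real implementation would add a short delay between polls.

```rust
use std::collections::VecDeque;

/// Minimal view of the i8042 ports (data at 0x60, status/command at 0x64).
pub trait Ps2Port {
    fn status(&mut self) -> u8;
    fn read_data(&mut self) -> u8;
    fn write_command(&mut self, cmd: u8);
}

#[derive(Debug, PartialEq, Eq)]
pub enum Ps2Error {
    Timeout,
    SelfTestFailed(u8),
}

const STATUS_OUTPUT_FULL: u8 = 0x01; // a byte is waiting at the data port
const STATUS_INPUT_FULL: u8 = 0x02; // the controller has not consumed our last write
const CMD_SELF_TEST: u8 = 0xAA;
const SELF_TEST_OK: u8 = 0x55;

/// Polls the status register until `mask` reads as `want`, giving up
/// after `max_polls` reads.
fn wait_status(port: &mut impl Ps2Port, mask: u8, want: u8, max_polls: u32) -> Result<(), Ps2Error> {
    for _ in 0..max_polls {
        if (port.status() & mask) == want {
            return Ok(());
        }
    }
    Err(Ps2Error::Timeout)
}

/// Flushes stale output, issues the self-test command, and checks the reply.
pub fn self_test(port: &mut impl Ps2Port, max_polls: u32) -> Result<(), Ps2Error> {
    // Drain stale bytes so they are not mistaken for the self-test reply.
    for _ in 0..max_polls {
        if port.status() & STATUS_OUTPUT_FULL == 0 {
            break;
        }
        let _ = port.read_data();
    }
    wait_status(port, STATUS_INPUT_FULL, 0, max_polls)?;
    port.write_command(CMD_SELF_TEST);
    wait_status(port, STATUS_OUTPUT_FULL, STATUS_OUTPUT_FULL, max_polls)?;
    match port.read_data() {
        SELF_TEST_OK => Ok(()),
        other => Err(Ps2Error::SelfTestFailed(other)),
    }
}

/// In-memory stand-in for the controller, for exercising the logic off-target.
pub struct FakePort {
    pub output: VecDeque<u8>,
    pub self_test_reply: Option<u8>,
}

impl Ps2Port for FakePort {
    fn status(&mut self) -> u8 {
        if self.output.is_empty() { 0 } else { STATUS_OUTPUT_FULL }
    }
    fn read_data(&mut self) -> u8 {
        self.output.pop_front().unwrap_or(0)
    }
    fn write_command(&mut self, cmd: u8) {
        if cmd == CMD_SELF_TEST {
            if let Some(reply) = self.self_test_reply {
                self.output.push_back(reply);
            }
        }
    }
}
```

Returning `Ps2Error::Timeout` instead of spinning forever is what lets the daemon log the failure and continue without the controller.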

### T4.2: Touchpad Protocol Detection

**Problem**: USB HID touchpads work as basic mice. No multi-touch, no gestures.

**Linux reference**: `drivers/input/mouse/synaptics.c` for Synaptics protocol. `drivers/input/mouse/alps.c` for ALPS.

**Implementation**:
1. Add PS/2 touchpad protocol detection for Synaptics/ALPS/Elantech
2. Parse multi-touch data from HID digitizer reports
3. Expose gesture events through evdevd scheme

**Acceptance**: Laptop touchpad supports two-finger scroll. Multi-touch coordinates reported correctly.

---

## Phase 5: Validation & Documentation (Weeks 1-12, parallel)

### T5.1: Per-Driver Test Harnesses

**Task**: Create QEMU-based test scripts for each driver category:
- `local/scripts/test-storage-qemu.sh` — boots with virtio-blk + AHCI, runs fio
- `local/scripts/test-network-qemu.sh` — boots with e1000 + r8169, runs iperf3
- `local/scripts/test-audio-qemu.sh` — boots with HDA + AC97, plays test tone

**Acceptance**: Each script exits 0 on success, produces captured serial output with test results.

### T5.2: Hardware Validation Matrix

**Task**: Create `local/docs/HARDWARE-VALIDATION-MATRIX.md` documenting tested hardware configurations:
- CPU/chipset combinations tested
- Storage controllers (AHCI, NVMe) tested
- Network chips (e1000, r8169 variants) tested
- Audio codecs (HDA, AC97) tested
- Known-broken configurations

**Acceptance**: Matrix has at least one verified entry per driver category on real hardware.

---

## Execution Order & Dependencies

```
Phase 0 (Cross-cutting) ─────────────────────────────────────────────┐
  T0.1 Error handling   T0.2 Logging   T0.3 Documentation            │
  │                                                                  │
  ├── Phase 1 (Storage) ──────────────────────────────────────────┐  │
  │     T1.1 AHCI NCQ ──► T1.3 TRIM ──► T1.2 PM ──► T1.4 NVMe     │  │
  │                                                               │  │
  ├── Phase 2 (Network) ───────────────────────────────────────┐  │  │
  │     T2.1 ITR+Checksum ──► T2.2 TSO ──► T2.3 PHY ──► T2.4   │  │  │
  │                                                            │  │  │
  ├── Phase 3 (Audio) ──────────────────────────────────────┐  │  │  │
  │     T3.1 CodecDetect ──► T3.3 Stream ──► T3.2 Mixer     │  │  │  │
  │     T3.4 AC97 (parallel)                                │  │  │  │
  │                                                         │  │  │  │
  └── Phase 4 (Input) ───────────────────────────────────┐  │  │  │  │
        T4.1 PS/2 reset ──► T4.2 Touchpad                │  │  │  │  │
                                                         │  │  │  │  │
Phase 5 (Validation) ◄───────────────────────────────────┴──┴──┴──┴──┘
  T5.1 Test harnesses   T5.2 Hardware matrix
```

**Phase 0 is prerequisite for all other phases.**
**Phases 1-4 are independent of each other and can run in parallel.**
**Phase 5 runs concurrently with all phases, finalizing as each completes.**

## Timeline

| Phase | Tasks | Duration | Cumulative |
|-------|-------|----------|------------|
| Phase 0 | T0.1, T0.2, T0.3 | Weeks 1-2 | Week 2 |
| Phase 1 | T1.1, T1.2, T1.3, T1.4 | Weeks 2-6 | Week 6 |
| Phase 2 | T2.1, T2.2, T2.3, T2.4 | Weeks 4-8 | Week 8 |
| Phase 3 | T3.1, T3.2, T3.3, T3.4 | Weeks 6-10 | Week 10 |
| Phase 4 | T4.1, T4.2 | Weeks 3-5 | Week 5 |
| Phase 5 | T5.1, T5.2 | Weeks 1-12 (parallel) | Week 12 |

**Total**: 12 weeks with 2 developers working in parallel (Phase 1 and Phase 3 on separate tracks).

---

## Linux Reference Map

Every task references specific Linux source. Here is the complete map:

| Task | Primary Reference | File Size | Function Focus |
|------|-------------------|-----------|----------------|
| T1.1 (NCQ) | `drivers/ata/libata-sata.c` | 1,365 lines | `ata_qc_issue()`, FIS construction |
| T1.2 (AHCI PM) | `drivers/ata/libata-eh.c` | 3,915 lines | `ata_eh_handle_port_suspend()` |
| T1.3 (TRIM) | `drivers/ata/libata-scsi.c` | 4,504 lines | `ata_scsi_unmap_xlat()` |
| T1.4 (NVMe) | `drivers/nvme/host/pci.c` | 3,146 lines | `nvme_reset_work()`, queue creation |
| T2.1 (ITR) | `e1000e/netdev.c` | 7,240 lines | `e1000_configure_itr()`, checksum |
| T2.2 (TSO) | `e1000e/netdev.c` | 7,240 lines | `e1000_tso()` |
| T2.3 (PHY) | `r8169_phy_config.c` | 1,354 lines | per-chip PHY init sequences |
| T3.1 (Codec) | `sound/hda/hda_codec.c` | 5,598 lines | `snd_hda_codec_new()`, widget parsing |
| T3.2 (Mixer) | `sound/hda/hda_generic.c` | 5,982 lines | `create_mute_volume_ctl()` |
| T3.3 (Stream) | `sound/hda/hda_controller.c` | 1,900 lines | `azx_pcm_open/prepare/trigger()` |
| T3.4 (AC97) | `sound/pci/ac97/ac97_codec.c` | 3,134 lines | multi-codec, mixer regs |
| T4.1 (PS/2) | `drivers/input/serio/i8042.c` | 1,254 lines | `i8042_controller_check()` |
| T4.2 (Touchpad) | `drivers/input/mouse/synaptics.c` | 1,707 lines | protocol detection |

---

## Scope Boundaries

**In scope**:
- Storage driver enhancements (AHCI NCQ, PM, TRIM; NVMe queues)
- Network driver enhancements (e1000 offload, r8169 PHY, jumbo frames)
- Audio driver enhancements (HDA codec, mixer, streams; AC97 multi-codec)
- Input driver enhancements (PS/2 reset, touchpad protocols)
- Cross-cutting driver quality (error handling, logging, documentation)

**Out of scope** (covered by existing plans):
- ACPI S3/S4 sleep, thermal, EC — see `ACPI-IMPROVEMENT-PLAN.md`
- PCI IRQ, MSI-X depth, IOMMU — see `IRQ-AND-LOWLEVEL-CONTROLLERS-ENHANCEMENT-PLAN.md`
- USB controller completeness, device lifecycle — see `USB-IMPLEMENTATION-PLAN.md`
- GPU/DRM display, KMS, Mesa — see `DRM-MODERNIZATION-EXECUTION-PLAN.md`
- Bluetooth — see `BLUETOOTH-IMPLEMENTATION-PLAN.md`
- Wi-Fi — see `WIFI-IMPLEMENTATION-PLAN.md`
- Desktop/KDE — see `CONSOLE-TO-KDE-DESKTOP-PLAN.md`

---

## Addendum A: Kernel Substrate Audit (2026-05-04 deep re-assessment)

### A.1 CPU / SMP / Timer Initialization

**Red Bear**: Kernel arch/x86_64 (502 lines) + arch/x86_shared + time.rs
**Linux**: `arch/x86/kernel/smpboot.c` (1,511) + `arch/x86/kernel/apic/apic.c` (2,694) + `arch/x86/kernel/tsc.c` (1,612) + `kernel/time/tick-common.c` (595) = 6,412 lines (subset)

**What Red Bear has**:
- Basic x86_64 boot (GDT, IDT, page tables)
- x2APIC/SMP detected from MADT
- HPET timer

**What Linux has that Red Bear is missing**:
- ❌ BSP/AP handoff protocol — Linux: `smpboot.c:895` `do_boot_cpu()`
- ❌ CPU hotplug (online/offline) — Linux: `smpboot.c:1312` `cpu_up()` / `cpu_down()`
- ❌ TSC calibration and synchronization — Linux: `tsc.c:1186` `check_tsc_sync_source()`
- ❌ APIC timer calibration and per-CPU timers — Linux: `apic.c:294` `calibrate_APIC_clock()`
- ❌ Interrupt affinity and vector allocation — Linux: `kernel/irq/manage.c` (2,803 lines)
- ❌ IPI (Inter-Processor Interrupt) routing — Linux: `apic/ipi.c`
- ❌ CPU idle states (C-states) — Linux: `arch/x86/kernel/acpi/cstate.c`
- ❌ Clock source rating and switching — Linux: `kernel/time/clocksource.c`

**Priority**: SMP bring-up stability and TSC sync are critical for multi-core correctness. Without APIC timer calibration, scheduler tick is unreliable.

### A.2 DMA / Memory / IOMMU Substrate

**Red Bear**: kernel memory/mod.rs (1,266 lines) + iommu daemon (4,411 lines)
**Linux**: `kernel/dma/mapping.c` (1,016) + `drivers/iommu/` (~30K) + `mm/` subsystem

**What Red Bear has**:
- Physical memory mapping via scheme:memory
- Basic IOMMU daemon (4,411 lines — substantial, AMD-Vi + Intel VT-d)
- Page table management in iommu daemon

**What Linux has that Red Bear is missing**:
- ❌ Coherent DMA API — Linux: `kernel/dma/mapping.c` `dma_alloc_coherent()`
- ❌ Streaming DMA API — Linux: `kernel/dma/mapping.c` `dma_map_single()`
- ❌ Scatter-gather DMA — Linux: `lib/scatterlist.c`
- ❌ DMA pool/zone management
- ❌ SWIOTLB bounce buffering — Linux: `kernel/dma/swiotlb.c`
- ❌ IOMMU DMA remapping per-device — the iommu daemon exists but Linux handles this in-kernel with `iommu_dma_ops`
- ❌ DMA debug and error injection — Linux: `kernel/dma/debug.c`

**Priority**: DMA API is prerequisite for any driver doing scatter-gather. Without coherent DMA, drivers must manually manage cache coherency.

### A.3 Virtio Completeness

**Red Bear**: virtio-core (1,545 lines) + virtio-blkd + virtio-netd + virtio-gpud
**Linux**: `drivers/virtio/virtio.c` (730) + `virtio_ring.c` (3,940) + `virtio_pci_modern.c` (1,301) + blk/net/gpu drivers (14,957 total)

**What Red Bear has**:
- Basic virtio PCI transport (legacy)
- Split virtqueue with basic ring management
- virtio-blk, virtio-net, virtio-gpu drivers

**What Linux has that Red Bear is missing**:
- ❌ **Virtio 1.0 modern PCI transport** — Linux: `virtio_pci_modern.c` (1,301 lines). Red Bear only uses legacy.
- ❌ **Packed virtqueue** (Virtio 1.1) — Linux: `virtio_ring.c` supports both split and packed
- ❌ **Multiqueue support** — Linux: virtio-net supports up to 16 TX/RX queue pairs via MSI-X
- ❌ **Virtio feature negotiation** — Red Bear hardcodes features; Linux does dynamic negotiation
- ❌ **Device reset protocol** — Linux: `virtio.c:237` `virtio_reset_device()`
- ❌ **Virtio-MMIO transport** (for ARM/RISC-V VMs)
- ❌ **Virtio-balloon** (memory ballooning)

**Priority**: Modern PCI transport is required for QEMU machine types `q35` and newer. Packed virtqueues improve throughput. Multiqueue is critical for network performance.
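
Dynamic feature negotiation, as opposed to hardcoded features, can be sketched as a bitwise intersection plus a check that the modern transport was agreed on. The feature bit numbers come from the Virtio specification; `negotiate` and `NegotiationError` are hypothetical names.

```rust
/// Feature bits from the Virtio specification.
pub const VIRTIO_NET_F_MQ: u64 = 1 << 22;
pub const VIRTIO_F_VERSION_1: u64 = 1 << 32;
pub const VIRTIO_F_RING_PACKED: u64 = 1 << 34;

#[derive(Debug, PartialEq, Eq)]
pub enum NegotiationError {
    /// The device did not offer VIRTIO_F_VERSION_1, so only the legacy
    /// transport is possible.
    LegacyOnly,
}

/// Intersects what the device offers with what the driver implements.
/// The result is what the driver writes back before setting FEATURES_OK.
pub fn negotiate(device_features: u64, driver_features: u64) -> Result<u64, NegotiationError> {
    let accepted = device_features & driver_features;
    if accepted & VIRTIO_F_VERSION_1 == 0 {
        return Err(NegotiationError::LegacyOnly);
    }
    Ok(accepted)
}
```

On `LegacyOnly` the driver can fall back to the existing legacy path, so adding negotiation does not have to remove current behavior.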

### A.4 CPU Frequency / Thermal / Power

**Red Bear**: cpufreqd (176 lines — real implementation with governors), thermald (837 lines), hwrngd (534 lines), redbear-upower, redbear-acmd, redbear-ecmd
**Linux**: `drivers/cpufreq/cpufreq.c` (3,081) + `drivers/thermal/thermal_core.c` (1,956) + `drivers/char/hw_random/core.c` (739)

**cpufreqd status**: 176 lines with ondemand/performance/powersave governors, MSR-based P-state control via IA32_PERF_CTL, and CPU load measurement via `/scheme/sys`. Still missing vs Linux:
- ❌ General governor framework, including schedutil (only the three governors above exist)
- ❌ ACPI P-state (_PSS) integration
- ❌ Intel P-state / HWP driver
- ❌ AMD CPPC driver

**thermald status**: 837 lines — basic thermal monitoring exists but missing:
- ❌ Thermal zone trip points (passive/active/critical)
- ❌ Cooling device registration
- ❌ Fan speed control via ACPI

**hwrngd status**: 534 lines — reasonable random number daemon. Missing:
- ❌ Entropy estimation per FIPS 140-2
- ❌ Multiple entropy source mixing (CPU jitter, TPM, RDRAND)
- ❌ `/dev/hwrng` interface

**Priority**: cpufreqd has basic governor support but still needs ACPI P-state integration, Intel HWP, and AMD CPPC for full functionality.
|
|
|
|
### A.5 Block Layer / Filesystem Integration
|
|
|
|
**Red Bear**: No dedicated block layer — each storage driver handles I/O directly via DiskScheme
|
|
**Linux**: `block/blk-mq.c` (5,309) + `block/blk-flush.c` (540) + `block/genhd.c` + `block/elevator.c`
|
|
|
|
**What Linux has that Red Bear is missing**:
|
|
- ❌ Multi-queue block I/O — Linux: `blk-mq.c` — per-CPU queues + tag sets
|
|
- ❌ I/O scheduling (mq-deadline, kyber, bfq) — Linux: `block/mq-deadline.c`
|
|
- ❌ Flush/FUA semantics — Linux: `block/blk-flush.c`
|
|
- ❌ I/O merging and sorting
|
|
- ❌ Request timeout and retry — Linux: `block/blk-mq.c` `blk_mq_check_expired()`
|
|
- ❌ Block device partitioning (MBR/GPT handled by partitionlib library)
|
|
- ❌ Queue depth management and back-pressure
|
|
|
|
**Red Bear storage drivers** (nvmed 1,318 lines; usbscsid 1,622 lines; ided 773 lines) all implement their own I/O dispatch. The lack of a shared block layer means each driver reinvents queuing, timeout, and retry logic.

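The duplicated queuing, timeout, and retry logic described above could be factored into a small shared component. The following Rust sketch shows one possible shape: a tracker that enforces a queue depth (back-pressure) and reports requests that have exceeded a deadline so the caller can retry or fail them. `RequestTracker` and its methods are hypothetical names, not existing Red Bear or Redox APIs, and the timeout value is arbitrary.

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

/// Returned when the queue is at its configured depth and the caller must wait.
#[derive(Debug, PartialEq, Eq)]
pub struct QueueFull;

/// Hypothetical shared in-flight request tracker for storage drivers.
pub struct RequestTracker {
    max_depth: usize,
    timeout: Duration,
    in_flight: HashMap<u32, Instant>, // tag -> submission time
    next_tag: u32,
}

impl RequestTracker {
    pub fn new(max_depth: usize, timeout: Duration) -> Self {
        Self { max_depth, timeout, in_flight: HashMap::new(), next_tag: 0 }
    }

    /// Reserve a tag for a new request, or apply back-pressure when the queue is full.
    pub fn submit(&mut self, now: Instant) -> Result<u32, QueueFull> {
        if self.in_flight.len() >= self.max_depth {
            return Err(QueueFull);
        }
        let tag = self.next_tag;
        self.next_tag = self.next_tag.wrapping_add(1);
        self.in_flight.insert(tag, now);
        Ok(tag)
    }

    /// Mark a request as completed; returns false if the tag was not in flight.
    pub fn complete(&mut self, tag: u32) -> bool {
        self.in_flight.remove(&tag).is_some()
    }

    /// Remove and return (in ascending order) the tags whose timeout has elapsed.
    pub fn take_expired(&mut self, now: Instant) -> Vec<u32> {
        let mut expired = Vec::new();
        for (&tag, &submitted) in &self.in_flight {
            if now.saturating_duration_since(submitted) >= self.timeout {
                expired.push(tag);
            }
        }
        expired.sort_unstable();
        for tag in &expired {
            self.in_flight.remove(tag);
        }
        expired
    }
}

fn main() {
    let start = Instant::now();
    let mut tracker = RequestTracker::new(2, Duration::from_secs(30));
    let first = tracker.submit(start);
    let second = tracker.submit(start);
    let third = tracker.submit(start); // exceeds the depth of 2
    println!("{:?} {:?} {:?}", first, second, third);
    println!("expired: {:?}", tracker.take_expired(start + Duration::from_secs(31)));
}
```

Passing `now` in explicitly keeps the tracker deterministic and easy to test; a driver would call `take_expired` from its periodic tick and decide per request whether to resubmit or surface an error.
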
**Priority**: A shared block layer is a prerequisite for NCQ, NVMe multi-queue, TRIM propagation, and crash consistency.

---

## Revised Execution Priority (incorporating kernel substrate)

| Tier | Subsystem | Effort |
|------|-----------|--------|
| **T0** (kernel) | SMP bring-up stability, TSC calibration, interrupt affinity | 4-6 weeks |
| **T0** (kernel) | DMA API + scatter-gather | 2-3 weeks |
| **T1** | AHCI NCQ + block layer | 3-4 weeks |
| **T1** | Virtio modern PCI + multiqueue | 2-3 weeks |
| **T1** | cpufreqd (governor + P-state) | 2-3 weeks |
| **T2** | Network offloads (Phase 2) | 3-4 weeks |
| **T2** | HDA codec detection (Phase 3) | 3-4 weeks |
| **T3** | thermald trip points + fan control | 1-2 weeks |
| **T3** | NVMe multi-queue | 2-3 weeks |
| **T4** | Audio streams + mixer (Phase 3 remainder) | 3-4 weeks |

**Total**: 24-36 weeks (T0-T2 minimum viable), 40-52 weeks (full).

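As a cross-check on the effort figures, the per-row ranges in the table can be summed directly. The Rust sketch below assumes the rows are executed strictly one after another with no overlap and no contingency, which is a simplifying assumption and not something the plan states.

```rust
/// One row of the execution-priority table: tier number and effort range in weeks.
#[derive(Clone, Copy)]
struct Estimate {
    tier: u8,
    min_weeks: u32,
    max_weeks: u32,
}

/// Effort ranges copied from the table, in table order.
const ROWS: [Estimate; 10] = [
    Estimate { tier: 0, min_weeks: 4, max_weeks: 6 }, // SMP, TSC, interrupt affinity
    Estimate { tier: 0, min_weeks: 2, max_weeks: 3 }, // DMA API + scatter-gather
    Estimate { tier: 1, min_weeks: 3, max_weeks: 4 }, // AHCI NCQ + block layer
    Estimate { tier: 1, min_weeks: 2, max_weeks: 3 }, // Virtio modern PCI + multiqueue
    Estimate { tier: 1, min_weeks: 2, max_weeks: 3 }, // cpufreqd
    Estimate { tier: 2, min_weeks: 3, max_weeks: 4 }, // Network offloads
    Estimate { tier: 2, min_weeks: 3, max_weeks: 4 }, // HDA codec detection
    Estimate { tier: 3, min_weeks: 1, max_weeks: 2 }, // thermald trip points + fan control
    Estimate { tier: 3, min_weeks: 2, max_weeks: 3 }, // NVMe multi-queue
    Estimate { tier: 4, min_weeks: 3, max_weeks: 4 }, // Audio streams + mixer
];

/// Sum the (min, max) weeks of every row whose tier is at most `max_tier`.
fn total_through(max_tier: u8) -> (u32, u32) {
    ROWS.iter()
        .filter(|row| row.tier <= max_tier)
        .fold((0, 0), |(lo, hi), row| (lo + row.min_weeks, hi + row.max_weeks))
}

fn main() {
    let (lo, hi) = total_through(2);
    println!("T0-T2: {}-{} weeks", lo, hi);
    let (lo, hi) = total_through(4);
    println!("T0-T4: {}-{} weeks", lo, hi);
}
```

Under these assumptions the listed rows sum to 19-27 weeks through T2 and 25-36 weeks through T4, so the stated totals presumably include work or contingency that the table does not itemize.
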
---

## Addendum B: Daemon & Subsystem Audit (2026-05-04, updated with precise Linux 7.0 line counts)

### B.1 ACPI Subsystem — Deep Linux Cross-Reference

**Red Bear**: acpid (2,187 lines) + kernel ACPI (727 lines) = 2,914 total
**Linux 7.0** (key files): `sleep.c` (1,152) + `thermal.c` (1,067) + `battery.c` (1,331) + `ec.c` (2,380) + `arch/x86/kernel/acpi/sleep.c` (202) + `processor_perflib.c` + `acpi_video.c` + `pci_irq.c` + `apei/` = **~60,000+ total**

| Linux File | Lines | Feature | Red Bear Status |
|------------|-------|---------|-----------------|
| `drivers/acpi/sleep.c` | 1,152 | S3/S4 suspend, NVS save/restore, wakeup vector | ❌ S3/S4 missing |
| `drivers/acpi/thermal.c` | 1,067 | Thermal zones, trip points, cooling | ❌ Missing |
| `drivers/acpi/battery.c` | 1,331 | Battery status, charge, ACPI _BIF/_BST | ❌ Missing |
| `drivers/acpi/ec.c` | 2,380 | Embedded Controller runtime, commands, GPE | ❌ Missing (redbear-ecmd is stub) |
| `drivers/acpi/fan.c` | ~400 | Fan speed control | ❌ Missing |
| `arch/x86/kernel/acpi/sleep.c` | 202 | x86-specific sleep, wakeup vector, trampoline | ❌ Missing |
| `drivers/acpi/processor_perflib.c` | ~800 | _PSS/_PPC performance states | ❌ Missing |
| `drivers/acpi/pci_irq.c` | ~500 | PCI IRQ routing overrides (_PRT) | ❌ Missing |
| `drivers/acpi/apei/` | ~3,000 | ACPI Platform Error Interface | ❌ Missing |

**Priority**: S3/S4 sleep and thermal zones are critical for laptop/desktop use. EC support is needed for modern laptops.

### B.2 IRQ / MSI / Timer Subsystem — Precise Line Counts

**Red Bear**: kernel irq.rs (570) + local_apic.rs (272) + ioapic.rs (427) + ipi.rs (53) + time.rs (36) = 1,358 total
**Linux 7.0** (key files): `kernel/irq/manage.c` (2,803) + `apic/vector.c` (1,387) + `apic/msi.c` (391) + `tsc.c` (1,612) + `tick-common.c` (595) = **6,788 lines (subset)**

| Linux File | Lines | Feature | Red Bear Status |
|------------|-------|---------|-----------------|
| `kernel/irq/manage.c` | 2,803 | IRQ management, affinity, threading, spurious | ❌ Basic only |
| `arch/x86/kernel/apic/vector.c` | 1,387 | Vector allocation matrix, CPU assignment | ❌ Missing |
| `arch/x86/kernel/apic/msi.c` | 391 | MSI address/data composition, mask bits | ❌ Missing |
| `arch/x86/kernel/tsc.c` | 1,612 | TSC calibration, sync, clocksource rating | ❌ Missing |
| `kernel/time/tick-common.c` | 595 | Tick management, NO_HZ, broadcast | ❌ Missing |

**Priority**: Missing MSI/MSI-X support blocks modern GPU, NVMe, and network devices. TSC calibration is needed for accurate timekeeping.

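For context on the "MSI address/data composition" row, the following Rust sketch builds the address and data words a device would be programmed with on x86 in xAPIC physical destination mode, using fixed delivery and edge triggering. The function and type names are hypothetical; interrupt remapping, logical destination mode, and the upper 32 address bits (zero here) are out of scope.

```rust
/// Base of the x86 MSI address window.
const MSI_ADDRESS_BASE: u32 = 0xFEE0_0000;

/// Address/data pair to write into a device's MSI capability (or an MSI-X table entry).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct MsiMessage {
    pub address: u32,
    pub data: u32,
}

/// Compose an MSI message targeting one CPU by its APIC ID.
/// Returns None for vectors outside 0x10..=0xFE, which are not usable for MSI.
pub fn compose_msi(dest_apic_id: u8, vector: u8) -> Option<MsiMessage> {
    if !(0x10..=0xFE).contains(&vector) {
        return None;
    }
    // Address bits 19:12 carry the destination APIC ID; RH and DM bits stay 0 (physical mode).
    let address = MSI_ADDRESS_BASE | ((dest_apic_id as u32) << 12);
    // Data bits 7:0 carry the vector; delivery mode 000 (fixed) and edge trigger leave the rest 0.
    let data = vector as u32;
    Some(MsiMessage { address, data })
}

fn main() {
    match compose_msi(2, 0x50) {
        Some(msg) => println!("address={:#010x} data={:#06x}", msg.address, msg.data),
        None => println!("invalid vector"),
    }
}
```

A vector allocator comparable to Linux's `apic/vector.c` would sit in front of this function and decide which CPU and vector each device interrupt receives.
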
### B.3 cpufreqd — Confirmed 26-line Stub

The in-tree cpufreqd is **26 lines**: it logs messages and then sleeps forever, with no MSR access, no governor, and no P-state control. A 176-line implementation was written and saved as `local/patches/base/P6-cpufreqd-real-impl.patch` (177 lines), but the source was reverted. The patch needs to be re-applied.

### B.4 Stale Documentation Cleanup

27 docs have been archived in total. BOOT-PROCESS-FIX-SUMMARY and GRAPHICAL-BOOT-ASSESSMENT were moved to the archive (superseded by this plan).