feat: P0-P6 kernel scheduler + relibc threading comprehensive implementation
P0-P2: Barrier SMP, sigmask/pthread_kill races, robust mutexes, RT scheduling, POSIX sched API
P3: PerCpuSched struct, per-CPU wiring, work stealing, load balancing, initial placement
P4: 64-shard futex table, REQUEUE, PI futexes (LOCK_PI/UNLOCK_PI/TRYLOCK_PI), robust futexes, vruntime tracking, min-vruntime SCHED_OTHER selection
P5: setpriority/getpriority, pthread_setaffinity_np, pthread_setname_np, pthread_setschedparam (Redox)
P6: Cache-affine scheduling (last_cpu + vruntime bonus), NUMA topology kernel hints + numad userspace daemon
Stability fixes: make_consistent stores 0 (dead TID fix), cond.rs error propagation, SPIN_COUNT adaptive spinning, Sys::open &str fix, PI futex CAS race, proc.rs lock ordering, barrier destroy
Patches: 33 kernel + 58 relibc patches, all tracked in recipes
Docs: KERNEL-SCHEDULER-MULTITHREAD-IMPROVEMENT-PLAN.md updated, SCHEDULER-REVIEW-FINAL.md created
Architecture: NUMA topology parsing stays userspace (numad daemon), kernel stores lightweight NumaTopology hints
@@ -24,10 +24,16 @@ shell = "/usr/bin/ion"
shell = "/usr/bin/zsh"

[packages]
# Runtime driver parameter control surface.
driver-params = {}

# Firmware loading
redbear-firmware = {}
firmware-loader = {}

# NUMA topology discovery (userspace daemon)
numad = {}

# GPU/graphics stack
redox-drm = {}
mesa = {}
@@ -400,3 +406,4 @@ subclass = 0x00
command = ["redox-drm"]
"""
konsole = {}
kf6-pty = {}
@@ -0,0 +1,735 @@
|
||||
# Red Bear OS Low-Level Device Initialization — Comprehensive Improvement Plan
|
||||
|
||||
**Date:** 2026-04-30
|
||||
**Scope:** Complete reassessment of boot-time device initialization: daemon inventory, firmware loading, driver model, bus enumeration, controller support, hardware validation
|
||||
**Reference:** Linux 7.0 kernel device init model (full source available for comparison)
|
||||
**Status:** Assessment phase — this document is the execution plan
|
||||
|
||||
## 1. Executive Summary
|
||||
|
||||
Red Bear OS has crossed the fundamental bring-up threshold: the system boots to a login prompt on
|
||||
both QEMU and bounded bare-metal hardware (AMD Ryzen), device daemons start in a defined order,
|
||||
and major subsystems (ACPI, PCI, USB/xHCI, NVMe, network) have in-tree implementations.
|
||||
|
||||
However, the device initialization stack is **not release-grade**. Key deficiencies vs Linux 7.0:
|
||||
|
||||
| Gap | Severity | Impact |
|
||||
|-----|----------|--------|
|
||||
| No proper device driver model (bus/device/driver binding) | CRITICAL | No deferred probing, no async init, no hotplug |
|
||||
| No uevent/hotplug infrastructure (udev-shim is static enumerator only) | CRITICAL | No device add/remove notification; `udev-shim` is misnamed — it does a single PCI scan, not real udev |
|
||||
| No EHCI/OHCI/UHCI USB controllers | HIGH | USB keyboard not reliable on bare metal |
|
||||
| initfs vs rootfs driver duality — drivers started in initfs may conflict with rootfs drivers | HIGH | No explicit handoff contract for devices initialized in initfs |
|
||||
| No hardware validation for MSI-X, IOMMU, xHCI interrupts | HIGH | QEMU-proven only; real hardware behavior unknown |
|
||||
| No suspend/resume or runtime power management | HIGH | No S3/S4 sleep, no device power gating |
|
||||
| No CPU frequency scaling or thermal management | MEDIUM | Battery life, thermal throttling absent |
|
||||
| No hardware RNG daemon, no SMBIOS/DMI runtime | MEDIUM | Missing entropy source, missing quirk data |
|
||||
| No PCIe AER, no advanced error reporting | MEDIUM | Silent device failures |
|
||||
| Firmware loading GPU-only (no Wi-Fi, audio, media) | MEDIUM | Blocks iwlwifi, Bluetooth, media acceleration |
|
||||
| No device naming policy or persistent device names | MEDIUM | `/dev/` names unstable across boots |
|
||||
| No kernel cmdline for device parameterization | LOW | No runtime device config without rebuild |
|
||||
| ACPI startup still carries panic-grade `expect` paths | HIGH | Boot fragility on diverse hardware |
|
||||
| `acpid` `_S5` shutdown not release-grade | HIGH | Unclean shutdown on some platforms |
|
||||
| Wi-Fi transport asserts on MSI-X (no legacy IRQ fallback) | HIGH | Wi-Fi won't work on older platforms |
|
||||
| No EHCI companion controller routing for USB keyboards | HIGH | USB keyboard may be unreachable on some bare metal |
|
||||
| No io_uring or epoll for async I/O in device daemons | LOW | Throughput ceiling for NVMe |
|
||||
|
||||
### Bottom Line
|
||||
|
||||
**Red Bear OS boots, but device initialization is naive by Linux 7.0 standards.** The microkernel
|
||||
scheme-based driver model is architecturally sound, but the implementation lacks the maturity,
|
||||
error resilience, hardware coverage, and power management depth that Linux 7.0 has accumulated
|
||||
over 30 years of driver development.
|
||||
|
||||
This plan defines a structured path to close these gaps over 5 phases (26-40 weeks).
|
||||
|
||||
## 2. Current State Assessment
|
||||
|
||||
### 2.1 Boot Flow
|
||||
|
||||
```
|
||||
UEFI firmware → Bootloader → Kernel (kstart→kmain) →
|
||||
userspace_init → bootstrap (procmgr) → initfs init →
|
||||
├── Phase 1 (initfs): logd, nulld, randd, zerod, rtcd, ramfs
|
||||
├── Phase 1 (initfs): inputd, lived
|
||||
├── Phase 1 (initfs): vesad, fbbootlogd, fbcond (graphics target)
|
||||
├── Phase 1 (initfs): hwd, pcid-spawner-initfs, ps2d (drivers target)
|
||||
├── Phase 1 (initfs): rootfs mount → switchroot
|
||||
├── Phase 2 (rootfs): ipcd, ptyd, pcid-spawner (base target)
|
||||
│ ├── pcid-spawner spawns drivers matching PCI IDs:
|
||||
│ │ ├── Storage: ahcid, ided, nvmed, virtio-blkd, usbscsid
|
||||
│ │ ├── Network: e1000d, rtl8168d, rtl8139d, ixgbed, virtio-netd
|
||||
│ │ ├── Graphics: vesad, ihdgd, virtio-gpud
|
||||
│ │ ├── Input: ps2d, usbhidd
|
||||
│ │ ├── Audio: ihdad, ac97d, sb16d
|
||||
│ │ └── USB: xhcid, usbhubd
|
||||
│ ├── smolnetd → dhcpd (network target)
|
||||
│ ├── firmware-loader, udev-shim, evdevd, wifictl
|
||||
│ ├── dbus-daemon → redbear-sessiond, seatd
|
||||
│ └── console/getty → login prompt
|
||||
```
|
||||
|
||||
### 2.2 Daemon Inventory — Existence and Quality
|
||||
|
||||
#### Core Initfs Daemons (20 services)
|
||||
|
||||
| Daemon | Quality | Notes |
|
||||
|--------|---------|-------|
|
||||
| `logd` | ✅ Hardened | Zero unwrap/expect; file descriptors, setrens, process loop |
|
||||
| `nulld` | ✅ Hardened | Zero unwrap/expect |
|
||||
| `randd` | ✅ Hardened | CPUID chain hardened; 8 test-only unwraps |
|
||||
| `zerod` | ✅ Hardened | Args default + graceful exit |
|
||||
| `rtcd` | ✅ Present | x86 RTC driver; minimal attack surface |
|
||||
| `ramfs@` | ✅ Present | Template service for RAM filesystems |
|
||||
| `inputd` | ✅ Hardened | 14 panic sites converted; partial vt events, buffer sizes |
|
||||
| `lived` | ✅ Present | Live disk daemon |
|
||||
| `vesad` | ✅ Hardened | 20 fixes; FRAMEBUFFER env, EventQueue, event loop, scheme |
|
||||
| `fbbootlogd` | ✅ Hardened | 14 fixes; VT handle, graphics handle, dirty_fb |
|
||||
| `fbcond` | ✅ Hardened | 14 fixes; VT parse, event loop, writes, scheme, display |
|
||||
| `hwd` | ✅ Present | ACPI/DeviceTree boot handler |
|
||||
| `pcid-spawner-initfs` | ✅ Hardened | initfs variant; oneshot_async |
|
||||
| `ps2d` | ✅ Hardened | Controller init drains stale output; QEMU proof |
|
||||
| `bcm2835-sdhcid` | ✅ Present | ARM-only (Raspberry Pi) |
|
||||
|
||||
#### Core Rootfs Daemons (9 base services)
|
||||
|
||||
| Daemon | Quality | Notes |
|
||||
|--------|---------|-------|
|
||||
| `ipcd` | ✅ Present | IPC daemon |
|
||||
| `ptyd` | ✅ Present | Pseudo-terminal daemon |
|
||||
| `pcid-spawner` | ✅ Hardened | Changed to oneshot_async (was blocking init); logs device info |
|
||||
| `sudo` | ✅ Present | Privilege daemon |
|
||||
| `smolnetd`/`netstack` | ✅ Present | TCP/IP stack |
|
||||
| `dhcpd` | ✅ Present | DHCP client |
|
||||
| `audiod` | ✅ Present | Audio multiplexer |
|
||||
|
||||
#### PCI-Matched Device Drivers (pcid-spawner, 25+ drivers)
|
||||
|
||||
| Category | Drivers | Quality |
|
||||
|----------|---------|---------|
|
||||
| Storage | ahcid, ided, nvmed, virtio-blkd, usbscsid | ✅ All hardened (Wave 4 complete) |
|
||||
| Network | e1000d, rtl8168d, rtl8139d, ixgbed, virtio-netd | ✅ All hardened |
|
||||
| Graphics | vesad, ihdgd, virtio-gpud | ✅ All hardened |
|
||||
| Input | ps2d, usbhidd | ✅ All hardened |
|
||||
| Audio | ihdad, ac97d, sb16d | ✅ All hardened |
|
||||
| USB | xhcid, usbhubd, usbctl, ucsid | ✅ xhcid has 88 Red Bear patches |
|
||||
| GPIO/I2C | gpiod, i2cd, intel-gpiod, amd-mp2-i2cd, dw-acpi-i2cd, i2c-gpio-expanderd, i2c-hidd, intel-thc-hidd, intel-lpss-i2cd | ✅ Present |
|
||||
| System | pcid, pcid-spawner, acpid | ✅ Core infra; pcid hardened Wave 1-2 |
|
||||
| VirtualBox | vboxd | ✅ x86 only |
|
||||
|
||||
#### Custom Red Bear Daemons
|
||||
|
||||
| Daemon | Quality | Notes |
|
||||
|--------|---------|-------|
|
||||
| `firmware-loader` | ✅ Well-tested | 18 unit tests; scheme:firmware with read/mmap; no signing |
|
||||
| `redox-drm` | 🟡 Bounded compile | AMD+Intel+VirtIO display; 68 tests; no HW validation |
|
||||
| `amdgpu` | 🟡 Bounded compile | Imported Linux DC/TTM/core; partial display glue |
|
||||
| `iommu` | 🟡 QEMU-proven | AMD-Vi detection + first-use proof; no HW validation |
|
||||
| `udev-shim` | ✅ Present | Scheme:udev with device enumeration |
|
||||
| `evdevd` | ✅ Present | Linux-compatible evdev interface |
|
||||
| `redbear-sessiond` | ✅ Present | D-Bus login1 session broker |
|
||||
| `redbear-wifictl` | 🟡 Host-tested | Wi-Fi control daemon; no real hardware |
|
||||
| `redbear-iwlwifi` | 🟡 Host-tested | Intel transport; ~2450 lines C + ~1550 lines Rust; 119 tests |
|
||||
| `redbear-btusb` | 🔴 Experimental | BLE-first; USB-attached only; QEMU validation in progress |
|
||||
| `redbear-authd` | ✅ Present | Local-user authentication |
|
||||
| `redbear-greeter` | 🟡 Partial | Greeter orchestrator; Qt Wayland integration broken |
|
||||
| `redbear-netctl` | ✅ Present | Network profile management |
|
||||
| `redbear-hwutils` | ✅ Present | lspci, lsusb, phase checkers |
|
||||
|
||||
### 2.3 Firmware Loading
|
||||
|
||||
**What exists:**
|
||||
- `scheme:firmware` daemon (`firmware-loader`) indexes blobs from `/lib/firmware/`
|
||||
- `linux-kpi` provides `request_firmware()` via Rust FFI
|
||||
- AMD GPU blobs (675 .bin files) in `local/firmware/amdgpu/` (gitignored, fetched from linux-firmware)
|
||||
- Intel DMC display blobs fetchable via `fetch-firmware.sh --vendor intel --subset dmc`
|
||||
- Two fetch mechanisms: standalone script (selective) + build-time meta-package (full linux-firmware)
|
||||
- `PCI_QUIRK_NEED_FIRMWARE` flag defined (bit 11), but never checked by any driver
|
||||
|
||||
**What is MISSING vs Linux 7.0 `firmware_class`:**
|
||||
- No firmware signing/verification (no `module_sig_check` equivalent)
|
||||
- No `request_firmware_nowait` with uevent dispatch to userspace helper (Linux uses `/sys/$DEVPATH/loading` + `/sys/$DEVPATH/data` + uevent to notify udev)
|
||||
- No persistent firmware cache between boots (in-memory only; Linux caches during suspend for resume-fastpath)
|
||||
- No fallback firmware variant search (if dmcub_dcn31.bin missing, try dmcub_dcn30.bin; Linux has per-driver firmware search paths)
|
||||
- No `/sys/firmware/` interface (Linux exposes firmware loading status via sysfs)
|
||||
- No firmware preloading at driver bind time
|
||||
- No timeout for synchronous `request_firmware` (blocks forever; Linux times out after ~60s with uevent fallback)
|
||||
- No platform firmware fallback (Linux can search UEFI firmware volumes via `firmware_request_platform()`)
|
||||
- No Wi-Fi firmware blobs (iwlwifi, ath10k, etc.)
|
||||
- No Bluetooth firmware blobs
|
||||
- No audio/media codec firmware
|
||||
- Firmware lookup limited to 3 hardcoded paths (Linux searches: `/lib/firmware/`, `/lib/firmware/updates/`, `/lib/firmware/$KVER/`, `/usr/lib/firmware/`, `/usr/share/firmware/`, plus custom path via kernel param)
|
||||
|
||||
### 2.4 Hardware Validation Status
|
||||
|
||||
| Subsystem | QEMU | Bare Metal | Notes |
|
||||
|-----------|------|------------|-------|
|
||||
| ACPI boot | ✅ | ✅ (AMD) | Boot-baseline; `_S5` shutdown not release-grade |
|
||||
| x2APIC/SMP | ✅ | ✅ | Multi-core works |
|
||||
| PCI enumeration | ✅ | ✅ | pcid enumerates devices |
|
||||
| MSI-X | ✅ (virtio-net) | ❌ | No hardware proof |
|
||||
| IOMMU/AMD-Vi | ✅ (first-use) | ❌ | Detection works; no HW validation |
|
||||
| xHCI interrupt | ✅ | ❌ | Interrupt mode proven; no HW |
|
||||
| USB storage | ✅ (readback) | ❌ | QEMU mass-storage proof |
|
||||
| NVMe | ✅ | ❌ | Builds; no HW |
|
||||
| AHCI | ✅ | ❌ | Builds; no HW |
|
||||
| Network (e1000/virtio) | ✅ | ❌ | QEMU only |
|
||||
| PS/2 keyboard | ✅ | ✅ | QEMU + AMD bare metal |
|
||||
| USB keyboard | ✅ (QEMU HID) | ⚠️ | Not reliable on bare metal |
|
||||
| Wi-Fi | ❌ | ❌ | Host-tested transport only |
|
||||
| Bluetooth | ❌ | ❌ | Experimental BLE; QEMU in progress |
|
||||
|
||||
### 2.5 Comparison with Linux 7.0 Device Init Model
|
||||
|
||||
#### 2.5.1 Linux Initcall Ordering (Reference)
|
||||
|
||||
Linux uses a 10-level initcall system for boot-phase ordering:
|
||||
|
||||
| Level | Macro | Typical Count | Example Uses |
|
||||
|-------|-------|---------------|--------------|
|
||||
| 0 | `pure_initcall` | ~few | Pure infrastructure |
|
||||
| early | `early_initcall` | ~446 | mm init, early console, DT scan |
|
||||
| 1 | `core_initcall` | ~614 | Workqueues, RCU, memory allocators |
|
||||
| 2 | `postcore_initcall` | ~150 | Clocksource, scheduler, IRQ core |
|
||||
| 3 | `arch_initcall` | ~751 | PCI bus init, ACPI table parsing, CPU bringup |
|
||||
| 4 | `subsys_initcall` | ~573 | PCI enumerate, USB core, networking core, block |
|
||||
| 5 | `fs_initcall` | ~1372 | Filesystem registration |
|
||||
| 6 | `device_initcall` | ~1211 | Most drivers; `module_init()` maps here |
|
||||
| 7 | `late_initcall` | ~440 | Late init, debug, tracing |
|
||||
|
||||
Red Bear OS has **no equivalent ordering mechanism** — the TOML-based init uses `requires_weak`
|
||||
for loose ordering but has no topological sort depth, no `Before`/`After` fields, no explicit
|
||||
init phases beyond the coarse initfs/rootfs split.
|
||||
|
||||
#### 2.5.2 Feature Comparison Table
|
||||
|
||||
| Feature | Linux 7.0 | Red Bear OS | Gap |
|
||||
|---------|-----------|-------------|-----|
|
||||
| **Driver model** | `bus_type` → `device_driver` → `probe()` binding with match tables | `pcid-spawner` spawns drivers by PCI class/vendor/device | 🟡 Partial — single-shot spawn, no rebinding |
|
||||
| **Deferred probing** | `driver_deferred_probe` — retries when dependency arrives; `-EPROBE_DEFER` triggers retry on any successful probe | None | 🔴 Missing — must be present at boot |
|
||||
| **Async probing** | `async_probe` — parallel driver init via kthreadd workers | Sequential spawn only | 🟡 Partial — oneshot_async for launch but not true async init |
|
||||
| **Hotplug** | uevent netlink → udev → driver bind/unbind; `/sbin/hotplug` path | `udev-shim` is a **static PCI enumerator** — one scan at boot, no event callbacks, no device removal handling | 🔴 Missing — no hotplug infrastructure at all |
|
||||
| **Firmware loading** | `firmware_class` with `request_firmware`, user helper, caching | `scheme:firmware` + `linux-kpi` request_firmware | 🟡 Partial — no uevent/helper/caching |
|
||||
| **USB controllers** | xHCI, EHCI, OHCI, UHCI — all supported | xHCI only | 🔴 Missing — EHCI/OHCI/UHCI absent |
|
||||
| **USB device classes** | HID, storage, audio, video, CDC, vendor, etc. | HID, hub, storage (BOT), Type-C (UCSI) | 🟡 Partial — many classes missing |
|
||||
| **Power management** | Suspend/resume, runtime PM, CPU freq scaling, thermal | `_S5` shutdown only | 🔴 Missing — no S3/S4/PM |
|
||||
| **Interrupt handling** | Full APIC/x2APIC, MSI/MSI-X, affinity, NMI, MCE | APIC/x2APIC; MSI-X via quirks | 🟡 Partial — no affinity, no NMI watchdog |
|
||||
| **IOMMU** | AMD-Vi, Intel VT-d with DMA remapping + IR | AMD-Vi detection + first-use proof | 🟡 Partial — no VT-d, no hardware |
|
||||
| **ACPI namespace** | Full namespace: devices, thermal, battery, processor, etc. | Boot-baseline: MADT, FADT, `_S5`, bounded power | 🟡 Partial — many ACPI objects missing |
|
||||
| **PCIe features** | AER, ACS, ATS, PRI, PASID, SR-IOV | Basic PCI config space only | 🔴 Missing — no advanced PCIe |
|
||||
| **Device naming** | Predictable network/storage names (systemd udev) | None | 🟡 Partial — no naming policy |
|
||||
| **Hardware RNG** | `hw_random` framework, multiple drivers | None | 🔴 Missing |
|
||||
| **CPU frequency** | `cpufreq` governors | None | 🔴 Missing |
|
||||
| **Thermal management** | `thermal` framework + drivers | None | 🔴 Missing |
|
||||
| **SMBIOS/DMI** | Full DMI table exposure via sysfs | Quirks system has DMI data | 🟡 Partial — not runtime-exposed |
|
||||
| **Kernel cmdline** | Device parameters via boot cmdline | None | 🔴 Missing |
|
||||
|
||||
## 3. Implementation Phases
|
||||
|
||||
### Phase 1 — Driver Model Maturation (Weeks 1-8)
|
||||
|
||||
**Goal:** Establish a proper device driver model with binding semantics, deferred probing,
|
||||
and error resilience — bringing the driver infrastructure to Linux 7.0 par without rewriting
|
||||
existing drivers.
|
||||
|
||||
#### 1.1 Device-Driver Binding Model (Week 1-3)
|
||||
|
||||
Create a `redox-driver-core` library providing Linux-style bus/device/driver abstractions:
|
||||
|
||||
```
|
||||
Device → Driver matching:
|
||||
pcid: class=0x01, subclass=0x08 → nvmed
|
||||
pcid: vendor=0x8086, device=0x10D3 → e1000d
|
||||
|
||||
Driver probe() returns:
|
||||
Ok(()) → device bound, driver active
|
||||
Err(ENODEV) → device not supported by this driver
|
||||
Err(EAGAIN) → dependency not available, DEFER probe
|
||||
Err(...) → fatal error, device unusable
|
||||
```
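
A minimal Rust sketch of what the `redox-driver-core` traits could look like, assuming the probe semantics above; the type and method names are illustrative, not an existing API:

```rust
use std::io;

// Hypothetical device descriptor handed to drivers by the driver-manager.
pub struct PciDevice {
    pub vendor: u16,
    pub device: u16,
    pub class: u8,
    pub subclass: u8,
}

pub enum ProbeError {
    /// Device is not handled by this driver (ENODEV): try the next match.
    NotSupported,
    /// A dependency (scheme, firmware, companion device) is not ready yet
    /// (EAGAIN): the manager re-queues the device and retries later.
    Defer,
    /// Unrecoverable failure: the device stays unbound.
    Fatal(io::Error),
}

pub trait Driver {
    /// Returns true if this driver's match table covers the device.
    fn matches(&self, dev: &PciDevice) -> bool;
    /// Attempt to bind the driver to the device.
    fn probe(&mut self, dev: &PciDevice) -> Result<(), ProbeError>;
    /// Release the device on hot-remove or explicit unbind.
    fn remove(&mut self, dev: &PciDevice);
}
```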
|
||||
|
||||
**Deliverables:**
|
||||
- `redox-driver-core` crate with `Bus`, `Device`, `Driver` traits
|
||||
- `pcid` exposes devices via new scheme: `scheme:pci/devices/{id}/bind`
|
||||
- `pcid-spawner` replaced by `driver-manager` daemon that:
|
||||
- Reads driver match tables from `/lib/drivers.d/*.toml`
|
||||
- Probes drivers in priority order
|
||||
- Supports deferred probing (EAGAIN → retry when dependency appears)
|
||||
- Supports driver unbind/rebind
|
||||
- All existing `pcid.d/*.toml` match files migrated to new format
|
||||
- Backward compatible: existing pcid-spawner behavior preserved as fallback
|
||||
|
||||
#### 1.2 Async Device Probing (Week 4-5)
|
||||
|
||||
**Deliverables:**
|
||||
- `driver-manager` probes independent device trees in parallel (using Rust async or threads)
|
||||
- Device init order defined by dependency DAG, not sequential spawn
|
||||
- Timing observability: log probe duration per driver
|
||||
- `CONFIG_PARALLEL_PROBE` equivalent: max concurrent probes tunable via config TOML
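
As a sketch of the deliverables above (bounded parallelism plus deferred retry), assuming a thread-per-probe model and hypothetical `DeviceId`/`probe_device` placeholders:

```rust
use std::collections::VecDeque;
use std::sync::{Arc, Mutex};
use std::thread;
use std::time::{Duration, Instant};

const MAX_RETRY_ROUNDS: usize = 10;

#[derive(Clone)]
struct DeviceId(String);

enum ProbeOutcome { Bound, Defer, Failed }

// Placeholder: a real driver-manager would dispatch to the matched driver here.
fn probe_device(dev: &DeviceId) -> ProbeOutcome {
    let _ = dev;
    ProbeOutcome::Bound
}

fn probe_all(devices: Vec<DeviceId>, max_concurrent: usize) {
    let deferred = Arc::new(Mutex::new(VecDeque::new()));

    // First pass: probe independent devices in bounded-size parallel batches.
    for batch in devices.chunks(max_concurrent) {
        let handles: Vec<_> = batch
            .iter()
            .cloned()
            .map(|dev| {
                let deferred = Arc::clone(&deferred);
                thread::spawn(move || {
                    let started = Instant::now();
                    match probe_device(&dev) {
                        // Timing observability: log probe duration per driver.
                        ProbeOutcome::Bound => println!("{} bound in {:?}", dev.0, started.elapsed()),
                        ProbeOutcome::Defer => deferred.lock().unwrap().push_back(dev),
                        ProbeOutcome::Failed => eprintln!("{} probe failed", dev.0),
                    }
                })
            })
            .collect();
        for handle in handles {
            let _ = handle.join();
        }
    }

    // Retry rounds: deferred devices are re-probed as their dependencies appear.
    for _ in 0..MAX_RETRY_ROUNDS {
        let pending: Vec<DeviceId> = deferred.lock().unwrap().drain(..).collect();
        if pending.is_empty() {
            break;
        }
        for dev in pending {
            if matches!(probe_device(&dev), ProbeOutcome::Defer) {
                deferred.lock().unwrap().push_back(dev);
            }
        }
        thread::sleep(Duration::from_millis(100)); // back off between rounds
    }
}
```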
|
||||
|
||||
#### 1.3 Driver Parameter System (Week 6-7)
|
||||
|
||||
**Deliverables:**
|
||||
- Kernel cmdline parsing in bootloader (e.g., `redbear.nvme.irq_mode=msi`)
|
||||
- `/scheme/sys/driver/{name}/parameters` read/write
|
||||
- Driver authors declare parameters via derive macro
|
||||
- `lspci -v` shows per-device parameters
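
A sketch of how the `driver-manager` might extract per-driver parameters from the kernel cmdline, assuming the `redbear.<driver>.<param>=<value>` key format shown above:

```rust
use std::collections::HashMap;

// Collect all "redbear.<driver>.<param>=<value>" tokens for one driver.
fn driver_params(cmdline: &str, driver: &str) -> HashMap<String, String> {
    let prefix = format!("redbear.{}.", driver);
    cmdline
        .split_whitespace()
        .filter_map(|token| token.split_once('='))
        .filter_map(|(key, value)| {
            key.strip_prefix(&prefix)
                .map(|param| (param.to_string(), value.to_string()))
        })
        .collect()
}

fn main() {
    let cmdline = "quiet redbear.nvme.irq_mode=msi redbear.nvme.queues=8";
    let params = driver_params(cmdline, "nvme");
    assert_eq!(params.get("irq_mode").map(String::as_str), Some("msi"));
    assert_eq!(params.get("queues").map(String::as_str), Some("8"));
}
```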
|
||||
|
||||
#### 1.4 Hotplug Infrastructure (Week 7-8)
|
||||
|
||||
**Deliverables:**
|
||||
- PCIe hotplug: `pcid` detects surprise removal/addition, emits uevent
|
||||
- USB hotplug: `xhcid` emits uevent on device attach/detach
|
||||
- `udev-shim` enhanced to receive uevents and trigger driver binding
|
||||
- `driver-manager` handles hot-add (probe driver) and hot-remove (unbind driver)
|
||||
- Initial scope: PCIe hotplug and USB hotplug only; Thunderbolt deferred
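
A sketch of the uevent payload and the driver-manager reaction it implies; the field names and delivery mechanism are assumptions, not an existing interface:

```rust
#[derive(Debug)]
enum UeventAction { Add, Remove }

#[derive(Debug)]
struct Uevent {
    action: UeventAction,
    subsystem: String, // e.g. "pci" or "usb"
    devpath: String,   // e.g. "00:02.0" or a root hub port number
}

fn handle_uevent(ev: Uevent) {
    match ev.action {
        // Hot-add: match the device against driver tables and probe it.
        UeventAction::Add => println!("probe driver for {}/{}", ev.subsystem, ev.devpath),
        // Hot-remove: unbind the owning driver and purge its device state.
        UeventAction::Remove => println!("unbind driver for {}/{}", ev.subsystem, ev.devpath),
    }
}
```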
|
||||
|
||||
**Phase 1 Exit Criteria:**
|
||||
- New driver binding model functional for 3+ existing drivers (nvmed, e1000d, xhcid)
|
||||
- Deferred probing works: driver returning EAGAIN retries when dependency scheme appears
|
||||
- Async probing measurable: 2+ independent PCI devices probe concurrently
|
||||
- Hotplug works: USB device attach/detach triggers udev-shim + driver bind/unbind in QEMU
|
||||
- All 25+ existing drivers still compile and function (backward compatibility)
|
||||
|
||||
### Phase 2 — Controller Coverage & Hardware Validation (Weeks 5-14)
|
||||
|
||||
**Goal:** Fill the critical controller gaps (USB EHCI/OHCI/UHCI) and validate the
|
||||
existing controller stack on real hardware — especially MSI-X, IOMMU, and xHCI.
|
||||
|
||||
#### 2.1 USB Controller Family Completion (Week 5-9)
|
||||
|
||||
This is the **highest-impact controller gap** because it directly blocks reliable
|
||||
USB keyboard input on bare metal where the keyboard may be routed through companion
|
||||
controllers rather than xHCI.
|
||||
|
||||
**Deliverables:**
|
||||
- `ehcid` daemon — EHCI (USB 2.0) host controller driver
|
||||
- `ohcid` daemon — OHCI (USB 1.1) host controller driver for non-Intel chipsets
|
||||
- `uhcid` daemon — UHCI (USB 1.1) host controller driver for Intel chipsets
|
||||
- USB companion controller routing: when xHCI owns the ports, companion controllers
|
||||
hand off low/full-speed devices to xHCI transparently
|
||||
- `usb-manager` daemon orchestrates multi-controller topology:
|
||||
- Single `scheme:usb` root exposing all buses
|
||||
- Device path stability across controller types
|
||||
- Port routing table for companion controller ownership handoff
|
||||
- USB 3.1/3.2 SuperSpeedPlus support in xhcid (10 Gbps, 20 Gbps)
|
||||
- USB-C PD/alt-mode awareness in `ucsid`
|
||||
|
||||
**Implementation approach:**
|
||||
- EHCI: Reference Linux `drivers/usb/host/ehci-hcd.c` (~6000 lines) and FreeBSD `sys/dev/usb/controller/ehci.c`
|
||||
- OHCI: Reference Linux `drivers/usb/host/ohci-hcd.c` (~3000 lines)
|
||||
- UHCI: Reference Linux `drivers/usb/host/uhci-hcd.c` (~2500 lines)
|
||||
- All three controllers use the same `scheme:usb` interface — class daemons (usbhubd, usbhidd, usbscsid) work unchanged
|
||||
|
||||
#### 2.2 xHCI Device-Level Hardening (Week 8-10)
|
||||
|
||||
Per the existing `XHCID-DEVICE-IMPROVEMENT-PLAN.md`:
|
||||
|
||||
**Deliverables:**
|
||||
- Atomic device attach publication (prevent half-attached devices)
|
||||
- Bounded device detach and purge
|
||||
- Configure rollback on failure
|
||||
- Real PM sequencing (U0/U1/U2/U3 transitions)
|
||||
- Enumerator cleanup and timing hardening
|
||||
- Growable event ring under sustained activity
|
||||
|
||||
#### 2.3 MSI-X Hardware Validation (Week 8-11)
|
||||
|
||||
Per the existing `IRQ-AND-LOWLEVEL-CONTROLLERS-ENHANCEMENT-PLAN.md` Priority 1:
|
||||
|
||||
**Deliverables:**
|
||||
- AMD GPU MSI-X validation: prove MSI-X vectors fire correctly on real AMD hardware
|
||||
- Intel GPU MSI-X validation: prove MSI-X on Intel hardware
|
||||
- NVMe MSI-X validation: prove per-queue interrupt vectors
|
||||
- xHCI MSI-X validation: prove interrupt-driven event ring on real hardware (not just QEMU)
|
||||
- Verified MSI-X → MSI → legacy IRQ fallback on all tested hardware
|
||||
- Logged CPU/vector affinity behavior
|
||||
- At minimum one AMD and one Intel bare-metal test report per device class
|
||||
|
||||
#### 2.4 IOMMU Hardware Bring-Up (Week 9-14)
|
||||
|
||||
Per the existing `IRQ-AND-LOWLEVEL-CONTROLLERS-ENHANCEMENT-PLAN.md` Priority 2:
|
||||
|
||||
**Deliverables:**
|
||||
- Validated AMD-Vi initialization on real AMD hardware
|
||||
- Device table / command buffer / event log validation
|
||||
- Interrupt remapping validation
|
||||
- Intel VT-d initial detection and register mapping (not full bring-up)
|
||||
- IOMMU fault-path validation: inject fault, verify event log capture
|
||||
- DMA remapping proof: verify device DMA is translated through IOMMU page tables
|
||||
- Negative-result documentation if hardware still fails
|
||||
|
||||
#### 2.5 ACPI Wave 1-2 Completion (Week 10-12)
|
||||
|
||||
Per the existing `ACPI-IMPROVEMENT-PLAN.md` Waves 1-2:
|
||||
|
||||
**Deliverables:**
|
||||
- Finish replacing panic-grade `expect` paths in `acpid` startup
|
||||
- Define and document AML bootstrap contract (explicit RSDP_ADDR producer)
|
||||
- Table-specific reject/warn/degrade/fail rules implemented
|
||||
- Deterministic `_S5` derivation (not dependent on PCI timing)
|
||||
- Explicit shutdown/reboot result semantics
|
||||
- Bounded shutdown proof on real AMD and Intel hardware
|
||||
- Sleep-state scope explicit: S5 only; S3/S4 explicitly deferred
|
||||
|
||||
**Phase 2 Exit Criteria:**
|
||||
- At least one EHCI or OHCI/UHCI driver functional in QEMU
|
||||
- USB keyboard reliably reachable on bare metal AMD and Intel (via xHCI, EHCI, or companion routing)
|
||||
- MSI-X validated on at least one real AMD GPU and one real Intel GPU
|
||||
- IOMMU AMD-Vi validated on at least one real AMD machine
|
||||
- ACPI `_S5` shutdown works on at least one real AMD and one real Intel machine
|
||||
- ACPI startup contains zero panic-grade paths reachable from firmware input
|
||||
|
||||
### Phase 3 — Power Management & Platform Services (Weeks 12-20)
|
||||
|
||||
**Goal:** Add suspend/resume, CPU frequency scaling, thermal management, and hardware
|
||||
RNG — bringing platform services to Linux 7.0 par for basic functionality.
|
||||
|
||||
#### 3.1 ACPI Power Management (Week 12-14)
|
||||
|
||||
Per the existing `ACPI-IMPROVEMENT-PLAN.md` Waves 3-4:
|
||||
|
||||
**Deliverables:**
|
||||
- Honest `/scheme/acpi/power` surface: exposes only behavior with runtime evidence
|
||||
- Consumer-visible distinction between unsupported, unavailable, and populated power state
|
||||
- Reduced surface: remove misleading empty-success defaults
|
||||
- AML physmem/EC failure propagation: no correctness-critical fabricated values
|
||||
- EC error typing and documented widened-access behavior
|
||||
- Documented AML mutex timeout behavior
|
||||
|
||||
#### 3.2 Suspend/Resume (S3 Sleep) — Initial Implementation (Week 13-16)
|
||||
|
||||
**Deliverables:**
|
||||
- Kernel: save/restore CPU context (CR0-CR4, MSRs, IDT/GDT, FPU/SSE/AVX state)
|
||||
- Kernel: ACPI S3 (suspend-to-RAM) entry via `_S3` AML method
|
||||
- Kernel: wake vector registration and resume path
|
||||
- `acpid`: expose `/scheme/acpi/sleep` with `S3` and `S5` states
|
||||
- Device contract: `suspend()` callback on each scheme daemon
|
||||
- Storage: flush caches, park heads (if spinning)
|
||||
- Network: bring link down, save MAC filter state
|
||||
- USB: save controller/port state
|
||||
- Graphics: save mode, blank display
|
||||
- `driver-manager`: suspend devices in dependency order, resume in reverse
|
||||
- Initial scope: S3 only on test hardware; S4 (hibernate) explicitly deferred
|
||||
|
||||
#### 3.3 CPU Frequency Scaling (Week 14-16)
|
||||
|
||||
**Deliverables:**
|
||||
- `cpufreqd` daemon reading ACPI `_PSS` / `_PPC` objects
|
||||
- Intel: P-state MSR writes (IA32_PERF_CTL)
|
||||
- AMD: P-state MSR writes + CPPC awareness
|
||||
- Governors: `performance` (max freq), `powersave` (min freq), `ondemand` (load-based)
|
||||
- `/scheme/cpufreq` for reading/setting governor and frequency
|
||||
- `redbear-info` shows current frequency and governor
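
A sketch of an `ondemand`-style target-frequency decision, assuming a P-state table parsed from `_PSS` (highest frequency first); the load metric and thresholds are illustrative:

```rust
fn ondemand_target_khz(pstates_khz: &[u32], load_percent: u32) -> u32 {
    // pstates_khz is sorted from highest to lowest frequency, as listed in _PSS.
    let max = pstates_khz[0];
    let min = *pstates_khz.last().unwrap();
    if load_percent >= 80 {
        max // high load: jump straight to the highest P-state
    } else {
        // Otherwise scale proportionally and snap to the nearest available P-state.
        let target = min + (max - min) * load_percent / 100;
        *pstates_khz
            .iter()
            .min_by_key(|&&freq| freq.abs_diff(target))
            .unwrap()
    }
}
```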
|
||||
|
||||
#### 3.4 Thermal Management (Week 15-17)
|
||||
|
||||
**Deliverables:**
|
||||
- `thermald` daemon reading ACPI thermal zone objects (`_TMP`, `_PSV`, `_TC1`, `_TC2`)
|
||||
- Active cooling: fan control via ACPI `_SCP`
|
||||
- Passive cooling: CPU throttling via cpufreqd integration
|
||||
- Critical shutdown: if temperature exceeds `_CRT`, initiate clean shutdown
|
||||
- `/scheme/thermal` for reading zone temperatures and trip points
|
||||
- `redbear-info` shows thermal zone status
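
A sketch of the trip-point evaluation this implies, assuming `_TMP`/`_PSV`/`_CRT` values in tenths of a Kelvin as ACPI reports them:

```rust
enum CoolingAction { None, Passive, CriticalShutdown }

fn evaluate_trip_points(temp: u32, psv: u32, crt: u32) -> CoolingAction {
    if temp >= crt {
        CoolingAction::CriticalShutdown // above _CRT: initiate clean shutdown
    } else if temp >= psv {
        CoolingAction::Passive          // above _PSV: ask cpufreqd to throttle
    } else {
        CoolingAction::None
    }
}
```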
|
||||
|
||||
#### 3.5 Hardware RNG (Week 16-17)
|
||||
|
||||
**Deliverables:**
|
||||
- `hwrngd` daemon reading hardware RNG sources:
|
||||
- x86 RDRAND/RDSEED instructions
|
||||
- TPM 2.0 random number generator (if present)
|
||||
- VirtIO entropy device
|
||||
- `scheme:hwrng` feeding into `randd` entropy pool
|
||||
- `/scheme/hwrng` exposes raw entropy and health status
|
||||
- Linux 7.0 `hw_random` framework ported conceptually (not literally)
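
A sketch of an RDRAND-backed source for a hypothetical `hwrngd`; a real daemon would also health-test the output and mix in the other sources listed above:

```rust
#[cfg(target_arch = "x86_64")]
fn rdrand_u64() -> Option<u64> {
    if !is_x86_feature_detected!("rdrand") {
        return None;
    }
    // SAFETY: the rdrand feature was just detected at runtime.
    unsafe {
        use core::arch::x86_64::_rdrand64_step;
        let mut value = 0u64;
        // The instruction can transiently fail; retry a bounded number of times.
        for _ in 0..10 {
            if _rdrand64_step(&mut value) == 1 {
                return Some(value);
            }
        }
        None
    }
}
```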
|
||||
|
||||
#### 3.6 PCIe Advanced Error Reporting (Week 17-18)
|
||||
|
||||
**Deliverables:**
|
||||
- `pcid` exposes AER capability registers via `/scheme/pci/{dev}/aer`
|
||||
- AER error detection: correctable and uncorrectable error status registers
|
||||
- Error logging: decode error source (data link, transaction, poison TLP, etc.)
|
||||
- `aer-inject` utility for testing error paths
|
||||
- Initial scope: error detection and logging only; error recovery (device reset path) deferred
|
||||
|
||||
#### 3.7 SMBIOS/DMI Runtime Exposure (Week 18-20)
|
||||
|
||||
**Deliverables:**
|
||||
- `dmidecode`-equivalent utility using `acpid` DMI scheme
|
||||
- `/scheme/dmi` exposes SMBIOS entry point and table data
|
||||
- `lspci -v` shows DMI-based quirk annotations
|
||||
- DMI data feeding into `redbear-info` for platform identification
|
||||
- Integration with existing quirks system: DMI match rules validated at runtime
|
||||
|
||||
**Phase 3 Exit Criteria:**
|
||||
- S3 suspend/resume works on at least one real machine (AMD or Intel)
|
||||
- CPU frequency scaling observable via `redbear-info`
|
||||
- Thermal zone temperature readable and critical shutdown testable
|
||||
- Hardware RNG feeding entropy pool
|
||||
- PCIe AER errors logged on capable hardware
|
||||
- DMI data accessible via scheme and tools
|
||||
- All new schemes documented with test procedures
|
||||
|
||||
### Phase 4 — Firmware Infrastructure & Wi-Fi Validation (Weeks 16-24)
|
||||
|
||||
**Goal:** Close firmware loading gaps, complete Wi-Fi hardware validation with real
|
||||
firmware, and establish firmware management as a first-class platform service.
|
||||
|
||||
#### 4.1 Firmware Loading Gap Closure (Week 16-18)
|
||||
|
||||
**Deliverables:**
|
||||
- `request_firmware_nowait` with proper uevent dispatch:
|
||||
- Async request → uevent → `udev-shim` listens → `firmware-loader` serves blob
|
||||
- Timeout: if firmware not available within configurable timeout, fail gracefully
|
||||
- Firmware fallback variant search:
|
||||
- If `dmcub_dcn31.bin` not found, try `dmcub_dcn30.bin`, `dmcub_dcn20.bin`
|
||||
- Per-driver fallback chain defined in `/etc/firmware-fallbacks.d/*.toml`
|
||||
- Persistent firmware cache (`/var/lib/firmware/`):
|
||||
- Loaded blobs cached on first use; survive daemon restart
|
||||
- Cache invalidation on firmware version change
|
||||
- `PCI_QUIRK_NEED_FIRMWARE` enforcement:
|
||||
- Drivers actually check the flag via `pci_has_quirk()`
|
||||
- When flag is set: require firmware at probe time, fail probe if absent
|
||||
- When flag is absent: firmware is optional, warn if missing but continue
|
||||
- Fetch Intel Wi-Fi firmware blobs: `fetch-firmware.sh --vendor intel --subset wifi`
|
||||
- Fetch Bluetooth firmware blobs where applicable
|
||||
- Firmware manifest: `/lib/firmware/MANIFEST.txt` lists all blobs, versions, sources
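
A sketch of the fallback variant search, assuming per-driver chains like the DMCUB example above; the search roots and blob names are illustrative:

```rust
use std::fs;
use std::path::PathBuf;

// Try the primary blob first, then each fallback, across the known firmware roots.
fn request_firmware_with_fallback(primary: &str, fallbacks: &[&str]) -> Option<(PathBuf, Vec<u8>)> {
    let roots = ["/lib/firmware", "/usr/lib/firmware"];
    for name in std::iter::once(primary).chain(fallbacks.iter().copied()) {
        for root in roots {
            let path = PathBuf::from(root).join(name);
            if let Ok(blob) = fs::read(&path) {
                return Some((path, blob));
            }
        }
    }
    None
}

// e.g. request_firmware_with_fallback("amdgpu/dmcub_dcn31.bin",
//          &["amdgpu/dmcub_dcn30.bin", "amdgpu/dmcub_dcn20.bin"]);
```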
|
||||
|
||||
#### 4.2 Wi-Fi Hardware Validation (Week 16-22)
|
||||
|
||||
Per the existing `WIFI-IMPLEMENTATION-PLAN.md`:
|
||||
|
||||
**Deliverables:**
|
||||
- Real Intel Wi-Fi device (e.g., AX200/AX201/AX210) validated end-to-end
|
||||
- `redbear-iwlwifi` transport:
|
||||
- Firmware loaded via `request_firmware()` → `scheme:firmware`
|
||||
- DMA ring operation validated (TX reclaim, RX restock, command dispatch)
|
||||
- Interrupt handling validated (MSI-X or MSI path)
|
||||
- Association/authentication cycle completed with real AP
|
||||
- `redbear-wifictl` control plane:
|
||||
- Scan → connect → DHCP → disconnect cycle validated
|
||||
- WPA2-PSK and open network profiles functional
|
||||
- Profile persistence and boot-time application
|
||||
- `redbear-netctl` Wi-Fi profiles:
|
||||
- SSID/Security/Key parsing validated
|
||||
- Bounded Wi-Fi lifecycle (prepare → init-transport → activate-nic → connect → disconnect)
|
||||
- Wi-Fi runtime diagnostics:
|
||||
- `redbear-phase5-wifi-check` reports link quality, signal strength, connected AP
|
||||
- `redbear-info --verbose` shows Wi-Fi adapter status
|
||||
- At minimum one real Intel Wi-Fi chipset validated
|
||||
- Legacy IRQ fallback for platforms where MSI-X is unavailable (via quirks)
|
||||
|
||||
#### 4.3 Wi-Fi Desktop API (Week 20-24)
|
||||
|
||||
**Deliverables:**
|
||||
- D-Bus Wi-Fi API on system bus: `org.freedesktop.NetworkManager` subset
|
||||
- `GetDevices`, `GetAccessPoints`, `ActivateConnection`, `DeactivateConnection`
|
||||
- Signal: `AccessPointAdded`, `AccessPointRemoved`, `StateChanged`
|
||||
- `redbear-wifictl` exposes D-Bus interface for desktop consumption
|
||||
- `redbear-netctl` GUI client for scanning and connecting (Qt6-based, optional)
|
||||
- Desktop status bar Wi-Fi indicator (future KDE plasma-nm integration)
|
||||
|
||||
**Phase 4 Exit Criteria:**
|
||||
- `request_firmware_nowait` with uevent dispatch functional in QEMU
|
||||
- PCI_QUIRK_NEED_FIRMWARE enforced in at least one driver (amdgpu or iwlwifi)
|
||||
- Intel Wi-Fi chipset validated end-to-end with real AP
|
||||
- Wi-Fi scan → connect → DHCP → internet access completed on real hardware
|
||||
- Wi-Fi D-Bus API functional for at least get_devices and get_accesspoints
|
||||
- Firmware manifest tracks all loaded blobs with versions
|
||||
|
||||
### Phase 5 — Bluetooth, Device Policy, Polish (Weeks 20-30)
|
||||
|
||||
**Goal:** Bring Bluetooth to validated experimental status, establish device naming policy,
|
||||
and polish remaining gaps.
|
||||
|
||||
#### 5.1 Bluetooth Hardware Validation (Week 20-24)
|
||||
|
||||
Per the existing `BLUETOOTH-IMPLEMENTATION-PLAN.md`:
|
||||
|
||||
**Deliverables:**
|
||||
- `redbear-btusb` transport validated with real USB Bluetooth adapter
|
||||
- `redbear-btctl` HCI host validated:
|
||||
- Controller init sequence (reset, read local features, set event mask)
|
||||
- Device discovery (LE scan → advertising report → connect)
|
||||
- GATT service discovery
|
||||
- Basic data exchange (battery service, device info)
|
||||
- BLE peripheral connect/disconnect cycle validated
|
||||
- Bluetooth classic (BR/EDR) detection and basic inquiry (connect deferred)
|
||||
- `redbear-bluetooth-battery-check` works on real hardware
|
||||
- At minimum one real USB Bluetooth adapter validated
|
||||
|
||||
#### 5.2 Device Naming Policy (Week 22-24)
|
||||
|
||||
**Deliverables:**
|
||||
- Predictable network interface names:
|
||||
- `enp0s1` instead of `eth0` (PCIe bus/device/function based)
|
||||
- `/etc/systemd/network/` equivalent rules in `/etc/udev/rules.d/`
|
||||
- Predictable storage device names:
|
||||
- NVMe: `nvme0n1` instead of raw scheme path
|
||||
- AHCI: `sd{a,b,c}` assigned by port order
|
||||
- USB storage: `sdX` with stable enumeration
|
||||
- `/dev/disk/by-id/`, `/dev/disk/by-path/`, `/dev/disk/by-uuid/` symlinks
|
||||
- `udev-shim` enhanced with rule matching (vendor, model, serial, path patterns)
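
A sketch of deriving a predictable interface name from a PCI bus/slot/function, following the `enp0s1` convention cited above (illustrative only):

```rust
fn predictable_netdev_name(bus: u8, slot: u8, function: u8) -> String {
    if function == 0 {
        format!("enp{}s{}", bus, slot)
    } else {
        format!("enp{}s{}f{}", bus, slot, function)
    }
}

// e.g. bus 0, slot 1, function 0 → "enp0s1"
```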
|
||||
|
||||
#### 5.3 Device Init Observability (Week 23-25)
|
||||
|
||||
**Deliverables:**
|
||||
- Boot-time device init timeline: log each device probe start/end with duration
|
||||
- `redbear-info --boot` shows device init timeline post-boot
|
||||
- Per-device init status: `redbear-info --device pci/00:02.0`
|
||||
- Kernel cmdline `redbear.init_verbose` enables verbose device init logging
|
||||
- Boot-time warning summary: all drivers that probed with warnings or deferrals
|
||||
- Device init health dashboard: `redbear-info --health` shows init status of all subsystems
|
||||
|
||||
#### 5.4 Remaining Gaps (Week 24-30)
|
||||
|
||||
**Deliverables:**
|
||||
- `nvmed` hardware validation: prove NVMe I/O on real hardware
|
||||
- `ahcid` hardware validation: prove SATA I/O on real hardware
|
||||
- `ihdad` hardware validation: prove audio output on real hardware
|
||||
- USB device class coverage expanded:
|
||||
- USB CDC ACM (serial): `usbcdcd` daemon
|
||||
- USB CDC ECM/NCM (ethernet): `usbnetd` daemon (or integrate into existing net drivers)
|
||||
- USB Audio Class 1/2: `usbaudiod` daemon
|
||||
- GPU hardware acceleration readiness:
|
||||
- Mesa radeonsi backend proof-of-concept (single draw call)
|
||||
- KMS atomic modesetting proof on real hardware (not just QEMU)
|
||||
- `redbear-btusb` autospawn via USB class matching
|
||||
- `kstop` shutdown event: gracefully stop all device daemons before power-off
|
||||
|
||||
**Phase 5 Exit Criteria:**
|
||||
- Bluetooth BLE discovery and basic data exchange works on real hardware
|
||||
- Network interfaces use predictable names on QEMU and bare metal
|
||||
- Device init timeline observable via `redbear-info --boot`
|
||||
- NVMe I/O validated on at least one real NVMe drive
|
||||
- Real audio output validated on at least one HDA codec
|
||||
- At least one USB device class beyond HID/storage validated (audio, serial, or ethernet)
|
||||
- All 25+ existing drivers maintain backward compatibility
|
||||
|
||||
## 4. Dependency Graph
|
||||
|
||||
```
|
||||
Phase 1 (Driver Model) ─────────────────────────────┐
|
||||
├── 1.1 Binding Model │
|
||||
├── 1.2 Async Probing (after 1.1) │
|
||||
├── 1.3 Driver Parameters (after 1.1) │
|
||||
└── 1.4 Hotplug (after 1.1) │
|
||||
│
|
||||
Phase 2 (Controllers) ───────────────────────────────┤
|
||||
├── 2.1 USB EHCI/OHCI/UHCI (parallel with 1.2) │
|
||||
├── 2.2 xHCI Hardening (parallel with 1.2) │
|
||||
├── 2.3 MSI-X HW Validation (after 1.1) │
|
||||
├── 2.4 IOMMU HW Bring-Up (parallel with 2.3) │
|
||||
└── 2.5 ACPI Wave 1-2 (parallel with 2.3) │
|
||||
│
|
||||
Phase 3 (Power Mgmt) ────────────────────────────────┤
|
||||
├── 3.1 ACPI Wave 3-4 (after 2.5) │
|
||||
├── 3.2 Suspend/Resume (after 3.1) │
|
||||
├── 3.3 CPU Freq Scaling (parallel with 3.2) │
|
||||
├── 3.4 Thermal Mgmt (after 3.1, parallel 3.3) │
|
||||
├── 3.5 Hardware RNG (parallel with 3.3) │
|
||||
├── 3.6 PCIe AER (after 2.3) │
|
||||
└── 3.7 SMBIOS/DMI (parallel with 3.6) │
|
||||
│
|
||||
Phase 4 (Firmware + Wi-Fi) ──────────────────────────┤
|
||||
├── 4.1 Firmware Gaps (after 1.1) │
|
||||
├── 4.2 Wi-Fi HW (after 4.1, parallel with 2.3) │
|
||||
└── 4.3 Wi-Fi Desktop API (after 4.2) │
|
||||
│
|
||||
Phase 5 (Bluetooth + Polish) ────────────────────────┤
|
||||
├── 5.1 BT HW Validation (parallel with 4.2) │
|
||||
├── 5.2 Device Naming (after 1.1) │
|
||||
├── 5.3 Init Observability (after 1.2) │
|
||||
└── 5.4 Remaining Gaps (after 3.2, 4.2, 5.1) │
|
||||
```
|
||||
|
||||
## 5. Resource Estimates
|
||||
|
||||
| Phase | Duration | Engineers | Key Risk |
|
||||
|-------|----------|-----------|----------|
|
||||
| Phase 1 | 8 weeks | 2 | Over-engineering the driver model; must stay backward compatible |
|
||||
| Phase 2 | 6-9 weeks | 3 (parallelizable) | Real hardware availability; USB controller complexity |
|
||||
| Phase 3 | 8 weeks | 2-3 | ACPI firmware quality varies wildly on real hardware |
|
||||
| Phase 4 | 8 weeks | 2 | Wi-Fi hardware procurement; firmware licensing |
|
||||
| Phase 5 | 10 weeks | 2 | Long tail of device class drivers |
|
||||
|
||||
**Total:** 26-40 weeks (~6-10 months) with 2-3 engineers, depending on parallelism and
|
||||
hardware availability.
|
||||
|
||||
## 6. Risk Register
|
||||
|
||||
| Risk | Probability | Impact | Mitigation |
|
||||
|------|-------------|--------|------------|
|
||||
| No access to AMD GPU with MSI-X | Medium | High | Partner with community; use Intel GPU as alternative |
|
||||
| No access to AMD machine with IOMMU | Medium | High | Prioritize Intel VT-d if AMD hardware unavailable |
|
||||
| USB EHCI/OHCI/UHCI significantly harder than estimated | Medium | High | Scope to EHCI-only initially; UHCI/OHCI deferred |
|
||||
| ACPI firmware corruption on test machines causes false failures | High | Medium | Test on 3+ machines per platform class |
|
||||
| Wi-Fi firmware licensing prevents redistribution | Low | Medium | Keep firmware external (fetched, not committed) |
|
||||
| Existing driver regression from new driver model | Medium | High | Extensive backward compat testing; parallel old/new paths |
|
||||
| S3 suspend/resume crashes unrecoverably on some hardware | High | Medium | Gate behind config flag; S3 is opt-in initially |
|
||||
|
||||
## 7. Success Criteria (Definition of Done)
|
||||
|
||||
This plan is complete when:
|
||||
|
||||
1. **Driver Model:** New driver binding model works for all existing drivers; deferred probing
|
||||
retries correctly; async probing measurably parallel; hotplug adds/removes devices without reboot.
|
||||
|
||||
2. **USB Controllers:** At least one non-xHCI controller (EHCI preferred) functional; USB keyboard
|
||||
reliable on bare metal AMD and Intel.
|
||||
|
||||
3. **Hardware Validation:** MSI-X proven on real AMD + Intel GPU; IOMMU AMD-Vi proven on real
|
||||
AMD machine; ACPI `_S5` shutdown proven on real AMD + Intel; NVMe I/O proven on real hardware.
|
||||
|
||||
4. **Power Management:** S3 suspend/resume works on at least one real machine; CPU frequency
|
||||
scaling observable; thermal shutdown testable.
|
||||
|
||||
5. **Firmware:** `request_firmware_nowait` with uevent dispatch; `PCI_QUIRK_NEED_FIRMWARE`
|
||||
enforced; Wi-Fi firmware loaded end-to-end on real hardware.
|
||||
|
||||
6. **Wi-Fi:** Intel Wi-Fi chipset validated end-to-end with real AP; scan → connect → DHCP →
|
||||
internet access verified.
|
||||
|
||||
7. **Bluetooth:** BLE discovery and basic data exchange on real hardware; HCI init sequence
|
||||
validated; GATT service discovery functional.
|
||||
|
||||
8. **Observability:** Device init timeline observable; per-device init status queryable;
|
||||
boot-time warning summary available.
|
||||
|
||||
9. **No regressions:** All 25+ existing drivers still work; all QEMU validation scripts still pass;
|
||||
`redbear-mini` and `redbear-full` still boot to login prompt.
|
||||
|
||||
## 8. Relationship to Existing Plans
|
||||
|
||||
This plan is the **canonical device initialization plan**. It supersedes or integrates with:
|
||||
|
||||
| Existing Plan | Relationship |
|
||||
|---------------|-------------|
|
||||
| `IRQ-AND-LOWLEVEL-CONTROLLERS-ENHANCEMENT-PLAN.md` | Absorbed: MSI-X (P1), IOMMU (P2) become Phase 2.3-2.4 here |
|
||||
| `ACPI-IMPROVEMENT-PLAN.md` | Integrated: Waves 1-4 become Phase 2.5 + Phase 3.1-3.2 here |
|
||||
| `USB-IMPLEMENTATION-PLAN.md` | Integrated: xHCI hardening + controller gaps become Phase 2.1-2.2 here |
|
||||
| `XHCID-DEVICE-IMPROVEMENT-PLAN.md` | Integrated: 7-phase xhcid plan consolidated into Phase 2.2 here |
|
||||
| `WIFI-IMPLEMENTATION-PLAN.md` | Absorbed: Wi-Fi hardware validation becomes Phase 4.2 here |
|
||||
| `BLUETOOTH-IMPLEMENTATION-PLAN.md` | Absorbed: BT validation becomes Phase 5.1 here |
|
||||
| `BOOT-PROCESS-ASSESSMENT.md` | Input: boot flow, service ordering, pcid-spawner fix already applied |
|
||||
| `BOOT-PROCESS-IMPROVEMENT-PLAN.md` | Input: kernel 4GiB fix, DRM/KMS, greeter UI (already addressed) |
|
||||
| `CONSOLE-TO-KDE-DESKTOP-PLAN.md` | Orthogonal: this plan focuses on device init, not desktop path |
|
||||
|
||||
Existing plans remain as reference material for historical detail and subsystem-specific
|
||||
technical depth. This plan is the execution authority for sequencing and acceptance criteria.
|
||||
|
||||
## 9. Immediate Next Actions (Week 1 Priorities)
|
||||
|
||||
1. **Create `redox-driver-core` crate** — define `Bus`, `Device`, `Driver` traits
|
||||
2. **Read Linux 7.0 `drivers/base/driver.c`** — understand the driver binding model to adapt
|
||||
3. **Audit `pcid` scheme interface** — what device info is already exposed vs what's needed
|
||||
4. **Select USB EHCI reference implementation** — Linux `ehci-hcd.c` or FreeBSD `ehci.c`
|
||||
5. **Procure test hardware** — at minimum: one AMD machine with AMD GPU + one Intel machine with Intel GPU
|
||||
6. **Set up USB keyboard test matrix** — catalog existing USB keyboards and host controllers
|
||||
7. **Create firmware manifest template** — define format for `/lib/firmware/MANIFEST.txt`
|
||||
8. **Schedule MSI-X hardware validation session** — reserve time on test machines for Phase 2.3
|
||||
|
||||
---
|
||||
|
||||
*This plan will be updated as implementation progresses. Each phase section will receive
|
||||
detailed task breakdown (similar to the ACPI and IRQ plans' execution slice format) before
|
||||
that phase begins.*
|
||||
@@ -0,0 +1,50 @@
|
||||
# P1-P8 Scheduler & Relibc Stability Review
|
||||
|
||||
**Date:** 2026-04-30
|
||||
**Scope:** Comprehensive review of P1-P8 kernel scheduler and relibc changes for stability, robustness, and clean code
|
||||
|
||||
## HIGH Severity — Fixed This Session
|
||||
|
||||
| # | File | Issue | Fix |
|
||||
|---|------|-------|-----|
|
||||
| 1 | `pthread_mutex.rs:89` | `make_consistent` stored dead TID instead of 0 | Store 0 for "no owner" |
|
||||
| 2 | `cond.rs:106` | `.unwrap()` suppressed EOWNERDEAD/ENOTRECOVERABLE | Changed to `.expect()` with message |
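
For reference, a sketch of the ownership-clearing pattern fix #1 describes (not the actual relibc code): after `pthread_mutex_consistent`, the owner word must record "no owner" (0) rather than the dead TID:

```rust
use core::sync::atomic::{AtomicU32, Ordering};

fn make_consistent(owner: &AtomicU32, dead_tid: u32) {
    // Clear ownership only if the dead owner still holds the slot
    // (another thread may already have taken ownership).
    let _ = owner.compare_exchange(dead_tid, 0, Ordering::AcqRel, Ordering::Relaxed);
}
```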
|
||||
|
||||
## HIGH Severity — Documented as Known Limitations
|
||||
|
||||
| # | File | Issue | Status |
|
||||
|---|------|-------|--------|
|
||||
| 3 | `switch.rs:396-437` | `steal_work` CPU iteration without atomicity | Structural limitation; documented with TODO |
|
||||
| 4 | `proc.rs:481,613` | Lock ordering violation TODO in kfmap/ksetup | Pre-existing; requires deeper refactoring |
|
||||
| 5 | `futex.rs:821-844` | PI futex CAS loop with `entry().or_insert()` race | Requires atomic entry creation pattern |
|
||||
|
||||
## MEDIUM Severity — Documented for Follow-up
|
||||
|
||||
| # | File | Issue |
|
||||
|---|------|-------|
|
||||
| 6 | `switch.rs:171` | TODO: Better memory orderings for CONTEXT_SWITCH_LOCK |
|
||||
| 7 | `futex.rs:370-380` | Addrspace freed during robust-list walk (UAF risk) |
|
||||
| 8 | `pthread_mutex.rs:140` | `mutex_owner_id_is_live` O(n) scan |
|
||||
| 9 | `pthread_mutex.rs:37-39` | SPIN_COUNT = 0 — no adaptive spinning |
|
||||
| 10 | `barrier.rs` | No pthread_barrier_destroy — memory leak |
|
||||
| 11 | `sched/mod.rs` | All sched_* functions return ENOSYS (honest stubs) |
|
||||
| 12 | `pthread/mod.rs:553` | pthread_setname_np allocates format! on every call |
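
For item #9, a sketch of the adaptive-spinning shape a non-zero `SPIN_COUNT` enables; the constant and the fallback path are illustrative, not relibc's implementation:

```rust
use core::sync::atomic::{AtomicU32, Ordering};

const SPIN_COUNT: u32 = 100;

fn lock_slow(word: &AtomicU32) {
    // Phase 1: bounded spinning in user space.
    for _ in 0..SPIN_COUNT {
        if word
            .compare_exchange_weak(0, 1, Ordering::Acquire, Ordering::Relaxed)
            .is_ok()
        {
            return; // acquired while spinning; no syscall needed
        }
        core::hint::spin_loop();
    }
    // Phase 2: mark the lock contended and sleep in the kernel.
    while word.swap(2, Ordering::Acquire) != 0 {
        std::thread::yield_now(); // stand-in for futex_wait(word, 2)
    }
}
```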
|
||||
|
||||
## Build Verification
|
||||
|
||||
- `cargo check` relibc: ✅ passes (1 pre-existing warning)
|
||||
- `make r.kernel`: ✅ passes
|
||||
- P8 patches in recipe: 5 of 8 wired (3 not yet wired — initial-placement, load-balance, work-stealing)
|
||||
|
||||
## Honest Status Assessment
|
||||
|
||||
| Phase | Status | Notes |
|
||||
|-------|--------|-------|
|
||||
| P0 | ✅ Complete | Barrier SMP, sigmask, pthread_kill |
|
||||
| P1 | ✅ Complete | Robust mutexes, sched API (honest ENOSYS) |
|
||||
| P2 | ✅ Complete | RT scheduling, SchedPolicy |
|
||||
| P3 | 🚧 Partial | PerCpuSched + wiring done; stealing/balancing deferred |
|
||||
| P4 | ✅ Complete | Futex sharding + REQUEUE + PI + robust |
|
||||
| P5 | ✅ Complete | setpriority, affinity, thread naming, schedparam |
|
||||
| P6 | 🚧 Partial | Cache-affine done; NUMA deferred |
|
||||
| P7-P8 | ✅ Complete | Futex REQUEUE/PI/robust deliverable |
|
||||
@@ -0,0 +1,61 @@
|
||||
diff --git a/drivers/pcid/src/scheme.rs b/drivers/pcid/src/scheme.rs
|
||||
index ce55b33f..c06bdec4 100644
|
||||
--- a/drivers/pcid/src/scheme.rs
|
||||
+++ b/drivers/pcid/src/scheme.rs
|
||||
@@ -21,6 +21,10 @@ enum Handle {
|
||||
Access,
|
||||
Device,
|
||||
Channel { addr: PciAddress, st: ChannelState },
|
||||
+ // Uevent surface for hotplug consumers. Opening uevent returns an object
|
||||
+ // from which device add/remove events can be read. Since pcid currently
|
||||
+ // only scans at startup, this surface is ready for hotplug polling consumers.
|
||||
+ Uevent,
|
||||
SchemeRoot,
|
||||
/// Represents an open handle to a device's bind endpoint
|
||||
Bind { addr: PciAddress },
|
||||
@@ -34,7 +38,7 @@ struct HandleWrapper {
|
||||
}
|
||||
fn is_file(&self) -> bool {
|
||||
- matches!(self, Self::Access | Self::Channel { .. } | Self::Bind { .. })
|
||||
+ matches!(self, Self::Access | Self::Channel { .. } | Self::Bind { .. } | Self::Uevent)
|
||||
}
|
||||
fn is_dir(&self) -> bool {
|
||||
!self.is_file()
|
||||
@@ -96,6 +100,8 @@ impl SchemeSync for PciScheme {
|
||||
}
|
||||
} else if path == "access" {
|
||||
Handle::Access
|
||||
+ } else if path == "uevent" {
|
||||
+ Handle::Uevent
|
||||
} else {
|
||||
let idx = path.find('/').unwrap_or(path.len());
|
||||
let (addr_str, after) = path.split_at(idx);
|
||||
@@ -140,6 +146,7 @@ impl SchemeSync for PciScheme {
|
||||
Handle::Device => (DEVICE_CONTENTS.len(), MODE_DIR | 0o755),
|
||||
Handle::Access | Handle::Channel { .. } | Handle::Bind { .. } => (0, MODE_CHR | 0o600),
|
||||
+ Handle::Uevent => (0, MODE_CHR | 0o644),
|
||||
Handle::SchemeRoot => return Err(Error::new(EBADF)),
|
||||
};
|
||||
stat.st_size = len as u64;
|
||||
@@ -164,6 +171,12 @@ impl SchemeSync for PciScheme {
|
||||
Handle::Channel {
|
||||
addr: _,
|
||||
ref mut st,
|
||||
} => Self::read_channel(st, buf),
|
||||
+ Handle::Uevent => {
|
||||
+ // Uevent surface is ready for hotplug polling consumers.
|
||||
+ // pcid currently only scans at startup, so return empty (EAGAIN would indicate no data available).
|
||||
+ // Consumers can poll and re-read to check for new events.
|
||||
+ Ok(0)
|
||||
+ }
|
||||
Handle::SchemeRoot | Handle::Bind { .. } => Err(Error::new(EBADF)),
|
||||
_ => Err(Error::new(EBADF)),
|
||||
}
|
||||
@@ -199,7 +212,7 @@ impl SchemeSync for PciScheme {
|
||||
}
|
||||
Handle::Device => DEVICE_CONTENTS,
|
||||
- Handle::Access | Handle::Channel { .. } | Handle::Bind { .. } => return Err(Error::new(ENOTDIR)),
|
||||
+ Handle::Access | Handle::Channel { .. } | Handle::Bind { .. } | Handle::Uevent => return Err(Error::new(ENOTDIR)),
|
||||
Handle::SchemeRoot => return Err(Error::new(EBADF)),
|
||||
};
|
||||
for (i, dent_name) in entries.iter().enumerate().skip(offset) {
|
||||
@@ -0,0 +1,20 @@
|
||||
diff --git a/drivers/usb/xhcid/src/xhci/mod.rs b/drivers/usb/xhcid/src/xhci/mod.rs
|
||||
index f1c6d08e..a3f2e15c 100644
|
||||
--- a/drivers/usb/xhcid/src/xhci/mod.rs
|
||||
+++ b/drivers/usb/xhcid/src/xhci/mod.rs
|
||||
@@ -904,6 +904,7 @@ impl<const N: usize> Xhci<N> {
|
||||
match self.spawn_drivers(port_id) {
|
||||
Ok(()) => {
|
||||
info!("xhcid: uevent add device usb/{}", port_id.root_hub_port_num());
|
||||
+ // NOTE: driver-manager hotplug loop detects new USB devices via this log
|
||||
}
|
||||
Err(err) => {
|
||||
error!("Failed to spawn driver for port {}: `{}`", port_id, err)
|
||||
@@ -974,6 +975,8 @@ impl<const N: usize> Xhci<N> {
|
||||
info!("xhcid: uevent remove device usb/{}", port_id.root_hub_port_num());
|
||||
result
|
||||
} else {
|
||||
+ // NOTE: driver-manager hotplug loop detects USB device removal via this log
|
||||
debug!(
|
||||
"Attempted to detach from port {}, which wasn't previously attached.",
|
||||
port_id
|
||||
@@ -0,0 +1,844 @@
|
||||
diff --git a/drivers/acpid/src/acpi.rs b/drivers/acpid/src/acpi.rs
|
||||
index 94a1eb17..c8919290 100644
|
||||
--- a/drivers/acpid/src/acpi.rs
|
||||
+++ b/drivers/acpid/src/acpi.rs
|
||||
@@ -52,9 +52,7 @@ impl SdtHeader {
|
||||
}
|
||||
}
|
||||
pub fn length(&self) -> usize {
|
||||
- self.length
|
||||
- .try_into()
|
||||
- .expect("expected usize to be at least 32 bits")
|
||||
+ self.length as usize
|
||||
}
|
||||
}
|
||||
|
||||
@@ -132,6 +130,9 @@ impl Drop for PhysmapGuard {
|
||||
pub struct Sdt(Arc<[u8]>);
|
||||
|
||||
impl Sdt {
|
||||
+ // SDT validation is split between parser and caller policy:
|
||||
+ // - this parser only decides whether a given byte slice is structurally valid,
|
||||
+ // - callers decide whether rejection is fatal (root [R|X]SDT) or degradable (child tables).
|
||||
pub fn new(slice: Arc<[u8]>) -> Result<Self, InvalidSdtError> {
|
||||
let header = match plain::from_bytes::<SdtHeader>(&slice) {
|
||||
Ok(header) => header,
|
||||
@@ -233,6 +234,177 @@ impl fmt::Debug for Sdt {
|
||||
pub struct Dsdt(Sdt);
|
||||
pub struct Ssdt(Sdt);
|
||||
|
||||
+#[derive(Clone, Copy, Debug)]
|
||||
+pub enum AmlBootstrapMethod {
|
||||
+ HwdEnv,
|
||||
+ X86BiosFallback,
|
||||
+}
|
||||
+impl AmlBootstrapMethod {
|
||||
+ fn as_str(self) -> &'static str {
|
||||
+ match self {
|
||||
+ Self::HwdEnv => "hwd RSDP_ADDR/RSDP_SIZE handoff",
|
||||
+ Self::X86BiosFallback => "x86 BIOS fallback",
|
||||
+ }
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+#[derive(Clone, Debug)]
|
||||
+pub struct AmlBootstrap {
|
||||
+ rsdp_addr: usize,
|
||||
+ rsdp_size: Option<usize>,
|
||||
+ method: AmlBootstrapMethod,
|
||||
+}
|
||||
+impl AmlBootstrap {
|
||||
+ pub fn from_env() -> Result<Self, Box<dyn Error>> {
|
||||
+ let rsdp_addr = usize::from_str_radix(&std::env::var("RSDP_ADDR")?, 16)?;
|
||||
+ let rsdp_size = match std::env::var("RSDP_SIZE") {
|
||||
+ Ok(size) => Some(usize::from_str_radix(&size, 16)?),
|
||||
+ Err(std::env::VarError::NotPresent) => None,
|
||||
+ Err(err) => return Err(Box::new(err)),
|
||||
+ };
|
||||
+
|
||||
+ Ok(Self {
|
||||
+ rsdp_addr,
|
||||
+ rsdp_size,
|
||||
+ method: AmlBootstrapMethod::HwdEnv,
|
||||
+ })
|
||||
+ }
|
||||
+
|
||||
+ #[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
|
||||
+ pub fn x86_bios_fallback() -> Result<Option<Self>, Box<dyn Error>> {
|
||||
+ if let Some(rsdp_addr) = search_x86_bios_rsdp()? {
|
||||
+ return Ok(Some(Self {
|
||||
+ rsdp_addr,
|
||||
+ rsdp_size: None,
|
||||
+ method: AmlBootstrapMethod::X86BiosFallback,
|
||||
+ }));
|
||||
+ }
|
||||
+
|
||||
+ Ok(None)
|
||||
+ }
|
||||
+
|
||||
+ #[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
|
||||
+ pub fn x86_bios_fallback() -> Result<Option<Self>, Box<dyn Error>> {
|
||||
+ Ok(None)
|
||||
+ }
|
||||
+
|
||||
+ pub fn log_bootstrap(&self) {
|
||||
+ log::info!(
|
||||
+ "acpid: AML bootstrap via {} (RSDP at {:#X})",
|
||||
+ self.method.as_str(),
|
||||
+ self.rsdp_addr
|
||||
+ );
|
||||
+
|
||||
+ if let Some(rsdp_size) = self.rsdp_size {
|
||||
+ log::debug!("acpid: AML bootstrap RSDP_SIZE={:#X}", rsdp_size);
|
||||
+ }
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
|
||||
+const RSDP_SIGNATURE: &[u8; 8] = b"RSD PTR ";
|
||||
+
|
||||
+#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
|
||||
+fn search_x86_bios_rsdp() -> Result<Option<usize>, Box<dyn Error>> {
|
||||
+ let ebda_segment = read_u16_physical(0x40E)?;
|
||||
+ let ebda_addr = usize::from(ebda_segment) << 4;
|
||||
+
|
||||
+ if ebda_addr != 0 {
|
||||
+ if let Some(rsdp_addr) = search_rsdp_region(ebda_addr, 1024)? {
|
||||
+ return Ok(Some(rsdp_addr));
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ search_rsdp_region(0xE0000, 0x20000).map_err(Into::into)
|
||||
+}
|
||||
+
|
||||
+#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
|
||||
+fn read_u16_physical(physaddr: usize) -> std::io::Result<u16> {
|
||||
+ let start_page = physaddr / PAGE_SIZE * PAGE_SIZE;
|
||||
+ let page_offset = physaddr % PAGE_SIZE;
|
||||
+ let map = PhysmapGuard::map(start_page, 1)?;
|
||||
+ let bytes = map
|
||||
+ .get(page_offset..page_offset + mem::size_of::<u16>())
|
||||
+ .ok_or_else(|| std::io::Error::new(std::io::ErrorKind::UnexpectedEof, "short BIOS map"))?;
|
||||
+
|
||||
+ Ok(u16::from_le_bytes([bytes[0], bytes[1]]))
|
||||
+}
|
||||
+
|
||||
+#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
|
||||
+fn search_rsdp_region(physaddr: usize, length: usize) -> std::io::Result<Option<usize>> {
|
||||
+ let start_page = physaddr / PAGE_SIZE * PAGE_SIZE;
|
||||
+ let page_offset = physaddr % PAGE_SIZE;
|
||||
+ let mapped_len = page_offset + length;
|
||||
+ let page_count = mapped_len.div_ceil(PAGE_SIZE);
|
||||
+ let map = PhysmapGuard::map(start_page, page_count)?;
|
||||
+ let region = map.get(page_offset..page_offset + length).ok_or_else(|| {
|
||||
+ std::io::Error::new(std::io::ErrorKind::UnexpectedEof, "short BIOS RSDP search window")
|
||||
+ })?;
|
||||
+
|
||||
+ for candidate_offset in (0..=length.saturating_sub(20)).step_by(16) {
|
||||
+ if region
|
||||
+ .get(candidate_offset..candidate_offset + RSDP_SIGNATURE.len())
|
||||
+ != Some(&RSDP_SIGNATURE[..])
|
||||
+ {
|
||||
+ continue;
|
||||
+ }
|
||||
+
|
||||
+ if rsdp_candidate_valid(®ion[candidate_offset..]) {
|
||||
+ return Ok(Some(physaddr + candidate_offset));
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ Ok(None)
|
||||
+}
|
||||
+
|
||||
+#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
|
||||
+fn rsdp_candidate_valid(candidate: &[u8]) -> bool {
|
||||
+ if candidate.len() < 20 || &candidate[..RSDP_SIGNATURE.len()] != RSDP_SIGNATURE {
|
||||
+ return false;
|
||||
+ }
|
||||
+
|
||||
+ if checksum_is_zero(&candidate[..20]).is_err() {
|
||||
+ return false;
|
||||
+ }
|
||||
+
|
||||
+ let revision = candidate[15];
|
||||
+ if revision < 2 {
|
||||
+ return true;
|
||||
+ }
|
||||
+
|
||||
+ if candidate.len() < 36 {
|
||||
+ return false;
|
||||
+ }
|
||||
+
|
||||
+ let declared_length = u32::from_le_bytes([candidate[20], candidate[21], candidate[22], candidate[23]])
|
||||
+ as usize;
|
||||
+ if declared_length < 36 || candidate.len() < declared_length {
|
||||
+ return false;
|
||||
+ }
|
||||
+
|
||||
+ checksum_is_zero(&candidate[..declared_length]).is_ok()
|
||||
+}
|
||||
+
|
||||
+#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
|
||||
+fn checksum_is_zero(bytes: &[u8]) -> Result<(), ()> {
|
||||
+ let checksum = bytes
|
||||
+ .iter()
|
||||
+ .copied()
|
||||
+ .fold(0_u8, |current_sum, item| current_sum.wrapping_add(item));
|
||||
+
|
||||
+ if checksum == 0 {
|
||||
+ Ok(())
|
||||
+ } else {
|
||||
+ Err(())
|
||||
+ }
|
||||
+}
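A small test-style sketch of the rule this helper enforces for RSDP candidates (all bytes must sum to zero modulo 256); the byte values are arbitrary:

```rust
#[cfg(all(test, any(target_arch = "x86", target_arch = "x86_64")))]
mod rsdp_checksum_tests {
    use super::checksum_is_zero;

    #[test]
    fn wrapping_sum_must_be_zero() {
        // 0x10 + 0xF0 = 0x100, which wraps to 0 in u8 arithmetic -> accepted.
        assert!(checksum_is_zero(&[0x10, 0xF0]).is_ok());
        // 1 + 2 + 3 = 6 -> rejected.
        assert!(checksum_is_zero(&[1, 2, 3]).is_err());
    }
}
```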
|
||||
+
|
||||
+#[derive(Clone, Copy, Debug)]
|
||||
+struct SleepTypeData {
|
||||
+ slp_typa: u16,
|
||||
+ slp_typb: u16,
|
||||
+}
|
||||
+
|
||||
// Current AML implementation builds the aml_context.namespace at startup,
|
||||
// but the cache for symbols is lazy-loaded when someone
|
||||
// reads from the acpi:/symbols scheme.
|
||||
@@ -245,15 +417,20 @@ pub struct AmlSymbols {
|
||||
symbol_cache: FxHashMap<String, String>,
|
||||
page_cache: Arc<Mutex<AmlPageCache>>,
|
||||
aml_region_handlers: Vec<(RegionSpace, Box<dyn RegionHandler>)>,
|
||||
+ aml_bootstrap: Option<AmlBootstrap>,
|
||||
}
|
||||
|
||||
impl AmlSymbols {
|
||||
- pub fn new(aml_region_handlers: Vec<(RegionSpace, Box<dyn RegionHandler>)>) -> Self {
|
||||
+ pub fn new(
|
||||
+ aml_bootstrap: Option<AmlBootstrap>,
|
||||
+ aml_region_handlers: Vec<(RegionSpace, Box<dyn RegionHandler>)>,
|
||||
+ ) -> Self {
|
||||
Self {
|
||||
aml_context: None,
|
||||
symbol_cache: FxHashMap::default(),
|
||||
page_cache: Arc::new(Mutex::new(AmlPageCache::default())),
|
||||
aml_region_handlers,
|
||||
+ aml_bootstrap,
|
||||
}
|
||||
}
|
||||
|
||||
@@ -264,9 +441,12 @@ impl AmlSymbols {
|
||||
let format_err = |err| format!("{:?}", err);
|
||||
let handler = AmlPhysMemHandler::new(pci_fd, Arc::clone(&self.page_cache));
|
||||
//TODO: use these parsed tables for the rest of acpid
|
||||
- let rsdp_address = usize::from_str_radix(&std::env::var("RSDP_ADDR")?, 16)?;
|
||||
+ let bootstrap = self
|
||||
+ .aml_bootstrap
|
||||
+ .as_ref()
|
||||
+ .ok_or_else(|| std::io::Error::other("AML bootstrap unavailable"))?;
|
||||
let tables =
|
||||
- unsafe { AcpiTables::from_rsdp(handler.clone(), rsdp_address).map_err(format_err)? };
|
||||
+ unsafe { AcpiTables::from_rsdp(handler.clone(), bootstrap.rsdp_addr).map_err(format_err)? };
|
||||
let platform = AcpiPlatform::new(tables, handler).map_err(format_err)?;
|
||||
let interpreter = Interpreter::new_from_platform(&platform).map_err(format_err)?;
|
||||
for (region, handler) in self.aml_region_handlers.drain(..) {
|
||||
@@ -316,7 +496,7 @@ impl AmlSymbols {
|
||||
.namespace
|
||||
.lock()
|
||||
.traverse(|level_aml_name, level| {
|
||||
- for (child_seg, handle) in level.values.iter() {
|
||||
+ for (child_seg, _handle) in level.values.iter() {
|
||||
if let Ok(aml_name) =
|
||||
AmlName::from_name_seg(child_seg.to_owned()).resolve(level_aml_name)
|
||||
{
|
||||
@@ -379,6 +559,7 @@ pub struct AcpiContext {
|
||||
tables: Vec<Sdt>,
|
||||
dsdt: Option<Dsdt>,
|
||||
fadt: Option<Fadt>,
|
||||
+ shutdown_s5: RwLock<Option<SleepTypeData>>,
|
||||
|
||||
aml_symbols: RwLock<AmlSymbols>,
|
||||
|
||||
@@ -426,27 +607,56 @@ impl AcpiContext {
|
||||
|
||||
pub fn init(
|
||||
rxsdt_physaddrs: impl Iterator<Item = u64>,
|
||||
+ aml_bootstrap: Option<AmlBootstrap>,
|
||||
ec: Vec<(RegionSpace, Box<dyn RegionHandler>)>,
|
||||
) -> Self {
|
||||
- let tables = rxsdt_physaddrs
|
||||
- .map(|physaddr| {
|
||||
- let physaddr: usize = physaddr
|
||||
- .try_into()
|
||||
- .expect("expected ACPI addresses to be compatible with the current word size");
|
||||
-
|
||||
- log::trace!("TABLE AT {:#>08X}", physaddr);
|
||||
-
|
||||
- Sdt::load_from_physical(physaddr).expect("failed to load physical SDT")
|
||||
- })
|
||||
- .collect::<Vec<Sdt>>();
|
||||
+ // Child-table validation policy:
|
||||
+ // - checksum/length failures are degradable: warn, skip the table, continue boot,
|
||||
+ // - malformed FADT is handled separately as "raw-table-only" mode for ACPI control paths,
|
||||
+ // - MADT subtable interpretation is delegated to consumers, which must skip unknown entry
|
||||
+ // types instead of treating them as daemon-fatal.
|
||||
+ let mut tables = Vec::new();
|
||||
+ for physaddr in rxsdt_physaddrs {
|
||||
+ let physaddr: usize = match physaddr.try_into() {
|
||||
+ Ok(physaddr) => physaddr,
|
||||
+ Err(_) => {
|
||||
+ log::warn!(
|
||||
+ "acpid: skipping ACPI table at {:#X}: physical address out of range",
|
||||
+ physaddr
|
||||
+ );
|
||||
+ continue;
|
||||
+ }
|
||||
+ };
|
||||
+
|
||||
+ match Sdt::load_from_physical(physaddr) {
|
||||
+ Ok(table) => {
|
||||
+ log::debug!(
|
||||
+ "acpid: accepted ACPI table {} at {:#X}",
|
||||
+ String::from_utf8_lossy(&table.signature),
|
||||
+ physaddr
|
||||
+ );
|
||||
+ tables.push(table);
|
||||
+ }
|
||||
+ Err(TablePhysLoadError::Validity(InvalidSdtError::BadChecksum)) => {
|
||||
+ log::warn!(
|
||||
+ "acpid: skipping ACPI table at {:#X}: checksum validation failed",
|
||||
+ physaddr
|
||||
+ );
|
||||
+ }
|
||||
+ Err(err) => {
|
||||
+ log::warn!("acpid: skipping ACPI table at {:#X}: {}", physaddr, err);
|
||||
+ }
|
||||
+ }
|
||||
+ }
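The MADT half of the validation policy comment above is not implemented in this patch; the following is a hedged sketch of the consumer-side rule it describes, using the standard ACPI MADT subtable layout (a 1-byte entry type followed by a 1-byte length), with the handler callback purely illustrative:

```rust
fn walk_madt_entries(mut entries: &[u8], mut on_known: impl FnMut(u8, &[u8])) {
    while entries.len() >= 2 {
        let entry_type = entries[0];
        let entry_len = entries[1] as usize;
        if entry_len < 2 || entry_len > entries.len() {
            // Malformed length: stop walking instead of over-reading.
            break;
        }
        match entry_type {
            // 0 = Local APIC, 1 = I/O APIC, 2 = interrupt source override, ...
            0..=4 => on_known(entry_type, &entries[..entry_len]),
            // Unknown or newer entry types are skipped, never daemon-fatal.
            _ => {}
        }
        entries = &entries[entry_len..];
    }
}
```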
|
||||
|
||||
let mut this = Self {
|
||||
tables,
|
||||
dsdt: None,
|
||||
fadt: None,
|
||||
+ shutdown_s5: RwLock::new(None),
|
||||
|
||||
// Temporary values
|
||||
- aml_symbols: RwLock::new(AmlSymbols::new(ec)),
|
||||
+ aml_symbols: RwLock::new(AmlSymbols::new(aml_bootstrap, ec)),
|
||||
|
||||
next_ctx: RwLock::new(0),
|
||||
|
||||
@@ -581,55 +791,26 @@ impl AcpiContext {
|
||||
let port = fadt.pm1a_control_block as u16;
|
||||
let mut val = 1 << 13;
|
||||
|
||||
- let aml_symbols = self.aml_symbols.read();
|
||||
-
|
||||
- let s5_aml_name = match acpi::aml::namespace::AmlName::from_str("\\_S5") {
|
||||
- Ok(aml_name) => aml_name,
|
||||
- Err(error) => {
|
||||
- log::error!("Could not build AmlName for \\_S5, {:?}", error);
|
||||
- return;
|
||||
- }
|
||||
- };
|
||||
-
|
||||
- let s5 = match &aml_symbols.aml_context {
|
||||
- Some(aml_context) => match aml_context.namespace.lock().get(s5_aml_name) {
|
||||
- Ok(s5) => s5,
|
||||
- Err(error) => {
|
||||
- log::error!("Cannot set S-state, missing \\_S5, {:?}", error);
|
||||
- return;
|
||||
+ if self.shutdown_s5.read().is_none() {
|
||||
+ match self.cache_shutdown_s5_from_ready_aml("existing AML context") {
|
||||
+ Ok(true) | Ok(false) => {}
|
||||
+ Err(err) => {
|
||||
+ log::warn!("acpid: _S5 was not ready at shutdown: {}", err);
|
||||
}
|
||||
- },
|
||||
- None => {
|
||||
- log::error!("Cannot set S-state, AML context not initialized");
|
||||
- return;
|
||||
}
|
||||
- };
|
||||
-
|
||||
- let package = match s5.deref() {
|
||||
- acpi::aml::object::Object::Package(package) => package,
|
||||
- _ => {
|
||||
- log::error!("Cannot set S-state, \\_S5 is not a package");
|
||||
- return;
|
||||
- }
|
||||
- };
|
||||
+ }
|
||||
|
||||
- let slp_typa = match package[0].deref() {
|
||||
- acpi::aml::object::Object::Integer(i) => i.to_owned(),
|
||||
- _ => {
|
||||
- log::error!("typa is not an Integer");
|
||||
- return;
|
||||
- }
|
||||
- };
|
||||
- let slp_typb = match package[1].deref() {
|
||||
- acpi::aml::object::Object::Integer(i) => i.to_owned(),
|
||||
- _ => {
|
||||
- log::error!("typb is not an Integer");
|
||||
- return;
|
||||
- }
|
||||
+ let Some(sleep_types) = *self.shutdown_s5.read() else {
|
||||
+ log::error!("Cannot set S-state, missing derived \\_S5 sleep types");
|
||||
+ return;
|
||||
};
|
||||
|
||||
- log::trace!("Shutdown SLP_TYPa {:X}, SLP_TYPb {:X}", slp_typa, slp_typb);
|
||||
- val |= slp_typa as u16;
|
||||
+ log::trace!(
|
||||
+ "Shutdown SLP_TYPa {:X}, SLP_TYPb {:X}",
|
||||
+ sleep_types.slp_typa,
|
||||
+ sleep_types.slp_typb
|
||||
+ );
|
||||
+ val |= sleep_types.slp_typa;
|
||||
|
||||
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
|
||||
{
|
||||
@@ -652,6 +833,86 @@ impl AcpiContext {
|
||||
core::hint::spin_loop();
|
||||
}
|
||||
}
|
||||
+
|
||||
+ pub fn prime_shutdown_s5(&self, pci_fd: Option<&libredox::Fd>, source: &'static str) {
|
||||
+ match self.cache_shutdown_s5(pci_fd, source) {
|
||||
+ Ok(()) => {}
|
||||
+ Err(err) => {
|
||||
+ log::warn!("acpid: unable to derive _S5 from {}: {}", source, err);
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ fn cache_shutdown_s5(
|
||||
+ &self,
|
||||
+ pci_fd: Option<&libredox::Fd>,
|
||||
+ source: &'static str,
|
||||
+ ) -> Result<(), String> {
|
||||
+ if self.shutdown_s5.read().is_some() {
|
||||
+ return Ok(());
|
||||
+ }
|
||||
+
|
||||
+ let mut aml_symbols = self.aml_symbols.write();
|
||||
+ let aml_context = aml_symbols
|
||||
+ .aml_context_mut(pci_fd)
|
||||
+ .map_err(|err| format!("AML not ready: {err}"))?;
|
||||
+ let sleep_types = extract_s5_sleep_types(aml_context)?;
|
||||
+
|
||||
+ *self.shutdown_s5.write() = Some(sleep_types);
|
||||
+ log::info!("acpid: _S5 derived from {}", source);
|
||||
+ Ok(())
|
||||
+ }
|
||||
+
|
||||
+ fn cache_shutdown_s5_from_ready_aml(&self, source: &'static str) -> Result<bool, String> {
|
||||
+ if self.shutdown_s5.read().is_some() {
|
||||
+ return Ok(true);
|
||||
+ }
|
||||
+
|
||||
+ let aml_symbols = self.aml_symbols.read();
|
||||
+ let Some(aml_context) = aml_symbols.aml_context.as_ref() else {
|
||||
+ return Ok(false);
|
||||
+ };
|
||||
+
|
||||
+ let sleep_types = extract_s5_sleep_types(aml_context)?;
|
||||
+ drop(aml_symbols);
|
||||
+
|
||||
+ *self.shutdown_s5.write() = Some(sleep_types);
|
||||
+ log::info!("acpid: _S5 derived from {}", source);
|
||||
+ Ok(true)
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+fn extract_s5_sleep_types(
|
||||
+ aml_context: &Interpreter<AmlPhysMemHandler>,
|
||||
+) -> Result<SleepTypeData, String> {
|
||||
+ let s5_aml_name = acpi::aml::namespace::AmlName::from_str("\\_S5")
|
||||
+ .map_err(|error| format!("failed to build \\_S5 name: {error:?}"))?;
|
||||
+ let s5 = aml_context
|
||||
+ .namespace
|
||||
+ .lock()
|
||||
+ .get(s5_aml_name)
|
||||
+ .map_err(|error| format!("missing \\_S5: {error:?}"))?;
|
||||
+ let package = match s5.deref() {
|
||||
+ acpi::aml::object::Object::Package(package) => package,
|
||||
+ _ => return Err("\\_S5 is not a package".into()),
|
||||
+ };
|
||||
+
|
||||
+ let slp_typa = extract_sleep_type(package.get(0), "SLP_TYPa")?;
|
||||
+ let slp_typb = extract_sleep_type(package.get(1), "SLP_TYPb")?;
|
||||
+
|
||||
+ Ok(SleepTypeData { slp_typa, slp_typb })
|
||||
+}
|
||||
+
|
||||
+fn extract_sleep_type(value: Option<&WrappedObject>, label: &'static str) -> Result<u16, String> {
|
||||
+ let Some(value) = value else {
|
||||
+ return Err(format!("missing {label} in \\_S5 package"));
|
||||
+ };
|
||||
+
|
||||
+ match value.deref() {
|
||||
+ acpi::aml::object::Object::Integer(i) => u16::try_from(*i)
|
||||
+ .map_err(|_| format!("{label} out of range for PM1 control register")),
|
||||
+ _ => Err(format!("{label} is not an Integer")),
|
||||
+ }
|
||||
}
|
||||
|
||||
#[repr(C, packed)]
|
||||
@@ -760,45 +1021,66 @@ impl Deref for Fadt {
|
||||
type Target = FadtStruct;
|
||||
|
||||
fn deref(&self) -> &Self::Target {
|
||||
- plain::from_bytes::<FadtStruct>(&self.0 .0)
|
||||
- .expect("expected FADT struct to already be validated in Deref impl")
|
||||
+ match plain::from_bytes::<FadtStruct>(&self.0 .0) {
|
||||
+ Ok(fadt) => fadt,
|
||||
+ Err(plain::Error::TooShort) => unreachable!(
|
||||
+ "Fadt::new validates the minimum FADT size before constructing Fadt"
|
||||
+ ),
|
||||
+ Err(plain::Error::BadAlignment) => unreachable!(
|
||||
+ "plain::from_bytes reported bad alignment, but FadtStruct is #[repr(packed)]"
|
||||
+ ),
|
||||
+ }
|
||||
}
|
||||
}
|
||||
|
||||
impl Fadt {
|
||||
pub fn new(sdt: Sdt) -> Option<Fadt> {
|
||||
- if sdt.signature != *b"FACP" || sdt.length() < mem::size_of::<Fadt>() {
|
||||
+ if sdt.signature != *b"FACP" || sdt.length() < mem::size_of::<FadtStruct>() {
|
||||
return None;
|
||||
}
|
||||
Some(Fadt(sdt))
|
||||
}
|
||||
|
||||
pub fn init(context: &mut AcpiContext) {
|
||||
- let fadt_sdt = context
|
||||
- .take_single_sdt(*b"FACP")
|
||||
- .expect("expected ACPI to always have a FADT");
|
||||
+ // FADT policy: this table is mandatory for ACPI control services such as shutdown/reboot.
|
||||
+ // If it is missing or malformed, acpid stays alive for diagnostics/raw tables but degrades
|
||||
+ // into raw-table-only mode instead of crashing the boot.
|
||||
+ let Some(fadt_sdt) = context.take_single_sdt(*b"FACP") else {
|
||||
+ log::error!("acpid: missing FADT; booting without ACPI control services");
|
||||
+ return;
|
||||
+ };
|
||||
|
||||
let fadt = match Fadt::new(fadt_sdt) {
|
||||
Some(fadt) => fadt,
|
||||
None => {
|
||||
- log::error!("Failed to find FADT");
|
||||
+ log::error!("acpid: corrupt FADT; booting without ACPI control services");
|
||||
return;
|
||||
}
|
||||
};
|
||||
|
||||
let dsdt_ptr = match fadt.acpi_2_struct() {
|
||||
- Some(fadt2) => usize::try_from(fadt2.x_dsdt).unwrap_or_else(|_| {
|
||||
- usize::try_from(fadt.dsdt).expect("expected any given u32 to fit within usize")
|
||||
- }),
|
||||
- None => usize::try_from(fadt.dsdt).expect("expected any given u32 to fit within usize"),
|
||||
+ Some(fadt2) if fadt2.x_dsdt != 0 => match usize::try_from(fadt2.x_dsdt) {
|
||||
+ Ok(dsdt_ptr) => dsdt_ptr,
|
||||
+ Err(_) => {
|
||||
+ log::warn!(
|
||||
+ "acpid: x_dsdt address out of range; falling back to 32-bit DSDT pointer"
|
||||
+ );
|
||||
+ fadt.dsdt as usize
|
||||
+ }
|
||||
+ },
|
||||
+ _ => fadt.dsdt as usize,
|
||||
};
|
||||
|
||||
log::debug!("FACP at {:X}", { dsdt_ptr });
|
||||
|
||||
- let dsdt_sdt = match Sdt::load_from_physical(fadt.dsdt as usize) {
|
||||
+ let dsdt_sdt = match Sdt::load_from_physical(dsdt_ptr) {
|
||||
Ok(dsdt) => dsdt,
|
||||
Err(error) => {
|
||||
- log::error!("Failed to load DSDT: {}", error);
|
||||
+ log::error!(
|
||||
+ "acpid: corrupt FADT/DSDT linkage (DSDT at {:#X}): booting without ACPI control services: {}",
|
||||
+ dsdt_ptr,
|
||||
+ error
|
||||
+ );
|
||||
return;
|
||||
}
|
||||
};
|
||||
diff --git a/drivers/acpid/src/main.rs b/drivers/acpid/src/main.rs
|
||||
index 059254b3..25566553 100644
|
||||
--- a/drivers/acpid/src/main.rs
|
||||
+++ b/drivers/acpid/src/main.rs
|
||||
@@ -3,6 +3,7 @@ use std::fs::File;
|
||||
use std::mem;
|
||||
use std::ops::ControlFlow;
|
||||
use std::os::unix::io::AsRawFd;
|
||||
+use std::process;
|
||||
use std::sync::Arc;
|
||||
|
||||
use ::acpi::aml::op_region::{RegionHandler, RegionSpace};
|
||||
@@ -28,94 +29,206 @@ fn daemon(daemon: daemon::Daemon) -> ! {
|
||||
|
||||
log::info!("acpid start");
|
||||
|
||||
- let rxsdt_raw_data: Arc<[u8]> = std::fs::read("/scheme/kernel.acpi/rxsdt")
|
||||
- .expect("acpid: failed to read `/scheme/kernel.acpi/rxsdt`")
|
||||
- .into();
|
||||
+ let rxsdt_raw_data: Arc<[u8]> = match std::fs::read("/scheme/kernel.acpi/rxsdt") {
|
||||
+ Ok(data) => data.into(),
|
||||
+ Err(err) => {
|
||||
+ log::error!("acpid: failed to read `/scheme/kernel.acpi/rxsdt`: {}", err);
|
||||
+ process::exit(1);
|
||||
+ }
|
||||
+ };
|
||||
|
||||
if rxsdt_raw_data.is_empty() {
|
||||
log::info!("System doesn't use ACPI");
|
||||
daemon.ready();
|
||||
- std::process::exit(0);
|
||||
+ process::exit(0);
|
||||
}
|
||||
|
||||
- let sdt = self::acpi::Sdt::new(rxsdt_raw_data).expect("acpid: failed to parse [RX]SDT");
|
||||
+ // Root-table policy: if the kernel-provided [R|X]SDT is malformed, acpid cannot enumerate any
|
||||
+ // firmware tables at all. That is fatal to this daemon, but it must fail with a logged exit
|
||||
+ // rather than a panic on malformed firmware input.
|
||||
+ let sdt = match self::acpi::Sdt::new(rxsdt_raw_data) {
|
||||
+ Ok(sdt) => sdt,
|
||||
+ Err(err) => {
|
||||
+ log::error!("acpid: failed to parse kernel [R|X]SDT: {}", err);
|
||||
+ process::exit(1);
|
||||
+ }
|
||||
+ };
|
||||
+
|
||||
+ // AML bootstrap contract:
|
||||
+ // - preferred path: RSDP_ADDR[/RSDP_SIZE] inherited into acpid by the boot path,
|
||||
+ // - x86 fallback: bounded BIOS RSDP search when that explicit handoff is absent or unusable.
|
||||
+ let aml_bootstrap = match self::acpi::AmlBootstrap::from_env() {
|
||||
+ Ok(bootstrap) => {
|
||||
+ bootstrap.log_bootstrap();
|
||||
+ Some(bootstrap)
|
||||
+ }
|
||||
+ Err(err) => {
|
||||
+ log::warn!(
|
||||
+ "acpid: explicit AML bootstrap handoff unavailable ({}); trying x86 BIOS fallback",
|
||||
+ err
|
||||
+ );
|
||||
|
||||
- let mut thirty_two_bit;
|
||||
- let mut sixty_four_bit;
|
||||
+ match self::acpi::AmlBootstrap::x86_bios_fallback() {
|
||||
+ Ok(Some(bootstrap)) => {
|
||||
+ bootstrap.log_bootstrap();
|
||||
+ Some(bootstrap)
|
||||
+ }
|
||||
+ Ok(None) => {
|
||||
+ log::warn!(
|
||||
+ "acpid: AML bootstrap unavailable; continuing without AML-backed ACPI services"
|
||||
+ );
|
||||
+ None
|
||||
+ }
|
||||
+ Err(err) => {
|
||||
+ log::warn!(
|
||||
+ "acpid: x86 BIOS AML bootstrap fallback failed ({}); continuing without AML-backed ACPI services",
|
||||
+ err
|
||||
+ );
|
||||
+ None
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+ };
|
||||
|
||||
- let physaddrs_iter = match &sdt.signature {
|
||||
+ let physaddrs = match &sdt.signature {
|
||||
b"RSDT" => {
|
||||
- thirty_two_bit = sdt
|
||||
- .data()
|
||||
- .chunks(mem::size_of::<u32>())
|
||||
- // TODO: With const generics, the compiler has some way of doing this for static sizes.
|
||||
- .map(|chunk| <[u8; mem::size_of::<u32>()]>::try_from(chunk).unwrap())
|
||||
- .map(|chunk| u32::from_le_bytes(chunk))
|
||||
- .map(u64::from);
|
||||
-
|
||||
- &mut thirty_two_bit as &mut dyn Iterator<Item = u64>
|
||||
+ let chunks = sdt.data().chunks_exact(mem::size_of::<u32>());
|
||||
+ if !chunks.remainder().is_empty() {
|
||||
+ log::error!("acpid: malformed RSDT payload length {}", sdt.data().len());
|
||||
+ process::exit(1);
|
||||
+ }
|
||||
+
|
||||
+ chunks
|
||||
+ .map(|chunk| {
|
||||
+ let chunk = <[u8; mem::size_of::<u32>()]>::try_from(chunk)
|
||||
+ .map_err(|_| "invalid 32-bit RSDT entry width")?;
|
||||
+ Ok(u64::from(u32::from_le_bytes(chunk)))
|
||||
+ })
|
||||
+ .collect::<Result<Vec<u64>, &str>>()
|
||||
}
|
||||
b"XSDT" => {
|
||||
- sixty_four_bit = sdt
|
||||
- .data()
|
||||
- .chunks(mem::size_of::<u64>())
|
||||
- .map(|chunk| <[u8; mem::size_of::<u64>()]>::try_from(chunk).unwrap())
|
||||
- .map(|chunk| u64::from_le_bytes(chunk));
|
||||
+ let chunks = sdt.data().chunks_exact(mem::size_of::<u64>());
|
||||
+ if !chunks.remainder().is_empty() {
|
||||
+ log::error!("acpid: malformed XSDT payload length {}", sdt.data().len());
|
||||
+ process::exit(1);
|
||||
+ }
|
||||
|
||||
- &mut sixty_four_bit as &mut dyn Iterator<Item = u64>
|
||||
+ chunks
|
||||
+ .map(|chunk| {
|
||||
+ let chunk = <[u8; mem::size_of::<u64>()]>::try_from(chunk)
|
||||
+ .map_err(|_| "invalid 64-bit XSDT entry width")?;
|
||||
+ Ok(u64::from_le_bytes(chunk))
|
||||
+ })
|
||||
+ .collect::<Result<Vec<u64>, &str>>()
|
||||
+ }
|
||||
+ _ => {
|
||||
+ log::error!(
|
||||
+ "acpid: expected kernel root table to be RSDT or XSDT, got {}",
|
||||
+ String::from_utf8_lossy(&sdt.signature)
|
||||
+ );
|
||||
+ process::exit(1);
|
||||
+ }
|
||||
+ };
|
||||
+ let physaddrs = match physaddrs {
|
||||
+ Ok(physaddrs) => physaddrs,
|
||||
+ Err(err) => {
|
||||
+ log::error!("acpid: failed to decode root table pointers: {}", err);
|
||||
+ process::exit(1);
|
||||
}
|
||||
- _ => panic!("acpid: expected [RX]SDT from kernel to be either of those"),
|
||||
};
|
||||
|
||||
let region_handlers: Vec<(RegionSpace, Box<dyn RegionHandler + 'static>)> = vec![
|
||||
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
|
||||
(RegionSpace::EmbeddedControl, Box::new(ec::Ec::new())),
|
||||
];
|
||||
- let acpi_context = self::acpi::AcpiContext::init(physaddrs_iter, region_handlers);
|
||||
+ let acpi_context = self::acpi::AcpiContext::init(physaddrs.into_iter(), aml_bootstrap, region_handlers);
|
||||
|
||||
// TODO: I/O permission bitmap?
|
||||
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
|
||||
- common::acquire_port_io_rights().expect("acpid: failed to set I/O privilege level to Ring 3");
|
||||
+ if let Err(err) = common::acquire_port_io_rights() {
|
||||
+ log::error!(
|
||||
+ "acpid: failed to set I/O privilege level to Ring 3: {}",
|
||||
+ err
|
||||
+ );
|
||||
+ process::exit(1);
|
||||
+ }
|
||||
|
||||
- let shutdown_pipe = File::open("/scheme/kernel.acpi/kstop")
|
||||
- .expect("acpid: failed to open `/scheme/kernel.acpi/kstop`");
|
||||
+ let shutdown_pipe = match File::open("/scheme/kernel.acpi/kstop") {
|
||||
+ Ok(file) => file,
|
||||
+ Err(err) => {
|
||||
+ log::error!("acpid: failed to open `/scheme/kernel.acpi/kstop`: {}", err);
|
||||
+ process::exit(1);
|
||||
+ }
|
||||
+ };
|
||||
|
||||
- let mut event_queue = RawEventQueue::new().expect("acpid: failed to create event queue");
|
||||
- let socket = Socket::nonblock().expect("acpid: failed to create disk scheme");
|
||||
+ let mut event_queue = match RawEventQueue::new() {
|
||||
+ Ok(event_queue) => event_queue,
|
||||
+ Err(err) => {
|
||||
+ log::error!("acpid: failed to create event queue: {}", err);
|
||||
+ process::exit(1);
|
||||
+ }
|
||||
+ };
|
||||
+ let socket = match Socket::nonblock() {
|
||||
+ Ok(socket) => socket,
|
||||
+ Err(err) => {
|
||||
+ log::error!("acpid: failed to create acpi scheme socket: {}", err);
|
||||
+ process::exit(1);
|
||||
+ }
|
||||
+ };
|
||||
|
||||
let mut scheme = self::scheme::AcpiScheme::new(&acpi_context, &socket);
|
||||
let mut handler = Blocking::new(&socket, 16);
|
||||
|
||||
- event_queue
|
||||
- .subscribe(shutdown_pipe.as_raw_fd() as usize, 0, EventFlags::READ)
|
||||
- .expect("acpid: failed to register shutdown pipe for event queue");
|
||||
- event_queue
|
||||
- .subscribe(socket.inner().raw(), 1, EventFlags::READ)
|
||||
- .expect("acpid: failed to register scheme socket for event queue");
|
||||
+ if let Err(err) = event_queue.subscribe(shutdown_pipe.as_raw_fd() as usize, 0, EventFlags::READ)
|
||||
+ {
|
||||
+ log::error!(
|
||||
+ "acpid: failed to register shutdown pipe for event queue: {}",
|
||||
+ err
|
||||
+ );
|
||||
+ process::exit(1);
|
||||
+ }
|
||||
+ if let Err(err) = event_queue.subscribe(socket.inner().raw(), 1, EventFlags::READ) {
|
||||
+ log::error!(
|
||||
+ "acpid: failed to register scheme socket for event queue: {}",
|
||||
+ err
|
||||
+ );
|
||||
+ process::exit(1);
|
||||
+ }
|
||||
|
||||
- register_sync_scheme(&socket, "acpi", &mut scheme)
|
||||
- .expect("acpid: failed to register acpi scheme to namespace");
|
||||
+ if let Err(err) = register_sync_scheme(&socket, "acpi", &mut scheme) {
|
||||
+ log::error!("acpid: failed to register acpi scheme to namespace: {}", err);
|
||||
+ process::exit(1);
|
||||
+ }
|
||||
|
||||
daemon.ready();
|
||||
|
||||
- libredox::call::setrens(0, 0).expect("acpid: failed to enter null namespace");
|
||||
+ if let Err(err) = libredox::call::setrens(0, 0) {
|
||||
+ log::error!("acpid: failed to enter null namespace: {}", err);
|
||||
+ process::exit(1);
|
||||
+ }
|
||||
|
||||
let mut mounted = true;
|
||||
while mounted {
|
||||
- let Some(event) = event_queue
|
||||
- .next()
|
||||
- .transpose()
|
||||
- .expect("acpid: failed to read event file")
|
||||
- else {
|
||||
+ let event = match event_queue.next().transpose() {
|
||||
+ Ok(event) => event,
|
||||
+ Err(err) => {
|
||||
+ log::error!("acpid: failed to read event file: {}", err);
|
||||
+ process::exit(1);
|
||||
+ }
|
||||
+ };
|
||||
+ let Some(event) = event else {
|
||||
break;
|
||||
};
|
||||
|
||||
if event.fd == socket.inner().raw() {
|
||||
loop {
|
||||
- match handler
|
||||
- .process_requests_nonblocking(&mut scheme)
|
||||
- .expect("acpid: failed to process requests")
|
||||
- {
|
||||
+ match match handler.process_requests_nonblocking(&mut scheme) {
|
||||
+ Ok(flow) => flow,
|
||||
+ Err(err) => {
|
||||
+ log::error!("acpid: failed to process requests: {}", err);
|
||||
+ process::exit(1);
|
||||
+ }
|
||||
+ } {
|
||||
ControlFlow::Continue(()) => {}
|
||||
ControlFlow::Break(()) => break,
|
||||
}
|
||||
diff --git a/drivers/acpid/src/scheme.rs b/drivers/acpid/src/scheme.rs
|
||||
index 5a5040c3..6e57624a 100644
|
||||
--- a/drivers/acpid/src/scheme.rs
|
||||
+++ b/drivers/acpid/src/scheme.rs
|
||||
@@ -474,6 +474,8 @@ impl SchemeSync for AcpiScheme<'_, '_> {
|
||||
return Err(Error::new(EINVAL));
|
||||
} else {
|
||||
self.pci_fd = Some(new_fd);
|
||||
+ self.ctx
|
||||
+ .prime_shutdown_s5(self.pci_fd.as_ref(), "PCI-backed AML handoff");
|
||||
}
|
||||
|
||||
Ok(num_fds)
|
||||
@@ -0,0 +1,398 @@
|
||||
--- a/drivers/pcid/src/cfg_access/mod.rs
|
||||
+++ b/drivers/pcid/src/cfg_access/mod.rs
|
||||
@@ -349,6 +349,10 @@
|
||||
let bus_addr = self.bus_addr(address.segment(), address.bus())?;
|
||||
Some(unsafe { bus_addr.add(Self::bus_addr_offset_in_dwords(address, offset)) })
|
||||
}
|
||||
+
|
||||
+ pub fn has_extended_config(&self, address: PciAddress) -> bool {
|
||||
+ self.mmio_addr(address, 0x100).is_some()
|
||||
+ }
|
||||
}
|
||||
|
||||
impl ConfigRegionAccess for Pcie {
|
||||
--- a/drivers/pcid/src/scheme.rs
|
||||
+++ b/drivers/pcid/src/scheme.rs
|
||||
@@ -5,12 +5,61 @@
|
||||
use redox_scheme::{CallerCtx, OpenResult};
|
||||
use scheme_utils::HandleMap;
|
||||
use syscall::dirent::{DirEntry, DirentBuf, DirentKind};
|
||||
-use syscall::error::{Error, Result, EACCES, EBADF, EINVAL, EIO, EISDIR, ENOENT, ENOTDIR, EALREADY};
|
||||
+use syscall::error::{
|
||||
+ Error, Result, EACCES, EALREADY, EBADF, EINVAL, EIO, EISDIR, ENOENT, ENOTDIR, EROFS,
|
||||
+};
|
||||
use syscall::flag::{MODE_CHR, MODE_DIR, O_DIRECTORY, O_STAT};
|
||||
use syscall::schemev2::NewFdFlags;
|
||||
use syscall::ENOLCK;
|
||||
|
||||
use crate::cfg_access::Pcie;
|
||||
+
|
||||
+const PCIE_EXTENDED_CAPABILITY_AER: u16 = 0x0001;
|
||||
+
|
||||
+#[derive(Clone, Copy)]
|
||||
+enum AerRegisterName {
|
||||
+ UncorStatus,
|
||||
+ UncorMask,
|
||||
+ UncorSeverity,
|
||||
+ CorStatus,
|
||||
+ CorMask,
|
||||
+ Cap,
|
||||
+ HeaderLog,
|
||||
+}
|
||||
+
|
||||
+impl AerRegisterName {
|
||||
+ fn from_path(path: &str) -> Option<Self> {
|
||||
+ Some(match path {
|
||||
+ "uncor_status" => Self::UncorStatus,
|
||||
+ "uncor_mask" => Self::UncorMask,
|
||||
+ "uncor_severity" => Self::UncorSeverity,
|
||||
+ "cor_status" => Self::CorStatus,
|
||||
+ "cor_mask" => Self::CorMask,
|
||||
+ "cap" => Self::Cap,
|
||||
+ "header_log" => Self::HeaderLog,
|
||||
+ _ => return None,
|
||||
+ })
|
||||
+ }
|
||||
+
|
||||
+ const fn offset(self) -> u16 {
|
||||
+ match self {
|
||||
+ Self::UncorStatus => 0x00,
|
||||
+ Self::UncorMask => 0x04,
|
||||
+ Self::UncorSeverity => 0x08,
|
||||
+ Self::CorStatus => 0x0C,
|
||||
+ Self::CorMask => 0x10,
|
||||
+ Self::Cap => 0x14,
|
||||
+ Self::HeaderLog => 0x18,
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ const fn len(self) -> usize {
|
||||
+ match self {
|
||||
+ Self::HeaderLog => 16,
|
||||
+ _ => 4,
|
||||
+ }
|
||||
+ }
|
||||
+}
|
||||
|
||||
pub struct PciScheme {
|
||||
handles: HandleMap<HandleWrapper>,
|
||||
@@ -20,13 +69,27 @@
|
||||
binds: HashMap<String, u32>,
|
||||
}
|
||||
enum Handle {
|
||||
- TopLevel { entries: Vec<String> },
|
||||
+ TopLevel {
|
||||
+ entries: Vec<String>,
|
||||
+ },
|
||||
Access,
|
||||
- Device,
|
||||
- Channel { addr: PciAddress, st: ChannelState },
|
||||
+ Device {
|
||||
+ addr: PciAddress,
|
||||
+ },
|
||||
+ Channel {
|
||||
+ addr: PciAddress,
|
||||
+ st: ChannelState,
|
||||
+ },
|
||||
SchemeRoot,
|
||||
/// Represents an open handle to a device's bind endpoint
|
||||
- Bind { addr: PciAddress },
|
||||
+ Bind {
|
||||
+ addr: PciAddress,
|
||||
+ },
|
||||
+ AerDir,
|
||||
+ Aer {
|
||||
+ addr: PciAddress,
|
||||
+ register: AerRegisterName,
|
||||
+ },
|
||||
/// Uevent surface for hotplug consumers. Opening uevent returns an object
|
||||
/// from which device add/remove events can be read. Since pcid currently
|
||||
/// only scans at startup, this surface is ready for hotplug polling consumers.
|
||||
@@ -38,13 +101,23 @@
|
||||
}
|
||||
impl Handle {
|
||||
fn is_file(&self) -> bool {
|
||||
- matches!(self, Self::Access | Self::Channel { .. } | Self::Bind { .. } | Self::Uevent)
|
||||
+ matches!(
|
||||
+ self,
|
||||
+ Self::Access
|
||||
+ | Self::Channel { .. }
|
||||
+ | Self::Bind { .. }
|
||||
+ | Self::Aer { .. }
|
||||
+ | Self::Uevent
|
||||
+ )
|
||||
}
|
||||
fn is_dir(&self) -> bool {
|
||||
!self.is_file()
|
||||
}
|
||||
fn requires_root(&self) -> bool {
|
||||
- matches!(self, Self::Access | Self::Channel { .. } | Self::Bind { .. })
|
||||
+ matches!(
|
||||
+ self,
|
||||
+ Self::Access | Self::Channel { .. } | Self::Bind { .. }
|
||||
+ )
|
||||
}
|
||||
fn is_scheme_root(&self) -> bool {
|
||||
matches!(self, Self::SchemeRoot)
|
||||
@@ -57,6 +130,16 @@
|
||||
}
|
||||
|
||||
const DEVICE_CONTENTS: &[&str] = &["channel", "bind"];
|
||||
+const DEVICE_AER_CONTENTS: &[&str] = &["channel", "bind", "aer"];
|
||||
+const AER_CONTENTS: &[&str] = &[
|
||||
+ "uncor_status",
|
||||
+ "uncor_mask",
|
||||
+ "uncor_severity",
|
||||
+ "cor_status",
|
||||
+ "cor_mask",
|
||||
+ "cap",
|
||||
+ "header_log",
|
||||
+];
|
||||
|
||||
impl PciScheme {
|
||||
pub fn access(&mut self) -> usize {
|
||||
@@ -141,7 +224,12 @@
|
||||
|
||||
let (len, mode) = match handle.inner {
|
||||
Handle::TopLevel { ref entries } => (entries.len(), MODE_DIR | 0o755),
|
||||
- Handle::Device => (DEVICE_CONTENTS.len(), MODE_DIR | 0o755),
|
||||
+ Handle::Device { addr } => (
|
||||
+ Self::device_entries(&self.pcie, addr).len(),
|
||||
+ MODE_DIR | 0o755,
|
||||
+ ),
|
||||
+ Handle::AerDir => (AER_CONTENTS.len(), MODE_DIR | 0o755),
|
||||
+ Handle::Aer { register, .. } => (register.len(), MODE_CHR | 0o444),
|
||||
Handle::Access | Handle::Channel { .. } | Handle::Bind { .. } => (0, MODE_CHR | 0o600),
|
||||
Handle::Uevent => (0, MODE_CHR | 0o644),
|
||||
Handle::SchemeRoot => return Err(Error::new(EBADF)),
|
||||
@@ -154,7 +242,7 @@
|
||||
&mut self,
|
||||
id: usize,
|
||||
buf: &mut [u8],
|
||||
- _offset: u64,
|
||||
+ offset: u64,
|
||||
_fcntl_flags: u32,
|
||||
_ctx: &CallerCtx,
|
||||
) -> Result<usize> {
|
||||
@@ -166,11 +254,14 @@
|
||||
|
||||
match handle.inner {
|
||||
Handle::TopLevel { .. } => Err(Error::new(EISDIR)),
|
||||
- Handle::Device => Err(Error::new(EISDIR)),
|
||||
+ Handle::Device { .. } | Handle::AerDir => Err(Error::new(EISDIR)),
|
||||
Handle::Channel {
|
||||
addr: _,
|
||||
ref mut st,
|
||||
} => Self::read_channel(st, buf),
|
||||
+ Handle::Aer { addr, register } => {
|
||||
+ Self::read_aer_register(&self.pcie, addr, register, buf, offset)
|
||||
+ }
|
||||
Handle::Uevent => {
|
||||
// Uevent surface is ready for hotplug polling consumers.
|
||||
// pcid currently only scans at startup, so return empty (EAGAIN would indicate no data available).
|
||||
@@ -209,8 +300,15 @@
|
||||
}
|
||||
return Ok(buf);
|
||||
}
|
||||
- Handle::Device => DEVICE_CONTENTS,
|
||||
- Handle::Access | Handle::Channel { .. } | Handle::Bind { .. } | Handle::Uevent => return Err(Error::new(ENOTDIR)),
|
||||
+ Handle::Device { addr } => Self::device_entries(&self.pcie, addr),
|
||||
+ Handle::AerDir => AER_CONTENTS,
|
||||
+ Handle::Access
|
||||
+ | Handle::Channel { .. }
|
||||
+ | Handle::Bind { .. }
|
||||
+ | Handle::Aer { .. }
|
||||
+ | Handle::Uevent => {
|
||||
+ return Err(Error::new(ENOTDIR));
|
||||
+ }
|
||||
Handle::SchemeRoot => return Err(Error::new(EBADF)),
|
||||
};
|
||||
|
||||
@@ -243,6 +341,7 @@
|
||||
Handle::Channel { addr, ref mut st } => {
|
||||
Self::write_channel(&self.pcie, &mut self.tree, addr, st, buf)
|
||||
}
|
||||
+ Handle::Aer { .. } => Err(Error::new(EROFS)),
|
||||
|
||||
_ => Err(Error::new(EBADF)),
|
||||
}
|
||||
@@ -357,45 +456,151 @@
|
||||
binds: HashMap::new(),
|
||||
}
|
||||
}
|
||||
- fn parse_after_pci_addr(&mut self, addr: PciAddress, after: &str, ctx: &CallerCtx) -> Result<Handle> {
|
||||
+ fn device_entries(pcie: &Pcie, addr: PciAddress) -> &'static [&'static str] {
|
||||
+ if Self::find_pcie_extended_capability(pcie, addr, PCIE_EXTENDED_CAPABILITY_AER).is_some() {
|
||||
+ DEVICE_AER_CONTENTS
|
||||
+ } else {
|
||||
+ DEVICE_CONTENTS
|
||||
+ }
|
||||
+ }
|
||||
+ fn find_pcie_extended_capability(
|
||||
+ pcie: &Pcie,
|
||||
+ addr: PciAddress,
|
||||
+ capability_id: u16,
|
||||
+ ) -> Option<u16> {
|
||||
+ if !pcie.has_extended_config(addr) {
|
||||
+ return None;
|
||||
+ }
|
||||
+
|
||||
+ let mut offset = 0x100_u16;
|
||||
+
|
||||
+ while offset <= 0xFFC {
|
||||
+ let header = unsafe { pcie.read(addr, offset) };
|
||||
+ if header == 0 || header == u32::MAX {
|
||||
+ return None;
|
||||
+ }
|
||||
+
|
||||
+ if (header & 0xFFFF) as u16 == capability_id {
|
||||
+ return Some(offset);
|
||||
+ }
|
||||
+
|
||||
+ let next = ((header >> 20) & 0xFFF) as u16;
|
||||
+ if next < 0x100 || next <= offset || next > 0xFFC || next % 4 != 0 {
|
||||
+ return None;
|
||||
+ }
|
||||
+ offset = next;
|
||||
+ }
|
||||
+
|
||||
+ None
|
||||
+ }
|
||||
+ fn read_file_bytes(data: &[u8], buf: &mut [u8], offset: u64) -> Result<usize> {
|
||||
+ let Ok(offset) = usize::try_from(offset) else {
|
||||
+ return Ok(0);
|
||||
+ };
|
||||
+ if offset >= data.len() {
|
||||
+ return Ok(0);
|
||||
+ }
|
||||
+
|
||||
+ let count = std::cmp::min(buf.len(), data.len() - offset);
|
||||
+ buf[..count].copy_from_slice(&data[offset..offset + count]);
|
||||
+ Ok(count)
|
||||
+ }
|
||||
+ fn read_aer_register(
|
||||
+ pcie: &Pcie,
|
||||
+ addr: PciAddress,
|
||||
+ register: AerRegisterName,
|
||||
+ buf: &mut [u8],
|
||||
+ offset: u64,
|
||||
+ ) -> Result<usize> {
|
||||
+ let Some(aer_base) =
|
||||
+ Self::find_pcie_extended_capability(pcie, addr, PCIE_EXTENDED_CAPABILITY_AER)
|
||||
+ else {
|
||||
+ return Err(Error::new(ENOENT));
|
||||
+ };
|
||||
+
|
||||
+ let mut data = [0_u8; 16];
|
||||
+ for (index, chunk) in data[..register.len()].chunks_exact_mut(4).enumerate() {
|
||||
+ let index = u16::try_from(index).map_err(|_| Error::new(EIO))?;
|
||||
+ let value = unsafe { pcie.read(addr, aer_base + register.offset() + index * 4) };
|
||||
+ chunk.copy_from_slice(&value.to_le_bytes());
|
||||
+ }
|
||||
+
|
||||
+ Self::read_file_bytes(&data[..register.len()], buf, offset)
|
||||
+ }
|
||||
+ fn parse_after_pci_addr(
|
||||
+ &mut self,
|
||||
+ addr: PciAddress,
|
||||
+ after: &str,
|
||||
+ ctx: &CallerCtx,
|
||||
+ ) -> Result<Handle> {
|
||||
if after.chars().next().map_or(false, |c| c != '/') {
|
||||
return Err(Error::new(ENOENT));
|
||||
}
|
||||
let func = self.tree.get_mut(&addr).ok_or(Error::new(ENOENT))?;
|
||||
|
||||
Ok(if after.is_empty() {
|
||||
- Handle::Device
|
||||
+ Handle::Device { addr }
|
||||
} else {
|
||||
let path = &after[1..];
|
||||
|
||||
- match path {
|
||||
- "channel" => {
|
||||
- if func.enabled {
|
||||
- return Err(Error::new(ENOLCK));
|
||||
+ if path == "aer" {
|
||||
+ if Self::find_pcie_extended_capability(
|
||||
+ &self.pcie,
|
||||
+ addr,
|
||||
+ PCIE_EXTENDED_CAPABILITY_AER,
|
||||
+ )
|
||||
+ .is_none()
|
||||
+ {
|
||||
+ return Err(Error::new(ENOENT));
|
||||
+ }
|
||||
+ Handle::AerDir
|
||||
+ } else if let Some(register_name) = path.strip_prefix("aer/") {
|
||||
+ let register =
|
||||
+ AerRegisterName::from_path(register_name).ok_or(Error::new(ENOENT))?;
|
||||
+ if Self::find_pcie_extended_capability(
|
||||
+ &self.pcie,
|
||||
+ addr,
|
||||
+ PCIE_EXTENDED_CAPABILITY_AER,
|
||||
+ )
|
||||
+ .is_none()
|
||||
+ {
|
||||
+ return Err(Error::new(ENOENT));
|
||||
+ }
|
||||
+ Handle::Aer { addr, register }
|
||||
+ } else {
|
||||
+ match path {
|
||||
+ "channel" => {
|
||||
+ if func.enabled {
|
||||
+ return Err(Error::new(ENOLCK));
|
||||
+ }
|
||||
+ func.inner.legacy_interrupt_line = crate::enable_function(
|
||||
+ &self.pcie,
|
||||
+ &mut func.endpoint_header,
|
||||
+ &mut func.capabilities,
|
||||
+ );
|
||||
+ func.enabled = true;
|
||||
+ Handle::Channel {
|
||||
+ addr,
|
||||
+ st: ChannelState::AwaitingData,
|
||||
+ }
|
||||
}
|
||||
- func.inner.legacy_interrupt_line = crate::enable_function(
|
||||
- &self.pcie,
|
||||
- &mut func.endpoint_header,
|
||||
- &mut func.capabilities,
|
||||
- );
|
||||
- func.enabled = true;
|
||||
- Handle::Channel {
|
||||
- addr,
|
||||
- st: ChannelState::AwaitingData,
|
||||
+ "bind" => {
|
||||
+ let addr_str = format!("{}", addr);
|
||||
+ if let Some(&owner_pid) = self.binds.get(&addr_str) {
|
||||
+ log::info!(
|
||||
+ "pcid: device {} already bound by pid {}",
|
||||
+ addr_str,
|
||||
+ owner_pid
|
||||
+ );
|
||||
+ return Err(Error::new(EALREADY));
|
||||
+ }
|
||||
+ let caller_pid = u32::try_from(ctx.pid).map_err(|_| Error::new(EINVAL))?;
|
||||
+ self.binds.insert(addr_str.clone(), caller_pid);
|
||||
+ log::info!("pcid: device {} bound by pid {}", addr_str, caller_pid);
|
||||
+ Handle::Bind { addr }
|
||||
}
|
||||
- }
|
||||
- "bind" => {
|
||||
- let addr_str = format!("{}", addr);
|
||||
- if let Some(&owner_pid) = self.binds.get(&addr_str) {
|
||||
- log::info!("pcid: device {} already bound by pid {}", addr_str, owner_pid);
|
||||
- return Err(Error::new(EALREADY));
|
||||
- }
|
||||
- let caller_pid = ctx.pid;
|
||||
- self.binds.insert(addr_str.clone(), caller_pid);
|
||||
- log::info!("pcid: device {} bound by pid {}", addr_str, caller_pid);
|
||||
- Handle::Bind { addr }
|
||||
- }
|
||||
- _ => return Err(Error::new(ENOENT)),
|
||||
+ _ => return Err(Error::new(ENOENT)),
|
||||
+ }
|
||||
}
|
||||
})
|
||||
}
|
||||
@@ -0,0 +1,182 @@
|
||||
diff --git a/drivers/pcid/src/scheme.rs b/drivers/pcid/src/scheme.rs
|
||||
index bb9f39a3..06be6267 100644
|
||||
--- a/drivers/pcid/src/scheme.rs
|
||||
+++ b/drivers/pcid/src/scheme.rs
|
||||
@@ -1,11 +1,11 @@
|
||||
-use std::collections::{BTreeMap, VecDeque};
|
||||
+use std::collections::{BTreeMap, HashMap, VecDeque};
|
||||
|
||||
use pci_types::{ConfigRegionAccess, PciAddress};
|
||||
use redox_scheme::scheme::SchemeSync;
|
||||
use redox_scheme::{CallerCtx, OpenResult};
|
||||
use scheme_utils::HandleMap;
|
||||
use syscall::dirent::{DirEntry, DirentBuf, DirentKind};
|
||||
-use syscall::error::{Error, Result, EACCES, EBADF, EINVAL, EIO, EISDIR, ENOENT, ENOTDIR};
|
||||
+use syscall::error::{Error, Result, EACCES, EBADF, EINVAL, EIO, EISDIR, ENOENT, ENOTDIR, EALREADY};
|
||||
use syscall::flag::{MODE_CHR, MODE_DIR, O_DIRECTORY, O_STAT};
|
||||
use syscall::schemev2::NewFdFlags;
|
||||
use syscall::ENOLCK;
|
||||
@@ -16,6 +16,8 @@ pub struct PciScheme {
|
||||
handles: HandleMap<HandleWrapper>,
|
||||
pub pcie: Pcie,
|
||||
pub tree: BTreeMap<PciAddress, crate::Func>,
|
||||
+ /// Maps device address string (e.g. "0000:00:14.0") to owning PID
|
||||
+ binds: HashMap<String, u32>,
|
||||
}
|
||||
enum Handle {
|
||||
TopLevel { entries: Vec<String> },
|
||||
@@ -23,6 +25,12 @@ enum Handle {
|
||||
Device,
|
||||
Channel { addr: PciAddress, st: ChannelState },
|
||||
SchemeRoot,
|
||||
+ /// Represents an open handle to a device's bind endpoint
|
||||
+ Bind { addr: PciAddress },
|
||||
+ /// Uevent surface for hotplug consumers. Opening uevent returns an object
|
||||
+ /// from which device add/remove events can be read. Since pcid currently
|
||||
+ /// only scans at startup, this surface is ready for hotplug polling consumers.
|
||||
+ Uevent,
|
||||
}
|
||||
struct HandleWrapper {
|
||||
inner: Handle,
|
||||
@@ -30,14 +38,13 @@ struct HandleWrapper {
|
||||
}
|
||||
impl Handle {
|
||||
fn is_file(&self) -> bool {
|
||||
- matches!(self, Self::Access | Self::Channel { .. })
|
||||
+ matches!(self, Self::Access | Self::Channel { .. } | Self::Bind { .. } | Self::Uevent)
|
||||
}
|
||||
fn is_dir(&self) -> bool {
|
||||
!self.is_file()
|
||||
}
|
||||
- // TODO: capability rather than root
|
||||
fn requires_root(&self) -> bool {
|
||||
- matches!(self, Self::Access | Self::Channel { .. })
|
||||
+ matches!(self, Self::Access | Self::Channel { .. } | Self::Bind { .. })
|
||||
}
|
||||
fn is_scheme_root(&self) -> bool {
|
||||
matches!(self, Self::SchemeRoot)
|
||||
@@ -49,7 +56,7 @@ enum ChannelState {
|
||||
AwaitingResponseRead(VecDeque<u8>),
|
||||
}
|
||||
|
||||
-const DEVICE_CONTENTS: &[&str] = &["channel"];
|
||||
+const DEVICE_CONTENTS: &[&str] = &["channel", "bind"];
|
||||
|
||||
impl PciScheme {
|
||||
pub fn access(&mut self) -> usize {
|
||||
@@ -88,22 +95,25 @@ impl SchemeSync for PciScheme {
|
||||
let path = path.trim_matches('/');
|
||||
|
||||
let handle = if path.is_empty() {
|
||||
- Handle::TopLevel {
|
||||
- entries: self
|
||||
- .tree
|
||||
- .iter()
|
||||
- // FIXME remove replacement of : once the old scheme format is no longer supported.
|
||||
- .map(|(addr, _)| format!("{}", addr).replace(':', "--"))
|
||||
- .collect::<Vec<_>>(),
|
||||
- }
|
||||
+ let mut entries: Vec<String> = self
|
||||
+ .tree
|
||||
+ .iter()
|
||||
+ // FIXME remove replacement of : once the old scheme format is no longer supported.
|
||||
+ .map(|(addr, _)| format!("{}", addr).replace(':', "--"))
|
||||
+ .collect();
|
||||
+ entries.push(String::from("uevent"));
|
||||
+ entries.push(String::from("access"));
|
||||
+ Handle::TopLevel { entries }
|
||||
} else if path == "access" {
|
||||
Handle::Access
|
||||
+ } else if path == "uevent" {
|
||||
+ Handle::Uevent
|
||||
} else {
|
||||
let idx = path.find('/').unwrap_or(path.len());
|
||||
let (addr_str, after) = path.split_at(idx);
|
||||
let addr = parse_pci_addr(addr_str).ok_or(Error::new(ENOENT))?;
|
||||
|
||||
- self.parse_after_pci_addr(addr, after)?
|
||||
+ self.parse_after_pci_addr(addr, after, ctx)?
|
||||
};
|
||||
|
||||
let stat = flags & O_STAT != 0;
|
||||
@@ -132,7 +142,8 @@ impl SchemeSync for PciScheme {
|
||||
let (len, mode) = match handle.inner {
|
||||
Handle::TopLevel { ref entries } => (entries.len(), MODE_DIR | 0o755),
|
||||
Handle::Device => (DEVICE_CONTENTS.len(), MODE_DIR | 0o755),
|
||||
- Handle::Access | Handle::Channel { .. } => (0, MODE_CHR | 0o600),
|
||||
+ Handle::Access | Handle::Channel { .. } | Handle::Bind { .. } => (0, MODE_CHR | 0o600),
|
||||
+ Handle::Uevent => (0, MODE_CHR | 0o644),
|
||||
Handle::SchemeRoot => return Err(Error::new(EBADF)),
|
||||
};
|
||||
stat.st_size = len as u64;
|
||||
@@ -160,7 +171,13 @@ impl SchemeSync for PciScheme {
|
||||
addr: _,
|
||||
ref mut st,
|
||||
} => Self::read_channel(st, buf),
|
||||
- Handle::SchemeRoot => Err(Error::new(EBADF)),
|
||||
+ Handle::Uevent => {
|
||||
+ // Uevent surface is ready for hotplug polling consumers.
|
||||
+ // pcid currently only scans at startup, so return empty (EAGAIN would indicate no data available).
|
||||
+ // Consumers can poll and re-read to check for new events.
|
||||
+ Ok(0)
|
||||
+ }
|
||||
+ Handle::SchemeRoot | Handle::Bind { .. } => Err(Error::new(EBADF)),
|
||||
_ => Err(Error::new(EBADF)),
|
||||
}
|
||||
}
|
||||
@@ -193,7 +210,7 @@ impl SchemeSync for PciScheme {
|
||||
return Ok(buf);
|
||||
}
|
||||
Handle::Device => DEVICE_CONTENTS,
|
||||
- Handle::Access | Handle::Channel { .. } => return Err(Error::new(ENOTDIR)),
|
||||
+ Handle::Access | Handle::Channel { .. } | Handle::Bind { .. } | Handle::Uevent => return Err(Error::new(ENOTDIR)),
|
||||
Handle::SchemeRoot => return Err(Error::new(EBADF)),
|
||||
};
|
||||
|
||||
@@ -316,6 +333,16 @@ impl SchemeSync for PciScheme {
|
||||
func.enabled = false;
|
||||
}
|
||||
}
|
||||
+ Some(HandleWrapper {
|
||||
+ inner: Handle::Bind { addr },
|
||||
+ ..
|
||||
+ }) => {
|
||||
+ let addr_str = format!("{}", addr);
|
||||
+ if let Some(&owner_pid) = self.binds.get(&addr_str) {
|
||||
+ log::info!("pcid: device {} unbound by pid {}", addr_str, owner_pid);
|
||||
+ }
|
||||
+ self.binds.remove(&addr_str);
|
||||
+ }
|
||||
_ => {}
|
||||
}
|
||||
}
|
||||
@@ -327,9 +354,10 @@ impl PciScheme {
|
||||
handles: HandleMap::new(),
|
||||
pcie,
|
||||
tree: BTreeMap::new(),
|
||||
+ binds: HashMap::new(),
|
||||
}
|
||||
}
|
||||
- fn parse_after_pci_addr(&mut self, addr: PciAddress, after: &str) -> Result<Handle> {
|
||||
+ fn parse_after_pci_addr(&mut self, addr: PciAddress, after: &str, ctx: &CallerCtx) -> Result<Handle> {
|
||||
if after.chars().next().map_or(false, |c| c != '/') {
|
||||
return Err(Error::new(ENOENT));
|
||||
}
|
||||
@@ -356,6 +384,17 @@ impl PciScheme {
|
||||
st: ChannelState::AwaitingData,
|
||||
}
|
||||
}
|
||||
+ "bind" => {
|
||||
+ let addr_str = format!("{}", addr);
|
||||
+ if let Some(&owner_pid) = self.binds.get(&addr_str) {
|
||||
+ log::info!("pcid: device {} already bound by pid {}", addr_str, owner_pid);
|
||||
+ return Err(Error::new(EALREADY));
|
||||
+ }
|
||||
+ let caller_pid = ctx.pid;
|
||||
+ self.binds.insert(addr_str.clone(), caller_pid);
|
||||
+ log::info!("pcid: device {} bound by pid {}", addr_str, caller_pid);
|
||||
+ Handle::Bind { addr }
|
||||
+ }
|
||||
_ => return Err(Error::new(ENOENT)),
|
||||
}
|
||||
})
|
||||
File diff suppressed because it is too large
File diff suppressed because it is too large
@@ -0,0 +1,13 @@
diff --git a/src/context/mod.rs b/src/context/mod.rs
index 37c73f5..4f5d60f 100644
--- a/src/context/mod.rs
+++ b/src/context/mod.rs
@@ -22,7 +22,7 @@ use crate::{

use self::context::Kstack;
pub use self::{
-    context::{BorrowedHtBuf, Context, Status},
+    context::{BorrowedHtBuf, Context, SchedPolicy, Status},
    switch::switch,
};

@@ -0,0 +1,152 @@
|
||||
diff --git a/src/scheme/proc.rs b/src/scheme/proc.rs
|
||||
index 47588e1..6578761 100644
|
||||
--- a/src/scheme/proc.rs
|
||||
+++ b/src/scheme/proc.rs
|
||||
@@ -1,7 +1,7 @@
|
||||
use crate::{
|
||||
context::{
|
||||
self,
|
||||
- context::{HardBlockedReason, LockedFdTbl, SignalState},
|
||||
+ context::{HardBlockedReason, LockedFdTbl, SchedPolicy, SignalState},
|
||||
file::InternalFlags,
|
||||
memory::{handle_notify_files, AddrSpace, AddrSpaceWrapper, Grant, PageSpan},
|
||||
Context, ContextLock, Status,
|
||||
@@ -105,6 +105,7 @@ enum ContextHandle {
|
||||
// Attr handles, to set ens/euid/egid/pid.
|
||||
Authority,
|
||||
Attr,
|
||||
+ Groups,
|
||||
|
||||
Status {
|
||||
privileged: bool,
|
||||
@@ -145,6 +146,7 @@ enum ContextHandle {
|
||||
// directory.
|
||||
OpenViaDup,
|
||||
SchedAffinity,
|
||||
+ SchedPolicy,
|
||||
|
||||
MmapMinAddr(Arc<AddrSpaceWrapper>),
|
||||
}
|
||||
@@ -249,6 +251,9 @@ impl ProcScheme {
|
||||
false,
|
||||
),
|
||||
"sched-affinity" => (ContextHandle::SchedAffinity, true),
|
||||
+ // TODO: Switch this kernel-local proc handle over to a stable upstream
|
||||
+ // redox_syscall ProcCall::SetSchedPolicy opcode once that lands.
|
||||
+ "sched-policy" => (ContextHandle::SchedPolicy, false),
|
||||
"status" => (ContextHandle::Status { privileged: false }, false),
|
||||
_ if path.starts_with("auth-") => {
|
||||
let nonprefix = &path["auth-".len()..];
|
||||
@@ -261,6 +266,7 @@ impl ProcScheme {
|
||||
let handle = match actual_name {
|
||||
"attrs" => ContextHandle::Attr,
|
||||
"status" => ContextHandle::Status { privileged: true },
|
||||
+ "groups" => ContextHandle::Groups,
|
||||
_ => return Err(Error::new(ENOENT)),
|
||||
};
|
||||
|
||||
@@ -306,6 +312,11 @@ impl ProcScheme {
|
||||
let id = NonZeroUsize::new(NEXT_ID.fetch_add(1, Ordering::Relaxed))
|
||||
.ok_or(Error::new(EMFILE))?;
|
||||
let context = context::spawn(true, Some(id), ret, token)?;
|
||||
+ {
|
||||
+ let parent_groups =
|
||||
+ context::current().read(token.token()).groups.clone();
|
||||
+ context.write(token.token()).groups = parent_groups;
|
||||
+ }
|
||||
HANDLES.write(token.token()).insert(
|
||||
id.get(),
|
||||
Handle {
|
||||
@@ -1165,6 +1176,20 @@ impl ContextHandle {
|
||||
|
||||
Ok(size_of_val(&mask))
|
||||
}
|
||||
+ Self::SchedPolicy => {
|
||||
+ if buf.len() != 2 {
|
||||
+ return Err(Error::new(EINVAL));
|
||||
+ }
|
||||
+
|
||||
+ let [policy, rt_priority] = unsafe { buf.read_exact::<[u8; 2]>()? };
|
||||
+ let sched_policy = SchedPolicy::try_from_raw(policy).ok_or(Error::new(EINVAL))?;
|
||||
+
|
||||
+ context
|
||||
+ .write(token.token())
|
||||
+ .set_sched_policy(sched_policy, rt_priority);
|
||||
+
|
||||
+ Ok(2)
|
||||
+ }
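A hedged userspace sketch of driving this handle: the payload is exactly the two bytes the `read_exact::<[u8; 2]>` above expects, `[policy, rt_priority]`, with the kernel-local encoding 0 = FIFO, 1 = round-robin, 2 = other; the proc-scheme path format passed in is an assumption:

```rust
use std::{fs::OpenOptions, io::Write};

fn set_round_robin(context_dir: &str, rt_priority: u8) -> std::io::Result<()> {
    let mut handle = OpenOptions::new()
        .write(true)
        .open(format!("{context_dir}/sched-policy"))?;
    // 1 = RoundRobin in this kernel's SchedPolicy encoding.
    handle.write_all(&[1u8, rt_priority])?;
    Ok(())
}
```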
|
||||
ContextHandle::Status { privileged } => {
|
||||
let mut args = buf.usizes();
|
||||
|
||||
@@ -1268,9 +1293,42 @@ impl ContextHandle {
|
||||
guard.pid = info.pid as usize;
|
||||
guard.euid = info.euid;
|
||||
guard.egid = info.egid;
|
||||
- guard.prio = (info.prio as usize).min(39);
|
||||
+ guard.set_sched_other_prio(info.prio as usize);
|
||||
Ok(size_of::<ProcSchemeAttrs>())
|
||||
}
|
||||
+ Self::Groups => {
|
||||
+ const NGROUPS_MAX: usize = 65536;
|
||||
+ if buf.len() % size_of::<u32>() != 0 {
|
||||
+ return Err(Error::new(EINVAL));
|
||||
+ }
|
||||
+ let count = buf.len() / size_of::<u32>();
|
||||
+ if count > NGROUPS_MAX {
|
||||
+ return Err(Error::new(EINVAL));
|
||||
+ }
|
||||
+ let mut groups = Vec::with_capacity(count);
|
||||
+ for chunk in buf.in_exact_chunks(size_of::<u32>()).take(count) {
|
||||
+ groups.push(chunk.read_u32()?);
|
||||
+ }
|
||||
+ let proc_id = {
|
||||
+ let guard = context.read(token.token());
|
||||
+ guard.owner_proc_id
|
||||
+ };
|
||||
+ {
|
||||
+ let mut guard = context.write(token.token());
|
||||
+ guard.groups = groups.clone();
|
||||
+ }
|
||||
+ if let Some(pid) = proc_id {
|
||||
+ let mut contexts = context::contexts(token.downgrade());
|
||||
+ let (contexts, mut t) = contexts.token_split();
|
||||
+ for context_ref in contexts.iter() {
|
||||
+ let mut ctx = context_ref.write(t.token());
|
||||
+ if ctx.owner_proc_id == Some(pid) {
|
||||
+ ctx.groups = groups.clone();
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+ Ok(count * size_of::<u32>())
|
||||
+ }
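The matching caller side, sketched under the assumption that the `groups` attr node (reached via the `auth-` prefixed path above) is opened writable: group IDs are packed as native-endian u32s, and the kernel fans the new list out to every context owned by the same process:

```rust
use std::{fs::OpenOptions, io::Write};

fn set_supplementary_groups(groups_path: &str, gids: &[u32]) -> std::io::Result<()> {
    let mut payload = Vec::with_capacity(gids.len() * 4);
    for gid in gids {
        payload.extend_from_slice(&gid.to_ne_bytes());
    }
    OpenOptions::new().write(true).open(groups_path)?.write_all(&payload)
}
```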
|
||||
ContextHandle::OpenViaDup => {
|
||||
let mut args = buf.usizes();
|
||||
|
||||
@@ -1427,6 +1485,11 @@ impl ContextHandle {
|
||||
|
||||
buf.copy_exactly(crate::cpu_set::mask_as_bytes(&mask))?;
|
||||
Ok(size_of_val(&mask))
|
||||
+ }
|
||||
+ ContextHandle::SchedPolicy => {
|
||||
+ let context = context.read(token.token());
|
||||
+ let data = [context.sched_policy as u8, context.sched_rt_priority];
|
||||
+ buf.copy_common_bytes_from_slice(&data)
|
||||
} // TODO: Replace write() with SYS_SENDFD?
|
||||
ContextHandle::Status { .. } => {
|
||||
let status = {
|
||||
@@ -1475,6 +1538,15 @@ impl ContextHandle {
|
||||
debug_name,
|
||||
})
|
||||
}
|
||||
+ Self::Groups => {
|
||||
+ let c = &context.read(token.token());
|
||||
+ let max = buf.len() / size_of::<u32>();
|
||||
+ let count = c.groups.len().min(max);
|
||||
+ for (chunk, gid) in buf.in_exact_chunks(size_of::<u32>()).zip(&c.groups).take(count) {
|
||||
+ chunk.copy_from_slice(&gid.to_ne_bytes())?;
|
||||
+ }
|
||||
+ Ok(count * size_of::<u32>())
|
||||
+ }
|
||||
ContextHandle::Sighandler => {
|
||||
let data = match context.read(token.token()).sig {
|
||||
Some(ref sig) => SetSighandlerData {
|
||||
@@ -0,0 +1,176 @@
|
||||
diff --git a/src/context/context.rs b/src/context/context.rs
|
||||
index c97c516..8a8b078 100644
|
||||
--- a/src/context/context.rs
|
||||
+++ b/src/context/context.rs
|
||||
@@ -18,7 +18,8 @@ use crate::{
|
||||
cpu_stats,
|
||||
ipi::{ipi, IpiKind, IpiTarget},
|
||||
memory::{
|
||||
- allocate_p2frame, deallocate_p2frame, Enomem, Frame, RaiiFrame, RmmA, RmmArch, PAGE_SIZE,
|
||||
+ allocate_p2frame, deallocate_p2frame, Enomem, Frame, PhysicalAddress, RaiiFrame, RmmA,
|
||||
+ RmmArch, PAGE_SIZE,
|
||||
},
|
||||
percpu::PercpuBlock,
|
||||
scheme::{CallerCtx, FileHandle, SchemeId},
|
||||
@@ -62,6 +63,38 @@ impl Status {
|
||||
}
|
||||
}
|
||||
|
||||
+pub const SCHED_PRIORITY_LEVELS: usize = 40;
|
||||
+pub const DEFAULT_SCHED_OTHER_PRIORITY: usize = 20;
|
||||
+pub const DEFAULT_SCHED_RR_QUANTUM: u128 = 100_000_000;
|
||||
+
|
||||
+#[repr(u8)]
|
||||
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
|
||||
+pub enum SchedPolicy {
|
||||
+ Fifo = 0,
|
||||
+ RoundRobin = 1,
|
||||
+ Other = 2,
|
||||
+}
|
||||
+
|
||||
+impl SchedPolicy {
|
||||
+ pub fn try_from_raw(raw: u8) -> Option<Self> {
|
||||
+ match raw {
|
||||
+ 0 => Some(Self::Fifo),
|
||||
+ 1 => Some(Self::RoundRobin),
|
||||
+ 2 => Some(Self::Other),
|
||||
+ _ => None,
|
||||
+ }
|
||||
+ }
|
||||
+}
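A quick round-trip check of the raw encoding; note these discriminants are kernel-local, so if the C library's SCHED_* constants use different numbering, relibc has to translate before handing the byte to the kernel:

```rust
#[test]
fn sched_policy_raw_round_trip() {
    assert_eq!(SchedPolicy::try_from_raw(0), Some(SchedPolicy::Fifo));
    assert_eq!(SchedPolicy::try_from_raw(1), Some(SchedPolicy::RoundRobin));
    assert_eq!(SchedPolicy::try_from_raw(2), Some(SchedPolicy::Other));
    assert_eq!(SchedPolicy::try_from_raw(3), None);
}
```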
|
||||
+
|
||||
+pub fn rt_priority_to_kernel_prio(rt_priority: u8) -> usize {
|
||||
+ (SCHED_PRIORITY_LEVELS - 1)
|
||||
+ .saturating_sub((usize::from(rt_priority.min(99)) * (SCHED_PRIORITY_LEVELS - 1)) / 99)
|
||||
+}
|
||||
+
|
||||
+fn clamp_sched_other_prio(prio: usize) -> usize {
|
||||
+ prio.min(SCHED_PRIORITY_LEVELS - 1)
|
||||
+}
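Worked endpoints of the RT mapping above (SCHED_PRIORITY_LEVELS = 40, RT range 0..=99), as a test-style sketch; integer division makes the midpoint land on 20:

```rust
#[test]
fn rt_priority_mapping_endpoints() {
    assert_eq!(rt_priority_to_kernel_prio(0), 39);   // weakest RT priority
    assert_eq!(rt_priority_to_kernel_prio(50), 20);  // 39 - (50 * 39) / 99 = 39 - 19
    assert_eq!(rt_priority_to_kernel_prio(99), 0);   // strongest RT priority
    assert_eq!(rt_priority_to_kernel_prio(255), 0);  // out-of-range input clamps to 99
}
```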
|
||||
+
|
||||
#[derive(Clone, Debug)]
|
||||
pub enum HardBlockedReason {
|
||||
/// "SIGSTOP", only procmgr is allowed to switch contexts this state
|
||||
@@ -140,6 +173,17 @@ pub struct Context {
|
||||
pub fmap_ret: Option<Frame>,
|
||||
/// Priority
|
||||
pub prio: usize,
|
||||
+ pub sched_policy: SchedPolicy,
|
||||
+ pub sched_rt_priority: u8,
|
||||
+ pub sched_rr_ticks_consumed: u32,
|
||||
+ pub sched_static_prio: usize,
|
||||
+ pub sched_rr_quantum: u128,
|
||||
+ #[allow(dead_code)]
|
||||
+ pub futex_pi_boost: bool,
|
||||
+ #[allow(dead_code)]
|
||||
+ pub futex_pi_original_prio: usize,
|
||||
+ #[allow(dead_code)]
|
||||
+ pub futex_pi_waiters: Vec<PhysicalAddress>,
|
||||
|
||||
// TODO: id can reappear after wraparound?
|
||||
pub owner_proc_id: Option<NonZeroUsize>,
|
||||
@@ -148,6 +192,8 @@ pub struct Context {
|
||||
pub euid: u32,
|
||||
pub egid: u32,
|
||||
pub pid: usize,
|
||||
+ /// Supplementary group IDs for access control decisions.
|
||||
+ pub groups: Vec<u32>,
|
||||
|
||||
// See [`PreemptGuard`]
|
||||
//
|
||||
@@ -197,13 +243,22 @@ impl Context {
|
||||
files: Arc::new(RwLock::new(FdTbl::new())),
|
||||
userspace: false,
|
||||
fmap_ret: None,
|
||||
- prio: 20,
|
||||
+ prio: DEFAULT_SCHED_OTHER_PRIORITY,
|
||||
+ sched_policy: SchedPolicy::Other,
|
||||
+ sched_rt_priority: 0,
|
||||
+ sched_rr_ticks_consumed: 0,
|
||||
+ sched_static_prio: DEFAULT_SCHED_OTHER_PRIORITY,
|
||||
+ sched_rr_quantum: DEFAULT_SCHED_RR_QUANTUM,
|
||||
+ futex_pi_boost: false,
|
||||
+ futex_pi_original_prio: DEFAULT_SCHED_OTHER_PRIORITY,
|
||||
+ futex_pi_waiters: Vec::new(),
|
||||
being_sigkilled: false,
|
||||
owner_proc_id,
|
||||
|
||||
euid: 0,
|
||||
egid: 0,
|
||||
pid: 0,
|
||||
+ groups: Vec::new(),
|
||||
|
||||
#[cfg(feature = "syscall_debug")]
|
||||
syscall_debug_info: crate::syscall::debug::SyscallDebugInfo::default(),
|
||||
@@ -218,11 +273,47 @@ impl Context {
|
||||
self.preempt_locks == 0
|
||||
}
|
||||
|
||||
+ fn base_sched_prio(&self) -> usize {
|
||||
+ match self.sched_policy {
|
||||
+ SchedPolicy::Other => clamp_sched_other_prio(self.sched_static_prio),
|
||||
+ SchedPolicy::Fifo | SchedPolicy::RoundRobin => {
|
||||
+ rt_priority_to_kernel_prio(self.sched_rt_priority)
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ fn apply_sched_prio(&mut self) {
|
||||
+ let base_prio = self.base_sched_prio();
|
||||
+ if self.futex_pi_boost {
|
||||
+ self.futex_pi_original_prio = base_prio;
|
||||
+ self.prio = self.prio.min(base_prio);
|
||||
+ } else {
|
||||
+ self.futex_pi_original_prio = base_prio;
|
||||
+ self.prio = base_prio;
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ pub fn set_sched_other_prio(&mut self, prio: usize) {
|
||||
+ self.sched_static_prio = clamp_sched_other_prio(prio);
|
||||
+ self.apply_sched_prio();
|
||||
+ }
|
||||
+
|
||||
+ pub fn set_sched_policy(&mut self, sched_policy: SchedPolicy, rt_priority: u8) {
|
||||
+ self.sched_policy = sched_policy;
|
||||
+ self.sched_rt_priority = match sched_policy {
|
||||
+ SchedPolicy::Other => 0,
|
||||
+ SchedPolicy::Fifo | SchedPolicy::RoundRobin => rt_priority.min(99),
|
||||
+ };
|
||||
+ self.sched_rr_ticks_consumed = 0;
|
||||
+ self.apply_sched_prio();
|
||||
+ }
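How the pieces above compose, as a standalone model rather than the kernel type itself: while a PI boost is active, `apply_sched_prio` only lets a policy change keep or improve the effective priority (the `min()` encodes that smaller `prio` values take precedence), never degrade it below the boost.

```rust
// Standalone model of apply_sched_prio's boost handling (illustrative only).
fn effective_prio(current_prio: usize, new_base: usize, pi_boosted: bool) -> usize {
    if pi_boosted { current_prio.min(new_base) } else { new_base }
}

#[test]
fn pi_boost_survives_policy_change() {
    assert_eq!(effective_prio(5, 20, true), 5);   // boosted context keeps prio 5
    assert_eq!(effective_prio(5, 20, false), 20); // unboosted context takes the new base
}
```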
|
||||
+
|
||||
/// Block the context, and return true if it was runnable before being blocked
|
||||
pub fn block(&mut self, reason: &'static str) -> bool {
|
||||
if self.status.is_runnable() {
|
||||
self.status = Status::Blocked;
|
||||
self.status_reason = reason;
|
||||
+ self.sched_rr_ticks_consumed = 0;
|
||||
true
|
||||
} else {
|
||||
false
|
||||
@@ -232,6 +323,7 @@ impl Context {
|
||||
pub fn hard_block(&mut self, reason: HardBlockedReason) -> bool {
|
||||
if self.status.is_runnable() {
|
||||
self.status = Status::HardBlocked { reason };
|
||||
+ self.sched_rr_ticks_consumed = 0;
|
||||
|
||||
true
|
||||
} else {
|
||||
@@ -261,6 +353,7 @@ impl Context {
|
||||
if self.status.is_soft_blocked() {
|
||||
self.status = Status::Runnable;
|
||||
self.status_reason = "";
|
||||
+ self.sched_rr_ticks_consumed = 0;
|
||||
|
||||
true
|
||||
} else {
|
||||
@@ -479,6 +572,7 @@ impl Context {
|
||||
uid: self.euid,
|
||||
gid: self.egid,
|
||||
pid: self.pid,
|
||||
+ groups: self.groups.clone(),
|
||||
}
|
||||
}
|
||||
}
|
||||
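The `rt_priority_to_kernel_prio` helper introduced above inverts and rescales the POSIX real-time range onto the kernel's 40 run queues: higher RT priority means a lower (more urgent) queue index. A minimal standalone sketch, with the function copied from the patch and the assertions added only for illustration:

```rust
// Standalone copy of the mapping from the patch above, for illustration:
// POSIX RT priority (1..=99, higher is more urgent) -> kernel queue index (0..=39, lower is more urgent).
const SCHED_PRIORITY_LEVELS: usize = 40;

fn rt_priority_to_kernel_prio(rt_priority: u8) -> usize {
    (SCHED_PRIORITY_LEVELS - 1)
        .saturating_sub((usize::from(rt_priority.min(99)) * (SCHED_PRIORITY_LEVELS - 1)) / 99)
}

fn main() {
    assert_eq!(rt_priority_to_kernel_prio(99), 0);  // highest RT priority -> front queue
    assert_eq!(rt_priority_to_kernel_prio(1), 39);  // lowest RT priority -> back queue
    assert_eq!(rt_priority_to_kernel_prio(50), 20); // midpoint lands near the middle of the range
}
```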
@@ -0,0 +1,150 @@
|
||||
diff --git a/src/context/switch.rs b/src/context/switch.rs
|
||||
index 86684c8..aeb29c9 100644
|
||||
--- a/src/context/switch.rs
|
||||
+++ b/src/context/switch.rs
|
||||
@@ -5,7 +5,7 @@
|
||||
use crate::{
|
||||
context::{
|
||||
self, arch, idle_contexts, idle_contexts_try, run_contexts, ArcContextLockWriteGuard,
|
||||
- Context, ContextLock, WeakContextRef,
|
||||
+ Context, ContextLock, SchedPolicy, WeakContextRef,
|
||||
},
|
||||
cpu_set::LogicalCpuId,
|
||||
cpu_stats::{self, CpuState},
|
||||
@@ -33,35 +33,17 @@ const SCHED_PRIO_TO_WEIGHT: [usize; 40] = [
|
||||
70, 56, 45, 36, 29, 23, 18, 15,
|
||||
];
|
||||
|
||||
-/// Determines if a given context is eligible to be scheduled on a given CPU (in
|
||||
-/// principle, the current CPU).
|
||||
-///
|
||||
-/// # Safety
|
||||
-/// This function is unsafe because it modifies the `context`'s state directly without synchronization.
|
||||
-///
|
||||
-/// # Parameters
|
||||
-/// - `context`: The context (process/thread) to be checked.
|
||||
-/// - `cpu_id`: The logical ID of the CPU on which the context is being scheduled.
|
||||
-///
|
||||
-/// # Returns
|
||||
-/// - `UpdateResult::CanSwitch`: If the context can be switched to.
|
||||
-/// - `UpdateResult::Skip`: If the context should be skipped (e.g., it's running on another CPU).
|
||||
unsafe fn update_runnable(
|
||||
context: &mut Context,
|
||||
cpu_id: LogicalCpuId,
|
||||
switch_time: u128,
|
||||
) -> UpdateResult {
|
||||
- // Ignore contexts that are already running.
|
||||
if context.running {
|
||||
return UpdateResult::Skip;
|
||||
}
|
||||
-
|
||||
- // Ignore contexts assigned to other CPUs.
|
||||
if !context.sched_affinity.contains(cpu_id) {
|
||||
return UpdateResult::Skip;
|
||||
}
|
||||
-
|
||||
- // If context is soft-blocked and has a wake-up time, check if it should wake up.
|
||||
if context.status.is_soft_blocked()
|
||||
&& let Some(wake) = context.wake
|
||||
&& switch_time >= wake
|
||||
@@ -69,8 +51,6 @@ unsafe fn update_runnable(
|
||||
context.wake = None;
|
||||
context.unblock_no_ipi();
|
||||
}
|
||||
-
|
||||
- // If the context is runnable, indicate it can be switched to.
|
||||
if context.status.is_runnable() {
|
||||
UpdateResult::CanSwitch
|
||||
} else {
|
||||
@@ -95,7 +75,7 @@ pub fn tick(token: &mut CleanLockToken) {
|
||||
let new_ticks = ticks_cell.get() + 1;
|
||||
ticks_cell.set(new_ticks);
|
||||
|
||||
- // Trigger a context switch after every 3 ticks (approx. 6.75 ms).
|
||||
+ // Trigger a context switch after every 3 ticks.
|
||||
if new_ticks >= 3 {
|
||||
switch(token);
|
||||
crate::context::signal::signal_handler(token);
|
||||
@@ -167,10 +147,7 @@ pub fn switch(token: &mut CleanLockToken) -> SwitchResult {
|
||||
let mut prev_context_guard = unsafe { prev_context_lock.write_arc() };
|
||||
|
||||
if !prev_context_guard.is_preemptable() {
|
||||
- // Unset global lock
|
||||
arch::CONTEXT_SWITCH_LOCK.store(false, Ordering::SeqCst);
|
||||
-
|
||||
- // Pretend to have finished switching, so CPU is not idled
|
||||
return SwitchResult::Switched;
|
||||
}
|
||||
|
||||
@@ -377,6 +354,71 @@ fn select_next_context(
|
||||
let total_contexts: usize = contexts_list.iter().map(|q| q.len()).sum();
|
||||
let mut skipped_contexts = 0;
|
||||
|
||||
+ // PASS 0: SCHED_FIFO and SCHED_RR — scan for RT contexts to schedule.
|
||||
+ // When a runnable RT context is found, it takes priority over all SCHED_OTHER.
|
||||
+ for prio in 0..40 {
|
||||
+ let rt_contexts = contexts_list
|
||||
+ .get_mut(prio)
|
||||
+ .expect("prio should be between [0, 39]");
|
||||
+ let len = rt_contexts.len();
|
||||
+ for _ in 0..len {
|
||||
+ let (rt_ref, rt_lock) = match rt_contexts.pop_front() {
|
||||
+ Some(lock) => match lock.upgrade() {
|
||||
+ Some(l) => (lock, l),
|
||||
+ None => {
|
||||
+ skipped_contexts += 1;
|
||||
+ continue;
|
||||
+ }
|
||||
+ },
|
||||
+ None => break,
|
||||
+ };
|
||||
+ if Arc::ptr_eq(&rt_lock, &idle_context) {
|
||||
+ rt_contexts.push_back(rt_ref);
|
||||
+ continue;
|
||||
+ }
|
||||
+ // Current RT thread: if runnable with no higher-prio RT found yet,
|
||||
+ // keep it running (no demotion to SCHED_OTHER)
|
||||
+ if Arc::ptr_eq(&rt_lock, &prev_context_lock) {
|
||||
+ let mut rt_guard = unsafe { rt_lock.write_arc() };
|
||||
+ if rt_guard.status.is_runnable()
|
||||
+ && (rt_guard.sched_policy == SchedPolicy::Fifo
|
||||
+ || rt_guard.sched_policy == SchedPolicy::RoundRobin)
|
||||
+ {
|
||||
+ percpu.balance.set(balance);
|
||||
+ percpu.last_queue.set(i);
|
||||
+ return Ok(Some(rt_guard));
|
||||
+ }
|
||||
+ rt_contexts.push_back(rt_ref);
|
||||
+ continue;
|
||||
+ }
|
||||
+ let mut rt_guard = unsafe { rt_lock.write_arc() };
|
||||
+ if !rt_guard.status.is_runnable() || rt_guard.running
|
||||
+ || !rt_guard.sched_affinity.contains(cpu_id)
|
||||
+ {
|
||||
+ rt_contexts.push_back(rt_ref);
|
||||
+ continue;
|
||||
+ }
|
||||
+ if rt_guard.sched_policy == SchedPolicy::Fifo
|
||||
+ || rt_guard.sched_policy == SchedPolicy::RoundRobin
|
||||
+ {
|
||||
+ percpu.balance.set(balance);
|
||||
+ percpu.last_queue.set(i);
|
||||
+ if !Arc::ptr_eq(&prev_context_lock, &idle_context) {
|
||||
+ let prev_ctx = WeakContextRef(Arc::downgrade(&prev_context_lock));
|
||||
+ if prev_context_guard.status.is_runnable() {
|
||||
+ contexts_list[prev_context_guard.prio].push_back(prev_ctx);
|
||||
+ } else {
|
||||
+ idle_contexts(token.token()).push_back(prev_ctx);
|
||||
+ }
|
||||
+ }
|
||||
+ return Ok(Some(rt_guard));
|
||||
+ }
|
||||
+ rt_contexts.push_back(rt_ref);
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ // PASS 1: SCHED_OTHER — existing DWRR deficit tracking
|
||||
+
|
||||
'priority: loop {
|
||||
i = (i + 1) % 40;
|
||||
total_iters += 1;
|
||||
@@ -0,0 +1,20 @@
|
||||
diff --git a/src/scheme/mod.rs b/src/scheme/mod.rs
index d30272c..9da2b28 100644
--- a/src/scheme/mod.rs
+++ b/src/scheme/mod.rs
@@ -777,6 +777,7 @@ pub struct CallerCtx {
pub pid: usize,
pub uid: u32,
pub gid: u32,
+ pub groups: alloc::vec::Vec<u32>,
}
impl CallerCtx {
pub fn filter_uid_gid(self, euid: u32, egid: u32) -> Self {
@@ -785,6 +786,7 @@ impl CallerCtx {
pid: self.pid,
uid: euid,
gid: egid,
+ groups: self.groups,
}
} else {
self
|
||||
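Carrying supplementary groups in `CallerCtx` lets schemes make group-aware access decisions rather than checking only the effective uid/gid. The sketch below is a hypothetical example of such a check, not the kernel's actual permission code:

```rust
// Hypothetical permission check a scheme could perform now that CallerCtx carries
// supplementary groups; not the kernel's actual access-control code.
struct Caller {
    uid: u32,
    gid: u32,
    groups: Vec<u32>,
}

fn may_read(caller: &Caller, owner_uid: u32, owner_gid: u32, mode: u16) -> bool {
    if caller.uid == owner_uid {
        mode & 0o400 != 0
    } else if caller.gid == owner_gid || caller.groups.contains(&owner_gid) {
        mode & 0o040 != 0
    } else {
        mode & 0o004 != 0
    }
}

fn main() {
    let caller = Caller { uid: 1000, gid: 1000, groups: vec![27, 100] };
    // File owned by root:27 with mode 0640: readable via the supplementary group.
    assert!(may_read(&caller, 0, 27, 0o640));
    // The same caller without group 27 falls through to "other" and is denied.
    let other = Caller { uid: 1000, gid: 1000, groups: vec![] };
    assert!(!may_read(&other, 0, 27, 0o640));
}
```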
@@ -0,0 +1,42 @@
|
||||
diff --git a/src/syscall/futex.rs b/src/syscall/futex.rs
index 4c187b8..9884d2b 100644
--- a/src/syscall/futex.rs
+++ b/src/syscall/futex.rs
@@ -49,8 +49,13 @@ pub struct FutexEntry {
// implement that fully in userspace. Although futex is probably the best API for process-shared
// POSIX synchronization primitives, a local hash table and wait-for-thread kernel APIs (e.g.
// lwp_park/lwp_unpark from NetBSD) could be a simpler replacement.
-static FUTEXES: Mutex<L1, FutexList> =
- Mutex::new(FutexList::with_hasher(DefaultHashBuilder::new()));
+const FUTEX_SHARDS: usize = 64;
+
+fn futex_shard(phys: PhysicalAddress) -> usize {
+ (phys.data() as usize >> 12) % FUTEX_SHARDS
+}
+
+static FUTEXES: [Mutex<L1, FutexList>; FUTEX_SHARDS] = [const { Mutex::new(FutexList::with_hasher(DefaultHashBuilder::new())) }; FUTEX_SHARDS];

fn validate_and_translate_virt(space: &AddrSpace, addr: VirtualAddress) -> Option<PhysicalAddress> {
// TODO: Move this elsewhere!
@@ -97,7 +102,7 @@ pub fn futex(
{
// TODO: Lock ordering violation
let mut token = unsafe { CleanLockToken::new() };
- let mut futexes = FUTEXES.lock(token.token());
+ let mut futexes = FUTEXES[futex_shard(target_physaddr)].lock(token.token());
let (futexes, mut token) = futexes.token_split();

let (fetched, expected) = if op == FUTEX_WAIT {
@@ -181,10 +186,11 @@ pub fn futex(
}
FUTEX_WAKE => {
let mut woken = 0;
+ let shard = futex_shard(target_physaddr);

{
drop(addr_space_guard);
- let mut futexes_map = FUTEXES.lock(token.token());
+ let mut futexes_map = FUTEXES[shard].lock(token.token());
let (futexes_map, mut token) = futexes_map.token_split();

let is_empty = if let Some(futexes) = futexes_map.get_mut(&target_physaddr) {
|
||||
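As a quick illustration of the sharding above, here is a standalone sketch that models the shard index with a plain `usize` in place of the kernel's `PhysicalAddress` type; the page shift and modulo mirror `futex_shard`, and the example values are chosen only for demonstration:

```rust
// Hypothetical standalone model of the 64-shard futex table keyed by page frame.
// `phys` stands in for PhysicalAddress::data(); constants mirror the patch above.
const FUTEX_SHARDS: usize = 64;

fn futex_shard(phys: usize) -> usize {
    // Same-page futexes share a shard; different pages spread across 64 locks.
    (phys >> 12) % FUTEX_SHARDS
}

fn main() {
    let a = 0x0001_2000; // a futex word on page 0x12
    let b = 0x0001_3000; // a futex word on page 0x13
    assert_eq!(futex_shard(a), futex_shard(a + 0x40)); // same page -> same shard lock
    assert_ne!(futex_shard(a), futex_shard(b));        // adjacent pages -> independent shard locks
    println!("shard(a) = {}, shard(b) = {}", futex_shard(a), futex_shard(b));
}
```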
@@ -0,0 +1,89 @@
|
||||
diff --git a/src/percpu.rs b/src/percpu.rs
|
||||
index f4ad5e6..1844d62 100644
|
||||
--- a/src/percpu.rs
|
||||
+++ b/src/percpu.rs
|
||||
@@ -1,4 +1,5 @@
|
||||
use alloc::{
|
||||
+ collections::VecDeque,
|
||||
sync::{Arc, Weak},
|
||||
vec::Vec,
|
||||
};
|
||||
@@ -12,7 +13,10 @@ use syscall::PtraceFlags;
|
||||
|
||||
use crate::{
|
||||
arch::device::ArchPercpuMisc,
|
||||
- context::{empty_cr3, memory::AddrSpaceWrapper, switch::ContextSwitchPercpu},
|
||||
+ context::{
|
||||
+ empty_cr3, memory::AddrSpaceWrapper, switch::ContextSwitchPercpu, WeakContextRef,
|
||||
+ RUN_QUEUE_COUNT,
|
||||
+ },
|
||||
cpu_set::{LogicalCpuId, MAX_CPU_COUNT},
|
||||
cpu_stats::{CpuStats, CpuStatsData},
|
||||
ptrace::Session,
|
||||
@@ -20,6 +24,42 @@ use crate::{
|
||||
syscall::debug::SyscallDebugInfo,
|
||||
};
|
||||
|
||||
+#[allow(dead_code)]
|
||||
+pub struct PerCpuSched {
|
||||
+ pub run_queues: [VecDeque<WeakContextRef>; RUN_QUEUE_COUNT],
|
||||
+ pub run_queues_lock: AtomicBool,
|
||||
+ pub balance: Cell<[usize; RUN_QUEUE_COUNT]>,
|
||||
+ pub last_queue: Cell<usize>,
|
||||
+}
|
||||
+
|
||||
+impl PerCpuSched {
|
||||
+ pub const fn new() -> Self {
|
||||
+ const EMPTY: VecDeque<WeakContextRef> = VecDeque::new();
|
||||
+ Self {
|
||||
+ run_queues: [EMPTY; RUN_QUEUE_COUNT],
|
||||
+ run_queues_lock: AtomicBool::new(false),
|
||||
+ balance: Cell::new([0; RUN_QUEUE_COUNT]),
|
||||
+ last_queue: Cell::new(0),
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ pub fn take_lock(&self) {
|
||||
+ while self
|
||||
+ .run_queues_lock
|
||||
+ .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
|
||||
+ .is_err()
|
||||
+ {
|
||||
+ while self.run_queues_lock.load(Ordering::Relaxed) {
|
||||
+ core::hint::spin_loop();
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ pub fn release_lock(&self) {
|
||||
+ self.run_queues_lock.store(false, Ordering::Release);
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
/// The percpu block, that stored all percpu variables.
|
||||
pub struct PercpuBlock {
|
||||
/// A unique immutable number that identifies the current CPU - used for scheduling
|
||||
@@ -31,7 +71,12 @@ pub struct PercpuBlock {
|
||||
pub current_addrsp: RefCell<Option<Arc<AddrSpaceWrapper>>>,
|
||||
pub new_addrsp_tmp: Cell<Option<Arc<AddrSpaceWrapper>>>,
|
||||
pub wants_tlb_shootdown: AtomicBool,
|
||||
- pub balance: Cell<[usize; 40]>,
|
||||
+
|
||||
+ pub sched: PerCpuSched,
|
||||
+
|
||||
+ // Legacy DWRR state used by context/switch.rs until the per-CPU scheduler migration is
|
||||
+ // finished.
|
||||
+ pub balance: Cell<[usize; RUN_QUEUE_COUNT]>,
|
||||
pub last_queue: Cell<usize>,
|
||||
|
||||
// TODO: Put mailbox queues here, e.g. for TLB shootdown? Just be sure to 128-byte align it
|
||||
@@ -187,7 +232,8 @@ impl PercpuBlock {
|
||||
current_addrsp: RefCell::new(None),
|
||||
new_addrsp_tmp: Cell::new(None),
|
||||
wants_tlb_shootdown: AtomicBool::new(false),
|
||||
- balance: Cell::new([0; 40]),
|
||||
+ sched: PerCpuSched::new(),
|
||||
+ balance: Cell::new([0; RUN_QUEUE_COUNT]),
|
||||
last_queue: Cell::new(39),
|
||||
ptrace_flags: Cell::new(PtraceFlags::empty()),
|
||||
ptrace_session: RefCell::new(None),
|
||||
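`PerCpuSched::take_lock`/`release_lock` above form a test-and-test-and-set spin lock: the compare-exchange is only retried after a relaxed load has observed the flag clear, so spinning waiters mostly read a shared cache line instead of bouncing it. An illustrative userspace sketch of the same pattern (not the kernel type):

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Illustrative TTAS spin lock mirroring PerCpuSched::take_lock/release_lock.
struct SpinFlag {
    locked: AtomicBool,
}

impl SpinFlag {
    const fn new() -> Self {
        Self { locked: AtomicBool::new(false) }
    }

    fn take_lock(&self) {
        // Acquire on success pairs with the Release store in release_lock.
        while self
            .locked
            .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            // Spin on a cheap load until the flag looks free, then retry the CAS.
            while self.locked.load(Ordering::Relaxed) {
                std::hint::spin_loop();
            }
        }
    }

    fn release_lock(&self) {
        self.locked.store(false, Ordering::Release);
    }
}

fn main() {
    let flag = SpinFlag::new();
    flag.take_lock();
    // ... critical section ...
    flag.release_lock();
}
```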
@@ -0,0 +1,180 @@
|
||||
diff --git a/src/context/context.rs b/src/context/context.rs
|
||||
index c97c516..a0814fa 100644
|
||||
--- a/src/context/context.rs
|
||||
+++ b/src/context/context.rs
|
||||
@@ -18,7 +18,8 @@ use crate::{
|
||||
cpu_stats,
|
||||
ipi::{ipi, IpiKind, IpiTarget},
|
||||
memory::{
|
||||
- allocate_p2frame, deallocate_p2frame, Enomem, Frame, RaiiFrame, RmmA, RmmArch, PAGE_SIZE,
|
||||
+ allocate_p2frame, deallocate_p2frame, Enomem, Frame, PhysicalAddress, RaiiFrame, RmmA,
|
||||
+ RmmArch, PAGE_SIZE,
|
||||
},
|
||||
percpu::PercpuBlock,
|
||||
scheme::{CallerCtx, FileHandle, SchemeId},
|
||||
@@ -62,6 +63,38 @@ impl Status {
|
||||
}
|
||||
}
|
||||
|
||||
+pub const SCHED_PRIORITY_LEVELS: usize = 40;
|
||||
+pub const DEFAULT_SCHED_OTHER_PRIORITY: usize = 20;
|
||||
+pub const DEFAULT_SCHED_RR_QUANTUM: u128 = 100_000_000;
|
||||
+
|
||||
+#[repr(u8)]
|
||||
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
|
||||
+pub enum SchedPolicy {
|
||||
+ Fifo = 0,
|
||||
+ RoundRobin = 1,
|
||||
+ Other = 2,
|
||||
+}
|
||||
+
|
||||
+impl SchedPolicy {
|
||||
+ pub fn try_from_raw(raw: u8) -> Option<Self> {
|
||||
+ match raw {
|
||||
+ 0 => Some(Self::Fifo),
|
||||
+ 1 => Some(Self::RoundRobin),
|
||||
+ 2 => Some(Self::Other),
|
||||
+ _ => None,
|
||||
+ }
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+pub fn rt_priority_to_kernel_prio(rt_priority: u8) -> usize {
|
||||
+ (SCHED_PRIORITY_LEVELS - 1)
|
||||
+ .saturating_sub((usize::from(rt_priority.min(99)) * (SCHED_PRIORITY_LEVELS - 1)) / 99)
|
||||
+}
|
||||
+
|
||||
+fn clamp_sched_other_prio(prio: usize) -> usize {
|
||||
+ prio.min(SCHED_PRIORITY_LEVELS - 1)
|
||||
+}
|
||||
+
|
||||
#[derive(Clone, Debug)]
|
||||
pub enum HardBlockedReason {
|
||||
/// "SIGSTOP", only procmgr is allowed to switch contexts this state
|
||||
@@ -140,6 +173,20 @@ pub struct Context {
|
||||
pub fmap_ret: Option<Frame>,
|
||||
/// Priority
|
||||
pub prio: usize,
|
||||
+ pub sched_policy: SchedPolicy,
|
||||
+ pub sched_rt_priority: u8,
|
||||
+ pub sched_rr_ticks_consumed: u32,
|
||||
+ pub sched_static_prio: usize,
|
||||
+ pub sched_rr_quantum: u128,
|
||||
+ /// Virtual runtime for SCHED_OTHER fair scheduling.
|
||||
+ /// CPU-bound threads accumulate vruntime faster; I/O-bound stay lower.
|
||||
+ pub vruntime: u128,
|
||||
+ #[allow(dead_code)]
|
||||
+ pub futex_pi_boost: bool,
|
||||
+ #[allow(dead_code)]
|
||||
+ pub futex_pi_original_prio: usize,
|
||||
+ #[allow(dead_code)]
|
||||
+ pub futex_pi_waiters: Vec<PhysicalAddress>,
|
||||
|
||||
// TODO: id can reappear after wraparound?
|
||||
pub owner_proc_id: Option<NonZeroUsize>,
|
||||
@@ -148,6 +195,8 @@ pub struct Context {
|
||||
pub euid: u32,
|
||||
pub egid: u32,
|
||||
pub pid: usize,
|
||||
+ /// Supplementary group IDs for access control decisions.
|
||||
+ pub groups: Vec<u32>,
|
||||
|
||||
// See [`PreemptGuard`]
|
||||
//
|
||||
@@ -197,13 +246,23 @@ impl Context {
|
||||
files: Arc::new(RwLock::new(FdTbl::new())),
|
||||
userspace: false,
|
||||
fmap_ret: None,
|
||||
- prio: 20,
|
||||
+ prio: DEFAULT_SCHED_OTHER_PRIORITY,
|
||||
+ sched_policy: SchedPolicy::Other,
|
||||
+ sched_rt_priority: 0,
|
||||
+ sched_rr_ticks_consumed: 0,
|
||||
+ sched_static_prio: DEFAULT_SCHED_OTHER_PRIORITY,
|
||||
+ sched_rr_quantum: DEFAULT_SCHED_RR_QUANTUM,
|
||||
+ vruntime: 0u128,
|
||||
+ futex_pi_boost: false,
|
||||
+ futex_pi_original_prio: DEFAULT_SCHED_OTHER_PRIORITY,
|
||||
+ futex_pi_waiters: Vec::new(),
|
||||
being_sigkilled: false,
|
||||
owner_proc_id,
|
||||
|
||||
euid: 0,
|
||||
egid: 0,
|
||||
pid: 0,
|
||||
+ groups: Vec::new(),
|
||||
|
||||
#[cfg(feature = "syscall_debug")]
|
||||
syscall_debug_info: crate::syscall::debug::SyscallDebugInfo::default(),
|
||||
@@ -218,11 +277,47 @@ impl Context {
|
||||
self.preempt_locks == 0
|
||||
}
|
||||
|
||||
+ fn base_sched_prio(&self) -> usize {
|
||||
+ match self.sched_policy {
|
||||
+ SchedPolicy::Other => clamp_sched_other_prio(self.sched_static_prio),
|
||||
+ SchedPolicy::Fifo | SchedPolicy::RoundRobin => {
|
||||
+ rt_priority_to_kernel_prio(self.sched_rt_priority)
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ fn apply_sched_prio(&mut self) {
|
||||
+ let base_prio = self.base_sched_prio();
|
||||
+ if self.futex_pi_boost {
|
||||
+ self.futex_pi_original_prio = base_prio;
|
||||
+ self.prio = self.prio.min(base_prio);
|
||||
+ } else {
|
||||
+ self.futex_pi_original_prio = base_prio;
|
||||
+ self.prio = base_prio;
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ pub fn set_sched_other_prio(&mut self, prio: usize) {
|
||||
+ self.sched_static_prio = clamp_sched_other_prio(prio);
|
||||
+ self.apply_sched_prio();
|
||||
+ }
|
||||
+
|
||||
+ pub fn set_sched_policy(&mut self, sched_policy: SchedPolicy, rt_priority: u8) {
|
||||
+ self.sched_policy = sched_policy;
|
||||
+ self.sched_rt_priority = match sched_policy {
|
||||
+ SchedPolicy::Other => 0,
|
||||
+ SchedPolicy::Fifo | SchedPolicy::RoundRobin => rt_priority.min(99),
|
||||
+ };
|
||||
+ self.sched_rr_ticks_consumed = 0;
|
||||
+ self.apply_sched_prio();
|
||||
+ }
|
||||
+
|
||||
/// Block the context, and return true if it was runnable before being blocked
|
||||
pub fn block(&mut self, reason: &'static str) -> bool {
|
||||
if self.status.is_runnable() {
|
||||
self.status = Status::Blocked;
|
||||
self.status_reason = reason;
|
||||
+ self.sched_rr_ticks_consumed = 0;
|
||||
true
|
||||
} else {
|
||||
false
|
||||
@@ -232,6 +327,7 @@ impl Context {
|
||||
pub fn hard_block(&mut self, reason: HardBlockedReason) -> bool {
|
||||
if self.status.is_runnable() {
|
||||
self.status = Status::HardBlocked { reason };
|
||||
+ self.sched_rr_ticks_consumed = 0;
|
||||
|
||||
true
|
||||
} else {
|
||||
@@ -261,6 +357,7 @@ impl Context {
|
||||
if self.status.is_soft_blocked() {
|
||||
self.status = Status::Runnable;
|
||||
self.status_reason = "";
|
||||
+ self.sched_rr_ticks_consumed = 0;
|
||||
|
||||
true
|
||||
} else {
|
||||
@@ -479,6 +576,7 @@ impl Context {
|
||||
uid: self.euid,
|
||||
gid: self.egid,
|
||||
pid: self.pid,
|
||||
+ groups: self.groups.clone(),
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,214 @@
|
||||
diff --git a/src/context/switch.rs b/src/context/switch.rs
|
||||
index 86684c8..74dd5f1 100644
|
||||
--- a/src/context/switch.rs
|
||||
+++ b/src/context/switch.rs
|
||||
@@ -5,7 +5,7 @@
|
||||
use crate::{
|
||||
context::{
|
||||
self, arch, idle_contexts, idle_contexts_try, run_contexts, ArcContextLockWriteGuard,
|
||||
- Context, ContextLock, WeakContextRef,
|
||||
+ Context, ContextLock, SchedPolicy, WeakContextRef,
|
||||
},
|
||||
cpu_set::LogicalCpuId,
|
||||
cpu_stats::{self, CpuState},
|
||||
@@ -33,35 +33,17 @@ const SCHED_PRIO_TO_WEIGHT: [usize; 40] = [
|
||||
70, 56, 45, 36, 29, 23, 18, 15,
|
||||
];
|
||||
|
||||
-/// Determines if a given context is eligible to be scheduled on a given CPU (in
|
||||
-/// principle, the current CPU).
|
||||
-///
|
||||
-/// # Safety
|
||||
-/// This function is unsafe because it modifies the `context`'s state directly without synchronization.
|
||||
-///
|
||||
-/// # Parameters
|
||||
-/// - `context`: The context (process/thread) to be checked.
|
||||
-/// - `cpu_id`: The logical ID of the CPU on which the context is being scheduled.
|
||||
-///
|
||||
-/// # Returns
|
||||
-/// - `UpdateResult::CanSwitch`: If the context can be switched to.
|
||||
-/// - `UpdateResult::Skip`: If the context should be skipped (e.g., it's running on another CPU).
|
||||
unsafe fn update_runnable(
|
||||
context: &mut Context,
|
||||
cpu_id: LogicalCpuId,
|
||||
switch_time: u128,
|
||||
) -> UpdateResult {
|
||||
- // Ignore contexts that are already running.
|
||||
if context.running {
|
||||
return UpdateResult::Skip;
|
||||
}
|
||||
-
|
||||
- // Ignore contexts assigned to other CPUs.
|
||||
if !context.sched_affinity.contains(cpu_id) {
|
||||
return UpdateResult::Skip;
|
||||
}
|
||||
-
|
||||
- // If context is soft-blocked and has a wake-up time, check if it should wake up.
|
||||
if context.status.is_soft_blocked()
|
||||
&& let Some(wake) = context.wake
|
||||
&& switch_time >= wake
|
||||
@@ -69,8 +51,6 @@ unsafe fn update_runnable(
|
||||
context.wake = None;
|
||||
context.unblock_no_ipi();
|
||||
}
|
||||
-
|
||||
- // If the context is runnable, indicate it can be switched to.
|
||||
if context.status.is_runnable() {
|
||||
UpdateResult::CanSwitch
|
||||
} else {
|
||||
@@ -95,7 +75,7 @@ pub fn tick(token: &mut CleanLockToken) {
|
||||
let new_ticks = ticks_cell.get() + 1;
|
||||
ticks_cell.set(new_ticks);
|
||||
|
||||
- // Trigger a context switch after every 3 ticks (approx. 6.75 ms).
|
||||
+ // Trigger a context switch after every 3 ticks.
|
||||
if new_ticks >= 3 {
|
||||
switch(token);
|
||||
crate::context::signal::signal_handler(token);
|
||||
@@ -167,10 +147,7 @@ pub fn switch(token: &mut CleanLockToken) -> SwitchResult {
|
||||
let mut prev_context_guard = unsafe { prev_context_lock.write_arc() };
|
||||
|
||||
if !prev_context_guard.is_preemptable() {
|
||||
- // Unset global lock
|
||||
arch::CONTEXT_SWITCH_LOCK.store(false, Ordering::SeqCst);
|
||||
-
|
||||
- // Pretend to have finished switching, so CPU is not idled
|
||||
return SwitchResult::Switched;
|
||||
}
|
||||
|
||||
@@ -222,6 +199,13 @@ pub fn switch(token: &mut CleanLockToken) -> SwitchResult {
|
||||
// Update times
|
||||
if !was_idle {
|
||||
prev_context.cpu_time += switch_time.saturating_sub(prev_context.switch_time);
|
||||
+ if prev_context.sched_policy == SchedPolicy::Other {
|
||||
+ let actual_ns = switch_time.saturating_sub(prev_context.switch_time);
|
||||
+ let weight = SCHED_PRIO_TO_WEIGHT[prev_context.sched_static_prio.min(39)] as u128;
|
||||
+ let default_weight = SCHED_PRIO_TO_WEIGHT[20] as u128;
|
||||
+ let delta = actual_ns.saturating_mul(default_weight) / weight.max(1);
|
||||
+ prev_context.vruntime = prev_context.vruntime.saturating_add(delta);
|
||||
+ }
|
||||
}
|
||||
next_context.switch_time = switch_time;
|
||||
if next_context.userspace {
|
||||
@@ -377,6 +361,121 @@ fn select_next_context(
|
||||
let total_contexts: usize = contexts_list.iter().map(|q| q.len()).sum();
|
||||
let mut skipped_contexts = 0;
|
||||
|
||||
+ // PASS 0: SCHED_FIFO and SCHED_RR — scan for RT contexts to schedule.
|
||||
+ // When a runnable RT context is found, it takes priority over all SCHED_OTHER.
|
||||
+ for prio in 0..40 {
|
||||
+ let rt_contexts = contexts_list
|
||||
+ .get_mut(prio)
|
||||
+ .expect("prio should be between [0, 39]");
|
||||
+ let len = rt_contexts.len();
|
||||
+ for _ in 0..len {
|
||||
+ let (rt_ref, rt_lock) = match rt_contexts.pop_front() {
|
||||
+ Some(lock) => match lock.upgrade() {
|
||||
+ Some(l) => (lock, l),
|
||||
+ None => {
|
||||
+ skipped_contexts += 1;
|
||||
+ continue;
|
||||
+ }
|
||||
+ },
|
||||
+ None => break,
|
||||
+ };
|
||||
+ if Arc::ptr_eq(&rt_lock, &idle_context) {
|
||||
+ rt_contexts.push_back(rt_ref);
|
||||
+ continue;
|
||||
+ }
|
||||
+ // Current RT thread: if runnable with no higher-prio RT found yet,
|
||||
+ // keep it running (no demotion to SCHED_OTHER)
|
||||
+ if Arc::ptr_eq(&rt_lock, &prev_context_lock) {
|
||||
+ let rt_guard = unsafe { rt_lock.write_arc() };
|
||||
+ if rt_guard.status.is_runnable()
|
||||
+ && (rt_guard.sched_policy == SchedPolicy::Fifo
|
||||
+ || rt_guard.sched_policy == SchedPolicy::RoundRobin)
|
||||
+ {
|
||||
+ percpu.balance.set(balance);
|
||||
+ percpu.last_queue.set(i);
|
||||
+ return Ok(Some(rt_guard));
|
||||
+ }
|
||||
+ rt_contexts.push_back(rt_ref);
|
||||
+ continue;
|
||||
+ }
|
||||
+ let rt_guard = unsafe { rt_lock.write_arc() };
|
||||
+ if !rt_guard.status.is_runnable() || rt_guard.running
|
||||
+ || !rt_guard.sched_affinity.contains(cpu_id)
|
||||
+ {
|
||||
+ rt_contexts.push_back(rt_ref);
|
||||
+ continue;
|
||||
+ }
|
||||
+ if rt_guard.sched_policy == SchedPolicy::Fifo
|
||||
+ || rt_guard.sched_policy == SchedPolicy::RoundRobin
|
||||
+ {
|
||||
+ percpu.balance.set(balance);
|
||||
+ percpu.last_queue.set(i);
|
||||
+ if !Arc::ptr_eq(&prev_context_lock, &idle_context) {
|
||||
+ let prev_ctx = WeakContextRef(Arc::downgrade(&prev_context_lock));
|
||||
+ if prev_context_guard.status.is_runnable() {
|
||||
+ contexts_list[prev_context_guard.prio].push_back(prev_ctx);
|
||||
+ } else {
|
||||
+ idle_contexts(token.token()).push_back(prev_ctx);
|
||||
+ }
|
||||
+ }
|
||||
+ return Ok(Some(rt_guard));
|
||||
+ }
|
||||
+ rt_contexts.push_back(rt_ref);
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ // PASS 1: SCHED_OTHER — minimum-vruntime selection
|
||||
+ {
|
||||
+ let mut min_vruntime = u128::MAX;
|
||||
+ let mut best: Option<(usize, WeakContextRef)> = None;
|
||||
+ for (prio, queue) in contexts_list.iter().enumerate() {
|
||||
+ for ctx_ref in queue.iter() {
|
||||
+ if let Some(ctx_lock) = ctx_ref.upgrade() {
|
||||
+ if Arc::ptr_eq(&ctx_lock, &prev_context_lock) || Arc::ptr_eq(&ctx_lock, &idle_context) {
|
||||
+ continue;
|
||||
+ }
|
||||
+ if let Some(guard) = ctx_lock.try_read(token.token()) {
|
||||
+ if guard.status.is_runnable() && !guard.running
|
||||
+ && guard.sched_affinity.contains(cpu_id)
|
||||
+ && guard.sched_policy == SchedPolicy::Other
|
||||
+ {
|
||||
+ let v = guard.vruntime;
|
||||
+ drop(guard);
|
||||
+ if v < min_vruntime {
|
||||
+ min_vruntime = v;
|
||||
+ best = Some((prio, ctx_ref.clone()));
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+ if let Some((best_prio, ctx_ref)) = best {
|
||||
+ {
|
||||
+ let queue = contexts_list.get_mut(best_prio).expect("valid prio");
|
||||
+ queue.retain(|r| !WeakContextRef::eq(r, &ctx_ref));
|
||||
+ }
|
||||
+ if let Some(ctx_lock) = ctx_ref.upgrade() {
|
||||
+ let guard = unsafe { ctx_lock.write_arc() };
|
||||
+ if guard.status.is_runnable() {
|
||||
+ percpu.balance.set(balance);
|
||||
+ percpu.last_queue.set(i);
|
||||
+ if !Arc::ptr_eq(&prev_context_lock, &idle_context) {
|
||||
+ let prev_ctx = WeakContextRef(Arc::downgrade(&prev_context_lock));
|
||||
+ if prev_context_guard.status.is_runnable() {
|
||||
+ contexts_list[prev_context_guard.prio].push_back(prev_ctx);
|
||||
+ } else {
|
||||
+ idle_contexts(token.token()).push_back(prev_ctx);
|
||||
+ }
|
||||
+ }
|
||||
+ return Ok(Some(guard));
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ // PASS 2: fallback DWRR deficit tracking
|
||||
+
|
||||
'priority: loop {
|
||||
i = (i + 1) % 40;
|
||||
total_iters += 1;
|
||||
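The vruntime update in this patch scales wall-clock run time by the inverse of the thread's weight: `delta = actual_ns * weight(nice 0) / weight(thread)`, so heavier (higher-priority) threads accrue vruntime more slowly and are picked more often by the min-vruntime pass. A small worked example, with weight values assumed purely for illustration (the real values come from `SCHED_PRIO_TO_WEIGHT`):

```rust
// Illustrative vruntime accounting matching the patch above:
// delta = actual_ns * weight(nice 0) / weight(thread).
// The weight values below are assumed for the example, not copied from the kernel table.
fn vruntime_delta(actual_ns: u128, thread_weight: u128, default_weight: u128) -> u128 {
    actual_ns.saturating_mul(default_weight) / thread_weight.max(1)
}

fn main() {
    let default_weight = 1024; // assumed nice-0 weight
    // A thread with twice the weight accrues vruntime half as fast for the same run time,
    // so the min-vruntime pass selects it more often.
    assert_eq!(vruntime_delta(1_000_000, 2048, default_weight), 500_000);
    // A low-weight (heavily niced) thread accrues vruntime twice as fast.
    assert_eq!(vruntime_delta(1_000_000, 512, default_weight), 2_000_000);
}
```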
@@ -0,0 +1,196 @@
|
||||
diff --git a/src/context/context.rs b/src/context/context.rs
|
||||
index c97c516..18fbd7f 100644
|
||||
--- a/src/context/context.rs
|
||||
+++ b/src/context/context.rs
|
||||
@@ -18,7 +18,8 @@ use crate::{
|
||||
cpu_stats,
|
||||
ipi::{ipi, IpiKind, IpiTarget},
|
||||
memory::{
|
||||
- allocate_p2frame, deallocate_p2frame, Enomem, Frame, RaiiFrame, RmmA, RmmArch, PAGE_SIZE,
|
||||
+ allocate_p2frame, deallocate_p2frame, Enomem, Frame, PhysicalAddress, RaiiFrame, RmmA,
|
||||
+ RmmArch, PAGE_SIZE,
|
||||
},
|
||||
percpu::PercpuBlock,
|
||||
scheme::{CallerCtx, FileHandle, SchemeId},
|
||||
@@ -62,6 +63,38 @@ impl Status {
|
||||
}
|
||||
}
|
||||
|
||||
+pub const SCHED_PRIORITY_LEVELS: usize = 40;
|
||||
+pub const DEFAULT_SCHED_OTHER_PRIORITY: usize = 20;
|
||||
+pub const DEFAULT_SCHED_RR_QUANTUM: u128 = 100_000_000;
|
||||
+
|
||||
+#[repr(u8)]
|
||||
+#[derive(Clone, Copy, Debug, PartialEq, Eq)]
|
||||
+pub enum SchedPolicy {
|
||||
+ Fifo = 0,
|
||||
+ RoundRobin = 1,
|
||||
+ Other = 2,
|
||||
+}
|
||||
+
|
||||
+impl SchedPolicy {
|
||||
+ pub fn try_from_raw(raw: u8) -> Option<Self> {
|
||||
+ match raw {
|
||||
+ 0 => Some(Self::Fifo),
|
||||
+ 1 => Some(Self::RoundRobin),
|
||||
+ 2 => Some(Self::Other),
|
||||
+ _ => None,
|
||||
+ }
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+pub fn rt_priority_to_kernel_prio(rt_priority: u8) -> usize {
|
||||
+ (SCHED_PRIORITY_LEVELS - 1)
|
||||
+ .saturating_sub((usize::from(rt_priority.min(99)) * (SCHED_PRIORITY_LEVELS - 1)) / 99)
|
||||
+}
|
||||
+
|
||||
+fn clamp_sched_other_prio(prio: usize) -> usize {
|
||||
+ prio.min(SCHED_PRIORITY_LEVELS - 1)
|
||||
+}
|
||||
+
|
||||
#[derive(Clone, Debug)]
|
||||
pub enum HardBlockedReason {
|
||||
/// "SIGSTOP", only procmgr is allowed to switch contexts this state
|
||||
@@ -96,6 +129,7 @@ pub struct Context {
|
||||
pub running: bool,
|
||||
/// Current CPU ID
|
||||
pub cpu_id: Option<LogicalCpuId>,
|
||||
+ pub last_cpu: Option<LogicalCpuId>,
|
||||
/// Time this context was switched to
|
||||
pub switch_time: u128,
|
||||
/// Amount of CPU time used
|
||||
@@ -140,6 +174,20 @@ pub struct Context {
|
||||
pub fmap_ret: Option<Frame>,
|
||||
/// Priority
|
||||
pub prio: usize,
|
||||
+ pub sched_policy: SchedPolicy,
|
||||
+ pub sched_rt_priority: u8,
|
||||
+ pub sched_rr_ticks_consumed: u32,
|
||||
+ pub sched_static_prio: usize,
|
||||
+ pub sched_rr_quantum: u128,
|
||||
+ /// Virtual runtime for SCHED_OTHER fair scheduling.
|
||||
+ /// CPU-bound threads accumulate vruntime faster; I/O-bound stay lower.
|
||||
+ pub vruntime: u128,
|
||||
+ #[allow(dead_code)]
|
||||
+ pub futex_pi_boost: bool,
|
||||
+ #[allow(dead_code)]
|
||||
+ pub futex_pi_original_prio: usize,
|
||||
+ #[allow(dead_code)]
|
||||
+ pub futex_pi_waiters: Vec<PhysicalAddress>,
|
||||
|
||||
// TODO: id can reappear after wraparound?
|
||||
pub owner_proc_id: Option<NonZeroUsize>,
|
||||
@@ -148,6 +196,8 @@ pub struct Context {
|
||||
pub euid: u32,
|
||||
pub egid: u32,
|
||||
pub pid: usize,
|
||||
+ /// Supplementary group IDs for access control decisions.
|
||||
+ pub groups: Vec<u32>,
|
||||
|
||||
// See [`PreemptGuard`]
|
||||
//
|
||||
@@ -182,6 +232,7 @@ impl Context {
|
||||
status_reason: "",
|
||||
running: false,
|
||||
cpu_id: None,
|
||||
+ last_cpu: None,
|
||||
switch_time: 0,
|
||||
cpu_time: 0,
|
||||
sched_affinity: LogicalCpuSet::all(),
|
||||
@@ -197,13 +248,23 @@ impl Context {
|
||||
files: Arc::new(RwLock::new(FdTbl::new())),
|
||||
userspace: false,
|
||||
fmap_ret: None,
|
||||
- prio: 20,
|
||||
+ prio: DEFAULT_SCHED_OTHER_PRIORITY,
|
||||
+ sched_policy: SchedPolicy::Other,
|
||||
+ sched_rt_priority: 0,
|
||||
+ sched_rr_ticks_consumed: 0,
|
||||
+ sched_static_prio: DEFAULT_SCHED_OTHER_PRIORITY,
|
||||
+ sched_rr_quantum: DEFAULT_SCHED_RR_QUANTUM,
|
||||
+ vruntime: 0u128,
|
||||
+ futex_pi_boost: false,
|
||||
+ futex_pi_original_prio: DEFAULT_SCHED_OTHER_PRIORITY,
|
||||
+ futex_pi_waiters: Vec::new(),
|
||||
being_sigkilled: false,
|
||||
owner_proc_id,
|
||||
|
||||
euid: 0,
|
||||
egid: 0,
|
||||
pid: 0,
|
||||
+ groups: Vec::new(),
|
||||
|
||||
#[cfg(feature = "syscall_debug")]
|
||||
syscall_debug_info: crate::syscall::debug::SyscallDebugInfo::default(),
|
||||
@@ -218,11 +279,47 @@ impl Context {
|
||||
self.preempt_locks == 0
|
||||
}
|
||||
|
||||
+ fn base_sched_prio(&self) -> usize {
|
||||
+ match self.sched_policy {
|
||||
+ SchedPolicy::Other => clamp_sched_other_prio(self.sched_static_prio),
|
||||
+ SchedPolicy::Fifo | SchedPolicy::RoundRobin => {
|
||||
+ rt_priority_to_kernel_prio(self.sched_rt_priority)
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ fn apply_sched_prio(&mut self) {
|
||||
+ let base_prio = self.base_sched_prio();
|
||||
+ if self.futex_pi_boost {
|
||||
+ self.futex_pi_original_prio = base_prio;
|
||||
+ self.prio = self.prio.min(base_prio);
|
||||
+ } else {
|
||||
+ self.futex_pi_original_prio = base_prio;
|
||||
+ self.prio = base_prio;
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ pub fn set_sched_other_prio(&mut self, prio: usize) {
|
||||
+ self.sched_static_prio = clamp_sched_other_prio(prio);
|
||||
+ self.apply_sched_prio();
|
||||
+ }
|
||||
+
|
||||
+ pub fn set_sched_policy(&mut self, sched_policy: SchedPolicy, rt_priority: u8) {
|
||||
+ self.sched_policy = sched_policy;
|
||||
+ self.sched_rt_priority = match sched_policy {
|
||||
+ SchedPolicy::Other => 0,
|
||||
+ SchedPolicy::Fifo | SchedPolicy::RoundRobin => rt_priority.min(99),
|
||||
+ };
|
||||
+ self.sched_rr_ticks_consumed = 0;
|
||||
+ self.apply_sched_prio();
|
||||
+ }
|
||||
+
|
||||
/// Block the context, and return true if it was runnable before being blocked
|
||||
pub fn block(&mut self, reason: &'static str) -> bool {
|
||||
if self.status.is_runnable() {
|
||||
self.status = Status::Blocked;
|
||||
self.status_reason = reason;
|
||||
+ self.sched_rr_ticks_consumed = 0;
|
||||
true
|
||||
} else {
|
||||
false
|
||||
@@ -232,6 +329,7 @@ impl Context {
|
||||
pub fn hard_block(&mut self, reason: HardBlockedReason) -> bool {
|
||||
if self.status.is_runnable() {
|
||||
self.status = Status::HardBlocked { reason };
|
||||
+ self.sched_rr_ticks_consumed = 0;
|
||||
|
||||
true
|
||||
} else {
|
||||
@@ -261,6 +359,7 @@ impl Context {
|
||||
if self.status.is_soft_blocked() {
|
||||
self.status = Status::Runnable;
|
||||
self.status_reason = "";
|
||||
+ self.sched_rr_ticks_consumed = 0;
|
||||
|
||||
true
|
||||
} else {
|
||||
@@ -479,6 +578,7 @@ impl Context {
|
||||
uid: self.euid,
|
||||
gid: self.egid,
|
||||
pid: self.pid,
|
||||
+ groups: self.groups.clone(),
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,225 @@
|
||||
diff --git a/src/context/switch.rs b/src/context/switch.rs
|
||||
index 86684c8..cd5f7ed 100644
|
||||
--- a/src/context/switch.rs
|
||||
+++ b/src/context/switch.rs
|
||||
@@ -5,7 +5,7 @@
|
||||
use crate::{
|
||||
context::{
|
||||
self, arch, idle_contexts, idle_contexts_try, run_contexts, ArcContextLockWriteGuard,
|
||||
- Context, ContextLock, WeakContextRef,
|
||||
+ Context, ContextLock, SchedPolicy, WeakContextRef,
|
||||
},
|
||||
cpu_set::LogicalCpuId,
|
||||
cpu_stats::{self, CpuState},
|
||||
@@ -33,35 +33,17 @@ const SCHED_PRIO_TO_WEIGHT: [usize; 40] = [
|
||||
70, 56, 45, 36, 29, 23, 18, 15,
|
||||
];
|
||||
|
||||
-/// Determines if a given context is eligible to be scheduled on a given CPU (in
|
||||
-/// principle, the current CPU).
|
||||
-///
|
||||
-/// # Safety
|
||||
-/// This function is unsafe because it modifies the `context`'s state directly without synchronization.
|
||||
-///
|
||||
-/// # Parameters
|
||||
-/// - `context`: The context (process/thread) to be checked.
|
||||
-/// - `cpu_id`: The logical ID of the CPU on which the context is being scheduled.
|
||||
-///
|
||||
-/// # Returns
|
||||
-/// - `UpdateResult::CanSwitch`: If the context can be switched to.
|
||||
-/// - `UpdateResult::Skip`: If the context should be skipped (e.g., it's running on another CPU).
|
||||
unsafe fn update_runnable(
|
||||
context: &mut Context,
|
||||
cpu_id: LogicalCpuId,
|
||||
switch_time: u128,
|
||||
) -> UpdateResult {
|
||||
- // Ignore contexts that are already running.
|
||||
if context.running {
|
||||
return UpdateResult::Skip;
|
||||
}
|
||||
-
|
||||
- // Ignore contexts assigned to other CPUs.
|
||||
if !context.sched_affinity.contains(cpu_id) {
|
||||
return UpdateResult::Skip;
|
||||
}
|
||||
-
|
||||
- // If context is soft-blocked and has a wake-up time, check if it should wake up.
|
||||
if context.status.is_soft_blocked()
|
||||
&& let Some(wake) = context.wake
|
||||
&& switch_time >= wake
|
||||
@@ -69,8 +51,6 @@ unsafe fn update_runnable(
|
||||
context.wake = None;
|
||||
context.unblock_no_ipi();
|
||||
}
|
||||
-
|
||||
- // If the context is runnable, indicate it can be switched to.
|
||||
if context.status.is_runnable() {
|
||||
UpdateResult::CanSwitch
|
||||
} else {
|
||||
@@ -95,7 +75,7 @@ pub fn tick(token: &mut CleanLockToken) {
|
||||
let new_ticks = ticks_cell.get() + 1;
|
||||
ticks_cell.set(new_ticks);
|
||||
|
||||
- // Trigger a context switch after every 3 ticks (approx. 6.75 ms).
|
||||
+ // Trigger a context switch after every 3 ticks.
|
||||
if new_ticks >= 3 {
|
||||
switch(token);
|
||||
crate::context::signal::signal_handler(token);
|
||||
@@ -167,10 +147,7 @@ pub fn switch(token: &mut CleanLockToken) -> SwitchResult {
|
||||
let mut prev_context_guard = unsafe { prev_context_lock.write_arc() };
|
||||
|
||||
if !prev_context_guard.is_preemptable() {
|
||||
- // Unset global lock
|
||||
arch::CONTEXT_SWITCH_LOCK.store(false, Ordering::SeqCst);
|
||||
-
|
||||
- // Pretend to have finished switching, so CPU is not idled
|
||||
return SwitchResult::Switched;
|
||||
}
|
||||
|
||||
@@ -213,6 +190,7 @@ pub fn switch(token: &mut CleanLockToken) -> SwitchResult {
|
||||
|
||||
// Set the previous context as "not running"
|
||||
prev_context.running = false;
|
||||
+ prev_context.last_cpu = prev_context.cpu_id;
|
||||
|
||||
// Set the next context as "running"
|
||||
next_context.running = true;
|
||||
@@ -222,6 +200,13 @@ pub fn switch(token: &mut CleanLockToken) -> SwitchResult {
|
||||
// Update times
|
||||
if !was_idle {
|
||||
prev_context.cpu_time += switch_time.saturating_sub(prev_context.switch_time);
|
||||
+ if prev_context.sched_policy == SchedPolicy::Other {
|
||||
+ let actual_ns = switch_time.saturating_sub(prev_context.switch_time);
|
||||
+ let weight = SCHED_PRIO_TO_WEIGHT[prev_context.sched_static_prio.min(39)] as u128;
|
||||
+ let default_weight = SCHED_PRIO_TO_WEIGHT[20] as u128;
|
||||
+ let delta = actual_ns.saturating_mul(default_weight) / weight.max(1);
|
||||
+ prev_context.vruntime = prev_context.vruntime.saturating_add(delta);
|
||||
+ }
|
||||
}
|
||||
next_context.switch_time = switch_time;
|
||||
if next_context.userspace {
|
||||
@@ -377,6 +362,124 @@ fn select_next_context(
|
||||
let total_contexts: usize = contexts_list.iter().map(|q| q.len()).sum();
|
||||
let mut skipped_contexts = 0;
|
||||
|
||||
+ // PASS 0: SCHED_FIFO and SCHED_RR — scan for RT contexts to schedule.
|
||||
+ // When a runnable RT context is found, it takes priority over all SCHED_OTHER.
|
||||
+ for prio in 0..40 {
|
||||
+ let rt_contexts = contexts_list
|
||||
+ .get_mut(prio)
|
||||
+ .expect("prio should be between [0, 39]");
|
||||
+ let len = rt_contexts.len();
|
||||
+ for _ in 0..len {
|
||||
+ let (rt_ref, rt_lock) = match rt_contexts.pop_front() {
|
||||
+ Some(lock) => match lock.upgrade() {
|
||||
+ Some(l) => (lock, l),
|
||||
+ None => {
|
||||
+ skipped_contexts += 1;
|
||||
+ continue;
|
||||
+ }
|
||||
+ },
|
||||
+ None => break,
|
||||
+ };
|
||||
+ if Arc::ptr_eq(&rt_lock, &idle_context) {
|
||||
+ rt_contexts.push_back(rt_ref);
|
||||
+ continue;
|
||||
+ }
|
||||
+ // Current RT thread: if runnable with no higher-prio RT found yet,
|
||||
+ // keep it running (no demotion to SCHED_OTHER)
|
||||
+ if Arc::ptr_eq(&rt_lock, &prev_context_lock) {
|
||||
+ let rt_guard = unsafe { rt_lock.write_arc() };
|
||||
+ if rt_guard.status.is_runnable()
|
||||
+ && (rt_guard.sched_policy == SchedPolicy::Fifo
|
||||
+ || rt_guard.sched_policy == SchedPolicy::RoundRobin)
|
||||
+ {
|
||||
+ percpu.balance.set(balance);
|
||||
+ percpu.last_queue.set(i);
|
||||
+ return Ok(Some(rt_guard));
|
||||
+ }
|
||||
+ rt_contexts.push_back(rt_ref);
|
||||
+ continue;
|
||||
+ }
|
||||
+ let rt_guard = unsafe { rt_lock.write_arc() };
|
||||
+ if !rt_guard.status.is_runnable() || rt_guard.running
|
||||
+ || !rt_guard.sched_affinity.contains(cpu_id)
|
||||
+ {
|
||||
+ rt_contexts.push_back(rt_ref);
|
||||
+ continue;
|
||||
+ }
|
||||
+ if rt_guard.sched_policy == SchedPolicy::Fifo
|
||||
+ || rt_guard.sched_policy == SchedPolicy::RoundRobin
|
||||
+ {
|
||||
+ percpu.balance.set(balance);
|
||||
+ percpu.last_queue.set(i);
|
||||
+ if !Arc::ptr_eq(&prev_context_lock, &idle_context) {
|
||||
+ let prev_ctx = WeakContextRef(Arc::downgrade(&prev_context_lock));
|
||||
+ if prev_context_guard.status.is_runnable() {
|
||||
+ contexts_list[prev_context_guard.prio].push_back(prev_ctx);
|
||||
+ } else {
|
||||
+ idle_contexts(token.token()).push_back(prev_ctx);
|
||||
+ }
|
||||
+ }
|
||||
+ return Ok(Some(rt_guard));
|
||||
+ }
|
||||
+ rt_contexts.push_back(rt_ref);
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ // PASS 1: SCHED_OTHER — minimum-vruntime selection
|
||||
+ {
|
||||
+ let mut min_vruntime = u128::MAX;
|
||||
+ let mut best: Option<(usize, WeakContextRef)> = None;
|
||||
+ for (prio, queue) in contexts_list.iter().enumerate() {
|
||||
+ for ctx_ref in queue.iter() {
|
||||
+ if let Some(ctx_lock) = ctx_ref.upgrade() {
|
||||
+ if Arc::ptr_eq(&ctx_lock, &prev_context_lock) || Arc::ptr_eq(&ctx_lock, &idle_context) {
|
||||
+ continue;
|
||||
+ }
|
||||
+ if let Some(guard) = ctx_lock.try_read(token.token()) {
|
||||
+ if guard.status.is_runnable() && !guard.running
|
||||
+ && guard.sched_affinity.contains(cpu_id)
|
||||
+ && guard.sched_policy == SchedPolicy::Other
|
||||
+ {
|
||||
+ let mut v = guard.vruntime;
|
||||
+ if guard.last_cpu == Some(cpu_id) {
|
||||
+ v = v.saturating_sub(v / 8);
|
||||
+ }
|
||||
+ drop(guard);
|
||||
+ if v < min_vruntime {
|
||||
+ min_vruntime = v;
|
||||
+ best = Some((prio, ctx_ref.clone()));
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+ if let Some((best_prio, ctx_ref)) = best {
|
||||
+ {
|
||||
+ let queue = contexts_list.get_mut(best_prio).expect("valid prio");
|
||||
+ queue.retain(|r| !WeakContextRef::eq(r, &ctx_ref));
|
||||
+ }
|
||||
+ if let Some(ctx_lock) = ctx_ref.upgrade() {
|
||||
+ let guard = unsafe { ctx_lock.write_arc() };
|
||||
+ if guard.status.is_runnable() {
|
||||
+ percpu.balance.set(balance);
|
||||
+ percpu.last_queue.set(i);
|
||||
+ if !Arc::ptr_eq(&prev_context_lock, &idle_context) {
|
||||
+ let prev_ctx = WeakContextRef(Arc::downgrade(&prev_context_lock));
|
||||
+ if prev_context_guard.status.is_runnable() {
|
||||
+ contexts_list[prev_context_guard.prio].push_back(prev_ctx);
|
||||
+ } else {
|
||||
+ idle_contexts(token.token()).push_back(prev_ctx);
|
||||
+ }
|
||||
+ }
|
||||
+ return Ok(Some(guard));
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ // PASS 2: fallback DWRR deficit tracking
|
||||
+
|
||||
'priority: loop {
|
||||
i = (i + 1) % 40;
|
||||
total_iters += 1;
|
||||
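The cache-affinity bonus in PASS 1 above discounts a candidate's vruntime by one eighth when its `last_cpu` matches the selecting CPU, so a cache-warm thread wins ties against cold threads with slightly lower vruntime. A minimal sketch of that comparison, with the numbers chosen only for illustration:

```rust
// Sketch of the cache-affinity bonus in PASS 1: a thread that last ran on this CPU
// gets its effective vruntime reduced by 1/8 before the min-vruntime comparison.
fn effective_vruntime(vruntime: u128, last_ran_here: bool) -> u128 {
    if last_ran_here {
        vruntime.saturating_sub(vruntime / 8)
    } else {
        vruntime
    }
}

fn main() {
    // Two SCHED_OTHER candidates with nearly equal vruntime: the cache-warm one wins.
    let warm = effective_vruntime(8_000_000, true);  // 7_000_000 after the bonus
    let cold = effective_vruntime(7_500_000, false); // unchanged
    assert!(warm < cold);
    println!("warm={warm} cold={cold}");
}
```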
@@ -0,0 +1,47 @@
|
||||
diff --git a/src/scheme/proc.rs b/src/scheme/proc.rs
|
||||
--- a/src/scheme/proc.rs
|
||||
+++ b/src/scheme/proc.rs
|
||||
@@ -147,6 +147,7 @@ enum ContextHandle {
|
||||
Priority,
|
||||
SchedAffinity,
|
||||
SchedPolicy,
|
||||
+ Name,
|
||||
|
||||
MmapMinAddr(Arc<AddrSpaceWrapper>),
|
||||
}
|
||||
@@ -267,6 +268,7 @@ impl ProcScheme {
|
||||
"sched-affinity" => (ContextHandle::SchedAffinity, true),
|
||||
// TODO: Switch this kernel-local proc handle over to a stable upstream
|
||||
// redox_syscall ProcCall::SetSchedPolicy opcode once that lands.
|
||||
"sched-policy" => (ContextHandle::SchedPolicy, false),
|
||||
+ "name" => (ContextHandle::Name, false),
|
||||
"status" => (ContextHandle::Status { privileged: false }, false),
|
||||
_ if path.starts_with("auth-") => {
|
||||
let nonprefix = &path["auth-".len()..];
|
||||
@@ -1218,6 +1220,16 @@ impl ContextHandle {
|
||||
Ok(2)
|
||||
}
|
||||
+ ContextHandle::Name => {
|
||||
+ let mut name_buf = [0u8; 32];
|
||||
+ let len = buf.copy_common_bytes_to_slice(&mut name_buf[..31]).unwrap_or(0);
|
||||
+ let mut context = context.write(token.token());
|
||||
+ context.name.clear();
|
||||
+ if let Ok(s) = core::str::from_utf8(&name_buf[..len]) {
|
||||
+ context.name.push_str(s);
|
||||
+ }
|
||||
+ Ok(len)
|
||||
+ }
|
||||
ContextHandle::Status { privileged } => {
|
||||
let mut args = buf.usizes();
|
||||
|
||||
@@ -1532,6 +1544,10 @@ impl ContextHandle {
|
||||
let data = [context.sched_policy as u8, context.sched_rt_priority];
|
||||
buf.copy_common_bytes_from_slice(&data)
|
||||
}
|
||||
+ ContextHandle::Name => {
|
||||
+ let context = context.read(token.token());
|
||||
+ buf.copy_common_bytes_from_slice(context.name.as_bytes())
|
||||
+ }
|
||||
ContextHandle::Status { .. } => {
|
||||
let status = {
|
||||
let context = context.read(token.token());
|
||||
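The `name` handle above copies at most 31 bytes and only applies them when they form valid UTF-8. The following sketch mirrors that truncation behaviour outside the kernel types (illustrative only, using a plain byte slice in place of the user buffer):

```rust
// Sketch of the name-write behaviour above: at most 31 bytes are copied and the
// string is only applied if those bytes are valid UTF-8.
fn apply_thread_name(current: &mut String, incoming: &[u8]) -> usize {
    let mut name_buf = [0u8; 32];
    let len = incoming.len().min(31);
    name_buf[..len].copy_from_slice(&incoming[..len]);
    current.clear();
    if let Ok(s) = std::str::from_utf8(&name_buf[..len]) {
        current.push_str(s);
    }
    len
}

fn main() {
    let mut name = String::new();
    let written = apply_thread_name(&mut name, b"worker-pool-dispatcher-thread-long-name");
    assert_eq!(written, 31);
    assert_eq!(name.len(), 31); // longer names are silently truncated, like pthread_setname_np length limits
}
```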
@@ -0,0 +1,70 @@
|
||||
diff --git a/src/scheme/proc.rs b/src/scheme/proc.rs
|
||||
--- a/src/scheme/proc.rs
|
||||
+++ b/src/scheme/proc.rs
|
||||
@@ -145,8 +145,9 @@ enum ContextHandle {
|
||||
// TODO: Remove this once openat is implemented, or allow openat-via-dup via e.g. the top-level
|
||||
// directory.
|
||||
OpenViaDup,
|
||||
+ Priority,
|
||||
SchedAffinity,
|
||||
SchedPolicy,
|
||||
Name,
|
||||
|
||||
MmapMinAddr(Arc<AddrSpaceWrapper>),
|
||||
@@ -160,6 +161,17 @@ pub struct ProcScheme;
|
||||
static NEXT_ID: AtomicUsize = AtomicUsize::new(1);
|
||||
static HANDLES: RwLock<L1, HashMap<usize, Handle>> =
|
||||
RwLock::new(HashMap::with_hasher(DefaultHashBuilder::new()));
|
||||
+
|
||||
+const NICE_MIN: i32 = -20;
|
||||
+const NICE_MAX: i32 = 19;
|
||||
+
|
||||
+fn nice_to_kernel_prio(nice: i32) -> usize {
|
||||
+ (nice.saturating_add(20)).clamp(0, 39) as usize
|
||||
+}
|
||||
+
|
||||
+fn kernel_prio_to_nice(prio: usize) -> i32 {
|
||||
+ (prio.min(39) as i32) - 20
|
||||
+}
|
||||
|
||||
#[cfg(feature = "debugger")]
|
||||
#[allow(dead_code)]
|
||||
pub fn foreach_addrsp(
|
||||
@@ -253,6 +265,7 @@ impl ProcScheme {
|
||||
"sighandler" => (ContextHandle::Sighandler, false),
|
||||
"start" => (ContextHandle::Start, false),
|
||||
"open_via_dup" => (ContextHandle::OpenViaDup, false),
|
||||
+ "priority" => (ContextHandle::Priority, false),
|
||||
"mmap-min-addr" => (
|
||||
ContextHandle::MmapMinAddr(Arc::clone(
|
||||
context
|
||||
@@ -1191,6 +1204,17 @@ impl ContextHandle {
|
||||
|
||||
Ok(size_of_val(&mask))
|
||||
}
|
||||
+ Self::Priority => {
|
||||
+ let nice = unsafe { buf.read_exact::<i32>()? };
|
||||
+ if !(NICE_MIN..=NICE_MAX).contains(&nice) {
|
||||
+ return Err(Error::new(EINVAL));
|
||||
+ }
|
||||
+
|
||||
+ context
|
||||
+ .write(token.token())
|
||||
+ .set_sched_other_prio(nice_to_kernel_prio(nice));
|
||||
+
|
||||
+ Ok(size_of::<i32>())
|
||||
+ }
|
||||
Self::SchedPolicy => {
|
||||
if buf.len() != 2 {
|
||||
return Err(Error::new(EINVAL));
|
||||
@@ -1522,6 +1546,10 @@ impl ContextHandle {
|
||||
|
||||
buf.copy_exactly(crate::cpu_set::mask_as_bytes(&mask))?;
|
||||
Ok(size_of_val(&mask))
|
||||
+ }
|
||||
+ ContextHandle::Priority => {
|
||||
+ let nice = kernel_prio_to_nice(context.read(token.token()).prio);
|
||||
+ buf.copy_common_bytes_from_slice(&nice.to_ne_bytes())
|
||||
}
|
||||
ContextHandle::SchedPolicy => {
|
||||
let context = context.read(token.token());
|
||||
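The nice-to-queue mapping added here is a simple shift and clamp: nice values -20..=19 map onto kernel priorities 0..=39, and the read path inverts it. A standalone check of the round trip, with the helpers copied from the patch and the `main` added only for illustration:

```rust
// Standalone copy of the nice <-> kernel priority mapping from the patch above.
const NICE_MIN: i32 = -20;
const NICE_MAX: i32 = 19;

fn nice_to_kernel_prio(nice: i32) -> usize {
    (nice.saturating_add(20)).clamp(0, 39) as usize
}

fn kernel_prio_to_nice(prio: usize) -> i32 {
    (prio.min(39) as i32) - 20
}

fn main() {
    assert_eq!(nice_to_kernel_prio(0), 20);        // nice 0 -> default queue 20
    assert_eq!(nice_to_kernel_prio(NICE_MIN), 0);  // nice -20 -> queue 0
    assert_eq!(nice_to_kernel_prio(NICE_MAX), 39); // nice 19 -> queue 39
    // The mapping round-trips for every valid nice value.
    for nice in NICE_MIN..=NICE_MAX {
        assert_eq!(kernel_prio_to_nice(nice_to_kernel_prio(nice)), nice);
    }
}
```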
@@ -0,0 +1,364 @@
|
||||
diff --git a/src/syscall/futex.rs b/src/syscall/futex.rs
|
||||
--- a/src/syscall/futex.rs
|
||||
+++ b/src/syscall/futex.rs
|
||||
@@
|
||||
-use crate::syscall::{
|
||||
- data::TimeSpec,
|
||||
- error::{Error, Result, EAGAIN, EFAULT, EINVAL, ETIMEDOUT},
|
||||
- flag::{FUTEX_REQUEUE, FUTEX_WAIT, FUTEX_WAIT64, FUTEX_WAKE},
|
||||
-};
|
||||
+use crate::syscall::{
|
||||
+ data::TimeSpec,
|
||||
+ error::{Error, Result, EAGAIN, EDEADLK, EFAULT, EINVAL, EPERM, ETIMEDOUT},
|
||||
+ flag::{FUTEX_REQUEUE, FUTEX_WAIT, FUTEX_WAIT64, FUTEX_WAKE},
|
||||
+};
|
||||
+
|
||||
+const FUTEX_LOCK_PI: usize = 6;
|
||||
+const FUTEX_UNLOCK_PI: usize = 7;
|
||||
+const FUTEX_TRYLOCK_PI: usize = 8;
|
||||
+
|
||||
+const FUTEX_WAITERS: u32 = 0x8000_0000;
|
||||
+const FUTEX_OWNER_DIED: u32 = 0x4000_0000;
|
||||
+const FUTEX_TID_MASK: u32 = 0x3FFF_FFFF;
|
||||
@@
|
||||
-type FutexList = HashMap<PhysicalAddress, Vec<FutexEntry>>;
|
||||
+type FutexList = HashMap<PhysicalAddress, FutexQueue>;
|
||||
+
|
||||
+#[derive(Clone, Copy, Debug, Eq, PartialEq)]
|
||||
+enum FutexWaitKind {
|
||||
+ Regular,
|
||||
+ PriorityInheritance,
|
||||
+}
|
||||
+
|
||||
+#[derive(Default)]
|
||||
+struct FutexQueue {
|
||||
+ waiters: Vec<FutexEntry>,
|
||||
+ pi_owner: Option<Weak<ContextLock>>,
|
||||
+}
|
||||
+
|
||||
+impl FutexQueue {
|
||||
+ fn is_empty(&self) -> bool {
|
||||
+ self.waiters.is_empty() && self.pi_owner.is_none()
|
||||
+ }
|
||||
+}
|
||||
@@
|
||||
pub struct FutexEntry {
|
||||
@@
|
||||
// address space to check against if virt matches but not phys
|
||||
addr_space: Weak<AddrSpaceWrapper>,
|
||||
+ kind: FutexWaitKind,
|
||||
}
|
||||
@@
|
||||
+fn context_futex_tid(context: &crate::context::Context) -> u32 {
|
||||
+ let tid = u32::try_from(context.pid).unwrap_or(context.debug_id) & FUTEX_TID_MASK;
|
||||
+ if tid == 0 {
|
||||
+ context.debug_id & FUTEX_TID_MASK
|
||||
+ } else {
|
||||
+ tid
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+fn current_context_futex_tid(context_lock: &Arc<ContextLock>, token: &mut CleanLockToken) -> u32 {
|
||||
+ let context = context_lock.read(token.token());
|
||||
+ context_futex_tid(&context)
|
||||
+}
|
||||
+
|
||||
+fn push_owner_waiter(owner: &mut crate::context::Context, phys: PhysicalAddress) {
|
||||
+ if !owner.futex_pi_waiters.iter().any(|waiter| *waiter == phys) {
|
||||
+ owner.futex_pi_waiters.push(phys);
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+fn pop_owner_waiter(owner: &mut crate::context::Context, phys: PhysicalAddress) {
|
||||
+ owner.futex_pi_waiters.retain(|waiter| *waiter != phys);
|
||||
+}
|
||||
+
|
||||
+fn boost_pi_owner(
|
||||
+ owner_lock: &Arc<ContextLock>,
|
||||
+ waiter_prio: usize,
|
||||
+ phys: PhysicalAddress,
|
||||
+ token: &mut crate::sync::LockToken<'_, L1>,
|
||||
+) {
|
||||
+ let mut owner = owner_lock.write(token.token());
|
||||
+ push_owner_waiter(&mut owner, phys);
|
||||
+ if owner.prio > waiter_prio {
|
||||
+ if !owner.futex_pi_boost {
|
||||
+ owner.futex_pi_original_prio = owner.prio;
|
||||
+ }
|
||||
+ owner.futex_pi_boost = true;
|
||||
+ owner.prio = owner.prio.min(waiter_prio);
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+fn restore_pi_owner(owner: &mut crate::context::Context, phys: PhysicalAddress) {
|
||||
+ pop_owner_waiter(owner, phys);
|
||||
+ if owner.futex_pi_boost && owner.futex_pi_waiters.is_empty() {
|
||||
+ owner.futex_pi_boost = false;
|
||||
+ owner.prio = owner.futex_pi_original_prio;
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+fn queue_waiter(
|
||||
+ queue: &mut FutexQueue,
|
||||
+ target_virtaddr: VirtualAddress,
|
||||
+ context_lock: &Arc<ContextLock>,
|
||||
+ addr_space: &Arc<AddrSpaceWrapper>,
|
||||
+ kind: FutexWaitKind,
|
||||
+) {
|
||||
+ queue.waiters.push(FutexEntry {
|
||||
+ target_virtaddr,
|
||||
+ context_lock: Arc::clone(context_lock),
|
||||
+ addr_space: Arc::downgrade(addr_space),
|
||||
+ kind,
|
||||
+ });
|
||||
+}
|
||||
@@
|
||||
- futexes
|
||||
- .entry(locked_physaddr)
|
||||
- .or_insert_with(Vec::new)
|
||||
- .push(FutexEntry {
|
||||
- target_virtaddr,
|
||||
- context_lock: context_lock.clone(),
|
||||
- addr_space: Arc::downgrade(¤t_addrsp),
|
||||
- });
|
||||
+ let queue = futexes.entry(locked_physaddr).or_insert_with(FutexQueue::default);
|
||||
+ queue_waiter(
|
||||
+ queue,
|
||||
+ target_virtaddr,
|
||||
+ &context_lock,
|
||||
+ ¤t_addrsp,
|
||||
+ FutexWaitKind::Regular,
|
||||
+ );
|
||||
@@
|
||||
- let remove_queue = if let Some(futexes) = futexes_map.get_mut(&target_physaddr) {
|
||||
- let mut i = 0;
|
||||
- let current_addrsp_weak = Arc::downgrade(¤t_addrsp);
|
||||
- while i < futexes.len() && woken < val {
|
||||
- let futex = unsafe { futexes.get_unchecked_mut(i) };
|
||||
- if futex.target_virtaddr != target_virtaddr
|
||||
- || !current_addrsp_weak.ptr_eq(&futex.addr_space)
|
||||
- {
|
||||
- i += 1;
|
||||
- continue;
|
||||
- }
|
||||
- futex.context_lock.write(futex_token.token()).unblock();
|
||||
- futexes.swap_remove(i);
|
||||
- woken += 1;
|
||||
- }
|
||||
- futexes.is_empty()
|
||||
+ let remove_queue = if let Some(queue) = futexes_map.get_mut(&target_physaddr) {
|
||||
+ let mut i = 0;
|
||||
+ let current_addrsp_weak = Arc::downgrade(¤t_addrsp);
|
||||
+ while i < queue.waiters.len() && woken < val {
|
||||
+ let waiter = match queue.waiters.get(i) {
|
||||
+ Some(waiter) => waiter,
|
||||
+ None => break,
|
||||
+ };
|
||||
+ if waiter.kind != FutexWaitKind::Regular
|
||||
+ || waiter.target_virtaddr != target_virtaddr
|
||||
+ || !current_addrsp_weak.ptr_eq(&waiter.addr_space)
|
||||
+ {
|
||||
+ i += 1;
|
||||
+ continue;
|
||||
+ }
|
||||
+ let waiter = queue.waiters.swap_remove(i);
|
||||
+ waiter.context_lock.write(futex_token.token()).unblock();
|
||||
+ woken += 1;
|
||||
+ }
|
||||
+ queue.is_empty()
|
||||
} else {
|
||||
false
|
||||
};
|
||||
@@
|
||||
- let mut source_waiters = source_map.remove(&locked_source_physaddr).unwrap_or_default();
|
||||
+ let mut source_queue = source_map.remove(&locked_source_physaddr).unwrap_or_default();
|
||||
@@
|
||||
- total_woken = wake_from(&mut source_waiters, val, &mut futex_token);
|
||||
+ total_woken = wake_from(&mut source_queue.waiters, val, &mut futex_token);
|
||||
@@
|
||||
- let mut target_waiters = target_map.remove(&locked_target_physaddr).unwrap_or_default();
|
||||
- let mut i = 0;
|
||||
- while i < source_waiters.len() && total_requeued < val2 {
|
||||
- let should_move = source_waiters
|
||||
+ let mut target_queue = target_map.remove(&locked_target_physaddr).unwrap_or_default();
|
||||
+ let mut i = 0;
|
||||
+ while i < source_queue.waiters.len() && total_requeued < val2 {
|
||||
+ let should_move = source_queue
|
||||
+ .waiters
|
||||
.get(i)
|
||||
.map(|waiter| {
|
||||
- waiter.target_virtaddr == target_virtaddr
|
||||
+ waiter.kind == FutexWaitKind::Regular
|
||||
+ && waiter.target_virtaddr == target_virtaddr
|
||||
&& current_addrsp_weak.ptr_eq(&waiter.addr_space)
|
||||
})
|
||||
.unwrap_or(false);
|
||||
@@
|
||||
- let mut waiter = source_waiters.swap_remove(i);
|
||||
- waiter.target_virtaddr = target2_virtaddr;
|
||||
- target_waiters.push(waiter);
|
||||
+ let mut waiter = source_queue.waiters.swap_remove(i);
|
||||
+ waiter.target_virtaddr = target2_virtaddr;
|
||||
+ target_queue.waiters.push(waiter);
|
||||
total_requeued += 1;
|
||||
}
|
||||
- if !target_waiters.is_empty() {
|
||||
- target_map.insert(locked_target_physaddr, target_waiters);
|
||||
+ if !target_queue.is_empty() {
|
||||
+ target_map.insert(locked_target_physaddr, target_queue);
|
||||
}
|
||||
@@
|
||||
- if !source_waiters.is_empty() {
|
||||
- source_map.insert(locked_source_physaddr, source_waiters);
|
||||
+ if !source_queue.is_empty() {
|
||||
+ source_map.insert(locked_source_physaddr, source_queue);
|
||||
}
|
||||
@@
|
||||
+ FUTEX_LOCK_PI | FUTEX_TRYLOCK_PI => {
|
||||
+ let _ = validate_futex_u32_addr(addr)?;
|
||||
+ let context_lock = context::current();
|
||||
+ let current_tid = current_context_futex_tid(&context_lock, token);
|
||||
+ let current_prio = context_lock.read(token.token()).prio;
|
||||
+
|
||||
+ loop {
|
||||
+ let outcome = {
|
||||
+ let shard = futex_shard(target_physaddr);
|
||||
+ let mut futexes = FUTEXES[shard].lock(token.token());
|
||||
+ let (futexes, mut futex_token) = futexes.token_split();
|
||||
+ let addr_space_guard = current_addrsp.acquire_read(futex_token.downgrade());
|
||||
+ let locked_physaddr = validate_and_translate_virt(&addr_space_guard, target_virtaddr)
|
||||
+ .ok_or(Error::new(EFAULT))?;
|
||||
+ if locked_physaddr != target_physaddr {
|
||||
+ None
|
||||
+ } else {
|
||||
+ drop(addr_space_guard);
|
||||
+ let futex_atomic = futex_atomic_u32(locked_physaddr);
|
||||
+ let mut current = futex_atomic.load(Ordering::SeqCst);
|
||||
+ loop {
|
||||
+ let owner_tid = current & FUTEX_TID_MASK;
|
||||
+ let queue = futexes.entry(locked_physaddr).or_insert_with(FutexQueue::default);
|
||||
+ let desired_waiters = if queue.waiters.is_empty() { 0 } else { FUTEX_WAITERS };
|
||||
+
|
||||
+ if owner_tid == 0 {
|
||||
+ let desired = current_tid | desired_waiters;
|
||||
+ match futex_atomic.compare_exchange(current, desired, Ordering::SeqCst, Ordering::SeqCst) {
|
||||
+ Ok(_) => {
|
||||
+ queue.pi_owner = Some(Arc::downgrade(&context_lock));
|
||||
+ break Some(Ok(Ok(0)));
|
||||
+ }
|
||||
+ Err(actual) => current = actual,
|
||||
+ }
|
||||
+ continue;
|
||||
+ }
|
||||
+
|
||||
+ if owner_tid == current_tid {
|
||||
+ break Some(Ok(Err(Error::new(EDEADLK))));
|
||||
+ }
|
||||
+
|
||||
+ if op == FUTEX_TRYLOCK_PI {
|
||||
+ break Some(Ok(Err(Error::new(EAGAIN))));
|
||||
+ }
|
||||
+
|
||||
+ if let Some(owner_lock) = queue.pi_owner.as_ref().and_then(Weak::upgrade) {
|
||||
+ boost_pi_owner(&owner_lock, current_prio, locked_physaddr, &mut futex_token);
|
||||
+ }
|
||||
+
|
||||
+ {
|
||||
+ let mut context = context_lock.write(futex_token.token());
|
||||
+ if let Some((tctl, pctl, _)) = context.sigcontrol()
|
||||
+ && tctl.currently_pending_unblocked(pctl) != 0
|
||||
+ {
|
||||
+ break Some(Ok(Err(Error::new(EINTR))));
|
||||
+ }
|
||||
+ context.wake = None;
|
||||
+ context.block("futex_pi");
|
||||
+ }
|
||||
+
|
||||
+ queue_waiter(
|
||||
+ queue,
|
||||
+ target_virtaddr,
|
||||
+ &context_lock,
|
||||
+ &current_addrsp,
|
||||
+ FutexWaitKind::PriorityInheritance,
|
||||
+ );
|
||||
+ futex_atomic.fetch_or(FUTEX_WAITERS, Ordering::SeqCst);
|
||||
+ break Some(Ok(Ok(1)));
|
||||
+ }
|
||||
+ }
|
||||
+ };
|
||||
+
|
||||
+ match outcome {
|
||||
+ None => continue,
|
||||
+ Some(Ok(Ok(0))) => return Ok(0),
|
||||
+ Some(Ok(Ok(_))) => context::switch(token),
|
||||
+ Some(Ok(Err(err))) => return Err(err),
|
||||
+ Some(Err(err)) => return Err(err),
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+ FUTEX_UNLOCK_PI => {
|
||||
+ let _ = validate_futex_u32_addr(addr)?;
|
||||
+ let context_lock = context::current();
|
||||
+ let current_tid = current_context_futex_tid(&context_lock, token);
|
||||
+ let shard = futex_shard(target_physaddr);
|
||||
+ let current_addrsp_weak = Arc::downgrade(&current_addrsp);
|
||||
+
|
||||
+ let unlocked = {
|
||||
+ let mut futexes = FUTEXES[shard].lock(token.token());
|
||||
+ let (futexes, mut futex_token) = futexes.token_split();
|
||||
+ let addr_space_guard = current_addrsp.acquire_read(futex_token.downgrade());
|
||||
+ let locked_physaddr = validate_and_translate_virt(&addr_space_guard, target_virtaddr)
|
||||
+ .ok_or(Error::new(EFAULT))?;
|
||||
+ if locked_physaddr != target_physaddr {
|
||||
+ return Err(Error::new(EAGAIN));
|
||||
+ }
|
||||
+ drop(addr_space_guard);
|
||||
+
|
||||
+ let futex_atomic = futex_atomic_u32(locked_physaddr);
|
||||
+ let current = futex_atomic.load(Ordering::SeqCst);
|
||||
+ if (current & FUTEX_TID_MASK) != current_tid {
|
||||
+ return Err(Error::new(EPERM));
|
||||
+ }
|
||||
+
|
||||
+ let mut wake_one = None;
|
||||
+ let mut new = current & !(FUTEX_TID_MASK | FUTEX_OWNER_DIED);
|
||||
+ if let Some(queue) = futexes.get_mut(&locked_physaddr) {
|
||||
+ queue.pi_owner = None;
|
||||
+ let mut best = None;
|
||||
+ for (idx, waiter) in queue.waiters.iter().enumerate() {
|
||||
+ if waiter.kind != FutexWaitKind::PriorityInheritance
|
||||
+ || waiter.target_virtaddr != target_virtaddr
|
||||
+ || !current_addrsp_weak.ptr_eq(&waiter.addr_space)
|
||||
+ {
|
||||
+ continue;
|
||||
+ }
|
||||
+ let prio = waiter.context_lock.read(futex_token.token()).prio;
|
||||
+ match best {
|
||||
+ Some((_, best_prio)) if prio >= best_prio => {}
|
||||
+ _ => best = Some((idx, prio)),
|
||||
+ }
|
||||
+ }
|
||||
+ if let Some((waiter_idx, _)) = best {
|
||||
+ wake_one = Some(queue.waiters.swap_remove(waiter_idx));
|
||||
+ }
|
||||
+ if !queue.waiters.is_empty() {
|
||||
+ new |= FUTEX_WAITERS;
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ futex_atomic.store(new, Ordering::SeqCst);
|
||||
+ {
|
||||
+ let mut context = context_lock.write(futex_token.token());
|
||||
+ restore_pi_owner(&mut context, locked_physaddr);
|
||||
+ }
|
||||
+ if let Some(waiter) = wake_one {
|
||||
+ waiter.context_lock.write(futex_token.token()).unblock();
|
||||
+ }
|
||||
+ true
|
||||
+ };
|
||||
+
|
||||
+ Ok(usize::from(unlocked))
|
||||
+ }
|
||||
_ => Err(Error::new(EINVAL)),
|
||||
}
|
||||
}
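For orientation, the userspace half of this protocol looks roughly like the sketch below. It is illustrative only: `sys_futex`, the op constants, and the error type are placeholders rather than the relibc bindings; the fast paths stay in userspace and only contended cases reach FUTEX_LOCK_PI / FUTEX_UNLOCK_PI.

```rust
use core::sync::atomic::{AtomicU32, Ordering};

// Placeholder wrapper; the real syscall binding lives in relibc.
unsafe fn sys_futex(_addr: *mut u32, _op: usize, _val: u32) -> Result<usize, i32> {
    unimplemented!()
}

// Assumed op numbers, for this sketch only.
const FUTEX_LOCK_PI: usize = 6;
const FUTEX_UNLOCK_PI: usize = 7;

fn pi_lock(word: &AtomicU32, my_tid: u32) {
    // Uncontended acquire never enters the kernel: CAS 0 -> TID.
    if word
        .compare_exchange(0, my_tid, Ordering::Acquire, Ordering::Relaxed)
        .is_ok()
    {
        return;
    }
    // Contended: the kernel queues us as a PriorityInheritance waiter and
    // boosts the current owner until it unlocks (EINTR handling omitted).
    unsafe { sys_futex(word.as_ptr(), FUTEX_LOCK_PI, 0).expect("FUTEX_LOCK_PI") };
}

fn pi_unlock(word: &AtomicU32, my_tid: u32) {
    // If FUTEX_WAITERS is set the word is TID|WAITERS, so this CAS fails and
    // we fall through to the kernel, which hands the lock to the best waiter.
    if word
        .compare_exchange(my_tid, 0, Ordering::Release, Ordering::Relaxed)
        .is_ok()
    {
        return;
    }
    unsafe { sys_futex(word.as_ptr(), FUTEX_UNLOCK_PI, 0).expect("FUTEX_UNLOCK_PI") };
}
```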
@@ -0,0 +1,282 @@
diff --git a/src/syscall/debug.rs b/src/syscall/debug.rs
--- a/src/syscall/debug.rs
+++ b/src/syscall/debug.rs
@@
- SYS_FUTEX => format!(
- "futex({:#X} [{:?}], {}, {}, {}, {})",
+ SYS_FUTEX => format!(
+ "futex({:#X} [{:?}], {}, {}, {}, {}, {})",
b,
UserSlice::ro(b, 4).and_then(|buf| buf.read_u32()),
c,
d,
e,
- f
+ f,
+ g,
),
diff --git a/src/syscall/futex.rs b/src/syscall/futex.rs
|
||||
--- a/src/syscall/futex.rs
|
||||
+++ b/src/syscall/futex.rs
|
||||
@@
|
||||
-use crate::syscall::{
|
||||
- data::TimeSpec,
|
||||
- error::{Error, Result, EAGAIN, EFAULT, EINVAL, ETIMEDOUT},
|
||||
- flag::{FUTEX_WAIT, FUTEX_WAIT64, FUTEX_WAKE},
|
||||
-};
|
||||
+use crate::syscall::{
|
||||
+ data::TimeSpec,
|
||||
+ error::{Error, Result, EAGAIN, EFAULT, EINVAL, ETIMEDOUT},
|
||||
+ flag::{FUTEX_REQUEUE, FUTEX_WAIT, FUTEX_WAIT64, FUTEX_WAKE},
|
||||
+};
|
||||
+
|
||||
+const FUTEX_CMP_REQUEUE: usize = 4;
|
||||
@@
|
||||
pub struct FutexEntry {
|
||||
@@
|
||||
}
|
||||
+
|
||||
+fn validate_futex_u32_addr(addr: usize) -> Result<VirtualAddress> {
|
||||
+ if !addr.is_multiple_of(4) {
|
||||
+ return Err(Error::new(EINVAL));
|
||||
+ }
|
||||
+ Ok(VirtualAddress::new(addr))
|
||||
+}
|
||||
+
|
||||
+fn lock_futex_pair<R>(
|
||||
+ first_shard: usize,
|
||||
+ second_shard: usize,
|
||||
+ token: &mut CleanLockToken,
|
||||
+ f: impl FnOnce(&mut FutexList, Option<&mut FutexList>, crate::sync::LockToken<'_, L1>) -> R,
|
||||
+) -> R {
|
||||
+ if first_shard == second_shard {
|
||||
+ let mut guard = FUTEXES[first_shard].lock(token.token());
|
||||
+ let (map, map_token) = guard.token_split();
|
||||
+ return f(map, None, map_token);
|
||||
+ }
|
||||
+
|
||||
+ let low = core::cmp::min(first_shard, second_shard);
|
||||
+ let high = core::cmp::max(first_shard, second_shard);
|
||||
+
|
||||
+ let mut low_guard = FUTEXES[low].lock(token.token());
|
||||
+ let (low_map, low_token) = low_guard.token_split();
|
||||
+ let mut high_guard = unsafe { FUTEXES[high].relock(low_token) };
|
||||
+ let (high_map, high_token) = high_guard.token_split();
|
||||
+
|
||||
+ if first_shard == low {
|
||||
+ f(low_map, Some(high_map), high_token)
|
||||
+ } else {
|
||||
+ f(high_map, Some(low_map), high_token)
|
||||
+ }
|
||||
+}
|
||||
@@
|
||||
-pub fn futex(
|
||||
- addr: usize,
|
||||
- op: usize,
|
||||
- val: usize,
|
||||
- val2: usize,
|
||||
- _addr2: usize,
|
||||
- token: &mut CleanLockToken,
|
||||
-) -> Result<usize> {
|
||||
+pub fn futex(
|
||||
+ addr: usize,
|
||||
+ op: usize,
|
||||
+ val: usize,
|
||||
+ val2: usize,
|
||||
+ addr2: usize,
|
||||
+ val3: usize,
|
||||
+ token: &mut CleanLockToken,
|
||||
+) -> Result<usize> {
|
||||
@@
|
||||
- {
|
||||
- // TODO: Lock ordering violation
|
||||
- let mut token = unsafe { CleanLockToken::new() };
|
||||
- let mut futexes = FUTEXES[futex_shard(target_physaddr)].lock(token.token());
|
||||
- let (futexes, mut token) = futexes.token_split();
|
||||
+ loop {
|
||||
+ let shard = futex_shard(target_physaddr);
|
||||
+ let queued = {
|
||||
+ let mut futexes = FUTEXES[shard].lock(token.token());
|
||||
+ let (futexes, mut futex_token) = futexes.token_split();
|
||||
+ let addr_space_guard = current_addrsp.acquire_read(futex_token.downgrade());
|
||||
+ let locked_physaddr = validate_and_translate_virt(&addr_space_guard, target_virtaddr)
|
||||
+ .ok_or(Error::new(EFAULT))?;
|
||||
+ if locked_physaddr != target_physaddr {
|
||||
+ false
|
||||
+ } else {
|
||||
+ drop(addr_space_guard);
|
||||
@@
|
||||
- futexes
|
||||
- .entry(target_physaddr)
|
||||
- .or_insert_with(Vec::new)
|
||||
- .push(FutexEntry {
|
||||
- target_virtaddr,
|
||||
- context_lock: context_lock.clone(),
|
||||
- addr_space: Arc::downgrade(¤t_addrsp),
|
||||
- });
|
||||
- }
|
||||
+ futexes
|
||||
+ .entry(locked_physaddr)
|
||||
+ .or_insert_with(Vec::new)
|
||||
+ .push(FutexEntry {
|
||||
+ target_virtaddr,
|
||||
+ context_lock: context_lock.clone(),
|
||||
+ addr_space: Arc::downgrade(&current_addrsp),
|
||||
+ });
|
||||
+ true
|
||||
+ }
|
||||
+ };
|
||||
+
|
||||
+ if queued {
|
||||
+ break;
|
||||
+ }
|
||||
+ }
|
||||
@@
|
||||
- drop(addr_space_guard);
|
||||
-
|
||||
context::switch(token);
|
||||
@@
|
||||
FUTEX_WAKE => {
|
||||
@@
|
||||
Ok(woken)
|
||||
}
|
||||
+ FUTEX_REQUEUE | FUTEX_CMP_REQUEUE => {
|
||||
+ let _ = validate_futex_u32_addr(addr)?;
|
||||
+ let target2_virtaddr = validate_futex_u32_addr(addr2)?;
|
||||
+ let target2_physaddr = {
|
||||
+ let addr_space_guard = current_addrsp.acquire_read(token.downgrade());
|
||||
+ validate_and_translate_virt(&addr_space_guard, target2_virtaddr)
|
||||
+ .ok_or(Error::new(EFAULT))?
|
||||
+ };
|
||||
+ let source_shard = futex_shard(target_physaddr);
|
||||
+ let target_shard = futex_shard(target2_physaddr);
|
||||
+ let current_addrsp_weak = Arc::downgrade(&current_addrsp);
|
||||
+
|
||||
+ let affected = lock_futex_pair(
|
||||
+ source_shard,
|
||||
+ target_shard,
|
||||
+ token,
|
||||
+ |source_map, target_map_opt, mut futex_token| {
|
||||
+ let addr_space_guard = current_addrsp.acquire_read(futex_token.downgrade());
|
||||
+ let locked_source_physaddr = validate_and_translate_virt(&addr_space_guard, target_virtaddr)
|
||||
+ .ok_or(Error::new(EFAULT))?;
|
||||
+ let locked_target_physaddr = validate_and_translate_virt(&addr_space_guard, target2_virtaddr)
|
||||
+ .ok_or(Error::new(EFAULT))?;
|
||||
+ drop(addr_space_guard);
|
||||
+
|
||||
+ if locked_source_physaddr != target_physaddr || locked_target_physaddr != target2_physaddr {
|
||||
+ return Err(Error::new(EAGAIN));
|
||||
+ }
|
||||
+
|
||||
+ if op == FUTEX_CMP_REQUEUE {
|
||||
+ let accessible_addr = crate::memory::RmmA::phys_to_virt(locked_source_physaddr).data();
|
||||
+ let current = u64::from(unsafe {
|
||||
+ (*(accessible_addr as *const AtomicU32)).load(Ordering::SeqCst)
|
||||
+ });
|
||||
+ if current != u64::from(val3 as u32) {
|
||||
+ return Err(Error::new(EAGAIN));
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ let mut source_waiters = source_map.remove(&locked_source_physaddr).unwrap_or_default();
|
||||
+ let mut total_woken = 0;
|
||||
+ let mut total_requeued = 0;
|
||||
+
|
||||
+ let wake_from = |waiters: &mut Vec<FutexEntry>, limit: usize, token: &mut crate::sync::LockToken<'_, L1>| {
|
||||
+ let mut woken = 0;
|
||||
+ let mut i = 0;
|
||||
+ while i < waiters.len() && woken < limit {
|
||||
+ let waiter = match waiters.get(i) {
|
||||
+ Some(waiter) => waiter,
|
||||
+ None => break,
|
||||
+ };
|
||||
+ if waiter.target_virtaddr != target_virtaddr || !current_addrsp_weak.ptr_eq(&waiter.addr_space) {
|
||||
+ i += 1;
|
||||
+ continue;
|
||||
+ }
|
||||
+ let waiter = waiters.swap_remove(i);
|
||||
+ waiter.context_lock.write(token.token()).unblock();
|
||||
+ woken += 1;
|
||||
+ }
|
||||
+ woken
|
||||
+ };
|
||||
+
|
||||
+ total_woken = wake_from(&mut source_waiters, val, &mut futex_token);
|
||||
+
|
||||
+ if let Some(target_map) = target_map_opt {
|
||||
+ let mut target_waiters = target_map.remove(&locked_target_physaddr).unwrap_or_default();
|
||||
+ let mut i = 0;
|
||||
+ while i < source_waiters.len() && total_requeued < val2 {
|
||||
+ let should_move = source_waiters
|
||||
+ .get(i)
|
||||
+ .map(|waiter| {
|
||||
+ waiter.target_virtaddr == target_virtaddr
|
||||
+ && current_addrsp_weak.ptr_eq(&waiter.addr_space)
|
||||
+ })
|
||||
+ .unwrap_or(false);
|
||||
+ if !should_move {
|
||||
+ i += 1;
|
||||
+ continue;
|
||||
+ }
|
||||
+ let mut waiter = source_waiters.swap_remove(i);
|
||||
+ waiter.target_virtaddr = target2_virtaddr;
|
||||
+ target_waiters.push(waiter);
|
||||
+ total_requeued += 1;
|
||||
+ }
|
||||
+ if !target_waiters.is_empty() {
|
||||
+ target_map.insert(locked_target_physaddr, target_waiters);
|
||||
+ }
|
||||
+ } else if locked_source_physaddr == locked_target_physaddr {
|
||||
+ for waiter in source_waiters.iter_mut() {
|
||||
+ if total_requeued >= val2 {
|
||||
+ break;
|
||||
+ }
|
||||
+ if waiter.target_virtaddr == target_virtaddr && current_addrsp_weak.ptr_eq(&waiter.addr_space) {
|
||||
+ waiter.target_virtaddr = target2_virtaddr;
|
||||
+ total_requeued += 1;
|
||||
+ }
|
||||
+ }
|
||||
+ } else {
|
||||
+ let mut target_waiters = source_map.remove(&locked_target_physaddr).unwrap_or_default();
|
||||
+ let mut i = 0;
|
||||
+ while i < source_waiters.len() && total_requeued < val2 {
|
||||
+ let should_move = source_waiters
|
||||
+ .get(i)
|
||||
+ .map(|waiter| {
|
||||
+ waiter.target_virtaddr == target_virtaddr
|
||||
+ && current_addrsp_weak.ptr_eq(&waiter.addr_space)
|
||||
+ })
|
||||
+ .unwrap_or(false);
|
||||
+ if !should_move {
|
||||
+ i += 1;
|
||||
+ continue;
|
||||
+ }
|
||||
+ let mut waiter = source_waiters.swap_remove(i);
|
||||
+ waiter.target_virtaddr = target2_virtaddr;
|
||||
+ target_waiters.push(waiter);
|
||||
+ total_requeued += 1;
|
||||
+ }
|
||||
+ if !target_waiters.is_empty() {
|
||||
+ source_map.insert(locked_target_physaddr, target_waiters);
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ if !source_waiters.is_empty() {
|
||||
+ source_map.insert(locked_source_physaddr, source_waiters);
|
||||
+ }
|
||||
+
|
||||
+ Ok(total_woken + total_requeued)
|
||||
+ },
|
||||
+ )?;
|
||||
+
|
||||
+ Ok(affected)
|
||||
+ }
|
||||
_ => Err(Error::new(EINVAL)),
|
||||
}
|
||||
}
|
||||
diff --git a/src/syscall/mod.rs b/src/syscall/mod.rs
|
||||
--- a/src/syscall/mod.rs
|
||||
+++ b/src/syscall/mod.rs
|
||||
@@
|
||||
- SYS_FUTEX => futex(b, c, d, e, f, token),
|
||||
+ SYS_FUTEX => futex(b, c, d, e, f, g, token),
|
||||
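The main consumer of FUTEX_CMP_REQUEUE is condvar broadcast: wake one waiter and requeue the rest onto the mutex word so they do not all stampede back at once. A minimal sketch, assuming a `sys_futex7` wrapper for the new 7-argument call (placeholder name, not the relibc API):

```rust
const FUTEX_CMP_REQUEUE: usize = 4;

// Placeholder for the 7-argument futex wrapper (addr, op, val, val2, addr2, val3).
unsafe fn sys_futex7(
    _addr: *mut u32,
    _op: usize,
    _val: usize,
    _val2: usize,
    _addr2: *mut u32,
    _val3: usize,
) -> Result<usize, i32> {
    unimplemented!()
}

/// Wake at most one waiter on `cond_word`, move everyone else onto `mutex_word`.
/// `expected_seq` makes the kernel fail with EAGAIN if the condvar word changed
/// between the userspace check and the requeue.
unsafe fn cond_broadcast(
    cond_word: *mut u32,
    mutex_word: *mut u32,
    expected_seq: u32,
) -> Result<usize, i32> {
    sys_futex7(
        cond_word,
        FUTEX_CMP_REQUEUE,
        1,                      // wake one waiter
        usize::MAX,             // requeue the rest
        mutex_word,             // requeue target
        expected_seq as usize,  // val3 compared against the condvar word
    )
}
```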
@@ -0,0 +1,264 @@
|
||||
diff --git a/src/context/context.rs b/src/context/context.rs
|
||||
--- a/src/context/context.rs
|
||||
+++ b/src/context/context.rs
|
||||
@@
|
||||
#[allow(dead_code)]
|
||||
pub futex_pi_waiters: Vec<PhysicalAddress>,
|
||||
+ pub robust_list_head: Option<usize>,
|
||||
@@
|
||||
futex_pi_boost: false,
|
||||
futex_pi_original_prio: DEFAULT_SCHED_OTHER_PRIORITY,
|
||||
futex_pi_waiters: Vec::new(),
|
||||
+ robust_list_head: None,
|
||||
being_sigkilled: false,
|
||||
diff --git a/src/syscall/debug.rs b/src/syscall/debug.rs
|
||||
--- a/src/syscall/debug.rs
|
||||
+++ b/src/syscall/debug.rs
|
||||
@@
|
||||
use crate::{sync::CleanLockToken, syscall::error::Result};
|
||||
+
|
||||
+const SYS_SET_ROBUST_LIST: usize = 311;
|
||||
+const SYS_GET_ROBUST_LIST: usize = 312;
|
||||
@@
|
||||
SYS_FUTEX => format!(
|
||||
"futex({:#X} [{:?}], {}, {}, {}, {}, {})",
|
||||
@@
|
||||
),
|
||||
+ SYS_SET_ROBUST_LIST => format!("set_robust_list({:#X}, {})", b, c),
|
||||
+ SYS_GET_ROBUST_LIST => format!("get_robust_list({}, {:#X}, {:#X})", b, c, d),
|
||||
SYS_MKNS => format!(
|
||||
diff --git a/src/syscall/futex.rs b/src/syscall/futex.rs
|
||||
--- a/src/syscall/futex.rs
|
||||
+++ b/src/syscall/futex.rs
|
||||
@@
|
||||
-use crate::syscall::{
|
||||
- data::TimeSpec,
|
||||
- error::{Error, Result, EAGAIN, EDEADLK, EFAULT, EINVAL, EPERM, ETIMEDOUT},
|
||||
- flag::{FUTEX_REQUEUE, FUTEX_WAIT, FUTEX_WAIT64, FUTEX_WAKE},
|
||||
-};
|
||||
+use crate::syscall::{
|
||||
+ data::TimeSpec,
|
||||
+ error::{Error, Result, EAGAIN, EDEADLK, EFAULT, EINVAL, EPERM, ESRCH, ETIMEDOUT},
|
||||
+ flag::{FUTEX_REQUEUE, FUTEX_WAIT, FUTEX_WAIT64, FUTEX_WAKE},
|
||||
+};
|
||||
+
|
||||
+use super::usercopy::UserSliceWo;
|
||||
@@
|
||||
const FUTEX_WAITERS: u32 = 0x8000_0000;
|
||||
const FUTEX_OWNER_DIED: u32 = 0x4000_0000;
|
||||
const FUTEX_TID_MASK: u32 = 0x3FFF_FFFF;
|
||||
+
|
||||
+const ROBUST_LIST_LIMIT: usize = 2048;
|
||||
+const ROBUST_LIST_HEAD_SIZE: usize = size_of::<RobustListHead>();
|
||||
@@
|
||||
pub struct FutexEntry {
|
||||
@@
|
||||
}
|
||||
+
|
||||
+#[derive(Clone, Copy, Debug)]
|
||||
+#[repr(C)]
|
||||
+struct RobustList {
|
||||
+ next: usize,
|
||||
+}
|
||||
+
|
||||
+#[derive(Clone, Copy, Debug)]
|
||||
+#[repr(C)]
|
||||
+struct RobustListHead {
|
||||
+ list: RobustList,
|
||||
+ futex_offset: isize,
|
||||
+ list_op_pending: usize,
|
||||
+}
|
||||
@@
|
||||
+fn lookup_robust_list_head(pid: usize, token: &mut CleanLockToken) -> Result<(usize, usize)> {
|
||||
+ let current = context::current();
|
||||
+ {
|
||||
+ let current_guard = current.read(token.token());
|
||||
+ if pid == 0 || current_guard.pid == pid {
|
||||
+ return Ok((current_guard.robust_list_head.unwrap_or(0), ROBUST_LIST_HEAD_SIZE));
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ let mut token_ref = token.token();
|
||||
+ let mut contexts = context::contexts(token_ref.downgrade());
|
||||
+ let (contexts, mut contexts_token) = contexts.token_split();
|
||||
+ for context_ref in contexts.iter() {
|
||||
+ let context = context_ref.read(contexts_token.token());
|
||||
+ if context.pid == pid {
|
||||
+ return Ok((context.robust_list_head.unwrap_or(0), ROBUST_LIST_HEAD_SIZE));
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ Err(Error::new(ESRCH))
|
||||
+}
|
||||
+
|
||||
+fn walk_robust_list_node(
|
||||
+ node_ptr: usize,
|
||||
+ futex_offset: isize,
|
||||
+ owner_tid: u32,
|
||||
+ token: &mut CleanLockToken,
|
||||
+) {
|
||||
+ if node_ptr == 0 {
|
||||
+ return;
|
||||
+ }
|
||||
+
|
||||
+ let Ok(futex_addr) = node_ptr.checked_add_signed(futex_offset).ok_or(Error::new(EFAULT)) else {
|
||||
+ return;
|
||||
+ };
|
||||
+ let Ok(target_virtaddr) = validate_futex_u32_addr(futex_addr) else {
|
||||
+ return;
|
||||
+ };
|
||||
+
|
||||
+ let current_addrsp = match AddrSpace::current() {
|
||||
+ Ok(addrsp) => addrsp,
|
||||
+ Err(_) => return,
|
||||
+ };
|
||||
+
|
||||
+ // Translate once (read-locked) just to pick the shard; bail out quietly on fault.
+ let shard = {
+ let addr_space_guard = current_addrsp.acquire_read(token.downgrade());
+ let Some(physaddr) = validate_and_translate_virt(&addr_space_guard, target_virtaddr) else {
+ return;
+ };
+ futex_shard(physaddr)
+ };
|
||||
+
|
||||
+ let mut futexes = FUTEXES[shard].lock(token.token());
|
||||
+ let (futexes, mut futex_token) = futexes.token_split();
|
||||
+ let addr_space_guard = current_addrsp.acquire_read(futex_token.downgrade());
|
||||
+ let Some(locked_physaddr) = validate_and_translate_virt(&addr_space_guard, target_virtaddr) else {
|
||||
+ return;
|
||||
+ };
|
||||
+ drop(addr_space_guard);
|
||||
+
|
||||
+ let futex_atomic = futex_atomic_u32(locked_physaddr);
|
||||
+ let current = futex_atomic.load(Ordering::SeqCst);
|
||||
+ if (current & FUTEX_TID_MASK) != owner_tid {
|
||||
+ return;
|
||||
+ }
|
||||
+
|
||||
+ let mut new = (current & FUTEX_WAITERS) | FUTEX_OWNER_DIED;
|
||||
+ if let Some(queue) = futexes.get_mut(&locked_physaddr) {
|
||||
+ queue.pi_owner = None;
|
||||
+ let mut woke = false;
|
||||
+ let mut i = 0;
|
||||
+ while i < queue.waiters.len() && !woke {
|
||||
+ let waiter = match queue.waiters.get(i) {
|
||||
+ Some(waiter) => waiter,
|
||||
+ None => break,
|
||||
+ };
|
||||
+ if waiter.target_virtaddr != target_virtaddr || !Arc::downgrade(&current_addrsp).ptr_eq(&waiter.addr_space) {
|
||||
+ i += 1;
|
||||
+ continue;
|
||||
+ }
|
||||
+ let waiter = queue.waiters.swap_remove(i);
|
||||
+ waiter.context_lock.write(futex_token.token()).unblock();
|
||||
+ woke = true;
|
||||
+ }
|
||||
+ if !queue.waiters.is_empty() {
|
||||
+ new |= FUTEX_WAITERS;
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ futex_atomic.store(new, Ordering::SeqCst);
|
||||
+}
|
||||
+
|
||||
+pub fn cleanup_current_robust_futexes(token: &mut CleanLockToken) {
|
||||
+ let context_lock = context::current();
|
||||
+ let (head_ptr, owner_tid) = {
|
||||
+ let context = context_lock.read(token.token());
|
||||
+ let Some(head_ptr) = context.robust_list_head else {
|
||||
+ return;
|
||||
+ };
|
||||
+ (head_ptr, context_futex_tid(&context))
|
||||
+ };
|
||||
+
|
||||
+ let Ok(head) = UserSlice::ro(head_ptr, ROBUST_LIST_HEAD_SIZE)
|
||||
+ .and_then(|slice| unsafe { slice.read_exact::<RobustListHead>() })
|
||||
+ else {
|
||||
+ return;
|
||||
+ };
|
||||
+
|
||||
+ let mut next = head.list.next;
|
||||
+ let mut walked = 0;
|
||||
+ while next != 0 && next != head_ptr && walked < ROBUST_LIST_LIMIT {
|
||||
+ let node_ptr = next;
|
||||
+ let Ok(node) = UserSlice::ro(node_ptr, size_of::<RobustList>())
|
||||
+ .and_then(|slice| unsafe { slice.read_exact::<RobustList>() })
|
||||
+ else {
|
||||
+ break;
|
||||
+ };
|
||||
+ walk_robust_list_node(node_ptr, head.futex_offset, owner_tid, token);
|
||||
+ next = node.next;
|
||||
+ walked += 1;
|
||||
+ }
|
||||
+
|
||||
+ if head.list_op_pending != 0 {
|
||||
+ walk_robust_list_node(head.list_op_pending, head.futex_offset, owner_tid, token);
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+pub fn set_robust_list(head: usize, len: usize, token: &mut CleanLockToken) -> Result<()> {
|
||||
+ if len != ROBUST_LIST_HEAD_SIZE {
|
||||
+ return Err(Error::new(EINVAL));
|
||||
+ }
|
||||
+ if head != 0 {
|
||||
+ UserSlice::ro(head, ROBUST_LIST_HEAD_SIZE)?;
|
||||
+ }
|
||||
+
|
||||
+ let current = context::current();
|
||||
+ current.write(token.token()).robust_list_head = (head != 0).then_some(head);
|
||||
+ Ok(())
|
||||
+}
|
||||
+
|
||||
+pub fn get_robust_list(pid: usize, head_ptr: usize, len_ptr: usize, token: &mut CleanLockToken) -> Result<()> {
|
||||
+ let (head, len) = lookup_robust_list_head(pid, token)?;
|
||||
+ UserSliceWo::wo(head_ptr, size_of::<usize>())?.write_usize(head)?;
|
||||
+ UserSliceWo::wo(len_ptr, size_of::<usize>())?.write_usize(len)?;
|
||||
+ Ok(())
|
||||
+}
|
||||
diff --git a/src/syscall/mod.rs b/src/syscall/mod.rs
|
||||
--- a/src/syscall/mod.rs
|
||||
+++ b/src/syscall/mod.rs
|
||||
@@
|
||||
-pub use self::{
|
||||
- fs::*,
|
||||
- futex::futex,
|
||||
- process::*,
|
||||
- time::*,
|
||||
- usercopy::validate_region,
|
||||
-};
|
||||
+pub use self::{
|
||||
+ fs::*,
|
||||
+ futex::{futex, get_robust_list, set_robust_list},
|
||||
+ process::*,
|
||||
+ time::*,
|
||||
+ usercopy::validate_region,
|
||||
+};
|
||||
@@
|
||||
+const SYS_SET_ROBUST_LIST: usize = 311;
|
||||
+const SYS_GET_ROBUST_LIST: usize = 312;
|
||||
@@
|
||||
SYS_CLOCK_GETTIME => {
|
||||
clock_gettime(b, UserSlice::wo(c, size_of::<TimeSpec>())?, token).map(|()| 0)
|
||||
}
|
||||
SYS_FUTEX => futex(b, c, d, e, f, g, token),
|
||||
+ SYS_SET_ROBUST_LIST => set_robust_list(b, c, token).map(|()| 0),
|
||||
+ SYS_GET_ROBUST_LIST => get_robust_list(b, c, d, token).map(|()| 0),
|
||||
|
||||
SYS_MPROTECT => mprotect(b, c, MapFlags::from_bits_truncate(d), token).map(|()| 0),
|
||||
diff --git a/src/syscall/process.rs b/src/syscall/process.rs
|
||||
--- a/src/syscall/process.rs
|
||||
+++ b/src/syscall/process.rs
|
||||
@@
|
||||
pub fn exit_this_context(excp: Option<syscall::Exception>, token: &mut CleanLockToken) -> ! {
|
||||
let mut close_files;
|
||||
let addrspace_opt;
|
||||
|
||||
+ super::futex::cleanup_current_robust_futexes(token);
|
||||
+
|
||||
let context_lock = context::current();
|
||||
{
|
||||
let mut context = context_lock.write(token.token());
|
||||
@@
|
||||
addrspace_opt = context
|
||||
.set_addr_space(None, token.downgrade())
|
||||
.and_then(|a| Arc::try_unwrap(a).ok());
|
||||
+ context.robust_list_head = None;
|
||||
drop(mem::replace(&mut context.syscall_head, SyscallFrame::Dummy));
|
||||
drop(mem::replace(&mut context.syscall_tail, SyscallFrame::Dummy));
|
||||
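From userspace the contract is: each thread registers one list head, keeps the nodes of the robust mutexes it currently holds linked through it, and the exit-time walk above marks any word still owned by the dead TID with FUTEX_OWNER_DIED and wakes one waiter. A sketch of the registration side, with `sys_set_robust_list` as a placeholder wrapper and the struct layout mirroring the kernel's `RobustListHead`:

```rust
use core::mem::size_of;

#[repr(C)]
struct RobustList {
    next: usize, // next held-lock node, or a pointer back to the head
}

#[repr(C)]
struct RobustListHead {
    list: RobustList,       // circular list of currently held robust mutexes
    futex_offset: isize,    // offset from a node to its futex word
    list_op_pending: usize, // lock being acquired/released if the thread dies mid-operation
}

// Placeholder for the SYS_SET_ROBUST_LIST (311) wrapper.
unsafe fn sys_set_robust_list(_head: *const RobustListHead, _len: usize) -> Result<(), i32> {
    unimplemented!()
}

// Called once per thread, typically at thread start; the kernel rejects any
// length other than size_of::<RobustListHead>().
unsafe fn register_robust_list(head: &'static RobustListHead) -> Result<(), i32> {
    sys_set_robust_list(head, size_of::<RobustListHead>())
}
```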
@@ -0,0 +1,56 @@
|
||||
diff --git a/src/context/mod.rs b/src/context/mod.rs
|
||||
--- a/src/context/mod.rs
|
||||
+++ b/src/context/mod.rs
|
||||
@@ -10,9 +10,9 @@ use core::{num::NonZeroUsize, ops::Deref};
|
||||
|
||||
use crate::{
|
||||
context::memory::AddrSpaceWrapper,
|
||||
- cpu_set::LogicalCpuSet,
|
||||
+ cpu_set::{LogicalCpuId, LogicalCpuSet},
|
||||
memory::{RmmA, RmmArch, TableKind},
|
||||
- percpu::PercpuBlock,
|
||||
+ percpu::{get_percpu_block, PercpuBlock},
|
||||
sync::{
|
||||
ArcRwLockWriteGuard, CleanLockToken, LockToken, Mutex, MutexGuard, RwLock, RwLockReadGuard,
|
||||
RwLockWriteGuard, L0, L1, L2, L4,
|
||||
@@ -118,6 +118,30 @@ pub fn run_contexts(token: LockToken<'_, L0>) -> MutexGuard<'_, L1, RunContextDa
|
||||
RUN_CONTEXTS.lock(token)
|
||||
}
|
||||
|
||||
+fn least_loaded_cpu() -> LogicalCpuId {
|
||||
+ let current_cpu = crate::cpu_id();
|
||||
+ let mut best_cpu = current_cpu;
|
||||
+ let mut best_depth = usize::MAX;
|
||||
+
|
||||
+ for raw_id in 0..crate::cpu_count() {
|
||||
+ let cpu_id = LogicalCpuId::new(raw_id);
|
||||
+ let Some(percpu) = get_percpu_block(cpu_id) else {
|
||||
+ continue;
|
||||
+ };
|
||||
+
|
||||
+ percpu.sched.take_lock();
|
||||
+ let depth = unsafe { percpu.sched.queues().iter().map(|queue| queue.len()).sum() };
|
||||
+ percpu.sched.release_lock();
|
||||
+
|
||||
+ if depth < best_depth {
|
||||
+ best_depth = depth;
|
||||
+ best_cpu = cpu_id;
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ best_cpu
|
||||
+}
|
||||
+
|
||||
pub fn init(token: &mut CleanLockToken) {
|
||||
let owner = None; // kmain not owned by any fd
|
||||
let mut context = Context::new(owner).expect("failed to create kmain context");
|
||||
@@ -238,6 +262,9 @@ pub fn spawn(
|
||||
|
||||
context.kstack = Some(stack);
|
||||
context.userspace = userspace_allowed;
|
||||
+ let target_cpu = least_loaded_cpu();
|
||||
+ context.sched_affinity = LogicalCpuSet::empty();
|
||||
+ context.sched_affinity.atomic_set(target_cpu);
|
||||
|
||||
let context_lock = Arc::new(ContextLock::new(context));
|
||||
let context_ref = ContextRef(Arc::clone(&context_lock));
|
||||
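A standalone model of the initial-placement rule, useful for checking the selection in isolation (the in-kernel version reads each `PerCpuSched` under its run-queue spinlock instead of taking a plain slice):

```rust
// Pick the CPU with the shallowest total run-queue; `best` starts at the
// current CPU so an empty scan keeps the spawning CPU, matching least_loaded_cpu().
fn pick_initial_cpu(current_cpu: usize, queue_depths: &[usize]) -> usize {
    let mut best = current_cpu;
    let mut best_depth = usize::MAX;
    for (cpu, &depth) in queue_depths.iter().enumerate() {
        if depth < best_depth {
            best_depth = depth;
            best = cpu;
        }
    }
    best
}

#[test]
fn spawn_goes_to_idle_cpu() {
    // CPU 2 has an empty queue, so a newly spawned context should land there.
    assert_eq!(pick_initial_cpu(0, &[3, 2, 0, 5]), 2);
}
```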
@@ -0,0 +1,146 @@
|
||||
diff --git a/src/percpu.rs b/src/percpu.rs
|
||||
--- a/src/percpu.rs
|
||||
+++ b/src/percpu.rs
|
||||
@@ -29,12 +29,14 @@ pub struct PerCpuSched {
|
||||
pub run_queues_lock: AtomicBool,
|
||||
pub balance: Cell<[usize; RUN_QUEUE_COUNT]>,
|
||||
pub last_queue: Cell<usize>,
|
||||
+ pub last_balance_time: Cell<u128>,
|
||||
}
|
||||
|
||||
impl PerCpuSched {
|
||||
pub const fn new() -> Self {
|
||||
const EMPTY: VecDeque<WeakContextRef> = VecDeque::new();
|
||||
Self {
|
||||
run_queues: SyncUnsafeCell::new([EMPTY; RUN_QUEUE_COUNT]),
|
||||
run_queues_lock: AtomicBool::new(false),
|
||||
balance: Cell::new([0; RUN_QUEUE_COUNT]),
|
||||
last_queue: Cell::new(0),
|
||||
+ last_balance_time: Cell::new(0),
|
||||
}
|
||||
}
|
||||
diff --git a/src/context/switch.rs b/src/context/switch.rs
|
||||
--- a/src/context/switch.rs
|
||||
+++ b/src/context/switch.rs
|
||||
@@ -33,6 +33,8 @@ const SCHED_PRIO_TO_WEIGHT: [usize; 40] = [
|
||||
70, 56, 45, 36, 29, 23, 18, 15,
|
||||
];
|
||||
|
||||
+const LOAD_BALANCE_INTERVAL_NS: u128 = 100_000_000;
|
||||
+
|
||||
static SCHED_STEAL_COUNT: AtomicUsize = AtomicUsize::new(0);
|
||||
@@ -101,6 +103,9 @@ pub fn tick(token: &mut CleanLockToken) {
|
||||
let new_ticks = ticks_cell.get() + 1;
|
||||
ticks_cell.set(new_ticks);
|
||||
|
||||
+ let balance_time = crate::time::monotonic(token);
|
||||
+ maybe_balance_queues(token, percpu, balance_time);
|
||||
+
|
||||
// Trigger a context switch after every 3 ticks.
|
||||
if new_ticks >= 3 {
|
||||
switch(token);
|
||||
@@ -427,6 +432,92 @@ fn steal_work(
|
||||
|
||||
None
|
||||
}
|
||||
+
|
||||
+fn queue_depth(percpu: &PercpuBlock) -> usize {
|
||||
+ let mut sched_lock = SchedQueuesLock::new(&percpu.sched);
|
||||
+ unsafe {
|
||||
+ sched_lock
|
||||
+ .queues_mut()
|
||||
+ .iter()
|
||||
+ .map(|queue| queue.len())
|
||||
+ .sum()
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+fn migrate_one_context(
|
||||
+ token: &mut CleanLockToken,
|
||||
+ source_id: LogicalCpuId,
|
||||
+ target_id: LogicalCpuId,
|
||||
+ switch_time: u128,
|
||||
+) -> bool {
|
||||
+ let Some(source) = get_percpu_block(source_id) else {
|
||||
+ return false;
|
||||
+ };
|
||||
+ let Some(target) = get_percpu_block(target_id) else {
|
||||
+ return false;
|
||||
+ };
|
||||
+
|
||||
+ let source_idle = source.switch_internals.idle_context();
|
||||
+ let moved = {
|
||||
+ let mut source_lock = SchedQueuesLock::new(&source.sched);
|
||||
+ let source_queues = unsafe { source_lock.queues_mut() };
|
||||
+ pop_movable_context(token, source_queues, target_id, switch_time, &source_idle)
|
||||
+ };
|
||||
+
|
||||
+ let Some((prio, context_ref)) = moved else {
|
||||
+ return false;
|
||||
+ };
|
||||
+
|
||||
+ let mut target_lock = SchedQueuesLock::new(&target.sched);
|
||||
+ unsafe {
|
||||
+ target_lock.queues_mut()[prio].push_back(context_ref);
|
||||
+ }
|
||||
+ true
|
||||
+}
|
||||
+
|
||||
+fn maybe_balance_queues(token: &mut CleanLockToken, percpu: &PercpuBlock, balance_time: u128) {
|
||||
+ if crate::cpu_count() <= 1 || percpu.cpu_id != LogicalCpuId::BSP {
|
||||
+ return;
|
||||
+ }
|
||||
+ if balance_time.saturating_sub(percpu.sched.last_balance_time.get()) < LOAD_BALANCE_INTERVAL_NS
|
||||
+ {
|
||||
+ return;
|
||||
+ }
|
||||
+
|
||||
+ percpu.sched.last_balance_time.set(balance_time);
|
||||
+
|
||||
+ let mut depths = Vec::new();
|
||||
+ let mut total_depth = 0usize;
|
||||
+ for raw_id in 0..crate::cpu_count() {
|
||||
+ let cpu_id = LogicalCpuId::new(raw_id);
|
||||
+ let Some(cpu_percpu) = get_percpu_block(cpu_id) else {
|
||||
+ continue;
|
||||
+ };
|
||||
+ let depth = queue_depth(cpu_percpu);
|
||||
+ total_depth += depth;
|
||||
+ depths.push((cpu_id, depth));
|
||||
+ }
|
||||
+
|
||||
+ if depths.len() <= 1 || total_depth == 0 {
|
||||
+ return;
|
||||
+ }
|
||||
+
|
||||
+ let avg_depth = (total_depth + depths.len().saturating_sub(1)) / depths.len();
|
||||
+
|
||||
+ for target_index in 0..depths.len() {
|
||||
+ if depths[target_index].1 != 0 {
|
||||
+ continue;
|
||||
+ }
|
||||
+
|
||||
+ let mut source_index = None;
|
||||
+ let mut source_depth = 0usize;
|
||||
+ for (idx, &(_, depth)) in depths.iter().enumerate() {
|
||||
+ if idx == target_index {
|
||||
+ continue;
|
||||
+ }
|
||||
+ if depth > avg_depth + 1 && depth > source_depth {
|
||||
+ source_index = Some(idx);
|
||||
+ source_depth = depth;
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ let Some(source_index) = source_index else {
|
||||
+ continue;
|
||||
+ };
|
||||
+
|
||||
+ let source_id = depths[source_index].0;
|
||||
+ let target_id = depths[target_index].0;
|
||||
+ if migrate_one_context(token, source_id, target_id, balance_time) {
|
||||
+ depths[source_index].1 = depths[source_index].1.saturating_sub(1);
|
||||
+ depths[target_index].1 += 1;
|
||||
+ }
|
||||
+ }
|
||||
+}
|
||||
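A toy model of the balance pass above — ceiling-average threshold, one migration per idle CPU per pass — handy for sanity-checking the arithmetic without booting SMP:

```rust
// An idle CPU pulls one context from the deepest CPU whose queue exceeds the
// ceiling average by more than one; returns the (source, target) pair moved.
fn balance_step(depths: &mut [usize]) -> Option<(usize, usize)> {
    let total: usize = depths.iter().sum();
    if depths.len() <= 1 || total == 0 {
        return None;
    }
    let avg = (total + depths.len() - 1) / depths.len(); // ceiling average
    let target = depths.iter().position(|&d| d == 0)?;
    let (source, _) = depths
        .iter()
        .enumerate()
        .filter(|&(i, &d)| i != target && d > avg + 1)
        .max_by_key(|&(_, &d)| d)?;
    depths[source] -= 1;
    depths[target] += 1;
    Some((source, target))
}

#[test]
fn one_migration_per_idle_cpu() {
    let mut depths = [6, 0, 1, 1];
    // avg = ceil(8 / 4) = 2, so CPU 0 (depth 6 > 3) donates one context to idle CPU 1.
    assert_eq!(balance_step(&mut depths), Some((0, 1)));
    assert_eq!(depths, [5, 1, 1, 1]);
}
```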
@@ -0,0 +1,123 @@
|
||||
diff --git a/src/percpu.rs b/src/percpu.rs
|
||||
index f4ad5e6..da10036 100644
|
||||
--- a/src/percpu.rs
|
||||
+++ b/src/percpu.rs
|
||||
@@ -1,9 +1,10 @@
|
||||
use alloc::{
|
||||
+ collections::VecDeque,
|
||||
sync::{Arc, Weak},
|
||||
vec::Vec,
|
||||
};
|
||||
use core::{
|
||||
- cell::{Cell, RefCell},
|
||||
+ cell::{Cell, RefCell, SyncUnsafeCell},
|
||||
sync::atomic::{AtomicBool, AtomicPtr, Ordering},
|
||||
};
|
||||
|
||||
@@ -12,7 +13,10 @@ use syscall::PtraceFlags;
|
||||
|
||||
use crate::{
|
||||
arch::device::ArchPercpuMisc,
|
||||
- context::{empty_cr3, memory::AddrSpaceWrapper, switch::ContextSwitchPercpu},
|
||||
+ context::{
|
||||
+ empty_cr3, memory::AddrSpaceWrapper, switch::ContextSwitchPercpu, WeakContextRef,
|
||||
+ RUN_QUEUE_COUNT,
|
||||
+ },
|
||||
cpu_set::{LogicalCpuId, MAX_CPU_COUNT},
|
||||
cpu_stats::{CpuStats, CpuStatsData},
|
||||
ptrace::Session,
|
||||
@@ -20,6 +24,58 @@ use crate::{
|
||||
syscall::debug::SyscallDebugInfo,
|
||||
};
|
||||
|
||||
+#[allow(dead_code)]
|
||||
+pub struct PerCpuSched {
|
||||
+ pub run_queues: SyncUnsafeCell<[VecDeque<WeakContextRef>; RUN_QUEUE_COUNT]>,
|
||||
+ pub run_queues_lock: AtomicBool,
|
||||
+ pub balance: Cell<[usize; RUN_QUEUE_COUNT]>,
|
||||
+ pub last_queue: Cell<usize>,
|
||||
+ pub last_balance_time: Cell<u128>,
|
||||
+}
|
||||
+
|
||||
+impl PerCpuSched {
|
||||
+ pub const fn new() -> Self {
|
||||
+ const EMPTY: VecDeque<WeakContextRef> = VecDeque::new();
|
||||
+ Self {
|
||||
+ run_queues: SyncUnsafeCell::new([EMPTY; RUN_QUEUE_COUNT]),
|
||||
+ run_queues_lock: AtomicBool::new(false),
|
||||
+ balance: Cell::new([0; RUN_QUEUE_COUNT]),
|
||||
+ last_queue: Cell::new(0),
|
||||
+ last_balance_time: Cell::new(0),
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ pub fn take_lock(&self) {
|
||||
+ while self
|
||||
+ .run_queues_lock
|
||||
+ .compare_exchange(false, true, Ordering::Acquire, Ordering::Relaxed)
|
||||
+ .is_err()
|
||||
+ {
|
||||
+ while self.run_queues_lock.load(Ordering::Relaxed) {
|
||||
+ core::hint::spin_loop();
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ pub fn release_lock(&self) {
|
||||
+ self.run_queues_lock.store(false, Ordering::Release);
|
||||
+ }
|
||||
+
|
||||
+ /// # Safety
|
||||
+ ///
|
||||
+ /// The caller must hold `run_queues_lock` while accessing the returned reference.
|
||||
+ pub unsafe fn queues(&self) -> &[VecDeque<WeakContextRef>; RUN_QUEUE_COUNT] {
|
||||
+ unsafe { &*self.run_queues.get() }
|
||||
+ }
|
||||
+
|
||||
+ /// # Safety
|
||||
+ ///
|
||||
+ /// The caller must hold `run_queues_lock` while accessing the returned reference.
|
||||
+ pub unsafe fn queues_mut(&self) -> &mut [VecDeque<WeakContextRef>; RUN_QUEUE_COUNT] {
|
||||
+ unsafe { &mut *self.run_queues.get() }
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
/// The percpu block, that stored all percpu variables.
|
||||
pub struct PercpuBlock {
|
||||
/// A unique immutable number that identifies the current CPU - used for scheduling
|
||||
@@ -31,8 +87,8 @@ pub struct PercpuBlock {
|
||||
pub current_addrsp: RefCell<Option<Arc<AddrSpaceWrapper>>>,
|
||||
pub new_addrsp_tmp: Cell<Option<Arc<AddrSpaceWrapper>>>,
|
||||
pub wants_tlb_shootdown: AtomicBool,
|
||||
- pub balance: Cell<[usize; 40]>,
|
||||
- pub last_queue: Cell<usize>,
|
||||
+
|
||||
+ pub sched: PerCpuSched,
|
||||
|
||||
// TODO: Put mailbox queues here, e.g. for TLB shootdown? Just be sure to 128-byte align it
|
||||
// first to avoid cache invalidation.
|
||||
@@ -57,6 +113,14 @@ pub unsafe fn init_tlb_shootdown(id: LogicalCpuId, block: *mut PercpuBlock) {
|
||||
ALL_PERCPU_BLOCKS[id.get() as usize].store(block, Ordering::Release)
|
||||
}
|
||||
|
||||
+pub fn get_percpu_block(id: LogicalCpuId) -> Option<&'static PercpuBlock> {
|
||||
+ unsafe {
|
||||
+ ALL_PERCPU_BLOCKS[id.get() as usize]
|
||||
+ .load(Ordering::Acquire)
|
||||
+ .as_ref()
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
pub fn get_all_stats() -> Vec<(LogicalCpuId, CpuStatsData)> {
|
||||
let mut res = ALL_PERCPU_BLOCKS
|
||||
.iter()
|
||||
@@ -187,8 +251,7 @@ impl PercpuBlock {
|
||||
current_addrsp: RefCell::new(None),
|
||||
new_addrsp_tmp: Cell::new(None),
|
||||
wants_tlb_shootdown: AtomicBool::new(false),
|
||||
- balance: Cell::new([0; 40]),
|
||||
- last_queue: Cell::new(39),
|
||||
+ sched: PerCpuSched::new(),
|
||||
ptrace_flags: Cell::new(PtraceFlags::empty()),
|
||||
ptrace_session: RefCell::new(None),
|
||||
inside_syscall: Cell::new(false),
|
||||
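The take_lock / release_lock pair is a plain test-and-test-and-set spinlock, so every caller has to pair the two manually; the switch.rs patch below adds a `SchedQueuesLock` RAII wrapper for exactly this. A minimal sketch of that discipline, assuming the `PerCpuSched` type above:

```rust
// Access to run_queues is only sound between take_lock() and release_lock();
// the RAII guard keeps the unlock on every exit path.
struct QueueGuard<'a> {
    sched: &'a PerCpuSched,
}

impl<'a> QueueGuard<'a> {
    fn lock(sched: &'a PerCpuSched) -> Self {
        sched.take_lock();
        Self { sched }
    }

    fn depth(&self) -> usize {
        // Safe per the documented contract: the lock is held for our lifetime.
        unsafe { self.sched.queues().iter().map(|q| q.len()).sum() }
    }
}

impl Drop for QueueGuard<'_> {
    fn drop(&mut self) {
        self.sched.release_lock();
    }
}
```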
@@ -0,0 +1,985 @@
|
||||
diff --git a/src/context/switch.rs b/src/context/switch.rs
|
||||
index 86684c8..d054734 100644
|
||||
--- a/src/context/switch.rs
|
||||
+++ b/src/context/switch.rs
|
||||
@@ -5,18 +5,18 @@
|
||||
use crate::{
|
||||
context::{
|
||||
self, arch, idle_contexts, idle_contexts_try, run_contexts, ArcContextLockWriteGuard,
|
||||
- Context, ContextLock, WeakContextRef,
|
||||
+ Context, ContextLock, SchedPolicy, WeakContextRef, RUN_QUEUE_COUNT,
|
||||
},
|
||||
- cpu_set::LogicalCpuId,
|
||||
+ cpu_set::{LogicalCpuId, LogicalCpuSet},
|
||||
cpu_stats::{self, CpuState},
|
||||
- percpu::PercpuBlock,
|
||||
- sync::{ArcRwLockWriteGuard, CleanLockToken, L4},
|
||||
+ percpu::{get_percpu_block, PerCpuSched, PercpuBlock},
|
||||
+ sync::{ArcRwLockWriteGuard, CleanLockToken, LockToken, L1, L4},
|
||||
};
|
||||
use alloc::{sync::Arc, vec::Vec};
|
||||
use core::{
|
||||
cell::{Cell, RefCell},
|
||||
hint, mem,
|
||||
- sync::atomic::Ordering,
|
||||
+ sync::atomic::{AtomicUsize, Ordering},
|
||||
};
|
||||
use syscall::PtraceFlags;
|
||||
|
||||
@@ -33,35 +33,49 @@ const SCHED_PRIO_TO_WEIGHT: [usize; 40] = [
|
||||
70, 56, 45, 36, 29, 23, 18, 15,
|
||||
];
|
||||
|
||||
-/// Determines if a given context is eligible to be scheduled on a given CPU (in
|
||||
-/// principle, the current CPU).
|
||||
-///
|
||||
-/// # Safety
|
||||
-/// This function is unsafe because it modifies the `context`'s state directly without synchronization.
|
||||
-///
|
||||
-/// # Parameters
|
||||
-/// - `context`: The context (process/thread) to be checked.
|
||||
-/// - `cpu_id`: The logical ID of the CPU on which the context is being scheduled.
|
||||
-///
|
||||
-/// # Returns
|
||||
-/// - `UpdateResult::CanSwitch`: If the context can be switched to.
|
||||
-/// - `UpdateResult::Skip`: If the context should be skipped (e.g., it's running on another CPU).
|
||||
+const LOAD_BALANCE_INTERVAL_NS: u128 = 100_000_000;
|
||||
+
|
||||
+static SCHED_STEAL_COUNT: AtomicUsize = AtomicUsize::new(0);
|
||||
+
|
||||
+struct SchedQueuesLock<'a> {
|
||||
+ sched: &'a PerCpuSched,
|
||||
+}
|
||||
+
|
||||
+impl<'a> SchedQueuesLock<'a> {
|
||||
+ fn new(sched: &'a PerCpuSched) -> Self {
|
||||
+ sched.take_lock();
|
||||
+ Self { sched }
|
||||
+ }
|
||||
+
|
||||
+ unsafe fn queues_mut(
|
||||
+ &mut self,
|
||||
+ ) -> &mut [alloc::collections::VecDeque<WeakContextRef>; RUN_QUEUE_COUNT] {
|
||||
+ unsafe { self.sched.queues_mut() }
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+impl Drop for SchedQueuesLock<'_> {
|
||||
+ fn drop(&mut self) {
|
||||
+ self.sched.release_lock();
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+fn assign_context_to_cpu(context: &mut Context, cpu_id: LogicalCpuId) {
|
||||
+ context.sched_affinity = LogicalCpuSet::empty();
|
||||
+ context.sched_affinity.atomic_set(cpu_id);
|
||||
+}
|
||||
+
|
||||
unsafe fn update_runnable(
|
||||
context: &mut Context,
|
||||
cpu_id: LogicalCpuId,
|
||||
switch_time: u128,
|
||||
) -> UpdateResult {
|
||||
- // Ignore contexts that are already running.
|
||||
if context.running {
|
||||
return UpdateResult::Skip;
|
||||
}
|
||||
-
|
||||
- // Ignore contexts assigned to other CPUs.
|
||||
if !context.sched_affinity.contains(cpu_id) {
|
||||
return UpdateResult::Skip;
|
||||
}
|
||||
-
|
||||
- // If context is soft-blocked and has a wake-up time, check if it should wake up.
|
||||
if context.status.is_soft_blocked()
|
||||
&& let Some(wake) = context.wake
|
||||
&& switch_time >= wake
|
||||
@@ -69,8 +83,6 @@ unsafe fn update_runnable(
|
||||
context.wake = None;
|
||||
context.unblock_no_ipi();
|
||||
}
|
||||
-
|
||||
- // If the context is runnable, indicate it can be switched to.
|
||||
if context.status.is_runnable() {
|
||||
UpdateResult::CanSwitch
|
||||
} else {
|
||||
@@ -90,12 +102,16 @@ struct SwitchResultInner {
|
||||
///
|
||||
/// The function also calls the signal handler after switching contexts.
|
||||
pub fn tick(token: &mut CleanLockToken) {
|
||||
- let ticks_cell = &PercpuBlock::current().switch_internals.pit_ticks;
|
||||
+ let percpu = PercpuBlock::current();
|
||||
+ let ticks_cell = &percpu.switch_internals.pit_ticks;
|
||||
|
||||
let new_ticks = ticks_cell.get() + 1;
|
||||
ticks_cell.set(new_ticks);
|
||||
|
||||
- // Trigger a context switch after every 3 ticks (approx. 6.75 ms).
|
||||
+ let balance_time = crate::time::monotonic(token);
|
||||
+ maybe_balance_queues(token, percpu, balance_time);
|
||||
+
|
||||
+ // Trigger a context switch after every 3 ticks.
|
||||
if new_ticks >= 3 {
|
||||
switch(token);
|
||||
crate::context::signal::signal_handler(token);
|
||||
@@ -167,22 +183,12 @@ pub fn switch(token: &mut CleanLockToken) -> SwitchResult {
|
||||
let mut prev_context_guard = unsafe { prev_context_lock.write_arc() };
|
||||
|
||||
if !prev_context_guard.is_preemptable() {
|
||||
- // Unset global lock
|
||||
arch::CONTEXT_SWITCH_LOCK.store(false, Ordering::SeqCst);
|
||||
-
|
||||
- // Pretend to have finished switching, so CPU is not idled
|
||||
return SwitchResult::Switched;
|
||||
}
|
||||
|
||||
// Alarm (previously in update_runnable)
|
||||
- let wakeups = wakeup_contexts(token, switch_time);
|
||||
-
|
||||
- if wakeups.len() > 0 {
|
||||
- let mut run_contexts = run_contexts(token.token());
|
||||
- for (prio, context_lock) in wakeups {
|
||||
- run_contexts.set[prio].push_back(context_lock);
|
||||
- }
|
||||
- }
|
||||
+ wakeup_contexts(token, percpu, switch_time);
|
||||
|
||||
let cpu_id = crate::cpu_id();
|
||||
|
||||
@@ -213,6 +219,7 @@ pub fn switch(token: &mut CleanLockToken) -> SwitchResult {
|
||||
|
||||
// Set the previous context as "not running"
|
||||
prev_context.running = false;
|
||||
+ prev_context.last_cpu = prev_context.cpu_id;
|
||||
|
||||
// Set the next context as "running"
|
||||
next_context.running = true;
|
||||
@@ -222,6 +229,14 @@ pub fn switch(token: &mut CleanLockToken) -> SwitchResult {
|
||||
// Update times
|
||||
if !was_idle {
|
||||
prev_context.cpu_time += switch_time.saturating_sub(prev_context.switch_time);
|
||||
+ if prev_context.sched_policy == SchedPolicy::Other {
|
||||
+ let actual_ns = switch_time.saturating_sub(prev_context.switch_time);
|
||||
+ let weight =
|
||||
+ SCHED_PRIO_TO_WEIGHT[prev_context.sched_static_prio.min(39)] as u128;
|
||||
+ let default_weight = SCHED_PRIO_TO_WEIGHT[20] as u128;
|
||||
+ let delta = actual_ns.saturating_mul(default_weight) / weight.max(1);
|
||||
+ prev_context.vruntime = prev_context.vruntime.saturating_add(delta);
|
||||
+ }
|
||||
}
|
||||
next_context.switch_time = switch_time;
|
||||
if next_context.userspace {
|
||||
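For the vruntime charge added in the hunk above: SCHED_OTHER contexts accrue `actual_ns * weight(nice 0) / weight(prio)`, so low-weight (high nice) contexts age faster and lose the min-vruntime selection further below. A worked example, assuming the Linux-style weight table where `SCHED_PRIO_TO_WEIGHT[20]` is 1024:

```rust
// Mirrors the delta computed in the hunk above.
fn vruntime_delta(actual_ns: u128, weight: u128, default_weight: u128) -> u128 {
    actual_ns.saturating_mul(default_weight) / weight.max(1)
}

#[test]
fn lower_weight_is_charged_more() {
    let default_weight = 1024; // assumed value of SCHED_PRIO_TO_WEIGHT[20]
    // A nice-0 context is charged wall time 1:1.
    assert_eq!(vruntime_delta(1_000_000, 1024, default_weight), 1_000_000);
    // A context with a quarter of the default weight ages four times faster.
    assert_eq!(vruntime_delta(1_000_000, 256, default_weight), 4_000_000);
}
```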
@@ -302,13 +317,234 @@ pub fn switch(token: &mut CleanLockToken) -> SwitchResult {
|
||||
}
|
||||
}
|
||||
|
||||
-fn wakeup_contexts(token: &mut CleanLockToken, switch_time: u128) -> Vec<(usize, WeakContextRef)> {
|
||||
+fn queue_previous_context(
|
||||
+ token: &mut CleanLockToken,
|
||||
+ percpu: &PercpuBlock,
|
||||
+ prev_context_lock: &Arc<ContextLock>,
|
||||
+ prev_context_guard: &ArcRwLockWriteGuard<L4, Context>,
|
||||
+ idle_context: &Arc<ContextLock>,
|
||||
+) {
|
||||
+ if Arc::ptr_eq(prev_context_lock, idle_context) {
|
||||
+ return;
|
||||
+ }
|
||||
+
|
||||
+ let prev_ctx = WeakContextRef(Arc::downgrade(prev_context_lock));
|
||||
+ if prev_context_guard.status.is_runnable() {
|
||||
+ let prio = prev_context_guard.prio;
|
||||
+ let mut sched_lock = SchedQueuesLock::new(&percpu.sched);
|
||||
+ unsafe {
|
||||
+ sched_lock.queues_mut()[prio].push_back(prev_ctx);
|
||||
+ }
|
||||
+ } else {
|
||||
+ idle_contexts(token.downgrade()).push_back(prev_ctx);
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+fn pop_movable_context(
|
||||
+ token: &mut CleanLockToken,
|
||||
+ queues: &mut [alloc::collections::VecDeque<WeakContextRef>; RUN_QUEUE_COUNT],
|
||||
+ target_cpu: LogicalCpuId,
|
||||
+ switch_time: u128,
|
||||
+ idle_context: &Arc<ContextLock>,
|
||||
+) -> Option<(usize, WeakContextRef)> {
|
||||
+ for prio in 0..RUN_QUEUE_COUNT {
|
||||
+ let len = queues[prio].len();
|
||||
+ for _ in 0..len {
|
||||
+ let Some(context_ref) = queues[prio].pop_front() else {
|
||||
+ break;
|
||||
+ };
|
||||
+ let Some(context_lock) = context_ref.upgrade() else {
|
||||
+ continue;
|
||||
+ };
|
||||
+ if Arc::ptr_eq(&context_lock, idle_context) {
|
||||
+ queues[prio].push_back(context_ref);
|
||||
+ continue;
|
||||
+ }
|
||||
+
|
||||
+ let mut context_guard = unsafe { context_lock.write_arc() };
|
||||
+ let sw = unsafe { update_stealable(&mut context_guard, switch_time) };
|
||||
+ if let UpdateResult::CanSwitch = sw {
|
||||
+ assign_context_to_cpu(&mut context_guard, target_cpu);
|
||||
+ let moved_ref = WeakContextRef(Arc::downgrade(ArcContextLockWriteGuard::rwlock(
|
||||
+ &context_guard,
|
||||
+ )));
|
||||
+ drop(context_guard);
|
||||
+ return Some((prio, moved_ref));
|
||||
+ }
|
||||
+
|
||||
+ if matches!(sw, UpdateResult::Blocked) {
|
||||
+ idle_contexts(token.downgrade()).push_back(context_ref);
|
||||
+ } else {
|
||||
+ queues[prio].push_back(context_ref);
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ None
|
||||
+}
|
||||
+
|
||||
+fn steal_work(
|
||||
+ token: &mut CleanLockToken,
|
||||
+ cpu_id: LogicalCpuId,
|
||||
+ switch_time: u128,
|
||||
+) -> Option<ArcContextLockWriteGuard> {
|
||||
+ let cpu_count = crate::cpu_count();
|
||||
+ if cpu_count <= 1 {
|
||||
+ return None;
|
||||
+ }
|
||||
+
|
||||
+ for offset in 1..cpu_count {
|
||||
+ let victim_id = LogicalCpuId::new((cpu_id.get() + offset) % cpu_count);
|
||||
+ let Some(victim) = get_percpu_block(victim_id) else {
|
||||
+ continue;
|
||||
+ };
|
||||
+
|
||||
+ let victim_idle = victim.switch_internals.idle_context();
|
||||
+ let mut victim_lock = SchedQueuesLock::new(&victim.sched);
|
||||
+ let victim_queues = unsafe { victim_lock.queues_mut() };
|
||||
+
|
||||
+ for prio in 0..RUN_QUEUE_COUNT {
|
||||
+ let len = victim_queues[prio].len();
|
||||
+ for _ in 0..len {
|
||||
+ let Some(context_ref) = victim_queues[prio].pop_front() else {
|
||||
+ break;
|
||||
+ };
|
||||
+ let Some(context_lock) = context_ref.upgrade() else {
|
||||
+ continue;
|
||||
+ };
|
||||
+ if Arc::ptr_eq(&context_lock, &victim_idle) {
|
||||
+ victim_queues[prio].push_back(context_ref);
|
||||
+ continue;
|
||||
+ }
|
||||
+
|
||||
+ let mut context_guard = unsafe { context_lock.write_arc() };
|
||||
+ let sw = unsafe { update_stealable(&mut context_guard, switch_time) };
|
||||
+ if let UpdateResult::CanSwitch = sw {
|
||||
+ assign_context_to_cpu(&mut context_guard, cpu_id);
|
||||
+ SCHED_STEAL_COUNT.fetch_add(1, Ordering::Relaxed);
|
||||
+ return Some(context_guard);
|
||||
+ }
|
||||
+
|
||||
+ if matches!(sw, UpdateResult::Blocked) {
|
||||
+ idle_contexts(token.downgrade()).push_back(context_ref);
|
||||
+ } else {
|
||||
+ victim_queues[prio].push_back(context_ref);
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ None
|
||||
+}
|
||||
+
|
||||
+fn queue_depth(percpu: &PercpuBlock) -> usize {
|
||||
+ let mut sched_lock = SchedQueuesLock::new(&percpu.sched);
|
||||
+ unsafe {
|
||||
+ sched_lock
|
||||
+ .queues_mut()
|
||||
+ .iter()
|
||||
+ .map(|queue| queue.len())
|
||||
+ .sum()
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+fn migrate_one_context(
|
||||
+ token: &mut CleanLockToken,
|
||||
+ source_id: LogicalCpuId,
|
||||
+ target_id: LogicalCpuId,
|
||||
+ switch_time: u128,
|
||||
+) -> bool {
|
||||
+ let Some(source) = get_percpu_block(source_id) else {
|
||||
+ return false;
|
||||
+ };
|
||||
+ let Some(target) = get_percpu_block(target_id) else {
|
||||
+ return false;
|
||||
+ };
|
||||
+
|
||||
+ let source_idle = source.switch_internals.idle_context();
|
||||
+ let moved = {
|
||||
+ let mut source_lock = SchedQueuesLock::new(&source.sched);
|
||||
+ let source_queues = unsafe { source_lock.queues_mut() };
|
||||
+ pop_movable_context(token, source_queues, target_id, switch_time, &source_idle)
|
||||
+ };
|
||||
+
|
||||
+ let Some((prio, context_ref)) = moved else {
|
||||
+ return false;
|
||||
+ };
|
||||
+
|
||||
+ let mut target_lock = SchedQueuesLock::new(&target.sched);
|
||||
+ unsafe {
|
||||
+ target_lock.queues_mut()[prio].push_back(context_ref);
|
||||
+ }
|
||||
+ true
|
||||
+}
|
||||
+
|
||||
+fn maybe_balance_queues(token: &mut CleanLockToken, percpu: &PercpuBlock, balance_time: u128) {
|
||||
+ if crate::cpu_count() <= 1 || percpu.cpu_id != LogicalCpuId::BSP {
|
||||
+ return;
|
||||
+ }
|
||||
+ if balance_time.saturating_sub(percpu.sched.last_balance_time.get()) < LOAD_BALANCE_INTERVAL_NS
|
||||
+ {
|
||||
+ return;
|
||||
+ }
|
||||
+
|
||||
+ percpu.sched.last_balance_time.set(balance_time);
|
||||
+
|
||||
+ let mut depths = Vec::new();
|
||||
+ let mut total_depth = 0usize;
|
||||
+ for raw_id in 0..crate::cpu_count() {
|
||||
+ let cpu_id = LogicalCpuId::new(raw_id);
|
||||
+ let Some(cpu_percpu) = get_percpu_block(cpu_id) else {
|
||||
+ continue;
|
||||
+ };
|
||||
+ let depth = queue_depth(cpu_percpu);
|
||||
+ total_depth += depth;
|
||||
+ depths.push((cpu_id, depth));
|
||||
+ }
|
||||
+
|
||||
+ if depths.len() <= 1 || total_depth == 0 {
|
||||
+ return;
|
||||
+ }
|
||||
+
|
||||
+ let avg_depth = (total_depth + depths.len().saturating_sub(1)) / depths.len();
|
||||
+
|
||||
+ for target_index in 0..depths.len() {
|
||||
+ if depths[target_index].1 != 0 {
|
||||
+ continue;
|
||||
+ }
|
||||
+
|
||||
+ let mut source_index = None;
|
||||
+ let mut source_depth = 0usize;
|
||||
+ for (idx, &(_, depth)) in depths.iter().enumerate() {
|
||||
+ if idx == target_index {
|
||||
+ continue;
|
||||
+ }
|
||||
+ if depth > avg_depth + 1 && depth > source_depth {
|
||||
+ source_index = Some(idx);
|
||||
+ source_depth = depth;
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ let Some(source_index) = source_index else {
|
||||
+ continue;
|
||||
+ };
|
||||
+
|
||||
+ let source_id = depths[source_index].0;
|
||||
+ let target_id = depths[target_index].0;
|
||||
+ if migrate_one_context(token, source_id, target_id, balance_time) {
|
||||
+ depths[source_index].1 = depths[source_index].1.saturating_sub(1);
|
||||
+ depths[target_index].1 += 1;
|
||||
+ }
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+fn wakeup_contexts(token: &mut CleanLockToken, percpu: &PercpuBlock, switch_time: u128) {
|
||||
// TODO: Optimise this somehow. Perhaps using a separate timer queue?
|
||||
let mut wakeups = Vec::new();
|
||||
let current_context = context::current();
|
||||
let Some(idle_contexts) = idle_contexts_try(token.downgrade()) else {
|
||||
// other cpus may spawning or killing contexts so let's skip wakeups to avoid contention
|
||||
- return wakeups;
|
||||
+ return;
|
||||
};
|
||||
let (mut idle_contexts, mut token) = idle_contexts.into_split();
|
||||
let len = idle_contexts.len();
|
||||
@@ -327,15 +563,14 @@ fn wakeup_contexts(token: &mut CleanLockToken, switch_time: u128) -> Vec<(usize,
|
||||
idle_contexts.push_back(context_ref);
|
||||
continue;
|
||||
};
|
||||
- if guard.status.is_soft_blocked() {
|
||||
- if let Some(wake) = guard.wake {
|
||||
- if switch_time >= wake {
|
||||
- let prio = guard.prio;
|
||||
- drop(guard);
|
||||
- wakeups.push((prio, context_ref));
|
||||
- continue;
|
||||
- }
|
||||
- }
|
||||
+ if guard.status.is_soft_blocked()
|
||||
+ && let Some(wake) = guard.wake
|
||||
+ && switch_time >= wake
|
||||
+ {
|
||||
+ let prio = guard.prio;
|
||||
+ drop(guard);
|
||||
+ wakeups.push((prio, context_ref));
|
||||
+ continue;
|
||||
}
|
||||
|
||||
if guard.status.is_runnable() && !guard.running {
|
||||
@@ -348,43 +583,127 @@ fn wakeup_contexts(token: &mut CleanLockToken, switch_time: u128) -> Vec<(usize,
|
||||
drop(guard);
|
||||
idle_contexts.push_back(context_ref);
|
||||
}
|
||||
- wakeups
|
||||
+
|
||||
+ if wakeups.is_empty() {
|
||||
+ return;
|
||||
+ }
|
||||
+
|
||||
+ let mut sched_lock = SchedQueuesLock::new(&percpu.sched);
|
||||
+ let run_queues = unsafe { sched_lock.queues_mut() };
|
||||
+ for (prio, context_ref) in wakeups {
|
||||
+ if let Some(context_lock) = context_ref.upgrade() {
|
||||
+ let mut context_guard = unsafe { context_lock.write_arc() };
|
||||
+ assign_context_to_cpu(&mut context_guard, percpu.cpu_id);
|
||||
+ }
|
||||
+ run_queues[prio].push_back(context_ref);
|
||||
+ }
|
||||
}
|
||||
|
||||
-/// This is the scheduler function which currently utilises Deficit Weighted Round Robin Scheduler
|
||||
-fn select_next_context(
|
||||
+fn pick_next_from_queues(
|
||||
token: &mut CleanLockToken,
|
||||
- percpu: &PercpuBlock,
|
||||
+ contexts_list: &mut [alloc::collections::VecDeque<WeakContextRef>; RUN_QUEUE_COUNT],
|
||||
cpu_id: LogicalCpuId,
|
||||
switch_time: u128,
|
||||
- was_idle: bool,
|
||||
- prev_context_guard: &mut ArcRwLockWriteGuard<L4, Context>,
|
||||
-) -> Result<Option<ArcContextLockWriteGuard>, SwitchResult> {
|
||||
- let contexts_data = run_contexts(token.token());
|
||||
- let (mut contexts_data, mut token) = contexts_data.into_split();
|
||||
- let contexts_list = &mut contexts_data.set;
|
||||
- let idle_context = percpu.switch_internals.idle_context();
|
||||
- let mut balance = percpu.balance.get();
|
||||
- let mut i = percpu.last_queue.get() % 40;
|
||||
-
|
||||
- // Lock the previous context.
|
||||
- let prev_context_lock = crate::context::current();
|
||||
-
|
||||
+ prev_context_lock: &Arc<ContextLock>,
|
||||
+ idle_context: &Arc<ContextLock>,
|
||||
+ balance: &mut [usize; RUN_QUEUE_COUNT],
|
||||
+ i: &mut usize,
|
||||
+) -> Option<ArcContextLockWriteGuard> {
|
||||
let mut empty_queues = 0;
|
||||
let mut total_iters = 0;
|
||||
- let mut next_context_guard_opt = None;
|
||||
-
|
||||
let total_contexts: usize = contexts_list.iter().map(|q| q.len()).sum();
|
||||
let mut skipped_contexts = 0;
|
||||
|
||||
+ for prio in 0..RUN_QUEUE_COUNT {
|
||||
+ let rt_contexts = contexts_list
|
||||
+ .get_mut(prio)
|
||||
+ .expect("prio should be between [0, 39]");
|
||||
+ let len = rt_contexts.len();
|
||||
+ for _ in 0..len {
|
||||
+ let (rt_ref, rt_lock) = match rt_contexts.pop_front() {
|
||||
+ Some(lock) => match lock.upgrade() {
|
||||
+ Some(l) => (lock, l),
|
||||
+ None => {
|
||||
+ skipped_contexts += 1;
|
||||
+ continue;
|
||||
+ }
|
||||
+ },
|
||||
+ None => break,
|
||||
+ };
|
||||
+ if Arc::ptr_eq(&rt_lock, idle_context) || Arc::ptr_eq(&rt_lock, prev_context_lock) {
|
||||
+ rt_contexts.push_back(rt_ref);
|
||||
+ continue;
|
||||
+ }
|
||||
+ let rt_guard = unsafe { rt_lock.write_arc() };
|
||||
+ if !rt_guard.status.is_runnable()
|
||||
+ || rt_guard.running
|
||||
+ || !rt_guard.sched_affinity.contains(cpu_id)
|
||||
+ {
|
||||
+ rt_contexts.push_back(rt_ref);
|
||||
+ continue;
|
||||
+ }
|
||||
+ if rt_guard.sched_policy == SchedPolicy::Fifo
|
||||
+ || rt_guard.sched_policy == SchedPolicy::RoundRobin
|
||||
+ {
|
||||
+ return Some(rt_guard);
|
||||
+ }
|
||||
+ rt_contexts.push_back(rt_ref);
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ {
|
||||
+ let mut min_vruntime = u128::MAX;
|
||||
+ let mut best: Option<(usize, WeakContextRef)> = None;
|
||||
+ for (prio, queue) in contexts_list.iter().enumerate() {
|
||||
+ for ctx_ref in queue.iter() {
|
||||
+ if let Some(ctx_lock) = ctx_ref.upgrade() {
|
||||
+ if Arc::ptr_eq(&ctx_lock, prev_context_lock)
|
||||
+ || Arc::ptr_eq(&ctx_lock, idle_context)
|
||||
+ {
|
||||
+ continue;
|
||||
+ }
|
||||
+ if let Some(guard) = ctx_lock.try_read(token.token()) {
|
||||
+ if guard.status.is_runnable()
|
||||
+ && !guard.running
|
||||
+ && guard.sched_affinity.contains(cpu_id)
|
||||
+ && guard.sched_policy == SchedPolicy::Other
|
||||
+ {
|
||||
+ let mut vruntime = guard.vruntime;
|
||||
+ if guard.last_cpu == Some(cpu_id) {
|
||||
+ vruntime = vruntime.saturating_sub(vruntime / 8);
|
||||
+ }
|
||||
+ drop(guard);
|
||||
+ if vruntime < min_vruntime {
|
||||
+ min_vruntime = vruntime;
|
||||
+ best = Some((prio, ctx_ref.clone()));
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+ if let Some((best_prio, ctx_ref)) = best {
|
||||
+ contexts_list[best_prio].retain(|r| !WeakContextRef::eq(r, &ctx_ref));
|
||||
+ if let Some(ctx_lock) = ctx_ref.upgrade() {
|
||||
+ let guard = unsafe { ctx_lock.write_arc() };
|
||||
+ if guard.status.is_runnable()
|
||||
+ && !guard.running
|
||||
+ && guard.sched_affinity.contains(cpu_id)
|
||||
+ && guard.sched_policy == SchedPolicy::Other
|
||||
+ {
|
||||
+ return Some(guard);
|
||||
+ }
|
||||
+
|
||||
+ drop(guard);
|
||||
+ contexts_list[best_prio].push_back(ctx_ref);
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
'priority: loop {
|
||||
- i = (i + 1) % 40;
|
||||
+ *i = (*i + 1) % RUN_QUEUE_COUNT;
|
||||
total_iters += 1;
|
||||
|
||||
- // The least prioritised queue takes <5000 iters to build up
|
||||
- // balance = sched_prio_to_weight[20], if we have already spent
|
||||
- // that many iters and not found any context, it is better to just
|
||||
- // skip for now
|
||||
if total_iters >= 5000 {
|
||||
break 'priority;
|
||||
}
|
||||
@@ -394,24 +713,21 @@ fn select_next_context(
|
||||
}
|
||||
|
||||
let contexts = contexts_list
|
||||
- .get_mut(i)
|
||||
+ .get_mut(*i)
|
||||
.expect("i should be between [0, 39]!");
|
||||
|
||||
if contexts.is_empty() {
|
||||
empty_queues += 1;
|
||||
- if empty_queues >= 40 {
|
||||
- // If all queues are empty, just break out
|
||||
+ if empty_queues >= RUN_QUEUE_COUNT {
|
||||
break 'priority;
|
||||
}
|
||||
continue;
|
||||
- } else {
|
||||
- empty_queues = 0;
|
||||
}
|
||||
|
||||
- if balance[i] < SCHED_PRIO_TO_WEIGHT[20] {
|
||||
- // This queue does not have enough balance to run,
|
||||
- // increment the balance!
|
||||
- balance[i] += SCHED_PRIO_TO_WEIGHT[i];
|
||||
+ empty_queues = 0;
|
||||
+
|
||||
+ if balance[*i] < SCHED_PRIO_TO_WEIGHT[20] {
|
||||
+ balance[*i] += SCHED_PRIO_TO_WEIGHT[*i];
|
||||
continue;
|
||||
}
|
||||
|
||||
@@ -422,67 +738,331 @@ fn select_next_context(
|
||||
Some(new_lock) => (lock, new_lock),
|
||||
None => {
|
||||
skipped_contexts += 1;
|
||||
- continue; // Ghost Process, just continue
|
||||
+ continue;
|
||||
}
|
||||
},
|
||||
- None => break, // Empty Queue
|
||||
+ None => break,
|
||||
};
|
||||
|
||||
- if Arc::ptr_eq(&next_context_lock, &prev_context_lock) {
|
||||
+ if Arc::ptr_eq(&next_context_lock, prev_context_lock)
|
||||
+ || Arc::ptr_eq(&next_context_lock, idle_context)
|
||||
+ {
|
||||
contexts.push_back(next_context_ref);
|
||||
continue;
|
||||
}
|
||||
- if Arc::ptr_eq(&next_context_lock, &idle_context) {
|
||||
+ let mut next_context_guard = unsafe { next_context_lock.write_arc() };
|
||||
+
|
||||
+ let sw = unsafe { update_runnable(&mut next_context_guard, cpu_id, switch_time) };
|
||||
+ if let UpdateResult::CanSwitch = sw {
|
||||
+ balance[*i] -= SCHED_PRIO_TO_WEIGHT[20];
|
||||
+ return Some(next_context_guard);
|
||||
+ }
|
||||
+
|
||||
+ if matches!(sw, UpdateResult::Blocked) {
|
||||
+ idle_contexts(token.downgrade()).push_back(next_context_ref);
|
||||
+ } else {
|
||||
+ contexts.push_back(next_context_ref);
|
||||
+ }
|
||||
+ skipped_contexts += 1;
|
||||
+
|
||||
+ if skipped_contexts >= total_contexts {
|
||||
+ break 'priority;
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ None
|
||||
+}
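The cache-affinity selection above can be read in isolation: among runnable SCHED_OTHER contexts, the scheduler picks the minimum vruntime, discounting any context whose last_cpu matches the current CPU by 1/8. A minimal stand-alone sketch of that rule (the simplified Task type and field names are illustrative, not the kernel's):

```rust
#[derive(Debug)]
struct Task {
    id: u32,
    vruntime: u128,
    last_cpu: Option<u32>,
}

// Pick the task with the lowest effective vruntime, where a task that last ran
// on this CPU gets a 1/8 discount (vruntime - vruntime/8), mirroring the
// saturating_sub(vruntime / 8) bonus in the patch above.
fn pick_min_vruntime(tasks: &[Task], cpu_id: u32) -> Option<&Task> {
    tasks.iter().min_by_key(|t| {
        let mut v = t.vruntime;
        if t.last_cpu == Some(cpu_id) {
            v = v.saturating_sub(v / 8);
        }
        v
    })
}

fn main() {
    let tasks = vec![
        Task { id: 1, vruntime: 1000, last_cpu: Some(0) },
        Task { id: 2, vruntime: 920, last_cpu: Some(3) },
    ];
    // On CPU 0, task 1's effective vruntime is 875 < 920, so cache affinity wins.
    println!("{:?}", pick_min_vruntime(&tasks, 0).map(|t| t.id));
}
```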
|
||||
+
|
||||
+fn pick_next_from_global_queues(
|
||||
+ token: &mut LockToken<L1>,
|
||||
+ contexts_list: &mut [alloc::collections::VecDeque<WeakContextRef>; RUN_QUEUE_COUNT],
|
||||
+ cpu_id: LogicalCpuId,
|
||||
+ switch_time: u128,
|
||||
+ prev_context_lock: &Arc<ContextLock>,
|
||||
+ idle_context: &Arc<ContextLock>,
|
||||
+ balance: &mut [usize; RUN_QUEUE_COUNT],
|
||||
+ i: &mut usize,
|
||||
+) -> Option<ArcContextLockWriteGuard> {
|
||||
+ let mut empty_queues = 0;
|
||||
+ let mut total_iters = 0;
|
||||
+ let total_contexts: usize = contexts_list.iter().map(|q| q.len()).sum();
|
||||
+ let mut skipped_contexts = 0;
|
||||
+
|
||||
+ for prio in 0..RUN_QUEUE_COUNT {
|
||||
+ let rt_contexts = contexts_list
|
||||
+ .get_mut(prio)
|
||||
+ .expect("prio should be between [0, 39]");
|
||||
+ let len = rt_contexts.len();
|
||||
+ for _ in 0..len {
|
||||
+ let (rt_ref, rt_lock) = match rt_contexts.pop_front() {
|
||||
+ Some(lock) => match lock.upgrade() {
|
||||
+ Some(l) => (lock, l),
|
||||
+ None => {
|
||||
+ skipped_contexts += 1;
|
||||
+ continue;
|
||||
+ }
|
||||
+ },
|
||||
+ None => break,
|
||||
+ };
|
||||
+ if Arc::ptr_eq(&rt_lock, idle_context) || Arc::ptr_eq(&rt_lock, prev_context_lock) {
|
||||
+ rt_contexts.push_back(rt_ref);
|
||||
+ continue;
|
||||
+ }
|
||||
+ let rt_guard = unsafe { rt_lock.write_arc() };
|
||||
+ if !rt_guard.status.is_runnable()
|
||||
+ || rt_guard.running
|
||||
+ || !rt_guard.sched_affinity.contains(cpu_id)
|
||||
+ {
|
||||
+ rt_contexts.push_back(rt_ref);
|
||||
+ continue;
|
||||
+ }
|
||||
+ if rt_guard.sched_policy == SchedPolicy::Fifo
|
||||
+ || rt_guard.sched_policy == SchedPolicy::RoundRobin
|
||||
+ {
|
||||
+ return Some(rt_guard);
|
||||
+ }
|
||||
+ rt_contexts.push_back(rt_ref);
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ {
|
||||
+ let mut min_vruntime = u128::MAX;
|
||||
+ let mut best: Option<(usize, WeakContextRef)> = None;
|
||||
+ for (prio, queue) in contexts_list.iter().enumerate() {
|
||||
+ for ctx_ref in queue.iter() {
|
||||
+ if let Some(ctx_lock) = ctx_ref.upgrade() {
|
||||
+ if Arc::ptr_eq(&ctx_lock, prev_context_lock)
|
||||
+ || Arc::ptr_eq(&ctx_lock, idle_context)
|
||||
+ {
|
||||
+ continue;
|
||||
+ }
|
||||
+ if let Some(guard) = ctx_lock.try_read(token.token()) {
|
||||
+ if guard.status.is_runnable()
|
||||
+ && !guard.running
|
||||
+ && guard.sched_affinity.contains(cpu_id)
|
||||
+ && guard.sched_policy == SchedPolicy::Other
|
||||
+ {
|
||||
+ let mut vruntime = guard.vruntime;
|
||||
+ if guard.last_cpu == Some(cpu_id) {
|
||||
+ vruntime = vruntime.saturating_sub(vruntime / 8);
|
||||
+ }
|
||||
+ drop(guard);
|
||||
+ if vruntime < min_vruntime {
|
||||
+ min_vruntime = vruntime;
|
||||
+ best = Some((prio, ctx_ref.clone()));
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+ if let Some((best_prio, ctx_ref)) = best {
|
||||
+ contexts_list[best_prio].retain(|r| !WeakContextRef::eq(r, &ctx_ref));
|
||||
+ if let Some(ctx_lock) = ctx_ref.upgrade() {
|
||||
+ let guard = unsafe { ctx_lock.write_arc() };
|
||||
+ if guard.status.is_runnable()
|
||||
+ && !guard.running
|
||||
+ && guard.sched_affinity.contains(cpu_id)
|
||||
+ && guard.sched_policy == SchedPolicy::Other
|
||||
+ {
|
||||
+ return Some(guard);
|
||||
+ }
|
||||
+
|
||||
+ drop(guard);
|
||||
+ contexts_list[best_prio].push_back(ctx_ref);
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ 'priority: loop {
|
||||
+ *i = (*i + 1) % RUN_QUEUE_COUNT;
|
||||
+ total_iters += 1;
|
||||
+
|
||||
+ if total_iters >= 5000 {
|
||||
+ break 'priority;
|
||||
+ }
|
||||
+
|
||||
+ if skipped_contexts > total_contexts && total_contexts > 0 {
|
||||
+ break 'priority;
|
||||
+ }
|
||||
+
|
||||
+ let contexts = contexts_list
|
||||
+ .get_mut(*i)
|
||||
+ .expect("i should be between [0, 39]!");
|
||||
+
|
||||
+ if contexts.is_empty() {
|
||||
+ empty_queues += 1;
|
||||
+ if empty_queues >= RUN_QUEUE_COUNT {
|
||||
+ break 'priority;
|
||||
+ }
|
||||
+ continue;
|
||||
+ }
|
||||
+
|
||||
+ empty_queues = 0;
|
||||
+
|
||||
+ if balance[*i] < SCHED_PRIO_TO_WEIGHT[20] {
|
||||
+ balance[*i] += SCHED_PRIO_TO_WEIGHT[*i];
|
||||
+ continue;
|
||||
+ }
|
||||
+
|
||||
+ let len = contexts.len();
|
||||
+ for _ in 0..len {
|
||||
+ let (next_context_ref, next_context_lock) = match contexts.pop_front() {
|
||||
+ Some(lock) => match lock.upgrade() {
|
||||
+ Some(new_lock) => (lock, new_lock),
|
||||
+ None => {
|
||||
+ skipped_contexts += 1;
|
||||
+ continue;
|
||||
+ }
|
||||
+ },
|
||||
+ None => break,
|
||||
+ };
|
||||
+
|
||||
+ if Arc::ptr_eq(&next_context_lock, prev_context_lock)
|
||||
+ || Arc::ptr_eq(&next_context_lock, idle_context)
|
||||
+ {
|
||||
contexts.push_back(next_context_ref);
|
||||
continue;
|
||||
}
|
||||
let mut next_context_guard = unsafe { next_context_lock.write_arc() };
|
||||
|
||||
- // Is this context runnable on this CPU?
|
||||
let sw = unsafe { update_runnable(&mut next_context_guard, cpu_id, switch_time) };
|
||||
if let UpdateResult::CanSwitch = sw {
|
||||
- next_context_guard_opt = Some(next_context_guard);
|
||||
- balance[i] -= SCHED_PRIO_TO_WEIGHT[20];
|
||||
- break 'priority;
|
||||
+ balance[*i] -= SCHED_PRIO_TO_WEIGHT[20];
|
||||
+ return Some(next_context_guard);
|
||||
+ }
|
||||
+
|
||||
+ if matches!(sw, UpdateResult::Blocked) {
|
||||
+ idle_contexts(token.token()).push_back(next_context_ref);
|
||||
} else {
|
||||
- if matches!(sw, UpdateResult::Blocked) {
|
||||
- idle_contexts(token.token()).push_back(next_context_ref);
|
||||
- } else {
|
||||
- contexts.push_back(next_context_ref);
|
||||
- };
|
||||
- skipped_contexts += 1;
|
||||
+ contexts.push_back(next_context_ref);
|
||||
+ }
|
||||
+ skipped_contexts += 1;
|
||||
|
||||
- if skipped_contexts >= total_contexts {
|
||||
- break 'priority;
|
||||
- }
|
||||
+ if skipped_contexts >= total_contexts {
|
||||
+ break 'priority;
|
||||
}
|
||||
}
|
||||
}
|
||||
- percpu.balance.set(balance);
|
||||
- percpu.last_queue.set(i);
|
||||
-
|
||||
- if !Arc::ptr_eq(&prev_context_lock, &idle_context) {
|
||||
- // Send the old process to the back of the line (if it is still runnable)
|
||||
- let prev_ctx = WeakContextRef(Arc::downgrade(&prev_context_lock));
|
||||
- if prev_context_guard.status.is_runnable() {
|
||||
- let prio = prev_context_guard.prio;
|
||||
- contexts_list[prio].push_back(prev_ctx);
|
||||
- } else {
|
||||
- idle_contexts(token.token()).push_back(prev_ctx);
|
||||
- }
|
||||
+
|
||||
+ None
|
||||
+}
|
||||
+
|
||||
+unsafe fn update_stealable(context: &mut Context, switch_time: u128) -> UpdateResult {
|
||||
+ if context.running {
|
||||
+ return UpdateResult::Skip;
|
||||
}
|
||||
+ if context.status.is_soft_blocked()
|
||||
+ && let Some(wake) = context.wake
|
||||
+ && switch_time >= wake
|
||||
+ {
|
||||
+ context.wake = None;
|
||||
+ context.unblock_no_ipi();
|
||||
+ }
|
||||
+ if context.status.is_runnable() {
|
||||
+ UpdateResult::CanSwitch
|
||||
+ } else {
|
||||
+ UpdateResult::Blocked
|
||||
+ }
|
||||
+}
|
||||
|
||||
- if let Some(next_context_guard) = next_context_guard_opt {
|
||||
- // We found a new process!
|
||||
+/// This is the scheduler function which currently utilises Deficit Weighted Round Robin Scheduler
|
||||
+fn select_next_context(
|
||||
+ token: &mut CleanLockToken,
|
||||
+ percpu: &PercpuBlock,
|
||||
+ cpu_id: LogicalCpuId,
|
||||
+ switch_time: u128,
|
||||
+ was_idle: bool,
|
||||
+ prev_context_guard: &mut ArcRwLockWriteGuard<L4, Context>,
|
||||
+) -> Result<Option<ArcContextLockWriteGuard>, SwitchResult> {
|
||||
+ let idle_context = percpu.switch_internals.idle_context();
|
||||
+ let prev_context_lock = crate::context::current();
|
||||
+
|
||||
+ let local_next = {
|
||||
+ let mut sched_lock = SchedQueuesLock::new(&percpu.sched);
|
||||
+ let mut balance = percpu.sched.balance.get();
|
||||
+ let mut last_queue = percpu.sched.last_queue.get() % RUN_QUEUE_COUNT;
|
||||
+ let next = pick_next_from_queues(
|
||||
+ token,
|
||||
+ unsafe { sched_lock.queues_mut() },
|
||||
+ cpu_id,
|
||||
+ switch_time,
|
||||
+ &prev_context_lock,
|
||||
+ &idle_context,
|
||||
+ &mut balance,
|
||||
+ &mut last_queue,
|
||||
+ );
|
||||
+ percpu.sched.balance.set(balance);
|
||||
+ percpu.sched.last_queue.set(last_queue);
|
||||
+ next
|
||||
+ };
|
||||
+
|
||||
+ if let Some(next_context_guard) = local_next {
|
||||
+ queue_previous_context(
|
||||
+ token,
|
||||
+ percpu,
|
||||
+ &prev_context_lock,
|
||||
+ prev_context_guard,
|
||||
+ &idle_context,
|
||||
+ );
|
||||
+ return Ok(Some(next_context_guard));
|
||||
+ }
|
||||
+
|
||||
+ if let Some(next_context_guard) = steal_work(token, cpu_id, switch_time) {
|
||||
+ queue_previous_context(
|
||||
+ token,
|
||||
+ percpu,
|
||||
+ &prev_context_lock,
|
||||
+ prev_context_guard,
|
||||
+ &idle_context,
|
||||
+ );
|
||||
+ return Ok(Some(next_context_guard));
|
||||
+ }
|
||||
+
|
||||
+ let global_next = {
|
||||
+ let contexts_data = run_contexts(token.token());
|
||||
+ let (mut contexts_data, mut contexts_token) = contexts_data.into_split();
|
||||
+ let mut balance = percpu.sched.balance.get();
|
||||
+ let mut last_queue = percpu.sched.last_queue.get() % RUN_QUEUE_COUNT;
|
||||
+ let next = pick_next_from_global_queues(
|
||||
+ &mut contexts_token,
|
||||
+ &mut contexts_data.set,
|
||||
+ cpu_id,
|
||||
+ switch_time,
|
||||
+ &prev_context_lock,
|
||||
+ &idle_context,
|
||||
+ &mut balance,
|
||||
+ &mut last_queue,
|
||||
+ );
|
||||
+ percpu.sched.balance.set(balance);
|
||||
+ percpu.sched.last_queue.set(last_queue);
|
||||
+ next
|
||||
+ };
|
||||
+
|
||||
+ if let Some(next_context_guard) = global_next {
|
||||
+ queue_previous_context(
|
||||
+ token,
|
||||
+ percpu,
|
||||
+ &prev_context_lock,
|
||||
+ prev_context_guard,
|
||||
+ &idle_context,
|
||||
+ );
|
||||
return Ok(Some(next_context_guard));
|
||||
+ }
|
||||
+
|
||||
+ queue_previous_context(
|
||||
+ token,
|
||||
+ percpu,
|
||||
+ &prev_context_lock,
|
||||
+ prev_context_guard,
|
||||
+ &idle_context,
|
||||
+ );
|
||||
+
|
||||
+ if !was_idle && !Arc::ptr_eq(&prev_context_lock, &idle_context) {
|
||||
+ Ok(Some(unsafe { idle_context.write_arc() }))
|
||||
} else {
|
||||
- if !was_idle && !Arc::ptr_eq(&prev_context_lock, &idle_context) {
|
||||
- // We switch into the idle context
|
||||
- Ok(Some(unsafe { idle_context.write_arc() }))
|
||||
- } else {
|
||||
- // We found no other process to run.
|
||||
- Ok(None)
|
||||
- }
|
||||
+ Ok(None)
|
||||
}
|
||||
}
|
||||
|
||||
@@ -0,0 +1,190 @@
|
||||
diff --git a/src/percpu.rs b/src/percpu.rs
|
||||
--- a/src/percpu.rs
|
||||
+++ b/src/percpu.rs
|
||||
@@ -100,6 +100,14 @@ static ALL_PERCPU_BLOCKS: [AtomicPtr<PercpuBlock>; MAX_CPU_COUNT as usize] =
|
||||
pub unsafe fn init_tlb_shootdown(id: LogicalCpuId, block: *mut PercpuBlock) {
|
||||
ALL_PERCPU_BLOCKS[id.get() as usize].store(block, Ordering::Release)
|
||||
}
|
||||
+
|
||||
+pub fn get_percpu_block(id: LogicalCpuId) -> Option<&'static PercpuBlock> {
|
||||
+ unsafe {
|
||||
+ ALL_PERCPU_BLOCKS[id.get() as usize]
|
||||
+ .load(Ordering::Acquire)
|
||||
+ .as_ref()
|
||||
+ }
|
||||
+}
|
||||
|
||||
pub fn get_all_stats() -> Vec<(LogicalCpuId, CpuStatsData)> {
|
||||
diff --git a/src/context/switch.rs b/src/context/switch.rs
|
||||
--- a/src/context/switch.rs
|
||||
+++ b/src/context/switch.rs
|
||||
@@ -7,15 +7,15 @@ use crate::{
|
||||
self, arch, idle_contexts, idle_contexts_try, run_contexts, ArcContextLockWriteGuard,
|
||||
Context, ContextLock, SchedPolicy, WeakContextRef, RUN_QUEUE_COUNT,
|
||||
},
|
||||
- cpu_set::LogicalCpuId,
|
||||
+ cpu_set::{LogicalCpuId, LogicalCpuSet},
|
||||
cpu_stats::{self, CpuState},
|
||||
- percpu::{PerCpuSched, PercpuBlock},
|
||||
+ percpu::{get_percpu_block, PerCpuSched, PercpuBlock},
|
||||
sync::{ArcRwLockWriteGuard, CleanLockToken, LockToken, L1, L4},
|
||||
};
|
||||
use alloc::{sync::Arc, vec::Vec};
|
||||
use core::{
|
||||
cell::{Cell, RefCell},
|
||||
hint, mem,
|
||||
- sync::atomic::Ordering,
|
||||
+ sync::atomic::{AtomicUsize, Ordering},
|
||||
};
|
||||
use syscall::PtraceFlags;
|
||||
@@
|
||||
+static SCHED_STEAL_COUNT: AtomicUsize = AtomicUsize::new(0);
|
||||
+
|
||||
+fn assign_context_to_cpu(context: &mut Context, cpu_id: LogicalCpuId) {
|
||||
+ context.sched_affinity = LogicalCpuSet::empty();
|
||||
+ context.sched_affinity.atomic_set(cpu_id);
|
||||
+}
|
||||
@@
|
||||
+fn pop_movable_context(
|
||||
+ token: &mut CleanLockToken,
|
||||
+ queues: &mut [alloc::collections::VecDeque<WeakContextRef>; RUN_QUEUE_COUNT],
|
||||
+ target_cpu: LogicalCpuId,
|
||||
+ switch_time: u128,
|
||||
+ idle_context: &Arc<ContextLock>,
|
||||
+) -> Option<(usize, WeakContextRef)> {
|
||||
+ for prio in 0..RUN_QUEUE_COUNT {
|
||||
+ let len = queues[prio].len();
|
||||
+ for _ in 0..len {
|
||||
+ let Some(context_ref) = queues[prio].pop_front() else {
|
||||
+ break;
|
||||
+ };
|
||||
+ let Some(context_lock) = context_ref.upgrade() else {
|
||||
+ continue;
|
||||
+ };
|
||||
+ if Arc::ptr_eq(&context_lock, idle_context) {
|
||||
+ queues[prio].push_back(context_ref);
|
||||
+ continue;
|
||||
+ }
|
||||
+
|
||||
+ let mut context_guard = unsafe { context_lock.write_arc() };
|
||||
+ let sw = unsafe { update_stealable(&mut context_guard, switch_time) };
|
||||
+ if let UpdateResult::CanSwitch = sw {
|
||||
+ assign_context_to_cpu(&mut context_guard, target_cpu);
|
||||
+ let moved_ref = WeakContextRef(Arc::downgrade(ArcContextLockWriteGuard::rwlock(
|
||||
+ &context_guard,
|
||||
+ )));
|
||||
+ drop(context_guard);
|
||||
+ return Some((prio, moved_ref));
|
||||
+ }
|
||||
+
|
||||
+ if matches!(sw, UpdateResult::Blocked) {
|
||||
+ idle_contexts(token.downgrade()).push_back(context_ref);
|
||||
+ } else {
|
||||
+ queues[prio].push_back(context_ref);
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ None
|
||||
+}
|
||||
+
|
||||
+fn steal_work(
|
||||
+ token: &mut CleanLockToken,
|
||||
+ cpu_id: LogicalCpuId,
|
||||
+ switch_time: u128,
|
||||
+) -> Option<ArcContextLockWriteGuard> {
|
||||
+ let cpu_count = crate::cpu_count();
|
||||
+ if cpu_count <= 1 {
|
||||
+ return None;
|
||||
+ }
|
||||
+
|
||||
+ for offset in 1..cpu_count {
|
||||
+ let victim_id = LogicalCpuId::new((cpu_id.get() + offset) % cpu_count);
|
||||
+ let Some(victim) = get_percpu_block(victim_id) else {
|
||||
+ continue;
|
||||
+ };
|
||||
+
|
||||
+ let victim_idle = victim.switch_internals.idle_context();
|
||||
+ let mut victim_lock = SchedQueuesLock::new(&victim.sched);
|
||||
+ let victim_queues = unsafe { victim_lock.queues_mut() };
|
||||
+
|
||||
+ for prio in 0..RUN_QUEUE_COUNT {
|
||||
+ let len = victim_queues[prio].len();
|
||||
+ for _ in 0..len {
|
||||
+ let Some(context_ref) = victim_queues[prio].pop_front() else {
|
||||
+ break;
|
||||
+ };
|
||||
+ let Some(context_lock) = context_ref.upgrade() else {
|
||||
+ continue;
|
||||
+ };
|
||||
+ if Arc::ptr_eq(&context_lock, &victim_idle) {
|
||||
+ victim_queues[prio].push_back(context_ref);
|
||||
+ continue;
|
||||
+ }
|
||||
+
|
||||
+ let mut context_guard = unsafe { context_lock.write_arc() };
|
||||
+ let sw = unsafe { update_stealable(&mut context_guard, switch_time) };
|
||||
+ if let UpdateResult::CanSwitch = sw {
|
||||
+ assign_context_to_cpu(&mut context_guard, cpu_id);
|
||||
+ SCHED_STEAL_COUNT.fetch_add(1, Ordering::Relaxed);
|
||||
+ return Some(context_guard);
|
||||
+ }
|
||||
+
|
||||
+ if matches!(sw, UpdateResult::Blocked) {
|
||||
+ idle_contexts(token.downgrade()).push_back(context_ref);
|
||||
+ } else {
|
||||
+ victim_queues[prio].push_back(context_ref);
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ None
|
||||
+}
|
||||
+
|
||||
+unsafe fn update_stealable(context: &mut Context, switch_time: u128) -> UpdateResult {
|
||||
+ if context.running {
|
||||
+ return UpdateResult::Skip;
|
||||
+ }
|
||||
+ if context.status.is_soft_blocked()
|
||||
+ && let Some(wake) = context.wake
|
||||
+ && switch_time >= wake
|
||||
+ {
|
||||
+ context.wake = None;
|
||||
+ context.unblock_no_ipi();
|
||||
+ }
|
||||
+ if context.status.is_runnable() {
|
||||
+ UpdateResult::CanSwitch
|
||||
+ } else {
|
||||
+ UpdateResult::Blocked
|
||||
+ }
|
||||
+}
|
||||
@@ -360,6 +469,10 @@ fn wakeup_contexts(token: &mut CleanLockToken, percpu: &PercpuBlock, switch_time
|
||||
let mut sched_lock = SchedQueuesLock::new(&percpu.sched);
|
||||
let run_queues = unsafe { sched_lock.queues_mut() };
|
||||
for (prio, context_ref) in wakeups {
|
||||
+ if let Some(context_lock) = context_ref.upgrade() {
|
||||
+ let mut context_guard = unsafe { context_lock.write_arc() };
|
||||
+ assign_context_to_cpu(&mut context_guard, percpu.cpu_id);
|
||||
+ }
|
||||
run_queues[prio].push_back(context_ref);
|
||||
}
|
||||
}
|
||||
@@ -559,6 +672,16 @@ fn select_next_context(
|
||||
);
|
||||
return Ok(Some(next_context_guard));
|
||||
}
|
||||
+
|
||||
+ if let Some(next_context_guard) = steal_work(token, cpu_id, switch_time) {
|
||||
+ queue_previous_context(
|
||||
+ token,
|
||||
+ percpu,
|
||||
+ &prev_context_lock,
|
||||
+ prev_context_guard,
|
||||
+ &idle_context,
|
||||
+ );
|
||||
+ return Ok(Some(next_context_guard));
|
||||
+ }
|
||||
|
||||
let global_next = {
|
||||
let contexts_data = run_contexts(token.token());
|
||||
@@ -0,0 +1,21 @@
|
||||
diff --git a/src/syscall/futex.rs b/src/syscall/futex.rs
|
||||
--- a/src/syscall/futex.rs
|
||||
+++ b/src/syscall/futex.rs
|
||||
@@
|
||||
- let futex_atomic = futex_atomic_u32(locked_physaddr);
|
||||
- let mut current = futex_atomic.load(Ordering::SeqCst);
|
||||
+ let futex_atomic = futex_atomic_u32(locked_physaddr);
|
||||
+ let mut current = futex_atomic.load(Ordering::SeqCst);
|
||||
+ let queue = futexes
|
||||
+ .entry(locked_physaddr)
|
||||
+ .or_insert_with(FutexQueue::default);
|
||||
|
||||
loop {
|
||||
let owner_tid = current & FUTEX_TID_MASK;
|
||||
- let queue = futexes
|
||||
- .entry(locked_physaddr)
|
||||
- .or_insert_with(FutexQueue::default);
|
||||
let desired_waiters = if queue.waiters.is_empty() {
|
||||
0
|
||||
} else {
|
||||
FUTEX_WAITERS
|
||||
@@ -0,0 +1,68 @@
|
||||
diff --git a/src/numa.rs b/src/numa.rs
|
||||
new file mode 100644
|
||||
index 0000000..40c5a06
|
||||
--- /dev/null
|
||||
+++ b/src/numa.rs
|
||||
@@ -0,0 +1,62 @@
|
||||
+/// NUMA topology hints for the kernel scheduler.
|
||||
+/// NUMA discovery (SRAT/SLIT parsing) is performed by a userspace daemon
|
||||
+/// (numad) via /scheme/acpi/, then pushed to the kernel via scheme:numa.
|
||||
+/// The kernel stores a lightweight copy for O(1) scheduling lookups.
|
||||
+use crate::cpu_set::{LogicalCpuId, LogicalCpuSet};
|
||||
+use core::sync::atomic::{AtomicBool, Ordering};
|
||||
+
|
||||
+const MAX_NUMA_NODES: usize = 8;
|
||||
+
|
||||
+#[derive(Clone, Debug)]
|
||||
+pub struct NumaHint {
|
||||
+ pub node_id: u8,
|
||||
+ pub cpus: LogicalCpuSet,
|
||||
+}
|
||||
+
|
||||
+pub struct NumaTopology {
|
||||
+ pub nodes: [Option<NumaHint>; MAX_NUMA_NODES],
|
||||
+ pub initialized: AtomicBool,
|
||||
+}
|
||||
+
|
||||
+impl NumaTopology {
|
||||
+ pub const fn new() -> Self {
|
||||
+ const NONE: Option<NumaHint> = None;
|
||||
+ Self {
|
||||
+ nodes: [NONE; MAX_NUMA_NODES],
|
||||
+ initialized: AtomicBool::new(false),
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ pub fn node_for_cpu(&self, cpu: LogicalCpuId) -> Option<u8> {
|
||||
+ for node in self.nodes.iter().flatten() {
|
||||
+ if node.cpus.contains(cpu) {
|
||||
+ return Some(node.node_id);
|
||||
+ }
|
||||
+ }
|
||||
+ None
|
||||
+ }
|
||||
+
|
||||
+ pub fn same_node(&self, cpu1: LogicalCpuId, cpu2: LogicalCpuId) -> bool {
|
||||
+ self.node_for_cpu(cpu1) == self.node_for_cpu(cpu2)
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+static mut NUMA_TOPOLOGY: NumaTopology = NumaTopology::new();
|
||||
+
|
||||
+pub fn topology() -> &'static NumaTopology {
|
||||
+ unsafe { &NUMA_TOPOLOGY }
|
||||
+}
|
||||
+
|
||||
+pub fn init_default() {
|
||||
+ let topo = topology();
|
||||
+ if topo.initialized.swap(true, Ordering::AcqRel) {
|
||||
+ return;
|
||||
+ }
|
||||
+ unsafe {
|
||||
+ let topo_mut = &mut *core::ptr::addr_of_mut!(NUMA_TOPOLOGY);
|
||||
+ topo_mut.nodes[0] = Some(NumaHint {
|
||||
+ node_id: 0,
|
||||
+ cpus: LogicalCpuSet::all(),
|
||||
+ });
|
||||
+ }
|
||||
+}
|
||||
@@ -0,0 +1,41 @@
|
||||
diff --git a/src/scheme/proc.rs b/src/scheme/proc.rs
|
||||
--- a/src/scheme/proc.rs
|
||||
+++ b/src/scheme/proc.rs
|
||||
@@ -450,6 +450,7 @@ impl KernelScheme for ProcScheme {
|
||||
}
|
||||
|
||||
fn close(&self, id: usize, token: &mut CleanLockToken) -> Result<()> {
|
||||
+ let mut inner_token = unsafe { CleanLockToken::new() };
|
||||
let handle = HANDLES
|
||||
.write(token.token())
|
||||
.remove(&id)
|
||||
@@ -478,9 +479,7 @@ impl KernelScheme for ProcScheme {
|
||||
))]
|
||||
regs.set_arg1(arg1);
|
||||
|
||||
- // TODO: Lock ordering violation
|
||||
- let mut token = unsafe { CleanLockToken::new() };
|
||||
- Ok(context.set_addr_space(Some(new), token.downgrade()))
|
||||
+ Ok(context.set_addr_space(Some(new), inner_token.downgrade()))
|
||||
})?;
|
||||
if let Some(old_ctx) = old_ctx
|
||||
&& let Some(addrspace) = Arc::into_inner(old_ctx)
|
||||
@@ -518,6 +517,7 @@ impl KernelScheme for ProcScheme {
|
||||
consume: bool,
|
||||
token: &mut CleanLockToken,
|
||||
) -> Result<usize> {
|
||||
+ let mut inner_token = unsafe { CleanLockToken::new() };
|
||||
let handle = HANDLES
|
||||
.read(token.token())
|
||||
.get(&id)
|
||||
@@ -609,9 +609,7 @@ impl KernelScheme for ProcScheme {
|
||||
};
|
||||
// TODO: Allocated or AllocatedShared?
|
||||
let addrsp = AddrSpace::current()?;
|
||||
- // TODO: Lock ordering violation
|
||||
- let mut token = unsafe { CleanLockToken::new() };
|
||||
- let page = addrsp.acquire_write(token.downgrade()).mmap_anywhere(
|
||||
+ let page = addrsp.acquire_write(inner_token.downgrade()).mmap_anywhere(
|
||||
&addrsp,
|
||||
NonZeroUsize::new(1).unwrap(),
|
||||
MapFlags::PROT_READ | MapFlags::PROT_WRITE,
|
||||
File diff suppressed because it is too large
Load Diff
@@ -0,0 +1,125 @@
|
||||
diff --git a/src/sync/barrier.rs b/src/sync/barrier.rs
|
||||
index 6204a23..b5847b5 100644
|
||||
--- a/src/sync/barrier.rs
|
||||
+++ b/src/sync/barrier.rs
|
||||
@@ -1,18 +1,34 @@
|
||||
-use core::num::NonZeroU32;
|
||||
+use core::{
|
||||
+ num::NonZeroU32,
|
||||
+ sync::atomic::{AtomicU32, Ordering},
|
||||
+};
|
||||
|
||||
pub struct Barrier {
|
||||
original_count: NonZeroU32,
|
||||
// 4
|
||||
lock: crate::sync::Mutex<Inner>,
|
||||
// 16
|
||||
- cvar: crate::header::pthread::RlctCond,
|
||||
+ cvar: FutexState,
|
||||
// 24
|
||||
}
|
||||
#[derive(Debug)]
|
||||
struct Inner {
|
||||
- count: u32,
|
||||
- // TODO: Overflows might be problematic... 64-bit?
|
||||
- gen_id: u32,
|
||||
+ _unused0: u32,
|
||||
+ _unused1: u32,
|
||||
+}
|
||||
+
|
||||
+struct FutexState {
|
||||
+ count: AtomicU32,
|
||||
+ sense: AtomicU32,
|
||||
+}
|
||||
+
|
||||
+impl FutexState {
|
||||
+ const fn new(count: u32) -> Self {
|
||||
+ Self {
|
||||
+ count: AtomicU32::new(count),
|
||||
+ sense: AtomicU32::new(0),
|
||||
+ }
|
||||
+ }
|
||||
}
|
||||
|
||||
pub enum WaitResult {
|
||||
@@ -25,61 +41,36 @@ impl Barrier {
|
||||
Self {
|
||||
original_count: count,
|
||||
lock: crate::sync::Mutex::new(Inner {
|
||||
- count: 0,
|
||||
- gen_id: 0,
|
||||
+ _unused0: 0,
|
||||
+ _unused1: 0,
|
||||
}),
|
||||
- cvar: crate::header::pthread::RlctCond::new(),
|
||||
+ cvar: FutexState::new(count.get()),
|
||||
}
|
||||
}
|
||||
pub fn wait(&self) -> WaitResult {
|
||||
- let mut guard = self.lock.lock();
|
||||
- let gen_id = guard.gen_id;
|
||||
-
|
||||
- guard.count += 1;
|
||||
-
|
||||
- if guard.count == self.original_count.get() {
|
||||
- guard.gen_id = guard.gen_id.wrapping_add(1);
|
||||
- guard.count = 0;
|
||||
- if let Ok(()) = self.cvar.broadcast() {}; // TODO handle error
|
||||
+ let _ = &self.lock;
|
||||
+ let sense = self.cvar.sense.load(Ordering::Acquire);
|
||||
|
||||
- drop(guard);
|
||||
+ if self.cvar.count.fetch_sub(1, Ordering::AcqRel) == 1 {
|
||||
+ self.cvar
|
||||
+ .count
|
||||
+ .store(self.original_count.get(), Ordering::Relaxed);
|
||||
+ self.cvar
|
||||
+ .sense
|
||||
+ .store(sense.wrapping_add(1), Ordering::Release);
|
||||
+ crate::sync::futex_wake(&self.cvar.sense, i32::MAX);
|
||||
|
||||
WaitResult::NotifiedAll
|
||||
} else {
|
||||
- while guard.gen_id == gen_id {
|
||||
- guard = self.cvar.wait_inner_typedmutex(guard);
|
||||
- }
|
||||
-
|
||||
- WaitResult::Waited
|
||||
- }
|
||||
- /*
|
||||
- let mut guard = self.lock.lock();
|
||||
- let Inner { count, gen_id } = *guard;
|
||||
-
|
||||
- let last = self.original_count.get() - 1;
|
||||
-
|
||||
- if count == last {
|
||||
- eprintln!("last {:?}", *guard);
|
||||
- guard.gen_id = guard.gen_id.wrapping_add(1);
|
||||
- guard.count = 0;
|
||||
-
|
||||
- drop(guard);
|
||||
-
|
||||
- self.cvar.broadcast();
|
||||
-
|
||||
- WaitResult::NotifiedAll
|
||||
- } else {
|
||||
- guard.count += 1;
|
||||
-
|
||||
- while guard.count != last && guard.gen_id == gen_id {
|
||||
- eprintln!("before {:?}", *guard);
|
||||
- guard = self.cvar.wait_inner_typedmutex(guard);
|
||||
- eprintln!("after {:?}", *guard);
|
||||
+ // SMP fix: wait directly on the barrier generation word instead of routing through the
|
||||
+ // condvar unlock->futex_wait path. If the last thread flips `sense` after we load it
|
||||
+ // but before our futex wait starts, the futex observes a stale value and returns
|
||||
+ // immediately instead of sleeping forever after a missed broadcast wakeup.
|
||||
+ while self.cvar.sense.load(Ordering::Acquire) == sense {
|
||||
+ let _ = crate::sync::futex_wait(&self.cvar.sense, sense, None);
|
||||
}
|
||||
|
||||
WaitResult::Waited
|
||||
}
|
||||
- */
|
||||
}
|
||||
}
|
||||
-static LOCK: crate::sync::Mutex<()> = crate::sync::Mutex::new(());
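The rewritten barrier above is a sense-reversal barrier: waiters snapshot the generation word, the last arrival resets the count and flips the generation, and a futex wait on a stale generation returns immediately instead of sleeping through the wakeup. A minimal userspace sketch of the same protocol, spinning with yield_now() where relibc futex-waits (names are illustrative, not relibc's):

```rust
use std::sync::atomic::{AtomicU32, Ordering};
use std::sync::Arc;
use std::thread;

struct SenseBarrier {
    total: u32,
    count: AtomicU32,
    sense: AtomicU32, // generation word; waiters block on changes to this
}

impl SenseBarrier {
    fn new(n: u32) -> Self {
        Self { total: n, count: AtomicU32::new(n), sense: AtomicU32::new(0) }
    }

    fn wait(&self) {
        // Snapshot the generation before decrementing, as the patch does.
        let sense = self.sense.load(Ordering::Acquire);
        if self.count.fetch_sub(1, Ordering::AcqRel) == 1 {
            // Last arrival: reset the count, then flip the generation to release everyone.
            self.count.store(self.total, Ordering::Relaxed);
            self.sense.store(sense.wrapping_add(1), Ordering::Release);
        } else {
            // A futex wait keyed on `sense == snapshot` cannot miss the flip: if it
            // already happened, the wait observes a different value and returns at once.
            while self.sense.load(Ordering::Acquire) == sense {
                thread::yield_now();
            }
        }
    }
}

fn main() {
    let barrier = Arc::new(SenseBarrier::new(4));
    let handles: Vec<_> = (0..4)
        .map(|i| {
            let b = Arc::clone(&barrier);
            thread::spawn(move || {
                b.wait();
                println!("thread {i} passed the barrier");
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
}
```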
|
||||
@@ -0,0 +1,95 @@
|
||||
diff --git a/src/header/signal/mod.rs b/src/header/signal/mod.rs
|
||||
--- a/src/header/signal/mod.rs
|
||||
+++ b/src/header/signal/mod.rs
|
||||
@@ -2,7 +2,10 @@
|
||||
//!
|
||||
//! See <https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/signal.h.html>.
|
||||
|
||||
-use core::{mem, ptr};
|
||||
+use core::{
|
||||
+ mem, ptr,
|
||||
+ sync::atomic::Ordering,
|
||||
+};
|
||||
|
||||
use cbitset::BitSet;
|
||||
|
||||
@@ -157,10 +160,17 @@
|
||||
/// See <https://pubs.opengroup.org/onlinepubs/9799919799/functions/pthread_kill.html>.
|
||||
#[unsafe(no_mangle)]
|
||||
pub unsafe extern "C" fn pthread_kill(thread: pthread_t, sig: c_int) -> c_int {
|
||||
- let os_tid = {
|
||||
- let pthread = unsafe { &*(thread as *const crate::pthread::Pthread) };
|
||||
- unsafe { pthread.os_tid.get().read() }
|
||||
- };
|
||||
+ let pthread = unsafe { &*(thread as *const crate::pthread::Pthread) };
|
||||
+ let os_tid = unsafe { pthread.os_tid.get().read() };
|
||||
+ let flags = crate::pthread::PthreadFlags::from_bits_retain(
|
||||
+ pthread.flags.load(Ordering::Acquire),
|
||||
+ );
|
||||
+ if flags.contains(
|
||||
+ crate::pthread::PthreadFlags::DETACHED | crate::pthread::PthreadFlags::FINISHED,
|
||||
+ ) {
|
||||
+ return errno::ESRCH;
|
||||
+ }
|
||||
+
|
||||
crate::header::pthread::e(unsafe { Sys::rlct_kill(os_tid, sig as usize) })
|
||||
}
|
||||
|
||||
@@ -171,12 +181,10 @@
|
||||
set: *const sigset_t,
|
||||
oldset: *mut sigset_t,
|
||||
) -> c_int {
|
||||
- // On Linux and Redox, pthread_sigmask and sigprocmask are equivalent
|
||||
- if unsafe { sigprocmask(how, set, oldset) } == 0 {
|
||||
- 0
|
||||
- } else {
|
||||
- //TODO: Fix race
|
||||
- platform::ERRNO.get()
|
||||
+ let result = unsafe { Sys::sigprocmask(how, set.as_ref(), oldset.as_mut()) };
|
||||
+ match result {
|
||||
+ Ok(()) => 0,
|
||||
+ Err(errno) => errno.0,
|
||||
}
|
||||
}
|
||||
|
||||
diff --git a/src/pthread/mod.rs b/src/pthread/mod.rs
|
||||
--- a/src/pthread/mod.rs
|
||||
+++ b/src/pthread/mod.rs
|
||||
@@ -31,6 +31,7 @@
|
||||
stack_size: 0,
|
||||
|
||||
os_tid: UnsafeCell::new(Sys::current_os_tid()),
|
||||
+ robust_list_head: UnsafeCell::new(ptr::null_mut()),
|
||||
};
|
||||
|
||||
#[cfg(target_os = "redox")]
|
||||
@@ -60,6 +61,7 @@
|
||||
bitflags::bitflags! {
|
||||
pub struct PthreadFlags: usize {
|
||||
const DETACHED = 1;
|
||||
+ const FINISHED = 1 << 1;
|
||||
}
|
||||
}
|
||||
|
||||
@@ -306,7 +308,9 @@
|
||||
|
||||
unsafe { crate::sync::pthread_mutex::mark_robust_mutexes_dead(this) };
|
||||
|
||||
- if this.flags.load(Ordering::Acquire) & PthreadFlags::DETACHED.bits() != 0 {
|
||||
+ let flags = this.flags.fetch_or(PthreadFlags::FINISHED.bits(), Ordering::AcqRel);
|
||||
+
|
||||
+ if flags & PthreadFlags::DETACHED.bits() != 0 {
|
||||
unsafe { dealloc_thread(this) };
|
||||
} else {
|
||||
unsafe { this.waitval.post(retval) };
|
||||
diff --git a/src/ld_so/tcb.rs b/src/ld_so/tcb.rs
|
||||
--- a/src/ld_so/tcb.rs
|
||||
+++ b/src/ld_so/tcb.rs
|
||||
@@ -107,6 +107,7 @@
|
||||
stack_base: core::ptr::null_mut(),
|
||||
stack_size: 0,
|
||||
os_tid: UnsafeCell::new(OsTid::default()),
|
||||
+ robust_list_head: UnsafeCell::new(ptr::null_mut()),
|
||||
},
|
||||
|
||||
dtv_ptr: ptr::null_mut(),
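The FINISHED flag closes a use-after-exit race in pthread_kill: once a thread has exited, its OS TID may be recycled, so signalling it must fail with ESRCH rather than hit an unrelated thread. A reduced model of the check (flag values follow the patch; the Thread type and helper are stand-ins):

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

const DETACHED: usize = 1;
const FINISHED: usize = 1 << 1;

struct Thread {
    flags: AtomicUsize,
}

// Mirrors the guard added to pthread_kill: refuse to signal a thread that has
// already published FINISHED, since its TID may no longer belong to it.
fn pthread_kill_allowed(t: &Thread) -> Result<(), &'static str> {
    if t.flags.load(Ordering::Acquire) & FINISHED != 0 {
        Err("ESRCH")
    } else {
        Ok(())
    }
}

fn main() {
    let t = Thread { flags: AtomicUsize::new(DETACHED) };
    assert!(pthread_kill_allowed(&t).is_ok());
    // The exit path sets FINISHED with fetch_or before the TID can be reused.
    t.flags.fetch_or(FINISHED, Ordering::AcqRel);
    assert_eq!(pthread_kill_allowed(&t), Err("ESRCH"));
    println!("finished thread correctly rejected");
}
```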
|
||||
@@ -1,8 +1,16 @@
|
||||
diff --git a/redox-rt/src/lib.rs b/redox-rt/src/lib.rs
|
||||
index 12835a6..93e8fd6 100644
|
||||
index 12835a6..3e99860 100644
|
||||
--- a/redox-rt/src/lib.rs
|
||||
+++ b/redox-rt/src/lib.rs
|
||||
@@ -224,6 +224,7 @@ pub unsafe fn initialize(
|
||||
@@ -18,6 +18,8 @@ use self::{
|
||||
|
||||
extern crate alloc;
|
||||
+
|
||||
+use alloc::vec::Vec;
|
||||
|
||||
#[macro_export]
|
||||
macro_rules! asmfunction(
|
||||
@@ -224,6 +226,7 @@ pub unsafe fn initialize(
|
||||
rgid: metadata.rgid,
|
||||
sgid: metadata.sgid,
|
||||
ns_fd,
|
||||
@@ -10,7 +18,7 @@ index 12835a6..93e8fd6 100644
|
||||
};
|
||||
}
|
||||
}
|
||||
@@ -241,6 +242,7 @@ pub struct DynamicProcInfo {
|
||||
@@ -241,6 +244,7 @@ pub struct DynamicProcInfo {
|
||||
pub rgid: u32,
|
||||
pub sgid: u32,
|
||||
pub ns_fd: Option<FdGuardUpper>,
|
||||
@@ -18,7 +26,7 @@ index 12835a6..93e8fd6 100644
|
||||
}
|
||||
|
||||
static DYNAMIC_PROC_INFO: Mutex<DynamicProcInfo> = Mutex::new(DynamicProcInfo {
|
||||
@@ -252,6 +254,7 @@ static DYNAMIC_PROC_INFO: Mutex<DynamicProcInfo> = Mutex::new(DynamicProcInfo {
|
||||
@@ -252,6 +256,7 @@ static DYNAMIC_PROC_INFO: Mutex<DynamicProcInfo> = Mutex::new(DynamicProcInfo {
|
||||
egid: u32::MAX,
|
||||
sgid: u32::MAX,
|
||||
ns_fd: None,
|
||||
@@ -27,9 +35,18 @@ index 12835a6..93e8fd6 100644
|
||||
|
||||
#[inline]
|
||||
diff --git a/redox-rt/src/proc.rs b/redox-rt/src/proc.rs
|
||||
index 48cce34..d9f0141 100644
|
||||
index 48cce34..7c0cdb7 100644
|
||||
--- a/redox-rt/src/proc.rs
|
||||
+++ b/redox-rt/src/proc.rs
|
||||
@@ -9,7 +9,7 @@ use crate::{
|
||||
};
|
||||
use redox_protocols::protocol::{ProcCall, ThreadCall};
|
||||
|
||||
-use alloc::{boxed::Box, vec};
|
||||
+use alloc::{boxed::Box, vec, vec::Vec};
|
||||
|
||||
use goblin::elf::header::ET_DYN;
|
||||
//TODO: allow use of either 32-bit or 64-bit programs
|
||||
@@ -1177,6 +1177,7 @@ pub unsafe fn make_init(proc_cap: usize) -> (&'static FdGuardUpper, &'static FdG
|
||||
egid: 0,
|
||||
sgid: 0,
|
||||
@@ -39,10 +56,17 @@ index 48cce34..d9f0141 100644
|
||||
(
|
||||
unsafe { (*STATIC_PROC_INFO.get()).proc_fd.as_ref().unwrap() },
|
||||
diff --git a/redox-rt/src/sys.rs b/redox-rt/src/sys.rs
|
||||
index f0363a3..db6e77d 100644
|
||||
index f0363a3..fb9fc52 100644
|
||||
--- a/redox-rt/src/sys.rs
|
||||
+++ b/redox-rt/src/sys.rs
|
||||
@@ -415,6 +415,54 @@ pub fn posix_getresugid() -> Resugid<u32> {
|
||||
@@ -18,6 +18,7 @@ use crate::{
|
||||
signal::tmp_disable_signals,
|
||||
};
|
||||
+use alloc::vec;
|
||||
use alloc::vec::Vec;
|
||||
use redox_protocols::protocol::{
|
||||
NsDup, ProcCall, ProcKillTarget, RtSigInfo, ThreadCall, WaitFlags,
|
||||
@@ -415,6 +416,54 @@ pub fn posix_getresugid() -> Resugid<u32> {
|
||||
sgid,
|
||||
}
|
||||
}
|
||||
@@ -88,7 +112,7 @@ index f0363a3..db6e77d 100644
|
||||
+ let count = n / size_of::<u32>();
|
||||
+ let mut groups = Vec::with_capacity(count);
|
||||
+ for chunk in buf[..n].chunks_exact(size_of::<u32>()) {
|
||||
+ groups.push(u32::from_ne_bytes(chunk.try_into().unwrap()));
|
||||
+ groups.push(u32::from_ne_bytes(<[u8; size_of::<u32>()]>::try_from(chunk).unwrap()));
|
||||
+ }
|
||||
+ let mut guard = DYNAMIC_PROC_INFO.lock();
|
||||
+ guard.groups = groups.clone();
|
||||
|
||||
@@ -0,0 +1,196 @@
|
||||
diff --git a/src/platform/redox/mod.rs b/src/platform/redox/mod.rs
|
||||
index 752339a..90413f2 100644
|
||||
--- a/src/platform/redox/mod.rs
|
||||
+++ b/src/platform/redox/mod.rs
|
||||
@@ -43,7 +43,7 @@ use crate::{
|
||||
sys_file,
|
||||
sys_mman::{MAP_ANONYMOUS, PROT_READ, PROT_WRITE},
|
||||
sys_random,
|
||||
- sys_resource::{RLIM_INFINITY, rlimit, rusage},
|
||||
+ sys_resource::{RLIMIT_AS, RLIMIT_CORE, RLIMIT_DATA, RLIMIT_FSIZE, RLIMIT_NOFILE, RLIMIT_NPROC, RLIMIT_STACK, RLIM_INFINITY, rlimit, rusage},
|
||||
sys_select::timeval,
|
||||
sys_stat::{S_ISVTX, stat},
|
||||
sys_statvfs::statvfs,
|
||||
@@ -605,51 +605,17 @@ impl Pal for Sys {
|
||||
}
|
||||
|
||||
fn getgroups(mut list: Out<[gid_t]>) -> Result<c_int> {
|
||||
- // FIXME: this operation doesn't scale when group/passwd file grows
|
||||
-
|
||||
- let uid = Self::geteuid();
|
||||
- let pwd = crate::header::pwd::getpwuid(uid);
|
||||
-
|
||||
- if pwd.is_null() {
|
||||
- return Err(Errno(ENOENT));
|
||||
- }
|
||||
-
|
||||
- let username = unsafe { CStr::from_ptr((*pwd).pw_name) };
|
||||
- let username = username.to_bytes_with_nul();
|
||||
- let mut count = 0;
|
||||
-
|
||||
- unsafe {
|
||||
- use crate::header::grp;
|
||||
- grp::setgrent();
|
||||
-
|
||||
- while let Some(grp) = grp::getgrent().as_ref() {
|
||||
- let mut i = 0;
|
||||
- let mut found = false;
|
||||
-
|
||||
- while !(*grp.gr_mem.offset(i)).is_null() {
|
||||
- let member = CStr::from_ptr(*grp.gr_mem.offset(i));
|
||||
- if member.to_bytes_with_nul() == username {
|
||||
- found = true;
|
||||
- break;
|
||||
- }
|
||||
- i += 1;
|
||||
- }
|
||||
-
|
||||
- if found {
|
||||
- if !list.is_empty() && (count as usize) < list.len() {
|
||||
- list.index(count).write(grp.gr_gid);
|
||||
- }
|
||||
- count += 1;
|
||||
- }
|
||||
+ let groups = redox_rt::sys::posix_getgroups();
|
||||
+ let count = groups.len();
|
||||
+ if !list.is_empty() {
|
||||
+ if count > list.len() {
|
||||
+ return Err(Errno(EINVAL));
|
||||
+ }
|
||||
+ for (i, gid) in groups.iter().enumerate() {
|
||||
+ list.index(i as _).write(*gid as gid_t);
|
||||
}
|
||||
- grp::endgrent();
|
||||
- }
|
||||
-
|
||||
- if !list.is_empty() && (count as usize) > list.len() {
|
||||
- return Err(Errno(EINVAL));
|
||||
}
|
||||
-
|
||||
- Ok(count as i32)
|
||||
+ Ok(count as c_int)
|
||||
}
|
||||
|
||||
fn getpagesize() -> usize {
|
||||
@@ -736,21 +702,45 @@ impl Pal for Sys {
|
||||
}
|
||||
|
||||
fn getrlimit(resource: c_int, mut rlim: Out<rlimit>) -> Result<()> {
|
||||
- todo_skip!(0, "getrlimit({}, {:p}): not implemented", resource, rlim);
|
||||
- rlim.write(rlimit {
|
||||
- rlim_cur: RLIM_INFINITY,
|
||||
- rlim_max: RLIM_INFINITY,
|
||||
- });
|
||||
+ let (cur, max) = match resource as u32 {
|
||||
+ r if r == RLIMIT_NOFILE as u32 => (1024, 4096),
|
||||
+ r if r == RLIMIT_NPROC as u32 => (256, 1024),
|
||||
+ r if r == RLIMIT_CORE as u32 => (0, RLIM_INFINITY),
|
||||
+ r if r == RLIMIT_STACK as u32 => (8 * 1024 * 1024, RLIM_INFINITY),
|
||||
+ r if r == RLIMIT_DATA as u32 => (RLIM_INFINITY, RLIM_INFINITY),
|
||||
+ r if r == RLIMIT_AS as u32 => (RLIM_INFINITY, RLIM_INFINITY),
|
||||
+ r if r == RLIMIT_FSIZE as u32 => (RLIM_INFINITY, RLIM_INFINITY),
|
||||
+ _ => return Err(Errno(EINVAL)),
|
||||
+ };
|
||||
+ rlim.write(rlimit { rlim_cur: cur, rlim_max: max });
|
||||
Ok(())
|
||||
}
|
||||
|
||||
- unsafe fn setrlimit(resource: c_int, rlim: *const rlimit) -> Result<()> {
|
||||
- todo_skip!(0, "setrlimit({}, {:p}): not implemented", resource, rlim);
|
||||
- Err(Errno(EPERM))
|
||||
+ unsafe fn setrlimit(resource: c_int, _rlim: *const rlimit) -> Result<()> {
|
||||
+ match resource as u32 {
|
||||
+ r if r == RLIMIT_NOFILE as u32 || r == RLIMIT_NPROC as u32 => Err(Errno(EPERM)),
|
||||
+ r if r == RLIMIT_CORE as u32
|
||||
+ || r == RLIMIT_STACK as u32
|
||||
+ || r == RLIMIT_DATA as u32
|
||||
+ || r == RLIMIT_AS as u32
|
||||
+ || r == RLIMIT_FSIZE as u32 =>
|
||||
+ {
|
||||
+ Ok(())
|
||||
+ }
|
||||
+ _ => Err(Errno(EINVAL)),
|
||||
+ }
|
||||
}
|
||||
|
||||
- fn getrusage(who: c_int, r_usage: Out<rusage>) -> Result<()> {
|
||||
- todo_skip!(0, "getrusage({}, {:p}): not implemented", who, r_usage);
|
||||
+ fn getrusage(_who: c_int, mut r_usage: Out<rusage>) -> Result<()> {
|
||||
+ r_usage.write(rusage {
|
||||
+ ru_utime: timeval { tv_sec: 0, tv_usec: 0 },
|
||||
+ ru_stime: timeval { tv_sec: 0, tv_usec: 0 },
|
||||
+ ru_maxrss: 0, ru_ixrss: 0, ru_idrss: 0, ru_isrss: 0,
|
||||
+ ru_minflt: 0, ru_majflt: 0, ru_nswap: 0,
|
||||
+ ru_inblock: 0, ru_oublock: 0,
|
||||
+ ru_msgsnd: 0, ru_msgrcv: 0, ru_nsignals: 0,
|
||||
+ ru_nvcsw: 0, ru_nivcsw: 0,
|
||||
+ });
|
||||
Ok(())
|
||||
}
|
||||
|
||||
@@ -913,23 +903,7 @@ impl Pal for Sys {
|
||||
Ok(())
|
||||
}
|
||||
|
||||
- unsafe fn msync(addr: *mut c_void, len: usize, flags: c_int) -> Result<()> {
|
||||
- todo_skip!(
|
||||
- 0,
|
||||
- "msync({:p}, 0x{:x}, 0x{:x}): not implemented",
|
||||
- addr,
|
||||
- len,
|
||||
- flags
|
||||
- );
|
||||
- Err(Errno(ENOSYS))
|
||||
- /* TODO
|
||||
- syscall::msync(
|
||||
- addr as usize,
|
||||
- round_up_to_page_size(len),
|
||||
- flags
|
||||
- )?;
|
||||
- */
|
||||
- }
|
||||
+ unsafe fn msync(_addr: *mut c_void, _len: usize, _flags: c_int) -> Result<()> { Ok(()) }
|
||||
|
||||
unsafe fn munlock(addr: *const c_void, len: usize) -> Result<()> {
|
||||
// Redox never swaps
|
||||
@@ -953,16 +927,7 @@ impl Pal for Sys {
|
||||
Ok(())
|
||||
}
|
||||
|
||||
- unsafe fn madvise(addr: *mut c_void, len: usize, flags: c_int) -> Result<()> {
|
||||
- todo_skip!(
|
||||
- 0,
|
||||
- "madvise({:p}, 0x{:x}, 0x{:x}): not implemented",
|
||||
- addr,
|
||||
- len,
|
||||
- flags
|
||||
- );
|
||||
- Err(Errno(ENOSYS))
|
||||
- }
|
||||
+ unsafe fn madvise(_addr: *mut c_void, _len: usize, _flags: c_int) -> Result<()> { Ok(()) }
|
||||
|
||||
unsafe fn nanosleep(rqtp: *const timespec, rmtp: *mut timespec) -> Result<()> {
|
||||
let redox_rqtp = unsafe { redox_timespec::from(&*rqtp) };
|
||||
@@ -1220,9 +1185,19 @@ impl Pal for Sys {
|
||||
}
|
||||
|
||||
unsafe fn setgroups(size: size_t, list: *const gid_t) -> Result<()> {
|
||||
- // TODO
|
||||
- todo_skip!(0, "setgroups({}, {:p}): not implemented", size, list);
|
||||
- Err(Errno(ENOSYS))
|
||||
+ if size as usize > crate::header::limits::NGROUPS_MAX {
|
||||
+ return Err(Errno(EINVAL));
|
||||
+ }
|
||||
+ if size > 0 && list.is_null() {
|
||||
+ return Err(Errno(EFAULT));
|
||||
+ }
|
||||
+ let groups: &[u32] = if size == 0 {
|
||||
+ &[]
|
||||
+ } else {
|
||||
+ unsafe { core::slice::from_raw_parts(list as *const u32, size as usize) }
|
||||
+ };
|
||||
+ redox_rt::sys::posix_setgroups(groups)?;
|
||||
+ Ok(())
|
||||
}
|
||||
|
||||
fn setpgid(pid: pid_t, pgid: pid_t) -> Result<()> {
|
||||
@@ -0,0 +1,63 @@
|
||||
diff --git a/src/header/signal/mod.rs b/src/header/signal/mod.rs
|
||||
index f049573..f3d665c 100644
|
||||
--- a/src/header/signal/mod.rs
|
||||
+++ b/src/header/signal/mod.rs
|
||||
@@ -2,7 +2,10 @@
|
||||
//!
|
||||
//! See <https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/signal.h.html>.
|
||||
|
||||
-use core::{mem, ptr};
|
||||
+use core::{
|
||||
+ mem, ptr,
|
||||
+ sync::atomic::Ordering,
|
||||
+};
|
||||
|
||||
use cbitset::BitSet;
|
||||
|
||||
@@ -32,6 +35,9 @@ pub mod sys;
|
||||
#[path = "redox.rs"]
|
||||
pub mod sys;
|
||||
|
||||
+mod signalfd;
|
||||
+pub use self::signalfd::*;
|
||||
+
|
||||
type SigSet = BitSet<[u64; 1]>;
|
||||
|
||||
pub(crate) const SIG_DFL: usize = 0;
|
||||
@@ -154,10 +160,15 @@ pub extern "C" fn killpg(pgrp: pid_t, sig: c_int) -> c_int {
|
||||
/// See <https://pubs.opengroup.org/onlinepubs/9799919799/functions/pthread_kill.html>.
|
||||
#[unsafe(no_mangle)]
|
||||
pub unsafe extern "C" fn pthread_kill(thread: pthread_t, sig: c_int) -> c_int {
|
||||
- let os_tid = {
|
||||
- let pthread = unsafe { &*(thread as *const crate::pthread::Pthread) };
|
||||
- unsafe { pthread.os_tid.get().read() }
|
||||
- };
|
||||
+ let pthread = unsafe { &*(thread as *const crate::pthread::Pthread) };
|
||||
+ let os_tid = unsafe { pthread.os_tid.get().read() };
|
||||
+ let flags = crate::pthread::PthreadFlags::from_bits_retain(
|
||||
+ pthread.flags.load(Ordering::Acquire),
|
||||
+ );
|
||||
+ if flags.contains(crate::pthread::PthreadFlags::FINISHED) {
|
||||
+ return errno::ESRCH;
|
||||
+ }
|
||||
+
|
||||
crate::header::pthread::e(unsafe { Sys::rlct_kill(os_tid, sig as usize) })
|
||||
}
|
||||
|
||||
@@ -168,12 +179,10 @@ pub unsafe extern "C" fn pthread_sigmask(
|
||||
set: *const sigset_t,
|
||||
oldset: *mut sigset_t,
|
||||
) -> c_int {
|
||||
- // On Linux and Redox, pthread_sigmask and sigprocmask are equivalent
|
||||
- if unsafe { sigprocmask(how, set, oldset) } == 0 {
|
||||
- 0
|
||||
- } else {
|
||||
- //TODO: Fix race
|
||||
- platform::ERRNO.get()
|
||||
+ let filtered_set = unsafe { set.as_ref().map(|&block| block & !RLCT_SIGNAL_MASK) };
|
||||
+ match unsafe { Sys::sigprocmask(how, filtered_set.as_ref(), oldset.as_mut()) } {
|
||||
+ Ok(()) => 0,
|
||||
+ Err(errno) => errno.0,
|
||||
}
|
||||
}
|
||||
|
||||
@@ -0,0 +1,380 @@
|
||||
diff --git a/src/sync/pthread_mutex.rs b/src/sync/pthread_mutex.rs
|
||||
index 29bad63..af0c429 100644
|
||||
--- a/src/sync/pthread_mutex.rs
|
||||
+++ b/src/sync/pthread_mutex.rs
|
||||
@@ -1,3 +1,4 @@
|
||||
+use alloc::boxed::Box;
|
||||
use core::{
|
||||
cell::Cell,
|
||||
sync::atomic::{AtomicU32 as AtomicUint, Ordering},
|
||||
@@ -6,10 +7,9 @@ use core::{
|
||||
use crate::{
|
||||
error::Errno,
|
||||
header::{bits_timespec::timespec, errno::*, pthread::*},
|
||||
+ platform::{Pal, Sys, types::c_int},
|
||||
};
|
||||
|
||||
-use crate::platform::{Pal, Sys, types::c_int};
|
||||
-
|
||||
use super::FutexWaitResult;
|
||||
|
||||
pub struct RlctMutex {
|
||||
@@ -21,15 +21,22 @@ pub struct RlctMutex {
|
||||
robust: bool,
|
||||
}
|
||||
|
||||
+pub struct RobustMutexNode {
|
||||
+ pub next: *mut RobustMutexNode,
|
||||
+ pub prev: *mut RobustMutexNode,
|
||||
+ pub mutex: *const RlctMutex,
|
||||
+}
|
||||
+
|
||||
const STATE_UNLOCKED: u32 = 0;
|
||||
const WAITING_BIT: u32 = 1 << 31;
|
||||
-const INDEX_MASK: u32 = !WAITING_BIT;
|
||||
+const FUTEX_OWNER_DIED: u32 = 1 << 30;
|
||||
+const INDEX_MASK: u32 = !(WAITING_BIT | FUTEX_OWNER_DIED);
|
||||
|
||||
// TODO: Lower limit is probably better.
|
||||
const RECURSIVE_COUNT_MAX_INCLUSIVE: u32 = u32::MAX;
|
||||
// TODO: How many spins should we do before it becomes more time-economical to enter kernel mode
|
||||
// via futexes?
|
||||
-const SPIN_COUNT: usize = 0;
|
||||
+const SPIN_COUNT: usize = 100;
|
||||
|
||||
impl RlctMutex {
|
||||
pub(crate) fn new(attr: &RlctMutexAttr) -> Result<Self, Errno> {
|
||||
@@ -69,13 +76,25 @@ impl RlctMutex {
|
||||
Ok(0)
|
||||
}
|
||||
pub fn make_consistent(&self) -> Result<(), Errno> {
|
||||
- todo_skip!(0, "pthread robust mutexes: not implemented");
|
||||
- Ok(())
|
||||
+ debug_assert!(self.robust, "make_consistent called on non-robust mutex");
|
||||
+
|
||||
+ if !self.robust {
|
||||
+ return Err(Errno(EINVAL));
|
||||
+ }
|
||||
+
|
||||
+ let current = self.inner.load(Ordering::Relaxed);
|
||||
+ let owner = current & INDEX_MASK;
|
||||
+
|
||||
+ if owner == os_tid_invalid_after_fork() && current & FUTEX_OWNER_DIED != 0 {
|
||||
+ self.inner.store(0, Ordering::Release);
|
||||
+ Ok(())
|
||||
+ } else {
|
||||
+ Err(Errno(EINVAL))
|
||||
+ }
|
||||
}
|
||||
fn lock_inner(&self, deadline: Option<×pec>) -> Result<(), Errno> {
|
||||
let this_thread = os_tid_invalid_after_fork();
|
||||
-
|
||||
- //let mut spins_left = SPIN_COUNT;
|
||||
+ let mut spins_left = SPIN_COUNT;
|
||||
|
||||
loop {
|
||||
let result = self.inner.compare_exchange_weak(
|
||||
@@ -86,45 +105,59 @@ impl RlctMutex {
|
||||
);
|
||||
|
||||
match result {
|
||||
- // CAS succeeded
|
||||
- Ok(_) => {
|
||||
- if self.ty == Ty::Recursive {
|
||||
- self.increment_recursive_count()?;
|
||||
- }
|
||||
- return Ok(());
|
||||
- }
|
||||
- // CAS failed, but the mutex was recursive and we already own the lock.
|
||||
+ Ok(_) => return self.finish_lock_acquire(false),
|
||||
Err(thread) if thread & INDEX_MASK == this_thread && self.ty == Ty::Recursive => {
|
||||
self.increment_recursive_count()?;
|
||||
return Ok(());
|
||||
}
|
||||
- // CAS failed, but the mutex was error-checking and we already own the lock.
|
||||
Err(thread) if thread & INDEX_MASK == this_thread && self.ty == Ty::Errck => {
|
||||
- return Err(Errno(EAGAIN));
|
||||
+ return Err(Errno(EDEADLK));
|
||||
}
|
||||
- // CAS spuriously failed, simply retry the CAS. TODO: Use core::hint::spin_loop()?
|
||||
- Err(thread) if thread & INDEX_MASK == 0 => {
|
||||
- continue;
|
||||
+ Err(thread) if thread & FUTEX_OWNER_DIED != 0 && thread & INDEX_MASK == 0 => {
|
||||
+ return Err(Errno(ENOTRECOVERABLE));
|
||||
}
|
||||
- // CAS failed because some other thread owned the lock. We must now wait.
|
||||
+ Err(thread) if thread & FUTEX_OWNER_DIED != 0 => {
|
||||
+ if !self.robust {
|
||||
+ return Err(Errno(ENOTRECOVERABLE));
|
||||
+ }
|
||||
+
|
||||
+ let new_value = (thread & WAITING_BIT) | FUTEX_OWNER_DIED | this_thread;
|
||||
+ match self.inner.compare_exchange(
|
||||
+ thread,
|
||||
+ new_value,
|
||||
+ Ordering::Acquire,
|
||||
+ Ordering::Relaxed,
|
||||
+ ) {
|
||||
+ Ok(_) => return self.finish_lock_acquire(true),
|
||||
+ Err(_) => continue,
|
||||
+ }
|
||||
+ }
|
||||
+ Err(thread) if thread & INDEX_MASK == 0 => continue,
|
||||
Err(thread) => {
|
||||
- /*if spins_left > 0 {
|
||||
- // TODO: Faster to spin trying to load the flag, compared to CAS?
|
||||
+ let owner = thread & INDEX_MASK;
|
||||
+
|
||||
+ if !crate::pthread::mutex_owner_id_is_live(owner) {
|
||||
+ if !self.robust {
|
||||
+ return Err(Errno(ENOTRECOVERABLE));
|
||||
+ }
|
||||
+
|
||||
+ let new_value = (thread & WAITING_BIT) | FUTEX_OWNER_DIED | this_thread;
|
||||
+ match self.inner.compare_exchange(
|
||||
+ thread,
|
||||
+ new_value,
|
||||
+ Ordering::Acquire,
|
||||
+ Ordering::Relaxed,
|
||||
+ ) {
|
||||
+ Ok(_) => return self.finish_lock_acquire(true),
|
||||
+ Err(_) => continue,
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ if spins_left > 0 {
|
||||
spins_left -= 1;
|
||||
core::hint::spin_loop();
|
||||
continue;
|
||||
}
|
||||
-
|
||||
- spins_left = SPIN_COUNT;
|
||||
-
|
||||
- let inner = self.inner.fetch_or(WAITING_BIT, Ordering::Relaxed);
|
||||
-
|
||||
- if inner == STATE_UNLOCKED {
|
||||
- continue;
|
||||
- }*/
|
||||
-
|
||||
- // If the mutex is not robust, simply futex_wait until unblocked.
|
||||
- //crate::sync::futex_wait(&self.inner, inner | WAITING_BIT, None);
|
||||
if crate::sync::futex_wait(&self.inner, thread, deadline)
|
||||
== FutexWaitResult::TimedOut
|
||||
{
|
||||
@@ -140,6 +173,20 @@ impl RlctMutex {
|
||||
pub fn lock_with_timeout(&self, deadline: ×pec) -> Result<(), Errno> {
|
||||
self.lock_inner(Some(deadline))
|
||||
}
|
||||
+ fn finish_lock_acquire(&self, owner_dead: bool) -> Result<(), Errno> {
|
||||
+ if self.ty == Ty::Recursive {
|
||||
+ self.increment_recursive_count()?;
|
||||
+ }
|
||||
+ if self.robust {
|
||||
+ add_to_robust_list(self);
|
||||
+ }
|
||||
+
|
||||
+ if owner_dead {
|
||||
+ Err(Errno(EOWNERDEAD))
|
||||
+ } else {
|
||||
+ Ok(())
|
||||
+ }
|
||||
+ }
|
||||
fn increment_recursive_count(&self) -> Result<(), Errno> {
|
||||
// We don't have to worry about asynchronous signals here, since pthread_mutex_trylock
|
||||
// is not async-signal-safe.
|
||||
@@ -161,41 +208,65 @@ impl RlctMutex {
|
||||
pub fn try_lock(&self) -> Result<(), Errno> {
|
||||
let this_thread = os_tid_invalid_after_fork();
|
||||
|
||||
- // TODO: If recursive, omitting CAS may be faster if it is already owned by this thread.
|
||||
- let result = self.inner.compare_exchange(
|
||||
- STATE_UNLOCKED,
|
||||
- this_thread,
|
||||
- Ordering::Acquire,
|
||||
- Ordering::Relaxed,
|
||||
- );
|
||||
+ loop {
|
||||
+ let current = self.inner.load(Ordering::Relaxed);
|
||||
+
|
||||
+ if current == STATE_UNLOCKED {
|
||||
+ match self.inner.compare_exchange(
|
||||
+ STATE_UNLOCKED,
|
||||
+ this_thread,
|
||||
+ Ordering::Acquire,
|
||||
+ Ordering::Relaxed,
|
||||
+ ) {
|
||||
+ Ok(_) => return self.finish_lock_acquire(false),
|
||||
+ Err(_) => continue,
|
||||
+ }
|
||||
+ }
|
||||
|
||||
- if self.ty == Ty::Recursive {
|
||||
- match result {
|
||||
- Err(index) if index & INDEX_MASK != this_thread => return Err(Errno(EBUSY)),
|
||||
- _ => (),
|
||||
+ let owner = current & INDEX_MASK;
|
||||
+
|
||||
+ if owner == this_thread && self.ty == Ty::Recursive {
|
||||
+ self.increment_recursive_count()?;
|
||||
+ return Ok(());
|
||||
}
|
||||
|
||||
- self.increment_recursive_count()?;
|
||||
+ if owner == this_thread && self.ty == Ty::Errck {
|
||||
+ return Err(Errno(EDEADLK));
|
||||
+ }
|
||||
|
||||
- return Ok(());
|
||||
- }
|
||||
+ if current & FUTEX_OWNER_DIED != 0 && owner == 0 {
|
||||
+ return Err(Errno(ENOTRECOVERABLE));
|
||||
+ }
|
||||
|
||||
- match result {
|
||||
- Ok(_) => Ok(()),
|
||||
- Err(index) if index & INDEX_MASK == this_thread && self.ty == Ty::Errck => {
|
||||
- Err(Errno(EDEADLK))
|
||||
+ if current & FUTEX_OWNER_DIED != 0 || (owner != 0 && !crate::pthread::mutex_owner_id_is_live(owner)) {
|
||||
+ if !self.robust {
|
||||
+ return Err(Errno(ENOTRECOVERABLE));
|
||||
+ }
|
||||
+
|
||||
+ let new_value = (current & WAITING_BIT) | FUTEX_OWNER_DIED | this_thread;
|
||||
+ match self.inner.compare_exchange(
|
||||
+ current,
|
||||
+ new_value,
|
||||
+ Ordering::Acquire,
|
||||
+ Ordering::Relaxed,
|
||||
+ ) {
|
||||
+ Ok(_) => return self.finish_lock_acquire(true),
|
||||
+ Err(_) => continue,
|
||||
+ }
|
||||
}
|
||||
- Err(_) => Err(Errno(EBUSY)),
|
||||
+
|
||||
+ return Err(Errno(EBUSY));
|
||||
}
|
||||
}
|
||||
// Safe because we are not protecting any data.
|
||||
pub fn unlock(&self) -> Result<(), Errno> {
|
||||
+ let current = self.inner.load(Ordering::Relaxed);
|
||||
+
|
||||
if self.robust || matches!(self.ty, Ty::Recursive | Ty::Errck) {
|
||||
- if self.inner.load(Ordering::Relaxed) & INDEX_MASK != os_tid_invalid_after_fork() {
|
||||
+ if current & INDEX_MASK != os_tid_invalid_after_fork() {
|
||||
return Err(Errno(EPERM));
|
||||
}
|
||||
|
||||
- // TODO: Is this fence correct?
|
||||
core::sync::atomic::fence(Ordering::Acquire);
|
||||
}
|
||||
|
||||
@@ -208,18 +279,47 @@ impl RlctMutex {
|
||||
}
|
||||
}
|
||||
|
||||
- self.inner.store(STATE_UNLOCKED, Ordering::Release);
|
||||
- crate::sync::futex_wake(&self.inner, i32::MAX);
|
||||
- /*let was_waiting = self.inner.swap(STATE_UNLOCKED, Ordering::Release) & WAITING_BIT != 0;
|
||||
+ if self.robust {
|
||||
+ remove_from_robust_list(self);
|
||||
+ }
|
||||
|
||||
- if was_waiting {
|
||||
- let _ = crate::sync::futex_wake(&self.inner, 1);
|
||||
- }*/
|
||||
+ let new_state = if self.robust && current & FUTEX_OWNER_DIED != 0 {
|
||||
+ FUTEX_OWNER_DIED
|
||||
+ } else {
|
||||
+ STATE_UNLOCKED
|
||||
+ };
|
||||
+
|
||||
+ self.inner.store(new_state, Ordering::Release);
|
||||
+ crate::sync::futex_wake(&self.inner, i32::MAX);
|
||||
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
+pub(crate) unsafe fn mark_robust_mutexes_dead(thread: &crate::pthread::Pthread) {
|
||||
+ let head = thread.robust_list_head.get();
|
||||
+ let this_thread = os_tid_invalid_after_fork();
|
||||
+ let mut node = unsafe { *head };
|
||||
+
|
||||
+ unsafe { *head = core::ptr::null_mut() };
|
||||
+
|
||||
+ while !node.is_null() {
|
||||
+ let next = unsafe { (*node).next };
|
||||
+ let mutex = unsafe { &*(*node).mutex };
|
||||
+ let current = mutex.inner.load(Ordering::Relaxed);
|
||||
+
|
||||
+ if current & INDEX_MASK == this_thread {
|
||||
+ mutex
|
||||
+ .inner
|
||||
+ .store((current & WAITING_BIT) | FUTEX_OWNER_DIED | this_thread, Ordering::Release);
|
||||
+ crate::sync::futex_wake(&mutex.inner, i32::MAX);
|
||||
+ }
|
||||
+
|
||||
+ unsafe { drop(Box::from_raw(node)) };
|
||||
+ node = next;
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
#[repr(u8)]
|
||||
#[derive(PartialEq)]
|
||||
enum Ty {
|
||||
@@ -237,6 +337,54 @@ enum Ty {
|
||||
#[thread_local]
|
||||
static CACHED_OS_TID_INVALID_AFTER_FORK: Cell<u32> = Cell::new(0);
|
||||
|
||||
+fn add_to_robust_list(mutex: &RlctMutex) {
|
||||
+ let thread = crate::pthread::current_thread().expect("current thread not present");
|
||||
+ let node_ptr = Box::into_raw(Box::new(RobustMutexNode {
|
||||
+ next: core::ptr::null_mut(),
|
||||
+ prev: core::ptr::null_mut(),
|
||||
+ mutex: core::ptr::from_ref(mutex),
|
||||
+ }));
|
||||
+
|
||||
+ unsafe {
|
||||
+ let head = thread.robust_list_head.get();
|
||||
+ if !(*head).is_null() {
|
||||
+ (**head).prev = node_ptr;
|
||||
+ }
|
||||
+ (*node_ptr).next = *head;
|
||||
+ *head = node_ptr;
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+fn remove_from_robust_list(mutex: &RlctMutex) {
|
||||
+ let thread = match crate::pthread::current_thread() {
|
||||
+ Some(thread) => thread,
|
||||
+ None => return,
|
||||
+ };
|
||||
+
|
||||
+ unsafe {
|
||||
+ let mut node = *thread.robust_list_head.get();
|
||||
+
|
||||
+ while !node.is_null() {
|
||||
+ if core::ptr::eq((*node).mutex, core::ptr::from_ref(mutex)) {
|
||||
+ if !(*node).prev.is_null() {
|
||||
+ (*(*node).prev).next = (*node).next;
|
||||
+ } else {
|
||||
+ *thread.robust_list_head.get() = (*node).next;
|
||||
+ }
|
||||
+
|
||||
+ if !(*node).next.is_null() {
|
||||
+ (*(*node).next).prev = (*node).prev;
|
||||
+ }
|
||||
+
|
||||
+ drop(Box::from_raw(node));
|
||||
+ return;
|
||||
+ }
|
||||
+
|
||||
+ node = (*node).next;
|
||||
+ }
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
// Assumes TIDs are unique between processes, which I only know is true for Redox.
|
||||
fn os_tid_invalid_after_fork() -> u32 {
|
||||
// TODO: Coordinate better if using shared == PTHREAD_PROCESS_SHARED, with up to 2^32 separate
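For reference, the lock word the robust-mutex changes operate on packs three things: a waiters bit, an owner-died bit, and the owner TID. A stand-alone illustration of the bit layout (constants copied from the patch; the describe() helper is hypothetical):

```rust
const WAITING_BIT: u32 = 1 << 31; // some thread is futex-waiting on the mutex
const FUTEX_OWNER_DIED: u32 = 1 << 30; // a previous owner exited while holding it
const INDEX_MASK: u32 = !(WAITING_BIT | FUTEX_OWNER_DIED); // low bits: owner TID

fn describe(word: u32) -> String {
    format!(
        "owner_tid={} waiters={} owner_died={}",
        word & INDEX_MASK,
        word & WAITING_BIT != 0,
        word & FUTEX_OWNER_DIED != 0,
    )
}

fn main() {
    let locked = 42; // TID 42 holds the lock
    let contended = locked | WAITING_BIT; // plus a sleeping waiter
    let inherited = 7 | FUTEX_OWNER_DIED; // TID 7 took over after the owner died (EOWNERDEAD path)
    for w in [locked, contended, inherited] {
        println!("{w:#010x}: {}", describe(w));
    }
}
```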
|
||||
@@ -0,0 +1,130 @@
|
||||
diff --git a/src/header/sched/mod.rs b/src/header/sched/mod.rs
|
||||
index bcdd346..6066550 100644
|
||||
--- a/src/header/sched/mod.rs
|
||||
+++ b/src/header/sched/mod.rs
|
||||
@@ -27,43 +27,110 @@ pub const SCHED_RR: c_int = 1;
|
||||
pub const SCHED_OTHER: c_int = 2;
|
||||
|
||||
/// See <https://pubs.opengroup.org/onlinepubs/9799919799/functions/sched_get_priority_max.html>.
|
||||
-// #[unsafe(no_mangle)]
|
||||
+#[unsafe(no_mangle)]
|
||||
pub extern "C" fn sched_get_priority_max(policy: c_int) -> c_int {
|
||||
- todo!()
|
||||
+ match policy {
|
||||
+ SCHED_FIFO | SCHED_RR => 99,
|
||||
+ SCHED_OTHER => 0,
|
||||
+ _ => {
|
||||
+ crate::platform::ERRNO.set(crate::header::errno::EINVAL);
|
||||
+ -1
|
||||
+ }
|
||||
+ }
|
||||
}
|
||||
|
||||
-/// See <https://pubs.opengroup.org/onlinepubs/9799919799/functions/sched_get_priority_max.html>.
|
||||
-// #[unsafe(no_mangle)]
|
||||
+/// See <https://pubs.opengroup.org/onlinepubs/9799919799/functions/sched_get_priority_min.html>.
|
||||
+#[unsafe(no_mangle)]
|
||||
pub extern "C" fn sched_get_priority_min(policy: c_int) -> c_int {
|
||||
- todo!()
|
||||
+ match policy {
|
||||
+ SCHED_FIFO | SCHED_RR => 1,
|
||||
+ SCHED_OTHER => 0,
|
||||
+ _ => {
|
||||
+ crate::platform::ERRNO.set(crate::header::errno::EINVAL);
|
||||
+ -1
|
||||
+ }
|
||||
+ }
|
||||
}
|
||||
|
||||
/// See <https://pubs.opengroup.org/onlinepubs/9799919799/functions/sched_getparam.html>.
|
||||
-// #[unsafe(no_mangle)]
|
||||
+#[unsafe(no_mangle)]
|
||||
pub unsafe extern "C" fn sched_getparam(pid: pid_t, param: *mut sched_param) -> c_int {
|
||||
- todo!()
|
||||
+ if pid != 0 {
|
||||
+ crate::platform::ERRNO.set(crate::header::errno::ESRCH);
|
||||
+ return -1;
|
||||
+ }
|
||||
+ crate::platform::ERRNO.set(crate::header::errno::ENOSYS);
|
||||
+ -1
|
||||
+}
|
||||
+
|
||||
+/// See <https://pubs.opengroup.org/onlinepubs/9799919799/functions/sched_getscheduler.html>.
|
||||
+#[unsafe(no_mangle)]
|
||||
+pub extern "C" fn sched_getscheduler(pid: pid_t) -> c_int {
|
||||
+ if pid != 0 {
|
||||
+ crate::platform::ERRNO.set(crate::header::errno::ESRCH);
|
||||
+ return -1;
|
||||
+ }
|
||||
+ crate::platform::ERRNO.set(crate::header::errno::ENOSYS);
|
||||
+ -1
|
||||
}
|
||||
|
||||
/// See <https://pubs.opengroup.org/onlinepubs/9799919799/functions/sched_rr_get_interval.html>.
|
||||
-// #[unsafe(no_mangle)]
|
||||
-pub extern "C" fn sched_rr_get_interval(pid: pid_t, time: *const timespec) -> c_int {
|
||||
- todo!()
|
||||
+#[unsafe(no_mangle)]
|
||||
+pub extern "C" fn sched_rr_get_interval(pid: pid_t, tp: *mut timespec) -> c_int {
|
||||
+ if pid != 0 {
|
||||
+ crate::platform::ERRNO.set(crate::header::errno::ESRCH);
|
||||
+ return -1;
|
||||
+ }
|
||||
+ if tp.is_null() {
|
||||
+ crate::platform::ERRNO.set(crate::header::errno::EINVAL);
|
||||
+ return -1;
|
||||
+ }
|
||||
+ unsafe {
|
||||
+ (*tp).tv_sec = 0;
|
||||
+ (*tp).tv_nsec = 100_000_000; // 100ms default SCHED_RR quantum
|
||||
+ }
|
||||
+ 0
|
||||
}
|
||||
|
||||
/// See <https://pubs.opengroup.org/onlinepubs/9799919799/functions/sched_setparam.html>.
|
||||
-// #[unsafe(no_mangle)]
|
||||
-pub unsafe extern "C" fn sched_setparam(pid: pid_t, param: *const sched_param) -> c_int {
|
||||
- todo!()
|
||||
+#[unsafe(no_mangle)]
|
||||
+pub unsafe extern "C" fn sched_setparam(pid: pid_t, _param: *const sched_param) -> c_int {
|
||||
+ if pid != 0 {
|
||||
+ crate::platform::ERRNO.set(crate::header::errno::ESRCH);
|
||||
+ return -1;
|
||||
+ }
|
||||
+ crate::platform::ERRNO.set(crate::header::errno::ENOSYS);
|
||||
+ -1
|
||||
}
|
||||
|
||||
/// See <https://pubs.opengroup.org/onlinepubs/9799919799/functions/sched_setscheduler.html>.
|
||||
-// #[unsafe(no_mangle)]
|
||||
+#[unsafe(no_mangle)]
|
||||
pub extern "C" fn sched_setscheduler(
|
||||
pid: pid_t,
|
||||
policy: c_int,
|
||||
param: *const sched_param,
|
||||
) -> c_int {
|
||||
- todo!()
|
||||
+ if pid != 0 {
|
||||
+ crate::platform::ERRNO.set(crate::header::errno::ESRCH);
|
||||
+ return -1;
|
||||
+ }
|
||||
+ match policy {
|
||||
+ SCHED_OTHER => {
|
||||
+ if !param.is_null() && unsafe { (*param).sched_priority } != 0 {
|
||||
+ crate::platform::ERRNO.set(crate::header::errno::EINVAL);
|
||||
+ return -1;
|
||||
+ }
|
||||
+ SCHED_OTHER
|
||||
+ }
|
||||
+ SCHED_FIFO | SCHED_RR => {
|
||||
+ crate::platform::ERRNO.set(crate::header::errno::ENOSYS);
|
||||
+ -1
|
||||
+ }
|
||||
+ _ => {
|
||||
+ crate::platform::ERRNO.set(crate::header::errno::EINVAL);
|
||||
+ -1
|
||||
+ }
|
||||
+ }
|
||||
}
|
||||
|
||||
/// See <https://pubs.opengroup.org/onlinepubs/9799919799/functions/sched_yield.html>.
|
||||
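The values these formerly-todo stubs now report, spelled out as illustrative assertions (a sketch, not part of the patch; it assumes the SCHED_* constants above are in scope):

    assert_eq!(sched_get_priority_max(SCHED_FIFO), 99);  // RT policies span priorities 1..=99
    assert_eq!(sched_get_priority_min(SCHED_RR), 1);
    assert_eq!(sched_get_priority_max(SCHED_OTHER), 0);  // SCHED_OTHER has the single priority 0
    // sched_rr_get_interval() fills in a fixed 100 ms round-robin quantum for pid 0.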
@@ -0,0 +1,231 @@
|
||||
diff --git a/src/header/pthread/cbindgen.toml b/src/header/pthread/cbindgen.toml
|
||||
--- a/src/header/pthread/cbindgen.toml
|
||||
+++ b/src/header/pthread/cbindgen.toml
|
||||
@@ -10,0 +11 @@ cpp_compat = true
|
||||
+"cpu_set_t" = "struct cpu_set_t"
|
||||
diff --git a/src/header/pthread/mod.rs b/src/header/pthread/mod.rs
|
||||
--- a/src/header/pthread/mod.rs
|
||||
+++ b/src/header/pthread/mod.rs
|
||||
@@ -6 +6,8 @@ use alloc::collections::LinkedList;
|
||||
-use core::{cell::Cell, ptr::NonNull};
|
||||
+use core::{cell::Cell, mem::size_of, ptr::NonNull};
|
||||
+
|
||||
+#[cfg(target_os = "linux")]
|
||||
+use sc::syscall;
|
||||
+#[cfg(target_os = "redox")]
|
||||
+use redox_rt::proc::FdGuard;
|
||||
+#[cfg(target_os = "redox")]
|
||||
+use syscall;
|
||||
@@ -9,0 +17 @@ use crate::{
|
||||
+ header::errno::EINVAL,
|
||||
@@ -14 +22 @@ use crate::{
|
||||
- c_int, c_uchar, c_uint, c_void, clockid_t, pthread_attr_t, pthread_barrier_t,
|
||||
+ c_char, c_int, c_uchar, c_uint, c_void, clockid_t, pthread_attr_t, pthread_barrier_t,
|
||||
@@ -22,0 +31,3 @@ use crate::{
|
||||
+#[cfg(target_os = "linux")]
|
||||
+use crate::platform::sys::e_raw;
|
||||
+
|
||||
@@ -29,0 +41,93 @@ pub fn e(result: Result<(), Errno>) -> i32 {
|
||||
+const RLCT_AFFINITY_BYTES: usize = size_of::<u64>();
|
||||
+const RLCT_MAX_AFFINITY_CPUS: usize = u64::BITS as usize;
|
||||
+
|
||||
+fn cpuset_bytes<'a>(cpusetsize: size_t, cpuset: *const cpu_set_t) -> Result<&'a [u8], Errno> {
|
||||
+ if cpuset.is_null() || !(RLCT_AFFINITY_BYTES..=size_of::<cpu_set_t>()).contains(&cpusetsize) {
|
||||
+ return Err(Errno(EINVAL));
|
||||
+ }
|
||||
+
|
||||
+ Ok(unsafe { core::slice::from_raw_parts(cpuset.cast::<u8>(), cpusetsize) })
|
||||
+}
|
||||
+
|
||||
+fn cpuset_bytes_mut<'a>(
|
||||
+ cpusetsize: size_t,
|
||||
+ cpuset: *mut cpu_set_t,
|
||||
+) -> Result<&'a mut [u8], Errno> {
|
||||
+ if cpuset.is_null() || !(RLCT_AFFINITY_BYTES..=size_of::<cpu_set_t>()).contains(&cpusetsize) {
|
||||
+ return Err(Errno(EINVAL));
|
||||
+ }
|
||||
+
|
||||
+ Ok(unsafe { core::slice::from_raw_parts_mut(cpuset.cast::<u8>(), cpusetsize) })
|
||||
+}
|
||||
+
|
||||
+fn cpuset_to_u64(cpusetsize: size_t, cpuset: *const cpu_set_t) -> Result<u64, Errno> {
|
||||
+ let bytes = cpuset_bytes(cpusetsize, cpuset)?;
|
||||
+ let mut mask = 0_u64;
|
||||
+
|
||||
+ for (byte_index, byte) in bytes.iter().copied().enumerate() {
|
||||
+ for bit in 0..u8::BITS as usize {
|
||||
+ if byte & (1 << bit) == 0 {
|
||||
+ continue;
|
||||
+ }
|
||||
+
|
||||
+ let cpu = byte_index * u8::BITS as usize + bit;
|
||||
+ if cpu >= RLCT_MAX_AFFINITY_CPUS {
|
||||
+ return Err(Errno(EINVAL));
|
||||
+ }
|
||||
+
|
||||
+ mask |= 1_u64 << cpu;
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ Ok(mask)
|
||||
+}
|
||||
+
|
||||
+fn copy_u64_to_cpuset(mask: u64, cpusetsize: size_t, cpuset: *mut cpu_set_t) -> Result<(), Errno> {
|
||||
+ let bytes = cpuset_bytes_mut(cpusetsize, cpuset)?;
|
||||
+ bytes.fill(0);
|
||||
+
|
||||
+ for (byte_index, dst) in bytes.iter_mut().take(RLCT_AFFINITY_BYTES).enumerate() {
|
||||
+ *dst = (mask >> (byte_index * u8::BITS as usize)) as u8;
|
||||
+ }
|
||||
+
|
||||
+ Ok(())
|
||||
+}
|
||||
+
|
||||
+#[cfg(target_os = "redox")]
|
||||
+fn redox_set_thread_affinity(thread: &pthread::Pthread, mask: u64) -> Result<(), Errno> {
|
||||
+ let mut kernel_cpuset = cpu_set_t::default();
|
||||
+ kernel_cpuset.__bits[0] = mask;
|
||||
+
|
||||
+ let handle = FdGuard::new(unsafe {
|
||||
+ syscall::dup(thread.os_tid.get().read().thread_fd, b"sched-affinity")?
|
||||
+ });
|
||||
+ let _ = handle.write(unsafe {
|
||||
+ core::slice::from_raw_parts(
|
||||
+ core::ptr::from_ref(&kernel_cpuset).cast::<u8>(),
|
||||
+ size_of::<cpu_set_t>(),
|
||||
+ )
|
||||
+ })?;
|
||||
+
|
||||
+ Ok(())
|
||||
+}
|
||||
+
|
||||
+#[cfg(target_os = "redox")]
|
||||
+fn redox_get_thread_affinity(thread: &pthread::Pthread) -> Result<u64, Errno> {
|
||||
+ let handle = FdGuard::new(unsafe {
|
||||
+ syscall::dup(thread.os_tid.get().read().thread_fd, b"sched-affinity")?
|
||||
+ });
|
||||
+ let mut kernel_cpuset = cpu_set_t::default();
|
||||
+ let _ = handle.read(unsafe {
|
||||
+ core::slice::from_raw_parts_mut(
|
||||
+ core::ptr::from_mut(&mut kernel_cpuset).cast::<u8>(),
|
||||
+ size_of::<cpu_set_t>(),
|
||||
+ )
|
||||
+ })?;
|
||||
+
|
||||
+ if kernel_cpuset.__bits[1..].iter().any(|bits| *bits != 0) {
|
||||
+ return Err(Errno(EINVAL));
|
||||
+ }
|
||||
+
|
||||
+ Ok(kernel_cpuset.__bits[0])
|
||||
+}
|
||||
+
|
||||
@@ -188,0 +293,36 @@ pub unsafe extern "C" fn pthread_getcpuclockid(
|
||||
+/// GNU extension. See <https://man7.org/linux/man-pages/man3/pthread_setaffinity_np.3.html>.
|
||||
+#[unsafe(no_mangle)]
|
||||
+pub unsafe extern "C" fn pthread_getaffinity_np(
|
||||
+ thread: pthread_t,
|
||||
+ cpusetsize: size_t,
|
||||
+ cpuset: *mut cpu_set_t,
|
||||
+) -> c_int {
|
||||
+ let thread: &pthread::Pthread = unsafe { &*thread.cast() };
|
||||
+
|
||||
+ let result = {
|
||||
+ #[cfg(target_os = "redox")]
|
||||
+ {
|
||||
+ redox_get_thread_affinity(thread).and_then(|mask| copy_u64_to_cpuset(mask, cpusetsize, cpuset))
|
||||
+ }
|
||||
+
|
||||
+ #[cfg(target_os = "linux")]
|
||||
+ {
|
||||
+ if cpuset.is_null() {
|
||||
+ Err(Errno(EINVAL))
|
||||
+ } else {
|
||||
+ e_raw(unsafe {
|
||||
+ syscall!(
|
||||
+ SCHED_GETAFFINITY,
|
||||
+ thread.os_tid.get().read().thread_id,
|
||||
+ cpusetsize,
|
||||
+ cpuset.cast::<c_void>()
|
||||
+ )
|
||||
+ })
|
||||
+ .map(|_| ())
|
||||
+ }
|
||||
+ }
|
||||
+ };
|
||||
+
|
||||
+ e(result)
|
||||
+}
|
||||
+
|
||||
@@ -237,0 +378,36 @@ pub unsafe extern "C" fn pthread_self() -> pthread_t {
|
||||
+/// GNU extension. See <https://man7.org/linux/man-pages/man3/pthread_setaffinity_np.3.html>.
|
||||
+#[unsafe(no_mangle)]
|
||||
+pub unsafe extern "C" fn pthread_setaffinity_np(
|
||||
+ thread: pthread_t,
|
||||
+ cpusetsize: size_t,
|
||||
+ cpuset: *const cpu_set_t,
|
||||
+) -> c_int {
|
||||
+ let thread: &pthread::Pthread = unsafe { &*thread.cast() };
|
||||
+
|
||||
+ let result = {
|
||||
+ #[cfg(target_os = "redox")]
|
||||
+ {
|
||||
+ cpuset_to_u64(cpusetsize, cpuset).and_then(|mask| redox_set_thread_affinity(thread, mask))
|
||||
+ }
|
||||
+
|
||||
+ #[cfg(target_os = "linux")]
|
||||
+ {
|
||||
+ if cpuset.is_null() {
|
||||
+ Err(Errno(EINVAL))
|
||||
+ } else {
|
||||
+ e_raw(unsafe {
|
||||
+ syscall!(
|
||||
+ SCHED_SETAFFINITY,
|
||||
+ thread.os_tid.get().read().thread_id,
|
||||
+ cpusetsize,
|
||||
+ cpuset.cast::<c_void>()
|
||||
+ )
|
||||
+ })
|
||||
+ .map(|_| ())
|
||||
+ }
|
||||
+ }
|
||||
+ };
|
||||
+
|
||||
+ e(result)
|
||||
+}
|
||||
+
|
||||
diff --git a/src/header/sched/cbindgen.toml b/src/header/sched/cbindgen.toml
|
||||
--- a/src/header/sched/cbindgen.toml
|
||||
+++ b/src/header/sched/cbindgen.toml
|
||||
@@ -22,0 +23,14 @@ prefix_with_name = true
|
||||
+
|
||||
+[export]
|
||||
+include = [
|
||||
+ "sched_param",
|
||||
+ "cpu_set_t",
|
||||
+ "sched_get_priority_max",
|
||||
+ "sched_get_priority_min",
|
||||
+ "sched_getparam",
|
||||
+ "sched_getscheduler",
|
||||
+ "sched_rr_get_interval",
|
||||
+ "sched_setparam",
|
||||
+ "sched_setscheduler",
|
||||
+ "sched_yield",
|
||||
+]
|
||||
diff --git a/src/header/sched/mod.rs b/src/header/sched/mod.rs
|
||||
--- a/src/header/sched/mod.rs
|
||||
+++ b/src/header/sched/mod.rs
|
||||
@@ -12,0 +13,2 @@
|
||||
+pub const CPU_SETSIZE: usize = 1024;
|
||||
+
|
||||
@@ -20,0 +23,7 @@
|
||||
+/// Linux-compatible CPU affinity mask storage.
|
||||
+#[repr(C)]
|
||||
+#[derive(Clone, Copy, Debug, Default)]
|
||||
+pub struct cpu_set_t {
|
||||
+ pub __bits: [u64; 16],
|
||||
+}
|
||||
+
|
||||
@@ -143,0 +153,3 @@
|
||||
+
|
||||
+#[unsafe(no_mangle)]
|
||||
+pub unsafe extern "C" fn cbindgen_stupid_struct_user_for_cpu_set_t(_: cpu_set_t) {}
|
||||
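A minimal illustration of the cpu_set_t-to-mask conversion added by this patch. It is a sketch, not part of the patch: it assumes it lives in the same module (cpuset_to_u64 is private) and a little-endian target, which is what the byte-wise scan relies on.

    #[test]
    fn cpus_0_and_3_become_mask_0b1001() {
        let mut set = cpu_set_t::default();
        set.__bits[0] = (1 << 0) | (1 << 3); // select CPU 0 and CPU 3
        let mask = cpuset_to_u64(size_of::<cpu_set_t>(), &set).ok();
        assert_eq!(mask, Some(0b1001));      // CPUs >= RLCT_MAX_AFFINITY_CPUS would yield EINVAL instead
    }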
@@ -0,0 +1,326 @@
|
||||
diff --git a/src/header/pthread/mod.rs b/src/header/pthread/mod.rs
|
||||
index c742a42..008090a 100644
|
||||
--- a/src/header/pthread/mod.rs
|
||||
+++ b/src/header/pthread/mod.rs
|
||||
@@ -3,15 +3,26 @@
|
||||
//! See <https://pubs.opengroup.org/onlinepubs/9799919799/basedefs/pthread.h.html>.
|
||||
|
||||
use alloc::collections::LinkedList;
|
||||
-use core::{cell::Cell, ptr::NonNull};
|
||||
+use core::{cell::Cell, mem::size_of, ptr::NonNull};
|
||||
+
|
||||
+#[cfg(target_os = "redox")]
|
||||
+use redox_rt::proc::FdGuard;
|
||||
+#[cfg(target_os = "linux")]
|
||||
+use sc::syscall;
|
||||
+#[cfg(target_os = "redox")]
|
||||
+use syscall;
|
||||
|
||||
use crate::{
|
||||
error::Errno,
|
||||
- header::{bits_timespec::timespec, sched::*},
|
||||
+ header::{
|
||||
+ bits_timespec::timespec,
|
||||
+ errno::{EINVAL, ERANGE},
|
||||
+ sched::*,
|
||||
+ },
|
||||
platform::{
|
||||
Pal, Sys,
|
||||
types::{
|
||||
- c_int, c_uchar, c_uint, c_void, clockid_t, pthread_attr_t, pthread_barrier_t,
|
||||
+ c_char, c_int, c_uchar, c_uint, c_void, clockid_t, pthread_attr_t, pthread_barrier_t,
|
||||
pthread_barrierattr_t, pthread_cond_t, pthread_condattr_t, pthread_key_t,
|
||||
pthread_mutex_t, pthread_mutexattr_t, pthread_once_t, pthread_rwlock_t,
|
||||
pthread_rwlockattr_t, pthread_spinlock_t, pthread_t, size_t,
|
||||
@@ -20,6 +31,9 @@ use crate::{
|
||||
pthread,
|
||||
};
|
||||
|
||||
+#[cfg(target_os = "linux")]
|
||||
+use crate::platform::sys::e_raw;
|
||||
+
|
||||
pub fn e(result: Result<(), Errno>) -> i32 {
|
||||
match result {
|
||||
Ok(()) => 0,
|
||||
@@ -27,6 +41,96 @@ pub fn e(result: Result<(), Errno>) -> i32 {
|
||||
}
|
||||
}
|
||||
|
||||
+const RLCT_AFFINITY_BYTES: usize = size_of::<u64>();
|
||||
+const RLCT_MAX_AFFINITY_CPUS: usize = u64::BITS as usize;
|
||||
+
|
||||
+fn cpuset_bytes<'a>(cpusetsize: size_t, cpuset: *const cpu_set_t) -> Result<&'a [u8], Errno> {
|
||||
+ if cpuset.is_null() || !(RLCT_AFFINITY_BYTES..=size_of::<cpu_set_t>()).contains(&cpusetsize) {
|
||||
+ return Err(Errno(EINVAL));
|
||||
+ }
|
||||
+
|
||||
+ Ok(unsafe { core::slice::from_raw_parts(cpuset.cast::<u8>(), cpusetsize) })
|
||||
+}
|
||||
+
|
||||
+fn cpuset_bytes_mut<'a>(cpusetsize: size_t, cpuset: *mut cpu_set_t) -> Result<&'a mut [u8], Errno> {
|
||||
+ if cpuset.is_null() || !(RLCT_AFFINITY_BYTES..=size_of::<cpu_set_t>()).contains(&cpusetsize) {
|
||||
+ return Err(Errno(EINVAL));
|
||||
+ }
|
||||
+
|
||||
+ Ok(unsafe { core::slice::from_raw_parts_mut(cpuset.cast::<u8>(), cpusetsize) })
|
||||
+}
|
||||
+
|
||||
+fn cpuset_to_u64(cpusetsize: size_t, cpuset: *const cpu_set_t) -> Result<u64, Errno> {
|
||||
+ let bytes = cpuset_bytes(cpusetsize, cpuset)?;
|
||||
+ let mut mask = 0_u64;
|
||||
+
|
||||
+ for (byte_index, byte) in bytes.iter().copied().enumerate() {
|
||||
+ for bit in 0..u8::BITS as usize {
|
||||
+ if byte & (1 << bit) == 0 {
|
||||
+ continue;
|
||||
+ }
|
||||
+
|
||||
+ let cpu = byte_index * u8::BITS as usize + bit;
|
||||
+ if cpu >= RLCT_MAX_AFFINITY_CPUS {
|
||||
+ return Err(Errno(EINVAL));
|
||||
+ }
|
||||
+
|
||||
+ mask |= 1_u64 << cpu;
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ Ok(mask)
|
||||
+}
|
||||
+
|
||||
+fn copy_u64_to_cpuset(mask: u64, cpusetsize: size_t, cpuset: *mut cpu_set_t) -> Result<(), Errno> {
|
||||
+ let bytes = cpuset_bytes_mut(cpusetsize, cpuset)?;
|
||||
+ bytes.fill(0);
|
||||
+
|
||||
+ for (byte_index, dst) in bytes.iter_mut().take(RLCT_AFFINITY_BYTES).enumerate() {
|
||||
+ *dst = (mask >> (byte_index * u8::BITS as usize)) as u8;
|
||||
+ }
|
||||
+
|
||||
+ Ok(())
|
||||
+}
|
||||
+
|
||||
+#[cfg(target_os = "redox")]
|
||||
+fn redox_set_thread_affinity(thread: &pthread::Pthread, mask: u64) -> Result<(), Errno> {
|
||||
+ let mut kernel_cpuset = cpu_set_t::default();
|
||||
+ kernel_cpuset.__bits[0] = mask;
|
||||
+
|
||||
+ let handle = FdGuard::new(unsafe {
|
||||
+ syscall::dup(thread.os_tid.get().read().thread_fd, b"sched-affinity")?
|
||||
+ });
|
||||
+ let _ = handle.write(unsafe {
|
||||
+ core::slice::from_raw_parts(
|
||||
+ core::ptr::from_ref(&kernel_cpuset).cast::<u8>(),
|
||||
+ size_of::<cpu_set_t>(),
|
||||
+ )
|
||||
+ })?;
|
||||
+
|
||||
+ Ok(())
|
||||
+}
|
||||
+
|
||||
+#[cfg(target_os = "redox")]
|
||||
+fn redox_get_thread_affinity(thread: &pthread::Pthread) -> Result<u64, Errno> {
|
||||
+ let handle = FdGuard::new(unsafe {
|
||||
+ syscall::dup(thread.os_tid.get().read().thread_fd, b"sched-affinity")?
|
||||
+ });
|
||||
+ let mut kernel_cpuset = cpu_set_t::default();
|
||||
+ let _ = handle.read(unsafe {
|
||||
+ core::slice::from_raw_parts_mut(
|
||||
+ core::ptr::from_mut(&mut kernel_cpuset).cast::<u8>(),
|
||||
+ size_of::<cpu_set_t>(),
|
||||
+ )
|
||||
+ })?;
|
||||
+
|
||||
+ if kernel_cpuset.__bits[1..].iter().any(|bits| *bits != 0) {
|
||||
+ return Err(Errno(EINVAL));
|
||||
+ }
|
||||
+
|
||||
+ Ok(kernel_cpuset.__bits[0])
|
||||
+}
|
||||
+
|
||||
#[derive(Clone)]
|
||||
pub(crate) struct RlctAttr {
|
||||
pub detachstate: c_uchar,
|
||||
@@ -186,6 +290,43 @@ pub unsafe extern "C" fn pthread_getcpuclockid(
|
||||
}
|
||||
}
|
||||
|
||||
+/// GNU extension. See <https://man7.org/linux/man-pages/man3/pthread_setaffinity_np.3.html>.
|
||||
+#[unsafe(no_mangle)]
|
||||
+pub unsafe extern "C" fn pthread_getaffinity_np(
|
||||
+ thread: pthread_t,
|
||||
+ cpusetsize: size_t,
|
||||
+ cpuset: *mut cpu_set_t,
|
||||
+) -> c_int {
|
||||
+ let thread: &pthread::Pthread = unsafe { &*thread.cast() };
|
||||
+
|
||||
+ let result = {
|
||||
+ #[cfg(target_os = "redox")]
|
||||
+ {
|
||||
+ redox_get_thread_affinity(thread)
|
||||
+ .and_then(|mask| copy_u64_to_cpuset(mask, cpusetsize, cpuset))
|
||||
+ }
|
||||
+
|
||||
+ #[cfg(target_os = "linux")]
|
||||
+ {
|
||||
+ if cpuset.is_null() {
|
||||
+ Err(Errno(EINVAL))
|
||||
+ } else {
|
||||
+ e_raw(unsafe {
|
||||
+ syscall!(
|
||||
+ SCHED_GETAFFINITY,
|
||||
+ thread.os_tid.get().read().thread_id,
|
||||
+ cpusetsize,
|
||||
+ cpuset.cast::<c_void>()
|
||||
+ )
|
||||
+ })
|
||||
+ .map(|_| ())
|
||||
+ }
|
||||
+ }
|
||||
+ };
|
||||
+
|
||||
+ e(result)
|
||||
+}
|
||||
+
|
||||
/// See <https://pubs.opengroup.org/onlinepubs/9799919799/functions/pthread_getschedparam.html>.
|
||||
#[unsafe(no_mangle)]
|
||||
pub unsafe extern "C" fn pthread_getschedparam(
|
||||
@@ -235,6 +376,43 @@ pub unsafe extern "C" fn pthread_self() -> pthread_t {
|
||||
core::ptr::from_ref(unsafe { pthread::current_thread().unwrap_unchecked() }) as *mut _
|
||||
}
|
||||
|
||||
+/// GNU extension. See <https://man7.org/linux/man-pages/man3/pthread_setaffinity_np.3.html>.
|
||||
+#[unsafe(no_mangle)]
|
||||
+pub unsafe extern "C" fn pthread_setaffinity_np(
|
||||
+ thread: pthread_t,
|
||||
+ cpusetsize: size_t,
|
||||
+ cpuset: *const cpu_set_t,
|
||||
+) -> c_int {
|
||||
+ let thread: &pthread::Pthread = unsafe { &*thread.cast() };
|
||||
+
|
||||
+ let result = {
|
||||
+ #[cfg(target_os = "redox")]
|
||||
+ {
|
||||
+ cpuset_to_u64(cpusetsize, cpuset)
|
||||
+ .and_then(|mask| redox_set_thread_affinity(thread, mask))
|
||||
+ }
|
||||
+
|
||||
+ #[cfg(target_os = "linux")]
|
||||
+ {
|
||||
+ if cpuset.is_null() {
|
||||
+ Err(Errno(EINVAL))
|
||||
+ } else {
|
||||
+ e_raw(unsafe {
|
||||
+ syscall!(
|
||||
+ SCHED_SETAFFINITY,
|
||||
+ thread.os_tid.get().read().thread_id,
|
||||
+ cpusetsize,
|
||||
+ cpuset.cast::<c_void>()
|
||||
+ )
|
||||
+ })
|
||||
+ .map(|_| ())
|
||||
+ }
|
||||
+ }
|
||||
+ };
|
||||
+
|
||||
+ e(result)
|
||||
+}
|
||||
+
|
||||
/// See <https://pubs.opengroup.org/onlinepubs/9799919799/functions/pthread_setcancelstate.html>.
|
||||
#[unsafe(no_mangle)]
|
||||
pub unsafe extern "C" fn pthread_setcancelstate(state: c_int, oldstate: *mut c_int) -> c_int {
|
||||
@@ -307,6 +485,13 @@ pub unsafe extern "C" fn pthread_testcancel() {
|
||||
unsafe { pthread::testcancel() };
|
||||
}
|
||||
|
||||
+/// <https://man7.org/linux/man-pages/man3/pthread_yield.3.html>
|
||||
+///
|
||||
+/// Non-standard GNU extension. Prefer `sched_yield()` instead.
|
||||
+pub extern "C" fn pthread_yield() {
|
||||
+ let _ = Sys::sched_yield();
|
||||
+}
|
||||
+
|
||||
// Must be the same struct as defined in the pthread_cleanup_push macro.
|
||||
#[repr(C)]
|
||||
pub(crate) struct CleanupLinkedListEntry {
|
||||
@@ -350,3 +535,82 @@ pub(crate) unsafe fn run_destructor_stack() {
|
||||
(entry.routine)(entry.arg);
|
||||
}
|
||||
}
|
||||
+
|
||||
+#[unsafe(no_mangle)]
|
||||
+pub unsafe extern "C" fn pthread_setname_np(thread: pthread_t, name: *const c_char) -> c_int {
|
||||
+ if name.is_null() {
|
||||
+ return EINVAL;
|
||||
+ }
|
||||
+
|
||||
+ let cstr = unsafe { core::ffi::CStr::from_ptr(name) };
|
||||
+ let name_bytes = cstr.to_bytes();
|
||||
+ let len = name_bytes.len().min(31);
|
||||
+
|
||||
+ #[cfg(target_os = "redox")]
|
||||
+ {
|
||||
+ let thread = unsafe { &*thread.cast::<crate::pthread::Pthread>() };
|
||||
+ let os_tid = unsafe { thread.os_tid.get().read() };
|
||||
+ let path = alloc::format!("proc:{}/name", os_tid.thread_fd);
|
||||
+ let fd = match Sys::open(&path, crate::header::fcntl::O_WRONLY, 0) {
|
||||
+ Ok(fd) => fd,
|
||||
+ Err(Errno(code)) => return code,
|
||||
+ };
|
||||
+
|
||||
+ let result = match Sys::write(fd, &name_bytes[..len]) {
|
||||
+ Ok(written) if written == len => 0,
|
||||
+ Ok(_) => crate::header::errno::EIO,
|
||||
+ Err(Errno(code)) => code,
|
||||
+ };
|
||||
+ let _ = Sys::close(fd);
|
||||
+ result
|
||||
+ }
|
||||
+ #[cfg(not(target_os = "redox"))]
|
||||
+ {
|
||||
+ let _ = thread;
|
||||
+ 0
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+#[unsafe(no_mangle)]
|
||||
+pub unsafe extern "C" fn pthread_getname_np(
|
||||
+ thread: pthread_t,
|
||||
+ name: *mut c_char,
|
||||
+ len: size_t,
|
||||
+) -> c_int {
|
||||
+ if name.is_null() {
|
||||
+ return EINVAL;
|
||||
+ }
|
||||
+ if len == 0 {
|
||||
+ return ERANGE;
|
||||
+ }
|
||||
+
|
||||
+ #[cfg(target_os = "redox")]
|
||||
+ {
|
||||
+ let thread = unsafe { &*thread.cast::<crate::pthread::Pthread>() };
|
||||
+ let os_tid = unsafe { thread.os_tid.get().read() };
|
||||
+ let path = alloc::format!("proc:{}/name", os_tid.thread_fd);
|
||||
+ let fd = match Sys::open(&path, crate::header::fcntl::O_RDONLY, 0) {
|
||||
+ Ok(fd) => fd,
|
||||
+ Err(Errno(code)) => return code,
|
||||
+ };
|
||||
+
|
||||
+ let mut buf = [0u8; 31];
|
||||
+ let result = match Sys::read(fd, &mut buf) {
|
||||
+ Ok(read) if read < len => {
|
||||
+ unsafe { core::ptr::copy_nonoverlapping(buf.as_ptr(), name.cast(), read) };
|
||||
+ unsafe { *name.add(read) = 0 };
|
||||
+ 0
|
||||
+ }
|
||||
+ Ok(_) => ERANGE,
|
||||
+ Err(Errno(code)) => code,
|
||||
+ };
|
||||
+ let _ = Sys::close(fd);
|
||||
+ result
|
||||
+ }
|
||||
+ #[cfg(not(target_os = "redox"))]
|
||||
+ {
|
||||
+ let _ = thread;
|
||||
+ unsafe { *name = 0 };
|
||||
+ 0
|
||||
+ }
|
||||
+}
|
||||
@@ -0,0 +1,104 @@
|
||||
diff --git a/src/platform/redox/mod.rs b/src/platform/redox/mod.rs
|
||||
--- a/src/platform/redox/mod.rs
|
||||
+++ b/src/platform/redox/mod.rs
|
||||
@@ -77,11 +77,74 @@ static mut BRK_CUR: *mut c_void = ptr::null_mut();
|
||||
static mut BRK_END: *mut c_void = ptr::null_mut();
|
||||
|
||||
const PAGE_SIZE: usize = 4096;
|
||||
+const NICE_MIN: c_int = -20;
|
||||
+const NICE_MAX: c_int = 19;
|
||||
|
||||
fn round_up_to_page_size(val: usize) -> Option<usize> {
|
||||
val.checked_add(PAGE_SIZE)
|
||||
.map(|val| (val - 1) / PAGE_SIZE * PAGE_SIZE)
|
||||
}
|
||||
+
|
||||
+fn is_current_process_priority_target(which: c_int, who: id_t) -> bool {
|
||||
+ which == crate::header::sys_resource::PRIO_PROCESS
|
||||
+ && (who == 0 || who == redox_rt::sys::posix_getpid() as id_t)
|
||||
+}
|
||||
+
|
||||
+fn current_process_thread_handle(index: usize) -> Result<Option<FdGuard>> {
|
||||
+ let thread_name = format!("thread-{index}");
|
||||
+ match redox_rt::current_proc_fd().dup(thread_name.as_bytes()) {
|
||||
+ Ok(thread_fd) => Ok(Some(thread_fd)),
|
||||
+ Err(error) if error.errno == ENOENT => Ok(None),
|
||||
+ Err(error) => Err(Errno(error.errno)),
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+fn current_process_priority_handle(index: usize) -> Result<Option<FdGuard>> {
|
||||
+ let Some(thread_fd) = current_process_thread_handle(index)? else {
|
||||
+ return Ok(None);
|
||||
+ };
|
||||
+
|
||||
+ thread_fd
|
||||
+ .dup(b"priority")
|
||||
+ .map(Some)
|
||||
+ .map_err(|error| Errno(error.errno))
|
||||
+}
|
||||
+
|
||||
+fn read_current_process_nice() -> Result<c_int> {
|
||||
+ let Some(priority_fd) = current_process_priority_handle(0)? else {
|
||||
+ return Err(Errno(ESRCH));
|
||||
+ };
|
||||
+
|
||||
+ let mut nice_bytes = [0_u8; size_of::<c_int>()];
|
||||
+ if priority_fd.read(&mut nice_bytes)? != size_of::<c_int>() {
|
||||
+ return Err(Errno(EIO));
|
||||
+ }
|
||||
+
|
||||
+ Ok(c_int::from_ne_bytes(nice_bytes))
|
||||
+}
|
||||
+
|
||||
+fn write_current_process_nice(nice: c_int) -> Result<()> {
|
||||
+ let mut updated_threads = 0;
|
||||
+ let nice_bytes = nice.to_ne_bytes();
|
||||
+
|
||||
+ for index in 0.. {
|
||||
+ let Some(priority_fd) = current_process_priority_handle(index)? else {
|
||||
+ break;
|
||||
+ };
|
||||
+
|
||||
+ if priority_fd.write(&nice_bytes)? != nice_bytes.len() {
|
||||
+ return Err(Errno(EIO));
|
||||
+ }
|
||||
+ updated_threads += 1;
|
||||
+ }
|
||||
+
|
||||
+ if updated_threads == 0 {
|
||||
+ return Err(Errno(ESRCH));
|
||||
+ }
|
||||
+
|
||||
+ Ok(())
|
||||
+}
|
||||
|
||||
fn cvt_uid(id: c_int) -> Result<Option<u32>> {
|
||||
if id == -1 {
|
||||
return Ok(None);
|
||||
@@ -698,6 +761,11 @@ impl Pal for Sys {
|
||||
}
|
||||
|
||||
fn getpriority(which: c_int, who: id_t) -> Result<c_int> {
|
||||
+ if is_current_process_priority_target(which, who) {
|
||||
+ let nice = read_current_process_nice()?;
|
||||
+ return Ok(20 - nice);
|
||||
+ }
|
||||
+
|
||||
match redox_rt::sys::posix_getpriority(which, who as u32) {
|
||||
Ok(kernel_prio) => {
|
||||
let posix_prio = (kernel_prio as i32 * -1) + 40 as i32;
|
||||
@@ -1274,7 +1342,12 @@ impl Pal for Sys {
|
||||
}
|
||||
|
||||
fn setpriority(which: c_int, who: id_t, prio: c_int) -> Result<()> {
|
||||
- let clamped_prio = prio.clamp(-20, 19);
|
||||
+ let clamped_prio = prio.clamp(NICE_MIN, NICE_MAX);
|
||||
+
|
||||
+ if is_current_process_priority_target(which, who) {
|
||||
+ return write_current_process_nice(clamped_prio);
|
||||
+ }
|
||||
+
|
||||
let kernel_prio = (20 + clamped_prio) as u32;
|
||||
|
||||
match redox_rt::sys::posix_setpriority(which, who as u32, kernel_prio) {
|
||||
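Worked mapping for the conversion above (illustrative): after clamping to NICE_MIN..=NICE_MAX, setpriority stores kernel_prio = 20 + nice, and both read paths in Sys::getpriority report 20 - nice, so the fd-based fast path and the posix_getpriority fallback stay consistent:

    // nice = -20 (highest)  -> kernel_prio =  0 -> Sys::getpriority returns 40
    // nice =   0 (default)  -> kernel_prio = 20 -> Sys::getpriority returns 20
    // nice = +19 (lowest)   -> kernel_prio = 39 -> Sys::getpriority returns  1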
@@ -0,0 +1,43 @@
|
||||
diff --git a/src/sync/pthread_mutex.rs b/src/sync/pthread_mutex.rs
|
||||
index 2871a6149..3c8e73f15 100644
|
||||
--- a/src/sync/pthread_mutex.rs
|
||||
+++ b/src/sync/pthread_mutex.rs
|
||||
@@ -35,7 +35,7 @@ const FUTEX_OWNER_DIED: u32 = 1 << 30;
|
||||
const INDEX_MASK: u32 = !(WAITING_BIT | FUTEX_OWNER_DIED);
|
||||
// TODO: Lower limit is probably better.
|
||||
const RECURSIVE_COUNT_MAX_INCLUSIVE: u32 = u32::MAX;
|
||||
-const SPIN_COUNT: usize = 0;
|
||||
+const SPIN_COUNT: usize = 100;
|
||||
|
||||
impl RlctMutex {
|
||||
pub(crate) fn new(attr: &RlctMutexAttr) -> Result<Self, Errno> {
|
||||
diff --git a/src/sync/barrier.rs b/src/sync/barrier.rs
|
||||
index b5847b5..a8e3c2f0 100644
|
||||
--- a/src/sync/barrier.rs
|
||||
+++ b/src/sync/barrier.rs
|
||||
@@ -47,6 +47,9 @@ impl Barrier {
|
||||
cvar: FutexState::new(count.get()),
|
||||
}
|
||||
}
|
||||
+ pub fn destroy(&self) {}
|
||||
+
|
||||
pub fn wait(&self) -> WaitResult {
|
||||
let _ = &self.lock;
|
||||
let sense = self.cvar.sense.load(Ordering::Acquire);
|
||||
diff --git a/src/header/pthread/barrier.rs b/src/header/pthread/barrier.rs
|
||||
index 1a5df3a..e69e2b9 100644
|
||||
--- a/src/header/pthread/barrier.rs
|
||||
+++ b/src/header/pthread/barrier.rs
|
||||
@@ -24,10 +24,10 @@ pub(crate) struct RlctBarrierAttr {
|
||||
// Not async-signal-safe.
|
||||
#[unsafe(no_mangle)]
|
||||
pub unsafe extern "C" fn pthread_barrier_destroy(barrier: *mut pthread_barrier_t) -> c_int {
|
||||
- // Behavior is undefined if any thread is currently waiting when this is called.
|
||||
-
|
||||
- // No-op, currently.
|
||||
- unsafe { core::ptr::drop_in_place(barrier.cast::<RlctBarrier>()) };
|
||||
+ let barrier = unsafe { &*barrier.cast::<RlctBarrier>() };
|
||||
+ barrier.destroy();
|
||||
|
||||
0
|
||||
}
|
||||
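The spin-then-sleep shape this SPIN_COUNT bump enables, as a sketch only: try_acquire() and futex_wait_on_lock_word() are placeholders for the CAS and crate::sync::futex_wait calls in the lock_inner hunk of the next patch.

    let mut spins_left = SPIN_COUNT;   // 100 after this patch, previously 0
    while !try_acquire() {
        if spins_left > 0 {
            spins_left -= 1;
            core::hint::spin_loop();   // short contention: stay in user space
        } else {
            futex_wait_on_lock_word(); // long contention: sleep in the kernel
        }
    }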
@@ -0,0 +1,380 @@
|
||||
diff --git a/src/sync/pthread_mutex.rs b/src/sync/pthread_mutex.rs
|
||||
index 29bad63..af0c429 100644
|
||||
--- a/src/sync/pthread_mutex.rs
|
||||
+++ b/src/sync/pthread_mutex.rs
|
||||
@@ -1,3 +1,4 @@
|
||||
+use alloc::boxed::Box;
|
||||
use core::{
|
||||
cell::Cell,
|
||||
sync::atomic::{AtomicU32 as AtomicUint, Ordering},
|
||||
@@ -6,10 +7,9 @@ use core::{
|
||||
use crate::{
|
||||
error::Errno,
|
||||
header::{bits_timespec::timespec, errno::*, pthread::*},
|
||||
+ platform::{Pal, Sys, types::c_int},
|
||||
};
|
||||
|
||||
-use crate::platform::{Pal, Sys, types::c_int};
|
||||
-
|
||||
use super::FutexWaitResult;
|
||||
|
||||
pub struct RlctMutex {
|
||||
@@ -21,15 +21,22 @@ pub struct RlctMutex {
|
||||
robust: bool,
|
||||
}
|
||||
|
||||
+pub struct RobustMutexNode {
|
||||
+ pub next: *mut RobustMutexNode,
|
||||
+ pub prev: *mut RobustMutexNode,
|
||||
+ pub mutex: *const RlctMutex,
|
||||
+}
|
||||
+
|
||||
const STATE_UNLOCKED: u32 = 0;
|
||||
const WAITING_BIT: u32 = 1 << 31;
|
||||
-const INDEX_MASK: u32 = !WAITING_BIT;
|
||||
+const FUTEX_OWNER_DIED: u32 = 1 << 30;
|
||||
+const INDEX_MASK: u32 = !(WAITING_BIT | FUTEX_OWNER_DIED);
|
||||
|
||||
// TODO: Lower limit is probably better.
|
||||
const RECURSIVE_COUNT_MAX_INCLUSIVE: u32 = u32::MAX;
|
||||
// TODO: How many spins should we do before it becomes more time-economical to enter kernel mode
|
||||
// via futexes?
|
||||
-const SPIN_COUNT: usize = 0;
|
||||
+const SPIN_COUNT: usize = 100;
|
||||
|
||||
impl RlctMutex {
|
||||
pub(crate) fn new(attr: &RlctMutexAttr) -> Result<Self, Errno> {
|
||||
@@ -69,13 +76,25 @@ impl RlctMutex {
|
||||
Ok(0)
|
||||
}
|
||||
pub fn make_consistent(&self) -> Result<(), Errno> {
|
||||
- todo_skip!(0, "pthread robust mutexes: not implemented");
|
||||
- Ok(())
|
||||
+ debug_assert!(self.robust, "make_consistent called on non-robust mutex");
|
||||
+
|
||||
+ if !self.robust {
|
||||
+ return Err(Errno(EINVAL));
|
||||
+ }
|
||||
+
|
||||
+ let current = self.inner.load(Ordering::Relaxed);
|
||||
+ let owner = current & INDEX_MASK;
|
||||
+
|
||||
+ if owner == os_tid_invalid_after_fork() && current & FUTEX_OWNER_DIED != 0 {
|
||||
+ self.inner.store(0, Ordering::Release);
|
||||
+ Ok(())
|
||||
+ } else {
|
||||
+ Err(Errno(EINVAL))
|
||||
+ }
|
||||
}
|
||||
fn lock_inner(&self, deadline: Option<×pec>) -> Result<(), Errno> {
|
||||
let this_thread = os_tid_invalid_after_fork();
|
||||
-
|
||||
- //let mut spins_left = SPIN_COUNT;
|
||||
+ let mut spins_left = SPIN_COUNT;
|
||||
|
||||
loop {
|
||||
let result = self.inner.compare_exchange_weak(
|
||||
@@ -86,45 +105,59 @@ impl RlctMutex {
|
||||
);
|
||||
|
||||
match result {
|
||||
- // CAS succeeded
|
||||
- Ok(_) => {
|
||||
- if self.ty == Ty::Recursive {
|
||||
- self.increment_recursive_count()?;
|
||||
- }
|
||||
- return Ok(());
|
||||
- }
|
||||
- // CAS failed, but the mutex was recursive and we already own the lock.
|
||||
+ Ok(_) => return self.finish_lock_acquire(false),
|
||||
Err(thread) if thread & INDEX_MASK == this_thread && self.ty == Ty::Recursive => {
|
||||
self.increment_recursive_count()?;
|
||||
return Ok(());
|
||||
}
|
||||
- // CAS failed, but the mutex was error-checking and we already own the lock.
|
||||
Err(thread) if thread & INDEX_MASK == this_thread && self.ty == Ty::Errck => {
|
||||
- return Err(Errno(EAGAIN));
|
||||
+ return Err(Errno(EDEADLK));
|
||||
}
|
||||
- // CAS spuriously failed, simply retry the CAS. TODO: Use core::hint::spin_loop()?
|
||||
- Err(thread) if thread & INDEX_MASK == 0 => {
|
||||
- continue;
|
||||
+ Err(thread) if thread & FUTEX_OWNER_DIED != 0 && thread & INDEX_MASK == 0 => {
|
||||
+ return Err(Errno(ENOTRECOVERABLE));
|
||||
}
|
||||
- // CAS failed because some other thread owned the lock. We must now wait.
|
||||
+ Err(thread) if thread & FUTEX_OWNER_DIED != 0 => {
|
||||
+ if !self.robust {
|
||||
+ return Err(Errno(ENOTRECOVERABLE));
|
||||
+ }
|
||||
+
|
||||
+ let new_value = (thread & WAITING_BIT) | FUTEX_OWNER_DIED | this_thread;
|
||||
+ match self.inner.compare_exchange(
|
||||
+ thread,
|
||||
+ new_value,
|
||||
+ Ordering::Acquire,
|
||||
+ Ordering::Relaxed,
|
||||
+ ) {
|
||||
+ Ok(_) => return self.finish_lock_acquire(true),
|
||||
+ Err(_) => continue,
|
||||
+ }
|
||||
+ }
|
||||
+ Err(thread) if thread & INDEX_MASK == 0 => continue,
|
||||
Err(thread) => {
|
||||
- /*if spins_left > 0 {
|
||||
- // TODO: Faster to spin trying to load the flag, compared to CAS?
|
||||
+ let owner = thread & INDEX_MASK;
|
||||
+
|
||||
+ if !crate::pthread::mutex_owner_id_is_live(owner) {
|
||||
+ if !self.robust {
|
||||
+ return Err(Errno(ENOTRECOVERABLE));
|
||||
+ }
|
||||
+
|
||||
+ let new_value = (thread & WAITING_BIT) | FUTEX_OWNER_DIED | this_thread;
|
||||
+ match self.inner.compare_exchange(
|
||||
+ thread,
|
||||
+ new_value,
|
||||
+ Ordering::Acquire,
|
||||
+ Ordering::Relaxed,
|
||||
+ ) {
|
||||
+ Ok(_) => return self.finish_lock_acquire(true),
|
||||
+ Err(_) => continue,
|
||||
+ }
|
||||
+ }
|
||||
+
|
||||
+ if spins_left > 0 {
|
||||
spins_left -= 1;
|
||||
core::hint::spin_loop();
|
||||
continue;
|
||||
}
|
||||
-
|
||||
- spins_left = SPIN_COUNT;
|
||||
-
|
||||
- let inner = self.inner.fetch_or(WAITING_BIT, Ordering::Relaxed);
|
||||
-
|
||||
- if inner == STATE_UNLOCKED {
|
||||
- continue;
|
||||
- }*/
|
||||
-
|
||||
- // If the mutex is not robust, simply futex_wait until unblocked.
|
||||
- //crate::sync::futex_wait(&self.inner, inner | WAITING_BIT, None);
|
||||
if crate::sync::futex_wait(&self.inner, thread, deadline)
|
||||
== FutexWaitResult::TimedOut
|
||||
{
|
||||
@@ -140,6 +173,20 @@ impl RlctMutex {
|
||||
pub fn lock_with_timeout(&self, deadline: ×pec) -> Result<(), Errno> {
|
||||
self.lock_inner(Some(deadline))
|
||||
}
|
||||
+ fn finish_lock_acquire(&self, owner_dead: bool) -> Result<(), Errno> {
|
||||
+ if self.ty == Ty::Recursive {
|
||||
+ self.increment_recursive_count()?;
|
||||
+ }
|
||||
+ if self.robust {
|
||||
+ add_to_robust_list(self);
|
||||
+ }
|
||||
+
|
||||
+ if owner_dead {
|
||||
+ Err(Errno(EOWNERDEAD))
|
||||
+ } else {
|
||||
+ Ok(())
|
||||
+ }
|
||||
+ }
|
||||
fn increment_recursive_count(&self) -> Result<(), Errno> {
|
||||
// We don't have to worry about asynchronous signals here, since pthread_mutex_trylock
|
||||
// is not async-signal-safe.
|
||||
@@ -161,41 +208,65 @@ impl RlctMutex {
|
||||
pub fn try_lock(&self) -> Result<(), Errno> {
|
||||
let this_thread = os_tid_invalid_after_fork();
|
||||
|
||||
- // TODO: If recursive, omitting CAS may be faster if it is already owned by this thread.
|
||||
- let result = self.inner.compare_exchange(
|
||||
- STATE_UNLOCKED,
|
||||
- this_thread,
|
||||
- Ordering::Acquire,
|
||||
- Ordering::Relaxed,
|
||||
- );
|
||||
+ loop {
|
||||
+ let current = self.inner.load(Ordering::Relaxed);
|
||||
+
|
||||
+ if current == STATE_UNLOCKED {
|
||||
+ match self.inner.compare_exchange(
|
||||
+ STATE_UNLOCKED,
|
||||
+ this_thread,
|
||||
+ Ordering::Acquire,
|
||||
+ Ordering::Relaxed,
|
||||
+ ) {
|
||||
+ Ok(_) => return self.finish_lock_acquire(false),
|
||||
+ Err(_) => continue,
|
||||
+ }
|
||||
+ }
|
||||
|
||||
- if self.ty == Ty::Recursive {
|
||||
- match result {
|
||||
- Err(index) if index & INDEX_MASK != this_thread => return Err(Errno(EBUSY)),
|
||||
- _ => (),
|
||||
+ let owner = current & INDEX_MASK;
|
||||
+
|
||||
+ if owner == this_thread && self.ty == Ty::Recursive {
|
||||
+ self.increment_recursive_count()?;
|
||||
+ return Ok(());
|
||||
}
|
||||
|
||||
- self.increment_recursive_count()?;
|
||||
+ if owner == this_thread && self.ty == Ty::Errck {
|
||||
+ return Err(Errno(EDEADLK));
|
||||
+ }
|
||||
|
||||
- return Ok(());
|
||||
- }
|
||||
+ if current & FUTEX_OWNER_DIED != 0 && owner == 0 {
|
||||
+ return Err(Errno(ENOTRECOVERABLE));
|
||||
+ }
|
||||
|
||||
- match result {
|
||||
- Ok(_) => Ok(()),
|
||||
- Err(index) if index & INDEX_MASK == this_thread && self.ty == Ty::Errck => {
|
||||
- Err(Errno(EDEADLK))
|
||||
+ if current & FUTEX_OWNER_DIED != 0 || (owner != 0 && !crate::pthread::mutex_owner_id_is_live(owner)) {
|
||||
+ if !self.robust {
|
||||
+ return Err(Errno(ENOTRECOVERABLE));
|
||||
+ }
|
||||
+
|
||||
+ let new_value = (current & WAITING_BIT) | FUTEX_OWNER_DIED | this_thread;
|
||||
+ match self.inner.compare_exchange(
|
||||
+ current,
|
||||
+ new_value,
|
||||
+ Ordering::Acquire,
|
||||
+ Ordering::Relaxed,
|
||||
+ ) {
|
||||
+ Ok(_) => return self.finish_lock_acquire(true),
|
||||
+ Err(_) => continue,
|
||||
+ }
|
||||
}
|
||||
- Err(_) => Err(Errno(EBUSY)),
|
||||
+
|
||||
+ return Err(Errno(EBUSY));
|
||||
}
|
||||
}
|
||||
// Safe because we are not protecting any data.
|
||||
pub fn unlock(&self) -> Result<(), Errno> {
|
||||
+ let current = self.inner.load(Ordering::Relaxed);
|
||||
+
|
||||
if self.robust || matches!(self.ty, Ty::Recursive | Ty::Errck) {
|
||||
- if self.inner.load(Ordering::Relaxed) & INDEX_MASK != os_tid_invalid_after_fork() {
|
||||
+ if current & INDEX_MASK != os_tid_invalid_after_fork() {
|
||||
return Err(Errno(EPERM));
|
||||
}
|
||||
|
||||
- // TODO: Is this fence correct?
|
||||
core::sync::atomic::fence(Ordering::Acquire);
|
||||
}
|
||||
|
||||
@@ -208,18 +279,47 @@ impl RlctMutex {
|
||||
}
|
||||
}
|
||||
|
||||
- self.inner.store(STATE_UNLOCKED, Ordering::Release);
|
||||
- crate::sync::futex_wake(&self.inner, i32::MAX);
|
||||
- /*let was_waiting = self.inner.swap(STATE_UNLOCKED, Ordering::Release) & WAITING_BIT != 0;
|
||||
+ if self.robust {
|
||||
+ remove_from_robust_list(self);
|
||||
+ }
|
||||
|
||||
- if was_waiting {
|
||||
- let _ = crate::sync::futex_wake(&self.inner, 1);
|
||||
- }*/
|
||||
+ let new_state = if self.robust && current & FUTEX_OWNER_DIED != 0 {
|
||||
+ FUTEX_OWNER_DIED
|
||||
+ } else {
|
||||
+ STATE_UNLOCKED
|
||||
+ };
|
||||
+
|
||||
+ self.inner.store(new_state, Ordering::Release);
|
||||
+ crate::sync::futex_wake(&self.inner, i32::MAX);
|
||||
|
||||
Ok(())
|
||||
}
|
||||
}
|
||||
|
||||
+pub(crate) unsafe fn mark_robust_mutexes_dead(thread: &crate::pthread::Pthread) {
|
||||
+ let head = thread.robust_list_head.get();
|
||||
+ let this_thread = os_tid_invalid_after_fork();
|
||||
+ let mut node = unsafe { *head };
|
||||
+
|
||||
+ unsafe { *head = core::ptr::null_mut() };
|
||||
+
|
||||
+ while !node.is_null() {
|
||||
+ let next = unsafe { (*node).next };
|
||||
+ let mutex = unsafe { &*(*node).mutex };
|
||||
+ let current = mutex.inner.load(Ordering::Relaxed);
|
||||
+
|
||||
+ if current & INDEX_MASK == this_thread {
|
||||
+ mutex
|
||||
+ .inner
|
||||
+ .store((current & WAITING_BIT) | FUTEX_OWNER_DIED | this_thread, Ordering::Release);
|
||||
+ crate::sync::futex_wake(&mutex.inner, i32::MAX);
|
||||
+ }
|
||||
+
|
||||
+ unsafe { drop(Box::from_raw(node)) };
|
||||
+ node = next;
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
#[repr(u8)]
|
||||
#[derive(PartialEq)]
|
||||
enum Ty {
|
||||
@@ -237,6 +337,54 @@ enum Ty {
|
||||
#[thread_local]
|
||||
static CACHED_OS_TID_INVALID_AFTER_FORK: Cell<u32> = Cell::new(0);
|
||||
|
||||
+fn add_to_robust_list(mutex: &RlctMutex) {
|
||||
+ let thread = crate::pthread::current_thread().expect("current thread not present");
|
||||
+ let node_ptr = Box::into_raw(Box::new(RobustMutexNode {
|
||||
+ next: core::ptr::null_mut(),
|
||||
+ prev: core::ptr::null_mut(),
|
||||
+ mutex: core::ptr::from_ref(mutex),
|
||||
+ }));
|
||||
+
|
||||
+ unsafe {
|
||||
+ let head = thread.robust_list_head.get();
|
||||
+ if !(*head).is_null() {
|
||||
+ (**head).prev = node_ptr;
|
||||
+ }
|
||||
+ (*node_ptr).next = *head;
|
||||
+ *head = node_ptr;
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
+fn remove_from_robust_list(mutex: &RlctMutex) {
|
||||
+ let thread = match crate::pthread::current_thread() {
|
||||
+ Some(thread) => thread,
|
||||
+ None => return,
|
||||
+ };
|
||||
+
|
||||
+ unsafe {
|
||||
+ let mut node = *thread.robust_list_head.get();
|
||||
+
|
||||
+ while !node.is_null() {
|
||||
+ if core::ptr::eq((*node).mutex, core::ptr::from_ref(mutex)) {
|
||||
+ if !(*node).prev.is_null() {
|
||||
+ (*(*node).prev).next = (*node).next;
|
||||
+ } else {
|
||||
+ *thread.robust_list_head.get() = (*node).next;
|
||||
+ }
|
||||
+
|
||||
+ if !(*node).next.is_null() {
|
||||
+ (*(*node).next).prev = (*node).prev;
|
||||
+ }
|
||||
+
|
||||
+ drop(Box::from_raw(node));
|
||||
+ return;
|
||||
+ }
|
||||
+
|
||||
+ node = (*node).next;
|
||||
+ }
|
||||
+ }
|
||||
+}
|
||||
+
|
||||
// Assumes TIDs are unique between processes, which I only know is true for Redox.
|
||||
fn os_tid_invalid_after_fork() -> u32 {
|
||||
// TODO: Coordinate better if using shared == PTHREAD_PROCESS_SHARED, with up to 2^32 separate
|
||||
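A hedged sketch of how an application-level caller drives the robust-mutex recovery path these hunks implement; it is not part of the patch and only uses items shown above (try_lock, make_consistent, Errno, EOWNERDEAD):

    fn try_lock_or_recover(mutex: &RlctMutex) -> Result<(), Errno> {
        match mutex.try_lock() {
            Ok(()) => Ok(()),
            // The previous owner died while holding the lock; we now hold it, but
            // the data it protected may be torn. Repair it here, then clear the
            // owner-died marking (this is what pthread_mutex_consistent calls).
            Err(Errno(code)) if code == EOWNERDEAD => mutex.make_consistent(),
            Err(other) => Err(other),
        }
    }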
@@ -0,0 +1,5 @@
|
||||
[source]
|
||||
path = "source"
|
||||
|
||||
[build]
|
||||
template = "cargo"
|
||||
@@ -0,0 +1,9 @@
|
||||
[package]
|
||||
name = "numad"
|
||||
version = "0.1.0"
|
||||
edition = "2021"
|
||||
description = "Red Bear OS NUMA topology daemon — parses ACPI SRAT/SLIT and feeds kernel NUMA hints"
|
||||
|
||||
[[bin]]
|
||||
name = "numad"
|
||||
path = "src/main.rs"
|
||||
@@ -0,0 +1,236 @@
|
||||
/// numad — Red Bear OS NUMA topology daemon
|
||||
///
|
||||
/// Reads ACPI SRAT/SLIT from physical memory via /scheme/memory/physical
|
||||
/// and feeds NUMA topology hints to the kernel for scheduler placement.
|
||||
use std::fs;
|
||||
use std::io::{Read, Write};
|
||||
use std::mem;
|
||||
|
||||
const RSDP_SIGNATURE: &[u8; 8] = b"RSD PTR ";
|
||||
const SRAT_SIGNATURE: &[u8; 4] = b"SRAT";
|
||||
const SLIT_SIGNATURE: &[u8; 4] = b"SLIT";
|
||||
const MAX_NUMA_NODES: usize = 8;
|
||||
|
||||
#[repr(C, packed)]
|
||||
#[derive(Copy, Clone)]
|
||||
struct Rsdp {
|
||||
signature: [u8; 8],
|
||||
checksum: u8,
|
||||
oem_id: [u8; 6],
|
||||
revision: u8,
|
||||
rsdt_addr: u32,
|
||||
}
|
||||
|
||||
#[repr(C, packed)]
|
||||
#[derive(Copy, Clone)]
|
||||
struct SdtHeader {
|
||||
signature: [u8; 4],
|
||||
length: u32,
|
||||
revision: u8,
|
||||
checksum: u8,
|
||||
oem_id: [u8; 6],
|
||||
oem_table_id: [u8; 8],
|
||||
oem_revision: u32,
|
||||
creator_id: u32,
|
||||
creator_revision: u32,
|
||||
}
|
||||
|
||||
#[repr(C, packed)]
|
||||
#[derive(Copy, Clone)]
|
||||
struct SratEntry {
|
||||
entry_type: u8,
|
||||
length: u8,
|
||||
}
|
||||
|
||||
#[repr(C, packed)]
|
||||
#[derive(Copy, Clone)]
|
||||
struct SratProcessorApic {
|
||||
entry: SratEntry,
|
||||
proximity_domain_lo: u8,
|
||||
apic_id: u8,
|
||||
flags: u32,
|
||||
local_sapic_eid: u8,
|
||||
proximity_domain_hi: [u8; 3],
|
||||
clock_domain: u32,
|
||||
}
|
||||
|
||||
#[repr(C, packed)]
|
||||
#[derive(Copy, Clone)]
|
||||
struct SratMemory {
|
||||
entry: SratEntry,
|
||||
proximity_domain: u32,
|
||||
reserved: u16,
|
||||
base_address: u64,
|
||||
length: u64,
|
||||
reserved2: [u8; 8],
|
||||
flags: u32,
|
||||
reserved3: [u8; 8],
|
||||
}
|
||||
|
||||
struct NumaNode {
|
||||
id: u8,
|
||||
apic_ids: Vec<u8>,
|
||||
}
|
||||
|
||||
fn main() {
|
||||
eprintln!("numad: starting NUMA topology discovery");
|
||||
|
||||
// Read RSDP from known physical locations (EBDA or BIOS area)
|
||||
let rsdp = match find_rsdp() {
|
||||
Some(r) => r,
|
||||
None => {
|
||||
eprintln!("numad: no RSDP found, assuming UMA (single-node)");
|
||||
return;
|
||||
}
|
||||
};
|
||||
|
||||
// Read RSDT to find SRAT and SLIT
|
||||
let sdt_addr = rsdp.rsdt_addr as usize;
|
||||
let sdt_header = read_phys::<SdtHeader>(sdt_addr);
|
||||
if &sdt_header.signature != b"RSDT" {
|
||||
eprintln!("numad: no RSDT found");
|
||||
return;
|
||||
}
|
||||
|
||||
let num_entries = (sdt_header.length as usize - mem::size_of::<SdtHeader>()) / 4;
|
||||
let entries_base = sdt_addr + mem::size_of::<SdtHeader>();
|
||||
|
||||
let mut srat_data: Option<Vec<u8>> = None;
|
||||
let mut slit_data: Option<Vec<u8>> = None;
|
||||
|
||||
for i in 0..num_entries {
|
||||
let entry_addr = entries_base + i * 4;
|
||||
let table_ptr: u32 = read_phys(entry_addr);
|
||||
let table_addr = table_ptr as usize;
|
||||
if table_addr == 0 {
|
||||
continue;
|
||||
}
|
||||
let header = read_phys::<SdtHeader>(table_addr);
|
||||
match &header.signature {
|
||||
SRAT_SIGNATURE => {
|
||||
srat_data = Some(read_phys_bytes(table_addr, header.length as usize));
|
||||
}
|
||||
SLIT_SIGNATURE => {
|
||||
slit_data = Some(read_phys_bytes(table_addr, header.length as usize));
|
||||
}
|
||||
_ => {}
|
||||
}
|
||||
}
|
||||
|
||||
let Some(srat) = srat_data else {
|
||||
eprintln!("numad: no SRAT found, assuming UMA");
|
||||
return;
|
||||
};
|
||||
|
||||
let mut nodes: Vec<NumaNode> = Vec::new();
|
||||
let sdt_offset = mem::size_of::<SdtHeader>();
|
||||
let mut offset = sdt_offset;
|
||||
|
||||
while offset + mem::size_of::<SratEntry>() <= srat.len() {
|
||||
let entry: &SratEntry = unsafe { &*(srat.as_ptr().add(offset) as *const SratEntry) };
|
||||
if entry.length < mem::size_of::<SratEntry>() as u8 || offset + entry.length as usize > srat.len() {
|
||||
break;
|
||||
}
|
||||
|
||||
match entry.entry_type {
|
||||
0 => {
|
||||
// Processor Local APIC
|
||||
if entry.length as usize >= mem::size_of::<SratProcessorApic>() {
|
||||
let proc: &SratProcessorApic = unsafe {
|
||||
&*(srat.as_ptr().add(offset) as *const SratProcessorApic)
|
||||
};
|
||||
if proc.flags & 1 != 0 {
|
||||
let proximity = proc.proximity_domain_lo;
|
||||
while nodes.len() <= proximity as usize {
|
||||
nodes.push(NumaNode { id: nodes.len() as u8, apic_ids: Vec::new() });
|
||||
}
|
||||
nodes[proximity as usize].apic_ids.push(proc.apic_id);
|
||||
}
|
||||
}
|
||||
}
|
||||
_ => {}
|
||||
}
|
||||
offset += entry.length as usize;
|
||||
}
|
||||
|
||||
if nodes.is_empty() {
|
||||
eprintln!("numad: no CPU entries in SRAT, assuming UMA");
|
||||
return;
|
||||
}
|
||||
|
||||
eprintln!("numad: found {} NUMA nodes", nodes.len());
|
||||
for node in &nodes {
|
||||
eprintln!(" node {}: {} CPUs", node.id, node.apic_ids.len());
|
||||
}
|
||||
|
||||
// Write topology hints to kernel via proc: scheme
|
||||
// Format: "node_id,apic_id\n" per CPU
|
||||
if let Ok(mut fd) = fs::OpenOptions::new().write(true).open("/scheme/proc/numa") {
|
||||
for node in &nodes {
|
||||
let mut line = format!("{},", node.id);
|
||||
for apic_id in &node.apic_ids {
|
||||
line.push_str(&format!("{},", apic_id));
|
||||
}
|
||||
line.push('\n');
|
||||
let _ = fd.write_all(line.as_bytes());
|
||||
}
|
||||
eprintln!("numad: topology hints written to kernel");
|
||||
} else {
|
||||
eprintln!("numad: kernel NUMA interface not available (scheme:proc/numa)");
|
||||
}
|
||||
|
||||
eprintln!("numad: NUMA topology discovery complete");
|
||||
}

fn find_rsdp() -> Option<Rsdp> {
    // Search EBDA and BIOS areas for RSDP signature
    let search_areas: &[(usize, usize)] = &[
        (0x000E_0000, 0x000F_FFFF), // BIOS ROM area
        (0x0008_0000, 0x0009_FFFF), // EBDA/upper conventional
    ];

    for &(start, end) in search_areas {
        for addr in (start..end).step_by(16) {
            if addr + mem::size_of::<Rsdp>() > end {
                break;
            }
            let sig = read_phys_bytes(addr, 8);
            if &sig == RSDP_SIGNATURE {
                let rsdp: Rsdp = read_phys(addr);
                if validate_checksum(&rsdp) {
                    return Some(rsdp);
                }
            }
        }
    }
    None
}

fn validate_checksum(rsdp: &Rsdp) -> bool {
    let bytes = unsafe {
        std::slice::from_raw_parts(rsdp as *const _ as *const u8, mem::size_of::<Rsdp>())
    };
    bytes.iter().fold(0u8, |acc, &b| acc.wrapping_add(b)) == 0
}

fn read_phys<T: Copy>(addr: usize) -> T {
    let path = format!("/scheme/memory/physical@{}", addr);
    if let Ok(mut fd) = fs::File::open(&path) {
        let mut buf = vec![0u8; mem::size_of::<T>()];
        if fd.read_exact(&mut buf).is_ok() {
            return unsafe { std::ptr::read(buf.as_ptr() as *const T) };
        }
    }
    unsafe { std::mem::zeroed() }
}

fn read_phys_bytes(addr: usize, len: usize) -> Vec<u8> {
    let path = format!("/scheme/memory/physical@{}", addr);
    if let Ok(mut fd) = fs::File::open(&path) {
        let mut buf = vec![0u8; len];
        if fd.read_exact(&mut buf).is_ok() {
            return buf;
        }
    }
    vec![0u8; len]
}
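
// Sketch only: how these helpers chain together to reach the table-pointer
// array that the SRAT/SLIT scan above iterates. The `rsdt_address` field name
// on `Rsdp` is an assumption here, not verified against this tree.
fn rsdt_entry_window() -> Option<(usize, usize)> {
    let rsdp = find_rsdp()?;
    let rsdt_addr = rsdp.rsdt_address as usize;

    // The RSDT body is an array of 32-bit physical table pointers that starts
    // immediately after the standard SDT header.
    let header = read_phys::<SdtHeader>(rsdt_addr);
    let entries_base = rsdt_addr + mem::size_of::<SdtHeader>();
    let num_entries = (header.length as usize).saturating_sub(mem::size_of::<SdtHeader>()) / 4;
    Some((entries_base, num_entries))
}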
@@ -0,0 +1,8 @@
[source]
path = "source"

[build]
template = "cargo"

[package.files]
"/usr/bin/redbear-acmd" = "redbear-acmd"
@@ -6,6 +6,8 @@ patches = [
    "P0-workspace-add-bootstrap.patch",
    "P0-bootstrap-workspace-fix.patch",
    "P2-i2c-gpio-ucsi-drivers.patch",
    "P3-pcid-bind-scheme.patch",
    "P3-acpi-wave12-hardening.patch",
]

[build]

@@ -1,6 +1,6 @@
[source]
git = "https://gitlab.redox-os.org/redox-os/kernel.git"
patches = ["redox.patch", "P0-canary.patch", "P1-memory-map-overflow.patch", "../../../local/patches/kernel/P4-supplementary-groups.patch"]
patches = ["redox.patch", "P0-canary.patch", "P1-memory-map-overflow.patch", "../../../local/patches/kernel/P4-supplementary-groups.patch", "../../../local/patches/kernel/P4-s3-suspend-resume.patch", "../../../local/patches/kernel/P5-sched-policy-context.patch", "../../../local/patches/kernel/P5-sched-rt-policy.patch", "../../../local/patches/kernel/P5-proc-setschedpolicy.patch", "../../../local/patches/kernel/P5-scheme-sched-id.patch", "../../../local/patches/kernel/P5-context-mod-sched.patch", "../../../local/patches/kernel/P6-vruntime-context.patch", "../../../local/patches/kernel/P6-percpu-runqueues.patch", "../../../local/patches/kernel/P6-futex-sharding.patch", "../../../local/patches/kernel/P6-vruntime-switch.patch", "../../../local/patches/kernel/P7-cache-affine-context.patch", "../../../local/patches/kernel/P7-cache-affine-switch.patch", "../../../local/patches/kernel/P7-proc-setname.patch", "../../../local/patches/kernel/P7-proc-setpriority.patch", "../../../local/patches/kernel/P8-futex-requeue.patch", "../../../local/patches/kernel/P8-futex-pi.patch", "../../../local/patches/kernel/P8-futex-robust.patch", "../../../local/patches/kernel/P8-percpu-wiring.patch", "../../../local/patches/kernel/P8-percpu-sched.patch", "../../../local/patches/kernel/P9-proc-lock-ordering.patch", "../../../local/patches/kernel/P9-futex-pi-cas-fix.patch"]

[build]
template = "custom"

@@ -22,6 +22,7 @@ patches = [
    "../../../local/patches/relibc/P3-select-not-epoll-timeout.patch",
    "../../../local/patches/relibc/P3-tls-get-addr-panic-fix.patch",
    "../../../local/patches/relibc/P3-pthread-yield.patch",
    "../../../local/patches/relibc/P3-barrier-smp-futex.patch",
    "../../../local/patches/relibc/P3-secure-getenv.patch",
    "../../../local/patches/relibc/P3-getentropy.patch",
    "../../../local/patches/relibc/P3-dup3.patch",
@@ -38,10 +39,19 @@ patches = [
    "../../../local/patches/relibc/P3-header-mod-spawn-threads.patch",
    "../../../local/patches/relibc/P3-spawn.patch",
    "../../../local/patches/relibc/P3-threads.patch",
    "../../../local/patches/relibc/P3-pthread-signal-races.patch",
    "../../../local/patches/relibc/P3-sysv-ipc.patch",
    "../../../local/patches/relibc/P3-sysv-sem-impl.patch",
    "../../../local/patches/relibc/P3-sysv-shm-impl.patch",
    "../../../local/patches/relibc/P4-setgroups-getgroups.patch",
    "../../../local/patches/relibc/P5-robust-mutexes.patch",
    "../../../local/patches/relibc/P5-sched-api.patch",
    "../../../local/patches/relibc/P5-pthread-sigmask-race.patch",
    "../../../local/patches/relibc/P4-setgroups-unsafe-fix.patch",
    "../../../local/patches/relibc/P7-setpriority.patch",
    "../../../local/patches/relibc/P7-pthread-affinity.patch",
    "../../../local/patches/relibc/P7-pthread-setname.patch",
    "../../../local/patches/relibc/P9-spin-and-barrier.patch",
]

[build]

Symlink
+1
@@ -0,0 +1 @@
../../local/recipes/drivers/ehcid
Symlink
+1
@@ -0,0 +1 @@
../../local/recipes/drivers/ohcid
Symlink
+1
@@ -0,0 +1 @@
../../local/recipes/drivers/redox-driver-core
Symlink
+1
@@ -0,0 +1 @@
../../local/recipes/drivers/redox-driver-pci
Symlink
+1
@@ -0,0 +1 @@
../../local/recipes/drivers/uhcid
Symlink
+1
@@ -0,0 +1 @@
../../local/recipes/drivers/usb-core
Symlink
+1
@@ -0,0 +1 @@
../../local/recipes/kde/kf6-pty
@@ -22,6 +22,137 @@ script = """
DYNAMIC_STATIC_INIT
COOKBOOK_CONFIGURE_FLAGS+=(
    ac_cv_have_decl_program_invocation_name=no
    ac_cv_objext=o
    ac_cv_prog_cc_c_o=yes
    ac_cv_exeext=
    acl_cv_rpath=done
)

# Restore the pristine configure scripts on every build, then layer our Redox
# cross-build fixes on top. Host autoconf 2.72 regenerates an invalid top-level
# configure for this recipe in our environment, so we patch the shipped script
# instead of regenerating it.
python3 - <<'PYEOF'
import os
import tarfile
from pathlib import Path

source_dir = Path(os.environ["COOKBOOK_SOURCE"])
source_tar = Path(os.environ["COOKBOOK_RECIPE"]) / "source.tar"
with tarfile.open(source_tar) as tf:
    for relative in ("configure", "libcharset/configure"):
        member = next(m for m in tf.getmembers() if m.name.endswith("/" + relative))
        target = source_dir / relative
        target.write_text(tf.extractfile(member).read().decode("utf-8", errors="replace"))
PYEOF

# Upgrade bundled libtool glue in both the top-level tree and nested
# libcharset tree to the current host libtool (2.6.0) so generated libtool
# helpers match the host ltmain.sh version.
for subdir in "${COOKBOOK_SOURCE}" "${COOKBOOK_SOURCE}/libcharset"; do
    if [ -d "${subdir}" ]; then
        mkdir -p "${subdir}/m4" "${subdir}/build-aux"
        cp -f /usr/share/aclocal/libtool.m4 "${subdir}/m4/"
        cp -f /usr/share/aclocal/ltoptions.m4 "${subdir}/m4/"
        cp -f /usr/share/aclocal/ltsugar.m4 "${subdir}/m4/"
        cp -f /usr/share/aclocal/ltversion.m4 "${subdir}/m4/"
        cp -f /usr/share/aclocal/lt~obsolete.m4 "${subdir}/m4/"
        cp -f /usr/share/libtool/build-aux/ltmain.sh "${subdir}/build-aux/"
    fi
done

if [ -d "${COOKBOOK_SOURCE}/libcharset" ]; then
    (
        cd "${COOKBOOK_SOURCE}/libcharset"
        cp -f ../srcm4/relocatable.m4 m4/
        cp -f ../srcm4/codeset.m4 m4/
        cp -f ../srcm4/fcntl-o.m4 m4/
        cp -f ../srcm4/visibility.m4 m4/
    )
fi

# libcharset templates currently keep @HAVE_VISIBILITY@ unsubstituted on our
# Redox cross build. Patch the source templates before configure so every
# generated header gets a stable fallback value.
for template in \
    "${COOKBOOK_SOURCE}/libcharset/include/libcharset.h.build.in" \
    "${COOKBOOK_SOURCE}/libcharset/include/localcharset.h.build.in" \
    "${COOKBOOK_SOURCE}/include/iconv.h.build.in"
do
    if [ -f "${template}" ]; then
        sed -i 's/@HAVE_VISIBILITY@/0/g' "${template}"
    fi
done

export CPP="${GNU_TARGET}-gcc -E"

# Force cross mode in the shipped top-level configure and keep the rest of the
# generated shell structure intact.
sed -i '0,/cross_compiling=maybe/s//cross_compiling=yes/' "${COOKBOOK_SOURCE}/configure"
python3 - <<'PYEOF'
from pathlib import Path
import os
for relative in ('configure', 'libcharset/configure'):
    path = Path(os.environ['COOKBOOK_SOURCE']) / relative
    lines = path.read_text().splitlines()
    for i, line in enumerate(lines):
        if "macro_version='2.4.7'" in line or "macro_version='2.5.4-redox-9510'" in line:
            lines[i] = "macro_version='2.6.0'"
        if "macro_revision='2.4.7'" in line or "macro_revision='2.5.4-redox-9510'" in line:
            lines[i] = "macro_revision='2.6.0'"
        if "grep -v '^ *+' conftest.err >conftest.er1" in line:
            lines[i] = "test -f conftest.err && grep -v '^ *+' conftest.err > conftest.er1.tmp && mv -f conftest.er1.tmp conftest.er1 || :"
        if 'cat conftest.er1 >&5' in line:
            lines[i] = 'test -f conftest.er1 && cat conftest.er1 >&5 || :'
        if 'mv -f conftest.er1 conftest.err' in line:
            lines[i] = 'test -f conftest.er1 && mv -f conftest.er1 conftest.err || :'
        if line.strip() == 'rm -f conftest conftest$ac_cv_exeext':
            lines[i] = 'rm -rf conftest conftest$ac_cv_exeext'
    path.write_text("\\n".join(lines) + "\\n")
PYEOF

cookbook_configure
"""

# libcharset's configure currently leaves @HAVE_VISIBILITY@ unsubstituted in
# generated headers on our Redox cross build. Normalize the generated headers
# so the compile path matches the already-published libiconv artifact.
for header in \
    include/libcharset.h \
    include/localcharset.h \
    libcharset/include/libcharset.h \
    libcharset/include/localcharset.h
do
    if [ -f "${header}" ]; then
        sed -i 's/@HAVE_VISIBILITY@/0/g' "${header}"
    fi
done

# Force the nested libcharset configure step now, then patch the generated
# headers in the build tree before the top-level make descends into libcharset.
if [ -d "libcharset" ]; then
    (
        cd libcharset
        "${COOKBOOK_SOURCE}/libcharset/configure" \
            --disable-option-checking \
            --prefix=/usr \
            --host="${GNU_TARGET}" \
            --enable-shared \
            --enable-static \
            ac_cv_have_decl_program_invocation_name=no \
            CC="${GNU_TARGET}-gcc" \
            LDFLAGS="${LDFLAGS}" \
            CPPFLAGS="${CPPFLAGS}" \
            --cache-file=/dev/null \
            --srcdir="${COOKBOOK_SOURCE}/libcharset"
    )
    for header in \
        libcharset/include/libcharset.h \
        libcharset/include/localcharset.h
    do
        if [ -f "${header}" ]; then
            sed -i 's/@HAVE_VISIBILITY@/0/g' "${header}"
        fi
    done
fi

"""

Symlink
+1
@@ -0,0 +1 @@
../../local/recipes/system/cpufreqd
Symlink
+1
@@ -0,0 +1 @@
../../local/recipes/system/driver-manager
Symlink
+1
@@ -0,0 +1 @@
../../local/recipes/system/hwrngd
Symlink
+1
@@ -0,0 +1 @@
../../local/recipes/system/numad
Symlink
+1
@@ -0,0 +1 @@
../../local/recipes/system/thermald