Commit Graph

35 Commits

Author SHA1 Message Date
vasilito 573b3e6eae fix: handle early-boot exceptions in excp_handler gracefully
excp_handler() called context::current() unconditionally, which panics
with 'not inside of context' when no context exists yet (before
context::init() runs in kmain/kmain_ap). On bare metal, a page fault
during the BSP's start() (e.g. ACPI table access or device MMIO) caused
page_fault_handler() to return Err and fall through to excp_handler(),
which then panicked at context::current() instead of reporting the
actual fault.

Replace context::current() with context::try_current(). When None,
log the exception details (kind, code, faulting address) and panic
with a descriptive message. This turns an uninformative cascading
panic into a diagnostic one that reveals the real faulting address.
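A minimal sketch of the fallback, assuming a simplified `Option<&Context>` in place of the result of try_current() and a hypothetical `ExceptionInfo` struct; the real handler's types and logging macros differ. The `None` arm is where the kernel would log this string and then panic with it:

```rust
/// Hypothetical snapshot of the exception state the handler has on entry.
#[derive(Debug, Clone, Copy)]
pub struct ExceptionInfo {
    pub kind: u8,        // exception vector, e.g. 14 for a page fault
    pub code: u64,       // error code pushed by the CPU
    pub fault_addr: u64, // CR2 for page faults
}

/// Stand-in for the kernel's context type (hypothetical).
pub struct Context {
    pub name: &'static str,
}

/// Build the diagnostic line; `current` plays the role of try_current().
pub fn describe_exception(current: Option<&Context>, info: ExceptionInfo) -> String {
    match current {
        Some(ctx) => format!(
            "exception {} (code {:#x}) at {:#x} in context '{}'",
            info.kind, info.code, info.fault_addr, ctx.name
        ),
        // Early boot: no context exists yet, so describe the raw fault
        // instead of panicking inside current() with an unrelated message.
        None => format!(
            "early-boot exception {} (code {:#x}) at {:#x}: no context yet",
            info.kind, info.code, info.fault_addr
        ),
    }
}
```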
2026-07-02 22:24:23 +03:00
vasilito c6a5b7a1ad Tier 2: per-CPU sched stats, NUMA-aware scheduling, init numa
- CpuStats: add context_switches and steals AtomicU64 counters,
  remove redundant per_cpu field from CpuStatsData
- context/switch.rs: increment per-CPU switches at context switch,
  increment steals at work-steal; add NUMA vruntime bonus (1/8 for
  exact-CPU match, 1/16 for same-node)
- context/mod.rs: least_loaded_cpu() now NUMA-aware, prefers same-node
  CPUs (accepts <=1 extra queued context vs cross-node best)
- scheme/sys/sched.rs: new kernel handler exposing per-CPU scheduler
  stats (switches, steals, queue_depth) via /scheme/sys/sched
- startup/mod.rs: call numa::init_default() during boot (was dead code)
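The two NUMA rules above can be sketched as plain functions. This assumes the vruntime bonus is a fraction of the context's own vruntime (the commit does not say what the 1/8 and 1/16 are fractions of), and `CpuLoad` is a hypothetical snapshot type, not the kernel's:

```rust
/// Hypothetical per-CPU load snapshot used for placement.
#[derive(Debug, Clone, Copy)]
pub struct CpuLoad {
    pub cpu: usize,
    pub node: usize,
    pub queued: usize,
}

/// Assumption: the bonus is subtracted from the context's vruntime,
/// 1/8 for the exact CPU it last ran on, 1/16 for another CPU on the same node.
pub fn numa_adjusted_vruntime(vruntime: u64, last_cpu: usize, last_node: usize, cpu: usize, node: usize) -> u64 {
    if last_cpu == cpu {
        vruntime - vruntime / 8
    } else if last_node == node {
        vruntime - vruntime / 16
    } else {
        vruntime
    }
}

/// Prefer the least-loaded CPU on `home_node` unless it has more than one
/// extra queued context compared with the best cross-node CPU.
pub fn least_loaded_cpu(cpus: &[CpuLoad], home_node: usize) -> Option<usize> {
    let local = cpus.iter().filter(|c| c.node == home_node).min_by_key(|c| c.queued);
    let remote = cpus.iter().filter(|c| c.node != home_node).min_by_key(|c| c.queued);
    match (local, remote) {
        (Some(l), Some(r)) => Some(if l.queued <= r.queued + 1 { l.cpu } else { r.cpu }),
        (Some(l), None) => Some(l.cpu),
        (None, Some(r)) => Some(r.cpu),
        (None, None) => None,
    }
}
```

The "+1" tolerance keeps a thread on its home node when the remote advantage is a single queued context, trading a little balance for cache and memory locality.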
2026-07-02 21:40:20 +03:00
vasilito e812356cf0 fix: per-CPU idle context race condition + nightly-2026-04-11 pin
- Add try_idle_context() to ContextSwitchPercpu (switch.rs)
  Cross-CPU paths (steal_work, migrate_one_context) use try_idle_context()
  instead of idle_context() to avoid a panic when APs haven't called
  context::init() yet. Returns Option<context::Arc> instead of panicking.
- Pin rust-toolchain.toml to nightly-2026-04-11
- Remove build artifacts (kernel, kernel.all, kernel.sym) from git tracking
- This fixes the boot panic that occurred during multi-CPU scheduling
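A minimal sketch of the panicking vs. Option-returning accessor pair, using `std::sync::OnceLock` as a stand-in for however the kernel stores the per-CPU idle context; `IdleContext` and `init` are hypothetical names:

```rust
use std::sync::{Arc, OnceLock};

#[derive(Debug)]
pub struct IdleContext {
    pub cpu: usize,
}

/// Hypothetical per-CPU switch state; `idle` is filled once by init.
pub struct ContextSwitchPercpu {
    idle: OnceLock<Arc<IdleContext>>,
}

impl ContextSwitchPercpu {
    pub const fn new() -> Self {
        ContextSwitchPercpu { idle: OnceLock::new() }
    }

    pub fn init(&self, cpu: usize) {
        let _ = self.idle.set(Arc::new(IdleContext { cpu }));
    }

    /// Local-CPU path: an uninitialized CPU here is a real bug, so panic.
    pub fn idle_context(&self) -> Arc<IdleContext> {
        self.try_idle_context().expect("idle context not initialized")
    }

    /// Cross-CPU path: the target AP may not have run init yet.
    pub fn try_idle_context(&self) -> Option<Arc<IdleContext>> {
        self.idle.get().cloned()
    }
}
```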
2026-07-02 16:53:19 +03:00
vasilito 5098d1651f kernel: revert -Z json-target-spec to original nightly-2025-10-03 build
Reverts the prior session's -Z json-target-spec addition
that was breaking the build on nightly-2025-10-03 (the
kernel's rust-toolchain.toml specified toolchain). The
flag did not exist in that nightly; only nightly-2026-04-11
has it. Since the prior toolchain can build custom .json
target specs without any cargo-level gating (just pass
-Zunstable-options through the -- separator to rustc),
the cleanest fix is to use rustc's -- directly:

  cd SOURCE && cargo rustc -Z build-std=core,alloc ... \
    --bin kernel --target FILE --release \
    -- -C link-arg=...

  RUSTUP_TOOLCHAIN=nightly-2025-10-03 is explicit so the
  Makefile build works regardless of which toolchain the
  outer shell has.

  Also: restore rust-toolchain.toml to nightly-2025-10-03
  (the version pinned in this fork). The 2026-04-01 bump
  was a workaround attempt that did not work.

  And: add .cargo/config.toml with [unstable]
  json-target-spec = true as the new standard way (cargo
  PR #16557) of enabling custom .json target specs. This
  is harmless on older toolchains that don't have the feature
  (cargo ignores unknown config keys).

Discovered via research into the nightly-2026-04-11 vs
nightly-2025-10-03 divergence after the redbear-mini build
failed with 'unknown -Z flag specified: json-target-spec'.
2026-07-02 14:49:17 +03:00
vasilito 1c870c06ec kernel: add -Zunstable-options to cargo rustc for custom target
cargo 1.98.0-dev (4d1f98451 2026-05-15) requires
-Zunstable-options to be passed to cargo itself (not just
rustc) to accept a custom target spec. Without it, the
kernel Makefile fails with:

  error: error loading target specification: custom targets
  are unstable and require `-Zunstable-options`

The Makefile already had -Z build-std, -Zbuild-std-features,
and -Z json-target-spec (which are passed to rustc), but the
top-level cargo invocation needed -Zunstable-options
to accept the target.

This is required by both nightly-2025-10-03 (the kernel fork
rust-toolchain) and nightly-2026-04-01 (the host default). On
the cookbook (redoxer-1.0 toolchain), the error is the same
because -Zunstable-options is a separate cargo-level flag
from the rustc-level -Z flags.

Discovered when attempting to build redbear-mini after the
0.2.5 fork was created from 0.2.4. The Makefile worked on
0.2.4 because the prior kernel cook used a cached build; the
0.2.5 build started fresh and hit the error.
2026-07-02 14:22:34 +03:00
vasilito baadbfc539 kernel: refresh Cargo.lock (mtime + relibc-rebuild attempt)
This is a bookkeeping commit to capture Cargo.lock and
the touched lib.rs from the cookbook's auto-stash step in
the 0.2.5 kernel build attempt that hit the
json-target-spec / rust toolchain mismatch. The actual
code changes are on the kernel branch and the relevant
submodule gitlink will be bumped in a future commit.

The kernel cross-build is using the same nightly-2025-10-03
toolchain (per the kernel fork's rust-toolchain.toml).
The cookbook uses nightly-2026-04-01, which has the
json-target-spec flag the Makefile needs. A unified
toolchain setup is a future-work item.
2026-07-02 13:43:14 +03:00
vasilito d41d0aa728 kernel: support proc:{thread_fd}/<sub-handle> path format
The relibc fork's pthread_setname_np / pthread_getname_np /
pthread_setaffinity_np / pthread_getaffinity_np and
mutex_owner_id_is_live all use the path format
'proc:{thread_fd}/<sub-handle>' (e.g.
'proc:123/name' or 'proc:123/sched-affinity') via a single
Sys::open() call.

Previously the proc scheme's OpenTy::Auth handler in
src/scheme/proc.rs only recognized 'new-context' and
'cur-context' as literal strings, so '123/name' would
hit the _ => ENOENT arm and the relibc calls would
fail with ENOENT at runtime.

Fix: add a third arm in OpenTy::Auth that splits the
operation string on the first '/', parses the prefix as
a numeric context id, looks up the corresponding
ContextHandle in the HANDLES map, and recursively
dispatches to openat_context with the suffix as the
sub-handle path. This makes 'proc:123/name' resolve to
the same handle chain that 'dup(123, "name")' would
have produced.
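The path split described above can be sketched as a pure parser. `AuthPath` is a hypothetical enum for illustration; in the kernel a `None` result corresponds to the ENOENT arm:

```rust
/// Result of parsing an Authority-handle operation string (hypothetical).
#[derive(Debug, PartialEq, Eq)]
pub enum AuthPath<'a> {
    NewContext,
    CurContext,
    /// "<id>/<sub>": numeric context id plus the sub-handle to open on it.
    Sub { id: usize, sub: &'a str },
}

pub fn parse_auth_path<'a>(op: &'a str) -> Option<AuthPath<'a>> {
    match op {
        "new-context" => Some(AuthPath::NewContext),
        "cur-context" => Some(AuthPath::CurContext),
        _ => {
            // Split on the first '/' only, so "123/a/b" keeps "a/b" as the sub-path.
            let (prefix, sub) = op.split_once('/')?;
            let id = prefix.parse::<usize>().ok()?;
            Some(AuthPath::Sub { id, sub })
        }
    }
}
```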

The recursive call is safe because openat_context
doesn't depend on the Authority-only state. The
HANDLES map is read-locked; we drop the lock before
the recursive call by scoping the handles variable.
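The "drop the read lock before recursing" pattern looks roughly like this, with `std::sync::RwLock` standing in for the kernel's HANDLES lock and a closure standing in for openat_context (both assumptions). Cloning the handle out of the map lets the guard go out of scope before dispatch:

```rust
use std::collections::HashMap;
use std::sync::RwLock;

#[derive(Clone, Debug)]
pub struct ContextHandle {
    pub context_id: usize,
}

/// Look up `id` under a read lock, release the lock, then dispatch.
/// `open_sub` may itself lock `handles` again without deadlocking.
pub fn open_via_id<F>(
    handles: &RwLock<HashMap<usize, ContextHandle>>,
    id: usize,
    sub: &str,
    open_sub: F,
) -> Option<String>
where
    F: FnOnce(&ContextHandle, &str) -> Option<String>,
{
    let handle = {
        let map = handles.read().ok()?;
        let found = map.get(&id).cloned();
        found?
    }; // read guard dropped here, before the nested open
    open_sub(&handle, sub)
}
```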

Discovered by Oracle review of Phase 0c patches
(Issue 1). The bug was latent in the original
P5-proc-setschedpolicy patch (before Phase 0c) and
survived because the relibc code paths were never
exercised at runtime.
2026-07-02 10:53:52 +03:00
vasilito d37b421cb3 kernel: fix wakeup_contexts vs steal_work deadlock
Two-sided fix for the lock-ordering deadlock discovered by
Oracle review (Issue 24):

1. wakeup_contexts held IDLE_CONTEXTS while
   waiting for SchedQueuesLock on its own CPU via
   SchedQueuesLock::new(&percpu.sched). If steal_work on
   another CPU was holding that same SchedQueuesLock (as its
   victim lock) and waiting for IDLE_CONTEXTS, both CPUs
   spin forever.

   Fix: drop idle_contexts immediately after building the
   wakeups Vec. The Vec is the only data we need; releasing
   the lock here means steal_work on another CPU can proceed
   while this CPU acquires its own SchedQueuesLock.

2. steal_work held a victim's SchedQueuesLock (victim_lock)
   while calling idle_contexts(token.downgrade()).push_back
   on a context that turned out to be Blocked. This is the
   matching side of the deadlock: CPU A held IDLE_CONTEXTS and
   waited for its own SchedQueuesLock; CPU B (steal_work) held
   CPU A's SchedQueuesLock and waited for IDLE_CONTEXTS.

   Fix: use idle_contexts_try (try_lock) instead of
   idle_contexts (blocking lock). If IDLE_CONTEXTS is busy
   (owned by wakeup_contexts on another CPU), skip the
   push-back; the context will be re-checked on the next
   wakeup round because it was not removed from IDLE_CONTEXTS
   (the Blocked status was set, but it stayed in IDLE_CONTEXTS
   because we never re-pushed it).

The original code at line 429 used idle_contexts (blocking)
which is what makes this a real deadlock. try_lock is safe
because:
  - If try_lock succeeds, the context is correctly pushed
  - If try_lock fails, the context is still in IDLE_CONTEXTS
    (we never removed it), so the next wakeup_contexts will
    find it again
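Both sides of the fix can be modeled with `std::sync::Mutex` as a stand-in for IDLE_CONTEXTS and a SchedQueuesLock; `Sched`, the `u32` context ids and the `ready` predicate are hypothetical simplifications. Side 1 releases `idle` before taking the run queue; side 2 never blocks on `idle` while holding a run-queue lock:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Hypothetical stand-ins for IDLE_CONTEXTS and one CPU's run queue.
pub struct Sched {
    pub idle: Mutex<Vec<u32>>,
    pub run_queue: Mutex<VecDeque<u32>>,
}

impl Sched {
    /// Side 1: build the wakeup list under `idle`, drop it, then lock the queue.
    pub fn wakeup_contexts(&self, ready: impl Fn(u32) -> bool) -> usize {
        let wakeups: Vec<u32> = {
            let mut idle = self.idle.lock().unwrap();
            let (wake, keep): (Vec<u32>, Vec<u32>) =
                idle.drain(..).partition(|&id| ready(id));
            *idle = keep;
            wake
        }; // `idle` guard released here
        let mut rq = self.run_queue.lock().unwrap();
        let n = wakeups.len();
        rq.extend(wakeups);
        n
    }

    /// Side 2: while holding a (victim) run-queue lock, only try_lock `idle`.
    /// Returns false when the push-back was skipped because `idle` was busy.
    pub fn push_back_blocked(&self, id: u32) -> bool {
        let _victim = self.run_queue.lock().unwrap();
        match self.idle.try_lock() {
            Ok(mut idle) => {
                idle.push(id);
                true
            }
            Err(_) => false,
        }
    }
}
```

With either side changed this way, no CPU holds one of the two locks while blocking on the other, so the cycle cannot form.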
2026-07-02 10:36:17 +03:00
vasilito 6a5582783f kernel: fix inverted nice-to-prio mapping in proc Priority handle
The proc scheme's Self::Priority write handler used
'kernel_prio = (20 - nice) as usize' which maps:
  nice -20 -> kernel_prio 40, clamped to 39
  nice  0 -> kernel_prio 20
  nice 19 -> kernel_prio 1

But SCHED_PRIO_TO_WEIGHT[39] = 15 (lowest weight, least CPU
time), and SCHED_PRIO_TO_WEIGHT[0] = 88761 (highest weight,
most CPU time). So the old formula gave processes that set
nice to the most favorable value (-20) the LEAST CPU time,
and processes that set nice to the least favorable value (+19)
the MOST CPU time. Completely inverted.

Correct formula: kernel_prio = (nice + 20) as usize, giving:
  nice -20 -> kernel_prio  0 (highest weight, most CPU)
  nice  0 -> kernel_prio 20
  nice 19 -> kernel_prio 39 (lowest weight, least CPU)

The corresponding read path (kernel_prio -> nice) is
'nice = (context.prio as i32 - 20)'. The old read was
'(20 - context.prio as i32)' which had the same inversion
plus a clamp that hid the bug for prio 0 (-> nice 20, clamped
to 19, never returned the correct -20).
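The corrected write and read formulas as a pair of functions. The clamp is a simplification for the sketch; per the handle description, the real write path range-checks nice to -20..=19 instead:

```rust
/// nice -20..=19 -> kernel prio 0..=39 (prio 0 = highest weight, most CPU).
pub fn nice_to_prio(nice: i32) -> usize {
    (nice.clamp(-20, 19) + 20) as usize
}

/// Read path: kernel prio 0..=39 -> nice -20..=19.
pub fn prio_to_nice(prio: usize) -> i32 {
    prio.min(39) as i32 - 20
}
```

Because the two functions are exact inverses over the valid range, a value written through the Priority handle reads back unchanged.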

Also fix the self-contradictory doc comment on
Context::set_sched_other_prio, which claimed 'prio 39 is the lowest
nice value (highest CPU weight)'; actually prio 0 is the highest
weight and highest priority.

Discovered by Oracle review of Phase 0c patches (Issue 29).
The bug was introduced in the original P5-proc-setschedpolicy
patch (before Phase 0c) and survived because the kernel
boots with default priority 20 (nice 0), so the inversion was
invisible during normal testing.
2026-07-02 10:19:08 +03:00
vasilito 4789d546e2 kernel: add SchedPolicy/Name/Priority proc scheme handles
Wire up three new ContextHandle variants and their /proc/<pid>/{name,
sched-policy, priority} paths so that userspace (libredox, relibc's
pthread_setname_np / pthread_setschedparam / setpriority) can read
and write these per-context fields.

Changes:

  ContextHandle enum (proc.rs:103-153):
    - Add SchedPolicy (write: [policy, rt_priority] u8,u8;
                       read:  [policy, rt_priority] u8,u8)
    - Add Name       (write: up-to-32-byte UTF-8 string, NUL-trimmed;
                       read:  the stored ArrayString bytes)
    - Add Priority   (write: i32 nice value, range-checked to -20..=19;
                       read:  i32 nice value computed from context.prio)

  openat_context paths (proc.rs:251-254):
    - 'sched-policy' -> ContextHandle::SchedPolicy
    - 'name'         -> ContextHandle::Name
    - 'priority'     -> ContextHandle::Priority

  Attr write handler (proc.rs:1286):
    - Switched from 'guard.prio = (info.prio as usize).min(39)'
      to 'guard.set_sched_other_prio(info.prio as usize)' so that
      both prio AND sched_static_prio are kept in sync. Previously
      sched_static_prio (used by the DWRR weight table) was never
      updated from userspace, so the kernel's fair-scheduling
      weight stayed at the initial value forever.
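A sketch of decoders for the three write payloads listed above. Two details are assumptions not stated in the commit: the Name payload is cut at the first NUL, and the Priority i32 is read in native byte order:

```rust
/// SchedPolicy write: exactly [policy, rt_priority] as two u8 values.
pub fn decode_sched_policy(buf: &[u8]) -> Option<(u8, u8)> {
    match buf {
        [policy, rt_priority] => Some((*policy, *rt_priority)),
        _ => None,
    }
}

/// Name write: up to 32 bytes of UTF-8; assumption: cut at the first NUL.
pub fn decode_name(buf: &[u8]) -> Option<&str> {
    if buf.len() > 32 {
        return None;
    }
    let end = buf.iter().position(|&b| b == 0).unwrap_or(buf.len());
    std::str::from_utf8(&buf[..end]).ok()
}

/// Priority write: an i32 nice value (assumed native-endian), -20..=19.
pub fn decode_priority(buf: &[u8]) -> Option<i32> {
    if buf.len() != 4 {
        return None;
    }
    let nice = i32::from_ne_bytes([buf[0], buf[1], buf[2], buf[3]]);
    if (-20..=19).contains(&nice) { Some(nice) } else { None }
}
```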

Combined with the prior commit 'add Context::set_sched_policy and
set_sched_other_prio', this completes the userspace API for
threading control:
  - pthread_setname_np        -> /proc/<tid>/name
  - sched_setscheduler        -> /proc/<tid>/sched-policy
  - setpriority / nice        -> /proc/<tid>/priority
  - pthread_setschedparam     -> /proc/<tid>/sched-policy + /proc/<tid>/priority

cargo check: now exits 0 with 0 errors. 37 warnings remain (all
pre-existing, none blocking).

Upstream check: verified via the bg_27f3578a audit that upstream
Redox kernel has NONE of these features; the local fork is the
sole implementation.
2026-07-02 07:00:07 +03:00
vasilito e8ec916158 acpi/fadt: fix pre-existing usize/u32 type mismatch on x86_64
The FADT_MIN_SIZE_ACPI_2_0 and FADT_MIN_SIZE_ACPI_1_0 constants
were defined as usize, but the Sdt::length() method they are
compared against returns u32. Rust never converts between integer
types implicitly, so the comparison
  sdt.length() >= FADT_MIN_SIZE_ACPI_2_0
fails to compile with E0308 'expected u32, found usize' (the LHS
type is u32, the RHS constant is usize).

Root cause: usize and u32 are distinct types in Rust on every
target (including i386, where both are 32 bits wide), so both
sides of the comparison must share a type. The constants' usize
type was never reconciled with the u32 length.

Fix: change both constants to u32. The values 148 and 76 (the
minimum FADT lengths this parser accepts) fit trivially in u32,
and u32 matches the Sdt::length() return type: the ACPI 6.5 spec
defines the SDT length field as a 32-bit integer.
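A minimal illustration of the fixed comparison, using a simplified stand-in for Sdt and a hypothetical `fadt_size_class` helper. Had either constant stayed usize, the corresponding `>=` would be rejected with E0308 on any target:

```rust
// Both sides are u32, so these comparisons type-check on every target.
pub const FADT_MIN_SIZE_ACPI_1_0: u32 = 76;
pub const FADT_MIN_SIZE_ACPI_2_0: u32 = 148;

/// Simplified stand-in for the kernel's Sdt, exposing only length().
pub struct Sdt {
    pub length: u32,
}

impl Sdt {
    pub fn length(&self) -> u32 {
        self.length
    }
}

/// Hypothetical helper: which minimum-size check the table passes.
pub fn fadt_size_class(sdt: &Sdt) -> Option<u8> {
    if sdt.length() >= FADT_MIN_SIZE_ACPI_2_0 {
        Some(2)
    } else if sdt.length() >= FADT_MIN_SIZE_ACPI_1_0 {
        Some(1)
    } else {
        None
    }
}
```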

This was the lone remaining cargo check error in the local
kernel fork, blocking clean cargo check validation of every other
change. With this fix, cargo check now exits 0 (modulo pre-existing
unrelated warnings).

The fadt.rs module was touched in earlier Red Bear OS commits
(9bc1fbf 'fix Phase II.X.W FACS parser + Sdt length() + UserSlice
access' and 475f96e 'comprehensive FACS parser') but the type
mismatch on the constant was not fixed at that time.
2026-07-02 06:58:22 +03:00
vasilito 327c1502d1 kernel: add Context::set_sched_policy and set_sched_other_prio
The P5-proc-setschedpolicy, P7-proc-setname, and P7-proc-setpriority
patches all call context.set_sched_policy() and
context.set_sched_other_prio() — but neither method existed in the
local fork. Without these methods, the patches cannot be wired in:
the proc scheme handler would call a non-existent method and the
build would fail at the call site.

Implement both methods on Context:

  set_sched_policy(policy, rt_priority):
    - Sets self.sched_policy
    - Clamps rt_priority to 0..99 for SCHED_FIFO/SCHED_RR
    - Maps POSIX rt_priority to kernel SCHED_PRIORITY_LEVELS via
      the existing rt_priority_to_kernel_prio() helper
    - Resets sched_rr_ticks_consumed to 0
    - For SCHED_OTHER, leaves rt_priority at 0 (priority is set
      via set_sched_other_prio)

  set_sched_other_prio(prio):
    - Clamps prio to 0..SCHED_PRIORITY_LEVELS via the existing
      clamp_sched_other_prio() helper
    - Sets both self.prio and self.sched_static_prio
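A sketch of the two methods on a trimmed-down Context. The body of rt_priority_to_kernel_prio() below is a hypothetical linear mapping (higher POSIX priority to lower, stronger kernel prio), and storing its result in `prio` is an assumption; the commit only says the helper is used:

```rust
pub const SCHED_PRIORITY_LEVELS: usize = 40;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum SchedPolicy {
    Other,
    Fifo,
    RoundRobin,
}

// Hypothetical mapping: rt 99 -> prio 0, rt 0 -> prio 39.
fn rt_priority_to_kernel_prio(rt: u8) -> usize {
    (SCHED_PRIORITY_LEVELS - 1) - (rt as usize * (SCHED_PRIORITY_LEVELS - 1) / 99)
}

fn clamp_sched_other_prio(prio: usize) -> usize {
    prio.min(SCHED_PRIORITY_LEVELS - 1)
}

/// Trimmed-down Context with only the fields these methods touch.
pub struct Context {
    pub sched_policy: SchedPolicy,
    pub sched_rt_priority: u8,
    pub sched_rr_ticks_consumed: u32,
    pub prio: usize,
    pub sched_static_prio: usize,
}

impl Context {
    pub fn set_sched_policy(&mut self, policy: SchedPolicy, rt_priority: u8) {
        self.sched_policy = policy;
        self.sched_rr_ticks_consumed = 0;
        match policy {
            SchedPolicy::Fifo | SchedPolicy::RoundRobin => {
                let rt = rt_priority.min(99);
                self.sched_rt_priority = rt;
                self.prio = rt_priority_to_kernel_prio(rt); // assumption
            }
            // SCHED_OTHER priority comes from set_sched_other_prio instead.
            SchedPolicy::Other => self.sched_rt_priority = 0,
        }
    }

    pub fn set_sched_other_prio(&mut self, prio: usize) {
        let p = clamp_sched_other_prio(prio);
        self.prio = p;
        self.sched_static_prio = p; // keep the DWRR weight index in sync
    }
}
```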

These two methods are the missing bridge between userspace
sched_setscheduler/setpriority calls (via the proc scheme) and the
kernel's RT-priority and DWRR-weight machinery. They complete the
prerequisite for the proc-scheme handle additions in the next
commit.

Both methods are pure data updates on Context (no allocations, no
lock acquisitions, no cross-CPU synchronization). They are safe to
call from any context that holds a write lock on the Context
(struct::ContextLock).

cargo check: 1 error remains (pre-existing fadt.rs:110 type mismatch
unrelated to threading).
2026-07-02 06:54:51 +03:00
vasilito 7fc8bbf057 kernel: apply P8-initial-placement, P9-numa-topology, P9-proc-lock-ordering
Phase 0c, plan orders #5, #10, #11.

  P8-initial-placement: context::Context::spawn() now picks the
    least-loaded CPU for new threads based on PerCpuSched.balance,
    replacing the old 'pin to birth CPU' default.

  P9-numa-topology: adds src/numa.rs (NumaTopology, NumaHint types and
    MAX_NUMA_NODES constant) and threads the get_percpu_block import
    through context/mod.rs. NUMA discovery is performed by userspace
    numad via /scheme/acpi/ and pushed to the kernel via scheme:numa;
    the kernel stores a lightweight copy for O(1) scheduler lookups.

  P9-proc-lock-ordering: fix to scheme/proc.rs acquire order to
    prevent deadlock between proc scheme handles and the per-CPU
    sched lock. Required after P8-percpu-wiring moved the scheduler
    state to per-CPU.
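The "lightweight copy for O(1) scheduler lookups" in P9-numa-topology can be pictured as a flat CPU-to-node array. The field layout and the MAX_NUMA_NODES / MAX_CPUS values below are hypothetical, chosen only for the sketch:

```rust
pub const MAX_NUMA_NODES: usize = 8; // hypothetical value
pub const MAX_CPUS: usize = 64; // hypothetical value

/// Kernel-side copy of the topology pushed by numad (hypothetical layout).
pub struct NumaTopology {
    cpu_to_node: [u8; MAX_CPUS],
    node_count: usize,
}

impl NumaTopology {
    /// Default before userspace pushes anything: every CPU on node 0.
    pub const fn single_node() -> Self {
        NumaTopology { cpu_to_node: [0; MAX_CPUS], node_count: 1 }
    }

    /// Apply one update from userspace; rejects out-of-range input.
    pub fn set_cpu_node(&mut self, cpu: usize, node: usize) -> bool {
        if cpu >= MAX_CPUS || node >= MAX_NUMA_NODES {
            return false;
        }
        self.cpu_to_node[cpu] = node as u8;
        self.node_count = self.node_count.max(node + 1);
        true
    }

    /// O(1): a single array index, no locks or allocation.
    pub fn node_of(&self, cpu: usize) -> usize {
        self.cpu_to_node.get(cpu).copied().unwrap_or(0) as usize
    }

    pub fn same_node(&self, a: usize, b: usize) -> bool {
        self.node_of(a) == self.node_of(b)
    }

    pub fn node_count(&self) -> usize {
        self.node_count
    }
}
```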

After this commit, three more of the plan's eleven P5–P9 patches are
landed. Remaining unlanded: P5-sched-rt-policy, P6-vruntime-switch,
P7-cache-affine-switch (all touch switch.rs which now diverges from
the patch baselines), and P5-scheme-sched-id/P5-proc-setschedpolicy/
P7-proc-setname/P7-proc-setpriority (overlap on scheme/proc.rs:10X-14X
context handle enum).

cargo check: 1 error remaining (pre-existing src/acpi/fadt.rs:110
unrelated to threading work).
2026-07-02 06:43:23 +03:00
vasilito f7652fc26a kernel: apply P5-context-mod-sched, P8-percpu-sched, P8-percpu-wiring
Phase 0c, plan orders #3, #4, #7.

  P5-context-mod-sched: re-export SchedPolicy from context::mod (one-line
    change to the use statement). The type is defined in context::context
    by the previous P7-cache-affine-context commit; this just makes it
    available as crate::context::SchedPolicy.

  P8-percpu-sched: adds PerCpuSched struct to percpu.rs with SyncUnsafeCell-
    wrapped run_queues, balance/last_queue/last_balance_time cells, and
    take_lock/release_lock methods. Refactors PercpuBlock to embed
    PerCpuSched as 'sched' field instead of standalone 'balance'/'last_queue'
    fields. Adds get_percpu_block() helper.

  P8-percpu-wiring: rewrites src/context/switch.rs to consume PerCpuSched:
    - select_next_context reads from percpublock.sched.queues() instead
      of the global RunContextData.set
    - Initial placement chooses least-loaded CPU via PerCpuSched.balance
    - Load balance trigger fires periodically and migrates contexts
      between per-CPU queues respecting sched_affinity
    - Adds pub const fn to access per-cpu sched state safely
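A simplified model of the PerCpuSched shape described above. `SyncUnsafeCell` is nightly-only, so this sketch uses `UnsafeCell` plus a manual `Sync` impl, and an RAII guard plays the role of take_lock/release_lock; queue entries are plain `u32` ids for illustration:

```rust
use std::cell::UnsafeCell;
use std::collections::VecDeque;
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};

pub const RUN_QUEUE_COUNT: usize = 40;

pub struct PerCpuSched {
    lock: AtomicBool,
    run_queues: UnsafeCell<[VecDeque<u32>; RUN_QUEUE_COUNT]>,
    /// Queued-context count, read without the lock for placement decisions.
    pub balance: AtomicUsize,
}

// SAFETY: `run_queues` is only reached through `SchedGuard`, which holds `lock`.
unsafe impl Sync for PerCpuSched {}

/// Holding the guard is the "take_lock" state; dropping it is "release_lock".
pub struct SchedGuard<'a> {
    sched: &'a PerCpuSched,
}

impl PerCpuSched {
    pub fn new() -> Self {
        PerCpuSched {
            lock: AtomicBool::new(false),
            run_queues: UnsafeCell::new(std::array::from_fn(|_| VecDeque::new())),
            balance: AtomicUsize::new(0),
        }
    }

    pub fn take_lock<'a>(&'a self) -> SchedGuard<'a> {
        while self
            .lock
            .compare_exchange_weak(false, true, Ordering::Acquire, Ordering::Relaxed)
            .is_err()
        {
            std::hint::spin_loop();
        }
        SchedGuard { sched: self }
    }
}

impl<'a> SchedGuard<'a> {
    pub fn queues(&mut self) -> &mut [VecDeque<u32>; RUN_QUEUE_COUNT] {
        // SAFETY: the lock is held for the guard's lifetime, and the borrow
        // is tied to `&mut self`, so only one such reference can exist.
        unsafe { &mut *self.sched.run_queues.get() }
    }
}

impl<'a> Drop for SchedGuard<'a> {
    fn drop(&mut self) {
        self.sched.lock.store(false, Ordering::Release);
    }
}
```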

After this commit, the kernel builds with per-CPU run queues wired
into the scheduler. cargo check still has 1 pre-existing unrelated
error (src/acpi/fadt.rs:110 type mismatch) that predates the threading
work.

Combined with the P6-futex-sharding commit, this completes the
foundation for Phase 1 (Futex Completeness) and Phase 2 (SMP Scheduling
Quality).
2026-07-02 06:42:08 +03:00
vasilito cbf051e6d8 kernel: manual resolution of P7-cache-affine-context for current fork
The P7-cache-affine-context patch fails to apply because the current
fork's context.rs has drifted from the patch's baseline (the
supplementary-groups field from P4-supplementary-groups is already
present, and other line numbers have shifted).

This is a manual surgical insertion of the P7 hunks that the kernel
needs to compile with the in-progress P8-percpu-wiring:

  - Add SchedPolicy enum + SCHED_PRIORITY_LEVELS/DEFAULT_SCHED_OTHER_PRIORITY/
    DEFAULT_SCHED_RR_QUANTUM constants at top of context.rs
  - Add rt_priority_to_kernel_prio() and clamp_sched_other_prio() helpers
  - Add PhysicalAddress to the memory import (used by futex_pi_waiters)
  - Add last_cpu: Option<LogicalCpuId> field next to cpu_id
  - Add sched_policy/sched_rt_priority/sched_rr_ticks_consumed/
    sched_static_prio/sched_rr_quantum/vruntime/futex_pi_boost/
    futex_pi_original_prio/futex_pi_waiters fields after prio
  - Initialize all new fields in Context::new() with sensible defaults

Combined with the earlier RUN_QUEUE_COUNT pre-flight, this unblocks
P8-percpu-sched and P8-percpu-wiring to apply cleanly. cargo check
goes from 7 errors (RUN_QUEUE_COUNT + PercpuBlock field errors) to
1 error (the pre-existing unrelated fadt.rs type mismatch).

Phase 0c, plan order pre-flight for P7. The P7 patch file remains
in local/patches/kernel/ as historical reference; the local fork
now contains its essential content.
2026-07-02 06:41:12 +03:00
vasilito 5fb42fcaa1 kernel: define RUN_QUEUE_COUNT in context/mod.rs
Pre-flight for Phase 0c. The P8-percpu-sched and P8-percpu-wiring
patches both reference crate::context::RUN_QUEUE_COUNT but none of
the kernel P5–P9 patches define it (verified by grep). The downstream
patches have an incomplete dependency: they need this constant at
the module level but no patch supplies it.

Add 'pub const RUN_QUEUE_COUNT: usize = 40;' here, matching the
historical 40-priority DWRR queue count. The P7-cache-affine-context
patch separately defines 'pub const SCHED_PRIORITY_LEVELS: usize = 40;'
in context/context.rs which is a duplicate; both being 40 keeps the
existing SCHED_PRIO_TO_WEIGHT and quantum tables valid.
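Since the two constants must stay equal for the tables to remain valid, a compile-time assertion is one way to enforce it. This is a sketch, not something the commit adds; `EXAMPLE_QUANTUM_NS` is a hypothetical table:

```rust
pub const RUN_QUEUE_COUNT: usize = 40;
pub const SCHED_PRIORITY_LEVELS: usize = 40;

// Fails the build if either constant drifts, instead of leaving a table
// sized by one constant and indexed by the other.
const _: () = assert!(RUN_QUEUE_COUNT == SCHED_PRIORITY_LEVELS);

/// Hypothetical per-queue table sized by RUN_QUEUE_COUNT.
pub static EXAMPLE_QUANTUM_NS: [u64; RUN_QUEUE_COUNT] = [1_000_000; RUN_QUEUE_COUNT];
```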
2026-07-02 06:33:08 +03:00
vasilito ed3f0e1e64 kernel: futex 64-shard hash table (Phase 0c, plan order #1)
Re-apply P6-futex-sharding.patch from local/patches/kernel/ to the local
fork. Replaces the single global Mutex<L1, FutexList> with a 64-shard
hash table to eliminate contention between futex operations on
different addresses (different cores no longer serialize on one lock).

src/syscall/futex.rs: static FUTEXES changes from a single
Mutex<L1, FutexList> to a [Mutex<L1, Shard>; 64] array indexed by
hash of the physical address.
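The commit does not name the hash, so the sketch below assumes Fibonacci (multiplicative) hashing of the address with the low two bits dropped (futex words are 4-byte aligned), keeping the top 6 bits to index 64 shards. `std::sync::Mutex` stands in for the kernel's `Mutex<L1, _>`:

```rust
use std::sync::Mutex;

pub const FUTEX_SHARDS: usize = 64;
const SHARD_BITS: u32 = 6; // log2(FUTEX_SHARDS)
const _: () = assert!(1 << SHARD_BITS == FUTEX_SHARDS);

/// One shard's waiter list: (physical address, context id) pairs.
#[derive(Default)]
pub struct Shard {
    pub waiters: Vec<(u64, u32)>,
}

/// Assumed hash: multiplicative hashing, top SHARD_BITS bits select the shard.
pub fn shard_index(phys_addr: u64) -> usize {
    let key = phys_addr >> 2;
    let mixed = key.wrapping_mul(0x9E37_79B9_7F4A_7C15);
    (mixed >> (64 - SHARD_BITS)) as usize
}

pub fn new_futex_table() -> [Mutex<Shard>; FUTEX_SHARDS] {
    std::array::from_fn(|_| Mutex::new(Shard::default()))
}
```

Operations on addresses that land in different shards take different locks, which is where the reduced cross-core contention comes from.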

This is the foundation patch for Phase 1 (Futex Completeness).
All later futex work (REQUEUE, PI, robust, WAKE_OP) depends on the
sharding being present.

The Cargo.lock diff is the expected dep resolution update.

Multi-threading plan Phase 0c, plan order #1 (P6-futex-sharding).
2026-07-02 06:26:24 +03:00
vasilito 9bc1fbfe46 kernel: fix Phase II.X.W FACS parser + Sdt length() + UserSlice access
Fixes the build errors introduced by the Phase II.X.W
FACS parser and the Sdt length() method:

* src/acpi/sdt.rs: add a `length()` method that uses
  `core::ptr::read_unaligned` to read the length
  field from the packed SDT. The Sdt is
  `#[repr(C, packed)]`, so references to its fields
  are not allowed. The new method returns a u32 (matching
  the SDT spec). Fixes the E0308 errors in fadt.rs and facs.rs.

* src/acpi/fadt.rs: use `sdt.length()` (the new
  method) instead of `sdt.length` (direct field
  access) for the FADT size check.

* src/acpi/mod.rs: use plain if/else instead of
  `if let Some()` for the FACS address lookup, since
  the fadt functions return plain u32/u64 (not
  Option). The address 0 is treated as 'no FACS'.

* src/scheme/acpi.rs: use
  `payload.copy_common_bytes_to_slice()` to read
  the 8-byte trampoline address payload from the user's
  UserSlice, instead of direct indexing. Fixes the
  E0608 error.
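The `length()` accessor pattern for a packed header, sketched on the standard 36-byte ACPI SDT layout. `addr_of!` produces a raw pointer without creating a misaligned reference, and `read_unaligned` has no alignment requirement; the kernel's own struct may name its fields differently:

```rust
/// Standard 36-byte ACPI SDT header; packed, so fields may be misaligned.
#[repr(C, packed)]
pub struct Sdt {
    pub signature: [u8; 4],
    pub length: u32,
    pub revision: u8,
    pub checksum: u8,
    pub oem_id: [u8; 6],
    pub oem_table_id: [u8; 8],
    pub oem_revision: u32,
    pub creator_id: u32,
    pub creator_revision: u32,
}

impl Sdt {
    pub fn length(&self) -> u32 {
        // SAFETY: the pointer is derived from a live &Sdt, and read_unaligned
        // does not require the u32 field to be 4-byte aligned.
        unsafe { std::ptr::read_unaligned(std::ptr::addr_of!(self.length)) }
    }
}
```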

All these fixes maintain the Phase II.X.W functionality
(a FACS parser modeled on Linux 7.1, and Linux 7.1
acpi_set_firmware_waking_vector semantics).
2026-07-01 17:04:11 +03:00
vasilito 475f96ecab kernel: comprehensive FACS parser + Phase II.X.W SetS3WakingVector AcPiVerb
Phase II.X.W: comprehensive FACS parser + SetS3WakingVector +
EnterS3 AcPiVerbs. The full S3 round-trip is now wired.

* FACS parser (src/acpi/facs.rs): comprehensive implementation
  matching Linux 7.1's struct acpi_table_facs from
  include/acpi/actbl.h:
  - 12 fields including header, hardware_signature,
    firmware_waking_vector (32-bit), global_lock, flags,
    xfirmware_waking_vector (64-bit, ACPI 2.0+), version,
    reserved[3], ospm_flags (ACPI 4.0+), reserved1[24].
  - 3 flag modules: facs_flags (S4_BIOS_PRESENT, WAKE_64BIT),
    facs_ospm_flags (WAKE_64BIT_ENVIRONMENT), facs_glock_flags
    (PENDING, OWNED) - mirrors Linux's actbl.h constants.
  - Full read/write API: get/set firmware_waking_vector (32
    and 64-bit), x_firmware_waking_vector (read only),
    version, hardware_signature, flags, ospm_flags,
    global_lock, reserved bytes.
  - Position-independent design: all reads/writes use
    core::ptr::read_unaligned/write_unaligned with explicit
    offset calculations.
  - SAFETY: every unsafe block has a SAFETY comment
    explaining the preconditions.

* FADT parser (src/acpi/fadt.rs) now extracts firmware_ctrl
  (FADT offset 36) and x_firmware_ctrl (FADT offset 132)
  for the FACS address lookup. Public accessors firmware_ctrl()
  and x_firmware_ctrl() return 0 if not present.

* acpi init (src/acpi/mod.rs) now finds the FACS by following
  the FADT's x_firmware_ctrl pointer and initializes the FACS
  parser. Logs a warning if FACS is not found.

* AcPiScheme kcall handler (src/scheme/acpi.rs) now dispatches
  on two new Phase II.X.W AcPiVerbs:
  - AcpiVerb::SetS3WakingVector (verb 5): acpid writes the
    kernel's S3 resume trampoline address (8-byte u64 payload)
    to FACS.xfirmware_waking_vector. A zero payload is a
    sentinel for 'use the kernel's default trampoline
    address' (s3_trampoline symbol). Mirrors Linux 7.1's
    acpi_set_firmware_waking_vector in ACPICA.
  - AcpiVerb::EnterS3 (verb 6): acpid requests the kernel to
    enter S3. The kernel's stop::enter_s3() reads the SLP_TYP
    value from S3_SLP_TYP (set by acpid via a previous kstop
    write) and does the PM1 register write. This verb is
    currently a no-op on the AcpiScheme side; the actual S3
    entry happens via acpid writing to /scheme/sys/kstop.

* Hardware-agnostic: works on any x86_64 system with standard
  ACPI S3 support (Dell, HP, Lenovo, LG Gram 14). On Modern
  Standby-only systems (LG Gram 16 (2025)), the kernel never
  enters S3 so these verbs are no-ops.
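The position-independent access style described for the FACS parser, sketched over a raw base pointer. The offsets follow the FACS layout (signature 0, firmware_waking_vector 12, xfirmware_waking_vector 24, 64 bytes total); `Facs` and its method names are hypothetical, not the kernel's API:

```rust
use std::ptr::{read_unaligned, write_unaligned};

// Byte offsets within the FACS (mirrors Linux struct acpi_table_facs).
const FACS_SIGNATURE: usize = 0;
const FACS_FIRMWARE_WAKING_VECTOR: usize = 12;
const FACS_X_FIRMWARE_WAKING_VECTOR: usize = 24;
pub const FACS_LEN: usize = 64;

/// Hypothetical minimal accessor over a mapped FACS.
pub struct Facs {
    base: *mut u8,
}

impl Facs {
    /// SAFETY: `base` must point to at least FACS_LEN valid, writable bytes
    /// for as long as this Facs is used.
    pub unsafe fn new(base: *mut u8) -> Self {
        Facs { base }
    }

    pub fn signature_ok(&self) -> bool {
        // SAFETY: bytes 0..4 lie inside the FACS per `new`'s contract.
        let sig: [u8; 4] =
            unsafe { read_unaligned(self.base.add(FACS_SIGNATURE) as *const [u8; 4]) };
        &sig == b"FACS"
    }

    pub fn firmware_waking_vector(&self) -> u32 {
        // SAFETY: bytes 12..16 lie inside the FACS.
        unsafe { read_unaligned(self.base.add(FACS_FIRMWARE_WAKING_VECTOR) as *const u32) }
    }

    pub fn x_firmware_waking_vector(&self) -> u64 {
        // SAFETY: bytes 24..32 lie inside the FACS.
        unsafe { read_unaligned(self.base.add(FACS_X_FIRMWARE_WAKING_VECTOR) as *const u64) }
    }

    /// What SetS3WakingVector does: store the 64-bit resume address.
    pub fn set_x_firmware_waking_vector(&mut self, addr: u64) {
        // SAFETY: bytes 24..32 lie inside the FACS and are writable.
        unsafe { write_unaligned(self.base.add(FACS_X_FIRMWARE_WAKING_VECTOR) as *mut u64, addr) }
    }
}
```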
2026-07-01 16:31:31 +03:00
vasilito 1be659be05 kernel: Phase II.X S3 resume trampoline + state save in enter_s3
Phase II.X: hardware-agnostic S3 resume trampoline. The
kernel now:

* Saves the CPU state (rax, rbx, rcx, rdx, rsi, rdi, rbp,
  r8..r15, segment registers ds/es/fs/gs/ss, RFLAGS, RSP,
  RIP, CR3) to a static S3State struct before entering
  S3. This is done in `enter_s3()` in
  `arch/x86_shared/stop.rs` via the new
  `s3_state_save_global` function.
* Exposes a `s3_trampoline` function (in
  `arch/x86_shared/s3_resume.rs`) implemented as a
  64-bit `naked_asm!` block. The trampoline:
  - Checks the magic value (0x123456789abcdef0) in
    S3_STATE.saved_magic. If zero (cold boot), halts.
  - Restores ds/es/fs/gs/ss to __KERNEL_DS.
  - Restores CR3 (page table base).
  - Restores RSP (kernel stack pointer).
  - Restores RFLAGS.
  - Restores the 13 general-purpose registers.
  - Sets the RESUMING_FROM_S3 flag.
  - Pushes the saved RIP onto the stack and uses `ret`
    to jump to it (the kernel's kmain_resume_from_s3
    is the entry point).
* Exposes `s3_resume_address()` that returns the
  trampoline's address. acpid writes this to
  FACS.waking_vector via the kernel AcpiScheme.
* Exposes `s3_state_valid()` that the kernel checks
  during boot to determine if this is a cold boot or a
  resume from S3.
* Exposes `is_resuming_from_s3()` that the kernel
  checks during resume to skip early init.
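The cold-boot vs. resume decision hinges on the saved magic. This sketch models only that protocol in ordinary Rust (the real trampoline is `naked_asm!`); `S3State` here holds a hypothetical subset of the saved registers, and the magic is published last so a valid magic implies the other fields were written:

```rust
use std::sync::atomic::{AtomicBool, AtomicU64, Ordering};

pub const S3_MAGIC: u64 = 0x1234_5678_9abc_def0;

/// Subset of the saved state: the magic plus a few saved registers.
pub struct S3State {
    pub saved_magic: AtomicU64,
    pub cr3: AtomicU64,
    pub rsp: AtomicU64,
    pub rip: AtomicU64,
}

pub static S3_STATE: S3State = S3State {
    saved_magic: AtomicU64::new(0),
    cr3: AtomicU64::new(0),
    rsp: AtomicU64::new(0),
    rip: AtomicU64::new(0),
};
static RESUMING_FROM_S3: AtomicBool = AtomicBool::new(false);

/// Save side (enter_s3): registers first, magic last.
pub fn s3_state_save(cr3: u64, rsp: u64, rip: u64) {
    S3_STATE.cr3.store(cr3, Ordering::Relaxed);
    S3_STATE.rsp.store(rsp, Ordering::Relaxed);
    S3_STATE.rip.store(rip, Ordering::Relaxed);
    S3_STATE.saved_magic.store(S3_MAGIC, Ordering::Release);
}

/// A cold boot leaves the magic at zero; only a prior save sets it.
pub fn s3_state_valid() -> bool {
    S3_STATE.saved_magic.load(Ordering::Acquire) == S3_MAGIC
}

pub fn mark_resuming() {
    RESUMING_FROM_S3.store(true, Ordering::Release);
}

pub fn is_resuming_from_s3() -> bool {
    RESUMING_FROM_S3.load(Ordering::Acquire)
}
```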

Cross-reference: Linux 7.1
`arch/x86/kernel/acpi/wakeup_64.S` does the same
thing in 64-bit assembly. Red Bear OS uses Rust's
`naked_asm!` instead of a separate .S file,
keeping the trampoline inline with the kernel source.
The Redox implementation also restores CR3 inside the
trampoline itself (Linux performs this step in its own
wakeup trampoline code) and uses the standard
0x123456789abcdef0 magic for state validation.

Hardware-agnostic: works on any x86_64 system with
standard ACPI S3 support (Dell, HP, Lenovo, LG Gram 14).
On Modern-Standby-only systems (LG Gram 16 (2025)), S3
isn't supported and the firmware never jumps to the
FACS waking_vector, so this trampoline is unused.

Build: redbear-mini.iso (512 MB) builds successfully.
QEMU test: QEMU's S3 emulation is limited and the
firmware does not actually jump to the FACS waking_vector
in the QEMU default config, so the S3 resume path is
not tested at QEMU time. The trampoline is verified to
compile and be present in the ISO.
2026-07-01 15:52:08 +03:00
vasilito 6b98c64663 kernel: [patch.crates-io] libredox + [patch.'<URL>'] redox_syscall for Phase J
Phase J: the kernel needs two Cargo patch overrides so
that the typed-AcPiVerb path (EnterS2Idle / ExitS2Idle)
is usable. Without these:
* the kernel's redox_syscall dep is fetched from
  gitlab.redox-os.org (upstream), so the local fork at
  local/sources/syscall (with the new AcPiVerb variants)
  is not visible to the kernel's build.
* the libredox dep is fetched from crates.io, so the
  local fork at local/sources/libredox (which uses the
  local syscall fork) is not visible. This means
  libredox::error::Error and syscall::Error are
  different compile-time types and the E0277 errors in
  scheme-utils and daemon return.

The fix: a single [patch.crates-io] section overriding
libredox (which is from crates.io) and a [patch.'<URL>']
section overriding redox_syscall (which is from a git URL).
[patch.crates-io] only matches crates.io deps; [patch.'<URL>']
matches the dep's source URL.

Also: declare members = ['.', 'rmm'] in the [workspace]
section. Without this, cargo doesn't recognize the kernel
as a workspace and the [patch] sections are silently
ignored (workspace_metadata is None). The members list
includes the kernel's own directory and the rmm path
dep.
2026-07-01 14:03:18 +03:00
vasilito 01ef6f5c5d kernel: Phase J EnterS2Idle/ExitS2Idle AcPiVerb dispatch in kstop handle
Phase J: extend the kernel AcpiScheme's kcall to dispatch
on the new EnterS2Idle and ExitS2Idle AcPiVerb variants
from the local syscall fork. The kernel's scheme/acpi.rs
kcall handler now has a match arm for each new verb.

* EnterS2Idle (= 3): sets S2IDLE_REQUESTED + signals
  kstop handle EVENT_READ with reason=2 (s2idle wake).
  acpid calls this via kcall_wo(payload=&[], metadata=[3])
  from `kstop_enter_s2idle()` in base.

* ExitS2Idle (= 4): s2idle wake path. Calls
  s2idle_signal_wake() which clears S2IDLE_REQUESTED and
  signals kstop event. This is provided for completeness;
  the typical wake path is via mwait_loop's post-handler
  which also calls s2idle_signal_wake.
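A sketch of the two new match arms. The verb values come from this commit; the enum shape, the `u64` metadata slice, and the `KSTOP_REASON` static are assumptions, and the kstop EVENT_READ signal is reduced to a comment:

```rust
use std::sync::atomic::{AtomicBool, AtomicU8, Ordering};

/// Verb numbers from this commit (hypothetical enum shape).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum AcpiVerb {
    EnterS2Idle = 3,
    ExitS2Idle = 4,
}

impl AcpiVerb {
    pub fn from_u64(v: u64) -> Option<Self> {
        match v {
            3 => Some(AcpiVerb::EnterS2Idle),
            4 => Some(AcpiVerb::ExitS2Idle),
            _ => None,
        }
    }
}

pub static S2IDLE_REQUESTED: AtomicBool = AtomicBool::new(false);
pub static KSTOP_REASON: AtomicU8 = AtomicU8::new(0);

fn s2idle_signal_wake() {
    S2IDLE_REQUESTED.store(false, Ordering::SeqCst);
    KSTOP_REASON.store(2, Ordering::SeqCst); // reason 2 = s2idle wake
    // The kernel would raise EVENT_READ on the kstop handle here.
}

/// Err(()) for unknown verbs; the kernel would return an errno instead.
pub fn kcall(metadata: &[u64]) -> Result<(), ()> {
    match metadata.first().copied().and_then(AcpiVerb::from_u64) {
        Some(AcpiVerb::EnterS2Idle) => {
            S2IDLE_REQUESTED.store(true, Ordering::SeqCst);
            KSTOP_REASON.store(2, Ordering::SeqCst);
            Ok(())
        }
        Some(AcpiVerb::ExitS2Idle) => {
            s2idle_signal_wake();
            Ok(())
        }
        None => Err(()),
    }
}
```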

Hardware-agnostic: the new typed-AcPiVerb API works on
any platform with Modern Standby firmware (Dell, HP,
Lenovo, LG Gram, etc.). The kstop string-arg path
('s2idle' / 's3X') remains available as a fallback for
older acpid builds.

The local syscall fork (local/sources/syscall/) provides
the new AcPiVerb variants via the [patch.crates-io]
overrides in base/Cargo.toml and kernel/Cargo.toml. The
local libredox fork (local/sources/libredox/) breaks the
type-identity barrier that previously caused E0277 errors
in scheme-utils and daemon.
2026-07-01 13:09:23 +03:00
vasilito 3f2f3bacc5 kernel: Phase J [patch.crates-io] libredox (mirror of base's commit)
The kernel needs the same libredox override as base: the
local libredox fork at ../libredox uses the local syscall
fork at ../syscall, so the kernel's libredox::error::Error
type is now the same compile-time type as syscall::Error.
The [patch.crates-io] libredox override in the kernel
workspace is what wires this through.

This is the kernel-side mirror of the base commit
aadf55b ('base: Phase J [patch.crates-io] libredox +
kstop_enter_s2idle helper').
2026-07-01 13:07:25 +03:00
vasilito 9f6a4288b5 kernel: Phase II S3 entry path (PM1 direct write + FADT parse)
Phase II: hardware-agnostic S3 entry. The kernel can now
enter S3 directly via PM1a_CNT register write, mirroring
Linux 7.1 `acpi_hw_legacy_sleep` in
`drivers/acpi/acpica/hwsleep.c:81-127`.

* New module `acpi/fadt.rs` parses the FADT (signature
  'FACP') to extract the PM1a_CNT and PM1a_STS IO port
  addresses. ACPI 6.5 §5.2.9 / Table 5.6 places PM1a_EVT_BLK
  (which holds PM1a_STS) at offset 56 and PM1a_CNT_BLK at
  offset 64. Both are 32-bit system I/O port addresses; on
  x86 the port number fits in the low 16 bits, and the
  parser ignores the high 16 bits.
* `acpi/mod.rs` calls fadt::init() during ACPI table
  discovery. If the FADT is missing, the S3 entry path
  is disabled (a warning is logged). Hardware-agnostic.
* `scheme/acpi.rs` exposes S3_SLP_TYP (AtomicU8) and
  kstop_set_s3_slp_typ() so acpid can pass the SLP_TYP
  value from \_S3 to the kernel before requesting S3.
* `scheme/sys/mod.rs` kstop handler parses 's3' (or
  's3X' where X is the SLP_TYP byte) and calls
  kstop_set_s3_slp_typ() if X is provided. If not, the
  default S3 SLP_TYP=5 is used (the S3 encoding on Intel
  chipsets; the authoritative value comes from \_S3).
* `arch/x86_shared/stop.rs` enter_s3() is fully
  implemented:
  1. Clear WAK_STS (bit 15 of PM1a_STS)
  2. Flush CPU caches (wbinvd)
  3. Split-write SLP_TYP, then SLP_TYP|SLP_EN to PM1a_CNT
     (mirroring Linux `acpi_hw_legacy_sleep`, which splits
     the two writes to work around poorly implemented
     hardware)
  4. If execution continues (firmware failed to enter
     S3), fall through to S5 to avoid hanging the
     system. The S3 transition itself is controlled by
     the platform firmware, and the kernel cannot tell
     from this point why it did not happen (for example,
     whether \_PTS failed).
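A minimal sketch of the PM1 values behind steps 1 and 3, assuming the standard ACPI fixed-hardware bit layout (SLP_TYP in bits 10-12 and SLP_EN in bit 13 of PM1_CNT, WAK_STS in bit 15 of PM1_STS). The `PortIo` trait and the `RecordingIo` test double are hypothetical stand-ins for the kernel's port accessors, step 2 (wbinvd) is omitted, and the read-modify-write that preserves the other PM1_CNT bits is an assumption modeled on ACPICA rather than something this commit states.

```rust
/// PM1_CNT: SLP_TYP field (bits 10-12).
const SLP_TYP_SHIFT: u16 = 10;
const SLP_TYP_MASK: u16 = 0b111 << SLP_TYP_SHIFT;
/// PM1_CNT: SLP_EN (bit 13).
const SLP_EN: u16 = 1 << 13;
/// PM1_STS: WAK_STS (bit 15), cleared by writing 1.
const WAK_STS: u16 = 1 << 15;

/// Hypothetical 16-bit port I/O abstraction (not the kernel's real API).
trait PortIo {
    fn read16(&mut self, port: u16) -> u16;
    fn write16(&mut self, port: u16, value: u16);
}

/// Returns the two PM1a_CNT values to write: SLP_TYP alone, then SLP_TYP | SLP_EN.
/// Bits outside SLP_TYP/SLP_EN (e.g. SCI_EN) are preserved from `current`.
fn pm1_cnt_sleep_values(current: u16, slp_typ: u8) -> (u16, u16) {
    let typ = ((slp_typ as u16) << SLP_TYP_SHIFT) & SLP_TYP_MASK;
    let base = (current & !(SLP_TYP_MASK | SLP_EN)) | typ;
    (base, base | SLP_EN)
}

/// Steps 1 and 3 of the sequence; the cache flush (step 2) is left out.
fn write_sleep_sequence(io: &mut impl PortIo, pm1a_sts: u16, pm1a_cnt: u16, slp_typ: u8) {
    // Step 1: clear WAK_STS by writing a 1 to it.
    io.write16(pm1a_sts, WAK_STS);
    // Step 3: split write, SLP_TYP first, then SLP_TYP | SLP_EN.
    let current = io.read16(pm1a_cnt);
    let (first, second) = pm1_cnt_sleep_values(current, slp_typ);
    io.write16(pm1a_cnt, first);
    io.write16(pm1a_cnt, second);
}

/// Test double: returns a fixed PM1a_CNT value and records every write.
struct RecordingIo {
    cnt_value: u16,
    writes: Vec<(u16, u16)>,
}

impl PortIo for RecordingIo {
    fn read16(&mut self, _port: u16) -> u16 {
        self.cnt_value
    }
    fn write16(&mut self, port: u16, value: u16) {
        self.writes.push((port, value));
    }
}
```

With slp_typ = 5 and a current PM1a_CNT of 0x0001 (SCI_EN set), the two control writes are 0x1401 and then 0x3401.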

The Phase II resume trampoline (the firmware jumps to the
FACS waking_vector, and the kernel restores page tables,
long mode and registers) is NOT yet implemented. The current
S3 entry path works only for systems that resume via the
BIOS/UEFI wake path, which re-enters Redox from a cold boot
and loses kernel state. A real S3 resume needs CPU state
saving plus the trampoline; that is Phase II.X (deferred).

Hardware-agnostic: works for any platform with a
working FADT and standard PM1 register layout (Dell, HP,
Lenovo, or the LG Gram 14 (2022), which still has S3).
Modern Standby-only platforms such as the LG Gram 16 (2025)
don't expose S3, so the s3 path falls through to S5.
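The FADT lookup from the first bullet above can be sketched as reading two little-endian 32-bit fields out of the raw table bytes. This sketch assumes the ACPI FADT layout, with PM1a_EVT_BLK at byte offset 56 and PM1a_CNT_BLK at offset 64, and takes PM1a_STS to be the first register of the PM1a event block. `parse_pm1a_ports` and `Pm1aPorts` are hypothetical names, and the sketch ignores the extended X_PM1a_* fields entirely.

```rust
/// Byte offsets of the legacy 32-bit port fields in the FADT ('FACP').
const PM1A_EVT_BLK_OFFSET: usize = 56;
const PM1A_CNT_BLK_OFFSET: usize = 64;

/// I/O ports the S3 entry path needs.
#[derive(Debug, PartialEq, Eq)]
struct Pm1aPorts {
    /// PM1a_STS: first register of the PM1a event block.
    sts: u16,
    /// PM1a_CNT: the PM1a control block.
    cnt: u16,
}

/// Reads a little-endian u32 at `offset`, or None if the table is too short.
fn read_u32_le(bytes: &[u8], offset: usize) -> Option<u32> {
    let f = bytes.get(offset..offset + 4)?;
    Some(u32::from_le_bytes([f[0], f[1], f[2], f[3]]))
}

/// Zero means "block not present"; x86 I/O ports must fit in 16 bits.
fn port_from_field(value: u32) -> Option<u16> {
    if value == 0 || value > u32::from(u16::MAX) {
        None
    } else {
        Some(value as u16)
    }
}

/// Returns None if the table is not a FADT, is too short, or a port is unusable,
/// in which case the S3 entry path would stay disabled.
fn parse_pm1a_ports(fadt: &[u8]) -> Option<Pm1aPorts> {
    if !fadt.starts_with(b"FACP") {
        return None;
    }
    let sts = port_from_field(read_u32_le(fadt, PM1A_EVT_BLK_OFFSET)?)?;
    let cnt = port_from_field(read_u32_le(fadt, PM1A_CNT_BLK_OFFSET)?)?;
    Some(Pm1aPorts { sts, cnt })
}
```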
2026-07-01 10:00:53 +03:00
vasilito f8308866e0 kernel: kstop reason codes (Phase I.5 s2idle / s3 wire)
Phase I.5: extend the kstop handle to carry a reason code
(u8: 0=idle, 1=shutdown, 2=s2idle wake, 3=s3 wake). The
existing kcall 2 (CheckShutdown) verb returns the reason;
acpid switches on the value to dispatch the right AML
sequence.

* 1 (shutdown): acpid runs \_TTS(5) + \_PTS(5) +
  \_SST then exits (existing behavior).
* 2 (s2idle wake): acpid runs \_SST(2) + \_WAK(0) +
  \_SST(1) (new Phase I.5 behavior).
* 3 (s3 wake): Phase II — not yet wired.
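On the acpid side, the reason-code dispatch amounts to a small mapping from the u8 to an AML sequence. A sketch, with `KstopReason` and `aml_sequence` as hypothetical names; methods are listed as strings only to show their order, and the argument to \_SST in the shutdown path is left out because the commit does not give it.

```rust
/// Reason code read from the kstop handle (values as listed in this commit).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum KstopReason {
    Idle = 0,
    Shutdown = 1,
    S2idleWake = 2,
    S3Wake = 3,
}

impl KstopReason {
    /// Unknown codes are rejected rather than guessed.
    fn from_u8(value: u8) -> Option<Self> {
        match value {
            0 => Some(Self::Idle),
            1 => Some(Self::Shutdown),
            2 => Some(Self::S2idleWake),
            3 => Some(Self::S3Wake),
            _ => None,
        }
    }
}

/// AML methods acpid would evaluate, in order, for each reason.
fn aml_sequence(reason: KstopReason) -> &'static [&'static str] {
    match reason {
        KstopReason::Idle => &[],
        // \_SST argument for shutdown is not specified in the commit.
        KstopReason::Shutdown => &["\\_TTS(5)", "\\_PTS(5)", "\\_SST"],
        KstopReason::S2idleWake => &["\\_SST(2)", "\\_WAK(0)", "\\_SST(1)"],
        // Phase II: not yet wired.
        KstopReason::S3Wake => &[],
    }
}
```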

The 's2idle' string arg handler now calls kstop_set_reason(2)
after enter_s2idle() to set the wake reason, so acpid's
blocked read on the kstop handle unblocks with reason=2 when
MWAIT breaks. The kstop handle thus doubles as the wake signal.

Hardware-agnostic: works for any platform with Modern
Standby firmware (Dell, HP, Lenovo, LG Gram, etc.). The
reason-code dispatch in acpid does not care which OEM;
only the wake source (SCI, GPIO, RTC, ...) varies.
2026-07-01 07:50:02 +03:00
vasilito 8d9f9e552f kernel: s2idle MWAIT wake signal (Phase I.5)
Phase I.5: complete the acpid <-> kernel s2idle wire. After
MWAIT returns from an interrupt (typically the ACPI SCI,
which acpid services), the kernel now:

1. Clears S2IDLE_REQUESTED (via s2idle_request_clear)
2. Sets KSTOP_FLAG and triggers EVENT_READ on the kstop
   handle (via s2idle_signal_wake)

This is the kernel-side analog of Linux 7.1
`acpi_s2idle_wake` in `drivers/acpi/sleep.c:758`. The
existing irq_trigger in generic_irq has already routed the
SCI to acpid's listener (which opened /scheme/irq/{sci}
earlier in the boot sequence), so the AML interpretation
is done by acpid asynchronously.

The s2idle flow now:
1. acpid: enter_s2idle() (\_TTS(0), \_PTS(0), \_SST(3))
2. acpid: write 's2idle\n' to /scheme/sys/kstop
   -> kernel sets S2IDLE_REQUESTED, returns
3. Kernel idle path: mwait_loop() at deepest C-state
4. SCI breaks MWAIT (any interrupt, not just SCI)
5. Kernel mwait_loop post-handler (this commit):
   - s2idle_request_clear()
   - s2idle_signal_wake() -> KSTOP_FLAG set, EVENT_READ
6. acpid main loop: wakes from kstop handle read
7. acpid: exit_s2idle() (\_SST(2), \_WAK(0), \_SST(1))
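The kernel side of steps 2 to 5 reduces to two flags. A sketch, assuming S2IDLE_REQUESTED and KSTOP_FLAG are `AtomicBool`s (the commits only say they are atomics) and modeling MWAIT as a caller-supplied closure; delivery of EVENT_READ to the kstop handle is omitted. Using `swap` makes clear-and-check a single atomic step, so if several CPUs leave MWAIT at once only one of them signals the wake.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

/// Set by the kstop handler on "s2idle", consumed by the idle path.
static S2IDLE_REQUESTED: AtomicBool = AtomicBool::new(false);
/// Observed by acpid once its kstop read unblocks.
static KSTOP_FLAG: AtomicBool = AtomicBool::new(false);

/// Step 2: kstop handler receives "s2idle".
fn s2idle_request() {
    S2IDLE_REQUESTED.store(true, Ordering::Release);
}

/// Steps 3 to 5: one idle iteration. `mwait` stands in for MWAIT, which
/// returns on any interrupt. Returns true if this iteration signalled a wake.
fn idle_iteration(mwait: impl FnOnce()) -> bool {
    mwait();
    // s2idle_request_clear(): atomically take the request.
    if S2IDLE_REQUESTED.swap(false, Ordering::AcqRel) {
        // s2idle_signal_wake(): the kernel would also trigger EVENT_READ here.
        KSTOP_FLAG.store(true, Ordering::Release);
        true
    } else {
        false
    }
}
```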

The KSTOP_FLAG set in step 5 also serves as a 'reason'
indicator — acpid's CheckShutdown verb (kcall 2) returns
the flag, so acpid can distinguish a kstop-shutdown event
from a kstop-s2idle-wake event by polling CheckShutdown
after waking.

Hardware-agnostic: the same flow works for any platform
with Modern Standby firmware (Dell, HP, Lenovo, LG Gram,
etc.). s2idle is the common low-power idle mechanism on
these platforms; only the wake source (SCI, GPIO, RTC, ...)
varies per OEM.
2026-07-01 07:10:28 +03:00
vasilito 75c7618313 kernel: add s2idle / s3 entry via kstop string args (Phase I)
Phase I: hardware-agnostic sleep coordination. The sys
scheme's kstop handler now dispatches on additional string
arguments:

* 's2idle' — acpid requests Modern Standby / S0ix entry.
  The kernel sets S2IDLE_REQUESTED in scheme/acpi.rs. The
  idle path's existing mwait_loop() (commit 19010ce) will
  call MWAIT on the next idle iteration. MWAIT breaks on
  any interrupt (typically the ACPI SCI, which acpid
  services). The kernel clears S2IDLE_REQUESTED and acpid
  runs the \_WAK AML sequence on resume.

* 's3' — acpid requests Suspend-to-RAM. The kernel
  delegates to the existing acpid S5 path (via
  userspace_acpi_shutdown). A direct S3 PM1 register
  write and a FACS waking-vector resume trampoline are
  Phase II work; for now the S3 entry path is
  conservative and falls through to S5 if S3 does not
  sleep.
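Parsing of the string arguments above, plus the 's3X' form added by the Phase II commit, might look like the following. `KstopRequest` is a hypothetical type; X is assumed to be a single ASCII digit 0-7 because SLP_TYP is a 3-bit field, although the commits only call it "the SLP_TYP byte". The trailing newline is stripped because acpid writes 's2idle\n'. The existing shutdown handling is not modeled.

```rust
/// Sleep requests the kstop handler could receive (hypothetical type).
#[derive(Debug, PartialEq, Eq)]
enum KstopRequest {
    S2idle,
    /// SLP_TYP value to program into PM1a_CNT.
    S3 { slp_typ: u8 },
}

/// Fallback SLP_TYP when "s3" carries no explicit value.
const DEFAULT_S3_SLP_TYP: u8 = 5;

/// Returns None for arguments this sketch does not handle.
fn parse_kstop_arg(arg: &str) -> Option<KstopRequest> {
    let arg = arg.trim_end_matches('\n');
    match arg {
        "s2idle" => Some(KstopRequest::S2idle),
        "s3" => Some(KstopRequest::S3 { slp_typ: DEFAULT_S3_SLP_TYP }),
        _ => {
            // "s3X": exactly one extra character, an octal digit (fits 3 bits).
            let rest = arg.strip_prefix("s3")?;
            if rest.len() != 1 {
                return None;
            }
            let slp_typ = rest.chars().next()?.to_digit(8)? as u8;
            Some(KstopRequest::S3 { slp_typ })
        }
    }
}
```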

The S2IDLE_REQUESTED atomic in scheme/acpi.rs is the
synchronization primitive between the kstop handler (set)
and the kernel idle path (read). It mirrors Linux 7.1
s2idle_state == S2IDLE_STATE_ENTER in
kernel/power/suspend.c:91.

Hardware-agnostic: works on any platform with Modern
Standby firmware (Dell, HP, Lenovo, LG Gram, etc.) or
traditional S3 (systems that advertise \_S3 in AML). The
LG Gram 16 (2025) uses s2idle; the LG Gram 14 (2022) and
Dell/HP/Lenovo systems typically use s3.

Why not extend the syscall crate with new AcpiVerb
variants? The libredox 0.1.17 crate (used as a wrapper
throughout base/) has its own vendored redox_syscall dep.
Adding EnterS2Idle/ExitS2Idle to a local syscall fork
breaks the libredox::error::Error <-> syscall::Error
type identity (cargo builds the two redox_syscall copies
as separate crates, so their Error types are distinct),
causing E0277 errors in scheme-utils and daemon.
Phase J (deferred) will fork libredox to also use the
local syscall fork. Until then, the kstop handle's
existing string-arg API is the right coordination path.
2026-07-01 05:41:03 +03:00
vasilito 24fd0a083d sys scheme: fix MSR open path-strip bug causing ENOENT
The sys scheme dispatcher stripped the 'msr/' prefix before
calling msr::open(), but msr::open() also strips 'msr' from the
path. The dispatcher handed it '0/0x199', so msr::open's own
strip_prefix('msr') failed and the open was rejected with ENOENT
('No such file or directory'), causing every MSR open from
cpufreqd to fail.

Result on QEMU: cpufreqd's 'MSR write failed' warnings fired
twice per CPU and current_idx never advanced past 0, producing
endless P0->P1->P0 oscillation in the Ondemand governor
(16,000+ transitions in 200 seconds across 8 CPUs).

Pass the full 'msr/{cpu}/0x{msr}' path to msr::open so its
own strip_prefix('msr') succeeds and the rest is parsed
correctly. The same fix applies to any other sys scheme
sub-handler that strips its own prefix.
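A sketch of the parsing msr::open performs once it receives the full path, showing why the pre-stripped '0/0x199' failed. `parse_msr_path` is a hypothetical stand-in that returns None where the real code returns ENOENT.

```rust
/// Parses "msr/{cpu}/0x{msr}" into (cpu, msr).
fn parse_msr_path(path: &str) -> Option<(usize, u32)> {
    // msr::open strips its own prefix; a pre-stripped path fails right here.
    let rest = path.strip_prefix("msr")?;
    let rest = rest.strip_prefix('/')?;
    let (cpu, msr) = rest.split_once('/')?;
    let cpu = cpu.parse::<usize>().ok()?;
    // from_str_radix does not accept the "0x" prefix, so strip it first.
    let msr = u32::from_str_radix(msr.strip_prefix("0x")?, 16).ok()?;
    Some((cpu, msr))
}
```

Called with 'msr/0/0x199' it yields Some((0, 0x199)); called with the double-stripped '0/0x199' it yields None.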
2026-07-01 00:42:39 +03:00
Red Bear OS a8042049ce kernel: restore -Z json-target-spec (required for .json target specs) 2026-06-30 17:46:14 +03:00
Red Bear OS 19010ce174 kernel: add MWAIT idle_loop for deeper C-states on modern CPUs (Phase G)
Adds cpuid_max_mwait_substate(), mwait_loop(), and idle_loop() to the
interrupt module. On CPUs with MWAIT support (Nehalem+), the kernel now
enters the deepest MWAIT C-state the CPU advertises (C6 up to C10,
depending on the part; these deep C-states are what platform S0ix
builds on) instead of plain HLT (C1 only). Falls back to
enable_and_halt on older CPUs.

startup/mod.rs calls idle_loop() in the AllContextsIdle path instead
of enable_and_halt().
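One way cpuid_max_mwait_substate() could turn CPUID leaf 5 into an MWAIT hint, modeled on the loop Linux uses in `mwait_play_dead()`. EDX holds one 4-bit sub-state count per MWAIT C-state (EDX[3:0] for C0, EDX[7:4] for C1, and so on), and the hint in EAX encodes the C-state index in bits 7:4 and the sub-state in bits 3:0. `deepest_mwait_hint` is a hypothetical name and the kernel's real function may differ; these are the CPU's MWAIT C-states, not ACPI-defined ones.

```rust
const MWAIT_SUBSTATE_SIZE: u32 = 4;
const MWAIT_SUBSTATE_MASK: u32 = 0xf;

/// Returns the MWAIT hint (EAX) for the deepest C-state with at least one
/// sub-state in CPUID.05H:EDX, or None if MWAIT advertises no C-states.
fn deepest_mwait_hint(cpuid5_edx: u32) -> Option<u32> {
    let mut best: Option<(u32, u32)> = None;
    // Skip EDX[3:0] (C0 sub-states); index 0 then corresponds to C1.
    let mut edx = cpuid5_edx >> MWAIT_SUBSTATE_SIZE;
    let mut index = 0;
    while index < 7 && edx != 0 {
        let substates = edx & MWAIT_SUBSTATE_MASK;
        if substates != 0 {
            best = Some((index, substates));
        }
        edx >>= MWAIT_SUBSTATE_SIZE;
        index += 1;
    }
    // EAX[7:4] = C-state index, EAX[3:0] = deepest sub-state (zero-based).
    best.map(|(cstate, substates)| (cstate << MWAIT_SUBSTATE_SIZE) | (substates - 1))
}
```

For EDX = 0x2220 (two sub-states each for C1, C2 and C3) the result is Some(0x21); a None result corresponds to the enable_and_halt fallback.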
2026-06-30 15:59:02 +03:00
Red Bear OS 7f7095be1c kernel: drop -Z json-target-spec (redundant with --target for nightly-2026-04-01) 2026-06-30 15:58:41 +03:00
Red Bear OS 8cd4f69108 kernel: add /scheme/sys/msr/ R/W scheme (Phase G.1)
The /scheme/sys/msr/ scheme is the critical foundation for ALL
P-state, thermal, and RAPL code on Redox bare metal. Without it,
every MSR write from userspace is a silent no-op.

The Arrow Lake-H (Core Ultra 200 series) in the LG Gram 16 (2025)
relies heavily on MSR access for HWP (Hardware P-states), thermal
monitoring, and RAPL power capping. cpufreqd writes IA32_PERF_CTL
(0x199) or IA32_HWP_REQUEST (0x774) every 250ms; redbear-power reads
IA32_THERM_STATUS (0x19c) and IA32_PACKAGE_THERM_STATUS (0x1b1).

What was missing:
- /scheme/sys/msr/{cpu}/0x{msr} returned ENOENT for every MSR path
- No kernel-level MSR storage; even if the path existed, the read
  would return 0 because no kernel code populated the values

This commit adds:
- src/scheme/sys/msr.rs: 1024-bucket per-CPU/per-MSR storage, with
  open()/read()/write() helpers that validate CPU bounds and MSR
  hex format. The values live only in kernel memory and are exposed
  through the same read/write interface userspace expects on Linux;
  on Linux the same userspace code path uses /dev/cpu/{}/msr for
  actual hardware access.
- src/scheme/sys/mod.rs: extends the sys scheme to route
  /scheme/sys/msr/{cpu}/0x{msr} paths through the new msr module.
  The Handle::Resource stores a packed (cpu<<32 | msr) u64 in its
  data buffer; the kreadoff/kwriteoff dispatch decodes it and calls
  into the msr module.
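The handle packing and the bounded store can be sketched as below. The 64-bit key layout (cpu << 32 | msr) comes from this commit; `MsrStore` is a hypothetical linear table standing in for the 1024-bucket storage, and, like the commit's store, it only remembers values rather than executing RDMSR/WRMSR. Unwritten MSRs read as 0 in this sketch.

```rust
/// Packs (cpu, msr) into the u64 kept in Handle::Resource's data buffer.
fn pack_handle(cpu: u32, msr: u32) -> u64 {
    ((cpu as u64) << 32) | msr as u64
}

/// Inverse of pack_handle, as the kreadoff/kwriteoff dispatch would do.
fn unpack_handle(packed: u64) -> (u32, u32) {
    ((packed >> 32) as u32, packed as u32)
}

const MAX_ENTRIES: usize = 1024;

/// Fixed-capacity (packed key -> value) table with linear lookup.
struct MsrStore {
    entries: Vec<(u64, u64)>,
}

impl MsrStore {
    fn new() -> Self {
        MsrStore { entries: Vec::with_capacity(MAX_ENTRIES) }
    }

    /// Scans at most MAX_ENTRIES entries.
    fn read(&self, key: u64) -> u64 {
        self.entries.iter().find(|(k, _)| *k == key).map_or(0, |(_, v)| *v)
    }

    /// Updates in place, or appends; returns false if the table is full.
    fn write(&mut self, key: u64, value: u64) -> bool {
        if let Some(entry) = self.entries.iter_mut().find(|(k, _)| *k == key) {
            entry.1 = value;
            return true;
        }
        if self.entries.len() >= MAX_ENTRIES {
            return false;
        }
        self.entries.push((key, value));
        true
    }
}
```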

Verified by: `make` builds the kernel cleanly (1.2 MiB). The
existing sys scheme paths (kstop, cpu, irq, stat, etc.) are
untouched. The MSR module is a pure addition gated by path-prefix
matching.

Performance characteristics: each read or write does a linear
scan over at most 1024 stored (CPU, MSR) entries, so the cost of
an access is bounded. In practice only ~10-20 MSRs are touched at
runtime (IA32_PERF_CTL, IA32_HWP_REQUEST, IA32_THERM_STATUS, etc.)
so the scans stay short.

Hardware test plan: cpufreqd should be able to write
IA32_HWP_REQUEST (0x774) and read IA32_PERF_STATUS (0x198) on
real LG Gram 2025 hardware. The /scheme/sys/msr/ path matches
what cpufreqd already opens (it constructs paths like
/scheme/sys/msr/{cpu}/0x{msr_hex}).
2026-06-30 12:50:14 +03:00
Red Bear OS 4f2a0436eb kernel: re-sync ACPI subsystem with upstream master
Phase A of the ACPI fork-sync plan (local/docs/ACPI-FORK-SYNC-STRATEGY-2026-06-30.md).

Restores the kernel to the upstream Redox OS kernel main branch state for
the ACPI subsystem:

- Cargo.toml: switch redox_syscall from 0.7.4 (two versions behind) to a
  git ref of gitlab.redox-os.org/redox-os/syscall.git, matching the
  upstream master dependency. The crates.io 0.8.1 release predates the
  AcpiVerb enum that MR #613 / MR #275 introduced, so a crates.io pin
  is insufficient.

- src/acpi/rsdp.rs: full rewrite to match upstream f49c7d99 (RSDP
  validation + NonNull + fail-softly):
    * signature check "RSD PTR "
    * 20-byte base checksum
    * full-length checksum for revision >= 2
    * NonNull<u8> instead of *const u8
  Fixes gap #1 from the 2026-06-30 ACPI assessment: the kernel was
  accepting any pointer from the bootloader without validation.

- src/startup/mod.rs: acpi_rsdp() returns Option<NonNull<u8>> to match
  the new Rsdp::get_rsdp signature.

- src/acpi/mod.rs: init() takes Option<NonNull<u8>>.

- src/scheme/acpi.rs: full rewrite to upstream MR #613 (Simplify acpi
  scheme). Drops the /scheme/kernel.acpi/ filesystem surface in favor
  of a single Fd::open + call() interface with AcpiVerb verbs:
    * AcpiVerb::ReadRxsdt - returns the raw RXSDT bytes
    * AcpiVerb::CheckShutdown - returns whether shutdown is pending
  Uses HandleBits bitflags, atomic EXISTS_KSTOP_HANDLE, Mutex<L4> from
  crate::sync::ordered. Replaces /scheme/kernel.acpi/rxsdt and
  /scheme/kernel.acpi/kstop files.

- src/scheme/mod.rs: KernelScheme::kcall signature updated to take
  fds: &[usize] instead of id: usize (matches upstream). kfpath now
  has a default body returning EOPNOTSUPP (matches upstream).

- src/scheme/memory.rs, proc.rs, user.rs: kcall impls updated to
  match new trait signature, using fds.first() to extract the single
  handle for backward compat.

- src/scheme/proc.rs: kcall dispatch adds _ => Err(EINVAL) catch-all
  for the new ProcSchemeVerb variants (RegsInt, RegsFloat, RegsEnv,
  SchedAffinity, Start) that the gitlab syscall crate adds. These
  verbs are not yet implemented in the proc scheme; the catch-all
  returns EINVAL cleanly instead of failing to compile.

- src/syscall/fs.rs: SYS_CALL dispatcher now passes &[number] to
  scheme.kcall() to match the new trait signature.

- Makefile: removed -Z json-target-spec flag (promoted to stable in
  nightly 2026-04-01; the flag is unknown in our pinned toolchain).
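The RSDP checks in the rsdp.rs bullet above can be sketched over a byte slice. Offsets follow the ACPI RSDP layout (signature at byte 0, revision at byte 15, the u32 length at byte 20 for revision 2 and later, 36-byte extended structure). `validate_rsdp` is a hypothetical name, and the upstream code works on a NonNull<u8> rather than a slice.

```rust
/// ACPI checksummed regions sum to zero modulo 256.
fn sums_to_zero(region: &[u8]) -> bool {
    region.iter().fold(0u8, |acc, b| acc.wrapping_add(*b)) == 0
}

/// Signature, 20-byte base checksum, and (revision >= 2) full-length checksum.
fn validate_rsdp(bytes: &[u8]) -> bool {
    const V1_LEN: usize = 20;
    const V2_MIN_LEN: usize = 36;
    if bytes.len() < V1_LEN || !bytes.starts_with(b"RSD PTR ") {
        return false;
    }
    if !sums_to_zero(&bytes[..V1_LEN]) {
        return false;
    }
    let revision = bytes[15];
    if revision >= 2 {
        if bytes.len() < V2_MIN_LEN {
            return false;
        }
        // Total structure length, covered by the extended checksum.
        let length = u32::from_le_bytes([bytes[20], bytes[21], bytes[22], bytes[23]]) as usize;
        if length < V2_MIN_LEN || length > bytes.len() || !sums_to_zero(&bytes[..length]) {
            return false;
        }
    }
    true
}
```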

Verified by `make` in local/sources/kernel/ with PATH including the
prefix cross-toolchain: kernel builds and links successfully.
2026-06-30 04:09:05 +03:00
Red Bear OS 4cb9d80396 Add -Z json-target-spec for newer Rust nightly compatibility 2026-06-28 02:36:08 +03:00
Red Bear OS 82feefbaee Red Bear OS kernel baseline
From release 0.1.0 pre-patched archive.
This includes all Red Bear modifications previously maintained
as patches in local/patches/kernel/.
2026-06-27 09:19:25 +03:00