vasilito d37b421cb3 kernel: fix wakeup_contexts vs steal_work deadlock

Two-sided fix for the lock-ordering deadlock discovered by
Oracle review (Issue 24):

1. wakeup_contexts (this fn) held IDLE_CONTEXTS while
   waiting for SchedQueuesLock on its own CPU via
   SchedQueuesLock::new(&percpu.sched). If another CPU's
   steal_work was holding that SchedQueuesLock (via a victim
   SchedQueuesLock) and waiting for IDLE_CONTEXTS, both
   threads spin forever.

   Fix: drop idle_contexts immediately after building the
   wakeups Vec. The Vec is the only data we need; releasing
   the lock here means steal_work on another CPU can proceed
   while this CPU acquires its own SchedQueuesLock.

2. steal_work held a victim's SchedQueuesLock (victim_lock)
   while calling idle_contexts(token.downgrade()).push_back
   on a context that turned out to be Blocked. This is the
   matching side of the deadlock: CPU A held IDLE_CONTEXTS and
   waited for its own SchedQueuesLock; CPU B (steal_work) held
   CPU A's SchedQueuesLock and waited for IDLE_CONTEXTS.

   Fix: use idle_contexts_try (try_lock) instead of
   idle_contexts (blocking lock). If IDLE_CONTEXTS is busy
   (owned by wakeup_contexts on another CPU), skip the
   push-back; the context will be re-checked on the next
   wakeup round because it was not removed from IDLE_CONTEXTS
   (the Blocked status was set, but it stayed in IDLE_CONTEXTS
   because we never re-pushed it).
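
A minimal sketch of the pattern behind fix 1, assuming plain std::sync::Mutex in place of the kernel's own lock types. The function wake_ready, the (id, is_ready) tuple layout, and the run_queue parameter are hypothetical stand-ins for illustration, not the kernel's actual API. The point is only that the first guard is dropped before the second lock is requested, so the two locks are never held together:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Moves every ready context id from `idle` to `run_queue` without ever
/// holding both locks at the same time. Returns the number of ids moved.
/// (Hypothetical helper; the kernel uses its own lock and context types.)
fn wake_ready(idle: &Mutex<VecDeque<(u32, bool)>>, run_queue: &Mutex<Vec<u32>>) -> usize {
    // Phase 1: hold only the idle lock and collect what needs waking.
    let wakeups: Vec<u32> = {
        let mut guard = idle.lock().unwrap();
        let mut ready = Vec::new();
        guard.retain(|&(id, is_ready)| {
            if is_ready {
                ready.push(id);
                false // remove from the idle list
            } else {
                true // keep waiting
            }
        });
        ready
    }; // `guard` is dropped here, before the second lock is taken

    // Phase 2: hold only the run-queue lock.
    let moved = wakeups.len();
    run_queue.lock().unwrap().extend(wakeups);
    moved
}
```

Because the Vec owns its data, nothing in phase 2 depends on the idle list still being locked.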

The original code at line 429 used idle_contexts (blocking)
which is what makes this a real deadlock. try_lock is safe
because:
  - If try_lock succeeds, the context is correctly pushed
  - If try_lock fails, the context is still in IDLE_CONTEXTS
    (we never removed it), so the next wakeup_contexts will
    find it again
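
The try_lock behaviour described above can be sketched as follows, again assuming std::sync::Mutex rather than the kernel's locks. The name push_back_if_free is hypothetical and only illustrates the "push if the lock is free, otherwise skip" decision; it is not the signature of idle_contexts_try:

```rust
use std::collections::VecDeque;
use std::sync::Mutex;

/// Tries to push `id` onto `idle` without blocking. Intended for callers
/// that already hold another lock and therefore must not wait here.
/// Returns true if the id was pushed, false if the push was skipped.
fn push_back_if_free(idle: &Mutex<VecDeque<u32>>, id: u32) -> bool {
    match idle.try_lock() {
        Ok(mut guard) => {
            guard.push_back(id);
            true
        }
        // The lock is busy (or poisoned): skip instead of waiting.
        Err(_) => false,
    }
}
```

The caller never waits on the second lock, so the circular wait between the two CPUs cannot form.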

2026-07-02 10:36:17 +03:00

Name                         Last commit date            Last commit message
.helix                       2026-06-27 09:19:25 +03:00  Red Bear OS kernel baseline
linkers                      2026-06-27 09:19:25 +03:00  Red Bear OS kernel baseline
res                          2026-06-27 09:19:25 +03:00  Red Bear OS kernel baseline
rmm                          2026-06-27 09:19:25 +03:00  Red Bear OS kernel baseline
src                          2026-07-02 10:36:17 +03:00  kernel: fix wakeup_contexts vs steal_work deadlock
targets                      2026-06-27 09:19:25 +03:00  Red Bear OS kernel baseline
.gitignore                   2026-06-27 09:19:25 +03:00  Red Bear OS kernel baseline
.gitlab-ci.yml               2026-06-27 09:19:25 +03:00  Red Bear OS kernel baseline
.gitmodules                  2026-06-27 09:19:25 +03:00  Red Bear OS kernel baseline
ARM-AARCH64-PORT-OUTLINE.md  2026-06-27 09:19:25 +03:00  Red Bear OS kernel baseline
build.rs                     2026-06-27 09:19:25 +03:00  Red Bear OS kernel baseline
Cargo.lock                   2026-07-02 06:26:24 +03:00  kernel: futex 64-shard hash table (Phase 0c, plan order #1)
Cargo.toml                   2026-07-01 14:03:18 +03:00  kernel: [patch.crates-io] libredox + [patch.'<URL>'] redox_syscall for Phase J
clippy.sh                    2026-06-27 09:19:25 +03:00  Red Bear OS kernel baseline
config.toml.example          2026-06-27 09:19:25 +03:00  Red Bear OS kernel baseline
LICENSE                      2026-06-27 09:19:25 +03:00  Red Bear OS kernel baseline
Makefile                     2026-06-30 17:46:14 +03:00  kernel: restore -Z json-target-spec (required for .json target specs)
README.md                    2026-06-27 09:19:25 +03:00  Red Bear OS kernel baseline
rust-toolchain.toml          2026-06-27 09:19:25 +03:00  Red Bear OS kernel baseline
rustfmt.toml                 2026-06-27 09:19:25 +03:00  Red Bear OS kernel baseline

README.md

Kernel

Redox OS Microkernel

Requirements

nasm needs to be available on the PATH at build time.

Building The Documentation

Use this command:

cargo doc --open --target x86_64-unknown-none

Debugging

QEMU

Running QEMU with the -s flag makes it listen on port 1234 for a GDB client. To debug the Redox kernel, run:

make qemu gdb=yes

This will start a virtual machine and listen on port 1234 for a GDB or LLDB client.

GDB

If you are going to use GDB, run these commands to load debug symbols and connect to your running kernel:

(gdb) symbol-file build/kernel.sym
(gdb) target remote localhost:1234

LLDB

If you are going to use LLDB, run these commands to start debugging:

(lldb) target create -s build/kernel.sym build/kernel
(lldb) gdb-remote localhost:1234

After connecting to your kernel you can set some interesting breakpoints and continue the process. See your debugger's man page for more information on useful commands to run.

Notes

Always use foo.get(n) instead of foo[n], and handle the possibility of Option::None. Plain indexing may work fine for applications, but never use it in the kernel: foo[n] panics when n is out of bounds. No possible panics should ever exist in kernel space, because then the whole OS would just stop working.
If you receive a kernel panic in QEMU, use pkill qemu-system to kill the frozen QEMU process.
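
As a small illustration of the first note, here is a sketch of a lookup that cannot panic on an out-of-range index. The function entry_or and its fallback parameter are hypothetical names chosen for this example, not kernel code:

```rust
/// Returns the entry at index `n`, or `fallback` when `n` is out of range.
/// `table[n]` would panic in the out-of-range case; `get` returns None instead.
fn entry_or(table: &[u32], n: usize, fallback: u32) -> u32 {
    match table.get(n) {
        Some(&value) => value,
        None => fallback,
    }
}
```

In real kernel code the None arm would more likely return an error to the caller than a default value; the fallback here just keeps the sketch short.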

How To Contribute

To learn how to contribute to this system component you need to read the following document:

CONTRIBUTING.md

Development

To learn how to do development with this system component inside the Redox build system you need to read the Build System and Coding and Building pages.

How To Build

To build this system component you need to download the Redox build system. You can learn how to do it on the Building Redox page.

This is necessary because the component only works with cross-compilation to a Redox virtual machine, but you can do some testing from Linux.

Funding - Unix-style Signals and Process Management

This project is funded through NGI Zero Core, a fund established by NLnet with financial support from the European Commission's Next Generation Internet program. Learn more at the NLnet project page.
