IOMMU Specification Reference — AMD-Vi & Intel VT-d
Purpose: Implementation-ready hardware register and data structure reference for Red Bear OS IOMMU support. Based on AMD IOMMU Specification 48882 Rev 3.10 and Intel Virtualization Technology for Directed I/O (VT-d) Rev 5.0.
Status: The iommu daemon now builds in-tree, but hardware validation is still missing in the AMD-first integration plan (see AMD-FIRST-INTEGRATION.md). This document provides the register and data-structure reference for finishing AMD-Vi and Intel VT-d bring-up.
1. AMD-Vi (AMD IOMMU)
1.1 MMIO Register Map
Base address obtained from the ACPI IVRS table (IVHD entry IOMMUBaseAddress field).
| Offset | Name | Size | Access | Description |
|---|---|---|---|---|
| 0x0000 | DevTableBar | 64-bit | R/W | Device Table Base Address. Bits 12:51 hold the physical address. Bits 0:8 = DevTabSize, the table size in 4 KiB units minus 1 (bytes = (DevTabSize + 1) × 4096; 0x1FF = 2 MiB = 65536 entries). Must be 4KiB-aligned. |
| 0x0008 | CmdBufBar | 64-bit | R/W | Command Buffer Base Address. Bits 12:51 hold the physical address. Bits 56:59 = ComLen (entries = 2^ComLen, valid values 8 to 15; 0x9 = 512 entries = 8192 bytes). Must be 4KiB-aligned. |
| 0x0010 | EvtLogBar | 64-bit | R/W | Event Log Base Address. Bits 12:51 hold the physical address. Bits 56:59 = EventLen (entries = 2^EventLen, valid values 8 to 15; 0x9 = 512 entries = 8192 bytes). Must be 4KiB-aligned. |
| 0x0018 | Control | 64-bit | R/W | IOMMU Control Register. See bit layout below. |
| 0x0020 | ExclusionBase | 64-bit | R/W | Exclusion Range Base Address. Physical address of excluded region start. |
| 0x0028 | ExclusionLimit | 64-bit | R/W | Exclusion Range Limit Address. Physical address of excluded region end. |
| 0x0030 | ExtendedFeature | 64-bit | RO | Extended Feature Register. Capability flags. Read to determine supported features. |
| 0x0038 | PprLogBar | 64-bit | R/W | Peripheral Page Request Log Base Address. Bits 12:51 = address, bits 56:59 = PprLogLen (entries = 2^PprLogLen). |
| 0x2000 | CmdBufHead | 64-bit | R/W | Command Buffer Head Pointer. Byte offset of the next command the IOMMU will fetch (16-byte granular). |
| 0x2008 | CmdBufTail | 64-bit | R/W | Command Buffer Tail Pointer. Written by software to submit commands. |
| 0x2010 | EvtLogHead | 64-bit | R/W | Event Log Head Pointer. Written by software after reading events. |
| 0x2018 | EvtLogTail | 64-bit | RO | Event Log Tail Pointer. Updated by IOMMU hardware after writing event. |
| 0x2020 | Status | 32-bit | RO / RW1C | IOMMU Status Register. Run bits are read-only; interrupt and overflow bits are cleared by writing 1. See bit layout below. |
| 0x2028 | PprLogHead | 64-bit | R/W | PPR Log Head Pointer. |
| 0x2030 | PprLogTail | 64-bit | RO | PPR Log Tail Pointer. |
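The base-address registers pack a physical address and a size field into one 64-bit value. The sketch below shows one way to compose them. It assumes the encodings used by Linux's AMD IOMMU driver (DevTabSize in bits 0:8 as 4 KiB units minus 1; ComLen/EventLen in bits 56:59 as log2 of the entry count); the function and macro names are hypothetical.

```c
#include <stdint.h>

/* Bits 12:51 carry the 4 KiB-aligned physical address. */
#define AMDVI_BAR_ADDR_MASK 0x000FFFFFFFFFF000ULL

/* DevTableBar value. Assumes DevTabSize (bits 0:8) = size in 4 KiB units - 1.
 * `entries` must be at least 1; each DTE is 32 bytes. */
static inline uint64_t amdvi_dev_table_bar(uint64_t phys, uint32_t entries)
{
    uint64_t bytes = (uint64_t)entries * 32;
    uint64_t size_field = (bytes + 4095) / 4096 - 1;
    return (phys & AMDVI_BAR_ADDR_MASK) | (size_field & 0x1FF);
}

/* CmdBufBar / EvtLogBar value. Assumes the length field (bits 56:59) holds
 * log2(entries), with a hardware minimum of 256 entries (field value 8). */
static inline uint64_t amdvi_ring_bar(uint64_t phys, uint32_t entries)
{
    uint64_t len = 0;
    while (len < 15 && (1u << len) < entries)
        len++;                      /* log2, rounded up, capped at 15 */
    if (len < 8)
        len = 8;                    /* smaller rings are not encodable */
    return (phys & AMDVI_BAR_ADDR_MASK) | (len << 56);
}
```

For a full 65536-entry device table at 0x100000 this yields 0x1001FF, and a 512-entry ring yields a length field of 9.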
Control Register (0x0018) Bit Layout
| Bit | Name | Description |
|---|---|---|
| 0 | IOMMUEnable | 0 = IOMMU translations disabled, 1 = enabled. Set only after the device table, command buffer, and event log are programmed. |
| 1 | HTTunEn | HyperTransport Tunnel Enable. Set 0 for modern systems. |
| 2 | EventLogEn | Event Log Enable. Set 1 to enable event logging. |
| 3 | EventIntEn | Event Log Interrupt Enable. Set 1 to generate an interrupt when an event is logged or the log overflows. |
| 4 | ComWaitIntEn | Completion Wait Interrupt Enable. |
| 5:7 | InvTimeOut | Invalidation time-out for IOTLB invalidations. 0 = no time-out. |
| 8 | PassPW | Pass Posted Write. |
| 9 | ResPassPW | Response Pass Posted Write. |
| 10 | Coherent | Coherent accesses to IOMMU data structures. |
| 11 | Isoc | Isochronous channel enable. |
| 12 | CmdBufEn | Command Buffer Enable. Set 1 to enable command processing. |
| 13 | PprLogEn | Peripheral Page Request Log Enable. |
| 14 | PprIntEn | PPR Log Interrupt Enable. |
| 15 | PprEn | Peripheral Page Request Processing Enable. |
| 16 | GTEn | Guest Translation Enable. |
| 17 | GAEn | Guest APIC (Advanced Programmable Interrupt Controller) Enable. Selects 128-bit IRTEs. |
| 22 | SmiFEn | SMI Filter Enable. |
| 24 | SmiFLogEn | SMI Filter Log Enable. |
| 25:27 | GAMEn | Guest APIC Mode. |
| 28 | GALogEn | Guest APIC Log Enable. |
| 29 | GAIntEn | Guest APIC Log Interrupt Enable. |
| 50 | XTEn | x2APIC Enable. Selects 128-bit IRTEs with 32-bit destination IDs. |
| 51 | IntCapXTEn | Deliver IOMMU-generated interrupts through the x2APIC interrupt control registers. |
Bit positions follow CONTROL_* in Linux drivers/iommu/amd/amd_iommu_types.h. Bits not listed are reserved or control features outside the scope of this reference.
Status Register (0x2020) Bit Layout
| Bit | Name | Description |
|---|---|---|
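| 0 | EventOverflow | 1 = Event log overflow occurred. Write 1 to clear. |
| 1 | EventLogInt | 1 = Event log interrupt pending. Write 1 to clear. |
| 2 | ComWaitInt | 1 = Completion wait interrupt pending. Write 1 to clear. |
| 3 | EventLogRun | 1 = Event logging is running (read-only). |
| 4 | CmdBufRun | 1 = Command buffer processing is running (read-only). |
| 5 | PprOverflow | 1 = PPR log overflow. Write 1 to clear. |
| 6 | PprInt | 1 = PPR log interrupt pending. Write 1 to clear. |
| 7 | PprLogRun | 1 = PPR logging is running (read-only). |
| 8 | GALogRun | 1 = Guest APIC logging is running (read-only). |
| 9 | GALogOverflow | 1 = Guest APIC log overflow. Write 1 to clear. |
| 10 | GALogInt | 1 = Guest APIC log interrupt pending. Write 1 to clear. |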
Extended Feature Register (0x0030) Bit Layout
| Bit | Name | Description |
|---|---|---|
| 0 | PrefSup | Prefetch Support. |
| 1 | PPRSup | Peripheral Page Request Support. |
| 2 | XTSup | x2APIC Support. |
| 3 | NXSup | No-Execute Support. |
| 4 | GTSup | Guest Translation Support. |
| 5 | Rsvd | Reserved. |
| 6 | IASup | Invalidate IOMMU All Support. |
| 7 | GASup | Guest APIC Support. |
| 8 | HESup | Hardware Error Registers Support. |
| 9 | PCSup | Performance Counters Support. |
| 10:11 | HATS | Host Address Translation Size. Maximum host page table levels: 00 = 4, 01 = 5, 10 = 6. |
| 12:13 | GATS | Guest Address Translation Size. |
| 14:15 | GLXSup | Maximum number of guest CR3 table levels. |
| 32:36 | PASmax | Maximum PASID width. Supported PASID bits = PASmax + 1. |
| 48 | GIoSup | Guest I/O Protection Support. |
| 49 | HASup | Host Access (accessed/dirty bit) Support. |
1.2 Device Table Entry (DTE)
The Device Table holds up to 65536 entries indexed by BDF (Bus:Device:Function). Each entry is 256 bits (32 bytes). The table must be contiguous in physical memory.
Table size: entries × 32 bytes. With 65536 entries, max 2 MiB.
DTE layout (256 bits = data[0] data[1] data[2] data[3], each u64; bit positions follow Linux drivers/iommu/amd/amd_iommu_types.h):
data[0] (bits 0-63):
[ 0] V — Valid. 1 = entry is valid.
[ 1] TV — Translation Valid. 1 = the Mode, PTP, IR, and IW fields are valid.
[ 2:6] Reserved
[ 7:8] HAD — Host Access/Dirty bit update enable.
[ 9:11] Mode — Translation mode:
000 = Translation disabled (DMA passes through untranslated, subject to IR/IW)
001 = 1-level page table
010 = 2-level page table
011 = 3-level page table
100 = 4-level page table
101 = 5-level page table
110 = 6-level page table
111 = Reserved
[12:51] PTP — Page Table Root Pointer. Physical address of top-level page table.
Must be 4KiB-aligned. Bits 0:11 of the address are assumed zero.
[52] PPR — Peripheral Page Request enable.
[53] GPRP — Guest PPR response with PASID.
[54] GIoV — Guest I/O protection valid.
[55] GV — Guest Translation Valid.
[56:57] GLX — Guest Levels (number of GCR3 table levels).
[58:60] GCR3Trp0 — Guest CR3 Table Root Pointer bits 12:14.
[61] IR — DMA Read permission. 1 = device may read.
[62] IW — DMA Write permission. 1 = device may write.
[63] Reserved
data[1] (bits 64-127):
[0:15] DomainID — Domain identifier used to tag IOTLB entries.
[16:31] GCR3Trp1 — Guest CR3 Table Root Pointer bits 15:30.
[32] I — IOTLB Enable. 1 = device may use ATS.
[33] SE — Suppress I/O page fault events (first fault still logged).
[34] SA — Suppress all I/O page fault events.
[35:36] IoCtl — Port I/O control.
[37] Cache — IOTLB cache hint.
[38] SD — Snoop Disable.
[39] EX — Allow Exclusion (honor the exclusion range for this device).
[40:41] SysMgt — System management message handling.
[42] Reserved
[43:63] GCR3Trp2 — Guest CR3 Table Root Pointer bits 31:51.
data[2] (bits 128-191):
[0] IV — Interrupt Map Valid. 1 = the interrupt fields below are valid.
[1:4] IntTabLen — Interrupt Remap Table Length. Number of entries = 2^IntTabLen.
0 = 1 entry, 8 = 256 entries, 11 = 2048 entries (maximum).
[5] IG — Ignore unmapped interrupts (do not log a fault).
[6:51] IRTP — Interrupt Remap Table Pointer. Bits 6:51 of the table's physical
address. The table must be 128-byte aligned.
[52:55] Reserved
[56] InitPass — INIT pass-through.
[57] EIntPass — ExtInt pass-through.
[58] NMIPass — NMI pass-through.
[59] Reserved
[60:61] IntCtl — Interrupt Control. 00 = abort, 01 = pass-through (no remap),
10 = remapped, 11 = reserved.
[62] Lint0Pass — LINT0 pass-through.
[63] Lint1Pass — LINT1 pass-through.
data[3] (bits 192-255):
[0:63] Reserved and attribute fields not needed for basic DMA or interrupt
remapping. Write zero.
Key constants (adapted from Linux drivers/iommu/amd/amd_iommu_types.h; all apply to data[0] unless noted):
#define DTE_FLAG_V (1ULL << 0)
#define DTE_FLAG_TV (1ULL << 1)
#define DTE_FLAG_IR (1ULL << 61) // DMA read permission
#define DTE_FLAG_IW (1ULL << 62) // DMA write permission
#define DTE_MODE_MASK 0x0E00ULL // bits 9:11
#define DTE_PT_ADDR_MASK 0x0FFFFFFFFFF000ULL // bits 12:51
#define DEV_DOMID_MASK 0x0FFFFULL // domain ID in bits 0:15 of data[1]
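As an illustration of how these constants combine, the sketch below builds data[0] for a translated device and computes a DTE's byte offset from its BDF. It assumes, as Linux does, that bits 61 and 62 are the DMA read and write permissions; `dte_data0` and `dte_byte_offset` are hypothetical helper names.

```c
#include <stdint.h>

#define DTE_FLAG_V        (1ULL << 0)
#define DTE_FLAG_TV       (1ULL << 1)
#define DTE_FLAG_IR       (1ULL << 61)
#define DTE_FLAG_IW       (1ULL << 62)
#define DTE_MODE_MASK     0x0E00ULL              /* bits 9:11  */
#define DTE_PT_ADDR_MASK  0x0FFFFFFFFFF000ULL    /* bits 12:51 */

/* data[0] for a device with read/write DMA through a page table whose
 * root is at root_phys and which has `mode` levels (1..6). */
static inline uint64_t dte_data0(uint64_t root_phys, unsigned mode)
{
    return DTE_FLAG_V | DTE_FLAG_TV | DTE_FLAG_IR | DTE_FLAG_IW
         | (((uint64_t)mode << 9) & DTE_MODE_MASK)
         | (root_phys & DTE_PT_ADDR_MASK);
}

/* Byte offset of a device's DTE inside the device table (32 bytes each). */
static inline uint32_t dte_byte_offset(uint8_t bus, uint8_t dev, uint8_t fn)
{
    uint32_t bdf = ((uint32_t)bus << 8) | ((dev & 0x1Fu) << 3) | (fn & 0x7u);
    return bdf * 32;
}
```

Writing data[0] last, after the other three words, avoids exposing a valid entry with stale fields; the DTE still needs an INVALIDATE_DEVTAB_ENTRY afterwards.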
1.3 Interrupt Remapping Table Entry (IRTE)
The Interrupt Remap Table is pointed to by the IRTP field in the DTE. Entries are 32 bits in legacy mode and 128 bits (16 bytes) when guest APIC or x2APIC support is enabled (GAEn or XTEn in the Control register). Length is 2^IntTabLen entries. Field positions follow the IRTE unions in Linux drivers/iommu/amd/amd_iommu_types.h.
Legacy IRTE (32 bits):
[0] RemapEn — Remap Enable. 1 = this entry is valid for remapping.
[1] SupIOPF — Suppress I/O Page Faults. 1 = suppress faults from this interrupt.
[2:4] IntType — Interrupt Type:
000 = Fixed
001 = Arbitrated (lowest priority)
other values reserved
[5] RqEoi — Request EOI.
[6] DM — Destination Mode. 0 = Physical, 1 = Logical.
[7] GuestMode — Must be 0 in a legacy IRTE.
[8:15] Destination — Destination APIC ID (8 bits, xAPIC).
[16:23] Vector — Interrupt vector.
[24:31] Rsvd — Reserved.
128-bit IRTE, remapped format (GuestMode=0), data[0] data[1], each u64:
data[0]:
[0] RemapEn — 1 = valid
[1] SupIOPF — Suppress I/O Page Fault
[2:4] IntType — Interrupt type (same as above)
[5] RqEoi — Request EOI
[6] DM — 0 = physical, 1 = logical
[7] GuestMode — 0 for the remapped format
[8:31] DstIDLo — Destination APIC ID bits 0:23
[32:63] Rsvd
data[1]:
[0:7] Vector — Interrupt vector
[8:55] Rsvd
[56:63] DstIDHi — Destination APIC ID bits 24:31 (x2APIC). Zero for xAPIC.
1.4 Command Buffer Entry
The command buffer is a circular queue. Each entry is 128 bits (16 bytes = 4 × u32). Software writes to the tail, hardware reads from the head. Base address in CmdBufBar, head/tail pointers in CmdBufHead/CmdBufTail.
Buffer sizing: 8192 bytes default (512 entries). Entries = 2^ComLen, so ComLen = 9 for 512 entries.
Command Buffer Entry (128 bits = word[0] word[1] word[2] word[3], each u32):
word[0]:
[0:31] Varies — Opcode-specific operands
word[1]:
[0:27] Varies — Opcode-specific operands
[28:31] Opcode — Command opcode (see below)
word[2], word[3]:
Opcode-specific payload. See each command format below.
COMPLETION_WAIT (Opcode 0x01)
Completes only after all earlier commands have finished. Can generate an interrupt or write a value to memory.
word[0]: [0]=S (Store: 1=write Store Data to Store Address), [1]=I (Interrupt: 1=generate IRQ),
[2]=F (Flush queue), [3:31] Store Address bits 3:31 (physical, 8-byte aligned)
word[1]: [0:19] Store Address bits 32:51, [20:27] Reserved, [28:31]=0x1
word[2]: [0:31] Store Data low 32 bits
word[3]: [0:31] Store Data high 32 bits
INVALIDATE_DEVTAB_ENTRY (Opcode 0x02)
Invalidates a single device table entry. Must be issued after modifying a DTE.
word[0]: [0:15] DeviceId (BDF format: Bus[15:8] | Dev[7:3] | Func[2:0])
[16:31] Reserved
word[1]: [0:27] Reserved, [28:31]=0x2
word[2]: [0:31] Reserved
word[3]: [0:31] Reserved
INVALIDATE_IOMMU_PAGES (Opcode 0x03)
Invalidates translation cache (TLB) entries for a range of pages.
word[0]: [0:19] PASID (only when GN=1), [20:31] Reserved
word[1]: [0:15] DomainId — domain to invalidate
[16:27] Reserved, [28:31]=0x3
word[2]: [0]=S (Size: 0=invalidate one 4KiB page, 1=range size is encoded in the low address bits),
[1]=PDE (Page Directory Entry: 1=invalidate PDE cache too),
[2]=GN (1=address is a guest virtual address),
[12:31] Address bits 12:31
word[3]: [0:31] Address bits 32:63
To invalidate every page in the domain, set S=1 and Address=0x7FFFFFFFFFFFF000.
INVALIDATE_INTERRUPT_TABLE (Opcode 0x05)
Invalidates the interrupt remap cache for a device.
word[0]: [0:15] DeviceId (BDF format)
[16:31] Reserved
word[1]: [0:27] Reserved, [28:31]=0x5
word[2]: [0:31] Reserved
word[3]: [0:31] Reserved
INVALIDATE_IOMMU_ALL (Opcode 0x08)
Invalidates all IOMMU caches (TLB, DTE, IRTE). Available when IASup=1.
word[0]: [0:31] Reserved
word[1]: [0:27] Reserved, [28:31]=0x8
word[2]: [0:31] Reserved
word[3]: [0:31] Reserved
Opcode 0x04 is INVALIDATE_IOTLB_PAGES (device-side ATS cache invalidation), which is not needed for basic bring-up.
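The command formats can be exercised with a few small encoders. This is a minimal sketch that assumes the layout used by Linux's AMD IOMMU driver, where the opcode sits in bits 28:31 of the second dword and the tail pointer is a byte offset; the struct and function names are hypothetical.

```c
#include <stdint.h>

struct amdvi_cmd { uint32_t w[4]; };

enum {
    AMDVI_CMD_COMPLETION_WAIT  = 0x1,
    AMDVI_CMD_INV_DEVTAB_ENTRY = 0x2,
};

/* COMPLETION_WAIT with S=1: the IOMMU writes `data` to the 8-byte aligned
 * physical address `store_phys` once all earlier commands have completed. */
static inline struct amdvi_cmd cmd_completion_wait(uint64_t store_phys,
                                                   uint64_t data)
{
    struct amdvi_cmd c = { { 0, 0, 0, 0 } };
    c.w[0] = ((uint32_t)store_phys & ~0x7u) | 0x1u;          /* S bit */
    c.w[1] = ((uint32_t)(store_phys >> 32) & 0xFFFFFu)
           | ((uint32_t)AMDVI_CMD_COMPLETION_WAIT << 28);
    c.w[2] = (uint32_t)data;
    c.w[3] = (uint32_t)(data >> 32);
    return c;
}

/* INVALIDATE_DEVTAB_ENTRY for one device ID (BDF). */
static inline struct amdvi_cmd cmd_inv_devtab_entry(uint16_t devid)
{
    struct amdvi_cmd c = { { 0, 0, 0, 0 } };
    c.w[0] = devid;
    c.w[1] = (uint32_t)AMDVI_CMD_INV_DEVTAB_ENTRY << 28;
    return c;
}

/* Next tail value (byte offset) after queuing one 16-byte command. */
static inline uint32_t cmd_next_tail(uint32_t tail, uint32_t ring_bytes)
{
    return (tail + 16) % ring_bytes;
}
```

A typical submission copies the command to `base + tail`, issues a store fence, writes the result of `cmd_next_tail` to CmdBufTail, and then polls the store address until the completion value appears.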
1.5 Event Log Entry
The event log is a circular queue written by the IOMMU hardware. Each entry is 128 bits (16 bytes = 4 × u32). Base address in EvtLogBar.
Buffer sizing: 8192 bytes default (512 entries). Entries = 2^EventLen, so EventLen = 9 for 512 entries.
Event Log Entry (128 bits = word[0] word[1] word[2] word[3]):
word[0]:
[0:15] DeviceId — Source device (BDF) for device-related events
[16:31] Varies — Event-specific data
word[1]:
[0:15] Varies — DomainId or PASID, event-specific
[16:27] EventFlags — Event-specific flags
[28:31] EventCode — Event type code (see below)
word[2], word[3]:
Event-specific address. See each event type below.
ILLEGAL_DEV_TABLE_ENTRY (Event Code 0x01)
Generated when hardware detects an invalid DTE during a transaction.
word[0]: [0:15] DeviceId (BDF), [16:31] Reserved or PASID
word[1]: [16:27] Flags, [28:31]=0x1
word[2]: [0:31] Address of the access low 32 bits
word[3]: [0:31] Address of the access high 32 bits
IO_PAGE_FAULT (Event Code 0x02)
Generated when a device accesses an address that fails translation.
word[0]: [0:15] DeviceId (BDF), [16:19] PASID bits 16:19
word[1]: [0:15] DomainId (or PASID bits 0:15 for guest requests),
[19] I (Interrupt: 1=interrupt request), [20] PR (Present: 1=PTE was present),
[21] RW (1=write, 0=read), [22] PE (Permission Error: 1=permission violation),
[23] RZ (1=a reserved bit was non-zero), [24] TR (1=translation request, 0=transaction),
[28:31]=0x2
word[2]: [0:31] Fault Address low 32 bits
word[3]: [0:31] Fault Address high 32 bits
ILLEGAL_COMMAND_ERROR (Event Code 0x05)
Generated when an invalid command is detected in the command buffer.
word[0]: [0:31] Reserved
word[1]: [0:27] Reserved, [28:31]=0x5
word[2]: [0:31] Physical address of the illegal command (low)
word[3]: [0:31] Physical address of the illegal command (high)
COMMAND_HARDWARE_ERROR (Event Code 0x06)
Hardware error during command processing.
word[0]: [0:31] Reserved
word[1]: [16:27] Error flags, [28:31]=0x6
word[2]: [0:31] Error address low
word[3]: [0:31] Error address high
Other codes defined by Linux: 0x03 DEV_TAB_HARDWARE_ERROR, 0x04 PAGE_TAB_HARDWARE_ERROR, 0x07 IOTLB_INV_TIMEOUT, 0x08 INVALID_DEVICE_REQUEST.
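An event handler usually starts by splitting an entry into its common fields. The sketch below assumes the Linux driver's view of the entry (event code in bits 28:31 of the second dword, flags in bits 16:27, device ID in the low half of the first dword); `struct amdvi_event` and `evt_decode` are hypothetical names.

```c
#include <stdint.h>

struct amdvi_event {
    uint8_t  code;    /* event type, e.g. 0x2 = IO_PAGE_FAULT */
    uint16_t devid;   /* source BDF                          */
    uint16_t flags;   /* 12 event-specific flag bits         */
    uint64_t addr;    /* faulting or offending address       */
};

/* Flag bits within `flags` for IO_PAGE_FAULT (word[1] bit N => flag N-16). */
#define EVT_FLAG_I   (1u << 3)
#define EVT_FLAG_PR  (1u << 4)
#define EVT_FLAG_RW  (1u << 5)
#define EVT_FLAG_PE  (1u << 6)
#define EVT_FLAG_RZ  (1u << 7)
#define EVT_FLAG_TR  (1u << 8)

static inline struct amdvi_event evt_decode(const uint32_t w[4])
{
    struct amdvi_event e;
    e.code  = (uint8_t)((w[1] >> 28) & 0xF);
    e.devid = (uint16_t)(w[0] & 0xFFFF);
    e.flags = (uint16_t)((w[1] >> 16) & 0xFFF);
    e.addr  = ((uint64_t)w[3] << 32) | w[2];
    return e;
}
```

After consuming an entry, software advances EvtLogHead by 16 bytes (wrapping at the log size) so the hardware can reuse the slot.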
1.6 IVRS ACPI Table
The IVRS (I/O Virtualization Reporting Structure) is the ACPI table that describes AMD IOMMU topology. Found by scanning ACPI tables with signature "IVRS" (0x53525649 when read as a little-endian u32).
IVRS Header (48 bytes: 36-byte ACPI header, IvInfo, 8 reserved bytes)
Offset Size Field Description
0x00 4 Signature "IVRS" (0x53525649)
0x04 4 Length Total table length in bytes
0x08 1 Revision 2 = revision 2 (AMD-Vi), 3 = revision 3
0x09 1 Checksum ACPI checksum (sum of all bytes = 0)
0x0A 6 OemId OEM identifier
0x10 8 OemTableId OEM table identifier
0x18 4 OemRevision OEM revision
0x1C 4 CreatorId ASL compiler vendor
0x20 4 CreatorRevision ASL compiler revision
0x24 4 IvInfo IOMMU Virtualization Info:
[0] = EFRSup (Extended Feature Register supported)
[5:7] = GVASize (guest virtual address size)
[8:14] = PASize (physical address size in bits)
[15:21] = VASize (virtual address size in bits)
[22] = HtAtsResv (HT ATS address range reserved)
0x28 8 Reserved Reserved. IVHD/IVMD blocks start at offset 0x30.
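Before parsing any blocks it is worth validating the table itself. The following sketch checks the signature, the declared length, and the ACPI checksum of a mapped IVRS table; `ivrs_header_ok` is a hypothetical helper, and the 48-byte minimum assumes the standard 36-byte ACPI header followed by IvInfo and 8 reserved bytes.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IVRS_HEADER_LEN 48u

/* Returns 1 if `table` (with `avail` readable bytes) looks like a valid
 * IVRS table: signature matches, the declared length fits, and all bytes
 * of the table sum to zero modulo 256. */
static inline int ivrs_header_ok(const uint8_t *table, size_t avail)
{
    if (avail < IVRS_HEADER_LEN || memcmp(table, "IVRS", 4) != 0)
        return 0;

    /* Length is a little-endian u32 at offset 4. */
    uint32_t len = (uint32_t)table[4] | ((uint32_t)table[5] << 8)
                 | ((uint32_t)table[6] << 16) | ((uint32_t)table[7] << 24);
    if (len < IVRS_HEADER_LEN || len > avail)
        return 0;

    uint8_t sum = 0;
    for (uint32_t i = 0; i < len; i++)
        sum = (uint8_t)(sum + table[i]);
    return sum == 0;
}
```

The same checksum rule applies to the DMAR table, so the summing loop can be shared between both parsers.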
IVHD Entry (I/O Virtualization Hardware Definition)
Describes a single IOMMU unit. There can be multiple IVHD entries for multiple IOMMUs.
Offset Size Field Description
0x00 1 Type 0x10 = legacy IVHD, 0x11 = IVHD with EFR image, 0x40 = type 11 layout plus ACPI HID device entries
0x01 1 Flags Feature flags:
[0] = HtTunEn (HT tunnel enable)
[1] = PassPW (Pass posted writes)
[2] = ResPassPW (Reset PassPW)
[3] = Isoc (Isoc support)
[4] = IotlbSup (IOTLB support)
[5] = Coherent (Coherent IOMMU)
[6] = PrefSup (Prefetch support)
[7] = PPRSup (PPR support)
0x02 2 Length Total length of this IVHD entry including device entries
0x04 2 DeviceId BDF of the IOMMU PCI device
0x06 2 CapabilityOffset PCI capability offset for IOMMU capability block
0x08 8 IOMMUBaseAddress Physical MMIO base address of IOMMU registers
(type 10: bits 0:51 valid, type 11: full 64-bit)
0x10 2 PciSegmentGroup PCI segment group number
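0x12 2 IommuInfo IOMMU Info:
[0:4] = MSI number for event log
[5:7] = Reserved
[8:12] = Unit ID (IOMMU hardware unit ID)
[13:15] = Reserved
0x14 4 FeatureInfo Type 0x10: IOMMU feature reporting. Types 0x11/0x40: IOMMU attributes.
0x18 8 EfrImage Extended Feature Register image (types 0x11/0x40 only)
0x20 8 Reserved Reserved (types 0x11/0x40 only)
... ... DeviceEntries Variable-length device entry list. Starts at 0x18 for type 0x10 and at 0x28 for types 0x11/0x40.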
IVHD Device Entry Types
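Each device entry in an IVHD starts with a type byte. Byte 0 = type, bytes 1:2 = DeviceId (BDF), byte 3 = Data (settings). Types 0x00-0x3F are 4 bytes long and types 0x40-0x7F are 8 bytes long. Type values follow IVHD_DEV_* in Linux drivers/iommu/amd/init.c.
| Type | Name | Size | Description |
|---|---|---|---|
| 0x00 | IVHD_PAD | 4 | Padding. Ignored. |
| 0x01 | IVHD_ALL | 4 | Data settings apply to all devices. DeviceId is ignored. |
| 0x02 | IVHD_SEL | 4 | Select a single device. Bytes 1:2 = DeviceId (BDF). Byte 3 = Data. |
| 0x03 | IVHD_SOR | 4 | Start of Range. Bytes 1:2 = first DeviceId in range. Byte 3 = Data for the range. |
| 0x04 | IVHD_EOR | 4 | End of Range. Bytes 1:2 = last DeviceId in range. |
| 0x42 | IVHD_ALIAS_SEL | 8 | Alias select. Bytes 5:6 = DeviceId the device's requests appear to come from. |
| 0x43 | IVHD_ALIAS_SOR | 8 | Alias start of range. Same layout as 0x42; closed by an 0x04 entry. |
| 0x46 | IVHD_EXT_SEL | 8 | Extended select. Bytes 4:7 = extended data settings. |
| 0x47 | IVHD_EXT_SOR | 8 | Extended start of range. Closed by an 0x04 entry. |
| 0x48 | IVHD_SPECIAL | 8 | IOAPIC or HPET. Byte 4 = handle (e.g. IOAPIC ID), bytes 5:6 = DeviceId, byte 7 = variety (1 = IOAPIC, 2 = HPET). |
| 0xF0 | IVHD_ACPI_HID | Variable | ACPI HID device (IVHD type 0x40 only). |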
IVHD Device Entry Data Byte
Bits of the Data byte in IVHD_ALL/IVHD_SEL/IVHD_SOR (these are copied into the matching DTE fields):
[0] InitPass — INIT remapping passthrough
[1] EIntPass — External Interrupt remapping passthrough
[2] NMIPass — NMI remapping passthrough
[3] Rsvd — Reserved
[4:5] SysMgt — System management message handling (DTE SysMgt field)
[6] Lint0Pass — LINT0 remapping passthrough
[7] Lint1Pass — LINT1 remapping passthrough
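Walking the device entry list requires knowing each entry's length from its type byte. The sketch below uses the rule applied by Linux's IVRS parser (fixed-length entries grow with the top two type bits) and decodes the Data byte with the Linux ACPI_DEVFLAG_* bit positions; the function and macro names here are hypothetical.

```c
#include <stdint.h>

/* Data byte flags (assumed positions, matching Linux ACPI_DEVFLAG_*). */
#define IVHD_FLAG_INITPASS   0x01u
#define IVHD_FLAG_EINTPASS   0x02u
#define IVHD_FLAG_NMIPASS    0x04u
#define IVHD_FLAG_SYSMGT     0x30u   /* two-bit field, bits 4:5 */
#define IVHD_FLAG_LINT0PASS  0x40u
#define IVHD_FLAG_LINT1PASS  0x80u

/* Length in bytes of a fixed-size device entry, or 0 for variable-length
 * types (0x80 and above, e.g. 0xF0 ACPI HID), which carry their own length. */
static inline unsigned ivhd_dev_entry_len(uint8_t type)
{
    if (type >= 0x80)
        return 0;
    return 4u << (type >> 6);   /* 0x00-0x3F: 4 bytes, 0x40-0x7F: 8 bytes */
}

/* Extract the two-bit SysMgt value from a Data byte. */
static inline unsigned ivhd_sysmgt(uint8_t data)
{
    return (data & IVHD_FLAG_SYSMGT) >> 4;
}
```

A parser can then advance through the IVHD block with `offset += ivhd_dev_entry_len(type)` and treat a zero return as a signal to use the entry's own length rules.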
IVMD Entry (I/O Virtualization Memory Definition)
Describes a memory region that has special IOMMU handling. Appears after IVHD entries.
Offset Size Field Description
0x00 1 Type 0x20 = all devices, 0x21 = one device, 0x22 = range of devices
0x01 1 Flags Memory block flags:
[0] = Unity (untranslated/unity mapping)
[1] = IR (device may read)
[2] = IW (device may write)
[3] = ExclRange (exclusion range)
0x02 2 Length Total length of this IVMD entry (32 bytes)
0x04 2 DeviceId DeviceId (BDF) for type 0x21, first DeviceId for type 0x22, ignored for type 0x20
0x06 2 AuxData Last DeviceId for type 0x22; reserved otherwise
0x08 8 Reserved Reserved
0x10 8 StartAddress Physical start address of the memory region
0x18 8 MemoryLength Length of the memory region in bytes
1.7 Page Table Entry (PTE)
AMD-Vi page tables use multi-level radix tree. The number of levels is set by the DTE Mode field (1 to 6 levels). Each PTE is 64 bits.
PTE (64 bits):
[0] PR — Present. 1 = this entry maps a valid page or points to next level.
[1:4] Ign — Ignored by hardware.
[5] A — Accessed (updated by hardware when HAD is enabled in the DTE).
[6] D — Dirty (updated by hardware when HAD is enabled in the DTE).
[7:8] Ign — Ignored by hardware.
[9:11] NextLevel — Level of the table this entry points to (1 to 6).
0 = leaf entry mapping a page of the default size for this level.
7 = leaf entry whose page size is encoded in the address bits.
At leaf level (PR=1, NextLevel=0 or 7): bits 12:51 = physical page frame.
At non-leaf level (PR=1, NextLevel=1..6): bits 12:51 = next table address.
[12:51] OutputAddr — Physical address of page frame (leaf) or next-level table (non-leaf).
Must be 4KiB-aligned (bits 0:11 assumed zero).
[52:58] Rsvd — Reserved. Must be zero.
[59] U — User attribute, passed to the device in ATS responses.
[60] FC — Force Coherent. 1 = force coherent transactions for this page.
[61] IR — Read permission. 1 = device may read this page.
[62] IW — Write permission. 1 = device may write to this page.
[63] Ign — Ignored by hardware.
Level-to-address-bits mapping:
| Levels | Address Bits | Max IOVA Range |
|---|---|---|
| 1 | 21 | 2 MiB |
| 2 | 30 | 1 GiB |
| 3 | 39 | 512 GiB |
| 4 | 48 | 256 TiB |
| 5 | 57 | 128 PiB |
| 6 | 63 | ~8 EiB |
Page table macros (adapted from Linux drivers/iommu/amd/amd_iommu_types.h; here `level` is 1-based, so level 1 is the leaf table):
#define PM_LEVEL_SHIFT 9
#define PM_LEVEL_SIZE (1UL << PM_LEVEL_SHIFT)
#define PM_LEVEL_INDEX(level, address) \
(((address) >> (12 + (((level) - 1) * 9))) & 0x1FF)
#define PM_LEVEL_ENC(level, address) \
((address) | (((level) - 1) << 9) | 1ULL) // PR=1, NextLevel=level-1
#define PM_PTE_LEVEL(pte) (((pte) >> 9) & 0x7)
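To make the index arithmetic concrete, the sketch below computes the table index for a given level and encodes a page table entry. It uses the 1-based level convention of the macros above and assumes the read and write permission bits sit at bits 61 and 62, as in Linux; the helper names are hypothetical.

```c
#include <stdint.h>

#define PTE_PR         (1ULL << 0)
#define PTE_IR         (1ULL << 61)
#define PTE_IW         (1ULL << 62)
#define PTE_ADDR_MASK  0x000FFFFFFFFFF000ULL   /* bits 12:51 */

/* Index into the table at `level` (1 = leaf table) for an I/O virtual
 * address. Each level resolves 9 address bits above the 12-bit page offset. */
static inline unsigned pm_level_index(unsigned level, uint64_t iova)
{
    return (unsigned)((iova >> (12 + (level - 1) * 9)) & 0x1FF);
}

/* Encode an entry. next_level = 0 makes a leaf that maps `phys`;
 * next_level = N (1..6) points at a level-N table located at `phys`. */
static inline uint64_t pm_make_entry(uint64_t phys, unsigned next_level,
                                     int readable, int writable)
{
    uint64_t e = (phys & PTE_ADDR_MASK) | ((uint64_t)(next_level & 0x7) << 9)
               | PTE_PR;
    if (readable)
        e |= PTE_IR;
    if (writable)
        e |= PTE_IW;
    return e;
}
```

With a 4-level table, mapping one 4 KiB page means walking `pm_level_index(4, iova)` down to `pm_level_index(1, iova)`, creating intermediate tables with `next_level` set to 3, 2, and 1 along the way.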
1.8 Initialization Sequence (AMD-Vi)
Step-by-step register programming to bring up AMD-Vi IOMMU.
Step 1: Discover IOMMU hardware
- Scan ACPI tables for IVRS signature
- Parse IVHD entries to find MMIO base address
- Read ExtendedFeature (0x0030) to determine capabilities
Step 2: Disable IOMMU (ensure clean state)
- Control = 0 (IOMMUEnable=0, all features off)
- Wait until the Status run bits read 0 (bit 3 EventLogRun, bit 4 CmdBufRun)
Step 3: Allocate and zero Device Table
- Alloc 2 MiB contiguous physical memory (65536 × 32 bytes)
- Zero all entries
- Write DevTableBar (0x0000):
Bits 0:8 = DevTabSize (0x1FF for 65536 entries: (0x1FF + 1) × 4 KiB = 2 MiB)
Bits 12:51 = Physical address of table
Step 4: Allocate and zero Command Buffer
- Alloc 8192 bytes contiguous physical (512 entries × 16 bytes)
- Zero all entries
- Write CmdBufBar (0x0008):
Bits 56:59 = ComLen (0x9 for 512 entries: 2^9 × 16 bytes = 8192 bytes)
Bits 12:51 = Physical address
Step 5: Allocate and zero Event Log
- Alloc 8192 bytes contiguous physical (512 entries × 16 bytes)
- Zero all entries
- Write EvtLogBar (0x0010):
Bits 56:59 = EventLen (0x9 for 512 entries: 2^9 × 16 bytes = 8192 bytes)
Bits 12:51 = Physical address
Step 6: Set up exclusion range (optional)
- Write ExclusionBase (0x0020) = start of excluded physical range
- Write ExclusionLimit (0x0028) = end of excluded physical range
- Skip if no exclusion needed
Step 7: Reset head/tail pointers
- CmdBufHead (0x2000) = 0
- CmdBufTail (0x2008) = 0
- EvtLogHead (0x2010) = 0
- (EvtLogTail is RO, hardware sets it)
Step 8: Allocate and zero Interrupt Remap Table (if IR needed)
- Alloc 2048 × 16 bytes = 32 KiB (IntTabLen=11, the maximum of 2048 entries, using 128-bit IRTEs)
- Zero all entries
- Configure each device's DTE with IRTP pointing to this table
Step 9: Configure DTEs for devices
- For each device that needs translation:
Set V=1, TV=1, Mode=4 (4-level), PTP=root page table address
Set IR=1, IW=1 (bits 61 and 62) to grant DMA read and write permission
Set DomainID in data[1]
- If interrupt remapping is used:
Set IV=1, IntCtl=0x02 (remapped), IntTabLen, IRTP in data[2]
Step 10: Enable features in Control register
- Control = bits for enabled features:
Bit 2 (EventLogEn) = 1
Bit 12 (CmdBufEn) = 1
Bit 50 (XTEn) = 1 (if x2APIC supported and in use)
- DO NOT set bit 0 (IOMMUEnable) yet
Step 11: Enable the IOMMU
- Set Control bit 0 (IOMMUEnable) = 1
- The IOMMU fetches commands only while IOMMUEnable=1 and CmdBufEn=1, so this must precede the first command (Linux enables the IOMMU before flushing for the same reason)
- Read Status to verify bit 4 (CmdBufRun) = 1
Step 12: Flush caches via command buffer
- Submit INVALIDATE_IOMMU_ALL (0x08) if IASup=1, or:
INVALIDATE_DEVTAB_ENTRY for each modified device
INVALIDATE_INTERRUPT_TABLE for each device with IR
- Submit COMPLETION_WAIT (0x01) to synchronize
- Wait for completion
Step 13: Enable interrupts (optional)
- Set Control bit 3 (EventIntEn) = 1
- Configure MSI delivery for the IOMMU PCI device
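The Control register writes in this sequence can be expressed as two values: one that turns on the logs and command buffer, and one that additionally sets IOMMUEnable. This is a sketch only, assuming the Linux CONTROL_* bit positions (EventLogEn = 2, CmdBufEn = 12, XTEn = 50); the function names are hypothetical and the MMIO write itself is left to the caller.

```c
#include <stdint.h>

#define AMDVI_CTRL_IOMMU_EN      (1ULL << 0)
#define AMDVI_CTRL_EVENT_LOG_EN  (1ULL << 2)
#define AMDVI_CTRL_CMD_BUF_EN    (1ULL << 12)
#define AMDVI_CTRL_XT_EN         (1ULL << 50)

/* Value to write first: rings enabled, translation still off. */
static inline uint64_t amdvi_control_setup(int use_x2apic)
{
    uint64_t v = AMDVI_CTRL_EVENT_LOG_EN | AMDVI_CTRL_CMD_BUF_EN;
    if (use_x2apic)
        v |= AMDVI_CTRL_XT_EN;
    return v;
}

/* Value to write second: same features plus IOMMUEnable. */
static inline uint64_t amdvi_control_enable(uint64_t setup_value)
{
    return setup_value | AMDVI_CTRL_IOMMU_EN;
}
```

Keeping the two writes separate makes it easy to insert a Status check between them during bring-up and to log which step a hang occurs in.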
2. Intel VT-d
2.1 MMIO Register Map
Base address obtained from ACPI DMAR table (DRHD entry RegisterBase field).
| Offset | Name | Size | Access | Description |
|---|---|---|---|---|
| 0x00 | VER_REG | 32-bit | RO | Architecture Version. [0:3] = Minor, [4:7] = Major. |
| 0x08 | CAP_REG | 64-bit | RO | Capability Register. See bit layout below. |
| 0x10 | ECAP_REG | 64-bit | RO | Extended Capability Register. See bit layout below. |
| 0x18 | GCMD_REG | 32-bit | WO | Global Command Register. Write to request operations. |
| 0x1C | GSTS_REG | 32-bit | RO | Global Status Register. Reflects GCMD results. |
| 0x20 | RTADDR_REG | 64-bit | R/W | Root Table Address. Bits 10:11 = TTM (Translation Table Mode: 00 = legacy, 01 = scalable). Bits 12:63 = physical address. |
| 0x28 | CCMD_REG | 64-bit | R/W | Context Command Register. For invalidating context caches. |
| 0x34 | FSTS_REG | 32-bit | RO / RW1C | Fault Status Register. |
| 0x38 | FECTL_REG | 32-bit | R/W | Fault Event Control Register. |
| 0x3C | FEDATA_REG | 32-bit | R/W | Fault Event Data Register. MSI data. |
| 0x40 | FEADDR_REG | 32-bit | R/W | Fault Event Address Register. MSI address low. |
| 0x44 | FEUADDR_REG | 32-bit | R/W | Fault Event Upper Address Register. MSI address high. |
| 0x58 | AFLOG_REG | 64-bit | R/W | Advanced Fault Log Register. |
| 0x64 | PMEN_REG | 32-bit | R/W | Protected Memory Enable Register. |
| 0x68 | PLMBASE_REG | 32-bit | R/W | Protected Low Memory Base Register. |
| 0x6C | PLMLIMIT_REG | 32-bit | R/W | Protected Low Memory Limit Register. |
| 0x70 | PHMBASE_REG | 64-bit | R/W | Protected High Memory Base Register. |
| 0x78 | PHMLIMIT_REG | 64-bit | R/W | Protected High Memory Limit Register. |
| 0x80 | IQH_REG | 64-bit | RO | Invalidation Queue Head Register. |
| 0x88 | IQT_REG | 64-bit | R/W | Invalidation Queue Tail Register. |
| 0x90 | IQA_REG | 64-bit | R/W | Invalidation Queue Address Register. |
| 0x9C | ICS_REG | 32-bit | RW1C | Invalidation Completion Status Register. |
| 0xA0 | IECTL_REG | 32-bit | R/W | Invalidation Event Control Register. |
| 0xA4 | IEDATA_REG | 32-bit | R/W | Invalidation Event Data Register. |
| 0xA8 | IEADDR_REG | 32-bit | R/W | Invalidation Event Address Register. |
| 0xAC | IEUADDR_REG | 32-bit | R/W | Invalidation Event Upper Address Register. |
| 0xB8 | IRTA_REG | 64-bit | R/W | Interrupt Remapping Table Address Register. |
CAP_REG (0x08) Bit Layout
| Bit | Name | Description |
|---|---|---|
| 0:2 | ND | Number of Domains Supported. Domains = 2^(4 + 2×ND): 0=16, 1=64, 2=256, 3=1024, 4=4K, 5=16K, 6=64K. |
| 3 | AFL | Advanced Fault Logging. 1 = supported. |
| 4 | RWBF | Required Write-Buffer Flushing. 1 = software must flush write buffers before invalidations. |
| 5 | PLMR | Protected Low Memory Region. 1 = supported. |
| 6 | PHMR | Protected High Memory Region. 1 = supported. |
| 7 | CM | Caching Mode. 1 = hardware may cache not-present and erroneous entries, so software must invalidate after every mapping change (typical of virtual IOMMUs). |
| 8:12 | SAGAW | Supported Adjusted Guest Address Widths. Bit 1 = 39-bit (3-level), bit 2 = 48-bit (4-level), bit 3 = 57-bit (5-level). |
| 16:21 | MGAW | Maximum Guest Address Width. Actual address width = MGAW + 1. |
| 22 | ZLR | Zero Length Read. 1 = supported. |
| 24:33 | FRO | Fault-recording Register Offset, in 16-byte units from the register base. |
| 34:37 | SLLPS | Second Level Large Page Support. Bit 34 = 2MiB, bit 35 = 1GiB. |
| 39 | PSI | Page Selective Invalidation. 1 = supported. |
| 40:47 | NFR | Number of Fault-recording Registers minus 1. |
| 48:53 | MAMV | Maximum Address Mask Value. Largest mask accepted by page-selective invalidation. |
| 54 | DWD | Write Draining. 1 = supported. |
| 55 | DRD | Read Draining. 1 = supported. |
| 56 | FL1GP | First Level 1-GByte Page Support. |
| 59 | PI | Posted Interrupts Support. |
| 60 | FL5LP | First Level 5-level Paging Support. |
| Other | Rsvd | Reserved. Field positions follow the cap_* macros in Linux include/linux/intel-iommu.h. |
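Decoding CAP_REG is mostly shifts and masks. The sketch below extracts the fields a minimal driver needs, using the field positions from Linux's cap_* macros (ND in bits 0:2, SAGAW in bits 8:12, MGAW in bits 16:21, FRO in bits 24:33, NFR in bits 40:47); the function names are hypothetical and the sample value in the note is illustrative only.

```c
#include <stdint.h>

/* Number of supported domain IDs: 2^(4 + 2*ND). */
static inline uint32_t vtd_cap_ndoms(uint64_t cap)
{
    return 1u << (4 + 2 * (unsigned)(cap & 0x7));
}

/* SAGAW bitmap: bit 1 = 3-level, bit 2 = 4-level, bit 3 = 5-level. */
static inline unsigned vtd_cap_sagaw(uint64_t cap)
{
    return (unsigned)((cap >> 8) & 0x1F);
}

/* Maximum guest address width in bits (field value + 1). */
static inline unsigned vtd_cap_mgaw(uint64_t cap)
{
    return (unsigned)((cap >> 16) & 0x3F) + 1;
}

/* Byte offset of the first fault-recording register. */
static inline uint32_t vtd_cap_fro_bytes(uint64_t cap)
{
    return (uint32_t)((cap >> 24) & 0x3FF) * 16;
}

/* Number of fault-recording registers (field value + 1). */
static inline unsigned vtd_cap_nfr(uint64_t cap)
{
    return (unsigned)((cap >> 40) & 0xFF) + 1;
}
```

For a capability value of 0x00D2008C40660462, these helpers report 256 domains, a SAGAW bitmap of 0x04 (4-level tables), a 39-bit MGAW, fault registers at offset 0x400, and a single fault-recording register.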
ECAP_REG (0x10) Bit Layout
| Bit | Name | Description |
|---|---|---|
| 0 | C | Page-walk Coherency. 1 = hardware snoops processor caches when reading translation structures. |
| 1 | QI | Queued Invalidation support. 1 = IQ mechanism supported. |
| 2 | DT | Device TLB support. |
| 3 | IR | Interrupt Remapping support. 1 = supported. |
| 4 | EIM | Extended Interrupt Mode. 1 = x2APIC mode supported for IR. |
| 6 | PT | Pass Through. 1 = second-level translation bypass supported. |
| 7 | SC | Snoop Control. |
| 8:17 | IRO | IOTLB Register Offset, in 16-byte units from the register base. |
| 20:23 | MHMV | Maximum Handle Mask Value. |
| 24 | ECS | Extended Context Support (deprecated; treat as reserved). |
| 25 | MTS | Memory Type Support. |
| 26 | NEST | Nested Translation Support. |
| 29 | PRS | Page Request Support. |
| 40 | PASID | Process Address Space ID Support. |
| 43 | SMTS | Scalable Mode Translation Support. |
| 46 | SLTS | Second Level Translation Support (scalable mode). |
| 47 | FLTS | First Level Translation Support (scalable mode). |
| Other | Rsvd | Reserved. Field positions follow the ecap_* macros in Linux include/linux/intel-iommu.h. |
GCMD_REG (0x18) Bit Layout (Write-Only)
| Bit | Name | Description |
|---|---|---|
| 31 | TE | Translation Enable. Write 1 to enable, 0 to disable. |
| 30 | SRTP | Set Root Table Pointer. Write 1, hardware sets GSTS.RTPS when done. |
| 29 | SFL | Set Fault Log. Write 1 to set fault log pointer. |
| 28 | EAFL | Enable Advanced Fault Log. |
| 27 | WBF | Write Buffer Flush. Write 1, hardware clears GSTS.WBFS when the flush is done. |
| 26 | QIE | Queued Invalidation Enable. Write 1 to enable. |
| 25 | IRE | Interrupt Remapping Enable. Write 1 to enable interrupt remapping. |
| 24 | SIRTP | Set Interrupt Remap Table Pointer. Write 1, hardware sets GSTS.IRTPS. |
| 23 | CFI | Compatibility Format Interrupt. Write 1 to pass compatibility format interrupts, 0 to block them. |
| 0:22 | Rsvd | Reserved. Must write zero. |
GSTS_REG (0x1C) Bit Layout (Read-Only)
| Bit | Name | Description |
|---|---|---|
| 31 | TES | Translation Enable Status. 1 = enabled. |
| 30 | RTPS | Root Table Pointer Status. 1 = root table pointer set. |
| 29 | FLS | Fault Log Status. |
| 28 | AFLS | Advanced Fault Log Status. |
| 27 | WBFS | Write Buffer Flush Status. 1 = flush in progress, 0 = flush complete. |
| 26 | QIES | Queued Invalidation Enable Status. |
| 25 | IRES | Interrupt Remap Enable Status. |
| 24 | IRTPS | Interrupt Remap Table Pointer Status. |
| 23 | CFIS | Compatibility Format Interrupt Status. |
| 0:22 | Rsvd | Reserved. |
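Because GCMD_REG is write-only, each command write has to be derived from the current GSTS_REG value so that already-enabled features stay enabled. A minimal sketch follows, assuming the 0x96FFFFFF mask described for this register, which clears the one-shot status bits (RTPS, FLS, WBFS, IRTPS) before the new command bit is added; `vtd_gcmd_value` is a hypothetical name.

```c
#include <stdint.h>

#define VTD_GCMD_TE     (1u << 31)
#define VTD_GCMD_SRTP   (1u << 30)
#define VTD_GCMD_QIE    (1u << 26)
#define VTD_GCMD_IRE    (1u << 25)
#define VTD_GCMD_SIRTP  (1u << 24)

/* One-shot bits 30, 29, 27 and 24 must not be replayed from GSTS. */
#define VTD_GSTS_PERSIST_MASK 0x96FFFFFFu

/* Value to write to GCMD_REG to issue `cmd_bit` while preserving the
 * persistent enables currently reported in GSTS_REG. */
static inline uint32_t vtd_gcmd_value(uint32_t gsts, uint32_t cmd_bit)
{
    return (gsts & VTD_GSTS_PERSIST_MASK) | cmd_bit;
}
```

After the write, software polls GSTS_REG until the matching status bit changes, for example RTPS after SRTP or QIES after QIE.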
2.2 Root Table Entry
The Root Table is pointed to by RTADDR_REG. It contains 256 entries (one per PCI bus). Each entry is 128 bits (16 bytes). Must be 4KiB-aligned.
Root Entry (128 bits = data[0] data[1], each u64):
data[0]:
[0] P — Present. 1 = this bus has context entries.
[1:11] Rsvd — Reserved. Must be zero.
[12:63] CTP — Context Table Pointer. Physical address of the context table
for this bus. Must be 4KiB-aligned.
data[1]:
[0:63] Rsvd — Reserved. Must be zero.
2.3 Context Entry
Each Context Table contains 256 entries (one per device:function on a bus). Each entry is 128 bits (16 bytes).
Context Entry (128 bits = data[0] data[1], each u64):
data[0]:
[0] P — Present. 1 = entry is valid.
[1] FPD — Fault Processing Disable. 1 = faults from this device are suppressed.
[2:3] TT — Translation Type:
00 = Untranslated requests only, translated through the second-level tables
01 = As 00, plus translated and translation requests (Device-TLB / ATS)
10 = Pass-through (no second-level translation, bypass; requires ECAP.PT)
11 = Reserved
[4:11] Rsvd — Reserved.
[12:63] SLPTPTR — Second Level Page Table Pointer. Physical address of the
second-level page table root. Must be 4KiB-aligned.
data[1]:
[0:2] AW — Address Width. 1 = 39-bit (3-level), 2 = 48-bit (4-level),
3 = 57-bit (5-level). Must match a width reported in CAP_REG.SAGAW.
[3:6] Avail — Ignored by hardware, available to software.
[7] Rsvd — Reserved.
[8:23] DID — Domain Identifier. Associates this device with a domain.
[24:63] Rsvd — Reserved. Must be zero.
The extended context entry format (ECS=1 in ECAP) is deprecated and replaced by scalable mode in current specification revisions; this reference covers legacy mode only.
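A root entry and a context entry can each be built from a handful of fields. The sketch below assumes the legacy-mode layout used by Linux (address width in bits 0:2 and domain ID in bits 8:23 of the upper qword); the struct and function names are hypothetical.

```c
#include <stdint.h>

struct vtd_entry128 { uint64_t lo, hi; };

#define VTD_ENTRY_PRESENT   1ULL
#define VTD_ADDR_MASK       0xFFFFFFFFFFFFF000ULL   /* bits 12:63 */

/* Root entry pointing at the context table for one bus. */
static inline struct vtd_entry128 vtd_root_entry(uint64_t ctx_table_phys)
{
    struct vtd_entry128 e;
    e.lo = (ctx_table_phys & VTD_ADDR_MASK) | VTD_ENTRY_PRESENT;
    e.hi = 0;
    return e;
}

/* Legacy context entry.
 *   tt : translation type (0 = second-level only, 2 = pass-through)
 *   aw : address width code (1 = 3-level, 2 = 4-level, 3 = 5-level)
 *   did: domain identifier */
static inline struct vtd_entry128 vtd_context_entry(uint64_t slpt_phys,
                                                    uint16_t did,
                                                    unsigned aw, unsigned tt)
{
    struct vtd_entry128 e;
    e.lo = (slpt_phys & VTD_ADDR_MASK) | ((uint64_t)(tt & 0x3) << 2)
         | VTD_ENTRY_PRESENT;
    e.hi = (uint64_t)(aw & 0x7) | ((uint64_t)did << 8);
    return e;
}
```

When publishing an entry, writing `hi` before `lo` keeps the Present bit as the last field the hardware can observe.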
2.4 DMAR ACPI Table
The DMAR (DMA Remapping) table describes Intel VT-d IOMMU topology. Found by scanning ACPI tables with signature "DMAR" (0x52414D44).
DMAR Header (48 bytes)
Offset Size Field Description
0x00 4 Signature "DMAR" (0x52414D44)
0x04 4 Length Total table length in bytes
0x08 1 Revision 1
0x09 1 Checksum ACPI checksum (sum of all bytes = 0)
0x0A 6 OemId OEM identifier
0x10 8 OemTableId OEM table identifier
0x18 4 OemRevision OEM revision
0x1C 4 CreatorId ASL compiler vendor
0x20 4 CreatorRevision ASL compiler revision
0x24 1 HostAddressWidth DMA physical address width minus 1 (e.g., 45 for a 46-bit, 64 TiB address space)
0x25 1 Flags [0] = INTR_REMAP (interrupt remapping supported)
[1] = X2APIC_OPT_OUT (firmware requests no x2APIC)
[2] = DMA_CTRL_PLATFORM_OPT_IN (firmware requests DMA protection)
0x26 10 Reserved Reserved
0x30 ... RemappingStructures Variable-length list of DRHD/RMRR/ATSR/etc entries
DRHD (DMA Remapping Hardware Unit Definition)
Describes a single IOMMU unit. Multiple DRHD entries for systems with multiple IOMMUs.
Offset Size Field Description
0x00 2 Type 0x0000 = DRHD
0x02 2 Length Total length of this entry including device scope
0x04 1 Flags [0] = INCLUDE_PCI_ALL (1=this IOMMU handles all PCI devices
not covered by other non-ALL DRHD entries)
0x05 1 Reserved Reserved
0x06 2 SegmentNumber PCI Segment Group number
0x08 8 RegisterBaseAddress Physical MMIO base address of IOMMU registers
0x10 ... DeviceScope Variable-length device scope entries follow
DRHD Device Scope Entry
Offset Size Field Description
0x00 1 Type Device scope type:
0x01 = PCI Endpoint Device
0x02 = PCI SubHierarchy
0x03 = IOAPIC
0x04 = MSI Capable HPET
0x05 = ACPI Name-Space Device
0x01 1 Length Total length of this scope entry (6 + 2 × number of path entries)
0x02 2 Reserved Reserved
0x04 1 EnumerationId Enumeration ID (e.g., IOAPIC ID for type 0x03)
0x05 1 StartBusNumber Starting PCI bus number
0x06 ... Path PCI path entries (each 2 bytes: Device, Function)
RMRR (Reserved Memory Region Reporting)
Describes memory regions that must be identity-mapped for specific devices (e.g., USB controllers, graphics).
Offset Size Field Description
0x00 2 Type 0x0001 = RMRR
0x02 2 Length Total length of this entry
0x04 2 Reserved Reserved
0x06 2 SegmentNumber PCI Segment Group
0x08 8 BaseAddress Physical start address of reserved region
0x10 8 EndAddress Physical end address of reserved region (inclusive)
0x18 ... DeviceScope Device scope entries for devices that access this region
Other DMAR Sub-Table Types
| Type | Name | Description |
|---|---|---|
| 0x0000 | DRHD | DMA Remapping Hardware Unit Definition. |
| 0x0001 | RMRR | Reserved Memory Region Reporting. |
| 0x0002 | ATSR | Root Port ATS (Address Translation Service) Capability Reporting. |
| 0x0003 | RHSA | Remapping Hardware Static Affinity (NUMA locality). |
| 0x0004 | ANDD | ACPI Name-space Device Declaration. |
| 0x0005 | SATC | SoC Integrated Address Translation Cache Reporting. |
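Because every remapping structure starts with the same 2-byte Type and 2-byte Length, the list after the DMAR header can be walked generically before any entry is interpreted. The following is a minimal sketch under stated assumptions: `RemapStruct` and `walk_remap_structs` are hypothetical names, the input slice is assumed to start at the first remapping structure, and `Vec` is used for brevity (a `no_std` build would more likely expose an iterator).

```rust
/// One remapping structure located inside the DMAR table body (hypothetical helper type).
#[derive(Debug, PartialEq, Eq)]
pub struct RemapStruct<'a> {
    /// Raw Type field (first 2 bytes of the entry, little-endian).
    pub entry_type: u16,
    /// Entry bytes after the 4-byte Type/Length prefix.
    pub body: &'a [u8],
}

/// Walk the variable-length list of remapping structures.
/// Stops at the first entry whose Length is shorter than its own prefix
/// or runs past the end of the slice, returning what was parsed so far.
pub fn walk_remap_structs(mut bytes: &[u8]) -> Vec<RemapStruct<'_>> {
    let mut out = Vec::new();
    while bytes.len() >= 4 {
        let entry_type = u16::from_le_bytes([bytes[0], bytes[1]]);
        let length = u16::from_le_bytes([bytes[2], bytes[3]]) as usize;
        if length < 4 || length > bytes.len() {
            break; // malformed entry: do not trust anything after it
        }
        out.push(RemapStruct { entry_type, body: &bytes[4..length] });
        bytes = &bytes[length..];
    }
    out
}
```

Checking Length against the remaining bytes before slicing keeps a corrupt firmware table from causing an out-of-bounds panic during early boot.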
2.5 Page Table Entry (Intel VT-d)
Intel VT-d uses multi-level page tables. The number of levels depends on SAGAW in CAP_REG. Typically 3 or 4 levels. Each PTE is 64 bits.
PTE (64 bits):
[0] R — Read permission. 1 = device may read.
[1] W — Write permission. 1 = device may write.
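[2:6] Rsvd — Reserved in legacy mode. Must be zero.
[7] PS — Page Size. 1 = super-page leaf at this level (2MiB at level 2, 1GiB at level 3). Only valid when CAP_REG reports support for that super-page size.
[8:10] Rsvd — Reserved in legacy mode. Must be zero.
[11] SNP — Snoop behavior. Only honored when ECAP_REG.SC=1; otherwise must be zero.
[12:63] ADDR — Physical address. For non-leaf: next-level table address (4KiB-aligned).
For leaf: page frame address.
Mask depends on page size:
4KiB: bits 12:63
2MiB: bits 21:63 (super page)
1GiB: bits 30:63 (super page)
First-level (first-stage) PTEs, used only in scalable mode, follow the x86-64 paging layout instead:
[0] P — Present.
[1] R/W — Read/Write. 1 = writes allowed.
[2] U/S — User/Supervisor. 1 = user-mode access allowed.
[5] A — Accessed flag.
[6] D — Dirty flag.
[7] PS — Page Size (1 = super page at this level).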
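The level count and the page-size masks above both follow from the same arithmetic: each level consumes 9 address bits on top of the 12-bit page offset. A minimal sketch of that arithmetic, assuming 4KiB tables of 512 entries and levels numbered from 1 (leaf) upward; the function names are illustrative only and not part of any existing API.

```rust
/// Index into the table at `level` for a given I/O virtual address.
/// `level` is 1 for the leaf table and must be at least 1.
pub const fn pt_index(iova: u64, level: u32) -> usize {
    ((iova >> (12 + 9 * (level - 1))) & 0x1FF) as usize
}

/// Number of address bits covered by an n-level hierarchy (3 -> 39, 4 -> 48).
pub const fn address_width(levels: u32) -> u32 {
    12 + 9 * levels
}

/// ADDR mask for a leaf entry placed at `level`:
/// level 1 = 4KiB page, level 2 = 2MiB super page, level 3 = 1GiB super page.
pub const fn leaf_addr_mask(level: u32) -> u64 {
    !((1u64 << (12 + 9 * (level - 1))) - 1)
}
```

A walker would call `pt_index` from the top level down to level 1, following the ADDR field of each non-leaf entry and stopping early at any entry that maps a super page.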
2.6 Initialization Sequence (Intel VT-d)
Step-by-step register programming to bring up Intel VT-d.
Step 1: Discover IOMMU hardware
- Scan ACPI tables for DMAR signature
- Parse DRHD entries to find MMIO base addresses
- Read CAP_REG (0x08) for capabilities
- Read ECAP_REG (0x10) for extended capabilities
- Read VER_REG (0x00) for architecture version
Step 2: Ensure IOMMU is disabled
- Verify GSTS_REG.TES = 0 (translation not enabled)
- If TES=1, write GCMD_REG with TE=0, wait for TES to clear
Step 3: Allocate and zero Root Table
- Alloc 4 KiB (256 entries × 16 bytes)
- Zero all entries
- Write RTADDR_REG (0x20):
Bits 10:11 (TTM) = 00 (legacy mode)
Bits 12:63 = physical address
Step 4: Set Root Table Pointer
- Write GCMD_REG bit 30 (SRTP) = 1
- Poll GSTS_REG bit 30 (RTPS) until it reads 1
Step 5: Allocate and zero Context Tables (per bus)
- For each bus with devices:
Alloc 4 KiB (256 entries × 16 bytes)
Zero all entries
Set Root Entry P=1, CTP=context table address
Step 6: Configure Context Entries
- For each device:function:
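Set P=1, TT=00 (legacy), AW=address width matching the page table depth, SLPTPTR=page table root, DID=domain ID
- For pass-through (requires ECAP_REG.PT=1): Set P=1, TT=10, AW=a supported address width, DID=domain ID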
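Steps 5 and 6 amount to a two-stage lookup: the bus number selects a root entry, and device:function selects a context entry in that bus's context table. The sketch below shows the offsets and one way to assemble a legacy context entry. It assumes the layout used by the Linux driver (AW in bits 0:2 and DID in bits 8:23 of the upper quadword, AW = levels - 2); all function names are hypothetical.

```rust
/// Byte offset of the root entry for `bus` inside the 4 KiB root table.
pub const fn root_entry_offset(bus: u8) -> usize {
    bus as usize * 16
}

/// Byte offset of the context entry for device:function inside a context table.
pub const fn context_entry_offset(device: u8, function: u8) -> usize {
    ((((device & 0x1F) as usize) << 3) | (function & 0x7) as usize) * 16
}

/// AW encoding for a page table of `levels` levels (assumption: AW = levels - 2,
/// so 3-level -> 1 and 4-level -> 2). `levels` must be at least 2.
pub const fn aw_for_levels(levels: u32) -> u64 {
    (levels - 2) as u64
}

/// Assemble [low, high] quadwords of a legacy-mode context entry:
/// P=1, TT=00, SLPTPTR in bits 12:63 of the low quadword.
pub const fn legacy_context_entry(slpt_root: u64, aw: u64, domain_id: u16) -> [u64; 2] {
    let lo = (slpt_root & !0xFFF) | 1;
    let hi = (aw & 0x7) | ((domain_id as u64) << 8);
    [lo, hi]
}
```

After a context entry that was previously present is changed, the context cache and IOTLB still need to be invalidated as described in Step 11.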
Step 7: Build page tables (per domain)
- Create page table hierarchy matching SAGAW levels (typically 3 or 4)
- Map device-visible physical addresses to host physical addresses
- For identity mapping: GPA = HPA
Step 8: Handle RMRR regions
- Identity-map all RMRR regions for their respective devices
- These regions must always be accessible to the listed devices
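Since the RMRR EndAddress is inclusive, the page count for the identity mapping is easy to get off by one. A minimal sketch, assuming 4KiB mappings; `rmrr_pages` is a hypothetical helper that yields each page address to be mapped with IOVA equal to the physical address.

```rust
/// Yield the 4KiB-aligned page addresses covering an RMRR region.
/// `end_inclusive` is the last byte of the region, as reported by firmware.
/// An inverted range (end below base) yields nothing.
pub fn rmrr_pages(base: u64, end_inclusive: u64) -> impl Iterator<Item = u64> {
    let first = base & !0xFFF;
    let last = end_inclusive & !0xFFF;
    (first..=last).step_by(4096)
}
```

Each yielded address would be passed to the domain's map routine as both the device-visible address and the host physical address.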
Step 9: Allocate Interrupt Remap Table (if ECAP_REG.IR=1)
- Alloc table: 2^(S+1) × 16 bytes, where S is the IRTA_REG size field; 4KiB-aligned
- Zero all entries
- Write IRTA_REG (0xB8):
Bits 0:3 = S (table size, e.g., 0xF = 65536 entries)
Bit 11 = EIME (Extended Interrupt Mode Enable) if x2APIC
Bits 12:63 = Physical address
Step 10: Enable Interrupt Remapping
- Write GCMD_REG bit 24 (SIRTP) = 1
- Poll GSTS_REG bit 24 (IRTPS) until 1
- Keep GCMD_REG bit 23 (CFI) = 0 so that compatibility format interrupts are blocked
- If GSTS_REG bit 23 (CFIS) reads 1, write CFI = 0 and poll until CFIS clears
- Write GCMD_REG bit 25 (IRE) = 1
- Poll GSTS_REG bit 25 (IRES) until 1
Step 11: Invalidate caches
- If QI (Queued Invalidation) supported (ECAP_REG.QI=1):
Set up Invalidation Queue (IQA_REG)
Submit queue-based invalidation descriptors
- Else use register-based invalidation:
Write CCMD_REG for context cache invalidation
Write IOTLB registers for TLB invalidation
Step 12: Enable translation
- Write GCMD_REG bit 31 (TE) = 1
- Poll GSTS_REG bit 31 (TES) until 1
Step 13: Enable fault handling
- Program FEDATA_REG, FEADDR_REG, FEUADDR_REG for MSI delivery
- Clear FECTL_REG.IM (bit 31) to unmask fault interrupts
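Several steps above share one pattern: write a command bit to GCMD_REG, then poll the matching bit in GSTS_REG. Because GCMD_REG is write-only, the current state has to be taken from GSTS_REG, with the one-shot status bits cleared using the mask 0x96FFFFFF given in the VT-d specification's programming guidance. The sketch below assumes a hypothetical `VtdRegs` trait that wraps volatile 32-bit MMIO access; `gcmd_set_and_wait` and `GcmdTimeout` are likewise illustrative names.

```rust
/// Hypothetical MMIO accessor; a real implementation would use volatile reads and writes.
pub trait VtdRegs {
    fn read32(&self, offset: usize) -> u32;
    fn write32(&mut self, offset: usize, value: u32);
}

const GCMD_REG: usize = 0x18;
const GSTS_REG: usize = 0x1C;
/// Clears the status bits that mirror one-shot commands before writing GCMD_REG.
const ONE_SHOT_CLEAR_MASK: u32 = 0x96FF_FFFF;

/// Returned when the status bit did not become set within the poll budget.
#[derive(Debug, PartialEq, Eq)]
pub struct GcmdTimeout;

/// Set one command bit while preserving already-enabled features,
/// then poll GSTS_REG until the same bit position reads 1.
pub fn gcmd_set_and_wait<R: VtdRegs>(
    regs: &mut R,
    bit: u32,
    max_polls: u32,
) -> Result<(), GcmdTimeout> {
    let status = regs.read32(GSTS_REG) & ONE_SHOT_CLEAR_MASK;
    regs.write32(GCMD_REG, status | bit);
    for _ in 0..max_polls {
        if regs.read32(GSTS_REG) & bit != 0 {
            return Ok(());
        }
        core::hint::spin_loop();
    }
    Err(GcmdTimeout)
}
```

A bounded poll count is used here so that unresponsive hardware produces an error instead of hanging the daemon; the right budget is platform-dependent.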
3. Rust Struct Definitions
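These #[repr(C)] and #[repr(C, packed)] structs mirror the hardware layouts used by the Red Bear OS IOMMU implementation. All bitfield access should go through helper methods (shown below) to ensure correct masking, and MMIO register blocks should only be accessed through volatile reads and writes.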
3.1 AMD-Vi Structs
// AMD-Vi MMIO Registers
/// AMD-Vi IOMMU MMIO register block.
/// Base address from ACPI IVRS IVHD entry.
#[repr(C)]
pub struct AmdViMmio {
    pub dev_table_bar: u64,    // 0x0000
    pub cmd_buf_bar: u64,      // 0x0008
    pub evt_log_bar: u64,      // 0x0010
    pub control: u64,          // 0x0018 (64-bit; XT_EN lives in the upper half)
    pub exclusion_base: u64,   // 0x0020
    pub exclusion_limit: u64,  // 0x0028
    pub extended_feature: u64, // 0x0030
    pub ppr_log_bar: u64,      // 0x0038
    _pad1: [u64; 0x03F8],      // 0x0040..0x1FFF (padding to 0x2000)
    pub cmd_buf_head: u64,     // 0x2000
    pub cmd_buf_tail: u64,     // 0x2008
    pub evt_log_head: u64,     // 0x2010
    pub evt_log_tail: u64,     // 0x2018
    pub status: u32,           // 0x2020
}
// Static assertions for offset verification
const _: () = assert!(core::mem::offset_of!(AmdViMmio, dev_table_bar) == 0x0000);
const _: () = assert!(core::mem::offset_of!(AmdViMmio, control) == 0x0018);
const _: () = assert!(core::mem::offset_of!(AmdViMmio, cmd_buf_head) == 0x2000);
const _: () = assert!(core::mem::offset_of!(AmdViMmio, status) == 0x2020);
/// AMD-Vi Control Register bits (bit positions as used by the Linux driver).
pub mod amd_control {
    pub const IOMMU_ENABLE: u64 = 1 << 0;
    pub const HT_TUN_EN: u64 = 1 << 1;
    pub const EVENT_LOG_EN: u64 = 1 << 2;
    pub const EVENT_INT_EN: u64 = 1 << 3;
    pub const COM_WAIT_INT_EN: u64 = 1 << 4;
    pub const CMD_BUF_EN: u64 = 1 << 12;
    pub const PPR_LOG_EN: u64 = 1 << 13;
    pub const PPR_INT_EN: u64 = 1 << 14;
    pub const PPR_EN: u64 = 1 << 15;
    pub const GT_EN: u64 = 1 << 16;
    pub const GA_EN: u64 = 1 << 17;
    pub const XT_EN: u64 = 1 << 50;
}
/// AMD-Vi Status Register bits.
pub mod amd_status {
    pub const EVENT_OVERFLOW: u32 = 1 << 0;
    pub const EVENT_LOG_INT: u32 = 1 << 1;
    pub const COM_WAIT_INT: u32 = 1 << 2;
    pub const EVENT_LOG_RUN: u32 = 1 << 3;
    pub const CMD_BUF_RUN: u32 = 1 << 4;
    pub const PPR_OVERFLOW: u32 = 1 << 5;
    pub const PPR_INT: u32 = 1 << 6;
}
/// AMD-Vi Extended Feature Register bits.
pub mod amd_ext_feature {
pub const PREF_SUP: u64 = 1 << 0;
pub const PPR_SUP: u64 = 1 << 1;
pub const XT_SUP: u64 = 1 << 2;
pub const NX_SUP: u64 = 1 << 3;
pub const GT_SUP: u64 = 1 << 4;
pub const IA_SUP: u64 = 1 << 6;
pub const GA_SUP: u64 = 1 << 7;
pub const HE_SUP: u64 = 1 << 8;
pub const PC_SUP: u64 = 1 << 9;
pub const GI_SUP: u64 = 1 << 57;
}
/// AMD-Vi Device Table Entry (256 bits = 32 bytes).
/// Index by BDF: (bus << 8) | (dev << 3) | func.
/// Table holds up to 65536 entries.
#[repr(C, packed)]
pub struct AmdDte {
pub data: [u64; 4],
}
impl AmdDte {
/// Create a zeroed (invalid) DTE.
pub const fn zeroed() -> Self {
Self { data: [0; 4] }
}
// data[0] accessors
pub fn valid(&self) -> bool {
self.data[0] & (1 << 0) != 0
}
pub fn set_valid(&mut self, v: bool) {
if v { self.data[0] |= 1 << 0; } else { self.data[0] &= !(1 << 0); }
}
pub fn translation_valid(&self) -> bool {
self.data[0] & (1 << 1) != 0
}
pub fn set_translation_valid(&mut self, v: bool) {
if v { self.data[0] |= 1 << 1; } else { self.data[0] &= !(1 << 1); }
}
/// Translation mode (bits 9:11). 0=no translation, 4=4-level page table.
pub fn mode(&self) -> u64 {
(self.data[0] >> 9) & 0x7
}
pub fn set_mode(&mut self, m: u64) {
self.data[0] = (self.data[0] & !(0x7 << 9)) | ((m & 0x7) << 9);
}
    /// Mask for bits 12:51 of data[0] (page table root).
    const PT_ROOT_MASK: u64 = 0x000F_FFFF_FFFF_F000;
    /// Mask for bits 6:51 of data[2] (interrupt table root).
    const INT_ROOT_MASK: u64 = 0x000F_FFFF_FFFF_FFC0;
    /// Page Table Root Pointer (bits 12:51 of data[0]).
    /// Address must be 4KiB-aligned.
    pub fn page_table_root(&self) -> u64 {
        self.data[0] & Self::PT_ROOT_MASK
    }
    pub fn set_page_table_root(&mut self, addr: u64) {
        // Only bits 12:51 are touched, so IR/IW (bits 61/62) are preserved.
        self.data[0] = (self.data[0] & !Self::PT_ROOT_MASK) | (addr & Self::PT_ROOT_MASK);
    }
    /// I/O Read permission (IR, bit 61 of data[0]). 1 = device reads allowed.
    pub fn io_read(&self) -> bool {
        self.data[0] & (1 << 61) != 0
    }
    pub fn set_io_read(&mut self, v: bool) {
        if v { self.data[0] |= 1 << 61; } else { self.data[0] &= !(1 << 61); }
    }
    /// I/O Write permission (IW, bit 62 of data[0]). 1 = device writes allowed.
    pub fn io_write(&self) -> bool {
        self.data[0] & (1 << 62) != 0
    }
    pub fn set_io_write(&mut self, v: bool) {
        if v { self.data[0] |= 1 << 62; } else { self.data[0] &= !(1 << 62); }
    }
    // data[1] accessors
    /// Domain ID (bits 0:15 of data[1]).
    pub fn domain_id(&self) -> u16 {
        (self.data[1] & 0xFFFF) as u16
    }
    pub fn set_domain_id(&mut self, id: u16) {
        self.data[1] = (self.data[1] & !0xFFFF) | (id as u64);
    }
    // data[2] accessors
    /// Interrupt Map Valid (IV, bit 0 of data[2]).
    pub fn int_map_valid(&self) -> bool {
        self.data[2] & (1 << 0) != 0
    }
    pub fn set_int_map_valid(&mut self, v: bool) {
        if v { self.data[2] |= 1 << 0; } else { self.data[2] &= !(1 << 0); }
    }
    /// Interrupt Remap Table Length (bits 1:4 of data[2]).
    /// Number of IRTEs = 2^len.
    pub fn int_table_len(&self) -> u64 {
        (self.data[2] >> 1) & 0xF
    }
    pub fn set_int_table_len(&mut self, len: u64) {
        self.data[2] = (self.data[2] & !(0xF << 1)) | ((len & 0xF) << 1);
    }
    /// Interrupt Control (bits 60:61 of data[2]).
    /// 00=abort, 01=pass-through (forwarded unmapped), 10=remapped.
    pub fn int_control(&self) -> u64 {
        (self.data[2] >> 60) & 0x3
    }
    pub fn set_int_control(&mut self, ctl: u64) {
        self.data[2] = (self.data[2] & !(0x3 << 60)) | ((ctl & 0x3) << 60);
    }
    /// Interrupt Remap Table Root Pointer (bits 6:51 of data[2]).
    /// Address must be 128-byte aligned.
    pub fn int_remap_table_ptr(&self) -> u64 {
        self.data[2] & Self::INT_ROOT_MASK
    }
    pub fn set_int_remap_table_ptr(&mut self, addr: u64) {
        self.data[2] = (self.data[2] & !Self::INT_ROOT_MASK) | (addr & Self::INT_ROOT_MASK);
    }
}
const _: () = assert!(core::mem::size_of::<AmdDte>() == 32);
/// AMD-Vi Interrupt Remapping Table Entry (128 bits = 16 bytes).
#[repr(C, packed)]
pub struct AmdIrte {
pub data: [u64; 2],
}
impl AmdIrte {
pub const fn zeroed() -> Self {
Self { data: [0; 2] }
}
/// Remap enable (bit 0 of data[0]).
pub fn remap_enabled(&self) -> bool {
self.data[0] & (1 << 0) != 0
}
pub fn set_remap_enabled(&mut self, v: bool) {
if v { self.data[0] |= 1 << 0; } else { self.data[0] &= !(1 << 0); }
}
/// Suppress IO Page Fault (bit 1).
pub fn suppress_io_pf(&self) -> bool {
self.data[0] & (1 << 1) != 0
}
pub fn set_suppress_io_pf(&mut self, v: bool) {
if v { self.data[0] |= 1 << 1; } else { self.data[0] &= !(1 << 1); }
}
/// Interrupt type (bits 2:4 of data[0]).
pub fn int_type(&self) -> u64 {
(self.data[0] >> 2) & 0x7
}
pub fn set_int_type(&mut self, t: u64) {
self.data[0] = (self.data[0] & !(0x7 << 2)) | ((t & 0x7) << 2);
}
    /// Destination mode (bit 6 of data[0]).
    /// 0=physical APIC ID, 1=logical.
    pub fn dst_mode(&self) -> bool {
        self.data[0] & (1 << 6) != 0
    }
    /// Destination APIC ID.
    /// Bits 8:31 of data[0] hold APIC ID bits 0:23;
    /// bits 56:63 of data[1] hold APIC ID bits 24:31 (x2APIC).
    pub fn destination(&self) -> u32 {
        let lo = (self.data[0] >> 8) & 0x00FF_FFFF;
        let hi = (self.data[1] >> 56) & 0xFF;
        (lo | (hi << 24)) as u32
    }
    pub fn set_destination(&mut self, apic_id: u32) {
        let id = apic_id as u64;
        self.data[0] = (self.data[0] & !(0x00FF_FFFF << 8)) | ((id & 0x00FF_FFFF) << 8);
        self.data[1] = (self.data[1] & !(0xFF << 56)) | ((id >> 24) << 56);
    }
    /// Vector (bits 0:7 of data[1]).
    pub fn vector(&self) -> u8 {
        (self.data[1] & 0xFF) as u8
    }
    pub fn set_vector(&mut self, v: u8) {
        self.data[1] = (self.data[1] & !0xFF) | (v as u64);
    }
}
const _: () = assert!(core::mem::size_of::<AmdIrte>() == 16);
/// AMD-Vi Command Buffer Entry (128 bits = 16 bytes = 4 × u32).
#[repr(C, packed)]
pub struct AmdCmdEntry {
pub word: [u32; 4],
}
impl AmdCmdEntry {
pub const fn zeroed() -> Self {
Self { word: [0; 4] }
}
    /// Opcode (bits 28:31 of word[1]).
    pub fn opcode(&self) -> u8 {
        ((self.word[1] >> 28) & 0xF) as u8
    }
    pub fn set_opcode(&mut self, op: u8) {
        self.word[1] = (self.word[1] & !(0xF << 28)) | ((op as u32 & 0xF) << 28);
    }
}
const _: () = assert!(core::mem::size_of::<AmdCmdEntry>() == 16);
/// AMD-Vi Command Opcodes.
pub mod amd_cmd_opcode {
    pub const COMPLETION_WAIT: u8 = 0x01;
    pub const INVALIDATE_DEVTAB_ENTRY: u8 = 0x02;
    pub const INVALIDATE_IOMMU_PAGES: u8 = 0x03;
    pub const INVALIDATE_IOTLB_PAGES: u8 = 0x04;
    pub const INVALIDATE_INTERRUPT_TABLE: u8 = 0x05;
    pub const INVALIDATE_IOMMU_ALL: u8 = 0x08;
}
/// Build a COMPLETION_WAIT command.
/// `store_addr` must be 8-byte aligned; `store_data` is written there on completion.
pub fn amd_cmd_completion_wait(store_addr: u64, store_data: u32) -> AmdCmdEntry {
    let mut cmd = AmdCmdEntry::zeroed();
    cmd.word[0] = (store_addr as u32 & !0x7) | (1 << 0); // address bits 3:31, Store = 1
    cmd.word[1] = ((store_addr >> 32) as u32) & 0x000F_FFFF; // address bits 32:51
    cmd.word[2] = store_data; // low 32 bits of the 64-bit store data
    cmd.word[3] = 0; // high 32 bits of the store data
    cmd.set_opcode(amd_cmd_opcode::COMPLETION_WAIT);
    cmd
}
/// Build an INVALIDATE_DEVTAB_ENTRY command for a given BDF.
pub fn amd_cmd_invalidate_devtab(bdf: u16) -> AmdCmdEntry {
    let mut cmd = AmdCmdEntry::zeroed();
    cmd.word[0] = bdf as u32; // DeviceId in bits 0:15
    cmd.set_opcode(amd_cmd_opcode::INVALIDATE_DEVTAB_ENTRY);
    cmd
}
/// Build an INVALIDATE_IOMMU_PAGES command.
/// With size=true (S bit), the range is encoded in the low address bits;
/// address 0x7FFF_FFFF_FFFF_F000 with S=1 invalidates all pages of the domain.
pub fn amd_cmd_invalidate_pages(domain_id: u16, address: u64, size: bool) -> AmdCmdEntry {
    let mut cmd = AmdCmdEntry::zeroed();
    cmd.word[1] = domain_id as u32; // DomainId in bits 0:15
    cmd.word[2] = address as u32 & !0xFFF; // address bits 12:31
    if size { cmd.word[2] |= 1 << 0; } // S bit
    cmd.word[3] = (address >> 32) as u32;
    cmd.set_opcode(amd_cmd_opcode::INVALIDATE_IOMMU_PAGES);
    cmd
}
/// Build an INVALIDATE_INTERRUPT_TABLE command.
pub fn amd_cmd_invalidate_int_table(bdf: u16) -> AmdCmdEntry {
    let mut cmd = AmdCmdEntry::zeroed();
    cmd.word[0] = bdf as u32; // DeviceId in bits 0:15
    cmd.set_opcode(amd_cmd_opcode::INVALIDATE_INTERRUPT_TABLE);
    cmd
}
/// Build an INVALIDATE_IOMMU_ALL command.
/// Only valid when the Extended Feature Register reports IA_SUP.
pub fn amd_cmd_invalidate_all() -> AmdCmdEntry {
    let mut cmd = AmdCmdEntry::zeroed();
    cmd.set_opcode(amd_cmd_opcode::INVALIDATE_IOMMU_ALL);
    cmd
}
/// AMD-Vi Event Log Entry (128 bits = 16 bytes = 4 × u32).
#[repr(C, packed)]
pub struct AmdEvtEntry {
pub word: [u32; 4],
}
impl AmdEvtEntry {
pub const fn zeroed() -> Self {
Self { word: [0; 4] }
}
    /// Event code (bits 28:31 of word[1]).
    pub fn event_code(&self) -> u16 {
        ((self.word[1] >> 28) & 0xF) as u16
    }
    /// Device ID / BDF (bits 0:15 of word[0]).
    pub fn device_id(&self) -> u16 {
        (self.word[0] & 0xFFFF) as u16
    }
    /// Fault address (word[2] | word[3] << 32).
    pub fn fault_address(&self) -> u64 {
        self.word[2] as u64 | ((self.word[3] as u64) << 32)
    }
    /// Flags from word[1] bits 16:27 (for IO_PAGE_FAULT).
    pub fn fault_flags(&self) -> u16 {
        ((self.word[1] >> 16) & 0xFFF) as u16
    }
    /// Read/write direction from fault flags bit 5 (RW, word[1] bit 21).
    pub fn is_write(&self) -> bool {
        self.word[1] & (1 << 21) != 0
    }
    /// Permission error from fault flags bit 6 (PE, word[1] bit 22).
    pub fn is_permission_error(&self) -> bool {
        self.word[1] & (1 << 22) != 0
    }
}
const _: () = assert!(core::mem::size_of::<AmdEvtEntry>() == 16);
/// AMD-Vi Event Codes.
pub mod amd_evt_code {
pub const ILLEGAL_DEV_TABLE_ENTRY: u16 = 0x01;
pub const IO_PAGE_FAULT: u16 = 0x02;
pub const DEV_TABLE_HW_ERROR: u16 = 0x03;
pub const PAGE_TABLE_HW_ERROR: u16 = 0x04;
pub const ILLEGAL_COMMAND: u16 = 0x05;
pub const COMMAND_HW_ERROR: u16 = 0x06;
pub const IOTLB_INV_TIMEOUT: u16 = 0x07;
pub const INVALID_DEV_REQUEST: u16 = 0x08;
}
/// AMD-Vi Page Table Entry (64 bits).
#[repr(C, packed)]
pub struct AmdPte(pub u64);
impl AmdPte {
/// Present bit (bit 0).
pub fn present(&self) -> bool {
self.0 & (1 << 0) != 0
}
pub fn set_present(&mut self, v: bool) {
if v { self.0 |= 1 << 0; } else { self.0 &= !(1 << 0); }
}
/// Next level (bits 9:11). 0 = leaf PTE, 1-5 = pointer to next table.
pub fn next_level(&self) -> u64 {
(self.0 >> 9) & 0x7
}
pub fn set_next_level(&mut self, level: u64) {
self.0 = (self.0 & !(0x7 << 9)) | ((level & 0x7) << 9);
}
    /// Output address (bits 12:51). Physical frame or next-table address.
    pub fn output_addr(&self) -> u64 {
        self.0 & 0x000F_FFFF_FFFF_F000
    }
    pub fn set_output_addr(&mut self, addr: u64) {
        // Mask covers bits 12:51 only, so permission bits above 51 are preserved.
        self.0 = (self.0 & !0x000F_FFFF_FFFF_F000) | (addr & 0x000F_FFFF_FFFF_F000);
    }
/// No-execute (bit 63). Only valid when NXSup=1.
pub fn no_execute(&self) -> bool {
self.0 & (1 << 63) != 0
}
pub fn set_no_execute(&mut self, v: bool) {
if v { self.0 |= 1 << 63; } else { self.0 &= !(1 << 63); }
}
}
/// Build a leaf PTE that maps addr with Read+Write permissions.
pub fn amd_pte_leaf(addr: u64) -> AmdPte {
    let mut pte = AmdPte(0);
    pte.set_present(true);
    pte.set_next_level(0); // leaf
    pte.set_output_addr(addr);
    pte.0 |= (1 << 61) | (1 << 62); // IR + IW (read + write permission)
    pte
}
/// Build a non-leaf PTE that points to the next-level table at addr.
pub fn amd_pte_pointer(addr: u64, level: u64) -> AmdPte {
    let mut pte = AmdPte(0);
    pte.set_present(true);
    pte.set_next_level(level);
    pte.set_output_addr(addr);
    // Grant IR + IW here as well so that the leaf entries decide the final permissions.
    pte.0 |= (1 << 61) | (1 << 62);
    pte
}
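Commands built with the helpers above are consumed by hardware from a ring in memory: software writes an entry at the tail offset, then publishes the new tail through the tail register at 0x2008. The sketch below shows only the ring arithmetic and assumes byte offsets, 16-byte entries, and one slot kept free so that head == tail always means empty. `enqueue` and its helpers are hypothetical names, and commands are passed as raw `[u32; 4]` words to keep the example self-contained.

```rust
/// Size of one command buffer entry in bytes.
pub const CMD_ENTRY_BYTES: u64 = 16;

/// Tail offset after queuing one more command; wraps at the buffer size.
pub const fn next_tail(tail: u64, buf_bytes: u64) -> u64 {
    (tail + CMD_ENTRY_BYTES) % buf_bytes
}

/// The ring is treated as full when advancing the tail would reach the head.
pub const fn ring_is_full(head: u64, tail: u64, buf_bytes: u64) -> bool {
    next_tail(tail, buf_bytes) == head
}

/// Copy `cmd` into the slot at `tail` and return the new tail offset,
/// or `None` if the ring is empty-sized or currently full.
/// The caller is expected to issue a write barrier and then store the
/// returned value to the command buffer tail register.
pub fn enqueue(ring: &mut [[u32; 4]], head: u64, tail: u64, cmd: [u32; 4]) -> Option<u64> {
    let buf_bytes = ring.len() as u64 * CMD_ENTRY_BYTES;
    if buf_bytes == 0 || ring_is_full(head, tail, buf_bytes) {
        return None;
    }
    ring[(tail / CMD_ENTRY_BYTES) as usize] = cmd;
    Some(next_tail(tail, buf_bytes))
}
```

A typical submission queues the invalidation commands followed by a COMPLETION_WAIT, then polls the store address until the expected value appears.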
3.2 Intel VT-d Structs
/// Intel VT-d IOMMU MMIO register block.
/// Base address from ACPI DMAR DRHD entry.
#[repr(C)]
pub struct IntelVtdMmio {
    pub ver_reg: u32,      // 0x00 Version
    _pad0: u32,            // 0x04
    pub cap_reg: u64,      // 0x08 Capability
    pub ecap_reg: u64,     // 0x10 Extended Capability
    pub gcmd_reg: u32,     // 0x18 Global Command (write-only)
    pub gsts_reg: u32,     // 0x1C Global Status (read-only)
    pub rtaddr_reg: u64,   // 0x20 Root Table Address
    pub ccmd_reg: u64,     // 0x28 Context Command
    _pad1: u32,            // 0x30
    pub fsts_reg: u32,     // 0x34 Fault Status
    pub fectl_reg: u32,    // 0x38 Fault Event Control
    pub fedata_reg: u32,   // 0x3C Fault Event Data
    pub feaddr_reg: u32,   // 0x40 Fault Event Address
    pub feuaddr_reg: u32,  // 0x44 Fault Event Upper Address
    _pad2: [u32; 4],       // 0x48..0x57
    pub aflog_reg: u64,    // 0x58 Advanced Fault Log
    _pad3: u32,            // 0x60
    pub pmen_reg: u32,     // 0x64 Protected Memory Enable
    pub plmbase_reg: u32,  // 0x68 Protected Low Memory Base
    pub plmlimit_reg: u32, // 0x6C Protected Low Memory Limit
    pub phmbase_reg: u64,  // 0x70 Protected High Memory Base
    pub phmlimit_reg: u64, // 0x78 Protected High Memory Limit
    pub iqh_reg: u64,      // 0x80 Invalidation Queue Head
    pub iqt_reg: u64,      // 0x88 Invalidation Queue Tail
    pub iqa_reg: u64,      // 0x90 Invalidation Queue Address
    _pad4: u32,            // 0x98
    pub ics_reg: u32,      // 0x9C Invalidation Completion Status
    pub iectl_reg: u32,    // 0xA0 Invalidation Event Control
    pub iedata_reg: u32,   // 0xA4 Invalidation Event Data
    pub ieaddr_reg: u32,   // 0xA8 Invalidation Event Address
    pub ieuaddr_reg: u32,  // 0xAC Invalidation Event Upper Address
    pub iqercd_reg: u64,   // 0xB0 Invalidation Queue Error Record
    pub irta_reg: u64,     // 0xB8 Interrupt Remapping Table Address
}
// Static assertions for offset verification
const _: () = assert!(core::mem::offset_of!(IntelVtdMmio, fsts_reg) == 0x34);
const _: () = assert!(core::mem::offset_of!(IntelVtdMmio, aflog_reg) == 0x58);
const _: () = assert!(core::mem::offset_of!(IntelVtdMmio, pmen_reg) == 0x64);
const _: () = assert!(core::mem::offset_of!(IntelVtdMmio, ics_reg) == 0x9C);
const _: () = assert!(core::mem::offset_of!(IntelVtdMmio, irta_reg) == 0xB8);
// Note: The VT-d register layout has reserved gaps, and the fault recording and
// IOTLB registers sit at offsets reported by CAP_REG and ECAP_REG rather than at
// fixed positions. For production code, use volatile read/write helpers with
// explicit offsets rather than relying purely on struct field offsets.
// The struct above serves as a reference. IRTA_REG is at offset 0xB8.
/// Intel VT-d CAP_REG bits.
pub mod vtd_cap {
    pub const ND_MASK: u64 = 0x7;         // bits 0:2, domains = 2^(4 + 2*ND)
    pub const AFL: u64 = 1 << 3;          // Advanced Fault Logging
    pub const RWBF: u64 = 1 << 4;         // Required Write-Buffer Flushing
    pub const PLMR: u64 = 1 << 5;         // Protected Low-Memory Region
    pub const PHMR: u64 = 1 << 6;         // Protected High-Memory Region
    pub const CM: u64 = 1 << 7;           // Caching Mode
    pub const SAGAW: u64 = 0x1F << 8;     // bits 8:12
    pub const SAGAW_3LVL: u64 = 1 << 9;   // 3-level page tables (39-bit)
    pub const SAGAW_4LVL: u64 = 1 << 10;  // 4-level page tables (48-bit)
    pub const SAGAW_5LVL: u64 = 1 << 11;  // 5-level page tables (57-bit)
    pub const MGAW_SHIFT: u64 = 16;
    pub const MGAW_MASK: u64 = 0x3F << 16; // field value is MGAW - 1
    pub const ZLR: u64 = 1 << 22;         // Zero Length Read
}
/// Intel VT-d ECAP_REG bits.
pub mod vtd_ecap {
    pub const C: u64 = 1 << 0;   // Page-walk Coherency
    pub const QI: u64 = 1 << 1;  // Queued Invalidation
    pub const DT: u64 = 1 << 2;  // Device TLB
    pub const IR: u64 = 1 << 3;  // Interrupt Remapping
    pub const EIM: u64 = 1 << 4; // Extended Interrupt Mode (x2APIC)
    pub const PT: u64 = 1 << 6;  // Pass Through
    pub const SC: u64 = 1 << 7;  // Snoop Control
    pub const IRO_SHIFT: u64 = 8;
    pub const IRO_MASK: u64 = 0x3FF << 8; // IOTLB register offset, in 16-byte units
}
/// Intel VT-d GCMD_REG bits (write-only).
pub mod vtd_gcmd {
pub const TE: u32 = 1 << 31; // Translation Enable
pub const SRTP: u32 = 1 << 30; // Set Root Table Pointer
pub const SFL: u32 = 1 << 29; // Set Fault Log
pub const EAFL: u32 = 1 << 28; // Enable Advanced Fault Log
pub const WBF: u32 = 1 << 27; // Write Buffer Flush
pub const QIE: u32 = 1 << 26; // Queued Invalidation Enable
pub const IR: u32 = 1 << 25; // Interrupt Remap Enable
pub const SIRTP: u32 = 1 << 24; // Set Interrupt Remap Table Pointer
pub const CFI: u32 = 1 << 23; // Compatibility Format Interrupt (1 = pass through, 0 = block)
}
/// Intel VT-d GSTS_REG bits (read-only).
pub mod vtd_gsts {
pub const TES: u32 = 1 << 31; // Translation Enable Status
pub const RTPS: u32 = 1 << 30; // Root Table Pointer Status
pub const FLS: u32 = 1 << 29; // Fault Log Status
pub const AFLS: u32 = 1 << 28; // Advanced Fault Log Status
pub const WBFS: u32 = 1 << 27; // Write Buffer Flush Status
pub const QIES: u32 = 1 << 26; // Queued Invalidation Enable Status
pub const IRES: u32 = 1 << 25; // Interrupt Remap Enable Status
pub const IRTPS: u32 = 1 << 24; // Interrupt Remap Table Pointer Status
pub const CFIS: u32 = 1 << 23; // Compatibility Format Interrupt Status
}
/// Intel VT-d Root Table Entry (128 bits = 16 bytes).
/// 256 entries (one per PCI bus). 4KiB-aligned.
#[repr(C, packed)]
pub struct VtdRootEntry {
pub data: [u64; 2],
}
impl VtdRootEntry {
pub const fn zeroed() -> Self {
Self { data: [0; 2] }
}
/// Present (bit 0 of data[0]).
pub fn present(&self) -> bool {
self.data[0] & (1 << 0) != 0
}
pub fn set_present(&mut self, v: bool) {
if v { self.data[0] |= 1 << 0; } else { self.data[0] &= !(1 << 0); }
}
/// Context Table Pointer (bits 12:63 of data[0]).
pub fn context_table_ptr(&self) -> u64 {
self.data[0] & !0xFFF
}
pub fn set_context_table_ptr(&mut self, addr: u64) {
self.data[0] = (self.data[0] & 0xFFF) | (addr & !0xFFF);
}
}
const _: () = assert!(core::mem::size_of::<VtdRootEntry>() == 16);
/// Intel VT-d Context Entry (128 bits = 16 bytes).
/// 256 entries per bus (one per device:function). 4KiB-aligned table.
#[repr(C, packed)]
pub struct VtdContextEntry {
pub data: [u64; 2],
}
impl VtdContextEntry {
pub const fn zeroed() -> Self {
Self { data: [0; 2] }
}
/// Present (bit 0 of data[0]).
pub fn present(&self) -> bool {
self.data[0] & (1 << 0) != 0
}
pub fn set_present(&mut self, v: bool) {
if v { self.data[0] |= 1 << 0; } else { self.data[0] &= !(1 << 0); }
}
/// Fault Processing Disable (bit 1 of data[0]).
pub fn fault_processing_disable(&self) -> bool {
self.data[0] & (1 << 1) != 0
}
pub fn set_fault_processing_disable(&mut self, v: bool) {
if v { self.data[0] |= 1 << 1; } else { self.data[0] &= !(1 << 1); }
}
/// Translation Type (bits 2:3 of data[0]).
/// 00=legacy (untranslated requests only), 01=untranslated and translation requests (Device-TLB), 10=pass-through, 11=reserved.
pub fn translation_type(&self) -> u64 {
(self.data[0] >> 2) & 0x3
}
pub fn set_translation_type(&mut self, tt: u64) {
self.data[0] = (self.data[0] & !(0x3 << 2)) | ((tt & 0x3) << 2);
}
/// Second Level Page Table Pointer (bits 12:63 of data[0]).
pub fn slpt_ptr(&self) -> u64 {
self.data[0] & !0xFFF
}
pub fn set_slpt_ptr(&mut self, addr: u64) {
self.data[0] = (self.data[0] & 0xFFF) | (addr & !0xFFF);
}
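    /// Address Width (bits 0:2 of data[1]).
    /// 1=39-bit (3-level), 2=48-bit (4-level), 3=57-bit (5-level).
    pub fn address_width(&self) -> u64 {
        self.data[1] & 0x7
    }
    pub fn set_address_width(&mut self, aw: u64) {
        self.data[1] = (self.data[1] & !0x7) | (aw & 0x7);
    }
    /// Domain Identifier (bits 8:23 of data[1]).
    pub fn domain_id(&self) -> u16 {
        ((self.data[1] >> 8) & 0xFFFF) as u16
    }
    pub fn set_domain_id(&mut self, id: u16) {
        self.data[1] = (self.data[1] & !(0xFFFF << 8)) | ((id as u64) << 8);
    }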
}
const _: () = assert!(core::mem::size_of::<VtdContextEntry>() == 16);
/// Intel VT-d Translation Type constants.
pub mod vtd_tt {
pub const LEGACY: u64 = 0b00;
pub const DEV_IOTLB: u64 = 0b01;
pub const PASS_THROUGH: u64 = 0b10;
}
/// Intel VT-d Page Table Entry (64 bits).
#[repr(C, packed)]
pub struct VtdPte(pub u64);
impl VtdPte {
/// Read permission (bit 0).
pub fn read(&self) -> bool {
self.0 & (1 << 0) != 0
}
pub fn set_read(&mut self, v: bool) {
if v { self.0 |= 1 << 0; } else { self.0 &= !(1 << 0); }
}
/// Write permission (bit 1).
pub fn write(&self) -> bool {
self.0 & (1 << 1) != 0
}
pub fn set_write(&mut self, v: bool) {
if v { self.0 |= 1 << 1; } else { self.0 &= !(1 << 1); }
}
/// Page frame or next-table address (bits 12:63).
pub fn addr(&self) -> u64 {
self.0 & !0xFFF
}
pub fn set_addr(&mut self, a: u64) {
self.0 = (self.0 & 0xFFF) | (a & !0xFFF);
}
}
/// Build a leaf PTE for Intel VT-d with read+write.
pub fn vtd_pte_leaf(addr: u64) -> VtdPte {
let mut pte = VtdPte(0);
pte.set_read(true);
pte.set_write(true);
pte.set_addr(addr);
pte
}
/// Build a non-leaf PTE for Intel VT-d pointing to next-level table.
pub fn vtd_pte_pointer(addr: u64) -> VtdPte {
let mut pte = VtdPte(0);
pte.set_read(true);
pte.set_write(true);
pte.set_addr(addr);
pte
}
3.3 ACPI Table Structs
/// Common ACPI table header (36 bytes).
#[repr(C, packed)]
pub struct AcpiTableHeader {
pub signature: [u8; 4],
pub length: u32,
pub revision: u8,
pub checksum: u8,
pub oem_id: [u8; 6],
pub oem_table_id: [u8; 8],
pub oem_revision: u32,
pub creator_id: [u8; 4],
pub creator_revision: u32,
}
const _: () = assert!(core::mem::size_of::<AcpiTableHeader>() == 36);
/// IVRS ACPI Table Header.
#[repr(C, packed)]
pub struct IvrsTable {
    pub header: AcpiTableHeader, // 36 bytes
    pub iv_info: u32,            // IOMMU Virtualization Info
    pub reserved: u64,           // Reserved
    // Followed by variable-length IVHD/IVMD entries (starting at offset 48).
}
/// IVHD Entry (I/O Virtualization Hardware Definition).
#[repr(C, packed)]
pub struct IvhdEntry {
pub entry_type: u8, // 0x10 or 0x11
pub flags: u8, // Feature flags
pub length: u16, // Total length including device entries
pub device_id: u16, // BDF of IOMMU PCI device
pub capability_offset: u16, // PCI capability offset
pub iommu_base_address: u64, // MMIO base address
pub pci_segment_group: u16, // PCI segment group
pub iommu_info: u16, // IOMMU info (MSI number, unit ID)
pub iommu_efr: u32, // Extended features (type 11 only)
// Followed by variable-length device entries.
}
/// IVMD Entry (I/O Virtualization Memory Definition).
#[repr(C, packed)]
pub struct IvmdEntry {
    pub entry_type: u8,     // 0x20, 0x21, or 0x22
    pub flags: u8,          // Memory block flags
    pub length: u16,        // Total length
    pub device_id: u16,     // Start DeviceId (BDF) or 0x0000 for all
    pub aux_data: u16,      // Auxiliary data
    pub reserved: u64,      // Reserved
    pub start_address: u64, // Physical start address
    pub memory_length: u64, // Length in bytes
}
/// IVHD Device Entry (4 bytes minimum).
#[repr(C, packed)]
pub struct IvhdDeviceEntry {
    pub dev_type: u8,   // Device entry type (0x00..0x44)
    pub device_id: u16, // BDF for SEL/SOR/EOR
    pub data: u8,       // Data setting flags
}
/// DMAR ACPI Table Header.
#[repr(C, packed)]
pub struct DmarTable {
pub header: AcpiTableHeader, // 36 bytes
pub host_address_width: u8, // DMA physical address width
pub flags: u8, // [0]=INTR_REMAP, [1]=X2APIC_OPT_OUT
pub reserved: [u8; 10], // Reserved
// Followed by variable-length DRHD/RMRR entries.
}
const _: () = assert!(core::mem::size_of::<DmarTable>() == 48);
/// DRHD Entry (DMA Remapping Hardware Unit Definition).
#[repr(C, packed)]
pub struct DrhdEntry {
pub entry_type: u16, // 0x0000
pub length: u16, // Total length including device scope
pub flags: u8, // [0]=INCLUDE_PCI_ALL
pub reserved: u8, // Reserved
pub segment_number: u16, // PCI segment group
pub register_base_address: u64, // Physical MMIO base address
// Followed by variable-length device scope entries.
}
/// DRHD Device Scope Entry.
#[repr(C, packed)]
pub struct DmarDeviceScope {
    pub scope_type: u8,       // 0x01=PCI EP, 0x02=PCI sub-hierarchy, 0x03=IOAPIC, 0x04=HPET
    pub length: u8,           // Total length including path entries
    pub reserved: u16,        // Reserved
    pub enumeration_id: u8,   // Enumeration ID (IOAPIC ID, etc.)
    pub start_bus_number: u8, // Starting PCI bus number
    // Followed by path entries (each 2 bytes: device, function).
}
/// RMRR Entry (Reserved Memory Region Reporting).
#[repr(C, packed)]
pub struct RmrrEntry {
pub entry_type: u16, // 0x0001
pub length: u16, // Total length
pub reserved: u16, // Reserved
pub segment_number: u16, // PCI segment group
pub base_address: u64, // Physical start address
pub end_address: u64, // Physical end address (inclusive)
// Followed by variable-length device scope entries.
}
/// DMAR Sub-Table Types.
pub mod dmar_type {
    pub const DRHD: u16 = 0x0000;
    pub const RMRR: u16 = 0x0001;
    pub const ATSR: u16 = 0x0002;
    pub const RHSA: u16 = 0x0003;
    pub const ANDD: u16 = 0x0004;
    pub const SATC: u16 = 0x0005;
}
/// DMAR Device Scope Types.
pub mod dmar_scope_type {
pub const PCI_ENDPOINT: u8 = 0x01;
pub const PCI_SUBHIERARCHY: u8 = 0x02;
pub const IOAPIC: u8 = 0x03;
pub const MSI_HPET: u8 = 0x04;
pub const ACPI_NAMESPACE: u8 = 0x05;
}
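Before any IVRS or DMAR field is trusted, the table checksum in the common header can be verified: all bytes of an ACPI table, including the checksum byte itself, must sum to zero modulo 256. A minimal sketch, assuming the caller has already mapped the whole table and sliced it to the header's `length` field; `acpi_checksum_ok` is a hypothetical helper name.

```rust
/// Return true if the byte-wise sum of the whole table is 0 (mod 256).
/// An empty slice is rejected, since it cannot contain a header.
pub fn acpi_checksum_ok(table: &[u8]) -> bool {
    !table.is_empty() && table.iter().fold(0u8, |sum, b| sum.wrapping_add(*b)) == 0
}
```

Rejecting tables that fail this check is a cheap guard against parsing a truncated mapping or corrupted firmware data.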
3.4 Utility Types
/// BDF (Bus:Device:Function) packed as u16.
/// Format: bus[15:8] | device[7:3] | function[2:0].
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct Bdf(pub u16);
impl Bdf {
pub fn new(bus: u8, device: u8, function: u8) -> Self {
Self(((bus as u16) << 8) | ((device as u16 & 0x1F) << 3) | (function as u16 & 0x7))
}
pub fn bus(&self) -> u8 {
(self.0 >> 8) as u8
}
pub fn device(&self) -> u8 {
((self.0 >> 3) & 0x1F) as u8
}
pub fn function(&self) -> u8 {
(self.0 & 0x7) as u8
}
/// Index into the AMD Device Table (same as raw BDF value).
pub fn dev_table_index(&self) -> usize {
self.0 as usize
}
}
/// Domain ID. Used to group devices sharing a page table.
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub struct DomainId(pub u16);
/// Page table level constants.
pub mod pt_level {
/// AMD-Vi levels (Mode field in DTE).
pub const AMD_1_LEVEL: u64 = 1;
pub const AMD_2_LEVEL: u64 = 2;
pub const AMD_3_LEVEL: u64 = 3;
pub const AMD_4_LEVEL: u64 = 4;
pub const AMD_5_LEVEL: u64 = 5;
pub const AMD_6_LEVEL: u64 = 6;
    /// Intel VT-d levels (as advertised by the SAGAW field).
    pub const VTD_3_LEVEL: u64 = 3;
    pub const VTD_4_LEVEL: u64 = 4;
    pub const VTD_5_LEVEL: u64 = 5;
    pub const VTD_6_LEVEL: u64 = 6;
}
3.5 Size Constants
/// AMD-Vi sizing constants.
pub mod amd_sizes {
/// Maximum Device Table entries.
pub const MAX_DEV_TABLE_ENTRIES: usize = 65536;
/// Device Table Entry size.
pub const DTE_SIZE: usize = 32;
/// Maximum Device Table size (65536 × 32 bytes).
pub const MAX_DEV_TABLE_SIZE: usize = MAX_DEV_TABLE_ENTRIES * DTE_SIZE; // 2 MiB
/// Default Command Buffer entries.
pub const CMD_BUF_ENTRIES: usize = 512;
/// Command Buffer Entry size.
pub const CMD_ENTRY_SIZE: usize = 16;
/// Default Command Buffer size.
pub const CMD_BUF_SIZE: usize = CMD_BUF_ENTRIES * CMD_ENTRY_SIZE; // 8 KiB
/// Default Event Log entries.
pub const EVT_LOG_ENTRIES: usize = 512;
/// Event Log Entry size.
pub const EVT_ENTRY_SIZE: usize = 16;
/// Default Event Log size.
pub const EVT_LOG_SIZE: usize = EVT_LOG_ENTRIES * EVT_ENTRY_SIZE; // 8 KiB
/// IRTE size (128 bits).
pub const IRTE_SIZE: usize = 16;
    /// Maximum Interrupt Remap Table entries (IntTabLen=11 → 2^11 = 2048).
    pub const MAX_IRT_ENTRIES: usize = 2048;
    /// Maximum Interrupt Remap Table size.
    pub const MAX_IRT_SIZE: usize = MAX_IRT_ENTRIES * IRTE_SIZE; // 32 KiB
/// Page table entry size (both AMD and Intel).
pub const PTE_SIZE: usize = 8;
/// Entries per page table page (4KiB / 8 bytes).
pub const PTES_PER_PAGE: usize = 512;
}
/// Intel VT-d sizing constants.
pub mod vtd_sizes {
/// Root Table entries (one per PCI bus).
pub const ROOT_TABLE_ENTRIES: usize = 256;
/// Root/Context Entry size.
pub const ENTRY_SIZE: usize = 16;
/// Root Table size.
pub const ROOT_TABLE_SIZE: usize = ROOT_TABLE_ENTRIES * ENTRY_SIZE; // 4 KiB
/// Context Table entries (one per device:function per bus).
pub const CTX_TABLE_ENTRIES: usize = 256;
/// Context Table size.
pub const CTX_TABLE_SIZE: usize = CTX_TABLE_ENTRIES * ENTRY_SIZE; // 4 KiB
/// Page table entry size.
pub const PTE_SIZE: usize = 8;
/// Entries per page table page.
pub const PTES_PER_PAGE: usize = 512;
}
/// PCI BDF address space: 256 buses × 32 devices × 8 functions = 65536.
pub const PCI_BDF_COUNT: usize = 256 * 32 * 8;
Appendix: Linux Kernel Reference
The Linux kernel IOMMU drivers are the primary reference implementation. Key files:
| Path | Description |
|---|---|
| drivers/iommu/amd/amd_iommu_types.h | AMD-Vi type definitions, DTE/IRTE/PTE formats, register constants |
| drivers/iommu/amd/iommu.c | AMD-Vi main driver: command buffer, device table management, interrupt remapping |
| drivers/iommu/amd/init.c | AMD-Vi initialization, IVRS parsing, early setup |
| drivers/iommu/intel/dmar.c | Intel VT-d DMAR table parsing |
| drivers/iommu/intel/iommu.c | Intel VT-d main driver |
| drivers/iommu/intel/irq_remapping.c | Intel VT-d interrupt remapping |
| drivers/iommu/intel/iommu.h | Intel VT-d register definitions, struct definitions |
| drivers/iommu/io-pgtable.c | Generic page table allocation |
Key Linux Constants for Cross-Reference
// AMD DTE bits (from amd_iommu_types.h)
#define DTE_FLAG_V (1ULL << 0)
#define DTE_FLAG_TV (1ULL << 1)
#define DTE_FLAG_IR (1ULL << 61)
#define DTE_FLAG_IW (1ULL << 62)
#define DTE_FLAG_SE (1ULL << 8)
// AMD page table modes (DTE Mode field)
#define PAGE_MODE_4_LEVEL 0x04 // 4-level page tables (most common)
// AMD command opcodes
#define CMD_COMPL_WAIT 0x01
#define CMD_INV_DEV_ENTRY 0x02
#define CMD_INV_IOMMU_PAGES 0x03
#define CMD_INV_IOTLB_PAGES 0x04
#define CMD_INV_IRT 0x05
#define CMD_INV_ALL 0x08
// Intel DMAR flags
#define DMAR_INTR_REMAP 0x1
#define DMAR_X2APIC_OPT_OUT 0x2
// Intel context entry TT (Translation Type)
#define CONTEXT_TT_MULTI_LEVEL 0
#define CONTEXT_TT_DEV_IOTLB 1
#define CONTEXT_TT_PASS_THROUGH 2
Document generated for Red Bear OS IOMMU implementation. Sources: AMD IOMMU Specification 48882 Rev 3.10, Intel VT-d Specification Rev 5.0, Linux kernel v6.x source.