1947 lines
72 KiB
Markdown
1947 lines
72 KiB
Markdown
# IOMMU Specification Reference — AMD-Vi & Intel VT-d
|
||
|
||
**Purpose**: Implementation-ready hardware register and data structure reference for Red Bear OS IOMMU support. Based on AMD IOMMU Specification 48882 Rev 3.10 and Intel Virtualization Technology for Directed I/O (VT-d) Rev 5.0.
|
||
|
||
**Status**: The `iommu` daemon now builds in-tree, owns AMD-Vi runtime initialization, and also detects the presence of a kernel ACPI `DMAR` table as a convergence seam so Intel VT-d runtime ownership can move here instead of remaining conceptually stranded in `acpid`. That does **not** yet mean Intel VT-d ownership is cleanly closed in the current tree. Hardware validation is still missing in the AMD-first integration plan (see `AMD-FIRST-INTEGRATION.md`). This document provides the register and data-structure reference for finishing AMD-Vi and Intel VT-d bring-up.
|
||
|
||
---
|
||
|
||
## 1. AMD-Vi (AMD IOMMU)
|
||
|
||
### 1.1 MMIO Register Map
|
||
|
||
Base address obtained from ACPI IVRS table (IVHD entry `IOMMUInfo` field).
|
||
|
||
| Offset | Name | Size | Access | Description |
|
||
|--------|------|------|--------|-------------|
|
||
| 0x0000 | DevTableBar | 64-bit | R/W | Device Table Base Address. Bits 12:51 hold physical address. Bits 0:8 = DeviceTableSize (entries = 2^(size+1), max 65536). Must be 4KiB-aligned. |
|
||
| 0x0008 | CmdBufBar | 64-bit | R/W | Command Buffer Base Address. Bits 12:51 hold physical address. Bits 0:8 = CmdBufLen (size = 2^(len+2) × 16 bytes). Must be 4KiB-aligned. |
|
||
| 0x0010 | EvtLogBar | 64-bit | R/W | Event Log Base Address. Bits 12:51 hold physical address. Bits 0:8 = EvtLogLen (size = 2^(len+2) × 16 bytes). Must be 4KiB-aligned. |
|
||
| 0x0018 | Control | 32-bit | R/W | IOMMU Control Register. See bit layout below. |
|
||
| 0x0020 | ExclusionBase | 64-bit | R/W | Exclusion Range Base Address. Physical address of excluded region start. |
|
||
| 0x0028 | ExclusionLimit | 64-bit | R/W | Exclusion Range Limit Address. Physical address of excluded region end. |
|
||
| 0x0030 | ExtendedFeature | 64-bit | RO | Extended Feature Register. Capability flags. Read to determine supported features. |
|
||
| 0x0038 | PprLogBar | 64-bit | R/W | Peripheral Page Request Log Base Address. Bits 12:51 = address, Bits 0:8 = log length. |
|
||
| 0x0030 | ExtendedFeature | 64-bit | RO | Extended Feature Register (alias for capability query). |
|
||
| 0x2000 | CmdBufHead | 64-bit | R/W | Command Buffer Head Pointer. Index into command buffer (byte offset / 16). |
|
||
| 0x2008 | CmdBufTail | 64-bit | R/W | Command Buffer Tail Pointer. Written by software to submit commands. |
|
||
| 0x2010 | EvtLogHead | 64-bit | R/W | Event Log Head Pointer. Written by software after reading events. |
|
||
| 0x2018 | EvtLogTail | 64-bit | RO | Event Log Tail Pointer. Updated by IOMMU hardware after writing event. |
|
||
| 0x2020 | Status | 32-bit | RO | IOMMU Status Register. See bit layout below. |
|
||
| 0x2028 | PprLogHead | 64-bit | R/W | PPR Log Head Pointer. |
|
||
| 0x2030 | PprLogTail | 64-bit | RO | PPR Log Tail Pointer. |
|
||
|
||
#### Control Register (0x0018) Bit Layout
|
||
|
||
| Bit | Name | Description |
|
||
|-----|------|-------------|
|
||
| 0 | IOMMUEnable | 0 = IOMMU translations disabled, 1 = enabled. Must be set last after all other config. |
|
||
| 1 | HTTunEn | HyperTransport Tunnel Enable. Set 0 for modern systems. |
|
||
| 2 | EventLogEn | Event Log Enable. Set 1 to enable event logging. |
|
||
| 3 | EventIntEn | Event Log Interrupt Enable. Set 1 to generate interrupts on event log overflow. |
|
||
| 4 | ComWaitIntEn | Completion Wait Interrupt Enable. |
|
||
| 5 | CmdBufEn | Command Buffer Enable. Set 1 to enable command processing. |
|
||
| 6 | PprLogEn | Peripheral Page Request Log Enable. |
|
||
| 7 | PprIntEn | PPR Log Interrupt Enable. |
|
||
| 8 | PprEn | Peripheral Page Request Processing Enable. |
|
||
| 9 | GTEn | Guest Translation Enable. |
|
||
| 10 | GAEn | Guest APIC (Advanced Programmable Interrupt Controller) Enable. |
|
||
| 12 | CRW | IOMMU Reset. Write 1 to clear errors after reset. |
|
||
| 13 | SMifEn | SMI Filter Enable. |
|
||
| 14 | SlFWEn | Self-Modify Firmware Enable. |
|
||
| 15 | SMifLogEn | SMI Filter Log Enable. |
|
||
| 16 | GAMEn_0 | Guest APIC Mode bit 0. |
|
||
| 17 | GAMEn_1 | Guest APIC Mode bit 1. |
|
||
| 18 | GAMEn_2 | Guest APIC Mode bit 2. |
|
||
| 22 | XTEn | x2APIC Enabled. |
|
||
| 23 | NXEn | No-Execute Enable. |
|
||
| 24 | IRQTableLEn | Interrupt Remap Table Length Enable. |
|
||
|
||
#### Status Register (0x2020) Bit Layout
|
||
|
||
| Bit | Name | Description |
|
||
|-----|------|-------------|
|
||
| 0 | IOMMURunning | 1 = IOMMU is processing commands or translations. |
|
||
| 1 | EventOverflow | 1 = Event log overflow occurred. Write 1 to clear. |
|
||
| 2 | EventLogInt | 1 = Event log interrupt pending. |
|
||
| 3 | ComWaitInt | 1 = Completion wait interrupt pending. |
|
||
| 4 | PprOverflow | 1 = PPR log overflow. |
|
||
| 5 | PprInt | 1 = PPR log interrupt pending. |
|
||
| 31 | RsvdP | Reserved (polling status bits). |
|
||
|
||
#### Extended Feature Register (0x0030) Bit Layout
|
||
|
||
| Bit | Name | Description |
|
||
|-----|------|-------------|
|
||
| 0 | PrefSup | Prefetch Support. |
|
||
| 1 | PPRSup | Peripheral Page Request Support. |
|
||
| 2 | XTSup | x2APIC Support. |
|
||
| 3 | NXSup | No-Execute Support. |
|
||
| 4 | GTSup | Guest Translation Support. |
|
||
| 5 | bit5 | Reserved. |
|
||
| 6 | IASup | Invalidate IOMMU All Support. |
|
||
| 7 | GASup | Guest APIC Support. |
|
||
| 8 | HESup | Hardware Error Registers Support. |
|
||
| 9 | PCSup | Performance Counters Support. |
|
||
| 12:15 | MsiNumPPR | MSI message number for PPR. |
|
||
| 27 | PASMax | Maximum PASID support. |
|
||
| 46:52 | PASMax | Physical Address Space Max (1 = 48-bit, 2 = 52-bit). |
|
||
| 57 | GISup | Global Invalidate Support. |
|
||
| 58 | HASup | Host Address Translation Size. |
|
||
|
||
### 1.2 Device Table Entry (DTE)
|
||
|
||
The Device Table holds up to 65536 entries indexed by BDF (Bus:Device:Function). Each entry is 256 bits (32 bytes). The table must be contiguous in physical memory.
|
||
|
||
**Table size**: entries × 32 bytes. With 65536 entries, max 2 MiB.
|
||
|
||
```
|
||
DTE layout (256 bits = data[0] data[1] data[2] data[3], each u64):
|
||
|
||
data[0] (bits 0-63):
|
||
[ 0] V — Valid. 1 = entry is valid.
|
||
[ 1] TV — Translation Valid. 1 = address translation enabled for this device.
|
||
[ 2:3] Reserved
|
||
[ 4] IW — Write permission (when Mode != 0). 1 = device may write.
|
||
[ 5] IR — Read permission (when Mode != 0). 1 = device may read.
|
||
[ 6:7] Reserved
|
||
[ 8] SE — Snoop Enable. 1 = device requests are snooped.
|
||
[ 9:11] Mode — Translation mode:
|
||
000 = No translation (pass-through if TV=0)
|
||
001 = 1-level page table
|
||
010 = 2-level page table
|
||
011 = 3-level page table
|
||
100 = 4-level page table
|
||
101 = 5-level page table
|
||
110 = 6-level page table
|
||
111 = Reserved
|
||
[12:51] PTP — Page Table Root Pointer. Physical address of top-level page table.
|
||
Must be 4KiB-aligned. Bits 0:11 of the address are assumed zero.
|
||
[52:55] GCR3Trp0 — Guest CR3 Table Root Pointer bits 12:15.
|
||
[56:58] GV — Guest Translation Valid bits.
|
||
[59] GLX — Guest Levels bit 0.
|
||
[60] GLX — Guest Levels bit 1.
|
||
[61] IR — Interrupt Remapping Enable. 1 = interrupts from this device are remapped.
|
||
[62] IW — Interrupt Write permission. 1 = device may generate interrupt writes.
|
||
[63] Reserved
|
||
|
||
data[1] (bits 64-127):
|
||
[0:3] IntTabLen — Interrupt Remap Table Length. Number of entries = 2^(IntTabLen+1).
|
||
0 = 2 entries, 1 = 4 entries, ..., 10 = 2048 entries, 11 = 4096 entries.
|
||
[4:5] IntCtl — Interrupt Control. 00 = abort, 01 = pass-through (no remap),
|
||
10 = remapped, 11 = reserved.
|
||
[6:51] IRTP — Interrupt Remap Table Pointer. Physical address of interrupt
|
||
remap table. Must be 4KiB-aligned (bits 0:11 assumed zero).
|
||
[52:63] Reserved
|
||
|
||
data[2] (bits 128-191):
|
||
[0:51] GCR3Trp1 — Guest CR3 Table Root Pointer bits 16:63.
|
||
[52:63] Reserved
|
||
|
||
data[3] (bits 192-255):
|
||
[0:15] GCR3Trp2 — Guest CR3 Table Root Pointer bits 64:79.
|
||
[16] AttrRsvd — Reserved attribute bit.
|
||
[17] AttrU — User bit for device-specific use.
|
||
[18:20] Mode2 — Alias to Mode bits (duplicate for hardware).
|
||
[21:63] Reserved
|
||
```
|
||
|
||
**Key constants from Linux** (`drivers/iommu/amd/amd_iommu_types.h`):
|
||
|
||
```c
|
||
#define DTE_FLAG_V (1ULL << 0)
|
||
#define DTE_FLAG_TV (1ULL << 1)
|
||
#define DTE_FLAG_IR (1ULL << 61)
|
||
#define DTE_FLAG_IW (1ULL << 62)
|
||
#define DTE_MODE_MASK 0x0E00ULL // bits 9:11
|
||
#define DTE_PT_ADDR_MASK 0x0FFFFFFFFFF000ULL // bits 12:51
|
||
#define DEV_DOMID_MASK 0x0FFFFULL // domain ID in bits 0:15 (when TV=0)
|
||
```
|
||
|
||
### 1.3 Interrupt Remapping Table Entry (IRTE)
|
||
|
||
The Interrupt Remap Table is pointed to by the IRTP field in the DTE. Each entry is 128 bits (16 bytes). Length is 2^(IntTabLen+1) entries.
|
||
|
||
```
|
||
IRTE (128 bits = data[0] data[1], each u64):
|
||
|
||
data[0]:
|
||
[0] RemapEn — Remap Enable. 1 = this entry is valid for remapping.
|
||
[1] SupIOPF — Suppress I/O Page Faults. 1 = suppress faults from this interrupt.
|
||
[2] IntType — Interrupt Type:
|
||
000 = Fixed (edge or level, determined by trigger mode)
|
||
001 = Arbitrated
|
||
010 = SMI
|
||
011 = NMI
|
||
100 = INIT
|
||
101 = EXTINT
|
||
111 = Hardware-specific
|
||
[3:4] IntType bits continued (3-bit field uses bits 2:4)
|
||
[5] Rsvd — Reserved.
|
||
[5:7] DM — Delivery Mode. 0 = Fixed, 1 = Lowest Priority.
|
||
[8] IRrsvd — Reserved.
|
||
[9:10] GV — Guest Vector.
|
||
[11] GDstMode — Guest Destination Mode. 0 = Physical, 1 = Logical.
|
||
[12] DstMode — Destination Mode. 0 = Physical APIC ID, 1 = Logical.
|
||
[13:15] Rsvd — Reserved.
|
||
[16:31] DstID — Destination APIC ID. For x2APIC, full 32-bit ID (low 16 bits here).
|
||
[16:31] DstLo — Low 16 bits of destination APIC ID.
|
||
[32:63] Vector — Interrupt vector (0x10..0xFE).
|
||
|
||
data[1]:
|
||
[0:31] DstHi — High 32 bits of x2APIC destination ID. Zero for xAPIC.
|
||
[32:63] Rsvd — Reserved. Must be zero.
|
||
```
|
||
|
||
**IRTE bit layout for x2APIC mode (when XTSup=1 in ExtendedFeature)**:
|
||
|
||
```
|
||
data[0]:
|
||
[0] RemapEn — 1 = valid
|
||
[1] SupIOPF — Suppress IO Page Fault
|
||
[2:4] IntType — Interrupt type (same as above)
|
||
[5:7] Rsvd
|
||
[8] DstMode — 0 = physical, 1 = logical
|
||
[9:10] Rsvd
|
||
[16:31] DstIDLo — Low 16 bits of x2APIC ID
|
||
[32:39] Vector — Interrupt vector
|
||
[40:63] Rsvd
|
||
|
||
data[1]:
|
||
[0:31] DstIDHi — High 32 bits of x2APIC destination ID
|
||
[32:63] Rsvd
|
||
```
|
||
|
||
### 1.4 Command Buffer Entry
|
||
|
||
The command buffer is a circular queue. Each entry is 128 bits (16 bytes = 4 × u32). Software writes to the tail, hardware reads from the head. Base address in CmdBufBar, head/tail pointers in CmdBufHead/CmdBufTail.
|
||
|
||
**Buffer sizing**: 8192 bytes default (512 entries). Size = 2^(CmdBufLen+2) × 16 bytes.
|
||
|
||
```
|
||
Command Buffer Entry (128 bits = word[0] word[1] word[2] word[3], each u32):
|
||
|
||
word[0]:
|
||
[0:3] Opcode — Command opcode (see below)
|
||
[4:31] Varies — Opcode-specific operands
|
||
|
||
word[1], word[2], word[3]:
|
||
Opcode-specific payload. See each command format below.
|
||
```
|
||
|
||
#### COMPLETION_WAIT (Opcode 0x01)
|
||
|
||
Used to poll for command completion. Can generate an interrupt or write a value to memory.
|
||
|
||
```
|
||
word[0]: [0:3]=0x01, [4]=Store (1=write to memory), [5]=Interrupt (1=generate IRQ),
|
||
[6:31] Reserved
|
||
word[1]: [0:31] Store Address low 32 bits (physical, must be 8-byte aligned)
|
||
word[2]: [0:31] Store Address high 32 bits
|
||
word[3]: [0:31] Store Data — value written to Store Address when command completes
|
||
```
|
||
|
||
#### INVALIDATE_DEVTAB_ENTRY (Opcode 0x02)
|
||
|
||
Invalidates a single device table entry. Must be issued after modifying a DTE.
|
||
|
||
```
|
||
word[0]: [0:3]=0x02, [4:31] Reserved
|
||
word[1]: [0:15] DeviceId (BDF format: Bus[15:8] | Dev[7:3] | Func[2:0])
|
||
[16:31] Reserved
|
||
word[2]: [0:31] Reserved
|
||
word[3]: [0:31] Reserved
|
||
```
|
||
|
||
#### INVALIDATE_IOMMU_PAGES (Opcode 0x03)
|
||
|
||
Invalidates translation cache (TLB) entries for a range of pages.
|
||
|
||
```
|
||
word[0]: [0:3]=0x03, [4]=S (Size: 0=invalidate one page, 1=invalidate all pages for domain),
|
||
[5]=PDE (Page Directory Entry: 1=invalidate PDE cache too),
|
||
[6:31] Reserved
|
||
word[1]: [0:15] DomainId — domain to invalidate
|
||
[16:31] Reserved
|
||
word[2]: [0:51] Address — virtual address to invalidate (page-aligned). Ignored if S=1.
|
||
[52:63] Reserved
|
||
word[3]: [0:31] Reserved
|
||
```
|
||
|
||
#### INVALIDATE_INTERRUPT_TABLE (Opcode 0x04)
|
||
|
||
Invalidates the interrupt remap cache for a device.
|
||
|
||
```
|
||
word[0]: [0:3]=0x04, [4:31] Reserved
|
||
word[1]: [0:15] DeviceId (BDF format)
|
||
[16:31] Reserved
|
||
word[2]: [0:31] Reserved
|
||
word[3]: [0:31] Reserved
|
||
```
|
||
|
||
#### INVALIDATE_IOMMU_ALL (Opcode 0x05)
|
||
|
||
Invalidates all IOMMU caches (TLB, DTE, IRTE). Available when IASup=1.
|
||
|
||
```
|
||
word[0]: [0:3]=0x05, [4:31] Reserved
|
||
word[1]: [0:31] Reserved
|
||
word[2]: [0:31] Reserved
|
||
word[3]: [0:31] Reserved
|
||
```
|
||
|
||
### 1.5 Event Log Entry
|
||
|
||
The event log is a circular queue written by the IOMMU hardware. Each entry is 128 bits (16 bytes = 4 × u32). Base address in EvtLogBar.
|
||
|
||
**Buffer sizing**: 8192 bytes default (512 entries). Size = 2^(EvtLogLen+2) × 16 bytes.
|
||
|
||
```
|
||
Event Log Entry (128 bits = word[0] word[1] word[2] word[3]):
|
||
|
||
word[0]:
|
||
[0:15] EventCode — Event type code (see below)
|
||
[16:31] EventFlags — Event-specific flags
|
||
|
||
word[1], word[2], word[3]:
|
||
Event-specific data. See each event type below.
|
||
```
|
||
|
||
#### IO_PAGE_FAULT (Event Code 0x01)
|
||
|
||
Generated when a device accesses an address that fails translation.
|
||
|
||
```
|
||
word[0]: [0:15]=0x01, [16] TR (Translation Response: 1=fault in translation),
|
||
[17] RZ (Read/Zero: 1=read of zero page), [18] I (Interrupt: 1=interrupt request),
|
||
[19] PE (Permission Error: 1=permission violation), [20] RW (1=write, 0=read),
|
||
[21] PR (Present: 1=PTE was present), [22] Rsvd
|
||
word[1]: [0:15] DeviceId (BDF), [16:31] Reserved or PASID
|
||
word[2]: [0:31] Fault Address low 32 bits
|
||
word[3]: [0:31] Fault Address high 32 bits
|
||
```
|
||
|
||
#### INVALIDATE_DEVICE_TABLE (Event Code 0x02)
|
||
|
||
Generated when hardware detects an invalid DTE during a transaction.
|
||
|
||
```
|
||
word[0]: [0:15]=0x02, [16:31] Reserved
|
||
word[1]: [0:15] DeviceId (BDF), [16:31] Reserved
|
||
word[2]: [0:31] Reserved
|
||
word[3]: [0:31] Reserved
|
||
```
|
||
|
||
#### INVALIDATE_COMMAND (Event Code 0x03)
|
||
|
||
Generated when an invalid command is detected in the command buffer.
|
||
|
||
```
|
||
word[0]: [0:15]=0x03, [16:31] Reserved
|
||
word[1]: [0:15] Reserved, [16:31] Reserved
|
||
word[2]: [0:31] Physical address of the illegal command (low)
|
||
word[3]: [0:31] Physical address of the illegal command (high)
|
||
```
|
||
|
||
#### COMMAND_HARDWARE_ERROR (Event Code 0x05)
|
||
|
||
Hardware error during command processing.
|
||
|
||
```
|
||
word[0]: [0:15]=0x05, [16:31] Error flags
|
||
word[1]: [0:31] Error address or type
|
||
word[2]: [0:31] Error address low
|
||
word[3]: [0:31] Error address high
|
||
```
|
||
|
||
### 1.6 IVRS ACPI Table
|
||
|
||
The IVRS (I/O Virtualization Reporting Structure) is the ACPI table that describes AMD IOMMU topology. Found by scanning ACPI tables with signature "IVRS" (0x56534949).
|
||
|
||
#### IVRS Header (36 bytes)
|
||
|
||
```
|
||
Offset Size Field Description
|
||
0x00 4 Signature "IVRS" (0x56534949)
|
||
0x04 4 Length Total table length in bytes
|
||
0x08 1 Revision 2 = revision 2 (AMD-Vi), 3 = revision 3
|
||
0x09 1 Checksum ACPI checksum (sum of all bytes = 0)
|
||
0x0A 6 OemId OEM identifier
|
||
0x10 8 OemTableId OEM table identifier
|
||
0x18 4 OemRevision OEM revision
|
||
0x1C 4 CreatorId ASL compiler vendor
|
||
0x20 4 CreatorRevision ASL compiler revision
|
||
0x24 4 IvInfo IOMMU Virtualization Info:
|
||
[0:7] = Virtualization Spec Revision (40 = rev 4.0)
|
||
[8:9] = EFRSup (Extended Feature Register supported)
|
||
[10:11] = Reserved
|
||
[31] = HT AtsResv (HT ATS reserved)
|
||
```
|
||
|
||
#### IVHD Entry (I/O Virtualization Hardware Definition)
|
||
|
||
Describes a single IOMMU unit. There can be multiple IVHD entries for multiple IOMMUs.
|
||
|
||
```
|
||
Offset Size Field Description
|
||
0x00 1 Type 0x10 = IVHD type 10 (rev 2), 0x11 = IVHD type 11 (rev 3, 64-bit)
|
||
0x01 1 Flags Feature flags:
|
||
[0] = HtTunEn (HT tunnel enable)
|
||
[1] = PassPW (Pass posted writes)
|
||
[2] = ResPassPW (Reset PassPW)
|
||
[3] = Isoc (Isoc support)
|
||
[4] = IotlbSup (IOTLB support)
|
||
[5] = Coherent (Coherent IOMMU)
|
||
[6] = PrefSup (Prefetch support)
|
||
[7] = PPRSup (PPR support)
|
||
0x02 2 Length Total length of this IVHD entry including device entries
|
||
0x04 2 DeviceId BDF of the IOMMU PCI device
|
||
0x06 2 CapabilityOffset PCI capability offset for IOMMU capability block
|
||
0x08 8 IOMMUBaseAddress Physical MMIO base address of IOMMU registers
|
||
(type 10: bits 0:51 valid, type 11: full 64-bit)
|
||
0x10 2 PciSegmentGroup PCI segment group number
|
||
0x12 2 IommuInfo IOMMU Info:
|
||
[0:5] = MSI number for event log
|
||
[6:12] = Unit ID (IOMMU hardware unit ID)
|
||
[13:15] = Reserved
|
||
0x14 4 IommuEfr Extended Feature Register attributes (type 11 only)
|
||
0x18 ... DeviceEntries Variable-length device entry list follows
|
||
```
|
||
|
||
#### IVHD Device Entry Types
|
||
|
||
Each device entry in an IVHD starts with a type byte followed by data.
|
||
|
||
| Type | Name | Size | Description |
|
||
|------|------|------|-------------|
|
||
| 0x00 | IVHD_ALL | 4 | Select all devices (except those listed in other entries). Data = all zeros. |
|
||
| 0x01 | IVHD_SEL | 4 | Select a single device. Bytes 2:3 = DeviceId (BDF). Byte 4 = Data (LSA flags). |
|
||
| 0x02 | IVHD_SOR | 4 | Start of Range. Bytes 2:3 = first DeviceId in range. |
|
||
| 0x03 | IVHD_EOR | 4 | End of Range. Bytes 2:3 = last DeviceId in range. |
|
||
| 0x42 | IVHD_PAD4 | 8 | 4-byte PAD entry (reserved extension). |
|
||
| 0x43 | IVHD_PAD8 | 12 | 8-byte PAD entry (reserved extension). |
|
||
| 0x44 | IVHD_VAR | Variable | Variable-length entry. Byte 1 = length. Used for alias, extended selections. |
|
||
|
||
#### IVHD Device Entry Data Byte
|
||
|
||
```
|
||
Bits of the Data byte in IVHD_SEL/IVHD_SOR:
|
||
[0] Lint0Pass — LINT0 remapping passthrough
|
||
[1] Lint1Pass — LINT1 remapping passthrough
|
||
[2] SysMgt — System Management:
|
||
00 = No system management
|
||
01 = System Management at request level
|
||
10 = System Management at fault level
|
||
[3] SysMgt — (continued)
|
||
[4] NMIPass — NMI remapping passthrough
|
||
[5] ExtIntPass — External Interrupt remapping passthrough
|
||
[6] InitPass — INIT remapping passthrough
|
||
[7] Rsvd — Reserved
|
||
```
|
||
|
||
#### IVMD Entry (I/O Virtualization Memory Definition)
|
||
|
||
Describes a memory region that has special IOMMU handling. Appears after IVHD entries.
|
||
|
||
```
|
||
Offset Size Field Description
|
||
0x00 1 Type 0x20 = IVMD type 20 (rev 2), 0x21 = IVMD type 21 (rev 3)
|
||
0x01 1 Flags Memory block flags:
|
||
[0] = Unity (untranslated/unity mapping)
|
||
[1] = Read (device may read)
|
||
[2] = Write (device may write)
|
||
[3] = ExclRange (exclusion range)
|
||
0x02 2 Length Total length of this IVMD entry (16 or 24 bytes)
|
||
0x04 2 DeviceId Start DeviceId (BDF) or 0x0000 for all devices
|
||
0x06 2 AuxData Auxiliary data (reserved in most implementations)
|
||
0x08 8 StartAddress Physical start address of the memory region (type 20: 32-bit in low bits)
|
||
0x10 8 MemoryLength Length of the memory region in bytes (type 20: 32-bit in low bits)
|
||
```
|
||
|
||
### 1.7 Page Table Entry (PTE)
|
||
|
||
AMD-Vi page tables use multi-level radix tree. The number of levels is set by the DTE Mode field (1 to 6 levels). Each PTE is 64 bits.
|
||
|
||
```
|
||
PTE (64 bits):
|
||
[0] PR — Present. 1 = this entry maps a valid page or points to next level.
|
||
[1] U — User/Supervisor. 1 = accessible from user level. (only with NXSup)
|
||
[2] IW — Write permission. 1 = device may write to this page.
|
||
[3] IR — Read permission. 1 = device may read this page.
|
||
[4:8] Rsvd — Reserved. Must be zero.
|
||
[9:11] NextLevel — Next page table level (0=PTE/leaf, 1=PDE, 2=PDPTE, 3=PML4E, 4=PML5E).
|
||
At leaf level (PR=1, NextLevel=0): bits 12:51 = physical page frame.
|
||
At non-leaf level (PR=1, NextLevel>0): bits 12:51 = next table address.
|
||
[12:51] OutputAddr — Physical address of page frame (leaf) or next-level table (non-leaf).
|
||
Must be 4KiB-aligned (bits 0:11 assumed zero).
|
||
[51:58] Rsvd — Reserved. Must be zero.
|
||
[59] FC — Force Coherent. 1 = force coherent transactions for this page.
|
||
[60] Rsvd — Reserved.
|
||
[61] IR — Interrupt Remap (alias in page tables, platform-specific).
|
||
[62] IW — Interrupt Write (alias in page tables, platform-specific).
|
||
[63] NX — No-Execute. 1 = instruction fetches from this page are blocked (only with NXSup).
|
||
```
|
||
|
||
**Level-to-address-bits mapping**:
|
||
|
||
| Levels | Address Bits | Max Physical Address |
|
||
|--------|-------------|---------------------|
|
||
| 1 | 21 | 2 MiB |
|
||
| 2 | 30 | 1 GiB |
|
||
| 3 | 39 | 512 GiB |
|
||
| 4 | 48 | 256 TiB |
|
||
| 5 | 57 | 128 PiB |
|
||
| 6 | 63 | ~8 EiB |
|
||
|
||
**Linux page table macros** (`drivers/iommu/amd/amd_iommu_types.h`):
|
||
|
||
```c
|
||
#define PM_LEVEL_SHIFT 9
|
||
#define PM_LEVEL_SIZE (1UL << PM_LEVEL_SHIFT)
|
||
#define PM_LEVEL_INDEX(level, address) \
|
||
(((address) >> (12 + (((level) - 1) * 9))) & 0x1FF)
|
||
#define PM_LEVEL_ENC(level, address) \
|
||
((address) | (((level) - 1) << 9) | 1ULL) // PR=1, NextLevel=level-1
|
||
#define PM_PTE_LEVEL(pte) (((pte) >> 9) & 0x7)
|
||
```
|
||
|
||
### 1.8 Initialization Sequence (AMD-Vi)
|
||
|
||
Step-by-step register programming to bring up AMD-Vi IOMMU.
|
||
|
||
```
|
||
Step 1: Discover IOMMU hardware
|
||
- Scan ACPI tables for IVRS signature
|
||
- Parse IVHD entries to find MMIO base address
|
||
- Read ExtendedFeature (0x0030) to determine capabilities
|
||
|
||
Step 2: Disable IOMMU (ensure clean state)
|
||
- Control = 0x00000000 (IOMMUEnable=0, all features off)
|
||
- Wait until Status[0] (IOMMURunning) = 0
|
||
|
||
Step 3: Allocate and zero Device Table
|
||
- Alloc 2 MiB contiguous physical memory (65536 × 32 bytes)
|
||
- Zero all entries
|
||
- Write DevTableBar (0x0000):
|
||
Bits 0:8 = DevTableSize (0x0F for 65536 entries: 2^(0x0F+1) = 65536)
|
||
Bits 12:51 = Physical address of table
|
||
|
||
Step 4: Allocate and zero Command Buffer
|
||
- Alloc 8192 bytes contiguous physical (512 entries × 16 bytes)
|
||
- Zero all entries
|
||
- Write CmdBufBar (0x0008):
|
||
Bits 0:8 = CmdBufLen (0x08 for 512 entries: 2^(0x08+2) = 4096 bytes... use 0x09 for 8192)
|
||
Bits 12:51 = Physical address
|
||
|
||
Step 5: Allocate and zero Event Log
|
||
- Alloc 8192 bytes contiguous physical (512 entries × 16 bytes)
|
||
- Zero all entries
|
||
- Write EvtLogBar (0x0010):
|
||
Bits 0:8 = EvtLogLen (0x09 for 8192 bytes)
|
||
Bits 12:51 = Physical address
|
||
|
||
Step 6: Set up exclusion range (optional)
|
||
- Write ExclusionBase (0x0020) = start of excluded physical range
|
||
- Write ExclusionLimit (0x0028) = end of excluded physical range
|
||
- Skip if no exclusion needed
|
||
|
||
Step 7: Reset head/tail pointers
|
||
- CmdBufHead (0x2000) = 0
|
||
- CmdBufTail (0x2008) = 0
|
||
- EvtLogHead (0x2010) = 0
|
||
- (EvtLogTail is RO, hardware sets it)
|
||
|
||
Step 8: Allocate and zero Interrupt Remap Table (if IR needed)
|
||
- Alloc 4096 × 16 bytes = 64 KiB (for IntTabLen=11, max 4096 entries)
|
||
- Zero all entries
|
||
- Configure each device's DTE with IRTP pointing to this table
|
||
|
||
Step 9: Configure DTEs for devices
|
||
- For each device that needs translation:
|
||
Set V=1, TV=1, Mode=4 (4-level), PTP=root page table address
|
||
Set IR=1, IW=1 if interrupt remapping is used
|
||
Set IntCtl=0x02 (remapped), IntTabLen, IRTP
|
||
|
||
Step 10: Enable features in Control register
|
||
- Control = 0x00000000 | bits for enabled features:
|
||
Bit 2 (EventLogEn) = 1
|
||
Bit 5 (CmdBufEn) = 1
|
||
Bit 22 (XTEn) = 1 (if x2APIC supported and in use)
|
||
Bit 23 (NXEn) = 1 (if NX supported)
|
||
- DO NOT set bit 0 (IOMMUEnable) yet
|
||
|
||
Step 11: Flush caches via command buffer
|
||
- Submit INVALIDATE_IOMMU_ALL (0x05) if supported, or:
|
||
INVALIDATE_DEVTAB_ENTRY for each modified device
|
||
INVALIDATE_INTERRUPT_TABLE for each device with IR
|
||
- Submit COMPLETION_WAIT (0x01) to synchronize
|
||
- Wait for completion
|
||
|
||
Step 12: Enable IOMMU translations
|
||
- Set Control bit 0 (IOMMUEnable) = 1
|
||
- Read Status to verify IOMMURunning = 1
|
||
|
||
Step 13: Enable interrupts (optional)
|
||
- Set Control bit 3 (EventIntEn) = 1
|
||
- Configure MSI delivery for the IOMMU PCI device
|
||
```
|
||
|
||
---
|
||
|
||
## 2. Intel VT-d
|
||
|
||
### 2.1 MMIO Register Map
|
||
|
||
Base address obtained from ACPI DMAR table (DRHD entry `RegisterBase` field).
|
||
|
||
| Offset | Name | Size | Access | Description |
|
||
|--------|------|------|--------|-------------|
|
||
| 0x00 | VER_REG | 32-bit | RO | Architecture Version. [0:7] = Minor, [8:15] = Major. |
|
||
| 0x08 | CAP_REG | 64-bit | RO | Capability Register. See bit layout below. |
|
||
| 0x10 | ECAP_REG | 64-bit | RO | Extended Capability Register. See bit layout below. |
|
||
| 0x18 | GCMD_REG | 32-bit | WO | Global Command Register. Write to request operations. |
|
||
| 0x1C | GSTS_REG | 32-bit | RO | Global Status Register. Reflects GCMD results. |
|
||
| 0x20 | RTADDR_REG | 64-bit | R/W | Root Table Address. Bit 0 = RTT (Root Table Type: 0=legacy, 1=extended). Bits 12:63 = physical address. |
|
||
| 0x28 | CCMD_REG | 64-bit | R/W | Context Command Register. For invalidating context caches. |
|
||
| 0x30 | FSTS_REG | 32-bit | RO | Fault Status Register. |
|
||
| 0x34 | FECTL_REG | 32-bit | R/W | Fault Event Control Register. |
|
||
| 0x38 | FEDATA_REG | 32-bit | R/W | Fault Event Data Register. MSI data. |
|
||
| 0x3C | FEADDR_REG | 32-bit | R/W | Fault Event Address Register. MSI address low. |
|
||
| 0x40 | FEUADDR_REG | 32-bit | R/W | Fault Event Upper Address Register. MSI address high. |
|
||
| 0x48 | AFLOG_REG | 64-bit | R/W | Advanced Fault Log Register. |
|
||
| 0x58 | PMEN_REG | 32-bit | R/W | Protected Memory Enable Register. |
|
||
| 0x5C | PLMBASE_REG | 32-bit | R/W | Protected Low Memory Base Register. |
|
||
| 0x60 | PLMLIMIT_REG | 32-bit | R/W | Protected Low Memory Limit Register. |
|
||
| 0x68 | PHMBASE_REG | 64-bit | R/W | Protected High Memory Base Register. |
|
||
| 0x70 | PHMLIMIT_REG | 64-bit | R/W | Protected High Memory Limit Register. |
|
||
| 0x78 | IQH_REG | 64-bit | RO | Invalidation Queue Head Register. |
|
||
| 0x80 | IQT_REG | 64-bit | R/W | Invalidation Queue Tail Register. |
|
||
| 0x88 | IQA_REG | 64-bit | R/W | Invalidation Queue Address Register. |
|
||
| 0x90 | ICS_REG | 32-bit | RO | Invalidation Completion Status Register. |
|
||
| 0x94 | IECTL_REG | 32-bit | R/W | Invalidation Event Control Register. |
|
||
| 0x98 | IEDATA_REG | 32-bit | R/W | Invalidation Event Data Register. |
|
||
| 0x9C | IEADDR_REG | 32-bit | R/W | Invalidation Event Address Register. |
|
||
| 0xA0 | IEUADDR_REG | 32-bit | R/W | Invalidation Event Upper Address Register. |
|
||
| 0xB0 | IRTA_REG | 64-bit | R/W | Interrupt Remapping Table Address Register. |
|
||
|
||
#### CAP_REG (0x08) Bit Layout
|
||
|
||
| Bit | Name | Description |
|
||
|-----|------|-------------|
|
||
| 0 | ND (bits 0:2) | Number of Domains Supported. 0=4, 1=16, 2=64, 3=256, 4=1024, 5=4K, 6=16K, 7=64K. |
|
||
| 3:7 | ZLR | Zero Length Read. 1 = supported. |
|
||
| 8 | AFL | Advanced Fault Logging. 1 = supported. |
|
||
| 9 | RWBF | Required Write-Buffer Flushing. 1 = software must flush write buffers before invalidations. |
|
||
| 10:11 | PLMR | Protected Low Memory Region. 1 = supported. |
|
||
| 12:13 | PHMR | Protected High Memory Region. 1 = supported. |
|
||
| 14 | CM | Caching Mode. 1 = IOMMU operates in caching mode (no explicit invalidation needed). |
|
||
| 15:23 | SAGAW | Supported Adjusted Guest Address Widths. Bit N set = (N+1)-level page tables supported. |
|
||
| 24:33 | MGAW | Maximum Guest Address Width. Actual address width = MGAW + 1. |
|
||
| 34:35 | MAMV | Maximum Address Mask Value. For interrupt remapping. |
|
||
| 36 | ZAM | Zero Address/Mask. For interrupt remapping. |
|
||
| 37:39 | Rsvd | Reserved. |
|
||
| 40 | FL1GP | First Level 1-GByte Page Support. |
|
||
| 41:43 | Rsvd | Reserved. |
|
||
| 44 | PSI | Page Selective Invalidation. 1 = supported. |
|
||
| 45:51 | Rsvd | Reserved. |
|
||
| 52 | SPS | Super Page Support. Bits indicate 2MiB, 1GiB, 512GiB support. |
|
||
| 52:55 | FR | Fault Recording Register count minus 1. |
|
||
| 56:60 | Rsvd | Reserved. |
|
||
| 61:63 | Rsvd | Reserved. |
|
||
|
||
#### ECAP_REG (0x10) Bit Layout
|
||
|
||
| Bit | Name | Description |
|
||
|-----|------|-------------|
|
||
| 0 | C | Page Request (PRI) support. |
|
||
| 1 | QI | Queued Invalidation support. 1 = IQ mechanism supported. |
|
||
| 2 | DT | Device TLB support. |
|
||
| 3 | IR | Interrupt Remapping support. 1 = supported. |
|
||
| 4 | EIM | Extended Interrupt Mode. 1 = x2APIC mode supported for IR. |
|
||
| 5:7 | Rsvd | Reserved. |
|
||
| 8 | PT | Pass Through. 1 = second-level translation bypass supported. |
|
||
| 9:17 | Rsvd | Reserved. |
|
||
| 18 | SC | Snoop Control. |
|
||
| 19:24 | Rsvd | Reserved. |
|
||
| 25:34 | IRO | IOTLB Register Offset. Offset from base for IOTLB registers. |
|
||
| 35:43 | Rsvd | Reserved. |
|
||
| 44:47 | MHMV | Maximum Handle Mask Value. |
|
||
| 48 | ECS | Extended Context Support. |
|
||
| 49 | MTS | Memory Type Support. |
|
||
| 50 | NEST | Nested Translation Support. |
|
||
| 51:63 | Rsvd | Reserved. |
|
||
|
||
#### GCMD_REG (0x18) Bit Layout (Write-Only)
|
||
|
||
| Bit | Name | Description |
|
||
|-----|------|-------------|
|
||
| 31 | TE | Translation Enable. Write 1 to enable/disable. |
|
||
| 30 | SRTP | Set Root Table Pointer. Write 1, hardware sets GSTS.RTPS when done. |
|
||
| 29 | SFL | Set Fault Log. Write 1 to set fault log pointer. |
|
||
| 28 | EAFL | Enable Advanced Fault Log. |
|
||
| 27 | WBF | Write Buffer Flush. Write 1, hardware sets GSTS.WBFS when done. |
|
||
| 26 | QIE | Queued Invalidation Enable. Write 1 to enable. |
|
||
| 25 | SIRTP | Set Interrupt Remap Table Pointer. Write 1, hardware sets GSTS.IRTPS. |
|
||
| 24 | CFI | Compatibility Format Interrupt. Write 1 to block compatibility interrupts. |
|
||
| 23 | IR | Interrupt Remap. Write 1 to enable interrupt remapping. |
|
||
| 0:22 | Rsvd | Reserved. Must write zero. |
|
||
|
||
#### GSTS_REG (0x1C) Bit Layout (Read-Only)
|
||
|
||
| Bit | Name | Description |
|
||
|-----|------|-------------|
|
||
| 31 | TES | Translation Enable Status. 1 = enabled. |
|
||
| 30 | RTPS | Root Table Pointer Status. 1 = root table pointer set. |
|
||
| 29 | FLS | Fault Log Status. |
|
||
| 28 | AFLS | Advanced Fault Log Status. |
|
||
| 27 | WBFS | Write Buffer Flush Status. 1 = flush complete. |
|
||
| 26 | QIES | Queued Invalidation Enable Status. |
|
||
| 25 | IRTPS | Interrupt Remap Table Pointer Status. |
|
||
| 24 | CFIS | Compatibility Format Interrupt Status. |
|
||
| 23 | IRES | Interrupt Remap Enable Status. |
|
||
| 0:22 | Rsvd | Reserved. |
|
||
|
||
### 2.2 Root Table Entry
|
||
|
||
The Root Table is pointed to by RTADDR_REG. It contains 256 entries (one per PCI bus). Each entry is 128 bits (16 bytes). Must be 4KiB-aligned.
|
||
|
||
```
|
||
Root Entry (128 bits = data[0] data[1], each u64):
|
||
|
||
data[0]:
|
||
[0] P — Present. 1 = this bus has context entries.
|
||
[1:63] CTP — Context Table Pointer. Physical address of the context table
|
||
for this bus. Bits 12:63 hold address. Must be 4KiB-aligned.
|
||
|
||
data[1]:
|
||
[0:63] Rsvd — Reserved. Must be zero.
|
||
```
|
||
|
||
### 2.3 Context Entry
|
||
|
||
Each Context Table contains 256 entries (one per device:function on a bus). Each entry is 128 bits (16 bytes).
|
||
|
||
```
|
||
Context Entry (128 bits = data[0] data[1], each u64):
|
||
|
||
data[0]:
|
||
[0] P — Present. 1 = entry is valid.
|
||
[1] FPD — Fault Processing Disable. 1 = faults from this device are suppressed.
|
||
[2:3] TT — Translation Type:
|
||
00 = Legacy mode (second-level translation only)
|
||
01 = PASID-granular translation
|
||
10 = Pass-through (no second-level translation, bypass)
|
||
11 = Reserved
|
||
[4:11] Rsvd — Reserved.
|
||
[12:63] SLPTPTR — Second Level Page Table Pointer. Physical address of the
|
||
second-level (guest) page table root. Must be 4KiB-aligned.
|
||
|
||
data[1]:
|
||
[0:15] DID — Domain Identifier. Associates this device with a domain.
|
||
[16:63] Rsvd — Reserved. Must be zero.
|
||
```
|
||
|
||
**Extended Context Entry** (when ECS=1 in ECAP):
|
||
|
||
```
|
||
data[0]:
|
||
[0] P — Present
|
||
[1] FPD — Fault Processing Disable
|
||
[2:3] TT — Translation Type (same as above)
|
||
[4:11] Rsvd
|
||
[12:63] SLPTPTR — Page table pointer (same as above)
|
||
|
||
data[1]:
|
||
[0:15] DID — Domain Identifier
|
||
[16:19] AW — Address Width. 0=3-level, 1=4-level, 2=5-level, 3=6-level.
|
||
[20:63] Rsvd — Reserved
|
||
```
|
||
|
||
### 2.4 DMAR ACPI Table
|
||
|
||
The DMAR (DMA Remapping) table describes Intel VT-d IOMMU topology. Found by scanning ACPI tables with signature "DMAR" (0x52414D44).
|
||
|
||
#### DMAR Header (48 bytes)
|
||
|
||
```
|
||
Offset Size Field Description
|
||
0x00 4 Signature "DMAR" (0x52414D44)
|
||
0x04 4 Length Total table length in bytes
|
||
0x08 1 Revision 1
|
||
0x09 1 Checksum ACPI checksum (sum of all bytes = 0)
|
||
0x0A 6 OemId OEM identifier
|
||
0x10 8 OemTableId OEM table identifier
|
||
0x18 4 OemRevision OEM revision
|
||
0x1C 4 CreatorId ASL compiler vendor
|
||
0x20 4 CreatorRevision ASL compiler revision
|
||
0x24 1 HostAddressWidth DMA physical address width (e.g., 46 for 64 TiB)
|
||
0x25 1 Flags [0] = INTR_REMAP (interrupt remapping supported)
|
||
[1] = X2APIC_OPT_OUT (firmware requests no x2APIC)
|
||
0x26 6 Reserved Reserved
|
||
0x2C ... RemappingStructures Variable-length list of DRHD/RMRR/ATSR/etc entries
|
||
```
|
||
|
||
#### DRHD (DMA Remapping Hardware Unit Definition)
|
||
|
||
Describes a single IOMMU unit. Multiple DRHD entries for systems with multiple IOMMUs.
|
||
|
||
```
|
||
Offset Size Field Description
|
||
0x00 2 Type 0x0001 = DRHD
|
||
0x02 2 Length Total length of this entry including device scope
|
||
0x04 1 Flags [0] = INCLUDE_PCI_ALL (1=this IOMMU handles all PCI devices
|
||
not covered by other non-ALL DRHD entries)
|
||
0x05 1 Reserved Reserved
|
||
0x06 2 SegmentNumber PCI Segment Group number
|
||
0x08 8 RegisterBaseAddress Physical MMIO base address of IOMMU registers
|
||
0x10 ... DeviceScope Variable-length device scope entries follow
|
||
```
|
||
|
||
#### DRHD Device Scope Entry
|
||
|
||
```
|
||
Offset Size Field Description
|
||
0x00 1 Type Device scope type:
|
||
0x01 = PCI Endpoint Device
|
||
0x02 = PCI SubHierarchy
|
||
0x03 = IOAPIC
|
||
0x04 = MSI Capable HPET
|
||
0x05 = ACPI Name-Space Device
|
||
0x01 1 Length Total length of this scope entry
|
||
0x02 1 EnumerationId Enumeration ID (e.g., IOAPIC ID for type 0x03)
|
||
0x03 1 StartBusNumber Starting PCI bus number
|
||
0x04 ... Path PCI path entries (each 2 bytes: Device, Function)
|
||
```
|
||
|
||
#### RMRR (Reserved Memory Region Reporting)
|
||
|
||
Describes memory regions that must be identity-mapped for specific devices (e.g., USB controllers, graphics).
|
||
|
||
```
|
||
Offset Size Field Description
|
||
0x00 2 Type 0x0002 = RMRR
|
||
0x02 2 Length Total length of this entry
|
||
0x04 2 Reserved Reserved
|
||
0x06 2 SegmentNumber PCI Segment Group
|
||
0x08 8 BaseAddress Physical start address of reserved region
|
||
0x10 8 EndAddress Physical end address of reserved region (inclusive)
|
||
0x18 ... DeviceScope Device scope entries for devices that access this region
|
||
```
|
||
|
||
#### Other DMAR Sub-Table Types
|
||
|
||
| Type | Name | Description |
|
||
|------|------|-------------|
|
||
| 0x0000 | Reserved | Reserved. |
|
||
| 0x0001 | DRHD | DMA Remapping Hardware Unit Definition. |
|
||
| 0x0002 | RMRR | Reserved Memory Region Reporting. |
|
||
| 0x0003 | ATSR | Root Port ATS (Address Translation Service) Capability Reporting. |
|
||
| 0x0004 | RHSA | Remapping Hardware Static Affinity (NUMA locality). |
|
||
| 0x0005 | ANDD | ACPI Name-space Device Declaration. |
|
||
|
||
### 2.5 Page Table Entry (Intel VT-d)
|
||
|
||
Intel VT-d uses multi-level page tables. The number of levels depends on SAGAW in CAP_REG. Typically 3 or 4 levels. Each PTE is 64 bits.
|
||
|
||
```
|
||
PTE (64 bits):
|
||
[0] R — Read permission. 1 = device may read.
|
||
[1] W — Write permission. 1 = device may write.
|
||
[2:11] Rsvd — Reserved. Must be zero unless extended features.
|
||
[12:63] ADDR — Physical address. For non-leaf: next-level table address (4KiB-aligned).
|
||
For leaf: page frame address.
|
||
Mask depends on page size:
|
||
4KiB: bits 12:63
|
||
2MiB: bits 21:63 (super page)
|
||
1GiB: bits 30:63 (super page)
|
||
```
|
||
|
||
**Extended PTE with Supervisor bit** (when CAP_REG supports it):
|
||
|
||
```
|
||
[2] S — Supervisor. 1 = supervisor-mode page.
|
||
[3] AW — Access/Dirty (for first-level translation).
|
||
[4] PSE — Page Size Extension (1 = super page at this level).
|
||
[5] A — Accessed flag.
|
||
[6] D — Dirty flag.
|
||
[7:11] Rsvd — Reserved.
|
||
```
|
||
|
||
### 2.6 Initialization Sequence (Intel VT-d)
|
||
|
||
Step-by-step register programming to bring up Intel VT-d.
|
||
|
||
```
|
||
Step 1: Discover IOMMU hardware
|
||
- Scan ACPI tables for DMAR signature
|
||
- Parse DRHD entries to find MMIO base addresses
|
||
- Read CAP_REG (0x08) for capabilities
|
||
- Read ECAP_REG (0x10) for extended capabilities
|
||
- Read VER_REG (0x00) for architecture version
|
||
|
||
Step 2: Ensure IOMMU is disabled
|
||
- Verify GSTS_REG.TES = 0 (translation not enabled)
|
||
- If TES=1, write GCMD_REG with TE=0, wait for TES to clear
|
||
|
||
Step 3: Allocate and zero Root Table
|
||
- Alloc 4 KiB (256 entries × 16 bytes)
|
||
- Zero all entries
|
||
- Write RTADDR_REG (0x20):
|
||
Bit 0 = 0 (legacy root table type)
|
||
Bits 12:63 = physical address
|
||
|
||
Step 4: Set Root Table Pointer
|
||
- Write GCMD_REG bit 30 (SRTP) = 1
|
||
- Poll GSTS_REG bit 30 (RTPS) until it reads 1
|
||
|
||
Step 5: Allocate and zero Context Tables (per bus)
|
||
- For each bus with devices:
|
||
Alloc 4 KiB (256 entries × 16 bytes)
|
||
Zero all entries
|
||
Set Root Entry P=1, CTP=context table address
|
||
|
||
Step 6: Configure Context Entries
|
||
- For each device:function:
|
||
Set P=1, TT=00 (legacy), SLPTPTR=page table root, DID=domain ID
|
||
- For pass-through: Set P=1, TT=10, DID=domain ID
|
||
|
||
Step 7: Build page tables (per domain)
|
||
- Create page table hierarchy matching SAGAW levels (typically 3 or 4)
|
||
- Map device-visible physical addresses to host physical addresses
|
||
- For identity mapping: GPA = HPA
|
||
|
||
Step 8: Handle RMRR regions
|
||
- Identity-map all RMRR regions for their respective devices
|
||
- These regions must always be accessible to the listed devices
|
||
|
||
Step 9: Allocate Interrupt Remap Table (if ECAP_REG.IR=1)
|
||
- Alloc table: 2^(IRTA_REG.TableSize+1) × 16 bytes
|
||
- Zero all entries
|
||
- Write IRTA_REG (0xB0):
|
||
Bits 0:6 = TableSize (e.g., 0xF = 65536 entries)
|
||
Bits 6:7 = IRTE Mode (00=remapped, 01=posted)
|
||
Bits 12:63 = Physical address
|
||
Bit 4 = EIME (Extended Interrupt Mode Enable) if x2APIC
|
||
|
||
Step 10: Enable Interrupt Remapping
|
||
- Write GCMD_REG bit 25 (SIRTP) = 1
|
||
- Poll GSTS_REG bit 25 (IRTPS) until 1
|
||
- Write GCMD_REG bit 24 (CFI) = 1 to block compatibility format interrupts
|
||
- Poll GSTS_REG bit 24 (CFIS) until 1
|
||
- Write GCMD_REG bit 23 (IR) = 1
|
||
- Poll GSTS_REG bit 23 (IRES) until 1
|
||
|
||
Step 11: Invalidate caches
|
||
- If QI (Queued Invalidation) supported (ECAP_REG.QI=1):
|
||
Set up Invalidation Queue (IQA_REG)
|
||
Submit queue-based invalidation descriptors
|
||
- Else use register-based invalidation:
|
||
Write CCMD_REG for context cache invalidation
|
||
Write IOTLB registers for TLB invalidation
|
||
|
||
Step 12: Enable translation
|
||
- Write GCMD_REG bit 31 (TE) = 1
|
||
- Poll GSTS_REG bit 31 (TES) until 1
|
||
|
||
Step 13: Enable fault handling
|
||
- Program FEDATA_REG, FEADDR_REG, FEUADDR_REG for MSI delivery
|
||
- Write FECTL_REG to enable fault interrupts
|
||
```
|
||
|
||
---
|
||
|
||
## 3. Rust Struct Definitions
|
||
|
||
These `#[repr(C, packed)]` structs can be used directly in the Red Bear OS IOMMU implementation. All bitfield access should go through helper methods (shown below) to ensure correct masking.
|
||
|
||
### 3.1 AMD-Vi Structs
|
||
|
||
```rust
|
||
// AMD-Vi MMIO Registers
|
||
|
||
/// AMD-Vi IOMMU MMIO register block.
|
||
/// Base address from ACPI IVRS IVHD entry.
|
||
#[repr(C)]
|
||
pub struct AmdViMmio {
|
||
pub dev_table_bar: u64, // 0x0000
|
||
pub cmd_buf_bar: u64, // 0x0008
|
||
pub evt_log_bar: u64, // 0x0010
|
||
pub control: u32, // 0x0018
|
||
_pad0: u32, // 0x001C
|
||
pub exclusion_base: u64, // 0x0020
|
||
pub exclusion_limit: u64, // 0x0028
|
||
pub extended_feature: u64, // 0x0030
|
||
pub ppr_log_bar: u64, // 0x0038
|
||
_pad1: [u64; 0x03F0], // 0x0040..0x1FFC (padding to 0x2000)
|
||
pub cmd_buf_head: u64, // 0x2000
|
||
pub cmd_buf_tail: u64, // 0x2008
|
||
pub evt_log_head: u64, // 0x2010
|
||
pub evt_log_tail: u64, // 0x2018
|
||
pub status: u32, // 0x2020
|
||
}
|
||
// Static assertions for offset verification
|
||
const _: () = assert!(core::mem::offset_of!(AmdViMmio, dev_table_bar) == 0x0000);
|
||
const _: () = assert!(core::mem::offset_of!(AmdViMmio, control) == 0x0018);
|
||
const _: () = assert!(core::mem::offset_of!(AmdViMmio, cmd_buf_head) == 0x2000);
|
||
|
||
/// AMD-Vi Control Register bits.
|
||
pub mod amd_control {
|
||
pub const IOMMU_ENABLE: u32 = 1 << 0;
|
||
pub const HT_TUN_EN: u32 = 1 << 1;
|
||
pub const EVENT_LOG_EN: u32 = 1 << 2;
|
||
pub const EVENT_INT_EN: u32 = 1 << 3;
|
||
pub const COM_WAIT_INT_EN: u32 = 1 << 4;
|
||
pub const CMD_BUF_EN: u32 = 1 << 5;
|
||
pub const PPR_LOG_EN: u32 = 1 << 6;
|
||
pub const PPR_INT_EN: u32 = 1 << 7;
|
||
pub const PPR_EN: u32 = 1 << 8;
|
||
pub const GT_EN: u32 = 1 << 9;
|
||
pub const GA_EN: u32 = 1 << 10;
|
||
pub const XT_EN: u32 = 1 << 22;
|
||
pub const NX_EN: u32 = 1 << 23;
|
||
}
|
||
|
||
/// AMD-Vi Status Register bits.
|
||
pub mod amd_status {
|
||
pub const IOMMU_RUNNING: u32 = 1 << 0;
|
||
pub const EVENT_OVERFLOW: u32 = 1 << 1;
|
||
pub const EVENT_LOG_INT: u32 = 1 << 2;
|
||
pub const COM_WAIT_INT: u32 = 1 << 3;
|
||
pub const PPR_OVERFLOW: u32 = 1 << 4;
|
||
pub const PPR_INT: u32 = 1 << 5;
|
||
}
|
||
|
||
/// AMD-Vi Extended Feature Register bits.
|
||
pub mod amd_ext_feature {
|
||
pub const PREF_SUP: u64 = 1 << 0;
|
||
pub const PPR_SUP: u64 = 1 << 1;
|
||
pub const XT_SUP: u64 = 1 << 2;
|
||
pub const NX_SUP: u64 = 1 << 3;
|
||
pub const GT_SUP: u64 = 1 << 4;
|
||
pub const IA_SUP: u64 = 1 << 6;
|
||
pub const GA_SUP: u64 = 1 << 7;
|
||
pub const HE_SUP: u64 = 1 << 8;
|
||
pub const PC_SUP: u64 = 1 << 9;
|
||
pub const GI_SUP: u64 = 1 << 57;
|
||
}
|
||
|
||
/// AMD-Vi Device Table Entry (256 bits = 32 bytes).
|
||
/// Index by BDF: (bus << 8) | (dev << 3) | func.
|
||
/// Table holds up to 65536 entries.
|
||
#[repr(C, packed)]
|
||
pub struct AmdDte {
|
||
pub data: [u64; 4],
|
||
}
|
||
|
||
impl AmdDte {
|
||
/// Create a zeroed (invalid) DTE.
|
||
pub const fn zeroed() -> Self {
|
||
Self { data: [0; 4] }
|
||
}
|
||
|
||
// data[0] accessors
|
||
|
||
pub fn valid(&self) -> bool {
|
||
self.data[0] & (1 << 0) != 0
|
||
}
|
||
|
||
pub fn set_valid(&mut self, v: bool) {
|
||
if v { self.data[0] |= 1 << 0; } else { self.data[0] &= !(1 << 0); }
|
||
}
|
||
|
||
pub fn translation_valid(&self) -> bool {
|
||
self.data[0] & (1 << 1) != 0
|
||
}
|
||
|
||
pub fn set_translation_valid(&mut self, v: bool) {
|
||
if v { self.data[0] |= 1 << 1; } else { self.data[0] &= !(1 << 1); }
|
||
}
|
||
|
||
/// Translation mode (bits 9:11). 0=no translation, 4=4-level page table.
|
||
pub fn mode(&self) -> u64 {
|
||
(self.data[0] >> 9) & 0x7
|
||
}
|
||
|
||
pub fn set_mode(&mut self, m: u64) {
|
||
self.data[0] = (self.data[0] & !(0x7 << 9)) | ((m & 0x7) << 9);
|
||
}
|
||
|
||
/// Page Table Root Pointer (bits 12:51 of data[0]).
|
||
/// Address must be 4KiB-aligned.
|
||
pub fn page_table_root(&self) -> u64 {
|
||
(self.data[0] >> 12) & 0x000F_FFFF_FFFF_FFFF
|
||
}
|
||
|
||
pub fn set_page_table_root(&mut self, addr: u64) {
|
||
self.data[0] = (self.data[0] & !(0x000F_FFFF_FFFF_FFFF << 12))
|
||
| ((addr >> 12) << 12);
|
||
}
|
||
|
||
/// Interrupt Remapping Enable (bit 61 of data[0]).
|
||
pub fn interrupt_remap(&self) -> bool {
|
||
self.data[0] & (1 << 61) != 0
|
||
}
|
||
|
||
pub fn set_interrupt_remap(&mut self, v: bool) {
|
||
if v { self.data[0] |= 1 << 61; } else { self.data[0] &= !(1 << 61); }
|
||
}
|
||
|
||
/// Interrupt Write permission (bit 62 of data[0]).
|
||
pub fn interrupt_write(&self) -> bool {
|
||
self.data[0] & (1 << 62) != 0
|
||
}
|
||
|
||
pub fn set_interrupt_write(&mut self, v: bool) {
|
||
if v { self.data[0] |= 1 << 62; } else { self.data[0] &= !(1 << 62); }
|
||
}
|
||
|
||
// data[1] accessors
|
||
|
||
/// Interrupt Remap Table Length (bits 0:3 of data[1]).
|
||
/// Number of IRTEs = 2^(len+1).
|
||
pub fn int_table_len(&self) -> u64 {
|
||
self.data[1] & 0xF
|
||
}
|
||
|
||
pub fn set_int_table_len(&mut self, len: u64) {
|
||
self.data[1] = (self.data[1] & !0xF) | (len & 0xF);
|
||
}
|
||
|
||
/// Interrupt Control (bits 4:5 of data[1]).
|
||
/// 00=abort, 01=pass-through, 10=remapped.
|
||
pub fn int_control(&self) -> u64 {
|
||
(self.data[1] >> 4) & 0x3
|
||
}
|
||
|
||
pub fn set_int_control(&mut self, ctl: u64) {
|
||
self.data[1] = (self.data[1] & !(0x3 << 4)) | ((ctl & 0x3) << 4);
|
||
}
|
||
|
||
/// Interrupt Remap Table Pointer (bits 6:51 of data[1]).
|
||
/// Address must be 4KiB-aligned.
|
||
pub fn int_remap_table_ptr(&self) -> u64 {
|
||
(self.data[1] >> 6) & 0x000F_FFFF_FFFF_FFFF
|
||
}
|
||
|
||
pub fn set_int_remap_table_ptr(&mut self, addr: u64) {
|
||
self.data[1] = (self.data[1] & !(0x000F_FFFF_FFFF_FFFF << 6))
|
||
| ((addr >> 6) << 6);
|
||
}
|
||
}
|
||
|
||
const _: () = assert!(core::mem::size_of::<AmdDte>() == 32);
|
||
|
||
/// AMD-Vi Interrupt Remapping Table Entry (128 bits = 16 bytes).
|
||
#[repr(C, packed)]
|
||
pub struct AmdIrte {
|
||
pub data: [u64; 2],
|
||
}
|
||
|
||
impl AmdIrte {
|
||
pub const fn zeroed() -> Self {
|
||
Self { data: [0; 2] }
|
||
}
|
||
|
||
/// Remap enable (bit 0 of data[0]).
|
||
pub fn remap_enabled(&self) -> bool {
|
||
self.data[0] & (1 << 0) != 0
|
||
}
|
||
|
||
pub fn set_remap_enabled(&mut self, v: bool) {
|
||
if v { self.data[0] |= 1 << 0; } else { self.data[0] &= !(1 << 0); }
|
||
}
|
||
|
||
/// Suppress IO Page Fault (bit 1).
|
||
pub fn suppress_io_pf(&self) -> bool {
|
||
self.data[0] & (1 << 1) != 0
|
||
}
|
||
|
||
pub fn set_suppress_io_pf(&mut self, v: bool) {
|
||
if v { self.data[0] |= 1 << 1; } else { self.data[0] &= !(1 << 1); }
|
||
}
|
||
|
||
/// Interrupt type (bits 2:4 of data[0]).
|
||
pub fn int_type(&self) -> u64 {
|
||
(self.data[0] >> 2) & 0x7
|
||
}
|
||
|
||
pub fn set_int_type(&mut self, t: u64) {
|
||
self.data[0] = (self.data[0] & !(0x7 << 2)) | ((t & 0x7) << 2);
|
||
}
|
||
|
||
/// Destination mode (bit 2 of data[0], when using xAPIC logical).
|
||
/// 0=physical APIC ID, 1=logical.
|
||
pub fn dst_mode(&self) -> bool {
|
||
self.data[0] & (1 << 2) != 0
|
||
}
|
||
|
||
/// Destination APIC ID (bits 16:31 of data[0], low 16 bits).
|
||
/// For x2APIC, high 32 bits in data[1] bits 0:31.
|
||
pub fn destination(&self) -> u32 {
|
||
((self.data[0] >> 16) & 0xFFFF) as u32 | ((self.data[1] & 0xFFFF_FFFF) as u32) << 16
|
||
}
|
||
|
||
pub fn set_destination(&mut self, apic_id: u32) {
|
||
self.data[0] = (self.data[0] & !(0xFFFF << 16)) | (((apic_id & 0xFFFF) as u64) << 16);
|
||
self.data[1] = (self.data[1] & !0xFFFF_FFFF) | ((apic_id >> 16) as u64);
|
||
}
|
||
|
||
/// Vector (bits 32:39 of data[0], but stored in low byte of upper word).
|
||
pub fn vector(&self) -> u8 {
|
||
((self.data[0] >> 32) & 0xFF) as u8
|
||
}
|
||
|
||
pub fn set_vector(&mut self, v: u8) {
|
||
self.data[0] = (self.data[0] & !(0xFF_u64 << 32)) | ((v as u64) << 32);
|
||
}
|
||
}
|
||
|
||
const _: () = assert!(core::mem::size_of::<AmdIrte>() == 16);
|
||
|
||
/// AMD-Vi Command Buffer Entry (128 bits = 16 bytes = 4 × u32).
|
||
#[repr(C, packed)]
|
||
pub struct AmdCmdEntry {
|
||
pub word: [u32; 4],
|
||
}
|
||
|
||
impl AmdCmdEntry {
|
||
pub const fn zeroed() -> Self {
|
||
Self { word: [0; 4] }
|
||
}
|
||
|
||
pub fn opcode(&self) -> u8 {
|
||
(self.word[0] & 0xF) as u8
|
||
}
|
||
|
||
pub fn set_opcode(&mut self, op: u8) {
|
||
self.word[0] = (self.word[0] & !0xF) | (op as u32 & 0xF);
|
||
}
|
||
}
|
||
|
||
const _: () = assert!(core::mem::size_of::<AmdCmdEntry>() == 16);
|
||
|
||
/// AMD-Vi Command Opcodes.
|
||
pub mod amd_cmd_opcode {
|
||
pub const COMPLETION_WAIT: u8 = 0x01;
|
||
pub const INVALIDATE_DEVTAB_ENTRY: u8 = 0x02;
|
||
pub const INVALIDATE_IOMMU_PAGES: u8 = 0x03;
|
||
pub const INVALIDATE_INTERRUPT_TABLE: u8 = 0x04;
|
||
pub const INVALIDATE_IOMMU_ALL: u8 = 0x05;
|
||
}
|
||
|
||
/// Build a COMPLETION_WAIT command.
|
||
pub fn amd_cmd_completion_wait(store_addr: u64, store_data: u32) -> AmdCmdEntry {
|
||
let mut cmd = AmdCmdEntry::zeroed();
|
||
cmd.set_opcode(amd_cmd_opcode::COMPLETION_WAIT);
|
||
cmd.word[0] |= 1 << 4; // Store = 1
|
||
cmd.word[1] = store_addr as u32;
|
||
cmd.word[2] = (store_addr >> 32) as u32;
|
||
cmd.word[3] = store_data;
|
||
cmd
|
||
}
|
||
|
||
/// Build an INVALIDATE_DEVTAB_ENTRY command for a given BDF.
|
||
pub fn amd_cmd_invalidate_devtab(bdf: u16) -> AmdCmdEntry {
|
||
let mut cmd = AmdCmdEntry::zeroed();
|
||
cmd.set_opcode(amd_cmd_opcode::INVALIDATE_DEVTAB_ENTRY);
|
||
cmd.word[1] = bdf as u32;
|
||
cmd
|
||
}
|
||
|
||
/// Build an INVALIDATE_IOMMU_PAGES command.
|
||
/// If size=true, invalidates all pages for the domain (address ignored).
|
||
pub fn amd_cmd_invalidate_pages(domain_id: u16, address: u64, size: bool) -> AmdCmdEntry {
|
||
let mut cmd = AmdCmdEntry::zeroed();
|
||
cmd.set_opcode(amd_cmd_opcode::INVALIDATE_IOMMU_PAGES);
|
||
if size { cmd.word[0] |= 1 << 4; } // S bit
|
||
cmd.word[1] = domain_id as u32;
|
||
cmd.word[2] = address as u32;
|
||
cmd.word[3] = (address >> 32) as u32;
|
||
cmd
|
||
}
|
||
|
||
/// Build an INVALIDATE_INTERRUPT_TABLE command.
|
||
pub fn amd_cmd_invalidate_int_table(bdf: u16) -> AmdCmdEntry {
|
||
let mut cmd = AmdCmdEntry::zeroed();
|
||
cmd.set_opcode(amd_cmd_opcode::INVALIDATE_INTERRUPT_TABLE);
|
||
cmd.word[1] = bdf as u32;
|
||
cmd
|
||
}
|
||
|
||
/// Build an INVALIDATE_IOMMU_ALL command.
|
||
pub fn amd_cmd_invalidate_all() -> AmdCmdEntry {
|
||
let mut cmd = AmdCmdEntry::zeroed();
|
||
cmd.set_opcode(amd_cmd_opcode::INVALIDATE_IOMMU_ALL);
|
||
cmd
|
||
}
|
||
|
||
/// AMD-Vi Event Log Entry (128 bits = 16 bytes = 4 × u32).
|
||
#[repr(C, packed)]
|
||
pub struct AmdEvtEntry {
|
||
pub word: [u32; 4],
|
||
}
|
||
|
||
impl AmdEvtEntry {
|
||
pub const fn zeroed() -> Self {
|
||
Self { word: [0; 4] }
|
||
}
|
||
|
||
/// Event code (bits 0:15 of word[0]).
|
||
pub fn event_code(&self) -> u16 {
|
||
(self.word[0] & 0xFFFF) as u16
|
||
}
|
||
|
||
/// Device ID / BDF (bits 0:15 of word[1]).
|
||
pub fn device_id(&self) -> u16 {
|
||
(self.word[1] & 0xFFFF) as u16
|
||
}
|
||
|
||
/// Fault address (word[2] | word[3] << 32).
|
||
pub fn fault_address(&self) -> u64 {
|
||
self.word[2] as u64 | ((self.word[3] as u64) << 32)
|
||
}
|
||
|
||
/// Flags from word[0] bits 16:22 (for IO_PAGE_FAULT).
|
||
pub fn fault_flags(&self) -> u16 {
|
||
((self.word[0] >> 16) & 0x7F) as u16
|
||
}
|
||
|
||
/// Read/write direction from fault flags bit 4 (RW).
|
||
pub fn is_write(&self) -> bool {
|
||
self.word[0] & (1 << 20) != 0
|
||
}
|
||
|
||
/// Permission error from fault flags bit 3 (PE).
|
||
pub fn is_permission_error(&self) -> bool {
|
||
self.word[0] & (1 << 19) != 0
|
||
}
|
||
}
|
||
|
||
const _: () = assert!(core::mem::size_of::<AmdEvtEntry>() == 16);
|
||
|
||
/// AMD-Vi Event Codes.
|
||
pub mod amd_evt_code {
|
||
pub const ILLEGAL_DEV_TABLE_ENTRY: u16 = 0x01;
|
||
pub const IO_PAGE_FAULT: u16 = 0x02;
|
||
pub const DEV_TABLE_HW_ERROR: u16 = 0x03;
|
||
pub const PAGE_TABLE_HW_ERROR: u16 = 0x04;
|
||
pub const ILLEGAL_COMMAND: u16 = 0x05;
|
||
pub const COMMAND_HW_ERROR: u16 = 0x06;
|
||
pub const IOTLB_INV_TIMEOUT: u16 = 0x07;
|
||
pub const INVALID_DEV_REQUEST: u16 = 0x08;
|
||
}
|
||
|
||
/// AMD-Vi Page Table Entry (64 bits).
|
||
#[repr(C, packed)]
|
||
pub struct AmdPte(pub u64);
|
||
|
||
impl AmdPte {
|
||
/// Present bit (bit 0).
|
||
pub fn present(&self) -> bool {
|
||
self.0 & (1 << 0) != 0
|
||
}
|
||
|
||
pub fn set_present(&mut self, v: bool) {
|
||
if v { self.0 |= 1 << 0; } else { self.0 &= !(1 << 0); }
|
||
}
|
||
|
||
/// Next level (bits 9:11). 0 = leaf PTE, 1-5 = pointer to next table.
|
||
pub fn next_level(&self) -> u64 {
|
||
(self.0 >> 9) & 0x7
|
||
}
|
||
|
||
pub fn set_next_level(&mut self, level: u64) {
|
||
self.0 = (self.0 & !(0x7 << 9)) | ((level & 0x7) << 9);
|
||
}
|
||
|
||
/// Output address (bits 12:51). Physical frame or next-table address.
|
||
pub fn output_addr(&self) -> u64 {
|
||
self.0 & (0x000F_FFFF_FFFF_FFFF << 12)
|
||
}
|
||
|
||
pub fn set_output_addr(&mut self, addr: u64) {
|
||
self.0 = (self.0 & !(0x000F_FFFF_FFFF_FFFF << 12)) | (addr & (0x000F_FFFF_FFFF_FFFF << 12));
|
||
}
|
||
|
||
/// No-execute (bit 63). Only valid when NXSup=1.
|
||
pub fn no_execute(&self) -> bool {
|
||
self.0 & (1 << 63) != 0
|
||
}
|
||
|
||
pub fn set_no_execute(&mut self, v: bool) {
|
||
if v { self.0 |= 1 << 63; } else { self.0 &= !(1 << 63); }
|
||
}
|
||
}
|
||
|
||
/// Build a leaf PTE that maps addr with Read+Write permissions.
|
||
pub fn amd_pte_leaf(addr: u64) -> AmdPte {
|
||
let mut pte = AmdPte(0);
|
||
pte.set_present(true);
|
||
pte.set_next_level(0); // leaf
|
||
pte.set_output_addr(addr);
|
||
pte.0 |= (1 << 2) | (1 << 3); // IW + IR (write + read permission)
|
||
pte
|
||
}
|
||
|
||
/// Build a non-leaf PTE that points to the next-level table at addr.
|
||
pub fn amd_pte_pointer(addr: u64, level: u64) -> AmdPte {
|
||
let mut pte = AmdPte(0);
|
||
pte.set_present(true);
|
||
pte.set_next_level(level);
|
||
pte.set_output_addr(addr);
|
||
pte
|
||
}
|
||
```
|
||
|
||
### 3.2 Intel VT-d Structs
|
||
|
||
```rust
|
||
/// Intel VT-d IOMMU MMIO register block.
|
||
/// Base address from ACPI DMAR DRHD entry.
|
||
#[repr(C)]
|
||
pub struct IntelVtdMmio {
|
||
pub ver_reg: u32, // 0x00 Version
|
||
_pad0: u32, // 0x04
|
||
pub cap_reg: u64, // 0x08 Capability
|
||
pub ecap_reg: u64, // 0x10 Extended Capability
|
||
pub gcmd_reg: u32, // 0x18 Global Command (write-only)
|
||
pub gsts_reg: u32, // 0x1C Global Status (read-only)
|
||
pub rtaddr_reg: u64, // 0x20 Root Table Address
|
||
pub ccmd_reg: u64, // 0x28 Context Command
|
||
_pad1: u64, // 0x30
|
||
pub fsts_reg: u32, // 0x34 Fault Status
|
||
pub fectl_reg: u32, // 0x38 Fault Event Control
|
||
pub fedata_reg: u32, // 0x3C Fault Event Data
|
||
pub feaddr_reg: u32, // 0x40 Fault Event Address
|
||
pub feuaddr_reg: u32, // 0x44 Fault Event Upper Address
|
||
_pad2: u32, // 0x48
|
||
pub aflog_reg: u64, // 0x4C Advanced Fault Log (note: spec says 0x48 for 64-bit)
|
||
_pad3: u32, // padding
|
||
pub pmen_reg: u32, // 0x64 Protected Memory Enable (spec: 0x64)
|
||
pub plmbase_reg: u32, // 0x68 Protected Low Memory Base
|
||
pub plmlimit_reg: u32, // 0x6C Protected Low Memory Limit
|
||
_pad4: u32,
|
||
pub phmbase_reg: u64, // 0x70 Protected High Memory Base
|
||
pub phmlimit_reg: u64, // 0x78 Protected High Memory Limit
|
||
pub iqh_reg: u64, // 0x80 Invalidation Queue Head
|
||
pub iqt_reg: u64, // 0x88 Invalidation Queue Tail
|
||
pub iqa_reg: u64, // 0x90 Invalidation Queue Address
|
||
pub ics_reg: u32, // 0x98 Invalidation Completion Status
|
||
_pad5: u32,
|
||
pub iectl_reg: u32, // 0xA0 Invalidation Event Control
|
||
pub iedata_reg: u32, // 0xA4 Invalidation Event Data
|
||
pub ieaddr_reg: u32, // 0xA8 Invalidation Event Address
|
||
pub ieuaddr_reg: u32, // 0xAC Invalidation Event Upper Address
|
||
_pad6: [u32; 2], // 0xB0..0xB7 (IRTA is separate below)
|
||
pub irta_reg: u64, // 0xB8 Interrupt Remapping Table Address
|
||
}
|
||
// Note: The VT-d register layout has vendor-specific gaps. For production code,
|
||
// use volatile read/write helpers with explicit offsets rather than relying
|
||
// purely on struct field offsets. The struct above serves as a reference.
|
||
// The IRTA_REG offset is 0xB8 per VT-d spec 5.0 (some earlier specs say 0xB0).
|
||
|
||
/// Intel VT-d CAP_REG bits.
|
||
pub mod vtd_cap {
|
||
pub const ND_MASK: u64 = 0x7;
|
||
pub const ZLR: u64 = 1 << 8;
|
||
pub const AFL: u64 = 1 << 9;
|
||
pub const RWBF: u64 = 1 << 10;
|
||
pub const PLMR: u64 = 1 << 11;
|
||
pub const PHMR: u64 = 1 << 13;
|
||
pub const CM: u64 = 1 << 14;
|
||
pub const SAGAW: u64 = 0xFF << 16;
|
||
pub const SAGAW_3LVL: u64 = 1 << 18; // 3-level page tables
|
||
pub const SAGAW_4LVL: u64 = 1 << 19; // 4-level page tables
|
||
pub const SAGAW_5LVL: u64 = 1 << 20; // 5-level page tables
|
||
pub const SAGAW_6LVL: u64 = 1 << 21; // 6-level page tables
|
||
pub const MGAW_SHIFT: u64 = 24;
|
||
pub const MGAW_MASK: u64 = 0x3F << 24;
|
||
}
|
||
|
||
/// Intel VT-d ECAP_REG bits.
|
||
pub mod vtd_ecap {
|
||
pub const C: u64 = 1 << 0; // Page Request
|
||
pub const QI: u64 = 1 << 1; // Queued Invalidation
|
||
pub const DT: u64 = 1 << 2; // Device TLB
|
||
pub const IR: u64 = 1 << 3; // Interrupt Remapping
|
||
pub const EIM: u64 = 1 << 4; // Extended Interrupt Mode (x2APIC)
|
||
pub const PT: u64 = 1 << 8; // Pass Through
|
||
pub const SC: u64 = 1 << 18; // Snoop Control
|
||
pub const IRO_SHIFT: u64 = 25;
|
||
pub const IRO_MASK: u64 = 0x3FF << 25;
|
||
}
|
||
|
||
/// Intel VT-d GCMD_REG bits (write-only).
|
||
pub mod vtd_gcmd {
|
||
pub const TE: u32 = 1 << 31; // Translation Enable
|
||
pub const SRTP: u32 = 1 << 30; // Set Root Table Pointer
|
||
pub const SFL: u32 = 1 << 29; // Set Fault Log
|
||
pub const EAFL: u32 = 1 << 28; // Enable Advanced Fault Log
|
||
pub const WBF: u32 = 1 << 27; // Write Buffer Flush
|
||
pub const QIE: u32 = 1 << 26; // Queued Invalidation Enable
|
||
pub const SIRTP: u32 = 1 << 25; // Set Interrupt Remap Table Pointer
|
||
pub const CFI: u32 = 1 << 24; // Compatibility Format Interrupt
|
||
pub const IR: u32 = 1 << 23; // Interrupt Remap Enable
|
||
}
|
||
|
||
/// Intel VT-d GSTS_REG bits (read-only).
|
||
pub mod vtd_gsts {
|
||
pub const TES: u32 = 1 << 31; // Translation Enable Status
|
||
pub const RTPS: u32 = 1 << 30; // Root Table Pointer Status
|
||
pub const FLS: u32 = 1 << 29; // Fault Log Status
|
||
pub const AFLS: u32 = 1 << 28; // Advanced Fault Log Status
|
||
pub const WBFS: u32 = 1 << 27; // Write Buffer Flush Status
|
||
pub const QIES: u32 = 1 << 26; // Queued Invalidation Enable Status
|
||
pub const IRTPS: u32 = 1 << 25; // Interrupt Remap Table Pointer Status
|
||
pub const CFIS: u32 = 1 << 24; // Compatibility Format Interrupt Status
|
||
pub const IRES: u32 = 1 << 23; // Interrupt Remap Enable Status
|
||
}
|
||
|
||
/// Intel VT-d Root Table Entry (128 bits = 16 bytes).
|
||
/// 256 entries (one per PCI bus). 4KiB-aligned.
|
||
#[repr(C, packed)]
|
||
pub struct VtdRootEntry {
|
||
pub data: [u64; 2],
|
||
}
|
||
|
||
impl VtdRootEntry {
|
||
pub const fn zeroed() -> Self {
|
||
Self { data: [0; 2] }
|
||
}
|
||
|
||
/// Present (bit 0 of data[0]).
|
||
pub fn present(&self) -> bool {
|
||
self.data[0] & (1 << 0) != 0
|
||
}
|
||
|
||
pub fn set_present(&mut self, v: bool) {
|
||
if v { self.data[0] |= 1 << 0; } else { self.data[0] &= !(1 << 0); }
|
||
}
|
||
|
||
/// Context Table Pointer (bits 12:63 of data[0]).
|
||
pub fn context_table_ptr(&self) -> u64 {
|
||
self.data[0] & !0xFFF
|
||
}
|
||
|
||
pub fn set_context_table_ptr(&mut self, addr: u64) {
|
||
self.data[0] = (self.data[0] & 0xFFF) | (addr & !0xFFF);
|
||
}
|
||
}
|
||
|
||
const _: () = assert!(core::mem::size_of::<VtdRootEntry>() == 16);
|
||
|
||
/// Intel VT-d Context Entry (128 bits = 16 bytes).
|
||
/// 256 entries per bus (one per device:function). 4KiB-aligned table.
|
||
#[repr(C, packed)]
|
||
pub struct VtdContextEntry {
|
||
pub data: [u64; 2],
|
||
}
|
||
|
||
impl VtdContextEntry {
|
||
pub const fn zeroed() -> Self {
|
||
Self { data: [0; 2] }
|
||
}
|
||
|
||
/// Present (bit 0 of data[0]).
|
||
pub fn present(&self) -> bool {
|
||
self.data[0] & (1 << 0) != 0
|
||
}
|
||
|
||
pub fn set_present(&mut self, v: bool) {
|
||
if v { self.data[0] |= 1 << 0; } else { self.data[0] &= !(1 << 0); }
|
||
}
|
||
|
||
/// Fault Processing Disable (bit 1 of data[0]).
|
||
pub fn fault_processing_disable(&self) -> bool {
|
||
self.data[0] & (1 << 1) != 0
|
||
}
|
||
|
||
pub fn set_fault_processing_disable(&mut self, v: bool) {
|
||
if v { self.data[0] |= 1 << 1; } else { self.data[0] &= !(1 << 1); }
|
||
}
|
||
|
||
/// Translation Type (bits 2:3 of data[0]).
|
||
/// 00=legacy, 01=PASID, 10=pass-through, 11=reserved.
|
||
pub fn translation_type(&self) -> u64 {
|
||
(self.data[0] >> 2) & 0x3
|
||
}
|
||
|
||
pub fn set_translation_type(&mut self, tt: u64) {
|
||
self.data[0] = (self.data[0] & !(0x3 << 2)) | ((tt & 0x3) << 2);
|
||
}
|
||
|
||
/// Second Level Page Table Pointer (bits 12:63 of data[0]).
|
||
pub fn slpt_ptr(&self) -> u64 {
|
||
self.data[0] & !0xFFF
|
||
}
|
||
|
||
pub fn set_slpt_ptr(&mut self, addr: u64) {
|
||
self.data[0] = (self.data[0] & 0xFFF) | (addr & !0xFFF);
|
||
}
|
||
|
||
/// Domain Identifier (bits 0:15 of data[1]).
|
||
pub fn domain_id(&self) -> u16 {
|
||
(self.data[1] & 0xFFFF) as u16
|
||
}
|
||
|
||
pub fn set_domain_id(&mut self, id: u16) {
|
||
self.data[1] = (self.data[1] & !0xFFFF) | (id as u64);
|
||
}
|
||
}
|
||
|
||
const _: () = assert!(core::mem::size_of::<VtdContextEntry>() == 16);
|
||
|
||
/// Intel VT-d Translation Type constants.
|
||
pub mod vtd_tt {
|
||
pub const LEGACY: u64 = 0b00;
|
||
pub const PASID: u64 = 0b01;
|
||
pub const PASS_THROUGH: u64 = 0b10;
|
||
}
|
||
|
||
/// Intel VT-d Page Table Entry (64 bits).
|
||
#[repr(C, packed)]
|
||
pub struct VtdPte(pub u64);
|
||
|
||
impl VtdPte {
|
||
/// Read permission (bit 0).
|
||
pub fn read(&self) -> bool {
|
||
self.0 & (1 << 0) != 0
|
||
}
|
||
|
||
pub fn set_read(&mut self, v: bool) {
|
||
if v { self.0 |= 1 << 0; } else { self.0 &= !(1 << 0); }
|
||
}
|
||
|
||
/// Write permission (bit 1).
|
||
pub fn write(&self) -> bool {
|
||
self.0 & (1 << 1) != 0
|
||
}
|
||
|
||
pub fn set_write(&mut self, v: bool) {
|
||
if v { self.0 |= 1 << 1; } else { self.0 &= !(1 << 1); }
|
||
}
|
||
|
||
/// Page frame or next-table address (bits 12:63).
|
||
pub fn addr(&self) -> u64 {
|
||
self.0 & !0xFFF
|
||
}
|
||
|
||
pub fn set_addr(&mut self, a: u64) {
|
||
self.0 = (self.0 & 0xFFF) | (a & !0xFFF);
|
||
}
|
||
}
|
||
|
||
/// Build a leaf PTE for Intel VT-d with read+write.
|
||
pub fn vtd_pte_leaf(addr: u64) -> VtdPte {
|
||
let mut pte = VtdPte(0);
|
||
pte.set_read(true);
|
||
pte.set_write(true);
|
||
pte.set_addr(addr);
|
||
pte
|
||
}
|
||
|
||
/// Build a non-leaf PTE for Intel VT-d pointing to next-level table.
|
||
pub fn vtd_pte_pointer(addr: u64) -> VtdPte {
|
||
let mut pte = VtdPte(0);
|
||
pte.set_read(true);
|
||
pte.set_write(true);
|
||
pte.set_addr(addr);
|
||
pte
|
||
}
|
||
```
|
||
|
||
### 3.3 ACPI Table Structs
|
||
|
||
```rust
|
||
/// Common ACPI table header (24 bytes).
|
||
#[repr(C, packed)]
|
||
pub struct AcpiTableHeader {
|
||
pub signature: [u8; 4],
|
||
pub length: u32,
|
||
pub revision: u8,
|
||
pub checksum: u8,
|
||
pub oem_id: [u8; 6],
|
||
pub oem_table_id: [u8; 8],
|
||
pub oem_revision: u32,
|
||
pub creator_id: [u8; 4],
|
||
pub creator_revision: u32,
|
||
}
|
||
|
||
const _: () = assert!(core::mem::size_of::<AcpiTableHeader>() == 36);
|
||
|
||
/// IVRS ACPI Table Header.
|
||
#[repr(C, packed)]
|
||
pub struct IvrsTable {
|
||
pub header: AcpiTableHeader, // 36 bytes
|
||
pub iv_info: u32, // IOMMU Virtualization Info
|
||
// Followed by variable-length IVHD/IVMD entries.
|
||
}
|
||
|
||
/// IVHD Entry (I/O Virtualization Hardware Definition).
|
||
#[repr(C, packed)]
|
||
pub struct IvhdEntry {
|
||
pub entry_type: u8, // 0x10 or 0x11
|
||
pub flags: u8, // Feature flags
|
||
pub length: u16, // Total length including device entries
|
||
pub device_id: u16, // BDF of IOMMU PCI device
|
||
pub capability_offset: u16, // PCI capability offset
|
||
pub iommu_base_address: u64, // MMIO base address
|
||
pub pci_segment_group: u16, // PCI segment group
|
||
pub iommu_info: u16, // IOMMU info (MSI number, unit ID)
|
||
pub iommu_efr: u32, // Extended features (type 11 only)
|
||
// Followed by variable-length device entries.
|
||
}
|
||
|
||
/// IVMD Entry (I/O Virtualization Memory Definition).
|
||
#[repr(C, packed)]
|
||
pub struct IvmdEntry {
|
||
pub entry_type: u8, // 0x20 or 0x21
|
||
pub flags: u8, // Memory block flags
|
||
pub length: u16, // Total length
|
||
pub device_id: u16, // Start DeviceId (BDF) or 0x0000 for all
|
||
pub aux_data: u16, // Auxiliary data
|
||
pub start_address: u64, // Physical start address
|
||
pub memory_length: u64, // Length in bytes
|
||
}
|
||
|
||
/// IVHD Device Entry (4 bytes minimum).
|
||
#[repr(C, packed)]
|
||
pub struct IvhdDeviceEntry {
|
||
pub dev_type: u8, // Device entry type (0x00..0x44)
|
||
pub data: u8, // LSA flags
|
||
pub device_id: u16, // BDF for SEL/SOR/EOR
|
||
}
|
||
|
||
/// DMAR ACPI Table Header.
|
||
#[repr(C, packed)]
|
||
pub struct DmarTable {
|
||
pub header: AcpiTableHeader, // 36 bytes
|
||
pub host_address_width: u8, // DMA physical address width
|
||
pub flags: u8, // [0]=INTR_REMAP, [1]=X2APIC_OPT_OUT
|
||
pub reserved: [u8; 10], // Reserved
|
||
// Followed by variable-length DRHD/RMRR entries.
|
||
}
|
||
|
||
const _: () = assert!(core::mem::size_of::<DmarTable>() == 48);
|
||
|
||
/// DRHD Entry (DMA Remapping Hardware Unit Definition).
|
||
#[repr(C, packed)]
|
||
pub struct DrhdEntry {
|
||
pub entry_type: u16, // 0x0001
|
||
pub length: u16, // Total length including device scope
|
||
pub flags: u8, // [0]=INCLUDE_PCI_ALL
|
||
pub reserved: u8, // Reserved
|
||
pub segment_number: u16, // PCI segment group
|
||
pub register_base_address: u64, // Physical MMIO base address
|
||
// Followed by variable-length device scope entries.
|
||
}
|
||
|
||
/// DRHD Device Scope Entry.
|
||
#[repr(C, packed)]
|
||
pub struct DmarDeviceScope {
|
||
pub scope_type: u8, // 0x01=PCI EP, 0x02=PCI sub-hierarchy, 0x03=IOAPIC, 0x04=HPET
|
||
pub length: u8, // Total length including path entries
|
||
pub enumeration_id: u8, // Enumeration ID (IOAPIC ID, etc.)
|
||
pub start_bus_number: u8, // Starting PCI bus number
|
||
// Followed by path entries (each 2 bytes: device, function).
|
||
}
|
||
|
||
/// RMRR Entry (Reserved Memory Region Reporting).
|
||
#[repr(C, packed)]
|
||
pub struct RmrrEntry {
|
||
pub entry_type: u16, // 0x0002
|
||
pub length: u16, // Total length
|
||
pub reserved: u16, // Reserved
|
||
pub segment_number: u16, // PCI segment group
|
||
pub base_address: u64, // Physical start address
|
||
pub end_address: u64, // Physical end address (inclusive)
|
||
// Followed by variable-length device scope entries.
|
||
}
|
||
|
||
/// DMAR Sub-Table Types.
|
||
pub mod dmar_type {
|
||
pub const DRHD: u16 = 0x0001;
|
||
pub const RMRR: u16 = 0x0002;
|
||
pub const ATSR: u16 = 0x0003;
|
||
pub const RHSA: u16 = 0x0004;
|
||
pub const ANDD: u16 = 0x0005;
|
||
}
|
||
|
||
/// DMAR Device Scope Types.
|
||
pub mod dmar_scope_type {
|
||
pub const PCI_ENDPOINT: u8 = 0x01;
|
||
pub const PCI_SUBHIERARCHY: u8 = 0x02;
|
||
pub const IOAPIC: u8 = 0x03;
|
||
pub const MSI_HPET: u8 = 0x04;
|
||
pub const ACPI_NAMESPACE: u8 = 0x05;
|
||
}
|
||
```
|
||
|
||
### 3.4 Utility Types
|
||
|
||
```rust
|
||
/// BDF (Bus:Device:Function) packed as u16.
|
||
/// Format: bus[15:8] | device[7:3] | function[2:0].
|
||
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
|
||
pub struct Bdf(pub u16);
|
||
|
||
impl Bdf {
|
||
pub fn new(bus: u8, device: u8, function: u8) -> Self {
|
||
Self(((bus as u16) << 8) | ((device as u16 & 0x1F) << 3) | (function as u16 & 0x7))
|
||
}
|
||
|
||
pub fn bus(&self) -> u8 {
|
||
(self.0 >> 8) as u8
|
||
}
|
||
|
||
pub fn device(&self) -> u8 {
|
||
((self.0 >> 3) & 0x1F) as u8
|
||
}
|
||
|
||
pub fn function(&self) -> u8 {
|
||
(self.0 & 0x7) as u8
|
||
}
|
||
|
||
/// Index into the AMD Device Table (same as raw BDF value).
|
||
pub fn dev_table_index(&self) -> usize {
|
||
self.0 as usize
|
||
}
|
||
}
|
||
|
||
/// Domain ID. Used to group devices sharing a page table.
|
||
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
|
||
pub struct DomainId(pub u16);
|
||
|
||
/// Page table level constants.
|
||
pub mod pt_level {
|
||
/// AMD-Vi levels (Mode field in DTE).
|
||
pub const AMD_1_LEVEL: u64 = 1;
|
||
pub const AMD_2_LEVEL: u64 = 2;
|
||
pub const AMD_3_LEVEL: u64 = 3;
|
||
pub const AMD_4_LEVEL: u64 = 4;
|
||
pub const AMD_5_LEVEL: u64 = 5;
|
||
pub const AMD_6_LEVEL: u64 = 6;
|
||
|
||
/// Intel VT-d levels (SAGAW field).
|
||
pub const VTd_3_LEVEL: u64 = 3;
|
||
pub const VTd_4_LEVEL: u64 = 4;
|
||
pub const VTd_5_LEVEL: u64 = 5;
|
||
pub const VTd_6_LEVEL: u64 = 6;
|
||
}
|
||
```
|
||
|
||
### 3.5 Size Constants
|
||
|
||
```rust
|
||
/// AMD-Vi sizing constants.
|
||
pub mod amd_sizes {
|
||
/// Maximum Device Table entries.
|
||
pub const MAX_DEV_TABLE_ENTRIES: usize = 65536;
|
||
/// Device Table Entry size.
|
||
pub const DTE_SIZE: usize = 32;
|
||
/// Maximum Device Table size (65536 × 32 bytes).
|
||
pub const MAX_DEV_TABLE_SIZE: usize = MAX_DEV_TABLE_ENTRIES * DTE_SIZE; // 2 MiB
|
||
|
||
/// Default Command Buffer entries.
|
||
pub const CMD_BUF_ENTRIES: usize = 512;
|
||
/// Command Buffer Entry size.
|
||
pub const CMD_ENTRY_SIZE: usize = 16;
|
||
/// Default Command Buffer size.
|
||
pub const CMD_BUF_SIZE: usize = CMD_BUF_ENTRIES * CMD_ENTRY_SIZE; // 8 KiB
|
||
|
||
/// Default Event Log entries.
|
||
pub const EVT_LOG_ENTRIES: usize = 512;
|
||
/// Event Log Entry size.
|
||
pub const EVT_ENTRY_SIZE: usize = 16;
|
||
/// Default Event Log size.
|
||
pub const EVT_LOG_SIZE: usize = EVT_LOG_ENTRIES * EVT_ENTRY_SIZE; // 8 KiB
|
||
|
||
/// IRTE size (128 bits).
|
||
pub const IRTE_SIZE: usize = 16;
|
||
/// Maximum Interrupt Remap Table entries (IntTabLen=11 → 2^12 = 4096).
|
||
pub const MAX_IRT_ENTRIES: usize = 4096;
|
||
/// Maximum Interrupt Remap Table size.
|
||
pub const MAX_IRT_SIZE: usize = MAX_IRT_ENTRIES * IRTE_SIZE; // 64 KiB
|
||
|
||
/// Page table entry size (both AMD and Intel).
|
||
pub const PTE_SIZE: usize = 8;
|
||
/// Entries per page table page (4KiB / 8 bytes).
|
||
pub const PTES_PER_PAGE: usize = 512;
|
||
}
|
||
|
||
/// Intel VT-d sizing constants.
|
||
pub mod vtd_sizes {
|
||
/// Root Table entries (one per PCI bus).
|
||
pub const ROOT_TABLE_ENTRIES: usize = 256;
|
||
/// Root/Context Entry size.
|
||
pub const ENTRY_SIZE: usize = 16;
|
||
/// Root Table size.
|
||
pub const ROOT_TABLE_SIZE: usize = ROOT_TABLE_ENTRIES * ENTRY_SIZE; // 4 KiB
|
||
|
||
/// Context Table entries (one per device:function per bus).
|
||
pub const CTX_TABLE_ENTRIES: usize = 256;
|
||
/// Context Table size.
|
||
pub const CTX_TABLE_SIZE: usize = CTX_TABLE_ENTRIES * ENTRY_SIZE; // 4 KiB
|
||
|
||
/// Page table entry size.
|
||
pub const PTE_SIZE: usize = 8;
|
||
/// Entries per page table page.
|
||
pub const PTES_PER_PAGE: usize = 512;
|
||
}
|
||
|
||
/// PCI BDF address space: 256 buses × 32 devices × 8 functions = 65536.
|
||
pub const PCI_BDF_COUNT: usize = 256 * 32 * 8;
|
||
```
|
||
|
||
---
|
||
|
||
## Appendix: Linux Kernel Reference
|
||
|
||
The Linux kernel IOMMU drivers are the primary reference implementation. Key files:
|
||
|
||
| Path | Description |
|
||
|------|-------------|
|
||
| `drivers/iommu/amd/amd_iommu_types.h` | AMD-Vi type definitions, DTE/IRTE/PTE formats, register constants |
|
||
| `drivers/iommu/amd/amd_iommu.c` | AMD-Vi main driver: init, command buffer, device table management |
|
||
| `drivers/iommu/amd/init.c` | AMD-Vi initialization, IVRS parsing, early setup |
|
||
| `drivers/iommu/amd/irq.c` | AMD-Vi interrupt remapping |
|
||
| `drivers/iommu/intel/dmar.c` | Intel VT-d DMAR table parsing |
|
||
| `drivers/iommu/intel/iommu.c` | Intel VT-d main driver |
|
||
| `drivers/iommu/intel/irq_remapping.c` | Intel VT-d interrupt remapping |
|
||
| `include/linux/intel-iommu.h` | Intel VT-d register definitions, struct definitions |
|
||
| `drivers/iommu/io-pgtable.c` | Generic page table allocation |
|
||
|
||
### Key Linux Constants for Cross-Reference
|
||
|
||
```c
|
||
// AMD DTE bits (from amd_iommu_types.h)
|
||
#define DTE_FLAG_V (1ULL << 0)
|
||
#define DTE_FLAG_TV (1ULL << 1)
|
||
#define DTE_FLAG_IR (1ULL << 61)
|
||
#define DTE_FLAG_IW (1ULL << 62)
|
||
#define DTE_FLAG_SE (1ULL << 8)
|
||
|
||
// AMD page table modes (DTE Mode field)
|
||
#define DTE_MODE_4LVL 4 // 4-level page tables (most common)
|
||
|
||
// AMD command opcodes
|
||
#define CMD_COMPLETION_WAIT 0x01
|
||
#define CMD_INVALIDATE_DEVTAB_ENTRY 0x02
|
||
#define CMD_INVALIDATE_IOMMU_PAGES 0x03
|
||
#define CMD_INVALIDATE_INTERRUPT_TABLE 0x04
|
||
|
||
// Intel DMAR flags
|
||
#define DMAR_INTR_REMAP 0x1
|
||
#define DMAR_X2APIC_OPT_OUT 0x2
|
||
|
||
// Intel context entry TT (Translation Type)
|
||
#define CONTEXT_TT_MULTI_LEVEL 0
|
||
#define CONTEXT_TT_DEV_IOTLB 1
|
||
#define CONTEXT_TT_PASS_THROUGH 2
|
||
```
|
||
|
||
---
|
||
|
||
*Document generated for Red Bear OS IOMMU implementation. Sources: AMD IOMMU Specification 48882 Rev 3.10, Intel VT-d Specification Rev 5.0, Linux kernel v6.x source.*
|