diff --git a/Documentation/arch/x86/xstate.rst b/Documentation/arch/x86/xstate.rst index 5cec7fb558d60346d2b55039c2dd7882575ea3aa..ae5c69e48b115180ab25f96ee67ea100dfe1b664 100644 --- a/Documentation/arch/x86/xstate.rst +++ b/Documentation/arch/x86/xstate.rst @@ -11,6 +11,22 @@ are enabled by XCR0 as well, but the first use of related instruction is trapped by the kernel because by default the required large XSTATE buffers are not allocated automatically. +The purpose for dynamic features +-------------------------------- + +Legacy userspace libraries often have hard-coded, static sizes for +alternate signal stacks, often using MINSIGSTKSZ which is typically 2KB. +That stack must be able to store at *least* the signal frame that the +kernel sets up before jumping into the signal handler. That signal frame +must include an XSAVE buffer defined by the CPU. + +However, that means that the size of signal stacks is dynamic, not static, +because different CPUs have differently-sized XSAVE buffers. A compiled-in +size of 2KB with existing applications is too small for new CPU features +like AMX. Instead of universally requiring larger stack, with the dynamic +enabling, the kernel can enforce userspace applications to have +properly-sized altstacks. + Using dynamically enabled XSTATE features in user space applications -------------------------------------------------------------------- @@ -64,6 +80,61 @@ the handler allocates a larger xstate buffer for the task so the large state can be context switched. In the unlikely cases that the allocation fails, the kernel sends SIGSEGV. +AMX TILE_DATA enabling example +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Below is the example of how userspace applications enable +TILE_DATA dynamically: + + 1. The application first needs to query the kernel for AMX + support:: + + #include <asm/prctl.h> + #include <sys/syscall.h> + #include <stdio.h> + #include <unistd.h> + + #ifndef ARCH_GET_XCOMP_SUPP + #define ARCH_GET_XCOMP_SUPP 0x1021 + #endif + + #ifndef ARCH_XCOMP_TILECFG + #define ARCH_XCOMP_TILECFG 17 + #endif + + #ifndef ARCH_XCOMP_TILEDATA + #define ARCH_XCOMP_TILEDATA 18 + #endif + + #define MASK_XCOMP_TILE ((1 << ARCH_XCOMP_TILECFG) | \ + (1 << ARCH_XCOMP_TILEDATA)) + + unsigned long features; + long rc; + + ... + + rc = syscall(SYS_arch_prctl, ARCH_GET_XCOMP_SUPP, &features); + + if (!rc && (features & MASK_XCOMP_TILE) == MASK_XCOMP_TILE) + printf("AMX is available.\n"); + + 2. After that, determining support for AMX, an application must + explicitly ask permission to use it:: + + #ifndef ARCH_REQ_XCOMP_PERM + #define ARCH_REQ_XCOMP_PERM 0x1023 + #endif + + ... + + rc = syscall(SYS_arch_prctl, ARCH_REQ_XCOMP_PERM, ARCH_XCOMP_TILEDATA); + + if (!rc) + printf("AMX is ready for use.\n"); + +Note this example does not include the sigaltstack preparation. + Dynamic features in signal frames --------------------------------- @@ -72,3 +143,32 @@ entry if the feature is in its initial configuration. This differs from non-dynamic features which are always written regardless of their configuration. Signal handlers can examine the XSAVE buffer's XSTATE_BV field to determine if a features was written. + +Dynamic features for virtual machines +------------------------------------- + +The permission for the guest state component needs to be managed separately +from the host, as they are exclusive to each other. A coupled of options +are extended to control the guest permission: + +-ARCH_GET_XCOMP_GUEST_PERM + + arch_prctl(ARCH_GET_XCOMP_GUEST_PERM, &features); + + ARCH_GET_XCOMP_GUEST_PERM is a variant of ARCH_GET_XCOMP_PERM. So it + provides the same semantics and functionality but for the guest + components. + +-ARCH_REQ_XCOMP_GUEST_PERM + + arch_prctl(ARCH_REQ_XCOMP_GUEST_PERM, feature_nr); + + ARCH_REQ_XCOMP_GUEST_PERM is a variant of ARCH_REQ_XCOMP_PERM. It has the + same semantics for the guest permission. While providing a similar + functionality, this comes with a constraint. Permission is frozen when the + first VCPU is created. Any attempt to change permission after that point + is going to be rejected. So, the permission has to be requested before the + first VCPU creation. + +Note that some VMMs may have already established a set of supported state +components. These options are not presumed to support any particular VMM. diff --git a/arch/x86/include/uapi/asm/prctl.h b/arch/x86/include/uapi/asm/prctl.h index 500b96e71f1868dd097a2a796c8da89b5f85bed3..f298c778f856e9f77a8979120b4e0ac6451ec0ee 100644 --- a/arch/x86/include/uapi/asm/prctl.h +++ b/arch/x86/include/uapi/asm/prctl.h @@ -16,6 +16,9 @@ #define ARCH_GET_XCOMP_GUEST_PERM 0x1024 #define ARCH_REQ_XCOMP_GUEST_PERM 0x1025 +#define ARCH_XCOMP_TILECFG 17 +#define ARCH_XCOMP_TILEDATA 18 + #define ARCH_MAP_VDSO_X32 0x2001 #define ARCH_MAP_VDSO_32 0x2002 #define ARCH_MAP_VDSO_64 0x2003