  1. Jul 01, 2021
    • mm: remove special swap entry functions · af5cdaf8
      Alistair Popple authored
      Patch series "Add support for SVM atomics in Nouveau", v11.
      
      Introduction
      ============
      
      Some devices have features such as atomic PTE bits that can be used to
      implement atomic access to system memory.  To support atomic operations
      on a shared virtual memory page, such a device needs access to that page
      which excludes the CPU.  This series introduces a mechanism to
      temporarily unmap pages, granting a device exclusive access to them.
      
      These changes are required to support OpenCL atomic operations in Nouveau
      to shared virtual memory (SVM) regions allocated with the
      CL_MEM_SVM_ATOMICS clSVMAlloc flag.  A more complete description of the
      OpenCL SVM feature is available at
      https://www.khronos.org/registry/OpenCL/specs/3.0-unified/html/
      OpenCL_API.html#_shared_virtual_memory .
      
      Implementation
      ==============
      
      Exclusive device access is implemented by adding a new swap entry type
      (SWAP_DEVICE_EXCLUSIVE) which is similar to a migration entry.  The main
      difference is that on fault the original entry is immediately restored by
      the fault handler instead of waiting.
      
      Restoring the entry triggers calls to MMU notifiers which allows a device
      driver to revoke the atomic access permission from the GPU prior to the
      CPU finalising the entry.
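      
      As a rough illustration of the fault-side handling (a sketch only, not
      code taken from this series; helper names such as
      is_device_exclusive_entry() and restore_exclusive_pte() are assumptions):
      
      	entry = pte_to_swp_entry(vmf->orig_pte);
      	if (is_device_exclusive_entry(entry)) {
      		/*
      		 * Restore the original PTE right away.  The MMU notifier
      		 * callbacks invoked while doing so give the driver a
      		 * chance to revoke the device's atomic access first.
      		 */
      		restore_exclusive_pte(vma, vmf->address, vmf->pte, entry);
      		return 0;
      	}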
      
      Patches
      =======
      
      Patches 1 & 2 refactor existing migration and device private entry
      functions.
      
      Patches 3 & 4 rework try_to_unmap_one() by splitting out unrelated
      functionality into separate functions - try_to_migrate_one() and
      try_to_munlock_one().
      
      Patch 5 renames some existing code but does not introduce functionality.
      
      Patch 6 is a small clean-up to swap entry handling in copy_pte_range().
      
      Patch 7 contains the bulk of the implementation for device exclusive
      memory.
      
      Patch 8 contains some additions to the HMM selftests to ensure everything
      works as expected.
      
      Patch 9 is a cleanup for the Nouveau SVM implementation.
      
      Patch 10 contains the implementation of atomic access for the Nouveau
      driver.
      
      Testing
      =======
      
      This has been tested with upstream Mesa 21.1.0 and a simple OpenCL program
      which checks that GPU atomic accesses to system memory are atomic.
      Without this series the test fails as there is no way of write-protecting
      the page mapping which results in the device clobbering CPU writes.  For
      reference the test is available at
      https://ozlabs.org/~apopple/opencl_svm_atomics/
      
      Further testing has been performed by adding support for testing exclusive
      access to the hmm-tests kselftests.
      
      This patch (of 10):
      
      Remove multiple similar inline functions for dealing with different types
      of special swap entries.
      
      Both migration and device private swap entries use the swap offset to
      store a pfn.  Instead of multiple inline functions to obtain a struct page
      for each swap entry type, use a common function pfn_swap_entry_to_page().
      Also open-code the various entry_to_pfn() functions as this results in
      shorter code that is easier to understand.
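      
      A minimal sketch of what such a common helper could look like
      (simplified; not necessarily the exact code added here):
      
      	static inline struct page *pfn_swap_entry_to_page(swp_entry_t entry)
      	{
      		/* Both entry types keep a pfn in the swap offset field. */
      		struct page *p = pfn_to_page(swp_offset(entry));
      
      		/* Migration entries may only be used with the page locked. */
      		BUG_ON(is_migration_entry(entry) && !PageLocked(p));
      
      		return p;
      	}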
      
      Link: https://lkml.kernel.org/r/20210616105937.23201-1-apopple@nvidia.com
      Link: https://lkml.kernel.org/r/20210616105937.23201-2-apopple@nvidia.com
      
      
      Signed-off-by: Alistair Popple <apopple@nvidia.com>
      Reviewed-by: Ralph Campbell <rcampbell@nvidia.com>
      Reviewed-by: Christoph Hellwig <hch@lst.de>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Shakeel Butt <shakeelb@google.com>
      Cc: Ben Skeggs <bskeggs@redhat.com>
      Cc: Jason Gunthorpe <jgg@nvidia.com>
      Cc: John Hubbard <jhubbard@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. May 05, 2021
    • hugetlb: pass vma into huge_pte_alloc() and huge_pmd_share() · aec44e0f
      Peter Xu authored
      Patch series "hugetlb: Disable huge pmd unshare for uffd-wp", v4.
      
      This series tries to disable huge pmd unshare of hugetlbfs backed memory
      for uffd-wp.  Although uffd-wp of hugetlbfs is still in the RFC stage,
      the idea of this series may be needed for multiple tasks (Axel's uffd
      minor fault series, and Mike's soft dirty series), so I picked it out
      from the larger series.
      
      This patch (of 4):
      
      It is preparation work to allow the per-architecture huge_pte_alloc() to
      behave differently according to VMA attributes.
      
      Pass the vma deeper into huge_pmd_share() so that we can avoid the
      find_vma() call.
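      
      For illustration, the prototypes change roughly as follows (a sketch
      based on this description, not a verbatim diff):
      
      	/* before */
      	pte_t *huge_pte_alloc(struct mm_struct *mm,
      			      unsigned long addr, unsigned long sz);
      
      	/* after: the vma is passed down so that huge_pmd_share() no
      	 * longer needs to look it up via find_vma() */
      	pte_t *huge_pte_alloc(struct mm_struct *mm, struct vm_area_struct *vma,
      			      unsigned long addr, unsigned long sz);
      	pte_t *huge_pmd_share(struct mm_struct *mm, struct vm_area_struct *vma,
      			      unsigned long addr, pud_t *pud);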
      
      [peterx@redhat.com: build fix]
        Link: https://lkml.kernel.org/r/20210304164653.GB397383@xz-x1
      Link: https://lkml.kernel.org/r/20210218230633.15028-1-peterx@redhat.com
      
      Link: https://lkml.kernel.org/r/20210218230633.15028-2-peterx@redhat.com
      
      
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Suggested-by: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Adam Ruprecht <ruprecht@google.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrea Arcangeli <aarcange@redhat.com>
      Cc: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: Cannon Matthews <cannonmatthews@google.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Chinwen Chang <chinwen.chang@mediatek.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: "Dr . David Alan Gilbert" <dgilbert@redhat.com>
      Cc: Huang Ying <ying.huang@intel.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Jerome Glisse <jglisse@redhat.com>
      Cc: Kirill A. Shutemov <kirill@shutemov.name>
      Cc: Lokesh Gidra <lokeshgidra@google.com>
      Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: "Michal Koutn" <mkoutny@suse.com>
      Cc: Michel Lespinasse <walken@google.com>
      Cc: Mike Rapoport <rppt@linux.vnet.ibm.com>
      Cc: Mina Almasry <almasrymina@google.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Oliver Upton <oupton@google.com>
      Cc: Shaohua Li <shli@fb.com>
      Cc: Shawn Anastasio <shawn@anastas.io>
      Cc: Steven Price <steven.price@arm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  3. Jan 19, 2021
    • s390: convert to generic entry · 56e62a73
      Sven Schnelle authored
      
      This patch converts s390 to use the generic entry infrastructure from
      kernel/entry/*.
      
      There are a few special things on s390:
      
      - PIF_PER_TRAP is moved to TIF_PER_TRAP as the generic code doesn't
        know about our PIF flags in exit_to_user_mode_loop().
      
      - The old code had several ways to restart syscalls:
      
        a) PIF_SYSCALL_RESTART, which was only set during execve to force a
           restart after upgrading a process (usually qemu-kvm) to pgste page
           table extensions.
      
        b) PIF_SYSCALL, which is set by do_signal() to indicate that the
           current syscall should be restarted. This is changed so that
           do_signal() now also uses PIF_SYSCALL_RESTART. Continuing to use
           PIF_SYSCALL doesn't work with the generic code, and changing it
           to PIF_SYSCALL_RESTART gives PIF_SYSCALL and PIF_SYSCALL_RESTART
           clearly distinct meanings.
      
      - On s390 calling sys_sigreturn or sys_rt_sigreturn is implemented by
        executing a svc instruction on the process stack which causes a fault.
        While handling that fault the fault code sets PIF_SYSCALL to hand over
        processing to the syscall code on exit to usermode.
      
      The patch introduces PIF_SYSCALL_RET_SET, which is set if ptrace sets
      a return value for a syscall. The s390x ptrace ABI uses r2 both for the
      syscall number and return value, so ptrace cannot set the syscall number +
      return value at the same time. The flag makes handling that a bit easier.
      do_syscall() will just skip executing the syscall if PIF_SYSCALL_RET_SET
      is set.
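      
      As a simplified sketch of the idea (not the exact s390 code; the use of
      sys_call_table and gprs[2] below is an assumption for illustration):
      
      	if (!test_pt_regs_flag(regs, PIF_SYSCALL_RET_SET)) {
      		/* Only run the syscall if ptrace did not already set r2. */
      		regs->gprs[2] = -ENOSYS;
      		if (nr < NR_syscalls)
      			regs->gprs[2] = sys_call_table[nr](regs);
      	}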
      
      CONFIG_DEBUG_ASCE was removed in favour of the generic CONFIG_DEBUG_ENTRY.
      CR1/7/13 will be checked both on kernel entry and exit to contain the
      correct ASCEs.
      
      Signed-off-by: Sven Schnelle <svens@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
  4. Nov 23, 2020
    • s390/mm: use invalid asce instead of kernel asce · 0290c9e3
      Heiko Carstens authored
      
      Create a region 3 page table which contains only invalid entries, and
      use that via "s390_invalid_asce" instead of the kernel ASCE whenever
      - there is no user address space available, e.g. during early startup
      - an intermediate ASCE is needed while address spaces are switched
      
      This makes sure that user space accesses in such situations are
      guaranteed to fail.
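      
      A rough sketch of the idea (the constants and helpers below exist in the
      s390 headers, but treat the exact code and the invalid_pg_dir name as
      assumptions):
      
      	/* Fill a region-third table with invalid entries ... */
      	crst_table_init((unsigned long *)invalid_pg_dir, _REGION3_ENTRY_EMPTY);
      	/* ... and build an ASCE that points to it. */
      	s390_invalid_asce = __pa(invalid_pg_dir) | _ASCE_TYPE_REGION3 |
      			    _ASCE_TABLE_LENGTH;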
      
      Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
      Reviewed-by: Alexander Gordeev <agordeev@linux.ibm.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
    • s390/mm: remove set_fs / rework address space handling · 87d59863
      Heiko Carstens authored
      
      Remove set_fs support from s390. While doing this, rework address space
      handling and simplify it. As a result address spaces are now set up
      like this:
      
      CPU running in              | %cr1 ASCE | %cr7 ASCE | %cr13 ASCE
      ----------------------------|-----------|-----------|-----------
      user space                  |  user     |  user     |  kernel
      kernel, normal execution    |  kernel   |  user     |  kernel
      kernel, kvm guest execution |  gmap     |  user     |  kernel
      
      To achieve this the getcpu vdso syscall is removed in order to avoid
      secondary address mode and a separate vdso address space for user
      space. The getcpu vdso syscall will be implemented differently in a
      subsequent patch.
      
      The kernel always accesses user space via the secondary address space.
      This happens in different ways:
      - with mvcos in home space mode, reading/writing directly to the
        secondary address space
      - with mvcs/mvcp in primary space mode, copying from primary space to
        secondary space or vice versa
      - with e.g. cs in secondary space mode, accessing secondary space
      
      Switching translation modes happens with sacf before and after
      instructions which access user space, like before.
      
      Lazy handling of control register reloading is removed in the hope of
      making everything simpler, but at the cost of making kernel entry and
      exit a bit slower. That is: on kernel entry the primary asce is always
      changed to contain the kernel asce, and on kernel exit the primary
      asce is changed again so it contains the user asce.
      
      In kernel mode there is only one exception to the primary asce: when
      kvm guests are executed the primary asce contains the gmap asce (which
      describes the guest address space). The primary asce is reset to
      kernel asce whenever kvm guest execution is interrupted, so that this
      doesn't have to be taken into account for any user space accesses.
      
      Reviewed-by: Sven Schnelle <svens@linux.ibm.com>
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
  5. Oct 14, 2020
    • arch, drivers: replace for_each_memblock() with for_each_mem_range() · b10d6bca
      Mike Rapoport authored
      There are several occurrences of the following pattern:
      
      	for_each_memblock(memory, reg) {
      		start = __pfn_to_phys(memblock_region_memory_base_pfn(reg));
      		end = __pfn_to_phys(memblock_region_memory_end_pfn(reg));
      
      		/* do something with start and end */
      	}
      
      Using the for_each_mem_range() iterator is more appropriate in such cases and
      allows simpler and cleaner code.
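      
      With the new iterator the same loop becomes (sketch):
      
      	phys_addr_t start, end;
      	u64 i;
      
      	for_each_mem_range(i, &start, &end) {
      		/* do something with start and end */
      	}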
      
      [akpm@linux-foundation.org: fix arch/arm/mm/pmsa-v7.c build]
      [rppt@linux.ibm.com: mips: fix cavium-octeon build caused by memblock refactoring]
        Link: http://lkml.kernel.org/r/20200827124549.GD167163@linux.ibm.com
      
      
      
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Emil Renner Berthing <kernel@esmil.dk>
      Cc: Hari Bathini <hbathini@linux.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: https://lkml.kernel.org/r/20200818151634.14343-13-rppt@kernel.org
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • arch, mm: replace for_each_memblock() with for_each_mem_pfn_range() · c9118e6c
      Mike Rapoport authored
      
      There are several occurrences of the following pattern:
      
      	for_each_memblock(memory, reg) {
      		start_pfn = memblock_region_memory_base_pfn(reg);
      		end_pfn = memblock_region_memory_end_pfn(reg);
      
      		/* do something with start_pfn and end_pfn */
      	}
      
      Rather than iterate over all memblock.memory regions and each time query
      for their start and end PFNs, use the for_each_mem_pfn_range() iterator to get
      simpler and clearer code.
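      
      With the iterator the same loop can be written as (sketch; passing
      MAX_NUMNODES walks the ranges of all nodes):
      
      	unsigned long start_pfn, end_pfn;
      	int i, nid;
      
      	for_each_mem_pfn_range(i, MAX_NUMNODES, &start_pfn, &end_pfn, &nid) {
      		/* do something with start_pfn and end_pfn */
      	}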
      
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Baoquan He <bhe@redhat.com>
      Acked-by: Miguel Ojeda <miguel.ojeda.sandonis@gmail.com>	[.clang-format]
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
      Cc: Borislav Petkov <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Daniel Axtens <dja@axtens.net>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Emil Renner Berthing <kernel@esmil.dk>
      Cc: Hari Bathini <hbathini@linux.ibm.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Palmer Dabbelt <palmer@dabbelt.com>
      Cc: Paul Mackerras <paulus@samba.org>
      Cc: Paul Walmsley <paul.walmsley@sifive.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Thomas Bogendoerfer <tsbogend@alpha.franken.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Link: https://lkml.kernel.org/r/20200818151634.14343-12-rppt@kernel.org
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  6. Sep 16, 2020
    • s390/kasan: support protvirt with 4-level paging · c360c9a2
      Vasily Gorbik authored
      
      Currently the kernel crashes in Kasan instrumentation code if
      CONFIG_KASAN_S390_4_LEVEL_PAGING is used on a protected virtualization
      capable machine where the ultravisor imposes addressing limitations on
      the host and those limitations are lower than KASAN_SHADOW_OFFSET.
      
      The problem is that Kasan has to know in advance where vmalloc/modules
      areas would be. With protected virtualization enabled vmalloc/modules
      areas are moved down to the ultravisor secure storage limit while kasan
      still expects them at the very end of the 4-level paging address space.
      
      To fix that, make Kasan recognize when protected virtualization is enabled
      and predefine the vmalloc/modules area positions so that they are compliant
      with the ultravisor secure storage limit.
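      
      Conceptually this amounts to something like the following (a sketch
      only; the vmax and secure_storage_limit names are assumptions, not the
      actual code):
      
      	vmax = _REGION1_SIZE;	/* end of the 4-level address space */
      	if (prot_virt_host && secure_storage_limit)
      		vmax = min(vmax, secure_storage_limit);
      	MODULES_END = vmax;
      	MODULES_VADDR = MODULES_END - MODULES_LEN;
      	VMALLOC_END = MODULES_VADDR;
      	VMALLOC_START = VMALLOC_END - vmalloc_size;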
      
      Kasan shadow itself stays in place and might reside above that ultravisor
      secure storage limit.
      
      One slight difference compared to a kernel without Kasan enabled is that
      the vmalloc/modules area positions are not reverted to the default if
      ultravisor initialization fails. They would still be below the ultravisor
      secure storage limit.
      
      Kernel layout with kasan, 4-level paging and protected virtualization
      enabled (ultravisor secure storage limit is at 0x0000800000000000):
      ---[ vmemmap Area Start ]---
      0x0000400000000000-0x0000400080000000
      ---[ vmemmap Area End ]---
      ---[ vmalloc Area Start ]---
      0x00007fe000000000-0x00007fff80000000
      ---[ vmalloc Area End ]---
      ---[ Modules Area Start ]---
      0x00007fff80000000-0x0000800000000000
      ---[ Modules Area End ]---
      ---[ Kasan Shadow Start ]---
      0x0018000000000000-0x001c000000000000
      ---[ Kasan Shadow End ]---
      0x001c000000000000-0x0020000000000000         1P PGD I
      
      Kernel layout with kasan, 4-level paging and protected virtualization
      disabled/unsupported:
      ---[ vmemmap Area Start ]---
      0x0000400000000000-0x0000400060000000
      ---[ vmemmap Area End ]---
      ---[ Kasan Shadow Start ]---
      0x0018000000000000-0x001c000000000000
      ---[ Kasan Shadow End ]---
      ---[ vmalloc Area Start ]---
      0x001fffe000000000-0x001fffff80000000
      ---[ vmalloc Area End ]---
      ---[ Modules Area Start ]---
      0x001fffff80000000-0x0020000000000000
      ---[ Modules Area End ]---
      
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
    • s390/mm,ptdump: sort markers · ee4b2ce6
      Vasily Gorbik authored
      
      Kasan configuration options and the size of physical memory present could
      affect the kernel memory layout. In particular vmemmap, vmalloc and modules
      might come before the kasan shadow or after it. To make ptdump output the
      markers in the right order they have to be sorted.
      
      To preserve the original order of markers with the same start address,
      avoid using sort() from lib/sort.c (which is not a stable sorting
      algorithm) and sort the markers in place.
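      
      A minimal sketch of such a stable in-place sort (an insertion sort keyed
      on the marker start address; simplified, not the exact code):
      
      	struct addr_marker tmp;
      	int i, j;
      
      	/* Insertion sort is stable: markers with equal start_address
      	 * keep their original relative order. */
      	for (i = 1; i < ARRAY_SIZE(address_markers); i++) {
      		tmp = address_markers[i];
      		for (j = i - 1; j >= 0 &&
      		     address_markers[j].start_address > tmp.start_address; j--)
      			address_markers[j + 1] = address_markers[j];
      		address_markers[j + 1] = tmp;
      	}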
      
      Reviewed-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>
    • s390/mm,ptdump: add proper ifdefs · 48111b48
      Heiko Carstens authored
      
      Use ifdefs instead of IS_ENABLED() to avoid a compile error
      for !PTDUMP_DEBUGFS:
      
      arch/s390/mm/dump_pagetables.c: In function ‘pt_dump_init’:
      arch/s390/mm/dump_pagetables.c:248:64: error: ‘ptdump_fops’ undeclared (first use in this function); did you mean ‘pidfd_fops’?
         debugfs_create_file("kernel_page_tables", 0400, NULL, NULL, &ptdump_fops);
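      
      The fix amounts to roughly this (sketch):
      
      #ifdef CONFIG_PTDUMP_DEBUGFS
      	debugfs_create_file("kernel_page_tables", 0400, NULL, NULL,
      			    &ptdump_fops);
      #endif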
      
      Reported-by: Julian Wiedmann <jwi@linux.ibm.com>
      Fixes: 08c8e685 ("s390: add ARCH_HAS_DEBUG_WX support")
      Signed-off-by: Heiko Carstens <hca@linux.ibm.com>
      Signed-off-by: Vasily Gorbik <gor@linux.ibm.com>