  1. Sep 02, 2024
    • mm: remove follow_page() · 7290840d
      David Hildenbrand authored
      All users are gone, let's remove it and any leftovers in comments.  We'll
      leave any FOLL/follow_page_() naming cleanups as future work.
      
      Link: https://lkml.kernel.org/r/20240802155524.517137-11-david@redhat.com
      
      
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Cc: Alexander Gordeev <agordeev@linux.ibm.com>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Claudio Imbrenda <imbrenda@linux.ibm.com>
      Cc: Gerald Schaefer <gerald.schaefer@linux.ibm.com>
      Cc: Heiko Carstens <hca@linux.ibm.com>
      Cc: Janosch Frank <frankja@linux.ibm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox <willy@infradead.org>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Vasily Gorbik <gor@linux.ibm.com>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  2. May 07, 2024
    • mm: fix race between __split_huge_pmd_locked() and GUP-fast · 3a5a8d34
      Ryan Roberts authored
      __split_huge_pmd_locked() can be called for a present THP, devmap or
      (non-present) migration entry.  It calls pmdp_invalidate() unconditionally
      on the pmdp and only determines if it is present or not based on the
      returned old pmd.  This is a problem for the migration entry case because
      pmd_mkinvalid(), called by pmdp_invalidate(), must only be called for a
      present pmd.
      
      On arm64 at least, pmd_mkinvalid() will mark the pmd such that any future
      call to pmd_present() will return true.  And therefore any lockless
      pgtable walker could see the migration entry pmd in this state and start
      interpreting the fields as if it were present, leading to BadThings (TM).
      GUP-fast appears to be one such lockless pgtable walker.
      
      x86 does not suffer the above problem; instead, pmd_mkinvalid() will
      corrupt the offset field of the swap entry within the swap pte.  See the
      link below for discussion of that problem.
      
      Fix all of this by only calling pmdp_invalidate() for a present pmd.  And
      for good measure let's add a warning to all implementations of
      pmdp_invalidate[_ad]().  I've manually reviewed all other
      pmdp_invalidate[_ad]() call sites and believe all others to be conformant.
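
      As a rough illustration, the shape of the fix is sketched below.  This
      is a hedged sketch, not the actual diff: the function name is
      hypothetical, and only the present/non-present guard is shown.

          /* Sketch only: assumes the generic mm helpers; see the real diff. */
          #include <linux/mm.h>

          static void split_pmd_sketch(struct vm_area_struct *vma, pmd_t *pmdp,
                                       unsigned long haddr)
          {
                  pmd_t old_pmd = *pmdp;

                  if (pmd_present(old_pmd)) {
                          /*
                           * Only invalidate a present pmd: pmd_mkinvalid(),
                           * reached via pmdp_invalidate(), is undefined for
                           * non-present (e.g. migration) entries.
                           */
                          old_pmd = pmdp_invalidate(vma, haddr, pmdp);
                  }
                  /* ... continue the split based on old_pmd ... */
          }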
      
      This is a theoretical bug found during code review.  I don't have any test
      case to trigger it in practice.
      
      Link: https://lkml.kernel.org/r/20240501143310.1381675-1-ryan.roberts@arm.com
      Link: https://lore.kernel.org/all/0dd7827a-6334-439a-8fd0-43c98e6af22b@arm.com/
      
      
      Fixes: 84c3fc4e ("mm: thp: check pmd migration entry in common path")
      Signed-off-by: Ryan Roberts <ryan.roberts@arm.com>
      Reviewed-by: Zi Yan <ziy@nvidia.com>
      Reviewed-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Andreas Larsson <andreas@gaisler.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Aneesh Kumar K.V <aneesh.kumar@kernel.org>
      Cc: Borislav Petkov (AMD) <bp@alien8.de>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christian Borntraeger <borntraeger@linux.ibm.com>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Mark Rutland <mark.rutland@arm.com>
      Cc: Naveen N. Rao <naveen.n.rao@linux.ibm.com>
      Cc: Nicholas Piggin <npiggin@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sven Schnelle <svens@linux.ibm.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Will Deacon <will@kernel.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  3. May 06, 2024
    • Docs/mm/damon/design: document 'young page' type DAMOS filter · 26dd7cc7
      SeongJae Park authored
      Update DAMON design document for the newly added DAMOS filter type, 'young
      page'.
      
      Link: https://lkml.kernel.org/r/20240426195247.100306-6-sj@kernel.org
      
      
      Signed-off-by: SeongJae Park <sj@kernel.org>
      Cc: Honggyu Kim <honggyu.kim@sk.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm/page_table_check: support userfault wr-protect entries · 8430557f
      Peter Xu authored
      Allow page_table_check hooks to check userfaultfd wr-protect criteria
      upon pgtable updates.  The rule is that the writable flag and the
      userfault wr-protect flag must never co-exist in the same entry.
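
      As a hedged illustration of that invariant (the helper below is
      hypothetical, not the exact upstream hook; pte_uffd_wp() is only
      available on architectures with uffd-wp support, i.e. x86):

          /* Sketch only: the invariant, not the real page_table_check hook. */
          #include <linux/mm.h>

          static void check_uffd_wp_invariant(pte_t pte)
          {
                  /* A userfault wr-protected pte must never be writable. */
                  if (pte_present(pte) && pte_uffd_wp(pte))
                          WARN_ON_ONCE(pte_write(pte));
          }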
      
      This should be better than c2da319c, where we only sanitized such issues
      during a pgtable walk; when hitting such an issue we had no good way to
      know where the writable bit came from [1], so even though the pgtable
      walk exposed a kernel bug (which is still helpful for triaging), it was
      not easy to track and debug.
      
      Now we switch to track the source.  It's much easier too with the recent
      introduction of page table check.
      
      There are some limitations with using the page table check here for
      userfaultfd wr-protect purpose:
      
        - It is only enabled with explicit enablement of page table check
        configs and/or boot parameters, but that should be good enough to
        track at least syzbot issues, as syzbot should enable
        PAGE_TABLE_CHECK[_ENFORCED] for x86 [1].  We used to rely on DEBUG_VM,
        but it is now off for most distros; distros similarly do not normally
        enable PAGE_TABLE_CHECK[_ENFORCED].
      
        - It conditionally works with the ptep_modify_prot API.  It will be
        bypassed when e.g. XEN PV is enabled, but it still works for most
        other scenarios, which should be the common cases, so it should be
        good enough.
      
        - The hugetlb check is a bit hairy, as the page table check cannot
        distinguish a hugetlb pte from a normal pte by trapping at
        set_pte_at(), because of the current design where hugetlb maps every
        level to pte_t...  For example, the default set_huge_pte_at() can
        invoke set_pte_at() directly and lose the hugetlb context, treating it
        the same as a normal pte_t.  So far that is fine, because
        huge_pte_uffd_wp() is always equal to pte_uffd_wp() wherever it is
        supported (x86 only).  It will become a bigger problem when we define
        _PAGE_UFFD_WP differently at various pgtable levels, because then a
        single huge_pte_uffd_wp() per arch will stop making sense.  As of now
        we can leave this for later, too.
      
      This patch also removes commit c2da319c altogether, as we have something
      better now.
      
      [1] https://lore.kernel.org/all/000000000000dce0530615c89210@google.com/
      
      Link: https://lkml.kernel.org/r/20240417212549.2766883-1-peterx@redhat.com
      
      
      Signed-off-by: Peter Xu <peterx@redhat.com>
      Reviewed-by: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Axel Rasmussen <axelrasmussen@google.com>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: Nadav Amit <nadav.amit@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    • mm: track mapcount of large folios in single value · 05c5323b
      David Hildenbrand authored
      Let's track the mapcount of large folios in a single value.  The mapcount
      of a large folio currently corresponds to the sum of the entire mapcount
      and all page mapcounts.
      
      This sum is what we actually want to know in folio_mapcount() and it is
      also sufficient for implementing folio_mapped().
      
      With PTE-mapped THP becoming more important and more widely used, we want
      to avoid looping over all pages of a folio just to obtain the mapcount of
      large folios.  The comment "In the common case, avoid the loop when no
      pages mapped by PTE" in folio_total_mapcount() no longer holds for
      mTHP, which are always mapped by PTE.
      
      Further, we are planning on using folio_mapcount() more frequently, and
      might even want to remove page mapcounts for large folios in some kernel
      configs.  Therefore, allow for reading the mapcount of large folios
      efficiently and atomically without looping over any pages.
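
      Conceptually, the read side then looks like the hedged sketch below
      (field and helper names are illustrative, not necessarily the exact
      upstream ones):

          /* Sketch only: one atomic read instead of a loop over all pages. */
          static inline int folio_mapcount_sketch(struct folio *folio)
          {
                  if (!folio_test_large(folio))
                          return atomic_read(&folio->_mapcount) + 1;
                  /* Large folio: a single counter, updated at (un)map time. */
                  return atomic_read(&folio->_large_mapcount) + 1;
          }

          static inline bool folio_mapped_sketch(struct folio *folio)
          {
                  return folio_mapcount_sketch(folio) >= 1;
          }

      The write side would then pair each (un)map of a PTE batch with a single
      atomic add/sub on the large mapcount, matching the cost noted below.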
      
      Maintain the mapcount also for hugetlb pages for simplicity.  Use the new
      mapcount to implement folio_mapcount() and folio_mapped().  Make
      page_mapped() simply call folio_mapped().  We can now get rid of
      folio_large_is_mapped().
      
      _nr_pages_mapped is now only used in rmap code and for debugging purposes.
      Keep folio_nr_pages_mapped() around, but document that its use should be
      limited to rmap internals and debugging purposes.
      
      This change implies one additional atomic add/sub whenever
      mapping/unmapping (parts of) a large folio.
      
      As we now batch RMAP operations for PTE-mapped THP during fork(), during
      unmap/zap, and when PTE-remapping a PMD-mapped THP, and we adjust the
      large mapcount for a PTE batch only once, the added overhead in the common
      case is small.  Only when unmapping individual pages of a large folio
      (e.g., during COW), the overhead might be bigger in comparison, but it's
      essentially one additional atomic operation.
      
      Note that before the new mapcount could overflow, our refcount would
      already overflow: each mapping requires a folio reference.  Extend the
      documentation of folio_mapcount().
      
      Link: https://lkml.kernel.org/r/20240409192301.907377-5-david@redhat.com
      
      
      Signed-off-by: David Hildenbrand <david@redhat.com>
      Reviewed-by: Yin Fengwei <fengwei.yin@intel.com>
      Cc: Chris Zankel <chris@zankel.net>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: John Paul Adrian Glaubitz <glaubitz@physik.fu-berlin.de>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Miaohe Lin <linmiaohe@huawei.com>
      Cc: Muchun Song <muchun.song@linux.dev>
      Cc: Naoya Horiguchi <nao.horiguchi@gmail.com>
      Cc: Peter Xu <peterx@redhat.com>
      Cc: Richard Chang <richardycc@google.com>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Ryan Roberts <ryan.roberts@arm.com>
      Cc: Yang Shi <shy828301@gmail.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
      Cc: Zi Yan <ziy@nvidia.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>