  1. Feb 09, 2023
  2. Oct 03, 2022
  3. Sep 27, 2022
  4. Sep 12, 2022
  5. May 10, 2022
  6. May 09, 2022
  7. May 02, 2022
  8. Apr 15, 2022
    • mm: fix unexpected zeroed page mapping with zram swap · e914d8f0
      Minchan Kim authored
      With two processes sharing an mm via CLONE_VM, a user process can be
      corrupted by unexpectedly seeing a zeroed page.
      
            CPU A                        CPU B
      
        do_swap_page                do_swap_page
        SWP_SYNCHRONOUS_IO path     SWP_SYNCHRONOUS_IO path
        swap_readpage valid data
          swap_slot_free_notify
            delete zram entry
                                    swap_readpage zeroed(invalid) data
                                    pte_lock
                                    map the *zero data* to userspace
                                    pte_unlock
        pte_lock
        if (!pte_same)
          goto out_nomap;
        pte_unlock
        return and next refault will
        read zeroed data
      
      The swap_slot_free_notify call is bogus for the CLONE_VM case: copy_mm
      does not increase the refcount of the swap slot, so the check cannot
      tell whether it is safe to discard the data from the backing device.
      In that case, the only lock that can synchronize swap slot freeing is
      the page table lock.  Thus, this patch gets rid of the
      swap_slot_free_notify function.  With this patch, CPU A will see the
      correct data.
      
            CPU A                        CPU B
      
        do_swap_page                do_swap_page
        SWP_SYNCHRONOUS_IO path     SWP_SYNCHRONOUS_IO path
                                    swap_readpage original data
                                    pte_lock
                                    map the original data
                                    swap_free
                                      swap_range_free
                                        bd_disk->fops->swap_slot_free_notify
        swap_readpage read zeroed data
                                    pte_unlock
        pte_lock
        if (!pte_same)
          goto out_nomap;
        pte_unlock
        return
        on next refault will see mapped data by CPU B
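
      The two interleavings above can be replayed deterministically in a small
      userspace C model. This is a sketch under stated assumptions, not kernel
      code: toy_slot stands in for a zram slot that reads back as zeroes once
      freed, toy_pte for the pte shared by both CLONE_VM tasks, and
      toy_map_if_same() for the pte_same check done under the page table lock.
      All toy_* names and TOY_PAGE_SIZE are hypothetical.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical userspace model of the race; no name here is a kernel API. */
#define TOY_PAGE_SIZE 8

struct toy_slot {                  /* a zram slot holding the swapped page */
	bool valid;                    /* false once the slot has been freed */
	char data[TOY_PAGE_SIZE];
};

struct toy_pte {                   /* one pte shared by both CLONE_VM tasks */
	bool present;                  /* false: still holds the swap entry */
	char page[TOY_PAGE_SIZE];      /* what userspace sees once mapped */
};

/* swap_readpage(): a freed zram slot reads back as zeroes. */
static void toy_swap_readpage(const struct toy_slot *slot, char *buf)
{
	if (slot->valid)
		memcpy(buf, slot->data, TOY_PAGE_SIZE);
	else
		memset(buf, 0, TOY_PAGE_SIZE);
}

/* Map buf only if the pte still holds the swap entry (the pte_same
 * check, done under the page table lock in the kernel). */
static void toy_map_if_same(struct toy_pte *pte, const char *buf)
{
	if (pte->present)
		return;                    /* out_nomap: another CPU mapped it */
	memcpy(pte->page, buf, TOY_PAGE_SIZE);
	pte->present = true;
}

/* Old ordering: CPU A frees the slot right after its own read. */
static void toy_replay_old(struct toy_slot *slot, struct toy_pte *pte)
{
	char buf_a[TOY_PAGE_SIZE], buf_b[TOY_PAGE_SIZE];

	toy_swap_readpage(slot, buf_a); /* CPU A: valid data */
	slot->valid = false;            /* CPU A: swap_slot_free_notify */
	toy_swap_readpage(slot, buf_b); /* CPU B: zeroed (invalid) data */
	toy_map_if_same(pte, buf_b);    /* CPU B: maps the zeroes */
	toy_map_if_same(pte, buf_a);    /* CPU A: !pte_same, backs off */
}

/* New ordering: the slot is freed only by swap_free after mapping. */
static void toy_replay_new(struct toy_slot *slot, struct toy_pte *pte)
{
	char buf_a[TOY_PAGE_SIZE], buf_b[TOY_PAGE_SIZE];

	toy_swap_readpage(slot, buf_b); /* CPU B: original data */
	toy_map_if_same(pte, buf_b);    /* CPU B: maps it */
	slot->valid = false;            /* CPU B: swap_free -> free notify */
	toy_swap_readpage(slot, buf_a); /* CPU A: zeroed data ... */
	toy_map_if_same(pte, buf_a);    /* ... discarded by the pte check */
}
```

      With the old ordering the shared pte ends up mapping zeroes; with the
      new ordering it maps the original data, and CPU A's zeroed read is
      dropped because the pte no longer matches.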
      
      One concern is that the patch could increase memory consumption, since
      a page may be kept both in compressed form in zram and in uncompressed
      form in the address space.  However, most zram setups use no
      readahead, and do_swap_page is followed by swap_free, so the compressed
      form is freed from zram quickly.
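
      Why the compressed copy is short-lived can be sketched with a toy
      per-slot reference count, assuming (as in the second diagram above)
      that swap_free() dropping the last reference reaches the device's
      swap_slot_free_notify. The toy_* names are hypothetical stand-ins for
      swap_map[] accounting and the zram callback, not kernel APIs.

```c
#include <stdbool.h>

/* Hypothetical model of one swap slot backed by zram. */
struct toy_swap_slot {
	int count;              /* references to the slot, like swap_map[] */
	bool compressed_copy;   /* does zram still hold the compressed page? */
};

/* Stand-in for the block device's swap_slot_free_notify callback. */
static void toy_zram_free_notify(struct toy_swap_slot *slot)
{
	slot->compressed_copy = false;
}

/* Drop one reference; the last one releases the slot on the device. */
static void toy_swap_free(struct toy_swap_slot *slot)
{
	if (--slot->count == 0)
		toy_zram_free_notify(slot);
}

/*
 * Synchronous swap-in without readahead: read, map, then swap_free.
 * Returns whether both copies were resident when the page was mapped.
 */
static bool toy_do_swap_page(struct toy_swap_slot *slot)
{
	/* swap_readpage(): an uncompressed copy now exists in memory too */
	bool both_resident = slot->compressed_copy;

	/* ... mapped under the pte lock, then the slot is released */
	toy_swap_free(slot);
	return both_resident;
}
```

      For a slot with a single reference, both copies coexist only between
      the read and swap_free(); a slot still referenced elsewhere keeps its
      compressed copy until the last reference goes away.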
      
      Link: https://lkml.kernel.org/r/YjTVVxIAsnKAXjTd@google.com
      
      
      Fixes: 0bcac06f ("mm, swap: skip swapcache for swapin of synchronous device")
      Reported-by: Ivan Babrou <ivan@cloudflare.com>
      Tested-by: Ivan Babrou <ivan@cloudflare.com>
      Signed-off-by: Minchan Kim <minchan@kernel.org>
      Cc: Nitin Gupta <ngupta@vflare.org>
      Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
      Cc: Jens Axboe <axboe@kernel.dk>
      Cc: David Hildenbrand <david@redhat.com>
      Cc: <stable@vger.kernel.org>	[4.14+]
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  9. Mar 22, 2022
    • mm: page_io: fix psi memory pressure error on cold swapins · d8c47cc7
      Johannes Weiner authored
      Once upon a time, all swapins counted toward memory pressure[1].  Then
      Joonsoo introduced workingset detection for anonymous pages and we gained
      the ability to distinguish hot from cold swapins[2][3].  But we failed to
      update swap_readpage() accordingly, and now we account partial memory
      pressure in the swapin path of cold memory.
      
      This does not happen in all situations, which adds more inconsistency:
      paths using the conventional submit_bio() and lock_page() route will
      not see much pressure unless the storage itself is heavily congested
      and bio submissions stall.  ZRAM and ZSWAP do most of the work directly
      from swap_readpage() and will see all swapins reflected as pressure.
      
      IOW, a workload doing cold swapins could see little to no pressure
      reported with on-disk swap, but potentially high pressure with a zram or
      zswap backend.  That confuses any psi-based health monitoring, load
      shedding, proactive reclaim, or userspace OOM killing schemes that might
      be in place for the workload.
      
      Restore consistency by making all swapin stall accounting conditional on
      the page actually being part of the workingset.
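
      The resulting pattern can be modeled in userspace as below. This is a
      hedged sketch: toy_memstall_enter()/toy_memstall_leave() are
      hypothetical stand-ins shaped after the kernel's psi_memstall_enter()/
      psi_memstall_leave() pair, and toy_page.workingset plays the role of
      PageWorkingset(). The stall is recorded only for workingset pages,
      whichever backend performs the read.

```c
#include <stdbool.h>

/* Hypothetical userspace model; the toy_* names are not kernel APIs. */
struct toy_page {
	bool workingset;        /* like PageWorkingset(): a hot refault */
};

static int toy_stalls_recorded;  /* swapins counted as memory pressure */

static void toy_memstall_enter(unsigned long *flags)
{
	*flags = 1;             /* mark that a stall section is open */
}

static void toy_memstall_leave(unsigned long *flags)
{
	if (*flags)
		toy_stalls_recorded++;
	*flags = 0;
}

/* One read path for every backend (disk, zram, zswap) in this model. */
static void toy_swap_readpage(struct toy_page *page)
{
	bool workingset = page->workingset;
	unsigned long pflags = 0;

	/* Only refaults of workingset pages count as memory pressure. */
	if (workingset)
		toy_memstall_enter(&pflags);

	/* ... decompress (zram/zswap) or submit the bio (disk) here ... */

	if (workingset)
		toy_memstall_leave(&pflags);
}
```

      Sampling the workingset bit once, before the read, keeps the enter and
      leave calls paired even if the page's state changes during the read.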
      
      [1] commit 93779069 ("mm/page_io.c: annotate refault stalls from swap_readpage")
      [2] commit aae466b0 ("mm/swap: implement workingset detection for anonymous LRU")
      [3] commit cad8320b ("mm/swap: don't SetPageWorkingset unconditionally during swapin")
      
      Link: https://lkml.kernel.org/r/20220214214921.419687-1-hannes@cmpxchg.org
      
      
      Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
      Reported-by: CGEL <cgel.zte@gmail.com>
      Acked-by: Minchan Kim <minchan@kernel.org>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Yu Zhao <yuzhao@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  10. Mar 16, 2022
  11. Mar 15, 2022
  12. Feb 02, 2022
  13. Jan 20, 2022
  14. Oct 18, 2021
  15. Sep 27, 2021
  16. Mar 03, 2021
  17. Feb 24, 2021
  18. Jan 27, 2021
  19. Jan 25, 2021
  20. Dec 03, 2020
    • mm: memcontrol: Use helpers to read page's memcg data · bcfe06bf
      Roman Gushchin authored
      
      Patch series "mm: allow mapping accounted kernel pages to userspace", v6.
      
      Currently a non-slab kernel page which has been charged to a memory cgroup
      can't be mapped to userspace.  The underlying reason is simple: PageKmemcg
      flag is defined as a page type (like buddy, offline, etc), so it takes a
      bit from a page->mapped counter.  Pages with a type set can't be mapped to
      userspace.
      
      But in general the kmemcg flag has nothing to do with mapping to
      userspace.  It only means that the page has been accounted by the page
      allocator, so it has to be properly uncharged on release.
      
      Some BPF maps map vmalloc-based memory to userspace, and their memory
      can't be accounted because of this implementation detail.

      This patchset removes this limitation by moving the PageKmemcg flag into
      one of the free bits of the page->mem_cgroup pointer.  It also formalizes
      accesses to page->mem_cgroup and page->obj_cgroups using new helpers,
      adds several checks and removes a couple of obsolete functions.  As a
      result, the code becomes more robust, with fewer open-coded bit tricks.
      
      This patch (of 4):
      
      Currently there are many open-coded reads of the page->mem_cgroup pointer,
      as well as a couple of read helpers, which are barely used.
      
      This is an obstacle to reusing some bits of the pointer for storing
      additional information.  In fact, we already do this for slab pages,
      where the last bit indicates that the pointer refers to an attached
      vector of objcg pointers instead of a regular memcg pointer.
      
      This commit uses two existing helpers, introduces a new one, and
      converts all read sides to call these helpers:
        struct mem_cgroup *page_memcg(struct page *page);
        struct mem_cgroup *page_memcg_rcu(struct page *page);
        struct mem_cgroup *page_memcg_check(struct page *page);
      
      page_memcg_check() is intended for cases where the page can be a slab
      page whose memcg pointer points at an objcg vector.  It checks the
      lowest bit and, if it is set, returns NULL.  page_memcg() contains a
      VM_BUG_ON_PAGE() check that the page is not a slab page.
      
      To make sure nobody uses a direct access, struct page's
      mem_cgroup/obj_cgroups is converted to unsigned long memcg_data.
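
      The accessor split can be illustrated with a userspace model of a
      tagged memcg_data word. This is a sketch, not the kernel's definitions:
      the toy_* types and TOY_MEMCG_DATA_OBJCGS are hypothetical, assert()
      stands in for VM_BUG_ON_PAGE(), and, like the kernel, it assumes a
      pointer fits in an unsigned long (true on LP64 Linux) and that the
      pointed-to objects are at least 2-byte aligned, leaving the low bit free.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace model of the accessors described above. */
struct toy_mem_cgroup { int id; };
struct toy_obj_cgroup { int id; };

#define TOY_MEMCG_DATA_OBJCGS 0x1UL  /* low bit: word is an objcg vector */

struct toy_page {
	bool slab;
	unsigned long memcg_data;        /* pointer bits plus flag bits */
};

/* Like page_memcg(): the caller guarantees a non-slab page. */
static struct toy_mem_cgroup *toy_page_memcg(const struct toy_page *page)
{
	assert(!page->slab);             /* stands in for VM_BUG_ON_PAGE() */
	return (struct toy_mem_cgroup *)page->memcg_data;
}

/* Like page_memcg_check(): safe on any page, NULL for an objcg vector. */
static struct toy_mem_cgroup *toy_page_memcg_check(const struct toy_page *page)
{
	unsigned long memcg_data = page->memcg_data;

	if (memcg_data & TOY_MEMCG_DATA_OBJCGS)
		return NULL;
	return (struct toy_mem_cgroup *)memcg_data;
}

/* Attach an objcg vector to a slab page, tagging the low bit. */
static void toy_set_objcgs(struct toy_page *page, struct toy_obj_cgroup **vec)
{
	page->slab = true;
	page->memcg_data = (unsigned long)vec | TOY_MEMCG_DATA_OBJCGS;
}
```

      Keeping memcg_data as an integer means every reader has to go through
      an accessor that knows about the flag bits, which is what makes it
      safe to add more flags later.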
      
      Signed-off-by: Roman Gushchin <guro@fb.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Alexei Starovoitov <ast@kernel.org>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Link: https://lkml.kernel.org/r/20201027001657.3398190-1-guro@fb.com
      Link: https://lkml.kernel.org/r/20201027001657.3398190-2-guro@fb.com
      Link: https://lore.kernel.org/bpf/20201201215900.3569844-2-guro@fb.com
  21. Oct 14, 2020
  22. Sep 24, 2020
  23. Sep 04, 2020
    • mm: Add arch hooks for saving/restoring tags · 8a84802e
      Steven Price authored
      
      Arm's Memory Tagging Extension (MTE) adds some metadata (tags) to
      every physical page.  When swapping pages out to disk, it is necessary
      to save these tags and later restore them when reading the pages back.
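
      The save/restore lifecycle that such hooks have to support can be
      sketched as a side table keyed by swap slot. This is an illustrative
      model only: the toy_* functions, the array-based table, and the
      four-tag page are hypothetical simplifications (real MTE keeps one
      4-bit tag per 16-byte granule), and no arm64 code is reproduced here.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical model: names and sizes are illustrative only. */
#define TOY_TAGS_PER_PAGE 4        /* a real 4 KiB page has 256 granules */
#define TOY_NR_SLOTS 8

struct toy_page {
	unsigned char tags[TOY_TAGS_PER_PAGE];
};

/* Side table of saved tags, indexed by swap slot offset. */
static unsigned char toy_saved_tags[TOY_NR_SLOTS][TOY_TAGS_PER_PAGE];
static bool toy_saved_valid[TOY_NR_SLOTS];

/* Swap-out: the data goes to the swap device, the tags go here. */
static bool toy_save_tags(unsigned offset, const struct toy_page *page)
{
	if (offset >= TOY_NR_SLOTS)
		return false;
	memcpy(toy_saved_tags[offset], page->tags, TOY_TAGS_PER_PAGE);
	toy_saved_valid[offset] = true;
	return true;
}

/* Swap-in: put the tags back once the data has been read. */
static bool toy_restore_tags(unsigned offset, struct toy_page *page)
{
	if (offset >= TOY_NR_SLOTS || !toy_saved_valid[offset])
		return false;
	memcpy(page->tags, toy_saved_tags[offset], TOY_TAGS_PER_PAGE);
	return true;
}

/* Slot freed: the saved tags are no longer needed. */
static void toy_invalidate_tags(unsigned offset)
{
	if (offset < TOY_NR_SLOTS)
		toy_saved_valid[offset] = false;
}
```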
      
      Add some hooks along with dummy implementations to enable the
      arch code to handle this.
      
      Three new hooks are added to the swap code:
       * arch_prepare_to_swap() and
       * arch_swap_invalidate_page() / arch_swap_invalidate_area().
      One new hook is added to shmem:
       * arch_swap_restore()
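
      A common way to provide such dummy implementations is an
      #ifndef-guarded no-op fallback that an architecture replaces by
      defining a marker macro. The sketch below assumes that convention; the
      TOY_HAVE_* macros, the toy_* hooks, and the toy_swap_out() caller are
      hypothetical, and the kernel's exact guard names and argument types
      may differ.

```c
/* Hypothetical generic header; an arch may pre-define TOY_HAVE_* macros. */
struct toy_page { int id; };

#ifndef TOY_HAVE_ARCH_PREPARE_TO_SWAP
static inline int toy_arch_prepare_to_swap(struct toy_page *page)
{
	(void)page;
	return 0;               /* nothing extra to save: swap-out proceeds */
}
#endif

#ifndef TOY_HAVE_ARCH_SWAP_INVALIDATE
static inline void toy_arch_swap_invalidate_page(int type, unsigned long offset)
{
	(void)type;             /* one slot freed: drop any saved metadata */
	(void)offset;
}

static inline void toy_arch_swap_invalidate_area(int type)
{
	(void)type;             /* whole swap area gone */
}
#endif

#ifndef TOY_HAVE_ARCH_SWAP_RESTORE
static inline void toy_arch_swap_restore(unsigned long entry, struct toy_page *page)
{
	(void)entry;            /* page read back: nothing to restore */
	(void)page;
}
#endif

/* Generic swap-out step: a failing arch hook aborts the swap-out. */
static int toy_swap_out(struct toy_page *page)
{
	int err = toy_arch_prepare_to_swap(page);

	if (err)
		return err;         /* the page lock must also be released here */
	/* ... write the page contents to the swap device ... */
	return 0;
}
```

      Generic code calls the hooks unconditionally; an architecture that
      needs real behavior, such as arm64 with MTE, defines the marker macro
      and its own version before this header is included.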
      
      Signed-off-by: Steven Price <steven.price@arm.com>
      [catalin.marinas@arm.com: add unlock_page() on the error path]
      [catalin.marinas@arm.com: dropped the _tags suffix]
      Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
      Acked-by: Andrew Morton <akpm@linux-foundation.org>
  24. Aug 15, 2020
  25. Aug 07, 2020