  1. Jan 15, 2022
    • mm/pagealloc: sysctl: change watermark_scale_factor max limit to 30% · 39c65a94
      Suren Baghdasaryan authored
      For embedded systems with little total memory that need to run
      applications with relatively large memory requirements, the 10% maximum
      for watermark_scale_factor is a problem: direct reclaim is triggered
      every time such an application is started.  This results in slow
      application startup times and a poor end-user experience.
      
      By increasing the watermark_scale_factor max limit, we give vendors more
      flexibility to choose the right level of kswapd aggressiveness for their
      device and workload requirements.
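      To make the effect of the limit concrete, here is a minimal C sketch of
      how the factor spaces one zone's watermarks.  It assumes the 5.16-era
      formula in which the gap between the min, low and high watermarks is
      the larger of min/4 and managed_pages * watermark_scale_factor / 10000
      (the factor is in units of 1/10,000 of memory, so 1000 is 10% and 3000
      is 30%); highmem zones and watermark boosting are ignored.  The struct
      and function names are hypothetical, not kernel API.  A larger gap lets
      kswapd wake earlier and build more headroom before allocations fall to
      the min watermark and enter direct reclaim.

```c
#include <assert.h>
#include <stdint.h>

/* watermark_scale_factor is expressed in units of 1/10,000 of memory. */
#define WSF_DENOMINATOR 10000u
#define WSF_MIN 1u
#define WSF_MAX_OLD 1000u /* 10%: limit before this commit */
#define WSF_MAX_NEW 3000u /* 30%: limit after this commit */

/* Hypothetical per-zone result, in pages. */
struct zone_wmarks {
	uint64_t min;
	uint64_t low;
	uint64_t high;
};

/* Sketch of the range check applied to writes of the sysctl. */
static int wsf_valid(unsigned int factor)
{
	return factor >= WSF_MIN && factor <= WSF_MAX_NEW;
}

/*
 * Simplified model of the per-zone watermark spacing (assumption: the
 * gap is max(min/4, managed_pages * factor / 10000), low = min + gap,
 * high = min + 2 * gap).
 */
static struct zone_wmarks compute_wmarks(uint64_t managed_pages,
					 uint64_t min_wmark,
					 unsigned int factor)
{
	struct zone_wmarks w;
	uint64_t quarter = min_wmark / 4;
	uint64_t scaled = managed_pages * factor / WSF_DENOMINATOR;
	uint64_t gap = quarter > scaled ? quarter : scaled;

	assert(wsf_valid(factor));
	w.min = min_wmark;
	w.low = min_wmark + gap;
	w.high = min_wmark + 2 * gap;
	return w;
}
```

      For example, with 1,000,000 managed pages (about 3.8 GiB with 4 KiB
      pages) and a min watermark of 4,000 pages, the default factor of 10
      gives low/high watermarks of 5,000/6,000 pages, while a factor of 3000
      gives 304,000/604,000 pages.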
      
      Link: https://lkml.kernel.org/r/20211124193604.2758863-1-surenb@google.com
      
      
      Signed-off-by: Suren Baghdasaryan <surenb@google.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Lukas Middendorf <kernel@tuxforce.de>
      Cc: Antti Palosaari <crope@iki.fi>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Iurii Zaikin <yzaikin@google.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Zhang Yi <yi.zhang@huawei.com>
      Cc: Fengfei Xi <xi.fengfei@h3c.com>
      Cc: Mike Rapoport <rppt@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: page table check · df4e817b
      Pasha Tatashin authored
      Check user page table entries at the time they are added and removed.
      
      This allows memory corruption issues related to double mapping to be
      caught synchronously.

      When a pte for an anonymous page is added into a page table, we verify
      that this pte does not already point to a file-backed page, and vice
      versa: if a file-backed page is being added, we verify that this page
      does not have an anonymous mapping.

      We also enforce that anonymous pages may only be shared read-only
      (i.e. COW after fork).  All other sharing must be for file pages.

      Page table check helps protect against, and debug, cases where
      "struct page" metadata has become corrupted for some reason, for
      example when the refcount or mapcount becomes invalid.
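      The rules above can be sketched as a simplified userspace model with
      two counters per physical page, one for anonymous mappings and one for
      file mappings.  This is an illustration under assumptions: the struct
      and function names are hypothetical, and a false return stands in for
      the kernel BUG() that the real check would trigger.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Hypothetical per-page counters for the model. After a false return
 * (which models a BUG()), the counter values are not meaningful.
 */
struct ptc_page {
	int anon_map_count;
	int file_map_count;
};

/* Called when a PTE mapping page p is installed. */
static bool ptc_map(struct ptc_page *p, bool anon, bool writable)
{
	assert(p != NULL);
	if (anon) {
		/* An anonymous page must not also be mapped as a file page. */
		if (p->file_map_count)
			return false;
		/* Adding a writable mapping to an already-mapped anon page fails. */
		if (++p->anon_map_count > 1 && writable)
			return false;
	} else {
		/* A file page must not have any anonymous mapping. */
		if (p->anon_map_count)
			return false;
		p->file_map_count++;
	}
	return true;
}

/* Called when a PTE mapping page p is removed. */
static bool ptc_unmap(struct ptc_page *p, bool anon)
{
	assert(p != NULL);
	if (anon) {
		if (p->file_map_count)
			return false;
		/* Going negative means more unmaps than maps. */
		return --p->anon_map_count >= 0;
	}
	if (p->anon_map_count)
		return false;
	return --p->file_map_count >= 0;
}
```

      Checking at PTE install and removal time is what makes detection
      synchronous: the violation is reported at the call that creates the
      bad mapping, not later when the corrupted page is reused.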
      
      Link: https://lkml.kernel.org/r/20211221154650.1047963-4-pasha.tatashin@soleen.com
      
      
      Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Slaby <jirislaby@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sami Tolvanen <samitolvanen@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Xu <weixugc@google.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: ptep_clear() page table helper · 08d5b29e
      Pasha Tatashin authored
      We have ptep_get_and_clear() and ptep_get_and_clear_full() helpers to
      clear a PTE from user page tables, but there is no variant for simply
      clearing a present PTE from user page tables without using the
      low-level pte_clear(), which can be either native or paravirtualized.

      Add a new ptep_clear() that can be used in common code to clear PTEs
      from page tables.  We will need this call later in order to add a hook
      for page table check.
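      The point of the helper is to route every clear from common code
      through one function, so that a hook can later be attached in a single
      place.  The sketch below models that pattern in userspace; pte_t and
      mm_struct are stand-in types, and the hook (check_hook_on_clear) is
      hypothetical.  As described above, the helper as introduced here simply
      clears the PTE; the hook only illustrates where page table check can
      plug in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for kernel types: a PTE is modeled as a word. */
typedef struct { uint64_t val; } pte_t;
struct mm_struct { int unused; };

/* Number of present PTEs the hypothetical hook has observed. */
static unsigned long cleared_present_ptes;

/* Low-level primitive; in the kernel it may be native or paravirtualized. */
static void pte_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
{
	(void)mm;
	(void)addr;
	ptep->val = 0;
}

/* Hypothetical hook: a page table checker would inspect the old PTE here. */
static void check_hook_on_clear(struct mm_struct *mm, pte_t old)
{
	(void)mm;
	if (old.val != 0)
		cleared_present_ptes++;
}

/* Generic helper for common code: one place to attach the hook. */
static void ptep_clear(struct mm_struct *mm, unsigned long addr, pte_t *ptep)
{
	pte_t old;

	assert(ptep != NULL);
	old = *ptep;
	pte_clear(mm, addr, ptep);
	check_hook_on_clear(mm, old);
}
```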
      
      Link: https://lkml.kernel.org/r/20211221154650.1047963-3-pasha.tatashin@soleen.com
      
      
      Signed-off-by: Pasha Tatashin <pasha.tatashin@soleen.com>
      Cc: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
      Cc: Dave Hansen <dave.hansen@linux.intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Frederic Weisbecker <frederic@kernel.org>
      Cc: Greg Thelen <gthelen@google.com>
      Cc: "H. Peter Anvin" <hpa@zytor.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Jiri Slaby <jirislaby@kernel.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Masahiro Yamada <masahiroy@kernel.org>
      Cc: Mike Rapoport <rppt@kernel.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Paul Turner <pjt@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Sami Tolvanen <samitolvanen@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Wei Xu <weixugc@google.com>
      Cc: Will Deacon <will@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • docs/vm: add vmalloced-kernel-stacks document · 4b8fec28
      Shuah Khan authored
      Add a new document to explain Virtually Mapped Kernel Stack Support.
      This is a compilation of information from the code and original patch
      series that introduced the Virtually Mapped Kernel Stacks feature.
      
      This document summarizes the feature and provides details on allocation,
      freeing, and stack overflow handling.  It also provides references to
      available tests.
      
      Link: https://lkml.kernel.org/r/20211215002004.47981-1-skhan@linuxfoundation.org
      
      
      Signed-off-by: Shuah Khan <skhan@linuxfoundation.org>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Andy Lutomirski <luto@kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm: add a field to store names for private anonymous memory · 9a10064f
      Colin Cross authored
      In many userspace applications, and especially in VM-based applications
      like those Android uses heavily, there are multiple different allocators
      in use.  At a minimum there is libc malloc and the stack, and in many
      cases there are libc malloc, the stack, direct syscalls to mmap
      anonymous memory, and multiple VM heaps (one for small objects, one for
      big objects, etc.).  Each of these layers usually has its own tools to
      inspect its usage: malloc by compiling a debug version, the VM through
      heap inspection tools, while for direct syscalls there is usually no
      way to track them.
      
      On Android we heavily use a set of tools that use an extended version of
      the logic covered in Documentation/vm/pagemap.txt to walk all pages
      mapped in userspace and slice their usage by process, shared (COW) vs.
      unique mappings, backing, etc.  This can account for real physical
      memory usage even in cases like fork without exec (which Android uses
      heavily to share as many private COW pages as possible between
      processes), Kernel SamePage Merging, and clean zero pages.  It produces
      a measurement of the pages that only exist in that process (USS, for
      unique), and a measurement of the physical memory usage of that process
      with the cost of shared pages being evenly split between processes that
      share them (PSS).
      
      If all anonymous memory is indistinguishable, then figuring out the real
      physical memory usage (PSS) of each heap requires either a pagemap
      walking tool that can understand the heap debugging of every layer, or
      for every layer's heap debugging tools to implement the pagemap walking
      logic, in which case it is hard to get a consistent view of memory
      across the whole system.
      
      Tracking the information in userspace leads to all sorts of problems.
      It either needs to be stored inside the process, which means every
      process has to have an API to export its current heap information upon
      request, or it has to be stored externally in a filesystem that somebody
      needs to clean up on crashes.  It needs to be readable while the process
      is still running, so it has to have some sort of synchronization with
      every layer of userspace.  Efficiently tracking the ranges requires
      reimplementing something like the kernel vma trees, and linking to it
      from every layer of userspace.  It requires more memory, more syscalls,
      more runtime cost, and more complexity to separately track regions that
      the kernel is already tracking.
      
      This patch adds a field to /proc/pid/maps and /proc/pid/smaps to show a
      userspace-provided name for anonymous vmas.  The names of named
      anonymous vmas are shown in /proc/pid/maps and /proc/pid/smaps as
      [anon:<name>].
      
      Userspace can set the name for a region of memory by calling
      
         prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, start, len, (unsigned long)name)
      
      Setting the name to NULL clears it.  The name length limit is 80 bytes
      including the NUL terminator, and the name is checked to contain only
      printable ASCII characters (including space), except '[', ']', '\', '$'
      and '`'.
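      A minimal C sketch of calling this interface, with a userspace
      pre-check that mirrors the documented name rules.  Assumptions: the
      fallback values for PR_SET_VMA and PR_SET_VMA_ANON_NAME are only used
      if the installed headers lack them, the helper names are hypothetical,
      and the kernel remains the authority on validation.  On a kernel built
      without CONFIG_ANON_VMA_NAME the prctl() call is expected to fail
      (typically with EINVAL).

```c
#define _GNU_SOURCE
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <sys/prctl.h>

/* Fallbacks in case the installed headers predate PR_SET_VMA. */
#ifndef PR_SET_VMA
#define PR_SET_VMA 0x53564d41
#endif
#ifndef PR_SET_VMA_ANON_NAME
#define PR_SET_VMA_ANON_NAME 0
#endif

/* Documented limit: 80 bytes, including the NUL terminator. */
#define ANON_NAME_MAX_LEN 80

/* Printable ASCII (space included), except [ ] \ $ and ` */
static bool anon_name_char_ok(char ch)
{
	return ch >= 0x20 && ch <= 0x7e && strchr("[]\\$`", ch) == NULL;
}

/* Hypothetical userspace pre-check mirroring the documented rules. */
static bool anon_name_ok(const char *name)
{
	size_t len;

	assert(name != NULL);
	len = strnlen(name, ANON_NAME_MAX_LEN);
	if (len >= ANON_NAME_MAX_LEN)
		return false; /* no room left for the NUL terminator */
	for (size_t i = 0; i < len; i++)
		if (!anon_name_char_ok(name[i]))
			return false;
	return true;
}

/*
 * Name the anonymous range [addr, addr + len); a NULL name clears it.
 * Returns 0 on success, or -1 with errno set.
 */
static int name_anon_region(void *addr, size_t len, const char *name)
{
	if (name != NULL && !anon_name_ok(name)) {
		errno = EINVAL;
		return -1;
	}
	return prctl(PR_SET_VMA, PR_SET_VMA_ANON_NAME, (unsigned long)addr,
		     (unsigned long)len, (unsigned long)name);
}
```

      In use, a region obtained from mmap() with MAP_PRIVATE | MAP_ANONYMOUS
      and then passed to name_anon_region(p, len, "my heap") should appear as
      [anon:my heap] in /proc/self/maps on a kernel with the feature enabled.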
      
      ASCII strings are used to provide descriptive identifiers for vmas,
      which can be understood by users reading /proc/pid/maps or
      /proc/pid/smaps.  Names can be standardized for a given system and they
      can include some variable parts such as the name of the allocator or a
      library, tid of the thread using it, etc.
      
      The name is stored in a pointer in the shared union in vm_area_struct
      that points to a null-terminated string.  Anonymous vmas with the same
      name (equivalent strings) that are otherwise mergeable will be merged.
      The name pointers are not shared between vmas even if they contain the
      same name.  The name pointer is stored in a union with fields that are
      only used on file-backed mappings, so it does not increase memory usage.
      
      CONFIG_ANON_VMA_NAME kernel configuration is introduced to enable this
      feature.  It keeps the feature disabled by default to prevent any
      additional memory overhead and to avoid confusing procfs parsers on
      systems which are not ready to support named anonymous vmas.
      
      The patch is based on the original patch developed by Colin Cross, more
      specifically on its latest version [1] posted upstream by Sumit Semwal.
      It used a userspace pointer to store vma names.  In that design, name
      pointers could be shared between vmas.  However, during the last
      upstreaming attempt, Kees Cook raised concerns [2] about this approach
      and suggested copying the name into kernel memory space, performing
      validity checks [3] and storing it as a string referenced from
      vm_area_struct.
      
      One big concern is fork() performance, since fork() would need to strdup
      anonymous vma names.  Dave Hansen suggested experimenting with the
      worst-case scenario of forking a process with 64k vmas having longest
      possible names [4].  I ran this experiment on an ARM64 Android device
      and recorded a worst-case regression of almost 40% when forking such a
      process.
      
      This regression is addressed in the followup patch which replaces the
      pointer to a name with a refcounted structure that allows sharing the
      name pointer between vmas of the same name.  Instead of duplicating the
      string during fork() or when splitting a vma, it increments the refcount.
      
      [1] https://lore.kernel.org/linux-mm/20200901161459.11772-4-sumit.semwal@linaro.org/
      [2] https://lore.kernel.org/linux-mm/202009031031.D32EF57ED@keescook/
      [3] https://lore.kernel.org/linux-mm/202009031022.3834F692@keescook/
      [4] https://lore.kernel.org/linux-mm/5d0358ab-8c47-2f5f-8e43-23b89d6a8e95@intel.com/
      
      Changes for prctl(2) manual page (in the options section):
      
      PR_SET_VMA
      	Sets an attribute specified in arg2 for virtual memory areas
      	starting from the address specified in arg3 and spanning the
      	size specified in arg4. arg5 specifies the value of the attribute
      	to be set. Note that assigning an attribute to a virtual memory
      	area might prevent it from being merged with adjacent virtual
      	memory areas due to the difference in that attribute's value.
      
      	Currently, arg2 must be one of:
      
      	PR_SET_VMA_ANON_NAME
      		Set a name for anonymous virtual memory areas. arg5 should
      		be a pointer to a null-terminated string containing the
      		name. The name length including null byte cannot exceed
      		80 bytes. If arg5 is NULL, the name of the appropriate
      		anonymous virtual memory areas will be reset. The name
      		can contain only printable ASCII characters (including
      		space), except '[', ']', '\', '$' and '`'.

      		This feature is available only if the kernel is built with
      		the CONFIG_ANON_VMA_NAME option enabled.
      
      [surenb@google.com: docs: proc.rst: /proc/PID/maps: fix malformed table]
        Link: https://lkml.kernel.org/r/20211123185928.2513763-1-surenb@google.com
      [surenb: rebased over v5.15-rc6, replaced userpointer with a kernel copy,
       added input sanitization and CONFIG_ANON_VMA_NAME config. The bulk of the
       work here was done by Colin Cross, therefore, with his permission, keeping
       him as the author]
      
      Link: https://lkml.kernel.org/r/20211019215511.3771969-2-surenb@google.com
      
      
      Signed-off-by: Colin Cross <ccross@google.com>
      Signed-off-by: Suren Baghdasaryan <surenb@google.com>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Cyrill Gorcunov <gorcunov@openvz.org>
      Cc: Dave Hansen <dave.hansen@intel.com>
      Cc: David Rientjes <rientjes@google.com>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Hugh Dickins <hughd@google.com>
      Cc: Ingo Molnar <mingo@kernel.org>
      Cc: Jan Glauber <jan.glauber@gmail.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: John Stultz <john.stultz@linaro.org>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rob Landley <rob@landley.net>
      Cc: "Serge E. Hallyn" <serge.hallyn@ubuntu.com>
      Cc: Shaohua Li <shli@fusionio.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memcg: add per-memcg vmalloc stat · 4e5aa1f4
      Shakeel Butt authored
      The kvmalloc* allocation functions can fall back to vmalloc allocations,
      and do so more often on long-running machines.  In addition, the kernel
      does have __GFP_ACCOUNT kvmalloc* calls.  So, on long-running machines,
      memory.stat often does not give the complete picture of which type of
      memory is charged to the memcg.  So add a per-memcg vmalloc stat.
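      On kernels with this change, the counter shows up as a "vmalloc" line
      in a cgroup v2 memory.stat file, reported in bytes like most entries
      there.  Below is a small C sketch that looks up one key in
      memory.stat-style text; the function name is hypothetical, and reading
      the file (for example /sys/fs/cgroup/<group>/memory.stat, assuming the
      usual cgroup v2 mount point) is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Look up one key in memory.stat-style text ("key value" per line).
 * Returns true and stores the value if the key is present.
 */
static bool memstat_lookup(const char *text, const char *key,
			   unsigned long long *out)
{
	size_t klen;
	const char *line = text;

	assert(text && key && out);
	klen = strlen(key);
	while (line && *line) {
		/* Match the whole key followed by a space, not a prefix. */
		if (strncmp(line, key, klen) == 0 && line[klen] == ' ')
			return sscanf(line + klen + 1, "%llu", out) == 1;
		line = strchr(line, '\n');
		if (line)
			line++;
	}
	return false;
}
```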
      
      [shakeelb@google.com: page_memcg() within rcu lock, per Muchun]
        Link: https://lkml.kernel.org/r/20211222052457.1960701-1-shakeelb@google.com
      [akpm@linux-foundation.org: remove cast, per Muchun]
      [shakeelb@google.com: remove area->page[0] checks and move to page by page accounting per Michal]
        Link: https://lkml.kernel.org/r/20220104222341.3972772-1-shakeelb@google.com
      
      Link: https://lkml.kernel.org/r/20211221215336.1922823-1-shakeelb@google.com
      
      
      Signed-off-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Roman Gushchin <guro@fb.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/memcg: add oom_group_kill memory event · b6bf9abb
      Dan Schatzberg authored
      Our container agent wants to know, when a container exits, whether it
      was OOM killed, so that it can report this to the user.  We use
      memory.oom.group = 1 to ensure that OOM kills within the container's
      cgroup kill everything.  The existing memory.events are insufficient
      for knowing whether this was triggered:

      1) Our current approach reads oom_kill from memory.events and reports
         that the container was killed if the value is non-zero. This is
         wrong in cases where containers create their own child cgroups with
         memory.oom.group=1, as such OOM kills are counted against the
         container cgroup's oom_kill counter even though they do not OOM
         kill the entire container.

      2) Reading memory.events.local fails to identify OOM kills in leaf
         cgroups (that don't set memory.oom.group) within the container
         cgroup.
      
      This patch adds a new oom_group_kill event, raised when memory.oom.group
      triggers, to allow userspace to cleanly identify when an entire cgroup
      is OOM killed.
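      A container agent could use the new counter roughly as in the C sketch
      below, which parses memory.events-style text and distinguishes a
      whole-group kill from individual OOM kills.  The struct, enum and
      function names are hypothetical.  The sketch does not decide which file
      to read: whether memory.events or memory.events.local reflects "this
      cgroup was killed as a whole" in a given hierarchy is an assumption the
      caller must check.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* The two memory.events counters relevant to the exit report. */
struct oom_counts {
	unsigned long long oom_kill;
	unsigned long long oom_group_kill;
};

enum container_oom_status {
	CONTAINER_NO_OOM,
	CONTAINER_PARTIAL_OOM, /* some processes were OOM killed */
	CONTAINER_GROUP_OOM,   /* memory.oom.group killed the whole group */
};

/* Parse "key value" lines; unknown keys are skipped. */
static struct oom_counts parse_memory_events(const char *text)
{
	struct oom_counts c = { 0, 0 };
	char key[64];
	unsigned long long val;
	int used;

	assert(text != NULL);
	while (sscanf(text, "%63s %llu%n", key, &val, &used) == 2) {
		if (strcmp(key, "oom_kill") == 0)
			c.oom_kill = val;
		else if (strcmp(key, "oom_group_kill") == 0)
			c.oom_group_kill = val;
		text += used;
	}
	return c;
}

/* A group kill takes precedence over individual OOM kills. */
static enum container_oom_status classify(const struct oom_counts *c)
{
	if (c->oom_group_kill > 0)
		return CONTAINER_GROUP_OOM;
	if (c->oom_kill > 0)
		return CONTAINER_PARTIAL_OOM;
	return CONTAINER_NO_OOM;
}
```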
      
      [schatzberg.dan@gmail.com: changes from Johannes and Chris]
        Link: https://lkml.kernel.org/r/20211213162511.2492267-1-schatzberg.dan@gmail.com
      
      Link: https://lkml.kernel.org/r/20211203162426.3375036-1-schatzberg.dan@gmail.com
      
      
      Signed-off-by: Dan Schatzberg <schatzberg.dan@gmail.com>
      Reviewed-by: Roman Gushchin <guro@fb.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Acked-by: Chris Down <chris@chrisdown.name>
      Reviewed-by: Shakeel Butt <shakeelb@google.com>
      Acked-by: Michal Hocko <mhocko@suse.com>
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Zefan Li <lizefan.x@bytedance.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
      Cc: Muchun Song <songmuchun@bytedance.com>
      Cc: Alex Shi <alexs@kernel.org>
      Cc: Wei Yang <richard.weiyang@gmail.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • mm/debug_vm_pgtable: update comments regarding migration swap entries · 23647618
      Anshuman Khandual authored
      Commit 4dd845b5 ("mm/swapops: rework swap entry manipulation code")
      changed the migration entry related helpers.  Just update the
      documentation kept in sync with debug_vm_pgtable() to reflect those
      changes.
      
      Link: https://lkml.kernel.org/r/1641880417-24848-1-git-send-email-anshuman.khandual@arm.com
      
      
      Signed-off-by: Anshuman Khandual <anshuman.khandual@arm.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. Dec 31, 2021
  3. Dec 30, 2021
  4. Dec 27, 2021
  5. Dec 22, 2021
  6. Dec 21, 2021
  7. Dec 20, 2021
  8. Dec 15, 2021
  9. Dec 14, 2021
  10. Dec 06, 2021
  11. Dec 03, 2021
    • dt-bindings: media: nxp,imx7-mipi-csi2: Drop bad if/then schema · b54472a0
      Rob Herring authored
      
      The if/then schema for 'data-lanes' doesn't work as 'compatible' is at a
      different level than 'data-lanes'. To make it work, the if/then schema
      would have to be moved to the top level, and the whole hierarchy of
      nodes down to 'data-lanes' created. I don't think it is worth the
      complexity to do that, so let's just drop it.
      
      The error in this schema is masked by a fixup in the tools causing the
      'allOf' to get overwritten. Removing the fixup as part of moving to
      json-schema draft 2019-09 revealed the issue:
      
      Documentation/devicetree/bindings/media/nxp,imx7-mipi-csi2.example.dt.yaml: mipi-csi@30750000: ports:port@0:endpoint:data-lanes:0: [1] is too short
      	From schema: /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/media/nxp,imx7-mipi-csi2.yaml
      Documentation/devicetree/bindings/media/nxp,imx7-mipi-csi2.example.dt.yaml: mipi-csi@32e30000: ports:port@0:endpoint:data-lanes:0: [1, 2, 3, 4] is too long
      	From schema: /builds/robherring/linux-dt-review/Documentation/devicetree/bindings/media/nxp,imx7-mipi-csi2.yaml
      
      The if condition was always true because 'compatible' did not exist in
      the 'endpoint' node, and json-schema treats a constraint on a
      non-existent property as satisfied.
      
      Fixes: 85b62ff2 ("media: dt-bindings: media: nxp,imx7-mipi-csi2: Add i.MX8MM support")
      Cc: Rui Miguel Silva <rmfrfs@gmail.com>
      Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
      Cc: Mauro Carvalho Chehab <mchehab@kernel.org>
      Cc: Shawn Guo <shawnguo@kernel.org>
      Cc: Sascha Hauer <s.hauer@pengutronix.de>
      Cc: Pengutronix Kernel Team <kernel@pengutronix.de>
      Cc: Fabio Estevam <festevam@gmail.com>
      Cc: NXP Linux Team <linux-imx@nxp.com>
      Cc: linux-media@vger.kernel.org
      Cc: linux-arm-kernel@lists.infradead.org
      Signed-off-by: Rob Herring <robh@kernel.org>
      Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com>
      Acked-by: Rui Miguel Silva <rmfrfs@gmail.com>
      Link: https://lore.kernel.org/r/20211203164828.187642-1-robh@kernel.org
  12. Dec 02, 2021
  13. Dec 01, 2021
  14. Nov 30, 2021
  15. Nov 29, 2021
  16. Nov 26, 2021
  17. Nov 25, 2021
  18. Nov 23, 2021