  1. Jul 11, 2024
  2. Dec 12, 2023
  3. May 13, 2022
• Revert "mm/cma.c: remove redundant cma_mutex lock" · 60a60e32
      Dong Aisheng authored
This reverts commit a4efc174, which introduced a regression: when multiple
processes allocate DMA memory in parallel by calling dma_alloc_coherent(),
the allocation may occasionally fail as follows:
      
      Error log:
      cma: cma_alloc: linux,cma: alloc failed, req-size: 148 pages, ret: -16
      cma: number of available pages:
      3@125+20@172+12@236+4@380+32@736+17@2287+23@2473+20@36076+99@40477+108@40852+44@41108+20@41196+108@41364+108@41620+
      108@42900+108@43156+483@44061+1763@45341+1440@47712+20@49324+20@49388+5076@49452+2304@55040+35@58141+20@58220+20@58284+
      7188@58348+84@66220+7276@66452+227@74525+6371@75549=> 33161 free of 81920 total pages
      
When the issue happened, 33161 pages (129M) of CMA memory were still free,
and the CMA bitmap had plenty of free slots large enough for the 148 pages
we wanted to allocate.
      
When dumping the memory info, we found ~342M of normal memory still
available, but only 1352K of CMA memory left in the buddy system, while a
lot of pageblocks were isolated.
      
      Memory info log:
      Normal free:351096kB min:30000kB low:37500kB high:45000kB reserved_highatomic:0KB
      	    active_anon:98060kB inactive_anon:98948kB active_file:60864kB inactive_file:31776kB
      	    unevictable:0kB writepending:0kB present:1048576kB managed:1018328kB mlocked:0kB
      	    bounce:0kB free_pcp:220kB local_pcp:192kB free_cma:1352kB lowmem_reserve[]: 0 0 0
      Normal: 78*4kB (UECI) 1772*8kB (UMECI) 1335*16kB (UMECI) 360*32kB (UMECI) 65*64kB (UMCI)
      	36*128kB (UMECI) 16*256kB (UMCI) 6*512kB (EI) 8*1024kB (UEI) 4*2048kB (MI) 8*4096kB (EI)
      	8*8192kB (UI) 3*16384kB (EI) 8*32768kB (M) = 489288kB
      
The root cause is that since commit a4efc174 ("mm/cma.c: remove redundant
cma_mutex lock"), CMA supports concurrent memory allocation.  It is
possible that the memory range process A is trying to allocate has already
been isolated by process B's allocation during memory migration.
      
The problem is that the memory range isolated during one allocation by
start_isolate_page_range() can be much bigger than the size we actually
want to allocate, because the range is aligned to MAX_ORDER_NR_PAGES.
      
Taking an ARMv7 platform with 1G memory as an example, when
MAX_ORDER_NR_PAGES is big (e.g. 32M with max_order 14) and the CMA area is
relatively small (e.g. 128M), there are only 4 MAX_ORDER-sized slots.  It
is then very easy for all CMA memory to have already been isolated by other
processes when one process tries to allocate memory with
dma_alloc_coherent().  Since the current CMA code scans the whole available
CMA memory only once, dma_alloc_coherent() can easily fail due to
contention with other processes.
      
This patch simply falls back to the original method of using cma_mutex to
make alloc_contig_range() run sequentially, avoiding the issue.
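
As a rough sketch (not the literal diff), what the revert restores in
cma_alloc() is serialization of the isolate-and-migrate step, along the
lines of:

    /* serialize isolation + migration across all CMA allocations */
    mutex_lock(&cma_mutex);
    ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA, GFP_KERNEL);
    mutex_unlock(&cma_mutex);

so only one allocation at a time can isolate pageblocks, and a concurrent
dma_alloc_coherent() no longer finds the whole CMA area already isolated.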
      
      Link: https://lkml.kernel.org/r/20220509094551.3596244-1-aisheng.dong@nxp.com
      Link: https://lore.kernel.org/all/20220315144521.3810298-2-aisheng.dong@nxp.com/
      
      
      Fixes: a4efc174 ("mm/cma.c: remove redundant cma_mutex lock")
Signed-off-by: Dong Aisheng <aisheng.dong@nxp.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Acked-by: David Hildenbrand <david@redhat.com>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Lecopzer Chen <lecopzer.chen@mediatek.com>
      Cc: Vlastimil Babka <vbabka@suse.cz>
      Cc: <stable@vger.kernel.org>	[5.11+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  4. Mar 22, 2022
  5. Nov 06, 2021
  6. May 05, 2021
  7. Feb 26, 2021
  8. Dec 15, 2020
  9. Aug 12, 2020
  10. Jul 03, 2020
• mm/cma.c: use exact_nid true to fix possible per-numa cma leak · 40366bd7
      Barry Song authored
      
Calling cma_declare_contiguous_nid() with exact_nid=false for per-numa
reservations can easily leak CMA memory and cause confusion.  For example,
mm/hugetlb.c tries to reserve per-numa CMA areas for gigantic pages, but on
systems with memoryless nodes it can easily leak CMA and confuse users.
      
If the system has 4 numa nodes and only node0 has memory, and we set
hugetlb_cma=4G in the boot arguments, mm/hugetlb.c will get 4 CMA areas for
the 4 numa nodes.  Since exact_nid is false in the current code, all 4
areas are reserved successfully from node0, but hugetlb_cma[1] to
hugetlb_cma[3] will never be used, because hugepage allocation only takes
memory from hugetlb_cma[0].
      
If the system has 4 numa nodes with memory only on node0 and node2, and we
set hugetlb_cma=4G in the boot arguments, mm/hugetlb.c will again get 4 CMA
areas for the 4 numa nodes.  Since exact_nid is false in the current code,
all 4 areas are reserved successfully from node0 or node2, but
hugetlb_cma[1] and hugetlb_cma[3] will never be used, because mm/hugetlb.c
only allocates memory from hugetlb_cma[0] and hugetlb_cma[2].  This
permanently leaks the CMA areas that were supposed to serve the memoryless
nodes.
      
Of course we could work around the issue by letting mm/hugetlb.c scan all
CMA areas in alloc_gigantic_page() even when node_mask includes node0 only;
that way, with node_mask including only node0, we could still get pages
from hugetlb_cma[1] to hugetlb_cma[3].  But this would crash the kernel in
free_gigantic_page() when it frees the page with:
cma_release(hugetlb_cma[page_to_nid(page)], page, 1 << order)
      
On the other hand, exact_nid=false does not consider numa distance, so
leveraging CMA areas on remote nodes is of questionable value.  It is much
simpler to set exact_nid to true and make everything clear.  After that,
memoryless nodes will no longer be able to reserve per-numa CMA from other
nodes that do have memory.
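
As a simplified sketch of the change being described (the reservation path
inside cma_declare_contiguous_nid(), assuming the memblock_alloc_range_nid()
helper it uses), the idea is:

    /* exact_nid=true: only take memory from the requested node */
    addr = memblock_alloc_range_nid(size, alignment, base, limit,
                                    nid, true);
    if (!addr)
        /* memoryless node: fail the per-numa reservation instead of
         * silently falling back to another node and leaking the area */
        goto err;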
      
      Fixes: cf11e85f ("mm: hugetlb: optionally allocate gigantic hugepages using cma")
Signed-off-by: Barry Song <song.bao.hua@hisilicon.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Acked-by: Roman Gushchin <guro@fb.com>
      Cc: Jonathan Cameron <Jonathan.Cameron@huawei.com>
      Cc: Aslan Bakirov <aslan@fb.com>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Andreas Schaufler <andreas.schaufler@gmx.de>
      Cc: Mike Kravetz <mike.kravetz@oracle.com>
      Cc: Rik van Riel <riel@surriel.com>
      Cc: Joonsoo Kim <js1304@gmail.com>
      Cc: Robin Murphy <robin.murphy@arm.com>
      Cc: <stable@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20200628074345.27228-1-song.bao.hua@hisilicon.com
      
      
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  11. Apr 10, 2020
  12. Dec 01, 2019
  13. Jul 17, 2019
  14. May 24, 2019
  15. May 14, 2019
  16. Mar 12, 2019
• memblock: emphasize that memblock_alloc_range() returns a physical address · 8a770c2a
      Mike Rapoport authored
      Rename memblock_alloc_range() to memblock_phys_alloc_range() to
      emphasize that it returns a physical address.
      
While at it, remove the 'enum memblock_flags' parameter from this function,
as its only user sets it to MEMBLOCK_NONE, which is the default for most
memblock allocations.
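
For illustration, a caller is changed roughly like this (addr is a
phys_addr_t in both cases):

    /* before: flags parameter, name does not say "phys" */
    addr = memblock_alloc_range(size, align, start, end, MEMBLOCK_NONE);

    /* after: the name makes the physical-address return explicit */
    addr = memblock_phys_alloc_range(size, align, start, end);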
      
      Link: http://lkml.kernel.org/r/1548057848-15136-6-git-send-email-rppt@linux.ibm.com
      
      
Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Cc: Catalin Marinas <catalin.marinas@arm.com>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: "David S. Miller" <davem@davemloft.net>
      Cc: Dennis Zhou <dennis@kernel.org>
      Cc: Geert Uytterhoeven <geert@linux-m68k.org>
      Cc: Greentime Hu <green.hu@gmail.com>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Guan Xuetao <gxt@pku.edu.cn>
      Cc: Guo Ren <guoren@kernel.org>
      Cc: Guo Ren <ren_guo@c-sky.com>				[c-sky]
      Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
      Cc: Juergen Gross <jgross@suse.com>			[Xen]
      Cc: Mark Salter <msalter@redhat.com>
      Cc: Matt Turner <mattst88@gmail.com>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Michael Ellerman <mpe@ellerman.id.au>
      Cc: Michal Simek <monstr@monstr.eu>
      Cc: Paul Burton <paul.burton@mips.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Richard Weinberger <richard@nod.at>
      Cc: Rich Felker <dalias@libc.org>
      Cc: Rob Herring <robh+dt@kernel.org>
      Cc: Rob Herring <robh@kernel.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Tony Luck <tony.luck@intel.com>
      Cc: Vineet Gupta <vgupta@synopsys.com>
      Cc: Yoshinori Sato <ysato@users.sourceforge.jp>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  17. Mar 06, 2019
  18. Dec 28, 2018
  19. Aug 17, 2018
  20. May 24, 2018
• Revert "mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE" · d883c6cf
      Joonsoo Kim authored
      
This reverts the following commits, which changed the CMA design in MM.
      
       3d2054ad ("ARM: CMA: avoid double mapping to the CMA area if CONFIG_HIGHMEM=y")
      
       1d47a3ec ("mm/cma: remove ALLOC_CMA")
      
       bad8c6c0 ("mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE")
      
Ville reported the following error on i386.
      
        Inode-cache hash table entries: 65536 (order: 6, 262144 bytes)
        microcode: microcode updated early to revision 0x4, date = 2013-06-28
        Initializing CPU#0
        Initializing HighMem for node 0 (000377fe:00118000)
        Initializing Movable for node 0 (00000001:00118000)
        BUG: Bad page state in process swapper  pfn:377fe
        page:f53effc0 count:0 mapcount:-127 mapping:00000000 index:0x0
        flags: 0x80000000()
        raw: 80000000 00000000 00000000 ffffff80 00000000 00000100 00000200 00000001
        page dumped because: nonzero mapcount
        Modules linked in:
        CPU: 0 PID: 0 Comm: swapper Not tainted 4.17.0-rc5-elk+ #145
        Hardware name: Dell Inc. Latitude E5410/03VXMC, BIOS A15 07/11/2013
        Call Trace:
         dump_stack+0x60/0x96
         bad_page+0x9a/0x100
         free_pages_check_bad+0x3f/0x60
         free_pcppages_bulk+0x29d/0x5b0
         free_unref_page_commit+0x84/0xb0
         free_unref_page+0x3e/0x70
         __free_pages+0x1d/0x20
         free_highmem_page+0x19/0x40
         add_highpages_with_active_regions+0xab/0xeb
         set_highmem_pages_init+0x66/0x73
         mem_init+0x1b/0x1d7
         start_kernel+0x17a/0x363
         i386_start_kernel+0x95/0x99
         startup_32_smp+0x164/0x168
      
The reason for this error is that the span of ZONE_MOVABLE is extended to
the whole node span for future CMA initialization, and normal memory is
wrongly freed here.  I submitted a fix and it seemed to work, but another
problem then appeared.

It is too late in the cycle to fix that later problem, so I decided to
revert the series.
      
Reported-by: Ville Syrjälä <ville.syrjala@linux.intel.com>
Acked-by: Laura Abbott <labbott@redhat.com>
Acked-by: Michal Hocko <mhocko@suse.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  21. Apr 11, 2018
• mm/cma: manage the memory of the CMA area by using the ZONE_MOVABLE · bad8c6c0
      Joonsoo Kim authored
      Patch series "mm/cma: manage the memory of the CMA area by using the
      ZONE_MOVABLE", v2.
      
      0. History
      
This patchset is a follow-up to the discussion about "Introduce ZONE_CMA
(v7)" [1].  Please refer to it if more information is needed.
      
      1. What does this patch do?
      
This patch changes how the memory of the CMA area is managed in the MM
subsystem.  Currently the memory of the CMA area is managed by the zone its
pfn belongs to.  However, this approach is problematic, since the MM
subsystem does not have enough logic to handle memory with different
characteristics within a single zone.  To solve this, this patch manages
all the memory of the CMA area through the MOVABLE zone.  From the MM
subsystem's point of view, the characteristics of memory in the MOVABLE
zone and of memory in the CMA area are the same, so managing the CMA area's
memory via the MOVABLE zone causes no problems.
      
      2. Motivation
      
There are some problems with the current approach, described below.
Although these problems are not inherent and could be fixed without this
conceptual change, doing so would require adding many hooks across various
code paths, would be intrusive to core MM, and would be error-prone.
Therefore, I try to solve them with this new approach.  The problems with
the current implementation are the following.
      
CMA memory utilization:
      
First, the following is the freepage calculation logic in MM:
      
       - For movable allocation: freepage = total freepage
       - For unmovable allocation: freepage = total freepage - CMA freepage
      
Freepages in the CMA area are used only after the normal freepages in the
zone the CMA area belongs to are exhausted.  At the moment the number of
normal freepages reaches zero:
      
       - For movable allocation: freepage = total freepage = CMA freepage
       - For unmovable allocation: freepage = 0
      
If an unmovable allocation comes at this moment, the request fails the
watermark check and reclaim is started.  After reclaim, normal freepages
exist again, so the freepages in the CMA area are still not used.
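
Concretely, the watermark check discounts CMA freepages for allocations
that cannot fall back to CMA; a simplified view of __zone_watermark_ok()
looks roughly like:

    /* free CMA pages only help allocations allowed to use CMA */
    if (!(alloc_flags & ALLOC_CMA))
        free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES);

    if (free_pages <= min + z->lowmem_reserve[classzone_idx])
        return false;   /* watermark not met -> reclaim is triggered */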
      
FYI, there is another attempt [2] to solve this problem on lkml.  And, as
far as I know, Qualcomm also has an out-of-tree solution for this problem.
      
      Useless reclaim:
      
There is no logic to distinguish CMA pages in the reclaim path, so CMA
pages are reclaimed even when the system only needs pages usable for kernel
allocations.
      
      Atomic allocation failure:
      
This is also related to the fallback allocation policy for the memory of
the CMA area.  Consider the situation where the number of normal freepages
is *zero* because a bunch of movable allocation requests have come in.
Kswapd is not woken up, due to the following freepage calculation logic:
      
      - For movable allocation: freepage = total freepage = CMA freepage
      
If an atomic unmovable allocation request comes at this moment, it fails
due to the following logic:
      
      - For unmovable allocation: freepage = total freepage - CMA freepage = 0
      
      It was reported by Aneesh [3].
      
      Useless compaction:
      
The usual high-order allocation request is an unmovable allocation request
and cannot be served from the memory of the CMA area.  During compaction,
the migration scanner nevertheless tries to migrate pages in the CMA area
to build high-order pages there.  As mentioned above, those pages cannot be
used for unmovable allocation requests, so the work is simply wasted.
      
      3. Current approach and new approach
      
The current approach is that the memory of the CMA area is managed by the
zone its pfn belongs to.  However, this memory should be distinguishable
since it has a strong limitation, so it is marked as MIGRATE_CMA in the
pageblock flags and handled specially.  However, as mentioned in section 2,
the MM subsystem does not have enough logic to deal with this special
pageblock, so many problems arise.
      
The new approach is that the memory of the CMA area is managed by the
MOVABLE zone.  MM already has enough logic to deal with special zones such
as HIGHMEM and MOVABLE, so managing the memory of the CMA area via the
MOVABLE zone just naturally works well, because the constraint on CMA
memory, that it must always be migratable, is the same as the constraint on
the MOVABLE zone.
      
There is one side-effect for the usability of the memory of the CMA area.
Use of the MOVABLE zone is only allowed for requests with GFP_HIGHMEM &&
GFP_MOVABLE, so the memory of the CMA area is now also only available to
such requests.  Before this patchset, a request with GFP_MOVABLE alone
could use it.  IMO this is not a big issue, since most GFP_MOVABLE requests
also carry GFP_HIGHMEM, for example file cache pages and anonymous pages.
However, file cache pages for blockdev files are an exception: requests for
them have no GFP_HIGHMEM flag.  There are pros and cons to this exception.
In my experience, blockdev file cache pages are one of the top reasons
cma_alloc() fails temporarily, so by excluding this case we get a stronger
guarantee that cma_alloc() succeeds.
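
To make the side-effect concrete, only requests carrying both flags are
routed to ZONE_MOVABLE; conceptually (a simplified view of gfp_zone(), not
the real table-based implementation):

    if ((gfp_mask & (__GFP_HIGHMEM | __GFP_MOVABLE)) ==
                    (__GFP_HIGHMEM | __GFP_MOVABLE))
        return ZONE_MOVABLE;    /* may now be served from CMA-backed memory */
    /* __GFP_MOVABLE alone (e.g. blockdev page cache) stays out of ZONE_MOVABLE */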
      
Note that there is no change from the admin's point of view, since this
patchset only changes the internal implementation in the MM subsystem.  The
one minor difference for admins is that memory statistics for the CMA area
are now reported under the MOVABLE zone.  That's all.
      
      4. Result
      
      Following is the experimental result related to utilization problem.
      
      8 CPUs, 1024 MB, VIRTUAL MACHINE
      make -j16
      
      <Before>
        CMA area:               0 MB            512 MB
        Elapsed-time:           92.4		186.5
        pswpin:                 82		18647
        pswpout:                160		69839
      
      <After>
  CMA area:               0 MB            512 MB
        Elapsed-time:           93.1		93.4
        pswpin:                 84		46
        pswpout:                183		92
      
      akpm: "kernel test robot" reported a 26% improvement in
      vm-scalability.throughput:
      http://lkml.kernel.org/r/20180330012721.GA3845@yexl-desktop
      
      [1]: lkml.kernel.org/r/1491880640-9944-1-git-send-email-iamjoonsoo.kim@lge.com
      [2]: https://lkml.org/lkml/2014/10/15/623
      [3]: http://www.spinics.net/lists/linux-mm/msg100562.html
      
      Link: http://lkml.kernel.org/r/1512114786-5085-2-git-send-email-iamjoonsoo.kim@lge.com
      
      
Signed-off-by: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Reviewed-by: Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Tested-by: Tony Lindgren <tony@atomide.com>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Laura Abbott <lauraa@codeaurora.org>
      Cc: Marek Szyprowski <m.szyprowski@samsung.com>
      Cc: Mel Gorman <mgorman@techsingularity.net>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: Michal Nazarewicz <mina86@mina86.com>
      Cc: Minchan Kim <minchan@kernel.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Will Deacon <will.deacon@arm.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  22. Apr 06, 2018
  23. Nov 16, 2017
  24. Oct 13, 2017