- Mar 06, 2024
-
-
Peter Xu authored
There is an old trick in shadow_mapped() to use pXd_bad() to detect huge pages. After commit 93fab1b2 ("mm: add generic p?d_leaf() macros") we have a global API for huge mappings. Use that to replace the trick. Link: https://lkml.kernel.org/r/20240305043750.93762-7-peterx@redhat.com Signed-off-by:
Peter Xu <peterx@redhat.com> Reviewed-by:
Mike Rapoport (IBM) <rppt@kernel.org> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@kernel.org> Cc: Borislav Petkov <bp@alien8.de> Cc: Christophe Leroy <christophe.leroy@csgroup.eu> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jason Gunthorpe <jgg@nvidia.com> Cc: Kirill A. Shutemov <kirill@shutemov.name> Cc: Michael Ellerman <mpe@ellerman.id.au> Cc: Muchun Song <muchun.song@linux.dev> Cc: "Naveen N. Rao" <naveen.n.rao@linux.ibm.com> Cc: Nicholas Piggin <npiggin@gmail.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Yang Shi <shy828301@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Chengming Zhou authored
Commit bf9b7df2 ("mm/zswap: global lru and shrinker shared by all zswap_pools") introduced a new lock to protect zswap_next_shrink, instead of reusing zswap_pools_lock. But the problem is that it's initialized only when zswap enabled, which causes bug if zswap_memcg_offline_cleanup() called without zswap enabled. Fix it by using DEFINE_SPINLOCK() to statically initialize them and define them as multiple static variables to keep in consistent with the existing global variables in zswap. Link: https://lkml.kernel.org/r/20240305075345.1493214-1-chengming.zhou@linux.dev Fixes: bf9b7df2 ("mm/zswap: global lru and shrinker shared by all zswap_pools") Reported-by:
kernel test robot <oliver.sang@intel.com> Closes: https://lore.kernel.org/oe-lkp/202403051008.a8cf8a94-lkp@intel.com Signed-off-by:
Chengming Zhou <chengming.zhou@linux.dev> Acked-by:
Johannes Weiner <hannes@cmpxchg.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Kefeng Wang authored
The rounddown_pow_of_two(0) is undefined, so val = 0 is not allowed in the fault_around_bytes_set(), and leads to shift-out-of-bounds, UBSAN: shift-out-of-bounds in include/linux/log2.h:67:13 shift exponent 4294967295 is too large for 64-bit type 'long unsigned int' CPU: 7 PID: 107 Comm: sh Not tainted 6.8.0-rc6-next-20240301 #294 Hardware name: QEMU QEMU Virtual Machine, BIOS 0.0.0 02/06/2015 Call trace: dump_backtrace+0x94/0xec show_stack+0x18/0x24 dump_stack_lvl+0x78/0x90 dump_stack+0x18/0x24 ubsan_epilogue+0x10/0x44 __ubsan_handle_shift_out_of_bounds+0x98/0x134 fault_around_bytes_set+0xa4/0xb0 simple_attr_write_xsigned.isra.0+0xe4/0x1ac simple_attr_write+0x18/0x24 debugfs_attr_write+0x4c/0x98 vfs_write+0xd0/0x4b0 ksys_write+0x6c/0xfc __arm64_sys_write+0x1c/0x28 invoke_syscall+0x44/0x104 el0_svc_common.constprop.0+0x40/0xe0 do_el0_svc+0x1c/0x28 el0_svc+0x34/0xdc el0t_64_sync_handler+0xc0/0xc4 el0t_64_sync+0x190/0x194 ---[ end trace ]--- Fix it by setting the minimum val to PAGE_SIZE. Link: https://lkml.kernel.org/r/20240302064312.2358924-1-wangkefeng.wang@huawei.com Fixes: 53d36a56 ("mm: prefer fault_around_pages to fault_around_bytes") Signed-off-by:
Kefeng Wang <wangkefeng.wang@huawei.com> Reported-by:
Yue Sun <samsun1006219@gmail.com> Closes: https://lore.kernel.org/all/CAEkJfYPim6DQqW1GqCiHLdh2-eweqk1fGyXqs3JM+8e1qGge8w@mail.gmail.com/ Reviewed-by:
Lorenzo Stoakes <lstoakes@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Thorsten Blum authored
Fixes Coccinelle/coccicheck warning reported by do_div.cocci. Compared to do_div(), div64_ul() does not implicitly cast the divisor and does not unnecessarily calculate the remainder. Link: https://lkml.kernel.org/r/20240228224911.1164-2-thorsten.blum@toblux.com Signed-off-by:
Thorsten Blum <thorsten.blum@toblux.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
We actually add folios to the pagelist already, but then work with them as pages. Removes a call to compound_head() in PageKsm() and removes a reference to page->index. Link: https://lkml.kernel.org/r/20240229153015.1996829-1-willy@infradead.org Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
Gregory Price <gregory.price@memverge.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Barry Song authored
madvise, mprotect and some others might need folio_pte_batch to check if a range of PTEs are completely mapped to a large folio with contiguous physical addresses. Let's make it available in mm/internal.h. While at it, add proper kernel doc and sanity-check more input parameters using two additional VM_WARN_ON_FOLIO(). [21cnbao@gmail.com: build fix] Link: https://lkml.kernel.org/r/CAGsJ_4wWzG-37D82vqP_zt+Fcbz+URVe5oXLBc4M5wbN8A_gpQ@mail.gmail.com [david@redhat.com: improve the doc for the exported func] Link: https://lkml.kernel.org/r/20240227104201.337988-1-21cnbao@gmail.com Signed-off-by:
David Hildenbrand <david@redhat.com> Signed-off-by:
Barry Song <v-songbaohua@oppo.com> Suggested-by:
David Hildenbrand <david@redhat.com> Reviewed-by:
Ryan Roberts <ryan.roberts@arm.com> Acked-by:
David Hildenbrand <david@redhat.com> Cc: Lance Yang <ioworker0@gmail.com> Cc: Yin Fengwei <fengwei.yin@intel.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Constify the flag tests that aren't automatically generated and the tests that look like flag tests but are more complicated. Link: https://lkml.kernel.org/r/20240227192337.757313-8-willy@infradead.org Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
David Hildenbrand <david@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Now that __dump_page() takes a const argument, we can make dump_page() take a const struct page too. Link: https://lkml.kernel.org/r/20240227192337.757313-6-willy@infradead.org Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
David Hildenbrand <david@redhat.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Turn __dump_page() into a wrapper around __dump_folio(). Snapshot the page & folio into a stack variable so we don't hit BUG_ON() if an allocation is freed under us and what was a folio pointer becomes a pointer to a tail page. [willy@infradead.org: fix build issue] Link: https://lkml.kernel.org/r/ZeAKCyTn_xS3O9cE@casper.infradead.org [willy@infradead.org: fix __dump_folio] Link: https://lkml.kernel.org/r/ZeJJegP8zM7S9GTy@casper.infradead.org [willy@infradead.org: fix pointer confusion] Link: https://lkml.kernel.org/r/ZeYa00ixxC4k1ot-@casper.infradead.org [akpm@linux-foundation.org: s/printk/pr_warn/] Link: https://lkml.kernel.org/r/20240227192337.757313-5-willy@infradead.org Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Gang Li authored
Optimizing the initialization speed of 1G huge pages through parallelization. 1G hugetlbs are allocated from bootmem, a process that is already very fast and does not currently require optimization. Therefore, we focus on parallelizing only the initialization phase in `gather_bootmem_prealloc`. Here are some test results: test case no patch(ms) patched(ms) saved ------------------- -------------- ------------- -------- 256c2T(4 node) 1G 4745 2024 57.34% 128c1T(2 node) 1G 3358 1712 49.02% 12T 1G 77000 18300 76.23% [akpm@linux-foundation.org: s/initialied/initialized/, per Alexey] Link: https://lkml.kernel.org/r/20240222140422.393911-9-gang.li@linux.dev Signed-off-by:
Gang Li <ligang.bdlg@bytedance.com> Tested-by:
David Rientjes <rientjes@google.com> Reviewed-by:
Muchun Song <muchun.song@linux.dev> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Gang Li authored
By distributing both the allocation and the initialization tasks across multiple threads, the initialization of 2M hugetlb will be faster, thereby improving the boot speed. Here are some test results: test case no patch(ms) patched(ms) saved ------------------- -------------- ------------- -------- 256c2T(4 node) 2M 3336 1051 68.52% 128c1T(2 node) 2M 1943 716 63.15% Link: https://lkml.kernel.org/r/20240222140422.393911-8-gang.li@linux.dev Signed-off-by:
Gang Li <ligang.bdlg@bytedance.com> Tested-by:
David Rientjes <rientjes@google.com> Reviewed-by:
Muchun Song <muchun.song@linux.dev> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Tim Chen <tim.c.chen@linux.intel.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
different nodes Date: Thu, 22 Feb 2024 22:04:17 +0800 When a group of tasks that access different nodes are scheduled on the same node, they may encounter bandwidth bottlenecks and access latency. Thus, numa_aware flag is introduced here, allowing tasks to be distributed across different nodes to fully utilize the advantage of multi-node systems. Link: https://lkml.kernel.org/r/20240222140422.393911-5-gang.li@linux.dev Signed-off-by:
Gang Li <ligang.bdlg@bytedance.com> Tested-by:
David Rientjes <rientjes@google.com> Reviewed-by:
Muchun Song <muchun.song@linux.dev> Reviewed-by:
Tim Chen <tim.c.chen@linux.intel.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Gang Li authored
With parallelization of hugetlb allocation across different threads, each thread works on a differnet node to allocate pages from, instead of all allocating from a common node h->next_nid_to_alloc. To address this, it's necessary to assign a separate next_nid_to_alloc for each thread. Consequently, the hstate_next_node_to_alloc and for_each_node_mask_to_alloc have been modified to directly accept a *next_nid_to_alloc parameter, ensuring thread-specific allocation and avoiding concurrent access issues. Link: https://lkml.kernel.org/r/20240222140422.393911-4-gang.li@linux.dev Signed-off-by:
Gang Li <ligang.bdlg@bytedance.com> Tested-by:
David Rientjes <rientjes@google.com> Reviewed-by:
Tim Chen <tim.c.chen@linux.intel.com> Reviewed-by:
Muchun Song <muchun.song@linux.dev> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Gang Li authored
1G and 2M huge pages have different allocation and initialization logic, which leads to subtle differences in parallelization. Therefore, it is appropriate to split hugetlb_hstate_alloc_pages into gigantic and non-gigantic. This patch has no functional changes. Link: https://lkml.kernel.org/r/20240222140422.393911-3-gang.li@linux.dev Signed-off-by:
Gang Li <ligang.bdlg@bytedance.com> Tested-by:
David Rientjes <rientjes@google.com> Reviewed-by:
Tim Chen <tim.c.chen@linux.intel.com> Reviewed-by:
Muchun Song <muchun.song@linux.dev> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Gang Li authored
Patch series "hugetlb: parallelize hugetlb page init on boot", v6. Introduction ------------ Hugetlb initialization during boot takes up a considerable amount of time. For instance, on a 2TB system, initializing 1,800 1GB huge pages takes 1-2 seconds out of 10 seconds. Initializing 11,776 1GB pages on a 12TB Intel host takes more than 1 minute[1]. This is a noteworthy figure. Inspired by [2] and [3], hugetlb initialization can also be accelerated through parallelization. Kernel already has infrastructure like padata_do_multithreaded, this patch uses it to achieve effective results by minimal modifications. [1] https://lore.kernel.org/all/783f8bac-55b8-5b95-eb6a-11a583675000@google.com/ [2] https://lore.kernel.org/all/20200527173608.2885243-1-daniel.m.jordan@oracle.com/ [3] https://lore.kernel.org/all/20230906112605.2286994-1-usama.arif@bytedance.com/ [4] https://lore.kernel.org/all/76becfc1-e609-e3e8-2966-4053143170b6@google.com/ max_threads ----------- This patch use `padata_do_multithreaded` like this: ``` job.max_threads = num_node_state(N_MEMORY) * multiplier; padata_do_multithreaded(&job); ``` To fully utilize the CPU, the number of parallel threads needs to be carefully considered. `max_threads = num_node_state(N_MEMORY)` does not fully utilize the CPU, so we need to multiply it by a multiplier. Tests below indicate that a multiplier of 2 significantly improves performance, and although larger values also provide improvements, the gains are marginal. multiplier 1 2 3 4 5 ------------ ------- ------- ------- ------- ------- 256G 2node 358ms 215ms 157ms 134ms 126ms 2T 4node 979ms 679ms 543ms 489ms 481ms 50G 2node 71ms 44ms 37ms 30ms 31ms Therefore, choosing 2 as the multiplier strikes a good balance between enhancing parallel processing capabilities and maintaining efficient resource management. Test result ----------- test case no patch(ms) patched(ms) saved ------------------- -------------- ------------- -------- 256c2T(4 node) 1G 4745 2024 57.34% 128c1T(2 node) 1G 3358 1712 49.02% 12T 1G 77000 18300 76.23% 256c2T(4 node) 2M 3336 1051 68.52% 128c1T(2 node) 2M 1943 716 63.15% This patch (of 8): The readability of `hugetlb_hstate_alloc_pages` is poor. By cleaning the code, its readability can be improved, facilitating future modifications. This patch extracts two functions to reduce the complexity of `hugetlb_hstate_alloc_pages` and has no functional changes. - hugetlb_hstate_alloc_pages_node_specific() to handle iterates through each online node and performs allocation if necessary. - hugetlb_hstate_alloc_pages_report() report error during allocation. And the value of h->max_huge_pages is updated accordingly. Link: https://lkml.kernel.org/r/20240222140422.393911-1-gang.li@linux.dev Link: https://lkml.kernel.org/r/20240222140422.393911-2-gang.li@linux.dev Signed-off-by:
Gang Li <ligang.bdlg@bytedance.com> Tested-by:
David Rientjes <rientjes@google.com> Reviewed-by:
Muchun Song <muchun.song@linux.dev> Reviewed-by:
Tim Chen <tim.c.chen@linux.intel.com> Cc: Daniel Jordan <daniel.m.jordan@oracle.com> Cc: David Hildenbrand <david@redhat.com> Cc: Jane Chu <jane.chu@oracle.com> Cc: Paul E. McKenney <paulmck@kernel.org> Cc: Randy Dunlap <rdunlap@infradead.org> Cc: Steffen Klassert <steffen.klassert@secunet.com> Cc: Alexey Dobriyan <adobriyan@gmail.com> Cc: Mike Kravetz <mike.kravetz@oracle.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- Mar 05, 2024
-
-
Chengming Zhou authored
We will save allocated tag in the object header to indicate that it's allocated. handle |= OBJ_ALLOCATED_TAG; So the object header needs to reserve LSB for this tag bit. But the handle itself doesn't need to reserve LSB to save tag, since it's only used to find the position of object, by (pfn + obj_idx). So remove LSB reserve from handle, one more bit can be used as obj_idx. Link: https://lkml.kernel.org/r/20240228023854.3511239-1-chengming.zhou@linux.dev Signed-off-by:
Chengming Zhou <chengming.zhou@linux.dev> Reviewed-by:
Sergey Senozhatsky <senozhatsky@chromium.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Minchan Kim <minchan@kernel.org> Cc: Nhat Pham <nphamcs@gmail.com> Cc: Yosry Ahmed <yosryahmed@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
John Hubbard authored
do_numa_page() is reading from the same page table entry, twice, while holding the page table lock: once while checking that the pte hasn't changed, and again in order to modify the pte. Instead, just read the pte once, and save it in the same old_pte variable that already exists. This has no effect on behavior, other than to provide a tiny potential improvement to performance, by avoiding the redundant memory read (which the compiler cannot elide, due to READ_ONCE()). Also improve the associated comments nearby. Link: https://lkml.kernel.org/r/20240228034151.459370-1-jhubbard@nvidia.com Signed-off-by:
John Hubbard <jhubbard@nvidia.com> Reviewed-by:
David Hildenbrand <david@redhat.com> Reviewed-by:
Ryan Roberts <ryan.roberts@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Richard Chang authored
alloc_contig_migrate_range has every information to be able to understand big contiguous allocation latency. For example, how many pages are migrated, how many times they were needed to unmap from page tables. This patch adds the trace event to collect the allocation statistics. In the field, it was quite useful to understand CMA allocation latency. [akpm@linux-foundation.org: a/trace_mm_alloc_config_migrate_range_info_enabled/trace_mm_alloc_contig_migrate_range_info_enabled] Link: https://lkml.kernel.org/r/20240228051127.2859472-1-richardycc@google.com Signed-off-by:
Richard Chang <richardycc@google.com> Reviewed-by:
Steven Rostedt (Google) <rostedt@goodmis.org.> Cc: Martin Liu <liumartin@google.com> Cc: "Masami Hiramatsu (Google)" <mhiramat@kernel.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Suren Baghdasaryan <surenb@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
We already have a folio; use it instead of the head page where reasonable. Saves a couple of calls to compound_head() and elimimnates a few references to page->mapping. Link: https://lkml.kernel.org/r/20240228164326.1355045-1-willy@infradead.org Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
Sidhartha Kumar <sidhartha.kumar@oracle.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
All but one caller already has a folio, so convert free_page_and_swap_cache() to have a folio and remove the call to page_folio(). Link: https://lkml.kernel.org/r/20240227174254.710559-19-willy@infradead.org Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
Ryan Roberts <ryan.roberts@arm.com> Reviewed-by:
David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
These pages are all chained together through the lru list, so we know they're folios. Use the folio APIs to save three hidden calls to compound_head(). Link: https://lkml.kernel.org/r/20240227174254.710559-18-willy@infradead.org Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Process the pages in batch-sized quantities instead of all-at-once. Link: https://lkml.kernel.org/r/20240227174254.710559-17-willy@infradead.org Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
All callers now use free_unref_folios() so we can delete this function. Link: https://lkml.kernel.org/r/20240227174254.710559-15-willy@infradead.org Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
Ryan Roberts <ryan.roberts@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
All users have been converted to mem_cgroup_uncharge_folios() so we can remove this API. Link: https://lkml.kernel.org/r/20240227174254.710559-14-willy@infradead.org Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
Ryan Roberts <ryan.roberts@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
The few folios which can't be moved to the LRU list (because their refcount dropped to zero) used to be returned to the caller to dispose of. Make this simpler to call by freeing the folios directly through free_unref_folios(). Link: https://lkml.kernel.org/r/20240227174254.710559-13-willy@infradead.org Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Use free_unref_page_batch() to free the folios. This may increase the number of IPIs from calling try_to_unmap_flush() more often, but that's going to be very workload-dependent. It may even reduce the number of IPIs as we now batch-free large folios instead of freeing them one at a time. Link: https://lkml.kernel.org/r/20240227174254.710559-12-willy@infradead.org Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Cc: Mel Gorman <mgorman@suse.de> Cc: David Hildenbrand <david@redhat.com> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Hugetlb folios still get special treatment, but normal large folios can now be freed by free_unref_folios(). This should have a reasonable performance impact, TBD. Link: https://lkml.kernel.org/r/20240227174254.710559-11-willy@infradead.org Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
Ryan Roberts <ryan.roberts@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Call folio_undo_large_rmappable() if needed. free_unref_page_prepare() destroys the ability to call folio_order(), so stash the order in folio->private for the benefit of the second loop. Link: https://lkml.kernel.org/r/20240227174254.710559-10-willy@infradead.org Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Pass a pointer to the lruvec so we can take advantage of the folio_lruvec_relock_irqsave(). Adjust the calling convention of folio_lruvec_relock_irqsave() to suit and add a page_cache_release() wrapper. Link: https://lkml.kernel.org/r/20240227174254.710559-9-willy@infradead.org Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Break up the list of folios into batches here so that the folios are more likely to be cache hot when doing the rest of the processing. Link: https://lkml.kernel.org/r/20240227174254.710559-8-willy@infradead.org Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Instead of putting the interesting folios on a list, delete the uninteresting one from the folio_batch. Link: https://lkml.kernel.org/r/20240227174254.710559-7-willy@infradead.org Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
Ryan Roberts <ryan.roberts@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Almost identical to mem_cgroup_uncharge_list(), except it takes a folio_batch instead of a list_head. Link: https://lkml.kernel.org/r/20240227174254.710559-6-willy@infradead.org Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
Ryan Roberts <ryan.roberts@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
There's no need to indirect through release_pages() and iterate over this batch of folios an extra time; we can just use the batch that we have. Link: https://lkml.kernel.org/r/20240227174254.710559-5-willy@infradead.org Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
Ryan Roberts <ryan.roberts@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Iterate over a folio_batch rather than a linked list. This is easier for the CPU to prefetch and has a batch count naturally built in so we don't need to track it. Again, this lowers the maximum lock hold time from 32 folios to 15, but I do not expect this to have a significant effect. Link: https://lkml.kernel.org/r/20240227174254.710559-4-willy@infradead.org Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Most of its callees are not yet ready to accept a folio, but we know all of the pages passed in are actually folios because they're linked through ->lru. Link: https://lkml.kernel.org/r/20240227174254.710559-3-willy@infradead.org Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Reviewed-by:
Ryan Roberts <ryan.roberts@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
Patch series "Rearrange batched folio freeing", v3. Other than the obvious "remove calls to compound_head" changes, the fundamental belief here is that iterating a linked list is much slower than iterating an array (5-15x slower in my testing). There's also an associated belief that since we iterate the batch of folios three times, we do better when the array is small (ie 15 entries) than we do with a batch that is hundreds of entries long, which only gives us the opportunity for the first pages to fall out of cache by the time we get to the end. It is possible we should increase the size of folio_batch. Hopefully the bots let us know if this introduces any performance regressions. This patch (of 3): By making release_pages() call folios_put(), we can get rid of the calls to compound_head() for the callers that already know they have folios. We can also get rid of the lock_batch tracking as we know the size of the batch is limited by folio_batch. This does reduce the maximum number of pages for which the lruvec lock is held, from SWAP_CLUSTER_MAX (32) to PAGEVEC_SIZE (15). I do not expect this to make a significant difference, but if it does, we can increase PAGEVEC_SIZE to 31. Link: https://lkml.kernel.org/r/20240227174254.710559-1-willy@infradead.org Link: https://lkml.kernel.org/r/20240227174254.710559-2-willy@infradead.org Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Cc: David Hildenbrand <david@redhat.com> Cc: Mel Gorman <mgorman@suse.de> Cc: Ryan Roberts <ryan.roberts@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Lance Yang authored
Previously, we removed the mm from mm_slot and dropped mm_count if the MMF_THP_DISABLE flag was set. However, we didn't re-add the mm back after clearing the MMF_THP_DISABLE flag. Additionally, We add a check for the MMF_THP_DISABLE flag in hugepage_vma_revalidate(). Link: https://lkml.kernel.org/r/20240227035135.54593-1-ioworker0@gmail.com Fixes: 879c6000 ("mm/khugepaged: bypassing unnecessary scans with MMF_DISABLE_THP check") Signed-off-by:
Lance Yang <ioworker0@gmail.com> Suggested-by:
Yang Shi <shy828301@gmail.com> Reviewed-by:
Yang Shi <shy828301@gmail.com> Reviewed-by:
David Hildenbrand <david@redhat.com> Cc: Michal Hocko <mhocko@suse.com> Cc: Minchan Kim <minchan@kernel.org> Cc: Muchun Song <songmuchun@bytedance.com> Cc: Peter Xu <peterx@redhat.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
David Hildenbrand authored
Patch series "mm: remove total_mapcount()", v2. Let's remove the remaining user from mm/memfd.c so we can get rid of total_mapcount(). This patch (of 2): Both functions are the remaining users of total_mapcount(). Let's get rid of the calls by converting the code to folios. As it turns out, the code is unnecessarily complicated, especially: 1) We can query the number of pagecache references for a folio simply via folio_nr_pages(). This will handle other folio sizes in the future correctly. 2) The xas_set(xas, page->index + cache_count) call to increment the iterator for large folios is not required. Remove it. Further, simplify the XA_CHECK_SCHED check, counting each entry exactly once. Memfd pages can be swapped out when using shmem; leave xa_is_value() checks in place. Link: https://lkml.kernel.org/r/20240226141324.278526-1-david@redhat.com Link: https://lkml.kernel.org/r/20240226141324.278526-2-david@redhat.com Co-developed-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by:
David Hildenbrand <david@redhat.com> Reviewed-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Zi Yan authored
It is used to test split_huge_page_to_list_to_order for pagecache THPs. Also add test cases for split_huge_page_to_list_to_order via both debugfs. [ziy@nvidia.com: fix issue discovered with NFS] Link: https://lkml.kernel.org/r/262E4DAA-4A78-4328-B745-1355AE356A07@nvidia.com Link: https://lkml.kernel.org/r/20240226205534.1603748-9-zi.yan@sent.com Signed-off-by:
Zi Yan <ziy@nvidia.com> Tested-by:
Aishwarya TCV <aishwarya.tcv@arm.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michal Koutny <mkoutny@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Zach O'Keefe <zokeefe@google.com> Cc: Aishwarya TCV <aishwarya.tcv@arm.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
Zi Yan authored
To split a THP to any lower order pages, we need to reform THPs on subpages at given order and add page refcount based on the new page order. Also we need to reinitialize page_deferred_list after removing the page from the split_queue, otherwise a subsequent split will see list corruption when checking the page_deferred_list again. Note: Anonymous order-1 folio is not supported because _deferred_list, which is used by partially mapped folios, is stored in subpage 2 and an order-1 folio only has subpage 0 and 1. File-backed order-1 folios are fine, since they do not use _deferred_list. [ziy@nvidia.com: fixup per discussion with Ryan] Link: https://lkml.kernel.org/r/494F48CD-1F0F-4CAD-884E-6D48F40AF990@nvidia.com Link: https://lkml.kernel.org/r/20240226205534.1603748-8-zi.yan@sent.com Signed-off-by:
Zi Yan <ziy@nvidia.com> Cc: David Hildenbrand <david@redhat.com> Cc: Hugh Dickins <hughd@google.com> Cc: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Luis Chamberlain <mcgrof@kernel.org> Cc: "Matthew Wilcox (Oracle)" <willy@infradead.org> Cc: Michal Koutny <mkoutny@suse.com> Cc: Roman Gushchin <roman.gushchin@linux.dev> Cc: Ryan Roberts <ryan.roberts@arm.com> Cc: Yang Shi <shy828301@gmail.com> Cc: Yu Zhao <yuzhao@google.com> Cc: Zach O'Keefe <zokeefe@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-