- Feb 09, 2023
Al Viro authored
[ Upstream commit de4eda9d ] READ/WRITE proved to be actively confusing - the meanings are "data destination, as used with read(2)" and "data source, as used with write(2)", but people keep interpreting those as "we read data from it" and "we write data to it", i.e. exactly the wrong way. Call them ITER_DEST and ITER_SOURCE - at least that is harder to misinterpret...
Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
Stable-dep-of: 6dd88fd5 ("vhost-scsi: unbreak any layout for response")
Signed-off-by: Sasha Levin <sashal@kernel.org>
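A minimal sketch of the new naming, assuming the post-rename iov_iter API (the copy helpers are left as comments):

    #include <linux/uio.h>

    /* Filling a user buffer, as read(2) does: the iter is the destination. */
    static void fill_user_buffer(const struct iovec *iov, unsigned long nr_segs,
                                 size_t count)
    {
        struct iov_iter iter;

        iov_iter_init(&iter, ITER_DEST, iov, nr_segs, count);   /* was READ */
        /* copy_to_iter(kbuf, count, &iter); */
    }

    /* Consuming a user buffer, as write(2) does: the iter is the source. */
    static void drain_user_buffer(const struct iovec *iov, unsigned long nr_segs,
                                  size_t count)
    {
        struct iov_iter iter;

        iov_iter_init(&iter, ITER_SOURCE, iov, nr_segs, count); /* was WRITE */
        /* copy_from_iter(kbuf, count, &iter); */
    }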
-
- Oct 03, 2022
Matthew Wilcox (Oracle) authored
Removes many calls to compound_head().
Link: https://lkml.kernel.org/r/20220902194653.1739778-41-willy@infradead.org
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- Sep 27, 2022
Yang Yang authored
Once upon a time, we only supported accounting thrashing of the page cache. Then Joonsoo introduced workingset detection for anonymous pages and we gained the ability to account their thrashing too[1]. Like PSI, we count submission time as thrashing delay, because when the device is congested or the submitting cgroup is IO-throttled, submission can be a significant part of overall IO time. Without this patch, swap thrashing through frontswap or a block device supporting the rw_page operation isn't measured correctly. This patch is based on "delayacct: support re-entrance detection of thrashing accounting". [1] commit aae466b0 ("mm/swap: implement workingset detection for anonymous LRU")
Link: https://lkml.kernel.org/r/20220815072835.74876-1-yang.yang29@zte.com.cn
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Signed-off-by: CGEL ZTE <cgel.zte@gmail.com>
Reviewed-by: Ran Xiaokai <ran.xiaokai@zte.com.cn>
Reviewed-by: wangyong <wang.yong12@zte.com.cn>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
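A sketch of where the accounting sits in swap_readpage(); the delayacct/psi helper names follow mainline, but exact signatures vary across kernel versions, so treat this as illustrative:

    unsigned long pflags;
    bool in_thrashing;

    if (workingset) {                   /* refault of hot memory */
        delayacct_thrashing_start(&in_thrashing);
        psi_memstall_enter(&pflags);
    }
    /* submit the swap-in: frontswap, rw_page, or a bio, and wait */
    if (workingset) {
        delayacct_thrashing_end(&in_thrashing);
        psi_memstall_leave(&pflags);
    }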
-
- Sep 12, 2022
Christoph Hellwig authored
The argument is always set to end_swap_bio_write, so remove the argument and mark end_swap_bio_write static.
Link: https://lkml.kernel.org/r/20220811141741.660214-1-hch@lst.de
Signed-off-by: Christoph Hellwig <hch@lst.de>
Cc: Seth Jennings <sjenning@redhat.com>
Cc: Dan Streetman <ddstreet@ieee.org>
Cc: Vitaly Wool <vitaly.wool@konsulko.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
-
- May 10, 2022
NeilBrown authored
We need to use count_swpout_vm_event() in sio_write_complete() to get correct counting. Note that THP swap-in (if it ever happens) is currently accounted as 1 for each page, whether HUGE or normal. This is different from swap-out accounting. This patch should be squashed into "MM: handle THP in swap_*page_fs()".
Link: https://lkml.kernel.org/r/165146948934.24404.5909750610552745025@noble.neil.brown.name
Signed-off-by: NeilBrown <neilb@suse.de>
Reported-by: Miaohe Lin <linmiaohe@huawei.com>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Christoph Hellwig <hch@lst.de>
Cc: Matthew Wilcox <willy@infradead.org>
Cc: Yang Shi <shy828301@gmail.com>
Cc: Huang Ying <ying.huang@intel.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
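A sketch of the corrected completion path, assuming the swap_iocb layout and the sio_pool mempool used elsewhere in this series (error handling omitted):

    static void sio_write_complete(struct kiocb *iocb, long ret)
    {
        struct swap_iocb *sio = container_of(iocb, struct swap_iocb, iocb);
        int p;

        for (p = 0; p < sio->pages; p++) {
            struct page *page = sio->bvec[p].bv_page;

            count_swpout_vm_event(page);    /* bumps THP_SWPOUT and PSWPOUT */
            end_page_writeback(page);
        }
        mempool_free(sio, sio_pool);
    }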
-
NeilBrown authored
Pages passed to swap_readpage()/swap_writepage() are not necessarily all the same size - transparent huge pages may be involved. The BIO paths of swap_*page() handle this correctly, but the SWP_FS_OPS path does not. So we need to use thp_size() to find the size rather than assuming PAGE_SIZE, and we need to track the total length of the request rather than assuming it is pages * PAGE_SIZE.
Link: https://lkml.kernel.org/r/165119301488.15698.9457662928942765453.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Reported-by: Miaohe Lin <linmiaohe@huawei.com>
Cc: Christoph Hellwig <hch@lst.de>
Cc: David Howells <dhowells@redhat.com>
Cc: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
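A sketch of the SWP_FS_OPS setup with the fix applied, assuming the swap_iocb field names from this series:

    /* Size each segment from the page itself: thp_size() returns the full
     * THP size, or just PAGE_SIZE when THP is compiled out. */
    sio->bvec[sio->pages].bv_page = page;
    sio->bvec[sio->pages].bv_len = thp_size(page);
    sio->bvec[sio->pages].bv_offset = 0;
    sio->len += thp_size(page);   /* total request length, not pages * PAGE_SIZE */
    sio->pages += 1;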
-
NeilBrown authored
swap_writepage() is given one page at a time, but may be called repeatedly in succession. For block-device swapspace, the blk_plug functionality allows multiple pages to be combined together at lower layers. That cannot be used for SWP_FS_OPS, as blk_plug may not exist - it is only active when CONFIG_BLOCK=y. Consequently all swap writes over NFS are single-page writes. With this patch we pass a pointer-to-pointer via the wbc, so swap_writepage() can store state between calls - much like the pointer passed explicitly to swap_readpage(). After calling swap_writepage() some number of times, the state will be passed to swap_write_unplug(), which can submit the combined request.
Link: https://lkml.kernel.org/r/164859778128.29473.5191868522654408537.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
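A sketch of the caller side, assuming the writeback_control field is named swap_plug as in the mainline series:

    struct swap_iocb *plug = NULL;
    struct writeback_control wbc = {
        .sync_mode = WB_SYNC_NONE,
        .swap_plug = &plug,       /* swap_writepage() stores state here */
    };

    /* reclaim calls swap_writepage(page, &wbc) repeatedly; contiguous
     * SWP_FS_OPS writes accumulate in *plug rather than being submitted
     * one page at a time */
    swap_write_unplug(plug);      /* submit the combined request */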
-
NeilBrown authored
swap_readpage() is given one page at a time, but may be called repeatedly in succession. For block-device swap-space, the blk_plug functionality allows multiple pages to be combined together at lower layers. That cannot be used for SWP_FS_OPS, as blk_plug may not exist - it is only active when CONFIG_BLOCK=y. Consequently all swap reads over NFS are single-page reads. With this patch we pass in a pointer-to-pointer in which swap_readpage() can store state between calls - much like the effect of blk_plug. After calling swap_readpage() some number of times, the state will be passed to swap_read_unplug(), which can submit the combined request.
Link: https://lkml.kernel.org/r/164859778127.29473.14059420492644907783.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
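The read side mirrors this; a sketch assuming the post-series signature swap_readpage(page, synchronous, plug):

    struct swap_iocb *splug = NULL;

    swap_readpage(page1, false, &splug);   /* queued in *splug, not submitted */
    swap_readpage(page2, false, &splug);   /* appended if contiguous */
    swap_read_unplug(splug);               /* submit the combined request */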
-
NeilBrown authored
This patch switches swap-out to SWP_FS_OPS swap-spaces to use ->swap_rw and makes the writes asynchronous, like they are for other swap spaces. To make it async we need to allocate the kiocb struct from a mempool. This may block, but not for as long as waiting for the write itself to complete; at most it will wait for some previous swap IO to complete.
Link: https://lkml.kernel.org/r/164859778126.29473.12399585304843922231.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
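A sketch of the async submission; sio_pool and sio_write_complete are assumed names from this series, and the iter direction was still spelled WRITE at this point (later ITER_SOURCE):

    /* may sleep, but never fails outright */
    struct swap_iocb *sio = mempool_alloc(sio_pool, GFP_NOIO);
    struct iov_iter from;

    init_sync_kiocb(&sio->iocb, swap_file);
    sio->iocb.ki_pos = page_file_offset(page);
    sio->iocb.ki_complete = sio_write_complete;  /* a completion makes it async */
    iov_iter_bvec(&from, WRITE, sio->bvec, sio->pages, sio->len);
    mapping->a_ops->swap_rw(&sio->iocb, &from);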
-
NeilBrown authored
swap currently uses ->readpage to read swap pages. This can only request one page at a time from the filesystem, which is not the most efficient. swap uses ->direct_IO for writes, which, while adequate, is an inappropriate overloading: ->direct_IO may need to handle allocating space for holes or other details that are not relevant for swap. So this patch introduces a new address_space operation: ->swap_rw. In this patch it is used for reads, and a subsequent patch will switch writes to use it. No filesystem yet supports ->swap_rw, but that is not a problem because no filesystem currently works with filesystem-based swap anyway. Only two filesystems set SWP_FS_OPS:
- cifs sets the flag, but ->direct_IO always fails, so swap cannot work.
- nfs sets the flag, but ->direct_IO calls generic_write_checks(), which has failed on swap files for several releases.
To ensure that a NULL ->swap_rw isn't called, ->swap_activate() for both NFS and cifs is changed to fail if ->swap_rw is not set. This can be removed if/when the function is added. Future patches will restore swap-over-NFS functionality. To submit an async read with ->swap_rw() we need to allocate a structure to hold the kiocb and other details. swap_readpage() cannot handle transient failure, so we create a mempool to provide the structures.
Link: https://lkml.kernel.org/r/164859778125.29473.13430559328221330589.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
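A sketch of the new operation and the guard described above; the nfs_swap_activate body is illustrative, not the full implementation:

    /* the new address_space operation: */
    int (*swap_rw)(struct kiocb *iocb, struct iov_iter *iter);

    /* and the guard in a filesystem's ->swap_activate(), so a NULL
     * ->swap_rw is never called: */
    static int nfs_swap_activate(struct swap_info_struct *sis,
                                 struct file *file, sector_t *span)
    {
        if (!file->f_mapping->a_ops->swap_rw)
            return -EINVAL;       /* refuse swap until swap_rw exists */
        /* ... activate the swap extents ... */
        return 0;
    }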
-
NeilBrown authored
Folios that are written to swap are owned by the MM subsystem - not any filesystem. When such a folio is passed to a filesystem to be written out to a swap-file, the filesystem handles the data, but the folio itself does not belong to the filesystem. So calling the filesystem's ->dirty_folio() address_space operation makes no sense: that operation is for folios in the filesystem's own address space, and a folio to be written to swap does not exist in that address space. So drop swap_dirty_folio(), which calls the address space's ->dirty_folio(), and always use noop_dirty_folio(), which is appropriate for folios being swapped out.
Link: https://lkml.kernel.org/r/164859778123.29473.6900942583784889976.stgit@noble.brown
Signed-off-by: NeilBrown <neilb@suse.de>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Tested-by: David Howells <dhowells@redhat.com>
Tested-by: Geert Uytterhoeven <geert+renesas@glider.be>
Cc: Hugh Dickins <hughd@google.com>
Cc: Mel Gorman <mgorman@techsingularity.net>
Cc: Trond Myklebust <trond.myklebust@hammerspace.com>
Cc: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
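A sketch of the resulting swap address-space operations (other fields omitted):

    static const struct address_space_operations swap_aops = {
        .writepage   = swap_writepage,
        .dirty_folio = noop_dirty_folio,   /* never the filesystem's own */
        /* ... */
    };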
-
NeilBrown authored
Patch series "MM changes to improve swap-over-NFS support". Assorted improvements for swap-via-filesystem. This is a resend of these patches, rebased on current HEAD. The only substantial changes is that swap_dirty_folio has replaced swap_set_page_dirty. Currently swap-via-fs (SWP_FS_OPS) doesn't work for any filesystem. It has previously worked for NFS but that broke a few releases back. This series changes to use a new ->swap_rw rather than ->readpage and ->direct_IO. It also makes other improvements. There is a companion series already in linux-next which fixes various issues with NFS. Once both series land, a final patch is needed which changes NFS over to use ->swap_rw. This patch (of 10): Many functions declared in include/linux/swap.h are only used within mm/ Create a new "mm/swap.h" and move some of these declarations there. Remove the redundant 'extern' from the function declarations. [akpm@linux-foundation.org: mm/memory-failure.c needs mm/swap.h] Link: https://lkml.kernel.org/r/164859751830.29473.5309689752169286816.stgit@noble.brown Link: https://lkml.kernel.org/r/164859778120.29473.11725907882296224053.stgit@noble.brown Signed-off-by:
NeilBrown <neilb@suse.de> Reviewed-by:
Christoph Hellwig <hch@lst.de> Tested-by:
David Howells <dhowells@redhat.com> Tested-by:
Geert Uytterhoeven <geert+renesas@glider.be> Cc: Trond Myklebust <trond.myklebust@hammerspace.com> Cc: Hugh Dickins <hughd@google.com> Cc: Mel Gorman <mgorman@techsingularity.net> Cc: Miaohe Lin <linmiaohe@huawei.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org>
-
- May 09, 2022
Matthew Wilcox (Oracle) authored
This commit is split out so it can be dropped when resolving conflicts with Neil Brown's series to stop calling ->readpage in the swap code.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
-
- May 02, 2022
Ming Lei authored
So far a bio is marked as REQ_POLLED if RWF_HIPRI/IOCB_HIPRI is passed in from the userspace sync io interface, and the block layer then tries to poll until the bio is completed. But the current implementation calls blk_io_schedule() if bio_poll() returns 0, and this can easily cause an io hang or timeout. Yet it looks like no one has reported this kind of issue, which should have been triggered by a normal io poll sanity test or blktests block/007, as observed by Changhui; that means it is very likely that no one uses it, or no one cares about it. Also, after io_uring was invented, io poll for sync dio became a legacy interface. So ignore the RWF_HIPRI hint for sync dio.
Cc: linux-mm@kvack.org
Cc: linux-xfs@vger.kernel.org
Reported-by: Changhui Zhong <czhong@redhat.com>
Suggested-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
Tested-by: Changhui Zhong <czhong@redhat.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Link: https://lore.kernel.org/r/20220420143110.2679002-1-ming.lei@redhat.com
Signed-off-by: Jens Axboe <axboe@kernel.dk>
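From userspace, nothing changes syntactically; a sketch (the fd is assumed to be opened with O_DIRECT):

    #define _GNU_SOURCE
    #include <sys/uio.h>

    /* RWF_HIPRI is still accepted, but for synchronous direct IO the
     * kernel now treats it as a no-op hint instead of polling. */
    ssize_t submit_hipri_write(int fd, void *buf, size_t len)
    {
        struct iovec iov = { .iov_base = buf, .iov_len = len };

        return pwritev2(fd, &iov, 1, 0, RWF_HIPRI);
    }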
-
- Apr 15, 2022
Minchan Kim authored
When two processes are cloned with CLONE_VM, a user process can be corrupted by unexpectedly seeing a zeroed page:

CPU A                          CPU B

do_swap_page                   do_swap_page
SWP_SYNCHRONOUS_IO path        SWP_SYNCHRONOUS_IO path
swap_readpage valid data
  swap_slot_free_notify
    delete zram entry
                               swap_readpage zeroed(invalid) data
                               pte_lock
                               map the *zero data* to userspace
                               pte_unlock
pte_lock
if (!pte_same)
  goto out_nomap;
pte_unlock
return and next refault will
read zeroed data

The swap_slot_free_notify is bogus for the CLONE_VM case, since it doesn't increase the refcount of the swap slot at copy_mm, so it cannot tell whether it is safe to discard data from the backing device. In that case, the only lock it could rely on to synchronize swap slot freeing is the page table lock. Thus, this patch gets rid of the swap_slot_free_notify function. With this patch, CPU A will see correct data:

CPU A                          CPU B

do_swap_page                   do_swap_page
SWP_SYNCHRONOUS_IO path        SWP_SYNCHRONOUS_IO path
swap_readpage original data
pte_lock
map the original data
swap_free
  swap_range_free
    bd_disk->fops->swap_slot_free_notify
                               swap_readpage read zeroed data
pte_unlock
                               pte_lock
                               if (!pte_same)
                                 goto out_nomap;
                               pte_unlock
                               return
                               on next refault will see mapped data by CPU B

The concern with the patch is increased memory consumption, since it could keep wasted memory in compressed form in zram as well as in uncompressed form in the address space. However, most uses of zram involve no readahead, and do_swap_page is followed by swap_free, so the compressed form will be freed from zram quickly.
Link: https://lkml.kernel.org/r/YjTVVxIAsnKAXjTd@google.com
Fixes: 0bcac06f ("mm, swap: skip swapcache for swapin of synchronous device")
Reported-by: Ivan Babrou <ivan@cloudflare.com>
Tested-by: Ivan Babrou <ivan@cloudflare.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
Cc: Nitin Gupta <ngupta@vflare.org>
Cc: Sergey Senozhatsky <senozhatsky@chromium.org>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: David Hildenbrand <david@redhat.com>
Cc: <stable@vger.kernel.org> [4.14+]
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- Mar 22, 2022
Johannes Weiner authored
Once upon a time, all swapins counted toward memory pressure[1]. Then Joonsoo introduced workingset detection for anonymous pages and we gained the ability to distinguish hot from cold swapins[2][3]. But we failed to update swap_readpage() accordingly, and now we account partial memory pressure in the swapin path of cold memory. Not for all situations - which adds more inconsistency: paths using the conventional submit_bio() and lock_page() route will not see much pressure - unless storage itself is heavily congested and the bio submissions stall. ZRAM and ZSWAP do most of the work directly from swap_readpage() and will see all swapins reflected as pressure. IOW, a workload doing cold swapins could see little to no pressure reported with on-disk swap, but potentially high pressure with a zram or zswap backend. That confuses any psi-based health monitoring, load shedding, proactive reclaim, or userspace OOM killing schemes that might be in place for the workload. Restore consistency by making all swapin stall accounting conditional on the page actually being part of the workingset.
[1] commit 93779069 ("mm/page_io.c: annotate refault stalls from swap_readpage")
[2] commit aae466b0 ("mm/swap: implement workingset detection for anonymous LRU")
[3] commit cad8320b ("mm/swap: don't SetPageWorkingset unconditionally during swapin")
Link: https://lkml.kernel.org/r/20220214214921.419687-1-hannes@cmpxchg.org
Signed-off-by: Johannes Weiner <hannes@cmpxchg.org>
Reported-by: CGEL <cgel.zte@gmail.com>
Acked-by: Minchan Kim <minchan@kernel.org>
Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
Cc: Yu Zhao <yuzhao@google.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
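A sketch of the restored consistency in swap_readpage():

    unsigned long pflags;
    bool workingset = PageWorkingset(page);

    if (workingset)               /* hot refault: this is thrashing */
        psi_memstall_enter(&pflags);
    /* ... zram/zswap decompression, or bio submission and wait ... */
    if (workingset)
        psi_memstall_leave(&pflags);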
-
- Mar 16, 2022
Matthew Wilcox (Oracle) authored
With all implementations converted to ->dirty_folio, we can stop calling this fallback method and remove it entirely.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
-
Matthew Wilcox (Oracle) authored
This is a mechanical change.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
-
- Mar 15, 2022
Matthew Wilcox (Oracle) authored
Straightforward conversion.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
-
Matthew Wilcox (Oracle) authored
This replaces ->set_page_dirty(). It returns a bool instead of an int and takes the address_space as a parameter instead of expecting the implementations to retrieve the address_space from the page. This is particularly important for filesystems which use FS_OPS for swap.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Tested-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@opensource.wdc.com>
Tested-by: Mike Marshall <hubcap@omnibond.com> # orangefs
Tested-by: David Howells <dhowells@redhat.com> # afs
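A sketch of the signature change:

    /* before: */
    int  (*set_page_dirty)(struct page *page);
    /* after: the mapping is passed in and the return type is bool */
    bool (*dirty_folio)(struct address_space *mapping, struct folio *folio);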
-
- Feb 02, 2022
Christoph Hellwig authored
Pass the block_device and operation that we plan to use this bio for to bio_alloc to optimize the assignment. NULL/0 can be passed, both for the passthrough case on a raw request_queue and to temporarily avoid refactoring some nasty code. Also move the gfp_mask argument after the nr_vecs argument, for a much more logical calling convention matching what most of the kernel does.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Chaitanya Kulkarni <kch@nvidia.com>
Link: https://lore.kernel.org/r/20220124091107.642561-18-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
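A before/after sketch of a typical call site:

    /* before: */
    bio = bio_alloc(GFP_KERNEL, nr_vecs);
    bio_set_dev(bio, bdev);
    bio->bi_opf = REQ_OP_WRITE | wbc_to_write_flags(wbc);

    /* after: bdev and operation are assigned up front */
    bio = bio_alloc(bdev, nr_vecs, REQ_OP_WRITE | wbc_to_write_flags(wbc),
                    GFP_KERNEL);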
-
- Jan 20, 2022
Yang Yang authored
Currently delayacct accounts swapin delay only for swapping that causes blkio. If we use zram for swapping, tools/accounting/getdelays can't get any SWAP delay. It's useful to get zram swapin delay information, for example to adjust the compression algorithm or /proc/sys/vm/swappiness. PSI, for reference, accounts any kind of swapping by doing its work in swap_readpage(), no matter whether the swapping causes blkio. Let delayacct do similar work.
Link: https://lkml.kernel.org/r/20211112083813.8559-1-yang.yang29@zte.com.cn
Signed-off-by: Yang Yang <yang.yang29@zte.com.cn>
Reported-by: Zeal Robot <zealci@zte.com.cn>
Cc: Balbir Singh <bsingharora@gmail.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Cc: Ingo Molnar <mingo@kernel.org>
Cc: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
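A sketch of the placement, assuming the delayacct_swapin_* helper names this patch introduces:

    delayacct_swapin_start();
    /* frontswap, synchronous rw_page, or bio submission and wait:
     * the whole swap-in is accounted, blkio or not */
    delayacct_swapin_end();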
-
- Oct 18, 2021
Jens Axboe authored
struct io_comp_batch contains a list head and a completion handler, which will allow batches of IO to be completed more efficiently. For now there are no functional changes in this patch; we just define the io_comp_batch structure and add the argument to the file_operations iopoll handler.
Reviewed-by: Christoph Hellwig <hch@lst.de>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
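A sketch of the structure and the extended hook:

    struct io_comp_batch {
        struct request  *req_list;
        unsigned int    nr_ios;
        void            (*complete)(struct io_comp_batch *);
    };

    /* file_operations gains the extra argument: */
    int (*iopoll)(struct kiocb *kiocb, struct io_comp_batch *iob,
                  unsigned int flags);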
-
Christoph Hellwig authored
Replace the blk_poll interface, which requires the caller to keep a queue and cookie from the submission, with polling based on the bio. Polling on the bio itself brings a few advantages:
- the cookie construction can be made entirely private in blk-mq.c
- the caller does not need to remember the request_queue and cookie separately and thus sidesteps their lifetime issues
- keeping the device and the cookie inside the bio makes it trivial to support polling of BIOs remapped by stacking drivers
- a lot of code to propagate the cookie back up the submission path can be removed entirely
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-15-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
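A simplified sketch of a synchronous caller after the conversion:

    /* poll on the bio itself instead of a (request_queue, cookie) pair */
    for (;;) {
        if (READ_ONCE(done))          /* set by the completion handler */
            break;
        if (!bio_poll(bio, NULL, 0))
            blk_io_schedule();
    }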
-
Christoph Hellwig authored
Unlike the RWF_HIPRI userspace ABI, which is intentionally kept vague, the bio flag is specific to the polling implementation, so rename and document it properly.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Sagi Grimberg <sagi@grimberg.me>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-12-hch@lst.de
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
Christoph Hellwig authored
Switch the boolean spin argument of blk_poll to a set of flags instead. This will allow controlling polling behavior in a more fine-grained way.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Tested-by: Mark Wunderlich <mark.wunderlich@intel.com>
Link: https://lore.kernel.org/r/20211012111226.760968-10-hch@lst.de
[axboe: adapt to changed io_uring iopoll]
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Sep 27, 2021
Matthew Wilcox (Oracle) authored
Convert rotate_reclaimable_page() to folio_rotate_reclaimable(). This eliminates all five of the calls to compound_head() in this function, saving 75 bytes at the cost of adding 15 bytes to its one caller, end_page_writeback(). We also save 36 bytes from pagevec_move_tail_fn() due to using folios there. Net 96 bytes savings. Also move its declaration to mm/internal.h as it's only used by filemap.c.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Acked-by: Vlastimil Babka <vbabka@suse.cz>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Christoph Hellwig <hch@lst.de>
Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com>
Acked-by: Mike Rapoport <rppt@linux.ibm.com>
Reviewed-by: David Howells <dhowells@redhat.com>
-
- Mar 03, 2021
Jens Axboe authored
We're not factoring in the start of the file for where to write and read the swapfile, which leads to very unfortunate side effects of writing where we should not be...
Fixes: 48d15436 ("mm: remove get_swap_bio")
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Feb 24, 2021
Georgi Djakov authored
If there are errors during swap read or write, they can easily fill the log buffer and push out any previous messages that might be useful for debugging, especially on systems that rely only on the kernel ring-buffer for logging. For example, on systems using zram as swap, we are more likely to see any page allocation errors preceding the swap write errors if the alerts are ratelimited.
Link: https://lkml.kernel.org/r/20210201142055.29068-1-georgi.djakov@linaro.org
Signed-off-by: Georgi Djakov <georgi.djakov@linaro.org>
Acked-by: Minchan Kim <minchan@kernel.org>
Reviewed-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- Jan 27, 2021
Christoph Hellwig authored
Reuse the block_device and sector from the swap_info structure, just as the SWP_SYNCHRONOUS_IO path does. Also remove the checks for NULL returns from bio_alloc, as that can't happen for sleeping allocations.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Reviewed-by: Chaitanya Kulkarni <chaitanya.kulkarni@wdc.com>
Acked-by: Damien Le Moal <damien.lemoal@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Jan 25, 2021
Christoph Hellwig authored
Replace the gendisk pointer in struct bio with a pointer to the newly improved struct block_device. From that, the gendisk can be trivially accessed with an extra indirection, and it also allows directly looking up all information related to partition remapping.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Acked-by: Tejun Heo <tj@kernel.org>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Dec 03, 2020
Roman Gushchin authored
Patch series "mm: allow mapping accounted kernel pages to userspace", v6. Currently a non-slab kernel page which has been charged to a memory cgroup can't be mapped to userspace. The underlying reason is simple: PageKmemcg flag is defined as a page type (like buddy, offline, etc), so it takes a bit from a page->mapped counter. Pages with a type set can't be mapped to userspace. But in general the kmemcg flag has nothing to do with mapping to userspace. It only means that the page has been accounted by the page allocator, so it has to be properly uncharged on release. Some bpf maps are mapping the vmalloc-based memory to userspace, and their memory can't be accounted because of this implementation detail. This patchset removes this limitation by moving the PageKmemcg flag into one of the free bits of the page->mem_cgroup pointer. Also it formalizes accesses to the page->mem_cgroup and page->obj_cgroups using new helpers, adds several checks and removes a couple of obsolete functions. As the result the code became more robust with fewer open-coded bit tricks. This patch (of 4): Currently there are many open-coded reads of the page->mem_cgroup pointer, as well as a couple of read helpers, which are barely used. It creates an obstacle on a way to reuse some bits of the pointer for storing additional bits of information. In fact, we already do this for slab pages, where the last bit indicates that a pointer has an attached vector of objcg pointers instead of a regular memcg pointer. This commits uses 2 existing helpers and introduces a new helper to converts all read sides to calls of these helpers: struct mem_cgroup *page_memcg(struct page *page); struct mem_cgroup *page_memcg_rcu(struct page *page); struct mem_cgroup *page_memcg_check(struct page *page); page_memcg_check() is intended to be used in cases when the page can be a slab page and have a memcg pointer pointing at objcg vector. It does check the lowest bit, and if set, returns NULL. page_memcg() contains a VM_BUG_ON_PAGE() check for the page not being a slab page. To make sure nobody uses a direct access, struct page's mem_cgroup/obj_cgroups is converted to unsigned long memcg_data. Signed-off-by:
Roman Gushchin <guro@fb.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Alexei Starovoitov <ast@kernel.org> Reviewed-by:
Shakeel Butt <shakeelb@google.com> Acked-by:
Johannes Weiner <hannes@cmpxchg.org> Acked-by:
Michal Hocko <mhocko@suse.com> Link: https://lkml.kernel.org/r/20201027001657.3398190-1-guro@fb.com Link: https://lkml.kernel.org/r/20201027001657.3398190-2-guro@fb.com Link: https://lore.kernel.org/bpf/20201201215900.3569844-2-guro@fb.com
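A short usage sketch of the three read helpers:

    struct mem_cgroup *memcg;

    memcg = page_memcg(page);         /* non-slab pages only */

    rcu_read_lock();
    memcg = page_memcg_rcu(page);     /* for RCU-protected readers */
    rcu_read_unlock();

    memcg = page_memcg_check(page);   /* page may be slab; returns NULL when
                                         the low bit marks an objcg vector */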
-
- Oct 14, 2020
Miaohe Lin authored
The out label is used in only one place and returns ret directly, without any resource cleanup, lock release, or the like, so remove the jump label and clean up.
Signed-off-by: Miaohe Lin <linmiaohe@huawei.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200927124032.22521-1-linmiaohe@huawei.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Gao Xiang authored
SWP_FS is used to make swap_{read,write}page() go through the filesystem; it's only used for swap files over NFS for now. Otherwise IO is submitted directly to the blockdev according to the swapfile extents reported by filesystems in advance. As Matthew pointed out [1], the SWP_FS naming is somewhat confusing, so let's rename it to SWP_FS_OPS. [1] https://lore.kernel.org/r/20200820113448.GM17456@casper.infradead.org
Suggested-by: Matthew Wilcox <willy@infradead.org>
Signed-off-by: Gao Xiang <hsiangkao@redhat.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Link: https://lkml.kernel.org/r/20200822113019.11319-1-hsiangkao@redhat.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- Sep 24, 2020
Christoph Hellwig authored
There is no point in trying to call bdev_read_page if SWP_SYNCHRONOUS_IO is not set, as the device won't support it.
Signed-off-by: Christoph Hellwig <hch@lst.de>
Reviewed-by: Jan Kara <jack@suse.cz>
Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Jens Axboe <axboe@kernel.dk>
-
- Sep 04, 2020
Steven Price authored
Arm's Memory Tagging Extension (MTE) adds some metadata (tags) to every physical page; when swapping pages out to disk it is necessary to save these tags, and later restore them when reading the pages back. Add some hooks, along with dummy implementations, to enable the arch code to handle this. Three new hooks are added to the swap code:
* arch_prepare_to_swap()
* arch_swap_invalidate_page() / arch_swap_invalidate_area()
One new hook is added to shmem:
* arch_swap_restore()
Signed-off-by: Steven Price <steven.price@arm.com>
[catalin.marinas@arm.com: add unlock_page() on the error path]
[catalin.marinas@arm.com: dropped the _tags suffix]
Signed-off-by: Catalin Marinas <catalin.marinas@arm.com>
Acked-by: Andrew Morton <akpm@linux-foundation.org>
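A sketch of the dummy implementations, which architectures such as arm64 override:

    static inline int arch_prepare_to_swap(struct page *page)
    {
        return 0;
    }

    static inline void arch_swap_invalidate_page(int type, pgoff_t offset)
    {
    }

    static inline void arch_swap_invalidate_area(int type)
    {
    }

    static inline void arch_swap_restore(swp_entry_t entry, struct page *page)
    {
    }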
-
- Aug 15, 2020
Qian Cai authored
struct swap_info_struct si.flags could be accessed concurrently, as noticed by KCSAN:

BUG: KCSAN: data-race in scan_swap_map_slots / swap_readpage

write to 0xffff9c77b80ac400 of 8 bytes by task 91325 on cpu 16:
 scan_swap_map_slots+0x6fe/0xb50
 scan_swap_map_slots at mm/swapfile.c:887
 get_swap_pages+0x39d/0x5c0
 get_swap_page+0x377/0x524
 add_to_swap+0xe4/0x1c0
 shrink_page_list+0x1740/0x2820
 shrink_inactive_list+0x316/0x8b0
 shrink_lruvec+0x8dc/0x1380
 shrink_node+0x317/0xd80
 do_try_to_free_pages+0x1f7/0xa10
 try_to_free_pages+0x26c/0x5e0
 __alloc_pages_slowpath+0x458/0x1290
 __alloc_pages_nodemask+0x3bb/0x450
 alloc_pages_vma+0x8a/0x2c0
 do_anonymous_page+0x170/0x700
 __handle_mm_fault+0xc9f/0xd00
 handle_mm_fault+0xfc/0x2f0
 do_page_fault+0x263/0x6f9
 page_fault+0x34/0x40

read to 0xffff9c77b80ac400 of 8 bytes by task 5422 on cpu 7:
 swap_readpage+0x204/0x6a0
 swap_readpage at mm/page_io.c:380
 read_swap_cache_async+0xa2/0xb0
 swapin_readahead+0x6a0/0x890
 do_swap_page+0x465/0xeb0
 __handle_mm_fault+0xc7a/0xd00
 handle_mm_fault+0xfc/0x2f0
 do_page_fault+0x263/0x6f9
 page_fault+0x34/0x40

Reported by Kernel Concurrency Sanitizer on:
CPU: 7 PID: 5422 Comm: gmain Tainted: G W O L 5.5.0-next-20200204+ #6
Hardware name: HPE ProLiant DL385 Gen10/ProLiant DL385 Gen10, BIOS A40 07/10/2019

Other reads:

read to 0xffff91ea33eac400 of 8 bytes by task 11276 on cpu 120:
 __swap_writepage+0x140/0xc20
 __swap_writepage at mm/page_io.c:289

read to 0xffff91ea33eac400 of 8 bytes by task 11264 on cpu 16:
 swap_set_page_dirty+0x44/0x1f4
 swap_set_page_dirty at mm/page_io.c:442

The write is under &si->lock, but the reads are done locklessly. Since the reads only check for a specific bit in the flag, it is harmless even if load tearing happens. Thus, just mark them as intentional data races using the data_race() macro.
[cai@lca.pw: add a missing annotation]
Link: http://lkml.kernel.org/r/1581612585-5812-1-git-send-email-cai@lca.pw
Signed-off-by: Qian Cai <cai@lca.pw>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Marco Elver <elver@google.com>
Link: http://lkml.kernel.org/r/20200207003601.1526-1-cai@lca.pw
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
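A sketch of the annotation applied to one of the reads named in the report:

    /* the lockless path only tests one bit of si->flags, so load tearing
     * is harmless; data_race() tells KCSAN the race is intentional */
    if (data_race(sis->flags & SWP_SYNCHRONOUS_IO)) {
        /* ... try the synchronous path ... */
    }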
-
Matthew Wilcox (Oracle) authored
The thp prefix is more frequently used than hpage and we should be consistent between the various functions. [akpm@linux-foundation.org: fix mm/migrate.c]
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/20200629151959.15779-6-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
Matthew Wilcox (Oracle) authored
This function returns the number of bytes in a THP. It is like page_size(), but compiles to just PAGE_SIZE if CONFIG_TRANSPARENT_HUGEPAGE is disabled.
Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: William Kucharski <william.kucharski@oracle.com>
Reviewed-by: Zi Yan <ziy@nvidia.com>
Cc: David Hildenbrand <david@redhat.com>
Cc: Mike Kravetz <mike.kravetz@oracle.com>
Cc: "Kirill A. Shutemov" <kirill.shutemov@linux.intel.com>
Link: http://lkml.kernel.org/r/20200629151959.15779-5-willy@infradead.org
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
-
- Aug 07, 2020
Xianting Tian authored
swap_readpage() does the sync io for one page. The io is not big and normally finishes quickly, but it may take a long time, or wait forever, in case of io failure or discard. This patch uses blk_io_schedule() instead of io_schedule() to avoid a task hang and crash (when /proc/sys/kernel/hung_task_panic is set) if the above exception occurs. This is similar to the hung task avoidance in submit_bio_wait(), blk_execute_rq() and __blkdev_direct_IO().
Signed-off-by: Xianting Tian <xianting_tian@126.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
Reviewed-by: Andrew Morton <akpm@linux-foundation.org>
Cc: Ming Lei <ming.lei@redhat.com>
Cc: Bart Van Assche <bvanassche@acm.org>
Cc: Hannes Reinecke <hare@suse.de>
Cc: Jens Axboe <axboe@kernel.dk>
Cc: Hugh Dickins <hughd@google.com>
Link: http://lkml.kernel.org/r/1596461807-21087-1-git-send-email-xianting_tian@126.com
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
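A simplified sketch of the sync wait loop in swap_readpage() after the change (the polling branch is omitted):

    bio->bi_private = current;
    submit_bio(bio);
    for (;;) {
        set_current_state(TASK_UNINTERRUPTIBLE);
        if (!READ_ONCE(bio->bi_private))  /* cleared by end_swap_bio_read() */
            break;
        blk_io_schedule();  /* bounded wait; bare io_schedule() could hang */
    }
    __set_current_state(TASK_RUNNING);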
-