- Mar 03, 2025
-
-
commit 367a9bffabe08c04f6d725032cce3d891b2b9e1a upstream. nilfs_lookup_dirty_data_buffers(), which iterates through the buffers attached to dirty data folios/pages, accesses the attached buffers without locking the folios/pages. For data cache, nilfs_clear_folio_dirty() may be called asynchronously when the file system degenerates to read only, so nilfs_lookup_dirty_data_buffers() still has the potential to cause use after free issues when buffers lose the protection of their dirty state midway due to this asynchronous clearing and are unintentionally freed by try_to_free_buffers(). Eliminate this race issue by adjusting the lock section in this function. [konishi.ryusuke@gmail.com: adjusted for page/folio conversion] Link: https://lkml.kernel.org/r/20250107200202.6432-3-konishi.ryusuke@gmail.com Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Fixes: 8c26c4e2 ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit ca76bb226bf47ff04c782cacbd299f12ddee1ec1 upstream. Patch series "nilfs2: protect busy buffer heads from being force-cleared". This series fixes the buffer head state inconsistency issues reported by syzbot that occurs when the filesystem is corrupted and falls back to read-only, and the associated buffer head use-after-free issue. This patch (of 2): Syzbot has reported that after nilfs2 detects filesystem corruption and falls back to read-only, inconsistencies in the buffer state may occur. One of the inconsistencies is that when nilfs2 calls mark_buffer_dirty() to set a data or metadata buffer as dirty, but it detects that the buffer is not in the uptodate state: WARNING: CPU: 0 PID: 6049 at fs/buffer.c:1177 mark_buffer_dirty+0x2e5/0x520 fs/buffer.c:1177 ... Call Trace: <TASK> nilfs_palloc_commit_alloc_entry+0x4b/0x160 fs/nilfs2/alloc.c:598 nilfs_ifile_create_inode+0x1dd/0x3a0 fs/nilfs2/ifile.c:73 nilfs_new_inode+0x254/0x830 fs/nilfs2/inode.c:344 nilfs_mkdir+0x10d/0x340 fs/nilfs2/namei.c:218 vfs_mkdir+0x2f9/0x4f0 fs/namei.c:4257 do_mkdirat+0x264/0x3a0 fs/namei.c:4280 __do_sys_mkdirat fs/namei.c:4295 [inline] __se_sys_mkdirat fs/namei.c:4293 [inline] __x64_sys_mkdirat+0x87/0xa0 fs/namei.c:4293 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f The other is when nilfs_btree_propagate(), which propagates the dirty state to the ancestor nodes of a b-tree that point to a dirty buffer, detects that the origin buffer is not dirty, even though it should be: WARNING: CPU: 0 PID: 5245 at fs/nilfs2/btree.c:2089 nilfs_btree_propagate+0xc79/0xdf0 fs/nilfs2/btree.c:2089 ... Call Trace: <TASK> nilfs_bmap_propagate+0x75/0x120 fs/nilfs2/bmap.c:345 nilfs_collect_file_data+0x4d/0xd0 fs/nilfs2/segment.c:587 nilfs_segctor_apply_buffers+0x184/0x340 fs/nilfs2/segment.c:1006 nilfs_segctor_scan_file+0x28c/0xa50 fs/nilfs2/segment.c:1045 nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1216 [inline] nilfs_segctor_collect fs/nilfs2/segment.c:1540 [inline] nilfs_segctor_do_construct+0x1c28/0x6b90 fs/nilfs2/segment.c:2115 nilfs_segctor_construct+0x181/0x6b0 fs/nilfs2/segment.c:2479 nilfs_segctor_thread_construct fs/nilfs2/segment.c:2587 [inline] nilfs_segctor_thread+0x69e/0xe80 fs/nilfs2/segment.c:2701 kthread+0x2f0/0x390 kernel/kthread.c:389 ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147 ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244 </TASK> Both of these issues are caused by the callbacks that handle the page/folio write requests, forcibly clear various states, including the working state of the buffers they hold, at unexpected times when they detect read-only fallback. Fix these issues by checking if the buffer is referenced before clearing the page/folio state, and skipping the clear if it is. [konishi.ryusuke@gmail.com: adjusted for page/folio conversion] Link: https://lkml.kernel.org/r/20250107200202.6432-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20250107200202.6432-2-konishi.ryusuke@gmail.com Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
<syzbot+b2b14916b77acf8626d7@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=b2b14916b77acf8626d7 Reported-by:
<syzbot+d98fd19acd08b36ff422@syzkaller.appspotmail.com> Link: https://syzkaller.appspot.com/bug?extid=d98fd19acd08b36ff422 Fixes: 8c26c4e2 ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Tested-by:
<syzbot+b2b14916b77acf8626d7@syzkaller.appspotmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit 299910dc upstream. After detecting file system corruption and degrading to a read-only mount, dirty folios and buffers in the page cache are cleared, and a large number of warnings are output at that time, often filling up the kernel log. In this case, since the degrading to a read-only mount is output to the kernel log, these warnings are not very meaningful, and are rather a nuisance in system management and debugging. The related nilfs2-specific page/folio routines have a silent argument that suppresses the warning output, but since it is not currently used meaningfully, remove both the silent argument and the warning output. [konishi.ryusuke@gmail.com: adjusted for page/folio conversion] Link: https://lkml.kernel.org/r/20240816090128.4561-1-konishi.ryusuke@gmail.com Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: ca76bb226bf4 ("nilfs2: do not force clear folio if buffer is referenced") Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit 6438ef381c183444f7f9d1de18f22661cba1e946 upstream. Since nilfs_bmap_lookup_contig() in nilfs_fiemap() calculates its result by being prepared to go through potentially maxblocks == INT_MAX blocks, the value in n may experience an overflow caused by left shift of blkbits. While it is extremely unlikely to occur, play it safe and cast right hand expression to wider type to mitigate the issue. Found by Linux Verification Center (linuxtesting.org) with static analysis tool SVACE. Link: https://lkml.kernel.org/r/20250124222133.5323-1-konishi.ryusuke@gmail.com Fixes: 622daaff ("nilfs2: fiemap support") Signed-off-by:
Nikita Zhandarovich <n.zhandarovich@fintech.ru> Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Jan 14, 2025
-
-
commit 901ce9705fbb9f330ff1f19600e5daf9770b0175 upstream. syzbot reported a WARNING in nilfs_rmdir. [1] Because the inode bitmap is corrupted, an inode with an inode number that should exist as a ".nilfs" file was reassigned by nilfs_mkdir for "file0", causing an inode duplication during execution. And this causes an underflow of i_nlink in rmdir operations. The inode is used twice by the same task to unmount and remove directories ".nilfs" and "file0", it trigger warning in nilfs_rmdir. Avoid to this issue, check i_nlink in nilfs_iget(), if it is 0, it means that this inode has been deleted, and iput is executed to reclaim it. [1] WARNING: CPU: 1 PID: 5824 at fs/inode.c:407 drop_nlink+0xc4/0x110 fs/inode.c:407 ... Call Trace: <TASK> nilfs_rmdir+0x1b0/0x250 fs/nilfs2/namei.c:342 vfs_rmdir+0x3a3/0x510 fs/namei.c:4394 do_rmdir+0x3b5/0x580 fs/namei.c:4453 __do_sys_rmdir fs/namei.c:4472 [inline] __se_sys_rmdir fs/namei.c:4470 [inline] __x64_sys_rmdir+0x47/0x50 fs/namei.c:4470 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f Link: https://lkml.kernel.org/r/20241209065759.6781-1-konishi.ryusuke@gmail.com Fixes: d2500652 ("nilfs2: pathname operations") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
<syzbot+9260555647a5132edd48@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=9260555647a5132edd48 Tested-by:
<syzbot+9260555647a5132edd48@syzkaller.appspotmail.com> Signed-off-by:
Edward Adam Davis <eadavis@qq.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit 6309b8ce98e9a18390b9fd8f03fc412f3c17aee9 upstream. When block_invalidatepage was converted to block_invalidate_folio, the fallback to block_invalidatepage in folio_invalidate() if the address_space_operations method invalidatepage (currently invalidate_folio) was not set, was removed. Unfortunately, some pseudo-inodes in nilfs2 use empty_aops set by inode_init_always_gfp() as is, or explicitly set it to address_space_operations. Therefore, with this change, block_invalidatepage() is no longer called from folio_invalidate(), and as a result, the buffer_head structures attached to these pages/folios are no longer freed via try_to_free_buffers(). Thus, these buffer heads are now leaked by truncate_inode_pages(), which cleans up the page cache from inode evict(), etc. Three types of caches use empty_aops: gc inode caches and the DAT shadow inode used by GC, and b-tree node caches. Of these, b-tree node caches explicitly call invalidate_mapping_pages() during cleanup, which involves calling try_to_free_buffers(), so the leak was not visible during normal operation but worsened when GC was performed. Fix this issue by using address_space_operations with invalidate_folio set to block_invalidate_folio instead of empty_aops, which will ensure the same behavior as before. Link: https://lkml.kernel.org/r/20241212164556.21338-1-konishi.ryusuke@gmail.com Fixes: 7ba13abb ("fs: Turn block_invalidatepage into block_invalidate_folio") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> [5.18+] Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit 985ebec4ab0a28bb5910c3b1481a40fbf7f9e61d upstream. Syzbot reported that when searching for records in a directory where the inode's i_size is corrupted and has a large value, memory access outside the folio/page range may occur, or a use-after-free bug may be detected if KASAN is enabled. This is because nilfs_last_byte(), which is called by nilfs_find_entry() and others to calculate the number of valid bytes of directory data in a page from i_size and the page index, loses the upper 32 bits of the 64-bit size information due to an inappropriate type of local variable to which the i_size value is assigned. This caused a large byte offset value due to underflow in the end address calculation in the calling nilfs_find_entry(), resulting in memory access that exceeds the folio/page size. Fix this issue by changing the type of the local variable causing the bit loss from "unsigned int" to "u64". The return value of nilfs_last_byte() is also of type "unsigned int", but it is truncated so as not to exceed PAGE_SIZE and no bit loss occurs, so no change is required. Link: https://lkml.kernel.org/r/20241119172403.9292-1-konishi.ryusuke@gmail.com Fixes: 2ba466d7 ("nilfs2: directory entry operations") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
<syzbot+96d5d14c47d97015c624@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=96d5d14c47d97015c624 Tested-by:
<syzbot+96d5d14c47d97015c624@syzkaller.appspotmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit 2026559a upstream. When using the "block:block_dirty_buffer" tracepoint, mark_buffer_dirty() may cause a NULL pointer dereference, or a general protection fault when KASAN is enabled. This happens because, since the tracepoint was added in mark_buffer_dirty(), it references the dev_t member bh->b_bdev->bd_dev regardless of whether the buffer head has a pointer to a block_device structure. In the current implementation, nilfs_grab_buffer(), which grabs a buffer to read (or create) a block of metadata, including b-tree node blocks, does not set the block device, but instead does so only if the buffer is not in the "uptodate" state for each of its caller block reading functions. However, if the uptodate flag is set on a folio/page, and the buffer heads are detached from it by try_to_free_buffers(), and new buffer heads are then attached by create_empty_buffers(), the uptodate flag may be restored to each buffer without the block device being set to bh->b_bdev, and mark_buffer_dirty() may be called later in that state, resulting in the bug mentioned above. Fix this issue by making nilfs_grab_buffer() always set the block device of the super block structure to the buffer head, regardless of the state of the buffer's uptodate flag. Link: https://lkml.kernel.org/r/20241106160811.3316-3-konishi.ryusuke@gmail.com Fixes: 5305cb83 ("block: add block_{touch|dirty}_buffer tracepoint") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: Ubisectech Sirius <bugreport@valiantsec.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit cd45e963 upstream. Patch series "nilfs2: fix null-ptr-deref bugs on block tracepoints". This series fixes null pointer dereference bugs that occur when using nilfs2 and two block-related tracepoints. This patch (of 2): It has been reported that when using "block:block_touch_buffer" tracepoint, touch_buffer() called from __nilfs_get_folio_block() causes a NULL pointer dereference, or a general protection fault when KASAN is enabled. This happens because since the tracepoint was added in touch_buffer(), it references the dev_t member bh->b_bdev->bd_dev regardless of whether the buffer head has a pointer to a block_device structure. In the current implementation, the block_device structure is set after the function returns to the caller. Here, touch_buffer() is used to mark the folio/page that owns the buffer head as accessed, but the common search helper for folio/page used by the caller function was optimized to mark the folio/page as accessed when it was reimplemented a long time ago, eliminating the need to call touch_buffer() here in the first place. So this solves the issue by eliminating the touch_buffer() call itself. Link: https://lkml.kernel.org/r/20241106160811.3316-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20241106160811.3316-2-konishi.ryusuke@gmail.com Fixes: 5305cb83 ("block: add block_{touch|dirty}_buffer tracepoint") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
Ubisectech Sirius <bugreport@valiantsec.com> Closes: https://lkml.kernel.org/r/86bd3013-887e-4e38-960f-ca45c657f032.bugreport@valiantsec.com Reported-by:
<syzbot+9982fb8d18eba905abe2@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=9982fb8d18eba905abe2 Tested-by:
<syzbot+9982fb8d18eba905abe2@syzkaller.appspotmail.com> Cc: Tejun Heo <tj@kernel.org> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit 41e192ad upstream. Syzbot reported that in directory operations after nilfs2 detects filesystem corruption and degrades to read-only, __block_write_begin_int(), which is called to prepare block writes, may fail the BUG_ON check for accesses exceeding the folio/page size, triggering a kernel bug. This was found to be because the "checked" flag of a page/folio was not cleared when it was discarded by nilfs2's own routine, which causes the sanity check of directory entries to be skipped when the directory page/folio is reloaded. So, fix that. This was necessary when the use of nilfs2's own page discard routine was applied to more than just metadata files. Link: https://lkml.kernel.org/r/20241017193359.5051-1-konishi.ryusuke@gmail.com Fixes: 8c26c4e2 ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
<syzbot+d6ca2daf692c7a82f959@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=d6ca2daf692c7a82f959 Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit b3a033e3 upstream. Syzbot reported that page_symlink(), called by nilfs_symlink(), triggers memory reclamation involving the filesystem layer, which can result in circular lock dependencies among the reader/writer semaphore nilfs->ns_segctor_sem, s_writers percpu_rwsem (intwrite) and the fs_reclaim pseudo lock. This is because after commit 21fc61c7 ("don't put symlink bodies in pagecache into highmem"), the gfp flags of the page cache for symbolic links are overwritten to GFP_KERNEL via inode_nohighmem(). This is not a problem for symlinks read from the backing device, because the __GFP_FS flag is dropped after inode_nohighmem() is called. However, when a new symlink is created with nilfs_symlink(), the gfp flags remain overwritten to GFP_KERNEL. Then, memory allocation called from page_symlink() etc. triggers memory reclamation including the FS layer, which may call nilfs_evict_inode() or nilfs_dirty_inode(). And these can cause a deadlock if they are called while nilfs->ns_segctor_sem is held: Fix this issue by dropping the __GFP_FS flag from the page cache GFP flags of newly created symlinks in the same way that nilfs_new_inode() and __nilfs_read_inode() do, as a workaround until we adopt nofs allocation scope consistently or improve the locking constraints. Link: https://lkml.kernel.org/r/20241020050003.4308-1-konishi.ryusuke@gmail.com Fixes: 21fc61c7 ("don't put symlink bodies in pagecache into highmem") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
<syzbot+9ef37ac20608f4836256@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=9ef37ac20608f4836256 Tested-by:
<syzbot+9ef37ac20608f4836256@syzkaller.appspotmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit 6ed469df upstream. Syzbot reported that after nilfs2 reads a corrupted file system image and degrades to read-only, the BUG_ON check for the buffer delay flag in submit_bh_wbc() may fail, causing a kernel bug. This is because the buffer delay flag is not cleared when clearing the buffer state flags to discard a page/folio or a buffer head. So, fix this. This became necessary when the use of nilfs2's own page clear routine was expanded. This state inconsistency does not occur if the buffer is written normally by log writing. Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Link: https://lore.kernel.org/r/20241015213300.7114-1-konishi.ryusuke@gmail.com Fixes: 8c26c4e2 ("nilfs2: fix issue with flush kernel thread after remount in RO mode because of driver's internal error or metadata corruption") Reported-by:
<syzbot+985ada84bf055a575c07@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=985ada84bf055a575c07 Cc: stable@vger.kernel.org Signed-off-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit 08cfa12a upstream. Syzbot reported that a task hang occurs in vcs_open() during a fuzzing test for nilfs2. The root cause of this problem is that in nilfs_find_entry(), which searches for directory entries, ignores errors when loading a directory page/folio via nilfs_get_folio() fails. If the filesystem images is corrupted, and the i_size of the directory inode is large, and the directory page/folio is successfully read but fails the sanity check, for example when it is zero-filled, nilfs_check_folio() may continue to spit out error messages in bursts. Fix this issue by propagating the error to the callers when loading a page/folio fails in nilfs_find_entry(). The current interface of nilfs_find_entry() and its callers is outdated and cannot propagate error codes such as -EIO and -ENOMEM returned via nilfs_find_entry(), so fix it together. Link: https://lkml.kernel.org/r/20241004033640.6841-1-konishi.ryusuke@gmail.com Fixes: 2ba466d7 ("nilfs2: directory entry operations") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
Lizhi Xu <lizhi.xu@windriver.com> Closes: https://lkml.kernel.org/r/20240927013806.3577931-1-lizhi.xu@windriver.com Reported-by:
<syzbot+8a192e8d090fa9a31135@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=8a192e8d090fa9a31135 Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
[ Upstream commit f9c96351 ] The function nilfs_btree_check_delete(), which checks whether degeneration to direct mapping occurs before deleting a b-tree entry, causes memory access outside the block buffer when retrieving the maximum key if the root node has no entries. This does not usually happen because b-tree mappings with 0 child nodes are never created by mkfs.nilfs2 or nilfs2 itself. However, it can happen if the b-tree root node read from a device is configured that way, so fix this potential issue by adding a check for that case. Link: https://lkml.kernel.org/r/20240904081401.16682-4-konishi.ryusuke@gmail.com Fixes: 17c76b01 ("nilfs2: B-tree based block mapping") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Lizhi Xu <lizhi.xu@windriver.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
[ Upstream commit 111b812d ] Due to the nature of b-trees, nilfs2 itself and admin tools such as mkfs.nilfs2 will never create an intermediate b-tree node block with 0 child nodes, nor will they delete (key, pointer)-entries that would result in such a state. However, it is possible that a b-tree node block is corrupted on the backing device and is read with 0 child nodes. Because operation is not guaranteed if the number of child nodes is 0 for intermediate node blocks other than the root node, modify nilfs_btree_node_broken(), which performs sanity checks when reading a b-tree node block, so that such cases will be judged as metadata corruption. Link: https://lkml.kernel.org/r/20240904081401.16682-3-konishi.ryusuke@gmail.com Fixes: 17c76b01 ("nilfs2: B-tree based block mapping") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Lizhi Xu <lizhi.xu@windriver.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
[ Upstream commit 9403001a ] Patch series "nilfs2: fix potential issues with empty b-tree nodes". This series addresses three potential issues with empty b-tree nodes that can occur with corrupted filesystem images, including one recently discovered by syzbot. This patch (of 3): If a b-tree is broken on the device, and the b-tree height is greater than 2 (the level of the root node is greater than 1) even if the number of child nodes of the b-tree root is 0, a NULL pointer dereference occurs in nilfs_btree_prepare_insert(), which is called from nilfs_btree_insert(). This is because, when the number of child nodes of the b-tree root is 0, nilfs_btree_do_lookup() does not set the block buffer head in any of path[x].bp_bh, leaving it as the initial value of NULL, but if the level of the b-tree root node is greater than 1, nilfs_btree_get_nonroot_node(), which accesses the buffer memory of path[x].bp_bh, is called. Fix this issue by adding a check to nilfs_btree_root_broken(), which performs sanity checks when reading the root node from the device, to detect this inconsistency. Thanks to Lizhi Xu for trying to solve the bug and clarifying the cause early on. Link: https://lkml.kernel.org/r/20240904081401.16682-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20240902084101.138971-1-lizhi.xu@windriver.com Link: https://lkml.kernel.org/r/20240904081401.16682-2-konishi.ryusuke@gmail.com Fixes: 17c76b01 ("nilfs2: B-tree based block mapping") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
<syzbot+9bff4c7b992038a7409f@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=9bff4c7b992038a7409f Cc: Lizhi Xu <lizhi.xu@windriver.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
commit 6576dd66 upstream. After commit a694291a ("nilfs2: separate wait function from nilfs_segctor_write") was applied, the log writing function nilfs_segctor_do_construct() was able to issue I/O requests continuously even if user data blocks were split into multiple logs across segments, but two potential flaws were introduced in its error handling. First, if nilfs_segctor_begin_construction() fails while creating the second or subsequent logs, the log writing function returns without calling nilfs_segctor_abort_construction(), so the writeback flag set on pages/folios will remain uncleared. This causes page cache operations to hang waiting for the writeback flag. For example, truncate_inode_pages_final(), which is called via nilfs_evict_inode() when an inode is evicted from memory, will hang. Second, the NILFS_I_COLLECTED flag set on normal inodes remain uncleared. As a result, if the next log write involves checkpoint creation, that's fine, but if a partial log write is performed that does not, inodes with NILFS_I_COLLECTED set are erroneously removed from the "sc_dirty_files" list, and their data and b-tree blocks may not be written to the device, corrupting the block mapping. Fix these issues by uniformly calling nilfs_segctor_abort_construction() on failure of each step in the loop in nilfs_segctor_do_construct(), having it clean up logs and segment usages according to progress, and correcting the conditions for calling nilfs_redirty_inodes() to ensure that the NILFS_I_COLLECTED flag is cleared. Link: https://lkml.kernel.org/r/20240814101119.4070-1-konishi.ryusuke@gmail.com Fixes: a694291a ("nilfs2: separate wait function from nilfs_segctor_write") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit 68340825 upstream. The superblock buffers of nilfs2 can not only be overwritten at runtime for modifications/repairs, but they are also regularly swapped, replaced during resizing, and even abandoned when degrading to one side due to backing device issues. So, accessing them requires mutual exclusion using the reader/writer semaphore "nilfs->ns_sem". Some sysfs attribute show methods read this superblock buffer without the necessary mutual exclusion, which can cause problems with pointer dereferencing and memory access, so fix it. Link: https://lkml.kernel.org/r/20240811100320.9913-1-konishi.ryusuke@gmail.com Fixes: da7141fb ("nilfs2: add /sys/fs/nilfs2/<device> group") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit 5787fcaa upstream. In an error injection test of a routine for mount-time recovery, KASAN found a use-after-free bug. It turned out that if data recovery was performed using partial logs created by dsync writes, but an error occurred before starting the log writer to create a recovered checkpoint, the inodes whose data had been recovered were left in the ns_dirty_files list of the nilfs object and were not freed. Fix this issue by cleaning up inodes that have read the recovery data if the recovery routine fails midway before the log writer starts. Link: https://lkml.kernel.org/r/20240810065242.3701-1-konishi.ryusuke@gmail.com Fixes: 0f3e1c7f ("nilfs2: recovery functions") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Sep 17, 2024
-
-
Tetsuo Handa authored
[ Upstream commit 73970316 ] nilfs_btree_assign_p() and nilfs_direct_assign_p() are not initializing "struct nilfs_binfo_dat"->bi_pad field, causing uninit-value reports when being passed to CRC function. Link: https://lkml.kernel.org/r/20230326152146.15872-1-konishi.ryusuke@gmail.com Reported-by:
syzbot <syzbot+048585f3f4227bb2b49b@syzkaller.appspotmail.com> Link: https://syzkaller.appspot.com/bug?extid=048585f3f4227bb2b49b Reported-by:
Dipanjan Das <mail.dipanjan.das@gmail.com> Link: https://lkml.kernel.org/r/CANX2M5bVbzRi6zH3PTcNE_31TzerstOXUa9Bay4E6y6dX23_pg@mail.gmail.com Signed-off-by:
Tetsuo Handa <penguin-kernel@I-love.SAKURA.ne.jp> Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Alexander Potapenko <glider@google.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Ryusuke Konishi authored
[ Upstream commit 602ce7b8 ] If nilfs2 reads a corrupted disk image and its DAT metadata file contains invalid lifetime data for a virtual block number, a kernel warning can be generated by the WARN_ON check in nilfs_dat_commit_end() and can panic if the kernel is booted with panic_on_warn. This patch avoids the issue with a sanity check that treats it as an error. Since error return is not allowed in the execution phase of nilfs_dat_commit_end(), this inserts that sanity check in nilfs_dat_prepare_end(), which prepares for nilfs_dat_commit_end(). As the error code, -EINVAL is returned to notify bmap layer of the metadata corruption. When the bmap layer sees this code, it handles the abnormal situation and replaces the return code with -EIO as it should. Link: https://lkml.kernel.org/r/000000000000154d2c05e9ec7df6@google.com Link: https://lkml.kernel.org/r/20230127132202.6083-1-konishi.ryusuke@gmail.com Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
<syzbot+cbff7a52b6f99059e67f@syzkaller.appspotmail.com> Tested-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
- Aug 12, 2024
-
-
commit 4811f7af upstream. Syzbot reported that a buffer state inconsistency was detected in nilfs_btnode_create_block(), triggering a kernel bug. It is not appropriate to treat this inconsistency as a bug; it can occur if the argument block address (the buffer index of the newly created block) is a virtual block number and has been reallocated due to corruption of the bitmap used to manage its allocation state. So, modify nilfs_btnode_create_block() and its callers to treat it as a possible filesystem error, rather than triggering a kernel bug. Link: https://lkml.kernel.org/r/20240725052007.4562-1-konishi.ryusuke@gmail.com Fixes: a60be987 ("nilfs2: B-tree node cache") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
<syzbot+89cc4f2324ed37988b60@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=89cc4f2324ed37988b60 Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
[ Upstream commit 0f3819e8 ] According to the C standard 3.4.3p3, the result of signed integer overflow is undefined. The macro nilfs_cnt32_ge(), which compares two sequence numbers, uses signed integer subtraction that can overflow, and therefore the result of the calculation may differ from what is expected due to undefined behavior in different environments. Similar to an earlier change to the jiffies-related comparison macros in commit 5a581b36 ("jiffies: Avoid undefined behavior from signed overflow"), avoid this potential issue by changing the definition of the macro to perform the subtraction as unsigned integers, then cast the result to a signed integer for comparison. Link: https://lkml.kernel.org/r/20130727225828.GA11864@linux.vnet.ibm.com Link: https://lkml.kernel.org/r/20240702183512.6390-1-konishi.ryusuke@gmail.com Fixes: 9ff05123 ("nilfs2: segment constructor") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
commit a9e1ddc0 upstream. Syzbot reported that in rename directory operation on broken directory on nilfs2, __block_write_begin_int() called to prepare block write may fail BUG_ON check for access exceeding the folio/page size. This is because nilfs_dotdot(), which gets parent directory reference entry ("..") of the directory to be moved or renamed, does not check consistency enough, and may return location exceeding folio/page size for broken directories. Fix this issue by checking required directory entries ("." and "..") in the first chunk of the directory in nilfs_dotdot(). Link: https://lkml.kernel.org/r/20240628165107.9006-1-konishi.ryusuke@gmail.com Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
<syzbot+d3abed1ad3d367fa2627@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=d3abed1ad3d367fa2627 Fixes: 2ba466d7 ("nilfs2: directory entry operations") Tested-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit 93aef9ed upstream. If the bitmap block that manages the inode allocation status is corrupted, nilfs_ifile_create_inode() may allocate a new inode from the reserved inode area where it should not be allocated. Previous fix commit d325dc6e ("nilfs2: fix use-after-free bug of struct nilfs_root"), fixed the problem that reserved inodes with inode numbers less than NILFS_USER_INO (=11) were incorrectly reallocated due to bitmap corruption, but since the start number of non-reserved inodes is read from the super block and may change, in which case inode allocation may occur from the extended reserved inode area. If that happens, access to that inode will cause an IO error, causing the file system to degrade to an error state. Fix this potential issue by adding a wraparound option to the common metadata object allocation routine and by modifying nilfs_ifile_create_inode() to disable the option so that it only allocates inodes with inode numbers greater than or equal to the inode number read in "nilfs->ns_first_ino", regardless of the bitmap status of reserved inodes. Link: https://lkml.kernel.org/r/20240623051135.4180-4-konishi.ryusuke@gmail.com Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit bb76c6c2 upstream. Syzbot reported that mounting and unmounting a specific pattern of corrupted nilfs2 filesystem images causes a use-after-free of metadata file inodes, which triggers a kernel bug in lru_add_fn(). As Jan Kara pointed out, this is because the link count of a metadata file gets corrupted to 0, and nilfs_evict_inode(), which is called from iput(), tries to delete that inode (ifile inode in this case). The inconsistency occurs because directories containing the inode numbers of these metadata files that should not be visible in the namespace are read without checking. Fix this issue by treating the inode numbers of these internal files as errors in the sanity check helper when reading directory folios/pages. Also thanks to Hillf Danton and Matthew Wilcox for their initial mm-layer analysis. Link: https://lkml.kernel.org/r/20240623051135.4180-3-konishi.ryusuke@gmail.com Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
<syzbot+d79afb004be235636ee8@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=d79afb004be235636ee8 Reported-by:
Jan Kara <jack@suse.cz> Closes: https://lkml.kernel.org/r/20240617075758.wewhukbrjod5fp5o@quack3 Tested-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit e2fec219 upstream. Patch series "nilfs2: fix potential issues related to reserved inodes". This series fixes one use-after-free issue reported by syzbot, caused by nilfs2's internal inode being exposed in the namespace on a corrupted filesystem, and a couple of flaws that cause problems if the starting number of non-reserved inodes written in the on-disk super block is intentionally (or corruptly) changed from its default value. This patch (of 3): In the current implementation of nilfs2, "nilfs->ns_first_ino", which gives the first non-reserved inode number, is read from the superblock, but its lower limit is not checked. As a result, if a number that overlaps with the inode number range of reserved inodes such as the root directory or metadata files is set in the super block parameter, the inode number test macros (NILFS_MDT_INODE and NILFS_VALID_INODE) will not function properly. In addition, these test macros use left bit-shift calculations using with the inode number as the shift count via the BIT macro, but the result of a shift calculation that exceeds the bit width of an integer is undefined in the C specification, so if "ns_first_ino" is set to a large value other than the default value NILFS_USER_INO (=11), the macros may potentially malfunction depending on the environment. Fix these issues by checking the lower bound of "nilfs->ns_first_ino" and by preventing bit shifts equal to or greater than the NILFS_USER_INO constant in the inode number test macros. Also, change the type of "ns_first_ino" from signed integer to unsigned integer to avoid the need for type casting in comparisons such as the lower bound check introduced this time. Link: https://lkml.kernel.org/r/20240623051135.4180-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20240623051135.4180-2-konishi.ryusuke@gmail.com Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: Hillf Danton <hdanton@sina.com> Cc: Jan Kara <jack@suse.cz> Cc: Matthew Wilcox (Oracle) <willy@infradead.org> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Jul 11, 2024
-
-
commit a4ca369c upstream. Destructive writes to a block device on which nilfs2 is mounted can cause a kernel bug in the folio/page writeback start routine or writeback end routine (__folio_start_writeback in the log below): kernel BUG at mm/page-writeback.c:3070! Oops: invalid opcode: 0000 [#1] PREEMPT SMP KASAN PTI ... RIP: 0010:__folio_start_writeback+0xbaa/0x10e0 Code: 25 ff 0f 00 00 0f 84 18 01 00 00 e8 40 ca c6 ff e9 17 f6 ff ff e8 36 ca c6 ff 4c 89 f7 48 c7 c6 80 c0 12 84 e8 e7 b3 0f 00 90 <0f> 0b e8 1f ca c6 ff 4c 89 f7 48 c7 c6 a0 c6 12 84 e8 d0 b3 0f 00 ... Call Trace: <TASK> nilfs_segctor_do_construct+0x4654/0x69d0 [nilfs2] nilfs_segctor_construct+0x181/0x6b0 [nilfs2] nilfs_segctor_thread+0x548/0x11c0 [nilfs2] kthread+0x2f0/0x390 ret_from_fork+0x4b/0x80 ret_from_fork_asm+0x1a/0x30 </TASK> This is because when the log writer starts a writeback for segment summary blocks or a super root block that use the backing device's page cache, it does not wait for the ongoing folio/page writeback, resulting in an inconsistent writeback state. Fix this issue by waiting for ongoing writebacks when putting folios/pages on the backing device into writeback state. Link: https://lkml.kernel.org/r/20240530141556.4411-1-konishi.ryusuke@gmail.com Fixes: 9ff05123 ("nilfs2: segment constructor") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
[ Upstream commit 7373a51e ] The error handling in nilfs_empty_dir() when a directory folio/page read fails is incorrect, as in the old ext2 implementation, and if the folio/page cannot be read or nilfs_check_folio() fails, it will falsely determine the directory as empty and corrupt the file system. In addition, since nilfs_empty_dir() does not immediately return on a failed folio/page read, but continues to loop, this can cause a long loop with I/O if i_size of the directory's inode is also corrupted, causing the log writer thread to wait and hang, as reported by syzbot. Fix these issues by making nilfs_empty_dir() immediately return a false value (0) if it fails to get a directory folio/page. Link: https://lkml.kernel.org/r/20240604134255.7165-1-konishi.ryusuke@gmail.com Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
<syzbot+c8166c541d3971bf6c87@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=c8166c541d3971bf6c87 Fixes: 2ba466d7 ("nilfs2: directory entry operations") Tested-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
[ Upstream commit 09a46acb ] In prepartion for switching from kmap() to kmap_local(), return the kmap address from nilfs_get_page() instead of having the caller look up page_address(). [konishi.ryusuke: fixed a missing blank line after declaration] Link: https://lkml.kernel.org/r/20231127143036.2425-7-konishi.ryusuke@gmail.com Signed-off-by:
Matthew Wilcox (Oracle) <willy@infradead.org> Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Stable-dep-of: 7373a51e ("nilfs2: fix nilfs_empty_dir() misjudgment and long loop on I/O errors") Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
commit f5d4e046 upstream. Patch series "nilfs2: fix log writer related issues". This bug fix series covers three nilfs2 log writer-related issues, including a timer use-after-free issue and potential deadlock issue on unmount, and a potential freeze issue in event synchronization found during their analysis. Details are described in each commit log. This patch (of 3): A use-after-free issue has been reported regarding the timer sc_timer on the nilfs_sc_info structure. The problem is that even though it is used to wake up a sleeping log writer thread, sc_timer is not shut down until the nilfs_sc_info structure is about to be freed, and is used regardless of the thread's lifetime. Fix this issue by limiting the use of sc_timer only while the log writer thread is alive. Link: https://lkml.kernel.org/r/20240520132621.4054-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20240520132621.4054-2-konishi.ryusuke@gmail.com Fixes: fdce895e ("nilfs2: change sc_timer from a pointer to an embedded one in struct nilfs_sc_info") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
"Bai, Shuangpeng" <sjb7183@psu.edu> Closes: https://groups.google.com/g/syzkaller/c/MK_LYqtt8ko/m/8rgdWeseAwAJ Tested-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
[ Upstream commit c473bcdd ] clang-14 points out that v_size is always smaller than a 64KB page size if that is configured by the CPU architecture: fs/nilfs2/ioctl.c:63:19: error: result of comparison of constant 65536 with expression of type '__u16' (aka 'unsigned short') is always false [-Werror,-Wtautological-constant-out-of-range-compare] if (argv->v_size > PAGE_SIZE) ~~~~~~~~~~~~ ^ ~~~~~~~~~ This is ok, so just shut up that warning with a cast. Signed-off-by:
Arnd Bergmann <arnd@arndb.de> Link: https://lore.kernel.org/r/20240328143051.1069575-7-arnd@kernel.org Fixes: 3358b4aa ("nilfs2: fix problems of memory allocation in ioctl") Acked-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reviewed-by:
Justin Stitt <justinstitt@google.com> Signed-off-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
commit eb85dace upstream. Syzbot has reported a potential hang in nilfs_detach_log_writer() called during nilfs2 unmount. Analysis revealed that this is because nilfs_segctor_sync(), which synchronizes with the log writer thread, can be called after nilfs_segctor_destroy() terminates that thread, as shown in the call trace below: nilfs_detach_log_writer nilfs_segctor_destroy nilfs_segctor_kill_thread --> Shut down log writer thread flush_work nilfs_iput_work_func nilfs_dispose_list iput nilfs_evict_inode nilfs_transaction_commit nilfs_construct_segment (if inode needs sync) nilfs_segctor_sync --> Attempt to synchronize with log writer thread *** DEADLOCK *** Fix this issue by changing nilfs_segctor_sync() so that the log writer thread returns normally without synchronizing after it terminates, and by forcing tasks that are already waiting to complete once after the thread terminates. The skipped inode metadata flushout will then be processed together in the subsequent cleanup work in nilfs_segctor_destroy(). Link: https://lkml.kernel.org/r/20240520132621.4054-4-konishi.ryusuke@gmail.com Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
<syzbot+e3973c409251e136fdd0@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=e3973c409251e136fdd0 Tested-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Cc: "Bai, Shuangpeng" <sjb7183@psu.edu> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit 936184ea upstream. A potential and reproducible race issue has been identified where nilfs_segctor_sync() would block even after the log writer thread writes a checkpoint, unless there is an interrupt or other trigger to resume log writing. This turned out to be because, depending on the execution timing of the log writer thread running in parallel, the log writer thread may skip responding to nilfs_segctor_sync(), which causes a call to schedule() waiting for completion within nilfs_segctor_sync() to lose the opportunity to wake up. The reason why waking up the task waiting in nilfs_segctor_sync() may be skipped is that updating the request generation issued using a shared sequence counter and adding an wait queue entry to the request wait queue to the log writer, are not done atomically. There is a possibility that log writing and request completion notification by nilfs_segctor_wakeup() may occur between the two operations, and in that case, the wait queue entry is not yet visible to nilfs_segctor_wakeup() and the wake-up of nilfs_segctor_sync() will be carried over until the next request occurs. Fix this issue by performing these two operations simultaneously within the lock section of sc_state_lock. Also, following the memory barrier guidelines for event waiting loops, move the call to set_current_state() in the same location into the event waiting loop to ensure that a memory barrier is inserted just before the event condition determination. Link: https://lkml.kernel.org/r/20240520132621.4054-3-konishi.ryusuke@gmail.com Fixes: 9ff05123 ("nilfs2: segment constructor") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Cc: "Bai, Shuangpeng" <sjb7183@psu.edu> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- May 13, 2024
-
-
commit c4a7dc95 upstream. The size of the nilfs_type_by_mode array in the fs/nilfs2/dir.c file is defined as "S_IFMT >> S_SHIFT", but the nilfs_set_de_type() function, which uses this array, specifies the index to read from the array in the same way as "(mode & S_IFMT) >> S_SHIFT". static void nilfs_set_de_type(struct nilfs_dir_entry *de, struct inode *inode) { umode_t mode = inode->i_mode; de->file_type = nilfs_type_by_mode[(mode & S_IFMT)>>S_SHIFT]; // oob } However, when the index is determined this way, an out-of-bounds (OOB) error occurs by referring to an index that is 1 larger than the array size when the condition "mode & S_IFMT == S_IFMT" is satisfied. Therefore, a patch to resize the nilfs_type_by_mode array should be applied to prevent OOB errors. Link: https://lkml.kernel.org/r/20240415182048.7144-1-konishi.ryusuke@gmail.com Reported-by:
<syzbot+2e22057de05b9f3b30d8@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=2e22057de05b9f3b30d8 Fixes: 2ba466d7 ("nilfs2: directory entry operations") Signed-off-by:
Jeongjun Park <aha310510@gmail.com> Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Tested-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Apr 11, 2024
-
-
[ Upstream commit 269cdf35 ] Fix a bug where nilfs_get_block() returns a successful status when searching and inserting the specified block both fail inconsistently. If this inconsistent behavior is not due to a previously fixed bug, then an unexpected race is occurring, so return a temporary error -EAGAIN instead. This prevents callers such as __block_write_begin_int() from requesting a read into a buffer that is not mapped, which would cause the BUG_ON check for the BH_Mapped flag in submit_bh_wbc() to fail. Link: https://lkml.kernel.org/r/20240313105827.5296-3-konishi.ryusuke@gmail.com Fixes: 1f5abe7e ("nilfs2: replace BUG_ON and BUG calls triggerable from ioctl") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
[ Upstream commit f2f26b4a ] Patch series "nilfs2: fix kernel bug at submit_bh_wbc()". This resolves a kernel BUG reported by syzbot. Since there are two flaws involved, I've made each one a separate patch. The first patch alone resolves the syzbot-reported bug, but I think both fixes should be sent to stable, so I've tagged them as such. This patch (of 2): Syzbot has reported a kernel bug in submit_bh_wbc() when writing file data to a nilfs2 file system whose metadata is corrupted. There are two flaws involved in this issue. The first flaw is that when nilfs_get_block() locates a data block using btree or direct mapping, if the disk address translation routine nilfs_dat_translate() fails with internal code -ENOENT due to DAT metadata corruption, it can be passed back to nilfs_get_block(). This causes nilfs_get_block() to misidentify an existing block as non-existent, causing both data block lookup and insertion to fail inconsistently. The second flaw is that nilfs_get_block() returns a successful status in this inconsistent state. This causes the caller __block_write_begin_int() or others to request a read even though the buffer is not mapped, resulting in a BUG_ON check for the BH_Mapped flag in submit_bh_wbc() failing. This fixes the first issue by changing the return value to code -EINVAL when a conversion using DAT fails with code -ENOENT, avoiding the conflicting condition that leads to the kernel bug described above. Here, code -EINVAL indicates that metadata corruption was detected during the block lookup, which will be properly handled as a file system error and converted to -EIO when passing through the nilfs2 bmap layer. Link: https://lkml.kernel.org/r/20240313105827.5296-1-konishi.ryusuke@gmail.com Link: https://lkml.kernel.org/r/20240313105827.5296-2-konishi.ryusuke@gmail.com Fixes: c3a7abf0 ("nilfs2: support contiguous lookup of blocks") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
<syzbot+cfed5b56649bddf80d6e@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=cfed5b56649bddf80d6e Tested-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
- Mar 11, 2024
-
-
commit 5124a0a5 upstream. If DAT metadata file block access fails due to corruption of the DAT file or abnormal virtual block numbers held by b-trees or inodes, a kernel warning is generated. This replaces the WARN_ONs by error output, so that a kernel, booted with panic_on_warn, does not panic. This patch also replaces the detected return code -ENOENT with another internal code -EINVAL to notify the bmap layer of metadata corruption. When the bmap layer sees -EINVAL, it handles the abnormal situation with nilfs_bmap_convert_error() and finally returns code -EIO as it should. Link: https://lkml.kernel.org/r/0000000000005cc3d205ea23ddcf@google.com Link: https://lkml.kernel.org/r/20230126164114.6911-1-konishi.ryusuke@gmail.com Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
<syzbot+5d5d25f90f195a3cfcb4@syzkaller.appspotmail.com> Tested-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit 5bc09b39 upstream. According to a syzbot report, end_buffer_async_write(), which handles the completion of block device writes, may detect abnormal condition of the buffer async_write flag and cause a BUG_ON failure when using nilfs2. Nilfs2 itself does not use end_buffer_async_write(). But, the async_write flag is now used as a marker by commit 7f42ec39 ("nilfs2: fix issue with race condition of competition between segments for dirty blocks") as a means of resolving double list insertion of dirty blocks in nilfs_lookup_dirty_data_buffers() and nilfs_lookup_node_buffers() and the resulting crash. This modification is safe as long as it is used for file data and b-tree node blocks where the page caches are independent. However, it was irrelevant and redundant to also introduce async_write for segment summary and super root blocks that share buffers with the backing device. This led to the possibility that the BUG_ON check in end_buffer_async_write would fail as described above, if independent writebacks of the backing device occurred in parallel. The use of async_write for segment summary buffers has already been removed in a previous change. Fix this issue by removing the manipulation of the async_write flag for the remaining super root block buffer. Link: https://lkml.kernel.org/r/20240203161645.4992-1-konishi.ryusuke@gmail.com Fixes: 7f42ec39 ("nilfs2: fix issue with race condition of competition between segments for dirty blocks") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
<syzbot+5c04210f7c7f897c1e7f@syzkaller.appspotmail.com> Closes: https://lkml.kernel.org/r/00000000000019a97c05fd42f8c8@google.com Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit 38296afe upstream. Syzbot reported a hang issue in migrate_pages_batch() called by mbind() and nilfs_lookup_dirty_data_buffers() called in the log writer of nilfs2. While migrate_pages_batch() locks a folio and waits for the writeback to complete, the log writer thread that should bring the writeback to completion picks up the folio being written back in nilfs_lookup_dirty_data_buffers() that it calls for subsequent log creation and was trying to lock the folio. Thus causing a deadlock. In the first place, it is unexpected that folios/pages in the middle of writeback will be updated and become dirty. Nilfs2 adds a checksum to verify the validity of the log being written and uses it for recovery at mount, so data changes during writeback are suppressed. Since this is broken, an unclean shutdown could potentially cause recovery to fail. Investigation revealed that the root cause is that the wait for writeback completion in nilfs_page_mkwrite() is conditional, and if the backing device does not require stable writes, data may be modified without waiting. Fix these issues by making nilfs_page_mkwrite() wait for writeback to finish regardless of the stable write requirement of the backing device. Link: https://lkml.kernel.org/r/20240131145657.4209-1-konishi.ryusuke@gmail.com Fixes: 1d1d1a76 ("mm: only enforce stable page writes if the backing device requires it") Signed-off-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Reported-by:
<syzbot+ee2ae68da3b22d04cd8d@syzkaller.appspotmail.com> Closes: https://lkml.kernel.org/r/00000000000047d819061004ad6c@google.com Tested-by:
Ryusuke Konishi <konishi.ryusuke@gmail.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-