- Feb 03, 2025
-
-
commit 23dfdb56 upstream. The following kernel trace can be triggered with fstest generic/629 when executed against a filesystem with fast-commit feature enabled: INFO: trying to register non-static key. The code is fine but needs lockdep annotation, or maybe you didn't initialize this object before use? turning off the locking correctness validator. CPU: 0 PID: 866 Comm: mount Not tainted 6.10.0+ #11 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.2-3-gd478f380-prebuilt.qemu.org 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x66/0x90 register_lock_class+0x759/0x7d0 __lock_acquire+0x85/0x2630 ? __find_get_block+0xb4/0x380 lock_acquire+0xd1/0x2d0 ? __ext4_journal_get_write_access+0xd5/0x160 _raw_spin_lock+0x33/0x40 ? __ext4_journal_get_write_access+0xd5/0x160 __ext4_journal_get_write_access+0xd5/0x160 ext4_reserve_inode_write+0x61/0xb0 __ext4_mark_inode_dirty+0x79/0x270 ? ext4_ext_replay_set_iblocks+0x2f8/0x450 ext4_ext_replay_set_iblocks+0x330/0x450 ext4_fc_replay+0x14c8/0x1540 ? jread+0x88/0x2e0 ? rcu_is_watching+0x11/0x40 do_one_pass+0x447/0xd00 jbd2_journal_recover+0x139/0x1b0 jbd2_journal_load+0x96/0x390 ext4_load_and_init_journal+0x253/0xd40 ext4_fill_super+0x2cc6/0x3180 ... In the replay path there's an attempt to lock sbi->s_bdev_wb_lock in function ext4_check_bdev_write_error(). Unfortunately, at this point this spinlock has not been initialized yet. Moving it's initialization to an earlier point in __ext4_fill_super() fixes this splat. Signed-off-by:
Luis Henriques (SUSE) <luis.henriques@linux.dev> Link: https://patch.msgid.link/20240718094356.7863-1-luis.henriques@linux.dev Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by:
Bruno VERNAY <bruno.vernay@se.com> Signed-off-by:
Victor Giraud <vgiraud.opensource@witekio.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
- Jan 14, 2025
-
-
commit 4a622e4d477bb12ad5ed4abbc7ad1365de1fa347 upstream. The original implementation ext4's FS_IOC_GETFSMAP handling only worked when the range of queried blocks included at least one free (unallocated) block range. This is because how the metadata blocks were emitted was as a side effect of ext4_mballoc_query_range() calling ext4_getfsmap_datadev_helper(), and that function was only called when a free block range was identified. As a result, this caused generic/365 to fail. Fix this by creating a new function ext4_getfsmap_meta_helper() which gets called so that blocks before the first free block range in a block group can get properly reported. Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Cc: stable@vger.kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit 902cc179c931a033cd7f4242353aa2733bf8524c upstream. find_group_other() and find_group_orlov() read *_lo, *_hi with ext4_free_inodes_count without additional locking. This can cause data-race warning, but since the lock is held for most writes and free inodes value is generally not a problem even if it is incorrect, it is more appropriate to use READ_ONCE()/WRITE_ONCE() than to add locking. ================================================================== BUG: KCSAN: data-race in ext4_free_inodes_count / ext4_free_inodes_set write to 0xffff88810404300e of 2 bytes by task 6254 on cpu 1: ext4_free_inodes_set+0x1f/0x80 fs/ext4/super.c:405 __ext4_new_inode+0x15ca/0x2200 fs/ext4/ialloc.c:1216 ext4_symlink+0x242/0x5a0 fs/ext4/namei.c:3391 vfs_symlink+0xca/0x1d0 fs/namei.c:4615 do_symlinkat+0xe3/0x340 fs/namei.c:4641 __do_sys_symlinkat fs/namei.c:4657 [inline] __se_sys_symlinkat fs/namei.c:4654 [inline] __x64_sys_symlinkat+0x5e/0x70 fs/namei.c:4654 x64_sys_call+0x1dda/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:267 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e read to 0xffff88810404300e of 2 bytes by task 6257 on cpu 0: ext4_free_inodes_count+0x1c/0x80 fs/ext4/super.c:349 find_group_other fs/ext4/ialloc.c:594 [inline] __ext4_new_inode+0x6ec/0x2200 fs/ext4/ialloc.c:1017 ext4_symlink+0x242/0x5a0 fs/ext4/namei.c:3391 vfs_symlink+0xca/0x1d0 fs/namei.c:4615 do_symlinkat+0xe3/0x340 fs/namei.c:4641 __do_sys_symlinkat fs/namei.c:4657 [inline] __se_sys_symlinkat fs/namei.c:4654 [inline] __x64_sys_symlinkat+0x5e/0x70 fs/namei.c:4654 x64_sys_call+0x1dda/0x2d60 arch/x86/include/generated/asm/syscalls_64.h:267 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0x54/0x120 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x76/0x7e Cc: stable@vger.kernel.org Signed-off-by:
Jeongjun Park <aha310510@gmail.com> Reviewed-by:
Andreas Dilger <adilger@dilger.ca> Link: https://patch.msgid.link/20241003125337.47283-1-aha310510@gmail.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
[ Upstream commit 76486b104168ae59703190566e372badf433314b ] When we remount filesystem with 'abort' mount option while changing other mount options as well (as is LTP test doing), we can return error from the system call after commit d3476f3d ("ext4: don't set SB_RDONLY after filesystem errors") because the application of mount option changes detects shutdown filesystem and refuses to do anything. The behavior of application of other mount options in presence of 'abort' mount option is currently rather arbitary as some mount option changes are handled before 'abort' and some after it. Move aborting of the filesystem to the end of remount handling so all requested changes are properly applied before the filesystem is shutdown to have a reasonably consistent behavior. Fixes: d3476f3d ("ext4: don't set SB_RDONLY after filesystem errors") Reported-by:
Jan Stancek <jstancek@redhat.com> Link: https://lore.kernel.org/all/Zvp6L+oFnfASaoHl@t14s Signed-off-by:
Jan Kara <jack@suse.cz> Tested-by:
Jan Stancek <jstancek@redhat.com> Link: https://patch.msgid.link/20241004221556.19222-1-jack@suse.cz Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
[ Upstream commit 22b8d707 ] 'abort' mount option is the only mount option that has special handling and sets a bit in sbi->s_mount_flags. There is not strong reason for that so just simplify the code and make 'abort' set a bit in sbi->s_mount_opt2 as any other mount option. This simplifies the code and will allow us to drop EXT4_MF_FS_ABORTED completely in the following patch. Signed-off-by:
Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230616165109.21695-4-jack@suse.cz Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 76486b104168 ("ext4: avoid remount errors with 'abort' mount option") Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
commit 0ce160c5 upstream. Syzbot has found an ODEBUG bug in ext4_fill_super The del_timer_sync function cancels the s_err_report timer, which reminds about filesystem errors daily. We should guarantee the timer is no longer active before kfree(sbi). When filesystem mounting fails, the flow goes to failed_mount3, where an error occurs when ext4_stop_mmpd is called, causing a read I/O failure. This triggers the ext4_handle_error function that ultimately re-arms the timer, leaving the s_err_report timer active before kfree(sbi) is called. Fix the issue by canceling the s_err_report timer after calling ext4_stop_mmpd. Signed-off-by:
Xiaxi Shen <shenxiaxi26@gmail.com> Reported-and-tested-by:
<syzbot+59e0101c430934bc9a36@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=59e0101c430934bc9a36 Link: https://patch.msgid.link/20240715043336.98097-1-shenxiaxi26@gmail.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by:
Xiangyu Chen <xiangyu.chen@windriver.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
[ Upstream commit d1bc560e ] Add nested locking with I_MUTEX_XATTR subclass to avoid lockdep warning while handling xattr inode on file open syscall at ext4_xattr_inode_iget. Backtrace EXT4-fs (loop0): Ignoring removed oldalloc option ====================================================== WARNING: possible circular locking dependency detected 5.10.0-syzkaller #0 Not tainted ------------------------------------------------------ syz-executor543/2794 is trying to acquire lock: ffff8880215e1a48 (&ea_inode->i_rwsem#7/1){+.+.}-{3:3}, at: inode_lock include/linux/fs.h:782 [inline] ffff8880215e1a48 (&ea_inode->i_rwsem#7/1){+.+.}-{3:3}, at: ext4_xattr_inode_iget+0x42a/0x5c0 fs/ext4/xattr.c:425 but task is already holding lock: ffff8880215e3278 (&ei->i_data_sem/3){++++}-{3:3}, at: ext4_setattr+0x136d/0x19c0 fs/ext4/inode.c:5559 which lock already depends on the new lock. the existing dependency chain (in reverse order) is: -> #1 (&ei->i_data_sem/3){++++}-{3:3}: lock_acquire+0x197/0x480 kernel/locking/lockdep.c:5566 down_write+0x93/0x180 kernel/locking/rwsem.c:1564 ext4_update_i_disksize fs/ext4/ext4.h:3267 [inline] ext4_xattr_inode_write fs/ext4/xattr.c:1390 [inline] ext4_xattr_inode_lookup_create fs/ext4/xattr.c:1538 [inline] ext4_xattr_set_entry+0x331a/0x3d80 fs/ext4/xattr.c:1662 ext4_xattr_ibody_set+0x124/0x390 fs/ext4/xattr.c:2228 ext4_xattr_set_handle+0xc27/0x14e0 fs/ext4/xattr.c:2385 ext4_xattr_set+0x219/0x390 fs/ext4/xattr.c:2498 ext4_xattr_user_set+0xc9/0xf0 fs/ext4/xattr_user.c:40 __vfs_setxattr+0x404/0x450 fs/xattr.c:177 __vfs_setxattr_noperm+0x11d/0x4f0 fs/xattr.c:208 __vfs_setxattr_locked+0x1f9/0x210 fs/xattr.c:266 vfs_setxattr+0x112/0x2c0 fs/xattr.c:283 setxattr+0x1db/0x3e0 fs/xattr.c:548 path_setxattr+0x15a/0x240 fs/xattr.c:567 __do_sys_setxattr fs/xattr.c:582 [inline] __se_sys_setxattr fs/xattr.c:578 [inline] __x64_sys_setxattr+0xc5/0xe0 fs/xattr.c:578 do_syscall_64+0x6d/0xa0 arch/x86/entry/common.c:62 entry_SYSCALL_64_after_hwframe+0x61/0xcb -> #0 (&ea_inode->i_rwsem#7/1){+.+.}-{3:3}: check_prev_add kernel/locking/lockdep.c:2988 [inline] check_prevs_add kernel/locking/lockdep.c:3113 [inline] validate_chain+0x1695/0x58f0 kernel/locking/lockdep.c:3729 __lock_acquire+0x12fd/0x20d0 kernel/locking/lockdep.c:4955 lock_acquire+0x197/0x480 kernel/locking/lockdep.c:5566 down_write+0x93/0x180 kernel/locking/rwsem.c:1564 inode_lock include/linux/fs.h:782 [inline] ext4_xattr_inode_iget+0x42a/0x5c0 fs/ext4/xattr.c:425 ext4_xattr_inode_get+0x138/0x410 fs/ext4/xattr.c:485 ext4_xattr_move_to_block fs/ext4/xattr.c:2580 [inline] ext4_xattr_make_inode_space fs/ext4/xattr.c:2682 [inline] ext4_expand_extra_isize_ea+0xe70/0x1bb0 fs/ext4/xattr.c:2774 __ext4_expand_extra_isize+0x304/0x3f0 fs/ext4/inode.c:5898 ext4_try_to_expand_extra_isize fs/ext4/inode.c:5941 [inline] __ext4_mark_inode_dirty+0x591/0x810 fs/ext4/inode.c:6018 ext4_setattr+0x1400/0x19c0 fs/ext4/inode.c:5562 notify_change+0xbb6/0xe60 fs/attr.c:435 do_truncate+0x1de/0x2c0 fs/open.c:64 handle_truncate fs/namei.c:2970 [inline] do_open fs/namei.c:3311 [inline] path_openat+0x29f3/0x3290 fs/namei.c:3425 do_filp_open+0x20b/0x450 fs/namei.c:3452 do_sys_openat2+0x124/0x460 fs/open.c:1207 do_sys_open fs/open.c:1223 [inline] __do_sys_open fs/open.c:1231 [inline] __se_sys_open fs/open.c:1227 [inline] __x64_sys_open+0x221/0x270 fs/open.c:1227 do_syscall_64+0x6d/0xa0 arch/x86/entry/common.c:62 entry_SYSCALL_64_after_hwframe+0x61/0xcb other info that might help us debug this: Possible unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&ei->i_data_sem/3); lock(&ea_inode->i_rwsem#7/1); lock(&ei->i_data_sem/3); lock(&ea_inode->i_rwsem#7/1); *** DEADLOCK *** 5 locks held by syz-executor543/2794: #0: ffff888026fbc448 (sb_writers#4){.+.+}-{0:0}, at: mnt_want_write+0x4a/0x2a0 fs/namespace.c:365 #1: ffff8880215e3488 (&sb->s_type->i_mutex_key#7){++++}-{3:3}, at: inode_lock include/linux/fs.h:782 [inline] #1: ffff8880215e3488 (&sb->s_type->i_mutex_key#7){++++}-{3:3}, at: do_truncate+0x1cf/0x2c0 fs/open.c:62 #2: ffff8880215e3310 (&ei->i_mmap_sem){++++}-{3:3}, at: ext4_setattr+0xec4/0x19c0 fs/ext4/inode.c:5519 #3: ffff8880215e3278 (&ei->i_data_sem/3){++++}-{3:3}, at: ext4_setattr+0x136d/0x19c0 fs/ext4/inode.c:5559 #4: ffff8880215e30c8 (&ei->xattr_sem){++++}-{3:3}, at: ext4_write_trylock_xattr fs/ext4/xattr.h:162 [inline] #4: ffff8880215e30c8 (&ei->xattr_sem){++++}-{3:3}, at: ext4_try_to_expand_extra_isize fs/ext4/inode.c:5938 [inline] #4: ffff8880215e30c8 (&ei->xattr_sem){++++}-{3:3}, at: __ext4_mark_inode_dirty+0x4fb/0x810 fs/ext4/inode.c:6018 stack backtrace: CPU: 1 PID: 2794 Comm: syz-executor543 Not tainted 5.10.0-syzkaller #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 03/27/2024 Call Trace: __dump_stack lib/dump_stack.c:77 [inline] dump_stack+0x177/0x211 lib/dump_stack.c:118 print_circular_bug+0x146/0x1b0 kernel/locking/lockdep.c:2002 check_noncircular+0x2cc/0x390 kernel/locking/lockdep.c:2123 check_prev_add kernel/locking/lockdep.c:2988 [inline] check_prevs_add kernel/locking/lockdep.c:3113 [inline] validate_chain+0x1695/0x58f0 kernel/locking/lockdep.c:3729 __lock_acquire+0x12fd/0x20d0 kernel/locking/lockdep.c:4955 lock_acquire+0x197/0x480 kernel/locking/lockdep.c:5566 down_write+0x93/0x180 kernel/locking/rwsem.c:1564 inode_lock include/linux/fs.h:782 [inline] ext4_xattr_inode_iget+0x42a/0x5c0 fs/ext4/xattr.c:425 ext4_xattr_inode_get+0x138/0x410 fs/ext4/xattr.c:485 ext4_xattr_move_to_block fs/ext4/xattr.c:2580 [inline] ext4_xattr_make_inode_space fs/ext4/xattr.c:2682 [inline] ext4_expand_extra_isize_ea+0xe70/0x1bb0 fs/ext4/xattr.c:2774 __ext4_expand_extra_isize+0x304/0x3f0 fs/ext4/inode.c:5898 ext4_try_to_expand_extra_isize fs/ext4/inode.c:5941 [inline] __ext4_mark_inode_dirty+0x591/0x810 fs/ext4/inode.c:6018 ext4_setattr+0x1400/0x19c0 fs/ext4/inode.c:5562 notify_change+0xbb6/0xe60 fs/attr.c:435 do_truncate+0x1de/0x2c0 fs/open.c:64 handle_truncate fs/namei.c:2970 [inline] do_open fs/namei.c:3311 [inline] path_openat+0x29f3/0x3290 fs/namei.c:3425 do_filp_open+0x20b/0x450 fs/namei.c:3452 do_sys_openat2+0x124/0x460 fs/open.c:1207 do_sys_open fs/open.c:1223 [inline] __do_sys_open fs/open.c:1231 [inline] __se_sys_open fs/open.c:1227 [inline] __x64_sys_open+0x221/0x270 fs/open.c:1227 do_syscall_64+0x6d/0xa0 arch/x86/entry/common.c:62 entry_SYSCALL_64_after_hwframe+0x61/0xcb RIP: 0033:0x7f0cde4ea229 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 21 18 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007ffd81d1c978 EFLAGS: 00000246 ORIG_RAX: 0000000000000002 RAX: ffffffffffffffda RBX: 0030656c69662f30 RCX: 00007f0cde4ea229 RDX: 0000000000000089 RSI: 00000000000a0a00 RDI: 00000000200001c0 RBP: 2f30656c69662f2e R08: 0000000000208000 R09: 0000000000208000 R10: 0000000000000000 R11: 0000000000000246 R12: 00007ffd81d1c9c0 R13: 00007ffd81d1ca00 R14: 0000000000080000 R15: 0000000000000003 EXT4-fs error (device loop0): ext4_expand_extra_isize_ea:2730: inode #13: comm syz-executor543: corrupted in-inode xattr Signed-off-by:
Wojciech Gładysz <wojciech.gladysz@infogain.com> Link: https://patch.msgid.link/20240801143827.19135-1-wojciech.gladysz@infogain.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
[ Upstream commit d3476f3d ] When the filesystem is mounted with errors=remount-ro, we were setting SB_RDONLY flag to stop all filesystem modifications. We knew this misses proper locking (sb->s_umount) and does not go through proper filesystem remount procedure but it has been the way this worked since early ext2 days and it was good enough for catastrophic situation damage mitigation. Recently, syzbot has found a way (see link) to trigger warnings in filesystem freezing because the code got confused by SB_RDONLY changing under its hands. Since these days we set EXT4_FLAGS_SHUTDOWN on the superblock which is enough to stop all filesystem modifications, modifying SB_RDONLY shouldn't be needed. So stop doing that. Link: https://lore.kernel.org/all/000000000000b90a8e061e21d12f@google.com Reported-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Christian Brauner <brauner@kernel.org> Link: https://patch.msgid.link/20240805201241.27286-1-jack@suse.cz Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
commit 04e6ce8f upstream. Calling ext4_fc_mark_ineligible() with a NULL handle is racy and may result in a fast-commit being done before the filesystem is effectively marked as ineligible. This patch moves the call to this function so that an handle can be used. If a transaction fails to start, then there's not point in trying to mark the filesystem as ineligible, and an error will eventually be returned to user-space. Suggested-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by:
Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240923104909.18342-3-luis.henriques@linux.dev Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit faab35a0 upstream. Calling ext4_fc_mark_ineligible() with a NULL handle is racy and may result in a fast-commit being done before the filesystem is effectively marked as ineligible. This patch fixes the calls to this function in __track_dentry_update() by adding an extra parameter to the callback used in ext4_fc_track_template(). Suggested-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by:
Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240923104909.18342-2-luis.henriques@linux.dev Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit 6db3c157 upstream. When a full journal commit is on-going, any fast commit has to be enqueued into a different queue: FC_Q_STAGING instead of FC_Q_MAIN. This enqueueing is done only once, i.e. if an inode is already queued in a previous fast commit entry it won't be enqueued again. However, if a full commit starts _after_ the inode is enqueued into FC_Q_MAIN, the next fast commit needs to be done into FC_Q_STAGING. And this is not being done in function ext4_fc_track_template(). This patch fixes the issue by re-enqueuing an inode into the STAGING queue during the fast commit clean-up callback when doing a full commit. However, to prevent a race with a fast-commit, the clean-up callback has to be called with the journal locked. This bug was found using fstest generic/047. This test creates several 32k bytes files, sync'ing each of them after it's creation, and then shutting down the filesystem. Some data may be loss in this operation; for example a file may have it's size truncated to zero. Suggested-by:
Jan Kara <jack@suse.cz> Signed-off-by:
Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by:
Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240717172220.14201-1-luis.henriques@linux.dev Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit dd589b0f upstream. Function ext4_wait_for_tail_page_commit() assumes that '0' is not a valid value for transaction IDs, which is incorrect. Don't assume that and invoke jbd2_log_wait_commit() if the journal had a committing transaction instead. Signed-off-by:
Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by:
Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240724161119.13448-2-luis.henriques@linux.dev Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit 5b4b2dca upstream. In ext4_find_extent(), if the path is not big enough, we free it and set *orig_path to NULL. But after reallocating and successfully initializing the path, we don't update *orig_path, in which case the caller gets a valid path but a NULL ppath, and this may cause a NULL pointer dereference or a path memory leak. For example: ext4_split_extent path = *ppath = 2000 ext4_find_extent if (depth > path[0].p_maxdepth) kfree(path = 2000); *orig_path = path = NULL; path = kcalloc() = 3000 ext4_split_extent_at(*ppath = NULL) path = *ppath; ex = path[depth].p_ext; // NULL pointer dereference! ================================================================== BUG: kernel NULL pointer dereference, address: 0000000000000010 CPU: 6 UID: 0 PID: 576 Comm: fsstress Not tainted 6.11.0-rc2-dirty #847 RIP: 0010:ext4_split_extent_at+0x6d/0x560 Call Trace: <TASK> ext4_split_extent.isra.0+0xcb/0x1b0 ext4_ext_convert_to_initialized+0x168/0x6c0 ext4_ext_handle_unwritten_extents+0x325/0x4d0 ext4_ext_map_blocks+0x520/0xdb0 ext4_map_blocks+0x2b0/0x690 ext4_iomap_begin+0x20e/0x2c0 [...] ================================================================== Therefore, *orig_path is updated when the extent lookup succeeds, so that the caller can safely use path or *ppath. Fixes: 10809df8 ("ext4: teach ext4_ext_find_extent() to realloc path if necessary") Cc: stable@kernel.org Signed-off-by:
Baokun Li <libaokun1@huawei.com> Reviewed-by:
Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240822023545.1994557-6-libaokun@huaweicloud.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit dcaa6c31 upstream. In ext4_ext_try_to_merge_up(), set path[1].p_bh to NULL after it has been released, otherwise it may be released twice. An example of what triggers this is as follows: split2 map split1 |--------|-------|--------| ext4_ext_map_blocks ext4_ext_handle_unwritten_extents ext4_split_convert_extents // path->p_depth == 0 ext4_split_extent // 1. do split1 ext4_split_extent_at |ext4_ext_insert_extent | ext4_ext_create_new_leaf | ext4_ext_grow_indepth | le16_add_cpu(&neh->eh_depth, 1) | ext4_find_extent | // return -ENOMEM |// get error and try zeroout |path = ext4_find_extent | path->p_depth = 1 |ext4_ext_try_to_merge | ext4_ext_try_to_merge_up | path->p_depth = 0 | brelse(path[1].p_bh) ---> not set to NULL here |// zeroout success // 2. update path ext4_find_extent // 3. do split2 ext4_split_extent_at ext4_ext_insert_extent ext4_ext_create_new_leaf ext4_ext_grow_indepth le16_add_cpu(&neh->eh_depth, 1) ext4_find_extent path[0].p_bh = NULL; path->p_depth = 1 read_extent_tree_block ---> return err // path[1].p_bh is still the old value ext4_free_ext_path ext4_ext_drop_refs // path->p_depth == 1 brelse(path[1].p_bh) ---> brelse a buffer twice Finally got the following WARRNING when removing the buffer from lru: ============================================ VFS: brelse: Trying to free free buffer WARNING: CPU: 2 PID: 72 at fs/buffer.c:1241 __brelse+0x58/0x90 CPU: 2 PID: 72 Comm: kworker/u19:1 Not tainted 6.9.0-dirty #716 RIP: 0010:__brelse+0x58/0x90 Call Trace: <TASK> __find_get_block+0x6e7/0x810 bdev_getblk+0x2b/0x480 __ext4_get_inode_loc+0x48a/0x1240 ext4_get_inode_loc+0xb2/0x150 ext4_reserve_inode_write+0xb7/0x230 __ext4_mark_inode_dirty+0x144/0x6a0 ext4_ext_insert_extent+0x9c8/0x3230 ext4_ext_map_blocks+0xf45/0x2dc0 ext4_map_blocks+0x724/0x1700 ext4_do_writepages+0x12d6/0x2a70 [...] ============================================ Fixes: ecb94f5f ("ext4: collapse a single extent tree block into the inode if possible") Cc: stable@kernel.org Signed-off-by:
Baokun Li <libaokun1@huawei.com> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by:
Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-9-libaokun@huaweicloud.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit a164f3a4 upstream. As Ojaswin mentioned in Link, in ext4_ext_insert_extent(), if the path is reallocated in ext4_ext_create_new_leaf(), we'll use the stale path and cause UAF. Below is a sample trace with dummy values: ext4_ext_insert_extent path = *ppath = 2000 ext4_ext_create_new_leaf(ppath) ext4_find_extent(ppath) path = *ppath = 2000 if (depth > path[0].p_maxdepth) kfree(path = 2000); *ppath = path = NULL; path = kcalloc() = 3000 *ppath = 3000; return path; /* here path is still 2000, UAF! */ eh = path[depth].p_hdr ================================================================== BUG: KASAN: slab-use-after-free in ext4_ext_insert_extent+0x26d4/0x3330 Read of size 8 at addr ffff8881027bf7d0 by task kworker/u36:1/179 CPU: 3 UID: 0 PID: 179 Comm: kworker/u6:1 Not tainted 6.11.0-rc2-dirty #866 Call Trace: <TASK> ext4_ext_insert_extent+0x26d4/0x3330 ext4_ext_map_blocks+0xe22/0x2d40 ext4_map_blocks+0x71e/0x1700 ext4_do_writepages+0x1290/0x2800 [...] Allocated by task 179: ext4_find_extent+0x81c/0x1f70 ext4_ext_map_blocks+0x146/0x2d40 ext4_map_blocks+0x71e/0x1700 ext4_do_writepages+0x1290/0x2800 ext4_writepages+0x26d/0x4e0 do_writepages+0x175/0x700 [...] Freed by task 179: kfree+0xcb/0x240 ext4_find_extent+0x7c0/0x1f70 ext4_ext_insert_extent+0xa26/0x3330 ext4_ext_map_blocks+0xe22/0x2d40 ext4_map_blocks+0x71e/0x1700 ext4_do_writepages+0x1290/0x2800 ext4_writepages+0x26d/0x4e0 do_writepages+0x175/0x700 [...] ================================================================== So use *ppath to update the path to avoid the above problem. Reported-by:
Ojaswin Mujoo <ojaswin@linux.ibm.com> Closes: https://lore.kernel.org/r/ZqyL6rmtwl6N4MWR@li-bb2b2a4c-3307-11b2-a85c-8fa5c3a69313.ibm.com Fixes: 10809df8 ("ext4: teach ext4_ext_find_extent() to realloc path if necessary") Cc: stable@kernel.org Signed-off-by:
Baokun Li <libaokun1@huawei.com> Reviewed-by:
Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240822023545.1994557-7-libaokun@huaweicloud.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit 5c0f4cc8 upstream. When calling ext4_force_split_extent_at() in ext4_ext_replay_update_ex(), the 'ppath' is updated but it is the 'path' that is freed, thus potentially triggering a double-free in the following process: ext4_ext_replay_update_ex ppath = path ext4_force_split_extent_at(&ppath) ext4_split_extent_at ext4_ext_insert_extent ext4_ext_create_new_leaf ext4_ext_grow_indepth ext4_find_extent if (depth > path[0].p_maxdepth) kfree(path) ---> path First freed *orig_path = path = NULL ---> null ppath kfree(path) ---> path double-free !!! So drop the unnecessary ppath and use path directly to avoid this problem. And use ext4_find_extent() directly to update path, avoiding unnecessary memory allocation and freeing. Also, propagate the error returned by ext4_find_extent() instead of using strange error codes. Fixes: 8016e29f ("ext4: fast commit recovery path") Cc: stable@kernel.org Signed-off-by:
Baokun Li <libaokun1@huawei.com> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by:
Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-8-libaokun@huaweicloud.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit dda898d7 upstream. The dax_iomap_rw() does two things in each iteration: map written blocks and copy user data to blocks. If the process is killed by user(See signal handling in dax_iomap_iter()), the copied data will be returned and added on inode size, which means that the length of written extents may exceed the inode size, then fsck will fail. An example is given as: dd if=/dev/urandom of=file bs=4M count=1 dax_iomap_rw iomap_iter // round 1 ext4_iomap_begin ext4_iomap_alloc // allocate 0~2M extents(written flag) dax_iomap_iter // copy 2M data iomap_iter // round 2 iomap_iter_advance iter->pos += iter->processed // iter->pos = 2M ext4_iomap_begin ext4_iomap_alloc // allocate 2~4M extents(written flag) dax_iomap_iter fatal_signal_pending done = iter->pos - iocb->ki_pos // done = 2M ext4_handle_inode_extension ext4_update_inode_size // inode size = 2M fsck reports: Inode 13, i_size is 2097152, should be 4194304. Fix? Fix the problem by truncating extents if the written length is smaller than expected. Fixes: 776722e8 ("ext4: DAX iomap write support") CC: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=219136 Signed-off-by:
Zhihao Cheng <chengzhihao1@huawei.com> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Zhihao Cheng <chengzhihao1@huawei.com> Link: https://patch.msgid.link/20240809121532.2105494-1-chengzhihao@huaweicloud.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit ebc4b2c1 upstream. Function jbd2_journal_shrink_checkpoint_list() assumes that '0' is not a valid value for transaction IDs, which is incorrect. Furthermore, the sbi->s_fc_ineligible_tid handling also makes the same assumption by being initialised to '0'. Fortunately, the sb flag EXT4_MF_FC_INELIGIBLE can be used to check whether sbi->s_fc_ineligible_tid has been previously set instead of comparing it with '0'. Signed-off-by:
Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by:
Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240724161119.13448-5-luis.henriques@linux.dev Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit 369c944e upstream. Even though ext4_find_extent() returns an error, ext4_insert_range() still returns 0. This may confuse the user as to why fallocate returns success, but the contents of the file are not as expected. So propagate the error returned by ext4_find_extent() to avoid inconsistencies. Fixes: 331573fe ("ext4: Add support FALLOC_FL_INSERT_RANGE for fallocate") Cc: stable@kernel.org Signed-off-by:
Baokun Li <libaokun1@huawei.com> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by:
Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-11-libaokun@huaweicloud.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit c26ab357 upstream. We hit the following use-after-free: ================================================================== BUG: KASAN: slab-use-after-free in ext4_split_extent_at+0xba8/0xcc0 Read of size 2 at addr ffff88810548ed08 by task kworker/u20:0/40 CPU: 0 PID: 40 Comm: kworker/u20:0 Not tainted 6.9.0-dirty #724 Call Trace: <TASK> kasan_report+0x93/0xc0 ext4_split_extent_at+0xba8/0xcc0 ext4_split_extent.isra.0+0x18f/0x500 ext4_split_convert_extents+0x275/0x750 ext4_ext_handle_unwritten_extents+0x73e/0x1580 ext4_ext_map_blocks+0xe20/0x2dc0 ext4_map_blocks+0x724/0x1700 ext4_do_writepages+0x12d6/0x2a70 [...] Allocated by task 40: __kmalloc_noprof+0x1ac/0x480 ext4_find_extent+0xf3b/0x1e70 ext4_ext_map_blocks+0x188/0x2dc0 ext4_map_blocks+0x724/0x1700 ext4_do_writepages+0x12d6/0x2a70 [...] Freed by task 40: kfree+0xf1/0x2b0 ext4_find_extent+0xa71/0x1e70 ext4_ext_insert_extent+0xa22/0x3260 ext4_split_extent_at+0x3ef/0xcc0 ext4_split_extent.isra.0+0x18f/0x500 ext4_split_convert_extents+0x275/0x750 ext4_ext_handle_unwritten_extents+0x73e/0x1580 ext4_ext_map_blocks+0xe20/0x2dc0 ext4_map_blocks+0x724/0x1700 ext4_do_writepages+0x12d6/0x2a70 [...] ================================================================== The flow of issue triggering is as follows: ext4_split_extent_at path = *ppath ext4_ext_insert_extent(ppath) ext4_ext_create_new_leaf(ppath) ext4_find_extent(orig_path) path = *orig_path read_extent_tree_block // return -ENOMEM or -EIO ext4_free_ext_path(path) kfree(path) *orig_path = NULL a. If err is -ENOMEM: ext4_ext_dirty(path + path->p_depth) // path use-after-free !!! b. If err is -EIO and we have EXT_DEBUG defined: ext4_ext_show_leaf(path) eh = path[depth].p_hdr // path also use-after-free !!! So when trying to zeroout or fix the extent length, call ext4_find_extent() to update the path. In addition we use *ppath directly as an ext4_ext_show_leaf() input to avoid possible use-after-free when EXT_DEBUG is defined, and to avoid unnecessary path updates. Fixes: dfe50809 ("ext4: drop EXT4_EX_NOFREE_ON_ERR from rest of extents handling code") Cc: stable@kernel.org Signed-off-by:
Baokun Li <libaokun1@huawei.com> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by:
Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-4-libaokun@huaweicloud.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit 70dd7b57 upstream. EXT4_DIRENT_HASH and EXT4_DIRENT_MINOR_HASH will access struct ext4_dir_entry_hash followed ext4_dir_entry. But there is no ext4_dir_entry_hash followed when inode is encrypted and not casefolded Signed-off-by:
yao.ly <yao.ly@linux.alibaba.com> Link: https://patch.msgid.link/1719816219-128287-1-git-send-email-yao.ly@linux.alibaba.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
commit 1a00a393 upstream. Fixes: ac27a0ec ("[PATCH] ext4: initial copy of files from ext3") Reported-by:
<syzbot+ae688d469e36fb5138d0@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=ae688d469e36fb5138d0 Signed-off-by:
Edward Adam Davis <eadavis@qq.com> Reported-and-tested-by:
<syzbot+ae688d469e36fb5138d0@syzkaller.appspotmail.com> Link: https://patch.msgid.link/tencent_BE7AEE6C7C2D216CB8949CE8E6EE7ECC2C0A@qq.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Cc: stable@kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
[ Upstream commit cc749e61 ] Fuzzing reports a possible deadlock in jbd2_log_wait_commit. This issue is triggered when an EXT4_IOC_MIGRATE ioctl is set to require synchronous updates because the file descriptor is opened with O_SYNC. This can lead to the jbd2_journal_stop() function calling jbd2_might_wait_for_commit(), potentially causing a deadlock if the EXT4_IOC_MIGRATE call races with a write(2) system call. This problem only arises when CONFIG_PROVE_LOCKING is enabled. In this case, the jbd2_might_wait_for_commit macro locks jbd2_handle in the jbd2_journal_stop function while i_data_sem is locked. This triggers lockdep because the jbd2_journal_start function might also lock the same jbd2_handle simultaneously. Found by Linux Verification Center (linuxtesting.org) with syzkaller. Reviewed-by:
Ritesh Harjani (IBM) <ritesh.list@gmail.com> Co-developed-by:
Mikhail Ukhin <mish.uxin2012@yandex.ru> Signed-off-by:
Mikhail Ukhin <mish.uxin2012@yandex.ru> Signed-off-by:
Artem Sadovnikov <ancowi69@gmail.com> Rule: add Link: https://lore.kernel.org/stable/20240404095000.5872-1-mish.uxin2012%40yandex.ru Link: https://patch.msgid.link/20240829152210.2754-1-ancowi69@gmail.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
[ Upstream commit 4e2524ba ] In ext4_find_extent(), path may be freed by error or be reallocated, so using a previously saved *ppath may have been freed and thus may trigger use-after-free, as follows: ext4_split_extent path = *ppath; ext4_split_extent_at(ppath) path = ext4_find_extent(ppath) ext4_split_extent_at(ppath) // ext4_find_extent fails to free path // but zeroout succeeds ext4_ext_show_leaf(inode, path) eh = path[depth].p_hdr // path use-after-free !!! Similar to ext4_split_extent_at(), we use *ppath directly as an input to ext4_ext_show_leaf(). Fix a spelling error by the way. Same problem in ext4_ext_handle_unwritten_extents(). Since 'path' is only used in ext4_ext_show_leaf(), remove 'path' and use *ppath directly. This issue is triggered only when EXT_DEBUG is defined and therefore does not affect functionality. Signed-off-by:
Baokun Li <libaokun1@huawei.com> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Ojaswin Mujoo <ojaswin@linux.ibm.com> Tested-by:
Ojaswin Mujoo <ojaswin@linux.ibm.com> Link: https://patch.msgid.link/20240822023545.1994557-5-libaokun@huaweicloud.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
[ Upstream commit cd69f8f9 ] ext4_search_dir currently returns -1 in case of a failure, while it returns 0 when the name is not found. In such failure cases, it should return an error code instead. This becomes even more important when ext4_find_inline_entry returns an error code as well in the next commit. -EFSCORRUPTED seems appropriate as such error code as these failures would be caused by unexpected record lengths and is in line with other instances of ext4_check_dir_entry failures. In the case of ext4_dx_find_entry, the current use of ERR_BAD_DX_DIR was left as is to reduce the risk of regressions. Signed-off-by:
Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Link: https://patch.msgid.link/20240821152324.3621860-2-cascardo@igalia.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
[ Upstream commit c6b72f5d ] When looking up for an entry in an inlined directory, if e_value_offs is changed underneath the filesystem by some change in the block device, it will lead to an out-of-bounds access that KASAN detects as an UAF. EXT4-fs (loop0): mounted filesystem 00000000-0000-0000-0000-000000000000 r/w without journal. Quota mode: none. loop0: detected capacity change from 2048 to 2047 ================================================================== BUG: KASAN: use-after-free in ext4_search_dir+0xf2/0x1c0 fs/ext4/namei.c:1500 Read of size 1 at addr ffff88803e91130f by task syz-executor269/5103 CPU: 0 UID: 0 PID: 5103 Comm: syz-executor269 Not tainted 6.11.0-rc4-syzkaller #0 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-debian-1.16.3-2~bpo12+1 04/01/2014 Call Trace: <TASK> __dump_stack lib/dump_stack.c:93 [inline] dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119 print_address_description mm/kasan/report.c:377 [inline] print_report+0x169/0x550 mm/kasan/report.c:488 kasan_report+0x143/0x180 mm/kasan/report.c:601 ext4_search_dir+0xf2/0x1c0 fs/ext4/namei.c:1500 ext4_find_inline_entry+0x4be/0x5e0 fs/ext4/inline.c:1697 __ext4_find_entry+0x2b4/0x1b30 fs/ext4/namei.c:1573 ext4_lookup_entry fs/ext4/namei.c:1727 [inline] ext4_lookup+0x15f/0x750 fs/ext4/namei.c:1795 lookup_one_qstr_excl+0x11f/0x260 fs/namei.c:1633 filename_create+0x297/0x540 fs/namei.c:3980 do_symlinkat+0xf9/0x3a0 fs/namei.c:4587 __do_sys_symlinkat fs/namei.c:4610 [inline] __se_sys_symlinkat fs/namei.c:4607 [inline] __x64_sys_symlinkat+0x95/0xb0 fs/namei.c:4607 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f RIP: 0033:0x7f3e73ced469 Code: 28 00 00 00 75 05 48 83 c4 28 c3 e8 21 18 00 00 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 c7 c1 b8 ff ff ff f7 d8 64 89 01 48 RSP: 002b:00007fff4d40c258 EFLAGS: 00000246 ORIG_RAX: 000000000000010a RAX: ffffffffffffffda RBX: 0032656c69662f2e RCX: 00007f3e73ced469 RDX: 0000000020000200 RSI: 00000000ffffff9c RDI: 00000000200001c0 RBP: 0000000000000000 R08: 00007fff4d40c290 R09: 00007fff4d40c290 R10: 0023706f6f6c2f76 R11: 0000000000000246 R12: 00007fff4d40c27c R13: 0000000000000003 R14: 431bde82d7b634db R15: 00007fff4d40c2b0 </TASK> Calling ext4_xattr_ibody_find right after reading the inode with ext4_get_inode_loc will lead to a check of the validity of the xattrs, avoiding this problem. Reported-by:
<syzbot+0c2508114d912a54ee79@syzkaller.appspotmail.com> Closes: https://syzkaller.appspot.com/bug?extid=0c2508114d912a54ee79 Fixes: e8e948e7 ("ext4: let ext4_find_entry handle inline data") Signed-off-by:
Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Link: https://patch.msgid.link/20240821152324.3621860-5-cascardo@igalia.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
[ Upstream commit 4d231b91 ] In case of errors when reading an inode from disk or traversing inline directory entries, return an error-encoded ERR_PTR instead of returning NULL. ext4_find_inline_entry only caller, __ext4_find_entry already returns such encoded errors. Signed-off-by:
Thadeu Lima de Souza Cascardo <cascardo@igalia.com> Link: https://patch.msgid.link/20240821152324.3621860-3-cascardo@igalia.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Stable-dep-of: c6b72f5d ("ext4: avoid OOB when system.data xattr changes underneath the filesystem") Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
[ Upstream commit bb0a12c3 ] min_clusters is signed integer and will be converted to unsigned integer when compared with unsigned number stats.free_clusters. If min_clusters is negative, it will be converted to a huge unsigned value in which case all groups may not meet the actual desired free clusters. Set negative min_clusters to 0 to avoid unexpected behavior. Fixes: ac27a0ec ("[PATCH] ext4: initial copy of files from ext3") Signed-off-by:
Kemeng Shi <shikemeng@huaweicloud.com> Link: https://patch.msgid.link/20240820132234.2759926-4-shikemeng@huaweicloud.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
[ Upstream commit 227d31b9 ] If a group is marked EXT4_GROUP_INFO_IBITMAP_CORRUPT after it's inode bitmap buffer_head was successfully verified, then __ext4_new_inode() will get a valid inode_bitmap_bh of a corrupted group from ext4_read_inode_bitmap() in which case inode_bitmap_bh misses a release. Hnadle "IS_ERR(inode_bitmap_bh)" and group corruption separately like how ext4_free_inode() does to avoid buffer_head leak. Fixes: 9008a58e ("ext4: make the bitmap read routines return real error codes") Signed-off-by:
Kemeng Shi <shikemeng@huaweicloud.com> Link: https://patch.msgid.link/20240820132234.2759926-3-shikemeng@huaweicloud.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
[ Upstream commit 5e5b2a56 ] Release inode_bitmap_bh from ext4_read_inode_bitmap() in ext4_mark_inode_used() to avoid buffer_head leak. By the way, remove unneeded goto for invalid ino when inode_bitmap_bh is NULL. Fixes: 8016e29f ("ext4: fast commit recovery path") Signed-off-by:
Kemeng Shi <shikemeng@huaweicloud.com> Link: https://patch.msgid.link/20240820132234.2759926-2-shikemeng@huaweicloud.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
[ Upstream commit 20cee68f ] Commit 3d56b8d2 ("ext4: Speed up FITRIM by recording flags in ext4_group_info") speed up fstrim by skipping trim trimmed group. We also has the chance to clear trimmed once there exists some block free for this group(mount without discard), and the next trim for this group will work well too. For mount with discard, we will issue dicard when we free blocks, so leave trimmed flag keep alive to skip useless trim trigger from userspace seems reasonable. But for some case like ext4 build on dm-thinpool(ext4 blocksize 4K, pool blocksize 128K), discard from ext4 maybe unaligned for dm thinpool, and thinpool will just finish this discard(see process_discard_bio when begein equals to end) without actually process discard. For this case, trim from userspace can really help us to free some thinpool block. So convert to clear trimmed flag for all case no matter mounted with discard or not. Fixes: 3d56b8d2 ("ext4: Speed up FITRIM by recording flags in ext4_group_info") Signed-off-by:
yangerkun <yangerkun@huawei.com> Reviewed-by:
Jan Kara <jack@suse.cz> Link: https://patch.msgid.link/20240817085510.2084444-1-yangerkun@huaweicloud.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
[ Upstream commit 63469662 ] In the fast commit code there are a few places where tid_t variables are being compared without taking into account the fact that these sequence numbers may wrap. Fix this issue by using the helper functions tid_gt() and tid_geq(). Signed-off-by:
Luis Henriques (SUSE) <luis.henriques@linux.dev> Reviewed-by:
Jan Kara <jack@suse.cz> Reviewed-by:
Harshad Shirwadkar <harshadshirwadkar@gmail.com> Link: https://patch.msgid.link/20240529092030.9557-3-luis.henriques@linux.dev Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
- Sep 17, 2024
-
-
Jan Kara authored
commit 04e568a3 upstream. Since we want to transition transaction commits to use ext4_writepages() for writing back ordered, add handling of page redirtying into ext4_bio_write_page(). Also move buffer dirty bit clearing into the same place other buffer state handling. Reviewed-by:
Ritesh Harjani (IBM) <ritesh.list@gmail.com> Signed-off-by:
Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20221207112722.22220-1-jack@suse.cz Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Eric Biggers authored
commit 8216776c upstream. It is invalid for the casefold inode flag to be set without the casefold superblock feature flag also being set. e2fsck already considers this case to be invalid and handles it by offering to clear the casefold flag on the inode. __ext4_iget() also already considered this to be invalid, sort of, but it only got so far as logging an error message; it didn't actually reject the inode. Make it reject the inode so that other code doesn't have to handle this case. This matches what f2fs does. Note: we could check 's_encoding != NULL' instead of ext4_has_feature_casefold(). This would make the check robust against the casefold feature being enabled by userspace writing to the page cache of the mounted block device. However, it's unsolvable in general for filesystems to be robust against concurrent writes to the page cache of the mounted block device. Though this very particular scenario involving the casefold feature is solvable, we should not pretend that we can support this model, so let's just check the casefold feature. tune2fs already forbids enabling casefold on a mounted filesystem. Signed-off-by:
Eric Biggers <ebiggers@google.com> Link: https://lore.kernel.org/r/20230814182903.37267-2-ebiggers@kernel.org Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
zhanchengbin authored
commit 3f542479 upstream. If ENOMEM fails when the extent is splitting, we need to restore the length of the split extent. In the ext4_split_extent_at function, only in ext4_ext_create_new_leaf will it alloc memory and change the shape of the extent tree,even if an ENOMEM is returned at this time, the extent tree is still self-consistent, Just restore the split extent lens in the function ext4_split_extent_at. ext4_split_extent_at ext4_ext_insert_extent ext4_ext_create_new_leaf 1)ext4_ext_split ext4_find_extent 2)ext4_ext_grow_indepth ext4_find_extent Signed-off-by:
zhanchengbin <zhanchengbin1@huawei.com> Reviewed-by:
Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20230103022812.130603-1-zhanchengbin1@huawei.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Baokun Li authored
[ Upstream commit 261341a9 ] The max_zeroout is of type int and the s_extent_max_zeroout_kb is of type uint, and the s_extent_max_zeroout_kb can be freely modified via the sysfs interface. When the block size is 1024, max_zeroout may overflow, so declare it as unsigned int to avoid overflow. Signed-off-by:
Baokun Li <libaokun1@huawei.com> Reviewed-by:
Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240319113325.3110393-9-libaokun1@huawei.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Baokun Li authored
[ Upstream commit 17220215 ] Otherwise operating on an incorrupted block bitmap can lead to all sorts of unknown problems. Signed-off-by:
Baokun Li <libaokun1@huawei.com> Reviewed-by:
Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240104142040.2835097-3-libaokun1@huawei.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Jan Kara authored
[ Upstream commit 0a46ef23 ] ext4_xattr_set_entry() creates new EA inodes while holding buffer lock on the external xattr block. This is problematic as it nests all the allocation locking (which acquires locks on other buffers) under the buffer lock. This can even deadlock when the filesystem is corrupted and e.g. quota file is setup to contain xattr block as data block. Move the allocation of EA inode out of ext4_xattr_set_entry() into the callers. Reported-by:
<syzbot+a43d4f48b8397d0e41a9@syzkaller.appspotmail.com> Signed-off-by:
Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240321162657.27420-2-jack@suse.cz Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Jan Kara authored
[ Upstream commit 8208c41c ] When allocating EA inode, quota accounting is done just before ext4_xattr_inode_lookup_create(). Logically these two operations belong together so just fold quota accounting into ext4_xattr_inode_lookup_create(). We also make ext4_xattr_inode_lookup_create() return the looked up / created inode to convert the function to a more standard calling convention. Signed-off-by:
Jan Kara <jack@suse.cz> Link: https://lore.kernel.org/r/20240209112107.10585-1-jack@suse.cz Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 0a46ef23 ("ext4: do not create EA inode under buffer lock") Signed-off-by:
Sasha Levin <sashal@kernel.org>
-
Li Zhong authored
[ Upstream commit 56d0d0b9 ] Check the return value of ext4_xattr_inode_dec_ref(), which could return error code and need to be warned. Signed-off-by:
Li Zhong <floridsleeves@gmail.com> Link: https://lore.kernel.org/r/20220917002816.3804400-1-floridsleeves@gmail.com Signed-off-by:
Theodore Ts'o <tytso@mit.edu> Stable-dep-of: 0a46ef23 ("ext4: do not create EA inode under buffer lock") Signed-off-by:
Sasha Levin <sashal@kernel.org>
-