  1. Mar 03, 2025
  2. Jan 14, 2025
    • nilfs2: fix state management in error path of log writing function · 6d8ac68f
      Ryusuke Konishi authored and Frieder Schrempf committed
      commit 6576dd66 upstream.
      
      After commit a694291a ("nilfs2: separate wait function from
      nilfs_segctor_write") was applied, the log writing function
      nilfs_segctor_do_construct() was able to issue I/O requests continuously
      even if user data blocks were split into multiple logs across segments,
      but two potential flaws were introduced in its error handling.
      
      First, if nilfs_segctor_begin_construction() fails while creating the
      second or subsequent logs, the log writing function returns without
      calling nilfs_segctor_abort_construction(), so the writeback flag set on
      pages/folios will remain uncleared.  This causes page cache operations to
      hang waiting for the writeback flag.  For example,
      truncate_inode_pages_final(), which is called via nilfs_evict_inode() when
      an inode is evicted from memory, will hang.
      
      Second, the NILFS_I_COLLECTED flag set on normal inodes remains uncleared.
      This is harmless if the next log write involves checkpoint creation.
      However, if the next write is a partial log write that does not create a
      checkpoint, inodes with NILFS_I_COLLECTED set are erroneously removed
      from the "sc_dirty_files" list.  Their data and b-tree blocks may then
      not be written to the device, corrupting the block mapping.
      
      Fix these issues by uniformly calling nilfs_segctor_abort_construction()
      on failure of each step in the loop in nilfs_segctor_do_construct(),
      having it clean up logs and segment usages according to progress, and
      correcting the conditions for calling nilfs_redirty_inodes() to ensure
      that the NILFS_I_COLLECTED flag is cleared.
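The error-handling pattern described above can be sketched as a small user-space C model. The pattern is a single cleanup routine, called from every failing step, that undoes work according to how far the loop got. Everything here is hypothetical: `model_ctx`, `model_begin_log()`, `model_abort_construction()` and `model_do_construct()` only stand in for the nilfs2 functions. The model also assumes one page per log, and it uses plain booleans instead of real page and inode flags.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * User-space model only: these structures and helpers are hypothetical
 * stand-ins, not the nilfs2 kernel API.
 */
#define MAX_LOGS 4

struct model_page  { bool writeback; };  /* one page per log, for brevity */
struct model_inode { bool collected; };  /* models NILFS_I_COLLECTED */

struct model_ctx {
	struct model_page pages[MAX_LOGS];
	struct model_inode inode;
	int logs_begun;                      /* progress counter */
};

/* One loop step; fails when i == fail_at (e.g. no free segment). */
static int model_begin_log(struct model_ctx *ctx, int i, int fail_at)
{
	if (i == fail_at)
		return -1;
	ctx->pages[i].writeback = true;
	ctx->logs_begun++;
	return 0;
}

/* Undo according to progress, so no flag outlives the failed call. */
static void model_abort_construction(struct model_ctx *ctx)
{
	for (int i = 0; i < ctx->logs_begun; i++)
		ctx->pages[i].writeback = false; /* otherwise waiters would hang */
	ctx->logs_begun = 0;
	ctx->inode.collected = false;        /* keep the inode on the dirty list */
}

static int model_do_construct(struct model_ctx *ctx, int nlogs, int fail_at)
{
	assert(nlogs <= MAX_LOGS);
	ctx->inode.collected = true;         /* blocks gathered for writing */
	for (int i = 0; i < nlogs; i++) {
		int err = model_begin_log(ctx, i, fail_at);

		if (err) {
			/* Same cleanup for every step, first log or not. */
			model_abort_construction(ctx);
			return err;
		}
	}
	/* Success: pages stay under writeback until I/O completes (not modeled). */
	return 0;
}
```

Calling `model_do_construct(&ctx, 4, 2)` fails while starting the third log. The two logs already begun have their writeback state cleared, and the inode is no longer marked as collected.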
      
      Link: https://lkml.kernel.org/r/20240814101119.4070-1-konishi.ryusuke@gmail.com

      Fixes: a694291a ("nilfs2: separate wait function from nilfs_segctor_write")
      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  3. Aug 12, 2024
  4. Jul 11, 2024
  5. Mar 11, 2024
  6. Sep 11, 2023
  7. Aug 17, 2023
    • nilfs2: fix buffer corruption due to concurrent device reads · 8b936ed6
      Ryusuke Konishi authored and Frieder Schrempf committed
      commit 679bd7eb upstream.
      
      As a result of analysis of a syzbot report, it turned out that in three
      cases where nilfs2 allocates block device buffers directly via sb_getblk,
      concurrent reads to the device can corrupt the allocated buffers.
      
      Nilfs2 uses sb_getblk in three places:
      - for segment summary blocks, which make up a log header;
      - for the super root block, which is the trailer;
      - when moving and writing the second super block after a file system
        resize.

      In each of these cases, the uptodate flag is not set when the metadata to
      be written is stored in the allocated buffer.  If a device read of the
      same block occurs concurrently before the write, it overwrites the stored
      metadata.  This corrupts the metadata and breaks the log write itself,
      producing the reported warnings in nilfs_btree_assign().
      
      Fix these issues as follows.  For each buffer obtained with sb_getblk,
      set the uptodate flag on its buffer head when the buffer is first used or
      before it is modified.  Clear the flag on failure.

      When setting the uptodate flag, use the lock_buffer/unlock_buffer pair
      for the necessary exclusive control.  Fill the buffer completely so that
      no uninitialized bytes are mixed into the data seen by other readers.
      Buffers for segment summary blocks are filled incrementally.  For these,
      if the uptodate flag was unset at allocation, set the flag and zero-fill
      the buffer once at that point.
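A minimal user-space sketch of this locking pattern is shown below. It uses a C11 `atomic_flag` spinlock as a stand-in for lock_buffer()/unlock_buffer(), which actually sleep. It also assumes a tiny hypothetical 64-byte block. `struct model_bh` and the helpers are illustrative names, not kernel API. The point is the ordering: the writer zero-fills the buffer and marks it uptodate under the lock. A reader that later takes the same lock sees the flag and does not overwrite the buffer with device contents.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <string.h>

#define BLKSZ 64  /* hypothetical tiny block size for the model */

/* Hypothetical stand-in for a buffer head; not the kernel structure. */
struct model_bh {
	atomic_flag lock;          /* models lock_buffer()/unlock_buffer() */
	bool uptodate;             /* models the uptodate state bit */
	unsigned char data[BLKSZ];
};

static void model_lock(struct model_bh *bh)
{
	while (atomic_flag_test_and_set(&bh->lock))
		;                  /* spin; the kernel lock sleeps instead */
}

static void model_unlock(struct model_bh *bh)
{
	atomic_flag_clear(&bh->lock);
}

/* Fresh buffer as returned by allocation: not uptodate, junk contents. */
static void model_bh_init(struct model_bh *bh)
{
	atomic_flag_clear(&bh->lock);
	bh->uptodate = false;
	memset(bh->data, 0xAA, sizeof(bh->data));
}

/* Writer: claim the buffer before storing metadata in it. */
static void model_prepare_for_write(struct model_bh *bh)
{
	model_lock(bh);
	if (!bh->uptodate) {
		memset(bh->data, 0, sizeof(bh->data)); /* hide stale bytes */
		bh->uptodate = true;   /* readers no longer load from disk */
	}
	model_unlock(bh);
}

/* Reader: load from the device only if the buffer is not uptodate. */
static void model_read(struct model_bh *bh, const unsigned char *disk)
{
	model_lock(bh);
	if (!bh->uptodate) {
		memcpy(bh->data, disk, sizeof(bh->data));
		bh->uptodate = true;
	}
	model_unlock(bh);
}

/* On write failure, drop the flag so the contents are not trusted. */
static void model_fail_write(struct model_bh *bh)
{
	model_lock(bh);
	bh->uptodate = false;
	model_unlock(bh);
}
```

The model runs single-threaded and only demonstrates the ordering the commit relies on. Metadata written after `model_prepare_for_write()` survives a subsequent `model_read()`.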
      
      Also, in the superblock move routine, the starting point of the memset
      call that zero-fills the block is specified incorrectly.  This can cause
      a buffer overflow on file systems with block sizes greater than 4KiB.  In
      addition, if the superblock is moved within a large block, zero-filling
      before copying may destroy the superblock data.  Fix these potential
      issues as well.
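The superblock-move pitfalls can be illustrated with a hypothetical helper that relocates a small image inside one block. This is a sketch under stated assumptions, not the upstream fix:
- a 16-byte stand-in is used for the superblock;
- `move_sb_in_block()` is an invented name.

The helper saves the source before zero-filling. It then zero-fills from the start of the block for exactly `blksz` bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define SB_BYTES 16  /* hypothetical superblock size for the model */

/*
 * Move a superblock image from old_off to new_off inside one block of
 * blksz bytes and zero everything else.  Names are hypothetical.
 */
static void move_sb_in_block(unsigned char *blk, size_t blksz,
			     size_t old_off, size_t new_off)
{
	unsigned char saved[SB_BYTES];

	assert(old_off + SB_BYTES <= blksz && new_off + SB_BYTES <= blksz);

	/* Save first: zero-filling would otherwise destroy the source. */
	memcpy(saved, blk + old_off, SB_BYTES);

	/*
	 * Zero the block from its start.  Starting at blk + new_off with a
	 * length of blksz would run past the end of the buffer.
	 */
	memset(blk, 0, blksz);

	memcpy(blk + new_off, saved, SB_BYTES);
}
```

Copying into a temporary handles both overlapping and non-overlapping source and destination ranges with the same code path.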
      
      Link: https://lkml.kernel.org/r/20230609035732.20426-1-konishi.ryusuke@gmail.com

      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Reported-by: <syzbot+31837fe952932efc8fb9@syzkaller.appspotmail.com>
      Closes: https://lkml.kernel.org/r/00000000000030000a05e981f475@google.com
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  8. May 11, 2023
  9. Apr 26, 2023
  10. Apr 13, 2023
  11. Nov 08, 2022
  12. Oct 12, 2022
  13. Oct 03, 2022
  14. Apr 01, 2022
    • nilfs2: fix lockdep warnings in page operations for btree nodes · e897be17
      Ryusuke Konishi authored
      Patch series "nilfs2 lockdep warning fixes".

      The first two patches resolve the lockdep warning issue.  The last one is
      an accompanying, low-priority cleanup.

      Based on your comment, this series solves the issue by separating inode
      objects as needed.  Since I was worried about the impact of the object
      composition changes, I tested the series carefully to avoid regressions.
      I focused especially on delicate functions such as disk space
      reclamation and snapshots.
      
      This patch (of 3):
      
      If CONFIG_LOCKDEP is enabled, nilfs2 hits lockdep warnings at
      inode_to_wb() during page/folio operations for btree nodes:
      
        WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 inode_to_wb include/linux/backing-dev.h:269 [inline]
        WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 folio_account_dirtied mm/page-writeback.c:2460 [inline]
        WARNING: CPU: 0 PID: 6575 at include/linux/backing-dev.h:269 __folio_mark_dirty+0xa7c/0xe30 mm/page-writeback.c:2509
        Modules linked in:
        ...
        RIP: 0010:inode_to_wb include/linux/backing-dev.h:269 [inline]
        RIP: 0010:folio_account_dirtied mm/page-writeback.c:2460 [inline]
        RIP: 0010:__folio_mark_dirty+0xa7c/0xe30 mm/page-writeback.c:2509
        ...
        Call Trace:
          __set_page_dirty include/linux/pagemap.h:834 [inline]
          mark_buffer_dirty+0x4e6/0x650 fs/buffer.c:1145
          nilfs_btree_propagate_p fs/nilfs2/btree.c:1889 [inline]
          nilfs_btree_propagate+0x4ae/0xea0 fs/nilfs2/btree.c:2085
          nilfs_bmap_propagate+0x73/0x170 fs/nilfs2/bmap.c:337
          nilfs_collect_dat_data+0x45/0xd0 fs/nilfs2/segment.c:625
          nilfs_segctor_apply_buffers+0x14a/0x470 fs/nilfs2/segment.c:1009
          nilfs_segctor_scan_file+0x47a/0x700 fs/nilfs2/segment.c:1048
          nilfs_segctor_collect_blocks fs/nilfs2/segment.c:1224 [inline]
          nilfs_segctor_collect fs/nilfs2/segment.c:1494 [inline]
          nilfs_segctor_do_construct+0x14f3/0x6c60 fs/nilfs2/segment.c:2036
          nilfs_segctor_construct+0x7a7/0xb30 fs/nilfs2/segment.c:2372
          nilfs_segctor_thread_construct fs/nilfs2/segment.c:2480 [inline]
          nilfs_segctor_thread+0x3c3/0xf90 fs/nilfs2/segment.c:2563
          kthread+0x405/0x4f0 kernel/kthread.c:327
          ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:295
      
      This is because nilfs2 uses two page caches for each inode, and
      inode->i_mapping never points to one of them, the btree node cache.

      The calling page/folio operations acquire the lock of the btree node
      cache.  Examples are __folio_start_writeback(), __folio_end_writeback(),
      and __folio_mark_dirty().  However, inode_to_wb(inode) refers to the page
      cache that inode->i_mapping points to, which is a different one.
      
      This patch resolves the issue by allocating and using an additional
      inode to hold the page cache of btree nodes.  The inode is attached
      one-to-one to the traditional nilfs2 inode if it requires a block
      mapping with b-tree.  This setup change is in memory only and does not
      affect the disk format.
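The layout change can be modeled in user space. In this hypothetical sketch, `m_inode`, `m_mapping` and `btnode_inode` are invented names. `mapping_is_consistent()` approximates the lockdep check in inode_to_wb(): the page cache whose lock is held should be the one its host inode's `i_mapping` points to. A second cache owned by the same inode fails that check. A cache owned by its own dedicated inode passes it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

struct m_inode;

/* Hypothetical page-cache stand-in. */
struct m_mapping {
	struct m_inode *host;         /* inode that owns this cache */
};

/* Hypothetical inode stand-in. */
struct m_inode {
	struct m_mapping i_data;      /* the inode's own page cache */
	struct m_mapping *i_mapping;  /* normally points at i_data */
	struct m_inode *btnode_inode; /* dedicated inode for b-tree nodes */
};

/*
 * Approximation of the lockdep condition: a lock taken on mapping m is
 * only recognized if m is what m->host->i_mapping points to.
 */
static bool mapping_is_consistent(const struct m_mapping *m)
{
	return m->host->i_mapping == m;
}

static void m_inode_init(struct m_inode *inode)
{
	inode->i_data.host = inode;
	inode->i_mapping = &inode->i_data;
	inode->btnode_inode = NULL;
}

/* Allocate a one-to-one companion inode to hold the b-tree node cache. */
static struct m_inode *attach_btnode_inode(struct m_inode *inode)
{
	struct m_inode *b = malloc(sizeof(*b));

	if (b == NULL)
		return NULL;
	m_inode_init(b);
	inode->btnode_inode = b;
	return b;
}
```

As the commit message notes for the real change, the companion inode exists only in memory; nothing in the model corresponds to on-disk state.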
      
      Link: https://lkml.kernel.org/r/1647867427-30498-1-git-send-email-konishi.ryusuke@gmail.com
      Link: https://lkml.kernel.org/r/1647867427-30498-2-git-send-email-konishi.ryusuke@gmail.com
      Link: https://lore.kernel.org/r/YXrYvIo8YRnAOJCj@casper.infradead.org
      Link: https://lore.kernel.org/r/9a20b33d-b38f-b4a2-4742-c1eb5b8e4d6c@redhat.com

      Signed-off-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Reported-by: <syzbot+0d5b462a6f07447991b3@syzkaller.appspotmail.com>
      Reported-by: <syzbot+34ef28bb2aeb28724aa0@syzkaller.appspotmail.com>
      Reported-by: Hao Sun <sunhao.th@gmail.com>
      Reported-by: David Hildenbrand <david@redhat.com>
      Tested-by: Ryusuke Konishi <konishi.ryusuke@gmail.com>
      Cc: Matthew Wilcox <willy@infradead.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. Nov 09, 2021
  16. May 07, 2021
  17. Dec 16, 2020
  18. Aug 23, 2020
  19. Aug 12, 2020
  20. Jun 11, 2020
  21. Sep 04, 2018
  22. Feb 07, 2018
  23. Nov 27, 2017
    • Rename superblock flags (MS_xyz -> SB_xyz) · 1751e8a6
      Linus Torvalds authored
      
      This is a pure automated search-and-replace of the internal kernel
      superblock flags.
      
      The s_flags are now called SB_*, with the names and the values for the
      moment mirroring the MS_* flags that they're equivalent to.
      
      Note how the MS_xyz flags are the ones passed to the mount system call,
      while the SB_xyz flags are what we then use in sb->s_flags.
      
      The script to do this was:
      
          # places to look in; re security/*: it generally should *not* be
          # touched (that stuff parses mount(2) arguments directly), but
          # there are two places where we really deal with superblock flags.
          FILES="drivers/mtd drivers/staging/lustre fs ipc mm \
                  include/linux/fs.h include/uapi/linux/bfs_fs.h \
                  security/apparmor/apparmorfs.c security/apparmor/include/lib.h"
          # the list of MS_... constants
          SYMS="RDONLY NOSUID NODEV NOEXEC SYNCHRONOUS REMOUNT MANDLOCK \
                DIRSYNC NOATIME NODIRATIME BIND MOVE REC VERBOSE SILENT \
                POSIXACL UNBINDABLE PRIVATE SLAVE SHARED RELATIME KERNMOUNT \
                I_VERSION STRICTATIME LAZYTIME SUBMOUNT NOREMOTELOCK NOSEC BORN \
                ACTIVE NOUSER"
      
          SED_PROG=
          for i in $SYMS; do SED_PROG="$SED_PROG -e s/MS_$i/SB_$i/g"; done
      
          # we want files that contain at least one of MS_...,
          # with fs/namespace.c and fs/pnode.c excluded.
          L=$(for i in $SYMS; do git grep -w -l MS_$i $FILES; done| sort|uniq|grep -v '^fs/namespace.c'|grep -v '^fs/pnode.c')
      
          for f in $L; do sed -i $f $SED_PROG; done
      
      Requested-by: Al Viro <viro@zeniv.linux.org.uk>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  24. Nov 18, 2017
  25. Nov 16, 2017
  26. Jun 20, 2017
    • sched/wait: Disambiguate wq_entry->task_list and wq_head->task_list naming · 2055da97
      Ingo Molnar authored
      
      So I've noticed a number of instances where it was not obvious from the
      code whether ->task_list was for a wait-queue head or a wait-queue entry.
      
      Furthermore, there are a number of wait-queue users where the lists are
      not for 'tasks' but other entities (poll tables, etc.), in which case
      the 'task_list' name is actively confusing.
      
      To clear this all up, name the wait-queue head and entry list structure
      fields unambiguously:
      
      	struct wait_queue_head::task_list	=> ::head
      	struct wait_queue_entry::task_list	=> ::entry
      
      For example, this code:
      
      	rqw->wait.task_list.next != &wait->task_list
      
      ... it was pretty unclear (to me) what it's doing, while now it's written this way:
      
      	rqw->wait.head.next != &wait->entry
      
      ... which makes it pretty clear that we are iterating a list until we see the head.
      
      Other examples are:
      
      	list_for_each_entry_safe(pos, next, &x->task_list, task_list) {
      	list_for_each_entry(wq, &fence->wait.task_list, task_list) {
      
      ... where it's unclear (to me) what we are iterating, and during review it's
      hard to tell whether it's trying to walk a wait-queue entry (which would be
      a bug), while now it's written as:
      
      	list_for_each_entry_safe(pos, next, &x->head, entry) {
      	list_for_each_entry(wq, &fence->wait.head, entry) {
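To see why the new names help, here is a simplified user-space re-creation of the renamed structures with a minimal intrusive list. `list_init()`, `list_add_tail()`, `container_of()` and the trimmed structs are local stand-ins written for this sketch. The kernel versions have more fields and different internals. The walk starts at `head.next` and stops when it gets back to `&wq->head`, so the head/entry distinction is visible in the loop itself.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal circular doubly linked list, re-implemented for this sketch. */
struct list_head {
	struct list_head *next, *prev;
};

#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

static void list_init(struct list_head *h)
{
	h->next = h;
	h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Trimmed versions of the renamed structures. */
struct wait_queue_head {
	struct list_head head;        /* formerly task_list */
};

struct wait_queue_entry {
	int id;                       /* stand-in for the real payload */
	struct list_head entry;       /* formerly task_list */
};

/* Walk the entries of a queue: start at head.next, stop back at head. */
static int count_waiters(struct wait_queue_head *wq)
{
	int n = 0;

	for (struct list_head *p = wq->head.next; p != &wq->head; p = p->next) {
		struct wait_queue_entry *e =
			container_of(p, struct wait_queue_entry, entry);
		assert(e->id > 0);
		n++;
	}
	return n;
}
```

Mistakes such as walking from an entry's `entry` field instead of a head's `head` field now stand out by name.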
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • sched/wait: Rename wait_queue_t => wait_queue_entry_t · ac6424b9
      Ingo Molnar authored
      
      Rename:
      
      	wait_queue_t		=>	wait_queue_entry_t
      
      'wait_queue_t' was always a slight misnomer: its name implies that it's a "queue",
      but in reality it's a queue *entry*. The 'real' queue is the wait queue head,
      which had to carry the name.
      
      Start sorting this out by renaming it to 'wait_queue_entry_t'.
      
      This also allows the real structure name 'struct __wait_queue' to
      lose its double underscore and become 'struct wait_queue_entry',
      which is the more canonical nomenclature for such data types.
      
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  27. Mar 02, 2017
  28. Feb 28, 2017
  29. Aug 02, 2016