  1. Jan 24, 2023
    • panic: Consolidate open-coded panic_on_warn checks · 13aa82f0
      Kees Cook authored
      
      commit 79cc1ba7 upstream.
      
      Several run-time checkers (KASAN, UBSAN, KFENCE, KCSAN, sched) roll
      their own warnings, and each checks "panic_on_warn". Consolidate this
      into a single function so that future instrumentation can be added in
      a single location.
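
      A sketch of the consolidated helper (modulo details, this is what the
      upstream commit adds to kernel/panic.c):
      
      	/* Apply the "panic on WARN" policy in a single place. */
      	void check_panic_on_warn(const char *origin)
      	{
      		if (panic_on_warn)
      			panic("%s: panic_on_warn set ...\n", origin);
      	}
      
      Callers such as KASAN or KFENCE then replace their open-coded
      "if (panic_on_warn) panic(...)" with check_panic_on_warn("KASAN"), etc.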
      
      Cc: Marco Elver <elver@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Juri Lelli <juri.lelli@redhat.com>
      Cc: Vincent Guittot <vincent.guittot@linaro.org>
      Cc: Dietmar Eggemann <dietmar.eggemann@arm.com>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Ben Segall <bsegall@google.com>
      Cc: Mel Gorman <mgorman@suse.de>
      Cc: Daniel Bristot de Oliveira <bristot@redhat.com>
      Cc: Valentin Schneider <vschneid@redhat.com>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Vincenzo Frascino <vincenzo.frascino@arm.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: David Gow <davidgow@google.com>
      Cc: tangmeng <tangmeng@uniontech.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Shuah Khan <skhan@linuxfoundation.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: "Paul E. McKenney" <paulmck@kernel.org>
      Cc: Sebastian Andrzej Siewior <bigeasy@linutronix.de>
      Cc: "Guilherme G. Piccoli" <gpiccoli@igalia.com>
      Cc: Tiezhu Yang <yangtiezhu@loongson.cn>
      Cc: kasan-dev@googlegroups.com
      Cc: linux-mm@kvack.org
      Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Marco Elver <elver@google.com>
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Link: https://lore.kernel.org/r/20221117234328.594699-4-keescook@chromium.org
      
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  2. Nov 23, 2022
    • kfence: fix stack trace pruning · 747c0f35
      Marco Elver authored
      Commit b1405135 ("mm/sl[au]b: generalize kmalloc subsystem")
      refactored large parts of the kmalloc subsystem, with the result that
      the stack trace pruning done by KFENCE no longer works.
      
      While b1405135 attempted to fix the situation by including
      '__kmem_cache_free' in the list of functions KFENCE should skip through,
      this only works when the compiler actually optimizes the tail call from
      kfree() to __kmem_cache_free() into a jump (so that kfree() does _not_
      appear in the full stack trace to begin with).
      
      In some configurations, the compiler no longer optimizes the tail call
      into a jump, and __kmem_cache_free() appears in the stack trace.  This
      means that the pruned stack trace shown by KFENCE would include kfree(),
      which is not intended - for example:
      
       | BUG: KFENCE: invalid free in kfree+0x7c/0x120
       |
       | Invalid free of 0xffff8883ed8fefe0 (in kfence-#126):
       |  kfree+0x7c/0x120
       |  test_double_free+0x116/0x1a9
       |  kunit_try_run_case+0x90/0xd0
       | [...]
      
      Fix it by moving __kmem_cache_free() to the list of functions that may
      be tail-called by an allocator entry function, making the pruning logic
      work in both the optimized and unoptimized tail-call cases.
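
      A simplified sketch of the pruning idea (the helper name here is
      illustrative, not the exact KFENCE code, which also skips its own
      internal frames):
      
      	/* Return the index of the first frame to report, skipping the
      	 * allocator entry frames whether or not the tail call from
      	 * kfree() to __kmem_cache_free() was optimized into a jump. */
      	static int get_skipnr(const unsigned long entries[], int num)
      	{
      		char buf[64];
      		int i, skip = 0;
      
      		for (i = 0; i < num; i++) {
      			/* Resolve the frame's symbol name. */
      			scnprintf(buf, sizeof(buf), "%ps", (void *)entries[i]);
      			if (str_has_prefix(buf, "kfree") ||
      			    str_has_prefix(buf, "__kmem_cache_free"))
      				skip = i + 1;	/* keep skipping entry frames */
      			else if (skip)
      				break;		/* first caller frame found */
      		}
      		return skip;
      	}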
      
      Link: https://lkml.kernel.org/r/20221118152216.3914899-1-elver@google.com
      
      Fixes: b1405135 ("mm/sl[au]b: generalize kmalloc subsystem")
      Signed-off-by: Marco Elver <elver@google.com>
      Reviewed-by: Alexander Potapenko <glider@google.com>
      Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  3. Sep 29, 2022
    • kfence: use better stack hash seed · 08475dab
      Jason A. Donenfeld authored
      
      As of the prior commit, the RNG will have incorporated both a cycle
      counter value and RDRAND, in addition to various other environmental
      noise. Therefore, using get_random_u32() will supply a stronger seed
      than simply using random_get_entropy(). N.B.: random_get_entropy()
      should be considered an internal API of random.c and not generally
      consumed.
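
      The change itself is a one-line substitution in kfence_init() (sketch in
      diff form, following the description above):
      
      	- stack_hash_seed = (u32)random_get_entropy();
      	+ stack_hash_seed = get_random_u32();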
      
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Marco Elver <elver@google.com>
      Signed-off-by: Jason A. Donenfeld <Jason@zx2c4.com>
  4. Sep 12, 2022
    • kfence: add sysfs interface to disable kfence for selected slabs. · b84e04f1
      Imran Khan authored
      By default, a kfence allocation can happen for any slab object whose
      size is up to PAGE_SIZE, as long as that allocation is the first
      allocation after expiration of the kfence sample interval.  But in
      certain debugging scenarios we may be interested in debugging
      corruptions involving some specific slab objects, like dentry or ext4_*
      etc.  In such cases, limiting kfence to allocations involving only
      specific slab objects will increase the probability of catching the
      issue, since the kfence pool will not be consumed by other slab objects.
      
      This patch introduces a sysfs interface
      '/sys/kernel/slab/<name>/skip_kfence' to disable kfence for specific
      slabs.  Having the interface work in this way does not impact
      current/default behavior of kfence and allows us to use kfence for
      specific slabs (when needed) as well.  The decision to skip/use kfence is
      taken depending on whether kmem_cache.flags has (newly introduced)
      SLAB_SKIP_KFENCE flag set or not.
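
      A minimal sketch of how the flag gates allocations (assuming, as
      described above, a check of kmem_cache.flags on KFENCE's allocation
      path; the helper name is illustrative):
      
      	/* Caches marked via /sys/kernel/slab/<name>/skip_kfence never
      	 * have objects placed in the kfence pool. */
      	static bool kfence_skip_cache(const struct kmem_cache *s)
      	{
      		return !!(s->flags & SLAB_SKIP_KFENCE);
      	}
      
      The allocation entry point bails out early (returning NULL, i.e. "use
      the normal slab path") when this returns true.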
      
      Link: https://lkml.kernel.org/r/20220814195353.2540848-1-imran.f.khan@oracle.com
      
      Signed-off-by: Imran Khan <imran.f.khan@oracle.com>
      Reviewed-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Marco Elver <elver@google.com>
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Christoph Lameter <cl@linux.com>
      Cc: Pekka Enberg <penberg@kernel.org>
      Cc: David Rientjes <rientjes@google.com>
      Cc: Joonsoo Kim <iamjoonsoo.kim@lge.com>
      Cc: Roman Gushchin <roman.gushchin@linux.dev>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  5. May 13, 2022
    • kfence: enable check kfence canary on panic via boot param · 3c81b3bb
      huangshaobo authored
      Out-of-bounds accesses that aren't caught by a guard page will result in
      corruption of canary memory.  In pathological cases, where an object has
      certain alignment requirements, an out-of-bounds access might never be
      caught by the guard page.  Such corruptions, however, are normally only
      detected on kfree().  If the bug causes the kernel to panic before kfree(),
      KFENCE has no opportunity to report the issue.  Such corruptions may also
      indicate failing memory or other faults.
      
      To provide some more information in such cases, add the option to check
      canary bytes on panic.  This might help narrow the search for the panic
      cause; but, since only the allocation stack trace is available, such
      reports are difficult to use to diagnose an issue on their own.  In most
      cases such reports are not actionable, so the check is an opt-in feature
      (disabled by default).
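
      A sketch of the mechanism (the names below are illustrative; the real
      code registers on panic_notifier_list and walks the kfence pool):
      
      	/* Opt-in via a boot parameter; __read_mostly per Marco. */
      	static bool kfence_check_on_panic __read_mostly;
      
      	static int kfence_check_canary_notify(struct notifier_block *nb,
      					      unsigned long reason, void *arg)
      	{
      		if (kfence_check_on_panic)
      			kfence_check_all_canaries();	/* verify canary bytes */
      		return NOTIFY_OK;
      	}
      
      	static struct notifier_block kfence_check_canary_nb = {
      		.notifier_call = kfence_check_canary_notify,
      	};
      
      	/* at init:
      	 * atomic_notifier_chain_register(&panic_notifier_list,
      	 *				  &kfence_check_canary_nb);
      	 */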
      
      [akpm@linux-foundation.org: add __read_mostly, per Marco]
      Link: https://lkml.kernel.org/r/20220425022456.44300-1-huangshaobo6@huawei.com
      
      Signed-off-by: huangshaobo <huangshaobo6@huawei.com>
      Suggested-by: chenzefeng <chenzefeng2@huawei.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Xiaoming Ni <nixiaoming@huawei.com>
      Cc: Wangbing <wangbing6@huawei.com>
      Cc: Jubin Zhong <zhongjubin@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  6. May 10, 2022
    • mm/kfence: reset PG_slab and memcg_data before freeing __kfence_pool · 2839b099
      Hyeonggon Yoo authored
      When kfence fails to initialize the kfence pool, it frees the pool, but
      it does not reset memcg_data or the PG_slab flag.
      
      Below is a BUG report caused by this.  Let's fix it by resetting
      memcg_data and the PG_slab flag before the free.
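
      A sketch of the fix's shape in the pool-initialization error path (the
      loop bounds and helpers are illustrative; the point is resetting the
      page state before handing the pages back):
      
      	char *p;
      
      	for (p = __kfence_pool; p < __kfence_pool + KFENCE_POOL_SIZE;
      	     p += PAGE_SIZE) {
      		struct page *page = virt_to_page(p);
      
      #ifdef CONFIG_MEMCG
      		page->memcg_data = 0;	/* avoid "page still charged to cgroup" */
      #endif
      		__ClearPageSlab(page);	/* reset PG_slab before the free */
      	}
      	memblock_free_late(__pa(__kfence_pool), KFENCE_POOL_SIZE);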
      
      [    0.089149] BUG: Bad page state in process swapper/0  pfn:3d8e06
      [    0.089149] page:ffffea46cf638180 refcount:0 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x3d8e06
      [    0.089150] memcg:ffffffff94a475d1
      [    0.089150] flags: 0x17ffffc0000200(slab|node=0|zone=2|lastcpupid=0x1fffff)
      [    0.089151] raw: 0017ffffc0000200 ffffea46cf638188 ffffea46cf638188 0000000000000000
      [    0.089152] raw: 0000000000000000 0000000000000000 00000000ffffffff ffffffff94a475d1
      [    0.089152] page dumped because: page still charged to cgroup
      [    0.089153] Modules linked in:
      [    0.089153] CPU: 0 PID: 0 Comm: swapper/0 Tainted: G    B   W         5.18.0-rc1+ #965
      [    0.089154] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.14.0-2 04/01/2014
      [    0.089154] Call Trace:
      [    0.089155]  <TASK>
      [    0.089155]  dump_stack_lvl+0x49/0x5f
      [    0.089157]  dump_stack+0x10/0x12
      [    0.089158]  bad_page.cold+0x63/0x94
      [    0.089159]  check_free_page_bad+0x66/0x70
      [    0.089160]  __free_pages_ok+0x423/0x530
      [    0.089161]  __free_pages_core+0x8e/0xa0
      [    0.089162]  memblock_free_pages+0x10/0x12
      [    0.089164]  memblock_free_late+0x8f/0xb9
      [    0.089165]  kfence_init+0x68/0x92
      [    0.089166]  start_kernel+0x789/0x992
      [    0.089167]  x86_64_start_reservations+0x24/0x26
      [    0.089168]  x86_64_start_kernel+0xa9/0xaf
      [    0.089170]  secondary_startup_64_no_verify+0xd5/0xdb
      [    0.089171]  </TASK>
      
      Link: https://lkml.kernel.org/r/YnPG3pQrqfcgOlVa@hyeyoo
      
      Fixes: 0ce20dd8 ("mm: add Kernel Electric-Fence infrastructure")
      Fixes: 8f0b3649 ("mm: kfence: fix objcgs vector allocation")
      Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Reviewed-by: Marco Elver <elver@google.com>
      Reviewed-by: Muchun Song <songmuchun@bytedance.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
  7. Jan 06, 2022
    • mm/sl*b: Differentiate struct slab fields by sl*b implementations · 401fb12c
      Vlastimil Babka authored
      
      With a struct slab definition separate from struct page, we can go
      further and define only fields that the chosen sl*b implementation uses.
      This means everything between __page_flags and __page_refcount
      placeholders now depends on the chosen CONFIG_SL*B. Some fields exist in
      all implementations (slab_list) but can be part of a union in some, so
      it's simpler to repeat them than to complicate the definition with even
      more ifdefs.
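
      An abbreviated sketch of the resulting definition (fields shortened;
      the full version is in mm/slab.h in the patch):
      
      	struct slab {
      		unsigned long __page_flags;
      
      #if defined(CONFIG_SLAB)
      		union {
      			struct list_head slab_list;
      			struct rcu_head rcu_head;
      		};
      		struct kmem_cache *slab_cache;
      		/* SLAB-specific fields ... */
      #elif defined(CONFIG_SLUB)
      		union {
      			struct list_head slab_list;
      			struct rcu_head rcu_head;
      		};
      		struct kmem_cache *slab_cache;
      		/* SLUB-specific fields ... */
      #elif defined(CONFIG_SLOB)
      		struct list_head slab_list;
      		/* SLOB-specific fields ... */
      #endif
      
      		atomic_t __page_refcount;
      #ifdef CONFIG_MEMCG
      		unsigned long memcg_data;
      #endif
      	};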
      
      The patch doesn't change the physical offsets of the fields, although
      that could be done later - for example, it's now clear that tighter
      packing in SLOB would be possible.
      
      This should also prevent accidental use of fields that don't exist in
      given implementation. Before this patch virt_to_cache() and
      cache_from_obj() were visible for SLOB (albeit not used), although they
      rely on the slab_cache field that isn't set by SLOB.  With this patch
      such use becomes a compile error, so these functions are now hidden
      behind an #ifndef CONFIG_SLOB.
      
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Roman Gushchin <guro@fb.com>
      Tested-by: Marco Elver <elver@google.com> # kfence
      Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Tested-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: <kasan-dev@googlegroups.com>
    • mm/kfence: Convert kfence_guarded_alloc() to struct slab · 8dae0cfe
      Vlastimil Babka authored
      
      The function sets some fields that are being moved from struct page to
      struct slab, so it needs to be converted.
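
      The conversion follows the same struct page -> struct slab pattern as
      the rest of the series (illustrative diff; the field being moved is
      slab_cache):
      
      	- struct page *page = virt_to_page(meta->addr);
      	- page->slab_cache = cache;
      	+ struct slab *slab = virt_to_slab((void *)meta->addr);
      	+ slab->slab_cache = cache;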
      
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Tested-by: Marco Elver <elver@google.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: <kasan-dev@googlegroups.com>
    • mm: Convert struct page to struct slab in functions used by other subsystems · 40f3bf0c
      Vlastimil Babka authored
      
      KASAN, KFENCE and memcg interact with SLAB or SLUB internals through
      the functions nearest_obj(), obj_to_index() and objs_per_slab(), which
      take struct page as a parameter.  This patch converts them to struct
      slab, including all callers, through a Coccinelle semantic patch.
      
      // Options: --include-headers --no-includes --smpl-spacing include/linux/slab_def.h include/linux/slub_def.h mm/slab.h mm/kasan/*.c mm/kfence/kfence_test.c mm/memcontrol.c mm/slab.c mm/slub.c
      // Note: needs coccinelle 1.1.1 to avoid breaking whitespace
      
      @@
      @@
      
      -objs_per_slab_page(
      +objs_per_slab(
       ...
       )
       { ... }
      
      @@
      @@
      
      -objs_per_slab_page(
      +objs_per_slab(
       ...
       )
      
      @@
      identifier fn =~ "obj_to_index|objs_per_slab";
      @@
      
       fn(...,
      -   const struct page *page
      +   const struct slab *slab
          ,...)
       {
      <...
      (
      - page_address(page)
      + slab_address(slab)
      |
      - page
      + slab
      )
      ...>
       }
      
      @@
      identifier fn =~ "nearest_obj";
      @@
      
       fn(...,
      -   struct page *page
      +   const struct slab *slab
          ,...)
       {
      <...
      (
      - page_address(page)
      + slab_address(slab)
      |
      - page
      + slab
      )
      ...>
       }
      
      @@
      identifier fn =~ "nearest_obj|obj_to_index|objs_per_slab";
      expression E;
      @@
      
       fn(...,
      (
      - slab_page(E)
      + E
      |
      - virt_to_page(E)
      + virt_to_slab(E)
      |
      - virt_to_head_page(E)
      + virt_to_slab(E)
      |
      - page
      + page_slab(page)
      )
        ,...)
      
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Reviewed-by: Andrey Konovalov <andreyknvl@gmail.com>
      Reviewed-by: Roman Gushchin <guro@fb.com>
      Acked-by: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Julia Lawall <julia.lawall@inria.fr>
      Cc: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Andrey Konovalov <andreyknvl@gmail.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Marco Elver <elver@google.com>
      Cc: Johannes Weiner <hannes@cmpxchg.org>
      Cc: Michal Hocko <mhocko@kernel.org>
      Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
      Cc: <kasan-dev@googlegroups.com>
      Cc: <cgroups@vger.kernel.org>
  8. Dec 25, 2021
    • kfence: fix memory leak when cat kfence objects · 0129ab1f
      Baokun Li authored
      Hulk robot reported a kmemleak problem:
      
          unreferenced object 0xffff93d1d8cc02e8 (size 248):
            comm "cat", pid 23327, jiffies 4624670141 (age 495992.217s)
            hex dump (first 32 bytes):
              00 40 85 19 d4 93 ff ff 00 10 00 00 00 00 00 00  .@..............
              00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
            backtrace:
               seq_open+0x2a/0x80
               full_proxy_open+0x167/0x1e0
               do_dentry_open+0x1e1/0x3a0
               path_openat+0x961/0xa20
               do_filp_open+0xae/0x120
               do_sys_openat2+0x216/0x2f0
               do_sys_open+0x57/0x80
               do_syscall_64+0x33/0x40
               entry_SYSCALL_64_after_hwframe+0x44/0xa9
          unreferenced object 0xffff93d419854000 (size 4096):
            comm "cat", pid 23327, jiffies 4624670141 (age 495992.217s)
            hex dump (first 32 bytes):
              6b 66 65 6e 63 65 2d 23 32 35 30 3a 20 30 78 30  kfence-#250: 0x0
              30 30 30 30 30 30 30 37 35 34 62 64 61 31 32 2d  0000000754bda12-
            backtrace:
               seq_read_iter+0x313/0x440
               seq_read+0x14b/0x1a0
               full_proxy_read+0x56/0x80
               vfs_read+0xa5/0x1b0
               ksys_read+0xa0/0xf0
               do_syscall_64+0x33/0x40
               entry_SYSCALL_64_after_hwframe+0x44/0xa9
      
      I found that we can easily reproduce this problem with the following
      commands:
      
      	cat /sys/kernel/debug/kfence/objects
      	echo scan > /sys/kernel/debug/kmemleak
      	cat /sys/kernel/debug/kmemleak
      
      The leaked memory is allocated in the stack below:
      
          do_syscall_64
            do_sys_open
              do_dentry_open
                full_proxy_open
                  seq_open            ---> alloc seq_file
            vfs_read
              full_proxy_read
                seq_read
                  seq_read_iter
                    traverse          ---> alloc seq_buf
      
      And it should have been released in the following process:
      
          do_syscall_64
            syscall_exit_to_user_mode
              exit_to_user_mode_prepare
                task_work_run
                  ____fput
                    __fput
                      full_proxy_release  ---> free here
      
      However, kfence does not implement the corresponding release function
      in its file_operations, so the memory is never freed and a leak occurs.
      The solution is therefore to implement the missing release function.
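
      The fix is then to wire up the matching release hook in the debugfs
      file_operations (sketch; seq_release() frees the seq_file that
      seq_open() allocated, along with its buffer):
      
      	static const struct file_operations objects_fops = {
      		.open = open_objects,
      		.read = seq_read,
      		.llseek = seq_lseek,
      		.release = seq_release,	/* previously missing */
      	};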
      
      Link: https://lkml.kernel.org/r/20211206133628.2822545-1-libaokun1@huawei.com
      
      Fixes: 0ce20dd8 ("mm: add Kernel Electric-Fence infrastructure")
      Signed-off-by: Baokun Li <libaokun1@huawei.com>
      Reported-by: Hulk Robot <hulkci@huawei.com>
      Acked-by: Marco Elver <elver@google.com>
      Reviewed-by: Kefeng Wang <wangkefeng.wang@huawei.com>
      Cc: Alexander Potapenko <glider@google.com>
      Cc: Dmitry Vyukov <dvyukov@google.com>
      Cc: Yu Kuai <yukuai3@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>