Skip to content
Snippets Groups Projects
  • Liu Shixin's avatar
    2e31443a
    mm: hugetlb: independent PMD page table shared count · 2e31443a
    Liu Shixin authored
    commit 59d9094df3d79443937add8700b2ef1a866b1081 upstream.
    
    The folio refcount may be increased unexpectly through try_get_folio() by
    caller such as split_huge_pages.  In huge_pmd_unshare(), we use refcount
    to check whether a pmd page table is shared.  The check is incorrect if
    the refcount is increased by the above caller, and this can cause the page
    table leaked:
    
     BUG: Bad page state in process sh  pfn:109324
     page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x66 pfn:0x109324
     flags: 0x17ffff800000000(node=0|zone=2|lastcpupid=0xfffff)
     page_type: f2(table)
     raw: 017ffff800000000 0000000000000000 0000000000000000 0000000000000000
     raw: 0000000000000066 0000000000000000 00000000f2000000 0000000000000000
     page dumped because: nonzero mapcount
     ...
     CPU: 31 UID: 0 PID: 7515 Comm: sh Kdump: loaded Tainted: G    B              6.13.0-rc2master+ #7
     Tainted: [B]=BAD_PAGE
     Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
     Call trace:
      show_stack+0x20/0x38 (C)
      dump_stack_lvl+0x80/0xf8
      dump_stack+0x18/0x28
      bad_page+0x8c/0x130
      free_page_is_bad_report+0xa4/0xb0
      free_unref_page+0x3cc/0x620
      __folio_put+0xf4/0x158
      split_huge_pages_all+0x1e0/0x3e8
      split_huge_pages_write+0x25c/0x2d8
      full_proxy_write+0x64/0xd8
      vfs_write+0xcc/0x280
      ksys_write+0x70/0x110
      __arm64_sys_write+0x24/0x38
      invoke_syscall+0x50/0x120
      el0_svc_common.constprop.0+0xc8/0xf0
      do_el0_svc+0x24/0x38
      el0_svc+0x34/0x128
      el0t_64_sync_handler+0xc8/0xd0
      el0t_64_sync+0x190/0x198
    
    The issue may be triggered by damon, offline_page, page_idle, etc, which
    will increase the refcount of page table.
    
    1. The page table itself will be discarded after reporting the
       "nonzero mapcount".
    
    2. The HugeTLB page mapped by the page table miss freeing since we
       treat the page table as shared and a shared page table will not be
       unmapped.
    
    Fix it by introducing independent PMD page table shared count.  As
    described by comment, pt_index/pt_mm/pt_frag_refcount are used for s390
    gmap, x86 pgds and powerpc, pt_share_count is used for x86/arm64/riscv
    pmds, so we can reuse the field as pt_share_count.
    
    Link: https://lkml.kernel.org/r/20241216071147.3984217-1-liushixin2@huawei.com
    
    
    Fixes: 39dde65c ("[PATCH] shared page table for hugetlb page")
    Signed-off-by: default avatarLiu Shixin <liushixin2@huawei.com>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Ken Chen <kenneth.w.chen@intel.com>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Nanyong Sun <sunnanyong@huawei.com>
    Cc: Jane Chu <jane.chu@oracle.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    2e31443a
    History
    mm: hugetlb: independent PMD page table shared count
    Liu Shixin authored
    commit 59d9094df3d79443937add8700b2ef1a866b1081 upstream.
    
    The folio refcount may be increased unexpectly through try_get_folio() by
    caller such as split_huge_pages.  In huge_pmd_unshare(), we use refcount
    to check whether a pmd page table is shared.  The check is incorrect if
    the refcount is increased by the above caller, and this can cause the page
    table leaked:
    
     BUG: Bad page state in process sh  pfn:109324
     page: refcount:0 mapcount:0 mapping:0000000000000000 index:0x66 pfn:0x109324
     flags: 0x17ffff800000000(node=0|zone=2|lastcpupid=0xfffff)
     page_type: f2(table)
     raw: 017ffff800000000 0000000000000000 0000000000000000 0000000000000000
     raw: 0000000000000066 0000000000000000 00000000f2000000 0000000000000000
     page dumped because: nonzero mapcount
     ...
     CPU: 31 UID: 0 PID: 7515 Comm: sh Kdump: loaded Tainted: G    B              6.13.0-rc2master+ #7
     Tainted: [B]=BAD_PAGE
     Hardware name: QEMU KVM Virtual Machine, BIOS 0.0.0 02/06/2015
     Call trace:
      show_stack+0x20/0x38 (C)
      dump_stack_lvl+0x80/0xf8
      dump_stack+0x18/0x28
      bad_page+0x8c/0x130
      free_page_is_bad_report+0xa4/0xb0
      free_unref_page+0x3cc/0x620
      __folio_put+0xf4/0x158
      split_huge_pages_all+0x1e0/0x3e8
      split_huge_pages_write+0x25c/0x2d8
      full_proxy_write+0x64/0xd8
      vfs_write+0xcc/0x280
      ksys_write+0x70/0x110
      __arm64_sys_write+0x24/0x38
      invoke_syscall+0x50/0x120
      el0_svc_common.constprop.0+0xc8/0xf0
      do_el0_svc+0x24/0x38
      el0_svc+0x34/0x128
      el0t_64_sync_handler+0xc8/0xd0
      el0t_64_sync+0x190/0x198
    
    The issue may be triggered by damon, offline_page, page_idle, etc, which
    will increase the refcount of page table.
    
    1. The page table itself will be discarded after reporting the
       "nonzero mapcount".
    
    2. The HugeTLB page mapped by the page table miss freeing since we
       treat the page table as shared and a shared page table will not be
       unmapped.
    
    Fix it by introducing independent PMD page table shared count.  As
    described by comment, pt_index/pt_mm/pt_frag_refcount are used for s390
    gmap, x86 pgds and powerpc, pt_share_count is used for x86/arm64/riscv
    pmds, so we can reuse the field as pt_share_count.
    
    Link: https://lkml.kernel.org/r/20241216071147.3984217-1-liushixin2@huawei.com
    
    
    Fixes: 39dde65c ("[PATCH] shared page table for hugetlb page")
    Signed-off-by: default avatarLiu Shixin <liushixin2@huawei.com>
    Cc: Kefeng Wang <wangkefeng.wang@huawei.com>
    Cc: Ken Chen <kenneth.w.chen@intel.com>
    Cc: Muchun Song <muchun.song@linux.dev>
    Cc: Nanyong Sun <sunnanyong@huawei.com>
    Cc: Jane Chu <jane.chu@oracle.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>