    mm/migrate_device: don't add folio to be freed to LRU in migrate_device_finalize() · ddfbbebc
    David Hildenbrand authored and Frieder Schrempf committed
    commit 41cddf83d8b00f29fd105e7a0777366edc69a5cf upstream.
    
    When migration succeeds, we call
    folio_migrate_flags()->mem_cgroup_migrate() to migrate the memcg from
    the old folio to the new one, which sets memcg_data of the old folio
    to 0.

    Similarly, when migration fails, memcg_data of the dst folio is left
    unset.
    
    If we call folio_putback_lru() on such a folio (memcg_data == 0), we
    add a folio that is about to be freed back to the LRU, making the
    memcg code unhappy.
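
    For reference, here is a minimal sketch of the check that fires,
    condensed from folio_lruvec() in include/linux/memcontrol.h (exact
    code varies by kernel version, so treat this as an illustration):

      static inline struct lruvec *folio_lruvec(struct folio *folio)
      {
              /* folio_memcg() reads folio->memcg_data. */
              struct mem_cgroup *memcg = folio_memcg(folio);

              /* Fires for memcg_data == 0 unless memcg is disabled: */
              VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled(), folio);
              return mem_cgroup_lruvec(memcg, folio_pgdat(folio));
      }

    Running the hmm selftests triggers the warning: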
    
      # ./hmm-tests
      ...
      #  RUN           hmm.hmm_device_private.migrate ...
      [  102.078007][T14893] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x7ff27d200 pfn:0x13cc00
      [  102.079974][T14893] anon flags: 0x17ff00000020018(uptodate|dirty|swapbacked|node=0|zone=2|lastcpupid=0x7ff)
      [  102.082037][T14893] raw: 017ff00000020018 dead000000000100 dead000000000122 ffff8881353896c9
      [  102.083687][T14893] raw: 00000007ff27d200 0000000000000000 00000001ffffffff 0000000000000000
      [  102.085331][T14893] page dumped because: VM_WARN_ON_ONCE_FOLIO(!memcg && !mem_cgroup_disabled())
      [  102.087230][T14893] ------------[ cut here ]------------
      [  102.088279][T14893] WARNING: CPU: 0 PID: 14893 at ./include/linux/memcontrol.h:726 folio_lruvec_lock_irqsave+0x10e/0x170
      [  102.090478][T14893] Modules linked in:
      [  102.091244][T14893] CPU: 0 UID: 0 PID: 14893 Comm: hmm-tests Not tainted 6.13.0-09623-g6c216bc522fd #151
      [  102.093089][T14893] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
      [  102.094848][T14893] RIP: 0010:folio_lruvec_lock_irqsave+0x10e/0x170
      [  102.096104][T14893] Code: ...
      [  102.099908][T14893] RSP: 0018:ffffc900236c37b0 EFLAGS: 00010293
      [  102.101152][T14893] RAX: 0000000000000000 RBX: ffffea0004f30000 RCX: ffffffff8183f426
      [  102.102684][T14893] RDX: ffff8881063cb880 RSI: ffffffff81b8117f RDI: ffff8881063cb880
      [  102.104227][T14893] RBP: 0000000000000000 R08: 0000000000000005 R09: 0000000000000000
      [  102.105757][T14893] R10: 0000000000000001 R11: 0000000000000002 R12: ffffc900236c37d8
      [  102.107296][T14893] R13: ffff888277a2bcb0 R14: 000000000000001f R15: 0000000000000000
      [  102.108830][T14893] FS:  00007ff27dbdd740(0000) GS:ffff888277a00000(0000) knlGS:0000000000000000
      [  102.110643][T14893] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  102.111924][T14893] CR2: 00007ff27d400000 CR3: 000000010866e000 CR4: 0000000000750ef0
      [  102.113478][T14893] PKRU: 55555554
      [  102.114172][T14893] Call Trace:
      [  102.114805][T14893]  <TASK>
      [  102.115397][T14893]  ? folio_lruvec_lock_irqsave+0x10e/0x170
      [  102.116547][T14893]  ? __warn.cold+0x110/0x210
      [  102.117461][T14893]  ? folio_lruvec_lock_irqsave+0x10e/0x170
      [  102.118667][T14893]  ? report_bug+0x1b9/0x320
      [  102.119571][T14893]  ? handle_bug+0x54/0x90
      [  102.120494][T14893]  ? exc_invalid_op+0x17/0x50
      [  102.121433][T14893]  ? asm_exc_invalid_op+0x1a/0x20
      [  102.122435][T14893]  ? __wake_up_klogd.part.0+0x76/0xd0
      [  102.123506][T14893]  ? dump_page+0x4f/0x60
      [  102.124352][T14893]  ? folio_lruvec_lock_irqsave+0x10e/0x170
      [  102.125500][T14893]  folio_batch_move_lru+0xd4/0x200
      [  102.126577][T14893]  ? __pfx_lru_add+0x10/0x10
      [  102.127505][T14893]  __folio_batch_add_and_move+0x391/0x720
      [  102.128633][T14893]  ? __pfx_lru_add+0x10/0x10
      [  102.129550][T14893]  folio_putback_lru+0x16/0x80
      [  102.130564][T14893]  migrate_device_finalize+0x9b/0x530
      [  102.131640][T14893]  dmirror_migrate_to_device.constprop.0+0x7c5/0xad0
      [  102.133047][T14893]  dmirror_fops_unlocked_ioctl+0x89b/0xc80
    
    Likely, nothing else goes wrong: putting the last folio reference will
    remove the folio from the LRU again.  So besides memcg complaining, adding
    the folio to be freed to the LRU is just an unnecessary step.
    
    The new flow resembles what we have in migrate_folio_move(): add the
    dst folio to the LRU, remove the migration PTEs, then unlock and
    unref the dst.
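
    A condensed sketch of the resulting per-folio loop body in
    migrate_device_finalize() (simplified from the patch; the !page
    early-exit and variable setup are elided, so this is an illustration
    rather than the exact diff):

      /* On failure (or if there is no dst folio), fall back to src. */
      if (!(src_pfns[i] & MIGRATE_PFN_MIGRATE) || !dst)
              dst = src;

      /* Only the folio that stays mapped is (re)added to the LRU... */
      if (!folio_is_zone_device(dst))
              folio_add_lru(dst);
      remove_migration_ptes(src, dst, 0);
      folio_unlock(src);

      /*
       * ...while the folio to be freed (memcg_data == 0) just drops its
       * reference instead of going through folio_putback_lru().
       */
      folio_put(src);

      if (dst != src) {
              folio_unlock(dst);
              folio_put(dst);
      }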
    
    Link: https://lkml.kernel.org/r/20250210161317.717936-1-david@redhat.com
    Fixes: 8763cb45 ("mm/migrate: new memory migration helper for use with device memory")
    Signed-off-by: David Hildenbrand <david@redhat.com>
    Cc: Jérôme Glisse <jglisse@redhat.com>
    Cc: John Hubbard <jhubbard@nvidia.com>
    Cc: Alistair Popple <apopple@nvidia.com>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>