Skip to content
Snippets Groups Projects
  • Seiji Nishikawa's avatar
    b45d8fda
    mm: vmscan: account for free pages to prevent infinite Loop in throttle_direct_reclaim() · b45d8fda
    Seiji Nishikawa authored and Frieder Schrempf's avatar Frieder Schrempf committed
    commit 6aaced5abd32e2a57cd94fd64f824514d0361da8 upstream.
    
    The task sometimes continues looping in throttle_direct_reclaim() because
    allow_direct_reclaim(pgdat) keeps returning false.
    
     #0 [ffff80002cb6f8d0] __switch_to at ffff8000080095ac
     #1 [ffff80002cb6f900] __schedule at ffff800008abbd1c
     #2 [ffff80002cb6f990] schedule at ffff800008abc50c
     #3 [ffff80002cb6f9b0] throttle_direct_reclaim at ffff800008273550
     #4 [ffff80002cb6fa20] try_to_free_pages at ffff800008277b68
     #5 [ffff80002cb6fae0] __alloc_pages_nodemask at ffff8000082c4660
     #6 [ffff80002cb6fc50] alloc_pages_vma at ffff8000082e4a98
     #7 [ffff80002cb6fca0] do_anonymous_page at ffff80000829f5a8
     #8 [ffff80002cb6fce0] __handle_mm_fault at ffff8000082a5974
     #9 [ffff80002cb6fd90] handle_mm_fault at ffff8000082a5bd4
    
    At this point, the pgdat contains the following two zones:
    
            NODE: 4  ZONE: 0  ADDR: ffff00817fffe540  NAME: "DMA32"
              SIZE: 20480  MIN/LOW/HIGH: 11/28/45
              VM_STAT:
                    NR_FREE_PAGES: 359
            NR_ZONE_INACTIVE_ANON: 18813
              NR_ZONE_ACTIVE_ANON: 0
            NR_ZONE_INACTIVE_FILE: 50
              NR_ZONE_ACTIVE_FILE: 0
              NR_ZONE_UNEVICTABLE: 0
            NR_ZONE_WRITE_PENDING: 0
                         NR_MLOCK: 0
                        NR_BOUNCE: 0
                       NR_ZSPAGES: 0
                NR_FREE_CMA_PAGES: 0
    
            NODE: 4  ZONE: 1  ADDR: ffff00817fffec00  NAME: "Normal"
              SIZE: 8454144  PRESENT: 98304  MIN/LOW/HIGH: 68/166/264
              VM_STAT:
                    NR_FREE_PAGES: 146
            NR_ZONE_INACTIVE_ANON: 94668
              NR_ZONE_ACTIVE_ANON: 3
            NR_ZONE_INACTIVE_FILE: 735
              NR_ZONE_ACTIVE_FILE: 78
              NR_ZONE_UNEVICTABLE: 0
            NR_ZONE_WRITE_PENDING: 0
                         NR_MLOCK: 0
                        NR_BOUNCE: 0
                       NR_ZSPAGES: 0
                NR_FREE_CMA_PAGES: 0
    
    In allow_direct_reclaim(), while processing ZONE_DMA32, the sum of
    inactive/active file-backed pages calculated in zone_reclaimable_pages()
    based on the result of zone_page_state_snapshot() is zero.
    
    Additionally, since this system lacks swap, the calculation of inactive/
    active anonymous pages is skipped.
    
            crash> p nr_swap_pages
            nr_swap_pages = $1937 = {
              counter = 0
            }
    
    As a result, ZONE_DMA32 is deemed unreclaimable and skipped, moving on to
    the processing of the next zone, ZONE_NORMAL, despite ZONE_DMA32 having
    free pages significantly exceeding the high watermark.
    
    The problem is that the pgdat->kswapd_failures hasn't been incremented.
    
            crash> px ((struct pglist_data *) 0xffff00817fffe540)->kswapd_failures
            $1935 = 0x0
    
    This is because the node deemed balanced.  The node balancing logic in
    balance_pgdat() evaluates all zones collectively.  If one or more zones
    (e.g., ZONE_DMA32) have enough free pages to meet their watermarks, the
    entire node is deemed balanced.  This causes balance_pgdat() to exit early
    before incrementing the kswapd_failures, as it considers the overall
    memory state acceptable, even though some zones (like ZONE_NORMAL) remain
    under significant pressure.
    
    
    The patch ensures that zone_reclaimable_pages() includes free pages
    (NR_FREE_PAGES) in its calculation when no other reclaimable pages are
    available (e.g., file-backed or anonymous pages).  This change prevents
    zones like ZONE_DMA32, which have sufficient free pages, from being
    mistakenly deemed unreclaimable.  By doing so, the patch ensures proper
    node balancing, avoids masking pressure on other zones like ZONE_NORMAL,
    and prevents infinite loops in throttle_direct_reclaim() caused by
    allow_direct_reclaim(pgdat) repeatedly returning false.
    
    
    The kernel hangs due to a task stuck in throttle_direct_reclaim(), caused
    by a node being incorrectly deemed balanced despite pressure in certain
    zones, such as ZONE_NORMAL.  This issue arises from
    zone_reclaimable_pages() returning 0 for zones without reclaimable file-
    backed or anonymous pages, causing zones like ZONE_DMA32 with sufficient
    free pages to be skipped.
    
    The lack of swap or reclaimable pages results in ZONE_DMA32 being ignored
    during reclaim, masking pressure in other zones.  Consequently,
    pgdat->kswapd_failures remains 0 in balance_pgdat(), preventing fallback
    mechanisms in allow_direct_reclaim() from being triggered, leading to an
    infinite loop in throttle_direct_reclaim().
    
    This patch modifies zone_reclaimable_pages() to account for free pages
    (NR_FREE_PAGES) when no other reclaimable pages exist.  This ensures zones
    with sufficient free pages are not skipped, enabling proper balancing and
    reclaim behavior.
    
    [akpm@linux-foundation.org: coding-style cleanups]
    Link: https://lkml.kernel.org/r/20241130164346.436469-1-snishika@redhat.com
    Link: https://lkml.kernel.org/r/20241130161236.433747-2-snishika@redhat.com
    
    
    Fixes: 5a1c84b4 ("mm: remove reclaim and compaction retry approximations")
    Signed-off-by: default avatarSeiji Nishikawa <snishika@redhat.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    b45d8fda
    History
    mm: vmscan: account for free pages to prevent infinite Loop in throttle_direct_reclaim()
    Seiji Nishikawa authored and Frieder Schrempf's avatar Frieder Schrempf committed
    commit 6aaced5abd32e2a57cd94fd64f824514d0361da8 upstream.
    
    The task sometimes continues looping in throttle_direct_reclaim() because
    allow_direct_reclaim(pgdat) keeps returning false.
    
     #0 [ffff80002cb6f8d0] __switch_to at ffff8000080095ac
     #1 [ffff80002cb6f900] __schedule at ffff800008abbd1c
     #2 [ffff80002cb6f990] schedule at ffff800008abc50c
     #3 [ffff80002cb6f9b0] throttle_direct_reclaim at ffff800008273550
     #4 [ffff80002cb6fa20] try_to_free_pages at ffff800008277b68
     #5 [ffff80002cb6fae0] __alloc_pages_nodemask at ffff8000082c4660
     #6 [ffff80002cb6fc50] alloc_pages_vma at ffff8000082e4a98
     #7 [ffff80002cb6fca0] do_anonymous_page at ffff80000829f5a8
     #8 [ffff80002cb6fce0] __handle_mm_fault at ffff8000082a5974
     #9 [ffff80002cb6fd90] handle_mm_fault at ffff8000082a5bd4
    
    At this point, the pgdat contains the following two zones:
    
            NODE: 4  ZONE: 0  ADDR: ffff00817fffe540  NAME: "DMA32"
              SIZE: 20480  MIN/LOW/HIGH: 11/28/45
              VM_STAT:
                    NR_FREE_PAGES: 359
            NR_ZONE_INACTIVE_ANON: 18813
              NR_ZONE_ACTIVE_ANON: 0
            NR_ZONE_INACTIVE_FILE: 50
              NR_ZONE_ACTIVE_FILE: 0
              NR_ZONE_UNEVICTABLE: 0
            NR_ZONE_WRITE_PENDING: 0
                         NR_MLOCK: 0
                        NR_BOUNCE: 0
                       NR_ZSPAGES: 0
                NR_FREE_CMA_PAGES: 0
    
            NODE: 4  ZONE: 1  ADDR: ffff00817fffec00  NAME: "Normal"
              SIZE: 8454144  PRESENT: 98304  MIN/LOW/HIGH: 68/166/264
              VM_STAT:
                    NR_FREE_PAGES: 146
            NR_ZONE_INACTIVE_ANON: 94668
              NR_ZONE_ACTIVE_ANON: 3
            NR_ZONE_INACTIVE_FILE: 735
              NR_ZONE_ACTIVE_FILE: 78
              NR_ZONE_UNEVICTABLE: 0
            NR_ZONE_WRITE_PENDING: 0
                         NR_MLOCK: 0
                        NR_BOUNCE: 0
                       NR_ZSPAGES: 0
                NR_FREE_CMA_PAGES: 0
    
    In allow_direct_reclaim(), while processing ZONE_DMA32, the sum of
    inactive/active file-backed pages calculated in zone_reclaimable_pages()
    based on the result of zone_page_state_snapshot() is zero.
    
    Additionally, since this system lacks swap, the calculation of inactive/
    active anonymous pages is skipped.
    
            crash> p nr_swap_pages
            nr_swap_pages = $1937 = {
              counter = 0
            }
    
    As a result, ZONE_DMA32 is deemed unreclaimable and skipped, moving on to
    the processing of the next zone, ZONE_NORMAL, despite ZONE_DMA32 having
    free pages significantly exceeding the high watermark.
    
    The problem is that the pgdat->kswapd_failures hasn't been incremented.
    
            crash> px ((struct pglist_data *) 0xffff00817fffe540)->kswapd_failures
            $1935 = 0x0
    
    This is because the node deemed balanced.  The node balancing logic in
    balance_pgdat() evaluates all zones collectively.  If one or more zones
    (e.g., ZONE_DMA32) have enough free pages to meet their watermarks, the
    entire node is deemed balanced.  This causes balance_pgdat() to exit early
    before incrementing the kswapd_failures, as it considers the overall
    memory state acceptable, even though some zones (like ZONE_NORMAL) remain
    under significant pressure.
    
    
    The patch ensures that zone_reclaimable_pages() includes free pages
    (NR_FREE_PAGES) in its calculation when no other reclaimable pages are
    available (e.g., file-backed or anonymous pages).  This change prevents
    zones like ZONE_DMA32, which have sufficient free pages, from being
    mistakenly deemed unreclaimable.  By doing so, the patch ensures proper
    node balancing, avoids masking pressure on other zones like ZONE_NORMAL,
    and prevents infinite loops in throttle_direct_reclaim() caused by
    allow_direct_reclaim(pgdat) repeatedly returning false.
    
    
    The kernel hangs due to a task stuck in throttle_direct_reclaim(), caused
    by a node being incorrectly deemed balanced despite pressure in certain
    zones, such as ZONE_NORMAL.  This issue arises from
    zone_reclaimable_pages() returning 0 for zones without reclaimable file-
    backed or anonymous pages, causing zones like ZONE_DMA32 with sufficient
    free pages to be skipped.
    
    The lack of swap or reclaimable pages results in ZONE_DMA32 being ignored
    during reclaim, masking pressure in other zones.  Consequently,
    pgdat->kswapd_failures remains 0 in balance_pgdat(), preventing fallback
    mechanisms in allow_direct_reclaim() from being triggered, leading to an
    infinite loop in throttle_direct_reclaim().
    
    This patch modifies zone_reclaimable_pages() to account for free pages
    (NR_FREE_PAGES) when no other reclaimable pages exist.  This ensures zones
    with sufficient free pages are not skipped, enabling proper balancing and
    reclaim behavior.
    
    [akpm@linux-foundation.org: coding-style cleanups]
    Link: https://lkml.kernel.org/r/20241130164346.436469-1-snishika@redhat.com
    Link: https://lkml.kernel.org/r/20241130161236.433747-2-snishika@redhat.com
    
    
    Fixes: 5a1c84b4 ("mm: remove reclaim and compaction retry approximations")
    Signed-off-by: default avatarSeiji Nishikawa <snishika@redhat.com>
    Cc: Mel Gorman <mgorman@techsingularity.net>
    Cc: <stable@vger.kernel.org>
    Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>