  1. Dec 05, 2024
    • sunrpc: fix one UAF issue caused by sunrpc kernel tcp socket · 61c0a5ea
      Liu Jian authored
      
      [ Upstream commit 3f23f96528e8fcf8619895c4c916c52653892ec1 ]
      
      BUG: KASAN: slab-use-after-free in tcp_write_timer_handler+0x156/0x3e0
      Read of size 1 at addr ffff888111f322cd by task swapper/0/0
      
      CPU: 0 UID: 0 PID: 0 Comm: swapper/0 Not tainted 6.12.0-rc4-dirty #7
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1
      Call Trace:
       <IRQ>
       dump_stack_lvl+0x68/0xa0
       print_address_description.constprop.0+0x2c/0x3d0
       print_report+0xb4/0x270
       kasan_report+0xbd/0xf0
       tcp_write_timer_handler+0x156/0x3e0
       tcp_write_timer+0x66/0x170
       call_timer_fn+0xfb/0x1d0
       __run_timers+0x3f8/0x480
       run_timer_softirq+0x9b/0x100
       handle_softirqs+0x153/0x390
       __irq_exit_rcu+0x103/0x120
       irq_exit_rcu+0xe/0x20
       sysvec_apic_timer_interrupt+0x76/0x90
       </IRQ>
       <TASK>
       asm_sysvec_apic_timer_interrupt+0x1a/0x20
      RIP: 0010:default_idle+0xf/0x20
      Code: 4c 01 c7 4c 29 c2 e9 72 ff ff ff 90 90 90 90 90 90 90 90 90 90 90 90
       90 90 90 90 f3 0f 1e fa 66 90 0f 00 2d 33 f8 25 00 fb f4 <fa> c3 cc cc cc
       cc 66 66 2e 0f 1f 84 00 00 00 00 00 90 90 90 90 90
      RSP: 0018:ffffffffa2007e28 EFLAGS: 00000242
      RAX: 00000000000f3b31 RBX: 1ffffffff4400fc7 RCX: ffffffffa09c3196
      RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffffffff9f00590f
      RBP: 0000000000000000 R08: 0000000000000001 R09: ffffed102360835d
      R10: ffff88811b041aeb R11: 0000000000000001 R12: 0000000000000000
      R13: ffffffffa202d7c0 R14: 0000000000000000 R15: 00000000000147d0
       default_idle_call+0x6b/0xa0
       cpuidle_idle_call+0x1af/0x1f0
       do_idle+0xbc/0x130
       cpu_startup_entry+0x33/0x40
       rest_init+0x11f/0x210
       start_kernel+0x39a/0x420
       x86_64_start_reservations+0x18/0x30
       x86_64_start_kernel+0x97/0xa0
       common_startup_64+0x13e/0x141
       </TASK>
      
      Allocated by task 595:
       kasan_save_stack+0x24/0x50
       kasan_save_track+0x14/0x30
       __kasan_slab_alloc+0x87/0x90
       kmem_cache_alloc_noprof+0x12b/0x3f0
       copy_net_ns+0x94/0x380
       create_new_namespaces+0x24c/0x500
       unshare_nsproxy_namespaces+0x75/0xf0
       ksys_unshare+0x24e/0x4f0
       __x64_sys_unshare+0x1f/0x30
       do_syscall_64+0x70/0x180
       entry_SYSCALL_64_after_hwframe+0x76/0x7e
      
      Freed by task 100:
       kasan_save_stack+0x24/0x50
       kasan_save_track+0x14/0x30
       kasan_save_free_info+0x3b/0x60
       __kasan_slab_free+0x54/0x70
       kmem_cache_free+0x156/0x5d0
       cleanup_net+0x5d3/0x670
       process_one_work+0x776/0xa90
       worker_thread+0x2e2/0x560
       kthread+0x1a8/0x1f0
       ret_from_fork+0x34/0x60
       ret_from_fork_asm+0x1a/0x30
      
      Reproduction script:
      
      mkdir -p /mnt/nfsshare
      mkdir -p /mnt/nfs/netns_1
      mkfs.ext4 /dev/sdb
      mount /dev/sdb /mnt/nfsshare
      systemctl restart nfs-server
      chmod 777 /mnt/nfsshare
      exportfs -i -o rw,no_root_squash *:/mnt/nfsshare
      
      ip netns add netns_1
      ip link add name veth_1_peer type veth peer veth_1
      ifconfig veth_1_peer 11.11.0.254 up
      ip link set veth_1 netns netns_1
      ip netns exec netns_1 ifconfig veth_1 11.11.0.1
      
      ip netns exec netns_1 /root/iptables -A OUTPUT -d 11.11.0.254 -p tcp \
      	--tcp-flags FIN FIN  -j DROP
      
      (note: In my environment, a DESTROY_CLIENTID operation is always sent
       immediately, breaking the nfs tcp connection.)
      ip netns exec netns_1 timeout -s 9 300 mount -t nfs -o proto=tcp,vers=4.1 \
      	11.11.0.254:/mnt/nfsshare /mnt/nfs/netns_1
      
      ip netns del netns_1
      
      The cause is that the tcp socket in netns_1 (nfs side) has been shut
      down and closed (in xs_destroy), but its FIN message (with ack) is
      dropped, so the nfsd side keeps retransmitting. Each time the tcp sock
      in netns_1 processes a received message, it resends the FIN message that
      is still in its send queue, and the tcp timer is re-armed. Once the
      network namespace is deleted, tcp's timer handler accesses the freed net
      structure, which is the use-after-free reported above.
      
      To fix this problem, let's hold netns refcnt for the tcp kernel socket as
      done in other modules. This is an ugly hack which can easily be backported
      to earlier kernels. A proper fix which cleans up the interfaces will
      follow, but may not be so easy to backport.
      
      Fixes: 26abe143 ("net: Modify sk_alloc to not reference count the netns of kernel sockets.")
      Signed-off-by: Liu Jian <liujian56@huawei.com>
      Acked-by: Jeff Layton <jlayton@kernel.org>
      Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • SUNRPC: timeout and cancel TLS handshake with -ETIMEDOUT · a84e6c15
      Benjamin Coddington authored
      
      [ Upstream commit d7bdd849ef1b681da03ac05ca0957b2cbe2d24b6 ]
      
      We've noticed a situation where an unstable TCP connection can cause the
      TLS handshake to time out while waiting for userspace to complete it.  When this
      happens, we don't want to return from xs_tls_handshake_sync() with zero, as
      this will cause the upper xprt to be set CONNECTED, and subsequent attempts
      to transmit will be returned with -EPIPE.  The sunrpc machine does not
      recover from this situation and will spin attempting to transmit.
      
      The return value of tls_handshake_cancel() can be used to detect a race
      with completion:
      
       * tls_handshake_cancel - cancel a pending handshake
       * Return values:
       *   %true - Uncompleted handshake request was canceled
       *   %false - Handshake request already completed or not found
      
      If true, we do not want the upper xprt to be connected, so return
      -ETIMEDOUT.  If false, it's possible the handshake request was lost and
      that may be the reason for our timeout.  Again we do not want the upper
      xprt to be connected, so return -ETIMEDOUT.

      Ensure that we always return an error from xs_tls_handshake_sync() if we
      call tls_handshake_cancel().
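
The rule stated above can be sketched as a small userspace model. This is only an illustration, not the code in xs_tls_handshake_sync(): the function name, its parameters, and the way the timeout is represented are all hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/*
 * Hypothetical model of the decision described in the commit message.
 * timed_out:            the wait for userspace expired, so the handshake
 *                       was cancelled.
 * cancel_returned_true: what the cancel call reported (true: request was
 *                       cancelled, false: already completed or not found).
 * handshake_status:     0 or a negative errno from a handshake that
 *                       finished in time.
 */
int toy_handshake_sync_result(bool timed_out, bool cancel_returned_true,
			      int handshake_status)
{
	if (!timed_out)
		return handshake_status;

	/* Both cancel outcomes lead to the same answer: never report success. */
	(void)cancel_returned_true;
	return -ETIMEDOUT;
}

/* Small self-check of the model. */
void toy_handshake_demo(void)
{
	assert(toy_handshake_sync_result(false, false, 0) == 0);
	assert(toy_handshake_sync_result(true, true, 0) == -ETIMEDOUT);
	assert(toy_handshake_sync_result(true, false, 0) == -ETIMEDOUT);
}
```

The point of the model is that the return value of the cancel call no longer selects between success and failure once the timeout path has been taken.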
      
      Signed-off-by: Benjamin Coddington <bcodding@redhat.com>
      Reviewed-by: Chuck Lever <chuck.lever@oracle.com>
      Fixes: 75eb6af7 ("SUNRPC: Add a TCP-with-TLS RPC transport class")
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • sunrpc: clear XPRT_SOCK_UPD_TIMEOUT when reset transport · 66d11ca9
      Liu Jian authored
      
      [ Upstream commit 4db9ad82a6c823094da27de4825af693a3475d51 ]
      
      Since transport->sock is set to NULL when the transport is reset,
      XPRT_SOCK_UPD_TIMEOUT also needs to be cleared. Otherwise,
      xs_tcp_set_socket_timeouts() may be triggered in xs_tcp_send_request()
      and dereference the transport->sock that has been set to NULL.
      
      Fixes: 7196dbb0 ("SUNRPC: Allow changing of the TCP timeout parameters on the fly")
      Signed-off-by: Li Lingfeng <lilingfeng3@huawei.com>
      Signed-off-by: Liu Jian <liujian56@huawei.com>
      Signed-off-by: Trond Myklebust <trond.myklebust@hammerspace.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • 9p/xen: fix release of IRQ · 530bc9f0
      Alex Zenla authored
      
      [ Upstream commit e43c608f40c065b30964f0a806348062991b802d ]
      
      Kernel logs indicate an IRQ was double-freed.
      
      Pass correct device ID during IRQ release.
      
      Fixes: 71ebd719 ("xen/9pfs: connect to the backend")
      Signed-off-by: Alex Zenla <alex@edera.dev>
      Signed-off-by: Alexander Merritt <alexander@edera.dev>
      Signed-off-by: Ariadne Conill <ariadne@ariadne.space>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Message-ID: <20241121225100.5736-1-alexander@edera.dev>
      [Dominique: remove confusing variable reset to 0]
      Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • 9p/xen: fix init sequence · 592fb738
      Alex Zenla authored
      
      [ Upstream commit 7ef3ae82a6ebbf4750967d1ce43bcdb7e44ff74b ]
      
      A large number of mount hangs were observed during hotplugging of 9pfs
      devices. The 9pfs Xen driver attempts to initialize itself more than once,
      causing the frontend and backend to disagree: the backend listens on a
      channel that the frontend does not send on, resulting in stalled processing.
      
      Only allow initialization of 9p frontend once.
      
      Fixes: c15fe55d ("9p/xen: fix connection sequence")
      Signed-off-by: Alex Zenla <alex@edera.dev>
      Signed-off-by: Alexander Merritt <alexander@edera.dev>
      Signed-off-by: Ariadne Conill <ariadne@ariadne.space>
      Reviewed-by: Juergen Gross <jgross@suse.com>
      Message-ID: <20241119211633.38321-1-alexander@edera.dev>
      Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net/9p/usbg: fix handling of the failed kzalloc() memory allocation · 2cdb416d
      Mirsad Todorovac authored
      
      [ Upstream commit ff1060813d9347e8c45c8b8cff93a4dfdb6726ad ]
      
      On the linux-next, next-20241108 vanilla kernel, the coccinelle tool gave the
      following error report:
      
      ./net/9p/trans_usbg.c:912:5-11: ERROR: allocation function on line 911 returns
      NULL not ERR_PTR on failure
      
      Fix the kzalloc() failure handling to check for the NULL return that
      signals memory exhaustion, rather than for an ERR_PTR value.
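
Why an ERR_PTR-style check misses a failed allocation can be shown with a userspace sketch. The helpers below are hypothetical stand-ins that mimic the usual kernel convention (error codes encoded in the top 4095 values of the address range); they are not the kernel's own definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_ERRNO 4095

/* Stand-in for an IS_ERR()-style test: true only for encoded error codes. */
bool toy_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-TOY_MAX_ERRNO;
}

/* Stand-in for an ERR_PTR()-style encoding of a negative errno. */
void *toy_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

/* Wrong check for an allocator that reports failure with NULL. */
bool toy_alloc_failed_wrong(const void *p)
{
	return toy_is_err(p);
}

/* Right check for an allocator that reports failure with NULL. */
bool toy_alloc_failed_right(const void *p)
{
	return p == NULL;
}

/* Small self-check of the model. */
void toy_alloc_check_demo(void)
{
	assert(!toy_alloc_failed_wrong(NULL));	/* failure goes unnoticed */
	assert(toy_alloc_failed_right(NULL));	/* failure is caught */
	assert(toy_is_err(toy_err_ptr(-12)));	/* encoded errors still detected */
}
```

A NULL pointer is numerically far below the encoded-error range, so the wrong check lets the failed allocation through and the caller goes on to dereference it.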
      
      Fixes: a3be076d ("net/9p/usbg: Add new usb gadget function transport")
      Cc: Michael Grzeschik <m.grzeschik@pengutronix.de>
      Cc: Eric Van Hensbergen <ericvh@kernel.org>
      Cc: Latchesar Ionkov <lucho@ionkov.net>
      Cc: Dominique Martinet <asmadeus@codewreck.org>
      Cc: Christian Schoenebeck <linux_oss@crudebyte.com>
      Cc: v9fs@lists.linux.dev
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Mirsad Todorovac <mtodorovac69@gmail.com>
      Message-ID: <20241109211840.721226-2-mtodorovac69@gmail.com>
      Signed-off-by: Dominique Martinet <asmadeus@codewreck.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • SUNRPC: make sure cache entry active before cache_show · d882e2b7
      Yang Erkun authored
      
      commit 2862eee078a4d2d1f584e7f24fa50dddfa5f3471 upstream.
      
      The function `c_show` is called under RCU protection. This only
      ensures that `cp` will not be freed; it does not prevent the reference
      count for `cp` from dropping to zero, which triggers a refcount
      use-after-free warning when `cache_get` is called. To resolve this
      issue, use `cache_get_rcu` to ensure that `cp` remains active.
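
The difference between a blind reference grab and one that refuses dead entries can be sketched in userspace with C11 atomics. This is a minimal model assuming the fix relies on an increment-unless-zero style acquire; the struct and function names are hypothetical and do not come from the sunrpc code.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct toy_entry {
	atomic_int ref;
};

/* Blind grab: increments even when the count is already 0 (entry is dying). */
int toy_get_blind(struct toy_entry *e)
{
	return atomic_fetch_add(&e->ref, 1) + 1;
}

/* Safe grab: take a reference only if at least one is still held. */
bool toy_get_unless_zero(struct toy_entry *e)
{
	int old = atomic_load(&e->ref);

	while (old != 0) {
		/* On failure, old is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&e->ref, &old, old + 1))
			return true;
	}
	return false;
}

/* Small self-check of the model. */
void toy_refcount_demo(void)
{
	struct toy_entry dead, live;

	atomic_init(&dead.ref, 0);
	atomic_init(&live.ref, 1);

	assert(!toy_get_unless_zero(&dead));	/* dead entry stays dead */
	assert(atomic_load(&dead.ref) == 0);
	assert(toy_get_unless_zero(&live));	/* live entry gains a reference */
	assert(atomic_load(&live.ref) == 2);
	assert(toy_get_blind(&dead) == 1);	/* blind grab resurrects it */
}
```

A reader that gets `false` back simply skips the entry, which is safe here because RCU still guarantees the memory itself has not been freed.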
      
      ------------[ cut here ]------------
      refcount_t: addition on 0; use-after-free.
      WARNING: CPU: 7 PID: 822 at lib/refcount.c:25
      refcount_warn_saturate+0xb1/0x120
      CPU: 7 UID: 0 PID: 822 Comm: cat Not tainted 6.12.0-rc3+ #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
      1.16.1-2.fc37 04/01/2014
      RIP: 0010:refcount_warn_saturate+0xb1/0x120
      
      Call Trace:
       <TASK>
       c_show+0x2fc/0x380 [sunrpc]
       seq_read_iter+0x589/0x770
       seq_read+0x1e5/0x270
       proc_reg_read+0xe1/0x140
       vfs_read+0x125/0x530
       ksys_read+0xc1/0x160
       do_syscall_64+0x5f/0x170
       entry_SYSCALL_64_after_hwframe+0x76/0x7e
      
      Cc: stable@vger.kernel.org # v4.20+
      Signed-off-by: Yang Erkun <yangerkun@huawei.com>
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netdev-genl: Hold rcu_read_lock in napi_get · 343e3e90
      Joe Damato authored
      
      commit c53bf100f68619acf6cedcf4cf5249a1ca2db0b4 upstream.
      
      Hold rcu_read_lock in netdev_nl_napi_get_doit, which calls napi_by_id;
      napi_by_id is required to be called under rcu_read_lock.
      
      Cc: stable@vger.kernel.org
      Fixes: 27f91aaf ("netdev-genl: Add netlink framework functions for napi")
      Signed-off-by: Joe Damato <jdamato@fastly.com>
      Link: https://patch.msgid.link/20241114175157.16604-1-jdamato@fastly.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • netfilter: ipset: add missing range check in bitmap_ip_uadt · 15794835
      Jeongjun Park authored
      
      commit 35f56c554eb1b56b77b3cf197a6b00922d49033d upstream.
      
      When tb[IPSET_ATTR_IP_TO] is not present but tb[IPSET_ATTR_CIDR] exists,
      the values of ip and ip_to are slightly swapped. The range check for ip
      therefore has to be done after that point, but it is missing, and this
      appears to be where the vulnerability comes from.

      Add the missing range checks and remove the unnecessary ones.
      
      Cc: <stable@vger.kernel.org>
      Reported-by: <syzbot+58c872f7790a4d2ac951@syzkaller.appspotmail.com>
      Fixes: 72205fc6 ("netfilter: ipset: bitmap:ip set type support")
      Signed-off-by: Jeongjun Park <aha310510@gmail.com>
      Acked-by: Jozsef Kadlecsik <kadlec@blackhole.kfki.hu>
      Signed-off-by: Pablo Neira Ayuso <pablo@netfilter.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • wifi: nl80211: fix bounds checker error in nl80211_parse_sched_scan · 1a7b62dd
      Aleksei Vetrov authored
      
      commit 9c46a3a5b394d6d123866aa44436fc2cd342eb0d upstream.
      
      The channels array in the cfg80211_scan_request has a __counted_by
      attribute attached to it, which points to the n_channels variable. This
      attribute is used in bounds checking, and if it is not set before the
      array is filled, then the bounds sanitizer will issue a warning or a
      kernel panic if CONFIG_UBSAN_TRAP is set.
      
      This patch sets the size of allocated memory as the initial value for
      n_channels. It is updated with the actual number of added elements after
      the array is filled.
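
The ordering described above (set the count to the allocated capacity, fill the array, then shrink the count to what was actually added) can be sketched in userspace. The struct, the `TOY_COUNTED_BY` macro, and the frequency filter are hypothetical; the attribute is only applied when the compiler reports support for it.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Use the counted_by attribute where the compiler has it, else nothing. */
#if defined(__has_attribute)
# if __has_attribute(counted_by)
#  define TOY_COUNTED_BY(m) __attribute__((counted_by(m)))
# endif
#endif
#ifndef TOY_COUNTED_BY
# define TOY_COUNTED_BY(m)
#endif

struct toy_scan_request {
	unsigned int n_channels;
	int channels[] TOY_COUNTED_BY(n_channels);
};

/* Keep only frequencies >= min_freq (an arbitrary filter for the demo). */
struct toy_scan_request *toy_build_request(const int *freqs,
					    unsigned int n_avail, int min_freq)
{
	struct toy_scan_request *req;
	unsigned int i, used = 0;

	assert(freqs != NULL || n_avail == 0);

	req = calloc(1, sizeof(*req) + n_avail * sizeof(req->channels[0]));
	if (!req)
		return NULL;

	/* Tell the bounds checker the capacity BEFORE the first element write. */
	req->n_channels = n_avail;

	for (i = 0; i < n_avail; i++)
		if (freqs[i] >= min_freq)
			req->channels[used++] = freqs[i];

	/* Now record how many elements were actually added. */
	req->n_channels = used;
	return req;
}
```

If the count were still zero while the loop runs, every write to `channels[]` would be out of bounds from the sanitizer's point of view, even though the memory was allocated.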
      
      Fixes: aa4ec06c ("wifi: cfg80211: use __counted_by where appropriate")
      Cc: stable@vger.kernel.org
      Signed-off-by: Aleksei Vetrov <vvvvvv@google.com>
      Reviewed-by: Jeff Johnson <quic_jjohnson@quicinc.com>
      Link: https://patch.msgid.link/20241029-nl80211_parse_sched_scan-bounds-checker-fix-v2-1-c804b787341f@google.com
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • Bluetooth: Fix type of len in rfcomm_sock_getsockopt{,_old}() · 33209e6f
      Andrej Shadura authored
      
      commit 5fe6caa62b07fd39cd6a28acc8f92ba2955e11a6 upstream.
      
      Commit 9bf4e919 worked around an issue introduced after an innocuous
      optimisation change in LLVM main:
      
      > len is defined as an 'int' because it is assigned from
      > '__user int *optlen'. However, it is clamped against the result of
      > sizeof(), which has a type of 'size_t' ('unsigned long' for 64-bit
      > platforms). This is done with min_t() because min() requires compatible
      > types, which results in both len and the result of sizeof() being casted
      > to 'unsigned int', meaning len changes signs and the result of sizeof()
      > is truncated. From there, len is passed to copy_to_user(), which has a
      > third parameter type of 'unsigned long', so it is widened and changes
      > signs again. This excessive casting in combination with the KCSAN
      > instrumentation causes LLVM to fail to eliminate the __bad_copy_from()
      > call, failing the build.
      
      The same issue occurs in rfcomm in functions rfcomm_sock_getsockopt and
      rfcomm_sock_getsockopt_old.
      
      Change the type of len to size_t in both rfcomm_sock_getsockopt and
      rfcomm_sock_getsockopt_old and replace min_t() with min().
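
The casting chain quoted above can be made visible with a simplified userspace model. `TOY_MIN_T` is a hypothetical stand-in that only mimics the cast-then-compare behaviour of min_t(); the kernel macro is more elaborate, and the build failure itself (which depends on LLVM and KCSAN) is not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in: cast both operands to one type, then compare. */
#define TOY_MIN_T(type, a, b) ((type)(a) < (type)(b) ? (type)(a) : (type)(b))

/* Old shape: a signed len is forced to unsigned int, then widened again. */
unsigned long toy_clamp_old(int len, size_t obj_size)
{
	unsigned int clamped = TOY_MIN_T(unsigned int, len, obj_size);

	return clamped;	/* widened, as for a size parameter of type unsigned long */
}

/* New shape: len is size_t from the start, so no casts are needed. */
size_t toy_clamp_new(size_t len, size_t obj_size)
{
	return len < obj_size ? len : obj_size;
}

/* Small self-check of the model. */
void toy_clamp_demo(void)
{
	assert(toy_clamp_old(4, 8) == 4);
	/* -1 becomes a huge unsigned value, so the object size wins. */
	assert(toy_clamp_old(-1, 8) == 8);
	assert(toy_clamp_new(4, 8) == 4);
	assert(toy_clamp_new(100, 8) == 8);
}
```

With both operands already of type size_t, the comparison needs no sign change or truncation, which is the simplification the commit applies to the rfcomm functions.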
      
      Cc: stable@vger.kernel.org
      Co-authored-by: Aleksei Vetrov <vvvvvv@google.com>
      Improves: 9bf4e919 ("Bluetooth: Fix type of len in {l2cap,sco}_sock_getsockopt_old()")
      Link: https://github.com/ClangBuiltLinux/linux/issues/2007
      Link: https://github.com/llvm/llvm-project/issues/85647
      Signed-off-by: Andrej Shadura <andrew.shadura@collabora.co.uk>
      Reviewed-by: Nathan Chancellor <nathan@kernel.org>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • net_sched: sch_fq: don't follow the fast path if Tx is behind now · d716851d
      Jakub Kicinski authored
      
      commit 122aba8c80618eca904490b1733af27fb8f07528 upstream.
      
      Recent kernels cause a lot of TCP retransmissions:
      
      [ ID] Interval           Transfer     Bitrate         Retr  Cwnd
      [  5]   0.00-1.00   sec  2.24 GBytes  19.2 Gbits/sec  2767    442 KBytes
      [  5]   1.00-2.00   sec  2.23 GBytes  19.1 Gbits/sec  2312    350 KBytes
                                                            ^^^^
      
      Replacing the qdisc with pfifo makes retransmissions go away.
      
      It appears that a flow may have a delayed packet with a very near
      Tx time. Later, we may get busy processing Rx and the target Tx time
      will pass, but we won't service Tx since the CPU is busy with Rx.
      If Rx sees an ACK and we try to push more data for the delayed flow
      we may fastpath the skb, not realizing that there are already "ready
      to send" packets for this flow sitting in the qdisc.
      
      Don't trust the fastpath if we are "behind" according to the projected
      Tx time for next flow waiting in the Qdisc. Because we consider anything
      within the offload window to be okay for fastpath we must consider
      the entire offload window as "now".
      
      Qdisc config:
      
      qdisc fq 8001: dev eth0 parent 1234:1 limit 10000p flow_limit 100p \
        buckets 32768 orphan_mask 1023 bands 3 \
        priomap 1 2 2 2 1 2 0 0 1 1 1 1 1 1 1 1 \
        weights 589824 196608 65536 quantum 3028b initial_quantum 15140b \
        low_rate_threshold 550Kbit \
        refill_delay 40ms timer_slack 10us horizon 10s horizon_drop
      
      For iperf this change seems to do fine, the reordering is gone.
      The fastpath still gets used most of the time:
      
        gc 0 highprio 0 fastpath 142614 throttled 418309 latency 19.1us
         xx_behind 2731
      
      where "xx_behind" counts how many times we hit the new "return false".
      
      CC: stable@vger.kernel.org
      Fixes: 076433bd ("net_sched: sch_fq: add fast path for mostly idle qdisc")
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Link: https://patch.msgid.link/20241124022148.3126719-1-kuba@kernel.org
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      [stable: drop the offload horizon, it's not supported / 0]
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
    • ipmr: fix tables suspicious RCU usage · 72bb7903
      Paolo Abeni authored
      
      [ Upstream commit fc9c273d6daaa9866f349bbe8cae25c67764c456 ]
      
      Similar to the previous patch, plumb the RCU lock inside
      ipmr_get_table(), provide a lockless variant, and use
      the latter in the few spots where the lock is already held.
      
      Fixes: 709b46e8 ("net: Add compat ioctl support for the ipv4 multicast ioctl SIOCGETSGCNT")
      Fixes: f0ad0860 ("ipv4: ipmr: support multiple tables")
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ip6mr: fix tables suspicious RCU usage · ead7c126
      Paolo Abeni authored
      
      [ Upstream commit f1553c9894b4dbeb10a2ab15ab1aa113b3b4047c ]
      
      Several places call ip6mr_get_table() while holding neither the RCU nor
      the RTNL lock. Add RCU protection inside that helper and provide a
      lockless variant for the few callers that already hold the relevant lock.
      
      Note that some users additionally reference the table outside the RCU
      lock. That is actually safe as the table deletion can happen only
      after all table accesses are completed.
      
      Fixes: e2d57766 ("net: Provide compat support for SIOCGETMIFCNT_IN6 and SIOCGETSGCNT_IN6.")
      Fixes: d7c31cbd ("net: ip6mr: add RTM_GETROUTE netlink op")
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • tcp: Fix use-after-free of nreq in reqsk_timer_handler(). · 6d845028
      Kuniyuki Iwashima authored
      
      [ Upstream commit c31e72d021db2714df03df6c42855a1db592716c ]
      
      The cited commit replaced inet_csk_reqsk_queue_drop_and_put() with
      __inet_csk_reqsk_queue_drop() and reqsk_put() in reqsk_timer_handler().
      
      Then, oreq should be passed to reqsk_put() instead of req; otherwise
      use-after-free of nreq could happen when reqsk is migrated but the
      retry attempt failed (e.g. due to timeout).
      
      Let's pass oreq to reqsk_put().
      
      Fixes: e8c526f2 ("tcp/dccp: Don't use timer_pending() in reqsk_queue_unlink().")
      Reported-by: Liu Jian <liujian56@huawei.com>
      Closes: https://lore.kernel.org/netdev/1284490f-9525-42ee-b7b8-ccadf6606f6d@huawei.com/
      Signed-off-by: Kuniyuki Iwashima <kuniyu@amazon.com>
      Reviewed-by: Vadim Fedorenko <vadim.fedorenko@linux.dev>
      Reviewed-by: Liu Jian <liujian56@huawei.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: Martin KaFai Lau <martin.lau@kernel.org>
      Link: https://patch.msgid.link/20241123174236.62438-1-kuniyu@amazon.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • rxrpc: Improve setsockopt() handling of malformed user input · 8052a9b9
      Michal Luczaj authored
      
      [ Upstream commit 02020056647017e70509bb58c3096448117099e1 ]
      
      copy_from_sockptr() does not return negative value on error; instead, it
      reports the number of bytes that failed to copy. Since it's deprecated,
      switch to copy_safe_from_sockptr().
      
      Note: The `optlen != sizeof(unsigned int)` check is kept, because
      copy_safe_from_sockptr() by itself would also accept
      optlen > sizeof(unsigned int), which would allow a more lenient handling
      of inputs.
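
The two points made above (a copy helper that reports leftover bytes instead of a negative errno, and a length check that accepts oversized input unless the caller also tests for equality) can be modelled in userspace. All names below are hypothetical stand-ins; the `readable` parameter is an invented way to simulate a faulting source buffer.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/*
 * Stand-in for a copy_from_sockptr()-style helper: returns the number of
 * bytes that could NOT be copied (0 on success), never a negative errno.
 */
size_t toy_copy_from(void *dst, const void *src, size_t size, size_t readable)
{
	size_t n = size < readable ? size : readable;

	assert(dst != NULL && src != NULL);
	memcpy(dst, src, n);
	return size - n;
}

/* Stand-in for a copy_safe_from_sockptr()-style helper. */
int toy_copy_safe(void *dst, size_t ksize, const void *src, size_t optlen,
		  size_t readable)
{
	if (optlen < ksize)
		return -EINVAL;		/* too short; longer input is accepted */
	if (toy_copy_from(dst, src, ksize, readable))
		return -EFAULT;
	return 0;
}

/* setsockopt-style handler that keeps the strict equality check. */
int toy_setsockopt_uint(unsigned int *out, const void *optval, size_t optlen,
			size_t readable)
{
	if (optlen != sizeof(unsigned int))
		return -EINVAL;
	return toy_copy_safe(out, sizeof(*out), optval, optlen, readable);
}
```

Code that tests the raw helper's result with `< 0` never sees a failure, since a partial copy shows up as a positive count; the safe wrapper turns that into -EFAULT.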
      
      Fixes: 17926a79 ("[AF_RXRPC]: Provide secure RxRPC sockets for use by userspace and kernel both")
      Signed-off-by: Michal Luczaj <mhal@rbox.co>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • llc: Improve setsockopt() handling of malformed user input · 3c0e013b
      Michal Luczaj authored
      
      [ Upstream commit 1465036b10be4b8b00eb31c879e86de633ad74c1 ]
      
      copy_from_sockptr() is used incorrectly: return value is the number of
      bytes that could not be copied. Since it's deprecated, switch to
      copy_safe_from_sockptr().
      
      Note: The `optlen != sizeof(int)` check is kept, because
      copy_safe_from_sockptr() by itself would also accept optlen > sizeof(int),
      which would allow a more lenient handling of inputs.
      
      Fixes: 1da177e4 ("Linux-2.6.12-rc2")
      Suggested-by: David Wei <dw@davidwei.uk>
      Signed-off-by: Michal Luczaj <mhal@rbox.co>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Bluetooth: MGMT: Fix possible deadlocks · cac34e44
      Luiz Augusto von Dentz authored
      
      [ Upstream commit a66dfaf18fd61bb75ef8cee83db46b2aadf153d0 ]
      
      This fixes possible deadlocks like the following caused by
      hci_cmd_sync_dequeue causing the destroy function to run:
      
       INFO: task kworker/u19:0:143 blocked for more than 120 seconds.
             Tainted: G        W  O        6.8.0-2024-03-19-intel-next-iLS-24ww14 #1
       "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
       task:kworker/u19:0   state:D stack:0     pid:143   tgid:143   ppid:2      flags:0x00004000
       Workqueue: hci0 hci_cmd_sync_work [bluetooth]
       Call Trace:
        <TASK>
        __schedule+0x374/0xaf0
        schedule+0x3c/0xf0
        schedule_preempt_disabled+0x1c/0x30
        __mutex_lock.constprop.0+0x3ef/0x7a0
        __mutex_lock_slowpath+0x13/0x20
        mutex_lock+0x3c/0x50
        mgmt_set_connectable_complete+0xa4/0x150 [bluetooth]
        ? kfree+0x211/0x2a0
        hci_cmd_sync_dequeue+0xae/0x130 [bluetooth]
        ? __pfx_cmd_complete_rsp+0x10/0x10 [bluetooth]
        cmd_complete_rsp+0x26/0x80 [bluetooth]
        mgmt_pending_foreach+0x4d/0x70 [bluetooth]
        __mgmt_power_off+0x8d/0x180 [bluetooth]
        ? _raw_spin_unlock_irq+0x23/0x40
        hci_dev_close_sync+0x445/0x5b0 [bluetooth]
        hci_set_powered_sync+0x149/0x250 [bluetooth]
        set_powered_sync+0x24/0x60 [bluetooth]
        hci_cmd_sync_work+0x90/0x150 [bluetooth]
        process_one_work+0x13e/0x300
        worker_thread+0x2f7/0x420
        ? __pfx_worker_thread+0x10/0x10
        kthread+0x107/0x140
        ? __pfx_kthread+0x10/0x10
        ret_from_fork+0x3d/0x60
        ? __pfx_kthread+0x10/0x10
        ret_from_fork_asm+0x1b/0x30
        </TASK>
      
      Tested-by: Kiran K <kiran.k@intel.com>
      Fixes: f53e1c9c ("Bluetooth: MGMT: Fix possible crash on mgmt_index_removed")
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Bluetooth: MGMT: Fix slab-use-after-free Read in set_powered_sync · 87819234
      Luiz Augusto von Dentz authored
      
      [ Upstream commit 0b882940665ca2849386ee459d4331aa2f8c4e7d ]
      
      This fixes the following crash:
      
      ==================================================================
      BUG: KASAN: slab-use-after-free in set_powered_sync+0x3a/0xc0 net/bluetooth/mgmt.c:1353
      Read of size 8 at addr ffff888029b4dd18 by task kworker/u9:0/54
      
      CPU: 1 UID: 0 PID: 54 Comm: kworker/u9:0 Not tainted 6.11.0-rc6-syzkaller-01155-gf723224742fc #0
      Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 08/06/2024
      Workqueue: hci0 hci_cmd_sync_work
      Call Trace:
       <TASK>
       __dump_stack lib/dump_stack.c:93 [inline]
       dump_stack_lvl+0x241/0x360 lib/dump_stack.c:119
       print_address_description mm/kasan/report.c:377 [inline]
       print_report+0x169/0x550 mm/kasan/report.c:488
       kasan_report+0x143/0x180 mm/kasan/report.c:601
       set_powered_sync+0x3a/0xc0 net/bluetooth/mgmt.c:1353
       hci_cmd_sync_work+0x22b/0x400 net/bluetooth/hci_sync.c:328
       process_one_work kernel/workqueue.c:3231 [inline]
       process_scheduled_works+0xa2c/0x1830 kernel/workqueue.c:3312
       worker_thread+0x86d/0xd10 kernel/workqueue.c:3389
       kthread+0x2f0/0x390 kernel/kthread.c:389
       ret_from_fork+0x4b/0x80 arch/x86/kernel/process.c:147
       ret_from_fork_asm+0x1a/0x30 arch/x86/entry/entry_64.S:244
       </TASK>
      
      Allocated by task 5247:
       kasan_save_stack mm/kasan/common.c:47 [inline]
       kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
       poison_kmalloc_redzone mm/kasan/common.c:370 [inline]
       __kasan_kmalloc+0x98/0xb0 mm/kasan/common.c:387
       kasan_kmalloc include/linux/kasan.h:211 [inline]
       __kmalloc_cache_noprof+0x19c/0x2c0 mm/slub.c:4193
       kmalloc_noprof include/linux/slab.h:681 [inline]
       kzalloc_noprof include/linux/slab.h:807 [inline]
       mgmt_pending_new+0x65/0x250 net/bluetooth/mgmt_util.c:269
       mgmt_pending_add+0x36/0x120 net/bluetooth/mgmt_util.c:296
       set_powered+0x3cd/0x5e0 net/bluetooth/mgmt.c:1394
       hci_mgmt_cmd+0xc47/0x11d0 net/bluetooth/hci_sock.c:1712
       hci_sock_sendmsg+0x7b8/0x11c0 net/bluetooth/hci_sock.c:1832
       sock_sendmsg_nosec net/socket.c:730 [inline]
       __sock_sendmsg+0x221/0x270 net/socket.c:745
       sock_write_iter+0x2dd/0x400 net/socket.c:1160
       new_sync_write fs/read_write.c:497 [inline]
       vfs_write+0xa72/0xc90 fs/read_write.c:590
       ksys_write+0x1a0/0x2c0 fs/read_write.c:643
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      Freed by task 5246:
       kasan_save_stack mm/kasan/common.c:47 [inline]
       kasan_save_track+0x3f/0x80 mm/kasan/common.c:68
       kasan_save_free_info+0x40/0x50 mm/kasan/generic.c:579
       poison_slab_object+0xe0/0x150 mm/kasan/common.c:240
       __kasan_slab_free+0x37/0x60 mm/kasan/common.c:256
       kasan_slab_free include/linux/kasan.h:184 [inline]
       slab_free_hook mm/slub.c:2256 [inline]
       slab_free mm/slub.c:4477 [inline]
       kfree+0x149/0x360 mm/slub.c:4598
       settings_rsp+0x2bc/0x390 net/bluetooth/mgmt.c:1443
       mgmt_pending_foreach+0xd1/0x130 net/bluetooth/mgmt_util.c:259
       __mgmt_power_off+0x112/0x420 net/bluetooth/mgmt.c:9455
       hci_dev_close_sync+0x665/0x11a0 net/bluetooth/hci_sync.c:5191
       hci_dev_do_close net/bluetooth/hci_core.c:483 [inline]
       hci_dev_close+0x112/0x210 net/bluetooth/hci_core.c:508
       sock_do_ioctl+0x158/0x460 net/socket.c:1222
       sock_ioctl+0x629/0x8e0 net/socket.c:1341
       vfs_ioctl fs/ioctl.c:51 [inline]
       __do_sys_ioctl fs/ioctl.c:907 [inline]
       __se_sys_ioctl+0xfc/0x170 fs/ioctl.c:893
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      Reported-by: syzbot+03d6270b6425df1605bf@syzkaller.appspotmail.com
      Tested-by: syzbot+03d6270b6425df1605bf@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=03d6270b6425df1605bf
      Fixes: 275f3f64 ("Bluetooth: Fix not checking MGMT cmd pending queue")
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: hsr: fix hsr_init_sk() vs network/transport headers. · 50aa2502
      Eric Dumazet authored
      
      [ Upstream commit 9cfb5e7f0ded2bfaabc270ceb5f91d13f0e805b9 ]
      
      The following sequence in hsr_init_sk() is invalid:
      
          skb_reset_mac_header(skb);
          skb_reset_mac_len(skb);
          skb_reset_network_header(skb);
          skb_reset_transport_header(skb);
      
      It is invalid because skb_reset_mac_len() needs the correct
      network header, which should be after the mac header.
      
      This patch moves the skb_reset_network_header()
      and skb_reset_transport_header() before
      the call to dev_hard_header().
      
      As a result skb->mac_len is no longer set to a value
      close to 65535.
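
To see where a value close to 65535 can come from, here is a minimal user-space sketch. It assumes mac_len is derived as a 16-bit difference between the network header offset and the mac header offset, as the description of skb_reset_mac_len() above implies. The struct, the toy_* names, and the offsets 60, 64, and 78 are illustrative stand-ins, not the kernel's definitions.

```c
#include <stdint.h>

/* Hypothetical stand-in for the three sk_buff fields involved here. */
struct toy_skb {
	uint16_t mac_header;     /* offset of the link-layer header */
	uint16_t network_header; /* offset of the network header */
	uint16_t mac_len;        /* derived: network_header - mac_header */
};

/* Same arithmetic idea as skb_reset_mac_len(): a 16-bit difference. */
static void toy_reset_mac_len(struct toy_skb *skb)
{
	skb->mac_len = (uint16_t)(skb->network_header - skb->mac_header);
}

/* Network header already placed after a 14-byte mac header. */
static uint16_t toy_mac_len_valid(void)
{
	struct toy_skb skb = { .mac_header = 64, .network_header = 78 };

	toy_reset_mac_len(&skb);
	return skb.mac_len;
}

/* Network header offset not yet updated, so it sits before the mac header. */
static uint16_t toy_mac_len_stale(void)
{
	struct toy_skb skb = { .mac_header = 64, .network_header = 60 };

	toy_reset_mac_len(&skb);
	return skb.mac_len;
}
```

With these made-up offsets, the valid ordering yields 14 while the stale one wraps around to 65532, which is the kind of value the commit message refers to.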
      
      Fixes: 48b491a5 ("net: hsr: fix mac_len checks")
      Signed-off-by: Eric Dumazet <edumazet@google.com>
      Cc: George McCollister <george.mccollister@gmail.com>
      Link: https://patch.msgid.link/20241122171343.897551-1-edumazet@google.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net/ipv6: delete temporary address if mngtmpaddr is removed or unmanaged · cb74207e
      Hangbin Liu authored
      
      [ Upstream commit 00b5b7aab9e422d00d5a9d03d7e0760a76b5d57f ]
      
      RFC8981 section 3.4 says that existing temporary addresses must have their
      lifetimes adjusted so that no temporary addresses should ever remain "valid"
      or "preferred" longer than the incoming SLAAC Prefix Information. This would
      strongly imply in Linux's case that if the "mngtmpaddr" address is deleted or
      un-flagged as such, its corresponding temporary addresses must be cleared out
      right away.
      
      Currently, however, the temporary address is renewed even after
      "mngtmpaddr" is removed or becomes unmanaged: manage_tempaddrs() sets the
      temporary addresses' preferred/valid lifetimes to 0, and the later checks
      in addrconf_verify_rtnl() all fail to remove the addresses. Fix this by
      deleting the temporary address directly in these situations.
      
      Fixes: 778964f2 ("ipv6/addrconf: fix timing bug in tempaddr regen")
      Signed-off-by: Hangbin Liu <liuhangbin@gmail.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • s390/iucv: MSG_PEEK causes memory leak in iucv_sock_destruct() · 9f603e66
      Sidraya Jayagond authored
      
      [ Upstream commit ebaf81317e42aa990ad20b113cfe3a7b20d4e937 ]
      
      Passing the MSG_PEEK flag to skb_recv_datagram() increments the skb
      refcount (skb->users), and iucv_sock_recvmsg() does not decrement it
      on exit.
      This results in an skb memory leak in skb_queue_purge() and a WARN_ON in
      iucv_sock_destruct() during socket close. To fix this, decrease the
      skb refcount by one if MSG_PEEK is set, preventing both the memory
      leak and the WARN_ON.
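
The reference accounting can be followed with a toy model. This is only a sketch under the assumptions stated in the text (a peek takes one extra reference, and the queue purge drops only the queue's own reference); the toy_* names are hypothetical and none of this is the af_iucv code.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an skb: only the reference count matters here. */
struct toy_skb {
	int users;
};

/* Peeking receive: the skb stays queued and gains one extra reference. */
static void toy_recvmsg_peek(struct toy_skb *skb, bool drop_peek_ref)
{
	skb->users++;           /* reference taken by the peek */
	/* ... data would be copied to the caller here ... */
	if (drop_peek_ref)
		skb->users--;   /* models the fix: give the reference back */
}

/* Returns the references still held after the socket is closed. */
static int toy_refs_after_close(bool drop_peek_ref)
{
	struct toy_skb skb = { .users = 1 }; /* reference held by the queue */

	toy_recvmsg_peek(&skb, drop_peek_ref);
	skb.users--;            /* queue purge drops the queue's reference */
	return skb.users;       /* anything above 0 is a leaked skb */
}
```

Without the extra decrement one reference survives the purge, so the buffer is never freed; with it the count reaches zero.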
      
      WARNING: CPU: 2 PID: 6292 at net/iucv/af_iucv.c:286 iucv_sock_destruct+0x144/0x1a0 [af_iucv]
      CPU: 2 PID: 6292 Comm: afiucv_test_msg Kdump: loaded Tainted: G        W          6.10.0-rc7 #1
      Hardware name: IBM 3931 A01 704 (z/VM 7.3.0)
      Call Trace:
              [<001587c682c4aa98>] iucv_sock_destruct+0x148/0x1a0 [af_iucv]
              [<001587c682c4a9d0>] iucv_sock_destruct+0x80/0x1a0 [af_iucv]
              [<001587c704117a32>] __sk_destruct+0x52/0x550
              [<001587c704104a54>] __sock_release+0xa4/0x230
              [<001587c704104c0c>] sock_close+0x2c/0x40
              [<001587c702c5f5a8>] __fput+0x2e8/0x970
              [<001587c7024148c4>] task_work_run+0x1c4/0x2c0
              [<001587c7023b0716>] do_exit+0x996/0x1050
              [<001587c7023b13aa>] do_group_exit+0x13a/0x360
              [<001587c7023b1626>] __s390x_sys_exit_group+0x56/0x60
              [<001587c7022bccca>] do_syscall+0x27a/0x380
              [<001587c7049a6a0c>] __do_syscall+0x9c/0x160
              [<001587c7049ce8a8>] system_call+0x70/0x98
              Last Breaking-Event-Address:
              [<001587c682c4a9d4>] iucv_sock_destruct+0x84/0x1a0 [af_iucv]
      
      Fixes: eac3731b ("[S390]: Add AF_IUCV socket support")
      Reviewed-by: Alexandra Winter <wintera@linux.ibm.com>
      Reviewed-by: Thorsten Winkler <twinkler@linux.ibm.com>
      Signed-off-by: Sidraya Jayagond <sidraya@linux.ibm.com>
      Signed-off-by: Alexandra Winter <wintera@linux.ibm.com>
      Reviewed-by: David Wei <dw@davidwei.uk>
      Link: https://patch.msgid.link/20241119152219.3712168-1-wintera@linux.ibm.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net/l2tp: fix warning in l2tp_exit_net found by syzbot · a487cc89
      James Chapman authored
      
      [ Upstream commit 5d066766c5f1252f98ff859265bcd1a5b52ac46c ]
      
      In l2tp's net exit handler, we check that an IDR is empty before
      destroying it:
      
      	WARN_ON_ONCE(!idr_is_empty(&pn->l2tp_tunnel_idr));
      	idr_destroy(&pn->l2tp_tunnel_idr);
      
      By forcing memory allocation failures in idr_alloc_32, syzbot is able
      to provoke a condition where idr_is_empty returns false despite there
      being no items in the IDR. This turns out to be because the radix tree
      of the IDR contains only internal radix-tree nodes and it is this that
      causes idr_is_empty to return false. The internal nodes are cleaned by
      idr_destroy.
      
      Use idr_for_each to check that the IDR is empty instead of
      idr_is_empty to avoid the problem.
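
The difference between a structural emptiness check and a walk over stored entries can be illustrated with a toy tree. This is a deliberately simplified model, not the kernel's radix tree or IDR implementation: toy_node, TOY_FANOUT, and the helper names are all hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_FANOUT 4

/* Toy tree node: internal nodes link to children, leaves carry an entry. */
struct toy_node {
	struct toy_node *child[TOY_FANOUT];
	void *entry;
};

/* Structural check: "is any node allocated at all?" */
static bool toy_tree_is_empty(const struct toy_node *root)
{
	return root == NULL;
}

/* Walk the tree and count real entries, as a for-each callback could. */
static int toy_count_entries(const struct toy_node *node)
{
	int count = 0;

	if (!node)
		return 0;
	if (node->entry)
		count++;
	for (int i = 0; i < TOY_FANOUT; i++)
		count += toy_count_entries(node->child[i]);
	return count;
}

/* Build a tree that has internal nodes left over but no stored entry. */
static const struct toy_node *toy_leftover_tree(void)
{
	static struct toy_node inner; /* zero-initialized, holds nothing */
	static struct toy_node root = { .child = { &inner } };

	return &root;
}
```

In this model the leftover tree is "not empty" structurally even though counting its entries gives zero, which mirrors why walking the entries is the more reliable test.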
      
      Reported-by: syzbot+332fe1e67018625f63c9@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=332fe1e67018625f63c9
      Fixes: 73d33bd0 ("l2tp: avoid using drain_workqueue in l2tp_pre_exit_net")
      Signed-off-by: James Chapman <jchapman@katalix.com>
      Link: https://patch.msgid.link/20241118140411.1582555-1-jchapman@katalix.com
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • netlink: fix false positive warning in extack during dumps · 28af028a
      Jakub Kicinski authored
      
      [ Upstream commit 3bf39fa849ab8ed52abb6715922e6102d3df9f97 ]
      
      The commit under Fixes extended extack reporting to dumps.
      It works under normal conditions, because extack errors are
      usually reported during ->start() or the first ->dump();
      it is quite rare for a dump to start okay but fail later.
      If the dump does fail later, however, the input skb will
      already have the initiating message pulled, so checking
      whether the bad attr falls within skb->data will fail.
      
      Switch the check to using nlh, which is always valid.
      
      syzbot found a way to hit that scenario by filling up
      the receive queue. In this case we initiate a dump
      but don't call ->dump() until there is read space for
      an skb.
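
The effect of pulling the initiating message can be sketched with plain pointer ranges. This is an illustration only: toy_buf, the 64-byte buffer, and the offsets are assumptions made up for the example, and the real check lives in netlink_ack_tlv_fill().

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical view of a buffer: 'data' advances when a message is pulled. */
struct toy_buf {
	const uint8_t *data;
	size_t len;
};

/* True when p lies inside [start, start + len). */
static bool toy_in_range(const uint8_t *p, const uint8_t *start, size_t len)
{
	return p >= start && p < start + len;
}

static uint8_t toy_storage[64];

/*
 * The request occupies the first 32 bytes and the offending attribute sits
 * at offset 20. check_against_msg selects which range the check uses.
 */
static bool toy_bad_attr_found(bool check_against_msg)
{
	const uint8_t *msg = toy_storage; /* plays the role of nlh */
	size_t msg_len = 32;
	const uint8_t *bad_attr = msg + 20;
	struct toy_buf skb = { toy_storage, sizeof(toy_storage) };

	/* Pull the initiating message off the front of the buffer. */
	skb.data += msg_len;
	skb.len -= msg_len;

	if (check_against_msg)
		return toy_in_range(bad_attr, msg, msg_len);
	return toy_in_range(bad_attr, skb.data, skb.len);
}
```

Once the message has been pulled, the attribute is no longer inside the buffer's current data window, but it is still inside the message itself, so a check anchored on the message header keeps working.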
      
      WARNING: CPU: 1 PID: 5845 at net/netlink/af_netlink.c:2210 netlink_ack_tlv_fill+0x1a8/0x560 net/netlink/af_netlink.c:2209
      RIP: 0010:netlink_ack_tlv_fill+0x1a8/0x560 net/netlink/af_netlink.c:2209
      Call Trace:
       <TASK>
       netlink_dump_done+0x513/0x970 net/netlink/af_netlink.c:2250
       netlink_dump+0x91f/0xe10 net/netlink/af_netlink.c:2351
       netlink_recvmsg+0x6bb/0x11d0 net/netlink/af_netlink.c:1983
       sock_recvmsg_nosec net/socket.c:1051 [inline]
       sock_recvmsg+0x22f/0x280 net/socket.c:1073
       __sys_recvfrom+0x246/0x3d0 net/socket.c:2267
       __do_sys_recvfrom net/socket.c:2285 [inline]
       __se_sys_recvfrom net/socket.c:2281 [inline]
       __x64_sys_recvfrom+0xde/0x100 net/socket.c:2281
       do_syscall_x64 arch/x86/entry/common.c:52 [inline]
       do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
       RIP: 0033:0x7ff37dd17a79
      
      Reported-by: syzbot+d4373fa8042c06cefa84@syzkaller.appspotmail.com
      Fixes: 8af4f604 ("netlink: support all extack types in dumps")
      Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
      Link: https://patch.msgid.link/20241119224432.1713040-1-kuba@kernel.org
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • svcrdma: fix miss destroy percpu_counter in svc_rdma_proc_init() · 20322edc
      Ye Bin authored
      
      [ Upstream commit ce89e742a4c12b20f09a43fec1b21db33f2166cd ]
      
      There's an issue as follows:
      RPC: Registered rdma transport module.
      RPC: Registered rdma backchannel transport module.
      RPC: Unregistered rdma transport module.
      RPC: Unregistered rdma backchannel transport module.
      BUG: unable to handle page fault for address: fffffbfff80c609a
      PGD 123fee067 P4D 123fee067 PUD 123fea067 PMD 10c624067 PTE 0
      Oops: Oops: 0000 [#1] PREEMPT SMP KASAN NOPTI
      RIP: 0010:percpu_counter_destroy_many+0xf7/0x2a0
      Call Trace:
       <TASK>
       __die+0x1f/0x70
       page_fault_oops+0x2cd/0x860
       spurious_kernel_fault+0x36/0x450
       do_kern_addr_fault+0xca/0x100
       exc_page_fault+0x128/0x150
       asm_exc_page_fault+0x26/0x30
       percpu_counter_destroy_many+0xf7/0x2a0
       mmdrop+0x209/0x350
       finish_task_switch.isra.0+0x481/0x840
       schedule_tail+0xe/0xd0
       ret_from_fork+0x23/0x80
       ret_from_fork_asm+0x1a/0x30
       </TASK>
      
      If register_sysctl() returns NULL, svc_rdma_proc_cleanup() will not
      destroy the percpu counters that were initialized in svc_rdma_proc_init().
      If CONFIG_HOTPLUG_CPU is enabled, residual nodes may remain in the
      'percpu_counters' list, and the above issue may occur once the module is
      removed. If CONFIG_HOTPLUG_CPU is not enabled, memory is leaked.
      To solve this, destroy all percpu counters when
      register_sysctl() returns NULL.
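
The general shape of such an error path, undoing earlier setup when a later step fails, can be sketched as follows. It is a generic unwind pattern with hypothetical toy_* names and a plain integer in place of real percpu counters, not the svcrdma code.

```c
#include <stdbool.h>

#define TOY_NCOUNTERS 3

static int toy_live_counters; /* counters currently initialized */

static void toy_counter_init(void)    { toy_live_counters++; }
static void toy_counter_destroy(void) { toy_live_counters--; }

/* register_ok models whether the registration step succeeded. */
static int toy_proc_init(bool register_ok)
{
	int i;

	for (i = 0; i < TOY_NCOUNTERS; i++)
		toy_counter_init();

	if (!register_ok)
		goto err_destroy;
	return 0;

err_destroy:
	/* Undo every counter set up above before reporting failure. */
	while (i-- > 0)
		toy_counter_destroy();
	return -1;
}
```

After a failed call no counter remains initialized, so nothing is left behind on a global list or leaked.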
      
      Fixes: 1e7e5573 ("svcrdma: Restore read and write stats")
      Fixes: 22df5a22 ("svcrdma: Convert rdma_stat_sq_starve to a per-CPU counter")
      Fixes: df971cd8 ("svcrdma: Convert rdma_stat_recv to a per-CPU counter")
      Signed-off-by: Ye Bin <yebin10@huawei.com>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • svcrdma: Address an integer overflow · e5c440c2
      Chuck Lever authored
      
      [ Upstream commit 3c63d8946e578663b868cb9912dac616ea68bfd0 ]
      
      Dan Carpenter reports:
      > Commit 78147ca8 ("svcrdma: Add a "parsed chunk list" data
      > structure") from Jun 22, 2020 (linux-next), leads to the following
      > Smatch static checker warning:
      >
      >	net/sunrpc/xprtrdma/svc_rdma_recvfrom.c:498 xdr_check_write_chunk()
      >	warn: potential user controlled sizeof overflow 'segcount * 4 * 4'
      >
      > net/sunrpc/xprtrdma/svc_rdma_recvfrom.c
      >     488 static bool xdr_check_write_chunk(struct svc_rdma_recv_ctxt *rctxt)
      >     489 {
      >     490         u32 segcount;
      >     491         __be32 *p;
      >     492
      >     493         if (xdr_stream_decode_u32(&rctxt->rc_stream, &segcount))
      >                                                               ^^^^^^^^
      >
      >     494                 return false;
      >     495
      >     496         /* A bogus segcount causes this buffer overflow check to fail. */
      >     497         p = xdr_inline_decode(&rctxt->rc_stream,
      > --> 498                               segcount * rpcrdma_segment_maxsz * sizeof(*p));
      >
      >
      > segcount is an untrusted u32.  On 32bit systems anything >= SIZE_MAX / 16 will
      > have an integer overflow and some of those values will be accepted by
      > xdr_inline_decode().
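
The wraparound described in the report can be reproduced in user space by doing the arithmetic in 32 bits. The sketch below assumes rpcrdma_segment_maxsz is 4, as the 'segcount * 4 * 4' warning suggests, and uses uint32_t to stand in for a 32-bit size_t. The checked variant shows one common way to guard such a multiplication and is not necessarily what the upstream fix does.

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_SEGMENT_MAXSZ 4u /* assumed, per the 'segcount * 4 * 4' warning */
#define TOY_WORD_SIZE     4u /* size of one 32-bit XDR word */

/* Byte count as a 32-bit size_t would compute it: wraps silently. */
static uint32_t toy_bytes_wrapping(uint32_t segcount)
{
	return segcount * TOY_SEGMENT_MAXSZ * TOY_WORD_SIZE;
}

/* Checked variant: reject counts whose byte size cannot fit in 32 bits. */
static bool toy_bytes_checked(uint32_t segcount, uint32_t *out)
{
	if (segcount > UINT32_MAX / (TOY_SEGMENT_MAXSZ * TOY_WORD_SIZE))
		return false;
	*out = segcount * TOY_SEGMENT_MAXSZ * TOY_WORD_SIZE;
	return true;
}
```

A segcount of 0x10000001 wraps to a 16-byte request in the unchecked version, which a length check on the stream would happily accept; the checked version refuses it.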
      
      Reported-by: Dan Carpenter <dan.carpenter@linaro.org>
      Fixes: 78147ca8 ("svcrdma: Add a "parsed chunk list" data structure")
      Reviewed-by: Jeff Layton <jlayton@kernel.org>
      Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • bpf: fix recursive lock when verdict program return SK_PASS · f84c5ef6
      Jiayuan Chen authored
      [ Upstream commit 8ca2a1eeadf09862190b2810697702d803ceef2d ]
      
      When the stream_verdict program returns SK_PASS, it places the received skb
      into its own receive queue, but a recursive lock eventually occurs, leading
      to an operating system deadlock. This issue has been present since v6.9.
      
      '''
      sk_psock_strp_data_ready
          write_lock_bh(&sk->sk_callback_lock)
          strp_data_ready
            strp_read_sock
              read_sock -> tcp_read_sock
                strp_recv
                  cb.rcv_msg -> sk_psock_strp_read
                    # now stream_verdict return SK_PASS without peer sock assign
                    __SK_PASS = sk_psock_map_verd(SK_PASS, NULL)
                    sk_psock_verdict_apply
                      sk_psock_skb_ingress_self
                        sk_psock_skb_ingress_enqueue
                          sk_psock_data_ready
                            read_lock_bh(&sk->sk_callback_lock) <= dead lock
      
      '''
      
      This topic has been discussed before, but it has not been fixed.
      Previous discussion:
      https://lore.kernel.org/all/6684a5864ec86_403d20898@john.notmuch
      
      
      
      Fixes: 6648e613 ("bpf, skmsg: Fix NULL pointer dereference in sk_psock_skb_ingress_enqueue")
      Reported-by: Vincent Whitchurch <vincent.whitchurch@datadoghq.com>
      Signed-off-by: Jiayuan Chen <mrpre@163.com>
      Signed-off-by: John Fastabend <john.fastabend@gmail.com>
      Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
      Link: https://patch.msgid.link/20241118030910.36230-2-mrpre@163.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • xsk: Free skb when TX metadata options are invalid · d5d346de
      Felix Maurer authored
      
      [ Upstream commit 0c0d0f42ffa6ac94cd79893b7ed419c15e1b45de ]
      
      When a new skb is allocated for transmitting an xsk descriptor, i.e., for
      every non-multibuf descriptor or the first frag of a multibuf descriptor,
      but the descriptor is later found to have invalid options set for the TX
      metadata, the new skb is never freed. This can leak skbs until the send
      buffer is full which makes sending more packets impossible.
      
      Fix this by freeing the skb in the error path if we are currently dealing
      with the first frag, i.e., an skb allocated in this iteration of
      xsk_build_skb.
      
      Fixes: 48eb03dd ("xsk: Add TX timestamp and TX checksum offload support")
      Reported-by: Michal Schmidt <mschmidt@redhat.com>
      Signed-off-by: Felix Maurer <fmaurer@redhat.com>
      Reviewed-by: Toke Høiland-Jørgensen <toke@redhat.com>
      Acked-by: Stanislav Fomichev <sdf@fomichev.me>
      Acked-by: Martin KaFai Lau <martin.lau@kernel.org>
      Link: https://patch.msgid.link/edb9b00fb19e680dff5a3350cd7581c5927975a8.1731581697.git.fmaurer@redhat.com
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Bluetooth: fix use-after-free in device_for_each_child() · 7b277bd5
      Dmitry Antipov authored
      
      [ Upstream commit 27aabf27fd014ae037cc179c61b0bee7cff55b3d ]
      
      Syzbot has reported the following KASAN splat:
      
      BUG: KASAN: slab-use-after-free in device_for_each_child+0x18f/0x1a0
      Read of size 8 at addr ffff88801f605308 by task kbnepd bnep0/4980
      
      CPU: 0 UID: 0 PID: 4980 Comm: kbnepd bnep0 Not tainted 6.12.0-rc4-00161-gae90f6a6170d #1
      Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-2.fc40 04/01/2014
      Call Trace:
       <TASK>
       dump_stack_lvl+0x100/0x190
       ? device_for_each_child+0x18f/0x1a0
       print_report+0x13a/0x4cb
       ? __virt_addr_valid+0x5e/0x590
       ? __phys_addr+0xc6/0x150
       ? device_for_each_child+0x18f/0x1a0
       kasan_report+0xda/0x110
       ? device_for_each_child+0x18f/0x1a0
       ? __pfx_dev_memalloc_noio+0x10/0x10
       device_for_each_child+0x18f/0x1a0
       ? __pfx_device_for_each_child+0x10/0x10
       pm_runtime_set_memalloc_noio+0xf2/0x180
       netdev_unregister_kobject+0x1ed/0x270
       unregister_netdevice_many_notify+0x123c/0x1d80
       ? __mutex_trylock_common+0xde/0x250
       ? __pfx_unregister_netdevice_many_notify+0x10/0x10
       ? trace_contention_end+0xe6/0x140
       ? __mutex_lock+0x4e7/0x8f0
       ? __pfx_lock_acquire.part.0+0x10/0x10
       ? rcu_is_watching+0x12/0xc0
       ? unregister_netdev+0x12/0x30
       unregister_netdevice_queue+0x30d/0x3f0
       ? __pfx_unregister_netdevice_queue+0x10/0x10
       ? __pfx_down_write+0x10/0x10
       unregister_netdev+0x1c/0x30
       bnep_session+0x1fb3/0x2ab0
       ? __pfx_bnep_session+0x10/0x10
       ? __pfx_lock_release+0x10/0x10
       ? __pfx_woken_wake_function+0x10/0x10
       ? __kthread_parkme+0x132/0x200
       ? __pfx_bnep_session+0x10/0x10
       ? kthread+0x13a/0x370
       ? __pfx_bnep_session+0x10/0x10
       kthread+0x2b7/0x370
       ? __pfx_kthread+0x10/0x10
       ret_from_fork+0x48/0x80
       ? __pfx_kthread+0x10/0x10
       ret_from_fork_asm+0x1a/0x30
       </TASK>
      
      Allocated by task 4974:
       kasan_save_stack+0x30/0x50
       kasan_save_track+0x14/0x30
       __kasan_kmalloc+0xaa/0xb0
       __kmalloc_noprof+0x1d1/0x440
       hci_alloc_dev_priv+0x1d/0x2820
       __vhci_create_device+0xef/0x7d0
       vhci_write+0x2c7/0x480
       vfs_write+0x6a0/0xfc0
       ksys_write+0x12f/0x260
       do_syscall_64+0xc7/0x250
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      Freed by task 4979:
       kasan_save_stack+0x30/0x50
       kasan_save_track+0x14/0x30
       kasan_save_free_info+0x3b/0x60
       __kasan_slab_free+0x4f/0x70
       kfree+0x141/0x490
       hci_release_dev+0x4d9/0x600
       bt_host_release+0x6a/0xb0
       device_release+0xa4/0x240
       kobject_put+0x1ec/0x5a0
       put_device+0x1f/0x30
       vhci_release+0x81/0xf0
       __fput+0x3f6/0xb30
       task_work_run+0x151/0x250
       do_exit+0xa79/0x2c30
       do_group_exit+0xd5/0x2a0
       get_signal+0x1fcd/0x2210
       arch_do_signal_or_restart+0x93/0x780
       syscall_exit_to_user_mode+0x140/0x290
       do_syscall_64+0xd4/0x250
       entry_SYSCALL_64_after_hwframe+0x77/0x7f
      
      In 'hci_conn_del_sysfs()', 'device_unregister()' may be called when
      the underlying kobject reference counter is greater than 1. This
      means that reparenting (which happens when the device is actually
      freed) is delayed and, during that delay, the parent controller device
      (hciX) may be deleted. Since this can leave a dangling pointer to the
      freed parent, avoid that scenario by reparenting to NULL explicitly.
      
      Reported-by: syzbot+6cf5652d3df49fae2e3f@syzkaller.appspotmail.com
      Tested-by: syzbot+6cf5652d3df49fae2e3f@syzkaller.appspotmail.com
      Closes: https://syzkaller.appspot.com/bug?extid=6cf5652d3df49fae2e3f
      Fixes: a85fb91e ("Bluetooth: Fix double free in hci_conn_cleanup")
      Signed-off-by: Dmitry Antipov <dmantipov@yandex.ru>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Bluetooth: ISO: Send BIG Create Sync via hci_sync · 1360e5b6
      Iulia Tanasescu authored
      
      [ Upstream commit 07a9342b94a91b306ed1cf6aa8254aea210764c9 ]
      
      Before issuing the LE BIG Create Sync command, an available BIG handle
      is chosen by iterating through the conn_hash list and finding the first
      unused value.
      
      If a BIG is terminated, the associated hcons are removed from the list
      and the LE BIG Terminate Sync command is sent via hci_sync queue.
      However, a new LE BIG Create Sync command might be issued via
      hci_send_cmd, before the previous BIG sync was terminated. This
      can cause the same BIG handle to be reused and the LE BIG Create Sync
      to fail with Command Disallowed.
      
      < HCI Command: LE Broadcast Isochronous Group Create Sync (0x08|0x006b)
              BIG Handle: 0x00
              BIG Sync Handle: 0x0002
              Encryption: Unencrypted (0x00)
              Broadcast Code[16]: 00000000000000000000000000000000
              Maximum Number Subevents: 0x00
              Timeout: 20000 ms (0x07d0)
              Number of BIS: 1
              BIS ID: 0x01
      > HCI Event: Command Status (0x0f) plen 4
            LE Broadcast Isochronous Group Create Sync (0x08|0x006b) ncmd 1
              Status: Command Disallowed (0x0c)
      < HCI Command: LE Broadcast Isochronous Group Terminate Sync (0x08|0x006c)
              BIG Handle: 0x00
      
      This commit fixes the ordering of the LE BIG Create Sync/LE BIG Terminate
      Sync commands, to make sure that either the previous BIG sync is
      terminated before reusing the handle, or that a new handle is chosen
      for a new sync.
      
      Fixes: eca0ae4a ("Bluetooth: Add initial implementation of BIS connections")
      Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Bluetooth: ISO: Do not emit LE BIG Create Sync if previous is pending · 91d19383
      Iulia Tanasescu authored
      
      [ Upstream commit 42ecf1947135110ea08abeaca39741636f9a2285 ]
      
      The Bluetooth Core spec does not allow an LE BIG Create Sync command to
      be sent to the Controller if another one is pending (Vol 4, Part E,
      page 2586).

      To avoid this issue, the HCI_CONN_CREATE_BIG_SYNC flag was added
      to mark that the LE BIG Create Sync command has been sent for a hcon.
      Once the BIG Sync Established event is received, the hcon flag is
      cleared and the next pending hcon is handled.
      
      Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Stable-dep-of: 07a9342b94a9 ("Bluetooth: ISO: Send BIG Create Sync via hci_sync")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Bluetooth: ISO: Do not emit LE PA Create Sync if previous is pending · 67ead8f8
      Iulia Tanasescu authored
      
      [ Upstream commit 4a5e0ba68676b3a77298cf646cd2b39c94fbd2f5 ]
      
      The Bluetooth Core spec does not allow an LE PA Create Sync command to
      be sent to the Controller if another one is pending (Vol 4, Part E,
      page 2493).

      To avoid this issue, the HCI_CONN_CREATE_PA_SYNC flag was added
      to mark that the LE PA Create Sync command has been sent for a hcon.
      Once the PA Sync Established event is received, the hcon flag is
      cleared and the next pending hcon is handled.
      
      Signed-off-by: Iulia Tanasescu <iulia.tanasescu@nxp.com>
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Stable-dep-of: 07a9342b94a9 ("Bluetooth: ISO: Send BIG Create Sync via hci_sync")
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • Bluetooth: ISO: Use kref to track lifetime of iso_conn · a58d0f5d
      Luiz Augusto von Dentz authored
      
      [ Upstream commit dc26097bdb864a0d5955b9a25e43376ffc1af99b ]
      
      This makes use of kref to keep track of references to iso_conn, which
      allows better tracking of its lifetime with things like
      kref_get_unless_zero, in a similar way to l2cap_chan.

      In addition, remove the call to iso_sock_set_timer in iso_sock_disconn:
      at that point it is useless to set a timer, since the sk will be freed
      and there is nothing left to be done in iso_sock_timeout.
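
The idea behind a "get unless zero" operation can be sketched in user space with C11 atomics. This is a toy analogue of the semantics, with hypothetical toy_* names; it is not the kernel's kref implementation, which is built on refcount_t.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical reference counter standing in for a kref. */
struct toy_ref {
	atomic_int count;
};

/* Take a reference only if the object is still alive (count > 0). */
static bool toy_get_unless_zero(struct toy_ref *r)
{
	int old = atomic_load(&r->count);

	while (old > 0) {
		/* On failure, 'old' is reloaded with the current value. */
		if (atomic_compare_exchange_weak(&r->count, &old, old + 1))
			return true;
	}
	return false;
}

/* Drop a reference; returns true when the caller dropped the last one. */
static bool toy_put(struct toy_ref *r)
{
	return atomic_fetch_sub(&r->count, 1) == 1;
}
```

A lookup path that races with teardown can call toy_get_unless_zero() and simply back off when it returns false, instead of resurrecting an object whose last reference has already been dropped.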
      
      Fixes: ccf74f23 ("Bluetooth: Add BTPROTO_ISO socket type")
      Signed-off-by: Luiz Augusto von Dentz <luiz.von.dentz@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • net: rfkill: gpio: Add check for clk_enable() · 20be8d4b
      Mingwei Zheng authored
      
      [ Upstream commit 8251e7621b25ccdb689f1dd9553b8789e3745ea1 ]
      
      Add a check for the return value of clk_enable() to catch a potential
      error.
      
      Fixes: 7176ba23 ("net: rfkill: add generic gpio rfkill driver")
      Signed-off-by: Mingwei Zheng <zmw12306@gmail.com>
      Signed-off-by: Jiasheng Jiang <jiashengjiangcool@gmail.com>
      Link: https://patch.msgid.link/20241108195341.1853080-1-zmw12306@gmail.com
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
    • ipv6: Fix soft lockups in fib6_select_path under high next hop churn · 34a949e7
      Omid Ehtemam-Haghighi authored
      
      [ Upstream commit d9ccb18f83ea2bb654289b6ecf014fd267cc988b ]
      
      Soft lockups have been observed on a cluster of Linux-based edge routers
      located in a highly dynamic environment. Using the `bird` service, these
      routers continuously update BGP-advertised routes due to frequently
      changing nexthop destinations, while also managing significant IPv6
      traffic. The lockups occur during the traversal of the multipath
      circular linked-list in the `fib6_select_path` function, particularly
      while iterating through the siblings in the list. The issue typically
      arises when the nodes of the linked list are unexpectedly deleted
      concurrently on a different core—indicated by their 'next' and
      'previous' elements pointing back to the node itself and their reference
      count dropping to zero. This results in an infinite loop, leading to a
      soft lockup that triggers a system panic via the watchdog timer.
      
      Apply RCU primitives in the problematic code sections to resolve the
      issue. Where necessary, update the references to fib6_siblings to
      annotate or use the RCU APIs.
      
      Include a test script that reproduces the issue. The script
      periodically updates the routing table while generating a heavy load
      of outgoing IPv6 traffic through multiple iperf3 clients. It
      consistently induces infinite soft lockups within a couple of minutes.
      
      Kernel log:
      
       0 [ffffbd13003e8d30] machine_kexec at ffffffff8ceaf3eb
       1 [ffffbd13003e8d90] __crash_kexec at ffffffff8d0120e3
       2 [ffffbd13003e8e58] panic at ffffffff8cef65d4
       3 [ffffbd13003e8ed8] watchdog_timer_fn at ffffffff8d05cb03
       4 [ffffbd13003e8f08] __hrtimer_run_queues at ffffffff8cfec62f
       5 [ffffbd13003e8f70] hrtimer_interrupt at ffffffff8cfed756
       6 [ffffbd13003e8fd0] __sysvec_apic_timer_interrupt at ffffffff8cea01af
       7 [ffffbd13003e8ff0] sysvec_apic_timer_interrupt at ffffffff8df1b83d
      -- <IRQ stack> --
       8 [ffffbd13003d3708] asm_sysvec_apic_timer_interrupt at ffffffff8e000ecb
          [exception RIP: fib6_select_path+299]
          RIP: ffffffff8ddafe7b  RSP: ffffbd13003d37b8  RFLAGS: 00000287
          RAX: ffff975850b43600  RBX: ffff975850b40200  RCX: 0000000000000000
          RDX: 000000003fffffff  RSI: 0000000051d383e4  RDI: ffff975850b43618
          RBP: ffffbd13003d3800   R8: 0000000000000000   R9: ffff975850b40200
          R10: 0000000000000000  R11: 0000000000000000  R12: ffffbd13003d3830
          R13: ffff975850b436a8  R14: ffff975850b43600  R15: 0000000000000007
          ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
       9 [ffffbd13003d3808] ip6_pol_route at ffffffff8ddb030c
      10 [ffffbd13003d3888] ip6_pol_route_input at ffffffff8ddb068c
      11 [ffffbd13003d3898] fib6_rule_lookup at ffffffff8ddf02b5
      12 [ffffbd13003d3928] ip6_route_input at ffffffff8ddb0f47
      13 [ffffbd13003d3a18] ip6_rcv_finish_core.constprop.0 at ffffffff8dd950d0
      14 [ffffbd13003d3a30] ip6_list_rcv_finish.constprop.0 at ffffffff8dd96274
      15 [ffffbd13003d3a98] ip6_sublist_rcv at ffffffff8dd96474
      16 [ffffbd13003d3af8] ipv6_list_rcv at ffffffff8dd96615
      17 [ffffbd13003d3b60] __netif_receive_skb_list_core at ffffffff8dc16fec
      18 [ffffbd13003d3be0] netif_receive_skb_list_internal at ffffffff8dc176b3
      19 [ffffbd13003d3c50] napi_gro_receive at ffffffff8dc565b9
      20 [ffffbd13003d3c80] ice_receive_skb at ffffffffc087e4f5 [ice]
      21 [ffffbd13003d3c90] ice_clean_rx_irq at ffffffffc0881b80 [ice]
      22 [ffffbd13003d3d20] ice_napi_poll at ffffffffc088232f [ice]
      23 [ffffbd13003d3d80] __napi_poll at ffffffff8dc18000
      24 [ffffbd13003d3db8] net_rx_action at ffffffff8dc18581
      25 [ffffbd13003d3e40] __do_softirq at ffffffff8df352e9
      26 [ffffbd13003d3eb0] run_ksoftirqd at ffffffff8ceffe47
      27 [ffffbd13003d3ec0] smpboot_thread_fn at ffffffff8cf36a30
      28 [ffffbd13003d3ee8] kthread at ffffffff8cf2b39f
      29 [ffffbd13003d3f28] ret_from_fork at ffffffff8ce5fa64
      30 [ffffbd13003d3f50] ret_from_fork_asm at ffffffff8ce03cbb
      
      Fixes: 66f5d6ce ("ipv6: replace rwlock with rcu and spinlock in fib6_table")
      Reported-by: default avatarAdrian Oliver <kernel@aoliver.ca>
      Signed-off-by: default avatarOmid Ehtemam-Haghighi <omid.ehtemamhaghighi@menlosecurity.com>
      Cc: Shuah Khan <shuah@kernel.org>
      Cc: Ido Schimmel <idosch@idosch.org>
      Cc: Kuniyuki Iwashima <kuniyu@amazon.com>
      Cc: Simon Horman <horms@kernel.org>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://patch.msgid.link/20241106010236.1239299-1-omid.ehtemamhaghighi@menlosecurity.com
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      34a949e7
    • Lingbo Kong's avatar
      wifi: cfg80211: Remove the Medium Synchronization Delay validity check · ee22f520
      Lingbo Kong authored
      
      [ Upstream commit b4ebb58cb9a4b1b5cb5278b09d6afdcd71b2a6b4 ]
      
      Currently, when the driver attempts to connect to an AP MLD with multiple
      APs, the cfg80211_mlme_check_mlo_compat() function requires the Medium
      Synchronization Delay values from different APs of the same AP MLD to be
      equal, which may result in connection failures.
      
      This is because when the driver receives a multi-link probe response from
      an AP MLD with multiple APs, cfg80211 updates the Elements for each AP
      based on the multi-link probe response. If the Medium Synchronization Delay
      is set in the multi-link probe response, the Elements for each AP belonging
      to the same AP MLD will have the Medium Synchronization Delay set
      simultaneously. If non-multi-link probe responses are received from
      different APs of the same AP MLD, cfg80211 will still update the Elements
      based on the non-multi-link probe response. Since the non-multi-link probe
      response does not set the Medium Synchronization Delay
      (IEEE 802.11be-2024-35.3.4.4), if the Elements from a non-multi-link probe
      response overwrite those from a multi-link probe response that has set the
      Medium Synchronization Delay, the Medium Synchronization Delay values for
      APs belonging to the same AP MLD will not be equal. This discrepancy causes
      the cfg80211_mlme_check_mlo_compat() function to fail, leading to
      connection failures. Commit ccb964b4
      ("wifi: cfg80211: validate MLO connections better") did not take this into
      account.
      
      To address this issue, remove this validity check.
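
      The overwrite sequence described above can be sketched with a toy
      model. This is only an illustration under stated assumptions: the
      struct and every toy_* function are hypothetical and are not cfg80211
      APIs, and the real code parses raw Elements rather than a flag and a
      value.

      ```c
      #include <assert.h>
      #include <stdbool.h>
      #include <stddef.h>

      /* Hypothetical per-AP record; real cfg80211 keeps raw Elements instead. */
      struct toy_ap_elems {
          bool has_msd;      /* Medium Synchronization Delay present? */
          unsigned int msd;  /* toy value, not the on-air encoding */
      };

      /* A multi-link probe response updates every AP of the AP MLD at once. */
      static void toy_apply_ml_probe_resp(struct toy_ap_elems *aps, size_t n,
                                          unsigned int msd)
      {
          for (size_t i = 0; i < n; i++) {
              aps[i].has_msd = true;
              aps[i].msd = msd;
          }
      }

      /* A non-multi-link probe response carries no delay and overwrites one AP. */
      static void toy_apply_non_ml_probe_resp(struct toy_ap_elems *ap)
      {
          ap->has_msd = false;
          ap->msd = 0;
      }

      /* Stand-in for the equality requirement that this commit removes. */
      static bool toy_msd_matches(const struct toy_ap_elems *a,
                                  const struct toy_ap_elems *b)
      {
          return a->has_msd == b->has_msd && a->msd == b->msd;
      }
      ```

      In this model, two APs match right after toy_apply_ml_probe_resp() and
      stop matching once toy_apply_non_ml_probe_resp() touches one of them,
      which mirrors the mismatch described in the message.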
      
      Fixes: ccb964b4 ("wifi: cfg80211: validate MLO connections better")
      Signed-off-by: Lingbo Kong <quic_lingbok@quicinc.com>
      Link: https://patch.msgid.link/20241031134223.970-1-quic_lingbok@quicinc.com
      
      
      Signed-off-by: Johannes Berg <johannes.berg@intel.com>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ee22f520
    • Paolo Abeni's avatar
      ipv6: release nexthop on device removal · 0e4c6faa
      Paolo Abeni authored
      
      [ Upstream commit eb02688c5c45c3e7af7e71f036a7144f5639cbfe ]
      
      The CI is hitting some aperiodic hangup at device removal time in the
      pmtu.sh self-test:
      
      unregister_netdevice: waiting for veth_A-R1 to become free. Usage count = 6
      ref_tracker: veth_A-R1@ffff888013df15d8 has 1/5 users at
      	dst_init+0x84/0x4a0
      	dst_alloc+0x97/0x150
      	ip6_dst_alloc+0x23/0x90
      	ip6_rt_pcpu_alloc+0x1e6/0x520
      	ip6_pol_route+0x56f/0x840
      	fib6_rule_lookup+0x334/0x630
      	ip6_route_output_flags+0x259/0x480
      	ip6_dst_lookup_tail.constprop.0+0x5c2/0x940
      	ip6_dst_lookup_flow+0x88/0x190
      	udp_tunnel6_dst_lookup+0x2a7/0x4c0
      	vxlan_xmit_one+0xbde/0x4a50 [vxlan]
      	vxlan_xmit+0x9ad/0xf20 [vxlan]
      	dev_hard_start_xmit+0x10e/0x360
      	__dev_queue_xmit+0xf95/0x18c0
      	arp_solicit+0x4a2/0xe00
      	neigh_probe+0xaa/0xf0
      
      While the first suspect is the dst_cache, explicitly tracking the dst
      owning the last device reference via probes proved such dst is held by
      the nexthop in the originating fib6_info.
      
      Similar to commit f5b51fe8 ("ipv6: route: purge exception on
      removal"), we need to explicitly release the originating fib info when
      disconnecting a to-be-removed device from a live ipv6 dst: move the
      fib6_info cleanup into ip6_dst_ifdown().
      
      Tested running:
      
      ./pmtu.sh cleanup_ipv6_exception
      
      in a tight loop for more than 400 iterations with no splat; running an
      unpatched kernel, I observed a splat every ~10 iterations.
      
      Fixes: f88d8ea6 ("ipv6: Plumb support for nexthop object in a fib6_info")
      Signed-off-by: Paolo Abeni <pabeni@redhat.com>
      Reviewed-by: Eric Dumazet <edumazet@google.com>
      Reviewed-by: David Ahern <dsahern@kernel.org>
      Link: https://patch.msgid.link/604c45c188c609b732286b47ac2a451a40f6cf6d.1730828007.git.pabeni@redhat.com
      
      
      Signed-off-by: Jakub Kicinski <kuba@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      0e4c6faa
    • Zijian Zhang's avatar
      bpf, sockmap: Fix sk_msg_reset_curr · 08baa3f0
      Zijian Zhang authored
      
      [ Upstream commit 955afd57dc4bf7e8c620a0a9e3af3c881c2c6dff ]
      
      Found in test_txmsg_pull in test_sockmap:
      ```
      txmsg_cork = 512; // corking is important here
      opt->iov_length = 3;
      opt->iov_count = 1;
      opt->rate = 512; // sendmsg will be invoked 512 times
      ```
      The first sendmsg will send an sk_msg with size 3, and bpf_msg_pull_data
      will be invoked the first time. sk_msg_reset_curr will reset the copybreak
      from 3 to 0. In the second sendmsg, since we are still corking,
      psock->cork will be reused in sk_msg_alloc. Because msg->sg.copybreak is
      now 0, the second msg will overwrite the first msg. As a result, the
      data integrity test fails.
      
      The same problem happens in push and pop test. Thus, fix sk_msg_reset_curr
      to restore the correct copybreak.
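
      The overwrite can be reproduced in a small user-space model. This is a
      sketch under assumptions, not the kernel code: struct toy_msg and all
      toy_* functions are hypothetical, a single flat buffer stands in for
      the scatterlist, and toy_reset_curr_fixed() only models the intent of
      restoring copybreak to the end of the data already copied.

      ```c
      #include <assert.h>
      #include <stddef.h>
      #include <string.h>

      /* Hypothetical, heavily simplified stand-in for one corked sk_msg. */
      struct toy_msg {
          char   data[16];   /* backing storage for the corked bytes */
          size_t size;       /* total bytes accounted to the message */
          size_t copybreak;  /* offset at which the next copy starts */
      };

      /* Corked send: account the bytes, then copy them in at copybreak. */
      static void toy_send(struct toy_msg *m, const char *buf, size_t len)
      {
          m->size += len;
          memcpy(m->data + m->copybreak, buf, len);
          m->copybreak += len;
      }

      /* Models the problematic reset: copybreak always returns to 0. */
      static void toy_reset_curr_buggy(struct toy_msg *m)
      {
          m->copybreak = 0;
      }

      /* Models the intended reset: keep writing after the existing data. */
      static void toy_reset_curr_fixed(struct toy_msg *m)
      {
          m->copybreak = m->size;
      }

      /* Two corked sends with a reset in between; 1 if both payloads survive. */
      static int toy_two_sends_intact(void (*reset)(struct toy_msg *))
      {
          struct toy_msg m = { .size = 0, .copybreak = 0 };

          toy_send(&m, "abc", 3);
          reset(&m);               /* stands in for the pull_data path */
          toy_send(&m, "xyz", 3);
          return m.size == 6 && memcmp(m.data, "abcxyz", 6) == 0;
      }
      ```

      With toy_reset_curr_buggy the second payload lands on top of the first,
      so the integrity check fails; with toy_reset_curr_fixed both payloads
      are preserved in order.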
      
      Fixes: bb9aefde ("bpf: sockmap, updating the sg structure should also update curr")
      Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
      Link: https://lore.kernel.org/r/20241106222520.527076-9-zijianzhang@bytedance.com
      
      
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      08baa3f0
    • Zijian Zhang's avatar
      bpf, sockmap: Several fixes to bpf_msg_pop_data · 275a9f3e
      Zijian Zhang authored
      
      [ Upstream commit 5d609ba262475db450ba69b8e8a557bd768ac07a ]
      
      Several fixes to bpf_msg_pop_data:
      1. In sk_msg_shift_left, we should call put_page.
      2. If len == 0, it is better to return early.
      3. Popping the entire sk_msg (last == msg->sg.size) should be supported.
      4. Fix the value of variable "a".
      5. In sk_msg_shift_left, after shifting, i already points to the next
      element. An additional sk_msg_iter_var_next may result in a BUG.
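
      Item 5 is an instance of a general pitfall: after removing an element
      by shifting the rest left, the current index already refers to the next
      element. A minimal sketch, assuming a plain linear array instead of the
      sk_msg ring (struct toy_list and the toy_* helpers are hypothetical):

      ```c
      #include <assert.h>
      #include <stddef.h>

      #define TOY_SLOTS 8 /* hypothetical capacity */

      struct toy_list {
          int    len[TOY_SLOTS]; /* per-element lengths */
          size_t count;          /* number of live elements */
      };

      /* Remove slot i by moving every later slot one position left. */
      static void toy_shift_left(struct toy_list *l, size_t i)
      {
          for (; i + 1 < l->count; i++)
              l->len[i] = l->len[i + 1];
          l->count--;
      }

      /*
       * Drop every zero-length slot and return the remaining count.
       * extra_advance != 0 models the redundant step after a shift.
       */
      static size_t toy_drop_empty(struct toy_list *l, int extra_advance)
      {
          size_t i = 0;

          while (i < l->count) {
              if (l->len[i] == 0) {
                  toy_shift_left(l, i);
                  if (extra_advance)
                      i++;    /* skips the element that just moved into i */
              } else {
                  i++;
              }
          }
          return l->count;
      }
      ```

      In this linear model the extra step merely skips an element. In the
      kernel the index walks a ring, and per the message above the extra
      advance may result in a BUG.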
      
      Fixes: 7246d8ed ("bpf: helper to pop data from messages")
      Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
      Reviewed-by: John Fastabend <john.fastabend@gmail.com>
      Link: https://lore.kernel.org/r/20241106222520.527076-8-zijianzhang@bytedance.com
      
      
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      275a9f3e
    • Zijian Zhang's avatar
      bpf, sockmap: Several fixes to bpf_msg_push_data · ce06c450
      Zijian Zhang authored
      
      [ Upstream commit 15ab0548e3107665c34579ae523b2b6e7c22082a ]
      
      Several fixes to bpf_msg_push_data:
      1. test_sockmap has tests where bpf_msg_push_data is invoked to push some
      data at the end of a message, but -EINVAL is returned. In this case, in
      bpf_msg_push_data, after the first loop, i will be set to msg->sg.end. Add
      the logic to handle it.
      2. In the code block of "if (start - offset)", it's possible that "i"
      points to the last sk_msg_elem. In this case, "sk_msg_iter_next(msg,
      end)" might still be called twice (the other call is in the "if (!copy)"
      code block), but only one call is needed. Add the logic to handle it,
      and restructure the code to make the logic clearer.
      
      Fixes: 6fff607e ("bpf: sk_msg program helper bpf_msg_push_data")
      Signed-off-by: Zijian Zhang <zijianzhang@bytedance.com>
      Link: https://lore.kernel.org/r/20241106222520.527076-7-zijianzhang@bytedance.com
      
      
      Signed-off-by: Martin KaFai Lau <martin.lau@kernel.org>
      Signed-off-by: Sasha Levin <sashal@kernel.org>
      ce06c450