- Dec 09, 2024
-
-
Yang Erkun authored
commit 98100e88dd8865999dc6379a3356cd799795fe7b upstream. The action force umount(umount -f) will attempt to kill all rpc_task even umount operation may ultimately fail if some files remain open. Consequently, if an action attempts to open a file, it can potentially send two rpc_task to nfs server. NFS CLIENT thread1 thread2 open("file") ... nfs4_do_open _nfs4_do_open _nfs4_open_and_get_state _nfs4_proc_open nfs4_run_open_task /* rpc_task1 */ rpc_run_task rpc_wait_for_completion_task umount -f nfs_umount_begin rpc_killall_tasks rpc_signal_task rpc_task1 been wakeup and return -512 _nfs4_do_open // while loop ... nfs4_run_open_task /* rpc_task2 */ rpc_run_task rpc_wait_for_completion_task While processing an open request, nfsd will first attempt to find or allocate an nfs4_openowner. If it finds an nfs4_openowner that is not marked as NFS4_OO_CONFIRMED, this nfs4_openowner will released. Since two rpc_task can attempt to open the same file simultaneously from the client to server, and because two instances of nfsd can run concurrently, this situation can lead to lots of memory leak. Additionally, when we echo 0 to /proc/fs/nfsd/threads, warning will be triggered. NFS SERVER nfsd1 nfsd2 echo 0 > /proc/fs/nfsd/threads nfsd4_open nfsd4_process_open1 find_or_alloc_open_stateowner // alloc oo1, stateid1 nfsd4_open nfsd4_process_open1 find_or_alloc_open_stateowner // find oo1, without NFS4_OO_CONFIRMED release_openowner unhash_openowner_locked list_del_init(&oo->oo_perclient) // cannot find this oo // from client, LEAK!!! alloc_stateowner // alloc oo2 nfsd4_process_open2 init_open_stateid // associate oo1 // with stateid1, stateid1 LEAK!!! nfs4_get_vfs_file // alloc nfsd_file1 and nfsd_file_mark1 // all LEAK!!! nfsd4_process_open2 ... write_threads ... nfsd_destroy_serv nfsd_shutdown_net nfs4_state_shutdown_net nfs4_state_destroy_net destroy_client __destroy_client // won't find oo1!!! nfsd_shutdown_generic nfsd_file_cache_shutdown kmem_cache_destroy for nfsd_file_slab and nfsd_file_mark_slab // bark since nfsd_file1 // and nfsd_file_mark1 // still alive ======================================================================= BUG nfsd_file (Not tainted): Objects remaining in nfsd_file on __kmem_cache_shutdown() ----------------------------------------------------------------------- Slab 0xffd4000004438a80 objects=34 used=1 fp=0xff11000110e2ad28 flags=0x17ffffc0000240(workingset|head|node=0|zone=2|lastcpupid=0x1fffff) CPU: 4 UID: 0 PID: 757 Comm: sh Not tainted 6.12.0-rc6+ #19 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x53/0x70 slab_err+0xb0/0xf0 __kmem_cache_shutdown+0x15c/0x310 kmem_cache_destroy+0x66/0x160 nfsd_file_cache_shutdown+0xac/0x210 [nfsd] nfsd_destroy_serv+0x251/0x2a0 [nfsd] nfsd_svc+0x125/0x1e0 [nfsd] write_threads+0x16a/0x2a0 [nfsd] nfsctl_transaction_write+0x74/0xa0 [nfsd] vfs_write+0x1ae/0x6d0 ksys_write+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Disabling lock debugging due to kernel taint Object 0xff11000110e2ac38 @offset=3128 Allocated in nfsd_file_do_acquire+0x20f/0xa30 [nfsd] age=1635 cpu=3 pid=800 nfsd_file_do_acquire+0x20f/0xa30 [nfsd] nfsd_file_acquire_opened+0x5f/0x90 [nfsd] nfs4_get_vfs_file+0x4c9/0x570 [nfsd] nfsd4_process_open2+0x713/0x1070 [nfsd] nfsd4_open+0x74b/0x8b0 [nfsd] nfsd4_proc_compound+0x70b/0xc20 [nfsd] nfsd_dispatch+0x1b4/0x3a0 [nfsd] svc_process_common+0x5b8/0xc50 [sunrpc] svc_process+0x2ab/0x3b0 [sunrpc] svc_handle_xprt+0x681/0xa20 [sunrpc] nfsd+0x183/0x220 [nfsd] kthread+0x199/0x1e0 ret_from_fork+0x31/0x60 ret_from_fork_asm+0x1a/0x30 Add nfs4_openowner_unhashed to help found unhashed nfs4_openowner, and break nfsd4_open process to fix this problem. Cc: stable@vger.kernel.org # v5.4+ Reviewed-by:
Jeff Layton <jlayton@kernel.org> Signed-off-by:
Yang Erkun <yangerkun@huawei.com> Signed-off-by:
Chuck Lever <chuck.lever@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Yang Erkun authored
commit be8f982c369c965faffa198b46060f8853e0f1f0 upstream. The function `e_show` was called with protection from RCU. This only ensures that `exp` will not be freed. Therefore, the reference count for `exp` can drop to zero, which will trigger a refcount use-after-free warning when `exp_get` is called. To resolve this issue, use `cache_get_rcu` to ensure that `exp` remains active. ------------[ cut here ]------------ refcount_t: addition on 0; use-after-free. WARNING: CPU: 3 PID: 819 at lib/refcount.c:25 refcount_warn_saturate+0xb1/0x120 CPU: 3 UID: 0 PID: 819 Comm: cat Not tainted 6.12.0-rc3+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.1-2.fc37 04/01/2014 RIP: 0010:refcount_warn_saturate+0xb1/0x120 ... Call Trace: <TASK> e_show+0x20b/0x230 [nfsd] seq_read_iter+0x589/0x770 seq_read+0x1e5/0x270 vfs_read+0x125/0x530 ksys_read+0xc1/0x160 do_syscall_64+0x5f/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e Fixes: bf18f163 ("NFSD: Using exp_get for export getting") Cc: stable@vger.kernel.org # 4.20+ Signed-off-by:
Yang Erkun <yangerkun@huawei.com> Reviewed-by:
Jeff Layton <jlayton@kernel.org> Signed-off-by:
Chuck Lever <chuck.lever@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Damien Le Moal authored
commit 64f093c4d99d797b68b407a9d8767aadc3e3ea7a upstream. The Rockchip PCIe endpoint controller handles PCIe transfers addresses by masking the lower bits of the programmed PCI address and using the same number of lower bits masked from the CPU address space used for the mapping. For a PCI mapping of <size> bytes starting from <pci_addr>, the number of bits masked is the number of address bits changing in the address range [pci_addr..pci_addr + size - 1]. However, rockchip_pcie_prog_ep_ob_atu() calculates num_pass_bits only using the size of the mapping, resulting in an incorrect number of mask bits depending on the value of the PCI address to map. Fix this by introducing the helper function rockchip_pcie_ep_ob_atu_num_bits() to correctly calculate the number of mask bits to use to program the address translation unit. The number of mask bits is calculated depending on both the PCI address and size of the mapping, and clamped between 8 and 20 using the macros ROCKCHIP_PCIE_AT_MIN_NUM_BITS and ROCKCHIP_PCIE_AT_MAX_NUM_BITS. As defined in the Rockchip RK3399 TRM V1.3 Part2, Sections 17.5.5.1.1 and 17.6.8.2.1, this clamping is necessary because: 1) The lower 8 bits of the PCI address to be mapped by the outbound region are ignored. So a minimum of 8 address bits are needed and imply that the PCI address must be aligned to 256. 2) The outbound memory regions are 1MB in size. So while we can specify up to 63-bits for the PCI address (num_bits filed uses bits 0 to 5 of the outbound address region 0 register), we must limit the number of valid address bits to 20 to match the memory window maximum size (1 << 20 = 1MB). Fixes: cf590b07 ("PCI: rockchip: Add EP driver for Rockchip PCIe controller") Link: https://lore.kernel.org/r/20241017015849.190271-2-dlemoal@kernel.org Signed-off-by:
Damien Le Moal <dlemoal@kernel.org> Signed-off-by:
Krzysztof Wilczyński <kwilczynski@kernel.org> Signed-off-by:
Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Andrea della Porta authored
commit 5e316d34b53039346e252d0019e2f4167af2c0ef upstream. When populating "ranges" property for a PCI bridge or endpoint, of_pci_prop_ranges() incorrectly uses the CPU address of the resource. In such PCI nodes, the window should instead be in PCI address space. Call pci_bus_address() on the resource in order to obtain the PCI bus address. [Previous discussion at: https://lore.kernel.org/all/8b4fa91380fc4754ea80f47330c613e4f6b6592c.1724159867.git.andrea.porta@suse.com/] Link: https://lore.kernel.org/r/20241108094256.28933-1-andrea.porta@suse.com Fixes: 407d1a51 ("PCI: Create device tree node for bridge") Tested-by:
Herve Codina <herve.codina@bootlin.com> Signed-off-by:
Andrea della Porta <andrea.porta@suse.com> Signed-off-by:
Bjorn Helgaas <bhelgaas@google.com> Cc: stable@vger.kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Niklas Cassel authored
commit 118397c9baaac0b7ec81896f8d755d09aa82c485 upstream. The advertised resizable BAR size was fixed in commit 72e34b85 ("PCI: dwc: endpoint: Fix advertised resizable BAR size"). Commit 867ab111 ("PCI: dwc: ep: Add a generic dw_pcie_ep_linkdown() API to handle Link Down event") was included shortly after this, and moved the code to another function. When the code was moved, this fix was mistakenly lost. According to the spec, it is illegal to not have a bit set in PCI_REBAR_CAP, and 1 MB is the smallest size allowed. So, set bit 4 in PCI_REBAR_CAP, so that we actually advertise support for a 1 MB BAR size. Fixes: 867ab111 ("PCI: dwc: ep: Add a generic dw_pcie_ep_linkdown() API to handle Link Down event") Link: https://lore.kernel.org/r/20241116005950.2480427-2-cassel@kernel.org Link: https://lore.kernel.org/r/20240606-pci-deinit-v1-3-4395534520dc@linaro.org Link: https://lore.kernel.org/r/20240307111520.3303774-1-cassel@kernel.org Signed-off-by:
Niklas Cassel <cassel@kernel.org> Signed-off-by:
Krzysztof Wilczyński <kwilczynski@kernel.org> Cc: stable@vger.kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Yuan Can authored
commit e74fa2447bf9ed03d085b6d91f0256cc1b53f1a8 upstream. This commit add missed destroy_work_on_stack() operations for pw->worker in pool_work_wait(). Fixes: e7a3e871 ("dm thin: cleanup noflush_work to use a proper completion") Cc: stable@vger.kernel.org Signed-off-by:
Yuan Can <yuancan@huawei.com> Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Ssuhung Yeh authored
commit 2deb70d3e66d538404d9e71bff236e6d260da66e upstream. Remove the redundant "i" at the beginning of the error message. This "i" came from commit 1c131886 ("dm: prefer '"%s...", __func__'"), the "i" is accidentally left. Signed-off-by:
Ssuhung Yeh <ssuhung@gmail.com> Signed-off-by:
Mikulas Patocka <mpatocka@redhat.com> Fixes: 1c131886 ("dm: prefer '"%s...", __func__'") Cc: stable@vger.kernel.org # v6.3+ Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Adrian Huang authored
commit 9e9e085effe9b7e342138fde3cf8577d22509932 upstream. When compiling kernel source 'make -j $(nproc)' with the up-and-running KASAN-enabled kernel on a 256-core machine, the following soft lockup is shown: watchdog: BUG: soft lockup - CPU#28 stuck for 22s! [kworker/28:1:1760] CPU: 28 PID: 1760 Comm: kworker/28:1 Kdump: loaded Not tainted 6.10.0-rc5 #95 Workqueue: events drain_vmap_area_work RIP: 0010:smp_call_function_many_cond+0x1d8/0xbb0 Code: 38 c8 7c 08 84 c9 0f 85 49 08 00 00 8b 45 08 a8 01 74 2e 48 89 f1 49 89 f7 48 c1 e9 03 41 83 e7 07 4c 01 e9 41 83 c7 03 f3 90 <0f> b6 01 41 38 c7 7c 08 84 c0 0f 85 d4 06 00 00 8b 45 08 a8 01 75 RSP: 0018:ffffc9000cb3fb60 EFLAGS: 00000202 RAX: 0000000000000011 RBX: ffff8883bc4469c0 RCX: ffffed10776e9949 RDX: 0000000000000002 RSI: ffff8883bb74ca48 RDI: ffffffff8434dc50 RBP: ffff8883bb74ca40 R08: ffff888103585dc0 R09: ffff8884533a1800 R10: 0000000000000004 R11: ffffffffffffffff R12: ffffed1077888d39 R13: dffffc0000000000 R14: ffffed1077888d38 R15: 0000000000000003 FS: 0000000000000000(0000) GS:ffff8883bc400000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00005577b5c8d158 CR3: 0000000004850000 CR4: 0000000000350ef0 Call Trace: <IRQ> ? watchdog_timer_fn+0x2cd/0x390 ? __pfx_watchdog_timer_fn+0x10/0x10 ? __hrtimer_run_queues+0x300/0x6d0 ? sched_clock_cpu+0x69/0x4e0 ? __pfx___hrtimer_run_queues+0x10/0x10 ? srso_return_thunk+0x5/0x5f ? ktime_get_update_offsets_now+0x7f/0x2a0 ? srso_return_thunk+0x5/0x5f ? srso_return_thunk+0x5/0x5f ? hrtimer_interrupt+0x2ca/0x760 ? __sysvec_apic_timer_interrupt+0x8c/0x2b0 ? sysvec_apic_timer_interrupt+0x6a/0x90 </IRQ> <TASK> ? asm_sysvec_apic_timer_interrupt+0x16/0x20 ? smp_call_function_many_cond+0x1d8/0xbb0 ? __pfx_do_kernel_range_flush+0x10/0x10 on_each_cpu_cond_mask+0x20/0x40 flush_tlb_kernel_range+0x19b/0x250 ? srso_return_thunk+0x5/0x5f ? kasan_release_vmalloc+0xa7/0xc0 purge_vmap_node+0x357/0x820 ? __pfx_purge_vmap_node+0x10/0x10 __purge_vmap_area_lazy+0x5b8/0xa10 drain_vmap_area_work+0x21/0x30 process_one_work+0x661/0x10b0 worker_thread+0x844/0x10e0 ? srso_return_thunk+0x5/0x5f ? __kthread_parkme+0x82/0x140 ? __pfx_worker_thread+0x10/0x10 kthread+0x2a5/0x370 ? __pfx_kthread+0x10/0x10 ret_from_fork+0x30/0x70 ? __pfx_kthread+0x10/0x10 ret_from_fork_asm+0x1a/0x30 </TASK> Debugging Analysis: 1. The following ftrace log shows that the lockup CPU spends too much time iterating vmap_nodes and flushing TLB when purging vm_area structures. (Some info is trimmed). kworker: funcgraph_entry: | drain_vmap_area_work() { kworker: funcgraph_entry: | mutex_lock() { kworker: funcgraph_entry: 1.092 us | __cond_resched(); kworker: funcgraph_exit: 3.306 us | } ... ... kworker: funcgraph_entry: | flush_tlb_kernel_range() { ... ... kworker: funcgraph_exit: # 7533.649 us | } ... ... kworker: funcgraph_entry: 2.344 us | mutex_unlock(); kworker: funcgraph_exit: $ 23871554 us | } The drain_vmap_area_work() spends over 23 seconds. There are 2805 flush_tlb_kernel_range() calls in the ftrace log. * One is called in __purge_vmap_area_lazy(). * Others are called by purge_vmap_node->kasan_release_vmalloc. purge_vmap_node() iteratively releases kasan vmalloc allocations and flushes TLB for each vmap_area. - [Rough calculation] Each flush_tlb_kernel_range() runs about 7.5ms. -- 2804 * 7.5ms = 21.03 seconds. -- That's why a soft lock is triggered. 2. Extending the soft lockup time can work around the issue (For example, # echo 60 > /proc/sys/kernel/watchdog_thresh). This confirms the above-mentioned speculation: drain_vmap_area_work() spends too much time. If we combine all TLB flush operations of the KASAN shadow virtual address into one operation in the call path 'purge_vmap_node()->kasan_release_vmalloc()', the running time of drain_vmap_area_work() can be saved greatly. The idea is from the flush_tlb_kernel_range() call in __purge_vmap_area_lazy(). And, the soft lockup won't be triggered. Here is the test result based on 6.10: [6.10 wo/ the patch] 1. ftrace latency profiling (record a trace if the latency > 20s). echo 20000000 > /sys/kernel/debug/tracing/tracing_thresh echo drain_vmap_area_work > /sys/kernel/debug/tracing/set_graph_function echo function_graph > /sys/kernel/debug/tracing/current_tracer echo 1 > /sys/kernel/debug/tracing/tracing_on 2. Run `make -j $(nproc)` to compile the kernel source 3. Once the soft lockup is reproduced, check the ftrace log: cat /sys/kernel/debug/tracing/trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 76) $ 50412985 us | } /* __purge_vmap_area_lazy */ 76) $ 50412997 us | } /* drain_vmap_area_work */ 76) $ 29165911 us | } /* __purge_vmap_area_lazy */ 76) $ 29165926 us | } /* drain_vmap_area_work */ 91) $ 53629423 us | } /* __purge_vmap_area_lazy */ 91) $ 53629434 us | } /* drain_vmap_area_work */ 91) $ 28121014 us | } /* __purge_vmap_area_lazy */ 91) $ 28121026 us | } /* drain_vmap_area_work */ [6.10 w/ the patch] 1. Repeat step 1-2 in "[6.10 wo/ the patch]" 2. The soft lockup is not triggered and ftrace log is empty. cat /sys/kernel/debug/tracing/trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 3. Setting 'tracing_thresh' to 10/5 seconds does not get any ftrace log. 4. Setting 'tracing_thresh' to 1 second gets ftrace log. cat /sys/kernel/debug/tracing/trace # tracer: function_graph # # CPU DURATION FUNCTION CALLS # | | | | | | | 23) $ 1074942 us | } /* __purge_vmap_area_lazy */ 23) $ 1074950 us | } /* drain_vmap_area_work */ The worst execution time of drain_vmap_area_work() is about 1 second. Link: https://lore.kernel.org/lkml/ZqFlawuVnOMY2k3E@pc638.lan/ Link: https://lkml.kernel.org/r/20240726165246.31326-1-ahuang12@lenovo.com Fixes: 282631cb ("mm: vmalloc: remove global purge_vmap_area_root rb-tree") Signed-off-by:
Adrian Huang <ahuang12@lenovo.com> Co-developed-by:
Uladzislau Rezki (Sony) <urezki@gmail.com> Signed-off-by:
Uladzislau Rezki (Sony) <urezki@gmail.com> Tested-by:
Jiwei Sun <sunjw10@lenovo.com> Reviewed-by:
Baoquan He <bhe@redhat.com> Cc: Alexander Potapenko <glider@google.com> Cc: Andrey Konovalov <andreyknvl@gmail.com> Cc: Andrey Ryabinin <ryabinin.a.a@gmail.com> Cc: Christoph Hellwig <hch@infradead.org> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Vincenzo Frascino <vincenzo.frascino@arm.com> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Oleksandr Tymoshenko authored
commit 3b6b99ef15ea37635604992ede9ebcccef38a239 upstream. dentry_open in ovl_security_fileattr fails for any file larger than 2GB if open method of the underlying filesystem calls generic_file_open (e.g. fusefs). The issue can be reproduce using the following script: (passthrough_ll is an example app from libfuse). $ D=/opt/test/mnt $ mkdir -p ${D}/{source,base,top/uppr,top/work,ovlfs} $ dd if=/dev/zero of=${D}/source/zero.bin bs=1G count=2 $ passthrough_ll -o source=${D}/source ${D}/base $ mount -t overlay overlay \ -olowerdir=${D}/base,upperdir=${D}/top/uppr,workdir=${D}/top/work \ ${D}/ovlfs $ chmod 0777 ${D}/mnt/ovlfs/zero.bin Running this script results in "Value too large for defined data type" error message from chmod. Signed-off-by:
Oleksandr Tymoshenko <ovt@google.com> Fixes: 72db8211 ("ovl: copy up sync/noatime fileattr flags") Cc: stable@vger.kernel.org # v5.15+ Signed-off-by:
Amir Goldstein <amir73il@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Javier Carrasco authored
commit 73b03b27736e440e3009fe1319cbc82d2cd1290c upstream. The device_for_each_child_node() macro requires explicit calls to fwnode_handle_put() upon early exits to avoid memory leaks, and in this case the error paths are handled after jumping to 'out_flash_realease', which misses that required call to to decrement the refcount of the child node. A more elegant and robust solution is using the scoped variant of the loop, which automatically handles such early exits. Fix the child node refcounting in the error paths by using device_for_each_child_node_scoped(). Cc: stable@vger.kernel.org Fixes: 679f8652 ("leds: Add mt6360 driver") Signed-off-by:
Javier Carrasco <javier.carrasco.cruz@gmail.com> Link: https://lore.kernel.org/r/20240927-leds_device_for_each_child_node_scoped-v1-1-95c0614b38c8@gmail.com Signed-off-by:
Lee Jones <lee@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Srinivas Pandruvada authored
commit 7082503622986537f57bdb5ef23e69e70cfad881 upstream. When the current_uuid attribute is set to the active policy UUID, reading back the same attribute is returning "INVALID" instead of the active policy UUID on some platforms before Ice Lake. In platforms before Ice Lake, firmware provides a list of supported thermal policies. In this case, user space can select any of the supported thermal policies via a write to attribute "current_uuid". In commit c7ff2976 ("thermal: int340x: Update OS policy capability handshake")', the OS policy handshake was updated to support Ice Lake and later platforms and it treated priv->current_uuid_index=0 as invalid. However, priv->current_uuid_index=0 is for the active policy, only priv->current_uuid_index=-1 is invalid. Fix this issue by updating the priv->current_uuid_index check. Fixes: c7ff2976 ("thermal: int340x: Update OS policy capability handshake") Signed-off-by:
Srinivas Pandruvada <srinivas.pandruvada@linux.intel.com> Cc: 5.18+ <stable@vger.kernel.org> # 5.18+ Link: https://patch.msgid.link/20241114200213.422303-1-srinivas.pandruvada@linux.intel.com [ rjw: Subject and changelog edits ] Signed-off-by:
Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jiri Olsa authored
commit 088f294609d8f8816dc316681aef2eb61982e0da upstream. If iov_iter_zero succeeds after failed copy_from_kernel_nofault, we need to reset the ret value to zero otherwise it will be returned as final return value of read_kcore_iter. This fixes objdump -d dump over /proc/kcore for me. Cc: stable@vger.kernel.org Cc: Alexander Gordeev <agordeev@linux.ibm.com> Fixes: 3d5854d7 ("fs/proc/kcore.c: allow translation of physical memory addresses") Signed-off-by:
Jiri Olsa <jolsa@kernel.org> Link: https://lore.kernel.org/r/20241121231118.3212000-1-jolsa@kernel.org Acked-by:
Alexander Gordeev <agordeev@linux.ibm.com> Signed-off-by:
Christian Brauner <brauner@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Geert Uytterhoeven authored
commit 9008fe8fad8255edfdbecea32d7eb0485d939d0d upstream. On m68k, where the minimum alignment of unsigned long is 2 bytes: Kernel panic - not syncing: __kmem_cache_create_args: Failed to create slab 'io_kiocb'. Error -22 CPU: 0 UID: 0 PID: 1 Comm: swapper Not tainted 6.12.0-atari-03776-g7eaa1f99261a #1783 Stack from 0102fe5c: 0102fe5c 00514a2b 00514a2b ffffff00 00000001 0051f5ed 00425e78 00514a2b 0041eb74 ffffffea 00000310 0051f5ed ffffffea ffffffea 00601f60 00000044 0102ff20 000e7a68 0051ab8e 004383b8 0051f5ed ffffffea 000000b8 00000007 01020c00 00000000 000e77f0 0041e5f0 005f67c0 0051f5ed 000000b6 0102fef4 00000310 0102fef4 00000000 00000016 005f676c 0060a34c 00000010 00000004 00000038 0000009a 01000000 000000b8 005f668e 0102e000 00001372 0102ff88 Call Trace: [<00425e78>] dump_stack+0xc/0x10 [<0041eb74>] panic+0xd8/0x26c [<000e7a68>] __kmem_cache_create_args+0x278/0x2e8 [<000e77f0>] __kmem_cache_create_args+0x0/0x2e8 [<0041e5f0>] memset+0x0/0x8c [<005f67c0>] io_uring_init+0x54/0xd2 The minimal alignment of an integral type may differ from its size, hence is not safe to assume that an arbitrary freeptr_t (which is basically an unsigned long) is always aligned to 4 or 8 bytes. As nothing seems to require the additional alignment, it is safe to fix this by relaxing the check to the actual minimum alignment of freeptr_t. Fixes: aaa736b186239b7d ("io_uring: specify freeptr usage for SLAB_TYPESAFE_BY_RCU io_kiocb cache") Fixes: d345bd2e ("mm: add kmem_cache_create_rcu()") Reported-by:
Guenter Roeck <linux@roeck-us.net> Closes: https://lore.kernel.org/37c588d4-2c32-4aad-a19e-642961f200d7@roeck-us.net Cc: <stable@vger.kernel.org> Signed-off-by:
Geert Uytterhoeven <geert@linux-m68k.org> Tested-by:
Guenter Roeck <linux@roeck-us.net> Reviewed-by:
Jens Axboe <axboe@kernel.dk> Signed-off-by:
Vlastimil Babka <vbabka@suse.cz> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Zijun Hu authored
commit 688d2eb4c6fcfdcdaed0592f9df9196573ff5ce2 upstream. In addition to a primary endpoint controller, an endpoint function may be associated with a secondary endpoint controller, epf->sec_epc, to provide NTB (non-transparent bridge) functionality. Previously, pci_epc_remove_epf() incorrectly cleared epf->epc instead of epf->sec_epc when removing from the secondary endpoint controller. Extend the epc->list_lock coverage and clear either epf->epc or epf->sec_epc as indicated. Link: https://lore.kernel.org/r/20241107-epc_rfc-v2-2-da5b6a99a66f@quicinc.com Fixes: 63840ff5 ("PCI: endpoint: Add support to associate secondary EPC with EPF") Signed-off-by:
Zijun Hu <quic_zijuhu@quicinc.com> Reviewed-by:
Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> [mani: reworded subject and description] Signed-off-by:
Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> [bhelgaas: commit log] Signed-off-by:
Bjorn Helgaas <bhelgaas@google.com> Signed-off-by:
Krzysztof Wilczyński <kwilczynski@kernel.org> Cc: stable@vger.kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Zijun Hu authored
commit 4acc902ed3743edd4ac2d3846604a99d17104359 upstream. pci_epc_destroy() invokes pci_bus_release_domain_nr() to release the PCI domain ID, but there are two issues: - 'epc->dev' is passed to pci_bus_release_domain_nr() which was already freed by device_unregister(), leading to a use-after-free issue. - Domain ID corresponds to the EPC device parent, so passing 'epc->dev' is also wrong. Fix these issues by passing 'epc->dev.parent' to pci_bus_release_domain_nr() and also do it before device_unregister(). Fixes: 0328947c ("PCI: endpoint: Assign PCI domain number for endpoint controllers") Signed-off-by:
Zijun Hu <quic_zijuhu@quicinc.com> Reviewed-by:
Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Link: https://lore.kernel.org/r/20241107-epc_rfc-v2-1-da5b6a99a66f@quicinc.com [mani: reworded subject and description] Signed-off-by:
Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Signed-off-by:
Bjorn Helgaas <bhelgaas@google.com> Signed-off-by:
Krzysztof Wilczyński <kwilczynski@kernel.org> Cc: stable@vger.kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kishon Vijay Abraham I authored
commit 9e9ec8d8692a6f64d81ef67d4fb6255af6be684b upstream. K2G forwards the error triggered by a link-down state (e.g., no connected endpoint device) on the system bus for PCI configuration transactions; these errors are reported as an SError at system level, which is fatal and hangs the system. So, apply fix similar to how it was done in the DesignWare Core driver commit 15b23906 ("PCI: dwc: Add link up check in dw_child_pcie_ops.map_bus()"). Fixes: 10a797c6 ("PCI: dwc: keystone: Use pci_ops for config space accessors") Link: https://lore.kernel.org/r/20240524105714.191642-3-s-vadapalli@ti.com Signed-off-by:
Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by:
Siddharth Vadapalli <s-vadapalli@ti.com> [kwilczynski: commit log, added tag for stable releases] Signed-off-by:
Krzysztof Wilczyński <kwilczynski@kernel.org> Cc: stable@vger.kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Kishon Vijay Abraham I authored
commit 5a938ed9481b0c06cb97aec45e722a80568256fd upstream. commit 23284ad6 ("PCI: keystone: Add support for PCIe EP in AM654x Platforms") introduced configuring "enum dw_pcie_device_mode" as part of device data ("struct ks_pcie_of_data"). However it failed to set the mode for "ti,keystone-pcie" compatible. Since the mode defaults to "DW_PCIE_UNKNOWN_TYPE", the following error message is displayed for the v3.65a controller: "INVALID device type 0" Despite the driver probing successfully, the controller may not be functional in the Root Complex mode of operation. So, set the mode as Root Complex for "ti,keystone-pcie" compatible to fix this. Fixes: 23284ad6 ("PCI: keystone: Add support for PCIe EP in AM654x Platforms") Link: https://lore.kernel.org/r/20240524105714.191642-2-s-vadapalli@ti.com Signed-off-by:
Kishon Vijay Abraham I <kishon@ti.com> Signed-off-by:
Siddharth Vadapalli <s-vadapalli@ti.com> [kwilczynski: commit log, added tag for stable releases] Signed-off-by:
Krzysztof Wilczyński <kwilczynski@kernel.org> Cc: stable@vger.kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Frank Li authored
commit 25bc99be5fe53853053ceeaa328068c49dc1e799 upstream. Fix issue where disabling IBI on one device disables the entire IBI interrupt. Modify bit 7:0 of enabled_events to serve as an IBI enable counter, ensuring that the system IBI interrupt is disabled only when all I3C devices have IBI disabled. Cc: stable@kernel.org Fixes: 7ff730ca ("i3c: master: svc: enable the interrupt in the enable ibi function") Reviewed-by:
Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by:
Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20241101165002.2479794-1-Frank.Li@nxp.com Signed-off-by:
Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Frank Li authored
commit 3b2ac810d86eb96e882db80a3320a3848b133208 upstream. svc_i3c_master_do_daa() { ... for (i = 0; i < dev_nb; i++) { ret = i3c_master_add_i3c_dev_locked(m, addrs[i]); if (ret) goto rpm_out; } } If two devices (A and B) are detected in DAA and address 0xa is assigned to device A and 0xb to device B, a failure in i3c_master_add_i3c_dev_locked() for device A (addr: 0xa) could prevent device B (addr: 0xb) from being registered on the bus. The I3C stack might still consider 0xb a free address. If a subsequent Hotjoin occurs, 0xb might be assigned to Device A, causing both devices A and B to use the same address 0xb, violating the I3C specification. The return value for i3c_master_add_i3c_dev_locked() should not be checked because subsequent steps will scan the entire I3C bus, independent of whether i3c_master_add_i3c_dev_locked() returns success. If device A registration fails, there is still a chance to register device B. i3c_master_add_i3c_dev_locked() can reset DAA if a failure occurs while retrieving device information. Cc: stable@kernel.org Fixes: 317bacf9 ("i3c: master: add enable(disable) hot join in sys entry") Reviewed-by:
Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by:
Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20241002-svc-i3c-hj-v6-6-7e6e1d3569ae@nxp.com Signed-off-by:
Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Frank Li authored
commit 3082990592f7c6d7510a9133afa46e31bbe26533 upstream. if (dev->boardinfo && dev->boardinfo->init_dyn_addr) ^^^ here check "init_dyn_addr" i3c_bus_set_addr_slot_status(&master->bus, dev->info.dyn_addr, ...) ^^^^ free "dyn_addr" Fix copy/paste error "dyn_addr" by replacing it with "init_dyn_addr". Cc: stable@kernel.org Fixes: 3a379bbc ("i3c: Add core I3C infrastructure") Reviewed-by:
Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by:
Frank Li <Frank.Li@nxp.com> Link: https://lore.kernel.org/r/20241001162608.224039-1-Frank.Li@nxp.com Signed-off-by:
Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Jinjie Ruan authored
commit 18599e93e4e814ce146186026c6abf83c14d5798 upstream. It is not valid to call pm_runtime_set_suspended() for devices with runtime PM enabled because it returns -EAGAIN if it is enabled already and working. So, call pm_runtime_disable() before to fix it. Cc: stable@vger.kernel.org # v5.17 Fixes: 05be23ef ("i3c: master: svc: add runtime pm support") Reviewed-by:
Frank Li <Frank.Li@nxp.com> Reviewed-by:
Miquel Raynal <miquel.raynal@bootlin.com> Signed-off-by:
Jinjie Ruan <ruanjinjie@huawei.com> Link: https://lore.kernel.org/r/20240930091913.2545510-1-ruanjinjie@huawei.com Signed-off-by:
Alexandre Belloni <alexandre.belloni@bootlin.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peter Griffin authored
commit ceef938bbf8b93ba3a218b4adc244cde94b582aa upstream. v1 of the patch which introduced the ufshcd_vops_hibern8_notify() callback used a bool instead of an enum. In v2 this was updated to an enum based on the review feedback in [1]. ufs-exynos hibernate calls have always been broken upstream as it follows the v1 bool implementation. Link: https://patchwork.kernel.org/project/linux-scsi/patch/001f01d23994$719997c0$54ccc740$@samsung.com/ [1] Fixes: 55f4b1f7 ("scsi: ufs: ufs-exynos: Add UFS host support for Exynos SoCs") Signed-off-by:
Peter Griffin <peter.griffin@linaro.org> Link: https://lore.kernel.org/r/20241031150033.3440894-13-peter.griffin@linaro.org Cc: stable@vger.kernel.org Reviewed-by:
Tudor Ambarus <tudor.ambarus@linaro.org> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Peter Griffin authored
commit c662cedea14efdcf373d8d886ec18019d50e0772 upstream. Move the EXYNOS_UFS_OPT_UFSPR_SECURE check inside exynos_ufs_config_smu(). This way all call sites will benefit from the check. This fixes a bug currently in the exynos_ufs_resume() path on gs101 as it calls exynos_ufs_config_smu() and we end up accessing registers that can only be accessed from secure world which results in a serror. Fixes: d11e0a31 ("scsi: ufs: exynos: Add support for Tensor gs101 SoC") Signed-off-by:
Peter Griffin <peter.griffin@linaro.org> Link: https://lore.kernel.org/r/20241031150033.3440894-5-peter.griffin@linaro.org Cc: stable@vger.kernel.org Reviewed-by:
Tudor Ambarus <tudor.ambarus@linaro.org> Signed-off-by:
Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Heiko Carstens authored
commit 588a9836a4ef7ec3bfcffda526dfa399637e6cfc upstream. arch_stack_walk_user_common() contains a return statement instead of a break statement in case store_ip() fails while trying to store a callchain entry of a user space process. This may lead to a missing pagefault_enable() call. If this happens any subsequent page fault of the process won't be resolved by the page fault handler and this in turn will lead to the process being killed. Use a break instead of a return statement to fix this. Fixes: ebd912ff ("s390/stacktrace: Merge perf_callchain_user() and arch_stack_walk_user()") Cc: stable@vger.kernel.org Reviewed-by:
Jens Remus <jremus@linux.ibm.com> Signed-off-by:
Heiko Carstens <hca@linux.ibm.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Alexandru Ardelean authored
commit bc73b4186736341ab5cd2c199da82db6e1134e13 upstream. A bug was found in the find_closest() (find_closest_descending() is also affected after some testing), where for certain values with small progressions, the rounding (done by averaging 2 values) causes an incorrect index to be returned. The rounding issues occur for progressions of 1, 2 and 3. It goes away when the progression/interval between two values is 4 or larger. It's particularly bad for progressions of 1. For example if there's an array of 'a = { 1, 2, 3 }', using 'find_closest(2, a ...)' would return 0 (the index of '1'), rather than returning 1 (the index of '2'). This means that for exact values (with a progression of 1), find_closest() will misbehave and return the index of the value smaller than the one we're searching for. For progressions of 2 and 3, the exact values are obtained correctly; but values aren't approximated correctly (as one would expect). Starting with progressions of 4, all seems to be good (one gets what one would expect). While one could argue that 'find_closest()' should not be used for arrays with progressions of 1 (i.e. '{1, 2, 3, ...}', the macro should still behave correctly. The bug was found while testing the 'drivers/iio/adc/ad7606.c', specifically the oversampling feature. For reference, the oversampling values are listed as: static const unsigned int ad7606_oversampling_avail[7] = { 1, 2, 4, 8, 16, 32, 64, }; When doing: 1. $ echo 1 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 1 # this is fine 2. $ echo 2 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 1 # this is wrong; 2 should be returned here 3. $ echo 3 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 2 # this is fine 4. $ echo 4 > /sys/bus/iio/devices/iio\:device0/oversampling_ratio $ cat /sys/bus/iio/devices/iio\:device0/oversampling_ratio 4 # this is fine And from here-on, the values are as correct (one gets what one would expect.) While writing a kunit test for this bug, a peculiar issue was found for the array in the 'drivers/hwmon/ina2xx.c' & 'drivers/iio/adc/ina2xx-adc.c' drivers. While running the kunit test (for 'ina226_avg_tab' from these drivers): * idx = find_closest([-1 to 2], ina226_avg_tab, ARRAY_SIZE(ina226_avg_tab)); This returns idx == 0, so value. * idx = find_closest(3, ina226_avg_tab, ARRAY_SIZE(ina226_avg_tab)); This returns idx == 0, value 1; and now one could argue whether 3 is closer to 4 or to 1. This quirk only appears for value '3' in this array, but it seems to be a another rounding issue. * And from 4 onwards the 'find_closest'() works fine (one gets what one would expect). This change reworks the find_closest() macros to also check the difference between the left and right elements when 'x'. If the distance to the right is smaller (than the distance to the left), the index is incremented by 1. This also makes redundant the need for using the DIV_ROUND_CLOSEST() macro. In order to accommodate for any mix of negative + positive values, the internal variables '__fc_x', '__fc_mid_x', '__fc_left' & '__fc_right' are forced to 'long' type. This also addresses any potential bugs/issues with 'x' being of an unsigned type. In those situations any comparison between signed & unsigned would be promoted to a comparison between 2 unsigned numbers; this is especially annoying when '__fc_left' & '__fc_right' underflow. The find_closest_descending() macro was also reworked and duplicated from the find_closest(), and it is being iterated in reverse. The main reason for this is to get the same indices as 'find_closest()' (but in reverse). The comparison for '__fc_right < __fc_left' favors going the array in ascending order. For example for array '{ 1024, 512, 256, 128, 64, 16, 4, 1 }' and x = 3, we get: __fc_mid_x = 2 __fc_left = -1 __fc_right = -2 Then '__fc_right < __fc_left' evaluates to true and '__fc_i++' becomes 7 which is not quite incorrect, but 3 is closer to 4 than to 1. This change has been validated with the kunit from the next patch. Link: https://lkml.kernel.org/r/20241105145406.554365-1-aardelean@baylibre.com Fixes: 95d11952 ("util_macros.h: add find_closest() macro") Signed-off-by:
Alexandru Ardelean <aardelean@baylibre.com> Cc: Bartosz Golaszewski <bartosz.golaszewski@linaro.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Miquel Raynal authored
commit fee9b240916df82a8b07aef0fdfe96785417a164 upstream. These four chips: * W25N512GW * W25N01GW * W25N01JW * W25N02JW all require a single bit of ECC strength and thus feature an on-die Hamming-like ECC engine. There is no point in filling a ->get_status() callback for them because the main ECC status bytes are located in standard places, and retrieving the number of bitflips in case of corrected chunk is both useless and unsupported (if there are bitflips, then there is 1 at most, so no need to query the chip for that). Without this change, a kernel warning triggers every time a bit flips. Fixes: 6a804fb7 ("mtd: spinand: winbond: add support for serial NAND flash") Cc: stable@vger.kernel.org Signed-off-by:
Miquel Raynal <miquel.raynal@bootlin.com> Reviewed-by:
Frieder Schrempf <frieder.schrempf@kontron.de> Link: https://lore.kernel.org/linux-mtd/20241009125002.191109-3-miquel.raynal@bootlin.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Miquel Raynal authored
commit c1247de51cab53fc357a73804c11fb4fba55b2d9 upstream. Both W25N512GW and W25N02JW chips have 64 bytes of OOB and thus cannot use the layout for 128 bytes OOB. Reference the correct layout instead. Fixes: 6a804fb7 ("mtd: spinand: winbond: add support for serial NAND flash") Cc: stable@vger.kernel.org Signed-off-by:
Miquel Raynal <miquel.raynal@bootlin.com> Reviewed-by:
Frieder Schrempf <frieder.schrempf@kontron.de> Link: https://lore.kernel.org/linux-mtd/20241009125002.191109-2-miquel.raynal@bootlin.com Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Max Kellermann authored
commit c5cf420303256dcd6ff175643e9e9558543c2047 upstream. get_current_cred() increments the reference counter, but the put_cred() call was missing. Cc: stable@vger.kernel.org Fixes: 596afb0b ("ceph: add ceph_mds_check_access() helper") Signed-off-by:
Max Kellermann <max.kellermann@ionos.com> Reviewed-by:
Xiubo Li <xiubli@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Max Kellermann authored
commit 23426309a4064b25a961e1c72961d8bfc7c8c990 upstream. This eliminates a redundant get_current_cred() call, because ceph_mds_check_access() has already obtained this pointer. As a side effect, this also fixes a reference leak in ceph_mds_auth_match(): by omitting the get_current_cred() call, no additional cred reference is taken. Cc: stable@vger.kernel.org Fixes: 596afb0b ("ceph: add ceph_mds_check_access() helper") Signed-off-by:
Max Kellermann <max.kellermann@ionos.com> Reviewed-by:
Xiubo Li <xiubli@redhat.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Patrick Donnelly authored
commit 955710afcb3bb63e21e186451ed5eba85fa14d0b upstream. Previously, the "name" in the new device syntax "<name>@<fsid>.<fsname>" was ignored because (presumably) tests were done using mount.ceph which also passed the entity name using "-o name=foo". If mounting is done without the mount.ceph helper, the new device id syntax fails to set the name properly. Cc: stable@vger.kernel.org Link: https://tracker.ceph.com/issues/68516 Signed-off-by:
Patrick Donnelly <pdonnell@redhat.com> Reviewed-by:
Ilya Dryomov <idryomov@gmail.com> Signed-off-by:
Ilya Dryomov <idryomov@gmail.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Chao Yu authored
commit bc8aeb04fd80cb8cfae3058445c84410fd0beb5e upstream. Piergiorgio reported a bug in bugzilla as below: ------------[ cut here ]------------ WARNING: CPU: 2 PID: 969 at fs/f2fs/segment.c:1330 RIP: 0010:__submit_discard_cmd+0x27d/0x400 [f2fs] Call Trace: __issue_discard_cmd+0x1ca/0x350 [f2fs] issue_discard_thread+0x191/0x480 [f2fs] kthread+0xcf/0x100 ret_from_fork+0x31/0x50 ret_from_fork_asm+0x1a/0x30 w/ below testcase, it can reproduce this bug quickly: - pvcreate /dev/vdb - vgcreate myvg1 /dev/vdb - lvcreate -L 1024m -n mylv1 myvg1 - mount /dev/myvg1/mylv1 /mnt/f2fs - dd if=/dev/zero of=/mnt/f2fs/file bs=1M count=20 - sync - rm /mnt/f2fs/file - sync - lvcreate -L 1024m -s -n mylv1-snapshot /dev/myvg1/mylv1 - umount /mnt/f2fs The root cause is: it will update discard_max_bytes of mounted lvm device to zero after creating snapshot on this lvm device, then, __submit_discard_cmd() will pass parameter @nr_sects w/ zero value to __blkdev_issue_discard(), it returns a NULL bio pointer, result in panic. This patch changes as below for fixing: 1. Let's drop all remained discards in f2fs_unfreeze() if snapshot of lvm device is created. 2. Checking discard_max_bytes before submitting discard during __submit_discard_cmd(). Cc: stable@vger.kernel.org Fixes: 35ec7d57 ("f2fs: split discard command in prior to block layer") Reported-by:
Piergiorgio Sartor <piergiorgio.sartor@nexgo.de> Closes: https://bugzilla.kernel.org/show_bug.cgi?id=219484 Signed-off-by:
Chao Yu <chao@kernel.org> Signed-off-by:
Jaegeuk Kim <jaegeuk@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
yuan.gao authored
commit dbc16915279a548a204154368da23d402c141c81 upstream. Boot with slub_debug=UFPZ. If allocated object failed in alloc_consistency_checks, all objects of the slab will be marked as used, and then the slab will be removed from the partial list. When an object belonging to the slab got freed later, the remove_full() function is called. Because the slab is neither on the partial list nor on the full list, it eventually lead to a list corruption (actually a list poison being detected). So we need to mark and isolate the slab page with metadata corruption, do not put it back in circulation. Because the debug caches avoid all the fastpaths, reusing the frozen bit to mark slab page with metadata corruption seems to be fine. [ 4277.385669] list_del corruption, ffffea00044b3e50->next is LIST_POISON1 (dead000000000100) [ 4277.387023] ------------[ cut here ]------------ [ 4277.387880] kernel BUG at lib/list_debug.c:56! [ 4277.388680] invalid opcode: 0000 [#1] PREEMPT SMP PTI [ 4277.389562] CPU: 5 PID: 90 Comm: kworker/5:1 Kdump: loaded Tainted: G OE 6.6.1-1 #1 [ 4277.392113] Workqueue: xfs-inodegc/vda1 xfs_inodegc_worker [xfs] [ 4277.393551] RIP: 0010:__list_del_entry_valid_or_report+0x7b/0xc0 [ 4277.394518] Code: 48 91 82 e8 37 f9 9a ff 0f 0b 48 89 fe 48 c7 c7 28 49 91 82 e8 26 f9 9a ff 0f 0b 48 89 fe 48 c7 c7 58 49 91 [ 4277.397292] RSP: 0018:ffffc90000333b38 EFLAGS: 00010082 [ 4277.398202] RAX: 000000000000004e RBX: ffffea00044b3e50 RCX: 0000000000000000 [ 4277.399340] RDX: 0000000000000002 RSI: ffffffff828f8715 RDI: 00000000ffffffff [ 4277.400545] RBP: ffffea00044b3e40 R08: 0000000000000000 R09: ffffc900003339f0 [ 4277.401710] R10: 0000000000000003 R11: ffffffff82d44088 R12: ffff888112cf9910 [ 4277.402887] R13: 0000000000000001 R14: 0000000000000001 R15: ffff8881000424c0 [ 4277.404049] FS: 0000000000000000(0000) GS:ffff88842fd40000(0000) knlGS:0000000000000000 [ 4277.405357] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 4277.406389] CR2: 00007f2ad0b24000 CR3: 0000000102a3a006 CR4: 00000000007706e0 [ 4277.407589] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 4277.408780] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 4277.410000] PKRU: 55555554 [ 4277.410645] Call Trace: [ 4277.411234] <TASK> [ 4277.411777] ? die+0x32/0x80 [ 4277.412439] ? do_trap+0xd6/0x100 [ 4277.413150] ? __list_del_entry_valid_or_report+0x7b/0xc0 [ 4277.414158] ? do_error_trap+0x6a/0x90 [ 4277.414948] ? __list_del_entry_valid_or_report+0x7b/0xc0 [ 4277.415915] ? exc_invalid_op+0x4c/0x60 [ 4277.416710] ? __list_del_entry_valid_or_report+0x7b/0xc0 [ 4277.417675] ? asm_exc_invalid_op+0x16/0x20 [ 4277.418482] ? __list_del_entry_valid_or_report+0x7b/0xc0 [ 4277.419466] ? __list_del_entry_valid_or_report+0x7b/0xc0 [ 4277.420410] free_to_partial_list+0x515/0x5e0 [ 4277.421242] ? xfs_iext_remove+0x41a/0xa10 [xfs] [ 4277.422298] xfs_iext_remove+0x41a/0xa10 [xfs] [ 4277.423316] ? xfs_inodegc_worker+0xb4/0x1a0 [xfs] [ 4277.424383] xfs_bmap_del_extent_delay+0x4fe/0x7d0 [xfs] [ 4277.425490] __xfs_bunmapi+0x50d/0x840 [xfs] [ 4277.426445] xfs_itruncate_extents_flags+0x13a/0x490 [xfs] [ 4277.427553] xfs_inactive_truncate+0xa3/0x120 [xfs] [ 4277.428567] xfs_inactive+0x22d/0x290 [xfs] [ 4277.429500] xfs_inodegc_worker+0xb4/0x1a0 [xfs] [ 4277.430479] process_one_work+0x171/0x340 [ 4277.431227] worker_thread+0x277/0x390 [ 4277.431962] ? __pfx_worker_thread+0x10/0x10 [ 4277.432752] kthread+0xf0/0x120 [ 4277.433382] ? __pfx_kthread+0x10/0x10 [ 4277.434134] ret_from_fork+0x2d/0x50 [ 4277.434837] ? __pfx_kthread+0x10/0x10 [ 4277.435566] ret_from_fork_asm+0x1b/0x30 [ 4277.436280] </TASK> Fixes: 643b1138 ("slub: enable tracking of full slabs") Suggested-by:
Hyeonggon Yoo <42.hyeyoo@gmail.com> Suggested-by:
Vlastimil Babka <vbabka@suse.cz> Cc: <stable@vger.kernel.org> Signed-off-by:
yuan.gao <yuan.gao@ucloud.cn> Reviewed-by:
Hyeonggon Yoo <42.hyeyoo@gmail.com> Acked-by:
Christoph Lameter <cl@linux.com> Signed-off-by:
Vlastimil Babka <vbabka@suse.cz> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Stefan Eichenberger authored
commit 0a726f542d7c8cc0f9c5ed7df5a4bd4b59ac21b3 upstream. The suspend/resume functionality is currently broken on the i.MX6QDL platform, as documented in the NXP errata (ERR005723): https://www.nxp.com/docs/en/errata/IMX6DQCE.pdf This patch addresses the issue by sharing most of the suspend/resume sequences used by other i.MX devices, while avoiding modifications to critical registers that disrupt the PCIe functionality. It targets the same problem as the following downstream commit: https://github.com/nxp-imx/linux-imx/commit/4e92355e1f79d225ea842511fcfd42b343b32995 Unlike the downstream commit, this patch also resets the connected PCIe device if possible. Without this reset, certain drivers, such as ath10k or iwlwifi, will crash on resume. The device reset is also done by the driver on other i.MX platforms, making this patch consistent with existing practices. Upon resuming, the kernel will hang and display an error. Here's an example of the error encountered with the ath10k driver: ath10k_pci 0000:01:00.0: Unable to change power state from D3hot to D0, device inaccessible Unhandled fault: imprecise external abort (0x1406) at 0x0106f944 Without this patch, suspend/resume will fail on i.MX6QDL devices if a PCIe device is connected. Link: https://lore.kernel.org/r/20241030103250.83640-1-eichest@gmail.com Signed-off-by:
Stefan Eichenberger <stefan.eichenberger@toradex.com> [kwilczynski: commit log, added tag for stable releases] Signed-off-by:
Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by:
Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Acked-by:
Richard Zhu <hongxing.zhu@nxp.com> Cc: stable@vger.kernel.org Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Balaji Pothunoori authored
commit 8a47704d64c9afda80e7f399ba2cf898cfcc45b2 upstream. Currently, the rproc "atomic_t power" variable is incremented during: a. WPSS rproc auto boot. b. AHB power on for ath11k. During AHB power off (rmmod ath11k_ahb.ko), rproc_shutdown fails to unload the WPSS firmware because the rproc->power value is '2', causing the atomic_dec_and_test(&rproc->power) condition to fail. Consequently, during AHB power on (insmod ath11k_ahb.ko), QMI_WLANFW_HOST_CAP_REQ_V01 fails due to the host and firmware QMI states being out of sync. Fixes: 300ed425 ("remoteproc: qcom_q6v5_pas: Add SC7280 ADSP, CDSP & WPSS") Cc: stable@vger.kernel.org Signed-off-by:
Balaji Pothunoori <quic_bpothuno@quicinc.com> Reviewed-by:
Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Link: https://lore.kernel.org/r/20241018105911.165415-1-quic_bpothuno@quicinc.com Signed-off-by:
Bjorn Andersson <andersson@kernel.org> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Xu Yang authored
commit 4a159e6049f319bef6f9e6d2ccdd322f57d24830 upstream. When do perf stat on sys metric, perf tool output nothing now: $ perf stat -a -M imx95_ddr_read.all -I 1000 $ This command runs on an arm64 machine and the Soc has one DDR hw pmu except one armv8_cortex_a55 pmu. Their maps show as follows: const struct pmu_events_map pmu_events_map[] = { { .arch = "arm64", .cpuid = "0x00000000410fd050", .event_table = { .pmus = pmu_events__arm_cortex_a55, .num_pmus = ARRAY_SIZE(pmu_events__arm_cortex_a55) }, .metric_table = { .pmus = NULL, .num_pmus = 0 } }, static const struct pmu_sys_events pmu_sys_event_tables[] = { { .event_table = { .pmus = pmu_events__freescale_imx95_sys, .num_pmus = ARRAY_SIZE(pmu_events__freescale_imx95_sys) }, .metric_table = { .pmus = pmu_metrics__freescale_imx95_sys, .num_pmus = ARRAY_SIZE(pmu_metrics__freescale_imx95_sys) }, .name = "pmu_events__freescale_imx95_sys", }, Currently, pmu_metrics_table__find() will return NULL when only do perf stat on sys metric. Then parse_groups() will never be called to parse sys metric_name, finally perf tool will exit directly. This should be a common problem. To fix the issue, this will keep the logic before commit f20c15d1 ("perf pmu-events: Remember the perf_events_map for a PMU") to return a empty metric table rather than a NULL pointer. This should be fine since the removed part just check if the table match provided metric_name. Without these code, the code in parse_groups() will also check the validity of metrci_name too. Fixes: f20c15d1 ("perf pmu-events: Remember the perf_events_map for a PMU") Reviewed-by:
James Clark <james.clark@linaro.org> Signed-off-by:
Xu Yang <xu.yang_2@nxp.com> Tested-by:
Xu Yang <xu.yang_2@nxp.com> Acked-by:
Ian Rogers <irogers@google.com> Cc: Adrian Hunter <adrian.hunter@intel.com> Cc: Albert Ou <aou@eecs.berkeley.edu> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Alexandre Ghiti <alexghiti@rivosinc.com> Cc: Athira Rajeev <atrajeev@linux.vnet.ibm.com> Cc: Benjamin Gray <bgray@linux.ibm.com> Cc: Ben Zong-You Xie <ben717@andestech.com> Cc: Bibo Mao <maobibo@loongson.cn> Cc: Clément Le Goffic <clement.legoffic@foss.st.com> Cc: Dima Kogan <dima@secretsauce.net> Cc: Dr. David Alan Gilbert <linux@treblig.org> Cc: Huacai Chen <chenhuacai@kernel.org> Cc: Ingo Molnar <mingo@redhat.com> Cc: Jiri Olsa <jolsa@kernel.org> Cc: John Garry <john.g.garry@oracle.com> Cc: Kan Liang <kan.liang@linux.intel.com> Cc: Leo Yan <leo.yan@linux.dev> Cc: Mark Rutland <mark.rutland@arm.com> Cc: Masami Hiramatsu <mhiramat@kernel.org> Cc: Mike Leach <mike.leach@linaro.org> Cc: Namhyung Kim <namhyung@kernel.org> Cc: Palmer Dabbelt <palmer@dabbelt.com> Cc: Paul Walmsley <paul.walmsley@sifive.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Ravi Bangoria <ravi.bangoria@amd.com> Cc: Sandipan Das <sandipan.das@amd.com> Cc: Will Deacon <will@kernel.org> Cc: Yicong Yang <yangyicong@hisilicon.com> Cc: linux-arm-kernel@lists.infradead.org Cc: linux-riscv@lists.infradead.org Cc: stable@vger.kernel.org Link: https://lore.kernel.org/r/20241107162035.52206-2-irogers@google.com Signed-off-by:
Ian Rogers <irogers@google.com> Signed-off-by:
Arnaldo Carvalho de Melo <acme@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Qiang Yu authored
commit fba6045161d686adc102b6ef71b2fd1e5f90a616 upstream. Currently, the cfg_1_9_0 which is being used for X1E80100 doesn't disable ASPM L0s. However, hardware team recommends to disable L0s as the PHY init sequence is not tuned support L0s. Hence reuse cfg_sc8280xp for X1E80100. Note that the config_sid() callback is not present in cfg_sc8280xp, don't concern about this because config_sid() callback is originally a no-op for X1E80100. Fixes: 6d0c3932 ("PCI: qcom: Add X1E80100 PCIe support") Link: https://lore.kernel.org/r/20241101030902.579789-5-quic_qianyu@quicinc.com Signed-off-by:
Qiang Yu <quic_qianyu@quicinc.com> Signed-off-by:
Krzysztof Wilczyński <kwilczynski@kernel.org> Reviewed-by:
Dmitry Baryshkov <dmitry.baryshkov@linaro.org> Reviewed-by:
Johan Hovold <johan+linaro@kernel.org> Reviewed-by:
Manivannan Sadhasivam <manivannan.sadhasivam@linaro.org> Cc: <stable@vger.kernel.org> # 6.9 Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Giovanni Cabiddu authored
commit 9283b7392570421c22a6c8058614f5b76a46b81c upstream. The unsigned variable `size_t len` is cast to the signed type `loff_t` when passed to the function check_add_overflow(). This function considers the type of the destination, which is of type loff_t (signed), potentially leading to an overflow. This issue is similar to the one described in the link below. Remove the cast. Note that even if check_add_overflow() is bypassed, by setting `len` to a value that is greater than LONG_MAX (which is considered as a negative value after the cast), the function copy_from_user(), invoked a few lines later, will not perform any copy and return `len` as (len > INT_MAX) causing qat_vf_resume_write() to fail with -EFAULT. Fixes: bb208810 ("vfio/qat: Add vfio_pci driver for Intel QAT SR-IOV VF devices") CC: stable@vger.kernel.org # 6.10+ Link: https://lore.kernel.org/all/138bd2e2-ede8-4bcc-aa7b-f3d9de167a37@moroto.mountain Reported-by:
Zijie Zhao <zzjas98@gmail.com> Signed-off-by:
Giovanni Cabiddu <giovanni.cabiddu@intel.com> Reviewed-by:
Xin Zeng <xin.zeng@intel.com> Link: https://lore.kernel.org/r/20241021123843.42979-1-giovanni.cabiddu@intel.com Signed-off-by:
Alex Williamson <alex.williamson@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Choong Yong Liang authored
commit 59c5e1411a0a13ebb930f4ebba495cc4eb14f8f2 upstream. Set the initial eee_cfg values to have 'ethtool --show-eee ' display the initial EEE configuration. Fixes: 49168d19 ("net: phy: Add phy_support_eee() indicating MAC support EEE") Cc: <stable@vger.kernel.org> Signed-off-by:
Choong Yong Liang <yong.liang.choong@linux.intel.com> Reviewed-by:
Jacob Keller <jacob.e.keller@intel.com> Link: https://patch.msgid.link/20241120083818.1079456-1-yong.liang.choong@linux.intel.com Signed-off-by:
Paolo Abeni <pabeni@redhat.com> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Linus Walleij authored
commit 93ee385254d53849c01dd8ab9bc9d02790ee7f0e upstream. The code for syncing vmalloc memory PGD pointers is using atomic_read() in pair with atomic_set_release() but the proper pairing is atomic_read_acquire() paired with atomic_set_release(). This is done to clearly instruct the compiler to not reorder the memcpy() or similar calls inside the section so that we do not observe changes to init_mm. memcpy() calls should be identified by the compiler as having unpredictable side effects, but let's try to be on the safe side. Cc: stable@vger.kernel.org Fixes: d31e23af ("ARM: mm: make vmalloc_seq handling SMP safe") Suggested-by:
Mark Rutland <mark.rutland@arm.com> Signed-off-by:
Linus Walleij <linus.walleij@linaro.org> Signed-off-by:
Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-
Linus Walleij authored
commit 44e9a3bb76e5f2eecd374c8176b2c5163c8bb2e2 upstream. When switching task, in addition to a dummy read from the new VMAP stack, also do a dummy read from the VMAP stack's corresponding KASAN shadow memory to sync things up in the new MM context. Cc: stable@vger.kernel.org Fixes: a1c510d0 ("ARM: implement support for vmap'ed stacks") Link: https://lore.kernel.org/linux-arm-kernel/a1a1d062-f3a2-4d05-9836-3b098de9db6d@foss.st.com/ Reported-by:
Clement LE GOFFIC <clement.legoffic@foss.st.com> Suggested-by:
Ard Biesheuvel <ardb@kernel.org> Signed-off-by:
Linus Walleij <linus.walleij@linaro.org> Signed-off-by:
Russell King (Oracle) <rmk+kernel@armlinux.org.uk> Signed-off-by:
Greg Kroah-Hartman <gregkh@linuxfoundation.org>
-