Skip to content
Snippets Groups Projects
  • Lukas Wunner's avatar
    69d2ceac
    PCI: Fix use-after-free of slot->bus on hot remove · 69d2ceac
    Lukas Wunner authored
    commit c7acef99642b763ba585f4a43af999fcdbcc3dc4 upstream.
    
    Dennis reports a boot crash on recent Lenovo laptops with a USB4 dock.
    
    Since commit 0fc70886 ("thunderbolt: Reset USB4 v2 host router") and
    commit 59a54c5f ("thunderbolt: Reset topology created by the boot
    firmware"), USB4 v2 and v1 Host Routers are reset on probe of the
    thunderbolt driver.
    
    The reset clears the Presence Detect State and Data Link Layer Link Active
    bits at the USB4 Host Router's Root Port and thus causes hot removal of the
    dock.
    
    The crash occurs when pciehp is unbound from one of the dock's Downstream
    Ports:  pciehp creates a pci_slot on bind and destroys it on unbind.  The
    pci_slot contains a pointer to the pci_bus below the Downstream Port, but
    a reference on that pci_bus is never acquired.  The pci_bus is destroyed
    before the pci_slot, so a use-after-free ensues when pci_slot_release()
    accesses slot->bus.
    
    In principle this should not happen because pci_stop_bus_device() unbinds
    pciehp (and therefore destroys the pci_slot) before the pci_bus is
    destroyed by pci_remove_bus_device().
    
    However the stacktrace provided by Dennis shows that pciehp is unbound from
    pci_remove_bus_device() instead of pci_stop_bus_device().  To understand
    the significance of this, one needs to know that the PCI core uses a two
    step process to remove a portion of the hierarchy:  It first unbinds all
    drivers in the sub-hierarchy in pci_stop_bus_device() and then actually
    removes the devices in pci_remove_bus_device().  There is no precaution to
    prevent driver binding in-between pci_stop_bus_device() and
    pci_remove_bus_device().
    
    In Dennis' case, it seems removal of the hierarchy by pciehp races with
    driver binding by pci_bus_add_devices().  pciehp is bound to the
    Downstream Port after pci_stop_bus_device() has run, so it is unbound by
    pci_remove_bus_device() instead of pci_stop_bus_device().  Because the
    pci_bus has already been destroyed at that point, accesses to it result in
    a use-after-free.
    
    One might conclude that driver binding needs to be prevented after
    pci_stop_bus_device() has run.  However it seems risky that pci_slot points
    to pci_bus without holding a reference.  Solely relying on correct ordering
    of driver unbind versus pci_bus destruction is certainly not defensive
    programming.
    
    If pci_slot has a need to access data in pci_bus, it ought to acquire a
    reference.  Amend pci_create_slot() accordingly.  Dennis reports that the
    crash is not reproducible with this change.
    
    Abridged stacktrace:
    
      pcieport 0000:00:07.0: PME: Signaling with IRQ 156
      pcieport 0000:00:07.0: pciehp: Slot #12 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+
      pci_bus 0000:20: dev 00, created physical slot 12
      pcieport 0000:00:07.0: pciehp: Slot(12): Card not present
      ...
      pcieport 0000:21:02.0: pciehp: pcie_disable_notification: SLOTCTRL d8 write cmd 0
      Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP NOPTI
      CPU: 13 UID: 0 PID: 134 Comm: irq/156-pciehp Not tainted 6.11.0-devel+ #1
      RIP: 0010:dev_driver_string+0x12/0x40
      pci_destroy_slot
      pciehp_remove
      pcie_port_remove_service
      device_release_driver_internal
      bus_remove_device
      device_del
      device_unregister
      remove_iter
      device_for_each_child
      pcie_portdrv_remove
      pci_device_remove
      device_release_driver_internal
      bus_remove_device
      device_del
      pci_remove_bus_device (recursive invocation)
      pci_remove_bus_device
      pciehp_unconfigure_device
      pciehp_disable_slot
      pciehp_handle_presence_or_link_change
      pciehp_ist
    
    Link: https://lore.kernel.org/r/4bfd4c0e976c1776cd08e76603903b338cf25729.1728579288.git.lukas@wunner.de
    
    
    Reported-by: default avatarDennis Wassenberg <Dennis.Wassenberg@secunet.com>
    Closes: https://lore.kernel.org/r/6de4b45ff2b32dd91a805ec02ec8ec73ef411bf6.camel@secunet.com/
    
    
    Tested-by: default avatarDennis Wassenberg <Dennis.Wassenberg@secunet.com>
    Signed-off-by: default avatarLukas Wunner <lukas@wunner.de>
    Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
    Reviewed-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>
    69d2ceac
    History
    PCI: Fix use-after-free of slot->bus on hot remove
    Lukas Wunner authored
    commit c7acef99642b763ba585f4a43af999fcdbcc3dc4 upstream.
    
    Dennis reports a boot crash on recent Lenovo laptops with a USB4 dock.
    
    Since commit 0fc70886 ("thunderbolt: Reset USB4 v2 host router") and
    commit 59a54c5f ("thunderbolt: Reset topology created by the boot
    firmware"), USB4 v2 and v1 Host Routers are reset on probe of the
    thunderbolt driver.
    
    The reset clears the Presence Detect State and Data Link Layer Link Active
    bits at the USB4 Host Router's Root Port and thus causes hot removal of the
    dock.
    
    The crash occurs when pciehp is unbound from one of the dock's Downstream
    Ports:  pciehp creates a pci_slot on bind and destroys it on unbind.  The
    pci_slot contains a pointer to the pci_bus below the Downstream Port, but
    a reference on that pci_bus is never acquired.  The pci_bus is destroyed
    before the pci_slot, so a use-after-free ensues when pci_slot_release()
    accesses slot->bus.
    
    In principle this should not happen because pci_stop_bus_device() unbinds
    pciehp (and therefore destroys the pci_slot) before the pci_bus is
    destroyed by pci_remove_bus_device().
    
    However the stacktrace provided by Dennis shows that pciehp is unbound from
    pci_remove_bus_device() instead of pci_stop_bus_device().  To understand
    the significance of this, one needs to know that the PCI core uses a two
    step process to remove a portion of the hierarchy:  It first unbinds all
    drivers in the sub-hierarchy in pci_stop_bus_device() and then actually
    removes the devices in pci_remove_bus_device().  There is no precaution to
    prevent driver binding in-between pci_stop_bus_device() and
    pci_remove_bus_device().
    
    In Dennis' case, it seems removal of the hierarchy by pciehp races with
    driver binding by pci_bus_add_devices().  pciehp is bound to the
    Downstream Port after pci_stop_bus_device() has run, so it is unbound by
    pci_remove_bus_device() instead of pci_stop_bus_device().  Because the
    pci_bus has already been destroyed at that point, accesses to it result in
    a use-after-free.
    
    One might conclude that driver binding needs to be prevented after
    pci_stop_bus_device() has run.  However it seems risky that pci_slot points
    to pci_bus without holding a reference.  Solely relying on correct ordering
    of driver unbind versus pci_bus destruction is certainly not defensive
    programming.
    
    If pci_slot has a need to access data in pci_bus, it ought to acquire a
    reference.  Amend pci_create_slot() accordingly.  Dennis reports that the
    crash is not reproducible with this change.
    
    Abridged stacktrace:
    
      pcieport 0000:00:07.0: PME: Signaling with IRQ 156
      pcieport 0000:00:07.0: pciehp: Slot #12 AttnBtn- PwrCtrl- MRL- AttnInd- PwrInd- HotPlug+ Surprise+ Interlock- NoCompl+ IbPresDis- LLActRep+
      pci_bus 0000:20: dev 00, created physical slot 12
      pcieport 0000:00:07.0: pciehp: Slot(12): Card not present
      ...
      pcieport 0000:21:02.0: pciehp: pcie_disable_notification: SLOTCTRL d8 write cmd 0
      Oops: general protection fault, probably for non-canonical address 0x6b6b6b6b6b6b6b6b: 0000 [#1] PREEMPT SMP NOPTI
      CPU: 13 UID: 0 PID: 134 Comm: irq/156-pciehp Not tainted 6.11.0-devel+ #1
      RIP: 0010:dev_driver_string+0x12/0x40
      pci_destroy_slot
      pciehp_remove
      pcie_port_remove_service
      device_release_driver_internal
      bus_remove_device
      device_del
      device_unregister
      remove_iter
      device_for_each_child
      pcie_portdrv_remove
      pci_device_remove
      device_release_driver_internal
      bus_remove_device
      device_del
      pci_remove_bus_device (recursive invocation)
      pci_remove_bus_device
      pciehp_unconfigure_device
      pciehp_disable_slot
      pciehp_handle_presence_or_link_change
      pciehp_ist
    
    Link: https://lore.kernel.org/r/4bfd4c0e976c1776cd08e76603903b338cf25729.1728579288.git.lukas@wunner.de
    
    
    Reported-by: default avatarDennis Wassenberg <Dennis.Wassenberg@secunet.com>
    Closes: https://lore.kernel.org/r/6de4b45ff2b32dd91a805ec02ec8ec73ef411bf6.camel@secunet.com/
    
    
    Tested-by: default avatarDennis Wassenberg <Dennis.Wassenberg@secunet.com>
    Signed-off-by: default avatarLukas Wunner <lukas@wunner.de>
    Signed-off-by: default avatarBjorn Helgaas <bhelgaas@google.com>
    Reviewed-by: default avatarMika Westerberg <mika.westerberg@linux.intel.com>
    Cc: stable@vger.kernel.org
    Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@linuxfoundation.org>