- Apr 19, 2012
-
-
Alex Williamson authored
As pointed out by Jason Baron, when assigning a device to a guest we first set the iommu domain pointer, which enables mapping and unmapping of memory slots to the iommu. This leaves a window where this path is enabled, but we haven't synchronized the iommu mappings to the existing memory slots. Thus a slot being removed at that point could send us down unexpected code paths removing non-existent pinnings and iommu mappings. Take the slots_lock around creating the iommu domain and initial mappings as well as around iommu teardown to avoid this race. Signed-off-by:
Alex Williamson <alex.williamson@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
- Apr 12, 2012
-
-
Alex Williamson authored
We've been adding new mappings, but not destroying old mappings. This can lead to a page leak as pages are pinned using get_user_pages, but only unpinned with put_page if they still exist in the memslots list on vm shutdown. A memslot that is destroyed while an iommu domain is enabled for the guest will therefore result in an elevated page reference count that is never cleared. Additionally, without this fix, the iommu is only programmed with the first translation for a gpa. This can result in peer-to-peer errors if a mapping is destroyed and replaced by a new mapping at the same gpa as the iommu will still be pointing to the original, pinned memory address. Signed-off-by:
Alex Williamson <alex.williamson@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
- Mar 20, 2012
-
-
Jan Kiszka authored
As kvm_notify_acked_irq calls kvm_assigned_dev_ack_irq under rcu_read_lock, we cannot use a mutex in the latter function. Switch to a spin lock to address this. Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
- Mar 08, 2012
-
-
Alex Shi authored
Using 'int' type is not suitable for a 'long' object. So, correct it. Signed-off-by:
Alex Shi <alex.shi@intel.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Jan Kiszka authored
PCI 2.3 allows to generically disable IRQ sources at device level. This enables us to share legacy IRQs of such devices with other host devices when passing them to a guest. The new IRQ sharing feature introduced here is optional, user space has to request it explicitly. Moreover, user space can inform us about its view of PCI_COMMAND_INTX_DISABLE so that we can avoid unmasking the interrupt and signaling it if the guest masked it via the virtualized PCI config space. Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Acked-by:
Alex Williamson <alex.williamson@redhat.com> Acked-by:
Michael S. Tsirkin <mst@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Avi Kivity authored
If some vcpus are created before KVM_CREATE_IRQCHIP, then irqchip_in_kernel() and vcpu->arch.apic will be inconsistent, leading to potential NULL pointer dereferences. Fix by: - ensuring that no vcpus are installed when KVM_CREATE_IRQCHIP is called - ensuring that a vcpu has an apic if it is installed after KVM_CREATE_IRQCHIP This is somewhat long winded because vcpu->arch.apic is created without kvm->lock held. Based on earlier patch by Michael Ellerman. Signed-off-by:
Michael Ellerman <michael@ellerman.id.au> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Takuya Yoshikawa authored
Other threads may process the same page in that small window and skip TLB flush and then return before these functions do flush. Signed-off-by:
Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Takuya Yoshikawa authored
Some members of kvm_memory_slot are not used by every architecture. This patch is the first step to make this difference clear by introducing kvm_memory_slot::arch; lpage_info is moved into it. Signed-off-by:
Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Takuya Yoshikawa authored
Narrow down the controlled text inside the conditional so that it will include lpage_info and rmap stuff only. For this we change the way we check whether the slot is being created from "if (npages && !new.rmap)" to "if (npages && !old.npages)". We also stop checking if lpage_info is NULL when we create lpage_info because we do it from inside the slot creation code block. Signed-off-by:
Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Takuya Yoshikawa authored
This makes it easy to make lpage_info architecture specific. Signed-off-by:
Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Takuya Yoshikawa authored
This patch cleans up the code and removes the "(void)level;" warning suppressor. Note that we can also use this for PT_PAGE_TABLE_LEVEL to treat every level uniformly later. Signed-off-by:
Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
- Mar 05, 2012
-
-
Paul Mackerras authored
This moves __gfn_to_memslot() and search_memslots() from kvm_main.c to kvm_host.h to reduce the code duplication caused by the need for non-modular code in arch/powerpc/kvm/book3s_hv_rm_mmu.c to call gfn_to_memslot() in real mode. Rather than putting gfn_to_memslot() itself in a header, which would lead to increased code size, this puts __gfn_to_memslot() in a header. Then, the non-modular uses of gfn_to_memslot() are changed to call __gfn_to_memslot() instead. This way there is only one place in the source code that needs to be changed should the gfn_to_memslot() implementation need to be modified. On powerpc, the Book3S HV style of KVM has code that is called from real mode which needs to call gfn_to_memslot() and thus needs this. (Module code is allocated in the vmalloc region, which can't be accessed in real mode.) With this, we can remove builtin_gfn_to_memslot() from book3s_hv_rm_mmu.c. Signed-off-by:
Paul Mackerras <paulus@samba.org> Acked-by:
Avi Kivity <avi@redhat.com> Signed-off-by:
Alexander Graf <agraf@suse.de> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Michael S. Tsirkin authored
find_index_from_host_irq returns 0 on error but callers assume < 0 on error. This should not matter much: an out of range irq should never happen since irq handler was registered with this irq #, and even if it does we get a spurious msix irq in guest and typically nothing terrible happens. Still, better to make it consistent. Signed-off-by:
Michael S. Tsirkin <mst@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Paul Mackerras authored
This adds an smp_wmb in kvm_mmu_notifier_invalidate_range_end() and an smp_rmb in mmu_notifier_retry() so that mmu_notifier_retry() will give the correct answer when called without kvm->mmu_lock being held. PowerPC Book3S HV KVM wants to use a bitlock per guest page rather than a single global spinlock in order to improve the scalability of updates to the guest MMU hashed page table, and so needs this. Signed-off-by:
Paul Mackerras <paulus@samba.org> Acked-by:
Avi Kivity <avi@redhat.com> Signed-off-by:
Alexander Graf <agraf@suse.de> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Carsten Otte authored
This patch exports the s390 SIE hardware control block to userspace via the mapping of the vcpu file descriptor. In order to do so, a new arch callback named kvm_arch_vcpu_fault is introduced for all architectures. It allows to map architecture specific pages. Signed-off-by:
Carsten Otte <cotte@de.ibm.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Carsten Otte authored
This patch introduces a new config option for user controlled kernel virtual machines. It introduces a parameter to KVM_CREATE_VM that allows to set bits that alter the capabilities of the newly created virtual machine. The parameter is passed to kvm_arch_init_vm for all architectures. The only valid modifier bit for now is KVM_VM_S390_UCONTROL. This requires CAP_SYS_ADMIN privileges and creates a user controlled virtual machine on s390 architectures. Signed-off-by:
Carsten Otte <cotte@de.ibm.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
- Feb 01, 2012
-
-
Takuya Yoshikawa authored
It is possible that the __set_bit() in mark_page_dirty() is called simultaneously on the same region of memory, which may result in only one bit being set, because some callers do not take mmu_lock before mark_page_dirty(). This problem is hard to produce because when we reach mark_page_dirty() beginning from, e.g., tdp_page_fault(), mmu_lock is being held during __direct_map(): making kvm-unit-tests' dirty log api test write to two pages concurrently was not useful for this reason. So we have confirmed that there can actually be race condition by checking if some callers really reach there without holding mmu_lock using spin_is_locked(): probably they were from kvm_write_guest_page(). To fix this race, this patch changes the bit operation to the atomic version: note that nr_dirty_pages also suffers from the race but we do not need exactly correct numbers for now. Signed-off-by:
Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
- Jan 12, 2012
-
-
Rusty Russell authored
module_param(bool) used to counter-intuitively take an int. In fddd5201 (mid-2009) we allowed bool or int/unsigned int using a messy trick. It's time to remove the int/unsigned int option. For this version it'll simply give a warning, but it'll break next kernel version. Acked-by:
Mauro Carvalho Chehab <mchehab@redhat.com> Signed-off-by:
Rusty Russell <rusty@rustcorp.com.au>
-
- Dec 27, 2011
-
-
Hamo authored
by checking the return value from kvm_init_debug, we can ensure that the entries under debugfs for KVM have been created correctly. Signed-off-by:
Yang Bai <hamo.by@gmail.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Gleb Natapov authored
Drop bsp_vcpu pointer from kvm struct since its only use is incorrect anyway. Signed-off-by:
Gleb Natapov <gleb@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Sasha Levin authored
Switch to using memdup_user when possible. This makes code more smaller and compact, and prevents errors. Signed-off-by:
Sasha Levin <levinsasha928@gmail.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Sasha Levin authored
Switch to kmemdup() in two places to shorten the code and avoid possible bugs. Signed-off-by:
Sasha Levin <levinsasha928@gmail.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Julian Stecklina authored
This fixes byte accesses to IOAPIC_REG_SELECT as mandated by at least the ICH10 and Intel Series 5 chipset specs. It also makes ioapic_mmio_write consistent with ioapic_mmio_read, which also allows byte and word accesses. Signed-off-by:
Julian Stecklina <js@alien8.de> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Xiao Guangrong authored
The operation of getting dirty log is frequent when framebuffer-based displays are used(for example, Xwindow), so, we introduce a mapping table to speed up id_to_memslot() Signed-off-by:
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Xiao Guangrong authored
Sort memslots base on its size and use line search to find it, so that the larger memslots have better fit The idea is from Avi Signed-off-by:
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Xiao Guangrong authored
Introduce id_to_memslot to get memslot by slot id Signed-off-by:
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Xiao Guangrong authored
Introduce kvm_for_each_memslot to walk all valid memslot Signed-off-by:
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Xiao Guangrong authored
Introduce update_memslots to update slot which will be update to kvm->memslots Signed-off-by:
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Xiao Guangrong authored
Introduce KVM_MEM_SLOTS_NUM macro to instead of KVM_MEMORY_SLOTS + KVM_PRIVATE_MEM_SLOTS Signed-off-by:
Xiao Guangrong <xiaoguangrong@linux.vnet.ibm.com> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Takuya Yoshikawa authored
Needed for the next patch which uses this number to decide how to write protect a slot. Signed-off-by:
Takuya Yoshikawa <yoshikawa.takuya@oss.ntt.co.jp> Signed-off-by:
Avi Kivity <avi@redhat.com>
-
Thomas Meyer authored
Use kmemdup rather than duplicating its implementation The semantic patch that makes this change is available in scripts/coccinelle/api/memdup.cocci. More information about semantic patching is available at http://coccinelle.lip6.fr/ Signed-off-by:
Thomas Meyer <thomas@m3y3r.de> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Dan Carpenter authored
My testing version of Smatch complains that addr and len come from the user and they can wrap. The path is: -> kvm_vm_ioctl() -> kvm_vm_ioctl_unregister_coalesced_mmio() -> coalesced_mmio_in_range() I don't know what the implications are of wrapping here, but we may as well fix it, if only to silence the warning. Signed-off-by:
Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
- Dec 25, 2011
-
-
Alex Williamson authored
Only allow KVM device assignment to attach to devices which: - Are not bridges - Have BAR resources (assume others are special devices) - The user has permissions to use Assigning a bridge is a configuration error, it's not supported, and typically doesn't result in the behavior the user is expecting anyway. Devices without BAR resources are typically chipset components that also don't have host drivers. We don't want users to hold such devices captive or cause system problems by fencing them off into an iommu domain. We determine "permission to use" by testing whether the user has access to the PCI sysfs resource files. By default a normal user will not have access to these files, so it provides a good indication that an administration agent has granted the user access to the device. [Yang Bai: add missing #include] [avi: fix comment style] Signed-off-by:
Alex Williamson <alex.williamson@redhat.com> Signed-off-by:
Yang Bai <hamo.by@gmail.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
Alex Williamson authored
This option has no users and it exposes a security hole that we can allow devices to be assigned without iommu protection. Make KVM_DEV_ASSIGN_ENABLE_IOMMU a mandatory option. Signed-off-by:
Alex Williamson <alex.williamson@redhat.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-
- Nov 10, 2011
-
-
Ohad Ben-Cohen authored
When mapping a memory region, split it to page sizes as supported by the iommu hardware. Always prefer bigger pages, when possible, in order to reduce the TLB pressure. The logic to do that is now added to the IOMMU core, so neither the iommu drivers themselves nor users of the IOMMU API have to duplicate it. This allows a more lenient granularity of mappings; traditionally the IOMMU API took 'order' (of a page) as a mapping size, and directly let the low level iommu drivers handle the mapping, but now that the IOMMU core can split arbitrary memory regions into pages, we can remove this limitation, so users don't have to split those regions by themselves. Currently the supported page sizes are advertised once and they then remain static. That works well for OMAP and MSM but it would probably not fly well with intel's hardware, where the page size capabilities seem to have the potential to be different between several DMA remapping devices. register_iommu() currently sets a default pgsize behavior, so we can convert the IOMMU drivers in subsequent patches. After all the drivers are converted, the temporary default settings will be removed. Mainline users of the IOMMU API (kvm and omap-iovmm) are adopted to deal with bytes instead of page order. Many thanks to Joerg Roedel <Joerg.Roedel@amd.com> for significant review! Signed-off-by:
Ohad Ben-Cohen <ohad@wizery.com> Cc: David Brown <davidb@codeaurora.org> Cc: David Woodhouse <dwmw2@infradead.org> Cc: Joerg Roedel <Joerg.Roedel@amd.com> Cc: Stepan Moskovchenko <stepanm@codeaurora.org> Cc: KyongHo Cho <pullip.cho@samsung.com> Cc: Hiroshi DOYU <hdoyu@nvidia.com> Cc: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Cc: kvm@vger.kernel.org Signed-off-by:
Joerg Roedel <joerg.roedel@amd.com>
-
- Oct 31, 2011
-
-
Paul Gortmaker authored
This file has things like module_param_named() and MODULE_PARM_DESC() so it needs the full module.h header present. Without it, you'll get: CC arch/x86/kvm/../../../virt/kvm/iommu.o virt/kvm/iommu.c:37: error: expected ‘)’ before ‘bool’ virt/kvm/iommu.c:39: error: expected ‘)’ before string constant make[3]: *** [arch/x86/kvm/../../../virt/kvm/iommu.o] Error 1 make[2]: *** [arch/x86/kvm] Error 2 Signed-off-by:
Paul Gortmaker <paul.gortmaker@windriver.com>
-
Paul Gortmaker authored
This was coming in via an implicit module.h (and its sub-includes) before, but we'll be cleaning that up shortly. Call out the stat.h include requirement in advance. Signed-off-by:
Paul Gortmaker <paul.gortmaker@windriver.com>
-
- Oct 21, 2011
-
-
Joerg Roedel authored
With per-bus iommu_ops the iommu_found function needs to work on a bus_type too. This patch adds a bus_type parameter to that function and converts all call-places. The function is also renamed to iommu_present because the function now checks if an iommu is present for a given bus and does not check for a global iommu anymore. Signed-off-by:
Joerg Roedel <joerg.roedel@amd.com>
-
Joerg Roedel authored
This is necessary to store a pointer to the bus-specific iommu_ops in the iommu-domain structure. It will be used later to call into bus-specific iommu-ops. Signed-off-by:
Joerg Roedel <joerg.roedel@amd.com>
-
- Sep 25, 2011
-
-
Jan Kiszka authored
The threaded IRQ handler for MSI-X has almost nothing in common with the INTx/MSI handler. Move its code into a dedicated handler. Signed-off-by:
Jan Kiszka <jan.kiszka@siemens.com> Signed-off-by:
Marcelo Tosatti <mtosatti@redhat.com>
-