Skip to content
Snippets Groups Projects
  1. Apr 30, 2008
  2. Apr 29, 2008
  3. Apr 27, 2008
  4. Apr 21, 2008
    • mark gross's avatar
      PCI: iommu: iotlb flushing · 5e0d2a6f
      mark gross authored
      
      This patch is for batching up the flushing of the IOTLB for the DMAR
      implementation found in the Intel VT-d hardware.  It works by building a list
      of to be flushed IOTLB entries and a bitmap list of which DMAR engine they are
      from.
      
      After either a high water mark (250 accessible via debugfs) or 10ms the list
      of iova's will be reclaimed and the DMAR engines associated are IOTLB-flushed.
      
      This approach recovers 15 to 20% of the performance lost when using the IOMMU
      for my netperf udp stream benchmark with small packets.  It can be disabled
      with a kernel boot parameter "intel_iommu=strict".
      
      Its use does weaken the IOMMU protections a bit.
      
      Signed-off-by: default avatarMark Gross <mgross@linux.intel.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      5e0d2a6f
    • Greg Kroah-Hartman's avatar
      PCI: remove initial bios sort of PCI devices on x86 · 1ba6ab11
      Greg Kroah-Hartman authored
      
      We currently keep 2 lists of PCI devices in the system, one in the
      driver core, and one all on its own.  This second list is sorted at boot
      time, in "BIOS" order, to try to remain compatible with older kernels
      (2.2 and earlier days).  There was also a "nosort" option to turn this
      sorting off, to remain compatible with even older kernel versions, but
      that just ends up being what we have been doing from 2.5 days...
      
      Unfortunately, the second list of devices is not really ever used to 
      determine the probing order of PCI devices or drivers[1].  That is done
      using the driver core list instead.  This change happened back in the
      early 2.5 days.
      
      Relying on BIOS ording for the binding of drivers to specific device
      names is problematic for many reasons, and userspace tools like udev
      exist to properly name devices in a persistant manner if that is needed,
      no reliance on the BIOS is needed.
      
      Matt Domsch and others at Dell noticed this back in 2006, and added a
      boot option to sort the PCI device lists (both of them) in a
      breadth-first manner to help remain compatible with the 2.4 order, if
      needed for any reason.  This option is not going away, as some systems
      rely on them.
      
      This patch removes the sorting of the internal PCI device list in "BIOS"
      mode, as it's not needed at all anymore, and hasn't for many years.
      I've also removed the PCI flags for this from some other arches that for
      some reason defined them, but never used them.
      
      This should not change the ordering of any drivers or device probing.
      
      [1] The old-style pci_get_device and pci_find_device() still used this
      sorting order, but there are very few drivers that use these functions,
      as they are deprecated for use in this manner.  If for some reason, a
      driver rely on the order and uses these functions, the breadth-first
      boot option will resolve any problem.
      
      Cc: Matt Domsch <Matt_Domsch@dell.com>
      Signed-off-by: default avatarGreg Kroah-Hartman <gregkh@suse.de>
      1ba6ab11
  5. Apr 19, 2008
  6. Apr 17, 2008
  7. Apr 11, 2008
  8. Apr 04, 2008
    • Paul Menage's avatar
      cgroups: add cgroup support for enabling controllers at boot time · 8bab8dde
      Paul Menage authored
      
      The effects of cgroup_disable=foo are:
      
      - foo isn't auto-mounted if you mount all cgroups in a single hierarchy
      - foo isn't visible as an individually mountable subsystem
      
      As a result there will only ever be one call to foo->create(), at init time;
      all processes will stay in this group, and the group will never be mounted on
      a visible hierarchy.  Any additional effects (e.g.  not allocating metadata)
      are up to the foo subsystem.
      
      This doesn't handle early_init subsystems (their "disabled" bit isn't set be,
      but it could easily be extended to do so if any of the early_init systems
      wanted it - I think it would just involve some nastier parameter processing
      since it would occur before the command-line argument parser had been run.
      
      Hugh said:
      
        Ballpark figures, I'm trying to get this question out rather than
        processing the exact numbers: CONFIG_CGROUP_MEM_RES_CTLR adds 15% overhead
        to the affected paths, booting with cgroup_disable=memory cuts that back to
        1% overhead (due to slightly bigger struct page).
      
        I'm no expert on distros, they may have no interest whatever in
        CONFIG_CGROUP_MEM_RES_CTLR=y; and the rest of us can easily build with or
        without it, or apply the cgroup_disable=memory patches.
      
      Unix bench's execl test result on x86_64 was
      
      == just after boot without mounting any cgroup fs.==
      mem_cgorup=off : Execl Throughput       43.0     3150.1      732.6
      mem_cgroup=on  : Execl Throughput       43.0     2932.6      682.0
      ==
      
      [lizf@cn.fujitsu.com: fix boot option parsing]
      Signed-off-by: default avatarBalbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Paul Menage <menage@google.com>
      Cc: Balbir Singh <balbir@linux.vnet.ibm.com>
      Cc: Pavel Emelyanov <xemul@openvz.org>
      Cc: KAMEZAWA Hiroyuki <kamezawa.hiroyu@jp.fujitsu.com>
      Cc: Hugh Dickins <hugh@veritas.com>
      Cc: Sudhir Kumar <skumar@linux.vnet.ibm.com>
      Cc: YAMAMOTO Takashi <yamamoto@valinux.co.jp>
      Cc: David Rientjes <rientjes@google.com>
      Signed-off-by: default avatarLi Zefan <lizf@cn.fujitsu.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      8bab8dde
    • Fenghua Yu's avatar
      [IA64] Kernel parameter for max number of concurrent global TLB purges · a6c75b86
      Fenghua Yu authored
      
      The patch defines kernel parameter "nptcg=". The parameter overrides max number
      of concurrent global TLB purges which is reported from either PAL_VM_SUMMARY or
      SAL PALO.
      
      Signed-off-by: default avatarFenghua Yu <fenghua.yu@intel.com>
      Signed-off-by: default avatarTony Luck <tony.luck@intel.com>
      a6c75b86
  9. Apr 01, 2008
    • Rafael J. Wysocki's avatar
      ACPI PM: Restore the 2.6.24 suspend ordering · 7731ce63
      Rafael J. Wysocki authored
      Some time ago it turned out that our suspend code ordering broke some
      NVidia-based systems that hung if _PTS was executed with one of the PCI
      devices, specifically a USB controller, in a low power state.
      
      Then, it was noticed that the suspend code ordering was not compliant
      with ACPI 1.0, although it was compliant with ACPI 2.0 (and later), and
      it was argued that the code had to be changed for that reason (ref.
      http://bugzilla.kernel.org/show_bug.cgi?id=9528).
      
      So we did, but evidently we did wrong, because it's now turning out that
      some systems have been broken by this change. Refs:
      	http://bugzilla.kernel.org/show_bug.cgi?id=10340
      	https://bugzilla.novell.com/show_bug.cgi?id=374217#c16
      
      [ I said at that time that something like this might happend, but the
        majority of people involved thought that it was improbable due to the
        necessity to preserve the compliance of hardware with ACPI 1.0. ]
      
      This actually is a quite serious regression from 2.6.24.
      
      Moreover, the ACPI 1.0 ordering of suspend code introduced another issue
      that I have only noticed recently.  Namely, if the suspend of one of
      devices fails, the already suspended devices will be resumed without
      executing _WAK before, which leads to problems on some systems (for
      example, in such situations thermal management is broken on my HP
      nx6325).  Consequently, it also breaks suspend debugging on the affected
      systems.
      
      Note also, that the requirement to execute _PTS before suspending
      devices does not really make sense, because the device in question may
      be put into a low power state at run time for a reason unrelated to a
      system-wide suspend.
      
      For the reasons outlined above, the change of the suspend ordering
      should be reverted, which is done by the patch below.
      
      [ Felix Möller: "I am the reporter from the original Novell Bug:
      
      	https://bugzilla.novell.com/show_bug.cgi?id=374217
      
      
      
        I just tried current git head (two hours ago) with the patch (the one
        from the beginning of this thread) from Rafael and without it.  With
        the patch my MacBook does suspend without it does not." ]
      
      Signed-off-by: default avatarRafael J. Wysocki <rjw@sisk.pl>
      Tested-by: default avatarFelix Möller <felix@derklecks.de>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      7731ce63
    • Robert Brose's avatar
      [POWERPC] Add kernel parameter to set l3cr for MPC745x · a78bfbfc
      Robert Brose authored
      
      Old-world powermacs don't set L2CR or L3CR on processor upgrade cards.
      This simple patch allows the setting of L3CR via a kernel parameter
      (like the existing kernel parameter to set L2CR).
      
      Signed-off-by: default avatarRobert Brose <bob@qbjnet.com>
      Signed-off-by: default avatarPaul Mackerras <paulus@samba.org>
      a78bfbfc
  10. Mar 25, 2008
  11. Mar 15, 2008
    • Linus Torvalds's avatar
      ACPI: Remove ACPI_CUSTOM_DSDT_INITRD option · 9a9e0d68
      Linus Torvalds authored
      
      This essentially reverts commit 71fc47a9
      ("ACPI: basic initramfs DSDT override support"), because the code simply
      isn't ready.
      
      It did ugly things to the init sequence to populate the rootfs image
      early, but that just ended up showing other problems with the whole
      approach.  The fact is, the VFS layer simply isn't initialized this
      early, and the relevant ACPI code should either run much later, or this
      shouldn't be done at all.
      
      For 2.6.25, we'll just pick the latter option.  We can revisit this
      concept later if necessary.
      
      Cc: Dave Hansen <haveblue@us.ibm.com>
      Cc: Tilman Schmidt <tilman@imap.cc>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Thomas Renninger <trenn@suse.de>
      Cc: Eric Piel <eric.piel@tremplin-utc.net>
      Cc: Len Brown <len.brown@intel.com>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: Markus Gaugusch <dsdt@gaugusch.at>
      Signed-off-by: default avatarLinus Torvalds <torvalds@linux-foundation.org>
      9a9e0d68
  12. Mar 14, 2008
  13. Mar 12, 2008
  14. Mar 07, 2008
  15. Feb 20, 2008
    • Tejun Heo's avatar
      libata: implement libata.force module parameter · 33267325
      Tejun Heo authored
      
      This patch implements libata.force module parameter which can
      selectively override ATA port, link and device configurations
      including cable type, SATA PHY SPD limit, transfer mode and NCQ.
      
      For example, you can say "use 1.5Gbps for all fan-out ports attached
      to the second port but allow 3.0Gbps for the PMP device itself, oh,
      the device attached to the third fan-out port chokes on NCQ and
      shouldn't go over UDMA4" by the following.
      
       libata.force=2:1.5g,2.15:3.0g,2.03:noncq,udma4
      
      Signed-off-by: default avatarTejun Heo <htejun@gmail.com>
      Signed-off-by: default avatarJeff Garzik <jeff@garzik.org>
      33267325
  16. Feb 19, 2008
  17. Feb 08, 2008
  18. Feb 07, 2008
  19. Feb 06, 2008
  20. Feb 03, 2008
  21. Feb 01, 2008
  22. Jan 30, 2008
    • Willy Tarreau's avatar
      x86: GEODE add the "mfgptfix" boot time option to fix MFGPT timers · e6c4dc6c
      Willy Tarreau authored
      
      The new "mfgptfix" boot command line option may be usd to fix MFGPT
      timers on AMD Geode platforms when the BIOS has incorrectly applied
      a workaround. TinyBIOS version 0.98 is known to be affected, 0.99
      fixes the problem by letting the user disable the workaround.
      
      Signed-off-by: default avatarWilly Tarreau <w@1wt.eu>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      e6c4dc6c
    • Yinghai Lu's avatar
      x86_32: trim memory by updating e820 · 093af8d7
      Yinghai Lu authored
      
      when MTRRs are not covering the whole e820 table, we need to trim the
      RAM and need to update e820.
      
      reuse some code on 64-bit as well.
      
      here need to add early_get_cap and use it in early_cpu_detect, and move
      mtrr_bp_init early.
      
      The code successfully trimmed the memory map on Justin's system:
      
      from:
      
       [    0.000000]  BIOS-e820: 0000000100000000 - 000000022c000000 (usable)
      
      to:
      
       [    0.000000]   modified: 0000000100000000 - 0000000228000000 (usable)
       [    0.000000]   modified: 0000000228000000 - 000000022c000000 (reserved)
      
      According to Justin it makes quite a difference:
      
      |  When I boot the box without any trimming it acts like a 286 or 386,
      |  takes about 10 minutes to boot (using raptor disks).
      
      Signed-off-by: default avatarYinghai Lu <yinghai.lu@sun.com>
      Tested-by: default avatarJustin Piszcz <jpiszcz@lucidpixels.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      093af8d7
    • Andi Kleen's avatar
      x86: add generic clearcpuid=... option · ac72e788
      Andi Kleen authored
      
      Add a generic option to clear any cpuid bit. I added it because it was
      very easy to add with the new generic cpuid disable bitmap and perhaps
      it will be useful in the future.
      
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      ac72e788
    • Andi Kleen's avatar
      x86: add noclflush option · 191679fd
      Andi Kleen authored
      
      To disable CLFLUSH usage, especially in change_page_attr().
      
      Signed-off-by: default avatarAndi Kleen <ak@suse.de>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      191679fd
    • Jesse Barnes's avatar
      x86, 32-bit: trim memory not covered by wb mtrrs · 99fc8d42
      Jesse Barnes authored
      
      On some machines, buggy BIOSes don't properly setup WB MTRRs to cover all
      available RAM, meaning the last few megs (or even gigs) of memory will be
      marked uncached.  Since Linux tends to allocate from high memory addresses
      first, this causes the machine to be unusably slow as soon as the kernel
      starts really using memory (i.e.  right around init time).
      
      This patch works around the problem by scanning the MTRRs at boot and
      figuring out whether the current end_pfn value (setup by early e820 code)
      goes beyond the highest WB MTRR range, and if so, trimming it to match.  A
      fairly obnoxious KERN_WARNING is printed too, letting the user know that
      not all of their memory is available due to a likely BIOS bug.
      
      Something similar could be done on i386 if needed, but the boot ordering
      would be slightly different, since the MTRR code on i386 depends on the
      boot_cpu_data structure being setup.
      
      This patch fixes a bug in the last patch that caused the code to run on
      non-Intel machines (AMD machines apparently don't need it and it's untested
      on other non-Intel machines, so best keep it off).
      
      Further enhancements and fixes from:
      
        Yinghai Lu <Yinghai.Lu@Sun.COM>
        Andi Kleen <ak@suse.de>
      
      Signed-off-by: default avatarJesse Barnes <jesse.barnes@intel.com>
      Tested-by: default avatarJustin Piszcz <jpiszcz@lucidpixels.com>
      Cc: Andi Kleen <andi@firstfloor.org>
      Cc: "Eric W. Biederman" <ebiederm@xmission.com>
      Cc: Yinghai Lu <yhlu.kernel@gmail.com>
      Signed-off-by: default avatarAndrew Morton <akpm@linux-foundation.org>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      99fc8d42
    • Yinghai Lu's avatar
      x86: disable the GART early, 64-bit · aaf23042
      Yinghai Lu authored
      
      For K8 system: 4G RAM with memory hole remapping enabled, or more than
      4G RAM installed.
      
      when try to use kexec second kernel, and the first doesn't include
      gart_shutdown. the second kernel could have different aper position than
      the first kernel. and second kernel could use that hole as RAM that is
      still used by GART set by the first kernel. esp. when try to kexec
      2.6.24 with sparse mem enable from previous kernel (from RHEL 5 or SLES
      10). the new kernel will use aper by GART (set by first kernel) for
      vmemmap. and after new kernel setting one new GART. the position will be
      real RAM. the _mapcount set is lost.
      
      Bad page state in process 'swapper'
      page:ffffe2000e600020 flags:0x0000000000000000 mapping:0000000000000000 mapcount:1 count:0
      Trying to fix it up, but a reboot is needed
      Backtrace:
      Pid: 0, comm: swapper Not tainted 2.6.24-rc7-smp-gcdf71a10-dirty #13
      
      Call Trace:
       [<ffffffff8026401f>] bad_page+0x63/0x8d
       [<ffffffff80264169>] __free_pages_ok+0x7c/0x2a5
       [<ffffffff80ba75d1>] free_all_bootmem_core+0xd0/0x198
       [<ffffffff80ba3a42>] numa_free_all_bootmem+0x3b/0x76
       [<ffffffff80ba3461>] mem_init+0x3b/0x152
       [<ffffffff80b959d3>] start_kernel+0x236/0x2c2
       [<ffffffff80b9511a>] _sinittext+0x11a/0x121
      
      and
       [ffffe2000e600000-ffffe2000e7fffff] PMD ->ffff81001c200000 on node 0
      phys addr is : 0x1c200000
      
      RHEL 5.1 kernel -53 said:
      PCI-DMA: aperture base @ 1c000000 size 65536 KB
      
      new kernel said:
      Mapping aperture over 65536 KB of RAM @ 3c000000
      
      So could try to disable that GART if possible.
      
      According to Ingo
      
      > hm, i'm wondering, instead of modifying the GART, why dont we simply
      > _detect_ whatever GART settings we have inherited, and propagate that
      > into our e820 maps? I.e. if there's inconsistency, then punch that out
      > from the memory maps and just dont use that memory.
      >
      > that way it would not matter whether the GART settings came from a [old
      > or crashing] Linux kernel that has not called gart_iommu_shutdown(), or
      > whether it's a BIOS that has set up an aperture hole inconsistent with
      > the memory map it passed. (or the memory map we _think_ i tried to pass
      > us)
      >
      > it would also be more robust to only read and do a memory map quirk
      > based on that, than actively trying to change the GART so early in the
      > bootup. Later on we have to re-enable the GART _anyway_ and have to
      > punch a hole for it.
      >
      > and as a bonus, we would have shored up our defenses against crappy
      > BIOSes as well.
      
      add e820 modification for gart inconsistent setting.
      
      gart_fix_e820=off could be used to disable e820 fix.
      
      Signed-off-by: default avatarYinghai Lu <yinghai.lu@sun.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      aaf23042
    • Arjan van de Ven's avatar
      x86: add the "print code before the trapping instruction" feature to 64 bit · a25bd949
      Arjan van de Ven authored
      
      The 32 bit x86 tree has a very useful feature that prints the Code: line
      for the code even before the trapping instrution (and the start of the
      trapping instruction is then denoted with a <>). Unfortunately, the 64 bit
      x86 tree does not yet have this feature, making diagnosing backtraces harder
      than needed.
      
      This patch adds this feature in the same was as the 32 bit tree has
      (including the same kernel boot parameter), and including a bugfix
      to make the code use probe_kernel_address() rarther than a buggy (deadlocking)
      __get_user.
      
      Signed-off-by: default avatarArjan van de Ven <arjan@linux.intel.com>
      Signed-off-by: default avatarIngo Molnar <mingo@elte.hu>
      Signed-off-by: default avatarThomas Gleixner <tglx@linutronix.de>
      a25bd949
Loading