  1. Feb 07, 2017
  2. Jan 27, 2017
  3. Jan 25, 2017
    • Revert "drm/probe-helpers: Drop locking from poll_enable" · 54a07c7b
      Dave Airlie authored
      This reverts commit 3846fd9b.
      
      Some precursor commits around connector locking were missing for this;
      we should probably merge Lyude's nouveau patch that avoids the problem.
    • net: phy: leds: Fix truncated LED trigger names · 3c880eb0
      Geert Uytterhoeven authored
      
      Commit 4567d686 ("phy: increase size of MII_BUS_ID_SIZE and
      bus_id") increased the size of MII bus IDs, but forgot to update the
      private definition in <linux/phy_led_triggers.h>.
      This may cause:
        1. Truncation of LED trigger names,
        2. Duplicate LED trigger names,
        3. Failures registering LED triggers,
        4. Crashes due to bad error handling in the LED trigger failure path.
      
      To fix this, and prevent the definitions going out of sync again in the
      future, let the PHY LED trigger code use the existing MII_BUS_ID_SIZE
      definition.
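
      A rough sketch of the idea (illustrative only; the exact macro names and
      lengths in <linux/phy_led_triggers.h> may differ): size the trigger-name
      buffer from the shared MII_BUS_ID_SIZE in <linux/phy.h> rather than from
      a private copy, so the two cannot drift apart again:

          /* Trigger names look like "<bus_id>:<addr>:<speed>", e.g.
           * "ee700000.ethernet-ffffffff:01:100Mbps".  The address and speed
           * suffix lengths below are illustrative placeholders. */
          #define PHY_LED_TRIGGER_ADDR_SUFFIX_SIZE    sizeof(":ff")
          #define PHY_LED_TRIGGER_SPEED_SUFFIX_SIZE   sizeof(":100Mbps")

          #define PHY_LINK_LED_TRIGGER_NAME_SIZE      (MII_BUS_ID_SIZE + \
                                                       PHY_LED_TRIGGER_ADDR_SUFFIX_SIZE + \
                                                       PHY_LED_TRIGGER_SPEED_SUFFIX_SIZE)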
      
      Example:
        - Before I had triggers "ee700000.etherne:01:100Mbps" and
          "ee700000.etherne:01:10Mbps",
        - After the increase of MII_BUS_ID_SIZE, both became
          "ee700000.ethernet-ffffffff:01:" => FAIL,
        - Now, the triggers are "ee700000.ethernet-ffffffff:01:100Mbps" and
          "ee700000.ethernet-ffffffff:01:10Mbps", which are unique again.
      
      Fixes: 4567d686 ("phy: increase size of MII_BUS_ID_SIZE and bus_id")
      Fixes: 2e0bc452 ("net: phy: leds: add support for led triggers on phy link state change")
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • net: phy: leds: Break dependency of phy.h on phy_led_triggers.h · d6f8cfa3
      Geert Uytterhoeven authored
      
      <linux/phy.h> includes <linux/phy_led_triggers.h>, which is not really
      needed.  Drop the include from <linux/phy.h>, and add it to all users
      that didn't include it explicitly.
      
      Suggested-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Reviewed-by: Andrew Lunn <andrew@lunn.ch>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • mm, page_alloc: fix check for NULL preferred_zone · ea57485a
      Vlastimil Babka authored
      Patch series "fix premature OOM regression in 4.7+ due to cpuset races".
      
      This is v2 of my attempt to fix the recent report based on LTP cpuset
      stress test [1].  The intention is to go to stable 4.9 LTSS with this,
      as triggering repeated OOMs is not nice.  That's why the patches try not
      to be too intrusive.
      
      Unfortunately, while investigating, I found that modifying the testcase to
      use per-VMA policies instead of per-task policies brings the OOMs back,
      but that seems to be a much older and harder-to-fix problem.  I have
      posted an RFC [2], but I believe that fixing the recent regressions has a
      higher priority.
      
      Longer-term we might try to think about how to fix the cpuset mess in a
      better and less error-prone way.  I was, for example, very surprised to
      learn that cpuset updates change not only task->mems_allowed, but also
      the nodemask of mempolicies.  Until now I expected the nodemask parameter
      to alloc_pages_nodemask() to be stable.  I wonder why we then treat
      cpusets specially in get_page_from_freelist() and distinguish HARDWALL
      etc., when there's an unconditional intersection between mempolicy and
      cpuset.  I would expect the nodemask adjustment to save overhead in
      g_p_f(), but that clearly doesn't happen in the current form.  So we
      have both crazy complexity and overhead, AFAICS.
      
      [1] https://lkml.kernel.org/r/CAFpQJXUq-JuEP=QPidy4p_=FN0rkH5Z-kfB4qBvsf6jMS87Edg@mail.gmail.com
      [2] https://lkml.kernel.org/r/7c459f26-13a6-a817-e508-b65b903a8378@suse.cz
      
      This patch (of 4):
      
      Since commit c33d6c06 ("mm, page_alloc: avoid looking up the first
      zone in a zonelist twice") we have a wrong check for NULL preferred_zone,
      which can theoretically happen due to concurrent cpuset modification.  We
      check the zoneref pointer, which is never NULL, when we should check the
      zone pointer.  Also document this in the first_zones_zonelist() comment,
      per Michal Hocko.
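
      A sketch of the corrected check in the allocator slow path (paraphrased
      from the description above, not a verbatim diff):

          ac.preferred_zoneref = first_zones_zonelist(ac.zonelist,
                                          ac.high_zoneidx, ac.nodemask);
          /* first_zones_zonelist() never returns NULL; an "empty" result is
           * signalled by the zoneref's zone pointer being NULL instead. */
          if (!ac.preferred_zoneref->zone) {      /* was: !ac.preferred_zoneref */
                  page = NULL;
                  goto no_zone;
          }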
      
      Fixes: c33d6c06 ("mm, page_alloc: avoid looking up the first zone in a zonelist twice")
      Link: http://lkml.kernel.org/r/20170120103843.24587-2-vbabka@suse.cz
      
      
      Signed-off-by: Vlastimil Babka <vbabka@suse.cz>
      Acked-by: Mel Gorman <mgorman@techsingularity.net>
      Acked-by: Hillf Danton <hillf.zj@alibaba-inc.com>
      Cc: Ganapatrao Kulkarni <gpkulkarni@gmail.com>
      Cc: Michal Hocko <mhocko@suse.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kernel/watchdog: prevent false hardlockup on overloaded system · b94f5118
      Don Zickus authored
      On an overloaded system, it is possible that a change in the watchdog
      threshold can be delayed long enough to trigger a false positive.
      
      This can easily be achieved by having a cpu spinning indefinitely on a
      task, while another cpu updates the watchdog threshold.
      
      What happens is that, while trying to park the watchdog threads, the hrtimers
      on the other cpus trigger and reprogram themselves with the new slower
      watchdog threshold.  Meanwhile, the nmi watchdog is still programmed
      with the old faster threshold.
      
      Because the one cpu is blocked, it prevents the thread parking on the
      other cpus from completing, which is needed to shut down the nmi watchdog
      and reprogram it correctly.  As a result, a false positive from the nmi
      watchdog is reported.
      
      Fix this by setting a park_in_progress flag to block all lockups until
      the parking is complete.
      
      Fix provided by Ulrich Obergfell.
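
      A rough sketch of the approach (the flag is renamed to
      watchdog_park_in_progress in the note below; details may differ from the
      final code):

          static atomic_t watchdog_park_in_progress = ATOMIC_INIT(0);

          static int watchdog_park_threads(void)
          {
                  int ret = 0;

                  atomic_set(&watchdog_park_in_progress, 1);
                  /* ... park each per-cpu watchdog thread as before ... */
                  atomic_set(&watchdog_park_in_progress, 0);
                  return ret;
          }

          /* NMI watchdog callback */
          static void watchdog_overflow_callback(/* ... */)
          {
                  /* Ignore samples while parking is in progress: the NMI
                   * watchdog may still be programmed with the old, faster
                   * threshold, so any "lockup" seen now is suspect. */
                  if (atomic_read(&watchdog_park_in_progress) != 0)
                          return;
                  /* ... existing hardlockup detection ... */
          }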
      
      [akpm@linux-foundation.org: s/park_in_progress/watchdog_park_in_progress/]
      Link: http://lkml.kernel.org/r/1481041033-192236-1-git-send-email-dzickus@redhat.com
      
      
      Signed-off-by: Don Zickus <dzickus@redhat.com>
      Reviewed-by: Aaron Tomlin <atomlin@redhat.com>
      Cc: Ulrich Obergfell <uobergfe@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • memory_hotplug: make zone_can_shift() return a boolean value · 8a1f780e
      Yasuaki Ishimatsu authored
      online_{kernel|movable} is used to change the memory zone to
      ZONE_{NORMAL|MOVABLE} and online the memory.
      
      To check whether the memory zone can be changed, zone_can_shift() is used.
      Currently the function returns a negative integer, a positive integer,
      or 0. When the function returns a negative or positive integer, it means
      that the memory zone can be changed to ZONE_{NORMAL|MOVABLE}.
      
      But when the function returns 0, there are two meanings.
      
      One of the meanings is that the memory zone does not need to be changed.
      For example, when memory is in ZONE_NORMAL and onlined by online_kernel,
      the memory zone does not need to be changed.
      
      The other meaning is that the memory zone cannot be changed. When memory
      is in ZONE_NORMAL and onlined by online_movable, the memory zone may
      not be changed to ZONE_MOVABLE due to memory online limitations (see
      Documentation/memory-hotplug.txt). In this case, the memory must not be
      onlined.
      
      The patch changes the return type of zone_can_shift() so that the memory
      online operation fails when the memory zone cannot be changed, as follows
      (a sketch of the changed interface follows the examples):
      
      Before applying patch:
         # grep -A 35 "Node 2" /proc/zoneinfo
         Node 2, zone   Normal
         <snip>
            node_scanned  0
                 spanned  8388608
                 present  7864320
                 managed  7864320
         # echo online_movable > memory4097/state
         # grep -A 35 "Node 2" /proc/zoneinfo
         Node 2, zone   Normal
         <snip>
            node_scanned  0
                 spanned  8388608
                 present  8388608
                 managed  8388608
      
         The online_movable operation succeeded, but the memory was onlined as
         ZONE_NORMAL, not ZONE_MOVABLE.
      
      After applying patch:
         # grep -A 35 "Node 2" /proc/zoneinfo
         Node 2, zone   Normal
         <snip>
            node_scanned  0
                 spanned  8388608
                 present  7864320
                 managed  7864320
         # echo online_movable > memory4097/state
         bash: echo: write error: Invalid argument
         # grep -A 35 "Node 2" /proc/zoneinfo
         Node 2, zone   Normal
         <snip>
            node_scanned  0
                 spanned  8388608
                 present  7864320
                 managed  7864320
      
         The online_movable operation failed because the memory zone could not
         be changed from ZONE_NORMAL to ZONE_MOVABLE.
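
      A sketch of the changed interface (the exact parameter list may differ):
      zone_can_shift() now returns a boolean and reports the shift through an
      out-parameter, so online_pages() can fail cleanly when no shift is legal:

          bool zone_can_shift(unsigned long pfn, unsigned long nr_pages,
                              enum zone_type target, int *zone_shift);

          /* caller, roughly: */
          int zone_shift = 0;

          if (online_type == MMOP_ONLINE_KERNEL) {
                  if (!zone_can_shift(pfn, nr_pages, ZONE_NORMAL, &zone_shift))
                          return -EINVAL;
          } else if (online_type == MMOP_ONLINE_MOVABLE) {
                  if (!zone_can_shift(pfn, nr_pages, ZONE_MOVABLE, &zone_shift))
                          return -EINVAL;
          }
          /* zone_shift == 0 means no change is needed; nonzero means shift. */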
      
      Fixes: df429ac0 ("memory-hotplug: more general validation of zone during online")
      Link: http://lkml.kernel.org/r/2f9c3837-33d7-b6e5-59c0-6ca4372b2d84@gmail.com
      
      
      Signed-off-by: Yasuaki Ishimatsu <isimatu.yasuaki@jp.fujitsu.com>
      Reviewed-by: Reza Arbab <arbab@linux.vnet.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. Jan 24, 2017
  5. Jan 20, 2017
  6. Jan 19, 2017
  7. Jan 18, 2017
    • bpf: don't trigger OOM killer under pressure with map alloc · d407bd25
      Daniel Borkmann authored
      
      This patch adds two helpers, bpf_map_area_alloc() and bpf_map_area_free(),
      that are to be used for map allocations. Using kmalloc() for very large
      allocations can cause excessive work within the page allocator, so i) fall
      back earlier to vmalloc() when the attempt is considered costly anyway,
      and even more importantly ii) don't trigger OOM killer with any of the
      allocators.
      
      Since this is based on a user space request, for example, when creating
      maps with element pre-allocation, we really want such requests to fail
      instead of killing other user space processes.
      
      Also, don't spam the kernel log with warnings should any of the allocations
      fail under pressure. Given that, we can make backend selection in
      bpf_map_area_alloc() generic, and convert all maps over to use this API
      for spots with potentially large allocation requests.
      
      Note, replacing the one kmalloc_array() is fine as overflow checks happen
      earlier in htab_map_alloc(), since it must also protect the multiplication
      for vmalloc() should kmalloc_array() fail.
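
      A sketch of the two helpers under the constraints above (the in-tree
      version may differ in the exact GFP flags):

          void *bpf_map_area_alloc(size_t size)
          {
                  /* __GFP_NORETRY keeps the OOM killer out of the picture and
                   * __GFP_NOWARN keeps the log quiet if we fail under pressure. */
                  const gfp_t flags = __GFP_NOWARN | __GFP_NORETRY | __GFP_ZERO;
                  void *area;

                  if (size <= (PAGE_SIZE << PAGE_ALLOC_COSTLY_ORDER)) {
                          area = kmalloc(size, GFP_USER | flags);
                          if (area)
                                  return area;
                  }
                  /* Costly or failed kmalloc: fall back to vmalloc. */
                  return __vmalloc(size, GFP_KERNEL | __GFP_HIGHMEM | flags,
                                   PAGE_KERNEL);
          }

          void bpf_map_area_free(void *area)
          {
                  kvfree(area);
          }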
      
      Signed-off-by: Daniel Borkmann <daniel@iogearbox.net>
      Acked-by: Alexei Starovoitov <ast@kernel.org>
      Signed-off-by: David S. Miller <davem@davemloft.net>
    • lwtunnel: fix autoload of lwt modules · 9ed59592
      David Ahern authored
      
      Trying to add an mpls encap route when the MPLS modules are not loaded
      hangs. For example:
      
          CONFIG_MPLS=y
          CONFIG_NET_MPLS_GSO=m
          CONFIG_MPLS_ROUTING=m
          CONFIG_MPLS_IPTUNNEL=m
      
          $ ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2
      
      The ip command hangs:
          root       880   826  0 21:25 pts/0    00:00:00 ip route add 10.10.10.10/32 encap mpls 100 via inet 10.100.1.2
      
          $ cat /proc/880/stack
          [<ffffffff81065a9b>] call_usermodehelper_exec+0xd6/0x134
          [<ffffffff81065efc>] __request_module+0x27b/0x30a
          [<ffffffff814542f6>] lwtunnel_build_state+0xe4/0x178
          [<ffffffff814aa1e4>] fib_create_info+0x47f/0xdd4
          [<ffffffff814ae451>] fib_table_insert+0x90/0x41f
          [<ffffffff814a8010>] inet_rtm_newroute+0x4b/0x52
          ...
      
      modprobe is trying to load rtnl-lwt-MPLS:
      
          root       881     5  0 21:25 ?        00:00:00 /sbin/modprobe -q -- rtnl-lwt-MPLS
      
      and it hangs after loading mpls_router:
      
          $ cat /proc/881/stack
          [<ffffffff81441537>] rtnl_lock+0x12/0x14
          [<ffffffff8142ca2a>] register_netdevice_notifier+0x16/0x179
          [<ffffffffa0033025>] mpls_init+0x25/0x1000 [mpls_router]
          [<ffffffff81000471>] do_one_initcall+0x8e/0x13f
          [<ffffffff81119961>] do_init_module+0x5a/0x1e5
          [<ffffffff810bd070>] load_module+0x13bd/0x17d6
          ...
      
      The problem is that lwtunnel_build_state is called with the rtnl lock
      held, preventing mpls_init from registering.
      
      Given the references potentially held by the time lwtunnel_build_state is
      called, it cannot drop the rtnl lock to load the module. So, extract the
      module-loading code from lwtunnel_build_state into a new function that
      validates the encap type. The new function is called while converting the
      user request into a fib_config, which is well before any table, device,
      or fib entries are examined.
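
      A sketch of the idea (hedged; encap_type_name() below stands in for
      whatever helper maps the encap type to its "rtnl-lwt-<type>" module
      alias):

          static int lwtunnel_valid_encap_type(u16 encap_type)
          {
                  const struct lwtunnel_encap_ops *ops;

                  if (encap_type == LWTUNNEL_ENCAP_NONE ||
                      encap_type > LWTUNNEL_ENCAP_MAX)
                          return -EINVAL;

                  rcu_read_lock();
                  ops = rcu_dereference(lwtun_encaps[encap_type]);
                  rcu_read_unlock();

                  if (!ops) {
                          /* Safe here: the caller is still building the
                           * fib_config and does not hold the rtnl lock. */
                          request_module("rtnl-lwt-%s", encap_type_name(encap_type));

                          rcu_read_lock();
                          ops = rcu_dereference(lwtun_encaps[encap_type]);
                          rcu_read_unlock();
                  }

                  return ops ? 0 : -EOPNOTSUPP;
          }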
      
      Fixes: 745041e2 ("lwtunnel: autoload of lwt modules")
      Signed-off-by: David Ahern <dsa@cumulusnetworks.com>
      Signed-off-by: David S. Miller <davem@davemloft.net>
  8. Jan 17, 2017
  9. Jan 16, 2017
  10. Jan 15, 2017
    • rcu: Narrow early boot window of illegal synchronous grace periods · 52d7e48b
      Paul E. McKenney authored
      
      The current preemptible RCU implementation goes through three phases
      during bootup.  In the first phase, there is only one CPU that is running
      with preemption disabled, so that a no-op is a synchronous grace period.
      In the second mid-boot phase, the scheduler is running, but RCU has
      not yet gotten its kthreads spawned (and, for expedited grace periods,
      workqueues are not yet running).  During this time, any attempt to do
      a synchronous grace period will hang the system (or complain bitterly,
      depending).  In the third and final phase, RCU is fully operational and
      everything works normally.
      
      This has been OK for some time, but there have recently been some
      synchronous grace periods showing up during the second mid-boot phase.
      This code worked "by accident" for a while, but started failing as soon
      as expedited RCU grace periods switched over to workqueues in commit
      8b355e3b ("rcu: Drive expedited grace periods from workqueue").
      Note that the code was buggy even before this commit, as it was subject
      to failure on real-time systems that forced all expedited grace periods
      to run as normal grace periods (for example, using the rcu_normal ksysfs
      parameter).  The callchain from the failure case is as follows:
      
      early_amd_iommu_init()
      |-> acpi_put_table(ivrs_base);
      |-> acpi_tb_put_table(table_desc);
      |-> acpi_tb_invalidate_table(table_desc);
      |-> acpi_tb_release_table(...)
      |-> acpi_os_unmap_memory
      |-> acpi_os_unmap_iomem
      |-> acpi_os_map_cleanup
      |-> synchronize_rcu_expedited
      
      The kernel showing this callchain was built with CONFIG_PREEMPT_RCU=y,
      which caused the code to try using workqueues before they were
      initialized, which did not go well.
      
      This commit therefore reworks RCU to permit synchronous grace periods
      to proceed during this mid-boot phase.  This commit is therefore a
      fix to a regression introduced in v4.9, and is therefore being put
      forward post-merge-window in v4.10.
      
      This commit sets a flag from the existing rcu_scheduler_starting()
      function which causes all synchronous grace periods to take the expedited
      path.  The expedited path now checks this flag, using the requesting task
      to drive the expedited grace period forward during the mid-boot phase.
      Finally, this flag is updated by a core_initcall() function named
      rcu_exp_runtime_mode(), which causes the runtime codepaths to be used.
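
      A rough sketch of the resulting flow (paraphrased; the in-tree code
      spreads this over several grace-period primitives):

          void synchronize_rcu(void)
          {
                  if (rcu_scheduler_active == RCU_SCHEDULER_INACTIVE)
                          return;                        /* phase 1: single CPU, no-op */
                  if (rcu_scheduler_active == RCU_SCHEDULER_INIT)
                          synchronize_rcu_expedited();   /* phase 2: mid-boot, driven
                                                          * by the requesting task */
                  else
                          wait_rcu_gp(call_rcu);         /* phase 3: normal runtime */
          }

          /* Flip to the runtime code paths once initcalls can run. */
          static int __init rcu_exp_runtime_mode(void)
          {
                  rcu_scheduler_active = RCU_SCHEDULER_RUNNING;
                  return 0;
          }
          core_initcall(rcu_exp_runtime_mode);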
      
      Note that this arrangement assumes that tasks are not sent POSIX signals
      (or anything similar) from the time that the first task is spawned
      through core_initcall() time.
      
      Fixes: 8b355e3b ("rcu: Drive expedited grace periods from workqueue")
      Reported-by: "Zheng, Lv" <lv.zheng@intel.com>
      Reported-by: Borislav Petkov <bp@alien8.de>
      Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com>
      Tested-by: Stan Kain <stan.kain@gmail.com>
      Tested-by: Ivan <waffolz@hotmail.com>
      Tested-by: Emanuel Castelo <emanuel.castelo@gmail.com>
      Tested-by: Bruno Pesavento <bpesavento@infinito.it>
      Tested-by: Borislav Petkov <bp@suse.de>
      Tested-by: Frederic Bezies <fredbezies@gmail.com>
      Cc: <stable@vger.kernel.org> # 4.9.0-
    • coredump: Ensure proper size of sparse core files · 4d22c75d
      Dave Kleikamp authored
      
      If the last section of a core file ends with an unmapped or zero page,
      the size of the file does not correspond with the last dump_skip() call.
      gdb complains that the file is truncated, which can be confusing to users.
      
      After all of the vma sections are written, make sure that the file size
      is no smaller than the current file position.
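
      A sketch of the check (close to, though not necessarily identical to, the
      helper added to the core-dump code):

          /* After the vma sections are written: if the file on disk is shorter
           * than the position we logically reached via dump_skip(), grow it. */
          static int dump_truncate(struct coredump_params *cprm)
          {
                  struct file *file = cprm->file;
                  loff_t offset;

                  if (file->f_op->llseek && file->f_op->llseek != no_llseek) {
                          offset = file->f_op->llseek(file, 0, SEEK_CUR);
                          if (i_size_read(file->f_mapping->host) < offset)
                                  if (do_truncate(file->f_path.dentry,
                                                  offset, 0, file))
                                          return 0;       /* failure */
                  }
                  return 1;                               /* success */
          }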
      
      This problem can be demonstrated with gdb's bigcore testcase on the
      sparc architecture.
      
      Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: linux-fsdevel@vger.kernel.org
      Cc: linux-kernel@vger.kernel.org
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  11. Jan 14, 2017
    • efi/x86: Prune invalid memory map entries and fix boot regression · 0100a3e6
      Peter Jones authored
      
      Some machines, such as the Lenovo ThinkPad W541 with firmware GNET80WW
      (2.28), include memory map entries with phys_addr=0x0 and num_pages=0.
      
      These machines fail to boot after the following commit,
      
        commit 8e80632f ("efi/esrt: Use efi_mem_reserve() and avoid a kmalloc()")
      
      Fix this by removing such bogus entries from the memory map.
      
      Furthermore, currently the log output for this case (with efi=debug)
      looks like:
      
       [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |  |  |  |  |  ] range=[0x0000000000000000-0xffffffffffffffff] (0MB)
      
      This is clearly wrong, and also not as informative as it could be.  This
      patch changes it so that if we find obviously invalid memory map
      entries, we print an error and skip those entries.  It also detects when
      the displayed address-range calculation overflows, so the new output is:
      
       [    0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
       [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |   |  |  |  |  ] range=[0x0000000000000000-0x0000000000000000] (invalid)
      
      It also detects memory map sizes that would overflow the physical
      address, for example phys_addr=0xfffffffffffff000 and
      num_pages=0x0200000000000001, and prints:
      
       [    0.000000] efi: [Firmware Bug]: Invalid EFI memory map entries:
       [    0.000000] efi: mem45: [Reserved           |   |  |  |  |  |  |  |   |  |  |  |  ] range=[phys_addr=0xfffffffffffff000-0x20ffffffffffffffff] (invalid)
      
      It then removes these entries from the memory map.
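
      A sketch of the validity check (field names as in efi_memory_desc_t; the
      in-tree helper may differ in detail):

          static bool efi_memmap_entry_valid(const efi_memory_desc_t *md)
          {
                  u64 size, end;

                  if (md->num_pages == 0)
                          return false;   /* zero-sized entry */

                  if (md->num_pages > (U64_MAX >> EFI_PAGE_SHIFT))
                          return false;   /* num_pages << EFI_PAGE_SHIFT overflows */

                  size = md->num_pages << EFI_PAGE_SHIFT;
                  end  = md->phys_addr + size - 1;
                  if (end < md->phys_addr)
                          return false;   /* phys_addr + size wraps around */

                  return true;
          }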
      
      Signed-off-by: Peter Jones <pjones@redhat.com>
      Signed-off-by: Ard Biesheuvel <ard.biesheuvel@linaro.org>
      [ardb: refactor for clarity with no functional changes, avoid PAGE_SHIFT]
      Signed-off-by: Matt Fleming <matt@codeblueprint.co.uk>
      [Matt: Include bugzilla info in commit log]
      Cc: <stable@vger.kernel.org> # v4.9+
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Link: https://bugzilla.kernel.org/show_bug.cgi?id=191121
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
    • perf/x86/intel: Account interrupts for PEBS errors · 475113d9
      Jiri Olsa authored
      
      It's possible to set up PEBS events to get only errors and not
      any data, like on SNB-X (model 45) and IVB-EP (model 62)
      via 2 perf commands running simultaneously:
      
          taskset -c 1 ./perf record -c 4 -e branches:pp -j any -C 10
      
      This leads to a soft lockup, because the error path of
      intel_pmu_drain_pebs_nhm() does not account event->hw.interrupts
      for error PEBS interrupts, so if you're getting ONLY
      errors you don't have a way to stop the event when it goes over
      the max_samples_per_tick limit:
      
        NMI watchdog: BUG: soft lockup - CPU#22 stuck for 22s! [perf_fuzzer:5816]
        ...
        RIP: 0010:[<ffffffff81159232>]  [<ffffffff81159232>] smp_call_function_single+0xe2/0x140
        ...
        Call Trace:
         ? trace_hardirqs_on_caller+0xf5/0x1b0
         ? perf_cgroup_attach+0x70/0x70
         perf_install_in_context+0x199/0x1b0
         ? ctx_resched+0x90/0x90
         SYSC_perf_event_open+0x641/0xf90
         SyS_perf_event_open+0x9/0x10
         do_syscall_64+0x6c/0x1f0
         entry_SYSCALL64_slow_path+0x25/0x25
      
      Add perf_event_account_interrupt(), which does the interrupt
      and frequency checks, and call it from intel_pmu_drain_pebs_nhm()'s
      error path.
      
      We keep the pending_kill and pending_wakeup logic only in the
      __perf_event_overflow() path, because they make sense only if
      there's any data to deliver.
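
      A sketch of the error-path change in intel_pmu_drain_pebs_nhm()
      (paraphrased from the description above):

          if (error[bit]) {
                  perf_log_lost_samples(event, error[bit]);

                  /* New helper: bumps event->hw.interrupts and applies the
                   * max_samples_per_tick / frequency checks; returns nonzero
                   * when the event must be stopped (throttled). */
                  if (perf_event_account_interrupt(event))
                          x86_pmu_stop(event, 0);
          }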
      
      Signed-off-by: Jiri Olsa <jolsa@kernel.org>
      Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
      Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com>
      Cc: Arnaldo Carvalho de Melo <acme@kernel.org>
      Cc: Arnaldo Carvalho de Melo <acme@redhat.com>
      Cc: Jiri Olsa <jolsa@redhat.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Stephane Eranian <eranian@google.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Vince Weaver <vince@deater.net>
      Cc: Vince Weaver <vincent.weaver@maine.edu>
      Link: http://lkml.kernel.org/r/1482931866-6018-2-git-send-email-jolsa@kernel.org
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  12. Jan 13, 2017