  1. Jun 08, 2020
    • kernel: add panic_on_taint · db38d5c1
      Rafael Aquini authored
      
      Analogously to the introduction of panic_on_warn, this patch introduces
      a kernel option named panic_on_taint in order to provide a simple and
      generic way to stop execution and catch a coredump when the kernel gets
      tainted by any given flag.
      
      This is useful for debugging sessions as it avoids having to rebuild the
      kernel to explicitly add calls to panic() into the code sites that
      introduce the taint flags of interest.
      
      For instance, if one is interested in proceeding with a post-mortem
      analysis at the point a given code path hits a bad page (i.e.
      unaccount_page_cache_page(), or slab_bug()), a coredump can be
      collected by rebooting the kernel with 'panic_on_taint=0x20' added to
      the command line.
      
      Another, perhaps less frequent, use for this option is as a means of
      enforcing a security policy where only a subset of taints, or no taint
      at all (paranoid mode), is allowed for the running system.  The
      optional switch 'nousertaint' is handy in this particular scenario, as
      it avoids userspace-induced crashes: without it, writes to the sysctl
      interface /proc/sys/kernel/tainted could cause false-positive hits for
      such policies.
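
      For illustration, a minimal sketch of how such a check can hook into
      add_taint() (abbreviated shape, not the verbatim upstream diff):

        void add_taint(unsigned flag, enum lockdep_ok lockdep_ok)
        {
                ...
                set_bit(flag, &tainted_mask);

                /* panic_on_taint holds the bitmask parsed from the
                 * command line */
                if (tainted_mask & panic_on_taint) {
                        panic_on_taint = 0;     /* avoid recursing */
                        panic("panic_on_taint set ...");
                }
        }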
      
      [akpm@linux-foundation.org: tweak kernel-parameters.txt wording]
      
      Suggested-by: Qian Cai <cai@lca.pw>
      Signed-off-by: Rafael Aquini <aquini@redhat.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Luis Chamberlain <mcgrof@kernel.org>
      Cc: Dave Young <dyoung@redhat.com>
      Cc: Baoquan He <bhe@redhat.com>
      Cc: Jonathan Corbet <corbet@lwn.net>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Randy Dunlap <rdunlap@infradead.org>
      Cc: "Theodore Ts'o" <tytso@mit.edu>
      Cc: Adrian Bunk <bunk@kernel.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Laura Abbott <labbott@redhat.com>
      Cc: Jeff Mahoney <jeffm@suse.com>
      Cc: Jiri Kosina <jikos@kernel.org>
      Cc: Takashi Iwai <tiwai@suse.de>
      Link: http://lkml.kernel.org/r/20200515175502.146720-1-aquini@redhat.com
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  2. Nov 25, 2019
  3. Oct 07, 2019
    • panic: ensure preemption is disabled during panic() · 20bb759a
      Will Deacon authored
      Calling 'panic()' on a kernel with CONFIG_PREEMPT=y can leave the
      calling CPU in an infinite loop, but with interrupts and preemption
      enabled.  From this state, userspace can continue to be scheduled,
      despite the system being "dead" as far as the kernel is concerned.
      
      This is easily reproducible on arm64 when booting with "nosmp" on the
      command line; a couple of shell scripts print out a periodic "Ping"
      message whilst another triggers a crash by writing to
      /proc/sysrq-trigger:
      
        | sysrq: Trigger a crash
        | Kernel panic - not syncing: sysrq triggered crash
        | CPU: 0 PID: 1 Comm: init Not tainted 5.2.15 #1
        | Hardware name: linux,dummy-virt (DT)
        | Call trace:
        |  dump_backtrace+0x0/0x148
        |  show_stack+0x14/0x20
        |  dump_stack+0xa0/0xc4
        |  panic+0x140/0x32c
        |  sysrq_handle_reboot+0x0/0x20
        |  __handle_sysrq+0x124/0x190
        |  write_sysrq_trigger+0x64/0x88
        |  proc_reg_write+0x60/0xa8
        |  __vfs_write+0x18/0x40
        |  vfs_write+0xa4/0x1b8
        |  ksys_write+0x64/0xf0
        |  __arm64_sys_write+0x14/0x20
        |  el0_svc_common.constprop.0+0xb0/0x168
        |  el0_svc_handler+0x28/0x78
        |  el0_svc+0x8/0xc
        | Kernel Offset: disabled
        | CPU features: 0x0002,24002004
        | Memory Limit: none
        | ---[ end Kernel panic - not syncing: sysrq triggered crash ]---
        |  Ping 2!
        |  Ping 1!
        |  Ping 1!
        |  Ping 2!
      
      The issue can also be triggered on x86 kernels if CONFIG_SMP=n;
      otherwise, local interrupts are disabled in 'smp_send_stop()'.
      
      Disable preemption in 'panic()' before re-enabling interrupts.
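
      The shape of the fix is a one-liner next to the existing interrupt
      disabling in panic() (sketch, not the exact diff):

        void panic(const char *fmt, ...)
        {
                ...
                local_irq_disable();
                /* keep userspace from being scheduled once later code
                 * re-enables interrupts on this CPU */
                preempt_disable_notrace();
                ...
        }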
      
      Link: http://lkml.kernel.org/r/20191002123538.22609-1-will@kernel.org
      Link: https://lore.kernel.org/r/BX1W47JXPMR8.58IYW53H6M5N@dragonstone
      
      
      Signed-off-by: Will Deacon <will@kernel.org>
      Reported-by: Xogium <contact@xogium.me>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Cc: Russell King <linux@armlinux.org.uk>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Ingo Molnar <mingo@redhat.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  4. Sep 26, 2019
    • bug: consolidate __WARN_FLAGS usage · 2da1ead4
      Kees Cook authored
      Instead of having separate tests for __WARN_FLAGS, merge the two #ifdef
      blocks and replace the synonym WANT_WARN_ON_SLOWPATH macro.
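
      Roughly, the header goes from two separate tests of the same symbol to
      a single #ifdef/#else block (illustrative shape only; the elided
      bodies are unchanged):

        /* before: two tests, plus a synonym for "no __WARN_FLAGS" */
        #ifdef __WARN_FLAGS
        ...
        #endif
        #ifndef __WARN_FLAGS
        #define WANT_WARN_ON_SLOWPATH
        #endif

        /* after: one block; C code tests !defined(__WARN_FLAGS) directly */
        #ifdef __WARN_FLAGS
        ...
        #else
        ...
        #endif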
      
      Link: http://lkml.kernel.org/r/20190819234111.9019-7-keescook@chromium.org
      
      
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Drew Davenport <ddavenport@chromium.org>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
      Cc: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • bug: lift "cut here" out of __warn() · d38aba49
      Kees Cook authored
      In preparation for cleaning up "cut here", move the "cut here" logic up
      out of __warn() and into callers that pass non-NULL args.  For anyone
      looking closely, there are two callers that pass NULL args: one already
      explicitly prints "cut here".  The remaining case is covered by how a WARN
      is built, which will be cleaned up in the next patch.
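
      A sketch of the move (abbreviated): the marker is now printed by the
      non-NULL-args caller rather than inside __warn() itself:

        void warn_slowpath_fmt(const char *file, int line, unsigned taint,
                               const char *fmt, ...)
        {
                struct warn_args args;

                ...
                pr_warn(CUT_HERE);      /* lifted out of __warn() */
                __warn(file, line, __builtin_return_address(0), taint,
                       NULL, &args);
                ...
        }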
      
      Link: http://lkml.kernel.org/r/20190819234111.9019-5-keescook@chromium.org
      
      
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Drew Davenport <ddavenport@chromium.org>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
      Cc: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • bug: consolidate warn_slowpath_fmt() usage · f2f84b05
      Kees Cook authored
      Instead of having a separate helper for no printk output, just consolidate
      the logic into warn_slowpath_fmt().
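
      A sketch of the consolidated helper (close to, but not necessarily
      verbatim, the upstream result): a NULL fmt now takes the no-printk
      path that the removed helper used to provide:

        void warn_slowpath_fmt(const char *file, int line, unsigned taint,
                               const char *fmt, ...)
        {
                struct warn_args args;

                if (!fmt) {
                        /* the old no-printk helper's job */
                        __warn(file, line, __builtin_return_address(0),
                               taint, NULL, NULL);
                        return;
                }

                args.fmt = fmt;
                va_start(args.args, fmt);
                __warn(file, line, __builtin_return_address(0), taint,
                       NULL, &args);
                va_end(args.args);
        }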
      
      Link: http://lkml.kernel.org/r/20190819234111.9019-4-keescook@chromium.org
      
      
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Drew Davenport <ddavenport@chromium.org>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
      Cc: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • bug: refactor away warn_slowpath_fmt_taint() · ee871133
      Kees Cook authored
      Patch series "Clean up WARN() "cut here" handling", v2.
      
      Christophe Leroy noticed that the fix for missing "cut here" in the WARN()
      case was adding explicit printk() calls instead of teaching the exception
      handler to add it.  This refactors the bug/warn infrastructure to pass
      this information as a new BUGFLAG.
      
      Longer details repeated from the last patch in the series:
      
      bug: move WARN_ON() "cut here" into exception handler
      
      The original cleanup of "cut here" missed the WARN_ON() case (that does
      not have a printk message), which was fixed recently by adding an explicit
      printk of "cut here".  This had the downside of adding a printk() to every
      WARN_ON() caller, which reduces the utility of using an instruction
      exception to streamline the resulting code.  By making this a new BUGFLAG,
      all of these can be removed and "cut here" can be handled by the exception
      handler.
      
      This was very pronounced on PowerPC, but the effect can be seen on x86 as
      well.  The resulting text size of a defconfig build shows some small
      savings from this patch:
      
            text     data     bss      dec     hex filename
        19691167  5134320 1646664 26472151 193eed7 vmlinux.before
        19676362  5134260 1663048 26473670 193f4c6 vmlinux.after
      
      This change also opens the door for creating something like BUG_MSG(),
      where a custom printk() can be issued before BUG() without confusing
      the "cut here" line.
      
      This patch (of 7):
      
      There's no reason to have specialized helpers for passing the warn taint
      down to __warn().  Consolidate and refactor helper macros, removing
      __WARN_printf() and warn_slowpath_fmt_taint().
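
      The resulting shape, roughly (names per my reading of the series, not
      the exact hunks): one slowpath takes the taint directly, so WARN() and
      WARN_TAINT() no longer need distinct helpers:

        extern __printf(4, 5)
        void warn_slowpath_fmt(const char *file, int line, unsigned taint,
                               const char *fmt, ...);

        #define WARN(condition, format...) ({                   \
                int __ret_warn_on = !!(condition);              \
                if (unlikely(__ret_warn_on))                    \
                        warn_slowpath_fmt(__FILE__, __LINE__,   \
                                          TAINT_WARN, format);  \
                unlikely(__ret_warn_on);                        \
        })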
      
      Link: http://lkml.kernel.org/r/20190819234111.9019-2-keescook@chromium.org
      
      
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Cc: Christophe Leroy <christophe.leroy@c-s.fr>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Drew Davenport <ddavenport@chromium.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: Petr Mladek <pmladek@suse.com>
      Cc: Mauro Carvalho Chehab <mchehab+samsung@kernel.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: YueHaibing <yuehaibing@huawei.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • kgdb: don't use a notifier to enter kgdb at panic; call directly · 7d92bda2
      Douglas Anderson authored
      Right now kgdb/kdb hooks up to debug panics by registering for the panic
      notifier.  This works OK except that it means that kgdb/kdb gets called
      _after_ the CPUs in the system are taken offline.  That means that if
      anything important was happening on those CPUs (like something that might
      have contributed to the panic) you can't debug them.
      
      Specifically I ran into a case where I got a panic because a task was
      "blocked for more than 120 seconds" which was detected on CPU 2.  I nicely
      got shown stack traces in the kernel log for all CPUs including CPU 0,
      which was running 'PID: 111 Comm: kworker/0:1H' and was in the middle of
      __mmc_switch().
      
      I then ended up at the kdb prompt, where I switched over to kgdb to
      try to look at the local variables of the process on CPU 0.  I found
      that I couldn't.
      Digging more, I found that I had no info on any tasks running on CPUs
      other than CPU 2 and that asking kdb for help showed me "Error: no saved
      data for this cpu".  This was because all the CPUs were offline.
      
      Let's move the entry of kdb/kgdb to a direct call from panic() and stop
      using the generic notifier.  Putting a direct call in allows us to order
      things more properly and it also doesn't seem like we're breaking any
      abstractions by calling into the debugger from the panic function.
      
      Daniel said:
      
      : This patch changes the way kdump and kgdb interact with each other.
      : However it would seem rather odd to have both tools simultaneously armed
      : and, even if they were, the user still has the option to use panic_timeout
      : to force a kdump to happen.  Thus I think the change of order is
      : acceptable.
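
      The ordering change, roughly (abbreviated sketch of panic()):

        void panic(const char *fmt, ...)
        {
                ...
                /* give the debugger a chance while the other CPUs are
                 * still online */
                kgdb_panic(buf);

                ...
                smp_send_stop();        /* CPUs go offline only after
                                         * the debugger has run */
                ...
        }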
      
      Link: http://lkml.kernel.org/r/20190703170354.217312-1-dianders@chromium.org
      
      
      Signed-off-by: Douglas Anderson <dianders@chromium.org>
      Reviewed-by: Daniel Thompson <daniel.thompson@linaro.org>
      Cc: Jason Wessel <jason.wessel@windriver.com>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: Borislav Petkov <bp@suse.de>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: Feng Tang <feng.tang@intel.com>
      Cc: YueHaibing <yuehaibing@huawei.com>
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: "Steven Rostedt (VMware)" <rostedt@goodmis.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  5. Jul 15, 2019
  6. May 21, 2019
  7. May 18, 2019
  8. May 15, 2019
  9. May 02, 2019
    • s390: simplify disabled_wait · 98587c2d
      Martin Schwidefsky authored
      
      The disabled_wait() function uses its argument as the PSW address when
      it stops the CPU with a wait PSW that is disabled for interrupts.
      Different callers sometimes use a specific number like 0xdeadbeef to
      indicate a specific failure; the early boot code uses 0, and some
      other call sites use __builtin_return_address(0).
      
      At the time a dump is created, the current PSW and the registers of a
      CPU are written to lowcore to make them available to the dump analysis
      tool.  For a CPU stopped with disabled_wait, the PSW and the registers
      do not really make sense together: the PSW address does not point to
      the function the registers belong to.
      
      Simplify disabled_wait() by using _THIS_IP_ for the PSW address and
      drop the argument to the function.
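
      A sketch of the resulting shape (the PSW mask bits as I understand the
      s390 code; not necessarily the verbatim result):

        static inline void disabled_wait(void)
        {
                psw_t psw;

                psw.mask = PSW_MASK_BASE | PSW_MASK_WAIT |
                           PSW_MASK_BA | PSW_MASK_EA;
                /* the PSW address now points at this very code, so it
                 * is consistent with the registers saved for the dump */
                psw.addr = _THIS_IP_;
                __load_psw(psw);
                while (1);
        }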
      
      Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com>
  10. Mar 08, 2019
  11. Jan 04, 2019
  12. Nov 22, 2018
    • panic: avoid deadlocks in re-entrant console drivers · c7c3f05e
      Sergey Senozhatsky authored
      From the printk()/serial console point of view, panic() is special
      because it may force a CPU to re-enter printk() and/or the serial
      console driver.  Therefore, some serial console drivers are
      re-entrant.  E.g. 8250:
      
      serial8250_console_write()
      {
      	if (port->sysrq)
      		locked = 0;
      	else if (oops_in_progress)
      		locked = spin_trylock_irqsave(&port->lock, flags);
      	else
      		spin_lock_irqsave(&port->lock, flags);
      	...
      }
      
      panic() does set oops_in_progress via bust_spinlocks(1), so in theory
      we should be able to re-enter the serial console driver from panic():
      
      	CPU0
      	<NMI>
      	uart_console_write()
      	serial8250_console_write()		// if (oops_in_progress)
      						//    spin_trylock_irqsave()
      	call_console_drivers()
      	console_unlock()
      	console_flush_on_panic()
      	bust_spinlocks(1)			// oops_in_progress++
      	panic()
      	<NMI/>
      	spin_lock_irqsave(&port->lock, flags)   // spin_lock_irqsave()
      	serial8250_console_write()
      	call_console_drivers()
      	console_unlock()
      	printk()
      	...
      
      However, this does not happen, and we deadlock in the serial console
      on the port->lock spinlock.  The problem is that
      console_flush_on_panic() is called after bust_spinlocks(0):
      
      void panic(const char *fmt, ...)
      {
      	bust_spinlocks(1);
      	...
      	bust_spinlocks(0);
      	console_flush_on_panic();
      	...
      }
      
      bust_spinlocks(0) decrements oops_in_progress, so oops_in_progress
      can go back to zero.  Thus even re-entrant console drivers will simply
      spin on the port->lock spinlock.  Given that port->lock may already be
      held either by a stopped CPU, or by the very same CPU we execute
      panic() on (for instance, an NMI panic() on the printing CPU), the
      system deadlocks and does not reboot.
      
      Fix this by removing bust_spinlocks(0), so oops_in_progress is now
      always set in panic() and, thus, re-entrant console drivers will
      trylock port->lock instead of spinning on it forever when we call
      them from console_flush_on_panic().
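
      After the fix, panic() keeps oops_in_progress elevated across the
      final flush (sketch; debug_locks_off() preserves the lock-debugging
      side effect that bust_spinlocks(0) used to provide):

        void panic(const char *fmt, ...)
        {
                bust_spinlocks(1);      /* oops_in_progress++ */
                ...
                /* no bust_spinlocks(0) here anymore */
                debug_locks_off();
                console_flush_on_panic();
                ...
        }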
      
      Link: http://lkml.kernel.org/r/20181025101036.6823-1-sergey.senozhatsky@gmail.com
      
      
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Daniel Wang <wonderfly@google.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
      Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk>
      Cc: Jiri Slaby <jslaby@suse.com>
      Cc: Peter Feiner <pfeiner@google.com>
      Cc: linux-serial@vger.kernel.org
      Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
      Signed-off-by: Petr Mladek <pmladek@suse.com>
  13. Oct 31, 2018
  14. Jun 14, 2018
    • Kbuild: rename CC_STACKPROTECTOR[_STRONG] config variables · 050e9baa
      Linus Torvalds authored
      
      The changes to automatically test for working stack protector compiler
      support in the Kconfig files removed the special STACKPROTECTOR_AUTO
      option that picked the strongest stack protector that the compiler
      supported.
      
      That was all a nice cleanup - it makes no sense to have the AUTO case
      now that the Kconfig phase can just determine the compiler support
      directly.
      
      HOWEVER.
      
      It also meant that doing "make oldconfig" would now _disable_ the strong
      stackprotector if you had AUTO enabled, because in a legacy config file,
      the sane stack protector configuration would look like
      
        CONFIG_HAVE_CC_STACKPROTECTOR=y
        # CONFIG_CC_STACKPROTECTOR_NONE is not set
        # CONFIG_CC_STACKPROTECTOR_REGULAR is not set
        # CONFIG_CC_STACKPROTECTOR_STRONG is not set
        CONFIG_CC_STACKPROTECTOR_AUTO=y
      
      and when you ran this through "make oldconfig" with the Kbuild changes,
      it would ask you about the regular CONFIG_CC_STACKPROTECTOR (that had
      been renamed from CONFIG_CC_STACKPROTECTOR_REGULAR to just
      CONFIG_CC_STACKPROTECTOR), but it would think that the STRONG version
      used to be disabled (because it was really enabled by AUTO), and would
      disable it in the new config, resulting in:
      
        CONFIG_HAVE_CC_STACKPROTECTOR=y
        CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
        CONFIG_CC_STACKPROTECTOR=y
        # CONFIG_CC_STACKPROTECTOR_STRONG is not set
        CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
      
      That's dangerously subtle - people could suddenly find themselves with
      the weaker stack protector setup without even realizing.
      
      The solution here is to rename not just the old REGULAR stack
      protector option, but also the strong one.  This does that by just
      removing the CC_ prefix entirely for the user choices, because it
      really is not about the compiler support (the compiler support now
      instead automatically impacts _visibility_ of the options to users).
      
      This results in "make oldconfig" actually asking the user for their
      choice, so that we don't have any silent subtle security model changes.
      The end result would generally look like this:
      
        CONFIG_HAVE_CC_STACKPROTECTOR=y
        CONFIG_CC_HAS_STACKPROTECTOR_NONE=y
        CONFIG_STACKPROTECTOR=y
        CONFIG_STACKPROTECTOR_STRONG=y
        CONFIG_CC_HAS_SANE_STACKPROTECTOR=y
      
      where the "CC_" versions really are about internal compiler
      infrastructure, not the user selections.
      
      Acked-by: Masahiro Yamada <yamada.masahiro@socionext.com>
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. Apr 11, 2018
  16. Apr 06, 2018
  17. Mar 10, 2018
  18. Mar 08, 2018
  19. Nov 18, 2017
  20. Aug 17, 2017
    • locking/refcounts, x86/asm: Implement fast refcount overflow protection · 7a46ec0e
      Kees Cook authored
      This implements refcount_t overflow protection on x86 without a noticeable
      performance impact, though without the fuller checking of REFCOUNT_FULL.
      
      This is done by duplicating the existing atomic_t refcount implementation
      but normally with a single instruction added to detect if the
      refcount has gone negative (e.g. wrapped past INT_MAX or below zero).
      When this is detected, the handler saturates the refcount_t to
      INT_MIN / 2. With this overflow
      protection, the erroneous reference release that would follow a wrap back
      to zero is blocked from happening, avoiding the class of refcount-overflow
      use-after-free vulnerabilities entirely.
      
      Only the overflow case of refcounting can be perfectly protected, since
      it can be detected and stopped before the reference is freed and left to
      be abused by an attacker. There isn't a way to block early decrements,
      and while REFCOUNT_FULL stops increment-from-zero cases (which would
      be the state _after_ an early decrement and stops potential double-free
      conditions), this fast implementation does not, since it would require
      the more expensive cmpxchg loops. Since the overflow case is much more
      common (e.g. missing a "put" during an error path), this check still
      provides real-world protection. For example, the two public refcount
      overflow use-after-free exploits published in 2016 would have been
      rendered unexploitable:
      
        http://perception-point.io/2016/01/14/analysis-and-exploitation-of-a-linux-kernel-vulnerability-cve-2016-0728/
      
        http://cyseclabs.com/page?n=02012016
      
      
      
      This implementation does, however, notice an unchecked decrement to zero
      (i.e. caller used refcount_dec() instead of refcount_dec_and_test() and it
      resulted in a zero). Decrements under zero are noticed (since they will
      have resulted in a negative value), though this only indicates that a
      use-after-free may have already happened. Such notifications are likely
      avoidable by an attacker that has already exploited a use-after-free
      vulnerability, but it's better to have them reported than allow such
      conditions to remain universally silent.
      
      On first overflow detection, the refcount value is reset to INT_MIN / 2
      (which serves as a saturation value) and a report and stack trace are
      produced. When operations detect only negative value results (such as
      changing an already saturated value), saturation still happens but no
      notification is performed (since the value was already saturated).
      
      On the matter of races, since the entire range beyond INT_MAX but before
      0 is negative, every operation at INT_MIN / 2 will trap, leaving no
      overflow-only race condition.
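
      A rough C model of those semantics (illustrative only; the real fast
      path is the inline assembly shown below):

        /* conceptually what "lock incl" + "js <error path>" implements */
        static inline void refcount_inc_model(refcount_t *r)
        {
                if (atomic_inc_return(&r->refs) < 0) {
                        /* saturate: INT_MIN / 2 keeps every later
                         * result negative, so each subsequent op traps
                         * here again */
                        atomic_set(&r->refs, INT_MIN / 2);
                        WARN_ONCE(1, "refcount overflow detected\n");
                }
        }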
      
      As for performance, this implementation adds a single "js" instruction
      to the regular execution flow of a copy of the standard atomic_t refcount
      operations. (The non-"and_test" refcount_dec() function, which is uncommon
      in regular refcount design patterns, has an additional "jz" instruction
      to detect reaching exactly zero.) Since this is a forward jump, it is by
      default the non-predicted path, which will be reinforced by dynamic branch
      prediction. The result is this protection having virtually no measurable
      change in performance over standard atomic_t operations. The error path,
      located in .text.unlikely, saves the refcount location and then uses UD0
      to fire a refcount exception handler, which resets the refcount, handles
      reporting, and returns to regular execution. This keeps the changes to
      .text size minimal, avoiding return jumps and open-coded calls to the
      error reporting routine.
      
      Example assembly comparison:
      
      refcount_inc() before:
      
        .text:
        ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)
      
      refcount_inc() after:
      
        .text:
        ffffffff81546149:       f0 ff 45 f4             lock incl -0xc(%rbp)
        ffffffff8154614d:       0f 88 80 d5 17 00       js     ffffffff816c36d3
        ...
        .text.unlikely:
        ffffffff816c36d3:       48 8d 4d f4             lea    -0xc(%rbp),%rcx
        ffffffff816c36d7:       0f ff                   (bad)
      
      These are the cycle counts comparing a loop of refcount_inc() from 1
      to INT_MAX and back down to 0 (via refcount_dec_and_test()), between
      unprotected refcount_t (atomic_t), fully protected REFCOUNT_FULL
      (refcount_t-full), and this overflow-protected refcount (refcount_t-fast):
      
        2147483646 refcount_inc()s and 2147483647 refcount_dec_and_test()s:
      		    cycles		protections
        atomic_t           82249267387	none
        refcount_t-fast    82211446892	overflow, untested dec-to-zero
        refcount_t-full   144814735193	overflow, untested dec-to-zero, inc-from-zero
      
      This code is a modified version of the x86 PAX_REFCOUNT atomic_t
      overflow defense from the last public patch of PaX/grsecurity, based
      on my understanding of the code. Changes or omissions from the original
      code are mine and don't reflect the original grsecurity/PaX code. Thanks
      to PaX Team for various suggestions for improvement for repurposing this
      code to be a refcount-only protection.
      
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: Josh Poimboeuf <jpoimboe@redhat.com>
      Cc: Alexey Dobriyan <adobriyan@gmail.com>
      Cc: Andrew Morton <akpm@linux-foundation.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christoph Hellwig <hch@infradead.org>
      Cc: David S. Miller <davem@davemloft.net>
      Cc: Davidlohr Bueso <dave@stgolabs.net>
      Cc: Elena Reshetova <elena.reshetova@intel.com>
      Cc: Eric Biggers <ebiggers3@gmail.com>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Greg KH <gregkh@linuxfoundation.org>
      Cc: Hans Liljestrand <ishkamiel@gmail.com>
      Cc: James Bottomley <James.Bottomley@hansenpartnership.com>
      Cc: Jann Horn <jannh@google.com>
      Cc: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Manfred Spraul <manfred@colorfullife.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Rik van Riel <riel@redhat.com>
      Cc: Serge E. Hallyn <serge@hallyn.com>
      Cc: Thomas Gleixner <tglx@linutronix.de>
      Cc: arozansk@redhat.com
      Cc: axboe@kernel.dk
      Cc: kernel-hardening@lists.openwall.com
      Cc: linux-arch <linux-arch@vger.kernel.org>
      Link: http://lkml.kernel.org/r/20170815161924.GA133115@beast
      
      
      Signed-off-by: Ingo Molnar <mingo@kernel.org>
  21. Mar 02, 2017
  22. Feb 23, 2017
  23. Feb 08, 2017
  24. Jan 25, 2017
  25. Jan 17, 2017
  26. Nov 26, 2016
    • taint/module: Clean up global and module taint flags handling · 7fd8329b
      Petr Mladek authored
      The commit 66cc69e3 ("Fix: module signature vs tracepoints:
      add new TAINT_UNSIGNED_MODULE") updated module_taint_flags() to
      potentially print one more character. But it did not increase the
      size of the corresponding buffers in m_show() and print_modules().
      
      We recently made the same mistake when adding a taint flag
      for livepatching, see
      https://lkml.kernel.org/r/cfba2c823bb984690b73572aaae1db596b54a082.1472137475.git.jpoimboe@redhat.com
      
      
      
      Also, struct module uses an incompatible type for the mod->taints
      flags.  It survived from commit 2bc2d61a ("[PATCH] list module
      taint flags in Oops/panic"), when "int" was still used for the global
      taint flags.  But only the global taint flags were later changed
      to "unsigned long" by commit 25ddbb18 ("Make the taint
      flags reliable").
      
      This patch defines TAINT_FLAGS_COUNT that can be used to create
      arrays and buffers of the right size. Note that we could not use
      enum because the taint flag indexes are used also in assembly code.
      
      Then it reworks the table that describes the taint flags so that the
      TAINT_* numbers can be used as its index.  In addition, each entry now
      records whether the taint flag is also shown per-module.
      
      Finally, it uses "unsigned long", bit operations, and the updated
      taint_flags table also for mod->taints.
      
      It is not optimal because only a few taint flags can be printed by
      module_taint_flags().  But better to be on the safe side.  IMHO, it is
      not worth the optimization and this is a good compromise.
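
      The reworked table then looks roughly like this (field names per my
      reading of the patch):

        struct taint_flag {
                char c_true;    /* character printed when set */
                char c_false;   /* character printed when clear */
                bool module;    /* also shown per-module? */
        };

        const struct taint_flag taint_flags[TAINT_FLAGS_COUNT] = {
                { 'P', 'G', true },     /* TAINT_PROPRIETARY_MODULE */
                { 'F', ' ', true },     /* TAINT_FORCED_MODULE */
                ...
        };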
      
      Signed-off-by: Petr Mladek <pmladek@suse.com>
      Link: http://lkml.kernel.org/r/1474458442-21581-1-git-send-email-pmladek@suse.com
      
      
      [jeyu@redhat.com: fix broken lkml link in changelog]
      Signed-off-by: Jessica Yu <jeyu@redhat.com>