  1. Sep 17, 2024
  2. Oct 12, 2023
    • fs: binfmt_elf_efpic: fix personality for ELF-FDPIC · 31f2a4ac
      Greg Ungerer authored and Frieder Schrempf committed
      commit 7c315158 upstream.
      
      The elf-fdpic loader hard sets the process personality to either
      PER_LINUX_FDPIC for true elf-fdpic binaries or to PER_LINUX for normal ELF
      binaries (in this case they would be constant displacement compiled with
      -pie for example).  The problem with that is that it will lose any other
      bits that may be in the ELF header personality (such as the "bug
      emulation" bits).
      
      On the ARM architecture the ADDR_LIMIT_32BIT flag is used to signify a
      normal 32bit binary - as opposed to a legacy 26bit address binary.  This
      matters since start_thread() will set the ARM CPSR register as required
      based on this flag.  If the elf-fdpic loader loses this bit the process
      will be mis-configured and crash out pretty quickly.
      
      Modify elf-fdpic loader personality setting so that it preserves the upper
      three bytes by using the SET_PERSONALITY macro to set it.  This macro in
      the generic case sets PER_LINUX and preserves the upper bytes.
      Architectures can override this for their specific use case, and ARM does
      exactly this.
      
      The problem shows up quite easily running under qemu using the ARM
      architecture, but not necessarily on all types of real ARM hardware.  If
      the underlying ARM processor does not support the legacy 26-bit addressing
      mode then everything will work as expected.
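The difference between hard-setting the personality and preserving its upper three bytes can be sketched in plain user-space C. This is only an illustration: the `SK_`-prefixed constants and both helper functions are hypothetical names, and the constant values are assumed to mirror the kernel's uapi personality header rather than being taken from this commit.

```c
#include <assert.h>

/*
 * Local stand-ins (assumed values, hypothetical names) for the kernel's
 * personality constants, so that the sketch compiles anywhere.
 */
#define SK_PER_MASK         0x00ffu     /* low byte: execution domain */
#define SK_PER_LINUX        0x0000u
#define SK_FDPIC_FUNCPTRS   0x0080000u
#define SK_ADDR_LIMIT_32BIT 0x0800000u
#define SK_PER_LINUX_FDPIC  (SK_PER_LINUX | SK_FDPIC_FUNCPTRS)

/* Old behaviour: overwrite the whole value, dropping every flag bit. */
unsigned int sk_hard_set(unsigned int current, int is_fdpic)
{
	(void)current;	/* deliberately ignored: this is the bug */
	return is_fdpic ? SK_PER_LINUX_FDPIC : SK_PER_LINUX;
}

/* Generic-style behaviour: replace the low byte, keep the upper three. */
unsigned int sk_set_preserving(unsigned int current)
{
	unsigned int result = SK_PER_LINUX | (current & ~SK_PER_MASK);

	assert((result & SK_PER_MASK) == SK_PER_LINUX);
	return result;
}
```

With a starting value that has `SK_ADDR_LIMIT_32BIT` set, `sk_hard_set()` returns a value without that bit, while `sk_set_preserving()` keeps it.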
      
      Link: https://lkml.kernel.org/r/20230907011808.2985083-1-gerg@kernel.org
      
      
      Fixes: 1bde925d ("fs/binfmt_elf_fdpic.c: provide NOMMU loader for regular ELF binaries")
      Signed-off-by: Greg Ungerer <gerg@kernel.org>
      Cc: Al Viro <viro@zeniv.linux.org.uk>
      Cc: Christian Brauner <brauner@kernel.org>
      Cc: Eric W. Biederman <ebiederm@xmission.com>
      Cc: Greg Ungerer <gerg@kernel.org>
      Cc: Kees Cook <keescook@chromium.org>
      Cc: <stable@vger.kernel.org>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
  3. Jan 18, 2023
  4. Jan 04, 2023
  5. Mar 08, 2022
  6. Mar 02, 2022
  7. Oct 08, 2021
    • coredump: Limit coredumps to a single thread group · 0258b5fd
      Eric W. Biederman authored
      
      Today when a signal is delivered with a handler of SIG_DFL whose
      default behavior is to generate a core dump, not only that process but
      every process that shares the mm is killed.
      
      In the case of vfork this looks like a real world problem.  Consider
      the following well defined sequence.
      
      	if (vfork() == 0) {
      		execve(...);
      		_exit(EXIT_FAILURE);
      	}
      
      If a signal that generates a core dump is received after vfork but
      before the execve changes the mm the process that called vfork will
      also be killed (as the mm is shared).
      
      Similarly if the execve fails after the point of no return the kernel
      delivers SIGSEGV which will kill both the exec'ing process and because
      the mm is shared the process that called vfork as well.
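The vfork sequence above can be fleshed out into a small user-space helper. This is a minimal sketch, not part of the patch: `sk_spawn_and_wait()` is a hypothetical name, and the window the commit message describes is the stretch between `vfork()` returning in the child and `execve()` replacing the mm.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/*
 * Run `path` via vfork() + execve() and return the child's exit status,
 * or -1 if the child could not be created or did not exit normally.
 */
int sk_spawn_and_wait(const char *path)
{
	char *const argv[] = { (char *)path, NULL };
	char *const envp[] = { NULL };
	int status;
	pid_t pid;

	assert(path != NULL);

	pid = vfork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		/* Child: shares the parent's mm until execve() or _exit(). */
		execve(path, argv, envp);
		_exit(EXIT_FAILURE);	/* reached only if execve() failed */
	}

	/* Parent: resumes once the child has exec'ed or exited. */
	if (waitpid(pid, &status, 0) < 0)
		return -1;
	return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Calling it with a path that does not exist makes `execve()` fail, so the child exits with `EXIT_FAILURE` and the parent sees that status.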
      
      As far as I can tell this behavior is a violation of people's
      reasonable expectations, POSIX, and is unnecessarily fragile when the
      system is low on memory.
      
      Solve this by making a userspace visible change to only kill a single
      process/thread group.  This is possible because Jann Horn recently
      modified[1] the coredump code so that the mm can safely be modified
      while the coredump is happening.  With LinuxThreads long gone I don't
      expect anyone to notice this behavior change in practice.
      
      To accomplish this move the core_state pointer from mm_struct to
      signal_struct, which allows different thread groups to coredump
      simultaneously.
      
      In zap_threads remove the work to kill anything except for the current
      thread group.
      
      v2: Remove core_state from the VM_BUG_ON_MM print to fix
          compile failure when CONFIG_DEBUG_VM is enabled.
      Reported-by: Stephen Rothwell <sfr@canb.auug.org.au>
      
      [1] a07279c9 ("binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot")
      Fixes: d89f3847def4 ("[PATCH] thread-aware coredumps, 2.5.43-C3")
      History-tree: git://git.kernel.org/pub/scm/linux/kernel/git/tglx/history.git
      Link: https://lkml.kernel.org/r/87y27mvnke.fsf@disp2133
      Link: https://lkml.kernel.org/r/20211007144701.67592574@canb.auug.org.au
      
      
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  8. Sep 03, 2021
    • binfmt: remove in-tree usage of MAP_DENYWRITE · 4589ff7c
      David Hildenbrand authored
      At exec time when we mmap the new executable via MAP_DENYWRITE we have it
      opened via do_open_execat() and already deny_write_access()'ed the file
      successfully. Once exec completes, we allow_write_access(); however,
      we set mm->exe_file in begin_new_exec() via set_mm_exe_file() and
      also deny_write_access() as long as mm->exe_file remains set. We'll
      effectively deny write access to our executable via mm->exe_file
      until mm->exe_file is changed -- when the process is removed, on new
      exec, or via sys_prctl(PR_SET_MM_MAP/EXE_FILE).
      
      Let's remove all usage of MAP_DENYWRITE, it's no longer necessary for
      mm->exe_file.
      
      In case of an elf interpreter, we'll now only deny write access to the file
      during exec. This is somewhat okay, because the interpreter behaves like
      (and sometimes is) a shared library; all shared libraries, especially the
      ones loaded directly in user space like via dlopen() won't ever be mapped
      via MAP_DENYWRITE, because we ignore that from user space completely;
      these shared libraries can always be modified while mapped and executed.
      Let's only special-case the main executable, denying write access while
      being executed by a process. This can be considered a minor user space
      visible change.
      
      While this is a cleanup, it also fixes part of a problem reported with
      VM_DENYWRITE on overlayfs, as VM_DENYWRITE is effectively unused with
      this patch and will be removed next:
        "Overlayfs did not honor positive i_writecount on realfile for
         VM_DENYWRITE mappings." [1]
      
      [1] https://lore.kernel.org/r/YNHXzBgzRrZu1MrD@miu.piliscsaba.redhat.com/
      
      
      
      Reported-by: Chengguang Xu <cgxu519@mykernel.net>
      Acked-by: "Eric W. Biederman" <ebiederm@xmission.com>
      Acked-by: Christian König <christian.koenig@amd.com>
      Signed-off-by: David Hildenbrand <david@redhat.com>
  9. Jun 29, 2021
  10. Jun 18, 2021
  11. Mar 08, 2021
    • coredump: don't bother with do_truncate() · d0f1088b
      Al Viro authored
      
      have dump_skip() just remember how much needs to be skipped,
      leave actual seeks/writing zeroes to the next dump_emit()
      or the end of coredump output, whichever comes first.
      And instead of playing with do_truncate() in the end, just
      write one NUL at the end of the last gap (if any).
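The deferred-skip idea described above can be sketched in user space with a file descriptor. This is an illustration under stated assumptions, not the kernel code: the `sk_` names are hypothetical, and `lseek()`/`write()` stand in for whatever the coredump code uses internally.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

struct sk_dumper {
	int fd;
	off_t to_skip;	/* gap bytes remembered but not yet applied */
};

/* Only remember the gap; no seek and no zero-writing happens here. */
void sk_dump_skip(struct sk_dumper *d, off_t nr)
{
	assert(nr >= 0);
	d->to_skip += nr;
}

/* Apply any pending gap with one seek, then write the data. */
int sk_dump_emit(struct sk_dumper *d, const void *buf, size_t len)
{
	if (d->to_skip) {
		if (lseek(d->fd, d->to_skip, SEEK_CUR) < 0)
			return 0;
		d->to_skip = 0;
	}
	return write(d->fd, buf, len) == (ssize_t)len;
}

/* End of output: one NUL at the end of the last gap fixes the size. */
int sk_dump_finish(struct sk_dumper *d)
{
	char nul = 0;

	if (!d->to_skip)
		return 1;
	if (lseek(d->fd, d->to_skip - 1, SEEK_CUR) < 0)
		return 0;
	d->to_skip = 0;
	return write(d->fd, &nul, 1) == 1;
}

/* Usage: 2 data bytes, 10-byte gap, 2 data bytes, trailing 5-byte gap. */
off_t sk_demo_size(void)
{
	char path[] = "/tmp/sk_dump_XXXXXX";
	struct sk_dumper d = { .fd = mkstemp(path), .to_skip = 0 };
	off_t size = -1;

	if (d.fd < 0)
		return -1;
	unlink(path);
	if (sk_dump_emit(&d, "ab", 2)) {
		sk_dump_skip(&d, 10);
		if (sk_dump_emit(&d, "cd", 2)) {
			sk_dump_skip(&d, 5);
			if (sk_dump_finish(&d))
				size = lseek(d.fd, 0, SEEK_END);
		}
	}
	close(d.fd);
	return size;
}
```

The demo ends with a file of 2 + 10 + 2 + 5 bytes without ever calling a truncate function; the gaps are left as holes where the filesystem supports them.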
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  12. Feb 15, 2021
  13. Jan 06, 2021
    • elf_prstatus: collect the common part (everything before pr_reg) into a struct · f2485a2d
      Al Viro authored
      
      Preparation for doing i386 compat elf_prstatus sanely - rather than duplicating
      the beginning of compat_elf_prstatus, take these fields into a separate
      structure (compat_elf_prstatus_common), so that it could be reused.  Due to
      the incestuous relationship between binfmt_elf.c and compat_binfmt_elf.c we
      need the same shape change done to native struct elf_prstatus, gathering the
      fields prior to pr_reg into a new structure (struct elf_prstatus_common).
      
      Fortunately, offset of pr_reg is always a multiple of 16 with no padding
      right before it, so it's possible to turn all the stuff prior to it into
      a single member without disturbing the layout.
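The layout argument can be checked on a toy structure. The structures below are hypothetical, heavily shortened stand-ins (the real elf_prstatus has many more fields); the sketch only shows that gathering the leading fields into one nested member leaves the offset of `pr_reg` unchanged when no padding sits right before it.

```c
#include <assert.h>
#include <stddef.h>

/* Flat layout: leading fields followed directly by the registers. */
struct sk_prstatus_flat {
	int signo;
	short cursig;
	unsigned long sigpend;
	int pid;
	int ppid;
	unsigned long pr_reg[4];
};

/* The same leading fields gathered into their own structure. */
struct sk_prstatus_common {
	int signo;
	short cursig;
	unsigned long sigpend;
	int pid;
	int ppid;
};

/* New layout: one "common" member, then the registers. */
struct sk_prstatus {
	struct sk_prstatus_common common;
	unsigned long pr_reg[4];
};

/* Returns 1 when both layouts place pr_reg at the same offset. */
int sk_layout_unchanged(void)
{
	assert(sizeof(struct sk_prstatus_flat) == sizeof(struct sk_prstatus));
	return offsetof(struct sk_prstatus_flat, pr_reg) ==
	       offsetof(struct sk_prstatus, pr_reg);
}
```

Anything that reads the structure by offset, such as a debugger parsing a core file, would therefore see no difference between the two shapes.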
      
      [build fix from Geert Uytterhoeven folded in]
      
      Signed-off-by: Al Viro <viro@zeniv.linux.org.uk>
  14. Oct 16, 2020
    • binfmt_elf, binfmt_elf_fdpic: use a VMA list snapshot · a07279c9
      Jann Horn authored
      
      In both binfmt_elf and binfmt_elf_fdpic, use a new helper
      dump_vma_snapshot() to take a snapshot of the VMA list (including the gate
      VMA, if we have one) while protected by the mmap_lock, and then use that
      snapshot instead of walking the VMA list without locking.
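The snapshot pattern can be sketched in user space as follows. Everything here is hypothetical: the `sk_` types are simplified stand-ins for VMAs, and `sk_mm_lock()`/`sk_mm_unlock()` are placeholders that only mark where the mmap_lock would be held; this is not the dump_vma_snapshot() implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Simplified stand-in for a VMA in a linked list. */
struct sk_vma {
	unsigned long start, end, flags;
	struct sk_vma *next;
};

/* Metadata copied out of the list; safe to use without the lock. */
struct sk_vma_meta {
	unsigned long start, end, flags;
};

/* Placeholders marking the locked region (no real locking here). */
int sk_lock_depth;
void sk_mm_lock(void)   { sk_lock_depth++; }
void sk_mm_unlock(void) { sk_lock_depth--; }

/* Copy the metadata of every VMA while "locked"; caller frees the array. */
struct sk_vma_meta *sk_snapshot(const struct sk_vma *head, size_t *count)
{
	const struct sk_vma *v;
	struct sk_vma_meta *snap;
	size_t n = 0, i = 0;

	sk_mm_lock();
	for (v = head; v; v = v->next)
		n++;
	snap = calloc(n ? n : 1, sizeof(*snap));
	if (snap) {
		for (v = head; v; v = v->next, i++) {
			snap[i].start = v->start;
			snap[i].end = v->end;
			snap[i].flags = v->flags;
		}
	}
	sk_mm_unlock();
	assert(sk_lock_depth == 0);

	*count = snap ? n : 0;
	return snap;
}
```

After `sk_snapshot()` returns, later changes to the list no longer affect the copy, so the slow part of the dump can sleep without holding the lock.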
      
      An alternative approach would be to keep the mmap_lock held across the
      entire core dumping operation; however, keeping the mmap_lock locked while
      we may be blocked for an unbounded amount of time (e.g.  because we're
      dumping to a FUSE filesystem or so) isn't really optimal; the mmap_lock
      blocks things like the ->release handler of userfaultfd, and we don't
      really want critical system daemons to grind to a halt just because
      someone "gifted" them SCM_RIGHTS to an eternally-locked userfaultfd, or
      something like that.
      
      Since both the normal ELF code and the FDPIC ELF code need this
      functionality (and if any other binfmt wants to add coredump support in
      the future, they'd probably need it, too), implement this with a common
      helper in fs/coredump.c.
      
      A downside of this approach is that we now need a bigger amount of kernel
      memory per userspace VMA in the normal ELF case, and that we need O(n)
      kernel memory in the FDPIC ELF case at all; but 40 bytes per VMA shouldn't
      be terribly bad.
      
      There currently is a data race between stack expansion and anything that
      reads ->vm_start or ->vm_end under the mmap_lock held in read mode; to
      mitigate that for core dumping, take the mmap_lock in write mode when
      taking a snapshot of the VMA hierarchy.  (If we only took the mmap_lock in
      read mode, we could end up with a corrupted core dump if someone does
      get_user_pages_remote() concurrently.  Not really a major problem, but
      taking the mmap_lock either way works here, so we might as well avoid the
      issue.) (This doesn't do anything about the existing data races with stack
      expansion in other mm code.)
      
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Link: http://lkml.kernel.org/r/20200827114932.3572699-6-jannh@google.com
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump: rework elf/elf_fdpic vma_dump_size() into common helper · 429a22e7
      Jann Horn authored
      
      At the moment, the binfmt_elf and binfmt_elf_fdpic code have slightly
      different code to figure out which VMAs should be dumped, and if so,
      whether the dump should contain the entire VMA or just its first page.
      
      Eliminate duplicate code by reworking the binfmt_elf version into a
      generic core dumping helper in coredump.c.
      
      As part of that, change the heuristic for detecting executable/library
      header pages to check whether the inode is executable instead of looking
      at the file mode.
      
      This is less problematic in terms of locking because it lets us avoid
      get_user() under the mmap_sem.  (And arguably it looks nicer and makes
      more sense in generic code.)
      
      Adjust a little bit based on the binfmt_elf_fdpic version: ->anon_vma is
      only meaningful under CONFIG_MMU, otherwise we have to assume that the VMA
      has been written to.
      
      Suggested-by: Linus Torvalds <torvalds@linux-foundation.org>
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Link: http://lkml.kernel.org/r/20200827114932.3572699-5-jannh@google.com
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • coredump: refactor page range dumping into common helper · afc63a97
      Jann Horn authored
      
      Both fs/binfmt_elf.c and fs/binfmt_elf_fdpic.c need to dump ranges of
      pages into the coredump file.  Extract that logic into a common helper.
      
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Link: http://lkml.kernel.org/r/20200827114932.3572699-4-jannh@google.com
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
    • binfmt_elf_fdpic: stop using dump_emit() on user pointers on !MMU · 8f942eea
      Jann Horn authored
      Patch series "Fix ELF / FDPIC ELF core dumping, and use mmap_lock properly in there", v5.
      
      At the moment, we have that rather ugly mmget_still_valid() helper to work
      around <https://crbug.com/project-zero/1790>: ELF core dumping doesn't
      take the mmap_sem while traversing the task's VMAs, and if anything (like
      userfaultfd) then remotely messes with the VMA tree, fireworks ensue.  So
      at the moment we use mmget_still_valid() to bail out in any writers that
      might be operating on a remote mm's VMAs.
      
      With this series, I'm trying to get rid of the need for that as cleanly as
      possible.  ("cleanly" meaning "avoid holding the mmap_lock across
      unbounded sleeps".)
      
      Patches 1, 2, 3 and 4 are relatively unrelated cleanups in the core
      dumping code.
      
      Patches 5 and 6 implement the main change: Instead of repeatedly accessing
      the VMA list with sleeps in between, we snapshot it at the start with
      proper locking, and then later we just use our copy of the VMA list.  This
      ensures that the kernel won't crash, that VMA metadata in the coredump is
      consistent even in the presence of concurrent modifications, and that any
      virtual addresses that aren't being concurrently modified have their
      contents show up in the core dump properly.
      
      The disadvantage of this approach is that we need a bit more memory during
      core dumping for storing metadata about all VMAs.
      
      At the end of the series, patch 7 removes the old workaround for this
      issue (mmget_still_valid()).
      
      I have tested:
      
       - Creating a simple core dump on X86-64 still works.
       - The created coredump on X86-64 opens in GDB and looks plausible.
       - X86-64 core dumps contain the first page for executable mappings at
         offset 0, and don't contain the first page for non-executable file
         mappings or executable mappings at offset !=0.
       - NOMMU 32-bit ARM can still generate plausible-looking core dumps
         through the FDPIC implementation. (I can't test this with GDB because
         GDB is missing some structure definition for nommu ARM, but I've
         poked around in the hexdump and it looked decent.)
      
      This patch (of 7):
      
      dump_emit() is for kernel pointers, and VMAs describe userspace memory.
      Let's be tidy here and avoid accessing userspace pointers under KERNEL_DS,
      even if it probably doesn't matter much on !MMU systems - especially given
      that it looks like we can just use the same get_dump_page() as on MMU if
      we move it out of the CONFIG_MMU block.
      
      One small change we have to make in get_dump_page() is to use
      __get_user_pages_locked() instead of __get_user_pages(), since the latter
      doesn't exist on nommu.  On mmu builds, __get_user_pages_locked() will
      just call __get_user_pages() for us.
      
      Signed-off-by: Jann Horn <jannh@google.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Cc: Christoph Hellwig <hch@lst.de>
      Cc: Alexander Viro <viro@zeniv.linux.org.uk>
      Cc: "Eric W . Biederman" <ebiederm@xmission.com>
      Cc: Oleg Nesterov <oleg@redhat.com>
      Cc: Hugh Dickins <hughd@google.com>
      Link: http://lkml.kernel.org/r/20200827114932.3572699-1-jannh@google.com
      Link: http://lkml.kernel.org/r/20200827114932.3572699-2-jannh@google.com
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  15. Aug 07, 2020
    • mm: remove unneeded includes of <asm/pgalloc.h> · ca15ca40
      Mike Rapoport authored
      
      Patch series "mm: cleanup usage of <asm/pgalloc.h>"
      
      Most architectures have very similar versions of pXd_alloc_one() and
      pXd_free_one() for intermediate levels of page table.  These patches add
      generic versions of these functions in <asm-generic/pgalloc.h> and enable
      use of the generic functions where appropriate.
      
      In addition, functions declared and defined in <asm/pgalloc.h> headers are
      used mostly by core mm and early mm initialization in arch and there is no
      actual reason to have the <asm/pgalloc.h> included all over the place.
      The first patch in this series removes unneeded includes of
      <asm/pgalloc.h>
      
      In the end it didn't work out as neatly as I hoped and moving
      pXd_alloc_track() definitions to <asm-generic/pgalloc.h> would require
      unnecessary changes to arches that have custom page table allocations, so
      I've decided to move lib/ioremap.c to mm/ and make pgalloc-track.h local
      to mm/.
      
      This patch (of 8):
      
      In most cases <asm/pgalloc.h> header is required only for allocations of
      page table memory.  Most of the .c files that include that header do not
      use symbols declared in <asm/pgalloc.h> and do not require that header.
      
      As for the other header files that used to include <asm/pgalloc.h>, it is
      possible to move that include into the .c file that actually uses symbols
      from <asm/pgalloc.h> and drop the include from the header file.
      
      The process was somewhat automated using
      
      	sed -i -E '/[<"]asm\/pgalloc\.h/d' \
                      $(grep -L -w -f /tmp/xx \
                              $(git grep -E -l '[<"]asm/pgalloc\.h'))
      
      where /tmp/xx contains all the symbols defined in
      arch/*/include/asm/pgalloc.h.
      
      [rppt@linux.ibm.com: fix powerpc warning]
      
      Signed-off-by: Mike Rapoport <rppt@linux.ibm.com>
      Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
      Reviewed-by: Pekka Enberg <penberg@kernel.org>
      Acked-by: Geert Uytterhoeven <geert@linux-m68k.org>	[m68k]
      Cc: Abdul Haleem <abdhalee@linux.vnet.ibm.com>
      Cc: Andy Lutomirski <luto@kernel.org>
      Cc: Arnd Bergmann <arnd@arndb.de>
      Cc: Christophe Leroy <christophe.leroy@csgroup.eu>
      Cc: Joerg Roedel <joro@8bytes.org>
      Cc: Max Filippov <jcmvbkbc@gmail.com>
      Cc: Peter Zijlstra <peterz@infradead.org>
      Cc: Satheesh Rajendran <sathnaga@linux.vnet.ibm.com>
      Cc: Stafford Horne <shorne@gmail.com>
      Cc: Stephen Rothwell <sfr@canb.auug.org.au>
      Cc: Steven Rostedt <rostedt@goodmis.org>
      Cc: Joerg Roedel <jroedel@suse.de>
      Cc: Matthew Wilcox <willy@infradead.org>
      Link: http://lkml.kernel.org/r/20200627143453.31835-1-rppt@kernel.org
      Link: http://lkml.kernel.org/r/20200627143453.31835-2-rppt@kernel.org
      
      
      Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
  16. Jul 27, 2020
  17. Jun 03, 2020
  18. May 27, 2020
  19. May 21, 2020
    • exec: Generic execfd support · b8a61c9e
      Eric W. Biederman authored
      Most of the support for passing the file descriptor of an executable
      to an interpreter already lives in the generic code and in binfmt_elf.
      Rework the fields in binfmt_elf that deal with executable file
      descriptor passing to make executable file descriptor passing a first
      class concept.
      
      Move the fd_install from binfmt_misc into begin_new_exec after the new
      creds have been installed.  This means that accessing the file through
      /proc/<pid>/fd/N is able to see the creds for the new executable
      before allowing access to the new executable's files.
      
      Performing the install of the executable's file descriptor after
      the point of no return also means that nothing special needs to
      be done on error.  The exiting of the process will close all
      of its open files.
      
      Move the would_dump from binfmt_misc into begin_new_exec right
      after would_dump is called on the bprm->file.  This makes it
      obvious this case exists and that no nesting of bprm->file is
      currently supported.
      
      In binfmt_misc the movement of fd_install into generic code means
      that its special error exit path is no longer needed.
      
      Link: https://lkml.kernel.org/r/87y2poyd91.fsf_-_@x220.int.ebiederm.org
      
      
      Acked-by: Linus Torvalds <torvalds@linux-foundation.org>
      Reviewed-by: Kees Cook <keescook@chromium.org>
      Signed-off-by: "Eric W. Biederman" <ebiederm@xmission.com>
  20. May 07, 2020
  21. May 05, 2020
  22. Nov 15, 2019
    • y2038: elfcore: Use __kernel_old_timeval for process times · e2bb80d5
      Arnd Bergmann authored
      
      We store elapsed time for a crashed process in struct elf_prstatus using
      'timeval' structures. Once glibc starts using 64-bit time_t, this becomes
      incompatible with the kernel's idea of timeval since the structure layout
      no longer matches on 32-bit architectures.
      
      This changes the definition of the elf_prstatus structure to use
      __kernel_old_timeval instead, which is hardcoded to the currently used
      binary layout. There is no risk of overflow in y2038 though, because
      the time values are all relative times, and can store up to 68 years
      of process elapsed time.
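The "68 years" figure follows from the range of a signed 32-bit seconds counter. A minimal sketch of that arithmetic, assuming a 32-bit architecture where both fields of the old timeval layout are 32 bits wide (the `sk_` names are hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Assumed shape of the old timeval layout on a 32-bit architecture. */
struct sk_old_timeval32 {
	int32_t tv_sec;		/* seconds of elapsed process time */
	int32_t tv_usec;	/* microseconds */
};

/* Largest elapsed time a signed 32-bit tv_sec can hold, in years. */
double sk_max_elapsed_years(void)
{
	const double secs_per_year = 365.25 * 24 * 60 * 60;

	assert(sizeof(struct sk_old_timeval32) == 8);
	return (double)INT32_MAX / secs_per_year;
}
```

Because the stored values are elapsed durations rather than absolute timestamps, a process would need to accumulate roughly 68 years of CPU time before the field overflowed.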
      
      There is a risk of applications breaking at build time when they
      use the new kernel headers and expect the type to be exactly 'timeval'
      rather than a structure that has the same fields as before. Those
      applications have to be modified to deal with 64-bit time_t anyway.
      
      Signed-off-by: Arnd Bergmann <arnd@arndb.de>
  23. May 30, 2019
  24. Jun 12, 2018
    • treewide: kmalloc() -> kmalloc_array() · 6da2ec56
      Kees Cook authored
      
      The kmalloc() function has a 2-factor argument form, kmalloc_array(). This
      patch replaces cases of:
      
              kmalloc(a * b, gfp)
      
      with:
      
              kmalloc_array(a, b, gfp)
      
      as well as handling cases of:
      
              kmalloc(a * b * c, gfp)
      
      with:
      
              kmalloc(array3_size(a, b, c), gfp)
      
      as it's slightly less ugly than:
      
              kmalloc_array(array_size(a, b), c, gfp)
      
      This does, however, attempt to ignore constant size factors like:
      
              kmalloc(4 * 1024, gfp)
      
      though any constants defined via macros get caught up in the conversion.
      
      Any factors with a sizeof() of "unsigned char", "char", and "u8" were
      dropped, since they're redundant.
      
      The tools/ directory was manually excluded, since it has its own
      implementation of kmalloc().
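The point of the two-factor form is that the multiplication can be checked for overflow before anything is allocated. The following user-space sketch illustrates that idea with `malloc()`; the `sk_` helpers are hypothetical analogues of kmalloc_array() and array3_size(), not the kernel implementations.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate n elements of `size` bytes, refusing if n * size overflows. */
void *sk_alloc_array(size_t n, size_t size)
{
	if (size != 0 && n > SIZE_MAX / size)
		return NULL;
	return malloc(n * size);
}

/* Three-factor size that saturates at SIZE_MAX instead of wrapping. */
size_t sk_array3_size(size_t a, size_t b, size_t c)
{
	if (b != 0 && a > SIZE_MAX / b)
		return SIZE_MAX;
	a *= b;
	if (c != 0 && a > SIZE_MAX / c)
		return SIZE_MAX;
	return a * c;
}

/* Usage: an overflowing request is refused rather than under-allocated. */
void sk_alloc_demo(void)
{
	void *p = sk_alloc_array(SIZE_MAX, 2);

	assert(p == NULL);
}
```

An open-coded `a * b` would wrap around silently and yield a buffer that is too small; a saturated size, by contrast, is a request no allocator can satisfy, so it fails instead.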
      
      The Coccinelle script used for this was:
      
      // Fix redundant parens around sizeof().
      @@
      type TYPE;
      expression THING, E;
      @@
      
      (
        kmalloc(
      -	(sizeof(TYPE)) * E
      +	sizeof(TYPE) * E
        , ...)
      |
        kmalloc(
      -	(sizeof(THING)) * E
      +	sizeof(THING) * E
        , ...)
      )
      
      // Drop single-byte sizes and redundant parens.
      @@
      expression COUNT;
      typedef u8;
      typedef __u8;
      @@
      
      (
        kmalloc(
      -	sizeof(u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(__u8) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(char) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(unsigned char) * (COUNT)
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(u8) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(__u8) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(char) * COUNT
      +	COUNT
        , ...)
      |
        kmalloc(
      -	sizeof(unsigned char) * COUNT
      +	COUNT
        , ...)
      )
      
      // 2-factor product with sizeof(type/expression) and identifier or constant.
      @@
      type TYPE;
      expression THING;
      identifier COUNT_ID;
      constant COUNT_CONST;
      @@
      
      (
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_ID)
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * COUNT_ID
      +	COUNT_ID, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * COUNT_CONST
      +	COUNT_CONST, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (COUNT_ID)
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * COUNT_ID
      +	COUNT_ID, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (COUNT_CONST)
      +	COUNT_CONST, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * COUNT_CONST
      +	COUNT_CONST, sizeof(THING)
        , ...)
      )
      
      // 2-factor product, only identifiers.
      @@
      identifier SIZE, COUNT;
      @@
      
      - kmalloc
      + kmalloc_array
        (
      -	SIZE * COUNT
      +	COUNT, SIZE
        , ...)
      
      // 3-factor product with 1 sizeof(type) or sizeof(expression), with
      // redundant parens removed.
      @@
      expression THING;
      identifier STRIDE, COUNT;
      type TYPE;
      @@
      
      (
        kmalloc(
      -	sizeof(TYPE) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(TYPE))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * (COUNT) * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * (COUNT) * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * COUNT * (STRIDE)
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      |
        kmalloc(
      -	sizeof(THING) * COUNT * STRIDE
      +	array3_size(COUNT, STRIDE, sizeof(THING))
        , ...)
      )
      
      // 3-factor product with 2 sizeof(variable), with redundant parens removed.
      @@
      expression THING1, THING2;
      identifier COUNT;
      type TYPE1, TYPE2;
      @@
      
      (
        kmalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(TYPE2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(TYPE2))
        , ...)
      |
        kmalloc(
      -	sizeof(THING1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(THING1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(THING1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * COUNT
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      |
        kmalloc(
      -	sizeof(TYPE1) * sizeof(THING2) * (COUNT)
      +	array3_size(COUNT, sizeof(TYPE1), sizeof(THING2))
        , ...)
      )
      
      // 3-factor product, only identifiers, with redundant parens removed.
      @@
      identifier STRIDE, SIZE, COUNT;
      @@
      
      (
        kmalloc(
      -	(COUNT) * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * (STRIDE) * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * STRIDE * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	(COUNT) * (STRIDE) * (SIZE)
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      |
        kmalloc(
      -	COUNT * STRIDE * SIZE
      +	array3_size(COUNT, STRIDE, SIZE)
        , ...)
      )
      
      // Any remaining multi-factor products, first at least 3-factor products,
      // when they're not all constants...
      @@
      expression E1, E2, E3;
      constant C1, C2, C3;
      @@
      
      (
        kmalloc(C1 * C2 * C3, ...)
      |
        kmalloc(
      -	(E1) * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	(E1) * (E2) * E3
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	(E1) * (E2) * (E3)
      +	array3_size(E1, E2, E3)
        , ...)
      |
        kmalloc(
      -	E1 * E2 * E3
      +	array3_size(E1, E2, E3)
        , ...)
      )
      
      // And then all remaining 2-factor products when they're not all constants,
      // keeping sizeof() as the second factor argument.
      @@
      expression THING, E1, E2;
      type TYPE;
      constant C1, C2, C3;
      @@
      
      (
        kmalloc(sizeof(THING) * C2, ...)
      |
        kmalloc(sizeof(TYPE) * C2, ...)
      |
        kmalloc(C1 * C2 * C3, ...)
      |
        kmalloc(C1 * C2, ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * (E2)
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(TYPE) * E2
      +	E2, sizeof(TYPE)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * (E2)
      +	E2, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	sizeof(THING) * E2
      +	E2, sizeof(THING)
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	(E1) * E2
      +	E1, E2
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	(E1) * (E2)
      +	E1, E2
        , ...)
      |
      - kmalloc
      + kmalloc_array
        (
      -	E1 * E2
      +	E1, E2
        , ...)
      )
      
      Signed-off-by: Kees Cook <keescook@chromium.org>
      6da2ec56
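The point of the conversions in the script above is that the replacement helpers detect multiplication overflow instead of silently wrapping. The following is a minimal userspace sketch of that idea, not the kernel implementation: the `sketch_*` names are hypothetical, `malloc()` stands in for `kmalloc()`, and the only behavior assumed from the kernel helpers is that an overflowing product saturates to `SIZE_MAX` so the allocation fails.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for array_size(): a * b, saturating to SIZE_MAX
 * when the product does not fit in size_t. */
static size_t sketch_array_size(size_t a, size_t b)
{
	if (a != 0 && b > SIZE_MAX / a)
		return SIZE_MAX;
	return a * b;
}

/* Hypothetical stand-in for array3_size(): a * b * c, saturating to
 * SIZE_MAX as soon as any intermediate product overflows. */
static size_t sketch_array3_size(size_t a, size_t b, size_t c)
{
	size_t ab = sketch_array_size(a, b);

	if (ab == SIZE_MAX)
		return SIZE_MAX;
	return sketch_array_size(ab, c);
}

/* Hypothetical stand-in for kmalloc_array(): refuse the allocation when
 * n * size saturated, otherwise allocate the exact product. */
static void *sketch_malloc_array(size_t n, size_t size)
{
	size_t bytes = sketch_array_size(n, size);

	if (bytes == SIZE_MAX)
		return NULL;
	return malloc(bytes);
}
```

With an open-coded `malloc(n * size)`, a large `n` can wrap to a small byte count and yield an undersized buffer; with the wrapper, the same call returns NULL.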
  25. Apr 11, 2018
  26. Oct 12, 2017
  27. Sep 10, 2017
  28. Sep 04, 2017
  29. Aug 01, 2017
    • binfmt: Introduce secureexec flag · c425e189
      Kees Cook authored
      
      The bprm_secureexec hook can be moved earlier. Right now, it is called
      during create_elf_tables(), via load_binary(), via search_binary_handler(),
      via exec_binprm(). Nearly all (see exception below) state used by
      bprm_secureexec is created during the bprm_set_creds hook, called from
      prepare_binprm().
      
      For all LSMs (except commoncaps described next), only the first execution
      of bprm_set_creds has any effect (they all check bprm->called_set_creds
      which prepare_binprm() sets after the first call to the bprm_set_creds
      hook).  However, all these LSMs also only do anything with bprm_secureexec
      when they detected a secure state during their first run of bprm_set_creds.
      Therefore, it is functionally identical to move the detection into
      bprm_set_creds, since the results from secureexec here only need to be
      based on the first call to the LSM's bprm_set_creds hook.
      
      The single exception is that the commoncaps secureexec hook also examines
      euid/uid and egid/gid differences which are controlled by bprm_fill_uid(),
      via prepare_binprm(), which can be called multiple times (e.g.
      binfmt_script, binfmt_misc), and may clear the euid/egid for the final
      load (i.e. the script interpreter). However, while commoncaps specifically
      ignores bprm->called_set_creds, and runs its bprm_set_creds hook each time
      prepare_binprm() may get called, it needs to base the secureexec decision
      on the final call to bprm_set_creds. As a result, it will need special
      handling.
      
      To begin this refactoring, this adds the secureexec flag to the bprm
      struct, and calls the secureexec hook during setup_new_exec(). This is
      safe since all the cred work is finished (and past the point of no return).
      This explicit call will be removed in later patches once the hook has been
      removed.
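The ordering argument above can be illustrated with a small userspace model. This is only a sketch under stated assumptions: `struct sketch_binprm` and the `sketch_*` functions are hypothetical and heavily reduced, and the booleans passed in stand for whatever a real LSM or bprm_fill_uid() would compute on a given pass.

```c
#include <stdbool.h>

/* Hypothetical, heavily reduced stand-in for struct linux_binprm. */
struct sketch_binprm {
	bool called_set_creds;	/* set once the first set_creds pass has run */
	bool lsm_secure;	/* what a typical LSM detected on its first pass */
	bool ids_differ;	/* commoncaps-style euid/uid or egid/gid mismatch */
	bool secureexec;	/* models the flag this patch adds to the bprm struct */
};

/* Models one prepare_binprm() pass. The two bool arguments are
 * assumptions standing in for the hook results of this pass. */
static void sketch_prepare_binprm(struct sketch_binprm *bprm,
				  bool lsm_detects_secure, bool ids_differ)
{
	/* Typical LSM: only the first pass has any effect. */
	if (!bprm->called_set_creds)
		bprm->lsm_secure = lsm_detects_secure;

	/* commoncaps: re-evaluated on every pass, so the final pass wins. */
	bprm->ids_differ = ids_differ;

	bprm->called_set_creds = true;
}

/* Models the explicit secureexec evaluation in setup_new_exec(),
 * which runs after all credential work is finished. */
static void sketch_setup_new_exec(struct sketch_binprm *bprm)
{
	bprm->secureexec = bprm->lsm_secure || bprm->ids_differ;
}
```

In this model, a script whose first pass reports differing ids but whose interpreter pass clears them ends with `secureexec` unset, while a secure state detected by a typical LSM on the first pass persists regardless of later passes.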
      
      Cc: David Howells <dhowells@redhat.com>
      Signed-off-by: Kees Cook <keescook@chromium.org>
      Reviewed-by: John Johansen <john.johansen@canonical.com>
      Acked-by: Serge Hallyn <serge@hallyn.com>
      Reviewed-by: James Morris <james.l.morris@oracle.com>