- Nov 01, 2012
-
-
Richard Cochran authored
This patch removes the timecompare code from the kernel. The top five reasons to do this are: 1. There are no more users of this code. 2. The original idea was a bit weak. 3. The original author has disappeared. 4. The code was not general purpose but tuned to a particular hardware, 5. There are better ways to accomplish clock synchronization. Signed-off-by:
Richard Cochran <richardcochran@gmail.com> Acked-by:
John Stultz <john.stultz@linaro.org> Tested-by:
Bob Liu <lliubbo@gmail.com> Signed-off-by:
David S. Miller <davem@davemloft.net>
-
- Oct 25, 2012
-
-
H. Peter Anvin authored
If one includes documentation for an external tool, it should be correct. This is not: 1. Overriding the input to rngd should typically be neither necessary nor desired. This is especially so since newer versions of rngd support a number of different *types* of sources. 2. The default kernel-exported device is called /dev/hwrng not /dev/hwrandom nor /dev/hw_random (both of which were used in the past; however, kernel and udev seem to have converged on /dev/hwrng.) Overall it is better if the documentation for rngd is kept with rngd rather than in a kernel Makefile. Signed-off-by:
H. Peter Anvin <hpa@linux.intel.com> Cc: David Howells <dhowells@redhat.com> Cc: Jeff Garzik <jgarzik@redhat.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Andrew Vagin authored
'struct pid' is a "variable sized struct" - a header with an array of upids at the end. The size of the array depends on a level (depth) of pid namespaces. Now a level of pidns is not limited, so 'struct pid' can be more than one page. Looks reasonable, that it should be less than a page. MAX_PIS_NS_LEVEL is not calculated from PAGE_SIZE, because in this case it depends on architectures, config options and it will be reduced, if someone adds a new fields in struct pid or struct upid. I suggest to set MAX_PIS_NS_LEVEL = 32, because it saves ability to expand "struct pid" and it's more than enough for all known for me use-cases. When someone finds a reasonable use case, we can add a config option or a sysctl parameter. In addition it will reduce the effect of another problem, when we have many nested namespaces and the oldest one starts dying. zap_pid_ns_processe will be called for each namespace and find_vpid will be called for each process in a namespace. find_vpid will be called minimum max_level^2 / 2 times. The reason of that is that when we found a bit in pidmap, we can't determine this pidns is top for this process or it isn't. vpid is a heavy operation, so a fork bomb, which create many nested namespace, can make a system inaccessible for a long time. For example my system becomes inaccessible for a few minutes with 4000 processes. [akpm@linux-foundation.org: return -EINVAL in response to excessive nesting, not -ENOMEM] Signed-off-by:
Andrew Vagin <avagin@openvz.org> Acked-by:
Oleg Nesterov <oleg@redhat.com> Cc: Cyrill Gorcunov <gorcunov@openvz.org> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Oct 24, 2012
-
-
Dan Magenheimer authored
57b30ae7 ("workqueue: reimplement cancel_delayed_work() using try_to_grab_pending()") made cancel_delayed_work() always return %true unless someone else is also trying to cancel the work item, which is broken - if the target work item is idle, the return value should be %false. try_to_grab_pending() indicates that the target work item was idle by zero return value. Use it for return. Note that this brings cancel_delayed_work() in line with __cancel_work_timer() in return value handling. Signed-off-by:
Dan Magenheimer <dan.magenheimer@oracle.com> Signed-off-by:
Tejun Heo <tj@kernel.org> LKML-Reference: <444a6439-b1a4-4740-9e7e-bc37267cfe73@default>
-
- Oct 22, 2012
-
-
Randy Dunlap authored
Fix the warning: kernel/module_signing.c:195:2: warning: format '%lu' expects type 'long unsigned int', but argument 3 has type 'size_t' by using the proper 'z' modifier for printing a size_t. Signed-off-by:
Randy Dunlap <rdunlap@xenotime.net> Cc: David Howells <dhowells@redhat.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Oct 20, 2012
-
-
Kees Cook authored
The min/max call needed to have explicit types on some architectures (e.g. mn10300). Use clamp_t instead to avoid the warning: kernel/sys.c: In function 'override_release': kernel/sys.c:1287:10: warning: comparison of distinct pointer types lacks a cast [enabled by default] Reported-by:
Fengguang Wu <fengguang.wu@intel.com> Signed-off-by:
Kees Cook <keescook@chromium.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
David Howells authored
Emit the magic string that indicates a module has a signature after the signature data instead of before it. This allows module_sig_check() to be made simpler and faster by the elimination of the search for the magic string. Instead we just need to do a single memcmp(). This works because at the end of the signature data there is the fixed-length signature information block. This block then falls immediately prior to the magic number. From the contents of the information block, it is trivial to calculate the size of the signature data and thus the size of the actual module data. Signed-off-by:
David Howells <dhowells@redhat.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Oct 19, 2012
-
-
Tejun Heo authored
This reverts commit 7e3aa30a. The commit incorrectly assumed that fork path always performed threadgroup_change_begin/end() and depended on that for synchronization against task exit and cgroup migration paths instead of explicitly grabbing task_lock(). threadgroup_change is not locked when forking a new process (as opposed to a new thread in the same process) and even if it were it wouldn't be effective as different processes use different threadgroup locks. Revert the incorrect optimization. Signed-off-by:
Tejun Heo <tj@kernel.org> LKML-Reference: <20121008020000.GB2575@localhost> Acked-by:
Li Zefan <lizefan@huawei.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: stable@vger.kernel.org
-
Tejun Heo authored
This reverts commit 7e381b0e. The commit incorrectly assumed that fork path always performed threadgroup_change_begin/end() and depended on that for synchronization against task exit and cgroup migration paths instead of explicitly grabbing task_lock(). threadgroup_change is not locked when forking a new process (as opposed to a new thread in the same process) and even if it were it wouldn't be effective as different processes use different threadgroup locks. Revert the incorrect optimization. Signed-off-by:
Tejun Heo <tj@kernel.org> LKML-Reference: <20121008020000.GB2575@localhost> Acked-by:
Li Zefan <lizefan@huawei.com> Bitterly-Acked-by:
Frederic Weisbecker <fweisbec@gmail.com> Cc: stable@vger.kernel.org
-
Cyrill Gorcunov authored
free_pid_ns() operates in a recursive fashion: free_pid_ns(parent) put_pid_ns(parent) kref_put(&ns->kref, free_pid_ns); free_pid_ns thus if there was a huge nesting of namespaces the userspace may trigger avalanche calling of free_pid_ns leading to kernel stack exhausting and a panic eventually. This patch turns the recursion into an iterative loop. Based on a patch by Andrew Vagin. [akpm@linux-foundation.org: export put_pid_ns() to modules] Signed-off-by:
Cyrill Gorcunov <gorcunov@openvz.org> Cc: Andrew Vagin <avagin@openvz.org> Cc: Oleg Nesterov <oleg@redhat.com> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Pavel Emelyanov <xemul@parallels.com> Cc: Greg KH <greg@kroah.com> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Kees Cook authored
Calling uname() with the UNAME26 personality set allows a leak of kernel stack contents. This fixes it by defensively calculating the length of copy_to_user() call, making the len argument unsigned, and initializing the stack buffer to zero (now technically unneeded, but hey, overkill). CVE-2012-0957 Reported-by:
PaX Team <pageexec@freemail.hu> Signed-off-by:
Kees Cook <keescook@chromium.org> Cc: Andi Kleen <ak@linux.intel.com> Cc: PaX Team <pageexec@freemail.hu> Cc: Brad Spengler <spender@grsecurity.net> Cc: <stable@vger.kernel.org> Signed-off-by:
Andrew Morton <akpm@linux-foundation.org> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
- Oct 17, 2012
-
-
Paul E. McKenney authored
The console_cpu_notify() function runs with interrupts disabled in the CPU_DYING case. It therefore cannot block, for example, as will happen when it calls console_lock(). Therefore, remove the CPU_DYING leg of the switch statement to avoid this problem. Signed-off-by:
Paul E. McKenney <paulmck@linux.vnet.ibm.com> Reviewed-by:
Srivatsa S. Bhat <srivatsa.bhat@linux.vnet.ibm.com> Signed-off-by:
Linus Torvalds <torvalds@linux-foundation.org>
-
Daisuke Nishimura authored
notify_on_release must be triggered when the last process in a cgroup is move to another. But if the first(and only) process in a cgroup is moved to another, notify_on_release is not triggered. # mkdir /cgroup/cpu/SRC # mkdir /cgroup/cpu/DST # # echo 1 >/cgroup/cpu/SRC/notify_on_release # echo 1 >/cgroup/cpu/DST/notify_on_release # # sleep 300 & [1] 8629 # # echo 8629 >/cgroup/cpu/SRC/tasks # echo 8629 >/cgroup/cpu/DST/tasks -> notify_on_release for /SRC must be triggered at this point, but it isn't. This is because put_css_set() is called before setting CGRP_RELEASABLE in cgroup_task_migrate(), and is a regression introduce by the commit:74a1166d(cgroups: make procs file writable), which was merged into v3.0. Cc: Ben Blum <bblum@andrew.cmu.edu> Cc: <stable@vger.kernel.org> # v3.0.x and later Acked-by:
Li Zefan <lizefan@huawei.com> Signed-off-by:
Daisuke Nishimura <nishimura@mxp.nes.nec.co.jp> Signed-off-by:
Tejun Heo <tj@kernel.org>
-
- Oct 13, 2012
-
-
Jeff Layton authored
Keep a pointer to the audit_names "slot" in struct filename. Have all of the audit_inode callers pass a struct filename ponter to audit_inode instead of a string pointer. If the aname field is already populated, then we can skip walking the list altogether and just use it directly. Signed-off-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Jeff Layton authored
...and fix up the callers. For do_file_open_root, just declare a struct filename on the stack and fill out the .name field. For do_filp_open, make it also take a struct filename pointer, and fix up its callers to call it appropriately. For filp_open, add a variant that takes a struct filename pointer and turn filp_open into a wrapper around it. Signed-off-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Jeff Layton authored
Currently, if we call getname() on a userland string more than once, we'll get multiple copies of the string and multiple audit_names records. Add a function that will allow the audit_names code to satisfy getname requests using info from the audit_names list, avoiding a new allocation and audit_names records. Signed-off-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Jeff Layton authored
getname() is intended to copy pathname strings from userspace into a kernel buffer. The result is just a string in kernel space. It would however be quite helpful to be able to attach some ancillary info to the string. For instance, we could attach some audit-related info to reduce the amount of audit-related processing needed. When auditing is enabled, we could also call getname() on the string more than once and not need to recopy it from userspace. This patchset converts the getname()/putname() interfaces to return a struct instead of a string. For now, the struct just tracks the string in kernel space and the original userland pointer for it. Later, we'll add other information to the struct as it becomes convenient. Signed-off-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Oct 12, 2012
-
-
Al Viro authored
* allow kernel_execve() leave the actual return to userland to caller (selected by CONFIG_GENERIC_KERNEL_EXECVE). Callers updated accordingly. * architecture that does select GENERIC_KERNEL_EXECVE in its Kconfig should have its ret_from_kernel_thread() do this: call schedule_tail call the callback left for it by copy_thread(); if it ever returns, that's because it has just done successful kernel_execve() jump to return from syscall IOW, its only difference from ret_from_fork() is that it does call the callback. * such an architecture should also get rid of ret_from_kernel_execve() and __ARCH_WANT_KERNEL_EXECVE This is the last part of infrastructure patches in that area - from that point on work on different architectures can live independently. Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Jason Wessel authored
It is possible to miss data when using the kdb pager. The kdb pager does not pay attention to the maximum column constraint of the screen or serial terminal. This result is not incrementing the shown lines correctly and the pager will print more lines that fit on the screen. Obviously that is less than useful when using a VGA console where you cannot scroll back. The pager will now look at the kdb_buffer string to see how many characters are printed. It might not be perfect considering you can output ASCII that might move the cursor position, but it is a substantially better approximation for viewing dmesg and trace logs. This also means that the vt screen needs to set the kdb COLUMNS variable. Cc: <stable@vger.kernel.org> Signed-off-by:
Jason Wessel <jason.wessel@windriver.com>
-
Jason Wessel authored
If you press 'q' the pager should exit instead of printing everything from dmesg which can really bog down a 9600 baud serial link. The same is true for the bta command. Signed-off-by:
Jason Wessel <jason.wessel@windriver.com>
-
Jason Wessel authored
Allow gdb to auto load kernel modules when it is attached, which makes it trivially easy to debug module init functions or pre-set breakpoints in a kernel module that has not loaded yet. Signed-off-by:
Jason Wessel <jason.wessel@windriver.com>
-
Jeff Layton authored
Signed-off-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Jeff Layton authored
In order to accomodate retrying path-based syscalls, we need to add a new "type" argument to audit_inode_child. This will tell us whether we're looking for a child entry that represents a create or a delete. If we find a parent, don't automatically assume that we need to create a new entry. Instead, use the information we have to try to find an existing entry first. Update it if one is found and create a new one if not. Signed-off-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Jeff Layton authored
In the cases where we already know the length of the parent, pass it as a parm so we don't need to recompute it. In the cases where we don't know the length, pass in AUDIT_NAME_FULL (-1) to indicate that it should be determined. Signed-off-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Eric Paris authored
Signed-off-by:
Eric Paris <eparis@redhat.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Jeff Layton authored
All the callers set this to NULL now. Signed-off-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Jeff Layton authored
Currently, this gets set mostly by happenstance when we call into audit_inode_child. While that might be a little more efficient, it seems wrong. If the syscall ends up failing before audit_inode_child ever gets called, then you'll have an audit_names record that shows the full path but has the parent inode info attached. Fix this by passing in a parent flag when we call audit_inode that gets set to the value of LOOKUP_PARENT. We can then fix up the pathname for the audit entry correctly from the get-go. While we're at it, clean up the no-op macro for audit_inode in the !CONFIG_AUDITSYSCALL case. Signed-off-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Jeff Layton authored
For now, we just have two possibilities: UNKNOWN: for a new audit_names record that we don't know anything about yet NORMAL: for everything else In later patches, we'll add other types so we can distinguish and update records created under different circumstances. Signed-off-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Jeff Layton authored
Most of the callers get called with an inode and dentry in the reverse order. The compiler then has to reshuffle the arg registers and/or stack in order to pass them on to audit_inode_child. Reverse those arguments for a micro-optimization. Reported-by:
Eric Paris <eparis@redhat.com> Signed-off-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Jeff Layton authored
If name is NULL then the condition in the loop will never be true. Also, with this change, we can eliminate the check for n->name == NULL since the equivalence check will never be true if it is. Signed-off-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Jeff Layton authored
In some cases, we were passing in NULL even when we have a dentry. Reported-by:
Eric Paris <eparis@redhat.com> Signed-off-by:
Jeff Layton <jlayton@redhat.com> Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
Al Viro authored
Most of them never returned anyway - only two functions had to be changed. That allows to simplify their callers a whole lot. Note that this does *not* apply to kthread_run() callbacks - all of those had been called from the same kernel_thread() callback, which did do_exit() already. This is strictly about very few low-level kernel_thread() callbacks (there are only 6 of those, mostly as part of kthread.h and kmod.h exported mechanisms, plus kernel_init() itself). Signed-off-by:
Al Viro <viro@zeniv.linux.org.uk>
-
- Oct 11, 2012
-
-
Vaibhav Nagarnaik authored
With a system where, num_present_cpus < num_possible_cpus, even if all CPUs are online, non-present CPUs don't have per_cpu buffers allocated. If per_cpu/<cpu>/buffer_size_kb is modified for such a CPU, it can cause a panic due to NULL dereference in ring_buffer_resize(). To fix this, resize operation is allowed only if the per-cpu buffer has been initialized. Link: http://lkml.kernel.org/r/1349912427-6486-1-git-send-email-vnagarnaik@google.com Cc: stable@vger.kernel.org # 3.5+ Signed-off-by:
Vaibhav Nagarnaik <vnagarnaik@google.com> Signed-off-by:
Steven Rostedt <rostedt@goodmis.org>
-
- Oct 10, 2012
-
-
Rusty Russell authored
It doesn't, because the clean targets don't include kernel/Makefile, and because two files were missing from the list. Signed-off-by:
Rusty Russell <rusty@rustcorp.com.au>
-
David Howells authored
Place an indication that the certificate should use utf8 strings into the x509.genkey template generated by kernel/Makefile. Signed-off-by:
David Howells <dhowells@redhat.com> Signed-off-by:
Rusty Russell <rusty@rustcorp.com.au>
-
David Howells authored
Use the same digest type for the autogenerated key signature as for the module signature so that the hash algorithm is guaranteed to be present in the kernel. Without this, the X.509 certificate loader may reject the X.509 certificate so generated because it was self-signed and the signature will be checked against itself - but this won't work if the digest algorithm must be loaded as a module. The symptom is that the key fails to load with the following message emitted into the kernel log: MODSIGN: Problem loading in-kernel X.509 certificate (-65) the error in brackets being -ENOPKG. What you should see is something like: MODSIGN: Loaded cert 'Magarathea: Glacier signing key: 9588321144239a119d3406d4c4cf1fbae1836fa0' Note that this doesn't apply to certificates that are not self-signed as we don't check those currently as they require the parent CA certificate to be available. Reported-by:
Rusty Russell <rusty@rustcorp.com.au> Signed-off-by:
David Howells <dhowells@redhat.com> Signed-off-by:
Rusty Russell <rusty@rustcorp.com.au>
-
David Howells authored
Check the signature on the module against the keys compiled into the kernel or available in a hardware key store. Currently, only RSA keys are supported - though that's easy enough to change, and the signature is expected to contain raw components (so not a PGP or PKCS#7 formatted blob). The signature blob is expected to consist of the following pieces in order: (1) The binary identifier for the key. This is expected to match the SubjectKeyIdentifier from an X.509 certificate. Only X.509 type identifiers are currently supported. (2) The signature data, consisting of a series of MPIs in which each is in the format of a 2-byte BE word sizes followed by the content data. (3) A 12 byte information block of the form: struct module_signature { enum pkey_algo algo : 8; enum pkey_hash_algo hash : 8; enum pkey_id_type id_type : 8; u8 __pad; __be32 id_length; __be32 sig_length; }; The three enums are defined in crypto/public_key.h. 'algo' contains the public-key algorithm identifier (0->DSA, 1->RSA). 'hash' contains the digest algorithm identifier (0->MD4, 1->MD5, 2->SHA1, etc.). 'id_type' contains the public-key identifier type (0->PGP, 1->X.509). '__pad' should be 0. 'id_length' should contain in the binary identifier length in BE form. 'sig_length' should contain in the signature data length in BE form. The lengths are in BE order rather than CPU order to make dealing with cross-compilation easier. Signed-off-by:
David Howells <dhowells@redhat.com> Signed-off-by: Rusty Russell <rusty@rustcorp.com.au> (minor Kconfig fix)
-
David Howells authored
Include a PGP keyring containing the public keys required to perform module verification in the kernel image during build and create a special keyring during boot which is then populated with keys of crypto type holding the public keys found in the PGP keyring. These can be seen by root: [root@andromeda ~]# cat /proc/keys 07ad4ee0 I----- 1 perm 3f010000 0 0 crypto modsign.0: RSA 87b9b3bd [] 15c7f8c3 I----- 1 perm 1f030000 0 0 keyring .module_sign: 1/4 ... It is probably worth permitting root to invalidate these keys, resulting in their removal and preventing further modules from being loaded with that key. Signed-off-by:
David Howells <dhowells@redhat.com> Signed-off-by:
Rusty Russell <rusty@rustcorp.com.au>
-
David Howells authored
Automatically generate keys for module signing if they're absent so that allyesconfig doesn't break. The builder should consider generating their own key and certificate, however, so that the keys are appropriately named. The private key for the module signer should be placed in signing_key.priv (unencrypted!) and the public key in an X.509 certificate as signing_key.x509. If a transient key is desired for signing the modules, a config file for 'openssl req' can be placed in x509.genkey, looking something like the following: [ req ] default_bits = 4096 distinguished_name = req_distinguished_name prompt = no x509_extensions = myexts [ req_distinguished_name ] O = Magarathea CN = Glacier signing key emailAddress = slartibartfast@magrathea.h2g2 [ myexts ] basicConstraints=critical,CA:FALSE keyUsage=digitalSignature subjectKeyIdentifier=hash authorityKeyIdentifier=hash The build process will use this to configure: openssl req -new -nodes -utf8 -sha1 -days 36500 -batch \ -x509 -config x509.genkey \ -outform DER -out signing_key.x509 \ -keyout signing_key.priv to generate the key. Note that it is required that the X.509 certificate have a subjectKeyIdentifier and an authorityKeyIdentifier. Without those, the certificate will be rejected. These can be used to check the validity of a certificate. Note that 'make distclean' will remove signing_key.{priv,x509} and x509.genkey, whether or not they were generated automatically. Signed-off-by:
David Howells <dhowells@redhat.com> Signed-off-by:
Rusty Russell <rusty@rustcorp.com.au>
-
David Howells authored
If we're in FIPS mode, we should panic if we fail to verify the signature on a module or we're asked to load an unsigned module in signature enforcing mode. Possibly FIPS mode should automatically enable enforcing mode. Signed-off-by:
David Howells <dhowells@redhat.com> Signed-off-by:
Rusty Russell <rusty@rustcorp.com.au>
-