diff --git a/Documentation/filesystems/nfs/localio.rst b/Documentation/filesystems/nfs/localio.rst index ef3851d4813307561a378c3b28d04daef34647ee..4637c0b34753eb9d4edea4da7b888a4c0c968dfe 100644 --- a/Documentation/filesystems/nfs/localio.rst +++ b/Documentation/filesystems/nfs/localio.rst @@ -150,6 +150,11 @@ FAQ __fh_verify(). So they get handled exactly the same way for LOCALIO as they do for non-LOCALIO. +9. How does LOCALIO make certain that object lifetimes are managed + properly given NFSD and NFS operate in different contexts? + + See the detailed "NFS Client and Server Interlock" section below. + RPC === @@ -209,6 +214,69 @@ objects to span from the host kernel's nfsd to per-container knfsd instances that are connected to nfs client's running on the same local host. +NFS Client and Server Interlock +=============================== + +LOCALIO provides the nfs_uuid_t object and associated interfaces to +allow proper network namespace (net-ns) and NFSD object refcounting: + + We don't want to keep a long-term counted reference on each NFSD's + net-ns in the client because that prevents a server container from + completely shutting down. + + So we avoid taking a reference at all and rely on the per-cpu + reference to the server (detailed below) being sufficient to keep + the net-ns active. This involves allowing the NFSD's net-ns exit + code to iterate all active clients and clear their ->net pointers + (which are needed to find the per-cpu-refcount for the nfsd_serv). + + Details: + + - Embed nfs_uuid_t in nfs_client. nfs_uuid_t provides a list_head + that can be used to find the client. It does add the 16-byte + uuid_t to nfs_client so it is bigger than needed (given that + uuid_t is only used during the initial NFS client and server + LOCALIO handshake to determine if they are local to each other). + If that is really a problem we can find a fix. + + - When the nfs server confirms that the uuid_t is local, it moves + the nfs_uuid_t onto a per-net-ns list in NFSD's nfsd_net. + + - When each server's net-ns is shutting down - in a "pre_exit" + handler, all these nfs_uuid_t have their ->net cleared. There is + an rcu_synchronize() call between pre_exit() handlers and exit() + handlers so any caller that sees nfs_uuid_t ->net as not NULL can + safely manage the per-cpu-refcount for nfsd_serv. + + - The client's nfs_uuid_t is passed to nfsd_open_local_fh() so it + can safely dereference ->net in a private rcu_read_lock() section + to allow safe access to the associated nfsd_net and nfsd_serv. + +So LOCALIO required the introduction and use of NFSD's percpu_ref to +interlock nfsd_destroy_serv() and nfsd_open_local_fh(), to ensure each +nn->nfsd_serv is not destroyed while in use by nfsd_open_local_fh(), and +warrants a more detailed explanation: + + nfsd_open_local_fh() uses nfsd_serv_try_get() before opening its + nfsd_file handle and then the caller (NFS client) must drop the + reference for the nfsd_file and associated nn->nfsd_serv using + nfs_file_put_local() once it has completed its IO. + + This interlock working relies heavily on nfsd_open_local_fh() being + afforded the ability to safely deal with the possibility that the + NFSD's net-ns (and nfsd_net by association) may have been destroyed + by nfsd_destroy_serv() via nfsd_shutdown_net() -- which is only + possible given the nfs_uuid_t ->net pointer managemenet detailed + above. + +All told, this elaborate interlock of the NFS client and server has been +verified to fix an easy to hit crash that would occur if an NFSD +instance running in a container, with a LOCALIO client mounted, is +shutdown. Upon restart of the container and associated NFSD the client +would go on to crash due to NULL pointer dereference that occurred due +to the LOCALIO client's attempting to nfsd_open_local_fh(), using +nn->nfsd_serv, without having a proper reference on nn->nfsd_serv. + NFS Client issues IO instead of Server ======================================