From 743162013d40ca612b4cb53d3a200dff2d9ab26e Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 7 Jul 2014 15:16:04 +1000 Subject: sched: Remove proliferation of wait_on_bit() action functions The current "wait_on_bit" interface requires an 'action' function to be provided which does the actual waiting. There are over 20 such functions, many of them identical. Most cases can be satisfied by one of just two functions, one which uses io_schedule() and one which just uses schedule(). So: Rename wait_on_bit and wait_on_bit_lock to wait_on_bit_action and wait_on_bit_lock_action to make it explicit that they need an action function. Introduce new wait_on_bit{,_lock} and wait_on_bit{,_lock}_io which are *not* given an action function but implicitly use a standard one. The decision to error-out if a signal is pending is now made based on the 'mode' argument rather than being encoded in the action function. All instances of the old wait_on_bit and wait_on_bit_lock which can use the new version have been changed accordingly and their action functions have been discarded. wait_on_bit{_lock} does not return any specific error code in the event of a signal so the caller must check for non-zero and interpolate their own error code as appropriate. The wait_on_bit() call in __fscache_wait_on_invalidate() was ambiguous as it specified TASK_UNINTERRUPTIBLE but used fscache_wait_bit_interruptible as an action function. David Howells confirms this should be uniformly "uninterruptible" The main remaining user of wait_on_bit{,_lock}_action is NFS which needs to use a freezer-aware schedule() call. A comment in fs/gfs2/glock.c notes that having multiple 'action' functions is useful as they display differently in the 'wchan' field of 'ps'. (and /proc/$PID/wchan). As the new bit_wait{,_io} functions are tagged "__sched", they will not show up at all, but something higher in the stack. So the distinction will still be visible, only with different function names (gds2_glock_wait versus gfs2_glock_dq_wait in the gfs2/glock.c case). Since first version of this patch (against 3.15) two new action functions appeared, on in NFS and one in CIFS. CIFS also now uses an action function that makes the same freezer aware schedule call as NFS. Signed-off-by: NeilBrown Acked-by: David Howells (fscache, keys) Acked-by: Steven Whitehouse (gfs2) Acked-by: Peter Zijlstra Cc: Oleg Nesterov Cc: Steve French Cc: Linus Torvalds Link: http://lkml.kernel.org/r/20140707051603.28027.72349.stgit@notabene.brown Signed-off-by: Ingo Molnar --- fs/nfs/inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) (limited to 'fs/nfs/inode.c') diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index 9927913c97c2..b7b710e7d08e 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -1074,8 +1074,8 @@ int nfs_revalidate_mapping(struct inode *inode, struct address_space *mapping) * the bit lock here if it looks like we're going to be doing that. */ for (;;) { - ret = wait_on_bit(bitlock, NFS_INO_INVALIDATING, - nfs_wait_bit_killable, TASK_KILLABLE); + ret = wait_on_bit_action(bitlock, NFS_INO_INVALIDATING, + nfs_wait_bit_killable, TASK_KILLABLE); if (ret) goto out; spin_lock(&inode->i_lock); -- cgit v1.2.3 From c1221321b7c25b53204447cff9949a6d5a7ddddc Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 7 Jul 2014 15:16:04 +1000 Subject: sched: Allow wait_on_bit_action() functions to support a timeout It is currently not possible for various wait_on_bit functions to implement a timeout. While the "action" function that is called to do the waiting could certainly use schedule_timeout(), there is no way to carry forward the remaining timeout after a false wake-up. As false-wakeups a clearly possible at least due to possible hash collisions in bit_waitqueue(), this is a real problem. The 'action' function is currently passed a pointer to the word containing the bit being waited on. No current action functions use this pointer. So changing it to something else will be a little noisy but will have no immediate effect. This patch changes the 'action' function to take a pointer to the "struct wait_bit_key", which contains a pointer to the word containing the bit so nothing is really lost. It also adds a 'private' field to "struct wait_bit_key", which is initialized to zero. An action function can now implement a timeout with something like static int timed_out_waiter(struct wait_bit_key *key) { unsigned long waited; if (key->private == 0) { key->private = jiffies; if (key->private == 0) key->private -= 1; } waited = jiffies - key->private; if (waited > 10 * HZ) return -EAGAIN; schedule_timeout(waited - 10 * HZ); return 0; } If any other need for context in a waiter were found it would be easy to use ->private for some other purpose, or even extend "struct wait_bit_key". My particular need is to support timeouts in nfs_release_page() to avoid deadlocks with loopback mounted NFS. While wait_on_bit_timeout() would be a cleaner interface, it will not meet my need. I need the timeout to be sensitive to the state of the connection with the server, which could change. So I need to use an 'action' interface. Signed-off-by: NeilBrown Acked-by: Peter Zijlstra Cc: Oleg Nesterov Cc: Steve French Cc: David Howells Cc: Steven Whitehouse Cc: Linus Torvalds Link: http://lkml.kernel.org/r/20140707051604.28027.41257.stgit@notabene.brown Signed-off-by: Ingo Molnar --- fs/nfs/inode.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) (limited to 'fs/nfs/inode.c') diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index b7b710e7d08e..abd37a380535 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -75,7 +75,7 @@ nfs_fattr_to_ino_t(struct nfs_fattr *fattr) * nfs_wait_bit_killable - helper for functions that are sleeping on bit locks * @word: long word containing the bit lock */ -int nfs_wait_bit_killable(void *word) +int nfs_wait_bit_killable(struct wait_bit_key *key) { if (fatal_signal_pending(current)) return -ERESTARTSYS; -- cgit v1.2.3 From 912a108da767ae75cc929d2854e698aff527ec5d Mon Sep 17 00:00:00 2001 From: NeilBrown Date: Mon, 14 Jul 2014 11:28:20 +1000 Subject: NFS: teach nfs_neg_need_reval to understand LOOKUP_RCU This requires nfs_check_verifier to take an rcu_walk flag, and requires an rcu version of nfs_revalidate_inode which returns -ECHILD rather than making an RPC call. With this, nfs_lookup_revalidate can call nfs_neg_need_reval in RCU-walk mode. We can also move the LOOKUP_RCU check past the nfs_check_verifier() call in nfs_lookup_revalidate. If RCU_WALK prevents nfs_check_verifier or nfs_neg_need_reval from doing a full check, they return a status indicating that a revalidation is required. As this revalidation will not be possible in RCU_WALK mode, -ECHILD will ultimately be returned, which is the desired result. Signed-off-by: NeilBrown Signed-off-by: Trond Myklebust --- fs/nfs/inode.c | 9 +++++++++ 1 file changed, 9 insertions(+) (limited to 'fs/nfs/inode.c') diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index 9927913c97c2..147fd17e7920 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -1002,6 +1002,15 @@ int nfs_revalidate_inode(struct nfs_server *server, struct inode *inode) } EXPORT_SYMBOL_GPL(nfs_revalidate_inode); +int nfs_revalidate_inode_rcu(struct nfs_server *server, struct inode *inode) +{ + if (!(NFS_I(inode)->cache_validity & + (NFS_INO_INVALID_ATTR|NFS_INO_INVALID_LABEL)) + && !nfs_attribute_cache_expired(inode)) + return NFS_STALE(inode) ? -ESTALE : 0; + return -ECHILD; +} + static int nfs_invalidate_mapping(struct inode *inode, struct address_space *mapping) { struct nfs_inode *nfsi = NFS_I(inode); -- cgit v1.2.3 From 65b38851a17472d31fec9019fc3a55b0802dab88 Mon Sep 17 00:00:00 2001 From: "Eric W. Biederman" Date: Thu, 31 Jul 2014 04:35:20 -0700 Subject: NFS: Fix /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes The usage of pid_ns->child_reaper->nsproxy->net_ns in nfs_server_list_open and nfs_client_list_open is not safe. /proc for a pid namespace can remain mounted after the all of the process in that pid namespace have exited. There are also times before the initial process in a pid namespace has started or after the initial process in a pid namespace has exited where pid_ns->child_reaper can be NULL or stale. Making the idiom pid_ns->child_reaper->nsproxy a double whammy of problems. Luckily all that needs to happen is to move /proc/fs/nfsfs/servers and /proc/fs/nfsfs/volumes under /proc/net to /proc/net/nfsfs/servers and /proc/net/nfsfs/volumes and add a symlink from the original location, and to use seq_open_net as it has been designed. Cc: stable@vger.kernel.org Cc: Trond Myklebust Cc: Stanislav Kinsbursky Signed-off-by: "Eric W. Biederman" --- fs/nfs/inode.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) (limited to 'fs/nfs/inode.c') diff --git a/fs/nfs/inode.c b/fs/nfs/inode.c index 9927913c97c2..a732cbbd4e80 100644 --- a/fs/nfs/inode.c +++ b/fs/nfs/inode.c @@ -1840,11 +1840,12 @@ EXPORT_SYMBOL_GPL(nfs_net_id); static int nfs_net_init(struct net *net) { nfs_clients_init(net); - return 0; + return nfs_fs_proc_net_init(net); } static void nfs_net_exit(struct net *net) { + nfs_fs_proc_net_exit(net); nfs_cleanup_cb_ident_idr(net); } -- cgit v1.2.3