summaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorjohn stultz <johnstul@us.ibm.com>2011-05-11 16:10:28 -0700
committerWilly Tarreau <w@1wt.eu>2012-02-11 15:40:48 +0100
commitaf79350f1516512485de747775a1dde094a23e59 (patch)
tree238174fb4b0b3cba7888c26cd1e30c95abae781a
parent6e4e58898f4bb7e512f939fd89c0f5ad9b4939ae (diff)
Fix time() inconsistencies caused by intermediate xtime_cache values being read
Currently with 2.6.32-longterm, its possible for time() to occasionally return values one second earlier then the previous time() call. This happens because update_xtime_cache() does: xtime_cache = xtime; timespec_add_ns(&xtime_cache, nsec); Its possible that xtime is 1sec,999msecs, and nsecs is 1ms, resulting in a xtime_cache that is 2sec,0ms. get_seconds() (which is used by sys_time()) does not take the xtime_lock, which is ok as the xtime.tv_sec value is a long and can be atomically read safely. The problem occurs the next call to update_xtime_cache() if xtime has not increased: /* This sets xtime_cache back to 1sec, 999msec */ xtime_cache = xtime; /* get_seconds, calls here, and sees a 1second inconsistency */ timespec_add_ns(&xtime_cache, nsec); In order to resolve this, we could add locking to get_seconds(), but it needs to be lock free, as it is called from the machine check handler, opening a possible deadlock. So instead, this patch introduces an intermediate value for the calculations, so that we only assign xtime_cache once with the correct time, using ACCESS_ONCE to make sure the compiler doesn't optimize out any intermediate values. The xtime_cache manipulations were removed with 2.6.35, so that kernel and later do not need this change. In 2.6.33 and 2.6.34 the logarithmic accumulation should make it so xtime is updated each tick, so it is unlikely that two updates to xtime_cache could occur while the difference between xtime and xtime_cache crosses the second boundary. However, the paranoid might want to pull this into 2.6.33/34-longterm just to be sure. Thanks to Stephen for helping finally narrow down the root cause and many hours of help with testing and validation. Also thanks to Max, Andi, Eric and Paul for review of earlier attempts and helping clarify what is possible with regard to out of order execution. Acked-by: Eric Dumazet <eric.dumazet@gmail.com> Signed-off-by: John Stultz <johnstul@us.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> Signed-off-by: Willy Tarreau <w@1wt.eu>
-rw-r--r--kernel/time/timekeeping.c11
1 files changed, 9 insertions, 2 deletions
diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index 6e22c16c98bd..ac5f4e6d5279 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -52,8 +52,15 @@ int __read_mostly timekeeping_suspended;
static struct timespec xtime_cache __attribute__ ((aligned (16)));
void update_xtime_cache(u64 nsec)
{
- xtime_cache = xtime;
- timespec_add_ns(&xtime_cache, nsec);
+ /*
+ * Use temporary variable so get_seconds() cannot catch
+ * an intermediate xtime_cache.tv_sec value.
+ * The ACCESS_ONCE() keeps the compiler from optimizing
+ * out the intermediate value.
+ */
+ struct timespec ts = xtime;
+ timespec_add_ns(&ts, nsec);
+ ACCESS_ONCE(xtime_cache) = ts;
}
struct clocksource *clock;