summaryrefslogtreecommitdiff
path: root/include/linux
AgeCommit message (Collapse)Author
2007-12-18Revert "PNP: increase the maximum number of resources"Greg Kroah-Hartman
This reverts commit fc175adc1c935ea8679d76a78d7a58df34af16eb. There have been reports that it causes problems: http://bugzilla.kernel.org/show_bug.cgi?id=9514 people are still debating for 2.6.24 if it should be reverted or not, but as it causes a known problem, we will revert this for now. Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14PNP: increase the maximum number of resourcesZhao Yakui
patch a7839e960675b549f06209d18283d5cee2ce9261 in mainline. On some systems the number of resources(IO,MEM) returnedy by PNP device is greater than the PNP constant, for example motherboard devices. It brings that some resources can't be reserved and resource confilicts. This will cause PCI resources are assigned wrongly in some systems, and cause hang. This is a regression since we deleted ACPI motherboard driver and use PNP system driver. [akpm@linux-foundation.org: fix text and coding-style a bit] Signed-off-by: Li Shaohua <shaohua.li@intel.com> Signed-off-by: Zhao Yakui <yakui.zhao@intel.com> Cc: Bjorn Helgaas <bjorn.helgaas@hp.com> Cc: Thomas Renninger <trenn@suse.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14futex: fix for futex_wait signal stack corruptionSteven Rostedt
From Steven Rostedt <srostedt@redhat.com> patch ce6bd420f43b28038a2c6e8fbb86ad24014727b6 in mainline. David Holmes found a bug in the -rt tree with respect to pthread_cond_timedwait. After trying his test program on the latest git from mainline, I found the bug was there too. The bug he was seeing that his test program showed, was that if one were to do a "Ctrl-Z" on a process that was in the pthread_cond_timedwait, and then did a "bg" on that process, it would return with a "-ETIMEDOUT" but early. That is, the timer would go off early. Looking into this, I found the source of the problem. And it is a rather nasty bug at that. Here's the relevant code from kernel/futex.c: (not in order in the file) [...] smlinkage long sys_futex(u32 __user *uaddr, int op, u32 val, struct timespec __user *utime, u32 __user *uaddr2, u32 val3) { struct timespec ts; ktime_t t, *tp = NULL; u32 val2 = 0; int cmd = op & FUTEX_CMD_MASK; if (utime && (cmd == FUTEX_WAIT || cmd == FUTEX_LOCK_PI)) { if (copy_from_user(&ts, utime, sizeof(ts)) != 0) return -EFAULT; if (!timespec_valid(&ts)) return -EINVAL; t = timespec_to_ktime(ts); if (cmd == FUTEX_WAIT) t = ktime_add(ktime_get(), t); tp = &t; } [...] return do_futex(uaddr, op, val, tp, uaddr2, val2, val3); } [...] long do_futex(u32 __user *uaddr, int op, u32 val, ktime_t *timeout, u32 __user *uaddr2, u32 val2, u32 val3) { int ret; int cmd = op & FUTEX_CMD_MASK; struct rw_semaphore *fshared = NULL; if (!(op & FUTEX_PRIVATE_FLAG)) fshared = &current->mm->mmap_sem; switch (cmd) { case FUTEX_WAIT: ret = futex_wait(uaddr, fshared, val, timeout); [...] static int futex_wait(u32 __user *uaddr, struct rw_semaphore *fshared, u32 val, ktime_t *abs_time) { [...] struct restart_block *restart; restart = &current_thread_info()->restart_block; restart->fn = futex_wait_restart; restart->arg0 = (unsigned long)uaddr; restart->arg1 = (unsigned long)val; restart->arg2 = (unsigned long)abs_time; restart->arg3 = 0; if (fshared) restart->arg3 |= ARG3_SHARED; return -ERESTART_RESTARTBLOCK; [...] static long futex_wait_restart(struct restart_block *restart) { u32 __user *uaddr = (u32 __user *)restart->arg0; u32 val = (u32)restart->arg1; ktime_t *abs_time = (ktime_t *)restart->arg2; struct rw_semaphore *fshared = NULL; restart->fn = do_no_restart_syscall; if (restart->arg3 & ARG3_SHARED) fshared = &current->mm->mmap_sem; return (long)futex_wait(uaddr, fshared, val, abs_time); } So when the futex_wait is interrupt by a signal we break out of the hrtimer code and set up or return from signal. This code does not return back to userspace, so we set up a RESTARTBLOCK. The bug here is that we save the "abs_time" which is a pointer to the stack variable "ktime_t t" from sys_futex. This returns and unwinds the stack before we get to call our signal. On return from the signal we go to futex_wait_restart, where we update all the parameters for futex_wait and call it. But here we have a problem where abs_time is no longer valid. I verified this with print statements, and sure enough, what abs_time was set to ends up being garbage when we get to futex_wait_restart. The solution I did to solve this (with input from Linus Torvalds) was to add unions to the restart_block to allow system calls to use the restart with specific parameters. This way the futex code now saves the time in a 64bit value in the restart block instead of storing it on the stack. Note: I'm a bit nervious to add "linux/types.h" and use u32 and u64 in thread_info.h, when there's a #ifdef __KERNEL__ just below that. Not sure what that is there for. If this turns out to be a problem, I've tested this with using "unsigned int" for u32 and "unsigned long long" for u64 and it worked just the same. I'm using u32 and u64 just to be consistent with what the futex code uses. Signed-off-by: Steven Rostedt <srostedt@redhat.com> Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-12-14forcedeth: new mcp79 pci idsAyaz Abdulla
patch 490dde8990c55662596a4be71b5070bd7d382d4a in mainline. This patch adds new device ids and features for mcp79 devices into the forcedeth driver. Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de> index 92ce2e3..f9ba0ac 100644
2007-11-16libata: backport ATA_FLAG_NO_SRST and ATA_FLAG_ASSUME_ATATejun Heo
Differs from mainline, but the functionality is already there. Backport ATA_FLAG_NO_SRST and ATA_FLAG_ASSUME_ATA. These are originally link flags (ATA_LFLAG_*) but link abstraction doesn't exist on 2.6.23, so make it port flags. This is for the following workaround for ASUS P5W DH Deluxe. These new flags don't introduce any behavior change unless set and nobody sets them yet. Signed-off-by: Tejun Heo <htejun@gmail.com> Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16ide: Add ide_get_paired_drive() helperBenjamin Herrenschmidt
patch 1b678347121001c3c230c6eccfdf9f65c3ec1a4e in mainline. This adds a helper to get to the "other" drive on a pair connected to a given hwif. Signed-off-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Cc: Andrew Morton <akpm@osdl.org> Acked-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16USB: remove USB_QUIRK_NO_AUTOSUSPENDAlan Stern
patch a691efa9888e71232dfb4088fb8a8304ffc7b0f9 in mainline. This patch (as995) cleans up the remains of the former NO_AUTOSUSPEND quirk. Since autosuspend is disabled by default, we will let userspace worry about which devices can safely be suspended. Thus the lengthy series of quirk entries is no longer needed, and neither is the quirk ID. I suppose someone might eventually run across a hub that can't be suspended; let's ignore the possibility for now. The patch also cleans up the hasty way in which autosuspend gets disabled. Setting udev->autosuspend_delay to -1 wasn't quite right, because the value is always supposed to be a multiple of HZ. It's better to leave the delay value alone and set autosuspend_disabled, which is what the quirk routine used to do. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16forcedeth: add MCP77 device IDsAyaz Abdulla
patch 96fd4cd3e40e240f0c385af87f58e74da8b7099a in mainline. Signed-off-by: Ayaz Abdulla <aabdulla@nvidia.com> Signed-off-by: Jeff Garzik <jgarzik@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16Fix netlink timeouts.Patrick McHardy
[NETLINK]: Fix unicast timeouts [ Upstream commit: c3d8d1e30cace31fed6186a4b8c6b1401836d89c ] Commit ed6dcf4a in the history.git tree broke netlink_unicast timeouts by moving the schedule_timeout() call to a new function that doesn't propagate the remaining timeout back to the caller. This means on each retry we start with the full timeout again. ipc/mqueue.c seems to actually want to wait indefinitely so this behaviour is retained. Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16Fix SKB_WITH_OVERHEAD calculations.Herbert Xu
patch deea84b0ae3d26b41502ae0a39fe7fe134e703d0 in mainline. [NET]: Fix SKB_WITH_OVERHEAD calculation The calculation in SKB_WITH_OVERHEAD is incorrect in that it can cause an overflow across a page boundary which is what it's meant to prevent. In particular, the header length (X) should not be lumped together with skb_shared_info. The latter needs to be aligned properly while the header has no choice but to sit in front of wherever the payload is. Therefore the correct calculation is to take away the aligned size of skb_shared_info, and then subtract the header length. The resulting quantity L satisfies the following inequality: SKB_DATA_ALIGN(L + X) + sizeof(struct skb_shared_info) <= PAGE_SIZE This is the quantity used by alloc_skb to do the actual allocation. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16revert "x86_64: allocate sparsemem memmap above 4G"Linus Torvalds
Reverted upstream by commit 6a22c57b8d2a62dea7280a6b2ac807a539ef0716 Revert this commit: commit 2e1c49db4c640b35df13889b86b9d62215ade4b6 Author: Zou Nan hai <nanhai.zou@intel.com> Date: Fri Jun 1 00:46:28 2007 -0700 x86_64: allocate sparsemem memmap above 4G This reverts commit 2e1c49db4c640b35df13889b86b9d62215ade4b6. First off, testing in Fedora has shown it to cause boot failures, bisected down by Martin Ebourne, and reported by Dave Jobes. So the commit will likely be reverted in the 2.6.23 stable kernels. Secondly, in the 2.6.24 model, x86-64 has now grown support for SPARSEMEM_VMEMMAP, which disables the relevant code anyway, so while the bug is not visible any more, it's become invisible due to the code just being irrelevant and no longer enabled on the only architecture that this ever affected. Reported-by: Dave Jones <davej@redhat.com> Tested-by: Martin Ebourne <fedora@ebourne.me.uk> Cc: Zou Nan hai <nanhai.zou@intel.com> Cc: Suresh Siddha <suresh.b.siddha@intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Acked-by: Andy Whitcroft <apw@shadowen.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Chuck Ebbert <cebbert@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16BLOCK: Fix bad sharing of tag busy list on queues with shared tag mapsJens Axboe
patch 6eca9004dfcb274a502438a591df5b197690afb1 in mainline. For the locking to work, only the tag map and tag bit map may be shared (incidentally, I was just explaining this to Nick yesterday, but I apparently didn't review the code well enough myself). But we also share the busy list! The busy_list must be queue private, or we need a block_queue_tag covering lock as well. So we have to move the busy_list to the queue. This'll work fine, and it'll actually also fix a problem with blk_queue_invalidate_tags() which will invalidate tags across all shared queues. This is a bit confusing, the low level driver should call it for each queue seperately since otherwise you cannot kill tags on just a single queue for eg a hard drive that stops responding. Since the function has no callers currently, it's not an issue. This is fixed with commit 6eca9004dfcb274a502438a591df5b197690afb1 in Linus' tree. Signed-off-by: Jens Axboe <jens.axboe@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-11-16sched: keep utime/stime monotonicFrans Pop
sched: keep utime/stime monotonic cpustats use utime/stime as a ratio against sum_exec_runtime, as a consequence it can happen - when the ratio changes faster than time accumulates - that either can be appear to go backwards. Combined backport for 2.6.23 of the following patches from mainline: commit 73a2bcb0edb9ffb0b007b3546b430e2c6e415eee Author: Peter Zijlstra <a.p.zijlstra@chello.nl> sched: keep utime/stime monotonic commit 9301899be75b464ef097f0b5af7af6d9bd8f68a7 Author: Balbir Singh <balbir@linux.vnet.ibm.com> sched: fix /proc/<PID>/stat stime/utime monotonicity, part 2 Signed-off-by: Frans Pop <elendil@planet.nl> CC: Peter Zijlstra <a.p.zijlstra@chello.nl> CC: Balbir Singh <balbir@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-10-08mm: set_page_dirty_balance() vs ->page_mkwrite()Peter Zijlstra
All the current page_mkwrite() implementations also set the page dirty. Which results in the set_page_dirty_balance() call to _not_ call balance, because the page is already found dirty. This allows us to dirty a _lot_ of pages without ever hitting balance_dirty_pages(). Not good (tm). Force a balance call if ->page_mkwrite() was successful. Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-10-07Don't do load-average calculations at even 5-second intervalsLinus Torvalds
It turns out that there are a few other five-second timers in the kernel, and if the timers get in sync, the load-average can get artificially inflated by events that just happen to coincide. So just offset the load average calculation it by a timer tick. Noticed by Anders Boström, for whom the coincidence started triggering on one of his machines with the JBD jiffies rounding code (JBD is one of the subsystems that also end up using a 5-second timer by default). Tested-by: Anders Boström <anders@bostrom.dyndns.org> Cc: Chuck Ebbert <cebbert@redhat.com> Cc: Arjan van de Ven <arjan@linux.intel.com> Cc: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-26Revert "[PATCH] x86-64: fix x86_64-mm-sched-clock-share"Linus Torvalds
This reverts commit 184c44d2049c4db7ef6ec65794546954da2c6a0e. As noted by Dave Jones: "Linus, please revert the above cset. It doesn't seem to be necessary (it was added to fix a miscompile in 'make allnoconfig' which doesn't seem to be repeatable with it reverted) and actively breaks the ARM SA1100 framebuffer driver." Requested-by: Dave Jones <davej@redhat.com> Cc: Russell King <rmk+lkml@arm.linux.org.uk> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Andi Kleen <ak@suse.de> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-20signalfd simplificationDavide Libenzi
This simplifies signalfd code, by avoiding it to remain attached to the sighand during its lifetime. In this way, the signalfd remain attached to the sighand only during poll(2) (and select and epoll) and read(2). This also allows to remove all the custom "tsk == current" checks in kernel/signal.c, since dequeue_signal() will only be called by "current". I think this is also what Ben was suggesting time ago. The external effect of this, is that a thread can extract only its own private signals and the group ones. I think this is an acceptable behaviour, in that those are the signals the thread would be able to fetch w/out signalfd. Signed-off-by: Davide Libenzi <davidel@xmailserver.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-19sched: add /proc/sys/kernel/sched_compat_yieldIngo Molnar
add /proc/sys/kernel/sched_compat_yield to make sys_sched_yield() more agressive, by moving the yielding task to the last position in the rbtree. with sched_compat_yield=0: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2539 mingo 20 0 1576 252 204 R 50 0.0 0:02.03 loop_yield 2541 mingo 20 0 1576 244 196 R 50 0.0 0:02.05 loop with sched_compat_yield=1: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2584 mingo 20 0 1576 248 196 R 99 0.0 0:52.45 loop 2582 mingo 20 0 1576 256 204 R 0 0.0 0:00.00 loop_yield Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
2007-09-19Fix NUMA Memory Policy Reference CountingLee Schermerhorn
This patch proposes fixes to the reference counting of memory policy in the page allocation paths and in show_numa_map(). Extracted from my "Memory Policy Cleanups and Enhancements" series as stand-alone. Shared policy lookup [shmem] has always added a reference to the policy, but this was never unrefed after page allocation or after formatting the numa map data. Default system policy should not require additional ref counting, nor should the current task's task policy. However, show_numa_map() calls get_vma_policy() to examine what may be [likely is] another task's policy. The latter case needs protection against freeing of the policy. This patch adds a reference count to a mempolicy returned by get_vma_policy() when the policy is a vma policy or another task's mempolicy. Again, shared policy is already reference counted on lookup. A matching "unref" [__mpol_free()] is performed in alloc_page_vma() for shared and vma policies, and in show_numa_map() for shared and another task's mempolicy. We can call __mpol_free() directly, saving an admittedly inexpensive inline NULL test, because we know we have a non-NULL policy. Handling policy ref counts for hugepages is a bit trickier. huge_zonelist() returns a zone list that might come from a shared or vma 'BIND policy. In this case, we should hold the reference until after the huge page allocation in dequeue_hugepage(). The patch modifies huge_zonelist() to return a pointer to the mempolicy if it needs to be unref'd after allocation. Kernel Build [16cpu, 32GB, ia64] - average of 10 runs: w/o patch w/ refcount patch Avg Std Devn Avg Std Devn Real: 100.59 0.38 100.63 0.43 User: 1209.60 0.37 1209.91 0.31 System: 81.52 0.42 81.64 0.34 Signed-off-by: Lee Schermerhorn <lee.schermerhorn@hp.com> Acked-by: Andi Kleen <ak@suse.de> Cc: Christoph Lameter <clameter@sgi.com> Acked-by: Mel Gorman <mel@csn.ul.ie> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-19Fix user namespace exiting OOPsPavel Emelyanov
It turned out, that the user namespace is released during the do_exit() in exit_task_namespaces(), but the struct user_struct is released only during the put_task_struct(), i.e. MUCH later. On debug kernels with poisoned slabs this will cause the oops in uid_hash_remove() because the head of the chain, which resides inside the struct user_namespace, will be already freed and poisoned. Since the uid hash itself is required only when someone can search it, i.e. when the namespace is alive, we can safely unhash all the user_struct-s from it during the namespace exiting. The subsequent free_uid() will complete the user_struct destruction. For example simple program #include <sched.h> char stack[2 * 1024 * 1024]; int f(void *foo) { return 0; } int main(void) { clone(f, stack + 1 * 1024 * 1024, 0x10000000, 0); return 0; } run on kernel with CONFIG_USER_NS turned on will oops the kernel immediately. This was spotted during OpenVZ kernel testing. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Acked-by: "Serge E. Hallyn" <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-19Convert uid hash to hlistPavel Emelyanov
Surprisingly, but (spotted by Alexey Dobriyan) the uid hash still uses list_heads, thus occupying twice as much place as it could. Convert it to hlist_heads. Signed-off-by: Pavel Emelyanov <xemul@openvz.org> Signed-off-by: Alexey Dobriyan <adobriyan@openvz.org> Acked-by: Serge Hallyn <serue@us.ibm.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-16Merge branch 'master' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [VLAN]: Fix net_device leak. [PPP] generic: Fix receive path data clobbering & non-linear handling [PPP] generic: Call skb_cow_head before scribbling over skb [NET] skbuff: Add skb_cow_head [BRIDGE]: Kill clone argument to br_flood_* [PPP] pppoe: Fill in header directly in __pppoe_xmit [PPP] pppoe: Fix data clobbering in __pppoe_xmit and return value [PPP] pppoe: Fix skb_unshare_check call position [SCTP]: Convert bind_addr_list locking to RCU [SCTP]: Add RCU synchronization around sctp_localaddr_list [PKT_SCHED]: sch_cbq.c: Shut up uninitialized variable warning [PKTGEN]: srcmac fix [IPV6]: Fix source address selection. [IPV4]: Just increment OutDatagrams once per a datagram. [IPV6]: Just increment OutDatagrams once per a datagram. [IPV6]: Fix unbalanced socket reference with MSG_CONFIRM. [NET_SCHED] protect action config/dump from irqs [NET]: Fix two issues wrt. SO_BINDTODEVICE.
2007-09-16Fix non-ISA link error in drivers/scsi/advansys.cMatthew Wilcox
When CONFIG_ISA is disabled, the isa_driver support will not be compiled in. Define stubs so that we don't get link-time errors. Signed-off-by: Matthew Wilcox <matthew@wil.cx> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-16[NET] skbuff: Add skb_cow_headHerbert Xu
This patch adds an optimised version of skb_cow that avoids the copy if the header can be modified even if the rest of the payload is cloned. This can be used in encapsulating paths where we only need to modify the header. As it is, this can be used in PPPOE and bridging. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-12Merge branch 'for-linus' of git://git.o-hand.com/linux-rpurdie-ledsLinus Torvalds
* 'for-linus' of git://git.o-hand.com/linux-rpurdie-leds: leds: Add missing include for leds.h
2007-09-12Merge branch 'for-linus' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input: Input: usbtouchscreen - correctly set 'phys' Input: i8042 - add HP Pavilion DV4270ca to the MUX blacklist Input: i8042 - fix modpost warning Input: add more Braille keycodes
2007-09-11Fix select on /proc files without ->pollAlexey Dobriyan
Taneli Vähäkangas <vahakang@cs.helsinki.fi> reported that commit 786d7e1612f0b0adb6046f19b906609e4fe8b1ba aka "Fix rmmod/read/write races in /proc entries" broke SBCL + SLIME combo. The old code in do_select() used DEFAULT_POLLMASK, if couldn't find ->poll handler. The new code makes ->poll always there and returns 0 by default, which is not correct. Return DEFAULT_POLLMASK instead. Steps to reproduce: install emacs, SBCL, SLIME emacs M-x slime in *inferior-lisp* buffer [watch it doing "Connecting to Swank on port X.."] Please, apply before 2.6.23. P.S.: why SBCL can't just read(2) /proc/cpuinfo is a mystery. Signed-off-by: Alexey Dobriyan <adobriyan@gmail.com> Cc: T Taneli Vahakangas <vahakang@cs.helsinki.fi> Cc: Oleg Nesterov <oleg@tv-sign.ru> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-11PTR_ALIGNMatthew Wilcox
The AdvanSys driver wants to align some pointers, and the ALIGN macro doesn't work for pointers. Rather than try to make it work, add a new PTR_ALIGN macro which is typesafe. Signed-off-by: Matthew Wilcox <matthew@wil.cx> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-09-11Merge master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/bart/ide-2.6: pdc202xx_new: PLL detection fix via82cxxx: add Arima W730-K8 and other rebadgings to short cables list pmac: build fix pata_ali/alim15x3: override 80-wire cable detection for Toshiba S1800-814 hpt366: UltraDMA filter for SATA cards (take 2) ide: add ide_dev_is_sata() helper (take 2) hpt366: fix PCI clock detection for HPT374 (take 4) pdc202xx_new: fix PCI refcounting ide: fix PCI refcounting mpc8xx: Only build mpc8xx on arch/ppc
2007-09-11leds: Add missing include for leds.hYoichi Yuasa
This patch has added #include <linux/spinlock.h> to include/linux/leds.h for rwlock_t. Signed-off-by: Yoichi Yuasa <yoichi_yuasa@tripeaks.co.jp> Signed-off-by: Richard Purdie <rpurdie@rpsys.net>
2007-09-11ide: add ide_dev_is_sata() helper (take 2)Sergei Shtylyov
Make the SATA drive detection code from eighty_ninty_three() into inline ide_dev_is_sata() helper fixing it along the way to be more strict while checking word 80 for the reserved values... Signed-off-by: Sergei Shtylyov <sshtylyov@ru.mvista.com> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
2007-09-11Merge master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6Linus Torvalds
* master.kernel.org:/pub/scm/linux/kernel/git/gregkh/pci-2.6: PCI: irq and pci_ids patch for Intel Tolapai PCI: unhide SMBus on Compaq Deskpro EP 401963-001 motherboard PCI: Remove __devinit from pcibios_get_irq_routing_table PCI: remove devinit from pci_read_bridge_bases PCI AER: fix warnings when PCIEAER=n
2007-09-11PCI: irq and pci_ids patch for Intel TolapaiJason Gaston
This patch adds the Intel Tolapai LPC and SMBus Controller DID's. Signed-off-by: Jason Gaston <jason.d.gaston@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-09-11PCI AER: fix warnings when PCIEAER=nRandy Dunlap
Fix warnings when CONFIG_PCIEAER=n: drivers/pci/pcie/portdrv_pci.c:105: warning: statement with no effect drivers/pci/pcie/portdrv_pci.c:226: warning: statement with no effect drivers/scsi/arcmsr/arcmsr_hba.c:352: warning: statement with no effect Signed-off-by: Randy Dunlap <randy.dunlap@oracle.com> Acked-by: Linas Vepstas <linas@austin.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@suse.de>
2007-09-11[NETFILTER]: Fix/improve deadlock condition on module removal netfilterNeil Horman
So I've had a deadlock reported to me. I've found that the sequence of events goes like this: 1) process A (modprobe) runs to remove ip_tables.ko 2) process B (iptables-restore) runs and calls setsockopt on a netfilter socket, increasing the ip_tables socket_ops use count 3) process A acquires a file lock on the file ip_tables.ko, calls remove_module in the kernel, which in turn executes the ip_tables module cleanup routine, which calls nf_unregister_sockopt 4) nf_unregister_sockopt, seeing that the use count is non-zero, puts the calling process into uninterruptible sleep, expecting the process using the socket option code to wake it up when it exits the kernel 4) the user of the socket option code (process B) in do_ipt_get_ctl, calls ipt_find_table_lock, which in this case calls request_module to load ip_tables_nat.ko 5) request_module forks a copy of modprobe (process C) to load the module and blocks until modprobe exits. 6) Process C. forked by request_module process the dependencies of ip_tables_nat.ko, of which ip_tables.ko is one. 7) Process C attempts to lock the request module and all its dependencies, it blocks when it attempts to lock ip_tables.ko (which was previously locked in step 3) Theres not really any great permanent solution to this that I can see, but I've developed a two part solution that corrects the problem Part 1) Modifies the nf_sockopt registration code so that, instead of using a use counter internal to the nf_sockopt_ops structure, we instead use a pointer to the registering modules owner to do module reference counting when nf_sockopt calls a modules set/get routine. This prevents the deadlock by preventing set 4 from happening. Part 2) Enhances the modprobe utilty so that by default it preforms non-blocking remove operations (the same way rmmod does), and add an option to explicity request blocking operation. So if you select blocking operation in modprobe you can still cause the above deadlock, but only if you explicity try (and since root can do any old stupid thing it would like.... :) ). Signed-off-by: Neil Horman <nhorman@tuxdriver.com> Signed-off-by: Patrick McHardy <kaber@trash.net> Signed-off-by: David S. Miller <davem@davemloft.net>
2007-09-10[libata, IDE] add new VIA bridge to VIA PATA driversJoseph Chan
Signed-off-by: Joseph Chan <josephchan@via.com.tw> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-09-04Input: add more Braille keycodesSamuel Thibault
Some braille keyboards have 10 dots, so extend the Input braille keys definitions. Signed-off-by: Samuel Thibault <samuel.thibault@ens-lyon.org> Signed-off-by: Dmitry Torokhov <dtor@mail.ru>
2007-09-01NFS: Fix a write request leak in nfs_invalidate_page()Trond Myklebust
Ryusuke Konishi says: The recent truncate_complete_page() clears the dirty flag from a page before calling a_ops->invalidatepage(), ^^^^^^ static void truncate_complete_page(struct address_space *mapping, struct page *page) { ... cancel_dirty_page(page, PAGE_CACHE_SIZE); <--- Inserted here at kernel 2.6.20 if (PagePrivate(page)) do_invalidatepage(page, 0); ---> will call a_ops->invalidatepage() ... } and this is disturbing nfs_wb_page_priority() from calling nfs_writepage_locked() that is expected to handle the pending request (=nfs_page) associated with the page. int nfs_wb_page_priority(struct inode *inode, struct page *page, int how) { ... if (clear_page_dirty_for_io(page)) { ret = nfs_writepage_locked(page, &wbc); if (ret < 0) goto out; } ... } Since truncate_complete_page() will get rid of the page after a_ops->invalidatepage() returns, the request (=nfs_page) associated with the page becomes a garbage in nfs_inode->nfs_page_tree. ------------------------ Fix this by ensuring that nfs_wb_page_priority() recognises that it may also need to clear out non-dirty pages that have an nfs_page associated with them. Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
2007-08-31Merge git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-schedLinus Torvalds
* git://git.kernel.org/pub/scm/linux/kernel/git/mingo/linux-2.6-sched: sched: clean up task_new_fair() sched: small schedstat fix sched: fix wait_start_fair condition in update_stats_wait_end() sched: call update_curr() in task_tick_fair() sched: make the scheduler converge to the ideal latency sched: fix sleeper bonus limit
2007-08-31Merge branch 'upstream-linus' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of master.kernel.org:/pub/scm/linux/kernel/git/jgarzik/libata-dev: [libata] Bump driver versions ata_piix: implement IOCFG bit18 quirk libata: implement BROKEN_HPA horkage and apply it to affected drives sata_promise: FastTrack TX4200 is a second-generation chip pata_marvell: Add more identifiers ata_piix: add Satellite U200 to broken suspend list ata: add ATA_MWDMA* and ATA_SWDMA* defines ata_piix: IDE mode SATA patch for Intel Tolapai libata-core: Allow translation setting to fail
2007-08-31hugepage: fix broken check for offset alignment in hugepage mappingsDavid Gibson
For hugepage mappings, the file offset, like the address and size, needs to be aligned to the size of a hugepage. In commit 68589bc353037f233fe510ad9ff432338c95db66, the check for this was moved into prepare_hugepage_range() along with the address and size checks. But since BenH's rework of the get_unmapped_area() paths leading up to commit 4b1d89290b62bb2db476c94c82cf7442aab440c8, prepare_hugepage_range() is only called for MAP_FIXED mappings, not for other mappings. This means we're no longer ever checking for an aligned offset - I've confirmed that mmap() will (apparently) succeed with a misaligned offset on both powerpc and i386 at least. This patch restores the check, removing it from prepare_hugepage_range() and putting it back into hugetlbfs_file_mmap(). I'm putting it there, rather than in the get_unmapped_area() path so it only needs to go in one place, than separately in the half-dozen or so arch-specific implementations of hugetlb_get_unmapped_area(). Signed-off-by: David Gibson <david@gibson.dropbear.id.au> Cc: Adam Litke <agl@us.ibm.com> Cc: Andi Kleen <ak@suse.de> Cc: "David S. Miller" <davem@davemloft.net> Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-31i2c-piix4: Fix SB700 PCI device IDShane Huang
We find that SB700 and SB800 use the same SMBus device ID as SB600, which is 0x4385, instead of the already submitted 0x4395. Besides removing the wrong SB700 device ID, add SB800 support to kernel, by renaming the PCI_DEVICE_ID_ATI_IXP600_SMBUS into PCI_DEVICE_ID_ATI_SBX00_SMBUS. Signed-off-by: Shane Huang <shane.huang@amd.com> Signed-off-by: Jean Delvare <khali@linux-fr.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-31PM: Fix dependencies of CONFIG_SUSPEND and CONFIG_HIBERNATIONRafael J. Wysocki
Dependencies of CONFIG_SUSPEND and CONFIG_HIBERNATION introduced by commit 296699de6bdc717189a331ab6bbe90e05c94db06 "Introduce CONFIG_SUSPEND for suspend-to-Ram and standby" are incorrect, as they don't cover the facts that (1) not all architectures support suspend and (2) SMP hibernation is only possible on X86 and PPC64 (if CONFIG_PPC64_SWSUSP is set). Signed-off-by: Rafael J. Wysocki <rjw@sisk.pl> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-31libata: implement BROKEN_HPA horkage and apply it to affected drivesTejun Heo
Some drives choke on READ_NATIVE_MAX_ADDRESS[_EXT]. Implement ATA_HORKAGE_BROKEN_HPA and apply it to affected drives. Signed-off-by: Tejun Heo <htejun@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-08-31SLUB: Force inlining for functions in slub_def.hChristoph Lameter
Some compilers (especially older gcc releases) may skip inlining sometimes which will lead to link failures. Force the inlining of keyfunctions in slub_def.h to avoid these issues. Signed-off-by: Christoph Lameter <clameter@sgi.com> Acked-by: Jan Dittmer <jdi@l4x.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
2007-08-31ata: add ATA_MWDMA* and ATA_SWDMA* definesBartlomiej Zolnierkiewicz
Cc: Jeff Garzik <jeff@garzik.org> Signed-off-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com> Signed-off-by: Jeff Garzik <jeff@garzik.org>
2007-08-30[TCP]: Allow minimum RTO to be configurable via routing metrics.David S. Miller
Cell phone networks do link layer retransmissions and other things that cause unnecessary timeout retransmits. So allow the minimum RTO to be inflated per-route to deal with this. Signed-off-by: David S. Miller <davem@davemloft.net>
2007-08-28sched: make the scheduler converge to the ideal latencyIngo Molnar
de-HZ-ification of the granularity defaults unearthed a pre-existing property of CFS: while it correctly converges to the granularity goal, it does not prevent run-time fluctuations in the range of [-gran ... 0 ... +gran]. With the increase of the granularity due to the removal of HZ dependencies, this becomes visible in chew-max output (with 5 tasks running): out: 28 . 27. 32 | flu: 0 . 0 | ran: 9 . 13 | per: 37 . 40 out: 27 . 27. 32 | flu: 0 . 0 | ran: 17 . 13 | per: 44 . 40 out: 27 . 27. 32 | flu: 0 . 0 | ran: 9 . 13 | per: 36 . 40 out: 29 . 27. 32 | flu: 2 . 0 | ran: 17 . 13 | per: 46 . 40 out: 28 . 27. 32 | flu: 0 . 0 | ran: 9 . 13 | per: 37 . 40 out: 29 . 27. 32 | flu: 0 . 0 | ran: 18 . 13 | per: 47 . 40 out: 28 . 27. 32 | flu: 0 . 0 | ran: 9 . 13 | per: 37 . 40 average slice is the ideal 13 msecs and the period is picture-perfect 40 msecs. But the 'ran' field fluctuates around 13.33 msecs and there's no mechanism in CFS to keep that from happening: it's a perfectly valid solution that CFS finds. to fix this we add a granularity/preemption rule that knows about the "target latency", which makes tasks that run longer than the ideal latency run a bit less. The simplest approach is to simply decrease the preemption granularity when a task overruns its ideal latency. For this we have to track how much the task executed since its last preemption. ( this adds a new field to task_struct, but we can eliminate that overhead in 2.6.24 by putting all the scheduler timestamps into an anonymous union. ) with this change in place, chew-max output is fluctuation-less all around: out: 28 . 27. 39 | flu: 0 . 2 | ran: 13 . 13 | per: 41 . 40 out: 28 . 27. 39 | flu: 0 . 2 | ran: 13 . 13 | per: 41 . 40 out: 28 . 27. 39 | flu: 0 . 2 | ran: 13 . 13 | per: 41 . 40 out: 28 . 27. 39 | flu: 0 . 2 | ran: 13 . 13 | per: 41 . 40 out: 28 . 27. 39 | flu: 0 . 1 | ran: 13 . 13 | per: 41 . 40 out: 28 . 27. 39 | flu: 0 . 1 | ran: 13 . 13 | per: 41 . 40 this patch has no impact on any fastpath or on any globally observable scheduling property. (unless you have sharp enough eyes to see millisecond-level ruckles in glxgears smoothness :-) Signed-off-by: Ingo Molnar <mingo@elte.hu> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl> Signed-off-by: Mike Galbraith <efault@gmx.de>
2007-08-27Merge branch 'master' of ↵Linus Torvalds
master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6 * 'master' of master.kernel.org:/pub/scm/linux/kernel/git/davem/net-2.6: [NET]: Mark Paul Moore as maintainer of labelled networking. [VLAN/BRIDGE]: Fix "skb_pull_rcsum - Fatal exception in interrupt" [ISDN]: Get rid of some pointless allocation casts in common and bsd comp. [NET]: Avoid pointless allocation casts in BSD compression module [IRDA]: Do not do pointless kmalloc return value cast in KingSun driver [NET]: Fix crash in dev_mc_sync()/dev_mc_unsync() [PPPOL2TP]: Fix endianness annotations. [IOAT]: ioatdma needs to to play nice in a multi-dma-client world [SLIP]: trivial sparse warning fix [EQL]: sparse warning fix [NET]: is_power_of_2 in net/core/neighbour.c [TCP]: Describe tcp_init_cwnd() thoroughly in a comment. [NET]: Fix IP_ADD/DROP_MEMBERSHIP to handle only connectionless [KBUILD]: Sanitize tc_ematch headers. [IPSEC] AH4: Update IPv4 options handling to conform to RFC 4302.
2007-08-27Merge branch 'merge' of ↵Linus Torvalds
git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc * 'merge' of git://git.kernel.org/pub/scm/linux/kernel/git/paulus/powerpc: [POWERPC] Fix SLB initialization at boot time [POWERPC] Fix undefined reference to device_power_up/resume [POWERPC] cell: Update cell_defconfig for 2.6.23 [POWERPC] axonram: Do not delete gendisks queue in error path [POWERPC] axonram: Module modification for latest firmware API changes [POWERPC] cell: Support pinhole-reset on IBM cell blades [POWERPC] spu_manage: Use newer physical-id attribute [POWERPC] pasemi: Another IOMMU bugfix for 64K PAGE_SIZE