summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2014-06-30Linux 3.14.10v3.14.10Greg Kroah-Hartman
2014-06-30efi-pstore: Fix an overflow on 32-bit buildsAndrzej Zaborowski
commit 783ee43118dc773bc8b0342c5b230e017d5a04d0 upstream. In generic_id the long int timestamp is multiplied by 100000 and needs an explicit cast to u64. Without that the id in the resulting pstore filename is wrong and userspace may have problems parsing it, but more importantly files in pstore can never be deleted and may fill the EFI flash (brick device?). This happens because when generic pstore code wants to delete a file, it passes the id to the EFI backend which reinterpretes it and a wrong variable name is attempted to be deleted. There's no error message but after remounting pstore, deleted files would reappear. Signed-off-by: Andrew Zaborowski <andrew.zaborowski@intel.com> Acked-by: David Rientjes <rientjes@google.com> Signed-off-by: Matt Fleming <matt.fleming@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30builddeb: use $OBJCOPY variable instead of objcopyFathi Boudra
commit 6b4a144a92ab81a1f45fb9b12aebaaaee0d08120 upstream. In cross-build environment, we expect to use the cross-compiler objcopy instead of the host objcopy. It fixes following build failures: objcopy --only-keep-debug lib/modules/3.14/kernel/net/ipv6/xfrm6_mode_tunnel.ko /srv/build/linux/debian/dbgtmp/usr/lib/debug/lib/modules/3.14/kernel/net/ipv6/xfrm6_mode_tunnel.ko objcopy: Unable to recognise the format of the input file `lib/modules/3.14/kernel/net/ipv6/xfrm6_mode_tunnel.ko' Signed-off-by: Fathi Boudra <fathi.boudra@linaro.org> Fixes: 810e843746b7 ('deb-pkg: split debug symbols in their own package') Reviewed-by: Ben Hutchings <ben@decadent.org.uk> Signed-off-by: Michal Marek <mmarek@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30epoll: fix use-after-free in eventpoll_release_fileKonstantin Khlebnikov
commit ebe06187bf2aec10d537ce4595e416035367d703 upstream. This fixes use-after-free of epi->fllink.next inside list loop macro. This loop actually releases elements in the body. The list is rcu-protected but here we cannot hold rcu_read_lock because we need to lock mutex inside. The obvious solution is to use list_for_each_entry_safe(). RCU-ness isn't essential because nobody can change this list under us, it's final fput for this file. The bug was introduced by ae10b2b4eb01 ("epoll: optimize EPOLL_CTL_DEL using rcu") Signed-off-by: Konstantin Khlebnikov <koct9i@gmail.com> Reported-by: Cyrill Gorcunov <gorcunov@openvz.org> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Jason Baron <jbaron@akamai.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30x86_32, entry: Do syscall exit work on badsys (CVE-2014-4508)Andy Lutomirski
commit 554086d85e71f30abe46fc014fea31929a7c6a8a upstream. The bad syscall nr paths are their own incomprehensible route through the entry control flow. Rearrange them to work just like syscalls that return -ENOSYS. This fixes an OOPS in the audit code when fast-path auditing is enabled and sysenter gets a bad syscall nr (CVE-2014-4508). This has probably been broken since Linux 2.6.27: af0575bba0 i386 syscall audit fast-path Cc: Roland McGrath <roland@redhat.com> Reported-by: Toralf Förster <toralf.foerster@gmx.de> Signed-off-by: Andy Lutomirski <luto@amacapital.net> Link: http://lkml.kernel.org/r/e09c499eade6fc321266dd6b54da7beb28d6991c.1403558229.git.luto@amacapital.net Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30lz4: fix another possible overrunGreg Kroah-Hartman
commit 4148c1f67abf823099b2d7db6851e4aea407f5ee upstream. There is one other possible overrun in the lz4 code as implemented by Linux at this point in time (which differs from the upstream lz4 codebase, but will get synced at in a future kernel release.) As pointed out by Don, we also need to check the overflow in the data itself. While we are at it, replace the odd error return value with just a "simple" -1 value as the return value is never used for anything other than a basic "did this work or not" check. Reported-by: "Don A. Bailey" <donb@securitymouse.com> Reported-by: Willy Tarreau <w@1wt.eu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30btrfs: allocate raid type kobjects dynamicallyJeff Mahoney
commit c1895442be01c58449e3bf9272f22062a670e08f upstream. We are currently allocating space_info objects in an array when we allocate space_info. When a user does something like: # btrfs balance start -mconvert=raid1 -dconvert=raid1 /mnt # btrfs balance start -mconvert=single -dconvert=single /mnt -f # btrfs balance start -mconvert=raid1 -dconvert=raid1 / We can end up with memory corruption since the kobject hasn't been reinitialized properly and the name pointer was left set. The rationale behind allocating them statically was to avoid creating a separate kobject container that just contained the raid type. It used the index in the array to determine the index. Ultimately, though, this wastes more memory than it saves in all but the most complex scenarios and introduces kobject lifetime questions. This patch allocates the kobjects dynamically instead. Note that we also remove the kobject_get/put of the parent kobject since kobject_add and kobject_del do that internally. Signed-off-by: Jeff Mahoney <jeffm@suse.com> Reported-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30btrfs: fix lockdep warning with reclaim lock inversionJeff Mahoney
commit ed55b6ac077fe7f9c6490ff55172c4b563562d7c upstream. When encountering memory pressure, testers have run into the following lockdep warning. It was caused by __link_block_group calling kobject_add with the groups_sem held. kobject_add calls kvasprintf with GFP_KERNEL, which gets us into reclaim context. The kobject doesn't actually need to be added under the lock -- it just needs to ensure that it's only added for the first block group to be linked. ========================================================= [ INFO: possible irq lock inversion dependency detected ] 3.14.0-rc8-default #1 Not tainted --------------------------------------------------------- kswapd0/169 just changed the state of lock: (&delayed_node->mutex){+.+.-.}, at: [<ffffffffa018baea>] __btrfs_release_delayed_node+0x3a/0x200 [btrfs] but this lock took another, RECLAIM_FS-unsafe lock in the past: (&found->groups_sem){+++++.} and interrupts could create inverse lock ordering between them. other info that might help us debug this: Possible interrupt unsafe locking scenario: CPU0 CPU1 ---- ---- lock(&found->groups_sem); local_irq_disable(); lock(&delayed_node->mutex); lock(&found->groups_sem); <Interrupt> lock(&delayed_node->mutex); *** DEADLOCK *** 2 locks held by kswapd0/169: #0: (shrinker_rwsem){++++..}, at: [<ffffffff81159e8a>] shrink_slab+0x3a/0x160 #1: (&type->s_umount_key#27){++++..}, at: [<ffffffff811bac6f>] grab_super_passive+0x3f/0x90 Signed-off-by: Jeff Mahoney <jeffm@suse.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30btrfs: fix use of uninit "ret" in end_extent_writepage()Eric Sandeen
commit 3e2426bd0eb980648449e7a2f5a23e3cd3c7725c upstream. If this condition in end_extent_writepage() is false: if (tree->ops && tree->ops->writepage_end_io_hook) we will then test an uninitialized "ret" at: ret = ret < 0 ? ret : -EIO; The test for ret is for the case where ->writepage_end_io_hook failed, and we'd choose that ret as the error; but if there is no ->writepage_end_io_hook, nothing sets ret. Initializing ret to 0 should be sufficient; if writepage_end_io_hook wasn't set, (!uptodate) means non-zero err was passed in, so we choose -EIO in that case. Signed-of-by: Eric Sandeen <sandeen@redhat.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30Btrfs: fix scrub_print_warning to handle skinny metadata extentsLiu Bo
commit 6eda71d0c030af0fc2f68aaa676e6d445600855b upstream. The skinny extents are intepreted incorrectly in scrub_print_warning(), and end up hitting the BUG() in btrfs_extent_inline_ref_size. Reported-by: Konstantinos Skarlatos <k.skarlatos@gmail.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30Btrfs: use right type to get real comparisonLiu Bo
commit cd857dd6bc2ae9ecea14e75a34e8a8fdc158e307 upstream. We want to make sure the point is still within the extent item, not to verify the memory it's pointing to. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30Btrfs: don't check nodes for extent itemsJosef Bacik
commit 8a56457f5f8fa7c2698ffae8545214c5b96a2cb5 upstream. The backref code was looking at nodes as well as leaves when we tried to populate extent item entries. This is not good, and although we go away with it for the most part because we'd skip where disk_bytenr != random_memory, sometimes random_memory would match and suddenly boom. This fixes that problem. Thanks, Signed-off-by: Josef Bacik <jbacik@fb.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30fs: btrfs: volumes.c: Fix for possible null pointer dereferenceRickard Strandqvist
commit 8321cf2596d283821acc466377c2b85bcd3422b7 upstream. There is otherwise a risk of a possible null pointer dereference. Was largely found by using a static code analysis program called cppcheck. Signed-off-by: Rickard Strandqvist <rickard_strandqvist@spectrumdigital.se> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30Btrfs: send, don't error in the presence of subvols/snapshotsFilipe Manana
commit 1af56070e3ef9477dbc7eba3b9ad7446979c7974 upstream. If we are doing an incremental send and the base snapshot has a directory with name X that doesn't exist anymore in the second snapshot and a new subvolume/snapshot exists in the second snapshot that has the same name as the directory (name X), the incremental send would fail with -ENOENT error. This is because it attempts to lookup for an inode with a number matching the objectid of a root, which doesn't exist. Steps to reproduce: mkfs.btrfs -f /dev/sdd mount /dev/sdd /mnt mkdir /mnt/testdir btrfs subvolume snapshot -r /mnt /mnt/mysnap1 rmdir /mnt/testdir btrfs subvolume create /mnt/testdir btrfs subvolume snapshot -r /mnt /mnt/mysnap2 btrfs send -p /mnt/mysnap1 /mnt/mysnap2 -f /tmp/send.data A test case for xfstests follows. Reported-by: Robert White <rwhite@pobox.com> Signed-off-by: Filipe David Borba Manana <fdmanana@gmail.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30Btrfs: set right total device count for seeding supportWang Shilong
commit 298658414a2f0bea1f05a81876a45c1cd96aa2e0 upstream. Seeding device support allows us to create a new filesystem based on existed filesystem. However newly created filesystem's @total_devices should include seed devices. This patch fix the following problem: # mkfs.btrfs -f /dev/sdb # btrfstune -S 1 /dev/sdb # mount /dev/sdb /mnt # btrfs device add -f /dev/sdc /mnt --->fs_devices->total_devices = 1 # umount /mnt # mount /dev/sdc /mnt --->fs_devices->total_devices = 2 This is because we record right @total_devices in superblock, but @fs_devices->total_devices is reset to be 0 in btrfs_prepare_sprout(). Fix this problem by not resetting @fs_devices->total_devices. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30Btrfs: mark mapping with error flag to report errors to userspaceLiu Bo
commit 5dca6eea91653e9949ce6eb9e9acab6277e2f2c4 upstream. According to commit 865ffef3797da2cac85b3354b5b6050dc9660978 (fs: fix fsync() error reporting), it's not stable to just check error pages because pages can be truncated or invalidated, we should also mark mapping with error flag so that a later fsync can catch the error. Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30Btrfs: fix NULL pointer crash of deleting a seed deviceLiu Bo
commit 29cc83f69c8338ff8fd1383c9be263d4bdf52d73 upstream. Same as normal devices, seed devices should be initialized with fs_info->dev_root as well, otherwise we'll get a NULL pointer crash. Cc: Chris Murphy <lists@colorremedies.com> Reported-by: Chris Murphy <lists@colorremedies.com> Signed-off-by: Liu Bo <bo.li.liu@oracle.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30Btrfs: make sure there are not any read requests before stopping workersWang Shilong
commit de348ee022175401e77d7662b7ca6e231a94e3fd upstream. In close_ctree(), after we have stopped all workers,there maybe still some read requests(for example readahead) to submit and this *maybe* trigger an oops that user reported before: kernel BUG at fs/btrfs/async-thread.c:619! By hacking codes, i can reproduce this problem with one cpu available. We fix this potential problem by invalidating all btree inode pages before stopping all workers. Thanks to Miao for pointing out this problem. Signed-off-by: Wang Shilong <wangsl.fnst@cn.fujitsu.com> Reviewed-by: David Sterba <dsterba@suse.cz> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30Btrfs: output warning instead of error when loading free space cache failedMiao Xie
commit 32d6b47fe6fc1714d5f1bba1b9f38e0ab0ad58a8 upstream. If we fail to load a free space cache, we can rebuild it from the extent tree, so it is not a serious error, we should not output a error message that would make the users uncomfortable. This patch uses warning message instead of it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30btrfs: Add ctime/mtime update for btrfs device add/remove.Qu Wenruo
commit 5a1972bd9fd4b2fb1bac8b7a0b636d633d8717e3 upstream. Btrfs will send uevent to udev inform the device change, but ctime/mtime for the block device inode is not udpated, which cause libblkid used by btrfs-progs unable to detect device change and use old cache, causing 'btrfs dev scan; btrfs dev rmove; btrfs dev scan' give an error message. Reported-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Cc: Karel Zak <kzak@redhat.com> Signed-off-by: Qu Wenruo <quwenruo@cn.fujitsu.com> Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30Btrfs: fix double free in find_lock_delalloc_rangeChris Mason
commit 7d78874273463a784759916fc3e0b4e2eb141c70 upstream. We need to NULL the cached_state after freeing it, otherwise we might free it again if find_delalloc_range doesn't find anything. Signed-off-by: Chris Mason <clm@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30CIFS: Fix memory leaks in SMB2_openPavel Shilovsky
commit 663a962151593c69374776e8651238d0da072459 upstream. Signed-off-by: Pavel Shilovsky <pshilovsky@samba.org> Reviewed-by: Shirish Pargaonkar <spargaonkar@suse.com> Signed-off-by: Steve French <smfrench@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30aio: fix kernel memory disclosure in io_getevents() introduced in v3.10Benjamin LaHaise
commit edfbbf388f293d70bf4b7c0bc38774d05e6f711a upstream. A kernel memory disclosure was introduced in aio_read_events_ring() in v3.10 by commit a31ad380bed817aa25f8830ad23e1a0480fef797. The changes made to aio_read_events_ring() failed to correctly limit the index into ctx->ring_pages[], allowing an attacked to cause the subsequent kmap() of an arbitrary page with a copy_to_user() to copy the contents into userspace. This vulnerability has been assigned CVE-2014-0206. Thanks to Mateusz and Petr for disclosing this issue. This patch applies to v3.12+. A separate backport is needed for 3.10/3.11. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Jeff Moyer <jmoyer@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30aio: fix aio request leak when events are reaped by userspaceBenjamin LaHaise
commit f8567a3845ac05bb28f3c1b478ef752762bd39ef upstream. The aio cleanups and optimizations by kmo that were merged into the 3.10 tree added a regression for userspace event reaping. Specifically, the reference counts are not decremented if the event is reaped in userspace, leading to the application being unable to submit further aio requests. This patch applies to 3.12+. A separate backport is required for 3.10/3.11. This issue was uncovered as part of CVE-2014-0206. Signed-off-by: Benjamin LaHaise <bcrl@kvack.org> Cc: Kent Overstreet <kmo@daterainc.com> Cc: Mateusz Guzik <mguzik@redhat.com> Cc: Petr Matousek <pmatouse@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30genirq: Sanitize spurious interrupt detection of threaded irqsThomas Gleixner
commit 1e77d0a1ed7417d2a5a52a7b8d32aea1833faa6c upstream. Till reported that the spurious interrupt detection of threaded interrupts is broken in two ways: - note_interrupt() is called for each action thread of a shared interrupt line. That's wrong as we are only interested whether none of the device drivers felt responsible for the interrupt, but by calling multiple times for a single interrupt line we account IRQ_NONE even if one of the drivers felt responsible. - note_interrupt() when called from the thread handler is not serialized. That leaves the members of irq_desc which are used for the spurious detection unprotected. To solve this we need to defer the spurious detection of a threaded interrupt to the next hardware interrupt context where we have implicit serialization. If note_interrupt is called with action_ret == IRQ_WAKE_THREAD, we check whether the previous interrupt requested a deferred check. If not, we request a deferred check for the next hardware interrupt and return. If set, we check whether one of the interrupt threads signaled success. Depending on this information we feed the result into the spurious detector. If one primary handler of a shared interrupt returns IRQ_HANDLED we disable the deferred check of irq threads on the same line, as we have found at least one device driver who cared. Reported-by: Till Straumann <strauman@slac.stanford.edu> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Austin Schuh <austin@peloton-tech.com> Cc: Oliver Hartkopp <socketcan@hartkopp.net> Cc: Wolfgang Grandegger <wg@grandegger.com> Cc: Pavel Pisa <pisa@cmp.felk.cvut.cz> Cc: Marc Kleine-Budde <mkl@pengutronix.de> Cc: linux-can@vger.kernel.org Link: http://lkml.kernel.org/r/alpine.LFD.2.02.1303071450130.22263@ionos Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30x86, x32: Use compat shims for io_{setup,submit}Mike Frysinger
commit 7fd44dacdd803c0bbf38bf478d51d280902bb0f1 upstream. The io_setup takes a pointer to a context id of type aio_context_t. This in turn is typed to a __kernel_ulong_t. We could tweak the exported headers to define this as a 64bit quantity for specific ABIs, but since we already have a 32bit compat shim for the x86 ABI, let's just re-use that logic. The libaio package is also written to expect this as a pointer type, so a compat shim would simplify that. The io_submit func operates on an array of pointers to iocb structs. Padding out the array to be 64bit aligned is a huge pain, so convert it over to the existing compat shim too. We don't convert io_getevents to the compat func as its only purpose is to handle the timespec struct, and the x32 ABI uses 64bit times. With this change, the libaio package can now pass its testsuite when built for the x32 ABI. Signed-off-by: Mike Frysinger <vapier@gentoo.org> Link: http://lkml.kernel.org/r/1399250595-5005-1-git-send-email-vapier@gentoo.org Cc: H.J. Lu <hjl.tools@gmail.com> Signed-off-by: H. Peter Anvin <hpa@zytor.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30x86-32, espfix: Remove filter for espfix32 due to raceH. Peter Anvin
commit 246f2d2ee1d715e1077fc47d61c394569c8ee692 upstream. It is not safe to use LAR to filter when to go down the espfix path, because the LDT is per-process (rather than per-thread) and another thread might change the descriptors behind our back. Fortunately it is always *safe* (if a bit slow) to go down the espfix path, and a 32-bit LDT stack segment is extremely rare. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Link: http://lkml.kernel.org/r/1398816946-3351-1-git-send-email-hpa@linux.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30arm64/dma: Removing ARCH_HAS_DMA_GET_REQUIRED_MASK macroSuravee Suthikulpanit
commit f3a183cb422574014538017b5b291a416396f97e upstream. Arm64 does not define dma_get_required_mask() function. Therefore, it should not define the ARCH_HAS_DMA_GET_REQUIRED_MASK. This causes build errors in some device drivers (e.g. mpt2sas) Signed-off-by: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com> Signed-off-by: Catalin Marinas <catalin.marinas@arm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30ARM: mvebu: DT: fix OpenBlocks AX3-4 RAM sizeJason Cooper
commit e47043aea3853a74a9aa5726a1faa916d7462ab7 upstream. The OpenBlocks AX3-4 has a non-DT bootloader. It also comes with 1GB of soldered on RAM, and a DIMM slot for expansion. Unfortunately, atags_to_fdt() doesn't work in big-endian mode, so we see the following failure when attempting to boot a big-endian kernel: 686 slab pages 17 pages shared 0 pages swap cached [ pid ] uid tgid total_vm rss nr_ptes swapents oom_score_adj name Kernel panic - not syncing: Out of memory and no killable processes... CPU: 1 PID: 351 Comm: kworker/u4:0 Not tainted 3.15.0-rc8-next-20140603 #1 [<c0215a54>] (unwind_backtrace) from [<c021160c>] (show_stack+0x10/0x14) [<c021160c>] (show_stack) from [<c0802500>] (dump_stack+0x78/0x94) [<c0802500>] (dump_stack) from [<c0800068>] (panic+0x90/0x21c) [<c0800068>] (panic) from [<c02b5704>] (out_of_memory+0x320/0x340) [<c02b5704>] (out_of_memory) from [<c02b93a0>] (__alloc_pages_nodemask+0x874/0x930) [<c02b93a0>] (__alloc_pages_nodemask) from [<c02d446c>] (handle_mm_fault+0x744/0x96c) [<c02d446c>] (handle_mm_fault) from [<c02cf250>] (__get_user_pages+0xd0/0x4c0) [<c02cf250>] (__get_user_pages) from [<c02f3598>] (get_arg_page+0x54/0xbc) [<c02f3598>] (get_arg_page) from [<c02f3878>] (copy_strings+0x278/0x29c) [<c02f3878>] (copy_strings) from [<c02f38bc>] (copy_strings_kernel+0x20/0x28) [<c02f38bc>] (copy_strings_kernel) from [<c02f4f1c>] (do_execve+0x3a8/0x4c8) [<c02f4f1c>] (do_execve) from [<c025ac10>] (____call_usermodehelper+0x15c/0x194) [<c025ac10>] (____call_usermodehelper) from [<c020e9b8>] (ret_from_fork+0x14/0x3c) CPU0: stopping CPU: 0 PID: 0 Comm: swapper/0 Not tainted 3.15.0-rc8-next-20140603 #1 [<c0215a54>] (unwind_backtrace) from [<c021160c>] (show_stack+0x10/0x14) [<c021160c>] (show_stack) from [<c0802500>] (dump_stack+0x78/0x94) [<c0802500>] (dump_stack) from [<c021429c>] (handle_IPI+0x138/0x174) [<c021429c>] (handle_IPI) from [<c02087f0>] (armada_370_xp_handle_irq+0xb0/0xcc) [<c02087f0>] (armada_370_xp_handle_irq) from [<c0212100>] (__irq_svc+0x40/0x50) Exception stack(0xc0b6bf68 to 0xc0b6bfb0) bf60: e9fad598 00000000 00f509a3 00000000 c0b6a000 c0b724c4 bf80: c0b72458 c0b6a000 00000000 00000000 c0b66da0 c0b6a000 00000000 c0b6bfb0 bfa0: c027bb94 c027bb24 60000313 ffffffff [<c0212100>] (__irq_svc) from [<c027bb24>] (cpu_startup_entry+0x54/0x214) [<c027bb24>] (cpu_startup_entry) from [<c0ac5b30>] (start_kernel+0x318/0x37c) [<c0ac5b30>] (start_kernel) from [<00208078>] (0x208078) ---[ end Kernel panic - not syncing: Out of memory and no killable processes... A similar failure will also occur if ARM_ATAG_DTB_COMPAT isn't selected. Fix this by setting a sane default (1 GB) in the dts file. Signed-off-by: Jason Cooper <jason@lakedaemon.net> Tested-by: Kevin Hilman <khilman@linaro.org> Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30SCSI: Fix spurious request sense in error handlingJames Bottomley
commit d555a2abf3481f81303d835046a5ec2c4fb3ca8e upstream. We unconditionally execute scsi_eh_get_sense() to make sure all failed commands that should have sense attached, do. However, the routine forgets that some commands, because of the way they fail, will not have any sense code ... we should not bother them with a REQUEST_SENSE command. Fix this by testing to see if we actually got a CHECK_CONDITION return and skip asking for sense if we don't. Tested-by: Alan Stern <stern@rowland.harvard.edu> Signed-off-by: James Bottomley <JBottomley@Parallels.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30target: Explicitly clear ramdisk_mcp backend pagesNicholas A. Bellinger
[Note that a different patch to address the same issue went in during v3.15-rc1 (commit 4442dc8a), but includes a bunch of other changes that don't strictly apply to fixing the bug] This patch changes rd_allocate_sgl_table() to explicitly clear ramdisk_mcp backend memory pages by passing __GFP_ZERO into alloc_pages(). This addresses a potential security issue where reading from a ramdisk_mcp could return sensitive information, and follows what >= v3.15 does to explicitly clear ramdisk_mcp memory at backend device initialization time. Reported-by: Jorge Daniel Sequeira Matias <jdsm@tecnico.ulisboa.pt> Cc: Jorge Daniel Sequeira Matias <jdsm@tecnico.ulisboa.pt> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30target: Report correct response length for some commandsRoland Dreier
commit 2426bd456a61407388b6e61fc5f98dbcbebc50e2 upstream. When an initiator sends an allocation length bigger than what its command consumes, the target should only return the actual response data and set the residual length to the unused part of the allocation length. Add a helper function that command handlers (INQUIRY, READ CAPACITY, etc) can use to do this correctly, and use this code to get the correct residual for commands that don't use the full initiator allocation in the handlers for READ CAPACITY, READ CAPACITY(16), INQUIRY, MODE SENSE and REPORT LUNS. This addresses a handful of failures as reported by Christophe with the Windows Certification Kit: http://permalink.gmane.org/gmane.linux.scsi.target.devel/6515 Signed-off-by: Roland Dreier <roland@purestorage.com> Tested-by: Christophe Vu-Brugier <cvubrugier@yahoo.fr> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30Target/iscsi: Fix sendtargets response pdu for iser transportSagi Grimberg
commit 22c7aaa57e80853b4904a46c18f97db0036a3b97 upstream. In case the transport is iser we should not include the iscsi target info in the sendtargets text response pdu. This causes sendtargets response to include the target info twice. Modify iscsit_build_sendtargets_response to filter transport types that don't match. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Reported-by: Slava Shwartsman <valyushash@gmail.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30iscsi-target: Fix ABORT_TASK + connection reset iscsi_queue_req memory leakNicholas Bellinger
commit bbc050488525e1ab1194c27355f63c66814385b8 upstream. This patch fixes a iscsi_queue_req memory leak when ABORT_TASK response has been queued by TFO->queue_tm_rsp() -> lio_queue_tm_rsp() after a long standing I/O completes, but the connection has already reset and waiting for cleanup to complete in iscsit_release_commands_from_conn() -> transport_generic_free_cmd() -> transport_wait_for_tasks() code. It moves iscsit_free_queue_reqs_for_conn() after the per-connection command list has been released, so that the associated se_cmd tag can be completed + released by target-core before freeing any remaining iscsi_queue_req memory for the connection generated by lio_queue_tm_rsp(). Cc: Thomas Glanzmann <thomas@glanzmann.de> Cc: Charalampos Pournaris <charpour@gmail.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30target: Use complete_all for se_cmd->t_transport_stop_compNicholas Bellinger
commit a95d6511303b848da45ee27b35018bb58087bdc6 upstream. This patch fixes a bug where multiple waiters on ->t_transport_stop_comp occurs due to a concurrent ABORT_TASK and session reset both invoking transport_wait_for_tasks(), while waiting for the associated se_cmd descriptor backend processing to complete. For this case, complete_all() should be invoked in order to wake up both waiters in core_tmr_abort_task() + transport_generic_free_cmd() process contexts. Cc: Thomas Glanzmann <thomas@glanzmann.de> Cc: Charalampos Pournaris <charpour@gmail.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30target: Set CMD_T_ACTIVE bit for Task Management RequestsNicholas Bellinger
commit f15e9cd910c4d9da7de43f2181f362082fc45f0f upstream. This patch fixes a bug where se_cmd descriptors associated with a Task Management Request (TMR) where not setting CMD_T_ACTIVE before being dispatched into target_tmr_work() process context. This is required in order for transport_generic_free_cmd() -> transport_wait_for_tasks() to wait on se_cmd->t_transport_stop_comp if a session reset event occurs while an ABORT_TASK is outstanding waiting for another I/O to complete. Cc: Thomas Glanzmann <thomas@glanzmann.de> Cc: Charalampos Pournaris <charpour@gmail.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30Target/iser: Wait for proper cleanup before unloadingSagi Grimberg
commit f5ebec9629cf78eeeea4b8258882a9f439ab2404 upstream. disconnected_handler works are scheduled on system_wq. When attempting to unload, first make sure all works have cleaned up. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30Target/iser: Improve cm events handlingSagi Grimberg
commit 88c4015fda6d014392f76d3b1688347950d7a12d upstream. There are 4 RDMA_CM events that all basically mean that the user should teardown the IB connection: - DISCONNECTED - ADDR_CHANGE - DEVICE_REMOVAL - TIMEWAIT_EXIT Only in DISCONNECTED/ADDR_CHANGE it makes sense to call rdma_disconnect (send DREQ/DREP to our initiator). So we keep the same teardown handler for all of them but only indicate calling rdma_disconnect for the relevant events. This patch also removes redundant debug prints for each single event. v2 changes: - Call isert_disconnected_handler() for DEVICE_REMOVAL (Or + Sag) Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30Target/iser: Fix hangs in connection teardownSagi Grimberg
commit 9d49f5e284e700576f3b65f1e28dea8539da6661 upstream. In ungraceful teardowns isert close flows seem racy such that isert_wait_conn hangs as RDMA_CM_EVENT_DISCONNECTED never gets invoked (no one called rdma_disconnect). Both graceful and ungraceful teardowns will have rx flush errors (isert posts a batch once connection is established). Once all flush errors are consumed we invoke isert_wait_conn and it will be responsible for calling rdma_disconnect. This way it can be sure that rdma_disconnect was called and it won't wait forever. This patch also removes the logout_posted indicator. either the logout completion was consumed and no problem decrementing the post_send_buf_count, or it was consumed as a flush error. no point of keeping it for isert_wait_conn as there is no danger that isert_conn will be accidentally removed while it is running. (Drop unnecessary sleep_on_conn_wait_comp check in isert_cq_rx_comp_err - nab) Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30Target/iser: Bail from accept_np if np_thread is trying to closeSagi Grimberg
commit e346ab343f4f58c12a96725c7b13df9cc2ad56f6 upstream. In case np_thread state is in RESET/SHUTDOWN/EXIT states, no point for isert to stall there as we may get a hang in case no one will wake it up later. Signed-off-by: Sagi Grimberg <sagig@mellanox.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30Bluetooth: Fix L2CAP deadlockJukka Taimisto
commit 8a96f3cd22878fc0bb564a8478a6e17c0b8dca73 upstream. -[0x01 Introduction We have found a programming error causing a deadlock in Bluetooth subsystem of Linux kernel. The problem is caused by missing release_sock() call when L2CAP connection creation fails due full accept queue. The issue can be reproduced with 3.15-rc5 kernel and is also present in earlier kernels. -[0x02 Details The problem occurs when multiple L2CAP connections are created to a PSM which contains listening socket (like SDP) and left pending, for example, configuration (the underlying ACL link is not disconnected between connections). When L2CAP connection request is received and listening socket is found the l2cap_sock_new_connection_cb() function (net/bluetooth/l2cap_sock.c) is called. This function locks the 'parent' socket and then checks if the accept queue is full. 1178 lock_sock(parent); 1179 1180 /* Check for backlog size */ 1181 if (sk_acceptq_is_full(parent)) { 1182 BT_DBG("backlog full %d", parent->sk_ack_backlog); 1183 return NULL; 1184 } If case the accept queue is full NULL is returned, but the 'parent' socket is not released. Thus when next L2CAP connection request is received the code blocks on lock_sock() since the parent is still locked. Also note that for connections already established and waiting for configuration to complete a timeout will occur and l2cap_chan_timeout() (net/bluetooth/l2cap_core.c) will be called. All threads calling this function will also be blocked waiting for the channel mutex since the thread which is waiting on lock_sock() alread holds the channel mutex. We were able to reproduce this by sending continuously L2CAP connection request followed by disconnection request containing invalid CID. This left the created connections pending configuration. After the deadlock occurs it is impossible to kill bluetoothd, btmon will not get any more data etc. requiring reboot to recover. -[0x03 Fix Releasing the 'parent' socket when l2cap_sock_new_connection_cb() returns NULL seems to fix the issue. Signed-off-by: Jukka Taimisto <jtt@codenomicon.com> Reported-by: Tommi Mäkilä <tmakila@codenomicon.com> Signed-off-by: Johan Hedberg <johan.hedberg@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30Bluetooth: 6LoWPAN: Fix MAC address universal/local bit handlingJukka Rissanen
commit 62bbd5b35994eaf30519f126765d7f6af9cd3526 upstream. The universal/local bit handling was incorrectly done in the code. So when setting EUI address from BD address we do this: - If BD address type is PUBLIC, then we clear the universal bit in EUI address. If the address type is RANDOM, then the universal bit is set (BT 6lowpan draft chapter 3.2.2) - After this we invert the universal/local bit according to RFC 2464 When figuring out BD address we do the reverse: - Take EUI address from stateless IPv6 address, invert the universal/local bit according to RFC 2464 - If universal bit is 1 in this modified EUI address, then address type is set to RANDOM, otherwise it is PUBLIC Note that 6lowpan_iphc.[ch] does the final toggling of U/L bit before sending or receiving the network packet. Signed-off-by: Jukka Rissanen <jukka.rissanen@linux.intel.com> Signed-off-by: Marcel Holtmann <marcel@holtmann.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30bluetooth: hci_ldisc: fix deadlock conditionFelipe Balbi
commit da64c27d3c93ee9f89956b9de86c4127eb244494 upstream. LDISCs shouldn't call tty->ops->write() from within ->write_wakeup(). ->write_wakeup() is called with port lock taken and IRQs disabled, tty->ops->write() will try to acquire the same port lock and we will deadlock. Acked-by: Marcel Holtmann <marcel@holtmann.org> Reviewed-by: Peter Hurley <peter@hurleysoftware.com> Reported-by: Huang Shijie <b32955@freescale.com> Signed-off-by: Felipe Balbi <balbi@ti.com> Tested-by: Andreas Bießmann <andreas@biessmann.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30PM / OPP: fix incorrect OPP count handling in of_init_opp_tableChander Kashyap
commit 086abb58590a4df73e8a6ed71fd418826937cd46 upstream. In of_init_opp_table function, if a failure to add an OPP is detected, the count of OPPs, yet to be added is not updated. Fix this by decrementing this count on failure as well. Signed-off-by: Chander Kashyap <k.chander@samsung.com> Signed-off-by: Inderpal Singh <inderpal.s@samsung.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Acked-by: Nishanth Menon <nm@ti.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30ARM: 8037/1: mm: support big-endian page tablesJianguo Wu
commit 86f40622af7329375e38f282f6c0aab95f3e5f72 upstream. When enable LPAE and big-endian in a hisilicon board, while specify mem=384M mem=512M@7680M, will get bad page state: Freeing unused kernel memory: 180K (c0466000 - c0493000) BUG: Bad page state in process init pfn:fa442 page:c7749840 count:0 mapcount:-1 mapping: (null) index:0x0 page flags: 0x40000400(reserved) Modules linked in: CPU: 0 PID: 1 Comm: init Not tainted 3.10.27+ #66 [<c000f5f0>] (unwind_backtrace+0x0/0x11c) from [<c000cbc4>] (show_stack+0x10/0x14) [<c000cbc4>] (show_stack+0x10/0x14) from [<c009e448>] (bad_page+0xd4/0x104) [<c009e448>] (bad_page+0xd4/0x104) from [<c009e520>] (free_pages_prepare+0xa8/0x14c) [<c009e520>] (free_pages_prepare+0xa8/0x14c) from [<c009f8ec>] (free_hot_cold_page+0x18/0xf0) [<c009f8ec>] (free_hot_cold_page+0x18/0xf0) from [<c00b5444>] (handle_pte_fault+0xcf4/0xdc8) [<c00b5444>] (handle_pte_fault+0xcf4/0xdc8) from [<c00b6458>] (handle_mm_fault+0xf4/0x120) [<c00b6458>] (handle_mm_fault+0xf4/0x120) from [<c0013754>] (do_page_fault+0xfc/0x354) [<c0013754>] (do_page_fault+0xfc/0x354) from [<c0008400>] (do_DataAbort+0x2c/0x90) [<c0008400>] (do_DataAbort+0x2c/0x90) from [<c0008fb4>] (__dabt_usr+0x34/0x40) The bad pfn:fa442 is not system memory(mem=384M mem=512M@7680M), after debugging, I find in page fault handler, will get wrong pfn from pte just after set pte, as follow: do_anonymous_page() { ... set_pte_at(mm, address, page_table, entry); //debug code pfn = pte_pfn(entry); pr_info("pfn:0x%lx, pte:0x%llxn", pfn, pte_val(entry)); //read out the pte just set new_pte = pte_offset_map(pmd, address); new_pfn = pte_pfn(*new_pte); pr_info("new pfn:0x%lx, new pte:0x%llxn", pfn, pte_val(entry)); ... } pfn: 0x1fa4f5, pte:0xc00001fa4f575f new_pfn:0xfa4f5, new_pte:0xc00000fa4f5f5f //new pfn/pte is wrong. The bug is happened in cpu_v7_set_pte_ext(ptep, pte): An LPAE PTE is a 64bit quantity, passed to cpu_v7_set_pte_ext in the r2 and r3 registers. On an LE kernel, r2 contains the LSB of the PTE, and r3 the MSB. On a BE kernel, the assignment is reversed. Unfortunately, the current code always assumes the LE case, leading to corruption of the PTE when clearing/setting bits. This patch fixes this issue much like it has been done already in the cpu_v7_switch_mm case. Signed-off-by: Jianguo Wu <wujianguo@huawei.com> Acked-by: Marc Zyngier <marc.zyngier@arm.com> Acked-by: Will Deacon <will.deacon@arm.com> Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30ARM: stacktrace: avoid listing stacktrace functions in stacktraceRussell King
commit 3683f44c42e991d313dc301504ee0fca1aeb8580 upstream. While debugging the FEC ethernet driver using stacktrace, it was noticed that the stacktraces always begin as follows: [<c00117b4>] save_stack_trace_tsk+0x0/0x98 [<c0011870>] save_stack_trace+0x24/0x28 ... This is because the stack trace code includes the stack frames for itself. This is incorrect behaviour, and also leads to "skip" doing the wrong thing (which is the number of stack frames to avoid recording.) Perversely, it does the right thing when passed a non-current thread. Fix this by ensuring that we have a known constant number of frames above the main stack trace function, and always skip these. Signed-off-by: Russell King <rmk+kernel@arm.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30media: saa7134: fix regression with tvtimeHans Verkuil
commit 17e7f1b515803e1a79b246688aacbddd2e34165d upstream. This solves this bug: https://bugzilla.kernel.org/show_bug.cgi?id=73361 The problem is that when you quit tvtime it calls STREAMOFF, but then it queues a bunch of buffers for no good reason before closing the file descriptor. In the past closing the fd would free the vb queue since that was part of the file handle struct. Since that was moved to the global struct that no longer happened. This wouldn't be a problem, but the extra QBUF calls that tvtime does meant that the buffer list in videobuf (q->stream) contained buffers, so REQBUFS would fail with -EBUSY. The solution is to init the list head explicitly when releasing the file descriptor and to not free the video resource when calling streamoff. The real fix will hopefully go into kernel 3.16 when the vb2 conversion is merged. Basically the saa7134 driver with the old videobuf is so full of holes it ain't funny anymore, so consider this a band-aid for kernels 3.14 and 15. Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30media: radio-bcm2048: fix wrong overflow checkPali Rohár
commit 5d60122b7e30f275593df93b39a76d3c2663cfc2 upstream. This patch fixes an off by one check in bcm2048_set_region(). Reported-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Pali Rohár <pali.rohar@gmail.com> Signed-off-by: Pavel Machek <pavel@ucw.cz> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30media: uvcvideo: Fix clock param realtime settingOlivier Langlois
commit 3b35fc81e7ec552147a4fd843d0da0bbbe4ef253 upstream. timestamps in v4l2 buffers returned to userspace are updated in uvc_video_clock_update() which uses timestamps fetched from uvc_video_clock_decode() by calling unconditionally ktime_get_ts(). Hence setting the module clock param to realtime has no effect before this patch. This has been tested with ffmpeg: ffmpeg -y -f v4l2 -input_format yuyv422 -video_size 640x480 -framerate 30 -i /dev/video0 \ -f alsa -acodec pcm_s16le -ar 16000 -ac 1 -i default \ -c:v libx264 -preset ultrafast \ -c:a libfdk_aac \ out.mkv and inspecting the v4l2 input starting timestamp. Signed-off-by: Olivier Langlois <olivier@trillion01.com> Signed-off-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2014-06-30rtmutex: Plug slow unlock raceThomas Gleixner
commit 27e35715df54cbc4f2d044f681802ae30479e7fb upstream. When the rtmutex fast path is enabled the slow unlock function can create the following situation: spin_lock(foo->m->wait_lock); foo->m->owner = NULL; rt_mutex_lock(foo->m); <-- fast path free = atomic_dec_and_test(foo->refcnt); rt_mutex_unlock(foo->m); <-- fast path if (free) kfree(foo); spin_unlock(foo->m->wait_lock); <--- Use after free. Plug the race by changing the slow unlock to the following scheme: while (!rt_mutex_has_waiters(m)) { /* Clear the waiters bit in m->owner */ clear_rt_mutex_waiters(m); owner = rt_mutex_owner(m); spin_unlock(m->wait_lock); if (cmpxchg(m->owner, owner, 0) == owner) return; spin_lock(m->wait_lock); } So in case of a new waiter incoming while the owner tries the slow path unlock we have two situations: unlock(wait_lock); lock(wait_lock); cmpxchg(p, owner, 0) == owner mark_rt_mutex_waiters(lock); acquire(lock); Or: unlock(wait_lock); lock(wait_lock); mark_rt_mutex_waiters(lock); cmpxchg(p, owner, 0) != owner enqueue_waiter(); unlock(wait_lock); lock(wait_lock); wakeup_next waiter(); unlock(wait_lock); lock(wait_lock); acquire(lock); If the fast path is disabled, then the simple m->owner = NULL; unlock(m->wait_lock); is sufficient as all access to m->owner is serialized via m->wait_lock; Also document and clarify the wakeup_next_waiter function as suggested by Oleg Nesterov. Reported-by: Steven Rostedt <rostedt@goodmis.org> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Steven Rostedt <rostedt@goodmis.org> Cc: Peter Zijlstra <peterz@infradead.org> Link: http://lkml.kernel.org/r/20140611183852.937945560@linutronix.de Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>