summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2017-03-26Linux 4.4.57v4.4.57Greg Kroah-Hartman
2017-03-26ext4: fix fencepost in s_first_meta_bg validationTheodore Ts'o
commit 2ba3e6e8afc9b6188b471f27cf2b5e3cf34e7af2 upstream. It is OK for s_first_meta_bg to be equal to the number of block group descriptor blocks. (It rarely happens, but it shouldn't cause any problems.) https://bugzilla.kernel.org/show_bug.cgi?id=194567 Fixes: 3a4b77cd47bb837b8557595ec7425f281f2ca1fe Signed-off-by: Theodore Ts'o <tytso@mit.edu> Cc: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26percpu: acquire pcpu_lock when updating pcpu_nr_empty_pop_pagesTahsin Erdogan
commit 320661b08dd6f1746d5c7ab4eb435ec64b97cd45 upstream. Update to pcpu_nr_empty_pop_pages in pcpu_alloc() is currently done without holding pcpu_lock. This can lead to bad updates to the variable. Add missing lock calls. Fixes: b539b87fed37 ("percpu: implmeent pcpu_nr_empty_pop_pages and chunk->nr_populated") Signed-off-by: Tahsin Erdogan <tahsin@google.com> Signed-off-by: Tejun Heo <tj@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26gfs2: Avoid alignment hole in struct lm_locknameAndreas Gruenbacher
commit 28ea06c46fbcab63fd9a55531387b7928a18a590 upstream. Commit 88ffbf3e03 switches to using rhashtables for glocks, hashing over the entire struct lm_lockname instead of its individual fields. On some architectures, struct lm_lockname contains a hole of uninitialized memory due to alignment rules, which now leads to incorrect hash values. Get rid of that hole. Signed-off-by: Andreas Gruenbacher <agruenba@redhat.com> Signed-off-by: Bob Peterson <rpeterso@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26isdn/gigaset: fix NULL-deref at probeJohan Hovold
commit 68c32f9c2a36d410aa242e661506e5b2c2764179 upstream. Make sure to check the number of endpoints to avoid dereferencing a NULL-pointer should a malicious device lack endpoints. Fixes: cf7776dc05b8 ("[PATCH] isdn4linux: Siemens Gigaset drivers - direct USB connection") Cc: Hansjoerg Lipp <hjlipp@web.de> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26target: Fix VERIFY_16 handling in sbc_parse_cdbMax Lohrmann
commit 13603685c1f12c67a7a2427f00b63f39a2b6f7c9 upstream. As reported by Max, the Windows 2008 R2 chkdsk utility expects VERIFY_16 to be supported, and does not handle the returned CHECK_CONDITION properly, resulting in an infinite loop. The kernel will log huge amounts of this error: kernel: TARGET_CORE[iSCSI]: Unsupported SCSI Opcode 0x8f, sending CHECK_CONDITION. Signed-off-by: Max Lohrmann <post@wickenrode.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26scsi: libiscsi: add lock around task lists to fix list corruption regressionChris Leech
commit 6f8830f5bbab16e54f261de187f3df4644a5b977 upstream. There's a rather long standing regression from the commit "libiscsi: Reduce locking contention in fast path" Depending on iSCSI target behavior, it's possible to hit the case in iscsi_complete_task where the task is still on a pending list (!list_empty(&task->running)). When that happens the task is removed from the list while holding the session back_lock, but other task list modification occur under the frwd_lock. That leads to linked list corruption and eventually a panicked system. Rather than back out the session lock split entirely, in order to try and keep some of the performance gains this patch adds another lock to maintain the task lists integrity. Major enterprise supported kernels have been backing out the lock split for while now, thanks to the efforts at IBM where a lab setup has the most reliable reproducer I've seen on this issue. This patch has been tested there successfully. Signed-off-by: Chris Leech <cleech@redhat.com> Fixes: 659743b02c41 ("[SCSI] libiscsi: Reduce locking contention in fast path") Reported-by: Prashantha Subbarao <psubbara@us.ibm.com> Reviewed-by: Guilherme G. Piccoli <gpiccoli@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26scsi: lpfc: Add shutdown method for kexecAnton Blanchard
commit 85e8a23936ab3442de0c42da97d53b29f004ece1 upstream. We see lpfc devices regularly fail during kexec. Fix this by adding a shutdown method which mirrors the remove method. Signed-off-by: Anton Blanchard <anton@samba.org> Reviewed-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Tested-by: Mauricio Faria de Oliveira <mauricfo@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26target/pscsi: Fix TYPE_TAPE + TYPE_MEDIMUM_CHANGER exportNicholas Bellinger
commit a04e54f2c35823ca32d56afcd5cea5b783e2f51a upstream. The following fixes a divide by zero OOPs with TYPE_TAPE due to pscsi_tape_read_blocksize() failing causing a zero sd->sector_size being propigated up via dev_attrib.hw_block_size. It also fixes another long-standing bug where TYPE_TAPE and TYPE_MEDIMUM_CHANGER where using pscsi_create_type_other(), which does not call scsi_device_get() to take the device reference. Instead, rename pscsi_create_type_rom() to pscsi_create_type_nondisk() and use it for all cases. Finally, also drop a dump_stack() in pscsi_get_blocks() for non TYPE_DISK, which in modern target-core can get invoked via target_sense_desc_format() during CHECK_CONDITION. Reported-by: Malcolm Haak <insanemal@gmail.com> Signed-off-by: Nicholas Bellinger <nab@linux-iscsi.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26md/raid1/10: fix potential deadlockShaohua Li
commit 61eb2b43b99ebdc9bc6bc83d9792257b243e7cb3 upstream. Neil Brown pointed out a potential deadlock in raid 10 code with bio_split/chain. The raid1 code could have the same issue, but recent barrier rework makes it less likely to happen. The deadlock happens in below sequence: 1. generic_make_request(bio), this will set current->bio_list 2. raid10_make_request will split bio to bio1 and bio2 3. __make_request(bio1), wait_barrer, add underlayer disk bio to current->bio_list 4. __make_request(bio2), wait_barrer If raise_barrier happens between 3 & 4, since wait_barrier runs at 3, raise_barrier waits for IO completion from 3. And since raise_barrier sets barrier, 4 waits for raise_barrier. But IO from 3 can't be dispatched because raid10_make_request() doesn't finished yet. The solution is to adjust the IO ordering. Quotes from Neil: " It is much safer to: if (need to split) { split = bio_split(bio, ...) bio_chain(...) make_request_fn(split); generic_make_request(bio); } else make_request_fn(mddev, bio); This way we first process the initial section of the bio (in 'split') which will queue some requests to the underlying devices. These requests will be queued in generic_make_request. Then we queue the remainder of the bio, which will be added to the end of the generic_make_request queue. Then we return. generic_make_request() will pop the lower-level device requests off the queue and handle them first. Then it will process the remainder of the original bio once the first section has been fully processed. " Note, this only happens in read path. In write path, the bio is flushed to underlaying disks either by blk flush (from schedule) or offladed to raid1/10d. It's queued in current->bio_list. Cc: Coly Li <colyli@suse.de> Suggested-by: NeilBrown <neilb@suse.com> Reviewed-by: Jack Wang <jinpu.wang@profitbricks.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26powerpc/boot: Fix zImage TOC alignmentMichael Ellerman
commit 97ee351b50a49717543533cfb85b4bf9d88c9680 upstream. Recent toolchains force the TOC to be 256 byte aligned. We need to enforce this alignment in the zImage linker script, otherwise pointers to our TOC variables (__toc_start) could be incorrect. If the actual start of the TOC and __toc_start don't have the same value we crash early in the zImage wrapper. Suggested-by: Alan Modra <amodra@gmail.com> Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26cpufreq: Fix and clean up show_cpuinfo_cur_freq()Rafael J. Wysocki
commit 9b4f603e7a9f4282aec451063ffbbb8bb410dcd9 upstream. There is a missing newline in show_cpuinfo_cur_freq(), so add it, but while at it clean that function up somewhat too. Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Acked-by: Viresh Kumar <viresh.kumar@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26perf/core: Fix event inheritance on fork()Peter Zijlstra
commit e7cc4865f0f31698ef2f7aac01a50e78968985b7 upstream. While hunting for clues to a use-after-free, Oleg spotted that perf_event_init_context() can loose an error value with the result that fork() can succeed even though we did not fully inherit the perf event context. Spotted-by: Oleg Nesterov <oleg@redhat.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@kernel.org> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Dmitry Vyukov <dvyukov@google.com> Cc: Frederic Weisbecker <fweisbec@gmail.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Mathieu Desnoyers <mathieu.desnoyers@efficios.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Vince Weaver <vincent.weaver@maine.edu> Cc: oleg@redhat.com Fixes: 889ff0150661 ("perf/core: Split context's event group list into pinned and non-pinned lists") Link: http://lkml.kernel.org/r/20170316125823.190342547@infradead.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26give up on gcc ilog2() constant optimizationsLinus Torvalds
commit 474c90156c8dcc2fa815e6716cc9394d7930cb9c upstream. gcc-7 has an "optimization" pass that completely screws up, and generates the code expansion for the (impossible) case of calling ilog2() with a zero constant, even when the code gcc compiles does not actually have a zero constant. And we try to generate a compile-time error for anybody doing ilog2() on a constant where that doesn't make sense (be it zero or negative). So now gcc7 will fail the build due to our sanity checking, because it created that constant-zero case that didn't actually exist in the source code. There's a whole long discussion on the kernel mailing about how to work around this gcc bug. The gcc people themselevs have discussed their "feature" in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=72785 but it's all water under the bridge, because while it looked at one point like it would be solved by the time gcc7 was released, that was not to be. So now we have to deal with this compiler braindamage. And the only simple approach seems to be to just delete the code that tries to warn about bad uses of ilog2(). So now "ilog2()" will just return 0 not just for the value 1, but for any non-positive value too. It's not like I can recall anybody having ever actually tried to use this function on any invalid value, but maybe the sanity check just meant that such code never made it out in public. Reported-by: Laura Abbott <labbott@redhat.com> Cc: John Stultz <john.stultz@linaro.org>, Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Ard Biesheuvel <ard.biesheuvel@linaro.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Jiri Slaby <jslaby@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26kernek/fork.c: allocate idle task for a CPU always on its local nodeAndi Kleen
commit 725fc629ff2545b061407305ae51016c9f928fce upstream. Linux preallocates the task structs of the idle tasks for all possible CPUs. This currently means they all end up on node 0. This also implies that the cache line of MWAIT, which is around the flags field in the task struct, are all located in node 0. We see a noticeable performance improvement on Knights Landing CPUs when the cache lines used for MWAIT are located in the local nodes of the CPUs using them. I would expect this to give a (likely slight) improvement on other systems too. The patch implements placing the idle task in the node of its CPUs, by passing the right target node to copy_process() [akpm@linux-foundation.org: use NUMA_NO_NODE, not a bare -1] Link: http://lkml.kernel.org/r/1463492694-15833-1-git-send-email-andi@firstfloor.org Signed-off-by: Andi Kleen <ak@linux.intel.com> Cc: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26hv_netvsc: use skb_get_hash() instead of a homegrown implementationVitaly Kuznetsov
commit 757647e10e55c01fb7a9c4356529442e316a7c72 upstream. Recent changes to 'struct flow_keys' (e.g commit d34af823ff40 ("net: Add VLAN ID to flow_keys")) introduced a performance regression in netvsc driver. Is problem is, however, not the above mentioned commit but the fact that netvsc_set_hash() function did some assumptions on the struct flow_keys data layout and this is wrong. Get rid of netvsc_set_hash() by switching to skb_get_hash(). This change will also imply switching to Jenkins hash from the currently used Toeplitz but it seems there is no good excuse for Toeplitz to stay. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26tpm_tis: Use devm_free_irq not free_irqJason Gunthorpe
commit 727f28b8ca24a581c7bd868326b8cea1058c720a upstream. The interrupt is always allocated with devm_request_irq so it must always be freed with devm_free_irq. Fixes: 448e9c55c12d ("tpm_tis: verify interrupt during init") Signed-off-by: Jason Gunthorpe <jgunthorpe@obsidianresearch.com> Acked-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Tested-by: Martin Wilck <Martin.Wilck@ts.fujitsu.com> Signed-off-by: Jarkko Sakkinen <jarkko.sakkinen@linux.intel.com> Acked-by: Peter Huewe <peterhuewe@gmx.de> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26drm/amdgpu: add missing irq.h includeDave Airlie
commit e9c5e7402dad6f4f04c2430db6f283512bcd4392 upstream. this fixes the build on arm. Signed-off-by: Dave Airlie <airlied@redhat.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26s390/pci: fix use after free in dma_initSebastian Ott
commit dba599091c191d209b1499511a524ad9657c0e5a upstream. After a failure during registration of the dma_table (because of the function being in error state) we free its memory but don't reset the associated pointer to zero. When we then receive a notification from firmware (about the function being in error state) we'll try to walk and free the dma_table again. Fix this by resetting the dma_table pointer. In addition to that make sure that we free the iommu_bitmap when appropriate. Signed-off-by: Sebastian Ott <sebott@linux.vnet.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26KVM: PPC: Book3S PR: Fix illegal opcode emulationThomas Huth
commit 708e75a3ee750dce1072134e630d66c4e6eaf63c upstream. If kvmppc_handle_exit_pr() calls kvmppc_emulate_instruction() to emulate one instruction (in the BOOK3S_INTERRUPT_H_EMUL_ASSIST case), it calls kvmppc_core_queue_program() afterwards if kvmppc_emulate_instruction() returned EMULATE_FAIL, so the guest gets an program interrupt for the illegal opcode. However, the kvmppc_emulate_instruction() also tried to inject a program exception for this already, so the program interrupt gets injected twice and the return address in srr0 gets destroyed. All other callers of kvmppc_emulate_instruction() are also injecting a program interrupt, and since the callers have the right knowledge about the srr1 flags that should be used, it is the function kvmppc_emulate_instruction() that should _not_ inject program interrupts, so remove the kvmppc_core_queue_program() here. This fixes the issue discovered by Laurent Vivier with kvm-unit-tests where the logs are filled with these messages when the test tries to execute an illegal instruction: Couldn't emulate instruction 0x00000000 (op 0 xop 0) kvmppc_handle_exit_pr: emulation at 700 failed (00000000) Signed-off-by: Thomas Huth <thuth@redhat.com> Reviewed-by: Alexander Graf <agraf@suse.de> Tested-by: Laurent Vivier <lvivier@redhat.com> Signed-off-by: Paul Mackerras <paulus@ozlabs.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26xen/qspinlock: Don't kick CPU if IRQ is not initializedRoss Lagerwall
commit 707e59ba494372a90d245f18b0c78982caa88e48 upstream. The following commit: 1fb3a8b2cfb2 ("xen/spinlock: Fix locking path engaging too soon under PVHVM.") ... moved the initalization of the kicker interrupt until after native_cpu_up() is called. However, when using qspinlocks, a CPU may try to kick another CPU that is spinning (because it has not yet initialized its kicker interrupt), resulting in the following crash during boot: kernel BUG at /build/linux-Ay7j_C/linux-4.4.0/drivers/xen/events/events_base.c:1210! invalid opcode: 0000 [#1] SMP ... RIP: 0010:[<ffffffff814c97c9>] [<ffffffff814c97c9>] xen_send_IPI_one+0x59/0x60 ... Call Trace: [<ffffffff8102be9e>] xen_qlock_kick+0xe/0x10 [<ffffffff810cabc2>] __pv_queued_spin_unlock+0xb2/0xf0 [<ffffffff810ca6d1>] ? __raw_callee_save___pv_queued_spin_unlock+0x11/0x20 [<ffffffff81052936>] ? check_tsc_warp+0x76/0x150 [<ffffffff81052aa6>] check_tsc_sync_source+0x96/0x160 [<ffffffff81051e28>] native_cpu_up+0x3d8/0x9f0 [<ffffffff8102b315>] xen_hvm_cpu_up+0x35/0x80 [<ffffffff8108198c>] _cpu_up+0x13c/0x180 [<ffffffff81081a4a>] cpu_up+0x7a/0xa0 [<ffffffff81f80dfc>] smp_init+0x7f/0x81 [<ffffffff81f5a121>] kernel_init_freeable+0xef/0x212 [<ffffffff81817f30>] ? rest_init+0x80/0x80 [<ffffffff81817f3e>] kernel_init+0xe/0xe0 [<ffffffff8182488f>] ret_from_fork+0x3f/0x70 [<ffffffff81817f30>] ? rest_init+0x80/0x80 To fix this, only send the kick if the target CPU's interrupt has been initialized. This check isn't racy, because the target is waiting for the spinlock, so it won't have initialized the interrupt in the meantime. Signed-off-by: Ross Lagerwall <ross.lagerwall@citrix.com> Reviewed-by: Boris Ostrovsky <boris.ostrovsky@oracle.com> Cc: David Vrabel <david.vrabel@citrix.com> Cc: Juergen Gross <jgross@suse.com> Cc: Konrad Rzeszutek Wilk <konrad.wilk@oracle.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: linux-kernel@vger.kernel.org Cc: xen-devel@lists.xenproject.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26Drivers: hv: avoid vfree() on crashVitaly Kuznetsov
commit a9f61ca793becabdefab03b77568d6c6f8c1bc79 upstream. When we crash from NMI context (e.g. after NMI injection from host when 'sysctl -w kernel.unknown_nmi_panic=1' is set) we hit kernel BUG at mm/vmalloc.c:1530! as vfree() is denied. While the issue could be solved with in_nmi() check instead I opted for skipping vfree on all sorts of crashes to reduce the amount of work which can cause consequent crashes. We don't really need to free anything on crash. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26Drivers: hv: balloon: don't crash when memory is added in non-sorted orderVitaly Kuznetsov
commit 77c0c9735bc0ba5898e637a3a20d6bcb50e3f67d upstream. When we iterate through all HA regions in handle_pg_range() we have an assumption that all these regions are sorted in the list and the 'start_pfn >= has->end_pfn' check is enough to find the proper region. Unfortunately it's not the case with WS2016 where host can hot-add regions in a different order. We end up modifying the wrong HA region and crashing later on pages online. Modify the check to make sure we found the region we were searching for while iterating. Fix the same check in pfn_covered() as well. Signed-off-by: Vitaly Kuznetsov <vkuznets@redhat.com> Signed-off-by: K. Y. Srinivasan <kys@microsoft.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26pinctrl: cherryview: Do not mask all interrupts in probeMika Westerberg
commit bcb48cca23ec9852739e4a464307fa29515bbe48 upstream. The Cherryview GPIO controller has 8 or 16 wires connected to the I/O-APIC which can be used directly by the platform/BIOS or drivers. One such wire is used as SCI (System Control Interrupt) which ACPI depends on to be able to trigger GPEs (General Purpose Events). The pinctrl driver itself uses another IRQ resource which is wire OR of all the 8 (or 16) wires and follows what BIOS has programmed to the IntSel register of each pin. Currently the driver masks all interrupts at probe time and this prevents these direct interrupts from working as expected. The reason for this is that some early stage prototypes had some pins misconfigured causing lots of spurious interrupts. We fix this by leaving the interrupt mask untouched. This allows SCI and other direct interrupts work properly. What comes to the possible spurious interrupts we switch the default handler to be handle_bad_irq() instead of handle_simple_irq() (which was not correct anyway). Reported-by: Yu C Chen <yu.c.chen@intel.com> Reported-by: Anisse Astier <anisse@astier.eu> Signed-off-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26ACPI / video: skip evaluating _DOD when it does not existAlex Hung
commit e34fbbac669de0b7fb7803929d0477f35f6e2833 upstream. Some system supports hybrid graphics and its discrete VGA does not have any connectors and therefore has no _DOD method. Signed-off-by: Alex Hung <alex.hung@canonical.com> Reviewed-by: Aaron Lu <aaron.lu@intel.com> Signed-off-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26cxlflash: Increase cmd_per_lun for better throughputManoj N. Kumar
commit 83430833b4d4a9c9b23964babbeb1f36450f8136 upstream. With the current value of cmd_per_lun at 16, the throughput over a single adapter is limited to around 150kIOPS. Increase the value of cmd_per_lun to 256 to improve throughput. With this change a single adapter is able to attain close to the maximum throughput (380kIOPS). Also change the number of RRQ entries that can be queued. Signed-off-by: Manoj N. Kumar <manoj@linux.vnet.ibm.com> Acked-by: Matthew R. Ochs <mrochs@linux.vnet.ibm.com> Reviewed-by: Uma Krishnan <ukrishn@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26crypto: mcryptd - Fix load failureWang, Rui Y
commit ddef482420b1ba8ec45e6123a7e8d3f67b21e5e3 upstream. mcryptd_create_hash() fails by returning -EINVAL, causing any driver using mcryptd to fail to load. It is because it needs to set its statesize properly. Signed-off-by: Rui Wang <rui.y.wang@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26crypto: cryptd - Assign statesize properlyWang, Rui Y
commit 1a07834024dfca5c4bed5de8f8714306e0a11836 upstream. cryptd_create_hash() fails by returning -EINVAL. It is because after 8996eafdc ("crypto: ahash - ensure statesize is non-zero") all ahash drivers must have a non-zero statesize. This patch fixes the problem by properly assigning the statesize. Signed-off-by: Rui Wang <rui.y.wang@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26crypto: ghash-clmulni - Fix load failureWang, Rui Y
commit 3a020a723c65eb8ffa7c237faca26521a024e582 upstream. ghash_clmulni_intel fails to load on Linux 4.3+ with the following message: "modprobe: ERROR: could not insert 'ghash_clmulni_intel': Invalid argument" After 8996eafdc ("crypto: ahash - ensure statesize is non-zero") all ahash drivers are required to implement import()/export(), and must have a non- zero statesize. This patch has been tested with the algif_hash interface. The calculated digest values, after several rounds of import()s and export()s, match those calculated by tcrypt. Signed-off-by: Rui Wang <rui.y.wang@intel.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26USB: don't free bandwidth_mutex too earlyAlan Stern
commit ab2a4bf83902c170d29ba130a8abb5f9d90559e1 upstream. The USB core contains a bug that can show up when a USB-3 host controller is removed. If the primary (USB-2) hcd structure is released before the shared (USB-3) hcd, the core will try to do a double-free of the common bandwidth_mutex. The problem was described in graphical form by Chung-Geol Kim, who first reported it: ================================================= At *remove USB(3.0) Storage sequence <1> --> <5> ((Problem Case)) ================================================= VOLD ------------------------------------|------------ (uevent) ________|_________ |<1> | |dwc3_otg_sm_work | |usb_put_hcd | |peer_hcd(kref=2)| |__________________| ________|_________ |<2> | |New USB BUS #2 | | | |peer_hcd(kref=1) | | | --(Link)-bandXX_mutex| | |__________________| | ___________________ | |<3> | | |dwc3_otg_sm_work | | |usb_put_hcd | | |primary_hcd(kref=1)| | |___________________| | _________|_________ | |<4> | | |New USB BUS #1 | | |hcd_release | | |primary_hcd(kref=0)| | | | | |bandXX_mutex(free) |<- |___________________| (( VOLD )) ______|___________ |<5> | | SCSI | |usb_put_hcd | |peer_hcd(kref=0) | |*hcd_release | |bandXX_mutex(free*)|<- double free |__________________| ================================================= This happens because hcd_release() frees the bandwidth_mutex whenever it sees a primary hcd being released (which is not a very good idea in any case), but in the course of releasing the primary hcd, it changes the pointers in the shared hcd in such a way that the shared hcd will appear to be primary when it gets released. This patch fixes the problem by changing hcd_release() so that it deallocates the bandwidth_mutex only when the _last_ hcd structure referencing it is released. The patch also removes an unnecessary test, so that when an hcd is released, both the shared_hcd and primary_hcd pointers in the hcd's peer will be cleared. Signed-off-by: Alan Stern <stern@rowland.harvard.edu> Reported-by: Chung-Geol Kim <chunggeol.kim@samsung.com> Tested-by: Chung-Geol Kim <chunggeol.kim@samsung.com> Cc: Sumit Semwal <sumit.semwal@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-26usb: core: hub: hub_port_init lock controller instead of busChris Bainbridge
commit feb26ac31a2a5cb88d86680d9a94916a6343e9e6 upstream. The XHCI controller presents two USB buses to the system - one for USB2 and one for USB3. The hub init code (hub_port_init) is reentrant but only locks one bus per thread, leading to a race condition failure when two threads attempt to simultaneously initialise a USB2 and USB3 device: [ 8.034843] xhci_hcd 0000:00:14.0: Timeout while waiting for setup device command [ 13.183701] usb 3-3: device descriptor read/all, error -110 On a test system this failure occurred on 6% of all boots. The call traces at the point of failure are: Call Trace: [<ffffffff81b9bab7>] schedule+0x37/0x90 [<ffffffff817da7cd>] usb_kill_urb+0x8d/0xd0 [<ffffffff8111e5e0>] ? wake_up_atomic_t+0x30/0x30 [<ffffffff817dafbe>] usb_start_wait_urb+0xbe/0x150 [<ffffffff817db10c>] usb_control_msg+0xbc/0xf0 [<ffffffff817d07de>] hub_port_init+0x51e/0xb70 [<ffffffff817d4697>] hub_event+0x817/0x1570 [<ffffffff810f3e6f>] process_one_work+0x1ff/0x620 [<ffffffff810f3dcf>] ? process_one_work+0x15f/0x620 [<ffffffff810f4684>] worker_thread+0x64/0x4b0 [<ffffffff810f4620>] ? rescuer_thread+0x390/0x390 [<ffffffff810fa7f5>] kthread+0x105/0x120 [<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200 [<ffffffff81ba183f>] ret_from_fork+0x3f/0x70 [<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200 Call Trace: [<ffffffff817fd36d>] xhci_setup_device+0x53d/0xa40 [<ffffffff817fd87e>] xhci_address_device+0xe/0x10 [<ffffffff817d047f>] hub_port_init+0x1bf/0xb70 [<ffffffff811247ed>] ? trace_hardirqs_on+0xd/0x10 [<ffffffff817d4697>] hub_event+0x817/0x1570 [<ffffffff810f3e6f>] process_one_work+0x1ff/0x620 [<ffffffff810f3dcf>] ? process_one_work+0x15f/0x620 [<ffffffff810f4684>] worker_thread+0x64/0x4b0 [<ffffffff810f4620>] ? rescuer_thread+0x390/0x390 [<ffffffff810fa7f5>] kthread+0x105/0x120 [<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200 [<ffffffff81ba183f>] ret_from_fork+0x3f/0x70 [<ffffffff810fa6f0>] ? kthread_create_on_node+0x200/0x200 Which results from the two call chains: hub_port_init usb_get_device_descriptor usb_get_descriptor usb_control_msg usb_internal_control_msg usb_start_wait_urb usb_submit_urb / wait_for_completion_timeout / usb_kill_urb hub_port_init hub_set_address xhci_address_device xhci_setup_device Mathias Nyman explains the current behaviour violates the XHCI spec: hub_port_reset() will end up moving the corresponding xhci device slot to default state. As hub_port_reset() is called several times in hub_port_init() it sounds reasonable that we could end up with two threads having their xhci device slots in default state at the same time, which according to xhci 4.5.3 specs still is a big no no: "Note: Software shall not transition more than one Device Slot to the Default State at a time" So both threads fail at their next task after this. One fails to read the descriptor, and the other fails addressing the device. Fix this in hub_port_init by locking the USB controller (instead of an individual bus) to prevent simultaneous initialisation of both buses. Fixes: 638139eb95d2 ("usb: hub: allow to process more usb hub events in parallel") Link: https://lkml.org/lkml/2016/2/8/312 Link: https://lkml.org/lkml/2016/2/4/748 Signed-off-by: Chris Bainbridge <chris.bainbridge@gmail.com> Cc: stable <stable@vger.kernel.org> Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Sumit Semwal <sumit.semwal@linaro.org> [sumits: minor merge conflict resolution for linux-4.4.y] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22Linux 4.4.56v4.4.56Greg Kroah-Hartman
2017-03-22futex: Add missing error handling to FUTEX_REQUEUE_PIPeter Zijlstra
commit 9bbb25afeb182502ca4f2c4f3f88af0681b34cae upstream. Thomas spotted that fixup_pi_state_owner() can return errors and we fail to unlock the rt_mutex in that case. Reported-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Darren Hart <dvhart@linux.intel.com> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: dvhart@infradead.org Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170304093558.867401760@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22futex: Fix potential use-after-free in FUTEX_REQUEUE_PIPeter Zijlstra
commit c236c8e95a3d395b0494e7108f0d41cf36ec107c upstream. While working on the futex code, I stumbled over this potential use-after-free scenario. Dmitry triggered it later with syzkaller. pi_mutex is a pointer into pi_state, which we drop the reference on in unqueue_me_pi(). So any access to that pointer after that is bad. Since other sites already do rt_mutex_unlock() with hb->lock held, see for example futex_lock_pi(), simply move the unlock before unqueue_me_pi(). Reported-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org> Reviewed-by: Darren Hart <dvhart@linux.intel.com> Cc: juri.lelli@arm.com Cc: bigeasy@linutronix.de Cc: xlpang@redhat.com Cc: rostedt@goodmis.org Cc: mathieu.desnoyers@efficios.com Cc: jdesfossez@efficios.com Cc: dvhart@infradead.org Cc: bristot@redhat.com Link: http://lkml.kernel.org/r/20170304093558.801744246@infradead.org Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22x86/perf: Fix CR4.PCE propagation to use active_mm instead of mmAndy Lutomirski
commit 5dc855d44c2ad960a86f593c60461f1ae1566b6d upstream. If one thread mmaps a perf event while another thread in the same mm is in some context where active_mm != mm (which can happen in the scheduler, for example), refresh_pce() would write the wrong value to CR4.PCE. This broke some PAPI tests. Reported-and-tested-by: Vince Weaver <vincent.weaver@maine.edu> Signed-off-by: Andy Lutomirski <luto@kernel.org> Cc: Alexander Shishkin <alexander.shishkin@linux.intel.com> Cc: Arnaldo Carvalho de Melo <acme@redhat.com> Cc: Borislav Petkov <bpetkov@suse.de> Cc: H. Peter Anvin <hpa@zytor.com> Cc: Jiri Olsa <jolsa@redhat.com> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Stephane Eranian <eranian@google.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: stable@vger.kernel.org Fixes: 7911d3f7af14 ("perf/x86: Only allow rdpmc if a perf_event is mapped") Link: http://lkml.kernel.org/r/0c5b38a76ea50e405f9abe07a13dfaef87c173a1.1489694270.git.luto@kernel.org Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22x86/kasan: Fix boot with KASAN=y and PROFILE_ANNOTATED_BRANCHES=yAndrey Ryabinin
commit be3606ff739d1c1be36389f8737c577ad87e1f57 upstream. The kernel doesn't boot with both PROFILE_ANNOTATED_BRANCHES=y and KASAN=y options selected. With branch profiling enabled we end up calling ftrace_likely_update() before kasan_early_init(). ftrace_likely_update() is built with KASAN instrumentation, so calling it before kasan has been initialized leads to crash. Use DISABLE_BRANCH_PROFILING define to make sure that we don't call ftrace_likely_update() from early code before kasan_early_init(). Fixes: ef7f0d6a6ca8 ("x86_64: add KASan support") Reported-by: Fengguang Wu <fengguang.wu@intel.com> Signed-off-by: Andrey Ryabinin <aryabinin@virtuozzo.com> Cc: kasan-dev@googlegroups.com Cc: Alexander Potapenko <glider@google.com> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: lkp@01.org Cc: Dmitry Vyukov <dvyukov@google.com> Link: http://lkml.kernel.org/r/20170313163337.1704-1-aryabinin@virtuozzo.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22fscrypto: lock inode while setting encryption policyEric Biggers
commit 8906a8223ad4909b391c5628f7991ebceda30e52 upstream. i_rwsem needs to be acquired while setting an encryption policy so that concurrent calls to FS_IOC_SET_ENCRYPTION_POLICY are correctly serialized (especially the ->get_context() + ->set_context() pair), and so that new files cannot be created in the directory during or after the ->empty_dir() check. Signed-off-by: Eric Biggers <ebiggers@google.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Richard Weinberger <richard@nod.at> Cc: stable@vger.kernel.org Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22fscrypt: fix renaming and linking special filesEric Biggers
commit 42d97eb0ade31e1bc537d086842f5d6e766d9d51 upstream. Attempting to link a device node, named pipe, or socket file into an encrypted directory through rename(2) or link(2) always failed with EPERM. This happened because fscrypt_has_permitted_context() saw that the file was unencrypted and forbid creating the link. This behavior was unexpected because such files are never encrypted; only regular files, directories, and symlinks can be encrypted. To fix this, make fscrypt_has_permitted_context() always return true on special files. This will be covered by a test in my encryption xfstests patchset. Fixes: 9bd8212f981e ("ext4 crypto: add encryption policy and password salt support") Signed-off-by: Eric Biggers <ebiggers@google.com> Reviewed-by: Richard Weinberger <richard@nod.at> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22net sched actions: decrement module reference count after table flush.Roman Mashak
[ Upstream commit edb9d1bff4bbe19b8ae0e71b1f38732591a9eeb2 ] When tc actions are loaded as a module and no actions have been installed, flushing them would result in actions removed from the memory, but modules reference count not being decremented, so that the modules would not be unloaded. Following is example with GACT action: % sudo modprobe act_gact % lsmod Module Size Used by act_gact 16384 0 % % sudo tc actions ls action gact % % sudo tc actions flush action gact % lsmod Module Size Used by act_gact 16384 1 % sudo tc actions flush action gact % lsmod Module Size Used by act_gact 16384 2 % sudo rmmod act_gact rmmod: ERROR: Module act_gact is in use .... After the fix: % lsmod Module Size Used by act_gact 16384 0 % % sudo tc actions add action pass index 1 % sudo tc actions add action pass index 2 % sudo tc actions add action pass index 3 % lsmod Module Size Used by act_gact 16384 3 % % sudo tc actions flush action gact % lsmod Module Size Used by act_gact 16384 0 % % sudo tc actions flush action gact % lsmod Module Size Used by act_gact 16384 0 % sudo rmmod act_gact % lsmod Module Size Used by % Fixes: f97017cdefef ("net-sched: Fix actions flushing") Signed-off-by: Roman Mashak <mrv@mojatatu.com> Signed-off-by: Jamal Hadi Salim <jhs@mojatatu.com> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22dccp: fix memory leak during tear-down of unsuccessful connection requestHannes Frederic Sowa
[ Upstream commit 72ef9c4125c7b257e3a714d62d778ab46583d6a3 ] This patch fixes a memory leak, which happens if the connection request is not fulfilled between parsing the DCCP options and handling the SYN (because e.g. the backlog is full), because we forgot to free the list of ack vectors. Reported-by: Jianwen Ji <jiji@redhat.com> Signed-off-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22dccp/tcp: fix routing redirect raceJon Maxwell
[ Upstream commit 45caeaa5ac0b4b11784ac6f932c0ad4c6b67cda0 ] As Eric Dumazet pointed out this also needs to be fixed in IPv6. v2: Contains the IPv6 tcp/Ipv6 dccp patches as well. We have seen a few incidents lately where a dst_enty has been freed with a dangling TCP socket reference (sk->sk_dst_cache) pointing to that dst_entry. If the conditions/timings are right a crash then ensues when the freed dst_entry is referenced later on. A Common crashing back trace is: #8 [] page_fault at ffffffff8163e648 [exception RIP: __tcp_ack_snd_check+74] . . #9 [] tcp_rcv_established at ffffffff81580b64 #10 [] tcp_v4_do_rcv at ffffffff8158b54a #11 [] tcp_v4_rcv at ffffffff8158cd02 #12 [] ip_local_deliver_finish at ffffffff815668f4 #13 [] ip_local_deliver at ffffffff81566bd9 #14 [] ip_rcv_finish at ffffffff8156656d #15 [] ip_rcv at ffffffff81566f06 #16 [] __netif_receive_skb_core at ffffffff8152b3a2 #17 [] __netif_receive_skb at ffffffff8152b608 #18 [] netif_receive_skb at ffffffff8152b690 #19 [] vmxnet3_rq_rx_complete at ffffffffa015eeaf [vmxnet3] #20 [] vmxnet3_poll_rx_only at ffffffffa015f32a [vmxnet3] #21 [] net_rx_action at ffffffff8152bac2 #22 [] __do_softirq at ffffffff81084b4f #23 [] call_softirq at ffffffff8164845c #24 [] do_softirq at ffffffff81016fc5 #25 [] irq_exit at ffffffff81084ee5 #26 [] do_IRQ at ffffffff81648ff8 Of course it may happen with other NIC drivers as well. It's found the freed dst_entry here: 224 static bool tcp_in_quickack_mode(struct sock *sk)↩ 225 {↩ 226 ▹ const struct inet_connection_sock *icsk = inet_csk(sk);↩ 227 ▹ const struct dst_entry *dst = __sk_dst_get(sk);↩ 228 ↩ 229 ▹ return (dst && dst_metric(dst, RTAX_QUICKACK)) ||↩ 230 ▹ ▹ (icsk->icsk_ack.quick && !icsk->icsk_ack.pingpong);↩ 231 }↩ But there are other backtraces attributed to the same freed dst_entry in netfilter code as well. All the vmcores showed 2 significant clues: - Remote hosts behind the default gateway had always been redirected to a different gateway. A rtable/dst_entry will be added for that host. Making more dst_entrys with lower reference counts. Making this more probable. - All vmcores showed a postitive LockDroppedIcmps value, e.g: LockDroppedIcmps 267 A closer look at the tcp_v4_err() handler revealed that do_redirect() will run regardless of whether user space has the socket locked. This can result in a race condition where the same dst_entry cached in sk->sk_dst_entry can be decremented twice for the same socket via: do_redirect()->__sk_dst_check()-> dst_release(). Which leads to the dst_entry being prematurely freed with another socket pointing to it via sk->sk_dst_cache and a subsequent crash. To fix this skip do_redirect() if usespace has the socket locked. Instead let the redirect take place later when user space does not have the socket locked. The dccp/IPv6 code is very similar in this respect, so fixing it there too. As Eric Garver pointed out the following commit now invalidates routes. Which can set the dst->obsolete flag so that ipv4_dst_check() returns null and triggers the dst_release(). Fixes: ceb3320610d6 ("ipv4: Kill routes during PMTU/redirect updates.") Cc: Eric Garver <egarver@redhat.com> Cc: Hannes Sowa <hsowa@redhat.com> Signed-off-by: Jon Maxwell <jmaxwell37@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22bridge: drop netfilter fake rtable unconditionallyFlorian Westphal
[ Upstream commit a13b2082ece95247779b9995c4e91b4246bed023 ] Andreas reports kernel oops during rmmod of the br_netfilter module. Hannes debugged the oops down to a NULL rt6info->rt6i_indev. Problem is that br_netfilter has the nasty concept of adding a fake rtable to skb->dst; this happens in a br_netfilter prerouting hook. A second hook (in bridge LOCAL_IN) is supposed to remove these again before the skb is handed up the stack. However, on module unload hooks get unregistered which means an skb could traverse the prerouting hook that attaches the fake_rtable, while the 'fake rtable remove' hook gets removed from the hooklist immediately after. Fixes: 34666d467cbf1e2e3c7 ("netfilter: bridge: move br_netfilter out of the core") Reported-by: Andreas Karis <akaris@redhat.com> Debugged-by: Hannes Frederic Sowa <hannes@stressinduktion.org> Signed-off-by: Florian Westphal <fw@strlen.de> Acked-by: Pablo Neira Ayuso <pablo@netfilter.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22ipv6: avoid write to a possibly cloned skbFlorian Westphal
[ Upstream commit 79e49503efe53a8c51d8b695bedc8a346c5e4a87 ] ip6_fragment, in case skb has a fraglist, checks if the skb is cloned. If it is, it will move to the 'slow path' and allocates new skbs for each fragment. However, right before entering the slowpath loop, it updates the nexthdr value of the last ipv6 extension header to NEXTHDR_FRAGMENT, to account for the fragment header that will be inserted in the new ipv6-fragment skbs. In case original skb is cloned this munges nexthdr value of another skb. Avoid this by doing the nexthdr update for each of the new fragment skbs separately. This was observed with tcpdump on a bridge device where netfilter ipv6 reassembly is active: tcpdump shows malformed fragment headers as the l4 header (icmpv6, tcp, etc). is decoded as a fragment header. Cc: Hannes Frederic Sowa <hannes@stressinduktion.org> Reported-by: Andreas Karis <akaris@redhat.com> Signed-off-by: Florian Westphal <fw@strlen.de> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22ipv6: make ECMP route replacement less greedySabrina Dubroca
[ Upstream commit 67e194007be08d071294456274dd53e0a04fdf90 ] Commit 27596472473a ("ipv6: fix ECMP route replacement") introduced a loop that removes all siblings of an ECMP route that is being replaced. However, this loop doesn't stop when it has replaced siblings, and keeps removing other routes with a higher metric. We also end up triggering the WARN_ON after the loop, because after this nsiblings < 0. Instead, stop the loop when we have taken care of all routes with the same metric as the route being replaced. Reproducer: =========== #!/bin/sh ip netns add ns1 ip netns add ns2 ip -net ns1 link set lo up for x in 0 1 2 ; do ip link add veth$x netns ns2 type veth peer name eth$x netns ns1 ip -net ns1 link set eth$x up ip -net ns2 link set veth$x up done ip -net ns1 -6 r a 2000::/64 nexthop via fe80::0 dev eth0 \ nexthop via fe80::1 dev eth1 nexthop via fe80::2 dev eth2 ip -net ns1 -6 r a 2000::/64 via fe80::42 dev eth0 metric 256 ip -net ns1 -6 r a 2000::/64 via fe80::43 dev eth0 metric 2048 echo "before replace, 3 routes" ip -net ns1 -6 r | grep -v '^fe80\|^ff00' echo ip -net ns1 -6 r c 2000::/64 nexthop via fe80::4 dev eth0 \ nexthop via fe80::5 dev eth1 nexthop via fe80::6 dev eth2 echo "after replace, only 2 routes, metric 2048 is gone" ip -net ns1 -6 r | grep -v '^fe80\|^ff00' Fixes: 27596472473a ("ipv6: fix ECMP route replacement") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Acked-by: Nicolas Dichtel <nicolas.dichtel@6wind.com> Reviewed-by: Xin Long <lucien.xin@gmail.com> Reviewed-by: Michal Kubecek <mkubecek@suse.cz> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22mpls: Send route delete notifications when router module is unloadedDavid Ahern
[ Upstream commit e37791ec1ad785b59022ae211f63a16189bacebf ] When the mpls_router module is unloaded, mpls routes are deleted but notifications are not sent to userspace leaving userspace caches out of sync. Add the call to mpls_notify_route in mpls_net_exit as routes are freed. Fixes: 0189197f44160 ("mpls: Basic routing support") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22act_connmark: avoid crashing on malformed nlattrs with null parmsEtienne Noss
[ Upstream commit 52491c7607c5527138095edf44c53169dc1ddb82 ] tcf_connmark_init does not check in its configuration if TCA_CONNMARK_PARMS is set, resulting in a null pointer dereference when trying to access it. [501099.043007] BUG: unable to handle kernel NULL pointer dereference at 0000000000000004 [501099.043039] IP: [<ffffffffc10c60fb>] tcf_connmark_init+0x8b/0x180 [act_connmark] ... [501099.044334] Call Trace: [501099.044345] [<ffffffffa47270e8>] ? tcf_action_init_1+0x198/0x1b0 [501099.044363] [<ffffffffa47271b0>] ? tcf_action_init+0xb0/0x120 [501099.044380] [<ffffffffa47250a4>] ? tcf_exts_validate+0xc4/0x110 [501099.044398] [<ffffffffc0f5fa97>] ? u32_set_parms+0xa7/0x270 [cls_u32] [501099.044417] [<ffffffffc0f60bf0>] ? u32_change+0x680/0x87b [cls_u32] [501099.044436] [<ffffffffa4725d1d>] ? tc_ctl_tfilter+0x4dd/0x8a0 [501099.044454] [<ffffffffa44a23a1>] ? security_capable+0x41/0x60 [501099.044471] [<ffffffffa470ca01>] ? rtnetlink_rcv_msg+0xe1/0x220 [501099.044490] [<ffffffffa470c920>] ? rtnl_newlink+0x870/0x870 [501099.044507] [<ffffffffa472cc61>] ? netlink_rcv_skb+0xa1/0xc0 [501099.044524] [<ffffffffa47073f4>] ? rtnetlink_rcv+0x24/0x30 [501099.044541] [<ffffffffa472c634>] ? netlink_unicast+0x184/0x230 [501099.044558] [<ffffffffa472c9d8>] ? netlink_sendmsg+0x2f8/0x3b0 [501099.044576] [<ffffffffa46d8880>] ? sock_sendmsg+0x30/0x40 [501099.044592] [<ffffffffa46d8e03>] ? SYSC_sendto+0xd3/0x150 [501099.044608] [<ffffffffa425fda1>] ? __do_page_fault+0x2d1/0x510 [501099.044626] [<ffffffffa47fbd7b>] ? system_call_fast_compare_end+0xc/0x9b Fixes: 22a5dc0e5e3e ("net: sched: Introduce connmark action") Signed-off-by: Étienne Noss <etienne.noss@wifirst.fr> Signed-off-by: Victorien Molle <victorien.molle@wifirst.fr> Acked-by: Cong Wang <xiyou.wangcong@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22uapi: fix linux/packet_diag.h userspace compilation errorDmitry V. Levin
[ Upstream commit 745cb7f8a5de0805cade3de3991b7a95317c7c73 ] Replace MAX_ADDR_LEN with its numeric value to fix the following linux/packet_diag.h userspace compilation error: /usr/include/linux/packet_diag.h:67:17: error: 'MAX_ADDR_LEN' undeclared here (not in a function) __u8 pdmc_addr[MAX_ADDR_LEN]; This is not the first case in the UAPI where the numeric value of MAX_ADDR_LEN is used instead of symbolic one, uapi/linux/if_link.h already does the same: $ grep MAX_ADDR_LEN include/uapi/linux/if_link.h __u8 mac[32]; /* MAX_ADDR_LEN */ There are no UAPI headers besides these two that use MAX_ADDR_LEN. Signed-off-by: Dmitry V. Levin <ldv@altlinux.org> Acked-by: Pavel Emelyanov <xemul@virtuozzo.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22vrf: Fix use-after-free in vrf_xmitDavid Ahern
[ Upstream commit f7887d40e541f74402df0684a1463c0a0bb68c68 ] KASAN detected a use-after-free: [ 269.467067] BUG: KASAN: use-after-free in vrf_xmit+0x7f1/0x827 [vrf] at addr ffff8800350a21c0 [ 269.467067] Read of size 4 by task ssh/1879 [ 269.467067] CPU: 1 PID: 1879 Comm: ssh Not tainted 4.10.0+ #249 [ 269.467067] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.7.5-20140531_083030-gandalf 04/01/2014 [ 269.467067] Call Trace: [ 269.467067] dump_stack+0x81/0xb6 [ 269.467067] kasan_object_err+0x21/0x78 [ 269.467067] kasan_report+0x2f7/0x450 [ 269.467067] ? vrf_xmit+0x7f1/0x827 [vrf] [ 269.467067] ? ip_output+0xa4/0xdb [ 269.467067] __asan_load4+0x6b/0x6d [ 269.467067] vrf_xmit+0x7f1/0x827 [vrf] ... Which corresponds to the skb access after xmit handling. Fix by saving skb->len and using the saved value to update stats. Fixes: 193125dbd8eb2 ("net: Introduce VRF device driver") Signed-off-by: David Ahern <dsa@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22dccp: fix use-after-free in dccp_feat_activate_valuesEric Dumazet
[ Upstream commit 62f8f4d9066c1c6f2474845d1ca7e2891f2ae3fd ] Dmitry reported crashes in DCCP stack [1] Problem here is that when I got rid of listener spinlock, I missed the fact that DCCP stores a complex state in struct dccp_request_sock, while TCP does not. Since multiple cpus could access it at the same time, we need to add protection. [1] BUG: KASAN: use-after-free in dccp_feat_activate_values+0x967/0xab0 net/dccp/feat.c:1541 at addr ffff88003713be68 Read of size 8 by task syz-executor2/8457 CPU: 2 PID: 8457 Comm: syz-executor2 Not tainted 4.10.0-rc7+ #127 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 Call Trace: <IRQ> __dump_stack lib/dump_stack.c:15 [inline] dump_stack+0x292/0x398 lib/dump_stack.c:51 kasan_object_err+0x1c/0x70 mm/kasan/report.c:162 print_address_description mm/kasan/report.c:200 [inline] kasan_report_error mm/kasan/report.c:289 [inline] kasan_report.part.1+0x20e/0x4e0 mm/kasan/report.c:311 kasan_report mm/kasan/report.c:332 [inline] __asan_report_load8_noabort+0x29/0x30 mm/kasan/report.c:332 dccp_feat_activate_values+0x967/0xab0 net/dccp/feat.c:1541 dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121 dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457 dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186 dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711 ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279 NF_HOOK include/linux/netfilter.h:257 [inline] ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322 dst_input include/net/dst.h:507 [inline] ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69 NF_HOOK include/linux/netfilter.h:257 [inline] ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203 __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190 __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228 process_backlog+0xe5/0x6c0 net/core/dev.c:4839 napi_poll net/core/dev.c:5202 [inline] net_rx_action+0xe70/0x1900 net/core/dev.c:5267 __do_softirq+0x2fb/0xb7d kernel/softirq.c:284 do_softirq_own_stack+0x1c/0x30 arch/x86/entry/entry_64.S:902 </IRQ> do_softirq.part.17+0x1e8/0x230 kernel/softirq.c:328 do_softirq kernel/softirq.c:176 [inline] __local_bh_enable_ip+0x1f2/0x200 kernel/softirq.c:181 local_bh_enable include/linux/bottom_half.h:31 [inline] rcu_read_unlock_bh include/linux/rcupdate.h:971 [inline] ip6_finish_output2+0xbb0/0x23d0 net/ipv6/ip6_output.c:123 ip6_finish_output+0x302/0x960 net/ipv6/ip6_output.c:148 NF_HOOK_COND include/linux/netfilter.h:246 [inline] ip6_output+0x1cb/0x8d0 net/ipv6/ip6_output.c:162 ip6_xmit+0xcdf/0x20d0 include/net/dst.h:501 inet6_csk_xmit+0x320/0x5f0 net/ipv6/inet6_connection_sock.c:179 dccp_transmit_skb+0xb09/0x1120 net/dccp/output.c:141 dccp_xmit_packet+0x215/0x760 net/dccp/output.c:280 dccp_write_xmit+0x168/0x1d0 net/dccp/output.c:362 dccp_sendmsg+0x79c/0xb10 net/dccp/proto.c:796 inet_sendmsg+0x164/0x5b0 net/ipv4/af_inet.c:744 sock_sendmsg_nosec net/socket.c:635 [inline] sock_sendmsg+0xca/0x110 net/socket.c:645 SYSC_sendto+0x660/0x810 net/socket.c:1687 SyS_sendto+0x40/0x50 net/socket.c:1655 entry_SYSCALL_64_fastpath+0x1f/0xc2 RIP: 0033:0x4458b9 RSP: 002b:00007f8ceb77bb58 EFLAGS: 00000282 ORIG_RAX: 000000000000002c RAX: ffffffffffffffda RBX: 0000000000000017 RCX: 00000000004458b9 RDX: 0000000000000023 RSI: 0000000020e60000 RDI: 0000000000000017 RBP: 00000000006e1b90 R08: 00000000200f9fe1 R09: 0000000000000020 R10: 0000000000008010 R11: 0000000000000282 R12: 00000000007080a8 R13: 0000000000000000 R14: 00007f8ceb77c9c0 R15: 00007f8ceb77c700 Object at ffff88003713be50, in cache kmalloc-64 size: 64 Allocated: PID = 8446 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57 save_stack+0x43/0xd0 mm/kasan/kasan.c:502 set_track mm/kasan/kasan.c:514 [inline] kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:605 kmem_cache_alloc_trace+0x82/0x270 mm/slub.c:2738 kmalloc include/linux/slab.h:490 [inline] dccp_feat_entry_new+0x214/0x410 net/dccp/feat.c:467 dccp_feat_push_change+0x38/0x220 net/dccp/feat.c:487 __feat_register_sp+0x223/0x2f0 net/dccp/feat.c:741 dccp_feat_propagate_ccid+0x22b/0x2b0 net/dccp/feat.c:949 dccp_feat_server_ccid_dependencies+0x1b3/0x250 net/dccp/feat.c:1012 dccp_make_response+0x1f1/0xc90 net/dccp/output.c:423 dccp_v6_send_response+0x4ec/0xc20 net/dccp/ipv6.c:217 dccp_v6_conn_request+0xaba/0x11b0 net/dccp/ipv6.c:377 dccp_rcv_state_process+0x51e/0x1650 net/dccp/input.c:606 dccp_v6_do_rcv+0x213/0x350 net/dccp/ipv6.c:632 sk_backlog_rcv include/net/sock.h:893 [inline] __sk_receive_skb+0x36f/0xcc0 net/core/sock.c:479 dccp_v6_rcv+0xba5/0x1d00 net/dccp/ipv6.c:742 ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279 NF_HOOK include/linux/netfilter.h:257 [inline] ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322 dst_input include/net/dst.h:507 [inline] ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69 NF_HOOK include/linux/netfilter.h:257 [inline] ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203 __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190 __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228 process_backlog+0xe5/0x6c0 net/core/dev.c:4839 napi_poll net/core/dev.c:5202 [inline] net_rx_action+0xe70/0x1900 net/core/dev.c:5267 __do_softirq+0x2fb/0xb7d kernel/softirq.c:284 Freed: PID = 15 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57 save_stack+0x43/0xd0 mm/kasan/kasan.c:502 set_track mm/kasan/kasan.c:514 [inline] kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:578 slab_free_hook mm/slub.c:1355 [inline] slab_free_freelist_hook mm/slub.c:1377 [inline] slab_free mm/slub.c:2954 [inline] kfree+0xe8/0x2b0 mm/slub.c:3874 dccp_feat_entry_destructor.part.4+0x48/0x60 net/dccp/feat.c:418 dccp_feat_entry_destructor net/dccp/feat.c:416 [inline] dccp_feat_list_pop net/dccp/feat.c:541 [inline] dccp_feat_activate_values+0x57f/0xab0 net/dccp/feat.c:1543 dccp_create_openreq_child+0x464/0x610 net/dccp/minisocks.c:121 dccp_v6_request_recv_sock+0x1f6/0x1960 net/dccp/ipv6.c:457 dccp_check_req+0x335/0x5a0 net/dccp/minisocks.c:186 dccp_v6_rcv+0x69e/0x1d00 net/dccp/ipv6.c:711 ip6_input_finish+0x46d/0x17a0 net/ipv6/ip6_input.c:279 NF_HOOK include/linux/netfilter.h:257 [inline] ip6_input+0xdb/0x590 net/ipv6/ip6_input.c:322 dst_input include/net/dst.h:507 [inline] ip6_rcv_finish+0x289/0x890 net/ipv6/ip6_input.c:69 NF_HOOK include/linux/netfilter.h:257 [inline] ipv6_rcv+0x12ec/0x23d0 net/ipv6/ip6_input.c:203 __netif_receive_skb_core+0x1ae5/0x3400 net/core/dev.c:4190 __netif_receive_skb+0x2a/0x170 net/core/dev.c:4228 process_backlog+0xe5/0x6c0 net/core/dev.c:4839 napi_poll net/core/dev.c:5202 [inline] net_rx_action+0xe70/0x1900 net/core/dev.c:5267 __do_softirq+0x2fb/0xb7d kernel/softirq.c:284 Memory state around the buggy address: ffff88003713bd00: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc ffff88003713bd80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc >ffff88003713be00: fc fc fc fc fc fc fc fc fc fc fb fb fb fb fb fb ^ Fixes: 079096f103fa ("tcp/dccp: install syn_recv requests into ehash table") Signed-off-by: Eric Dumazet <edumazet@google.com> Reported-by: Dmitry Vyukov <dvyukov@google.com> Tested-by: Dmitry Vyukov <dvyukov@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-03-22net: fix socket refcounting in skb_complete_tx_timestamp()Eric Dumazet
[ Upstream commit 9ac25fc063751379cb77434fef9f3b088cd3e2f7 ] TX skbs do not necessarily hold a reference on skb->sk->sk_refcnt By the time TX completion happens, sk_refcnt might be already 0. sock_hold()/sock_put() would then corrupt critical state, like sk_wmem_alloc and lead to leaks or use after free. Fixes: 62bccb8cdb69 ("net-timestamp: Make the clone operation stand-alone from phy timestamping") Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Alexander Duyck <alexander.h.duyck@intel.com> Cc: Johannes Berg <johannes@sipsolutions.net> Cc: Soheil Hassas Yeganeh <soheil@google.com> Cc: Willem de Bruijn <willemb@google.com> Acked-by: Soheil Hassas Yeganeh <soheil@google.com> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>