summaryrefslogtreecommitdiff
AgeCommit message (Collapse)Author
2019-01-08Merge remote-tracking branch 'remotes/origin/linux-4.19.y'toradex_4.19.yLuka Pivk
Signed-off-by: Luka Pivk <luka.pivk@toradex.com>
2019-01-07ARM: dts: imx7: add DMA properties for ECSPIStefan Agner
Allow to use DMA for SPI by adding the appropriate DMA properites to the ecspi nodes. Signed-off-by: Stefan Agner <stefan@agner.ch>
2019-01-07ARM: dts: imx6ul: add DMA properties for ECSPIStefan Agner
Allow to use DMA for SPI by adding the appropriate DMA properites to the ecspi nodes. Signed-off-by: Stefan Agner <stefan@agner.ch>
2019-01-03ARM: imx: don't build ssi-fiq if not requiredStefan Agner
The symbols provided by ssi-fiq are used in sound/soc/fsl/imx-pcm-fiq.c only. Build ssi-fiq.o/ssi-fiq-ksym.o only if SND_SOC_IMX_PCM_FIQ is enabled. Signed-off-by: Stefan Agner <stefan@agner.ch>
2018-12-29Linux 4.19.13Greg Kroah-Hartman
2018-12-29drm/ioctl: Fix Spectre v1 vulnerabilitiesGustavo A. R. Silva
commit 505b5240329b922f21f91d5b5d1e535c805eca6d upstream. nr is indirectly controlled by user-space, hence leading to a potential exploitation of the Spectre variant 1 vulnerability. This issue was detected with the help of Smatch: drivers/gpu/drm/drm_ioctl.c:805 drm_ioctl() warn: potential spectre issue 'dev->driver->ioctls' [r] drivers/gpu/drm/drm_ioctl.c:810 drm_ioctl() warn: potential spectre issue 'drm_ioctls' [r] (local cap) drivers/gpu/drm/drm_ioctl.c:892 drm_ioctl_flags() warn: potential spectre issue 'drm_ioctls' [r] (local cap) Fix this by sanitizing nr before using it to index dev->driver->ioctls and drm_ioctls. Notice that given that speculation windows are large, the policy is to kill the speculation on the first load and not worry if it can be completed with a dependent load/store [1]. [1] https://marc.info/?l=linux-kernel&m=152449131114778&w=2 Cc: stable@vger.kernel.org Signed-off-by: Gustavo A. R. Silva <gustavo@embeddedor.com> Signed-off-by: Daniel Vetter <daniel.vetter@ffwll.ch> Link: https://patchwork.freedesktop.org/patch/msgid/20181220000015.GA18973@embeddedor Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29proc/sysctl: don't return ENOMEM on lookup when a table is unregisteringIvan Delalande
commit ea5751ccd665a2fd1b24f9af81f6167f0718c5f6 upstream. proc_sys_lookup can fail with ENOMEM instead of ENOENT when the corresponding sysctl table is being unregistered. In our case we see this upon opening /proc/sys/net/*/conf files while network interfaces are being deleted, which confuses our configuration daemon. The problem was successfully reproduced and this fix tested on v4.9.122 and v4.20-rc6. v2: return ERR_PTRs in all cases when proc_sys_make_inode fails instead of mixing them with NULL. Thanks Al Viro for the feedback. Fixes: ace0c791e6c3 ("proc/sysctl: Don't grab i_lock under sysctl_lock.") Cc: stable@vger.kernel.org Signed-off-by: Ivan Delalande <colona@arista.com> Signed-off-by: Al Viro <viro@zeniv.linux.org.uk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29Input: elantech - disable elan-i2c for P52 and P72Benjamin Tissoires
commit d21ff5d7f8c397261e095393a1a8e199934720bc upstream. The current implementation of elan_i2c is known to not support those 2 laptops. A proper fix is to tweak both elantech and elan_i2c to transmit the correct information from PS/2, which would make a bad candidate for stable. So to give us some time for fixing the root of the problem, disable elan_i2c for the devices we know are not behaving properly. Link: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1803600 Link: https://bugs.archlinux.org/task/59714 Fixes: df077237cf55 Input: elantech - detect new ICs and setup Host Notify for them Cc: stable@vger.kernel.org # v4.18+ Signed-off-by: Benjamin Tissoires <benjamin.tissoires@redhat.com> Acked-by: Peter Hutterer <peter.hutterer@who-t.net> Signed-off-by: Dmitry Torokhov <dmitry.torokhov@gmail.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29mm: don't miss the last page because of round-off errorRoman Gushchin
commit 68600f623d69da428c6163275f97ca126e1a8ec5 upstream. I've noticed, that dying memory cgroups are often pinned in memory by a single pagecache page. Even under moderate memory pressure they sometimes stayed in such state for a long time. That looked strange. My investigation showed that the problem is caused by applying the LRU pressure balancing math: scan = div64_u64(scan * fraction[lru], denominator), where denominator = fraction[anon] + fraction[file] + 1. Because fraction[lru] is always less than denominator, if the initial scan size is 1, the result is always 0. This means the last page is not scanned and has no chances to be reclaimed. Fix this by rounding up the result of the division. In practice this change significantly improves the speed of dying cgroups reclaim. [guro@fb.com: prevent double calculation of DIV64_U64_ROUND_UP() arguments] Link: http://lkml.kernel.org/r/20180829213311.GA13501@castle Link: http://lkml.kernel.org/r/20180827162621.30187-3-guro@fb.com Signed-off-by: Roman Gushchin <guro@fb.com> Reviewed-by: Andrew Morton <akpm@linux-foundation.org> Cc: Johannes Weiner <hannes@cmpxchg.org> Cc: Michal Hocko <mhocko@kernel.org> Cc: Tejun Heo <tj@kernel.org> Cc: Rik van Riel <riel@surriel.com> Cc: Konstantin Khlebnikov <koct9i@gmail.com> Cc: Matthew Wilcox <willy@infradead.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29mm, page_alloc: fix has_unmovable_pages for HugePagesOscar Salvador
commit 17e2e7d7e1b83fa324b3f099bfe426659aa3c2a4 upstream. While playing with gigantic hugepages and memory_hotplug, I triggered the following #PF when "cat memoryX/removable": BUG: unable to handle kernel NULL pointer dereference at 0000000000000008 #PF error: [normal kernel read fault] PGD 0 P4D 0 Oops: 0000 [#1] SMP PTI CPU: 1 PID: 1481 Comm: cat Tainted: G E 4.20.0-rc6-mm1-1-default+ #18 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.0.0-prebuilt.qemu-project.org 04/01/2014 RIP: 0010:has_unmovable_pages+0x154/0x210 Call Trace: is_mem_section_removable+0x7d/0x100 removable_show+0x90/0xb0 dev_attr_show+0x1c/0x50 sysfs_kf_seq_show+0xca/0x1b0 seq_read+0x133/0x380 __vfs_read+0x26/0x180 vfs_read+0x89/0x140 ksys_read+0x42/0x90 do_syscall_64+0x5b/0x180 entry_SYSCALL_64_after_hwframe+0x44/0xa9 The reason is we do not pass the Head to page_hstate(), and so, the call to compound_order() in page_hstate() returns 0, so we end up checking all hstates's size to match PAGE_SIZE. Obviously, we do not find any hstate matching that size, and we return NULL. Then, we dereference that NULL pointer in hugepage_migration_supported() and we got the #PF from above. Fix that by getting the head page before calling page_hstate(). Also, since gigantic pages span several pageblocks, re-adjust the logic for skipping pages. While are it, we can also get rid of the round_up(). [osalvador@suse.de: remove round_up(), adjust skip pages logic per Michal] Link: http://lkml.kernel.org/r/20181221062809.31771-1-osalvador@suse.de Link: http://lkml.kernel.org/r/20181217225113.17864-1-osalvador@suse.de Signed-off-by: Oscar Salvador <osalvador@suse.de> Acked-by: Michal Hocko <mhocko@suse.com> Reviewed-by: David Hildenbrand <david@redhat.com> Cc: Vlastimil Babka <vbabka@suse.cz> Cc: Pavel Tatashin <pavel.tatashin@microsoft.com> Cc: Mike Rapoport <rppt@linux.vnet.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29mm: thp: fix flags for pmd migration when splitPeter Xu
commit 2e83ee1d8694a61d0d95a5b694f2e61e8dde8627 upstream. When splitting a huge migrating PMD, we'll transfer all the existing PMD bits and apply them again onto the small PTEs. However we are fetching the bits unconditionally via pmd_soft_dirty(), pmd_write() or pmd_yound() while actually they don't make sense at all when it's a migration entry. Fix them up. Since at it, drop the ifdef together as not needed. Note that if my understanding is correct about the problem then if without the patch there is chance to lose some of the dirty bits in the migrating pmd pages (on x86_64 we're fetching bit 11 which is part of swap offset instead of bit 2) and it could potentially corrupt the memory of an userspace program which depends on the dirty bit. Link: http://lkml.kernel.org/r/20181213051510.20306-1-peterx@redhat.com Signed-off-by: Peter Xu <peterx@redhat.com> Reviewed-by: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Reviewed-by: William Kucharski <william.kucharski@oracle.com> Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Cc: Andrea Arcangeli <aarcange@redhat.com> Cc: Matthew Wilcox <willy@infradead.org> Cc: Michal Hocko <mhocko@suse.com> Cc: Dave Jiang <dave.jiang@intel.com> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com> Cc: Souptick Joarder <jrdr.linux@gmail.com> Cc: Konstantin Khlebnikov <khlebnikov@yandex-team.ru> Cc: Zi Yan <zi.yan@cs.rutgers.edu> Cc: <stable@vger.kernel.org> [4.14+] Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29mm, memory_hotplug: initialize struct pages for the full memory sectionMikhail Zaslonko
commit 2830bf6f05fb3e05bc4743274b806c821807a684 upstream. If memory end is not aligned with the sparse memory section boundary, the mapping of such a section is only partly initialized. This may lead to VM_BUG_ON due to uninitialized struct page access from is_mem_section_removable() or test_pages_in_a_zone() function triggered by memory_hotplug sysfs handlers: Here are the the panic examples: CONFIG_DEBUG_VM=y CONFIG_DEBUG_VM_PGFLAGS=y kernel parameter mem=2050M -------------------------- page:000003d082008000 is uninitialized and poisoned page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) Call Trace: ( test_pages_in_a_zone+0xde/0x160) show_valid_zones+0x5c/0x190 dev_attr_show+0x34/0x70 sysfs_kf_seq_show+0xc8/0x148 seq_read+0x204/0x480 __vfs_read+0x32/0x178 vfs_read+0x82/0x138 ksys_read+0x5a/0xb0 system_call+0xdc/0x2d8 Last Breaking-Event-Address: test_pages_in_a_zone+0xde/0x160 Kernel panic - not syncing: Fatal exception: panic_on_oops kernel parameter mem=3075M -------------------------- page:000003d08300c000 is uninitialized and poisoned page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p)) Call Trace: ( is_mem_section_removable+0xb4/0x190) show_mem_removable+0x9a/0xd8 dev_attr_show+0x34/0x70 sysfs_kf_seq_show+0xc8/0x148 seq_read+0x204/0x480 __vfs_read+0x32/0x178 vfs_read+0x82/0x138 ksys_read+0x5a/0xb0 system_call+0xdc/0x2d8 Last Breaking-Event-Address: is_mem_section_removable+0xb4/0x190 Kernel panic - not syncing: Fatal exception: panic_on_oops Fix the problem by initializing the last memory section of each zone in memmap_init_zone() till the very end, even if it goes beyond the zone end. Michal said: : This has alwways been problem AFAIU. It just went unnoticed because we : have zeroed memmaps during allocation before f7f99100d8d9 ("mm: stop : zeroing memory during allocation in vmemmap") and so the above test : would simply skip these ranges as belonging to zone 0 or provided a : garbage. : : So I guess we do care for post f7f99100d8d9 kernels mostly and : therefore Fixes: f7f99100d8d9 ("mm: stop zeroing memory during : allocation in vmemmap") Link: http://lkml.kernel.org/r/20181212172712.34019-2-zaslonko@linux.ibm.com Fixes: f7f99100d8d9 ("mm: stop zeroing memory during allocation in vmemmap") Signed-off-by: Mikhail Zaslonko <zaslonko@linux.ibm.com> Reviewed-by: Gerald Schaefer <gerald.schaefer@de.ibm.com> Suggested-by: Michal Hocko <mhocko@kernel.org> Acked-by: Michal Hocko <mhocko@suse.com> Reported-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Tested-by: Mikhail Gavrilov <mikhail.v.gavrilov@gmail.com> Cc: Dave Hansen <dave.hansen@intel.com> Cc: Alexander Duyck <alexander.h.duyck@linux.intel.com> Cc: Pasha Tatashin <Pavel.Tatashin@microsoft.com> Cc: Martin Schwidefsky <schwidefsky@de.ibm.com> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: <stable@vger.kernel.org> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29media: ov5640: Fix set format regressionJacopo Mondi
commit 07115449919383548d094ff83cc27bd08639a8a1 upstream. The set_fmt operations updates the sensor format only when the image format is changed. When only the image sizes gets changed, the format do not get updated causing the sensor to always report the one that was previously in use. Without this patch, updating frame size only fails: [fmt:UYVY8_2X8/640x480@1/30 field:none colorspace:srgb xfer:srgb ...] With this patch applied: [fmt:UYVY8_2X8/1024x768@1/30 field:none colorspace:srgb xfer:srgb ...] Fixes: 6949d864776e ("media: ov5640: do not change mode if format or frame interval is unchanged") Signed-off-by: Jacopo Mondi <jacopo+renesas@jmondi.org> Signed-off-by: Maxime Ripard <maxime.ripard@bootlin.com> Tested-by: Adam Ford <aford173@gmail.com> #imx6 w/ CSI2 interface on 4.19.6 and 4.20-RC5 Signed-off-by: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Mauro Carvalho Chehab <mchehab+samsung@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29iwlwifi: add new cards for 9560, 9462, 9461 and killer seriesIhab Zhaika
commit f108703cb5f199d0fc98517ac29a997c4c646c94 upstream. add few PCI ID'S for 9560, 9462, 9461 and killer series. Cc: stable@vger.kernel.org Signed-off-by: Ihab Zhaika <ihab.zhaika@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29Revert "mwifiex: restructure rx_reorder_tbl_lock usage"Brian Norris
commit 1aa48f088615ebfa5e139951a0d3e7dc2c2af4ec upstream. This reverts commit 5188d5453bc9380ccd4ae1086138dd485d13aef2, because it introduced lock recursion: BUG: spinlock recursion on CPU#2, kworker/u13:1/395 lock: 0xffffffc0e28a47f0, .magic: dead4ead, .owner: kworker/u13:1/395, .owner_cpu: 2 CPU: 2 PID: 395 Comm: kworker/u13:1 Not tainted 4.20.0-rc4+ #2 Hardware name: Google Kevin (DT) Workqueue: MWIFIEX_RX_WORK_QUEUE mwifiex_rx_work_queue [mwifiex] Call trace: dump_backtrace+0x0/0x140 show_stack+0x20/0x28 dump_stack+0x84/0xa4 spin_bug+0x98/0xa4 do_raw_spin_lock+0x5c/0xdc _raw_spin_lock_irqsave+0x38/0x48 mwifiex_flush_data+0x2c/0xa4 [mwifiex] call_timer_fn+0xcc/0x1c4 run_timer_softirq+0x264/0x4f0 __do_softirq+0x1a8/0x35c do_softirq+0x54/0x64 netif_rx_ni+0xe8/0x120 mwifiex_recv_packet+0xfc/0x10c [mwifiex] mwifiex_process_rx_packet+0x1d4/0x238 [mwifiex] mwifiex_11n_dispatch_pkt+0x190/0x1ac [mwifiex] mwifiex_11n_rx_reorder_pkt+0x28c/0x354 [mwifiex] mwifiex_process_sta_rx_packet+0x204/0x26c [mwifiex] mwifiex_handle_rx_packet+0x15c/0x16c [mwifiex] mwifiex_rx_work_queue+0x104/0x134 [mwifiex] worker_thread+0x4cc/0x72c kthread+0x134/0x13c ret_from_fork+0x10/0x18 This was clearly not tested well at all. I simply performed 'wget' in a loop and it fell over within a few seconds. Fixes: 5188d5453bc9 ("mwifiex: restructure rx_reorder_tbl_lock usage") Cc: <stable@vger.kernel.org> Cc: Ganapathi Bhat <gbhat@marvell.com> Signed-off-by: Brian Norris <briannorris@chromium.org> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29iwlwifi: mvm: don't send GEO_TX_POWER_LIMIT to old firmwaresEmmanuel Grumbach
commit eca1e56ceedd9cc185eb18baf307d3ff2e4af376 upstream. Old firmware versions don't support this command. Sending it to any firmware before -41.ucode will crash the firmware. This fixes https://bugzilla.kernel.org/show_bug.cgi?id=201975 Fixes: 66e839030fd6 ("iwlwifi: fix wrong WGDS_WIFI_DATA_SIZE") CC: <stable@vger.kernel.org> #4.19+ Signed-off-by: Emmanuel Grumbach <emmanuel.grumbach@intel.com> Signed-off-by: Luca Coelho <luciano.coelho@intel.com> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29rtlwifi: Fix leak of skb when processing C2H_BT_INFOLarry Finger
commit 8cfa272b0d321160ebb5b45073e39ef0a6ad73f2 upstream. With commit 0a9f8f0a1ba9 ("rtlwifi: fix btmpinfo timeout while processing C2H_BT_INFO"), calling rtl_c2hcmd_enqueue() with rtl_c2h_fast_cmd() true, the routine returns without freeing that skb, thereby leaking it. This issue has been discussed at https://github.com/lwfinger/rtlwifi_new/issues/401 and the fix tested there. Fixes: 0a9f8f0a1ba9 ("rtlwifi: fix btmpinfo timeout while processing C2H_BT_INFO") Reported-and-tested-by: Francisco Machado Magalhães Neto <franmagneto@gmail.com> Cc: Francisco Machado Magalhães Neto <franmagneto@gmail.com> Cc: Ping-Ke Shih <pkshih@realtek.com> Cc: Stable <stable@vger.kernel.org> # 4.18+ Signed-off-by: Larry Finger <Larry.Finger@lwfinger.net> Signed-off-by: Kalle Valo <kvalo@codeaurora.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29xfrm_user: fix freeing of xfrm states on acquireMathias Krause
commit 4a135e538962cb00a9667c82e7d2b9e4d7cd7177 upstream. Commit 565f0fa902b6 ("xfrm: use a dedicated slab cache for struct xfrm_state") moved xfrm state objects to use their own slab cache. However, it missed to adapt xfrm_user to use this new cache when freeing xfrm states. Fix this by introducing and make use of a new helper for freeing xfrm_state objects. Fixes: 565f0fa902b6 ("xfrm: use a dedicated slab cache for struct xfrm_state") Reported-by: Pan Bian <bianpan2016@163.com> Cc: <stable@vger.kernel.org> # v4.18+ Signed-off-by: Mathias Krause <minipli@googlemail.com> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Steffen Klassert <steffen.klassert@secunet.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29mm: introduce mm_[p4d|pud|pmd]_foldedMartin Schwidefsky
[ Upstream commit 1071fc5779d9846fec56a4ff6089ab08cac1ab72 ] Add three architecture overrideable functions to test if the p4d, pud, or pmd layer of a page table is folded or not. Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-29mm: make the __PAGETABLE_PxD_FOLDED defines non-emptyMartin Schwidefsky
[ Upstream commit a8874e7e8a8896f2b6c641f4b8e2473eafd35204 ] Change the currently empty defines for __PAGETABLE_PMD_FOLDED, __PAGETABLE_PUD_FOLDED and __PAGETABLE_P4D_FOLDED to return 1. This makes it possible to use __is_defined() to test if the preprocessor define exists. Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-29mm: add mm_pxd_folded checks to pgtable_bytes accounting functionsMartin Schwidefsky
[ Upstream commit 6d212db11947ae5464e4717536ed9faf61c01e86 ] The common mm code calls mm_dec_nr_pmds() and mm_dec_nr_puds() in free_pgtables() if the address range spans a full pud or pmd. If mm_dec_nr_puds/mm_dec_nr_pmds are non-empty due to configuration settings they blindly subtract the size of the pmd or pud table from pgtable_bytes even if the pud or pmd page table layer is folded. Add explicit mm_[pmd|pud]_folded checks to the four pgtable_bytes accounting functions mm_inc_nr_puds, mm_inc_nr_pmds, mm_dec_nr_puds and mm_dec_nr_pmds. As the check for folded page tables can be overwritten by the architecture, this allows to keep a correct pgtable_bytes value for platforms that use a dynamic number of page table levels. Acked-by: Kirill A. Shutemov <kirill.shutemov@linux.intel.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Sasha Levin <sashal@kernel.org>
2018-12-29panic: avoid deadlocks in re-entrant console driversSergey Senozhatsky
commit c7c3f05e341a9a2bd1a92993d4f996cfd6e7348e upstream. From printk()/serial console point of view panic() is special, because it may force CPU to re-enter printk() or/and serial console driver. Therefore, some of serial consoles drivers are re-entrant. E.g. 8250: serial8250_console_write() { if (port->sysrq) locked = 0; else if (oops_in_progress) locked = spin_trylock_irqsave(&port->lock, flags); else spin_lock_irqsave(&port->lock, flags); ... } panic() does set oops_in_progress via bust_spinlocks(1), so in theory we should be able to re-enter serial console driver from panic(): CPU0 <NMI> uart_console_write() serial8250_console_write() // if (oops_in_progress) // spin_trylock_irqsave() call_console_drivers() console_unlock() console_flush_on_panic() bust_spinlocks(1) // oops_in_progress++ panic() <NMI/> spin_lock_irqsave(&port->lock, flags) // spin_lock_irqsave() serial8250_console_write() call_console_drivers() console_unlock() printk() ... However, this does not happen and we deadlock in serial console on port->lock spinlock. And the problem is that console_flush_on_panic() called after bust_spinlocks(0): void panic(const char *fmt, ...) { bust_spinlocks(1); ... bust_spinlocks(0); console_flush_on_panic(); ... } bust_spinlocks(0) decrements oops_in_progress, so oops_in_progress can go back to zero. Thus even re-entrant console drivers will simply spin on port->lock spinlock. Given that port->lock may already be locked either by a stopped CPU, or by the very same CPU we execute panic() on (for instance, NMI panic() on printing CPU) the system deadlocks and does not reboot. Fix this by removing bust_spinlocks(0), so oops_in_progress is always set in panic() now and, thus, re-entrant console drivers will trylock the port->lock instead of spinning on it forever, when we call them from console_flush_on_panic(). Link: http://lkml.kernel.org/r/20181025101036.6823-1-sergey.senozhatsky@gmail.com Cc: Steven Rostedt <rostedt@goodmis.org> Cc: Daniel Wang <wonderfly@google.com> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Andrew Morton <akpm@linux-foundation.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Cc: Alan Cox <gnomes@lxorguk.ukuu.org.uk> Cc: Jiri Slaby <jslaby@suse.com> Cc: Peter Feiner <pfeiner@google.com> Cc: linux-serial@vger.kernel.org Cc: Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com> Cc: stable@vger.kernel.org Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com> Signed-off-by: Petr Mladek <pmladek@suse.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29x86/intel_rdt: Ensure a CPU remains online for the region's pseudo-locking ↵Reinette Chatre
sequence commit 80b71c340f17705ec145911b9a193ea781811b16 upstream. The user triggers the creation of a pseudo-locked region when writing the requested schemata to the schemata resctrl file. The pseudo-locking of a region is required to be done on a CPU that is associated with the cache on which the pseudo-locked region will reside. In order to run the locking code on a specific CPU, the needed CPU has to be selected and ensured to remain online during the entire locking sequence. At this time, the cpu_hotplug_lock is not taken during the pseudo-lock region creation and it is thus possible for a CPU to be selected to run the pseudo-locking code and then that CPU to go offline before the thread is able to run on it. Fix this by ensuring that the cpu_hotplug_lock is taken while the CPU on which code has to run needs to be controlled. Since the cpu_hotplug_lock is always taken before rdtgroup_mutex the lock order is maintained. Fixes: e0bdfe8e36f3 ("x86/intel_rdt: Support creation/removal of pseudo-locked region") Signed-off-by: Reinette Chatre <reinette.chatre@intel.com> Signed-off-by: Borislav Petkov <bp@suse.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Fenghua Yu <fenghua.yu@intel.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: Tony Luck <tony.luck@intel.com> Cc: gavin.hindman@intel.com Cc: jithu.joseph@intel.com Cc: stable <stable@vger.kernel.org> Cc: x86-ml <x86@kernel.org> Link: https://lkml.kernel.org/r/b7b17432a80f95a1fa21a1698ba643014f58ad31.1544476425.git.reinette.chatre@intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29x86/vdso: Pass --eh-frame-hdr to the linkerAlistair Strachan
commit cd01544a268ad8ee5b1dfe42c4393f1095f86879 upstream. Commit 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link") accidentally broke unwinding from userspace, because ld would strip the .eh_frame sections when linking. Originally, the compiler would implicitly add --eh-frame-hdr when invoking the linker, but when this Makefile was converted from invoking ld via the compiler, to invoking it directly (like vmlinux does), the flag was missed. (The EH_FRAME section is important for the VDSO shared libraries, but not for vmlinux.) Fix the problem by explicitly specifying --eh-frame-hdr, which restores parity with the old method. See relevant bug reports for additional info: https://bugzilla.kernel.org/show_bug.cgi?id=201741 https://bugzilla.redhat.com/show_bug.cgi?id=1659295 Fixes: 379d98ddf413 ("x86: vdso: Use $LD instead of $CC to link") Reported-by: Florian Weimer <fweimer@redhat.com> Reported-by: Carlos O'Donell <carlos@redhat.com> Reported-by: "H. J. Lu" <hjl.tools@gmail.com> Signed-off-by: Alistair Strachan <astrachan@google.com> Signed-off-by: Borislav Petkov <bp@suse.de> Tested-by: Laura Abbott <labbott@redhat.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Carlos O'Donell <carlos@redhat.com> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: Ingo Molnar <mingo@redhat.com> Cc: Joel Fernandes <joel@joelfernandes.org> Cc: kernel-team@android.com Cc: Laura Abbott <labbott@redhat.com> Cc: stable <stable@vger.kernel.org> Cc: Thomas Gleixner <tglx@linutronix.de> Cc: X86 ML <x86@kernel.org> Link: https://lkml.kernel.org/r/20181214223637.35954-1-astrachan@google.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29x86/mm: Fix decoy address handling vs 32-bit buildsDan Williams
commit 51c3fbd89d7554caa3290837604309f8d8669d99 upstream. A decoy address is used by set_mce_nospec() to update the cache attributes for a page that may contain poison (multi-bit ECC error) while attempting to minimize the possibility of triggering a speculative access to that page. When reserve_memtype() is handling a decoy address it needs to convert it to its real physical alias. The conversion, AND'ing with __PHYSICAL_MASK, is broken for a 32-bit physical mask and reserve_memtype() is passed the last physical page. Gert reports triggering the: BUG_ON(start >= end); ...assertion when running a 32-bit non-PAE build on a platform that has a driver resource at the top of physical memory: BIOS-e820: [mem 0x00000000fff00000-0x00000000ffffffff] reserved Given that the decoy address scheme is only targeted at 64-bit builds and assumes that the top of physical address space is free for use as a decoy address range, simply bypass address sanitization in the 32-bit case. Lastly, there was no need to crash the system when this failure occurred, and no need to crash future systems if the assumptions of decoy addresses are ever violated. Change the BUG_ON() to a WARN() with an error return. Fixes: 510ee090abc3 ("x86/mm/pat: Prepare {reserve, free}_memtype() for...") Reported-by: Gert Robben <t2@gert.gr> Signed-off-by: Dan Williams <dan.j.williams@intel.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Tested-by: Gert Robben <t2@gert.gr> Cc: stable@vger.kernel.org Cc: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Cc: Dave Hansen <dave.hansen@linux.intel.com> Cc: Andy Lutomirski <luto@kernel.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: Borislav Petkov <bp@alien8.de> Cc: "H. Peter Anvin" <hpa@zytor.com> Cc: platform-driver-x86@vger.kernel.org Cc: <stable@vger.kernel.org> Link: https://lkml.kernel.org/r/154454337985.789277.12133288391664677775.stgit@dwillia2-desk3.amr.corp.intel.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29x86/mtrr: Don't copy uninitialized gentry fields back to userspaceColin Ian King
commit 32043fa065b51e0b1433e48d118821c71b5cd65d upstream. Currently the copy_to_user of data in the gentry struct is copying uninitiaized data in field _pad from the stack to userspace. Fix this by explicitly memset'ing gentry to zero, this also will zero any compiler added padding fields that may be in struct (currently there are none). Detected by CoverityScan, CID#200783 ("Uninitialized scalar variable") Fixes: b263b31e8ad6 ("x86, mtrr: Use explicit sizing and padding for the 64-bit ioctls") Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Reviewed-by: Tyler Hicks <tyhicks@canonical.com> Cc: security@kernel.org Link: https://lkml.kernel.org/r/20181218172956.1440-1-colin.king@canonical.com Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29futex: Cure exit raceThomas Gleixner
commit da791a667536bf8322042e38ca85d55a78d3c273 upstream. Stefan reported, that the glibc tst-robustpi4 test case fails occasionally. That case creates the following race between sys_exit() and sys_futex_lock_pi(): CPU0 CPU1 sys_exit() sys_futex() do_exit() futex_lock_pi() exit_signals(tsk) No waiters: tsk->flags |= PF_EXITING; *uaddr == 0x00000PID mm_release(tsk) Set waiter bit exit_robust_list(tsk) { *uaddr = 0x80000PID; Set owner died attach_to_pi_owner() { *uaddr = 0xC0000000; tsk = get_task(PID); } if (!tsk->flags & PF_EXITING) { ... attach(); tsk->flags |= PF_EXITPIDONE; } else { if (!(tsk->flags & PF_EXITPIDONE)) return -EAGAIN; return -ESRCH; <--- FAIL } ESRCH is returned all the way to user space, which triggers the glibc test case assert. Returning ESRCH unconditionally is wrong here because the user space value has been changed by the exiting task to 0xC0000000, i.e. the FUTEX_OWNER_DIED bit is set and the futex PID value has been cleared. This is a valid state and the kernel has to handle it, i.e. taking the futex. Cure it by rereading the user space value when PF_EXITING and PF_EXITPIDONE is set in the task which 'owns' the futex. If the value has changed, let the kernel retry the operation, which includes all regular sanity checks and correctly handles the FUTEX_OWNER_DIED case. If it hasn't changed, then return ESRCH as there is no way to distinguish this case from malfunctioning user space. This happens when the exiting task did not have a robust list, the robust list was corrupted or the user space value in the futex was simply bogus. Reported-by: Stefan Liebler <stli@linux.ibm.com> Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Acked-by: Peter Zijlstra <peterz@infradead.org> Cc: Heiko Carstens <heiko.carstens@de.ibm.com> Cc: Darren Hart <dvhart@infradead.org> Cc: Ingo Molnar <mingo@kernel.org> Cc: Sasha Levin <sashal@kernel.org> Cc: stable@vger.kernel.org Link: https://bugzilla.kernel.org/show_bug.cgi?id=200467 Link: https://lkml.kernel.org/r/20181210152311.986181245@linutronix.de Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29Drivers: hv: vmbus: Return -EINVAL for the sys files for unopened channelsDexuan Cui
commit fc96df16a1ce80cbb3c316ab7d4dc8cd5c2852ce upstream. Before 98f4c651762c, we returned zeros for unopened channels. With 98f4c651762c, we started to return random on-stack values. We'd better return -EINVAL instead. Fixes: 98f4c651762c ("hv: move ringbuffer bus attributes to dev_groups") Cc: stable@vger.kernel.org Cc: K. Y. Srinivasan <kys@microsoft.com> Cc: Haiyang Zhang <haiyangz@microsoft.com> Cc: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: Dexuan Cui <decui@microsoft.com> Signed-off-by: Sasha Levin <sashal@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29KVM: Fix UAF in nested posted interrupt processingCfir Cohen
commit c2dd5146e9fe1f22c77c1b011adf84eea0245806 upstream. nested_get_vmcs12_pages() processes the posted_intr address in vmcs12. It caches the kmap()ed page object and pointer, however, it doesn't handle errors correctly: it's possible to cache a valid pointer, then release the page and later dereference the dangling pointer. I was able to reproduce with the following steps: 1. Call vmlaunch with valid posted_intr_desc_addr but an invalid MSR_EFER. This causes nested_get_vmcs12_pages() to cache the kmap()ed pi_desc_page and pi_desc. Later the invalid EFER value fails check_vmentry_postreqs() which fails the first vmlaunch. 2. Call vmlanuch with a valid EFER but an invalid posted_intr_desc_addr (I set it to 2G - 0x80). The second time we call nested_get_vmcs12_pages pi_desc_page is unmapped and released and pi_desc_page is set to NULL (the "shouldn't happen" clause). Due to the invalid posted_intr_desc_addr, kvm_vcpu_gpa_to_page() fails and nested_get_vmcs12_pages() returns. It doesn't return an error value so vmlaunch proceeds. Note that at this time we have a dangling pointer in vmx->nested.pi_desc and POSTED_INTR_DESC_ADDR in L0's vmcs. 3. Issue an IPI in L2 guest code. This triggers a call to vmx_complete_nested_posted_interrupt() and pi_test_and_clear_on() which dereferences the dangling pointer. Vulnerable code requires nested and enable_apicv variables to be set to true. The host CPU must also support posted interrupts. Fixes: 5e2f30b756a37 "KVM: nVMX: get rid of nested_get_page()" Cc: stable@vger.kernel.org Reviewed-by: Andy Honig <ahonig@google.com> Signed-off-by: Cfir Cohen <cfir@google.com> Reviewed-by: Liran Alon <liran.alon@oracle.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29kvm: x86: Add AMD's EX_CFG to the list of ignored MSRsEduardo Habkost
commit 0e1b869fff60c81b510c2d00602d778f8f59dd9a upstream. Some guests OSes (including Windows 10) write to MSR 0xc001102c on some cases (possibly while trying to apply a CPU errata). Make KVM ignore reads and writes to that MSR, so the guest won't crash. The MSR is documented as "Execution Unit Configuration (EX_CFG)", at AMD's "BIOS and Kernel Developer's Guide (BKDG) for AMD Family 15h Models 00h-0Fh Processors". Cc: stable@vger.kernel.org Signed-off-by: Eduardo Habkost <ehabkost@redhat.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29KVM: X86: Fix NULL deref in vcpu_scan_ioapicWanpeng Li
commit dcbd3e49c2f0b2c2d8a321507ff8f3de4af76d7c upstream. Reported by syzkaller: CPU: 1 PID: 5962 Comm: syz-executor118 Not tainted 4.20.0-rc6+ #374 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:kvm_apic_hw_enabled arch/x86/kvm/lapic.h:169 [inline] RIP: 0010:vcpu_scan_ioapic arch/x86/kvm/x86.c:7449 [inline] RIP: 0010:vcpu_enter_guest arch/x86/kvm/x86.c:7602 [inline] RIP: 0010:vcpu_run arch/x86/kvm/x86.c:7874 [inline] RIP: 0010:kvm_arch_vcpu_ioctl_run+0x5296/0x7320 arch/x86/kvm/x86.c:8074 Call Trace: kvm_vcpu_ioctl+0x5c8/0x1150 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2596 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:509 [inline] do_vfs_ioctl+0x1de/0x1790 fs/ioctl.c:696 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:713 __do_sys_ioctl fs/ioctl.c:720 [inline] __se_sys_ioctl fs/ioctl.c:718 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe The reason is that the testcase writes hyperv synic HV_X64_MSR_SINT14 msr and triggers scan ioapic logic to load synic vectors into EOI exit bitmap. However, irqchip is not initialized by this simple testcase, ioapic/apic objects should not be accessed. This patch fixes it by also considering whether or not apic is present. Reported-by: syzbot+39810e6c400efadfef71@syzkaller.appspotmail.com Cc: stable@vger.kernel.org Cc: Paolo Bonzini <pbonzini@redhat.com> Cc: Radim Krčmář <rkrcmar@redhat.com> Signed-off-by: Wanpeng Li <wanpengli@tencent.com> Signed-off-by: Paolo Bonzini <pbonzini@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29posix-timers: Fix division by zero bugThomas Gleixner
commit 0e334db6bb4b1fd1e2d72c1f3d8f004313cd9f94 upstream. The signal delivery path of posix-timers can try to rearm the timer even if the interval is zero. That's handled for the common case (hrtimer) but not for alarm timers. In that case the forwarding function raises a division by zero exception. The handling for hrtimer based posix timers is wrong because it marks the timer as active despite the fact that it is stopped. Move the check from common_hrtimer_rearm() to posixtimer_rearm() to cure both issues. Reported-by: syzbot+9d38bedac9cc77b8ad5e@syzkaller.appspotmail.com Signed-off-by: Thomas Gleixner <tglx@linutronix.de> Cc: John Stultz <john.stultz@linaro.org> Cc: Linus Torvalds <torvalds@linux-foundation.org> Cc: Peter Zijlstra <peterz@infradead.org> Cc: sboyd@kernel.org Cc: stable@vger.kernel.org Cc: syzkaller-bugs@googlegroups.com Link: http://lkml.kernel.org/r/alpine.DEB.2.21.1812171328050.1880@nanos.tec.linutronix.de Signed-off-by: Ingo Molnar <mingo@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29gpiolib-acpi: Only defer request_irq for GpioInt ACPI event handlersHans de Goede
commit e59f5e08ece1060073d92c66ded52e1f2c43b5bb upstream. Commit 78d3a92edbfb ("gpiolib-acpi: Register GpioInt ACPI event handlers from a late_initcall") deferred the entire acpi_gpiochip_request_interrupt call for each event resource. This means it also delays the gpiochip_request_own_desc(..., "ACPI:Event") call. This is a problem if some AML code reads the GPIO pin before we run the deferred acpi_gpiochip_request_interrupt, because in that case acpi_gpio_adr_space_handler() will already have called gpiochip_request_own_desc(..., "ACPI:OpRegion") causing the call from acpi_gpiochip_request_interrupt to fail with -EBUSY and we will fail to register an event handler. acpi_gpio_adr_space_handler is prepared for acpi_gpiochip_request_interrupt already having claimed the pin, but the other way around does not work. One example of a problem this causes, is the event handler for the OTG ID pin on a Prowise PT301 tablet not registering, keeping the port stuck in whatever mode it was in during boot and e.g. only allowing charging after a reboot. This commit fixes this by only deferring the request_irq call and the initial run of edge-triggered IRQs instead of deferring all of acpi_gpiochip_request_interrupt. Cc: stable@vger.kernel.org Fixes: 78d3a92edbfb ("gpiolib-acpi: Register GpioInt ACPI event ...") Signed-off-by: Hans de Goede <hdegoede@redhat.com> Reviewed-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com> Acked-by: Mika Westerberg <mika.westerberg@linux.intel.com> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29gpio: max7301: fix driver for use with CONFIG_VMAP_STACKChristophe Leroy
commit abf221d2f51b8ce7b9959a8953f880a8b0a1400d upstream. spi_read() and spi_write() require DMA-safe memory. When CONFIG_VMAP_STACK is selected, those functions cannot be used with buffers on stack. This patch replaces calls to spi_read() and spi_write() by spi_write_then_read() which doesn't require DMA-safe buffers. Fixes: 0c36ec314735 ("gpio: gpio driver for max7301 SPI GPIO expander") Cc: <stable@vger.kernel.org> Signed-off-by: Christophe Leroy <christophe.leroy@c-s.fr> Signed-off-by: Linus Walleij <linus.walleij@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29mmc: omap_hsmmc: fix DMA API warningRussell King
commit 0b479790684192ab7024ce6a621f93f6d0a64d92 upstream. While booting with rootfs on MMC, the following warning is encountered on OMAP4430: omap-dma-engine 4a056000.dma-controller: DMA-API: mapping sg segment longer than device claims to support [len=69632] [max=65536] This is because the DMA engine has a default maximum segment size of 64K but HSMMC sets: mmc->max_blk_size = 512; /* Block Length at max can be 1024 */ mmc->max_blk_count = 0xFFFF; /* No. of Blocks is 16 bits */ mmc->max_req_size = mmc->max_blk_size * mmc->max_blk_count; mmc->max_seg_size = mmc->max_req_size; which ends up telling the block layer that we support a maximum segment size of 65535*512, which exceeds the advertised DMA engine capabilities. Fix this by clamping the maximum segment size to the lower of the maximum request size and of the DMA engine device used for either DMA channel. Signed-off-by: Russell King <rmk+kernel@armlinux.org.uk> Cc: <stable@vger.kernel.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29mmc: core: Use a minimum 1600ms timeout when enabling CACHE ctrlUlf Hansson
commit e3ae3401aa19432ee4943eb0bbc2ec704d07d793 upstream. Some eMMCs from Micron have been reported to need ~800 ms timeout, while enabling the CACHE ctrl after running sudden power failure tests. The needed timeout is greater than what the card specifies as its generic CMD6 timeout, through the EXT_CSD register, hence the problem. Normally we would introduce a card quirk to extend the timeout for these specific Micron cards. However, due to the rather complicated debug process needed to find out the error, let's simply use a minimum timeout of 1600ms, the double of what has been reported, for all cards when enabling CACHE ctrl. Reported-by: Sjoerd Simons <sjoerd.simons@collabora.co.uk> Reported-by: Andreas Dannenberg <dannenberg@ti.com> Reported-by: Faiz Abbas <faiz_abbas@ti.com> Cc: <stable@vger.kernel.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29mmc: core: Allow BKOPS and CACHE ctrl even if no HPI supportUlf Hansson
commit ba9f39a785a9977e72233000711ef1eb48203551 upstream. In commit 5320226a0512 ("mmc: core: Disable HPI for certain Hynix eMMC cards"), then intent was to prevent HPI from being used for some eMMC cards, which didn't properly support it. However, that went too far, as even BKOPS and CACHE ctrl became prevented. Let's restore those parts and allow BKOPS and CACHE ctrl even if HPI isn't supported. Fixes: 5320226a0512 ("mmc: core: Disable HPI for certain Hynix eMMC cards") Cc: Pratibhasagar V <pratibha@codeaurora.org> Cc: <stable@vger.kernel.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29mmc: core: Reset HPI enabled state during re-init and in case of errorsUlf Hansson
commit a0741ba40a009f97c019ae7541dc61c1fdf41efb upstream. During a re-initialization of the eMMC card, we may fail to re-enable HPI. In these cases, that isn't properly reflected in the card->ext_csd.hpi_en bit, as it keeps being set. This may cause following attempts to use HPI, even if's not enabled. Let's fix this! Fixes: eb0d8f135b67 ("mmc: core: support HPI send command") Cc: <stable@vger.kernel.org> Signed-off-by: Ulf Hansson <ulf.hansson@linaro.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29scsi: sd: use mempool for discard special pageJens Axboe
commit 61cce6f6eeced5ddd9cac55e807fe28b4f18c1ba upstream. When boxes are run near (or to) OOM, we have a problem with the discard page allocation in sd. If we fail allocating the special page, we return busy, and it'll get retried. But since ordering is honored for dispatch requests, we can keep retrying this same IO and failing. Behind that IO could be requests that want to free memory, but they never get the chance. This means you get repeated spews of traces like this: [1201401.625972] Call Trace: [1201401.631748] dump_stack+0x4d/0x65 [1201401.639445] warn_alloc+0xec/0x190 [1201401.647335] __alloc_pages_slowpath+0xe84/0xf30 [1201401.657722] ? get_page_from_freelist+0x11b/0xb10 [1201401.668475] ? __alloc_pages_slowpath+0x2e/0xf30 [1201401.679054] __alloc_pages_nodemask+0x1f9/0x210 [1201401.689424] alloc_pages_current+0x8c/0x110 [1201401.699025] sd_setup_write_same16_cmnd+0x51/0x150 [1201401.709987] sd_init_command+0x49c/0xb70 [1201401.719029] scsi_setup_cmnd+0x9c/0x160 [1201401.727877] scsi_queue_rq+0x4d9/0x610 [1201401.736535] blk_mq_dispatch_rq_list+0x19a/0x360 [1201401.747113] blk_mq_sched_dispatch_requests+0xff/0x190 [1201401.758844] __blk_mq_run_hw_queue+0x95/0xa0 [1201401.768653] blk_mq_run_work_fn+0x2c/0x30 [1201401.777886] process_one_work+0x14b/0x400 [1201401.787119] worker_thread+0x4b/0x470 [1201401.795586] kthread+0x110/0x150 [1201401.803089] ? rescuer_thread+0x320/0x320 [1201401.812322] ? kthread_park+0x90/0x90 [1201401.820787] ? do_syscall_64+0x53/0x150 [1201401.829635] ret_from_fork+0x29/0x40 Ensure that the discard page allocation has a mempool backing, so we know we can make progress. Cc: stable@vger.kernel.org Signed-off-by: Jens Axboe <axboe@kernel.dk> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29scsi: t10-pi: Return correct ref tag when queue has no integrity profileMartin K. Petersen
commit 60a89a3ce0cce515dc663bc1b45ac89202ad6c79 upstream. Commit ddd0bc756983 ("block: move ref_tag calculation func to the block layer") moved ref tag calculation from SCSI to a library function. However, this change broke returning the correct ref tag for devices operating in DIF mode since these do not have an associated block integrity profile. This in turn caused read/write failures on PI-formatted disks attached to an mpt3sas controller. Fixes: ddd0bc756983 ("block: move ref_tag calculation func to the block layer") Cc: stable@vger.kernel.org # 4.19+ Reported-by: John Garry <john.garry@huawei.com> Tested-by: Xiang Chen <chenxiang66@hisilicon.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29ubifs: Handle re-linking of inodes correctly while recoveryRichard Weinberger
commit e58725d51fa8da9133f3f1c54170aa2e43056b91 upstream. UBIFS's recovery code strictly assumes that a deleted inode will never come back, therefore it removes all data which belongs to that inode as soon it faces an inode with link count 0 in the replay list. Before O_TMPFILE this assumption was perfectly fine. With O_TMPFILE it can lead to data loss upon a power-cut. Consider a journal with entries like: 0: inode X (nlink = 0) /* O_TMPFILE was created */ 1: data for inode X /* Someone writes to the temp file */ 2: inode X (nlink = 0) /* inode was changed, xattr, chmod, … */ 3: inode X (nlink = 1) /* inode was re-linked via linkat() */ Upon replay of entry #2 UBIFS will drop all data that belongs to inode X, this will lead to an empty file after mounting. As solution for this problem, scan the replay list for a re-link entry before dropping data. Fixes: 474b93704f32 ("ubifs: Implement O_TMPFILE") Cc: stable@vger.kernel.org Cc: Russell Senior <russell@personaltelco.net> Cc: Rafał Miłecki <zajec5@gmail.com> Reported-by: Russell Senior <russell@personaltelco.net> Reported-by: Rafał Miłecki <zajec5@gmail.com> Tested-by: Rafał Miłecki <rafal@milecki.pl> Signed-off-by: Richard Weinberger <richard@nod.at> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29USB: serial: option: add Telit LN940 seriesJörgen Storvist
commit 28a86092b1753b802ef7e3de8a4c4a69a9c1bb03 upstream. Added USB serial option driver support for Telit LN940 series cellular modules. Covering both QMI and MBIM modes. usb-devices output (0x1900): T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 21 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=1bc7 ProdID=1900 Rev=03.10 S: Manufacturer=Telit S: Product=Telit LN940 Mobile Broadband S: SerialNumber=0123456789ABCDEF C: #Ifs= 5 Cfg#= 1 Atr=a0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option usb-devices output (0x1901): T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 20 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=ef(misc ) Sub=02 Prot=01 MxPS=64 #Cfgs= 1 P: Vendor=1bc7 ProdID=1901 Rev=03.10 S: Manufacturer=Telit S: Product=Telit LN940 Mobile Broadband S: SerialNumber=0123456789ABCDEF C: #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 4 Alt= 0 #EPs= 1 Cls=02(commc) Sub=0e Prot=00 Driver=cdc_mbim I: If#= 5 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim Signed-off-by: Jörgen Storvist <jorgen.storvist@gmail.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29USB: serial: option: add Fibocom NL668 seriesJörgen Storvist
commit 30360224441ce89a98ed627861e735beb4010775 upstream. Added USB serial option driver support for Fibocom NL668 series cellular modules. Reserved USB endpoints 4, 5 and 6 for network + ADB interfaces. usb-devices output (QMI mode) T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 16 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=1508 ProdID=1001 Rev=03.18 S: Manufacturer=Nodecom NL668 Modem S: Product=Nodecom NL668-CN Modem S: SerialNumber= C: #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=ff Prot=ff Driver=qmi_wwan I: If#= 5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none) usb-devices output (ECM mode) T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 17 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=1508 ProdID=1001 Rev=03.18 S: Manufacturer=Nodecom NL668 Modem S: Product=Nodecom NL668-CN Modem S: SerialNumber= C: #Ifs= 7 Cfg#= 1 Atr=a0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 4 Alt= 0 #EPs= 1 Cls=02(commc) Sub=06 Prot=00 Driver=cdc_ether I: If#= 5 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_ether I: If#= 6 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=42 Prot=01 Driver=(none) Signed-off-by: Jörgen Storvist <jorgen.storvist@gmail.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29USB: serial: option: add Simcom SIM7500/SIM7600 (MBIM mode)Jörgen Storvist
commit cc6730df08a291e51e145bc65e24ffb5e2f17ab6 upstream. Added USB serial option driver support for Simcom SIM7500/SIM7600 series cellular modules exposing MBIM interface (VID 0x1e0e,PID 0x9003) T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 14 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=1e0e ProdID=9003 Rev=03.18 S: Manufacturer=SimTech, Incorporated S: Product=SimTech, Incorporated S: SerialNumber=0123456789ABCDEF C: #Ifs= 7 Cfg#= 1 Atr=a0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 3 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 4 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 5 Alt= 0 #EPs= 1 Cls=02(commc) Sub=0e Prot=00 Driver=cdc_mbim I: If#= 6 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim Signed-off-by: Jörgen Storvist <jorgen.storvist@gmail.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29USB: serial: option: add HP lt4132Tore Anderson
commit d57ec3c83b5153217a70b561d4fb6ed96f2f7a25 upstream. The HP lt4132 is a rebranded Huawei ME906s-158 LTE modem. The interface with protocol 0x16 is "CDC ECM & NCM" according to the *.inf files included with the Windows driver. Attaching the option driver to it doesn't result in a /dev/ttyUSB* device being created, so I've excluded it. Note that it is also excluded for corresponding Huawei-branded devices, cf. commit d544db293a44 ("USB: support new huawei devices in option.c"). T: Bus=01 Lev=01 Prnt=01 Port=02 Cnt=02 Dev#= 3 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=ff MxPS=64 #Cfgs= 3 P: Vendor=03f0 ProdID=a31d Rev=01.02 S: Manufacturer=HP Inc. S: Product=HP lt4132 LTE/HSPA+ 4G Module S: SerialNumber=0123456789ABCDEF C: #Ifs= 6 Cfg#= 1 Atr=a0 MxPwr=2mA I: If#=0x0 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=06 Prot=10 Driver=option I: If#=0x1 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=06 Prot=13 Driver=option I: If#=0x2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=06 Prot=12 Driver=option I: If#=0x3 Alt= 0 #EPs= 1 Cls=ff(vend.) Sub=06 Prot=16 Driver=(none) I: If#=0x4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=06 Prot=14 Driver=option I: If#=0x5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=06 Prot=1b Driver=option T: Bus=01 Lev=01 Prnt=01 Port=02 Cnt=02 Dev#= 3 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=ff MxPS=64 #Cfgs= 3 P: Vendor=03f0 ProdID=a31d Rev=01.02 S: Manufacturer=HP Inc. S: Product=HP lt4132 LTE/HSPA+ 4G Module S: SerialNumber=0123456789ABCDEF C: #Ifs= 7 Cfg#= 2 Atr=a0 MxPwr=2mA I: If#=0x0 Alt= 0 #EPs= 1 Cls=02(commc) Sub=06 Prot=00 Driver=cdc_ether I: If#=0x1 Alt= 0 #EPs= 2 Cls=0a(data ) Sub=06 Prot=00 Driver=cdc_ether I: If#=0x2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=06 Prot=10 Driver=option I: If#=0x3 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=06 Prot=13 Driver=option I: If#=0x4 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=06 Prot=12 Driver=option I: If#=0x5 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=06 Prot=14 Driver=option I: If#=0x6 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=06 Prot=1b Driver=option T: Bus=01 Lev=01 Prnt=01 Port=02 Cnt=02 Dev#= 3 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=ff MxPS=64 #Cfgs= 3 P: Vendor=03f0 ProdID=a31d Rev=01.02 S: Manufacturer=HP Inc. S: Product=HP lt4132 LTE/HSPA+ 4G Module S: SerialNumber=0123456789ABCDEF C: #Ifs= 3 Cfg#= 3 Atr=a0 MxPwr=2mA I: If#=0x0 Alt= 0 #EPs= 1 Cls=02(commc) Sub=0e Prot=00 Driver=cdc_mbim I: If#=0x1 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim I: If#=0x2 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=06 Prot=14 Driver=option Signed-off-by: Tore Anderson <tore@fud.no> Cc: stable@vger.kernel.org [ johan: drop id defines ] Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29USB: serial: option: add GosunCn ZTE WeLink ME3630Jörgen Storvist
commit 70a7444c550a75584ffcfae95267058817eff6a7 upstream. Added USB serial option driver support for GosunCn ZTE WeLink ME3630 series cellular modules for USB modes ECM/NCM and MBIM. usb-devices output MBIM mode: T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 10 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=19d2 ProdID=0602 Rev=03.18 S: Manufacturer=Android S: Product=Android S: SerialNumber= C: #Ifs= 5 Cfg#= 1 Atr=a0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 3 Alt= 0 #EPs= 1 Cls=02(commc) Sub=0e Prot=00 Driver=cdc_mbim I: If#= 4 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=02 Driver=cdc_mbim usb-devices output ECM/NCM mode: T: Bus=01 Lev=01 Prnt=01 Port=00 Cnt=01 Dev#= 11 Spd=480 MxCh= 0 D: Ver= 2.00 Cls=00(>ifc ) Sub=00 Prot=00 MxPS=64 #Cfgs= 1 P: Vendor=19d2 ProdID=1476 Rev=03.18 S: Manufacturer=Android S: Product=Android S: SerialNumber= C: #Ifs= 5 Cfg#= 1 Atr=a0 MxPwr=500mA I: If#= 0 Alt= 0 #EPs= 2 Cls=ff(vend.) Sub=ff Prot=ff Driver=option I: If#= 1 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 2 Alt= 0 #EPs= 3 Cls=ff(vend.) Sub=00 Prot=00 Driver=option I: If#= 3 Alt= 0 #EPs= 1 Cls=02(commc) Sub=06 Prot=00 Driver=cdc_ether I: If#= 4 Alt= 1 #EPs= 2 Cls=0a(data ) Sub=00 Prot=00 Driver=cdc_ether Signed-off-by: Jörgen Storvist <jorgen.storvist@gmail.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Johan Hovold <johan@kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29USB: xhci: fix 'broken_suspend' placement in struct xchi_hcdNicolas Saenz Julienne
commit 2419f30a4a4fcaa5f35111563b4c61f1b2b26841 upstream. As commented in the struct's definition there shouldn't be anything underneath its 'priv[0]' member as it would break some macros. The patch converts the broken_suspend into a bit-field and relocates it next to to the rest of bit-fields. Fixes: a7d57abcc8a5 ("xhci: workaround CSS timeout on AMD SNPS 3.0 xHC") Reported-by: Oliver Neukum <oneukum@suse.com> Signed-off-by: Nicolas Saenz Julienne <nsaenzjulienne@suse.de> Acked-by: Mathias Nyman <mathias.nyman@linux.intel.com> Cc: stable <stable@vger.kernel.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29xhci: Don't prevent USB2 bus suspend in state check intended for USB3 onlyMathias Nyman
commit 45f750c16cae3625014c14c77bd9005eda975d35 upstream. The code to prevent a bus suspend if a USB3 port was still in link training also reacted to USB2 port polling state. This caused bus suspend to busyloop in some cases. USB2 polling state is different from USB3, and should not prevent bus suspend. Limit the USB3 link training state check to USB3 root hub ports only. The origial commit went to stable so this need to be applied there as well Fixes: 2f31a67f01a8 ("usb: xhci: Prevent bus suspend if a port connect change or polling state is detected") Cc: stable@vger.kernel.org Signed-off-by: Mathias Nyman <mathias.nyman@linux.intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29USB: hso: Fix OOB memory access in hso_probe/hso_get_config_dataHui Peng
commit 5146f95df782b0ac61abde36567e718692725c89 upstream. The function hso_probe reads if_num from the USB device (as an u8) and uses it without a length check to index an array, resulting in an OOB memory read in hso_probe or hso_get_config_data. Add a length check for both locations and updated hso_probe to bail on error. This issue has been assigned CVE-2018-19985. Reported-by: Hui Peng <benquike@gmail.com> Reported-by: Mathias Payer <mathias.payer@nebelwelt.net> Signed-off-by: Hui Peng <benquike@gmail.com> Signed-off-by: Mathias Payer <mathias.payer@nebelwelt.net> Reviewed-by: Sebastian Andrzej Siewior <bigeasy@linutronix.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org> Signed-off-by: David S. Miller <davem@davemloft.net> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2018-12-29Revert "vfs: Allow userns root to call mknod on owned filesystems."Christian Brauner
commit 94f82008ce30e2624537d240d64ce718255e0b80 upstream. This reverts commit 55956b59df336f6738da916dbb520b6e37df9fbd. commit 55956b59df33 ("vfs: Allow userns root to call mknod on owned filesystems.") enabled mknod() in user namespaces for userns root if CAP_MKNOD is available. However, these device nodes are useless since any filesystem mounted from a non-initial user namespace will set the SB_I_NODEV flag on the filesystem. Now, when a device node s created in a non-initial user namespace a call to open() on said device node will fail due to: bool may_open_dev(const struct path *path) { return !(path->mnt->mnt_flags & MNT_NODEV) && !(path->mnt->mnt_sb->s_iflags & SB_I_NODEV); } The problem with this is that as of the aforementioned commit mknod() creates partially functional device nodes in non-initial user namespaces. In particular, it has the consequence that as of the aforementioned commit open() will be more privileged with respect to device nodes than mknod(). Before it was the other way around. Specifically, if mknod() succeeded then it was transparent for any userspace application that a fatal error must have occured when open() failed. All of this breaks multiple userspace workloads and a widespread assumption about how to handle mknod(). Basically, all container runtimes and systemd live by the slogan "ask for forgiveness not permission" when running user namespace workloads. For mknod() the assumption is that if the syscall succeeds the device nodes are useable irrespective of whether it succeeds in a non-initial user namespace or not. This logic was chosen explicitly to allow for the glorious day when mknod() will actually be able to create fully functional device nodes in user namespaces. A specific problem people are already running into when running 4.18 rc kernels are failing systemd services. For any distro that is run in a container systemd services started with the PrivateDevices= property set will fail to start since the device nodes in question cannot be opened (cf. the arguments in [1]). Full disclosure, Seth made the very sound argument that it is already possible to end up with partially functional device nodes. Any filesystem mounted with MS_NODEV set will allow mknod() to succeed but will not allow open() to succeed. The difference to the case here is that the MS_NODEV case is transparent to userspace since it is an explicitly set mount option while the SB_I_NODEV case is an implicit property enforced by the kernel and hence opaque to userspace. [1]: https://github.com/systemd/systemd/pull/9483 Signed-off-by: Christian Brauner <christian@brauner.io> Cc: "Eric W. Biederman" <ebiederm@xmission.com> Cc: Seth Forshee <seth.forshee@canonical.com> Cc: Serge Hallyn <serge@hallyn.com> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>