Age    Commit message    Author
2017-09-27Linux 4.9.52v4.9.52Greg Kroah-Hartman
2017-09-27bcache: fix bch_hprint crash and improve outputMichael Lyle
commit 9276717b9e297a62d1151a43d1cd286213f68eb7 upstream. Most importantly, solve a crash where %llu was used to format signed numbers. This would cause a buffer overflow when reading sysfs writeback_rate_debug, as only 20 bytes were allocated for this and %llu writes 20 characters plus a null. Always use the units mechanism rather than having different output paths for simplicity. Also, correct problems with display output where 1.10 was a larger number than 1.09, by multiplying by 10 and then dividing by 1024 instead of dividing by 100. (Remainders of >= 1000 would print as .10). Minor changes: Always display the decimal point instead of trying to omit it based on number of digits shown. Decide what units to use based on 1000 as a threshold, not 1024 (in other words, always print at most 3 digits before the decimal point). Signed-off-by: Michael Lyle <mlyle@lyle.org> Reported-by: Dmitry Yu Okunev <dyokunev@ut.mephi.ru> Acked-by: Kent Overstreet <kent.overstreet@gmail.com> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
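A minimal user-space sketch of the formatting scheme described above (signed values handled explicitly, units switched at a 1000 threshold so at most 3 digits precede the decimal point, and a single fractional digit derived as remainder * 10 / 1024); the function name and buffer handling are illustrative, not bcache's actual code:

    #include <stdio.h>
    #include <stdint.h>

    static void hprint(char *buf, size_t len, int64_t v)
    {
            static const char units[] = "?kMGTPEZY";
            uint64_t q = v < 0 ? 0 - (uint64_t)v : (uint64_t)v;
            uint64_t frac = 0;
            int u = 0;

            while (q >= 1000) {             /* keep at most 3 integer digits */
                    frac = q & 1023;        /* low 10 bits become the fraction */
                    q >>= 10;
                    u++;
            }
            /* multiplying by 10 before dividing by 1024 keeps the single
             * fractional digit monotonic with the underlying value */
            snprintf(buf, len, "%s%llu.%llu%c", v < 0 ? "-" : "",
                     (unsigned long long)q,
                     (unsigned long long)(frac * 10 >> 10),
                     u ? units[u] : ' ');
    }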
2017-09-27bcache: fix for gc and write-back raceTang Junhui
commit 9baf30972b5568d8b5bc8b3c46a6ec5b58100463 upstream. gc and write-back can race (see the email "bcache get stucked" sent earlier): gc thread write-back thread | |bch_writeback_thread() |bch_gc_thread() | | |==>read_dirty() |==>bch_btree_gc() | |==>btree_root() //get btree root | | //node write lock | |==>bch_btree_gc_root() | | |==>read_dirty_submit() | |==>write_dirty() | |==>continue_at(cl, | | write_dirty_finish, | | system_wq); | |==>write_dirty_finish()//execute | | //in system_wq | |==>bch_btree_insert() | |==>bch_btree_map_leaf_nodes() | |==>__bch_btree_map_nodes() | |==>btree_root //try to get btree | | //root node read | | //lock | |-----stuck here |==>bch_btree_set_root() |==>bch_journal_meta() |==>bch_journal() |==>journal_try_write() |==>journal_write_unlocked() //journal_full(&c->journal) | //condition satisfied |==>continue_at(cl, journal_write, system_wq); //try to execute | //journal_write in system_wq | //but work queue is executing | //write_dirty_finish() |==>closure_sync(); //wait for journal_write to finish | //and wake up gc, |-------------stuck here |==>release root node write lock This patch allocates a separate workqueue for the write-back thread to avoid this race. (Commit log re-organized by Coly Li to pass checkpatch.pl checking) Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Acked-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
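The shape of the fix, a dedicated workqueue so write-back completions never have to wait behind work already running on the shared system_wq, looks roughly like the sketch below; the struct and field names are illustrative rather than the exact bcache patch.

    #include <linux/workqueue.h>
    #include <linux/errno.h>

    struct writeback_state {
            struct workqueue_struct *writeback_write_wq;    /* illustrative */
    };

    static int writeback_state_init(struct writeback_state *ws)
    {
            /* WQ_MEM_RECLAIM: write-back may be needed for forward progress */
            ws->writeback_write_wq = alloc_workqueue("bcache_writeback_wq",
                                                     WQ_MEM_RECLAIM, 0);
            if (!ws->writeback_write_wq)
                    return -ENOMEM;
            return 0;
    }

    /*
     * Completions are then queued on ws->writeback_write_wq instead of
     * system_wq (e.g. continue_at(cl, write_dirty_finish, wq)), so a gc or
     * journal item stuck on system_wq can no longer block them.
     */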
2017-09-27bcache: Correct return value for sysfs attach errorsTony Asleson
commit 77fa100f27475d08a569b9d51c17722130f089e7 upstream. If bch_cached_dev_attach encounters an error it returns a negative error code. The variable 'v' that stores the result is unsigned, so user space instead sees a very large value returned for bytes written, which can cause incorrect user space behavior. Use a single signed variable throughout the function to preserve the ability to return errors. Signed-off-by: Tony Asleson <tasleson@redhat.com> Acked-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
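A condensed illustration of the bug class, with hypothetical helper names: a negative errno stored in an unsigned variable can never test as less than zero and ends up reported to user space as a huge byte count.

    /* hypothetical helper: returns 0 or a negative errno */
    static ssize_t attach_backing_device(const char *buf);

    /* buggy shape: 'v' is unsigned, so the error is silently lost */
    static ssize_t attach_store_buggy(const char *buf, size_t size)
    {
            size_t v = attach_backing_device(buf);  /* e.g. -ENOMEM */
            if (v < 0)                              /* never true for size_t */
                    return v;
            return size;
    }

    /* fixed shape: one signed variable carries both the error and the size */
    static ssize_t attach_store_fixed(const char *buf, size_t size)
    {
            ssize_t v = attach_backing_device(buf);
            return v < 0 ? v : (ssize_t)size;
    }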
2017-09-27bcache: correct cache_dirty_target in __update_writeback_rate()Tang Junhui
commit a8394090a9129b40f9d90dcb7f4a49d60c727ca6 upstream. __update_writeback_rate() uses a proportional-derivative (PD) controller to regulate the writeback rate. A dirty target number is used in this PD controller to control the writeback rate: a larger target number makes the writeback rate smaller and, conversely, a smaller target number makes the writeback rate larger. bcache uses the following steps to calculate the target number, 1) cache_sectors = all-buckets-of-cache-set * buckets-size 2) cache_dirty_target = cache_sectors * cached-device-writeback_percent 3) target = cache_dirty_target * (sectors-of-cached-device/sectors-of-all-cached-devices-of-this-cache-set) The calculation of cache_sectors at step 1) is incorrect, because it does not account for dirty blocks occupied by flash only volumes. A flash only volume can be thought of as a bcache device without a cached device. All data sectors allocated for it are persistent on the cache device and marked dirty; they are not touched by the bcache writeback and garbage collection code. So data blocks of flash only volumes should be ignored when calculating cache_sectors of a cache set. The current code does not subtract the dirty sectors of flash only volumes, which results in a larger target number from the above 3 steps. Consequently the cache device's writeback rate is smaller than the correct value and writeback is slower on all cached devices. This patch fixes the incorrect (too slow) writeback rate by subtracting dirty sectors of flash only volumes in __update_writeback_rate(). (Commit log composed by Coly Li to pass checkpatch.pl checking) Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
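The corrected arithmetic, restated as a sketch with descriptive parameter names (not the exact bcache fields); the key change is subtracting the dirty sectors owned by flash-only volumes before deriving the per-device target:

    #include <stdint.h>

    /* illustrative only; assumes all_bdev_sectors != 0 */
    static uint64_t writeback_target(uint64_t cache_sectors_total,
                                     uint64_t flash_dev_dirty_sectors,
                                     unsigned int writeback_percent,
                                     uint64_t this_bdev_sectors,
                                     uint64_t all_bdev_sectors)
    {
            /* step 1 (fixed): ignore space pinned by flash-only volumes */
            uint64_t cache_sectors = cache_sectors_total - flash_dev_dirty_sectors;

            /* step 2: how much of the cache may stay dirty in total */
            uint64_t cache_dirty_target = cache_sectors * writeback_percent / 100;

            /* step 3: this backing device's proportional share */
            return cache_dirty_target * this_bdev_sectors / all_bdev_sectors;
    }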
2017-09-27bcache: do not subtract sectors_to_gc for bypassed IOTang Junhui
commit 69daf03adef5f7bc13e0ac86b4b8007df1767aab upstream. Bypassed IOs use no bucket, so do not subtract their sectors from sectors_to_gc (which would trigger the gc thread). Signed-off-by: tang.junhui <tang.junhui@zte.com.cn> Acked-by: Coly Li <colyli@suse.de> Reviewed-by: Eric Wheeler <bcache@linux.ewheeler.net> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27bcache: Fix leak of bdev referenceJan Kara
commit 4b758df21ee7081ab41448d21d60367efaa625b3 upstream. If blkdev_get_by_path() in register_bcache() fails, we try to lookup the block device using lookup_bdev() to detect which situation we are in to properly report error. However we never drop the reference returned to us from lookup_bdev(). Fix that. Signed-off-by: Jan Kara <jack@suse.cz> Acked-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27bcache: initialize dirty stripes in flash_dev_run()Tang Junhui
commit 175206cf9ab63161dec74d9cd7f9992e062491f5 upstream. bcache uses a proportional-derivative (PD) controller to regulate the writeback rate to cached devices. In the PD controller algorithm, dirty stripes of thin flash devices should not be counted, because flash only volumes never write back dirty data. Currently the dirty stripe counter for a thin flash device is not initialized when the thin flash device starts. This means the following calculation in the PD controller references an undefined dirty stripes number, and all cached devices attached to the same cache set where the thin flash device resides may have an inaccurate writeback rate. This patch calls bch_sectors_dirty_init() in flash_dev_run() to correctly initialize the dirty stripe counter when the thin flash device starts to run. This patch also makes the following parameter data type change, -void bch_sectors_dirty_init(struct cached_dev *dc); +void bch_sectors_dirty_init(struct bcache_device *); to call this function conveniently in flash_dev_run(). (Commit log is composed by Coly Li) Signed-off-by: Tang Junhui <tang.junhui@zte.com.cn> Reviewed-by: Coly Li <colyli@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27PM / devfreq: Fix memory leak when fail to register deviceChanwoo Choi
commit 9e14de1077e9c34f141cf98bdba60cdd5193d962 upstream. When devfreq_add_device fails to register the device, the devfreq instance is leaked. This patch fixes the memory leak. Before freeing the devfreq instance, check whether it is NULL, because device_unregister() already frees the devfreq instance when jumping to 'err_init'; this check prevents a duplicate kfree(devfreq). Fixes: ac4b281176a5 ("PM / devfreq: fix duplicated kfree on devfreq pointer") Signed-off-by: Chanwoo Choi <cw00.choi@samsung.com> Signed-off-by: MyungJoo Ham <myungjoo.ham@samsung.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27media: uvcvideo: Prevent heap overflow when accessing mapped controlsGuenter Roeck
commit 7e09f7d5c790278ab98e5f2c22307ebe8ad6e8ba upstream. The size of uvc_control_mapping is user controlled leading to a potential heap overflow in the uvc driver. This adds a check to verify the user provided size fits within the bounds of the defined buffer size. Originally-from: Richard Simmons <rssimmo@amazon.com> Signed-off-by: Guenter Roeck <linux@roeck-us.net> Reviewed-by: Laurent Pinchart <laurent.pinchart@ideasonboard.com> Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
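The essence of the fix is a bounds check on the user-supplied mapping before it is used to address the control's value buffer; a hedged sketch with illustrative structure and constant names (the real driver checks against the size of its fixed control data buffer):

    #include <linux/errno.h>

    struct ctrl_mapping {
            unsigned int offset;    /* bit offset into the control data */
            unsigned int size;      /* bit width of the mapped field */
    };

    static int validate_mapping(const struct ctrl_mapping *m,
                                unsigned int max_ctrl_bits)
    {
            /* each field is individually bounded, so the sum cannot wrap */
            if (m->size > max_ctrl_bits || m->offset > max_ctrl_bits ||
                m->size + m->offset > max_ctrl_bits)
                    return -EINVAL;
            return 0;
    }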
2017-09-27media: v4l2-compat-ioctl32: Fix timespec conversionDaniel Mentz
commit 9c7ba1d7634cef490b85bc64c4091ff004821bfd upstream. Certain syscalls like recvmmsg support 64 bit timespec values for the X32 ABI. The helper function compat_put_timespec converts a timespec value to a 32 bit or 64 bit value depending on what ABI is used. The v4l2 compat layer, however, is not designed to support 64 bit timespec values and always uses 32 bit values. Hence, compat_put_timespec must not be used. Without this patch, user space will be provided with bad timestamp values from the VIDIOC_DQEVENT ioctl. Also, fields of the struct v4l2_event32 that come immediately after timestamp get overwritten, namely the field named id. Fixes: 81993e81a994 ("compat: Get rid of (get|put)_compat_time(val|spec)") Cc: H. Peter Anvin <hpa@linux.intel.com> Cc: Laurent Pinchart <laurent.pinchart+renesas@ideasonboard.com> Cc: Tiffany Lin <tiffany.lin@mediatek.com> Cc: Ricardo Ribalda Delgado <ricardo.ribalda@gmail.com> Cc: Sakari Ailus <sakari.ailus@linux.intel.com> Signed-off-by: Daniel Mentz <danielmentz@google.com> Signed-off-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@s-opensource.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27s390/mm: fix race on mm->context.flush_mmMartin Schwidefsky
commit 60f07c8ec5fae06c23e9fd7bab67dabce92b3414 upstream. The order in __tlb_flush_mm_lazy is to flush the TLB first and then clear the mm->context.flush_mm bit. This can lead to missed flushes as the bit can be set at any time; the order needs to be the other way around. But this leads to a different race: __tlb_flush_mm_lazy may be called on two CPUs concurrently. If mm->context.flush_mm is cleared first then another CPU can bypass __tlb_flush_mm_lazy although the first CPU has not done the flush yet. In a virtualized environment the time until the flush is finally completed can be arbitrarily long. Add a spinlock to serialize __tlb_flush_mm_lazy and use the function in finish_arch_post_lock_switch as well. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27s390/mm: fix local TLB flushing vs. detach of an mm address spaceMartin Schwidefsky
commit b3e5dc45fd1ec2aa1de6b80008f9295eb17e0659 upstream. The local TLB flushing code keeps an additional mask in the mm.context, the cpu_attach_mask. At the time a global flush of an address space is done the cpu_attach_mask is copied to the mm_cpumask in order to avoid future global flushes in case the mm is used by a single CPU only after the flush. Trouble is that the reset of the mm_cpumask is racy against the detach of an mm address space by switch_mm. The current order is first the global TLB flush and then the copy of the cpu_attach_mask to the mm_cpumask. The order needs to be the other way around. Reviewed-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Martin Schwidefsky <schwidefsky@de.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27net/netfilter/nf_conntrack_core: Fix net_conntrack_lock()Manfred Spraul
commit 3ef0c7a730de0bae03d86c19570af764fa3c4445 upstream. As we want to remove spin_unlock_wait() and replace it with explicit spin_lock()/spin_unlock() calls, we can use this to simplify the locking. In addition: - Reading nf_conntrack_locks_all needs ACQUIRE memory ordering. - The new code avoids the backwards loop. Only slightly tested, I did not manage to trigger calls to nf_conntrack_all_lock(). V2: With improved comments, to clearly show how the barriers pair. Fixes: b16c29191dc8 ("netfilter: nf_conntrack: use safer way to lock all buckets") Signed-off-by: Manfred Spraul <manfred@colorfullife.com> Cc: Alan Stern <stern@rowland.harvard.edu> Cc: Sasha Levin <sasha.levin@oracle.com> Cc: Pablo Neira Ayuso <pablo@netfilter.org> Cc: netfilter-devel@vger.kernel.org Signed-off-by: Paul E. McKenney <paulmck@linux.vnet.ibm.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27PCI: pciehp: Report power fault only once until we clear itKeith Busch
commit 7612b3b28c0b900dcbcdf5e9b9747cc20a1e2455 upstream. When a power fault occurs, the power controller sets Power Fault Detected in the Slot Status register, and pciehp_isr() queues an INT_POWER_FAULT event to handle it. It also clears Power Fault Detected, but since nothing has yet changed to correct the power fault, the power controller will likely set it again immediately, which may cause an infinite loop when pcie_isr() rechecks Slot Status. Fix that by masking off Power Fault Detected from new events if the driver hasn't seen the power fault clear from the previous handling attempt. Fixes: fad214b0aa72 ("PCI: pciehp: Process all hotplug events before looking for new ones") Signed-off-by: Keith Busch <keith.busch@intel.com> [bhelgaas: changelog, pull test out and add comment] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Cc: Mayurkumar Patel <mayurkumar.patel@intel.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27PCI: shpchp: Enable bridge bus mastering if MSI is enabledAleksandr Bezzubikov
commit 48b79a14505349a29b3e20f03619ada9b33c4b17 upstream. An SHPC may generate MSIs to notify software about slot or controller events (SHPC spec r1.0, sec 4.7). A PCI device can only generate an MSI if it has bus mastering enabled. Enable bus mastering if the bridge contains an SHPC that uses MSI for event notifications. Signed-off-by: Aleksandr Bezzubikov <zuban32s@gmail.com> [bhelgaas: changelog] Signed-off-by: Bjorn Helgaas <bhelgaas@google.com> Reviewed-by: Marcel Apfelbaum <marcel@redhat.com> Acked-by: Michael S. Tsirkin <mst@redhat.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27ARC: Re-enable MMU upon Machine Check exceptionJose Abreu
commit 1ee55a8f7f6b7ca4c0c59e0b4b4e3584a085c2d3 upstream. I recently came upon a scenario where I would get a double fault machine check exception triggered by a kernel module, but the ensuing crash stacktrace (ksym lookup) was not working correctly. It turns out that the machine check auto-disables the MMU, while modules are allocated in kernel vaddr space. This patch re-enables the MMU before starting to print the stacktrace, making stacktracing of modules work upon a fatal exception. Signed-off-by: Jose Abreu <joabreu@synopsys.com> Reviewed-by: Alexey Brodkin <abrodkin@synopsys.com> Signed-off-by: Vineet Gupta <vgupta@synopsys.com> [vgupta: moved code into low level handler to avoid having it in 2 places] Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27tracing: Apply trace_clock changes to instance max bufferBaohong Liu
commit 170b3b1050e28d1ba0700e262f0899ffa4fccc52 upstream. Currently trace_clock timestamps are applied to both regular and max buffers only for global trace. For instance trace, trace_clock timestamps are applied only to regular buffer. But, regular and max buffers can be swapped, for example, following a snapshot. So, for instance trace, bad timestamps can be seen following a snapshot. Let's apply trace_clock timestamps to instance max buffer as well. Link: http://lkml.kernel.org/r/ebdb168d0be042dcdf51f81e696b17fabe3609c1.1504642143.git.tom.zanussi@linux.intel.com Fixes: 277ba0446 ("tracing: Add interface to allow multiple trace buffers") Signed-off-by: Baohong Liu <baohong.liu@intel.com> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27tracing: Add barrier to trace_printk() buffer nesting modificationSteven Rostedt (VMware)
commit 3d9622c12c8873911f4cc0ccdabd0362c2fca06b upstream. trace_printk() uses 4 buffers, one for each context (normal, softirq, irq and NMI), such that it does not need to worry about one context preempting the other. There's a nesting counter that gets incremented to figure out which buffer to use. If the context gets preempted by another context which calls trace_printk() it will increment the counter and use the next buffer, and restore the counter when it is finished. The problem is that gcc may optimize the modification of the buffer nesting counter and it may not be incremented in memory before the buffer is used. If this happens, and the context gets interrupted by another context, it could pick the same buffer and corrupt the one that is being used. Compiler barriers need to be added after the nesting variable is incremented and before it is decremented to prevent usage of the context buffers by more than one context at the same time. Cc: Andy Lutomirski <luto@kernel.org> Fixes: e2ace00117 ("tracing: Choose static tp_printk buffer by explicit nesting count") Hat-tip-to: Peter Zijlstra <peterz@infradead.org> Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
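The resulting pattern, sketched with simplified (non-per-CPU) names: compiler barriers bracket the nesting counter so the increment is visible before the buffer is touched, and the decrement cannot be hoisted above the last use of the buffer.

    /* sketch: per-context trace_printk-style buffer selection */
    static char trace_bufs[4][256];         /* one buffer per context level */
    static int buf_nesting;                 /* per-CPU in the real code */

    static char *get_trace_buf(void)
    {
            int idx = buf_nesting++;

            /* the counter update must not be reordered after buffer use,
             * or an interrupting context could pick the same buffer */
            barrier();

            if (idx >= 4)
                    return NULL;
            return trace_bufs[idx];
    }

    static void put_trace_buf(void)
    {
            /* don't let the compiler sink the decrement above the last
             * access to the buffer */
            barrier();
            buf_nesting--;
    }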
2017-09-27ftrace: Fix memleak when unregistering dynamic ops when tracing disabledSteven Rostedt (VMware)
commit edb096e00724f02db5f6ec7900f3bbd465c6c76f upstream. If function tracing is disabled by the user via the function-trace option or the proc sysctl file, and a ftrace_ops that was allocated on the heap is unregistered, then the shutdown code exits out without doing the proper clean up. This was found via kmemleak and running the ftrace selftests, as one of the tests unregisters with function tracing disabled. # cat kmemleak unreferenced object 0xffffffffa0020000 (size 4096): comm "swapper/0", pid 1, jiffies 4294668889 (age 569.209s) hex dump (first 32 bytes): 55 ff 74 24 10 55 48 89 e5 ff 74 24 18 55 48 89 U.t$.UH...t$.UH. e5 48 81 ec a8 00 00 00 48 89 44 24 50 48 89 4c .H......H.D$PH.L backtrace: [<ffffffff81d64665>] kmemleak_vmalloc+0x85/0xf0 [<ffffffff81355631>] __vmalloc_node_range+0x281/0x3e0 [<ffffffff8109697f>] module_alloc+0x4f/0x90 [<ffffffff81091170>] arch_ftrace_update_trampoline+0x160/0x420 [<ffffffff81249947>] ftrace_startup+0xe7/0x300 [<ffffffff81249bd2>] register_ftrace_function+0x72/0x90 [<ffffffff81263786>] trace_selftest_ops+0x204/0x397 [<ffffffff82bb8971>] trace_selftest_startup_function+0x394/0x624 [<ffffffff81263a75>] run_tracer_selftest+0x15c/0x1d7 [<ffffffff82bb83f1>] init_trace_selftests+0x75/0x192 [<ffffffff81002230>] do_one_initcall+0x90/0x1e2 [<ffffffff82b7d620>] kernel_init_freeable+0x350/0x3fe [<ffffffff81d61ec3>] kernel_init+0x13/0x122 [<ffffffff81d72c6a>] ret_from_fork+0x2a/0x40 [<ffffffffffffffff>] 0xffffffffffffffff Fixes: 12cce594fa ("ftrace/x86: Allow !CONFIG_PREEMPT dynamic ops to use allocated trampolines") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27ftrace: Fix selftest goto location on errorSteven Rostedt (VMware)
commit 46320a6acc4fb58f04bcf78c4c942cc43b20f986 upstream. In the second iteration of trace_selftest_ops(), the error goto label is wrong in the case where trace_selftest_test_global_cnt is off. In the case of error, it leaks the dynamic ops that was allocated. Fixes: 95950c2e ("ftrace: Add self-tests for multiple function trace users") Signed-off-by: Steven Rostedt (VMware) <rostedt@goodmis.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27scsi: qla2xxx: Fix an integer overflow in sysfs codeDan Carpenter
commit e6f77540c067b48dee10f1e33678415bfcc89017 upstream. The value of "size" comes from the user. When we add "start + size" it could lead to an integer overflow bug. It means we vmalloc() a lot more memory than we had intended. I believe that on 64 bit systems vmalloc() can succeed even if we ask it to allocate huge 4GB buffers. So we would get memory corruption and likely a crash when we call ha->isp_ops->write_optrom() and ->read_optrom(). Only root can trigger this bug. Link: https://bugzilla.kernel.org/show_bug.cgi?id=194061 Fixes: b7cc176c9eb3 ("[SCSI] qla2xxx: Allow region-based flash-part accesses.") Reported-by: shqking <shqking@gmail.com> Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
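The usual overflow-safe form of such a check compares against the remaining space rather than computing "start + size" first; a sketch with illustrative names:

    #include <linux/errno.h>
    #include <linux/types.h>

    /* 'start' and 'size' both come from user space */
    static int validate_optrom_window(uint32_t start, uint32_t size,
                                      uint32_t optrom_size)
    {
            if (start > optrom_size)
                    return -EINVAL;
            if (size > optrom_size - start)     /* no addition that can wrap */
                    return -EINVAL;
            return 0;
    }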
2017-09-27scsi: qla2xxx: Correction to vha->vref_count timeoutJoe Carnuccio
commit 6e98095f8fb6d98da34c4e6c34e69e7c638d79c0 upstream. Fix incorrect second argument for wait_event_timeout() Fixes: c4a9b538ab2a ("qla2xxx: Allow vref count to timeout on vport delete.") Signed-off-by: Joe Carnuccio <joe.carnuccio@cavium.com> Signed-off-by: Himanshu Madhani <himanshu.madhani@cavium.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27scsi: sg: fixup infoleak when using SG_GET_REQUEST_TABLEHannes Reinecke
commit 3e0097499839e0fe3af380410eababe5a47c4cf9 upstream. When calling SG_GET_REQUEST_TABLE ioctl only a half-filled table is returned; the remaining part will then contain stale kernel memory information. This patch zeroes out the entire table to avoid this issue. Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Reviewed-by: Eric Dumazet <edumazet@google.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
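The fix amounts to handing user space a fully zero-initialized table so that unfilled slots cannot carry stale kernel memory; a sketch of the pattern (whether the zeroing happens via kzalloc or an explicit memset is an implementation detail, and the wrapper function here is hypothetical):

    static int fill_and_copy_request_table(void __user *p)
    {
            sg_req_info_t *rinfo;
            int result = 0;

            /* zeroed allocation: untouched entries leak nothing */
            rinfo = kzalloc(SZ_SG_REQ_INFO * SG_MAX_QUEUE, GFP_KERNEL);
            if (!rinfo)
                    return -ENOMEM;

            /* ... fill only the entries that belong to live requests ... */

            if (copy_to_user(p, rinfo, SZ_SG_REQ_INFO * SG_MAX_QUEUE))
                    result = -EFAULT;
            kfree(rinfo);
            return result;
    }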
2017-09-27scsi: sg: factor out sg_fill_request_table()Hannes Reinecke
commit 4759df905a474d245752c9dc94288e779b8734dd upstream. Factor out sg_fill_request_table() for better readability. [mkp: typos, applied by hand] Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Bart Van Assche <bart.vanassche@wdc.com> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27scsi: sg: off by one in sg_ioctl()Dan Carpenter
commit bd46fc406b30d1db1aff8dabaff8d18bb423fdcf upstream. If "val" is SG_MAX_QUEUE then we are one element beyond the end of the "rinfo" array so the > should be >=. Fixes: 109bade9c625 ("scsi: sg: use standard lists for sg_requests") Signed-off-by: Dan Carpenter <dan.carpenter@oracle.com> Acked-by: Douglas Gilbert <dgilbert@interlog.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
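For reference, the boundary in question: with an array of SG_MAX_QUEUE elements, an index equal to SG_MAX_QUEUE is already one past the end, so the guard has to use >= (sketch):

    /* before the fix: 'val > SG_MAX_QUEUE' let val == SG_MAX_QUEUE through */
    if (val >= SG_MAX_QUEUE)
            return -EINVAL;
    /* rinfo[val] now stays inside rinfo[0..SG_MAX_QUEUE-1] */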
2017-09-27scsi: sg: use standard lists for sg_requestsHannes Reinecke
commit 109bade9c625c89bb5ea753aaa1a0a97e6fbb548 upstream. 'Sg_request' is using a private list implementation; convert it to standard lists. Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Tested-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27scsi: sg: remove 'save_scat_len'Hannes Reinecke
commit 136e57bf43dc4babbfb8783abbf707d483cacbe3 upstream. Unused. Signed-off-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Johannes Thumshirn <jthumshirn@suse.de> Tested-by: Johannes Thumshirn <jthumshirn@suse.de> Reviewed-by: Christoph Hellwig <hch@lst.de> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27scsi: storvsc: fix memory leak on ring buffer busyLong Li
commit 0208eeaa650c5c866a3242201678a19e6dc4a14e upstream. When storvsc is sending I/O to Hyper-v, it may allocate a bigger buffer descriptor for large data payload that can't fit into a pre-allocated buffer descriptor. This bigger buffer is freed on return path. If I/O request to Hyper-v fails due to ring buffer busy, the storvsc allocated buffer descriptor should also be freed. [mkp: applied by hand] Fixes: be0cf6ca301c ("scsi: storvsc: Set the tablesize based on the information given by the host") Signed-off-by: Long Li <longli@microsoft.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27scsi: megaraid_sas: Return pended IOCTLs with cmd_status MFI_STAT_WRONG_STATE in case adapter is deadShivasharan S
commit eb3fe263a48b0d27b229c213929c4cb3b1b39a0f upstream. After a kill adapter, since cmd_status is not set, the IOCTLs will be hung in the driver, resulting in an application hang. Set cmd_status MFI_STAT_WRONG_STATE when completing pended IOCTLs. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27scsi: megaraid_sas: Check valid aen class range to avoid kernel panicShivasharan S
commit 91b3d9f0069c8307d0b3a4c6843b65a439183318 upstream. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27scsi: megaraid_sas: set minimum value of resetwaittime to be 1 secsShivasharan S
commit e636a7a430f41efb0ff2727960ce61ef9f8f6769 upstream. Setting resetwaittime to 0 during a FW fault will result in driver not calling the OCR. Signed-off-by: Kashyap Desai <kashyap.desai@broadcom.com> Signed-off-by: Shivasharan S <shivasharan.srikanteshwara@broadcom.com> Reviewed-by: Hannes Reinecke <hare@suse.com> Reviewed-by: Tomas Henzl <thenzl@redhat.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27scsi: zfcp: trace high part of "new" 64 bit SCSI LUNSteffen Maier
commit 5d4a3d0a2ff23799b956e5962b886287614e7fad upstream. Complements debugging aspects of the otherwise functionally complete v3.17 commit 9cb78c16f5da ("scsi: use 64-bit LUNs"). While I don't have access to a target exporting 3 or 4 level LUNs, I did test it by explicitly attaching a non-existent fake 4 level LUN by means of zfcp sysfs attribute "unit_add". In order to see corresponding trace records of otherwise successful events, we had to increase the trace level of area SCSI and HBA to 6. $ echo 6 > /sys/kernel/debug/s390dbf/zfcp_0.0.1880_scsi/level $ echo 6 > /sys/kernel/debug/s390dbf/zfcp_0.0.1880_hba/level $ echo 0x4011402240334044 > \ /sys/bus/ccw/drivers/zfcp/0.0.1880/0x50050763031bd327/unit_add Example output formatted by an updated zfcpdbf from the s390-tools package interspersed with kernel messages at scsi_logging_level=4605: Timestamp : ... Area : REC Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : scsla_1 LUN : 0x4011402240334044 WWPN : 0x50050763031bd327 D_ID : 0x00...... Adapter status : 0x5400050b Port status : 0x54000001 LUN status : 0x41000000 Ready count : 0x00000001 Running count : 0x00000000 ERP want : 0x01 ERP need : 0x01 scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY pass 1 length 36 scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY successful with code 0x0 Timestamp : ... Area : HBA Subarea : 00 Level : 6 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : fs_norm Request ID : 0x<inquiry2-req-id> Request status : 0x00000010 FSF cmnd : 0x00000001 FSF sequence no: 0x... FSF issued : ... FSF stat : 0x00000000 FSF stat qual : 00000000 00000000 00000000 00000000 Prot stat : 0x00000001 Prot stat qual : ........ ........ 00000000 00000000 Port handle : 0x... LUN handle : 0x... | Timestamp : ... Area : SCSI Subarea : 00 Level : 6 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : rsl_nor Request ID : 0x<inquiry2-req-id> SCSI ID : 0x00000000 SCSI LUN : 0x40224011 SCSI LUN high : 0x40444033 <======================= SCSI result : 0x00000000 SCSI retries : 0x00 SCSI allowed : 0x03 SCSI scribble : 0x<inquiry2-req-id> SCSI opcode : 12000000 a4000000 00000000 00000000 FCP rsp inf cod: 0x00 FCP rsp IU : 00000000 00000000 00000000 00000000 00000000 00000000 scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY pass 2 length 164 scsi 2:0:0:4630896905707208721: scsi scan: INQUIRY successful with code 0x0 scsi 2:0:0:4630896905707208721: scsi scan: peripheral device type of 31, \ no device added Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 9cb78c16f5da ("scsi: use 64-bit LUNs") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Reviewed-by: Jens Remus <jremus@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27scsi: zfcp: trace HBA FSF response by default on dismiss or timedout late ↵Steffen Maier
response commit fdb7cee3b9e3c561502e58137a837341f10cbf8b upstream. At the default trace level, we only trace unsuccessful events including FSF responses. zfcp_dbf_hba_fsf_response() only used protocol status and FSF status to decide on an unsuccessful response. However, this is only one of multiple possible sources determining a failed struct zfcp_fsf_req. An FSF request can also "fail" if its response runs into an ERP timeout or if it gets dismissed because a higher level recovery was triggered [trace tags "erscf_1" or "erscf_2" in zfcp_erp_strategy_check_fsfreq()]. FSF requests with ERP timeout are: FSF_QTCB_EXCHANGE_CONFIG_DATA, FSF_QTCB_EXCHANGE_PORT_DATA, FSF_QTCB_OPEN_PORT_WITH_DID or FSF_QTCB_CLOSE_PORT or FSF_QTCB_CLOSE_PHYSICAL_PORT for target ports, FSF_QTCB_OPEN_LUN, FSF_QTCB_CLOSE_LUN. One example is slow queue processing which can cause follow-on errors, e.g. FSF_PORT_ALREADY_OPEN after FSF_QTCB_OPEN_PORT_WITH_DID timed out. In order to see the root cause, we need to see late responses even if the channel presented them successfully with FSF_PROT_GOOD and FSF_GOOD. Example trace records formatted with zfcpdbf from the s390-tools package: Timestamp : ... Area : REC Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : ... Record ID : 1 Tag : fcegpf1 LUN : 0xffffffffffffffff WWPN : 0x<WWPN> D_ID : 0x00<D_ID> Adapter status : 0x5400050b Port status : 0x41200000 LUN status : 0x00000000 Ready count : 0x00000001 Running count : 0x... ERP want : 0x02 ZFCP_ERP_ACTION_REOPEN_PORT ERP need : 0x02 ZFCP_ERP_ACTION_REOPEN_PORT | Timestamp : ... 30 seconds later Area : REC Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : ... Record ID : 2 Tag : erscf_2 LUN : 0xffffffffffffffff WWPN : 0x<WWPN> D_ID : 0x00<D_ID> Adapter status : 0x5400050b Port status : 0x41200000 LUN status : 0x00000000 Request ID : 0x<request_ID> ERP status : 0x10000000 ZFCP_STATUS_ERP_TIMEDOUT ERP step : 0x0800 ZFCP_ERP_STEP_PORT_OPENING ERP action : 0x02 ZFCP_ERP_ACTION_REOPEN_PORT ERP count : 0x00 | Timestamp : ... later than previous record Area : HBA Subarea : 00 Level : 5 > default level => 3 <= default level Exception : - CPU ID : 00 Caller : ... Record ID : 1 Tag : fs_qtcb => fs_rerr Request ID : 0x<request_ID> Request status : 0x00001010 ZFCP_STATUS_FSFREQ_DISMISSED | ZFCP_STATUS_FSFREQ_CLEANUP FSF cmnd : 0x00000005 FSF sequence no: 0x... FSF issued : ... > 30 seconds ago FSF stat : 0x00000000 FSF_GOOD FSF stat qual : 00000000 00000000 00000000 00000000 Prot stat : 0x00000001 FSF_PROT_GOOD Prot stat qual : 00000000 00000000 00000000 00000000 Port handle : 0x... LUN handle : 0x00000000 QTCB log length: ... QTCB log info : ... In case of problems detecting that new responses are waiting on the input queue, we sooner or later trigger adapter recovery due to an FSF request timeout (trace tag "fsrth_1"). FSF requests with FSF request timeout are: typically FSF_QTCB_ABORT_FCP_CMND; but theoretically also FSF_QTCB_EXCHANGE_CONFIG_DATA or FSF_QTCB_EXCHANGE_PORT_DATA via sysfs, FSF_QTCB_OPEN_PORT_WITH_DID or FSF_QTCB_CLOSE_PORT for WKA ports, FSF_QTCB_FCP_CMND for task management function (LUN / target reset). One or more pending requests can meanwhile have FSF_PROT_GOOD and FSF_GOOD because the channel filled in the response via DMA into the request's QTCB. In a theroretical case, inject code can create an erroneous FSF request on purpose. If data router is enabled, it uses deferred error reporting. A READ SCSI command can succeed with FSF_PROT_GOOD, FSF_GOOD, and SAM_STAT_GOOD. 
But on writing the read data to host memory via DMA, it can still fail, e.g. if an intentionally wrong scatter list does not provide enough space. Rather than getting an unsuccessful response, we get a QDIO activate check which in turn triggers adapter recovery. One or more pending requests can meanwhile have FSF_PROT_GOOD and FSF_GOOD because the channel filled in the response via DMA into the request's QTCB. Example trace records formatted with zfcpdbf from the s390-tools package: Timestamp : ... Area : HBA Subarea : 00 Level : 6 > default level => 3 <= default level Exception : - CPU ID : .. Caller : ... Record ID : 1 Tag : fs_norm => fs_rerr Request ID : 0x<request_ID2> Request status : 0x00001010 ZFCP_STATUS_FSFREQ_DISMISSED | ZFCP_STATUS_FSFREQ_CLEANUP FSF cmnd : 0x00000001 FSF sequence no: 0x... FSF issued : ... FSF stat : 0x00000000 FSF_GOOD FSF stat qual : 00000000 00000000 00000000 00000000 Prot stat : 0x00000001 FSF_PROT_GOOD Prot stat qual : ........ ........ 00000000 00000000 Port handle : 0x... LUN handle : 0x... | Timestamp : ... Area : SCSI Subarea : 00 Level : 3 Exception : - CPU ID : .. Caller : ... Record ID : 1 Tag : rsl_err Request ID : 0x<request_ID2> SCSI ID : 0x... SCSI LUN : 0x... SCSI result : 0x000e0000 DID_TRANSPORT_DISRUPTED SCSI retries : 0x00 SCSI allowed : 0x05 SCSI scribble : 0x<request_ID2> SCSI opcode : 28... Read(10) FCP rsp inf cod: 0x00 FCP rsp IU : 00000000 00000000 00000000 00000000 ^^ SAM_STAT_GOOD 00000000 00000000 Only with luck in both above cases, we could see a follow-on trace record of an unsuccesful event following a successful but late FSF response with FSF_PROT_GOOD and FSF_GOOD. Typically this was the case for I/O requests resulting in a SCSI trace record "rsl_err" with DID_TRANSPORT_DISRUPTED [On ZFCP_STATUS_FSFREQ_DISMISSED, zfcp_fsf_protstatus_eval() sets ZFCP_STATUS_FSFREQ_ERROR seen by the request handler functions as failure]. However, the reason for this follow-on trace was invisible because the corresponding HBA trace record was missing at the default trace level (by default hidden records with tags "fs_norm", "fs_qtcb", or "fs_open"). On adapter recovery, after we had shut down the QDIO queues, we perform unsuccessful pseudo completions with flag ZFCP_STATUS_FSFREQ_DISMISSED for each pending FSF request in zfcp_fsf_req_dismiss_all(). In order to find the root cause, we need to see all pseudo responses even if the channel presented them successfully with FSF_PROT_GOOD and FSF_GOOD. Therefore, check zfcp_fsf_req.status for ZFCP_STATUS_FSFREQ_DISMISSED or ZFCP_STATUS_FSFREQ_ERROR and trace with a new tag "fs_rerr". It does not matter that there are numerous places which set ZFCP_STATUS_FSFREQ_ERROR after the location where we trace an FSF response early. These cases are based on protocol status != FSF_PROT_GOOD or == FSF_PROT_FSF_STATUS_PRESENTED and are thus already traced by default as trace tag "fs_perr" or "fs_ferr" respectively. NB: The trace record with tag "fssrh_1" for status read buffers on dismiss all remains. zfcp_fsf_req_complete() handles this and returns early. All other FSF request types are handled separately and as described above. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 8a36e4532ea1 ("[SCSI] zfcp: enhancement of zfcp debug features") Fixes: 2e261af84cdb ("[SCSI] zfcp: Only collect FSF/HBA debug data for matching trace levels") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. 
Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27scsi: zfcp: fix payload with full FCP_RSP IU in SCSI trace recordsSteffen Maier
commit 12c3e5754c8022a4f2fd1e9f00d19e99ee0d3cc1 upstream. If the FCP_RSP UI has optional parts (FCP_SNS_INFO or FCP_RSP_INFO) and thus does not fit into the fsp_rsp field built into a SCSI trace record, trace the full FCP_RSP UI with all optional parts as payload record instead of just FCP_SNS_INFO as payload and a 1 byte RSP_INFO_CODE part of FCP_RSP_INFO built into the SCSI record. That way we would also get the full FCP_SNS_INFO in case a target would ever send more than min(SCSI_SENSE_BUFFERSIZE==96, ZFCP_DBF_PAY_MAX_REC==256)==96. The mandatory part of FCP_RSP IU is only 24 bytes. PAYload costs at least one full PAY record of 256 bytes anyway. We cap to the hardware response size which is only FSF_FCP_RSP_SIZE==128. So we can just put the whole FCP_RSP IU with any optional parts into PAYload similarly as we do for SAN PAY since v4.9 commit aceeffbb59bb ("zfcp: trace full payload of all SAN records (req,resp,iels)"). This does not cause any additional trace records wasting memory. Decoded trace records were confusing because they showed a hard-coded sense data length of 96 even if the FCP_RSP_IU field FCP_SNS_LEN showed actually less. Since the same commit, we set pl_len for SAN traces to the full length of a request/response even if we cap the corresponding trace. In contrast, here for SCSI traces we set pl_len to the pre-computed length of FCP_RSP IU considering SNS_LEN or RSP_LEN if valid. Nonetheless we trace a hardcoded payload of length FSF_FCP_RSP_SIZE==128 if there were optional parts. This makes it easier for the zfcpdbf tool to format only the relevant part of the long FCP_RSP UI buffer. And any trailing information is still available in the payload trace record just in case. Rename the payload record tag from "fcp_sns" to "fcp_riu" to make the new content explicit to zfcpdbf which can then pick a suitable field name such as "FCP rsp IU all:" instead of "Sense info :" Also, the same zfcpdbf can still be backwards compatible with "fcp_sns". Old example trace record before this fix, formatted with the tool zfcpdbf from s390-tools: Timestamp : ... Area : SCSI Subarea : 00 Level : 3 Exception : - CPU id : .. Caller : 0x... Record id : 1 Tag : rsl_err Request id : 0x<request_id> SCSI ID : 0x... SCSI LUN : 0x... SCSI result : 0x00000002 SCSI retries : 0x00 SCSI allowed : 0x05 SCSI scribble : 0x<request_id> SCSI opcode : 00000000 00000000 00000000 00000000 FCP rsp inf cod: 0x00 FCP rsp IU : 00000000 00000000 00000202 00000000 ^^==FCP_SNS_LEN_VALID 00000020 00000000 ^^^^^^^^==FCP_SNS_LEN==32 Sense len : 96 <==min(SCSI_SENSE_BUFFERSIZE,ZFCP_DBF_PAY_MAX_REC) Sense info : 70000600 00000018 00000000 29000000 00000400 00000000 00000000 00000000 00000000 00000000 00000000 00000000<==superfluous 00000000 00000000 00000000 00000000<==superfluous 00000000 00000000 00000000 00000000<==superfluous 00000000 00000000 00000000 00000000<==superfluous New example trace records with this fix: Timestamp : ... Area : SCSI Subarea : 00 Level : 3 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : rsl_err Request ID : 0x<request_id> SCSI ID : 0x... SCSI LUN : 0x... 
SCSI result : 0x00000002 SCSI retries : 0x00 SCSI allowed : 0x03 SCSI scribble : 0x<request_id> SCSI opcode : a30c0112 00000000 02000000 00000000 FCP rsp inf cod: 0x00 FCP rsp IU : 00000000 00000000 00000a02 00000200 00000020 00000000 FCP rsp IU len : 56 FCP rsp IU all : 00000000 00000000 00000a02 00000200 ^^=FCP_RESID_UNDER|FCP_SNS_LEN_VALID 00000020 00000000 70000500 00000018 ^^^^^^^^==FCP_SNS_LEN ^^^^^^^^^^^^^^^^^ 00000000 240000cb 00011100 00000000 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 00000000 00000000 ^^^^^^^^^^^^^^^^^==FCP_SNS_INFO Timestamp : ... Area : SCSI Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : lr_okay Request ID : 0x<request_id> SCSI ID : 0x... SCSI LUN : 0x... SCSI result : 0x00000000 SCSI retries : 0x00 SCSI allowed : 0x05 SCSI scribble : 0x<request_id> SCSI opcode : <CDB of unrelated SCSI command passed to eh handler> FCP rsp inf cod: 0x00 FCP rsp IU : 00000000 00000000 00000100 00000000 00000000 00000008 FCP rsp IU len : 32 FCP rsp IU all : 00000000 00000000 00000100 00000000 ^^==FCP_RSP_LEN_VALID 00000000 00000008 00000000 00000000 ^^^^^^^^==FCP_RSP_LEN ^^^^^^^^^^^^^^^^^==FCP_RSP_INFO Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 250a1352b95e ("[SCSI] zfcp: Redesign of the debug tracing for SCSI records.") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27scsi: zfcp: fix missing trace records for early returns in TMF eh handlersSteffen Maier
commit 1a5d999ebfc7bfe28deb48931bb57faa8e4102b6 upstream. For problem determination we need to see that we were in scsi_eh as well as whether and why we were successful or not. The following commits introduced new early returns without adding a trace record: v2.6.35 commit a1dbfddd02d2 ("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh") on fc_block_scsi_eh() returning != 0 which is FAST_IO_FAIL, v2.6.30 commit 63caf367e1c9 ("[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp") on not having gotten an FSF request after the maximum number of retry attempts and thus could not issue a TMF and has to return FAILED. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: a1dbfddd02d2 ("[SCSI] zfcp: Pass return code from fc_block_scsi_eh to scsi eh") Fixes: 63caf367e1c9 ("[SCSI] zfcp: Improve reliability of SCSI eh handlers in zfcp") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27scsi: zfcp: fix passing fsf_req to SCSI trace on TMF to correlate with HBASteffen Maier
commit 9fe5d2b2fd30aa8c7827ec62cbbe6d30df4fe3e3 upstream. Without this fix we get SCSI trace records on task management functions which cannot be correlated to HBA trace records because all fields related to the FSF request are empty (zero). Also, the FCP_RSP_IU is missing as well as any sense data if available. This was caused by v2.6.14 commit 8a36e4532ea1 ("[SCSI] zfcp: enhancement of zfcp debug features") introducing trace records for TMFs but hard coding NULL for a possibly existing TMF FSF request. The scsi_cmnd scribble is also zero or unrelated for the TMF request so it also could not lookup a suitable FSF request from there. A broken example trace record formatted with zfcpdbf from the s390-tools package: Timestamp : ... Area : SCSI Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : lr_fail Request ID : 0x0000000000000000 ^^^^^^^^^^^^^^^^ no correlation to HBA record SCSI ID : 0x<scsitarget> SCSI LUN : 0x<scsilun> SCSI result : 0x000e0000 SCSI retries : 0x00 SCSI allowed : 0x05 SCSI scribble : 0x0000000000000000 SCSI opcode : 2a000017 3bb80000 08000000 00000000 FCP rsp inf cod: 0x00 ^^ no TMF response FCP rsp IU : 00000000 00000000 00000000 00000000 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ 00000000 00000000 ^^^^^^^^^^^^^^^^^ no interesting FCP_RSP_IU Sense len : ... ^^^^^^^^^^^^^^^^^^^^ no sense data length Sense info : ... ^^^^^^^^^^^^^^^^^^^^ no sense data content, even if present There are some true cases where we really do not have an FSF request: "rsl_fai" from zfcp_dbf_scsi_fail_send() called for early returns / completions in zfcp_scsi_queuecommand(), "abrt_or", "abrt_bl", "abrt_ru", "abrt_ar" from zfcp_scsi_eh_abort_handler() where we did not get as far, "lr_nres", "tr_nres" from zfcp_task_mgmt_function() where we're successful and do not need to do anything because adapter stopped. For these cases it's correct to pass NULL for fsf_req to _zfcp_dbf_scsi(). Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 8a36e4532ea1 ("[SCSI] zfcp: enhancement of zfcp debug features") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27scsi: zfcp: fix capping of unsuccessful GPN_FT SAN response trace recordsSteffen Maier
commit 975171b4461be296a35e83ebd748946b81cf0635 upstream. v4.9 commit aceeffbb59bb ("zfcp: trace full payload of all SAN records (req,resp,iels)") fixed trace data loss of 2.6.38 commit 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") necessary for problem determination, e.g. to see the currently active zone set during automatic port scan. While it already saves space by not dumping any empty residual entries of the large successful GPN_FT response (4 pages), there are seldom cases where the GPN_FT response is unsuccessful and likely does not have FC_NS_FID_LAST set in fp_flags so we did not cap the trace record. We typically see such case for an initiator WWPN, which is not in any zone. Cap unsuccessful responses to at least the actual basic CT_IU response plus whatever fits the SAN trace record built-in "payload" buffer just in case there's trailing information of which we would at least see the existence and its beginning. In order not to erroneously cap successful responses, we need to swap calling the trace function and setting the CT / ELS status to success (0). Example trace record pair formatted with zfcpdbf: Timestamp : ... Area : SAN Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : fssct_1 Request ID : 0x<request_id> Destination ID : 0x00fffffc SAN req short : 01000000 fc020000 01720ffc 00000000 00000008 SAN req length : 20 | Timestamp : ... Area : SAN Subarea : 00 Level : 1 Exception : - CPU ID : .. Caller : 0x... Record ID : 2 Tag : fsscth2 Request ID : 0x<request_id> Destination ID : 0x00fffffc SAN resp short : 01000000 fc020000 80010000 00090700 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] SAN resp length: 16384 San resp info : 01000000 fc020000 80010000 00090700 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] 00000000 00000000 00000000 00000000 [trailing info] The fix saves all but one of the previously associated 64 PAYload trace record chunks of size 256 bytes each. Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: aceeffbb59bb ("zfcp: trace full payload of all SAN records (req,resp,iels)") Fixes: 2c55b750a884 ("[SCSI] zfcp: Redesign of the debug tracing for SAN records.") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27scsi: zfcp: add handling for FCP_RESID_OVER to the fcp ingress pathBenjamin Block
commit a099b7b1fc1f0418ab8d79ecf98153e1e134656e upstream. Up until now zfcp would just ignore the FCP_RESID_OVER flag in the FCP response IU. When this flag is set, it is possible, in regards to the FCP standard, that the storage-server processes the command normally, up to the point where data is missing and simply ignores those. In this case no CHECK CONDITION would be set, and because we ignored the FCP_RESID_OVER flag we resulted in at least a data loss or even -corruption as a follow-up error, depending on how the applications/layers on top behave. To prevent this, we now set the host-byte of the corresponding scsi_cmnd to DID_ERROR. Other storage-behaviors, where the same condition results in a CHECK CONDITION set in the answer, don't need to be changed as they are handled in the mid-layer already. Following is an example trace record decoded with zfcpdbf from the s390-tools package. We forcefully injected a fc_dl which is one byte too small: Timestamp : ... Area : SCSI Subarea : 00 Level : 3 Exception : - CPU ID : .. Caller : 0x... Record ID : 1 Tag : rsl_err Request ID : 0x... SCSI ID : 0x... SCSI LUN : 0x... SCSI result : 0x00070000 ^^DID_ERROR SCSI retries : 0x.. SCSI allowed : 0x.. SCSI scribble : 0x... SCSI opcode : 2a000000 00000000 08000000 00000000 FCP rsp inf cod: 0x00 FCP rsp IU : 00000000 00000000 00000400 00000001 ^^fr_flags==FCP_RESID_OVER ^^fr_status==SAM_STAT_GOOD ^^^^^^^^fr_resid 00000000 00000000 As of now, we don't actively handle to possibility that a response IU has both flags - FCP_RESID_OVER and FCP_RESID_UNDER - set at once. Reported-by: Luke M. Hopkins <lmhopkin@us.ibm.com> Reviewed-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: 553448f6c483 ("[SCSI] zfcp: Message cleanup") Fixes: ea127f975424 ("[PATCH] s390 (7/7): zfcp host adapter.") (tglx/history.git) Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27scsi: zfcp: fix queuecommand for scsi_eh commands when DIX enabledSteffen Maier
commit 71b8e45da51a7b64a23378221c0a5868bd79da4f upstream. Since commit db007fc5e20c ("[SCSI] Command protection operation"), scsi_eh_prep_cmnd() saves scmd->prot_op and temporarily resets it to SCSI_PROT_NORMAL. Other FCP LLDDs such as qla2xxx and lpfc shield their queuecommand() to only access any of scsi_prot_sg...() if (scsi_get_prot_op(cmd) != SCSI_PROT_NORMAL). Do the same thing for zfcp, which introduced DIX support with commit ef3eb71d8ba4 ("[SCSI] zfcp: Introduce experimental support for DIF/DIX"). Otherwise, TUR SCSI commands as part of scsi_eh likely fail in zfcp, because the regular SCSI command with DIX protection data, that scsi_eh re-uses in scsi_send_eh_cmnd(), of course still has (scsi_prot_sg_count() != 0) and so zfcp sends down bogus requests to the FCP channel hardware. This causes scsi_eh_test_devices() to have (finish_cmds == 0) [not SCSI device is online or not scsi_eh_tur() failed] so regular SCSI commands, that caused / were affected by scsi_eh, are moved to work_q and scsi_eh_test_devices() itself returns false. In turn, it unnecessarily escalates in our case in scsi_eh_ready_devs() beyond host reset to finally scsi_eh_offline_sdevs() which sets affected SCSI devices offline with the following kernel message: "kernel: sd H:0:T:L: Device offlined - not ready after error recovery" Signed-off-by: Steffen Maier <maier@linux.vnet.ibm.com> Fixes: ef3eb71d8ba4 ("[SCSI] zfcp: Introduce experimental support for DIF/DIX") Reviewed-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Benjamin Block <bblock@linux.vnet.ibm.com> Signed-off-by: Martin K. Petersen <martin.petersen@oracle.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
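The guard the other FCP LLDDs use, and which this patch brings to zfcp's queuecommand path, looks like the following sketch (the function name and surrounding code are illustrative; the scsi_* helpers are the real mid-layer API):

    #include <scsi/scsi_cmnd.h>

    static void map_protection_data(struct scsi_cmnd *cmd)
    {
            struct scatterlist *prot_sg;
            unsigned int prot_cnt;

            /* scsi_eh (e.g. the TUR it sends) temporarily resets prot_op to
             * SCSI_PROT_NORMAL while reusing a command that may still report
             * a non-zero scsi_prot_sg_count(), so prot_op must be the gate */
            if (scsi_get_prot_op(cmd) == SCSI_PROT_NORMAL)
                    return;

            prot_sg = scsi_prot_sglist(cmd);
            prot_cnt = scsi_prot_sg_count(cmd);
            /* ... only now hand prot_sg / prot_cnt to the HBA request ... */
    }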
2017-09-27skd: Submit requests to firmware before triggering the doorbellBart Van Assche
commit 5fbd545cd3fd311ea1d6e8be4cedddd0ee5684c7 upstream. Ensure that the members of struct skd_msg_buf have been transferred to the PCIe adapter before the doorbell is triggered. This patch avoids that I/O fails sporadically and that the following error message is reported: (skd0:STM000196603:[0000:00:09.0]): Completion mismatch comp_id=0x0000 skreq=0x0400 new=0x0000 Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27skd: Avoid that module unloading triggers a use-after-freeBart Van Assche
commit 7277cc67b3916eed47558c64f9c9c0de00a35cda upstream. Since put_disk() triggers a disk_release() call and since that last function calls blk_put_queue() if disk->queue != NULL, clear the disk->queue pointer before calling put_disk(). This avoids that unloading the skd kernel module triggers the following use-after-free: WARNING: CPU: 8 PID: 297 at lib/refcount.c:128 refcount_sub_and_test+0x70/0x80 refcount_t: underflow; use-after-free. CPU: 8 PID: 297 Comm: kworker/8:1 Not tainted 4.11.10-300.fc26.x86_64 #1 Workqueue: events work_for_cpu_fn Call Trace: dump_stack+0x63/0x84 __warn+0xcb/0xf0 warn_slowpath_fmt+0x5a/0x80 refcount_sub_and_test+0x70/0x80 refcount_dec_and_test+0x11/0x20 kobject_put+0x1f/0x50 blk_put_queue+0x15/0x20 disk_release+0xae/0xf0 device_release+0x32/0x90 kobject_release+0x67/0x170 kobject_put+0x2b/0x50 put_disk+0x17/0x20 skd_destruct+0x5c/0x890 [skd] skd_pci_probe+0x124d/0x13a0 [skd] local_pci_probe+0x42/0xa0 work_for_cpu_fn+0x14/0x20 process_one_work+0x19e/0x470 worker_thread+0x1dc/0x4a0 kthread+0x125/0x140 ret_from_fork+0x25/0x30 Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27md/bitmap: disable bitmap_resize for file-backed bitmaps.NeilBrown
commit e8a27f836f165c26f867ece7f31eb5c811692319 upstream. bitmap_resize() does not work for file-backed bitmaps. The buffer_heads are allocated and initialized when the bitmap is read from the file, but resize doesn't read from the file, it loads from the internal bitmap. When it comes time to write the new bitmap, the bh is non-existent and we crash. The common case when growing an array involves making the array larger, and that normally means making the bitmap larger. Doing that inside the kernel is possible, but would need more code. It is probably easier to require people who use file-backed bitmaps to remove them and re-add after a reshape. So this patch disables the resizing of arrays which have file-backed bitmaps. This is better than crashing. Reported-by: Zhilong Liu <zlliu@suse.com> Fixes: d60b479d177a ("md/bitmap: add bitmap_resize function to allow bitmap resizing.") Signed-off-by: NeilBrown <neilb@suse.com> Signed-off-by: Shaohua Li <shli@fb.com> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27block: Relax a check in blk_start_queue()Bart Van Assche
commit 4ddd56b003f251091a67c15ae3fe4a5c5c5e390a upstream. Calling blk_start_queue() from interrupt context with the queue lock held and without disabling IRQs, as the skd driver does, is safe. This patch avoids that loading the skd driver triggers the following warning: WARNING: CPU: 11 PID: 1348 at block/blk-core.c:283 blk_start_queue+0x84/0xa0 RIP: 0010:blk_start_queue+0x84/0xa0 Call Trace: skd_unquiesce_dev+0x12a/0x1d0 [skd] skd_complete_internal+0x1e7/0x5a0 [skd] skd_complete_other+0xc2/0xd0 [skd] skd_isr_completion_posted.isra.30+0x2a5/0x470 [skd] skd_isr+0x14f/0x180 [skd] irq_forced_thread_fn+0x2a/0x70 irq_thread+0x144/0x1a0 kthread+0x125/0x140 ret_from_fork+0x2a/0x40 Fixes: commit a038e2536472 ("[PATCH] blk_start_queue() must be called with irq disabled - add warning") Signed-off-by: Bart Van Assche <bart.vanassche@wdc.com> Cc: Paolo 'Blaisorblade' Giarrusso <blaisorblade@yahoo.it> Cc: Andrew Morton <akpm@osdl.org> Cc: Christoph Hellwig <hch@lst.de> Cc: Hannes Reinecke <hare@suse.de> Cc: Johannes Thumshirn <jthumshirn@suse.de> Signed-off-by: Jens Axboe <axboe@kernel.dk> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27powerpc: Fix DAR reporting when alignment handler faultsMichael Ellerman
commit f9effe925039cf54489b5c04e0d40073bb3a123d upstream. Anton noticed that if we fault part way through emulating an unaligned instruction, we don't update the DAR to reflect that. The DAR value is eventually reported back to userspace as the address in the SEGV signal, and if userspace is using that value to demand fault then it can be confused by us not setting the value correctly. This patch is ugly as hell, but is intended to be the minimal fix and back ports easily. Signed-off-by: Michael Ellerman <mpe@ellerman.id.au> Reviewed-by: Paul Mackerras <paulus@ozlabs.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27ext4: fix quota inconsistency during orphan cleanup for read-only mountszhangyi (F)
commit 95f1fda47c9d8738f858c3861add7bf0a36a7c0b upstream. Quota does not get enabled for read-only mounts if the filesystem has the quota feature, so quotas cannot be updated during orphan cleanup, which will lead to quota inconsistency. This patch turns on quotas during orphan cleanup for this case, making sure quotas can be updated correctly. Reported-by: Jan Kara <jack@suse.cz> Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27ext4: fix incorrect quotaoff if the quota feature is enabledzhangyi (F)
commit b0a5a9589decd07db755d6a8d9c0910d96ff7992 upstream. Current ext4 quota should always be "usage enabled" if the quota feature is enabled. But ext4_orphan_cleanup() turns quotas off directly (as is done for the older journaled quota), so we cannot turn them on again via "quotaon" unless we umount and remount ext4. Simple reproducer: mkfs.ext4 -O project,quota /dev/vdb1 mount -o prjquota /dev/vdb1 /mnt chattr -p 123 /mnt chattr +P /mnt touch /mnt/aa /mnt/bb exec 100<>/mnt/aa rm -f /mnt/aa sync echo c > /proc/sysrq-trigger #reboot and mount mount -o prjquota /dev/vdb1 /mnt #query status quotaon -Ppv /dev/vdb1 #output quotaon: Cannot find mountpoint for device /dev/vdb1 quotaon: No correct mountpoint specified. This patch adds a check for journaled quotas to avoid an incorrect quotaoff when ext4 has the quota feature. Signed-off-by: zhangyi (F) <yi.zhang@huawei.com> Signed-off-by: Theodore Ts'o <tytso@mit.edu> Reviewed-by: Jan Kara <jack@suse.cz> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27crypto: AF_ALG - remove SGL terminator indicator when chainingStephan Mueller
Fixed differently upstream as commit 2d97591ef43d ("crypto: af_alg - consolidation of duplicate code") The SGL is MAX_SGL_ENTS + 1 in size. The last SG entry is used for the chaining and is properly updated with the sg_chain invocation. During the filling-in of the initial SG entries, sg_mark_end is called for each SG entry. This is appropriate as long as no additional SGL is chained with the current SGL. However, when a new SGL is chained and the last SG entry is updated with sg_chain, the last but one entry still contains the end marker from the sg_mark_end. This end marker must be removed as otherwise a walk of the chained SGLs will cause a NULL pointer dereference at the last but one SG entry, because sg_next will return NULL. The patch only applies to all kernels up to and including 4.13. The patch 2d97591ef43d0587be22ad1b0d758d6df4999a0b added to 4.14-rc1 introduced a complete new code base which addresses this bug in a different way. Yet, that patch is too invasive for stable kernels and was therefore not marked for stable. Fixes: 8ff590903d5fc ("crypto: algif_skcipher - User-space interface for skcipher operations") Signed-off-by: Stephan Mueller <smueller@chronox.de> Acked-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
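The pattern in question, expressed as a hedged sketch against the generic scatterlist API (the surrounding AF_ALG structures are simplified): before chaining a fresh SGL onto the reserved last slot, the end marker left behind on the previous data entry has to be cleared, otherwise sg_next() stops the walk there.

    #include <linux/scatterlist.h>

    /* prev_sg has 'prev_used' filled data entries plus one reserved slot
     * at index 'prev_used' that is used only for chaining */
    static void chain_new_sgl(struct scatterlist *prev_sg,
                              unsigned int prev_used,
                              struct scatterlist *new_sg)
    {
            /* sg_mark_end() was called while filling prev_sg; remove the
             * stale marker from the last data entry */
            sg_unmark_end(&prev_sg[prev_used - 1]);

            /* chain through the reserved entry (prv_nents counts it too) */
            sg_chain(prev_sg, prev_used + 1, new_sg);
    }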
2017-09-27crypto: ccp - Fix XTS-AES-128 support on v5 CCPsGary R Hook
commit e652399edba99a5497f0d80f240c9075d3b43493 upstream. Version 5 CCPs have some new requirements for XTS-AES: the type field must be specified, and the key requires 512 bits, with each part occupying 256 bits and padded with zeroes. Signed-off-by: Gary R Hook <ghook@amd.com> Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
2017-09-27MIPS: math-emu: <MADDF|MSUBF>.D: Fix accuracy (64-bit case)Douglas Leung
commit 2cfa58259f4b65b33ebe8f167019a1f89c6c3289 upstream. Implement fused multiply-add with correct accuracy. Fused multiply-add operation has better accuracy than respective sequential execution of multiply and add operations applied on the same inputs. This is because accuracy errors accumulate in latter case. This patch implements fused multiply-add with the same accuracy as it is implemented in hardware, using 128-bit intermediate calculations. One test case example (raw bits) that this patch fixes: MADDF.D fd,fs,ft: fd = 0x00000ca000000000 fs = ft = 0x3f40624dd2f1a9fc Fixes: e24c3bec3e8e ("MIPS: math-emu: Add support for the MIPS R6 MADDF FPU instruction") Fixes: 83d43305a1df ("MIPS: math-emu: Add support for the MIPS R6 MSUBF FPU instruction") Signed-off-by: Douglas Leung <douglas.leung@imgtec.com> Signed-off-by: Miodrag Dinic <miodrag.dinic@imgtec.com> Signed-off-by: Goran Ferenc <goran.ferenc@imgtec.com> Signed-off-by: Aleksandar Markovic <aleksandar.markovic@imgtec.com> Cc: Douglas Leung <douglas.leung@imgtec.com> Cc: Bo Hu <bohu@google.com> Cc: James Hogan <james.hogan@imgtec.com> Cc: Jin Qian <jinqian@google.com> Cc: Paul Burton <paul.burton@imgtec.com> Cc: Petar Jovanovic <petar.jovanovic@imgtec.com> Cc: Raghu Gandham <raghu.gandham@imgtec.com> Cc: linux-mips@linux-mips.org Cc: linux-kernel@vger.kernel.org Patchwork: https://patchwork.linux-mips.org/patch/16891/ Signed-off-by: Ralf Baechle <ralf@linux-mips.org> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
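The heart of such an implementation is computing the full-width product of the 64-bit significands and rounding only once, after the addition; without a native 128-bit type the multiply is typically assembled from 32-bit halves, as in this illustrative helper (not the exact math-emu code):

    #include <stdint.h>

    /* multiply two 64-bit values into a 128-bit result (hi:lo) */
    static void mul64_to_128(uint64_t a, uint64_t b, uint64_t *hi, uint64_t *lo)
    {
            uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
            uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

            uint64_t p0 = a_lo * b_lo;
            uint64_t p1 = a_lo * b_hi;
            uint64_t p2 = a_hi * b_lo;
            uint64_t p3 = a_hi * b_hi;

            /* carry out of the low 64 bits of p0 + (p1 << 32) + (p2 << 32) */
            uint64_t carry = ((p0 >> 32) + (uint32_t)p1 + (uint32_t)p2) >> 32;

            *lo = p0 + (p1 << 32) + (p2 << 32);
            *hi = p3 + (p1 >> 32) + (p2 >> 32) + carry;
    }

    /* a fused multiply-add rounds once, on the full 128-bit intermediate,
     * instead of once after the multiply and again after the add */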