md/raid1,5,10: Disable WRITE SAME until a recovery strategy is in place

commit 5026d7a9b2f3eb1f9bda66c18ac6bc3036ec9020 upstream. There are cases where the kernel will believe that the WRITE SAME command is supported by a block device which does not, in fact, support WRITE SAME. This currently happens for SATA drivers behind a SAS controller, but there are probably a hundred other ways that can happen, including drive firmware bugs. After receiving an error for WRITE SAME the block layer will retry the request as a plain write of zeroes, but mdraid will consider the failure as fatal and consider the drive failed. This has the effect that all the mirrors containing a specific set of data are each offlined in very rapid succession resulting in data loss. However, just bouncing the request back up to the block layer isn't ideal either, because the whole initial request-retry sequence should be inside the write bitmap fence, which probably means that md needs to do its own conversion of WRITE SAME to write zero. Until the failure scenario has been sorted out, disable WRITE SAME for raid1, raid5, and raid10. [neilb: added raid5] This patch is appropriate for any -stable since 3.7 when write_same support was added. Signed-off-by: H. Peter Anvin <hpa@linux.intel.com> Signed-off-by: NeilBrown <neilb@suse.de> Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
author: H. Peter Anvin <hpa@zytor.com> 2013-06-12 07:37:43 -0700
committer: Greg Kroah-Hartman <gregkh@linuxfoundation.org> 2013-06-20 12:01:30 -0700
commit: 8f44eed48ec962b50e6b7a7067aa76f41f6de3f3 (patch)
tree: b40a1e4f54d7ea62fc66dee0624df145ec23caff
parent: db9b5815031211b16f24fec0851553e8204b38a1 (diff)
3 files changed, 6 insertions, 5 deletions
diff --git a/drivers/md/raid1.c b/drivers/md/raid1.c
index 8430c0d6cc8e..754f95fd9888 100644
--- a/drivers/md/raid1.c
+++ b/drivers/md/raid1.c
@@ -2837,8 +2837,8 @@ static int run(struct mddev *mddev)
 		return PTR_ERR(conf);
 
 	if (mddev->queue)
-		blk_queue_max_write_same_sectors(mddev->queue,
-						 mddev->chunk_sectors);
+		blk_queue_max_write_same_sectors(mddev->queue, 0);
+
 	rdev_for_each(rdev, mddev) {
 		if (!mddev->gendisk)
 			continue;
diff --git a/drivers/md/raid10.c b/drivers/md/raid10.c
index bc5b444b683d..f17cadd062d0 100644
--- a/drivers/md/raid10.c
+++ b/drivers/md/raid10.c
@@ -3635,8 +3635,7 @@ static int run(struct mddev *mddev)
 	if (mddev->queue) {
 		blk_queue_max_discard_sectors(mddev->queue,
 					      mddev->chunk_sectors);
-		blk_queue_max_write_same_sectors(mddev->queue,
-						 mddev->chunk_sectors);
+		blk_queue_max_write_same_sectors(mddev->queue, 0);
 		blk_queue_io_min(mddev->queue, chunk_size);
 		if (conf->geo.raid_disks % conf->geo.near_copies)
 			blk_queue_io_opt(mddev->queue, chunk_size * conf->geo.raid_disks);
diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index f4e87bfc7567..251ab6449cec 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -5457,7 +5457,7 @@ static int run(struct mddev *mddev)
 		if (mddev->major_version == 0 &&
 		    mddev->minor_version > 90)
 			rdev->recovery_offset = reshape_offset;
-			
+
 		if (rdev->recovery_offset < reshape_offset) {
 			/* We need to check old and new layout */
 			if (!only_parity(rdev->raid_disk,
@@ -5580,6 +5580,8 @@ static int run(struct mddev *mddev)
 		 */
 		mddev->queue->limits.discard_zeroes_data = 0;
 
+		blk_queue_max_write_same_sectors(mddev->queue, 0);
+
 		rdev_for_each(rdev, mddev) {
 			disk_stack_limits(mddev->gendisk, rdev->bdev,
 					  rdev->data_offset << 9);
author	H. Peter Anvin <hpa@zytor.com>	2013-06-12 07:37:43 -0700
committer	Greg Kroah-Hartman <gregkh@linuxfoundation.org>	2013-06-20 12:01:30 -0700
commit	8f44eed48ec962b50e6b7a7067aa76f41f6de3f3 (patch)
tree	b40a1e4f54d7ea62fc66dee0624df145ec23caff
parent	db9b5815031211b16f24fec0851553e8204b38a1 (diff)