From 97178b7b6c84bd14660b89474d27931a1ea65c66 Mon Sep 17 00:00:00 2001
From: Nick Piggin
Date: Thu, 25 Nov 2010 12:47:15 +0200
Subject: exofs: simple fsync race fix

It is incorrect to test inode dirty bits without participating in the inode
writeback protocol. Inode writeback sets I_SYNC and clears I_DIRTY_?, then
writes out the particular bits, then clears I_SYNC when it is done. BTW. it
may not completely write all pages out, so I_DIRTY_PAGES would get set
again.

This is a standard pattern used throughout the kernel's writeback caches
(I_SYNC ~= I_WRITEBACK, if that makes it clearer).

And so it is not possible to determine an inode's dirty status just by
checking I_DIRTY bits. Especially not for the purpose of data integrity
syncs.

Missing the check for these bits means that fsync can complete while
writeback to the inode is underway. Inode writeback functions get this
right, so call into them rather than try to shortcut things by testing
dirty state improperly.

Signed-off-by: Nick Piggin
Signed-off-by: Boaz Harrosh
---
 fs/exofs/file.c | 5 -----
 1 file changed, 5 deletions(-)

(limited to 'fs/exofs/file.c')

diff --git a/fs/exofs/file.c b/fs/exofs/file.c
index b905c79b4f0a..4c0d6bac9143 100644
--- a/fs/exofs/file.c
+++ b/fs/exofs/file.c
@@ -48,11 +48,6 @@ static int exofs_file_fsync(struct file *filp, int datasync)
 	struct inode *inode = filp->f_mapping->host;
 	struct super_block *sb;
 
-	if (!(inode->i_state & I_DIRTY))
-		return 0;
-	if (datasync && !(inode->i_state & I_DIRTY_DATASYNC))
-		return 0;
-
 	ret = sync_inode_metadata(inode, 1);
 
 	/* This is a good place to write the sb */
-- 
cgit v1.2.3

From 1cea312ad49d9cb964179a784fedb1fcfe396283 Mon Sep 17 00:00:00 2001
From: Boaz Harrosh
Date: Thu, 3 Feb 2011 17:53:25 +0200
Subject: exofs: Write sbi->s_nextid as part of the Create command

Before when creating a new inode, we'd set the sb->s_dirt flag, and sometime
later the system would write out s_nextid as part of the sb_info.
Also on inode sync we would force the sb sync as well.

Define the s_nextid as a new partition attribute and set it every time we
create a new object. At mount we read it from its new place.

We now never set sb->s_dirt anywhere in exofs. write_super is actually never
called. The call to exofs_write_super from exofs_put_super is also removed
because the VFS always calls ->sync_fs before calling ->put_super twice.

To stay backward-and-forward compatible we also write the old s_nextid in
the super_block object at unmount, and support zero length attribute on
mount.

This also fixes a BUG where in layouts when group_width was not a divisor of
EXOFS_SUPER_ID (0x10000) the s_nextid was not read from the device it was
written to. Because of the sliding window layout trick, and because the read
was always done from the 0 device but the write was done via the raid engine
that might slide the device view. Now we read and write through the raid
engine.

Signed-off-by: Boaz Harrosh
---
 fs/exofs/file.c | 11 +----------
 1 file changed, 1 insertion(+), 10 deletions(-)

(limited to 'fs/exofs/file.c')

diff --git a/fs/exofs/file.c b/fs/exofs/file.c
index 4c0d6bac9143..45ca323d8363 100644
--- a/fs/exofs/file.c
+++ b/fs/exofs/file.c
@@ -45,17 +45,8 @@ static int exofs_release_file(struct inode *inode, struct file *filp)
 static int exofs_file_fsync(struct file *filp, int datasync)
 {
 	int ret;
-	struct inode *inode = filp->f_mapping->host;
-	struct super_block *sb;
-
-	ret = sync_inode_metadata(inode, 1);
-
-	/* This is a good place to write the sb */
-	/* TODO: Sechedule an sb-sync on create */
-	sb = inode->i_sb;
-	if (sb->s_dirt)
-		exofs_sync_fs(sb, 1);
 
+	ret = sync_inode_metadata(filp->f_mapping->host, 1);
 	return ret;
 }
 
-- 
cgit v1.2.3
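
Note on the first patch ("simple fsync race fix"): the race it describes can be sketched in plain userspace C. This is only an illustrative model under stated assumptions, not kernel code. The `SIM_*` flags, `struct sim_inode`, and every `sim_*` function are hypothetical stand-ins for `I_DIRTY_*`, `I_SYNC`, and the inode writeback path, and the bit values are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the kernel's inode state bits.
 * The values are illustrative and do not match the kernel's. */
enum {
	SIM_DIRTY_SYNC     = 1u << 0,
	SIM_DIRTY_DATASYNC = 1u << 1,
	SIM_DIRTY_PAGES    = 1u << 2,
	SIM_SYNC           = 1u << 3,	/* writeback in progress */
};
#define SIM_DIRTY (SIM_DIRTY_SYNC | SIM_DIRTY_DATASYNC | SIM_DIRTY_PAGES)

struct sim_inode {
	unsigned int state;
};

/* Model of writeback starting: SYNC is set and the dirty bits are
 * cleared BEFORE the data has actually reached the device. */
void sim_writeback_begin(struct sim_inode *inode)
{
	assert(!(inode->state & SIM_SYNC));
	inode->state |= SIM_SYNC;
	inode->state &= ~SIM_DIRTY;
}

/* Model of writeback finishing: only now is the data on the device. */
void sim_writeback_end(struct sim_inode *inode)
{
	inode->state &= ~SIM_SYNC;
}

/* The shortcut removed by the patch: returns true when fsync would
 * return 0 immediately, based on the dirty bits alone. */
bool sim_naive_fsync_skips(const struct sim_inode *inode, int datasync)
{
	if (!(inode->state & SIM_DIRTY))
		return true;
	if (datasync && !(inode->state & SIM_DIRTY_DATASYNC))
		return true;
	return false;
}

/* True while the inode is dirty or its writeback is still underway,
 * i.e. while an integrity sync must not report completion yet. */
bool sim_data_may_be_in_flight(const struct sim_inode *inode)
{
	return (inode->state & (SIM_DIRTY | SIM_SYNC)) != 0;
}
```

In this model, calling `sim_writeback_begin()` on a dirty inode makes `sim_naive_fsync_skips()` return true while `sim_data_may_be_in_flight()` is still true. That is the window in which the old `exofs_file_fsync()` could return before the data was durable, which is why the patch drops the check and always calls `sync_inode_metadata()`.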