[OpenAFS-devel] to fsync() or not to fsync()

Jeffrey Hutzelman jhutz@cmu.edu
Thu, 14 Sep 2006 20:47:15 -0400


On Thursday, September 14, 2006 07:56:15 PM +0200 "Frank Batschulat (Home)" 
<Frank.Batschulat@Sun.COM> wrote:
>> Huh?  There's no RPC for that.  And no, I don't think file data is
>> synced, but metadata is, which is consistent with the behavior of many
>> other filesystems.
>
> interesting - I'd be curious to know about some examples that only
> update meta data on fsync(3C) but not the actual file data.

You misinterpret me.  I wasn't talking about clients calling fsync, but 
about what the fileserver does on its own.  Ordinarily, the fileserver does 
not write file data synchronously, unless the client requests it.  However, 
it does try to write its own metadata (which looks to the underlying 
filesystem like data) synchronously, as do a variety of other filesystems.

Reviewing the code, pretty much the only time we actually do sync volume 
metadata is when attaching a volume or taking one offline, or when we are 
about to start writing to a volume which currently has the dont-salvage bit 
set on disk.  The latter case is straightforward - we want to make sure the 
dont-salvage bit has really been cleared on the disk before we make any 
changes to the volume; otherwise we could make some other change and the 
salvager wouldn't bother to fix it.  Note that the sync only happens if the 
dont-salvage flag is currently set on disk; once cleared, the flag is only 
set again after 10 minutes of inactivity, and that change is done _without_ 
a sync.
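
To make the ordering concrete, here is a minimal sketch of "clear the flag, 
sync, and only then modify the volume".  It is not the fileserver's actual 
code: struct vol_header, open_header() and prepare_for_write() are 
hypothetical names, and the header layout is invented for the example.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* declares pwrite()/fsync() under strict -std modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/* Hypothetical on-disk header; NOT the real OpenAFS volume header layout. */
struct vol_header {
    uint32_t dont_salvage;   /* nonzero: salvager may skip this volume */
    uint32_t reserved;
};

/* Hypothetical helper: open (or create) the file holding the header. */
int open_header(const char *path)
{
    return open(path, O_RDWR | O_CREAT, 0600);
}

/*
 * Call before the first modification of a volume.
 * Returns 0 on success, -1 on I/O error.  *synced reports whether an
 * fsync() was actually issued.
 */
int prepare_for_write(int fd, struct vol_header *hdr, int *synced)
{
    assert(fd >= 0 && hdr != NULL && synced != NULL);
    *synced = 0;

    if (!hdr->dont_salvage)
        return 0;            /* flag already clear on disk: nothing to sync */

    hdr->dont_salvage = 0;
    if (pwrite(fd, hdr, sizeof(*hdr), 0) != (ssize_t)sizeof(*hdr))
        return -1;
    if (fsync(fd) != 0)      /* the cleared flag must reach disk first */
        return -1;

    *synced = 1;
    return 0;                /* now it is safe to change the volume */
}
```

Only the first write after an idle period pays for the fsync(); later 
writes take the early return.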

As for attaching and detaching volumes, I think we might see some 
improvement in the time required to start or stop a fileserver with many 
volumes by skipping all calls to fsync() during VInitVolumePackage() and 
VShutdown(), and instead calling sync(2) when those functions are done. 
However, even then I suspect the savings would be fairly small.
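
A rough sketch of the two strategies, assuming one small header file per 
volume.  attach_all() and struct flush_stats are hypothetical and exist 
only to show where the flushes happen; the real VInitVolumePackage() and 
VShutdown() do far more than this.

```c
#ifndef _XOPEN_SOURCE
#define _XOPEN_SOURCE 700   /* declares sync()/fsync()/pwrite() under strict -std modes */
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical counters so a caller can see how many flushes were issued. */
struct flush_stats {
    int fsync_calls;
    int sync_calls;
};

/*
 * Write a (made-up) header to each volume file.
 * batch == 0: fsync() every file individually.
 * batch != 0: skip the per-file fsync() and call sync() once at the end.
 * Returns 0 on success, -1 on the first I/O error.
 */
int attach_all(const char *const *paths, size_t n, int batch,
               struct flush_stats *st)
{
    static const char hdr[] = "hypothetical volume header";
    size_t i;

    assert(paths != NULL && st != NULL);
    st->fsync_calls = 0;
    st->sync_calls = 0;

    for (i = 0; i < n; i++) {
        int fd = open(paths[i], O_RDWR | O_CREAT, 0600);
        if (fd < 0)
            return -1;
        if (pwrite(fd, hdr, sizeof(hdr), 0) != (ssize_t)sizeof(hdr)) {
            close(fd);
            return -1;
        }
        if (!batch) {
            if (fsync(fd) != 0) {
                close(fd);
                return -1;
            }
            st->fsync_calls++;
        }
        close(fd);
    }

    if (batch) {
        /* POSIX only requires sync() to schedule the writes; Linux also
         * waits for them to complete.  sync() cannot report errors. */
        sync();
        st->sync_calls++;
    }
    return 0;
}
```

Note that sync() flushes every filesystem on the machine, not just the 
vice partitions, which is one reason the gain may be smaller than hoped.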

It's been noted that the namei backend syncs the link table every time a 
link count is changed.  I agree this is somewhat excessive; the salvager 
will fix an incorrect link count as long as it is nonzero.  However, a zero 
link count will cause the salvager to completely ignore that underlying 
file, which means the corresponding vnode and directory entry will be 
deleted and the file is essentially lost forever (unless the server admin 
knows how to go digging for unreferenced files lying around in the server's 
data area; such files usually go unnoticed for quite some time).
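
The asymmetry between "wrong but nonzero" and "zero" can be modelled in a 
few lines.  This is only a simplified model of the behavior described 
above, not the salvager's real logic; classify() and the enum names are 
made up for illustration.

```c
#include <assert.h>

/* Hypothetical outcomes for one underlying file during a salvage. */
enum salvage_action {
    KEEP_AS_IS,       /* recorded count matches the references found */
    FIX_LINK_COUNT,   /* recorded count is wrong but nonzero: corrected */
    IGNORE_FILE       /* recorded count is zero: file is not considered */
};

/*
 * recorded:   link count stored in the link table
 * referenced: number of vnodes found pointing at the file
 */
enum salvage_action classify(int recorded, int referenced)
{
    assert(recorded >= 0 && referenced >= 0);
    if (recorded == 0)
        return IGNORE_FILE;      /* vnode and directory entry get deleted */
    if (recorded != referenced)
        return FIX_LINK_COUNT;
    return KEEP_AS_IS;
}
```

So a stale count of 3 where 1 is correct is harmless after a salvage, while 
a stale count of 0 where 1 is correct loses the file.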

Once the salvager is fixed to deal correctly with cases where the AFSIDat 
directory tree contains a file whose recorded link count is zero, we should 
be able to avoid syncing the link table any more often than we sync other 
volume data (i.e. when the volume goes offline).

We should certainly be able to suppress all syncs of link table updates 
during clone operations (which only increase the link counts on existing 
files) and volume restores (where we are creating a whole new volume and 
have other problems if the server crashes during the operation) as well as 
when the salvager itself is running.
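
As a sketch of what such a policy might look like, here is an in-memory 
model in which the caller says what kind of operation is in progress.  The 
enum, struct link_table and set_link_count() are all hypothetical, the 
table layout has nothing to do with the real namei link table format, and 
the counter merely stands in for an fsync() of the link table file.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of what the caller is doing. */
enum lt_context { LT_NORMAL, LT_CLONE, LT_RESTORE, LT_SALVAGE };

#define LT_SLOTS 8

/* In-memory stand-in for the link table; `syncs` counts would-be fsync()s. */
struct link_table {
    unsigned counts[LT_SLOTS];
    int syncs;
};

/* Only ordinary updates sync; clone, restore and salvage do not. */
int link_table_needs_sync(enum lt_context ctx)
{
    return ctx == LT_NORMAL;
}

void set_link_count(struct link_table *lt, size_t slot, unsigned count,
                    enum lt_context ctx)
{
    assert(lt != NULL && slot < LT_SLOTS);
    lt->counts[slot] = count;
    if (link_table_needs_sync(ctx))
        lt->syncs++;          /* real code would fsync() the table here */
}
```

Cloning a volume with thousands of files would then update thousands of 
counts without a single sync, instead of one per file.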

-- Jeff