[OpenAFS-devel] to fsync() or not to fsync()
Jeffrey Hutzelman
jhutz@cmu.edu
Thu, 14 Sep 2006 20:47:15 -0400
On Thursday, September 14, 2006 07:56:15 PM +0200 "Frank Batschulat (Home)"
<Frank.Batschulat@Sun.COM> wrote:
>> Huh? There's no RPC for that. And no, I don't think file data is
>> synced, but metadata is, which is consistent with the behavior of many
>> other filesystems.
>
> interesting - I'd be curious to know about some examples that only
> update meta data on fsync(3C) but not the actual file data.
You misinterpret me. I wasn't talking about clients calling fsync, but
about what the fileserver does on its own. Ordinarily, the fileserver does
not write file data synchronously, unless the client requests it. However,
it does try to write its own metadata (which looks to the underlying
filesystem like data) synchronously, as do a variety of other filesystems.
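To make the distinction concrete, here is a minimal C sketch of that policy under stated assumptions. It is not OpenAFS code: `store_file_data()`, `store_server_metadata()`, the `client_wants_sync` flag and the `sync_count` counter are hypothetical names used only for illustration. Client file data reaches the disk whenever the kernel gets around to flushing it, unless the client asked for a synchronous store. The server's own metadata is always followed by fsync().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical counter so the policy can be observed; not in OpenAFS. */
static int sync_count = 0;

/* Write len bytes at offset off of path, retrying short writes.
 * If do_sync is nonzero, force the data to stable storage with fsync(). */
static int write_at(const char *path, const void *buf, size_t len,
                    off_t off, int do_sync)
{
    const char *p = buf;
    int fd;

    assert(path != NULL);
    fd = open(path, O_WRONLY | O_CREAT, 0600);
    if (fd < 0)
        return -1;
    while (len > 0) {
        ssize_t n = pwrite(fd, p, len, off);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
        off += n;
    }
    if (do_sync) {
        sync_count++;
        if (fsync(fd) < 0) {
            close(fd);
            return -1;
        }
    }
    return close(fd);
}

/* Client file data: left to the kernel unless the client asked for sync. */
int store_file_data(const char *path, const void *buf, size_t len,
                    off_t off, int client_wants_sync)
{
    return write_at(path, buf, len, off, client_wants_sync);
}

/* Fileserver metadata (plain file data to the underlying filesystem):
 * always synced before the server relies on it. */
int store_server_metadata(const char *path, const void *buf, size_t len,
                          off_t off)
{
    return write_at(path, buf, len, off, 1);
}
```

A real server would of course keep its descriptors open rather than reopening the file on every store; the open/close here just keeps the sketch self-contained.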
Reviewing the code, I find that pretty much the only times we actually sync volume
metadata are when attaching a volume or taking one offline, or when we are
about to start writing to a volume which currently has the dont-salvage bit
set on disk. The latter case is straightforward - we want to make sure the
dont-salvage bit has really been cleared on the disk before we make any
changes to the volume; otherwise we could make some other change and the
salvager wouldn't bother to fix it. Note that the sync only happens if the
dont-salvage flag is not already clear; once cleared, the flag is set again only
after 10 minutes of inactivity, and that change is done _without_ a sync.
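The asymmetry between clearing and setting the flag can be modeled in a few lines. This is a simplified sketch, not the real volume package: `struct volume`, `write_header()`, `begin_update()`, `idle_check()` and the `header_syncs` counter are hypothetical, and the on-disk header is represented by a struct field rather than an actual file.

```c
#include <assert.h>
#include <time.h>

#define DONT_SALVAGE_IDLE_SECS (10 * 60)

/* Hypothetical, simplified volume state; real OpenAFS structures differ. */
struct volume {
    int dont_salvage_on_disk;   /* flag value in the on-disk header */
    time_t last_write;          /* time of the most recent change */
    int header_syncs;           /* number of synchronous header writes */
};

/* Stand-in for writing the volume header; a real server would write the
 * header file and, if do_sync is set, fsync() it. */
static void write_header(struct volume *vp, int dont_salvage, int do_sync)
{
    vp->dont_salvage_on_disk = dont_salvage;
    if (do_sync)
        vp->header_syncs++;
}

/* Called before any change to the volume. The cleared flag must be on
 * disk before anything else changes, or the salvager might skip the
 * volume after a crash. If it is already clear, there is nothing to sync. */
void begin_update(struct volume *vp, time_t now)
{
    assert(vp != NULL);
    if (vp->dont_salvage_on_disk)
        write_header(vp, 0, 1);
    vp->last_write = now;
}

/* Called periodically: after 10 idle minutes, mark the volume clean
 * again. Losing this write only costs an unnecessary salvage, so no sync. */
void idle_check(struct volume *vp, time_t now)
{
    if (!vp->dont_salvage_on_disk &&
        now - vp->last_write >= DONT_SALVAGE_IDLE_SECS)
        write_header(vp, 1, 0);
}
```

So a busy volume pays for at most one synchronous header write per idle period, no matter how many changes it receives.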
As for attaching and detaching volumes, I think we might see some
improvement in the time required to start or stop a fileserver with many
volumes by skipping all calls to fsync() during VInitVolumePackage() and
VShutdown(), and instead calling sync(2) when those functions are done.
However, even then I suspect the savings would be modest.
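Roughly, the idea is to trade N fsync() calls for a single sync(2) at the end. The sketch below is hypothetical and is not VInitVolumePackage() or VShutdown(): `process_volumes()`, `touch_header()` and the two call counters exist only for illustration. One caveat: POSIX allows sync() to return before the writes have completed (Linux does wait for them), so a real change would have to account for that on other platforms.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical counters so the two strategies can be compared. */
static int fsync_calls = 0;
static int sync_calls = 0;

/* Hypothetical stand-in for updating one volume header at attach or
 * shutdown time. If per_file_sync is set, fsync() it immediately. */
static int touch_header(const char *path, int per_file_sync)
{
    static const char hdr[] = "attached";
    int fd = open(path, O_WRONLY | O_CREAT | O_TRUNC, 0600);
    if (fd < 0)
        return -1;
    if (write(fd, hdr, sizeof hdr - 1) != (ssize_t)(sizeof hdr - 1)) {
        close(fd);
        return -1;
    }
    if (per_file_sync) {
        fsync_calls++;
        if (fsync(fd) < 0) {
            close(fd);
            return -1;
        }
    }
    return close(fd);
}

/* Process every header. With batch set, skip the per-file fsync() calls
 * and issue one sync(2) after all headers have been written. */
int process_volumes(const char *const paths[], size_t n, int batch)
{
    assert(paths != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (touch_header(paths[i], !batch) < 0)
            return -1;
    if (batch) {
        sync_calls++;
        sync();
    }
    return 0;
}
```

Note that sync() flushes every filesystem on the machine, not just the vice partitions, which is part of why I would not expect a dramatic win.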
It's been noted that the namei backend syncs the link table every time a
link count is changed. I agree this is somewhat excessive; the salvager
will fix an incorrect link count as long as it is nonzero. However, a zero
link count will cause the salvager to completely ignore that underlying
file, which means the corresponding vnode and directory entry will be
deleted and the file is essentially lost forever (unless the server admin
knows how to go digging for unreferenced files lying around in the server's
data area, which usually go unnoticed for quite some time).
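For illustration, the I/O pattern being complained about looks roughly like this. The layout is an assumption (one 16-bit count per slot); the real namei link table format is different, and `set_link_count()`, `get_link_count()`, `change_link_count()` and `linktable_fsyncs` are hypothetical names. The point is only that every increment or decrement costs an fsync().

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical counter of fsync() calls on the link table. */
static int linktable_fsyncs = 0;

/* Store the count for one slot (hypothetical layout: 16 bits per slot). */
int set_link_count(int fd, unsigned slot, uint16_t count, int do_sync)
{
    off_t off = (off_t)slot * (off_t)sizeof count;
    if (pwrite(fd, &count, sizeof count, off) != (ssize_t)sizeof count)
        return -1;
    if (do_sync) {
        linktable_fsyncs++;
        if (fsync(fd) < 0)
            return -1;
    }
    return 0;
}

int get_link_count(int fd, unsigned slot, uint16_t *count)
{
    off_t off = (off_t)slot * (off_t)sizeof *count;
    return pread(fd, count, sizeof *count, off) == (ssize_t)sizeof *count
        ? 0 : -1;
}

/* The behavior described above: every change is written and fsync()ed
 * at once, so a crash can never leave a stale count on disk. */
int change_link_count(int fd, unsigned slot, int delta)
{
    uint16_t c;
    if (get_link_count(fd, slot, &c) < 0)
        return -1;
    assert(delta >= 0 || c >= (uint16_t)(-delta));
    c = (uint16_t)(c + delta);
    return set_link_count(fd, slot, c, 1);
}
```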
Once the salvager is fixed to deal correctly with cases where the AFSIDat
directory tree contains a file whose recorded link count is zero, we should
be able to avoid syncing the link table any more often than we sync other
volume data (i.e. when the volume goes offline).
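The salvager rule in question boils down to something like the following. `salvaged_link_count()` and its `handles_zero` parameter are hypothetical; they model today's behavior (`handles_zero == 0`, where a recorded count of zero makes the file invisible) and the proposed fix, in which the count is always rebuilt from the references the salvager actually finds.

```c
#include <assert.h>

/* Returns the link count the salvager would record for a file found in
 * the AFSIDat tree, or -1 if it ignores the file entirely.
 * recorded:     count currently stored in the link table
 * referenced:   number of references the salvager actually finds
 * handles_zero: nonzero once the salvager is fixed as proposed */
int salvaged_link_count(int recorded, int referenced, int handles_zero)
{
    assert(recorded >= 0 && referenced >= 0);
    if (recorded == 0 && !handles_zero)
        return -1;      /* today: file ignored, vnode and entry lost */
    return referenced;  /* otherwise: count rebuilt from real references */
}
```

Once the zero case is handled, a stale count of any value is repairable, which is what makes it safe to stop syncing the link table on every change.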
We should certainly be able to suppress all syncs of link table updates
during clone operations (which only increase the link counts on existing
files) and volume restores (where we are creating a whole new volume and
have other problems if the server crashes during the operation) as well as
when the salvager itself is running.
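Put together, the decision of whether a link-table update needs an immediate sync might be expressed like this. The enum and `linktable_needs_sync()` are hypothetical. The clone case is safe because counts only go up from a nonzero value, so a lost update still leaves a nonzero count that the salvager can correct.

```c
#include <assert.h>

/* Hypothetical classification of what is modifying the link table. */
enum vol_op { OP_NORMAL, OP_CLONE, OP_RESTORE, OP_SALVAGE };

/* Returns nonzero if a link-count change should be fsync()ed right away. */
int linktable_needs_sync(enum vol_op op, int salvager_handles_zero)
{
    switch (op) {
    case OP_CLONE:    /* only increments existing nonzero counts */
    case OP_RESTORE:  /* a crash mid-restore is a problem regardless */
    case OP_SALVAGE:  /* the salvager itself is doing the updates */
        return 0;
    case OP_NORMAL:
    default:
        /* Once the salvager copes with zero counts, defer to the sync
         * done when the volume goes offline. */
        return !salvager_handles_zero;
    }
}
```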
-- Jeff