[OpenAFS] Odd ubik (?) synchronization problem

Russ Allbery rra@stanford.edu
Mon, 26 Apr 2004 15:11:46 -0700

Marcus Watts <mdw@umich.edu> writes:

> The udebug output you posted has a db version of .56, after being
> labelled 136 seconds ago, so that's almost one change every 2 seconds,
> so you definitely have the series of writes.

We're in the middle of doing mass volume moves right now, so yeah, there
are a lot of writes.  It's nothing we haven't done before, though; we do
large volume migrations like this pretty routinely.
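Marcus's write-rate estimate can be reproduced with a couple of lines of Python. This is a sketch under one assumption that is labelled in the code: the number after the dot in the ubik db version is a counter that restarts near zero when the database is labelled, so counter divided by label age gives the mean gap between writes. The helper name mean_write_interval is hypothetical.

```python
# Hypothetical helper: mean seconds between VLDB write transactions.
# ASSUMPTION: the ubik version counter (the ".56" part) started near zero
# when the database was labelled, so counter / age approximates the rate.
def mean_write_interval(counter: int, seconds_since_label: float) -> float:
    if counter <= 0:
        raise ValueError("counter must be positive")
    return seconds_since_label / counter


# Figures from the udebug output quoted above: version .56, labelled 136 s ago.
interval = mean_write_interval(56, 136)
rate = 1 / interval
print(f"{interval:.2f} s between writes, {rate:.2f} writes/s")
# prints "2.43 s between writes, 0.41 writes/s"
```

So "almost one change every 2 seconds" works out to roughly one write every 2.4 seconds during the volume moves.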

> what version software are you running?

Right now, OpenAFS 1.2.6 with a patch for the time wraparound problem.
I'm planning on upgrading to the latest 1.2.x series, but hadn't found the

> how big is your vldb?

afsdb1:/usr/afs/db# ls -l vldb.DB*
-rw-------    1 root     root      8704064 Apr 26 15:04 vldb.DB0
-rw-------    1 root     root           64 Apr 26 15:04 vldb.DBSYS1

> how fast are your machines?

They're Netra T1s or something in that range, so they're not particularly
zippy, but they're not ancient either.  We've never had any performance
issues on VLDB servers in the past.

> Have you run out of disk bandwidth or memory?

Nope, the systems are 95% idle and there's 560MB of physical memory free.

> are there any interesting messages in stderr from vlserver?

The master is reporting ubik servers coming back up pretty regularly.  The
slaves are logging messages like:

Mon Apr 26 15:05:10 2004 Ubik: Synchronize database with server
Mon Apr 26 15:05:12 2004 Ubik: Synchronize database completed

which is new as of last night (before that, it synchronized silently).
There's nothing in BosLog (as I recall, that's where stderr goes, right?).
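The two slave log lines, together with the vldb.DB0 size from the ls output, give a rough figure for how expensive each resync is. The Python sketch below parses the ctime-style timestamps and divides the file size by the elapsed time. It assumes that each "Synchronize database" event copies the whole vldb.DB0 file from the sync site. The log timestamps have only one-second resolution, so treat the result as an order-of-magnitude figure.

```python
from datetime import datetime

# The two slave log lines quoted above.
START = "Mon Apr 26 15:05:10 2004 Ubik: Synchronize database with server"
END = "Mon Apr 26 15:05:12 2004 Ubik: Synchronize database completed"

# Size of vldb.DB0 in bytes, from the ls -l listing above.
DB_BYTES = 8704064


def log_time(line: str) -> datetime:
    # The timestamp is the first 24 characters, in ctime() format.
    return datetime.strptime(line[:24], "%a %b %d %H:%M:%S %Y")


elapsed = (log_time(END) - log_time(START)).total_seconds()
# ASSUMPTION: the whole database file is transferred during the sync.
throughput = DB_BYTES / elapsed
print(f"sync took ~{elapsed:.0f} s, ~{throughput / 1e6:.1f} MB/s")
# prints "sync took ~2 s, ~4.4 MB/s"
```

With a write landing roughly every 2.4 seconds, a full resync that takes about 2 seconds leaves little slack before the next write arrives.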

> is there anything happening right now that has resulted in
> 	a temporary increase in your load or decrease in server speed?

Not that I can think of, although I believe we are fighting off a Windows
virus infestation.  I have a hard time imagining that it's causing enough
network flooding to be a problem when we're not noticing slowness from any
other application (not even AFS file service), but maybe....

