[OpenAFS-devel] stability problems, and interesting symptoms...

Neulinger, Nathan nneul@umr.edu
Wed, 30 May 2001 10:46:23 -0500


I've got two problems and one interesting symptom, though the symptom is
probably unrelated to the first problem.

First, on a couple of my servers (this started about a month ago, with no
apparent changes to server hardware or software): if I start moving volumes
off the server en masse, one at a time, one after another, then at some
point in the process, after 50-100 volumes have been moved, I get a
volserver error complaining about being unable to attach a volume. Once
that happens, any listvol or other volserver activity against the server
fails from then on. Usually bos status indicates that vol exited with
signal 6, although not necessarily immediately (I haven't seen that with
openafs yet, but that was typically what I saw with 3.6-2.3). I have no
error messages from the volserver other than this, and basically no other
indication that anything is wrong.

I get the error both with transarc 3.6-2.3 and openafs-cvs. 

The syslog looks like this:
----
(lots and lots of stuff like the next few lines for the other volumes that
moved ok.)
May 30 10:30:18 afs4 fileserver[511]: fssync: volume 537013509 moved to
63019783; breaking all call backs 
May 30 10:30:18 afs4 volserver[483]: 1 Volser: Delete: volume 537013509
deleted  
May 30 10:30:18 afs4 volserver[483]: 1 Volser: Delete: volume 537013511
deleted  
May 30 10:30:18 afs4 volserver[483]: 1 Volser: Delete: volume 537020173
deleted  
May 30 10:30:20 afs4 volserver[483]: 1 Volser: Clone: Cloning volume
536897629 to new volume 537020174 
May 30 10:30:20 afs4 fileserver[511]: fssync: volume 536897629 moved to
63019783; breaking all call backs 
May 30 10:30:20 afs4 volserver[483]: 1 Volser: Delete: volume 536897629
deleted  
May 30 10:30:20 afs4 volserver[483]: 1 Volser: Delete: volume 536897631
deleted  
May 30 10:30:22 afs4 volserver[483]: 1 Volser: Delete: volume 537020174
deleted  
May 30 10:30:23 afs4 volserver[483]: VAttachVolume: Error attaching volume
/vicepd/V0536906941.vol; volume needs salvage 
May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach
volume 536906941 
May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach
volume 536985904 
May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach
volume 536889228 
May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach
volume 536924071 
May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach
volume 536896750 
May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach
volume 536897341 
May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach
volume 536983233 
May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach
volume 536906834 
(tons of that, one for every volume on the server, and it happens again if
you do a vos listvol against the server.)
-----
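
For anyone trying to reproduce this, a small script can pull the useful
numbers out of a log like the one above: how many volumes moved cleanly
before the first "needs salvage" error, which volume file triggered it, and
which volumes ListVolumes then refused to attach. This is only a rough
sketch. It assumes each syslog entry sits on a single line (the excerpt
above is wrapped by the mailer), and the function name summarize and the
three regular expressions are my own, modeled on the messages quoted above.

```python
import re

# Patterns modeled on the messages in the excerpt above (assumed formats).
MOVED = re.compile(r"fssync: volume (\d+) moved to")
SALVAGE = re.compile(
    r"VAttachVolume: Error attaching volume (\S+); volume needs salvage"
)
NO_ATTACH = re.compile(r"ListVolumes: Could not attach volume (\d+)")


def summarize(lines):
    """Return (moves_before_failure, first_bad_path, unattachable_ids).

    moves_before_failure counts fssync "moved to" entries seen before the
    first VAttachVolume salvage error; first_bad_path is None if no such
    error appears in the input.
    """
    moved = 0
    first_bad = None
    unattachable = []
    for line in lines:
        if first_bad is None:
            m = SALVAGE.search(line)
            if m:
                first_bad = m.group(1)
                continue
            if MOVED.search(line):
                moved += 1
        m = NO_ATTACH.search(line)
        if m:
            unattachable.append(m.group(1))
    return moved, first_bad, unattachable


# Unwrapped copies of a few entries from the excerpt above.
sample = [
    "May 30 10:30:18 afs4 fileserver[511]: fssync: volume 537013509 moved to 63019783; breaking all call backs",
    "May 30 10:30:20 afs4 fileserver[511]: fssync: volume 536897629 moved to 63019783; breaking all call backs",
    "May 30 10:30:23 afs4 volserver[483]: VAttachVolume: Error attaching volume /vicepd/V0536906941.vol; volume needs salvage",
    "May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536906941",
    "May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536985904",
]

moved, first_bad, unattachable = summarize(sample)
print(moved, first_bad, len(unattachable))
```

Feeding it the full log from a failing run should show whether the failure
always lands after roughly the same number of moves.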

The other symptom: when clearing off a server, I happened to notice that
the volserver seemed to hang (not responding to any new client requests,
such as vos partinfo) if I started a vos release against it. Once the vos
release (in particular the ForwardMulti) completed, the volserver responded
again. I'm not talking about a huge volume, maybe 5-10 megs with a few
thousand files in it.
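
One way to put numbers on the hang is to time a cheap request repeatedly
while the release is running. The sketch below is just that, a sketch: the
helper names time_command and probe are made up for illustration, and the
command to run is passed in as an argument list, so nothing here depends on
a particular vos invocation.

```python
import subprocess
import time


def time_command(argv, timeout=30.0):
    """Run argv once; return (elapsed_seconds, exit_code).

    exit_code is None if the command did not finish within timeout.
    """
    start = time.monotonic()
    try:
        proc = subprocess.run(
            argv,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
        code = proc.returncode
    except subprocess.TimeoutExpired:
        code = None
    return time.monotonic() - start, code


def probe(argv, rounds=10, interval=1.0, timeout=30.0):
    """Time argv `rounds` times, sleeping `interval` seconds between runs."""
    results = []
    for i in range(rounds):
        results.append(time_command(argv, timeout))
        if i + 1 < rounds:
            time.sleep(interval)
    return results
```

Calling probe(["vos", "partinfo", "afs4"]) in one window while the vos
release runs in another (afs4 being the server from the log above) should
show the elapsed times jump, or time out, for the duration of the release.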

I'm running volserver with no options in both cases. 

-- Nathan

------------------------------------------------------------
Nathan Neulinger                       EMail:  nneul@umr.edu
University of Missouri - Rolla         Phone: (573) 341-4841
Computing Services                       Fax: (573) 341-4216