[OpenAFS] volserver restarting after issuing vos commands

Mike Burns burns@psu.edu
Mon, 15 Dec 2003 10:09:52 -0500 (EST)


I'm having a problem with vos commands failing and causing the volserver to
restart.  The volserver process restarts after the first or second vos
command (e.g. vos create or vos release) issued immediately after
restarting the fs subsystem or all processes on a server.  This happens on
both sun4x_59 file servers I'm testing against.  Any subsequent vos commands
work fine.  Included below are the details of a couple of such restarts.
Any help in resolving the problem would be appreciated.

OpenAFS Version: 1.2.10
systype: sun4x_59
OS:      Solaris 9 08/03
Using the binaries from openafs.org.

sun1# bos restart sun1.et-test.psu.edu -instance fs
sun1# vos release users.a
Failed to create the ro volume: : I/O error
The volume 536870944 could not be released to the following 1 sites:
		       sun2.et-test.psu.edu /vicepb
VOLSER: release could not be completed
Error in vos release command.
VOLSER: release could not be completed

sun1's VolserLog.old shows that this command caused the volserver to restart.

sun1# cat /usr/afs/logs/VolserLog.old
Mon Dec 15 09:02:07 2003 Starting AFS Volserver 2.0 (/usr/afs/bin/volserver)
Mon Dec 15 09:02:36 2003 VAttachVolume: attach of volume 536870945 apparently denied by file server
FSYNC_askfs: No response from file server
Mon Dec 15 09:02:36 2003 1 Volser: ListVolumes: Could not attach volume 536870945 (/vicepb:V0536870945.vol), error=103

sun1# vos release users.a
Released volume users.a successfully
sun1# cat /usr/afs/logs/VolserLog
Mon Dec 15 09:02:36 2003 Starting AFS Volserver 2.0 (/usr/afs/bin/volserver)
Mon Dec 15 09:02:38 2003 1 Volser: Clone: Recloning volume 536870944 to volume 536870945

sun1# cat /usr/afs/logs/BosLog
Sun Dec 14 04:01:00 2003: Server directory access is okay
Mon Dec 15 09:02:07 2003: fs:vol exited on signal 15
Mon Dec 15 09:02:07 2003: fs:file exited with code 0
Mon Dec 15 09:02:36 2003: fs:vol exited on signal 13
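
For reference, the signal numbers in the BosLog entries above can be decoded
with a short script.  This is only a sketch: the regular expression is a guess
at the BosLog line format based on the lines shown here, and the names assume
the conventional numbering (13 is SIGPIPE, 15 is SIGTERM), which holds on
Solaris.

```python
import re

# Conventional signal numbers; only the two that appear in this BosLog.
SIGNAL_NAMES = {13: "SIGPIPE", 15: "SIGTERM"}

# Assumed line shape, inferred from the excerpt above:
#   "<timestamp>: <instance> exited on signal <N>"
EXIT_RE = re.compile(r"^(?P<when>.+?): (?P<proc>\S+) exited on signal (?P<sig>\d+)$")

def signal_exits(lines):
    """Return (timestamp, process, signal name) for each 'exited on signal' line."""
    found = []
    for line in lines:
        m = EXIT_RE.match(line.strip())
        if m:
            sig = int(m.group("sig"))
            name = SIGNAL_NAMES.get(sig, "signal %d" % sig)
            found.append((m.group("when"), m.group("proc"), name))
    return found

boslog = [
    "Sun Dec 14 04:01:00 2003: Server directory access is okay",
    "Mon Dec 15 09:02:07 2003: fs:vol exited on signal 15",
    "Mon Dec 15 09:02:07 2003: fs:file exited with code 0",
    "Mon Dec 15 09:02:36 2003: fs:vol exited on signal 13",
]

for when, proc, name in signal_exits(boslog):
    print(when, proc, name)
```

The signal 15 exit at 09:02:07 presumably corresponds to the bos restart
itself; the signal 13 exit at 09:02:36 is the unexpected one, and it matches
the time of the failed vos release.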


If I do the same set of commands on sun2.et-test.psu.edu I get a different
error message in VolserLog on sun2.

sun2# vos release users.b
Failed to create the ro volume: : I/O error
The volume 536870947 could not be released to the following 1 sites:
                       sun2.et-test.psu.edu /vicepb
VOLSER: release could not be completed
Error in vos release command.
VOLSER: release could not be completed
sun2#
sun2# cat /usr/afs/logs/VolserLog
Mon Dec 15 09:20:04 2003 Starting AFS Volserver 2.0 (/usr/afs/bin/volserver)
Mon Dec 15 09:21:59 2003 VAttachVolume: attach of volume 536870948 apparently denied by file server
FSYNC_askfs: No response from file server
Mon Dec 15 09:21:59 2003 VCreateVolume: Header file /vicepb/V0536870948.vol already exists!
Mon Dec 15 09:21:59 2003 1 Volser: CreateVolume: Unable to create the volume; aborted, error code 104
Mon Dec 15 09:21:59 2003 : Error 104

And the command works fine the second time I issue it.

sun2# vos release users.b
Released volume users.b successfully
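
Since the second attempt succeeds, a stopgap could be a small wrapper that
retries a vos command once after a pause.  This is a sketch only, and it works
around the restart rather than explaining it.  The function name, the
5-second delay, and the assumption that vos exits non-zero on failure are
illustrative and not taken from the OpenAFS documentation.

```python
import subprocess
import time

def run_with_retry(cmd, attempts=2, delay=5.0, run=subprocess.call):
    """Run cmd until it exits 0, at most `attempts` times; return the last exit code.

    `run` is injectable so the logic can be exercised without a real vos binary.
    """
    code = 1
    for attempt in range(attempts):
        code = run(cmd)
        if code == 0:
            break
        if attempt + 1 < attempts:
            # Assumed pause to let bosserver bring the volserver back up.
            time.sleep(delay)
    return code

# Example (not executed here):
#   run_with_retry(["vos", "release", "users.b"])
```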


And here are the details about a failed vos create on sun1.  This one failed on
the second vos create command, not the first.

sun1# bos restart sun1.et-test.psu.edu -all
sun1# date; vos create sun1.et-test.psu.edu /vicepb junkvol
Mon Dec 15 09:29:09 EST 2003
Volume 536961433 created on partition /vicepb of sun1.et-test.psu.edu

sun1# date; vos create sun1.et-test.psu.edu /vicepb junkvol2
Mon Dec 15 09:30:03 EST 2003
Failed to end the transaction on the volume junkvol2 536961436
: No such file or directory
Error in vos create command.
: No such file or directory

sun1# cat /usr/afs/logs/VolserLog.old
Mon Dec 15 09:28:08 2003 Starting AFS Volserver 2.0 (/usr/afs/bin/volserver)
Mon Dec 15 09:29:09 2003 1 Volser: CreateVolume: volume 536961433 (junkvol) created
Mon Dec 15 09:30:04 2003 1 Volser: CreateVolume: volume 536961436 (junkvol2) created
FSYNC_askfs: No response from file server

sun1# cat /usr/afs/logs/VolserLog
Mon Dec 15 09:30:04 2003 Starting AFS Volserver 2.0 (/usr/afs/bin/volserver)

sun1# cat /usr/afs/logs/BosLog
Sun Dec 14 04:01:00 2003: Server directory access is okay
Mon Dec 15 09:02:07 2003: fs:vol exited on signal 15
Mon Dec 15 09:02:07 2003: fs:file exited with code 0
Mon Dec 15 09:02:36 2003: fs:vol exited on signal 13
Mon Dec 15 09:28:08 2003: upserver exited on signal 15
Mon Dec 15 09:28:08 2003: vlserver exited on signal 15
Mon Dec 15 09:28:08 2003: ptserver exited on signal 15
Mon Dec 15 09:28:08 2003: fs:vol exited on signal 15
Mon Dec 15 09:28:08 2003: fs:file exited with code 0
Mon Dec 15 09:30:04 2003: fs:vol exited on signal 13
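
The timestamps in the two logs can be lined up mechanically to confirm that
each volserver start follows a signal 13 exit in the same second.  The sketch
below assumes both logs begin each dated line with a 24-character timestamp in
the format shown above; the function names are made up for this example.

```python
from datetime import datetime

STAMP = "%a %b %d %H:%M:%S %Y"               # e.g. "Mon Dec 15 09:30:04 2003"
STAMP_LEN = len("Mon Dec 15 09:30:04 2003")  # fixed-width leading timestamp

def stamp(line):
    """Parse the leading timestamp of a log line, or return None if absent."""
    try:
        return datetime.strptime(line[:STAMP_LEN], STAMP)
    except ValueError:
        return None

def restarts_after_crash(boslog, volserlog, needle="fs:vol exited on signal 13"):
    """Return volserver start times that coincide with a signal 13 exit."""
    crashes = {stamp(l) for l in boslog if needle in l}
    starts = [stamp(l) for l in volserlog if "Starting AFS Volserver" in l]
    return [t for t in starts if t is not None and t in crashes]

boslog = [
    "Mon Dec 15 09:28:08 2003: fs:vol exited on signal 15",
    "Mon Dec 15 09:28:08 2003: fs:file exited with code 0",
    "Mon Dec 15 09:30:04 2003: fs:vol exited on signal 13",
]
volserlog = [
    "Mon Dec 15 09:30:04 2003 Starting AFS Volserver 2.0 (/usr/afs/bin/volserver)",
]

for t in restarts_after_crash(boslog, volserlog):
    print("volserver restarted after signal 13 at", t)
```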

Thanks.

- Mike

--------------------------------------------------------------------------
Mike Burns                                     Emerging Technologies Group
burns@psu.edu                  Academic Services and Emerging Technologies
+1 814 863 5606                          The Pennsylvania State University