[OpenAFS-devel] Regarding volserver hangs in 1.2.X

Jeffrey Hutzelman jhutz@cmu.edu
Wed, 24 Aug 2005 16:45:35 -0400


On Wednesday, August 24, 2005 04:02:31 PM -0400 William Setzer 
<William_Setzer@ncsu.edu> wrote:

> /usr/afs/logs # pstack 975
> 975:    /usr/afs/bin/volserver
>  ff19e89c read     (3, 1e6663, 1)
>  0003c02c FSYNC_askfs (201a52c2, 1e68f8, 95400, 2, 201a52c2, 1) + 88
>  00038e50 VAttachVolumeByName_r (1e697c, 1e68f8, 4c, 2, 6570, 6) + 1f8
>  00038c48 VAttachVolumeByName (1e697c, 1e68f8, 1e68e4, 2, 0, 94c00) + 10
>  00026f7c XAttachVolume (1e697c, 201a52c2, 4, 2, ab000, 16) + 60
>  000284c8 VolTransCreate (3bc9d0, 201a52c2, 4, 2, 1e6aa0, c0400) + b0


A volserver blocked here is waiting for the fileserver to answer a 
request it sent via the fssync protocol.  The most common cause is 
_returning_ a volume to the fileserver, because in 1.2.x the 
fileserver won't respond to such a request until it has broken all of the 
callbacks.  That was indeed the problem described in the thread you 
mentioned, and it was resolved in the development branch.  If you are 
seeing that problem, you should consider upgrading one or more servers to 
1.4.0rc1, which should not be affected.
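
To illustrate the general pattern (this is only a rough sketch, not the 
actual fssync code or wire format), here is a small Python model of one 
side sending a request and then blocking in a one-byte read until the 
other side gets around to answering.  The names fake_fileserver and 
ask_and_wait, the byte values, and the delay are all made up for the 
example; the delay merely stands in for whatever work the responder does 
before it replies.

```python
import socket
import threading
import time


def fake_fileserver(conn, delay):
    """Hypothetical responder: read one request byte, delay, then reply."""
    conn.recv(1)            # wait for the request to arrive
    time.sleep(delay)       # stands in for work done before answering
    conn.sendall(b"\x01")   # one-byte response (made-up value)
    conn.close()


def ask_and_wait(delay):
    """Send a request, then block in a one-byte read until the reply comes."""
    client, server = socket.socketpair()
    worker = threading.Thread(target=fake_fileserver, args=(server, delay))
    worker.start()

    start = time.monotonic()
    client.sendall(b"\x00")  # made-up request byte
    reply = client.recv(1)   # the caller sits here until the peer answers
    elapsed = time.monotonic() - start

    worker.join()
    client.close()
    return reply, elapsed


if __name__ == "__main__":
    reply, elapsed = ask_and_wait(0.5)
    print(f"reply={reply!r} after {elapsed:.2f}s")
```

A stack dump taken while ask_and_wait is inside recv would show the 
caller parked in a read on the socket for as long as the responder 
delays, which is the same shape as the top frames in the trace above.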


However, the backtrace you posted actually points to a different problem, 
which is described in ticket #5615.  This problem happens to affect only 
newer Solaris systems, and is fixed in 1.2.12.

-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
   Sr. Research Systems Programmer
   School of Computer Science - Research Computing Facility
   Carnegie Mellon University - Pittsburgh, PA