[OpenAFS] Fileserver process hung on startup

John Morris openafs@butchwax.com
29 Mar 2004 01:44:22 -0600


Hi!  See what y'all can do with this.

OpenAFS 2.9.11, custom SMP kernel 2.4.23.

Three-fileserver cell. One fileserver, kug, suddenly stopped serving
files; clients see 'connection timed out'.

AFS server processes seem to be running normally as reported by bos
status.

  # bos status kug -long -local
  Instance ptserver, (type is simple) currently running normally.
      Process last started at Sun Mar 28 16:51:58 2004 (2 proc starts)
      Last exit at Sun Mar 28 16:51:55 2004
      Command 1 is '/usr/afs/bin/ptserver'

  Instance vlserver, (type is simple) currently running normally.
      Process last started at Sun Mar 28 16:51:58 2004 (2 proc starts)
      Last exit at Sun Mar 28 16:51:55 2004
      Command 1 is '/usr/afs/bin/vlserver'

  Instance fs, (type is fs) currently running normally.
      Auxiliary status is: file server running.
      Process last started at Mon Mar 29 01:29:36 2004 (11 proc starts)
      Last exit at Mon Mar 29 01:29:36 2004
      Last error exit at Mon Mar 29 01:29:36 2004, by vol, by exiting with code 1
      Command 1 is '/usr/afs/bin/fileserver'
      Command 2 is '/usr/afs/bin/volserver'
      Command 3 is '/usr/afs/bin/salvager'
  #

Nothing is listening on port 2040:

  # netstat -tl | grep 2040
  # 

With port 2040 not open, I get these errors:

   FSYNC_clientInit temporary failure (will retry): Connection refused
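
For anyone who wants to confirm the same symptom without relying on
netstat, here is a minimal sketch that simply attempts a TCP connect to
localhost:2040. The function name fsync_port_open and the two-second
timeout are my own choices for illustration, not anything from OpenAFS.

```python
import socket


def fsync_port_open(host="127.0.0.1", port=2040, timeout=2.0):
    """Return True if a TCP connect to host:port succeeds, False otherwise.

    A refused or timed-out connect raises OSError (ConnectionRefusedError
    and socket.timeout are both subclasses), which is reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    state = "open" if fsync_port_open() else "closed or unreachable"
    print(f"127.0.0.1:2040 is {state}")
```

On kug this should report the port as closed, matching the
"Connection refused" above; on a healthy fileserver it should presumably
report it as open.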

Any fs commands on kug's filesystems hang for a long time before timing
out.

strace on the fileserver process shows it in a seemingly hung state,
i.e. it makes no system calls at all until the process is killed.
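
A process that makes no system calls can be either blocked in the kernel
or spinning in user space, and the scheduler state helps tell the two
apart. The sketch below reads the State: line from /proc/<pid>/status on
Linux; the helper names parse_state and process_state are hypothetical,
and I have not run this against the hung fileserver itself.

```python
import sys


def parse_state(status_text):
    """Extract the one-letter state from the text of /proc/<pid>/status.

    The relevant line looks like "State:  S (sleeping)". Returns the
    letter (e.g. "R", "S", "D") or None if no State: line is present.
    """
    for line in status_text.splitlines():
        if line.startswith("State:"):
            fields = line.split()
            return fields[1] if len(fields) > 1 else None
    return None


def process_state(pid):
    """Read /proc/<pid>/status (Linux only) and return the state letter."""
    with open(f"/proc/{pid}/status") as f:
        return parse_state(f.read())


if __name__ == "__main__":
    # Usage: python procstate.py PID
    if len(sys.argv) == 2 and sys.argv[1].isdigit():
        print(process_state(int(sys.argv[1])))
    else:
        print("usage: procstate.py PID")
```

Roughly, R with no system calls would point at a busy loop in user
space, while S or D would point at the process waiting on something.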

Haven't noticed anything else funny about /vicepa; salvages complete
with no errors.

Volume DB is frozen as long as fileserver process is running; once
fileserver is killed, voldb comes back online.

Comparing lsof output, kug's fileserver process has much the same files
open as a normally running fileserver process on another machine, except
for localhost:2040 and, of course, the /vicepa files.

Restarts and reboots don't help.

That's all I can think of.  Any ideas?  Thanks for any suggestions!  My
home directory is on this fileserver, so help will be appreciated extra!

	John