[OpenAFS] Fileserver process hung on startup
John Morris <openafs@butchwax.com>
29 Mar 2004 20:41:07 -0600

# ping -c 1 127.0.0.1
PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 : 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.099 ms
--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% loss, time 0ms
rtt min/avg/max/mdev = 0.099/0.099/0.099/0.000 ms
# ifconfig lo
lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:109901 errors:0 dropped:0 overruns:0 frame:0
          TX packets:109901 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:10104040 (9.6 Mb)  TX bytes:10104040 (9.6 Mb)
#
Sure is.
Judging by the fileserver's complete unresponsiveness, and the fact
that it isn't even making system calls, my guess is that something is
hanging, maybe inside a system call.  Is there any documentation on
the '-d #' argument to the fileserver executable's command line?  What
debug level should I give it to get a clue about what it's doing?
Where else can I look for debugging information besides the sources I
listed in my first email (included below as a reminder)?
Thanks again!
	John
On Mon, 2004-03-29 at 17:08, Derrick J Brashear wrote:
> On Mon, 29 Mar 2004, John Morris wrote:
> 
> > Oops, 1.2.11.  :)
> >
> > I tried this, but no effect, the fileserver is still not listening on
> > 2040.  What should this have changed?
> 
> is the loopback interface up?
> 
> ifconfig lo
> 
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
Hi!  See what y'all can do with this.
OpenAFS 1.2.11 (corrected; I first wrote 2.9.11), custom SMP kernel 2.4.23.
Three-fileserver cell; one fileserver, kug, suddenly stopped serving
files; clients see 'connection timed out'.
AFS server processes seem to be running normally as reported by bos
status.
  # bos status kug -long -local
  Instance ptserver, (type is simple) currently running normally.
      Process last started at Sun Mar 28 16:51:58 2004 (2 proc starts)
      Last exit at Sun Mar 28 16:51:55 2004
      Command 1 is '/usr/afs/bin/ptserver'
  Instance vlserver, (type is simple) currently running normally.
      Process last started at Sun Mar 28 16:51:58 2004 (2 proc starts)
      Last exit at Sun Mar 28 16:51:55 2004
      Command 1 is '/usr/afs/bin/vlserver'
  Instance fs, (type is fs) currently running normally.
      Auxiliary status is: file server running.
      Process last started at Mon Mar 29 01:29:36 2004 (11 proc starts)
      Last exit at Mon Mar 29 01:29:36 2004
      Last error exit at Mon Mar 29 01:29:36 2004, by vol, by exiting with code 1
      Command 1 is '/usr/afs/bin/fileserver'
      Command 2 is '/usr/afs/bin/volserver'
      Command 3 is '/usr/afs/bin/salvager'
  #
Nothing is listening on port 2040:
  # netstat -tl | grep 2040
  # 
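One caveat on the check above: without -n, netstat prints service names
from /etc/services instead of port numbers, so if 2040 has a name there,
grepping for the number could miss a listening socket.  A minimal sketch
of a numeric check follows.  The helper name listening_on is made up for
illustration, and it assumes the usual Linux net-tools column layout
(local address in column 4, state in column 6).

```shell
# Hypothetical helper: reads `netstat -tln` output on stdin and succeeds
# (exit 0) only if some TCP socket is in LISTEN state on the given port.
listening_on() {
    awk -v port="$1" '
        $6 == "LISTEN" {
            n = split($4, parts, ":")      # port is the last ":"-separated piece
            if (parts[n] == port) found = 1
        }
        END { exit !found }'
}

# Usage (not run here):
#   netstat -tln | listening_on 2040 && echo "2040 open" || echo "2040 closed"
```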
I get these errors because 2040 isn't open:
   FSYNC_clientInit temporary failure (will retry): Connection refused
Any fs commands on kug's filesystems hang for a long time before timing
out.
strace on the fileserver process shows it in a seemingly hung state,
i.e. no system calls until the process is killed.
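To narrow down what "hung" means here, the process state the kernel
reports may help: on Linux, R with no system calls would point at a loop
in user space, while D means the process is blocked inside the kernel.
A small sketch, assuming the usual /proc/PID/status format with a
"State:" line; proc_state is a made-up helper name.

```shell
# Hypothetical helper: reads the text of /proc/PID/status on stdin and
# prints the one-letter process state (R, S, D, Z, T, ...).
proc_state() {
    awk '/^State:/ { print $2; exit }'
}

# Usage (not run here), with PID being the fileserver's process id:
#   proc_state < /proc/PID/status
```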
Haven't noticed anything else funny about /vicepa; salvages complete
with no errors.
Volume DB is frozen as long as fileserver process is running; once
fileserver is killed, voldb comes back online.
lsof shows that kug's fileserver process has much the same files open
as a normally running fileserver process on another machine, except for
localhost:2040 and, of course, the /vicepa files.
Restarts and reboots don't help.
That's all I can think of.  Any ideas?  Thanks for any suggestions!  My
home directory is on this fileserver, so help will be appreciated extra!
        John