[OpenAFS] Fileserver process hung on startup
John Morris
openafs@butchwax.com
29 Mar 2004 20:41:07 -0600
# ping -c 1 127.0.0.1
PING 127.0.0.1 (127.0.0.1) from 127.0.0.1 : 56(84) bytes of data.
64 bytes from 127.0.0.1: icmp_seq=1 ttl=64 time=0.099 ms
--- 127.0.0.1 ping statistics ---
1 packets transmitted, 1 received, 0% loss, time 0ms
rtt min/avg/max/mdev = 0.099/0.099/0.099/0.000 ms
# ifconfig lo
lo Link encap:Local Loopback
inet addr:127.0.0.1 Mask:255.0.0.0
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:109901 errors:0 dropped:0 overruns:0 frame:0
TX packets:109901 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:10104040 (9.6 Mb) TX bytes:10104040 (9.6 Mb)
#
Sure is.
Judging by the fileserver's complete unresponsiveness, and by the fact
that it's not even making system calls, my guess is that something is
hanging, maybe inside a system call. Is there any documentation on the
'-d #' argument to the fileserver executable's command line? What debug
level could I give it so that it would give me a clue what it's doing?
Where else can I look for debugging information besides the sources I
listed in my first email (included below as a reminder)?
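
One way to narrow this down: a process that makes no system calls is
either blocked inside one call or spinning in user space, and the
kernel's state letter for the process tells the two apart. A minimal
sketch, assuming a Linux /proc; proc_state is a name made up for this
example, not an OpenAFS tool:

```shell
# Print the kernel's one-letter state for a process, e.g.
# R (running), S (sleeping), D (uninterruptible sleep), T (stopped).
# proc_state is a hypothetical helper name.
proc_state() {
    pid="$1"
    # /proc/PID/status has a line like "State:  S (sleeping)";
    # the second whitespace-separated field is the state letter.
    awk '/^State:/ { print $2 }' "/proc/$pid/status"
}
```

For example, assuming a single fileserver process, running
proc_state "$(pidof fileserver)" and getting R would point to a busy
loop in user space, while S or D would point to a call that never
returns.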
Thanks again!
John
On Mon, 2004-03-29 at 17:08, Derrick J Brashear wrote:
> On Mon, 29 Mar 2004, John Morris wrote:
>
> > Oops, 1.2.11. :)
> >
> > I tried this, but no effect, the fileserver is still not listening on
> > 2040. What should this have changed?
>
> is the loopback interface up?
>
> ifconfig lo
>
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
Hi! See what y'all can do with this.
OpenAFS 2.9.11, custom SMP kernel 2.4.23.
Three-fileserver cell; one fileserver, kug, suddenly stopped serving
files; clients see 'connection timed out'.
AFS server processes seem to be running normally as reported by bos
status.
# bos status kug -long -local
Instance ptserver, (type is simple) currently running normally.
Process last started at Sun Mar 28 16:51:58 2004 (2 proc starts)
Last exit at Sun Mar 28 16:51:55 2004
Command 1 is '/usr/afs/bin/ptserver'
Instance vlserver, (type is simple) currently running normally.
Process last started at Sun Mar 28 16:51:58 2004 (2 proc starts)
Last exit at Sun Mar 28 16:51:55 2004
Command 1 is '/usr/afs/bin/vlserver'
Instance fs, (type is fs) currently running normally.
Auxiliary status is: file server running.
Process last started at Mon Mar 29 01:29:36 2004 (11 proc starts)
Last exit at Mon Mar 29 01:29:36 2004
Last error exit at Mon Mar 29 01:29:36 2004, by vol, by exiting
with code 1
Command 1 is '/usr/afs/bin/fileserver'
Command 2 is '/usr/afs/bin/volserver'
Command 3 is '/usr/afs/bin/salvager'
#
Nothing is listening on port 2040:
# netstat -tl | grep 2040
#
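
One caveat with the check above: without -n, netstat prints service
names from /etc/services, so if that file has an entry for port 2040, a
grep for the number would miss a listening socket. A minimal sketch of
a numeric check, assuming net-tools style `netstat -tln` output (local
address in column 4, state in column 6); port_listening is a name made
up for this example:

```shell
# Read `netstat -tln` style output on stdin and succeed if some socket
# in LISTEN state has the given local port.
# port_listening is a hypothetical helper name.
port_listening() {
    port="$1"
    awk -v p="$port" '
        $6 == "LISTEN" {
            # The port is whatever follows the last ":" in the local address.
            n = split($4, a, ":")
            if (a[n] == p) found = 1
        }
        END { exit (found ? 0 : 1) }
    '
}
```

Usage would be along the lines of: netstat -tln | port_listening 2040
&& echo listening. Matching on the port column rather than grepping the
whole line also avoids false hits on other fields that happen to
contain "2040".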
With 2040 not open, I get these errors:
FSYNC_clientInit temporary failure (will retry): Connection refused
Any fs commands on kug's filesystems hang for a long time before timing
out.
strace on the fileserver process shows it in a seemingly hung state,
i.e. no system calls until the process is killed.
Haven't noticed anything else funny about /vicepa; salvages complete
with no errors.
Volume DB is frozen as long as fileserver process is running; once
fileserver is killed, voldb comes back online.
lsof shows that kug's fileserver process has much the same files open
as another, normally running fileserver process, except for
localhost:2040 and, of course, the /vicepa files.
Restarts and reboots don't help.
That's all I can think of. Any ideas? Thanks for any suggestions! My
home directory is on this fileserver, so help will be appreciated extra!
John