[OpenAFS] Fileserver process hung on startup
John Morris
openafs@butchwax.com
30 Mar 2004 00:34:12 -0600
Cool, got the strace. After the expected loading of .so files and
config files and such, we see it contact the vldbs (port 7003) of the
other two servers, doing an 'rt_sigsuspend' after each one. The first
is interrupted by SIGRTMIN and returns; the second never returns, and
no SIGRTMIN shows up before it. Is this a thread locking issue?
If it's relevant, this is openafs 1.2.11 built from a nearly-stock SRPM
on this machine, which runs RH8 with custom kernel and stock RH glibc
2.2.93.
Sorry for the line-wrapping, I'm not in my preferred environment right
now....
sendmsg(7, {msg_name(16)={sin_family=AF_INET, sin_port=htons(7003),
sin_addr=inet_addr("67.67.198.201")}},\
msg_iov(2)=[{"\257\356\r\2001\177\254\274\0\0\0\1\0\0\0\1\0\0\0\1\1\5"..., 28}, {"\0\0\2\24\0kK\356\0\0\3\34\241\0\0\37&\377\377\377\276\0"..., 64}], msg_controllen=0, msg_flags=0}, 0) = 92
time(NULL) = 1080626914
kill(5685, SIGRTMIN) = 0
kill(5685, SIGRTMIN) = 0
time(NULL) = 1080626914
rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
rt_sigsuspend([] <unfinished ...>
--- SIGRTMIN (Real-time signal 0) ---
<... rt_sigsuspend resumed> ) = -1 EINTR (Interrupted system call)
sigreturn() = ? (mask now [RTMIN])
gettimeofday({1080626914, 364860}, NULL) = 0
gettimeofday({1080626914, 365028}, NULL) = 0
time(NULL) = 1080626914
gettimeofday({1080626914, 365310}, NULL) = 0
gettimeofday({1080626914, 365488}, NULL) = 0
sendmsg(7, {msg_name(16)={sin_family=AF_INET, sin_port=htons(7003),
sin_addr=inet_addr("66.227.12.105")}},\
msg_iov(2)=[{"\257\356\r\2001\177\254\270\0\0\0\1\0\0\0\1\0\0\0\1\1\5"..., 28}, {"\0\0\2\24\0kK\356\0\0\3\34\241\0\0\37&\377\377\377\276\0"..., 64}], msg_controllen=0, msg_flags=0}, 0) = 92
time(NULL) = 1080626914
time(NULL) = 1080626914
rt_sigprocmask(SIG_SETMASK, NULL, [RTMIN], 8) = 0
rt_sigsuspend([]
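
For anyone who wants to check a saved trace for this pattern, here is a rough Python sketch that lists the calls which started but never show a return value. It assumes the trace is from a single process and was written with one call per line (for example via strace's -o option), not wrapped as in this mail. The function name pending_calls is made up for illustration.

```python
import re

# A line that begins a system call, e.g. "rt_sigsuspend([] <unfinished ...>"
CALL_START = re.compile(r"^(\w+)\(")
# A line that completes an interrupted call, e.g. "<... rt_sigsuspend resumed> ) = -1 EINTR"
RESUMED = re.compile(r"^<\.\.\. (\w+) resumed>")
# A completed call ends with ") = <result>"
HAS_RESULT = re.compile(r"\)\s+= \S")


def pending_calls(lines):
    """Return names of calls that were started but never returned.

    Assumes one call per line and a single traced process (hypothetical
    helper, not part of strace or OpenAFS).
    """
    pending = []
    for raw in lines:
        line = raw.strip()
        resumed = RESUMED.match(line)
        if resumed:
            name = resumed.group(1)
            if name in pending:
                pending.remove(name)  # the earlier unfinished call has now returned
            continue
        started = CALL_START.match(line)
        if started and not HAS_RESULT.search(line):
            pending.append(started.group(1))
    return pending
```

Run against the tail of the trace above (with the wrapped lines rejoined), this should leave only the final rt_sigsuspend as pending.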
As for the environment, LD_ASSUME_KERNEL is set in the fileserver's
environment (LD_ASSUME_KERNEL=2.4.1 in the ps output below):
root 5681 0.0 0.6 17324 3544 ? S< 00:08 0:00
/usr/afs/bin/fileserver HOSTNAME=kugioga.butchwax.com TERM=dumb
SHELL=/bin/bash HISTSIZE=1000 SSH_CLIENT=192.168.3.101 48003 22
SSH_TTY=/dev/pts/2 EMACS=t USER=root LS_COLORS= TERMCAP= USERNAME=root
COLUMNS=80
PATH=/usr/kerberos/sbin:/usr/kerberos/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/usr/X11R6/bin:/root/bin MAIL=/var/spool/mail/root LC_COLLATE=C PWD=/usr/afs/logs INPUTRC=/etc/inputrc LANG=en_US SSH_ASKPASS=/usr/libexec/openssh/gnome-ssh-askpass HOME=/root SHLVL=3 LD_ASSUME_KERNEL=2.4.1 BASH_ENV=/root/.bashrc LOGNAME=root LESSOPEN=|/usr/bin/lesspipe.sh %s DISPLAY=localhost:10.0 G_BROKEN_FILENAMES=1 _=/usr/bin/strace
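
The ps output runs everything together, and values containing spaces (SSH_CLIENT here) make it hard to see where one variable ends. A less ambiguous check, sketched below in Python, is to read the NUL-separated /proc/<pid>/environ file that Linux provides for each process. The function names are made up for illustration, and reading another process's environ normally requires being root or the process owner.

```python
def parse_environ(raw: bytes) -> dict:
    """Split the NUL-separated contents of /proc/<pid>/environ into a dict."""
    env = {}
    for entry in raw.split(b"\0"):
        if not entry:
            continue  # skip the empty piece after the trailing NUL
        name, sep, value = entry.partition(b"=")
        if sep:  # ignore anything that is not NAME=value
            env[name.decode(errors="replace")] = value.decode(errors="replace")
    return env


def read_process_environ(pid: int) -> dict:
    """Read the initial environment of a running process (Linux only)."""
    with open(f"/proc/{pid}/environ", "rb") as fh:
        return parse_environ(fh.read())
```

For the process above, read_process_environ(5681).get("LD_ASSUME_KERNEL") would be the value to look at.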
Thanks! Think we're getting closer?
John
On Mon, 2004-03-29 at 21:30, Derrick J Brashear wrote:
> On Mon, 29 Mar 2004, John Morris wrote:
>
> > What I'm guessing, judging by the fileserver's complete
> > unresponsiveness and the fact that it's not even making system calls,
> > is that something is hanging up, maybe a system call? Is there any
> > documentation on the '-d #' argument to the fileserver executable's
> > commandline? What debug level could I give it so it would give me a
> > clue what it's doing? Where else can I look for debugging information
> > besides the sources I listed in my first email (included below for a
> > reminder)?
>
> it won't really help. the debug messages probably won't give you a hint
> where it's hanging.
>
> attach before it hangs; better yet wrap fileserver itself with a shell
> script which straces to a file.
>
> I really like the LD_ASSUME_KERNEL idea; run
>   ps auxwwe | grep fileserver | grep -v grep
> and see if the LD_ASSUME_KERNEL variable is in the fileserver's
> environment.