[OpenAFS] backup->buserver 1.4.2 hangs/loops on gettimeofday

John W. Sopko Jr. sopko@cs.unc.edu
Tue, 14 Nov 2006 09:37:40 -0500


I just upgraded from 1.4.1 to 1.4.2 on our 3 RHEL 3 servers
and ran into a problem where the /usr/sbin/backup command
hangs. I tried the 1.4.1 backup command and saw the same problem.

I ran "strace /usr/sbin/backup". The backup command is
looping on the gettimeofday call. Below is a snippet of
the strace. A full strace is at:

http://www.cs.unc.edu/~sopko/backup/strace.txt

The backup command tries to contact all
3 of our backup servers, 152.2.128.3, .4, and .7, at
the buserver port 7021, and continues to loop.
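
For anyone who wants to check the full strace for the same pattern, here
is a rough Python sketch that tallies syscalls and the peers seen in
sendmsg/recvmsg. The function name summarize_strace and the file name
strace.txt are just placeholders, and it assumes plain strace text like
the snippet in this message (including the line wrapping); it is not
part of OpenAFS.

```python
import re
from collections import Counter

# A syscall line starts with "name(". The lookahead skips wrapped
# continuation lines such as "msg_iov(7)=[" that only look like calls.
SYSCALL_RE = re.compile(r"^(\w+)\((?!\d+\)=)")
PEER_RE = re.compile(r'inet_addr\("([0-9.]+)"\)')


def summarize_strace(text):
    """Return (syscall counts, (syscall, peer address) counts)."""
    calls = Counter()
    peers = Counter()
    current = None  # most recent syscall seen, for wrapped lines
    for line in text.splitlines():
        match = SYSCALL_RE.match(line)
        if match:
            current = match.group(1)
            calls[current] += 1
        for addr in PEER_RE.findall(line):
            peers[(current, addr)] += 1
    return calls, peers


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python summarize.py strace.txt
    with open(sys.argv[1]) as handle:
        calls, peers = summarize_strace(handle.read())
    for name, count in calls.most_common():
        print(name, count)
    for (name, addr), count in sorted(peers.items()):
        print(name, addr, count)
```

A capture made with "strace -o strace.txt /usr/sbin/backup" could be fed
to it. A count of gettimeofday and time calls far above the
sendmsg/recvmsg count would be consistent with the loop described here.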

We run NTP servers and these look fine; the AFS
servers can run ntpdate against the NTP servers.

While upgrading I also did a minor Red Hat kernel upgrade
from 2.4.21-47.0.ELsmp to 2.4.21-47.0.1.ELsmp.

I will put the 1.4.1 buserver in place and see what
happens.


gettimeofday({1163514403, 109895}, NULL) = 0
select(4, [3], NULL, NULL, {0, 998408}) = 1 (in [3], left {1, 0})
recvmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(7021), 
sin_addr=inet_addr("152.2.128.4")}, 
msg_iov(7)=[{"\202\24J\257\351[\26\4\0\0\0\1\0\0\0\0\0\0\0\6\4\0\0\2"..., 28}, 
{"\0\0\25\0\0\0\0\2\0\0\0\1\0\0\0\5\7\0\6\323\0\0\0\5\244"..., 1416}, 
{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1416}, 
{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1416}, 
{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1416}, 
{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1416}, 
{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1420}], 
msg_controllen=0, msg_flags=0}, 0) = 32
time(NULL)                              = 1163514403
gettimeofday({1163514403, 110429}, NULL) = 0
gettimeofday({1163514403, 110488}, NULL) = 0
time(NULL)                              = 1163514403
gettimeofday({1163514403, 110595}, NULL) = 0
gettimeofday({1163514403, 110655}, NULL) = 0
sendmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(7021), 
sin_addr=inet_addr("152.2.128.7")}, 
msg_iov(2)=[{"\202\24J\257\351[\26\10\0\0\0\2\0\0\0\1\0\0\0\5\1\5\0\2"..., 
28}, {"\0\0\0\25", 4}], msg_controllen=0, msg_flags=0}, 0) = 32
time(NULL)                              = 1163514403
time(NULL)                              = 1163514403
gettimeofday({1163514403, 111005}, NULL) = 0
time(NULL)                              = 1163514403
gettimeofday({1163514403, 111120}, NULL) = 0
select(4, [3], NULL, NULL, {0, 998365}) = 1 (in [3], left {0, 900000})
recvmsg(3, {msg_name(16)={sa_family=AF_INET, sin_port=htons(7021), 
sin_addr=inet_addr("152.2.128.7")}, 
msg_iov(7)=[{"\202\24J\257\351[\26\10\0\0\0\2\0\0\0\0\0\0\0\6\2 \0\2"..., 28}, 
{"\0\0\0\0\0\0\0\2\0\0\0\1\0\0\0\0\10\0\0\0\0\0\0\5\244\0"..., 1416}, 
{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1416}, 
{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1416}, 
{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1416}, 
{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1416}, 
{"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1420}], 
msg_controllen=0, msg_flags=0}, 0) = 65
time(NULL)                              = 1163514403

-- 
John W. Sopko Jr.               University of North Carolina
email: sopko AT cs.unc.edu      Computer Science Dept., CB 3175
Phone: 919-962-1844             Sitterson Hall; Room 044
Fax:   919-962-1799             Chapel Hill, NC 27599-3175