[OpenAFS] 'vos' command does not finish, file service works ok (sort of)
Thu, 24 Jul 2008 03:00:41 +0200
Jeffrey Altman <firstname.lastname@example.org> writes:
> The Volserver is trying to establish a connection with the Fileserver
> and it can't. As a result after five retries it exits with an assertion
> What is the state of the File Server? (FileLog)
Thu Jul 24 01:00:26 2008 File server starting
Thu Jul 24 01:00:26 2008 afs_krb_get_lrealm failed, using itp.tugraz.at.
Thu Jul 24 02:25:27 2008 Set Debug On level = 1
Thu Jul 24 02:25:53 2008  Set Debug On level = 5
After a few restarts the verbose mode is of course no longer active and the
log file has been moved away. I have now reactivated more verbose logging, but
besides the message of increased logging as above I cannot remember anything
Dump again; unlike my latest attempts, there is now some reaction on the command line:
root@faeppc18:~# "vos dump -id user.zuzi -file /tmp/backup -localauth -verbose"
Full Dump ...
Starting transaction on volume 536872626...
And that's all after 10 minutes; in my latest full archival backup this dump
file is about 300 MB. The log files have not changed much; in particular, FileLog
stays the same:
> What version of OpenAFS are you using?
1.4.7.dfsg1-1, a backport of Sam Hartman's Debian packages (unstable) to
Debian stable; the kernel is a custom Linux 220.127.116.11.
There are 3 DB servers and 3 file servers, all running this version. This is
the only machine acting as both a file and a DB server. The DB server binds to a
virtual ethernet socket provided by fake (ARP poisoning). This has worked with very
few problems for about a year now, but I had to reboot because we had a
scheduled power outage last Friday and yesterday (Wednesday).
Another file server (on big UPS, so not rebooted, but also running 1.4.7) is much more verbose after
Sun Jul 20 04:00:11 2008 File server starting
Sun Jul 20 04:00:11 2008 afs_krb_get_lrealm failed, using itp.tugraz.at.
Sun Jul 20 04:00:49 2008 VL_RegisterAddrs rpc failed; will retry periodically (code=5377, err=0)
Sun Jul 20 04:00:49 2008 Set thread id 11 for FSYNC_sync
Sun Jul 20 04:00:49 2008 FSYNC_sync: bind failed with (98), removed bogus /var/lib/openafs/local/fssync.sock
Sun Jul 20 04:00:49 2008 Partition /vicepa: attaching volumes
Sun Jul 20 04:01:15 2008 Partition /vicepa: attached 362 volumes; 0 volumes not attached
Sun Jul 20 04:01:15 2008 Getting FileServer name...
Sun Jul 20 04:01:15 2008 FileServer host name is 'faepsv07'
Sun Jul 20 04:01:15 2008 Getting FileServer address...
Sun Jul 20 04:01:15 2008 FileServer faepsv07 has address 18.104.22.168 (0x6fa11b81 or 0x811ba16f in host byte order)
Sun Jul 20 04:01:15 2008 File Server started Sun Jul 20 04:01:15 2008
Sun Jul 20 04:01:15 2008 Set thread id 15 for 'FiveMinuteCheckLWP'
Sun Jul 20 04:01:15 2008 Set thread id 16 for 'HostCheckLWP'
Sun Jul 20 04:01:15 2008 Set thread id 17 for 'FsyncCheckLWP'
Sun Jul 20 20:04:29 2008 CB: ProbeUuid for 22.214.171.124:51209 failed -01
Sun Jul 20 20:08:56 2008 CB: ProbeUuid for 126.96.36.199:51227 failed -01
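The "has address" line in the log above prints two hex values for the file server address. They appear to be the same 32-bit value in the two possible byte orders. A minimal sketch for checking that and for rendering each value as a dotted quad follows; the helper names swap32 and dotted_quad are made up for this illustration and are not part of OpenAFS.

```python
import socket
import struct


def swap32(value: int) -> int:
    """Reverse the byte order of a 32-bit unsigned integer."""
    return struct.unpack("<I", struct.pack(">I", value))[0]


def dotted_quad(value: int) -> str:
    """Render a 32-bit integer as dotted IPv4, most significant byte first."""
    return socket.inet_ntoa(struct.pack(">I", value))


# The two hex values copied from the FileLog line above.
first = 0x6FA11B81
second = 0x811BA16F

# True if the two values differ only in byte order.
print(swap32(first) == second)

# Dotted-quad rendering of each value, to compare against the address
# the server is expected to register.
print(dotted_quad(first))
print(dotted_quad(second))
```

Only one of the two renderings is the real address; the other is the same bytes read backwards.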
I now really suspect those problems stem from the file server and db server
listening on different IP addresses on the same machine.
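
One way to test this suspicion is to pull the address the file server reports out of FileLog and compare it with the address the DB server is bound to. The sketch below assumes the log line formats shown in this mail; the function names and the DB server address (192.0.2.10, a documentation-only address) are hypothetical.

```python
import re

# Patterns modelled on the FileLog lines quoted in this mail.
ADDRESS_RE = re.compile(r"FileServer (\S+) has address (\d+\.\d+\.\d+\.\d+)")
FAILURE_RE = re.compile(r"VL_RegisterAddrs rpc failed|ProbeUuid for \S+ failed")


def summarize_filelog(text: str) -> dict:
    """Collect the reported host/address and any failure lines from FileLog text."""
    summary = {"host": None, "address": None, "failures": []}
    for line in text.splitlines():
        match = ADDRESS_RE.search(line)
        if match:
            summary["host"], summary["address"] = match.groups()
        elif FAILURE_RE.search(line):
            summary["failures"].append(line.strip())
    return summary


def matches_db_address(summary: dict, db_address: str) -> bool:
    """True if the file server reports the same address the DB server uses."""
    return summary["address"] == db_address


SAMPLE = """\
Sun Jul 20 04:00:49 2008 VL_RegisterAddrs rpc failed; will retry periodically (code=5377, err=0)
Sun Jul 20 04:01:15 2008 FileServer faepsv07 has address 18.104.22.168 (0x6fa11b81 or 0x811ba16f in host byte order)
Sun Jul 20 20:04:29 2008 CB: ProbeUuid for 22.214.171.124:51209 failed -01
"""

summary = summarize_filelog(SAMPLE)
print(summary["host"], summary["address"])
for failure in summary["failures"]:
    print("failure:", failure)

# Hypothetical DB server address; replace with the real virtual address.
print(matches_db_address(summary, "192.0.2.10"))
```

A mismatch here does not prove the cause, but it would confirm that the two services present different addresses to the rest of the cell.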
Thanks for caring!
Andreas Hirczy <email@example.com> http://itp.tugraz.at/~ahi/
Graz University of Technology phone: +43/316/873- 8190
Institute of Theoretical and Computational Physics fax: +43/316/873-10 8190
Petersgasse 16, A-8010 Graz mobile: +43/664/859 23 57