[OpenAFS] Further Problem with 1.3.77 under AIX 5.1

Hans-Gunther Borrmann hans-gunther.borrmann@rz.uni-freiburg.de
Thu, 13 Jan 2005 13:20:45 +0100


Hello,

I have compiled openafs-snap-2005-01-10. Apart from the fact that getting a
token and accessing AFS crashes the machine, I found the following problem:

Without any token, I tar a large AFS area containing files that are
available campus-wide to /dev/null. After some time I get the following
error messages:

a rs_aix51/gaussian-03/g03/l405.hlp 6 blocks.
a rs_aix51/gaussian-03/g03/l502.exe 18195 blocks.
tar: 0511-182 Read error on afs: Lost contact with file server 10.1.2.26 in cell uni-freiburg.de (multi-homed address; other same-host interfaces maybe up)
afs: Lost contact with file server 10.1.2.27 in cell uni-freiburg.de (multi-homed address; other same-host interfaces maybe up)
afs: Lost contact with file server 132.230.6.235 in cell uni-freiburg.de (all multi-homed ip addresses down for the server)
afs: Lost contact with file server 132.230.6.236 in cell uni-freiburg.de (all multi-homed ip addresses down for the server)
afs: setting clock back 10 seconds (of 45, via 10.1.2.26 in cell uni-freiburg.de); clock is still fast.
rs_aix51/gaussian-03/g03/l502.exe: A remote host did not respond within the timeout period.
a rs_aix51/gaussian-03/g03/l502.hlp 68 blocks.
tar: 0511-182 Read error on rs_aix51/gaussian-03/g03/l502.hlp: A remote host did not respond within the timeout period.
a rs_aix51/gaussian-03/g03/l503.exe 2922 blocks.
tar: 0511-182 Read error on rs_aix51/gaussian-03/g03/l503.exe: A remote host did not respond within the timeout period.
a rs_aix51/gaussian-03/g03/l503.hlp 7 blocks.
tar: 0511-182 Read error on rs_aix51/gaussian-03/g03/l503.hlp: A remote host did not respond within the timeout period.
a rs_aix51/gaussian-03/g03/l504.exe 3307 blocks.
tar: 0511-182 Read error on rs_aix51/gaussian-03/g03/l504.exe: A remote host did not respond within the timeout period.
a rs_aix51/gaussian-03/g03/l506.exe 6171 blocks.
tar: 0511-182 Read error on rs_aix51/gaussian-03/g03/l506.exe: A remote host did not respond within the timeout period.
tar: rs_aix51/gaussian-03/g03/l506.hlp: A remote host did not respond within the timeout period.
.
. # the tar continues a little bit
.
a share/sw-tools-1.0/sbin/lnlibe 1 blocks.
a share/sw-tools-1.0/sbin/lnman 1 blocks.
a share/sw-tools-1.0/sbin/lnsbin 1 blocks.
a share/sw-tools-1.0/sbin/mkman 1 blocks.
a share/sw-tools-1.0/sbin/rmman 1 blocks.
a share/sw-tools-1.0/Links 1 blocks.
a share/sw-tools-1.0/Id 1 blocks.
a share/sw-tools-1.0/README 1 blocks.
a share/sw-tools-1.0/History 5 blocks.
tar: share/xyz: A remote host did not respond within the timeout period.

The tar then finishes.
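
The read test can also be approximated without tar. The following is a
minimal sketch, assuming Python 3 is available on the client; `read_tree`
is a hypothetical helper name and not part of OpenAFS. It reads every file
under a directory, discards the data, and collects the files whose reads
fail, which is roughly what tar to /dev/null does here.

```python
import os


def read_tree(root, chunk_size=64 * 1024):
    """Read every file that os.walk lists under root and discard the data.

    Returns a list of (path, error message) pairs for directories that
    could not be listed and files that could not be opened or read.
    """
    failures = []

    def on_walk_error(err):
        # Record directories that cannot be listed instead of skipping them.
        failures.append((err.filename, err.strerror))

    for dirpath, _dirnames, filenames in os.walk(root, onerror=on_walk_error):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    # Read in chunks until EOF; the data itself is not kept.
                    while f.read(chunk_size):
                        pass
            except OSError as err:
                failures.append((path, err.strerror))
    return failures
```

Called with the top of the AFS area under test, it returns an empty list
when every file could be read.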

° The file servers are multihomed. The test machine has no access to
10.1.2.x; the file servers are reachable only via their 132.230.6.x
addresses.

° After reporting "all multi-homed ip addresses down for the server", the
test machine has no further access to AFS, except for some files that are
still in the cache. fs checkservers always says "These servers unavailable
due to network or server problems:  sv6.ruf.uni-freiburg.de
sv7.ruf.uni-freiburg.de".

° Stopping and starting AFS does not help. I have to reboot.

° While the test machine is in this state, another machine connected to the
same hub is able to tar the same area without any problems. So there is no
problem with the network or the servers themselves.

° The problem is reproducible.

° The problem does not show up if I replace the kernel extensions with those
of Hartmut Reuter contained in his 15.03.04 version.
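
The lost-contact messages in the log above can be grouped by address to
separate the 10.1.2.x interfaces, which the test machine cannot reach, from
the 132.230.6.x ones. This is a minimal sketch, assuming each message sits on
a single line; `lost_contact_by_network` is a hypothetical helper, and the
/24 prefix lengths are an assumption read off the "10.1.2.x" and
"132.230.6.x" notation.

```python
import ipaddress
import re

# Matches the address in the cache manager's "Lost contact" messages.
LOST_CONTACT = re.compile(r"Lost contact with file server (\d+\.\d+\.\d+\.\d+)")


def lost_contact_by_network(lines, networks):
    """Group addresses from "Lost contact" messages by the network they fall in."""
    nets = [ipaddress.ip_network(n) for n in networks]
    result = {str(n): [] for n in nets}
    result["other"] = []
    for line in lines:
        for match in LOST_CONTACT.finditer(line):
            addr = ipaddress.ip_address(match.group(1))
            for net in nets:
                if addr in net:
                    result[str(net)].append(str(addr))
                    break
            else:
                # No configured network contains this address.
                result["other"].append(str(addr))
    return result


# Messages taken from the log above, one per line.
log = [
    "afs: Lost contact with file server 10.1.2.26 in cell uni-freiburg.de",
    "afs: Lost contact with file server 10.1.2.27 in cell uni-freiburg.de",
    "afs: Lost contact with file server 132.230.6.235 in cell uni-freiburg.de",
    "afs: Lost contact with file server 132.230.6.236 in cell uni-freiburg.de",
]
# Assumed /24 networks for the two address ranges named in this report.
groups = lost_contact_by_network(log, ["10.1.2.0/24", "132.230.6.0/24"])
```

Here `groups` lists both 10.1.2.x addresses under the first network and both
132.230.6.x addresses under the second, matching the order in which the
client gave up on them.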

Thanks in advance for any help.

Gunther
-- 
________________________________________________________________
Hans-Gunther Borrmann <hans-gunther.borrmann@rz.uni-freiburg.de>
Rechenzentrum der Universitaet Freiburg
Hermann-Herder-Str. 10, D79104 FREIBURG
Tel.: +49 761/203-4652
Fax:  +49 761/203-4643