[OpenAFS-port-darwin] Re: OS X hangs when accessing files

Systems Administration sysadmin@contrailservices.com
Mon, 9 Aug 2004 11:55:01 -0600


>> No, the Mac client is hung indefinitely; at least, I have not had
>> the patience to wait it out. 60 minutes is as long as I'm willing to
>> leave my workstation unusable.

Hmm, a correction on this: the hang does not seem to be related to
permissions as I had thought. It definitely hangs even in file spaces
that are accessible to my administratively privileged account. In this
example I am untarring an archive up to the server, and it hangs after
about half of the tarball has been uploaded.

The only way to kick the client into a 'timeout' is to force a restart
of the bosserver on the fileserver box.

This now happens with two different fileservers. One is the cell master
and KRB5 KDC as well as an AFS fileserver; the other is just an AFS
fileserver running only the fs processes. All of the traffic is
between the client Mac and the fileserver hosting the disk. Restarting
the cell master has no effect; only restarting the fileserver causes
the client to time out and releases the system hang.

>
> tcpdump port 7000; does anything show up?
> cmdebug (hung client hostname); do you see any locks held?

cmdebug shows one lock on the file while the client is hung, and I
cannot kill the process.


** Cache entry @ 0x0d75d968 for 1.536871048.193.860 [ridgebacksystems.com]
     locks: (none_waiting, upgrade_locked(pid:1175 at:66))
     2048 bytes  DV 26 refcnt 25
     callback 0281ef40   expires 1092087635
     0 opens     0 writers
     normal file
     states (0x1), stat'd
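To tie the cmdebug entry to the packet trace below, here is a small sketch that splits the cache entry identifier and converts the callback expiry to local time. It assumes the dotted identifier is laid out as cell index, volume, vnode, uniquifier (which matches the `fid 536871048/193/860` in the tcpdump output), that `expires` is a Unix timestamp, and that local time is UTC-06:00 as in this message's Date header. The helper names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_cache_fid(ident):
    """Split an (assumed) 'cell.volume.vnode.unique' string from cmdebug."""
    cell, volume, vnode, unique = (int(part) for part in ident.split("."))
    return {"cell": cell, "volume": volume, "vnode": vnode, "unique": unique}


def as_tcpdump_fid(fid):
    """Render the FID the way tcpdump prints it: volume/vnode/unique."""
    return f"{fid['volume']}/{fid['vnode']}/{fid['unique']}"


fid = parse_cache_fid("1.536871048.193.860")

# 'expires 1092087635' from the cmdebug output, assumed to be seconds since the epoch.
expires_utc = datetime.fromtimestamp(1092087635, tz=timezone.utc)
# -0600 is taken from this message's Date header (assumption about the client's zone).
expires_local = expires_utc.astimezone(timezone(timedelta(hours=-6)))

print(as_tcpdump_fid(fid))       # 536871048/193/860
print(expires_local.isoformat())  # 2004-08-09T15:40:35-06:00
```

Under those assumptions, the locked cache entry is the same file the client keeps requesting with fetch-data, and its callback would not expire until mid-afternoon, well after the 11:39 capture.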



tcpdump shows:

11:39:36.855104 IP lightning.internal.contrailservices.com.afs3-callback > turbine.internal.contrailservices.com.afs3-fileserver:  rx data fs call fetch-data fid 536871048/193/860 offset 0 length 999999999 (52)
11:39:37.238583 IP lightning.internal.contrailservices.com.afs3-callback > turbine.internal.contrailservices.com.afs3-fileserver:  rx data fs call fetch-data fid 536871048/193/860 offset 0 length 999999999 (52)
11:39:37.238858 IP turbine.internal.contrailservices.com.afs3-fileserver > lightning.internal.contrailservices.com.afs3-callback:  rx ack first 2 serial 1754 reason duplicate packet (65)
11:39:51.890754 IP turbine.internal.contrailservices.com.afs3-fileserver > lightning.internal.contrailservices.com.afs3-callback:  rx ack first 2 serial 0 reason ping (65)
11:39:51.891032 IP lightning.internal.contrailservices.com.afs3-callback > turbine.internal.contrailservices.com.afs3-fileserver:  rx ack first 1 serial 890 reason ping response (65)
11:40:01.254868 IP lightning.internal.contrailservices.com.afs3-callback > turbine.internal.contrailservices.com.afs3-fileserver:  rx ack first 1 serial 0 reason ping (65)
11:40:01.255304 IP turbine.internal.contrailservices.com.afs3-fileserver > lightning.internal.contrailservices.com.afs3-callback:  rx ack first 2 serial 1756 reason ping response (65)
11:40:16.942052 IP turbine.internal.contrailservices.com.afs3-fileserver > lightning.internal.contrailservices.com.afs3-callback:  rx ack first 2 serial 0 reason ping (65)
11:40:16.942344 IP lightning.internal.contrailservices.com.afs3-callback > turbine.internal.contrailservices.com.afs3-fileserver:  rx ack first 1 serial 895 reason ping response (65)
11:40:25.259102 IP lightning.internal.contrailservices.com.afs3-callback > turbine.internal.contrailservices.com.afs3-fileserver:  rx ack first 1 serial 0 reason ping (65)
11:40:25.259370 IP turbine.internal.contrailservices.com.afs3-fileserver > lightning.internal.contrailservices.com.afs3-callback:  rx ack first 2 serial 1758 reason ping response (65)
11:40:36.983144 IP turbine.internal.contrailservices.com.afs3-fileserver > lightning.internal.contrailservices.com.afs3-callback:  rx ack first 2 serial 0 reason ping (65)
11:40:36.983418 IP lightning.internal.contrailservices.com.afs3-callback > turbine.internal.contrailservices.com.afs3-fileserver:  rx ack first 1 serial 899 reason ping response (65)
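One way to summarize the trace: after the retransmitted fetch-data call is acked as a duplicate, the only traffic is ping / ping-response pairs (these appear to be Rx keepalive acks) in both directions, and no further data packets. The sketch below checks that from the packets above, hand-copied into a list. The list layout and helper names are hypothetical, and the timestamps carry no date, so only gaps within one capture are meaningful.

```python
from datetime import datetime

# (timestamp, sender, reason) hand-copied from the tcpdump output above;
# "data" marks the client's fetch-data call packets.
PACKETS = [
    ("11:39:36.855104", "client", "data"),
    ("11:39:37.238583", "client", "data"),
    ("11:39:37.238858", "server", "duplicate packet"),
    ("11:39:51.890754", "server", "ping"),
    ("11:39:51.891032", "client", "ping response"),
    ("11:40:01.254868", "client", "ping"),
    ("11:40:01.255304", "server", "ping response"),
    ("11:40:16.942052", "server", "ping"),
    ("11:40:16.942344", "client", "ping response"),
    ("11:40:25.259102", "client", "ping"),
    ("11:40:25.259370", "server", "ping response"),
    ("11:40:36.983144", "server", "ping"),
    ("11:40:36.983418", "client", "ping response"),
]


def ts(stamp):
    """Parse tcpdump's HH:MM:SS.microseconds timestamp (the date is ignored)."""
    return datetime.strptime(stamp, "%H:%M:%S.%f")


def ping_gaps(packets, sender):
    """Seconds between consecutive pings sent by one side, rounded to 0.1 s."""
    times = [ts(t) for t, who, why in packets if who == sender and why == "ping"]
    return [round((b - a).total_seconds(), 1) for a, b in zip(times, times[1:])]


def data_packets_after(packets, index):
    """Count data packets that appear after the given position in the capture."""
    return sum(1 for _, _, why in packets[index + 1:] if why == "data")


print(ping_gaps(PACKETS, "server"))    # [25.1, 20.0]
print(ping_gaps(PACKETS, "client"))    # [24.0]
print(data_packets_after(PACKETS, 2))  # 0 data packets after the duplicate-packet ack
```

In other words, both ends keep answering each other every 20 to 25 seconds, yet the fetch-data call never makes progress.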



Ted