[OpenAFS-devel] client stability

Derek Atkins warlord@MIT.EDU
26 May 2001 00:52:25 -0400


First, AFS != CODA.  Are you talking about AFS or are you
talking about CODA?  You keep referencing the two, but they
are not the same.

Second, why are you even stopping AFS?  Starting and stopping AFS can
certainly lead to problems unless you are 100% sure you are stopping
it in a clean environment; try to stop it in an unclean environment
and you will have problems.
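
For what it's worth, here is a rough sketch of the order I'd expect a
"clean" stop to follow.  The init script path, module name, and exact
commands are placeholders that vary by distribution, so take this as an
illustration rather than a recipe:

    cd /                    # make sure your own shell isn't sitting inside /afs
    lsof /afs               # anything listed here will make the stop "unclean"
    umount /afs             # the unmount should succeed before anything else
    /etc/init.d/afs stop    # the init script may repeat some of this itself
    lsmod | grep afs        # the kernel module's use count should drop to 0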

I've been using AFS on Linux since 1994, and I rarely, if ever, shut
down AFS.  If your network is going away, that's fine; AFS will just
take a while to time out (and you'll get syslog messages about the AFS
timeout).  Once it times out (and yes, the system will appear to
"hang" while it's timing out), the system returns to normal.  Then,
when you are back on the network, you can either run 'fs checks' or
wait for AFS to recheck the servers on its own and notice they are
back online.
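
('fs checks' is, if memory serves, just an abbreviation of
'fs checkservers'.  Something along these lines is what I mean; the
-all flag, as I recall, probes every server the cache manager knows
about rather than only the ones it has marked down:

    fs checkservers          # re-probe the file servers currently marked down
    fs checkservers -all     # probe all file servers the cache manager knows about
)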

-derek

Kuba Ober <kuba@mareimbrium.org> writes:

> Hi,
> 
> can some of the developers provide hints about stability of coda clients on 
> Linux?
> 
> I'm somewhat concerned about it, as it is not always possible to cleanly 
> unmount /afs.
> Sometimes it succeeds, but sometimes umount hangs and that's it.
> 
> It looks like under 2.4 kernels it's a very quick process: the client spirals 
> down in about 20 minutes.
> 
> Under 2.2 kernels it works w/o problems for quite some time. Actually the 
> only problem I had with the 2.2 kernel client was when the server was rebooted 
> (due to reasons not connected with coda). Any access to /afs returned an 
> `operation has timed out' error or somesuch, but unmounting just hung 
> forever. After killing the session and starting a new shell the umount finally 
> worked, but then the kernel module was stuck - it had a reference count of 2, 
> even though there were no afsd daemons alive at that moment. After some more 
> time (a minute or two) the module removal finally succeeded, but the machine 
> hung w/o a kernel panic or any other indication of problems. The IP stack went 
> dead, as did the keyboard handler - no `lock' lights were operative, and 
> pinging the machine from the net didn't get return packets. Magic SysRq was 
> dead as well (yep, I have it compiled into the kernel).
> 
> I don't think there are any problems with the server in my test setup :-)
> 
> Any hints?
> 
> Right now I'm trying to decide whether to look for something other than AFS 
> (being reluctant to go with Coda, as its win9x client is ripply and on the 
> whole it's AFS-derived), or to get involved in the development and try to 
> rectify those problems. I don't have huge kernel hacking experience, but I 
> think I'd like to at least try documenting what happens upon unmounting of 
> /afs and where the process stalls...
> 
> Do any of you have success stories with afs on RH Linux systems? I'm not 
> talking about an `ideal' setup where you just /etc/init.d/afs start and never 
> touch it. I'm trying to obtain stable operation in a test case where the 
> client is brought up via the initscript, some file accesses are made, the 
> client is stopped via the initscript, and then the cycle repeats (driven by 
> a shell script).
> 
> In all systems that I've checked (RH 7.0 client, three RH 7.1 clients), it 
> never survived more than 20 start/stops. All problems occurred on client stop, 
> though; if the stop went OK, then the next client start was always flawless. 
> Those machines all had PIII or Celeron processors with 128+ MB of RAM and ample 
> free disk space. Kernels were 2.2.17 on RH 7.0 and 2.4.2 on RH 7.1.
> 
> Cheerz,
> Kuba
> _______________________________________________
> OpenAFS-devel mailing list
> OpenAFS-devel@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-devel

-- 
       Derek Atkins, SB '93 MIT EE, SM '95 MIT Media Laboratory
       Member, MIT Student Information Processing Board  (SIPB)
       URL: http://web.mit.edu/warlord/    PP-ASEL-IA     N1NWH
       warlord@MIT.EDU                        PGP key available