[OpenAFS-devel] A few questions about the current Linux implementation of the AFS client

Matt Peterson matt@caldera.com
Mon, 21 Jan 2002 09:39:05 -0700


Pat,

On Sunday 20 January 2002 08:41 am, Patrick J. LoPresti wrote:
> Matt Peterson <matt@caldera.com> writes:
> >    3 - The AFS Client implementation on Linux is poorly designed and very
> >        unstable to the point that reboots become common place.
>
> Could you elaborate on this?  I am planning an AFS rollout here, but
> we hammer our file systems extremely hard.  If AFS on Linux is not
> ready for production use, that could prove an embarrassment.
>
> What sort of problems are you having?  Are other people on this list
> having trouble too?


I am unable to recommend the OpenAFS client for a production rollout on 
Linux.  It is not robust enough to be used in a production environment.  Let 
me offer a few examples of problems I have seen during my evaluation...

It is common to see the OpenAFS client become tied up in an afs_syscall that 
consumes 100% of the CPU.  When in this state it is impossible to shut down 
the client using normal shutdown scripts (i.e. '/etc/rc.d/init.d/afs stop', 
'shutdown -r now').   The only way out is a hard reboot of the machine, which 
is not acceptable.

I have also noticed that the libafs kernel module does not always release 
kernel resources when /afs is unmounted and afsd is stopped.   The kernel 
module ends up in a state where it cannot be unloaded via rmmod and cannot 
be reinitialized when afsd is started and /afs is remounted.  The only way to 
restart AFS is to reboot the machine.  
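
For anyone who wants to check for this state before trying rmmod, here is a 
minimal sketch in C.  It assumes the first three fields of each 
/proc/modules line are the module name, its size, and its use count.  The 
helper names and the "libafs" prefix are my own assumptions for illustration; 
none of this is taken from the OpenAFS source.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: parse one /proc/modules line.  If the module name
 * starts with `prefix`, store its use count in *count and return 1.
 * Return 0 for any other module or for a line that does not parse. */
int module_use_count(const char *line, const char *prefix, long *count)
{
    char name[64];
    unsigned long size;
    long refs;

    /* Assumed layout: "<name> <size> <use count> ..." */
    if (sscanf(line, "%63s %lu %ld", name, &size, &refs) != 3)
        return 0;
    if (strncmp(name, prefix, strlen(prefix)) != 0)
        return 0;
    *count = refs;
    return 1;
}

/* Hypothetical helper: scan /proc/modules for the first module whose name
 * starts with `prefix`.  Returns its use count, or -1 if no such module is
 * listed or the file cannot be read. */
long loaded_module_use_count(const char *prefix)
{
    FILE *fp = fopen("/proc/modules", "r");
    char line[512];
    long count = -1;

    if (fp == NULL)
        return -1;
    while (fgets(line, sizeof(line), fp) != NULL) {
        if (module_use_count(line, prefix, &count))
            break;
    }
    fclose(fp);
    return count;
}
```

If loaded_module_use_count("libafs") still returns a nonzero count after 
/afs is unmounted and afsd is stopped, the module is most likely in the 
stuck state described above.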

I have noticed several kernel crashes -- especially on shutdown of the 
client -- that are related to the OpenAFS kernel module.  I am investigating 
the oopses and so far they appear to be related to code that is out of date 
with current Linux kernel internals.

Finally, afsd is unable to handle a signal of any kind.  When signaled, 
several afsd processes (the rx callback listener, the trunc-cache daemon, and 
all the background daemons) consume 100% of the CPU and never return from 
the syscall.  The only way to recover is to unmount /afs, which puts you in 
the state described above, where libafs still appears to hold kernel 
resources and will often crash the entire kernel.

This last problem is the one that bothers me the most.  It is extremely 
common for init scripts to send signals to processes as part of shutdown and 
restart operations.  afsd MUST be able to handle (ignore) signals.  
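
To illustrate what I mean by handling (ignoring) signals, here is a minimal 
userspace sketch in C using POSIX sigaction().  The helper names and the 
choice of SIGHUP, SIGINT, and SIGTERM are my own assumptions, not code from 
afsd, and the real fix may well have to live in the kernel module instead.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: tell the kernel to discard `signo` for the calling
 * process.  Returns 0 on success, -1 on failure (as sigaction() does). */
int ignore_signal(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = SIG_IGN;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Hypothetical helper: ignore the signals an init script commonly sends
 * during shutdown or restart.  SIGKILL and SIGSTOP cannot be ignored, so
 * they are deliberately left out.  Returns 0 on success, -1 on failure. */
int ignore_shutdown_signals(void)
{
    static const int signals[] = { SIGHUP, SIGINT, SIGTERM };
    size_t i;

    for (i = 0; i < sizeof(signals) / sizeof(signals[0]); i++) {
        if (ignore_signal(signals[i]) != 0)
            return -1;
    }
    return 0;
}
```

A daemon would call ignore_shutdown_signals() once, before entering its 
long-running syscall, so that a stray signal from an init script is simply 
dropped.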

As I have time I am working on bug reports with specific reproduction 
instructions as well as patches for the above problems.  

I hope I do not sound pessimistic.  OpenAFS is a great filesystem and I don't 
think that any of the problems I've mentioned will be too hard to fix.  Until 
they are fixed, I would be very careful about rolling it out as a production 
file system on Linux.

-- 
Matt Peterson
Sr. Software Engineer
Caldera, Inc
matt@caldera.com