[OpenAFS-devel] A few questions about the current Linux implementation of the AFS client
Matt Peterson
matt@caldera.com
Mon, 21 Jan 2002 09:39:05 -0700
Pat,
On Sunday 20 January 2002 08:41 am, Patrick J. LoPresti wrote:
> Matt Peterson <matt@caldera.com> writes:
> > 3 - The AFS Client implementation on Linux is poorly designed and very
> > unstable to the point that reboots become commonplace.
>
> Could you elaborate on this? I am planning an AFS rollout here, but
> we hammer our file systems extremely hard. If AFS on Linux is not
> ready for production use, that could prove an embarrassment.
>
> What sort of problems are you having? Are other people on this list
> having trouble too?
I am unable to recommend the OpenAFS client for a production rollout on
Linux. It is not robust enough to be used in a production environment. Let
me offer a few examples of problems I have seen during my evaluation...
It is common to see the OpenAFS client become tied up in an afs_syscall that
consumes 100% of the CPU. When in this state it is impossible to shut down
the client using normal shutdown scripts (e.g. '/etc/rc.d/init.d/afs stop',
'shutdown -r now'). The only way out is to hard-reboot the machine, which is
not acceptable.
I have also noticed that the libafs kernel module does not always release
kernel resources when /afs is unmounted and afsd is stopped. The kernel
module ends up in a state where it cannot be unloaded via rmmod and cannot
be reinitialized when afsd is started and /afs is remounted. The only way to
restart AFS is to reboot the machine.
I have noticed several kernel crashes, especially on shutdown of the client,
that are related to the OpenAFS kernel module. I am investigating the oopses,
and so far they appear to be related to code that is out of date with current
Linux kernel internals.
Finally, afsd is unable to handle a signal of any kind. Several afsd
processes (the rx callback listener, the trunc-cache daemon, and all the
background daemons) will consume 100% of the CPU and never return from the
syscall. The only way to recover from this is to unmount /afs, which puts you
in the state mentioned previously, where libafs still appears to retain use of
kernel resources and will often crash the entire kernel.
This last problem is the one that bothers me the most. It is extremely
common for init scripts to send signals to processes as part of shutdown and
restart operations. afsd MUST be able to handle (ignore) signals.
As I have time, I am working on bug reports with specific reproduction
instructions as well as patches for the above problems.
I hope I do not sound pessimistic. OpenAFS is a great filesystem and I don't
think that any of the problems I've mentioned will be too hard to fix. Until
they are fixed, though, I would be very careful about rolling it out as a
production file system on Linux.
--
Matt Peterson
Sr. Software Engineer
Caldera, Inc
matt@caldera.com