[OpenAFS] Re: OpenAFS client cache overrun?

Andrew Deason adeason@sinenomine.net
Wed, 12 Mar 2014 11:04:03 -0500

On Wed, 12 Mar 2014 10:20:56 -0500
Eric Chris Garrison <ecgarris@iu.edu> wrote:

> Additional additional: If I didn't mention it before, this is all
> going over samba-on-OpenAFS. Yes, I know, users should be using the
> OpenAFS client rather than going through samba on a gateway. We have
> found it extremely difficult to get users to adopt this method,
> however, and have to try to make this work.

I don't think you need to keep saying this :) While that setup is maybe
not ideal, you shouldn't be able to lock up the client like that. The
samba daemon(s) are accessing files over /afs like anything else.

> 3 - I had enabled a 2GB cache bypass, and it seemed to have no effect
> whatsoever.

"cache bypass" only affects read operations; it doesn't do anything for
writes. I probably didn't make that clear, but I didn't know before
whether this was just something stuffing data into AFS, or a mix of
reading and writing, or what.

> cmdebug said this:
> [root@rgwb1 ~]# cmdebug localhost
> Lock afs_discon_lock status: (none_waiting, 21876 read_locks(pid:29278))

To be clear, this just ran and then exited on its own, right? You didn't
ctrl-C it or anything.

> [root@rgwb1 ~]# !ps
> ps -ef | grep 29278
> root     29278  4477  0 09:27 ?        00:00:00 smbd
> root     30101 29337  0 09:37 pts/3    00:00:00 grep 29278
>
> When I ran "top" I saw that the afs_cachetrim process was #1, but
> presumably wedged.
> I goosed /proc/sysrq-trigger and as promised, it dumped a lot of call
> trace info to the syslog. I'm looking through it, but am not sure what to
> look for. Nothing stands out, anyway.

You're looking for the stack trace for the afs_cachetrim process. Look
in syslog for "afs_cachetrim", or its pid. Under that should be a list
of function names showing where that process was in the code when the
dump was taken.

I would extract that, plus the entry for a hanging process. So, maybe
29278, or if some other process hangs when touching anything in /afs,
you could get the entry for that.

Or if you want to try to find "everything", just look for anything
containing the string "afs".
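
As a rough sketch of that search, assuming the sysrq output landed in
/var/log/messages (the usual RHEL location, but yours may differ).
show_trace is just a name made up for this example, and 40 lines of
context is a guess at how long a trace might be:

```shell
# show_trace PATTERN [LOGFILE] [CONTEXT]
# Print every line in LOGFILE that contains PATTERN, followed by the next
# CONTEXT lines, so the call trace under a task's header line comes along.
# show_trace is a hypothetical helper name; the defaults are assumptions.
show_trace() {
    pattern=$1
    logfile=${2:-/var/log/messages}
    context=${3:-40}
    grep -F -A "$context" -- "$pattern" "$logfile"
}

# Possible usage:
#   show_trace afs_cachetrim          # the cache trim daemon
#   show_trace 29278                  # a process that looks hung
#   grep -i afs /var/log/messages     # "everything" mentioning afs
```

Searching by pid can also match unrelated numbers in the log, so check
that the line you hit really is the header for that process.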

If you'd rather not leave the system hanging while you examine it, but
still want to capture information to look at later, you can generate a
core dump. If your system is set up to capture a core on crash (I'm not
sure if this is the default... look at the RHEL documentation; it should
be something mentioning kdump or kexec), you can crash the system and
you'll get a vmcore afterwards. To do this, send a 'c' to
/proc/sysrq-trigger. That will of course crash the system and cause it
to reboot, so don't do that if that's not what you want to happen.
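
A minimal sketch of doing that a bit more carefully, assuming a kernel
that exposes /sys/kernel/kexec_crash_loaded (it reads 1 when a crash
kernel is loaded, which is what kdump relies on). The crash_for_vmcore
name and the two environment overrides are made up for this example;
the overrides only exist so you can try the logic against ordinary
files first:

```shell
# crash_for_vmcore: hypothetical helper, not part of any package.
# Refuses to crash the box unless a crash kernel appears to be loaded
# and the caller passes the literal argument "yes".
crash_for_vmcore() {
    loaded_file=${KEXEC_LOADED_FILE:-/sys/kernel/kexec_crash_loaded}
    trigger=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}

    if [ "${1:-}" != "yes" ]; then
        echo "usage: crash_for_vmcore yes   (this crashes and reboots the machine)" >&2
        return 2
    fi
    if [ "$(cat "$loaded_file" 2>/dev/null)" != "1" ]; then
        echo "no crash kernel loaded; you would not get a vmcore" >&2
        return 1
    fi
    # Same as sending 'c' to /proc/sysrq-trigger by hand (needs root).
    echo c > "$trigger"
}
```

Run as root, "crash_for_vmcore yes" crashes the machine immediately, so
the same warning applies.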

Andrew Deason