[OpenAFS] IP-based ACLs failing
Stephen Joyce
stephen@physics.unc.edu
Fri, 24 Aug 2007 21:41:15 -0400 (EDT)
I'm using IP-based ACLs to protect some parts of my cell. (I know this is
not ideal, but the info isn't really sensitive. I just want to discourage
people in other cells from casual browsing.)
A few weeks ago about 10 of my clients began periodically losing
connectivity to these directories. Always the same clients. Other clients
in the same ACL continued to work fine. Once it occurred, the problem would
continue indefinitely (i.e., waiting 2 hours didn't fix it).
Restarting the fs instance cleared the problem and connectivity was
restored for the next 24-36 hours, then the problem repeated. This only
seemed to happen on this one fileserver and one group of clients.
Assuming that there was a problem with that fileserver, last weekend I
moved all of its volumes to our warm-spare server. Voila! Problem fixed...
until about 3 hours ago. Now the problem is repeating.
The FileLog doesn't show anything out of the ordinary when these clients
begin to lose connectivity.
The fileserver is RHEL 3 (2.4.21-47.ELsmp) running
openafs-server-1.4.1-rhel3.3. The clients are all Debian Etch
(2.6.18-4-686) running openafs-client 1.4.2-6. Other identical clients
don't show the problem.
I realize the server (and clients) are a few minor revisions out of date,
but I generally try to stay away from the bleeding edge with production
servers.
So, questions:
1) is this a known problem, and if so, is it fixed in a newer version of
the server?
2) if it's not a known problem, what info would be useful in
troubleshooting it? The problem is occurring _right now_. I can solve it by
restarting the fs process, but can delay and troubleshoot if it would be
beneficial.
Thanks!
Cheers, Stephen
--
Stephen Joyce
Systems Administrator                        P A N I C
Physics & Astronomy Department               Physics & Astronomy
University of North Carolina at Chapel Hill  Network Infrastructure
voice: (919) 962-7214                        and Computing
fax: (919) 962-0480                          http://www.panic.unc.edu