[OpenAFS-devel] openafs hangs on shutdown with selinux (caused by callback expiration via umount?)
Christopher Allen Wing
wingc@umich.edu
Wed, 2 Jan 2008 17:45:19 -0500 (EST)
Hello,
We've noticed occasional hangs at shutdown when running OpenAFS 1.4.x on
RHEL5 systems. The cause appears to be that SELinux restricts network I/O
performed inside the kernel when that I/O happens in the context of the
'umount' process itself.
I believe this is what's going on:
1. system shutdown is started
2. AFS init script is called to 'stop'
3. /etc/rc.d/init.d/openafs-client does 'umount /afs'
4. umount process is run in 'mount_t' SELinux context
5. umount process does umount() system call
6. kernel code flow runs along these lines:
    sys_umount("/afs")
    ...
    mntput(struct vfsmount * corresponding to /afs)
    ...
    deactivate_super(struct super_block * corresponding to /afs)
    ...
    generic_shutdown_super(struct super_block * corresponding to /afs)
    ...
    (struct super_block * for /afs)->put_super()
    afs_shutdown()
7. Something called by afs_shutdown() attempts to do network I/O.
My guess is that this is expiring open callbacks on the
fileservers, but I have not confirmed it.
8. SELinux blocks this network I/O because a process running in
mount_t security context is not permitted to do so according to
the RHEL5 security policy.
We see a large number of SELinux permission denied messages in the kernel
log like this:
audit(1199237877.841:1837): avc: denied { write } for pid=29174 comm="umount" lport=7001 scontext=system_u:system_r:mount_t:s0 tcontext=system_u:system_r:initrc_t:s0 tclass=udp_socket
from which I infer that there is some code in AFS which wants to send out
packets from the AFS client port (7001) during afs_shutdown(). At this
point I have not yet gotten a stack trace to see what part of the openafs
module is actually doing this.
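For anyone comparing their own logs against this one, the fields of the denial can be pulled apart with a few lines of shell. This is only a sketch: avc_field and context_type are hypothetical helper names, and the parsing assumes the space-separated key=value layout of the message quoted above.

```shell
#!/bin/sh
# Print the value of one key=value field from an AVC denial line.
# Usage: avc_field KEY LINE   (hypothetical helper, not a standard tool)
avc_field() {
    printf '%s\n' "$2" | sed -n "s/.*[[:space:]]$1=\([^[:space:]]*\).*/\1/p"
}

# Print the SELinux type, i.e. the third colon-separated part of a context.
context_type() {
    printf '%s\n' "$1" | cut -d: -f3
}

# The denial quoted in this message.
line='audit(1199237877.841:1837): avc: denied { write } for pid=29174 comm="umount" lport=7001 scontext=system_u:system_r:mount_t:s0 tcontext=system_u:system_r:initrc_t:s0 tclass=udp_socket'

echo "process:     $(avc_field comm "$line") (pid $(avc_field pid "$line"))"
echo "local port:  $(avc_field lport "$line")"
echo "source type: $(context_type "$(avc_field scontext "$line")")"
echo "target type: $(context_type "$(avc_field tcontext "$line")")"
echo "class:       $(avc_field tclass "$line")"
```

Read this way, the denial says that a process of type mount_t (umount) tried to write to a udp_socket labelled initrc_t and bound to local port 7001.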
I believe that SELinux attributes network traffic to the security context
of the process in which it originates. Since the I/O happens while the
'umount' process is executing in the kernel (inside the umount() system
call), the SELinux policy for mount_t is what gets applied.
Has anyone else seen this hang?
For the time being I hacked around this by forcing the 'umount' in the
openafs-client init script to run in an unrestricted SELinux security
context like this:
----- /etc/rc.d/init.d/openafs-client ---
stop() {
...
...
runcon system_u:system_r:unconfined_t -- /bin/umount /afs
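One caveat with the line above: as far as I know, runcon refuses to run at all on a kernel without SELinux enabled, so an init script shared between SELinux and non-SELinux hosts would probably want a guard. A minimal sketch, assuming the selinuxenabled utility is installed; afs_umount_cmd and the SELINUX_CHECK variable are hypothetical names used here only so the logic can be dry-run.

```shell
#!/bin/sh
# Sketch only. SELINUX_CHECK is a hypothetical override hook for dry runs;
# by default it is the selinuxenabled utility, which exits 0 when SELinux
# is enabled.
: "${SELINUX_CHECK:=selinuxenabled}"

# Print the command line that stop() would use to unmount /afs.
afs_umount_cmd() {
    if $SELINUX_CHECK 2>/dev/null; then
        # SELinux is active: leave mount_t before calling umount.
        echo "runcon system_u:system_r:unconfined_t -- /bin/umount /afs"
    else
        # SELinux disabled, or the check command is missing: plain umount.
        echo "/bin/umount /afs"
    fi
}

# Show which command would be chosen on this host.
afs_umount_cmd
```

In the init script itself the printed command would be executed rather than echoed. The workaround is otherwise unchanged: umount still runs unconfined whenever SELinux is on.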
I was not able to reproduce the problem in brief testing, though; it
seemed to be associated with trying to reboot a RHEL5 host after it had
been up (and accessing afs) for a day or similar length of time. I
couldn't get the hang to occur by just booting a machine, using AFS,
trying to reboot, etc. (I did try some things like pulling the network
cable prior to shutdown, to see if I could somehow make the client act
differently.)
Since I modified the init script as above I have not seen the problem
recur.
I don't have a recommendation for a resolution to this problem at present,
so I'm asking for ideas from others who might be running OpenAFS in a
SELinux environment.
Thoughts:
1. What does mount/umount do for NFS (with regard to SELinux)?
2. One solution would be the above hack (basically disable SELinux
protection for the 'umount' command when run via the openafs
init script).
3. Another solution would be to modify the SELinux policy when/if
necessary (on all RHEL5, suitable Fedora Core releases, etc.).
This would be a more involved change to the existing OpenAFS
packaging.
4. Otherwise we might modify the openafs kernel code so that it
does not attempt I/O from within afs_shutdown(); i.e., do it
from within one of the AFS kernel daemons instead.
I have no idea how feasible / desirable this approach would be.
Again, what does NFS/cifs/etc do here?
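To make option 3 concrete, this is roughly the rule the denial above maps to. The sketch below derives it mechanically from the one logged line; avc_to_allow is a hypothetical helper, and in practice the audit2allow tool does this job properly. I have not checked whether this single rule is enough to stop the hang, or whether it is an acceptable thing to grant to mount_t.

```shell
#!/bin/sh
# Turn one AVC denial line into the allow rule that would permit it.
# avc_to_allow is a hypothetical helper, not part of any SELinux package.
avc_to_allow() {
    # Denied permission set, braces included, e.g. "{ write }".
    perms=$(printf '%s\n' "$1" | sed -n 's/.*denied *\({[^}]*}\).*/\1/p')
    # Source and target types: third colon-separated part of each context.
    stype=$(printf '%s\n' "$1" | sed -n 's/.* scontext=[^:]*:[^:]*:\([^: ]*\).*/\1/p')
    ttype=$(printf '%s\n' "$1" | sed -n 's/.* tcontext=[^:]*:[^:]*:\([^: ]*\).*/\1/p')
    # Object class, e.g. udp_socket.
    class=$(printf '%s\n' "$1" | sed -n 's/.* tclass=\([^ ]*\).*/\1/p')
    echo "allow $stype $ttype:$class $perms;"
}

# The denial quoted earlier in this message.
line='audit(1199237877.841:1837): avc: denied { write } for pid=29174 comm="umount" lport=7001 scontext=system_u:system_r:mount_t:s0 tcontext=system_u:system_r:initrc_t:s0 tclass=udp_socket'

avc_to_allow "$line"
```

A rule like this would have to ship as a local policy module with the OpenAFS packages, which is the packaging work mentioned in item 3.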
Thanks a lot,
Chris Wing
wingc@umich.edu