[OpenAFS-devel] openafs hangs on shutdown with selinux (caused by callback expiration via umount?)

Christopher Allen Wing wingc@umich.edu
Wed, 2 Jan 2008 17:45:19 -0500 (EST)


Hello,

We've noticed occasional hangs at shutdown when running OpenAFS 1.4.x on
RHEL5 systems. The cause appears to be SELinux denying network I/O that
the kernel performs while running in the context of the 'umount' process
itself.

I believe this is what's going on:


 	1. system shutdown is started

 	2. AFS init script is called to 'stop'

 	3. /etc/rc.d/init.d/openafs-client does 'umount /afs'

 	4. umount process is run in 'mount_t' SELinux context

 	5. umount process does umount() system call

 	6. kernel code flow runs along these lines:

 		sys_umount("/afs")

 		...

 		mntput(struct vfsmount * corresponding to /afs)

 		...

 		deactivate_super(struct super_block * corresponding to /afs)

 		...

 		generic_shutdown_super(struct super_block * corresponding to /afs)

 		...

 		(struct super_block * for /afs)->put_super()

 		afs_shutdown()

 	7. Something called by afs_shutdown() attempts to do network I/O;
 	   I'm guessing this is expiring open callbacks on the
 	   fileservers?

 	8. SELinux blocks this network I/O because a process running in
 	   mount_t security context is not permitted to do so according to
 	   the RHEL5 security policy.


We see a large number of SELinux permission denied messages in the kernel 
log like this:

 	audit(1199237877.841:1837): avc:  denied  { write } for  pid=29174 comm="umount" lport=7001 scontext=system_u:system_r:mount_t:s0 tcontext=system_u:system_r:initrc_t:s0 tclass=udp_socket


from which I infer that there is some code in AFS which wants to send out 
packets from the AFS client port (7001) during afs_shutdown().  At this 
point I have not yet gotten a stack trace to see what part of the openafs 
module is actually doing this.
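
As a rough sketch of how the denial above can be picked apart, the following shell helper pulls individual key=value fields out of an AVC line. The name avc_field is made up for illustration and is not part of any SELinux tool; the sample line is the one quoted above.

```shell
#!/bin/sh
# avc_field NAME LINE
# Print the value of the NAME=value field found in one AVC denial line.
# Hypothetical helper, for illustration only.
avc_field() {
    printf '%s\n' "$2" | sed -n "s/.*[[:space:]]$1=\([^[:space:]]*\).*/\1/p"
}

# The denial from the kernel log, copied verbatim.
line='audit(1199237877.841:1837): avc:  denied  { write } for  pid=29174 comm="umount" lport=7001 scontext=system_u:system_r:mount_t:s0 tcontext=system_u:system_r:initrc_t:s0 tclass=udp_socket'

avc_field comm "$line"      # which command was running
avc_field lport "$line"     # local port of the socket
avc_field scontext "$line"  # context of the process doing the write
avc_field tcontext "$line"  # context the socket is labeled with
avc_field tclass "$line"    # object class
```

Read this way, the line says that a process in mount_t tried to write to a UDP socket on local port 7001 that is labeled initrc_t.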

I believe that SELinux attributes network traffic to the context of the
process in which it originates. Since the packets are sent while the
'umount' process is executing in the kernel (inside the umount() system
call), the SELinux policy for mount_t is applied to them.



Has anyone else seen this hang?


For the time being I hacked around this by forcing the 'umount' in the 
openafs-client init script to run in an unrestricted SELinux security 
context like this:

 	----- /etc/rc.d/init.d/openafs-client ---

 	stop() {
 		...
 		...
 		runcon system_u:system_r:unconfined_t -- /bin/umount /afs
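
One caveat with this hack: as far as I know, runcon refuses to run at all on a system where SELinux is disabled, so an unconditional runcon would break the stop action there. A minimal sketch of a guard follows. The helper name umount_cmd is hypothetical, and it assumes the selinuxenabled utility (exit status 0 when SELinux is enabled) is installed.

```shell
#!/bin/sh
# umount_cmd STATUS
# Print the umount command line to use. STATUS is the exit status of
# selinuxenabled: 0 means SELinux is enabled, anything else means it is
# disabled or the utility is missing. Hypothetical helper, illustration only.
umount_cmd() {
    if [ "$1" -eq 0 ]; then
        echo "runcon system_u:system_r:unconfined_t -- /bin/umount /afs"
    else
        echo "/bin/umount /afs"
    fi
}

# Intended use inside stop() (not executed here):
#   selinuxenabled; cmd=$(umount_cmd $?)
#   $cmd
```

Passing the status in as an argument keeps the decision separate from the actual umount, so the logic can be checked without touching /afs.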




I was not able to reproduce the problem in brief testing, though; it
seemed to be associated with trying to reboot a RHEL5 host after it had
been up (and accessing AFS) for a day or a similar length of time.  I
couldn't get the hang to occur by just booting a machine, using AFS,
trying to reboot, etc.  (I did try some things like pulling the network
cable prior to shutdown, to see if I could somehow make the client act
differently.)

Since I modified the init script as above I have not seen the problem 
recur.


I don't have a recommendation for a resolution to this problem at present, 
so I'm asking for ideas from others who might be running OpenAFS in a 
SELinux environment.


Thoughts:


 	1. What does mount/umount do for NFS (with regard to SELinux)?

 	2. One solution would be the above hack (basically disable SELinux
 	   protection for the 'umount' command when run via the openafs
 	   init script).

 	3. Another solution would be to modify the SELinux policy when/if
 	   necessary (on all RHEL5, suitable Fedora Core releases, etc.).
 	   This would be a more involved change to the existing OpenAFS
 	   packaging.

 	4. Otherwise we might modify the openafs kernel code so that it
 	   does not attempt I/O from within afs_shutdown(); i.e., do it
 	   from within one of the AFS kernel daemons instead.

 	   I have no idea how feasible / desirable this approach would be.
 	   Again, what does NFS/cifs/etc do here?
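
For thought 3, the rule a local policy module would have to carry can be read straight off the denial. The sketch below is a toy stand-in for what audit2allow does, with hypothetical helper names (avc_field, avc_to_allow). It only prints the candidate rule; whether granting it to mount_t is acceptable is a separate question.

```shell
#!/bin/sh
# avc_field NAME LINE: print the value of NAME=value in an AVC line.
# Hypothetical helper, for illustration only.
avc_field() {
    printf '%s\n' "$2" | sed -n "s/.*[[:space:]]$1=\([^[:space:]]*\).*/\1/p"
}

# avc_to_allow LINE: print the allow rule implied by one AVC denial.
# The type is the third colon-separated part of each security context.
avc_to_allow() {
    src=$(avc_field scontext "$1" | cut -d: -f3)
    tgt=$(avc_field tcontext "$1" | cut -d: -f3)
    cls=$(avc_field tclass "$1")
    # Permissions are listed between the braces after "denied".
    perms=$(printf '%s\n' "$1" | sed -n 's/.*{ *\([^}]*[^} ]\) *}.*/\1/p')
    echo "allow $src $tgt:$cls { $perms };"
}

# The denial from the kernel log, copied verbatim.
line='audit(1199237877.841:1837): avc:  denied  { write } for  pid=29174 comm="umount" lport=7001 scontext=system_u:system_r:mount_t:s0 tcontext=system_u:system_r:initrc_t:s0 tclass=udp_socket'

avc_to_allow "$line"
```

Only the one denial quoted earlier is covered here; other permissions may well be denied once the write is allowed, so the full set of rules would have to come from the complete audit log.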



Thanks a lot,

Chris Wing
wingc@umich.edu