[OpenAFS-devel] Solaris fixes for 1.4.x / AFS_SUN510_ENV
Mike Battersby
mib@unimelb.edu.au
Wed, 30 Jan 2008 18:14:02 +1100
Dear AFS Developers,
Attached is a patch which I believe fixes two serious problems in the
AFS 1.4.x kernel module for Solaris 10 (AFS_SUN510_ENV).
1. SSYS process exiting considered harmful
The first problem is that setting the SSYS process flag on a process
that later exits, as the afs_osi_Invisible routine does on Solaris 10,
causes the system not to clean up the contract state of the process.
This leaves a dangling kernel-memory pointer in the contract table
which used to point to the process struct.
Any user can then corrupt kernel memory and cause a panic with the
'ctstat' command, and the system cannot shut down without either
panicking or going into an infinite loop as svc.startd repeatedly tries
to kill the non-existent process.
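To make the failure mode concrete, here is a minimal user-space sketch of
the behaviour described above. It is a toy model only: the toy_* names and
the TOY_SSYS value are hypothetical and are not the Solaris contract or
process structures. It assumes, as stated above, that the exit path skips
contract cleanup for a process flagged as a system process.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SSYS 0x1            /* hypothetical flag value, not the Solaris one */

struct toy_proc {
    bool in_use;                /* false once the process has exited */
    unsigned flags;
};

struct toy_contract {
    struct toy_proc *member;    /* process tracked by this contract */
};

/* Exit path in the toy model: flagged "system" processes skip the
 * contract cleanup that ordinary processes get. */
static void toy_exit(struct toy_proc *p, struct toy_contract *ct)
{
    assert(p->in_use);
    if (!(p->flags & TOY_SSYS))
        ct->member = NULL;      /* ordinary process leaves its contract */
    p->in_use = false;          /* the process is gone either way */
}

/* True if the contract still references a process that has exited. */
static bool toy_contract_is_stale(const struct toy_contract *ct)
{
    return ct->member != NULL && !ct->member->in_use;
}
```

In this model, a process that sets TOY_SSYS before calling toy_exit()
leaves its contract stale, while an unflagged process does not. Anything
that later walks the contract table would follow the stale pointer.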
2. Taskq MSS job left running after shutdown
The new 5.10u4 support, which uses a taskq to schedule the MSS probing,
doesn't clean up when AFS is shut down. If the module is then unloaded,
the taskq is left scheduled to run a function that no longer exists.
Instant panic.
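The shape of the fix can be sketched in user space: a self-rescheduling
task checks a shutdown flag before queueing itself again, so the queue can
be drained and destroyed. This is a single-threaded toy, not the Solaris
ddi_taskq API. The toy_* names, poller() and shutting_down are
hypothetical stand-ins, with shutting_down playing the role of
afs_shuttingdown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef void (*task_fn)(void *);

/* Toy task queue holding at most one pending task. */
struct toy_taskq {
    task_fn pending;            /* NULL when nothing is queued */
    void *arg;
};

static bool shutting_down = false;  /* stand-in for afs_shuttingdown */
static int polls_done = 0;

static void toy_dispatch(struct toy_taskq *q, task_fn fn, void *arg)
{
    assert(q->pending == NULL); /* the toy queue holds a single task */
    q->pending = fn;
    q->arg = arg;
}

/* Run the queued task, if any. Returns true if a task ran. */
static bool toy_run_one(struct toy_taskq *q)
{
    task_fn fn = q->pending;
    if (fn == NULL)
        return false;
    q->pending = NULL;
    fn(q->arg);
    return true;
}

/* Self-rescheduling poller: stops re-queueing once shutdown begins. */
static void poller(void *arg)
{
    struct toy_taskq *q = arg;
    if (shutting_down)
        return;
    polls_done++;
    toy_dispatch(q, poller, q);
}

/* Raise the flag, then drain. Returns how many tasks ran while draining;
 * max_steps bounds the loop in case a task keeps rescheduling itself. */
static int toy_shutdown(struct toy_taskq *q, int max_steps)
{
    int steps = 0;
    shutting_down = true;
    while (steps < max_steps && toy_run_one(q))
        steps++;
    return steps;
}
```

Without the flag check in poller(), toy_shutdown() would hit max_steps
with a task still pending, which is the toy equivalent of unloading the
module while the taskq still has work queued.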
I really don't know why the code would set SSYS on a userland process
that's about to exit in the first place. Can anyone shed any light?
I'm not sure about the placement of the cleanup code for case #2, as no
spot in afs_shutdown() seems particularly better than another.
This is a diff against 1.4.5 but I think it should apply cleanly to
1.4.6. I've no idea if these bugs are also present in the 1.5.x branch.
Since it is fairly small I've included it here. I apologise if that's
against list etiquette.
Regards,
- Mike
diff -uNr openafs-1.4.5.orig/src/afs/afs_call.c openafs-1.4.5/src/afs/afs_call.c
--- openafs-1.4.5.orig/src/afs/afs_call.c 2007-10-17 13:51:44.000000000 +1000
+++ openafs-1.4.5/src/afs/afs_call.c 2008-01-30 18:02:20.008681000 +1100
@@ -1862,6 +1862,11 @@
#else
afs_termState = AFSOP_STOP_COMPLETE;
#endif
+#ifdef AFS_SUN510_ENV
+ afs_warn("NetIfPoller... ");
+ rw_destroy(&afsifinfo_lock);
+ ddi_taskq_destroy(afs_taskq);
+#endif
afs_warn("\n");
/* Close file only after daemons which can write to it are stopped. */
diff -uNr openafs-1.4.5.orig/src/afs/afs_osi.c openafs-1.4.5/src/afs/afs_osi.c
--- openafs-1.4.5.orig/src/afs/afs_osi.c 2007-04-04 04:57:06.000000000 +1000
+++ openafs-1.4.5/src/afs/afs_osi.c 2008-01-29 18:43:32.090145000 +1100
@@ -291,7 +291,7 @@
{
#ifdef AFS_LINUX22_ENV
afs_osi_MaskSignals();
-#elif defined(AFS_SUN5_ENV)
+#elif defined(AFS_SUN5_ENV) && !defined(AFS_SUN510_ENV)
curproc->p_flag |= SSYS;
#elif defined(AFS_HPUX101_ENV) && !defined(AFS_HPUX1123_ENV)
set_system_proc(u.u_procp);
diff -uNr openafs-1.4.5.orig/src/rx/SOLARIS/rx_knet.c openafs-1.4.5/src/rx/SOLARIS/rx_knet.c
--- openafs-1.4.5.orig/src/rx/SOLARIS/rx_knet.c 2007-10-05 12:54:10.000000000 +1000
+++ openafs-1.4.5/src/rx/SOLARIS/rx_knet.c 2008-01-30 16:36:20.033430000 +1100
@@ -591,6 +591,12 @@
int index;
uint_t mtu;
uint64_t flags;
+ extern int afs_shuttingdown;
+
+ /* If we're shutting down we need to stop rescheduling more
+ * taskq runs so we can destroy the taskq */
+ if (afs_shuttingdown)
+ return;
/* Get our permissions */
cr = CRED();