[OpenAFS-devel] Solaris fixes for 1.4.x / AFS_SUN510_ENV

Mike Battersby mib@unimelb.edu.au
Wed, 30 Jan 2008 18:14:02 +1100


--Boundary_(ID_lWssJMXNAwIOnUxIAqt4ow)
Content-type: text/plain; charset=ISO-8859-1; format=flowed
Content-transfer-encoding: 7bit

Dear AFS Developers,

Attached is a patch which I believe fixes two serious problems in the
AFS 1.4.x kernel module for Solaris 10 (AFS_SUN510_ENV).


1. SSYS process exiting considered harmful

  The first problem is that setting process flag SSYS on a process that
  exits, as the afs_osi_Invisible routine on Solaris 10 does, causes the
  system not to clean up the contract state of the process.  This leaves
  a dangling kernel-memory pointer in the contract table which used to
  point to the process struct.

  Any user can corrupt kernel memory and cause a panic with the 'ctstat'
  command, and the system cannot shut down without either panicking or
  going into an infinite loop as svc.startd repeatedly tries to kill the
  non-existent process.

2. Taskq MSS job left running after shutdown

  The new 5.10u4 support, which uses taskqs to schedule the MSS probing,
  doesn't clean up when AFS is shut down.  If the module is then unloaded,
  the taskq is left scheduled to run, pointing at a function that
  no longer exists.  Instant panic.
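
To illustrate problem 2, here is a minimal userland sketch of a
self-rescheduling job that checks a shutdown flag before queueing itself
again.  It is only a model: toy_taskq, toy_dispatch, toy_run_one,
toy_destroy, poller and run_demo are hypothetical names and are not the
Solaris DDI taskq interface or the OpenAFS code.  The shutting_down
variable stands in for afs_shuttingdown.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical single-slot "task queue": holds at most one pending job. */
typedef void (*job_fn)(void *);

struct toy_taskq {
    job_fn pending;     /* NULL when nothing is scheduled */
    void  *arg;
    bool   destroyed;
};

static bool shutting_down;  /* stands in for afs_shuttingdown */
static int  polls_run;      /* number of times the poller did real work */

/* Queue a job.  Dispatching to a destroyed queue is a programming error. */
static void toy_dispatch(struct toy_taskq *tq, job_fn fn, void *arg)
{
    assert(!tq->destroyed);
    tq->pending = fn;
    tq->arg = arg;
}

/* Run the pending job, if any.  Returns true if a job ran. */
static bool toy_run_one(struct toy_taskq *tq)
{
    job_fn fn = tq->pending;

    if (fn == NULL)
        return false;
    tq->pending = NULL;     /* clear the slot before the job can requeue */
    fn(tq->arg);
    return true;
}

/* Drain whatever is still queued, then mark the queue unusable. */
static void toy_destroy(struct toy_taskq *tq)
{
    while (toy_run_one(tq))
        ;
    tq->destroyed = true;
}

/* Self-rescheduling poller with the same kind of guard as the patch. */
static void poller(void *arg)
{
    struct toy_taskq *tq = arg;

    if (shutting_down)
        return;             /* stop rescheduling so the queue can drain */
    polls_run++;
    toy_dispatch(tq, poller, tq);
}

/* Two normal polls, then shutdown and teardown. */
int run_demo(void)
{
    struct toy_taskq tq = { 0 };

    shutting_down = false;
    polls_run = 0;
    toy_dispatch(&tq, poller, &tq);
    toy_run_one(&tq);
    toy_run_one(&tq);
    shutting_down = true;
    toy_destroy(&tq);       /* terminates only because of the guard */
    return polls_run;
}
```

In this model run_demo() returns 2.  Without the shutting_down check the
poller would requeue itself on every run and toy_destroy() would never
finish draining, which is why the flag check and the destroy call go
together.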


I really don't know why the code would set SSYS on a userland process
that's about to exit in the first place.  Can anyone shed any light?
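
For anyone trying to picture problem 1, here is a small userland model of
the failure as I understand it.  It assumes, as described above, that the
exit path skips contract teardown when the system-process flag is set.
All names (toy_proc, TOY_SSYS, toy_fork, toy_exit, contract_is_stale) are
hypothetical and none of this is Solaris kernel code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_SSYS 0x1    /* hypothetical stand-in for the SSYS flag */
#define NPROC    4

struct toy_proc {
    bool     in_use;    /* false once the slot has been released */
    unsigned flags;
    int      contract;  /* index into the toy contract table */
};

static struct toy_proc  procs[NPROC];
static struct toy_proc *contract_member[NPROC];  /* toy contract table */

/* Allocate a process slot and record it as the member of a contract. */
static struct toy_proc *toy_fork(int contract)
{
    assert(contract >= 0 && contract < NPROC);
    for (int i = 0; i < NPROC; i++) {
        if (!procs[i].in_use) {
            procs[i].in_use = true;
            procs[i].flags = 0;
            procs[i].contract = contract;
            contract_member[contract] = &procs[i];
            return &procs[i];
        }
    }
    return NULL;
}

/* Assumed exit path: contract teardown is skipped for flagged processes. */
static void toy_exit(struct toy_proc *p)
{
    if (!(p->flags & TOY_SSYS))
        contract_member[p->contract] = NULL;
    p->in_use = false;  /* the slot is released either way */
}

/* A contract entry is stale if it still refers to a released slot. */
static bool contract_is_stale(int contract)
{
    struct toy_proc *p = contract_member[contract];

    return p != NULL && !p->in_use;
}
```

A process that exits normally leaves no stale entry, while one that sets
TOY_SSYS before calling toy_exit() leaves the table pointing at a released
slot.  That stale entry is the analogue of the dangling pointer in the
contract table.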

I'm not sure about the placement of the cleanup code for case #2, as no
spot seems particularly better than another in afs_shutdown().

This is a diff against 1.4.5 but I think it should apply cleanly to
1.4.6.  I've no idea if these bugs are also present in the 1.5.x branch.

Since it is fairly small I've included it here.  I apologise if that's
against list etiquette.

Regards,

  - Mike

--Boundary_(ID_lWssJMXNAwIOnUxIAqt4ow)
Content-type: text/plain; x-mac-type="0"; x-mac-creator="0";
 name="openafs-1.4.5.solfixes.diff"
Content-transfer-encoding: 7bit
Content-disposition: inline; filename="openafs-1.4.5.solfixes.diff"

diff -uNr openafs-1.4.5.orig/src/afs/afs_call.c openafs-1.4.5/src/afs/afs_call.c
--- openafs-1.4.5.orig/src/afs/afs_call.c	2007-10-17 13:51:44.000000000 +1000
+++ openafs-1.4.5/src/afs/afs_call.c	2008-01-30 18:02:20.008681000 +1100
@@ -1862,6 +1862,11 @@
 #else
     afs_termState = AFSOP_STOP_COMPLETE;
 #endif
+#ifdef AFS_SUN510_ENV
+    afs_warn("NetIfPoller... ");
+    rw_destroy(&afsifinfo_lock);
+    ddi_taskq_destroy(afs_taskq);
+#endif
     afs_warn("\n");
 
     /* Close file only after daemons which can write to it are stopped. */
diff -uNr openafs-1.4.5.orig/src/afs/afs_osi.c openafs-1.4.5/src/afs/afs_osi.c
--- openafs-1.4.5.orig/src/afs/afs_osi.c	2007-04-04 04:57:06.000000000 +1000
+++ openafs-1.4.5/src/afs/afs_osi.c	2008-01-29 18:43:32.090145000 +1100
@@ -291,7 +291,7 @@
 {
 #ifdef AFS_LINUX22_ENV
     afs_osi_MaskSignals();
-#elif defined(AFS_SUN5_ENV)
+#elif defined(AFS_SUN5_ENV) && !defined(AFS_SUN510_ENV)
     curproc->p_flag |= SSYS;
 #elif defined(AFS_HPUX101_ENV) && !defined(AFS_HPUX1123_ENV)
     set_system_proc(u.u_procp);
diff -uNr openafs-1.4.5.orig/src/rx/SOLARIS/rx_knet.c openafs-1.4.5/src/rx/SOLARIS/rx_knet.c
--- openafs-1.4.5.orig/src/rx/SOLARIS/rx_knet.c	2007-10-05 12:54:10.000000000 +1000
+++ openafs-1.4.5/src/rx/SOLARIS/rx_knet.c	2008-01-30 16:36:20.033430000 +1100
@@ -591,6 +591,12 @@
     int index;
     uint_t mtu;
     uint64_t flags;
+    extern int afs_shuttingdown;
+
+    /* If we're shutting down we need to stop rescheduling more
+     * taskq runs so we can destroy the taskq */
+    if (afs_shuttingdown)
+	return;
 
     /* Get our permissions */
     cr = CRED();

--Boundary_(ID_lWssJMXNAwIOnUxIAqt4ow)--