[OpenAFS-devel] AFS and Solaris UFS

Frank Batschulat Frank Batschulat <Frank.Batschulat@Sun.COM>
Wed, 11 Apr 2001 16:35:16 +0200 (MEST)


Subject: AFS and Solaris UFS

Hi All,

/*
 * standard disclaimer applies here: This is NOT an offical
 * statement from Sun Microsystems; I just felt it is right to
 * tell people about these our findings ...
 */

I dont know how much you care about the Solaris version
of AFS, but if you do, you may be interested in some
facts we discovered while working thru a bug where
TransArc's AFS has become wedged on Solaris (7).

It has taken lot of time to digg this out and while
we at Sun do no have the resources to actually provide
you with the suggested diffs, I thouht we should share
at least this information with you.

The AFS version we've investigated was TransArc AFS 3.6 
(production source code provided by TransArc) and the 
OpenAFS source I've checked was from:
"http://www.openafs.org/cgi-bin/cvsweb.cgi/openafs/src/"

This code is broken in 4 areas with respect to the way
it makes use of Solaris UFS internal functions.

AFS is hooked on Solaris UFS that way:

<snip from: afs-3.6/src/afs/SOLARIS/osi_vfsops.c>
   296      ufs_iputp = (int (*)()) modlookup("ufs", "ufs_iput");
   297      ufs_iallocp = (int (*)()) modlookup("ufs", "ufs_ialloc");
   298      ufs_iupdatp = (int (*)()) modlookup("ufs", "ufs_iupdat");
   299      ufs_igetp = (int (*)()) modlookup("ufs", "ufs_iget");
   300      udp_infop = (struct streamtab *) modlookup("udp", "udpinfo");
   301      ill_g_headp = (struct ill_s *) modlookup("ip", "ill_g_head");
   302
   303      if ( !ufs_iputp || !ufs_iallocp || !ufs_iupdatp ||
   304           !ufs_igetp || !udp_infop || !ill_g_headp )
   305          afs_warn("AFS to UFS mapping cannot be fully initialised\n");
<snip end>

Now the problematical areas in detail:

1.) AFS is broken with respect to Solaris UFS quotas:

    - AFS is'nt aware of the vfs_dqrwlock (Manages quota sub-system quiescence,
      see /usr/include/sys/fs/ufs_inode.h) when calling
      the Solaris UFS internal functions, hence AFS should not be
      used in case where UFS quotas are enabled, or vice-versa
      UFS quotas should not be turned on on an AFS operated UFS.
    - This also makes it hard to get a debug Solaris UFS module
      running on a system where AFS is operating.
    - This in detail is the case when AFS is calling Solaris UFS
      ufs_iget() without holding the vfs_dqrwlock, this will
      panic a Solaris system running a Solaris UFS debug module 
      straight away when AFS is performing the first ufs_iget()
      in its startup phase.

2.) AFS is broken with respect of updating inode->i_flags:

    - Found a place where AFS is updating (removing) IACC
      from the inode->i_flags without holding the
      proper locks (inode->i_contents(RW_READER) + inode->i_tlock)
    - This code here is obviously not helding any inode locks
      while updating the inode->i_flag:
    
<snip from: afs-3.6/src/afs/SOLARIS/osi_file.c>
   260  void osi_DisableAtimes(avp)
   261  struct vnode *avp;
   262  {
   263     if (afs_CacheFSType == AFS_SUN_UFS_CACHE) {
   264         struct inode *ip = VTOI(avp);
   265         ip->i_flag &= ~IACC;
   266     }
   267  }   
<snip end>
   
    - A quick search for callers of this function in ./afs-3.6/src/afs/SOLARIS
      gives and indeed unprotected call:
     
<snip from afs-3.6/src/afs/SOLARIS/osi_file.c>
   270  /* Generic read interface */
   271  afs_osi_Read(afile, offset, aptr, asize)
.......
   302      if (code == 0) {
   303          code = asize - resid;
   304          afile->offset += code;
   305          osi_DisableAtimes(afile->vnode);
   306      }
<snip end>
  
   - It should be checked if there are other callers of osi_DisableAtimes() 
     or afs_osi_Read() which do not held the proper locks.
     (as the case 
"afs_UFSCacheStoreProc()->afs_osi_Read()->osi_DisableAtimes()")
    
3.) AFS is broken with respect to usage of ufs_iput():

    - AFS is calling Solaris UFS ufs_iput() in several places,
      but ufs_iput() has been obsoleted since Solaris 2.5 
      in 1995 via bug# 1190137 in delta 2.132 of ufs_inode.c.
    - A comment has been added as a result of this in ufs_inode.c
      to reflect the new locking/calling roles and the new Solaris UFS 
      locking rules has been made visible in /usr/include/sys/fs/ufs_inode.h
      for the rest of the world as well.

<snip from: ufs_inode.c>
 540 /*
 541  * Update times and vrele associated vnode
 542  *
 543  *      DO NOT USE: Please see ufs_inode.h for new locking rules that
 544  *      have obsoleted the use of ufs_iput.  You should be using
 545  *      ITIMES, ITIMES_NOLOCK, and VN_RELE as appropriate.
 546  */
 547 void
 548 ufs_iput(struct inode *ip)
 549 {
 550         ITIMES(ip);
 551         VN_RELE(ITOV(ip));
 552 }
<snip end>
        
   - What's the problem with calling ufs_iput() regardless ?     
     Solaris UFS ufs_iput() is calling ITIMES() without any locks
     held to update the inode->i_flags and the inode modifications times,
     ITIMES() itself is only holding the inode->i_tlock mutex which protects
     the inode's i_flag's. The real problem is now beside the usage of
     obsoleted Solaris UFS ufs_iput() that AFS is'nt holding the proper
     locks when calling Solaris UFS ufs_iput(), it is not holding the
     inode->i_contents lock as RW_READER this opens a race window for a
     thread holding the inode->i_contents lock as RW_WRITER updating the
     inode and did'nt care about the inode->i_tlock. (Note this is ok for
     Solaris UFS internally as it follows the inode locking protocoll, see 
     /usr/include/sys/fs/ufs_inode.h for details)

4.) AFS is broken with respect to usage of ufs_ialloc():

   - AFS is calling Solaris UFS ufs_ialloc() without holding the 
     target directory's inode->i_rwlock as RW_WRITER and
     without holding the target directoy's inode->i_contents lock
     as RW_WRITER.
   - The code in detail:
   
<snip from: ./afs-3.6/src/afs/SOLARIS/osi_inode.c/afs_syscall_icreate()>
<comment of me: it gets the inode via ufs_iget().....>
   132      if (code = (*ufs_iallocp)(ip, near_inode, 0, &newip, credp)) {
   133          (*ufs_iputp)(ip);
   134          return (code);
<snip end>

   - When you check the whole code of this function it turnes out that
     this code actually has 3 problems:
     4.1) the ufs_ialloc() problem as discussed in 4.)
     4.2) the ufs_iput() problem as discussed in 3.)
     4.3) the vfs_dqrwlock problem as discussed in 1.)

One must see these problems in light of the parallel operating
Solaris UFS which rely and depends on the inode locking protocoll.
Not following these rules may lead to deadlocks, hangs, panics
(this is'nt a theorethical consequence, it has been observed)

Solaris UFS general rules:

   - If you're holding the inode->i_contents lock
     as RW_WRITER you could update anything in the inode
     and the inode->i_flags without need to hold the
     inode->i_tlock mutex.
      
   - If you're going to update only fields protected by the
     inode->i_tlock you have to hold the inode->i_contents
     lock as RW_READER *before* acquireing the inode->i_tlock mutex.
      
   - If you're not holding the inode->i_contents as RW_READER
     before getting the inode->i_tlock and updating those
     field (as ufs_iput() does) you're busted by several
     other Solaris UFS functions holding the inode->i_contents lock
     as RW_WRITER and do not care about the inode->i_tlock
     while modifiying the inode (Note: this is ok for
     Solaris UFS in general as it follows this protocoll)
      
   - If you're going to aquire the vfs_dqrwlock there are some policies
     with respect to the inode's i_contents lock (and I've seen AFS using
     it...in osi_inode.c) and things will get more complicated with it.
     The main difference in how the i_contents lock is taken should be
     based on whether or not the vfs_dqrwlock is held - if it is,
     then we need a writer lock, otherwise a reader is sufficient.
     that's how it's *supposed* to work.
    
   - At the end, the locking protocoll used by Solaris UFS
     is explained in /usr/include/sys/fs/ufs_inode.h.
     
   - If you want to see how Solaris UFS is implemented you should
     get a copy of the Free Solaris Foundation Source:
     http://www.sun.com/solaris/source/
     
kind regards
frank

========================================================================
Frank Batschulat                
Solaris CTE/Kernel Sustaining & Engineering

                               
Sun Microsystems GmbH          
mailto:Frank.Batschulat@Germany.Sun.COM
========================================================================