[OpenAFS-devel] AFS and Solaris UFS
Frank Batschulat
Frank Batschulat <Frank.Batschulat@Sun.COM>
Wed, 11 Apr 2001 16:35:16 +0200 (MEST)
Subject: AFS and Solaris UFS
Hi All,
/*
* standard disclaimer applies here: This is NOT an offical
* statement from Sun Microsystems; I just felt it is right to
* tell people about these our findings ...
*/
I dont know how much you care about the Solaris version
of AFS, but if you do, you may be interested in some
facts we discovered while working thru a bug where
TransArc's AFS has become wedged on Solaris (7).
It has taken lot of time to digg this out and while
we at Sun do no have the resources to actually provide
you with the suggested diffs, I thouht we should share
at least this information with you.
The AFS version we've investigated was TransArc AFS 3.6
(production source code provided by TransArc) and the
OpenAFS source I've checked was from:
"http://www.openafs.org/cgi-bin/cvsweb.cgi/openafs/src/"
This code is broken in 4 areas with respect to the way
it makes use of Solaris UFS internal functions.
AFS is hooked on Solaris UFS that way:
<snip from: afs-3.6/src/afs/SOLARIS/osi_vfsops.c>
296 ufs_iputp = (int (*)()) modlookup("ufs", "ufs_iput");
297 ufs_iallocp = (int (*)()) modlookup("ufs", "ufs_ialloc");
298 ufs_iupdatp = (int (*)()) modlookup("ufs", "ufs_iupdat");
299 ufs_igetp = (int (*)()) modlookup("ufs", "ufs_iget");
300 udp_infop = (struct streamtab *) modlookup("udp", "udpinfo");
301 ill_g_headp = (struct ill_s *) modlookup("ip", "ill_g_head");
302
303 if ( !ufs_iputp || !ufs_iallocp || !ufs_iupdatp ||
304 !ufs_igetp || !udp_infop || !ill_g_headp )
305 afs_warn("AFS to UFS mapping cannot be fully initialised\n");
<snip end>
Now the problematical areas in detail:
1.) AFS is broken with respect to Solaris UFS quotas:
- AFS is'nt aware of the vfs_dqrwlock (Manages quota sub-system quiescence,
see /usr/include/sys/fs/ufs_inode.h) when calling
the Solaris UFS internal functions, hence AFS should not be
used in case where UFS quotas are enabled, or vice-versa
UFS quotas should not be turned on on an AFS operated UFS.
- This also makes it hard to get a debug Solaris UFS module
running on a system where AFS is operating.
- This in detail is the case when AFS is calling Solaris UFS
ufs_iget() without holding the vfs_dqrwlock, this will
panic a Solaris system running a Solaris UFS debug module
straight away when AFS is performing the first ufs_iget()
in its startup phase.
2.) AFS is broken with respect of updating inode->i_flags:
- Found a place where AFS is updating (removing) IACC
from the inode->i_flags without holding the
proper locks (inode->i_contents(RW_READER) + inode->i_tlock)
- This code here is obviously not helding any inode locks
while updating the inode->i_flag:
<snip from: afs-3.6/src/afs/SOLARIS/osi_file.c>
260 void osi_DisableAtimes(avp)
261 struct vnode *avp;
262 {
263 if (afs_CacheFSType == AFS_SUN_UFS_CACHE) {
264 struct inode *ip = VTOI(avp);
265 ip->i_flag &= ~IACC;
266 }
267 }
<snip end>
- A quick search for callers of this function in ./afs-3.6/src/afs/SOLARIS
gives and indeed unprotected call:
<snip from afs-3.6/src/afs/SOLARIS/osi_file.c>
270 /* Generic read interface */
271 afs_osi_Read(afile, offset, aptr, asize)
.......
302 if (code == 0) {
303 code = asize - resid;
304 afile->offset += code;
305 osi_DisableAtimes(afile->vnode);
306 }
<snip end>
- It should be checked if there are other callers of osi_DisableAtimes()
or afs_osi_Read() which do not held the proper locks.
(as the case
"afs_UFSCacheStoreProc()->afs_osi_Read()->osi_DisableAtimes()")
3.) AFS is broken with respect to usage of ufs_iput():
- AFS is calling Solaris UFS ufs_iput() in several places,
but ufs_iput() has been obsoleted since Solaris 2.5
in 1995 via bug# 1190137 in delta 2.132 of ufs_inode.c.
- A comment has been added as a result of this in ufs_inode.c
to reflect the new locking/calling roles and the new Solaris UFS
locking rules has been made visible in /usr/include/sys/fs/ufs_inode.h
for the rest of the world as well.
<snip from: ufs_inode.c>
540 /*
541 * Update times and vrele associated vnode
542 *
543 * DO NOT USE: Please see ufs_inode.h for new locking rules that
544 * have obsoleted the use of ufs_iput. You should be using
545 * ITIMES, ITIMES_NOLOCK, and VN_RELE as appropriate.
546 */
547 void
548 ufs_iput(struct inode *ip)
549 {
550 ITIMES(ip);
551 VN_RELE(ITOV(ip));
552 }
<snip end>
- What's the problem with calling ufs_iput() regardless ?
Solaris UFS ufs_iput() is calling ITIMES() without any locks
held to update the inode->i_flags and the inode modifications times,
ITIMES() itself is only holding the inode->i_tlock mutex which protects
the inode's i_flag's. The real problem is now beside the usage of
obsoleted Solaris UFS ufs_iput() that AFS is'nt holding the proper
locks when calling Solaris UFS ufs_iput(), it is not holding the
inode->i_contents lock as RW_READER this opens a race window for a
thread holding the inode->i_contents lock as RW_WRITER updating the
inode and did'nt care about the inode->i_tlock. (Note this is ok for
Solaris UFS internally as it follows the inode locking protocoll, see
/usr/include/sys/fs/ufs_inode.h for details)
4.) AFS is broken with respect to usage of ufs_ialloc():
- AFS is calling Solaris UFS ufs_ialloc() without holding the
target directory's inode->i_rwlock as RW_WRITER and
without holding the target directoy's inode->i_contents lock
as RW_WRITER.
- The code in detail:
<snip from: ./afs-3.6/src/afs/SOLARIS/osi_inode.c/afs_syscall_icreate()>
<comment of me: it gets the inode via ufs_iget().....>
132 if (code = (*ufs_iallocp)(ip, near_inode, 0, &newip, credp)) {
133 (*ufs_iputp)(ip);
134 return (code);
<snip end>
- When you check the whole code of this function it turnes out that
this code actually has 3 problems:
4.1) the ufs_ialloc() problem as discussed in 4.)
4.2) the ufs_iput() problem as discussed in 3.)
4.3) the vfs_dqrwlock problem as discussed in 1.)
One must see these problems in light of the parallel operating
Solaris UFS which rely and depends on the inode locking protocoll.
Not following these rules may lead to deadlocks, hangs, panics
(this is'nt a theorethical consequence, it has been observed)
Solaris UFS general rules:
- If you're holding the inode->i_contents lock
as RW_WRITER you could update anything in the inode
and the inode->i_flags without need to hold the
inode->i_tlock mutex.
- If you're going to update only fields protected by the
inode->i_tlock you have to hold the inode->i_contents
lock as RW_READER *before* acquireing the inode->i_tlock mutex.
- If you're not holding the inode->i_contents as RW_READER
before getting the inode->i_tlock and updating those
field (as ufs_iput() does) you're busted by several
other Solaris UFS functions holding the inode->i_contents lock
as RW_WRITER and do not care about the inode->i_tlock
while modifiying the inode (Note: this is ok for
Solaris UFS in general as it follows this protocoll)
- If you're going to aquire the vfs_dqrwlock there are some policies
with respect to the inode's i_contents lock (and I've seen AFS using
it...in osi_inode.c) and things will get more complicated with it.
The main difference in how the i_contents lock is taken should be
based on whether or not the vfs_dqrwlock is held - if it is,
then we need a writer lock, otherwise a reader is sufficient.
that's how it's *supposed* to work.
- At the end, the locking protocoll used by Solaris UFS
is explained in /usr/include/sys/fs/ufs_inode.h.
- If you want to see how Solaris UFS is implemented you should
get a copy of the Free Solaris Foundation Source:
http://www.sun.com/solaris/source/
kind regards
frank
========================================================================
Frank Batschulat
Solaris CTE/Kernel Sustaining & Engineering
Sun Microsystems GmbH
mailto:Frank.Batschulat@Germany.Sun.COM
========================================================================