[OpenAFS] Re: Crazy DAFS problem (with log)

Tue, 22 Mar 2011 17:22:26 -0400

This is an OpenPGP/MIME signed message (RFC 2440 and 3156)
--------------enig18807DDCED9D95807CC3EA6E
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable

Andrew:

Thanks for taking the time to dive into this.  I agree that at this
point the discussion really needs to be taken to openafs-devel in order
to ensure that the right audience is paying attention.  This e-mail will
hopefully take care of that.

Jeffrey Altman

On 3/22/2011 4:13 PM, Andrew Deason wrote:
> On Mon, 21 Mar 2011 23:25:36 -0500
> Andrew Deason <adeason@sinenomine.net> wrote:
>=20
>> However, I don't think this fully explains the behavior. I haven't
>> checked it out yet (and will not today), but this shouldn't really be
>> causing the fd to be left open. We are calling IH_REALLYCLOSE when the=

>> volume goes offline, and we do appear to be going through the code
>> path that closes all relevant file descriptors. But after that is
>> done, I still see an FdHandle_t with a refcount of 0, in the _OPEN
>> state holding open the problematic inode (this is without the fix
>> mentioned above).  So, it seems like ih_fdclose or whatever isn't
>> doing its job, and it seems like that warrants investigation as well.
>=20
> We're pretty far into -devel territory now, I think, but...
>=20
> I think this is another separate bug, though I don't know if it
> manifests as an actual noticeable issue when gerrit 4272 is in play.
> While we do issue IH_REALLYCLOSE for all of the vnode handles for the
> volume, the handle we're leaving open isn't part of the volume that's
> going offline (in my usage, anyway). This may be a bit hard to explain,=

> but it's something like this...
>=20
> We open some vnode in RW vol for writing, and CoW it; our old handle fo=
r
> inode A is closed, and write to the new handle B (the RW copy). When th=
e
> vol goes offline, handle B sticks around because of the leak fixed by
> 4272. The volume is released, and we write to the vnode again. We CoW i=
t
> so now the RW has a new handle C, and FDH_REALLYCLOSE's handle B.
> However, handle B never goes away, since while we do IH_RELEASE it in
> the CoW/StoreData code, the ref count doesn't go down to 0, since we
> leaked a positive ref earlier.
>=20
> In theory, this should just be a resource leak, and the handle for B
> should not stay open because FDH_REALLYCLOSE is supposed to close the
> underlying fd. However, due to CoW refactoring in 1.6, the original
> target handle is opened twice. StoreData_RXStyle holds the ref in
> origfdP and CopyOnWrite opens another FdHandle_t from the target vnode'=
s
> ihandle. So, when HAVE_PIO is defined, this means that when CoW
> IH_OPEN's the targetvnode->handle, it gets back the same FdHandle_t as
> StorData_RXStyle has, but the ref count is inc'd. And if you look
> closely at fd_reallyclose, you'll see that it is effectively a no-op
> when fd_refcnt > 1.
>=20
> This isn't an issue when gerrit 4272 fixes the ihandle leak, since when=

> the ihandle refcnt reaches 0, we close all associated file descriptors
> anyway.
>=20
> A conservative fix for this I think would be to add some field to
> FdHandle_t to indicate that a reallyclose is needed, so a subsequent
> call to FDH_CLOSE causes an effective FDH_REALLYCLOSE. (Like how
> ihandle_t has the IH_REALLY_CLOSED flag.) A "better" fix would be to ad=
d
> a new fd_status type, but I'd be worried for stable 1.6.x about screwin=
g
> up the other fd_status checks.
>=20

--------------enig18807DDCED9D95807CC3EA6E
Content-Type: application/pgp-signature; name="signature.asc"
Content-Description: OpenPGP digital signature
Content-Disposition: attachment; filename="signature.asc"

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (MingW32)

iQEcBAEBAgAGBQJNiRMVAAoJENxm1CNJffh4CsAH/izcULzNEXD0iZOfzzA/dA4c
pQss5iy7ft45ZJmHHh5zTodV0qZs4SEJCe2j3M7B6tj5mCE7iw532FkhpQZ/bsMi
+vRFD5Xo9dOgMMUlSHJ9wBeSSk3dnrcfVTd3/X2MpzIDzYo+fyWPK+nlQlE7xhP9
wyJlcc+vN9ItYBa3vkLTPQElP34kCeORFLmc3h6oIo3A23pqbv34e+TNbLQ1U6Tt
QEEB+27UKW8Bw6OE2ea/q0zw/c94B1SzQUR+iMzfS77IGRcLlwf6KAMvt/owWvm1
SCHmfkxUhPZx5600ePzqsUKLWzXP9rPMXfUCLRDeyVJt9sIhoPVf/J9ec+sPxO0=
=n7Ey
-----END PGP SIGNATURE-----

--------------enig18807DDCED9D95807CC3EA6E--