From cg2v@andrew.cmu.edu Fri Mar 26 14:14:10 2021
From: cg2v@andrew.cmu.edu (Chaskiel M Grundman)
Date: Fri, 26 Mar 2021 13:14:10 +0000
Subject: [OpenAFS-devel] short CacheItems reads - AND - vcache locking for afs_InvalidateAllSegments
Message-ID: <58a0f3617d62409087a702fe821330ed@andrew.cmu.edu>

While investigating a performance issue affecting timeshares at our institution (which I am provisionally blaming on other clients driving up IO load on the fileservers), I encountered a rerun of an issue that's been reported on openafs-info twice before:

[42342.692729] afs: disk cache read error in CacheItems slot 100849 off 8067940/8750020 code -5/80
(repeated)

But this one ends differently than https://lists.openafs.org/pipermail/openafs-info/2018-October/042576.html or https://lists.openafs.org/pipermail/openafs-info/2020-April/042930.html:

[42342.697743] afs: Failed to invalidate cache chunks for fid NNN.NNN.NNN.NNN; our local disk cache may be throwing errors. We must invalidate these chunks to avoid possibly serving incorrect data, so we'll retry until we succeed. If AFS access seems to hang, this may be why.

[42342.697771] openafs: assertion failed: WriteLocked(&tvc->lock), file: /var/lib/dkms/openafs/1.8.6-2.el7_9/build/src/libafs/MODLOAD-3.10.0-1160.6.1.el7.x86_64-SP/afs_daemons.c, line: 606

The first thing I'm going to assert is that this isn't a hardware error. It affects multiple virtual systems, and no IO errors are logged by the kernel.

My assertion is that EIO is coming from osi_rdwr, which turns a short read or write into EIO. My supposition, and that of others who have looked at this, is that the source of the problem is using ext4 as a cache (and perhaps also the dedicated cache filesystem being >80% full), and we're remediating that on these systems.

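To make that concrete, here is a minimal userspace sketch of the pattern I mean (read_cache_record() and its arguments are made up for the illustration; this is not the real osi_rdwr): a transfer that comes back shorter than requested is reported as EIO, indistinguishable from a genuine device error.

/* Hypothetical illustration of the short-read-becomes-EIO pattern. */
#include <errno.h>
#include <sys/types.h>
#include <unistd.h>

static int
read_cache_record(int fd, void *buf, size_t len, off_t off)
{
    ssize_t got = pread(fd, buf, len, off);

    if (got < 0)
        return EIO;             /* a genuine I/O error from the kernel */
    if ((size_t)got != len)
        return EIO;             /* short read: currently also surfaces as EIO */
    return 0;
}
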
This does leave us with two problems in openafs:

* The use of EIO, leading to claims that people have hardware errors when they may not.
* The lock breakage.

For the former, I'd recommend that either the short IOs be logged, or a different code (perhaps ENODATA, if available?) be used to differentiate them from hardware errors.

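As a rough illustration of what I have in mind (the function name is invented for the sketch, and this is not a patch against the real osi_rdwr), the short transfer would be logged and mapped to something other than EIO:

/* Sketch only: make the short transfer visible and return a code that
 * can't be mistaken for a hardware error.  ENODATA may not exist on
 * every platform, hence the fallback to the old behaviour. */
#include <errno.h>
#include <stdio.h>

#ifndef ENODATA
#define ENODATA EIO
#endif

static int
map_short_io(const char *what, size_t requested, size_t transferred)
{
    if (transferred == requested)
        return 0;

    fprintf(stderr, "afs: short %s: wanted %zu bytes, got %zu\n",
            what, requested, transferred);
    return ENODATA;
}
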
For the latter, I believe there's an inconsistency in the locking requirements of afs_InvalidateAllSegments.
This comment claims the lock is held:
/*
 * Ask a background daemon to do this request for us. Note that _we_ hold
 * the write lock on 'avc', while the background daemon does the work. This
 * is a little weird, but it helps avoid any issues with lock ordering
 * or if our caller does not expect avc->lock to be dropped while
 * running.
 */

When called from afs_StoreAllSegments's error path, avc->lock is clearly held, because StoreAllSegments itself downgrades and upgrades the lock.

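In sketch form (a paraphrase of that flow, not the real afs_StoreAllSegments; do_the_stores() is a made-up stand-in, the lock-site number is arbitrary, and the invalidate call is shown with a simplified signature):

static int
store_all_segments_sketch(struct vcache *avc)   /* caller holds the write lock */
{
    int code;

    ConvertWToSLock(&avc->lock);          /* downgrade while the stores run */
    code = do_the_stores(avc);            /* stand-in for the real store loop */
    UpgradeSToWLock(&avc->lock, 767);     /* back to a write lock */

    if (code)
        afs_InvalidateAllSegments(avc);   /* error path: write lock held here */

    return code;
}
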
When called from afs_dentry_iput via afs_InactiveVCa=
che, it seems like it isn’t.
None of the callers on any platform seems to lock th=
e cache before calling inactive. (unless on some platforms there’s al=
iasing between a VFS level lock and vc->lock).
afs_remunlink expects to be called with avc unlocked=
.
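If the comment above is the intended contract, the simplest shape for the iput/inactive path would be to take the write lock around the call, roughly as below. This is only a sketch under that assumption, not a tested patch: the lock-site number is arbitrary, the call is shown with a simplified signature, and whether it's safe to take avc->lock at that point (or whether the contract should instead be relaxed so the background op locks the vcache itself) is exactly the question I'm raising.

/* Sketch only, assuming afs_InvalidateAllSegments() must be entered with
 * avc->lock write-held. */
static void
inactive_invalidate_sketch(struct vcache *avc)
{
    ObtainWriteLock(&avc->lock, 331);
    afs_InvalidateAllSegments(avc);
    ReleaseWriteLock(&avc->lock);
}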