[OpenAFS-devel] short CacheItems reads - AND - vcache locking for afs_InvalidateAllSegments

Chaskiel M Grundman cg2v@andrew.cmu.edu
Fri, 26 Mar 2021 13:14:10 +0000


--_000_58a0f3617d62409087a702fe821330edandrewcmuedu_
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

While investigating a performance issue affecting timeshares at our institu=
tion (which I am provisionally blaming on other clients driving up IO load =
on the fileservers), I encountered a rerun of an issue that's been reported=
 on openafs-info twice before:

[42342.692729] afs: disk cache read error in CacheItems slot 100849 off 806=
7940/8750020 code -5/80
(repeated)

But this one ends differently than https://lists.openafs.org/pipermail/open=
afs-info/2018-October/042576.html or https://lists.openafs.org/pipermail/op=
enafs-info/2020-April/042930.html

[42342.697743] afs: Failed to invalidate cache chunks for fid NNN.NNN.NNN.N=
NN; our local disk cache may be throwing errors. We must invalidate these c=
hunks to avoid possibly serving incorrect data, so we'll retry until we suc=
ceed. If AFS access seems to hang, this may be why.
[42342.697771] openafs: assertion failed: WriteLocked(&tvc->lock), file: /v=
ar/lib/dkms/openafs/1.8.6-2.el7_9/build/src/libafs/MODLOAD-3.10.0-1160.6.1.=
el7.x86_64-SP/afs_daemons.c, line: 606

The first thing I'm going to assert is that this isn't a hardware error. It=
 affects multiple virtual systems, and no IO errors are logged by the kerne=
l.
My assertion is that EIO is coming from osi_rdwr, which will turn a short r=
ead or write into EIO. The supposition of myself and others who have looked=
 at this is that the source of the problem is using ext4 as a cache (and pe=
rhaps also the dedicated cache filesystem being >80% full), and we're remed=
iating that on these systems.


This does leave us with two problems in openafs:

  *   The use of EIO, leading to claims that people have hardware errors wh=
en they may not.
  *   The lock breakage.

For the former, I'd recommend that either the short IOs be logged, or a dif=
ferent code (perhaps ENODATA if available?) used to differentiate it from h=
ardware errors.

For the latter, I believe that there's inconsistency about the locking requ=
irements of  afs_InvalidateAllSegments.
This comment claims the lock is held:
        /*
         * Ask a background daemon to do this request for us. Note that _we=
_ hold
         * the write lock on 'avc', while the background daemon does the wo=
rk. This
         * is a little weird, but it helps avoid any issues with lock order=
ing
         * or if our caller does not expect avc->lock to be dropped while
         * running.
         */
When called from afs_StoreAllSegments's error path, avc->lock is clearly he=
ld, because StoreAllSegments itself downgrades and upgrades the lock.
When called from afs_dentry_iput via afs_InactiveVCache, it seems like it i=
sn't.
None of the callers on any platform seems to lock the cache before calling =
inactive. (unless on some platforms there's aliasing between a VFS level lo=
ck and vc->lock).
afs_remunlink expects to be called with avc unlocked.


--_000_58a0f3617d62409087a702fe821330edandrewcmuedu_
Content-Type: text/html; charset="us-ascii"
Content-Transfer-Encoding: quoted-printable

<html xmlns:v=3D"urn:schemas-microsoft-com:vml" xmlns:o=3D"urn:schemas-micr=
osoft-com:office:office" xmlns:w=3D"urn:schemas-microsoft-com:office:word" =
xmlns:m=3D"http://schemas.microsoft.com/office/2004/12/omml" xmlns=3D"http:=
//www.w3.org/TR/REC-html40">
<head>
<meta http-equiv=3D"Content-Type" content=3D"text/html; charset=3Dus-ascii"=
>
<meta name=3D"Generator" content=3D"Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
	{font-family:Wingdings;
	panose-1:5 0 0 0 0 0 0 0 0 0;}
@font-face
	{font-family:"Cambria Math";
	panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
	{font-family:Calibri;
	panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
	{margin:0in;
	font-size:11.0pt;
	font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
	{mso-style-priority:99;
	color:#0563C1;
	text-decoration:underline;}
p.MsoListParagraph, li.MsoListParagraph, div.MsoListParagraph
	{mso-style-priority:34;
	margin-top:0in;
	margin-right:0in;
	margin-bottom:0in;
	margin-left:.5in;
	font-size:11.0pt;
	font-family:"Calibri",sans-serif;}
span.EmailStyle17
	{mso-style-type:personal-compose;
	font-family:"Calibri",sans-serif;
	color:windowtext;}
.MsoChpDefault
	{mso-style-type:export-only;
	font-family:"Calibri",sans-serif;}
@page WordSection1
	{size:8.5in 11.0in;
	margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
	{page:WordSection1;}
/* List Definitions */
@list l0
	{mso-list-id:1947300032;
	mso-list-type:hybrid;
	mso-list-template-ids:2076479194 -1836283246 67698691 67698693 67698689 67=
698691 67698693 67698689 67698691 67698693;}
@list l0:level1
	{mso-level-start-at:2021;
	mso-level-number-format:bullet;
	mso-level-text:-;
	mso-level-tab-stop:none;
	mso-level-number-position:left;
	text-indent:-.25in;
	font-family:"Calibri",sans-serif;
	mso-fareast-font-family:Calibri;}
@list l0:level2
	{mso-level-number-format:bullet;
	mso-level-text:o;
	mso-level-tab-stop:none;
	mso-level-number-position:left;
	text-indent:-.25in;
	font-family:"Courier New";}
@list l0:level3
	{mso-level-number-format:bullet;
	mso-level-text:\F0A7;
	mso-level-tab-stop:none;
	mso-level-number-position:left;
	text-indent:-.25in;
	font-family:Wingdings;}
@list l0:level4
	{mso-level-number-format:bullet;
	mso-level-text:\F0B7;
	mso-level-tab-stop:none;
	mso-level-number-position:left;
	text-indent:-.25in;
	font-family:Symbol;}
@list l0:level5
	{mso-level-number-format:bullet;
	mso-level-text:o;
	mso-level-tab-stop:none;
	mso-level-number-position:left;
	text-indent:-.25in;
	font-family:"Courier New";}
@list l0:level6
	{mso-level-number-format:bullet;
	mso-level-text:\F0A7;
	mso-level-tab-stop:none;
	mso-level-number-position:left;
	text-indent:-.25in;
	font-family:Wingdings;}
@list l0:level7
	{mso-level-number-format:bullet;
	mso-level-text:\F0B7;
	mso-level-tab-stop:none;
	mso-level-number-position:left;
	text-indent:-.25in;
	font-family:Symbol;}
@list l0:level8
	{mso-level-number-format:bullet;
	mso-level-text:o;
	mso-level-tab-stop:none;
	mso-level-number-position:left;
	text-indent:-.25in;
	font-family:"Courier New";}
@list l0:level9
	{mso-level-number-format:bullet;
	mso-level-text:\F0A7;
	mso-level-tab-stop:none;
	mso-level-number-position:left;
	text-indent:-.25in;
	font-family:Wingdings;}
ol
	{margin-bottom:0in;}
ul
	{margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext=3D"edit" spidmax=3D"1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext=3D"edit">
<o:idmap v:ext=3D"edit" data=3D"1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang=3D"EN-US" link=3D"#0563C1" vlink=3D"#954F72" style=3D"word-wrap:=
break-word">
<div class=3D"WordSection1">
<p class=3D"MsoNormal">While investigating a performance issue affecting ti=
meshares at our institution (which I am provisionally blaming on other clie=
nts driving up IO load on the fileservers), I encountered a rerun of an iss=
ue that&#8217;s been reported on openafs-info
 twice before:<o:p></o:p></p>
<p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoNormal">[42342.692729] afs: disk cache read error in CacheIt=
ems slot 100849 off 8067940/8750020 code -5/80<o:p></o:p></p>
<p class=3D"MsoNormal">(repeated)<o:p></o:p></p>
<p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoNormal">But this one ends differently than <a href=3D"https:=
//lists.openafs.org/pipermail/openafs-info/2018-October/042576.html">
https://lists.openafs.org/pipermail/openafs-info/2018-October/042576.html</=
a> or <a href=3D"https://lists.openafs.org/pipermail/openafs-info/2020-Apri=
l/042930.html">
https://lists.openafs.org/pipermail/openafs-info/2020-April/042930.html</a>=
<o:p></o:p></p>
<p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoNormal">[42342.697743] afs: Failed to invalidate cache chunk=
s for fid NNN.NNN.NNN.NNN; our local disk cache may be throwing errors. We =
must invalidate these chunks to avoid possibly serving incorrect data, so w=
e'll retry until we succeed. If AFS
 access seems to hang, this may be why.<o:p></o:p></p>
<p class=3D"MsoNormal">[42342.697771] openafs: assertion failed: WriteLocke=
d(&amp;tvc-&gt;lock), file: /var/lib/dkms/openafs/1.8.6-2.el7_9/build/src/l=
ibafs/MODLOAD-3.10.0-1160.6.1.el7.x86_64-SP/afs_daemons.c, line: 606<o:p></=
o:p></p>
<p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoNormal">The first thing I&#8217;m going to assert is that th=
is isn&#8217;t a hardware error. It affects multiple virtual systems, and n=
o IO errors are logged by the kernel.<o:p></o:p></p>
<p class=3D"MsoNormal">My assertion is that EIO is coming from osi_rdwr, wh=
ich will turn a short read or write into EIO. The supposition of myself and=
 others who have looked at this is that the source of the problem is using =
ext4 as a cache (and perhaps also
 the dedicated cache filesystem being &gt;80% full), and we&#8217;re remedi=
ating that on these systems.<o:p></o:p></p>
<p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoNormal">This does leave us with two problems in openafs:<o:p=
></o:p></p>
<ul style=3D"margin-top:0in" type=3D"disc">
<li class=3D"MsoListParagraph" style=3D"margin-left:0in;mso-list:l0 level1 =
lfo1">The use of EIO, leading to claims that people have hardware errors wh=
en they may not.<o:p></o:p></li><li class=3D"MsoListParagraph" style=3D"mar=
gin-left:0in;mso-list:l0 level1 lfo1">The lock breakage.<o:p></o:p></li></u=
l>
<p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoNormal">For the former, I&#8217;d recommend that either the =
short IOs be logged, or a different code (perhaps ENODATA if available?) us=
ed to differentiate it from hardware errors.<o:p></o:p></p>
<p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
<p class=3D"MsoNormal">For the latter, I believe that there&#8217;s inconsi=
stency about the locking requirements of &nbsp;afs_InvalidateAllSegments.<o=
:p></o:p></p>
<p class=3D"MsoNormal">This comment claims the lock is held:<o:p></o:p></p>
<p class=3D"MsoNormal">&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;/*<o:p></=
o:p></p>
<p class=3D"MsoNormal">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; * A=
sk a background daemon to do this request for us. Note that _we_ hold<o:p><=
/o:p></p>
<p class=3D"MsoNormal">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; * t=
he write lock on 'avc', while the background daemon does the work. This<o:p=
></o:p></p>
<p class=3D"MsoNormal">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; * i=
s a little weird, but it helps avoid any issues with lock ordering<o:p></o:=
p></p>
<p class=3D"MsoNormal">&nbsp;&nbsp;&nbsp;&nbsp; &nbsp;&nbsp;&nbsp;&nbsp;* o=
r if our caller does not expect avc-&gt;lock to be dropped while<o:p></o:p>=
</p>
<p class=3D"MsoNormal">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; * r=
unning.<o:p></o:p></p>
<p class=3D"MsoNormal">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; */<=
o:p></o:p></p>
<p class=3D"MsoNormal">When called from afs_StoreAllSegments&#8217;s error =
path, avc-&gt;lock is clearly held, because StoreAllSegments itself downgra=
des and upgrades the lock.<o:p></o:p></p>
<p class=3D"MsoNormal">When called from afs_dentry_iput via afs_InactiveVCa=
che, it seems like it isn&#8217;t.<o:p></o:p></p>
<p class=3D"MsoNormal">None of the callers on any platform seems to lock th=
e cache before calling inactive. (unless on some platforms there&#8217;s al=
iasing between a VFS level lock and vc-&gt;lock).<o:p></o:p></p>
<p class=3D"MsoNormal">afs_remunlink expects to be called with avc unlocked=
.<o:p></o:p></p>
<p class=3D"MsoNormal"><o:p>&nbsp;</o:p></p>
</div>
</body>
</html>

--_000_58a0f3617d62409087a702fe821330edandrewcmuedu_--