[OpenAFS] Clients are blocked with error code -3 of RXAFSCB_ProbeUuid

huangql huangql@ihep.ac.cn
Tue, 28 Apr 2020 10:30:50 +0800


Hello Ben,

Thank you for your reply.

Our farm has actually been experiencing this issue for some time, and we have spent a lot of time trying to figure it out. We found that the issue gets worse when heavy I/O throughput saturates the network bandwidth and there is a lot of packet loss. After we configured a separate network interface for the client machines in the NetInfo file, the symptoms improved, but the issue still exists.

Still, we all think this case is not handled well. The client should not block; it should report a "timeout" and exit.

The OpenAFS versions we use are listed below:

Server side: OpenAFS 1.6.11
Client side: OpenAFS 1.6.23

Any comments or suggestions would be appreciated.

Best wishes,
Qiulan

huangql
====================================================================
Computing Center, the Institute of High Energy Physics, CAS, China
Qiulan Huang                    Tel: (+86) 10 8823 6087
P.O. Box 918-7                  Fax: (+86) 10 8823 6839
Beijing 100049  P.R. China      Email: huangql@ihep.ac.cn
====================================================================

From: Benjamin Kaduk
Date: 2020-04-28 06:29
To: huangql
CC: openafs-info; huqb
Subject: Re: [OpenAFS] Clients are blocked with error code -3 of RXAFSCB_ProbeUuid

On Mon, Apr 27, 2020 at 09:16:14AM +0800, huangql wrote:
> Hello All,
>
> We found some clients blocked. No operations under /afs work any more;
> commands such as "cd" and "ls" all block.
>
> We can see some log messages on the server side showing error code -3:
>
> Mon Apr 27 08:00:34 2020 CheckHost_r: Probing all interfaces of host 192.168.63.194:7001 failed, code -3
> Mon Apr 27 08:07:37 2020 CheckHost_r: Probing all interfaces of host 192.168.63.219:7001 failed, code -3
>
> Restarting the AFS service did not restore /afs; only restarting the client nodes did.
>
> Has anyone seen a similar case? Any suggestions would be appreciated. Thanks.

That's an interesting error code to be seeing;
https://www.central.org/pages/numbers/errors.html shows -3 as
RX_CALL_TIMEOUT, which does not seem to match your description of the
issue.  A brief glance at the code indicates that we can also generate this
error locally if our clock is moving backwards a lot.

I don't expect the above to be helpful, and don't recall any similar cases,
but figured it is better to reply with what little I know than to leave
your message with no reply.

-Ben
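
To see which clients are affected and how often, the CheckHost_r lines quoted in this thread can be tallied per host. The following is only a minimal Python sketch and was not part of the original messages. The regular expression is modeled solely on the two log lines shown here, so other OpenAFS releases may word the message differently, and the names count_probe_failures and SAMPLE_LOG are hypothetical.

```python
import re
from collections import Counter

# Assumption: the pattern is modeled on the two server log lines quoted in
# this thread; it is not taken from the OpenAFS sources.
PROBE_FAILURE = re.compile(
    r"CheckHost_r: Probing all interfaces of host "
    r"(?P<host>\d{1,3}(?:\.\d{1,3}){3}):(?P<port>\d+) "
    r"failed, code (?P<code>-?\d+)"
)


def count_probe_failures(lines):
    """Return a Counter keyed by (host, code) for every failed-probe line."""
    failures = Counter()
    for line in lines:
        match = PROBE_FAILURE.search(line)
        if match:
            failures[(match["host"], int(match["code"]))] += 1
    return failures


# The two log lines from the original report, used as sample input.
SAMPLE_LOG = [
    "Mon Apr 27 08:00:34 2020 CheckHost_r: Probing all interfaces of host "
    "192.168.63.194:7001 failed, code -3",
    "Mon Apr 27 08:07:37 2020 CheckHost_r: Probing all interfaces of host "
    "192.168.63.219:7001 failed, code -3",
]

if __name__ == "__main__":
    for (host, code), count in sorted(count_probe_failures(SAMPLE_LOG).items()):
        print(f"{host}: {count} failed probe(s), code {code}")
```

Run against a full server log instead of SAMPLE_LOG, a tally like this might help show whether the failed probes cluster on particular clients or on periods of heavy network load.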