[OpenAFS] Client connection failure: bos failed to contact host's bosserver
(communication failure (-1))
Ximeng (Simon) Guan
xmgu@royole.com
Mon, 7 Jan 2019 19:40:36 +0000
Hello,
After a power outage on Christmas Eve, which forced two database servers and
all the network switches in one of our offices to reboot, our laptop clients
in that office can no longer connect to one of the AFS servers hosted in the
same office.
I am leaning towards the possibility that it is a network problem instead of
an OpenAFS service problem because:
1. Remote offices can access the full AFS space, including those volumes
hosted on the rebooted servers.
2. Between the servers there is no access problem. Nothing is wrong with the
results of "bos status", "rxdebug" or "udebug". "fs checkservers" shows that
all servers are running.
3. On the problematic laptops "fs checkservers" shows that "All servers are
running".
4. On the problematic laptops "bos status afssrv1" returns the message:
"bos: failed to contact host's bosserver (communications failure (-1))."
But on the servers, both in that office and in the remote offices, the same
command shows that all services are up:
"Instance ptserver, currently running normally.
Instance vlserver, currently running normally.
Instance buserver, currently running normally.
Instance upserver, currently running normally.
Instance backupusers, currently running normally.
Auxiliary status is: run next at Tue Jan 8 04:00:00 2019.
Instance dafs, currently running normally.
Auxiliary status is: file server running."
5. On the problematic laptops "rxdebug afssrv1 -port 7000" returns *normal*
output, for example:
"Trying 10.12.8.33 (port 7000):
Free packets: 2073/6357, packet reclaims: 3, calls: 81, used FDs: 36
not waiting for packets.
0 calls waiting for a thread
125 threads are idle
1 calls have waited for a thread
Connection from host 10.9.119.50, port 7001, Cuid ae06e5b3/70fe0104
serial 12, natMTU 1344, security index 0, client conn
call 0: # 4, state dally, mode: receiving, flags: receive_done
call 1: # 0, state not initialized
call 2: # 0, state not initialized
call 3: # 0, state not initialized
Connection from host 10.12.4.74, port 7001, Cuid ae06e5b3/70fe0114
serial 21, natMTU 1344, security index 0, client conn
call 0: # 7, state dally, mode: receiving, flags: receive_done
call 1: # 0, state not initialized
call 2: # 0, state not initialized
call 3: # 0, state not initialized
Done."
I do not administer the network. Can I have some advice on how to further
debug the connection problem? Which UDP port does the command "bos status"
use?
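One possible next step is to repeat the rxdebug check from item 5 against the other AFS service ports. The sketch below is only an illustration under stated assumptions: it uses the IANA-registered afs3 port numbers (under which the bosserver, and therefore "bos status", would be on 7007/udp), it calls rxdebug in the same form as quoted above, and it treats a "Free packets" line in the output as a sign that the port answered. The helper names and that heuristic are hypothetical and not part of OpenAFS.

```python
import subprocess

# IANA-registered afs3 UDP ports for the services mentioned in this message.
# Assumption: the cell uses the default ports.
AFS_PORTS = {
    7000: "fileserver",
    7002: "ptserver",
    7003: "vlserver",
    7005: "volserver",
    7007: "bosserver",
    7008: "upserver",
}


def rxdebug_command(host, port):
    """Build the same rxdebug invocation as quoted in item 5."""
    return ["rxdebug", host, "-port", str(port)]


def probe(host, port, run=subprocess.run, timeout=30):
    """Return True if rxdebug output looks like a normal reply.

    Heuristic (assumption): a normal reply contains a "Free packets" line,
    as in the output quoted above. A missing binary or a timeout counts
    as "no reply".
    """
    try:
        result = run(
            rxdebug_command(host, port),
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return "Free packets" in result.stdout


def probe_all(host, run=subprocess.run):
    """Probe every port in AFS_PORTS and return {port: answered}."""
    return {port: probe(host, port, run=run) for port in AFS_PORTS}


if __name__ == "__main__":
    # "afssrv1" is the server name used in this message.
    for port, answered in probe_all("afssrv1").items():
        status = "answered" if answered else "no reply"
        print(f"{port}/udp ({AFS_PORTS[port]}): {status}")
```

If port 7000 answers from a laptop while 7007 does not, that would be consistent with something on the path filtering UDP by port rather than with a bosserver fault, and would be a concrete detail to hand to the network administrators.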
Thank you!
Best regards,
========================================
Ximeng (Simon) Guan, Ph.D.
Associate Principal Engineer
Royole Corporation
48025 Fremont Blvd, Fremont, CA 94538
========================================