[OpenAFS] token loss?

Andrei Maslennikov andrei.maslennikov@gmail.com
Mon, 5 Dec 2005 10:22:59 +0100


On 12/5/05, Ryan Charles Underwood <rcuca4@umr.edu> wrote:
>
> Nope, they are definitely synced up.  Does anyone have any ideas how
> I can dig into this further?  Right now backups are non-functional
> because of it.
>
>
We have seen errors, probably of similar origin, during AFS backups.
Our TCs (tape coordinators) were running 1.2.13 while the fileservers
were already at 1.4.0. The error manifested itself as a loss of token
after several volumes had been backed up successfully; all subsequent
backup operations then failed with a TExxx message similar to this:

  Sun Nov 27 00:38:40: Task 4002: Volume foo.bar (536912108) failed
  rxk: sealed data inconsistent
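
To see how many volumes are affected, one could scan the TC log for this
pair of lines. The following is only a rough sketch in Python: the line
shape is assumed from the sample above, and the names failed_with_rxk,
FAILED and RXK_ERROR are made up for illustration, they are not part of
OpenAFS.

```python
import re

# Assumed line shape, taken from the sample above:
#   "<timestamp>: Task <id>: Volume <name> (<numeric id>) failed"
FAILED = re.compile(r"Task (\d+): Volume (\S+) \((\d+)\) failed")
RXK_ERROR = "rxk: sealed data inconsistent"


def failed_with_rxk(lines):
    """Return (task, volume, volume_id) for each failure line that is
    immediately followed by the rxk error line."""
    hits = []
    pending = None
    for line in lines:
        m = FAILED.search(line)
        if m:
            # Remember the failure; the next line decides whether it counts.
            pending = (int(m.group(1)), m.group(2), int(m.group(3)))
            continue
        if pending and RXK_ERROR in line:
            hits.append(pending)
        pending = None
    return hits


sample = [
    "  Sun Nov 27 00:38:40: Task 4002: Volume foo.bar (536912108) failed",
    "  rxk: sealed data inconsistent",
]
print(failed_with_rxk(sample))
```

Failures followed by any other message are skipped on purpose, so the
list only contains volumes that hit this particular error.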

The problem could be cured by "vos remove *.backup" followed by a
new "vos backupsys", but it would reappear within 2-3 days.

We have now migrated the TCs to 1.4.0 and made sure that all machines
involved are connected to the same GigE switch. There has not been a
single problem in the 5 days since; we are still monitoring it. But the
problem is apparently still there and may show up again. I believe it
has something to do with timing, some sort of race condition which only
pops up on a fast network. Should it happen again, we will try to debug
butc/volserver.

Andrei.
