[OpenAFS] Preliminary findings on today's brokenness
Benjamin Kaduk
kaduk@mit.edu
Thu, 14 Jan 2021 07:30:25 -0800
Jeffrey has done some analysis that is consistent with your results, and
posted patches at
https://gerrit.openafs.org/#/c/14491
https://gerrit.openafs.org/#/c/14492
We'll be reviewing those shortly.
-Ben
On Thu, Jan 14, 2021 at 10:21:22AM -0500, Chaskiel Grundman wrote:
> None of these things is confirmed yet, but based on some analysis and
> testing Carnegie Mellon has done today:
>
> - The problem is in RX (the transport layer), not any of the applications
> - It likely affects 1.8.0 and newer, but not 1.6
> - It seems to be triggered by the RX epoch being after the Unix time
>   0x60000000, aka 1610612736, aka Thu Jan 14 08:25:36 UTC 2021
>
>
> So any cache managers and servers that have been running since before that
> time will continue to work until they are restarted. Sites may wish to try
> to avoid having critical systems reboot or restart until a fix or
> workaround for this issue is identified.
>
> If anyone has a system running something 1.8.0 or newer where the command
>     vos status afs-01.andrew.cmu.edu -noauth
>
> succeeds, I'd appreciate knowing about it, as it will change this analysis.
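
For anyone who wants to double-check the timestamp arithmetic quoted above, here is a minimal sketch. It only confirms that 0x60000000, 1610612736, and the quoted UTC date are the same instant. The helper started_after_threshold is a hypothetical name for illustration, not anything in OpenAFS, and treating "after" as a strict comparison is an assumption. It says nothing about the underlying RX bug, which is unconfirmed at this point.

```python
from datetime import datetime, timezone

# Threshold quoted in the message above, as a Unix time in seconds.
THRESHOLD = 0x60000000


def started_after_threshold(start_time: int) -> bool:
    """Hypothetical helper: True if a start time (Unix seconds) falls
    after the quoted threshold. Strict '>' is an assumption."""
    return start_time > THRESHOLD


# Interpret the threshold as a UTC timestamp.
threshold_utc = datetime.fromtimestamp(THRESHOLD, tz=timezone.utc)

print(THRESHOLD)                                            # decimal form
print(threshold_utc.strftime("%a %b %d %H:%M:%S UTC %Y"))   # human-readable form
```

By the working hypothesis in the quoted message, a process whose start time makes started_after_threshold return True would be one that might show the problem. That is a way to reason about which hosts are at risk, not a diagnosis.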