[OpenAFS] Preliminary findings on today's brokenness

Benjamin Kaduk kaduk@mit.edu
Thu, 14 Jan 2021 07:30:25 -0800


Jeffrey has done some analysis that is consistent with your results, and
posted patches at
https://gerrit.openafs.org/#/c/14491
https://gerrit.openafs.org/#/c/14492

We'll be reviewing those shortly.

-Ben

On Thu, Jan 14, 2021 at 10:21:22AM -0500, Chaskiel Grundman wrote:
> None of these things is confirmed yet, but based on some analysis and
> testing Carnegie Mellon has done today:
> 
> - The problem is in RX (the transport layer), not any of the applications
> - It likely affects 1.8.0 and newer, but not 1.6
> - It seems to be triggered by the RX epoch being after the unix time
>   0x60000000, aka 1610612736, aka Thu Jan 14 08:25:36 UTC 2021
> 
> 
> So any cache manager or server that has been running since before that
> time will continue to work until it is restarted. Sites may wish to try
> to avoid having critical systems reboot or restart until a fix or
> workaround for this issue is identified.
> 
> If anyone has a system running something 1.8.0 or newer where the command
> vos status afs-01.andrew.cmu.edu -noauth
> 
> succeeds, I'd appreciate knowing about it, as it will change this analysis.
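
For anyone who wants to double-check the timestamp arithmetic in the quoted
message, here is a minimal Python sketch. It only converts the quoted value
0x60000000 to decimal and to a UTC date, and compares a start time against
it. The names THRESHOLD, describe and started_after_threshold are
hypothetical and used for illustration only. This is not the RX code, and it
assumes (as the preliminary analysis above suggests, unconfirmed) that what
matters is whether the process started after that time.

```python
from datetime import datetime, timezone

# Value quoted in the message above; everything else here is illustrative.
THRESHOLD = 0x60000000


def describe(ts):
    """Format a unix timestamp as a UTC date string."""
    utc = datetime.fromtimestamp(ts, tz=timezone.utc)
    return utc.strftime("%a %b %d %H:%M:%S UTC %Y")


def started_after_threshold(start_time):
    """Return True if a (hypothetical) process start time, given as a unix
    timestamp, falls after the quoted threshold."""
    return start_time > THRESHOLD


if __name__ == "__main__":
    print(THRESHOLD)            # decimal form of 0x60000000
    print(describe(THRESHOLD))  # UTC date of the threshold
    # A process started one hour before the threshold vs. one hour after.
    print(started_after_threshold(THRESHOLD - 3600))
    print(started_after_threshold(THRESHOLD + 3600))
```

The last two lines print False and True: under that assumption, a process
started an hour before the threshold would be unaffected until restarted,
while one started an hour after would be affected.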