[OpenAFS] Preliminary findings on today's brokenness
Ben Carter
bhc@pitt.edu
Thu, 14 Jan 2021 10:26:03 -0500
We are running 1.6 code and we definitely have a problem. However,
for us, a sync site is being elected, but running vos examine from a
client seems to hang. Actual access to files in AFS seems to be working
fine, but we have not restarted any file server processes.
Ben
On 1/14/21 10:21 AM, Chaskiel Grundman wrote:
> None of these things is confirmed yet, but based on some analysis and
> testing Carnegie Mellon has done today:
>
> - The problem is in RX (the transport layer), not any of the applications
> - It likely affects 1.8.0 and newer, but not 1.6
> - It seems to be triggered by the RX epoch being after the Unix time
> 0x60000000, aka 1610612736, aka Thu Jan 14 08:25:36 UTC 2021
>
>
> So any cache manager and server that has been running since before that
> time will continue to work until they are restarted. Sites may wish to
> try and avoid having critical systems reboot or restart until a fix or
> workaround for this issue is identified.
>
> If anyone has a system running something 1.8.0 or newer where the command
> vos status afs-01.andrew.cmu.edu -noauth
>
> succeeds, I'd appreciate knowing about it, as it will change this analysis.
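
The threshold quoted above can be checked with a short sketch. It only
confirms that 0x60000000, 1610612736, and the UTC timestamp in the quoted
message are the same instant, and shows how a start time could be compared
against it. The names RX_EPOCH_THRESHOLD and started_after_threshold are
hypothetical and do not come from the OpenAFS source, and treating a
process start time as its RX epoch is an assumption based on the quoted
analysis.

```python
from datetime import datetime, timezone

# Threshold quoted in the message; the constant name is hypothetical.
RX_EPOCH_THRESHOLD = 0x60000000


def started_after_threshold(start_time: int) -> bool:
    """Return True if a start time (Unix seconds) is after the threshold.

    Assumes the RX epoch is roughly the process start time.
    """
    return start_time > RX_EPOCH_THRESHOLD


# Convert the threshold to an aware UTC datetime for comparison with the
# date given in the message.
threshold_utc = datetime.fromtimestamp(RX_EPOCH_THRESHOLD, tz=timezone.utc)

print(RX_EPOCH_THRESHOLD)
print(threshold_utc.isoformat())
```

If the analysis holds, a process whose start time makes
started_after_threshold() return True would be one of those expected to
show the problem.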
--
Ben Carter
System Engineer/Operations
University of Pittsburgh Information Technology
Office: 412-624-6470
bhc@pitt.edu