[OpenAFS] Re: ARM64 5.4.0-42 with 1.8.5 oops in afs_CellNumValid

Ian Wienand iwienand@redhat.com
Wed, 16 Jun 2021 12:07:38 +1000


Just to follow up on this:

On Wed, Aug 26, 2020 at 05:48:45AM +1000, Ian Wienand wrote:
> [Tue Aug 25 09:43:16 2020] Starting AFS cache scan...
> [Tue Aug 25 09:44:46 2020] Key type afs_pag unregistered

The clue is in those two timestamps: they are exactly 1:30 apart,
which is the default timeout for a service start.  This is also more
obvious if you look in syslog rather than dmesg.  If systemd times out
and aborts while afsd is still in its tight loop getting ready and
iterating over the cache, the technical term is "all hell breaks
loose".

If you have a system that usually, but not always, gets in under the
1:30 startup time limit, this can become a fun race to debug when you
perform a quick reboot to, say, fix a security issue :)  Alternative
platforms like ARM64 (which are a bit slower than the rest of your
infrastructure) or virtualised servers probably exacerbate the
potential for issues.

So watch out for this; you can override the service timeout in
various ways [1].
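
As a rough sketch of one such way, a drop-in file can raise the start
timeout for just the AFS client unit.  The unit name
openafs-client.service, the file name timeout.conf and the 10 minute
value are all assumptions here, not something from the original
report; check what your distribution calls the unit and how long your
cache scan really takes.

```ini
# /etc/systemd/system/openafs-client.service.d/timeout.conf
# Assumption: the client unit is named openafs-client.service; confirm
# the real name with "systemctl list-units" on your system.
[Service]
# Assumed value: pick something comfortably longer than the slowest
# cache scan you have seen in syslog.  The default is the 1:30 above.
TimeoutStartSec=10min
```

After creating the file, run "systemctl daemon-reload" so systemd
picks it up.  "systemctl edit <unit>" creates an equivalent drop-in
and reloads for you.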

-i

[1] https://www.freedesktop.org/software/systemd/man/systemd.unit.html