[OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

Ciprian Dorin Craciun ciprian.craciun@gmail.com
Wed, 20 Nov 2019 20:50:34 +0200

On Wed, Nov 20, 2019 at 7:49 PM Mark Vitale <mvitale@sinenomine.net> wrote:
> > The following are the arguments of `fileserver`:
> > -syslog -sync always -p 4 -b 524288 -l 524288 -s 1048576 -vc 4096 -cb
> > 1048576 -vhandle-max-cachesize 32768 -jumbo -udpsize 67108864
> > -sendsize 67108864 -rxmaxmtu 9000 -rxpck 4096 -busyat 65536
> I see some areas of concern here.  First of all, many of your parameters
> indicate that you expect to run relatively high load through this fileserver.
> Yet there are only -p 4 server threads defined.  The fileserver will automatically
> increase this to the minimum of 6, but that still seems quite low.

These parameters (at least most of them) were empirically identified
for a highly concurrent access pattern: a large number of 16KiB to
20MiB files, from a low number of users (2-3), over a low-latency
network (wired, Gigabit, same LAN).  (I also had an IRC discussion
with Jeffrey about this topic.)

There is a thread on this mailing list from 9th March 2019, with the
subject <<Questions regarding `afsd` caching arguments (`-dcache` and
`-files`)>>, where I've also listed the IRC discussion with Jeffrey
about this topic.  The `-p` argument is explicitly mentioned in that
discussion.

The main use-case of my setup is a home / SOHO file server acting as a
NAS.  Therefore all my parameters are tuned towards low-latency and
high-bandwidth access, at the expense of server RAM (thus the large
number for buffers count and sizes).
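(As a quick sanity check on the RAM side of those flags: `-udpsize` and
`-sendsize` take plain byte counts, while the other flags above are entry
counts, not bytes.  A trivial sketch of what the byte-valued ones amount to:)

```shell
#!/bin/sh
# -udpsize / -sendsize are byte counts; 67108864 bytes is:
udpsize=67108864
echo "$(( udpsize / 1024 / 1024 )) MiB"   # prints "64 MiB" per socket buffer
```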

> This low thread number, combined with a very large -busyat value,
> means that this fileserver will queue a very large backlog before returning
> VBUSY to the client.  Is there a reason you need to keep the fileserver
> threads so low?  Would it be possible for you to increase it dramatically
> (perhaps 100) and try the test again?

I've just increased this number to `-p 128`, and re-executed the
build.  (I haven't restarted the client, but I did restart the file
server.)

Under initial parameters (i.e. 8 parallel builds) I wasn't able to
replicate the issue in 10 tries.

(The solution for this item seemed to be removing `-jumbo` and setting
`-rxmaxmtu 1500` instead of `9000`.)
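(For what it's worth, a quick way to check whether jumbo frames actually
survive the path end-to-end; this is only a sketch, and
`fileserver.example.com` is a placeholder for the actual server name:)

```shell
#!/bin/sh
# A 9000-byte MTU minus 20 (IPv4 header) and 8 (ICMP header) leaves
# 8972 bytes of ping payload for a non-fragmenting probe.
mtu=9000
payload=$(( mtu - 20 - 8 ))
echo "probe payload: $payload bytes"

# With DF set, this fails if any hop drops jumbo frames -- which would
# explain stalls under `-jumbo -rxmaxmtu 9000`.  (Uncomment to run.)
# ping -M do -c 3 -s "$payload" fileserver.example.com
```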

Thus I've deleted ~2K output files and increased the parallelism to
32.  Under these conditions, although the build didn't block, the
receive bandwidth (over wireless) was around 500KiB/s, when I would
have expected more (the input files are much larger than the output
files, for instance ~300KiB in to ~25KiB out), and the task completion
rate seemed very jagged (i.e. no progress for a while, then all of a
sudden 10 tasks would finish at once).  (I should mention that the
workload is not CPU bound; average CPU on the client is around 20%.)

I've tried this second scenario (with the no-Jumbo settings) a few
times and still nothing got stuck.

However, even if the case of the "stuck process for 20 minutes" is
solved, there is still the issue that trying to `SIGTERM` those
waiting processes drives the kernel to 100% CPU.
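(If it helps the diagnosis, this is roughly how the kernel-side state of
such a stuck process could be captured; a sketch only, where the PID below
is a placeholder for the actual stuck process:)

```shell
#!/bin/sh
# Placeholder PID: substitute the PID of the process stuck in SIGTERM.
pid=$$

# Scheduling state: R = running (spinning in kernel), D = uninterruptible.
grep '^State:' "/proc/$pid/status"

# Kernel-side stack of the task; for a spin in the OpenAFS client this
# should show frames inside the libafs module.  (Reading it needs root.)
cat "/proc/$pid/stack" 2>/dev/null || echo "(run as root to read the stack)"
```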

If I can try other experiments, please let me know.