[OpenAFS] OpenAFS client softlockup on concurrential file-system patterns (100% CPU in kernel mode)

Ciprian Dorin Craciun ciprian.craciun@gmail.com
Wed, 20 Nov 2019 21:37:22 +0200


Before replying, I want to note that I think I've stumbled upon three
(perhaps related) issues (some of which might just be configuration
errors):
* AFS file access getting stuck;  (seems to be solved by increasing
the number of `fileserver` threads from `-p 4` to `-p 128`;)
* trying to `SIGTERM` or `SIGKILL` a "stuck" process drives Linux (in
kernel code) to 100% CPU;
* having `-jumbo -rxmaxmtu 9000` on the server, but not on the client,
yields poor performance;

This new thread (which I was just about to open myself) concerns the
third problem: the mismatch between the server's and the client's
jumbo frames settings.




On Wed, Nov 20, 2019 at 8:59 PM Kostas Liakakis <kostas@physics.auth.gr> wrote:
> (Yesterday over wireless I didn't use Jumbo frames, but the day
> before, where the same thing happened, I was using them.)
>
> Does this mean that "the other day with jumbo frames" was over GigE? Does this happen over GigE with jumbo frames disabled as well?


So, apparently, having `-jumbo -rxmaxmtu 9000` on the server without
configuring jumbo frames on the client yields poor performance.

(Also the "getting stuck" issue happens regardless of this other problem.)

Without touching the `fileserver` parameters, none of the following
seems to work:
* `afsd` with `-rxmaxmtu 9000` but without jumbo frames configured on
the network card;  (a clear misconfiguration on my part;)
* `afsd` with `-rxmaxmtu 1500` over Gigabit Ethernet (without
jumbo frames configured on the network card);  (a usual client on the
same network without jumbo frames support;)
* `afsd` with `-rxmaxmtu 1500` over Wi-Fi (which is capable of ~14
MiB/s receive);  (clearly no jumbo frames are supported;)
* as mentioned, only matching the server configuration seems to solve
the issue;
* (encryption is disabled;)
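
To make the first case above easier to spot, here is a minimal sketch (not
an OpenAFS tool) that compares a configured `-rxmaxmtu` value against the
MTU Linux reports for a network card.  The function names and the
`sysfs_root` parameter are my own, hypothetical choices;  the only thing
taken from the system is the standard `/sys/class/net/<iface>/mtu` file.

```python
from pathlib import Path


def read_iface_mtu(iface: str, sysfs_root: str = "/sys/class/net") -> int:
    """Return the MTU that Linux reports for an interface such as "eth0"."""
    return int((Path(sysfs_root) / iface / "mtu").read_text().strip())


def rx_mtu_fits(rxmaxmtu: int, link_mtu: int) -> bool:
    """True when the configured -rxmaxmtu does not exceed the link MTU."""
    return rxmaxmtu <= link_mtu
```

For example, `rx_mtu_fits(9000, read_iface_mtu("eth0"))` would be false on
a card left at the default MTU of 1500 (the interface name `eth0` is only
a placeholder).  Note that this only catches the local misconfiguration;
it says nothing about the server side or the switch in between.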


I've changed the `fileserver` parameters by removing `-jumbo` and
setting `-rxmaxmtu 1400` (I also intend to use this over WAN, thus
over PPPoE and VPN, which will noticeably reduce the usable MTU).
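
As a rough illustration of why a value around 1400 is a cautious choice,
the MTU budget can be sketched as below.  The 8-byte PPPoE overhead is
standard;  the VPN overhead is purely an assumed figure (it depends on the
VPN protocol and cipher), and the helper name is hypothetical.

```python
ETHERNET_MTU = 1500
PPPOE_OVERHEAD = 8          # 6-byte PPPoE header + 2-byte PPP protocol field
ASSUMED_VPN_OVERHEAD = 80   # assumption for illustration only


def usable_mtu(base_mtu: int, *overheads: int) -> int:
    """Subtract each layer's per-packet overhead from the base MTU."""
    return base_mtu - sum(overheads)


pppoe_only = usable_mtu(ETHERNET_MTU, PPPOE_OVERHEAD)
pppoe_and_vpn = usable_mtu(ETHERNET_MTU, PPPOE_OVERHEAD, ASSUMED_VPN_OVERHEAD)
print(pppoe_only, pppoe_and_vpn)
```

Under these assumptions PPPoE alone leaves 1492 bytes, and PPPoE plus the
assumed VPN overhead leaves 1412 bytes, so 1400 stays just below that.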

Now the client works OK.  However, if I start the `afsd` client on the
server itself (i.e. over the `loopback` network), where previously (with
`-jumbo`) I was able to max out the disks (~300 MiB/s), throughput now
seems to be capped at around 120 MiB/s.  (The packet rate is around
120K packets per second...)
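
A back-of-the-envelope estimate of the data packet rate for these numbers
can be sketched as follows.  It is deliberately crude:  it treats the whole
MTU as payload, ignores all headers, and does not count acknowledgements
or any other non-data packets, so it is only a lower bound on what a
packet counter would show.

```python
MIB = 1024 * 1024


def packets_per_second(throughput_mib_s: float, bytes_per_packet: int) -> float:
    """Data packets per second needed to sustain the given throughput."""
    return throughput_mib_s * MIB / bytes_per_packet


# Observed now: ~120 MiB/s with 1400-byte packets.
print(round(packets_per_second(120, 1400)))   # 89878
# Previously: ~300 MiB/s with 9000-byte packets.
print(round(packets_per_second(300, 9000)))   # 34953
# What ~300 MiB/s would require with 1400-byte packets.
print(round(packets_per_second(300, 1400)))   # 224695
```

So reaching the old ~300 MiB/s with the smaller packets would need well
over twice the current data packet rate, which would be consistent with
the cap being per-packet processing cost rather than the disks, although I
have not verified this.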




> I've seen problems finally attributed to jumbo frames where some configuration change on a switch someplace along the path rendered them unusable.


I don't think this is the case here.  I have only one switch between
the client and the server (no other network equipment), and I haven't
encountered performance problems (even with regard to jumbo frames).


Ciprian.