[OpenAFS] OpenAFS client softlockup on highly concurrential file-system patterns (100% CPU in kernel mode)

Benjamin Kaduk kaduk@mit.edu
Sun, 24 Nov 2019 16:53:06 -0800


On Tue, Nov 19, 2019 at 01:53:59PM +0200, Ciprian Dorin Craciun wrote:
> 
> My setup is as follows:
> 
> * OpenSUSE Tumbleweed, kernel 5.3.9-1-default, client package
> `openafs-client` and `openafs-kmp-default` at `1.8.5_k5.3.9_1-1.3` as
> provided by OpenSUSE;
> 
> * `afsd` parameters (neither memory cache (on `tmpfs`) nor disk cache seems
> to help;  nor does changing the daemons from 4 to 1;  encryption is off):
> 
> ~~~~
> -verbose -blocks 7864320 -chunksize 17 -files 524288 -files_per_subdir 128
> -dcache 524288 -stat 524288 -volumes 128 -splitcache 90/10 -afsdb
> -dynroot-sparse -fakestat-all -inumcalc md5 -backuptree -daemons 1
> -rxmaxfrags 8 -rxmaxmtu 1500 -rxpck 4096 -nosettime
> ~~~~
> 
> ~~~~
> -verbose -memcache -blocks 1048576 -chunksize 17 -stat 524288 -volumes 128
> -splitcache 90/10 -afsdb -dynroot-sparse -fakestat-all -inumcalc md5
> -backuptree -daemons 1 -rxmaxfrags 8 -rxmaxmtu 1500 -rxpck 4096 -nosettime
> ~~~~
> 
> * the server is also on OpenSUSE Leap 15.0, with `openafs-server` package at
> `1.8.0-lp150.2.2.1` as provided by OpenSUSE;
> 
> * I suspect that the issue is due to the latest kernel version, because
> I ran similar patterns a few weeks ago on an older kernel (but still from
> the `5.x` family), but I can't say for sure;

I see the diagnostics and further data points later in the thread, but are
you in a position to boot an older kernel to attempt to confirm/refute this
hypothesis?

Thanks,

Ben