[OpenAFS] Slow loading of virtually hosted web content

Benjamin Kaduk kaduk@mit.edu
Fri, 19 Nov 2021 15:28:12 -0800


Hi Kendrick,

While I don't have specific advice for you, I'll make a few high-level
points:

- most of the community knowledge about tuning, both for fileservers and
  cache managers, has been recorded in various workshop presentations over
  the years.  That should include what parameters to tune and how to
  determine whether changes are helping (or hurting).

- there's a somewhat fundamental performance bottleneck in the OpenAFS Rx
  RPC implementation, such that performance degrades as the latency between
  cache manager and fileserver increases.  It sounds like that's not your
  immediate issue, but it is something to keep in mind.

- It's possible for fileserver configuration to be the issue even if the
  fileservers themselves don't seem to be under heavy load.  For example,
  if the fileserver doesn't have enough space to store callback entries for
  all the active clients in normal use, it will be sending callback breaks
  and reducing the caching efficiency of the clients.
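
To put rough numbers on the latency and callback points, here is a back-of-envelope sketch in Python. It assumes (the message does not say this) that the Rx limit behaves like a fixed number of packets in flight per round trip, so per-call throughput falls off as 1/RTT; the 32-packet window and 1400-byte payload are hypothetical illustration values, not OpenAFS defaults. The callback estimate assumes each web server holds a callback on roughly as many files as its -stat setting and that the load is spread evenly across fileservers; the result would be compared with the fileserver's -cb setting. The function names are made up for illustration.

```python
# Back-of-envelope estimates; every number here is a hypothetical illustration.


def window_limited_throughput(window_packets: int, payload_bytes: int, rtt_ms: float) -> float:
    """Upper bound in bytes/second when at most `window_packets` packets of
    `payload_bytes` each may be unacknowledged during one round trip."""
    bytes_per_round_trip = window_packets * payload_bytes
    return bytes_per_round_trip * 1000 / rtt_ms


def callbacks_per_fileserver(clients: int, files_per_client: int, fileservers: int) -> float:
    """Rough callback demand per fileserver, assuming each client holds a
    callback on `files_per_client` files and the load is spread evenly."""
    return clients * files_per_client / fileservers


if __name__ == "__main__":
    # Hypothetical 32-packet window with ~1400-byte payloads.
    for rtt_ms in (1, 10, 50):
        mb_per_s = window_limited_throughput(32, 1400, rtt_ms) / 1e6
        print(f"RTT {rtt_ms:>2} ms: at most {mb_per_s:.2f} MB/s per call")

    # 6 web servers, -stat 100000 as a proxy for files held, 4 fileservers.
    demand = callbacks_per_fileserver(6, 100_000, 4)
    print(f"about {demand:,.0f} callbacks per fileserver (compare with its -cb value)")
```

The point of the first estimate is only the shape of the curve: a tenfold increase in RTT cuts the per-call ceiling tenfold, whatever the real window size is.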

Best of luck,

Ben

On Wed, Nov 10, 2021 at 03:27:43PM -0500, Kendrick Hernandez wrote:
> Hi all,
> 
> We host around 240 departmental and campus web sites (each in its own AFS
> volume) across 6 virtual web servers on AFS storage. The web servers are
> 4-core, 16 GB VMs, and the 4 file servers are 4-core, 32 GB VMs. All are
> CentOS 7 systems.
> 
> In the past week or so, we've seen high load on the web servers (the main
> consumers being apache and afsd) during periods of increased traffic, and
> we're trying to identify ways to tune performance. After seeing the
> following in the logs:
> 
> > 2021 11 08 08:52:03 -05:00 virthost4 [kern.warning] kernel: afs: Warning:
> > We are having trouble keeping the AFS stat cache trimmed down under the
> > configured limit (current -stat setting: 3000, current vcache usage: 18116).
> > 2021 11 08 08:52:03 -05:00 virthost4 [kern.warning] kernel: afs: If AFS
> > access seems slow, consider raising the -stat setting for afsd.
> 
> 
> I increased the disk cache to 10 GB and the -stat parameter to 100000, which
> has improved things somewhat, but we're not quite there yet. This is the
> current client cache configuration from one of the web servers:
> 
> > Chunk files:   281250
> > Stat caches:   100000
> > Data caches:   10000
> > Volume caches: 200
> > Chunk size:    1048576
> > Cache size:    9000000 kB
> > Set time:      no
> > Cache type:    disk
> 
> 
> Has anyone else experienced this? I think the bottleneck is in the cache
> manager rather than the file servers, because the file servers don't seem
> to be affected much during those periods of high load, and I can access
> files in those web volumes from my local client without any noticeable lag.
> 
> Any guidance on what to look at regarding performance would be much
> appreciated.
> 
> Thank you!
> k-
> 
> -- 
> Kendrick Hernandez
> *UNIX Systems Administrator*
> Division of Information Technology
> University of Maryland, Baltimore County
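
The figures quoted in the message can be cross-checked with a little arithmetic. The Python sketch below pulls the two numbers out of the stat-cache warning (rejoined onto one line here, an assumption about the raw log format) and derives two ratios from the cache parameters: 9000000 kB spread over 281250 chunk files is exactly 32 kB per file, while only 8789 chunks of the configured 1 MB size would fit in the cache. The function and constant names are hypothetical; how afsd itself chooses its default file count is not modelled here, so treat this as a sanity check rather than a description of afsd.

```python
import re

# Values copied from the warning and cache parameters quoted above.
WARNING = ("afs: Warning: We are having trouble keeping the AFS stat cache trimmed "
           "down under the configured limit (current -stat setting: 3000, "
           "current vcache usage: 18116).")

CACHE_PARAMS = {
    "chunk_files": 281_250,
    "cache_size_kb": 9_000_000,
    "chunk_size_bytes": 1_048_576,
}


def parse_stat_warning(line: str) -> tuple[int, int]:
    """Return (stat_setting, vcache_usage) from the stat-cache warning text."""
    m = re.search(r"-stat setting: (\d+), current vcache usage: (\d+)", line)
    if m is None:
        raise ValueError("not a stat-cache warning")
    return int(m.group(1)), int(m.group(2))


def cache_arithmetic(p: dict) -> dict:
    """Average cache space per chunk file, and how many full-size chunks fit."""
    chunk_kb = p["chunk_size_bytes"] // 1024
    return {
        "avg_kb_per_chunk_file": p["cache_size_kb"] / p["chunk_files"],
        "full_size_chunks_that_fit": p["cache_size_kb"] // chunk_kb,
    }


if __name__ == "__main__":
    stat, used = parse_stat_warning(WARNING)
    print(f"vcache usage {used} was {used / stat:.1f}x the -stat limit of {stat}")
    for name, value in cache_arithmetic(CACHE_PARAMS).items():
        print(f"{name}: {value}")
```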