[OpenAFS] Unexpected no space left on device error

Benjamin Kaduk kaduk@mit.edu
Tue, 13 Nov 2018 21:36:54 -0600


On Tue, Nov 13, 2018 at 08:46:28PM -0500, Theo Ouzhinski wrote:
> Hi all,
> 
> Sorry for my previous incorrectly formatted email.
> Recently, I've seen an uptick in "no space left on device" errors for
> some of the home directories I administer.
> 
> For example,
> 
> matsumoto <USERNAME> # touch a
> touch: cannot touch 'a': No space left on device
> 
> We are not even close to filling up the cache (located at
> /var/cache/openafs) on this client machine.
> 
> matsumoto ~ # fs getcacheparms
> AFS using 10314 of the cache's available 10000000 1K byte blocks.
> matsumoto ~ # df -h
> Filesystem                   Size  Used Avail Use% Mounted on
> ....
> /dev/mapper/vgwrkstn-root    456G   17G  417G   4% /
> ....
> AFS                          2.0T     0  2.0T   0% /afs
> 
> 
> Nor is this home directory, or any other problematic home directory, close
> to its quota.
> 
> matsumoto <USERNAME> # fs lq
> Volume Name                    Quota       Used %Used   Partition
> <VOLUME NAME>              4194304     194403    5%         37%
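To make the quota comparison above concrete, here is a minimal Python sketch (not part of OpenAFS) that parses `fs listquota` output of the shape shown and reports the percentage used, the remaining 1K blocks, and the Partition column. It assumes the five-column layout above. The function name `quota_usage` is hypothetical, and so is the volume name `user.example`, which stands in for the redacted real name.

```python
def quota_usage(fs_lq_output):
    """Return (volume, quota_kb, used_kb, pct_used, free_kb, partition_pct) per row.

    Hypothetical helper; assumes the column layout printed by `fs lq` in this thread:
    Volume Name, Quota, Used, %Used, Partition (quota and usage in 1K blocks).
    """
    rows = []
    for line in fs_lq_output.strip().splitlines()[1:]:  # skip the header line
        fields = line.split()
        try:
            quota, used = int(fields[1]), int(fields[2])
            partition_pct = int(fields[4].rstrip("%"))
        except (IndexError, ValueError):
            continue  # e.g. a volume without a numeric quota, or an unexpected line
        if quota <= 0:
            continue  # avoid dividing by zero for an unlimited quota
        pct = 100.0 * used / quota
        rows.append((fields[0], quota, used, round(pct, 1), quota - used, partition_pct))
    return rows


SAMPLE = """\
Volume Name                    Quota       Used %Used   Partition
user.example                 4194304     194403    5%         37%
"""

if __name__ == "__main__":
    print(quota_usage(SAMPLE))
```

Run against the row above, this gives about 4.6% used, consistent with the 5% that `fs lq` prints, and roughly 4 million 1K blocks of headroom.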
> 
> According to previous posts on this list, many such errors can be attributed
> to high inode usage.  However, this is not the case on our machines.
> 
> Here is sample output from one of our OpenAFS servers; it is similar
> to the output from the other four.
> 
> openafs1 ~ # df -i
> Filesystem         Inodes   IUsed      IFree IUse% Mounted on
> udev              1903816     413    1903403    1% /dev
> tmpfs             1911210     551    1910659    1% /run
> /dev/vda1         1905008  154821    1750187    9% /
> tmpfs             1911210       1    1911209    1% /dev/shm
> tmpfs             1911210       5    1911205    1% /run/lock
> tmpfs             1911210      17    1911193    1% /sys/fs/cgroup
> /dev/vdb         19660800 3461203   16199597   18% /vicepa
> /dev/vdc         19660800 1505958   18154842    8% /vicepb
> tmpfs             1911210       4    1911206    1% /run/user/0
> AFS            2147483647       0 2147483647    0% /afs
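A quick way to repeat the inode check on every server is to filter `df -i` output for high IUse% values. The sketch below is a hypothetical helper, `high_inode_usage`, which assumes the six-column `df -i` layout shown above. The 90% default threshold is an arbitrary choice, and `SAMPLE` is trimmed from the output above.

```python
def high_inode_usage(df_i_output, threshold=90):
    """Return (filesystem, mount point, IUse%) for rows at or above `threshold` percent.

    Hypothetical helper; assumes the `df -i` column layout shown in this thread.
    """
    flagged = []
    # The first line is the "Filesystem Inodes IUsed ..." header.
    for line in df_i_output.strip().splitlines()[1:]:
        fields = line.split()
        # Some filesystems report "-" instead of a percentage; skip those rows.
        if len(fields) < 6 or not fields[4].endswith("%"):
            continue
        pct = int(fields[4].rstrip("%"))
        if pct >= threshold:
            # Join the remaining fields in case the mount point contains spaces.
            flagged.append((fields[0], " ".join(fields[5:]), pct))
    return flagged


SAMPLE = """\
Filesystem         Inodes   IUsed      IFree IUse% Mounted on
/dev/vda1         1905008  154821    1750187    9% /
/dev/vdb         19660800 3461203   16199597   18% /vicepa
/dev/vdc         19660800 1505958   18154842    8% /vicepb
"""

if __name__ == "__main__":
    print(high_inode_usage(SAMPLE))                # nothing is close to 90%
    print(high_inode_usage(SAMPLE, threshold=15))  # only /vicepa is at or above 15%
```

With the default threshold, nothing in the sample is flagged, which matches the statement that inode usage is not the problem here.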
> 
> 
> We are running the latest HWE kernel (4.15.0-38-generic) for Ubuntu
> 16.04, which is the OS on both the server and client machines. On the
> clients, we are running the following versions:
> 
> openafs-client/xenial,now 1.8.2-0ppa2~ubuntu16.04.1 amd64 [installed]
> openafs-krb5/xenial,now 1.8.2-0ppa2~ubuntu16.04.1 amd64 [installed]
> openafs-modules-dkms/xenial,xenial,now 1.8.2-0ppa2~ubuntu16.04.1 all [installed]
> 
> and on the servers, the following versions:
> 
> openafs-client/xenial,now 1.6.15-1ubuntu1 amd64 [installed]
> openafs-dbserver/xenial,now 1.6.15-1ubuntu1 amd64 [installed]
> openafs-fileserver/xenial,now 1.6.15-1ubuntu1 amd64 [installed]
> openafs-krb5/xenial,now 1.6.15-1ubuntu1 amd64 [installed]
> openafs-modules-dkms/xenial,xenial,now 1.6.15-1ubuntu1 all [installed]

(Off-topic, but that looks to be missing some security fixes.)

> What could be the problem? Is there something I missed?

It's not really ringing a bell off the top of my head, no.

That said, there are a number of potential ways to get ENOSPC, so it would
be good to get more data, such as an strace of the failing touch and maybe
a packet capture (port 7000) during the touch, both from a clean cache and
potentially on a second attempt.
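One way to collect both traces in a single run is sketched below in Python. It assumes root access on the client and that `strace` and `tcpdump` are installed. As an approximation of a clean cache, it uses `fs flushvolume` to discard cached data for the volume. The helper names, file names, and the one-second startup delay are hypothetical choices, not an established procedure.

```python
import signal
import subprocess
import time


def build_commands(target_dir, filename="a", iface="any",
                   pcap_file="afs-port7000.pcap", trace_file="touch.strace"):
    """Return argv lists for one attempt (hypothetical helper)."""
    # Discard cached data for the volume holding target_dir (approximates a clean cache).
    flush = ["fs", "flushvolume", "-path", target_dir]
    # Capture full packets to/from port 7000 (the AFS fileserver port) into a file.
    capture = ["tcpdump", "-i", iface, "-s", "0", "-w", pcap_file, "port", "7000"]
    # Trace the failing touch, following any child processes.
    trace = ["strace", "-f", "-o", trace_file, "touch", f"{target_dir}/{filename}"]
    return flush, capture, trace


def run_capture(target_dir, attempt=1, flush_first=True):
    """Run one attempt and return touch's exit status (nonzero is expected here)."""
    flush, capture, trace = build_commands(
        target_dir,
        pcap_file=f"afs-port7000-{attempt}.pcap",
        trace_file=f"touch-{attempt}.strace",
    )
    if flush_first:
        subprocess.run(flush, check=True)
    sniffer = subprocess.Popen(capture)
    try:
        time.sleep(1)  # hypothetical delay to let tcpdump open the interface
        result = subprocess.run(trace)  # no check=True: touch is expected to fail
    finally:
        # SIGINT, like Ctrl-C, lets tcpdump flush and close the capture file.
        sniffer.send_signal(signal.SIGINT)
        sniffer.wait()
    return result.returncode
```

For example, call `run_capture(home, attempt=1)` and then `run_capture(home, attempt=2, flush_first=False)`, where `home` is the affected home directory. Then compare the two strace and pcap files.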

-Ben