[OpenAFS] Unexpected no space left on device error
Benjamin Kaduk
kaduk@mit.edu
Tue, 13 Nov 2018 21:36:54 -0600
On Tue, Nov 13, 2018 at 08:46:28PM -0500, Theo Ouzhinski wrote:
> Hi all,
>
> Sorry for my previous incorrectly formatted email.
> Recently, I've seen an uptick in "no space left on device" errors for
> some of the home directories I administer.
>
> For example,
>
> matsumoto <USERNAME> # touch a
> touch: cannot touch 'a': No space left on device
>
> We are not even close to filling up the cache (located at
> /var/cache/openafs) on this client machine.
>
> matsumoto ~ # fs getcacheparms
> AFS using 10314 of the cache's available 10000000 1K byte blocks.
> matsumoto ~ # df -h
> Filesystem                 Size  Used Avail Use% Mounted on
> ....
> /dev/mapper/vgwrkstn-root  456G   17G  417G   4% /
> ....
> AFS                        2.0T     0  2.0T   0% /afs
>
>
> Nor is this home directory or any other problematic home directory close
> to their quota.
>
> matsumoto <USERNAME> # fs lq
> Volume Name      Quota    Used %Used Partition
> <VOLUME NAME>  4194304  194403    5%       37%
>
> According to previous posts on this list, many issues can be attributed
> to high inode usage. However, this is not the case on our machines.
>
> Here is sample output from one of our OpenAFS servers, which is similar
> to all of the four other ones.
>
> openafs1 ~ # df -i
> Filesystem     Inodes   IUsed      IFree IUse% Mounted on
> udev          1903816     413    1903403    1% /dev
> tmpfs         1911210     551    1910659    1% /run
> /dev/vda1     1905008  154821    1750187    9% /
> tmpfs         1911210       1    1911209    1% /dev/shm
> tmpfs         1911210       5    1911205    1% /run/lock
> tmpfs         1911210      17    1911193    1% /sys/fs/cgroup
> /dev/vdb     19660800 3461203   16199597   18% /vicepa
> /dev/vdc     19660800 1505958   18154842    8% /vicepb
> tmpfs         1911210       4    1911206    1% /run/user/0
> AFS        2147483647       0 2147483647    0% /afs
>
>
> We are running the latest HWE kernel (4.15.0-38-generic) for Ubuntu
> 16.04 (which is the OS for both server and client machines). We are
> running on the clients, the following versions:
>
> openafs-client/xenial,now 1.8.2-0ppa2~ubuntu16.04.1 amd64 [installed]
> openafs-krb5/xenial,now 1.8.2-0ppa2~ubuntu16.04.1 amd64 [installed]
> openafs-modules-dkms/xenial,xenial,now 1.8.2-0ppa2~ubuntu16.04.1 all [installed]
>
> and on the servers, the following versions:
>
> openafs-client/xenial,now 1.6.15-1ubuntu1 amd64 [installed]
> openafs-dbserver/xenial,now 1.6.15-1ubuntu1 amd64 [installed]
> openafs-fileserver/xenial,now 1.6.15-1ubuntu1 amd64 [installed]
> openafs-krb5/xenial,now 1.6.15-1ubuntu1 amd64 [installed]
> openafs-modules-dkms/xenial,xenial,now 1.6.15-1ubuntu1 all [installed]

(Off-topic, but that looks to be missing some security fixes.)

> What could be the problem? Is there something I missed?

It's not really ringing a bell off the top of my head, no.
That said, there are a number of potential ways to get ENOSPC, so it would
be good to get more data, such as an strace of the failing touch and maybe
a packet capture (port 7000) taken during the touch, both starting from a
clean cache and potentially on a second attempt.
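
A rough sketch of how that data might be collected is below. It assumes
strace and tcpdump are installed and that it is run as root. The function
name collect_enospc_data, the output directory, and the file names are
made up for illustration, and clearing the cache between runs is left out
since the right way to do that depends on the setup.

```shell
#!/bin/sh
# Sketch only: trace the failing touch while capturing port 7000 traffic.
# collect_enospc_data and all file names are hypothetical, chosen for
# illustration. Assumes strace and tcpdump are installed; run as root.

collect_enospc_data() {
    dir=$1   # directory in AFS where the touch fails
    out=$2   # local (non-AFS) directory for the collected files
    tag=$3   # label for this attempt, e.g. "first" or "second"

    mkdir -p "$out" || return 1

    # Start the packet capture in the background, limited to port 7000.
    tcpdump -i any -w "$out/$tag.pcap" port 7000 &
    tcpdump_pid=$!
    sleep 1  # give tcpdump a moment to start capturing

    # Record the system calls (with timestamps) made by the failing touch.
    strace -f -tt -o "$out/$tag.strace" touch "$dir/a"
    status=$?

    sleep 1  # let any trailing packets arrive before stopping the capture
    kill "$tcpdump_pid"
    wait "$tcpdump_pid" 2>/dev/null || true

    # strace exits with the traced command's status, so pass that along.
    return "$status"
}
```

Calling it once as "collect_enospc_data /path/to/homedir /root/enospc first"
and again with "second" (both paths being placeholders) would leave one
.strace file and one .pcap file per attempt.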
-Ben