[OpenAFS] OpenAFS 1.8.4pre1

Joyce, Stephen stephen@email.unc.edu
Thu, 12 Sep 2019 20:25:18 +0000


An update on my 1.8.4 experiences.

My initial success report was a bit premature.

I still occasionally get I/O errors with certain apps (esp libreoffice)
using 1.8.4. They occur with less frequency than with 1.8.3 and earlier,
however.

An strace shows libreoffice trying to do an openat() and getting an I/O
error.

[pid 44604] openat(AT_FDCWD,
"/afs/cas.unc.edu/home/stephen/.config/libreoffice/4/user/pNumql",
O_RDWR|O_CREAT|O_EXCL, 0600) = -1 EIO (Input/output error)
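
For anyone who wants to try triggering the failing create without launching
libreoffice, here is a minimal sketch. The function name try_excl_create is
made up for illustration. It assumes GNU mktemp, which to my knowledge
creates its file with O_RDWR|O_CREAT|O_EXCL and mode 0600, the same flags as
in the strace line; that is worth confirming with strace on your own system.

```shell
#!/bin/sh
# Hypothetical reproducer: repeatedly create and delete temp files in one
# directory and count how many creates fail.
# Usage: try_excl_create DIR [COUNT]   prints "failures=N attempts=COUNT"
try_excl_create() {
    dir=$1
    count=${2:-100}
    failures=0
    i=0
    while [ "$i" -lt "$count" ]; do
        # A non-zero exit from mktemp means the create failed; mktemp
        # prints the reason (e.g. Input/output error) on stderr.
        if f=$(mktemp "$dir/pXXXXXX"); then
            rm -f "$f"
        else
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    echo "failures=$failures attempts=$count"
}
```

Something like try_excl_create ~/.config/libreoffice/4/user 1000 would
exercise the same directory as above. A clean run does not prove the problem
is absent, since the errors here are intermittent.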

Interestingly, when I try to remove the "user" directory, I get behavior
that smells like a cache problem.

<3:27pm>stephen@lucifer:4>rm -rf user
rm: cannot remove 'user/config': Directory not empty

<3:27pm>stephen@lucifer:4>ls -la user/config
total 4
drwx------ 2 stephen users 2048 Sep 12 15:19 ./
drwx------ 3 stephen users 2048 Sep 12 15:23 ../

<3:28pm>stephen@lucifer:4>fs flush user/config

<3:28pm>stephen@lucifer:4>ls -la user/config/
total 6
drwx------ 2 stephen users 2048 Sep 12 15:19 ./
drwx------ 3 stephen users 2048 Sep 12 15:23 ../
-rw------- 1 stephen users 1703 Sep 12 15:19 javasettings_Linux_X86_64.xml

<3:28pm>stephen@lucifer:4>rm -rf user
rm: cannot remove 'user': Directory not empty

<3:37pm>stephen@lucifer:4>ls -la user
total 4
drwx------ 2 stephen users 2048 Sep 12 15:37 ./
drwx------ 3 stephen users 2048 Sep  3 15:30 ../

<3:37pm>stephen@lucifer:4>fs flush user
<3:37pm>stephen@lucifer:4>ls -la user
total 4
drwx------ 2 stephen users 2048 Sep 12 15:37 ./
drwx------ 3 stephen users 2048 Sep  3 15:30 ../
-rw------- 1 stephen users    0 Sep 12 15:23 K8Wuhr

I can eventually flush enough paths to remove the entire libreoffice/4/user
directory, but the problem recurs on the next launch. Once it gets into
this state, it seems quite reproducible.
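
The manual flushing above could be scripted roughly as follows. This is only
a sketch: flush_tree and the FLUSH_CMD override are names invented here, and
the only AFS command it relies on is "fs flush <path>" as used in the
transcript. Since a stale directory listing can hide entries until that
directory is flushed, one pass may not reach everything, so it may need to be
run more than once.

```shell
#!/bin/sh
# Hypothetical helper: run "fs flush" on every directory under a tree.
# FLUSH_CMD is an assumed override so the command can be swapped for a
# dry run (e.g. FLUSH_CMD=echo); it defaults to "fs flush".
flush_tree() {
    root=$1
    # find only sees what the (possibly stale) cache shows, so entries that
    # are hidden before a flush are picked up on a later pass.
    # Paths containing newlines are not handled.
    find "$root" -type d | while IFS= read -r d; do
        ${FLUSH_CMD:-fs flush} "$d" || echo "flush failed: $d" >&2
    done
}
```

For example, flush_tree ~/.config/libreoffice/4/user followed by
rm -rf on the same path, repeating both if rm still reports
"Directory not empty".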

If it helps, I've seen this behavior on multiple workstations, so I don't
think it's hardware. Thinking it might be callback related, I tested with
the client firewall set to default accept, but it made no apparent
difference.

>uname -a
Linux lucifer 4.15.0-58-generic #64-Ubuntu SMP Tue Aug 6 11:12:41 UTC 2019
x86_64 x86_64 x86_64 GNU/Linux

>lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description:    Ubuntu 18.04.3 LTS
Release:        18.04
Codename:       bionic

>rxdebug `hostname` 7001 -version
Trying 127.0.1.1 (port 7001):
AFS version: OpenAFS 1.8.4~pre1-1~ppa0~ubuntu18.04.1-debian 2019-08-27 root@

>cmdebug -server `hostname` -cache
Chunk files:   31250
Stat caches:   15000
Data caches:   10000
Volume caches: 200
Chunk size:    1048576
Cache size:    1000000 kB
Set time:      no
Cache type:    disk

>uptime
15:51:01 up 2 days,  1:25, 15 users,  load average: 0.47, 0.52, 0.57

Filesystem type is ext4. Storebehind is 0. Cache bypass is disabled.

Rebooting solves this issue but it generally recurs in 1-5 days. With
1.8.3, I'd sometimes need to reboot daily.

Probably doesn't matter given the changes, but 1.6.x on Ubuntu 16.04 on the
same hardware didn't exhibit this symptom.

Replacing the ~/.config/libreoffice/4/user directory with a symlink to a
location on the local disk appears to be a valid workaround.
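
A rough sketch of that workaround, for anyone who wants to try it. The
function name relocate_dir and any local target path are placeholders, not
something from my actual setup; it assumes GNU cp (for -a) and that
libreoffice is not running. If the directory is already in the broken state,
the rm step can fail with "Directory not empty" as shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: move a directory's contents to local disk and leave
# a symlink in its place. Both paths are supplied by the caller.
relocate_dir() {
    src=$1      # directory to replace, e.g. the profile directory in AFS
    dest=$2     # target on local disk (an assumed location)
    if [ -L "$src" ]; then
        echo "already a symlink: $src" >&2
        return 0
    fi
    mkdir -p "$dest" || return 1
    if [ -d "$src" ]; then
        # Copy existing contents first, then remove the original directory.
        cp -a "$src/." "$dest/" || return 1
        rm -rf "$src" || return 1
    fi
    ln -s "$dest" "$src"
}
```

Usage would look like
relocate_dir ~/.config/libreoffice/4/user /var/tmp/$USER/libreoffice-user
where the second path is just an example. The profile then lives only on
that one workstation.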

Tickets/tokens seem fine. Other file accesses work as expected
before/during/after the above.

If any other info would help to diagnose this, let me know.

1.8.4pre1 is still an improvement over 1.8.3. Thanks again!


On Fri, 30 Aug 2019, Joyce, Stephen wrote:

> Just wanted to voice a thank you to all the devs who worked on the recent
> release. 1.8.4pre1 seems to have fixed several issues I was having on some
> Ubuntu 18.04 workstations.
>
> While I haven't done any formal stress-testing, I have noticed no problems
> so far.
>
> ~Stephen
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
>