[OpenAFS-devel] Problem with mounts in AFS on CentOS 7.4 with openafs 1.6.2[01].1

Ragnar Sundblad ragge@csc.kth.se
Wed, 20 Dec 2017 16:53:14 +0100


Hi Mark,


Just to report back:

We have tried your (no longer recommended) patch
https://gerrit.openafs.org/#/c/12796/
as you pointed out in the thread "getcwd() error for RHEL 7.4 kernel"
in the openafs-info list.

As far as we have seen, this indeed solved our disappearing mount point
problems.

We will of course switch to the new version of the patch (or maybe just
1.8.0) as soon as there is one.

Thanks for your work!


Best regards,

/ragge


> On 3 Nov 2017, at 17:29, Ragnar Sundblad <ragge@csc.kth.se> wrote:
>
>
> Hi Mark,
>=20
>> On 3 Nov 2017, at 15:51, Mark Vitale <mvitale@sinenomine.net> wrote:
>>
>> Ragge,
>>
>>> On Nov 3, 2017, at 9:46 AM, Ragnar Sundblad <ragge@csc.kth.se> wrote:
>>>
>>> We have compute clusters where the nodes keep almost all of their
>>> root file system in afs: most things in /, such as /etc and /usr,
>>> are soft links into a complete os installation in afs. To have some
>>> writable files and directories, such as /etc/adjtime or /var/tmp, we
>>> bind mount files and directories into the tree that actually lives
>>> in afs (mainly using the rwtab functionality). We also have a lustre
>>> client that gets mounted in the afs tree.
>>>
>>> After we upgraded from CentOS 7.3 to 7.4 (kernel
>>> 3.10.0-693.5.2.el7.x86_64), using OpenAFS client 1.6.21.1 or
>>> 1.6.20.1, mounts in the afs tree start to get randomly unmounted
>>> when users with home directories in afs log in and start accessing
>>> their data. In the lustre case, the lustre client nicely reports
>>> that it unmounts, so the unmounts seem to be handled in an orderly
>>> manner.
>>>
>>> We have a suspicion this may be related to the problem reported in
>>> the thread “getcwd() error for RHEL 7.4 kernel”, and that the kernel
>>> for some reason decides that the path to the mount point is no good
>>> and unmounts it.
>>> In addition, once this has started to happen, we are not able to
>>> mount anything more into afs; mount returns ENOENT.
>>>
>>> This is pretty easy to repeat.
>> Thank you for your detailed report.
>> I have an idea about what this may be, but I will try to duplicate it
>> on my test system first.
>
> Thanks for investigating! :-)
>
>>> Our workaround for now is to use the tmpfs-based root all the way
>>> down to the mount points, and have soft links into afs further down
>>> for the rest, which seems to work.
>> It’s good that you have a workaround; thank you for sharing that as
>> well.
>>
>>> Please let us know if we can provide any help debugging this.
>> For now I would like to see your afsd options, and also the output
>> from ‘cmdebug <client> -cache’ for an affected client.
>
> We start it like so:
> /bin/chroot /sysimage /usr/vice/etc/afsd -memcache -verbose -nosettime -dynroot -mountdir /afs
> (Before systemd is started, we set up the runtime root in /sysimage,
> then chroot there, and start systemd to let it bring up the system.)
>
> Here is a cmdebug:
> # cmdebug tegner-login-2 -cache
> Chunk files:   1562
> Stat caches:   2343
> Data caches:   1562
> Volume caches: 200
> Chunk size:    65536
> Cache size:    100000 kB
> Set time:      no
> Cache type:    memory
>
> I now see that I forgot to mention that we use memory cache (since the
> nodes are diskless).
>
>> Although you haven’t reported the getcwd() problem, could you please
>> confirm if you’ve seen it or not?
>
> We have not seen it, but we haven’t really looked for it either. Is
> there some test we could try?
>
>> And finally, just to confirm, you have seen bind mounts in /afs
>> unmounted at CentOS 7.4 with both OpenAFS 1.6.21.1 and 1.6.20.1, but
>> _not_ with CentOS 7.3 and those same OpenAFS client releases - correct?
>
> With 7.3 (kernel 3.10.0-514.26.2.el7.x86_64) we actually used openafs
> client 1.6.20.2, but with that combination this mount-within-afs thing
> worked just fine.
>
> Thanks!
>
> /ragge
>
> _______________________________________________
> OpenAFS-devel mailing list
> OpenAFS-devel@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-devel