From ragge@csc.kth.se Fri Nov 3 14:46:38 2017
From: ragge@csc.kth.se (Ragnar Sundblad)
Date: Fri, 3 Nov 2017 14:46:38 +0100
Subject: [OpenAFS-devel] Problem with mounts in AFS on CentOS 7.4 with openafs 1.6.2[01].1
Message-ID: <3D3A5B80-7B2C-4968-B3AA-95A6EBC774C5@csc.kth.se>

Hi all,

We have compute clusters where the nodes have almost everything of their roots in afs; most things in /, as /etc and /usr, are soft links into a complete os installation in afs. To be able to have some writable files and directories, such as /etc/adjtime or /var/tmp, we bind mount files and directories in the tree which is actually in afs (mainly using the rwtab functionality), and a lustre client that also gets mounted in the afs tree.

When we upgraded from CentOS 7.3 to 7.4, kernel 3.10.0-693.5.2.el7.x86_64, and using OpenAFS client 1.6.21.1 or 1.6.20.1, when users having home directories in afs log in and start accessing their data, mounts in the afs tree start to get randomly unmounted. In the lustre case, the lustre client nicely reports that it unmounts, so the unmounts seem to be handled in an orderly manner.

We have a suspicion this may be related to the problem reported in the thread “getcwd() error for RHEL 7.4 kernel”, and that the kernel for some reason decides that path to the mount point is no good and unmounts.
In addition, when this has started to happen, we are not able to mount anything more into afs, mount returns ENOENT.

This is pretty easy to repeat.

Our workaround for now is to use the tmpfs based root all the way down to the mount points, and have soft links into afs further down for the rest, which seems to work.

Please let us know if we can provide any help debugging this.
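
[Editorial sketch, not part of the original message.] One way to log which mounts disappear while reproducing the problem described above is to compare snapshots of /proc/self/mountinfo. The following is a minimal sketch under stated assumptions: the affected mounts appear under a known prefix (/afs is only an example), mount point paths contain no escaped characters, and the function names (mount_points, vanished, watch) are hypothetical helpers, not OpenAFS tools.

```python
import time


def mount_points(mountinfo_text, prefix):
    """Return the set of mount points at or below `prefix`.

    `mountinfo_text` is the content of /proc/<pid>/mountinfo; the mount
    point is the fifth whitespace-separated field of each line.
    Octal escapes such as \\040 are NOT decoded in this sketch.
    """
    below = prefix.rstrip("/") + "/"
    points = set()
    for line in mountinfo_text.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        mount_point = fields[4]
        if mount_point == prefix or mount_point.startswith(below):
            points.add(mount_point)
    return points


def vanished(before, after):
    """Return the sorted mount points present in `before` but not in `after`."""
    return sorted(before - after)


def watch(prefix, interval=5.0, path="/proc/self/mountinfo"):
    """Poll the mount table forever and print mounts that disappear.

    `prefix` and `interval` are assumptions to adjust per site.
    """
    with open(path) as f:
        previous = mount_points(f.read(), prefix)
    while True:
        time.sleep(interval)
        with open(path) as f:
            current = mount_points(f.read(), prefix)
        for gone in vanished(previous, current):
            print(time.strftime("%Y-%m-%d %H:%M:%S"), "unmounted:", gone)
        previous = current
```

Calling watch("/afs") on an affected node would print a timestamp for each mount that goes away, which could then be matched against user logins or kernel messages.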
/ragge

PDC Center for High Performance Computing, KTH Royal Institute of Technology, Stockholm, Sweden

From mvitale@sinenomine.net Fri Nov 3 15:51:17 2017
From: mvitale@sinenomine.net (Mark Vitale)
Date: Fri, 3 Nov 2017 14:51:17 +0000
Subject: [OpenAFS-devel] Problem with mounts in AFS on CentOS 7.4 with openafs 1.6.2[01].1
In-Reply-To: <3D3A5B80-7B2C-4968-B3AA-95A6EBC774C5@csc.kth.se>
References: <3D3A5B80-7B2C-4968-B3AA-95A6EBC774C5@csc.kth.se>
Message-ID: <1366404F-9A0D-4357-A89B-65FB1DE77983@sinenomine.net>

Ragge,

> On Nov 3, 2017, at 9:46 AM, Ragnar Sundblad <ragge@csc.kth.se> wrote:
>
> We have compute clusters where the nodes have almost everything of their roots in afs; most things in /, as /etc and /usr, are soft links into a complete os installation in afs. To be able to have some writable files and directories, such as /etc/adjtime or /var/tmp, we bind mount files and directories in the tree which is actually in afs (mainly using the rwtab functionality), and a lustre client that also gets mounted in the afs tree.
>
> When we upgraded from CentOS 7.3 to 7.4, kernel 3.10.0-693.5.2.el7.x86_64, and using OpenAFS client 1.6.21.1 or 1.6.20.1, when users having home directories in afs log in and start accessing their data, mounts in the afs tree start to get randomly unmounted. In the lustre case, the lustre client nicely reports that it unmounts, so the unmounts seem to be handled in an orderly manner.
>
> We have a suspicion this may be related to the problem reported in the thread “getcwd() error for RHEL 7.4 kernel”, and that the kernel for some reason decides that path to the mount point is no good and unmounts.
> In addition, when this has started to happen, we are not able to mount anything more into afs, mount returns ENOENT.
>
> This is pretty easy to repeat.
Thank you for your detailed report.
I have an idea about what this may be, but I will try to duplicate it on my test system first.

> Our workaround for now is to use the tmpfs based root all the way down to the mount points, and have soft links into afs further down for the rest, which seems to work.
It’s good that you have a workaround; thank you for sharing that as well.

> Please let us know if we can provide any help debugging this.
For now I would like to see your afsd options, and also the output from ‘cmdebug <client> -cache’ for an affected client.

Although you haven’t reported the getcwd() problem, could you please confirm if you’ve seen it or not?

And finally, just to confirm, you have seen bind mounts in /afs unmounted at CentOS 7.4 with both OpenAFS 1.6.21.1 and 1.6.20.1, but _not_ with CentOS 7.3 and those same OpenAFS client releases - correct?

Thanks,
--
Mark Vitale
OpenAFS release team

From ragge@csc.kth.se Fri Nov 3 17:29:58 2017
From: ragge@csc.kth.se (Ragnar Sundblad)
Date: Fri, 3 Nov 2017 17:29:58 +0100
Subject: [OpenAFS-devel] Problem with mounts in AFS on CentOS 7.4 with openafs 1.6.2[01].1
In-Reply-To: <1366404F-9A0D-4357-A89B-65FB1DE77983@sinenomine.net>
References: <3D3A5B80-7B2C-4968-B3AA-95A6EBC774C5@csc.kth.se> <1366404F-9A0D-4357-A89B-65FB1DE77983@sinenomine.net>
Message-ID:

Hi Mark,

> On 3 Nov 2017, at 15:51, Mark Vitale wrote:
>
> Ragge,
>
>> On Nov 3, 2017, at 9:46 AM, Ragnar Sundblad wrote:
>>
>> We have compute clusters where the nodes have almost everything of their roots in afs; most things in /, as /etc and /usr, are soft links into a complete os installation in afs. To be able to have some writable files and directories, such as /etc/adjtime or /var/tmp, we bind mount files and directories in the tree which is actually in afs (mainly using the rwtab functionality), and a lustre client that also gets mounted in the afs tree.
>>
>> When we upgraded from CentOS 7.3 to 7.4, kernel 3.10.0-693.5.2.el7.x86_64, and using OpenAFS client 1.6.21.1 or 1.6.20.1, when users having home directories in afs log in and start accessing their data, mounts in the afs tree start to get randomly unmounted. In the lustre case, the lustre client nicely reports that it unmounts, so the unmounts seem to be handled in an orderly manner.
>>
>> We have a suspicion this may be related to the problem reported in the thread “getcwd() error for RHEL 7.4 kernel”, and that the kernel for some reason decides that path to the mount point is no good and unmounts.
>> In addition, when this has started to happen, we are not able to mount anything more into afs, mount returns ENOENT.
>>
>> This is pretty easy to repeat.
> Thank you for your detailed report.
> I have an idea about what this may be, but I will try to duplicate it on my test system first.

Thanks for investigating! :-)

>> Our workaround for now is to use the tmpfs based root all the way down to the mount points, and have soft links into afs further down for the rest, which seems to work.
> It’s good that you have a workaround; thank you for sharing that as well.
>
>> Please let us know if we can provide any help debugging this.
> For now I would like to see your afsd options, and also the output from ‘cmdebug <client> -cache’ for an affected client.

We start it like so:

/bin/chroot /sysimage /usr/vice/etc/afsd -memcache -verbose -nosettime -dynroot -mountdir /afs

(Before systemd is started, we set up the runtime root in /sysimage, then chroot there, and start systemd to let it bring up the system.)

Here is a cmdebug:

# cmdebug tegner-login-2 -cache
Chunk files: 1562
Stat caches: 2343
Data caches: 1562
Volume caches: 200
Chunk size: 65536
Cache size: 100000 kB
Set time: no
Cache type: memory

I now see that I forgot to mention that we use memory cache (since the nodes are diskless).

> Although you haven’t reported the getcwd() problem, could you please confirm if you’ve seen it or not?

We have not seen it, but we haven’t really looked for it either. Is there some test we could try?

> And finally, just to confirm, you have seen bind mounts in /afs unmounted at CentOS 7.4 with both OpenAFS 1.6.21.1 and 1.6.20.1, but _not_ with CentOS 7.3 and those same OpenAFS client releases - correct?

With 7.3 (kernel 3.10.0-514.26.2.el7.x86_64) we actually used openafs client 1.6.20.2, but with that combination this mount-within-afs thing worked just fine.

Thanks!

/ragge

From jason@rampaginggeek.com Sun Nov 26 14:40:42 2017
From: jason@rampaginggeek.com (Jason Edgecombe)
Date: Sun, 26 Nov 2017 09:40:42 -0500
Subject: [OpenAFS-devel] Phasing out as buildbot admin
Message-ID:

Hi everyone,

I want to let everyone know that I'm phasing out as the buildbot admin. I haven't been doing much with it lately, and I'd rather hand it off to someone else that is more involved. Please direct all of your buildbot communication to Benjamin Kaduk and Michael Meffie. I'll still be around, but I'll mostly be lurking. It's been a pleasure to work with everyone, and I wish everyone the best.

Sincerely,

Jason

From derek@ihtfp.com Sun Nov 26 15:04:53 2017
From: derek@ihtfp.com (Derek Atkins)
Date: Sun, 26 Nov 2017 10:04:53 -0500
Subject: [OpenAFS-devel] Re: [OpenAFS] Phasing out as buildbot admin
In-Reply-To:
References:
Message-ID: <22c8b41436a9947fd917a75501ea7f20.squirrel@mail2.ihtfp.org>

Thank you for all your service!

-derek

On Sun, November 26, 2017 9:40 am, Jason Edgecombe wrote:
> Hi everyone,
>
> I want to let everyone know that I'm phasing out as the buildbot admin.
> I haven't been doing much with it lately, and I'd rather hand it off to
> someone else that is more involved. Please direct all of your buildbot
> communication to Benjamin Kaduk and Michael Meffie.
> I'll still be around, but I'll mostly be
> lurking. It's been a pleasure to work with everyone, and I wish everyone
> the best.
>
> Sincerely,
>
> Jason
>
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
>

--
Derek Atkins 617-623-3745
derek@ihtfp.com www.ihtfp.com
Computer and Internet Security Consultant

From botsch@cnf.cornell.edu Sun Nov 26 17:44:24 2017
From: botsch@cnf.cornell.edu (Dave Botsch)
Date: Sun, 26 Nov 2017 12:44:24 -0500
Subject: [OpenAFS-devel] Phasing out as buildbot admin
In-Reply-To:
References:
Message-ID: <20171126174424.GX15420@cnf.cornell.edu>

Jason,

On behalf of the Foundation Board and of myself, thanks for all the time and work you've put in keeping the Buildbot system going. This effort has been invaluable to keeping the OpenAFS effort moving forward.

Again, thank you!

On Sun, Nov 26, 2017 at 09:40:42AM -0500, Jason Edgecombe wrote:
> Hi everyone,
>
> I want to let everyone know that I'm phasing out as the buildbot
> admin. I haven't been doing much with it lately, and I'd rather hand
> it off to someone else that is more involved. Please direct all of
> your buildbot communication to Benjamin Kaduk and
> Michael Meffie. I'll still be around, but
> I'll mostly be lurking. It's been a pleasure to work with everyone,
> and I wish everyone the best.
>
> Sincerely,
>
> Jason
>
> _______________________________________________
> OpenAFS-devel mailing list
> OpenAFS-devel@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-devel

--
********************************
David William Botsch
Programmer/Analyst
@CNFComputing
botsch@cnf.cornell.edu
********************************