From gsgatlin@ncsu.edu Thu Feb 1 15:55:01 2018
From: gsgatlin@ncsu.edu (Gary Gatling)
Date: Thu, 1 Feb 2018 10:55:01 -0500
Subject: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up
In-Reply-To:
References:
Message-ID:

--94eb2c0d9c0a58fc1e0564289e0d
Content-Type: text/plain; charset="UTF-8"

I don't get a kernel panic but instead I get:

[gsgatlin@localhost ~]$ ls /afs/
ls: reading directory /afs/: Not a directory
[gsgatlin@localhost ~]$

which is pretty weird. I don't see anything in the syslog about problems
with openafs

Feb 1 10:44:24 localhost systemd: Starting OpenAFS Client Service...
Feb 1 10:44:24 localhost kernel: libafs: loading out-of-tree module taints kernel.
Feb 1 10:44:24 localhost kernel: libafs: module license 'http://www.openafs.org/dl/license10.html' taints kernel.
Feb 1 10:44:24 localhost kernel: Disabling lock debugging due to kernel taint
Feb 1 10:44:24 localhost kernel: libafs: module verification failed: signature and/or required key missing - tainting kernel
Feb 1 10:44:24 localhost kernel: Key type afs_pag registered
Feb 1 10:44:24 localhost kernel: enabling dynamically allocated vcaches
Feb 1 10:44:24 localhost kernel: Starting AFS cache scan...Memory cache: Allocating 1600 dcache entries...found 0 non-empty cache files (0%).
Feb 1 10:44:24 localhost afsd: afsd: All AFS daemons started.
Feb 1 10:44:24 localhost afsd: afsd: All AFS daemons started.
Feb 1 10:44:24 localhost systemd: Started OpenAFS Client Service.

I am using openafs-1.6.22

with

correct-m4-conditionals-in-curses.m4.patch
linux-test-for-vfswrite-rather-than-vfsread.patch
linux-use-kernelread-kernelwrite-when-vfs-varian.patch

from the arch linux distro in my rpm packages.

Anyone know what

ls: reading directory /afs/: Not a directory

means and is there some way around it?

Also, is 1.6.22.2 coming out soon?

Thanks so much,

On Wed, Jan 31, 2018 at 9:43 AM, Kodiak Firesmith <kfiresmith@gmail.com> wrote:

> https://photos.app.goo.gl/WgPsSUCLK5ojxIuH3
>
>
> On Wed, Jan 31, 2018 at 9:41 AM, Kodiak Firesmith <kfiresmith@gmail.com> wrote:
>
>> Folks, re-sending this because the first try never hit the list - perhaps
>> mail with attachments are silently dropped or held for manual moderation?
>> I'd originally attached an image of the stack trace. I'll host it and
>> reply to this with a URL link in case that would also result in a drop or
>> moderation.
>>
>>
>>
>> Anyhow:
>>
>> In testing the new RHEL 7.5 beta, we've discovered that hosts using AFS
>> fail to boot after the upgrade, with Openafs 1.6.22.1 installed.
>>
>> We are wondering if some of the non-guaranteed kernel ABIs that OpenAFS
>> uses might have changed with the latest kernel provided in RHEL 7.
>>
>> I've attached a picture of the trace.
>>
>> Anyone else kicking the tires on the new RHEL yet?
>>
>> Thanks!
>>
>>
>

--94eb2c0d9c0a58fc1e0564289e0d
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--94eb2c0d9c0a58fc1e0564289e0d--

From gsgatlin@ncsu.edu Thu Feb 1 15:58:00 2018
From: gsgatlin@ncsu.edu (Gary Gatling)
Date: Thu, 1 Feb 2018 10:58:00 -0500
Subject: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up
In-Reply-To:
References:
Message-ID:

--001a1140372c0f0cd0056428a910
Content-Type: text/plain; charset="UTF-8"

Ok. This gets weirder. Any directory under /afs says Not a directory. But
I can read files like

/afs/eos.ncsu.edu/software/inventory/software_inventory

just fine.

On Thu, Feb 1, 2018 at 10:55 AM, Gary Gatling <gsgatlin@ncsu.edu> wrote:

> I don't get a kernel panic but instead I get:
>
> [gsgatlin@localhost ~]$ ls /afs/
> ls: reading directory /afs/: Not a directory
> [gsgatlin@localhost ~]$
>
>
> which is pretty weird. I don't see anything in the syslog about problems
> with openafs
>
> Feb 1 10:44:24 localhost systemd: Starting OpenAFS Client Service...
> Feb 1 10:44:24 localhost kernel: libafs: loading out-of-tree module
> taints kernel.
> Feb 1 10:44:24 localhost kernel: libafs: module license '
> http://www.openafs.org/dl/license10.html' taints kernel.
> Feb 1 10:44:24 localhost kernel: Disabling lock debugging due to kernel
> taint
> Feb 1 10:44:24 localhost kernel: libafs: module verification failed:
> signature and/or required key missing - tainting kernel
> Feb 1 10:44:24 localhost kernel: Key type afs_pag registered
> Feb 1 10:44:24 localhost kernel: enabling dynamically allocated vcaches
> Feb 1 10:44:24 localhost kernel: Starting AFS cache scan...Memory cache:
> Allocating 1600 dcache entries...found 0 non-empty cache files (0%).
> Feb 1 10:44:24 localhost afsd: afsd: All AFS daemons started.
> Feb 1 10:44:24 localhost afsd: afsd: All AFS daemons started.
> Feb 1 10:44:24 localhost systemd: Started OpenAFS Client Service.
>
> I am using openafs-1.6.22
>
>
> with
>
> correct-m4-conditionals-in-curses.m4.patch
> linux-test-for-vfswrite-rather-than-vfsread.patch
> linux-use-kernelread-kernelwrite-when-vfs-varian.patch
>
> from the arch linux distro in my rpm packages.
>
> Anyone know what
>
> ls: reading directory /afs/: Not a directory
>
> means and is there some way around it?
>
> Also, is 1.6.22.2 coming out soon?
>
> Thanks so much,
>
> On Wed, Jan 31, 2018 at 9:43 AM, Kodiak Firesmith <kfiresmith@gmail.com> wrote:
>
>> https://photos.app.goo.gl/WgPsSUCLK5ojxIuH3
>>
>>
>> On Wed, Jan 31, 2018 at 9:41 AM, Kodiak Firesmith <kfiresmith@gmail.com> wrote:
>>
>>> Folks, re-sending this because the first try never hit the list -
>>> perhaps mail with attachments are silently dropped or held for manual
>>> moderation? I'd originally attached an image of the stack trace. I'll
>>> host it and reply to this with a URL link in case that would also result
>>> in a drop or moderation.
>>>
>>>
>>>
>>> Anyhow:
>>>
>>> In testing the new RHEL 7.5 beta, we've discovered that hosts using AFS
>>> fail to boot after the upgrade, with Openafs 1.6.22.1 installed.
>>>
>>> We are wondering if some of the non-guaranteed kernel ABIs that OpenAFS
>>> uses might have changed with the latest kernel provided in RHEL 7.
>>>
>>> I've attached a picture of the trace.
>>>
>>> Anyone else kicking the tires on the new RHEL yet?
>>>
>>> Thanks!
>>>
>>>
>>
>

--001a1140372c0f0cd0056428a910
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
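The symptom reported in this message (directories under /afs fail with "Not a directory" while regular files stay readable) can be narrowed down by checking each step separately. What follows is only a minimal sketch, not something posted in the thread: probe_path is a hypothetical helper that records whether stat reports a directory and whether listing it then succeeds, using nothing beyond the Python standard library.

```python
import errno
import os
import stat
import tempfile


def probe_path(path):
    """Report what stat() and a directory listing say about `path`.

    Hypothetical helper for illustration. Returns a dict instead of printing
    so results can be logged or compared across hosts.
    """
    result = {"path": path, "stat_is_dir": None, "listing": None}
    try:
        result["stat_is_dir"] = stat.S_ISDIR(os.stat(path).st_mode)
    except OSError as exc:
        # stat itself failed; record the symbolic errno name and stop here.
        name = errno.errorcode.get(exc.errno, str(exc.errno))
        result["stat_is_dir"] = "stat failed: " + name
        return result
    try:
        os.listdir(path)
        result["listing"] = "ok"
    except OSError as exc:
        # Listing failed; keep the errno name (ENOTDIR, EACCES, ...).
        result["listing"] = errno.errorcode.get(exc.errno, str(exc.errno))
    return result


if __name__ == "__main__":
    # Demonstration on a scratch directory and a regular file. On an affected
    # client one would pass "/afs" or a cell path instead.
    with tempfile.TemporaryDirectory() as scratch:
        file_path = os.path.join(scratch, "plain_file")
        with open(file_path, "w") as handle:
            handle.write("x")
        print(probe_path(scratch))
        print(probe_path(file_path))
```

A result where stat_is_dir is True but listing is ENOTDIR would match what ls printed: the object looks like a directory, yet reading its entries fails.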




--001a1140372c0f0cd0056428a910--

From stephan.wiesand@desy.de Thu Feb 1 16:11:24 2018
From: stephan.wiesand@desy.de (Stephan Wiesand)
Date: Thu, 1 Feb 2018 17:11:24 +0100
Subject: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up
In-Reply-To:
References:
Message-ID: <98B9E358-A528-409E-B527-612B90DE904A@desy.de>

Comparing the 1.6.22.2 module builds from the SL packaging, where the
kABI hashes of the used symbols are stored as a requirement, it seems
none of those hashes changed between -693 and -830.

There are two differences in the configure results:

-ac_cv_linux_header_sched_signal_h=no
+ac_cv_linux_header_sched_signal_h=yes
-ac_cv_linux_struct_file_operations_has_iterate=no
+ac_cv_linux_struct_file_operations_has_iterate=yes

And there's quite a bit of churn in include/linux/fs.h (and some in
key.h).

> On 1. Feb 2018, at 16:58, Gary Gatling <gsgatlin@ncsu.edu> wrote:
>
> Ok. This gets weirder. Any directory under /afs says Not a directory. But I can read files like
>
> /afs/eos.ncsu.edu/software/inventory/software_inventory
>
> just fine.
>
> On Thu, Feb 1, 2018 at 10:55 AM, Gary Gatling <gsgatlin@ncsu.edu> wrote:
> I don't get a kernel panic but instead I get:
>
> [gsgatlin@localhost ~]$ ls /afs/
> ls: reading directory /afs/: Not a directory
> [gsgatlin@localhost ~]$
>
>
> which is pretty weird. I don't see anything in the syslog about problems with openafs
>
> Feb 1 10:44:24 localhost systemd: Starting OpenAFS Client Service...
> Feb 1 10:44:24 localhost kernel: libafs: loading out-of-tree module taints kernel.
> Feb 1 10:44:24 localhost kernel: libafs: module license 'http://www.openafs.org/dl/license10.html' taints kernel.
> Feb 1 10:44:24 localhost kernel: Disabling lock debugging due to kernel taint
> Feb 1 10:44:24 localhost kernel: libafs: module verification failed: signature and/or required key missing - tainting kernel
> Feb 1 10:44:24 localhost kernel: Key type afs_pag registered
> Feb 1 10:44:24 localhost kernel: enabling dynamically allocated vcaches
> Feb 1 10:44:24 localhost kernel: Starting AFS cache scan...Memory cache: Allocating 1600 dcache entries...found 0 non-empty cache files (0%).
> Feb 1 10:44:24 localhost afsd: afsd: All AFS daemons started.
> Feb 1 10:44:24 localhost afsd: afsd: All AFS daemons started.
> Feb 1 10:44:24 localhost systemd: Started OpenAFS Client Service.
>
> I am using openafs-1.6.22
>
>
> with
>
> correct-m4-conditionals-in-curses.m4.patch
> linux-test-for-vfswrite-rather-than-vfsread.patch
> linux-use-kernelread-kernelwrite-when-vfs-varian.patch
>
> from the arch linux distro in my rpm packages.
>
> Anyone know what
>
> ls: reading directory /afs/: Not a directory
>
> means and is there some way around it?
>
> Also, is 1.6.22.2 coming out soon?
>
> Thanks so much,
>
> On Wed, Jan 31, 2018 at 9:43 AM, Kodiak Firesmith <kfiresmith@gmail.com> wrote:
> https://photos.app.goo.gl/WgPsSUCLK5ojxIuH3
>
>
> On Wed, Jan 31, 2018 at 9:41 AM, Kodiak Firesmith <kfiresmith@gmail.com> wrote:
> Folks, re-sending this because the first try never hit the list - perhaps mail with attachments are silently dropped or held for manual moderation? I'd originally attached an image of the stack trace. I'll host it and reply to this with a URL link in case that would also result in a drop or moderation.
>
>
>
> Anyhow:
>
> In testing the new RHEL 7.5 beta, we've discovered that hosts using AFS fail to boot after the upgrade, with Openafs 1.6.22.1 installed.
>
> We are wondering if some of the non-guaranteed kernel ABIs that OpenAFS uses might have changed with the latest kernel provided in RHEL 7.
>
> I've attached a picture of the trace.
>
> Anyone else kicking the tires on the new RHEL yet?
>
> Thanks!
>
>
>
>

--
Stephan Wiesand
DESY -DV-
Platanenallee 6
15738 Zeuthen, Germany

From mvanderw@nd.edu Thu Feb 1 16:13:31 2018
From: mvanderw@nd.edu (Matt Vander Werf)
Date: Thu, 1 Feb 2018 11:13:31 -0500
Subject: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up
In-Reply-To:
References:
Message-ID:

--f4f5e807fe5005fc6d056428e3bf
Content-Type: text/plain; charset="UTF-8"

I'm also seeing the same issue as Gary on some RHEL 7.5 beta boxes running
OpenAFS 1.6.22.1. Can't run ls under any /afs/.../.../etc directory,
including in my AFS home directory when logged in as myself.

[mvanderw@<host> ~]$ ls
ls: reading directory .: Not a directory
[mvanderw@<host> ~]$ ls ~
ls: reading directory /afs/crc.nd.edu/user/m/mvanderw: Not a directory

[mvanderw@<host> ~]$ ls /afs/
ls: reading directory /afs/: Not a directory
[mvanderw@<host> ~]$ ls /afs/crc.nd.edu
ls: reading directory /afs/crc.nd.edu: Not a directory

But no kernel panics here either.

@Kodiak: Is it possible you were running a kmod-openafs from an older
kernel? I compiled a new kmod-openafs RPM on a RHEL 7.5 beta system and it
works well.

I compiled all the OpenAFS packages from the source RPM on the RHEL 7.5
beta system itself and didn't run into any issues with the compile.

Besides this, AFS seems to be running correctly with nothing in the logs
indicating any problems (like Gary mentioned).

Any idea what might be causing this? Some semantic changes like with the
getcwd issue in RHEL 7.4?

Thanks.

--
Matt Vander Werf
HPC System Administrator
University of Notre Dame
Center for Research Computing - Union Station
506 W. South Street
South Bend, IN 46601
Phone: (574) 631-0692

On Thu, Feb 1, 2018 at 10:58 AM, Gary Gatling <gsgatlin@ncsu.edu> wrote:

> Ok. This gets weirder. Any directory under /afs says Not a directory. But
> I can read files like
>
> /afs/eos.ncsu.edu/software/inventory/software_inventory
>
> just fine.
>
> On Thu, Feb 1, 2018 at 10:55 AM, Gary Gatling <gsgatlin@ncsu.edu> wrote:
>
>> I don't get a kernel panic but instead I get:
>>
>> [gsgatlin@localhost ~]$ ls /afs/
>> ls: reading directory /afs/: Not a directory
>> [gsgatlin@localhost ~]$
>>
>>
>> which is pretty weird. I don't see anything in the syslog about problems
>> with openafs
>>
>> Feb 1 10:44:24 localhost systemd: Starting OpenAFS Client Service...
>> Feb 1 10:44:24 localhost kernel: libafs: loading out-of-tree module
>> taints kernel.
>> Feb 1 10:44:24 localhost kernel: libafs: module license '
>> http://www.openafs.org/dl/license10.html' taints kernel.
>> Feb 1 10:44:24 localhost kernel: Disabling lock debugging due to kernel
>> taint
>> Feb 1 10:44:24 localhost kernel: libafs: module verification failed:
>> signature and/or required key missing - tainting kernel
>> Feb 1 10:44:24 localhost kernel: Key type afs_pag registered
>> Feb 1 10:44:24 localhost kernel: enabling dynamically allocated vcaches
>> Feb 1 10:44:24 localhost kernel: Starting AFS cache scan...Memory cache:
>> Allocating 1600 dcache entries...found 0 non-empty cache files (0%).
>> Feb 1 10:44:24 localhost afsd: afsd: All AFS daemons started.
>> Feb 1 10:44:24 localhost afsd: afsd: All AFS daemons started.
>> Feb 1 10:44:24 localhost systemd: Started OpenAFS Client Service.
>>
>> I am using openafs-1.6.22
>>
>>
>> with
>>
>> correct-m4-conditionals-in-curses.m4.patch
>> linux-test-for-vfswrite-rather-than-vfsread.patch
>> linux-use-kernelread-kernelwrite-when-vfs-varian.patch
>>
>> from the arch linux distro in my rpm packages.
>>
>> Anyone know what
>>
>> ls: reading directory /afs/: Not a directory
>>
>> means and is there some way around it?
>>
>> Also, is 1.6.22.2 coming out soon?
>>
>> Thanks so much,
>>
>> On Wed, Jan 31, 2018 at 9:43 AM, Kodiak Firesmith <kfiresmith@gmail.com> wrote:
>>
>>> https://photos.app.goo.gl/WgPsSUCLK5ojxIuH3
>>>
>>>
>>> On Wed, Jan 31, 2018 at 9:41 AM, Kodiak Firesmith <kfiresmith@gmail.com> wrote:
>>>
>>>> Folks, re-sending this because the first try never hit the list -
>>>> perhaps mail with attachments are silently dropped or held for manual
>>>> moderation? I'd originally attached an image of the stack trace. I'll
>>>> host it and reply to this with a URL link in case that would also result
>>>> in a drop or moderation.
>>>>
>>>>
>>>>
>>>> Anyhow:
>>>>
>>>> In testing the new RHEL 7.5 beta, we've discovered that hosts using AFS
>>>> fail to boot after the upgrade, with Openafs 1.6.22.1 installed.
>>>>
>>>> We are wondering if some of the non-guaranteed kernel ABIs that OpenAFS
>>>> uses might have changed with the latest kernel provided in RHEL 7.
>>>>
>>>> I've attached a picture of the trace.
>>>>
>>>> Anyone else kicking the tires on the new RHEL yet?
>>>>
>>>> Thanks!
>>>>
>>>>
>>>
>>
>

--f4f5e807fe5005fc6d056428e3bf
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
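The comparison of configure results reported earlier in the thread (two ac_cv_* values flipping from no to yes between the -693 and -830 kernels) can be reproduced mechanically. The following is a rough sketch under stated assumptions: the cache variables from two builds are available as name=value text, parse_cache and diff_cache are hypothetical helper names, and the sample data contains only the two variables quoted in the thread.

```python
def parse_cache(text):
    """Parse `name=value` lines (autoconf cache-variable style) into a dict."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        values[name] = value
    return values


def diff_cache(old, new):
    """Return diff-style lines for every variable whose value differs."""
    lines = []
    for name in sorted(set(old) | set(new)):
        if old.get(name) == new.get(name):
            continue
        if name in old:
            lines.append("-%s=%s" % (name, old[name]))
        if name in new:
            lines.append("+%s=%s" % (name, new[name]))
    return lines


# The two variables and values quoted in the thread for the two kernels.
OLD = """\
ac_cv_linux_header_sched_signal_h=no
ac_cv_linux_struct_file_operations_has_iterate=no
"""
NEW = """\
ac_cv_linux_header_sched_signal_h=yes
ac_cv_linux_struct_file_operations_has_iterate=yes
"""

if __name__ == "__main__":
    for line in diff_cache(parse_cache(OLD), parse_cache(NEW)):
        print(line)
```

Run against full variable dumps from both builds, the same two helpers would list any further configure differences beyond the two already reported.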





--f4f5e807fe5005fc6d056428e3bf--

From kfiresmith@gmail.com Thu Feb 1 16:21:59 2018
From: kfiresmith@gmail.com (Kodiak Firesmith)
Date: Thu, 1 Feb 2018 11:21:59 -0500
Subject: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up
In-Reply-To:
References:
Message-ID:

--f403045c22c4c8afe4056428fe08
Content-Type: text/plain; charset="UTF-8"

Thanks for the replies!

We're using DKMS and expected the dynamic re-roll of the kmods to work like
any other kernel upgrade but that doesn't seem to be the case. I need to
dig deeper, especially now that there is evidence that it's just our site.

Thanks a bunch everyone.

 - Kodiak

On Thu, Feb 1, 2018 at 11:13 AM, Matt Vander Werf <mvanderw@nd.edu> wrote:

> I'm also seeing the same issue as Gary on some RHEL 7.5 beta boxes running
> OpenAFS 1.6.22.1. Can't run ls under any /afs/.../.../etc directory,
> including in my AFS home directory when logged in as myself.
>
> [mvanderw@<host> ~]$ ls
> ls: reading directory .: Not a directory
> [mvanderw@<host> ~]$ ls ~
> ls: reading directory /afs/crc.nd.edu/user/m/mvanderw: Not a directory
>
> [mvanderw@<host> ~]$ ls /afs/
> ls: reading directory /afs/: Not a directory
> [mvanderw@<host> ~]$ ls /afs/crc.nd.edu
> ls: reading directory /afs/crc.nd.edu: Not a directory
>
> But no kernel panics here either.
>
> @Kodiak: Is it possible you were running a kmod-openafs from an older
> kernel? I compiled a new kmod-openafs RPM on a RHEL 7.5 beta system and it
> works well.
>
> I compiled all the OpenAFS packages from the source RPM on the RHEL 7.5
> beta system itself and didn't run into any issues with the compile.
>
> Besides this, AFS seems to be running correctly with nothing in the logs
> indicating any problems (like Gary mentioned).
>
> Any idea what might be causing this? Some semantic changes like with the
> getcwd issue in RHEL 7.4?
>
> Thanks.
>
> --
> Matt Vander Werf
> HPC System Administrator
> University of Notre Dame
> Center for Research Computing - Union Station
> 506 W. South Street
> South Bend, IN 46601
> Phone: (574) 631-0692
>
> On Thu, Feb 1, 2018 at 10:58 AM, Gary Gatling <gsgatlin@ncsu.edu> wrote:
>
>> Ok. This gets weirder. Any directory under /afs says Not a directory. But
>> I can read files like
>>
>> /afs/eos.ncsu.edu/software/inventory/software_inventory
>>
>> just fine.
>>
>> On Thu, Feb 1, 2018 at 10:55 AM, Gary Gatling <gsgatlin@ncsu.edu> wrote:
>>
>>> I don't get a kernel panic but instead I get:
>>>
>>> [gsgatlin@localhost ~]$ ls /afs/
>>> ls: reading directory /afs/: Not a directory
>>> [gsgatlin@localhost ~]$
>>>
>>>
>>> which is pretty weird. I don't see anything in the syslog about problems
>>> with openafs
>>>
>>> Feb 1 10:44:24 localhost systemd: Starting OpenAFS Client Service...
>>> Feb 1 10:44:24 localhost kernel: libafs: loading out-of-tree module
>>> taints kernel.
>>> Feb 1 10:44:24 localhost kernel: libafs: module license '
>>> http://www.openafs.org/dl/license10.html' taints kernel.
>>> Feb 1 10:44:24 localhost kernel: Disabling lock debugging due to kernel
>>> taint
>>> Feb 1 10:44:24 localhost kernel: libafs: module verification failed:
>>> signature and/or required key missing - tainting kernel
>>> Feb 1 10:44:24 localhost kernel: Key type afs_pag registered
>>> Feb 1 10:44:24 localhost kernel: enabling dynamically allocated vcaches
>>> Feb 1 10:44:24 localhost kernel: Starting AFS cache scan...Memory
>>> cache: Allocating 1600 dcache entries...found 0 non-empty cache files (0%).
>>> Feb 1 10:44:24 localhost afsd: afsd: All AFS daemons started.
>>> Feb 1 10:44:24 localhost afsd: afsd: All AFS daemons started.
>>> Feb 1 10:44:24 localhost systemd: Started OpenAFS Client Service.
>>>
>>> I am using openafs-1.6.22
>>>
>>>
>>> with
>>>
>>> correct-m4-conditionals-in-curses.m4.patch
>>> linux-test-for-vfswrite-rather-than-vfsread.patch
>>> linux-use-kernelread-kernelwrite-when-vfs-varian.patch
>>>
>>> from the arch linux distro in my rpm packages.
>>>
>>> Anyone know what
>>>
>>> ls: reading directory /afs/: Not a directory
>>>
>>> means and is there some way around it?
>>>
>>> Also, is 1.6.22.2 coming out soon?
>>>
>>> Thanks so much,
>>>
>>> On Wed, Jan 31, 2018 at 9:43 AM, Kodiak Firesmith <kfiresmith@gmail.com> wrote:
>>>
>>>> https://photos.app.goo.gl/WgPsSUCLK5ojxIuH3
>>>>
>>>>
>>>> On Wed, Jan 31, 2018 at 9:41 AM, Kodiak Firesmith <kfiresmith@gmail.com> wrote:
>>>>
>>>>> Folks, re-sending this because the first try never hit the list -
>>>>> perhaps mail with attachments are silently dropped or held for manual
>>>>> moderation? I'd originally attached an image of the stack trace. I'll
>>>>> host it and reply to this with a URL link in case that would also result
>>>>> in a drop or moderation.
>>>>>
>>>>>
>>>>>
>>>>> Anyhow:
>>>>>
>>>>> In testing the new RHEL 7.5 beta, we've discovered that hosts using
>>>>> AFS fail to boot after the upgrade, with Openafs 1.6.22.1 installed.
>>>>>
>>>>> We are wondering if some of the non-guaranteed kernel ABIs that
>>>>> OpenAFS uses might have changed with the latest kernel provided in RHEL 7.
>>>>>
>>>>> I've attached a picture of the trace.
>>>>>
>>>>> Anyone else kicking the tires on the new RHEL yet?
>>>>>
>>>>> Thanks!
>>>>>
>>>>>
>>>>
>>>
>>
>

--f403045c22c4c8afe4056428fe08
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
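Since the open question here is whether the DKMS rebuild really produced a module for the new kernel, one quick check is to compare the running kernel release with the release recorded in the module's vermagic string. The following is a minimal sketch, assuming modinfo is on the PATH and that the module is named libafs as in the syslog lines quoted in this thread; vermagic_release, built_for_running_kernel and module_vermagic are hypothetical helper names.

```python
import platform
import subprocess


def vermagic_release(vermagic):
    """Return the kernel release at the start of a vermagic string."""
    parts = vermagic.split()
    return parts[0] if parts else ""


def built_for_running_kernel(vermagic, running=None):
    """True if the vermagic release equals the running kernel release."""
    if running is None:
        running = platform.release()  # same value as `uname -r`
    return vermagic_release(vermagic) == running


def module_vermagic(module="libafs"):
    """Ask modinfo for the vermagic field of `module`.

    Assumes modinfo is installed and can find the module; raises
    CalledProcessError or FileNotFoundError otherwise.
    """
    output = subprocess.check_output(
        ["modinfo", "-F", "vermagic", module], universal_newlines=True
    )
    return output.strip()
```

On a client, built_for_running_kernel(module_vermagic()) returning False would suggest the module on disk was built for a different kernel. That is a hint rather than a verdict, because some packaging setups may deliberately reuse a module across kernel releases.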
On Thu, Feb 1, 2018 at 11:13 AM, Matt Vander W= erf <mvanderw@nd.edu> wrote:
I'm also seeing the same issue as Gary on s= ome RHEL 7.5 beta boxes running OpenAFS 1.6.22.1. Can't run ls under an= y /afs/.../.../etc directory, including in my AFS home directory when logge= d in as myself.

[mvanderw@<host> ~]$ ls
ls: reading directo= ry .: Not a directory
[mvanderw@<host> ~]$ ls ~
ls: reading dir= ectory /afs/crc.nd.edu/user/m/mvanderw: Not a directory

[mvanderw@<= host> ~]$ ls /afs/
ls: reading directory /afs/: Not = a directory
[mvanderw@<host> ~]$ ls /afs/crc.nd.edu
ls: reading directory /afs/= crc.nd.edu: Not a direc= tory

But no kernel panics here either.

@Kodiak: Is it p= ossible you were running a kmod-openafs from an older kernel? I compiled a = new kmod-openafs RPM on a RHEL 7.5 beta system and it works well.

I= compiled all the OpenAFS packages from the source RPM on the RHEL 7.5 beta system itself and didn't run into any issues with the compile.
Besides this, AFS seems to be running correctly with nothing in the l= ogs indicating any problems (like Gary mentioned).

Any id= ea what might be causing this? Some semantic changes like with the getcwd i= ssue in RHEL 7.4?

Thanks.

--
Matt Vander Werf
HPC System Administrat= or
University of Notre Dame
Center for Research Computing - Union Sta= tion
506 W. South Street
= South Bend, IN 46601
P= hone: (574) 631-0692

On Thu, Feb 1, 2018 at 10:58 AM, Gary Gatlin= g <gsgatlin@ncsu.edu> wrote:
Ok. This gets weirder. Any directory= under /afs says=C2=A0Not a directory. But I can read files like

On T= hu, Feb 1, 2018 at 10:55 AM, Gary Gatling <gsgatlin@ncsu.edu> wrote:
I don't get a kernel panic but instead I get:

[gsgatlin@localhost ~]$ ls /afs/
ls: reading directory /afs/: Not a directory
[gsgatlin@localhost ~]$

which is pretty weird. I don't see anything in the syslog about problems with openafs

Feb  1 10:44:24 localhost systemd: Starting OpenAFS Client Service...
Feb  1 10:44:24 localhost kernel: libafs: loading out-of-tree module taints kernel.
Feb  1 10:44:24 localhost kernel: libafs: module license 'http://www.openafs.org/dl/license10.html' taints kernel.
Feb  1 10:44:24 localhost kernel: Disabling lock debugging due to kernel taint
Feb  1 10:44:24 localhost kernel: libafs: module verification failed: signature and/or required key missing - tainting kernel
Feb  1 10:44:24 localhost kernel: Key type afs_pag registered
Feb  1 10:44:24 localhost kernel: enabling dynamically allocated vcaches
Feb  1 10:44:24 localhost kernel: Starting AFS cache scan...Memory cache: Allocating 1600 dcache entries...found 0 non-empty cache files (0%).
Feb  1 10:44:24 localhost afsd: afsd: All AFS daemons started.
Feb  1 10:44:24 localhost afsd: afsd: All AFS daemons started.
Feb  1 10:44:24 localhost systemd: Started OpenAFS Client Service.

I am using openafs-1.6.22

with

correct-m4-conditionals-in-curses.m4.patch
linux-test-for-vfswrite-rather-than-vfsread.patch
linux-use-kernelread-kernelwrite-when-vfs-varian.patch

from the arch linux distro in my rpm packages.

Anyone know what

ls: reading directory /afs/: Not a directory

means and is there some way around it?

Also, is 1.6.22.2 coming out soon?

Thanks so much,

On Wed, Jan 31, 2018 at 9:43 AM, Kodiak Firesmith <kfiresmith@gmail.com> wrote:
https://photos.app.goo.gl/WgPsSUCLK5ojxIuH3

On Wed, Jan 31, 2018 at 9:41 AM, Kodiak Firesmith <kfiresmith@gmail.com> wrote:
Folks, re-sending this because the first try never hit the list - perhaps mail with attachments are silently dropped or held for manual moderation?  I'd originally attached an image of the stack trace.  I'll host it and reply to this with a URL link in case that would also result in a drop or moderation.


Anyhow:

In testing the new RHEL 7.5 beta, we've discovered that hosts using AFS fail to boot after the upgrade, with Openafs 1.6.22.1 installed.

We are wondering if some of the non-guaranteed kernel ABIs that OpenAFS uses might have changed with the latest kernel provided in RHEL 7.

I've attached a picture of the trace.

Anyone else kicking the tires on the new RHEL yet?

Thanks!






--f403045c22c4c8afe4056428fe08--

From kfiresmith@gmail.com Thu Feb 1 20:26:58 2018
From: kfiresmith@gmail.com (Kodiak Firesmith)
Date: Thu, 1 Feb 2018 15:26:58 -0500
Subject: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up
In-Reply-To:
References:
Message-ID:

--94eb2c199088ef213605642c6ab2
Content-Type: text/plain; charset="UTF-8"

I just rebuilt off-the-shelf RPMs based off of
http://www.openafs.org/dl/openafs/1.6.22.1/openafs-1.6.22.1-1.src.rpm
thinking maybe we had some historical patch in our build area that might be
causing the problem, but alas, even the off-the-shelf RPMs cause a full
wedge and reboot when openafs-client.service starts up.

 - Kodiak

On Thu, Feb 1, 2018 at 1:23 PM, Kodiak Firesmith wrote:

> Hello Rich!
> It's a Dell Optiplex 7020 with an Intel i7-4790.
>
> Thanks!
> - Kodiak
>
> On Thu, Feb 1, 2018 at 1:20 PM, Rich Sudlow wrote:
>
>> On 01/31/2018 09:43 AM, Kodiak Firesmith wrote:
>>
>>> https://photos.app.goo.gl/WgPsSUCLK5ojxIuH3
>>>
>>
>> Greetings
>>
>> What processor..etc is this machine?
>>
>> Rich
>>
>>>
>>> On Wed, Jan 31, 2018 at 9:41 AM, Kodiak Firesmith wrote:
>>>
>>> Folks, re-sending this because the first try never hit the list -
>>> perhaps mail with attachments are silently dropped or held for manual
>>> moderation? I'd originally attached an image of the stack trace. I'll
>>> host it and reply to this with a URL link in case that would also
>>> result in a drop or moderation.
>>>
>>> Anyhow:
>>>
>>> In testing the new RHEL 7.5 beta, we've discovered that hosts using
>>> AFS fail to boot after the upgrade, with Openafs 1.6.22.1 installed.
>>>
>>> We are wondering if some of the non-guaranteed kernel ABIs that
>>> OpenAFS uses might have changed with the latest kernel provided in
>>> RHEL 7.
>>>
>>> I've attached a picture of the trace.
>>>
>>> Anyone else kicking the tires on the new RHEL yet?
>>>
>>> Thanks!
>>>
>>
>> --
>> Rich Sudlow
>> University of Notre Dame
>> Center for Research Computing - Union Station
>> 506 W. South St
>> South Bend, In 46601
>>
>> (574) 631-7258 (office)
>> (574) 807-1046 (cell)
>>
>

--94eb2c199088ef213605642c6ab2
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

I just rebuilt off-the-shelf RPMs based off of http://www.openafs.org/dl/openafs/1.6.22.1/openafs-1.6.22.1-1.src.rpm thinking maybe we had some historical patch in our build area that might be causing the problem, but alas, even the off-the-shelf RPMs cause a full wedge and reboot when openafs-client.service starts up.

 - Kodiak

On Thu, Feb 1, 2018 at 1:23 PM, Kodiak Firesmith <kfiresmith@gmail.com> wrote:
Hello Rich!
It's a Dell Optiplex 7020 with an Intel i7-4790.

Thanks!
 - Kodiak

On Thu, Feb 1, 2018 at 1:20 PM, Rich Sudlow <rich@nd.edu> wrote:
On 01/31/2018 09:43 AM, Kodiak Firesmith wrote:
https://photos.app.goo.gl/WgPsSUCLK5ojxIuH3

Greetings

What processor..etc is this machine?

Rich




On Wed, Jan 31, 2018 at 9:41 AM, Kodiak Firesmith <kfiresmith@gmail.com <mailto:kfiresmith@gmail.com>> wrote:

    Folks, re-sending this because the first try never hit the list - perhaps
    mail with attachments are silently dropped or held for manual moderation?
    I'd originally attached an image of the stack trace.  I'll host it and reply
    to this with a URL link in case that would also result in a drop or moderation.



    Anyhow:

    In testing the new RHEL 7.5 beta, we've discovered that hosts using AFS fail
    to boot after the upgrade, with Openafs 1.6.22.1 installed.
    We are wondering if some of the non-guaranteed kernel ABIs that OpenAFS uses
    might have changed with the latest kernel provided in RHEL 7.

    I've attached a picture of the trace.

    Anyone else kicking the tires on the new RHEL yet?

    Thanks!




--
Rich Sudlow
University of Notre Dame
Center for Research Computing - Union Station
506 W. South St
South Bend, In 46601

(574) 631-7258 (office)
(574) 807-1046 (cell)


--94eb2c199088ef213605642c6ab2--

From gsgatlin@ncsu.edu Thu Feb 1 22:00:14 2018
From: gsgatlin@ncsu.edu (Gary Gatling)
Date: Thu, 1 Feb 2018 17:00:14 -0500
Subject: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up
In-Reply-To: <98B9E358-A528-409E-B527-612B90DE904A@desy.de>
References: <98B9E358-A528-409E-B527-612B90DE904A@desy.de>
Message-ID:

--94eb2c07464a77f61e05642db8e9
Content-Type: text/plain; charset="UTF-8"

I tried testing a work in progress 1.6.22.2 on rhel 7.5 beta by doing

git clone git://git.openafs.org/openafs.git
cd openafs
git checkout remotes/origin/openafs-stable-1_6_x
HEAD is now at d25c8e8... Make OpenAFS 1.6.22.2

But it seems to have the same problems with directories so I guess further
changes will need to be made to get it to work on rhel 7.5 kernel. Not a
kernel hacker so I'll wait to see what you guys come up with. :)

Thanks,

On Thu, Feb 1, 2018 at 11:11 AM, Stephan Wiesand wrote:

> Comparing the 1.6.22.2 module builds from the SL packaging, where the kABI
> hashes of the used symbols are stored as a requirement, it seems none of
> those hashes changed between -693 and -830.
>
> There are two differences in the configure results:
>
> -ac_cv_linux_header_sched_signal_h=no
> +ac_cv_linux_header_sched_signal_h=yes
>
> -ac_cv_linux_struct_file_operations_has_iterate=no
> +ac_cv_linux_struct_file_operations_has_iterate=yes
>
> And there's quite a bit of churn in include/linux/fs.h (and some in key.h).
>
> > On 1. Feb 2018, at 16:58, Gary Gatling wrote:
> >
> > Ok. This gets weirder. Any directory under /afs says Not a directory.
> > But I can read files like
> >
> > /afs/eos.ncsu.edu/software/inventory/software_inventory
> >
> > just fine.
> >
> > On Thu, Feb 1, 2018 at 10:55 AM, Gary Gatling wrote:
> > I don't get a kernel panic but instead I get:
> >
> > [gsgatlin@localhost ~]$ ls /afs/
> > ls: reading directory /afs/: Not a directory
> > [gsgatlin@localhost ~]$
> >
> > which is pretty weird. I don't see anything in the syslog about problems
> > with openafs
> >
> > Feb  1 10:44:24 localhost systemd: Starting OpenAFS Client Service...
> > Feb  1 10:44:24 localhost kernel: libafs: loading out-of-tree module taints kernel.
> > Feb  1 10:44:24 localhost kernel: libafs: module license 'http://www.openafs.org/dl/license10.html' taints kernel.
> > Feb  1 10:44:24 localhost kernel: Disabling lock debugging due to kernel taint
> > Feb  1 10:44:24 localhost kernel: libafs: module verification failed: signature and/or required key missing - tainting kernel
> > Feb  1 10:44:24 localhost kernel: Key type afs_pag registered
> > Feb  1 10:44:24 localhost kernel: enabling dynamically allocated vcaches
> > Feb  1 10:44:24 localhost kernel: Starting AFS cache scan...Memory cache: Allocating 1600 dcache entries...found 0 non-empty cache files (0%).
> > Feb  1 10:44:24 localhost afsd: afsd: All AFS daemons started.
> > Feb  1 10:44:24 localhost afsd: afsd: All AFS daemons started.
> > Feb  1 10:44:24 localhost systemd: Started OpenAFS Client Service.
> >
> > I am using openafs-1.6.22
> >
> > with
> >
> > correct-m4-conditionals-in-curses.m4.patch
> > linux-test-for-vfswrite-rather-than-vfsread.patch
> > linux-use-kernelread-kernelwrite-when-vfs-varian.patch
> >
> > from the arch linux distro in my rpm packages.
> >
> > Anyone know what
> >
> > ls: reading directory /afs/: Not a directory
> >
> > means and is there some way around it?
> >
> > Also, is 1.6.22.2 coming out soon?
> >
> > Thanks so much,
> >
> > On Wed, Jan 31, 2018 at 9:43 AM, Kodiak Firesmith wrote:
> > https://photos.app.goo.gl/WgPsSUCLK5ojxIuH3
> >
> > On Wed, Jan 31, 2018 at 9:41 AM, Kodiak Firesmith wrote:
> > Folks, re-sending this because the first try never hit the list -
> > perhaps mail with attachments are silently dropped or held for manual
> > moderation? I'd originally attached an image of the stack trace. I'll
> > host it and reply to this with a URL link in case that would also result
> > in a drop or moderation.
> >
> > Anyhow:
> >
> > In testing the new RHEL 7.5 beta, we've discovered that hosts using AFS
> > fail to boot after the upgrade, with Openafs 1.6.22.1 installed.
> >
> > We are wondering if some of the non-guaranteed kernel ABIs that OpenAFS
> > uses might have changed with the latest kernel provided in RHEL 7.
> >
> > I've attached a picture of the trace.
> >
> > Anyone else kicking the tires on the new RHEL yet?
> >
> > Thanks!
> >
>
> --
> Stephan Wiesand
> DESY -DV-
> Platanenallee 6
> 15738 Zeuthen, Germany
>
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
>

--94eb2c07464a77f61e05642db8e9
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

I tried testing a work in progress 1.6.22.2 on rhel 7.5 beta by doing

git clone git://git.openafs.org/openafs.git
cd openafs
git checkout remotes/origin/openafs-stable-1_6_x
HEAD is now at d25c8e8... Make OpenAFS 1.6.22.2

But it seems to have the same problems with directories so I guess further changes will need to be made to get it to work on rhel 7.5 kernel. Not a kernel hacker so I'll wait to see what you guys come up with. :)

Thanks,

On Thu, Feb 1, 2018 at 11:11 AM, Stephan Wiesand <stephan.wiesand@desy.de> wrote:
Comparing the 1.6.22.2 module builds from the SL packaging, where the kABI hashes of the used symbols are stored as a requirement, it seems none of those hashes changed between -693 and -830.

There are two differences in the configure results:

-ac_cv_linux_header_sched_signal_h=no
+ac_cv_linux_header_sched_signal_h=yes

-ac_cv_linux_struct_file_operations_has_iterate=no
+ac_cv_linux_struct_file_operations_has_iterate=yes

And there's quite a bit of churn in include/linux/fs.h (and some in key.h).
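
The comparison of configure results described above can be repeated on any pair of build trees. What follows is only a minimal sketch in Python, not part of the OpenAFS build system: it assumes each build left a config.log (or similar text) containing `ac_cv_*=value` lines, and the function names are hypothetical.

```python
import re

# Matches autoconf cache assignments such as "ac_cv_foo=yes" or "ac_cv_foo='a b'".
_CACHE_LINE = re.compile(r"^(ac_cv_\w+)=(.*)$")


def parse_cache_vars(text):
    """Return a dict mapping ac_cv_* names to values found in the given text."""
    found = {}
    for line in text.splitlines():
        match = _CACHE_LINE.match(line.strip())
        if not match:
            continue
        value = match.group(2).strip()
        # Drop one layer of shell quoting if the value is quoted.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        found[match.group(1)] = value
    return found


def diff_cache_vars(old_text, new_text):
    """Return sorted (name, old, new) tuples for every variable that differs."""
    old, new = parse_cache_vars(old_text), parse_cache_vars(new_text)
    return [
        (name, old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    ]
```

Feeding it the text of the config.log from a -693 build and from a -830 build should list only the variables whose results changed, in the same -/+ spirit as the two pairs quoted above.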

> On 1. Feb 2018, at 16:58, Gary Gatling <gsgatlin@ncsu.edu> wrote:
>
> Ok. This gets weirder. Any directory under /afs says Not a directory. But I can read files like
>
> /afs/eos.ncsu.edu/software/inventory/software_inventory
>
> just fine.
>
> On Thu, Feb 1, 2018 at 10:55 AM, Gary Gatling <gsgatlin@ncsu.edu> wrote:
> I don't get a kernel panic but instead I get:
>
> [gsgatlin@localhost ~]$ ls /afs/
> ls: reading directory /afs/: Not a directory
> [gsgatlin@localhost ~]$
>
>
> which is pretty weird. I don't see anything in the syslog about problems with openafs
>
> Feb  1 10:44:24 localhost systemd: Starting OpenAFS Client Service...
> Feb  1 10:44:24 localhost kernel: libafs: loading out-of-tree module taints kernel.
> Feb  1 10:44:24 localhost kernel: libafs: module license 'http://www.openafs.org/dl/license10.html' taints kernel.
> Feb  1 10:44:24 localhost kernel: Disabling lock debugging due to kernel taint
> Feb  1 10:44:24 localhost kernel: libafs: module verification failed: signature and/or required key missing - tainting kernel
> Feb  1 10:44:24 localhost kernel: Key type afs_pag registered
> Feb  1 10:44:24 localhost kernel: enabling dynamically allocated vcaches
> Feb  1 10:44:24 localhost kernel: Starting AFS cache scan...Memory cache: Allocating 1600 dcache entries...found 0 non-empty cache files (0%).
> Feb  1 10:44:24 localhost afsd: afsd: All AFS daemons started.
> Feb  1 10:44:24 localhost afsd: afsd: All AFS daemons started.
> Feb  1 10:44:24 localhost systemd: Started OpenAFS Client Service.
>
> I am using openafs-1.6.22
>
>
> with
>
> correct-m4-conditionals-in-curses.m4.patch
> linux-test-for-vfswrite-rather-than-vfsread.patch
> linux-use-kernelread-kernelwrite-when-vfs-varian.patch
>
> from the arch linux distro in my rpm packages.
>
> Anyone know what
>
> ls: reading directory /afs/: Not a directory
>
> means and is there some way around it?
>
> Also, is 1.6.22.2 coming out soon?
>
> Thanks so much,
>
> On Wed, Jan 31, 2018 at 9:43 AM, Kodiak Firesmith <kfiresmith@gmail.com> wrote:
> https://photos.app.goo.gl/WgPsSUCLK5ojxIuH3
>
>
> On Wed, Jan 31, 2018 at 9:41 AM, Kodiak Firesmith <kfiresmith@gmail.com> wrote:
> Folks, re-sending this because the first try never hit the list - perhaps mail with attachments are silently dropped or held for manual moderation?  I'd originally attached an image of the stack trace.  I'll host it and reply to this with a URL link in case that would also result in a drop or moderation.
>
>
>
> Anyhow:
>
> In testing the new RHEL 7.5 beta, we've discovered that hosts using AFS fail to boot after the upgrade, with Openafs 1.6.22.1 installed.
>
> We are wondering if some of the non-guaranteed kernel ABIs that OpenAFS uses might have changed with the latest kernel provided in RHEL 7.
>
> I've attached a picture of the trace.
>
> Anyone else kicking the tires on the new RHEL yet?
>
> Thanks!
>
>
>
>

--
Stephan Wiesand
DESY -DV-
Platanenallee 6
15738 Zeuthen, Germany



_______________________________________________
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info

--94eb2c07464a77f61e05642db8e9--

From kaduk@mit.edu Fri Feb 2 01:14:21 2018
From: kaduk@mit.edu (Benjamin Kaduk)
Date: Thu, 1 Feb 2018 19:14:21 -0600
Subject: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up
In-Reply-To: <98B9E358-A528-409E-B527-612B90DE904A@desy.de>
References: <98B9E358-A528-409E-B527-612B90DE904A@desy.de>
Message-ID: <20180202011421.GS12363@mit.edu>

On Thu, Feb 01, 2018 at 05:11:24PM +0100, Stephan Wiesand wrote:
> Comparing the 1.6.22.2 module builds from the SL packaging, where the kABI
> hashes of the used symbols are stored as a requirement, it seems none of
> those hashes changed between -693 and -830.
>
> There are two differences in the configure results:
>
> -ac_cv_linux_header_sched_signal_h=no
> +ac_cv_linux_header_sched_signal_h=yes
>
> -ac_cv_linux_struct_file_operations_has_iterate=no
> +ac_cv_linux_struct_file_operations_has_iterate=yes

That's very helpful to know.

Does the new tree actually have a sched/signal.h header?
Does the new struct file_operations have an 'iterate' member
function?

(The idea being to tell whether they changed something in new and
interesting ways or our configure test(s) are broken.)

-Ben

> And there's quite a bit of churn in include/linux/fs.h (and some in key.h).

From stephan.wiesand@desy.de Fri Feb 2 08:55:09 2018
From: stephan.wiesand@desy.de (Stephan Wiesand)
Date: Fri, 2 Feb 2018 09:55:09 +0100
Subject: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up
In-Reply-To: <20180202011421.GS12363@mit.edu>
References: <98B9E358-A528-409E-B527-612B90DE904A@desy.de> <20180202011421.GS12363@mit.edu>
Message-ID:

> On 2. Feb 2018, at 02:14, Benjamin Kaduk wrote:
>
> On Thu, Feb 01, 2018 at 05:11:24PM +0100, Stephan Wiesand wrote:
>> Comparing the 1.6.22.2 module builds from the SL packaging, where the
>> kABI hashes of the used symbols are stored as a requirement, it seems
>> none of those hashes changed between -693 and -830.
>>
>> There are two differences in the configure results:
>>
>> -ac_cv_linux_header_sched_signal_h=no
>> +ac_cv_linux_header_sched_signal_h=yes
>>
>> -ac_cv_linux_struct_file_operations_has_iterate=no
>> +ac_cv_linux_struct_file_operations_has_iterate=yes
>
> That's very helpful to know.
>
> Does the new tree actually have a sched/signal.h header?

Yes it does. The only content is a guarded include of <linux/sched.h>

> Does the new struct file_operations have an 'iterate' member
> function?

Yes it does, wrapped in a RH_KABI_ITERATE macro.

> (The idea being to tell whether they changed something in new and
> interesting ways or our configure test(s) are broken.)

It's the former :-(

--
Stephan Wiesand
DESY -DV-
Platanenallee 6
15738 Zeuthen, Germany

From stephan.wiesand@desy.de Fri Feb 2 09:05:50 2018
From: stephan.wiesand@desy.de (Stephan Wiesand)
Date: Fri, 2 Feb 2018 10:05:50 +0100
Subject: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up
In-Reply-To:
References: <98B9E358-A528-409E-B527-612B90DE904A@desy.de> <20180202011421.GS12363@mit.edu>
Message-ID: <512FCB2D-281D-41DA-9BCA-FF3AD2AD239B@desy.de>

> On 2. Feb 2018, at 09:55, Stephan Wiesand wrote:
>
>
>> On 2. Feb 2018, at 02:14, Benjamin Kaduk wrote:
>>
>> On Thu, Feb 01, 2018 at 05:11:24PM +0100, Stephan Wiesand wrote:
>>> Comparing the 1.6.22.2 module builds from the SL packaging, where
>>> the kABI hashes of the used symbols are stored as a requirement, it
>>> seems none of those hashes changed between -693 and -830.
>>>
>>> There are two differences in the configure results:
>>>
>>> -ac_cv_linux_header_sched_signal_h=no
>>> +ac_cv_linux_header_sched_signal_h=yes
>>>
>>> -ac_cv_linux_struct_file_operations_has_iterate=no
>>> +ac_cv_linux_struct_file_operations_has_iterate=yes
>>
>> That's very helpful to know.
>>
>> Does the new tree actually have a sched/signal.h header?
>
> Yes it does. The only content is a guarded include of <linux/sched.h>
>
>> Does the new struct file_operations have an 'iterate' member
>> function?
>
> Yes it does, wrapped in a RH_KABI_ITERATE macro.

er, nonsense, that's RH_KABI_EXTEND, sorry

>
>> (The idea being to tell whether they changed something in new and
>> interesting ways or our configure test(s) are broken.)
>
> It's the former :-(

From mvanderw@nd.edu Fri Feb 2 16:26:48 2018
From: mvanderw@nd.edu (Matt Vander Werf)
Date: Fri, 2 Feb 2018 11:26:48 -0500
Subject: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up
In-Reply-To: <512FCB2D-281D-41DA-9BCA-FF3AD2AD239B@desy.de>
References: <98B9E358-A528-409E-B527-612B90DE904A@desy.de> <20180202011421.GS12363@mit.edu> <512FCB2D-281D-41DA-9BCA-FF3AD2AD239B@desy.de>
Message-ID:

--001a114103584ad70605643d306f
Content-Type: text/plain; charset="UTF-8"

Just for the sake of testing, I also installed 1.8.0pre4 RPMs on a RHEL 7.5
beta system and still had the same issue when using ls with directories
under /afs/...

Also (maybe this was already mentioned), it seems to be only directories as
well. I can do an ls of a known file in my AFS home directory just fine:

[mvanderw@<host> ~]$ echo testing > /afs/crc.nd.edu/user/m/mvanderw/testing
[mvanderw@<host> ~]$ cat /afs/crc.nd.edu/user/m/mvanderw/testing
testing
[mvanderw@<host> ~]$ ls -al /afs/crc.nd.edu/user/m/mvanderw/testing
-rw-r--r-- 1 mvanderw campus 8 Feb  2 11:20 /afs/crc.nd.edu/user/m/mvanderw/testing

vs

[mvanderw@<host> ~]$ ls -al /afs/crc.nd.edu/user/m/mvanderw
ls: reading directory /afs/crc.nd.edu/user/m/mvanderw: Not a directory
total 0

Any ideas? Or anything we can test/do that would help?

Thanks!

--
Matt Vander Werf
HPC System Administrator
University of Notre Dame
Center for Research Computing - Union Station
506 W. South Street
South Bend, IN 46601
Phone: (574) 631-0692

On Fri, Feb 2, 2018 at 4:05 AM, Stephan Wiesand wrote:

>
> > On 2. Feb 2018, at 09:55, Stephan Wiesand wrote:
> >
> >
> >> On 2. Feb 2018, at 02:14, Benjamin Kaduk wrote:
> >>
> >> On Thu, Feb 01, 2018 at 05:11:24PM +0100, Stephan Wiesand wrote:
> >>> Comparing the 1.6.22.2 module builds from the SL packaging, where the
> >>> kABI hashes of the used symbols are stored as a requirement, it seems
> >>> none of those hashes changed between -693 and -830.
> >>>
> >>> There are two differences in the configure results:
> >>>
> >>> -ac_cv_linux_header_sched_signal_h=no
> >>> +ac_cv_linux_header_sched_signal_h=yes
> >>>
> >>> -ac_cv_linux_struct_file_operations_has_iterate=no
> >>> +ac_cv_linux_struct_file_operations_has_iterate=yes
> >>
> >> That's very helpful to know.
> >>
> >> Does the new tree actually have a sched/signal.h header?
> >
> > Yes it does. The only content is a guarded include of <linux/sched.h>
> >
> >> Does the new struct file_operations have an 'iterate' member
> >> function?
> >
> > Yes it does, wrapped in a RH_KABI_ITERATE macro.
>
> er, nonsense, that's RH_KABI_EXTEND, sorry
>
> >
> >> (The idea being to tell whether they changed something in new and
> >> interesting ways or our configure test(s) are broken.)
> >
> > It's the former :-(
>
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
>

--001a114103584ad70605643d306f
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Just for the sake of testing, I also installed 1.8.0pre4 RPMs on a RHEL 7.5 beta system and still had the same issue when using ls with directories under /afs/...

Also (maybe this was already mentioned), it seems to be only directories as well. I can do an ls of a known file in my AFS home directory just fine:

[mvanderw@<host> ~]$ echo testing > /afs/crc.nd.edu/user/m/mvanderw/testing
[mvanderw@<host> ~]$ cat /afs/crc.nd.edu/user/m/mvanderw/testing
testing
[mvanderw@<host> ~]$ ls -al /afs/crc.nd.edu/user/m/mvanderw/testing
-rw-r--r-- 1 mvanderw campus 8 Feb  2 11:20 /afs/crc.nd.edu/user/m/mvanderw/testing

vs

[mvanderw@<host> ~]$ ls -al /afs/crc.nd.edu/user/m/mvanderw
ls: reading directory /afs/crc.nd.edu/user/m/mvanderw: Not a directory
total 0

Any ideas? Or anything we can test/do that would help?

Thanks!

--
Matt Vander Werf
HPC System Administrator
University of Notre Dame
Center for Research Computing - Union Station
506 W. South Street
South Bend, IN 46601
Phone: (574) 631-0692
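
For anyone wanting to collect the same data point on other hosts, the file-versus-directory behaviour shown in the transcript above can be probed from a script. The following is only a rough sketch in Python with a hypothetical helper name; it reports the symbolic errno for each step, and "Not a directory" is the message for ENOTDIR.

```python
import errno
import os
import stat


def probe_path(path):
    """Check whether a path can be stat()ed and then read (file) or listed (directory)."""
    report = {"path": path, "stat": None, "is_dir": None, "read": None}
    try:
        mode = os.stat(path).st_mode
        report["stat"] = "ok"
    except OSError as exc:
        report["stat"] = errno.errorcode.get(exc.errno, str(exc.errno))
        return report
    report["is_dir"] = stat.S_ISDIR(mode)
    try:
        if report["is_dir"]:
            os.listdir(path)  # directory listing, the step that fails for ls
        else:
            with open(path, "rb") as handle:
                handle.read(1)  # plain file read, which works in the report
        report["read"] = "ok"
    except OSError as exc:
        report["read"] = errno.errorcode.get(exc.errno, str(exc.errno))
    return report
```

On an affected client one would presumably see `is_dir` true together with `read` set to `ENOTDIR` for paths such as /afs/, while regular files report `ok`; that expectation is inferred from the ls output above and has not been verified separately.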

On Fri, Feb 2, 2018 at 4:05 AM, Stephan Wiesand <stephan.wiesand@desy.de> wrote:

> On 2. Feb 2018, at 09:55, Stephan Wiesand <stephan.wiesand@desy.de> wrote:
>
>
>> On 2. Feb 2018, at 02:14, Benjamin Kaduk <kaduk@mit.edu> wrote:
>>
>> On Thu, Feb 01, 2018 at 05:11:24PM +0100, Stephan Wiesand wrote:
>>> Comparing the 1.6.22.2 module builds from the SL packaging, where the kABI hashes of the used symbols are stored as a requirement, it seems none of those hashes changed between -693 and -830.
>>>
>>> There are two differences in the configure results:
>>>
>>> -ac_cv_linux_header_sched_signal_h=no
>>> +ac_cv_linux_header_sched_signal_h=yes
>>>
>>> -ac_cv_linux_struct_file_operations_has_iterate=no
>>> +ac_cv_linux_struct_file_operations_has_iterate=yes
>>
>> That's very helpful to know.
>>
>> Does the new tree actually have a sched/signal.h header?
>
> Yes it does. The only content is a guarded include of <linux/sched.h>
>
>> Does the new struct file_operations have an 'iterate' member
>> function?
>
> Yes it does, wrapped in a RH_KABI_ITERATE macro.

er, nonsense, that's RH_KABI_EXTEND, sorry

>
>> (The idea being to tell whether they changed something in new and
>> interesting ways or our configure test(s) are broken.)
>
> It's the former :-(
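
The question of whether struct file_operations has an 'iterate' member, and whether it is wrapped in a macro, can also be checked mechanically against a header. Below is a rough Python sketch with a hypothetical helper name. The SAMPLE text is an illustrative stand-in written for this example, not copied from the RHEL source, and only the macro name RH_KABI_EXTEND comes from the exchange above.

```python
import re

# Illustrative stand-in for a header excerpt; the real layout is an assumption.
SAMPLE = """
struct file_operations {
	int (*readdir) (struct file *, void *, filldir_t);
	RH_KABI_EXTEND(int (*iterate) (struct file *, struct dir_context *))
};
"""


def find_member(header_text, struct_name, member):
    """Find a function-pointer member in a struct and report any wrapping macro."""
    start = re.search(r"struct\s+%s\s*\{" % re.escape(struct_name), header_text)
    if not start:
        return None
    # Walk the braces to find where the struct body ends.
    depth, pos = 1, start.end()
    while pos < len(header_text) and depth:
        char = header_text[pos]
        depth += (char == "{") - (char == "}")
        pos += 1
    body = header_text[start.end():pos - 1]
    wanted = re.compile(r"\(\s*\*\s*%s\s*\)" % re.escape(member))
    for line in body.splitlines():
        if wanted.search(line):
            # An upper-case identifier directly before "(" is treated as a macro.
            macro = re.match(r"\s*([A-Z][A-Z0-9_]*)\s*\(", line)
            return {"declaration": line.strip(),
                    "macro": macro.group(1) if macro else None}
    return None
```

Run against the text of the kernel's fs.h, `find_member(text, "file_operations", "iterate")` would return the declaration line and the wrapping macro, if any; a configure test that only checks for the member's existence cannot see that distinction.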

_______________________________________________
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info

--001a114103584ad70605643d306f--

From kfiresmith@gmail.com Fri Feb 2 21:20:59 2018
From: kfiresmith@gmail.com (Kodiak Firesmith)
Date: Fri, 2 Feb 2018 16:20:59 -0500
Subject: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up
In-Reply-To:
References:
Message-ID:

--94eb2c199088f2e51b056441497e
Content-Type: text/plain; charset="UTF-8"

Not much else to report today other than expanding my test base out to a
few more RHEL 7.5b hosts, and re-rolled the 1.6.22.1-1 SRPM again, and am
still seeing the same results universally. Every host fails to boot due to
a kernel panic when it tries to load the openafs DKMS kernel module.

My next move on Monday will be to try an actual kernel-specific kmod
instead of DKMS. If that works I'll be kind of sad since we've had great
luck with DKMS until now.

 - Kodiak

On Thu, Feb 1, 2018 at 3:26 PM, Kodiak Firesmith wrote:

> I just rebuilt off-the-shelf RPMs based off of
> http://www.openafs.org/dl/openafs/1.6.22.1/openafs-1.6.22.1-1.src.rpm
> thinking maybe we had some historical patch in our build area that might
> be causing the problem, but alas, even the off-the-shelf RPMs cause a full
> wedge and reboot when openafs-client.service starts up.
>
> - Kodiak
>
> On Thu, Feb 1, 2018 at 1:23 PM, Kodiak Firesmith wrote:
>
>> Hello Rich!
>> It's a Dell Optiplex 7020 with an Intel i7-4790.
>>
>> Thanks!
>> - Kodiak
>>
>> On Thu, Feb 1, 2018 at 1:20 PM, Rich Sudlow wrote:
>>
>>> On 01/31/2018 09:43 AM, Kodiak Firesmith wrote:
>>>
>>>> https://photos.app.goo.gl/WgPsSUCLK5ojxIuH3
>>>>
>>>
>>> Greetings
>>>
>>> What processor..etc is this machine?
>>>
>>> Rich
>>>
>>>>
>>>> On Wed, Jan 31, 2018 at 9:41 AM, Kodiak Firesmith wrote:
>>>>
>>>> Folks, re-sending this because the first try never hit the list -
>>>> perhaps mail with attachments are silently dropped or held for manual
>>>> moderation? I'd originally attached an image of the stack trace. I'll
>>>> host it and reply to this with a URL link in case that would also
>>>> result in a drop or moderation.
>>>>
>>>> Anyhow:
>>>>
>>>> In testing the new RHEL 7.5 beta, we've discovered that hosts using
>>>> AFS fail to boot after the upgrade, with Openafs 1.6.22.1 installed.
>>>>
>>>> We are wondering if some of the non-guaranteed kernel ABIs that
>>>> OpenAFS uses might have changed with the latest kernel provided in
>>>> RHEL 7.
>>>>
>>>> I've attached a picture of the trace.
>>>>
>>>> Anyone else kicking the tires on the new RHEL yet?
>>>>
>>>> Thanks!
>>>>
>>>
>>> --
>>> Rich Sudlow
>>> University of Notre Dame
>>> Center for Research Computing - Union Station
>>> 506 W. South St
>>> South Bend, In 46601
>>>
>>> (574) 631-7258 (office)
>>> (574) 807-1046 (cell)
>>>
>>
>

--94eb2c199088f2e51b056441497e
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

Not much else to report today other than expanding my test base out to a few more RHEL 7.5b hosts, and re-rolled the 1.6.22.1-1 SRPM again, and am still seeing the same results universally.  Every host fails to boot due to a kernel panic when it tries to load the openafs DKMS kernel module.

My next move on Monday will be to try an actual kernel-specific kmod instead of DKMS.  If that works I'll be kind of sad since we've had great luck with DKMS until now.

 - Kodiak
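
When comparing a DKMS-built module with a kernel-specific kmod, one quick check is whether the module on disk was built for the running kernel. The sketch below is an assumption-laden illustration in Python, not something taken from the thread: it expects the vermagic string as printed by `modinfo -F vermagic` for the module file, and the function name is hypothetical.

```python
import platform


def module_matches_kernel(vermagic, running_release=None):
    """Return True if the first field of a vermagic string equals the kernel release.

    vermagic: text such as "<kernel release> SMP mod_unload modversions".
    running_release: defaults to the release of the kernel running this script.
    """
    if running_release is None:
        running_release = platform.release()
    fields = vermagic.split()
    return bool(fields) and fields[0] == running_release
```

A mismatch is only a hint that the module was built against a different kernel. It does not by itself explain a panic, since a module built for an older kernel can still be loadable when the symbol versions it uses are unchanged.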

On Thu, Feb 1, 2018 at 3:26 PM, Kodiak Firesmith <kfiresmith@gmail.com> wrote:
I just rebuilt off-the-shelf RPMs based off of http://www.openafs.org/dl/openafs/1.6.22.1/openafs-1.6.22.1-1.src.rpm thinking maybe we had some historical patch in our build area that might be causing the problem, but alas, even the off-the-shelf RPMs cause a full wedge and reboot when openafs-client.service starts up.

 - Kodiak

On Thu, Feb 1, 2018 at 1:23 PM, Kodiak Firesmith <kfiresmith@gmail.com> wrote:
Hello Rich!
It's a Dell Optiplex 7020 with an Intel i7-4790.

Thanks!
 - Kodiak

On Thu, Feb 1, 2018 at 1:20 PM, Rich Sudlow <rich@nd.edu> wrote:
On 01/31/2018 09:43 AM, Kodiak Firesmith wrote:
https://photos.app.goo.gl/WgPsSUCLK5ojxIuH3

Greetings

What processor..etc is this machine?

Rich




On Wed, Jan 31, 2018 at 9:41 AM, Kodiak Firesmith <kfiresmith@gmail.com <mailto:kfiresmith@gmail.com>> wrote:

    Folks, re-sending this because the first try never hit the list - perhaps
    mail with attachments are silently dropped or held for manual moderation?
    I'd originally attached an image of the stack trace.  I'll host it and reply
    to this with a URL link in case that would also result in a drop or moderation.



    Anyhow:

    In testing the new RHEL 7.5 beta, we've discovered that hosts using AFS fail
    to boot after the upgrade, with Openafs 1.6.22.1 installed.
    We are wondering if some of the non-guaranteed kernel ABIs that OpenAFS uses
    might have changed with the latest kernel provided in RHEL 7.

    I've attached a picture of the trace.

    Anyone else kicking the tires on the new RHEL yet?

    Thanks!




--
Rich Sudlow
University of Notre Dame
Center for Research Computing - Union Station
506 W. South St
South Bend, In 46601

(574) 631-7258 (office)
(574) 807-1046 (cell)



--94eb2c199088f2e51b056441497e--

From stephan.wiesand@desy.de Fri Feb 2 21:36:00 2018
From: stephan.wiesand@desy.de (Stephan Wiesand)
Date: Fri, 2 Feb 2018 22:36:00 +0100
Subject: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up
In-Reply-To:
References:
Message-ID:

While additional data points are obviously most welcome, there is no
expectation that this issue is fixed with 1.6.22.x or 1.8.x right now.
Some serious work will be required to adapt OpenAFS to the changes in
this kernel (series), though there's some hope that it won't be quite as
hard to fix as the 7.4 getcwd issue.

- Stephan

> On 02.Feb 2018, at 22:20, Kodiak Firesmith wrote:
>
> Not much else to report today other than expanding my test base out to
> a few more RHEL 7.5b hosts, and re-rolled the 1.6.22.1-1 SRPM again, and
> am still seeing the same results universally. Every host fails to boot
> due to a kernel panic when it tries to load the openafs DKMS kernel
> module.
>
> My next move on Monday will be to try an actual kernel-specific kmod
> instead of DKMS. If that works I'll be kind of sad since we've had
> great luck with DKMS until now.
>
> - Kodiak
>
> On Thu, Feb 1, 2018 at 3:26 PM, Kodiak Firesmith wrote:
> I just rebuilt off-the-shelf RPMs based off of
> http://www.openafs.org/dl/openafs/1.6.22.1/openafs-1.6.22.1-1.src.rpm
> thinking maybe we had some historical patch in our build area that might
> be causing the problem, but alas, even the off-the-shelf RPMs cause a
> full wedge and reboot when openafs-client.service starts up.
>
> - Kodiak
>
> On Thu, Feb 1, 2018 at 1:23 PM, Kodiak Firesmith wrote:
> Hello Rich!
> It's a Dell Optiplex 7020 with an Intel i7-4790.
>
> Thanks!
> - Kodiak
>
> On Thu, Feb 1, 2018 at 1:20 PM, Rich Sudlow wrote:
> On 01/31/2018 09:43 AM, Kodiak Firesmith wrote:
> https://photos.app.goo.gl/WgPsSUCLK5ojxIuH3
>
> Greetings
>
> What processor..etc is this machine?
>
> Rich
>
> On Wed, Jan 31, 2018 at 9:41 AM, Kodiak Firesmith wrote:
>
> Folks, re-sending this because the first try never hit the list -
> perhaps mail with attachments are silently dropped or held for manual
> moderation? I'd originally attached an image of the stack trace.
> I'll host it and reply to this with a URL link in case that would also
> result in a drop or moderation.
>
> Anyhow:
>
> In testing the new RHEL 7.5 beta, we've discovered that hosts using AFS
> fail to boot after the upgrade, with Openafs 1.6.22.1 installed.
>
> We are wondering if some of the non-guaranteed kernel ABIs that OpenAFS
> uses might have changed with the latest kernel provided in RHEL 7.
>
> I've attached a picture of the trace.
>
> Anyone else kicking the tires on the new RHEL yet?
>
> Thanks!

From kfiresmith@gmail.com Fri Feb 2 23:04:56 2018
From: kfiresmith@gmail.com (Kodiak Firesmith)
Date: Fri, 2 Feb 2018 18:04:56 -0500
Subject: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up
In-Reply-To:
References:
Message-ID:

--f403045c22c4b671b2056442bdda
Content-Type: text/plain; charset="UTF-8"

Thanks Stephan,

I'm relatively new to handling OpenAFS. Are these problems part of a
normal "kernel release; openafs update" cycle and perhaps I'm getting
snagged just by being too early of an adopter? I wanted to raise the
alarm on this and see if anything else was needed from me as the
reporter of the issue, but perhaps that's an overreaction to what is
just part of a normal process I just haven't been tuned into in prior
RHEL release cycles?

Should I try to get an account set up at http://rt.central.org and file
a bug?

Thanks!
- Kodiak

On Fri, Feb 2, 2018 at 4:36 PM, Stephan Wiesand wrote:

> While additional data points are obviously most welcome, there is no
> expectation that this issue is fixed with 1.6.22.x or 1.8.x right now.
> Some serious work will be required to adapt OpenAFS to the changes in
> this kernel (series), though there's some hope that it won't be quite
> as hard to fix as the 7.4 getcwd issue.
>
> - Stephan
>
> > On 02.Feb 2018, at 22:20, Kodiak Firesmith wrote:
> >
> > Not much else to report today other than expanding my test base out
> > to a few more RHEL 7.5b hosts, and re-rolled the 1.6.22.1-1 SRPM
> > again, and am still seeing the same results universally. Every host
> > fails to boot due to a kernel panic when it tries to load the openafs
> > DKMS kernel module.
> >
> > My next move on Monday will be to try an actual kernel-specific kmod
> > instead of DKMS. If that works I'll be kind of sad since we've had
> > great luck with DKMS until now.
> >
> > - Kodiak
> >
> > On Thu, Feb 1, 2018 at 3:26 PM, Kodiak Firesmith wrote:
> > I just rebuilt off-the-shelf RPMs based off of
> > http://www.openafs.org/dl/openafs/1.6.22.1/openafs-1.6.22.1-1.src.rpm
> > thinking maybe we had some historical patch in our build area that
> > might be causing the problem, but alas, even the off-the-shelf RPMs
> > cause a full wedge and reboot when openafs-client.service starts up.
> >
> > - Kodiak
> >
> > On Thu, Feb 1, 2018 at 1:23 PM, Kodiak Firesmith wrote:
> > Hello Rich!
> > It's a Dell Optiplex 7020 with an Intel i7-4790.
> >
> > Thanks!
> > - Kodiak
> >
> > On Thu, Feb 1, 2018 at 1:20 PM, Rich Sudlow wrote:
> > On 01/31/2018 09:43 AM, Kodiak Firesmith wrote:
> > https://photos.app.goo.gl/WgPsSUCLK5ojxIuH3
> >
> > Greetings
> >
> > What processor..etc is this machine?
> >
> > Rich
> >
> > On Wed, Jan 31, 2018 at 9:41 AM, Kodiak Firesmith wrote:
> >
> > Folks, re-sending this because the first try never hit the list -
> > perhaps mail with attachments are silently dropped or held for
> > manual moderation? I'd originally attached an image of the stack
> > trace. I'll host it and reply to this with a URL link in case that
> > would also result in a drop or moderation.
> >
> > Anyhow:
> >
> > In testing the new RHEL 7.5 beta, we've discovered that hosts using
> > AFS fail to boot after the upgrade, with Openafs 1.6.22.1 installed.
> >
> > We are wondering if some of the non-guaranteed kernel ABIs that
> > OpenAFS uses might have changed with the latest kernel provided in
> > RHEL 7.
> >
> > I've attached a picture of the trace.
> >
> > Anyone else kicking the tires on the new RHEL yet?
> >
> > Thanks!

--f403045c22c4b671b2056442bdda
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--f403045c22c4b671b2056442bdda--

From kaduk@mit.edu Fri Feb 2 23:39:15 2018
From: kaduk@mit.edu (Benjamin Kaduk)
Date: Fri, 2 Feb 2018 17:39:15 -0600
Subject: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up
In-Reply-To:
References:
Message-ID: <20180202233914.GN12363@mit.edu>

On Fri, Feb 02, 2018 at 04:20:59PM -0500, Kodiak Firesmith wrote:
> Not much else to report today other than expanding my test base out to a
> few more RHEL 7.5b hosts, and re-rolled the 1.6.22.1-1 SRPM again, and am
> still seeing the same results universally. Every host fails to boot due to
> a kernel panic when it tries to load the openafs DKMS kernel module.

The screen picture you posted earlier had two entries for attempting to
start the openafs client (both failed). The client is known to panic if
afsd is run a second time without an unload/load of the kernel module in
between. Is it possible that this is happening in your setup?

-Ben

From jaltman@auristor.com Sun Feb 4 01:11:37 2018
From: jaltman@auristor.com (Jeffrey Altman)
Date: Sat, 3 Feb 2018 20:11:37 -0500
Subject: [OpenAFS] Re: RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up
In-Reply-To:
References:
Message-ID: <54924524-154a-bee0-1719-77f8af636f63@auristor.com>

This is a cryptographically signed message in MIME format.

--------------ms060601090009020502050408
Content-Type: multipart/mixed; boundary="------------7071D8D7519B357F240449A9"
Content-Language: en-US

This is a multi-part message in MIME format.
--------------7071D8D7519B357F240449A9
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On 2/2/2018 6:04 PM, Kodiak Firesmith wrote:
> I'm relatively new to handling OpenAFS. Are these problems part of a
> normal "kernel release; openafs update" cycle and perhaps I'm getting
> snagged just by being too early of an adopter? I wanted to raise the
> alarm on this and see if anything else was needed from me as the
> reporter of the issue, but perhaps that's an overreaction to what is
> just part of a normal process I just haven't been tuned into in prior
> RHEL release cycles?

Kodiak,

On RHEL, DKMS is safe to use for kernel modules that restrict themselves
to using the restricted set of kernel interfaces (the RHEL KABI) that
Red Hat has designated will be supported across the lifespan of the RHEL
major version number. OpenAFS is not such a kernel module. As a result
it is vulnerable to breakage each and every time a new kernel is shipped.

There are two types of failures that can occur:

 1. a change results in failure to build the OpenAFS kernel module
    for the new kernel

 2. a change results in the OpenAFS kernel module building and
    successfully loading but failing to operate correctly

It is the second of these possibilities that has taken place with the
release of the 3.10.0-830.el7 kernel shipped as part of the RHEL 7.5
beta.

Are you an early adopter of RHEL 7.5 beta? Absolutely, it's a beta
release and as such you should expect that there will be bugs and that
third party kernel modules that do not adhere to the KABI functionality
might have compatibility issues.

There was a compatibility issue with the RHEL 7.4 kernel
(3.10.0-693.1.1.el7) as well that was only fixed in the OpenAFS 1.6
release series this past week as part of 1.6.22.2:

  http://www.openafs.org/dl/openafs/1.6.22.2/RELNOTES-1.6.22.2

Jeffrey Altman
AuriStor, Inc.

P.S. - Welcome to the community.
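
The KABI point above can be illustrated with a small sketch. It only
shows the set arithmetic behind "does this module stay inside the
supported interface list"; the symbol names and the whitelist contents
below are invented placeholders, not real RHEL KABI entries, and the
helper names are hypothetical.

```python
def symbols_outside_kabi(module_imports, kabi_whitelist):
    """Return the imported kernel symbols that are not on the whitelist, sorted."""
    return sorted(set(module_imports) - set(kabi_whitelist))


def rebuild_is_low_risk(module_imports, kabi_whitelist):
    """A module is only 'safe' across kernel updates if nothing falls outside."""
    return not symbols_outside_kabi(module_imports, kabi_whitelist)


# Hypothetical data: placeholder names, NOT actual kernel symbols.
kabi_whitelist = {"stable_alloc", "stable_free", "stable_log"}
conforming_module = {"stable_alloc", "stable_free"}
openafs_like_module = {"stable_alloc", "internal_vfs_helper", "internal_cred_field"}

print(rebuild_is_low_risk(conforming_module, kabi_whitelist))
print(symbols_outside_kabi(openafs_like_module, kabi_whitelist))
```

A module in the second situation can hit either failure type listed
above: a changed signature tends to break the build, while a changed
semantic or structure layout can let the module build and load and
still misbehave.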
--------------7071D8D7519B357F240449A9--

--------------ms060601090009020502050408--

From jose.calhariz@tecnico.ulisboa.pt Sun Feb 4 12:29:30 2018
From: jose.calhariz@tecnico.ulisboa.pt (Jose M Calhariz)
Date: Sun, 4 Feb 2018 12:29:30 +0000
Subject: [OpenAFS] connection timed out, how long is the timeout?
Message-ID: <20180204122930.243hfca2xgjm7oz4@calhariz.com>

Hi,

I am chasing the root problem in my infrastructure of afsdb and
afs-fileservers. Sometimes my afsdb loses quorum in the middle of a
vos operation or the Linux clients time out talking to the
file servers. To help diagnose the problem I would like to know how
long the timeout is and whether I can change the connection timeouts in
the Debian clients and for the vos operations. My plan is to increase
and decrease the timeouts in OpenAFS and other timeouts in Linux to
identify whether I have a problem with the data network, the iSCSI
network, overload on the VM hosts, overload on the file servers, or
some other problem.

The core of my infrastructure is 4 afsdb servers running Debian 9 and
using OpenAFS 1.6.20 from Debian, on a shared virtualization platform.
The file servers, running Debian 9 and using OpenAFS 1.6.20 from Debian,
are VMs on dedicated hosts for OpenAFS on top of libvirt/KVM.

Kind regards
Jose M Calhariz

--
--

Coca-Cola embodies the true beauty of capitalism.
It is a kind of secular religion, with no moral teaching and no
commandment other than increasing the consumption of its drink

--Mark Pendergrast

From dirk.heinrichs@altum.de Sun Feb 4 12:54:26 2018
From: dirk.heinrichs@altum.de (Dirk Heinrichs)
Date: Sun, 4 Feb 2018 13:54:26 +0100
Subject: [OpenAFS] connection timed out, how long is the timeout?
In-Reply-To: <20180204122930.243hfca2xgjm7oz4@calhariz.com>
References: <20180204122930.243hfca2xgjm7oz4@calhariz.com>
Message-ID: <758b1f24-b1a5-28f7-ef47-a8169d29cd8c@altum.de>

This is an OpenPGP/MIME signed message (RFC 4880 and 3156)
--h6dm9IiOSvNZENHNc38GpuZNrYmKslsKL
Content-Type: multipart/mixed; boundary="eP6honOXGsV2Aid1pySxGGC1H0ByHL0ec"; protected-headers="v1"
From: Dirk Heinrichs
To: openafs-info@openafs.org
Message-ID: <758b1f24-b1a5-28f7-ef47-a8169d29cd8c@altum.de>
Subject: Re: [OpenAFS] connection timed out, how long is the timeout?
References: <20180204122930.243hfca2xgjm7oz4@calhariz.com>
In-Reply-To: <20180204122930.243hfca2xgjm7oz4@calhariz.com>

--eP6honOXGsV2Aid1pySxGGC1H0ByHL0ec
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable
Content-Language: de-DE

On 04.02.2018 at 13:29, Jose M Calhariz wrote:

> The core of my infra-structure are 4 afsdb

Wasn't it so that it's better to have an odd number of DB servers (with
a max. of 5)?

Bye...

    Dirk

--
Dirk Heinrichs
GPG Public Key: D01B367761B0F7CE6E6D81AAD5A2E54246986015
Secure Internet communication: http://www.retroshare.org
Privacy handbook: https://www.privacy-handbuch.de

--eP6honOXGsV2Aid1pySxGGC1H0ByHL0ec--

--h6dm9IiOSvNZENHNc38GpuZNrYmKslsKL--

From jose.calhariz@tecnico.ulisboa.pt Sun Feb 4 13:48:14 2018
From: jose.calhariz@tecnico.ulisboa.pt (Jose M Calhariz)
Date: Sun, 4 Feb 2018 13:48:14 +0000
Subject: [OpenAFS] connection timed out, how long is the timeout?
In-Reply-To: <758b1f24-b1a5-28f7-ef47-a8169d29cd8c@altum.de>
References: <20180204122930.243hfca2xgjm7oz4@calhariz.com> <758b1f24-b1a5-28f7-ef47-a8169d29cd8c@altum.de>
Message-ID: <20180204134813.eihnmb3dlb2bfji3@calhariz.com>

On Sun, Feb 04, 2018 at 01:54:26PM +0100, Dirk Heinrichs wrote:
> On 04.02.2018 at 13:29, Jose M Calhariz wrote:
>
> > The core of my infra-structure are 4 afsdb
>
> Wasn't it so that it's better to have an odd number of DB servers (with
> a max. of 5)?

Yes, it would be better with an odd number. For historical reasons it is
stuck at 4. But I think this is not the root cause of my problem.

>
> Bye...
>
>     Dirk
>

Kind regards
Jose M Calhariz

--
--

Coca-Cola embodies the true beauty of capitalism. It is a kind of
secular religion, with no moral teaching and no commandment other than
increasing the consumption of its drink

--Mark Pendergrast

From kaduk@mit.edu Sun Feb 4 19:27:07 2018
From: kaduk@mit.edu (Benjamin Kaduk)
Date: Sun, 4 Feb 2018 13:27:07 -0600
Subject: [OpenAFS] connection timed out, how long is the timeout?
In-Reply-To: <20180204122930.243hfca2xgjm7oz4@calhariz.com>
References: <20180204122930.243hfca2xgjm7oz4@calhariz.com>
Message-ID: <20180204192707.GU12363@mit.edu>

On Sun, Feb 04, 2018 at 12:29:30PM +0000, Jose M Calhariz wrote:
>
> Hi,
>
> I am chasing the root problem in my infra-structure of afsdb and
> afs-fileservers. Sometimes my afsdb loses quorum in the middle of a

It is a pretty disruptive event to lose quorum; do you have any idea
what might be responsible for that happening?

> vos operation or the Linux clients time out talking to the
> file servers. To help diagnose the problem I would like to know how
> long is the timeout and if I can change the time out connections in
> the Debian clients and for the vos operations.
> My plan is to increase and

The ubik election to determine quorum happens every SMALLTIME (60)
seconds, but normally the current coordinator will retain that role
and operations can span multiple election cycles.

Most of the timeouts involved (e.g., RX_IDLE_DEAD_TIME and
AFS_RXDEADTIME) are also on the order of a minute.

I think you'd need to recompile in order to adjust these timeouts,
though. And I really would recommend tracking down why you're
losing quorum before trying to paper over things with longer
timeouts.

-Ben

> decrease the timeouts in OpenAFS and other timeouts in Linux to
> identify if I have a possible problem with the data network, iSCSI
> network, overload on the hosts of VM, overload on the file servers or
> other possible problem.
>
> The core of my infra-structure are 4 afsdb running Debian 9, and using
> OpenAFS from Debian 1.6.20, on a shared virtualization platform. The
> file-servers running Debian 9 and using OpenAFS from Debian, 1.6.20,
> are VMs in dedicated hosts for OpenAFS on top of libvirt/KVM.
>
>
> Kind regards
> Jose M Calhariz
>
> --
> --
>
> Coca-Cola embodies the true beauty of capitalism. It is a kind of
> secular religion, with no moral teaching and no commandment other than
> increasing the consumption of its drink
>
> --Mark Pendergrast
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info

From jose.calhariz@tecnico.ulisboa.pt Sun Feb 4 19:55:20 2018
From: jose.calhariz@tecnico.ulisboa.pt (Jose M Calhariz)
Date: Sun, 4 Feb 2018 19:55:20 +0000
Subject: [OpenAFS] connection timed out, how long is the timeout?
In-Reply-To: <20180204192707.GU12363@mit.edu>
References: <20180204122930.243hfca2xgjm7oz4@calhariz.com> <20180204192707.GU12363@mit.edu>
Message-ID: <20180204195520.qhscfm3cxcrwtlt2@calhariz.com>

On Sun, Feb 04, 2018 at 01:27:07PM -0600, Benjamin Kaduk wrote:
> On Sun, Feb 04, 2018 at 12:29:30PM +0000, Jose M Calhariz wrote:
> >
> > Hi,
> >
> > I am chasing the root problem in my infra-structure of afsdb and
> > afs-fileservers. Sometimes my afsdb loses quorum in the middle of a
>
> It is a pretty disruptive event to lose quorum; do you have any idea
> what might be responsible for that happening?

In recent times I have twice seen a "vos release" of a critical volume
fail. I may have wrongly interpreted the error message, so I paste the
last one here:

  Could not release lock on the VLDB entry for volume XXXXXXXXXXX
  u: major synchronization error
  Error in vos release command.
  u: major synchronization error

>
> > vos operation or the Linux clients time out talking to the
> > file servers. To help diagnose the problem I would like to know how
> > long is the timeout and if I can change the time out connections in
> > the Debian clients and for the vos operations. My plan is to increase and
>
> The ubik election to determine quorum happens every SMALLTIME (60)
> seconds, but normally the current coordinator will retain that role
> and operations can span multiple election cycles.
>
> Most of the timeouts involved (e.g., RX_IDLE_DEAD_TIME and
> AFS_RXDEADTIME) are also on the order of a minute.
>
> I think you'd need to recompile in order to adjust these timeouts,
> though. And I really would recommend tracking down why you're
> losing quorum before trying to paper over things with longer
> timeouts.

I am also chasing a second problem, where a Debian OpenAFS client fails
to communicate with the fileserver, and this problem is frequent. May I
assume that this timeout is about 60 seconds, and that I need to
recompile the client to increase or decrease it?
> >
> > -Ben
> >
> > > decrease the timeouts in OpenAFS and other timeouts in Linux to
> > > identify if I have a possible problem with the data network, iSCSI
> > > network, overload on the hosts of VM, overload on the file servers or
> > > other possible problem.
> > >
> > > The core of my infra-structure are 4 afsdb running Debian 9, and using
> > > OpenAFS from Debian 1.6.20, on a shared virtualization platform. The
> > > file-servers running Debian 9 and using OpenAFS from Debian, 1.6.20,
> > > are VMs in dedicated hosts for OpenAFS on top of libvirt/KVM.
> > >
> > >
> > > Kind regards
> > > Jose M Calhariz
> > >
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
>

Kind regards
Jose M Calhariz

--
--

.adanibober odnes enilgaT .edraugA

From jaltman@auristor.com Sun Feb 4 22:21:16 2018
From: jaltman@auristor.com (Jeffrey Altman)
Date: Sun, 4 Feb 2018 17:21:16 -0500
Subject: [OpenAFS] connection timed out, how long is the timeout?
In-Reply-To: <20180204122930.243hfca2xgjm7oz4@calhariz.com>
References: <20180204122930.243hfca2xgjm7oz4@calhariz.com>
Message-ID: <1ff6bf63-81c0-6d73-2178-60e792ac9506@auristor.com>

This is a cryptographically signed message in MIME format.

--------------ms010403050703030304060109
Content-Type: multipart/mixed; boundary="------------71CAD8B71AC7923BE183F978"
Content-Language: en-US

This is a multi-part message in MIME format.
--------------71CAD8B71AC7923BE183F978
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On 2/4/2018 7:29 AM, Jose M Calhariz wrote:
> I am chasing the root problem in my infra-structure of afsdb and
> afs-fileservers. Sometimes my afsdb loses quorum in the middle of a
> vos operation or the Linux clients time out talking to the
> file servers.
> To help diagnose the problem I would like to know how
> long is the timeout and if I can change the time out connections in
> the Debian clients and for the vos operations.
>[...]
> The core of my infra-structure are 4 afsdb running Debian 9, and using
> OpenAFS from Debian 1.6.20, on a shared virtualization platform. The
> file-servers running Debian 9 and using OpenAFS from Debian, 1.6.20,
> are VMs in dedicated hosts for OpenAFS on top of libvirt/KVM.

Jose,

There is unlikely to be a single problem but since I'm procrastinating
and curious I decided to perform some research on your cell. This
research is the type of analysis that AuriStor performs on behalf of our
support customers. Many of the problems you are experiencing with
OpenAFS are likely due to or exacerbated by architectural limitations
that are simply not present in AuriStorFS.

Your cell has four db servers afs01 through afs04 with associated IP
addresses that rank the servers from afs01 through afs04. Therefore
afs01 is the preferred coordinator (sync site) and if it's not running
afs02 will be elected. Given there are four servers it is not possible
for afs03 or afs04 to be elected.

There are of course multiple independent ubik database services (vl, pt,
and bu) and it is possible for quorum to exist for one and not for
others.

The vl service is used to store volume location information as well as
fileserver/volserver location information. vl entries are modified when
a fileserver restarts, when a vos command locks and unlocks an entry, or
creates, updates or deletes an entry. Its primary consumer is the afs
client which queries volume and file server location information.

The pt service stores user and group entries. pt entries are modified
by pts when new user entries are created, modified or deleted; and when
groups are created, modified or deleted; or when group membership
information is modified.
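
As an aside on the election outcome described earlier in this message
for the four db servers, here is a minimal sketch under a simplified
voting model. The weighting rule used (each reachable server counts two
half-votes for the lowest-addressed reachable server, the cell's
lowest-addressed server adds one extra half-vote when it takes part,
and a candidate wins only with more half-votes than there are servers)
is an assumption for illustration, not a transcription of the ubik
code. The function name is hypothetical.

```python
def elected_coordinator(servers, reachable):
    """Return the server that wins the election, or None if nobody can.

    servers   -- all db servers, ordered by ascending IP address
    reachable -- set of servers that are up and can exchange votes
    """
    up = [s for s in servers if s in reachable]
    if not up:
        return None
    candidate = up[0]              # everyone votes for the lowest address they can reach
    half_votes = 2 * len(up)       # each vote is worth two half-votes
    if servers[0] in reachable:    # tie-breaker: extra half-vote from the lowest address
        half_votes += 1
    return candidate if half_votes > len(servers) else None


servers = ["afs01", "afs02", "afs03", "afs04"]
print(elected_coordinator(servers, set(servers)))
print(elected_coordinator(servers, {"afs02", "afs03", "afs04"}))
print(elected_coordinator(servers, {"afs03", "afs04"}))
```

Under this model afs03 could only become the candidate with both afs01
and afs02 down, which leaves two of four votes and no tie-breaker, so
no coordinator is elected at all.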
The primary consumer is the fileserver which queries the pt service for
user and host current protection sets each time a client establishes an
rxkad connection to the fileserver.

The vl and pt services are of course ubik services. Therefore each
vlserver and ptserver process also offers the ubik disk and vote
services which are critical. The vote service is used to hold
elections, distribute current database version info, and maintain
quorum. The disk service is used to distribute the database, update the
database, and maintain database consistency.

It should be noted that the vote service is time sensitive in that
packets that are used to request votes from peers and the responses only
have a limited valid lifetime.

Some statistics regarding your vl service. Each server is configured
with 16 LWP threads. afs03 and afs04 have both failed to service calls
in a timely fashion since the last restart. If those failures were vote
or disk calls then the coordinator would mark afs03 and afs04 as
unreachable, force a recovery operation, and, if both were marked down
across an election, this could result in loss of quorum.

Since the last restart afs01 has processed 1894352 vl transactions,
afs02 1075698 transactions, afs03 2059186 transactions, and afs04
1403592 transactions. That will provide you some idea of the load
balancing across your cache managers. The coordinator of course is the
only one to handle write transactions; the rest are read transactions.

For the pt service the transaction counts are afs01 1818212, afs02
1619962, afs03 1554918, and afs04 1075620. Roughly on par with the vl
service load. Like the vl service each server has 16 LWP threads.
However, unlike the vl service the pt service is not keeping up with the
requests. Since the last restart all four servers have failed to
service incoming calls in a timely manner thousands of times each.

The pt service failing to be responsive is a problem because it has
ripple effects on the file servers.
The longer it takes a fileserver to query the CPS data the longer it
takes to accept a new connection from a cache manager.

The ubik services in all versions of OpenAFS prior to the 1.8 branch
have been built as LWP (cooperatively threaded) processes. There is
only a single thread in the process that swaps context state. The rx
threads (listener, event, ...), the vote, disk, and application (vl, pt,
bu, ...) contexts are swapped in either upon a blocking event or a
yield. Failure of a context to yield blocks other activities including
reading packets, processing requests, etc.

Like AuriStorFS the OpenAFS 1.8 series converts the ubik services (vl,
pt, bu) to native threading. This will permit the vote and disk
services and the rx threads (listener, event, ...) to operate with
greater parallelism. Unlike AuriStorFS, the OpenAFS implementation
still relies to a large extent on global locks for thread safety so all
of the resource contention remains. Still, 1.8 will be much less
vulnerable to vote packets being delayed beyond their validity.

The vote and disk timeouts cannot be adjusted because they are a core
part of the protocol.

Each of the fileserver bnodes is configured as such:

  dafileserver -L -b 8192 -vc 32768 -s 65536 -l 16384 -p 256 \
      -udpsize 16777216 -cb 1048576 -rxpck 4000

  davolserver -p 16 -udpsize 16777216

The fileserver and volserver use pthreads and not LWP. The above states
that there should be a total of 256 fileserver threads (including rx and
other background threads) and 16 volserver threads. The fileservers are
also configured with one million callback entries each.

It might be the case that some fileservers in the cell are firewalled
from the internet. For those that are visible, here are some important
details.

         Too few   Max GUCB   Max FStat   Wait-for
         CBs       seconds    seconds     threads
afs11    yes       112        101         no
afs12    yes       110        150         no
afs13    yes       111         97         no
afs14    yes       112        155         no
afs15    yes        80        166         no
afs16    yes       112        106         no
afs17    yes       118        118         no
afs18    yes       105        123         no
afs19    no         56         53         yes
afs20    no         32         85         no
afs21    no         72         63         yes
afs22    no         56         29         no
afs23    yes        64         29         no
afs24    yes       106         84         no
afs25    no         32         60         no
afs26    no         64        120         yes

The first column "too few CBs" indicates whether the number of callbacks
requested by clients exceeds the size of the callback table. When the
callback table is too small thrashing occurs because the fileserver is
forced to invalidate existing callbacks in order to allocate the new
one. Those invalidations require the fileserver to notify the impacted
cache manager whose callback promise is going to be broken prior to
expiration even though there was no change to the file or directory.

The second column is the maximum number of seconds it took the
fileserver to process a GiveUpCallBacks call. A GUCB is issued when the
cache manager's vcache is too small and it attempts to notify the
fileserver that it no longer requires the callback promise because it is
flushing the object from the cache in order to make room for something
else that is likely to require a callback promise.

The third column is the maximum number of seconds it took to acquire the
current metadata for the requested object and register a callback
promise.

The fourth column is an indication of whether or not the fileserver has
experienced the situation in which it received a call from a cache
manager that could not be processed because all of the available worker
threads were blocked waiting for other operations to complete.

Before I answer your question about timeouts let me explain the workflow
necessary to process a call issued by a cache manager on a new rx
connection.

 1. CM issues call to file server on new connection.

 2. If the call is protected by rxkad, the FS issues an rxkad challenge
    to the client.

 3. If a challenge was received, the CM issues an rxkad response.

 4. FS finds an available worker thread to process the call.

 5. FS starts statistics clock.

 6. FS issues TellMeAboutYourself call to CM with the FS's capabilities.

 7. CM replies to TMAY call with its uuid and capabilities.

 8. If the UUID was previously associated with another endpoint (ipv4
    address and port) or if the endpoint was associated with another
    UUID, the FS issues ProbeUuid calls to confirm whether or not there
    is a conflict.

 9. If received, CM responds with a yes or no answer depending upon
    whether or not the Uuid matches.

10. FS obtains the necessary volume and vnode locks (if any).

11. For a StoreData call, the FS receives the rest of the data and
    writes it to disk.

12. FS issues callback notifications to other cache managers impacted
    by the call.

13. FS allocates or updates a callback promise if necessary for the
    caller.

14. FS, if necessary, issues callback invalidation to affected CM,
    which might be the same CM that issued the call that is being
    processed.

15. CM either replies or not to the callback notification. If not, the
    notification is added to the delayed callback queue for the CM.

16. FS releases any volume and vnode locks.

17. FS updates call statistics.

18. FS completes the call to the CM with success or failure.

For any rx connection there are three timeout values that can be set on
both sides of the connection.

 1. Connection timeout. How long to wait if no packets have been
    received from the peer.

 2. Idle timeout. How long to wait if ping packets are received but no
    data packets have been received. This is usually set only on the
    server side of a connection.

 3. Hard timeout. How long the call is permitted to live before it is
    killed for taking too long, even if data is flowing slowly.

The defaults are a connection timeout of 12 seconds, an idle timeout of
60 seconds on the server side, and no hard timeout.
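
The three timeout kinds and the stated defaults can be written down as a
small Python sketch. This is only an illustration of the description
above, not the Rx implementation: the RxTimeouts class, the expired()
function, and the order in which the three checks are made are all
assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RxTimeouts:
    """Hypothetical container for the three timeouts; not an Rx structure."""
    connection: Optional[float]  # seconds with no packets at all from the peer
    idle: Optional[float]        # seconds with ping packets but no data packets
    hard: Optional[float]        # maximum total lifetime of a call, in seconds


# Defaults as stated above: 12 s connection, 60 s idle (server side),
# and no hard timeout. None means "this timeout is not set".
DEFAULTS = RxTimeouts(connection=12, idle=60, hard=None)


def expired(t: RxTimeouts, since_any_packet: float,
            since_data_packet: float, call_age: float) -> Optional[str]:
    """Return the name of the first exceeded timeout, or None.

    The check order (connection, idle, hard) is an assumption of this
    sketch, chosen only to make the result deterministic.
    """
    if t.connection is not None and since_any_packet > t.connection:
        return "connection"
    if t.idle is not None and since_data_packet > t.idle:
        return "idle"
    if t.hard is not None and call_age > t.hard:
        return "hard"
    return None
```

With the defaults, a peer that keeps answering pings but has sent no data
for 61 seconds trips the idle timeout, while a call that is still moving
data is never killed for its age alone because no hard timeout is set.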

A CM typically sets a 50 second connection timeout and no idle or hard
timeout on calls to the FS. The FS sets a 50 second connection timeout
and 120 second hard timeout on calls to the CM callback service; except
for the ProbeUuid calls which are assigned a connection timeout of 12
seconds. The FS connections to the PT service use the defaults.

I selected the GiveUpCallBacks call statistics because that call doesn't
require any volume or vnode locks, nor can it involve any notifications
to other CMs. Long processing times for GUCBs mean one or more of the
following:

 a. this is the first call on a new connection and the CM's one and
    only callback service thread is not responding to the FS promptly

 b. this is the first call on a new connection and the connection
    endpoint and the CM's UUID do not match and there is a conflict to
    resolve

 c. this is the first call on a new connection and the FS's two CPS
    queries to the protection service take a long time, or time out if
    the selected ptserver stops responding to ping ack packets

 d. the FS's host table / callback table lock is in use by other
    threads and this thread cannot make progress

The FetchStatus call is similar except that it can also block waiting
for Volume and Vnode locks which might not be released until callback
notifications are issued.

So what are the potential bottlenecks that can result in extended delays
totaling tens or hundreds of seconds?

 1. The single callback service thread in the cache manager which is
    known to experience soft-deadlocks.

 2. The responsiveness of the ptservers to the file servers.

 3. Blocking on callback invalidations due to the callback table being
    too small.

 4. Network connectivity between the FS and both PT servers and CMs.

It's time for the Super Bowl so I will send off this message as is.
Perhaps it will be useful.

Jeffrey Altman
AuriStor, Inc.

--------------71CAD8B71AC7923BE183F978--

--------------ms010403050703030304060109--

From jaltman@auristor.com Mon Feb 5 06:17:34 2018
From: jaltman@auristor.com (Jeffrey Altman)
Date: Mon, 5 Feb 2018 01:17:34 -0500
Subject: [OpenAFS] connection timed out, how long is the timeout?
In-Reply-To: <758b1f24-b1a5-28f7-ef47-a8169d29cd8c@altum.de>
References: <20180204122930.243hfca2xgjm7oz4@calhariz.com> <758b1f24-b1a5-28f7-ef47-a8169d29cd8c@altum.de>
Message-ID:

This is a cryptographically signed message in MIME format.

--------------ms010702000504020009030308
Content-Type: multipart/mixed; boundary="------------C539361FAF4663616935EC50"
Content-Language: en-US

This is a multi-part message in MIME format.
--------------C539361FAF4663616935EC50
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On 2/4/2018 7:54 AM, Dirk Heinrichs wrote:
> On 04.02.2018 at 13:29, Jose M Calhariz wrote:
>
>> The core of my infra-structure are 4 afsdb
>
> Wasn't it so that it's better to have an odd number of DB servers (with
> a max. of 5)?

The maximum number of ubik servers in an AFS3 cell is 20. This is a
protocol constraint. However, due to performance characteristics it is
unlikely that anyone could run that number of servers in a production
cell. As the server count increases, so does the number of messages
that must be exchanged to conduct an election, complete database
synchronization recovery, maintain quorum, and complete remote
transactions. These messages compete with the application level
requests arriving from clients.
As the application level calls (vl, pt, ...) increase, the risk of
delayed processing of disk and vote calls increases, which can lead to
loss of quorum or remote transaction failures.

The reason that odd numbers of servers are preferred is because of the
failover properties.

 one server - single point of failure. An outage leads to read and
  write failures.

 two servers - single point of failure for writes. Only the lowest
  ipv4 address server can be elected coordinator. If it fails, writes
  are blocked. If it fails during a write transaction, read
  transactions on the second server are blocked until the first server
  recovers.

 three or four servers - either the first or second lowest ipv4 address
  servers can be elected coordinator. Any one server can fail without
  loss of write or read.

 five or six servers - any of the first three lowest ipv4 address
  servers can be elected coordinator. Any two servers can fail without
  loss of write or read.

Although adding a fourth server increases the number of servers that can
satisfy read requests, the lack of improved resiliency to failure and
the increased risk of quorum loss make it less desirable.

The original poster indicated that his ubik servers are virtual
machines. The OpenAFS Rx stack throughput is limited by the clock speed
of a single processor core. The 1.6 ubik stack is further limited by
the need to share a single processor core with all of the vote, disk and
application call processing. As a result, anything that increases the
overhead increases the risk of quorum failures. This includes
virtualization as well as the overhead imposed as a result of Meltdown
and Spectre fixes. The Meltdown and Spectre fixes can deliver a double
whammy as a result of increased overhead both within the virtual machine
and within the host's virtualization layer.

AuriStor's UBIK variant does not suffer the scaling problems of AFS3
UBIK. AuriStor's UBIK has been successfully tested with 80 ubik servers
in a cell.
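
The failover cases listed earlier in this message (one through six
servers) follow a simple arithmetic pattern, sketched here in Python.
The function name and the closed-form expressions are assumptions
introduced for illustration: they reproduce the six cases given above,
and extending them to larger server counts is an extrapolation rather
than something stated in this thread.

```python
def failover_properties(n_servers: int) -> dict:
    """Summarize the failover cases for an AFS3 ubik cell of n_servers.

    eligible_coordinators: how many of the lowest-addressed servers can
        be elected coordinator (sync site).
    tolerated_failures: how many arbitrary servers can fail without
        losing the ability to write.

    Hypothetical helper: the formulas are fitted to the cases listed in
    the message (1 to 6 servers), not taken from the ubik sources.
    """
    if n_servers < 1:
        raise ValueError("a cell needs at least one ubik server")
    return {
        "eligible_coordinators": (n_servers + 1) // 2,
        "tolerated_failures": (n_servers - 1) // 2,
    }


if __name__ == "__main__":
    # Print the table for the server counts discussed in the message.
    for n in range(1, 7):
        print(n, failover_properties(n))
```

For three and for four servers the function returns the same pair (two
eligible coordinators, one tolerated failure), which is the point made
above: the even-numbered server adds read capacity but no resiliency.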
Running that many servers is possible because of a more efficient
protocol that is incompatible with AFS3 UBIK and the efficiencies in
AuriStor's Rx implementation.

Jeffrey Altman
AuriStor, Inc.

--------------C539361FAF4663616935EC50--

--------------ms010702000504020009030308--

From stephan.wiesand@desy.de Mon Feb 5 17:31:02 2018
From: stephan.wiesand@desy.de (Stephan Wiesand)
Date: Mon, 5 Feb 2018 18:31:02 +0100
Subject: [OpenAFS] RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up
In-Reply-To: <54924524-154a-bee0-1719-77f8af636f63@auristor.com>
References: <54924524-154a-bee0-1719-77f8af636f63@auristor.com>
Message-ID:

> On 04.Feb 2018, at 02:11, Jeffrey Altman wrote:
>
> On 2/2/2018 6:04 PM, Kodiak Firesmith wrote:
>> I'm relatively new to handling OpenAFS. Are these problems part of a
>> normal "kernel release; openafs update" cycle and perhaps I'm getting
>> snagged just by being too early of an adopter? I wanted to raise the
>> alarm on this and see if anything else was needed from me as the
>> reporter of the issue, but perhaps that's an overreaction to what is
>> just part of a normal process I just haven't been tuned into in prior
>> RHEL release cycles?
>
>
> Kodiak,
>
> On RHEL, DKMS is safe to use for kernel modules that restrict themselves
> to using the restricted set of kernel interfaces (the RHEL KABI) that
> Red Hat has designated will be supported across the lifespan of the RHEL
> major version number. OpenAFS is not such a kernel module. As a result
> it is vulnerable to breakage each and every time a new kernel is shipped.

Jeffrey,

the usual way to use DKMS is to either have it build a module for a newly
installed kernel or install a prebuilt module for that kernel. It may be
possible to abuse it for providing a module built for another kernel, but
I think that won't happen accidentally.

You may be confusing DKMS with RHEL's "KABI tracking kmods". Those should
be safe to use within a RHEL minor release (and the SL packaging has been
using them like this since EL6.4), but aren't across minor releases (and
that's why the SL packaging modifies the kmod handling to require a build
for the minor release in question).

> There are two types of failures that can occur:
>
> 1. a change results in failure to build the OpenAFS kernel module
> for the new kernel
>
> 2. a change results in the OpenAFS kernel module building and
> successfully loading but failing to operate correctly

The latter shouldn't happen within a minor release, but can across minor
releases.

> It is the second of these possibilities that has taken place with the
> release of the 3.10.0-830.el7 kernel shipped as part of the RHEL 7.5 beta.
>
> Are you an early adopter of RHEL 7.5 beta? Absolutely, its a beta
> release and as such you should expect that there will be bugs and that
> third party kernel modules that do not adhere to the KABI functionality
> might have compatibility issues.

The -830 kernel can break 3rd-party modules using non-whitelisted ABIs,
whether or not they adhere to the "KABI functionality".

> There was a compatibility issue with RHEL 7.4 kernel
> (3.10.0_693.1.1.el7) as well that was only fixed in the OpenAFS 1.6
> release series this past week as part of 1.6.22.2:
>
> http://www.openafs.org/dl/openafs/1.6.22.2/RELNOTES-1.6.22.2

Yes, and this one was hard to fix. Thanks are due to Mark Vitale for
developing the fix and all those who reviewed and tested it.

> Jeffrey Altman
> AuriStor, Inc.
>
> P.S. - Welcome to the community.

Seconded.
In particular, the problem report regarding the EL7.5beta kernel
was absolutely appropriate.

--
Stephan Wiesand
DESY - DV -
Platanenallee 6
15738 Zeuthen, Germany

From jose.calhariz@tecnico.ulisboa.pt Mon Feb 5 18:50:33 2018
From: jose.calhariz@tecnico.ulisboa.pt (Jose M Calhariz)
Date: Mon, 5 Feb 2018 18:50:33 +0000
Subject: [OpenAFS] connection timed out, how long is the timeout?
In-Reply-To: <1ff6bf63-81c0-6d73-2178-60e792ac9506@auristor.com>
References: <20180204122930.243hfca2xgjm7oz4@calhariz.com> <1ff6bf63-81c0-6d73-2178-60e792ac9506@auristor.com>
Message-ID: <20180205185033.2auyxt32jojbqblg@calhariz.com>

On Sun, Feb 04, 2018 at 05:21:16PM -0500, Jeffrey Altman wrote:
> On 2/4/2018 7:29 AM, Jose M Calhariz wrote:
> > I am chasing the root problem in my infra-structure of afsdb and
> > afs-fileservers. Sometimes my afsdb loses quorum in the middle of a
> > vos operation or the Linux clients time out talking to the
> > file servers. To help diagnose the problem I would like to know how
> > long is the timeout and if I can change the time out connections in
> > the Debian clients and for the vos operations.
> >[...]
> > The core of my infra-structure are 4 afsdb running Debian 9, and using
> > OpenAFS from Debian 1.6.20, on a shared virtualization platform. The
> > file-servers running Debian 9 and using OpenAFS from Debian, 1.6.20,
> > are VMs in dedicated hosts for OpenAFS on top of libvirt/KVM.
>
> Jose,
>

(...)

Thank you for your report. I will read it very carefully tonight and
again tomorrow. I am travelling home from FOSDEM.

>
> Jeffrey Altman
> AuriStor, Inc.

Kind regards
Jose M Calhariz

--
--
Of a hundred favorites of kings, ninety-five were hanged
	--Napoleão Bonaparte

From jaltman@auristor.com Mon Feb 5 20:16:32 2018
From: jaltman@auristor.com (Jeffrey Altman)
Date: Mon, 5 Feb 2018 15:16:32 -0500
Subject: [OpenAFS] RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up
In-Reply-To:
References: <54924524-154a-bee0-1719-77f8af636f63@auristor.com>
Message-ID:

This is a cryptographically signed message in MIME format.

--------------ms040404080402030905040607
Content-Type: multipart/mixed; boundary="------------EDEE15541019E04945A29AA9"
Content-Language: en-US

This is a multi-part message in MIME format.
--------------EDEE15541019E04945A29AA9
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: quoted-printable

On 2/5/2018 12:31 PM, Stephan Wiesand wrote:
> the usual way to use DKMS is to either have it build a module for a newly
> installed kernel or install a prebuilt module for that kernel. It may be
> possible to abuse it for providing a module built for another kernel, but
> I think that won't happen accidentally.
>
> You may be confusing DKMS with RHEL's "KABI tracking kmods". Those should
> be safe to use within a RHEL minor release (and the SL packaging has been
> using them like this since EL6.4), but aren't across minor releases (and
> that's why the SL packaging modifies the kmod handling to require a build
> for the minor release in question.

On RHEL, DKMS and KABI are closely related because of the way in which
Red Hat engineers backport feature and functionality changes.

During mainline kernel development a change is likely to break an
existing interface. Doing so is encouraged so that compilation errors
will identify where code modifications are required.

On RHEL there is a strong desire to maintain KABI compatibility.
Whenever possible, backports are altered to preserve the existing binary
interfaces at the risk of changing the interface semantics. As a
result, compilation failures do not occur but semantic differences can
result in breakage for third party kernel modules that have not been
modified at the source level to be aware of the change.

The breakages of OpenAFS by RHEL 7.4 and 7.5 (minor releases) were both
due to backporting functionality in this manner. Such incompatibilities
can result in system panics or silent data corruption depending upon the
change.

Jeffrey Altman
AuriStor, Inc.

--------------EDEE15541019E04945A29AA9--
yrXczptfU34j4cb3bhndneB7Pu7hlm3siTSvrxsmsYfod9stxy7cd0AMqzXH64L8p6KNlW2W +VTx1IFmvM0tunPjqQUogp6mr/MbpB2eJBbvQVBCi1BZHYfEpl2uxfuA30ZuILm5UkmDtRlp jZ6UtHdPV2rMdEI/Z0g2nn9e35sb60P0dkWWZsgmL0npgAAg7/zJal6yvdW+wwUTQIyEwLdW AAAAAAAA --------------ms040404080402030905040607-- From kfiresmith@gmail.com Wed Feb 7 16:46:28 2018 From: kfiresmith@gmail.com (Kodiak Firesmith) Date: Wed, 7 Feb 2018 11:46:28 -0500 Subject: [OpenAFS] RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up In-Reply-To: References: <54924524-154a-bee0-1719-77f8af636f63@auristor.com> Message-ID: --f40304398e6c6e738b0564a20990 Content-Type: text/plain; charset="UTF-8" Hello again All, As part of continued testing, I've been able to confirm that the SystemD double-service startup thing only happens to my hosts when going from RHEL 7.4 to RHEL 7.5beta. On a test host installed directly as RHEL 7.5beta, I get a bit farther with 1.6.18.22, in that I get to the point where OpenAFS "kind of" works. What I'm observing is that the openafs client Kernel module (built by DKMS) loads fine, and just so long as you know where you need to go in /afs, you can get there, and you can read and write files and the OpenAFS 'fs' command works. But doing an 'ls' of /afs or any path underneath results in "ls: reading directory /afs/: Not a directory". I ran an strace of a good RHEL 7.4 host running ls on /afs, and a RHEL 7.5beta host running ls on /afs and have created pastebins of both, as well as an inline diff. All can be seen at the following locations: works https://paste.fedoraproject.org/paste/Hiojt2~Be3wgez47bKNucQ fails https://paste.fedoraproject.org/paste/13ZXBfJIOMsuEJFwFShBfg diff https://paste.fedoraproject.org/paste/FJKRwep1fWJogIDbLnkn8A Hopefully this might help the OpenAFS devs, or someone might know what might be borking on every RHEL 7.5 beta host. It does fit with what other 7.5 beta users have observed OpenAFS doing. Thanks! 
 - Kodiak

On Mon, Feb 5, 2018 at 12:31 PM, Stephan Wiesand wrote:

>
> > On 04.Feb 2018, at 02:11, Jeffrey Altman wrote:
> >
> > On 2/2/2018 6:04 PM, Kodiak Firesmith wrote:
> >> I'm relatively new to handling OpenAFS. Are these problems part of a
> >> normal "kernel release; openafs update" cycle and perhaps I'm getting
> >> snagged just by being too early of an adopter? I wanted to raise the
> >> alarm on this and see if anything else was needed from me as the
> >> reporter of the issue, but perhaps that's an overreaction to what is
> >> just part of a normal process I just haven't been tuned into in prior
> >> RHEL release cycles?
> >
> >
> > Kodiak,
> >
> > On RHEL, DKMS is safe to use for kernel modules that restrict themselves
> > to using the restricted set of kernel interfaces (the RHEL KABI) that
> > Red Hat has designated will be supported across the lifespan of the RHEL
> > major version number. OpenAFS is not such a kernel module. As a result
> > it is vulnerable to breakage each and every time a new kernel is shipped.
>
> Jeffrey,
>
> the usual way to use DKMS is to either have it build a module for a newly
> installed kernel or install a prebuilt module for that kernel. It may be
> possible to abuse it for providing a module built for another kernel, but
> I think that won't happen accidentally.
>
> You may be confusing DKMS with RHEL's "KABI tracking kmods". Those should
> be safe to use within a RHEL minor release (and the SL packaging has been
> using them like this since EL6.4), but aren't across minor releases (and
> that's why the SL packaging modifies the kmod handling to require a build
> for the minor release in question.
>
> > There are two types of failures that can occur:
> >
> > 1. a change results in failure to build the OpenAFS kernel module
> >    for the new kernel
> >
> > 2. a change results in the OpenAFS kernel module building and
> >    successfully loading but failing to operate correctly
>
> The latter shouldn't happen within a minor release, but can across
> minor releases.
>
> > It is the second of these possibilities that has taken place with the
> > release of the 3.10.0-830.el7 kernel shipped as part of the RHEL 7.5 beta.
> >
> > Are you an early adopter of RHEL 7.5 beta? Absolutely, its a beta
> > release and as such you should expect that there will be bugs and that
> > third party kernel modules that do not adhere to the KABI functionality
> > might have compatibility issues.
>
> The -830 kernel can break 3rd-party modules using non-whitelisted ABIs,
> whether or not they adhere to the "KABI functionality".
>
> > There was a compatibility issue with RHEL 7.4 kernel
> > (3.10.0_693.1.1.el7) as well that was only fixed in the OpenAFS 1.6
> > release series this past week as part of 1.6.22.2:
> >
> > http://www.openafs.org/dl/openafs/1.6.22.2/RELNOTES-1.6.22.2
>
> Yes, and this one was hard to fix. Thanks are due to Mark Vitale for
> developing the fix and all those who reviewed and tested it.
>
> > Jeffrey Altman
> > AuriStor, Inc.
> >
> > P.S. - Welcome to the community.
>
> Seconded. In particular, the problem report regarding the EL7.5beta
> kernel was absolutely appropriate.
>
> --
> Stephan Wiesand
> DESY - DV -
> Platanenallee 6
> 15738 Zeuthen, Germany
>
>
>

--f40304398e6c6e738b0564a20990
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--f40304398e6c6e738b0564a20990--

From gsgatlin@ncsu.edu Wed Feb 7 22:54:55 2018
From: gsgatlin@ncsu.edu (Gary Gatling)
Date: Wed, 7 Feb 2018 17:54:55 -0500
Subject: [OpenAFS] fatal error: rpc/types.h: No such file or directory
Message-ID:

--001a114b01381d12120564a72ff1
Content-Type: text/plain; charset="UTF-8"

In fedora 28 with openafs 1.6.22.2 It fails to build. The error message is

fatal error: rpc/types.h: No such file or directory

I think maybe this has to do with:

https://fedoraproject.org/wiki/Changes/SunRPCRemoval

Anyone know if there is a way around that?

Thanks for any ideas. I'm not really worried about it until fedora 28 ships which should be around may 1-8 I think. Was just curious.

--001a114b01381d12120564a72ff1
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--001a114b01381d12120564a72ff1--

From kaduk@mit.edu Wed Feb 7 23:19:04 2018
From: kaduk@mit.edu (Benjamin Kaduk)
Date: Wed, 7 Feb 2018 17:19:04 -0600
Subject: [OpenAFS] fatal error: rpc/types.h: No such file or directory
In-Reply-To:
References:
Message-ID: <20180207231904.GD12363@mit.edu>

On Wed, Feb 07, 2018 at 05:54:55PM -0500, Gary Gatling wrote:
> In fedora 28 with openafs 1.6.22.2 It fails to build. The error message is
>
> fatal error: rpc/types.h: No such file or directory
>
> I think maybe this has to do with:
>
> https://fedoraproject.org/wiki/Changes/SunRPCRemoval
>
> Anyone know if there is a way around that?

Yup, a pretty easy one, applied to master in
https://gerrit.openafs.org/12800 . It looks like it still needs to
be pulled up to the stable branches, though.

-Ben

From gsgatlin@ncsu.edu Thu Feb 8 18:26:16 2018
From: gsgatlin@ncsu.edu (Gary Gatling)
Date: Thu, 8 Feb 2018 13:26:16 -0500
Subject: [OpenAFS] fatal error: rpc/types.h: No such file or directory
In-Reply-To: <20180207231904.GD12363@mit.edu>
References: <20180207231904.GD12363@mit.edu>
Message-ID:

--94eb2c149f502b58720564b78cff
Content-Type: text/plain; charset="UTF-8"

On Wed, Feb 7, 2018 at 6:19 PM, Benjamin Kaduk wrote:

>
> Yup, a pretty easy one, applied to master in
> https://gerrit.openafs.org/12800 . It looks like it still needs to
> be pulled up to the stable branches, though.
>
>

Thanks a lot. That fixed my compile issues in fedora 28.

--94eb2c149f502b58720564b78cff
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable


--94eb2c149f502b58720564b78cff-- From kaduk@mit.edu Fri Feb 9 00:01:56 2018 From: kaduk@mit.edu (Benjamin Kaduk) Date: Thu, 8 Feb 2018 18:01:56 -0600 Subject: [OpenAFS] RHEL 7.5 beta / 3.10.0-830.el7.x86_66 kernel lock up In-Reply-To: References: <54924524-154a-bee0-1719-77f8af636f63@auristor.com> Message-ID: <20180209000156.GM12363@mit.edu> On Wed, Feb 07, 2018 at 11:46:28AM -0500, Kodiak Firesmith wrote: > Hello again All, > > As part of continued testing, I've been able to confirm that the SystemD > double-service startup thing only happens to my hosts when going from RHEL > 7.4 to RHEL 7.5beta. On a test host installed directly as RHEL 7.5beta, I > get a bit farther with 1.6.18.22, in that I get to the point where OpenAFS > "kind of" works. Thanks for tracking this down. The rpm packaging maintainers may want to try to track down why the double-start happens in the upgrade scenario, as that's pretty nasty behavior. > What I'm observing is that the openafs client Kernel module (built by DKMS) > loads fine, and just so long as you know where you need to go in /afs, you > can get there, and you can read and write files and the OpenAFS 'fs' > command works. But doing an 'ls' of /afs or any path underneath results in > "ls: reading directory /afs/: Not a directory". > > I ran an strace of a good RHEL 7.4 host running ls on /afs, and a RHEL > 7.5beta host running ls on /afs and have created pastebins of both, as well > as an inline diff. > > All can be seen at the following locations: > > works > https://paste.fedoraproject.org/paste/Hiojt2~Be3wgez47bKNucQ > > fails > https://paste.fedoraproject.org/paste/13ZXBfJIOMsuEJFwFShBfg > > > diff > https://paste.fedoraproject.org/paste/FJKRwep1fWJogIDbLnkn8A > > Hopefully this might help the OpenAFS devs, or someone might know what > might be borking on every RHEL 7.5 beta host. It does fit with what other > 7.5 beta users have observed OpenAFS doing. 
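The quoted symptom (named paths work, but listing fails with "Not a directory") can be checked without `ls` by separating the open step from the read step. What follows is only a minimal C sketch, assuming a POSIX system; `count_entries` and its return-code convention are made up for this illustration and are not part of OpenAFS or of any fix discussed in this thread.

```c
#include <dirent.h>
#include <errno.h>
#include <stddef.h>

/*
 * Hypothetical helper for illustration only.
 * Returns the number of directory entries read from `path`,
 * -1 if the directory could not be opened (errno from opendir),
 * -2 if it opened but reading entries failed (errno from readdir).
 */
int count_entries(const char *path)
{
    DIR *dir = opendir(path);
    int count = 0;
    int saved;

    if (dir == NULL)
        return -1;          /* open step failed */

    for (;;) {
        errno = 0;          /* readdir() returns NULL at end AND on error */
        if (readdir(dir) == NULL)
            break;
        count++;
    }
    saved = errno;          /* still 0 at a normal end of directory */
    closedir(dir);

    if (saved != 0) {
        errno = saved;      /* read step failed, for example ENOTDIR */
        return -2;
    }
    return count;
}
```

A result of -2 with `errno` set to `ENOTDIR` on `/afs` would be consistent with the `ls` message quoted above, which complains about reading the directory rather than opening it. That reading is an assumption drawn from the wording of the message, not something confirmed in this thread.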
Yes, now it seems like all our reports are consistent, and we just have to wait for a developer to get a better look at what Red Hat changed in the kernel that we need to adapt to. -Ben > Thanks! > - Kodiak > > On Mon, Feb 5, 2018 at 12:31 PM, Stephan Wiesand > wrote: > > > > > > On 04.Feb 2018, at 02:11, Jeffrey Altman wrote: > > > > > > On 2/2/2018 6:04 PM, Kodiak Firesmith wrote: > > >> I'm relatively new to handling OpenAFS. Are these problems part of a > > >> normal "kernel release; openafs update" cycle and perhaps I'm getting > > >> snagged just by being too early of an adopter? I wanted to raise the > > >> alarm on this and see if anything else was needed from me as the > > >> reporter of the issue, but perhaps that's an overreaction to what is > > >> just part of a normal process I just haven't been tuned into in prior > > >> RHEL release cycles? > > > > > > > > > Kodiak, > > > > > > On RHEL, DKMS is safe to use for kernel modules that restrict themselves > > > to using the restricted set of kernel interfaces (the RHEL KABI) that > > > Red Hat has designated will be supported across the lifespan of the RHEL > > > major version number. OpenAFS is not such a kernel module. As a result > > > it is vulnerable to breakage each and every time a new kernel is shipped. > > > > Jeffrey, > > > > the usual way to use DKMS is to either have it build a module for a newly > > installed kernel or install a prebuilt module for that kernel. It may be > > possible to abuse it for providing a module built for another kernel, but > > I think that won't happen accidentally. > > > > You may be confusing DKMS with RHEL's "KABI tracking kmods". Those should > > be safe to use within a RHEL minor release (and the SL packaging has been > > using them like this since EL6.4), but aren't across minor releases (and > > that's why the SL packaging modifies the kmod handling to require a build > > for the minor release in question. 
> > > > > There are two types of failures that can occur: > > > > > > 1. a change results in failure to build the OpenAFS kernel module > > > for the new kernel > > > > > > 2. a change results in the OpenAFS kernel module building and > > > successfully loading but failing to operate correctly > > > > The latter shouldn't happen within a minor release, but can across > > minor releases. > > > > > It is the second of these possibilities that has taken place with the > > > release of the 3.10.0-830.el7 kernel shipped as part of the RHEL 7.5 > > beta. > > > > > > Are you an early adopter of RHEL 7.5 beta? Absolutely, its a beta > > > release and as such you should expect that there will be bugs and that > > > third party kernel modules that do not adhere to the KABI functionality > > > might have compatibility issues. > > > > The -830 kernel can break 3rd-party modules using non-whitelisted ABIs, > > whether or not they adhere to the "KABI functionality". > > > > > There was a compatibility issue with RHEL 7.4 kernel > > > (3.10.0_693.1.1.el7) as well that was only fixed in the OpenAFS 1.6 > > > release series this past week as part of 1.6.22.2: > > > > > > http://www.openafs.org/dl/openafs/1.6.22.2/RELNOTES-1.6.22.2 > > > > Yes, and this one was hard to fix. Thanks are due to Mark Vitale for > > developing the fix and all those who reviewed and tested it. > > > > > Jeffrey Altman > > > AuriStor, Inc. > > > > > > P.S. - Welcome to the community. > > > > Seconded. In particular, the problem report regarding the EL7.5beta > > kernel was absolutely appropriate. 
> > > > -- > > Stephan Wiesand > > DESY - DV - > > Platanenallee 6 > > 15738 Zeuthen, Germany > > > > > > From kaduk@mit.edu Mon Feb 19 19:01:06 2018 From: kaduk@mit.edu (Benjamin Kaduk) Date: Mon, 19 Feb 2018 13:01:06 -0600 Subject: [OpenAFS] OpenAFS 1.8.0 release candidate 5 available Message-ID: <20180219190105.GK54688@kduck.kaduk.org> --3uo+9/B/ebqu+fSQ Content-Type: text/plain; charset=us-ascii Content-Disposition: inline The OpenAFS Guardians are happy to announce the availability of the fifth prerelease candidate of OpenAFS 1.8.0. Source files can be accessed via the web at: https://www.openafs.org/release/openafs-1.8.0pre5.html or via AFS at: UNIX: /afs/grand.central.org/software/openafs/candidate/1.8.0pre5/ UNC: \\/afs\grand.central.org\software\openafs\candidate\1.8.0pre5\ The changes since beta 4 include some cleanup of ubik behavior to avoid transactions that we know will fail, improving the RedHat packaging, fixing a rare deadlock on old Linux kernel versions, partially fixing a deadlock on Solaris, improving compatibility with glibc 2.26, some debugging aids for Solaris clients, improvements to FreeBSD support, and excluding RXGEN_OPCODE aborts from the threshold towards throttling misbehaving clients of the fileserver. This is a release candidate for the final version of 1.8.0. Please assist the guardians by deploying and testing this release and providing positive or negative feedback. Bug reports should be filed to openafs-bugs@openafs.org ; reports of successes should be sent to openafs-info@openafs.org. 
Benjamin Kaduk
for the OpenAFS Guardians

--3uo+9/B/ebqu+fSQ--

From lambert@psc.edu Wed Feb 21 15:52:41 2018
From: lambert@psc.edu (Michael H Lambert)
Date: Wed, 21 Feb 2018 10:52:41 -0500
Subject: [OpenAFS] 1.8.0pre5 Build Error on FreeBSD 11.1
Message-ID: <64B1C8B9-804D-48FF-8B0E-4C11CF00C89C@psc.edu>

When building 1.8.0pre5 with the defaults (./configure with no arguments) on FreeBSD 11.1

-----
% uname -a
FreeBSD arsenal 11.1-RELEASE-p4 FreeBSD 11.1-RELEASE-p4 #0: Tue Nov 14 06:12:40 UTC 2017 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
-----

I get the following errors in building afs_pioctl.c:

-----
cc -I. -I.. -I../nfs -I/home/lambert/openafs/openafs-1.8.0pre5/src/crypto/hcrypto/kernel -I/home/lambert/openafs/openafs-1.8.0pre5/src -I/home/lambert/openafs/openafs-1.8.0pre5/src/afs -I/home/lambert/openafs/openafs-1.8.0pre5/src/afs/FBSD -I/home/lambert/openafs/openafs-1.8.0pre5/src/config -I/home/lambert/openafs/openafs-1.8.0pre5/src/rx/FBSD -I/home/lambert/openafs/openafs-1.8.0pre5/src/external/heimdal -I/home/lambert/openafs/openafs-1.8.0pre5/src -I/home/lambert/openafs/openafs-1.8.0pre5/src/afs -I/home/lambert/openafs/openafs-1.8.0pre5/src/afs/FBSD -I/home/lambert/openafs/openafs-1.8.0pre5/src/config -I/home/lambert/openafs/openafs-1.8.0pre5/src/fsint -I/home/lambert/openafs/openafs-1.8.0pre5/src/vlserver -I/home/lambert/openafs/openafs-1.8.0pre5/src/auth -I/home/lambert/openafs/openafs-1.8.0pre5/include -I/home/lambert/openafs/openafs-1.8.0pre5/include/afs -DAFSDEBUG -DKERNEL -DAFS -DVICE -DNFS -DUFS -DINET -DQUOTA -DGETMOUNT -Werror -D_KERNEL -DKLD_MODULE -nostdinc -I. -I.. -I../nfs -I/home/lambert/openafs/openafs-1.8.0pre5/src/crypto/hcrypto/kernel -I/home/lambert/openafs/openafs-1.8.0pre5/src -I/home/lambert/openafs/openafs-1.8.0pre5/src/afs -I/home/lambert/openafs/openafs-1.8.0pre5/src/afs/FBSD -I/home/lambert/openafs/openafs-1.8.0pre5/src/config -I/home/lambert/openafs/openafs-1.8.0pre5/src/rx/FBSD -I/home/lambert/openafs/openafs-1.8.0pre5/src/external/heimdal -I/home/lambert/openafs/openafs-1.8.0pre5/src -I/home/lambert/openafs/openafs-1.8.0pre5/src/afs -I/home/lambert/openafs/openafs-1.8.0pre5/src/afs/FBSD -I/home/lambert/openafs/openafs-1.8.0pre5/src/config -I/home/lambert/openafs/openafs-1.8.0pre5/src/fsint -I/home/lambert/openafs/openafs-1.8.0pre5/src/vlserver -I/home/lambert/openafs/openafs-1.8.0pre5/src/auth -I/home/lambert/openafs/openafs-1.8.0pre5/include -I/home/lambert/openafs/openafs-1.8.0pre5/include/afs -I. -I/usr/src/sys -fno-common -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronous-unwind-tables -ffreestanding -fwrapv -fstack-protector -Wno-redundant-decls -mno-aes -mno-avx -std=iso9899:1999 -DAFS_NONFSTRANS -o afs_pioctl.o -c /home/lambert/openafs/openafs-1.8.0pre5/src/afs/afs_pioctl.c
/home/lambert/openafs/openafs-1.8.0pre5/src/afs/afs_pioctl.c:5169:28: error:
      passing 'afs_uint32 *' (aka 'unsigned int *') to parameter of type
      'afs_int32 *' (aka 'int *') converts between pointers to integer types
      with different sign [-Werror,-Wpointer-sign]
    if (afs_pd_getInt(ain, &addr) != 0)
                           ^~~~~
/home/lambert/openafs/openafs-1.8.0pre5/src/afs/afs_pioctl.c:130:49: note:
      passing argument to parameter 'val' here
afs_pd_getInt(struct afs_pdata *apd, afs_int32 *val)
                                                ^
/home/lambert/openafs/openafs-1.8.0pre5/src/afs/afs_pioctl.c:5236:46: error:
      passing 'afs_uint32 *' (aka 'unsigned int *') to parameter of type
      'afs_int32 *' (aka 'int *') converts between pointers to integer types
      with different sign [-Werror,-Wpointer-sign]
    code = RXAFS_CallBackRxConnAddr(rxconn, &addr);
                                            ^~~~~
/home/lambert/openafs/openafs-1.8.0pre5/src/fsint/afsint.h:1389:23: note:
      passing argument to parameter 'addr' here
        /*IN 0*/ afs_int32 * addr);
                             ^
2 errors generated.
-----

The compiler is clang:

-----
% cc --version
FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on LLVM 4.0.0)
Target: x86_64-unknown-freebsd11.1
Thread model: posix
InstalledDir: /usr/bin
-----

Thanks,

Michael

-----
Michael H Lambert, GigaPoP Manager          Phone: +1 412 268-4960
Pittsburgh Supercomputing Center/3ROX       FAX: +1 412 268-5832
300 S Craig St, Pittsburgh, PA 15213 USA    lambert@psc.edu

From kaduk@mit.edu Thu Feb 22 05:56:41 2018
From: kaduk@mit.edu (Benjamin Kaduk)
Date: Wed, 21 Feb 2018 23:56:41 -0600
Subject: [OpenAFS] 1.8.0pre5 Build Error on FreeBSD 11.1
In-Reply-To: <64B1C8B9-804D-48FF-8B0E-4C11CF00C89C@psc.edu>
References: <64B1C8B9-804D-48FF-8B0E-4C11CF00C89C@psc.edu>
Message-ID: <20180222055641.GK54688@mit.edu>

Hi Michael,

On Wed, Feb 21, 2018 at 10:52:41AM -0500, Michael H Lambert wrote:
> When building 1.8.0pre5 with the defaults (./configure with no arguments) on FreeBSD 11.1
>
> -----
> % uname -a
> FreeBSD arsenal 11.1-RELEASE-p4 FreeBSD 11.1-RELEASE-p4 #0: Tue Nov 14 06:12:40 UTC 2017 root@amd64-builder.daemonology.net:/usr/obj/usr/src/sys/GENERIC amd64
> -----
>
> I get the following errors in building afs_pioctl.c:
>
> -----
> cc -I. -I.. -I../nfs -I/home/lambert/openafs/openafs-1.8.0pre5/src/crypto/hcrypto/kernel -I/home/lambert/openafs/openafs-1.8.0pre5/src -I/home/lambert/openafs/openafs-1.8.0pre5/src/afs -I/home/lambert/openafs/openafs-1.8.0pre5/src/afs/FBSD -I/home/lambert/openafs/openafs-1.8.0pre5/src/config -I/home/lambert/openafs/openafs-1.8.0pre5/src/rx/FBSD -I/home/lambert/openafs/openafs-1.8.0pre5/src/external/heimdal -I/home/lambert/openafs/openafs-1.8.0pre5/src -I/home/lambert/openafs/openafs-1.8.0pre5/src/afs -I/home/lambert/openafs/openafs-1.8.0pre5/src/afs/FBSD -I/home/lambert/openafs/openafs-1.8.0pre5/src/config -I/home/lambert/openafs/openafs-1.8.0pre5/src/fsint -I/home/lambert/openafs/openafs-1.8.0pre5/src/vlserver -I/home/lambert/openafs/openafs-1.8.0pre5/src/auth -I/home/lambert/openafs/openafs-1.8.0pre5/include -I/home/lambert/openafs/openafs-1.8.0pre5/include/afs -DAFSDEBUG -DKERNEL -DAFS -DVICE -DNFS -DUFS -DINET -DQUOTA -DGETMOUNT -Werror -D_KERNEL -DKLD_MODULE -nostdinc -I. -I.. -I../nfs -I/home/lambert/openafs/openafs-1.8.0pre5/src/crypto/hcrypto/kernel -I/home/lambert/openafs/openafs-1.8.0pre5/src -I/home/lambert/openafs/openafs-1.8.0pre5/src/afs -I/home/lambert/openafs/openafs-1.8.0pre5/src/afs/FBSD -I/home/lambert/openafs/openafs-1.8.0pre5/src/config -I/home/lambert/openafs/openafs-1.8.0pre5/src/rx/FBSD -I/home/lambert/openafs/openafs-1.8.0pre5/src/external/heimdal -I/home/lambert/openafs/openafs-1.8.0pre5/src -I/home/lambert/openafs/openafs-1.8.0pre5/src/afs -I/home/lambert/openafs/openafs-1.8.0pre5/src/afs/FBSD -I/home/lambert/openafs/openafs-1.8.0pre5/src/config -I/home/lambert/openafs/openafs-1.8.0pre5/src/fsint -I/home/lambert/openafs/openafs-1.8.0pre5/src/vlserver -I/home/lambert/openafs/openafs-1.8.0pre5/src/auth -I/home/lambert/openafs/openafs-1.8.0pre5/include -I/home/lambert/openafs/openafs-1.8.0pre5/include/afs -I. -I/usr/src/sys -fno-common -fno-omit-frame-pointer -mno-omit-leaf-frame-pointer -mcmodel=kernel -mno-red-zone -mno-mmx -mno-sse -msoft-float -fno-asynchronou
> /home/lambert/openafs/openafs-1.8.0pre5/src/afs/afs_pioctl.c:5169:28: error:
>       passing 'afs_uint32 *' (aka 'unsigned int *') to parameter of type
>       'afs_int32 *' (aka 'int *') converts between pointers to integer types
>       with different sign [-Werror,-Wpointer-sign]
>     if (afs_pd_getInt(ain, &addr) != 0)
>                            ^~~~~
> /home/lambert/openafs/openafs-1.8.0pre5/src/afs/afs_pioctl.c:130:49: note:
>       passing argument to parameter 'val' here
> afs_pd_getInt(struct afs_pdata *apd, afs_int32 *val)
>                                                 ^
> /home/lambert/openafs/openafs-1.8.0pre5/src/afs/afs_pioctl.c:5236:46: error:
>       passing 'afs_uint32 *' (aka 'unsigned int *') to parameter of type
>       'afs_int32 *' (aka 'int *') converts between pointers to integer types
>       with different sign [-Werror,-Wpointer-sign]
>     code = RXAFS_CallBackRxConnAddr(rxconn, &addr);
>                                             ^~~~~
> /home/lambert/openafs/openafs-1.8.0pre5/src/fsint/afsint.h:1389:23: note:
>       passing argument to parameter 'addr' here
>         /*IN 0*/ afs_int32 * addr);
>                              ^
> 2 errors generated.
> -----
>
> The compiler is clang:
>
> -----
> % cc --version
> FreeBSD clang version 4.0.0 (tags/RELEASE_400/final 297347) (based on LLVM 4.0.0)
> Target: x86_64-unknown-freebsd11.1
> Thread model: posix
> InstalledDir: /usr/bin
> -----

Thanks for the report. It looks like I only tested on FreeBSD 10.3 so far (which is still on clang 3.x), but I should be able to get a newer environment pretty easily.

Still, it's a bit confusing, because our tree is definitely not pointer-sign clean at all, so I'm a bit surprised that it's getting enabled somehow along with -Werror. You don't have anything interesting in /etc/make.conf by chance, do you?
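For readers unfamiliar with this class of diagnostic, here is a minimal C sketch of what -Wpointer-sign objects to and one common way to avoid it. It is an illustration only, not the change that went into OpenAFS: the typedefs and the functions `read_int32` and `read_addr` are hypothetical stand-ins invented here, loosely shaped like the call at afs_pioctl.c:5169 in the report.

```c
#include <stddef.h>
#include <stdint.h>

/* Local stand-ins for this sketch only; not the OpenAFS typedefs. */
typedef int32_t  sketch_int32;
typedef uint32_t sketch_uint32;

/* Hypothetical reader that, like the callee in the diagnostic, takes a
 * pointer to a SIGNED 32-bit integer. */
int read_int32(const sketch_int32 *src, sketch_int32 *val)
{
    if (src == NULL || val == NULL)
        return -1;
    *val = *src;
    return 0;
}

/* Caller that wants an unsigned result. Passing `addr` straight to
 * read_int32() would draw -Wpointer-sign, so read into a signed
 * temporary and convert the value instead of the pointer. */
int read_addr(const sketch_int32 *src, sketch_uint32 *addr)
{
    sketch_int32 tmp;

    if (addr == NULL || read_int32(src, &tmp) != 0)
        return -1;
    *addr = (sketch_uint32)tmp;
    return 0;
}
```

Converting the value rather than casting the pointer keeps the callee's signature untouched; the signed-to-unsigned conversion is well defined in C, so a stored -1 comes back as 0xFFFFFFFF. Whether a cast, a temporary, or a signature change is the right choice for the real code is a decision for the maintainers.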
Thanks,

Ben

From lambert@psc.edu Thu Feb 22 14:04:56 2018
From: lambert@psc.edu (Michael H Lambert)
Date: Thu, 22 Feb 2018 09:04:56 -0500
Subject: [OpenAFS] 1.8.0pre5 Build Error on FreeBSD 11.1
In-Reply-To: <20180222055641.GK54688@mit.edu>
References: <64B1C8B9-804D-48FF-8B0E-4C11CF00C89C@psc.edu> <20180222055641.GK54688@mit.edu>
Message-ID: <5A8ECE08.1040303@psc.edu>

Hi Ben,

On 2018-02-22 00:56, Benjamin Kaduk wrote:
>
> Thanks for the report. It looks like I only tested on FreeBSD 10.3
> so far (which is still on clang 3.x), but I should be able to get a
> newer environment pretty easily.

Since 10.3 goes EOL at the end of April (and the whole 10 branch at the end of October), 11 is more interesting.

> Still, it's a bit confusing, because our tree is definitely not
> pointer-sign clean at all, so I'm a bit surprised that it's getting
> enabled somehow along with -Werror. You don't have anything
> interesting in /etc/make.conf by chance, do you?

Nope:

% more /etc/make.conf
/etc/make.conf: No such file or directory

This is on a pretty vanilla system.

Michael

From kaduk@mit.edu Thu Feb 22 19:28:08 2018
From: kaduk@mit.edu (Benjamin Kaduk)
Date: Thu, 22 Feb 2018 13:28:08 -0600
Subject: [OpenAFS] 1.8.0pre5 Build Error on FreeBSD 11.1
In-Reply-To: <5A8ECE08.1040303@psc.edu>
References: <64B1C8B9-804D-48FF-8B0E-4C11CF00C89C@psc.edu> <20180222055641.GK54688@mit.edu> <5A8ECE08.1040303@psc.edu>
Message-ID: <20180222192808.GQ54688@mit.edu>

On Thu, Feb 22, 2018 at 09:04:56AM -0500, Michael H Lambert wrote:
> Hi Ben,
>
> On 2018-02-22 00:56, Benjamin Kaduk wrote:
> >
> > Thanks for the report. It looks like I only tested on FreeBSD 10.3
> > so far (which is still on clang 3.x), but I should be able to get a
> > newer environment pretty easily.
>
> Since 10.3 goes EOL at the end of April (and the whole 10 branch at the
> end of October), 11 is more interesting.

Oh, definitely!
But pre4 wouldn't build on anything newer than 10.3 out-of-the-box, since we haven't been keeping up with the needed param.h and sysname updates. pre5 should have that part in place, at least.

> > Still, it's a bit confusing, because our tree is definitely not
> > pointer-sign clean at all, so I'm a bit surprised that it's getting
> > enabled somehow along with -Werror. You don't have anything
> > interesting in /etc/make.conf by chance, do you?
>
> Nope:
>
> % more /etc/make.conf
> /etc/make.conf: No such file or directory
>
> This is on a pretty vanilla system.

Okay, thanks. (Presumably you can pass CFLAGS=-Wno-error to configure and let the build go farther, if this is blocking your progress.)

-Ben

From sanders@umich.edu Mon Feb 26 19:38:27 2018
From: sanders@umich.edu (Michael Sanders)
Date: Mon, 26 Feb 2018 14:38:27 -0500
Subject: [OpenAFS] Re: [OpenAFS-announce] OpenAFS 1.8.0 release candidate 5 available
In-Reply-To: <20180219190105.GK54688@kduck.kaduk.org>
References: <20180219190105.GK54688@kduck.kaduk.org>
Message-ID:

--089e082f9d84743fa7056622a734
Content-Type: text/plain; charset="UTF-8"

Please remove sanders@umich.edu
from your list.
He is deceased.

Michael Sanders
sanders@umich.edu

On Mon, Feb 19, 2018 at 2:01 PM, Benjamin Kaduk wrote:

> The OpenAFS Guardians are happy to announce the availability of the fifth
> prerelease candidate of OpenAFS 1.8.0.
> Source files can be accessed via the web at:
>
>         https://www.openafs.org/release/openafs-1.8.0pre5.html
>
> or via AFS at:
>
>         UNIX: /afs/grand.central.org/software/openafs/candidate/1.8.0pre5/
>         UNC: \\/afs\grand.central.org\software\openafs\candidate\1.8.0pre5\
>
> The changes since beta 4 include some cleanup of ubik behavior to
> avoid transactions that we know will fail, improving the RedHat
> packaging, fixing a rare deadlock on old Linux kernel versions,
> partially fixing a deadlock on Solaris, improving compatibility with
> glibc 2.26, some debugging aids for Solaris clients, improvements to
> FreeBSD support, and excluding RXGEN_OPCODE aborts from the
> threshold towards throttling misbehaving clients of the fileserver.
>
> This is a release candidate for the final version of 1.8.0.
>
> Please assist the guardians by deploying and testing this release and
> providing positive or negative feedback. Bug reports should be filed
> to openafs-bugs@openafs.org; reports of successes should be sent to
> openafs-info@openafs.org.
>
> Benjamin Kaduk
> for the OpenAFS Guardians
>
>

--089e082f9d84743fa7056622a734
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--089e082f9d84743fa7056622a734--