[OpenAFS-devel] [OpenAFS] getcwd() error for RHEL 7.4 kernel

Stephan Wiesand stephan.wiesand@desy.de
Fri, 20 Oct 2017 21:27:14 +0200


On Oct 20, 2017, at 21:17, Mark Vitale wrote:

>
>> On Oct 20, 2017, at 8:27 AM, Stephan Wiesand <stephan.wiesand@desy.de> wrote:
>>
>> [taking this thread to -devel]
>>
>>> On 20. Oct 2017, at 12:04, Stephan Wiesand <stephan.wiesand@desy.de> wrote:
>>>
>>> I ran configure against the EL7.3 and EL7.4 GA kernels (3.10.0-514.el7
>>> and 3.10.0-693.el7) and compared the results.
>>>
>>> Besides the fact that in the 7.4 case conftest.c is compiled with an
>>> additional -DCONFIG_AVX512, which I doubt makes a difference, there are
>>> some differences in configure test results:
>>>
>>>                         7.3     7.4
>>> locks_lock_file_wait    no      yes
>>> inode_lock              no      yes
>>> exported tasklist_lock  yes     no
>>
>
> Thank you for this good information, Stephan.  Were those 3 the only
> OpenAFS config differences you found?

Yes of course.
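
For anyone who wants to repeat the comparison on their own builds, here
is a minimal sketch of how the two configure runs could be diffed. It
assumes the usual autoconf "checking for X... result" line format; the
sample lines are illustrative only, modeled on the table above, and the
names parse_checks and diff_checks are made up for this example.

```python
import re

# Matches autoconf-style result lines, e.g. "checking for inode_lock... yes".
CHECK_RE = re.compile(r"^checking (?:for |whether )?(.+?)\.\.\. (.+)$")


def parse_checks(configure_output):
    """Map each configure check description to its reported result."""
    results = {}
    for line in configure_output.splitlines():
        match = CHECK_RE.match(line.strip())
        if match:
            results[match.group(1)] = match.group(2)
    return results


def diff_checks(old, new):
    """Return {check: (old_result, new_result)} for checks that differ."""
    keys = sorted(set(old) | set(new))
    return {k: (old.get(k), new.get(k)) for k in keys if old.get(k) != new.get(k)}


# Illustrative input only, not copied from a real configure log.
EL73_SAMPLE = """\
checking for gcc... gcc
checking for locks_lock_file_wait... no
checking for inode_lock... no
checking for exported tasklist_lock... yes
"""

EL74_SAMPLE = """\
checking for gcc... gcc
checking for locks_lock_file_wait... yes
checking for inode_lock... yes
checking for exported tasklist_lock... no
"""

DIFFERENCES = diff_checks(parse_checks(EL73_SAMPLE), parse_checks(EL74_SAMPLE))

if __name__ == "__main__":
    for check, (old_result, new_result) in DIFFERENCES.items():
        print(f"{check}: {old_result} -> {new_result}")
```

Feeding it the captured output of each configure run instead of the
sample strings should list only the checks whose results changed.
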

>> It turns out the EL7.4 kernel turns tasklist_lock from an rwlock_t
>> into a qrwlock_t and all read_{,un}lock() calls into qread_{,un}lock()
>> ones. And no, it's not what mainline kernels do, including 4.14-rc5.
>>
>> We should probably adapt to this, and I guess it shouldn't be too
>> hard, but is this change likely to be the reason for more frequent
>> getcwd() problems?
>
> I took a look at all three differences with regard to the OpenAFS
> 1.6.20.2 code, and I don't see a way that any of them could be causing
> the getcwd problems.
>
> In particular, the tasklist_lock references in OpenAFS 1.6.20.2
> source will not actually result in any OpenAFS kernel module references,
> due to the results from other parts of the autoconfig for RHEL 7.4.  You
> can verify this for yourself by issuing:
> 'nm <openafs.ko> | grep tasklist_lock'
>
> However, don't rely on the nm trick to look for the other symbols
> referenced above. inode_lock() is defined as static inline and is thus
> inlined as a mutex_lock(&inode->i_mutex), which is indistinguishable
> from other mutex_lock references.  And locks_lock_file_wait() is also
> static inline - it shows up as locks_lock_inode_wait in the nm output.
>
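
The nm check can be scripted along these lines. This is only a sketch:
the sample nm output below is invented for illustration (it is not taken
from a real openafs.ko), and parse_nm, is_referenced and nm_output_for
are hypothetical helper names. It treats a symbol as referenced when nm
lists it as undefined ("U"), so inlined helpers such as inode_lock()
will never show up under their own name.

```python
import subprocess


def parse_nm(nm_output):
    """Return {symbol_name: type_letter} parsed from `nm` output."""
    symbols = {}
    for line in nm_output.splitlines():
        fields = line.split()
        if len(fields) == 2:      # undefined symbol: "U name"
            sym_type, name = fields
        elif len(fields) == 3:    # defined symbol: "address type name"
            _, sym_type, name = fields
        else:
            continue
        symbols[name] = sym_type
    return symbols


def is_referenced(nm_output, name):
    """True if the module imports `name`, i.e. nm lists it as undefined."""
    return parse_nm(nm_output).get(name) == "U"


def nm_output_for(module_path):
    """Run nm on a module file and return its text output."""
    result = subprocess.run(
        ["nm", module_path], capture_output=True, text=True, check=True
    )
    return result.stdout


# Invented sample, shaped like nm output; not from a real module.
SAMPLE_NM_OUTPUT = """\
                 U locks_lock_inode_wait
                 U mutex_lock
                 U mutex_unlock
0000000000000010 T example_defined_function
"""

if __name__ == "__main__":
    for symbol in ("tasklist_lock", "inode_lock", "locks_lock_inode_wait"):
        print(symbol, is_referenced(SAMPLE_NM_OUTPUT, symbol))
```

To run it against an actual build, pass the result of
nm_output_for("<openafs.ko>") instead of the sample string.
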
> So in summary, thank you, but I don't believe any of these explain
> the current getcwd symptoms.
>
> Has anyone seen this with RHEL 7.4 and the previous OpenAFS releases -
> 1.6.20.1 or older?


Not here. It was 1.6.21, and the statistics aren't exactly great.

You mean it could simply be "shake harder" unmasking the actual issue
again?

-- 
Stephan Wiesand
DESY -DV-
Platanenallee 6
15738 Zeuthen, Germany