[OpenAFS-devel] [OpenAFS] getcwd() error for RHEL 7.4 kernel
Stephan Wiesand
stephan.wiesand@desy.de
Fri, 20 Oct 2017 21:27:14 +0200
On Oct 20, 2017, at 21:17, Mark Vitale wrote:
>
>> On Oct 20, 2017, at 8:27 AM, Stephan Wiesand <stephan.wiesand@desy.de> wrote:
>>
>> [taking this thread to -devel]
>>
>>> On 20. Oct 2017, at 12:04, Stephan Wiesand <stephan.wiesand@desy.de> wrote:
>>>
>>> I ran configure against the EL7.3 and EL7.4 GA kernels (3.10.0-514.el7
>>> and 3.10.0-696.el7) and compared the results.
>>>
>>> Besides the fact that in the 7.4 case conftest.c is compiled with an
>>> additional -DCONFIG_AVX512, which I doubt makes a difference, there are
>>> some differences in configure test results:
>>>
>>>                           7.3   7.4
>>> locks_lock_file_wait      no    yes
>>> inode_lock                no    yes
>>> exported tasklist_lock    yes   no
>>
>
> Thank you for this good information, Stephan. Were those 3 the only
> OpenAFS config differences you found?
Yes of course.
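
For anyone who wants to repeat the comparison on other kernels, here is a minimal sketch of how two configure logs could be diffed. It assumes the usual autoconf "checking ... result" line format; the sample strings are hypothetical and only mirror the table above, they are not the literal OpenAFS configure output.

```python
import re

# Matches autoconf-style lines such as "checking for inode_lock... yes".
CHECK_RE = re.compile(r"^checking (?:for |whether )?(.+?)\.\.\. (.+)$")


def parse_checks(configure_output):
    """Map each configure check name to its reported result."""
    results = {}
    for line in configure_output.splitlines():
        match = CHECK_RE.match(line.strip())
        if match:
            results[match.group(1)] = match.group(2)
    return results


def diff_checks(old_output, new_output):
    """Return {check: (old_result, new_result)} for checks that differ."""
    old, new = parse_checks(old_output), parse_checks(new_output)
    return {
        name: (old.get(name), new.get(name))
        for name in sorted(set(old) | set(new))
        if old.get(name) != new.get(name)
    }


# Hypothetical excerpts, modelled on the table in this thread.
EL73_SAMPLE = """\
checking for gcc... gcc
checking for locks_lock_file_wait... no
checking for inode_lock... no
checking for exported tasklist_lock... yes
"""
EL74_SAMPLE = """\
checking for gcc... gcc
checking for locks_lock_file_wait... yes
checking for inode_lock... yes
checking for exported tasklist_lock... no
"""

if __name__ == "__main__":
    for name, (old, new) in diff_checks(EL73_SAMPLE, EL74_SAMPLE).items():
        print(f"{name}: {old} -> {new}")
```

In practice one would feed it the captured output of the two configure runs instead of the sample strings.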
>> It turns out the EL7.4 kernel turns tasklist_lock from an rwlock_t
>> into a qrwlock_t and all read_{,un}lock() calls into qread_{,un}lock()
>> ones. And no, it's not what mainline kernels do, including 4.14-rc5.
>>
>> We should probably adapt to this, and I guess it shouldn't be too
>> hard, but is this change likely to be the reason for more frequent
>> getcwd() problems?
>
>
> I took a look at all three differences with regard to the OpenAFS
> 1.6.20.2 code, and I don't see a way that any of them could be causing
> the getcwd problems.
>
> In particular, the tasklist_lock references in OpenAFS 1.6.20.2
> source will not actually result in any OpenAFS kernel module references,
> due to the results from other parts of the autoconfig for RHEL 7.4. You
> can verify this for yourself by issuing: 'nm <openafs.ko> | grep
> tasklist_lock'
>
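
The same check can be scripted when several module builds need to be compared. This is only a sketch: it parses the default nm text format, where undefined (imported) symbols appear as "U name" without an address, and the sample output below is made up for illustration.

```python
import subprocess


def undefined_symbols(nm_output):
    """Collect the symbols that nm reports as undefined ('U')."""
    symbols = set()
    for line in nm_output.splitlines():
        fields = line.split()
        # Undefined entries carry no address, so they have two fields.
        if len(fields) == 2 and fields[0] == "U":
            symbols.add(fields[1])
    return symbols


def module_references(nm_output, symbol):
    """True if the module imports `symbol` from the kernel."""
    return symbol in undefined_symbols(nm_output)


def run_nm(module_path):
    """Run nm on a kernel module and return its text output."""
    result = subprocess.run(
        ["nm", module_path], capture_output=True, text=True, check=True
    )
    return result.stdout


# Hypothetical nm output, not taken from a real openafs.ko.
SAMPLE_NM = """\
0000000000000000 T afs_example_init
                 U mutex_lock
                 U locks_lock_inode_wait
"""

if __name__ == "__main__":
    print(module_references(SAMPLE_NM, "tasklist_lock"))
```

With a real module, pass run_nm("/path/to/openafs.ko") as the first argument; the path is a placeholder.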
> However, don't rely on the nm trick to look for the other symbols
> referenced above. inode_lock() is defined as static inline and is thus
> inlined as a mutex_lock(&inode->i_mutex), which is indistinguishable
> from other mutex_lock references. And locks_lock_file_wait() is also
> static inline - it shows up as locks_lock_inode_wait in the nm output.
>
> So in summary, thank you, but I don't believe any of these explain
> the current getcwd symptoms.
>
> Has anyone seen this with RHEL 7.4 and the previous OpenAFS releases -
> 1.6.20.1 or older?
Not here. It was 1.6.21, and the statistics aren't exactly great.
You mean it could simply be "shake harder" unmasking the actual issue again?
-- 
Stephan Wiesand
DESY -DV-
Platanenallee 6
15738 Zeuthen, Germany