[OpenAFS-devel] linux45: smoke test failed

Stephan Wiesand stephan.wiesand@desy.de
Sat, 18 Jun 2016 20:27:34 +0200


Joe,

thanks for the feedback.

On Jun 17, 2016, at 23:39 , Joe Gorse wrote:
> FWIW, I am able to reproduce a "cwd" message for "git log" command on
> Fedora 23, 4.5.6-200.fc23.x86_64. "git log" reads:
>
> fatal: Unable to read current working directory: No such file or directory
>
> Though it should read:
>
> fatal: Not a git repository (or any of the parent directories): .git

Exactly the problem I'm seeing.
>
> However, I am not having any trouble with the git checkout. It seems to
> consistently work on Fedora 23. Even the "git checkout
> openafs-stable-1_6_18". Perhaps try on 4.5.6 on Fedora?

That's where I started, see the first message in this thread. And in all
cases, the git checkout actually works. I'm just using it to trigger the
cwd problem - there are probably many other ways. Note also that "git log"
is just one way of exposing the problem.
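
A minimal sketch of how the two outcomes could be told apart in a script,
assuming the only signal of interest is the text git prints on stderr. The
function name classify_git_err is made up for illustration, and the loose
match on "ot a git repository" is only there to tolerate either
capitalization of that message.

```shell
# classify_git_err: hypothetical helper, not part of git or OpenAFS.
# Takes the stderr text of a git command and prints one word:
#   cwd-bug   the cwd could not be read (the problem in this thread)
#   expected  git merely reports that the cwd is not a repository
#   other     anything else, including empty output
classify_git_err() {
    case "$1" in
        *"Unable to read current working directory"*) echo cwd-bug ;;
        *"ot a git repository"*) echo expected ;;
        *) echo other ;;
    esac
}
```

Called as classify_git_err "$(git log 2>&1 >/dev/null)" in a directory that
is not a repository, this should print "expected" on a healthy client.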

> Though I have seen some more of this issue on Debian 8 with kernel 4.6.0.
> Three of three tests failed to checkout the openafs tree on this system. I
> will test some other kernels on this system later and note anything
> interesting.

Sounds even considerably worse :-(

Any errors logged? Does the client actually have some variant of gerrit
12228 applied?

Cheers,
	Stephan

> Cheers,
> Joe
>
> On Fri, Jun 17, 2016 at 11:30 AM, Stephan Wiesand <stephan.wiesand@desy.de>
> wrote:
>
>>
>> On Jun 17, 2016, at 04:45 , Benjamin Kaduk wrote:
>>
>>> On Thu, 16 Jun 2016, Stephan Wiesand wrote:
>>>
>>>> I smoke tested what was planned to be OpenAFS 1.6.18.1, as discussed in
>>>> yesterday's release team meeting, on a Fedora 23 x86_64 VM with kernel
>>>> 4.5.6-200 today. The result was disappointing:
>>>>
>>>> git clone git://gerrit.openafs.org/openafs.git
>>>
>>> Is the pwd the root of a volume?
>>
>> No, everything happens at least one level below.
>>
>>>> cd openafs
>>>> git log
>>>> # scrolled through a few dozen changes, took a couple of seconds
>>>> git checkout openafs-stable-1_6_18
>>>>
>>>> At this point I got the following error:
>>>>
>>>> fatal: Unable to read current working directory: No such file or directory
>>>>
>>>> A "cd; cd -" cures this for a while, and there's no apparent data
>>>> corruption. I'm still worried. The problem isn't 100% reproducible, but it
>>>> doesn't take too many tries checking out random tags or branches.
>>>>
>>>> This was plain 1.6.18 + gerrit 12300 12301 12302 12274.
>>>>
>>>> Cache is on ext4, no separate partition, default size as set by our RPM
>>>> (I think 100MB, but I don't have access to the VM right now to check).
>>>>
>>>> The small cache size may contribute to the problem. But I found no
>>>> errors logged anywhere, and this shouldn't happen no matter how small the
>>>> cache is.
>>>
>>> Please check if the cmdebug output is empty (I expect it is, but it is
>>> good to check).
>>
>> It is empty.
>>
>>>> NB we have a user report of exactly this problem happening frequently
>>>> while just editing files in a local git repo in AFS space. The data is a
>>>> bit sketchy, but it's probably Ubuntu 14.04 with its current default kernel
>>>> and the openafs packages from Anders' ppa. I'll try to get us more data.
>>>>
>>>>
>>>> Any thoughts? For the time being I'm considering this a showstopper for
>>>> 1.6.18.1, and it looks like we're not quite there yet regarding Linux
>>>> 4.5, let alone 4.6 or the 4.7 due in a few weeks :-(
>>>
>>> Can you run the same test on a 4.4 kernel for comparison?
>>
>> I tried under the last F22 kernel, 4.4.6-200.fc22. And ok, it's not 4.5
>> specific, though it seems to happen more frequently with 4.5.2 than with
>> 4.4.6.
>>
>> By chance I found a pretty reliable reproducer:
>>
>>        cd /vol/ume/root
>>        mkdir g; cd g
>>        git clone git://gerrit.openafs.org/openafs.git; sleep 180; git log
>>
>> Note indeed no "cd openafs". Of course this should complain about the cwd
>> not being a git repo. But most of the time it will complain about the cwd
>> issue instead.
>>
>> I'm planning to verify that plain 1.6.18 behaves the same on 4.4.6, and if
>> it does I'll proceed with the 1.6.18.1 release.
>>
>> I couldn't reproduce this with any EL clients, but those have larger
>> caches (it's indeed 100 MB on that Fedora VM), so there's more to test.
>> Help welcome...
>>
>>
>>
>
> --
> Joe Gorse
>
> C: 440-552-0730
> LI: Joe Gorse <http://www.linkedin.com/pub/joe-gorse/7/12/397>

--
Stephan Wiesand
DESY -DV-
Platanenallee 6
15738 Zeuthen, Germany