[OpenAFS] Graphical file managers get stuck

jukka.tuominen@finndesign.fi jukka.tuominen@finndesign.fi
Tue, 11 Dec 2012 08:03:11 +0200 (EET)


[returning back to the list with the Wireshark results]

Excellent work Ken, thanks!

Just a few clarifications about the recording:

The system was running 1.4 during the test (with -afsdb and -dynroot).
It's a virtual environment, so I can fairly easily test either version.

I only clicked once on the target address and waited until the timeout, so
the repetition was system-generated.

br, jukka



> Yep, that capture is good. To see the problem, open it in Wireshark,
> and in the display filter, type "dns".
>
> You can see that at packet 59 the AFS client starts querying for a
> "Trash" cell. It can't find it, but it ends up querying four separate
> times in succession. Next it starts looking for a Trash-20014 cell,
> trying again four times, then giving up. Lastly it looks for a
> "hidden" cell. Googling around shows this is a Gnome (Nautilus)
> feature that may work on OS X too. Unfortunately it causes even more
> delays.
>
> The whole process seems to repeat itself about three minutes later -
> maybe you did something in the UI like navigate to a different folder?
> You can see that AFS unfortunately did not cache any of the DNS
> queries and it tries them all again.
>
> So in summary, this may very well be the issue, or a large part of it.
> Especially if your client is even slower on 1.6, because 1.6 makes
> twice as many DNS queries (it will look up SRV records in addition to
> AFSDB records).
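
For anyone following along, here is a rough Python sketch of how the query
count adds up. It is only an illustration, not the client's actual logic:
the function name, the ordering of SRV before AFSDB, and the fixed four
attempts per name are assumptions based on what the capture showed. The SRV
owner name follows the RFC 5864 convention (_afs3-vlserver._udp.<cell>).

```python
def dns_questions(cell_names, lookup_srv=True, attempts=4):
    """Return the (record_type, query_name) pairs a client might send.

    Hypothetical helper: 'attempts' mirrors the four repeated queries
    seen in the capture, and 'lookup_srv' models a 1.6-style client
    that asks for SRV records in addition to AFSDB records.
    """
    questions = []
    for cell in cell_names:
        for _ in range(attempts):
            if lookup_srv:
                # SRV owner name as defined for AFS in RFC 5864
                questions.append(("SRV", f"_afs3-vlserver._udp.{cell}"))
            # AFSDB records are looked up on the bare cell name
            questions.append(("AFSDB", cell))
    return questions


# Names that the file manager probed under /afs in the capture
probed = ["Trash", "Trash-20014", "hidden"]

afsdb_only = dns_questions(probed, lookup_srv=False)
with_srv = dns_questions(probed, lookup_srv=True)
print(len(afsdb_only), len(with_srv))
```

Under these assumptions, three probed names already cost a dozen failed
lookups on an AFSDB-only client and twice that once SRV is added, and each
one has to time out or come back negative before the file manager moves on.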
>
> If you restart your AFS client without the -afsdb and -dynroot
> parameters I think you would see Nautilus speed up a lot. Of course
> you might not be able to get back into AFS then :) Before you turn off
> -afsdb and -dynroot you'd need to be sure that you have your VLDB
> server IP addresses statically defined in your CellServDB file.
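
A quick way to check that before restarting might look like the sketch
below. It assumes the usual CellServDB layout (a ">cellname #comment" line
followed by "address #hostname" lines). The cell names and addresses are
made-up documentation values, the helper names are hypothetical, and the
file location differs between installations, so the sample is parsed from
a string here.

```python
def parse_cellservdb(text):
    """Map each cell name to the list of server addresses listed for it."""
    cells = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Cell line: ">cellname   #optional comment"
            tokens = line[1:].split("#", 1)[0].split()
            current = tokens[0] if tokens else None
            if current is not None:
                cells[current] = []
        elif current is not None:
            # Server line: "address   #hostname"
            addr = line.split("#", 1)[0].strip()
            if addr:
                cells[current].append(addr)
    return cells


def has_static_servers(cells, cell_name):
    """True if the cell is present and lists at least one server address."""
    return bool(cells.get(cell_name))


# Made-up sample using documentation-only addresses (192.0.2.0/24)
SAMPLE = """\
>example.org            #Hypothetical cell
192.0.2.10              #afsdb1.example.org
192.0.2.11              #afsdb2.example.org
>empty.example          #Cell with no servers listed
"""

cells = parse_cellservdb(SAMPLE)
print(has_static_servers(cells, "example.org"))
print(has_static_servers(cells, "empty.example"))
```

If the cells you need do not come back with at least one address, turning
off -afsdb would leave the client with no way to find their database
servers.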
>
> Obviously it is not optimal that users have to turn off -afsdb and
> -dynroot, so there will probably be more discussions among the
> developers about how best to solve this longer-term.
>
> - Ken
>
>
> On Mon, Dec 10, 2012 at 5:07 PM,  <jukka.tuominen@finndesign.fi> wrote:
>>
>> Hi Ken,
>>
>> I haven't used Wireshark before, so I hope I got it right. Please see the
>> attachment.
>>
>> Even though this is a development environment, I'd rather not have the
>> capture contents sent to the mailing list. The generic information needed
>> to get the issue solved can of course be shared.
>>
>> I recorded Nautilus trying to access [cell A] from the local
>> [cell B] eventually timing out before reaching it.
>>
>> Is this any good?
>>
>> br, jukka
>>
>>
>>
>>> On Mon, Dec 10, 2012 at 3:12 PM,  <jukka.tuominen@finndesign.fi> wrote:
>>>>
>>>> What do you mean by publishing DNS SRV records? The server has a FQDN,
>>>> but do you mean something else?
>>>
>>> If you can get a Wireshark trace of DNS activity while Nautilus is
>>> frozen, that will help us understand if Nautilus is causing AFS to
>>> stall out when making DNS queries for dynroot, or if it's something
>>> else entirely.
>>>
>>> I just tested this with Gnome on Fedora 19 and I can confirm that
>>> Nautilus also does the problematic /afs/.Trash lookups (details at
>>> [1]). It's just not clear to me how frequently Nautilus triggers these
>>> lookups, because in my light testing they seem to happen less often
>>> than on Xfce. Regardless, more debugging with Wireshark in your
>>> environment would give us more information to go on.
>>>
>>> - Ken
>>>
>>> [1]
>>> https://lists.openafs.org/pipermail/openafs-info/2011-June/036202.html
>>>
>