[OpenAFS] Connection Timed Out errors occasionally when accessing openafs drive

Ken Elkabany Ken@Elkabany.com
Tue, 19 May 2009 22:54:48 -0700


I upgraded our server and client to 1.4.10. Unfortunately, I am still
receiving Connection Timed Out errors. They rarely occur, but when
they do they are a severe hindrance. My use case is as follows:

Three different Unix user accounts (root, www-data, aux) are all
running multiple background processes (~9 total) that access the AFS
mount. Each process automatically acquires or re-acquires tickets and
tokens, and then proceeds to read, copy, and write files. Occasionally,
when creating a directory with Python's os.makedirs (the equivalent of
"mkdir -p"), I receive a "Connection Timed Out" error. The
processes must then be restarted.
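
A minimal sketch of how the failing call could be wrapped so that this
specific error is retried instead of ending the process. It assumes the
failure surfaces in Python as an OSError with errno ETIMEDOUT (the errno
whose message is "Connection timed out"). The helper name
makedirs_with_retry and its parameters are hypothetical, and whether a
retry actually succeeds without restarting the process is untested.

```python
import errno
import os
import time


def makedirs_with_retry(path, attempts=5, delay=1.0, makedirs=os.makedirs):
    """Create path like "mkdir -p", retrying only on ETIMEDOUT.

    Returns the number of attempts used. The makedirs argument is a
    hypothetical hook so the retry logic can be exercised without AFS.
    """
    for attempt in range(1, attempts + 1):
        try:
            makedirs(path)
            return attempt
        except OSError as exc:
            # "mkdir -p" semantics: an existing directory is not an error.
            if exc.errno == errno.EEXIST and os.path.isdir(path):
                return attempt
            # Re-raise anything that is not a timeout, or the final timeout.
            if exc.errno != errno.ETIMEDOUT or attempt == attempts:
                raise
            # Linear backoff before the next attempt.
            time.sleep(delay * attempt)
```

A call such as makedirs_with_retry("/afs/our.cell/x/y/z") would replace
the direct os.makedirs call. Recording the attempt count it returns would
also show how often the timeout occurs and whether it clears on its own.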

Any other suggestions?

Ken

On Sun, May 10, 2009 at 7:41 PM, Derrick Brashear <shadow@gmail.com> wrote:
> it probably matters in the server here, but both.
>
> Derrick
>
>
> On May 10, 2009, at 10:35 PM, Ken Elkabany <Ken@Elkabany.com> wrote:
>
>> Is this bug fixed in the client or the server? Thanks.
>>
>> Ken
>>
>> On Sun, May 10, 2009 at 7:22 PM, Derrick Brashear <shadow@gmail.com>
>> wrote:
>>>
>>> I'd venture this is a bug fixed in 1.4.10, with idle dead time
>>> computation
>>> in rx.
>>>
>>> Derrick
>>>
>>>
>>> On May 10, 2009, at 9:53 PM, Ken Elkabany <Ken@Elkabany.com> wrote:
>>>
>>>> Hello,
>>>>
>>>> I have openafs 1.4.9 client and server running on two separate
>>>> machines across a WAN. The client has scripts that access the
>>>> /afs/our.cell/ directory. Occasionally, a script will fail to
>>>> complete, and the logs will report "Connection Timed Out" on a
>>>> "mkdir -p /afs/our.cell/x/y/z" command. The frequency of the errors
>>>> is approximately 1 in 100, small enough to not be easily reproducible
>>>> manually, but enough to hamper our project. The scripts run as the
>>>> root user, and are guaranteed to have the proper ticket and token. It's
>>>> also important to note that these scripts often run in parallel (4 at
>>>> a time, all root, modifying our cell). When one fails, all scripts
>>>> running concurrently will fail with the same error, and I typically
>>>> either unlog;kdestroy or restart the openafs-client (I am unsure which
>>>> of those solutions is necessary or sufficient). I will soon have an
>>>> additional LAN setup, and will determine if the same error occurs. Has
>>>> anyone dealt with this issue before?
>>>>
>>>> Thank you for the assistance,
>>>>
>>>> Ken
>>>
>