[OpenAFS] Re: errors in afs when multiple tasks are running

Andrew Deason adeason@sinenomine.net
Mon, 11 Jul 2011 12:57:09 -0500


On Mon, 11 Jul 2011 10:57:04 -0600
Mark Henry <mark.henry@infoprint.com> wrote:

> The tokens expired error occurs with the failed to store file error.
> The problem is that it occurs very often in the log file even when all
> is working well.

The "expired tokens" message appears often? Or do you mean the other
messages appear often, even without the token expiry?

> The job runs for around 7 hours and k5start is used.  Most of the time
> the build completes just fine.

Is it possible that it sometimes runs for over 10 hours?
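
To make the question concrete, here is a minimal sketch of the check. The
10-hour lifetime is only an assumption taken from the question above, and
`outlives_token` is a hypothetical helper, not part of any AFS tool; it
also assumes nothing renews the token while the job runs.

```python
from datetime import timedelta

# Assumed token lifetime; the 10-hour figure is only implied by the
# question above, so check the real value (e.g. with `tokens`).
ASSUMED_TOKEN_LIFETIME = timedelta(hours=10)


def outlives_token(job_runtime, token_lifetime=ASSUMED_TOKEN_LIFETIME):
    """Return True if a job started with a fresh token would still be
    running after that token expires (assuming nothing renews it)."""
    return job_runtime > token_lifetime


print(outlives_token(timedelta(hours=7)))               # False
print(outlives_token(timedelta(hours=10, minutes=30)))  # True
```

A typical 7-hour run fits inside the assumed lifetime; only an unusually
long run would cross it.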

> I am not aware of any core files created on the fileserver as a result
> of this issue.

On AIX, you can generate a core manually from a running fileserver with
'gencore <fileserver pid> core.fileserver.whatever'.

> Why would the contact to the fileserver be lost just because a second
> script gets kicked off in afs?

Well, the theory would be that the second script pushes the abort count
over the relevant threshold. But if you're running with the abort
threshold disabled, then that can't be the cause.
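
As a rough illustration of that theory, here is a toy model in Python. It
is not OpenAFS code: `AbortTracker`, the host name, the threshold of 10,
and the choice to trip at (rather than strictly above) the threshold are
all assumptions made for the sketch. It only shows how aborts from two
scripts on the same client could add up, and why a disabled threshold
rules the theory out.

```python
from collections import defaultdict


class AbortTracker:
    """Toy model: count consecutive aborted requests per client host."""

    def __init__(self, threshold=10):
        # threshold == 0 models "abort threshold disabled" (assumption).
        self.threshold = threshold
        self.consecutive_aborts = defaultdict(int)

    def record(self, client, aborted):
        """Record one request and report whether the client is throttled."""
        if aborted:
            self.consecutive_aborts[client] += 1
        else:
            # A successful request resets the run of consecutive aborts.
            self.consecutive_aborts[client] = 0
        return self.is_throttled(client)

    def is_throttled(self, client):
        if self.threshold == 0:
            return False
        # Assumption: throttling starts once the count reaches the threshold.
        return self.consecutive_aborts[client] >= self.threshold


tracker = AbortTracker(threshold=10)
for _ in range(5):                       # aborts caused by the first script
    tracker.record("build-host", aborted=True)
print(tracker.is_throttled("build-host"))   # False

for _ in range(5):                       # second script adds its own aborts
    tracker.record("build-host", aborted=True)
print(tracker.is_throttled("build-host"))   # True

disabled = AbortTracker(threshold=0)
for _ in range(50):
    disabled.record("build-host", aborted=True)
print(disabled.is_throttled("build-host"))  # False
```

Neither script trips the limit alone in this model, but together they do;
with the threshold set to 0 the count never matters.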

-- 
Andrew Deason
adeason@sinenomine.net