[OpenAFS] AFS 1.2.8 fileserver Failing in GetClient()

Rainer Toebbicke rtb@pclella.cern.ch
Wed, 02 Apr 2003 11:26:50 +0200


Doug,

we had a number of crashes in GetClient()/'assertion failed' at the end of 
last year that have all been tracked down to 'host' entries being 
removed/reset while another thread updates the same host, easily recognizable 
by an entry in the FileLog shortly before the crash.

The crashes went away after correcting the rx-thread-id assignment that 
controls per-thread 'holding' of hosts. The last in a small series of fixes is 
STABLE12-rx-thread-id-startup-20030303, which hopefully is in 1.2.9 rc4.

Cheers, Rainer

-- 
=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=
Rainer Toebbicke        http://cern.ch/~rtb         rtb@mail.cern.ch  O__
European Laboratory for Particle Physics(CERN) - Geneva, Switzerland   > |
Phone: +41 22 767 8985       Fax: +41 22 767 7155                     ( )\( )



Douglas E. Engert wrote:
> 
> Derrick J Brashear wrote:
> 
>>On Mon, 31 Mar 2003, Douglas E. Engert wrote:
>>
>>
>>>After looking at the AFS Bug Tracking, this problem looks like an old problem,
>>>1257, which was resolved on 2/3/3.
>>>
>>>But the resolution looks like it only added some extra error messages rather than
>>>solving the problem. The comments indicate that backing off to 1.2.6 did not solve the problem,
>>>but a patch from lha@stacken.kth.se might have. It is not clear what the patch
>>>is, or if it is in the current source.
>>
>>As I recall, several patches went into src/viced/host.c (and possibly
>>some related changes in other files) to try to fix this. Those
>>changes should all be in the last 1.2.9 release candidate. A diff of that
>>against 1.2.8 would probably work, but if not I can try to hunt out the
>>deltas tomorrow.
> 
> 
> 
> Looking at bug 1257, it looks like the main change is to move the two lines
> 
>  h_Hold_r(host);
>  h_Lock_r(host);
> 
> 
> a change made in host.c between revisions 1.2.7.8 and 1.2.7.9
> and called STABLE12-viced-alloc-hosts-held-and-locked-20030114.
> 
> I am going to try to build 1.2.8 with this change. If you see any
> reason not to, please let me know.
>