[OpenAFS] file-server: salvaging

Klaas Hagemann kerberos@northsailor.de
Mon, 27 Jan 2003 13:05:08 +0100


Hartmut Reuter wrote:
> 
> If the fileservers have pid 1 as their parent, they are probably leftovers
> of a restart, and if this happened on Sunday I would guess it was the
> regular restart on Sunday morning at 4:00 (try bos getrestart).

It is not caused by the restart on Sunday morning. It happens from time
to time and I cannot reproduce it.
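
To catch the state the next time it happens, something like the following
could be run from cron. It is only a minimal sketch: it assumes a Linux
/proc filesystem and that the process name shown in /proc/<pid>/stat is
"fileserver". The helper names parse_stat and find_orphans are made up for
this example and are not part of OpenAFS.

```python
import os


def parse_stat(line):
    """Split one /proc/<pid>/stat line into (pid, comm, ppid)."""
    # comm is wrapped in parentheses and may itself contain spaces or ')',
    # so split on the last ')' instead of splitting on whitespace.
    left, right = line.rsplit(")", 1)
    pid_str, comm = left.split("(", 1)
    fields = right.split()
    # After comm: fields[0] is the process state, fields[1] the parent pid.
    return int(pid_str), comm, int(fields[1])


def find_orphans(name="fileserver", proc_root="/proc"):
    """Return the sorted pids of processes called `name` whose parent is pid 1."""
    orphans = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as "self" or "meminfo"
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                pid, comm, ppid = parse_stat(f.read())
        except (OSError, ValueError, IndexError):
            continue  # process went away while scanning, or unparsable line
        if comm == name and ppid == 1:
            orphans.append(pid)
    return sorted(orphans)
```

Calling find_orphans() with no arguments scans the live /proc and returns
the pids of fileserver processes that have been re-parented to pid 1. A
non-empty result would be the moment to save the logs before anything is
restarted.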

I have posted some log files and I am preparing to collect some
debugging information. As far as I can see, the fileserver process
seems to cause a memory access violation (segmentation fault).

It did not happen in my test environment, so I think it only happens
when more clients are accessing the AFS fileserver. So I would like to
know whether there are any kernel parameters that should be set.


Klaas

> 
> If the old fileservers don't go away, the newly started fileservers will
> give up after a time because of "bind failed". Then the new bosserver
> will restart the fileserver and, because the old one didn't shut down
> cleanly, it will first start the salvager.
> 
> So make sure the old fileservers go away (if nothing else helps, kill
> them by hand). Perhaps you had better set restart to 'never' until you
> have solved the problem.
> 
> Hartmut Reuter
> 
> 
> 
> Klaas Hagemann wrote:
> 
>> Derrick J Brashear wrote:
>>
>>> On Fri, 24 Jan 2003, Klaas Hagemann wrote:
>>>
>>>
>>>> Salvaging starts very often, and file server processes keep
>>>> running but do not have the bosserver as ppid.
>>>
>>>
>>>
>>>
>>> either the bosserver is dying and something is restarting it (doubt it) or
>>> more likely the "main" pthread is dying but the rest stay running. strace
>>> output, core.(pid) or logs might be helpful.
>>
>>
>> This Sunday the error occurred again on another file server.
>> All the file server processes have 1 as their ppid and the volumes are
>> not accessible any more. I am not sure whether the bosserver was still
>> running or not, because my colleague restarted it.
>>
>> The AFS logs are empty because they were deleted on the new startup. I
>> will keep them next time.
>>
>> The file servers are running on SuSE Linux 7.3. Are there any
>> kernel parameters which could be set? We had OpenAFS running in our
>> test environment without any problems, so I think this problem only
>> occurs when many clients access the file server.
>>
>> I will post any log files when I get them, but any help or suggestions
>> are very welcome.
>>
>> Thanks
>> Klaas
>>
>>>
>>>
>>> _______________________________________________
>>> OpenAFS-info mailing list
>>> OpenAFS-info@openafs.org
>>> https://lists.openafs.org/mailman/listinfo/openafs-info
>>>
>>
>>
> 
> 
>