[OpenAFS] file-server: salvaging
Klaas Hagemann
kerberos@northsailor.de
Mon, 27 Jan 2003 15:19:47 +0100
Nathan Neulinger wrote:
> This problem is caused when the fileserver fails for whatever reason,
> and due to pthreads, is unable to completely exit. Since it can't
> completely exit, bosserver can't start a new one.
>
> Go in and killall -KILL fileserver. That will clear it up. Doesn't solve
> the problem, but will get you back running without a reboot.
Thanks.
I had got that far, but the problem still occurs from time to time, and
because we need high availability I have to solve the underlying problem.
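To spot the leftover processes before killing them by hand, something like
the following could be used. It is only a minimal sketch: the function names
find_orphaned and list_processes are made up for illustration, and it assumes
that a ps supporting "-eo pid,ppid,comm" is available and that the process
is really called "fileserver" on the machine in question.

```python
import subprocess


def find_orphaned(ps_output, name="fileserver"):
    """Return the PIDs of processes called `name` whose parent PID is 1.

    `ps_output` is expected to be the text printed by
    `ps -eo pid,ppid,comm` (three whitespace-separated columns).
    """
    orphans = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        if len(fields) != 3:
            continue  # blank or malformed line
        pid, ppid, comm = fields
        if not (pid.isdigit() and ppid.isdigit()):
            continue  # header line ("PID PPID COMMAND")
        # A fileserver started by bosserver has bosserver as its parent;
        # parent PID 1 means it was re-parented to init after its parent
        # went away.
        if comm.strip() == name and int(ppid) == 1:
            orphans.append(int(pid))
    return orphans


def list_processes():
    """Run ps and return its output as text (assumes a procps-style ps)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,comm"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Calling find_orphaned(list_processes()) gives the list of PIDs to look at.
The sketch only reports them; whether to send them a KILL is still a decision
to make by hand, and it does nothing about the cause of the crash.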
>
> -- Nathan
>
> On Mon, 2003-01-27 at 06:05, Klaas Hagemann wrote:
>
>>Hartmut Reuter wrote:
>>
>>>If the fileservers have pid 1 as parent they are probably leftovers of
>>>a restart, and if this happened on Sunday I would guess from the regular
>>>restart on Sunday morning at 4:00 (try bos getrestart).
>>
>>It is not caused by the restart on Sunday morning. It happens from time
>>to time and I cannot reproduce it.
>>
>>I have posted some log files and I am preparing to collect some
>>debugging information. As far as I can see, the fileserver process
>>seems to cause a memory address violation (segmentation fault).
>>
>>It did not happen in my test environment, so I think it only happens
>>when more clients are accessing the AFS fileserver. So I would like to
>>know whether there are any kernel parameters that should be set.
>>
>>
>>Klaas
>>
>>
>>>If the old fileservers don't go away, the newly started fileservers will
>>>give up after a time because of "bind failed". Then the new bosserver
>>>will restart the fileserver and, because the old one didn't shut down
>>>cleanly, it will start the salvager first.
>>>
>>>So make sure the old fileservers go away (if nothing else helps, kill
>>>them by hand). Perhaps you had better set restart to 'never' until you
>>>have solved the problem.
>>>
>>>Hartmut Reuter
>>>
>>>
>>>
>>>Klaas Hagemann wrote:
>>>
>>>
>>>>Derrick J Brashear wrote:
>>>>
>>>>
>>>>>On Fri, 24 Jan 2003, Klaas Hagemann wrote:
>>>>>
>>>>>
>>>>>
>>>>>>salvaging starts very often, and fileserver processes stay
>>>>>>running but do not have the bosserver as ppid.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>either the bosserver is dying and something is restarting it (doubt it) or
>>>>>more likely the "main" pthread is dying but the rest stay running.
>>>>>strace output, core.(pid) or logs might be helpful.
>>>>
>>>>
>>>>This Sunday the error occurred again on another fileserver.
>>>>All the fileserver processes have "1" as parent pid and the volumes are
>>>>not accessible any more. I am not sure whether the bosserver was still
>>>>running or not, because my colleague restarted it.
>>>>
>>>>The AFS logs are empty because they were deleted at the new startup. I
>>>>will keep them next time.
>>>>
>>>>The file servers are running on SuSE Linux 7.3. Are there any
>>>>kernel parameters which could be set? We had OpenAFS running in our
>>>>test environment without any problems, so I think this problem only
>>>>occurs when many clients access the fileserver.
>>>>
>>>>I will post any log files when I get them, but any help or suggestions
>>>>are very welcome.
>>>>
>>>>Thanks
>>>>Klaas
>>>>
>>>>
>>>>>
>>>>>_______________________________________________
>>>>>OpenAFS-info mailing list
>>>>>OpenAFS-info@openafs.org
>>>>>https://lists.openafs.org/mailman/listinfo/openafs-info
>>>>>