[OpenAFS] file-server: salvaging
Nathan Neulinger
nneul@umr.edu
27 Jan 2003 07:46:25 -0600
This problem occurs when the fileserver fails for whatever reason and,
because of pthreads, is unable to exit completely. Since the old process
never fully exits, bosserver can't start a new one.
Go in and run killall -KILL fileserver. That will clear it up. It doesn't
solve the underlying problem, but it will get you running again without a
reboot.
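
Before killing everything by name, it can help to see which fileserver
processes are actually leftovers. The following is only a minimal sketch:
the function name orphaned_fileservers is hypothetical, and it assumes that
a leftover fileserver shows up in ps with parent PID 1 and the command name
"fileserver". On systems where each thread appears as its own process, some
leftover threads may have a different parent and would be missed, in which
case killall -KILL fileserver remains the blunt fallback.

```shell
#!/bin/sh
# Hypothetical helper: print the PIDs of processes named "fileserver"
# whose parent is PID 1 (init), i.e. likely leftovers of a failed exit.
# It reads "pid ppid comm" lines on stdin, as produced by:
#   ps -eo pid=,ppid=,comm=
orphaned_fileservers() {
    # Field 1 = pid, field 2 = ppid, field 3 = command name.
    awk '$2 == 1 && $3 == "fileserver" { print $1 }'
}

# Usage (left commented out so that sourcing this file has no side effects):
#   ps -eo pid=,ppid=,comm= | orphaned_fileservers
#
# To kill only the listed leftovers instead of every fileserver:
#   for pid in $(ps -eo pid=,ppid=,comm= | orphaned_fileservers); do
#       kill -KILL "$pid"
#   done
```

A fileserver that bosserver is currently managing has bosserver as its
parent, so the filter leaves it alone.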
-- Nathan
On Mon, 2003-01-27 at 06:05, Klaas Hagemann wrote:
> Hartmut Reuter wrote:
> >
> > If the fileservers have pid 1 as parent, they are probably leftovers of
> > a restart, and if this happened on Sunday I would guess from the regular
> > restart on Sunday morning at 4:00 (try bos getrestart).
>
> It is not caused by the restart on Sunday morning. It happens from time
> to time and I cannot reproduce it.
>
> I have posted some log files and I am preparing to collect some
> debugging information. But as far as I can see, it seems that the
> fileserver process produces a memory address violation (segmentation
> fault).
>
> It did not happen in my testing environment, so I think it only happens
> when more clients are accessing the AFS fileserver. So I would like to
> know if there are any kernel parameters to be set?
>
>
> Klaas
>
> >
> > If the old fileservers don't go away, the newly started fileservers will
> > give up after a time because of "bind failed". Then the new bosserver
> > will restart the fileserver and, because the old one didn't shut down
> > cleanly, it will start the salvager first.
> >
> > So make sure the old fileservers go away (if nothing else helps, kill
> > them by hand). Perhaps you had better set restart to 'never' until you
> > have solved the problem.
> >
> > Hartmut Reuter
> >
> >
> >
> > Klaas Hagemann wrote:
> >
> >> Derrick J Brashear wrote:
> >>
> >>> On Fri, 24 Jan 2003, Klaas Hagemann wrote:
> >>>
> >>>
> >>>> Salvaging starts very often, and fileserver processes stay running
> >>>> but do not have the bosserver as ppid.
> >>>
> >>>
> >>>
> >>>
> >>> Either the bosserver is dying and something is restarting it (doubt
> >>> it), or more likely the "main" pthread is dying but the rest stay
> >>> running. strace output, core.(pid), or logs might be helpful.
> >>
> >>
> >> This Sunday the error occurred again on another fileserver.
> >> All the fileserver processes have "1" as ppid and the volumes are
> >> not accessible any more. I am not sure whether the bosserver was still
> >> running or not, because my colleague restarted it.
> >>
> >> The AFS logs are empty because they were deleted on the new startup. I
> >> will keep them next time.
> >>
> >> The fileservers are running on SuSE Linux 7.3. Are there any
> >> kernel parameters which could be set? We had OpenAFS running in our
> >> testing environment without any problems, so I think this problem only
> >> occurs when many clients access the fileserver.
> >>
> >> I will post any log files when I get them, but any help or suggestions
> >> are very welcome.
> >>
> >> Thanks
> >> Klaas
> >>
> >>>
> >>>
> >>> _______________________________________________
> >>> OpenAFS-info mailing list
> >>> OpenAFS-info@openafs.org
> >>> https://lists.openafs.org/mailman/listinfo/openafs-info
> >>>
--
------------------------------------------------------------
Nathan Neulinger EMail: nneul@umr.edu
University of Missouri - Rolla Phone: (573) 341-4841
Computing Services Fax: (573) 341-4216