[OpenAFS] file-server: salvaging

Nathan Neulinger nneul@umr.edu
27 Jan 2003 07:46:25 -0600


This problem occurs when the fileserver fails for whatever reason and,
because of pthreads, is unable to exit completely. Since it can't exit
completely, bosserver can't start a new one.
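
To illustrate why a lingering process blocks the new one (Hartmut
mentions the "bind failed" message further down): as long as the old
process still holds its socket, a second bind to the same port is
refused. Here is a minimal Python sketch of that general behavior. It is
not OpenAFS code, the function name is made up for illustration, and it
uses a throwaway loopback UDP port rather than the real fileserver port.

```python
import errno
import socket


def second_bind_fails() -> bool:
    """Bind a UDP port, then try to bind it again while it is still held.

    Returns True if the second bind is refused with EADDRINUSE.
    """
    first = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    second = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        # Port 0 asks the kernel for any free port; stands in for the
        # port the "old" fileserver is still holding.
        first.bind(("127.0.0.1", 0))
        port = first.getsockname()[1]
        try:
            # The "new" fileserver trying to take over the same port.
            second.bind(("127.0.0.1", port))
        except OSError as exc:
            return exc.errno == errno.EADDRINUSE
        return False
    finally:
        second.close()
        first.close()
```

Once the first socket is closed (that is, once the old process is really
gone), the same bind succeeds again.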

Go in and run killall -KILL fileserver. That will clear it up. It doesn't
solve the underlying problem, but it will get you running again without
a reboot.
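
If you want to see what is left over before killing anything, the hint
from further down the thread (leftover fileservers have pid 1 as parent)
can be checked with a small script. This is only a sketch: the function
names are my own, and it assumes a ps that accepts the POSIX -o pid=,
-o ppid= and -o comm= format options.

```python
import subprocess


def orphaned_pids(ps_output: str, name: str = "fileserver") -> list[int]:
    """Return PIDs of processes called `name` whose parent is PID 1.

    `ps_output` is expected to hold one "pid ppid comm" triple per line.
    """
    pids = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)  # the command name may contain spaces
        if len(fields) != 3 or not fields[0].isdigit():
            continue  # skip blank or malformed lines
        pid, ppid, comm = fields
        if comm.strip() == name and ppid == "1":
            pids.append(int(pid))
    return pids


def list_processes() -> str:
    """Run ps and return its output as headerless pid/ppid/comm lines."""
    result = subprocess.run(
        ["ps", "-e", "-o", "pid=", "-o", "ppid=", "-o", "comm="],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling orphaned_pids(list_processes()) on the file server gives the
PIDs to look at. Whether you then kill them one by one or just use the
killall above is up to you.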

-- Nathan

On Mon, 2003-01-27 at 06:05, Klaas Hagemann wrote:
> Hartmut Reuter wrote:
> > 
> > If the fileservers have pid 1 as parent, they are probably leftovers of
> > a restart, and if this happened on Sunday I would guess from the regular
> > restart on Sunday morning at 4:00 (try bos getrestart).
> 
> It is not caused by the restart on Sunday morning. It happens from time
> to time and I cannot reproduce it.
> 
> I have posted some log files and I am preparing to collect some
> debugging information. As far as I can see, the fileserver process
> seems to hit a memory address violation (segmentation fault).
> 
> It did not happen in my testing environment, so I think it only happens
> when more clients are accessing the AFS fileserver. So I would like to
> know whether there are any kernel parameters to be set.
> 
> 
> Klaas
> 
> > 
> > If the old fileservers don't go away, the newly started fileservers will
> > give up after a time because of "bind failed". Then the new bosserver
> > will restart the fileserver, and because the old one didn't shut down
> > cleanly, it will start the salvager first.
> > 
> > So make sure the old fileservers go away (if nothing else helps, kill
> > them by hand). Perhaps you had better set the restart time to 'never'
> > until you have solved the problem.
> > 
> > Hartmut Reuter
> > 
> > 
> > 
> > Klaas Hagemann wrote:
> > 
> >> Derrick J Brashear wrote:
> >>
> >>> On Fri, 24 Jan 2003, Klaas Hagemann wrote:
> >>>
> >>>
> >>>> Salvaging starts very often, and file server processes stay
> >>>> running but do not have the bosserver as ppid.
> >>>
> >>>
> >>>
> >>>
> >>> either the bosserver is dying and something is restarting it (doubt
> >>> it) or, more likely, the "main" pthread is dying but the rest stay
> >>> running. strace output, core.(pid) or logs might be helpful.
> >>
> >>
> >> This Sunday the error occurred again on another file server.
> >> All the file-server processes have "1" as ppid and the volumes are
> >> not accessible any more. I am not sure whether the bosserver was still
> >> running or not, because my colleague restarted it.
> >>
> >> The AFS logs are empty, because they were deleted on the new startup. I
> >> will keep them next time.
> >>
> >> The file servers are running on SuSE Linux 7.3. Are there any
> >> kernel parameters which could be set? We had OpenAFS running in our
> >> testing environment without any problems, so I think this problem only
> >> occurs when many clients access the file server.
> >>
> >> I will post any log files when I get them, but any help or suggestions
> >> are very welcome.
> >>
> >> Thanks
> >> Klaas
> >>
> >>>
> >>>
> >>> _______________________________________________
> >>> OpenAFS-info mailing list
> >>> OpenAFS-info@openafs.org
> >>> https://lists.openafs.org/mailman/listinfo/openafs-info
> >>>
> >>
> >>
> > 
> > 
> > 
> 
> 
-- 

------------------------------------------------------------
Nathan Neulinger                       EMail:  nneul@umr.edu
University of Missouri - Rolla         Phone: (573) 341-4841
Computing Services                       Fax: (573) 341-4216