[OpenAFS] file-server: salvaging

Neulinger, Nathan nneul@umr.edu
Mon, 27 Jan 2003 08:38:47 -0600


You might try running the LWP fileserver instead of the pthread one. It
may help you out.

Build from source, and grab fileserver out of the viced/ directory
instead of the tviced/ one which is installed into dest/ by default.
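
In the meantime, the leftover processes discussed in the quoted messages
below (fileservers whose parent is PID 1 rather than bosserver) can be
spotted with a small script. This is only a minimal sketch, assuming a
Linux ps that accepts "-eo pid,ppid,comm" and a process name of
"fileserver"; the function names find_orphans and list_processes are made
up for illustration and are not part of OpenAFS.

```python
import subprocess


def find_orphans(ps_output, name="fileserver"):
    """Return PIDs of processes called `name` whose parent is PID 1.

    `ps_output` is text in the layout of `ps -eo pid,ppid,comm`:
    one process per line, columns PID, PPID, COMMAND.
    """
    orphans = []
    for line in ps_output.splitlines():
        fields = line.split(None, 2)
        # Skip the header line and anything that is not "pid ppid comm".
        if len(fields) != 3 or not fields[0].isdigit():
            continue
        pid, ppid, comm = fields
        if comm.strip() == name and ppid == "1":
            orphans.append(int(pid))
    return orphans


def list_processes():
    """Run ps and return its output (assumes a procps-style ps)."""
    result = subprocess.run(
        ["ps", "-eo", "pid,ppid,comm"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

Calling find_orphans(list_processes()) on the server prints nothing by
itself; it returns the list of PIDs you would then inspect or kill by
hand. A fileserver started by bosserver normally has bosserver as its
parent, so anything returned here is a candidate leftover.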

-- Nathan

------------------------------------------------------------
Nathan Neulinger                       EMail:  nneul@umr.edu
University of Missouri - Rolla         Phone: (573) 341-4841
Computing Services                       Fax: (573) 341-4216


> -----Original Message-----
> From: Klaas Hagemann [mailto:kerberos@northsailor.de]
> Sent: Monday, January 27, 2003 8:20 AM
> To: Neulinger, Nathan
> Cc: Hartmut Reuter; openafs-info@openafs.org
> Subject: Re: [OpenAFS] file-server: salvaging
>
>
> Nathan Neulinger wrote:
> > This problem is caused when the fileserver fails for whatever reason,
> > and due to pthreads, is unable to completely exit. Since it can't
> > completely exit, bosserver can't start a new one.
> >
> > Go in and killall -KILL fileserver. That will clear it up. Doesn't solve
> > the problem, but will get you back running without a reboot.
>
> Thanks.
> I got that far, but the problem still occurs from time to time, and
> because of high-availability requirements I need to solve it.
>
> >
> > -- Nathan
> >
> > On Mon, 2003-01-27 at 06:05, Klaas Hagemann wrote:
> >
> >>Hartmut Reuter wrote:
> >>
> >>>If the fileservers have pid 1 as father they are probably leftovers of
> >>>a restart, and if this happened on Sunday I would guess from the regular
> >>>restart on Sunday morning at 4:00. (try bos getrestart).
> >>
> >>It is not caused by the restart on Sunday morning. It happens from time
> >>to time and I cannot reproduce it.
> >>
> >>I have posted some log files and I am preparing to collect some
> >>debugging information. But as far as I can see, it seems as if the
> >>fileserver process produces a memory address violation (segmentation
> >>fault).
> >>
> >>It did not happen in my testing environment, so I think it only happens
> >>when more clients are accessing the AFS fileserver. So I would like to
> >>know if there are any kernel parameters to be set?
> >>
> >>
> >>Klaas
> >>
> >>
> >>>If the old fileservers don't go away the newly started fileservers will
> >>>give up after a time because of "bind failed". Then the new bosserver
> >>>will restart the fileserver and, because the old one didn't shut down
> >>>regularly, it will first start the salvager.
> >>>
> >>>So make sure the old fileservers go away (if nothing else helps kill
> >>>them by hand). Perhaps you better set restart to 'never' until you have
> >>>solved the problem.
> >>>
> >>>Hartmut Reuter
> >>>
> >>>
> >>>
> >>>Klaas Hagemann wrote:
> >>>
> >>>
> >>>>Derrick J Brashear wrote:
> >>>>
> >>>>
> >>>>>On Fri, 24 Jan 2003, Klaas Hagemann wrote:
> >>>>>
> >>>>>
> >>>>>
> >>>>>>salvaging starts very often and file server processes are staying
> >>>>>>running but do not have the bosserver as ppid.
> >>>>>
> >>>>>
> >>>>>
> >>>>>
> >>>>>either the bosserver is dying and something is restarting it (doubt it) or
> >>>>>more likely the "main" pthread is dying but the rest stay running.
> >>>>>strace output, core.(pid) or logs might be helpful.
> >>>>
> >>>>
> >>>>This Sunday this error occurred again on another file server.
> >>>>All the file server processes have "1" as ppid and the volumes are
> >>>>not accessible any more. I am not sure whether the bosserver was still
> >>>>running or not, because my colleague restarted it.
> >>>>
> >>>>The AFS logs are empty, because they were deleted on the new startup. I
> >>>>will keep them the next time.
> >>>>
> >>>>The file servers are running on SuSE Linux 7.3. Are there any
> >>>>kernel parameters which could be set? We had OpenAFS running in our
> >>>>testing environment without any problems, so I think this problem only
> >>>>occurs when many clients access the file server.
> >>>>
> >>>>I will post any log files when I get them, but any help or suggestions
> >>>>are very welcome.
> >>>>
> >>>>Thanks
> >>>>Klaas
> >>>>
> >>>>
> >>>>>
> >>>>>_______________________________________________
> >>>>>OpenAFS-info mailing list
> >>>>>OpenAFS-info@openafs.org
> >>>>>https://lists.openafs.org/mailman/listinfo/openafs-info
> >>>>>
> >>>>
> >>>>
> >>>
> >>>
> >>>
> >>