[OpenAFS-devel] stability problems, and interesting symptoms...

Nathan Neulinger nneul@umr.edu
Wed, 30 May 2001 18:44:10 -0500


Jimmy Engelbrecht wrote:
> 
> "Neulinger, Nathan" <nneul@umr.edu> writes:
> 
> > I've got two problems and one interesting symptom, though the symptom is
> > probably not related to the first problem.
> >
> > First, on a couple of my servers (and this started happening about a month
> > ago with no apparent changes to server hardware or software): if I start
> > moving volumes off the server en masse, one at a time, one after another,
> > then at some point in the process, after 50-100 volumes have been moved, I
> > get a volserver error complaining about being unable to attach a volume.
> > Once that happens, from then on, any listvol or volserver activity against
> > the server fails. Usually bos status indicates that vol exited with signal
> > 6, although not necessarily immediately (I haven't seen that with OpenAFS
> > yet, but it was typically what I saw with 3.6-2.3). I have no error
> > messages from the volserver other than this - and basically no indication
> > that anything is wrong.
> 
> I have seen similar behavior when reusing old disks for AFS data, especially
> when using an old AFS server disk in a new AFS server. Even a 'newfs' does
> not help. Because of this problem, "the old sysadmin" before me used to erase
> the disks completely with 'dd'; I never really believed it until I got
> similar problems a few weeks ago on a Tru64 5.0a machine running OpenAFS
> 1.0.4. I reused 11 old SCSI disks, and while creating and moving volumes I
> got problems like the ones you described; completely erasing the problematic
> disk with 'dd' solved the problem.
>
> As we know, the fileserver manipulates the filesystem inodes on the disk
> directly (except on Linux); this could be an explanation.
> 
> Signal 6 on your operating system, is that SIGABRT?

Yes. I must not have said so before - this is all on i386_linux22.

Turns out I was able to get rid of this symptom (apparently) by raising
the ulimits that the server runs with and/or by raising the inode limit
really high, but I think the change that mattered was the ulimit one.
I'm in the process of clearing that server off completely, and will
reformat the filesystems before putting it back in service and clearing
the next server off.
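
In case it helps anyone else, this is roughly what I ended up doing; the
numbers and device names below are just what I happened to use on this
i386_linux22 box (not tuned or recommended values), and I'm still not certain
which ulimit was the one that actually mattered:

    # In the init script, before bosserver starts, so the fileserver and
    # volserver inherit the higher limits:
    ulimit -n 8192            # more open file descriptors
    ulimit -c unlimited       # keep core files so a signal 6 leaves evidence
    /usr/afs/bin/bosserver &

    # Raise the Linux 2.2 kernel inode/file table limits via /proc:
    echo 65536 > /proc/sys/fs/inode-max
    echo 16384 > /proc/sys/fs/file-max

    # And, per Jimmy's suggestion, completely wipe a reused disk before
    # making a new filesystem on it (destructive - double-check the device
    # name; /dev/sdb1 here is only an example):
    dd if=/dev/zero of=/dev/sdb1 bs=1024k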

-- Nathan

------------------------------------------------------------
Nathan Neulinger                       EMail:  nneul@umr.edu
University of Missouri - Rolla         Phone: (573) 341-4841
CIS - Systems Programming                Fax: (573) 341-4216