[OpenAFS-devel] stability problems, and interesting symptoms...

Jimmy Engelbrecht jimmy-li@e.kth.se
31 May 2001 00:57:47 +0200


"Neulinger, Nathan" <nneul@umr.edu> writes:

> I've got two problems and one interesting symptom, though probably not of
> any relation to the first problem.
> 
> First, on a couple of my servers (this started happening about a month ago,
> with no apparent changes to server hardware or software), if I start moving
> volumes off the server en masse, one at a time, one after another, at some
> point in the process, after 50-100 volumes have been moved, I get a volserver
> error complaining about being unable to attach a volume.
> Once that happens, from then on out, any listvol or volserver activity
> against the server fails. Usually bos status indicates that vol exited with
> signal 6 although not necessarily immediately (I haven't seen that with
> openafs yet, but that was typically what I saw with 3.6-2.3). I have no
> error messages from the volserver other than this - and basically no
> indication that anything is wrong.

I have seen similar behavior when "reusing" old disks for AFS data, especially
when putting an old AFS server disk into a new AFS server. Even a 'newfs' does
not help. Because of this problem, "the old sysadmin" before me used to erase
the disks completely with 'dd'. I never really believed it until I got similar
problems a few weeks ago on a Tru64 5.0a machine running OpenAFS 1.0.4: I reused
11 old SCSI disks, and while creating and moving volumes I got problems similar
to the ones you described. Completely erasing the problematic disk with 'dd'
solved the problem.

As we know, the fileserver manipulates the filesystem inodes on the disk
directly (except on Linux), so this could be an explanation.

On your operating system, is signal 6 SIGABRT?

/Jimmy, KTH