[OpenAFS-devel] volserver dies on linux

David Thompson thomas@cs.wisc.edu
Tue, 28 May 2002 14:07:30 -0500


Are the developers aware of a volserver problem that shows up when deleting 
many volumes in succession?  The scenario looks something like:

- Create a couple thousand volumes on a 300+ GB (ext3 on software raid 5) 
partition

- Put some large files (i.e. 500 MB to 1 GB) in some of the volumes, but leave 
most empty

- Start deleting the volumes
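
For anyone who wants to try reproducing this, the deletion step could be 
scripted roughly as below.  This is only a sketch, not the exact procedure 
used here: the server name, partition, and volume names are hypothetical 
placeholders, it assumes the caller already has admin tokens, and it assumes 
'vos remove' exits non-zero once the volserver starts rejecting requests.

```python
import subprocess


def vos_remove_cmd(server, partition, name):
    """Build the argument list for one 'vos remove' call."""
    return ["vos", "remove", "-server", server,
            "-partition", partition, "-id", name]


def remove_until_failure(server, partition, names, run=subprocess.run):
    """Remove volumes in order and stop at the first failing command.

    Returns how many volumes were removed before the failure.  The 'run'
    parameter exists so the loop can be exercised without a real cell.
    """
    for count, name in enumerate(names):
        result = run(vos_remove_cmd(server, partition, name))
        # Assumption: a rejected request shows up as a non-zero exit status.
        if result.returncode != 0:
            return count
    return len(names)


if __name__ == "__main__":
    # Hypothetical names; substitute a real server, partition and volumes.
    volumes = ["test.%04d" % i for i in range(2000)]
    done = remove_until_failure("fs1.example.org", "/vicepa", volumes)
    print("removed %d volumes before the first failure" % done)
```

The count it prints should show whether the failure point stays around a few 
dozen removals or moves between volserver restarts.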

After a few dozen, the volserver starts rejecting the 'vos remove' commands, 
although the volumes disappear from the vldb.  When I tried to go back over 
the data with 'vos zap' the error message was:

Could not start transaction on volume 2004066564
Volume not attached, does not exist, or not on line
Error in vos zap command.
Volume not attached, does not exist, or not on line

I get that message for every subsequent 'vos zap' command I try.  After that 
point, if I run 'vos listvol' on the partition, the volserver dies with 
SIGABRT.  When it restarts, I can delete a few dozen more volumes before the 
volserver goes bad again.

Needless to say, this isn't a very efficient way to delete thousands of 
volumes.

This is OpenAFS 1.2.2 on Red Hat 7.2 with a kernel.org 2.4.18 kernel, on a 
dual Athlon file server.

Has this already been diagnosed, or should I investigate further?

--
Dave Thompson  <thomas@cs.wisc.edu>

Associate Researcher                    Department of Computer Science
University of Wisconsin-Madison         http://www.cs.wisc.edu/~thomas
1210 West Dayton Street                 Phone:    (608)-262-1017
Madison, WI 53706-1685                  Fax:      (608)-262-6626
--