[OpenAFS-devel] stability problems, and interesting symptoms...

Neulinger, Nathan nneul@umr.edu
Wed, 30 May 2001 12:09:31 -0500


I added a pile of debugging to volume.c and volprocs.c and traced the
failure to this:


    fdP = IH_OPEN(h);
    if (fdP == NULL) {
        Log("ReadHeader: %s:%d\n", __FILE__, __LINE__);
        *ec = VSALVAGE;
        return;
    }

This is in ReadHeader in volume.c: the IH_OPEN call is failing. I'm going to
try bumping up inode-max and file-max on the box in question to see whether
that makes any difference.
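
To tell a descriptor or file-table limit apart from some other failure, it
may help to log errno at the point where the open fails. The following is a
minimal sketch, not OpenAFS code: it assumes the failure ultimately comes
from an underlying open(2) call, and the helpers open_failure_hint() and
open_with_diagnostics() are hypothetical names used only for illustration.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/resource.h>

/* Hypothetical helper (not part of OpenAFS): map an errno value from a
 * failed open(2) to a short hint about the likely cause. */
static const char *open_failure_hint(int err)
{
    switch (err) {
    case EMFILE:
        return "per-process descriptor limit reached";
    case ENFILE:
        return "system-wide file table full";
    case ENOENT:
        return "file does not exist";
    default:
        return "other error";
    }
}

/* Hypothetical helper: open path read-only. On failure, log the errno
 * text, a hint, and the current RLIMIT_NOFILE values to stderr.
 * Returns the descriptor, or -1 with errno preserved. */
static int open_with_diagnostics(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        int saved = errno;   /* stdio calls below may overwrite errno */
        struct rlimit rl;

        fprintf(stderr, "open(%s) failed: %s (%s)\n",
                path, strerror(saved), open_failure_hint(saved));
        if (getrlimit(RLIMIT_NOFILE, &rl) == 0) {
            fprintf(stderr, "RLIMIT_NOFILE: soft=%llu hard=%llu\n",
                    (unsigned long long)rl.rlim_cur,
                    (unsigned long long)rl.rlim_max);
        }
        errno = saved;
    }
    return fd;
}
```

If the logged error turned out to be EMFILE, raising the system-wide
file-max would not be expected to help on its own, since that error refers
to the per-process limit; ENFILE would point at the system-wide table
instead.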

-- Nathan

> -----Original Message-----
> From: Neulinger, Nathan [mailto:nneul@umr.edu]
> Sent: Wednesday, May 30, 2001 10:46 AM
> To: 'openafs-devel@openafs.org'
> Subject: [OpenAFS-devel] stability problems, and interesting symptoms...
> 
> 
> I've got two problems and one interesting symptom, though the symptom is
> probably unrelated to the first problem.
> 
> First, on a couple of my servers (this started about a month ago, with no
> apparent changes to server hardware or software): if I start moving
> volumes off the server en masse, one after another, then at some point in
> the process, after 50-100 volumes have been moved, I get a volserver error
> complaining about being unable to attach a volume. From then on, any
> listvol or volserver activity against the server fails. Usually bos status
> indicates that vol exited with signal 6, although not necessarily
> immediately (I haven't seen that with openafs yet, but it was typically
> what I saw with 3.6-2.3). I have no error messages from the volserver
> other than this, and basically no indication that anything is wrong.
> 
> I get the error both with transarc 3.6-2.3 and openafs-cvs.
> 
> Syslog looks like this:
> ----
> (lots and lots of stuff like the next few lines for the other volumes that
> moved ok.)
> May 30 10:30:18 afs4 fileserver[511]: fssync: volume 537013509 moved to 63019783; breaking all call backs
> May 30 10:30:18 afs4 volserver[483]: 1 Volser: Delete: volume 537013509 deleted
> May 30 10:30:18 afs4 volserver[483]: 1 Volser: Delete: volume 537013511 deleted
> May 30 10:30:18 afs4 volserver[483]: 1 Volser: Delete: volume 537020173 deleted
> May 30 10:30:20 afs4 volserver[483]: 1 Volser: Clone: Cloning volume 536897629 to new volume 537020174
> May 30 10:30:20 afs4 fileserver[511]: fssync: volume 536897629 moved to 63019783; breaking all call backs
> May 30 10:30:20 afs4 volserver[483]: 1 Volser: Delete: volume 536897629 deleted
> May 30 10:30:20 afs4 volserver[483]: 1 Volser: Delete: volume 536897631 deleted
> May 30 10:30:22 afs4 volserver[483]: 1 Volser: Delete: volume 537020174 deleted
> May 30 10:30:23 afs4 volserver[483]: VAttachVolume: Error attaching volume /vicepd/V0536906941.vol; volume needs salvage
> May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536906941
> May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536985904
> May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536889228
> May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536924071
> May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536896750
> May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536897341
> May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536983233
> May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536906834
> (tons of that for every volume on the server, and it happens again if you
> do a vos listvol against the server.)
> -----
> 
> The other symptom: when clearing off a server, I happened to notice that
> the volserver seemed to hang (not responding to any new client requests
> such as vos partinfo) if I started a vos release against it. Once the vos
> release (in particular the ForwardMulti) completed, the volserver
> responded again. I'm not talking about a huge volume: maybe 5-10 megs with
> a few thousand files in it.
> 
> I'm running volserver with no options in both cases. 
> 
> -- Nathan
> 
> ------------------------------------------------------------
> Nathan Neulinger                       EMail:  nneul@umr.edu
> University of Missouri - Rolla         Phone: (573) 341-4841
> Computing Services                       Fax: (573) 341-4216