[OpenAFS-devel] stability problems, and interesting symptoms...

Neulinger, Nathan nneul@umr.edu
Wed, 30 May 2001 12:32:33 -0500


No... bumping inode-max and file-max didn't help. Here's a look right
after it crashed:

troot-afs4(23)> cat /proc/sys/fs/file-max 
65535
troot-afs4(24)> cat /proc/sys/fs/file-nr 
2381    2       65535
troot-afs4(25)> cat /proc/sys/fs/inode-nr 
256020  33754
troot-afs4(26)> cat /proc/sys/fs/inode-max 
256000

Note the enormous first value in /proc/sys/fs/inode-nr (256020, above the
inode-max of 256000)? I think that indicates that at one point at least
256020 inodes were open.
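
For reading those counters, here is a minimal sketch. It assumes the layout
described in the Linux kernel's sysctl documentation: the first field of
inode-nr is the number of inodes the kernel has allocated, and the second is
how many of those are currently free. Under that assumption, the number
actually in use is the difference of the two. The struct and function names
are hypothetical and not part of OpenAFS.

```c
#include <stdio.h>

/* Hypothetical helper type, not part of OpenAFS.
 * Assumption: field 1 of /proc/sys/fs/inode-nr is the number of inodes
 * allocated by the kernel, field 2 is the number of those that are free. */
struct inode_nr {
    long allocated;
    long unused;
};

/* Parse one line in the format of /proc/sys/fs/inode-nr.
 * Returns 0 on success, -1 if the line does not hold two integers. */
int parse_inode_nr(const char *line, struct inode_nr *out)
{
    if (line == NULL || out == NULL)
        return -1;
    if (sscanf(line, "%ld %ld", &out->allocated, &out->unused) != 2)
        return -1;
    return 0;
}

/* Inodes in use = allocated minus free, under the assumption above. */
long inodes_in_use(const struct inode_nr *s)
{
    return s->allocated - s->unused;
}
```

Fed the line shown above ("256020  33754"), this gives 256020 - 33754 =
222266 inodes in use.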

I suspect the volserver is leaking inodes in certain cases.
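
One way to check that hypothesis is to watch the number of descriptors a
process holds open while volumes are being moved. This is a rough sketch,
assuming leaked inode handles show up as open file descriptors. The function
name is hypothetical and it is not part of OpenAFS. It counts the entries in
a Linux /proc/<pid>/fd directory.

```c
#include <dirent.h>
#include <stdio.h>

/* Hypothetical diagnostic helper, not part of OpenAFS.
 * Counts the entries in a /proc/<pid>/fd style directory.
 * Note: when pointed at /proc/self/fd, the count includes the descriptor
 * that opendir() itself holds while the directory is being read.
 * Returns -1 if the directory cannot be opened. */
int count_open_fds(const char *fd_dir)
{
    DIR *d = opendir(fd_dir);
    struct dirent *e;
    int n = 0;

    if (d == NULL)
        return -1;
    while ((e = readdir(d)) != NULL) {
        if (e->d_name[0] == '.')
            continue;           /* skip "." and ".." */
        n++;
    }
    closedir(d);
    return n;
}
```

Calling count_open_fds() on the volserver's /proc/<pid>/fd directory before
and after each volume move should show whether the count keeps growing.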

-- Nathan

> -----Original Message-----
> From: Neulinger, Nathan [mailto:nneul@umr.edu]
> Sent: Wednesday, May 30, 2001 12:10 PM
> To: 'openafs-devel@openafs.org'
> Subject: RE: [OpenAFS-devel] stability problems, and interesting
> symptoms...
> 
> 
> I added a pile of debugging to volume.c and volprocs.c and came to this:
> 
> 
>     fdP = IH_OPEN(h);
>     if (fdP == NULL) {
>         Log("ReadHeader: %s:%d\n", __FILE__, __LINE__);
>         *ec = VSALVAGE;
>         return;
>     }
> 
> in ReadHeader in volume.c... The IH_OPEN is failing. I'm trying to bump
> up inode-max and file-max on the box in question - we'll see if that
> makes any difference.
> 
> -- Nathan
> 
> > -----Original Message-----
> > From: Neulinger, Nathan [mailto:nneul@umr.edu]
> > Sent: Wednesday, May 30, 2001 10:46 AM
> > To: 'openafs-devel@openafs.org'
> > Subject: [OpenAFS-devel] stability problems, and interesting 
> > symptoms...
> > 
> > 
> > I've got two problems and one interesting symptom, though the symptom
> > is probably unrelated to the first problem.
> > 
> > First, on a couple of my servers (this started about a month ago,
> > with no apparent changes to server hardware or software): if I start
> > moving volumes off the server en masse, one after another, then at
> > some point in the process, after 50-100 volumes have been moved, I
> > get a volserver error complaining about being unable to attach a
> > volume. Once that happens, any listvol or other volserver activity
> > against the server fails. Usually bos status indicates that vol
> > exited with signal 6, although not necessarily immediately (I haven't
> > seen that with openafs yet, but that was typically what I saw with
> > 3.6-2.3). I have no error messages from the volserver other than
> > this, and basically no other indication that anything is wrong.
> > 
> > I get the error both with transarc 3.6-2.3 and openafs-cvs. 
> > 
> > Syslog looks like this:
> > ----
> > (lots and lots of stuff like the next few lines for the other volumes
> > that moved ok.)
> > May 30 10:30:18 afs4 fileserver[511]: fssync: volume 537013509 moved to 63019783; breaking all call backs
> > May 30 10:30:18 afs4 volserver[483]: 1 Volser: Delete: volume 537013509 deleted
> > May 30 10:30:18 afs4 volserver[483]: 1 Volser: Delete: volume 537013511 deleted
> > May 30 10:30:18 afs4 volserver[483]: 1 Volser: Delete: volume 537020173 deleted
> > May 30 10:30:20 afs4 volserver[483]: 1 Volser: Clone: Cloning volume 536897629 to new volume 537020174
> > May 30 10:30:20 afs4 fileserver[511]: fssync: volume 536897629 moved to 63019783; breaking all call backs
> > May 30 10:30:20 afs4 volserver[483]: 1 Volser: Delete: volume 536897629 deleted
> > May 30 10:30:20 afs4 volserver[483]: 1 Volser: Delete: volume 536897631 deleted
> > May 30 10:30:22 afs4 volserver[483]: 1 Volser: Delete: volume 537020174 deleted
> > May 30 10:30:23 afs4 volserver[483]: VAttachVolume: Error attaching volume /vicepd/V0536906941.vol; volume needs salvage
> > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536906941
> > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536985904
> > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536889228
> > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536924071
> > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536896750
> > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536897341
> > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536983233
> > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536906834
> > (tons of that for every volume on the server, and it happens again if
> > you do a vos listvol against the server.)
> > -----
> > 
> > The other symptom: when clearing off a server, I happened to notice
> > that the volserver seemed to hang (and not respond to any new client
> > requests such as vos partinfo) if I started a vos release against it.
> > Once the vos release (in particular the ForwardMulti) completed, the
> > volserver responded again. I'm not talking about a huge volume -
> > maybe 5-10 megs with a few thousand files in it.
> > 
> > I'm running volserver with no options in both cases. 
> > 
> > -- Nathan
> > 
> > ------------------------------------------------------------
> > Nathan Neulinger                       EMail:  nneul@umr.edu
> > University of Missouri - Rolla         Phone: (573) 341-4841
> > Computing Services                       Fax: (573) 341-4216
> > _______________________________________________
> > OpenAFS-devel mailing list
> > OpenAFS-devel@openafs.org
> > https://lists.openafs.org/mailman/listinfo/openafs-devel
> > 