[OpenAFS-devel] stability problems, and interesting symptoms...

Neulinger, Nathan nneul@umr.edu
Wed, 30 May 2001 13:45:03 -0500


Interesting... I think that my change to add a "ulimit -Hsn 16384" to the
afs startup script right before afsd started actually had more of an effect
than the change to the inode-max value.

After messing with both of those values, I have been unable to get the
server to fail in the same way it did before.

-- Nathan

> -----Original Message-----
> From: Neulinger, Nathan [mailto:nneul@umr.edu]
> Sent: Wednesday, May 30, 2001 12:33 PM
> To: 'openafs-devel@openafs.org'
> Subject: RE: [OpenAFS-devel] stability problems, and interesting symptoms...
> 
> 
> No... this didn't help... Here's a look right after it crashed:
> 
> troot-afs4(23)> cat /proc/sys/fs/file-max 
> 65535
> troot-afs4(24)> cat /proc/sys/fs/file-nr 
> 2381    2       65535
> troot-afs4(25)> cat /proc/sys/fs/inode-nr 
> 256020  33754
> troot-afs4(26)> cat /proc/sys/fs/inode-max 
> 256000
> 
> Note the enormous first value in /proc/sys/fs/inode-nr? I think that
> indicates that at one point, over 256020 inodes were open.
> 
> I think the volserver is leaking inodes in certain cases or something.
> 
> -- Nathan
> 
> > -----Original Message-----
> > From: Neulinger, Nathan [mailto:nneul@umr.edu]
> > Sent: Wednesday, May 30, 2001 12:10 PM
> > To: 'openafs-devel@openafs.org'
> > Subject: RE: [OpenAFS-devel] stability problems, and interesting symptoms...
> > 
> > 
> > I added a pile of debugging to volume.c and volprocs.c and came to this:
> > 
> > 
> >     fdP = IH_OPEN(h);
> >     if (fdP == NULL) {
> >         Log("ReadHeader: %s:%d\n", __FILE__, __LINE__);
> >         *ec = VSALVAGE;
> >         return;
> >     }
> > 
> > in ReadHeader in volume.c... The IH_OPEN is failing. I'm trying to bump up
> > inode-max and file-max on the box in question - we'll see if that makes any
> > difference.
> > 
> > -- Nathan
> > 
> > > -----Original Message-----
> > > From: Neulinger, Nathan [mailto:nneul@umr.edu]
> > > Sent: Wednesday, May 30, 2001 10:46 AM
> > > To: 'openafs-devel@openafs.org'
> > > Subject: [OpenAFS-devel] stability problems, and interesting symptoms...
> > > 
> > > 
> > > I've got two problems and one interesting symptom, though probably not of
> > > any relation to the first problem.
> > > 
> > > First, on a couple of my servers (this started about a month or so ago,
> > > with no apparent changes to server hardware or software): if I start
> > > moving volumes off the server en masse, one at a time, one after another,
> > > then at some point in the process, after 50-100 volumes have been moved,
> > > I get a volserver error complaining about being unable to attach a volume.
> > > Once that happens, any listvol or volserver activity against the server
> > > fails from then on. Usually bos status indicates that vol exited with
> > > signal 6, although not necessarily immediately (I haven't seen that with
> > > openafs yet, but that was typically what I saw with 3.6-2.3). I have no
> > > error messages from the volserver other than this, and basically no
> > > indication that anything is wrong.
> > > 
> > > I get the error both with transarc 3.6-2.3 and openafs-cvs. 
> > > 
> > > Syslog looks like this:
> > > ----
> > > (lots and lots of stuff like the next few lines for the other volumes that
> > > moved ok.)
> > > May 30 10:30:18 afs4 fileserver[511]: fssync: volume 537013509 moved to 63019783; breaking all call backs
> > > May 30 10:30:18 afs4 volserver[483]: 1 Volser: Delete: volume 537013509 deleted
> > > May 30 10:30:18 afs4 volserver[483]: 1 Volser: Delete: volume 537013511 deleted
> > > May 30 10:30:18 afs4 volserver[483]: 1 Volser: Delete: volume 537020173 deleted
> > > May 30 10:30:20 afs4 volserver[483]: 1 Volser: Clone: Cloning volume 536897629 to new volume 537020174
> > > May 30 10:30:20 afs4 fileserver[511]: fssync: volume 536897629 moved to 63019783; breaking all call backs
> > > May 30 10:30:20 afs4 volserver[483]: 1 Volser: Delete: volume 536897629 deleted
> > > May 30 10:30:20 afs4 volserver[483]: 1 Volser: Delete: volume 536897631 deleted
> > > May 30 10:30:22 afs4 volserver[483]: 1 Volser: Delete: volume 537020174 deleted
> > > May 30 10:30:23 afs4 volserver[483]: VAttachVolume: Error attaching volume /vicepd/V0536906941.vol; volume needs salvage
> > > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536906941
> > > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536985904
> > > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536889228
> > > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536924071
> > > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536896750
> > > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536897341
> > > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536983233
> > > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536906834
> > > (tons of that for every volume on the server, and it happens again if you
> > > do a vos listvol against the server.)
> > > -----
> > > 
> > > The other symptom - when clearing off a server, I happened to notice that
> > > the volserver seemed to hang (and not respond to any new client requests
> > > such as vos partinfo) if I started a vos release against it. Once the vos
> > > release (in particular the ForwardMulti) completed, the volserver
> > > responded again. I'm not talking about a huge volume - maybe 5-10 megs
> > > with a few thousand files in it.
> > > 
> > > I'm running volserver with no options in both cases. 
> > > 
> > > -- Nathan
> > > 
> > > ------------------------------------------------------------
> > > Nathan Neulinger                       EMail:  nneul@umr.edu
> > > University of Missouri - Rolla         Phone: (573) 341-4841
> > > Computing Services                       Fax: (573) 341-4216
> > > _______________________________________________
> > > OpenAFS-devel mailing list
> > > OpenAFS-devel@openafs.org
> > > https://lists.openafs.org/mailman/listinfo/openafs-devel
> > > 