[OpenAFS-devel] stability problems, and interesting symptoms...
Neulinger, Nathan
nneul@umr.edu
Wed, 30 May 2001 13:11:02 -0500
Well, the system is definitely leaking inodes, I'm just not sure why...
I've been watching /proc/sys/fs/inode-state - the first two numbers are
supposed to be 'allocated' and 'free'. The 'allocated' number increases
continually as I move volumes, while the 'free' value stays around 1-30.
At the moment, I'm seeing if a HUGE value for inode-max alleviates the
symptoms, but this is really ugly.
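For what it's worth, here is a minimal sketch (not taken from volserver or any AFS source) of how those two counters could be sampled from a program instead of by hand. It assumes the file begins with two whitespace-separated integers, 'allocated' then 'free', as in the /proc/sys/fs/inode-nr output quoted below; the struct and function names are made up for illustration.

```c
#include <stdio.h>

/* First two fields of /proc/sys/fs/inode-nr (assumed layout:
 * "allocated free"; inode-state is assumed to start the same way). */
struct inode_counts {
    long allocated;  /* inodes the kernel has allocated */
    long unused;     /* allocated inodes that are currently free */
};

/* Parse "allocated free ..." from one line of text.
 * Returns 0 on success, -1 if two integers were not found. */
int parse_inode_counts(const char *line, struct inode_counts *out)
{
    long a, f;

    if (sscanf(line, "%ld %ld", &a, &f) != 2)
        return -1;
    out->allocated = a;
    out->unused = f;
    return 0;
}

/* Read the first line of a proc file and parse it.
 * Returns 0 on success, -1 on any open, read, or parse failure. */
int read_inode_counts(const char *path, struct inode_counts *out)
{
    char buf[256];
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    if (fgets(buf, sizeof buf, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return parse_inode_counts(buf, out);
}

/* Inodes actually in use: allocated minus free. */
long inodes_in_use(const struct inode_counts *c)
{
    return c->allocated - c->unused;
}
```

Calling read_inode_counts("/proc/sys/fs/inode-nr", &c) before and after each vos move and logging inodes_in_use(&c) should show whether the in-use count grows with every move.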
Any ideas?
-- Nathan
> -----Original Message-----
> From: Neulinger, Nathan [mailto:nneul@umr.edu]
> Sent: Wednesday, May 30, 2001 12:33 PM
> To: 'openafs-devel@openafs.org'
> Subject: RE: [OpenAFS-devel] stability problems, and interesting symptoms...
>
>
> No... this didn't help... Here's a look right after it crashed:
>
> troot-afs4(23)> cat /proc/sys/fs/file-max
> 65535
> troot-afs4(24)> cat /proc/sys/fs/file-nr
> 2381 2 65535
> troot-afs4(25)> cat /proc/sys/fs/inode-nr
> 256020 33754
> troot-afs4(26)> cat /proc/sys/fs/inode-max
> 256000
>
> Note the enormous first value in /proc/sys/fs/inode-nr? I think that
> indicates that at one point 256020 inodes were allocated, which is more
> than the inode-max of 256000.
>
> I think the volserver is leaking inodes in certain cases or something.
>
> -- Nathan
>
> > -----Original Message-----
> > From: Neulinger, Nathan [mailto:nneul@umr.edu]
> > Sent: Wednesday, May 30, 2001 12:10 PM
> > To: 'openafs-devel@openafs.org'
> > Subject: RE: [OpenAFS-devel] stability problems, and interesting symptoms...
> >
> >
> > I added a pile of debugging to volume.c and volprocs.c and came to this:
> >
> >
> > fdP = IH_OPEN(h);
> > if (fdP == NULL) {
> >     Log("ReadHeader: %s:%d\n", __FILE__, __LINE__);
> >     *ec = VSALVAGE;
> >     return;
> > }
> >
> > in ReadHeader in volume.c... The IH_OPEN is failing. I'm trying to bump up
> > inode-max and file-max on the box in question - we'll see if that makes any
> > difference.
> >
> > -- Nathan
> >
> > > -----Original Message-----
> > > From: Neulinger, Nathan [mailto:nneul@umr.edu]
> > > Sent: Wednesday, May 30, 2001 10:46 AM
> > > To: 'openafs-devel@openafs.org'
> > > Subject: [OpenAFS-devel] stability problems, and interesting symptoms...
> > >
> > >
> > > > I've got two problems and one interesting symptom, though the symptom
> > > > is probably not related to the first problem.
> > > >
> > > > First, on a couple of my servers (this started about a month ago, with
> > > > no apparent changes to server hardware or software): if I start moving
> > > > volumes off the server en masse, one at a time, one after another, then
> > > > at some point in the process, after 50-100 volumes have been moved, I
> > > > get a volserver error complaining about being unable to attach a
> > > > volume. Once that happens, any listvol or other volserver activity
> > > > against the server fails from then on. Usually bos status indicates
> > > > that vol exited with signal 6, although not necessarily immediately (I
> > > > haven't seen that with openafs yet, but it was typically what I saw
> > > > with 3.6-2.3). I have no error messages from the volserver other than
> > > > this, and basically no other indication that anything is wrong.
> > >
> > > I get the error both with transarc 3.6-2.3 and openafs-cvs.
> > >
> > > > The syslog looks like this:
> > > ----
> > > > (lots and lots of stuff like the next few lines for the other volumes
> > > > that moved ok.)
> > > > May 30 10:30:18 afs4 fileserver[511]: fssync: volume 537013509 moved to 63019783; breaking all call backs
> > > > May 30 10:30:18 afs4 volserver[483]: 1 Volser: Delete: volume 537013509 deleted
> > > > May 30 10:30:18 afs4 volserver[483]: 1 Volser: Delete: volume 537013511 deleted
> > > > May 30 10:30:18 afs4 volserver[483]: 1 Volser: Delete: volume 537020173 deleted
> > > > May 30 10:30:20 afs4 volserver[483]: 1 Volser: Clone: Cloning volume 536897629 to new volume 537020174
> > > > May 30 10:30:20 afs4 fileserver[511]: fssync: volume 536897629 moved to 63019783; breaking all call backs
> > > > May 30 10:30:20 afs4 volserver[483]: 1 Volser: Delete: volume 536897629 deleted
> > > > May 30 10:30:20 afs4 volserver[483]: 1 Volser: Delete: volume 536897631 deleted
> > > > May 30 10:30:22 afs4 volserver[483]: 1 Volser: Delete: volume 537020174 deleted
> > > > May 30 10:30:23 afs4 volserver[483]: VAttachVolume: Error attaching volume /vicepd/V0536906941.vol; volume needs salvage
> > > > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536906941
> > > > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536985904
> > > > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536889228
> > > > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536924071
> > > > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536896750
> > > > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536897341
> > > > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536983233
> > > > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536906834
> > > > (tons of that for every volume on the server, and happens again if you
> > > > do a vos listvol against the server.)
> > > -----
> > >
> > > > The other symptom - when clearing off a server, I happened to notice
> > > > that the volserver seemed to hang (and not respond to any new client
> > > > requests such as vos partinfo) if I started a vos release against it.
> > > > Once the vos release (in particular the ForwardMulti) completed, the
> > > > volserver responded again. I'm not talking about a huge volume - maybe
> > > > 5-10 megs with a few thousand files in it.
> > >
> > > I'm running volserver with no options in both cases.
> > >
> > > -- Nathan
> > >
> > > ------------------------------------------------------------
> > > Nathan Neulinger EMail: nneul@umr.edu
> > > University of Missouri - Rolla Phone: (573) 341-4841
> > > Computing Services Fax: (573) 341-4216
> > > _______________________________________________
> > > OpenAFS-devel mailing list
> > > OpenAFS-devel@openafs.org
> > > https://lists.openafs.org/mailman/listinfo/openafs-devel
> > >
> > _______________________________________________
> > OpenAFS-devel mailing list
> > OpenAFS-devel@openafs.org
> > https://lists.openafs.org/mailman/listinfo/openafs-devel
> >
> _______________________________________________
> OpenAFS-devel mailing list
> OpenAFS-devel@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-devel
>