[OpenAFS-devel] stability problems, and interesting symptoms...
Neulinger, Nathan
nneul@umr.edu
Wed, 30 May 2001 12:32:33 -0500
No... this didn't help... Here's a look right after it crashed:
troot-afs4(23)> cat /proc/sys/fs/file-max
65535
troot-afs4(24)> cat /proc/sys/fs/file-nr
2381 2 65535
troot-afs4(25)> cat /proc/sys/fs/inode-nr
256020 33754
troot-afs4(26)> cat /proc/sys/fs/inode-max
256000
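A minimal sketch of doing the same comparison in code, assuming the first field of inode-nr is the allocated-inode count and the second is the free count, as the Linux sysctl documentation for fs describes. The function names are hypothetical and are not part of OpenAFS.

```c
#include <stdio.h>

/* Parse the text of /proc/sys/fs/inode-nr ("<allocated> <free>").
 * Returns 0 on success, -1 if two numbers could not be read. */
int parse_inode_nr(const char *text, long *allocated, long *free_count)
{
    return sscanf(text, "%ld %ld", allocated, free_count) == 2 ? 0 : -1;
}

/* Returns 1 if the allocated count is above the configured limit. */
int inodes_over_limit(long allocated, long inode_max)
{
    return allocated > inode_max;
}

/* Read the first line of a file such as /proc/sys/fs/inode-nr into buf.
 * Returns 0 on success, -1 if the file cannot be opened or read. */
int read_first_line(const char *path, char *buf, size_t len)
{
    FILE *fp = fopen(path, "r");

    if (fp == NULL)
        return -1;
    if (fgets(buf, (int)len, fp) == NULL) {
        fclose(fp);
        return -1;
    }
    fclose(fp);
    return 0;
}
```

Feeding it the readings above ("256020 33754" against a limit of 256000) reports the allocated count as over the limit. Note that inode-max may not exist on every kernel, so read_first_line can fail for that path.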
Note the enormous first value in /proc/sys/fs/inode-nr? I think that
indicates that, at one point, 256020 inodes were open, which is more than
the inode-max of 256000.
I think the volserver is leaking inodes in certain cases or something.
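One way to test the leak idea, sketched under the assumption that a leak would show up as open descriptors in the volserver process: count the entries in /proc/<pid>/fd before and after a batch of volume moves. The helper names below are hypothetical, and this does not touch any OpenAFS code.

```c
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>

/* Count the numeric entries in a directory such as /proc/<pid>/fd.
 * Returns the count, or -1 if the directory cannot be opened.
 * When a process inspects its own fd directory, the descriptor used
 * by opendir() is included in the count. */
int count_fd_entries(const char *dir_path)
{
    DIR *dir = opendir(dir_path);
    struct dirent *entry;
    int count = 0;

    if (dir == NULL)
        return -1;
    while ((entry = readdir(dir)) != NULL) {
        /* Skip "." and ".."; descriptor entries are named by number. */
        if (isdigit((unsigned char)entry->d_name[0]))
            count++;
    }
    closedir(dir);
    return count;
}

/* Count the open descriptors of the process with the given pid. */
int count_open_fds(long pid)
{
    char path[64];

    snprintf(path, sizeof(path), "/proc/%ld/fd", pid);
    return count_fd_entries(path);
}
```

Calling count_open_fds(483) would look at the volserver pid from the syslog excerpt; reading another process's fd directory normally requires running as root or as the same user. A count that keeps climbing across moves would support the leak theory, and a flat count would point elsewhere.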
-- Nathan
> -----Original Message-----
> From: Neulinger, Nathan [mailto:nneul@umr.edu]
> Sent: Wednesday, May 30, 2001 12:10 PM
> To: 'openafs-devel@openafs.org'
> Subject: RE: [OpenAFS-devel] stability problems, and interesting symptoms...
>
>
> I added a pile of debugging to volume.c and volprocs.c and came to this:
>
>
>     fdP = IH_OPEN(h);
>     if (fdP == NULL) {
>         Log("ReadHeader: %s:%d\n", __FILE__, __LINE__);
>         *ec = VSALVAGE;
>         return;
>     }
>
> in ReadHeader in volume.c... The IH_OPEN is failing. I'm trying to bump
> up inode-max and file-max on the box in question - we'll see if that
> makes any difference.
>
> -- Nathan
>
> > -----Original Message-----
> > From: Neulinger, Nathan [mailto:nneul@umr.edu]
> > Sent: Wednesday, May 30, 2001 10:46 AM
> > To: 'openafs-devel@openafs.org'
> > Subject: [OpenAFS-devel] stability problems, and interesting symptoms...
> >
> >
> > I've got two problems and one interesting symptom, though probably not
> > of any relation to the first problem.
> >
> > First, on a couple of my servers (and this started happening sometime
> > back about a month or so with no apparent changes to server hardware
> > or software) - if I start moving volumes off the server en-masse, one
> > at a time, one after another, at some point in the process, 50-100
> > volumes have been moved, I get a volserver error complaining about
> > being unable to attach a volume. Once that happens, from then on out,
> > any listvol or volserver activity against the server fails. Usually
> > bos status indicates that vol exited with signal 6 although not
> > necessarily immediately (I haven't seen that with openafs yet, but
> > that was typically what I saw with 3.6-2.3). I have no error messages
> > from the volserver other than this - and basically no indication that
> > anything is wrong.
> >
> > I get the error both with transarc 3.6-2.3 and openafs-cvs.
> >
> > Syslog looks like this:
> > ----
> > (lots and lots of stuff like the next few lines for the other volumes
> > that moved ok.)
> > May 30 10:30:18 afs4 fileserver[511]: fssync: volume 537013509 moved to 63019783; breaking all call backs
> > May 30 10:30:18 afs4 volserver[483]: 1 Volser: Delete: volume 537013509 deleted
> > May 30 10:30:18 afs4 volserver[483]: 1 Volser: Delete: volume 537013511 deleted
> > May 30 10:30:18 afs4 volserver[483]: 1 Volser: Delete: volume 537020173 deleted
> > May 30 10:30:20 afs4 volserver[483]: 1 Volser: Clone: Cloning volume 536897629 to new volume 537020174
> > May 30 10:30:20 afs4 fileserver[511]: fssync: volume 536897629 moved to 63019783; breaking all call backs
> > May 30 10:30:20 afs4 volserver[483]: 1 Volser: Delete: volume 536897629 deleted
> > May 30 10:30:20 afs4 volserver[483]: 1 Volser: Delete: volume 536897631 deleted
> > May 30 10:30:22 afs4 volserver[483]: 1 Volser: Delete: volume 537020174 deleted
> > May 30 10:30:23 afs4 volserver[483]: VAttachVolume: Error attaching volume /vicepd/V0536906941.vol; volume needs salvage
> > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536906941
> > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536985904
> > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536889228
> > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536924071
> > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536896750
> > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536897341
> > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536983233
> > May 30 10:30:23 afs4 volserver[483]: 1 Volser: ListVolumes: Could not attach volume 536906834
> > (tons of that for every volume on the server, and happens again if you
> > do a vos listvol against the server.)
> > -----
> >
> > The other symptom - when clearing off a server, I happened to notice
> > that the volserver seemed to hang (and not respond to any new client
> > requests such as vos partinfo) if I started a vos release against it.
> > Once the vos release (in particular the ForwardMulti) completed, the
> > volserver responded again. I'm not talking about a huge volume - maybe
> > 5-10 megs with a few thousand files in it.
> >
> > I'm running volserver with no options in both cases.
> >
> > -- Nathan
> >
> > ------------------------------------------------------------
> > Nathan Neulinger EMail: nneul@umr.edu
> > University of Missouri - Rolla Phone: (573) 341-4841
> > Computing Services Fax: (573) 341-4216
> > _______________________________________________
> > OpenAFS-devel mailing list
> > OpenAFS-devel@openafs.org
> > https://lists.openafs.org/mailman/listinfo/openafs-devel
> >