[OpenAFS] 1.4.4 client on EL3: panic in afs_HashOutDcache

Stephan Wiesand Stephan.Wiesand@desy.de
Thu, 12 Apr 2007 19:26:49 +0200 (CEST)


On Thu, 12 Apr 2007, Derrick J Brashear wrote:

> On Thu, 12 Apr 2007, Stephan Wiesand wrote:
> 
> > On Wed, 11 Apr 2007, Derrick J Brashear wrote:
> >
> > > On Wed, 11 Apr 2007, Stephan Wiesand wrote:
> > > 
> > > > One of our systems panicked two times within 2 hours yesterday, at the
[...]
> > > How reproducible is it?
> >
> > Good news: it is reproducible. The user confessed that he'd run "less than
> > 20" parallel rsyncs transferring data to our cell. The files are a mixture
> > of data and log files, with typical sizes of 15 MB and 100 kB.
> >
> > So I set up a dozen rsyncs to copy this data into another volume, and after
> > some 9 hours got the panic you find below.
> >
> > I'm going to repeat this exercise now, and will also try to make the panic
> > happen earlier (more rsyncs, read data from a faster source - any other
> > ideas?).

This time, it took less than three hours.
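
For anyone who wants to try the same thing, the setup amounts to roughly
the following sketch. The function names, the per-job destination
directories and the plain "rsync -a" invocation are assumptions made for
illustration, not the exact commands used here.

```python
import subprocess


def build_rsync_commands(sources, dest_root):
    """Build one 'rsync -a' command per source directory.

    Each source gets its own (hypothetical) job<N> directory below
    dest_root so that the parallel transfers do not overwrite each other.
    """
    return [
        ["rsync", "-a", src.rstrip("/") + "/", f"{dest_root}/job{i}/"]
        for i, src in enumerate(sources)
    ]


def run_parallel(commands):
    """Start all commands at once, wait for all of them, return exit codes."""
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return [p.wait() for p in procs]
```

Calling run_parallel(build_rsync_commands(sources, dest_root)) with a
dozen source directories and a destination inside the test volume would
start all transfers together and block until the last one finishes.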

> > Just wondering what to do next then.
> 
> I'm thinking about a patch. I have something else I need to deal with but I
> will try to work something up after. There's a 3rd possibility, namely the
> missing object being mishashed. Instead of panicking, we can presumably just
> iterate over everything and dump state.
> 
> I suppose the other possibility would be to get a kernel crash dump but it's
> sort of cumbersome to move those around so unless you're comfortable with a
> debugger on a kernel dump that's probably a non-starter.
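
As I understand the idea, it would look roughly like the toy model
below: if the entry is not on the chain its key hashes to, scan all
chains and report where it is (or that it is nowhere) rather than
panicking. This is only an illustration in Python with made-up names and
a made-up table size; it is not the actual afs_HashOutDcache code or the
patch being discussed.

```python
NBUCKETS = 8  # toy table size, chosen arbitrarily for this sketch


def bucket_of(key):
    """Map an integer key to a bucket index (stand-in for the real hash)."""
    return key % NBUCKETS


def hash_out(table, key, entry):
    """Remove entry from the chain its key hashes to.

    On a miss, scan every chain and print what was found instead of
    giving up. Returns "removed", "mishashed" or "missing".
    """
    expected = bucket_of(key)
    if entry in table[expected]:
        table[expected].remove(entry)
        return "removed"
    # Not where it should be: look everywhere and dump state.
    for index, chain in enumerate(table):
        if entry in chain:
            print(f"{entry!r} found in bucket {index}, expected {expected}")
            chain.remove(entry)
            return "mishashed"
    print(f"{entry!r} not found in any of {NBUCKETS} buckets")
    return "missing"
```

The point of the three return values is that they separate the case of
an object sitting on the wrong chain from the case of an object that is
not in the table at all.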

I'll work on getting us a crash dump. Setting that up and practising
how to get information out of such dumps is high on my list anyway, and
this is the perfect occasion. This is not the first crash of these systems
during large-scale data transfers - just the first one with the
latest stable OpenAFS release.

Trying the patch you have in mind may still be the faster way to pinpoint 
the problem. I agree that this issue is not extremely urgent, though - 
most sites will probably get by for months before it strikes the first 
time.

Thanks again,
	Stephan

-- 
Stephan Wiesand
  DESY - DV -
  Platanenallee 6
  15738 Zeuthen, Germany