[OpenAFS] file corruption redux

Miles Davis miles@CS.Stanford.EDU
Wed, 31 May 2006 11:02:13 -0700


On Wed, May 31, 2006 at 01:25:45PM -0400, Derrick J Brashear wrote:
> On Wed, 31 May 2006, Miles Davis wrote:
> 
> >>>OK, here we go:
> >>>cmp -l aspell-bg-0.50-9.i386.rpm /tmp/aspell-bg-0.50-9.i386.rpm
> >>>1683429 377 177
> >>>
> >>>(same from at least two clients)
> >>>
> >>>tcpdump (4.4MB) file is at 
> >>>http://cs.stanford.edu/people/miles/tcpdump.out
> >>>Server is 171.64.64.67, client is 171.64.64.132.
> >>
> >>That's a single bit error. That screams bad hardware. I will look at the
> >>tcpdump, though.
> >
> >Bugger. Well, while I have your attention, do you have an educated guess as
> >to what I should yank & replace next? I already replaced the memory, and it's
> >single-bit ECC...I haven't managed to get any failures from memtest86, but
> >then again I don't recall ever getting memtest86 to find an error.
> 
> Mrph mpfl. Um. Anyone else have thoughts on this?

Well, just for fun, I exported my /vicep via NFS and I can reproduce the exact 
same bit errors -- in fact, they seem to always occur at the same offset in a 
given file, which totally freaks me out. I guess I get to spend some time 
ripping out various parts one by one.
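
For anyone who wants to check the "single bit" reading of the cmp output
quoted above: cmp -l prints the differing bytes in octal, so 377 is 0xFF and
177 is 0x7F, and the two differ only in the top bit (0x80). Below is a rough
Python sketch of the same comparison that also counts the flipped bits per
byte. The names differing_bytes and compare_files are made up for
illustration and are not part of any OpenAFS tooling.

```python
def differing_bytes(a: bytes, b: bytes, base: int = 0):
    """Return (1-based offset, byte_a, byte_b, flipped_bit_count) for each
    position where the two byte strings differ (offsets as in cmp -l)."""
    return [
        (base + i + 1, x, y, bin(x ^ y).count("1"))
        for i, (x, y) in enumerate(zip(a, b))
        if x != y
    ]


def compare_files(path_a, path_b, chunk=1 << 20):
    """Compare two files chunk by chunk; stops at the end of the shorter one.
    Hypothetical helper, only a sketch."""
    results = []
    base = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a, b = fa.read(chunk), fb.read(chunk)
            if not a or not b:
                break
            results.extend(differing_bytes(a, b, base))
            base += min(len(a), len(b))
    return results


if __name__ == "__main__":
    # The byte pair from the cmp output: octal 377 vs octal 177.
    print(oct(0o377 ^ 0o177))  # XOR of the two bytes
    print(differing_bytes(b"\xff", b"\x7f"))
```

Running compare_files on the copy under /vicep and a known-good copy a few
times, and getting the same offset with a flipped-bit count of 1 each time,
would match the behaviour described above.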

Thanks for all the help,

-- 
// Miles Davis - miles@cs.stanford.edu - http://www.cs.stanford.edu/~miles
// Computer Science Department - Computer Facilities
// Stanford University