[OpenAFS] file corruption redux

Miles Davis miles@CS.Stanford.EDU
Wed, 31 May 2006 12:32:53 -0700


On Wed, May 31, 2006 at 11:23:29AM -0700, Russ Allbery wrote:
> Miles Davis <miles@CS.Stanford.EDU> writes:
> 
> > Well, just for fun, I exported my /vicep via NFS and I can reproduce the
> > exact same bit errors -- in fact, they seem to always occur at the same
> > offset in a given file, which totally freaks me out. I guess I get to
> > spend some time ripping out various parts one by one.
> 
> On Solaris, I've had this problem before and think I decided it was either
> a hard disk controller or hard disk problem (more likely with the onboard
> memory than with the physical media, I expect).  But it was just once, and
> a long time ago.

Well, it appears to be the onboard gig-e controller. The onboard 100 Mbit
interface works fine, but the gig-e interface shows corruption every time.
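
For anyone chasing something similar: one way to confirm that the errors
always land at the same offset is to compare a known-good copy of a file
against a copy pulled over the suspect interface. The following is only a
minimal Python sketch; the function name diff_offsets and the paths in the
usage note are hypothetical, and it is not a tool that was used in this
thread.

```python
def diff_offsets(path_a, path_b, chunk_size=1 << 20):
    """Return (offset, byte_a, byte_b, xor) for every differing byte.

    The xor value shows which bits flipped. Only the common prefix is
    compared: if one file is longer, its extra tail is not reported.
    """
    diffs = []
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(chunk_size)
            b = fb.read(chunk_size)
            if not a or not b:
                break
            if a != b:
                # Walk the chunk only when it differs, to keep clean data fast.
                for i, (x, y) in enumerate(zip(a, b)):
                    if x != y:
                        diffs.append((offset + i, x, y, x ^ y))
            offset += min(len(a), len(b))
    return diffs
```

Calling something like diff_offsets("/tmp/good.copy", "/tmp/gige.copy") after
each of several transfers and comparing the returned lists would show
whether the offsets and the flipped bits repeat from run to run.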

OK, so that was one RAID controller that corrupted data on disk, and now one
Ethernet controller that corrupted data as it sent it out on the net. I'm
looking forward to a hard-to-diagnose CPU problem next. That or chassis
failure. :)



-- 
// Miles Davis - miles@cs.stanford.edu - http://www.cs.stanford.edu/~miles
// Computer Science Department - Computer Facilities
// Stanford University