[OpenAFS] File corruption, 1.4.1 on linux

Miles Davis miles@CS.Stanford.EDU
Thu, 11 May 2006 13:19:25 -0700


OK, I'm having deja vu, but I can't remember when I last saw this. Probably 
1.3.80-something, and I think I blamed hardware at the time.

I've got a mirror of Fedora, among other things. After an upgrade on the server 
side to openafs-1.4.1, I had some problems doing installs from my mirror 
because of bad RPMs. Sure enough, a handful were bad, e.g.:

$ rpm -qp hdparm-5.9-1.i386.rpm
error: hdparm-5.9-1.i386.rpm: headerRead failed: tag[7]: BAD, tag 1006 type 4 
offset 1009741886 count 218762506

OK, don't know why that happened, but I need to do installs, so nuke it and 
resync to parent mirror. Check the RPM again, everything is good.

$ rpm  -K hdparm-5.9-1.i386.rpm
hdparm-5.9-1.i386.rpm: sha1 md5 OK

Release the volume, and the RO copy looks good too. Everything is right with 
the world. Now wait some amount of time, do another install, and wham -- bad 
RPM again, *even on the RO volume*.

$ rpm -qp hdparm-5.9-1.i386.rpm
error: hdparm-5.9-1.i386.rpm: headerRead failed: tag[7]: BAD, tag 1006 type 4 
offset 636236390 count 1482459716
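
To see where a corrupted copy diverges from a freshly synced one, the two files can be compared byte by byte. The following is a minimal sketch, not something from the original report: the function name `diff_ranges` and both path arguments are hypothetical, and it assumes a known-good copy is kept somewhere outside the affected volume.

```python
def diff_ranges(good_path, bad_path, bufsize=65536):
    """Return [start, end) byte ranges where the two files differ.

    A length mismatch is reported as a differing range at the tail.
    """
    ranges = []
    start = None          # start offset of the range currently open, if any
    offset = 0            # file offset of the first byte in the current buffers
    with open(good_path, "rb") as g, open(bad_path, "rb") as b:
        while True:
            gb = g.read(bufsize)
            bb = b.read(bufsize)
            if not gb and not bb:
                break
            n = max(len(gb), len(bb))
            for i in range(n):
                same = i < len(gb) and i < len(bb) and gb[i] == bb[i]
                if not same and start is None:
                    start = offset + i
                elif same and start is not None:
                    ranges.append((start, offset + i))
                    start = None
            offset += n
    if start is not None:
        ranges.append((start, offset))
    return ranges
```

If the reported ranges start and end on regular boundaries, that may point at a block- or chunk-level problem rather than random damage, though that interpretation is only a guess.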

It only seems to happen if I do a lot of reads (like a kickstart install out 
of that dir, or checking all the RPMs). Once the file is corrupt, if I replace 
it with a good version, the RO is still corrupt until release. The corruption 
appears on all clients.
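
Since files on the RO volume should not change between releases, one way to pin down when the corruption appears is to record a checksum of every RPM right after a release and compare again after a heavy read load. This is a minimal sketch under that assumption; the helper names (`sha1_of`, `snapshot`, `changed`) are made up for illustration, and the directory passed in would be the mirror directory on the RO path.

```python
import hashlib
import os


def sha1_of(path, bufsize=1 << 20):
    """Return the SHA-1 hex digest of one file, read in chunks."""
    h = hashlib.sha1()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(bufsize), b""):
            h.update(block)
    return h.hexdigest()


def snapshot(directory, suffix=".rpm"):
    """Map each matching file name in `directory` to its current digest."""
    return {
        name: sha1_of(os.path.join(directory, name))
        for name in sorted(os.listdir(directory))
        if name.endswith(suffix)
    }


def changed(before, after):
    """Names whose digest differs between two snapshots, or that vanished."""
    return sorted(n for n in before if after.get(n) != before[n])
```

Taking one snapshot just after the release and another after the install, then calling `changed(before, after)`, lists the files whose contents differ. Note that computing the second snapshot is itself a bulk read of the directory, which may matter if reads are what trigger the problem.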

Things I've tried to no avail:

Moved volume to new partition on server
Salvaged volume
Tried from different clients (i386, x86_64, 1.4.1 and 1.3.86)
Tried disk cache vs memcache on 1.4.1

I don't see any problems on the file server end with either the hardware (no 
errors reported) or underlying filesystem (XFS).

-- 
// Miles Davis - miles@cs.stanford.edu - http://www.cs.stanford.edu/~miles
// Computer Science Department - Computer Facilities
// Stanford University