[OpenAFS] File corruption, 1.4.1 on linux
Miles Davis
miles@CS.Stanford.EDU
Thu, 11 May 2006 13:19:25 -0700
OK, I'm having deja vu, but I can't remember when I last saw this... 1.3.80-something
probably, and I think I blamed the hardware at the time.
I've got a mirror of Fedora, among other things. After upgrading the server
side to openafs-1.4.1, I had problems doing installs from my mirror because of
bad RPMs. Sure enough, a handful were bad, e.g.:
$ rpm -qp hdparm-5.9-1.i386.rpm
error: hdparm-5.9-1.i386.rpm: headerRead failed: tag[7]: BAD, tag 1006 type 4 offset 1009741886 count 218762506
OK, I don't know why that happened, but I need to do installs, so I nuked it and
resynced from the parent mirror. I checked the RPM again, and everything is good:
$ rpm -K hdparm-5.9-1.i386.rpm
hdparm-5.9-1.i386.rpm: sha1 md5 OK
I released the volume, and the RO copy looks good too. All's right with the world.
Now, wait some amount of time, do another install, and whammo: bad RPM again,
*even on the RO volume*.
$ rpm -qp hdparm-5.9-1.i386.rpm
error: hdparm-5.9-1.i386.rpm: headerRead failed: tag[7]: BAD, tag 1006 type 4 offset 636236390 count 1482459716
It only seems to happen when I do a lot of reads (e.g. a kickstart install out
of that directory, or checking all the RPMs). Once a file is corrupt, replacing
it with a good version does not fix the RO copy; it stays corrupt until the next
release. The corruption appears on all clients.
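One way to script the "check all the RPMs" pass is sketched below, assuming GNU coreutils `sha1sum` is available on a client. The idea is to record a checksum manifest right after a known-good release and re-check it after a heavy read (such as a kickstart install), so a file that *changed* after release is distinguished from one that was bad all along. The function names `snapshot_rpms` and `verify_rpms` and the manifest location are hypothetical; looping `rpm -K` over every file would be an alternative check of each package's internal digests.

```shell
# Hypothetical helpers (not part of OpenAFS or RPM): snapshot checksums of
# every *.rpm in a directory, then later report any file that no longer matches.

# snapshot_rpms DIR MANIFEST
# Writes "sha1  path" lines for each *.rpm in DIR. Paths are stored as given,
# so pass an absolute DIR if you will verify from a different working directory.
snapshot_rpms() {
    dir=$1
    manifest=$2
    sha1sum -- "$dir"/*.rpm > "$manifest"
}

# verify_rpms DIR MANIFEST
# Re-hashes the files listed in MANIFEST; with --quiet, only mismatches are
# printed. Exits non-zero if at least one file differs from the snapshot.
# DIR is accepted only for symmetry with snapshot_rpms; the paths come
# from the manifest itself.
verify_rpms() {
    manifest=$2
    sha1sum -c --quiet -- "$manifest"
}
```

For example, run `snapshot_rpms` right after releasing the volume, then run `verify_rpms` against the RO path after the next install; any `FAILED` lines name the files that changed in between.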
Things I've tried, to no avail:
Moved the volume to a new partition on the server
Salvaged the volume
Tried from different clients (i386, x86_64, 1.4.1 and 1.3.86)
Tried disk cache vs. memcache on 1.4.1
I don't see any problems on the file server end with either the hardware (no
errors reported) or the underlying filesystem (XFS).
--
// Miles Davis - miles@cs.stanford.edu - http://www.cs.stanford.edu/~miles
// Computer Science Department - Computer Facilities
// Stanford University