[OpenAFS] Tracking down AFS Fileserver corruption

Jack Neely jjneely@pams.ncsu.edu
Mon, 28 Nov 2011 14:57:44 -0500


On Mon, Nov 28, 2011 at 08:34:00PM +0100, Stephan Wiesand wrote:
> Hi Jack,
> 
> no help, just a few dumb questions inline:
> 
> On Nov 28, 2011, at 19:13 , Jack Neely wrote:
> 
> > Folks,
> > 
> > I'm deploying new OpenAFS 1.6.0 DAFS file servers on fully updated RHEL
> > 6.1 servers and I've stumbled across a data corruption problem.  The ext4
> > filesystems on the vice mounts are not getting corrupted, just the AFS
> > volume data.
> > 
> > Our /vicep[ab] mounts are provided by an EMC Clariion SAN array using
> > the PowerPath driver.  Each of the two vice mounts has 4 paths and is
> > not partitioned.  I've directly formatted the /dev/emcpower[ab] block
> > devices as ext4.  Each /dev/emcpowerX device is mounted on the
> > corresponding /vicepX.
> 
> emcpower{a,b} map to sd{c,e} ?
> 

emcpowera is made of the paths: sdc sde sdg sdi

emcpowerb is made of the paths: sdb sdd sdf sdh

Here's the information from the powermt tool:
http://pastebin.com/sfmJX5Kc
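
To make it easier to read the kernel messages against this layout, here is a
minimal sketch of a lookup from a native sd path back to its PowerPath pseudo
device and vice mount.  The path lists are copied from above; the function
names are made up for illustration, and the table is hard-coded rather than
parsed from powermt output.

```python
# Path membership as listed in this message (hard-coded, not read from powermt).
POWERPATH_MEMBERS = {
    "emcpowera": ["sdc", "sde", "sdg", "sdi"],
    "emcpowerb": ["sdb", "sdd", "sdf", "sdh"],
}


def pseudo_device_for(path_dev):
    """Return the emcpower pseudo device that owns a native sd path, or None."""
    for pseudo, members in POWERPATH_MEMBERS.items():
        if path_dev in members:
            return pseudo
    return None


def mount_point_for(pseudo):
    """Map emcpowerX to /vicepX, following the mount layout described above."""
    return "/vicep" + pseudo[-1]


if __name__ == "__main__":
    for dev in ("sdc", "sde"):
        pseudo = pseudo_device_for(dev)
        print(dev, pseudo, mount_point_for(pseudo))
```

With this table, sdc and sde both resolve to emcpowera, so the messages quoted
below concern two of the four paths behind /vicepa.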

> > Every hour our OCS Inventory agent runs, and it eventually runs "fdisk -l"
> > to get statistics for the storage on the server.  When I was moving test
> > volumes to the new server and the agent ran fdisk -l, the kernel would
> > print:
> > 
> >    Nov 28 13:01:39 xxx kernel: sdc: unknown partition table
> >    Nov 28 13:01:39 xxx kernel: sde: unknown partition table
> >    Nov 28 13:01:49 xxx kernel: sdc: unknown partition table
> >    Nov 28 13:01:49 xxx kernel: sde: unknown partition table
> 
> If the devices aren't partitioned, why would it ever find a partition table?

It shouldn't.  But why does it keep looking (and cause corruption)?
Before I figured out that the corruption was happening at the same time
as these messages, I didn't think there was any connection.
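
For lining up those messages with the times of volume moves, a rough sketch
like the following could pull the rescan events out of syslog and group them by
timestamp.  The regular expression is an assumption based only on the four
lines quoted above, and the function name is hypothetical; other syslog formats
would need a different pattern.

```python
import re
from collections import defaultdict

# Pattern inferred from the quoted kernel lines; an assumption, not a general
# syslog parser.
LINE_RE = re.compile(
    r"^(?P<stamp>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+"
    r"(?P<dev>sd[a-z]+): unknown partition table"
)

# The four lines quoted above, with the mail quoting stripped.
SAMPLE = [
    "Nov 28 13:01:39 xxx kernel: sdc: unknown partition table",
    "Nov 28 13:01:39 xxx kernel: sde: unknown partition table",
    "Nov 28 13:01:49 xxx kernel: sdc: unknown partition table",
    "Nov 28 13:01:49 xxx kernel: sde: unknown partition table",
]


def rescans_by_time(lines):
    """Group 'unknown partition table' events by their syslog timestamp."""
    events = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match:
            events[match.group("stamp")].append(match.group("dev"))
    return dict(events)


if __name__ == "__main__":
    for stamp, devs in rescans_by_time(SAMPLE).items():
        print(stamp, ", ".join(devs))
```

The resulting timestamps can then be compared by hand against the start and
end times of the volume moves.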

> 
> This may have changed, but Red Hat used to not support setups with filesystems on unpartitioned block devices, I believe.
> 

I have a support case open with Red Hat as well and they have not
indicated this.  In fact, not partitioning SAN devices (especially large
ones) seems to be accepted practice nowadays.

> > and the volume being moved at that exact time would be corrupt.  Usually
> > the server would soon detect this and salvage the volume, but the level
> > of corruption has varied.
> 
> I don't have experience with running 1.6 servers in production yet, but since the AFS fileserver is entirely running in userland, it should not cause this kind of corruption. That being said, there's an open BZ regarding ext4 corruption due to Ceph userland processes...
> 

The ext4 file system is not corrupted, so I think the AFS daemons are
somehow being disturbed and not writing complete data.

> > The above messages and corruption only seem to happen when volume moves
> > are in progress.  Running fdisk -l on an idle server produces no
> > messages.
> 
> Any messages if you run bonnie++ or iozone on the filesystem when the agent runs?
> 

Haven't tried yet.  Good idea though.
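
Short of a full bonnie++ or iozone run, a crude write-and-verify load could be
sketched as below and left running on the vice partition while the agent
fires.  This is only an illustration, not a substitute for those tools: the
function name and parameters are made up, and the read-back may well be served
from the page cache, so a match does not prove the on-disk data is intact.

```python
import hashlib
import os
import tempfile


def write_and_verify(directory, blocks=64, block_size=1 << 20):
    """Write random blocks to a scratch file, fsync, re-read, compare digests.

    Returns True when the data read back matches what was written.
    """
    written = hashlib.sha256()
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as handle:
            for _ in range(blocks):
                chunk = os.urandom(block_size)
                written.update(chunk)
                handle.write(chunk)
            handle.flush()
            os.fsync(handle.fileno())  # push the data out of Python and the OS buffers

        read_back = hashlib.sha256()
        with open(path, "rb") as handle:
            for chunk in iter(lambda: handle.read(block_size), b""):
                read_back.update(chunk)
        return written.hexdigest() == read_back.hexdigest()
    finally:
        os.remove(path)  # always clean up the scratch file
```

Calling write_and_verify("/vicepa") in a loop around the top of the hour, while
watching the kernel log, would show whether plain file I/O also coincides with
the partition table messages.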

> > Other things cause the above messages to be re-printed, such as running
> > fsck -yf /dev/emcpowera.
> 
> Is this safe to do on a mounted ext4 filesystem?
> 

I ran fsck on the unmounted SAN LUN to make sure I didn't have file
system corruption.  I was surprised that it seemed to trigger partition
rescans again....

Jack

> >  They also occur in the early hours of the morning, apparently
> > triggered by something related to a cron job I've not tracked down
> > yet.
> > 
> > I need some help in figuring out what is causing the corruption and,
> > more importantly, how to fix things.
> 
> If the AFS fileserver could be run under a different account than root, one could be completely confident it's not the culprit. As things are, I'm only 99% confident...
> 
> Best regards,
> 	Stephan
> > 
> > Thanks,
> > Jack Neely
> > 
> > -- 
> > Jack Neely <jjneely@ncsu.edu>
> > Linux Czar, OIT Campus Linux Services
> > Office of Information Technology, NC State University
> > GPG Fingerprint: 1917 5AC1 E828 9337 7AA4  EA6B 213B 765F 3B6A 5B89
> > _______________________________________________
> > OpenAFS-info mailing list
> > OpenAFS-info@openafs.org
> > https://lists.openafs.org/mailman/listinfo/openafs-info
> 
> -- 
> Stephan Wiesand
> DESY -DV-
> Platanenallee 6
> 15738 Zeuthen, Germany
> 

-- 
Jack Neely <jjneely@ncsu.edu>
Linux Czar, OIT Campus Linux Services
Office of Information Technology, NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4  EA6B 213B 765F 3B6A 5B89