[OpenAFS-devel] Unable to release R/O volume -- 1 Volser: ReadVnodes: IH_CREATE: File exists - restore aborted

Michael Meffie mmeffie@sinenomine.net
Thu, 24 May 2018 11:58:21 -0400


On Thu, 24 May 2018 20:40:02 +1000
Ian Wienand <iwienand@redhat.com> wrote:

> Hello,
> 
> We lost the backing storage on our R/O server /vicepa sometime
> yesterday (it's cloud block storage out of our control, so it
> disappeared in an unknown manner).  Once things came back, we had
> volumes in a range of mostly locked states from updates and "vos
> release"s triggered by update cron jobs.
> 
> Quite a few I could manually unlock and re-release, and things went
> OK.  Others have proven more of a problem.
> 
> To cut things short, there was a lot of debugging, and we ended up
> with stuck transactions between the R/W and R/O server and
> un-unlockable volumes.  Eventually we rebooted both to clear out
> everything.  In an attempt to just clear the R/O mirrors and start
> again, I did for each problem volume:
> 
>  vos unlock $MIRROR
>  vos remove -server afs02.dfw.openstack.org -partition a -id $MIRROR.readonly
>  vos release -v $MIRROR
>  vos addsite -server afs02.dfw.openstack.org -partition a -id $MIRROR
> 
> My theory being this would completely remove the R/O mirror volume and
> start fresh.  I then proceeded to do a "vos release" on each volume in
> sequence (more details in [1]).
> 
> However, this release on the new R/O volume has not worked.  Here is
> the output from the release of one of the volumes:
> 
> ---
> Thu May 24 09:49:54 UTC 2018
> Kerberos initialization for service/afsadmin@OPENSTACK.ORG
> 
> mirror.ubuntu-ports
>     RWrite: 536871041     ROnly: 536871042
>     number of sites -> 3
>        server afs01.dfw.openstack.org partition /vicepa RW Site
>        server afs01.dfw.openstack.org partition /vicepa RO Site
>        server afs02.dfw.openstack.org partition /vicepa RO Site  -- Not released
> This is a complete release of volume 536871041
> There are new RO sites; we will try to only release to new sites
> Querying old RO sites for update times... done
> RW vol has not changed; only releasing to new RO sites
> Starting transaction on cloned volume 536871042... done
> Creating new volume 536871042 on replication site afs02.dfw.openstack.org:  done
> This will be a full dump: read-only volume needs be created for new site
> Starting ForwardMulti from 536871042 to 536871042 on afs02.dfw.openstack.org (entire volume).
> Release failed: VOLSER: Problems encountered in doing the dump !
> The volume 536871041 could not be released to the following 1 sites:
>                     afs02.dfw.openstack.org /vicepa
> VOLSER: release could not be completed
> Error in vos release command.
> VOLSER: release could not be completed
> Thu May 24 09:51:49 UTC 2018
> ---
> 
> It triggers a salvage of what I presume is the only partially cloned
> volume, which logs
> 
> ---
> 05/24/2018 09:51:49 dispatching child to salvage volume 536871041...
> 05/24/2018 09:51:49 namei_ListAFSSubDirs: warning: VG 536871042 does not have a link table; salvager will recreate it.
> 05/24/2018 09:51:49 fileserver requested salvage of clone 536871042; scheduling salvage of volume group 536871041...
> 05/24/2018 09:51:49 VReadVolumeDiskHeader: Couldn't open header for volume 536871041 (errno 2).
> 05/24/2018 09:51:49 2 nVolumesInInodeFile 64 
> 05/24/2018 09:51:49 CHECKING CLONED VOLUME 536871042.
> 05/24/2018 09:51:49 mirror.ubuntu-ports.readonly (536871042) updated 05/24/2018 06:08
> 05/24/2018 09:51:49 totalInodes 32896
> ---
> 
> On the R/O server side (afs02) we have
> 
> ---
> Thu May 24 09:49:55 2018 VReadVolumeDiskHeader: Couldn't open header for volume 536871042 (errno 2).
> Thu May 24 09:49:55 2018 attach2: forcing vol 536871042 to error state (state 0 flags 0x0 ec 103)
> Thu May 24 09:49:55 2018 1 Volser: CreateVolume: volume 536871042 (mirror.ubuntu-ports.readonly) created
> Thu May 24 09:51:49 2018 1 Volser: ReadVnodes: IH_CREATE: File exists - restore aborted
> Thu May 24 09:51:49 2018 Scheduling salvage for volume 536871042 on part /vicepa over FSSYNC
> ---
> 
> I do not see anything on the R/W server side (afs01).
> 
> I have fsck'd the /vicepa partition on the RO server (afs02) and it is
> OK.
> 
> I cannot find much info on "IH_CREATE: File exists" which I assume is
> the problem here.

Yes, there seem to be files left over. For that parent volume number (536871041),
the leftover files would be in the path /vicep*/AFSIDat/=0/=0++U
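
For reference, that directory name can be derived from the volume ID. Here is
a minimal Python sketch, assuming the namei backend's base-64 alphabet and
least-significant-digit-first ordering as I understand them. The function
names are made up for illustration and are not part of OpenAFS.

```python
# Assumed namei base-64 alphabet: '+', '=', digits, upper case, lower case.
NAMEI_ALPHABET = "+=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"


def int_to_namei_b64(value: int) -> str:
    """Encode a 32-bit integer 6 bits at a time, least significant digit first."""
    value &= 0xFFFFFFFF
    if value == 0:
        return NAMEI_ALPHABET[0]
    digits = []
    while value:
        digits.append(NAMEI_ALPHABET[value & 0x3F])
        value >>= 6
    return "".join(digits)


def namei_volume_group_dir(parent_volume_id: int) -> str:
    """Return the directory (relative to /vicepX) holding a volume group's files.

    The first level is the encoding of the low 8 bits of the parent (RW)
    volume ID; the second level is the encoding of the full ID.
    """
    first = int_to_namei_b64(parent_volume_id & 0xFF)
    second = int_to_namei_b64(parent_volume_id)
    return "AFSIDat/{}/{}".format(first, second)


print(namei_volume_group_dir(536871041))  # AFSIDat/=0/=0++U
```

The result matches the path given above, so the same function should point at
the right directory for the other affected volumes.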

> I would welcome any suggestions!  Clearly my theory
> of "vos remove" and "vos add" of the mirror hasn't cleared out enough
> state to recover things?

A full partition salvage on the RO server should remove the orphaned files:

  bos salvage -server afs02 -partition a -showlog -orphans attach -forceDAFS

> 
> All servers are Xenial-based with its current 1.6.7-1ubuntu1.1
> openafs packages.
> 
> Thanks,
> 
> -i
> 
> [1] http://lists.openstack.org/pipermail/openstack-infra/2018-May/005949.html


-- 
Michael Meffie <mmeffie@sinenomine.net>