[OpenAFS-devel] Unable to release R/O volume -- 1 Volser: ReadVnodes: IH_CREATE: File exists - restore aborted

Ian Wienand iwienand@redhat.com
Mon, 28 May 2018 10:16:17 +1000


Thank you for the response

On 05/25/2018 01:58 AM, Michael Meffie wrote:
>> I can not find much info on "IH_CREATE: File exists" which I assume is
>> the problem here.
> 
> Yes, there seems to be files left over. For that parent volume number (536871041)
> the left over files would be in the path /vicep*/AFSIDat/=0/=0++U

We (with the help of auristor) came to a similar conclusion for
another of our corrupted volumes by examining strace output.  How did
you calculate that hash from the volume number to the AFSIDat/ path?
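
For what it's worth, here is our best reconstruction of the mapping
from reading src/vol/namei_ops.c; the base-64 table and the
least-significant-digit-first "flip" encoding below are just our
reading of that code, so corrections welcome:

  #include <stdio.h>

  /* base-64 alphabet as we read it out of namei_ops.c */
  static const char tbl[] =
      "+=0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";

  /* encode 6 bits at a time, least significant digit first */
  static void flipbase64(unsigned int v, char *out)
  {
      do {
          *out++ = tbl[v & 0x3f];
          v >>= 6;
      } while (v != 0);
      *out = '\0';
  }

  int main(void)
  {
      char dir1[8], dir2[8];
      unsigned int vid = 536871041;   /* the RW parent volume id */

      flipbase64(vid & 0xff, dir1);   /* first-level hash directory */
      flipbase64(vid, dir2);          /* per-volume directory */
      printf("/vicepa/AFSIDat/%s/%s\n", dir1, dir2);
      return 0;
  }

which prints /vicepa/AFSIDat/=0/=0++U for that volume id, matching
the path you gave above.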

We ended up rm-ing the directory, and the release of that volume
worked.
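
For the record, the recovery sequence was essentially the following;
the path and volume name here are illustrative (taken from the strace
output below), as the volume we actually recovered had its own
AFSIDat/ directory:

 # rm -rf /vicepa/AFSIDat/=0/=0++U
 # vos release mirror.ubuntu-ports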

> A full partition salvage on the ro server should remove the orphaned files,
> 
>    bos salvage -server afs02 -partition a -showlog -orphans attach -forceDAFS

I ran this as suggested, but it did not seem to find these orphaned
files.

Here is the salvage log for:

 # bos salvage -server localhost -localauth -partition a -showlog -orphans attach -forceDAFS -volume mirror.ubuntu-ports.readonly

 05/24/2018 23:36:52 dispatching child to salvage volume 536871041...
 05/24/2018 23:36:52 VReadVolumeDiskHeader: Couldn't open header for volume 536871041 (errno 2).
 05/24/2018 23:36:52 2 nVolumesInInodeFile 64 
 05/24/2018 23:36:52 CHECKING CLONED VOLUME 536871042.
 05/24/2018 23:36:52 mirror.ubuntu-ports.readonly (536871042) updated 05/24/2018 06:08
 05/24/2018 23:36:52 totalInodes 32894

It doesn't seem to mention any orphaned files ... it looks for
/vicepa/V0536871041.vol, which isn't there:

 16923 open("/vicepa/V0536871041.vol", O_RDONLY) = -1 ENOENT (No such file or directory)
 16923 gettimeofday({1527205271, 228504}, NULL) = 0
 16923 stat("/etc/localtime", {st_mode=S_IFREG|0644, st_size=118, ...}) = 0
 16923 write(7, "05/24/2018 23:41:11 VReadVolumeD"..., 96) = 96

And then, as you say, it starts looking at /vicepa/AFSIDat/=0/=0++U:

 16923 openat(AT_FDCWD, "/vicepa", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 10
 16923 getdents(10, /* 29 entries */, 32768) = 1088
 16923 getdents(10, /* 0 entries */, 32768) = 0
 16923 close(10)                         = 0
 16923 open("/vicepa/salvage.inodes.vicepa.16923", O_RDWR|O_CREAT|O_TRUNC, 0666) = 10
 16923 unlink("/vicepa/salvage.inodes.vicepa.16923") = 0
 16923 stat("/vicepa", {st_mode=S_IFDIR|0755, st_size=4096, ...}) = 0
 16923 openat(AT_FDCWD, "/vicepa/AFSIDat/=0/=0++U/special", O_RDONLY|O_NONBLOCK|O_DIRECTORY|O_CLOEXEC) = 11
 16923 getdents(11, /* 6 entries */, 32768) = 176
 16923 stat("/vicepa/AFSIDat/=0/=0++U/special/zzzzD66+++0", {st_mode=S_IFREG|03, st_size=8205056, ...}) = 0
 16923 fstat(10, {st_mode=S_IFREG|0644, st_size=0, ...}) = 0
 16923 mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f89ff67b000
 16923 stat("/vicepa/AFSIDat/=0/=0++U/special/zzzzP26+++0", {st_mode=S_IFREG|06, st_size=375950, ...}) = 0
 16923 open("/vicepa/AFSIDat/=0/=0++U/special/zzzzP26+++0", O_RDWR) = 12
 (and so on)
 ...

Is it "safe" -- in the context of recovery of a RO mirror from an
uncontrolled storage loss which is not responding to salvage attempts
-- to simply remove the AFSIDat/... directory for the volume and let
the "vos release" process re-create everything?  Is that private for
that volume; i.e. removing it won't make thing worse for *other*
volumes at least?

-i