[OpenAFS] vos release running unexpectedly long

Benjamin Kaduk kaduk@mit.edu
Sun, 1 Sep 2019 15:59:27 -0500


On Fri, Aug 30, 2019 at 12:35:09PM +1000, Ian Wienand wrote:
> Hello,
> 
> I'm struggling to find an angle to debug very long "vos release" times
> with some of our volumes.  e.g. some runtimes for the "vos release" of
> a fedora mirror volume from [1]
> 
>  2019-08-22 08:46:58 - 2019-08-22 14:50:31
>  2019-08-23 16:45:09 - 2019-08-24 00:27:14
>  2019-08-23 00:44:56 - 2019-08-23 08:12:44
> 
> Now a little bit changes between each mirror run, but not enough for
> 8+hours of incremental copying.  Extracting for example the first run
> logs:
> 
> ---
>  2019-08-22T08:46:58,270313531+00:00
>  Kerberos initialization for service/fedora-mirror@OPENSTACK.ORG
>  + echo 'rsync completed successfully, running vos release.'
>  rsync completed successfully, running vos release.
>  + k5start -t -f /etc/afsadmin.keytab service/afsadmin -- vos release -v mirror.fedora
> 
>  mirror.fedora 
>      RWrite: 536871006     ROnly: 536871007 
>      number of sites -> 3
>         server afs01.dfw.openstack.org partition /vicepa RW Site 
>         server afs01.dfw.openstack.org partition /vicepa RO Site 
>         server afs02.dfw.openstack.org partition /vicepa RO Site 
>  This is a complete release of volume 536871006
>  Re-cloning permanent RO volume 536871007 ... done
>  Getting status of parent volume 536871006... done
>  Starting transaction on RO clone volume 536871007... done
>  Setting volume flags for volume 536871007... done
>  Ending transaction on volume 536871007... done
>  Replacing VLDB entry for mirror.fedora... done
>  Starting transaction on cloned volume 536871007... done
>  Updating existing ro volume 536871007 on afs02.dfw.openstack.org ...
>  Starting ForwardMulti from 536871007 to 536871007 on afs02.dfw.openstack.org (as of Thu Aug 22 02:44:19 2019).
>  updating VLDB ... done
>  Released volume mirror.fedora successfully
>  Kerberos initialization for service/afsadmin@OPENSTACK.ORG
>  + date --iso-8601=ns
>  2019-08-22T14:50:31,918754893+00:00
> ---
> 
> It seems to me that it must be choosing to remirror the whole RO
> volume (~450gb) rather than just taking incrementals.


I'm not super-familiar with the volserver code (and the logging
infrastructure seems kind of primitive), so I can't provide much insight
here.

> I can see nothing of interest in the VolSer; we just start seeing:
> 
>  Thu Aug 22 08:52:21 2019 trans 45925 on volume 536871007 is older than 300 seconds
>  Thu Aug 22 08:52:51 2019 trans 45925 on volume 536871007 is older than 330 seconds
> 
> which lines up time-wise.  There's nothing wrong with the write speeds
> to vicepa on the server
> 
>  root@afs02:/vicepa# dd if=/dev/zero of=./test.file bs=1G count=1 oflag=dsync
>  1+0 records in
>  1+0 records out
>  1073741824 bytes (1.1 GB, 1.0 GiB) copied, 6.5336 s, 164 MB/s
> 
> The only thing that makes this perhaps odd is a client/server version
> mismatch; the client is running 1.8.3-1 (built from upstream) and the
> server is standard Xenial 1.6.15-1ubuntu1-debian

I don't think that this kind of version mismatch is going to be relevant,
though; this is to large extent "core behavior" that shouldn't be changing
between releases.

-Ben

> It's unfortunately hard to say how long this has been going on; at
> least since we started exporting the update logs with [1].  [2] is the
> latest logs I could find from July on an older server (which *was*
> using the openafs 1.6.3 client) which looks very similar to me.
> 
> Any and all suggestions welcome!
> 
> -i
> 
> [1] http://files.openstack.org/mirror/logs/rsync-mirrors/fedora*
> [2] http://paste.openstack.org/show/767468/
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info