[OpenAFS] vos release running unexpectedly long
Benjamin Kaduk
kaduk@mit.edu
Sun, 1 Sep 2019 15:59:27 -0500
On Fri, Aug 30, 2019 at 12:35:09PM +1000, Ian Wienand wrote:
> Hello,
>
> I'm struggling to find an angle to debug very long "vos release" times
> with some of our volumes. e.g. some runtimes for the "vos release" of
> a fedora mirror volume from [1]
>
> 2019-08-22 08:46:58 - 2019-08-22 14:50:31
> 2019-08-23 16:45:09 - 2019-08-24 00:27:14
> 2019-08-23 00:44:56 - 2019-08-23 08:12:44
>
> Now a little bit changes between each mirror run, but not enough for
> 8+hours of incremental copying. Extracting for example the first run
> logs:
>
> ---
> 2019-08-22T08:46:58,270313531+00:00
> Kerberos initialization for service/fedora-mirror@OPENSTACK.ORG
> + echo 'rsync completed successfully, running vos release.'
> rsync completed successfully, running vos release.
> + k5start -t -f /etc/afsadmin.keytab service/afsadmin -- vos release -v mirror.fedora
>
> mirror.fedora
> RWrite: 536871006 ROnly: 536871007
> number of sites -> 3
> server afs01.dfw.openstack.org partition /vicepa RW Site
> server afs01.dfw.openstack.org partition /vicepa RO Site
> server afs02.dfw.openstack.org partition /vicepa RO Site
> This is a complete release of volume 536871006
> Re-cloning permanent RO volume 536871007 ... done
> Getting status of parent volume 536871006... done
> Starting transaction on RO clone volume 536871007... done
> Setting volume flags for volume 536871007... done
> Ending transaction on volume 536871007... done
> Replacing VLDB entry for mirror.fedora... done
> Starting transaction on cloned volume 536871007... done
> Updating existing ro volume 536871007 on afs02.dfw.openstack.org ...
> Starting ForwardMulti from 536871007 to 536871007 on afs02.dfw.openstack.org (as of Thu Aug 22 02:44:19 2019).
> updating VLDB ... done
> Released volume mirror.fedora successfully
> Kerberos initialization for service/afsadmin@OPENSTACK.ORG
> + date --iso-8601=ns
> 2019-08-22T14:50:31,918754893+00:00
> ---
>
> It seems to me that it must be choosing to remirror the whole RO
> volume (~450gb) rather than just taking incrementals.
I'm not super-familiar with the volserver code (and the logging
infrastructure seems kind of primitive), so I can't provide much insight
here.
> I can see nothing of interest in the VolSer; we just start seeing:
>
> Thu Aug 22 08:52:21 2019 trans 45925 on volume 536871007 is older than 300 seconds
> Thu Aug 22 08:52:51 2019 trans 45925 on volume 536871007 is older than 330 seconds
>
> which lines up time-wise. There's nothing wrong with the write speeds
> to vicepa on the server
>
> root@afs02:/vicepa# dd if=/dev/zero of=./test.file bs=1G count=1 oflag=dsync
> 1+0 records in
> 1+0 records out
> 1073741824 bytes (1.1 GB, 1.0 GiB) copied, 6.5336 s, 164 MB/s
>
> The only thing that makes this perhaps odd is a client/server version
> mismatch; the client is running 1.8.3-1 (built from upstream) and the
> server is standard Xenial 1.6.15-1ubuntu1-debian
I don't think that this kind of version mismatch is going to be relevant,
though; this is to large extent "core behavior" that shouldn't be changing
between releases.
-Ben
> It's unfortunately hard to say how long this has been going on; at
> least since we started exporting the update logs with [1]. [2] is the
> latest logs I could find from July on an older server (which *was*
> using the openafs 1.6.3 client) which looks very similar to me.
>
> Any and all suggestions welcome!
>
> -i
>
> [1] http://files.openstack.org/mirror/logs/rsync-mirrors/fedora*
> [2] http://paste.openstack.org/show/767468/
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info