[OpenAFS] Re: vos release running unexpectedly long
Ian Wienand
iwienand@redhat.com
Wed, 11 Sep 2019 10:38:26 +1000
On Fri, Aug 30, 2019 at 12:35:09PM +1000, Ian Wienand wrote:
> I'm struggling to find an angle to debug very long "vos release" times
> with some of our volumes.
So some more details ... I've managed to take out any write updates
from the equation, but the volume with no updates still takes quite a
long time to release.
I first ran a release on the volume:
...
Released volume mirror.fedora successfully
root@afs01:~# date
Tue Sep 10 23:19:37 UTC 2019
I then quickly started a new release; nothing had written to this
volume in the mean time (the only server that could is shutdown):
root@afs01:~# date
Tue Sep 10 23:24:45 UTC 2019
root@afs01:~# vos release -localauth -v mirror.fedora
So this should be a zero-delta update. However, this took ~ 30
minutes:
Released volume mirror.fedora successfully
root@afs01:~# date
Tue Sep 10 23:55:27 UTC 2019
The place it spends time is
Starting ForwardMulti from 536871007 to 536871007 on afs02.dfw.openstack.org (as of Mon Sep 9 06:23:11 2019).
I had rxdebug and netstat running in a loop on afs02 while this was
happening. The numbers seem to add up that about 50gb of data is
transferred as part of the (theoretically zero-delta) release.
start of vos release netstat
1969282994 packets received
InOctets: 2855172400594
end of vos release netstat
2003101829 packets received
InOctets: 2904920938101
final rxdebug report for thread before it disappeared:
Received 157738240 bytes in 33476256 packets
(it seems the received stats rolls over at 4gb? but packet count is accurate)
Total octets recevied over period:
2904920938101 - 2855172400594 = 49748537507 (49.74gb)
Total packets recevied over period:
2003101829 - 1969282994 = 33818835
The approx packet size is sane for mtu
49748537507 / 33818835 = 1471.03049253
Thus calculating the percentage of total packets _not_ accounted for
by rxdebug:
(33818835 - 33476256) / 33818835 * 100 ~= 1%
So it lines up that 99% of all of the system traffic seen via netstat
over the release period is accounted for by the update thread reported
by rxdebug.
Ergo the "vos release" of the volume with no changes has resulted in
about 50gb of data being sent to the R/O mirror, and consequently
long release times.
I do have the file audit logs for this period, although I'm not sure
what to look for exactly.
-i