[OpenAFS-devel] 1.8.11pre1 client hanging on Linux 6.7

Michael Laß lass@mail.upb.de
Wed, 31 Jan 2024 17:32:56 +0100


Thank you Jeffrey for the detailed summary!

I finished bisecting the changes between Linux 6.6 and 6.7 and by now I
think we were hunting multiple issues here.


1. A regression in Linux 6.7:

Bisecting led me to dc32bff195b45e8571c442954beee259e9500dac
("iov_iter, net: Fold in csum_and_memcpy()") as the first bad
commit. With this change, my client cannot talk to my test cell at all;
the cell runs in a VM on the same system. I think I spotted the mistake
in that change and have just proposed a fix on the netdev mailing list:

https://lore.kernel.org/netdev/20240131155220.82641-1-bevan@bi-co.net/T/#u

With this patch applied on top of v6.7.2, access to my test cell works
fine again. Note that this is likely not related to any MTU
restrictions, as the traffic does not leave my home network.


2. Lost packets and significant delays due to MTU restrictions:

As Jeffrey explained, the MTU of my IPv4 connection is reduced to
1460 due to the tunneling over IPv6. When accessing a public cell with
default settings, large reply packets are lost on their way. At some
point in time (and for unknown reasons) the packets start to arrive in
fragments. From there on, the connection works fine, although likely
not optimally due to fragmentation overhead.

I think that this problem already affected me with earlier kernel
versions, as an initial access always took quite a while. I can only
assume that problem no. 1 additionally influenced my tests with public
cells and made things even worse.

This issue can easily be fixed by passing `-rxmaxmtu 1404` to afsd.
Knowing about my internet connection, I will use this flag in future.
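
For reference, the value 1404 is consistent with subtracting the
per-packet headers from the 1460-byte path MTU. The sketch below is
only an illustration of that arithmetic; the header sizes (20-byte IPv4
header without options, 8-byte UDP header, 28-byte Rx header) and the
helper name rx_max_mtu are assumptions and are not taken from this
thread.

```python
# Header sizes in bytes. These are assumptions about how 1404 could be
# derived; the message itself does not spell out the calculation.
IPV4_HEADER = 20  # IPv4 header without options
UDP_HEADER = 8    # UDP header
RX_HEADER = 28    # Rx protocol header


def rx_max_mtu(path_mtu: int) -> int:
    """Return the largest Rx payload that fits in one unfragmented
    IPv4/UDP packet for the given path MTU (hypothetical helper)."""
    return path_mtu - IPV4_HEADER - UDP_HEADER - RX_HEADER


if __name__ == "__main__":
    # Path MTU of the IPv4-over-IPv6 tunnel described above.
    print(rx_max_mtu(1460))  # 1404
```

Under the same assumptions, a different tunnel overhead would call for
a different value, so the number should be derived from the actual path
MTU rather than copied.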


I will continue testing for a bit to see if there are any remaining
issues.

Best regards,
Michael