[OpenAFS-devel] 1.8.11pre1 client hanging on Linux 6.7

Michael Laß lass@mail.upb.de
Fri, 26 Jan 2024 19:53:46 +0100


I captured the following traces and will comment inline on what I could
find:


Starting with a client running on Linux 6.6.13, trying to access
/afs/desy.de:
fstrace: https://homepages.upb.de/lass/openafs/6.6.13.fstrace
pcapng:  https://homepages.upb.de/lass/openafs/6.6.13.pcapng

The packet trace (pcapng, can be opened with Wireshark) shows that the
reply to fetch-data-64 (i.e., the directory listing) arrives in
fragments (e.g., frames 127+128). Nevertheless, the reception of the
packet is acknowledged in frame 131. In the end, everything works fine.


Running the same scenario on Linux 6.7:
fstrace: https://homepages.upb.de/lass/openafs/6.7_default-mtu_default-rxmaxmtu.fstrace
pcapng:  https://homepages.upb.de/lass/openafs/6.7_default-mtu_default-rxmaxmtu.pcapng

The receiving side looks very similar: we still receive the reply to
fetch-data-64 in fragments (frames 127+128, 129+130, etc.). However,
the client never acknowledges the reception, and the getdents64
syscall hangs forever.


Reducing the maximum RX MTU via -rxmaxmtu 1400 on Linux 6.7:
fstrace: https://homepages.upb.de/lass/openafs/6.7_default-mtu_rxmtu-1400.fstrace
pcapng:  https://homepages.upb.de/lass/openafs/6.7_default-mtu_rxmaxmtu-1400.pcapng

The reply to fetch-data-64 is not fragmented anymore because the RX
packets are sufficiently small (frames 149-152). The reception is ACK'd
in frame 154.


It could be that the larger UDP packets are fragmented by my provider,
as my IPv4 connection is realized via DS-Lite (a carrier-grade NAT
[1][2]), which may reduce the MTU. This fragmentation may be key to
reproducing this issue.
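
To illustrate the arithmetic behind this guess, here is a rough Python
sketch. It assumes that DS-Lite wraps each IPv4 packet in a 40-byte IPv6
header, so a 1500-byte link leaves a 1460-byte IPv4 path MTU. The
function name fragment_count and the 1444-byte UDP payload are
hypothetical and not taken from the traces; how -rxmaxmtu maps to the
UDP payload size is also simplified here by treating 1400 as the IPv4
packet size.

```python
import math

IPV4_HEADER = 20   # bytes, IPv4 header without options
UDP_HEADER = 8     # bytes
IPV6_HEADER = 40   # bytes, assumed DS-Lite (IPv4-in-IPv6) overhead


def fragment_count(udp_payload: int, path_mtu: int) -> int:
    """Estimate how many IPv4 fragments one UDP datagram needs."""
    ip_payload = udp_payload + UDP_HEADER
    # The last fragment may fill the MTU completely.
    last_capacity = path_mtu - IPV4_HEADER
    # All other fragments must carry a multiple of 8 bytes.
    other_capacity = last_capacity // 8 * 8
    remaining = max(0, ip_payload - last_capacity)
    return 1 + math.ceil(remaining / other_capacity)


tunnel_mtu = 1500 - IPV6_HEADER  # assumed IPv4 MTU inside the tunnel

# Hypothetical large RX packet: fits a plain 1500-byte link, but not the tunnel.
print(fragment_count(1444, 1500))        # 1
print(fragment_count(1444, tunnel_mtu))  # 2

# A packet capped at 1400 bytes of IPv4 (1400 - 20 - 8 bytes of UDP payload).
print(fragment_count(1400 - IPV4_HEADER - UDP_HEADER, tunnel_mtu))  # 1
```

Under these assumptions, a packet sized for a 1500-byte link is split in
two inside the tunnel, while one capped at 1400 bytes passes through
unfragmented, which would match the behaviour seen with -rxmaxmtu 1400.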

Still, it worked fine with Linux 6.6, even when receiving fragmented
responses, and it is not working anymore with Linux 6.7. I may start
bisecting the Linux kernel changes between 6.6 and 6.7, but I fear that
this will take weeks...

Best regards,
Michael


[1] https://en.wikipedia.org/wiki/Carrier-grade_NAT
[2] https://en.wikipedia.org/wiki/IPv6_transition_mechanism#Dual-Stack_Lite_(DS-Lite)