[OpenAFS] Re: Debugging a network performance problem that affects AFS

Harald Barth haba@kth.se
Fri, 26 Aug 2011 08:43:46 +0200 (CEST)

> In this particular case, I am running xstat_cm_test on the client,
> against that same client. So the kind of error you cite shouldn't
> exist. I suspect it's a problem with the building of the package. I
> need to reboot to Gentoo, and I'm pretty sure I've run xstat_cm_test
> there, though with different options.

I have now stumbled over the same problem. On all machines I have
tried (admittedly, they are all 64-bit Linux) I get:

# xstat_cm_test localhost 2 -onceonly -debug
RunTheTest: Allocating socket array for 1 Cache Manager(s)
Allocating 1 long(s) for coll ID
CollID at index 0 is 2

Starting up the xstat_cm service, debugging enabled, one-shot operation
[xstat_cm_Init] Asking for 1 collection(s): 2 
[xstat_cm_Init] Initializing Rx on port 0
[xstat_cm_Init] Rx initialized on port 0
[xstat_cm_Init] Probe LWP client security object created
[xstat_cm_Init] Copying in the following socket info:
[xstat_cm_Init] IP addr 0s, port 1231122960
[xstat_cm_Init] Host name for server index 0 is localhost
[xstat_cm_Init] Connecting to srv idx 0, IP addr, port 7001, service 1
[xstat_cm_Init] New connection at 0x14e8ee0
[xstat_cm_Init] Creating the probe LWP
[xstat_cm_Init] Probe LWP process structure located at 0x14fb670
[RunTheTest] Calling LWP_WaitProcess() on event 0x62f180
[xstat_cm_LWP] Waking up, getting data from 1 server(s)
[xstat_cm_LWP] Getting collections from Cache Manager 'localhost'
[xstat_cm_LWP] Connection OK, calling RXAFSCB_GetXStats
[xstat_cm_LWP] Asking for data collection 2
xstat_cm_LWP: Calling RXAFSCB_GetXStats, conn=0x14e8ee0, clientVersionNumber=2, collectionNumber=2, srvVersionNumberP=0x6203cfac, timeP=0x6311a4, dataP=0x6311b8
xstat_cm_LWP: [bufflen=2048, buffer at 0x62f1a0]
[xstat_cm_LWP] Calling handler routine.

** Data size mismatch in performance collection!** Expecting 1064, got 759
** Version mismatch with Cache Manager
[xstat_cm_LWP] Polling complete for probe round 1.
[xstat_cm_LWP] Signalling main process at 0x62f180
[RunTheTest] Returned from LWP_WaitProcess()

Yawn, main thread just woke up.  Cleaning things out...

This is against localhost, so this is not a version mismatch but a bug.
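
One way a size check like this could fail even when client and test
program come from the same build, offered only as an illustration and
not as a diagnosis: suppose the expected count is derived from the size
of a locally compiled structure, while the sender counts 32-bit words.
If one field is wider in the local layout (as a C "long" is on 64-bit
Linux), the two counts disagree. The record types and field names below
are hypothetical and are not the real OpenAFS statistics layout.

```python
import ctypes


class WireRecord(ctypes.Structure):
    """Hypothetical record as sent: every counter is one 32-bit word."""
    _fields_ = [
        ("calls", ctypes.c_int32),
        ("errors", ctypes.c_int32),
        ("bytes", ctypes.c_int32),
    ]


class Lp64Record(ctypes.Structure):
    """Same hypothetical record, but one counter is 64 bits wide,
    which is what a C 'long' would be on 64-bit Linux."""
    _fields_ = [
        ("calls", ctypes.c_int32),
        ("errors", ctypes.c_int32),
        ("bytes", ctypes.c_int64),
    ]


def expected_words(record_type):
    """Number of 32-bit words the receiver expects: sizeof(struct) / 4."""
    return ctypes.sizeof(record_type) >> 2


def check(received_words, record_type):
    """Compare the received word count with the locally expected one."""
    expected = expected_words(record_type)
    if received_words != expected:
        return "Expecting %d, got %d" % (expected, received_words)
    return "ok"


if __name__ == "__main__":
    sent = expected_words(WireRecord)   # what the sender would pack
    print(check(sent, WireRecord))      # layouts agree
    print(check(sent, Lp64Record))      # local layout is wider
```

Whether anything like this is behind the 1064 versus 759 above is an
open question; the sketch only shows that such a check can fail with no
version difference involved.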

Any ideas? I have not yet started searching around, for example by trying other OSes.


PS: Or is this still the same bug as in 2006?