[OpenAFS] fileserver crashes

Jeffrey Altman jaltman@columbia.edu
Wed, 20 Oct 2004 14:33:54 -0400


OK.  So the client has an additional bug.  At least the Windows 1.3.73
client is no longer crashing your file server. ;)

The best option would be for you to give me remote access to the volume
so I can try to reproduce the problem in a debugger.  The only
differences between the DEBUG and non-DEBUG builds are:

  * the DEBUG build is not optimized
  * the DEBUG build installs symbols by default

If you cannot provide remote access, you can increase the size of the
Trace Log buffers in the registry and then run the "fs trace -dump"
command to dump the trace once the problem starts to occur.  This
may or may not provide some clues.

The output of Sysinternals' filemon.exe would also provide some insight
into what Matlab is trying to do.
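
To see from the server side which files the client keeps re-fetching,
the FileLog excerpt quoted below can be tallied per host, RPC, and FID.
The following is only a minimal sketch in Python. It assumes the line
format shown in that excerpt, and the function names (unwrap,
count_calls) are made up for illustration; they are not part of
OpenAFS. It also assumes that only the entries carrying a "Host" field
should be counted, since FetchData appears to be logged twice per call.

```python
import re
from collections import Counter

# Assumed FileLog entry shape, taken from the excerpt in this thread:
#   "<timestamp> SRXAFS_FetchData, Fid = <vol>.<vnode>.<uniq>, Host <ip>, Id <n>"
CALL_RE = re.compile(
    r"(?P<rpc>SRXAFS_\w+|SAFS_\w+),\s+Fid = (?P<fid>\d+\.\d+\.\d+)"
    r"(?:, Host (?P<host>[\d.]+))?"
)
# A new log entry starts with a timestamp such as "Wed Oct 20 13:30:24 2004 ".
TIMESTAMP_RE = re.compile(
    r"^[A-Z][a-z]{2} [A-Z][a-z]{2} [ \d]\d \d\d:\d\d:\d\d \d{4} "
)


def unwrap(lines):
    """Strip mail quote markers and re-join entries the mailer wrapped."""
    records = []
    for raw in lines:
        line = re.sub(r"^(?:>\s?)+", "", raw.rstrip("\n")).strip()
        if not line or line == "---":
            continue
        if TIMESTAMP_RE.match(line) or not records:
            records.append(line)
        else:
            # Continuation of a wrapped entry: glue it to the previous one.
            records[-1] += " " + line
    return records


def count_calls(lines):
    """Count RPCs per (host, rpc, fid), using only entries that name a host."""
    counts = Counter()
    for record in unwrap(lines):
        m = CALL_RE.search(record)
        if m and m.group("host"):
            counts[(m.group("host"), m.group("rpc"), m.group("fid"))] += 1
    return counts
```

Run against a saved copy of the log, for example
count_calls(open(path)).most_common(10) with path pointing at wherever
your FileLog lives, a client stuck in a loop should show up as a small
set of FIDs with very large counts.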

Jeffrey Altman



John W. Sopko Jr. wrote:

> The user who was having the problem upgraded to OpenAFS 1.3.73 and is
> still having problems. He goes into his application (Matlab), which
> allows you to cd and list files from within the application. When he
> does this the application just hangs. I turned up the FileLog debugging
> and the system just streams requests like the following continuously
> until he kills the Matlab application:
> 
> ---
> Wed Oct 20 13:30:24 2004 SRXAFS_FetchData, Fid = 1769554818.372998.237859
> Wed Oct 20 13:30:24 2004 SRXAFS_FetchData, Fid = 1769554818.372998.237859, Host 152.2.128.179, Id 5269
> Wed Oct 20 13:30:24 2004 SRXAFS_FetchData returns 0
> Wed Oct 20 13:30:24 2004 SAFS_FetchStatus,  Fid = 1769554818.372996.237858, Host 152.2.128.179, Id 5269
> Wed Oct 20 13:30:24 2004 SAFS_FetchStatus returns 0
> Wed Oct 20 13:30:24 2004 SRXAFS_FetchData, Fid = 1769554818.372996.237858
> Wed Oct 20 13:30:24 2004 SRXAFS_FetchData, Fid = 1769554818.372996.237858, Host 152.2.128.179, Id 5269
> Wed Oct 20 13:30:24 2004 SRXAFS_FetchData returns 0
> Wed Oct 20 13:30:24 2004 SAFS_FetchStatus,  Fid = 1769554818.372994.237857, Host 152.2.128.179, Id 5269
> Wed Oct 20 13:30:24 2004 SAFS_FetchStatus returns 0
> Wed Oct 20 13:30:24 2004 SRXAFS_FetchData, Fid = 1769554818.372994.237857
> Wed Oct 20 13:30:24 2004 SRXAFS_FetchData, Fid = 1769554818.372994.237857, Host 152.2.128.179, Id 5269
> Wed Oct 20 13:30:24 2004 SRXAFS_FetchData returns 0
> Wed Oct 20 13:30:24 2004 SAFS_FetchStatus,  Fid = 1769554818.372992.237856, Host 152.2.128.179, Id 5269
> Wed Oct 20 13:30:24 2004 SAFS_FetchStatus returns 0
> Wed Oct 20 13:30:24 2004 SRXAFS_FetchData, Fid = 1769554818.372992.237856
> Wed Oct 20 13:30:24 2004 SRXAFS_FetchData, Fid = 1769554818.372992.237856, Host 152.2.128.179, Id 5269
> Wed Oct 20 13:30:24 2004 SRXAFS_FetchData returns 0
> ---
> 
> He has the same problem from 2 different Windows machines running
> 1.3.73. The directory he is listing contains 1777 files.
> We can see the files fine through a normal Windows Explorer-type window,
> and I can list the directory from a 1.2.11 Linux client without a problem.
> 
> I am more of a Unix/Linux guy, but I have involved our Windows folks.
> If you can tell me how to use the Windows debug client or get more
> information some other way, I will be happy to try it out.
> Thanks for your help.
> 
> John W. Sopko Jr. wrote:
> 
>> Derrick,
>>
>> Jeff Altman pointed me to a Windows beta client that I downloaded and
>> will have our user test. Is the patch you refer to below a Windows client
>> patch or a Linux server patch?
>>
>> Jeff,
>>
>> We are having an important site visit here the next 2 days and
>> will not be able to test out the windows beta client until Monday.
>>
>> Derrick J Brashear wrote:
>>
>>> The Windows client problem you reference below will be fixed in 1.3.72.
>>> However, the underlying filesystem issue is almost certainly the same
>>> one other people are having, and I can give you a patch, if you're
>>> willing to try it, which will not fix the issue but may help us track it.
>>>
>>> On Wed, 13 Oct 2004, John W. Sopko Jr. wrote:
>>>
>>>> Our Linux/AFS 1.2.11 file server has been hanging the last few weeks.
>>>> We have been upgrading machines to Windows XP SPII and OpenAFS 1.3.x
>>>> over the last month or so. Here is one issue I found that was causing
>>>> the problem:
>>>>
>>>> We have a user who uses a Windows application called Matlab for
>>>> generating and processing hundreds of files in AFS space from a Windows
>>>> XP machine. He was running the OpenAFS 1.2.x client. His machine was
>>>> upgraded to Service Pack II and OpenAFS 1.3.71. His Matlab application
>>>> hangs in Windows and our file server eventually melts down.
>>>>
>>>> I am not an expert at debugging AFS, so let me know if you want me to
>>>> try something. I cranked up the debug level on the FileLog to 25. I
>>>> could see his machine was constantly logging messages like this (the
>>>> user name really is debug):
>>>>
>>>> Wed Oct 13 12:15:47 2004 FindClient: authenticating connection: authClass=2
>>>> Wed Oct 13 12:15:47 2004 FindClient: rxkad conn: name=debug,inst=,cell=,exp=1097688735,kvno=8
