[OpenAFS] OpenAFS for Windows 1.5.72, Windows 7, VPN session killing
Jeff Blaine
jblaine@kickflop.net
Sat, 13 Mar 2010 23:00:37 -0500
[ Composed over the course of the day ]
> It's the assumption that something must be wrong
> with KFW, OpenAFS or NetIdMgr and not with the Cisco software.
I wrote "might be" (in several different open-ended ways), and
you read "must be." I can't fix that, Jeffrey.
>> I'm sorry I don't know immediately and exactly where to
>> look for the cause of problems like you do. I wish I
>> knew everything about everything, but I don't, and you
>> don't.
>
> I don't expect that you would. What I would appreciate since
> you are requesting free assistance from the software authors
> and the user community is a bit of respect and consideration.
I don't see that anything I said was disrespectful or
inconsiderate in my report of the problem I was having.
If I was that way, unprovoked, I apologize.
>> I posted to kerberos@mit.edu with the initial screenshot
>> and query.
>
> Be aware that e-mails with screenshots do not arrive on
> the list. They are filtered. Text only on kerberos@mit.edu
I wasn't aware of that. Without at least an auto-reply,
that seems lame to me. Thanks for the information.
My first post to (or not to, as it were) kerberos@mit.edu with
the screenshot said:
    Cisco VPN is working great. As soon as KfW 3.2.2
    (with stock NIDmgr and also 2.0 NIDmgr from Secure
    Endpoints) tries to get creds, the VPN connection
    drops.
    I can repeat this at will.
    OpenAFS 1.5.72 for Windows
    Kerberos for Windows 3.2.2
    Windows 7 32-bit
    Has anyone else run into this?
    [ vpn-killed.jpg ]
I should have just resent it to openafs-info, but composed
a new message instead which left out the original details.
> For example, I still have no idea which Cisco VPN product
> you are using. Are you using Win7 64-bit or 32-bit? Which
> KFW distribution are you using? Is it one of my private builds
> (that are supposed to be for support customers only but that
> I don't protect the downloads of particularly well) or one of
> the official builds from MIT that have not had a bug fix applied
> in three years?
[ Edit: doesn't matter after all, see end of message ]
MIT KfW 3.2.2
Windows 7 32-bit
Cisco VPN 5.0.05.0290
Cisco VPN does not exist for 64-bit, and is essentially EOL'd.
> Which version of OpenAFS?
1.5.72 (subject)
>>> klist -c MSLSA:
>>>
>>> kdestroy -c MSLSA:
>>>
>>> ms2mit
>>>
>>> mit2ms
>>
>> Uninstalled OpenAFS + loopback adapter, Network ID Manager not
>> running.
>>
>> None of these commands (issued in the order above) bring the
>> VPN session down.
>>
>> kinit jblaine@RCF.OUR.ORG does, for whatever that's worth.
>
> It's worth a hell of a lot. Now you have narrowed down a minimal
> reproducible test case. The next question is "what is your ccache?"
> Is it the MSLSA or is it something like "API:jblaine@RCF.OUR.ORG"?
I've not set anything explicit anywhere, so it's whatever the
default is. How would I check from the cli tools?
The nid log said it was using API:, but I don't know if that
translates over to the KfW *cli* tools (which I've never touched
in my life before yesterday on Windows).
> If it is an API: ccache, does the problem occur if you use a FILE:
> ccache?
>
> SET KRB5CCNAME=FILE:C:\krb5cc
For the hell of it, without the solid answer to the previous
question, I gave this a shot and a kinit does still kill
the VPN session with KRB5CCNAME=FILE:C:\krb5cc
[ Edit: never mind, see below ]
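For readers following along: the cache names in this exchange (MSLSA:, API:jblaine@RCF.OUR.ORG, FILE:C:\krb5cc) all have the shape TYPE:residual. The following is only a minimal Python sketch of splitting such a name and of reading the KRB5CCNAME override; the helper names are hypothetical and are not part of KfW. It assumes that a one-letter prefix is a Windows drive letter rather than a cache type, and it does not try to determine the library's built-in default when KRB5CCNAME is unset.

```python
import os


def split_ccache_name(name):
    """Split a 'TYPE:residual' credential cache name at the first colon.

    Assumption of this sketch: a one-letter prefix such as 'C' is a
    Windows drive letter, so the whole string is treated as a FILE path.
    """
    prefix, sep, residual = name.partition(":")
    if not sep or (len(prefix) == 1 and prefix.isalpha()):
        return "FILE", name
    return prefix.upper(), residual


def effective_ccache_name(environ=None):
    """Return the KRB5CCNAME override, or None if it is not set.

    None means "whatever the library default is"; this sketch does not
    attempt to work out that default.
    """
    env = os.environ if environ is None else environ
    return env.get("KRB5CCNAME")


if __name__ == "__main__":
    for example in (r"FILE:C:\krb5cc", "API:jblaine@RCF.OUR.ORG", "MSLSA:"):
        print(example, "->", split_ccache_name(example))
    print("KRB5CCNAME override:", effective_ccache_name())
```

Passing a plain dict as `environ` makes it easy to try the lookup without touching the real environment.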
> If it doesn't, then the problem might have something to do with the
> RPC communication with the API: credential cache service. If it does,
> we can rule out any of the credential cache implementations and focus
> on the network traffic that is performed by the krb5_32.dll library
> as part of obtaining a TGT.
>
> Unfortunately, the only way to debug the krb5_32.dll library is to
> use a source code debugger. Attach a debugger to kinit.exe, set the
> command line to "jblaine@RCF.OUR.ORG" and step into the library and
> execute one function at a time until the connection drops. Then
> repeat the process by going one level further with each repetition
> until the Win32 call that is triggering the event is identified.
Would I be able to do this with Cygwin + gdb perhaps? I don't
own a dev environment for Windows. I've done it before a handful
of times with Solaris+Linux.
[ Edit: never mind, see below ]
> Another source of useful information would be to attach WireShark
> to the VPN connection and capture the traffic that is sent on the
> connection up until the connection drops. Cisco has experienced
> problems in the past with packet fragmentation of UDP packets. This
> could be a new instance of the problem.
Yes, you've helped me before with that. Thank you. I already
have RxMaxMTU set to 1300 (tried 1400, then 1300, and left it
there). 1400 worked with XP and the same VPN client previously
for me.
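To put numbers on the MTU values mentioned above, here is a rough Python sketch of the generic IPv4/UDP arithmetic. It is not a statement of how OpenAFS interprets RxMaxMTU internally, and the tunnel_overhead parameter is a hypothetical placeholder, since the actual encapsulation overhead of the VPN client is not known from this thread.

```python
IPV4_HEADER = 20  # bytes, minimum IPv4 header (no options)
UDP_HEADER = 8    # bytes


def max_udp_payload(mtu, tunnel_overhead=0):
    """Largest UDP payload that fits in one unfragmented IPv4 packet.

    tunnel_overhead is an assumed, caller-supplied number of bytes
    consumed by VPN encapsulation; it defaults to 0 (no tunnel).
    """
    return mtu - tunnel_overhead - IPV4_HEADER - UDP_HEADER


def needs_fragmentation(payload_len, mtu, tunnel_overhead=0):
    """True if a UDP payload of payload_len bytes exceeds a single packet."""
    return payload_len > max_udp_payload(mtu, tunnel_overhead)


if __name__ == "__main__":
    for mtu in (1400, 1300):
        print("MTU", mtu, "-> max UDP payload", max_udp_payload(mtu))
```

The point of the sketch is only that every byte of tunnel overhead comes straight out of the usable payload, which is why a lower value can help on a VPN link.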
More below...
> I am fairly sure though that you can rule out any issues with OpenAFS
> and NetIdMgr.
I installed Wireshark and had a look at the small portion of
network traffic before the VPN session was killed.
I *originally* thought the AS_REQ that *did* happen and
got logged before the VPN session was killed went to an
incorrect IP address. I saw the DNS queries just before the
AS_REQ, jumped the gun, and incorrectly thought, "Why is
it querying DNS to find the KDCs?"
Turns out, this misread was serendipitous.
As soon as I added the following to libdefaults in krb5.ini,
based on a completely bogus reading of the packets,
everything worked fine:
    dns_lookup_realm = no
    dns_lookup_kdc = no
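For anyone who wants to confirm that a given krb5.ini really carries both settings, here is a minimal Python sketch that reads only the [libdefaults] section. It is not a full krb5.ini parser: it ignores nested { } blocks, and it recognizes only a small set of "no" spellings. The SAMPLE text is illustrative; the default_realm line and the kdc hostname are assumptions, not taken from the actual file.

```python
NO_VALUES = {"no", "false", "off", "0"}  # subset recognized by this sketch

# Illustrative file contents; hostname and default_realm are hypothetical.
SAMPLE = """
[libdefaults]
    default_realm = RCF.OUR.ORG
    dns_lookup_realm = no
    dns_lookup_kdc = no

[realms]
    RCF.OUR.ORG = {
        kdc = kdc1.example.org
    }
"""


def read_libdefaults(text):
    """Collect key = value pairs found under [libdefaults]."""
    settings = {}
    in_libdefaults = False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            in_libdefaults = line[1:-1].strip().lower() == "libdefaults"
            continue
        if in_libdefaults and "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    return settings


def dns_lookups_disabled(settings):
    """True only if both DNS lookup options are explicitly turned off."""
    keys = ("dns_lookup_realm", "dns_lookup_kdc")
    return all(settings.get(k, "").lower() in NO_VALUES for k in keys)


if __name__ == "__main__":
    print(dns_lookups_disabled(read_libdefaults(SAMPLE)))
```

To check a real file, read its text and pass it to read_libdefaults() in place of SAMPLE.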
Looking back at the pcap more carefully, I noticed that all of
the DNS queries before the AS_REQ were for the proper KDCs (all 3),
and in fact the AS_REQ and AS_REP were exchanged with a proper KDC
for RCF.OUR.ORG.
So now I'm really confused. I re-ran both krb5.ini cases
(the old file, and the one with the lines added) and confirmed
that adding the 2 lines above saves my VPN sessions from being
killed, even though without them I was talking to the proper
KDCs fine (but the VPN session was dying).
Any ideas on that?