[OpenAFS] OpenAFS for Windows 1.5.72, Windows 7, VPN session killing

Jeff Blaine jblaine@kickflop.net
Sat, 13 Mar 2010 23:00:37 -0500


[ Composed over the course of the day ]

> It's the assumption that something must be wrong
> with KFW, OpenAFS or NetIdMgr and not with the Cisco software.

I wrote "might be" (in several different open-ended ways), and
you read "must be."  I can't fix that, Jeffrey.

>> I'm sorry I don't know immediately and exactly where to
>> look for the cause of problems like you do.  I wish I
>> knew everything about everything, but I don't, and you
>> don't.
>
> I don't expect that you would.  What I would appreciate since
> you are requesting free assistance from the software authors
> and the user community is a bit of respect and consideration.

I don't see that anything I said was disrespectful or
inconsiderate in my report of the problem I was having.

If I was that way, unprovoked, I apologize.

>> I posted to kerberos@mit.edu with the initial screenshot
>> and query.
>
> Be aware that e-mails with screenshots do not arrive on
> the list.  They are filtered.  Text only on kerberos@mit.edu

I wasn't aware of that.  Without at least an auto-reply,
that seems lame to me.  Thanks for the information.

My first post to (or not to, as it were) kerberos@mit.edu with
the screenshot said:

     Cisco VPN is working great.  As soon as KfW 3.2.2
     (with stock NIDmgr and also 2.0 NIDmgr from Secure
     Endpoints) tries to get creds, the VPN connection
     drops.

     I can repeat this at will.

     OpenAFS 1.5.72 for Windows
     Kerberos for Windows 3.2.2
     Windows 7 32-bit

     Has anyone else run into this?

     [ vpn-killed.jpg ]

I should have just resent it to openafs-info, but I composed
a new message instead, which left out the original details.

> For example, I still have no idea which Cisco VPN product
> you are using.  Are you using Win7 64-bit or 32-bit?  Which
> KFW distribution are you using?  Is it one of my private builds
> (that are supposed to be for support customers only but that
> I don't protect the downloads of particularly well) or one of
> the official builds from MIT that have not had a bug fix applied
> in three years?

[ Edit: doesn't matter after all, see end of message ]

MIT KfW 3.2.2
Windows 7 32-bit
Cisco VPN 5.0.05.0290
Cisco VPN does not exist for 64-bit, and is essentially EOL'd

> Which version of OpenAFS?

1.5.72 (subject)

>>>     klist -c MSLSA:
>>>
>>>     kdestroy -c MSLSA:
>>>
>>>     ms2mit
>>>
>>>     mit2ms
>>
>> Uninstalled OpenAFS + loopback adapter, Network ID Manager not
>> running.
>>
>> None of these commands (issued in the order above) bring the
>> VPN session down.
>>
>> kinit jblaine@RCF.OUR.ORG does, for whatever that's worth.
>
> It's worth a hell of a lot.  Now you have narrowed down a minimal
> reproducible test case.  The next question is "what is your ccache?"
> Is it the MSLSA or is it something like "API:jblaine@RCF.OUR.ORG"?

I've not set anything explicit anywhere, so it's whatever the
default is.  How would I check from the CLI tools?

The NIDmgr log said it was using API:, but I don't know if that
carries over to the KfW *CLI* tools (which, on Windows, I had
never touched in my life before yesterday).

> If it is an API: ccache, does the problem occur if you use a FILE:
> ccache?
>
>    SET KRB5CCNAME=FILE:C:\krb5cc

For the hell of it, without a solid answer to the previous
question, I gave this a shot.  A kinit still kills the VPN
session with KRB5CCNAME=FILE:C:\krb5cc.

[ Edit: never mind, see below ]
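
For anyone wanting to repeat the ccache test without changing
the shell's own environment, here is a minimal sketch in Python.
It is my own illustration, not part of KfW: the function names
are hypothetical, and it assumes kinit is on PATH and honors
KRB5CCNAME.

```python
import os
import subprocess


def ccache_env(ccache_name, base_env=None):
    """Return a copy of the environment with KRB5CCNAME set to ccache_name."""
    # Copy so the caller's environment (or os.environ) is left untouched.
    env = dict(os.environ if base_env is None else base_env)
    env["KRB5CCNAME"] = ccache_name
    return env


def run_kinit(principal, ccache_name):
    """Run kinit for principal against the given ccache and return its exit code.

    Assumption: a kinit executable is on PATH. It prompts for the
    password on the console as usual.
    """
    result = subprocess.run(["kinit", principal], env=ccache_env(ccache_name))
    return result.returncode
```

Calling run_kinit("jblaine@RCF.OUR.ORG", r"FILE:C:\krb5cc") would
then be the equivalent of the SET line followed by kinit, with
the override limited to that one child process.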

> If it doesn't, then the problem might have something to do with the
> RPC communication with the API: credential cache service.  If it does,
> we can rule out any of the credential cache implementations and focus
> on the network traffic that is performed by the krb5_32.dll library
> as part of obtaining a TGT.
>
> Unfortunately, the only way to debug the krb5_32.dll library is to
> use a source code debugger.  Attach a debugger to kinit.exe, set the
> command line to "jblaine@RCF.OUR.ORG" and step into the library and
> execute one function at a time until the connection drops.  Then
> repeat the process by going one level further with each repetition
> until the Win32 call that is triggering the event is identified.

Would I be able to do this with Cygwin + gdb perhaps?  I don't
own a dev environment for Windows.  I've done this kind of
debugging a handful of times before on Solaris and Linux.

[ Edit: never mind, see below ]

> Another source of useful information would be to attach WireShark
> to the VPN connection and capture the traffic that is sent on the
> connection up until the connection drops.  Cisco has experienced
> problems in the past with packet fragmentation of UDP packets.  This
> could be a new instance of the problem.

Yes, you've helped me before with that.  Thank you.  I already
have RxMaxMTU set to 1300 (tried 1400, then 1300, and left it
there).  1400 worked with XP and the same VPN client previously
for me.

More below...

> I am fairly sure though that you can rule out any issues with OpenAFS
> and NetIdMgr.

I installed Wireshark and had a look at the small portion of
network traffic before the VPN session was killed.

I *originally* thought the AS_REQ that *did* happen, and was
captured before the VPN session was killed, had gone to an
incorrect IP address.  I saw the DNS queries just before the
AS_REQ, jumped the gun, and incorrectly thought, "Why is
it querying DNS to find the KDCs?"

Turns out, this misread was serendipitous.

As soon as I added the following to libdefaults in krb5.ini,
based on a completely bogus reading of the packets,
everything worked fine:

     dns_lookup_realm = no
     dns_lookup_kdc = no
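
When comparing the two cases, it helps to confirm what a given
krb5.ini actually sets in [libdefaults].  The sketch below is a
simplified illustration in Python, not part of KfW.  The function
name and the sample file contents (default_realm and the kdc host
name included) are made up for the example, and the parser only
handles plain "key = value" lines.

```python
def libdefaults_settings(text):
    """Return the key/value pairs found in the [libdefaults] section."""
    settings = {}
    in_libdefaults = False
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blank lines and comment lines.
        if not line or line[0] in "#;":
            continue
        # A "[name]" line starts a new section.
        if line.startswith("[") and line.endswith("]"):
            in_libdefaults = line[1:-1].strip() == "libdefaults"
            continue
        if in_libdefaults and "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    return settings


# Hypothetical file contents; only the two dns_lookup lines come
# from this message.
SAMPLE = """\
[libdefaults]
    default_realm = RCF.OUR.ORG
    dns_lookup_realm = no
    dns_lookup_kdc = no

[realms]
    RCF.OUR.ORG = {
        kdc = kdc1.example.org
    }
"""

if __name__ == "__main__":
    found = libdefaults_settings(SAMPLE)
    for key in ("dns_lookup_realm", "dns_lookup_kdc"):
        print(key, "=", found.get(key, "(not set)"))
```

Reading the real file's text and passing it in place of SAMPLE
shows whether either DNS lookup option is set explicitly or left
to the library default.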

Looking back at the pcap more carefully, I noticed that all of
the DNS queries before the AS_REQ were for the proper KDCs (all
3), and in fact the AS_REQ and AS_REP were exchanged with a
proper KDC for RCF.OUR.ORG.

So now I'm really confused.  I re-ran both krb5.ini cases
(the old file, and the file with the lines added) and confirmed
that adding the 2 lines above saves my VPN sessions from being
killed, even though without them I was talking to the proper
KDCs fine (but the VPN session was dying).

Any ideas on that?