[OpenAFS-devel] pam_afs hangs when AFS DB server is down

Hans-Werner Paulsen hans@MPA-Garching.MPG.DE
Fri, 14 Jan 2005 14:43:14 +0100


Hello,
I am currently checking whether our users can still work when one
(but not all) of our AFS database servers is down.

The tests were made with Linux 2.4.28-rc1 and OpenAFS 1.2.13. To
simulate a database server that is down, I set one database server's
IP address to that of a machine on the local subnet that does not run
any AFS servers.

(1) /bin/login: takes some time, but works
(2) /usr/X11R6/bin/xdm: hangs forever

I wrote a small test program that makes PAM calls similar to those
xdm makes. The calls are:
        pam_start
        pam_authenticate
        pam_acct_mgmt
        pam_setcred
        pam_open_session
        fork
        parent:
                wait for child
        child:
                pam_setcred
                pam_end
This test program hangs in the child's "pam_setcred" call, just as
xdm does. With tcpdump I can see that no further packets are sent to
or from the machine running the program.
If I omit the "fork" call, the program succeeds in getting a token.
Of course this takes some time, because the program first tries to
connect to port 7004 of the substituted (non-AFS) DB machine.
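
The fork/wait part of the sequence above can be sketched roughly as
follows. This is only a minimal sketch, not the actual test program:
the PAM calls are replaced by a hypothetical callback (child_step_fn),
and, unlike the plain "wait for child" above, the parent bounds its
wait so that a real hang in the child can be told apart from a slow
timeout. The names run_in_child, CHILD_HUNG, step_ok and step_hang are
made up for illustration.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

#define CHILD_HUNG (-2)   /* hypothetical marker: child did not finish in time */

/* Hypothetical stand-in for the work done in the child.  In the real
 * test program this is where pam_setcred and pam_end are called. */
typedef int (*child_step_fn)(void);

/* Fork, run step() in the child, and wait at most timeout_sec seconds.
 * Returns the child's exit status, CHILD_HUNG if the child had to be
 * killed, or -1 on fork/wait failure or abnormal child termination. */
static int run_in_child(child_step_fn step, int timeout_sec)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0)
        _exit(step() & 0xff);   /* child: report the step's result */

    /* parent: poll every 100 ms instead of blocking forever in wait() */
    const struct timespec tick = { 0, 100 * 1000 * 1000 };
    for (int waited_ms = 0; waited_ms < timeout_sec * 1000; waited_ms += 100) {
        int status;
        pid_t r = waitpid(pid, &status, WNOHANG);
        if (r == pid)
            return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
        if (r < 0 && errno != EINTR)
            return -1;
        nanosleep(&tick, NULL);
    }

    kill(pid, SIGKILL);          /* child is considered hung */
    waitpid(pid, NULL, 0);       /* reap it so no zombie remains */
    return CHILD_HUNG;
}

/* Two trivial steps, for illustration only. */
static int step_ok(void)   { return 0; }
static int step_hang(void) { for (;;) pause(); }
```

Calling run_in_child(step_ok, 2) returns 0, while
run_in_child(step_hang, 2) returns CHILD_HUNG after about two seconds.
With the real pam_setcred/pam_end calls in the child, the timeout
would have to be longer than the time the non-forking variant needs
to get a token.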

Any ideas or help would be appreciated.
Hans-Werner

-- 
Hans-Werner Paulsen		hans@MPA-Garching.MPG.DE
MPI für Astrophysik		Tel 089-30000-2602
Karl-Schwarzschild-Str. 1	Fax 089-30000-2235	
D-85741 Garching