[OpenAFS] find /afs/ breaking the client?

Jakub Witkowski jpw@wszib.edu.pl
Wed, 07 Feb 2007 11:11:28 +0100


On 04-02-2007 (Sun) at 10:21 -0500, Derrick J Brashear wrote:
> On Sun, 4 Feb 2007, Jakub Witkowski wrote:
>
> >> Well, we haven't recommended 1.5.14, so I'm curious why you chose it, but
> >> do you have an oops?
> >>
> >
> > No, no oops. The system just... blocks. You can interact with programs
> > already in memory and access open files, but you cannot open new ones.
> >
> > I chose .14 mostly because I was having problems building the module for
> > the Xen kernel, and this version was simply the first one I got to
> > compile. I may fall back to something more stable now that I know how
> > to get things running.
> > Which OpenAFS version do you recommend for installation on a client? On a
> > server?
>
> For Linux, we haven't recommended any 1.5.x client. 1.4.2, generally,
> though 1.4.3rc2 should be out in a day or so.
>
> If you can get cmdebug information when it's hung, that'd be useful to
> see.

I have done some experiments, and my findings are not exactly optimistic.
First, I found out that the hang was actually caused by some odd
interaction between the OpenAFS client and the libnss-ldap library. In a
test environment I can reproduce the system-wide hang described above
when nsswitch is configured to look up uids in LDAP; when it is not, only
the find process hangs, and then only for a few minutes. Adding the
-fakestat-all switch makes the problem less pronounced (i.e. find lists
more files before stalling) but does not make it go away.

On the other hand, 1.4.2 appears to be free of this problem; at least, I
have not yet found a way to crash or hang it.

1.4.3rc1 has a bug:

Unable to handle kernel paging request at ffff880040000000 RIP:
 [<ffffffff880d3462>] :libafs:InstallUVolumeEntry+0x162/0x480
PGD e74067 PUD 1076067 PMD 1077067 PTE 0
Oops: 0000 [1] SMP
CPU 0
Modules linked in: libafs xt_tcpudp ip6table_filter ip6_tables xt_state
xt_pkttype iptable_raw xt_CLASSIFY xt_CONNMARK xt_connmark xt_policy
xt_multiport xt_conntrack iptable_mangle ipt_ULOG ipt_TTL ipt_ttl
ipt_TOS ipt_tos ipt_TCPMSS ipt_SAME ipt_REJECT ipt_REDIRECT ipt_recent
ipt_owner ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_iprange ipt_hashlimit
ipt_ECN ipt_ecn ipt_DSCP ipt_dscp ipt_CLUSTERIP ipt_ah ipt_addrtype
ip_nat_irc ip_nat_tftp ip_nat_ftp ip_conntrack_irc ip_conntrack_tftp
ip_conntrack_ftp iptable_nat ip_nat ip_conntrack nfnetlink
iptable_filter ip_tables x_tables xenbus_be xenblk
Pid: 11703, comm: find Tainted: P      2.6.18-xenU5 #6
RIP: e030:[<ffffffff880d3462>]
[<ffffffff880d3462>] :libafs:InstallUVolumeEntry+0x162/0x480
RSP: e02b:ffff88003a80bac8  EFLAGS: 00010206
RAX: 00000000006b5aa0 RBX: ffffc200001d8850 RCX: ffff88003f08e080
RDX: 000000005c0c0000 RSI: ffff88003a80ba94 RDI: ffff88003e20d9c0
RBP: 0000000000000000 R08: ffff88003a80bd68 R09: 0000000000000000
R10: 0000000000000020 R11: 0000000000000008 R12: ffff88003e529400
R13: 00000000006b5aa0 R14: ffff880045083e00 R15: ffffc200001d8850
FS:  00002b0b5e7986d0(0000) GS:ffffffff80582000(0000) knlGS:0000000000000000
CS:  e033 DS: 0000 ES: 0000
Process find (pid: 11703, threadinfo ffff88003a80a000, task ffff88003f08e080)
Stack:  ffffffff88134fa0 0000000000000034 ffff88003a80bb68 ffff88003e20d9e0
 ffff88003a80bd68 ffff88003e20d9c0 000000a600000000 ffffffff8037c3a9
 ffff88003e20d9c0 0000000000000000 ffff8800230883c0 ffffffff80290439
Call Trace:
 [<ffffffff8037c3a9>] _atomic_dec_and_lock+0x39/0x58
 [<ffffffff80290439>] dput+0x34/0x153
 [<ffffffff802940eb>] mntput_no_expire+0x19/0x8b
 [<ffffffff880d3ea2>] :libafs:afs_SetupVolume+0x372/0x440
 [<ffffffff880d44d1>] :libafs:afs_NewVolumeByName+0x561/0x610
 [<ffffffff880a7cb7>] :libafs:afs_TraverseCells_nl+0x37/0x60
 [<ffffffff880d4607>] :libafs:afs_GetVolumeByName+0x87/0x140
 [<ffffffff880ca557>] :libafs:EvalMountPoint+0x1d7/0x400
 [<ffffffff880ca8ac>] :libafs:afs_EvalFakeStat_int+0x12c/0x3e0
 [<ffffffff880c3a7c>] :libafs:afs_access+0x9c/0x380
 [<ffffffff880f7faf>] :libafs:afs_linux_permission+0x7f/0xf0
 [<ffffffff8028758a>] permission+0x81/0xc8
 [<ffffffff802886a7>] may_open+0x58/0x21e
 [<ffffffff8028ae4a>] open_namei+0x2b5/0x6c6
 [<ffffffff802792e3>] do_filp_open+0x1c/0x38
 [<ffffffff80279343>] do_sys_open+0x44/0xbe
 [<ffffffff80209d7a>] system_call+0x86/0x8b
 [<ffffffff80209cf4>] system_call+0x0/0x8b


Code: 43 8b 84 ac 80 01 00 00 85 44 24 4c 0f 84 cc 02 00 00 a8 20
RIP  [<ffffffff880d3462>] :libafs:InstallUVolumeEntry+0x162/0x480
 RSP <ffff88003a80bac8>
CR2: ffff880040000000

The failing command was:

  find -L /afs/wszib.edu.pl/

I think the oops happened when I pressed Ctrl-C to kill it, but I'm not
entirely sure.

Jakub.



