[OpenAFS] find /afs/ breaking the client?

Derrick J Brashear shadow@dementia.org
Wed, 7 Feb 2007 09:19:39 -0500 (EST)


We fixed that bug in what will be 1.4.3rc2.

On Wed, 7 Feb 2007, Jakub Witkowski wrote:

> On 04-02-2007, Sun at 10:21 -0500, Derrick J Brashear wrote:
>> On Sun, 4 Feb 2007, Jakub Witkowski wrote:
>>
>>>> Well, we haven't recommended 1.5.14 so I'm curious why you chose it, but,
>>>> do you have an oops?
>>>>
>>>
>>> No, no oops. The system just... blocks. You can interact with programs
>>> already in memory, access open files, but not open new ones.
>>>
>>> I chose .14 mostly because I was having problems building the module for
>>> the Xen kernel, and this version was simply the first that I got compiled.
>>> I may fall back to something more stable now, as I know how to get things
>>> running.
>>>
>>> Which OpenAFS version do you recommend for installation on a client? On a
>>> server?
>>
>> For Linux, we haven't recommended any 1.5.x client. 1.4.2, generally,
>> though 1.4.3rc2 should be out in a day or so.
>>
>> If you can get cmdebug information when it's hung, that'd be useful to
>> see.
>
> I have done some experiments and my findings are not exactly optimistic.
> First of all, I found out that the hang was actually caused by some
> weird interaction between the OpenAFS client and the libnss-ldap library; in
> a test environment I can reproduce the systemwide hang described above
> when I set up the nsswitch library to look uids up in LDAP, but if it is not
> configured to do so, only the find process hangs - and then, only for a
> few minutes. Adding the -fakestat-all switch makes the problem less
> pronounced (i.e. find lists more files) but does not make it go away.
>
> On the other hand, 1.4.2 appears to be free of this problem; at least I
> have not yet found a way to crash or hang it.
>
> 1.4.3rc1 has a bug:
>
> Unable to handle kernel paging request at ffff880040000000 RIP:
> [<ffffffff880d3462>] :libafs:InstallUVolumeEntry+0x162/0x480
> PGD e74067 PUD 1076067 PMD 1077067 PTE 0
> Oops: 0000 [1] SMP
> CPU 0
> Modules linked in: libafs xt_tcpudp ip6table_filter ip6_tables xt_state
> xt_pkttype iptable_raw xt_CLASSIFY xt_CONNMARK xt_connmark xt_policy
> xt_multiport xt_conntrack iptable_mangle ipt_ULOG ipt_TTL ipt_ttl
> ipt_TOS ipt_tos ipt_TCPMSS ipt_SAME ipt_REJECT ipt_REDIRECT ipt_recent
> ipt_owner ipt_NETMAP ipt_MASQUERADE ipt_LOG ipt_iprange ipt_hashlimit
> ipt_ECN ipt_ecn ipt_DSCP ipt_dscp ipt_CLUSTERIP ipt_ah ipt_addrtype
> ip_nat_irc ip_nat_tftp ip_nat_ftp ip_conntrack_irc ip_conntrack_tftp
> ip_conntrack_ftp iptable_nat ip_nat ip_conntrack nfnetlink
> iptable_filter ip_tables x_tables xenbus_be xenblk
> Pid: 11703, comm: find Tainted: P      2.6.18-xenU5 #6
> RIP: e030:[<ffffffff880d3462>]
> [<ffffffff880d3462>] :libafs:InstallUVolumeEntry+0x162/0x480
> RSP: e02b:ffff88003a80bac8  EFLAGS: 00010206
> RAX: 00000000006b5aa0 RBX: ffffc200001d8850 RCX: ffff88003f08e080
> RDX: 000000005c0c0000 RSI: ffff88003a80ba94 RDI: ffff88003e20d9c0
> RBP: 0000000000000000 R08: ffff88003a80bd68 R09: 0000000000000000
> R10: 0000000000000020 R11: 0000000000000008 R12: ffff88003e529400
> R13: 00000000006b5aa0 R14: ffff880045083e00 R15: ffffc200001d8850
> FS:  00002b0b5e7986d0(0000) GS:ffffffff80582000(0000)
> knlGS:0000000000000000
> CS:  e033 DS: 0000 ES: 0000
> Process find (pid: 11703, threadinfo ffff88003a80a000, task
> ffff88003f08e080)
> Stack:  ffffffff88134fa0 0000000000000034 ffff88003a80bb68
> ffff88003e20d9e0
> ffff88003a80bd68 ffff88003e20d9c0 000000a600000000 ffffffff8037c3a9
> ffff88003e20d9c0 0000000000000000 ffff8800230883c0 ffffffff80290439
> Call Trace:
> [<ffffffff8037c3a9>] _atomic_dec_and_lock+0x39/0x58
> [<ffffffff80290439>] dput+0x34/0x153
> [<ffffffff802940eb>] mntput_no_expire+0x19/0x8b
> [<ffffffff880d3ea2>] :libafs:afs_SetupVolume+0x372/0x440
> [<ffffffff880d44d1>] :libafs:afs_NewVolumeByName+0x561/0x610
> [<ffffffff880a7cb7>] :libafs:afs_TraverseCells_nl+0x37/0x60
> [<ffffffff880d4607>] :libafs:afs_GetVolumeByName+0x87/0x140
> [<ffffffff880ca557>] :libafs:EvalMountPoint+0x1d7/0x400
> [<ffffffff880ca8ac>] :libafs:afs_EvalFakeStat_int+0x12c/0x3e0
> [<ffffffff880c3a7c>] :libafs:afs_access+0x9c/0x380
> [<ffffffff880f7faf>] :libafs:afs_linux_permission+0x7f/0xf0
> [<ffffffff8028758a>] permission+0x81/0xc8
> [<ffffffff802886a7>] may_open+0x58/0x21e
> [<ffffffff8028ae4a>] open_namei+0x2b5/0x6c6
> [<ffffffff802792e3>] do_filp_open+0x1c/0x38
> [<ffffffff80279343>] do_sys_open+0x44/0xbe
> [<ffffffff80209d7a>] system_call+0x86/0x8b
> [<ffffffff80209cf4>] system_call+0x0/0x8b
>
>
> Code: 43 8b 84 ac 80 01 00 00 85 44 24 4c 0f 84 cc 02 00 00 a8 20
> RIP  [<ffffffff880d3462>] :libafs:InstallUVolumeEntry+0x162/0x480
> RSP <ffff88003a80bac8>
> CR2: ffff880040000000
>
> The failed command was find -L /afs/wszib.edu.pl/
> I think the oops happened when I pressed ctrl-C to kill it, but I'm not
> exactly sure.
>
> Jakub.
>
>
>
>> _______________________________________________
>> OpenAFS-info mailing list
>> OpenAFS-info@openafs.org
>> https://lists.openafs.org/mailman/listinfo/openafs-info
>