[OpenAFS] AFS crashed at access to "~/.IceAuthority"

Frédéric Grelot fredericg_99@yahoo.fr
Tue, 17 Nov 2009 23:16:29 +0100 (CET)


Hi everybody, 

I'm using openafs 1.4.11 (default version on ubuntu karmic), with a "standard" kerberos configuration, together with libnss-afs and nscd to allow user home directories to be seamlessly translated.
I just experienced several complete afs crashes, while I was trying to run evince (the pdf reader). It seems, according to dmesg, that it happens when evince tries to access the special file "~/.ICEauthority", which must be (I'm not sure) a socket file of some sort.
The session then hangs completely, and I couldn't find any other way than logging as root and "# reboot" to get my system back.

I managed to get the logs with dmesg : the first time, it mentionned .ICEauthority, which I deleted before trying again, but it crashed anyway.
The first log (syslog) is :
kernel: [ 9522.432659] type=1503 audit(1258492388.453:26): operation="open" pid=5987 parent=1 profile="/usr/bin/evince" requested_mask="r::" denied_mask="r::" fsuid=20001 ouid=20001 name="/afs/cell.tld/use=
r/u/us/user/.ICEauthority"
kernel: [ 9522.432947] openafs: Can't open inode 62749
kernel: [ 9522.433032] Modules linked in: (lots of stuff)
kernel: [ 9522.433193] RIP: 0010:[<ffffffffa0ff3d35>]  [<ffffffffa0ff3d35>] osi_UFSOpen+0x1e5/0x240 [openafs]
kernel: [ 9522.433400]  [<ffffffffa0fcf4ee>] afs_UFSWrite+0x3be/0x9f0 [openafs]
kernel: [ 9522.433476]  [<ffffffffa0ff84f7>] afs_linux_writepage_sync+0x267/0x4a0 [openafs]
kernel: [ 9522.433540]  [<ffffffffa0ff875d>] afs_linux_write_end+0x2d/0x50 [openafs]
kernel: [ 9522.433669]  [<ffffffffa0ff2871>] ? crget+0x21/0xa0 [openafs]
kernel: [ 9522.433707]  [<ffffffffa0fb1799>] ? PagInCred+0x49/0xd0 [openafs]
kernel: [ 9522.433750]  [<ffffffffa0ff6af5>] afs_linux_write+0x285/0x3f0 [openafs]
kernel: [ 9522.433879] RIP  [<ffffffffa0ff3d35>] osi_UFSOpen+0x1e5/0x240 [openafs]
kernel: [ 9526.345486] type=1503 audit(1258492392.363:28): operation="open" pid=5990 parent=5989 profile="/usr/bin/evince" requested_mask="r::" denied_mask="r::" fsuid=20001 ouid=20001 name="/afs/cell.tld/u/us/user/.ICEauthority"


The last one is a bit different (after .ICEauthority was manually deleted), since it mentions "/var" as the erroneous path, still accessed by openafs :

[  519.517002] __ratelimit: 6 callbacks suppressed
[  519.517010] type=1503 audit(1258494347.339:26): operation="open" pid=3672 parent=3671 profile="/usr/bin/evince-thumbnailer" requested_mask="::rw" denied_mask="::rw" fsuid=20001 ouid=0 name="/var"
[  519.517021] openafs: Can't open inode 104085
[  519.517077] ------------[ cut here ]------------
[  519.517082] kernel BUG at /var/lib/dkms/openafs/1.4.11/build/src/libafs/MODLOAD-2.6.31-14-generic-MP/osi_file.c:87!
[  519.517088] invalid opcode: 0000 [#1] SMP 
[  519.517096] last sysfs file: /sys/devices/virtual/block/dm-3/uevent
[  519.517102] CPU 0 
[  519.517107] Modules linked in: binfmt_misc ppdev openafs(P) bnep ipmi_msghandler vboxnetadp vboxnetflt vboxdrv kvm_intel kvm kqemu dlm configfs nfs lockd nfs_acl auth_rpcgss sunrpc bridge stp snd_hda_codec_si3054 snd_hda_codec_realtek snd_usb_audio snd_usb_lib snd_seq_dummy arc4 snd_seq_oss snd_hda_intel snd_hda_codec snd_seq_midi ecb iwlagn snd_rawmidi joydev iwlcore snd_seq_midi_event snd_hwdep sdhci_pci nvidia(P) uvcvideo snd_seq snd_pcm_oss iptable_filter mac80211 snd_mixer_oss snd_seq_device videodev sdhci snd_pcm tpm_infineon snd_timer ip_tables asus_laptop v4l1_compat psmouse tpm v4l2_compat_ioctl32 serio_raw lp btusb tpm_bios led_class snd_page_alloc snd parport cfg80211 soundcore x_tables raid10 raid456 usbhid raid6_pq async_xor async_memcpy async_tx raid1 raid0 multipath linear dm_raid45 xor ohci1394 ieee1394 video output atl1 mii intel_agp
[  519.517263] Pid: 3672, comm: evince-thumbnai Tainted: P           2.6.31-14-generic #48-Ubuntu M51Sn               
[  519.517269] RIP: 0010:[<ffffffffa10c1d35>]  [<ffffffffa10c1d35>] osi_UFSOpen+0x1e5/0x240 [openafs]
[  519.517325] RSP: 0018:ffff88008edf19f8  EFLAGS: 00010296
[  519.517331] RAX: 0000000000000036 RBX: ffff8800b5c05000 RCX: 000000000000001e
[  519.517337] RDX: 0000000000000000 RSI: 0000000000000086 RDI: 0000000000000246
[  519.517342] RBP: ffff88008edf1a28 R08: 0000000000000032 R09: 000000000000eb2d
[  519.517348] R10: 0000000000000005 R11: 0000000000000000 R12: 0000000000019695
[  519.517354] R13: ffffc90013e66740 R14: 0000000000001000 R15: 0000000000000001
[  519.517361] FS:  00007f6247f04790(0000) GS:ffff880001a00000(0000) knlGS:0000000000000000
[  519.517367] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  519.517373] CR2: 00007fff988ff448 CR3: 000000008b1c5000 CR4: 00000000000026f0
[  519.517379] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  519.517385] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[  519.517392] Process evince-thumbnai (pid: 3672, threadinfo ffff88008edf0000, task ffff880093935ac0)
[  519.517397] Stack:
[  519.517401]  0000000000019695 ffffc90013e666a8 ffff880091d85f80 ffffc90013e66740
[  519.517412] <0> ffffc90013e666a8 ffff880091d85f80 ffff88008edf1b28 ffffffffa1096e06
[  519.517425] <0> 0000000000000000 ffffc90013e666a8 ffff880000000000 ffff880091d86208
[  519.517439] Call Trace:
[  519.517482]  [<ffffffffa1096e06>] afs_UFSRead+0x256/0x1270 [openafs]
[  519.517528]  [<ffffffffa10c0871>] ? crget+0x21/0xa0 [openafs]
[  519.517572]  [<ffffffffa10c6bec>] afs_linux_readpage+0x12c/0x460 [openafs]
[  519.517586]  [<ffffffff81277827>] ? radix_tree_insert+0x1c7/0x210
[  519.517598]  [<ffffffff810f1975>] ? __inc_zone_page_state+0x25/0x30
[  519.517607]  [<ffffffff810db7cc>] T.768+0xfc/0x440
[  519.517616]  [<ffffffff810dbbc6>] generic_file_aio_read+0xb6/0x1d0
[  519.517627]  [<ffffffff8111ec32>] do_sync_read+0xf2/0x130
[  519.517637]  [<ffffffff81078b30>] ? autoremove_wake_function+0x0/0x40
[  519.517681]  [<ffffffffa10c0871>] ? crget+0x21/0xa0 [openafs]
[  519.517719]  [<ffffffffa107f799>] ? PagInCred+0x49/0xd0 [openafs]
[  519.517763]  [<ffffffffa10c4d60>] afs_linux_read+0x100/0x2a0 [openafs]
[  519.517774]  [<ffffffff8142007c>] ? apparmor_file_permission+0xac/0xd0
[  519.517784]  [<ffffffff8111f215>] vfs_read+0xb5/0x1a0
[  519.517792]  [<ffffffff8111f81c>] sys_read+0x4c/0x80
[  519.517802]  [<ffffffff81012002>] system_call_fastpath+0x16/0x1b
[  519.517807] Code: d3 4d 46 e0 0f 0b eb fe 48 c7 c7 48 b8 0d a1 31 c0 e8 c1 4d 46 e0 0f 0b eb fe 44 89 e6 48 c7 c7 98 b9 0d a1 31 c0 e8 ac 4d 46 e0 <0f> 0b eb fe 44 89 e6 48 c7 c7 68 b9 0d a1 e8 99 4d 46 e0 0f 0b 
[  519.517895] RIP  [<ffffffffa10c1d35>] osi_UFSOpen+0x1e5/0x240 [openafs]
[  519.517940]  RSP <ffff88008edf19f8>
[  519.517947] ---[ end trace d5ed6153240e0aa0 ]---

I may mention that /var is partition by itself, in ext4 format on top of lvm :
# df /var -ih
Sys. de fich.            Inodes   IUtil.  ILib. %IUti. Monté sur
/dev/mapper/vg00-local--var
                        239K    109K    130K   46% /var
# df /var -h
Sys. de fich.            Tail. Occ. Disp. %Occ. Monté sur
/dev/mapper/vg00-local--var
                      3,7G  1,5G  2,1G  42% /var

I set a cache of 500Mb, with some custom options to tune up network transfers by slicing my files in small chunks with :
/sbin/afsd -afsdb -dynroot -fakestat -chunksize 13
(this last parameter, chunksize 13, came as a conclusion of a discussion I initiated few months ago to avoid my network connection to drop at a ping of >2000ms when making lots of upload to the server)


I will appreciate any help on this issue : it is not a huge problem by itself (I can easily replace evince with okular for example), but I don't like when I just drop a problem without really solving it!



Thanks in advance, 

Frederic.