[OpenAFS-devel] OpenAFS release team weekly meeting

markus.suvanto@gmail.com markus.suvanto@gmail.com
Mon, 15 Jan 2018 00:45:52 +0200


On Sun, 2018-01-14 at 20:33 +0000, Mark Vitale wrote:
> Markus,
> 
> Thank you for your report.  See my comments interleaved below:
> 
> > On Jan 12, 2018, at 1:37 PM, markus.suvanto@gmail.com wrote:
> > 
> > Openafs kernel module compiled from git using version: 1.8.0pre4
> > 
> > emerge --info
> > 
> > Portage 2.3.13 (python 2.7.14-final-0,
> > default/linux/amd64/17.0/desktop/gnome/systemd, gcc-7.2.0, glibc-2.26-r5,
> > 4.15.0-rc7 x86_64)
> > =================================================================
> > System uname: Linux-4.15.0-rc7-x86_64-Intel-R-_Xeon-R-_CPU_E5530_@_2.40GHz-with-
> > 
> > kernel: afs: disk cache read error in CacheItems slot 69050 off 5524020/8000020 code -4/80
> 
> Code -4 is EINTR (interrupt) during the read.
> Since this happened in afs_GetValidDSlot (afs_UFSGetDSlot), it returns NULL for the dcache (tdc).
> This causes the caller, afs_InvalidateAllSegments(), to panic with the following:
> 
> > kernel: openafs: afs_InvalidateAllSegments tdc count
> > kernel: ------------[ cut here ]------------
> > kernel: Kernel BUG at 0000000080540eff [verbose debug info unavailable]
> > kernel: invalid opcode: 0000 [#1] SMP
> > kernel: Modules linked in: libafs(PO) mcryptd sha256_ssse3 sha256_generic
> > cfg80211 cbc rbd libceph iptable_nat nf_nat_ipv4 nf_nat xt_tcpudp xt_physdev
> > br_netfilter nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack
> > libcrc32c crc32c_generic iptable_filter ip_tables x_tables nf_tables nfnetlink
> > bridge stp llc mousedev joydev hid_logitech_hidpp snd_hda_codec_realtek
> > snd_hda_codec_generic snd_hda_codec_hdmi hp_wmi snd_hda_intel sparse_keymap
> > snd_hda_codec gpio_ich rfkill wmi_bmof psmouse snd_hda_core intel_powerclamp
> > pcspkr snd_pcm wmi snd_timer rtc_cmos snd evdev usbmouse hid_logitech_dj
> > input_leds acpi_cpufreq lpc_ich soundcore i7core_edac button sch_fq_codel
> > kyber_iosched bfq vhost_net vhost tap tun kvm_intel kvm irqbypass smsc47b397
> > coretemp hid_generic usbkbd btrfs usbhid xor zstd_decompress
> > kernel:  zstd_compress xxhash raid6_pq sr_mod sd_mod cdrom amdgpu uhci_hcd chash
> > i2c_algo_bit backlight drm_kms_helper cfbfillrect syscopyarea cfbimgblt
> > sysfillrect sysimgblt fb_sys_fops cfbcopyarea fb font fbdev ahci ttm libahci
> > crc32c_intel atkbd libata tg3 drm serio_raw ehci_pci firewire_ohci ehci_hcd ptp
> > scsi_mod firewire_core pps_core usbcore libphy crc_itu_t agpgart hwmon i2c_core
> > floppy unix ipv6 autofs4
> > kernel: CPU: 15 PID: 91713 Comm: tracker-store Tainted:
> > P          IO     4.15.0-rc7 #1
> > kernel: Hardware name: Hewlett-Packard HP Z600 Workstation/0AE8h, BIOS 786G4
> > v03.19 03/11/2011
> > kernel: RIP: 0010:afs_InvalidateAllSegments+0x42e/0x430 [libafs]
> > kernel: RSP: 0000:ffffc9000b1f3e28 EFLAGS: 00010292
> > kernel: RAX: 000000000000002c RBX: 0000000000000001 RCX: ffffffff81c3eb98
> > kernel: RDX: 0000000000000001 RSI: 0000000000000086 RDI: ffffffff81f79584
> > kernel: RBP: ffff8802fff63740 R08: 0000000000001518 R09: ffffffff81f7b9c2
> > kernel: R10: ffff8801a9807000 R11: 0000000000000000 R12: 0000000000010dba
> > kernel: R13: 0000000000000000 R14: 00000000000006d2 R15: 0000000000000000
> > kernel: FS:  00007ff04f68f7c0(0000) GS:ffff88032fdc0000(0000)
> > knlGS:0000000000000000
> > kernel: CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> > kernel: CR2: 00007f91fc057630 CR3: 00000001390c1000 CR4: 00000000000026e0
> > kernel: Call Trace:
> > kernel:  afs_StoreAllSegments+0x584/0xbe0 [libafs]
> > kernel:  afs_linux_flush+0x482/0x500 [libafs]
> > kernel:  filp_close+0x22/0x70
> > kernel:  SyS_close+0x1a/0x40
> > kernel:  entry_SYSCALL_64_fastpath+0x13/0x6c
> > kernel: RIP: 0033:0x7ff04e2d2910
> > kernel: RSP: 002b:00007ffcc2324530 EFLAGS: 00000293
> > kernel: Code: 48 c7 c7 a0 82 bd a0 e8 51 9e ff ff e9 79 ff ff ff 48 c7 c7 e0 d4
> > bc a0 e8 15 68 54 e0 0f 0b 48 c7 c7 b0 d4 bc a0 e8 07 68 54 e0 <0f> 0b 41 57 41
> > 56 41 55 41 54 55 53 48 89 fb 48 81 c7 b0 02 00 
> > kernel: RIP: afs_InvalidateAllSegments+0x42e/0x430 [libafs] RSP:
> > ffffc9000b1f3e28
> > kernel: ---[ end trace 709a142c7fd521a4 ]---
> 
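A note on reading the "disk cache read error" line quoted above: the numbers
are consistent with CacheItems being a 20-byte header followed by 80-byte
slot entries. That layout is an assumption inferred from the logged values,
not taken from the OpenAFS source, and the function names below are made up
for illustration. A small Python sketch of the arithmetic:

```python
import errno

# Assumptions inferred from the log line, not from the OpenAFS source:
HEADER_SIZE = 20  # bytes before the first slot entry
ENTRY_SIZE = 80   # bytes per slot entry (the "/80" in "code -4/80")


def slot_offset(slot):
    """Byte offset of a slot entry under the assumed layout."""
    return HEADER_SIZE + slot * ENTRY_SIZE


def slot_count(file_length):
    """Number of slot entries that fit in a file of the given length."""
    return (file_length - HEADER_SIZE) // ENTRY_SIZE


def errno_name(code):
    """Symbolic name for a negative kernel-style error code."""
    return errno.errorcode[-code]


if __name__ == "__main__":
    print(slot_offset(69050))   # 5524020
    print(slot_count(8000020))  # 100000
    print(errno_name(-4))       # EINTR on Linux
```

Under these assumptions, slot 69050 lands at offset 5524020 and the
8000020-byte file holds 100000 slots, which matches "off 5524020/8000020" in
the log. The trailing "/80" would then be the number of bytes the read was
expected to return.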
> How many times have you seen this problem?  Are you able to reproduce it at will?
I have seen this a few times. It seems to happen only
a) when I am logging out of a GNOME (in my case Wayland) session
b) when I am unlocking the GNOME screensaver
c) when I am switching virtual terminals

> What is the backend filesystem for your AFS cache partition?
The AFS cache partition is a btrfs subvolume:
cacheinfo: /afs:/mnt/ssd/openafs_cache:20000000
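For reference, a cacheinfo line has three colon-separated fields: the AFS
mount point, the cache directory, and the cache size in 1-kilobyte blocks.
A minimal Python sketch of reading the line above (parse_cacheinfo and
cache_size_gib are hypothetical helpers, not part of OpenAFS):

```python
def parse_cacheinfo(line):
    """Split a cacheinfo line into (mount_point, cache_dir, size_kb)."""
    mount_point, cache_dir, size_kb = line.strip().split(":")
    return mount_point, cache_dir, int(size_kb)


def cache_size_gib(size_kb):
    """Convert a size in 1-kilobyte blocks to GiB."""
    return size_kb * 1024 / 2**30


if __name__ == "__main__":
    mount_point, cache_dir, size_kb = parse_cacheinfo(
        "/afs:/mnt/ssd/openafs_cache:20000000"
    )
    print(mount_point, cache_dir, size_kb)
    print(round(cache_size_gib(size_kb), 2))  # 19.07
```

So the cache here is configured for roughly 19 GiB.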

> Is it possible it was slow or hung at the time, leading someone to try
> an interrupt to free the hang?  Could you share the syslog that precedes
> the panic?
My home directory is under AFS, and there are other AFS volumes mounted as well:
/afs/my_chell/user/my_home_dir 
/afs/my_chell/user/my_home_dir/another_afs_volume

Maybe the DAFS file server has detached the unused volume "another_afs_volume",
and some GNOME process goes mad when the volume attach takes a long time?
(Some of my /vicep partitions are hosted on Ceph, so attach is sometimes slow.)

Sorry, systemd has already destroyed the logs.

> Are you able to share the core file from this panic?
> If not, would you be willing to examine it with the ‘crash’ utility
> and provide the backtraces from the other OpenAFS kernel threads at
> the time of the crash?
Sorry, coredumpctl didn't catch any core file.

-Markus