[OpenAFS] Kernel panic running 1.8.4 on linux 3.10.0-1062.7.1.el7.x86_64

Neil Brown neilb+afs@inf.ed.ac.uk
Thu, 9 Jan 2020 13:02:35 +0000 (GMT)


Hi,

This is just a heads up in case anyone else has seen or sees something 
similar.

We've just updated an SL 7.6 AFS file server to the kernel version in the
subject line:

Linux bulleid 3.10.0-1062.7.1.el7.x86_64 #1 SMP Thu Dec 5 14:45:00 CST 2019 x86_64 x86_64 x86_64 GNU/Linux

A reboot and a couple of hours or so later, the machine panicked. It was in
the middle of some volume moves at the time.

The machine had quite happily moved TBs of data over the preceding days with
the previous kernel, but with the new kernel it lasted only a couple of hours
and moved only GBs of data before it crashed.

The two things (the moves and the crash) could be completely unrelated,
and the crash may have nothing to do with AFS. We're going to do some more
moves to see if we can trigger the crash again.

The console log dump is below. I had to google jbd2 to find out it's 
something to do with the journaling filesystem. We are serving our 
/vicep's from ext4 filesystems.
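
As an aside for anyone comparing traces: here is a minimal sketch in Python,
assuming the console format shown in the dump below, of how the BUG location
and the faulting function could be pulled out of a log. The name
summarize_oops and both regular expressions are illustrative assumptions, not
part of OpenAFS or any kernel tool.

```python
import re

# Hypothetical helper for illustration only; the patterns assume the
# 3.10-era oops layout seen in this report.
BUG_RE = re.compile(r"kernel BUG at (\S+):(\d+)!")
RIP_RE = re.compile(r"RIP: .*?\] (\w+)\+0x[0-9a-f]+/0x[0-9a-f]+(?: \[(\w+)\])?")


def summarize_oops(log_text):
    """Return the BUG source location and faulting function, if present."""
    bug = BUG_RE.search(log_text)
    rip = RIP_RE.search(log_text)
    return {
        "file": bug.group(1) if bug else None,
        "line": int(bug.group(2)) if bug else None,
        "function": rip.group(1) if rip else None,
        "module": rip.group(2) if rip else None,
    }


# Two lines copied from the console dump in this message.
sample = """\
[ 9499.503791] kernel BUG at fs/jbd2/journal.c:783!
[ 9499.607146] RIP: 0010:[<ffffffffc03db209>]  [<ffffffffc03db209>] jbd2_journal_next_log_block+0x79/0x80 [jbd2]
"""

print(summarize_oops(sample))
```

Run against a saved console log, this gives a short summary that can be
compared across machines to see whether they hit the same BUG.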

Neil

[-- MARK -- Wed Jan  8 17:00:00 2020]
[ 6485.225355] perf: interrupt took too long (3142 > 3137), lowering kernel.perf_event_max_sample_rate to 63000
[ 8604.874829] perf: interrupt took too long (3929 > 3927), lowering kernel.perf_event_max_sample_rate to 50000
[ 9499.498630] ------------[ cut here ]------------
[ 9499.503791] kernel BUG at fs/jbd2/journal.c:783!
[ 9499.508953] invalid opcode: 0000 [#1] SMP 
[ 9499.513551] Modules linked in: fuse btrfs raid6_pq xor vfat msdos fat xfs libcrc32c dm_mod bonding sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support dcdbas mxm_wmi pcspkr mei_me lpc_ich mei sg pcc_cpufreq wmi acpi_power_meter auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci ttm ipmi_si tg3 libahci drm crct10dif_pclmul ipmi_devintf crct10dif_common ptp megaraid_sas libata crc32c_intel ipmi_msghandler drm_panel_orientation_quirks pps_core
[ 9499.581070] CPU: 3 PID: 1090 Comm: jbd2/sdc4-8 Not tainted 3.10.0-1062.7.1.el7.x86_64 #1
[ 9499.590117] Hardware name: Dell Inc. PowerEdge R730xd/0WCJNT, BIOS 2.8.0 005/17/2018
[ 9499.598777] task: ffff932f68d6e2a0 ti: ffff933e382e0000 task.ti: ffff933e382e0000
[ 9499.607146] RIP: 0010:[<ffffffffc03db209>]  [<ffffffffc03db209>] jbd2_journal_next_log_block+0x79/0x80 [jbd2]
[ 9499.618250] RSP: 0018:ffff933e382e3c90  EFLAGS: 00010246
[ 9499.624187] RAX: 0000000000000001 RBX: ffff933e2aead000 RCX: 000000000000000c
[ 9499.632166] RDX: 000000000001c817 RSI: ffff933e382e3d38 RDI: ffff933e2aead02c
[ 9499.640145] RBP: ffff933e382e3ca8 R08: ffff933e2c675958 R09: 0000000000000000
[ 9499.648124] R10: 0000000000000001 R11: 0000000000ea4e7b R12: ffff933e2aead028
[ 9499.656103] R13: ffff933e382e3d38 R14: ffff933e2aead000 R15: ffff932f68d6e2a0
[ 9499.664082] FS:  0000000000000000(0000) GS:ffff933e3f2c0000(0000) knlGS:0000000000000000
[ 9499.673131] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 9499.679555] CR2: 00007f142e7d6000 CR3: 0000000f78410000 CR4: 00000000001607e0
[ 9499.687537] Call Trace:
[ 9499.690272]  [<ffffffffc03d3828>] jbd2_journal_commit_transaction+0x788/0x19f0 [jbd2]
[ 9499.699032]  [<ffffffffbaa2b59e>] ? __switch_to+0xce/0x580
[ 9499.705168]  [<ffffffffc03d9ee9>] kjournald2+0xc9/0x260 [jbd2]
[ 9499.711693]  [<ffffffffbaac72e0>] ? wake_up_atomic_t+0x30/0x30
[ 9499.718216]  [<ffffffffc03d9e20>] ? commit_timeout+0x10/0x10 [jbd2]
[ 9499.725225]  [<ffffffffbaac61f1>] kthread+0xd1/0xe0
[ 9499.730680]  [<ffffffffbaac6120>] ? insert_kthread_work+0x40/0x40
[ 9499.737486]  [<ffffffffbb18dd37>] ret_from_fork_nospec_begin+0x21/0x21
[ 9499.744786]  [<ffffffffbaac6120>] ? insert_kthread_work+0x40/0x40
[ 9499.751600] Code: 03 00 00 75 0e 48 8b 83 30 03 00 00 48 89 83 18 03 00 00 f0 41 ff 44 24 04 4c 89 ea 48 89 df e8 0e ff ff ff 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc 
[ 9499.773333] RIP  [<ffffffffc03db209>] jbd2_journal_next_log_block+0x79/0x80 [jbd2]
[ 9499.781812]  RSP <ffff933e382e3c90>
[ 9499.788431] ---[ end trace b851f55eb110a37b ]---
2020-01-08T17:57:24.931211+00:00 bulleid kernel:[ 9499.856790] Kernel panic - not syncing: Fatal exception
  kernel BUG at f[ 9499.958226] Kernel Offset: 0x39a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
[-- MARK -- Wed Jan  8 18:00:00 2020 .. Wed Jan  8 22:00:00 2020 -- MARK --]

-- 
  Neil Brown - Computing Officer - Appleton Tower 7.12a | Neil.Brown @ ed. ac.uk
  School of Informatics, University of Edinburgh        | Tel: +44 131 6504422

The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.