[OpenAFS] Kernel panic running 1.8.4 on linux 3.10.0-1062.7.1.el7.x86_64

Benjamin Kaduk kaduk@mit.edu
Sun, 12 Jan 2020 21:34:48 -0800


Hi Neil,

My money is on "unrelated", but thank you for sending the report with full
details!

-Ben

On Thu, Jan 09, 2020 at 01:02:35PM +0000, Neil Brown wrote:
> Hi,
> 
> This is just a heads up in case anyone else has seen or sees something 
> similar.
> 
> We've just updated an SL 7.6 AFS file server to the above version:
> 
> Linux bulleid 3.10.0-1062.7.1.el7.x86_64 #1 SMP Thu Dec 5 14:45:00 CST 2019 x86_64 x86_64 x86_64 GNU/Linux
> 
> A reboot and a couple of hours or so later the machine panicked. It was in 
> the middle of some volume moves at the time.
> 
> The machine had quite happily moved TBs of data in the preceding days with 
> the previous kernel, but with the new kernel it lasted only a couple of 
> hours and moved only GBs of data before it crashed.
> 
> The two things (the moves and the crash) could be completely unrelated, 
> and have nothing to do with AFS. We're going to do some more moves to see 
> if we can trigger the crash again.
> 
> The console log dump is below. I had to google jbd2 to find out it's 
> something to do with the journaling filesystem. We are serving our 
> /vicep's from ext4 filesystems.
> 
> Neil
> 
> [-- MARK -- Wed Jan  8 17:00:00 2020]
> [ 6485.225355] perf: interrupt took too long (3142 > 3137), lowering kernel.perf_event_max_sample_rate to 63000
> [ 8604.874829] perf: interrupt took too long (3929 > 3927), lowering kernel.perf_event_max_sample_rate to 50000
> [ 9499.498630] ------------[ cut here ]------------
> [ 9499.503791] kernel BUG at fs/jbd2/journal.c:783!
> [ 9499.508953] invalid opcode: 0000 [#1] SMP 
> [ 9499.513551] Modules linked in: fuse btrfs raid6_pq xor vfat msdos fat xfs libcrc32c dm_mod bonding sb_edac intel_powerclamp coretemp intel_rapl iosf_mbi crc32_pclmul ghash_clmulni_intel aesni_intel lrw gf128mul glue_helper ablk_helper cryptd iTCO_wdt iTCO_vendor_support dcdbas mxm_wmi pcspkr mei_me lpc_ich mei sg pcc_cpufreq wmi acpi_power_meter auth_rpcgss sunrpc ip_tables ext4 mbcache jbd2 sd_mod crc_t10dif crct10dif_generic mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ahci ttm ipmi_si tg3 libahci drm crct10dif_pclmul ipmi_devintf crct10dif_common ptp megaraid_sas libata crc32c_intel ipmi_msghandler drm_panel_orientation_quirks pps_core
> [ 9499.581070] CPU: 3 PID: 1090 Comm: jbd2/sdc4-8 Not tainted 3.10.0-1062.7.1.el7.x86_64 #1
> [ 9499.590117] Hardware name: Dell Inc. PowerEdge R730xd/0WCJNT, BIOS 2.8.0 005/17/2018
> [ 9499.598777] task: ffff932f68d6e2a0 ti: ffff933e382e0000 task.ti: ffff933e382e0000
> [ 9499.607146] RIP: 0010:[<ffffffffc03db209>]  [<ffffffffc03db209>] jbd2_journal_next_log_block+0x79/0x80 [jbd2]
> [ 9499.618250] RSP: 0018:ffff933e382e3c90  EFLAGS: 00010246
> [ 9499.624187] RAX: 0000000000000001 RBX: ffff933e2aead000 RCX: 000000000000000c
> [ 9499.632166] RDX: 000000000001c817 RSI: ffff933e382e3d38 RDI: ffff933e2aead02c
> [ 9499.640145] RBP: ffff933e382e3ca8 R08: ffff933e2c675958 R09: 0000000000000000
> [ 9499.648124] R10: 0000000000000001 R11: 0000000000ea4e7b R12: ffff933e2aead028
> [ 9499.656103] R13: ffff933e382e3d38 R14: ffff933e2aead000 R15: ffff932f68d6e2a0
> [ 9499.664082] FS:  0000000000000000(0000) GS:ffff933e3f2c0000(0000) knlGS:0000000000000000
> [ 9499.673131] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 9499.679555] CR2: 00007f142e7d6000 CR3: 0000000f78410000 CR4: 00000000001607e0
> [ 9499.687537] Call Trace:
> [ 9499.690272]  [<ffffffffc03d3828>] jbd2_journal_commit_transaction+0x788/0x19f0 [jbd2]
> [ 9499.699032]  [<ffffffffbaa2b59e>] ? __switch_to+0xce/0x580
> [ 9499.705168]  [<ffffffffc03d9ee9>] kjournald2+0xc9/0x260 [jbd2]
> [ 9499.711693]  [<ffffffffbaac72e0>] ? wake_up_atomic_t+0x30/0x30
> [ 9499.718216]  [<ffffffffc03d9e20>] ? commit_timeout+0x10/0x10 [jbd2]
> [ 9499.725225]  [<ffffffffbaac61f1>] kthread+0xd1/0xe0
> [ 9499.730680]  [<ffffffffbaac6120>] ? insert_kthread_work+0x40/0x40
> [ 9499.737486]  [<ffffffffbb18dd37>] ret_from_fork_nospec_begin+0x21/0x21
> [ 9499.744786]  [<ffffffffbaac6120>] ? insert_kthread_work+0x40/0x40
> [ 9499.751600] Code: 03 00 00 75 0e 48 8b 83 30 03 00 00 48 89 83 18 03 00 00 f0 41 ff 44 24 04 4c 89 ea 48 89 df e8 0e ff ff ff 5b 41 5c 41 5d 5d c3 <0f> 0b 0f 1f 44 00 00 0f 1f 44 00 00 55 48 89 e5 41 54 49 89 fc 
> [ 9499.773333] RIP  [<ffffffffc03db209>] jbd2_journal_next_log_block+0x79/0x80 [jbd2]
> [ 9499.781812]  RSP <ffff933e382e3c90>
> [ 9499.788431] ---[ end trace b851f55eb110a37b ]---
> 2020-01-08T17:57:24.931211+00:00 bulleid kernel:[ 9499.856790] Kernel panic - not syncing: Fatal exception
>   kernel BUG at f[ 9499.958226] Kernel Offset: 0x39a00000 from 0xffffffff81000000 (relocation range: 0xffffffff80000000-0xffffffffbfffffff)
> [-- MARK -- Wed Jan  8 18:00:00 2020 .. Wed Jan  8 22:00:00 2020 -- MARK --]
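
For anyone wanting to compare their own console logs against the dump above, here is a minimal sketch in Python that pulls out the three fields that identify this crash: the "kernel BUG at" location, the faulting symbol and module from the "RIP:" line, and the task from the "CPU: ... Comm:" line. The function name summarize_oops and the regular expressions are my own assumptions, written against the 3.10-era oops layout shown in this message; other kernel versions format these lines differently and would need adjusted patterns.

```python
import re

# Patterns written against the oops layout quoted above (3.10.0 el7 kernel).
# They are illustrative assumptions, not a general-purpose oops parser.
BUG_RE = re.compile(r"kernel BUG at (?P<file>\S+):(?P<line>\d+)!")
RIP_RE = re.compile(
    r"RIP: \S+\s+\[<[0-9a-f]+>\]\s+(?P<sym>[\w.]+)\+0x[0-9a-f]+/0x[0-9a-f]+"
    r"(?: \[(?P<mod>\w+)\])?"
)
COMM_RE = re.compile(r"CPU: \d+ PID: (?P<pid>\d+) Comm: (?P<comm>\S+)")


def summarize_oops(text):
    """Return the first BUG location, faulting symbol/module and task found."""
    summary = {}
    for line in text.splitlines():
        for rx in (BUG_RE, RIP_RE, COMM_RE):
            m = rx.search(line)
            if not m:
                continue
            for key, value in m.groupdict().items():
                # Keep the first occurrence and skip unmatched optional groups.
                if value is not None:
                    summary.setdefault(key, value)
    return summary


# Three lines copied from the console log in this message.
SAMPLE = """\
[ 9499.503791] kernel BUG at fs/jbd2/journal.c:783!
[ 9499.581070] CPU: 3 PID: 1090 Comm: jbd2/sdc4-8 Not tainted 3.10.0-1062.7.1.el7.x86_64 #1
[ 9499.607146] RIP: 0010:[<ffffffffc03db209>]  [<ffffffffc03db209>] jbd2_journal_next_log_block+0x79/0x80 [jbd2]
"""

if __name__ == "__main__":
    for key, value in sorted(summarize_oops(SAMPLE).items()):
        print(f"{key}: {value}")
```

Run against a saved console log, a matching file, line, and symbol would suggest the same assertion in jbd2 fired again. It says nothing by itself about whether the volume moves are involved.
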
> 
> -- 
>   Neil Brown - Computing Officer - Appleton Tower 7.12a | Neil.Brown @ ed. ac.uk
>   School of Informatics, University of Edinburgh        | Tel: +44 131 6504422
> 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> 
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info