[OpenAFS] AFS client hanged
Andreas Ladanyi
andreas.ladanyi@kit.edu
Mon, 16 Dec 2019 11:34:10 +0100
Hi,
> Dear all,
>
> Recently, I have been stuck with some AFS issues.
>
> The AFS client hung with the following log messages. When this happens,
> the AFS instance is blocked and jobs fail to access any files
> located in AFS. I have to reboot the worker node to recover service.
>
> Dec 6 15:03:18 bws0825 kernel: INFO: task afs_callback:19124 blocked for more than 120 seconds.
> Dec 6 15:03:18 bws0825 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Dec 6 15:03:18 bws0825 kernel: afs_callback D ffff9860d826e180 0 19124 2 0x00000000
> Dec 6 15:03:18 bws0825 kernel: Call Trace:
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2169df9>] schedule_preempt_disabled+0x29/0x70
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2167d77>] __mutex_lock_slowpath+0xc7/0x1d0
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa216715f>] mutex_lock+0x1f/0x2f
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc084dff4>] SRXAFSCB_InitCallBackState+0x34/0x470 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc0898047>] ? afs_xdr_vector+0x57/0x90 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc084f19e>] SRXAFSCB_InitCallBackState3+0xe/0x10 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08b6f43>] RXAFSCB_ExecuteRequest+0x6f3/0x8a0 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1b028ae>] ? getnstimeofday64+0xe/0x30
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08ae589>] ? afs_mutex_exit+0x29/0x40 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08a6a5d>] rxi_ServerProc+0xcd/0x1e0 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08af017>] rx_ServerProc+0x87/0xe0 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc084eedd>] afs_RXCallBackServer+0x3d/0x50 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c76a5>] afsd_thread+0x1e5/0x730 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1da1>] kthread+0xd1/0xe0
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? insert_kthread_work+0x40/0x40
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2175c1d>] ret_from_fork_nospec_begin+0x7/0x21
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? insert_kthread_work+0x40/0x40
> Dec 6 15:03:18 bws0825 kernel: INFO: task afs_rxevent:19127 blocked for more than 120 seconds.
> Dec 6 15:03:18 bws0825 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Dec 6 15:03:18 bws0825 kernel: afs_rxevent D ffff9860cbbf6180 0 19127 2 0x00000000
> Dec 6 15:03:18 bws0825 kernel: Call Trace:
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1aaa2d2>] ? del_timer_sync+0x52/0x60
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2169df9>] schedule_preempt_disabled+0x29/0x70
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2167d77>] __mutex_lock_slowpath+0xc7/0x1d0
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa216715f>] mutex_lock+0x1f/0x2f
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08bdb58>] afs_osi_TimedSleep+0x118/0x210 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ad6b60>] ? wake_up_state+0x20/0x20
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08bdce8>] afs_osi_Wait+0x98/0xd0 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08af575>] afs_rxevent_daemon+0x95/0x140 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c7af6>] afsd_thread+0x636/0x730 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1da1>] kthread+0xd1/0xe0
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? insert_kthread_work+0x40/0x40
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2175c1d>] ret_from_fork_nospec_begin+0x7/0x21
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? insert_kthread_work+0x40/0x40
> Dec 6 15:03:18 bws0825 kernel: INFO: task afs_checkserver:19870 blocked for more than 120 seconds.
> Dec 6 15:03:18 bws0825 kernel: "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
> Dec 6 15:03:18 bws0825 kernel: afs_checkserver D ffff9860c7811040 0 19870 2 0x00000000
> Dec 6 15:03:18 bws0825 kernel: Call Trace:
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1aaa2d2>] ? del_timer_sync+0x52/0x60
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2169df9>] schedule_preempt_disabled+0x29/0x70
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2167d77>] __mutex_lock_slowpath+0xc7/0x1d0
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa216715f>] mutex_lock+0x1f/0x2f
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08bdb58>] afs_osi_TimedSleep+0x118/0x210 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ad6b60>] ? wake_up_state+0x20/0x20
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08bdce8>] afs_osi_Wait+0x98/0xd0 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc0853b08>] afs_CheckServerDaemon+0x118/0x1a0 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c7930>] afsd_thread+0x470/0x730 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffc08c74c0>] ? afs_shutdown_pagecopy+0x20/0x20 [openafs]
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1da1>] kthread+0xd1/0xe0
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa1ac1cd0>] ? insert_kthread_work+0x40/0x40
> Dec 6 15:03:18 bws0825 kernel: [<ffffffffa2175c1d>] ret_from_fork_nospec_begin+0x7/0x21
>
Is there an I/O-intensive process running in the background?
Is there a process that uses too much RAM?
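
One way to answer these two questions on the affected node is to rank processes by resident memory and by cumulative read/write bytes, using the Linux /proc files. The following is only a minimal sketch and not an OpenAFS tool: the helper names (parse_vmrss_kb, parse_io_bytes, top_processes) are made up for illustration, and reading /proc/<pid>/io for other users' processes usually requires root.

```python
import os


def parse_vmrss_kb(status_text):
    """Return the VmRSS value in kB from /proc/<pid>/status text, or 0 if absent."""
    for line in status_text.splitlines():
        if line.startswith("VmRSS:"):
            # The line looks like "VmRSS:      1234 kB".
            return int(line.split()[1])
    return 0  # kernel threads have no VmRSS line


def parse_io_bytes(io_text):
    """Return read_bytes + write_bytes from /proc/<pid>/io text."""
    total = 0
    for line in io_text.splitlines():
        key, _, value = line.partition(":")
        if key in ("read_bytes", "write_bytes"):
            total += int(value)
    return total


def read_text(path):
    """Read a /proc file, returning '' if it vanished or is not readable."""
    try:
        with open(path) as handle:
            return handle.read()
    except OSError:
        return ""


def top_processes(limit=5, proc_root="/proc"):
    """Return (by_rss, by_io): the top `limit` (value, pid, name) tuples per metric."""
    if not os.path.isdir(proc_root):
        return [], []  # not a Linux system, nothing to scan
    rows = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # only numeric directories are processes
        base = os.path.join(proc_root, entry)
        name = read_text(os.path.join(base, "comm")).strip()
        rss = parse_vmrss_kb(read_text(os.path.join(base, "status")))
        io_bytes = parse_io_bytes(read_text(os.path.join(base, "io")))
        rows.append((rss, io_bytes, int(entry), name))
    by_rss = sorted(((r, p, n) for r, _, p, n in rows), reverse=True)[:limit]
    by_io = sorted(((i, p, n) for _, i, p, n in rows), reverse=True)[:limit]
    return by_rss, by_io


if __name__ == "__main__":
    by_rss, by_io = top_processes()
    print("Top resident memory (kB):")
    for rss, pid, name in by_rss:
        print(f"  {rss:>12} {pid:>7} {name}")
    print("Top cumulative I/O (bytes):")
    for io_bytes, pid, name in by_io:
        print(f"  {io_bytes:>12} {pid:>7} {name}")
```

The I/O figures are totals since each process started, so running the script twice a few seconds apart and comparing the numbers gives a better picture of current activity. The read_bytes and write_bytes counters reflect storage-layer traffic, so a process that is busy purely over the network may not stand out here.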
>
>
> Is 1.6.23 incompatible with the Linux kernel or the AFS
> server version?
>
SL7 has kernel 3.10, which is supported since AFS 1.6.4.
SL6 has kernel 2.6, which was already supported before AFS 1.6.
Since AFS 1.6.22.4, support for kernels up to 4.18 is included.
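
To compare a node's running kernel against the upper bound mentioned above, a small check like the following can be used. It is a sketch under stated assumptions: the only data it encodes is the "up to 4.18" figure from this message, not an official OpenAFS support matrix, and the function names and the example release strings are illustrative.

```python
import platform
import re

# Upper bound taken from the statement above (AFS 1.6.22.4 and later);
# this is an assumption for illustration, not an authoritative support list.
MAX_SUPPORTED_KERNEL = (4, 18)


def kernel_major_minor(release):
    """Extract (major, minor) from a release string such as '3.10.0-1062.el7.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    return int(match.group(1)), int(match.group(2))


def kernel_within_bound(release, upper=MAX_SUPPORTED_KERNEL):
    """Return True if the kernel's (major, minor) does not exceed the upper bound."""
    return kernel_major_minor(release) <= upper


if __name__ == "__main__":
    release = platform.release()  # same string that `uname -r` prints on Linux
    try:
        print(release, "within bound:", kernel_within_bound(release))
    except ValueError as err:
        print(err)
```

A 3.10 kernel passes this check, which is consistent with the point that the kernel version by itself is unlikely to explain the hang.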
>
> Any information you can provide would be appreciated. Thanks.
>
>
> Regards,
> Qiulan
>
>
> ------------------------------------------------------------------------
> huangql
> ====================================================================
> Computing center,the Institute of High Energy Physics, CAS, China
> Qiulan Huang Tel: (+86) 10 8823 6087
> P.O. Box 918-7 Fax: (+86) 10 8823 6839
> Beijing 100049 P.R. China Email: huangql@ihep.ac.cn
> ===================================================================
>