[OpenAFS] disk cache read error in CacheItems
Benjamin Kaduk
kaduk@mit.edu
Tue, 23 Oct 2018 21:27:42 -0500
On Tue, Oct 23, 2018 at 02:14:38PM +0200, Stephan Wiesand wrote:
>
> > On 23. Oct 2018, at 12:16, Andreas Ladanyi <andreas.ladanyi@kit.edu> wrote:
> >
> >> In the last few days we've observed an increasing number of nodes
> >> that can no longer be reached and have to be rebooted.
> >>
> >> In the /var/log/messages we see a lot of lines with e.g.
> >>
> >> Oct 22 18:48:26 bird858 kernel: afs: disk cache read error in
> >> CacheItems slot 25254 off 2020340/13880020 code -5/80
> >> Oct 22 18:48:26 bird858 kernel: afs: disk cache read error in
> >> CacheItems slot 25253 off 2020260/13880020 code -5/80
> >> Oct 22 18:48:26 bird858 kernel: afs: disk cache read error in
> >> CacheItems slot 25252 off 2020180/13880020 code -5/80
> >> Oct 22 18:48:26 bird858 kernel: afs: disk cache read error in
> >> CacheItems slot 25251 off 2020100/13880020 code -5/80
> >>
> >> till nothing happens anymore ...
> >>
> >> The clients are CentOS 7.5, 3.10.0-862.14.4.el7.x86_64, OpenAFS
> >> 1.6.23 built 2018-09-12 (289.sl7.862.11.6@fnal.gov)
> >>
> >> Any hints for the possible reason ?
> >
> > I have the same constellation with AFS 1.6.23 client from jsbilling repo.
> >
> > I can't see these messages in /var/log/messages yet.
>
> We're running the same kernel version and the same client build (it's the SL one) on a fair number of SL 7.4 systems, and don't see these issues either.
>
> -5 is EIO, meaning an actual I/O error is reported.
>
> What's the size and type of the cache filesystems? What does "fs getcache report"? What are the afsd parameters? Could these nodes be out of space or inodes for the cache?
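For anyone wanting to pull the numbers out of those kernel messages, here is a minimal sketch in Python. The regular expression and the field names (slot, off, size, code, second) are my own labels, not anything defined by OpenAFS; it assumes the message is on one line in the real log, and it only decodes the negative code into an errno name. In the four lines quoted above, off equals slot * 80 + 20 each time, which would be consistent with 80-byte entries after a 20-byte header, but that reading is an inference from the numbers and not something confirmed in this thread.

```python
import errno
import re

# Pattern for the "disk cache read error" message quoted in this thread.
# The group names are illustrative labels, not OpenAFS terminology.
PATTERN = re.compile(
    r"disk cache read error in\s+CacheItems slot (?P<slot>\d+) "
    r"off (?P<off>\d+)/(?P<size>\d+) code (?P<code>-?\d+)/(?P<second>\d+)"
)


def parse_cache_error(line):
    """Return the numeric fields of one log line, or None if it does not match."""
    match = PATTERN.search(line)
    if match is None:
        return None
    fields = {name: int(value) for name, value in match.groupdict().items()}
    # Kernel code returns negative errno values, so -5 maps to errno 5.
    fields["errno_name"] = errno.errorcode.get(-fields["code"], "unknown")
    return fields


if __name__ == "__main__":
    sample = ("Oct 22 18:48:26 bird858 kernel: afs: disk cache read error in "
              "CacheItems slot 25254 off 2020340/13880020 code -5/80")
    print(parse_cache_error(sample))
```

Running it over a whole /var/log/messages would show whether the failing slots are contiguous (as they are in the excerpt) or scattered.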
It's also possible that the actual disk is having trouble, and/or got
remounted RO. dmesg and/or syslog might have some clues.
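A quick way to check the read-only and out-of-space/out-of-inodes possibilities from userspace is something like the sketch below. It only uses os.statvfs from the Python standard library; the function name and the default path are assumptions on my part (substitute whatever cache directory your afsd is actually configured with), and it says nothing about the health of the underlying disk, for which dmesg is still the place to look.

```python
import os


def cache_fs_status(path):
    """Report read-only state and free space/inodes for the filesystem at path."""
    st = os.statvfs(path)
    return {
        # ST_RDONLY is set in f_flag when the filesystem is mounted read-only.
        "read_only": bool(st.f_flag & os.ST_RDONLY),
        # Space available to unprivileged users, in bytes.
        "free_bytes": st.f_bavail * st.f_frsize,
        # Inodes available to unprivileged users (may be 0 on some filesystems
        # that do not report inode counts).
        "free_inodes": st.f_favail,
    }


if __name__ == "__main__":
    # Hypothetical cache location; adjust to match the local afsd configuration.
    cache_dir = "/usr/vice/cache"
    if os.path.isdir(cache_dir):
        print(cache_fs_status(cache_dir))
    else:
        print("cache directory not found:", cache_dir)
```

If read_only comes back True, or free_bytes/free_inodes is at or near zero, that would line up with the EIO being reported.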
(Interestingly enough, we had some changes go by recently on master to make
the error handling for certain cases in this same class more graceful (i.e.,
fail requests but not panic), though those changes are not in 1.6.23.)
-Ben