[OpenAFS] disk cache read error in CacheItems

Stephan Wiesand stephan.wiesand@desy.de
Tue, 23 Oct 2018 14:14:38 +0200


> On 23. Oct 2018, at 12:16, Andreas Ladanyi <andreas.ladanyi@kit.edu> wrote:
>
>> In the last few days we've observed an increasing number of nodes
>> which can no longer be reached and have to be rebooted.
>>
>> In /var/log/messages we see a lot of lines such as:
>>
>> Oct 22 18:48:26 bird858 kernel: afs: disk cache read error in
>> CacheItems slot 25254 off 2020340/13880020 code -5/80
>> Oct 22 18:48:26 bird858 kernel: afs: disk cache read error in
>> CacheItems slot 25253 off 2020260/13880020 code -5/80
>> Oct 22 18:48:26 bird858 kernel: afs: disk cache read error in
>> CacheItems slot 25252 off 2020180/13880020 code -5/80
>> Oct 22 18:48:26 bird858 kernel: afs: disk cache read error in
>> CacheItems slot 25251 off 2020100/13880020 code -5/80
>>
>> till nothing happens anymore ...
>>
>> The clients are CentOS 7.5, 3.10.0-862.14.4.el7.x86_64, OpenAFS
>> 1.6.23 built 2018-09-12 (289.sl7.862.11.6@fnal.gov)
>>
>> Any hints as to the possible reason?
>
> I have the same setup with the AFS 1.6.23 client from the jsbilling
> repo.
>
> I can't see these messages in /var/log/messages yet.

We're running the same kernel version and the same client build (it's
the SL one) on a fair number of SL 7.4 systems, and don't see these
issues either.

-5 is EIO, meaning an actual I/O error is reported.
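
To make that mapping concrete, here is a minimal sketch that pulls the numbers out of the kernel message quoted above and translates the code into an errno name. The regular expression and the field names are my own illustration, not taken from the OpenAFS source, and the thread does not say what the second number in each "a/b" pair means, so both are simply kept.

```python
import errno
import os
import re

# Pattern for the kernel message quoted in this thread, unwrapped onto
# one line. Field names are illustrative assumptions only.
LINE_RE = re.compile(
    r"afs: disk cache read error in CacheItems "
    r"slot (?P<slot>\d+) "
    r"off (?P<off_first>\d+)/(?P<off_second>\d+) "
    r"code (?P<code_first>-?\d+)/(?P<code_second>\d+)"
)


def parse_cache_error(line):
    """Return the numeric fields of a CacheItems read-error line, or None."""
    match = LINE_RE.search(line)
    if match is None:
        return None
    fields = {name: int(value) for name, value in match.groupdict().items()}
    # Kernel code reports errors as negative errno values, so -5 is errno 5.
    fields["errno_name"] = errno.errorcode.get(-fields["code_first"], "unknown")
    return fields


sample = (
    "Oct 22 18:48:26 bird858 kernel: afs: disk cache read error in "
    "CacheItems slot 25254 off 2020340/13880020 code -5/80"
)
info = parse_cache_error(sample)
print(info["slot"], info["errno_name"], os.strerror(-info["code_first"]))
```

Running something like this over /var/log/messages would show whether every affected slot reports the same code or whether other errors are mixed in.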

What are the size and type of the cache filesystems? What does
"fs getcache" report? What are the afsd parameters? Could these nodes
be out of space or inodes for the cache?
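
For the space and inode question, a rough sketch along these lines would do; it only uses os.statvfs from the Python standard library. The cache path is an assumption on my part (use whatever directory afsd is configured with on your nodes), and the function name is made up for illustration.

```python
import os


def cache_fs_usage(path):
    """Report block and inode usage for the filesystem holding `path`."""
    st = os.statvfs(path)

    def pct_used(total, free):
        # Some filesystems report 0 total inodes; avoid dividing by zero.
        return None if total == 0 else 100.0 * (total - free) / total

    return {
        "size_bytes": st.f_blocks * st.f_frsize,
        "avail_bytes": st.f_bavail * st.f_frsize,
        "space_used_pct": pct_used(st.f_blocks, st.f_bfree),
        "inodes_total": st.f_files,
        "inodes_free": st.f_ffree,
        "inodes_used_pct": pct_used(st.f_files, st.f_ffree),
    }


if __name__ == "__main__":
    # Hypothetical cache location; replace with the directory afsd uses.
    cache_dir = "/var/cache/openafs"
    if os.path.isdir(cache_dir):
        for key, value in cache_fs_usage(cache_dir).items():
            print(f"{key}: {value}")
```

If either percentage is close to 100 on the affected nodes, that would be worth mentioning along with the answers to the questions above.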

-- 
Stephan Wiesand
DESY -DV-
Platanenallee 6
15738 Zeuthen, Germany