[OpenAFS] Solaris AFS client down - why does this happen

Mark Vitale mvitale@sinenomine.net
Wed, 28 Oct 2015 18:38:35 +0000


Hi Karl,

On Oct 16, 2015, at 10:46 AM, Karl Behler <karl.behler@ipp.mpg.de> wrote:

> we experience unwanted "shutdown" events of our OpenAFS 1.6.9 clients under Solaris 10.
>
> Running this client since October last year without problems on ten Solaris desktop servers which reboot regularly on weekends, we recently had kind of crashes on nearly half of these servers in the middle of a week.
>
> The log file (/var/adm/messages) contains kernel messages which look like a shutdown which seems to be initiated by the afsd itself.
> (In the following log the real event starts at Oct 16 11:54:47)
>
> Oct 16 11:35:39 sxaug37 genunix: [ID 900631 kern.notice] afs: byte-range lock/unlock ignored; make sure no one else is running this program (pid 23006 (thunderbird-bin), user 13471, fid 1108706165.12934.344145).
> Oct 16 11:39:23 sxaug37 genunix: [ID 900631 kern.notice] afs: byte-range lock/unlock ignored; make sure no one else is running this program (pid 22054 (firefox-bin), user 6570, fid 1108604831.175334.13229850).
> Oct 16 11:49:23 sxaug37 last message repeated 1 time
> Oct 16 11:54:47 sxaug37 genunix: [ID 146023 kern.notice] afs: WARM
> Oct 16 11:54:47 sxaug37 genunix: [ID 510892 kern.notice] shutting down of: vcaches...
> Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x28e2f840
> Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x2924b960
> Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x28114c00
> Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x27d49000
> ... several hundred similar messages
> Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x2811dbc0
> Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x28a53c60
> Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x27e10460
> Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x289fad40
> Oct 16 11:54:47 sxaug37 genunix: [ID 364168 kern.notice] BkG...
> Oct 16 11:54:47 sxaug37 genunix: [ID 338304 kern.notice] CB...
> Oct 16 11:54:47 sxaug37 genunix: [ID 543876 kern.notice] afs...
> Oct 16 11:54:47 sxaug37 genunix: [ID 229921 kern.notice] CTrunc...
> Oct 16 11:54:47 sxaug37 genunix: [ID 916331 kern.notice] AFSDB...
> Oct 16 11:54:47 sxaug37 genunix: [ID 196290 kern.notice] RxEvent...
> Oct 16 11:54:48 sxaug37 genunix: [ID 687192 kern.notice] UnmaskRxkSignals...
> Oct 16 11:54:48 sxaug37 genunix: [ID 346748 kern.notice] RxListener...
> Oct 16 11:54:48 sxaug37 genunix: [ID 890369 kern.notice] NetIfPoller...
> Oct 16 11:54:48 sxaug37 genunix: [ID 288918 kern.notice] WARNING: not all blocks freed: large 0 small 217
> Oct 16 11:54:48 sxaug37 genunix: [ID 646860 kern.notice]  ALL allocated tables...
> Oct 16 11:54:48 sxaug37 genunix: [ID 773001 kern.notice] done
> Oct 16 11:58:24 sxaug37 genunix: [ID 540533 kern.notice] ^MSunOS Release 5.10 Version Generic_150401-28 64-bit
> Oct 16 11:58:24 sxaug37 genunix: [ID 282658 kern.notice] Copyright (c) 1983, 2015, Oracle and/or its affiliates. All rights reserved.
>
> Sometimes the system reboots immediately and sometimes the system stays in a state where all attempts to access AFS end with I/O Error.
>
> Any idea what happens and what to do?

An afsd WARM shutdown is triggered automatically when /afs is unmounted, i.e. by '# umount /afs'.
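
To see when each of these shutdowns started (and so what else was running on the host at that moment), it may help to pull the "afs: WARM" lines out of /var/adm/messages. Below is a minimal Python sketch, assuming the syslog line layout shown in the quoted log. The function name find_warm_shutdowns and the regular expression are illustrative only and are not part of OpenAFS.

```python
import re

# Marker logged by the AFS client at the start of a warm shutdown,
# copied from the quoted /var/adm/messages excerpt.
WARM_MARKER = "afs: WARM"

# Assumed syslog layout: "Mon DD HH:MM:SS host rest-of-message".
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(\S+)\s+(.*)$")


def find_warm_shutdowns(lines):
    """Return (timestamp, host) for every line containing the WARM marker."""
    events = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and WARM_MARKER in match.group(3):
            events.append((match.group(1), match.group(2)))
    return events


# Sample input taken from the quoted log.
sample = [
    "Oct 16 11:49:23 sxaug37 last message repeated 1 time",
    "Oct 16 11:54:47 sxaug37 genunix: [ID 146023 kern.notice] afs: WARM",
    "Oct 16 11:54:47 sxaug37 genunix: [ID 510892 kern.notice] shutting down of: vcaches...",
]

for timestamp, host in find_warm_shutdowns(sample):
    print(f"{host}: AFS warm shutdown began at {timestamp}")
```

On a real host, the open /var/adm/messages file object could be passed in place of the sample list. The timestamps can then be compared against cron jobs or other activity that might unmount /afs.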


Regards,
--
Mark Vitale
Sine Nomine Associates