[OpenAFS] Solaris AFS client down - why does this happen

Karl Behler karl.behler@ipp.mpg.de
Fri, 16 Oct 2015 16:46:29 +0200


Dear All,

We are experiencing unwanted "shutdown" events on our OpenAFS 1.6.9
clients under Solaris 10.

We have been running this client since October last year without
problems on ten Solaris desktop servers, which reboot regularly on
weekends. Recently, however, nearly half of these servers had something
like a crash in the middle of the week.

The log file (/var/adm/messages) contains kernel messages that look
like a shutdown which seems to be initiated by afsd itself.
(In the following log, the actual event starts at Oct 16 11:54:47.)

Oct 16 11:35:39 sxaug37 genunix: [ID 900631 kern.notice] afs: byte-range lock/unlock ignored; make sure no one else is running this program (pid 23006 (thunderbird-bin), user 13471, fid 1108706165.12934.344145).
Oct 16 11:39:23 sxaug37 genunix: [ID 900631 kern.notice] afs: byte-range lock/unlock ignored; make sure no one else is running this program (pid 22054 (firefox-bin), user 6570, fid 1108604831.175334.13229850).
Oct 16 11:49:23 sxaug37 last message repeated 1 time
Oct 16 11:54:47 sxaug37 genunix: [ID 146023 kern.notice] afs: WARM
Oct 16 11:54:47 sxaug37 genunix: [ID 510892 kern.notice] shutting down of: vcaches...
Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x28e2f840
Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x2924b960
Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x28114c00
Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x27d49000
... several hundred similar messages
Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x2811dbc0
Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x28a53c60
Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x27e10460
Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x289fad40
Oct 16 11:54:47 sxaug37 genunix: [ID 364168 kern.notice] BkG...
Oct 16 11:54:47 sxaug37 genunix: [ID 338304 kern.notice] CB...
Oct 16 11:54:47 sxaug37 genunix: [ID 543876 kern.notice] afs...
Oct 16 11:54:47 sxaug37 genunix: [ID 229921 kern.notice] CTrunc...
Oct 16 11:54:47 sxaug37 genunix: [ID 916331 kern.notice] AFSDB...
Oct 16 11:54:47 sxaug37 genunix: [ID 196290 kern.notice] RxEvent...
Oct 16 11:54:48 sxaug37 genunix: [ID 687192 kern.notice] UnmaskRxkSignals...
Oct 16 11:54:48 sxaug37 genunix: [ID 346748 kern.notice] RxListener...
Oct 16 11:54:48 sxaug37 genunix: [ID 890369 kern.notice] NetIfPoller...
Oct 16 11:54:48 sxaug37 genunix: [ID 288918 kern.notice] WARNING: not all blocks freed: large 0 small 217
Oct 16 11:54:48 sxaug37 genunix: [ID 646860 kern.notice]  ALL allocated tables...
Oct 16 11:54:48 sxaug37 genunix: [ID 773001 kern.notice] done
Oct 16 11:58:24 sxaug37 genunix: [ID 540533 kern.notice] ^MSunOS Release 5.10 Version Generic_150401-28 64-bit
Oct 16 11:58:24 sxaug37 genunix: [ID 282658 kern.notice] Copyright (c) 1983, 2015, Oracle and/or its affiliates. All rights reserved.
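
To compare the affected servers, the events can be pulled out of
/var/adm/messages with a small script. The following is only a minimal
sketch: it assumes that an event always starts with the "afs: WARM"
line and ends with the "done" line, as in the excerpt above. The names
find_afs_shutdowns, Shutdown and SAMPLE are made up for illustration
and are not part of OpenAFS.

```python
import re
from collections import namedtuple

# Solaris syslog line as seen above:
#   "Oct 16 11:54:47 host source: [ID n facility.level] message"
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s+(?P<host>\S+)\s+"
    r"(?P<src>[^:]+):\s+\[ID \d+ [\w.]+\]\s?(?P<msg>.*)$"
)

# Hypothetical record type: one entry per observed shutdown sequence.
Shutdown = namedtuple("Shutdown", "timestamp host failed_flushes")


def find_afs_shutdowns(lines):
    """Return one Shutdown per "afs: WARM" line, counting the
    "Failed to flush vcache" messages that follow it up to "done".

    Assumption: the start/end markers are the ones in the log excerpt.
    """
    events = []
    current = None  # [timestamp, host, failed_flushes] of the open event
    for line in lines:
        m = LINE_RE.match(line.rstrip("\n"))
        if not m:
            continue  # e.g. "last message repeated 1 time"
        msg = m.group("msg").strip()
        if msg.startswith("afs: WARM"):
            current = [m.group("ts"), m.group("host"), 0]
            events.append(current)
        elif current is not None and msg.startswith("Failed to flush vcache"):
            current[2] += 1
        elif current is not None and msg == "done":
            current = None  # shutdown sequence finished
    return [Shutdown(*e) for e in events]


# A few lines copied from the excerpt above, used as sample input.
SAMPLE = [
    "Oct 16 11:49:23 sxaug37 last message repeated 1 time",
    "Oct 16 11:54:47 sxaug37 genunix: [ID 146023 kern.notice] afs: WARM",
    "Oct 16 11:54:47 sxaug37 genunix: [ID 510892 kern.notice] shutting down of: vcaches...",
    "Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x28e2f840",
    "Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x2924b960",
    "Oct 16 11:54:48 sxaug37 genunix: [ID 773001 kern.notice] done",
]

for event in find_afs_shutdowns(SAMPLE):
    print(event.timestamp, event.host, event.failed_flushes)
```

For a real log, the lines of an opened /var/adm/messages file can be
passed in place of SAMPLE.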

Sometimes the system reboots immediately, and sometimes it stays in a
state where all attempts to access AFS end with an I/O error.

Any idea what is happening and what we can do about it?

Best regards,

Karl

-- 
Dr. Karl Behler	
CODAC & IT services ASDEX Upgrade
phon +49 89 3299-1351 fax 3299-961351