[OpenAFS] Solaris AFS client down - why does this happen

Karl Behler karl.behler@ipp.mpg.de
Wed, 04 Nov 2015 17:50:03 +0100


Dear Mark and Ben,

Thanks for your response. We could not determine which component in our 
system may have caused the "umount", but it has not happened again since. 
I think we will move to a newer version of the client and then see what 
happens.

Best regards,

Karl

On 28.10.15 19:38, Mark Vitale wrote:
> Hi Karl,
>
> On Oct 16, 2015, at 10:46 AM, Karl Behler <karl.behler@ipp.mpg.de> wrote:
>
>> We are experiencing unwanted "shutdown" events on our OpenAFS 1.6.9 clients under Solaris 10.
>>
>> We have been running this client without problems since October last year on ten Solaris desktop servers, which reboot regularly on weekends. Recently, nearly half of these servers suffered crash-like failures in the middle of the week.
>>
>> The log file (/var/adm/messages) contains kernel messages that look like a shutdown, apparently initiated by afsd itself.
>> (In the following log the real event starts at Oct 16 11:54:47)
>>
>> Oct 16 11:35:39 sxaug37 genunix: [ID 900631 kern.notice] afs: byte-range lock/unlock ignored; make sure no one else is running this program (pid 23006 (thunderbird-bin), user 13471, fid 1108706165.12934.344145).
>> Oct 16 11:39:23 sxaug37 genunix: [ID 900631 kern.notice] afs: byte-range lock/unlock ignored; make sure no one else is running this program (pid 22054 (firefox-bin), user 6570, fid 1108604831.175334.13229850).
>> Oct 16 11:49:23 sxaug37 last message repeated 1 time
>> Oct 16 11:54:47 sxaug37 genunix: [ID 146023 kern.notice] afs: WARM
>> Oct 16 11:54:47 sxaug37 genunix: [ID 510892 kern.notice] shutting down of: vcaches...
>> Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x28e2f840
>> Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x2924b960
>> Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x28114c00
>> Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x27d49000
>> ... several hundred similar messages
>> Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x2811dbc0
>> Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x28a53c60
>> Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x27e10460
>> Oct 16 11:54:47 sxaug37 genunix: [ID 159345 kern.notice] Failed to flush vcache 0x289fad40
>> Oct 16 11:54:47 sxaug37 genunix: [ID 364168 kern.notice] BkG...
>> Oct 16 11:54:47 sxaug37 genunix: [ID 338304 kern.notice] CB...
>> Oct 16 11:54:47 sxaug37 genunix: [ID 543876 kern.notice] afs...
>> Oct 16 11:54:47 sxaug37 genunix: [ID 229921 kern.notice] CTrunc...
>> Oct 16 11:54:47 sxaug37 genunix: [ID 916331 kern.notice] AFSDB...
>> Oct 16 11:54:47 sxaug37 genunix: [ID 196290 kern.notice] RxEvent...
>> Oct 16 11:54:48 sxaug37 genunix: [ID 687192 kern.notice] UnmaskRxkSignals...
>> Oct 16 11:54:48 sxaug37 genunix: [ID 346748 kern.notice] RxListener...
>> Oct 16 11:54:48 sxaug37 genunix: [ID 890369 kern.notice] NetIfPoller...
>> Oct 16 11:54:48 sxaug37 genunix: [ID 288918 kern.notice] WARNING: not all blocks freed: large 0 small 217
>> Oct 16 11:54:48 sxaug37 genunix: [ID 646860 kern.notice]  ALL allocated tables...
>> Oct 16 11:54:48 sxaug37 genunix: [ID 773001 kern.notice] done
>> Oct 16 11:58:24 sxaug37 genunix: [ID 540533 kern.notice] ^MSunOS Release 5.10 Version Generic_150401-28 64-bit
>> Oct 16 11:58:24 sxaug37 genunix: [ID 282658 kern.notice] Copyright (c) 1983, 2015, Oracle and/or its affiliates. All rights reserved.
>>
>> Sometimes the system reboots immediately; at other times it stays in a state where every attempt to access AFS fails with an I/O error.
>>
>> Any idea what is happening and what we can do about it?
> An afsd WARM shutdown is triggered automatically when /afs is unmounted, i.e. by '# umount /afs'.
>
>
> Regards,
> --
> Mark Vitale
> Sine Nomine Associates
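
For anyone who wants to check their own clients for the same event, here
is a minimal sketch that scans syslog-style lines for the "afs: WARM"
marker and counts the "Failed to flush vcache" messages that follow it.
It assumes the line layout shown in the quoted log above; the function
name find_warm_shutdowns and the dictionary keys are hypothetical, chosen
only for illustration, and this is not part of OpenAFS.

```python
import re

# Strings taken from the quoted /var/adm/messages excerpt.
WARM_MARKER = "afs: WARM"
FLUSH_FAILURE = "Failed to flush vcache"

# Assumed syslog prefix: "Oct 16 11:54:47 sxaug37 <message>"
LINE_RE = re.compile(r"^(\w{3}\s+\d+\s+\d{2}:\d{2}:\d{2})\s+(\S+)\s+(.*)$")


def find_warm_shutdowns(lines):
    """Return one dict per WARM shutdown found in an iterable of log lines.

    Each dict holds the timestamp and host of the "afs: WARM" line and the
    number of flush failures logged after it (up to the next WARM line).
    """
    events = []
    current = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that do not look like syslog entries
        timestamp, host, message = match.groups()
        if WARM_MARKER in message:
            current = {"time": timestamp, "host": host, "flush_failures": 0}
            events.append(current)
        elif current is not None and FLUSH_FAILURE in message:
            current["flush_failures"] += 1
    return events
```

The function accepts any iterable of lines, so an open file handle on
/var/adm/messages can be passed directly. Comparing the reported
timestamps against cron jobs, package scripts or admin sessions may help
narrow down what issued the unmount.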


-- 
Dr. Karl Behler	
CODAC & IT services ASDEX Upgrade
phon +49 89 3299-1351 fax 3299-961351