[OpenAFS] mysterious freezes

Lee Damon nomad@ssli-mail.ee.washington.edu
Tue, 21 Oct 2003 09:34:48 -0700


We've been having a lot of mysterious deaths/freezes over the past several
months.  They dropped off for a while, but are back with a vengeance of late.

The symptoms are mixed.  Some systems die with the screen blanker on
(no hope of seeing what's on the console), some die with the screen
saver completely frozen (if they were running X), some die scrolling
stuff on the console so fast there is no hope of reading any of it, etc.
They are frequently pingable, and when you telnet to a port (22, for
example) you get a connection, but no banner.
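
For anyone who wants to check their own hosts for the same state, here is a
rough Python sketch of the port test described above.  The function name
probe_banner and the state labels it returns are made up for illustration;
it only assumes that a healthy service (sshd on port 22, say) sends a banner
shortly after the connection is accepted.

```python
import socket


def probe_banner(host, port=22, timeout=5.0):
    """Classify a TCP service as 'unreachable', 'no-banner', 'closed' or 'banner'.

    Returns a (state, text) tuple; text is the banner or an error description.
    The state names are illustrative labels, not any standard.
    """
    try:
        # The kernel can complete the TCP handshake even when the
        # userland daemon is wedged, so a successful connect proves little.
        sock = socket.create_connection((host, port), timeout=timeout)
    except OSError as exc:
        return ("unreachable", str(exc))
    with sock:
        sock.settimeout(timeout)
        try:
            data = sock.recv(256)
        except socket.timeout:
            # Connected, but nothing was sent: the symptom described above.
            return ("no-banner", "")
        except OSError as exc:
            return ("closed", str(exc))
    if not data:
        # The peer closed the connection without sending anything.
        return ("closed", "")
    return ("banner", data.decode("ascii", "replace").strip())
```

Calling probe_banner("tahoe5", 22) against a host in the frozen state should
come back as ("no-banner", ""), while a healthy host returns its SSH version
string.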

None of the hosts respond to <ctrl><alt><del>.

Nothing is ever logged.

Today one died in such a state that the screen blanker was still working.
I was able to read the screen for once:

[<f8a33388>] phTable [libafs-2.4.20-18.7-i386.nikola.mp] 0xa8 (0xf3eebe54))
[<c012d8fc>] do_no_page [kernel] 0x3c (0xf3eebe5c))
[<f89c17a8>] afs_dir_GetBlob [libafs-2.4.20-18.7-i386.nikola.mp] 0x18 (0xf3eebe74))
[<f89dc36e>] BlobScan [libafs-2.4.20-18.7-i386.nikola.mp] 0x2e (0xf3eebe94))
[<f89c7195>] PagInCred [libafs-2.4.20-18.7-i386.nikola.mp] 0x35 (0xf3eebea0))
[<f89d7ed9>] afs_EvalFakeStat [libafs-2.4.20-18.7-i386.nikola.mp] 0x19 (0xf3eebec0))
[<f89e0a4d>] afs_close [libafs-2.4.20-18.7-i386.nikola.mp] 0x3bd (0xf3eebee0))
[<c0150ade>] getname [kernel] 0x5e (0xf3eebf0c))
[<c0151c7b>] path_lookup [kernel] 0x1b (0xf3eebf20))
[<c0151f74>] __user_walk [kernel] 0x24 (0xf3eebf30))
[<c014dda7>] vfs_stat [kernel] 0x17 (0xf3eebf44))
[<c0147314>] fput [kernel] 0xd4 (0xf3eebf64))
[<c014e351>] sys_stat64 [kernel] 0x11 (0xf3eebf70))
[<c0145df5>] filp_close [kernel] 0x95 (0xf3eebf8c))
[<c0145e5b>] sys_close [kernel] 0x5b (0xf3eebfb0))
[<c0108be3>] system_call [kernel] 0x33 (0xf3eebfc0))

About the host in question:

AFS version 1.2.10
Linux tahoe5 2.4.20-20.7smp #1 SMP Mon Aug 18 14:46:14 EDT 2003 i686 unknown


I'd appreciate any hints.

thanks,
nomad
 -----------                       - Lee "nomad" Damon -          \
work: nomad@ee.washington.edu                                      \
play: nomad@castle.org    or castle!nomad                           \
                                                                    /\
Sr. Systems Admin, UWEE SSLI Lab                                   /  \
                "Celebrate Diversity"                             /    \