[OpenAFS] mysterious freezes

Derrick J Brashear shadow@dementia.org
Tue, 21 Oct 2003 12:46:44 -0400 (EDT)


On Tue, 21 Oct 2003, Lee Damon wrote:

> We've been having a lot of mysterious deaths/freezes over the past several
> months.  It dropped off for a while, but is back with a vengeance of late.
>
> The symptoms are mixed.  Some systems die with the screen blanker on
> (no hope of seeing what's on the console), some die with the screen
> saver completely frozen (if they were running X), some die scrolling
> stuff on the console so fast there is no hope of reading any of it, etc.
> They are frequently pingable, and when you telnet to a port (22, for
> example) you get a connection, but no banner.
>
> None of the hosts respond to <ctrl><alt><del>.
>
> Nothing is ever logged.
>
> Today one died in such a state that the screen blanker was still working.
> I was able to read the screen for once:
>
> [<f8a33388>] phTable [libafs-2.4.20-18.7-i386.nikola.mp] 0xa8 (0xf3eebe54))
> [<c012d8fc>] do_no_page [kernel] 0x3c (0xf3eebe5c))
> [<f89c17a8>] afs_dir_GetBlob [libafs-2.4.20-18.7-i386.nikola.mp] 0x18 (0xf3eebe74))
> [<f89dc36e>] BlobScan [libafs-2.4.20-18.7-i386.nikola.mp] 0x2e (0xf3eebe94))
> [<f89c7195>] PagInCred [libafs-2.4.20-18.7-i386.nikola.mp] 0x35 (0xf3eebea0))

PagInCred would be called from either InitReq or the NFS exporter handler;
I'd guess the former. BlobScan would be called from either readdir or
BulkStat. Thing is, that sequence doesn't make sense. If you compiled this
yourself, can you pass --enable-debug-kernel at configure time (to remove
-fomit-frame-pointer) to give the backtrace a chance of being more
meaningful? The stack trace you have looks tainted.
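
If it helps while comparing traces, here is a rough Python sketch for
pulling the frames out of console output like the lines quoted here. It
assumes each line has the form "[<address>] symbol [module] offset (stack
slot)", which is only my reading of the fields; parse_frames and by_module
are made-up names, not part of OpenAFS or the kernel. Lines that don't fit
the pattern (for example a mistyped hex value) are skipped.

```python
import re

# Assumed line shape (my reading of the quoted trace, not a documented format):
#   [<c012d8fc>] do_no_page [kernel] 0x3c (0xf3eebe5c))
FRAME_RE = re.compile(
    r"\[<([0-9a-f]+)>\]\s+(\S+)\s+\[([^\]]+)\]\s+0x([0-9a-f]+)\s+\(0x([0-9a-f]+)\)"
)


def parse_frames(text):
    """Return (stack_slot, module, symbol, offset) tuples in printed order."""
    frames = []
    for line in text.splitlines():
        m = FRAME_RE.search(line)  # search() tolerates a leading "> " quote marker
        if m is None:
            continue  # not a frame line, or a mistyped one
        _addr, symbol, module, offset, slot = m.groups()
        frames.append((int(slot, 16), module, symbol, int(offset, 16)))
    return frames


def by_module(frames):
    """Group symbol names by module, keeping the printed order."""
    groups = {}
    for _slot, module, symbol, _offset in frames:
        groups.setdefault(module, []).append(symbol)
    return groups
```

Feeding it the saved console text and printing by_module(parse_frames(text))
shows at a glance which symbols belong to libafs and which to the kernel.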

> [<f89d7ed9>] afs_EvalFakeStat [libafs-2.4.20-18.7-i386.nikola.mp] 0x19 (0xf3eebec0))
> [<f89e0a4d>] afs_close [libafs-2.4.20-18.7-i386.nikola.mp] 0x3bd (0xf3eebee0))
> [<c0150ade>] getname [kernel] 0x5e (0xf3eebf0c))
> [<c0151c7b>] path_lookup [kernel] 0x1b (0xf3eebf20))
> [<c0151f74>] __user_walk [kernel] 0x24 (0xf3eebf30))
> [<c014dda7>] vfs_stat [kernel] 0x17 (0xf3eebf44))
> [<c0147314>] fput [kernel] 0xd4 (0xf3eebf64))
> [<c014e351>] sys_stat64 [kernel] 0x11 (0xf3eebf70))
> [<c0145df5>] filp_close [kernel] 0x95 (0xf3eebf8c))
> [<c0145e5b>] sys_close [kernel] 0x5b (0xf3eebfb0))
> [<c0108be3>] system_call [kernel] 0x33 (0xf3eebfc0))
>
> About the host in question:
>
> AFS version 1.2.10
> Linux tahoe5 2.4.20-20.7smp #1 SMP Mon Aug 18 14:46:14 EDT 2003 i686 unknown