[OpenAFS-devel] OpenAFS on 2.4.26 ? OpenMosix ?
Jeffrey Hutzelman
jhutz@cmu.edu
Wed, 15 Dec 2004 14:48:32 -0500
On Wednesday, December 15, 2004 14:02:26 -0500 Terry Gliedt <tpg@umich.edu>
wrote:
>####### from /var/log/messages Watch for line wraps
>
> Unable to handle kernel NULL pointer dereference at virtual address
> 00000004 printing eip:
> f8b73af8
> *pde = 2bcc0001
> *pte = 00000000
> Oops: 0000
> CPU: 2
> EIP: 0010:[<f8b73af8>] Tainted: PF
> EFLAGS: 00010282
> eax: 20003312 ebx: f8c4be14 ecx: ec6b5dfc edx: 00000000
> esi: f8c4c038 edi: ec6b5da0 ebp: ec6b5da0 esp: ecbbfe40
> ds: 0018 es: 0018 ss: 0018
> Process cp (pid: 3288, stackpage=ecbbf000)
> Stack: f9417000 ecbbe000 00000000 f8c4be14 f8c4c038 ecbbfe90 ec6b5da0
> f8b776b2 ec6b5da0 ec6b5dfc 00000002 ecbbfe90 c0360a00 ec71ad20
> 00000001 f9417000 ec6b5dfc f8c4c038 ec6b5dfc 0000ffff 0001e194
> 00000040 f8ba22c0 f8b78a00 Call Trace: [<f8b776b2>] [<f8ba22c0>]
> [<f8b78a00>] [<c01611ed>] [<c0161a22>] [<c01620c9>] [<c0162429>]
> [<c0153443>] [<c016c8d1>] [<c0155f88>] [<c01befd5>] [<c01bf0df>]
> [<c010b8bc>]
>
> Code: 39 42 04 0f 84 c7 00 00 00 e8 3a e7 ff ff 89 c5 50 8d 44 24
That's not surprising. In all of the cases you described where a process
randomly segfaults, you should see output like that in /var/log/messages
or in dmesg output. A wide variety of bad things, when user code does
them, cause the program to exit on a signal like SIGSEGV or SIGBUS and
drop a core file. In Linux, if one of these things happens in kernel code,
the process exits on SIGSEGV (no core), and you get an "oops" message
containing information about the state of the kernel at the time of the
failure. That's what the message you quoted is.
Unfortunately, the oops message is not useful in its raw form. All of the
numbers you see in [<>] are actually addresses inside the kernel. In order
for the backtrace to be useful, these need to be converted to symbolic
form. This is usually done automatically by the logging software, if it
can find the kernel symbol table, which is usually available in a file
called "System.map". Since the conversion did not happen automatically,
you will need to either find and use ksymoops, or reconfigure the kernel
logging software to do the translation, and then reproduce the problem
again.
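To illustrate what "converted to symbolic form" means, here is a rough Python sketch of the lookup. It is not how klogd or ksymoops are implemented. It assumes the usual System.map line layout ("address type name") and that the last entry marks the end of the kernel image. The symbol names in SAMPLE_MAP are made up for the example; only the addresses passed to resolve() come from the trace above.

```python
import bisect
import re

# Hypothetical System.map excerpt; these symbol names are invented for illustration.
SAMPLE_MAP = """\
c0100000 T _stext
c01611a0 T hypothetical_lookup
c0161a00 T hypothetical_walk
c0400000 A _end
"""

def parse_system_map(text):
    """Parse "address type name" lines into a list sorted by address."""
    symbols = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 3:
            symbols.append((int(parts[0], 16), parts[2]))
    symbols.sort()
    return symbols

def resolve(symbols, addr):
    """Return "name+0xoffset" for the nearest symbol at or below addr.

    Returns None when addr lies below the first symbol or at/after the
    last one (assumed to be an end marker), e.g. an address inside a
    loadable module, which System.map does not describe.
    """
    addrs = [a for a, _ in symbols]
    i = bisect.bisect_right(addrs, addr) - 1
    if i < 0 or i == len(symbols) - 1:
        return None
    base, name = symbols[i]
    return "%s+0x%x" % (name, addr - base)

def trace_addresses(oops_text):
    """Extract the [<xxxxxxxx>] addresses from an oops message."""
    return [int(m, 16) for m in re.findall(r"\[<([0-9a-fA-F]{8})>\]", oops_text)]
```

With this sample map, resolve(parse_system_map(SAMPLE_MAP), 0xc01611ed) gives a name plus offset, while 0xf8b776b2 gives None. An address outside the range System.map covers needs module symbol information, which a static System.map does not contain.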
The simplest thing to do is to make sure that klogd is able to find the
System.map file, and that it is not invoked with -x. You will probably get
the best results by running klogd with -p, so it will reload symbol table
information when it sees an error (otherwise it may not have a complete set
of symbols for openafs).
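As a quick sanity check of the points above, something like the following Python sketch could be used to review the arguments klogd is started with. The function name and the wording of the notes are my own; -x and -p are the options discussed above, and -k is the klogd option for naming a symbol file explicitly. The sketch only recognizes flags given as separate arguments, not combined forms such as "-xp".

```python
def review_klogd_args(args):
    """Return notes about a klogd argument list, e.g. ["-x"].

    Hypothetical helper: it only inspects the list it is given and
    does not look at the running system.
    """
    notes = []
    if "-x" in args:
        notes.append("-x is set: addresses will not be translated")
    if "-p" not in args:
        notes.append("-p is not set: module symbols may be incomplete at oops time")
    if "-k" not in args:
        notes.append("no -k given: klogd must find System.map on its own")
    return notes

# Example: the case described above, where translation is switched off.
for note in review_klogd_args(["-x"]):
    print(note)
```

An empty result for something like ["-p", "-k", "/boot/System.map"] only means none of these three conditions applies. It does not prove that the named file matches the running kernel.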
FWIW, I have not heard of anyone getting OpenAFS and OpenMosix to work
together, even to the extent that you've reported so far. We have had
several reports of failures in the past, though...
-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
Sr. Research Systems Programmer
School of Computer Science - Research Computing Facility
Carnegie Mellon University - Pittsburgh, PA