1.3.79 and kernel Oops (was Re: [OpenAFS] Problems with 2.6.10, 1.3.79 & memcache)

Kevin openafs@gnosys.biz
Fri, 11 Mar 2005 10:33:42 -0500


On Fri, 2005-03-11 at 13:40 +0100, Lars Schimmer wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
> 
> Lars Schimmer schrieb:
> | Hi!
> |
> | I setup a filserver on debian sarge on 2.6.10 and 1.3.79 and I wanted to
> | try out
> | the memcache option.
> | But after a reboot syslog pronts out:
> | allocation failed: out of vmalloc space - use vmalloc=<size> to increase
> | size.
<snip>
> | Has anyone else seeing these kind of errors?
> Ok, after setting memcache to diskcache, AFS starts and I can get tokens and can
> access the AFS fs.
> But if I setup a volume on that machine and try to copy data in that volume, AFS
> ooops.

Me too.  I'm running gentoo-dev-sources kernel version 2.6.10 on a
current-with-portage 2004.3 Gentoo x86 system.

In my case, it only happened one time out of two copy operations (one
comprised of 24 files totaling about 6MB, the other many files totaling
about 710MB).  I performed the smaller copy operation first (shortly
after afs system startup and creation of a new volume on the new server)
and it was extremely slow (> 10 minutes to complete), but it did
complete without any apparent errors.  The larger copy operation was
done maybe 12 - 18 hours after system startup and also shortly after
creation of yet another new volume on the new server.

I noticed problems with the copy operation during the copy of the second
file out of many (see below; the first file was copied with no errors).

=========================
adam@hermes data $ cp -a * /afs/.folkvang.org/netdata/
cp: preserving times for
`/afs/.folkvang.org/netdata/blender/tutorials/blendervt-interface-concept-v234-r0.avi': No such file or directory
cp: preserving times for
`/afs/.folkvang.org/netdata/blender/tutorials/blendervt-interface-vitalfunctions-appending-v234-r0.avi': No such file or directory
cp: preserving times for
`/afs/.folkvang.org/netdata/blender/tutorials/blendervt-interface-vitalfunctions-v234-r0.avi': No such file or directory
cp: preserving times for
`/afs/.folkvang.org/netdata/blender/tutorials/blendervt-materials-rampshaders-v234-r0.avi': No such file or directory
cp: preserving times for
`/afs/.folkvang.org/netdata/blender/tutorials/dupliframe.avi': No such
file or directory
cp: preserving times for
`/afs/.folkvang.org/netdata/blender/tutorials/duplivert.avi': No such
file or directory
cp: preserving times for
`/afs/.folkvang.org/netdata/blender/tutorials/kfipo.avi': No such file
or directory
cp: preserving times for
`/afs/.folkvang.org/netdata/blender/tutorials/subdiv.avi': No such file
or directory
cp: preserving times for
`/afs/.folkvang.org/netdata/blender/tutorials/uvtut.avi': No such file
or directory
cp: cannot open `blender/instructional/blends/yot08a5t.zip' for reading:
Permission denied
cp: cannot open `blender/instructional/blender/09-10_tuto.zip' for
reading: Permission denied
cp: preserving times for
`/afs/.folkvang.org/netdata/blender/instructional/blender/CUBE_SOLID.AVI': No such file or directory
cp: preserving times for
`/afs/.folkvang.org/netdata/blender/instructional/blender/CUBE_WIRE.AVI': No such file or directory
cp: preserving times for
`/afs/.folkvang.org/netdata/blender/instructional/blender/FINAL.AVI': No
such file or directory
cp: preserving times for
`/afs/.folkvang.org/netdata/blender/instructional/blender/JUMPYLOGO.AVI': No such file or directory
cp: preserving times for
`/afs/.folkvang.org/netdata/blender/instructional/blender/PARTICLES.AVI': No such file or directory
cp: preserving times for
`/afs/.folkvang.org/netdata/blender/instructional/blender/TUNNEL.AVI':
No such file or directory
cp: preserving times for
`/afs/.folkvang.org/netdata/blender/instructional/blender/ZOOMLOGO.AVI':
No such file or directory
cp: cannot open `blender/instructional/blender/files/BLENDER.FILES.zip'
for reading: Permission denied
cp: preserving times for
`/afs/.folkvang.org/netdata/blender/instructional/blender/files/02.QUICKSTART/STAGE06.BLEND': No such file or directory
cp: preserving times for
`/afs/.folkvang.org/netdata/blender/instructional/blender/files/02.QUICKSTART/TEXTURES/BRICK05.JPG': No such file or directory
cp: preserving times for
`/afs/.folkvang.org/netdata/blender/instructional/blender/files/02.QUICKSTART/TEXTURES/WOOD_27.TGA': No such file or directory
cp: preserving times for
`/afs/.folkvang.org/netdata/blender/instructional/blender/files/09.ANIMATION/10.SEQUENCE_EDITOR/TUNNEL.MPG': No such file or directory
cp: preserving times for
`/afs/.folkvang.org/netdata/blender/instructional/blender/files/09.ANIMATION/10.SEQUENCE_EDITOR/TUNNEL_4.BLEND': No such file or directory
cp: preserving times for
`/afs/.folkvang.org/netdata/blender/instructional/blender/files/09.ANIMATION/10.SEQUENCE_EDITOR/TUNNEL_EFFECT.BLEND': No such file or directory
cp: preserving times for `/afs/.folkvang.org/netdata/vis5d
+-1.3.0-beta/vis5d-5.2/src/libmcidas5.a': No such file or directory
cp: preserving times for `/afs/.folkvang.org/netdata/vis5d
+-1.3.0-beta/vis5d-5.2/util/comp_to_v5d.c': No such file or directory
cp: preserving times for `/afs/.folkvang.org/netdata/vis5d
+-1.3.0-beta/SCHL.v5d': No such file or directory
cp: preserving times for `/afs/.folkvang.org/netdata/vis5d
+-1.3.0-beta/LAMPS.v5d': No such file or directory
=========================

After the copy completed (albeit unsuccessfully), the afs system seemed
to become unstable (ls reported "no such file or directory," but then
mkdir of same directory failed with "directory already present").

At that point, I tried to shutdown the afs server gracefully (using bos
shutdown -server...), but the system couldn't manage it.  Some of the
server processes exited gracefully, but not all (I can provide some more
log files if anyone would like them).  I ended up having to manually
kill some of the processes, and I could not remove the afslib module
from the kernel.  When attempted, the system reported that the module
was in use.  I unloaded all other modules and killed all afs processes,
but still could not remove the libafs module.  I tried restarting afs,
but no go.  At that point, I tried rebooting the system, but the kernel
was extremely unstable and although the system went through the motions
of shutdown -r now or shutdown -h now, it never actually rebooted or
shutdown.  Finally, when I tried to sync (in preparation for hard
reset), the system locked up (sync never completed) and I had to power
down the machine to get it to restart.

In /var/log/messages, I also have this:

=========================
Mar 11 07:58:19 aphrodite rxi_AllocPacket error<1>Unable to handle
kernel paging request at virtual address ffffffff
Mar 11 07:58:19 aphrodite printing eip:
Mar 11 07:58:19 aphrodite f8a61951
Mar 11 07:58:19 aphrodite *pde = 00002067
Mar 11 07:58:19 aphrodite Oops: 0002 [#1]
Mar 11 07:58:19 aphrodite PREEMPT
Mar 11 07:58:19 aphrodite Modules linked in: ntfs libafs snd_pcm_oss
snd_mixer_oss snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device
snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd soundcore
snd_page_alloc
Mar 11 07:58:19 aphrodite CPU:    0
Mar 11 07:58:19 aphrodite EIP:    0060:[<f8a61951>]    Tainted: P
VLI
Mar 11 07:58:19 aphrodite EFLAGS: 00010292   (2.6.10-gentoo-r6-kf2)
Mar 11 07:58:19 aphrodite EIP is at osi_Panic+0x31/0x40 [libafs]
Mar 11 07:58:19 aphrodite eax: 00000015   ebx: f8a9c97c   ecx: 00000000
edx: c044294c
Mar 11 07:58:19 aphrodite esi: 00000000   edi: 00003767   ebp: e80b1880
esp: d71ebeb4
Mar 11 07:58:19 aphrodite ds: 007b   es: 007b   ss: 0068
Mar 11 07:58:19 aphrodite Process afs_rxlistener (pid: 28757,
threadinfo=d71eb000 task=f779b0a0)
Mar 11 07:58:19 aphrodite Stack: f8a82847 00000000 f8a5c9fc 00000002
f8a62c5c f8a82847 00000000 f8a5c9fc
Mar 11 07:58:19 aphrodite 00000002 f8994ae0 00000001 00000000 e80b1890
0f000000 00000000 f89b9410
Mar 11 07:58:19 aphrodite f89977e8 00000000 00000000 e80b1880 00000000
f8a5a5ea e80b1880 00000000
Mar 11 07:58:19 aphrodite Call Trace:
Mar 11 07:58:19 aphrodite [<f8a5c9fc>] rxi_SendAck+0x50c/0x530 [libafs]
Mar 11 07:58:19 aphrodite [<f8a62c5c>] rxi_AllocPacketNoLock+0xbc/0x110
[libafs]Mar 11 07:58:19 aphrodite [<f8a5c9fc>] rxi_SendAck+0x50c/0x530
[libafs]
Mar 11 07:58:19 aphrodite [<f8a5a5ea>] rxi_ReceiveDataPacket+0x29a/0x710
[libafs]
Mar 11 07:58:19 aphrodite [<f8a59c92>] rxi_ReceivePacket+0x262/0x620
[libafs]
Mar 11 07:58:19 aphrodite [<f8a62364>] rxk_Listener+0x164/0x1a0 [libafs]
Mar 11 07:58:19 aphrodite [<f8a71ed3>] afsd_thread+0x393/0x3a0 [libafs]
Mar 11 07:58:19 aphrodite [<f8a71b40>] afsd_thread+0x0/0x3a0 [libafs]
Mar 11 07:58:19 aphrodite [<c01012bd>] kernel_thread_helper+0x5/0x18
Mar 11 07:58:19 aphrodite Code: a8 f8 8b 54 24 14 85 d2 0f 44 d0 8b 44
24 20 89 14 24 89 44 24 0c 8b 44 24 1c 89 44 24 08 8b 44 24 18 89 44 24
04 e8 2f 87 6b c7 <c6> 05 ff ff ff ff 2a 83 c4 10 c3 8d 74 26 00 55 b8
ff ff ff ff
Mar 11 08:00:01 aphrodite /usr/sbin/cron[30767]: (root) CMD (test
-x /usr/sbin/run-crons && /usr/sbin/run-crons )
Mar 11 08:00:01 aphrodite /usr/sbin/cron[30768]: (root) CMD (rm
-f /var/spool/cron/lastrun/cron.hourly)
Mar 11 08:00:42 aphrodite afs: Lost contact with file server
192.168.69.60 in cell folkvang.org (all multi-homed ip addresses down
for the server)
Mar 11 08:00:42 aphrodite afs: Lost contact with file server
192.168.69.60 in cell folkvang.org (all multi-homed ip addresses down
for the server)
Mar 11 08:10:01 aphrodite /usr/sbin/cron[30788]: (root) CMD (test
-x /usr/sbin/run-crons && /usr/sbin/run-crons )
Mar 11 08:11:29 aphrodite afs: Lost contact with file server
192.168.69.6 in cell folkvang.org (all multi-homed ip addresses down for
the server)
Mar 11 08:11:29 aphrodite afs: Lost contact with file server
192.168.69.6 in cell folkvang.org (all multi-homed ip addresses down for
the server)
Mar 11 08:20:01 aphrodite /usr/sbin/cron[30807]: (root) CMD (test
-x /usr/sbin/run-crons && /usr/sbin/run-crons )
Mar 11 08:30:01 aphrodite /usr/sbin/cron[30846]: (root) CMD (test
-x /usr/sbin/run-crons && /usr/sbin/run-crons )
Mar 11 08:32:07 aphrodite WARM shutting down of: CB... afs... BkG...
CTrunc... AFSDB... RxEvent... UnmaskRxkSignals... RxListener...
<1>Unable to handle kernel NULL pointer dereference at virtual address
00000004
Mar 11 08:32:07 aphrodite printing eip:
Mar 11 08:32:07 aphrodite c0123770
Mar 11 08:32:07 aphrodite *pde = 00000000
Mar 11 08:32:07 aphrodite Oops: 0000 [#2]
Mar 11 08:32:07 aphrodite PREEMPT
Mar 11 08:32:07 aphrodite Modules linked in: ntfs libafs snd_pcm_oss
snd_mixer_oss snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device
snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd soundcore
snd_page_alloc
Mar 11 08:32:07 aphrodite CPU:    0
Mar 11 08:32:07 aphrodite EIP:    0060:[<c0123770>]    Tainted: P
VLI
Mar 11 08:32:07 aphrodite EFLAGS: 00010002   (2.6.10-gentoo-r6-kf2)
Mar 11 08:32:07 aphrodite EIP is at flush_signals+0x20/0x80
Mar 11 08:32:07 aphrodite eax: 00000001   ebx: e1f89000   ecx: 00000000
edx: 00007055
Mar 11 08:32:07 aphrodite esi: 00000000   edi: 00000282   ebp: e1f89000
esp: e1f89eec
Mar 11 08:32:07 aphrodite ds: 007b   es: 007b   ss: 0068
Mar 11 08:32:07 aphrodite Process umount (pid: 30887,
threadinfo=e1f89000 task=f7a46a60)
Mar 11 08:32:07 aphrodite Stack: 00000400 00000000 c04cfa00 f8a92040
f8a617a0 00000000 00007055 f79d9000
Mar 11 08:32:07 aphrodite f79d9000 f8a72eca f8a82e98 f8a6e675 f8a92040
cfd0e8d4 c015ec5d f79d9000
Mar 11 08:32:07 aphrodite e1f89000 0000000e 00000000 c015f5c7 f79d9000
c0174ac7 f79d9000 f8a92000
Mar 11 08:32:07 aphrodite Call Trace:
Mar 11 08:32:07 aphrodite [<f8a617a0>] osi_StopListener+0x70/0xa0
[libafs]
Mar 11 08:32:07 aphrodite [<f8a72eca>] afs_shutdown+0x13a/0x2f0 [libafs]
Mar 11 08:32:07 aphrodite [<f8a6e675>] afs_put_super+0x25/0xb0 [libafs]
Mar 11 08:32:07 aphrodite [<c015ec5d>] generic_shutdown_super
+0x11d/0x130
Mar 11 08:32:07 aphrodite [<c015f5c7>] kill_anon_super+0x27/0x60
Mar 11 08:32:07 aphrodite [<c0174ac7>] free_vfsmnt+0x27/0x30
Mar 11 08:32:07 aphrodite [<c015ea4b>] deactivate_super+0x5b/0x90
Mar 11 08:32:07 aphrodite [<c017536f>] sys_umount+0x3f/0xa0
Mar 11 08:32:07 aphrodite [<c014ba66>] do_munmap+0x146/0x190
Mar 11 08:32:07 aphrodite [<c014bb00>] sys_munmap+0x50/0x80
Mar 11 08:32:07 aphrodite [<c01753e5>] sys_oldumount+0x15/0x20
Mar 11 08:32:07 aphrodite [<c0103089>] sysenter_past_esp+0x52/0x75
Mar 11 08:32:07 aphrodite Code: 47 e7 01 00 eb ca 90 8d 74 26 00 83 ec
10 89 74 24 08 8b 74 24 14 89 5c 24 04 89 7c 24 0c 9c 5f fa bb 00 f0 ff
ff 21 e3 ff 43 14 <8b> 46 04 0f ba 70 08 02 8d 86 84 04 00 00 89 04 24
e8 5a ff ff
Mar 11 08:32:07 aphrodite <6>note: umount[30887] exited with
preempt_count 2
Mar 11 08:32:07 aphrodite scheduling while atomic:
umount/0x00000002/30887
Mar 11 08:32:07 aphrodite [<c03cc4e2>] schedule+0x532/0x540
Mar 11 08:32:07 aphrodite [<c0147223>] unmap_page_range+0x53/0x80
Mar 11 08:32:07 aphrodite [<c0147406>] unmap_vmas+0x1b6/0x1d0
Mar 11 08:32:07 aphrodite [<c014be6d>] exit_mmap+0x7d/0x160
Mar 11 08:32:07 aphrodite [<c0117b17>] mmput+0x37/0xa0
Mar 11 08:32:07 aphrodite [<c011c4ef>] do_exit+0x16f/0x470
Mar 11 08:32:07 aphrodite [<c01042cb>] die+0x18b/0x190
Mar 11 08:32:07 aphrodite [<c011a097>] printk+0x17/0x20
Mar 11 08:32:07 aphrodite [<c0114b41>] do_page_fault+0x241/0x5ae
Mar 11 08:32:07 aphrodite [<c0171524>] d_rehash+0x54/0x80
Mar 11 08:32:07 aphrodite [<c0115e32>] activate_task+0x62/0x80
Mar 11 08:32:07 aphrodite [<c0115f26>] try_to_wake_up+0xa6/0xc0
Mar 11 08:32:07 aphrodite [<c0116891>] __wake_up_common+0x41/0x70
Mar 11 08:32:07 aphrodite [<c0114900>] do_page_fault+0x0/0x5ae
Mar 11 08:32:07 aphrodite [<c0103b17>] error_code+0x2b/0x30
Mar 11 08:32:07 aphrodite [<c0123770>] flush_signals+0x20/0x80
Mar 11 08:32:07 aphrodite [<f8a617a0>] osi_StopListener+0x70/0xa0
[libafs]
Mar 11 08:32:07 aphrodite [<f8a72eca>] afs_shutdown+0x13a/0x2f0 [libafs]
Mar 11 08:32:07 aphrodite [<f8a6e675>] afs_put_super+0x25/0xb0 [libafs]
Mar 11 08:32:07 aphrodite [<c015ec5d>] generic_shutdown_super
+0x11d/0x130
Mar 11 08:32:07 aphrodite [<c015f5c7>] kill_anon_super+0x27/0x60
Mar 11 08:32:07 aphrodite [<c0174ac7>] free_vfsmnt+0x27/0x30
Mar 11 08:32:07 aphrodite [<c015ea4b>] deactivate_super+0x5b/0x90
Mar 11 08:32:07 aphrodite [<c017536f>] sys_umount+0x3f/0xa0
Mar 11 08:32:07 aphrodite [<c014ba66>] do_munmap+0x146/0x190
Mar 11 08:32:07 aphrodite [<c014bb00>] sys_munmap+0x50/0x80
Mar 11 08:32:07 aphrodite [<c01753e5>] sys_oldumount+0x15/0x20
Mar 11 08:32:07 aphrodite [<c0103089>] sysenter_past_esp+0x52/0x75
Mar 11 08:32:35 aphrodite rc-scripts: Error starting AFS
Mar 11 08:36:59 aphrodite rc-scripts: WARNING:  "afs" has already been
started.
Mar 11 08:37:21 aphrodite You must reload the AFS kernel extensions
before remounting AFS.
Mar 11 08:37:21 aphrodite rc-scripts: Error starting AFS
Mar 11 08:37:46 aphrodite rc-scripts: status:  stopped
Mar 11 08:37:52 aphrodite You must reload the AFS kernel extensions
before remounting AFS.
Mar 11 08:37:52 aphrodite rc-scripts: Error starting AFS
Mar 11 08:39:47 aphrodite su(pam_unix)[30409]: session closed for user
root
Mar 11 08:39:49 aphrodite su(pam_unix)[30423]: session closed for user
root
Mar 11 08:40:01 aphrodite /usr/sbin/cron[31215]: (root) CMD (test
-x /usr/sbin/run-crons && /usr/sbin/run-crons )
Mar 11 08:40:19 aphrodite kde(pam_unix)[30296]: session closed for user
adam
Mar 11 08:40:20 aphrodite (adam-30392): Received signal 15, shutting
down cleanly
Mar 11 08:40:20 aphrodite (adam-30392): Exiting
Mar 11 08:42:43 aphrodite login(pam_unix)[8425]: session opened for user
root by (uid=0)
Mar 11 08:50:01 aphrodite /usr/sbin/cron[31861]: (root) CMD (test
-x /usr/sbin/run-crons && /usr/sbin/run-crons )
Mar 11 08:50:22 aphrodite init: Switching to runlevel: 1
Mar 11 08:50:26 aphrodite ntpd[1695]: ntpd exiting on signal 15
Mar 11 08:50:26 aphrodite sshd[1745]: Received signal 15; terminating.
Mar 11 08:50:28 aphrodite xinetd[7471]: Exiting...
Mar 11 08:50:28 aphrodite syslog-ng[1586]: syslog-ng version 1.6.5 going
down
=========================

I'd be glad to try and help isolate the source of the problem(s) if
anyone would like that.  Just let me know exactly what I should do.


-- 
-Kevin
http://www.gnosys.us