[OpenAFS-devel] openafs 1.5.50.dfsg1-1 packages and problems on lenny

Dr A V Le Blanc <LeBlanc@man.ac.uk>
Tue, 29 Jul 2008 11:11:19 +0100


I sent this to Russ Allbery, and he suggested that I send it to
openafs-devel.

We've got an old AFS cell, and I've been looking at moving to Debian
lenny for the file and db servers, and I started experimenting with
Russ's 1.5.50.dfsg1-1 version, which I used to create a new experimental
cell.  I've seen a number of problems:

The fileserver and dbserver packages had a number of issues; I was able
to create the cell and get a quorum for the vlserver and ptserver, but
attempts to create a volume always ended with a communications failure,
and nothing would ever bring the volumes online and make them readable.

When I attempted to start the client on the cell, it took a very long
time, and then failed, presumably since root.afs wasn't online.  But
I was unable to stop and restart it, getting the message about the lack
of memory which I describe below.

By the way, attempting to run the afs-newcell script failed, even with
all of its requirements satisfied.

When I replaced the dbserver, fileserver, client and openafs-krb5 packages
with openafs-1.4.7.dfsg1-2 packages, everything worked perfectly -- even
when I still had the 1.5.50.dfsg1-1 module in the kernel.  This seems
to me to show that it was not a problem with firewalling or other
communications issues.

A typical message from a shutdown was this:

Jul 24 11:39:26 scree kernel: [79231.987117] WARM shutting down of: CB... afs... BkG... CTrunc... AFSDB... RxEvent... UnmaskRxkSignals... RxListener...
Jul 24 11:39:26 scree kernel: [79232.491466] WARNING: not all blocks freed: large 1 small 4
Jul 24 11:39:26 scree kernel: [79232.491466]  ALL allocated tables

I also have this:

Jul 24 13:15:48 scree kernel: [85788.248612] COLD shutting down of: CB... afs... BkG... CTrunc... AFSDB... RxEvent... UnmaskRxkSignals... RxListener...
Jul 24 13:15:48 scree kernel: [85788.871295]  ALL allocated tables
Jul 24 13:15:48 scree kernel: [85788.888977] slab error in kmem_cache_destroy(): cache `afs_inode_cache': Can't free all objects
Jul 24 13:15:48 scree kernel: [85788.993231]  [<c0174519>] kmem_cache_destroy+0x6a/0xb6
Jul 24 13:15:48 scree kernel: [85788.993261]  [<f8b5c9da>] cleanup_module+0x1e/0x32 [openafs]
Jul 24 13:15:48 scree kernel: [85788.993345]  [<c0140dfa>] sys_delete_module+0x1a8/0x1f7
Jul 24 13:15:48 scree kernel: [85788.993374]  [<c01672e1>] remove_vma+0x3e/0x43
Jul 24 13:15:48 scree kernel: [85788.993388]  [<c0167fe4>] do_munmap+0x1ba/0x1d4
Jul 24 13:15:48 scree kernel: [85788.993409]  [<c0103982>] syscall_call+0x7/0xb
Jul 24 13:15:48 scree kernel: [85788.993436]  =======================
Jul 24 13:21:11 scree kernel: [86141.296068] Symbol init_mm is marked as UNUSED, however this module is using it.
Jul 24 13:21:11 scree kernel: [86141.296082] This symbol will go away in the future.

and from a failed attempt to restart the client:

Jul 24 13:21:11 scree kernel: [86141.298714] Found system call table at 0xfffffffe (exported)
Jul 24 13:21:11 scree kernel: [86141.298720] Address 0xfffffffe is not writable.
Jul 24 13:21:11 scree kernel: [86141.298725] System call hooks will not be installed; proceeding anyway
Jul 24 13:21:11 scree kernel: [86141.298733] kmem_cache_create: duplicate cache afs_inode_cache
Jul 24 13:21:11 scree kernel: [86141.382946]  [<c0174623>] kmem_cache_create+0xbe/0x33b
Jul 24 13:21:11 scree kernel: [86141.382987]  [<f8b4e68e>] afs_init_inodecache+0x1b/0x2b [openafs]
Jul 24 13:21:11 scree kernel: [86141.383069]  [<f8b4e69e>] init_once+0x0/0x7 [openafs]
Jul 24 13:21:11 scree kernel: [86141.383133]  [<f892f025>] init_module+0x25/0x5f [openafs]
Jul 24 13:21:11 scree kernel: [86141.383193]  [<c0140a85>] sys_init_module+0x1862/0x19e5
Jul 24 13:21:11 scree kernel: [86141.383270]  [<c01304d9>] find_task_by_vpid+0x0/0x19
Jul 24 13:21:11 scree kernel: [86141.383331]  [<c0103982>] syscall_call+0x7/0xb
Jul 24 13:21:11 scree kernel: [86141.383368]  =======================
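
In case anyone wants to check their own kernel logs for the same
symptoms, here is a minimal Python sketch that picks out the kinds of
messages quoted above. The patterns are copied from these excerpts only
and are certainly not a complete list of what the openafs module can
print; the names PATTERNS, classify and scan are made up for
illustration.

```python
import re

# Patterns taken from the log excerpts in this message. They are an
# illustrative selection (an assumption), not an official list of
# OpenAFS kernel module messages.
PATTERNS = {
    "shutdown": re.compile(r"\b(WARM|COLD) shutting down of:"),
    "blocks_not_freed": re.compile(
        r"WARNING: not all blocks freed: large (\d+) small (\d+)"
    ),
    "slab_error": re.compile(
        r"slab error in kmem_cache_destroy\(\): cache `([^']+)'"
    ),
    "duplicate_cache": re.compile(
        r"kmem_cache_create: duplicate cache\s*(\S*)"
    ),
}


def classify(line):
    """Return (label, match) for the first pattern that matches, else None."""
    for label, pattern in PATTERNS.items():
        match = pattern.search(line)
        if match:
            return label, match
    return None


def scan(lines):
    """Yield (label, captured groups) for every recognised log line."""
    for line in lines:
        hit = classify(line)
        if hit:
            label, match = hit
            yield label, match.groups()
```

To use it, pass any iterable of log lines to scan(), for example an
open file object for wherever your syslog writes kernel messages (the
location varies between systems). A "slab_error" or "duplicate_cache"
hit naming afs_inode_cache corresponds to the failed unload and reload
shown above.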

I have not saved logs from the salvager processes, but there didn't seem to me
to be anything useful in them.

I hope this is useful, and that someone can see what some of the problems are.
Test builds of the kernel module show some peculiarities with other kernels,
at least to the extent of giving a warning message about being unable to
unload sunrpc.  I'd be happy to do any experiments that might help illumine
or solve this problem.

     -- Owen