[OpenAFS-devel] openafs 1.5.50.dfsg1-1 packages and problems on lenny

Derrick Brashear shadow@gmail.com
Tue, 29 Jul 2008 08:05:00 -0400


If the mount fails (no root.afs and no dynroot, for instance), the
resulting client failure is well known. The old Linux-AFS client did
something clever that we probably should do too: fake a root and later
swap in the real one.
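
To make the fake-root idea concrete, here is a rough user-space sketch in
Python. It is only a toy model under my own assumptions: FakeRoot, RealRoot,
Mount and fetch_root are hypothetical names, not anything from OpenAFS or
the old Linux-AFS client, and the real thing would live in the kernel
module. The point is just that the mount always succeeds with a placeholder,
and a later refresh swaps in the real root once it can be fetched.

```python
# Toy user-space model of "fake a root, swap in the real one later".
# None of these names come from OpenAFS; they are illustrative only.

class FakeRoot:
    """Empty stand-in directory used while root.afs is unavailable."""
    is_fake = True

    def listdir(self):
        return []


class RealRoot:
    """Stands in for the real root.afs volume once it can be fetched."""
    is_fake = False

    def __init__(self, entries):
        self.entries = list(entries)

    def listdir(self):
        return list(self.entries)


class Mount:
    def __init__(self, fetch_root):
        # fetch_root is a hypothetical callable: it returns a RealRoot,
        # or raises OSError while the volume is offline.
        self._fetch_root = fetch_root
        self.root = FakeRoot()  # the mount itself never fails
        self.refresh()

    def refresh(self):
        """Try to replace the placeholder; keep it if the fetch fails."""
        if not self.root.is_fake:
            return True
        try:
            self.root = self._fetch_root()
        except OSError:
            return False
        return True
```

A caller would retry refresh() periodically, or on the first lookup under
the mount point, until it returns True.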

I'm more interested in the server errors. Can you share anything from the logs?

On Tue, Jul 29, 2008 at 6:11 AM, Dr A V Le Blanc <LeBlanc@man.ac.uk> wrote:
> I sent this to Russ Allbery, and he suggested that I send it to
> openafs-devel.
>
> We've got an old AFS cell, and I've been looking at moving to Debian
> lenny for the file and db servers, and I started experimenting with
> Russ's 1.5.50.dfsg1-1 version, which I used to create a new experimental
> cell.  I've seen a number of problems:
>
> The fileserver and dbserver packages had a number of issues; I was able
> to create the cell and get a quorum for the vlserver and ptserver, but
> attempts to create a volume always ended with a communications failure,
> and nothing would ever make the volumes online and readable.
>
> When I attempted to start the client on the cell, it took a very long
> time, and then failed, presumably since root.afs wasn't online.  But
> I was unable to stop and restart it, getting the message about the lack
> of memory which I describe below.
>
> By the way, attempting to run the afs-newcell script even with all the
> requirements satisfied (of course) failed.
>
> When I replaced the dbserver, fileserver, client and openafs-krb5 packages
> with openafs-1.4.7.dfsg1-2 packages, everything worked perfectly -- even
> when I still had the 1.5.50.dfsg1-1 module in the kernel.  This seems
> to me to show that it was not a problem with firewalling or other
> communications issues.
>
> A typical message from a shutdown was this:
>
> Jul 24 11:39:26 scree kernel: [79231.987117] WARM shutting down of: CB... afs... BkG... CTrunc... AFSDB... RxEvent... UnmaskRxkSignals... RxListener...
> Jul 24 11:39:26 scree kernel: [79232.491466] WARNING: not all blocks freed: large 1 small 4
> Jul 24 11:39:26 scree kernel: [79232.491466]  ALL allocated tables
>
> also I have this:
>
> Jul 24 13:15:48 scree kernel: [85788.248612] COLD shutting down of: CB... afs... BkG... CTrunc... AFSDB... RxEvent... UnmaskRxkSignals... RxListener...
> Jul 24 13:15:48 scree kernel: [85788.871295]  ALL allocated tables
> Jul 24 13:15:48 scree kernel: [85788.888977] slab error in kmem_cache_destroy(): cache `afs_inode_cache': Can't free all objects
> Jul 24 13:15:48 scree kernel: [85788.993231]  [<c0174519>] kmem_cache_destroy+0x6a/0xb6
> Jul 24 13:15:48 scree kernel: [85788.993261]  [<f8b5c9da>] cleanup_module+0x1e/0x32 [openafs]
> Jul 24 13:15:48 scree kernel: [85788.993345]  [<c0140dfa>] sys_delete_module+0x1a8/0x1f7
> Jul 24 13:15:48 scree kernel: [85788.993374]  [<c01672e1>] remove_vma+0x3e/0x43
> Jul 24 13:15:48 scree kernel: [85788.993388]  [<c0167fe4>] do_munmap+0x1ba/0x1d4
> Jul 24 13:15:48 scree kernel: [85788.993409]  [<c0103982>] syscall_call+0x7/0xb
> Jul 24 13:15:48 scree kernel: [85788.993436]  =======================
> Jul 24 13:21:11 scree kernel: [86141.296068] Symbol init_mm is marked as UNUSED, however this module is using it.
> Jul 24 13:21:11 scree kernel: [86141.296082] This symbol will go away in the future.
>
> and from a failed attempt to restart the client:
>
> Jul 24 13:21:11 scree kernel: [86141.298714] Found system call table at 0xfffffffe (exported)
> Jul 24 13:21:11 scree kernel: [86141.298720] Address 0xfffffffe is not writable.
> Jul 24 13:21:11 scree kernel: [86141.298725] System call hooks will not be installed; proceeding anyway
> Jul 24 13:21:11 scree kernel: [86141.298733] kmem_cache_create: duplicate cache afs_inode_cache
> Jul 24 13:21:11 scree kernel: [86141.382946]  [<c0174623>] kmem_cache_create+0xbe/0x33b
> Jul 24 13:21:11 scree kernel: [86141.382987]  [<f8b4e68e>] afs_init_inodecache+0x1b/0x2b [openafs]
> Jul 24 13:21:11 scree kernel: [86141.383069]  [<f8b4e69e>] init_once+0x0/0x7 [openafs]
> Jul 24 13:21:11 scree kernel: [86141.383133]  [<f892f025>] init_module+0x25/0x5f [openafs]
> Jul 24 13:21:11 scree kernel: [86141.383193]  [<c0140a85>] sys_init_module+0x1862/0x19e5
> Jul 24 13:21:11 scree kernel: [86141.383270]  [<c01304d9>] find_task_by_vpid+0x0/0x19
> Jul 24 13:21:11 scree kernel: [86141.383331]  [<c0103982>] syscall_call+0x7/0xb
> Jul 24 13:21:11 scree kernel: [86141.383368]  =======================
>
> I have not saved logs from the salvager processes, but there didn't seem to me
> to be anything useful in them.
>
> I hope this is useful, and that someone can see what some of the problems are.
> Test builds of the kernel module show some peculiarities with other kernels,
> at least to the extent of giving a warning message about being unable to
> unload sunrpc.  I'd be happy to do any experiments that might help illumine
> or solve this problem.
>
>     -- Owen