[OpenAFS] Can't get this going on Coraid CLN22 (Debian).

Tony Shadwick tshadwick@oss-solutions.com
Thu, 29 Mar 2007 14:29:16 -0500


I've been bouncing in and out of #OpenAFS for the last week trying to 
get this working, and I've been working with Coraid support and all to 
no avail.  It appears something is up with pthreads, but Coraid support 
ran a test and pthreads work in the kernel.  Rather than copy and paste 
the whole long deal, here's the page I have on my site with all of the info:

http://www.numbski.com/hacks/coraid/openafs-on-cln22.html

In that log you'll see I've tried using both afs-newcell and the script 
found at Debian World.

Here are the logs with and without fileserver -d 99 turned on (I know, 
bad loglevel, didn't know until afterwards though):

nas1:/var/log/openafs# cat /var/log/openafs/FileLog
Thu Mar 29 13:52:06 2007 File server starting
Thu Mar 29 13:52:06 2007 afs_krb_get_lrealm failed, using oss-solutions.com.
Thu Mar 29 13:52:06 2007 Set thread id 14 for FSYNC_sync
Thu Mar 29 13:52:06 2007 Partition /vicepa: attaching volumes
Thu Mar 29 13:52:06 2007 Partition /vicepa: attached 0 volumes; 0 volumes not attached
Thu Mar 29 13:52:06 2007
: Assertion failed! file ../viced/viced.c, line 1956.


and with logging turned up:

nas1:/var/log/openafs# cat FileLog
Thu Mar 29 14:03:02 2007 File server starting
Thu Mar 29 14:03:02 2007 afs_krb_get_lrealm failed, using oss-solutions.com.
Thu Mar 29 14:03:02 2007 VL_RegisterAddrs rpc failed; will retry periodically (code=5376, err=0)
Thu Mar 29 14:03:02 2007 Set thread id 14 for FSYNC_sync
Thu Mar 29 14:03:02 2007 Partition /vicepa: attaching volumes
Thu Mar 29 14:03:02 2007 Partition /vicepa: attached 0 volumes; 0 volumes not attached
Thu Mar 29 14:03:02 2007 Starting pthreads
Thu Mar 29 14:03:02 2007 Starting five minute check process
Thu Mar 29 14:03:02 2007 Set thread id 15 for 'FiveMinuteCheckLWP'
Thu Mar 29 14:03:02 2007
: Assertion failed! file ../viced/viced.c, line 1958.

The code in question:

1954    assert(pthread_create
1955           (&serverPid, &tattr, (void *)FiveMinuteCheckLWP,
1956            &fiveminutes) == 0);
1957    assert(pthread_create
1958           (&serverPid, &tattr, (void *)HostCheckLWP, &fiveminutes) == 0);
1959    assert(pthread_create
1960           (&serverPid, &tattr, (void *)FsyncCheckLWP, &fiveminutes) == 0);
1961 #else /* AFS_PTHREAD_ENV */
1962    ViceLog(5, ("Starting LWP\n"));
1963    assert(LWP_CreateProcess
1964           (FiveMinuteCheckLWP, stack * 1024, LWP_MAX_PRIORITY - 2,
1965            (void *)&fiveminutes, "FiveMinuteChecks",
1966            &serverPid) == LWP_SUCCESS);
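
Because assert() discards the return value of pthread_create, the log never says why the call failed. One way to narrow that down outside the fileserver is a standalone test that returns and prints the error code. What follows is only a minimal sketch and not code from the OpenAFS tree: noop_thread, try_thread_create and report_thread_create are made-up names, and the stack-size argument is an assumed knob, since the attributes viced sets on tattr are not shown in the excerpt above.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Trivial thread body; a stand-in for FiveMinuteCheckLWP and friends. */
static void *noop_thread(void *arg)
{
    return arg;
}

/*
 * Create and join one thread, returning the pthread error code
 * (0 on success) instead of asserting on it.
 * stack_size == 0 means "leave the default stack size alone".
 * NOTE: the stack size is an assumed test parameter, not something
 * known to be what viced configures.
 */
int try_thread_create(size_t stack_size)
{
    pthread_attr_t attr;
    pthread_t tid;
    int rc;

    rc = pthread_attr_init(&attr);
    if (rc != 0)
        return rc;

    if (stack_size != 0) {
        rc = pthread_attr_setstacksize(&attr, stack_size);
        if (rc != 0) {
            pthread_attr_destroy(&attr);
            return rc;
        }
    }

    rc = pthread_create(&tid, &attr, noop_thread, NULL);
    pthread_attr_destroy(&attr);

    if (rc == 0)
        rc = pthread_join(tid, NULL);
    return rc;
}

/* Print a human-readable result for one attempt. */
void report_thread_create(const char *label, size_t stack_size)
{
    int rc = try_thread_create(stack_size);

    if (rc == 0)
        printf("%s: ok\n", label);
    else
        printf("%s: failed with %d (%s)\n", label, rc, strerror(rc));
}
```

Calling report_thread_create("default stack", 0) from a small main() and building with -pthread would show whether thread creation fails on this box at all, and with which error (EAGAIN, EINVAL and EPERM are the values pthread_create is documented to return).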

Totally lost, frustrated and confused.  Any devs wish to take pity on me 
and help?  This is an AMD64 box running Debian.

Tony Shadwick
OSS Solutions