[OpenAFS] Can't get this going on Coraid CLN22 (Debian).
Thu, 29 Mar 2007 20:00:49 -0500
I won't call it "fixed", but with much help from the guys in #openafs,
we did get things working.
The problem appears to be in ulimit:
nas1:~# ulimit -a
core file size (blocks, -c) 0
data seg size (kbytes, -d) unlimited
max nice (-e) 0
file size (blocks, -f) unlimited
pending signals (-i) unlimited
max locked memory (kbytes, -l) unlimited
max memory size (kbytes, -m) unlimited
open files (-n) 1024
pipe size (512 bytes, -p) 8
POSIX message queues (bytes, -q) unlimited
max rt priority (-r) 0
stack size (kbytes, -s) 8192
cpu time (seconds, -t) unlimited
max user processes (-u) unlimited
virtual memory (kbytes, -v) unlimited
file locks (-x) unlimited
The stack size is set to 8192 KB. Once we changed that to unlimited
(ulimit -s unlimited), things started working.
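For anyone who wants to check the same thing on their own box, here is a minimal sketch (bash or dash assumed) that prints the soft and hard stack limits and shows that a change made in a subshell does not leak back to the parent. The helper name show_stack_limits is made up for illustration and is not part of any OpenAFS tool.

```shell
#!/bin/sh
# Print the soft and hard stack limits of the current shell.
# -S selects the soft limit, -H the hard limit (shell builtin options).
show_stack_limits() {
    echo "soft: $(ulimit -S -s)"
    echo "hard: $(ulimit -H -s)"
}

show_stack_limits

# Change the soft limit inside a subshell only; the parent keeps its value.
# Raising the soft limit to unlimited only works if the hard limit allows it.
(
    ulimit -S -s unlimited 2>/dev/null || echo "could not raise soft limit" >&2
    echo "inside subshell: $(ulimit -S -s)"
)
echo "back in parent: $(ulimit -S -s)"
```

The last line should report the same value as the first "soft:" line, since the limit belongs to the process that set it and to whatever that process starts afterwards.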
Ed, if you see this...any thoughts on what might cause this?
I've been instructed to file a bug report on openafs-bugs, and with Debian
regarding the package, as the /etc/init.d/openafs-fileserver script has
to be modified to run ulimit -s unlimited at each startup, since the
setting is per-session. Speculation as to the cause is welcome.
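A rough sketch of what that init-script change could look like. This is not the real Debian script: start_with_unlimited_stack is a hypothetical helper, and the bosserver path in the commented example is an assumption about what the script launches.

```shell
#!/bin/sh
# Hypothetical helper for an init script: raise the stack limit, then run
# the daemon command passed as arguments. The limit is set in a subshell,
# so it applies to the started process (and its children) but not to the
# rest of the script.
start_with_unlimited_stack() {
    (
        if ! ulimit -s unlimited 2>/dev/null; then
            echo "warning: could not set stack size to unlimited" >&2
        fi
        exec "$@"
    )
}

# Example call (assumed path; use whatever the script actually starts):
# start_with_unlimited_stack /usr/sbin/bosserver
```

The warning branch is there because raising the limit can fail when the hard limit is lower than unlimited and the script is not running as root.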
Please don't think this is a small thing. I've spent well over 40
hours, along with the help of several people, to weed this out!
Tony Shadwick wrote:
> I've been bouncing in and out of #OpenAFS for the last week trying to
> get this working, and I've been working with Coraid support and all to
> no avail. It appears something is up with pthreads, but Coraid support
> ran a test and pthreads work in the kernel. Rather than copy and paste
> the whole long deal, here's the page I have on my site with all of the
> In that log you'll see I've tried using both afs-newcell and the script
> found at Debian World.
> Here are the logs with and without fileserver -d 99 turned on (I know,
> bad loglevel, didn't know until afterwards though):
> nas1:/var/log/openafs# cat /var/log/openafs/FileLog
> Thu Mar 29 13:52:06 2007 File server starting
> Thu Mar 29 13:52:06 2007 afs_krb_get_lrealm failed, using
> Thu Mar 29 13:52:06 2007 Set thread id 14 for FSYNC_sync
> Thu Mar 29 13:52:06 2007 Partition /vicepa: attaching volumes
> Thu Mar 29 13:52:06 2007 Partition /vicepa: attached 0 volumes; 0
> volumes not attached
> Thu Mar 29 13:52:06 2007
> : Assertion failed! file ../viced/viced.c, line 1956.
> and with logging turned up:
> nas1:/var/log/openafs# cat FileLog
> Thu Mar 29 14:03:02 2007 File server starting
> Thu Mar 29 14:03:02 2007 afs_krb_get_lrealm failed, using
> Thu Mar 29 14:03:02 2007 VL_RegisterAddrs rpc failed; will retry
> periodically (code=5376, err=0)
> Thu Mar 29 14:03:02 2007 Set thread id 14 for FSYNC_sync
> Thu Mar 29 14:03:02 2007 Partition /vicepa: attaching volumes
> Thu Mar 29 14:03:02 2007 Partition /vicepa: attached 0 volumes; 0
> volumes not attached
> Thu Mar 29 14:03:02 2007 Starting pthreads
> Thu Mar 29 14:03:02 2007 Starting five minute check process
> Thu Mar 29 14:03:02 2007 Set thread id 15 for 'FiveMinuteCheckLWP'
> Thu Mar 29 14:03:02 2007
> : Assertion failed! file ../viced/viced.c, line 1958.
> The code in question:
> 1954 assert(pthread_create
> 1955 (&serverPid, &tattr, (void *)FiveMinuteCheckLWP,
> 1956 &fiveminutes) == 0);
> 1957 assert(pthread_create
> 1958 (&serverPid, &tattr, (void *)HostCheckLWP, &fiveminutes) == 0);
> 1959 assert(pthread_create
> 1960 (&serverPid, &tattr, (void *)FsyncCheckLWP, &fiveminutes) == 0);
> 1961 #else /* AFS_PTHREAD_ENV */
> 1962 ViceLog(5, ("Starting LWP\n"));
> 1963 assert(LWP_CreateProcess
> 1964 (FiveMinuteCheckLWP, stack * 1024, LWP_MAX_PRIORITY - 2,
> 1965 (void *)&fiveminutes, "FiveMinuteChecks",
> 1966 &serverPid) == LWP_SUCCESS);
> Totally lost, frustrated and confused. Any devs wish to take pity on me
> and help? This is an AMD64 box running Debian.
> Tony Shadwick
> OSS Solutions