[OpenAFS-port-freebsd] Problems on 5.2-R

Nathan A. Cosgray ncosgray@u.washington.edu
Tue, 13 Apr 2004 10:21:50 -0700


Hello,

I'm having two problems with OpenAFS on a FreeBSD 5.2-R system we have here.

# uname -a
FreeBSD x.washington.edu 5.2-RELEASE FreeBSD 5.2-RELEASE #0: Sun Jan 11 04:21:45 GMT 2004 root@wv1u.btc.adaptec.com:/usr/obj/usr/src/sys/GENERIC  i386

Firstly, we cannot get the stable 1.2.11 to compile on this system. The 
latest unstable release that will successfully compile is 1.3.52. All 
later versions fail as follows (this is an attempt to compile 1.3.63):

# ./configure --with-afs-sysname=i386_fbsd_52 --enable-transarc-paths
[deletia]
# make
[deletia]
touch MODLOAD/sec_net.h
cd MODLOAD ;  echo make DESTDIR= single_compdir_libafs;  make DESTDIR= single_compdir_libafs
make DESTDIR= single_compdir_libafs
cc -I. -I.. -I../nfs  -I/usr/ports/AFS/openafs-1.3.63/src 
-I/usr/ports/AFS/openafs-1.3.63/src/afs 
-I/usr/ports/AFS/openafs-1.3.63/src/afs/FBSD 
-I/usr/ports/AFS/openafs-1.3.63/src/config 
-I/usr/ports/AFS/openafs-1.3.63/src/rx/FBSD 
-I/usr/ports/AFS/openafs-1.3.63/src/rxkad 
-I/usr/ports/AFS/openafs-1.3.63/src/rxkad/domestic 
-I/usr/ports/AFS/openafs-1.3.63/src/util 
-I/usr/ports/AFS/openafs-1.3.63/src 
-I/usr/ports/AFS/openafs-1.3.63/src/afs 
-I/usr/ports/AFS/openafs-1.3.63/src/afs/FBSD 
-I/usr/ports/AFS/openafs-1.3.63/src/util 
-I/usr/ports/AFS/openafs-1.3.63/src/rxkad 
-I/usr/ports/AFS/openafs-1.3.63/src/config 
-I/usr/ports/AFS/openafs-1.3.63/src/fsint 
-I/usr/ports/AFS/openafs-1.3.63/src/vlserver 
-I/usr/ports/AFS/openafs-1.3.63/include 
-I/usr/ports/AFS/openafs-1.3.63/include/afs  -O -I. -I.. 
-I/usr/ports/AFS/openafs-1.3.63/src/config  -DAFSDEBUG -DKERNEL -DAFS 
-DVICE -DNFS -DUFS -DINET -DQUOTA -DGETMOUNT -Wall -ansi -nostdinc 
-I/usr/include -D_KERNEL -DKLD_MODULE  -elf -mpreferred-stack-boundary=2 
  -mno-align-long-strings -fformat-extensions -fno-common -ffreestanding 
  -I/usr/src/sys/i386/compile/GENERIC -include opt_global.h 
-fno-strict-aliasing  -O2 -c 
/usr/ports/AFS/openafs-1.3.63/src/util/afs_atomlist.c
cc1: opt_global.h: No such file or directory
*** Error code 1

Stop in /usr/ports/AFS/openafs-1.3.63/src/libafs/MODLOAD.
*** Error code 1

Stop in /usr/ports/AFS/openafs-1.3.63/src/libafs.
*** Error code 1

Stop in /usr/ports/AFS/openafs-1.3.63.
*** Error code 1

Stop in /usr/ports/AFS/openafs-1.3.63.
*** Error code 1

Stop in /usr/ports/AFS/openafs-1.3.63.

The error is the same on all 1.3.6x versions. Does anyone have a
suggestion about what may be wrong or missing? opt_global.h does not
exist anywhere on the machine. So that's the first problem.

The above configure + make sequence succeeds on 1.3.52, so this is what
we have been using. At first glance 1.3.52 seems to work perfectly,
with all required OpenAFS server daemons running and operational. But
I have noticed that the fileserver dumps core precisely every 5 minutes.

/usr/afs/logs/FileLog:
Tue Apr  6 13:26:47 2004 File server starting
Tue Apr  6 13:26:47 2004 afs_krb_get_lrealm failed, using x.washington.edu.
Tue Apr  6 13:26:47 2004 Partition /vicepa: attaching volumes
Tue Apr  6 13:26:47 2004 Partition /vicepa: attached 2 volumes; 0 volumes not attached
Tue Apr  6 13:26:47 2004 Getting FileServer name...
Tue Apr  6 13:26:47 2004 FileServer host name is 'x.washington.edu'
Tue Apr  6 13:26:47 2004 Getting FileServer address...
Tue Apr  6 13:26:47 2004 Set thread id 14 for FSYNC_sync
Tue Apr  6 13:26:47 2004 Set thread id 15 for 'FiveMinuteCheckLWP'
Tue Apr  6 13:26:47 2004 Set thread id 16 for 'HostCheckLWP'
Tue Apr  6 13:26:47 2004 Set thread id 17 for 'FsyncCheckLWP'
Tue Apr  6 13:26:47 2004 FileServer x.washington.edu has address x (x or x in host byte order)
Tue Apr  6 13:26:47 2004 File Server started Tue Apr  6 13:26:47 2004
Tue Apr  6 13:31:47 2004 pthread_cond_timedwait returned 22

I ran a ktrace too:
[deletia]
  11867 fileserver RET   poll 0
  11867 fileserver CALL  poll(0x80ea000,0x4,0x24c3)
  11867 fileserver RET   poll 0
  11867 fileserver CALL  gettimeofday(0x2810ab18,0)
  11867 fileserver RET   gettimeofday 0
  11867 fileserver CALL  gettimeofday(0xbf9ccf98,0)
  11867 fileserver RET   gettimeofday 0
  11867 fileserver CALL  gettimeofday(0xbf9ccb38,0)
  11867 fileserver RET   gettimeofday 0
  11867 fileserver CALL  write(0x6,0xbf9ccb70,0x3c)
  11867 fileserver GIO   fd 6 wrote 60 bytes
        "Tue Apr  6 10:41:47 2004 pthread_cond_timedwait returned 22
        "
  11867 fileserver RET   write 60/0x3c
  11867 fileserver CALL  gettimeofday(0xbf9ccf48,0)
  11867 fileserver RET   gettimeofday 0
  11867 fileserver CALL  fstat(0x2,0xbf9ccc30)
  11867 fileserver RET   fstat 0
  11867 fileserver CALL  ioctl(0x2,TIOCGETA,0xbf9ccc70)
  11867 fileserver RET   ioctl -1 errno 25 Inappropriate ioctl for device
  11867 fileserver CALL  write(0x2,0x8603000,0x4e)
  11867 fileserver GIO   fd 2 wrote 78 bytes
        "Tue Apr  6 10:41:47 2004
         : Assertion failed! file ../viced/viced.c, line 501.
        "
  11867 fileserver RET   write 78/0x4e
  11867 fileserver CALL  write(0x5,0x85fe000,0x1)
  11867 fileserver GIO   fd 5 wrote 1 byte
        "\r"
  11867 fileserver RET   write 1
  11867 fileserver CALL  getpid
  11867 fileserver RET   getpid 11867/0x2e5b
  11867 fileserver CALL  kill(0x2e5b,0x6)
  11867 fileserver RET   kill 0
  11867 fileserver PSIG  SIGIOT caught handler=0x280f9ac0 mask=0x0 code=0x0
  11867 fileserver CALL  sigreturn(0xbf9ccbe0)
  11867 fileserver RET   sigreturn JUSTRETURN
  11867 fileserver CALL  sigaction(0x6,0xbf9cced0,0)
  11867 fileserver RET   sigaction 0
  11867 fileserver CALL  getpid
  11867 fileserver RET   getpid 11867/0x2e5b
  11867 fileserver CALL  kill(0x2e5b,0x6)
  11867 fileserver RET   kill 0
  11867 fileserver PSIG  SIGIOT SIG_DFL
  11867 fileserver NAMI  "fileserver.core"

So, any ideas? I've heard there are pthread problems with OpenAFS but 
was unable to find solutions. We'd really like to get AFS running on 
this server, but things are not looking good so far.

thanks!
-n.