[OpenAFS] Problems with OpenAFS 1.4.1.

Beam Davis beam@transmeta.com
Thu, 15 Jun 2006 15:22:59 -0700


I was running an IBM AFS cell on IBM pSeries (RS/6000) hardware (on the 
server side).  I have clients running IBM AFS on IBM pSeries machines 
(AIX 5.1) and OpenAFS on both Linux & a few legacy Solaris machines.  It 
was decided to migrate to OpenAFS 1.4.1 database & file servers running 
Linux (2.6.14, 64-bit).  I've moved over to the new Linux servers 
without much difficulty, but...

We used to use Netbackup to back up our cell, but Netbackup doesn't 
support OpenAFS file servers (only IBM AFS file servers).  Netbackup 
understood the IBM AFS vice partitions and could back up volumes directly 
without the need for buserver or butc.  Now we have to use buserver and 
butc to dump the contents of our cell to files so Netbackup can back up 
the files.

I configured buserver on our 3 new OpenAFS database servers.  It is 
running OK and backup can talk to it.  I have 3 file servers, 2 running 
OpenAFS on Linux and 1 still running IBM AFS on AIX (needed if we have 
to restore any of the volumes backed up with Netbackup from the old IBM 
AFS file servers).
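
About the buserver setup: for completeness, each buserver instance was 
created the standard way with bos -- something like this on each of the 
3 database servers ("db1" here is a stand-in for the real hostnames):

    bos create db1 buserver simple /usr/afs/bin/buserver -localauth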

I have configured and started an instance of butc for each day of the 
week on both OpenAFS file servers (nothing is backed up from the IBM AFS 
file server -- it is just for restores).  I've also configured and 
started a separate butc instance for each day of the week for "backup 
savedb" runs.

I've included additional configuration below, but first my problem...  
When I run "backup savedb 13 -localauth", it works fine, but when I 
run "backup dump anu.weekly /full 3 -localauth", it lists all the 
volumes it's going to back up, then it segfaults:

...
        root.projects.backup (536870929)
        rs_aix51.usr.afsws.backup (536870926)
        rs_aix51.usr.backup (536870923)
        rs_aix51.backup (536870920)
        root.cell.backup (536870917)
        root.afs.backup (536870914)
Segmentation fault

The same thing happens with the other file server.  I even tried 
creating a volset to dump only 1 volume, and "backup dump" still 
segfaulted (that test is sketched below).
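
In case it helps, the single-volume test went roughly like this (the 
volset and volume names here are just examples, not exactly what I 
typed):

    backup addvolset anu.test -localauth
    backup addvolentry anu.test anu a root.afs.backup -localauth
    backup dump anu.test /full 3 -localauth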

If I try to use IBM AFS's backup command to back up one of the OpenAFS 
file servers from the IBM AFS file server, I get this:

...
        root.projects.backup (536870929)
        rs_aix51.usr.afsws.backup (536870926)
        rs_aix51.usr.backup (536870923)
        rs_aix51.backup (536870920)
        root.cell.backup (536870917)
        root.afs.backup (536870914)
backup: waiting for job termination
Starting dump
backup: Task 3001: Dump (anu.weekly.full)

...but then the butc process for port offset 3 segfaults and dies on the 
OpenAFS file server that was the target of the "backup dump" operation.

My question is: what the heck am I doing wrong?  Does anyone see a 
problem with my configuration, or have any idea why this stuff keeps 
segfaulting?

About how I'm starting my butc's...  I've created a subdirectory under 
"/usr/afs/local" for each butc instance (example: 
"/usr/afs/local/butc0") and I start each one in its own subdirectory 
with this command: "nohup /usr/sbin/butc -port 0 -localauth &".  Of 
course, I put the appropriate port number after "-port".  Since they are 
all started in separate subdirectories, they each write to their own 
"nohup.out" file.  If there is a problem with using nohup with butc (I'm 
not aware of any), please let me know.
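
In case it matters, the whole startup on anu boils down to a loop like 
this (the subshell keeps each instance in its own directory, so the 
"nohup.out" files don't collide):

    for port in 0 1 2 3 4 5 6 20 21 22 23 24 25 26; do
        ( cd /usr/afs/local/butc$port && nohup /usr/sbin/butc -port $port -localauth & )
    done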

Additional Configuration Information:

My Saturday butc "CFG" file on each of the 2 OpenAFS file servers, for 
example, is called "CFG_afsbackup_sat" and looks like this (actually, 
all my "CFG" files look like this):

ASK NO
AUTOQUERY NO
FILE YES
NAME_CHECK NO
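
(For anyone unfamiliar with those directives: as I understand the butc 
documentation, ASK NO suppresses operator prompts on errors, AUTOQUERY 
NO skips the prompt for the first tape, FILE YES dumps to a file instead 
of a tape device, and NAME_CHECK NO disables the tape/file name check.)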

The "tapeconfig" from one of my file servers looks like this:

1.5T 0 /afsbackup/sun 0
1.5T 0 /afsbackup/mon 1
1.5T 0 /afsbackup/tue 2
1.5T 0 /afsbackup/wed 3
1.5T 0 /afsbackup/thu 4
1.5T 0 /afsbackup/fri 5
1.5T 0 /afsbackup/sat 6
1.5T 0 /afsbackup/sundb 20
1.5T 0 /afsbackup/mondb 21
1.5T 0 /afsbackup/tuedb 22
1.5T 0 /afsbackup/weddb 23
1.5T 0 /afsbackup/thudb 24
1.5T 0 /afsbackup/fridb 25
1.5T 0 /afsbackup/satdb 26
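
(Reading those lines left to right: capacity, filemark size, device or 
file name, and port offset.  With FILE YES the "device" is just the path 
of the file butc writes to, and a filemark size of 0 is what the 
documentation calls for with file dumps, as far as I know.)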

Ports 20-26 are for "backup savedb".  The tapeconfig on the other file 
server is pretty similar, but with no "backup savedb" butc's, and its 
ports are numbered 10-16.  I "touch"ed each one of the target files on 
both file servers, so they exist (0 bytes).
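
The touch itself was nothing fancy -- on anu, for example:

    touch /afsbackup/{sun,mon,tue,wed,thu,fri,sat}
    touch /afsbackup/{sun,mon,tue,wed,thu,fri,sat}db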

This is the output from "backup listhosts":

Tape hosts:
    Host anu, port offset 0
    Host anu, port offset 1
    Host anu, port offset 2
    Host anu, port offset 3
    Host anu, port offset 4
    Host anu, port offset 5
    Host anu, port offset 6
    Host anu, port offset 20
    Host anu, port offset 21
    Host anu, port offset 22
    Host anu, port offset 23
    Host anu, port offset 24
    Host anu, port offset 25
    Host anu, port offset 26
    Host calypso, port offset 10
    Host calypso, port offset 11
    Host calypso, port offset 12
    Host calypso, port offset 13
    Host calypso, port offset 14
    Host calypso, port offset 15
    Host calypso, port offset 16
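
(Each of those entries was registered with "backup addhost", e.g. 
"backup addhost anu 3 -localauth" for the anu/offset-3 pair, and 
likewise for the rest.)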

This is the output from "backup listvolsets":

Volume set anu.weekly:
    Entry   1: server anu, partition /vicepa, volumes: .*\.backup

Volume set calypso.weekly:
    Entry   1: server calypso, partition /vicepa, volumes: .*\.backup

This is the output from "backup listdumps":

/full  expires in  7d
    /incr  expires in  7d
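
For reference, I believe the dump hierarchy was created with something 
like this (the relative "in 7d" expiration syntax is from memory, so 
double-check it):

    backup adddump -dump /full -expires in 7d -localauth
    backup adddump -dump /full/incr -expires in 7d -localauth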

I'd thank you in advance, but I think I'll wait to see if anyone even 
reads this novel, much less replies to it.

Beam Davis
Systems and Network Administrator
Transmeta Corporation
3990 Freedom Circle
Santa Clara, CA  95054

E-Mail:		beam@transmeta.com
Telephone:	(408) 919-3065
Home:		http://www.transmeta.com/

--- Where there's smoke there's fire, but where there's a vague fishy odor, it could be any number of things.