[OpenAFS] Problems with OpenAFS 1.4.1.
Beam Davis
beam@transmeta.com
Thu, 15 Jun 2006 15:22:59 -0700
I was running an IBM AFS cell on IBM pSeries (RS/6000) hardware (on the
server side). I have clients running IBM AFS on IBM pSeries machines
(AIX 5.1) and OpenAFS on Linux and a few legacy Solaris machines. We
decided to migrate to OpenAFS 1.4.1 database and file servers running
Linux (2.6.14, 64-bit). I've moved over to the new Linux servers
without much difficulty, but...
We used to use Netbackup to back up our cell, but Netbackup doesn't
support OpenAFS file servers (only IBM AFS file servers). Netbackup
understood the IBM AFS vice partitions and could back up volumes
directly without needing buserver or butc. Now we have to use buserver
and butc to dump the contents of our cell to files so Netbackup can back
up the files.
I configured buserver on our 3 new OpenAFS database servers. It is
running OK and backup can talk to it. I have 3 file servers, 2 running
OpenAFS on Linux and 1 still running IBM AFS on AIX (needed if we have
to restore any of the volumes backed up with Netbackup from the old IBM
AFS file servers).
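In case it matters, I created each buserver instance with the usual bos
command, once per database server (the buserver path below is what I
believe it is on my install, so adjust as needed):

bos create <db-server> buserver simple /usr/afs/bin/buserver -localauth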
I have configured and started an instance of butc for each day of the
week on both OpenAFS file servers (nothing is backed up from the IBM AFS
file server -- it is just for restores). I've also configured and
started a separate butc instance for each day of the week for "backup
savedb" runs.
I've included additional configuration below, but first my problem...
When I run, "backup savedb 13 -localauth", it works fine, but when I
run, "backup dump anu.weekly /full 3 -localauth", it lists all the files
it's going to backup, then it seg. faults:
...
root.projects.backup (536870929)
rs_aix51.usr.afsws.backup (536870926)
rs_aix51.usr.backup (536870923)
rs_aix51.backup (536870920)
root.cell.backup (536870917)
root.afs.backup (536870914)
Segmentation fault
The same thing happens with the other file server. I even tried
creating a volset to dump only one volume, and "backup dump" still
segfaulted.
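If a stack trace would help, I figure I can get one like this (assuming
core dumps are enabled and the binaries have symbols; the path to the
backup binary is a guess for my install):

ulimit -c unlimited
backup dump anu.weekly /full 3 -localauth
gdb `which backup` core

...then "bt" at the gdb prompt. For the butc crash described below, the
equivalent would be running butc in the foreground under gdb instead of
nohup:

gdb --args /usr/sbin/butc -port 3 -localauth

...then "run", wait for the segfault, and "bt".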
If I try to use IBM AFS's backup command to back up one of the OpenAFS
file servers from the IBM AFS file server, I get this:
...
root.projects.backup (536870929)
rs_aix51.usr.afsws.backup (536870926)
rs_aix51.usr.backup (536870923)
rs_aix51.backup (536870920)
root.cell.backup (536870917)
root.afs.backup (536870914)
backup: waiting for job termination
Starting dump
backup: Task 3001: Dump (anu.weekly.full)
...but then the butc process for port 3 segfaults and dies on the
OpenAFS file server that was the target of the "backup dump" operation.
My question is: what the heck am I doing wrong? Does anyone see a
problem with my configuration, or have any idea why this stuff keeps
segfaulting?
About how I'm starting my butc's... I've created a subdirectory under
"/usr/afs/local" for each butc instance (for example,
"/usr/afs/local/butc0"), and I start each one in its own subdirectory
with this command: "nohup /usr/sbin/butc -port 0 -localauth &" (with
the appropriate port number after "-port"). Since they are all started
in separate subdirectories, they each write to their own "nohup.out"
file. If there is a problem with using nohup with butc (I'm not aware
of any), please let me know.
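In other words, the whole startup on anu amounts to something like this
(assuming all the directories follow the "butc<port>" naming pattern of
the example above):

for p in 0 1 2 3 4 5 6 20 21 22 23 24 25 26; do
    (cd /usr/afs/local/butc$p && nohup /usr/sbin/butc -port $p -localauth &)
done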
Additional Configuration Information:
My Saturday butc "CFG" file on each of the 2 OpenAFS file servers, for
example, is called "CFG_afsbackup_sat" and looks like this (actually,
all my "CFG" files look like this):
ASK NO
AUTOQUERY NO
FILE YES
NAME_CHECK NO
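My understanding of those directives, for anyone unfamiliar: ASK NO
suppresses operator prompts on errors, AUTOQUERY NO skips the initial
"insert tape" prompt, FILE YES writes the dump to a file instead of a
tape device, and NAME_CHECK NO disables tape-name checking.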
The "tapeconfig" from one of my file servers looks like this:
1.5T 0 /afsbackup/sun 0
1.5T 0 /afsbackup/mon 1
1.5T 0 /afsbackup/tue 2
1.5T 0 /afsbackup/wed 3
1.5T 0 /afsbackup/thu 4
1.5T 0 /afsbackup/fri 5
1.5T 0 /afsbackup/sat 6
1.5T 0 /afsbackup/sundb 20
1.5T 0 /afsbackup/mondb 21
1.5T 0 /afsbackup/tuedb 22
1.5T 0 /afsbackup/weddb 23
1.5T 0 /afsbackup/thudb 24
1.5T 0 /afsbackup/fridb 25
1.5T 0 /afsbackup/satdb 26
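(The tapeconfig fields, as I understand them, are: tape capacity,
filemark size, device or file name, and port offset; with FILE YES in
the CFG, the "device" is just the target file.)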
Ports 20-26 are for "backup savedb". The tapeconfig on the other file
server is pretty similar, but with no "backup savedb" butc's, and the
ports are numbered 10-16. I've "touch"ed each of the target files on
both file servers, so they exist (0 bytes).
This is the output from "backup listhosts":
Tape hosts:
Host anu, port offset 0
Host anu, port offset 1
Host anu, port offset 2
Host anu, port offset 3
Host anu, port offset 4
Host anu, port offset 5
Host anu, port offset 6
Host anu, port offset 20
Host anu, port offset 21
Host anu, port offset 22
Host anu, port offset 23
Host anu, port offset 24
Host anu, port offset 25
Host anu, port offset 26
Host calypso, port offset 10
Host calypso, port offset 11
Host calypso, port offset 12
Host calypso, port offset 13
Host calypso, port offset 14
Host calypso, port offset 15
Host calypso, port offset 16
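For completeness, each of those entries was registered with "backup
addhost", one command per port offset, along the lines of:

backup addhost anu 0 -localauth
backup addhost calypso 10 -localauth

...and so on for the rest of the offsets.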
This is the output from "backup listvolsets":
Volume set anu.weekly:
Entry 1: server anu, partition /vicepa, volumes: .*\.backup
Volume set calypso.weekly:
Entry 1: server calypso, partition /vicepa, volumes: .*\.backup
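Those were created with the standard volset commands, something like:

backup addvolset anu.weekly -localauth
backup addvolentry anu.weekly anu /vicepa '.*\.backup' -localauth

...and the same pair for calypso.weekly.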
This is the output from "backup listdumps":
/full expires in 7d
/incr expires in 7d
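Likewise, the dump levels came from "backup adddump", something like:

backup adddump /full -expires in 7d -localauth
backup adddump /incr -expires in 7d -localauth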
I'd thank you in advance, but I think I'll wait to see if anyone even
reads this novel, much less replies to it.
Beam Davis
Systems and Network Administrator
Transmeta Corporation
3990 Freedom Circle
Santa Clara, CA 95054
E-Mail: beam@transmeta.com
Telephone: (408) 919-3065
Home: http://www.transmeta.com/
--- Where there's smoke there's fire, but where there's a vague fishy odor, it could be any number of things.