[OpenAFS] Serious trouble, mounting /afs, ptserver, database rebuilding

Christof Hanke hanke@csc.fi
Thu, 24 Jul 2008 09:04:08 +0300


Don't be so impatient. Your reluctance to give proper information
(OS, OpenAFS version, which services are running on which servers, and so on)
does not really invite help, does it?
Also, sending multiple emails with the same content doesn't really help.
Anyway...
First of all, running 2 database servers is a really bad idea. Either have one,
or three or more. (There have been umpteen mails about this here.)
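
To illustrate why two database servers buy you nothing over one, here is a
rough sketch of the quorum arithmetic in Python. It assumes the commonly
described Ubik rule: a sync site needs a strict majority of the votes, and
the yes vote of the server with the lowest IP address carries an extra half
vote. The function names are made up for illustration, and this is a
simplified model, not the actual Ubik code.

```python
def has_quorum(n_servers, yes_votes, lowest_voted_yes):
    """Simplified model: count in half-votes to avoid fractions.

    Each yes vote is worth 2 half-votes; a yes vote from the server with
    the lowest IP address adds 1 more. Quorum needs strictly more than
    half of all configured database servers.
    """
    half_votes = 2 * yes_votes + (1 if lowest_voted_yes else 0)
    return half_votes > n_servers


def survives_any_single_failure(n_servers):
    """True if quorum still holds no matter which one server goes down."""
    if n_servers < 2:
        return False  # the only server is gone, nobody is left to vote
    remaining = n_servers - 1
    lowest_is_down = has_quorum(n_servers, remaining, lowest_voted_yes=False)
    other_is_down = has_quorum(n_servers, remaining, lowest_voted_yes=True)
    return lowest_is_down and other_is_down


for n in (1, 2, 3):
    print(n, "database server(s):", survives_any_single_failure(n))
```

Under this model, a two-server cell only keeps its quorum if the server that
fails is the one with the higher address, so it is no safer than a single
server; with three servers any one of them may fail.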

The quickest fix for now is to disable the database services on server1,
because your udebug output shows that server1 is not working, while server2 is.
Once the database instances on server1 are gone, your cell should work again.
Since I don't know what you have running for authentication (kaserver or
Kerberos 5), I can't tell you exactly which instances those are.
Basically, you want only the instances "fs" and maybe "upclient" running
on server1.
You do this the following way:
* Shut down server2.
* Stop all instances on server1 ("bos stop") except "fs" (and, if it is
running, "upclient") and then delete them ("bos delete").
* Shut down server1.
* On both servers, remove the IP address of server1 from the file
/etc/openafs/server/CellServDB.
(I just guessed the path of the CellServDB; it might be somewhere else.)

* Start server2.
* Make sure it is up and running (udebug should also give you a positive
answer).
* Start server1.
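
The CellServDB edit and the udebug check from the steps above could be
sketched like this in Python. It is only a sketch: the function names, the
cell name and the 192.0.2.x addresses are made up for illustration, and it
assumes the usual CellServDB layout (a ">cellname" line followed by one
"address #hostname" line per database server). The udebug strings it looks
for are the ones that appear in the output quoted below.

```python
def remove_db_server(cellservdb_text, ip):
    """Return CellServDB content without the entry for the given address."""
    kept = []
    for line in cellservdb_text.splitlines():
        # Everything before '#' is the address (or the '>cellname' header).
        address = line.split("#", 1)[0].strip()
        if address == ip:
            continue  # drop the retired database server
        kept.append(line)
    return "\n".join(kept) + "\n"


def is_sync_site(udebug_output):
    """True if udebug reports that the queried server is the sync site."""
    return any(
        line.strip().startswith("I am sync site")
        for line in udebug_output.splitlines()
    )


# Hypothetical CellServDB content; real cells use their own names/addresses.
example = (
    ">example.org            #hypothetical cell\n"
    "192.0.2.1               #server1.example.org\n"
    "192.0.2.2               #server2.example.org\n"
)
print(remove_db_server(example, "192.0.2.1"), end="")
```

Write the result back to the CellServDB on both servers only while they are
shut down, as in the steps above, and feed is_sync_site() the text that
"udebug server2 7002" and "udebug server2 7003" print once server2 is back.
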

After that, go to your boss and ask for a third database-server. 


T/Christof



On Thursday 24 July 2008 07:46:27 kanou wrote:
> So, I think there will be no help.
> So please tell me:
> Is it possible to build a system from scratch and to include all our
> user data, all the files from our users?
> I have clean backups of all files.
>
> cheers
> kanou
>
> On 23.07.2008 at 22:45, kanou wrote:
> > So, I thank you all for your help. I'm sorry for not being so
> > experienced with AFS/Kerberos. It's pretty new to me.
> > By now, my first server, the one that was broken, is running pretty
> > well and everybody gets their files, but the database still needs
> > rebuilding and I don't know how.
> >
> > The second one, which worked well until the other began running
> > again, is in a bad state.
> > Nobody can connect to their files and I'm not able to get a ticket
> > from Kerberos.
> >
> > bos status -server myserver2
> > bos: no such entry (getting tickets)
> > bos: running unauthenticated
> > Instance fs, currently running normally.
> >    Auxiliary status is: file server running.
> > Instance ptserver, currently running normally.
> > Instance vlserver, currently running normally.
> >
> > and logs:
> > ==> /var/log/openafs/PtLog <==
> > ptserver: Unknown code pt 11 (267275) Can't rebuild database because
> > it is not empty
> >
> > please tell me how to rebuild the database.
> > kanou
> >
> > On 23.07.2008 at 20:06, kanou wrote:
> >> Well i did:
> >> udebug myserver 7002
> >> Host's addresses are:  ip-myserver
> >> Host's  ip-myserver time is Wed Jul 23 19:47:26 2008
> >> Local time is Wed Jul 23 19:47:29 2008 (time differential 3 secs)
> >> Last yes vote for ip-myserver2 was 24 secs ago (sync site);
> >> Last vote started 24 secs ago (at Wed Jul 23 19:47:05 2008)
> >> Local db version is 1214806124.7
> >> I am not sync site
> >> Lowest host ip-myserver2 was set 24 secs ago
> >> Sync host ip-myserver2 was set 24 secs ago
> >> Sync site's db version is 1214806124.7
> >> 0 locked pages, 0 of them for write
> >>
> >> udebug myserver 7003
> >> Host's addresses are: ip-myserver
> >> Host's ip-myserver time is Wed Jul 23 19:49:34 2008
> >> Local time is Wed Jul 23 19:49:35 2008 (time differential 1 secs)
> >> Last yes vote for ip-myserver was 44 secs ago (not sync site);
> >> Last vote started 44 secs ago (at Wed Jul 23 19:48:51 2008)
> >> Local db version is 1216833173.3
> >> I am not sync site
> >> Lowest host ip-myserver2 was set 5 secs ago
> >> Sync host 0.0.0.0 was set 154 secs ago
> >> Sync site's db version is 1216833173.3
> >> 0 locked pages, 0 of them for write
> >>
> >> and on myserver2:
> >> udebug myserver2 7002
> >> Host's addresses are: ip-myserver2
> >> Host's ip-myserver2 time is Wed Jul 23 19:56:38 2008
> >> Local time is Wed Jul 23 19:56:39 2008 (time differential 1 secs)
> >> Last yes vote for ip-myserver2 was 6 secs ago (sync site);
> >> Last vote started 6 secs ago (at Wed Jul 23 19:56:33 2008)
> >> Local db version is 1214806124.7
> >> I am sync site until 51 secs from now (at Wed Jul 23 19:57:30 2008)
> >> (2 servers)
> >> Recovery state 1f
> >> Sync site's db version is 1214806124.7
> >> 0 locked pages, 0 of them for write
> >>
> >> Server (ip-myserver): (db 1214806124.7)
> >>   last vote rcvd 9 secs ago (at Wed Jul 23 19:56:30 2008),
> >>   last beacon sent 6 secs ago (at Wed Jul 23 19:56:33 2008), last
> >> vote was yes
> >>   dbcurrent=1, up=1 beaconSince=1
> >>
> >> udebug myserver2 7003
> >> Host's addresses are: ip-myserver2
> >> Host's ip-myserver2 time is Wed Jul 23 19:57:50 2008
> >> Local time is Wed Jul 23 19:57:50 2008 (time differential 0 secs)
> >> Last yes vote for ip-myserver2 was 3 secs ago (sync site);
> >> Last vote started 3 secs ago (at Wed Jul 23 19:57:47 2008)
> >> Local db version is 1216835658.20
> >> I am sync site until 56 secs from now (at Wed Jul 23 19:58:46 2008)
> >> (2 servers)
> >> Recovery state 1f
> >> Sync site's db version is 1216835658.20
> >> 0 locked pages, 0 of them for write
> >> Last time a new db version was labelled was:
> >>       212 secs ago (at Wed Jul 23 19:54:18 2008)
> >>
> >> Server (ip-myserver): (db 1216835658.20)
> >>   last vote rcvd 4 secs ago (at Wed Jul 23 19:57:46 2008),
> >>   last beacon sent 3 secs ago (at Wed Jul 23 19:57:47 2008), last
> >> vote was yes
> >>   dbcurrent=1, up=1 beaconSince=1
> >>
> >> The system on myserver (the first one) is now running again, but if
> >> I try to create a user I still get:
> >> pts: database needs rebuilding ; unable to create user TESTUSER
> >> with id 3563
> >> Volume 536872154 created on partition /vicepa of myserver
> >> Released volume cell.user successfully
> >> fs: Invalid argument, possible reasons include:
> >>      -File not in AFS
> >>      -Too many users on access control list
> >>      -Tried to add non-existent user to access control list
> >> pts: User or group doesn't exist ; unable to add user TESTUSER to
> >> group TESTGROUP
> >> pts: User or group doesn't exist ; unable to add user TESTUSER to
> >> group TESTAFSUSER
> >> Added replication site myserver /vicepa for volume user. TESTUSER
> >>
> >> Could someone please tell me how to rebuild the protection database?
> >> I can't run bos on myserver2 because I don't get a ticket from Kerberos.
> >> Or do "/etc/init.d/openafs-client stop" and
> >> "/etc/init.d/openafs-fileserver stop" do the job of stopping the whole
> >> system?
> >>
> >> thanks all for your help
> >> kanou
> >>
> >> On 23.07.2008 at 19:30, Hartmut Reuter wrote:
> >>> kanou wrote:
> >>>> My logs on the second machine tell me:
> >>>> ==> /var/log/openafs/FileLog.old <==
> >>>> Wed Jul 23 19:03:37 2008 File server starting
> >>>> Wed Jul 23 19:03:37 2008 afs_krb_get_lrealm failed, myserver2.
> >>>> Wed Jul 23 19:03:37 2008 VL_RegisterAddrs rpc failed; will retry
> >>>> periodically (code=5376, err=4)
> >>>
> >>> code 5376 means no quorum elected. Are you sure your database
> >>> servers are all running?
> >>>
> >>> Try "udebug <server> 7002" for the ptserver
> >>> and "udebug <server> 7003" for the vldb
> >>>
> >>>> Wed Jul 23 19:03:37 2008 Couldn't get CPS for AnyUser, will try
> >>>> again  in 30 seconds; code=267275.
> >>>> ==> /var/log/openafs/SalvageLog <==
> >>>> 07/23/2008 19:08:27 SALVAGING OF PARTITION /vicepa COMPLETED
> >>>> and aklog gives me:
> >>>> aklog: Couldn't get hrf.uni-koeln.de AFS tickets:
> >>>> aklog: Cannot contact any KDC for requested realm while getting
> >>>> AFS  tickets
> >>>> damn! i did not do anything on that second one!
> >>>>
> >>>>> Just to make sure you're working on the correct file:
> >>>>> As I understand it, you first deleted the file
> >>>>> /var/lib/openafs/db/prdb.DB0.
> >>>>> This file was then probably recreated when you restarted the
> >>>>> ptserver.
> >>>>> Run this command on the backup file you made first (or better, on
> >>>>> a copy of the backup file).
> >>>>>
> >>>>> T/Christof
> >>>>> ________________________________________
> >>>>> From: openafs-info-admin@openafs.org [openafs-info-admin@openafs.org]
> >>>>> On Behalf Of kanou [kanou@gmx.ch]
> >>>>> Sent: Wednesday, July 23, 2008 6:46 PM
> >>>>> To: openafs-info@openafs.org
> >>>>> Subject: Re: [OpenAFS] Serious trouble, mounting /afs,
> >>>>> ptserver,  database rebuilding
> >>>>>
> >>>>> Thanks for your answer.
> >>>>> Well, I found the file prdb_check. It doesn't print any errors. The
> >>>>> only thing I can find is, with
> >>>>> ./prdb_check -database /var/lib/openafs/db/prdb.DB0 -uheader -verbose
> >>>>> this line:
> >>>>> Ubik header size is 0 (should be 64)
> >>>>>
> >>>>> So there are no errors! I can start the server and everything runs
> >>>>> fine, but the machine won't mount /afs!
> >>>>> kanou
> >>>>>
> >>>>> On 23.07.2008 at 17:26, Steven Jenkins wrote:
> >>>>>> On Wed, Jul 23, 2008 at 10:51 AM, kanou <kanou@gmx.ch> wrote:
> >>>>>>> Hello,
> >>>>>>> Well, there is a file called db_verify.c in the folder
> >>>>>>> /usr/src/modules/openafs/ptserver but I don't know how to build
> >>>>>>> it.
> >>>>>>
> >>>>>> If I recall correctly, db_verify gets renamed to 'prdb_check' during
> >>>>>> the install, so you should check for the existence of that file.
> >>>>>>
> >>>>>> If you can't find it, you'll need to build it from source code; the
> >>>>>> directions on the AFSLore wiki are a good place to start:
> >>>>>>
> >>>>>> http://www.dementia.org/twiki/bin/view/AFSLore/HowToBuildOpenAFSFromSource
> >>>>>>
> >>>>>> If you have problems building openafs-stable-1_4_x, you could get
> >>>>>> openafs-stable-1_4_7 instead, as that is the latest official release.
> >>>>>>
> >>>>>> Once you have built the tree, src/ptserver/db_verify should get built,
> >>>>>> so you can simply copy it out of the source tree for your use. If it
> >>>>>> doesn't get built automatically for you, you can cd into src/ptserver
> >>>>>> and do a 'make db_verify' manually.
> >>>>>>
> >>>>>> Also, feel free to ask for help here  or on the irc channel.
> >>>>>>
> >>>>>> Steven Jenkins
> >>>>>> End Point Corporation
> >>>>>> http://www.endpoint.com/
> >>>>>
> >>>>> _______________________________________________
> >>>>> OpenAFS-info mailing list
> >>>>> OpenAFS-info@openafs.org
> >>>>> https://lists.openafs.org/mailman/listinfo/openafs-info
> >>>>
> >>> --
> >>> -----------------------------------------------------------------
> >>> Hartmut Reuter                  e-mail              reuter@rzg.mpg.de
> >>>                             phone            +49-89-3299-1328
> >>>                             fax              +49-89-3299-1301
> >>> RZG (Rechenzentrum Garching)        web    http://www.rzg.mpg.de/~hwr
> >>> Computing Center of the Max-Planck-Gesellschaft (MPG) and the
> >>> Institut fuer Plasmaphysik (IPP)
> >>> -----------------------------------------------------------------