[OpenAFS] Serious trouble, mounting /afs, ptserver, database rebuilding

kanou kanou@gmx.ch
Wed, 23 Jul 2008 22:45:35 +0200


So, I thank you all for your help. I'm sorry for not being more
experienced with AFS/Kerberos; it's pretty new to me.
By now, my first server, the one that was broken, is running pretty
well and everybody gets their files, but the database still needs
rebuilding and I don't know how.

The second one, which worked well until the other began running again,
is in a bad state.
Nobody can connect to their files and I'm not able to get a ticket
from Kerberos.

bos status -server myserver2
bos: no such entry (getting tickets)
bos: running unauthenticated
Instance fs, currently running normally.
     Auxiliary status is: file server running.
Instance ptserver, currently running normally.
Instance vlserver, currently running normally.

And the logs:
==> /var/log/openafs/PtLog <==
ptserver: Unknown code pt 11 (267275) Can't rebuild database because it is not empty
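A side note on the numbers in these logs: they look like classic com_err codes, where the high bits identify an error table and the low 8 bits are the index of the error within that table. The Python sketch below assumes that encoding. The table names "PT" and "U" are my assumption about which tables are involved, although the arithmetic matches. Under that assumption, 267275 is entry 11 of the table starting at 267264, and 5376 (the code Hartmut calls "no quorum" further down) is entry 0 of the table starting at 5376. The same 267275 also appears in the FileLog ("Couldn't get CPS for AnyUser"), and it plausibly corresponds to the "database needs rebuilding" message that pts prints.

```python
"""Split a com_err-style error code into its table base and offset.

Assumption: codes follow the classic com_err scheme, in which an error
table is named by up to 4 characters packed 6 bits each (character index
+ 1) and shifted left by 8. The low 8 bits are the error's index.
"""

# Character set com_err uses to encode table names.
CHAR_SET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789_"
ERRCODE_RANGE = 8  # low bits reserved for the per-table error index


def table_base(name: str) -> int:
    """Return the first error code of the table called `name`."""
    num = 0
    for ch in name[:4]:
        num = (num << 6) + CHAR_SET.index(ch) + 1
    return num << ERRCODE_RANGE


def split_code(code: int) -> tuple[int, int]:
    """Return (table_base, offset_within_table) for an error code."""
    offset = code & ((1 << ERRCODE_RANGE) - 1)
    return code - offset, offset


if __name__ == "__main__":
    for code in (267275, 5376):
        base, offset = split_code(code)
        print(f"{code} = base {base} + offset {offset}")
    print("PT table base:", table_base("PT"))
    print("U table base:", table_base("U"))
```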

Please tell me how to rebuild the database.
kanou

On 23.07.2008 at 20:06, kanou wrote:

> Well, I did:
> udebug myserver 7002
> Host's addresses are:  ip-myserver
> Host's  ip-myserver time is Wed Jul 23 19:47:26 2008
> Local time is Wed Jul 23 19:47:29 2008 (time differential 3 secs)
> Last yes vote for ip-myserver2 was 24 secs ago (sync site);
> Last vote started 24 secs ago (at Wed Jul 23 19:47:05 2008)
> Local db version is 1214806124.7
> I am not sync site
> Lowest host ip-myserver2 was set 24 secs ago
> Sync host ip-myserver2 was set 24 secs ago
> Sync site's db version is 1214806124.7
> 0 locked pages, 0 of them for write
>
> udebug myserver 7003
> Host's addresses are: ip-myserver
> Host's ip-myserver time is Wed Jul 23 19:49:34 2008
> Local time is Wed Jul 23 19:49:35 2008 (time differential 1 secs)
> Last yes vote for ip-myserver was 44 secs ago (not sync site);
> Last vote started 44 secs ago (at Wed Jul 23 19:48:51 2008)
> Local db version is 1216833173.3
> I am not sync site
> Lowest host ip-myserver2 was set 5 secs ago
> Sync host 0.0.0.0 was set 154 secs ago
> Sync site's db version is 1216833173.3
> 0 locked pages, 0 of them for write
>
> and on myserver2:
> udebug myserver2 7002
> Host's addresses are: ip-myserver2
> Host's ip-myserver2 time is Wed Jul 23 19:56:38 2008
> Local time is Wed Jul 23 19:56:39 2008 (time differential 1 secs)
> Last yes vote for ip-myserver2 was 6 secs ago (sync site);
> Last vote started 6 secs ago (at Wed Jul 23 19:56:33 2008)
> Local db version is 1214806124.7
> I am sync site until 51 secs from now (at Wed Jul 23 19:57:30 2008) (2 servers)
> Recovery state 1f
> Sync site's db version is 1214806124.7
> 0 locked pages, 0 of them for write
>
> Server (ip-myserver): (db 1214806124.7)
>    last vote rcvd 9 secs ago (at Wed Jul 23 19:56:30 2008),
>    last beacon sent 6 secs ago (at Wed Jul 23 19:56:33 2008), last vote was yes
>    dbcurrent=1, up=1 beaconSince=1
>
> udebug myserver2 7003
> Host's addresses are: ip-myserver2
> Host's ip-myserver2 time is Wed Jul 23 19:57:50 2008
> Local time is Wed Jul 23 19:57:50 2008 (time differential 0 secs)
> Last yes vote for ip-myserver2 was 3 secs ago (sync site);
> Last vote started 3 secs ago (at Wed Jul 23 19:57:47 2008)
> Local db version is 1216835658.20
> I am sync site until 56 secs from now (at Wed Jul 23 19:58:46 2008) (2 servers)
> Recovery state 1f
> Sync site's db version is 1216835658.20
> 0 locked pages, 0 of them for write
> Last time a new db version was labelled was:
> 	 212 secs ago (at Wed Jul 23 19:54:18 2008)
>
> Server (ip-myserver): (db 1216835658.20)
>    last vote rcvd 4 secs ago (at Wed Jul 23 19:57:46 2008),
>    last beacon sent 3 secs ago (at Wed Jul 23 19:57:47 2008), last vote was yes
>    dbcurrent=1, up=1 beaconSince=1
>
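The database versions in these udebug outputs have the form epoch.counter. In the myserver2 output for port 7003, the epoch 1216835658 decodes to 17:54:18 UTC, which is 19:54:18 local time. That matches the "Last time a new db version was labelled" line, so in this output, at least, the epoch appears to be a Unix timestamp. The Python sketch below pulls the versions out of udebug text and decodes the epoch. It assumes the output is shaped like the samples above.

```python
"""Extract Ubik database versions from `udebug <host> <port>` output.

Assumption: lines look like "Local db version is 1216833173.3" and
"Sync site's db version is ...", as in the samples in this thread, and
the epoch part of a version is a Unix timestamp.
"""
import re
from datetime import datetime, timezone

VERSION_RE = re.compile(r"^(Local|Sync site's) db version is (\d+)\.(\d+)", re.M)


def db_versions(udebug_text: str) -> dict[str, tuple[int, int]]:
    """Map 'Local' and "Sync site's" to an (epoch, counter) tuple."""
    return {m.group(1): (int(m.group(2)), int(m.group(3)))
            for m in VERSION_RE.finditer(udebug_text)}


def epoch_as_utc(epoch: int) -> datetime:
    """Interpret a Ubik version epoch as a UTC timestamp."""
    return datetime.fromtimestamp(epoch, tz=timezone.utc)


if __name__ == "__main__":
    # Port 7003 snapshots from myserver and myserver2, trimmed to the version lines.
    myserver = "Local db version is 1216833173.3\nSync site's db version is 1216833173.3\n"
    myserver2 = "Local db version is 1216835658.20\nSync site's db version is 1216835658.20\n"
    a, b = db_versions(myserver)["Local"], db_versions(myserver2)["Local"]
    print("myserver :", a, epoch_as_utc(a[0]))
    print("myserver2:", b, epoch_as_utc(b[0]))
    print("versions agree" if a == b else "versions differ")
```

Keep in mind that the two 7003 snapshots were taken about eight minutes apart. In the later one, myserver2 also lists ip-myserver at db 1216835658.20, so a mismatch between the snapshots does not by itself mean that the servers still disagree.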
> The system on myserver (the first one) is now running again, but if I
> try to create a user I still get:
> pts: database needs rebuilding ; unable to create user TESTUSER with id 3563
> Volume 536872154 created on partition /vicepa of myserver
> Released volume cell.user successfully
> fs: Invalid argument, possible reasons include:
> 	-File not in AFS
> 	-Too many users on access control list
> 	-Tried to add non-existent user to access control list
> pts: User or group doesn't exist ; unable to add user TESTUSER to group TESTGROUP
> pts: User or group doesn't exist ; unable to add user TESTUSER to group TESTAFSUSER
> Added replication site myserver /vicepa for volume user. TESTUSER
>
> Could someone please tell me how to rebuild the protection database?
> I can't run bos on myserver2 because I don't get a ticket from Kerberos.
> Or do "/etc/init.d/openafs-client stop" and "/etc/init.d/openafs-fileserver stop"
> do the job of stopping the whole system?
>
> Thanks all for your help.
> kanou
>
>
> On 23.07.2008 at 19:30, Hartmut Reuter wrote:
>
>> kanou wrote:
>>> My logs on the second machine tell me:
>>> ==> /var/log/openafs/FileLog.old <==
>>> Wed Jul 23 19:03:37 2008 File server starting
>>> Wed Jul 23 19:03:37 2008 afs_krb_get_lrealm failed, myserver2.
>>> Wed Jul 23 19:03:37 2008 VL_RegisterAddrs rpc failed; will retry periodically (code=5376, err=4)
>>
>>
>> Code 5376 means no quorum was elected. Are you sure all your database
>> servers are running?
>>
>> Try "udebug <server> 7002" for the ptserver
>> and "udebug <server> 7003" for the vldb
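Some background on code 5376: as I understand the documented Ubik rule, a sync site needs yes votes from more than half of the database servers. The server with the lowest IP address gets an extra half vote to break ties. That matters when a cell has exactly two servers, as this one does, and udebug above reports "Lowest host ip-myserver2". The Python sketch below models only that rule. It is not Ubik's code, and the addresses in it are hypothetical stand-ins.

```python
"""Rough model of Ubik's sync-site quorum rule (a sketch, not Ubik's code).

Assumption: a candidate needs yes votes from more than half of the
database servers, and the lowest-addressed server's vote carries an extra
half vote. Votes are counted in half-vote units to keep everything integer.
"""
from ipaddress import IPv4Address


def has_quorum(servers: list[str], yes_voters: set[str]) -> bool:
    """Return True if `yes_voters` would be enough to elect a sync site."""
    lowest = min(servers, key=IPv4Address)
    half_votes = 0
    for server in servers:
        if server in yes_voters:
            half_votes += 2            # an ordinary vote = two half-votes
            if server == lowest:
                half_votes += 1        # tie-breaking extra half vote
    # "More than half of all votes" == more than len(servers) half-votes.
    return half_votes > len(servers)


if __name__ == "__main__":
    # Hypothetical addresses: .10 stands in for ip-myserver2 (the lowest host).
    servers = ["192.0.2.10", "192.0.2.20"]
    print(has_quorum(servers, {"192.0.2.10"}))  # True: lowest host alone
    print(has_quorum(servers, {"192.0.2.20"}))  # False: higher host alone
```

Under this model, in a two-server cell the lower-addressed host can hold quorum on its own, but the other host cannot.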
>>
>>> Wed Jul 23 19:03:37 2008 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=267275.
>>> ==> /var/log/openafs/SalvageLog <==
>>> 07/23/2008 19:08:27 SALVAGING OF PARTITION /vicepa COMPLETED
>>> And aklog gives me:
>>> aklog: Couldn't get hrf.uni-koeln.de AFS tickets:
>>> aklog: Cannot contact any KDC for requested realm while getting AFS tickets
>>> Damn! I did not do anything on that second one!
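aklog's "Cannot contact any KDC for requested realm" is a Kerberos library error, and it means that no KDC for the realm could be contacted. A quick first check is whether the KDC host is reachable at all. The Python sketch below assumes two things: you take the KDC host name from your krb5.conf (the name in the usage comment is hypothetical), and the KDC also listens on TCP port 88, the standard Kerberos port.

```python
"""Check whether a Kerberos KDC accepts TCP connections (sketch).

Assumptions: the host name comes from your krb5.conf, and the KDC listens
on TCP port 88 as well as UDP. A failed TCP connect does not prove the KDC
is down (it may be UDP-only). A success does rule out basic routing or
firewall problems.
"""
import socket

KERBEROS_PORT = 88


def tcp_reachable(host: str, port: int = KERBEROS_PORT, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name lookup failed
        return False


if __name__ == "__main__":
    import sys
    # Usage: python kdc_probe.py kdc.example.org   (hypothetical host name)
    for host in sys.argv[1:]:
        state = "reachable" if tcp_reachable(host) else "NOT reachable"
        print(f"{host}:{KERBEROS_PORT} {state}")
```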
>>>> Just to make sure you're working on the correct file:
>>>> As I understand it, you first deleted the file
>>>> /var/lib/openafs/db/prdb.DB0.
>>>> This file was then probably recreated when you restarted the
>>>> ptserver.
>>>> Run this command on the backup file you made earlier (or, better, on
>>>> a copy of the backup file).
>>>>
>>>> T/Christof
>>>> ________________________________________
>>>> From: openafs-info-admin@openafs.org [openafs-info-admin@openafs.org] On Behalf Of kanou [kanou@gmx.ch]
>>>> Sent: Wednesday, July 23, 2008 6:46 PM
>>>> To: openafs-info@openafs.org
>>>> Subject: Re: [OpenAFS] Serious trouble, mounting /afs, ptserver, database rebuilding
>>>>
>>>> Thanks for your answer.
>>>> Well, I found the file prdb_check. It doesn't print any errors. The
>>>> only thing I find, when running
>>>> ./prdb_check -database /var/lib/openafs/db/prdb.DB0 -uheader -verbose
>>>> is this line:
>>>> Ubik header size is 0 (should be 64)
>>>>
>>>> So there are no errors! I can start the server and everything runs
>>>> fine, but the machine won't mount /afs!
>>>> kanou
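About the "Ubik header size is 0 (should be 64)" line: this thread doesn't establish whether that line alone means damage. Christof's reply in this thread suggests that the check ran against a freshly recreated prdb.DB0 rather than against the backup. To compare the two files, the Python sketch below dumps the header fields. It assumes the struct layout described in its docstring, which is my recollection of OpenAFS's ubik.p.h and is not confirmed here.

```python
"""Dump the Ubik header fields at the start of a .DB0 file (sketch).

Assumed layout (my reading of struct ubik_hdr in OpenAFS's ubik.p.h;
verify against your source tree): big-endian 32-bit magic, 16-bit padding,
16-bit header size, then the database version as two 32-bit integers
(epoch, counter).
"""
import struct

HDRSIZE = 64  # the value prdb_check says the size field "should be"
HEADER = struct.Struct(">ihhii")  # magic, pad1, size, epoch, counter


def read_ubik_header(data: bytes) -> dict[str, int]:
    """Decode the fixed header fields from the first bytes of a .DB0 file."""
    magic, _pad, size, epoch, counter = HEADER.unpack_from(data, 0)
    return {"magic": magic, "size": size, "epoch": epoch, "counter": counter}


if __name__ == "__main__":
    import sys
    # Usage: python ubik_hdr.py prdb.DB0.copy [backup.DB0.copy ...]
    # Point it at COPIES of the live file and of the backup, never the originals.
    for path in sys.argv[1:]:
        with open(path, "rb") as f:
            hdr = read_ubik_header(f.read(HEADER.size))
        note = "" if hdr["size"] == HDRSIZE else f" (size != {HDRSIZE})"
        print(f"{path}: magic={hdr['magic']:#x} size={hdr['size']}{note} "
              f"version={hdr['epoch']}.{hdr['counter']}")
```

If the layout assumption holds, you can compare the version printed for the backup copy with the "Local db version" that udebug reports on port 7002.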
>>>>
>>>> On 23.07.2008 at 17:26, Steven Jenkins wrote:
>>>>
>>>>> On Wed, Jul 23, 2008 at 10:51 AM, kanou <kanou@gmx.ch> wrote:
>>>>>
>>>>>> Hello,
>>>>>> Well, there is a file called db_verify.c in the folder
>>>>>> /usr/src/modules/openafs/ptserver, but I don't know how to build it.
>>>>>
>>>>>
>>>>> If I recall correctly, db_verify gets renamed to 'prdb_check' during
>>>>> the install, so you should check for the existence of that file.
>>>>>
>>>>> If you can't find it, you'll need to build it from source code; the
>>>>> directions on the AFSLore wiki are a good place to start:
>>>>>
>>>>> http://www.dementia.org/twiki/bin/view/AFSLore/HowToBuildOpenAFSFromSource
>>>>>
>>>>> If you have problems building openafs-stable-1_4_x, you could get
>>>>> openafs-stable-1_4_7 instead, as that is the latest official release.
>>>>>
>>>>> Once you have built the tree, src/ptserver/db_verify should get built,
>>>>> so you can simply copy it out of the source tree for your use. If it
>>>>> doesn't get built automatically for you, you can cd into src/ptserver
>>>>> and do a 'make db_verify' manually.
>>>>>
>>>>> Also, feel free to ask for help here or on the IRC channel.
>>>>>
>>>>> Steven Jenkins
>>>>> End Point Corporation
>>>>> http://www.endpoint.com/
>>>>
>>>>
>>>> _______________________________________________
>>>> OpenAFS-info mailing list
>>>> OpenAFS-info@openafs.org
>>>> https://lists.openafs.org/mailman/listinfo/openafs-info
>>
>>
>> -- 
>> -----------------------------------------------------------------
>> Hartmut Reuter                  e-mail 		reuter@rzg.mpg.de
>> 			   	phone 		 +49-89-3299-1328
>> 			   	fax   		 +49-89-3299-1301
>> RZG (Rechenzentrum Garching)   	web    http://www.rzg.mpg.de/~hwr
>> Computing Center of the Max-Planck-Gesellschaft (MPG) and the
>> Institut fuer Plasmaphysik (IPP)
>> -----------------------------------------------------------------
>