[OpenAFS] Serious trouble, mounting /afs, ptserver, database rebuilding

kanou kanou@gmx.ch
Wed, 23 Jul 2008 20:06:55 +0200


Well, I did:
udebug myserver 7002
Host's addresses are:  ip-myserver
Host's  ip-myserver time is Wed Jul 23 19:47:26 2008
Local time is Wed Jul 23 19:47:29 2008 (time differential 3 secs)
Last yes vote for ip-myserver2 was 24 secs ago (sync site);
Last vote started 24 secs ago (at Wed Jul 23 19:47:05 2008)
Local db version is 1214806124.7
I am not sync site
Lowest host ip-myserver2 was set 24 secs ago
Sync host ip-myserver2 was set 24 secs ago
Sync site's db version is 1214806124.7
0 locked pages, 0 of them for write

udebug myserver 7003
Host's addresses are: ip-myserver
Host's ip-myserver time is Wed Jul 23 19:49:34 2008
Local time is Wed Jul 23 19:49:35 2008 (time differential 1 secs)
Last yes vote for ip-myserver was 44 secs ago (not sync site);
Last vote started 44 secs ago (at Wed Jul 23 19:48:51 2008)
Local db version is 1216833173.3
I am not sync site
Lowest host ip-myserver2 was set 5 secs ago
Sync host 0.0.0.0 was set 154 secs ago
Sync site's db version is 1216833173.3
0 locked pages, 0 of them for write

and on myserver2:
udebug myserver2 7002
Host's addresses are: ip-myserver2
Host's ip-myserver2 time is Wed Jul 23 19:56:38 2008
Local time is Wed Jul 23 19:56:39 2008 (time differential 1 secs)
Last yes vote for ip-myserver2 was 6 secs ago (sync site);
Last vote started 6 secs ago (at Wed Jul 23 19:56:33 2008)
Local db version is 1214806124.7
I am sync site until 51 secs from now (at Wed Jul 23 19:57:30 2008) (2 servers)
Recovery state 1f
Sync site's db version is 1214806124.7
0 locked pages, 0 of them for write

Server (ip-myserver): (db 1214806124.7)
     last vote rcvd 9 secs ago (at Wed Jul 23 19:56:30 2008),
     last beacon sent 6 secs ago (at Wed Jul 23 19:56:33 2008), last vote was yes
     dbcurrent=1, up=1 beaconSince=1

udebug myserver2 7003
Host's addresses are: ip-myserver2
Host's ip-myserver2 time is Wed Jul 23 19:57:50 2008
Local time is Wed Jul 23 19:57:50 2008 (time differential 0 secs)
Last yes vote for ip-myserver2 was 3 secs ago (sync site);
Last vote started 3 secs ago (at Wed Jul 23 19:57:47 2008)
Local db version is 1216835658.20
I am sync site until 56 secs from now (at Wed Jul 23 19:58:46 2008) (2 servers)
Recovery state 1f
Sync site's db version is 1216835658.20
0 locked pages, 0 of them for write
Last time a new db version was labelled was:
	 212 secs ago (at Wed Jul 23 19:54:18 2008)

Server (ip-myserver): (db 1216835658.20)
     last vote rcvd 4 secs ago (at Wed Jul 23 19:57:46 2008),
     last beacon sent 3 secs ago (at Wed Jul 23 19:57:47 2008), last vote was yes
     dbcurrent=1, up=1 beaconSince=1
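
The udebug output above can be summarised mechanically. The following is a minimal sketch, not an official tool: the function name parse_udebug and the returned keys are made up for illustration, and it assumes that the part of a Ubik db version before the dot is a Unix timestamp. If that assumption holds, the version reported on port 7002 (1214806124.7) corresponds to 30 June 2008, while the versions reported on port 7003 fall on 23 July 2008.

```python
import re
from datetime import datetime, timezone


def parse_udebug(text):
    """Extract a few fields from pasted udebug output.

    The keys in the returned dict are hypothetical names chosen for
    this sketch; they are not part of any OpenAFS interface.
    """
    # Mail clients wrap long lines, so collapse all whitespace first.
    flat = " ".join(text.split())
    info = {"db_version": None, "db_epoch_utc": None}

    m = re.search(r"Local db version is (\d+)\.(\d+)", flat)
    if m:
        epoch, counter = int(m.group(1)), int(m.group(2))
        info["db_version"] = (epoch, counter)
        # ASSUMPTION: the first number is seconds since the Unix epoch.
        info["db_epoch_utc"] = datetime.fromtimestamp(epoch, tz=timezone.utc)

    # "I am not sync site" does not match this pattern, only "I am sync site".
    info["is_sync_site"] = re.search(r"\bI am sync site\b", flat) is not None

    # Non-sync sites report which host they currently consider the sync site.
    m = re.search(r"Sync host (\S+) was set", flat)
    info["sync_host"] = m.group(1) if m else None
    return info


if __name__ == "__main__":
    sample = (
        "Local db version is 1216833173.3\n"
        "I am not sync site\n"
        "Sync host 0.0.0.0 was set 154 secs ago\n"
    )
    print(parse_udebug(sample))
```

A sync host of 0.0.0.0 together with "I am not sync site", as in the port 7003 output from myserver, is the combination to look for when checking whether a server currently knows of a sync site.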

The system on myserver (the first one) is now running again, but if I
try to create a user I still get:
pts: database needs rebuilding ; unable to create user TESTUSER with id 3563
Volume 536872154 created on partition /vicepa of myserver
Released volume cell.user successfully
fs: Invalid argument, possible reasons include:
	-File not in AFS
	-Too many users on access control list
	-Tried to add non-existent user to access control list
pts: User or group doesn't exist ; unable to add user TESTUSER to group TESTGROUP
pts: User or group doesn't exist ; unable to add user TESTUSER to group TESTAFSUSER
Added replication site myserver /vicepa for volume user. TESTUSER

Could someone please tell me how to rebuild the protection database?
I can't run bos on myserver2 because I don't get a ticket from Kerberos.
Or do "/etc/init.d/openafs-client stop" and
"/etc/init.d/openafs-fileserver stop" do the job of stopping the whole
system?

Thanks, all, for your help.
kanou


On 23.07.2008 at 19:30, Hartmut Reuter wrote:

> kanou wrote:
>> My logs on the second machine tell me:
>> ==> /var/log/openafs/FileLog.old <==
>> Wed Jul 23 19:03:37 2008 File server starting
>> Wed Jul 23 19:03:37 2008 afs_krb_get_lrealm failed, myserver2.
>> Wed Jul 23 19:03:37 2008 VL_RegisterAddrs rpc failed; will retry periodically (code=5376, err=4)
>
>
> Code 5376 means no quorum was elected. Are you sure your database
> servers are all running?
>
> Try "udebug <server> 7002" for the ptserver
> and "udebug <server> 7003" for the vldb
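
The numeric codes in the FileLog lines quoted in this message can be pulled out and annotated with a small lookup table. This is only a sketch: the table CODE_HINTS and the function explain_codes are hypothetical names, the 5376 entry is taken from the explanation above, and the 267275 entry is an assumption (it is matched to the "database needs rebuilding" message that pts prints in this thread) that should be checked against the OpenAFS error tables.

```python
import re

# 5376: explained in this thread as "no quorum elected".
# 267275: ASSUMPTION, matched to the pts message "database needs
# rebuilding" seen in this thread; verify before relying on it.
CODE_HINTS = {
    5376: "no quorum elected",
    267275: "database needs rebuilding (assumed)",
}


def explain_codes(log_text):
    """Return (code, hint) pairs for every 'code=N' found in log_text."""
    pairs = []
    for raw in re.findall(r"code=(\d+)", log_text):
        code = int(raw)
        pairs.append((code, CODE_HINTS.get(code, "unknown, look it up")))
    return pairs


if __name__ == "__main__":
    log = (
        "VL_RegisterAddrs rpc failed; will retry periodically (code=5376, err=4)\n"
        "Couldn't get CPS for AnyUser, will try again in 30 seconds; code=267275.\n"
    )
    for code, hint in explain_codes(log):
        print(code, hint)
```
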
>
>> Wed Jul 23 19:03:37 2008 Couldn't get CPS for AnyUser, will try again in 30 seconds; code=267275.
>> ==> /var/log/openafs/SalvageLog <==
>> 07/23/2008 19:08:27 SALVAGING OF PARTITION /vicepa COMPLETED
>> and aklog gives me:
>> aklog: Couldn't get hrf.uni-koeln.de AFS tickets:
>> aklog: Cannot contact any KDC for requested realm while getting AFS tickets
>> Damn! I did not do anything on that second one!
>>> Just to make sure you're working on the correct file:
>>> As I understand it, you first deleted the file /var/lib/openafs/db/prdb.DB0.
>>> This file was then probably recreated when you restarted the ptserver.
>>> Run this command on the backup file you made first (or, better, on a
>>> copy of the backup file).
>>>
>>> T/Christof
>>> ________________________________________
>>> From: openafs-info-admin@openafs.org [openafs-info-admin@openafs.org] On Behalf Of kanou [kanou@gmx.ch]
>>> Sent: Wednesday, July 23, 2008 6:46 PM
>>> To: openafs-info@openafs.org
>>> Subject: Re: [OpenAFS] Serious trouble, mounting /afs, ptserver, database rebuilding
>>>
>>> Thanks for your answer.
>>> Well, I found the file prdb_check. It doesn't print any errors. The only
>>> thing I can find, with
>>> ./prdb_check -database /var/lib/openafs/db/prdb.DB0 -uheader -verbose
>>> is this line:
>>> Ubik header size is 0 (should be 64)
>>>
>>> So there are no errors! I can start the server and everything runs
>>> fine, but the machine won't mount /afs!
>>> kanou
>>>
>>> On 23.07.2008 at 17:26, Steven Jenkins wrote:
>>>
>>>> On Wed, Jul 23, 2008 at 10:51 AM, kanou <kanou@gmx.ch> wrote:
>>>>
>>>>> Hello,
>>>>> well, there is a file called db_verify.c in the folder
>>>>> /usr/src/modules/openafs/ptserver but I don't know how to build it.
>>>>
>>>>
>>>> If I recall correctly, db_verify gets renamed to 'prdb_check' during
>>>> the install, so you should check for the existence of that file.
>>>>
>>>> If you can't find it, you'll need to build it from source code: the
>>>> directions on the AFSLore wiki are a good place to start:
>>>>
>>>> http://www.dementia.org/twiki/bin/view/AFSLore/HowToBuildOpenAFSFromSource
>>>>
>>>> If you have problems building openafs-stable-1_4_x, you could get
>>>> openafs-stable-1_4_7 instead, as that is the latest official release.
>>>>
>>>> Once you have built the tree, src/ptserver/db_verify should get built,
>>>> so you can simply copy it out of the source tree for your use. If it
>>>> doesn't get built automatically for you, you can cd into src/ptserver
>>>> and do a 'make db_verify' manually.
>>>>
>>>> Also, feel free to ask for help here or on the IRC channel.
>>>>
>>>> Steven Jenkins
>>>> End Point Corporation
>>>> http://www.endpoint.com/
>>>
>>>
>>> _______________________________________________
>>> OpenAFS-info mailing list
>>> OpenAFS-info@openafs.org
>>> https://lists.openafs.org/mailman/listinfo/openafs-info
>
>
> -- 
> -----------------------------------------------------------------
> Hartmut Reuter                  e-mail 		reuter@rzg.mpg.de
> 			   	phone 		 +49-89-3299-1328
> 			   	fax   		 +49-89-3299-1301
> RZG (Rechenzentrum Garching)   	web    http://www.rzg.mpg.de/~hwr
> Computing Center of the Max-Planck-Gesellschaft (MPG) and the
> Institut fuer Plasmaphysik (IPP)
> -----------------------------------------------------------------