[OpenAFS] Backup/Determining Ubik Coordinator/Verifying Databases
Thu, 10 Apr 2008 00:31:39 -0400
"Joseph D. Kulisics" <email@example.com> writes:
> backup> deletedump -to 03/21/2008
> The following dumps were deleted:
> backup: RPC interface mismatch (-452) ; Error while deleting dumps from 0 to 1206082859
-452 = RXGEN_SS_MARSHAL "server marshall failed".
This error is usually generated at runtime by code on the server after
an operation has completed, while the results are being packed onto the
wire to be returned to the client.
In your case, the "while deleting dumps" error can only happen as
a result of calling bcdb_listDumps. This maps into a ubik_Call of
BUDB_ListDumps, which is serviced by the database server. That call
returns a list of dumps bounded by BUDB_MAX_RETURN_LIST (1000).
Perhaps you tried to delete more than 1000 dumps?
Break your deletes down to no more than 1000 per run.
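Since the deletedump command above selects dumps by date range, one way to stay under the limit is to split the dump history into date windows of at most 1000 dumps each. This is a minimal sketch with hypothetical helper names, not part of OpenAFS: it assumes you have already collected the dump creation times (for example from `backup dumpinfo`) as Python datetime objects, and it only builds candidate `backup deletedump -from ... -to ...` commands in mm/dd/yyyy hh:MM form. Window boundaries are rounded to the minute, so dumps that share a minute with a boundary could land in two windows; review the commands before running them.

```python
from datetime import datetime, timedelta

# Assumed per-call cap, taken from BUDB_MAX_RETURN_LIST as described above.
MAX_RETURN_LIST = 1000


def batch_windows(dump_times, limit=MAX_RETURN_LIST):
    """Split dump creation times into (oldest, newest) pairs, oldest first,
    with at most `limit` dumps covered by each pair."""
    ordered = sorted(dump_times)
    windows = []
    for i in range(0, len(ordered), limit):
        chunk = ordered[i:i + limit]
        windows.append((chunk[0], chunk[-1]))
    return windows


def fmt(ts):
    # Date and time in the mm/dd/yyyy hh:MM form used by the backup command.
    return ts.strftime("%m/%d/%Y %H:%M")


def deletedump_commands(dump_times, limit=MAX_RETURN_LIST):
    """Build one deletedump command per window (hypothetical helper)."""
    return [
        f"backup deletedump -from {fmt(start)} -to {fmt(end)}"
        for start, end in batch_windows(dump_times, limit)
    ]


if __name__ == "__main__":
    # Hypothetical data: 2500 dumps taken one hour apart.
    first = datetime(2008, 1, 1)
    times = [first + timedelta(hours=h) for h in range(2500)]
    for cmd in deletedump_commands(times):
        print(cmd)
```

Running the commands oldest window first means each run asks the server for at most 1000 dumps.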
> 1. How can I determine which server has been elected to be the Ubik write coordinator
I doubt this is a ubik error (see above). However,
udebug <db-server> 7021
will tell you what ubik thinks is happening with budb, including who is
the sync site. On the sync site, udebug will print additional
information, including "Recovery state". You want to see "1f".
Replace 7021 with 7002 or 7003 to get the same information for the
protection (pt) and volume location (vl) databases, respectively,
although those are most likely not your concern here.
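If you want to script this check across your database servers, here is a rough sketch that runs `udebug <server> <port>` and pulls out the sync-site status and recovery state. The port numbers come from the text above. The exact wording of the udebug lines matched here ("I am sync site", "I am not sync site", "Sync host ...", "Recovery state ...") is an assumption based on typical udebug output and may differ between OpenAFS versions; the helper names are hypothetical.

```python
import re
import subprocess

# Ubik ports from the text: budb (backup), pt (protection), vl (volume location).
UBIK_PORTS = {"budb": 7021, "pt": 7002, "vl": 7003}


def run_udebug(server, db="budb"):
    """Run udebug against one database server (requires udebug on PATH)."""
    result = subprocess.run(
        ["udebug", server, str(UBIK_PORTS[db])],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def parse_udebug(text):
    """Extract sync-site status, sync host, and recovery state.
    The line formats matched here are assumptions about udebug's output."""
    info = {"is_sync_site": None, "sync_host": None, "recovery_state": None}
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("I am not sync site"):
            info["is_sync_site"] = False
        elif line.startswith("I am sync site"):
            info["is_sync_site"] = True
        m = re.match(r"Sync host (\S+)", line)
        if m:
            info["sync_host"] = m.group(1)
        m = re.match(r"Recovery state ([0-9A-Fa-f]+)", line)
        if m:
            info["recovery_state"] = int(m.group(1), 16)
    return info


def recovery_ok(info):
    # Only the sync site prints a recovery state; "1f" is what you want to see.
    return info["is_sync_site"] is True and info["recovery_state"] == 0x1F
```

Calling `parse_udebug(run_udebug(server))` for each database server should show exactly one server reporting itself as sync site, with the others naming it as their sync host.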
> 2. Is there a way to check the consistency of the various databases across
> all of the database servers?
I think you could build and run "ol_verify". This is probably
not your best strategy.
If you really think your backup database is corrupted, it's probably
simpler to rebuild it from scratch. You'll have to scan each dump,
but if you've got them online this may be pretty cheap. Transarc used
to recommend keeping frequent backups of the backup database, and
probably also recommended regularly purging it of dumps that were
no longer of interest.
Before doing either of these, you should certainly save your current
backup database on each machine.