[OpenAFS] VLDB corruption causes mount point to go to other volume.

Mark Vitale mvitale@sinenomine.net
Thu, 21 Feb 2019 14:34:14 +0000


Pomm,

Thank you for your report.  Could you provide some more details (inline below)?

> On Feb 21, 2019, at 4:58 AM, Thossaporn (Pommm) Phetruphant <pommm@yannix.com> wrote:
>
> I have 3 vldb/pts servers and 13 file servers in my network. All are on the same subnet, same location.
> We have now encountered a corrupted VLDB for the 2nd time: when we 'cd' into a mount point, it goes to a different volume.
>
> Example:
> live.D1 mounted at /afs/domain/live/data1
> live.D2 mounted at /afs/domain/live/data2
> root.cell is at /afs/domain
>
>
> cd /afs/domain/live/data1
>
> 'fs exa .' shows volume 'live.D2' mounted at this mount point
>
> 'ls' shows the data in data2
>
> or
>
> cd /afs/domain
>
> 'fs exa .' shows volume 'live.D1' mounted at this mount point
>
> 'ls' shows the data in data1

Mount point information is stored in the fileserver vice partitions, not in the VLDB.
What version of AFS are you using for your vlservers, fileservers, and cache managers (clients)?
And what operating system and version do your clients run on?
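To illustrate the point above about where mount point information lives: an AFS mount point is stored on the fileserver as a small object whose contents are a string such as '#live.D1.' (the form that `fs lsmount` prints). It names a volume but carries neither the volume ID nor its location; the cache manager resolves that name through the VLDB when the mount point is crossed. The Python sketch below parses such a string. It assumes the common format of a '#' (regular) or '%' (read/write) prefix, an optional 'cell:' component, the volume name, and a trailing '.'. The names `MountPoint` and `parse_mount_point` are hypothetical and are not part of any OpenAFS API.

```python
import re
from typing import NamedTuple, Optional


class MountPoint(NamedTuple):
    rw_only: bool        # True for a '%' (read/write) mount point, False for '#'
    cell: Optional[str]  # None means the mount point does not name a cell
    volume: str          # volume name that the cache manager resolves via the VLDB


# Assumed format: '#' or '%', optional 'cell:', volume name, trailing '.'
_MP_RE = re.compile(r"^([#%])(?:([^:]+):)?(.+)\.$")


def parse_mount_point(contents: str) -> MountPoint:
    """Split a mount point string (hypothetical helper) into its parts."""
    m = _MP_RE.match(contents)
    if m is None:
        raise ValueError(f"not an AFS mount point string: {contents!r}")
    kind, cell, volume = m.groups()
    return MountPoint(rw_only=(kind == "%"), cell=cell, volume=volume)


if __name__ == "__main__":
    # Only the volume *name* is stored; which volume you actually land in
    # depends on how that name is resolved at lookup time.
    print(parse_mount_point("#live.D1."))
    print(parse_mount_point("%domain:root.cell."))
```

Because only the name is stored in the vice partition, comparing what `fs lsmount` reports with what `fs examine` reports on the same path helps separate the mount point's contents from the later name lookup.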

> <snip>
>
> 'vldb_check -database /var/lib/openafs/db/vldb.DB0' shows 'root.cell (xxxxxxxxxx) has no RW volume', and ~10 other volumes also show 'has no RW volume'.
>
> I back up the VLDB hourly, so it can be recovered quickly enough, but this is the 2nd time it has happened.
> Does anyone know why this would happen?  How can we prevent it?

Are you running vldb_check against a live VLDB?
Do the VLDB entries for the apparently corrupted volumes change frequently?
Are you taking any steps to ensure the VLDB is not changing when you back it up?
Could you provide more details about the steps you take to recover your VLDB?

Regards,
--
Mark Vitale
mvitale@sinenomine.net