[OpenAFS] VLDB corruption causes mount point to go to another volume.

Benjamin Kaduk kaduk@mit.edu
Fri, 22 Feb 2019 21:34:43 -0600


On Thu, Feb 21, 2019 at 04:58:54PM +0700, Thossaporn (Pommm) Phetruphant wrote:
> Hi everyone,
> 
> I have 3 vldb/pts servers and 13 file servers in my network. All are on 
> the same subnet, same location.
> We have now run into a corrupted VLDB for the second time: when we 'cd' 
> into a mount point, we end up in a different volume.
> 
> Example:
> live.D1 is mounted at /afs/domain/live/data1
> live.D2 is mounted at /afs/domain/live/data2
> root.cell is at /afs/domain
> 
> 
> cd /afs/domain/live/data1
> 
> 'fs exa .' shows the volume named 'live.D2' mounted at this mount point
> 
> 'ls' shows the data from data2
> 
> or
> 
> cd /afs/domain
> 
> 'fs exa .' shows the volume named 'live.D1' mounted at this mount point
> 
> 'ls' shows the data from data1

I may be confused -- does the difference show up for all clients or just
one?
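
One way to answer that question is to run 'fs examine' on the same path from several clients and compare the volume name each one reports. The following is only a minimal sketch: it assumes the output contains a line of the form "Volume status for vid = <id> named <name>", and the function names and client names are hypothetical. Collecting the output from each client (for example over ssh) is left out.

```python
import re

# Assumed line format in `fs examine` output:
#   "Volume status for vid = <id> named <name>"
_NAMED = re.compile(r"^Volume status for vid = (\d+) named (\S+)", re.MULTILINE)


def volume_from_examine(output):
    """Return (volume_id, volume_name) parsed from `fs examine` text, or None."""
    match = _NAMED.search(output)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)


def summarize_clients(examine_by_client, expected_name):
    """Split clients into those that see the expected volume and those that do not.

    examine_by_client maps a client name to the raw `fs examine` text
    captured on that client for the same path.
    Returns (ok_clients, wrong) where wrong maps client -> reported name
    (None when the output could not be parsed).
    """
    ok_clients = []
    wrong = {}
    for client, output in sorted(examine_by_client.items()):
        parsed = volume_from_examine(output)
        name = parsed[1] if parsed else None
        if name == expected_name:
            ok_clients.append(client)
        else:
            wrong[client] = name
    return ok_clients, wrong
```

If every client lands in the "wrong" group, the problem is more likely on the server side (VLDB); if only one does, that client's cache is the first suspect.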

-Ben

> At first I suspected NTP had gone out of sync, but it had not.
> I have one GPS-backed stratum 1 NTP server and two stratum 2 NTP servers 
> on my network; Nagios and Cacti reported no NTP downtime during this event.
> 
> 'vldb_check -database /var/lib/openafs/db/vldb.DB0' shows 'root.cell 
> (xxxxxxxxxx) has no RW volume', and about 10 other volumes are also 
> reported with 'has no RW volume'.
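
To track which volumes are affected from one check to the next, the 'has no RW volume' lines can be pulled out of a saved vldb_check report. This is a rough sketch that assumes the line format quoted above, "<name> (<id>) has no RW volume", with a numeric ID; the function name is hypothetical and the real report may differ in detail.

```python
import re

# Assumed format, taken from the quoted line above:
#   "<name> (<id>) has no RW volume"
_NO_RW = re.compile(r"(\S+)\s+\((\d+)\)\s+has no RW volume")


def volumes_without_rw(report):
    """Return a list of (volume_name, volume_id) for each 'has no RW volume' line."""
    found = []
    for line in report.splitlines():
        match = _NO_RW.search(line)
        if match:
            # Strip quotes in case the tool prints the name quoted.
            name = match.group(1).strip("'\"")
            found.append((name, int(match.group(2))))
    return found
```

Running this against the report from each hourly backup and comparing the results would show roughly when a volume first lost its RW entry.
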
> 
> I back up the VLDB hourly, so it can be recovered quickly, but this is 
> the second time it has happened.
> Does anyone know why this would happen?  How can we prevent it?
> 
> Best regards,
> 
> Pommm
> 