[OpenAFS] Volume access problems when DB server downed

Erwin Broschinski broschi@id.ethz.ch
Thu, 07 Mar 2002 15:01:16 +0100 (MET)


Hi,

We are running AFS 3.6 patch 3 on Suns with Solaris 2.6 or 8.

One of our 5 DB servers was taken down for an OS upgrade. It was the sync site,
and Ubik correctly elected another one.
To avoid any problems, the downed DB server was renamed and given a different
IP address, and no AFS server processes were started on it until I had
finished my work on it.

While this server was down, various problems appeared in AFS, mainly volume
access problems:

vos exa some_volume: When executed repeatedly on a Sun client, this command
                     would sometimes hang indefinitely.
Deleting the downed server from the client's CellServDB had no positive effect.
It seemed as if VL requests were still directed to the downed DB server.

When the server came back up, everything was fine once it had resynced its DB
with the current sync site.

We have seen the same effect in the past with other AFS versions and with DB
servers that were not the sync site.
Shouldn't it be possible to take down a redundant DB server?
Has anybody seen this happen with OpenAFS?

Any hints are very welcome.

Erwin

                                                         ''`'
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~O-O~~~~~~~
Erwin Broschinski               Tel:    +41 1 632 4281
Swiss Fed. Inst. of Technology  Fax:    +41 1 632 1225 
ETH Zentrum RZ/G8.1             E-Mail: broschi@id.ethz.ch
8092 Zurich                     PGP-key:  
Switzerland                     www.tik.ee.ethz.ch/~pgp/Search.html
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

"Ceterum censeo, 'Parvam Mollim' esse delendam."  (after Cicero)