[OpenAFS] Salvage and .gconf lock and other problems with OpenAFS 1.2.10 and 1.2.11

Renata Maria Dart <renata@slac.stanford.edu>
Mon, 22 Mar 2004 15:34:15 -0800 (PST)


Hi, here are a few more details regarding our experience with this
problem.  

1.  We only transitioned from Transarc to OpenAFS last year 
during the summer.  Since that time our fileservers have been stable.  
We did a clean restart of them all once in September (to fix a server 
key problem), and since then there have been no issues until afs09
started having problems in January.  Along with the troubles afs09
has been seeing, users with home directory volumes on it started
reporting .gconf lock problems for the first time, after unclean
restarts of the fileserver.  Derrick mentioned that other AFS
admins were reporting .gconf lock problems with no fileserver
restart.  Could it be that they still had the 4:00 a.m. Sunday general
restart and/or the 5:00 a.m. daily new-binary restart set, and perhaps
didn't even know or remember it?  We have ours turned off, both the
general and the new-binary restart.  Another thought
here is that while some users complain very quickly post-crash
about their corrupted .gconf directory, others may not use gnome
until a week or more after the crash, and if you don't find that
out, it may seem like a failure out of the blue.
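
In case it helps anyone rule out the scheduled restarts on their own
servers, here is a rough sketch of the check.  It only builds the
bos getrestart and bos setrestart command lines and does not run them.
The helper names are made up for illustration, and afs09 is just our
server name used as an example.

```python
import shlex


def getrestart_cmd(server):
    """Command line that displays a server's general and new-binary restart times."""
    return ["bos", "getrestart", "-server", server]


def disable_restart_cmds(server):
    """Command lines that set both scheduled restart times to 'never'."""
    return [
        ["bos", "setrestart", "-server", server, "-time", "never", flag]
        for flag in ("-general", "-newbinary")
    ]


def render(cmd):
    """Join an argument list into a shell-safe string for printing or logging."""
    return " ".join(shlex.quote(arg) for arg in cmd)


# Example: print the commands an admin would review before running them.
print(render(getrestart_cmd("afs09")))
for cmd in disable_restart_cmds("afs09"):
    print(render(cmd))
```

Running the getrestart command against each fileserver is enough to see
whether either restart time is still set.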

2.  One of our admins, who uses gnome, had his home directory volume
on afs09.  Since this fileserver started crashing in January, every
round of problems has involved an unclean fileserver shutdown and a
subsequent salvage, and after each one he experienced the
unremovable .gconf lock.  A bos salvage normally fixed the problem 
for him.  But after the last fileserver crash it didn't.  I tried 
moving him to a different server and running a bos salvage there.  
That still didn't fix the problem.  It turned out that he still had 
an open session from when afs09 had crashed, and once he quit out of 
that, a salvage did fix things. 
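
For reference, the sequence I went through looks roughly like the sketch
below.  It only builds the bos salvage and vos move command lines and does
not run them.  The volume name user.jdoe, the partitions /vicepa and
/vicepb, and the destination server afs10 are hypothetical placeholders,
as are the helper names.

```python
def salvage_volume_cmd(server, partition, volume):
    """Command line that salvages a single volume on one partition."""
    return ["bos", "salvage", "-server", server,
            "-partition", partition, "-volume", volume]


def move_volume_cmd(volume, from_server, from_part, to_server, to_part):
    """Command line that moves a volume to another fileserver."""
    return ["vos", "move", "-id", volume,
            "-fromserver", from_server, "-frompartition", from_part,
            "-toserver", to_server, "-topartition", to_part]


# Hypothetical names, for illustration only.
steps = [
    salvage_volume_cmd("afs09", "/vicepa", "user.jdoe"),
    move_volume_cmd("user.jdoe", "afs09", "/vicepa", "afs10", "/vicepb"),
    salvage_volume_cmd("afs10", "/vicepb", "user.jdoe"),
]
for step in steps:
    print(" ".join(step))
```

In our case none of these helped until the user had quit the session
left over from the crash, so it is worth checking for that first.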

3.  We have a number of fileservers still at 1.2.9.  Late last night
I upgraded (to get the fix for the Solaris fileserver problem
that existed in 1.2.9) and cleanly restarted one of them.  All of
our .gconf lock problems have arisen from unclean fileserver restarts,
and so I was curious what kind of fallout there would be from this.
So far there have been no reports of gnome problems, and one of our
admins, whose home directory volume was on that fileserver, had an
open gnome session at the time and had no problems either with it or
with launching new ones.  (I realize I am arguing here against my 
theory in item 1 above that an automatic restart might be the
cause of the .gconf locks for the admins who reported no restart.
Perhaps the problem is only intermittent with clean restarts and
more reliably reproducible with crashes?)

4.  And today I needed to restart afs09, and there appears to be no 
fallout from that restart either.  Btw, this restart was needed because
one of our admins tried to do a bos salvage and it ran amok again.  The
salvage ran on for pages, so he hit Ctrl-C, and then the fileserver
started using 100% of one of the 2 cpus and could not respond to any 
vos commands.  Can you tell me if there is any information I could gather 
to help solve this one?


Hope this is useful information,

-Renata


>Date: Sat, 20 Mar 2004 16:57:57 -0500 (EST)
>From: Dave McMurtrie <dgm+@pitt.edu>
>Subject: Re: [OpenAFS] Salvage and .gconf lock and other problems with OpenAFS 1.2.10 and 1.2.11
>To: openafs-info@openafs.org
>
>On Sat, 20 Mar 2004, Derrick J Brashear wrote:
>
>> Well, I think maybe the key fact we just learned is a fileserver restart
>> is involved. Are you willing (do you have time) to try that test case if I
>> give you a fileserver you can knock over?
>
>No problem.  I found the code I wrote back when I was working on this
>before.  I'll wait to hear from you.
>
>Thanks,
>
>Dave
>--
>Dave McMurtrie, Systems Programmer
>University of Pittsburgh
>Computing Services and Systems Development,
>Development Services -- UNIX and VMS Services
>717P Cathedral of Learning
>(412)-624-6413
>
>PGP/GPG Key:  http://www.pitt.edu/~dgm/gpgkey.asc.txt
>_______________________________________________
>OpenAFS-info mailing list
>OpenAFS-info@openafs.org
>https://lists.openafs.org/mailman/listinfo/openafs-info

 Renata Dart                         | renata@SLAC.Stanford.edu  
 Stanford Linear Accelerator Center  |    
 2575 Sand Hill Road, MS 97          | (650) 926-2848 (office)
 Stanford, California   94025        | (650) 926-3329 (fax)