[OpenAFS] Server crash

Rob Banz rob@nofocus.org
Fri, 7 Dec 2007 14:09:20 -0500


>
>
> Look at the FileLog and see what failed to attach.
>
> This is one reason I dislike that optimization.
>


For the most part, it's been a win for me.  With a decent filesystem on
the back-end, I haven't had volume attachment problems running with a
fast-restart fileserver.  If I did hit a case where a multitude of
volumes needed salvaging, it's not too hard to either write a little
script to troll your FileLog and run salvager on the appropriate
volumes -- or stop the fileserver and salvage the whole partition.
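
A minimal sketch of such a script, in Python.  The regular expression
is an assumption: the exact wording of attach-failure lines in FileLog
varies between OpenAFS versions, so check it against your own log
before trusting it.  The function names and the server name are
hypothetical.  The sketch only builds the "bos salvage" command lines;
it does not run them.

```python
import re

# ASSUMPTION: an attach failure is logged on a single line that names the
# volume header file (/vicepX/V<id>.vol) and mentions salvaging.
# Adjust this pattern to match what your fileserver actually writes.
ATTACH_FAILURE = re.compile(r"/(vicep[a-z]+)/V(\d+)\.vol\b.*salvag", re.IGNORECASE)


def volumes_needing_salvage(log_lines, pattern=ATTACH_FAILURE):
    """Return sorted, de-duplicated (partition, volume_id) pairs."""
    found = set()
    for line in log_lines:
        match = pattern.search(line)
        if match:
            # int() drops the zero padding used in the header file name.
            found.add((match.group(1), int(match.group(2))))
    return sorted(found)


def salvage_commands(server, volumes):
    """Build one 'bos salvage' argument list per volume, without running it."""
    return [
        ["bos", "salvage", "-server", server,
         "-partition", "/" + partition, "-volume", str(volume_id)]
        for partition, volume_id in volumes
    ]
```

To use it, read the log with something like
open("/usr/afs/logs/FileLog") (the path depends on your install), pass
the lines to volumes_needing_salvage(), and print the resulting
commands for review before handing any of them to subprocess.run().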

In the environment I was responsible for, the only time I had to
resort to drastic measures (kill -9'ing the fileserver) was for those
dreaded clogged RX calls due to (usually) connection table lockups --
and I never had a problem using the fast-restart fileserver; it
brought us back into service in a few minutes rather than the hour+
that a salvage would take.  Even in the couple of instances where we
did have storage go offline, at least since we used ZFS, everything
would come up fine in the fast-restart environment...  I think your
success or failure with it depends heavily on the behavior of your
backing filesystem and how it orders transactions...

-rob