[OpenAFS] Puzzler: lack of access to AFS files

Steve Simmons scs@umich.edu
Thu, 13 Dec 2007 13:38:03 -0500


On Dec 12, 2007, at 10:21 PM, Christopher D. Clausen wrote:

> I know this isn't a useful data point, but to my knowledge, none of
> the AFS servers that I maintain have lost important data due to a
> fault in AFS.  Yes, some test data was lost, but that is exactly why a
> "professional" sysadmin runs tests in the first place.  Have you
> actually lost data?  Or are you just concerned about truthful warnings
> posted by the developers?  (Of course I realize that there is always
> the possibility that data is corrupted and one doesn't know yet.
> Volunteer and help test new builds to help reduce these possibilities,
> or fund development.)

This is also true here. The cell I managed that was in such horrible  
shape was in that condition in part due to a campus-wide power outage  
that included multiple surges, file servers that were on short-life  
UPSs, unreliable ancient RAIDs that were on shorter-life UPSs, and so
on. Yeah: the RAIDs lost power in mid-write, had no memory backup,
the servers had writes they couldn't finish, then the servers lost
power, and the RAIDs lost disks on powerup. Disaster upon disaster
upon disaster. Someone who only knew AFS peripherally had to bring it  
all back up, and did an amazing job considering the circumstances.  
Some restores from tape were required. Three months later I was hired
and started looking into what was wrong with these flaky
servers...

And yet, with all that, there was no data loss or corruption that  
anyone could detect. That, my friends, is pretty goddamned reliable.

Steve