[OpenAFS] Puzzler: lack of access to AFS files
Steve Simmons
scs@umich.edu
Thu, 13 Dec 2007 13:38:03 -0500
On Dec 12, 2007, at 10:21 PM, Christopher D. Clausen wrote:
> I know this isn't a useful data point, but to my knowledge, none of
> the AFS servers that I maintain have lost important data due to a fault
> in AFS. Yes, some test data was lost, but that is exactly why a
> "professional" sysadmin runs tests in the first place. Have you
> actually lost data? Or are you just concerned about truthful warnings
> posted by the developers? (Of course I realize that there is always the
> possibility that data is corrupted and one doesn't know yet. Volunteer
> and help test new builds to help reduce these possibilities or fund
> development.)
This is also true here. The cell I managed that was in such horrible
shape was in that condition in part due to a campus-wide power outage
that included multiple surges, file servers that were on short-life
UPSs, unreliable ancient RAIDs that were on shorter-life UPSs, and so
on. Yeah: the RAIDs lost power in mid-write, had no memory backup,
the servers had writes they couldn't finish, then the servers lost
power, and the RAIDs lost disks on powerup. Disaster upon disaster
upon disaster. Someone who only knew AFS peripherally had to bring it
all back up, and did an amazing job considering the circumstances.
Some restores from tape were required. Three months later I was hired
and started looking into what was wrong with these flaky
servers...
And yet, with all that, there was no data loss or corruption that
anyone could detect. That, my friends, is pretty goddamned reliable.
Steve