[OpenAFS-devel] How to create inconsistency in the volserver and my mind.
Jeffrey Hutzelman
jhutz@cmu.edu
Thu, 17 Mar 2005 13:08:02 -0500
On Thursday, March 17, 2005 12:33:12 PM +0100 Harald Barth
<haba@pdc.kth.se> wrote:
>
>> I suppose it's possible you could construct something that does this
>> using the convert-RO-to-RW functionality that is in very recent
>> servers. But I'd have to think about it for a lot longer to convince
>> myself that this would actually be stable.
>
> Yes. Something like that would be nice.
>
>> Those aren't error messages; they're log messages. They are normal.
>> The -overwrite switch doesn't mean the volume already exists; it tells
>> vos what to do _if_ the volume already exists. The way it tells that
>> is by trying to create the volume and looking at the error code.
>
> The problem is that they look dangerous to the non-suspecting sysadmin.
> "Abort, abort - all brace for impact" ;-)
The non-suspecting sysadmin needs to get out of the habit of assuming that
any output produced by any program must be a horrible fatal error. Solve
that problem, and then we can talk about whether the messages are
meaningful enough.
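
To make the create-then-check-the-error-code behaviour quoted above concrete, here is a minimal sketch in Python. It is not vos or volserver code: the class FakeVolserver, the exception VolumeExists, the restore() function and the log tuples are all hypothetical names invented for illustration. The point is only that a "create failed" entry in the server's log is a normal step when the overwrite option is in effect.

```python
class VolumeExists(Exception):
    """Hypothetical stand-in for the 'volume already exists' error code."""


class FakeVolserver:
    """Toy in-memory server used only for this sketch; not the real volserver."""

    def __init__(self):
        self.volumes = {}
        self.log = []  # entries are (operation, volume id, outcome) tuples

    def create_volume(self, vol_id, name):
        if vol_id in self.volumes:
            # The server records the failed attempt; this is not fatal by itself.
            self.log.append(("create", vol_id, "failed"))
            raise VolumeExists(vol_id)
        self.volumes[vol_id] = name
        self.log.append(("create", vol_id, "ok"))

    def delete_volume(self, vol_id):
        del self.volumes[vol_id]
        self.log.append(("delete", vol_id, "ok"))


def restore(server, vol_id, name, overwrite=False):
    """Try to create the volume; consult the overwrite flag only on failure."""
    try:
        server.create_volume(vol_id, name)
    except VolumeExists:
        if not overwrite:
            raise  # without the flag, the client reports the error
        server.delete_volume(vol_id)
        server.create_volume(vol_id, name)
    return "restored"
```

With overwrite set and the volume already present, the toy log contains a failed create, then a delete, then a successful create, and the call as a whole still succeeds.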
>> > Tue Mar 15 11:05:19 2005 1 Volser: Delete: volume 537057012 deleted
>> > Tue Mar 15 11:05:19 2005 1 Volser: CreateVolume: volume 537057012 (dah.test.flopp) created
>> > Tue Mar 15 11:05:19 2005 1 Volser: RestoreVolume: Error reading header file for dump; aborted
>> >
>> > And this is the log from the broken -overwrite full which results in
>> > the vl-volser inconsistency.
>
>> Yeah, that makes sense. The error is referring to the volserver's
>> inability to read the dump header over the wire, which is not
>> surprising since in your example, vos will never send one.
>
> And here, aborted actually means it fell over.
No, it means the volserver aborted the RPC, just like the first case.
Before, the operation it was aborting was CreateVolume; in this example,
it's RestoreVolume. Really, people who want to know the result of a
command they ran with vos should look at the output of vos, not the
contents of the volserver log.
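
For anyone scripting around such commands, a minimal Python sketch of judging the result from the command itself rather than from a server log follows. It assumes the command follows the usual convention of exiting nonzero on failure, which I have not verified for every vos subcommand; run_and_report() is a hypothetical helper, and the example invokes the Python interpreter instead of vos so that it runs anywhere.

```python
import subprocess
import sys


def run_and_report(argv):
    """Run a command; return (succeeded, stdout, stderr) from the command itself."""
    result = subprocess.run(argv, capture_output=True, text=True)
    # Success is decided by the exit status, not by scanning any server-side log.
    return result.returncode == 0, result.stdout, result.stderr


if __name__ == "__main__":
    # Stand-in command; substitute the real command line you actually ran.
    ok, out, err = run_and_report([sys.executable, "-c", "print('done')"])
    print("succeeded" if ok else "failed", out.strip(), err.strip())
```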
>> > cysteine# tail -1 BosLog
>> > Mon Mar 14 18:15:08 2005: fs:vol exited on signal 6
>
>> What version and platform?
>
> We are OpenAFS 1.3.77 built 2005-01-18 on i386 RH9.
>
> Seems to be the threaded beast:
Well, then, that kills my theory that it's the 25-day bug, which only
affects LWP processes, and apparently only on fairly new Linux.
> cysteine# ldd /usr/openafs/libexec/openafs/volserver
> libpthread.so.0 => /lib/i686/libpthread.so.0 (0x4001e000)
> libresolv.so.2 => /lib/libresolv.so.2 (0x4006f000)
> libc.so.6 => /lib/i686/libc.so.6 (0x40081000)
> /lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
>
> What is the current status about OpenAFS and Linux threads? I know the
> thread situation on Linux sucks in general, just tell me your best
> practice, ok? :-)
Ok. My best practice is to run fileservers on SPARC Solaris, thereby
avoiding the Linux threads mess, the horrible kludge that is the namei
fileserver, and all sorts of other problems that the rest of you have seen.
:-)
Really, I can't tell you much about OpenAFS and Linux threads. Maybe
Derrick can field that one.
-- Jeff