[OpenAFS-devel] How to create inconsistency in the volserver
and my mind.
Harald Barth
haba@pdc.kth.se
Thu, 17 Mar 2005 12:33:12 +0100 (MET)
> I suppose it's possible you could construct something that does this using
> the convert-RO-to-RW functionality that is in very recent servers. But I'd
> have to think about it for a lot longer to convince myself that this would
> actually be stable.
Yes. Something like that would be nice.
> Those aren't error messages; they're log messages. They are normal. The
> -overwrite switch doesn't mean the volume already exists; it tells vos what
> to do _if_ the volume already exists. The way it tells that is by trying
> to create the volume and looking at the error code.
The problem is that they look dangerous to the unsuspecting sysadmin.
"Abort, abort - all brace for impact" ;-)
> > Tue Mar 15 11:05:19 2005 1 Volser: Delete: volume 537057012 deleted
> > Tue Mar 15 11:05:19 2005 1 Volser: CreateVolume: volume 537057012 (dah.test.flopp) created
> > Tue Mar 15 11:05:19 2005 1 Volser: RestoreVolume: Error reading header file for dump; aborted
> >
> > And this is the log from the broken -overwrite full which results in
> > the vl-volser inconsistency.
> Yeah, that makes sense. The error is referring to the volserver's
> inability to read the dump header over the wire, which is not surprising
> since in your example, vos will never send one.
And here, aborted actually means it fell over.
> > Failed to get info about server's -2098337598 address(es) from vlserver
> > (err=0)
> -2098337598 is 0x82EDE8C2 or 130.237.232.194, houting.pdc.kth.se
Which has been in the vlserver for a long time.
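
For anyone else who trips over a negative "address" in vos output: the number is the 32-bit IPv4 address printed as a signed integer. A minimal sketch of the conversion, assuming the value is read in the byte order shown in the quoted reply (the helper name signed32_to_ipv4 is made up for illustration and is not part of OpenAFS):

```python
import ipaddress


def signed32_to_ipv4(n: int) -> str:
    """Reinterpret a signed 32-bit integer as a dotted-quad IPv4 address."""
    # Masking with 0xFFFFFFFF maps a negative value to its unsigned
    # two's-complement equivalent.
    unsigned = n & 0xFFFFFFFF
    return str(ipaddress.IPv4Address(unsigned))


if __name__ == "__main__":
    value = -2098337598  # the number from the vos message above
    print(hex(value & 0xFFFFFFFF))   # 0x82ede8c2
    print(signed32_to_ipv4(value))   # 130.237.232.194
```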
> You'll note the message in question says (err=0). This message actually
> shouldn't be printed at all in that case, but the conditional was
> inadvertently removed between src/volser/vsprocs.c versions 1.15 and 1.16,
> in DELTA no-copy-libafs-builds-20021015. What this delta has to do with
> changing the way errors are reported in vsprocs, I do not know.
Ooopsi.
> > cysteine# tail -1 BosLog
> > Mon Mar 14 18:15:08 2005: fs:vol exited on signal 6
> What version and platform?
We are running OpenAFS 1.3.77, built 2005-01-18, on i386 RH9.
Seems to be the threaded beast:
cysteine# ldd /usr/openafs/libexec/openafs/volserver
libpthread.so.0 => /lib/i686/libpthread.so.0 (0x4001e000)
libresolv.so.2 => /lib/libresolv.so.2 (0x4006f000)
libc.so.6 => /lib/i686/libc.so.6 (0x40081000)
/lib/ld-linux.so.2 => /lib/ld-linux.so.2 (0x40000000)
What is the current status of OpenAFS and Linux threads? I know the
thread situation on Linux sucks in general, just tell me your best
practice, ok? :-)
> How long was it running before it exited?
From Thu Mar 10 20:41:11 to Mon Mar 14 18:15:08.
It exited at either the first vos backup or the first vos dump
operation from our backup scripts, which are invoked at 18:15:00.
The scripts seem to need about 8 seconds to ask TSM what is
already backed up.
> Actually, signal 6 is SIGIOT, which generally means an abort.
> It's possible an abort message was written, but went out to the beginning
> of the log file instead of the end (stdout and stderr don't share a file
> position)
Nope, did not find anything useful at another place in the file either :-(
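
For the record, the effect described in the quote is easy to reproduce outside AFS. A minimal sketch, assuming stdout and stderr are two separate opens of the same log file without O_APPEND (how the bosserver actually sets up the descriptors is not something I have checked; the function name is made up for illustration):

```python
import os
import tempfile


def demo_independent_offsets() -> bytes:
    """Write through two separately opened descriptors to the same file."""
    fd, path = tempfile.mkstemp()
    os.close(fd)
    try:
        # Two independent opens: each descriptor has its own file position,
        # and both start at offset 0.
        out = os.open(path, os.O_WRONLY)
        err = os.open(path, os.O_WRONLY)
        os.write(out, b"log line 1\nlog line 2\n")
        # This lands at offset 0 of the file, not after the log lines,
        # and overwrites the start of the first line.
        os.write(err, b"ABORT\n")
        os.close(out)
        os.close(err)
        with open(path, "rb") as f:
            return f.read()
    finally:
        os.unlink(path)


if __name__ == "__main__":
    print(demo_independent_offsets())
```

The second write ends up at the top of the file, which is why an abort message could be hiding at the beginning of the log rather than the end. Opening both descriptors with O_APPEND, or sharing one descriptor via dup(), avoids this.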
Harald.