[OpenAFS] log messages I don't understand

Jeffrey Hutzelman jhutz@cmu.edu
Mon, 23 Oct 2006 14:28:14 -0400


On Monday, October 23, 2006 10:39:54 AM -0400 seth vidal 
<skvidal@linux.duke.edu> wrote:

> hi,
>  I've been getting log reports from some new (openafs 1.4.0 centos 4
> x86_64) vol servers we've setup. I'm kind of new at this so I have a few
> questions about what some of the logs mean:
>
>
> volserver[3417]: trans 843 on volume 538013895 is older than 1170
> seconds
> I get a lot of these each day

volume operations like move, release, backup, and so on are done in 
"transactions" that link several volserver RPC's into a single operation. 
In normal operation, the volserver periodically logs a message about any 
transaction that has been in existance for more than 5 minutes.  This is 
perfectly normal, as some operations on large volumes can take considerably 
longer than that.





> fileserver[4701]: CB: Call back connect back failed (in break delayed)
> for
>                   192.168.62.100:7001
>
> I think this one is just a communication failure talking back to the
> client from what I've seen in google.

Right; this means the fileserver tried to break a callback with that client 
and the client was nonresponsive.  That could mean the client has shut down 
or left the network, or that there's some network problem.


> fileserver[4496]: FindClient: client 9632c138(14277b94) already had conn
> b62030
>                  (host 67640398), stolen by client 9632c138(14277b94)

This is normal, but uncommon.  It does not itself indicate a problem, but 
provides debugging information for openafs developers that may be useful in 
debugging a problem with the client-tracking code, if one occurs (note: I 
always have trouble describing this, because I'm afraid people will think I 
mean there's a known problem, which is not the case.  However, we've found 
problems in this area to be difficult to track down, and when doing so, the 
situation that triggers this message can be confusing if you don't know it 
happened).


> fileserver[4680]: fssync: volume 537466995 restored; breaking all call
> backs

This means that the volserver had attached that volume (use 'vos listvl' or 
'vos examine' to find out what volume it is) and has now given it back, 
possibly with changes.  Typically you'll see this on the target of a move, 
release, or restore, or for an RO or backup volume which has just been 
recloned.


> volserver[4681]: 1 Volser: Clone: Recloning volume 537466996 to volume
> 537466998

And this is the volserver telling you it is recloning an RO or backup 
volume; this is what 'vos release', 'vos backup', and 'vos backupsys' do.

-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
   Sr. Research Systems Programmer
   School of Computer Science - Research Computing Facility
   Carnegie Mellon University - Pittsburgh, PA