[OpenAFS] Problem after weekly bos restart on servers

Kevin Coffman kwc@citi.umich.edu
Fri, 05 Nov 2004 15:58:39 -0500


> Kevin Coffman wrote:
> 
> >>Some of our AFS servers, which are running openafs-1.2.11 on Solaris 7 
> >>and RedHat Enterprise Linux 3, have had a problem after their weekly 
> >>restart where the following error is appearing at regular (approximately 
> >>3 min 26 seconds) intervals:
> >>
> >>Fri Nov  5 13:13:07 2004: fs:vol exited with code 1
> >>Fri Nov  5 13:16:33 2004: fs:vol exited with code 1
> >>
> >>Does anyone know what causes this error to appear?
> >>    
> >>
> >
> >Hopefully, your VolserLog (or VolserLog.old) holds a clue?
> >
> 
> VolserLog has nothing past 02:42 Fri Nov  5 in it.
> I've tried adding -log into BosConfig to see if I can get any logging 
> info from that.
> Nothing so far.
> 
> One additional note, I'm seeing the following message on a full restart 
> of AFS after shutting down AFS and then killing the bosserver:
> 
> FSYNC_clientInit temporary failure (will retry): Connection refused
> 
> I'm also seeing the following which looks worse:
> 
> FSYNC_clientInit failed (giving up!): Connection refused

The volume server attempts to create a TCP connection to the fileserver.
It is normal to see the "temporary failure" message above while the
fileserver is bringing volumes online.  (It doesn't listen for
connections until after all volumes are brought online.)  I thought
it tried for longer than 3.5 minutes, but maybe not.  Do you have so
many volumes on these servers that attaching them all would take
longer than 3.5 minutes?
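
To illustrate the retry behavior described above, here is a minimal
sketch in Python.  It is not the OpenAFS code: connect_with_retry, its
parameters, and the message prefixes are hypothetical, and the real
retry count and delay in the volume server may differ.

```python
import time


def connect_with_retry(connect, attempts, delay, log=print):
    """Call connect() until it succeeds or the attempts run out.

    connect  -- hypothetical callable that opens the connection and
                raises ConnectionRefusedError while nobody is listening
    attempts -- total number of tries before giving up (assumed value)
    delay    -- seconds to sleep between tries (assumed value)
    """
    for attempt in range(1, attempts + 1):
        try:
            return connect()
        except ConnectionRefusedError:
            if attempt == attempts:
                # Last try failed: the server never started listening.
                log("clientInit failed (giving up!): Connection refused")
                return None
            # Expected while the server is still attaching volumes.
            log("clientInit temporary failure (will retry): Connection refused")
            time.sleep(delay)
    return None
```

In real use, connect could be something like
lambda: socket.create_connection((host, port)), with host and port
supplied by the caller.  If the fileserver needs longer to attach its
volumes than attempts * delay seconds, a loop like this ends with the
"giving up" message.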

The "FSYNC_clientInit failed (giving up!): Connection refused" message
seems to be your problem.  (Does it always show up in VolserLog.old?)

Are there any clues in the FileLog?
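
If it helps to compare the restart interval against the 3.5 minutes
mentioned above, here is a small Python sketch that measures the gap
between the two "fs:vol exited" lines quoted earlier.  The function
name exit_intervals is made up for illustration, and the timestamp
format is assumed from those two lines only.

```python
from datetime import datetime

# The two log lines quoted earlier in this thread.
LOG_LINES = [
    "Fri Nov  5 13:13:07 2004: fs:vol exited with code 1",
    "Fri Nov  5 13:16:33 2004: fs:vol exited with code 1",
]


def exit_intervals(lines):
    """Return the seconds between consecutive 'exited with code' lines."""
    stamps = []
    for line in lines:
        # Split "timestamp: message" at the first colon followed by a space.
        stamp, _, message = line.partition(": ")
        if "exited with code" in message:
            stamps.append(datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"))
    return [(b - a).total_seconds() for a, b in zip(stamps, stamps[1:])]


print(exit_intervals(LOG_LINES))  # [206.0], i.e. 3 min 26 s
```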