[OpenAFS-devel] find_preferred_connection: no connection and !create

Mon, 19 Mar 2018 04:27:02 -0500

On Mon, Mar 19, 2018 at 11:14:13AM +1100, Ian Wienand wrote:
> Hello,
> 
> With 1.8.0~pre5 we occasionally get
> 
>  [Fri Mar 16 08:00:41 2018] find_preferred_connection: no connection and !create
>  [Fri Mar 16 08:00:41 2018] find_preferred_connection: no connection and !create
>  [Fri Mar 16 10:00:07 2018] find_preferred_connection: no connection and !create
>  [Fri Mar 16 12:00:06 2018] find_preferred_connection: no connection and !create
>  [Fri Mar 16 14:00:07 2018] find_preferred_connection: no connection and !create
>  [Fri Mar 16 16:42:15 2018] find_preferred_connection: no connection and !create
>  [Fri Mar 16 18:21:58 2018] find_preferred_connection: no connection and !create
> 
> in the kernel logs.  You can see from [1] it's usually around the top
> of the hour when mirroring processes start; but not always.  I've had
> a look at [2] ... there doesn't seem to be anything obviously tunable
> about this?  Is it something we should worry about?

I think the message is harmless, and the log statement should probably be removed.
It looks like the only place where we call
find_preferred_connection() with create == 0 is within
afs_ConnByHost(), where we first check if there's an existing
connection to reuse, and if not, we create one.  So this message
just tells us that no cached connection was available for reuse and
a new one had to be made, which is of interest mostly to developers
working on the code.
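
To make the pattern concrete, here is a rough sketch of the
lookup-then-create flow described above.  It is not the actual OpenAFS
code: struct conn, find_conn(), conn_by_host(), and the warnings_logged
counter are hypothetical, simplified stand-ins for illustration, and
the real code also deals with locking, users, and cells.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-in; not the real OpenAFS structure. */
struct conn {
    int host_id;
    struct conn *next;
};

static struct conn *conn_cache = NULL;  /* cached connections (hypothetical) */
static int warnings_logged = 0;         /* how many times the message was printed */

/* Return a cached connection for host_id; on a miss, create one only
 * if create is nonzero, otherwise log the message and return NULL. */
static struct conn *find_conn(int host_id, int create)
{
    struct conn *c;

    assert(host_id >= 0);  /* host ids are non-negative in this sketch */

    for (c = conn_cache; c != NULL; c = c->next) {
        if (c->host_id == host_id)
            return c;
    }
    if (!create) {
        fprintf(stderr, "find_conn: no connection and !create\n");
        warnings_logged++;
        return NULL;
    }
    c = malloc(sizeof(*c));
    if (c == NULL)
        return NULL;
    c->host_id = host_id;
    c->next = conn_cache;
    conn_cache = c;
    return c;
}

/* First try to reuse a cached connection (create == 0); if there is
 * none, call again with create == 1 to make a new one. */
static struct conn *conn_by_host(int host_id)
{
    struct conn *c = find_conn(host_id, 0);

    if (c == NULL)
        c = find_conn(host_id, 1);
    return c;
}
```

In this sketch the message is printed on every cache miss even though
the caller recovers immediately by creating a connection, which is why
it reads as debugging output rather than as an error.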

> ---
> 
> For background ... in OpenStack we have based our mirroring
> infrastructure off AFS.  We have a single host that updates from
> various upstream mirrors to RW volumes then releases them; mirror
> hosts in various remote clouds then serve the volumes via apache to
> local nodes in their own cloud.
> 
> Unfortunately this mirror updater has been very unstable lately.  In
> particular, we use "reprepro" to mirror deb-based repositories like
> Debian, Ubuntu, Ubuntu Ports, etc. and its on-disk databases are very
> sensitive to corruption of files; when it does happen, recovering or
> remirroring these big repos is not fun (others we just rsync, which is
> much more tolerant of failures).
> 
> We were previously running Trusty on this host, which would be openafs
> 1.6.7 [3].  We'd fairly regularly see things like:
> 
>  afs: Lost contact with file server 104.130.138.161 in cell openstack.org (code -512) (all multi-homed ip addresses down for the server)
>  afs: failed to store file (110)
> 
> and at the fs level we'd end up with files not written or corruption.
> 
> Anyway, it didn't seem worth spending time on such old code; we have
> upgraded the host to Xenial now, and are using a backport of the
> bionic 1.8.0~pre5 packages in a PPA [4].  This is so far working well,
> modulo the warning above.

That's great feedback to hear; thanks!

-Ben