[OpenAFS] Timeouts and odd behavior with 1.6.0 file servers

Jack Neely jjneely@pams.ncsu.edu
Thu, 26 Jan 2012 17:22:20 -0500


On Wed, Jan 25, 2012 at 02:22:26PM -0800, Russ Allbery wrote:
> Jack Neely <jjneely@pams.ncsu.edu> writes:
> 
> > We are working our way through a migration from old Sun AFS hardware
> > running OpenAFS 1.4.11 to HP Blades running RHEL 6 with OpenAFS 1.6.0.
> > At this point we've completed most of our file servers.
> 
> Don't use the 1.6.0 file server.  It has a data corruption problem when
> you have an inode clone (such as a backup volume or a migration clone) and
> directories are moved with mv.  This is fixed in 1.6.1pre1 and in the
> Debian 1.6.0-3 packages.  This may be what you're running into.
> 
> > RHEL 6 / 1.6.0 clients wired into the network occasionally have long pauses
> > when doing AFS operations, such as running ls.  It may take 30 seconds
> > to a minute for the AFS server (the datacenter is downstairs) to
> > respond.  We are not seeing high load or any signs on the server that
> > something is wrong.
> 
> > The above applies as well to our web servers that are RHEL 6 / 1.6.0.
> > Several times a week load on the web servers will suddenly spike and
> > rxdebug tells us that RX calls to one of the AFS servers are all/mostly
> > in the reader_wait state.  Just as suddenly as it starts, it's over.
> 
> >     call 0: # 5231, state active, mode: receiving, flags: reader_wait
> 
> > Our cron job that mirrors CPAN to AFS space now often fails with
> > timeout errors.
> 
> >     readlink_stat("/afs/...") failed: Connection timed out (110)
> 
> Yes, this is consistent with the problems that we're seeing on our web
> servers with OpenAFS 1.4 as well, which are probably due at least in part
> to the pathological idledead interactions with the way that server threads
> can back up waiting for vnode locks.  1.6.1pre2 (coming shortly) has both
> client and server fixes for the idledead part of this.
> 
> -- 
> Russ Allbery (rra@stanford.edu)             <http://www.eyrie.org/~eagle/>

So, I grabbed the current HEAD of the openafs-stable-1_6_x branch, which
looks to be prepped for 1.6.1pre2.  I built that and deployed it to a
server I could do some testing on.  I'm seeing good results so far, but
we haven't finished our testing yet.
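
For anyone who wants to script the reader_wait check while testing, here
is a minimal Python sketch that tallies flags on saved rxdebug output.
The pattern is modelled only on the single sample line quoted above, so
the field layout is an assumption, and count_call_flags is a
hypothetical helper name, not part of OpenAFS.

```python
import re
from collections import Counter

# Pattern modelled on the one sample line quoted in this thread:
#   call 0: # 5231, state active, mode: receiving, flags: reader_wait
# Real rxdebug output may carry other fields; this layout is an assumption.
CALL_RE = re.compile(
    r"call\s+(?P<slot>\d+):\s+#\s*(?P<number>\d+),\s+"
    r"state\s+(?P<state>\w+),\s+"
    r"mode:\s+(?P<mode>\w+)"
    r"(?:,\s+flags:\s+(?P<flags>.+))?"
)


def count_call_flags(text):
    """Return (number of call lines, Counter of flags seen on them)."""
    total = 0
    flags = Counter()
    for line in text.splitlines():
        match = CALL_RE.search(line)
        if match is None:
            continue  # not a call line in the assumed format
        total += 1
        # Flags may be separated by spaces or commas; accept both.
        for flag in (match.group("flags") or "").replace(",", " ").split():
            flags[flag] += 1
    return total, flags


if __name__ == "__main__":
    sample = (
        "    call 0: # 5231, state active, mode: receiving, flags: reader_wait\n"
    )
    total, flags = count_call_flags(sample)
    print(f"{flags['reader_wait']} of {total} calls in reader_wait")
```

Feeding it output captured during a load spike and again afterwards
would give a rough before/after comparison of how many calls are stuck
in reader_wait.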

Jack
-- 
Jack Neely <jjneely@ncsu.edu>
Linux Czar, OIT Campus Linux Services
Office of Information Technology, NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4  EA6B 213B 765F 3B6A 5B89