[OpenAFS] Timeouts and odd behavior with 1.6.0 file servers
Thu, 26 Jan 2012 17:22:20 -0500
On Wed, Jan 25, 2012 at 02:22:26PM -0800, Russ Allbery wrote:
> Jack Neely <firstname.lastname@example.org> writes:
> > We are working our way through a migration from old Sun AFS hardware
> > running OpenAFS 1.4.11 to HP Blades running RHEL 6 with OpenAFS 1.6.0.
> > At this point we've completed most of our file servers.
> Don't use the 1.6.0 file server. It has a data corruption problem when
> you have an inode clone (such as a backup volume or a migration clone) and
> directories are moved with mv. This is fixed in 1.6.1pre1 and in the
> Debian 1.6.0-3 packages. This may be what you're running into.
> > RHEL 6 / 1.6.0 clients wired into the network occasionally have long pauses
> > when doing AFS operations, such as running ls. It may take 30 seconds
> > to a minute for the AFS server (the datacenter is downstairs) to
> > respond. We are not seeing high load or any signs on the server that
> > something is wrong.
> > The above applies as well to our web servers that are RHEL 6 / 1.6.0.
> > Several times a week load on the web servers will suddenly spike and
> > rxdebug tells us that RX calls to one of the AFS servers are all/mostly
> > in the reader_wait state. Just as suddenly as it starts, it's over.
> > call 0: # 5231, state active, mode: receiving, flags: reader_wait
> > Our cron job that mirrors CPAN to AFS space now often fails with
> > timeout errors.
> > readlink_stat("/afs/...") failed: Connection timed out (110)
> Yes, this is consistent with the problems that we're seeing on our web
> servers with OpenAFS 1.4 as well, which are probably due at least in part
> to the pathological idledead interactions with the way that server threads
> can back up waiting for vnode locks. 1.6.1pre2 (coming shortly) has both
> client and server fixes for the idledead part of this.
> Russ Allbery (email@example.com) <http://www.eyrie.org/~eagle/>
So, I grabbed the current HEAD of the openafs-stable-1_6_x branch, which
looks to be prepped for 1.6.1pre2. I built that and deployed it to a
server I could do some testing on. I'm seeing good results so far, but we
haven't finished our testing yet.
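
For anyone who wants to watch for the reader_wait pile-up quoted above
while testing, here is a minimal sketch in Python. It only parses text
that has already been captured from rxdebug; it does not run rxdebug
itself. The regular expression is modelled solely on the one sample line
in this thread, so other rxdebug versions may print something different.
The names parse_calls and reader_wait_fraction are hypothetical helpers,
not part of OpenAFS.

```python
import re

# Assumption: call lines look like the single sample quoted in this thread:
#   call 0: # 5231, state active, mode: receiving, flags: reader_wait
# Other rxdebug versions may format these lines differently.
CALL_RE = re.compile(
    r"call\s+(?P<slot>\d+):\s*#\s*(?P<number>\d+),\s*"
    r"state\s+(?P<state>[^,]+),\s*"
    r"mode:\s*(?P<mode>[^,]+)"
    r"(?:,\s*flags:\s*(?P<flags>.*))?$"
)


def parse_calls(text):
    """Return one dict per line that matches the assumed call format."""
    calls = []
    for line in text.splitlines():
        match = CALL_RE.search(line.strip())
        if match is None:
            continue  # not a call line (or a format this sketch does not know)
        # Flags may be separated by spaces or commas; accept either.
        flags = (match.group("flags") or "").replace(",", " ").split()
        calls.append({
            "slot": int(match.group("slot")),
            "number": int(match.group("number")),
            "state": match.group("state").strip(),
            "mode": match.group("mode").strip(),
            "flags": flags,
        })
    return calls


def reader_wait_fraction(calls):
    """Fraction of parsed calls that carry the reader_wait flag."""
    if not calls:
        return 0.0
    waiting = sum(1 for call in calls if "reader_wait" in call["flags"])
    return waiting / len(calls)
```

Feeding it saved rxdebug output from one of the web servers during a load
spike and again during a quiet period should make the difference easy to
see; a fraction near 1.0 would match the "all/mostly in reader_wait"
symptom described above.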
Jack Neely <firstname.lastname@example.org>
Linux Czar, OIT Campus Linux Services
Office of Information Technology, NC State University
GPG Fingerprint: 1917 5AC1 E828 9337 7AA4 EA6B 213B 765F 3B6A 5B89