[OpenAFS] afs memcache tuning... lockups in afs_cv_wait

Mike Polek mike@pictage.com
Tue, 12 Feb 2008 19:46:37 -0800



Hello, all,
   I'm working on performance tuning for Linux servers that run
the AFS client and FTP server software, which shuttle data from
users on the net to my AFS file servers. While tuning for
performance and stability, the number one issue I run into is
that if things get "too busy," all the processes go into a
holding pattern where ps axo wchan shows them all in afs_cv_wait.
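To watch this programmatically instead of eyeballing ps output, here is a minimal sketch in C that counts the processes sleeping in a given kernel wait channel by reading /proc/<pid>/wchan, the same information ps axo wchan reports. It assumes a Linux /proc that exposes symbol names in wchan (some kernels report 0 to unprivileged readers); count_in_wchan is a hypothetical helper, not part of OpenAFS or procps.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* Count processes whose kernel wait channel (/proc/<pid>/wchan) equals
 * `symbol`, e.g. "afs_cv_wait". Returns -1 if /proc cannot be opened. */
static int count_in_wchan(const char *symbol)
{
    assert(symbol != NULL);
    DIR *proc = opendir("/proc");
    if (proc == NULL)
        return -1;

    int count = 0;
    struct dirent *ent;
    while ((ent = readdir(proc)) != NULL) {
        /* Only numeric entries under /proc are process directories. */
        if (!isdigit((unsigned char)ent->d_name[0]))
            continue;

        char path[300];
        snprintf(path, sizeof path, "/proc/%s/wchan", ent->d_name);
        FILE *f = fopen(path, "r");
        if (f == NULL)
            continue;              /* process exited or access denied */

        char wchan[128] = "";
        if (fgets(wchan, sizeof wchan, f) != NULL) {
            wchan[strcspn(wchan, "\n")] = '\0';  /* strip newline, if any */
            if (strcmp(wchan, symbol) == 0)
                count++;
        }
        fclose(f);
    }
    closedir(proc);
    return count;
}
```

Calling count_in_wchan("afs_cv_wait") once a second while raising the FTP bandwidth limit would show whether processes only pass through the wait briefly or pile up and stay there.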
   I know this sort of thing has been posted on the list
before, but I don't know if there was any resolution
beyond "bug in version 1.4.1, upgrade." I'm running
Fedora Core 6 with OpenAFS 1.4.4 servers and clients.

   If I throttle things a bit at the FTP server and keep
all the streams under 2 Mbps, it generally works OK.
If I raise the limit a little, I occasionally see
processes go into and out of afs_cv_wait as I watch them.
If I set the bandwidth allowance too high, they
all go into that state and stay there, locking up
the AFS cache completely, or so it seems to me.
   I've attached the SysRq stack traces for the proftpd
processes (proftpd 1.3.1, if it's important). All the
proftpd processes look pretty much the same; if anyone
wants to see the whole file, please ask and I'll provide it.

   From what I can tell, RX wants to allocate a packet
to send data on the network (rxi_AllocSendPacket in the
trace), but none are available, so it waits in afs_cv_wait,
assuming that a response will eventually come back from the
file server, somebody will free up a packet, and things will
continue. In previous threads, I believe it was stated that
packets did come back from the file server but perhaps were
not processed by the client. I'm wondering if there is some
sort of race condition where, when things get busy, the
client can't process the returning packets, and everything
just deadlocks.
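The wait described above follows a common pattern: a fixed pool of buffers guarded by a mutex, where an allocator with nothing free sleeps on a condition variable until another thread returns a buffer. The toy model below is a hedged sketch of that pattern only; it is not RX's implementation, and pool_init, pool_alloc, pool_alloc_timed, pool_free, and POOL_SIZE are hypothetical names. The timed variant illustrates one general way a caller could notice exhaustion and back off instead of sleeping indefinitely; it is not an OpenAFS setting or fix.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>
#include <time.h>

#define POOL_SIZE 4   /* hypothetical, tiny so exhaustion is easy to hit */

struct pool {
    pthread_mutex_t lock;
    pthread_cond_t  freed;              /* signalled when a packet is returned */
    int             nfree;              /* number of entries in slots[] */
    void           *slots[POOL_SIZE];   /* stack of free packets */
    char            storage[POOL_SIZE][64];
};

static void pool_init(struct pool *p)
{
    pthread_mutex_init(&p->lock, NULL);
    pthread_cond_init(&p->freed, NULL);
    p->nfree = POOL_SIZE;
    for (int i = 0; i < POOL_SIZE; i++)
        p->slots[i] = p->storage[i];
}

/* Models the blocking wait seen in the trace (rxi_AllocSendPacket ->
 * afs_cv_wait): sleep until some other thread frees a packet. If nothing
 * ever frees one, the caller sleeps forever. */
static void *pool_alloc(struct pool *p)
{
    pthread_mutex_lock(&p->lock);
    while (p->nfree == 0)
        pthread_cond_wait(&p->freed, &p->lock);
    void *pkt = p->slots[--p->nfree];
    pthread_mutex_unlock(&p->lock);
    return pkt;
}

/* Same, but give up after `ms` milliseconds and return NULL. */
static void *pool_alloc_timed(struct pool *p, long ms)
{
    struct timespec deadline;
    clock_gettime(CLOCK_REALTIME, &deadline);   /* default clock for condvars */
    deadline.tv_sec  += ms / 1000;
    deadline.tv_nsec += (ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    void *pkt = NULL;
    pthread_mutex_lock(&p->lock);
    while (p->nfree == 0) {
        if (pthread_cond_timedwait(&p->freed, &p->lock, &deadline) == ETIMEDOUT)
            break;
    }
    if (p->nfree > 0)
        pkt = p->slots[--p->nfree];
    pthread_mutex_unlock(&p->lock);
    return pkt;
}

static void pool_free(struct pool *p, void *pkt)
{
    assert(pkt != NULL);
    pthread_mutex_lock(&p->lock);
    assert(p->nfree < POOL_SIZE);
    p->slots[p->nfree++] = pkt;
    pthread_cond_signal(&p->freed);     /* wake one waiting allocator */
    pthread_mutex_unlock(&p->lock);
}
```

With all four packets held, pool_alloc_timed(&p, 50) returns NULL after roughly 50 ms rather than blocking; once one packet is freed, the next call gets it back. In the model, pool_alloc never returns if the only thread that would call pool_free is itself stuck waiting, which is the shape of the deadlock suspected above.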

   Any tips/pointers? I'm OK with just capping the usage
and declaring the limit to be somewhere below the threshold
where things go horribly wrong. But if there is a way to
get performance to degrade gracefully under high load,
that would be preferred.

   And if I forgot to post some vital and super obvious
piece of information... apologies in advance. ;-)

Thanks,
Mike

P.S. It would be great to see a talk on client/server performance
      tuning, not only Windows/*nix, but diskcache/memcache, too,
      at the upcoming Best Practices 2008 Workshop. I don't have
      enough experience to give a talk on the subject, but I'll
      certainly be one of the attentive folks in the audience!




-- 
Michael Polek
Director of System Operations
1580 Francisco Street, Suite 101
Torrance, CA 90501
Phone: (310) 525-1600 ext. 628
Email: mike@pictage.com
http://www.pictage.com

[Attachment: sysrq-trace.txt]

Feb 12 17:37:51 ftpserver9 kernel: SysRq : Show State
Feb 12 17:37:51 ftpserver9 kernel: 
Feb 12 17:37:51 ftpserver9 kernel:                          free                        sibling
Feb 12 17:37:51 ftpserver9 kernel:   task             PC    stack   pid father child younger older
Feb 12 17:37:51 ftpserver9 kernel: init          S C2491B48   732     1      0     2               (NOTLB)
Feb 12 17:37:51 ftpserver9 kernel:        c2491b5c 00000082 00000002 c2491b48 c2491b44 00000000 cacda840 c0620a05 
Feb 12 17:37:51 ftpserver9 kernel:        c061f3b9 c2491b1c 0000000a c2493630 c06fc480 56d223c7 00000093 00004e34 
Feb 12 17:37:51 ftpserver9 kernel:        c249373c c23fcb80 00000000 cacda840 00051585 00000286 c2491b6c ffffffff 
Feb 12 17:37:51 ftpserver9 kernel: Call Trace:
Feb 12 17:37:51 ftpserver9 kernel:  [<c0620a05>] _spin_unlock_irq+0x5/0x7
Feb 12 17:37:51 ftpserver9 kernel:  [<c061f3b9>] __sched_text_start+0x999/0xa21
Feb 12 17:37:51 ftpserver9 kernel:  [<c061fb17>] schedule_timeout+0x70/0x8d
Feb 12 17:37:52 ftpserver9 kernel:  [<c04378de>] ...
[...]
Feb 12 17:37:52 ftpserver9 kernel: ...  2098          2511  2508 (NOTLB)
Feb 12 17:37:52 ftpserver9 kernel:        c985ca44 00000082 00000002 c985ca30 c985ca2c 00000000 00000000 00000000 
Feb 12 17:37:52 ftpserver9 kernel:        00000082 c24e05c0 0000000a c2510830 c8546730 49b84dda 00000093 0000104a 
Feb 12 17:37:52 ftpserver9 kernel:        c251093c c2404b80 00000001 ca6b7980 cbb2da60 00000000 00000246 cbb2daf0 
Feb 12 17:37:52 ftpserver9 kernel: Call Trace:
Feb 12 17:37:52 ftpserver9 kernel:  [<cbaef9a4>] afs_cv_wait+0x115/0x20f [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<c042264f>] default_wake_function+0x0/0xc
Feb 12 17:37:52 ftpserver9 kernel:  [<cbaf16be>] rxi_AllocSendPacket+0xe0/0x116 [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<cbaee584>] rxi_WriteProc+0x17e/0x312 [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<cbaee786>] rx_WriteProc32+0x6e/0x7d [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<cbaf28e6>] xdrrx_putint32+0x14/0x1f [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<cbaf7407>] afs_xdr_int+0x2d/0x4c [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<cbae43df>] StartRXAFS_StoreData64+0x21/0x7c [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<cbac959b>] afs_StoreAllSegments+0x705/0x19f2 [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<c042ede5>] lock_timer_base+0x15/0x2f
Feb 12 17:37:52 ftpserver9 kernel:  [<c061fd43>] mutex_lock+0x1a/0x29
Feb 12 17:37:52 ftpserver9 kernel:  [<cbac6756>] afs_MemWriteUIO+0x1d5/0x265 [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<cbac82e2>] PagInCred+0x28/0x93 [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<cbabd639>] afs_AdjustSize+0x4e/0x64 [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<cbadf2a9>] afs_MemWrite+0x7ea/0x7fe [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<cbac82e2>] PagInCred+0x28/0x93 [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<cbafab57>] afs_linux_writepage_sync+0x1b7/0x29e [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<c05e127e>] ip_queue_xmit+0x3b2/0x3f4
Feb 12 17:37:52 ftpserver9 kernel:  [<cbafac3e>] afs_linux_commit_write+0x0/0xf [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<c0457a7f>] generic_file_buffered_write+0x3f7/0x607
Feb 12 17:37:52 ftpserver9 kernel:  [<c04ed600>] copy_to_user+0x3c/0x50
Feb 12 17:37:52 ftpserver9 kernel:  [<c042b0d2>] current_fs_time+0x4f/0x59
Feb 12 17:37:52 ftpserver9 kernel:  [<c0458170>] __generic_file_aio_write_nolock+0x4e1/0x55a
Feb 12 17:37:52 ftpserver9 kernel:  [<c0458241>] generic_file_aio_write+0x58/0xb6
Feb 12 17:37:52 ftpserver9 kernel:  [<c04723a5>] do_sync_write+0xc7/0x10a
Feb 12 17:37:52 ftpserver9 kernel:  [<c043775d>] autoremove_wake_function+0x0/0x35
Feb 12 17:37:52 ftpserver9 kernel:  [<c043a072>] ktime_get_ts+0x16/0x44
Feb 12 17:37:52 ftpserver9 kernel:  [<c04c4de7>] file_has_perm+0x8c/0x94
Feb 12 17:37:52 ftpserver9 kernel:  [<cbac82e2>] PagInCred+0x28/0x93 [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<cbafb13e>] afs_linux_write+0x1a4/0x31e [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<cbafaf9a>] afs_linux_write+0x0/0x31e [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<c0472be4>] vfs_write+0xa8/0x154
Feb 12 17:37:52 ftpserver9 kernel:  [<c04731ff>] sys_write+0x41/0x67
Feb 12 17:37:52 ftpserver9 kernel:  [<c0403f64>] syscall_call+0x7/0xb
Feb 12 17:37:52 ftpserver9 kernel:  =======================
Feb 12 17:37:52 ftpserver9 kernel: proftpd       S FFFFFF82  1204  2511   2098          2512  2509 (NOTLB)
Feb 12 17:37:52 ftpserver9 kernel:        c984fa38 00000086 00000000 ffffff82 c061f40c 00000002 c984fa30 c984fa2c 
Feb 12 17:37:52 ftpserver9 kernel:        00000000 00000000 00000009 c25118b0 c9b0eef0 49bc5470 00000093 00002525 
Feb 12 17:37:52 ftpserver9 kernel:        c25119bc c2404b80 00000001 ca603dc0 cbb2da60 00000000 00000246 cbb2daf0 
Feb 12 17:37:52 ftpserver9 kernel: Call Trace:
Feb 12 17:37:52 ftpserver9 kernel:  [<c061f40c>] __sched_text_start+0x9ec/0xa21
Feb 12 17:37:52 ftpserver9 kernel:  [<cbaef9a4>] afs_cv_wait+0x115/0x20f [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<c042264f>] default_wake_function+0x0/0xc
Feb 12 17:37:52 ftpserver9 kernel:  [<cbaf16be>] rxi_AllocSendPacket+0xe0/0x116 [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<cbaee23b>] rxi_WritevAlloc+0xab/0x233 [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<cbaee3f4>] rx_WritevAlloc+0x31/0x43 [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<cbac6940>] afs_MemCacheStoreProc+0xd2/0x24e [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<cbac97cf>] afs_StoreAllSegments+0x939/0x19f2 [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<c042ede5>] lock_timer_base+0x15/0x2f
Feb 12 17:37:52 ftpserver9 kernel:  [<c061fd43>] mutex_lock+0x1a/0x29
Feb 12 17:37:52 ftpserver9 kernel:  [<cbac6756>] afs_MemWriteUIO+0x1d5/0x265 [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<cbac82e2>] PagInCred+0x28/0x93 [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<cbabd639>] afs_AdjustSize+0x4e/0x64 [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<cbadf2a9>] afs_MemWrite+0x7ea/0x7fe [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<cbafab57>] afs_linux_writepage_sync+0x1b7/0x29e [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<c05e127e>] ip_queue_xmit+0x3b2/0x3f4
Feb 12 17:37:52 ftpserver9 kernel:  [<cbafac3e>] afs_linux_commit_write+0x0/0xf [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<c0457a7f>] generic_file_buffered_write+0x3f7/0x607
Feb 12 17:37:52 ftpserver9 kernel:  [<c04ed600>] copy_to_user+0x3c/0x50
Feb 12 17:37:52 ftpserver9 kernel:  [<c042b0d2>] current_fs_time+0x4f/0x59
Feb 12 17:37:52 ftpserver9 kernel:  [<c0458170>] __generic_file_aio_write_nolock+0x4e1/0x55a
Feb 12 17:37:52 ftpserver9 kernel:  [<c0458241>] generic_file_aio_write+0x58/0xb6
Feb 12 17:37:52 ftpserver9 kernel:  [<c04723a5>] do_sync_write+0xc7/0x10a
Feb 12 17:37:52 ftpserver9 kernel:  [<c043775d>] autoremove_wake_function+0x0/0x35
Feb 12 17:37:52 ftpserver9 kernel:  [<c043a072>] ktime_get_ts+0x16/0x44
Feb 12 17:37:52 ftpserver9 kernel:  [<c04c4de7>] file_has_perm+0x8c/0x94
Feb 12 17:37:52 ftpserver9 kernel:  [<cbac82e2>] PagInCred+0x28/0x93 [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<cbafb13e>] afs_linux_write+0x1a4/0x31e [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<cbafaf9a>] afs_linux_write+0x0/0x31e [openafs]
Feb 12 17:37:52 ftpserver9 kernel:  [<c0472be4>] vfs_write+0xa8/0x154
Feb 12 17:37:52 ftpserver9 kernel:  [<c04731ff>] sys_write+0x41/0x67
Feb 12 17:37:52 ftpserver9 kernel:  [<c0403f64>] syscall_call+0x7/0xb
Feb 12 17:37:52 ftpserver9 kernel:  =======================
