[OpenAFS] File Server appears to stop responding

Randy Kemp rkemp@srhs.net
Fri, 26 Sep 2008 15:17:38 -0400


This is a multi-part message in MIME format.
--------------040604060303050400080406
Content-Type: text/plain; charset=ISO-8859-1; format=flowed
Content-Transfer-Encoding: 7bit

I've been experiencing a problem where fileserver appears to simply stop 
responding to requests.  I currently only have one AFS server (OpenAFS 
version 1.4.7).

Currently the primary host accessing AFS is an application server for 
104 thin-clients.  This problem seems to arise when a large number of 
users (30+) try to log in to this application server at about the same 
time.  The users home directories are on AFS.  Once it stops responding 
no host is able to access it.  Not even the local client.

The AFS server has 5 interfaces on different networks.  The application 
server has 3 interfaces on 3 of the 5 networks.  However, the CellServDB 
on the application server only references the address for one of the 
interfaces on the AFS server because that's the route I want the traffic 
to take.

This setup seemed to be working fine for about a month before the 
problem started occurring.  In the FileLog I'm now frequently seeing 
"CallPreamble: Couldn't get CPS. Too many lockers" even when everything 
appears to be working correctly.  When it does occur, restarting the 
OpenAFS daemons does not fix the problem.  I have to completely reboot 
the server to resolve it.  I've attached the FileLog from the last 
occurrence.  This log was generated after I restarted fileserver and it 
still wasn't working.

Any advice would be greatly appreciated.  I've been trying to figure 
this out for a week now and from what I've been able to find, no one 
seems to have had quite this problem.  Let me know if any other info 
would be helpful.

-- 
Randy Kemp


--------------040604060303050400080406
Content-Type: text/plain;
 name="FileLog"
Content-Transfer-Encoding: 7bit
Content-Disposition: inline;
 filename="FileLog"

Fri Sep 26 13:47:30 2008 File server starting
Fri Sep 26 13:47:30 2008 afs_krb_get_lrealm failed, using phoenix.srhs.net.
Fri Sep 26 13:47:30 2008 VL_RegisterAddrs rpc failed; will retry periodically (code=5376, err=0)
Fri Sep 26 13:47:30 2008 Set thread id 13 for FSYNC_sync
Fri Sep 26 13:47:30 2008 FSYNC_sync: bind failed with (98), removed bogus /var/lib/openafs/fssync.sock
Fri Sep 26 13:47:30 2008 Partition /vicepa: attaching volumes
Fri Sep 26 13:47:30 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:30 2008 Partition /vicepa: attached 1863 volumes; 0 volumes not attached
Fri Sep 26 13:47:30 2008 Set thread id 15 for 'FiveMinuteCheckLWP'
Fri Sep 26 13:47:30 2008 Set thread id 16 for 'HostCheckLWP'
Fri Sep 26 13:47:30 2008 Set thread id 17 for 'FsyncCheckLWP'
Fri Sep 26 13:47:30 2008 Getting FileServer name...
Fri Sep 26 13:47:30 2008 FileServer host name is 'afs'
Fri Sep 26 13:47:30 2008 Getting FileServer address...
Fri Sep 26 13:47:30 2008 FileServer afs has address 10.141.4.2 (0x2048d0a or 0xa8d0402 in host byte order)
Fri Sep 26 13:47:30 2008 File Server started Fri Sep 26 13:47:30 2008
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:47:59 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:49:38 2008 CB: RCallBackConnectBack failed for host 1c77c010 (10.141.16.3:7001)
Fri Sep 26 13:49:38 2008 CB: RCallBackConnectBack failed for host 1c77c148 (10.141.5.123:7001)
Fri Sep 26 13:50:07 2008 CB: RCallBackConnectBack failed for host 1c77c628 (192.168.0.2:7001)
Fri Sep 26 13:51:46 2008 CB: WhoAreYou failed for host 1c77c010 (10.141.16.3:7001), error -03
Fri Sep 26 13:51:46 2008 CB: WhoAreYou failed for host 1c77c148 (10.141.5.123:7001), error -03
Fri Sep 26 13:52:08 2008 CB: Call back connect back failed (in break delayed) for Host 192.168.0.2:7001
Fri Sep 26 13:52:08 2008 BreakDelayedCallbacks FAILED for host 192.168.0.2:7001 which IS UP.  Connection from 192.168.0.2:7001.  Possible network or routing failure.
Fri Sep 26 13:52:08 2008 MultiProbe failed to find new address for host 192.168.0.2:7001
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:52:08 2008 CallPreamble: Couldn't get CPS. Too many lockers
Fri Sep 26 13:53:47 2008 CB: WhoAreYou failed for host 1c77c010 (10.141.16.3:7001), error -03
Fri Sep 26 13:53:47 2008 CB: WhoAreYou failed for host 1c77c148 (10.141.5.123:7001), error -03
Fri Sep 26 13:54:16 2008 CB: Call back connect back failed (in break delayed) for Host 192.168.0.2:7001
Fri Sep 26 13:54:16 2008 BreakDelayedCallbacks FAILED for host 192.168.0.2:7001 which IS UP.  Connection from 192.168.0.2:7001.  Possible network or routing failure.
Fri Sep 26 13:54:16 2008 MultiProbe failed to find new address for host 192.168.0.2:7001
Fri Sep 26 13:55:55 2008 CB: WhoAreYou failed for host 1c77c010 (10.141.16.3:7001), error -03
Fri Sep 26 13:55:55 2008 CB: WhoAreYou failed for host 1c77c148 (10.141.5.123:7001), error -03
Fri Sep 26 13:56:24 2008 CB: WhoAreYou failed for host 1c77c628 (192.168.0.2:7001), error -03
Fri Sep 26 13:58:03 2008 CB: Call back connect back failed (in break delayed) for Host 10.141.5.123:7001
Fri Sep 26 13:58:03 2008 BreakDelayedCallbacks FAILED for host 10.141.5.123:7001 which IS UP.  Connection from 10.141.5.123:7001.  Possible network or routing failure.
Fri Sep 26 13:58:03 2008 MultiProbe failed to find new address for host 10.141.5.123:7001
Fri Sep 26 13:58:03 2008 CB: Call back connect back failed (in break delayed) for Host 10.141.16.3:7001
Fri Sep 26 13:58:03 2008 BreakDelayedCallbacks FAILED for host 10.141.16.3:7001 which IS UP.  Connection from 10.141.16.3:7001.  Possible network or routing failure.
Fri Sep 26 13:58:03 2008 MultiProbe failed to find new address for host 10.141.16.3:7001
Fri Sep 26 13:58:30 2008 CB: Call back connect back failed (in break delayed) for Host 192.168.0.2:7001
Fri Sep 26 13:58:30 2008 BreakDelayedCallbacks FAILED for host 192.168.0.2:7001 which IS UP.  Connection from 192.168.0.2:7001.  Possible network or routing failure.
Fri Sep 26 13:58:30 2008 MultiProbe failed to find new address for host 192.168.0.2:7001
Fri Sep 26 14:00:11 2008 CB: Call back connect back failed (in break delayed) for Host 10.141.5.123:7001
Fri Sep 26 14:00:11 2008 BreakDelayedCallbacks FAILED for host 10.141.5.123:7001 which IS UP.  Connection from 10.141.5.123:7001.  Possible network or routing failure.
Fri Sep 26 14:00:11 2008 MultiProbe failed to find new address for host 10.141.5.123:7001
Fri Sep 26 14:00:11 2008 CB: Call back connect back failed (in break delayed) for Host 10.141.16.3:7001
Fri Sep 26 14:00:11 2008 BreakDelayedCallbacks FAILED for host 10.141.16.3:7001 which IS UP.  Connection from 10.141.16.3:7001.  Possible network or routing failure.
Fri Sep 26 14:00:11 2008 MultiProbe failed to find new address for host 10.141.16.3:7001
Fri Sep 26 14:00:36 2008 Shutting down file server at Fri Sep 26 14:00:36 2008
Fri Sep 26 14:00:36 2008 Vice was last started at Fri Sep 26 13:47:30 2008

Fri Sep 26 14:00:36 2008 Large vnode cache, 400 entries, 0 allocs, 3 gets (1 reads), 0 writes
Fri Sep 26 14:00:36 2008 Small vnode cache,400 entries, 0 allocs, 0 gets (0 reads), 0 writes
Fri Sep 26 14:00:36 2008 Volume header cache, 400 entries, 3 gets, 1 replacements
Fri Sep 26 14:00:36 2008 Partition /vicepa: 1941465536 available 1K blocks (minfree=0), Fri Sep 26 14:00:36 2008 1933869296 free blocks
Fri Sep 26 14:00:36 2008 With 90 directory buffers; 0 reads resulted in 0 read I/Os
Fri Sep 26 14:00:36 2008 Total Client entries = 31, blocks = 1; Host entries = 9, blocks = 1
Fri Sep 26 14:00:36 2008 There are 29 connections, process size 40276
Fri Sep 26 14:00:36 2008 There are 9 workstations, 9 are active (req in < 15 mins), 3 marked "down"
Fri Sep 26 14:00:36 2008 VShutdown:  shutting down on-line volumes...
Fri Sep 26 14:00:36 2008 VShutdown:  complete.
Fri Sep 26 14:00:36 2008 File server has terminated normally at Fri Sep 26 14:00:36 2008

12 debug 0 params 0 unused 0 unused 0 unused 0 version 0 
   other send counters: ack 27395, data 1272 (not resends), resends 266, pushed 0, acked&ignored 1
   	(these should be small) sendFailed 0, fatalErrors 0
   Average rtt is 0.002, with 1 samples
   Minimum rtt is 0.002, maximum is 0.002
   228 server connections, 21 client connections, 11 peer structs, 341 call structs, 28 free call structs
3 add CB, 0 break CB, 0 del CB, 0 del FE, 0 CB's timed out, 0 space reclaim, 0 del host
1 CBs, 1 FEs, (2 of total of 60000 16-byte blocks)

--------------040604060303050400080406--