[OpenAFS] Solaris 10 inode server

John Tang Boyland boyland@cs.uwm.edu
Fri, 26 May 2006 10:30:08 -0500


(Since I get openafs-info in digest form, a direct cc is appreciated.)

We are expanding our small AFS cell at UWM.  We have new SPARC
blades running Solaris 10 and the inode fileserver.  But we've
found it impossible to create volumes on, or release volumes to,
the new machine.  (The new machine is called jeremiah.cs.uwm.edu.)

I can add a readonly site, but releasing the volume to the server
hangs, as does trying to create a volume.  For example:

% vos release root.cell -verbose

root.cell 
    RWrite: 536870915     ROnly: 536870916     RClone: 536870916 
    number of sites -> 4
       server solomons.cs.uwm.edu partition /vicepa RW Site 
       server solomons.cs.uwm.edu partition /vicepa RO Site  -- New release
       server afs2.cs.uwm.edu partition /vicepa RO Site  -- New release
       server jeremiah.cs.uwm.edu partition /vicepa RO Site  -- Old release
This is a completion of the previous release
[HANG]
(Control-C followed by "vos unlock root.cell".)

The FileLog says:

Fri May 26 06:56:04 2006 File server starting
Fri May 26 06:56:04 2006 afs_krb_get_lrealm failed, using cs.uwm.edu.
Fri May 26 06:56:04 2006 Set thread id 14 for FSYNC_sync
Fri May 26 06:56:04 2006 Partition /vicepa: attaching volumes
Fri May 26 06:56:04 2006 Partition /vicepa: attached 0 volumes; 0 volumes not attached
Fri May 26 06:56:04 2006 Getting FileServer name...
Fri May 26 06:56:04 2006 FileServer host name is 'jeremiah'
Fri May 26 06:56:04 2006 Getting FileServer address...
Fri May 26 06:56:04 2006 FileServer jeremiah has address 129.89.143.70 (0x468f5981 or 0x81598f46 in host byte order)
Fri May 26 06:56:04 2006 File Server started Fri May 26 06:56:04 2006
Fri May 26 06:56:04 2006 Set thread id 15 for 'FiveMinuteCheckLWP'
Fri May 26 06:56:04 2006 Set thread id 16 for 'HostCheckLWP'
Fri May 26 06:56:04 2006 Set thread id 17 for 'FsyncCheckLWP'
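
As a sanity check on the address line in the FileLog above, the two hex
values should be the same four bytes of 129.89.143.70 read in opposite
byte orders.  A minimal Python sketch that reproduces them (the helper
name addr_hex_forms is made up for illustration and is not part of
OpenAFS):

```python
import socket
import struct


def addr_hex_forms(dotted):
    """Return (big_endian, little_endian) integer views of an IPv4 address.

    Hypothetical helper for checking a log line; not an OpenAFS function.
    """
    packed = socket.inet_aton(dotted)          # 4 bytes, network (big-endian) order
    big = struct.unpack(">I", packed)[0]       # bytes read most-significant first
    little = struct.unpack("<I", packed)[0]    # same bytes read least-significant first
    return big, little


if __name__ == "__main__":
    big, little = addr_hex_forms("129.89.143.70")
    print(hex(little), "or", hex(big))
```

Both values in the log match this computation, so the address the
fileserver reports for itself looks consistent with 129.89.143.70.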

While the release operation is hanging, the VolserLog says:

Fri May 26 06:56:07 2006 Starting AFS Volserver 2.0 (/usr/afs/bin/volserver)
Fri May 26 09:17:07 2006 trans 1 on volume 536870916 is older than 300 seconds
Fri May 26 09:17:37 2006 trans 1 on volume 536870916 is older than 330 seconds
Fri May 26 09:18:07 2006 trans 1 on volume 536870916 is older than 360 seconds
Fri May 26 09:18:37 2006 trans 1 on volume 536870916 is older than 390 seconds
...
Even after killing the release, it still prints out messages:
...
Fri May 26 09:48:08 2006 trans 1 on volume 536870916 is older than 2160 seconds
...
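
The "older than N seconds" lines can be used to estimate when the stuck
transaction began, by subtracting the reported age from the timestamp.
A rough Python sketch, assuming the line format is exactly as shown in
the VolserLog excerpt above and an English/C locale for the day and
month names (the function name is hypothetical):

```python
import re
from datetime import datetime, timedelta

# Pattern based only on the VolserLog lines quoted above; other
# volserver versions may format this message differently.
LINE_RE = re.compile(
    r"^(\w{3} \w{3} +\d+ \d\d:\d\d:\d\d \d{4}) "
    r"trans (\d+) on volume (\d+) is older than (\d+) seconds$"
)


def estimate_trans_start(line):
    """Return (trans_id, volume_id, estimated_start) or None if no match.

    The message says "older than", so the result is only an upper bound
    on the start time (the transaction began at or before it).
    """
    m = LINE_RE.match(line.strip())
    if m is None:
        return None
    stamp = datetime.strptime(m.group(1), "%a %b %d %H:%M:%S %Y")
    trans_id = int(m.group(2))
    volume_id = int(m.group(3))
    age = timedelta(seconds=int(m.group(4)))
    return trans_id, volume_id, stamp - age


if __name__ == "__main__":
    sample = ("Fri May 26 09:17:07 2006 trans 1 on volume 536870916 "
              "is older than 300 seconds")
    print(estimate_trans_start(sample))
```

Applied to the first and last lines quoted above, this puts the start of
transaction 1 at roughly 09:12:07, which would be when the hanging
release first contacted the volserver.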

The volserver refuses any other requests: rxdebug shows it's still
alive, but 'vos listvol' hangs, as does 'vos create'.  Stopping the 'fs'
instance hangs too:
(from bos status -long)
Instance fs, (type is fs) disabled, has core file, currently shutting down.
    Auxiliary status is: file server shutting down.
    Process last started at Fri May 26 06:56:04 2006 (2 proc starts)
    Last exit at Fri May 26 09:53:35 2006
    Command 1 is '/usr/afs/bin/fileserver'
    Command 2 is '/usr/afs/bin/volserver'
    Command 3 is '/usr/afs/bin/salvager'
(But the FileLog says:
Fri May 26 09:53:35 2006 Shutting down file server at Fri May 26 09:53:35 2006
Fri May 26 09:53:35 2006 Vice was last started at Fri May 26 06:56:04 2006

Fri May 26 09:53:35 2006 Large vnode cache, 400 entries, 0 allocs, 0 gets (0 reads), 0 writes
Fri May 26 09:53:35 2006 Small vnode cache,400 entries, 0 allocs, 0 gets (0 reads), 0 writes
Fri May 26 09:53:35 2006 Volume header cache, 400 entries, 0 gets, 0 replacements
Fri May 26 09:53:35 2006 Partition /vicepa: 418562775 available 1K blocks (minfree=4227906), Fri May 26 09:53:35 2006 418562766 free blocks
Fri May 26 09:53:35 2006 With 90 directory buffers; 0 reads resulted in 0 read I/Os
Fri May 26 09:53:35 2006 Total Client entries = 0, blocks = 0; Host entries = 0, blocks = 0
Fri May 26 09:53:35 2006 There are 0 connections, process size 137858
Fri May 26 09:53:35 2006 There are 0 workstations, 0 are active (req in < 15 mins), 0 marked "down"
Fri May 26 09:53:35 2006 VShutdown:  shutting down on-line volumes...
Fri May 26 09:53:35 2006 VShutdown:  complete.
Fri May 26 09:53:35 2006 File server has terminated normally at Fri May 26 09:53:35 2006
)
And the VolserLog has stopped adding lines.

Am I going to have to reboot to get the system to respond?

I don't see any indication of what the problem is.  Is this a problem
with the inode fileserver?  The entry from vfstab for the partition
is:

/dev/dsk/c0t600C0FF000000000098C96204C3F4A00d0s7        /dev/rdsk/c0t600C0FF000000000098C96204C3F4A00d0s7       /vicepa afs     3       yes     nologging

(The following rxdebug output was captured before I tried to stop the
fs process.)
% rxdebug jeremiah -port 7005
Trying 129.89.143.70 (port 7005):
Free packets: 193, packet reclaims: 0, calls: 3, used FDs: 6
not waiting for packets.
0 calls waiting for a thread
8 threads are idle
Connection from host 129.89.38.129, port 48474, Cuid a15f602d/cd4f1f5c
  serial 70,  natMTU 1444, flags pktCksum, security index 2, server conn
  rxkad: level clear, flags authenticated pktCksum, expires in 23.4 hours
  Received 16 bytes in 1 packets
  Sent 0 bytes in 0 packets
    call 0: # 1, state active, mode: receiving, flags: receive_done
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Connection from host 129.89.38.129, port 48583, Cuid 85144430/47e481e8
  serial 5,  natMTU 1444, flags pktCksum, security index 2, server conn
  rxkad: level clear, flags authenticated pktCksum, expires in 23.5 hours
  Received 4 bytes in 1 packets
  Sent 0 bytes in 0 packets
    call 0: # 1, state active, mode: error
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Connection from host 129.89.38.129, port 48520, Cuid 8a59d408/a9360290
  serial 28,  natMTU 1444, flags pktCksum, security index 2, server conn
  rxkad: level clear, flags authenticated pktCksum, expires in 23.5 hours
  Received 4 bytes in 1 packets
  Sent 0 bytes in 0 packets
    call 0: # 1, state active, mode: error
    call 1: # 0, state not initialized
    call 2: # 0, state not initialized
    call 3: # 0, state not initialized
Done.
% rxdebug jeremiah -port 7005 -version
Trying 129.89.143.70 (port 7005):
AFS version:  OpenAFS 1.4.1 built  2006-04-12 

Regards,
John