[OpenAFS] Weird AFS fileserver problem

Brian Sebby sebby@anl.gov
Thu, 8 Feb 2007 17:45:54 -0600


I've got a problematic volume on one of my file servers.  When I went to the
read-write version of the volume, I could see the data without a problem.
When I went to the read-only copy, however, I saw the following:

sebby:/usr/afsws/bin% ls
./CONFIG: Error 112

I did a vos remsite on the volumes, and when only the read-write was left,
I could see all the data.

I then did a vos listvol and saw that the .readonly volume was still there.
I tried to zap it, and it gave an error.  Now, when I try to do a "vos
listvol" or "vos examine" command on any volumes on that server, I see this:

sebby:~% vos listvol antenor.ctd.anl.gov a
Could not fetch the list of partitions from the server
Possible communication failure
Possible communication failure

sebby:~% vos exa sun4x_510.usr.DE142Nmi
Could not fetch the information about volume 1818570311 from the server
Possible communication failure
Error in vos examine command.
Possible communication failure

Dump only information from VLDB

sun4x_510.usr.DE142Nmi 
    RWrite: 1818570311    Backup: 1818570313
    number of sites -> 1
       server antenor.ctd.anl.gov partition /vicepa RW Site 


I plan to shut down the file server late tonight and do a full fsck/salvage
on the partition (we're still running the inode server).  I just wondered if
anyone had seen this before, and if anyone had any suggestions/comments on
what could be causing this.


Thanks,

Brian

-- 
Brian Sebby  (sebby@anl.gov)  |  Unix and Operation Services
Phone: +1 630.252.9935        |  Computing and Information Systems
Fax:   +1 630.252.4601        |  Argonne National Laboratory