[OpenAFS] "No Such Device" error on newly created and mounted volume

J. Maynard Gelinas gelinas@MIT.EDU
Wed, 29 Jul 2009 17:21:49 -0400


We're in the process of transitioning from OpenAFS 1.2.11 to 1.4.7.  
Both systems are running Debian, though the earlier server runs Debian
3 while the new one runs Debian 5. Many of these new server hosts are
actually Xen instances, though not all on the same physical server. I  
assume that AFS servers running under Xen should be perfectly OK.

New db and file servers running 1.4.7 are in place. I've migrated all  
volumes to the new server as well as started two secondary db servers  
running 1.4.7. The old main db and file server (lowest IP) and a
secondary backup server (with an LTO tape drive, running an empty file
server) are both still on 1.2.11.

Everything worked just great for a week or so. But now any AFS
administrative operation is tremendously slow. A vos create takes
minutes to create the volume. Nightly backups now take hours to
complete when they used to be done within twenty minutes or so. And
most disturbingly, when I mount a volume I successfully created, the
mount reports success, but if I then try to cd into the volume or
access it in any way I get a "No such device" error:

fs3:/afs/.lns.mit.edu/user# vos listvol afs3 vicepj
Total number of volumes on server afs3 partition /vicepj: 66
test                              536876681 RW          2 K On-line
[...]

afs3:/afs/.lns.mit.edu/public# fs mkmount test test
afs3:/afs/.lns.mit.edu/public# ls test
ls: cannot access test: No such device
afs3:/afs/.lns.mit.edu/public#

afsdbserv1:/var/lib/openafs/db# udebug afs2 7002
Host's addresses are: ***.***.***.134
Host's ***.***.***.134 time is Wed Jul 29 17:18:06 2009
Local time is Wed Jul 29 17:18:09 2009 (time differential 3 secs)
Last yes vote for ***.***.***.134 was 3 secs ago (sync site);
Last vote started 3 secs ago (at Wed Jul 29 17:18:06 2009)
Local db version is 1248882567.2
I am sync site until 56 secs from now (at Wed Jul 29 17:19:05 2009) (3 servers)
Recovery state 1f
Sync site's db version is 1248882567.2
0 locked pages, 0 of them for write
Last time a new db version was labelled was:
	 19719 secs ago (at Wed Jul 29 11:49:30 2009)

Server (***.***.***.218): (db 1248882567.2)
     last vote rcvd 4 secs ago (at Wed Jul 29 17:18:05 2009),
     last beacon sent 3 secs ago (at Wed Jul 29 17:18:06 2009), last vote was yes
     dbcurrent=1, up=1 beaconSince=1

Server (***.***.***.217): (db 1248882567.2)
     last vote rcvd 4 secs ago (at Wed Jul 29 17:18:05 2009),
     last beacon sent 3 secs ago (at Wed Jul 29 17:18:06 2009), last vote was yes
     dbcurrent=1, up=1 beaconSince=1

Port 7003 says pretty much the same thing.
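
To compare udebug output across ports or hosts without reading it by
eye, the fields that matter can be pulled out with a short script. This
is only a sketch: the function name summarize_udebug and the returned
keys are hypothetical, and the patterns assume the output looks like
the paste above (they match the phrases "Local db version is",
"Sync site's db version is", "time differential N secs" and
"I am sync site" exactly as they appear there).

```python
import re

def summarize_udebug(text):
    """Pull db versions, sync-site status and clock skew out of udebug text.

    Hypothetical helper: it assumes the output format shown in the paste
    above and does not run udebug itself.
    """
    local = re.search(r"Local db version is (\S+)", text)
    sync = re.search(r"Sync site's db version is (\S+)", text)
    skew = re.search(r"time differential (-?\d+) secs", text)
    # One (address, version) pair per "Server (addr): (db version)" line.
    remote = re.findall(r"Server \((\S+?)\): \(db (\S+?)\)", text)

    # Collect every db version mentioned; one distinct value means agreement.
    versions = {version for _, version in remote}
    if local:
        versions.add(local.group(1))
    if sync:
        versions.add(sync.group(1))

    return {
        "local_version": local.group(1) if local else None,
        "is_sync_site": "I am sync site" in text,
        "skew_secs": int(skew.group(1)) if skew else None,
        "remote_versions": dict(remote),
        "consistent": len(versions) == 1,
    }
```

Passing the saved text of "udebug afs2 7002" and "udebug afs2 7003" to
this function and comparing the two results shows whether every server
reports the same db version and how large the clock differential is.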

Is my problem the version mismatch between OpenAFS 1.2.11 and 1.4.7,
or do I have a deeper problem going on here?

Thanks a bunch for any suggestions.

J. Maynard Gelinas
Computer Services Manager
24-030d
617-253-5222
gelinas@mit.edu