[OpenAFS-port-darwin] Mystery Problem with directories on Tiger

Garance A Drosihn drosih@rpi.edu
Sat, 10 Mar 2007 22:49:26 -0500


At 1:32 PM -0500 3/10/07, Derrick J Brashear wrote:
>1) what's in syslog.log?
>2) does it happen with 1.4.2?

I have not been able to reproduce the problem on my Mac-mini yet,
but I did have some other fun adventures.

Right now I seem to have an invocation of bash where the problem
does not go away, which has made it easier to investigate.  I found
that what fails is the system routine getcwd().  The code:

	errno = 0;
	cwdres = getcwd(bigbuf, BIGBUFLEN);
	if (cwdres == 0) {
		perror("call to getcwd() failed");
		return (1);
	}
	printf("cur dir = %s\n", cwdres);

will print out:
      call to getcwd() failed: No such file or directory

if I run it from an AFS directory where I'm having the problem.
It will print out the correct "cur dir = ..." if I run it after
cd-ing to some other directory.

So then I wrote a ruby script which would take some starting
directory, and find each AFS volume underneath that directory by
doing a lot of 'fs lq' commands.  For each AFS volume found, the
script does a Dir.chdir to that directory, tries a Dir.getwd,
and catches any errors that come up.  This shows me which AFS
volumes are having the problem.
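
A minimal sketch of a script along those lines, for anyone who wants
to try the same check.  This is not the original script.  The method
names (volume_name, getwd_error, failing_volume_roots) are made up for
illustration, and the parsing assumes that 'fs lq DIR' prints a header
line followed by one data line whose first field is the volume name.

```ruby
require 'find'
require 'open3'

# Assumption: `fs lq DIR` prints a header line and then one data line whose
# first field is the volume name. Returns nil when DIR is not in AFS, the
# command fails, or `fs` is not installed.
def volume_name(dir)
  out, _err, status = Open3.capture3('fs', 'lq', dir)
  return nil unless status.success?
  data = out.lines[1]
  data && data.split.first
rescue SystemCallError
  nil
end

# chdir into dir and ask for the working directory.
# Returns nil on success, or the exception that was raised.
def getwd_error(dir)
  Dir.chdir(dir) { Dir.getwd }
  nil
rescue SystemCallError => e
  e
end

# Walk the tree under start. The first directory seen in each volume
# (its mount point, since the walk is top-down) gets the chdir/getwd check.
# Returns a hash of { directory => exception } for the checks that failed.
# `lookup` is a hypothetical hook so the volume test can be swapped out.
def failing_volume_roots(start, lookup: method(:volume_name))
  failures = {}
  seen = {}
  Find.find(start) do |path|
    next unless File.directory?(path) && !File.symlink?(path)
    vol = lookup.call(path)
    next if vol.nil? || seen[vol]
    seen[vol] = true
    err = getwd_error(path)
    failures[path] = err if err
  end
  failures
end
```

Calling failing_volume_roots with a starting directory and printing
each key and its exception message gives the list of problem volumes.
Like the script described above, it runs one 'fs lq' per directory,
so it is slow on a large tree.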

I ran the script on a reasonably large tree of directories in
our AFS cell (1817 directories), and sure enough it discovered that
Dir.getwd is failing in the two AFS volumes I'm working in.  But
it also fails in four other AFS volumes which I'm sure I haven't
touched since the last time I rebooted.  Out of the 43 volumes
which were checked by that run of the script, the problem is only
seen on those six volumes.

And then I noticed something even more bizarre...  While writing up
this script, I opened up a few more Terminal windows to test various
things.  It turns out that the script finds *no* problem volumes if
I run it with the exact same parameters from a different session of
bash!  I can even do the simple test of cd'ing to the same directory
in two different windows.  /bin/pwd works fine in one window, and
still fails in my original session!
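
To make that two-window comparison repeatable, something like the
following could be run in each session and the results compared.  It
is only a sketch: session_report is a made-up name, and it assumes
/bin/pwd exists at that path.

```ruby
require 'open3'

# Report how the working-directory lookup behaves for dir, both inside this
# Ruby process and in a /bin/pwd child started in that directory.
# The pid and ppid are included so reports from different shells can be
# told apart.
def session_report(dir)
  getwd = begin
    Dir.chdir(dir) { Dir.getwd }
  rescue SystemCallError => e
    "FAILED: #{e.message}"
  end
  bin_pwd = begin
    out, status = Open3.capture2e('/bin/pwd', chdir: dir)
    status.success? ? out.strip : "FAILED: #{out.strip}"
  rescue SystemCallError => e
    "FAILED: #{e.message}"
  end
  { pid: Process.pid, ppid: Process.ppid, dir: dir,
    getwd: getwd, bin_pwd: bin_pwd }
end
```

Printing session_report for the same directory from the good window
and from the bad one shows whether both lookups fail together in a
given session.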

It could be that the problem I'm seeing this time is different from
the problems I've seen in the past.  This problem bash session has
remained screwed up for several hours now, and I've never seen that
before.  And on at least some of the occasions when this has come up,
I know I've tried the same 'cd ... ; open ...' combination in multiple
windows, and either it worked in all windows or it failed in all
windows.

I installed OpenAFS 1.5.15 about 3 weeks ago, and have not
rebooted my machine in the past two weeks.  For my next step, I'll
install 1.4.2, reboot, and see if the problem shows up again.  If it
does, I can continue investigating with the simple getcwd() program
and the more elaborate ruby script.

>3) does cmdebug have anything to say?

cmdebug `hostname`

has nothing to say.


-- 
Garance Alistair Drosehn            =   gad@gilead.netel.rpi.edu
Senior Systems Programmer           or  gad@freebsd.org
Rensselaer Polytechnic Institute    or  drosih@rpi.edu