[OpenAFS-devel] CopyOnWrite failure

Derrick J Brashear shadow@dementia.org
Tue, 12 Mar 2002 23:46:37 -0500 (EST)


On Tue, 5 Mar 2002, Derrick J Brashear wrote:

> On Tue, 5 Mar 2002, Matthew N. Andrews wrote:
> 
> > does anyone have any suggestions about where I might proceed with
> > respect to tracking this down, and fixing it?
> 
> Due to other things which have arisen, I suspect the pthread fileserver. I
> can't prove it and I'm not sure how to narrow it down other than "try the
> lwp fileserver for whatever window necessary to prove it doesn't happen,
> then start debugging"

More thoughts. Clearly I was (somewhat) wrong. If it were merely pthreads,
we'd see it on Solaris. As far as I know, we don't. So I'll narrow my
theory to the namei fileserver plus pthreads, or just the namei fileserver.

This can be narrowed down in two ways:
-trying an lwp fileserver on linux (it gets built but not installed)
-trying a namei fileserver on solaris

As yet I haven't seen this problem on my linux machine, so I'd need
some help tracking it down, and this week I have other
(non-computing-related) problems anyway. But, thanks to a donation from
MIT to replace virtue.openafs.org, the current virtue hardware should be
available for the latter test as soon as I have time to set up the new
machine.

Some questions for those of you with this problem:
-only with linux servers?
-always with non-replicated volumes that have a .backup?
-if the above, was the backup being recreated at the time? (the VolserLog
 may be helpful here, as well as the vos examine info)
-what if anything pertinent about access patterns?
-can you try an lwp fileserver for some period of time on your hardware?

-D