[OpenAFS] AFS CopyOnWrite problem

Derrick J Brashear shadow@dementia.org
Fri, 19 Mar 2004 00:19:30 -0500 (EST)


On Thu, 18 Mar 2004, Marc Santoro wrote:

> I spent some time with 'strace' and judicious use of grep and the source
> and found a pseudo-resolution to the problem.
>
> Apparently, for some reason, AFS is trying to create new files over
> existing ones; I stuck a strace on all the fileserver processes, and
> grep'd for EEXIST, which was the errno=17 reported in the log. Every time
> I tried to create/remove/rename a file in the directory, I'd see something
> like this pop up in the strace output:
>
> open("/vicepa/AFSIDat/4/4+++U/+/+/T5++2kg", O_RDWR|O_CREAT|O_TRUNC|O_EXCL,
> 0) = -1 EEXIST (File exists).

the archives of this list have references to this happening before. we
fixed a few bugs that could cause files to be orphaned like this.

> Upon moving all files that showed this error (there were only three), all
> symptoms went away. One of these files contained the contents of a PINE
> debug file; another was a binary file that looked like a directory (?).
> There seemed to be no data loss after moving the files, however. No
> problems appeared when I salvaged the partition.
>
> I don't know how this happened; maybe AFS thought those file names were
> good for new vnodes and tried to use them, and just punted when those file
> names turned out to be in use. So, crud left around, no longer referenced,
> but still occupying files? I don't know why the salvage operation wouldn't
> catch that, though . . .

because a salvage wouldn't know they were there, since the volume didn't
reference them. perhaps, though, it should be made to do something.

> A kludge solution might be to check for the existence of a file before it
> is opened with O_EXCL, and move it out of the way if it is there.

well, technically it *is* an error condition, so i don't think that's the
right answer. changing the salvager to deal with it is probably more "right".
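
for illustration only, the check-and-move kludge suggested above might look
roughly like the C sketch below. the helper name and the ".orphan" suffix are
made up for this example, and nothing like this is in the fileserver. it only
shows the mechanics of O_EXCL failing with EEXIST and the stale file being
renamed aside before a single retry.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper, not OpenAFS code.
 * Try to create `path` exclusively. If the name is already taken (EEXIST),
 * rename the existing file to "<path>.orphan" and retry once.
 * Note: rename() replaces any earlier "<path>.orphan".
 * Returns an open fd, or -1 with errno set.
 */
static int create_exclusive_moving_stale(const char *path, mode_t mode)
{
    int flags = O_RDWR | O_CREAT | O_TRUNC | O_EXCL;
    int fd = open(path, flags, mode);

    /* Success, or a failure other than "file exists": nothing to work around. */
    if (fd >= 0 || errno != EEXIST)
        return fd;

    char aside[4096];
    int n = snprintf(aside, sizeof(aside), "%s.orphan", path);
    if (n < 0 || (size_t)n >= sizeof(aside)) {
        errno = ENAMETOOLONG;
        return -1;
    }

    /* Move the unreferenced file out of the way, keeping its contents. */
    if (rename(path, aside) != 0)
        return -1;

    /* Retry once; O_EXCL still protects against a concurrent creator. */
    return open(path, flags, mode);
}
```

this hides the error rather than explaining how the file got orphaned, which
is why teaching the salvager about such files seems like the better place for
a fix.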