[OpenAFS-devel] mount-point inode-number inconsistencies with
openafs-1.4.1
Jeffrey Hutzelman
jhutz@cmu.edu
Thu, 01 Jun 2006 11:34:19 -0400
On Thursday, June 01, 2006 04:36:21 PM +0200 Alexander Bergolth
<leo@strike.wu-wien.ac.at> wrote:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> On 05/30/2006 07:42 PM, chas williams - CONTRACTOR wrote:
>> In message <447C75A0.60003@strike.wu-wien.ac.at>,Alexander Bergolth
>> writes:
>>
>>> When determining the inode-numbers of the mount-points and using
>>> relative path names, different inode numbers are shown when called for
>>> the first time. On subsequent calls the same inode numbers are shown (!)
>>> until I do a "pwd", then the behavior is reset and the next call prints
>>> different inode numbers again:
>>
>> linux doesnt handle having a directory inode mounted twice in the
>> same filesystem very well. the linux vfs operates on pathnames
>> more than inodes, so there needs to be only one dcache entry per
>> inode directory. since you have two paths to the same inode,
>> we need to pick which dcache entry to keep current. in 1.4.1
>> this is now the latest dcache entry (it cured a different bug
>> in the vfs filesystem) instead of the "first found" dcache entry.
>>
>> when a new dcache entry is chosen, the inode number is updated.
>> i believe the inode number is based on the mount point so this is
>> going to lead to different inode numbers.
>>
>>> -------------------- snip! --------------------
>>> $ ls -id1 . backup backup/backup
>>> 278020792 .
>>> 193265714 backup
>>> 193265714 backup/backup
>>
>> here you switch from the original path to a new path so
>> the inode number changed.
>
> Hmm - I didn't get it...
> Working directory is /afs/wu-wien.ac.at/home/edvz/skamrada and I'm
> referencing . backup and backup/backup, so it's 3 paths (and 3 mount
> points), isn't it?
>
> /afs/wu-wien.ac.at/home/edvz/skamrada/logs
> /afs/wu-wien.ac.at/home/edvz/skamrada/logs/backup
> /afs/wu-wien.ac.at/home/edvz/skamrada/logs/backup/backup

Not entirely. The inode numbers assigned to files and directories in AFS
are derived from the FID (volume, vnode, uniquifier) of each file or
directory. But volume roots are handled specially - in order for readdir
to work correctly on a directory containing mount points, the inode number
assigned to a volume root directory is based on the FID of the mount point
used to reach that volume, not that of the directory itself. Each time you
traverse a mount point, the inode number for the resulting volume root
directory changes to reflect the FID of the mount point you just used to
reach it.

In your example, there are actually only two mount points, not three,
because logs/backup and logs/backup/backup are the same mount point. They
are both the entry with the name 'backup' in the root directory of the
volume user.skamrada.log, and thus have the same FID. So, the inode number
is recomputed when you traverse logs/backup/backup, but it doesn't appear
to change because the computed value is the same.
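
The rule described above can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the Fid tuple, the helper names, the FID values, and in particular the mixing formula, which is a stand-in and not the one OpenAFS actually uses. The only point is that a volume root's number depends on the mount point used to reach it, so two paths through the same mount point agree.

```python
from collections import namedtuple

# An AFS file identifier: (volume, vnode, uniquifier).
Fid = namedtuple("Fid", ["volume", "vnode", "uniquifier"])


def inode_from_fid(fid):
    """Hypothetical FID-to-inode mapping; NOT the formula OpenAFS uses."""
    return ((fid.volume << 16) + fid.vnode) & 0xFFFFFFFF


def volume_root_inode(mount_point_fid):
    """A volume root reports the number derived from the mount point
    that was traversed to reach it, not from its own FID."""
    return inode_from_fid(mount_point_fid)


# Made-up FIDs, for illustration only.
mp_logs = Fid(volume=1000, vnode=7, uniquifier=1)    # the "logs" mount point
mp_backup = Fid(volume=2000, vnode=5, uniquifier=1)  # the entry named "backup"

via_logs = volume_root_inode(mp_logs)                # "."
via_backup = volume_root_inode(mp_backup)            # backup
via_backup_backup = volume_root_inode(mp_backup)     # backup/backup, same FID

print(via_logs, via_backup, via_backup_backup)
```

With these made-up values, backup and backup/backup print the same number because both go through the same mount point FID, while "." differs.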
>> referencing . still stays on the current path.

Yes, because that doesn't involve traversing a mount point.

>>> $ pwd
>>> /afs/wu-wien.ac.at/home/edvz/skamrada/logs
>>> $ ls -id1 . backup backup/backup
>>> 278020792 .
>>
>> ah, you found a "new" (different) path to the same volume.
>> we switch back. but again, you reference the other path
>> and we switch the inode back.
>
> Why does pwd cause this?

Because of the way it works, which is something like this:

(0) start with an empty path
(1) stat "." to find out its inode number
(2) readdir ".." looking for an entry with a matching number
    - if no entry is found, give up
    - if the entry is named ".", we are done
    - otherwise, add the name we found to the front of the path
(3) chdir to ".."
(4) repeat
(5) when done, chdir back to the original directory

One side-effect of this is to traverse the "real" mount point, which gets
you the real inode number again. It also results in the volume moving
around in the dentry tree, which can confuse the pwd algorithm into
failing, if someone else is using that volume in a way that prevents the
dentry from being moved at the right time.
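
The walk-up steps listed above can be sketched in Python. This is a rough, Unix-only illustration and not the coreutils or glibc code. It departs from the steps in two labeled ways. First, os.scandir() never returns ".", so the root is detected by checking whether "." and ".." are the same directory. Second, entries are matched by lstat() results (device and inode) rather than by the number readdir reports, so it will not reproduce the AFS behavior discussed here. The function name and the max_depth guard are hypothetical.

```python
import os


def getcwd_by_walking_up(max_depth=256):
    """Rebuild the current directory's path by climbing through '..'."""
    start_fd = os.open(".", os.O_RDONLY)  # remembered for step 5
    parts = []                            # step 0: start with an empty path
    try:
        for _ in range(max_depth):
            here = os.stat(".")           # step 1: identify "."
            up = os.stat("..")
            if (here.st_dev, here.st_ino) == (up.st_dev, up.st_ino):
                # "." and ".." are the same directory: we reached the root.
                return "/" + "/".join(reversed(parts))
            name = None
            with os.scandir("..") as entries:  # step 2: search the parent
                for entry in entries:
                    try:
                        st = entry.stat(follow_symlinks=False)
                    except OSError:
                        continue          # skip entries we cannot stat
                    if (st.st_dev, st.st_ino) == (here.st_dev, here.st_ino):
                        name = entry.name
                        break
            if name is None:
                raise FileNotFoundError("no entry in '..' matches '.'")
            parts.append(name)            # collected leaf-first, reversed later
            os.chdir("..")                # step 3, then repeat (step 4)
        raise RuntimeError("gave up after max_depth levels")
    finally:
        os.fchdir(start_fd)               # step 5: return to where we started
        os.close(start_fd)
```

On an ordinary local filesystem the result should agree with os.path.realpath(os.getcwd()). The sketch makes the fragility visible: if the number reported for "." changes between the stat and the search of "..", no entry matches and the walk fails.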
Now, you didn't see this on previous Linux releases because FC5 introduced
a new version of coreutils (the package that provides the 'pwd' program),
and that version has a bug. The bug is that coreutils uses its own getcwd() routine
which does something like what is described above, instead of using the one
provided by the system library. It's unclear to me why they did this;
perhaps the upstream maintainers thought their approach was more efficient,
or perhaps it was just an oversight that occurred in the process of copying
the getcwd() code from glibc (which IMHO was pretty dumb, since it
essentially means the two pieces of code will now be maintained
independently).

The reason this is important is that on Linux, the getcwd() provided by
glibc is implemented by making a system call, which walks upward along the
dentry tree collecting the names of each entry. Not only is this method
considerably more efficient than the algorithm described above, it also
always produces correct results.

> The problem is that we are not able to control what users are doing. Of
> course this mount-point loops are not desirable but the problem is that
> one user may add such a mount-point and render other user's applications
> that traverse the filesystem (like find) unusable.

It is generally not a safe idea to use 'find' to traverse parts of the
filesystem where other people might do dangerous or malicious things. For
example, using find to manipulate ACLs on space managed by someone else is
dangerous, because they might insert a mount point which causes you to
traverse into a volume you didn't intend to change.

-- Jeffrey T. Hutzelman (N3NHS) <jhutz+@cmu.edu>
Sr. Research Systems Programmer
School of Computer Science - Research Computing Facility
Carnegie Mellon University - Pittsburgh, PA