[OpenAFS-devel] openafs can crash in linux symlink code on kernels prior to 2.6.13? (RT #56542)

Christopher Allen Wing wingc@engin.umich.edu
Fri, 16 Mar 2007 15:14:53 -0400 (EDT)


I opened up a bug ticket for this (RT #56542); I am sending it to 
openafs-devel because I'd appreciate some more eyes on this to confirm 
that my analysis is correct here.

----------------
We recently saw a kernel crash on a machine running RHEL4, which hit the 
following BUG() check in (linux)/fs/namei.c:

	void page_put_link(struct dentry *dentry, struct nameidata *nd)
	{
		if (!IS_ERR(nd_get_link(nd))) {
			struct page *page;
			page = find_get_page(dentry->d_inode->i_mapping, 0);
			if (!page)
				BUG();



I did some research, and it appears that the symlink caching API was changed in 
the Linux kernel source on August 20, 2005, by the following commits:

-	"Fix nasty ncpfs symlink handling bug."
 	Linus Torvalds [Sat, 20 Aug 2005 01:02:56 +0000 (18:02 -0700)]
see:
 	http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=cc314eef0128a807e50fa03baf2d0abc0647952c

-	"[PATCH] Fix up symlink function pointers"
 	author	Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
 	Fri, 19 Aug 2005 23:17:39 +0000 (00:17 +0100)
 	committer	Linus Torvalds <torvalds@g5.osdl.org>
 	Sat, 20 Aug 2005 01:08:21 +0000 (18:08 -0700)
see:
 	http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=008b150a3c4d971cd65d02d107b8fcc860bc959c


My understanding of the discussion that preceded this change is that, before 
the above commits, the Linux kernel's symlink caching API:

 	page_follow_link_light(), page_put_link(), page_readlink(), etc.

was unsafe for a network filesystem to use.  Since OpenAFS uses this API, I 
believe that AFS is vulnerable to kernel crashes when running on kernels older 
than 2.6.13 (the first release incorporating the above changes).


I don't have a test case to reproduce the crash yet, but my understanding is 
that the kernel can crash while following symlinks during a pathname lookup: 
one thread can be following a symlink while another thread causes the page 
cache pages holding the symlink text to be evicted from memory.  
page_put_link() does not reuse the page that was pinned when the link was 
followed; it looks the page up again with find_get_page().  If the page was 
evicted in the meantime, that lookup returns NULL and the BUG() triggers.
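The failure sequence above can be replayed deterministically in a small userspace model.  This is a hypothetical sketch, not kernel code: struct page, struct mapping, model_find_get_page(), old_follow_link(), old_put_link() and replay_old_api() are stand-ins invented for illustration, the two threads are collapsed into one fixed ordering, and old_put_link() returns false at the point where page_put_link() would call BUG().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not kernel types: one cached page holding the
 * symlink text, and a one-slot "mapping" for page 0 of the inode. */
struct page {
	int refcount;
	char text[64];
};

struct mapping {
	struct page *slot;	/* NULL once the page has been evicted */
};

/* Models find_get_page(): look the page up and take a reference. */
static struct page *model_find_get_page(struct mapping *m)
{
	if (m->slot)
		m->slot->refcount++;
	return m->slot;
}

/* Models the pre-2.6.13 follow_link path: pin the page and expose its
 * text, but do not hand the page pointer back to put_link. */
static const char *old_follow_link(struct mapping *m)
{
	struct page *p = model_find_get_page(m);
	return p ? p->text : NULL;
}

/* Models page_put_link(): it must find the page again by lookup.
 * Returns false where the kernel would call BUG(). */
static bool old_put_link(struct mapping *m)
{
	struct page *p = model_find_get_page(m);
	if (!p)
		return false;
	p->refcount -= 2;	/* drop the lookup ref and the follow_link ref */
	return true;
}

/* Replays one lookup; if evict_in_between is set, "another thread"
 * evicts the page while the symlink is being followed. */
static bool replay_old_api(bool evict_in_between)
{
	struct page pg = { .refcount = 1, .text = "target/of/link" };	/* 1 = cache's ref */
	struct mapping m = { .slot = &pg };

	const char *link = old_follow_link(&m);
	assert(link != NULL);
	(void)link;	/* the link text would be walked here */

	if (evict_in_between) {
		m.slot = NULL;	/* page removed from the cache... */
		pg.refcount--;	/* ...which drops the cache's reference */
	}
	return old_put_link(&m);
}
```

Note that in the evicted case the page memory is still alive (follow_link's reference keeps it pinned); the problem is only that the second lookup can no longer find it.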

Apparently, the symlink caching API was only suitable for local filesystems 
prior to the change that went into 2.6.13.



Here is a link to the original discussion of this bug on linux-kernel back in 
2005:

 	Subject:    Kernel bug: Bad page state: related to generic symlink code and mmap
 	From:       Anton Altaparmakov <aia21@cam.ac.uk>
 	Date:       2005-08-19 11:14:48
 	Message-ID: 1124450088.2294.31.camel@imp.csi.cam.ac.uk

archived at:
 	http://www.uwsg.indiana.edu/hypermail/linux/kernel/0508.2/0858.html
or:
 	http://marc.info/?l=linux-kernel&m=112445020708392


I believe that, to be safe, OpenAFS must not use the page_*link() API on 
kernels that do not include the patch.  Since the patch changed the calling 
convention of the

 	inode_operations.follow_link()
 	inode_operations.put_link()

methods, it should be possible to detect it with an autoconf test.
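For comparison, from 2.6.13 on the prototypes in struct inode_operations are void *(*follow_link)(struct dentry *, struct nameidata *) and void (*put_link)(struct dentry *, struct nameidata *, void *): follow_link() returns a cookie that is handed back to put_link(), where older 2.6 kernels had int (*follow_link)(...) and a two-argument put_link().  Because a call to put_link() with three arguments will not compile against the older headers, such a call is one plausible (untested) basis for the autoconf probe.  The userspace model below, again using hypothetical stand-in types and helper names, sketches why the cookie helps: put_link() releases the page that follow_link() pinned instead of looking it up again, so eviction in between is harmless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins, not kernel types. */
struct page {
	int refcount;
	bool freed;
	char text[64];
};

struct mapping {
	struct page *slot;	/* NULL once the page has been evicted */
};

/* Models dropping a page reference; the last one frees the page. */
static void model_put_page(struct page *p)
{
	assert(p->refcount > 0);
	if (--p->refcount == 0)
		p->freed = true;
}

/* Models the 2.6.13-style follow_link: pin the page, expose its text,
 * and return the page itself as the cookie. */
static void *new_follow_link(struct mapping *m, const char **link)
{
	struct page *p = m->slot;

	assert(p != NULL);	/* model assumes the page is cached */
	p->refcount++;
	*link = p->text;
	return p;
}

/* Models the 2.6.13-style put_link: release exactly the page that
 * follow_link pinned, with no second page-cache lookup. */
static void new_put_link(void *cookie)
{
	if (cookie)
		model_put_page(cookie);
}

/* Models eviction: the cache forgets the page and drops its reference. */
static void model_evict(struct mapping *m)
{
	struct page *p = m->slot;

	m->slot = NULL;
	if (p)
		model_put_page(p);
}

/* Returns true if the link text stayed valid until put_link and the
 * page was freed exactly when its last reference went away. */
static bool replay_new_api(bool evict_in_between)
{
	struct page pg = { .refcount = 1, .freed = false, .text = "target/of/link" };
	struct mapping m = { .slot = &pg };
	const char *link;

	void *cookie = new_follow_link(&m, &link);
	if (evict_in_between)
		model_evict(&m);

	bool text_ok = !pg.freed && link == pg.text;
	new_put_link(cookie);

	return evict_in_between ? (text_ok && pg.freed)
				: (text_ok && !pg.freed && pg.refcount == 1);
}
```

In the evicted case the page outlives its removal from the cache and is released only by put_link(), which is the property the old interface could not express.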




To fix things up on old kernels, I can think of two options:

 	1. Disable symlink caching with the page_*link() API on unpatched
 	   kernels.  OpenAFS already has code which might be made to work:

 		src/afs/LINUX/osi_vnodeops.c::
 			afs_linux_readlink()
 			afs_linux_follow_link()

 	   but these methods were written for pre-2.4 kernels, so they'd
 	   need updating.


 	2. Prior to the patch that went into linux-2.6.13, the NFS client
 	   code in the Linux kernel was using its own symlink caching
 	   code, which is why NFS was never affected by the bug.
 	   Adopting something similar to the old NFS code should fix
 	   OpenAFS; here is a patch from 2005 which should be suitable as
 	   a starting point:


 	Subject:    Re: Kernel bug: Bad page state: related to generic symlink code and mmap
 	From:       Al Viro <viro@parcelfarce.linux.theplanet.co.uk>
 	Date:       2005-08-19 18:02:18
 	Message-ID: 20050819180218.GE29811@parcelfarce.linux.theplanet.co.uk

archived at:
 	http://www.uwsg.indiana.edu/hypermail/linux/kernel/0508.2/0923.html
or:
 	http://marc.info/?l=linux-kernel&m=112447444702991
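Both options come down to not depending on the page cache between follow_link() and put_link().  A minimal userspace sketch of the simplest variant, assuming the link target is copied into a buffer owned by the lookup and freed in put_link(): kernel code would use kmalloc()/kfree() and fetch the text from the AFS cache, and copy_follow_link(), copy_put_link() and replay_private_copy() are hypothetical names.  This is neither the old NFS code nor the existing afs_linux_follow_link().

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins, not kernel types. */
struct page {
	char text[64];
};

struct mapping {
	struct page *slot;	/* NULL once the page has been evicted */
};

/* follow_link variant that copies the link target into a buffer owned
 * by this lookup (kmalloc() in the kernel). */
static char *copy_follow_link(const struct mapping *m)
{
	assert(m->slot != NULL);	/* model assumes the text is cached */

	size_t len = strlen(m->slot->text);
	char *buf = malloc(len + 1);
	if (buf)
		memcpy(buf, m->slot->text, len + 1);
	return buf;
}

/* put_link counterpart: free the private copy (kfree() in the kernel). */
static void copy_put_link(char *buf)
{
	free(buf);
}

/* Evicts and overwrites the page mid-lookup; returns true if the lookup
 * still saw the original link target. */
static bool replay_private_copy(void)
{
	struct page pg = { .text = "target/of/link" };
	struct mapping m = { .slot = &pg };

	char *link = copy_follow_link(&m);
	if (!link)
		return false;

	m.slot = NULL;					/* page evicted... */
	memset(pg.text, 'X', sizeof pg.text - 1);	/* ...and its memory reused */

	bool ok = strcmp(link, "target/of/link") == 0;
	copy_put_link(link);
	return ok;
}
```

The trade-off is an extra allocation and copy per symlink traversal, in exchange for never touching the page cache again after follow_link() returns.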



I don't know whether this bug affects 2.4 kernels as well as 2.6 kernels prior 
to 2.6.13.


-Chris Wing
wingc@engin.umich.edu