[OpenAFS] rough timeline for 1.4.x
Derrick J Brashear
shadow@dementia.org
Fri, 10 Dec 2004 17:11:04 -0500 (EST)
On Fri, 10 Dec 2004, Jason McCormick wrote:
> * Inability to unmount /usr/vice/cache (or / if it's not a separate
> partition). This is 100% repeatable on all FC3 machines. The following
> steps will always create this problem:
>
> - Stop all processes and logout all users of AFS
> - Stop all AFS processes and unload libafs kernel module
> - lsof | grep -i afs reports nothing open
> - umount /usr/vice/cache
This implies that one of the "special" file opens is somehow being leaked
inside the kernel.
Is the sequence, e.g.,
umount /afs
afsd -shutdown
rmmod libafs
?
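
A minimal sketch of that sequence as a script, in case it helps to reproduce
the problem the same way on each machine. The helper names (run,
shutdown_afs_client) and the DRY_RUN variable are hypothetical and only for
illustration; the commands themselves are the ones listed above plus the
final umount from the report. By default it only prints what it would run.

```shell
#!/bin/sh
# Hypothetical helper, not part of OpenAFS: prints (DRY_RUN=1, the default)
# or executes the shutdown steps in the order discussed above.
DRY_RUN="${DRY_RUN:-1}"

run() {
    if [ "$DRY_RUN" = "1" ]; then
        # Dry run: show the command instead of executing it.
        echo "$*"
    else
        "$@"
    fi
}

shutdown_afs_client() {
    # Stop at the first step that fails so the failing step is obvious.
    run umount /afs &&
    run afsd -shutdown &&
    run rmmod libafs &&
    # Final step from the report: the umount that fails on FC3.
    run umount /usr/vice/cache
}

shutdown_afs_client
```

Running it with DRY_RUN=0 as root executes the steps for real; if the last
umount is the only one that fails, that matches the report.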
> * Accessing an AFS volume over our VPN results in an immediate kernel
> panic. The panic message reports many "Unable to handle kernel NULL
> pointer dereference at virtual address" errors followed by "Recursive die()
> failure, output suppressed" and "<0>Kernel panic - not syncing: Fatal
> exception in interrupt". This is present only on 1 of 2 laptops running
> FC3, but is 100% repeatable on the failing laptop.
No oops, I assume.
> * Copying large files (~450MB) into AFS from non-AFS partitions results
> in a kernel oops. The error reported is:
>
> rxi_Start: xmit list overflowed<1>Unable to handle kernel paging request
> at virtual address ffffffff
>
> This problem is also 100% repeatable. 'fs getcache' does not report that
> the cache is full. I've attached a file gti-largefile-copy-oops.txt that
> is the "soft" kernel oops.
Screams stack overflow, but the backtrace is nonsensical. Recompile module
with -fomit-frame-pointer?
> * Random cache consistency problems. A file will be present in the
> filesystem and viewable on other machines but not on the FC3 host. fs
> flush does not always solve this problem; however, another client operating
> on the same directory (e.g. touch hi) seems to unstick the client. We do
> have one test case that seems to always generate this problem, but it's not
> very portable for others to test, as it requires our internal package
> management software. Rudy Maceyko is going to test this with 1.3.75
> shortly.
OK. We fixed only one thing that might affect this, and I doubt that's the cause.