[OpenAFS] rough timeline for 1.4.x

Derrick J Brashear shadow@dementia.org
Fri, 10 Dec 2004 17:11:04 -0500 (EST)


On Fri, 10 Dec 2004, Jason McCormick wrote:

>  * Inability to unmount /usr/vice/cache (or / if it's not a separate
> partition).  This is 100% repeatable on all FC3 machines.  The following
> steps will always create this problem:
>
>      - Stop all processes and logout all users of AFS
>      - Stop all AFS processes and unload libafs kernel module
>      - lsof | grep -i afs reports nothing open
>      - umount /usr/vice/cache

This implies one of the "special" file opens is somehow being leaked
(inside the kernel).

Is the shutdown sequence you use e.g.
umount /afs
afsd -shutdown
rmmod libafs
?
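
For anyone trying to reproduce this, the order asked about above can be
written down as a small shell function. This is only a sketch: the names
afs_shutdown_sequence, run and DRY_RUN are made up for illustration and
are not part of OpenAFS. The commands themselves are just the ones quoted
in this thread, with the cache unmount from the report as the final step.

```shell
#!/bin/sh
# Sketch only: afs_shutdown_sequence, run and DRY_RUN are hypothetical
# names, not part of OpenAFS. The commands are those quoted in the thread.

run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

afs_shutdown_sequence() {
    # Stop at the first step that fails, so a later failure is not
    # confused with an earlier one.
    run umount /afs || return 1
    run afsd -shutdown || return 1
    run rmmod libafs || return 1
    # The step that fails in the report above.
    run umount /usr/vice/cache
}
```

Setting DRY_RUN=1 before calling afs_shutdown_sequence only prints the
four commands in order, which is a safe way to check the sequence before
running it as root.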

>  * Accessing an AFS volume over our VPN results in an immediate kernel
> panic.  The panic message reports many "Unable to handle kernel NULL
> pointer dereference at virtual address" errors followed by "Recursive die()
> failure, output suppressed" and "<0>Kernel panic - not syncing: Fatal
> exception in interrupt".  This is present only on 1 of 2 laptops running
> FC3, but is 100% repeatable on the failing laptop.

No oops, I assume.

>  * Copying large files (~450MB) into AFS from non-AFS partitions results
> in a kernel oops.  The error reported is:
>
>   rxi_Start: xmit list overflowed<1>Unable to handle kernel paging request
> at virtual address ffffffff
>
> This problem is also 100% repeatable.  'fs getcache' does not report that
> the cache is full.  I've attached a file gti-largefile-copy-oops.txt that
> is the "soft" kernel oops.

Screams stack overflow, but the backtrace is nonsensical. Recompile module 
with -fomit-frame-pointer?

>  * Random cache consistency problems.  A file will be present in the
> filesystem and viewable on other machines but not on the FC3 host.  fs
> flush does not always solve this problem; however, another client
> operating on the same directory (e.g. touch hi) seems to unstick the
> client.  We do have one test case that seems to always generate this
> problem, but it's not very portable for others to test as it requires
> our internal package management software.  Rudy Maceyko is going to test
> this with 1.3.75 shortly.

OK. We fixed only one thing that might affect this, and I doubt that fix
is the cause.