[OpenAFS] 1.4.0 Solaris 10 sparc client hang

Renata Maria Dart renata@slac.stanford.edu
Tue, 8 Nov 2005 10:05:29 -0800 (PST)

Hi, we are also seeing problems with Solaris 10 and the just-announced
1.4.0 OpenAFS binaries (at least we suspect it is with OpenAFS).  In
our case we have a system which we use for our cron jobs, one of which
is a "findtrash" operation that runs once a day.  It combs a portion
of our AFS space for garbage and files that should be expired.  Ever
since we switched this system over from Solaris 9 with OpenAFS 1.2.13
to Solaris 10 with OpenAFS 1.4.0 last week, the system has been falling
into a "hung" state every couple of days, the symptom being that it
stops running its various cron jobs.  If you try to log in at this
point it says there is no room to fork another process and you can't
get a shell prompt.  Yesterday morning the system was in such a state,
so we rebooted and monitored it with top.  It appeared stable for most
of the day until the findtrash run.  Findtrash usually takes an hour
and a half or so, and we noticed that the free real memory on the
machine started at 1.7 GB at the start of the findtrash process, and
ended at 1.07 GB when the process was complete.  The memory was never
recovered.  The machine is currently at 450 MB of free memory so it
will most likely be in need of a reboot soon.  Tonight's findtrash run
will almost certainly do it in.  Please let me know if there is
something I can run that will provide better information.  Would a
crash dump help?


>On Mon, 7 Nov 2005, Christopher D. Clausen wrote:
>> On 07Nov2005 10:45a Derrick J Brashear <shadow@dementia.org> wrote:
>>> On Mon, 7 Nov 2005, Christopher D. Clausen wrote:
>>>> The AFS client has hung on one of my AFS servers (E3000 running
>>>> Solaris 10.) It has the 1.4.0 binaries from the openafs.org website
>>>> installed. The client hung on a cp operation from afs to the local
>>>> disk.