[OpenAFS-devel] Linux deadlocks (possibly fixed in IBM-AFS)
Derrick J Brashear
shadow@dementia.org
Thu, 4 Jul 2002 01:43:05 -0400 (EDT)
On Wed, 3 Jul 2002, Broughton, Travis V wrote:
>
> We've been running into some bugs in 1.2.5 that are causing deadlocks and
> hangs on the Linux client. Unlike most AFS deadlocks I've seen, the system
> load average goes to zero rather than steadily increasing. We believe this
> behavior to have been fixed in the most recent IBM-AFS release, namely by
> the following deltas:
>
> srikanth-IY31752-afs3.6-race-condition-in-afs-buffer-cache 1.2
>
> Fix race condition in function afs_newslot(). This function is used
> to recycle buffers based on the buffer reference count and the
> buffer age. This function used to check the buffer reference count
> without locking it. The result was that buffers that were in use
> would also be recycled.
I believe this was actually a fileserver fix that came from me.
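For anyone comparing trees, here is a rough userspace sketch of what the delta text above describes: the reference-count test and the claim of the victim buffer both happen under one lock, so a buffer someone is still using can never be picked for recycling. It assumes a small fixed array of buffers. `struct buffer`, `newslot()`, `buffer_release()` and `buffer_lock` are hypothetical names made up for illustration; this is not the actual afs_newslot() code from OpenAFS or IBM AFS.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NBUFFERS 4

/* Hypothetical buffer descriptor; field names are illustrative only. */
struct buffer {
    int refcount;        /* number of active users of this buffer */
    unsigned long age;   /* last-access stamp; smaller means older */
};

static struct buffer buffers[NBUFFERS];
static pthread_mutex_t buffer_lock = PTHREAD_MUTEX_INITIALIZER;

/* Pick the oldest unreferenced buffer and claim it.
 * The refcount check and the claim are done while holding buffer_lock,
 * so no other thread can take a reference between "is it free?" and
 * "recycle it". Returns NULL if every buffer is in use. */
static struct buffer *newslot(unsigned long now)
{
    struct buffer *victim = NULL;

    pthread_mutex_lock(&buffer_lock);
    for (size_t i = 0; i < NBUFFERS; i++) {
        struct buffer *b = &buffers[i];
        if (b->refcount != 0)
            continue;                 /* in use: never recycle */
        if (victim == NULL || b->age < victim->age)
            victim = b;               /* oldest free buffer so far */
    }
    if (victim != NULL) {
        victim->refcount = 1;         /* claim it before dropping the lock */
        victim->age = now;
    }
    pthread_mutex_unlock(&buffer_lock);
    return victim;
}

/* Drop a reference, under the same lock newslot() uses. */
static void buffer_release(struct buffer *b)
{
    pthread_mutex_lock(&buffer_lock);
    assert(b->refcount > 0);
    b->refcount--;
    pthread_mutex_unlock(&buffer_lock);
}
```

The point is only that reading the refcount without the lock (the old behaviour the delta describes) lets another thread grab the buffer between the check and the reuse.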
> and
>
> srikanth-12885-afs3.6-race.condition.in.linux.event.handling 1.5
>
> Fix another race condition in the event handling code. This race is
> because the operation of dropping GLOCK and going to sleep is not
> atomic. This gives an opportunity for another thread to grab GLOCK
> and call wake_up before the first thread actually goes to sleep. The
> result is a lost wake up.
>
> The new code does not rely on sleep_on(). Instead it manually adds
> the thread to the wait queues and changes the process state to
> "sleeping" before it drops GLOCK. It then drops GLOCK and invokes
> the scheduler. Doing it this way allows us to control the process
> state more precisely and avoids this race condition.
And this one is probably covered by the latest go-around of patches, which
will be in OpenAFS 1.2.6.
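In the kernel, the pattern the delta describes is the usual add_wait_queue() / set_current_state() / drop lock / schedule() sequence. That can't be exercised outside a kernel, so below is a userspace analogue with pthreads, offered only as a sketch of the same idea: pthread_cond_wait() releases the lock and goes to sleep as one atomic step, and a "posted" flag records a wake-up that arrives before the waiter sleeps, so it is not lost. `glock`, `struct event`, `EVENT_INIT`, `event_wait()` and `event_signal()` are hypothetical stand-ins, not the OpenAFS event code.

```c
#include <pthread.h>
#include <stdbool.h>

/* Stand-in for GLOCK, the global AFS lock (illustrative only). */
static pthread_mutex_t glock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical event: a condition variable plus a "posted" flag,
 * both protected by glock. */
struct event {
    pthread_cond_t cond;
    bool posted;          /* set by the waker, consumed by the waiter */
};

#define EVENT_INIT { PTHREAD_COND_INITIALIZER, false }

/* Caller holds glock. Sleeps until the event has been posted.
 * pthread_cond_wait() drops glock and blocks atomically, so a waker
 * that takes glock afterwards cannot signal into a gap between
 * "unlock" and "sleep". The flag also covers a signal that was sent
 * before we got here at all. */
static void event_wait(struct event *ev)
{
    while (!ev->posted)
        pthread_cond_wait(&ev->cond, &glock);
    ev->posted = false;   /* consume the wake-up */
}

/* Caller holds glock. Records the wake-up, then signals a sleeper. */
static void event_signal(struct event *ev)
{
    ev->posted = true;
    pthread_cond_signal(&ev->cond);
}
```

A plain sleep_on() after dropping the lock has neither property: a wake_up() that lands in that window finds nobody on the queue, and the sleeper then waits forever, which would fit hangs where the load average drops instead of climbing.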
> Has anyone else run into these issues in OpenAFS? Have fixes analogous to
> the above been incorporated into OpenAFS? I can provide kdumps and other
> debug info if that would help to narrow down the source of the problem.