OpenAFS Master Repository branch, openafs-stable-1_6_9-branch, updated. openafs-stable-1_6_8-73-gca0be5b
Gerrit Code Review
gerrit@openafs.org
Thu, 12 Jun 2014 13:57:45 -0400
The following commit has been merged in the openafs-stable-1_6_9-branch branch:
commit bc8f62fcdfa479023d15125404d1b13b6dfd6dc3
Author: Jeffrey Altman <jaltman@your-file-system.com>
Date: Wed Jun 11 19:03:45 2014 -0400
Revert "viced: Avoid issuing redundant TMAY requests"
This reverts commit 03a9b481c7f27c462c9d65a756d172e79758b86d.
Andrew Deason wrote,
"Briefly, 'host' structures are allocated without clearing all of the
contents to '0'. Only part of the structure is cleared, according to the
HOST_TO_ZERO macro. Unfortunately I put the new tmay_ fields right below
the 'index' field for some reason, so this means they aren't zeroed and
can contain garbage. This means we can easily segfault in the fileserver
when we try to access the pointers in there.
"We access uninitialized memory for every 'host' that is allocated. So
the chance of us corrupting memory is the chance that a particular
pointer-sized area of memory from 'malloc' is not already NULL.
"That seems pretty likely, but it's not so frequent as to have the
fileserver effectively "constantly" crashing at the site that noticed.
So it has not been a fire drill, but it has been noticeable (we heard
about it I think yesterday, and got details today when it happened
again). The noticing incident was a segfault, but an abort or sigbus are
probably also likely.
"Of course, the chances of noticing go way up with more clients. I expect
the chances dramatically increase if you have more than 512 client hosts
hit the box, since the first block of 512 are allocated before we really
do anything. For the next 512, it seems much more likely that 'malloc'
will give us back non-zeroed data. But this is just theory.
"With the incident I know about, the crash happened semi-quickly after
the server started (a few minutes). But it seems likely to occur after
the server has been up for a long time, if/when you cross the next line
of 512 hosts.
"I am also concerned that this can easily be corrupting memory without
being noticed via a crash (or it takes a while to crash), since we are
potentially free'ing invalid pointers, or stomping over someone else's
memory, etc etc."
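The failure mode described above can be sketched in isolation. This is a minimal illustration, not the fileserver code: `struct demo_host`, `DEMO_HOST_TO_ZERO`, and the `demo_*` functions are hypothetical names, and the sketch assumes the partial clear covers only the bytes that precede 'index', which is how the message describes HOST_TO_ZERO without showing its definition.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the fileserver's 'host' structure. */
struct demo_host {
    unsigned int addr;   /* above 'index': cleared on init */
    unsigned int flags;  /* above 'index': cleared on init */
    int index;           /* the partial clear stops here */
    void *tmay_ptr;      /* below 'index': NOT cleared */
};

/* Assumed shape of the partial clear: only the bytes before 'index'. */
#define DEMO_HOST_TO_ZERO offsetof(struct demo_host, index)

/* Buggy pattern: fields after 'index' keep whatever the memory held. */
static void demo_host_init(struct demo_host *h, int index)
{
    assert(h != NULL);
    memset(h, 0, DEMO_HOST_TO_ZERO);
    h->index = index;
}

/* One possible remedy (not what this commit does): clear everything. */
static void demo_host_init_fixed(struct demo_host *h, int index)
{
    assert(h != NULL);
    memset(h, 0, sizeof(*h));
    h->index = index;
}

/* Returns 1 if all n bytes at p are zero, 0 otherwise. */
static int demo_bytes_are_zero(const void *p, size_t n)
{
    const unsigned char *b = p;
    size_t i;

    for (i = 0; i < n; i++) {
        if (b[i] != 0)
            return 0;
    }
    return 1;
}
```

To see the effect deterministically, fill a `struct demo_host` with a non-zero byte pattern (standing in for recycled 'malloc' memory), call `demo_host_init`, and inspect `tmay_ptr` with `demo_bytes_are_zero`: the leading fields are cleared, but the pointer still holds the old bytes. The commit itself sidesteps the problem by reverting the change that added the tmay_ fields.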
Change-Id: I20bd40fc9df69247884099a0623e6db40908b3e8
src/viced/host.c | 243 +-----------------------------------------------------
src/viced/host.h | 6 --
2 files changed, 4 insertions(+), 245 deletions(-)
--
OpenAFS Master Repository