OpenAFS Master Repository branch, openafs-stable-1_6_9-branch, updated. openafs-stable-1_6_8-73-gca0be5b

Gerrit Code Review gerrit@openafs.org
Thu, 12 Jun 2014 13:57:45 -0400


The following commit has been merged in the openafs-stable-1_6_9-branch branch:
commit bc8f62fcdfa479023d15125404d1b13b6dfd6dc3
Author: Jeffrey Altman <jaltman@your-file-system.com>
Date:   Wed Jun 11 19:03:45 2014 -0400

    Revert "viced: Avoid issuing redundant TMAY requests"
    
    This reverts commit 03a9b481c7f27c462c9d65a756d172e79758b86d.
    
    Andrew Deason wrote:
    
      "Briefly, 'host' structures are allocated without clearing all of the
      contents to '0'. Only part of the structure is cleared, according to the
      HOST_TO_ZERO macro. Unfortunately I put the new tmay_ fields right below
      the 'index' field for some reason, so this means they aren't zeroed and
      can contain garbage. This means we can easily segfault in the fileserver
      when we try to access the pointers in there.
    
      "We access uninitialized memory for every 'host' that is allocated. So
      the chance of us corrupting memory is the chance that a particular
      pointer-sized area of memory from 'malloc' is not already NULL.
    
      "That seems pretty likely, but it's not so frequent as to have the
      fileserver effectively 'constantly' crashing at the site that noticed.
      So it has not been a fire drill, but it has been noticeable (we heard
      about it I think yesterday, and got details today when it happened
      again). The incident that drew attention was a segfault, but an abort
      or SIGBUS is probably also likely.
    
      "Of course, the chances of noticing go way up with more clients. I expect
      the chances dramatically increase if you have more than 512 client hosts
      hit the box, since the first block of 512 are allocated before we really
      do anything. For the next 512, it seems much more likely that 'malloc'
      will give us back non-zeroed data. But this is just theory.
    
      "With the incident I know about, the crash happened semi-quickly after
      the server started (a few minutes). But it seems likely to occur after
      the server has been up for a long time, if/when you cross the next line
      of 512 hosts.
    
      "I am also concerned that this can easily be corrupting memory without
      being noticed via a crash (or it takes a while to crash), since we are
      potentially freeing invalid pointers, stomping over someone else's
      memory, and so on."
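    The failure mode described above can be illustrated with a minimal C sketch of the partial-zeroing pattern. Everything in it is hypothetical: `struct host_sketch`, `HOST_TO_ZERO_SKETCH`, and the field layout are simplified stand-ins, not the actual definitions from src/viced/host.c or src/viced/host.h. The 0xAB fill only simulates the non-zeroed memory that malloc may return for a newly allocated block.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified stand-in for the fileserver's 'struct host';
 * the real fields and macro in src/viced/host.h are different. */
struct host_sketch {
    void *firstCallback;   /* before 'index': cleared by the partial memset */
    int   hostFlags;       /* before 'index': cleared */
    int   index;           /* boundary: clearing stops at this field */
    void *tmay_pending;    /* after 'index': NOT cleared */
};

/* Hypothetical analogue of HOST_TO_ZERO: the number of bytes from the start
 * of the structure up to, but not including, the 'index' field. */
#define HOST_TO_ZERO_SKETCH(h) ((size_t)((char *)&(h)->index - (char *)(h)))

/* Allocate a host the way the quoted message describes: only the leading
 * part of the structure is cleared. The 0xAB bytes simulate stale heap
 * contents that malloc may hand back. */
struct host_sketch *alloc_host_sketch(void)
{
    struct host_sketch *h = malloc(sizeof(*h));
    if (h == NULL)
        return NULL;
    memset(h, 0xAB, sizeof(*h));           /* simulated garbage */
    memset(h, 0, HOST_TO_ZERO_SKETCH(h));  /* partial clear only */
    return h;
}

/* Return 1 if every byte in [p, p + n) is zero. Inspecting raw bytes avoids
 * reading the garbage pointer as a pointer value. */
int bytes_all_zero(const void *p, size_t n)
{
    const unsigned char *b = p;
    for (size_t i = 0; i < n; i++)
        if (b[i] != 0)
            return 0;
    return 1;
}
```

In this sketch the fields placed before `index` come back zeroed, while `tmay_pending` still holds the garbage pattern, so code that later dereferences or frees it would be working with an invalid pointer. Placing such fields above the boundary field, or clearing the whole structure, would avoid the problem in the sketch; the commit above instead reverts the change that introduced the fields.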
    
    Change-Id: I20bd40fc9df69247884099a0623e6db40908b3e8

 src/viced/host.c |  243 +-----------------------------------------------------
 src/viced/host.h |    6 --
 2 files changed, 4 insertions(+), 245 deletions(-)

-- 
OpenAFS Master Repository