[OpenAFS] volserver crashing

Eric Chris Garrison ecgarris@iupui.edu
Tue, 12 Apr 2011 09:27:58 -0400


Hello,

I've recently upgraded all my servers to openafs-1.4.14-1.1.1 and things 
have been going fine for the past month, but I hit a snag yesterday.

Over the weekend, I used Russ's mvto (with a mod to make it run with
-localauth to avoid expiring tokens) to move 900+ volumes from one
server that I'm clearing off (rfsb1) to another, previously empty
server (rfsb2).  Those all moved fine, though a bit slowly (something
like 20-40 Mbit/s over GigE).

Then I moved on to the "project" volumes, which have a much higher 
quota.  One (383GB in size) seems to cause problems when I try to move 
it.   It moves a LOT faster (more like 300-400 Mbit/s), but at some 
point, the volserver on the receiving end crashes and all volume moves 
abort:

Apr 10 12:52:41 rfsb2 kernel: volserver[25425]: segfault at 000000009cf1fdd8 rip 0000003b9ce79a30 rsp 0000000042b42208 error 4
Apr 11 09:50:05 rfsb2 kernel: volserver[7230]: segfault at 000000009cf1fdd8 rip 0000003b9ce79a30 rsp 0000000043591208 error 4
Apr 11 11:38:50 rfsb2 kernel: volserver[23537]: segfault at 000000009cf1fdd8 rip 0000003b9ce79a30 rsp 0000000047c09208 error 4
Apr 11 14:53:09 rfsb2 kernel: volserver[25121]: segfault at 000000009cf1fdd8 rip 0000003b9ce79a30 rsp 0000000048df4208 error 4
Apr 11 16:21:53 rfsb2 kernel: volserver[27908]: segfault at 000000009cf1fdd8 rip 0000003b9ce79a30 rsp 00000000441a8208 error 4
Apr 11 17:49:33 rfsb2 kernel: volserver[29065]: segfault at 000000009cf1fdd8 rip 0000003b9ce79a30 rsp 0000000045801208 error

I don't yet know whether other large volumes will cause this problem.
The logs don't show the volserver crashing at any time before I first
attempted to move that large volume, and I believe every one of those
crashes corresponds to an attempt to move it.
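
For anyone wanting to check the pattern in the syslog entries quoted in
this message, here is a minimal sketch, assuming the kernel lines have
exactly the format quoted here. It groups the volserver segfaults by
fault address and instruction pointer, and decodes the x86 page-fault
error code (bit 0 = protection violation vs. page not present, bit 1 =
write vs. read, bit 2 = user vs. kernel mode). The function names are
made up for illustration; nothing here comes from OpenAFS itself.

```python
import re
from collections import Counter

# Matches the kernel segfault lines in the form quoted in this message.
# The trailing error code is optional because one quoted entry lacks it.
SEGFAULT_RE = re.compile(
    r"volserver\[(?P<pid>\d+)\]: segfault at (?P<addr>[0-9a-f]+) "
    r"rip (?P<rip>[0-9a-f]+) rsp (?P<rsp>[0-9a-f]+) error(?: (?P<err>\d+))?"
)


def summarize_segfaults(log_text):
    """Count crashes per (fault address, rip) pair (hypothetical helper)."""
    # Collapse all whitespace so entries wrapped across lines still match.
    flat = " ".join(log_text.split())
    return Counter((m["addr"], m["rip"]) for m in SEGFAULT_RE.finditer(flat))


def decode_x86_fault(err):
    """Decode the low three bits of an x86 page-fault error code."""
    return (
        "user mode" if err & 4 else "kernel mode",
        "write" if err & 2 else "read",
        "protection violation" if err & 1 else "page not present",
    )


if __name__ == "__main__":
    sample = """
    Apr 10 12:52:41 rfsb2 kernel: volserver[25425]: segfault at
    000000009cf1fdd8 rip 0000003b9ce79a30 rsp 0000000042b42208 error 4
    Apr 11 09:50:05 rfsb2 kernel: volserver[7230]: segfault at
    000000009cf1fdd8 rip 0000003b9ce79a30 rsp 0000000043591208 error 4
    """
    for (addr, rip), count in summarize_segfaults(sample).items():
        print(f"{count} crash(es) at address {addr}, rip {rip}")
    print(decode_x86_fault(4))
```

If every crash lands in a single (address, rip) group, as the quoted
entries suggest, the fault is happening at the same instruction each
time rather than at random.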

I did a vos dump of the volume just to see if it threw errors, in case 
it was corrupt, but it made no complaints and completed without anything 
crashing.

I'm running the volserver with -nojumbo and the fileserver with -L and 
-nojumbo.

What could be the problem here?

Thanks for any help,

Chris
--
Eric Chris Garrison
Indiana University
Research Storage

