[OpenAFS] volserver crashing
Eric Chris Garrison
ecgarris@iupui.edu
Tue, 12 Apr 2011 09:27:58 -0400
Hello,
I've recently upgraded all my servers to openafs-1.4.14-1.1.1 and things
have been going fine for the past month, but I hit a snag yesterday.
Over the weekend, I used Russ's mvto (modified to run with -localauth
so that tokens would not expire) to move 900+ volumes from a server that
I'm clearing off (rfsb1) to another, previously empty server (rfsb2).
Those all moved fine, though a bit slowly (something like 20-40 Mbit/s
over GigE).
Then I moved on to the "project" volumes, which have a much higher
quota. One (383GB in size) seems to cause problems when I try to move
it. It moves a LOT faster (more like 300-400 Mbit/s), but at some
point, the volserver on the receiving end crashes and all volume moves
abort:
Apr 10 12:52:41 rfsb2 kernel: volserver[25425]: segfault at
000000009cf1fdd8 rip 0000003b9ce79a30 rsp 0000000042b42208 error 4
Apr 11 09:50:05 rfsb2 kernel: volserver[7230]: segfault at
000000009cf1fdd8 rip 0000003b9ce79a30 rsp 0000000043591208 error 4
Apr 11 11:38:50 rfsb2 kernel: volserver[23537]: segfault at
000000009cf1fdd8 rip 0000003b9ce79a30 rsp 0000000047c09208 error 4
Apr 11 14:53:09 rfsb2 kernel: volserver[25121]: segfault at
000000009cf1fdd8 rip 0000003b9ce79a30 rsp 0000000048df4208 error 4
Apr 11 16:21:53 rfsb2 kernel: volserver[27908]: segfault at
000000009cf1fdd8 rip 0000003b9ce79a30 rsp 00000000441a8208 error 4
Apr 11 17:49:33 rfsb2 kernel: volserver[29065]: segfault at
000000009cf1fdd8 rip 0000003b9ce79a30 rsp 0000000045801208 error
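
To make the crash lines easier to compare, here is a minimal Python sketch that pulls the fields out of the kernel messages and decodes the page-fault error code. It is not part of OpenAFS or mvto. The names `parse_segfaults` and `decode_fault` are made up for illustration, and it assumes the x86-64 Linux message format shown above, with all values printed in hex. The bit meanings are the standard x86 page-fault error code bits.

```python
import re

# Two of the kernel lines from this message, wrapped as they were mailed.
SAMPLE = """\
Apr 10 12:52:41 rfsb2 kernel: volserver[25425]: segfault at
000000009cf1fdd8 rip 0000003b9ce79a30 rsp 0000000042b42208 error 4
Apr 11 09:50:05 rfsb2 kernel: volserver[7230]: segfault at
000000009cf1fdd8 rip 0000003b9ce79a30 rsp 0000000043591208 error 4
"""

# Assumed format: "<proc>[<pid>]: segfault at <addr> rip <rip> rsp <rsp> error <code>".
# The error code is optional so that a truncated line still parses.
SEGFAULT = re.compile(
    r"(?P<proc>\w+)\[(?P<pid>\d+)\]: segfault at\s+(?P<addr>[0-9a-f]+)"
    r"\s+rip\s+(?P<rip>[0-9a-f]+)\s+rsp\s+(?P<rsp>[0-9a-f]+)"
    r"\s+error(?:[ \t]+(?P<err>[0-9a-f]+))?"
)


def parse_segfaults(text):
    """Return one dict per segfault message found in text."""
    faults = []
    for m in SEGFAULT.finditer(text):
        faults.append({
            "proc": m["proc"],
            "pid": int(m["pid"]),
            "addr": int(m["addr"], 16),
            "rip": int(m["rip"], 16),
            "rsp": int(m["rsp"], 16),
            "err": int(m["err"], 16) if m["err"] else None,
        })
    return faults


def decode_fault(code):
    """Describe an x86 page-fault error code in words."""
    access = "write" if code & 0x2 else "read"
    if code & 0x10:
        access = "instruction fetch"
    mode = "user" if code & 0x4 else "kernel"
    cause = "protection violation" if code & 0x1 else "page not present"
    return f"{mode}-mode {access}, {cause}"


if __name__ == "__main__":
    faults = parse_segfaults(SAMPLE)
    # Identical addr and rip across crashes suggest the same code path each time.
    print("distinct fault addresses:", {hex(f["addr"]) for f in faults})
    print("distinct rip values:", {hex(f["rip"]) for f in faults})
    for f in faults:
        if f["err"] is not None:
            print(f["pid"], decode_fault(f["err"]))
```

Read this way, "error 4" is a user-mode read of an address that is not mapped, and every crash listed above faults on the same address at the same instruction pointer.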
I don't know yet whether other large volumes will cause this problem.
The logs don't show the volserver crashing at any time before I first
attempted to move that large volume, and I believe every one of those
crashes corresponds to a move attempt.
I did a vos dump of the volume just to see if it threw errors, in case
it was corrupt, but it made no complaints and completed without anything
crashing.
I'm running the volserver with -nojumbo and the fileserver with -L and
-nojumbo.
What could be the problem here?
Thanks for any help,
Chris
--
Eric Chris Garrison
Indiana University
Research Storage