[OpenAFS] Re: volserver crashing

Derrick Brashear shadow@gmail.com
Tue, 12 Apr 2011 12:42:12 -0400


On Tue, Apr 12, 2011 at 12:36 PM, Eric Chris Garrison
<ecgarris@iupui.edu> wrote:
> On 4/12/11 11:43 AM, Andrew Deason <adeason@sinenomine.net> wrote:
>>
>> On Tue, 12 Apr 2011 09:27:58 -0400
>> Eric Chris Garrison <ecgarris@iupui.edu> wrote:
>>>
>>> > I've recently upgraded all my servers to openafs-1.4.14-1.1.1 and
>>
>> Is this RHEL4/5, or ... ? I assume these are binaries/RPMs from
>> openafs.org?
>
> We compiled from source on RHEL 5.5 to make RPM packages to distribute on
> our RHEL servers. One machine is still at RHEL 4, and it had its RPMs
> compiled separately.
>
>>> > Then I moved on to the "project" volumes, which have a much higher
>>> > quota. One (383GB in size) seems to cause problems when I try to move
>>> > it. It moves a LOT faster (more like 300-400 Mbit/s), but at some
>>> > point, the volserver on the receiving end crashes and all volume moves
>>> > abort:
>>> > > Apr 10 12:52:41 rfsb2 kernel: volserver[25425]: segfault at
>>> > 000000009cf1fdd8 rip 0000003b9ce79a30 rsp 0000000042b42208 error 4
>>
>> Do you get a core in /usr/afs/logs ? If you can get a backtrace in gdb
>> (just run 'bt'), we could tell you what this is.
>>
>> If you don't get a core, make sure you're running bosserver with
>> 'ulimit -c unlimited'
>
> Yeah, I don't have a core, RHEL sets ulimit such that none are produced.
> I'm changing that for the next one. Once I'm done with some long volume
> transfers, I'll try that problem volume again to see if it coughs up a core.
>
> Dumb question: Would I have to restart bosserver? Can I do so without
> being disruptive (i.e. restarting the fileserver process) to my users?

the problem is, the ulimit applies to "me and my future children", so
you'd need to effectively get the currently running bosserver to
modify its own ulimit for future fileservers, or get the current
fileserver to do so for itself. so, no, not short of binary patching
at runtime.
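
to illustrate the "me and my future children" point, here is a minimal
sketch in Python (not OpenAFS code; the helper names and the use of
Python's Unix-only resource module are assumptions made for the
example). a parent process stands in for bosserver: it starts one child,
then changes its own core limit, then starts a second child. the first
child keeps the limit it inherited when it was started, and only the
second child sees the new value. the sketch lowers the soft limit to 0
rather than raising it, because lowering is always permitted for an
unprivileged process.

```python
import resource
import subprocess
import sys

# Child program: wait for one line on stdin, then print its own soft core limit.
CHILD = (
    "import resource, sys\n"
    "sys.stdin.readline()\n"
    "print(resource.getrlimit(resource.RLIMIT_CORE)[0])\n"
)


def spawn_child():
    """Start a child that reports its soft RLIMIT_CORE once it is poked."""
    return subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        text=True,
    )


def report(child):
    """Poke the child and return the soft core limit it printed."""
    out, _ = child.communicate("\n")
    return int(out.strip())


def show_core_limit_inheritance():
    """Return (parent's original soft limit, early child's, late child's)."""
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    try:
        # Started before the change: inherits the original limit.
        early = spawn_child()
        # Change the limit in the parent only (hypothetical stand-in for
        # bosserver adjusting its own ulimit while children are running).
        resource.setrlimit(resource.RLIMIT_CORE, (0, hard))
        # Started after the change: inherits the new limit.
        late = spawn_child()
        return soft, report(early), report(late)
    finally:
        # Put the parent's limit back the way we found it.
        resource.setrlimit(resource.RLIMIT_CORE, (soft, hard))


if __name__ == "__main__":
    original, early_limit, late_limit = show_core_limit_inheritance()
    print("parent's original soft limit:", original)
    print("child started before the change:", early_limit)
    print("child started after the change:", late_limit)
```

the child started before the change reports the parent's original
limit, and the child started after it reports 0. that is the same
reason an already running fileserver does not pick up a ulimit that is
changed later in a shell or in bosserver's startup environment.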



-- 
Derrick