[OpenAFS] Full disk woes

Kim Kimball dhk@ccre.com
Wed, 11 Jul 2007 15:36:26 -0600


Probably too late to help here, but the volserver can be killed without 
restarting the fileserver. That clears the volserver queue without 
forcing the fileserver to detach and reattach all the volumes, which can 
take some minutes depending on how many volumes there are. It also 
leaves your fileserver up, so there is no outage.

Just "kill -9 <PID of /usr/afs/bin/volserver>" -- the bosserver will 
restart it with an empty queue.

If someone somewhere is running lots of vos commands, they will show up 
in the queue when you run "vos status ..."
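Before killing anything, it can help to see which volumes the stuck transactions belong to. The "trans ... on volume ... is older than ..." warnings in VolserLog (like the ones Steve quoted below) can be summarized per volume and transaction. This is a minimal Python sketch, assuming each warning occupies a single line in the real log in the format quoted in this thread (the mail client wrapped them). The name summarize_stale is hypothetical and not part of OpenAFS.

```python
import re

# Matches the VolserLog warnings quoted in this thread, e.g.
# "Fri Jul  6 10:05:18 2007 trans 3811071 on volume 1938590434 is older than 29730 seconds"
# Assumption: one warning per line in the actual log file.
STALE_RE = re.compile(
    r"trans (?P<trans>\d+) on volume (?P<vol>\d+) is older than (?P<age>\d+)"
)


def summarize_stale(lines):
    """Map (volume_id, trans_id) -> largest age in seconds seen for that transaction."""
    oldest = {}
    for line in lines:
        m = STALE_RE.search(line)
        if m is None:
            continue  # not a stale-transaction warning
        key = (int(m.group("vol")), int(m.group("trans")))
        age = int(m.group("age"))
        oldest[key] = max(age, oldest.get(key, 0))
    return oldest


if __name__ == "__main__":
    # Sample lines taken from the log excerpt quoted in this thread.
    sample = [
        "Fri Jul  6 10:05:48 2007 trans 3811072 on volume 1937192577 is older than 28530 seconds",
        "Fri Jul  6 10:05:48 2007 trans 3811071 on volume 1938590434 is older than 29760 seconds",
        "Fri Jul  6 10:06:18 2007 trans 3811072 on volume 1937192577 is older than 28560 seconds",
    ]
    # Print the oldest (most suspicious) transactions first.
    for (vol, trans), age in sorted(summarize_stale(sample).items(),
                                    key=lambda kv: kv[1], reverse=True):
        print(f"volume {vol}: trans {trans} stuck for {age} s (~{age // 3600} h)")
```

On a real server you would feed it the lines of the VolserLog file instead (on a Transarc-style layout probably /usr/afs/logs/VolserLog; check your own installation). In this thread the oldest transaction sits on volume 1938590434, which Hartmut's "vos listvldb" below shows is the RO volume of svc.ml.mdsolids.31.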

Test on non-production machine first!

Kim


Hartmut Reuter wrote:
>
> I tried a
>
> /afs/ipp/backups: vos listvldb 1938590434 -cell msu.edu
> vsu_ClientInit: Could not get afs tokens, running unauthenticated.
>
> svc.ml.mdsolids.31
>     RWrite: 1938590433    ROnly: 1938590434    RClone: 1938590434
>     number of sites -> 3
>        server afsfs7.cl.msu.edu partition /vicepa RW Site
>        server afsfs9.cl.msu.edu partition /vicepa RO Site  -- Old release
>        server afsfs7.cl.msu.edu partition /vicepa RO Site  -- New release
> /afs/ipp/backups:
>
> and found out it's your machine afsfs9.cl.msu.edu that is causing the 
> trouble. Then I ran a "vos status" against this machine, which did not 
> respond.
>
> "rxdebug afsfs9.cl.msu.edu 7005" shows a lot of connections in state 
> precall with source ports != 7005. That means you have a lot of vos 
> commands running somewhere. You should stop those first! Then perhaps
> restart your fileserver to get rid of the old transactions, and then 
> hopefully everything is OK again.
>
> Hartmut
>
> Steve Devine wrote:
>> Hartmut Reuter wrote:
>>
>>> Steve Devine wrote:
>>>
>>>> Hartmut Reuter wrote:
>>>>
>>>>
>>>>> Steve Devine wrote:
>>>>>
>>>>>
>>>>>> I committed the cardinal sin of letting a server partition fill up.
>>>>>> I have tried vos remove and vos zap ... I can't get rid of any
>>>>>> vols. Volume management fails on this machine.
>>>>>> It's the old-style (non-namei) fileserver. It doesn't seem like I can
>>>>>> just rm the V#####.vol files, can I?
>>>>>> Any help?
>>>>>>
>>>>>
>>>>> Removing the small V#####.vol files doesn't help; they are really
>>>>> only 76 bytes long.
>>>>>
>>>>> What happens if you do a "vos remove" or a "vos zap"?
>>>>
>>>> Both commands fail, even when I use force.
>>>
>>>
>>> What does the VolserLog say?
>>>
>>>
>>>
>>>>> Do the volumes go away, but the free space seems as low as before?
>>>>>
>>>>> This can happen if you removed only read-only and backup volumes,
>>>>> which typically free only the space used by their metadata, while the
>>>>> space used by their files and directories is shared between them and
>>>>> the RW volume. But, of course, you don't want to remove your
>>>>> RW volumes.
>>>>> Maybe, once you have removed all RO and BK volumes, you will have
>>>>> enough free space for the temporary volume that is created when you
>>>>> try to move your smallest RW volume to another partition/server.
>>>>>
>>>>> There is also a "-live" option for the vos move command, which should
>>>>> do the move without creating a clone. I suppose it was written
>>>>> for cases like this.
>>>>>
>>>>> Good luck,
>>>>> Hartmut
>>>>> -----------------------------------------------------------------
>>>>> Hartmut Reuter                           e-mail reuter@rzg.mpg.de
>>>>>                      phone +49-89-3299-1328
>>>>> RZG (Rechenzentrum Garching)               fax   +49-89-3299-1301
>>>>> Computing Center of the Max-Planck-Gesellschaft (MPG) and the
>>>>> Institut fuer Plasmaphysik (IPP)
>>>>> -----------------------------------------------------------------
>>>>> _______________________________________________
>>>>> OpenAFS-info mailing list
>>>>> OpenAFS-info@openafs.org
>>>>> https://lists.openafs.org/mailman/listinfo/openafs-info
>>>>
>>>>
>>>>
>>>
>> Lots of lines like this:
>> Fri Jul  6 10:05:18 2007 trans 3811071 on volume 1938590434 is older than 29730 seconds
>> Fri Jul  6 10:05:48 2007 trans 3811072 on volume 1937192577 is older than 28530 seconds
>> Fri Jul  6 10:05:48 2007 trans 3811071 on volume 1938590434 is older than 29760 seconds
>> Fri Jul  6 10:06:18 2007 trans 3811072 on volume 1937192577 is older than 28560 seconds
>> Fri Jul  6 10:06:18 2007 trans 3811071 on volume 1938590434 is older than 29790
>>
>
>