[OpenAFS] vos move & vos rel failing on large volumes

Thu, 10 Apr 2003 09:28:44 -0400

Hi -

Many times when I troubleshoot long-running vos transactions,
I want to look at the traffic between the volservers.

I use the Meltdown.pl script that is available from the
Transarc IBM Web Site.  It is just a perl script that is
a wrapper around

  rxdebug <src-volserver> 7005 -rxstat -noconn

(7005 is the volserver port) and I run it as

  Meltdown.pl -s <src volserver> -p 7005 -t 30  (seconds)

The thing to pay attention to is the "resend" column.
If this column is increasing while the release/move is in
progress, then I would want to look at the routes between
these 2 volserver machines.
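
If you would rather not eyeball the column, the "is it increasing"
check can be sketched in a few lines of Python. This is only a rough
illustration, not part of Meltdown.pl: the function names are made up,
and the pattern "resends <number>" is an assumption about how the
counter shows up in the text you capture, so adjust it to match what
your rxdebug or Meltdown.pl output really prints.

```python
import re

# ASSUMPTION: the counter appears in the captured text as "resends <number>".
# Change this pattern to match the real output on your servers.
RESEND_PATTERN = re.compile(r"resends\s+(\d+)")


def extract_resends(text, pattern=RESEND_PATTERN):
    """Return the first resend counter found in text, or None if absent."""
    match = pattern.search(text)
    return int(match.group(1)) if match else None


def resend_deltas(samples):
    """Increase between consecutive readings, given readings in time order."""
    return [later - earlier for earlier, later in zip(samples, samples[1:])]


def is_climbing(samples, threshold=0):
    """True if the counter rose by more than threshold in any interval."""
    return any(delta > threshold for delta in resend_deltas(samples))
```

Feed it one reading per polling interval (for example every 30 seconds
while the move runs); if is_climbing() comes back True, the network
path between the two volservers is the place to look.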

Are the duplex settings set properly on the network card and
switch port ?  "netstat -ni" can show collisions if this is
a problem.

Can the network handle fragmented packets ?

How FULL is the destination vice partition ?  Does it have
ample free inodes around ?
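
A quick way to answer both questions on the destination server itself
is something like the sketch below. It only uses the standard
os.statvfs() call; the function names and the idea of passing in the
volume size in K are my own illustration, not anything vos does for
you.

```python
import os


def partition_headroom(path):
    """Report free space (in 1K blocks) and free inodes for a mounted path."""
    st = os.statvfs(path)
    return {
        # Blocks available to unprivileged users, converted to kilobytes.
        "free_kb": st.f_bavail * st.f_frsize // 1024,
        # Some filesystems report 0 here because they do not use fixed inodes.
        "free_inodes": st.f_favail,
        "total_inodes": st.f_files,
    }


def has_room(path, needed_kb, min_free_inodes=1):
    """True if path has at least needed_kb free and enough free inodes."""
    info = partition_headroom(path)
    return info["free_kb"] >= needed_kb and info["free_inodes"] >= min_free_inodes
```

For the volume in this thread, that would be has_room("/vicepf", 2527255)
run on the destination machine, using the 2527255 K figure that vos exa
reports. How many free inodes count as "ample" depends on the number of
files in the volume, so min_free_inodes is left for you to choose.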

I would also look at the volserver options
  - using         -udpsize 1048576

Are there any other vos transactions happening on the src or
destination server during these times ?

      vos status <src volserver>
      vos status <dst volserver>

I would look into these things.

Thanks

Todd DeSantis
AFS Support

From:     Nathan Neulinger <nneul@umr.edu>
Sent by:  openafs-info-admin@openafs.org
Date:     04/09/03 02:57 PM
To:       Lee Damon <nomad@crow.ee.washington.edu>
cc:       openafs-info@openafs.org, nikola@crow.ee.washington.edu
Subject:  Re: [OpenAFS] vos move & vos rel failing on large volumes

That isn't a problem with the volume size... at least not _just_ the
volume size. We have numerous volumes > 10 GB with no problems at all on
linux, and move them around at will... Don't have many that large that are
replicated, but we definitely have a few 2GB+ volumes that are
replicated.

-- Nathan

On Wed, 2003-04-09 at 13:01, Lee Damon wrote:
> Ignorant but trying to learn AFS...
>
> I know we shouldn't have volumes this large, but there isn't anything I can
> do to fix the cause of them.
>
> fs exa /afs/.ee.washington.edu/nikola/groups/vlsi/.Solaris8-sparc/pkgs/cadence/.2002a
> Volume status for vid = 536872898 named v.s8.cadence.2002a
> Current disk quota is 5000000
> Current blocks used are 2527255
> The partition has 12956012 blocks available out of 33223516
>
> However, I would still like to vos move and vos release them.  Problem is,
> vos move ends in the following failure.  vos rel gives a similar error
> after which vos exa shows some volumes as new and some as old.  Does anyone
> have any hints about what I can check/change/do to try to get this move to
> work?
>
> time vos move -fromserver kite -frompartition /vicepc -toserver potential -topartition /vicepf -id v.s8.cadence.2002a
> exit
> Failed to move data for the volume 536872898
>    rxk: sealed data inconsistent
> vos move: operation interrupted, cleanup in progress...
> clear transaction contexts
> FATAL: VLDB access error: abort cleanup
> cleanup complete - user verify desired result
>
> real    619m16.383s
> user    0m0.004s
> sys     0m0.010s
>
> vos exa v.s8.cadence.2002a
> v.s8.cadence.2002a                536872898 RW    2527255 K  On-line
>     kite.ee.washington.edu /vicepc
>     RWrite  536872898 ROnly  536873557 Backup          0
>     MaxQuota    5000000 K
>     Creation    Wed Nov 20 15:54:32 2002
>     Last Update Wed Mar 19 09:35:35 2003
>     0 accesses in the past day (i.e., vnode references)
>
>     RWrite: 536872898
>     number of sites -> 1
>        server kite.ee.washington.edu partition /vicepc RW Site
>
>
> I'm getting errors from vos syncvldb now, as well:
>
> vos syncvldb potential /vicepf
> Could not create a VLDB entry for the volume 536873547
> VLDB: volume entry exists in the vldb
> Could not create a VLDB entry for the volume 536873457
> VLDB: volume entry exists in the vldb
> *** Warning: Orphaned RW volume 536872898 exists on potential.ee.washington.edu /vicepf
>     VLDB reports RW volume 536872898 exists on kite.ee.washington.edu /vicepc
> *** Warning: Orphaned RW volume 536872892 exists on potential.ee.washington.edu /vicepf
>     VLDB reports RW volume 536872892 exists on kite.ee.washington.edu /vicepc
> Could not process entries on server potential.ee.washington.edu partition /vicepf
> VLDB synchronized with state of server potential partition /vicepf
>
>
>
> Thanks for any advice,
> nomad
>  -----------                       - Lee "nomad" Damon -          \
> work: nomad@ee.washington.edu                                      \
> play: nomad@castle.org    or castle!nomad                           \
>                                                                     /\
> Sr. Systems Admin, UWEE SSLI Lab                                   /  \
>                 "Celebrate Diversity"                             /    \
>
>
> _______________________________________________
> OpenAFS-info mailing list
> OpenAFS-info@openafs.org
> https://lists.openafs.org/mailman/listinfo/openafs-info
--

------------------------------------------------------------
Nathan Neulinger                       EMail:  nneul@umr.edu
University of Missouri - Rolla         Phone: (573) 341-4841
Computing Services                       Fax: (573) 341-4216
