[OpenAFS] vos move & vos rel failing on large volumes
Todd DeSantis
atd@us.ibm.com
Fri, 11 Apr 2003 07:44:32 -0400
Hi -
I'm glad you found at least one fileserver machine with a faulty
duplex setting. Over the years we have found that auto-negotiation
between cards and switches often misbehaves, and that you need to
set the speed and duplex explicitly.
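As an illustration, here is one way to hard-set a Linux NIC to 100 Mbit
full duplex (the interface name eth0 is a placeholder, and the matching
switch port has to be forced to the same settings or you just move the
mismatch):

```shell
# Force 100baseTx full duplex instead of relying on auto-negotiation.
# eth0 is a placeholder; the switch port must be configured to match.
ethtool -s eth0 autoneg off speed 100 duplex full

# On older systems without ethtool, mii-tool can do the same:
# mii-tool -F 100baseTx-FD eth0
```

Note these settings do not survive a reboot; they normally also go into
the distribution's network configuration.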
Maybe now that this one problem is cleared up, it will help with the
overall health of the network, and maybe the long-running "vos move"
will move along. Do you know if it hits the problem in the same time
span or the same spot? I know this would be very difficult to catch.
Maybe you could run scripts against the source and destination
volservers every 30 seconds to see how the transaction is proceeding:
# csh loop; substitute the volserver hostname for <server>
touch vollog.<server>
while (1)
    date >> vollog.<server>
    vos status <server> >> vollog.<server>
    sleep 30
end
Maybe this will show that the transaction stops sending or receiving
data at some point while the timer continues for 10-plus hours.
The Meltdown output for your destination volserver will not show any
real resends; only the source volserver will show those.
The rxkad error shows that the packet does not have a proper length,
so it takes the error path.
Thanks
Todd
Lee Damon <nomad@crow.ee.washington.edu>
To: Todd DeSantis/Pittsburgh/IBM@ibmus, Ray Pasetes <rayp@fnal.gov>, shadow@dementia.org
cc: openafs-info@openafs.org, nikola@crow.ee.washington.edu
04/10/2003 12:52 PM
Subject: Re: [OpenAFS] vos move & vos rel failing on large volumes
I've cut and pasted a bunch of different suggestions into this email and am
replying to each one in turn. I would like to thank everyone who has
replied with suggestions or timing information.
> been able to move them. You may want to issue your vos move command
> again with the -verbose option to see if there is more info on what
[root@harrier root]# time vos move -verbose -fromserver kite -frompartition /vicepc -toserver potential -topartition /vicepf -id v.s8.cadence.2002a
Starting transaction on source volume 536872898 ... done
Cloning source volume 536872898 ... done
Ending the transaction on the source volume 536872898 ... done
Starting transaction on the cloned volume 1443364917 ... done
Deleting pre-existing volume 536872898 on destination ... done
Creating the destination volume 536872898 ... done
Dumping from clone 1443364917 on source to volume 536872898 on destination ...
Failed to move data for the volume 536872898
rxk: sealed data inconsistent
vos move: operation interrupted, cleanup in progress...
clear transaction contexts
access VLDB
FATAL: VLDB access error: abort cleanup
cleanup complete - user verify desired result
real 648m58.951s
user 0m0.010s
sys 0m0.006s
>me either. i'd sort of like to see volserver strace/truss/ptrace/whatever
>output while such a slow thing is happening.
Derrick: I'll send this directly, it's *very* verbose, as one would expect
(22MB of output from running strace for about 4 minutes.)
> Meltdown.pl -s <src volserver> -p 7005 -t 30 (seconds)
>
> and I want to pay attention to the "resend" column.
This looks telling...
/tmp/Meltdown.pl -s kite -p 7005 -t 30
Server: kite, Port: 7005, Interval 30 seconds
Thu Apr 10 09:23:30 PDT 2003
hh:mm:ss wproc nobufs wpack fpack calls delta data     resends idle
09:23:30     0      0     0   464  8358     0 12387116  686973   11
09:24:00     0      0     0   464  8358     0 12387116  686973   11
09:24:30     0      0     0   464  8358     0 12387116  686973   11
09:25:00     0      0     0   464  8369    11 12387140  686974   11
09:25:30     0      0     0   432  8369     0 12389368  687103   10
09:26:00     0      0     0   432  8369     0 12392574  687283   10
09:26:30     0      0     0   432  8369     0 12395914  687468   10
09:27:00     0      0     0   432  8369     0 12398900  687639   10
09:27:30     0      0     0   432  8369     0 12401784  687806   10
09:28:00     0      0     0   432  8369     0 12405380  687998   10
09:28:30     0      0     0   432  8369     0 12408306  688166   10
09:29:00     0      0     0   432  8369     0 12411452  688340   10
And, for the fileserver we're trying to move to:
/tmp/Meltdown.pl -s potential -p 7005 -t 30
Server: potential, Port: 7005, Interval 30 seconds
Thu Apr 10 09:26:52 PDT 2003
hh:mm:ss wproc nobufs wpack fpack calls delta data     resends idle
09:26:52     0      0     0   462  2310     0  3812638  253710   10
09:27:22     0      0     0   462  2310     0  3812638  253710   10
09:27:52     0      0     0   462  2310     0  3812638  253710   10
09:28:22     0      0     0   462  2310     0  3812638  253710   10
09:28:52     0      0     0   462  2310     0  3812638  253710   10
09:29:22     0      0     0   462  2310     0  3812638  253710   10
09:29:52     0      0     0   462  2310     0  3812638  253710   10
09:30:22     0      0     0   462  2310     0  3812638  253710   10
09:30:52     0      0     0   462  2310     0  3812638  253710   10
09:31:22     0      0     0   462  2310     0  3812638  253710   10
09:31:52     0      0     0   462  2310     0  3812638  253710   10
09:32:22     0      0     0   462  2310     0  3812638  253710   10
09:32:52     0      0     0   462  2310     0  3812638  253710   10
> If this column is increasing while the release/move is in
> progress then I would want to look at the routes between
> these 2 volserver machines.
The routes look clean, but they are something I have very little control
over. I'm trying to get networking involved; we'll see. One thing I
discovered yesterday is that the new fileserver (potential) was ordered
with 100baseT cards instead of gig, but I don't think that is sufficient
to explain the problem.
The thing is, we were seeing slow moves before we brought potential in.
Our first two fileservers, kite & harrier, are connected directly to
the same gig switch, and large releases were taking hours there too. I'm
going to do the same set of tests (with a much smaller volume :) between
them and see if resends climb while that is happening. Since I
don't want to leave the move in an "indeterminate state", I'll have to wait
for that to time out and die of its own unnatural causes first.
> Are the duplex settings set properly on the network card and
> switch port? "netstat -ni" can show collisions if this is
> a problem.
netstat -ni shows no collisions or other errors on any interface on any of
our fileservers.
> Can the network handle fragmented packets ?
yes, I believe so.
> How FULL is the destination vice partition ? Does it have
> ample free inodes around ?
Darn near empty. It's a new fileserver.
> I would also look at the volserver options
> - using -udpsize 1048576
I'm a bit lost as to how to reset BosConfig to pass this option on to
volserver without trashing my entire cell. Any hints for a lost soul?
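For reference, that option would normally go on the volserver parms line
of the fs bnode in BosConfig on the fileserver. A sketch of what the
entry would look like, assuming a standard Transarc-style install under
/usr/afs (verify the paths and the rest of the entry against your own
BosConfig before changing anything, and restart the fs instance
afterwards for it to take effect):

```
bnode fs fs 1
parms /usr/afs/bin/fileserver
parms /usr/afs/bin/volserver -udpsize 1048576
parms /usr/afs/bin/salvager
end
```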
> Are there any other vos transactions happening on the src or
> destination server during these times?
Over the duration of a 10+ hour move, a few other vos releases
happen.
> vos status <src volserver>
> vos status <dst volserver>
Here's a snapshot from just now:
[root@harrier root]# vos status kite
Total transactions: 1
--------------------------------------
transaction: 7376 created: Thu Apr 10 09:25:01 2003
attachFlags: offline
volumeStatus:
volume: 1443364918 partition: /vicepc procedure: Forward
packetRead: 1 lastReceiveTime: 96171503 packetSend: 41257 lastSendTime: 96171503
--------------------------------------
[root@harrier root]# vos status potential
Total transactions: 1
--------------------------------------
transaction: 1815 created: Thu Apr 10 09:25:11 2003
volumeStatus:
volume: 536872898 partition: /vicepf procedure: Restore
packetRead: 41366 lastReceiveTime: 96171459 packetSend: 1 lastSendTime: 96171459
--------------------------------------
Another interesting tidbit is in the VolserLog on kite. Volume 1443364918
is the clone kite is forwarding for the move; 536872898 is the source
volume (on kite) and the volume being restored on potential.
Thu Apr 10 09:38:01 2003 trans 7376 on volume 1443364918 is older than 780 seconds
Thu Apr 10 09:38:31 2003 trans 7376 on volume 1443364918 is older than 810 seconds
Thu Apr 10 09:39:01 2003 trans 7376 on volume 1443364918 is older than 840 seconds
Thu Apr 10 09:39:31 2003 trans 7376 on volume 1443364918 is older than 870 seconds
Thu Apr 10 09:40:01 2003 trans 7376 on volume 1443364918 is older than 900 seconds
Thu Apr 10 09:40:31 2003 trans 7376 on volume 1443364918 is older than 930 seconds
Thu Apr 10 09:41:01 2003 trans 7376 on volume 1443364918 is older than 960 seconds
The previous move said that for a very long time, then started saying the
trans was idle:
Thu Apr 10 01:06:01 2003 trans 7236 on volume 1443364917 is older than 38730 seconds
Thu Apr 10 01:06:31 2003 trans 7236 on volume 1443364917 is older than 38760 seconds
Thu Apr 10 01:07:01 2003 trans 7236 on volume 1443364917 is older than 38790 seconds
Thu Apr 10 01:12:31 2003 trans 7236 on volume 1443364917 has been idle for more than 300 seconds
and finally:
Thu Apr 10 01:16:01 2003 trans 7236 on volume 1443364917 has been idle for more than 510 seconds
Thu Apr 10 01:16:31 2003 trans 7236 on volume 1443364917 has been idle for more than 540 seconds
Thu Apr 10 01:17:01 2003 trans 7236 on volume 1443364917 has been idle for more than 570 seconds
Thu Apr 10 01:17:31 2003 trans 7236 on volume 1443364917 has been idle for more than 600 seconds
Thu Apr 10 01:17:31 2003 trans 7236 on volume 1443364917 has timed out
thanks again for all the hints,
nomad
----------- - Lee "nomad" Damon -
work: nomad@ee.washington.edu
play: nomad@castle.org or castle!nomad
Sr. Systems Admin, UWEE SSLI Lab
"Celebrate Diversity"