[OpenAFS] Problem with large data transfer (WinNT AFS, please HELP!)

Shyh-Wei Luan luan@almaden.ibm.com
Tue, 26 Feb 2002 00:00:01 -0800

Is it possible that your AFS server is overloaded?  Event ID 1009
indicates an RX timeout, i.e., the server did not respond to the client in
time.  If a 2 GB volume is being read by a large number of users
simultaneously (how big was your pilot?), the server might be slowed down
significantly.  You may want to replicate the volume and stagger the
installation times of your users.
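
One way to stagger the installs without central coordination is to derive
each workstation's start delay from its hostname.  A minimal sketch in
Python, assuming the installer can sleep before it starts copying; the
function name and the one-hour window are made up for illustration:

```python
import hashlib


def install_delay_seconds(hostname: str, window_seconds: int = 3600) -> int:
    """Map a hostname to a stable delay inside the install window.

    The same host always gets the same delay, and different hosts are
    spread roughly evenly over the window.  (Hypothetical helper, not
    part of AFS.)
    """
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an integer, then fold it
    # into the window.
    return int.from_bytes(digest[:8], "big") % window_seconds
```

Each client would sleep for install_delay_seconds(its_own_hostname) before
starting the copy, so the reads do not all hit the file server at once.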

I am testing a large copy here to see if I can reproduce the error.

Shyh-Wei Luan

Lubos Kejzlar <kejzlar@civ.zcu.cz>@openafs.org on 02/25/2002 11:59:10 AM

Sent by:    openafs-info-admin@openafs.org

To:    OpenAFS Info Mailing List <openafs-info@openafs.org>, OpenAFS
       Developers Mailing List <openafs-devel@openafs.org>, AFS mail list
Subject:    [OpenAFS] Problem with large data transfer (WinNT AFS, please HELP!)

Hi all,

   we are trying to extend our (long lived) campus-wide Unix-based AFS
infrastructure (Transarc AFS 3.6 DB servers, mix of Transarc/OpenAFS
to end-user workstations running Win 98/NT/W2k as a main distributed
storage solution.

Unfortunately, our users experienced _significant_ problems during pilot

- all significant tests were run on Win NT SP5 workstations. Similar
  problems are reported by Win98 users (smaller amount of data, not yet
  confirmed by support people)

- as part of an automated SW installation, we need to copy a large
  subtree from AFS to the local file system (unfortunately, there is no
  possibility of running the SW directly from AFS space):

     - all data are readable by system:anyuser
     - both client & server use 100baseT-FD network connections and
       there were no communication problems during the tests
     - the total amount of data copied is 2+ GB
     - there is a large number (70+ k) of small files to copy
     - all data are located in a single volume

- unfortunately, we are _UNABLE_ to copy such data using (any) different
  methods (MS Explorer, Perl-based command line tools, etc.):

     - the copy process breaks at a random (AFAIK) point with the following
       error events (which occurred at roughly the same time in system/application event

       EventID: 3013
       The redirector has timed out request to xxxxxx-afs ...

       EventID: 1009
       cm_Analyze: HardDeadTime exceeded ....

       and/or (?)

       EventID: 1005
       Pkt straddled session startup, took xxxxxx ms, ncb length xxx.

     - there are no active CM RX connections (according to rxdebug) and the
       system seems to be 'frozen' for a while during the error event (AFAIK).
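
As a stopgap while the timeouts are investigated, a copy that can be
restarted without redoing finished files might help.  A minimal sketch in
Python, assuming an interpreter is available on the workstation; the
function name, retry count and wait time are illustrative, and this does
not address the cause of the timeouts:

```python
import os
import shutil
import time


def _raise(err):
    # os.walk silently skips unreadable directories unless told otherwise.
    raise err


def copy_tree_resumable(src_root, dst_root, retries=5, wait_seconds=30.0):
    """Copy src_root into dst_root; return the number of files copied.

    Files already present with the same size are skipped, so the copy
    can be rerun after a failure.  Each file is retried a few times on
    OSError before giving up.  (Hypothetical helper.)
    """
    copied = 0
    for dirpath, _dirnames, filenames in os.walk(src_root, onerror=_raise):
        rel = os.path.relpath(dirpath, src_root)
        dst_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(dst_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(dst_dir, name)
            # Skip files that a previous run already copied completely.
            if os.path.exists(dst) and os.path.getsize(dst) == os.path.getsize(src):
                continue
            for attempt in range(1, retries + 1):
                try:
                    shutil.copy2(src, dst)
                    copied += 1
                    break
                except OSError:
                    if attempt == retries:
                        raise
                    # Give the client/server time to recover, then retry.
                    time.sleep(wait_seconds)
    return copied
```

Rerunning the same call after a failed run only copies the files that are
still missing or incomplete.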

Has anyone ever seen similar problems?

Currently it's a really _HIGH_PRIORITY_ISSUE_ for us to provide and support
a single distributed FS infrastructure for all our users (10000+), so we are
looking for _ANY_HELP_OR_SUGGESTION_ (I'm not very familiar with M$
Windows, but I'm able (and glad) to provide any further info to anyone who
could help us)!

So again, thank you VERY MUCH in advance for _ANY_HELP_OR_SUGGESTION_ !!

Best regards,

Lubos Kejzlar
System and Network Specialist

Laboratory for Computer Science                Tel.:      ++420-19-7491536
University of West Bohemia                                ++420-19-7421414
Univerzitni 8, 30614 Pilsen                    Fax:       ++420-19-7421419
Czech Republic                                 E-mail:  kejzlar@civ.zcu.cz

PGP Key fingerprint  =  5621 06DA 3EDE 5D15 F287  5408 9B8E C766 CD64 3A3F
