[OpenAFS] Problem with large data transfer (WinNT AFS, please HELP!)
Shyh-Wei Luan
luan@almaden.ibm.com
Tue, 26 Feb 2002 00:00:01 -0800
Is it possible that your AFS server is overloaded? Event ID 1009
indicates an RX timeout, i.e., the server did not respond to the client in
time. If a 2 GB volume is being read by a large number of users
simultaneously (how big was your pilot?), the server might be slowed down
significantly. You may want to replicate the volume and stagger the
users' installation times.
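One way to scatter the installation times, sketched in Python purely as an
illustration: each workstation derives a fixed delay from its own hostname and
waits that long before starting the copy, so the clients do not all hit the
file server at once. The function name, the 4-hour window, and the use of the
hostname as the key are assumptions, not anything taken from your setup.

```python
import hashlib


def start_delay_seconds(hostname: str, window_seconds: int = 4 * 3600) -> int:
    """Map a hostname to a stable delay in the range [0, window_seconds).

    Hypothetical helper: the same host always gets the same delay, and
    different hosts are spread roughly evenly over the window.
    """
    if window_seconds <= 0:
        raise ValueError("window_seconds must be positive")
    digest = hashlib.sha256(hostname.encode("utf-8")).digest()
    # Use the first 8 bytes of the hash as an unsigned integer.
    return int.from_bytes(digest[:8], "big") % window_seconds
```

An install script could call start_delay_seconds(socket.gethostname()) and
sleep for the result before touching AFS. A hash is used here instead of a
random number so that a given machine's start time is reproducible.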
I am testing a large copy here to see if I can reproduce the error.
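For anyone who wants to try a similar copy, here is a minimal sketch (not the
test I am running) that copies a tree file by file, logs the time of every
failure so it can be matched against the event log, and retries a few times
before giving up on a file. The function name, retry count, and pause are
assumptions chosen for illustration.

```python
import os
import shutil
import time


def copy_tree_with_retry(src_root, dst_root, attempts=3, pause_seconds=5.0):
    """Copy src_root into dst_root one file at a time.

    Returns a list of (source_path, error_text) for files that still
    failed after the given number of attempts.
    """
    failures = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        rel = os.path.relpath(dirpath, src_root)
        target_dir = os.path.normpath(os.path.join(dst_root, rel))
        os.makedirs(target_dir, exist_ok=True)
        for name in filenames:
            src = os.path.join(dirpath, name)
            dst = os.path.join(target_dir, name)
            for attempt in range(1, attempts + 1):
                try:
                    shutil.copy2(src, dst)
                    break
                except OSError as exc:
                    # Timestamp each failure for comparison with the event log.
                    print(f"{time.ctime()} attempt {attempt} failed: {src}: {exc}")
                    if attempt == attempts:
                        failures.append((src, str(exc)))
                    else:
                        time.sleep(pause_seconds)
    return failures
```

If the copy survives once retries are added, that would be consistent with
short server-side stalls rather than a hard client fault, though it would not
prove it.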
Shyh-Wei Luan
Lubos Kejzlar <kejzlar@civ.zcu.cz>@openafs.org on 02/25/2002 11:59:10 AM
Sent by: openafs-info-admin@openafs.org
To: OpenAFS Info Mailing List <openafs-info@openafs.org>, OpenAFS
Developers Mailing List <openafs-devel@openafs.org>, AFS mail list
<info-afs@transarc.com>
cc:
Subject: [OpenAFS] Problem with large data transfer (WinNT AFS, please
HELP!)
Hi all,
We are trying to extend our (long-lived) campus-wide Unix-based AFS
infrastructure (Transarc AFS 3.6 DB servers, mix of Transarc/OpenAFS
clients) to end-user workstations running Win 98/NT/W2k, as our main
distributed storage solution.
Unfortunately, our users experienced _significant_ problems during the
pilot phase:
- All significant tests were run on Win NT SP5 workstations. Similar
problems are reported by Win98 users (with smaller amounts of data; not
yet confirmed by our support people).
- As part of an automated SW installation, we need to copy a large
subtree from AFS to the local file system (unfortunately, there is no
possibility of running the SW directly from AFS space):
  - all data are readable by system:anyuser
  - both client and server use 100baseT-FD network connections, and
  there were no communication problems during the tests
  - the total amount of data copied is about 2+ GB
  - there is a large number (70+ k) of small files to copy
  - all data are located in a single volume
- Unfortunately, we are _UNABLE_ to copy this data with any of the methods
we tried (MS Explorer, Perl-based command line tools, etc.):
  - the copy process breaks at a random (AFAIK) point with the following
  error events (logged at roughly the same time in the system/application
  event log):
  EventID: 3013
  The redirector has timed out request to xxxxxx-afs ...
  EventID: 1009
  cm_Analyze: HardDeadTime exceeded ....
  and/or (?)
  EventID: 1005
  Pkt straddled session startup, took xxxxxx ms, ncb length xxx.
  - there are no active CM RX connections (according to rxdebug), and the
  system seems to be 'frozen' for a while during the error event (AFAIK).
Has anyone ever seen similar problems?
Currently it's really a _HIGH_PRIORITY_ISSUE_ for us to provide and support
a single distributed FS infrastructure for all our users (10000+), so we
are looking for _ANY_HELP_OR_SUGGESTION_ (I'm not very familiar with M$
Windows, but I'm able (and glad) to provide any further info to anyone who
could help us)!
So again, thank you VERY MUCH in advance for _ANY_HELP_OR_SUGGESTION_ !!
Best regards,
Lubos
--------------------------------------------------------------------------
Lubos Kejzlar
System and Network Specialist
Laboratory for Computer Science Tel.: ++420-19-7491536
University of West Bohemia ++420-19-7421414
Univerzitni 8, 30614 Pilsen Fax: ++420-19-7421419
Czech Republic E-mail: kejzlar@civ.zcu.cz
PGP Key fingerprint = 5621 06DA 3EDE 5D15 F287 5408 9B8E C766 CD64 3A3F
--------------------------------------------------------------------------