Re: Re: [OpenAFS] Severe Performance Issue with SCSI!

Rubino Geiß kb44@rz.uni-karlsruhe.de
Mon, 5 Aug 2002 10:54:58 +0200


First of all, some remarks on the "it's the network" theory:

	- We tested SCSI as well as IDE performance with several
computers, several network adapters, and different versions of Red Hat
Linux. We never saw a SCSI-backed AFS server come even close to an
IDE-backed AFS server in benchmarks that operate on small files and do
heavy writing. Only when files get very large (>10 ... 100 MB) _AND_ we
are only reading does SCSI dominate; that is the only case in which it
does.

	-> I believe this observation cannot easily be explained by a
misconfigured network! Why would only systems that have a /vicepxx
mounted on a SCSI disk be affected?!

And now for the tests:

	-> Our performance tests consist of compiling real programs
(internal development) with javac (1.4) driven by ant, and with gcc
driven by make. We ran a make all / make clean cycle several times and
measured user / sys / wall-clock time, starting from a warm cache
(several runs were performed before we started counting); see the
sketch after the table. Here is what we saw:

                              ant clean   ant all
AFS:
SCSI (10k rpm)                   22         32
IDE (5400 rpm)                   10         16
IDE (7200 rpm)                    7         15
SCSI (Digital Alpha, 10MBit)      9         16

ext3:
IDE (5400 rpm)                    3          9
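
For reference, the measurement cycle looked roughly like the sketch
below (the AFS path is hypothetical; adjust it to your cell):

    # warm the cache first -- these runs are not counted
    cd /afs/our.cell/project
    ant clean; ant all
    ant clean; ant all
    # measured runs
    time ant clean
    time ant all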

	- We get less than 1 Mbit/sec throughput in every AFS case. A
network monitor shows that network traffic is about 100 ... 350
kbit/sec.
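
To get a raw read-throughput number that is independent of the
compilers, something like this works (path and file name are
hypothetical; use a file large enough to dominate cache effects):

    time dd if=/afs/our.cell/bigfile of=/dev/null bs=1024k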

	- Even more interesting is what we heard and saw when we looked
at our servers: the SCSI servers made almost constant noise and their
disk lights were on almost constantly; the router showed little traffic
utilization (< 8%). When we ran the tests against IDE-backed AFS
servers the disk lights came on only in a sparse pattern -- especially
when deleting files.

	-> How can this be explained? Not really with network problems,
right? Maybe it's a difference in write/flush semantics between the
SCSI and IDE drivers / disks / AFS???
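
One way to poke at that theory (device name is only an example): IDE
disks usually ship with write-back caching enabled, so a flush can
return before the data is on the platters, while SCSI disks/drivers may
actually wait for it. hdparm can query and toggle this on the IDE side:

    hdparm -W /dev/hda     # query the write-caching flag
    hdparm -W0 /dev/hda    # disable write-back caching for a test run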

--------------------------------------

We took some measurements with "postmark" from NetApp.

Here is what we saw:
	Current configuration is:
	The base number of files is 1000
	Transactions: 50000
	Files range between 500 bytes and 9.77 kilobytes in size
	Working directory: 
      	  /xxxxxx (weight=1)
	Block sizes are: read=512 bytes, write=512 bytes
	Biases are: read/append=5, create/delete=5
	Using Unix buffered file I/O
	Random number generator seed is 42
	Report format is verbose.
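
For anyone who wants to reproduce this: the configuration above
corresponds roughly to the following postmark commands (quoted from
memory, so treat the exact syntax as an assumption):

    pm> set location /xxxxxx
    pm> set number 1000
    pm> set transactions 50000
    pm> set size 500 10000
    pm> set read 512
    pm> set write 512
    pm> set bias read 5
    pm> set bias create 5
    pm> set seed 42
    pm> set report verbose
    pm> run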

                          transactions/s   read         write
AFS:
Server SCSI                     69         222.09 k/s   232.06 k/s
Server IDE                      59         191.66 k/s   200.25 k/s
Server SCSI                     45         147.34 k/s   153.94 k/s
(Digital Alpha, 10MBit)

NFS:
Server SCSI                    122         396.67 k/s   414.46 k/s

ext3:
Local disk IDE                2272           7.32 M/s     7.65 M/s

What should we expect? When we introduced AFS we anticipated a slowdown
of 50 ... 150 %, but as you can see we got > 1000%.

In this benchmark SCSI doesn't seem to be too bad, but that's not what
we got with our own benchmarks (see above).


Remark: All of our Ethernet cards are set to auto-negotiation, for sure.
Remark: Changing the MTU on server and client leaves the performance
unchanged.


I hope someone has a clue!


Bye, ruby


-----Original Message-----
From: openafs-info-admin@openafs.org
[mailto:openafs-info-admin@openafs.org] On Behalf Of Daniel
Clark/Cambridge/IBM
Sent: Saturday, 3 August 2002 09:50
To: openafs-info@openafs.org
Subject: Re: Re: [OpenAFS] Severe Performance Issue with SCSI!

> We are checking this with our infrastructure guys, but so far we 
> believe that the switches as well as our network is doing all right 
> (that is duplex).

Getting duplex settings right can be tricky - for example with the
particular combination of new Cisco routers and network cards at my
site, the only way to get 100baseT Full Duplex turned out to be to set
everything to auto-negotiation. With the old Cisco routers,
auto-negotiation didn't work, everything had to be set manually. The
paper "Ethernet Auto-sensing: Adventures in manual configuration" [1] is
great. As has been mentioned this can easily look like an AFS problem
instead of a network problem - usual tests of network throughput (using
ftp, ping, etc.) tend to be highly unidirectional and can be seemingly
unaffected by duplex problems. AFS/Rx seem to be highly sensitive to
network problems in general, so you may want to try other network tests
[2]. One thing you can try to determine if the performance problem is
really an AFS issue is to set up your AFS servers to temporarily also be
NFS servers, and see if the speed problem still exists over NFS. If you
want to be somewhat scientific, you can try running postmark [3] from
the same client against the same server, once over AFS, then over NFS.
Another cheap thing to try is to set the MTU on the
server and client to something small - say 384 bytes - and see if it
makes any difference. If it does, some router is having problems with
packet fragmentation.
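
Concretely, on a Linux box those checks might look like this (interface
name assumed; mii-tool ships with most distributions):

    mii-tool -v eth0         # show negotiated speed and duplex
    ifconfig eth0 mtu 384    # temporarily lower the MTU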

> Besides, we ran the benchmark on different ports and segments of our
> network -- and the AFS servers with IDE drives _are_ fast.

> Some AlphaServer running Tru64 UNIX 5.1 (IBM AFS) seems to perform
> well even with SCSI; actually this nearly 10-year-old machine with a
> 10MBit interface is as fast as our fastest IDE server running OpenAFS.

Are you saying you are getting less than 1 MB/sec performance even from
your OpenAFS IDE boxes on 100baseT Full Duplex Switched ethernet? If
there is no other load on the network/systems you really should be
getting at least 2.5 MB/sec on reads from AFS. If the most you can get
is 10baseT speeds - around 300-500 KB/sec - then something is seriously
wrong, and it's probably the network.

[1] http://www-commeng.cso.uiuc.edu/docs/autosense/autosense.html
[2] http://www.oreilly.com/catalog/nettroubletools/
[3] http://www.netapp.com/tech_library/3022.html

--
Daniel Clark § Sys Admin & Assistant Release Engineer
IBM » Lotus » Messaging Technology Group

_______________________________________________
OpenAFS-info mailing list
OpenAFS-info@openafs.org
https://lists.openafs.org/mailman/listinfo/openafs-info