[OpenAFS] calculating memory

Andy Cobaugh phalenor@gmail.com
Sat, 29 Jan 2011 01:19:34 -0500 (EST)


On 2011-01-28 at 22:38, Gary Gatling ( gsgatlin@eos.ncsu.edu ) said:
>
> I am going to use RHEL 6 for the fileserver. I have a test VM up and working
> with openafs 1.4.14 to start with. Seems to work ok with ext4. The version of
> VMware we are using is VMware ESX. We pay full price for that. I think we are
> slowly moving to version 4, but right now I think it's mostly 3. (We can use
> the vmxnet 2 NIC but not 3 on most boxes so far.)

Sounds similar to what we do, except substitute Solaris 10 for RHEL. You
definitely want to use the latest vmxnet driver you have available. It
speeds up the network tremendously, or at the very least reduces CPU
overhead. I think on CentOS 5.5 64-bit I couldn't get much more than
~300Mbps with the default virtual NIC, but could get at least 800Mbps with
vmxnet3. Similar results under Solaris.
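
If you want to sanity-check which driver a guest interface is actually
using, something like this works (the interface name will differ on your
box):

  ethtool -i eth0 | grep ^driver

It should report vmxnet3 when the paravirtualized NIC is really in use.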

The only reason we use Solaris is ZFS compression. With LZJB we see almost
2:1 compression on our home directories, which currently use 4TB+ of our
SAN storage but really amount to closer to 6TB of actual home directory
data. LZJB uses hardly any CPU, and I'm sure in some cases it's faster to
compress the data than to write it out uncompressed. Oh, and end-to-end
checksums are a nice bonus too if you don't trust your underlying storage,
even if it is fancy uber-expensive SAN storage (we don't do ZFS RAID, just
zpools with a single vdev -> RAID5 LUN).
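
For reference, the ZFS side of that boils down to something like the
following (pool, dataset, and device names are just placeholders here):

  zpool create homepool c3t0d0            # single vdev sitting on a RAID5 LUN
  zfs create homepool/home
  zfs set compression=lzjb homepool/home
  zfs get compressratio homepool/home     # reports the ratio actually achieved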

We currently run 3 such fileserver VMs on VMware ESXi 4.x on the same box,
2 vCPUs each (the fileserver will barely use 2 CPUs, so factor in that plus
a CPU for the volserver when doing vos moves). Each of those VMs has 2GB of
memory assigned to it right now, and that seems to be enough even with ZFS
in play. If I'm reading the output from ps correctly, one of our larger
DAFS fileservers running on CentOS 5.5 64-bit is using 1.8GB, and
davolserver 1.5GB. (That's with -p 128 passed to both commands, so actual
memory usage is probably much smaller than that.)
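
If you want to check your own numbers, the RSS column from ps is what I'm
going by - on Linux, something along these lines (process names are
dafileserver/davolserver under DAFS, plain fileserver/volserver otherwise):

  ps -C dafileserver,davolserver -o pid,rss,vsz,args

RSS there is reported in kilobytes.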

> It seems like on Solaris 10 with openafs 1.4.11 the server uses about
> 1 GB when it's not backing up. I am not sure how much it uses at "peak
> times" or when doing full backups. And I don't have the new backup
> software (yet). Teradactyl is the backup software we are switching to so
> we can ditch Solaris for Linux.

Just to add another data point to the mix: we use TSM (provided by our
university's central IT) and just do file-level backups. At least that way
we're server-agnostic (though it's not the fastest solution by a long
shot - the TSM server is the bottleneck in our case, so there wasn't any
point in choosing a faster backup strategy).

I'm curious - how are you backing up AFS now?

> I gather real servers aren't an option 'cause management really likes moving
> most everything into VMware. We already moved all our license and web servers
> into VMware, and we have some other weird servers working in it also. Even
> Windows infrastructure like domain controllers and stuff. If everyone says
> it's a bad idea I can make an argument, though. :)

Eh, if you push your data onto these virtualized servers and performance
takes a hit (we'll sometimes see sporadic slowdowns when vos moves are
happening on the same ESX host), then you can always take the "I told you
so" approach and get some bare-metal hardware to compare against.

Oh, and we also do raw device maps in ESX. I haven't quantified how much
faster raw device maps are than <your vicep fs> -> <virtualized
storage> -> VMFS -> SAN, but being able to access that LUN from a non-ESX
box and see ext4 instead of VMFS sounds like the makings of a good DR
strategy.
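
In other words, if the ESX layer ever goes away, you could present the same
LUN to any Linux box and mount the vicep directly, roughly like this (the
device path is just an example for a multipathed LUN):

  mount -t ext4 /dev/mapper/mpath0 /vicepa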

One more thing: SAN raw device maps in ESX 4 are limited to 2TB. I guess
the hypervisor is still using Linux 2.4, and there are some limitations
from Linux itself in play there. You can create a VMFS datastore bigger
than 2TB by using multiple extents (I think). iSCSI doesn't have this
limitation. Just something to be aware of.

I would be very curious to see any benchmarks you come up with - things
like iozone on the vicep itself, and iperf between VMs on the same vSwitch
and between VMs on different hosts.
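
For example, rough invocations along these lines (tune file sizes and
durations to taste; the iozone file size should exceed guest RAM so you're
not just measuring cache):

  # on the vicep itself
  iozone -a -g 8G -f /vicepa/iozone.tmp

  # iperf: one VM as server, the other as client
  iperf -s
  iperf -c <server-ip> -t 30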

--andy