[OpenAFS] Deploying OpenAFS on VMs
Hoskins, Matthew
Matthew.Hoskins@njit.edu
Mon, 20 Jun 2011 12:56:12 -0400
Hello all,
NJIT has been running virtualized fileservers and database servers for at least 4 years. We operate two cells: our administrative cell is fully virtualized, and our Student/Research cell is partially virtualized (full virtualization is in progress now).
OpenAFS at NJIT forms the backbone of many of our IT operations. It is critical to our system management philosophy and is used widely by administrative staff, faculty, and students. (Even though they may not always know they are using OpenAFS under the covers!)
The VM platform we use is VMware vSphere 4.1 running on 30 hosts in three HA clusters. Our storage is currently Sun/StorageTek FC SAN arrays and NetApp (11 arrays total). Each vSphere host is dual-attached to the SAN at 4 Gb/s and dual-pathed via EtherChannel to the campus network's Cisco 65xx VSS datacenter switch complex. (Migration to 10 GbE networking and 8 Gb FC is underway as we also migrate to a new hardware vendor.)
All VMs use normal VMDK virtual disks.
Our VMware HA clusters have DRS in "Fully automated" mode, so VMs will vMotion between nodes seeking the best use of available resources. vSphere does a pretty good job at this, provided you adequately configure how you want resources allocated using VMware "Resource Pools".
Overall, we have about 360 VMs in the entire environment; about 30 of those are fileservers. All other VMs and nearly every desktop on campus are clients.
Performance is good. Complaints are rare and have never been attributed to "virtualization" as the real cause. (Although it is sometimes the first thing to be blamed.) We have everything from user home directories to web content, software, and data files: wildly different workloads.
OpenAFS is a workload just like any other. It uses CPU, memory, and I/O. When building up a virtual environment, all of those resources need to be accounted for and scaled appropriately. With modern hardware, the "virtualization tax" imposed by the hypervisor abstraction is getting smaller and smaller, in large part due to work done by AMD and Intel to include virtualization instruction enhancements in their CPUs, MMUs, and chipsets.
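As a quick sanity check when sizing hosts, you can confirm those CPU extensions are actually present. Here is a minimal sketch for a Linux box (the flag names are the standard ones from /proc/cpuinfo; note that hypervisors sometimes hide them from guests):

    #!/usr/bin/env python
    # Report Intel VT-x (vmx) or AMD-V (svm) support on a Linux host.
    flags = set()
    with open("/proc/cpuinfo") as f:
        for line in f:
            if line.startswith("flags"):
                flags.update(line.split(":", 1)[1].split())
    if "vmx" in flags:
        print("Intel VT-x supported")
    elif "svm" in flags:
        print("AMD-V supported")
    else:
        print("no hardware virtualization extensions found")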
Suggestions (these are somewhat VMware-specific, but other VM platforms share these features):
*  Use a real virtualization platform (sorry, that sounds snobbish, but it's true; the free stuff doesn't cut it when you scale up). Features which are very important:
o  Dynamic Resource Scheduling (DRS) to move VMs around the cluster to seek resources
o  High Availability (HA) at the VM platform layer greatly improves total uptime. This eliminates the need to support separate HA solutions for every application; HA is implemented BELOW the OS, in the VM layer. We have one HA cluster technology. (Goodbye MS Cluster, Sun Cluster, Linux clusters, and the cluster-of-the-month club.)
o  Storage migration (VMware Storage vMotion): There have been a number of situations where we have had to "evacuate" a disk array for hardware replacement, upgrades, etc. This can be done non-disruptively.
o  Memory overcommitment
o  Integrated backup management (VMware Consolidated Backup, Site Recovery Manager, etc.)
*  Deploying VMs from templates: Templates allow us to deploy a new fileserver VM in (a small number of) minutes. Deploy the VM, run configuration scripts, done; it's ready to vos move volumes onto. This is how we perform most major software upgrades: new fileserver VMs are deployed, and old fileserver VMs are evacuated and destroyed. All non-disruptive. (A sketch of the evacuation step follows after this list.)
*  Many fileservers / smaller fileservers: This philosophy has evolved as we have moved more into virtualized fileservers. With physical hardware you are limited by ABILITY TO PURCHASE. Meaning, you can only get "x" number of servers of "n" size. So if you want highly resilient servers, you can only afford to buy a few of them, which can lead to very fat fileservers. If you go for many cheap fileservers instead, you might get more distribution but end up suffering more small individual outages. With virtualized fileservers you have full flexibility: on the virtual platform you get HA by default on every VM, and after that you get to base your fileserver layout decisions on the DATA they will store. For example, in our layout we have the following classes of fileservers:
o  Infrastructure (INF) fileservers: Very small fileservers for highly critical root.* volumes, software, and so on (the "bones" of the cell). Replicated, of course.
o  User fileservers (USR): Home volumes, nuff said.
o  Bulk fileservers (BLK): Almost everything else: projects, web content, research data, yadda yadda.
o  Jumbo fileservers (JMB): Used for ridiculously large volumes. These are the only fileservers with a VARIABLE vicep partition size. Used for archival data and some research projects.
*  Headroom is maintained on the fileservers in a mostly n+1 fashion so that at least one fileserver can be evacuated at any given time for maintenance. Almost ALL maintenance is non-disruptive. (Volumes move, fileserver VMs move from blade to blade, and even from array to array, all non-disruptively.)
*  Balancing, evacuation, and new volume placement should be automated as much as possible. (We have scripts to do this; sketches follow below.)
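To make the evacuation step concrete, here is a minimal sketch of what an "evacuate a fileserver" script boils down to. The server and partition names are made up, it assumes you already hold admin tokens, and a production script would add error handling and throttling:

    #!/usr/bin/env python
    # Sketch: move every read/write volume off OLD onto NEW with vos move.
    # Server/partition names are hypothetical; assumes a single vicep
    # partition per fileserver.
    import subprocess

    OLD, NEW, PART = "fs-old01", "fs-new01", "a"

    # "vos listvol -quiet" prints one line per volume:
    #   <name> <id> <type> <size> K On-line
    out = subprocess.check_output(["vos", "listvol", OLD, PART, "-quiet"])
    for line in out.decode().splitlines():
        fields = line.split()
        if len(fields) < 3 or fields[2] != "RW":
            continue  # skip RO/backup clones and status lines
        # vos move is transparent to clients; one volume at a time.
        subprocess.check_call(["vos", "move", fields[0], OLD, PART, NEW, PART])

Once the loop finishes, the old VM holds no volumes and can be destroyed.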
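Likewise, new volume placement can be as simple as picking the fileserver with the most free vicep space. Another sketch, again with hypothetical server and volume names, parsing vos partinfo's standard output:

    #!/usr/bin/env python
    # Sketch: create a new bulk volume on whichever BLK fileserver has
    # the most free space. All names are made up for illustration.
    import re
    import subprocess

    BLK_SERVERS = ["fs-blk01", "fs-blk02", "fs-blk03"]

    def free_kblocks(server, part="a"):
        # vos partinfo prints: "Free space on partition /vicepa: N K blocks ..."
        out = subprocess.check_output(["vos", "partinfo", server, part])
        return int(re.search(r":\s+(\d+) K blocks", out.decode()).group(1))

    target = max(BLK_SERVERS, key=free_kblocks)
    subprocess.check_call(["vos", "create", target, "a", "bulk.newproject"])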
So, to boil this down: it works. It can work, and can actually work very well, provided the system is scaled and built properly. When you consolidate workloads in this manner, you can afford to buy better hardware and storage. You get HA everywhere. Capacity that is heavily used during the day is freed up and available at night for other workloads.
Perhaps one of the nicest features is that your operation becomes more vendor-agnostic. When you have squeezed all the pennies out of your server or storage hardware, moving to new hardware is much easier: add new hardware, move VMs.
Hope this helps. If anyone has questions, please reply to the list or personally; one of us will reply.
Thanks
-Matt