
virtualization - Hosting a ZFS server as a virtual guest



I'm still new to ZFS. I've
been using Nexenta but I'm thinking of switching to OpenIndiana or Solaris 11 Express.
Right now, I'm at a point of considering virtualizing the ZFS server as a guest within
either ESXi, Hyper-V or XenServer (I haven't decided which one yet - I'm leaning towards
ESXi for VMDirectPath and FreeBSD support).



The primary reason is that I seem to have enough resources to go around that I could
easily have 1-3 other VMs running concurrently: mostly Windows Server, maybe a Linux/BSD
VM as well. I'd like the virtualized ZFS server to host all the data for the other VMs
so their data could be kept on physically separate disks from the ZFS disks (mounted
via iSCSI or NFS).



The server currently
has an AMD Phenom II with 6 total cores (2 unlocked), 16GB RAM (maxed out) and an LSI
SAS 1068E HBA with (7) 1TB SATA II disks attached (planning on RAIDZ2 with hot spare). I
also have (4) 32GB SATA II SSDs attached to the motherboard. I'm hoping to mirror two of
the SSDs to a boot mirror (for the virtual host), and leave the other two SSDs for ZIL
and L2ARC (for the ZFS VM guest). I'm willing to add two more disks to store the VM
guests and allocate all seven of the current disks as ZFS storage. Note: The motherboard
does not have IOMMU support as the 880G doesn't support it, but I
do have an 890FX board which does have IOMMU if it makes a huge
difference.



My questions
are:



1) Is it wise to do this? I don't see any obvious downside (which makes me wonder why no
one else has mentioned it). I feel like I could be making a huge oversight, and I'd hate
to commit to this and move over all my data, only to have it go fubar because of some
minute detail I missed.




2) ZFS virtual guest performance? I'm willing to take a small performance hit, but I'd
think that if the VM guest has full access to the disks, then at the very least the
disk I/O performance penalty will be negligible (in comparison to running ZFS
non-virtualized). Can anyone speak to this from experience hosting a ZFS server as a VM
guest?


Answer



I've built a number of these "all-in-one" ZFS storage setups. Initially inspired by the
excellent posts at Ubiquitous Talk
(http://blog.laspina.ca/ubiquitous/encapsulating-vt-d-accelerated-zfs-storage-within-esxi),
my solution takes a slightly different approach to the hardware design, but yields the
result of encapsulated virtualized ZFS storage.



To answer your
questions:




  • Determining
    whether this is a wise approach really depends on your goals. What are you trying to
    accomplish? If you have a technology (ZFS) and are searching for an application for it,
    then this is a bad idea. You're better off using a proper hardware RAID controller and
    running your VMs on a local VMFS partition. It's the path of least resistance. However,
    if you have a specific reason for wanting to use ZFS (replication, compression, data
    security, portability, etc.), then this is definitely possible if you're willing to put
    in the effort.


  • Performance depends heavily on your design regardless of whether you're running on
    bare metal or virtual. Using PCI passthrough (or AMD IOMMU in your case; see
    http://en.wikipedia.org/wiki/X86_virtualization#I.2FO_MMU_virtualization_.28AMD-Vi_and_VT-d.29)
    is essential, as you would be providing your ZFS VM direct access to a SAS storage
    controller and disks. As long as your VM is allocated an appropriate amount of RAM and
    CPU resources, the performance is near-native. Of course, your pool design matters.
    Please consider mirrors versus RAIDZ2. ZFS scales across vdevs and not the number of
    disks (http://web.archive.org/web/20120923015835/http://www.nex7.com/node/3).
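
To make the mirrors-versus-RAIDZ2 trade-off concrete, here is a rough Python sketch (not part of the original answer) comparing two ways to arrange six 1TB disks. The `PoolLayout` class and its field names are hypothetical, invented for illustration, and the capacity math is a simplification that ignores ZFS metadata and padding overhead.

```python
from dataclasses import dataclass


@dataclass
class PoolLayout:
    """Hypothetical helper describing a pool built from identical vdevs."""
    name: str
    vdevs: int               # number of top-level vdevs in the pool
    disks_per_vdev: int      # disks in each vdev
    redundancy_disks: int    # disks' worth of space spent on redundancy per vdev
    disk_tb: float = 1.0     # size of one disk in TB

    def total_disks(self) -> int:
        return self.vdevs * self.disks_per_vdev

    def usable_tb(self) -> float:
        # Simplified estimate: data disks per vdev times number of vdevs.
        data_disks = self.disks_per_vdev - self.redundancy_disks
        return self.vdevs * data_disks * self.disk_tb


# Six disks as three striped 2-way mirrors (the RAID 1+0 style layout).
mirrors = PoolLayout("3 x 2-way mirror", vdevs=3, disks_per_vdev=2, redundancy_disks=1)

# The same six disks as one RAIDZ2 vdev.
raidz2 = PoolLayout("1 x 6-disk RAIDZ2", vdevs=1, disks_per_vdev=6, redundancy_disks=2)

for layout in (mirrors, raidz2):
    print(f"{layout.name}: {layout.total_disks()} disks, "
          f"{layout.vdevs} vdev(s), ~{layout.usable_tb():.0f} TB usable")
```

Under these assumptions the mirrored pool gives up roughly 1TB of space compared with RAIDZ2 but has three vdevs instead of one, which is the dimension along which ZFS spreads I/O.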






My platform is VMware ESXi 5
(http://www.vmware.com/products/vsphere/esxi-and-esx/overview.html) and my preferred
ZFS-capable operating system is NexentaStor Community Edition.



This is my home server (http://www.flickr.com/photos/ewwhite/7295340538/). It is an
HP ProLiant DL370 G6 running ESXi from an internal SD card. The two mirrored 72GB disks
in the center are linked to the internal Smart Array P410 RAID controller and form a
VMFS volume. That volume holds a NexentaStor VM. Remember that the ZFS virtual machine
needs to live somewhere on stable storage.



There is an LSI 9211-8i SAS controller
(http://www.lsi.com/channel/products/storagecomponents/Pages/LSISAS9211-8i.aspx)
connected to the drive cage housing six 1TB SATA disks on the right. It is passed
through to the NexentaStor virtual machine, allowing Nexenta to see the disks as a
RAID 1+0 setup. The disks are el-cheapo Western Digital Green WD10EARS drives
(http://www.wdc.com/wdproducts/library/SpecSheet/ENG/2879-701229.pdf), aligned properly
with a modified zpool binary
(https://serverfault.com/questions/273412/zfs-nexentastor-and-4k-advanced-format-partition-alignment/273475#273475).



I am not using a ZIL device or any L2ARC
cache in this installation.




[Image: https://i.stack.imgur.com/pUb0I.jpg]



The VM has 6GB of RAM and 2 vCPUs allocated. In ESXi, if you use PCI passthrough, a
memory reservation for the full amount of the VM's assigned RAM will be created.



I
give the NexentaStor VM two network interfaces. One is for management traffic. The other
is part of a separate vSwitch and has a vmkernel interface (without an external uplink).
This allows the VM to provide NFS storage mountable by ESXi through a private network.
You can easily add an uplink interface to provide access to outside hosts.



Install your new VMs on the ZFS-exported
datastore. Be sure to set the "Virtual Machine Startup/Shutdown" parameters in ESXi. You
want the storage VM to boot before the guest systems and shut down
last.





[Image: https://i.stack.imgur.com/LduQc.png]



Here are the bonnie++ (http://www.coker.com.au/bonnie++/) and iozone results of a run
directly on the NexentaStor VM. ZFS compression is off for the test to show more
relatable numbers, but in practice, ZFS default compression (not gzip) should always be
enabled.



# bonnie++ -u root -n 64:100000:16:64



Version  1.96       ------Sequential Output------ --Sequential Input- --Random-
Concurrency   1     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec %CP
saint           12G   156  98 206597  26 135609  24   410  97 367498  21  1478  17
Latency               280ms    3177ms    1019ms     163ms     180ms     225ms
Version  1.96       ------Sequential Create------ --------Random Create--------
saint               -Create-- --Read--- -Delete-- -Create-- --Read--- -Delete--
      files:max:min  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP
    64:100000:16/64  6585  60 58754 100 32272  79  9827  58 38709 100 27189  80
Latency              1032ms     469us    1080us     101ms     375us   16108us


# iozone -t1 -i0 -i1 -i2 -r1m -s12g





Iozone: Performance Test of File I/O

Run began: Wed Jun 13 22:36:14 2012

Record Size 1024 KB
File size set to 12582912 KB
Command line used: iozone -t1 -i0 -i1 -i2 -r1m -s12g
Output is in Kbytes/sec
Time Resolution = 0.000001 seconds.
Throughput test with 1 process
Each process writes a 12582912 Kbyte file in 1024 Kbyte records

Children see throughput for 1 initial writers = 234459.41 KB/sec
Children see throughput for 1 rewriters       = 235029.34 KB/sec
Children see throughput for 1 readers         = 359297.38 KB/sec
Children see throughput for 1 re-readers      = 359821.19 KB/sec
Children see throughput for 1 random readers  =  57756.71 KB/sec
Children see throughput for 1 random writers  = 232716.19 KB/sec
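
For easier reading, the iozone figures above can be converted from KB/sec to MB/sec. The short Python sketch below is not part of the original answer; it assumes iozone's "KB" means 1024 bytes, so the results are strictly MiB/sec, and the helper name `to_mib_per_s` is made up for illustration.

```python
# iozone throughput figures copied from the run above, in KB/sec.
IOZONE_KB_PER_S = {
    "initial write": 234459.41,
    "rewrite": 235029.34,
    "read": 359297.38,
    "re-read": 359821.19,
    "random read": 57756.71,
    "random write": 232716.19,
}

FILE_SIZE_KB = 12582912  # "File size set to 12582912 KB" (the -s12g argument)


def to_mib_per_s(kb_per_s: float) -> float:
    """Convert KB/sec to MiB/sec, assuming 1 KB = 1024 bytes."""
    return kb_per_s / 1024.0


results = {name: to_mib_per_s(rate) for name, rate in IOZONE_KB_PER_S.items()}

for name, rate in results.items():
    print(f"{name:>13}: {rate:7.1f} MiB/s")

# Approximate time the initial sequential write of the test file took.
initial_write_seconds = FILE_SIZE_KB / IOZONE_KB_PER_S["initial write"]
print(f"initial write of the 12G file took roughly {initial_write_seconds:.0f} s")
```

Read this way, sequential writes land a little under 230 MiB/s and sequential reads around 350 MiB/s, while random reads are far lower, as expected for spinning disks.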



This is a NexentaStor DTrace graph showing the storage VM's IOPS and transfer rates
during the test run. 4000 IOPS and 400+ megabytes/second is pretty reasonable for such
low-end disks (big block size, though).

[Image: https://i.stack.imgur.com/AUKG9.png]



Other notes.




  • You'll want to test your SSDs to see if they can be presented directly to a VM or if
    DirectPath takes the entire motherboard controller.

  • You don't have much CPU power, so limit the storage VM to 2 vCPUs.

  • Don't use
    RAIDZ1/Z2/Z3 unless you really need the disk
    space.


  • Don't use deduplication. Compression is
    free and very useful for VMs. Deduplication would require much more RAM + L2ARC in order
    to be effective.

  • Start without the SSDs and add them if necessary. Certain workloads don't hit the
    ZIL or L2ARC
    (https://serverfault.com/questions/228743/whats-in-my-zfs-arc-and-l2arc-caches).

  • NexentaStor is a complete package. There's a benefit to having a solid management
    GUI; however, I've heard of success with Napp-It as well.
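
As a rough illustration of why deduplication needs so much memory, the Python sketch below estimates the size of the dedup table (DDT). It is not from the original answer and rests on two loudly stated assumptions: roughly 320 bytes of RAM per unique block (a commonly quoted rule of thumb, not a measured figure) and an average block size of 64 KiB. The function name `ddt_ram_gib` is hypothetical.

```python
# ASSUMPTION: ~320 bytes of RAM per dedup table entry, a commonly quoted
# rule of thumb. Real usage varies by ZFS version and workload.
ASSUMED_DDT_BYTES_PER_BLOCK = 320


def ddt_ram_gib(unique_data_tib: float, avg_block_kib: float) -> float:
    """Estimate dedup table size in GiB for a given amount of unique data."""
    unique_bytes = unique_data_tib * 2**40
    unique_blocks = unique_bytes / (avg_block_kib * 1024)
    return unique_blocks * ASSUMED_DDT_BYTES_PER_BLOCK / 2**30


# Example: 3 TiB of unique data stored in 64 KiB blocks (both assumed values).
estimate = ddt_ram_gib(unique_data_tib=3.0, avg_block_kib=64)
print(f"Estimated dedup table size: ~{estimate:.0f} GiB")
```

Under these assumptions the table alone would be about as large as the 16GB of RAM in the whole host, before the storage VM or any other guest gets memory, which is consistent with the advice to skip deduplication and rely on compression.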


