xhyve vs vbox driver benchmarks for docker-machine

Getting a usable and productive dev environment working with Docker on OS X is not exactly trivial, although it is getting much better.  If you have spent any time working with docker-machine and Docker on OS X you’ve probably run across some type of roadblock along the way.

If you have used docker-machine you are probably familiar with the VirtualBox driver, the driver that ships as the default.  Obviously it works out of the box, but if you have used VirtualBox for any amount of time you have probably discovered some of its quirks.  My biggest gripe thus far with VBox is that its shared folders technology for syncing files between the host and VM is slooooow.  In fact, I have written about my own workaround here.

I have run into some other performance issues using VBox.  This write-up is a very detailed comparison of the performance between VBox and VMware.  The tl;dr of the post is that the VMware hypervisor has better performance.  To Oracle’s credit though, many of these performance issues have actually been addressed in the VBox 5.0.0 release, so if you aren’t running on 5.x, definitely make the jump.  The Docker Toolbox ships the newer release so there is no reason not to upgrade.

Making the jump to VBox 5.x may well solve your problems, but I have been curious about what other options are out there.  Recently, as of July 2015, the xhyve hypervisor project has been available on OS X.  xhyve is a port of the bhyve project, and it aims to bring high-performance virtualization with a light footprint to OS X.  It is still very young but shows a lot of potential.

Even younger than the xhyve project itself is the xhyve driver for docker-machine.  It is not an officially supported driver yet, though it looks like it is well on its way.  Definitely keep an eye on the xhyve and docker-machine xhyve projects if you are looking for an alternative to either VBox or VMware.  The docker-machine xhyve driver project recently closed a ticket to get the driver added to brew, so it is now much less complicated to get working.

Xhyve installation

I will be going over the bare minimum installation instructions for getting everything working.  If you are interested in more of the details, I suggest taking a look at this awesome blog post, which goes into depth on how to install and use the docker-machine xhyve driver.

Make sure you have brew installed first.  You will also need brew cask, which you can add from the command line with the following command.

brew tap caskroom/cask

Once you have cask installed you should be able to install the remaining components.

brew update
brew install xhyve
brew cask install dockertoolbox

This might take a little bit depending on how fast your internet connection is.  After you have the toolbox installed, go grab the docker-machine xhyve driver.

brew install docker-machine-driver-xhyve

If you have the dockertoolbox installed already you might see some errors in the output.  This just means there was a version conflict somewhere.  As of docker-machine version 0.5.6_1, support has been added for the xhyve driver.

There is currently a caveat to using this driver: you need to change some permissions so the driver can run with root privileges, which it needs in order to set up networking for the VM.  This should hopefully be fixed in the future but is at least something to be aware of.

sudo chown root:wheel $(brew --prefix)/opt/docker-machine-driver-xhyve/bin/docker-machine-driver-xhyve
sudo chmod u+s $(brew --prefix)/opt/docker-machine-driver-xhyve/bin/docker-machine-driver-xhyve

You will also need to clean out your /etc/exports file if you have made changes.

sudo mv /etc/exports{,.backup} && touch /etc/exports

Then create the machine.

docker-machine create --driver xhyve --xhyve-experimental-nfs-share test

If you can interact with the Docker daemon you should be in business.
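
To double-check, point your shell at the new machine and query the daemon; something along these lines should do it for the test machine created above.

eval $(docker-machine env test)    # export DOCKER_HOST and friends for the new machine
docker info                        # if this returns daemon details, the machine is working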

Benchmark results

The remainder of the post describes the benchmark and performance results of the VBox driver and the xhyve driver.  If you are only interested in getting the xhyve driver working then feel free to skim through the benchmarks, but be sure to take a look at the conclusion for the final verdict.

Below are the specs of the OS X machine that was used to run my benchmarks.

  • OS X 10.10.5
  • Virtualbox 5.0.12
  • xhyve 0.2.0
  • docker-machine-xhyve 0.2.2

Many of the ideas I used for the benchmark tests were taken from the post linked above, which describes in great detail the methodology and tools that were used to benchmark each of the drivers.

Benchmarking on the boot2docker VM is tricky because it is mostly a read-only file system and there is no package manager, so I relied on running the benchmarks inside containers, using a few different methodologies for my testing.  The first was borrowed from the simple-container-benchmarks project on Docker Hub.  This benchmark gives a good idea of the overall write performance and CPU performance of a container running inside the VM.  For network performance I used the iperf3 image located on Docker Hub.
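
For a rough idea of the shape of the network test, it boils down to starting an iperf3 server container and pointing a client container at it, something like the sketch below (the image name is an assumption; use whatever the linked project documents).

# start an iperf3 server in one container (image name is an assumption)
docker run -d --name iperf-server networkstatic/iperf3 -s
# look up the server container's IP on the docker bridge
SERVER_IP=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' iperf-server)
# run a client against it for the default 10 second test
docker run --rm networkstatic/iperf3 -c $SERVER_IP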

Below are the results of a few runs for both the VBox driver and the xhyve driver.  I have left out the specific commands here as they are included in the links to each benchmark; use those links for specific instructions on how to run the benchmarks yourself.  The results were interesting because I was expecting the xhyve driver to outperform the VBox driver.

Virtualbox results

container benchmark results (FS write and CPU)

Client mode...
Target: 172.17.0.2
------------------------------
Performance benchmarks
------------------------------
dockerhost: tcp://192.168.99.100:2376
host: 172.17.0.2 a8b790317264
eth0: 172.17.0.2
date: Sat Jan 23 02:24:20 UTC 2016

------------------------------
FS write performance
------------------------------
1073741824 bytes (1.1 GB) copied, 2.39743 s, 448 MB/s
1073741824 bytes (1.1 GB) copied, 2.35377 s, 456 MB/s
1073741824 bytes (1.1 GB) copied, 1.9075 s, 563 MB/s
1073741824 bytes (1.1 GB) copied, 2.37838 s, 451 MB/s
1073741824 bytes (1.1 GB) copied, 2.03373 s, 528 MB/s
1073741824 bytes (1.1 GB) copied, 1.94024 s, 553 MB/s
1073741824 bytes (1.1 GB) copied, 1.99546 s, 538 MB/s
1073741824 bytes (1.1 GB) copied, 2.00287 s, 536 MB/s
1073741824 bytes (1.1 GB) copied, 1.5292 s, 702 MB/s
1073741824 bytes (1.1 GB) copied, 1.92617 s, 557 MB/s

------------------------------
CPU performance
------------------------------
268435456 bytes (268 MB) copied, 22.6775 s, 11.8 MB/s
268435456 bytes (268 MB) copied, 22.1466 s, 12.1 MB/s
268435456 bytes (268 MB) copied, 30.7552 s, 8.7 MB/s
268435456 bytes (268 MB) copied, 22.2861 s, 12.0 MB/s
268435456 bytes (268 MB) copied, 22.5571 s, 11.9 MB/s
268435456 bytes (268 MB) copied, 21.9901 s, 12.2 MB/s
268435456 bytes (268 MB) copied, 21.8232 s, 12.3 MB/s
268435456 bytes (268 MB) copied, 31.3903 s, 8.6 MB/s
268435456 bytes (268 MB) copied, 28.1219 s, 9.5 MB/s
268435456 bytes (268 MB) copied, 31.0172 s, 8.7 MB/s

------------------------------
System info
------------------------------
             total       used       free     shared    buffers     cached
Mem:       1019960     313288     706672     113104       7808     132532
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 58
Stepping:              9
CPU MHz:               2294.770
BogoMIPS:              4589.54
Hypervisor vendor:     KVM
Virtualization type:   full
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              3072K

iperf results

Connecting to host 172.17.0.3, port 5201
[  4] local 172.17.0.4 port 39476 connected to 172.17.0.3 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  2.09 GBytes  17.9 Gbits/sec  2321   1.03 MBytes
[  4]   1.00-2.00   sec  2.46 GBytes  21.1 Gbits/sec  496    980 KBytes
[  4]   2.00-3.00   sec  2.24 GBytes  19.3 Gbits/sec  339   1.77 MBytes
[  4]   3.00-4.00   sec  2.54 GBytes  21.8 Gbits/sec  1355    389 KBytes
[  4]   4.00-5.00   sec  2.10 GBytes  18.0 Gbits/sec  106    495 KBytes
[  4]   5.00-6.00   sec  3.00 GBytes  25.7 Gbits/sec  217    411 KBytes
[  4]   6.00-7.00   sec  2.60 GBytes  22.4 Gbits/sec  440   1.72 MBytes
[  4]   7.00-8.00   sec  2.06 GBytes  17.7 Gbits/sec    0   1.72 MBytes
[  4]   8.00-9.00   sec  2.07 GBytes  17.8 Gbits/sec    0   1.72 MBytes
[  4]   9.00-10.00  sec  2.51 GBytes  21.6 Gbits/sec  876    713 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  23.7 GBytes  20.3 Gbits/sec  6150             sender
[  4]   0.00-10.00  sec  23.7 GBytes  20.3 Gbits/sec                  receiver

iperf Done.

xhyve results

container benchmark results (FS write and CPU)

Client mode...
Target: 172.17.0.2
------------------------------
Performance benchmarks
------------------------------
dockerhost:
host: 172.17.0.2 2c8d9ba61eae
eth0: 172.17.0.2
date: Sat Jan 23 02:08:15 UTC 2016

------------------------------
FS write performance
------------------------------
1073741824 bytes (1.1 GB) copied, 8.24671 s, 130 MB/s
1073741824 bytes (1.1 GB) copied, 5.89179 s, 182 MB/s
1073741824 bytes (1.1 GB) copied, 6.05392 s, 177 MB/s
1073741824 bytes (1.1 GB) copied, 5.37728 s, 200 MB/s
1073741824 bytes (1.1 GB) copied, 4.824 s, 223 MB/s
1073741824 bytes (1.1 GB) copied, 5.90409 s, 182 MB/s
1073741824 bytes (1.1 GB) copied, 5.22375 s, 206 MB/s
1073741824 bytes (1.1 GB) copied, 5.07298 s, 212 MB/s
1073741824 bytes (1.1 GB) copied, 5.89058 s, 182 MB/s
1073741824 bytes (1.1 GB) copied, 4.80828 s, 223 MB/s

------------------------------
CPU performance
------------------------------
268435456 bytes (268 MB) copied, 25.478 s, 10.5 MB/s
268435456 bytes (268 MB) copied, 31.3984 s, 8.5 MB/s
268435456 bytes (268 MB) copied, 24.698 s, 10.9 MB/s
268435456 bytes (268 MB) copied, 31.1973 s, 8.6 MB/s
268435456 bytes (268 MB) copied, 23.3705 s, 11.5 MB/s
268435456 bytes (268 MB) copied, 23.3973 s, 11.5 MB/s
268435456 bytes (268 MB) copied, 23.7405 s, 11.3 MB/s
268435456 bytes (268 MB) copied, 23.6118 s, 11.4 MB/s
268435456 bytes (268 MB) copied, 23.5606 s, 11.4 MB/s
268435456 bytes (268 MB) copied, 24.3341 s, 11.0 MB/s

------------------------------
System info
------------------------------
             total       used       free     shared    buffers     cached
Mem:       1020028     291632     728396      70356       6420      89824
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    1
Core(s) per socket:    1
Socket(s):             2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 58
Stepping:              9
CPU MHz:               2294.450
BogoMIPS:              4607.99
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              3072K

iperf results

Connecting to host 172.17.0.2, port 5201
[  4] local 172.17.0.3 port 49244 connected to 172.17.0.2 port 5201
[ ID] Interval           Transfer     Bandwidth       Retr  Cwnd
[  4]   0.00-1.00   sec  2.29 GBytes  19.7 Gbits/sec    0   1.90 MBytes
[  4]   1.00-2.00   sec  2.84 GBytes  24.4 Gbits/sec  567    953 KBytes
[  4]   2.00-3.00   sec  2.16 GBytes  18.6 Gbits/sec  327    667 KBytes
[  4]   3.00-4.00   sec  2.32 GBytes  19.9 Gbits/sec  166   1.52 MBytes
[  4]   4.00-5.00   sec  2.63 GBytes  22.6 Gbits/sec  565    769 KBytes
[  4]   5.00-6.00   sec  2.71 GBytes  23.3 Gbits/sec  608    583 KBytes
[  4]   6.00-7.00   sec  2.67 GBytes  22.9 Gbits/sec  217   1.40 MBytes
[  4]   7.00-8.00   sec  2.98 GBytes  25.6 Gbits/sec  782    498 KBytes
[  4]   8.00-9.00   sec  2.80 GBytes  24.0 Gbits/sec  359   1.01 MBytes
[  4]   9.00-10.00  sec  2.43 GBytes  20.9 Gbits/sec  883    467 KBytes
- - - - - - - - - - - - - - - - - - - - - - - - -
[ ID] Interval           Transfer     Bandwidth       Retr
[  4]   0.00-10.00  sec  25.8 GBytes  22.2 Gbits/sec  4474             sender
[  4]   0.00-10.00  sec  25.8 GBytes  22.2 Gbits/sec                  receiver

iperf Done.

Conclusion

I was on the fence about VBox performance but the proof is in the pudding here with the test results.  The VBox driver had significantly better FS write performance (roughly two to three times the throughput).  CPU performance was about equal overall, and network throughput was also very similar.  I suspect CPU performance would be in favor of xhyve if these tests were run using VBox 4.x.  Regardless, equal CPU performance, similar network throughput and significantly better FS writes tip the scale in favor of the VBox driver.

As frustrating as it can be at times to use VBox, many of its past performance issues have been fixed as of the 5.0 release.  The shared folder issue still exists, but it is largely taken care of by the great, easy-to-use tools that the Docker community has written; docker-machine-nfs is my favorite.

Surprisingly, or maybe not THAT surprisingly, xhyve actually performs worse than VirtualBox at this point.  xhyve itself and the docker-machine xhyve driver are both still super young, so there is definitely room for growth.  That said, it was very straightforward to get xhyve and the docker-machine driver installed and configured, so I believe it is just a matter of time before the xhyve driver matures to a point where it can replace other drivers.  One downside of the xhyve driver is that it also suffers from the host-to-VM shared folder issue, and the current best workaround is to use the --xhyve-experimental-nfs-share flag that the driver offers.

I will definitely have my eye on the xhyve project moving forward because it looks to be a great alternative to other virtualization technologies for OS X once it reaches a point of maturity.  For now, VBox works more than sufficiently, has been around for a long time, is pretty much ubiquitous across platforms and the developers have shown that they are still actively working on improving the project with the recent 5.0 release.



Useful tools for Docker development

Docker is still a young project, and as such the ecosystem around it hasn’t yet matured to the point that many people feel comfortable using it.  It is nice to have such a fast-growing set of tools, but the downside is that many of them are not production ready.  I think as the ecosystem solidifies and Docker adoption grows we will see a healthy set of solid, production-ready tools built off of the current generation of tools.

Once you get introduced to the concepts and ideas behind Docker you quickly realize the power and potential that it holds.   Inevitably though, there comes a “now what?” moment where you basically realize that Docker can do some interesting things but get stuck because there are barriers to simply dropping Docker into a production environment.

One problem is that you can’t simply “turn on” Docker in your environment; you need tools to help manage images and containers, orchestration, development, and so on.  So there are a number of challenges to taking Docker and doing useful and interesting things with it once you get past the introductory novelty of building an image and deploying simple containers.

I will attempt to make sense of the current state of Docker and to help take some of the guesswork out of which tools to use in which situations and scenarios for those who are hesitant to adopt Docker.  This post will focus mostly on the development aspects of the Docker ecosystem because that is a nice gateway to working with and getting acquainted with Docker.

Boot2Docker

As you may be aware, Docker does not (yet) run natively on OS X or Windows.  This can definitely be a hindrance to Docker adoption amongst developers.  Boot2Docker massively simplifies this issue by essentially creating a sandbox to work with Docker, acting as a thin layer between Docker and the Mac (or Windows) host via the boot2docker VM.

You can check it out here, but essentially you will download a package and install it and you are ready to start hacking away on Docker on your Mac.  Definitely a must for Mac OS as well as Windows users that are looking to begin their Docker journey, because the complexity is completely removed.

Behind the scenes, a number of things get abstracted away and simplified with Boot2Docker, like setting up SSH keys, managing network interfaces, setting up VM integrations and guest additions, etc.  Boot2Docker also bundles a CLI for managing the VM that runs Docker, so it is easy to manage and configure the VM from the terminal.
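
For example, going from zero to a working Docker environment on a Mac looks roughly like this with the boot2docker CLI (a rough sketch; exact output and options vary by version).

boot2docker init                   # create the boot2docker VM
boot2docker up                     # boot the VM
eval "$(boot2docker shellinit)"    # export DOCKER_HOST and friends into this shell
docker version                     # verify the client can reach the daemon inside the VM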

CoreOS

It would take many blog posts to try to describe everything that CoreOS and its tooling can do.  The reason I am mentioning it here is that CoreOS is one of those core building blocks that is quickly becoming necessary in any Docker environment.  Docker as it is today is not specifically designed for distributed workloads, and as such doesn’t provide much of the tooling for solving the challenges that accompany distributed systems.  CoreOS bridges this gap very well.

CoreOS is a minimal Linux distribution that aims to help with a number of Docker-related tasks and challenges.  It is distributed by design, so it can do some really interesting things with images and containers using etcd, systemd, fleet, confd and others as the platform continues to evolve.
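
As a small taste of that tooling, etcd provides a cluster-wide key/value store and fleet schedules systemd units across the cluster; the key path and unit name below are made up for illustration.

etcdctl set /services/web/host 10.0.0.10   # write a value every machine in the cluster can see
etcdctl get /services/web/host             # read it back from any node
fleetctl start myapp.service               # schedule a systemd unit somewhere in the cluster
fleetctl list-units                        # see which machine each unit landed on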

Because of this tooling and philosophy, CoreOS machines can be rebooted on the fly without interrupting services and clustered processes across machines.  This means that maintenance can occur whenever and wherever, which makes the resiliency factor very high for CoreOS servers.

Another highlight is its update model, which is push based.  Instead of manually updating servers with security patches, the CoreOS maintainers periodically push updates to servers, alleviating the need to update all the time.  This was very nice when the Shellshock vulnerability was released, because within a day or so a patch was automatically pushed to all CoreOS servers, automating the otherwise tedious process of updating servers, especially without config management tools.

Fig

Fig is a must-have for anybody who works with Docker on a regular basis, i.e. developers.  Fig allows you to define your environment in a simple YAML config file and then bring up an entire development environment with one command, fig up.

Fig works very well for a development workflow because you can rapidly prototype and test how Docker images will work together and catch issues that would otherwise be hard to test for.  For example, if you are working on an application stack you can simply define how the different containers should work and interact together in the fig file.
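
A minimal sketch of what that fig file might look like for a two-container web and database stack (the service names and port are made up, and a Dockerfile for the web app is assumed to live in the current directory):

cat > fig.yml <<'EOF'
web:
  build: .           # build the web app image from the local Dockerfile
  ports:
    - "8000:8000"    # publish the app on the host
  links:
    - db             # make the database reachable from the web container as "db"
db:
  image: postgres    # use a stock postgres image
EOF
fig up               # build, create and start the whole stack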

The downside to fig is that in its current form, it isn’t really equipped to deal with distributed Docker hosts, something that you will find a large number of projects are attempting to solve and simplify.  This shouldn’t be an issue though, if you are aware of its limitations beforehand and know that there are some workloads that fig is not built for.

Panamax

This is a cool project out of CenturyLink Labs that aims to solve problems around Docker app development and orchestration.  Panamax is similar to Fig in that it stitches containers together logically, but it differs in a few regards.  First, Panamax builds off of CoreOS to leverage some of its built-in tools, etcd, fleet, etc.  Another thing to note is that currently Panamax only supports single-host deployments.  The creators of Panamax have stated that clustered, multi-host support is in the works, but for now you will have to use Panamax on a single host.

Panamax simplifies Docker image and application orchestration behind the scenes and places a nice layer of abstraction on top of this process, so that managing the Docker image “stack” becomes even easier through a slick GUI.  With the GUI you can set environment variables, link containers together, and bind ports and volumes.

Panamax draws a number of its concepts from Fig.  It uses templates as the underlying way to compose containers and applications, which is similar to the Fig config files, as both use YAML files to compose and orchestrate Docker container behavior.  Another cool thing about Panamax is that there is a public template repo for getting different application and container stacks up and running, so the community participation is a really nice aspect of the project.

If hand-writing config files on the command line isn’t an ideal solution in your environment, this tool is definitely worth a look.  Panamax is a great way to quickly develop and prototype Docker containers and applications.

Flocker

This is a very young but interesting project, notable for the way that it handles volume management.  Right now one of the biggest challenges to widespread Docker adoption is exactly the problem that Flocker solves: the ability to persist storage across distributed hosts.

From their github page:

Flocker is a data volume manager and multi-host Docker cluster management tool. With it you can control your data using the same tools you use for your stateless applications by harnessing the power of ZFS on Linux.

Basically, Flocker uses some ZFS magic behind the scenes to allow volumes to float between servers, providing persistent storage across machines and containers.  That is a huge win for building distributed systems that require persistent data and storage, e.g. databases.

Definitely keep an eye on this project for improvements and look for them to push this area in the future.  The creators have said it isn’t production ready quite yet but is a great tool to use in a test or staging environment.

Flynn

Flynn touts itself as a Platform as a Service (PaaS) built on Docker, in a very similar vein to Heroku.  Having a Docker PaaS is a huge win for developers because it simplifies developer workflow.  There are some great benefits to having a PaaS in your environment; the subject could easily expand to be its own topic of conversation.

The approach that Flynn takes (and PaaS in general) is that operations should be a product team.  With Flynn, ops can provide the platform and developers can focus on their own tasks, developing and testing software, instead of fighting operations.  Flynn does a nice job of decoupling operations tasks from dev tasks so that developers don’t need to rely on operations to do their work, and operations don’t need to concern themselves with development tasks, which can otherwise cause friction and efficiency issues.

Flynn works by tying together a number of different tools, each created specifically to solve one of the challenges of building a PaaS on Docker (scheduling, persistent storage, orchestration, clustering, etc.), into one single entity.

Currently its developers state that Flynn is not quite suitable for production use yet, but it is still mature enough to use and play around with and even deploy apps to.

Deis

Deis is another PaaS for Docker, aiming to solve the same problems and challenges that Flynn does, so there is definitely some overlap in the projects as far as end users are concerned.  There is a nice CLI tool for managing and interacting with Deis, and it offers much of the same functionality that either Heroku or Flynn offers.  Deis can do things like horizontal application scaling, supports many different application frameworks and is open source.
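
To give a feel for the workflow, the Heroku-style loop with the Deis CLI looks roughly like this (the app name is made up for illustration).

deis create myapp          # register a new application on the platform
git push deis master       # deploy the current code with a git push
deis scale web=3           # scale the web process type to three containers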

Deis is similar in concept to Flynn in that it aims to solve PaaS challenges but they are quite different in their implementation and how they actually achieve their goals.

Both Flynn and Deis aim to create platforms to build Docker apps on top of, but they go about it by somewhat different means.  As the creator of Deis explains, Deis takes a more practical approach to solving PaaS issues: it takes a number of technologies and tools that have already been created, fits them together, and only builds the pieces that are missing.  Flynn is more ambitious in its approach, implementing a number of its own tools and solutions, including its own scheduler, registration service and so on, and relying on only a few tools that already exist.  For example, while Flynn builds all of these different pieces itself, Deis leverages CoreOS to do many of the tasks it needs to operate, minimally bolting on the tooling it needs to function correctly.

Conclusion

As the Docker ecosystem continues to evolve, more and more options seem to be sprouting up.  There are already a number of great tools in the space, and as the community matures I believe the current tools will continue to improve and new, useful tools will be built for Docker-specific workloads.  It is really cool to see how the Docker ecosystem is growing and how the tools and technologies are disrupting traditional views on a number of areas in tech, including virtualization, DevOps, deployments and application development, among others.

I anticipate that Docker adoption will continue growing for the foreseeable future as the core Docker project continues to improve and stabilize, along with the tools built around it that I have discussed here.  It will be interesting to see where things are even six months from now in regards to the adoption and use cases that Docker has created.


Nexus 1000v in a Hyper-V 2012 Environment (Part 1)

In the next few posts I will be going over some of the basics on how to get the Nexus 1000v set up and working in a Hyper-V environment.  I must warn readers ahead of time, this product was just released (as of a week or two ago) and the Cisco documentation is seriously lacking.  What documentation does exist is thoroughly confusing, so it may take some time to work through all of the issues.  Just as irritating, if not more so, the Hyper-V way of doing things is equally confusing.  Taking on a project like this will surely improve your skills and abilities with virtualization, especially network virtualization.  I must admit, this stuff can get very confusing at first, so don’t worry if you don’t understand everything right away; be patient and it will eventually start making more sense.

First I need to lay some groundwork.  I think it’s important, not only in this example but as a good habit in general, to spec out a project and figure out all of the requirements so you have everything lined up before tackling it.  A few important considerations when working with the 1000v are to make sure the networking and NICs on the Hyper-V hosts are set correctly, Virtual Machine Manager (SCVMM) is installed and configured, the network is configured (LACP port channels, trunk ports, correct VLAN assignment, etc.) and that configuring all of these pieces won’t cause any downtime or other issues with your production network.  Ideally, all of this would be thought of and set up ahead of time.  Luckily I have a test environment with SCVMM to work in, so I do not have to worry about any real-world downtime or production issues.

One of the most important things is getting the underlying Hyper-V network stack configured properly.  I try to mimic a production-type environment as much as possible, so this configuration is a typical design you may see in the real world.  Let’s lay out the structure of the design.

  • Management VLAN(s)
  • DMZ VLAN(s)
  • Inside VLAN(s)
  • Live Migration VLAN(s)

It is common to break these out over different physical connections, so as an example you might see 4 different NICs on the Hyper-V host connecting to a switch that has 4 different VLANs configured.  If you want redundancy you can add NIC teaming into this scenario (which is now native in Server 2012, which is nice).  I have limited resources, so I am using a single NIC for management, DMZ and live migration traffic, and teaming the inside connection with 2 NICs.  Here is a crude example of how this is set up.

Hyper-V architecture

If you are setting this up in a clustered environment, you would want these settings to be identical across all Hyper-V hosts.  Once this is set up correctly, make sure you have SCVMM installed and configured.  That is a separate process and therefore out of the scope of this post; I’d be happy to answer any questions you have, I’m just not discussing it here.  You will need to grab the Cisco Nexus 1000v for Hyper-V.  To download the files necessary for installation you will need a valid Cisco ID (let me know if you don’t have one).  Cisco also provides some documentation as well as some installation video links, but I have found them to be less than helpful to be honest.  There is some useful information in them to be sure; I just want to walk you through the process myself because there were a few caveats and the documentation creates a lot of unnecessary confusion.

There is some basic terminology to be familiar with when getting the 1000v up and going; it helps to understand how and why the different parts work the way they do when running through the installation.

  • vsm – virtual supervisor module.  This logically controls the virtual switch and can be thought of as the supervisor, with the different VEMs acting as virtual line cards that it manages.
  • vem – virtual ethernet module.  This is the piece that actually replaces the virtual switch on each host.
  • nsm – network segmentation manager.

Once you have the 1000v downloaded you need to make sure you run the installation for it on the server that is hosting SCVMM.  The installer is hidden in the following location,

\Nexus1000v.5.2.1.SM1.5.1\VSM\Installer_App\Cisco.Nexus1000VInstaller.UI.exe

When you run this executable it should bring up a GUI to install and configure the virtual switch(es).  You will need to use an account that is a member of the SCVMMAdmins group in Active Directory; otherwise the installer will not be able to connect to SCVMM and will not be able to create and configure the VM for the new virtual switch.

Authenticate to SCVMM

The next portion of the installer is where things may get confusing if you don’t know what you are looking for.  I have linked to the sample configuration I used in my lab to help with this.  Since this is what I used in my test environment, I know that at least at one point this configuration worked.  It would be a good idea to deploy the VSMs in high availability if you can; otherwise it isn’t a big deal.

  • Choose a meaningful VSM name; this is basically the same as the host name.
  • The ISO install location is \Nexus1000v.5.2.1.SM1.5.1\VSM\Install\nexus-1000v.5.2.1.SM1.5.1.iso.
  • From the documentation I’ve read, the VEM MSI location indicated is a little misleading because it points at the wrong installation file.  It should point at \Nexus1000v.5.2.1.SM1.5.1\VMM\Nexus1000V-VSEMProvider-5.2.1.SM1.5.1.0.msi.
  • The VSM IP address should be an address in your management network; it can basically be thought of as the address to use to connect to the 1000v virtual switch.
  • The subnet mask should be fine as 255.255.255.0.
  • The gateway IP should match up with the VSM IP address; essentially they just need to be on the same subnet.
  • The domain ID is an arbitrary number that is associated with the virtual network.  For most use cases you should be able to use one ID, 1000 in my example.
  • Use the VLAN ID that your VSM is on; in my case it is my management VLAN.
  • Since our management VLAN is the same as the VSM VLAN (typical in most deployments), simply choose “Yes” here.

1000v deployment config

At this point everything should be configured; the installer just needs to go out and create the VMs and take care of getting everything up and running.  It may take a while, so take a break if needed and come back later.

Wait for the installation to finish

Everything should complete successfully; if not, you will need to look at the log file and troubleshoot any errors you may have.

Installation summary

Almost done.  Everything should be out there and running but there is still one very important step left.  If you notice, about halfway down the installation summary page there is a username/password of admin and admin.  This obviously will change once the 1000v gets put into use but there is NOTHING in the documentation that tells you that this will break the configuration in SCVMM!

What you need to do is hop on the SCVMM server and manually configure the credentials that are used to connect to the 1000v switch.  To do this, drill down into the security settings in SCVMM by flipping open the Configuration pane -> Security -> Runas accounts -> Right click your 1000v admin account and select properties.

Updating the admin account in SCVMM

Then you will change the username and password to match the credentials that you have set on the 1000v. This will allow the switch to communicate with the SCVMM server so that 1000v network settings can be managed through Hyper-V.

In Part 2 I will discuss the intricacies of configuring the 1000v as well as how to reflect these settings in your Hyper-V virtual environment.  Since this is a brand new product, there are still some things that need to get worked out, especially the documentation.  And as I mentioned earlier, the network settings in Hyper-V and SCVMM can be extremely confusing the first time you see them.  Working through and troubleshooting these issues will quickly help improve your knowledge and understanding of how Hyper-V and the Nexus 1000v work together to improve virtual networking.  If you have any questions or concerns about any of this I will try to help, but I am not promising anything at this point.


Fix the “Can’t create highly Available VM” error in VMM

I was doing some lab testing with Virtual Machine Manager (VMM) for my Hyper-V environment the other day when I ran across this issue.  It was happening because I was attempting to save my disk image to a CSV (Cluster Shared Volume) location, and I couldn’t figure out why it was blowing up.

It turns out that the answer is really simple but annoying to fix.  Here is what the error looks like in VMM: “Cannot create or update a non highly available virtual machine because the path cluster storage volume is a clustered resource error”.  Surprisingly, Google didn’t turn up much on how to fix this, so I thought I would make a note of it in case others happen to run across it.

VMM error

The fix is simple enough; I just wish VMM was smart enough to know this when you select a CSV to use for storage.  When you go through the VM creation process and get to the hardware selection tab, check the “Make this virtual machine highly available” check box.

VM high availability

If you happen to be building a lot of identical (or nearly identical) VMs, I recommend checking out profiles in VMM, in this case a specific hardware profile for desktop VMs.  Creating profiles simplifies the build and creation process, and you don’t need to worry about picking all the right settings just to keep your VM creation from blowing up.


WTF Friday

Lately I have been building a Windows Hyper-V v3 clustered lab environment with all the bells and whistles.  It has been a great learning experience, and I can honestly say that I am enjoying Hyper-V overall thus far.  Recently I decided to take the plunge and begin experimenting with System Center Virtual Machine Manager (SCVMM), and managed to run across a bizarre issue last Friday.  The reason I am posting is that there were basically no real clues for this problem, so I would like to go over some of the various things that I looked at and ultimately how the issue was resolved.  I feel this post may be useful to others because a lot of this stuff is relatively new and there wasn’t a ton of material out there on this specific problem to use as a reference.

The installation process is relatively straightforward.  The environment I am using is Server 2012, so as a prerequisite you must use SCVMM 2012 w/SP1 in order for this to work.  If you are using 2008 R2 you can use SCVMM 2012.  I used this guide as a reference for the installation instructions, which more or less go like this:

  1. Create your SCVMM accounts in AD.  scvmmadmin (admin account), scvmmsvc (service account), scvmmadmins (admin group).
  2. Install/point the SCVMM server to SQL 2012.  I won’t go over SQL installation because it is beyond the scope of this post.
  3. Install the prerequisites on your SCVMM server.  ADK for Windows 8, SQL 2012 native client, SQL 2012 command line utilities.
  4. Install SCVMM 2012 w/SP1.  VMM Management Server, VMM Console.
  5. Deploy agents to Hyper-V hosts.

This is easy enough to follow, but I was getting stuck on step 4 when I was attempting to install the Management Server and the Console.  The installation would choke about halfway through with the following error:

A Hardware Management error has occurred trying to contact server GMVM-TEST-04.gmrcnet.local  .

WinRM: URL: [http://gmvm-test-04.gmrcnet.local:5985], Verb: [INVOKE], Method: [AssociateLibrary], Resource: [http://schemas.microsoft.com/wbem/wsman/1/wmi/root/scvmm/AgentManagement]

Check that WinRM is installed and running on server GMVM-TEST-04.gmrcnet.local. For more information use the command “winrm helpmsg hresult”.

WinRM error

Okay… WTF?  I knew that I already had WinRM installed and turned on, but if you are not sure, the quickest way to find out is to type winrm quickconfig from a command prompt.  You should get something similar to the following:

WinRM output

So we know WinRM is on and should be working.  Next, I checked the installation logs for clues.  They are located in C:\ProgramData\VMMLogs\SetupWizard.log.  I found the portion of the logs that indicated there were issues:

12:44:54:VMMPostinstallProcessor threw an exception: Threw Exception.Type: Microsoft.Carmine.WSManWrappers.WSManProviderException, Exception.Message: A Hardware Management error has occurred trying to contact server GMVM-TEST-04.gmrcnet.local .

WinRM: URL: [http://gmvm-test-04.gmrcnet.local:5985], Verb: [INVOKE], Method: [AssociateLibrary], Resource: [http://schemas.microsoft.com/wbem/wsman/1/wmi/root/scvmm/AgentManagement]

Check that WinRM is installed and running on server GMVM-TEST-04.gmrcnet.local. For more information use the command "winrm helpmsg hresult".
12:44:54:StackTrace: at Microsoft.Carmine.WSManWrappers.ErrorContextParameterHelper.ThrowTranslatedCarmineException(WsmanSoapFault fault, COMException ce)
at Microsoft.Carmine.WSManWrappers.WsmanAPIWrapper.RetrieveUnderlyingWMIErrorAndThrow(SessionCacheElement sessionElement, COMException ce)
at Microsoft.Carmine.WSManWrappers.WsmanAPIWrapper.Invoke(String actionUri, WSManUri targetUri, Hashtable parameters, Type returnType, Boolean isCarmineMethod, Boolean forceResponseCast)
at Microsoft.Carmine.WSManWrappers.WsmanAPIWrapper.Invoke(String actionUri, String url, Hashtable parameters, Type returnType, Boolean isCarmineMethod)
at Microsoft.Carmine.WSManWrappers.AgentManagement.AssociateLibrary(WsmanAPIWrapper wsmanObject, String CertificateSubjectName, String& ExportedCertificate, ErrorInfo& ErrorInfo)
at Microsoft.VirtualManager.Setup.VirtualMachineManagerHelpers.AssociateDefaultLibraryServer()
at Microsoft.VirtualManager.Setup.VirtualMachineManagerHelpers.SetupLibraryShare()
at Microsoft.VirtualManager.Setup.InstallItemCustomDelegates.PangaeaServerPostinstallProcessor()
12:44:54:InnerException.Type: System.Runtime.InteropServices.COMException, InnerException.Message: The WinRM client sent a request to an HTTP server and got a response saying the requested HTTP URL was not available. This is usually returned by a HTTP server that does not support the WS-Management protocol.
12:44:54:InnerException.StackTrace: at WSManAutomation.IWSManSession.Invoke(String actionUri, Object resourceUri, String parameters, Int32 flags)
at Microsoft.Carmine.WSManWrappers.MyIWSManSession.Invoke(String actionUri, Object resourceUri, String parameters, Int32 flags)
at Microsoft.Carmine.WSManWrappers.WsmanAPIWrapper.Invoke(String actionUri, WSManUri targetUri, Hashtable parameters, Type returnType, Boolean isCarmineMethod, Boolean forceResponseCast)
12:44:54:ProcessInstalls: Running the PostProcessDelegate returned false.
12:44:54:ProcessInstalls: Running the PostProcessDelegate for PangaeaServer failed.... This is a fatal item. Setting rollback.
12:44:54:SetProgressScreen: FinishMinorStep.
12:44:55:ProcessInstalls: Rollback is set and we are not doing an uninstall so we will stop processing installs
12:44:55:****************************************************************
12:44:55:****Starting*RollBack*******************************************
12:44:55:****************************************************************

Incredibly useful, I know.  It is good to know where this stuff is located though just in case other issues arise that require troubleshooting like this.  So at this point I was dumbfounded and most of the stuff I found on Google was not helpful for my situation (I tried many different suggestions).

Finally I came across a post that mentioned disabling the WinRM settings in Group Policy.  It just so happens that there is a policy in our test environment for enabling PowerShell and remoting and all that jazz.  So I completely disabled the policy and was finally able to get SCVMM to install!  Here are the two policy settings you should take a look at first.

Computer Configuration > Administrative Templates > Windows Components > Windows Remote Management (WinRM) > WinRM Service

and

Computer Configuration > Administrative Templates > Windows Components > Windows PowerShell

I need to go back and verify that the root issue was caused by the WinRM portion of the policy, which I suspect it was.  But if you run across this error, look at your Group Policy settings!

The moral of the story:  Windows Management Framework 3.0, and more specifically the WinRM components of WMF 3.0, are delicate, even on Server 2012 (there have been major compatibility issues with earlier versions of Windows).  In my scenario, Group Policy was somehow getting in the way of allowing SCVMM to install (if you can decipher those logs and how they relate to Group Policy, let me know).
