Create Windows Server 2019 AMIs using Packer

There are quite a few blog posts out there detailing this, but none of them seem to be up to date for the HCL syntax introduced in Packer 1.5. HCL has a number of advantages over the original JSON syntax: it is more human friendly, more flexible, and easier to work with.

So in this post I will share a quick and dirty example Packer configuration that uses the HCL syntax to create, provision, and upload a Windows Server AMI to AWS.

You’ll need the following bootstrap_win.txt file in place for Packer to be able to provision the server over WinRM using PowerShell commands.

<powershell>

write-output "Running User Data Script"
write-host "(host) Running User Data Script"

Set-ExecutionPolicy Unrestricted -Scope LocalMachine -Force -ErrorAction Ignore

# Don't set this before Set-ExecutionPolicy as it throws an error
$ErrorActionPreference = "stop"

# Remove HTTP listener
Remove-Item -Path WSMan:\Localhost\listener\listener* -Recurse

# Create a self-signed certificate to let ssl work
$Cert = New-SelfSignedCertificate -CertstoreLocation Cert:\LocalMachine\My -DnsName "packer"
New-Item -Path WSMan:\LocalHost\Listener -Transport HTTPS -Address * -CertificateThumbPrint $Cert.Thumbprint -Force

# WinRM
write-output "Setting up WinRM"
write-host "(host) setting up WinRM"

cmd.exe /c winrm quickconfig -q
cmd.exe /c winrm set "winrm/config" '@{MaxTimeoutms="1800000"}'
cmd.exe /c winrm set "winrm/config/winrs" '@{MaxMemoryPerShellMB="1024"}'
cmd.exe /c winrm set "winrm/config/service" '@{AllowUnencrypted="true"}'
cmd.exe /c winrm set "winrm/config/client" '@{AllowUnencrypted="true"}'
cmd.exe /c winrm set "winrm/config/service/auth" '@{Basic="true"}'
cmd.exe /c winrm set "winrm/config/client/auth" '@{Basic="true"}'
cmd.exe /c winrm set "winrm/config/service/auth" '@{CredSSP="true"}'
cmd.exe /c winrm set "winrm/config/listener?Address=*+Transport=HTTPS" "@{Port=`"5986`";Hostname=`"packer`";CertificateThumbprint=`"$($Cert.Thumbprint)`"}"
cmd.exe /c netsh advfirewall firewall set rule group="remote administration" new enable=yes
cmd.exe /c netsh firewall add portopening TCP 5986 "Port 5986"
cmd.exe /c net stop winrm
cmd.exe /c sc config winrm start= auto
cmd.exe /c net start winrm

</powershell>

Here is what the full Packer HCL configuration looks like.

variable "aws_region" {
  type    = string
  default = "us-west-2"
}

variable "instance_type" {
  type    = string
  default = "t3.medium"
}

variable "subnet_id" {
  type = string
}

variable "vpc_id" {
  type = string
}

variable "ami_users" {
  type = string
}

source "amazon-ebs" "windows_server" {
  ami_description             = "A custom Windows Server AMI"
  ami_name                    = "windows-example"
  ami_users                   = ["${var.ami_users}"]
  associate_public_ip_address = true
  communicator                = "winrm"
  instance_type               = "${var.instance_type}"
  region                      = "${var.aws_region}"
  # Overwrite any existing AMI/snapshot with the same name on rebuild
  force_deregister            = true
  force_delete_snapshot       = true
  # Always start from the latest Amazon-owned Windows Server 2019 base image
  source_ami_filter {
    filters = {
      architecture        = "x86_64"
      name                = "Windows_Server-2019-English-Full-ContainersLatest-*"
      root-device-type    = "ebs"
      virtualization-type = "hvm"
    }
    most_recent = true
    owners      = ["801119661308"]
  }
  subnet_id      = "${var.subnet_id}"
  user_data_file = "./bootstrap_win.txt"
  vpc_id         = "${var.vpc_id}"
  winrm_insecure = true
  winrm_port     = 5986
  winrm_use_ssl  = true
  winrm_username = "Administrator"
}

build {
  sources = ["source.amazon-ebs.windows_server"]

  # Extra configuration
  provisioner "file" {
    destination = "C:\\ProgramData\\someconfig.txt"
    source      = "./myconfig.txt"
  }

  provisioner "powershell" {
    # Reinitialize the server to generate a random password on first boot
    inline = [
      "C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\SendWindowsIsReady.ps1 -Schedule",
      "C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\InitializeInstance.ps1 -Schedule",
      "C:\\ProgramData\\Amazon\\EC2-Windows\\Launch\\Scripts\\SysprepInstance.ps1 -NoShutdown"
    ]
  }
}

Notice the two provisioners. The first gives you a way to inject local files into the image before it is baked, which is useful for adding extra configuration. The second is specific to AWS Windows Server images: it re-schedules the SendWindowsIsReady.ps1, InitializeInstance.ps1 and SysprepInstance.ps1 launch scripts so that instances created from the AMI behave as though they are booting for the first time, including generating a random Administrator password on first boot. These scripts are important for ensuring that the AMI can be created and started the exact same way every time.

Some of the configuration options may not be necessary, so you will need to play around with the configuration to make it suit your needs. For example, you may want to create a uniquely named image on every build, which can be done using timestamps, as sketched below. If you use unique identifiers, the force_deregister and force_delete_snapshot options can be omitted.
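Here is a minimal sketch of the timestamp approach, assuming the same source block as above (locals, timestamp() and regex_replace() are standard Packer HCL features; the naming is just an example):

# Give every build a uniquely named AMI based on the build time
locals {
  timestamp = regex_replace(timestamp(), "[- TZ:]", "")
}

source "amazon-ebs" "windows_server" {
  ami_name = "windows-example-${local.timestamp}"
  # ... the rest of the source block stays the same ...
}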

Likewise, since this Packer image is specific to Windows, it uses a number of winrm_ options, which would be replaced by ssh_ options if this were being provisioned for Linux.

One other trick that I found helpful was setting some of these variables on the fly via a script. Packer will pick up any environment variable named PKR_VAR_<var> and use it as the value of the corresponding input variable. This is especially useful when the configuration needs to be shared across different environments for things that change, e.g. vpc_id and subnet_id.
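For example, a small wrapper script might look something like this (the IDs and account number are placeholders, not real values):

# Placeholder values -- substitute the IDs for your environment
export PKR_VAR_vpc_id="vpc-0123456789abcdef0"
export PKR_VAR_subnet_id="subnet-0123456789abcdef0"
export PKR_VAR_ami_users="123456789012"

# Build from the directory containing the .pkr.hcl files
packer build .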

Terraform Testing Tools

What begins with “T”? I have been thinking about various ways of testing infrastructure and resources lately and have been having a difficult time parsing out the various tools that are available. This post is meant to be a reference for “finding the right tool for the right job” as part of testing various infrastructures.

Often when you start talking about testing, you will hear about the testing pyramid, which is described, along with some other interesting aspects of testing Terraform, in this blog post; it covers a lot of the pitfalls and gotchas you might run into.

My aim originally was to find a good tool for unit testing Terraform, and as part of that adventure I uncovered a number of other interesting projects that, while not directly applicable, could be very useful for a number of different testing scenarios. Read the above blog post for more info, but suffice to say, unit testing (the bottom layer of the testing pyramid) is quite difficult to do with Infrastructure as Code and Terraform, and at this point is mostly not a solved problem.

Here is the list of tools that I uncovered in my research. Please let me know if there are any missing.

Terratest

Terratest is written and maintained by Gruntwork (the folks behind Terragrunt). It provides a comprehensive testing experience for Infrastructure as Code: it deploys the infrastructure, tests that it works as expected, and then tears it down when finished. From what I can tell this is probably the most comprehensive testing tool out there; it still doesn’t cover unit tests, but it is a great way to ensure things are working end to end.

Kitchen-Terraform

Kitchen-Terraform follows the BDD philosophy and spins up, tests, and spins down various Terraform resources. It works in a similar fashion to Terratest: bring up the environment, test it, and then tear things down.

serverspec

This is one of the original tools I ran across when first mulling over the idea of unit testing infrastructure, back in the days when configuration management tools like Chef, Puppet and Salt ruled the earth.

inspec (aws-resources)

Very similar to awspec, this tool provides a framework for testing various AWS resources. This one is nice because it builds on InSpec, so it comes with a lot of extra capabilities.

awspec

Very similar to the other “spec” tools, awspec is built in the same style as serverspec/inspec and provides a very nice interface for testing various AWS resource types. Since it is modeled after serverspec, you will need to deal with Ruby.

Testinfra

Another Python-based, unit-testing-style framework. This tool differs from Terraform Validate in that it mainly focuses on testing the lower-level server and OS, but it does have integrations for testing a number of other things.

terraform-compliance

A nice tool for testing and enforcing Terraform compliance rules.

Terraform Validate

This is a nice tool for expressing various test conditions as compliance checks, especially if you are already familiar with Python and its testing landscape. It parses configs using pyhcl and allows you to write familiar unittest-style tests for Terraform configurations.

Pulumi

This one is a bit of a wild card but could prove interesting to some readers. Pulumi integrates directly with your language of choice (Python for me), which lets you exercise your infrastructure code with language-native unit tests. Obviously Pulumi isn’t Terraform, but I had to mention it here because there is a significant amount of crossover between the tools.

Automatically set Terraform environment variables

If you have spent any time managing infrastructure you have probably run into the issue of needing to set environment variables in order to connect to various resources.

For example, the AWS Terraform provider can automatically source credentials from local environment variables, which avoids putting secrets in places they shouldn’t be, i.e. written into configurations or state. This use case is pretty straightforward: you set the environment variables once and everything is able to connect.
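Concretely, the provider will pick up the standard AWS credential variables from your shell (the values below are placeholders):

# Placeholder credentials -- the AWS provider reads these automatically
export AWS_ACCESS_KEY_ID="AKIAEXAMPLEKEY"
export AWS_SECRET_ACCESS_KEY="example-secret-key"
export AWS_DEFAULT_REGION="us-west-2"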

The problem arises when environments need to be changed frequently or other configurations requiring secrets or tokens to connect to other resources need to be changed.

There are various tools available to solve the issue of managing and storing secrets for connecting to AWS including aws-vault and aws-env. The main issue is that these tools are opinionated and only work with AWS and aren’t designed to work with Terraform.

Of course this problem isn’t specific to Terraform, but the solution I have discovered, using ondir, is generic enough that it can be applied to Terraform as well as any other task that involves setting environment variables or running shell commands whenever you enter or leave a specific directory.

With that said, combining ondir with Terragrunt vastly improves the Terraform experience. There is a lot of Terragrunt reference material out there already, so I won’t discuss it much here; the real power of this solution comes from using ondir alongside Terragrunt to enhance how you use Terraform.

Note: Before getting into more details of ondir, it is worth mentioning that there are several other similar tools, the most well known of which is direnv. Direnv provides a few other features and a nice library of helpers for completing actions, so it is definitely worth checking out if you are not looking for a solution specific to Terraform/Terragrunt. Other tools in the space include autoenv, smartcd and a few others.

Direnv should technically work to solve the problem, but when managing and maintaining large infrastructures, I have found it to be more tedious and cumbersome to manage.

Ondir

From the Github page, Ondir is a small program to automate tasks specific to certain directories. It isn’t designed specifically to set environment variables, but it is a perfect fit for doing so.

Install ondir

# OSX
brew install ondir
# Ubuntu
apt install ondir

Configure ondir

After installing, add the following lines to your ~/.zshrc (a rough bash equivalent is sketched below) and restart your shell.

eval_ondir() {
  eval "`ondir \"$OLDPWD\" \"$PWD\"`"
}
chpwd_functions=( eval_ondir $chpwd_functions )
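
If you are on bash, which has no chpwd hook, a rough (untested) sketch along the lines of ondir’s suggested shell integration is to wrap the directory-changing builtins in your ~/.bashrc:

eval_ondir() {
  eval "`ondir \"$OLDPWD\" \"$PWD\"`"
}
cd()    { builtin cd "$@" && eval_ondir; }
pushd() { builtin pushd "$@" && eval_ondir; }
popd()  { builtin popd "$@" && eval_ondir; }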

Ondir is configured through an ~/.ondirrc file, which basically consists of enter and leave directives for performing actions when specific directories are entered and left. To make this tool useful with Terragrunt, I have set up a simple example below that exports a database password whenever the path being switched into ends in stage/database. One very nice feature of ondir is that the enter/leave directives accept regular expressions, which allows complex paths to be matched.

enter .+(stage\/database)
    echo "Setting database password"
    export TF_VAR_db_password=mypassword
leave .+(stage\/database)
    unset TF_VAR_db_password

With this logic, the shell should automatically export a database password for Terraform to use whenever you switch to the directory ending in stage/database.

cd ~/project/path/to/stage/database
# Check your environment for the variable
env | grep TF_VAR
TF_VAR_db_password=mypassword

cd ~
# When you leave the directory it should get unset
env | grep TF_VAR

In a slightly more complicated example, you may have a databases.sh script containing more logic for exporting variables or otherwise setting up the environment.

#!/usr/bin/env bash

db_env=$(pwd | awk -F '/' '{print $(NF-1)"/"$NF}')

if [[ "$db_env" == "stage/database" ]]; then
    # do stuff
    export TF_VAR_db_password="mypassword"

else [[ "$db_env" == "prod/database" ]]; then
    # do other stuff
    export TF_VAR_db_password="myotherpassword"
fi

Conclusion

That’s pretty much it. Now you can automatically set environment variables for Terraform to use based on the directory you are currently in, and you no longer need to worry about setting and unsetting environment variables manually.

I HIGHLY recommend checking out direnv as well. It offers slightly different functionality and some neat features that may actually be a better fit than ondir for certain tasks. For example, the direnv stdlib has some great helpers for configuring environments cleanly.
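As a tiny illustration, a hypothetical .envrc (the file direnv evaluates when you enter a directory) could look like the following; dotenv is a real stdlib helper, while the file name and password are made up:

# Hypothetical .envrc for direnv
dotenv .env.stage                        # stdlib helper: load KEY=value pairs from a file
export TF_VAR_db_password="mypassword"   # or export variables directly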

For me, using ondir for managing my Terragrunt configs and direnv for managing other projects makes the most sense.

Deploy AWS SSM agent to CoreOS

If you have been a CoreOS user for long you will undoubtedly have noticed that there is no real package management system. If you’re not familiar, the philosophy of CoreOS is to avoid package managers and instead rely heavily on Docker containers, along with a few system-level tools, to manage servers. The problem I recently stumbled across is that the AWS SSM agent is packaged in Debian and RPM formats and is assumed to be installed with a package manager, which obviously won’t work on CoreOS. In the remainder of this post I will describe the steps I took to get the SSM agent working on a CoreOS/Dockerized server. Overall I am very happy with how well this solution turned out.

To get started, there is a nice tutorial here for using AWS Session Manager through the console. The most important thing to do before “installing” the SSM agent on the CoreOS host is to set up the AWS instance with the correct permissions for the agent to be able to communicate with AWS. To accomplish this, I created a new IAM role and attached the AmazonEC2RoleForSSM policy to it through the AWS console.
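If you prefer the CLI over the console, the equivalent setup looks roughly like the following sketch (the role, profile and instance names are made up for the example; I did this part through the console):

# Create a role that EC2 instances are allowed to assume
aws iam create-role \
  --role-name ssm-agent-role \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Service": "ec2.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }]
  }'

# Attach the managed SSM policy to the role
aws iam attach-role-policy \
  --role-name ssm-agent-role \
  --policy-arn arn:aws:iam::aws:policy/service-role/AmazonEC2RoleforSSM

# Wrap the role in an instance profile and attach it to the instance
aws iam create-instance-profile --instance-profile-name ssm-agent-profile
aws iam add-role-to-instance-profile \
  --instance-profile-name ssm-agent-profile \
  --role-name ssm-agent-role
aws ec2 associate-iam-instance-profile \
  --instance-id i-0123456789abcdef0 \
  --iam-instance-profile Name=ssm-agent-profile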

After this step is done, you can bring up the ssm-agent.

Install the ssm-agent

After ensuring the correct permissions have been applied to the server that is to be managed, the next step is to bring up the agent. To do this using Docker, there are some tricks needed to get things working correctly, notably working around the PID 1 zombie-reaping problem that Docker has.

I basically lifted the Dockerfile from here originally and adapted it into my own public Docker image, jmreicha/ssm-agent:latest. In case readers want to try this, my image is a little bit newer than the original source and has a few tweaks. The Dockerfile itself is mostly straightforward; the main issue is that the ssm-agent process won’t reap child processes in the default Debian image.

In order to work around the child reaping problem I substituted the slick Phusion Docker baseimage, which has a very simple process manager that allows shells spawned by the ssm-agent to be reaped when they get terminated.  I have my Dockerfile hosted here if you want to check out how the phusion baseimage version works.

Once the child reaping problem was solved, here is the command I initially used to spin up the container, which of course still didn’t work out of the box.

docker run \
  -v /var/run/dbus:/var/run/dbus \
  -v /run/systemd:/run/systemd \
  jmreicha/ssm-agent:latest

I received the following errors.

2018-11-05 17:42:27 INFO [OfflineService] Starting document processing engine...
2018-11-05 17:42:27 INFO [OfflineService] [EngineProcessor] Starting
2018-11-05 17:42:27 INFO [OfflineService] [EngineProcessor] Initial processing
2018-11-05 17:42:27 INFO [OfflineService] Starting message polling
2018-11-05 17:42:27 INFO [OfflineService] Starting send replies to MDS
2018-11-05 17:42:27 INFO [LongRunningPluginsManager] starting long running plugin manager
2018-11-05 17:42:27 INFO [LongRunningPluginsManager] there aren't any long running plugin to execute
2018-11-05 17:42:27 INFO [HealthCheck] HealthCheck reporting agent health.
2018-11-05 17:42:27 INFO [MessageGatewayService] Starting session document processing engine...
2018-11-05 17:42:27 INFO [MessageGatewayService] [EngineProcessor] Starting
2018-11-05 17:42:27 INFO [LongRunningPluginsManager] There are no long running plugins currently getting executed - skipping their healthcheck
2018-11-05 17:42:27 INFO [StartupProcessor] Executing startup processor tasks
2018-11-05 17:42:27 INFO [StartupProcessor] Unable to open serial port /dev/ttyS0: open /dev/ttyS0: no such file or directory
2018-11-05 17:42:27 INFO [StartupProcessor] Attempting to use different port (PV): /dev/hvc0
2018-11-05 17:42:27 INFO [StartupProcessor] Unable to open serial port /dev/hvc0: open /dev/hvc0: no such file or directory
2018-11-05 17:42:27 ERROR [StartupProcessor] Error opening serial port: open /dev/hvc0: no such file or directory
2018-11-05 17:42:27 ERROR [StartupProcessor] Error opening serial port: open /dev/hvc0: no such file or directory. Retrying in 5 seconds...
2018-11-05 17:42:27 INFO [MessageGatewayService] Successfully created ssm-user
2018-11-05 17:42:27 ERROR [MessageGatewayService] Failed to add ssm-user to sudoers file: open /etc/sudoers.d/ssm-agent-users: no such file or directory
2018-11-05 17:42:27 INFO [MessageGatewayService] [EngineProcessor] Initial processing
2018-11-05 17:42:27 INFO [MessageGatewayService] Setting up websocket for controlchannel for instance: i-0d33006836710e7ef, requestId: 2975fe0d-846d-4256-9d50-57932be03925
2018-11-05 17:42:27 INFO [MessageGatewayService] listening reply.
2018-11-05 17:42:27 INFO [MessageGatewayService] Opening websocket connection to: %!(EXTRA string=wss://ssmmessages.us-west-2.amazonaws.com/v1/control-channel/i-0d33006836710e7ef?role=subscribe&stream=input)
2018-11-05 17:42:27 INFO [MessageGatewayService] Successfully opened websocket connection to: %!(EXTRA string=wss://ssmmessages.us-west-2.amazonaws.com/v1/control-channel/i-0d33006836710e7ef?role=subscribe&stream=input)
2018-11-05 17:42:27 INFO [MessageGatewayService] Starting receiving message from control channel
2018-11-05 17:42:32 INFO [StartupProcessor] Unable to open serial port /dev/ttyS0: open /dev/ttyS0: no such file or directory
2018-11-05 17:42:32 INFO [StartupProcessor] Attempting to use different port (PV): /dev/hvc0
2018-11-05 17:42:32 INFO [StartupProcessor] Unable to open serial port /dev/hvc0: open /dev/hvc0: no such file or directory
2018-11-05 17:42:32 ERROR [StartupProcessor] Error opening serial port: open /dev/hvc0: no such file or directory
2018-11-05 17:42:32 ERROR [StartupProcessor] Error opening serial port: open /dev/hvc0: no such file or directory. Retrying in 5 seconds...
2018-11-05 17:42:35 INFO [MessagingDeliveryService] [Association] No associations on boot. Requerying for associations after 30 seconds.

The first error that jumped out in the logs is “Unable to open serial port”. There is also an error about not being able to add the ssm-user to the sudoers file.

The fix for these issues is to pass the serial device through to the container with the Docker flag "--device=/dev/ttyS0" and to add a volume mount for the sudoers path, "-v /etc/sudoers.d:/etc/sudoers.d". The full Docker run command is shown below.

docker run -d --restart unless-stopped --name ssm-agent \
  --device=/dev/ttyS0 \
  -v /var/run/dbus:/var/run/dbus \
  -v /run/systemd:/run/systemd \
  -v /etc/sudoers.d:/etc/sudoers.d \
  jmreicha/ssm-agent:latest

After fixing the errors found in the logs, and bringing up the containerized SSM agent, go ahead and create a new session in the AWS console.

The session should come up pretty much immediately and you should be able to run commands like you normally would.

The last thing to (optionally) do is run the agent as a systemd service so that it is started automatically if it dies or when the server gets rebooted. You can probably get away with just using the Docker restart policy if you aren’t interested in configuring a systemd service, which is what I have chosen to do for now, but a rough sketch of a unit is shown below anyway.
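This is only a sketch of what such a unit might look like on CoreOS, reusing the container name, volumes and image from the docker run command above; adjust paths and names to taste:

# Write a minimal unit file and enable it
sudo tee /etc/systemd/system/ssm-agent.service > /dev/null <<'EOF'
[Unit]
Description=Containerized AWS SSM agent
After=docker.service
Requires=docker.service

[Service]
ExecStartPre=-/usr/bin/docker rm -f ssm-agent
ExecStart=/usr/bin/docker run --rm --name ssm-agent \
  --device=/dev/ttyS0 \
  -v /var/run/dbus:/var/run/dbus \
  -v /run/systemd:/run/systemd \
  -v /etc/sudoers.d:/etc/sudoers.d \
  jmreicha/ssm-agent:latest
ExecStop=/usr/bin/docker stop ssm-agent
Restart=always
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now ssm-agent.service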

You could even adapt this Docker image into a Kubernetes manifest and run it as a DaemonSet on each node of the cluster, to simplify things and add another layer of security. I may return to the systemd unit and/or Kubernetes manifest in the future if readers are interested.

Conclusion

The AWS Session manager is a fantastic tool for troubleshooting/debugging as well as auditing and security.

With SSM you can avoid ever exposing specific servers to the internet directly, and you can also keep track of what kinds of commands have been run on them. As a bonus, the AWS console keeps track of all the previous sessions that were created, and if you hook it up to CloudWatch and/or S3 you can see all the commands, the times they were run, and nice simple links to the log files.

SSM also lets you do a lot of other cool stuff, like running scripts against a subset of servers filtered by tags or against all servers that are recognized by SSM. I’m sure there are other features as well; I just haven’t found them yet.
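For example, running a command against tagged instances from the CLI looks something like this (the tag key and value are made up; AWS-RunShellScript is a built-in SSM document):

aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --targets "Key=tag:Environment,Values=stage" \
  --parameters 'commands=["uptime"]' \
  --comment "check uptime on all stage servers"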

Toggle the Vscode sidebar with vim bindings

There is an annoying behavior you might run across in the Windows version of Vscode if you are using the Vsvim plugin: the default hotkey <ctrl+b> no longer toggles the sidebar open and closed. As far as I can tell, this glitch doesn’t exist in the OSX version of Vscode.

In Vim, that shortcut is used to scroll the buffer up one screen. Obviously this behavior is the default for Vim, but it is also annoying not to be able to open and close the sidebar. The solution is a simple remap of the keyboard shortcut in Vscode.

To do this, pull open the keyboard shortcuts by either navigating to File -> Preferences -> Keyboard Shortcuts, or by using the command palette (Ctrl+Shift+P) and searching for “keyboard shortcut”.

Once the keyboard shortcuts have been pulled open just do a search for “sidebar” and you should see the default key binding.

Just click the “Toggle Side Bar Visibility” entry and remap the keybinding to Ctrl+Shift+B (since that doesn’t get used by anything by default in Vim).  This menu is also really useful for discovering all of the different keyboard shortcuts in Vscode.  I’m not really a fan of going crazy and totally customizing hotkeys in Vscode, just because it makes support much harder, but in this case it was easy enough and works really well with my workflow.
