Idempotent Shell Scripts with Terraform

March 22, 2020March 22, 2020 Josh Reichardt 2 Comments

One challenge when dealing with Terraform is keeping things clean and repeatable. My current favorite approach to accomplish this task using shell scripts is by using a combination of null_resources and triggers to control when scripts should be updated. These controls combined with Terraform provisioners and template_files provide a nice flexible way to deal with otherwise potentially messy scripts.

I like to provide more than one trigger so that they can be recomputed when either a variable that gets passed into the script changes, or, the script itself changes. This trick is handy when you need to bend Terraform into doing something that it usually doesn’t handle, like some startup or bootstrap process.

Below is the full example with templated scripts and triggers in Terraform v0.11. The same logic should also work with Terraform v0.12 with minimal changes.

data "template_file" "cool_script" {
  template = "${file("${path.module}/script.sh")}"

  vars {
    my_cool_var = "${var.my_cool_var}"
  }
}

resource "null_resource" "script" {

  # Trigger when the script when variables change
  triggers = {
    my_trigger = "${var.my_cool_var}"
    script_sha = "${sha256(file("${path.module}/script.sh"))}"
  }

  provisioner "local-exec" {
    command   = "${data.template_file.cool_script.rendered}"
    interpreter = ["/bin/bash", "-c"]
  }
}

You can provide as many variables to the template_file as the script needs and any time those get updated, the null_resource will pick these changes up and update/rerun your script for you. Notice that the provisioner in the null_resource is basically set up to call bash against the rendered script that we created, using the values in the vars.

If you call the script again, nothing should change because we already computed the SHA of the rendered script and told Terraform to keep track of that state. The below example is a simple way for telling Terraform to keep track of changes to the script/template.

script_sha = "${sha256(file("${path.module}/script.sh"))}"

Once the script is updated, the next time the script gets called, its values and output should update accordingly. Likewise, since we are also using a var as a trigger, when that changes the script will also be updated.

Terraform is flexible enough to allow us to do things like this because often times there are situations and edge cases where Terraform can’t really perform some actions. The above example is a common workaround for provisioning resources that either don’t have an API that Terraform can tap into, or are just tasks that are only handled via some startup or bootstrap script/process.

Mount a volume using Ignition and Terraform

April 1, 2018 Josh Reichardt Leave a comment

Sometimes when provisioning a server you may want to configure and provision storage as part of the bootstrapping and booting process. For example, the other day I ran into an issue where I needed to define a disk, partition it, mount it to a specified location and then create a few directories in it. It turned out to be surprisingly not straight forward to provision this storage and I learned quite a few things that I thought were worth sharing.

I’d just like to mention that ignition works like magic. If you aren’t familiar, Ignition is basically a tool to help provision and configure servers, very similar to cloud-config except by default Ignition only runs once, on first boot. The magic of Ignition is that it injects itself into initramfs before the OS ever eve boots and manipulating the system. Ignition can be read in from remote URL so that it can easily be provisioned in bare metal infrastructures. There were several pieces to this puzzle.

The first was getting down all of the various ignition configuration components in Terraform. Nothing was particularly complicated, there was just a lot of trial and error to get everything working. Terraform has some really nice documentation for working with Ignition configurations, I’d recommend starting there and just playing around to figure out some of the various bits and pieces of configuration that Ignition can do. There is some documentation on Ignition troubleshooting as well which I found to be helpful when things weren’t working correctly.

Below each portion of the Ignition configuration gets declared inside of a “ignition_config” block. The Ignition configuration then points towards each invidual component that we want Ignition to configure. e.g. systemd, filesystem, directories, etc.

data "ignition_config" "staging_rancher_host_stateful" {
  systemd = [
     "${data.ignition_systemd_unit.mount_data.id}",
  ]

  filesystems = [
    "${data.ignition_filesystem.data_fs.id}",
  ]

  directories = [
    "${data.ignition_directory.data_dir.id}",
  ]

  disks = [
    "${data.ignition_disk.data_disk.id}",
  ]
}

This part of the setup is pretty straight forward. Create a data block with the needed ignition configuration to mount the disk to the correct location, format the device if it hasn’t already been formatted and create the desired directory and then create the Systemd unit to configure the mount point for the OS. Here’s what each of the data blocks might look like.

data "ignition_filesystem" "data_fs" {
   name = "data"

  mount {
    device = "/dev/xvdb1"
    format = "ext4"
  }
}

data "ignition_directory" "data_dir" {
  filesystem = "data"
  path = "/data"
  uid = 500
  gid = 500
}

data "ignition_disk" "data_disk" {
  device = "/dev/xvdb"

  partition {
    number = 1
    start = 0
    size = 0
  }
}

Next, create the Systemd unit.

data "ignition_systemd_unit" "mount_data" {
  content = "${file("./data.mount")}"
  name = "data.mount"
}

Another challenge was getting the Systemd unit to mount the disk correctly. I don’t work with Systemd frequently so initially had some trouble figuring this part out. Basically, Systemd expects the service/unit definition name to EXACTLY match what’s declared inside the “Where” clause of the service definition.

For example, the following configuration needs to be named data.mount because that is what is defined in the service.

[Unit]
Description=Mount /data
Before=local-fs.target

[Mount]
What=/dev/xvdb1
Where=/data
Type=ext4

[Install]
WantedBy=local-fs.target

After all the kinks have been worked out of the Systemd unit(s) and other above Terraform Ignition configuration you should be able to deploy this and have Ignition provision disks for you automatically when the OS comes up. This can be extended as much as needed for getting initial disks set up correctly and is a huge step in automating your infrastructure in a nice repeatable way.

There is currently an open issue with Ignition currently where it breaks when attempting to re-provision a previously configured disk on a new machine. Basically the Ignition process chokes because it sees the device has already been partitioned and formatted and can’t do it again. I ran into this scenario where I was trying to create a basically floating persistent data EBS volume that gets attached to servers in an autoscaling group and wanted to allow the volume to be able to move around freely if the server gets killed off.

Configure S3 to store load balancer logs using Terraform

March 21, 2018 Josh Reichardt Leave a comment

If you’ve ever encountered the following error (or similar) when setting up an AWS load balancer to write its logs to an s3 bucket using Terraform then you are not alone. I decided to write a quick note about this problem because it is the second time I have been bitten by this and had to spend time Googling around for an answer. The AWS documentation for creating and attaching the policy makes sense but the idea behind why you need to do it is murky at best.

aws_elb.alb: Failure configuring ELB attributes: InvalidConfigurationRequest: Access Denied for bucket: <my-bucket> Please check S3bucket permission
status code: 409, request id: xxxx

For reference, here are the docs for how to manually create the policy by going through the AWS console. This method works fine for manually creating and attaching to the policy to the bucket. The problem is that it isn’t obvious why this needs to happen in the first place and also not obvious to do in Terraform after you figure out why you need to do this. Luckily Terraform has great support for IAM, which makes it easy to configure the policy and attach it to the bucket correctly.

Below is an example of how you can create this policy and attach it to your load balancer log bucket.

data "aws_elb_service_account" "main" {}

data "aws_iam_policy_document" "s3_lb_write" {
    policy_id = "s3_lb_write"

    statement = {
        actions = ["s3:PutObject"]
        resources = ["arn:aws:s3:::<my-bucket>/logs/*"]

        principals = {
            identifiers = ["${data.aws_elb_service_account.main.arn}"]
            type = "AWS"
        }
    }
}

Notice that you don’t need to explicitly define the principal like you do when setting up the policy manually. Just use the ${data.aws_elb_service_account.main.arn} variable and Terraform will figure out the region that the bucket is in and pick out the correct parent ELB ID to attach to the policy. You can verify this by checking the table from the link above and cross reference it with the Terraform output for creating and attaching the policy.

You shouldn’t need to update anything in the load balancer config for this to work, just rerun the failed command again and it should work. For completeness here is what that configuration might look like.

...
access_logs {
    bucket = "${var.my_bucket}"
    prefix = "logs"
    enabled = true
}
...

This process is easy enough but still begs the question of why this seemingly unnecessary process needs to happen in the first place? After searching around for a bit I finally found this:

When Amazon S3 receives a request—for example, a bucket or an object operation—it first verifies that the requester has the necessary permissions. Amazon S3 evaluates all the relevant access policies, user policies, and resource-based policies (bucket policy, bucket ACL, object ACL) in deciding whether to authorize the request.

Okay, so it basically looks like when the load balancer gets created, the load balancer gets associated with an AWS owned ID, which we need to explicitly give permission to, through IAM policy:

If the request is for an operation on an object that the bucket owner does not own, in addition to making sure the requester has permissions from the object owner, Amazon S3 must also check the bucket policy to ensure the bucket owner has not set explicit deny on the object

Note

A bucket owner (who pays the bill) can explicitly deny access to objects in the bucket regardless of who owns it. The bucket owner can also delete any object in the bucket.

There we go. There is a little bit more information in the link above but now it makes more sense.