Bootstrap servers to a Rancher environment

If you’re not familiar already, Rancher is an orchestration and scheduling tool for containers.  I have written a little bit about Rancher in the past but haven’t covered much on the specifics about how to manage a Rancher environment.  One cool thing about Rancher is its “single pane of glass” approach to managing servers and containers, which allows users and admins to quickly and easily manage complicated environments.  In this post I’ll be covering how to quickly and automatically add servers to your Rancher environment.

One of the manual steps that can(and in my opinion should) be automated is the server bootstrapping process.  The Rancher web interface allows users to add hosts across different cloud providers (AWS, Azure, GCE, etc) and importantly the ability to add a custom host.  This custom host registration is the piece that allows us to automate the host addition process by exposing a registration token via the Rancher API.  One important thing to note if you are going to be adding hosts automatically is that you will need to actually create the entries necessary in the environment that you bootstrap servers to.  So for example, if you create a new environment you will either need to programatically hit the API or in the web interface navigate to Infrastructure -> Add Host to populate the necessary tokens and entries.

Once you have populated the API with the values needed, you will need to create an API token to allow the server(s) that are bootstrapping to connect to the Rancher server to add themselves.  If you haven’t done this before, in the environment you’d like to allow access to navigate to API -> Add Environment API Key -> name it and make a note of key that gets generated.

rancher api

That’s pretty much all of the prep work you need to do to your Rancher environment for this method to work.  The next step is to make a script to bootstrap a server when it gets created.  The logic for this bootstrap process can be boiled down to the following snippet.

#!/bin/bash

INTERNAL_IP=$(ip add show eth0 | awk '/inet/ {print $2}' | cut -d/ -f1 | head -1)
SERVER="https://example.com"
TOKEN="access_key:secret_key"
PROJID="unique_environment"
AGENT_VER="v1.0.1"

RANCHER_URL=$(curl -su $TOKEN $SERVER/v1/registrationtokens?projectId=$PROJID | head -1 | grep -nhoe 'registrationUrl[^},]*}' | egrep -hoe 'https?:.*[^"}]')

docker run \
  -e CATTLE_AGENT_IP=$INTERNAL_IP \
  -e CATTLE_HOST_LABELS='your=label' \
  -d --privileged --name rancher-bootstrap \
  -v /var/run/docker.sock:/var/run/docker.sock \
  rancher/agent:$AGENT_VER $RANCHER_URL

The script is pretty straight forward.  It attempts to gather the internal IP address of the server being created, so that it can add it to the Rancher environment with a unique name.  Note that there are a number of variables that need to get set to reflect.   One that uses the DNS name of the Rancher server, one for the token that was generated in the step above and one for the project ID, which can be found by navigating to the Environment and then looking at the URL for /env/xxxx.

After we have all the needed information and updated the script, we can curl the Rancher server (this won’t work if you didn’t populate the API in the steps above or if your keys are invalide) with the registration token.  Finally, start a docker container with the agent version set (check your Rancher server version and match to that) along with the URL obtained from the curl command.

The final step is to get the script to run when the server is provisioned.  There are many ways to do this and this step will vary depending a number of different factors,  but in this post I am using Cloud-init for CoreOS on AWS.  Cloud-init is used to inject the script into the server and then create a systemd service to run the script the first time the server boots and use the result of the script to run the Rancher agent which allows the server to be picked up by the Rancher server and its environment.

Here is the logic to run the script when the server is booted.

coreos:

  units:
  - name: rancher-agent.service
    command: start
    content: |
      [Unit]
      Description=Rancher Agent
      After=docker.service
      Requires=docker.service
      After=network-online.target
      Requires=network-online.target

      [Service]
      Type=oneshot
      RemainAfterExit=yes
      ExecStart=/etc/rancher-agent

The full version of the cloud-init file can be found here.

After you provision your server and give it a minute to warm up and run the script, check your Rancher environment to see if your server has popped up.  If it hasn’t, the first place to start looking is on the server itself that was just created.  Run docker logs -f rancher-agent to get information about what went wrong.  Usually the problem is pretty obvious.

A brand new server looks something like this.

bootstrapped server

I typically use Terraform to provision these servers but I feel like covering Terraform here is a little bit out of scope.  You can image some really interesting possibilities with auto scaling groups and load balancers that can come and go as your environment changes, one of the beauties of disposable infrastructure as well as infrastructure as code.

If you are interested in seeing how this Rancher bootstrap process fits in with Terraform let me know and I’ll take a stab at writing up a little piece on how to get it working.

Read More

Enable SSL for your WordPress blog

With the announcement of the public beta of the Let’s Encrypt project, it is now nearly trivial to get your site set up with an SSL certificate.  One of the best parts about the Let’s Encrypt project is that it is totally free, so there is pretty much no reason to protect your blog set up with an SSL certificate.  The other nice part of Let’s Encrypt is that it is very easy to get your certificate issued.

The first step to get started is grabbing the latest source code from GitHub for the project.  Log on to your WordPress server (I’m running Ubuntu) and clone the repo.  Make sure to install git if you haven’t already.

git clone https://github.com/letsencrypt/letsencrypt.git

There is a shell script you can run to pretty much do everything for you, including installation of any packages and libraries it needs as well as configures paths and other components it needs to work.

cd letsencrypt
./letsencrypt-auto

After the bootstrap is done there should be some CLI options.  Since I am using Apache for my blog I will use the “–apache” option.

./letsencrypt-auto --apache

There will be some prompts you need to go through for setting up the certificates and account creation.

let's encrypt

 

 

 

 

 

This process is still somewhat error prone, so if you make a typo you can just rerun the “./letsencrypt-auto” command and follow the prompts.

The certificates will be dropped in to /etc/letsencrypt/live/<website>.  Go double check them if needed.

This process will also generate a new apache configuration file for you to use.  You can check for the file in /etc/apache2/site-enabled.  The import part of this config should look similar to the following:

<VirtualHost *:443>
  UseCanonicalName Off
  ServerAdmin webmaster@localhost
  DocumentRoot /var/www/wordpress
  SSLCertificateFile /etc/letsencrypt/live/thepracticalsysadmin.com/cert.pem
  SSLCertificateKeyFile /etc/letsencrypt/live/thepracticalsysadmin.com/privkey.pem
  Include /etc/letsencrypt/options-ssl-apache.conf
  SSLCertificateChainFile /etc/letsencrypt/live/thepracticalsysadmin.com/chain.pem
</VirtualHost>

As a side note, you will probably want to redirect non https requests to use the encrypted connection.  This is easy enough to do, just go find your .htaccess file (mine was in /var/www/wordpress/.htaccess) and add the following rules.

<IfModule mod_rewrite.c>
  RewriteEngine On
  RewriteCond %{SERVER_PORT} 80
  RewriteRule ^(.*)$ https://example.com/$1 [R,L]
</IfModule>

Before we restart Apache with the new configuration let’s run a quick configtest to make sure it all works as expected.

apachectl configtest

If everything looks okay in the configtest then you can reload or restart apache.

service apache2 restart

Now when you visit your site you should get the nice shiny green lock icon on the address bar.  It is important to remember that the certificates issued by the Let’s Encrypt project are valid for 90 days so you will need to make sure to keep up to date and generate new certificates every so often.  The Let’s Encrypt folks are working on automating this process but for now you will need to manually generate new certificates and reload your web server.

let's encrypt

 

 

 

 

 

 

 

 

 

 

 

 

 

 

That’s it.  Your site should now be functioning with SSL.

Updating the certificate automatically

To take this process one step further We can make a script that can be run via cron (or manually) to update the certificate.

Here’s what the script looks like.

#!/usr/bin/env bash

dir="/etc/letsencrypt/live/example.com"
acme_server="https://acme-v01.api.letsencrypt.org/directory"
domain="example.com"
https="--standalone-supported-challenges tls-sni-01"

# Using webroot method
#/root/letsencrypt/letsencrypt-auto --renew certonly --server $acme_server -a webroot --webroot-path=$dir -d $domain --agree-tos

# Using standalone method
service apache2 stop
/root/letsencrypt/letsencrypt-auto --renew certonly --standalone $https -d $domain --agree-tos
service apache start

Notice that I have the “webroot” method commented out.  I run a service (Varnish) on port 80 that proxies traffic but also interferes with LE so I chose to run the standalone renewal method.  It is pretty easy, the main difference is that you need to turn off Apache before you run it since Apache binds to to ports 80/443.  But the downtime is okay in my case.

I chose to put the script in to a cron job and have it run every 45 days so that I don’t have to worry about logging on to my server to regenerate the certifcate.

This is a straight forward process and will help with your search engine juices as well.

Read More

Set up SSL for Rancher Server

One issue you will probably run across if you start to use Rancher to manage your Docker containers is that it doesn’t serve pages over an encrypted connection by default.  If you are looking to put Rancher in to a production scenario, it is a good idea to serve encrypted pages.  HA is another topic, but at this point I have not attempted to set it up yet because it is a much more complicated process currently.  The Rancher folks are working on making HA easier in the near future (if you know an easy way to do it I would love to hear about it).  I would argue though that if you can set up SSL for your Rancher server you are over half way to a full production set up.

The process of getting Rancher to proxy through an encrypted connection is straight forward, assuming you already have some certs to use.  If you don’t already have any official certificates issued *I think* you should be okay with self signed certs, but you won’t get that green lock that everybody loves.  Definitely if you are just testing this set up you should be fine to start out with some self signed certs.  Here is a reference for creating some certs for Nginx to test with.

Another important thing to be aware of is that these instructions are specific to the Nginx method outline above.  I have not tried the Apache method, though I would guess it should be very easy to adapt.

Take a look at the Rancher docs as a starting point for getting started, they are very good and will get you most of the way there.  However, when I went through this process there were a few pieces of information that I had to piece together myself, which is the bulk of what I will be sharing today.

The first step is to adapt the configuration in the docs in to a full Nginx config that can be dropped in to the official Nginx image from Dockerhub.  Here is the config I used.

upstream rancher {
    server rancher-server:8080;
}

server {
    listen 443 ssl;
    server_name test.com;
    ssl_certificate /etc/rancher/test.com.crt;
    ssl_certificate_key /etc/rancher/test.com.key;

    access_log /var/log/nginx/access.log;
    error_log  /var/log/nginx/error.log;

    location / {
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-Proto $scheme;
        proxy_set_header X-Forwarded-Port $server_port;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://rancher;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection "upgrade";
        # This allows the ability for the execute shell window to remain open for up to 15 minutes. Without this parameter, the default is 1 minute and will automatically close.
        proxy_read_timeout 900s;
    }
}

server {
    listen 80;
    server_name test.com;
    return 301 https://$server_name$request_uri;
}

There are a few important things to note about this config.   One is naming the upstream the same name as what the rancher server container is named, in this case rancher-server.

Note that I have used test.com as the server name and so the certs and names are all reflective of that value.  Obviously that will need to be updated with your own values.

Finally, we have added an additional logging section to the config that will pipe the logs to stdout/stderr so we can easily look at the requests from the host OS via the “docker logs” command.

To get the following Docker run command to work correctly you will want to create a directory called /etc/rancher or something easy to remember, and place this config (named as rancher-nginx.conf), along with the certs you have created in to this location.  Alternately you can modify the Docker run command and simply have the volume mounts pointed at the location you store the configuration and certs.  For me, it makes the most sense to group these items together in /etc/rancher.

docker run -d --restart=always --name nginx 
    -v /etc/rancher/rancher-nginx.conf:/etc/nginx/conf.d/default.conf
    -v /etc/rancher/test.com.crt:/etc/rancher/test.com.crt
    -v /etc/rancher/test.com.key:/etc/rancher/test.com.key
    -p 80:80 -p 443:443 --link=rancher-server nginx

This will mount in the correct configuration and certificates to the Nginx docker container, expose port 80 and 443 for web traffic (make sure to adjust any firewall rules you need to get traffic to pass through these ports), and link to the rancher-server container so that the traffic can be proxied.

Additionally, you will need to update any reference to the old address that was using http://<rancher-name>:8080/ to point to https://<rancher-name>/.  Namely the host registration configuration in the Rancher server, but if you were relying on any other outside tools to hit that endpoint they will also need to be updated to use https instead.

Read More

ECS cluster turnup with CoreOS and Terraform

Recently I have been evaluating different container clustering tools and technologies.  It has been a fun experience thus far, the tools and community being built around Docker have come a long time since I last looked.  So for today’s post I’d like to go over ECS a little bit.

ECS is essentially the AWS version of container management.  ECS takes care of managing your Docker (container) infrastructure by handling creation, management, destruction and scheduling as well as providing API integration with other AWS services, which is really powerful.  To get ECS up and running all you need to do is create an ECS cluster, either from the AWS console or from some other AWS integration like the CLI or Terraform, then install the agent on servers that you would like ECS to schedule work on.  After setting up the agent and cluster name you are basically ready to go, start by creating a task and then create a service to start running containers on the cluster.  Some cool new features have been announced at this years re:Invent conference but I haven’t had a chance yet to look at them yet.

First impression of ECS

The best part about testing ECS by far has been how easy it is to get set up and running.  It took less than 20 minutes to go from nothing to fully functioning cluster that was scheduling containers to hosts and receiving load.  I think the most powerful aspect of ECS is its integration with other AWS services.  For example, if you need to attach containers/services to a load balancer, the AWS infrastructure is already there so the different pieces of the infrastructure really mesh well together.

The biggest downside so far is that the ECS console interface is still clunky.  It is functional, and I have been able to use it to do everything I have needed but it just feels like it needs some polish and things are nested in menu’s and usually not easy to find.  I’m sure there are plans to improve the interface and as mentioned above some new features were recently announced, so I have a feeling there will be some nice improvements on the way.

I haven’t tried the CLI tool yet but it looks promising for automating containers and services.

Setting things up

Since I am a big fan of CoreOS I decided to try turning up my ECS cluster using CoreOS as the base OS and Terraform to do the heavy lifting and provisioning.

The first step is to create your cluster.  I noticed in the AWS console there was a configuration wizard that guides you through your first cluster which was annoying because there wasn’t a clean way to just create the cluster.  So you will need to follow the on screen instructions for getting your first environment set up.  If any of this is unclear there is a good guide for getting started with ECS here.

After your cluster has been created there is a menu that shows your ECS environments.

ECS cluster menu

 

 

 

 

 

 

 

 

 

 

Next, you will need to turn on the nodes that will be connecting to this cluster.  The first part of this is to get your cloud-config set up to connect to the cluster.  I used the CoreOS docs to set up the ECS agent, making sure to change the ECS_CLUSTER= section in the config.

#cloud-config

coreos:
  units:
  -
  name: amazon-ecs-agent.service
  command: start
  runtime: true
  content: |
  [Unit]
  Description=Amazon ECS Agent
  After=docker.service
  Requires=docker.service
  Requires=network-online.target
  After=network-online.target

  [Service]
  Environment=ECS_CLUSTER=my-cluster
  Environment=ECS_LOGLEVEL=warn
  Environment=ECS_CHECKPOINT=true
  ExecStartPre=-/usr/bin/docker kill ecs-agent
  ExecStartPre=-/usr/bin/docker rm ecs-agent
  ExecStartPre=/usr/bin/docker pull amazon/amazon-ecs-agent
  ExecStart=/usr/bin/docker run --name ecs-agent --env=ECS_CLUSTER=${ECS_CLUSTER} --env=ECS_LOGLEVEL=${ECS_LOGLEVEL} --env=ECS_CHECKPOINT=${ECS_CHECKPOINT} --publish=127.0.0.1:51678:51678 --volume=/var/run/docker.sock:/var/run/docker.sock --volume=/var/lib/aws/ecs:/data amazon/amazon-ecs-agent
  ExecStop=/usr/bin/docker stop ecs-agent

Note that the Environment=ECS_CLUSTER=my-cluster, this is the most important bit to get the server to check in to your cluster, assuming you named it “my-cluster”.  Feel free to add any other values your infrastructure may need.  Once you have the config how you want it, run it through the CoreOS cloud-config validator to make sure it checks out.  If everything looks okay there, your cloud-config should be ready to go.

You can find more info about how to configure the ECS agent in the docs here.

Once you have your cloud-config in order, you will need to get your Terraform “recipe” set up.  I used this awesome github project as the base for my own project.  The Terraform logic from there basically creates an AWS launch config and autoscaling group (and uses the cloud-config from above) to launch instances in to your cluster.  And the ECS agent takes care of the rest, once your servers are up and the agent is reporting in to the cluster.

launch_config.tf

resource "aws_launch_configuration" "ecs" {
  name = "ECS ${var.cluster_name}"
  image_id = "${var.ami}"
  instance_type = "${var.instance_type}"
  iam_instance_profile = "${var.iam_instance_profile}"
  key_name = "${var.key_name}"
  security_groups = ["${split(",", var.security_group_ids)}"]
  user_data = "${file("../cloud-config/ecs.yml")}"

  root_block_device = {
    volume_type = "gp2"
    volume_size = "40"
  }
}

Notice the user_data section.  This is where we inject the cloud config from above to provision CoreOS and launch the ECS agent.

autoscaler.tf

resource "aws_autoscaling_group" "ecs-cluster" {
  availability_zones = ["${split(",", var.availability_zones)}"]
  vpc_zone_identifier = ["${split(",", var.subnet_ids)}"]
  name = "ECS ${var.cluster_name}"
  min_size = "${var.min_size}"
  max_size = "${var.max_size}"
  desired_capacity = "${var.desired_capacity}"
  health_check_type = "EC2"
  launch_configuration = "${aws_launch_configuration.ecs.name}"
  health_check_grace_period = "${var.health_check_grace_period}"

  tag {
    key = "Env"
    value = "${var.environment_name}"
    propagate_at_launch = true
  }

  tag {
    key = "Name"
    value = "ECS ${var.cluster_name}"
    propagate_at_launch = true
  }
}

There are a few caveats I’d like to highlight with this approach.  First, I already have an AWS infrastructure in place that I was testing agains this.  So I didn’t have to do any of the extra work to create a VPC, or a gateway for the VPC.  I didn’t have to create the security groups and subnets either, I just added them to the Terraform code.

The other caveat is that if you want to use the Github project I linked to you will need to make sure that you populate the variables with your own environment specific values.  That is why having the VPC, subnets and security groups was handy for me.  Be sure to browse through the variables.tf file and substitute in your own values.  As an example,  I had to update the variables to use the CoreOS 766.4.0 image.  This AMI will be specific to your AWS region so make sure to look up the AMI first.

variable "ami" {
  /* CoreOS 766.4.0 */
  default = "ami-dbe71d9f"
  description = "AMI id to launch, must be in the region specified by the region variable"
}

Another part I had to modify to get the Github project to work was adding in my AWS credentials which look similar to the following.  Make sure to update these variables with your ID and secret.

provider "aws" {
  access_key = "${var.access_key}"
  secret_key = "${var.secret_key}"
  region = "${var.region}"
}

variable "access_key" {
  description = "AWS access key"
  default = "XXX"
}

variable "secret_key" {
  description = "AWS secret access key"
  default = "xxx"
}

Make sure to also copy/edit the autoscaling.tf and launch_config.tf files to reflect anything that is specific to your environment (Terraform will complain if there are issues).

After you have combed through the variables.tf and updated the Terraform files to your liking you can simply run terraform plan -input=false and see how Terraform will create the ASG for you.

If everything looks good, you can run terrafrom apply -input=false and Terraform will go out and start building your new ECS infrastructure for you.  After a few minutes check the EC2 console and your launch config and autoscaling group should be in there.  If that stuff all looks okay, check the ECS console and your new servers should show up and be ready to go to work for you!

NOTE: If you are starting from scratch, it is possible to do all of the infrastructure provisioning via Terraform but it is too far out of the scope of this post to cover because there are a lot of steps to it.

Read More

Intro to Systemd

I have a rocky relationship with Systemd.  On the one hand I love how powerful and extensive it is.  On the other hand, I hate how cumbersome and clunky it can sometimes feel.  There are a TON of moving components and it is very confusing to use if you have no experience with it.  The aim of this post is NOT to debate relative merits of Systemd but instead to take users through a few basic examples of how to accomplish tasks with Systemd and get familiar with how to manage systems with this framework.

My background is primarily with Debian/Ubuntu so moving over to this init system has been a learning curve.  The problem I had when I first made the transition is that there aren’t a lot of great resources currently for making the change.

Most of my knowledge has been pieced together from various blog posts and sites, the best of which is over at the Arch wiki.  As Systemd continues to mature it is becoming increasingly easier to find good guides and resources but the Arch wiki is still the defacto, go to place to find resources about how to use Systemd.

Another thing that has helped is forcing myself to use CoreOS, which, for better or worse has forced me to learn how to use Systemd.  It seems that most modern Linux distro’s are moving towards Systemd, so it’s probably worth it to at least begin experimenting with it at this point.  Ubuntu 15.04 as well as Debian 8 have both made the jump, along with other RedHat based distro’s, with more to follow.

While the learning curve can be a little steep to start with, you can learn about 80% of the things that Systemd can do with about 20% of the effort.  Getting the basics down takes a little bit of effort but will more than be enough to manage a system.  Before starting, as a disclaimer, I do not claim to be an expert but I have learned the hard way how things work and sometimes don’t work with Systemd.

Basics of Systemd

So now that we have a little bit of background stuff out of the way, we can look at some of the meat and potatoes of Systemd.  The first thing to do is get an idea of what services are running on your system.

systemctl

This will give you a listing of everything running on your system.  Usually there are a very large number of units listed.  This isn’t really important but say for example, there is a failed unit.  You can filter systemctl to only spit out failed units.

systemctl --failed

That’s pretty cool.  If, for examle there is a failed unit file you can drill down in to that unit specifically.

systemctl status <unit>

This will give you process and service information for the unit file and the last 10 lines of logs for the unit as well, which can be really handy for troubleshooting.

One of the easier ways to work with unit files is to use the edit subcommand.  This method removes much of the complexity of having to know exactly where all of the unit files live on the system.

systemctl edit --full <unit>

You can manage and interact with Systemd and the unit files in a straight forward way.  We will examine a few examples.

systemctl start <unit> - start a stopped service
systemctl stop <unit> - stop a running service
systemctl restart <unit> - restart a service
systemctl disable <unit> - stop unit from loading on boot
systemctl enable <unit>  - load unit on boot

If you make a change to a service or unit file (we will go over this later) you need to reload the Systemd daemon for it to pick up the changes to the file.

sytemctl daemon-reload

This is super important (on CoreOS at least) when troubleshooting because if you don’t run the reload your process will not update and will not change its behavior.

There are many many more tips and tricks for working with systemctl but this should give users a basic grasp on interacting with components of the system.  Again, for lots more details I advise that you check out the wiki.

Journalctl

If you are coming from older Debian/Ubuntu distros you will need to get familiar with using journalctl to view and manipulate your log files.

One of the first changes that people notice right away is that there aren’t any beloved log files /var/log in Systemd any more.  Well, there are still logs, but it isn’t what most folks are used to.

The files haven’t gone away, they are simply disguised as a binary file and are accessible via journald.  To check how much space your logs are taking up on disk you can use the following command.

journalctl --disk-usage

Journalctl is a powerful tool for doing just about everything you can think of with log files.  Because journald writes logs as a binary file you need this tool to interact with them.  That’s okay though because journalctl has pretty much all of the features needed to look through logs with its built in flags.  For example, if you want to tail a log file, you can use the “-f” flag to follow “-u” to specify the systemd unit and the systemd unit name to follow it as illustrated below (this is a CoreOS system so your results may vary on a different OS).

journalctl -f -u motdgen

To follow all the entries that journalctl is picking up (pretty much just tailing all the logs as an equivalent) you can just use this.

journalctl -f

Sometimes you only want to view the last X number of entries in a log file.  To look at older entries you can use the “-n” flag.

journalctl -n 100 -u motdgen

You can filter logs to only show messages since the last boot.

journalctl -b

There are many more options and the filtering capabilities of journalctl are very powerful.  Additionally, journalctl offers JSON outputs, filtering by log priority, granular filtering by time with –unit and –since flags, filtering on different boots, and more.

I suggest exploring on your own a little bit to see how far the capabilities can go.  Here is a link to all of the various flags that journalctl has as well as a good writeup to get a good idea of some of the more advanced capabilities of journalctl.

Drop-in Units

Writing unit files is somewhat of an art form.  I feel that it can quickly become such a complicated topic that I won’t discuss it in full detail here.

By default, Systemd unit files are written to /usr/lib/systemd/system and custom defined unit files are written to /etc/systemd/system.  Systemd looks at both locations for which files to load up.  Because of this, you can extend base units that live in the /usr/lib path by augmenting them in /etc/systemd.

This has a few implications.  First, it makes management of unit files a little bit easier to work with.  It also makes extending units a little bit easier.  You don’t need to worry about manipulting the system level units, you just add functionality on top of them in /etc/systemd.

Drop-in units are a handy feature if you need to extend a basic unit.  To make the drop-in work, it must follow the format of /etc/systemd/system/unit.d/override.conf, where unit.d is the full name of the service that you are extending.

The following example extends the CoreOS baked in etcd2 service

/etc/systemd/system/etcd2.service.d/30-configuration.conf

[Service]
# General settings
Environment=ETCD_DEBUG=true

Replacing Cron

Well not exactly.  Systemd doesn’t use cron jobs so if you want to get something to run on a schedule you will need to use a timer unit and associate a service with it.

[Unit]
Description=Log file cleaner (runs daily at midnight)

[Timer]
OnCalendar=daily

Notice that there is a log-cleaner.timer as well as a log-cleaner.service (below).  The easiest way I have found is to create a service/timer pair, which ensures that the timer runs the service based on its schedule.

[Unit]
Description=Log file cleaner

[Service]
ExecStart=/usr/bin/bash -c "truncate -s 1m /data/*.log"

The only thing that the service is doing is running a shell command which will get run based on the timer directive in the timer unit file.

The timer directive in the timer unit is pretty flexible.  For example, you can tell the timer to run based on the calendar either by day, week or month or alternately you can have the unit run services based on the system time by seconds, minutes or hours.

One of the down sides of timers is that there is not an easy way to email results.  With cron this is easily configurable.  Likewise, timers don’t have a random delay to run jobs like cron has.  So, timers are definitely no replacement for cron but have a lot of usable functionality, it is just confusing to learn at first, especially if you are used to cron.

Here is more involved tutorial for working with timers.  Also, the Arch wiki once again is great and has a dedicated section for timers.

Conclusion

This introduction is really just the beginning.  There are so many more pieces to the Systemd puzzle, I have just covered the basics, in fact I have deliberately chosen not to cover many of the topics and concepts because they can convolute the learning process and slow down the knowledge.  I chose to focus instead on the most basic ideas that are used most frequently in day to day administration.  Feel free to go do your own research and play around with the more advanced topics.

Systemd aims to solve a lot of problems but brings about a lot of complexity as well to accomplish this task.  In my opinion it is definitely worth investing the time and energy to learn this system and service manager though, pretty much all the Linux distro’s are moving towards it in the future and you would be doing yourself a disservice by not learning it.

Other areas of Systemd that aren’t covered in this post but are worth researching and playing with are:

  • Targets (analogous to runlevels)
  • Managing NTP with timedateclt
  • Network management with networkd
  • Hostname management with hostnamectl
  • Username management with loginctl

One of the easiest ways to get started with systemd is by grabbing a Vagrant box that leverages Systemd.  I like the CoreOS Vagrant box but there are many other options for getting started.  One of the best tools for learning is by doing, so take some of these commands and resources and go start playing around with Systemd.  Feel free if you have any extra additions or need help getting started with Systemd.

Read More