Untangling CLI Text Processing Tools

February 2, 2020February 7, 2020 Josh Reichardt 2 Comments

Update (2/7/2020): Added jc

The landscape of command line driver text manipulation and processing tools is somewhat large and confusing, with more and more tools emerging all the time. Because I am having trouble keeping them all in my head, I decided to make a little reference guide to help remember which tool to choose for the correct task at hand.

jq
jc
yq (both python and go versions)
oq
xq
hq
jk
ytt
configula
jsonnet
sed (for everything else)

Reach for jq first when you need to do any kind of processing with JSON files.

From the website, “jq is like sed for JSON data – you can use it to slice and filter and map and transform structured data with the same ease that sed, awk, grep and friends let you play with text.”

sudo apt-get install jq
# or
brew install jq

# consume json and output as unchanged json
curl 'https://api.github.com/repos/stedolan/jq/commits?per_page=5' | jq '.'

This project looks very interesting and is a refreshing thing to see in the Linux world. JC is basically a way to create structured objects (ala PowerShell) as JSON output from running various Linux commands. And from the GitHub repo, ” This tool serializes the output of popular gnu linux command line tools and file types to structured JSON output. This allows piping of output to tools like jq”.

Transforming data into structured objects can massively simplify interacting with them by pairing the output with jq to interact with.

pip3 install --upgrade jc
df | jc --df -p
[
  {
    "filesystem": "devtmpfs",
    "1k_blocks": 1918816,
    "used": 0,
    "available": 1918816,
    "use_percent": 0,
    "mounted_on": "/dev"
  },
  {
    "filesystem": "tmpfs",
    "1k_blocks": 1930664,
    "used": 0,
    "available": 1930664,
    "use_percent": 0,
    "mounted_on": "/dev/shm"
  },
  ...
]

yq (Python) (Go)

This tool can be confusing because there is both a Python version and a Go version. On top of that, the Python version includes its own version of xq, which is different than the standalone xq tool.

The main differences between the Python and Go version is that the Python version can deal with both yaml and xml while the Go version is meant to be used as a command line tool to deal with only yaml.

pip install yq
cat input.yml | yq -y .foo.bar

From the website, oq is “A performant, portable jq wrapper thats facilitates the consumption and output of formats other than JSON; using jq filters to transform the data”.

The claim to fame that oq has is that it is very similar to jq but works better with other data formats, including xml and yaml. For example, you can read in some xml (and others), apply some filters, and output to yaml (and others). This flexibility makes oq a good option if you need to deal with different data formats oustide of JSON.

snap install oq
# or
brew tap blacksmoke16/tap && brew install oq

# consume json and output xml
echo '{"name": "Jim"}' | oq -o xml .

From the Github page, “Apply XPath expressions to XML, like jq does for JSONPath and JSON”.

The coolest use case I have found for xq so far is taking in an xml file and outputting it into a json file, which surprising I haven’t found another tool that can do this (oc authors say there are plans to do this in the future). The simplest example is to curl a page, pipe it through xq to change it to json and then pipe it again and use jq to manipulate the data.

pip install yq
curl -s https://mysite.xml | xq .

Like xq (and jq) but for html parsing. This tool is handy for manipulating html in the same way you would xml or json.

pip install hq
cat /path/to/file.html | hq '`Hello, ${/html/head/title}!`'

This is a newer tool, with a slightly different approach aimed at helping to automate configurations, especially for things like Kubernetes but should work with most structured data.

This tool works with json, yaml and hcl and can be used in conjunction with Javascript, making it an interesting option.

curl -Lo jk https://github.com/jkcfg/jk/releases/download/0.3.1/jk-darwin-amd64
chmod +x jk
sudo mv jk /usr/local/bin/

// alice.js
const alice = {
  name: 'Alice',
  beverage: 'Club-Mate',
  monitors: 2,
  languages: [
    'python',
    'haskell',
    'c++',
    '68k assembly', // Alice is cool like that!
  ],
};

// Instruct to write the alice object as a YAML file.
export default [
  { value: alice, file: `developers/${alice.name.toLowerCase()}.yaml` },
];

jk generate -v alice.js

ytt

This is basically a simplified templating language that only intends to deal with yaml. The approach the authors took was to create yaml templates and sanbdox/embed Python into the templating engine, allowing users to call on the power of Python inside of their templates.

The easiest way to play around with ytt if you don’t want to clone the repo is to try out the online playground.

curl -Lo ytt https://github.com/k14s/ytt/releases/download/v0.25.0/ytt-darwin-amd64
chmod +x ytt
sudo mv ytt /usr/local/bin/

https://github.com/k14s/ytt.git && cd ytt
ytt -f examples/playground/example-demo/

configula

From the GitHub page, ” Configula is a configuration generation language and processor. It’s goal is to make the programmatic definition of declarative configuration easy and intuitive”.

Similar in some ways to ytt, but instead of embedding Python into the yaml template file, you create a .py file and then render the py file into yaml using the Configula command line tool.

git clone https://github.com/brendandburns/configula
cd configula

# tiny.py
# Define a YAML object where the 'foo' field has the value of evaluating 1 + 2 (e.g. 3)
my_obj = foo: !~ 1 + 2
my_obj.render()

./configula examples/tiny.py

jsonnet

Jsonnet bills itself as a “data templating language for app and tool developers”. This tool was originally created by folks working at Google, and has been around for quite some time now. For some reason always seems to fly underneath the radar but it is super powerful.

This tool is a superset of JSON and allows you to add conditionals, loops and other functions available as part of its standard library. Jsonnet can render itself into json and yaml output.

pip install jsonnet
# or
brew install jsonnet

// example.jsonnet
{
  person1: {
    name: "Alice",
    welcome: "Hello " + self.name + "!",
  },
  person2: self.person1 { name: "Bob" },
}

jsonnet -S example.jsonnet

sed

For (pretty much) everything else, there is Sed. Sed, short for stream editor, has been around forever and is basically a Swiss army knife for manipulating text, and if you have been using *nix for any length of time you have more than likely come across this tool before. From their docs, Sed is “a stream editor is used to perform basic text transformations on an input stream”.

The odds are good that Sed will likely do what you’re looking for if you can’t use one of the aforementioned tools.

Automatically set Terraform environment variables

December 22, 2019December 22, 2019 Josh Reichardt 1 Comment

automatically load environment variables

If you have spent any time managing infrastructure you have probably run into the issue of needing to set environment variables in order to connect to various resources.

For example, the AWS Terraform provider allows you to automatically source local environment variables, which solves the issue of placing secrets in places they should be, ie. writing the keys into configurations or state. This use case is pretty straight forward, you can just set the environment variables once and everything will be able to connect.

The problem arises when environments need to be changed frequently or other configurations requiring secrets or tokens to connect to other resources need to be changed.

There are various tools available to solve the issue of managing and storing secrets for connecting to AWS including aws-vault and aws-env. The main issue is that these tools are opinionated and only work with AWS and aren’t designed to work with Terraform.

Of course this problem isn’t specific to Terraform, but the solution I have discovered using ondir is generic enough that it can be applied to Terraform as well as any other task that involves setting up environment variables or running shell commands whenever a specific directory is changed to or left.

With that said, the ondir tool in combination with Terragrunt, gives you a set of tools that vastly improves the Terraform experience. There is a lot of Terragrunt reference material already so I won’t discuss it much here. The real power of my solution comes from combining ondir with Terragrunt to enhance how you use Terraform.

Note: Before getting into more details of ondir, it is worth mentioning that there are several other similar tools, the most well known of which is called direnv. Direnv provides a few other features and a nice library for completing actions so it is definitely work checking out if you are not looking for a solution specific to Terraform/Terragrunt. Other tools in the space include autoenv, smartcd and a few others.

Direnv should technically work to solve the problem, but when managing and maintaining large infrastructures, I have found it to be more tedious and cumbersome to manage.

Ondir

From the Github page, Ondir is a small program to automate tasks specific to certain directories, so isn’t designed specifically to set environment variables but makes a perfect fit doing so.

Install ondir

# OSX
brew install ondir
# Ubuntu
apt install ondir

Configure ondir

After installing, add the following lines to your ~/.zshrc (or these lines if you are using bash) and restart your shell.

val_ondir() {
  eval "`ondir \"$OLDPWD\" \"$PWD\"`"
}
chpwd_functions=( eval_ondir $chpwd_functions )

Ondir leverages an ~/.ondirrc file to configure the tool. This file basically consists of enter and leave directives for performing actions when specific directories are entered and left. To make this tool useful with Terragrunt, I have setup a simple example below for exporting a database password given a path that gets switched into (if the path ends in stage/database), which is defined in the ~/.ondirrc file. One very nice feature of ondir is that it allows regular expressions to be used to define the enter/leave directives, which makes allows for complex paths to be defined.

enter .+(stage\/database)
    echo "Setting database password"
    export TF_VAR_db_password=mypassword
leave .+(stage\/database)
    unset TF_VAR_db_password

With this logic, the shell should automatically export a database password for Terraform to use whenever you switch to the directory ending in stage/database.

cd ~/project/path/to/stage/database
# Check your environment for the variable
env | grep TF_VAR
TF_VAR_db_password=mypassword

cd ~
# When you leave the directory it should get unset
env | grep TF_VAR

In a slightly more complicated example, you may have a databases.sh script containing more logic for setting up or exporting variables or otherwise setting up environments.

#!/usr/bin/env bash

db_env=$(pwd | awk -F '/' '{print $(NF-1)"/"$NF}')

if [[ "$db_env" == "stage/database" ]]; then
    # do stuff
    export TF_VAR_db_password="mypassword"

else [[ "$db_env" == "prod/database" ]]; then
    # do other stuff
    export TF_VAR_db_password="myotherpassword"
fi

Conclusion

That’s pretty much. Now you can automatically set environment variables for Terraform to use, based on the directory we are currently and no longer need to worry about setting and unsetting environment variables manually.

I HIGHLY recommend checking out direnv as well. It offers slightly different functionality, and some neat features which in some cases may actually be better for specific tasks than ondir. For example, the direnv stdlib has some great helpers to help configure environments cleanly.

For me, using ondir for managing my Terragrunt configs and direnv for managing other projects makes the most sense.

Testing for Deprecated Kubernetes APIs

November 10, 2019November 11, 2019 Josh Reichardt 1 Comment

Kubernetes API changes are coming up and I wanted to make a quick blog post to highlight what this means and show a few of things I have discovered to deal with the changes.

First, there have been some relevant announcements regarding the changes and deprecations recently. The first being the API Depractions in 1.16 announcement, which describes the changes to the API and some of the things to look at and do to fix problems

The next post is the Kubernetes 1.16 release announcement, which contains a section “Significant Changes to the Kubernetes API” that references the deprecation post.

Another excellent resource for learning about how Kubernetes deprecations work is the API deprecation documentation, highlighted in the deprecation post, but not widely shared.

In my opinion, the Kubernetes community really dropped the ball in terms of communicating these changes and missed an opportunity to describe and discuss the problems that these changes will create. I understand that the community is gigantic and it would be impossible to cover every case, but to me, the few blog posts describing the changes and not much other official communication or guides for how to handle and fix the impending problems is a little bit underwhelming.

The average user probably doesn’t pay attention to these blog posts, and there are a lot of old Helm charts out in the wild still, so I’m confident that the incoming changes will create headaches and table flips when people start upgrading. As an example, if you have an old API defined and running in a pre 1.16 cluster, and upgrade without fixing the API version first, APPS IN YOUR CLUSTER WILL BREAK. The good news is that new clusters won’t allow the old API versions, making errors easier to see and deal with.

Testing for and fixing deprecated APIs

With that mini rant out of the way, there is a simple but effective way to test your existing configurations for API compatibility.

Conftest is a nice little tool that helps write tests against structured configuration data, using the Rego language using Open Policy Agent (OPA). Conftest works with many file types including JSON, TOML and HCL, which makes it a great choice for testing a variety of different configurations, but is especially useful for testing Kubernetes YAML configurations.

To get started, install conftest.

wget https://github.com/instrumenta/conftest/releases/download/v0.15.0/conftest_0.15.0_Linux_x86_64.tar.gz
tar xzf conftest_0.15.0_Linux_x86_64.tar.gz
sudo mv conftest /usr/local/bin

Then we can use the handy policy provided by the deprek8 repo to validate the API versions.

curl https://raw.githubusercontent.com/naquada/deprek8/master/policy/deprek8.rego > deprek8.rego
conftest test -p deprek8.rego sample/manifest.yaml

Here’s what a FAIL condition might look like according to what is defined in the rego policy file for an outdated API version.

FAIL - sample/manifest.yaml - Deployment/my-deployment: API extensions/v1beta1 for Deployment is no longer served by default, use apps/v1 instead.

The Rego policy is what actually defines the behavior that Conftest will display and as you can see, it found an issue with the Deployment object defined in the test manifest.

Below is the Rego policy that causes Conftest to spit out the FAILure message. The syntax is clean and easy to follow, so writing and adjusting policies is easy.

_deny = msg {
  resources := ["DaemonSet", "Deployment", "ReplicaSet"]
  input.apiVersion == "extensions/v1beta1"
  input.kind == resources[_]
  msg := sprintf("%s/%s: API extensions/v1beta1 for %s is no longer served by default, use apps/v1 instead.", [input.kind, input.metadata.name, input.kind])
}

Once you know what is wrong with the configuration, you can use the kubectl convert subcommand to fix up the existing deprecated API objects. Again, attempting to create objects using deprecated APIs in 1.16 will be rejected automatically by Kubernetes, so you will only need to deal with converting existing objects in old clusters being upgraded.

From the above error, we know the object type (Deployment) and the version (extensions/v1beta1). With this information we can run the convert command to fix the object.

# General syntax
kubectl convert -f <file> --output-version <group>/<version>

# The --output-version flag allows specifying the API version to upgrade to 
kubectl convert -f sample/manifest.yaml  --output-version apps/v1

# Omitting the --output-version flag will convert to the latest version
kubectl convert -f sample/manifest.yaml

After the existing objects have been converted and any manifest files have been updated you should be safe to upgrade Kubernetes.

Bonus

There was a fantastic episode of TGIK awhile back called Kubernetes API Removal and You that describes in great detail what all of the deprections mean and how to fix them – definitely worth a watch if you have the time.

Conclusion

OPA and testing configurations using tools like conftest and Rego policies is a great way to harden and help standardize configurations. Taken a step further, these configuration testing tools can be extended to test all sorts of other things.

Conftest looks especially promising because of the number of file types that it understands. There is a lot of potential here for doing things like unit testing Kubernetes configuration files and other things like Terraform configs.

I haven’t written any Rego policies yet but the language looks pretty straight forward and easy to deal with. I think that as configurations continue to evolve, tools like Conftest (OPA), Kubeval and Kustomize will gain more traction and help simplify some of the complexities of Kubernetes.

Quickly securing local secrets

September 15, 2019September 16, 2019 Josh Reichardt 1 Comment

One thing I have run into recently and have been thinking about a little bit lately, is a simple way to hide environment variables that contain sensitive information. For example, when working in a local environment, if you need access to a secret like an oauth token or some authentication method to an API, the first inclination is usually to just hard code the secret contents into your local bash/zsh profile so that it can be read anytime you need access to it. This method obviously will work but if the filesystem itself isn’t encrypted, the secret can easily be leaked and for a small amount of effort I believe I have found an effective way of shrinking the visibility of these secrets.

Inspired by the aws-vault tool which is a simple but secure way of storing local AWS credentials in environment variables using a local password store, in this post I will show you a quick and dirty way to add an extra layer of security to your (other) local environment by injecting sensitive secrets stored in an encrypted location (password store) into your local terminal. This method works for both OSX and Linux and is just a few lines of configuration and examples for both OSes are shown below.

In OSX the keychain is a good starting place for storing and retrieving secrets and in Linux the combination of GPG and the standard unix password manager “pass” work well together. Pass also works on OSX if you aren’t a fan of keychain.

Below are steps for storing and retrieving local secrets using the Linux pass tool. There are installation instructions and full documentation for how to use the tool in the link above. It should also be noted that the system needs to have GPG installed in order to write and read secrets.

One you have GPG configured, create the password store. I am skipping most of the GPG configuration because there is a lot to know, the command below should be enough to get things started. If you already have GPG set up and configured you can skip the setup.

Set up GPG and pass.

gpg2 --full-gen-key # follow prompts to create a gpg store with defaults
pass init <email> # use the same email address used with gpg
pass git init # optionally set pass up as a git repo

To create/edit a secret.

#pass insert/edit <secret>
pass insert mysecret
pass edit mysecret

Pass allows for hierarchies but in the example we are just going to put the secret at the top level. The command above will open the default editor. After closing the editor, the password will be written to an encrypted file in ~/.password-store. Once you have added the password you can show the contents of the newly added secret.

To read a secret into the terminal.

#pass show <secret>
pass show mysecret

You can also quickly list all of your secrets.

pass ls

Now that we have a created secret, we can write a little bash function to pull out the contents of the password and export them as an environment variable when the shell gets sourced. Put the following snippet into your ~/.bashrc, ~/.zshrc or ~/.bashprofile to read secrets.

get_password () {
  pass show "$1"
}

A similar result can be achieved in OSX using the “security” command line tool.

get_password () {
  security find-generic-password -ga "$1" -w
}

In your shell configuration file you can simply export the result of calling the get_password() function into an environment variable.

export MYSECRET="$(get_password mysecret)"

Source the shell profile to pickup the new changes. After that, you should now see the contents of the secret inside an environment variable in your terminal.

source ~/.bashrc
env | grep MYSECRET

Conclusion

Obviously this isn’t a perfect way to secure your environment since the secret is available to anyone who is able to connect to this user so make sure you practice good security in as many other ways as possible.

What this method does do though is cuts down the amount of sensitive information that can be gleaned from a user account by ensuring that shell secrets are encrypted at rest and unavailable as clear text.

Build a Pine64 Kubernetes Cluster with k3os

July 14, 2019July 14, 2019 Josh Reichardt Leave a comment

Kubernetes (k3os) arm64 cluster with custom 3D printed case

The k3os project was recently announced and I finally got a chance to test it out. k3os greatly simplifies the steps needed to create a Kubernetes cluster along with its counterpart, k3s, to reduce the overhead of running Kubernetes clusters. Paired with Rancher for the UI, all of these components make for an even better option. You can even run Rancher in your (arm64) k3os cluster via the Rancher Helm chart now.

Instead of using Etcd, k3s opts to use SQLite by default and does some other magic to reduce extra Kubernetes bloat and simplify management. Check here for more about k3s and how it works and how to run it.

k3os replaces some complicated OS components with much simpler ones. For example, instead of using Systemd it uses OpenRC, instead of Docker it uses containerd, it also leverages connman for configuring network components and it doesn’t use a package manager.

The method I am showing in this post uses the k3os overlay installation, which is detailed here. The reason for this choice is because the pine64 boards use u-boot to boot the OS and so special steps are needed to accommodate for the way this process is handled. The upside of this method is that these instructions should pretty much work for any of the Pi form factor boards, including the newly released Raspberry Pi4, with minimal changes.

Setup

If you haven’t downloaded and imaged your Pine64 yet, I like to use the ayufan images, which can be found here. You can easily write these images to a microSD card on OSX using something like Etcher.

Assuming the node is connected to your network, you can SSH into it.

ssh rock64@rock64 # or use the ip, password is rock64

When using the overlay installation, the first step is to download the k3os rootfs and lay it down on the host. This step applies to all nodes in the cluster.

curl -sL https://github.com/rancher/k3os/releases/download/v0.2.1/k3os-rootfs-arm64.tar.gz | tar --strip-components=1 -zxvf - -C

The above command is installing v0.2.1 which is the most current version as of writing this, so make sure to check if there is a newer version available.

After installing, lay down the following configurations into /k3os/system/config.yaml, modifying as needed. After the machine is rebooted this path will become read only so if you need to change the configuration again you will need to edit /etc/fstab to change the location to be writable again.

Server node

ssh_authorized_keys:
- ssh-rsa <your-public-ssh-key-to-login>
hostname: k3s-master

k3os:
  data_sources:
  - cdrom
  dns_nameservers:
  - 192.168.1.1 # update this to your local or public DNS server if needed
  ntp_servers:
  - 0.us.pool.ntp.org
  - 1.us.pool.ntp.org
  password: rancher
  token: <TOKEN>

The k3s config will be written to /etc/rancher/k3s/k3s.yaml on this node so make sure to grab it if you want to connect the cluster from outside this node. Reboot the machine to boot to the new filesystem and you should be greeted with the k3os splash screen.

Agent node

The agent uses nearly the same config, with the addition of the server_url. Just point the agent nodes to the server/master and you should be good to go. Again, reboot after creating the config and the host should boot to the new filesystem and everything should be ready.

ssh_authorized_keys:
- ssh-rsa <your-public-ssh-key-to-login>
hostname: k3s-node-1

k3os:
  data_sources:
  - cdrom
  dns_nameservers:
  - 192.168.1.1 # update this to your local or public DNS server if needed
  ntp_servers:
  - 0.us.pool.ntp.org
  - 1.us.pool.ntp.org
  password: rancher
  server_url: https://<server-ip-or-hostname>:6443
  token: <TOKEN>

You can do a lot more with the bootstrap configurations, such as setting labels or environment variables. Some folks in the community have had luck getting the wifi configuration working on the RPi4’s out of the box, but I haven’t been able to get it to work yet on my Pine64 cluster. Check the docs for more details on the various configuration options.

After the nodes have been rebooted and configs applied, the cluster “should just work”. You can check that the cluster is up using k3s using the kubectl passthrough command (checking from the master node below).

k3s-master [~]$ k3s kubectl get nodes
NAME         STATUS   ROLES    AGE     VERSION
k3s-master   Ready    <none>   6d1h    v1.14.1-k3s.4
k3s-node-1   Ready    <none>   7m10s   v1.14.1-k3s.4

NOTE: After installing the overlay filesystem there will be no package manager and no obvious way to upgrade the kernel so use this guide only for testing purposes. The project is still very young and a number of things still need to be added, including update mechanisms and HA. Be sure to follow the k3os issue tracker and Rancher Slack (#k3os channel) for updates and developments.

Conclusion

This is easily the best method I have found so far for getting a Kubernetes cluster up and running, minus the few caveats mentioned above, which I believe will be resolved very soon. I have been very impressed with how simple and easy it has been to configure and use. The next step for me is to figure out how to run Rancher and start working on some configurations for running workloads on the cluster. I will share more on that project in another post.

There are definitely some quirks to getting this setup working for the Pi and Pine64 based boards, but aren’t major problems by any means.

References

This post was heavily inspired by this gist for getting the overlay installation method working on Raspberry Pi.