Untangling CLI Text Processing Tools

Update (2/7/2020): Added jc

The landscape of command line driver text manipulation and processing tools is somewhat large and confusing, with more and more tools emerging all the time. Because I am having trouble keeping them all in my head, I decided to make a little reference guide to help remember which tool to choose for the correct task at hand.

  • jq
  • jc
  • yq (both python and go versions)
  • oq
  • xq
  • hq
  • jk
  • ytt
  • configula
  • jsonnet
  • sed (for everything else)

jq

Reach for jq first when you need to do any kind of processing with JSON files.

From the website, “jq is like sed for JSON data – you can use it to slice and filter and map and transform structured data with the same ease that sedawkgrep and friends let you play with text.”

sudo apt-get install jq
# or
brew install jq

# consume json and output as unchanged json
curl 'https://api.github.com/repos/stedolan/jq/commits?per_page=5' | jq '.'

jc

This project looks very interesting and is a refreshing thing to see in the Linux world. JC is basically a way to create structured objects (ala PowerShell) as JSON output from running various Linux commands. And from the GitHub repo, ” This tool serializes the output of popular gnu linux command line tools and file types to structured JSON output. This allows piping of output to tools like jq”.

Transforming data into structured objects can massively simplify interacting with them by pairing the output with jq to interact with.

pip3 install --upgrade jc
df | jc --df -p
[
  {
    "filesystem": "devtmpfs",
    "1k_blocks": 1918816,
    "used": 0,
    "available": 1918816,
    "use_percent": 0,
    "mounted_on": "/dev"
  },
  {
    "filesystem": "tmpfs",
    "1k_blocks": 1930664,
    "used": 0,
    "available": 1930664,
    "use_percent": 0,
    "mounted_on": "/dev/shm"
  },
  ...
]

yq (Python) (Go)

This tool can be confusing because there is both a Python version and a Go version. On top of that, the Python version includes its own version of xq, which is different than the standalone xq tool.

The main differences between the Python and Go version is that the Python version can deal with both yaml and xml while the Go version is meant to be used as a command line tool to deal with only yaml.

pip install yq
cat input.yml | yq -y .foo.bar

oq

From the website, oq is “A performant, portable jq wrapper thats facilitates the consumption and output of formats other than JSON; using jq filters to transform the data”.

The claim to fame that oq has is that it is very similar to jq but works better with other data formats, including xml and yaml. For example, you can read in some xml (and others), apply some filters, and output to yaml (and others). This flexibility makes oq a good option if you need to deal with different data formats oustide of JSON.

snap install oq
# or
brew tap blacksmoke16/tap && brew install oq

# consume json and output xml
echo '{"name": "Jim"}' | oq -o xml .

xq

From the Github page, “Apply XPath expressions to XML, like jq does for JSONPath and JSON”.

The coolest use case I have found for xq so far is taking in an xml file and outputting it into a json file, which surprising I haven’t found another tool that can do this (oc authors say there are plans to do this in the future). The simplest example is to curl a page, pipe it through xq to change it to json and then pipe it again and use jq to manipulate the data.

pip install yq
curl -s https://mysite.xml | xq .

hq

Like xq (and jq) but for html parsing. This tool is handy for manipulating html in the same way you would xml or json.

pip install hq
cat /path/to/file.html | hq '`Hello, ${/html/head/title}!`'

jk

This is a newer tool, with a slightly different approach aimed at helping to automate configurations, especially for things like Kubernetes but should work with most structured data.

This tool works with json, yaml and hcl and can be used in conjunction with Javascript, making it an interesting option.

curl -Lo jk https://github.com/jkcfg/jk/releases/download/0.3.1/jk-darwin-amd64
chmod +x jk
sudo mv jk /usr/local/bin/

// alice.js
const alice = {
  name: 'Alice',
  beverage: 'Club-Mate',
  monitors: 2,
  languages: [
    'python',
    'haskell',
    'c++',
    '68k assembly', // Alice is cool like that!
  ],
};

// Instruct to write the alice object as a YAML file.
export default [
  { value: alice, file: `developers/${alice.name.toLowerCase()}.yaml` },
];

jk generate -v alice.js

ytt

This is basically a simplified templating language that only intends to deal with yaml. The approach the authors took was to create yaml templates and sanbdox/embed Python into the templating engine, allowing users to call on the power of Python inside of their templates.

The easiest way to play around with ytt if you don’t want to clone the repo is to try out the online playground.

curl -Lo ytt https://github.com/k14s/ytt/releases/download/v0.25.0/ytt-darwin-amd64
chmod +x ytt
sudo mv ytt /usr/local/bin/

https://github.com/k14s/ytt.git && cd ytt
ytt -f examples/playground/example-demo/

configula

From the GitHub page, ” Configula is a configuration generation language and processor. It’s goal is to make the programmatic definition of declarative configuration easy and intuitive”.

Similar in some ways to ytt, but instead of embedding Python into the yaml template file, you create a .py file and then render the py file into yaml using the Configula command line tool.

git clone https://github.com/brendandburns/configula
cd configula

# tiny.py
# Define a YAML object where the 'foo' field has the value of evaluating 1 + 2 (e.g. 3)
my_obj = foo: !~ 1 + 2
my_obj.render()

./configula examples/tiny.py

jsonnet

Jsonnet bills itself as a “data templating language for app and tool developers”. This tool was originally created by folks working at Google, and has been around for quite some time now. For some reason always seems to fly underneath the radar but it is super powerful.

This tool is a superset of JSON and allows you to add conditionals, loops and other functions available as part of its standard library. Jsonnet can render itself into json and yaml output.

pip install jsonnet
# or
brew install jsonnet

// example.jsonnet
{
  person1: {
    name: "Alice",
    welcome: "Hello " + self.name + "!",
  },
  person2: self.person1 { name: "Bob" },
}

jsonnet -S example.jsonnet

sed

For (pretty much) everything else, there is Sed. Sed, short for stream editor, has been around forever and is basically a Swiss army knife for manipulating text, and if you have been using *nix for any length of time you have more than likely come across this tool before. From their docs, Sed is “a stream editor is used to perform basic text transformations on an input stream”.

The odds are good that Sed will likely do what you’re looking for if you can’t use one of the aforementioned tools.

Read More

Kubernetes Tips and Tricks

I have been getting more familiar with Kubernetes in the past few months and have uncovered some interesting capabilities that I had no idea existed when I started, which have come in handy in helping me solve some interesting and unique problems.  I’m sure there are many more tricks I haven’t found, so please feel free to let me know of other tricks you may know of.

Semi related; if you haven’t already checked it out, I wrote a post awhile ago about some of the useful kubectl tricks I have discovered.  The CLI has improved since then so I’m sure there are more and better tricks now but it is still a good starting point for new users or folks that are just looking for more ideas of how to use kubectl.  Again, let me know of any other useful tricks and I will add them.

Kubernetes docs

The Kubernetes community has somewhat of a love hate relationship with the documentation, although that relationship has been getting much better over time and continues to improve.  Almost all of the stuff I have discovered is scattered around the documentation, the main issue is that it a little difficult to find unless you know what you’re looking for.  There is so much information packed into these docs and so many features that are tucked away that aren’t obvious to newcomers.  The docs have been getting better and better but there are still a few gaps in examples and general use cases that are missing.  Often the “why” of using various features is still sometimes lacking.

Another point I’d like to quickly cover is the API reference documentation.  When you are looking for some feature or functionality and the main documentation site fails, this is the place to go look as it has everything that is available in Kubernetes.  Unfortunately the API reference is also currently a challenge to use and is not user friendly (especially for newcomers), so if you do end up looking through the API you will have to spend some time to get familiar with things, but it is definitely worth reading through to learn about capabilities you might not otherwise find.

For now, the best advice I have for working with the docs and testing functionality is trial and error.  Katacoda is an amazing resource for playing around with Kubernetes functionality, so definitely check that out if you haven’t yet.

Simple leader election

Leader election built on Kubernetes is really neat because it buys you a quick and dirty way to do some pretty complicated tasks.  Usually, implementing leader election requires extra software like ZooKeeper, etcd, Consul or some other distributed key/value store for keeping track of consensus, but it is built into Kubernetes, so you don’t have much extra work to get it working.

Leader election piggy backs off the same etcd Kubernetes uses as well as Kubernetes annotations, which give users a robust way to do distributed tasks without having to recreate the wheel for doing complicated leader elections.

Basically, you can deploy the leader-elector as a sidecar with any app you deploy.  Then, any container in the pod that’s interested in who is the master can can check by visiting the http endpoint (localhost:4044 by default) and they will get back some json with the current leader.

Shared process namespace across namespaces

This is a beta feature currently (as of 1.13) so is enabled now by default.  This one is interesting because it allows you to share a PID between containers.  Unfortunately the docs don’t really tell you why this feature is useful.

Basically, if you add shareProcessNamespace: true to your pod spec, you turn on the ability to share a PID across containers. This allows you to do things like changing a configuration in one container, sending a SIGHUP, and then reloading that configuration in another container.

For example, running a sidecars that controls configuration files or for reaping orphaned zombie processes.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  shareProcessNamespace: true
  containers:
  - name: nginx
    image: nginx
  - name: shell
    image: busybox
    securityContext:
      capabilities:
        add:
        - SYS_PTRACE
    stdin: true
    tty: true

Custom termination messages

Custom termination messages can be useful when debugging tricky situations.

You can actually customize pod terminations by using the terminationMessagePolicy which can control how terminations get outputted. For example, by using FallbackToLogsOnError you can tell Kubernetes to use container log output if the termination message is empty and the container exited with error.

Likewise, you can specify the terminationMessagePath spec to customize the path to a log file for specifying successes and failures when a pod terminates.

apiVersion: v1
kind: Pod
metadata:
  name: msg-path-demo
spec:
  containers:
  - name: msg-path-demo-container
    image: debian
    terminationMessagePath: "/tmp/my-log"

Container lifecycle hooks

Lifecycle hooks are really useful for doing things either after  a container has started (such as joining a cluster) or for running commands/code for cleanup when a container is stopped (such as leaving a cluster).

Below is a straight forward example taken from the docs that writes a message after a pod starts and sends a quit signal to nginx when the pod is destroyed.

apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
  - name: lifecycle-demo-container
    image: nginx
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
      preStop:
        exec:
          command: ["/usr/sbin/nginx","-s","quit"]

Kubernetes downward API

This one is probably more known, but I still think it is useful enough to add to the list.  The downward API basically allows you to grab all sorts of useful metadata information about containers, including host names and IP addresses.  The downward API can also be used to retrieve information about resources for pods.

The simplest example to show off the downward API is to use it to configure a pod to use the hostname of the node as an environment variable.

apiVersion: v1
kind: Pod
spec:
  containers:
    - name: test-container
      image: k8s.gcr.io/busybox
      command: [ "sh", "-c"]
      args:
      - while true; do
          echo -en '\n';
          printenv MY_NODE_NAME
          sleep 10;
        done;
      env:
        - name: MY_NODE_NAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName

Injecting a script into a container from a configmap

This is a useful trick when you want to add a layer on top of a Docker container but don’t necessarily want to build either a custom image or update an existing image.  By injecting the script as a configmap directly into the container you can augment a Docker image to do basically any extra work you need it to do.

The only caveat is that in Kubernetes, configmaps are by default not set to be executable.

In order to make your script work inside of Kubernetes you will simply need to add defaultMode: 0744 to your configmap volume spec. Then simply mount the config as volume like you normally would and then you should be able to run you script as a normal command.

...
volumeMounts:
- name: wrapper
mountPath: /scripts
volumes:
- name: wrapper
configMap:
name: wrapper
defaultMode: 0744
...

Using commands as liveness/readiness checks

This one is also pretty well known but often forgotten.  Using commands a health checks is a nice way to check that things are working.  For example, if you are doing complicated DNS things and want to check if DNS has updated you can use dig.  Or if your app updates a file when it becomes healthy, you can run a command to check for this.

readinessProbe:
exec:
command:
- cat
- /tmp/healthy
initialDelaySeconds: 5
periodSeconds: 5

Host aliases

Host aliases in Kubernetes offer a simple way to easily update the /etc/hosts file of a container.  This can be useful for example if a localhost name needs to be mapped to some DNS name that isn’t handled by the DNS server.

apiVersion: v1
kind: Pod
metadata:
  name: hostaliases-pod
spec:
  restartPolicy: Never
  hostAliases:
  - ip: "127.0.0.1"
    hostnames:
    - "foo.local"
    - "bar.local"
  containers:
  - name: cat-hosts
    image: busybox
    command:
    - cat
    args:
    - "/etc/hosts"

Conclusion

As mentioned, these are just a few gems that I have uncovered, I’m sure there are a lot of other neat tricks out there.  As I get more experience using Kubernetes I will be sure to update this list.  Please let me know if there are things that should be on here that I missed or don’t know about.

Read More

Python virtualenv Notes

Virtual environments are really useful for maintaining different packages and for separating different environments without getting your system messy.  In this post I will go over some of the various virtual environment tricks that I seem to always forget if I haven’t worked with Python in awhile.

This post is meant to be mostly a reference for remembering commands and syntax and other helpful notes.  I’d also like to mention that these steps were all tested on OSX, I haven’t tried on Windows so don’t know if it is any different.

Working with virtual environments

There are a few pieces in order to get to get started.  First, the default version of Python that ships with OSX is 2.7, which is slowly moving towards extinction.  Unfortunately, it isn’t exactly obvious how to replace this version of Python on OSX.

Just doing a “brew install python” won’t actually point the system at the newly installed version.  In order to get Python 3.x working correctly, you need to update the system path and place Python3 there.

export PATH="/usr/local/opt/python/libexec/bin:$PATH"

You will want to put the above line into your bashrc or zshrc (or whatever shell profile you use) to get the brew installed Python onto your system path by default.

Another thing I discovered – in Python 3 there is a built in command for creating virtual environments, which alleviates the need to install the virtualenv package.

Here is the command in Python 3 the command to create a new virtual environment.

python -m venv test

Once the environment has been created, it is very similar to virtualenv.  To use the environment, just source it.

source test/bin/activate

To deactivate the environment just use the “deactivate” command, like you would in virutalenv.

The virtualenv package

If you like the old school way of doing virtual environments you can still use the virtualenv package for managing your environments.

Once you have the correct version of Python, you will want to install the virtualenv package on your system globally in order to work with virtual environments.

sudo pip install virtualenvwrapper

You can check to see if the package was installed correctly by running virtualenv -h.  There are a number of useful virtualenv commands listed below for working with the environments.

Make a new virtual env

mkvirtualenv test-env

Work on a virtual env

workon test-env

Stop working on a virtual env

(when env is actiave) deactive

List environments

lsvirtualenv

Remove a virtual environment

rmvirtualenv test-env

Create virtualenv with specific version of python

mkvirtualenv -p $(which pypy) test2-env

Look at where environments are stored

ls ~/.virtualenvs

I’ll leave it here for now.  The commands and tricks I (re)discovered were enough to get me back to being productive with virtual environments.  If you have some other tips or tricks feel free to let me know.  I will update this post if I find anything else that is noteworthy.

Read More

Toggle the Vscode sidebar with vim bindings

There is an annoying behavior that you might run across in the Windows version of Vscode if you are using the Vsvim plugin when trying to use the default hotkey <ctrl+b> to toggle the sidebar open and closed.  As far as I can tell, this glitch doesn’t exist in the OSX version of Vscode.

In vim, the shortcut for this toggle is actually used to scroll the page buffer up one screen.  Obviously this makes behavior is the default for Vim but it also is annoying to not be able to open and close the sidebar.  The solution is a simple hotkey remap for the keyboard shortcut in Vscode.

To do this, pull open the keyboard shortcuts by either navigating to File -> Preferences -> Keyboard shortcuts

Or by using the command pallet (ctrl + shift + p) and searching for “keyboard shortcut”.

Once the keyboard shortcuts have been pulled open just do a search for “sidebar” and you should see the default key binding.

Just click in the “Toggle Side Bar Visibility” box and remap the keybinding to Ctrl+Shift+B (since that doesn’t get used by anything by default in Vim).  This menu is also really useful for just discovering all of the different keyboard shortcuts in Vscode.  I’m not really a fan of going crazy and totally customizing hotkeys in Vscode just because it makes support much harder but in this case it was easy enough and works really well with my workflow.

Read More

Fix the JenkinsAPI No valid crumb error

If you are working with the Python based JenkinsAPI library you might run into the No valid crumb was included in the request error.  The error below will probably look familiar if you’ve run into this issue.

Traceback (most recent call last):
 File "myscript.py", line 47, in <module>
 deploy()
 File "myscript.py", line 24, in deploy
 jenkins.build_job('test')
 File "/usr/local/lib/python3.6/site-packages/jenkinsapi/jenkins.py", line 165, in build_job
 self[jobname].invoke(build_params=params or {})
 File "/usr/local/lib/python3.6/site-packages/jenkinsapi/job.py", line 209, in invoke
 allow_redirects=False
 File "/usr/local/lib/python3.6/site-packages/jenkinsapi/utils/requester.py", line 143, in post_and_confirm_status
 response.text.encode('UTF-8')
jenkinsapi.custom_exceptions.JenkinsAPIException: Operation failed. url=https://jenkins.example.com/job/test/build, data={'json': '{"parameter": [], "statusCode": "303", "redirectTo": "."}'}, headers={'Content-Type': 'application/x-www-form-urlencoded'}, status=403, text=b'<html>\n<head>\n<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>\n<title>Error 403 No valid crumb was included in the request</title>\n</head>\n<body><h2>HTTP ERROR 403</h2>\n<p>Problem accessing /job/test/build. Reason:\n<pre> No valid crumb was included in the request</pre></p><hr><a href="http://eclipse.org/jetty">Powered by Jetty:// 9.4.z-SNAPSHOT</a><hr/>\n\n</body>\n</html>\n'

It is good practice to enable additional security in Jenkins by turning on the “Prevent Cross Site Forgery exploits” option in the security settings, so if you see this error it is a good thing.  The below example shows this security feature in Jenkins.

enable xss protection

The Fix

This error threw me off at first, but it didn’t take long to find a quick fix.  There is a crumb_requester class in the jenkinsapi that you can use to create the crumbed auth token.  You can use the following example as a guideline in your own code.

from jenkinsapi.jenkins import Jenkins
from jenkinsapi.utils.crumb_requester import CrumbRequester

JENKINS_USER = 'user'
JENKINS_PASS = 'pass'
JENKINS_URL = 'https://jenkins.example.com'

# We need to create a crumb for the request first
crumb=CrumbRequester(username=JENKINS_USER, password=JENKINS_PASS, baseurl=JENKINS_URL)

# Now use the crumb to authenticate against Jenkins
jenkins = Jenkins(JENKINS_URL, username=JENKINS_USER, password=JENKINS_PASS, requester=crumb)

...

The code looks very similar to creating a normal Jenkins authentication object, the only difference being that we create and then pass in a crumb for the request, rather than just a username/password combination.  Once the crumbed authentication object has been created, you can continue writing your Python code as you would normally.  If you’re interested in learning more about crumbs and CSRF you can find more here, or just Google for CSRF for more info.

This issue was slightly confusing/annoying, but I’d rather deal with an extra few lines of code and know that my Jenkins server is secure.

Read More