Untangling CLI Text Processing Tools

Update (2/7/2020): Added jc

The landscape of command line driver text manipulation and processing tools is somewhat large and confusing, with more and more tools emerging all the time. Because I am having trouble keeping them all in my head, I decided to make a little reference guide to help remember which tool to choose for the correct task at hand.

  • jq
  • jc
  • yq (both python and go versions)
  • oq
  • xq
  • hq
  • jk
  • ytt
  • configula
  • jsonnet
  • sed (for everything else)

jq

Reach for jq first when you need to do any kind of processing with JSON files.

From the website, “jq is like sed for JSON data – you can use it to slice and filter and map and transform structured data with the same ease that sedawkgrep and friends let you play with text.”

sudo apt-get install jq
# or
brew install jq

# consume json and output as unchanged json
curl 'https://api.github.com/repos/stedolan/jq/commits?per_page=5' | jq '.'

jc

This project looks very interesting and is a refreshing thing to see in the Linux world. JC is basically a way to create structured objects (ala PowerShell) as JSON output from running various Linux commands. And from the GitHub repo, ” This tool serializes the output of popular gnu linux command line tools and file types to structured JSON output. This allows piping of output to tools like jq”.

Transforming data into structured objects can massively simplify interacting with them by pairing the output with jq to interact with.

pip3 install --upgrade jc
df | jc --df -p
[
  {
    "filesystem": "devtmpfs",
    "1k_blocks": 1918816,
    "used": 0,
    "available": 1918816,
    "use_percent": 0,
    "mounted_on": "/dev"
  },
  {
    "filesystem": "tmpfs",
    "1k_blocks": 1930664,
    "used": 0,
    "available": 1930664,
    "use_percent": 0,
    "mounted_on": "/dev/shm"
  },
  ...
]

yq (Python) (Go)

This tool can be confusing because there is both a Python version and a Go version. On top of that, the Python version includes its own version of xq, which is different than the standalone xq tool.

The main differences between the Python and Go version is that the Python version can deal with both yaml and xml while the Go version is meant to be used as a command line tool to deal with only yaml.

pip install yq
cat input.yml | yq -y .foo.bar

oq

From the website, oq is “A performant, portable jq wrapper thats facilitates the consumption and output of formats other than JSON; using jq filters to transform the data”.

The claim to fame that oq has is that it is very similar to jq but works better with other data formats, including xml and yaml. For example, you can read in some xml (and others), apply some filters, and output to yaml (and others). This flexibility makes oq a good option if you need to deal with different data formats oustide of JSON.

snap install oq
# or
brew tap blacksmoke16/tap && brew install oq

# consume json and output xml
echo '{"name": "Jim"}' | oq -o xml .

xq

From the Github page, “Apply XPath expressions to XML, like jq does for JSONPath and JSON”.

The coolest use case I have found for xq so far is taking in an xml file and outputting it into a json file, which surprising I haven’t found another tool that can do this (oc authors say there are plans to do this in the future). The simplest example is to curl a page, pipe it through xq to change it to json and then pipe it again and use jq to manipulate the data.

pip install yq
curl -s https://mysite.xml | xq .

hq

Like xq (and jq) but for html parsing. This tool is handy for manipulating html in the same way you would xml or json.

pip install hq
cat /path/to/file.html | hq '`Hello, ${/html/head/title}!`'

jk

This is a newer tool, with a slightly different approach aimed at helping to automate configurations, especially for things like Kubernetes but should work with most structured data.

This tool works with json, yaml and hcl and can be used in conjunction with Javascript, making it an interesting option.

curl -Lo jk https://github.com/jkcfg/jk/releases/download/0.3.1/jk-darwin-amd64
chmod +x jk
sudo mv jk /usr/local/bin/

// alice.js
const alice = {
  name: 'Alice',
  beverage: 'Club-Mate',
  monitors: 2,
  languages: [
    'python',
    'haskell',
    'c++',
    '68k assembly', // Alice is cool like that!
  ],
};

// Instruct to write the alice object as a YAML file.
export default [
  { value: alice, file: `developers/${alice.name.toLowerCase()}.yaml` },
];

jk generate -v alice.js

ytt

This is basically a simplified templating language that only intends to deal with yaml. The approach the authors took was to create yaml templates and sanbdox/embed Python into the templating engine, allowing users to call on the power of Python inside of their templates.

The easiest way to play around with ytt if you don’t want to clone the repo is to try out the online playground.

curl -Lo ytt https://github.com/k14s/ytt/releases/download/v0.25.0/ytt-darwin-amd64
chmod +x ytt
sudo mv ytt /usr/local/bin/

https://github.com/k14s/ytt.git && cd ytt
ytt -f examples/playground/example-demo/

configula

From the GitHub page, ” Configula is a configuration generation language and processor. It’s goal is to make the programmatic definition of declarative configuration easy and intuitive”.

Similar in some ways to ytt, but instead of embedding Python into the yaml template file, you create a .py file and then render the py file into yaml using the Configula command line tool.

git clone https://github.com/brendandburns/configula
cd configula

# tiny.py
# Define a YAML object where the 'foo' field has the value of evaluating 1 + 2 (e.g. 3)
my_obj = foo: !~ 1 + 2
my_obj.render()

./configula examples/tiny.py

jsonnet

Jsonnet bills itself as a “data templating language for app and tool developers”. This tool was originally created by folks working at Google, and has been around for quite some time now. For some reason always seems to fly underneath the radar but it is super powerful.

This tool is a superset of JSON and allows you to add conditionals, loops and other functions available as part of its standard library. Jsonnet can render itself into json and yaml output.

pip install jsonnet
# or
brew install jsonnet

// example.jsonnet
{
  person1: {
    name: "Alice",
    welcome: "Hello " + self.name + "!",
  },
  person2: self.person1 { name: "Bob" },
}

jsonnet -S example.jsonnet

sed

For (pretty much) everything else, there is Sed. Sed, short for stream editor, has been around forever and is basically a Swiss army knife for manipulating text, and if you have been using *nix for any length of time you have more than likely come across this tool before. From their docs, Sed is “a stream editor is used to perform basic text transformations on an input stream”.

The odds are good that Sed will likely do what you’re looking for if you can’t use one of the aforementioned tools.

Josh Reichardt

Josh is the creator of this blog, a system administrator and a contributor to other technology communities such as /r/sysadmin and Ops School. You can also find him on Twitter and Facebook.