Building Multiarch Conda-Forge Recipes

As part of my adventures in building a 100% Arm64 Kubernetes cluster, I recently tried to build an Arm64 Jupyterhub Docker image to run in the cluster. To my surprise, there don’t seem to be any “official” Jupyterhub arm64 Docker images out there, so I decided to set out and create one.

In the process of building my image, I almost immediately hit a stumbling block: the Docker image uses the Conda package manager and several Conda packages for its build. The problem is that several of these packages have not yet been built for alternate architectures such as Arm64. So I went down the rabbit hole of seeing how hard it would be to add that support to these packages in order to get the Jupyterhub Docker image working.

The first stop on this journey was to conda-forge to look at the multiarch support. If you aren’t familiar (I wasn’t), conda-forge bills itself as a large Github community for creating and building Conda packages.

The first thing to look at when adding support to an existing package is getting familiar with conda-smithy, which is the tool responsible for setting up and building all of the various conda-forge “recipes”. There are generic instructions for using conda-smithy here.

As a fun side note, there is no “native” Arm64 build infrastructure for creating packages. The current builds use QEMU to emulate aarch64 (arm64) on Azure Pipelines. This has some issues, so while I was down in the rabbit hole I decided to contribute a PR to help get native arm64 builds added. The work isn’t yet complete (it still needs to be hooked up to CI), so if you want to help out, feel free to let me know or just open a PR in the conda-smithy repo.

Multiarch support

With the housekeeping out of the way, we can now look at how to actually add the multiarch support for a package.

First, fork and clone the desired recipe.  In this example I am adding arm64 support to the pycurl recipe as it is one of the Conda package dependencies that I need to build Jupyterhub for Arm64.

git clone https://github.com/conda-forge/pycurl-feedstock.git

Edit conda-forge.yml and add the following line to the bottom.

provider: {linux_aarch64: default, linux_ppc64le: default}

If you are just adding support for new architectures, like I am here, you will need to bump the build number. This can be found in recipe/meta.yaml, and there are also instructions for doing this.

…
build:
  number: 0
…

Just change this value to 1. Next, install conda-smithy if you don’t have it already.

conda install conda-smithy

And then you can render out all the new files needed for the various builds.

conda-smithy rerender
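The rerender touches a fair number of files (typically the .ci_support configs and the CI pipeline definitions, though the exact set depends on the recipe), so it is worth a quick sanity check of what changed before committing anything.

git status --short
git diff --stat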

Add the generated files to a new (forked) branch of the recipe.

git add .
git commit -m "Add multiarch support"
git push

Then open up a PR to the conda-forge repo with the details. Once the PR has been opened, a series of checks should kick off to build the recipe for the various architectures.

CI checks

If everything is green you are good to go.  Maintainers are usually pretty good about merging in changes, but if you need to, you can ping an admin to get help.

You can also tell the build to rerun if it fails using the “@conda-forge-admin, please rerender” command.

You can find more details about what all the bot can do here.

Conclusion

Conda-forge provides some nifty tools for large-scale automation and makes it super easy for outsiders to contribute to the community. If you find a package that is missing, outdated, or lacking multiarch support on the Anaconda repo (which includes packages contributed by conda-forge along with many others), definitely think about contributing. The process of adding changes is easy and the conda-forge community is growing all the time.


Manually Reset Windows Subsystem for Linux

The problem is described in more detail in the link below, and there are tons of posts in that thread. It was an arduous process, though, and there wasn’t a lot of other information about this problem, so I figured I would write up a quick summary of how to fix it for the case where the instructions aren’t quite enough.

https://github.com/Microsoft/WSL/issues/473#issuecomment-224422589

Basically, you need to run the lxrun command to reset Windows Subsystem for Linux. The instructions boil down to the following.

1. lxrun /uninstall /full /y
2. Reboot.
3. From an admin prompt or through Explorer, delete all of the content under the %localappdata%\lxss directory.
4. Install using bash.exe or "LxRun.exe /install".

However, something strange happened and every time I tried to do this I got a neat little error.

PS C:\Users\jmreicha\AppData\Local> LxRun /install /y
Warning: lxrun.exe is only used to configure the legacy Windows Subsystem for Linux distribution.
Distributions can be installed by visiting the Microsoft Store:
https://aka.ms/wslstore

This will install Ubuntu on Windows, distributed by Canonical and licensed under its terms available here:
https://aka.ms/uowterms

Error: 0x80070005

Likewise, I couldn’t reinstall the Windows subsystem using other methods; I just got the same exact error. Poking around a little bit more, I found that the files the subsystem uses get saved off into %localappdata%\lxss. By default this directory isn’t displayed in Windows, so in order to see this folder you need to uncheck the “Hide protected operating system files (Recommended)” option in Windows Explorer View Options.

After showing OS files I was able to see the lxss directory, so I tried to delete the files, but got an interesting error message saying the files were no longer there and that Windows couldn’t remove them.

could not find this item
Item Not Found

The only way I found to get around this problem was to rename the folder (hence the _old suffix in the screen shot).
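For reference, the rename itself is just a one-liner from an elevated PowerShell prompt (the lxss_old name is arbitrary, anything other than lxss works):

# -Force is needed because the lxss directory is hidden
Rename-Item -Path "$env:LOCALAPPDATA\lxss" -NewName "lxss_old" -Force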

After moving the directory to another name I was able to run the lxrun /install command and successfully reinstall Ubuntu.

PS C:\Users\jmreicha\AppData\Local> LxRun /install
Warning: lxrun.exe is only used to configure the legacy Windows Subsystem for Linux distribution.
Distributions can be installed by visiting the Microsoft Store:
https://aka.ms/wslstore

This will install Ubuntu on Windows, distributed by Canonical and licensed under its terms available here:
https://aka.ms/uowterms

Type "y" to continue: y
Downloading from the Microsoft Store... 100%
Extracting filesystem, this will take a few minutes...

Conclusion

I saw various renditions of the solution sprinkled around the internet but none of them seemed to mention how to handle the case where Windows couldn’t locate a file and therefore couldn’t actually delete the lxss directory and recreate everything.

I haven’t figured out how to hard delete everything in the newly renamed “lxss_old” directory, so if anyone has any input it would be appreciated. As far as I know its contents are still consuming some amount of space on the system.


Set up Drone on arm64 Kubernetes clusters

Continuing with the multiarch and Kubernetes narratives that I have been writing about for a while now, in this post I will be showing off some of the capabilities of the Drone.io continuous integration tool running on my arm64 Kubernetes homelab cluster.

As arm64 continues to gain popularity, more and more projects are adding support for it, including Drone.io. With its semi-recent announcement, Drone can now run on a variety of different architectures. Likewise, the Drone maintainers have been working towards a 1.0 release, which brings first-class support for Kubernetes, among other things.

I won’t touch on too many of the specifics of Drone, but if you’re interested in learning more, you can check out the website. I will mostly be focusing on how to get things running in Kubernetes, which turns out to be exactly the same for both amd64 and arm64 architectures. There were a few things I had to figure out along the way to get the Kubernetes integration working, but for the most part things “just work”.

I started off by grabbing the Helm chart and modifying it to suit my needs. I find it easiest to template the chart and then save it off locally so I can play around with it.

Note: the below example assumes you already have helm installed locally.

git clone git@github.com:helm/charts.git && cd charts
helm template --name drone --namespace cicd \
   --set 'sourceControl.provider=github' \
   --set 'sourceControl.github.clientID=XXXXXXXX' \
   --set 'sourceControl.secret=drone-server-secrets' \
   --set 'server.host=drone.example.com' \
   --set 'server.kubernetes.enabled=false' \
   stable/drone > /tmp/manifest.yaml

Obviously you will want to set the configuration to match your own settings, like the domain name and OAuth settings (I used GitHub).

After saving out the manifest, the first issue I ran into is that port 9000 is still referenced in the Helm chart. That port was used for communication between the client and server in older releases but is no longer used, so I just completely removed the references to it in my Frankenstein configuration. If you are just using the Kubernetes integration mentioned below you won’t run into these problems connecting the server and agent, but if you use the agent you will.

There is some server config that will need to be adjusted as well to get things working. For example, the OAuth settings will need to be created on the GitHub side first in order for any of this to work. Also, the Drone server host will need to be accessible from the internet, so any firewall rules will need to be added or adjusted to allow traffic.

env:
  # Webhook settings
  - name: DRONE_ALWAYS_AUTH
    value: "false"
  - name: DRONE_SERVER_HOST
    value: "drone.example.com"
  - name: DRONE_SERVER_PROTO
    #value: http
    value: https
  # Agent config
  - name: DRONE_RPC_SECRET
    valueFrom:
      secretKeyRef:
        name: drone
        key: secret
  # Server config
  - name: DRONE_DATABASE_DATASOURCE
    value: "/var/lib/drone/drone.sqlite"
  - name: DRONE_DATABASE_DRIVER
    value: "sqlite3"
  - name: DRONE_LOGS_DEBUG
    value: "true"
  - name: DRONE_LOGS_PRETTY
    value: "true"
  - name: DRONE_USER_CREATE
    value: "username:<github_user>,machine:false,admin:true,token:abc123"
  # Github config
  - name: DRONE_GITHUB_CLIENT_ID
    value: abcd
  - name: DRONE_GITHUB_SERVER
    value: https://github.com
  - name: DRONE_GITHUB_CLIENT_SECRET
    valueFrom:
      secretKeyRef:
        name: client-secret-drone
        key: secret

Add the DRONE_USER_CREATE env var to bootstrap an admin account when starting the Drone server. This will allow your user to do all of the admin things using the CLI tool.
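For example, once the server is deployed and reachable, you can point the Drone CLI at it and verify that the bootstrapped admin works. The abc123 value here is just the placeholder token from the example above, so substitute your real token.

export DRONE_SERVER=https://drone.example.com
export DRONE_TOKEN=abc123   # token from DRONE_USER_CREATE
drone info                  # should print the bootstrapped admin user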

The secrets should get generated when you dump the Helm chart, so feel free to update those with any values you may need.
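Once the manifest looks the way you want, deploying it is just the normal kubectl workflow; something along these lines, assuming the cicd namespace used throughout this post:

kubectl create namespace cicd
kubectl apply --namespace cicd -f /tmp/manifest.yaml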

Note: if you have double checked all of your settings but builds aren’t being triggered, there is a good chance that the webhook is the problem. There is a really good post about how to troubleshoot these settings here.

Running Drone with the Kubernetes integration

This option turned out to be the easier approach. Just add the following configuration to the drone server deployment environment variables, updating according to your environment. For example, the namespace I am deploying to is called “cicd”, which will need to be updated if you choose a different namespace.

- name: DRONE_KUBERNETES_ENABLED
  value: "true"
- name: DRONE_KUBERNETES_NAMESPACE
  value: cicd
- name: DRONE_KUBERNETES_SERVICE_ACCOUNT
  value: drone-pipeline
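One gotcha is that the service account referenced above needs to exist in the namespace (and, depending on what your pipelines do, may need some RBAC of its own). The following is only a rough sketch of what that might look like; the exact resources and verbs needed may differ, and the Helm chart may already create some of this for you, so treat it as a starting point.

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: drone-pipeline
  namespace: cicd
---
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: drone-pipeline
  namespace: cicd
rules:
# Builds run as jobs/pods, and pipeline credentials end up as secrets
- apiGroups: [""]
  resources: ["pods", "pods/log", "secrets"]
  verbs: ["get", "list", "watch", "create", "delete"]
- apiGroups: ["batch"]
  resources: ["jobs"]
  verbs: ["get", "list", "watch", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: drone-pipeline
  namespace: cicd
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: Role
  name: drone-pipeline
subjects:
- kind: ServiceAccount
  name: drone-pipeline
  namespace: cicd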

The main downside to this method is that it creates Kubernetes jobs for each build. By default, once these builds are done, they will exit and not clean themselves up, so if you do a lot of builds your namespace will get clogged up with completed jobs. There is a way to have finished jobs clean themselves up in newer versions of Kubernetes via the TTLAfterFinished feature gate, but this functionality isn’t enabled by default yet and is a little bit out of the scope of this post.
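For reference, the TTL mechanism is just a field on the Job spec once the TTLAfterFinished feature gate is turned on; this isn’t something the Drone integration sets for you, it is plain Kubernetes.

apiVersion: batch/v1
kind: Job
metadata:
  name: example-build
spec:
  # Clean up the Job (and its pods) five minutes after it finishes;
  # requires the TTLAfterFinished feature gate to be enabled
  ttlSecondsAfterFinished: 300
  template:
    spec:
      containers:
      - name: build
        image: alpine:3.9
        command: ["echo", "done"]
      restartPolicy: Never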

Running Drone with the agent configuration

The agent uses the sidecar pattern to run a Docker in Docker (dind) container to connect to the Docker socket in order to allow the Drone agent to do its builds.

The main downside of using this approach is that there seems to be an issue (sometimes) where the Drone components can’t talk to the Docker socket; you can find a better description of this problem and more details here. The problem seems to be a race condition where the Docker socket isn’t mounted before the agent comes up, but I still haven’t totally solved that problem yet. The advice for getting around this is to run the agent on a dedicated standalone host to avoid race conditions and other pitfalls.

That being said, if you still want to use this method you will need to add an additional deployment to the config for the Drone agent. If you use the agent you can disregard the above Kubernetes environment variable configuration and instead set the appropriate environment variables in the agent. Below is the working snippet I used for deploying the agent to my test cluster.

---
apiVersion: extensions/v1beta1
kind: Deployment
metadata:
  name: drone-agent
  labels:
    app: drone
    component: agent
spec:
  replicas: 1
  template:
    metadata:
      labels:
        app: drone
        component: agent
    spec:
      serviceAccountName: drone
      containers:
      - name: agent
        image: "docker.io/drone/agent:1.0.0-rc.6"
        imagePullPolicy: IfNotPresent
        ports:
        - name: http
          containerPort: 3000
          protocol: TCP
        env:
          # This value should point to the Drone server service
          - name: DRONE_RPC_SERVER
            value: http://drone.cicd
          - name: DRONE_RPC_SECRET
            valueFrom:
              secretKeyRef:
                name: drone
                key: secret
          - name: DOCKER_HOST
            value: tcp://localhost:2375
          - name: DRONE_LOGS_DEBUG
            value: "true"
          # Uncomment this for additional trace logs
          #- name: DRONE_LOGS_TRACE
          #  value: "true"
          - name: DRONE_LOGS_PRETTY
            value: "true"

      - name: dind
        image: "docker.io/library/docker:18.06-dind"
        imagePullPolicy: IfNotPresent
        env:
        - name: DOCKER_DRIVER
          value: overlay2

        securityContext:
          privileged: true

        volumeMounts:
          - name: docker-graph-storage
            mountPath: /var/lib/docker
      volumes:
      - name: docker-graph-storage
        emptyDir: {}

I have gotten the agent to work, but I haven’t had very much success getting it working consistently. I would avoid using this method unless you have to, or, as mentioned above, run it on a dedicated host for builds.

Testing it out

After wiring everything up, the rest is easy. Add a file called .drone.yml to a repository that you would like to automate builds for. You can find out more about the various capabilities here.

For my use case I wanted to tell Drone to build and publish an arm64-based Docker image whenever a change to master occurs. You can look at my drone configuration here to get a better idea of the multiarch options as well as authenticating to a Docker registry.
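To give a rough idea of the shape of that configuration, here is a trimmed-down sketch of a .drone.yml along those lines; the image name and secret names are placeholders, and the real config linked above has the full details.

---
kind: pipeline
name: default

platform:
  os: linux
  arch: arm64

steps:
- name: publish
  image: plugins/docker
  settings:
    repo: example/my-arm64-image   # placeholder image name
    tags: latest
    username:
      from_secret: docker_username
    password:
      from_secret: docker_password
  when:
    branch:
    - master
    event:
    - push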

After adding the .drone.yml to your repo and triggering a build you should see something similar in your local Drone instance.

Sample Drone build

If everything worked correctly and is green then you should be good to go. Obviously there is a lot of overhead that Kubernetes brings, but the actual Drone setup is really straightforward. Just set stuff up on the GitHub side, translate it into Kubernetes configurations, add some other Drone-specific config options, and you should have a nice CI/CD environment ready to go.

Conclusion

I am very excited to see Drone grow and mature and use it more in the future. It is simple yet flexible and it fits nicely into the new paradigm of using containers for everything.

The new 1.0 YAML syntax is really nice as well, as it basically maps to the conventions that Kubernetes has chosen, so if you are familiar with that syntax you should feel at home. You can check out the available plugins here, which cover about 90% of the use cases you would see in the wild.

One downside is that YAML syntax errors are really hard to debug, and there isn’t much in the way of output to figure out where your problems are. The best approach I have found is to run the .drone.yml file through the Drone CLI lint/fmt tool before committing and building.
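For example, something like the following, run from the repo root before pushing (the exact arguments may vary a little between CLI versions):

drone lint   # catch syntax and schema problems in .drone.yml
drone fmt    # normalize the YAML formatting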

The Drone CLI tool is really powerful and could probably be its own post. There are some links in the references that show off some of its other features.

References

There are a few cool Drone resources I would definitely recommend checking out if you are interested in running Drone in your own environment. The docs are a great place to start and are good for finding information about how to configure and tweak various Drone settings.

https://docs.drone.io/reference/

Here is a link to the CLI reference.

https://docs.drone.io/cli/

I also definitely recommend checking out the jsonnet extension docs, which can be used to help improve automation workflows. The second link shows a good example of how it works and the third link shows some practical applications of using jsonnet to help manage complicated CI pipelines.

https://docs.drone.io/extend/config/jsonnet/
https://github.com/drone/drone/blob/master/.drone.jsonnet
https://medium.com/dazn-tech/simplify-your-ci-pipeline-configuration-with-jsonnet-5a96cd9ccc51

Here is a link for various cool drone stuff, including blog posts and tools.

https://github.com/drone/awesome-drone



Manage Kubernetes Plugins with Krew

There have been quite a few posts recently describing how to write custom plugins, now that the mechanism for creating and working with them has been made easier in upstream Kubernetes (as of v1.12). Here are the official plugin docs if you’re interested in learning more about how it all works.

One neat thing about the new plugin architecture is that they don’t need to be written in Go to be recognized by kubectl. There is a document in the Kubernetes repo that describes how to write your own custom plugin and even a helper library for making it easier to write plugins.

Instead of just writing another tutorial about how to make your own plugin, I decided to show how easy it is to grab and experiment with existing plugins.

Installing krew

If you haven’t heard about it yet, Krew is a new tool released by the Google Container Tools team for managing Kubernetes plugins. As far as I know this is the first plugin manager offering available, and it really scratches my itch for finding a specific tool for a specific job (while also being easy to use).

Krew basically builds on top of the kubectl plugin architecture for making it easier to deal with plugins by providing a sort of framework for keeping track of things and making sure they are doing what they are supposed to.

For example, here is what a couple of Krew-managed plugins look like once they are installed:

The following kubectl-compatible plugins are available:

/home/jmreicha/.krew/bin/kubectl-krew
/home/jmreicha/.krew/bin/kubectl-rbac_lookup
...

You can manage plugins without Krew, but if you work with a lot of plugins, complexity and maintenance generally start to escalate quickly when everything is managed manually. Below I will show you how easy it is to deal with plugins using Krew instead.

There are installation instructions in the repo, but it is really easy to get going. Run the following commands in your shell and you are ready to go.

(
  set -x; cd "$(mktemp -d)" &&
  curl -fsSLO "https://storage.googleapis.com/krew/v0.2.1/krew.{tar.gz,yaml}" &&
  tar zxvf krew.tar.gz &&
  ./krew-"$(uname | tr '[:upper:]' '[:lower:]')_amd64" install \
    --manifest=krew.yaml --archive=krew.tar.gz
)

# Then append the following to your .zshrc or .bashrc
export PATH="${KREW_ROOT:-$HOME/.krew}/bin:$PATH"

# Then source your shell to pick up the path
source ~/.zshrc # or ~/.bashrc

You can use the kubectl plugin list command to look at all of your plugins.
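That listing is built into kubectl itself (v1.12 and newer) and simply shows every kubectl-* executable found on your PATH, Krew-managed or not.

kubectl plugin list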

Test it out to make sure it works.

kubectl krew help

If everything went smoothly you should see some help information and can start working with the plugin manager. For example, if you want to check currently available plugins you can use Krew.

kubectl krew update
kubectl krew search

Or you can browse around the plugin index on GitHub. Once you find a plugin you want to try out, just install it.

kubectl krew install view-utilization

That’s it. Krew should take care of downloading the plugin and putting it in the correct path to make it usable right away.

kubectl view-utilization

Some plugins require additional tools to be installed beforehand as dependencies but should tell you which ones are required when they are installed the first time.

Installing plugin: view-secret
CAVEATS:
\
 |  This plugin needs the following programs:
 |  * jq
/
Installed plugin: view-secret

When you are done with a plugin, you can uninstall it just as easily as it was installed.

kubectl krew uninstall view-secret

Conclusion

I must say I am a really big fan of this new model for managing and creating plugins, and I think it will encourage the community to develop even more tools so I’m looking forward to seeing what people come up with.

Likewise I think Krew is a great fit for this because it is super easy to get installed and started with, which I think is important for gaining widespread adoption in the community. If you have an idea for a Kubectl plugin please consider adding it to the krew-index. The project maintainers are super friendly and are great about feedback and getting things merged.


Turbocharge your Ansible Playbooks

If you haven’t already discovered Mitogen, read on for how to use it (and a few other tricks) to make your Ansible plays a much better experience.

In short, Mitogen is a Python library that (among other things) provides an alternative way to connect to distributed machines using tools like Ansible, Salt and Fabric. And it is fast. Like really fast. Here is a note taken from the Mitogen documentation.

Expect a 1.25x – 7x speedup and a CPU usage reduction of at least 2x, depending on network conditions, modules executed, and time already spent by targets on useful work. Mitogen cannot improve a module once it is executing, it can only ensure the module executes as quickly as possible.

As the documentation says, Mitogen isn’t intended to be used directly but has entrypoints for connecting various tools with its API.

Here is what the sample output might look like with SSH pipelining and the other tweaks configured, but without Mitogen. It clocks in at around 120 seconds.

And here is the same run again with Mitogen. The same playbook is down to around 90 seconds, about a 25% improvement, as shown below. The output and a few of the other settings are described in more detail below.

mitogen output
better Ansible play output

To set up Mitogen as an alternative way for Ansible to connect to hosts, first install it. Note the version: in my own testing, versions earlier than 0.2.5 had some issues.

cd </path/to/install>
curl -OL https://files.pythonhosted.org/packages/source/m/mitogen/mitogen-0.2.5.tar.gz
tar xvzf mitogen-0.2.5.tar.gz

Then modify the ansible.cfg file to point at Mitogen.

[defaults]
strategy_plugins = </path/to/install>/mitogen-0.2.5/ansible_mitogen/plugins/strategy
strategy = mitogen_linear

An option was added in Mitogen v0.2.4 to disable SSH compression, which can reduce run times on faster networks. The documentation says this option will become the default in the future, but for now you can turn it on with the following configuration.

mitogen_ssh_compression = False

NOTE: If you are having trouble with Mitogen and need to turn it off you should also be aware of SSH pipelining. This method of execution isn’t as fast as Mitogen but should at least help bring playbook times down. You can turn it on with the following configuration.

[ssh_connection]
pipelining = True

There are a few other bells and whistles that you can adjust in the ansible.cfg file to help with performance and gain visibility into what is happening.

There are a few callback settings that can be added to ansible.cfg that make it much easier to see how long things take.

...
# Record some metrics about the Ansible runs
callback_whitelist = timer, profile_tasks
# Better output formatting
stdout_callback = yaml
# Minimal output formatting
#stdout_callback = minimal
callback_plugins = callback_plugins
...

Other settings that can be tuned include some of the defaults like poll_interval, caching and the number of forks to run. I found this blog post to be very helpful in discovering and describing a number of these Ansible tweaks.

Below is a modified ansible.cfg with these settings tuned.

# How often Ansible checks running tasks. The default is set to 15
poll_interval = 5

# Number of processes to fork.  Default is set to 5.
forks = 100

#caching
fact_caching            = jsonfile
fact_caching_connection = .cache/
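For convenience, here is roughly what a complete ansible.cfg looks like with all of the tweaks from this post pulled together (the Mitogen path is a placeholder for wherever you extracted it):

[defaults]
# Mitogen connection strategy
strategy_plugins = </path/to/install>/mitogen-0.2.5/ansible_mitogen/plugins/strategy
strategy = mitogen_linear

# Record some metrics about the Ansible runs
callback_whitelist = timer, profile_tasks
# Better output formatting
stdout_callback = yaml
callback_plugins = callback_plugins

# How often Ansible checks running tasks. The default is set to 15
poll_interval = 5
# Number of processes to fork. Default is set to 5
forks = 100

# Fact caching
fact_caching            = jsonfile
fact_caching_connection = .cache/

[ssh_connection]
# SSH pipelining (mostly useful as a fallback if Mitogen is disabled)
pipelining = True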

With these tweaks your Ansible playbooks should run much faster and more cleanly. I highly recommend giving Mitogen a try as well; I have not run into any issues with the version installed above, and it isn’t much effort to add for the amount of gains you get by switching to it. If you know of any other tweaks or settings feel free to let me know!
