Building Multiarch Conda-Forge Recipes

May 26, 2019May 26, 2019 Josh Reichardt Leave a comment

As part of my adventures in building a 100% Arm64 Kubernetes cluster, I recently tried to build an Arm64 Jupyterhub Docker imageto run in the cluster. To my surprise, there doesn’t seems to be any “official” Jupyterhub arm64 Docker images out there, so I decided to set out and create one.

In the process of building my image, I almost immediately hit a stumbling block in that the Docker image uses the Conda package manager and several Conda packages for its build. The problem is that several of these packages have not yet been built to work on alternate architectures, e.g. Arm64, and others. So I went off down into the rabbit hole of seeing how hard it would be to add this support for these packages in order to get the Jupyterhub Docker image working.

The first stop on this journey was to conda-forge to look at the multiarch support. If you aren’t familiar (I wasn’t), conda-forge bills itself as a large Github community for creating and building Conda packages.

The first thing to look at when adding support to an existing package is getting familiar with conda-smithy, which is the tool responsible for setting up and building all of the various conda-forge “recipes”. There are generic instructions for using conda-smithy here.

As a fun side note, there is no “native” Arm64 build infrastructure for creating packages. The current builds use QEMU to emulate aarch64 (arm64) using Azure pipelines. This has some issues so while I was down in the rabbit hole I decided to contribute a PR to help get native arm64 builds added. The work isn’t yet complete, it still needs to be hooked up to CI, so if you want to help out feel free to let me know or just open a PR in the conda-smithy repo.

Multiarch support

With the housekeeping out of the way, we can now look at how to actually add the multiarch support for a package.

First, fork and clone the desired recipe. In this example I am adding arm64 support to the pycurl recipe as it is one of the Conda package dependencies that I need to build Jupyterhub for Arm64.

git clone https://github.com/conda-forge/pycurl-feedstock.git

Edit conda-forge.yml and add the following line to the bottom.

provider: {linux_aarch64: default, linux_ppc64le: default}

If you are just adding support for new architectures like I am here, you will need to bump the build number. This can be found recipe/meta.yml, and there are also instruction for doing this.

…
build:
  number: 0
…

Just change this value to 1. Next, install conda smithy if you don’t have it already.

conda install conda-smithy

And then you can render out all the new files needed for the various builds.

conda-smithy rerender

Add the generated files to a new (forked) branch of the recipe.

git add .
git commit -m "Add multiarch support"
git push

Then open up a PR to the conda-forge repo with the details. Once the PR has been open a series of checks should kick off to build the recipe for the various architectures.

If everything is green you are good to go. Maintainers are usually pretty good about merging in changes, but if you need to, you can ping an admin to get help.

You can also tell the build to rerun if it fails using the “@conda-forge-admin, please rerender” command.

You can find more details about what all the bot can do here.

Conclusion

Conda-forge provides some nifty tools for large scale automation and makes it super easy for outsiders to contribute to the community. If you find a missing, outdated or package lacking multiarch support on the Anaconda repo (which includes packages contributed by conda-forge along with many others), definitely think about contributing. The process of adding changes is easy and the conda-forge community is growing all the time.

Turbocharge your Ansible Playbooks

February 9, 2019February 15, 2019 Josh Reichardt Leave a comment

If you haven’t already discovered Mitogen yet, read on for how to use it (and a few other tricks) to make you Ansible plays a much better experience.

In short, Mitogen is a Python library that (among other things) provides an alternative way to connect to distributed machines using tools like Ansible, Salt and Fabric. And it is fast. Like really fast. Here is a note taken from the Mitogen documentation.

Expect a 1.25x – 7x speedup and a CPU usage reduction of at least 2x, depending on network conditions, modules executed, and time already spent by targets on useful work. Mitogen cannot improve a module once it is executing, it can only ensure the module executes as quickly as possible.

As the documentation says, Mitogen isn’t intended to be used directly but has entrypoints for connecting various tools with its API.

Here is what the sample output might look like with the SSH pipelining and other tweaks configured, not including Mitogen. It clocks in around 120 seconds.

And here is the same run again with Mitogen. The same playbook run is down to around 90 seconds, about a 25% improvement as shown below. The output and few of the other settings are described in more detail below.

mitogen output — better Ansible play output

To set up Mitogen as an Ansible replacement for connecting to hosts, first install it. Note the version. In my own testing, versions earlier than 0.2.5 had some issues.

cd </path/to/install>
curl -OL https://files.pythonhosted.org/packages/source/m/mitogen/mitogen-0.2.5.tar.gz
tar xvzf mitogen-0.2.5.tar.gz

Then modify the anisble.cfg file to point at Mitogen.

[defaults]
strategy_plugins = </path/to/install>/mitogen-0.2.4/ansible_mitogen/plugins/strategy
strategy = mitogen_linear

An option was addin in Mitogen v0.2.4 to disable SSH compression, which can reduce run times in faster networks. The documentation says this option will be default in the future but for now you can turn it on with the following command configuration.

mitogen_ssh_compression = False

NOTE: If you are having trouble with Mitogen and need to turn it off you should also be aware of SSH pipelining. This method of execution isn’t as fast as Mitogen but should at least help bring playbook times down. You can turn it on with the following configuration.

[ssh_connection]
pipelining = True

There are a few other bells and whistles that you can adjust in the anisble.cfg file to help with performance and gain visibility into what is happening.

There is a setting for callback configurations that can be added to ansible.cfg that makes it much easier to see how long things take.

...
# Record some metrics about the Ansible runs
callback_whitelist = timer, profile_tasks
# Better output formatting
stdout_callback = yaml
# Minimal output formatting
#stdout_callback = minimal
callback_plugins = callback_plugins
...

Other settings that can be tuned include some of the defaults like poll_interval, caching and the number of forks to run. I found this blog post to be very helpful in discovering and describing a number of these Ansible tweaks.

Below is a modified ansible.cfg with these settings tuned.

# How often Ansible checks running tasks. The default is set to 15
poll_interval = 5

# Number of processes to fork.  Default is set to 5.
forks = 100

#caching
fact_caching            = jsonfile
fact_caching_connection = .cache/

With these tweaks your Ansible playbooks should run much faster and more cleanly. I highly recommend giving Mitogen a try as well, I have not run into any issues with Mitogen 0.2.3 and it isn’t much effort to add for the amount of gains you get by switching to it. If you know of any other tweaks or settings feel free to let me know!

Python virtualenv Notes

July 4, 2018 Josh Reichardt Leave a comment

Virtual environments are really useful for maintaining different packages and for separating different environments without getting your system messy. In this post I will go over some of the various virtual environment tricks that I seem to always forget if I haven’t worked with Python in awhile.

This post is meant to be mostly a reference for remembering commands and syntax and other helpful notes. I’d also like to mention that these steps were all tested on OSX, I haven’t tried on Windows so don’t know if it is any different.

Working with virtual environments

There are a few pieces in order to get to get started. First, the default version of Python that ships with OSX is 2.7, which is slowly moving towards extinction. Unfortunately, it isn’t exactly obvious how to replace this version of Python on OSX.

Just doing a “brew install python” won’t actually point the system at the newly installed version. In order to get Python 3.x working correctly, you need to update the system path and place Python3 there.

export PATH="/usr/local/opt/python/libexec/bin:$PATH"

You will want to put the above line into your bashrc or zshrc (or whatever shell profile you use) to get the brew installed Python onto your system path by default.

Another thing I discovered – in Python 3 there is a built in command for creating virtual environments, which alleviates the need to install the virtualenv package.

Here is the command in Python 3 the command to create a new virtual environment.

python -m venv test

Once the environment has been created, it is very similar to virtualenv. To use the environment, just source it.

source test/bin/activate

To deactivate the environment just use the “deactivate” command, like you would in virutalenv.

The virtualenv package

If you like the old school way of doing virtual environments you can still use the virtualenv package for managing your environments.

Once you have the correct version of Python, you will want to install the virtualenv package on your system globally in order to work with virtual environments.

sudo pip install virtualenvwrapper

You can check to see if the package was installed correctly by running virtualenv -h. There are a number of useful virtualenv commands listed below for working with the environments.

Make a new virtual env

mkvirtualenv test-env

Work on a virtual env

workon test-env

Stop working on a virtual env

(when env is actiave) deactive

List environments

lsvirtualenv

Remove a virtual environment

rmvirtualenv test-env

Create virtualenv with specific version of python

mkvirtualenv -p $(which pypy) test2-env

Look at where environments are stored

ls ~/.virtualenvs

I’ll leave it here for now. The commands and tricks I (re)discovered were enough to get me back to being productive with virtual environments. If you have some other tips or tricks feel free to let me know. I will update this post if I find anything else that is noteworthy.

Fix the JenkinsAPI No valid crumb error

June 7, 2017 Josh Reichardt 1 Comment

If you are working with the Python based JenkinsAPI library you might run into the No valid crumb was included in the request error. The error below will probably look familiar if you’ve run into this issue.

Traceback (most recent call last):
 File "myscript.py", line 47, in <module>
 deploy()
 File "myscript.py", line 24, in deploy
 jenkins.build_job('test')
 File "/usr/local/lib/python3.6/site-packages/jenkinsapi/jenkins.py", line 165, in build_job
 self[jobname].invoke(build_params=params or {})
 File "/usr/local/lib/python3.6/site-packages/jenkinsapi/job.py", line 209, in invoke
 allow_redirects=False
 File "/usr/local/lib/python3.6/site-packages/jenkinsapi/utils/requester.py", line 143, in post_and_confirm_status
 response.text.encode('UTF-8')
jenkinsapi.custom_exceptions.JenkinsAPIException: Operation failed. url=https://jenkins.example.com/job/test/build, data={'json': '{"parameter": [], "statusCode": "303", "redirectTo": "."}'}, headers={'Content-Type': 'application/x-www-form-urlencoded'}, status=403, text=b'<html>\n<head>\n<meta http-equiv="Content-Type" content="text/html;charset=utf-8"/>\n<title>Error 403 No valid crumb was included in the request</title>\n</head>\n<body><h2>HTTP ERROR 403</h2>\n<p>Problem accessing /job/test/build. Reason:\n<pre> No valid crumb was included in the request</pre></p><hr><a href="http://eclipse.org/jetty">Powered by Jetty:// 9.4.z-SNAPSHOT</a><hr/>\n\n</body>\n</html>\n'

It is good practice to enable additional security in Jenkins by turning on the “Prevent Cross Site Forgery exploits” option in the security settings, so if you see this error it is a good thing. The below example shows this security feature in Jenkins.

The Fix

This error threw me off at first, but it didn’t take long to find a quick fix. There is a crumb_requester class in the jenkinsapi that you can use to create the crumbed auth token. You can use the following example as a guideline in your own code.

from jenkinsapi.jenkins import Jenkins
from jenkinsapi.utils.crumb_requester import CrumbRequester

JENKINS_USER = 'user'
JENKINS_PASS = 'pass'
JENKINS_URL = 'https://jenkins.example.com'

# We need to create a crumb for the request first
crumb=CrumbRequester(username=JENKINS_USER, password=JENKINS_PASS, baseurl=JENKINS_URL)

# Now use the crumb to authenticate against Jenkins
jenkins = Jenkins(JENKINS_URL, username=JENKINS_USER, password=JENKINS_PASS, requester=crumb)

...

The code looks very similar to creating a normal Jenkins authentication object, the only difference being that we create and then pass in a crumb for the request, rather than just a username/password combination. Once the crumbed authentication object has been created, you can continue writing your Python code as you would normally. If you’re interested in learning more about crumbs and CSRF you can find more here, or just Google for CSRF for more info.

This issue was slightly confusing/annoying, but I’d rather deal with an extra few lines of code and know that my Jenkins server is secure.

Quicktip: Manage Memory Usage with Supervisord

July 6, 2015August 31, 2015 Josh Reichardt 1 Comment

I have been using Supervisord for process management for quite a while now but had no idea it could manage memory usage (among other things) until just recently.

There is a Python project called Superlance which essentially adds some extra functionality to supervisord for managing processes and memory. The docs are a little thin so I thought it would be a good idea to highlight some of the functionality for folks that just want a few examples of how it works or can be used in a useful way.

Obviously you will want to have supervisor installed and configured already. That can be done with pip or via apt-get. You will also need to make sure you have a proper [unix_http_server] section in your /etc/supervisor/supervisord.conf file.

To install Superlance (on Ubuntu 14.04).

sudo pip install superlance

This will download and install a handful of Python scripts that can then be plugged in to Supervisor. Check the link above if you are interested in the other plugins.

Then you will need to add a section to your supervisor config for memmon to manage memory usgae.

[eventlistener:memmon]
command=memmon -p <program_name>=3GB
events=TICK_60

The “-p <program_name>” corresponds to the program header in your supervisor configuration. There are other options available to manage group processes, etc. for more advanced use cases but this should cover most basic scenarios.

You will need to reload the supervisor configuration after your changes have been made. Unforunately the supervisor process needs to be fully reloaded.

sudo supervisorctl reload

If you want to check that the the memmon script is available before restarting supervisor you can use reread.

sudo supervisorctl reread

I would suggest reading through the Superlance docs and checking out the other scripts. This additional functionality really helps add another layer of functionality to supervisord that I didn’t know existed.