As part of my adventures in building a 100% Arm64 Kubernetes cluster, I recently tried to build an Arm64 Jupyterhub Docker image to run in the cluster. To my surprise, there don’t seem to be any “official” Jupyterhub Arm64 Docker images out there, so I set out to create one.
In the process of building my image, I almost immediately hit a stumbling block: the Docker image uses the Conda package manager and several Conda packages for its build, and several of these packages have not yet been built for alternate architectures such as Arm64. So I went down the rabbit hole of seeing how hard it would be to add this support and get the Jupyterhub Docker image working.
The first stop on this journey was conda-forge, to look at its multiarch support. If you aren’t familiar (I wasn’t), conda-forge bills itself as a large GitHub community for creating and building Conda packages.
The first thing to look at when adding support to an existing package is getting familiar with conda-smithy, which is the tool responsible for setting up and building all of the various conda-forge “recipes”. There are generic instructions for using conda-smithy here.
As a fun side note, there is no “native” Arm64 build infrastructure for creating packages. The current builds use QEMU to emulate aarch64 (Arm64) on Azure Pipelines. This has some issues, so while I was down in the rabbit hole I decided to contribute a PR to help get native Arm64 builds added. The work isn’t complete yet, as it still needs to be hooked up to CI, so if you want to help out feel free to let me know or just open a PR in the conda-smithy repo.
Multiarch support
With the housekeeping out of the way, we can now look at how to actually add the multiarch support for a package.
First, fork and clone the desired recipe. In this example I am adding arm64 support to the pycurl recipe as it is one of the Conda package dependencies that I need to build Jupyterhub for Arm64.
git clone https://github.com/conda-forge/pycurl-feedstock.git
Edit conda-forge.yml and add the following line to the bottom.
provider: {linux_aarch64: default, linux_ppc64le: default}
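If you plan to do this for more than one recipe, the edit is easy to script. Here is a minimal Python sketch, not part of any conda-forge tooling: the helper names add_provider and update_file are made up for illustration, and the sketch assumes that a file with no existing top-level provider key can simply have the line appended.

```python
from pathlib import Path

# The line from the step above, exactly as it should appear in conda-forge.yml.
PROVIDER_LINE = "provider: {linux_aarch64: default, linux_ppc64le: default}"


def add_provider(config_text: str) -> str:
    """Return config_text with PROVIDER_LINE appended (hypothetical helper).

    If a top-level 'provider:' key already exists, the text is returned
    unchanged so a human can merge the settings by hand.
    """
    for line in config_text.splitlines():
        if line.startswith("provider:"):
            return config_text
    # Make sure the new key starts on its own line.
    if config_text and not config_text.endswith("\n"):
        config_text += "\n"
    return config_text + PROVIDER_LINE + "\n"


def update_file(path: str = "conda-forge.yml") -> None:
    """Apply add_provider to the file at path, rewriting it in place."""
    config = Path(path)
    config.write_text(add_provider(config.read_text()))
```

Calling update_file() from the root of the cloned recipe has the same effect as the manual edit, and running it twice does not add the line a second time.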
If you are just adding support for new architectures like I am here, you will need to bump the build number. This can be found in recipe/meta.yaml, and there are also instructions for doing this.
…
build:
  number: 0
…
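If you would rather script the bump than edit the file by hand, something like the following Python sketch should do. It is only an illustration: bump_build_number is a hypothetical helper, and it assumes the build number is written as a literal integer on an indented "number:" line (recipes that set it through a template variable are deliberately rejected).

```python
import re

# An indented "number: <integer>" line, as it appears under "build:" in the recipe.
_NUMBER_RE = re.compile(r"^([ \t]+number:[ \t]*)(\d+)", re.MULTILINE)


def bump_build_number(recipe_text: str) -> str:
    """Return recipe_text with the first literal build number increased by one."""
    match = _NUMBER_RE.search(recipe_text)
    if match is None:
        # Covers recipes with no build number or a templated one.
        raise ValueError("no literal 'number:' line found")
    new_value = int(match.group(2)) + 1
    # Replace only the digits so indentation and spacing stay untouched.
    return (
        recipe_text[: match.start(2)]
        + str(new_value)
        + recipe_text[match.end(2):]
    )
```

Only the digits are rewritten, so the rest of the recipe is left exactly as it was.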
Just change the build number from 0 to 1. Next, install conda-smithy if you don’t have it already.
conda install conda-smithy
And then you can render out all the new files needed for the various builds.
conda-smithy rerender
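Before committing, it can be worth checking that the rerender actually produced build configurations for the new architectures. The Python sketch below rests on an assumption about conda-smithy's output that you should verify against your own checkout: that the per-platform configs land in a .ci_support directory with file names starting with the platform name, such as linux_aarch64_. The function name rendered_configs is hypothetical.

```python
from pathlib import Path


def rendered_configs(feedstock_dir, arches=("linux_aarch64", "linux_ppc64le")):
    """Map each architecture to the sorted names of its rendered config files.

    Assumes (unverified here) that rerendering writes files named
    '<platform>_<variant>.yaml' into '<feedstock_dir>/.ci_support'.
    """
    ci_dir = Path(feedstock_dir) / ".ci_support"
    if not ci_dir.is_dir():
        # Nothing rendered yet, or the layout differs from the assumption.
        return {arch: [] for arch in arches}
    return {
        arch: sorted(p.name for p in ci_dir.glob(f"{arch}_*.yaml"))
        for arch in arches
    }
```

An empty list for an architecture suggests that the provider line was not picked up, or that the file layout is different from what this sketch assumes.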
Add the generated files to a new branch of your forked recipe.
git add .
git commit -m "Add multiarch support"
git push
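If you repeat this for several recipes, the commands above can be wrapped in a small script. This is a rough Python sketch under a few assumptions: the branch name is whatever you pick, the remote called origin points at your fork, and the function names are made up for illustration. It defaults to a dry run that only prints the commands.

```python
import subprocess


def contribution_commands(branch, message="Add multiarch support"):
    """Return the command sequence from the steps above as argument lists."""
    return [
        ["git", "checkout", "-b", branch],  # work on a new branch
        ["conda-smithy", "rerender"],       # regenerate the build files
        ["git", "add", "."],
        ["git", "commit", "-m", message],
        # Assumes the remote named "origin" is your fork of the recipe.
        ["git", "push", "--set-upstream", "origin", branch],
    ]


def run_all(commands, dry_run=True):
    """Print each command and, unless dry_run is set, execute it."""
    for argv in commands:
        print("$", " ".join(argv))
        if not dry_run:
            # check=True stops at the first command that fails.
            subprocess.run(argv, check=True)
```

Running run_all(contribution_commands("add-aarch64")) prints the sequence for review, and passing dry_run=False executes it from the current directory.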
Then open up a PR to the conda-forge repo with the details. Once the PR has been opened, a series of checks should kick off to build the recipe for the various architectures.
If everything is green you are good to go. Maintainers are usually pretty good about merging in changes, but if you need to, you can ping an admin to get help.
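You can also ask the bot to rerender the recipe by commenting “@conda-forge-admin, please rerender” on the PR. If a build fails for reasons unrelated to your change, commenting “@conda-forge-admin, please restart ci” asks the bot to run the checks again.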
You can find more details about what all the bot can do here.
Conclusion
Conda-forge provides some nifty tools for large-scale automation and makes it super easy for outsiders to contribute to the community. If you find a package on the Anaconda repo (which includes packages contributed by conda-forge along with many others) that is missing, outdated, or lacking multiarch support, definitely think about contributing. The process of adding changes is easy and the conda-forge community is growing all the time.