Immutable WordPress Installations with Kubernetes

wordpress on kubernetes

In this post I will describe some of the interesting discoveries I have made for a recent side project I have been working on, which include a few nice discoveries to automate and manage WordPress deployments with Kubernetes.

Automating WordPress

I am very happy with the patterns that have emerged from this project. One thing that I have always struggled with (I’m sure others have as well) in the world of WP has been finding a good way to create completely reproducible and immutable code and configurations. For example, managing plugins and themes has been a painful experience because WP was designed back in a time before configuration as code and infrastructure as code. Due to this different paradigm, WP is usually stood up once, then managed via it

Tools have evolved since then and a perfect example of one of the bridges between the old and new way is the wp-cli. The wp-cli is basically a way to automate all kinds of things you would otherwise do in the UI. For example, wp-cli provides a way to manage plugin and themes, which as I mentioned has been notoriously difficult to do in the past.

The next step forward is the combination of a tool called bedrock and its accompanying way of modifying it and building your own Docker image. The roots/bedrock method provides the wp-cli in the build scripts so if needed, you can automate tasks using extra entrypoint scripts and/or wp-cli commands, which is just a nice extra touch and shows that the maintainers of the project are putting a lot of effort into it.

A few other bells and whistles include a way to build custom plugins into Docker images for portability rather than relying on some external persistent storage solution which can quickly add overhead and complexity to a project, as well as modern tools like PHP Composer and Packagist which provide a way to install packages (Composer) and a way to manage WP plugins via the Composer package manager (packagist).

Sidenote

There are several other ways of deploying WP into Kubernetes, unfortunately most of these methods do not address multitenancy. If multitenancy is needed, a much more complicated approach is needed involving either NFS or some other many -> many volume mapping.

Deploying with Kubernetes

The tricky part to all of this is the fact that I was unable to find any examples of others using Kubernetes to deploy bedrock managed Docker containers. There is a docker-compose.yaml file in the repo that works perfectly well, but the next step beyond that doesn’t seem to be a topic that has been covered much.

Luckily it is mostly straight forward to bring the docker-compose configuration into Kubernetes, there are just a few minor adjustments that need to be made. The below link should provide the basic scaffolding needed to bring bedrock into a Kubernetes cluster. This method will even expose a way to create and manage WP multisite, another notoriously difficult aspect of WP to manage.

https://gist.github.com/jmreicha/aae3ce024c13be1c561189946f1a0efc

There are a couple of things to note with this configuration. You will need build/maintain your own Docker image based off the roots/bedrock repo linked above. You will also need to have some knowledge of Kubernetes and a working Kubernetes cluster in place. The configuration will require certificates, and DNS so cert-manager and external-dns will most likely need to be deployed into the cluster.

Finally, in the configuration the password, domain (example.com), environment variables for configuring the database and Docker image will need to be updated to reflect your own environment. This method assumes that the WordPress database has already been split out to another location, so will require the Kubernetes cluster to be able to communicate with wherever the database is hosted.

To see some of the magic, change the number of replicas in the Kubernetes manifest configuration from 1 to 2, and you should be able to see a new, completely identical container come up with all the correct configurations and code and start taking traffic.

Conclusion

Switching to the immutable infrastructure approach with WP nets a big win. By adopting these new methods and workflows you can control everything with code, which removes the need for manually managing WP instances and instead allows you to create workflows and pipelines to do all of the heavy lifting.

These benefits include much more visibility in controlling changes, because now Git becomes the central source of truth which allows you to get a better picture of the what, when and why than any other system I have found. This new paradigm also enables the use of Continuous Integration as it is intended – the automatic builds and deploys because of Docker and Kubernetes integrations producing immutable artifacts (Docker), and deployments (Kubernetes manifests) create a clean and simple way to manage the aspects of running the WordPress site.

Read More

Fix Google Analytics search queries in WordPress

I embarrassingly discovered the other day that I was not receiving metrics or analytics about keyword queries in the Google Analytics console.  It turns out that problem was twofold.  First, I didn’t have the SSL version of my site enabled in the Google webmaster tools and second, I was serving a cached version of my sitemap that was several months out of date.

To give you an idea of how this issue manifested itself, and how I discovered that there were issues in the first place – here’s what my search keywords were showing as in the Google Analytics console.

search queries

Clearly the data is less than useful.  The solution to this problem is pretty easy to fix at least.

Fixing the webmaster properties

Open up the Google webmasters site (you should have this setup already, if not go ahead and get signed up and add your WordPress site).  If you have recently updated your site to use https, make sure you add a new property in the webmaster tools for the https version of your site that matches your http version.

Doing this will tell google to keep track of search queries for the https version of your site, which should be the default after swtiching.  After adding the new https property and indexing it, give it a day or two to start collecting metrics, and check back.  Now when you check your search query traffic in the webmaster tools you should start to see all of the search results.

search queries

Also be sure to update properties to use https in both the Webmaster console as well as the Analytics console.  For example, in the Analytics console under Admin -> Property settings -> default URL, there is an option to use http or https.  Likewise, in the webmaster console there is an option for defaulting to http, which is buried in the Google Analytics interface under Admin -> Property settings -> Search Console.  Make sure you update ALL of the site settings to use https.

NOTE: It can take some time for queries to begin showing up in the Google Analytics console (it took about two days for them to start showing up for my site after fixing all the https references).

Fixing sitemaps issues

If you find that Google isn’t indexing and using all of your posts and pages, the next thing to look at the sitemaps.  A quick way to know if you your sitemaps file is doing its job is to pull open the sitemaps, which can be found under the Crawl -> Sitemaps menu.

webmaster tools sitemap

The above shows what a healthy sitemap index looks like (after I corrected the problem).  There is a button located in the top left of this view that can help you test your sitemap while you are updating your settings.  First check for any items in the “issues” column.  Also, if the “processed” date here isn’t recent then there is probably an issue.  One last thing to check – if there are either no entries in this view or fewer then you expect, something is probably not working.

There are many more knobs and dials you can adjust in the webmaster tools, so if you haven’t played with them much I would recommend spending some time and poking around.

I should quickly mention that my solution assumes that you are using the Google XML Sitemaps plugin.  If you’re not using this plugin, and you are either 1) new to WordPress or 2) don’t want to manage your sitemaps file manually, I suggest you enable it.  It makes sitemap management so much easier.

After you have the plugin turned on, navigate to your blog settings for sitemaps, which can be located in Dashboard -> Settings -> XML-Sitemap.  Clicking the popup should bring up a page similar to the following.

xml sitemaps

First, make sure everything looks correct in the settings.  If you are setting this up for the first time you might need to configure some of the settings.  For example, make sure the site name matches the listing, and the options to notify search indexers are all turned on.

When I was troubleshooting the search queries not getting set, I navigated to this menu and immediately noticed that the plugin was showing a warning about using a cached version of my sitemaps.xml file.  To fix this warning, there should be an option to remove the cached versions.

Next, there should be an option near the top of the sitemap settings to “notify search engines about your sitemap”.  After you have adjusted all the sitemap settings and cleared the cached sitemaps file, click that link to trigger a ping to the search indexers.

Be aware that the crawling process may take up to a few days to index and update so be patient.

Read More

Enable SSL for your WordPress blog

Updated: 11/18/16

The Let’s Encrypt client was recently renamed to “certbot”.  I have updated the post to use the correct name but if I miss something use certbot or let me know.

With the announcement of the public beta of the Let’s Encrypt project, it is now nearly trivial to get your site set up with an SSL certificate.  One of the best parts about the Let’s Encrypt project is that it is totally free, so there is pretty much no reason to protect your blog set up with an SSL certificate.  The other nice part of Let’s Encrypt is that it is very easy to get your certificate issued.

The first step to get started is grabbing the latest source code from GitHub for the project.  Log on to your WordPress server (I’m running Ubuntu) and clone the repo.  Make sure to install git if you haven’t already.

git clone https://github.com/letsencrypt/certbot.git

There is a shell script you can run to pretty much do everything for you, including installation of any packages and libraries it needs as well as configures paths and other components it needs to work.

cd certbot
./certbot-auto

After the bootstrap is done there should be some CLI options.  Run the command with the -h flag to print out help.

./certbot-auto -h

Since I am using Apache for my blog I will use the “–apache” option.

./certbot-auto --apache

There will be some prompts you need to go through for setting up the certificates and account creation.

let's encrypt

 

 

 

 

 

This process is still somewhat error prone, so if you make a typo you can just rerun the “./letsencrypt-auto” command and follow the prompts.

The certificates will be dropped in to /etc/letsencrypt/live/<website>.  Go double check them if needed.

This process will also generate a new apache configuration file for you to use.  You can check for the file in /etc/apache2/site-enabled.  The import part of this config should look similar to the following:

<VirtualHost *:443>
  UseCanonicalName Off
  ServerAdmin webmaster@localhost
  DocumentRoot /var/www/wordpress
  SSLCertificateFile /etc/letsencrypt/live/thepracticalsysadmin.com/cert.pem
  SSLCertificateKeyFile /etc/letsencrypt/live/thepracticalsysadmin.com/privkey.pem
  Include /etc/letsencrypt/options-ssl-apache.conf
  SSLCertificateChainFile /etc/letsencrypt/live/thepracticalsysadmin.com/chain.pem
</VirtualHost>

As a side note, you will probably want to redirect non https requests to use the encrypted connection.  This is easy enough to do, just go find your .htaccess file (mine was in /var/www/wordpress/.htaccess) and add the following rules.

<IfModule mod_rewrite.c>
  RewriteEngine On
  RewriteCond %{SERVER_PORT} 80
  RewriteRule ^(.*)$ https://example.com/$1 [R,L]
</IfModule>

Before we restart Apache with the new configuration let’s run a quick configtest to make sure it all works as expected.

apachectl configtest

If everything looks okay in the configtest then you can reload or restart apache.

service apache2 restart

Now when you visit your site you should get the nice shiny green lock icon on the address bar.  It is important to remember that the certificates issued by the Let’s Encrypt project are valid for 90 days so you will need to make sure to keep up to date and generate new certificates every so often.  The Let’s Encrypt folks are working on automating this process but for now you will need to manually generate new certificates and reload your web server.

let's encrypt

 

 

 

 

 

 

 

 

 

 

 

 

 

 

That’s it.  Your site should now be functioning with SSL.

Updating the certificate automatically

To take this process one step further We can make a script that can be run via cron (or manually) to update the certificate.

Here’s what the script looks like.

#!/usr/bin/env bash

dir="/etc/letsencrypt/live/example.com"
acme_server="https://acme-v01.api.letsencrypt.org/directory"
domain="example.com"
https="--standalone-supported-challenges tls-sni-01"

# Using webroot method
#/root/letsencrypt/certbot-auto --renew certonly --server $acme_server -a webroot --webroot-path=$dir -d $domain --agree-tos

# Using standalone method
service apache2 stop
# Previously you had to specify options to renew the cert but this has been deprecated
#/root/letsencrypt/certbot-auto --renew certonly --standalone $https -d $domain --agree-tos
# In newer versions you can just use the renew command
/root/letsencrypt/certbot-auto renew --quiet
service apache start

Notice that I have the “webroot” method commented out.  I run a service (Varnish) on port 80 that proxies traffic but also interferes with LE so I chose to run the standalone renewal method.  It is pretty easy, the main difference is that you need to turn off Apache before you run it since Apache binds to to ports 80/443.  But the downtime is okay in my case.

I chose to put the script in to a cron job and have it run every 45 days so that I don’t have to worry about logging on to my server to regenerate the certificate.  Here’s what a sample crontab for this job might look like.

0 0 */45 * * /root/renew_cert.sh

This is a straight forward process and will help with your search engine juices as well.

Read More