Janet Riley

Cleaning Up After the Move: .htaccess recipes for Jekyll

February 15, 2016

A few things broke during the switch from WordPress to Jekyll. Content moved, page endings changed, and I lost some of the WordPress features that create a smooth experience.

Some of the issues weren't visible until I deployed to DreamHost's Apache web server. Apache behaves differently from GitHub Pages' web server and from WEBrick, the server built into Jekyll that I use for testing. Things that worked fine locally didn't work on my live site.

Apache's .htaccess file can solve these problems. It is a per-directory configuration file that lets you fine-tune how Apache serves pages. It fixed several issues:

  • moving pages to new locations
  • using a different permalink pattern
  • automatically adding .html to URLs
  • adding error pages
  • preventing snooping in directories

Here are the .htaccess recipes I used on DreamHost. The complete file is included at the end. The recipes aren't specific to DreamHost, and should work on other web hosts that use Apache.

As you work, reload your site in a web browser after each change. A mistake in your .htaccess file will cause a server error. It's easier to debug when you know the last thing that changed.

Where does DreamHost keep the .htaccess file?

To do this on your own DreamHost site, you need to be able to edit your .htaccess file on the server from the command line, or upload a new version.

When you set up web hosting, DreamHost created a folder with the same name as your domain in your home directory. This is the root of your website. The .htaccess file lives there.

You may already have an .htaccess file in your webroot. Make a backup copy if one exists. If not, create a new file named .htaccess.

The .htaccess file name begins with a period. Some FTP clients hide such files by default, so you may have to edit preferences to make them appear. When working from the command line, remember to use ls -a to see file names that begin with dots.

Turn off directory listings

By default, a visitor can browse through your site's directories. Try it out by visiting your site's /assets/ page. Look at all that great stuff!

The IndexIgnore directive tells Apache which files to omit from a directory listing. Let's omit all of them. At the top of the .htaccess file, add:

IndexIgnore *
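
IndexIgnore hides the entries, but Apache still serves a listing page, just an empty one. If your host allows the Options directive in .htaccess (an assumption worth checking before you rely on it), a stricter alternative is to switch off generated listings entirely. A directory without an index file then returns 403 Forbidden instead. A minimal sketch:

```apache
# Alternative sketch: disable auto-generated directory listings altogether.
# Assumes the server's AllowOverride setting permits Options in .htaccess;
# if it doesn't, this line will cause a server error, so test right after adding it.
Options -Indexes
```

This pairs well with a custom 403 page, since visitors who poke at a directory see your own error page rather than a blank listing.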

Add custom error pages

The built-in Apache error pages are bland. It's nice to add custom pages that match your site's layout. I made custom error pages for missing pages, permission errors, and server errors.

Custom error pages are created like any other Jekyll page. Make a content file with frontmatter and a permalink.

Error page configuration has three parts: the keyword ErrorDocument, the HTTP status code it applies to, and the path of the page to show when that status occurs.

ErrorDocument 401 /error/not_allowed.html
ErrorDocument 403 /error/not_allowed.html
ErrorDocument 404 /error/not_found.html
ErrorDocument 500 /error/godzilla_rampage.html

A static site will rarely produce a 500 error, but I love the picture. It's an easter egg.

Rewrite rules to redirect pages

Rewrite rules send incoming requests to another page when they match a formula. They smooth over the URL differences between WordPress and Jekyll.

Rewrite rules are a powerful feature and worth getting to know better. Smashing Magazine has a good, detailed introduction to .htaccess files and URL rewriting.

Let's work through some use cases you may have found in your own migration.

The rewrite block

Here's an empty skeleton of the rewrite block. Add it to your .htaccess file. It's OK to have more than one rewrite block.

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
# rewrite rules go here
</IfModule>

Before you start: remove WordPress rewrite rules

WordPress uses .htaccess rewrite rules to serve pages. If your .htaccess file contains WordPress configuration, delete those rules.

Here's what you're looking for:

# -- start deleting here --
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
# -- stop deleting here --

Allow any requests that have matches

It's good practice to list rewrite rules from the most specific to the most general. When a rule with the [L] flag matches, Apache stops processing the remaining rules for that request, so the request never gets a chance to find a closer match further down.

My first rewrite rule passes through, unchanged, any request for a file that exists. This block is mainly for peace of mind, to ensure an existing file isn't accidentally matched by a later rule.

# pass through if the file exists
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*)$ - [L]
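
If you also want existing directories (such as /assets/) to pass through untouched, one variation is to add a second condition joined with [OR]. This is a sketch only, and it isn't in the complete file at the end:

```apache
# Sketch: pass through when the request maps to an existing file OR directory.
# [OR] joins the two conditions; without it, both would have to be true at once.
RewriteCond %{REQUEST_FILENAME} -f [OR]
RewriteCond %{REQUEST_FILENAME} -d
# "-" means "leave the URL unchanged"; L stops further rule processing
RewriteRule ^(.*)$ - [L]
```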

Send a specific URL to another URL

The simplest rule redirects a specific URL to a specific file. Here's an example where I changed a file name after web crawlers had picked up the old one.

RewriteRule ^2016/01/talk-roundup-three-for-neo4j\.html$ /2016/01/talk-roundup-four-for-neo4j.html [R=301,L]

Rewrite rules have four parts. The first is the directive, RewriteRule. The second is the requested URL or URL pattern to match. The third is what to convert the request into. The final part, in square brackets, holds optional flags, separated by commas with no spaces. I use [R=301,L]. R=301 makes Apache send HTTP status code 301, Moved Permanently, to signal that web crawlers should update their records. L tells Apache to stop processing further rules and act on this one.

This rewrite rule says to send /2016/01/talk-roundup-three-for-neo4j.html to /2016/01/talk-roundup-four-for-neo4j.html immediately.
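
Browsers tend to cache 301 responses, which can make a mistaken redirect hard to undo while you're testing. One common precaution, sketched here with the same rule, is to start with a temporary redirect and switch to 301 once it behaves:

```apache
# Sketch: same redirect, but with 302 (temporary) while testing,
# so browsers are less likely to cache a wrong destination.
# The dot before "html" is escaped so it matches a literal period.
RewriteRule ^2016/01/talk-roundup-three-for-neo4j\.html$ /2016/01/talk-roundup-four-for-neo4j.html [R=302,L]
# When the redirect works as intended, change R=302 to R=301.
```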

Add .html to the end of URLs

None of my WordPress pages ended in .html, but all of the new Jekyll pages do. Any bookmark or search engine URL will break.

I added explicit rules for a few specific pages and wildcard rules for blog posts.

Specific pages are given rewrite rules pointing to their .html locations. The ? after each slash makes the rule match URLs both with and without a trailing slash.

# static pages
RewriteRule ^about/?$ /about.html [R=301,L]
RewriteRule ^contact/?$ /contact.html [R=301,L]
RewriteRule feed/?$ /feed.xml [R=301,L]
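
Note that the feed rule has no leading ^, so it matches any URL that ends in feed, including nested ones such as WordPress per-post comment feeds. If you would rather redirect only the top-level feed, an anchored variant might look like this sketch:

```apache
# Sketch: only /feed and /feed/ are redirected;
# deeper URLs that merely end in "feed" are left for later rules.
RewriteRule ^feed/?$ /feed.xml [R=301,L]
```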

WordPress' permalinks all begin with a year. Instead of writing a rule for each page, we can use a formula with wildcards and regular expressions.

I accidentally created an infinite redirect during development when the new destination URL was incorrect. It never matched a real page, so it was redirected and rewritten over and over. The following rule catches any URL that ends in .html but didn't match a real page, and stops processing there.

# pass anything ending in .html - prevents infinite redirect if the following rules are buggy
RewriteRule ^(.*)\.html$ - [L]

And now the rule to add .html to any URL that looks like it starts with a year. ^20 matches any URL that begins with 20. The wildcard (.*) means 'blah blah blah': any run of characters, zero or more long. In the destination, $1 copies 'blah blah blah', and .html is stuck on the end.

# Blog posts start with year.
# match a trailing slash, but omit it from the destination
RewriteRule ^20(.*)\/$ /20$1.html [R=301,L]
# tack html onto anything else
RewriteRule ^20(.*)$ /20$1.html [R=301,L]
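
The two wildcard rules can also be folded into one, with a condition that redirects only when the destination file really exists. This is a sketch rather than what I run: it assumes %{DOCUMENT_ROOT} points at your webroot on your host, and that the regex engine supports the lazy quantifier +? (Apache's PCRE does).

```apache
# Sketch: one rule for URLs with or without a trailing slash.
# ^(20.+?)/?$ captures the path lazily, so an optional trailing slash stays outside $1.
# The condition checks that <webroot>/<captured path>.html is a real file,
# so a wrong guess falls through to the 404 page instead of redirecting forever.
RewriteCond %{DOCUMENT_ROOT}/$1.html -f
RewriteRule ^(20.+?)/?$ /$1.html [R=301,L]
```

With this in place, a request for 2016/01/some-post/ redirects to /2016/01/some-post.html only if that file was generated by Jekyll.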

The full .htaccess file

Here's the complete file.

# turn off directory browsing
IndexIgnore *

# show custom error pages for these HTTP statuses
ErrorDocument 404 /error/404.html
ErrorDocument 403 /error/403.html
ErrorDocument 401 /error/403.html

# start rewrite rules
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /

# pass through requests when the page exists
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^(.*)$ - [L]

# renamed a file, point to new location
RewriteRule ^2016/01/talk-roundup-three-for-neo4j\.html$ /2016/01/talk-roundup-four-for-neo4j.html [R=301,L]

# static pages
RewriteRule ^about/?$ /about.html [R=301,L]
RewriteRule ^contact/?$ /contact.html [R=301,L]
RewriteRule feed/?$ /feed.xml [R=301,L]

# pass anything ending in .html - prevents infinite redirect if the wildcard formulas below are incorrect
RewriteRule ^(.*)\.html$ - [L]

# Blog posts start with year, ^20___ .
# match a trailing slash, but omit it from the destination
RewriteRule ^20(.*)\/$ /20$1.html [R=301,L]
# tack html onto anything else
RewriteRule ^20(.*)$ /20$1.html [R=301,L]

# end rewrite rules
</IfModule>