Cleaning Up After the Move: .htaccess recipes for Jekyll
A few things broke during the switch from WordPress to Jekyll. Content moved, page endings changed, and I lost some of the WordPress features that create a smooth experience.
Some of the issues weren't visible until I deployed to DreamHost's Apache web server. Apache behaves differently than GitHub Pages' web server or Jekyll's built-in WEBrick server that I use for testing. Things that worked fine locally didn't work on my live site.
Apache's .htaccess file can solve these problems. It is a per-directory configuration file that lets you fine-tune how Apache serves pages. It fixed several issues:
- moving pages to new locations
- using a different permalink pattern
- automatically adding .html to URLs
- adding error pages
- preventing snooping in directories
Here are the .htaccess recipes I used on DreamHost. The complete file is included at the end. The recipes aren't specific to DreamHost, and should work on other web hosts that use Apache.
As you work, reload your site in a web browser after each change. A mistake in your .htaccess file will cause a server error. It's easier to debug when you know the last thing that changed.
Where does DreamHost keep the .htaccess file?
To do this on your own DreamHost site, you need to be able to edit your .htaccess file, either from the command line on the server or by uploading a new version.
When you set up web hosting, DreamHost created a folder with the same name as your domain in your home directory. This is the root of your website. The .htaccess file lives there.
You may already have an .htaccess file in your webroot. Make a backup copy if one exists. If not, create a new file named .htaccess.
The .htaccess file name begins with a period. Some FTP clients hide such files by default, so you may have to edit preferences to make it appear. When working from the command line, remember to use ls -a to see file names that begin with dots.
Turn off directory listings
By default, a visitor can browse through your site's directories. Try it out by visiting your site's /assets/ page. Look at all that great stuff!
The IndexIgnore directive tells Apache which files to omit from a directory listing. Let's omit all of them. At the top of the .htaccess file, add:
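The directive described above takes a list of file patterns. A minimal sketch, using a single wildcard to hide every entry:

```apache
# Omit every file from automatically generated directory listings
IndexIgnore *
```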
Add custom error pages
Custom error pages are created like any other Jekyll page. Make a content file with front matter and a permalink.
Error page configuration has three parts. The keyword ErrorDocument is followed by the HTTP status code it corresponds to, and ends with the file to show when that status code occurs.
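As a sketch of that three-part form, the lines below map two status codes to pages. The paths /404.html and /500.html are assumptions for illustration; use whatever permalinks your own error pages have.

```apache
# keyword, HTTP status code, page to serve (paths are hypothetical)
ErrorDocument 404 /404.html
ErrorDocument 500 /500.html
```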
I don't know how a static site will produce a 500 error. It's a little easter egg for a lucky visitor.
Rewrite rules to redirect pages
Rewrite rules send incoming requests to another page when they match a pattern. They smooth over the URL differences between WordPress and Jekyll.
Rewrite rules are a powerful feature and worth getting to know better. Smashing Magazine has a good, detailed introduction to .htaccess files and URL rewriting.
Let's work through some use cases you may have found in your own migration.
The rewrite block
Here's an empty skeleton of the rewrite block. Add it to your .htaccess file. It's OK to have more than one rewrite block.
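A minimal sketch of such a skeleton follows. Wrapping it in IfModule and setting RewriteBase to / are common conventions that I am assuming here, not requirements.

```apache
<IfModule mod_rewrite.c>
# Turn on the rewrite engine for this directory
RewriteEngine On
# Assumes the site is served from the root of the domain
RewriteBase /

# Rewrite rules go here

</IfModule>
```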
Before you start: remove WordPress rewrite rules
WordPress uses .htaccess rewrite rules to serve pages. If your .htaccess file contains WordPress configuration, delete the WordPress rules.
Here's what you're looking for:
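A default WordPress install typically writes a block like the one below. The exact lines can vary with the WordPress version and plugins, so treat this as a guide for recognizing the block rather than an exact match.

```apache
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteRule ^index\.php$ - [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule . /index.php [L]
</IfModule>
# END WordPress
```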
Allow any requests that have matches
It's a good practice to list rewrite rules from the most specific to the most general. Apache tests the rules in order, and a rule flagged L ends processing as soon as it matches, so a request won't get a chance to find a closer match further down.
My first rewrite rule returns any request for a file that exists. This block is mainly for peace of mind, to ensure it isn't accidentally matched later.
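One way to write such a pass-through rule is sketched below. It is an assumption about the form, not necessarily the exact rule from the original file.

```apache
# If the request maps to a file that exists on disk...
RewriteCond %{REQUEST_FILENAME} -f
# ...leave the URL unchanged (-) and stop processing rules (L)
RewriteRule ^ - [L]
```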
Send a specific URL to another URL
The simplest rule redirects a specific URL to a specific file. Here's an example where I changed a file name after web crawlers had picked up the old one.
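A sketch of that redirect, built from the two URLs involved. In an .htaccess file the pattern is matched without the leading slash, and the dot is escaped so it matches a literal period.

```apache
# Old file name -> new file name, as a permanent redirect
RewriteRule ^2016/01/talk-roundup-three-for-neo4j\.html$ /2016/01/talk-roundup-four-for-neo4j.html [R=301,L]
```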
Rewrite rules have four parts. The first word is the directive, RewriteRule.
The second part is the requested URL or URL pattern to match.
The third part is what to convert the request into.
The final part, in square brackets, holds optional flags. I use two:
- R=301 makes Apache send HTTP status code 301, Moved Permanently, to signal that web crawlers should update their records.
- L tells Apache to return the page without further matching.
This rewrite rule says to send /2016/01/talk-roundup-three-for-neo4j.html to /2016/01/talk-roundup-four-for-neo4j.html immediately.
Add .html to the end of URLs
None of my WordPress pages ended in .html, but all of the new Jekyll pages do. Any bookmarked or search engine URL would break.
I added explicit rules for a few specific pages and wildcard rules for blog posts.
Specific pages are given rewrite rules pointing to their .html locations.
The ? after the slash makes the rule match URLs both with and without a trailing slash.
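For illustration, rules of this kind might look like the following. The page names about and contact are hypothetical stand-ins for your own pages.

```apache
# /about and /about/ both redirect to /about.html (page names are hypothetical)
RewriteRule ^about/?$ /about.html [R=301,L]
RewriteRule ^contact/?$ /contact.html [R=301,L]
```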
WordPress' permalinks all begin with a year. Instead of writing a rule for each page, we can use a formula with wildcards and regular expressions.
I accidentally created an infinite redirect during development when the new destination URL was incorrect. It never matched a real page, so it was redirected and rewritten over and over. This rule catches any URL that ends in .html but doesn't match a real page.
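A guard along these lines could be sketched as follows. Answering with a 404 is my assumption about what "catches" means here; the original rule may have handled it differently.

```apache
# The request ends in .html but no such file exists on disk...
RewriteCond %{REQUEST_FILENAME} !-f
# ...so answer 404 right away instead of rewriting it again
RewriteRule \.html$ - [R=404,L]
```

With a status code outside the 300 range, Apache drops the substitution and stops rewriting, so the request falls through to the 404 error page.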
And now the rule to add .html to any URL that looks like it starts with a year.
^20 matches any URL that begins with 20. The wildcard (.*) means 'blah blah blah': any run of characters, including none. In the destination, $1 copies 'blah blah blah', and .html is stuck on the end.
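Putting those pieces together, a year-based rule might look like the sketch below. The lazy quantifier, the optional trailing slash, and the extra condition that skips URLs already ending in .html are my additions, included so the rule handles WordPress-style trailing slashes and cannot loop if it is used on its own.

```apache
# Skip anything that already ends in .html
RewriteCond %{REQUEST_URI} !\.html$
# /2016/01/some-post and /2016/01/some-post/ -> /2016/01/some-post.html
RewriteRule ^20(.*?)/?$ /20$1.html [R=301,L]
```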
The full .htaccess file
Here's the complete file.
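The version below is a sketch assembled from the recipes in this post, not a copy of the original file. The error page paths and the about and contact page names are hypothetical, and the 404 guard and year rule are written the way I would approach them; adjust all of these for your own site.

```apache
# Hide every entry in directory listings
IndexIgnore *

# Custom error pages (paths are hypothetical)
ErrorDocument 404 /404.html
ErrorDocument 500 /500.html

<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /

# Serve any file that really exists, untouched
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule ^ - [L]

# A renamed post
RewriteRule ^2016/01/talk-roundup-three-for-neo4j\.html$ /2016/01/talk-roundup-four-for-neo4j.html [R=301,L]

# Specific pages (names are hypothetical)
RewriteRule ^about/?$ /about.html [R=301,L]
RewriteRule ^contact/?$ /contact.html [R=301,L]

# A .html URL that matched nothing above is a dead end: return 404
RewriteCond %{REQUEST_FILENAME} !-f
RewriteRule \.html$ - [R=404,L]

# Old year-based permalinks: add .html
RewriteRule ^20(.*?)/?$ /20$1.html [R=301,L]
</IfModule>
```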