Disclaimer: This post assumes that you know Haskell, and is probably most understandable (and useful) if you have a little bit of experience with Hakyll. If you don’t, go check them out! Haskell is a great programming language, and Hakyll is a great way to make a website.
In setting up this blog, I wound up reformatting some old articles I had written so that they’d look nice on the web. I think the original PDFs still look better in a lot of ways, so I wanted to give people access to them.
I’m using Hakyll to generate this website, so the challenge was to have the PDF links show up when there was an associated PDF, and otherwise have nothing show up. This wound up being more difficult than I thought it would be, so I thought I should document the solution in case it’s useful to someone else. (And, honestly, for when I forget what I’ve done.)
The first part is just going to be background for people unfamiliar with Hakyll, so skip that if you already know Hakyll and just want to know how to add optional PDFs.
As I write this the file hierarchy for the blog looks like this:
As you can see, there is a hiring-teams.pdf, but no jealousy.pdf, and we want this to be dealt with cleanly.
Before doing anything with PDFs, this is what the relevant part of my Hakyll configuration looked like:
main = hakyll $ do
    match "blog/posts/*" $ do
        route $ cleanRoute
        compile $ pandocCompilerWith pandocReadOpts pandocWriteOpts
            >>= loadAndApplyTemplate "templates/post.html" postCtx
            >>= loadAndApplyTemplate "templates/default.html" postCtx
            >>= relativizeUrls
            >>= cleanIndexUrls

postCtx :: Context String
postCtx =
    dateField "date" "%B %e, %Y" <>
    defaultContext
The first chunk tells Hakyll to use pandoc to compile the original markdown files into html, apply the relevant templates, and do some cleaning up.
There is some extra stuff going on offscreen in cleanRoute so that the post blog/posts/some-post.md gets copied to blog/posts/some-post/index.html rather than to blog/posts/some-post.html. This is so that the post can be reached at a simpler URL.
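Since cleanRoute lives offscreen, here is a minimal sketch of the path rewriting it has to do. The name cleanPath is hypothetical, and this only shows the pure part, using the filepath library; in a real Hakyll setup I'd expect it to be wrapped in something like customRoute (cleanPath . toFilePath), but treat that wiring as an assumption rather than the actual definition.

```haskell
import System.FilePath (takeBaseName, takeDirectory, (</>))

-- Hypothetical helper (not part of Hakyll): turn a source path such as
-- "blog/posts/some-post.md" into "blog/posts/some-post/index.html".
cleanPath :: FilePath -> FilePath
cleanPath p = takeDirectory p </> takeBaseName p </> "index.html"
```

For example, cleanPath "blog/posts/some-post.md" gives "blog/posts/some-post/index.html", which a web server will typically serve for the directory URL.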
postCtx tells Hakyll what information we’re going to need for the files. It gives us some standard data, plus the date the post was written. Let’s see how that’s used, in
At the top of the page, there’s a bit saying when the post was written; this is where that comes from, so that I don’t have to manually put it into each post. The fields inside dollar signs are substituted by the Hakyll system.
The first thing we need to do is to make sure the PDFs end up in the final site:
main = hakyll $ do
    match "blog/posts/*.pdf" $ do
        route extrasRoute
        compile copyFileCompiler

    match "blog/posts/*.md" $ do
        route $ cleanRoute
        compile $ pandocCompilerWith pandocReadOpts pandocWriteOpts
            >>= loadAndApplyTemplate "templates/post.html" postCtx
            >>= loadAndApplyTemplate "templates/default.html" postCtx
            >>= relativizeUrls
            >>= cleanIndexUrls
This tells Hakyll to copy the PDFs over verbatim.
There’s more extra stuff in extrasRoute that decides where blog/posts/some-post.pdf ends up in the final site.
Note that we need to add .md to the second pattern to make sure it doesn’t match the PDFs.
The next step is to get the information about the PDF to the page as a field we can use as $pdf$ to tell us where the PDF is (if it’s there at all).
After a whole bunch of looking through the documentation and trying things that didn’t work, this is what I came up with:
postCtx :: Context String
postCtx =
    field' "pdf" (\item -> do
        let fp = toFilePath $ itemIdentifier item
        let pdfName = ((dropExtensions . normalise) fp <.> "pdf")
        pdf <- loadAll $ fromGlob pdfName
        return $ ListField (urlField "url" :: Context CopyFile) pdf) <>
    dateField "date" "%B %e, %Y" <>
    defaultContext
So what’s going on here?
field' is a function that I discovered in the internal code of Hakyll, and that you don’t actually have access to by default. I had to copy its definition into the file:
You can completely ignore its definition though (I did). The main thing is what it does: it gives you access to the post in question and to the state of the site compiler. The state of the compiler tells us which files have been loaded and to where. Given that information you have to produce a piece of data to be substituted in for the field. (In our case
So that’s what we do. We extract the file path of our post into fp. We then turn it into the path for the PDF in pdfName. Then we ask the compiler to give us all the PDFs it’s loaded that have that name. In this case there will always be either one or zero. We return that as a list field where each PDF knows its URL.
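To make the path step concrete, here is the same expression pulled out into a standalone function. The name pdfPathFor is made up for illustration; the body is the pdfName computation from postCtx, using only the filepath library.

```haskell
import System.FilePath (dropExtensions, normalise, (<.>))

-- Hypothetical name; same computation as pdfName in postCtx.
-- normalise tidies the path (for example, it drops a leading "./"),
-- dropExtensions removes every extension, and <.> adds "pdf".
pdfPathFor :: FilePath -> FilePath
pdfPathFor fp = (dropExtensions . normalise) fp <.> "pdf"
```

So pdfPathFor "blog/posts/hiring-teams.md" is "blog/posts/hiring-teams.pdf". One thing to watch: dropExtensions removes all extensions, so a post named some.post.md would be paired with some.pdf rather than some.post.pdf.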
Don’t even ask why we have to explicitly mark the type as Context CopyFile. It has to do with some type system magic I don’t fully understand.
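For the curious, here is my best guess at the mechanism, as a toy model that doesn't use Hakyll at all. Listing and parseAll are made-up stand-ins: Listing hides its element type the way ListField does, and parseAll lets the caller choose its result type the way loadAll does. With the element type hidden on one side and freely chosen on the other, nothing pins it down unless you annotate it.

```haskell
{-# LANGUAGE ExistentialQuantification #-}

-- Stand-in for ListField: the element type `a` does not appear in the
-- type of Listing, so it cannot be inferred from how a Listing is used.
data Listing = forall a. Listing (a -> String) [a]

-- Stand-in for loadAll: the caller decides which type comes back.
parseAll :: Read a => [String] -> [a]
parseAll = map read

renderListing :: Listing -> [String]
renderListing (Listing render xs) = map render xs

-- Without the (show :: Int -> String) annotation, a compiled module is
-- rejected with an ambiguous type variable error, because neither side
-- says what `a` is. The annotation plays the role of Context CopyFile.
pdfListing :: Listing
pdfListing = Listing (show :: Int -> String) (parseAll ["1", "2"])
```

In the real code, the annotation picks CopyFile because that is the type of item copyFileCompiler produces for the PDFs.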
Now we can go back to our template and set it up as follows:
We loop through the PDFs (remember, there’s only ever zero or one of them), and for each one we add a link to its URL.