Park
Imminent

Pipi

Programming

Pipi is a very simple static site generator written in C. It was used to generate this site.

Making Pipi

A static site generator seemed like a useful and simple choice for my first C project. I really didn't know a lot of C when I started it, and what I did know I was unsure of. I had read two thirds of Modern C and another few chapters of Head First C. Everything else I searched for help with or read the man pages as I went.

I plan to return to this in a year and comment on my approach and implementation.

Read one file and write to another

Read a hardcoded file line by line, use the first character of each line to decide how it should be converted to HTML, and write the result to a hardcoded output file. This was surprisingly easy.

1st char   Line is...
=          Heading
-          Subheading
           Paragraph, with _italics_ and *bold*
"          Quote
+          ID for an image file, followed by a caption
>          Code
#          Numbered list
*          Bulleted list
`          Comment (blank lines are also ignored)
(other)    New page (text is the page title)
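A minimal sketch of the first-character dispatch described by the table. The enum and the function name classify_line are hypothetical, not taken from Pipi's source. The paragraph row shows no first character in the table, so paragraphs are left out here rather than guessed.

```c
#include <stddef.h>

/* Hypothetical line kinds, one per row of the table above. */
enum line_kind {
    KIND_HEADING,
    KIND_SUBHEADING,
    KIND_QUOTE,
    KIND_IMAGE,
    KIND_CODE,
    KIND_NUMBERED,
    KIND_BULLETED,
    KIND_IGNORED,   /* comments and blank lines */
    KIND_NEW_PAGE   /* anything else: the text is the page title */
};

/* Decide what a line is by looking only at its first character. */
enum line_kind classify_line(const char *line)
{
    switch (line[0]) {
    case '=':  return KIND_HEADING;
    case '-':  return KIND_SUBHEADING;
    case '"':  return KIND_QUOTE;
    case '+':  return KIND_IMAGE;
    case '>':  return KIND_CODE;
    case '#':  return KIND_NUMBERED;
    case '*':  return KIND_BULLETED;
    case '`':           /* comment */
    case '\n':          /* blank line */
    case '\r':
    case '\0': return KIND_IGNORED;
    default:   return KIND_NEW_PAGE;
    }
}
```

A caller would then switch on the returned kind to pick the HTML to write.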

Adjusting buffer

The biggest problem with step 1 was that lines were effectively limited to 80 chars because of the fixed-size buffer I was using, so new paragraph elements got created where they shouldn't have been. I found out how to grow the buffer as needed. This is probably the first time I ran into something that is just not an issue in more modern languages. It took me a while to understand the examples I read.
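One common way to lift a fixed line limit is to keep calling fgets into a heap buffer and double it with realloc whenever it fills up. The sketch below assumes that approach. The name read_long_line is hypothetical, and the source does not say which technique Pipi actually uses.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Read one line of any length from fp into a heap buffer that doubles
   when full. Returns NULL at end of file or on allocation failure.
   The caller frees the result. */
char *read_long_line(FILE *fp)
{
    size_t cap = 80;   /* start at the old fixed size */
    size_t len = 0;
    char *buf = malloc(cap);
    if (buf == NULL)
        return NULL;

    while (fgets(buf + len, (int)(cap - len), fp) != NULL) {
        len += strlen(buf + len);
        if (len > 0 && buf[len - 1] == '\n')
            return buf;             /* got a complete line */
        if (len + 1 < cap)
            return buf;             /* end of file, no trailing newline */

        /* Buffer is full and the line is not finished: grow and go on. */
        char *grown = realloc(buf, cap * 2);
        if (grown == NULL) {
            free(buf);
            return NULL;
        }
        buf = grown;
        cap *= 2;
    }

    if (len > 0)
        return buf;                 /* data read before hitting end of file */
    free(buf);
    return NULL;
}
```

On POSIX systems, getline does the same job in a single call.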

Reading into structs

Instead of reading and converting the site on the fly, I created structs representing a line and a page, then read the data into these. A series of these pages makes up the raw site data. It took a little while just to understand how and when to use *, &, or neither. I am not yet comfortable with it, and sometimes had to try one and then the other, looking at the compiler's error messages to work out what was going on. I also wished for a way to inspect the structs in memory (a debugger of some kind) at this point, but resolved the problem before I needed one.

Once I had the site read into the structs, I was quickly able to iterate through them and print each line (not converted to HTML) to stdout.
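A rough sketch of what line and page structs might look like, with one function that fills a page and one that prints it. The field and function names are assumptions for illustration, not Pipi's actual definitions.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct line {
    char *text;              /* heap copy of the raw source line */
};

struct page {
    char *title;
    struct line *lines;      /* growable array of lines */
    size_t line_count;
};

/* Append a copy of text to the page.
   Returns 0 on success, -1 on allocation failure. */
int page_add_line(struct page *p, const char *text)
{
    struct line *grown = realloc(p->lines,
                                 (p->line_count + 1) * sizeof *grown);
    if (grown == NULL)
        return -1;
    p->lines = grown;

    size_t len = strlen(text) + 1;   /* include the terminator */
    char *copy = malloc(len);
    if (copy == NULL)
        return -1;
    memcpy(copy, text, len);

    p->lines[p->line_count].text = copy;
    p->line_count++;
    return 0;
}

/* Print every raw line of the page, unconverted, to out. */
void page_print(const struct page *p, FILE *out)
{
    for (size_t i = 0; i < p->line_count; i++)
        fprintf(out, "%s\n", p->lines[i].text);
}
```

With a local `struct page p = {0};`, the call is `page_add_line(&p, "= Heading")`: & passes the address of the struct, and inside the function -> reaches through the pointer to its fields.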

Multiple output files

It was very easy to generate multiple pages from a single input file now: I just open a file each time I start iterating through a new page, then pass it to the function that prints each line.

What next?

Sometimes I get stuck not because I don't know what to do next, but because I can't decide which step to take next:

Although it seems the hardest of the four, I think I'll try to do the navigation next.

(when I came back to this the next day, I changed my mind and decided to do the filename thing, because the existing code is making filenames based on the number of lines looped through, so different filenames get created each time and now I have a folder full of 0.html, 34.html, 35.html, 37.html).

Other ideas:

Creating sensible filenames

I want to convert the page title into a filename. Replace spaces with dashes, remove other non-printing characters.

Maybe I can just remove any characters except for alphanumerics and -. Should probably store the filename with the page struct too. I'll need it later when I generate links.

How should I compare size_t (a string length) to an index iterating through the string? Should I make the index size_t? Should I cast it? How do I handle errors?

Improving sensible filenames

I want to remove the double dashes from the filenames, and maybe investigate why the section below is getting an extra a on the end. Am I reading past the end of a string?

First idea for removing the double dashes is just to keep a flag indicating whether the previous character written was a dash, and if so don't write subsequent dashes. Once I write a non-dash character, then reset the flag.

This actually turned out really easy to implement. Working out the extra a was a bit harder and I have run into a segmentation fault... looks like I am over-reading memory somehow... it seems related to line 51 in this file... perhaps the space for the struct or one of the strings overflows there?

Also rerunning the file multiple times gives slightly different results.

Yes - filename bug was due to writing the \0 for the filename one character ahead of where I was supposed to in string_to_filename(). Somehow, fixing this bug also fixed the segmentation fault issue... for now.
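The pieces from the last two sections can be put together in one sketch: a size_t index compared against the size_t from strlen (so no cast is needed), a flag that suppresses repeated dashes, and the terminator written exactly at the write position. The signature is an assumption; the source only names string_to_filename() and does not show it.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical version of string_to_filename(). Writes at most size - 1
   characters plus the terminator into out and returns the length written. */
size_t string_to_filename(const char *title, char *out, size_t size)
{
    size_t len = strlen(title);
    size_t w = 0;              /* next write position in out */
    int last_was_dash = 0;     /* flag: previous character written was '-' */

    if (size == 0)
        return 0;

    for (size_t i = 0; i < len && w + 1 < size; i++) {
        unsigned char c = (unsigned char)title[i];
        if (c == ' ' || c == '-') {
            if (!last_was_dash) {      /* write only the first dash of a run */
                out[w++] = '-';
                last_was_dash = 1;
            }
        } else if (isalnum(c)) {
            out[w++] = (char)c;
            last_was_dash = 0;         /* reset once a non-dash is written */
        }
        /* any other character is dropped */
    }

    out[w] = '\0';   /* w is already one past the last character written */
    return w;
}
```

Writing the terminator at out[w + 1] instead would leave one uninitialised byte inside the string, which is one way to end up with a stray character that changes between runs.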

More ideas

Started tables

Got a basic table with only a single column/td being generated. Turned out to be easier than I thought. Dan suggested a _block-mode_ which would work for numbered and bulleted lists too.

Next step is to split each line on its internal pipe characters and work out how many columns the table should have. May also need special handling for when a table has missing cells - use colspan to fill the remainder?

First row sets the number of columns. |s in succession like || indicate a colspan. A space in between would indicate an empty cell instead.

If a subsequent row has more |s, indicating more columns than the original row, then the excess is placed in the last column, as per the number of columns specified in the first row.
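A small sketch of the splitting rule for rows with too many pipes, assuming the row text has already had its leading rune removed and looks like a|b|c. The function name split_row is hypothetical, and colspan handling for || is left out.

```c
#include <stddef.h>
#include <string.h>

/* Split row in place on '|' into at most max_cells cells. Once the limit
   is reached, the rest of the row (pipes included) stays in the last cell.
   Returns the number of cells found. */
size_t split_row(char *row, char **cells, size_t max_cells)
{
    size_t n = 0;
    char *start = row;

    if (max_cells == 0)
        return 0;

    while (n + 1 < max_cells) {
        char *bar = strchr(start, '|');
        if (bar == NULL)
            break;              /* no more separators in this row */
        *bar = '\0';            /* end the current cell here */
        cells[n++] = start;
        start = bar + 1;
    }
    cells[n++] = start;         /* whatever is left is the last cell */
    return n;
}
```

The first row would be split with a generous limit to find the column count, and later rows with that count as max_cells. A row that returns fewer cells than the header is the case where a colspan on its last cell could fill the remainder.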

More tables

Implemented some clumsy table splitting, but it worked first try. Now I need to handle exceptions - when there are more columns in the header than in the following rows, or fewer.

1st char   Line is...                             More
=          Heading                                adfs
-          Subheading                             adfg
           Paragraph, with _italics_ and *bold*   daf
"          Quote                                  af

*More ideas:*

After adding the test table above, I started getting segmentation errors again. I removed the tables and still got the issue, so this is caused by something else.

Multiple input files

Time to take a break from tables. They work well enough, just don't handle errors very well. :) I think I will try to handle multiple input files.

First look for all .pi files in current directory - turned out to be pretty easy.
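On a POSIX system, finding the .pi files can be done with opendir and readdir. This is a sketch under that assumption. The helper names are hypothetical, and the source does not say how Pipi does the lookup.

```c
#include <dirent.h>
#include <stdio.h>
#include <string.h>

/* True if name ends in ".pi" and has at least one character before it. */
int has_pi_extension(const char *name)
{
    size_t len = strlen(name);
    return len > 3 && strcmp(name + len - 3, ".pi") == 0;
}

/* Print every .pi file in dir. Returns how many were found,
   or -1 if the directory could not be opened. */
int list_pi_files(const char *dir)
{
    DIR *d = opendir(dir);
    if (d == NULL)
        return -1;

    int count = 0;
    struct dirent *entry;
    while ((entry = readdir(d)) != NULL) {
        if (has_pi_extension(entry->d_name)) {
            printf("%s\n", entry->d_name);
            count++;
        }
    }
    closedir(d);
    return count;
}
```

Calling list_pi_files(".") covers the current directory. readdir returns entries in no particular order, so the names would need sorting if input order matters.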

Might stop here and convert my existing site into pipi!

Converting the old site to Pipi

Just a cut and paste exercise, then change the Markdown into my own formatting. It will also be good for pointing out errors in my generator.

A little refactoring

Tried to separate some of the code into separate functions and ran into a segfault I couldn't figure out. That led to trying to install gdb, which required a codesigning tutorial.

Permissions for GDB

And ultimately I used lldb to debug the error anyway. The cause was that I was passing a pointer (page*) to a function instead of a double pointer (page**). Because arguments are passed by value, I had to pass a pointer to my pages 'array' rather than the array itself...
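The pointer versus double pointer issue shows up whenever a function needs to change which block of memory the caller's pointer refers to, for example when growing an array with realloc. A minimal sketch with a hypothetical add_page function and a cut-down page struct:

```c
#include <stdlib.h>

struct page {
    const char *title;
};

/* Grow the array by one page. Takes struct page ** so that the caller's
   own pointer is updated if realloc moves the block. With a plain
   struct page *, only this function's local copy would change. */
int add_page(struct page **pages, size_t *count, const char *title)
{
    struct page *grown = realloc(*pages, (*count + 1) * sizeof **pages);
    if (grown == NULL)
        return -1;

    grown[*count].title = title;
    *pages = grown;     /* write the new address back to the caller */
    (*count)++;
    return 0;
}
```

The caller keeps `struct page *pages = NULL; size_t count = 0;` and calls `add_page(&pages, &count, "Pipi")`.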

Basic index page

Create an index page; for each page I create, list it on the index page with a link. Each page struct already has its filename, so this should be pretty easy.

Put files in subfolders

Need to split the title/filename up based on \ characters as though they were directories.

On second thought, all the files can just have unique names and go in the same subfolder. The site nav will still use the 'folders' though.

So the next step is to create some sort of hierarchical structure. A page has a parent and children. A page may have content, or it may only be defined as a category.

If a page has content and is a category with children, how to show the navigation and content together?

CSS

I split the page title out from the subfolder (which I'm calling section for now). Haven't done anything with the section value yet. Instead I made a basic index page and then did a little bit of CSS. I'd like to keep it all basic HTML elements if I can, no ids or classes.
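Splitting a title such as Programming\Pipi into its section and page title can be done in place with strrchr. This sketch assumes the split happens at the last backslash. The function name split_section is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Split "Section\Title" in place at the last backslash.
   Returns a pointer to the title part. *section is set to the section
   part, or to NULL when the string contains no backslash. */
char *split_section(char *full, char **section)
{
    char *slash = strrchr(full, '\\');
    if (slash == NULL) {
        *section = NULL;
        return full;        /* the whole string is the title */
    }
    *slash = '\0';          /* cut the string in two */
    *section = full;
    return slash + 1;
}
```

Because the string is modified in place, it has to be a writable buffer, not a string literal.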

More 'block mode' parsing

I'll try to implement ordered/unordered lists and <pre> blocks now. Shouldn't be too hard compared to tables.

Multiple lines of code?
Should be treated
As blocks too.
  1. Item 1
  2. Item 2
  3. Item 3
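Block mode boils down to remembering which kind of block is open and emitting the opening or closing tag only when the kind changes. A sketch for the two list runes, assuming # and * as in the table near the top. The function name write_blocks is hypothetical, and all other lines are passed through unchanged.

```c
#include <stdio.h>

/* Write lines to out, wrapping runs of '#' lines in <ol> and runs of
   '*' lines in <ul>. Every other line is written as is. */
void write_blocks(FILE *out, const char **lines, size_t count)
{
    char open = '\0';   /* rune of the block currently open, '\0' if none */

    for (size_t i = 0; i <= count; i++) {
        /* One extra pass with rune '\0' closes a block left open at the end. */
        char rune = (i < count) ? lines[i][0] : '\0';
        char want = (rune == '#' || rune == '*') ? rune : '\0';

        if (open != want) {
            if (open == '#') fputs("</ol>\n", out);
            if (open == '*') fputs("</ul>\n", out);
            if (want == '#') fputs("<ol>\n", out);
            if (want == '*') fputs("<ul>\n", out);
            open = want;
        }
        if (i == count)
            break;

        if (want != '\0') {
            const char *text = lines[i] + 1;   /* skip the rune */
            if (*text == ' ')
                text++;                        /* and one following space */
            fprintf(out, "<li>%s</li>\n", text);
        } else {
            fprintf(out, "%s\n", lines[i]);
        }
    }
}
```

The same open/want comparison should extend to code lines and table rows by adding their runes and tags.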

Dates

I'll add a new rune that specifies when a page is created.

/ yyyy-mm-dd, yyyy-mm-dd

The first date specifies the date the piece was written. The second the date the piece was updated. The second is optional.

If this rune appears multiple times in a page, later appearances will overwrite the previous ones.

_pages_ should be sorted based on the created date (or updated, if present) and displayed in order from newest to oldest. Updated posts should be marked accordingly.

So:

  1. parse dates
  2. sort pages
  3. update the index page
  4. style the page
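The first two steps can be sketched with sscanf and qsort. Storing each date as a single yyyymmdd integer is an assumption made here to keep comparison simple. The struct fields and function names are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>

struct page {
    const char *title;
    int created;   /* yyyymmdd, 0 if unset */
    int updated;   /* yyyymmdd, 0 if unset */
};

/* Parse "/ yyyy-mm-dd" or "/ yyyy-mm-dd, yyyy-mm-dd" into p.
   Returns the number of dates read (0, 1 or 2). A later call
   overwrites whatever an earlier one stored. */
int parse_dates(const char *line, struct page *p)
{
    int y1, m1, d1, y2, m2, d2;
    int n = sscanf(line, "/ %d-%d-%d, %d-%d-%d",
                   &y1, &m1, &d1, &y2, &m2, &d2);
    if (n < 3)
        return 0;

    p->created = y1 * 10000 + m1 * 100 + d1;
    p->updated = (n == 6) ? y2 * 10000 + m2 * 100 + d2 : 0;
    return (n == 6) ? 2 : 1;
}

/* The date a page is sorted by: updated if present, else created. */
static int sort_key(const struct page *p)
{
    return p->updated ? p->updated : p->created;
}

/* qsort comparator: newest page first. */
int newest_first(const void *a, const void *b)
{
    int ka = sort_key(a);
    int kb = sort_key(b);
    return (ka < kb) - (ka > kb);
}
```

Sorting is then `qsort(pages, count, sizeof pages[0], newest_first);`, and a non-zero updated field is what marks a post as updated on the index page.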

Sections

Adding the dates wasn't too bad. I ran into some segfaults but quickly fixed them. One thing I need to understand better is when I need to malloc and free. I think I need to allocate memory on the heap when I am going to use the value non-locally, passing to another function? It seems like I'm doing a lot of building and creating strings.
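As a rule of thumb, heap allocation is needed when data has to outlive the function that creates it, or when its size is not known until run time. Passing a local variable down into a function you call is fine without malloc; returning a pointer to a local is not. A small sketch with a hypothetical helper:

```c
#include <stdlib.h>
#include <string.h>

/* Build "<name>.html" on the heap so the string outlives this call.
   The caller owns the result and must free() it. */
char *with_html_extension(const char *name)
{
    const char *ext = ".html";
    size_t size = strlen(name) + strlen(ext) + 1;   /* +1 for '\0' */

    char *out = malloc(size);
    if (out == NULL)
        return NULL;

    strcpy(out, name);
    strcat(out, ext);
    return out;   /* returning a local char array here would be a bug */
}
```

Each malloc needs exactly one matching free, made once the last user of the string is done with it.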

Anyway, on to sections. Each page can have a section as part of its title, e.g. Programming\Pipi. The section gets cut off and displayed on the index page. I want sections to have their own mini-index page, with all pages in that section.

I guess it's similar to the index page generation, only this time I have to test each page before I add it. Perhaps I can write a more generic function - generate index and pass an optional section value. That doesn't sound too difficult.

Although... I have to get a list of all the sections I want to generate pages for as well. Should I create a list when I'm parsing the source data, or generate the indexes afterwards and maintain a list of sections I have already generated?
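Both ideas can be sketched together: one index function that takes an optional section (NULL meaning every page), and a helper that collects the distinct section names after parsing. The struct fields and function names are assumptions, and titles are written without HTML escaping to keep the sketch short.

```c
#include <stdio.h>
#include <string.h>

struct page {
    const char *title;
    const char *filename;
    const char *section;    /* NULL if the page has no section */
};

/* Write a list of links to out. With section == NULL every page is
   listed, otherwise only pages in that section. Returns the link count. */
size_t write_index(FILE *out, const struct page *pages, size_t count,
                   const char *section)
{
    size_t written = 0;

    fputs("<ul>\n", out);
    for (size_t i = 0; i < count; i++) {
        if (section != NULL &&
            (pages[i].section == NULL ||
             strcmp(pages[i].section, section) != 0))
            continue;       /* page belongs to a different section */
        fprintf(out, "<li><a href=\"%s\">%s</a></li>\n",
                pages[i].filename, pages[i].title);
        written++;
    }
    fputs("</ul>\n", out);
    return written;
}

/* Collect the distinct section names into out (capacity max).
   Returns how many were stored. */
size_t unique_sections(const struct page *pages, size_t count,
                       const char **out, size_t max)
{
    size_t n = 0;

    for (size_t i = 0; i < count; i++) {
        const char *s = pages[i].section;
        if (s == NULL)
            continue;
        size_t j = 0;
        while (j < n && strcmp(out[j], s) != 0)
            j++;            /* look for s among the names found so far */
        if (j == n && n < max)
            out[n++] = s;
    }
    return n;
}
```

Collecting the names afterwards keeps the parser unchanged, at the cost of one extra pass over the pages.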