About this page
This page describes the design and development of BILE, a program that I wrote to give HTML pages a common look and feel. BILE was written in C and has been compiled for both Windows and Linux.
Background
I wrote BILE originally to help produce a satirical Web site (hence the name) that never got off the ground. This was some years ago before blogging software and Web-based Content Management Systems were commonplace and before Web hosting services routinely offered CGI or other server-side interactivity. For this reason, BILE was created as an “offline” tool; the intention is that you run BILE over your Web pages and upload the results to your Web site. Therefore, BILE is suitable in the following situations:
- If your hosting service only permits you to host static content
- If your Web site is small and doesn’t need a full Content Management System
- If your HTML documents are intended to be accessed offline; a manual or slideshow, for example.
BILE can be used to style dynamic content such as PHP or ASP pages but, as with static content, BILE must be run on these files before they are uploaded to the server.
At the time BILE was conceived, the most commonly available technology for giving Web pages on a site a consistent look and feel was Server Side Includes (SSI)[1]. However, SSI requires the Web page author to embed special comments into each and every page on which they want common page elements (such as sidebars and footers) to appear. Instead of doing this, BILE uses a template file into which the body of each page is “poured”; see the section on BILE’s application model for a more detailed description.
The first version of BILE was written in QBasic, the free version of QuickBasic that Microsoft distributed with DOS 6 and Windows 95. This version of BILE offered the user very little control over the output unless they were willing to change the BASIC code. It was also limited in the kinds of metadata it could extract from the input files because of limitations in QBasic itself.
Because of these limitations, I decided to rewrite BILE in a more powerful language. Specifically, I wanted to add the following improvements:
- A way of extracting metadata from files of different types
- Support for multiple directories and multiple templates. The original version of BILE operated only on a single directory and used only one template file.
- A proper template language with conditional statements, variables, etc. (Note however that the template language has no general looping construct or equivalent to the “goto” statement; it is not intended as a general-purpose programming language)
After some experiments, I decided on C as the implementation language.
Build environment
The BILE source currently consists of about 6,000 lines of C. The source is fairly portable with little conditional compilation. I build the code using the GNU C Compiler on Linux and the MinGW32 port of same on Windows. The build is controlled using GNU make but I don't use the GNU autoconf tools. In addition, I use the following software for version control and change tracking:
- Git[2] (originally, CVS) for version control
- CVSTrac[3], a bug-tracking and Wiki tool that integrates with CVS and Git
- Doxygen[4], to generate source code documentation
- Git Extensions[5], a shell extension for Windows that allows Git operations to be carried out from Windows Explorer
Application model
Basic concepts
BILE was originally intended to be used to produce a news-type Website and its terminology reflects this. A BILE “project” is referred to as a publication. A publication is broken into sections which contain stories. Each section can have one or more indexes which are used to generate lists of links, tables of contents, etc. Stories can also have tags like those found on a number of Web sites. In the present implementation of BILE, publications and sections map to directories and stories map to files. The only difference between the publication and its sections is that indexes defined in the publication configuration file (see below) apply to all files in all sections and subsections, whereas indexes defined in section configuration files only index those files in the section directory itself.
Each BILE entity — publication, section, index and story —
has a number of variables associated with it. Some of these variables
are created by BILE itself on startup, some are read from file metadata and
configuration files and some are modified by BILE during processing. The
entities form a set of nested scopes. For example, if BILE is processing a
story file’s template and it encounters a reference to a variable
$var, it will first check if there is a local variable called
$var. If there is not, it will check the section’s
variables, the parent section’s variables and so on up to the
publication’s variables. Finally, it will check the computer’s
environment variables. If no variable can be found, it will create a new local
variable in the story called $var and assign it a blank value. If
BILE code in an “inner” scope attempts to change a variable in an
“outer” scope, BILE will create a local variable with the same name
and containing the modified value. This is to prevent side effects as BILE
doesn’t guarantee the order in which it processes files. However, this
behaviour can be circumvented if necessary by using the SET
command described below.
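The lookup order described above can be sketched in C roughly as follows. This is an illustrative sketch only, not BILE's actual source: the Scope type, the fixed-size variable arrays and the function names scope_lookup and scope_set_local are all hypothetical, and variable names are stored without the leading “$”.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types for illustration; BILE's real structures may differ. */
#define MAX_VARS 16

typedef struct Scope {
    struct Scope *parent;          /* enclosing scope; NULL at publication level */
    const char *names[MAX_VARS];
    const char *values[MAX_VARS];
    size_t count;
} Scope;

/* Search a single scope without looking at its parents. */
static const char *scope_find_local(const Scope *s, const char *name)
{
    for (size_t i = 0; i < s->count; i++) {
        if (strcmp(s->names[i], name) == 0)
            return s->values[i];
    }
    return NULL;
}

/* Walk story -> section -> ... -> publication, then fall back to the
 * process environment. Returns NULL if the name is not found anywhere. */
const char *scope_lookup(const Scope *s, const char *name)
{
    assert(name != NULL);
    for (; s != NULL; s = s->parent) {
        const char *v = scope_find_local(s, name);
        if (v != NULL)
            return v;
    }
    return getenv(name);
}

/* Assignment always writes to the innermost scope, so a variable of the
 * same name in an outer scope is shadowed rather than changed.
 * Returns 0 on success, -1 if the scope is full. */
int scope_set_local(Scope *s, const char *name, const char *value)
{
    assert(s != NULL && name != NULL);
    for (size_t i = 0; i < s->count; i++) {
        if (strcmp(s->names[i], name) == 0) {
            s->values[i] = value;
            return 0;
        }
    }
    if (s->count == MAX_VARS)
        return -1;
    s->names[s->count] = name;
    s->values[s->count] = value;
    s->count++;
    return 0;
}
```

Writing only to the innermost scope is what keeps one story's assignments from leaking into another story, whatever order the files happen to be processed in.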
Command-line invocation
BILE is invoked as follows:
bile [-f] [-v] -i input directory -o output directory -t template directory
| Command-line switch | Description |
|---|---|
| -f | Optional. Force regeneration of all output pages even if the input hasn’t changed. |
| -v | Optional. Verbose mode. Generate progress information while running. |
| -i | Mandatory. Specifies the input directory. |
| -o | Mandatory. Specifies the output directory. |
| -t | Mandatory. Specifies the template directory. |
Index pages
One of the benefits of using BILE is its ability to automatically generate
index pages which act as tables of contents for a site. An index
can be added to any template using the INDEX
block command but a separate file or set of files can be generated for an index
by setting the $index_file and $index_template
variables in the index’s definition in the publication or section
configuration file.
Multi-page indexes
Normally, an index page will consist of only a single output file. This can
be overridden by changing the value of the $index_file variable
(using the SET command as you are changing
the global state) and then using the BREAK
command. BILE checks the value of the $index_file variable after
it has output the index page and if it has changed, re-runs the template
with the new file name.
Note: While inside an INDEX block, BILE changes
the way it searches its scope for variables. When processing a file normally,
the search looks like this:
story → section → … publication → environment
Inside an INDEX block, the search looks like this:
story → index → section → … publication → environment
Configuration files
BILE configuration files are used to store information about the publication and each section in the publication. They also store the index definitions for the publication and its sections. They have a .bile file
extension. The publication configuration file is called publication.bile
and is located in the publication’s top-level directory. Each subdirectory
in this directory can have a section-specific configuration file called
section.bile.
Format
The configuration files have the following format:
# A comment; ignored
$var_name1 = `A literal`
$var_name2 = `A valid ` . ucase(`bile`) . ` expression`
# Index definition
index index_name
$sort_by = `-file_date`
endindex
Note the following:
- Blank lines and those beginning with a “#” character are ignored
- Variable names are prefixed with a “$” character, similar to PHP or Perl. Any valid BILE template language expression can appear on the right-hand side of an assignment.
- Indexes are defined by the index keyword and the definition is terminated with the endindex keyword
Mode of operation
For BILE to run, it needs three arguments: the input directory from which files are to be read, the output directory where processed files are to be created, and the template directory where template files are stored. When BILE is run, it performs two main phases of processing: reading the data it needs to generate the output, and the generation phase itself. The first phase proceeds as follows:
- BILE initialises the publication structure, creating some global variables
- It reads the publication.bile file and creates the publication indexes.
- For each normal file BILE encounters, it extracts the metadata from it into a story and adds it to the appropriate index. Some kinds of file are never indexed (these include standard Web files like robots.txt and favicon.ico as well as files with the extension .inc and any file whose name begins with a “.” character). Additionally, a file can prevent itself from being added to an index by setting the $noindex variable.
- For each directory it finds, it creates a section and reads the configuration from that directory’s section.bile file, creating any indexes for this section. It does this recursively so that all subdirectories of the input directory are read.
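The file-name rules for files that are never indexed can be sketched in C as below. This is a minimal sketch under stated assumptions, not BILE's own code: the function name is_indexable is hypothetical, only the two standard Web files named above are listed (the text implies there may be others), and the $noindex variable is not modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Returns true if `name` ends with `suffix`. */
static bool ends_with(const char *name, const char *suffix)
{
    size_t n = strlen(name);
    size_t m = strlen(suffix);
    return n >= m && strcmp(name + n - m, suffix) == 0;
}

/* Hypothetical helper: decides from the bare file name alone whether a
 * file may be added to an index. */
bool is_indexable(const char *file_name)
{
    assert(file_name != NULL);
    if (file_name[0] == '.')                      /* hidden files */
        return false;
    if (strcmp(file_name, "robots.txt") == 0)     /* standard Web files */
        return false;
    if (strcmp(file_name, "favicon.ico") == 0)
        return false;
    if (ends_with(file_name, ".inc"))             /* include fragments */
        return false;
    return true;
}
```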
At the end of this operation, BILE has everything it needs to do its job. It then proceeds to the generation phase:
- For each input file, it checks the $use_template variable to see whether a template file is to be used.
- If a template is to be used and BILE has not used this template before, it reads the file from the templates directory and “compiles” it into an internal form. BILE caches its templates in this form so they do not have to be read for each file.
- BILE then “executes” the template, passing the input file to it. The result will be written to the output directory.
  - BILE preserves the directory structure of the input directory when it copies files to the output directory. That is, a file in the input directory called, say, input/news/latest.html will be copied to output/news/latest.html.
  - If the file already exists in the output directory, BILE checks if the input file and the template have changed since the date on the output file. If they have not, it will not change the output file. This behaviour can be overridden by specifying the “force” option (-f) on the command line.
    If the file doesn’t already exist in the output directory, BILE will set the variable $is_new. If the file already exists but the input file is newer, BILE will set the variable $is_modified. If the template file is newer, BILE will update the output file but it will not set $is_modified.
- Any subdirectories of the template directory are copied to the output directory so that any images, stylesheets or other files required by the templates will be available.
- For every index defined in the publication and its sections, BILE checks if an index page is to be generated and generates the index pages if necessary.
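The up-to-date check in the generation phase can be illustrated with a small C function. This is a sketch of the rules as described above, not BILE's implementation: the function name needs_regeneration and its parameters are hypothetical, the timestamps are assumed to have been read already (for example with stat), and how -f interacts with $is_new and $is_modified is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/* Hypothetical helper: returns true if the output file should be
 * (re)written. The two flags mirror $is_new and $is_modified. */
bool needs_regeneration(bool output_exists,
                        time_t input_time,
                        time_t template_time,
                        time_t output_time,
                        bool force,
                        bool *is_new,
                        bool *is_modified)
{
    assert(is_new != NULL && is_modified != NULL);

    /* No output file yet: the page is new. */
    *is_new = !output_exists;

    /* Output exists but the input file is newer: the page is modified. */
    *is_modified = output_exists && difftime(input_time, output_time) > 0.0;

    /* A newer template forces a rewrite but does not mark the page modified. */
    bool template_newer =
        output_exists && difftime(template_time, output_time) > 0.0;

    /* Assumption: -f forces a rewrite without altering the two flags. */
    return force || *is_new || *is_modified || template_newer;
}
```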
Controlling the output
BILE allows you to specify variables in the configuration file or metadata to give finer control over what gets output.
| Variable name | Description |
|---|---|
| $use_template | This is the variable that tells BILE to use a template on the input file. |
| $template_file | The location of the template file to use, if $use_template is true. The location is relative to the template directory specified on the command line. |
| $use_template_ext | If $use_template_ext is set to "true", the extension of the template is used rather than the extension of the input file. For example, if the input file is called input.html but the template file is called template.shtml, then the output file will be called input.shtml if $use_template_ext is set to "true". |
| $output_mode | If $output_mode is set to "both", both the original input file and the file generated by passing the input file through the template are copied to the output directory. This is useful for non-HTML input. For example, this can be used with image files to create a gallery. |
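The effect of $use_template_ext on the output file name can be sketched in C as follows. The function name output_name is hypothetical, and the sketch assumes it is given bare file names with no directory part; it is not taken from the BILE source.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: writes the output file name into `buf`.
 * Returns 0 on success, -1 if `buf` is too small. */
int output_name(const char *input_name, const char *template_name,
                int use_template_ext, char *buf, size_t buf_size)
{
    assert(input_name != NULL && template_name != NULL && buf != NULL);

    const char *in_dot = strrchr(input_name, '.');
    const char *tp_dot = strrchr(template_name, '.');
    int n;

    if (use_template_ext && in_dot != NULL && tp_dot != NULL) {
        /* Keep the input's stem and append the template's extension. */
        n = snprintf(buf, buf_size, "%.*s%s",
                     (int)(in_dot - input_name), input_name, tp_dot);
    } else {
        /* Otherwise the input file name is used unchanged. */
        n = snprintf(buf, buf_size, "%s", input_name);
    }
    return (n >= 0 && (size_t)n < buf_size) ? 0 : -1;
}
```

With the names used in the table, input.html and template.shtml give input.shtml when the flag is set and input.html when it is not.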
Template files
BILE template files are simply text files with commands enclosed in
double square brackets, [[like this]]. Template commands can be
simple commands or block commands that enclose other commands. Blocks
are closed by preceding the command name with a slash, for example,
[[if]] ... [[/if]]. Some commands are immediate; that is,
they are evaluated when the template is loaded, not when it is
executed. Immediate commands are prefixed with a “!”
character.
The following commands are defined in BILE:
| Command | Description |
|---|---|
| # | A comment. Everything after the “#” is ignored. No output is generated. |
| = expression | Writes the result of the expression to the output, escaping any HTML special characters. |
| > expression | Writes the result of the expression to the output without escaping any characters. |
| BODY | Writes the body of the input file to the output. What constitutes the body of a file varies depending on the input file type. For example, for an HTML file, only the parts of the file between the <BODY> tags will be output; for a text file, the entire file will be output. |
| LOCATION expression | Prints a “breadcrumb trail” for the input file using the result of the expression as a separator. The “breadcrumb trail” will consist of a link to the index page of the first index defined in the input file’s section, parent section, etc., all the way to the publication level. |
| BREAK | Leaves the current block. Note: Unlike C and its descendants, a BILE [[block]] In this case, the [[block]] |
| BREAKIF expression | Leaves the current block if the expression is true. |
| IF | Conditional block command. Syntax: [[block]] Note that there is no |
| !INCLUDE expression | Includes the file named by the result of the expression in the template, including any BILE commands. Immediate command, so executed once when the template is loaded. |
| INDEX [expression] | Block command. Evaluates the expression and looks for an index of that name. If an index is found, the block is evaluated for each file in the index. Used to generate tables of contents. If included in an index template, the expression may be omitted. |
| LET $variable = expression | Assigns the value of the expression to the local variable $variable. |
| PREAMBLE | For HTML files, outputs any text that occurs before the opening <HTML> tag. Useful for PHP files which may have setup code before the <HTML> tag. |
| SECTIONS | Generates a list of sections defined in the publication. |
| SET $global = expression | Assigns the value of the expression to the global variable $global. |
| TAGS | Block command. For each tag defined in the input file, evaluate the block. |
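The difference between the “=” and “>” commands comes down to whether the value passes through an HTML-escaping step first. A minimal C sketch of such a step follows. The function name html_escape is hypothetical, and the set of characters escaped here (&, <, > and the double quote) is an assumption; the text does not say which characters BILE treats as special.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper: copies `in` to `out`, replacing HTML special
 * characters with entity references.
 * Returns 0 on success, -1 if `out` is too small. */
int html_escape(const char *in, char *out, size_t out_size)
{
    assert(in != NULL && out != NULL);

    size_t used = 0;
    for (; *in != '\0'; in++) {
        char single[2] = { *in, '\0' };
        const char *rep;

        switch (*in) {
        case '&': rep = "&amp;";  break;
        case '<': rep = "&lt;";   break;
        case '>': rep = "&gt;";   break;
        case '"': rep = "&quot;"; break;
        default:  rep = single;   break;
        }

        size_t len = strlen(rep);
        if (used + len + 1 > out_size)   /* leave room for the terminator */
            return -1;
        memcpy(out + used, rep, len);
        used += len;
    }

    if (out_size == 0)
        return -1;
    out[used] = '\0';
    return 0;
}
```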
Expressions
BILE has a simple expression evaluator based on Jack Crenshaw’s series of articles entitled “Let’s build a compiler”[6]. The syntax is similar to PHP’s, with some peculiarities described below.
Variables
Like PHP, BILE variables are prefixed with a “$” character. Prefixing a variable name with two “$” characters works as it does in PHP, allowing a simple form of indirection, for example:
[[let $a = `Test`]]
[[let $b = `a`]]
[[= $$b]]
This will write “Test” to the output. This also works for functions, for example:
[[let $func = iif($do_uppercase, `ucase`, `lcase`)]]
[[= $func(`Test`)]]
will write “TEST” to the output if the variable $do_uppercase
is true.
Literals
String literals may be delimited by single quote ('), double quote (") or backquote (`) characters. BILE does not interpolate variable names in double-quoted strings like PHP does.
BILE recognises the Boolean literals true and false.
The following additional values are regarded as Boolean False:
- The empty string
- Any string that is convertible to the number zero
- The literal string, “False”.
All other values are regarded as True in a Boolean context.
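The truth rules above can be sketched as a C predicate over string values. This is an illustration, not BILE's code: the function name is_truthy is hypothetical, “convertible to the number zero” is interpreted here as strtod consuming the whole string and yielding zero, and the comparison with “False” is assumed to ignore case.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>

/* Case-insensitive string equality. */
static bool equals_ignore_case(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return false;
        a++;
        b++;
    }
    return *a == '\0' && *b == '\0';
}

/* Hypothetical helper: applies the Boolean rules to a string value. */
bool is_truthy(const char *value)
{
    assert(value != NULL);

    if (value[0] == '\0')                      /* the empty string */
        return false;
    if (equals_ignore_case(value, "false"))    /* the literal string False */
        return false;

    /* A string that converts in full to the number zero. */
    char *end;
    double d = strtod(value, &end);
    if (end != value && *end == '\0' && d == 0.0)
        return false;

    return true;                               /* everything else */
}
```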
Operators
Like PHP, “.” is the preferred string concatenation operator. Using “+” may not work.
BILE’s arithmetic operators are (in decreasing order of precedence):
- ^ (exponentiation)
- mod (modulus), div (integer division), * (multiplication), / (division)
- + (addition), - (subtraction)
The logical operators have higher precedence than the arithmetic operators and are (in decreasing order of precedence):
- and (logical AND)
- or (logical OR), xor (exclusive OR)
- not (logical NOT)
BILE’s comparison operators are somewhat unorthodox. This is because BILE is often embedded in HTML code in which the standard comparison symbols like “<” and “>” have special meanings and must be escaped. Although the BILE parser could be modified to recognise the escaped form of the operators, I felt this would reduce the readability of BILE code. Therefore, I decided to use two-letter names for the comparison operators which FORTRAN and DCL programmers might recognise, but will probably be unfamiliar to everyone else!
| Conventional operator | BILE operator | Description |
|---|---|---|
| =, == | eq | equal to |
| <>, != | ne | not equal to |
| < | lt | less than |
| <= | le | less than or equal to |
| > | gt | greater than |
| >= | ge | greater than or equal to |
Functions
BILE has a number of built-in functions. These functions have been added on a more-or-less ad-hoc basis as I needed them so they are something of a “mixed bag”. The functions in BILE are stored in a table of function pointers so it is fairly straightforward to add new ones.
| Function | Description |
|---|---|
| basename(file_path) | Removes the directory part of file_path and returns the filename. |
| decode(expression, [val1, ret1, ... valn, retn], default) | Compares expression to val1. If they are equal, ret1 is returned. If not, the next val is compared. If none of the supplied vals match, the default value is returned. Note: This function is equivalent to the Oracle function of the same name. |
| defined(variable_name) | Returns True if a variable called variable_name exists in the expression’s scope, False otherwise. |
| dirname(file_path) | Returns the directory part of file_path. |
| ent(entity_name) | Returns an SGML entity reference. For example, ent(`quot`) returns &quot;. This is a convenience function intended to reduce the amount of escaping necessary in BILE code. |
| exec(program_name) | Runs the external program program_name and captures any output it generates. The program’s exit code is stored in the global variable $error. |
| file(file_name) | Reads the file file_name and returns its contents. |
| file_exists(file_path) | Returns True if the file file_path exists, False otherwise. |
| iif(expression, true_val, false_val) | If expression is True, returns true_val, otherwise false_val is returned. Note: This function is equivalent to the VB/VBA function of the same name. |
| index_first(index_name[, variable_name]), index_last(index_name[, variable_name]), index_next(index_name[, variable_name]), index_prev(index_name[, variable_name]) | Checks the publication for an index called index_name. For index_first(), the value of variable_name in the scope of the first file in the index is returned. For index_last(), the value of variable_name in the scope of the last file in the index is returned. For index_prev() and index_next(), the current file’s position in the index is determined and then the value of variable_name in the scope of the preceding file or the following file in the index is returned. If variable_name is not specified, the value of the variable $file_name is returned. These functions are used to generate Previous/Next links on pages. |
| lcase(string) | Returns string in lower case. |
| length(string) | Returns the length of string. |
| now() | Returns the current time as the number of seconds since midnight of Jan 1 1970. |
| relative_path(path1, path2) | Given two absolute paths path1 and path2, returns a path to path2 relative to path1. |
| strftime | Formats a time value. Accepts the same format as the strftime function in the C library. |
| substr(string, start[, length]) | Returns the substring of string of length characters, starting at offset start. If length is omitted, the substring to the end of the string is returned. Note: BILE counts the characters in strings starting from zero. |
| tag(tag_name[, attr_name1, attr_val1, ...]) | Returns an SGML element (tag) with the specified attributes. For example tag(`h1`, `class`, `title`) returns <h1 class="title">. This is a convenience function used to reduce the amount of escaping in BILE code. Note: |
| ucase(string) | Returns string in upper case. |
Note: In order to simplify the parser, there can be no space between a function name and the opening bracket.
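A table of function pointers of the kind mentioned at the start of this section might look like the following C sketch. Everything here is hypothetical: the BuiltinFn signature (a function that modifies one string argument in place), the Builtin struct and find_builtin are illustrations only, and BILE's real functions take expression values rather than a single mutable string.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signature: transforms its argument in place. */
typedef void (*BuiltinFn)(char *arg);

static void fn_ucase(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)toupper((unsigned char)*s);
}

static void fn_lcase(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)tolower((unsigned char)*s);
}

typedef struct {
    const char *name;   /* name as written in a template expression */
    BuiltinFn fn;       /* C function that implements it */
} Builtin;

/* Adding a new built-in means adding one more row to this table. */
static const Builtin builtins[] = {
    { "lcase", fn_lcase },
    { "ucase", fn_ucase },
};

/* Look a function up by name; returns NULL if there is no such built-in. */
BuiltinFn find_builtin(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof builtins / sizeof builtins[0]; i++) {
        if (strcmp(builtins[i].name, name) == 0)
            return builtins[i].fn;
    }
    return NULL;
}
```

Looking functions up by name at run time is also what makes indirect calls such as $func(`Test`) possible, since the name only has to be known when the expression is evaluated.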
File metadata
BILE depends on being able to extract metadata from its input files. There is a default file handler that works on all files and extracts information common to all files such as their name, size and modification date. There are two additional handlers for processing HTML and image files.
General metadata
The general file handler will create the following variables in the file’s scope:
- $file_name
- $file_size
- $file_date
HTML metadata
The HTML file handler parses the <HEAD> element of the HTML
file and will create a variable called $title equal to the contents
of the <TITLE> element. In addition, it will create variables
for every <META> element it finds in the <HEAD>,
replacing any characters that are illegal in BILE variable names with
underscores. For example, if an HTML file contains the following <HEAD>
element:
<HEAD>
<TITLE>Home Page</TITLE>
<META HTTP-EQUIV="Content-Type" CONTENT="text/html">
<META NAME="Keywords" CONTENT="home page">
</HEAD>
BILE will create the following variables:
- $title = `Home Page`
- $content_type = `text/html`
- $keywords = `home page`
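The mapping from a <META> name to a variable name can be sketched in C as below. The function name meta_to_var_name is hypothetical, and the rule used (letters and digits are kept and lower-cased, everything else becomes an underscore) is an assumption inferred from the Content-Type example above; BILE's exact definition of a legal variable-name character is not given here.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: converts a META name such as "Content-Type" into
 * a variable name such as "content_type" (without the leading "$").
 * Returns 0 on success, -1 if `out` is too small. */
int meta_to_var_name(const char *meta_name, char *out, size_t out_size)
{
    assert(meta_name != NULL && out != NULL);

    size_t len = strlen(meta_name);
    if (len + 1 > out_size)
        return -1;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = (unsigned char)meta_name[i];
        /* Assumed rule: keep letters and digits, replace the rest. */
        out[i] = isalnum(c) ? (char)tolower(c) : '_';
    }
    out[len] = '\0';
    return 0;
}
```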
Image metadata
The image file handler can parse a GIF, JPEG or PNG image and extract the
image’s type and dimensions. These will be stored in the variables
$content_type (as a MIME type), $image_width and
$image_height. For GIF and JPEG images, it will check for
embedded comments in the image and store them in the variable
$comments. For PNG images, the handler will check for
tEXt chunks. These chunks contain image metadata in key/value
pairs and the handler will create a BILE variable for each key/value pair it
finds.
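For PNG files, the dimensions can be read directly from the start of the file, because the format places the IHDR chunk immediately after the 8-byte signature and stores the width and height as big-endian 32-bit integers. The C sketch below shows that approach on an in-memory buffer. The function name png_dimensions is hypothetical and this is not BILE's handler, which may work differently.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Read a big-endian 32-bit unsigned integer. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
           ((uint32_t)p[2] << 8)  |  (uint32_t)p[3];
}

/* Hypothetical helper: extracts width and height from the first bytes of
 * a PNG file. Returns 0 on success, -1 if the data does not look like a
 * PNG that starts with an IHDR chunk. */
int png_dimensions(const unsigned char *data, size_t len,
                   uint32_t *width, uint32_t *height)
{
    static const unsigned char signature[8] =
        { 0x89, 'P', 'N', 'G', 0x0D, 0x0A, 0x1A, 0x0A };

    assert(data != NULL && width != NULL && height != NULL);

    /* 8 signature + 4 chunk length + 4 chunk type + 4 width + 4 height */
    if (len < 24)
        return -1;
    if (memcmp(data, signature, 8) != 0)
        return -1;
    if (memcmp(data + 12, "IHDR", 4) != 0)
        return -1;

    *width = read_be32(data + 16);
    *height = read_be32(data + 20);
    return 0;
}
```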
Known issues
There are a number of bugs and other problems with BILE:
- Logging is currently broken, so specifying the -v (verbose) switch on the command line doesn’t work.
Further directions
As it stands, BILE serves my needs pretty well. I use it to maintain this Web site. However, there are a number of features that could be added to make it more useful.
- Build a better expression evaluator, perhaps using flex. However, this would be something of an academic exercise. The rather flaky nature of the BILE language at present discourages writing large quantities of BILE code — which is probably no bad thing. If you find yourself writing lots of BILE code, it’s a good sign your requirements have surpassed BILE’s rather modest abilities and it’s probably time to look at a more powerful templating system.
- Support metadata extraction from more file types (e.g. OLE compound documents, audio files, EXIF image files). Also, the existing file handlers are very crude. They scan the input files using brute force rather than using external libraries like libpng or libjpeg which are probably more forgiving of errors.
- Have BILE generate a script that automatically uploads changed pages to a Web site using FTP or HTTP.
- BILE turns a blind eye to encoding issues. This needs to be addressed if BILE is to work robustly in multilingual environments.
- Allow plugins for additional file types and functions so BILE can be extended without having to recompile.
- BILE is written in straight C, albeit using a pseudo-OO style. This suggests that it might make more sense to reimplement in a proper OO language like C++, Java or C#.
References
Document history
| Version | Author | Date | Comment |
|---|---|---|---|
| 1.1 | Ken Keenan | 06 January 2016 | Updated details |
| 1.0 | Ken Keenan | 13 August 2007 | Initial version |