GutenMark Usage Page
Attractively formatting Project Gutenberg texts

Contents

GutenMark Tutorial
Features
Wish List
Software Developers

GutenMark Tutorial

GutenMark is a command-line utility, so you have to use it from the Win32 "MS-DOS Prompt" or from the Linux/UNIX/BSD/MacOS-X command shell:
GutenMark [options] [inputfile [outputfile]]
For example,
GutenMark tomco10.txt tomco10.html
Other possibilities are to use the program in "filter" style:
GutenMark tomco10.txt >tomco10.html
     or
GutenMark <tomco10.txt >tomco10.html
Because GutenMark is intended to be fully automatic, there are very few command-line options:
 
Option
Description
--help
Displays a list of the available options.
--no-justify
Outputs paragraphs in ragged-right format.  The default format is right-justified.  This option is useful if the htmldoc utility is used to convert HTML to PostScript, because htmldoc is (or has been) buggy in regard to right-justification.  Or, I guess, if you just prefer ragged-right text.
--no-mdash
By default, GutenMark replaces constructs like "--" with an mdash character.  This looks better when printed, but most browsers do a very poor job of rendering mdashes, so the HTML looks better on screen with the original dashes in place.  The "--no-mdash" command-line option turns off the mdash conversion.
--yes-header
By default, GutenMark removes the Project Gutenberg file-header from the HTML output, in order to ensure conformance with PG requirements.  The "--yes-header" command-line option causes the PG header to be retained.  You need to read the PG header and evaluate for yourself whether retention of the header is legal or desirable for your application.  (Removal of the header is guaranteed to be legal.)
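
The options go ahead of the filenames, as in the usage line above. Here is a minimal sketch of a conversion aimed at on-screen reading, assuming the GutenMark executable is on your PATH and that the two options can be given together. The helper names gm_html_name and gm_convert_plain are made up for this illustration and are not part of GutenMark.

```shell
#!/bin/sh
# Hypothetical helper: derive the HTML output name from a PG text filename,
# e.g. tomco10.txt becomes tomco10.html.
gm_html_name() {
    printf '%s\n' "${1%.txt}.html"
}

# Hypothetical helper: convert one etext with ragged-right paragraphs and
# the original "--" dashes left in place.
# Assumes GutenMark is on PATH; otherwise it reports the problem and stops.
gm_convert_plain() {
    src=$1
    out=$(gm_html_name "$src")
    if command -v GutenMark >/dev/null 2>&1; then
        GutenMark --no-justify --no-mdash "$src" "$out"
    else
        echo "GutenMark not found on PATH" >&2
        return 127
    fi
}
```

After loading these definitions into your shell, something like "gm_convert_plain tomco10.txt" should write tomco10.html next to the input file.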

Another thing you might want to do is, of course, to make a hardcopy of the book.  You can do this by printing directly from your browser, but the typical browser does not do a great job of making the HTML (however well it has been created) print like a book.  Several options are available, such as loading the HTML into Microsoft Word and printing it from there.  A better method is to use one of the freely available HTML-to-PostScript conversion utilities to create a PostScript or PDF version of the book.  This is, perhaps, easier if you are a Linux/BSD user than if you are a Windows user.  To create the PDF sample text, on Linux, I used the free utility html2ps, along with a custom configuration file named "html2psrc" that is linked from this page for download.

Here is what the complete sequence of steps looked like, in Linux, for converting the sample etext to PDF format:

GutenMark bldhb10.txt bldhb10.html
html2ps -f html2psrc bldhb10.html >bldhb10.ps
ps2pdf bldhb10.ps
Or, in Linux, we could simply have printed it rather than creating a PDF, by replacing the final command with
lpr bldhb10.ps
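
If you convert more than one etext, the three-step sequence above can be wrapped in a small shell function. This is only a sketch: it assumes GutenMark, html2ps and ps2pdf are all on your PATH and that the html2psrc file is in the current directory. The function name etext_to_pdf and the DRY_RUN variable are invented for this example.

```shell
#!/bin/sh
# Hypothetical wrapper around the GutenMark -> html2ps -> ps2pdf sequence.
# Set DRY_RUN=1 to print the commands instead of running them.
etext_to_pdf() {
    base=${1%.txt}    # strip the .txt suffix, e.g. bldhb10.txt -> bldhb10
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "GutenMark $base.txt $base.html"
        echo "html2ps -f html2psrc $base.html >$base.ps"
        echo "ps2pdf $base.ps"
        return 0
    fi
    # Each step runs only if the previous one succeeded.
    GutenMark "$base.txt" "$base.html" &&
    html2ps -f html2psrc "$base.html" >"$base.ps" &&
    ps2pdf "$base.ps"
}
```

Running "etext_to_pdf bldhb10.txt" with DRY_RUN=1 lists the same three commands shown above, which is a cheap way to check the filenames before doing the real conversion.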

Features

Here are some of the things GutenMark does:

Wish List

Some of the items below represent things that are merely hard to accomplish, whereas others are simply not possible because the information that would be needed to accomplish them is not present in the PG files.  But I still can wish ... In general, the closer the etext conforms to PG guidelines, the better GutenMark can handle it.


Software Developers

At the time of writing, I don't know whether anyone will want to contribute features or bug-fixes to GutenMark, so I haven't really set up any way to do it.  If you want to do so, I'd suggest communicating the changes directly to me.

Oh, and I know that the code isn't very pretty.  I was really just throwing together a proof-of-concept, and it started being useful much more quickly than I thought it would.  Perhaps I'll pretty it up later.



Last updated 11/13/01 by RSB.  Contact me.