GutenMark Frequently Asked Questions (FAQ)
Attractively formatting Project Gutenberg texts

How much does GutenMark cost?  It's free, licensed to you under the GPL.

What if I want to use GutenMark in a commercial application, such as print-on-demand, or to set up a website for automatically prettifying PG texts?  Fine, do it!  I'll be happy to encourage this in any way I can, if you let me know what you're trying to do.  Again, consult the GPL if you are thinking of distributing the GutenMark program itself.  Also, you will want to carefully read the Project Gutenberg "fine print" to determine if what you want to do is acceptable to Project Gutenberg.

Why use GutenMark rather than another text-to-HTML markup utility?  Although GutenMark is a text-to-HTML markup tool, it is not a general-purpose utility.  It is specifically designed to correct the deficiencies of Project Gutenberg etexts.  The goal is 100% automatic, publishable-quality markup; in other words, books that look as if they had been published.  General-purpose tools are not really suited for this.

What's the status of GutenMark?  GutenMark has reached the stage of being reasonably suitable for personal use.  It should also be very useful for anyone intending to manually mark up a PG etext in HTML, since it does most of the work for you.  For a commercial printing operation -- e.g., a print-on-demand service -- GutenMark still needs some improvement.  For a list of things that GutenMark can't do (or perhaps, can't do well), click here.

GutenMark seems to work much more slowly than the early versions.  That's true.  GutenMark is now able to use wordlists and namelists to help it work more intelligently, but this intelligence comes at the cost of speed.  The time spent processing the wordlists doesn't really depend on the etext file's size; in other words, it adds roughly the same amount of time for big etexts as for small ones.  The slowdown is therefore more obvious, and more objectionable, when you process a small test etext.  Here's a rough speed comparison made on my 450 MHz iMac, processing the 400 Kbyte etext file TMOTB10.TXT:
 

GutenMark "profile"             | Wordlists | Processing time
--profile=english (the default) | American names, English, French, German, Italian, Latin, Spanish | 25 seconds
--profile=none                  | none | 4 seconds
--profile=english_all           | American names, Danish, English, Finnish, French, Gaelic, German (2), Italian, Latin, Norwegian (monstrously big), Spanish, Swedish | 66 seconds

I find this acceptable, but if you don't, there are a few things you can do about it, short of getting a faster computer.  :-)
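
To make the fixed-cost point concrete, here is a small Python sketch (not part of GutenMark) that subtracts the --profile=none time from the other timings in the table.  It assumes that the --profile=none run represents all of the non-wordlist work, which is only an approximation; the function and variable names are made up for illustration.

```python
# Timings in seconds, copied from the comparison above
# (450 MHz iMac, 400 Kbyte etext TMOTB10.TXT).
timings = {
    "english": 25,      # the default profile
    "none": 4,          # no wordlists at all
    "english_all": 66,
}


def wordlist_overhead(profile: str) -> int:
    """Estimate the seconds spent on wordlists for a profile.

    Assumption: the --profile=none time is the baseline cost of
    everything except wordlist processing.
    """
    return timings[profile] - timings["none"]


for profile in timings:
    print(f"--profile={profile}: about {wordlist_overhead(profile)} s of wordlist overhead")
```

By this estimate, the wordlists account for most of the run time on a file of this size, which is why the slowdown is so noticeable on small test etexts.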

Why does GutenMark discard the Project Gutenberg file header by default?  Is this even legal?  If you refer to the Project Gutenberg standard file header (an example of which may be seen here), under the section titled DISTRIBUTION UNDER "PROJECT GUTENBERG-tm", you'll note that Project Gutenberg specifically requires the header (and all other references to PG) to be removed if the etext has been changed.  It's unclear whether GutenMark changes the etext sufficiently to activate this clause, but in any case removal of the header is always allowed.  Therefore, the default is to remove the header.  You can restore the PG header with GutenMark's "--yes-header" command-line option.  If you do so, please keep in mind that complying with PG's requirements is entirely your responsibility.

Why does the HTML produced by GutenMark look funny in my browser?  Each browser tends to have its own quirks that limit how accurately it can display HTML.  So far, the only browser I've personally checked out that correctly displays all of the special characters needed by GutenMark is Microsoft Internet Explorer 5 on Windows.  (Internet Explorer for Mac OS X does not.)  You can check some of your own browser's capabilities by looking at the following table.
 

Description              | What your browser displays
long dash (em-dash)      | —
short dash (en-dash)     | –
soft hyphen              | should be in­vis­i­ble
curly left double-quote  | “
curly right double-quote | ”
curly left single-quote  | ‘
curly right single-quote | ’

Remember, though, that the goal of the GutenMark project is to produce good-looking printouts, or good-looking PDF-based online displays, and only secondarily to produce good-looking browser-based online displays.  So if you print this page, the table will often look very different from the way it does on your computer screen.
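
If you want to know exactly which characters the table is testing, the following Python sketch looks up each one by its standard HTML named entity and prints its Unicode code point and name.  The entity names are the standard HTML ones; whether GutenMark itself writes these named forms or some numeric equivalent is not stated here, so treat the list as an assumption.

```python
import html
import unicodedata

# Standard HTML named entities for the characters in the table above.
# Assumption: this FAQ does not say which entity form GutenMark emits.
ENTITIES = ["&mdash;", "&ndash;", "&shy;", "&ldquo;", "&rdquo;", "&lsquo;", "&rsquo;"]


def describe(entity: str) -> str:
    """Return 'entity U+XXXX UNICODE NAME' for one named entity."""
    char = html.unescape(entity)  # e.g. "&mdash;" becomes the em-dash character
    return f"{entity} U+{ord(char):04X} {unicodedata.name(char)}"


for entity in ENTITIES:
    print(describe(entity))
```

A browser or font that lacks a glyph for one of these code points will typically substitute something else, which is one way the table can end up looking wrong.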

Why does GutenMark waste disk space by using custom "wordlists" instead of the spelling dictionary already installed on the computer?  GutenMark is designed to be very portable, and the types of spelling dictionaries available differ greatly from one computer platform to the next.  Most GutenMark wordlists are derived from the spelling dictionaries of the ispell program, which are installed on many Linux computers.  However, even on a Linux platform, some of these ispell dictionaries have technical deficiencies from GutenMark's standpoint.  For some languages (such as Latin), no comprehensive ispell dictionary was previously available.  Therefore, the choice has been made to produce a set of custom wordlists used only by GutenMark, even though this has the unfortunate side effect of increased download times and disk usage.

What if I want to mirror the GutenMark website?  Swell!  Just make sure you use the material as-is, without change. Let me know about it, and I'll provide a link.


©2001 Ronald S. Burkey
Last updated 11/25/01 by RSB.  Contact me.