GutenMark Home Page
Attractively formatting Project Gutenberg texts
Contents
What is GutenMark?
What is Project Gutenberg?
The Problem
What's the Solution?
How does GutenMark fit in?
What is GutenMark?
GutenMark is a tool for automatically creating high-quality HTML
markup from Project Gutenberg etexts. In combination with freely available
HTML-to-PostScript conversion tools, GutenMark can convert Project
Gutenberg etexts into publication-quality PostScript for print-on-demand
applications. This conversion is completely automatic, with no manual
markup or editing.
Or at least, that's the goal.
What is Project Gutenberg?
Project Gutenberg -- or PG for short -- is a marvelous project for freely providing online books.
Thousands of such "etexts" have been made available. Many are familiar
classics, and many others are completely unfamiliar books you're unlikely
to find anywhere else. I've provided a dozen or so of the etexts
myself.
The Problem
A problem with PG etexts -- dare I say it? -- is that they are not very
pretty in comparison to typical hand-held books. PG etexts have
traditionally been provided in a format that preserves most of the content
of the books (i.e., what was in the author's mind) but discards the
attractive formatting (provided by the publisher). There have been
sound reasons for doing so, and I can't quibble with them.
It turns out that many of us really do prefer the more attractive
printed version over the "plain-vanilla" PG version. In other words,
we'd rather buy the book than read the online etext. I fear that
this effect has limited PG readership somewhat.
The situation has improved somewhat in recent years, in several ways.
Special software for reading the online books can make the books appear
more attractive on the computer screen. There is a project of the HTML
Writers Guild to provide HTML versions of PG etexts. Even PG
itself is now willing to accept formatted versions of etexts, as long as
the plain-vanilla version exists also. I applaud all of these efforts,
and I hope it does not denigrate them to add my own efforts to theirs.
What's the Solution?
I think that printing on demand would go a long way towards making
PG better for the readers. It would be great to be able to take any
random PG etext and automatically format it so that it is as attractive
as a printed book, for either online reading or for printing.
How does GutenMark fit in?
GutenMark is a free command-line utility for Win32 or Linux (or
BSD or UNIX or Mac OS X ...). It accepts a Project Gutenberg etext,
applies what I hope are intelligent heuristics to it, and produces
attractively formatted HTML. My definition of "attractively formatted"
is that it should look like a book when you print it.
In other words, Project Gutenberg has retained the content of the books
in converting them to etexts, but has discarded the formatting. GutenMark
aims to restore the formatting.
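To make the idea of "heuristics" a little more concrete, here is a minimal Python sketch of the kind of rule such a tool could apply to a plain-text etext. It is not GutenMark's actual algorithm, and the function names (`looks_like_heading`, `mark_emphasis`, `plain_text_to_html`) are hypothetical. It assumes only three things about the input: paragraphs are separated by blank lines, a short single line in all capitals is probably a heading, and `_underscores_` mark italics.

```python
import html
import re


def looks_like_heading(block: str) -> bool:
    """Guess (assumption): a short, one-line, all-uppercase block is a heading."""
    lines = block.splitlines()
    if len(lines) != 1:
        return False
    text = lines[0].strip()
    has_letters = any(c.isalpha() for c in text)
    return has_letters and len(text) <= 60 and text == text.upper()


def mark_emphasis(text: str) -> str:
    """Turn _underscored_ spans into <i>...</i> (assumed plain-text convention)."""
    return re.sub(r"_([^_\n]+)_", r"<i>\1</i>", text)


def plain_text_to_html(etext: str) -> str:
    """Convert blank-line-separated text blocks into <h2> and <p> elements."""
    out = []
    for block in re.split(r"\n\s*\n", etext.strip()):
        if not block.strip():
            continue  # nothing to emit for an empty block
        # Escape &, <, > first so the only tags in the output are ours.
        escaped = html.escape(block, quote=False)
        if looks_like_heading(block):
            out.append(f"<h2>{escaped.strip()}</h2>")
        else:
            # Rejoin hard-wrapped lines into one flowing paragraph.
            joined = " ".join(line.strip() for line in escaped.splitlines())
            out.append(f"<p>{mark_emphasis(joined)}</p>")
    return "\n".join(out)


if __name__ == "__main__":
    sample = "CHAPTER I\n\nIt was a _dark_ night,\nand cold.\n"
    print(plain_text_to_html(sample))
```

Rules this simple will misfire on real etexts (an all-capitals shout in dialogue would become a heading, for instance), which is why the quality of the result depends so much on the particular etext.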
How well does GutenMark succeed? It depends on the particular
etext; in my view, it works pretty well. To give you some idea, here
is a small sample:
You might need to download Adobe's free
Acrobat Reader program to view or print the PDF file. I created the
PDF file just for fun, using a freely available HTML-to-PDF conversion
utility (html2ps), so that you could see a book-like printout.
It's important to understand that no manual markup or editing was performed
at any stage.
Last updated 11/12/01 by RSB. Contact me.