GutenMark
Download Page
Attractively formatting Project
Gutenberg texts
Contents
License
Downloading GutenMark
Installing GutenMark
Compiling GutenMark
Other Stuff You Might Want
License
GutenMark is freely available under the terms of the GNU General
Public License (GPL). You may view the text of the GPL here,
or you may visit the Free Software Foundation
for more explanation.
Downloading GutenMark
The "base package" is a single archive file containing all of the source
code and executable programs for all directly supported platforms (Win32,
Linux-Intel, Linux-PPC, Mac OS X, and FreeBSD). Simply choose
the archive format (ZIP or TAR) most suitable for your particular platform.
The documentation is provided as a separate archive file, because of its
size; the documentation is just a copy of this website (except possibly
not as up-to-date), and therefore you may not want to download it.
Current Software
The "wordlists" and "namelists" are optional files that you can download
or not, as you choose. The wordlists are categorized as highly
recommended, recommended, or available,
based on my own admittedly subjective experience. Click
here for an extended explanation of what wordlists do.
Wordlists and Namelists
| Description | Version | Download |
|---|---|---|
| My own special English wordlist | Nov. 29, 2001 | 1K |
| U.S. namelist | Nov. 10, 2001 | 348K |
| U.S. place names (NEW) | Dec. 18, 2001 | 144K |
| French namelist | Nov. 11, 2001 | 7K |
| English wordlist | Nov. 10, 2001 | 449K |
| French wordlist | Nov. 17, 2001 | 373K |
| German wordlist | Nov. 24, 2001 | 582K |
| Older, smaller German wordlist | Nov. 11, 2001 | 209K |
| Latin wordlist | Nov. 16, 2001 | 195K |
| Italian wordlist | Nov. 11, 2001 | 383K |
| Spanish wordlist | Nov. 11, 2001 | 322K |
| Non-U.S. place names (NEW) | Dec. 22, 2001 | 5992K (Really really big!!) |
| Norwegian wordlist | Nov. 16, 2001 | 2078K (Really big!) |
| Gaelic wordlist | Nov. 11, 2001 | 298K |
| Danish wordlist | Nov. 11, 2001 | 558K |
| Swedish wordlist | Nov. 11, 2001 | 254K |
| Finnish wordlist | Nov. 11, 2001 | 285K |
| My own special non-English wordlist | Nov. 24, 2001 | 1K |
| All wordlists and namelists | n/a | via FTP |
Older versions of the software
Installing GutenMark
... on Win32
-
Unzip GutenMark-xxx.zip with WinZip, pkunzip, or whatever you have that's
appropriate. You'll find GutenMark.exe under the GutenMark\Win32
directory.
-
Add this directory to your PATH, or copy GutenMark.exe to some directory
that's already in your path.
-
Place the file GutenMark.cfg and any wordlists
or namelists into the directory from which you intend to run the GutenMark
command-line program. Typically, this is the same as the directory
containing your etexts.
-
Don't uncompress the wordlists.
-
Depending on the wordlists you've downloaded, the native languages of the
etexts you're interested in, and your own personal tastes, you may want
to reconfigure the software.
In most cases, the default configuration should be fine.
... on Linux (Intel or PPC), Mac OS X, or FreeBSD
-
gunzip GutenMark-xxx.tar.gz
-
tar -xf GutenMark-xxx.tar
-
Under the GutenMark directory will be subdirectories LinuxIntel, LinuxPPC,
MacOS-X, and FreeBSD. Each subdirectory contains an executable program
called "GutenMark" suitable for the indicated platform. Add
the appropriate directory to your path, or copy the executable to some
directory that's already in your path.
-
Place the file GutenMark.cfg and any wordlists
or namelists into the directory from which you intend to run the GutenMark
command-line program. Typically, this is the same as the directory
containing your etexts.
-
Don't uncompress the wordlists.
-
Depending on the wordlists you've downloaded, the native languages of the
etexts you're interested in, and your own personal tastes, you may want
to reconfigure the software.
In most cases, the default configuration should be fine.
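As a rough sketch of the "add the appropriate directory" step, the helper below maps `uname` output to the subdirectory names listed above. The function name `platform_dir` and the `uname` patterns are assumptions made for illustration; only the directory names come from this page, and the sketch does not check that the supplied binary actually runs on your machine.

```shell
#!/bin/sh
# Hypothetical helper: print the GutenMark subdirectory for a platform.
# $1 is the output of `uname -s`, $2 the output of `uname -m`.
# The directory names are taken from the text; the uname patterns are assumptions.
platform_dir() {
    os=$1
    arch=$2
    case "$os" in
        Linux)
            case "$arch" in
                i?86|x86_64)   echo LinuxIntel ;;
                ppc*|powerpc*) echo LinuxPPC ;;
                *)             return 1 ;;   # unknown CPU: no supplied binary
            esac ;;
        Darwin)  echo MacOS-X ;;
        FreeBSD) echo FreeBSD ;;
        *)       return 1 ;;                 # unsupported OS: compile it yourself
    esac
}
```

One possible use is `dir=$(platform_dir "$(uname -s)" "$(uname -m)") && cp "GutenMark/$dir/GutenMark" "$HOME/bin/"`, where `$HOME/bin` stands in for any directory that is already in your path.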
Compiling GutenMark
If you don't have any of the platforms for which an executable program
is supplied, or if you would like to modify the program, then you need
to compile GutenMark yourself. This is easy on any system
that has the GNU compiler gcc and the GNU make program.
You can obtain gcc and make for free from GNU.
When compiling for Win32, the version of gcc called mingw32
(see Mumit Khan's web site) is used.
NOTE: In versions later than 20011113, support for Borland's
free C++ compiler (see Borland's web site) has been dropped, because it was
just too much effort for me without knowing if anyone was interested. If for
some reason you don't want to use mingw32, and if you figure out how to get other
C compilers such as Borland's or Microsoft's to work, tell me; I'll post the
instructions here.
Requirements
You need to have the compression library zlib installed. This
can be obtained for free from www.zlib.org,
but is already present on every *NIX system I personally have tried.
For Win32, I've included a pre-compiled zlib library with the GutenMark
distribution, so you don't have to worry about it.
... on Win32
Unzip the source code, and change to the GutenMark/src directory from the DOS command
line. To compile with mingw32,
make GutenMark.exe
(This assumes that the name of the GNU make program that you got
with mingw32 is actually accessible by typing "make". If this
calls up some other make program, such as Microsoft's or Borland's,
then this will not work properly.) In addition to compiling GutenMark.exe,
this will attempt to test the compilation by running GutenMark.exe to produce
a sample HTML file (bldhb10.html), which it compares to an equivalent HTML
file (bldhb10.txt.html) provided with the distribution. (For versions
prior to 20011205, this comparison will fail because GutenMark
embeds the compilation date within the HTML markup. However, only
a single line of the HTML file, albeit a long one, will fail the comparison.
If more than one HTML line fails to match, then the executable is not working
properly.)
... on *NIX
gunzip GutenMark-current.tar.gz
tar -xf GutenMark-current.tar
cd GutenMark/src
make
On FreeBSD, the GNU make program is actually called gmake.
In addition to compiling GutenMark, this will attempt to
test the compilation by running GutenMark to produce a sample HTML
file (bldhb10.html), which it compares to an equivalent HTML file (bldhb10.txt.html)
provided with the distribution. (For versions prior to 20011205,
this comparison will fail because GutenMark embeds the compilation
date within the HTML markup. However, only a single line of the HTML
file, albeit a long one, will fail the comparison. If more than one
HTML line fails to match, then the executable is not working properly.)
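The two points above (gmake on FreeBSD, and judging the build by how many HTML lines mismatch) can be sketched in shell as follows. The function names `pick_make` and `count_mismatched_lines` are hypothetical, and this is only an illustration, not the comparison the makefile itself performs.

```shell
#!/bin/sh
# Hypothetical helper: print the GNU make command name for a platform.
# $1 is the output of `uname -s`; per the text, FreeBSD calls it gmake.
pick_make() {
    case "$1" in
        FreeBSD) echo gmake ;;
        *)       echo make ;;
    esac
}

# Hypothetical helper: count the lines of file $1 that diff reports as
# changed or missing relative to file $2. diff marks such lines with "<".
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_mismatched_lines() {
    diff "$1" "$2" | grep -c '^<' || true
}
```

For example, after building with `$(pick_make "$(uname -s)")`, running `count_mismatched_lines bldhb10.html bldhb10.txt.html` in the directory holding both files should print 0 for current versions, or 1 for versions prior to 20011205; a larger number suggests the executable is not working properly.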
Other Stuff You Might Want
The function of GutenMark is merely to convert the Project Gutenberg
etexts to marked-up HTML or LaTeX. If that's all you want -- if you
want to read the etext online, or to set up a web site that displays PG
texts in HTML, or if you're fine with printing etexts from your browser,
or if you want to use the HTML as a starting point for further markup --
then you're all set!
If, on the other hand, you are looking for an end-to-end solution that
can produce attractive printable texts like this sample,
then you need some better way of printing HTML than your browser can provide.
You could, of course, load the HTML into Microsoft Word or some other word
processing program, and manipulate the document format manually.
The solution I would choose instead is to use a utility program that
can convert HTML to Postscript printer language, or to PDF format.
Several such free utilities are available.
-
html2ps is
available for Linux systems (my SuSE Linux system installed it automatically),
and is the program I used to create the sample PDF file. Its only
drawbacks, as far as I know, are that it is slow and that it can leave
hanging section headings at the bottoms of pages. Also, if the font
size is too big (or the page size too small), it has occasional problems
right justifying text. Actually, html2ps only creates Postscript,
which can then be converted to PDF with ps2pdf. I don't have
personal experience with these options on Win32 systems, but html2ps
is written in Perl, and should therefore be available on Win32; Postscript
can be converted to PDF or printed on Win32 using Adobe
Acrobat (costs money) or with ghostscript
(for free). If you are interested in using html2ps with GutenMark,
you may also be interested in various configuration files for html2ps
that I think work nicely with GutenMark HTML:
-
htmldoc is available
either for Win32 or in source-code form (for Linux systems), and has
some very nice properties. I personally find it a little buggy, but
it's apparently under active development and can presumably only get better.
The main problem is that it is very bad at right justification (or at least,
I haven't figured it out), and so you need to use ragged-right text.
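The html2ps route described above (HTML to Postscript, then Postscript to PDF with ps2pdf) might be wrapped in a small shell function like the one below. The function name `html_to_pdf` and the choice of intermediate file name are assumptions for illustration; the sketch uses html2ps with its default settings and none of the configuration files mentioned above.

```shell
#!/bin/sh
# Hypothetical wrapper: convert an HTML file to PDF via Postscript.
# Usage: html_to_pdf input.html output.pdf
html_to_pdf() {
    html=$1
    pdf=$2
    ps="${pdf%.pdf}.ps"   # intermediate Postscript file, kept next to the PDF

    # Fail early if either tool is missing from the path.
    command -v html2ps >/dev/null 2>&1 || { echo "html2ps not found" >&2; return 127; }
    command -v ps2pdf  >/dev/null 2>&1 || { echo "ps2pdf not found"  >&2; return 127; }

    # html2ps writes Postscript to standard output; ps2pdf takes input and output names.
    html2ps "$html" > "$ps" && ps2pdf "$ps" "$pdf"
}
```

With the sample file from the build test, this could be called as `html_to_pdf bldhb10.html bldhb10.pdf`. The intermediate .ps file is left in place so that it can also be sent directly to a Postscript printer.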
Many other means of converting the HTML to alternate forms for printing
or editing are also available. While I have not used any of them
myself, here are some ideas:
-
Convert the PG etexts to LaTeX instead of HTML with GutenMark's
"--latex" command-line switch. There are also various free conversion
utilities to convert HTML to LaTeX (such as htmltolatex
or html2latex),
but experimentation with such utilities has not (so far) produced good
results. Unfortunately GutenMark's LaTeX support is also still
quite buggy.
-
Convert to DocBook with existing (non-free) utilities.
-
Convert to XML with existing utilities.
©2001 Ronald S. Burkey
Last updated 12/28/01 by RSB. Contact me.