txt2jpg is a simple Java utility for converting plain text files
and relatively simple HTML
documents to (possibly long) sequences of JPEG images. I wrote it to
make it possible for me to read eBooks on my Archos pocket media players, which
can view still images as well as movies. Of course,
the image viewers on these media players are intended for browsing
photo collections, not eBooks; and
converting a plain text document into a few thousand separate JPEG
files is an outrageous use of disk space; but it works, after a fashion,
when the alternative is nothing at all.
The program takes a text or HTML file as input, and outputs numbered
JPEG files. You can then copy these files en masse to your
photo viewer.
If the specified input file has a name that ends in .html
or .htm the file is formatted as HTML; otherwise it is
treated as plain text.
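As a rough illustration of the rule above (this is not the actual txt2jpg source), the choice between HTML and plain text can be sketched as a simple check on the file name. The class and method names are hypothetical, and treating the extension case-insensitively is an assumption; the text above only says that names ending in .html or .htm are formatted as HTML.

```java
import java.util.Locale;

public class InputTypeSketch {
    /**
     * Returns true if the file name ends in .html or .htm.
     * Case-insensitive matching is an assumption, not documented behaviour.
     */
    public static boolean looksLikeHtml(String fileName) {
        String lower = fileName.toLowerCase(Locale.ROOT);
        return lower.endsWith(".html") || lower.endsWith(".htm");
    }

    public static void main(String[] args) {
        System.out.println(looksLikeHtml("book.html")); // prints true
        System.out.println(looksLikeHtml("book.htm"));  // prints true
        System.out.println(looksLikeHtml("book.txt"));  // prints false
    }
}
```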
txt2jpg is a command-line utility. It has no graphical user
interface and never will have. It is just not complex enough to merit one.
Please note that I have no immediate plans to extend this software to
read, or produce, other formats than those it currently supports.
java executable (java.exe on
MS Windows) at the command prompt without getting an error message.
Check this by getting a command prompt and running:
java -version
If you get a version number, you're up and running. If you get an error
message, check your installation. Sorry, but fixing broken Java installations
is beyond the scope of this document.
You need to have enough disk space to store the generated JPEG files --
about 5 MB for a novel.
txt2jpg is:
java -jar /path/to/txt2jpg.jar /path/to/text/file.txt prefix
Replace the
/path/to stuff with the full directory name
of the place where you've installed txt2jpg.jar, and the
full path to the text/HTML file you want to convert. On Windows systems you'll
probably have to use \ instead of / as the directory separator.
prefix is
the text that will form the basis of the generated filenames --
this can be anything your operating system allows. The files
will be named prefix0000.jpg, prefix0001.jpg, etc.,
and will be generated in the directory in which the program is run.
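The naming scheme is the prefix followed by a four-digit, zero-padded page number and the .jpg extension. A minimal sketch of that pattern follows; the class and method names are hypothetical, and this is not taken from the txt2jpg source.

```java
import java.util.Locale;

public class OutputNameSketch {
    /** Builds a name such as prefix0000.jpg from a prefix and a zero-based page index. */
    public static String pageFileName(String prefix, int pageIndex) {
        // %04d pads the page number with leading zeros to four digits.
        return String.format(Locale.ROOT, "%s%04d.jpg", prefix, pageIndex);
    }

    public static void main(String[] args) {
        System.out.println(pageFileName("book", 0));  // prints book0000.jpg
        System.out.println(pageFileName("book", 1));  // prints book0001.jpg
        System.out.println(pageFileName("book", 42)); // prints book0042.jpg
    }
}
```

Zero-padding matters here because most photo browsers sort files by name, so padded numbers keep the pages in reading order.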
By default, the generated images will be 480x272 pixels in size, with a
12-point sans-serif font, with white text on a black background.
You can control the way the conversion is done using command-line switches.
For example, to produce images that are 320x240 pixels, do this:
java -jar txt2jpg.jar -w 320 -h 240 file.txt prefix
Here are the other command-line options:
-c, --cscheme colourscheme   Set colourscheme (0-3). In fact, txt2jpg does
                             not use colours at all, as such; the variants
                             correspond to different shades of grey.
-f, --fontsize fontsize      Set font size (8-24, default 12).
                             Strange results will be obtained with bizarre
                             font sizes; usually 12 is the smallest that
                             can readily be read on a handheld device.
-h, --height height          Set image height (default 272).
-p, --preformat              Selects preformatted mode (see below).
                             Has no effect with an HTML file.
-q, --squash                 Use squash mode (see below).
-s, --serif                  Use serif font. The default is to use the
                             system's default sans-serif font.
-v, --verbose level          Set verbosity level (0-2).
                             Doesn't do much at present.
--version                    Show version.
-w, --width width            Set image width (default 480).
Pictures directory (called Photos on some
units) just for converted text files. You can then connect the
device as a USB disk drive, and copy all the .jpg files
output from txt2jpg into
this one directory. If you then open the photo browser and select the first
image in the sequence, you should be able to page forward and back
in the text using the right and left keys on the unit.
For best results, I suggest setting the image size using the -w
and -h switches to be exactly the same as the device's
screen dimensions. You may have to experiment with the font settings to
get a readable display -- displays vary from device to device, so it's
difficult to give general advice.
P and /P        Treated identically -- both break to a new paragraph, and
                reset the paragraph indent, if any
BR and /BR      Break to a new line; do not reset the paragraph indent, if any
BLOCKQUOTE      Sets the left indent the same as a paragraph indent.
                Reset by a P or H tag, as well as a /BLOCKQUOTE
H1, H2, etc     All headings are rendered in bold, all the same size
B and STRONG    Rendered in a bold font
CODE            Rendered in a monospaced font
I and U         Rendered in an oblique font
HR              Draws a horizontal line across the page, and flushes the line
PRE             Selects preformatted mode and a monospaced font (see below)
The following tags are recognized but ignored:
HTML HEAD BODY TITLE META STYLE SCRIPT DIV SPAN FONT A
DOCTYPE LINK PAGE FRAME FRAMESET CENTER TBODY NOBR, all table tags,
all form tags, all script tags
The following entities are recognized:
nbsp translated to plain space
quot rendered as a double quote
gt, lt rendered as a greater-than or less-than sign
copy rendered as a copyright symbol
amp rendered as an ampersand
#NNN the Unicode character NNN is inserted. Whether it is rendered
or not depends on your JVM font capabilities
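The entity list above amounts to a small lookup table plus a numeric case. The sketch below shows one way such a mapping could be written; it is not the txt2jpg source. The class and method names are hypothetical, NNN is assumed to be a decimal code point, and passing unrecognized entities through unchanged is also an assumption.

```java
public class EntitySketch {
    /**
     * Decodes the text found between '&' and ';' in an HTML entity.
     * Unknown names are returned in their original &name; form (an assumption).
     */
    public static String decode(String name) {
        switch (name) {
            case "nbsp": return " ";       // plain space, as described above
            case "quot": return "\"";
            case "gt":   return ">";
            case "lt":   return "<";
            case "copy": return "\u00A9";  // copyright symbol
            case "amp":  return "&";
            default:
                if (name.startsWith("#")) {
                    try {
                        // Assumes NNN is a decimal Unicode code point.
                        int codePoint = Integer.parseInt(name.substring(1));
                        if (Character.isValidCodePoint(codePoint)) {
                            return new String(Character.toChars(codePoint));
                        }
                    } catch (NumberFormatException e) {
                        // Not a number: fall through to the pass-through case.
                    }
                }
                return "&" + name + ";";
        }
    }

    public static void main(String[] args) {
        System.out.println(decode("amp"));  // prints &
        System.out.println(decode("#65"));  // prints A
        System.out.println(decode("bogus")); // prints &bogus;
    }
}
```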
In HTML mode, txt2jpg continues to try to fit as much text into
the output image as it can. However, it makes some concessions to layout.
So, for example, it will try to avoid having a title as the last
line on a page.
-p command-line
switch, or the PRE tag is encountered in an HTML document.
In preformatted mode, txt2jpg tries to respect the internal
layout of the document. That is, it will break lines where the source
document breaks lines, respect multiple spaces and line breaks, and use
a monospaced font so that each character occupies the same screen area.
This mode is useful for documents such as scripts, which rely heavily
on spacing and line breaks to create the proper layout.
However, txt2jpg will not respect the source document to the
extent that text will be lost off the right-hand edge of the page if
the line does not fit -- it will still wrap lines that are too long to fit
the selected image size.
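The behaviour described above (keep the source's line breaks and spacing, but still wrap lines that are too long) can be sketched as follows. This is an illustration only, not the txt2jpg source: the names are hypothetical, the page width is measured in characters (reasonable for a monospaced font, but the real program presumably works in pixels), and breaking at the exact column rather than at a word boundary is an assumption.

```java
import java.util.ArrayList;
import java.util.List;

public class PreformattedSketch {
    /**
     * Keeps every source line break and all internal spacing, but cuts any
     * line longer than maxChars into pieces so nothing runs off the page.
     */
    public static List<String> wrapPreformatted(String text, int maxChars) {
        if (maxChars <= 0) {
            throw new IllegalArgumentException("maxChars must be positive");
        }
        List<String> out = new ArrayList<>();
        // The -1 limit keeps trailing empty lines instead of discarding them.
        for (String line : text.split("\n", -1)) {
            if (line.isEmpty()) {
                out.add(""); // blank lines are preserved as-is
                continue;
            }
            for (int start = 0; start < line.length(); start += maxChars) {
                int end = Math.min(line.length(), start + maxChars);
                out.add(line.substring(start, end));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // The double space after "ab" survives; the long lines are cut at column 4.
        for (String line : wrapPreformatted("ab  cd\n\nefghij", 4)) {
            System.out.println("[" + line + "]");
        }
    }
}
```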
-q command-line switch.
Its purpose is to attempt to increase the amount of text that fits onto
each output image. In squash mode, txt2jpg will still respect
formatting information, but it will interpret it in such a way as to minimize
unnecessary whitespace.
In text mode and HTML mode, the effect of squash mode is to render paragraph breaks as indents, not blank lines. This is how printed text is usually presented, but not what Web browsers usually do. The following affect HTML documents only:
PRE sections.
txt2jpg tries to fit as much text onto each page as
possible, while paying some heed to the formatting information in the
HTML tags. It does not, and
never will, render tables or images.
2. This program cannot process any kind of formatted text other than very simple HTML, or any kind of word processor document, or any kind of compressed file, or any proprietary format. It is primarily designed to read the plain ASCII text files of the type that form the main body of the Project Gutenberg library. It might handle non-Latin characters if your source file is properly encoded, and your system and Java run-time are properly configured, but don't bet on it. It has incidental support for HTML, because a few eBooks are starting to turn up in HTML format, and because it's easy to convert most things into HTML, but don't expect too much.
3. It's much more difficult than you might think to lay out plain text on
a small page. In general, the small page will accommodate fewer words than
the original source documents, so the lines will have to be broken in
different places. This means that we can't assume that an end-of-line marker
in the source file should be treated as an end-of-line in the output --
it might just be an `incidental' end-of-line put in because text editors won't
work with lines of arbitrary length. The program assumes that a blank line
in the source should be taken as the start of a new paragraph. In all other
circumstances, words in the source are put into the output so as to fill each
line as fully as possible. This means that some layout may be jumbled. This
is not a defect in txt2jpg -- it is the inevitable result of
changing the page size in a plain text document. To some degree you can
avoid this problem using preformatted mode, but at the expense of generating
a larger number of pages.
4. Most handheld devices that can display images have a relatively low limit on the number of files that can be placed into one directory. On the Archos devices, it's 999 files. On my AV500, whose screen size is 480x272, if I use the smallest readable font size (12), most ordinary novels will produce fewer than 999 JPEG files. If you increase the font size, you may have to limit the size of the text document you process, because inevitably you won't get as much text on each page, and the number of files may go over the 999-file limit.
5. The program only outputs monochrome images. You can't highlight text, for example, in colour. The reason for this is that monochrome JPEG files are about half the size of colour JPEG files that just happen to contain mostly monochrome data.
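To make the layout rule in note 3 concrete, here is a minimal sketch of that kind of reflow: a blank line starts a new paragraph, every other line break is treated as ordinary whitespace, and words are packed greedily onto each output line. It is not the txt2jpg source. The names are hypothetical, and the width is measured in characters for simplicity, whereas a real renderer would measure words with font metrics.

```java
import java.util.ArrayList;
import java.util.List;

public class ReflowSketch {
    /**
     * Reflows plain text to lines of at most maxChars characters.
     * Paragraphs are separated in the output by one empty string.
     * A single word longer than maxChars is left on its own overlong line.
     */
    public static List<String> reflow(String text, int maxChars) {
        List<String> out = new ArrayList<>();
        // One or more blank lines (possibly containing spaces) end a paragraph.
        for (String paragraph : text.split("\\n\\s*\\n")) {
            List<String> lines = new ArrayList<>();
            StringBuilder line = new StringBuilder();
            // Incidental line breaks inside a paragraph count as plain whitespace.
            for (String word : paragraph.trim().split("\\s+")) {
                if (word.isEmpty()) {
                    continue;
                }
                if (line.length() > 0 && line.length() + 1 + word.length() > maxChars) {
                    lines.add(line.toString()); // current line is full
                    line.setLength(0);
                }
                if (line.length() > 0) {
                    line.append(' ');
                }
                line.append(word);
            }
            if (line.length() > 0) {
                lines.add(line.toString());
            }
            if (!lines.isEmpty()) {
                if (!out.isEmpty()) {
                    out.add(""); // paragraph break
                }
                out.addAll(lines);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        for (String line : reflow("The quick\nbrown fox\n\njumps", 10)) {
            System.out.println("[" + line + "]");
        }
    }
}
```

With a width of 10, the two source lines of the first paragraph come out as "The quick" and "brown fox", followed by an empty line and then "jumps". Here the source breaks happen to coincide with the output breaks; with a narrower width the words would be redistributed, which is exactly how carefully aligned source layouts get jumbled.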