Copyright © 1996-2007 Jari Aalto
License: This material may be distributed only subject to the terms and conditions set forth in GNU General Public License v2 or later; or, at your option, distributed under the terms of GNU Free Documentation License version 1.2 or later (GNU FDL).
This document describes the formatting rules of the t2html.pl Perl program.
Writing text documents is different from writing messages to Usenet or to your fellow workers. Several tools already exist to convert email messages into HTML, like MHonArc, Email hyper archiver, but for regular text documents, like memos, FAQs, help pages and other papers, there was no suitable HTML converter a couple of years back. The author wanted a simple HTML tool which would read pure plain text documents, like guides, tips pages, documentation, bookmark pages etc. and convert them into HTML. Here you will find the specification of how to format your text documents for the t2html.pl Perl script, a text to HTML converter.
A few arguments for why plain text is the best source document format:
1.2.1 Overview of features:
1.2.2 HTML conversion
1.2.3 HTML 4.01
1.2.4 Link check for the text file
1.2.5 Splitting the text file to pieces
Before going further, you may want to see the results. The two links below show what TF looks like and the HTML generated from it. To see the generated HTML code, select View->Source in your browser.
/t2html.txt -- plain text (TF)
/t2html.html -- generated page (this page)
The TF specification can be found in the t2html.pl manual page:
/t2html-perl.html
The command used to generate these framed html pages was:
    % t2html.pl \
        --author "Jari Aalto" \
        --title "Perl Text to html converter" \
        --html-body LANG=en \
        --button-previous ssjaaa.html \
        --meta-keywords "text2html perl conversion" \
        --meta-description "Perl Text to html converter" \
        --html-frame \
        --Out \
        t2html.txt
You need nothing but a text editor that displays the current column number, or one that can be configured to advance TAB by 4 spaces. That's it.
See project http://freshmeat.net/projects/emacs-tiny-tools
If the word Emacs means anything to you, you can use an additional Emacs minor mode (package tinytf.el) which makes writing much easier: it helps with formatting paragraphs, filling bullets, numbering headings and keeping the TOC up to date. The description of the Emacs package is found on this page and in the package itself.
1.6.1 Documentation tools in programming languages
Perl is an exception among programming languages, because it includes a built-in documentation syntax called POD (Plain Old Documentation), with which you can embed documentation right into the program source. Deriving the documentation from Perl programs is a straightforward job.
Another well known language (invented long after Perl) is Java, which calls the embedded documentation javadoc.
For all other languages, the de facto method would be:
1.6.2 Other programming languages
But it is possible to embed documentation directly into the code of any programming language. Another small Perl utility, ripdoc.pl, extracts the documentation, provided it was written in TF format. The documentation is put at the beginning of the file and updated there.
ripdoc.pl extracts the documentation which follows TF guidelines. The idea is that you can generate HTML documents from the embedded 'TF pod'. The conversion goes like this:
    % ripdoc.pl code.sh | t2html.pl > code.html
    % ripdoc.pl code.el | t2html.pl > code.html
    % ripdoc.pl code.cc | t2html.pl > code.html
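Note that the three commands above all write to code.html, so running them in one directory keeps only the last result. A minimal sketch of a batch wrapper follows. The function name docs_to_html and the RIPDOC and T2HTML variables are hypothetical, introduced only for this illustration; the pipeline itself is the one shown above.

```shell
# Hypothetical wrapper, not part of t2html.pl or ripdoc.pl.
# RIPDOC and T2HTML default to the command names used in this document.
RIPDOC=${RIPDOC:-ripdoc.pl}
T2HTML=${T2HTML:-t2html.pl}

# Convert each source file to an HTML file named after the full file
# name (code.sh -> code.sh.html), so that files sharing a base name
# do not overwrite each other's output.
docs_to_html() {
    for src in "$@"; do
        "$RIPDOC" "$src" | "$T2HTML" > "$src.html" || return 1
    done
}
```

Usage would be something like `docs_to_html code.sh code.el code.cc`, which leaves one HTML file per source file.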
Suitable for awk, shell, sh, ksh, C++, Java, Lisp, Python, Tcl and similar programming languages. The only criterion is that the language supports a single comment starter and that the documentation has been written using it. Languages that rely on a comment-start and comment-end pair, like C with /* and */, are not suitable for ripdoc.pl. Visit http://cpan.perl.org/modules/by-authors/id/J/JA/JARIAALTO/ to get the program.
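To illustrate why a single comment starter is enough, here is a rough sketch of the general idea: take the leading block of comment lines and strip the starter from each line. This is not ripdoc.pl's actual algorithm, which is not described here; the function name extract_leading_doc and its behavior (stop at the first non-comment line, drop one separating space) are assumptions made for this example.

```shell
# Hypothetical helper, not ripdoc.pl itself: print the leading comment
# block of FILE with the comment starter removed.
# Usage: extract_leading_doc FILE [COMMENT_STARTER]   (default starter: #)
extract_leading_doc() {
    file=$1
    starter=${2:-#}
    awk -v s="$starter" '
        NR == 1 && /^#!/ { next }          # skip a shebang line
        index($0, s) == 1 {                # line begins with the starter
            line = substr($0, length(s) + 1)
            sub(/^ /, "", line)            # drop one separating space
            print line
            next
        }
        { exit }                           # first non-comment line ends the block
    ' "$file"
}
```

For a Lisp file the call might be `extract_leading_doc code.el ";;"`. A language with only paired delimiters such as /* and */ has no per-line prefix to strip, which is why this line-by-line approach does not carry over to C.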
Live examples
See also http://www.imatix.com/html/gslgen/index.htm
GSLgen is a general-purpose file generator. It generates source code, data, or other files from an XML file and a schema file. The XML file defines a particular set of data. The schema file tells GSLgen what to do with that data.
Other Utilities
DocBook - SGML
http://www.oreilly.com/catalog/docbook => online book
Texi2html - Perl script
http://www.mathematik.uni-kl.de/~obachman/Texi2html
HTML tidy - remove extra markup
http://www.w3.org/People/Raggett/tidy/
Latex like Perl formatting
http://www.physics.purdue.edu/~hinson/ftl/
Hyperlatex
http://www.cs.ust.hk/~otfried/Hyperlatex/
Hyperlatex is a package that allows you to prepare documents in HTML, and, at the same time, to produce a neatly printed document from your input. Unlike some other systems that you may have seen, Hyperlatex is not a general LaTeX-to-HTML converter. In my eyes, conversion is not a solution to HTML authoring. A well written HTML document must differ from a printed copy in a number of rather subtle ways. I doubt that these differences can be recognized mechanically, and I believe that converted LaTeX can never be as readable as a document written in HTML. The basic idea of Hyperlatex is to make it possible to write a document that will look like a flawless LaTeX document when printed and like a handwritten HTML document when viewed with an HTML browser.
html2texi
http://www.cs.washington.edu/homes/mernst/software/#html2texi
html2texi converts HTML documentation trees into Texinfo format.
Texinfo format can be easily converted to Info format (for browsing
in Emacs or the stand alone Info browser), to a printed manual, or
to HTML. Thus, html2texi.pl permits conversion of HTML files to
Info format, and secondarily enables producing printed versions of
Web page hierarchies. Unlike HTML, Info format is searchable. Since
Info is integrated into Emacs, one can read documentation without
starting a separate Web browser. Additionally, Info browsers
(including Emacs) contain convenient features missing from Web
browsers, such as easy index lookup and mouse-free browsing.
RTF in PC
http://www.kfa-juelich.de/isr/1/texconv/textopc.html
catdoc
http://www.ice.ru/~vitus/works/
http://packages.debian.org/unstable/text/catdoc.html
vitus@agropc.msk.su Victor B. Wagner
Catdoc is simple: one C source file that compiles on any system (DOS, Unix). Feed an MS Word file to it and it outputs 7-bit text.
word2x
ftp://ftp.dante.de:/pub/tex/tools/word2x/
My comment:
Laola
http://wwwwbs.cs.tu-berlin.de/~schwartz/pmh/
Laola (Perl) does a respectable job of taking MS Word files to text. ...LAOLA is giving access to the raw document streams of any program using "structured storage" technology to save its documents. ELSER is dealing especially with these streams as they are present in Word 6 and Word 7 documents.
MSWordView
http://www.csn.ul.ie/~caolan/docs/MSWordView.html
...MSWordView is a program that can understand the microsofts word
8 binary file format (office97), it currently converts word into
html, which can then be read with a browser.
This file has been automatically generated from plain text file
with Perl script t2html v2007.0918.1337
Last updated: 2007-09-18 16:37