Difference between revisions of "HTML and ConTeXt"

Revision as of 07:58, 16 July 2007


The Problem

Several issues arise when a TeX document is to be constructed collaboratively.

Your partner(s) may not be used to TeX (ConTeXt) markup.
Marking up editorial comments is difficult (in WYSIWYG software such as Word, by contrast, edits can be highlighted by switching on "Track changes").
One needs to have the same document in different formats (PDF, HTML, DOC, etc.).

A Solution

There are several different ways to address each of these issues. For instance:

Have the partner construct the document in WYSIWYG software, import it into OpenOffice, and then export it to TeX.

Have the partner edit the source code.

Use third-party software to convert between formats (PDF to Word, for instance).

I would like to offer a different path towards solving these issues, especially with the aim of involving non-TeX folks in the process. Essentially the approach is to use a web-based Wiki to construct parts of the document and then use Ruby to translate HTML syntax to ConTeXt syntax. The advantages I can think of are:

Wiki syntax is easy to learn.
People can join the effort from anywhere in the world provided they have an internet connection.
Editing the document is relatively easy.
Folks can visualize the structure of their document in the HTML version.
Folks who want to convert the document into Word just need to import the HTML version.
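The translation step described above can be sketched in a few lines of Ruby. This is a minimal sketch under stated assumptions, not the filter used for the actual document: the function name `html_to_context` and the tag-to-command mapping are hypothetical, and regular expressions only cope with simple, non-nested markup without HTML entities.

```ruby
# Hypothetical helper: maps a handful of HTML tags to ConTeXt commands.
# Assumes h1 marks a section and h2 a subsection; adjust to your wiki's output.
def html_to_context(html)
  text = html.dup
  # Block form of gsub! avoids backslash handling in replacement strings.
  text.gsub!(%r{<h1(?:\s[^>]*)?>(.*?)</h1>}m) { "\\section{#{$1.strip}}\n" }
  text.gsub!(%r{<h2(?:\s[^>]*)?>(.*?)</h2>}m) { "\\subsection{#{$1.strip}}\n" }
  text.gsub!(%r{<(?:b|strong)>(.*?)</(?:b|strong)>}m) { "{\\bf #{$1}}" }
  text.gsub!(%r{<(?:i|em)>(.*?)</(?:i|em)>}m) { "{\\em #{$1}}" }
  # Paragraphs become plain text followed by a blank line.
  text.gsub!(%r{<p(?:\s[^>]*)?>(.*?)</p>}m) { "#{$1.strip}\n\n" }
  text
end
```

Calling `html_to_context("<h1>Intro</h1><p>Some <b>bold</b> text.</p>")` yields a `\section{Intro}` line followed by the paragraph text with `{\bf bold}` in place of the bold tag. For real wiki pages an HTML parser is a more robust choice than regular expressions.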

Now, without much ado, to the details ...

Using Wiki software for collaborative document preparation

Any wiki software should work. In this document I use Informl, a new Ruby on Rails based wiki, to illustrate the key ideas. One attractive aspect of Informl for my colleagues who are used to WYSIWYG software is that edit mode shows a real-time preview in a twin window.

@@ Line 6: / Line 6: @@
 * Your partner(s) may not be used to TeX (ConTeXt) markup.
-* Marking up editorial comments is difficult (for instance in WYSIWIG software
+* Marking up editorial comments is difficult (for instance in WYSIWIG softwaresuch as WORD, edits can be highlighted by switching on "Track changes")
-such as WORD, edits can be highlighted by switching on "Track changes")
 * One needs to have the same document in different formats (PDF, HTML, DOC etc)
@@ Line 14: / Line 13: @@
 There are several different ways to address each of these issues. For instance
-* Have the partner construct the document in WYSIWIG software and have them
+* Have the partner construct the document in WYSIWIG software and have them import it to OpenOffice and then export it to TeX.
-import it to OpenOffice and then export it to TeX.
 * Have the partner edit the source code
@@ Line 41: / Line 39: @@
 == Translating HTML into ConTeXt using Ruby ==
-The next step is to retrieve the HTML pages created in the step above. Here I have used the ruby library 'open-uri' to
+* [[Navigating to HTML page]]
-retrieve the web-page and another libray 'hpricot' to edit these pages and translate html markup into ConTeXt markup.
+* [[Setting up ConTeXt document]]
+* [[Click and navigate to chapters and sections]]
-=== Step 1. Open the remote page ===
+* [[HTML to ConTeXt]]
-<pre>
+** [[Removing unwanted markup]]
+** [[Simple replacements]]
-#scan_page.rb = Retrieves the html page of interest from the server,
+** [[Translating Figure markup]]
-#        navigates to links within the main page and construct a
+** [[Translating Table markup]]
-#        context document
+** [[The rest of the filters]]
-#!/usr/bin/ruby
-require 'rubygems'
-require 'open-uri'        # the open-uri library
-require 'hpricot'         # the hpricot library
-require 'scrape_page'     # user-defined function to filter html  into ConTeXt
-# scans the home page and lists
-# all the directories and subdirectories
-doc=Hpricot(open("http://ipa.dd.re.ss/AnnRep07"))
-</pre>
-=== Step 2. Setting up the ConTeXt document ===
-<pre>
-mainfil="annrep.tex"   # open a file to output ConTeXt document
-`rm #{mainfil}`
-fil=File.new(mainfil,"a")
-# Add some opening directives and include style files
-fil.write "\\input context_styles \n"  # this file contains the styling options for my Context document
-fil.write "\\starttext \n"
-fil.write "\\leftaligned{\\BigFontOne Contents} \n"
-fil.write "\\vfill \n"
-fil.write "{ \\switchtobodyfont[10pt] "
-fil.write "\\startcolumns[n=2,balance=no,rule=off,option=background,frame=off,background=color,backgroundcolor=blue:1] \n"
-fil.write "\\placecontent \n"
-fil.write "\\stopcolumns \n"
-fil.write "}"
-</pre>
-=== Step 3.  Clicking chapters and section links ===
-In this example, we created new pages for chapters and sections so that each part of the document could
-be authored by a different person.  In Informl new pages are indicated by the CSS class name "existingWikiWord"
-as shown in the following figure.
-[[Image:Wiki_prev2.jpg]].
-<pre>
-<p>
-  <a class="existingWikiWord"
-      href="http://localhost:3010/AnnRep07/pages/APCC+Research+and+Development+Projects">
-      APCC Research and Development Projects
-  </a>
-</p>
-</pre>
-Knowing this, I have used the following 'hpricot' code to click on chapter and section links to retrieve
-their contents.
-<pre>
-chapters= (doc/"p/a.existingWikiWord")
-# we need to navigate one more level into the web page
-# let us discover the links for that
-chapters.each do |ch|
-  chap_link = ch.attributes['href']
-  # using inner_html we can create subdirectories
-  chap_name = ch.inner_html.gsub(/\s*/,"")
-  chap_name_org = ch.inner_html
-  # We create chapter directories
-  system("mkdir -p #{chap_name}")
-  fil.write "\\input #{chap_name}  \n"
-  chapFil="#{chap_name}.tex"
-  `rm #{chapFil}`
-  cFil=File.new(chapFil,"a")
-  cFil.write "\\chapter{ #{chap_name_org} } \n"
-</pre>
-<pre>
-  # We navigate to sections now
-  doc2=Hpricot(open(chap_link))
-  sections= (doc2/"p/a.existingWikiWord")
-  sections.each do |sc|
-    sec_link = sc.attributes['href']
-    sec_name = sc.inner_html.gsub(/\s*/,"")
-    secFil="#{chap_name}/#{sec_name}.tex"
-    `rm #{secFil}`
-    sFil=File.new(secFil,"a")
-    sechFil="#{chap_name}/#{sec_name}.html"
-    `rm #{sechFil}`
-    shFil=File.new(sechFil,"a")
-</pre>
-After navigating to sections (h1 elements in HTML) retrieve their contents
-and send it to the ruby function "scrape_page.rb" for filtering.
-<pre>
-    #  scrape_the_page(sec_link,"#{chap_name}/#{sec_name}")
-    scrape_the_page(sec_link,sFil,shFil)
-    cFil.write "\\input #{chap_name}/#{sec_name} \n"
-  end
-end
-fil.write "\\stoptext \n"
-</pre>
-=== Filtering HTML into ConTeXt ===
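
The file handling in the removed Step 2 and Step 3 code shells out to `rm` and `mkdir -p` and leaves its files open. A minimal sketch of the same bookkeeping with Ruby's standard library follows. The helper names `file_stem` and `write_chapter` are hypothetical and do not appear in the original script, and the sketch takes section titles as a plain array instead of fetching them from the wiki.

```ruby
require 'fileutils'

# Hypothetical helper: strips whitespace from a link title to get a file name,
# as the original script does with gsub.
def file_stem(title)
  title.gsub(/\s+/, "")
end

# Hypothetical helper: creates the chapter directory and chapter file under root.
# Opening with "w" truncates any earlier copy, so no `rm` call is needed,
# and the block form closes the file automatically.
def write_chapter(root, title, section_titles)
  stem = file_stem(title)
  FileUtils.mkdir_p(File.join(root, stem))
  path = File.join(root, "#{stem}.tex")
  File.open(path, "w") do |chap|
    chap.write "\\chapter{#{title}}\n"
    section_titles.each do |sec|
      chap.write "\\input #{stem}/#{file_stem(sec)}\n"
    end
  end
  path
end
```

Because no shell is involved, chapter titles containing quotes or other shell metacharacters cannot alter the command that gets run.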
