Removing unwanted markup

From Wiki
Revision as of 08:59, 16 July 2007


Not all of the markup in an HTML page is needed when converting it to ConTeXt, so we need to remove the unwanted parts first. The following is based on the markup used by Informl.


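# Function: scrape_page.rb
require 'hpricot'
require 'open-uri'   # provides URI.open, which reads URLs as well as local files

def scrape_the_page(pagePath, oFile, hFile)
  # oFile and hFile are not used in this excerpt.
  items_to_remove = [
    "#menus",          # the element with id="menus"
    "div.markedup",    # <div class="markedup"> blocks
    "div.navigation",  # navigation <div>s
    "head",            # the document <head>
    "hr"               # horizontal rules
  ]

  doc = Hpricot(URI.open(pagePath))
  @article = (doc/"#container").each do |content|
    # remove unnecessary content and edit links
    items_to_remove.each { |x| (content/x).remove }
  end
end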
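If the Hpricot gem is not available, the same idea can be sketched with Ruby's own REXML library. This is a swapped-in technique, not the method used above: REXML is an XML parser with no CSS selector support, so the sketch assumes the page is well-formed XHTML and translates each selector in `items_to_remove` into a hand-written XPath expression. The names `strip_unwanted` and `UNWANTED_XPATHS` and the sample page are hypothetical.

```ruby
require 'rexml/document'

# Hypothetical hand-translated XPath equivalents of the CSS selectors
# in items_to_remove (REXML cannot evaluate CSS selectors).
UNWANTED_XPATHS = [
  ".//*[@id='menus']",           # "#menus"
  ".//div[@class='markedup']",   # "div.markedup" (exact class value only)
  ".//div[@class='navigation']", # "div.navigation" (exact class value only)
  ".//head",                     # "head"
  ".//hr"                        # "hr"
].freeze

# Parses well-formed XHTML, deletes the unwanted elements found inside
# the element with id="container", and returns that element serialized
# as a string (or nil if the page has no such element).
def strip_unwanted(xhtml)
  doc = REXML::Document.new(xhtml)
  container = REXML::XPath.first(doc, "//*[@id='container']")
  return nil unless container

  UNWANTED_XPATHS.each do |xpath|
    # match returns an Array, so detaching nodes does not disturb the query.
    REXML::XPath.match(container, xpath).each(&:remove)
  end
  container.to_s
end

page = <<~XHTML
  <html><body>
    <div id="container">
      <div id="menus">Menu links</div>
      <div class="navigation">Previous | Next</div>
      <p>Article text</p>
      <hr/>
      <div class="markedup">Marked-up source</div>
    </div>
  </body></html>
XHTML

puts strip_unwanted(page)
```

Unlike CSS, `@class='markedup'` only matches when the class attribute is exactly `markedup`; an element with several classes would need a `contains(...)` test instead.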