Removing unwanted markup

Latest revision as of 15:18, 8 June 2020

< HTML_to_ConTeXt

Not all of the markup in an HTML page is needed, so the unwanted elements have to be removed first. The following is based on the markup used in Informl.


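# Function: scrape_page.rb

require 'hpricot'

def scrape_the_page(pagePath, oFile, hFile)
  # CSS selectors of the elements to strip from the page
  items_to_remove = [
    "#menus",          # menus notice
    "div.markedup",
    "div.navigation",
    "head",            # table of contents
    "hr"
  ]

  # Parse the page and work only on the element with id "container"
  doc = Hpricot(open(pagePath))
  @article = (doc/"#container").each do |content|
    # remove unnecessary content and edit links
    items_to_remove.each { |x| (content/x).remove }
  end
  # oFile and hFile are not used in the part of the function shown here
end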
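
For readers who want to try the idea without Hpricot, the following is a minimal sketch of the same approach using only REXML from the Ruby distribution. It is not the code used for Informl. The names strip_unwanted, selector_matches?, ITEMS_TO_REMOVE and SAMPLE are made up for this illustration, and only the three selector forms that appear in the list above ("#id", "tag.class" and a bare tag name) are handled. REXML accepts only well-formed XML, so the sketch assumes XHTML-like input.

```ruby
require 'rexml/document'

# Same selectors as in the listing above (illustrative constant name).
ITEMS_TO_REMOVE = ["#menus", "div.markedup", "div.navigation", "head", "hr"].freeze

# Hypothetical helper: supports only "#id", "tag.class" and "tag".
def selector_matches?(element, selector)
  case selector
  when /\A#(.+)\z/
    element.attributes['id'] == $1
  when /\A(\w+)\.(.+)\z/
    element.name == $1 && element.attributes['class'].to_s.split.include?($2)
  else
    element.name == selector
  end
end

# Returns the element with id "container" after the unwanted
# descendants have been removed, or nil if there is no such element.
def strip_unwanted(xhtml, selectors = ITEMS_TO_REMOVE)
  doc = REXML::Document.new(xhtml)
  container = REXML::XPath.first(doc, "//*[@id='container']")
  return nil if container.nil?

  # Collect first, remove afterwards, so the tree is not
  # modified while it is being traversed.
  unwanted = []
  container.each_recursive do |el|
    unwanted << el if selectors.any? { |s| selector_matches?(el, s) }
  end
  unwanted.each(&:remove)
  container
end

# Made-up sample page for the demonstration.
SAMPLE = <<~XHTML
  <html><head><title>Sample</title></head><body>
  <div id="container">
    <div id="menus">menu text</div>
    <div class="navigation top">nav text</div>
    <hr/>
    <p>Keep me</p>
  </div>
  </body></html>
XHTML

puts strip_unwanted(SAMPLE) if __FILE__ == $PROGRAM_NAME
```

Real-world HTML is often not well-formed, in which case a tolerant parser is needed instead of REXML; the selector list and the remove-inside-the-container loop stay the same.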