Removing unwanted markup


Latest revision as of 15:18, 8 June 2020

< HTML_to_ConTeXt

Not all of the markup in an HTML page is needed, so the unwanted elements have to be removed first. The following is based on the markup used in Informl.


# Function: scrape_page.rb
require 'hpricot'

def scrape_the_page(pagePath,oFile,hFile)
  # CSS selectors for the elements to strip from the page
  items_to_remove = [
    "#menus",        #menus notice
    "div.markedup",
    "div.navigation",
    "head",          #table of contents
    "hr"
  ]

  doc=Hpricot(open(pagePath))
  @article = (doc/"#container").each do |content|
    #remove unnecessary content and edit links
    items_to_remove.each { |x| (content/x).remove }
  end
end
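
Hpricot is no longer maintained, so the same idea may be easier to try with a parser that ships with Ruby. The following is only a minimal sketch and not part of the original script: it uses REXML, assumes the input is well-formed XHTML, and translates the selectors above into XPath by hand. The names XPATHS_TO_REMOVE and strip_unwanted are made up for this illustration.

```ruby
require 'rexml/document'

# Hand-written XPath equivalents of the CSS selectors listed above.
# Simplification: @class='...' only matches an exact class attribute,
# whereas the CSS selector div.markedup also matches multi-class elements.
XPATHS_TO_REMOVE = [
  ".//*[@id='menus']",
  ".//div[@class='markedup']",
  ".//div[@class='navigation']",
  ".//head",
  ".//hr"
]

# Hypothetical helper: parses well-formed XHTML, finds the element with
# id="container" and removes every unwanted descendant from it.
# Returns the cleaned container element, or nil if there is none.
def strip_unwanted(xhtml)
  doc = REXML::Document.new(xhtml)
  container = REXML::XPath.first(doc, "//*[@id='container']")
  return nil if container.nil?

  XPATHS_TO_REMOVE.each do |xpath|
    REXML::XPath.match(container, xpath).each { |node| node.remove }
  end
  container
end

sample = <<~XHTML
  <html><body><div id="container"><div id="menus">menu</div><div class="navigation">nav</div><p>Keep me</p><hr/></div></body></html>
XHTML

puts strip_unwanted(sample)
```

Real-world HTML is rarely well-formed XML, so for actual pages a tolerant HTML parser would be needed in place of REXML; the removal loop itself stays the same.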