Removing unwanted markup
Latest revision as of 15:18, 8 June 2020
Not all of the markup in an HTML page is needed, so the unwanted elements have to be removed first. The following is based on the markup used in Informl.
# Function: scrape_page.rb
require 'hpricot'

def scrape_the_page(pagePath,oFile,hFile)
  # CSS selectors for the elements that should be dropped from the page
  items_to_remove = [
    "#menus",          #menus notice
    "div.markedup",
    "div.navigation",
    "head",            #table of contents
    "hr"
  ]

  doc=Hpricot(open(pagePath))
  @article = (doc/"#container").each do |content|
    #remove unnecessary content and edit links
    items_to_remove.each { |x| (content/x).remove }
  end
  # the rest of the function (which uses oFile and hFile) is not shown here
end
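
Hpricot is no longer maintained and may not install on a current Ruby. As a rough sketch of the same idea (select a container, then delete every descendant that matches a list of unwanted selectors), the following uses REXML instead. This is not the code used for Informl: the sample page, the constant names, the `strip_unwanted` helper, and the XPath translations of the CSS selectors are all assumptions made for illustration, and REXML only accepts well-formed XHTML.

```ruby
require 'rexml/document'

# Hypothetical sample page; its ids and classes mirror the selectors
# in the items_to_remove list of the original function.
SAMPLE_PAGE = <<~HTML
  <html>
    <body>
      <div id="container">
        <div id="menus">menu notice</div>
        <div class="navigation">prev | next</div>
        <h1>Title</h1>
        <hr/>
        <p>Body text.</p>
      </div>
    </body>
  </html>
HTML

# Assumed XPath equivalents of the CSS selectors, relative to the container.
XPATHS_TO_REMOVE = [
  ".//*[@id='menus']",
  ".//div[@class='markedup']",
  ".//div[@class='navigation']",
  ".//head",
  ".//hr"
].freeze

# Returns the element with id "container" after removing every unwanted
# descendant, or nil if the page has no such element.
def strip_unwanted(xhtml)
  doc = REXML::Document.new(xhtml)
  container = REXML::XPath.first(doc, "//*[@id='container']")
  return nil if container.nil?

  XPATHS_TO_REMOVE.each do |path|
    # Collect the matches first, then detach each one from its parent.
    REXML::XPath.match(container, path).each(&:remove)
  end
  container
end

cleaned = strip_unwanted(SAMPLE_PAGE)
puts cleaned
```

Real-world HTML is often not well-formed, so for actual pages an HTML-tolerant parser would be a better fit than REXML; the removal loop itself stays the same.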