Difference between revisions of "TEI xml"

From Wiki
(Beginning page on TEI xml. Work in progress)
 
m (further work)
Line 1: Line 1:
TEI (Text Encoding Initiative) is "a consortium which collectively develops
+
== General ==
and maintains a standard for the representation of texts in digital form,"
+
 
to quote [http://www.tei-c.org/index.xml|their own website]. They have developed a series of guidelines for
+
TEI (Text Encoding Initiative) is "a consortium which collectively develops and maintains a standard for the representation of texts in digital form," to quote [http://www.tei-c.org/index.xml their own website]. They have developed a series of guidelines for editing texts in a digital form. In their latest form (which is called P 5), these guidelines weigh in at a hefty 1350 pages (OK, that's counting the bibliography and the index too; there are only 1290 pages of real text). These describe an xml format which is suitable for editing texts. The TEI guidelines have the advantage of being very well documented. There are a number of free resources available that should help everyone who is interested in getting started (one extremely helpful website with lots of tutorials, examples, and tests is [http://tbe.kantl.be/TBE TEI by example]). They are not (and do not aspire to be) an absolute standard that everyone has to follow, but many academic projects use these guidelines, and they should be a pretty good way to make sure that your electronic edition of a text will be useful in the future.
editing texts in a digital form. In their latest form (which is called P
 
5), these guidelines weigh in at a hefty 1350 pages (OK, that's counting
 
the bibliography and the index too; there are only 1290 pages of real
 
text). These describe an xml format which is suitable for editing
 
texts. The TEI guidelines have the advantage of being very well
 
documented. There are a number of free resources available that should help
 
everyone who is interested in getting started (one extremely helpful
 
website is ). They are not (and do not aspire to be) an absolute standard
 
that everyone has to follow, but many academic projects use these
 
guidelines, and they should be a pretty good way to make sure that your
 
electronic edition of a text will be useful in the future.
 
  
 
Since editing texts is something which quite a few users of ConTeXt are involved in, it makes sense to think about ways in which xml documents which follow the TEI guideline can be typeset with ConTeXt. We would invite users to keep a few caveats in mind:  
 
Since editing texts is something which quite a few users of ConTeXt are involved in, it makes sense to think about ways in which xml documents which follow the TEI guideline can be typeset with ConTeXt. We would invite users to keep a few caveats in mind:  
  
# The TEI guidelines are very detailed because they try to cater to a large   number of needs. Most users will only need a small subset of the tags and attributes which the guidelines offer. It would not make sense to try and provide a monolithic solution that defines all TEI tags; instead, localized ConTeXt style sheets are necessary which will define a subset which is relevant for a number of texts with similar features.
+
# The TEI guidelines are very detailed because they try to cater to a large number of needs. Most users will only need a small subset of the tags and attributes which the guidelines offer (in fact, TEI is aware of this and have a slimmed down version of their guidelines which is called [http://www.tei-c.org/Guidelines/Customization/Lite  TEI Lite]. This is a very good starting place to familiarize yourself with TEI). It would not make sense to try and provide a monolithic solution that defines all TEI tags; instead, localized ConTeXt style sheets are necessary which will define a subset which is relevant for a number of texts with similar features.
 
# Even with this huge number of tags, TEI does not expect to be sufficient for every text. Users are encouraged to develop their own styles; again, this necessitates special ConTeXt style sheets to process such adaptations.
 
# Even with this huge number of tags, TEI does not expect to be sufficient for every text. Users are encouraged to develop their own styles; again, this necessitates special ConTeXt style sheets to process such adaptations.
 
# Encoding and typesetting texts in xml is an ongoing process. As you go forward in your edition, you realize that you need more tags, that you need to distinguish more special cases, that you want to add more information to your edition. This means that you will have to go back and forth between your xml file and the ConTeXt style and adapt both to your needs.  
 
# Encoding and typesetting texts in xml is an ongoing process. As you go forward in your edition, you realize that you need more tags, that you need to distinguish more special cases, that you want to add more information to your edition. This means that you will have to go back and forth between your xml file and the ConTeXt style and adapt both to your needs.  
  
All of which means that the following paragraphs are just the first step in an ongoing attempt. I (Thomas) have written down a setup for a text that I am editing (for those who are interested: the Lives of the Sophists by Philostratus). I fully expect this to be a community effort: as others use TEI xml, they will discover new ways of handling things, will want to add features or add examples for other sorts of texts. My example is meant to start the discussion. Since those who edit texts usually have a background in the humanities, not in programming, I have added lengthy comments which will explain every step.
+
All of which means that the following paragraphs are just the first step in an ongoing attempt. I (Thomas) have written down a setup for a text that I am editing (for those who are interested: [http://www.livius.org/phi-php/philostratus/philostratus.htm#VS  the ''Lives of the Sophists'' by Philostratus]). I fully expect this to be a community effort: as others use TEI xml, they will discover new ways of handling things, will want to add features or add examples for other sorts of texts. My example is meant to start the discussion. Since those who edit texts usually have a background in the humanities, not in programming, I have added lengthy comments which will explain every step.
 +
 
 +
== Our xml file ==
 +
 
 +
Philostratus's text is in ancient Greek, but since the text itself doesn't matter much when we talk about structure and typesetting xml, I have replaced it here with a simple lorem ipsum text that is easier to display. So here's what the first paragraphs of the xml file look like:
  
--[[User:Thomas|Thomas]] 08:33, 10 October 2010 (UTC)
+
--[[User:Thomas|Thomas]] 09:03, 10 October 2010 (UTC)

Revision as of 09:05, 10 October 2010

General

TEI (Text Encoding Initiative) is "a consortium which collectively develops and maintains a standard for the representation of texts in digital form," to quote their own website. The consortium has developed a series of guidelines for editing texts in digital form. In their latest version (called P5), these guidelines weigh in at a hefty 1350 pages (OK, that's counting the bibliography and the index too; there are only 1290 pages of real text). They describe an XML format which is suitable for editing texts. The TEI guidelines have the advantage of being very well documented, and there are a number of free resources that should help anyone who is interested in getting started (one extremely helpful website with lots of tutorials, examples, and tests is TEI by Example). The guidelines are not (and do not aspire to be) an absolute standard that everyone has to follow, but many academic projects use them, and following them should be a pretty good way to make sure that your electronic edition of a text will still be useful in the future.

Since quite a few users of ConTeXt are involved in editing texts, it makes sense to think about ways in which XML documents that follow the TEI guidelines can be typeset with ConTeXt. We would invite users to keep a few caveats in mind:

  1. The TEI guidelines are very detailed because they try to cater to a large number of needs. Most users will only need a small subset of the tags and attributes which the guidelines offer (in fact, the TEI consortium is aware of this and offers a slimmed-down version of its guidelines called TEI Lite, which is a very good starting place to familiarize yourself with TEI). It would not make sense to try to provide a monolithic solution that defines all TEI tags; what is needed instead are localized ConTeXt style sheets, each defining a subset that is relevant for a group of texts with similar features.
  2. Even with this huge number of tags, TEI does not expect to be sufficient for every text. Users are encouraged to develop their own adaptations of the guidelines; again, this necessitates special ConTeXt style sheets to process such adaptations.
  3. Encoding and typesetting texts in XML is an ongoing process. As your edition progresses, you will realize that you need more tags, that you need to distinguish more special cases, or that you want to add more information to your edition. This means that you will have to go back and forth between your XML file and the ConTeXt style and adapt both to your needs.

All of which means that the following paragraphs are just the first step in an ongoing attempt. I (Thomas) have written down a setup for a text that I am editing (for those who are interested: the Lives of the Sophists by Philostratus). I fully expect this to be a community effort: as others use TEI XML, they will discover new ways of handling things and will want to add features or examples for other sorts of texts. My example is meant to start the discussion. Since those who edit texts usually have a background in the humanities, not in programming, I have added lengthy comments that explain every step.

Our XML file

Philostratus's text is in ancient Greek, but since the text itself doesn't matter much when we talk about the structure and typesetting of XML, I have replaced it here with a simple lorem ipsum text that is easier to display. So here's what the first paragraphs of the XML file look like:
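The file itself is not included in this revision of the page. Until it is, the following is only a rough sketch of what the opening of a minimal TEI P5 document could look like; it is not the actual file. The element names (TEI, teiHeader, fileDesc, text, body, div, head, p) and the namespace come from the TEI guidelines, but the division into a "book" with numbered paragraphs, the header wording, and the placeholder text are assumptions made purely for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch only: structure and attribute values are assumptions,
     not the file used for the edition described on this page. -->
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Every TEI document starts with a header that describes the file. -->
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Lives of the Sophists (placeholder text)</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished working draft.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Lorem ipsum filler standing in for the Greek text.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <!-- The text itself: one division with a heading and numbered paragraphs. -->
  <text>
    <body>
      <div type="book" n="1">
        <head>Lorem ipsum</head>
        <p n="1">Lorem ipsum dolor sit amet, consectetur adipiscing elit.</p>
        <p n="2">Sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.</p>
      </div>
    </body>
  </text>
</TEI>
```

Even a small sample like this shows the point made in the caveats above: only a handful of tags are in play, so a ConTeXt style sheet for such a file would only need to handle that handful.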

--Thomas 09:03, 10 October 2010 (UTC)