Changes

TEI xml (view source)

Revision as of 18:10, 12 October 2010

2,248 bytes added , 18:10, 12 October 2010

m

continuing

TEI (Text Encoding Initiative) is "a consortium which collectively develops and maintains a standard for the representation of texts in digital form," to quote [http://www.tei-c.org/index.xml their own website]. They have developed a series of guidelines for editing texts in digital form. In their latest form (which is called P5), these guidelines weigh in at a hefty 1350 pages (OK, that's counting the bibliography and the index too; there are only 1290 pages of real text). These describe an xml format which is suitable for editing texts. The TEI guidelines have the advantage of being very well documented. There are a number of free resources available that should help everyone who is interested in getting started (one extremely helpful website with lots of tutorials, examples, and tests is [http://tbe.kantl.be/TBE TEI by example]). They are not (and do not aspire to be) an absolute standard that everyone has to follow, but many academic projects use these guidelines, and they should be a pretty good way to make sure that your electronic edition of a text will be useful in the future.

Since editing texts is something which quite a few users of ConTeXt are involved in, it makes sense to think about ways in which xml documents which follow the TEI guidelines can be typeset with ConTeXt. We would invite users to keep a few caveats in mind:

# The TEI guidelines are very detailed because they try to cater to a large number of needs. Most users will only need a small subset of the tags and attributes which the guidelines offer (in fact, TEI is aware of this and has a slimmed-down version of its guidelines called [http://www.tei-c.org/Guidelines/Customization/Lite TEI Lite], which is a very good starting place to familiarize yourself with TEI). It would not make sense to try to provide a monolithic solution that defines all TEI tags; instead, we need localized ConTeXt style sheets, each of which defines the subset that is relevant for a group of texts with similar features.

# Even with this huge number of tags, TEI does not claim to be sufficient for every text. Users are encouraged to develop their own adaptations of the guidelines; again, this necessitates special ConTeXt style sheets to process such adaptations.

# Encoding and typesetting texts in xml is an ongoing process. As you go forward in your edition, you realize that you need more tags, that you need to distinguish more special cases, that you want to add more information to your edition. This means that you will have to go back and forth between your xml file and the ConTeXt style and adapt both to your needs.

<body>

<p>Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed

diam nonumy eirmod tempor invidunt

nonumy eirmod tempor invidunt.</p>

</div>

<p>ut labore et dolore magna aliquyam erat, sed diam

voluptua. <pb ed="Olearius" n="481"/> At vero eos et accusam et

amet</p>

</div>

<p>Duis autem vel eum iriure dolor in hendrerit in vulputate

velit esse molestie consequat, vel illum dolore eu feugiat

which defines it as a TEI xml file. Everything else is a "child" of this root level. At the next level, you see two of these children. The first is the <teiHeader> element, which contains meta-information about your electronic edition: title, author, editor, publication status, source of your edition. There can be much more information here, but this meta-information will usually not be typeset in your edition.

The other child is the <text> element. This is what will really be in a typeset, printed edition. As you see, the <text> element again has two children. The <front> element contains the title of the work you edit in the form in which it will appear in your typeset document, prefatory material, etc. The <body> element contains the text itself. This text has a logical structure: it consists of books, chapters, and sections. All of these logical parts are expressed via <div> elements; to distinguish them from each other, these <div> elements have so-called attributes, so we have:

</texcode>

We define a set of xml setups in a <ttcode>\startxmlsetups</ttcode> ... <ttcode>\stopxmlsetups</ttcode> environment, and we give it a name in the namespace <tt>xml:</tt>. The first line of these setups does only one thing: the command <code>\xmlsetsetup</code> operates on the current xml tree (that's what the first argument <code>{#1}</code> refers to), takes all its elements (<code>{*}</code>), and discards them (<code>{-}</code>). That means '''only''' elements which we address explicitly will be typeset. This is necessary in our case because we do not want the information in the TEI header to be typeset.
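To make this concrete, here is a minimal sketch of a complete style at this stage. The file name <tt>edition.xml</tt> and the tag <tt>main</tt> are placeholders of our own; <code>\xmlregistersetup</code> makes sure our setups are applied to every xml file that is loaded, and <code>\xmlprocessfile</code> loads and processes the file. With only the discard rule in place, nothing from the xml file is typeset yet:

<texcode>
% minimal driver; "main" and "edition.xml" are placeholder names
\startxmlsetups xml:teisetups
  % hide every element unless a later line gives it a setup
  \xmlsetsetup{#1}{*}{-}
\stopxmlsetups

% apply xml:teisetups to every xml document that is loaded
\xmlregistersetup{xml:teisetups}

\starttext
  % load edition.xml under the name "main" and process it
  \xmlprocessfile{main}{edition.xml}{}
\stoptext
</texcode>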

For those elements we '''do''' want typeset, we have to add instructions. This involves a three-step process:

# we add the element to our <code>\xmlsetsetup</code>
# we define a specific setup for the element
# (optional) we define TeX commands for typesetting

Let us begin with some easy steps. At this point, every element is discarded, so nothing in our xml tree would be typeset. We first have to tell ConTeXt to pass the content of the topmost elements to its typesetting engine. The topmost element is TEI, so we write:
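A minimal sketch of this step, assuming the setup names follow the <code>xml:*</code> pattern used in the section-level code further down on this page (so <tt>TEI</tt> gets the setup <tt>xml:TEI</tt>, and so on). We also pass on the content of <tt>text</tt>, since <tt>body</tt> sits inside it; <tt>teiHeader</tt> and <tt>front</tt> stay discarded:

<texcode>
\startxmlsetups xml:teisetups
  \xmlsetsetup{#1}{*}{-}
  % TEI gets the setup xml:TEI, text gets xml:text
  \xmlsetsetup{#1}{TEI|text}{xml:*}
\stopxmlsetups

% pass the content of TEI on to the typesetting engine
\startxmlsetups xml:TEI
  \xmlflush{#1}
\stopxmlsetups

% same for text; of its children, only elements that have
% a setup of their own will end up on the page
\startxmlsetups xml:text
  \xmlflush{#1}
\stopxmlsetups
</texcode>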

So we have:

# added the element <tt>body</tt> to our <code>\xmlsetsetup</code>
# added a specific setup for the element which puts its content within a <code>\startlinenumbering</code> environment
# added ConTeXt setup commands for the <code>\startlinenumbering</code> environment.
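Put together, these three steps for <tt>body</tt> could look roughly like the following sketch. The value <tt>step=5</tt> (print only every fifth line number) is our own assumption, not something prescribed here; we also assume that setups which flush <tt>TEI</tt> and <tt>text</tt> are in place:

<texcode>
\startxmlsetups xml:teisetups
  \xmlsetsetup{#1}{*}{-}
  % body now gets the setup xml:body
  \xmlsetsetup{#1}{TEI|text|body}{xml:*}
\stopxmlsetups

% number the lines of everything inside body
\startxmlsetups xml:body
  \startlinenumbering
    \xmlflush{#1}
  \stoplinenumbering
\stopxmlsetups

% assumed setting: show only every fifth line number
\setuplinenumbering[step=5]
</texcode>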

Things become even more interesting at the next level. When you look at our xml document, you will see that the entire body consists of different divisions in <tt>div</tt> elements; the different levels are distinguished by different <tt>type</tt> attributes. This means we cannot simply add the <tt>div</tt> element to our general <code>\xmlsetsetup</code>, but have to add a specific <code>\xmlsetsetup</code> for every type. Fortunately, ConTeXt makes it easy to address these different elements. We begin with the <tt>book</tt> level (for clarity, I will now only show the new steps, not the entire style document):

</texcode>

What happens here? The expression <code>div[@type='book']</code> means "every element div which has an attribute 'type' with the value 'book.'" We want a blank line before the title of the book. Then, we take the value of the <tt>n</tt> attribute (that's what the construct <code>\xmlatt{#1}{n}</code> expands to: the value of the attribute <tt>n</tt> of the current tag) and typeset it midaligned. We add another, smaller blank. And don't forget to "flush" the content of the <tt>div</tt> element!
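Following that description, the book level could be sketched like this. The setup name <tt>xml:div:book</tt> is our own (hypothetical) choice, picked to match the <tt>xml:div:section</tt> naming used for the section level, and the blank sizes are assumptions as well:

<texcode>
\startxmlsetups xml:teisetups
  \xmlsetsetup{#1}{*}{-}
  \xmlsetsetup{#1}{TEI|text|body}{xml:*}
  % every div with type="book" gets the setup xml:div:book
  \xmlsetsetup{#1}{div[@type='book']}{xml:div:book}
\stopxmlsetups

\startxmlsetups xml:div:book
  \blank                          % blank line before the book title
  \midaligned{\xmlatt{#1}{n}}     % value of the n attribute, centered
  \blank[small]                   % another, smaller blank
  \xmlflush{#1}                   % typeset the content of the div
\stopxmlsetups
</texcode>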

For the next level, the <tt>chapter</tt>, we again need three steps: add it to the <code>\xmlsetsetup</code>, define a setup command, and define a ConTeXt macro for it:

</texcode>

Here, the value of the <tt>n</tt> attribute is passed to a ConTeXt macro <code>\PhilSection</code>. This macro is defined as an <code>\inmargin</code> which will be typeset in the outer margin, in a bigger, bold font. This will be the "chapter" numbering in the outer margin. For the section numbering, we take a similar approach, but, as you will see, we need to define even more different setups:

<texcode>
\startxmlsetups xml:teisetups
  \xmlsetsetup{#1}{*}{-}
  \xmlsetsetup{#1}{TEI|text|body}{xml:*}
  \xmlsetsetup{#1}{div[@type='section']}{xml:div:section}
\stopxmlsetups

\startxmlsetups xml:div:section
  \doifelse {\xmlatt{#1}{n}} {1}
    {\xmlflush{#1}}
    {\doifelse {\xmlatt{#1}{rend}} {paragraph}
      {\par\PhilSubsection{\xmlatt{#1}{n}}\xmlflush{#1}}
      {\PhilSubsection{\xmlatt{#1}{n}}\xmlflush{#1}}}
\stopxmlsetups

\defineinmargin [PhilSubsection] [outer] [normal] [distance=0.3em,style=normal]
</texcode>

Here, we define a setup for the section level which contains two further tests, for which we use ConTeXt's <code>\doifelse</code> macro. The first <code>\doifelse</code> tests whether the value of the <tt>n</tt> attribute is "1," i.e., whether this is the first section in a chapter. If it is, it does nothing more than "flush" the content of this section: remember, the number of the first section should not appear in the margin, since it is implied in the chapter number. It is still good to have this number in the xml file; if you ever decide that your typeset output should look different, the information is there and can be shown. But for the time being, we do not want it to appear, and that is what the first condition takes care of. If the <tt>n</tt> attribute's value '''isn't''' 1, another test is performed; this time, we look at the value of the <tt>rend</tt> attribute. If this attribute has the value "paragraph," we insert a <code>\par</code>, pass the value of the <tt>n</tt> attribute to the macro <code>\PhilSubsection</code>, and "flush" the content of our section. If the value is anything else (i.e., "inline"), we also pass the number to <code>\PhilSubsection</code> and flush the content, but without inserting a <code>\par</code>. Finally, we define <code>\PhilSubsection</code> as another <code>\inmargin</code>, which will appear in the outer margin, at the same place as the chapter numbering, but in a normal font.
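The chapter-level setup, whose <code>\PhilSection</code> macro is described above, could be sketched along the same lines. The setup name <tt>xml:div:chapter</tt> and the style <code>\bfa</code> (bold, one size step up) are assumptions that merely mirror the description of "a bigger, bold font"; the other <code>\defineinmargin</code> arguments are copied from the <code>\PhilSubsection</code> definition:

<texcode>
\startxmlsetups xml:teisetups
  \xmlsetsetup{#1}{*}{-}
  \xmlsetsetup{#1}{TEI|text|body}{xml:*}
  % every div with type="chapter" gets the (hypothetical) setup xml:div:chapter
  \xmlsetsetup{#1}{div[@type='chapter']}{xml:div:chapter}
\stopxmlsetups

% put the chapter number in the margin, then typeset the chapter content
\startxmlsetups xml:div:chapter
  \PhilSection{\xmlatt{#1}{n}}\xmlflush{#1}
\stopxmlsetups

% outer margin, bigger bold font (style=\bfa is an assumption)
\defineinmargin [PhilSection] [outer] [normal] [distance=0.3em,style=\bfa]
</texcode>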

--[[User:Thomas|Thomas]] 18:08, 12 October 2010 (UTC)

