Changes

TEI xml

Revision as of 15:02, 5 August 2017

2,926 bytes added , 15:02, 5 August 2017

address of TEI by example modified

== General ==

TEI (Text Encoding Initiative) is "a consortium which collectively develops and maintains a standard for the representation of texts in digital form," to quote [http://www.tei-c.org/index.xml their own website]. They have developed a series of guidelines for editing texts in digital form. In their latest form (which is called P5), these guidelines weigh in at a hefty 1350 pages (OK, that's counting the bibliography and the index too; there are only 1290 pages of real text). They describe an xml format which is suitable for editing texts. The TEI guidelines have the advantage of being very well documented. There are a number of free resources available that should help everyone who is interested in getting started (one extremely helpful website with lots of tutorials, examples, and tests is [http://teibyexample.org TEI by example]). The guidelines are not (and do not aspire to be) an absolute standard that everyone has to follow, but many academic projects use them, and they should be a pretty good way to make sure that your electronic edition of a text will be useful in the future.

Since editing texts is something which quite a few users of ConTeXt are involved in, it makes sense to think about ways in which xml documents which follow the TEI guidelines can be typeset with ConTeXt. We would invite users to keep a few caveats in mind:

which defines it as a TEI xml file. Everything else is a "child" of this root level. At the next level, you see two of these children: on the one hand, the <teiHeader> element. This contains meta-information about your electronic edition: title, author, editor, publication status, source of your edition. There can be much more information here. This is meta-information which will usually not be typeset in your edition.

The other child is the <text> element. This is what will really be in a typeset, printed edition. As you see, the <text> element again has two children. The <front> contains the title of the work you edit in the form in which it will appear in your typeset document, prefatory material, etc. The <body> element contains the text itself. This text has a logical structure: It consists of books, chapters, and sections. All of these logical parts are expressed via different <code><div></code> elements; to distinguish them from each other, these <code><div></code> elements have so-called attributes, so we have:

As you can see, most of these "div" elements have other attributes as well: the "xml:id" attribute gives every section in your document a unique identifier. This makes it easier to refer to these sections later. You are free to choose these attributes; as an example, I have opted for a short numeric tag that refers to the paragraph. The "n" attribute is the name of the section as it will appear in your typeset edition. For classical prose texts, it is customary to have the chapter and section numbers appear in the margin of the edition, with no prefix and no additional information about the structure. E.g., at the beginning of chapter 8, there will be a bold '''8''' in the margin (the mark for "section 1" is understood and usually not expressed). For subsequent sections of chapter 8, there will be smaller section numbers in the margin, like "2," "3," etc. Finally, such sections of chapters do not necessarily begin a new paragraph. In order to make this clear, I have used the "rend" attribute (not exactly in the way TEI defines it, but close enough). For sections, I have two types of "rend" attributes: "inline" means that this section should just continue the typographical paragraph; "paragraph" means that it should begin in a new paragraph. This is an important distinction which I want to emphasize: in your typeset edition, these two will appear very different. For the '''logical''' structure of your digital text, however, they are both on the same level. That's why they are both "div" of the same type, but with different "rend" attributes.
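To make the description above a bit more concrete, here is a minimal sketch of how two sections of the same chapter might be marked up. Only the attribute names (<tt>xml:id</tt>, <tt>n</tt>, <tt>rend</tt>) and the two <tt>rend</tt> values come from the text above; the <tt>type</tt> values, the identifiers, and the sample text are assumptions made up for illustration and are not copied from the actual Philostratus file. Note that an <tt>xml:id</tt> value has to begin with a letter or an underscore, so even a "numeric" tag needs a short prefix.

<xmlcode>
<!-- Illustrative sketch only: type values, ids and text are placeholders -->
<body>
  <div type="book" n="1" xml:id="b1">
    <div type="chapter" n="8" xml:id="b1c8">
      <!-- section 1 opens the chapter and starts a new paragraph -->
      <div type="section" n="1" rend="paragraph" xml:id="b1c8s1">
        <p>First section of chapter 8 ...</p>
      </div>
      <!-- section 2 is on the same logical level, but simply
           continues the typographical paragraph -->
      <div type="section" n="2" rend="inline" xml:id="b1c8s2">
        <p>... second section, running on in the same paragraph.</p>
      </div>
    </div>
  </div>
</body>
</xmlcode>

Both sections are <tt>div</tt> elements of the same type; only the <tt>rend</tt> attribute tells the style file whether to start a new paragraph.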

Further, we have <tt><pb></tt> elements. These are used to denote pagebreaks in standard editions, which are often used for reference purposes and displayed in the margin; in the case of the ''Lives of the Sophists'', this is the 18th-century edition of Olearius. These elements are inserted at the places where these pagebreaks occur.
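As a rough sketch, such a pagebreak could look like this in the xml source. The <tt>ed</tt> and <tt>n</tt> attributes are the ones the style file reads later on this page; the page number and the surrounding text are placeholders, not taken from the real file.

<xmlcode>
<!-- Illustrative sketch: page number and text are placeholders -->
<p>... the last words on one page of Olearius
<pb ed="Olearius" n="480"/>
and the first words of the next page ...</p>
</xmlcode>

The element is empty: it only marks the place in the running text where the new page of the reference edition begins.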

Finally, we have the critical apparatus. Its notes are included in <app> elements. Every single entry in the apparatus is within a <rdg> (= reading) element.
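A single apparatus note might be sketched as follows. The <tt>lem</tt> element and the <tt>wit</tt> attribute are part of the TEI guidelines, but whether and how this particular file uses them is an assumption on my part; the witness sigla and the readings are invented placeholders.

<xmlcode>
<!-- Illustrative sketch: sigla and readings are invented placeholders -->
<app>
  <lem>reading printed in the text</lem>
  <rdg wit="#A">variant reading of manuscript A</rdg>
  <rdg wit="#B">variant reading of manuscript B</rdg>
</app>
</xmlcode>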

== The ConTeXt style file ==

'''NB:''' Some of the functionality described here has been introduced quite recently. You will need a ConTeXt version not earlier than December 2010 in order to try this example!

In order to typeset such a file with ConTeXt, we need a style file which will map xml elements and attributes to specific ConTeXt commands. We have to save this file (let's call it tei-style.tex) somewhere where ConTeXt can find it (e.g., somewhere in your personal texmf tree or in the same directory as the xml file) and then typeset with the command <tt>context --environment=tei-style philostratus.xml</tt>. We will look at this file in detail:

<code>\PhilSubsection</code>, and "flush" the content of our section. If the value is anything else (i.e., "inline"), we flush the content without inserting a <code>\par</code>. Then, we define <code>\PhilSubsection</code> as another <code>\inmargin</code>, which will appear in the outer margin, at the same place as the chapter numbering, but in a normal font.

Finally, when you look at the main text, you will see that we now have defined setups for books, chapters, sections, but not yet for the smallest element, <tt>p</tt>. Remember: we don't want paragraph breaks for these elements, so all we need to do is "flush" them. Which means: we add the <tt>p</tt> element to the list:

<texcode>
\xmlsetsetup{#1}{TEI|text|body|p}{xml:*}
</texcode>

and the appropriate setup is:

<texcode>
\startxmlsetups xml:p
 \xmlflush{#1}
\stopxmlsetups
</texcode>

And that's it! This is our structure for the main text! If you typeset the xml file with this setup, you get text with marginal numbering for your chapters and sections.

We now add the bells and whistles. We begin with the Olearius pagebreaks, the <tt><pb></tt> elements. If you've followed so far, this should be easy. As you see, these elements contain a reference to the relevant edition (the <tt>ed=</tt> attribute) and the page number. If we had more elements of this type, it would make sense to define a <tt>setsetup</tt> for every one of them. In the case of Philostratus, we will probably only have Olearius, so we just add them to our list:

<texcode>
\xmlsetsetup{#1}{TEI|text|body|p|pb}{xml:*}
</texcode>

and add both the setup for the xml element and a new definition for a marginal text (since we're a bit paranoid, we still test whether the <tt>xmlattribute ed</tt> is set to <tt>Olearius</tt>). Since I want the Olearius numbers in square brackets, I needed to take a two-step approach (the square brackets would be confusing to the ConTeXt parser). So I first define an inmargin <code>\ZOlearius</code> and then a macro <code>\Olearius</code> which takes this value and typesets it within square brackets, in the outer margin, at a distance of 2em from the main text:

<texcode>
\startxmlsetups xml:pb
 \doifelse {\xmlatt{#1}{ed}} {Olearius} {\Olearius{\xmlatt{#1}{n}}} {}
\stopxmlsetups

\defineinmargin [ZOlearius] [outer] [normal] [distance=2em,style=small]

\define[1]\Olearius%
 {\ZOlearius{[#1]}}
</texcode>

[[User:Thomas|Thomas]] 21:38, 7 November 2010 (UTC)

== Removing unwanted strings from xml source ==

In some cases you might want to remove strings or characters from the xml source. For example, ConTeXt cannot process a hashmark. The following example shows how to remove the hashmark from an xml identifier before processing, using the command <code>\cldcontext</code>.

The xml source:

<xmlcode>
<a href="#myspecialid">the previous section</a>
</xmlcode>

The setup code:

<texcode>
\startxmlsetups xml:initialize
 \xmlsetsetup{#1}{a}{xml:*}
\stopxmlsetups

\xmlregistersetup{xml:initialize}

\startxmlsetups xml:a
 \cldcontext{string.sub([[\xmlatt{#1}{href}]],2)}
\stopxmlsetups
</texcode>

{{Getting started navbox}}

[[Category:XML]]

Adeimantos

48

edits
