Difference between revisions of "User:Luigi.scarso/testpage"


Revision as of 09:36, 21 January 2014

The database

The bibTEX format is rather popular in the TEX community and even with its shortcomings it will stay around for a while. Many publication websites can export to it and many tools are available to work with this database format. It is rather simple and looks a bit like Lua tables. Unfortunately the content can be polluted with non-standardized TEX commands, which complicates pre- or postprocessing outside TEX. In that sense a bibTEX database is often not coded neutrally. Some limitations, like the use of commands to encode accented characters, are rooted in the ascii world and can be bypassed by using utf instead (as handled somewhat in LATEX through extensions such as bibtex8).

The normal way to deal with a bibliography is to refer to entries using a unique tag or key. When a list of entries is typeset, this reference can be used for linking purposes. The typeset list can be processed and sorted using the bibtex program that converts the database into something more TEX friendly (a .bbl file). I never used the program myself (nor bibliographies) so I will not go into too much detail here, if only because all I say can be wrong.

In ConTEXt we no longer use the bibtex program: we just use database files and deal with the necessary manipulations directly in ConTEXt. One or more such databases can be used and combined with additional entries defined within the document. We can have several such datasets active at the same time.

A bibTEX file looks like this:

@Article{sometag,
    author  = "An Author and Another One",
    title   = "A hopefully meaningful title",
    journal = maps,
    volume  = "25",
    number  = "2",
    pages   = "5--9",
    month   = mar,
    year    = "2013",
    ISSN    = "1234-5678",
}

Normally a value is given between quotes (or curly brackets) but single words are also OK (there is no real benefit in not using quotes, so we advise always using them). There can be many more fields and instead of strings one can use predefined shortcuts. The title, for example, quite often contains TEX macros. Some fields, like pages, have funny characters such as the endash (typically as --) so we have a mixture of data and typesetting directives. If you are covering non-English references, you often need characters that are not in the ascii subset, but ConTEXt is quite happy with utf. If your database file uses old-fashioned TEX accent commands then these will be internally converted automatically to utf. Commands (macros) are converted to an indirect call, which is quite robust.
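The conversion of old-fashioned accent commands can be pictured with a small plain Lua sketch. This is not the code that ConTEXt uses: the accents table and the toutf function are hypothetical names, and only a handful of accent and letter combinations are covered.

```lua
-- Hypothetical, deliberately tiny accent table; the real conversion
-- in ConTeXt knows many more combinations.
local accents = {
    ['"'] = { a = "ä", e = "ë", o = "ö", u = "ü" },
    ["'"] = { a = "á", e = "é", o = "ó", u = "ú" },
    ["`"] = { a = "à", e = "è", o = "ò", u = "ù" },
}

local function replace(accent, letter)
    local known = accents[accent]
    -- returning nil makes gsub keep the original text untouched
    return known and known[letter] or nil
end

local function toutf(str)
    str = str:gsub("{\\([\"'`])(%a)}", replace) -- {\"o}
    str = str:gsub("\\([\"'`]){(%a)}", replace) -- \"{o}
    str = str:gsub("\\([\"'`])(%a)",   replace) -- \"o
    return str
end

print(toutf([[Kurt G{\"o}del and Ren\'e Descartes]]))
```

Combinations that are not in the table are left as they are, so nothing gets lost when the sketch meets a command it does not know.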

The bibTEX files are loaded in memory as Lua tables but can be converted to xml so that we can access them in a more flexible way, but that is a subject for specialists.

In the old MkII setup we have two kinds of entries: the ones that come from the bibTEX run and user supplied ones. We no longer rely on bibTEX output but we do still support the user supplied definitions. These were in fact prepared in a way that suits the processing of bibTEX generated entries. The next variant reflects the ConTEXt recoding of the old bibTEX output.

\startpublication[k=Hagen:Second,t=article,a={Hans Hagen},y=2013,s=HH01]
    \artauthor[]{Hans}[H.]{}{Hagen}
    \arttitle{Who knows more?}
    \journal{MyJournal}
    \pubyear{2013}
    \month{8}
    \volume{1}
    \issue{3}
    \issn{1234-5678}
    \pages{123--126}
\stoppublication

The split \artauthor fields are collapsed into a single author field as we deal with the splitting later when it gets parsed in Lua. The \artauthor syntax is only kept around for backward compatibility with the previous use of bibTEX.

In the new setup we support these variants as well:

\startpublication[k=Hagen:Third,t=article]
    \author{Hans Hagen}
    \title{Who knows who?}
    ...
\stoppublication

and

\startpublication[tag=Hagen:Third,category=article]
    \author{Hans Hagen}
    \title{Who knows who?}
    ...
\stoppublication

and

\startpublication
    \tag{Hagen:Third}
    \category{article}
    \author{Hans Hagen}
    \title{Who knows who?}
    ...
\stoppublication

Because internally the entries are Lua tables, we also support loading of Lua based definitions:

return {
    ["Hagen:First"] = {
        author   = "Hans Hagen",
        category = "article",
        issn     = "1234-5678",
        issue    = "3",
        journal  = "MyJournal",
        month    = "8",
        pages    = "123--126",
        tag      = "Hagen:First",
        title    = "Who knows nothing?",
        volume   = "1",
        year     = "2013",
    },
}

Files set up like this can be loaded too. The following xml input is rather close to this, and is also accepted as input.

<?xml version="2.0" standalone="yes" ?>
<bibtex>
    <entry tag="Hagen:First" category="article">
        <field name="author">Hans Hagen</field>
        <field name="category">article</field>
        <field name="issn">1234-5678</field>
        <field name="issue">3</field>
        <field name="journal">MyJournal</field>
        <field name="month">8</field>
        <field name="pages">123--126</field>
        <field name="tag">Hagen:First</field>
        <field name="title">Who knows nothing?</field>
        <field name="volume">1</field>
        <field name="year">2013</field>
    </entry>
</bibtex>

Todo: Add some remarks about loading EndNote and RIS formats, but first we need to complete the tag mapping (on Alan’s plate).

So the user has a rather wide choice of formatting style for bibliography database files.

You can load more data than you actually need. Only entries that are referred to explicitly through the \cite and \nocite commands will be shown in lists. We will cover these details later.

Commands in entries

One unfortunate aspect commonly found in bibTEX files is that they often contain TEX commands. Even worse is that there is no standard on what these commands can be and what they mean, at least not formally, as bibTEX is a program intended to be used with many variants of TEX: plain, LATEX, and others. This means that we need to define our use of these typesetting commands. However, in most cases they are just abbreviations or font switches, and these are often known. Therefore, ConTEXt will try to resolve them before reporting an issue. In the log file there is a list of commands that have been seen in the loaded databases. For instance, loading tugboat.bib gives a long list of commands of which we show a small set here:

publications > start used btx commands
publications > standard CONTEXT 1 known
publications > standard ConTeXt 4 known
publications > standard TeXLive 3 KNOWN
publications > standard eTeX    1 known
publications > standard hbox    6 known
publications > standard sltt    1 unknown
publications > stop used btxcommands

You can define unknown commands, or overload existing definitions in the following way:

\definebtxcommand\TUB {TUGboat}
\definebtxcommand\sltt{\tt}
\definebtxcommand\<#1>{\type{#1}}

Unknown commands do not stall processing, but their names are then typeset in a monospaced font so they probably stand out for proofreading. You can access the commands with \btxcommand{...}, as in:

commands like \btxcommand{MySpecialCommand} are handled in an indirect way

As this is an undefined command we get: “commands like MySpecialCommand are handled in an indirect way”.
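To get a feeling for how such a list of used commands can be collected, here is a sketch in plain Lua. The known table, the usedcommands function and the sample entries are all made up for this illustration, and the report format is our own, not the one that ConTEXt writes to the log.

```lua
-- Hypothetical stand-in for the commands the typesetter already knows.
local known = { TeX = true, ConTeXt = true, hbox = true }

-- Count every \command found in the string fields of all entries.
local function usedcommands(entries)
    local used = { }
    for _, entry in pairs(entries) do
        for _, value in pairs(entry) do
            if type(value) == "string" then
                for name in value:gmatch("\\(%a+)") do
                    used[name] = (used[name] or 0) + 1
                end
            end
        end
    end
    return used
end

-- Made-up sample data.
local entries = {
    ["demo-a"] = { title = "\\TeX\\ and \\ConTeXt", note = "\\hbox{x}" },
    ["demo-b"] = { title = "{\\sltt code} in \\TeX" },
}

local used  = usedcommands(entries)
local names = { }
for name in pairs(used) do
    names[#names+1] = name
end
table.sort(names) -- sort so that the report is the same on every run
for _, name in ipairs(names) do
    local state = known[name] and "known" or "unknown"
    print(string.format("%-10s %2d %s", name, used[name], state))
end
```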



Datasets

Normally in a document you will use only one bibliographic database, whether or not distributed over multiple files. Nevertheless we support multiple databases as well, which is why we talk of datasets instead. A dataset is loaded with the \usebtxdataset command. Although currently it is not necessary to define a (default) dataset, it is best to do so because in the future we might provide more options. Here are some examples:

\definebtxdataset[standard]
\usebtxdataset[standard][tugboat.bib]
\usebtxdataset[standard][mtx-bibtex-output.xml]
\usebtxdataset[standard][test-001-btx-standard.lua]

These three suffixes are understood by the loader. Here the dataset has the name standard and the three database files are merged, where later entries having the same tag overload previous ones. Definitions in the document source (coded in TEX speak) are also added, and they are saved for successive runs. This means that if you load and define entries, they will be known at a next run beforehand, so that references to them are independent of when loading and definitions take place.
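The merging rule (later entries with the same tag overload previous ones) can be sketched in a few lines of plain Lua. The merge function and the three small tables are hypothetical; they only mimic what happens when several files end up in one dataset.

```lua
-- Merge any number of tag -> entry tables into one dataset.
local function merge(...)
    local dataset = { }
    for _, source in ipairs({ ... }) do
        for tag, entry in pairs(source) do
            dataset[tag] = entry -- a later source replaces an earlier entry
        end
    end
    return dataset
end

-- Made-up stand-ins for the content of three loaded files.
local frombib = { ["demo-001"] = { title = "From the bib file", year = "2013" } }
local fromxml = { ["demo-002"] = { title = "From the xml file" } }
local fromlua = { ["demo-001"] = { title = "From the lua file", year = "2014" } }

local standard = merge(frombib, fromxml, fromlua)

print(standard["demo-001"].title)
print(standard["demo-002"].title)
```

Note that in this sketch a later entry replaces the earlier one as a whole; fields are not merged one by one.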

setup definition setupbtxdataset

setup definition definebtxdataset

setup definition usebtxdataset

In this document we use some example databases, so let’s load one of them now:

\definebtxdataset[example]
\usebtxdataset[example][mkiv-publications.bib]

You can ask for an overview of entries in a dataset with:

\showbtxdatasetfields[example]

this gives:

tag       category  fields
demo-001  book      author index title year
demo-002  book      crossref index year
demo-003  book      author comment index title year
demo-004  book      author comment index title year
demo-005  book      author doi index pages serial title url year

You can set the current active dataset with

\setbtxdataset[standard]

but most publication-related commands accept optional arguments that denote the dataset, and references to entries can be prefixed with a dataset identifier. More about that later.

Sometimes you want to check a database. One way of doing that is the following:

\definebtxdataset[check]
\usebtxdataset[check][mkiv-publications-check.bib]
\showbtxdatasetcompleteness[check]

The database looks like this:

The completeness check shows (with green field names) the required fields and when one is missing this is indicated in red. In blue we show what gets inherited.
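The idea behind such a completeness check can be sketched in plain Lua. The required table below is an assumption made for the sake of the example and not the list that ConTEXt uses, and the sketch ignores fields that are inherited through crossref.

```lua
-- Assumed requirements per category, for illustration only.
local required = {
    book    = { "author", "title", "year" },
    article = { "author", "title", "journal", "year" },
}

-- Return the names of the required fields that are absent or empty.
local function missingfields(entry)
    local missing = { }
    for _, name in ipairs(required[entry.category] or { }) do
        local value = entry[name]
        if value == nil or value == "" then
            missing[#missing+1] = name
        end
    end
    return missing
end

local complete   = { category = "book", author = "Hans Hagen", title = "A title", year = "2013" }
local incomplete = { category = "book", crossref = "demo-001", year = "2014" }

print(#missingfields(complete))
print(table.concat(missingfields(incomplete), " "))
```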


Renderings

A list of publications can be rendered at any place in the document. A database can be much larger than needed for a document. The same is true for the fields that make up an entry. Here is the list of fields that are currently handled, but of course there can be additional ones:

abstract, address, annotate, assignee, author, bibnumber, booktitle, chapter, comment, country, day, dayfiled, doi, edition, editor, eprint, howpublished, institution, isbn, issn, journal, key, keyword, keywords, language, lastchecked, month, monthfiled, names, nationality, note, notes, number, organization, pages, publisher, revision, school, series, size, title, type, url, volume, year, yearfiled

If you want to see what publications are in the database, the easiest way is to ask for a complete list:

\definebtxrendering
  [example]
  [dataset=example,
   method=local,
   alternative=apa]
\placelistofpublications % \placebtxrendering
  [example]
  [criterium=all]

This gives:

1  Hagen, H. and Otten, T. (1996). Typesetting education documents.
2  Scarso, L. (2021). Designing high speed trains.
3  author (year). title. pages p.

The rendering itself is somewhat complex to set up because we have not only many different standards but also many fields that can be set up. This means that there are several commands involved. Often there is a prescribed style to render bibliographic descriptions, for example apa. A rendering is set up and defined with:

setup definition setupbtxrendering

setup definition definebtxrendering

And a list of such descriptions is generated with:

setup definition placebtxrendering

A dataset can have all kinds of entries:

Each has its own rendering variant. To keep things simple we have their settings separated. However, these settings are shared by all rendering alternatives. In practice this is seldom a problem in a publication as only one rendering alternative will be active. If this is not sufficient, you can always group local settings in a setup and hook that into the specific rendering.

setup definition setupbtxlistvariant

setup definition definebtxlistvariant

Examples of list variants are:

setupbtxlistvariant : artauthor

no specific settings

setupbtxlistvariant : author

no specific settings

setupbtxlistvariant : editor

no specific settings

The exact rendering of list entries is determined by the alternative key and defaults to apa which uses definitions from publ-imp-apa.mkiv. If you look at that file you will see that each category has its own setup. You may also notice that additional tests are needed to make sure that empty fields don’t trigger separators and such.

There are a couple of accessors and helpers to get the job done. When you want to fetch a field from the current entry you use \btxfield. In most cases you want to make sure this field has a value, for instance because you don’t want fences or punctuation that belongs to a field.

\btxdoif {title} {
    \bold{\btxfield{title}},
}

There are three test macros:

\btxdoifelse{fieldname}{action when found}{action when not found}
\btxdoif    {fieldname}{action when found}
\btxdoifnot {fieldname}                   {action when not found}

An extra conditional is available for testing interactivity:

\btxdoifelseinteraction{action when true}{action when false}

In addition there is also a conditional \btxinteractive which is more efficient, although in practice efficiency is not so important here.

There are three commands to flush data:

\btxfield   fetch an explicit field (e.g. year)
\btxdetail  fetch a derived field (e.g. short)
\btxflush   fetch a derived or explicit field

Normally you can use \btxfield or \btxflush, as derived fields, just like analyzed author fields, are flushed in a special way.

You can improve readability by using setups, for instance:

\btxdoifelse {author} {
    \btxsetup{btx:apa:author:yes}
} {
    \btxsetup{btx:apa:author:nop}
}

Keep in mind that normally you don’t need to mess with definitions like this because standard rendering styles are provided. These styles use a few helpers that inject symbols but also take care of leading and trailing spaces:

\btxspace     before after
\btxperiod    before. after
\btxcomma     before, after
\btxlparent   before (after
\btxrparent   before) after
\btxlbracket  before [after
\btxrbracket  before] after

So, the previous example setup can be rewritten as:

\btxdoif {title} {
    \bold{\btxfield{title}}
    \btxcomma
}
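The reason for testing fields before flushing them is easier to see in a plain Lua sketch: separators are only wanted between fields that actually have a value. The render function is a hypothetical helper and has nothing to do with the TEX macros themselves.

```lua
-- Collect the non-empty fields in the given order and put a comma and
-- a space between them, so that empty fields never trigger a separator.
local function render(entry, fields)
    local parts = { }
    for _, name in ipairs(fields) do
        local value = entry[name]
        if value and value ~= "" then
            parts[#parts+1] = value
        end
    end
    return table.concat(parts, ", ")
end

local entry = { title = "Who knows more?", year = "2013", journal = "" }

print(render(entry, { "author", "title", "journal", "year" }))
```

Here the missing author and the empty journal simply drop out, and no stray comma shows up at the start or in the middle of the result.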

There is a special command for rendering a (combination of) author(s):

\btxflushauthor{author}
\btxflushauthor{editor}
\btxflushauthor[inverted]{editor}

Instead of the last one you can also use:

\btxflushauthorinverted{editor}

You can use a (configurable) default or pass directives. Valid directives are:

conversion     rendering
inverted       the Frog jr, Kermit
invertedshort  the Frog jr, K
normal         Kermit, the Frog, jr
normalshort    K, the Frog, jr
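The four directives differ only in the order of the name parts and in whether first names or initials are used. The following plain Lua sketch reproduces the renderings listed above for one already split name. The conversions table, the helpers and the juniors field name are assumptions made for this example; the real code has to deal with many more cases.

```lua
-- Join the words of several lists with spaces.
local function words(...)
    local result = { }
    for _, list in ipairs({ ... }) do
        for _, word in ipairs(list) do
            result[#result+1] = word
        end
    end
    return table.concat(result, " ")
end

-- Join the non-empty pieces with a comma and a space.
local function pieces(...)
    local result = { }
    for _, piece in ipairs({ ... }) do
        if piece ~= "" then
            result[#result+1] = piece
        end
    end
    return table.concat(result, ", ")
end

local conversions = {
    inverted = function(n)
        return pieces(words(n.vons, n.surnames, n.juniors), words(n.firstnames))
    end,
    invertedshort = function(n)
        return pieces(words(n.vons, n.surnames, n.juniors), words(n.initials))
    end,
    normal = function(n)
        return pieces(words(n.firstnames), words(n.vons, n.surnames), words(n.juniors))
    end,
    normalshort = function(n)
        return pieces(words(n.initials), words(n.vons, n.surnames), words(n.juniors))
    end,
}

local kermit = {
    firstnames = { "Kermit" },
    initials   = { "K" },
    vons       = { "the" },
    surnames   = { "Frog" },
    juniors    = { "jr" }, -- hypothetical field name
}

print(conversions.inverted(kermit))    -- the Frog jr, Kermit
print(conversions.normalshort(kermit)) -- K, the Frog, jr
```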


Citations

Citations are references to bibliographic entries that normally show up in lists someplace in the document: at the end of a chapter, in an appendix, at the end of an article, etc. We discussed the rendering of these lists in the previous chapter. A citation is normally pretty short as its main purpose is to refer uniquely to a more detailed description. But, there are several ways to refer, which is why the citation subsystem is configurable and extensible. Just look at the following commands:

\cite[author][example::demo-003]
\cite[authoryear][example::demo-003]
\cite[authoryears][example::demo-003]
\cite[author][example::demo-003,demo-004]
\cite[authoryear][example::demo-003,demo-004]
\cite[authoryears][example::demo-003,demo-004]
\cite[author][example::demo-004,demo-003]
\cite[authoryear][example::demo-004,demo-003]
\cite[authoryears][example::demo-004,demo-003]
  • (Hans Hagen and Ton Otten)
  • (Hans Hagen and Ton Otten (1996))
  • (Hans Hagen and Ton Otten, 1996)
  • (Hans Hagen and Ton Otten, Luigi Scarso)
  • (Hans Hagen and Ton Otten (1996), Luigi Scarso (2021))
  • (Hans Hagen and Ton Otten, 1996, Luigi Scarso, 2021)
  • (Luigi Scarso, Hans Hagen and Ton Otten)
  • (Luigi Scarso (2021), Hans Hagen and Ton Otten (1996))
  • (Luigi Scarso, 2021, Hans Hagen and Ton Otten, 1996)

The first argument is optional.

setup definition cite

You can tune the way a citation shows up:

\setupbtxcitevariant[author]     [sorttype=author,color=darkyellow]
\setupbtxcitevariant[authoryear] [sorttype=author,color=darkyellow]
\setupbtxcitevariant[authoryears][sorttype=author,color=darkyellow]
\cite[author][example::demo-004,demo-003]
\cite[authoryear][example::demo-004,demo-003]
\cite[authoryears][example::demo-004,demo-003]

Here we sort the authors and color the citation:

  • (Hans Hagen and Ton Otten, Luigi Scarso)
  • (Hans Hagen and Ton Otten (1996), Luigi Scarso (2021))
  • (Hans Hagen and Ton Otten, 1996, Luigi Scarso, 2021)
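What sorttype=author does to the order can be mimicked in plain Lua. In this sketch the entries table, the surname field and the cite function are hypothetical, and the fences and separator are written out by hand in the way the authoryear variant shows them above.

```lua
-- Made-up stand-in for two entries of the example dataset.
local entries = {
    ["demo-003"] = { author = "Hans Hagen and Ton Otten", surname = "Hagen",  year = "1996" },
    ["demo-004"] = { author = "Luigi Scarso",             surname = "Scarso", year = "2021" },
}

-- Return a copy of the tag list, ordered by the first author's surname.
local function sortedbyauthor(tags)
    local sorted = { }
    for i, tag in ipairs(tags) do
        sorted[i] = tag
    end
    table.sort(sorted, function(a, b)
        local sa, sb = entries[a].surname, entries[b].surname
        if sa == sb then
            return a < b -- fall back on the tag to keep the order stable
        end
        return sa < sb
    end)
    return sorted
end

-- Render an authoryear style citation: (author (year), author (year)).
local function cite(tags)
    local parts = { }
    for i, tag in ipairs(sortedbyauthor(tags)) do
        local entry = entries[tag]
        parts[i] = entry.author .. " (" .. entry.year .. ")"
    end
    return "(" .. table.concat(parts, ", ") .. ")"
end

print(cite { "demo-004", "demo-003" })
```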

For reasons of backward compatibility the \cite command is a bit picky about spaces between the two arguments, of which the first is optional. This is a consequence of allowing its use with the key specified between curly brackets as is the traditional practice. (We do encourage users to adopt the more coherent ConTEXt syntax by using square brackets for keywords and reserving curly brackets to regroup text to be typeset.)

The \citation command is synonymous but is more flexible with respect to spacing of its arguments:

\citation[author]     [example::demo-004,demo-003]
\citation[authoryear] [example::demo-004,demo-003]
\citation[authoryears][example::demo-004,demo-003]

There is a whole bunch of cite options and more can be easily defined.

key          rendering
author       (author)
authornum    [author [btx error 1]]
authoryear   (author (year))
authoryears  (author, year)
doi          [todo: doi]
key          [demo-005]
none
num          btx error 1
page         pages
serial       [5]
short        [aut00]
type         [book]
url          [todo: url]
year         (year)

Because we are dealing with database input and because we generally need to manipulate entries, much of the work is delegated to Lua. This makes it easier to maintain and extend the code. Of course TEX still does the rendering. The typographic details are controlled by parameters but not all are used in all variants. As with most ConTEXt commands, it starts out with a general setup command:

setup definition setupbtxcitevariant

On top of that we can define instances that inherit either from a given parent or from the topmost setup.

setup definition definebtxcitevariant

But, specific variants can have them overloaded:

setupbtxcitevariant : author

right      )
middle     ,
left       (

setupbtxcitevariant : authornum

right      ]
middle     ,
left       [

setupbtxcitevariant : authoryear

compress   yes
inbetween  ,
right      )
middle     ,
left       (

setupbtxcitevariant : authoryears

compress   yes
inbetween  ,
right      )
middle     ,
left       (

setupbtxcitevariant : doi

right      ]
left       [

setupbtxcitevariant : key

right      ]
left       [

setupbtxcitevariant : none

no specific settings

setupbtxcitevariant : num

compress   yes
inbetween  --
right      ]
left       [

setupbtxcitevariant : page

inbetween

setupbtxcitevariant : serial

right      ]
left       [

setupbtxcitevariant : short

right      ]
left       [

setupbtxcitevariant : type

right      ]
left       [

setupbtxcitevariant : url

right      ]
left       [

setupbtxcitevariant : year

right      )
left       (

A citation variant is defined in several steps and if you really want to know the dirty details, you should look into the publ-imp-*.mkiv files. Here we stick to the concept.

\startsetups btx:cite:author
    \btxcitevariant{author}
\stopsetups

You can overload such setups if needed, but that only makes sense when you cannot configure the rendering with parameters. The \btxcitevariant command is one of the built-in accessors and it calls out to Lua where more complex manipulation takes place if needed. If no manipulation is known, the field with the same name (if found) will be flushed. A command like \btxcitevariant assumes that a dataset and a specific tag have been set. This is normally done in the wrapper macros, like \cite. For special purposes you can use these commands:

\setbtxdataset[example]
\setbtxentry[hh2013]

But don’t expect too much support for such low level rendering control.

Unless you use criterium=all, only publications that are cited will end up in the lists. You can force a citation into a list using \usecitation, for example:

\usecitation[example::demo-004,demo-003]

This command has two synonyms: \nocite and \nocitation so you can choose whatever fits you best.

setup definition nocite


The LUA view

Because we manage data at the Lua end it is tempting to access it there for other purposes. This is fine as long as you keep in mind that aspects of the implementation may change over time, although this is unlikely once the modules become stable.

The entries are collected in datasets and each set has a unique name. In this document we have the set named example. A dataset table has several fields, and probably the one of most interest is the luadata field. Each entry in this table describes a publication:

t={ 
 ["author"]="Hans Hagen", 
 ["category"]="book", 
 ["index"]=1, 
 ["tag"]="demo-001", 
 ["title"]="\\btxcmd{BIBTEX}, the \\btxcmd{CONTEXT}\\ way", 
 ["year"]="2013", 
} 

This is publications.datasets.example.luadata["demo-001"]. There can be a companion entry in the parallel details table.

t={ 
 ["author"]={ 
  { 
   ["firstnames"]={ "Hans" }, 
   ["initials"]={ "H" }, 
   ["original"]="Hans Hagen", 
   ["surnames"]={ "Hagen" }, 
   ["vons"]={}, 
  }, 
 }, 
 ["short"]="Hag13", 
} 

These details are accessed as publications.datasets.example.details["demo-001"] and by using a separate table we can overload fields in the original entry without losing the original.
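A derived field like short could be computed along the following lines. The rule used here (three letters of the surname for a single author, the first letter of each surname otherwise, followed by the last two digits of the year) is only guessed from the examples in this document; shorttag is a hypothetical function and not the actual implementation.

```lua
-- Guess a short label from a list of surnames and a year string.
-- Works on bytes, so it is only safe for ascii surnames.
local function shorttag(surnames, year)
    local prefix
    if #surnames == 1 then
        prefix = surnames[1]:sub(1, 3)
    else
        local letters = { }
        for i, surname in ipairs(surnames) do
            letters[i] = surname:sub(1, 1)
        end
        prefix = table.concat(letters)
    end
    -- take the last two digits of the year, or 00 when there are none
    local digits = tostring(year):match("(%d%d)$") or "00"
    return prefix .. digits
end

print(shorttag({ "Hagen" }, "2013"))          -- Hag13
print(shorttag({ "Hagen", "Otten" }, "1996")) -- HO96
```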

You can loop over the entries using regular Lua code combined with MkIV helpers:

local dataset = publications.datasets.example
context.starttabulate { "|l|l|l|" }
for tag, entry in table.sortedhash(dataset.luadata) do
    local detail = dataset.details[tag] or { }
    context.NC() context.type(tag)
    context.NC() context(detail.short)
    context.NC() context(entry.title)
    context.NC() context.NR()
end
context.stoptabulate()

This results in:

demo-001  Hag13  bibTEX, the ConTEXt way
demo-002  Hag14  bibTEX, the ConTEXt way
demo-003  HO96   Typesetting education documents
demo-004  Sca21  Designing high speed trains
demo-005  aut00  title

You can manipulate a dataset after loading. Of course this assumes that you know what kind of content you have and what you need for rendering. As an example we load a small dataset.

\definebtxdataset[drumming]
\usebtxdataset[drumming][mkiv-publications.lua]

Because we’re going to do some Lua, we could also have loaded the dataset with:

publications.load("drumming","mkiv-publications.lua","lua")

The dataset has three entries:

As you can see, we can have a subtitle. We will combine the title and subtitle into one:

\startluacode
for tag, entry in next, publications.datasets.drumming.luadata do
    if entry.subtitle then
        if entry.title then
            entry.title = entry.title .. ", " .. entry.subtitle
        else
            entry.title = entry.subtitle
        end
        entry.subtitle = nil
        logs.report("btx","combining title and subtitle of entry tagged %a",tag)
    end
end
\stopluacode

We can now typeset the entries with:

\definebtxrendering[drumming][dataset=drumming,method=dataset]
\placebtxrendering[drumming]

Because we just want to show the entries, and have no citations that force them to be shown, we have to set the method to dataset.


The XML view

The luadata table can be converted into an xml representation. This is a follow up on earlier experiments with an xml-only approach. I decided in the end to stick to a Lua approach and provide some simple xml support in addition.

Once a dataset is accessible as an xml tree, you can use the regular \xml... commands. We start with loading a dataset, in this case from just one file.

\usebtxdataset[tugboat][tugboat.bib]

The dataset has to be converted to xml:

\convertbtxdatasettoxml[tugboat]

The tree is now accessible by its root reference btx:tugboat. If we want simple field access we can use a few setups:

\startxmlsetups btx:initialize
    \xmlsetsetup{#1}{bibtex|entry|field}{btx:*}
    \xmlmain{#1}
\stopxmlsetups
\startxmlsetups btx:field
    \xmlflushcontext{#1}
\stopxmlsetups
\xmlsetup{btx:tugboat}{btx:initialize}

The two setups are predefined in the core already, but you might want to change them. They are applied in for instance:

\starttabulate[|||]
    \NC \type {tag}   \NC \xmlfirst {btx:tugboat}
        {/bibtex/entry[string.find(@tag,'Hagen')]/attribute('tag')}
    \NC \NR
    \NC \type {title} \NC \xmlfirst {btx:tugboat}
        {/bibtex/entry[string.find(@tag,'Hagen')]/field[@name='title']}
    \NC \NR
\stoptabulate
tag    Hagen:TB17-1-54
title  PPCHTEX: typesetting chemical formulas in TEX


\startxmlsetups btx:demo
    \xmlcommand
        {#1}
        {/bibtex/entry[string.find(@tag,'Hagen')][1]}{btx:table}
\stopxmlsetups
\startxmlsetups btx:table
\starttabulate[|||]
    \NC \type {tag}   \NC \xmlatt{#1}{tag} \NC \NR
    \NC \type {title} \NC \xmlfirst{#1}{/field[@name='title']} \NC \NR
\stoptabulate
\stopxmlsetups
\xmlsetup{btx:tugboat}{btx:demo}
tag    Hagen:TB17-1-54
title  PPCHTEX: typesetting chemical formulas in TEX

Here is another example:

\startxmlsetups btx:row
    \NC \xmlatt{#1}{tag}
    \NC \xmlfirst{#1}{/field[@name='title']}
    \NC \NR
\stopxmlsetups
\startxmlsetups btx:demo
    \xmlfilter {#1} {
        /bibtex
        /entry[@category='article']
        /field[@name='author' and (find(text(),'Knuth') or find(text(),'DEK'))]
        /../command(btx:row)
    }
\stopxmlsetups
\starttabulate[|||]
    \xmlsetup{btx:tugboat}{btx:demo}
\stoptabulate
Knuth:TB10-1-31   Typesetting Concrete Mathematics
Knuth:TB10-1-8    TEX would find it difficult …
Knuth:TB10-3-325  The new versions of TEX and MF
Knuth:TB10-4-529  The errors of TEX
Knuth:TB11-1-13   Virtual Fonts: More Fun for Grand Wizards
Knuth:TB11-2-165  Exercises for TEX: The Program
Knuth:TB11-4-489  The future of TEX and MF
Knuth:TB11-4-497  Arthur Lee Samuel, 1901--1990
Knuth:TB11-4-499  Answers to Exercises for TEX: The Program
Knuth:TB12-2-313  Fixed-point glue setting: Errata
Knuth:TB14-4-387  Icons for TEX and MF
Knuth:TB17-1-29   Important message regarding CM fonts
Knuth:TB2-3-5     The current state of things
Knuth:TB3-1-10    Fixed-point glue settingDashan example of WEB
Knuth:TB31-2-121  An Earthshaking Announcement
Knuth:TB4-2-64    A note on hyphenation
Knuth:TB5-1-4     TEX incunabula
Knuth:TB5-1-67    Comments on quality in publishing
Knuth:TB5-2-105   A course on MF programming
Knuth:TB6-1-36    Recipes and fractions
Knuth:TB7-2-101   The TEX logo in various fonts
Knuth:TB7-2-95    Remarks to celebrate the publication of Computers & Typesetting
Knuth:TB8-1-14    Mixing right-to-left texts with left-to-right texts
Knuth:TB8-1-6     It happened: announcement of TEX 2.1
Knuth:TB8-1-73    Problem for a Saturday afternoon
Knuth:TB8-2-135   Fonts for digital halftones
Knuth:TB8-2-210   Saturday morning problemDashsolution
Knuth:TB8-2-217   Reply: Printing out selected pages
Knuth:TB8-3-309   Macros for Jill
Knuth:TB9-2-152   A Punk Meta-Font

A more extensive example is the following. Of course this assumes that you know what xml support mechanisms and macros are available.

\startxmlsetups btx:getkeys
    \xmladdsortentry{btx}{#1}{\xmlfilter{#1}{/field[@name='author']/text()}}
    \xmladdsortentry{btx}{#1}{\xmlfilter{#1}{/field[@name='year'  ]/text()}}
    \xmladdsortentry{btx}{#1}{\xmlatt{#1}{tag}}
\stopxmlsetups
\startxmlsetups btx:sorter
    \xmlresetsorter{btx}
  % \xmlfilter{#1}{entry/command(btx:getkeys)}
    \xmlfilter{#1}{
        /bibtex
        /entry[@category='article']
        /field[@name='author' and find(text(),'Knuth')]
        /../command(btx:getkeys)}
    \xmlsortentries{btx}
    \starttabulate[||||]
        \xmlflushsorter{btx}{btx:entry:flush}
    \stoptabulate
\stopxmlsetups
\startxmlsetups btx:entry:flush
    \NC \xmlfilter{#1}{/field[@name='year'  ]/context()}
    \NC \xmlatt{#1}{tag}
    \NC \xmlfilter{#1}{/field[@name='author']/context()}
    \NC \NR
\stopxmlsetups
\xmlsetup{btx:tugboat}{btx:sorter}
1984  Knuth:TB5-1-67    Don Knuth
1984  Knuth:TB5-1-4     Donald E. Knuth
1984  Knuth:TB5-2-105   Donald E. Knuth
1985  Knuth:TB6-1-36    Donald E. Knuth
1986  Knuth:TB7-2-101   Donald E. Knuth
1987  Knuth:TB8-2-135   Donald E. Knuth
1987  Knuth:TB8-3-309   Donald E. Knuth
1988  Knuth:TB9-2-152   Donald E. Knuth
1989  Knuth:TB10-3-325  Donald E. Knuth
1989  Knuth:TB10-4-529  Donald E. Knuth
1990  Knuth:TB11-4-489  Donald E. Knuth
1993  Knuth:TB14-4-387  Donald E. Knuth
1996  Knuth:TB17-1-29   Donald E. Knuth
1987  Knuth:TB8-1-14    Donald Knuth and Pierre MacKay
1981  Knuth:TB2-3-5     Donald Knuth
1982  Knuth:TB3-1-10    Donald Knuth
1983  Knuth:TB4-2-64    Donald Knuth
1986  Knuth:TB7-2-95    Donald Knuth
1987  Knuth:TB8-1-6     Donald Knuth
1987  Knuth:TB8-1-73    Donald Knuth
1987  Knuth:TB8-2-210   Donald Knuth
1987  Knuth:TB8-2-217   Donald Knuth
1989  Knuth:TB10-1-8    Donald Knuth
1989  Knuth:TB10-1-31   Donald Knuth
1990  Knuth:TB11-1-13   Donald Knuth
1990  Knuth:TB11-2-165  Donald Knuth
1990  Knuth:TB11-4-497  Donald Knuth
1990  Knuth:TB11-4-499  Donald Knuth
1991  Knuth:TB12-2-313  Donald Knuth
2010  Knuth:TB31-2-121  Donald Knuth

The original data is stored in a Lua table, hashed by tag. Starting with Lua 5.2 each run of Lua gets a different ordering of such a hash. In older versions, when you looped over a hash, the order was undefined, but it stayed the same as long as you used the same binary. This had the advantage that successive runs, something we often have in document processing, gave consistent results. In today’s Lua we need to do much more sorting of hashes before we loop, especially when we save multi-pass data. It is for this reason that the xml tree is sorted by hash key by default. That way lookups (especially the first of a set) give consistent outcomes.
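Looping over a hash in a predictable order takes only a few lines of plain Lua. The sortedpairs function below is a hypothetical helper that shows the principle; it assumes that all keys are strings (or all numbers) so that they can be compared.

```lua
-- Iterate over a table in the order of its sorted keys.
local function sortedpairs(t)
    local keys = { }
    for key in pairs(t) do
        keys[#keys+1] = key
    end
    table.sort(keys)
    local i = 0
    return function()
        i = i + 1
        local key = keys[i]
        if key ~= nil then
            return key, t[key]
        end
    end
end

local luadata = {
    ["demo-003"] = { title = "Typesetting education documents" },
    ["demo-001"] = { title = "A first title" },
    ["demo-002"] = { title = "A second title" },
}

for tag, entry in sortedpairs(luadata) do
    print(tag, entry.title)
end
```

Whatever order the plain pairs iterator happens to use in a given run, the loop above always visits the tags in the same sequence.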


Standards

The rendering of bibliographic entries is often standardized and prescribed by the publisher. If you submit an article to a journal, normally it will be reformatted (or even re-keyed) and the rendering will happen at the publisher’s end. In that case it may not matter how entries were rendered when writing the publication, because the publisher will do it his or her way. This means that most users probably will stick to the standard apa rules and for them we provide some configuration. Because we use setups it is easy to overload specifics. If you really want to tweak, best look in the files that deal with it.

Many standards exist and support for other renderings may be added to the core. Interested users are invited to develop and to test alternate standard renderings according to their needs.

Todo: maybe a list of categories and fields.


Cleaning up

Although the bibTEX format is reasonably well defined, in practice there are many ways to organize the data. For instance, one can use predefined string constants that get used (whether or not combined with other strings) later on. A string can be enclosed in curly braces or double quotes. The strings can contain TEX commands but these are not standardized. The databases often have somewhat complex ways to deal with special characters and the use of braces in their definition is also not normalized.

The most complex to deal with are the fields that contain names of people. At some point it might be necessary to split a combination of names into individual ones, which then get split into title, first name, optional inbetweens, surname(s) and additional parts: Prof. Dr. Alfred B. C. von Kwik Kwak Jr. II and P. Q. Olet is just one example of this. The convention seems to be to separate names not with commas but with the word and (often each name will be specified as lastname, firstname).
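A much simplified version of such name splitting is sketched below in plain Lua. The functions splitauthors and parsename are hypothetical; they handle only the two simple forms (firstnames surname, and surname followed by a comma and the first names) and know nothing about titles, von parts or juniors.

```lua
-- Split a string into words.
local function words(str)
    local list = { }
    for word in str:gmatch("%S+") do
        list[#list+1] = word
    end
    return list
end

-- Split an author field on the word "and" surrounded by whitespace.
local function splitauthors(str)
    local names = { }
    for name in (str .. " and "):gmatch("(.-)%s+and%s+") do
        name = name:match("^%s*(.-)%s*$") -- strip surrounding spaces
        if name ~= "" then
            names[#names+1] = name
        end
    end
    return names
end

-- Split one name into first names, initials and surnames.
local function parsename(name)
    local last, first = name:match("^(.-)%s*,%s*(.*)$")
    local firstnames, surnames
    if last then
        firstnames, surnames = words(first), words(last)
    else
        firstnames = words(name)
        surnames   = { table.remove(firstnames) } -- the last word
    end
    local initials = { }
    for i, firstname in ipairs(firstnames) do
        initials[i] = firstname:sub(1, 1)
    end
    return {
        original   = name,
        firstnames = firstnames,
        initials   = initials,
        surnames   = surnames,
    }
end

for _, name in ipairs(splitauthors("Hans Hagen and Otten, Ton")) do
    local parsed = parsename(name)
    print(table.concat(parsed.surnames, " "), table.concat(parsed.initials, " "))
end
```

A name such as the one with von Kwik Kwak above would not be split correctly by this sketch, which illustrates why consistent databases give better results.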

We don’t see it as a challenge nor as a duty to support all kinds of messy definitions. Of course we try to be somewhat tolerant, but you will be sure to get better results if you use nicely set up, consistent databases.

Todo: maybe some examples of bad.


Transition

In the original bibliography support module usage was as follows (example taken from the contextgarden wiki):

% engine=pdftex
\usemodule[bib]
\usemodule[bibltx]
\setupbibtex
  [database=xampl]
\setuppublications
  [numbering=yes]
\starttext
    As \cite [article-full] already indicated, bibtex is a \LATEX||centric
    program.
    \completepublications
\stoptext

For MkIV the modules were partly rewritten and ended up in the core, so the two \usemodule commands were no longer needed. The overhead associated with the automatic loading of the bibliography macros is negligible these days, so standardized modules such as bib are all being moved to the core and do not need to be explicitly loaded.

The first setup command, \setupbibtex, is needed to bootstrap the process: it tells which database has to be processed by bibTEX between runs. The second one, \setuppublications, is optional. Each citation (tagged with \cite) ends up in the list of publications.

In the new approach we no longer use bibTEX so we don’t need to set up bibTEX. Instead we define dataset(s). We also no longer set up publications with one command, but have split that up into rendering, list, and cite variants. The basic \cite command remains. The above example becomes:

\definebtxdataset
  [document]
\usebtxdataset
  [document]
  [mybibfile.bib]
\definebtxrendering
  [document]
\setupbtxrendering
  [document]
  [numbering=yes]
\starttext
    As \cite [article-full] already indicated, bibtex is a \LATEX||centric
    program.
    \completebtxrendering[document]
\stoptext

So, we have a few more commands to set up things. If you intend to use just a single dataset and rendering, the above preamble can be simplified to:

\usebtxdataset
  [mybibfile.bib]
\setupbtxrendering
  [numbering=yes]

But keep in mind that, compared to the old MkII-derived method, we have moved some of the options to the rendering, list and cite setup variants.

Another difference is the use of lists. When you define a rendering, you also define a list. However, all entries are collected in a common list tagged btx. Although you will normally configure a rendering, you can still set some properties of lists, but in that case you need to prefix the list identifier. In the case of the above example this is btx:document.


MLBIBTEX

Todo: how to plug in MLbibTEX for sorting and other advanced operations.


Extensions

As TEX and Lua are both open and accessible in ConTEXt, it is possible to extend the functionality of the bibliography-related code. For instance, you can add extra loaders.

function publications.loaders.myformat(dataset,filename)
    local t = { }
    -- Load data from 'filename' and convert it to a Lua table 't' with
    -- the key as hash entry and fields conforming the luadata table
    -- format.
    publications.loaders.lua(dataset,t)
end

This then permits loading a database (into a dataset) with the command:

\usebtxdataset[standard][myfile.myformat]

The myformat suffix is recognized automatically. If you want to use another suffix, you can do this:

\usebtxdataset[standard][myformat::myfile.txt]
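
The loader skeleton above can be fleshed out as follows. This is only a sketch under stated assumptions: the input format (one entry per line, written as tag|author|title|year) is invented for this example, the helper name parsemyformat is hypothetical, the fixed category value is a placeholder, and the file is assumed to be found at the given path. The only ConTEXt calls used are the ones that appear in the skeleton.

```lua
-- Hypothetical input format, one entry per line:  tag|author|title|year
-- The result is a table with the tag as hash key and the fields as
-- subtable, which is what the skeleton hands over to the lua loader.
local function parsemyformat(data)
    local t = { }
    for line in data:gmatch("[^\r\n]+") do
        local tag, author, title, year =
            line:match("^([^|]+)|([^|]*)|([^|]*)|([^|]*)$")
        if tag then
            t[tag] = {
                tag      = tag,
                category = "article", -- assumption: the toy format has no category column
                author   = author,
                title    = title,
                year     = year,
            }
        end
        -- lines that do not have exactly four columns are silently skipped
    end
    return t
end

-- Register the loader only when running inside ConTeXt MkIV, so that the
-- parser itself can also be tried out with a standalone Lua interpreter.
if publications and publications.loaders then
    function publications.loaders.myformat(dataset,filename)
        local f = io.open(filename,"r")
        if not f then
            return -- assumption: a missing file is simply ignored
        end
        local data = f:read("*a")
        f:close()
        publications.loaders.lua(dataset,parsemyformat(data))
    end
end
```

Keeping the parsing in a separate function that works on a string makes it possible to test the conversion outside a TEX run.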


Notes

The move from external bibTEX processing to internal processing has the advantage that we stay within the same run. In the traditional approach we had roughly the following steps:

in the first run, information is collected and written to a file
after that run, the bibTEX program converts that file to another one
successive runs use that data for references and for producing lists

In the MkIV approach the bibliographic database is loaded into memory in each run, and processing also happens in each run. On paper this looks less efficient, but as Lua is quite fast, in practice performance is much better.

Probably most demanding is the treatment of authors, as we have to analyze names, split multiple authors and reassemble firstnames, vons, surnames and juniors. When we sort by author, sorting vectors have to be made, which also has a penalty. However, in practice the user will not notice a performance degradation. We did some tests with a list of 500,000 authors, sorted them and typeset them as a list (producing some 5400 dense pages in a small font and with small margins). This is typically one of those cases where using LuajitTEX saves quite some time. On my machine it took just over 100 seconds to get this done. Unfortunately not all operating systems performed equally well: 32-bit versions worked fine, but 64-bit Linux either crashed (stalled) the machine or ran out of memory rather fast, while Mac OS X and Windows performed fine. In practice you will never run into this, unless you produce massive amounts of bibliographic entries. LuaJIT has some benefits but also some drawbacks.