==The database==
The <span tag="MKIV" style="font-style:sans;">bibTEX <span > format is rather popular in the <span tag="MKIV" style="font-style:sans;">TEX <span > community and even with its shortcomings it will stay around for a while. Many publication websites can export to it, and many tools are available to work with this database format. It is rather simple and looks a bit like <span tag="MKIV" style="font-style:sans;">Lua <span > tables. Unfortunately the content can be polluted with non-standardized <span tag="MKIV" style="font-style:sans;">TEX <span > commands, which complicates pre- or postprocessing outside <span tag="MKIV" style="font-style:sans;">TEX<span >. In that sense a <span tag="MKIV" style="font-style:sans;">bibTEX <span > database is often not coded neutrally. Some limitations, like the use of commands to encode accented characters, are rooted in the <span tag="MKIV" style="font-style:sans;">ascii <span > world and can be bypassed by using <span tag="MKIV" style="font-style:sans;">utf <span > instead (as handled somewhat in <span tag="MKIV" style="font-style:sans;">LATEX <span > through extensions such as <tt style="color:rgb(0,102,102);font-size:120%;" >bibtex8</tt>).
<br/>
The normal way to deal with a bibliography is to refer to entries using a unique tag or key. When a list of entries is typeset, this reference can be used for linking purposes. The typeset list can be processed and sorted using the <tt style="color:rgb(0,102,102);font-size:120%;" >bibtex</tt> program that converts the database into something more <span tag="MKIV" style="font-style:sans;">TEX <span > friendly (a <tt style="color:rgb(0,102,102);font-size:120%;" >.bbl</tt> file). I never used the program myself (nor bibliographies) so I will not go into too much detail here, if only because all I say can be wrong.
<br/>
In <span tag="MKIV" style="font-style:sans;">ConTEXt <span > we no longer use the <tt style="color:rgb(0,102,102);font-size:120%;" >bibtex</tt> program: we just use database files and deal with the necessary manipulations directly in <span tag="MKIV" style="font-style:sans;">ConTEXt<span >. One or more such databases can be used and combined with additional entries defined within the document. We can have several such datasets active at the same time.
<br/>
A <span tag="MKIV" style="font-style:sans;">bibTEX <span > file looks like this:
<pre style="color:rgb(102,0,102);font-size:120%">@Article{sometag,
</pre>
<br/>
Normally a value is given between quotes (or curly brackets) but single words are also OK (there is no real benefit in not using quotes, so we advise always using them). There can be many more fields and instead of strings one can use predefined shortcuts. The title for example quite often contains <span tag="MKIV" style="font-style:sans;">TEX <span > macros. Some fields, like <tt style="color:rgb(0,102,102);font-size:120%;" >pages</tt>, have funny characters such as the endash (typically as <tt style="color:rgb(0,102,102);font-size:120%;" >--</tt>) so we have a mixture of data and typesetting directives. If you are covering non-English references, you often need characters that are not in the <span tag="MKIV" style="font-style:sans;">ascii <span > subset but <span tag="MKIV" style="font-style:sans;">ConTEXt <span > is quite happy with <span tag="MKIV" style="font-style:sans;">utf<span >. If your database file uses old-fashioned <span tag="MKIV" style="font-style:sans;">TEX <span > accent commands then these will be internally converted automatically to <span tag="MKIV" style="font-style:sans;">utf<span >. Commands (macros) are converted to an indirect call, which is quite robust.
<br/>
The <span tag="MKIV" style="font-style:sans;">bibTEX <span > files are loaded in memory as a <span tag="MKIV" style="font-style:sans;">Lua <span > table but can be converted to <span tag="MKIV" style="font-style:sans;">xml <span > so that we can access them in a more flexible way, but that is a subject for specialists.
<br/>
In the old <span tag="MKIV" style="font-style:sans;">MkII <span > setup we have two kinds of entries: the ones that come from the <span tag="MKIV" style="font-style:sans;">bibTEX <span > run and user supplied ones. We no longer rely on <span tag="MKIV" style="font-style:sans;">bibTEX <span > output but we do still support the user supplied definitions. These were in fact prepared in a way that suits the processing of <span tag="MKIV" style="font-style:sans;">bibTEX <span > generated entries. The next variant reflects the <span tag="MKIV" style="font-style:sans;">ConTEXt <span > recoding of the old <span tag="MKIV" style="font-style:sans;">bibTEX <span > output.
<pre style="color:rgb(102,0,102);font-size:120%">\startpublication[k=Hagen:Second,t=article,a={Hans Hagen},y=2013,s=HH01]
</pre>
<br/>
The split <tt style="color:rgb(0,102,102);font-size:120%;" >\artauthor</tt> fields are collapsed into a single <tt style="color:rgb(0,102,102);font-size:120%;" >author</tt> field as we deal with the splitting later when it gets parsed in <span tag="MKIV" style="font-style:sans;">Lua<span >. The <tt style="color:rgb(0,102,102);font-size:120%;" >\artauthor</tt> syntax is only kept around for backward compatibility with the previous use of <span tag="MKIV" style="font-style:sans;">bibTEX<span >.
<br/>
In the new setup we support these variants as well:
</pre>
<br/>
Because internally the entries are <span tag="MKIV" style="font-style:sans;">Lua <span > tables, we also support loading of <span tag="MKIV" style="font-style:sans;">Lua <span > based definitions:
<pre style="color:rgb(102,0,102);font-size:120%">return {
</pre>
<br/>
Files set up like this can be loaded too. The following <span tag="MKIV" style="font-style:sans;">xml <span > input is rather close to this, and is also accepted as input.
<pre style="color:rgb(102,0,102);font-size:120%"><?xml version="2.0" standalone="yes" ?>
==Commands in entries==
One unfortunate aspect commonly found in <span tag="MKIV" style="font-style:sans;">bibTEX <span > files is that they often contain <span tag="MKIV" style="font-style:sans;">TEX <span > commands. Even worse is that there is no standard on what these commands can be and what they mean, at least not formally, as <span tag="MKIV" style="font-style:sans;">bibTEX <span > is a program intended to be used with many variants of <span tag="MKIV" style="font-style:sans;">TEX <span > style: plain, <span tag="MKIV" style="font-style:sans;">LATEX<span >, and others. This means that we need to define our use of these typesetting commands. However, in most cases, they are just abbreviations or font switches and these are often known. Therefore, <span tag="MKIV" style="font-style:sans;">ConTEXt <span > will try to resolve them before reporting an issue. In the log file there is a list of commands that have been seen in the loaded databases. For instance, loading <tt style="color:rgb(0,102,102);font-size:120%;" >tugboat.bib</tt> gives a long list of commands of which we show a small set here:
<pre style="color:rgb(102,0,102);font-size:120%">publications > start used btx commands
</pre>
<br/>
These three suffixes are understood by the loader. Here the dataset has the name <tt style="color:rgb(0,102,102);font-size:120%;" >standard</tt> and the three database files are merged, where later entries having the same tag overload previous ones. Definitions in the document source (coded in <span tag="MKIV" style="font-style:sans;">TEX <span > speak) are also added, and they are saved for successive runs. This means that if you load and define entries, they will be known at a next run beforehand, so that references to them are independent of when loading and definitions take place.
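<br/>
The overloading rule described above can be illustrated with a few lines of plain Lua. This is only a minimal sketch, not the actual loader: the helper <tt>merge_sources</tt> and the sample tags are hypothetical, and each source is assumed to be a table indexed by tag.

```lua
-- Hypothetical helper, not part of ConTeXt: merge several tag-indexed
-- tables into one dataset, where a later source overloads an earlier one.
local function merge_sources(...)
    local merged = { }
    for i = 1, select("#", ...) do
        local source = select(i, ...)
        for tag, entry in pairs(source) do
            merged[tag] = entry -- same tag: the later entry wins
        end
    end
    return merged
end

-- Two made-up sources that share the tag "demo-001".
local first  = {
    ["demo-001"] = { title = "Old title" },
    ["demo-002"] = { title = "Kept" },
}
local second = {
    ["demo-001"] = { title = "New title" },
}

local standard = merge_sources(first, second)

print(standard["demo-001"].title) -- New title
print(standard["demo-002"].title) -- Kept
```

The order of the arguments matters here in the same way as the order of loading matters for a dataset: whatever comes last determines the entry that is kept for a given tag.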
<div style="border:thin solid black;" >
<span style="font-style:oblique;" > setup definition setupbtxdataset </span >
<br/>
For reasons of backward compatibility the <tt style="color:rgb(0,102,102);font-size:120%;" >\cite</tt> command is a bit picky about spaces between the two arguments, of which the first is optional. This is a consequence of allowing its use with the key specified between curly brackets as is the traditional practice. (We do encourage users to adopt the more coherent <span tag="MKIV" style="font-style:sans;">ConTEXt <span > syntax by using square brackets for keywords and reserving curly brackets to regroup text to be typeset.)
<br/>
The <tt style="color:rgb(0,102,102);font-size:120%;" >\citation</tt> command is synonymous but is more flexible with respect to spacing of its arguments:
<br/>
Because we are dealing with database input and because we generally need to manipulate entries, much of the work is delegated to <span tag="MKIV" style="font-style:sans;">Lua<span >. This makes it easier to maintain and extend the code. Of course <span tag="MKIV" style="font-style:sans;">TEX <span > still does the rendering. The typographic details are controlled by parameters but not all are used in all variants. As with most <span tag="MKIV" style="font-style:sans;">ConTEXt <span > commands, it starts out with a general setup command:
<div style="border:thin solid black;" >
<span style="font-style:oblique;" > setup definition setupbtxcitevariant </span >
</pre>
<br/>
You can overload such setups if needed, but that only makes sense when you cannot configure the rendering with parameters. The <tt style="color:rgb(0,102,102);font-size:120%;" >\btxcitevariant</tt> command is one of the built-in accessors and it calls out to <span tag="MKIV" style="font-style:sans;">Lua <span > where more complex manipulation takes place if needed. If no manipulation is known, the field with the same name (if found) will be flushed. A command like <tt style="color:rgb(0,102,102);font-size:120%;" >\btxcitevariant</tt> assumes that a dataset and specific tag have been set. This is normally done in the wrapper macros, like <tt style="color:rgb(0,102,102);font-size:120%;" >\cite</tt>. For special purposes you can use these commands:
<pre style="color:rgb(102,0,102);font-size:120%">\setbtxdataset[example]
==The LUA view==
Because we manage data at the <span tag="MKIV" style="font-style:sans;">Lua <span > end it is tempting to access it there for other purposes. This is fine as long as you keep in mind that aspects of the implementation may change over time, although this is unlikely once the modules become stable.
<br/>
The entries are collected in datasets and each set has a unique name. In this document we have the set named <tt style="color:rgb(0,102,102);font-size:120%;" >example</tt>. A dataset table has several fields, and probably the one of most interest is the <tt style="color:rgb(0,102,102);font-size:120%;" >luadata</tt> field. Each entry in this table describes a publication:
These details are accessed as <tt style="color:rgb(0,102,102);font-size:120%;" >publications.datasets.example.details["demo-001"]</tt> and by using a separate table we can overload fields in the original entry without losing the original.
<br/>
You can loop over the entries using regular <span tag="MKIV" style="font-style:sans;">Lua <span > code combined with <span tag="MKIV" style="font-style:sans;">MkIV <span > helpers:
<pre style="color:rgb(102,0,102);font-size:120%">local dataset = publications.datasets.example
|
<span tag="MKIV" style="font-style:sans;">bibTEX<span >, the <span tag="MKIV" style="font-style:sans;">ConTEXt <span > way
|
==The XML view==
The <tt style="color:rgb(0,102,102);font-size:120%;" >luadata</tt> table can be converted into an <span tag="MKIV" style="font-style:sans;">xml <span > representation. This is a follow up on earlier experiments with an <span tag="MKIV" style="font-style:sans;">xml<span >-only approach. I decided in the end to stick to a <span tag="MKIV" style="font-style:sans;">Lua <span > approach and provide some simple <span tag="MKIV" style="font-style:sans;">xml <span > support in addition.
<br/>
Once a dataset is accessible as an <span tag="MKIV" style="font-style:sans;">xml <span > tree, you can use the regular <tt style="color:rgb(0,102,102);font-size:120%;" >\xml...</tt> commands. We start with loading a dataset, in this case from just one file.
<pre style="color:rgb(102,0,102);font-size:120%">\usebtxdataset[tugboat][tugboat.bib]
</pre>
<br/>
The dataset has to be converted to <span tag="MKIV" style="font-style:sans;">xml<span >:
<pre style="color:rgb(102,0,102);font-size:120%">\convertbtxdatasettoxml[tugboat]
<br/>
A more extensive example is the following. Of course this assumes that you know what <span tag="MKIV" style="font-style:sans;">xml <span > support mechanisms and macros are available.
<pre style="color:rgb(102,0,102);font-size:120%">\startxmlsetups btx:getkeys
<br/>
The original data is stored in a <span tag="MKIV" style="font-style:sans;">Lua <span > table, hashed by tag. Starting with <span tag="MKIV" style="font-style:sans;">Lua <span > 5.2 each run of <span tag="MKIV" style="font-style:sans;">Lua <span > gets a different ordering of such a hash. In older versions, when you looped over a hash, the order was undefined, but the same as long as you used the same binary. This had the advantage that successive runs, something we often have in document processing, gave consistent results. In today’s <span tag="MKIV" style="font-style:sans;">Lua <span > we need to do much more sorting of hashes before we loop, especially when we save multi-pass data. It is for this reason that the <span tag="MKIV" style="font-style:sans;">xml <span > tree is sorted by hash key by default. That way lookups (especially the first of a set) give consistent outcomes.
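<br/>
Sorting before looping, as described above, can be sketched in plain Lua as follows. This is an illustration only and not the code used in the core: the helper <tt>sorted_tags</tt> and the sample tags are hypothetical, and all keys are assumed to be strings.

```lua
-- Hypothetical helper: collect the keys of a hash and sort them, so that
-- a loop over the hash visits the entries in the same order in every run.
local function sorted_tags(hash)
    local tags = { }
    for tag in pairs(hash) do -- pairs gives no guaranteed order
        tags[#tags + 1] = tag
    end
    table.sort(tags)          -- plain string comparison
    return tags
end

-- Made-up data, hashed by tag.
local luadata = {
    ["tag-b"] = { title = "Second" },
    ["tag-c"] = { title = "Third" },
    ["tag-a"] = { title = "First" },
}

local order = sorted_tags(luadata)

for i = 1, #order do
    local tag = order[i]
    print(tag, luadata[tag].title)
end
```

Because the loop runs over the sorted list of tags instead of over the hash itself, the first entry found by a lookup is the same in each pass.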
==Standards==
The rendering of bibliographic entries is often standardized and prescribed by the publisher. If you submit an article to a journal, normally it will be reformatted (or even re-keyed) and the rendering will happen at the publisher's end. In that case it may not matter how entries were rendered when writing the publication, because the publisher will do it his or her way. This means that most users probably will stick to the standard <span tag="MKIV" style="font-style:sans;">apa <span > rules and for them we provide some configuration. Because we use setups it is easy to overload specifics. If you really want to tweak, best look in the files that deal with it.
<br/>
Many standards exist and support for other renderings may be added to the core. Interested users are invited to develop and to test alternate standard renderings according to their needs.
==Cleaning up==
Although the <span tag="MKIV" style="font-style:sans;">bibTEX <span > format is reasonably well defined, in practice there are many ways to organize the data. For instance, one can use predefined string constants that get used (whether or not combined with other strings) later on. A string can be enclosed in curly braces or double quotes. The strings can contain <span tag="MKIV" style="font-style:sans;">TEX <span > commands but these are not standardized. The databases often have somewhat complex ways to deal with special characters and the use of braces in their definition is also not normalized.
<br/>
The most complex to deal with are the fields that contain names of people. At some point it might be needed to split a combination of names into individual ones that then get split into title, first name, optional inbetweens, surname(s) and additional: <tt style="color:rgb(0,102,102);font-size:120%;" >Prof. Dr. Alfred B. C. von Kwik Kwak Jr. II and P. Q. Olet</tt> is just one example of this. The convention seems to be not to use commas but <tt style="color:rgb(0,102,102);font-size:120%;" >and</tt> to separate names (often each name will be specified as lastname, firstname).
</pre>
<br/>
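The first step of such a split, separating the names at <tt>and</tt> and then looking for a comma, can be sketched in plain Lua. This is a simplified illustration and not the parser used in the core: the functions <tt>split_names</tt> and <tt>split_comma</tt> are hypothetical, and the sketch ignores braces, so a protected name that itself contains the word <tt>and</tt> would be split wrongly.

```lua
-- Hypothetical helper: split an author field into individual names.
-- A separator is appended so that every name is followed by " and ".
local function split_names(field)
    local names  = { }
    local padded = field .. " and "
    for name in padded:gmatch("(.-)%s+and%s+") do
        names[#names + 1] = name
    end
    return names
end

-- Hypothetical helper: split "lastname, firstname" at the first comma.
-- Returns nil when the name contains no comma.
local function split_comma(name)
    return name:match("^%s*(.-)%s*,%s*(.-)%s*$")
end

local names = split_names("Prof. Dr. Alfred B. C. von Kwik Kwak Jr. II and P. Q. Olet")

for i = 1, #names do
    print(i, names[i])
end

local last, first = split_comma("Olet, P. Q.")
print(last, first)
```

Splitting a single name further into title, first names, inbetweens, surnames and additions needs more heuristics than fit in a short sketch like this.
<br/>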
For <span tag="MKIV" style="font-style:sans;">MkIV <span > the modules were partly rewritten and ended up in the core so the two commands were no longer needed. The overhead associated with the automatic loading of the bibliography macros can be neglected these days, so standardized modules such as <tt style="color:rgb(0,102,102);font-size:120%;" >bib</tt> are all being moved to the core and do not need to be explicitly loaded.
<br/>
The first <tt style="color:rgb(0,102,102);font-size:120%;" >\setupbibtex</tt> command in this example is needed to bootstrap the process: it tells what database has to be processed by <span tag="MKIV" style="font-style:sans;">bibTEX <span > between runs. The second <tt style="color:rgb(0,102,102);font-size:120%;" >\setuppublications</tt> command is optional. Each citation (tagged with <tt style="color:rgb(0,102,102);font-size:120%;" >\cite</tt>) ends up in the list of publications.
<br/>
In the new approach we no longer use <span tag="MKIV" style="font-style:sans;">bibTEX <span > so we don’t need to set up <span tag="MKIV" style="font-style:sans;">bibTEX<span >. Instead we define dataset(s). We also no longer set up publications with one command, but have split that up in rendering-, list-, and cite-variants. The basic <tt style="color:rgb(0,102,102);font-size:120%;" >\cite</tt> command remains. The above example becomes:
<pre style="color:rgb(102,0,102);font-size:120%">\definebtxdataset
</pre>
<br/>
But keep in mind that compared to the old <span tag="MKIV" style="font-style:sans;">MkII <span > derived method we have moved some of the options to the rendering, list and cite setup variants.
<br/>
Another difference is now the use of lists. When you define a rendering, you also define a list. However, all entries are collected in a common list tagged <tt style="color:rgb(0,102,102);font-size:120%;" >btx</tt>. Although you will normally configure a rendering you can still set some properties of lists, but in that case you need to prefix the list identifier. In the case of the above example this is <tt style="color:rgb(0,102,102);font-size:120%;" >btx:document</tt>.
==MLBIBTEX==
Todo: how to plug in <span tag="MKIV" style="font-style:sans;">MLbibTEX <span > for sorting and other advanced operations.
==Extensions==
As <span tag="MKIV" style="font-style:sans;">TEX <span > and <span tag="MKIV" style="font-style:sans;">Lua <span > are both open and accessible in <span tag="MKIV" style="font-style:sans;">ConTEXt <span > it is possible to extend the functionality of the bibliography related code. For instance, you can add extra loaders.
<pre style="color:rgb(102,0,102);font-size:120%">function publications.loaders.myformat(dataset,filename)
