Open main menu

Changes

no edit summary
=The data­base=
The bibTEX for­mat is rather pop­u­lar in the TEX com­mu­nity and even with its short­com­ings it will stay around for a while. Many pub­li­ca­tion web­sites can ex­port and many tools are avail­able to work with this data­base for­mat. It is rather sim­ple and looks a bit like Lua ta­bles. Un­for­tu­nately the con­tent can be pol­luted with non-stan­dard­ized TEX com­mands which com­pli­cates pre- or post­pro­cess­ing out­side TEX . In that sense a bibTEX data­base is of­ten not coded neu­trally. Some lim­i­ta­tions, like the use of com­mands to en­code ac­cented char­ac­ters root in the ascii world and can be by­passed by us­ing utf in­stead (as han­dled some­what in LATEX through ex­ten­sions such as <tt>bibtex8</tt>).
<br/>
The nor­mal way to deal with a bib­li­og­ra­phy is to re­fer to en­tries us­ing a unique tag or key. When a list of en­tries is type­set, this ref­er­ence can be used for link­ing pur­poses. The type­set list can be processed and sorted us­ing the <tt>bibtex</tt> pro­gram that con­verts the data­base into some­thing more TEX friendly (a <tt>.bbl</tt> file). I never used the pro­gram my­self (nor bib­li­ogra­phies) so I will not go into too much de­tail here, if only be­cause all I say can be wrong.
<br/>
In ConTEXt we no longer use the <tt>bibtex</tt> pro­gram: we just use data­base files and deal with the nec­es­sary ma­nip­u­la­tions di­rectly in ConTEXt . One or more such data­bases can be used and com­bined with ad­di­tional en­tries de­fined within the doc­u­ment. We can have sev­eral such datasets ac­tive at the same time.
<br/>
A bibTEX file looks like this:
<pre detail='typing'>
@Article{sometag,
</pre>
<br/>
Nor­mally a value is given be­tween quotes (or curly brack­ets) but sin­gle words are also OK (there is no real ben­e­fit in not us­ing quotes, so we ad­vise to al­ways use them). There can be many more fields and in­stead of strings one can use pre­de­fined short­cuts. The ti­tle for ex­am­ple quite of­ten con­tains TEX macros. Some fields, like <tt>pages</tt> have funny char­ac­ters such as the en­dash (typ­i­cally as <tt>--</tt>) so we have a mix­ture of data and type­set­ting di­rec­tives. If you are cov­er­ing non--eng­lish ref­er­ences, you of­ten need char­ac­ters that are not in the ascii sub­set but ConTEXt is quite happy with utf . If your data­base file uses old-fash­ioned TEX ac­cent com­mands then these will be in­ter­nally con­verted au­to­mat­i­cally to utf . Com­mands (macros) are con­verted to an in­di­rect call, which is quite ro­bust.
<br/>
The bibTEX files are loaded in mem­ory as Lua ta­ble but can be con­verted to xml so that we can ac­cess them in a more flex­i­ble way, but that is a sub­ject for spe­cial­ists.
<br/>
In the old MkII setup we have two kinds of en­tries: the ones that come from the bibTEX run and user sup­plied ones. We no longer rely on bibTEX out­put but we do still sup­port the user sup­plied de­f­i­n­i­tions. These were in fact pre­pared in a way that suits the pro­cess­ing of bibTEX gen­er­ated en­tries. The next vari­ant re­flects the ConTEXt re­cod­ing of the old bibTEX out­put.
<pre detail='typing'>
\startpublication[k=Hagen:Second,t=article,a={Hans Hagen},y=2013,s=HH01]
</pre>
<br/>
The split <tt>\artauthor</tt> fields are col­lapsed into a sin­gle <tt>author</tt> field as we deal with the split­ting later when it gets parsed in Lua . The <tt>\artauthor</tt> syn­tax is only kept around for back­ward com­pat­i­bil­ity with the pre­vi­ous use of bibTEX .
<br/>
In the new setup we sup­port these vari­ants as well:
</pre>
<br/>
Be­cause in­ter­nally the en­tries are Lua ta­bles, we also sup­port load­ing of Lua based de­f­i­n­i­tions:
<pre detail='typing'>
return {
</pre>
<br/>
Files set up like this can be loaded too. The fol­low­ing xml in­put is rather close to this, and is also ac­cepted as in­put.
<pre detail='typing'>
<?xml version="2.0" standalone="yes" ?>
=Com­mands in en­tries=
One un­for­tu­nate as­pect com­monly found in bibTEX files is that they of­ten con­tain TEX com­mands. Even worse is that there is no stan­dard on what these com­mands can be and what they mean, at least not for­mally, as bibTEX is a pro­gram in­tended to be used with many vari­ants of TEX style: plain, LATEX , and oth­ers. This means that we need to de­fine our use of these type­set­ting com­mands. How­ever, in most cases, they are just ab­bre­vi­a­tions or font switches and these are of­ten known. There­fore, ConTEXt will try to re­solve them be­fore re­port­ing an is­sue. In the log file there is a list of com­mands that has been seen in the loaded data­bases. For in­stance, load­ing <tt>tugboat.bib</tt> gives a long list of com­mands of which we show a small set here:
<pre detail='typing'>
publications > start used btx commands
</pre>
<br/>
These three suf­fixes are un­der­stood by the loader. Here the dataset has the name <tt>standard</tt> and the three data­base files are merged, where later en­tries hav­ing the same tag over­load pre­vi­ous ones. De­f­i­n­i­tions in the doc­u­ment source (coded in TEX speak) are also added, and they are saved for suc­ces­sive runs. This means that if you load and de­fine en­tries, they will be known at a next run be­fore­hand, so that ref­er­ences to them are in­de­pen­dent of when load­ing and de­f­i­n­i­tions take place.
<br/>
Be­cause we are deal­ing with data­base in­put and be­cause we gen­er­ally need to ma­nip­u­late en­tries, much of the work is del­e­gated to Lua . This makes it eas­ier to main­tain and ex­tend the code. Of course TEX still does the ren­der­ing. The ty­po­graphic de­tails are con­trolled by pa­ra­me­ters but not all are used in all vari­ants. As with most ConTEXt com­mands, it starts out with a gen­eral setup com­mand:
</pre>
<br/>
You can over­load such se­tups if needed, but that only makes sense when you can­not con­fig­ure the ren­der­ing with pa­ra­me­ters. The <tt>\btxcitevariant</tt> com­mand is one of the build in ac­ces­sors and it calls out to Lua where more com­plex ma­nip­u­la­tion takes place if needed. If no ma­nip­u­la­tion is known, the field with the same name (if found) will be flushed. A com­mand like <tt>\btxcitevariant</tt> as­sumes that a dataset and spe­cific tag has been set. This is nor­mally done in the wrap­per macros, like <tt>\cite</tt>. For spe­cial pur­poses you can use these com­mands
<pre detail='typing'>
\setbtxdataset[example]
=The LUA view=
Be­cause we man­age data at the Lua end it is tempt­ing to ac­cess it there for other pur­poses. This is fine as long as you keep in mind that as­pects of the im­ple­men­ta­tion may change over time, al­though this is un­likely once the mod­ules be­come sta­ble.
<br/>
The en­tries are col­lected in datasets and each set has a unique name. In this doc­u­ment we have the set named <tt>example</tt>. A dataset ta­ble has sev­eral fields, and prob­a­bly the one of most in­ter­est is the <tt>luadata</tt> field. Each en­try in this ta­ble de­scribes a pub­li­ca­tion:
These de­tails are ac­cessed as <tt>publications.datasets.example.details["demo-001"]</tt> and by us­ing a sep­a­rate ta­ble we can over­load fields in the orig­i­nal en­try with­out los­ing the orig­i­nal.
<br/>
You can loop over the en­tries us­ing reg­u­lar Lua code com­bined with MkIV helpers:
<pre detail='buffer'>
local dataset = publications.datasets.example
|
bibTEX , the ConTEXt way
|
|
bibTEX , the ConTEXt way
|
=The XML view=
The <tt>luadata</tt> ta­ble can be con­verted into an xml rep­re­sen­ta­tion. This is a fol­low up on ear­lier ex­per­i­ments with an xml -only ap­proach. I de­cided in the end to stick to a Lua ap­proach and pro­vide some sim­ple xml sup­port in ad­di­tion.
<br/>
Once a dataset is ac­ces­si­ble as xml tree, you can use the reg­u­lar <tt>\xml...</tt> com­mands. We start with load­ing a dataset, in this case from just one file.
<pre detail='buffer'>
\usebtxdataset[tugboat][tugboat.bib]
</pre>
<br/>
The dataset has to be con­verted to xml :
<pre detail='buffer'>
\convertbtxdatasettoxml[tugboat]
<br/>
A more ex­ten­sive ex­am­ple is the fol­low­ing. Of course this as­sumes that you know what xml sup­port mech­a­nisms and macros are avail­able.
<pre detail='buffer'>
\startxmlsetups btx:getkeys
<br/>
The orig­i­nal data is stored in a Lua ta­ble, hashed by tag. Start­ing with Lua 5.2 each run of Lua gets a dif­fer­ent or­der­ing of such a hash. In older ver­sions, when you looped over a hash, the or­der was un­de­fined, but the same as long as you used the same bi­nary. This had the ad­van­tage that suc­ces­sive runs, some­thing we of­ten have in doc­u­ment pro­cess­ing gave con­sis­tent re­sults. In to­day’s Lua we need to do much more sort­ing of hashes be­fore we loop, es­pe­cially when we save multi--pass data. It is for this rea­son that the xml tree is sorted by hash key by de­fault. That way lookups (es­pe­cially the first of a set) give con­sis­tent out­comes.
=Stan­dards=
The ren­der­ing of bib­li­o­graphic en­tries is of­ten stan­dard­ized and pre­scribed by the pub­lisher. If you sub­mit an ar­ti­cle to a jour­nal, nor­mally it will be re­for­mat­ted (or even re- keyed) and the ren­der­ing will hap­pen at the pub­lish­ers end. In that case it may not mat­ter how en­tries were ren­dered when writ­ing the pub­li­ca­tion, be­cause the pub­lisher will do it his or her way. This means that most users prob­a­bly will stick to the stan­dard apa rules and for them we pro­vide some con­fig­u­ra­tion. Be­cause we use se­tups it is easy to over­load specifics. If you re­ally want to tweak, best look in the files that deal with it.
<br/>
Many stan­dards ex­ist and sup­port for other ren­der­ings may be added to the core. In­ter­ested users are in­vited to de­velop and to test al­ter­nate stan­dard ren­der­ings ac­cord­ing to their needs.
=Clean­ing up=
Al­though the bibTEX for­mat is rea­son­ably well de­fined, in prac­tice there are many ways to or­ga­nize the data. For in­stance, one can use pre­de­fined string con­stants that get used (ei­ther or not com­bined with other strings) later on. A string can be en­closed in curly braces or dou­ble quotes. The strings can con­tain TEX com­mands but these are not stan­dard­ized. The data­bases of­ten have some­what com­plex ways to deal with spe­cial char­ac­ters and the use of braces in their de­f­i­n­i­tion is also not nor­mal­ized.
<br/>
The most com­plex to deal with are the fields that con­tain names of peo­ple. At some point it might be needed to split a com­bi­na­tion of names into in­di­vid­ual ones that then get split into ti­tle, first name, op­tional in­be­tweens, sur­name(s) and ad­di­tional: <tt>Prof. Dr. Alfred B. C. von Kwik Kwak Jr. II and P. Q. Olet</tt> is just one ex­am­ple of this. The con­ven­tion seems to be not to use com­mas but <tt>and</tt> to sep­a­rate names (of­ten each name will be spec­i­fied as last­name, first­name).
</pre>
<br/>
For MkIV the mod­ules were partly rewrit­ten and ended up in the core so the two com­mands are not needed there. One ad­van­tage of ex­plic­itly load­ing a mod­ule is that a job that doesn’t need ref­er­ences to pub­li­ca­tions doesn’t suf­fer from the as­so­ci­ated over­head. Nowa­days this over­head can be ne­glected. The first setup com­mand in this ex­am­ple is needed to boot­strap the process: it tells what data­base has to be processed by bibTEX be­tween runs. The sec­ond setup com­mand is op­tional. Each ci­ta­tion (tagged with <tt>\cite</tt>) ends up in the list of pub­li­ca­tions.
<br/>
In the new ap­proach again the code is in the ConTEXt ker­nel, so no mod­ules need to be loaded. But, as we no longer use bibTEX , we don’t need to setup bibTEX . In­stead we de­fine dataset(s). We also no longer set up pub­li­ca­tions with one com­mand, but have split that up in ren­der­ing-, list-, and cite-vari­ants. The ba­sic <tt>\cite</tt> com­mand re­mains.
<pre detail='typing'>
\definebtxdataset
</pre>
<br/>
But keep in mind, that com­pared to the old MkII de­rived method we have moved some of the setup op­tions to set­ting up the list and cite vari­ants.
<br/>
An­other dif­fer­ence is the use of lists. When you de­fine a ren­der­ing, you also de­fine a list. How­ever, all en­tries are col­lected in a com­mon list tagged <tt>btx</tt>. Al­though you will nor­mally con­fig­ure a ren­der­ing you can still set some prop­er­ties of lists, but in that case you need to pre­fix the list iden­ti­fier. In the case of the above ex­am­ple this is <tt>btx:document</tt>.
=ML­BIBTEX=
Todo: how to plug in ML­bibTEX for sort­ing and other ad­vanced op­er­a­tions.
=Ex­ten­sions=
As TEX and Lua are both open and ac­ces­si­ble in ConTEXt it is pos­si­ble to ex­tend the func­tion­al­ity of the bib­li­og­ra­phy re­lated code. For in­stance, you can add ex­tra load­ers.
<pre detail='typing'>
function publications.loaders.myformat(dataset,filename)