Changes

1,951 bytes removed ,  07:15, 13 July 2023
m
→‎Undefined Commands in Lua Comments: Fix a broken link. And add enough information to be able to re-fix it easily. (I guess this last part could go in a footnote reference, but I am not sure how to do that properly…)
= Calling Lua from TeX =

As its name suggests, [[LuaTeX]] adds Lua, a programming language, to TeX, the typesetter. This allows us to program TeX in a high-level programming language. For example, consider a TeX macro that divides two numbers. Such a macro is provided by the <tt>fp</tt> package and also by the <tt>pgfmath</tt> library of the <tt>TikZ</tt> package. The following comment is from the <tt>fp</tt> package:

<texcode>
\def\FP@div#1#2.#3.#4\relax#5.#6.#7\relax{%
  % [...] algorithmic idea (for x>0, y>0)
  % - %determine \FP@shift such that
  %     y*10^\FP@shift < 100000000
  %                   <=y*10^(\FP@shift+1)
  % - %determine \FP@shift' such that
  %     x*10^\FP@shift'< 100000000
  %                   <=x*10^(\FP@shift+1)
  % - x=x*\FP@shift'
  % - y=y*\FP@shift
  % - \FP@shift=\FP@shift-\FP@shift'
  % - res=0
  % - while y>0 %fixed-point representation!
  % -   \FP@times=0
  % -   while x>y
  % -     \FP@times=\FP@times+1
  % -     x=x-y
  % -   end
  % -   y=y/10
  % -   res=10*res+\FP@times/1000000000
  % - end
  % - %shift the result according to \FP@shift
</texcode>

The <tt>pgfmath</tt> library implements the macro in a similar way, but limits the number of shifts that it does. These macros highlight the state of affairs in writing TeX macros. Even simple things like multiplying two numbers are hard; you either have to work extremely hard to circumvent the programming limitations of TeX, or, more frequently, hope that someone else has done the hard work for you. In LuaTeX, such a function can be written using the <code>/</code> operator (I will explain the details later):

<texcode>
\def\DIVIDE#1#2{\directlua{tex.print(#1/#2)}}
</texcode>
Thus, with LuaTeX ordinary users can write simple macros; and, perhaps more importantly, can read and understand macros written by TeX wizards.

The interweaving of ConTeXt and Lua consists of two elements: first you tell TeX that you're starting some Lua code; then, once inside Lua, you need to use the appropriate functions to put things into the TeX stream.
Since the LuaTeX project started it has been actively supported by ConTeXt. These days, the various <em>How do I write such a macro</em> questions on the ConTeXt mailing list are answered by a solution that uses Lua. A few such examples are presented in this article. This article focuses on how to use LuaTeX to write macros that require some <em>flow control</em>: randomized outputs, loops, and parsing. For fonts and non-Latin languages, see the [[Fonts in LuaTeX]] page.

There are two main ways to execute Lua code in a ConTeXt document: the command {{cmd|ctxlua}}, and the environment {{cmd|startluacode}}…{{cmd|stopluacode}}. Both are wrappers around the LuaTeX primitive {{cmd|directlua}}, which you should never need to use. In general, you will define a function inside a {{cmd|startluacode}} block, and then define a TeX command that calls the function using {{cmd|ctxlua}}, especially because {{cmd|ctxlua}} has a few idiosyncrasies.
<blockquote>The main thing about Lua code in a TeX document is this: the code is expanded by TeX ''before'' Lua gets to it. '''This means that all the Lua code, even the comments, must be valid TeX!''' A string like {{cmd|undefined}} will cause an immediate failure.</blockquote>

= Interaction between TeX and Lua =
To a first approximation, the interaction between TeX and Lua is straightforward. When TeX (i.e., the LuaTeX engine) starts, it loads the input file in memory and processes it token by token. When TeX encounters <code>\directlua</code>, it stops reading the file in memory, <em>fully expands the argument of <code>\directlua</code></em>, and passes the control to a Lua instance. The Lua instance, which runs with a few preloaded libraries, processes the expanded arguments of <code>\directlua</code>. This Lua instance has a special output stream which can be accessed using <code>tex.print(...)</code>. The function <code>tex.print(...)</code> is just like the Lua function <code>print(...)</code> except that <code>tex.print(...)</code> prints to a <em>TeX stream</em> rather than to the standard output. When the Lua instance finishes processing its input, it passes the contents of the <em>TeX stream</em> back to TeX.<ref>The output of <code>tex.print(...)</code> is buffered and not passed to TeX until the Lua instance has stopped.</ref> TeX then inserts the contents of the <em>TeX stream</em> at the current location of the file that it was reading; expands the contents of the <em>TeX stream</em>; and continues. If TeX encounters another <code>\directlua</code>, the above process is repeated.

As an exercise, imagine what happens when the following input is processed by LuaTeX.<ref>In this example, two different kinds of quotations are used to avoid escaping quotes. Escaping quotes inside <code>\directlua</code> is tricky. The above was a contrived example; if you ever need to escape quotes, you can use the <code>\startluacode ... \stopluacode</code> syntax explained later.</ref>

<texcode>
\directlua%
  {tex.print("Depth 1
     \\directlua{tex.print('Depth 2')}")}
</texcode>

== Calling a bit of Lua inline: {{cmd|ctxlua}} ==

The command {{cmd|ctxlua}} is for short inline snippets of Lua, such as

<texcode>
$2 + 5 \neq \ctxlua{context(3+5)}$, but is equal to \ctxlua{context(2+5)}.
This is \ctxlua{context(string.upper("absolutely"))} true.
</texcode>
On top of these LuaTeX primitives, ConTeXt provides a higher level interface. There are two ways to call Lua from ConTeXt. The first is a macro <code>\ctxlua</code> (read as ConTeXt Lua), which is similar to <code>\directlua</code>. (Aside: It is possible to run the Lua instance under different name spaces. <code>\ctxlua</code> is the default name space; other name spaces are explained later.) <code>\ctxlua</code> is good for calling small snippets of Lua. The argument of <code>\ctxlua</code> is parsed under normal TeX catcodes (category codes), so the end of line character has the same catcode as a space. This can lead to surprises. For example, if you try to use a Lua comment, everything after the comment gets ignored.

{{cmd|ctxlua}} operates under the normal TeX catcodes (category codes). This means the following two things for the Lua code inside:
* all newlines get treated as spaces
* special TeX characters like &, #, $, {, }, etc., need to be escaped.

In addition, the warning above still holds. All the Lua code, even the comments, must be valid TeX.

Some code to illustrate the newline problem:
<texcode>
\ctxlua
{-- A Lua comment
tex.print("This is not printed")}
\ctxlua
{% A Tex comment
tex.print("This is printed")}
</texcode>
This can be avoided by using a TeX comment instead of a Lua comment. However, working under normal TeX catcodes poses a bigger problem: special TeX characters like &, #, $, {, }, etc., need to be escaped. For example, # has to be escaped with <code>\string</code> to be used in <code>\ctxlua</code>. (<code>#t</code> is Lua for 'the length of array <code>t</code>'.)
<texcode>
% This doesn't work:
%\ctxlua
% {local t = {1,2,3,4}
% tex.print("length " .. #t)}
\ctxlua
{local t = {1,2,3,4}
 tex.print("length " .. \string#t)}
</texcode>
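One way to sidestep the escaping altogether is to keep the Lua code that needs <code>#</code> inside a {{cmd|startluacode}} block and only call it from {{cmd|ctxlua}}. The following is a minimal sketch of that idea; the helper name <code>userdata.printlength</code> is made up for illustration and is not part of ConTeXt.

<texcode>
\startluacode
  userdata = userdata or {}

  -- hypothetical helper: prints the length of a small table
  function userdata.printlength()
    local t = {1,2,3,4}
    -- no \string needed here: # is an ordinary character in this block
    context("length " .. #t)
  end
\stopluacode

\ctxlua{userdata.printlength()}
</texcode>

The call inside {{cmd|ctxlua}} contains no special TeX characters, so nothing has to be escaped.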
== Calling a Lua function with {{cmd|cldcontext}} and getting the return value ==

One can execute Lua code from within TeX and get back the result in TeX by using {{cmd|cldcontext}}. Thus, if {{code|myfunction}} is a function of a variable {{code|x}} defined in Lua, {{cmd|cldcontext|{myfunction(5)}}} returns the value {{code|myfunction(5)}} in TeX. This is equivalent to {{cmd|ctxlua|{context(myfunction(5))}}}.

== A larger Lua block: {{cmd|startluacode}}…{{cmd|stopluacode}} ==

As the argument of <code>\ctxlua</code> is fully expanded, escaping characters can sometimes be tricky. To circumvent this problem, ConTeXt defines an environment called <code>\startluacode ... \stopluacode</code>. This sets the catcodes to what one would expect in Lua. Basically only <code>\</code> has its usual TeX meaning, the catcode of everything else is set to other. So, for all practical purposes, we can forget about catcodes inside <code>\startluacode ... \stopluacode</code>. The above two examples can be written as

<texcode>
\startluacode
  -- A Lua comment
  tex.print("This is printed")
  local t = {1,2,3,4}
  tex.print("length " .. #t)
\stopluacode
</texcode>

Inside the {{cmd|startluacode}}…{{cmd|stopluacode}} environment, newlines and special characters behave normally. This solves the catcode problem that {{cmd|ctxlua}} suffers from. Apart from these special characters, the main warning remains in force: all the Lua code, even the comments, must be valid TeX.

<texcode>
\startluacode
  -- The unknown command \undefined will cause this entire block to fail.

  -- Print a countdown '10, 8, …, 0!'
  -- `..` is Lua for string concatenation
  for i = 10, 2, -2 do
    context(i .. ", ")
  end
  context("0!")

  -- \\par is equivalent to a blank line in the input
  -- (Notice the escaped backslash: TeX won't mind the above comment.)
  context.par()

  -- Look! we can use # and $ with impunity!
  context("Unless we print them, then we must \\#\\$\\& print the escape characters, too.")
\stopluacode
</texcode>

== Putting Lua code in an external file ==

You can put your Lua code in an external file (with the <code>.lua</code> extension) and include it with the <code>require</code> command:

<texcode>
\startluacode
-- include the file my-lua-lib.lua
require("my-lua-lib")
\stopluacode
</texcode>
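As an illustration of what such an external file might contain, here is a minimal sketch. The file name <code>my-lua-lib.lua</code> is the one used above; the function <code>userdata.greet</code> is a made-up example, not something ConTeXt provides. It assumes the file sits where <code>require</code> can find it, for instance next to the TeX source.

<texcode>
-- my-lua-lib.lua (hypothetical contents)
userdata = userdata or {}

-- print a greeting into the TeX stream
function userdata.greet(name)
  context("Hello %s!", name)
end
</texcode>

Once the file has been loaded with <code>require("my-lua-lib")</code>, the function can be called from the document, for example with <code>\ctxlua{userdata.greet("Jane")}</code>. Because the file is read by Lua directly, its contents are not expanded by TeX first.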
== Namespaces ==

It is a good habit to put your custom-defined functions in their own namespace. The traditional namespace for this is <code>userdata</code>:

<texcode>
\startluacode
  -- if userdata doesn't exist yet, create it
  userdata = userdata or {}
  -- define a shorter synonym
  u = userdata

  -- create my custom function inside the userdata namespace
  function u.myfunction()
    -- do stuff
  end
\stopluacode
</texcode>

The full list of canonical namespaces, taken from [https://distribution.contextgarden.net/current/context/latest/tex/context/base/mkxl/luat-ini.lmt luat-ini.lmt]:

<code><pre>
userdata      = userdata      or { } -- for users (e.g. functions etc)
thirddata     = thirddata     or { } -- only for third party modules
moduledata    = moduledata    or { } -- only for development team
documentdata  = documentdata  or { } -- for users (e.g. raw data)
parametersets = parametersets or { } -- experimental for team
</pre></code>

If your module, environment, or document is going to be used by other people, you should create your own subnamespaces within these tables.

<code><pre>
moduledata['mymodule'] = { }
mm = moduledata.mymodule
function mm.mainfunction()
  -- do stuff
end
</pre></code>

== Undefined Commands in Lua Comments ==

Lua code invoked inside TeX doesn’t allow undefined TeX commands inside comments. The contents of <code>\startluacode ... \stopluacode</code>, like the argument of <code>\ctxlua</code>, are fully expanded. '''This means that even the Lua comments should be valid TeX statements!''' For example,
<texcode>
\starttext
\startluacode
-- \undefinedcommandfromme
\stopluacode
Hello
\ctxlua{--\undefinedcommandfromme}
\stoptext
</texcode>
will give an error because when TeX expands the contents, it encounters <code>\undefined</code> which is an undefined TeX macro. This error can be avoided by using <code>\type{\undefined}</code> or <code>\\undefined</code>. In general, the <code>\startluacode</code> ... <code>\stopluacode</code> environment is meant for moderately sized code snippets. For longer Lua code, it is more convenient to write the code in a separate Lua file and then load it using Lua's <code>dofile(...)</code> function.
To get the sample above working (as [https://www.mail-archive.com/ntg-context@ntg.nl/msg103892.html explained by Hans in an NTG-context thread from January 2023 entitled “Minor bug in Lua or ConTeXt”]), you would need a fallback definition:

<texcode>
\ifdefined\undefinedcommandfromme \else
  \let\undefinedcommandfromme\relax
\fi
</texcode>

{{cmd|undefinedcommandfromme}} is then defined (as {{cmd|relax}}, to do nothing) if and only if it was undefined.

= Calling TeX from Lua =

This is a topic in itself, with dedicated pages:
* '''[[CLD|ConTeXt Lua Documents]]''', or CLD, are a way to access TeX from inside Lua scripts. A separate page gives hints about [[CLD_passing_variables|passing variables]] within CLD (2018).
* [[Lua|Wiki page dedicated to Lua]]
** [[Extensions to the Lua I/O library]]
** [[String manipulation]]
** [[Table manipulation]]

= Putting stuff in your TeX document from Lua =

== Simple printing: context(), tex.print(), and tex.sprint() ==

Use <code>context(…)</code> for most things. It is the Lua function that ConTeXt provides to conveniently write to the TeX stream, and it is equivalent to <code>tex.print(string.format(…))</code>, so

<texcode>
\startluacode
name = "Jane"
date = "today"
context("Hello %s, how are you %s?", name, date)
-- Becomes 'Hello Jane, how are you today?'
\stopluacode
</texcode>

Using the above, it is easy to define TeX macros that pass control to Lua, do some processing in Lua, and then pass the result back to TeX. For example, a macro to convert a decimal number to hexadecimal can be written simply, by asking Lua to do the conversion.

<texcode>
\def\TOHEX#1{\ctxlua{context("\%X",#1)}}
\TOHEX{35}
</texcode>

The percent sign had to be escaped because <code>\ctxlua</code> assumes TeX catcodes.

More primitively, you have <code>tex.print()</code> and <code>tex.sprint()</code>. Either one can take as an argument either a number of strings, or an array of strings, and will then insert the strings into the TeX stream. The only difference is that <code>tex.print()</code> treats each string as a separate input line, while <code>tex.sprint()</code> doesn't. So the following lines

<texcode>
\ctxlua{tex.print("a", "b")}
\ctxlua{tex.print({"a", "b"})}
</texcode>

are both interpreted by TeX as

<texcode>
a
b
</texcode>

but when we use <code>tex.sprint</code> instead, either of the following

<texcode>
\ctxlua{tex.sprint("a", "b")}
\ctxlua{tex.sprint({"a", "b"})}
</texcode>

will be read by TeX as

<texcode>
ab
</texcode>

without any space in between.

== Context commands ==

Most commands that you would type with a backslash in plain ConTeXt, you can access from Lua with <code>context.<em>command</em></code>. Unadorned strings end up in TeX as arguments in curly braces; Lua tables end up in TeX as parameter blocks in square brackets. The following two pieces of code are equivalent:

<texcode>
\startluacode
  -- {"first"} is a Lua table, so it becomes [first] in TeX
  context.chapter({"first"}, "Some title")
  context.startcolumns({n = 3, rule = "on"})
  context("Hello one")
  context.column()
  context("Hello two")
  context.column()
  context("Hello three")
  context.stopcolumns()
\stopluacode
</texcode>

<texcode>
\chapter[first]{Some title}
\startcolumns[n=3, rule=on]
  Hello one
  \column
  Hello two
  \column
  Hello three
\stopcolumns
</texcode>

For a fuller account of the context.commands, see the [http://www.pragma-ade.nl/general/manuals/cld-mkiv.pdf ConTeXt Lua document] manual. It is old, but most of it still applies.

One final note: arguments can also be specified in the form of nested functions. Because LuaTeX evaluates the deepest-nested argument first, this may cause the <code>context()</code> calls to be evaluated in the wrong order. For more on this, see the article on [[CLD|ConTeXt Lua documents]], and also, again, the [http://www.pragma-ade.nl/general/manuals/cld-mkiv.pdf CLD manual].

= Passing arguments and buffers: ConTeXt commands that hook into Lua =

== Making \command{arg1}{arg2} hook into Lua ==

First, define a Lua function:

<texcode>
\startluacode
  -- remember, using the userdata namespace prevents conflicts
  userdata = userdata or {}

  function userdata.surroundwithdashes(str)
    context("--" .. str .. "--")
  end
\stopluacode
</texcode>

Then define the TeX command that expands to a {{cmd|ctxlua}} call:

<texcode>
\def\surroundwd#1%
  {\ctxlua{userdata.surroundwithdashes([==[#1]==])}}
</texcode>

''NB'': quoting with <code>[==[#1]==]</code> ([http://www.lua.org/manual/5.2/manual.html#3.1 long strings]) works just like <code>"#1"</code> in most cases, but in addition it is robust against <code>#1</code> containing the quotation mark <code>"</code>, which would terminate the Lua string prematurely. Inside {{cmd|unprotect}}…{{cmd|protect}} the macros {{cmd|!!bs}} and {{cmd|!!es}} are at your disposal. They are equivalent to <code>[===[</code> and <code>]===]</code> and, being single tokens to TeX, are parsed faster. (See [http://repo.or.cz/w/context.git/blob/refs/heads/origin:/tex/context/base/luat-ini.mkiv#l174 <code>luat-ini.mkiv</code>].)

Sometimes, escaping arguments can be difficult; instead, it can be easier to define a Lua function inside <code>\startluacode ... \stopluacode</code> and call it using <code>\ctxlua</code>. For example, a macro that takes a comma separated list of strings and prints a random item can be written as

<texcode>
\startluacode
  userdata = userdata or {}
  math.randomseed( os.time() )
  function userdata.random(...)
    -- collect the variable arguments in a table
    local arg = {...}
    context(arg[math.random(1, #arg)])
  end
\stopluacode

\def\CHOOSERANDOM#1%
  {\ctxlua{userdata.random(#1)}}

\CHOOSERANDOM{"one", "two", "three"}
</texcode>

I could have written a wrapper so that the function takes a list of words and chooses a random word among them. For an example of such a conversion, see the <em>sorting a list of tokens</em> page on the [http://Luatex.bluwiki.com/go/Sort_a_token_list LuaTeX wiki].

In the above, I created a name space called <code>userdata</code> and defined the function <code>random</code> in that name space. Using a name space avoids clashes with the Lua functions defined in LuaTeX and ConTeXt.

In order to avoid name clashes, ConTeXt also defines independent name spaces of Lua instances. They are

{|
| '''user'''
| a private user instance
|-
| '''third'''
| third party module instance
|-
| '''module'''
| ConTeXt module instance
|-
| '''isolated'''
| an isolated instance
|}

Thus, for example, instead of <code>\ctxlua</code> and <code>\startluacode ... \stopluacode</code>, the <code>user</code> instance can be accessed via the macros <code>\usercode</code> and <code>\startusercode ... \stopusercode</code>. In instances other than <code>isolated</code>, all the Lua functions defined by ConTeXt (but not the inbuilt Lua functions) are stored in a <code>global</code> name space. In the <code>isolated</code> instance, all Lua functions defined by ConTeXt are hidden and cannot be accessed. Using these instances, we could write the above <code>\CHOOSERANDOM</code> macro as follows

<texcode>
\startusercode
  math.randomseed( global.os.time() )
  function random(...)
    -- collect the variable arguments in a table
    local arg = {...}
    global.context(arg[math.random(1, #arg)])
  end
\stopusercode

\def\CHOOSERANDOM#1%
  {\usercode{random(#1)}}
</texcode>

Since I defined the function <code>random</code> in the <code>user</code> instance of Lua, I did not bother to use a separate name space for the function. The Lua functions <code>os.time</code>, which is defined by a LuaTeX library, and <code>context</code>, which is defined by ConTeXt, needed to be accessed through a <code>global</code> name space. On the other hand, the <code>math.randomseed</code> function, which is part of Lua, could be accessed as is.

A separate Lua instance also makes debugging slightly easier. With <code>\ctxlua</code> the error message starts with

<texcode>
! LuaTeX error &lt;main ctx instance&gt;:
</texcode>

With <code>\usercode</code> the error message starts with

<texcode>
! LuaTeX error &lt;private user instance&gt;:
</texcode>

This makes it easier to narrow down the source of error.

Normally, it is best to define your Lua functions in the <code>user</code> name space. If you are writing a module, then define your Lua functions in the <code>third</code> instance and in a name space which is the name of your module. In this article, I will simply use the default Lua instance, but take care to define all my Lua functions in a <code>userdata</code> name space.

== Making {{cmd|startenv}}…{{cmd|stopenv}} hook into Lua ==

The first job is, as ever, to have the Lua function at the ready

<texcode>
\startluacode
  userdata = userdata or {}

  function userdata.verynarrow(buffer)
    -- equivalent to \startnarrower[10em]
    context.startnarrower({"10em"})
    context(buffer)
    context.stopnarrower()
  end
\stopluacode
</texcode>

Next, we define the start command of our custom buffer:

<texcode>
\def\startverynarrow%
  {\dostartbuffer
    [verynarrow]        % buffer name
    [startverynarrow]   % command where buffer starts
    [stopverynarrow]}   % command where buffer ends
                        % also: command invoked when buffer stops
</texcode>

Lastly, we define the {{cmd|stopverynarrow}} command such that it passes the recently completed buffer to our <code>verynarrow</code> Lua function:

<texcode>
\def\stopverynarrow
  {\ctxlua
    {userdata.verynarrow(buffers.getcontent('verynarrow'))}}
</texcode>

And that's it! Now that we have some idea of how to work with LuaTeX, the rest of this article will consist of examples.

= Examples =

== Arithmetic without using an abacus ==

''This example demonstrates writing simple commands that invoke \ctxlua.''

Doing simple arithmetic in TeX can be extremely difficult, as illustrated by the division macro in the introduction. With Lua, simple arithmetic becomes trivial. For example, if you want a macro to find the cosine of an angle (in degrees), you can write
<texcode>
\def\COSINE#1%
  {\ctxlua{context(math.cos(#1*2*math.pi/360))}}

% The value of pi itself, to full precision:
$\pi = \ctxlua{context(math.pi)}$
</texcode>
or, if you want less precision:
<texcode>
$\pi = \ctxlua{context("\letterpercent.6f", math.pi)}$
</texcode>

Notice that the percent sign is escaped with <code>\letterpercent</code>.

=== mathexpr with LMTX ===

In LMTX there is a new way to use calculated expressions with <code>\mathexpr</code> (see [https://github.com/contextgarden/context-mirror/blob/7fd782dace8f90e7e032ca8f449f8ca4eada450b/doc/context/sources/general/manuals/math/math-fun.tex math-fun]). Some examples are:

<texcode>
$ \pi = \mathexpr[.40N]{pi} $
$ \pi = \mathexpr[.80N]{sqrt(11)} $
$ \pi = \decimalexpr[.80N]{sqrt(11)} $
$ \pi = \decimalexpr{sqrt(11)} $
$ c = \complexexpr{123 + new(456,789)}$
</texcode>
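Returning to plain {{cmd|ctxlua}} arithmetic: the division macro from the introduction can be written in the same style. The following is only a sketch; the three-decimal format is an arbitrary choice made for illustration.

<texcode>
% \DIVIDE{a}{b} typesets a/b rounded to three decimals
\def\DIVIDE#1#2%
  {\ctxlua{context("\letterpercent.3f", #1/#2)}}

\DIVIDE{22}{7} % typesets 3.143
</texcode>

As with the examples above, the percent sign of the format string has to be written as <code>\letterpercent</code>, because the argument of {{cmd|ctxlua}} is read under normal TeX catcodes.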
== Loops without worrying about expansion ==

''This example demonstrates using Lua to write a quasi-repetitive piece of ConTeXt code.''
Loops in TeX are tricky, because macro assignments and macro expansion interact in strange ways. For example, suppose we want to typeset a table showing the sum of the roll of two dice and want the output to look like this:
<context source="yes">
\setupcolors[state=start]
</context>
The tedious (but faster!) way to achieve this is to simply type the whole table by hand. It is however natural to want to write this table as a loop, and compute the values. A first ConTeXt implementation using the recursion level might be:

<texcode>
\setupcolors[state=start]
\setupTABLE[each][each][width=2em,height=2em,align={middle,middle}]
\setupTABLE[r][1][background=color,backgroundcolor=gray]
\setupTABLE[c][1][background=color,backgroundcolor=gray]

\bTABLE
  \bTR
    \bTD $(+)$ \eTD
    \dorecurse{6}
      {\bTD \recurselevel \eTD}
  \eTR
  \dorecurse{6}
    {\bTR
       \bTD \recurselevel \eTD
       \edef\firstrecurselevel{\recurselevel}
       \dorecurse{6}
         {\bTD
            \the\numexpr\firstrecurselevel+\recurselevel
          \eTD}%
     \eTR}
\eTABLE
</texcode>
However, this does not work as expected, yielding all zeros.
 
A natural table stores the contents of all the cells, before typesetting it. But it does not expand the contents of its cell before storing them. So, at the time the table is actually typeset, TeX has already finished the <code>\dorecurse</code> and <code>\recurselevel</code> is set to 0.
 
The solution is to place <code>\expandafter</code> at the correct location(s) to coax TeX into expanding the <code>\recurselevel</code> macro before the natural table stores the cell contents. The difficult part is figuring out the exact location of <code>\expandafter</code>s. Here is a solution that works:
 
<texcode>
\bTABLE
\bTR
\bTD $(+)$ \eTD
\dorecurse{6}
{\expandafter \bTD \recurselevel \eTD}
\eTR
\dorecurse{6}
{\bTR
\edef\firstrecurselevel{\recurselevel}
\expandafter\bTD \recurselevel \eTD
\dorecurse{6}
{\expandafter\bTD
\the\numexpr\firstrecurselevel+\recurselevel
\relax
\eTD}
\eTR}
\eTABLE
</texcode>
 
We only needed to add three <code>\expandafter</code>s to make the naive loop work. Nevertheless, finding the right location of <code>\expandafter</code> can be frustrating, especially for a non-expert.
 
By contrast, in LuaTeX writing loops is easy. Once a Lua instance starts, TeX does not see anything until the Lua instance exits. So, we can write the loop in Lua, and simply print the values that we would have typed to the TeX stream. When the control is passed to TeX, TeX sees the input as if we had typed it by hand. Consequently, macro expansion is no longer an issue. For example, we can get the above table by:
<texcode>
\startluacode
context.bTABLE()
</texcode>
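The listing above shows only the beginning of the block. A complete version could look like the following sketch, which uses nothing beyond the <code>context.*</code> functions discussed on this page; the exact layout of the original code may differ.

<texcode>
\startluacode
  context.bTABLE()
    -- header row: the (+) corner cell, then the values of the second die
    context.bTR()
      context.bTD() context("$(+)$") context.eTD()
      for j = 1, 6 do
        context.bTD() context(j) context.eTD()
      end
    context.eTR()
    -- one row per value of the first die
    for i = 1, 6 do
      context.bTR()
        context.bTD() context(i) context.eTD()
        for j = 1, 6 do
          context.bTD() context(i + j) context.eTD()
        end
      context.eTR()
    end
  context.eTABLE()
\stopluacode
</texcode>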
The Lua functions such as <code>context.bTABLE()</code> and <code>context.bTR()</code> are just abbreviations for running <code>context("\\bTABLE")</code>, <code>context("\\bTR")</code>, etc. See the [http://www.pragma-ade.com/general/manuals/cld-mkiv.pdf ConTeXt Lua document] manual for more details about such functions. The rest of the code is a simple nested for-loop that computes the sum of two dice. We do not need to worry about macro expansion at all!

== Parsing input without exploding your head ==
''This example demonstrates parsing simple ASCII notation with Lua's lpeg parser.''
As an example, let's consider typesetting chemical molecules in TeX. Normally, molecules should be typeset in text mode rather than math mode. If we want
:H<sub>3</sub>SO<sub>4</sub><sup>+</sup>,
we must type
:<code>H{{cmd|low|{3}}}SO{{cmd|lohi|{4}}}{{{cmd|textplus}}}</code>,
but we'd much rather type
:{{cmd|molecule|{H_3SO_4^+}}}.

In order to get around the weird rules of macro expansion, writing a parser in TeX involves a lot of macro jugglery and catcode trickery. It is a black art, one of the biggest mysteries of TeX for ordinary users.

For example, <context>H\low{2}SO\lohi{4}{--}</context> can be input as <code>H\low{2}SO\lohi{4}{--}</code>. Typing so much markup can be cumbersome. Ideally, we want a macro such that we type <code>\molecule{H_2SO_4^-}</code> and the macro translates this into <code>H\low{2}SO\lohi{4}{--}</code>. Such a macro can be written in TeX as follows.

<texcode>
\newbox\chemlowbox
\def\chemlow#1%
  {\setbox\chemlowbox
   \hbox{{\switchtobodyfont[small]#1}}}

\def\chemhigh#1%
  {\ifvoid\chemlowbox
     \high{{\switchtobodyfont[small]#1}}%
   \else
     \lohi{\box\chemlowbox}
          {{\switchtobodyfont[small]#1}}
   \fi}

\def\finishchem%
  {\ifvoid\chemlowbox\else
     \low{\box\chemlowbox}
   \fi}

\unexpanded\def\molecule%
  {\bgroup
   \catcode`\_=\active \uccode`\~=`\_ \uppercase{\let~\chemlow}%
   \catcode`\^=\active \uccode`\~=`\^ \uppercase{\let~\chemhigh}%
   \dostepwiserecurse {65}{90}{1}
     {\catcode \recurselevel = \active
      \uccode`\~=\recurselevel
      \uppercase{\edef~{\noexpand\finishchem
                        \rawcharacter{\recurselevel}}}}%
   \catcode`\-=\active \uccode`\~=`\- \uppercase{\def~{--}}%
   \domolecule }%

\def\domolecule#1{#1\finishchem\egroup}
</texcode>

This monstrosity is a typical TeX parser. Appropriate characters need to be made active; occasionally, <code>\lccode</code> and <code>\uccode</code> need to be set; signaling tricks are needed (for instance, checking if <code>\chemlowbox</code> is void); and then magic happens (or so it seems to a flabbergasted user). More sophisticated parsers involve creating finite state automata, which look even more monstrous.

With LuaTeX, things are different. LuaTeX includes a general parser based on PEG (parsing expression grammar) called [http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html lpeg]. This makes writing parsers in TeX much more comprehensible. For example, the above <code>\molecule</code> macro can be written as

<texcode>
\startluacode
userdata = userdata or {}

local lowercase   = lpeg.R("az")
local uppercase   = lpeg.R("AZ")
local backslash   = lpeg.P("\\")
local csname      = backslash * lpeg.P(1) * (1-backslash)^0
local plus        = lpeg.P("+") / "\\textplus "
local minus       = lpeg.P("-") / "\\textminus "
local digit       = lpeg.R("09")
local sign        = plus + minus
local cardinal    = digit^1
local integer     = sign^0 * cardinal
local leftbrace   = lpeg.P("{")
local rightbrace  = lpeg.P("}")
local nobrace     = 1 - (leftbrace + rightbrace)
local nested      = lpeg.P {leftbrace * (csname + sign + nobrace + lpeg.V(1))^0 * rightbrace}
local any         = lpeg.P(1)

local subscript   = lpeg.P("_")
local superscript = lpeg.P("^")
local somescript  = subscript + superscript

local content     = lpeg.Cs(csname + nested + sign + any)

local lowhigh     = lpeg.Cc("\\lohi{%s}{%s}") * subscript * content * superscript * content / string.format
local highlow     = lpeg.Cc("\\hilo{%s}{%s}") * superscript * content * subscript * content / string.format
local low         = lpeg.Cc("\\low{%s}") * subscript * content / string.format
local high        = lpeg.Cc("\\high{%s}") * superscript * content / string.format
local justtext    = (1 - somescript)^1
local parser      = lpeg.Cs((csname + lowhigh + highlow + low + high + sign + any)^0)

userdata.moleculeparser = parser

function userdata.molecule(str)
  -- write the parsed result to the TeX stream
  context(parser:match(str))
end
\stopluacode

\def\molecule#1%
  {\ctxlua{userdata.molecule("#1")}}
</texcode>

This is more verbose than the TeX solution, but is easier to read and write. With a proper parser, I do not have to use tricks to check if either one or both <code>_</code> and <code>^</code> are present. More importantly, anyone (once they know the lpeg syntax) can read the parser and easily understand what it does. This is in contrast to the implementation based on TeX macro jugglery, which requires you to implement a TeX interpreter in your head to understand.

The same macro can also be written with more commentary. We need a function that can take a string like <code>H_3SO_4^+</code>, parse it, and turn it into the appropriate TeX code, and lpeg makes writing little parsers positively joyful. (Once you've got the knack of it, at least.) For example, the above {{cmd|molecule}} macro can be written as follows.

<texcode>
\startluacode

-- we will put our molecule function in the userdata namespace.
userdata = userdata or {}

-- The formatting functions into which the captured
-- superscript/subscript blocks will be fed
local formatters = { }

function formatters.low(one)
  return string.format("\\low{%s}", one)
end

function formatters.high(one)
  return string.format("\\high{%s}", one)
end

function formatters.lowhigh(one, two)
  return string.format("\\lohi{%s}{%s}", one, two)
end

-- here `one` is the superscript and `two` the subscript,
-- so they are swapped before being handed to \lohi{low}{high}
function formatters.highlow(one, two)
  return string.format("\\lohi{%s}{%s}", two, one)
end

-- These are the characters we may encounter
-- The `/` means we want to expand + and - to \textplus and \textminus;
-- this substitution is not instant, but will take place inside the first
-- surrounding lpeg.Cs() call.
local plus        = lpeg.P("+") / "\\textplus "
local minus       = lpeg.P("-") / "\\textminus "
local character   = lpeg.R("az", "AZ", "09") -- R is for 'range'
local subscript   = lpeg.P("_")              -- P is simply for 'pattern'
local superscript = lpeg.P("^")
local leftbrace   = lpeg.P("{")
local rightbrace  = lpeg.P("}")

-- a ^ or _ affects either a single character, or a brace-delimited
-- block. Whichever it is, call it `content`.
local single      = character + plus + minus
local multiple    = leftbrace * single^1 * rightbrace
local content     = single + multiple

-- These are our top-level elements: non-special text, of course, and
-- blocks of superscript/subscript/both.
-- lpeg.Cs(content) does two things:
-- (1) not all matches go into the `/ function` construction; only
--     *captures* go in. The C in Cs stands for Capture. This way,
--     the superscript/subscript mark gets discarded.
-- (2) it expands plus/minus before they go into the formatter. The
--     s in Cs stands for 'substitute in the replacement values, if any'
local text        = single^1
local low         = subscript   * lpeg.Cs(content) / formatters.low
local high        = superscript * lpeg.Cs(content) / formatters.high
local lowhigh     = subscript   * lpeg.Cs(content) * superscript * lpeg.Cs(content) / formatters.lowhigh
local highlow     = superscript * lpeg.Cs(content) * subscript   * lpeg.Cs(content) / formatters.highlow

-- Finally, the root element: 'moleculepattern'
local moleculepattern = lpeg.Cs((lowhigh + highlow + low + high + text)^0)

function userdata.molecule(str)
  -- * `:match` returns the matched string. Our pattern
  --   `moleculepattern` should match the entire input string. Any
  --   *performed* substitutions are retained. (`.Cs()` performs a
  --   previously defined substitution.)
  -- * `context()` inserts the resulting string into the stream, ready for
  --   TeX to evaluate.
  context(moleculepattern:match(str))
end

\stopluacode

\def\molecule#1%
  {\ctxlua{userdata.molecule("#1")}}

\starttext
  \molecule{Hg^+}, \molecule{SO_4^{2-}}
\stoptext
</texcode>

Quite terse and readable by parser standards, isn't it?
== Manipulating verbatim text for dummies ==
''This example demonstrates defining a custom {{cmd|start}}…{{cmd|stop}} buffer that gets processed through Lua in its entirety.''

Writing macros that manipulate verbatim text involves catcode finesse that only TeX wizards can master.

Consider a simple example. Suppose we want to write an environment {{cmd|startdedentedtyping}} … {{cmd|stopdedentedtyping}} that removes the indentation of the first line from every line. Thus, the output of ...
<texcode>
\stopdedentedtyping
</texcode>
... should be the same as the output of ...
<texcode>
\stoptyping
</texcode>
... even though the leading whitespace is different.
Defining an environment in TeX that removes the leading spaces but leaves other spaces untouched is complicated. On the other hand, once we capture the contents of the environment, removing the leading indent or ''dedenting'' the content in Lua is easy. Here is a Lua function that uses simple string substitutions.
<texcode>
\stopluacode
</texcode>
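As a rough, standalone illustration of the dedenting step, here is a minimal sketch in plain Lua. It assumes nothing from ConTeXt: the name <code>dedent</code> is hypothetical, it is not the <code>userdata.dedentedtyping</code> function used on this page, and it only strips space characters (not tabs).

```lua
-- Hypothetical helper, not part of ConTeXt: remove the first line's
-- leading spaces from every line that starts with the same spaces.
local function dedent(content)
    -- Leading run of spaces on the first line (may be the empty string).
    local indent = string.match(content, "^ *")
    local width = #indent
    local result = {}
    -- Appending "\n" lets the pattern pick up the last line as well.
    for line in string.gmatch(content .. "\n", "(.-)\n") do
        -- Only strip the prefix when the line really begins with it.
        if string.sub(line, 1, width) == indent then
            line = string.sub(line, width + 1)
        end
        result[#result + 1] = line
    end
    return table.concat(result, "\n")
end

print(dedent("  a\n    b\n  c"))
```

Lines that are indented less than the first line are left untouched rather than partially stripped; that is a design choice of this sketch, and other behaviours are equally reasonable.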
The only hard part is capturing the content of the environment and passing it to Lua. As explained in [[Inside_ConTeXt#Passing_verbatim_text_as_macro_parameter|Inside ConTeXt]], the trick to capturing the content of an environment verbatim is to ensure that spaces and newlines have a catcode that makes them significant. This is done using <code>\obeyspaces</code> and <code>\obeylines</code>. Using that trick, we can write this macro as

<texcode>
\unprotect
\def\startdedentedtyping%
  {\begingroup
   \obeyspaces \obeylines
   \long\def\dostartdedentedtyping##1\stopdedentedtyping%
     {\ctxlua{userdata.dedentedtyping(\!!bs \detokenize{##1} \!!es)}%
      \endgroup}%
   \dostartdedentedtyping}
\protect
</texcode>

The above macro works for simple cases, but it has some limitations. For example, there is an extra space of <code>\n</code> in the output. The macro will also fail if the contents have unbalanced braces (try removing the <code>}</code> from the example).

A more robust solution is to use ConTeXt's built-in support for buffers. Here is the code for defining the {{cmd|startdedentedtyping}} … {{cmd|stopdedentedtyping}} pair using buffers:

<texcode>% Create an environment that stores everything
% between \startdedentedtyping and \stopdedentedtyping
% in a buffer named 'dedentedtyping'.
{userdata.dedentedtyping(buffers.getcontent('dedentedtyping'))}}
</texcode>
Unlike MkII, where the contents of a buffer were written to an external file, in MkIV buffers are stored in memory. Thus, with LuaTeX it is really simple to manipulate verbatim text: pass the contents of the environment to Lua; use Lua functions to do the text manipulation; and in Lua call [[cld|<code>context.something()</code>]] functions to produce the ConTeXt code you want.

That's all. Finally, we will go into a little more detail on how TeX and Lua communicate with each other.

== Other examples ==

* [[Calculations_in_Lua|Calculations in Lua]] (warning: dated 2012)
* [[LPeg|Writing a parser with LPeg]] (Lua Parsing Expression Grammars)
* [[Random|Random numbers]] in ConTeXt and MetaPost
* [[SQL|An example with SQL database]]
* [[Pascal's Triangle]]
= In detail: the interaction between TeX and Lua =
To a first approximation, the interaction between TeX and Lua is straightforward. When TeX (i.e., the LuaTeX engine) starts, it loads the input file in memory and processes it token by token. When TeX encounters {{cmd|directlua}}, it stops reading the file in memory, <em>fully expands the argument of {{cmd|directlua}}</em>, and passes control to a Lua instance. The Lua instance, which runs with a few preloaded libraries, processes the expanded arguments of {{cmd|directlua}}. This Lua instance has a special output stream which can be accessed using <code>tex.print(…)</code>. The function <code>tex.print(…)</code> is just like the Lua function <code>print(…)</code> except that <code>tex.print(…)</code> prints to a <em>TeX stream</em> rather than to the standard output. When the Lua instance finishes processing its input, it passes the contents of the <em>TeX stream</em> back to TeX. <ref>The output of <code>tex.print(…)</code> is buffered and not passed to TeX until the Lua instance has stopped.</ref> TeX then inserts the contents of the <em>TeX stream</em> at the current location of the file that it was reading; expands the contents of the <em>TeX stream</em>; and continues. If TeX encounters another {{cmd|directlua}}, the above process is repeated.
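The buffering behaviour described above can be mimicked outside LuaTeX with a small toy model in plain Lua. Everything here is an assumption made for illustration: <code>fake_tex</code> is a stand-in table and not the real <code>tex</code> library, the <code>directlua</code> function is a hypothetical model of the primitive, and the four-argument form of <code>load</code> requires Lua 5.2 or later.

```lua
-- Stand-in for the real `tex` table, which only exists inside LuaTeX.
local fake_tex = { stream = {} }

-- Like tex.print in spirit: append each argument to the buffer as a line.
function fake_tex.print(...)
    for i = 1, select("#", ...) do
        local value = select(i, ...)
        fake_tex.stream[#fake_tex.stream + 1] = tostring(value)
    end
end

-- Hypothetical model of \directlua: run the (already expanded) chunk,
-- then hand the buffered lines back to the caller in one go.
local function directlua(chunk)
    fake_tex.stream = {}
    -- The chunk sees only `tex`, bound to the stand-in table.
    local run = assert(load(chunk, "=directlua", "t", { tex = fake_tex }))
    run()
    return fake_tex.stream
end

local lines = directlua([[tex.print("Depth 1") tex.print("a", "b")]])
print(#lines, table.concat(lines, " | "))
```

Nothing reaches the caller until the chunk has finished running, which is the point of the model; in real LuaTeX the returned lines would then be inserted into the input and expanded by TeX.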
As an exercise, imagine what happens when the following input is processed by LuaTeX. The answer is in the footnotes. <ref>In this example, two different kinds of quotations are used to avoid escaping quotes. Escaping quotes inside {{cmd|directlua}} is tricky. The above was a contrived example; if you ever need to escape quotes, you can use the {{cmd|startluacode}}…{{cmd|stopluacode}} syntax.</ref>

<texcode>
\directlua%
  {tex.print("Depth 1 \\directlua{tex.print('Depth 2')}")}
</texcode>

For more on this, see the [http://wiki.luatex.org/index.php/Writing_Lua_in_TeX Writing Lua in TeX] article on the [http://wiki.luatex.org/index.php/Main_Page LuaTeX wiki].

= Conclusion =

LuaTeX is removing many TeX barriers: using system fonts, reading and writing Unicode files, typesetting non-Latin languages, among others. However, the biggest feature of LuaTeX is the ability to use a high-level programming language to program TeX. This can potentially lower the learning curve for programming TeX.

This article mentions only one aspect of programming TeX: macros that manipulate their input and output some text to the main TeX stream. Many other kinds of manipulations are possible: LuaTeX provides access to TeX boxes, token lists, dimensions, glues, catcodes, direction parameters, math parameters, etc. The details can be found in the [http://www.luatex.org/documentation.html LuaTeX manual].
= Notes =
<references />
{{note | This article is originally based on [https://www.tug.org/members/TUGboat/tb30-2/tb95mahajan-luatex.pdf this TUGboat article]. Feel free to modify it. }}
[[Category:Lua]][[Category:LuaTeX]][[Category:Programming and Databases]]