Difference between revisions of "Programming in LuaTeX"

(wikified tugboat article)
 
m (→‎Arithmetic without using an abacus: - add math-fun examples for lmtx)
 
(82 intermediate revisions by 13 users not shown)
Line 1: Line 1:
{{note | This is a wikified version of [https://www.tug.org/members/TUGboat/tb30-2/tb95mahajan-luatex.pdf this TugBoat article ]. Feel free to modify it. }}
+
= Calling Lua from TeX =
  
In this article, I explain how to use lua to write macros in [[LuaTeX]]. I give some examples of macros that are complicated in [[PdfTeX]], but can be defined easily using lua in luaTeX. These examples include macros that do arithmetic on their arguments, use loops, and parse their arguments.  
+
The interweaving of ConTeXt and Lua consists of two elements: first you tell TeX that you're starting some Lua code; then, once inside Lua, you need to use the appropriate functions to put things into the TeX stream.
  
= Introduction =
+
There are two main ways to execute Lua code in a ConTeXt document: the command <code>\ctxlua</code> and the environment <code>\startluacode...\stopluacode</code>. Both are wrappers around the LuaTeX primitive <code>\directlua</code>, which you should never need to use directly. In general, you will define a function inside a <code>\startluacode</code> block and then define a TeX command that calls the function using <code>\ctxlua</code>, especially because <code>\ctxlua</code> has a few idiosyncrasies.
  
TeX is getting a new engine—luaTeX. As its name suggests, luaTeX adds lua, a programming language, to TeX, the typesetter. I cannot overemphasize the significance of being able to program TeX in a high-level programming language. For example, consider a TeX macro that divides two numbers. Such a macro is provided by the <tt>fp</tt> package and also by <tt>pgfmath</tt> library of the <tt>TikZ</tt> package. The following comment is from the <tt>fp</tt> package
+
The main thing about Lua code in a TeX document is this: the code is expanded by TeX ''before'' Lua gets to it. '''This means that all the Lua code, even the comments, must be valid TeX!''' A string like "\undefined" will cause an immediate failure.
 +
 
 +
== Calling a bit of Lua inline: \ctxlua ==
 +
 
 +
The command <code>\ctxlua</code> is for short inline snippets of Lua, such
 +
as
 +
 
 +
<texcode>
 +
$2 + 5 \neq \ctxlua{context(3+5)}$, but is equal to \ctxlua{context(2+5)}.
 +
This is \ctxlua{context(string.upper("absolutely"))} true.
 +
</texcode>
 +
 
 +
<code>\ctxlua</code> operates under the normal TeX catcodes (category codes). This means the following two things for the Lua code inside:
 +
* all newlines get treated as spaces
 +
* special TeX characters like &, #, $, {, }, etc., need to be escaped.
 +
 
 +
In addition, the warning above still holds. All the Lua code, even the comments, must be valid TeX.
 +
 
 +
Some code to illustrate the newline problem:
 
<texcode>
 
<texcode>
\def\FP@div#1#2.#3.#4\relax#5.#6.#7\relax{%
+
\ctxlua
% [...] algorithmic idea (for x>0, y>0)
+
  {-- A Lua comment
%  - %determine \FP@shift  such that
+
  tex.print("This is not printed")}
%      y*10^\FP@shift < 100000000
+
\ctxlua
%                    <=y*10^(\FP@shift+1)
+
   {% A Tex comment
%   - %determine \FP@shift' such that
+
  tex.print("This is printed")}
%      x*10^\FP@shift'< 100000000
 
%                    <=x*10^(\FP@shift+1)
 
%  - x=x*\FP@shift'
 
%  - y=y*\FP@shift
 
%  - \FP@shift=\FP@shift-\FP@shift'
 
%  - res=0
 
%  - while y>0 %fixed-point representation!
 
%  -  \FP@times=0
 
%  -   while x>y
 
%  -    \FP@times=\FP@times+1
 
%  -    x=x-y
 
%  -  end
 
%  -  y=y/10
 
%  -  res=10*res+\FP@times/1000000000
 
%  - end
 
%  - %shift the result according to \FP@shift
 
 
</texcode>
 
</texcode>
  
The <tt>pgfmath</tt> library implements the macro in a similar way, but limits the number of shifts that it does. These macros highlight the state of affairs in writing TeX macros. Even simple things like multiplying two numbers are hard; you either have to work extremely hard to circumvent the programming limitations of TeX, or, more frequently, hope that someone else has done the hard work for you. In luaTeX, such a function can be written using the <code>/</code> operator (I will explain the details later):
+
The problem with special TeX characters. (<code>#t</code> is Lua for 'the length of array <code>t</code>'.)
 
<texcode>
 
<texcode>
\def\DIVIDE#1#2{\directlua{tex.print(#1/#2)}}
+
% This doesn't work:
 +
%\ctxlua
 +
%  {local t = {1,2,3,4}
 +
%  tex.print("length " .. #t)}
 +
\ctxlua
 +
  {local t = {1,2,3,4}
 +
  tex.print("length " .. \string#t)}
 
</texcode>
 
</texcode>
  
Thus, with luaTeX ordinary users can write simple macros; and, perhaps more importantly, can read and understand macros written by TeX wizards.
 
  
Since the luaTeX project started it has been actively supported by ConTeXt. <ref>Not surprising, as two of luaTeX's main developers, Taco Hoekwater and Hans Hagen, are also the main ConTeXt developers.</ref> These days, the various <em>How do I write such a macro</em> questions on the ConTeXt mailing list are answered by a solution that uses lua. I present a few such examples in this article. I have deliberately avoided examples about fonts and non-Latin languages. There is already quite a bit of documentation about them. In this article, I want to highlight how to use luaTeX to write macros that require some <em>flow control</em>: randomized outputs, loops, and parsing.
+
== Calling a Lua function with \cldcontext and getting the return value ==
  
 +
One can execute Lua code from within TeX and get the result back in TeX by using {{code|\cldcontext}}. Thus, if {{code|myfunction}} is a function of a variable {{code|x}} defined in Lua, {{code|\cldcontext{myfunction(5)}}} returns the value {{code|myfunction(5)}} in TeX. This is equivalent to {{code|\ctxlua{context(myfunction(5))}}}.
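
As a minimal sketch of this equivalence, the following defines a small Lua function and calls it both ways. The function name <code>userdata.square</code> is a hypothetical helper introduced only for this illustration.

<texcode>
\startluacode
    userdata = userdata or {}

    -- Hypothetical helper, for illustration only: returns its argument squared.
    function userdata.square(x)
        return x * x
    end
\stopluacode

% Both lines hand the return value of the Lua function to TeX.
The square of 5 is \cldcontext{userdata.square(5)}.
The square of 5 is \ctxlua{context(userdata.square(5))}.
</texcode>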
  
= Interaction between TeX and lua =
 
  
To a first approximation, the interaction between TeX and lua is straightforward. When TeX (i.e., the luaTeX engine) starts, it loads the input file in memory and processes it token by token. When TeX encounters <code>\directlua</code>, it stops reading the file in memory, <em>fully expands the argument of <code>\directlua</code></em>, and passes the control to a lua instance. The lua instance, which runs with a few preloaded libraries, processes the expanded arguments of <code>\directlua</code>. This lua instance has a special output stream which can be accessed using <code>tex.print(...)</code>. The function <code>tex.print(...)</code> is just like the lua function <code>print(...)</code> except that <code>tex.print(...)</code> prints to a <em>TeX stream</em> rather than to the standard output. When the lua instance finishes processing its input, it passes the contents of the <em>TeX stream</em> back to TeX.<ref>The output of <code>tex.print(...)</code> is buffered and not passed to TeX until the lua instance has stopped.</ref> TeX then inserts the contents of the <em>TeX stream</em> at the current location of the file that it was reading; expands the contents of the <em>TeX stream</em>; and continues. If TeX encounters another <code>\directlua</code>, the above process is repeated.
+
== A larger Lua block: \startluacode...\stopluacode ==
  
As an exercise, imagine what happens when the following input is processed by luaTeX. <ref>In this example, I used two different kinds of quotations to avoid escaping quotes. Escaping quotes inside  <code>\directlua</code> is tricky. The above was a contrived example; if you ever need to escape quotes, you can use the <code>\startluacode ... \stopluacode</code> syntax explained later.</ref>
+
Inside the <code>\startluacode...\stopluacode</code> environment, newlines and special characters behave normally. This solves the catcode problem that <code>\ctxlua</code> suffers from. Apart from these special characters, the main warning remains in force: all the Lua code, even the comments, must be valid TeX.
  
 
<texcode>
 
<texcode>
\directlua%
+
\startluacode
  {tex.print("Depth 1
+
    -- The unknown command \undefined will cause this entire block to fail.
          \\directlua{tex.print('Depth 2')}")}
+
 
 +
    -- Print a countdown '10, 8, ..., 0!'
 +
    -- `..` is Lua for string concatenation
 +
    for i = 10, 2, -2 do
 +
        context(i .. ", ")
 +
    end
 +
    context("0!")
 +
 
 +
    -- \\par is equivalent to a blank line in the input
 +
    -- (Notice the escaped backslash: TeX won't mind the above comment.)
 +
    context.par()
 +
 
 +
    -- Look! we can use # and $ with impunity!
 +
    context("Unless we print them, then we must \\#\\$\\& print the escape characters, too.")
 +
\stopluacode
 
</texcode>
 
</texcode>
  
On top of these luaTeX primitives, ConTeXt provides a higher level interface. There are two ways to call lua from ConTeXt. The first is a macro <code>\ctxlua</code> (read as ConTeXt lua), which is similar to <code>\directlua</code>. (Aside: It is possible to run the lua instance under different name spaces. <code>\ctxlua</code> is the default name space; other name spaces are explained later.) <code>\ctxlua</code> is good for calling small snippets of lua. The argument of <code>\ctxlua</code> is parsed under normal TeX catcodes (category codes), so the end of line character has the same catcode as a space. This can lead to surprises. For example, if you try to use a lua comment, everything after the comment gets ignored.
+
== Putting Lua code in an external file ==
 +
 
 +
You can put your lua code in an external file (with the <code>.lua</code> extension) and include it with the <code>require</code> command:
 +
 
 
<texcode>
 
<texcode>
\ctxlua
+
\startluacode
  {-- A lua comment
+
-- include the file my-lua-lib.lua
  tex.print("This is not printed")}
+
require("my-lua-lib")
 +
\stopluacode
 
</texcode>
 
</texcode>
  
This can be avoided by using a TeX comment instead of a lua comment. However, working under normal TeX catcodes poses a bigger problem: special TeX characters like &, #, $, {, }, etc., need to be escaped. For example, <code>#</code> has to be escaped with <code>\string</code> to be used in <code>\ctxlua</code>.
+
== Namespaces ==
 +
 
 +
It is a good habit to put your custom-defined functions in their own namespace. The traditional namespace for this is <code>userdata</code>:
 
<texcode>
 
<texcode>
\ctxlua
+
\startluacode
  {local t = {1,2,3,4}
+
    -- if userdata doesn't exist yet, create it
  tex.print("length " .. \string#t)}
+
    userdata = userdata or {}
 +
    -- define a shorter synonym
 +
    u = userdata
 +
 
 +
    -- create my custom function inside the userdata namespace
 +
    function u.myfunction()
 +
        -- do stuff
 +
    end       
 +
\stopluacode
 
</texcode>
 
</texcode>
  
As the argument of <code>\ctxlua</code> is fully expanded, escaping characters can sometimes be tricky. To circumvent this problem, ConTeXt defines an environment called <code>\startluacode ... \stopluacode</code>. This sets the catcodes to what one would expect in lua. Basically only <code>\</code> has its usual TeX meaning; the catcode of everything else is set to other. So, for all practical purposes, we can forget about catcodes inside <code>\startluacode ... \stopluacode</code>. The above two examples can be written as
+
The full list of canonical namespaces, taken from [http://minimals.contextgarden.net/current/context/alpha/tex/context/base/luat-ini.lua luat-ini.lua]:
 +
 
 +
<code><pre>
 +
userdata      = userdata      or { } -- for users (e.g. functions etc)
 +
thirddata    = thirddata    or { } -- only for third party modules
 +
moduledata    = moduledata    or { } -- only for development team
 +
documentdata  = documentdata  or { } -- for users (e.g. raw data)
 +
parametersets = parametersets or { } -- experimental for team
 +
</pre></code>
 +
 
 +
If your module, environment, or document is going to be used by other people, you should create your own subnamespaces within these tables.
 +
 
 +
<code><pre>
 +
moduledata['mymodule'] = { }
 +
mm = moduledata.mymodule
 +
function mm.mainfunction()
 +
    -- do stuff
 +
end
 +
</pre></code>
 +
 
 +
=  Calling TeX from Lua =
 +
 
 +
This is a topic in itself, with dedicated pages:
 +
* '''[[CLD|ConTeXt Lua Documents]]''', or CLD, are a way to access TeX from inside Lua scripts. A separate page gives hints about [[CLD_passing_variables|passing variables]] within CLD (2018).
 +
* [[Lua|Wiki page dedicated to Lua]]
 +
** [[Extensions to the Lua I/O library]]
 +
** [[String manipulation]]
 +
** [[Table manipulation]]
 +
 
 +
= Putting stuff in your TeX document from Lua =
 +
 
 +
== Simple printing: context(), tex.print(), and tex.sprint() ==
 +
Use <code>context(...)</code> for most things. It is equivalent to <code>tex.print(string.format(...))</code>, so
 +
 
 
<texcode>
 
<texcode>
 
\startluacode
 
\startluacode
  -- A lua comment
+
name = "Jane"
  tex.print("This is printed.")
+
date = "today"
  local t = {1,2,3,4}
+
context("Hello %s, how are you %s?", name, date)
  tex.print("length " .. #t)
+
-- Becomes 'Hello Jane, how are you today?'
 
\stopluacode
 
\stopluacode
 
</texcode>
 
</texcode>
  
This environment is meant for moderately sized code snippets. For longer lua code, it is more convenient to write the code in a separate lua file and then load it using lua's <code>dofile(...)</code> function.  
+
More primitively, you have <code>tex.print()</code> and <code>tex.sprint()</code>. Either one can take as an argument either a number of strings, or an array of strings, and will then insert the strings into the TeX stream. The only difference is that <code>tex.print()</code> treats each string as a separate input line, while <code>tex.sprint()</code> doesn't. So the following lines
  
ConTeXt also provides a lua function to conveniently write to the TeX stream. The function is called <code>context(...)</code> and it is equivalent to <code>tex.print(string.format(...))</code>.
+
<texcode>
 +
\ctxlua{tex.print("a", "b")}
 +
\ctxlua{tex.print({"a", "b"})}
 +
</texcode>
 +
 
 +
are both interpreted by TeX as
  
Using the above, it is easy to define TeX macros that pass control to lua, do some processing in lua, and then pass the result back to TeX. For example, a macro to convert a decimal number to hexadecimal can be written simply, by asking lua to do the conversion.
 
 
<texcode>
 
<texcode>
\def\TOHEX#1{\ctxlua{context("\%X",#1)}}
+
a
\TOHEX{35}
+
b
 
</texcode>
 
</texcode>
  
The percent sign had to be escaped because <code>\ctxlua</code> assumes TeX catcodes. Sometimes, escaping arguments can be difficult; instead, it can be easier to define a lua function inside <code>\startluacode ... \stopluacode</code> and call it using <code>\ctxlua</code>. For example, a macro that takes a comma separated list of strings and prints a random item can be written as  
+
but when we use <code>tex.sprint</code> instead, either of the following
 +
 
 +
<texcode>
 +
\ctxlua{tex.sprint("a", "b")}
 +
\ctxlua{tex.sprint({"a", "b"})}
 +
</texcode>
 +
 
 +
will be read by TeX as
 +
 
 +
<texcode>
 +
ab
 +
</texcode>
 +
 
 +
without any space in between.
 +
 
 +
== Context commands ==
 +
 
 +
Most commands that you would type with a backslash in plain ConTeXt, you can access from Lua with <code>context.<em>command</em></code>. Unadorned strings end up in TeX as arguments in curly braces; Lua tables end up in TeX as parameter blocks in square brackets. The following two pieces of code are equivalent:
 +
 
 
<texcode>
 
<texcode>
 
\startluacode
 
\startluacode
  userdata = userdata or {}
+
    context.chapter({first}, "Some title")
  math.randomseed( os.time() )
+
    context.startcolumns({n = 3, rule = "on"})
  function userdata.random(...)
+
        context("Hello one")
     context(arg[math.random(1, #arg)])
+
    context.column()
  end
+
        context("Hello two")
 +
    context.column()
 +
        context("Hello three")
 +
     context.stopcolumns()
 
\stopluacode
 
\stopluacode
 
+
</texcode>
\def\CHOOSERANDOM#1%
 
  {\ctxlua{userdata.random(#1)}}
 
  
\CHOOSERANDOM{"one", "two", "three"}
+
<texcode>
 +
    \chapter[first]{Some title}
 +
    \startcolumns[n=3, rule=on]
 +
        Hello one
 +
    \column
 +
        Hello two
 +
    \column
 +
        Hello three
 +
    \stopcolumns
 
</texcode>
 
</texcode>
  
I could have written a wrapper so that the function takes a list of words and chooses a random word among them. For an example of such a conversion, see the <em>sorting a list of tokens</em> page on the [http://luatex.bluwiki.com/go/Sort_a_token_list luaTeX wiki].
+
For a fuller account of the context.commands, see the [http://www.pragma-ade.com/general/manuals/cld-mkiv.pdf ConTeXt Lua document] manual. It is old, but most of it still applies.
  
In the above, I created a name space called <code>userdata</code> and defined the function <code>random</code> in that name space. Using a name space avoids clashes with the lua functions defined in luaTeX and ConTeXt.
+
One final note: arguments can also be specified in the form of nested functions. Because LuaTeX evaluates the deepest-nested argument first, this may cause the <code>context()</code> calls to be evaluated in the wrong order. For more on this, see the article on [[CLD|ConTeXt Lua documents]], and also, again, the [http://www.pragma-ade.com/general/manuals/cld-mkiv.pdf CLD manual].
  
In order to avoid name clashes, ConTeXt also defines independent name spaces of lua instances. They are
+
= Passing arguments and buffers: ConTeXt commands that hook into Lua =
  
{|
+
== Making \command{arg1}{arg2} hook into Lua ==
|-
+
First, define a Lua function:
| user   
 
| a private user instance
 
|-
 
| third   
 
| third party module instance 
 
|-
 
| module 
 
| ConTeXt module instance
 
|-
 
|  isolated
 
| an isolated instance
 
|}
 
  
Thus, for example, instead of <code>\ctxlua</code> and <code>\startluacode ... \stopluacode</code>, the <code>user</code> instance can be accessed via the macros <code>\usercode</code> and <code>\startusercode ... \stopusercode</code>. In instances other than <code>isolated</code>, all the lua functions defined by ConTeXt (but not the inbuilt lua functions) are stored in a <code>global</code> name space. In the <code>isolated</code> instance, all lua functions defined by ConTeXt are hidden and cannot be accessed. Using these instances, we could write the above <code>\CHOOSERANDOM</code> macro as follows
 
 
<texcode>
 
<texcode>
\startusercode
+
\startluacode
  math.randomseed( global.os.time() )
+
     -- remember, using the userdata namespace prevents conflicts
  function random(...)
+
    userdata = userdata or {}
     global.context(arg[math.random(1, #arg)])
 
  end
 
\stopusercode
 
  
\def\CHOOSERANDOM#1%
+
    function userdata.surroundwithdashes(str)
  {\usercode{random(#1)}}
+
        context("--" .. str .. "--")
 +
    end
 +
\stopluacode
 
</texcode>
 
</texcode>
  
Since I defined the function <code>random</code> in the <code>user</code> instance of lua, I did not bother to use a separate name space for the function. The lua functions <code>os.time</code>, which is defined by a luaTeX library, and <code>context</code>, which is defined by ConTeXt, needed to be accessed through a <code>global</code> name space. On the other hand, the <code>math.randomseed</code> function, which is part of lua, could be accessed as is.
+
Then define the TeX command that expands to a <code>\ctxlua</code> call:
  
A separate lua instance also makes debugging slightly easier. With <code>\ctxlua</code> the error message starts with
 
 
<texcode>
 
<texcode>
! LuaTeX error &lt;main ctx instance&gt;:
+
\def\surroundwd#1%
 +
    {\ctxlua{userdata.surroundwithdashes([==[#1]==])}}
 
</texcode>
 
</texcode>
  
With <code>\usercode</code> the error message starts with
+
''NB'': quoting with <code>[==[#1]==]</code>
 +
([http://www.lua.org/manual/5.2/manual.html#3.1 long strings])
 +
works just like <code>"#1"</code> in most cases, but in addition
 +
it is robust against <code>#1</code> containing the quotation mark
 +
<code>"</code> which would terminate the Lua string prematurely.
 +
Inside <code>\unprotect ... \protect</code> the macros <code>\!!bs</code>
 +
and <code>\!!es</code> are at your disposal.
 +
They are equivalent to <code>[===[</code> and <code>]===]</code> and --
 +
being single tokens to TeX -- parsed faster.
 +
(See [http://repo.or.cz/w/context.git/blob/refs/heads/origin:/tex/context/base/luat-ini.mkiv#l174 <code>luat-ini.mkiv</code>].)
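
To see why the long-string quoting matters, here is a hedged usage sketch of the <code>\surroundwd</code> command defined above. The sample arguments are made up for illustration.

<texcode>
% Plain argument: ordinary double quotes and long-string brackets
% would behave the same here.
\surroundwd{hello}

% The argument contains double quotes. With ordinary double-quote
% delimiters the Lua string would end at the first inner quote; with
% long-string brackets the whole argument reaches Lua intact.
\surroundwd{the so-called "easy" case}
</texcode>

The argument is still expanded by TeX before Lua sees it, so this quoting protects against quotation marks only, not against TeX's special characters.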
 +
 
 +
== Making \startenv...\stopenv hook into Lua ==
 +
The first job is, as ever, to have the Lua function at the ready
 
<texcode>
 
<texcode>
! LuaTeX error &lt;private user instance&gt;:
+
\startluacode
 +
    userdata = userdata or {}
 +
 
 +
    function userdata.verynarrow(buffer)
 +
        -- equivalent to \startnarrower[10em]
 +
        context.startnarrower({"10em"})
 +
            context(buffer)
 +
        context.stopnarrower()
 +
    end
 +
\stopluacode
 
</texcode>
 
</texcode>
  
This makes it easier to narrow down the source of error.
+
Next, we define the start command of our custom buffer:
  
Normally, it is best to define your lua functions in the <code>user</code> name space. If you are writing a module, then define your lua functions in the <code>third</code> instance and in a name space which is the name of your module. In this article, I will simply use the default lua instance, but take care to define all my lua functions in a <code>userdata</code> name space.
+
<texcode>
 +
\def\startverynarrow%
 +
  {\dostartbuffer
 +
    [verynarrow]      % buffer name
 +
    [startverynarrow] % command where buffer starts
 +
    [stopverynarrow]} % command where buffer ends
 +
                      % also: command invoked when buffer stops
  
Now that we have some idea of how to work with luaTeX, let's look at some examples.
+
</texcode>
  
= Arithmetic without using an abacus =
+
Lastly, we define the <code>\stopverynarrow</code> command such that it passes the recently completed buffer to our <code>verynarrow</code> Lua function:
  
Doing simple arithmetic in TeX can be extremely difficult, as illustrated by the division macro in the introduction. With lua, simple arithmetic becomes trivial. For example, if you want a macro to find the cosine of an angle (in degrees), you can write
+
<texcode>
 +
\def\stopverynarrow
 +
  {\ctxlua
 +
    {userdata.verynarrow(buffers.getcontent('verynarrow'))}}
 +
</texcode>
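
A short usage sketch, assuming the three definitions above have been loaded; the sample text is invented for illustration:

<texcode>
\starttext

Text before the environment, set at the normal width.

% Everything between the two commands is stored in the buffer named
% verynarrow and then typeset by the Lua function userdata.verynarrow.
\startverynarrow
This paragraph is collected into the buffer first and typeset only
once the closing command has been reached.
\stopverynarrow

\stoptext
</texcode>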
 +
 
 +
And that's it! The rest of this article will consist of examples.
 +
 
 +
= Examples =
 +
 
 +
== Arithmetic without using an abacus ==
 +
 
 +
''This example demonstrates writing simple commands that invoke \ctxlua.''
 +
 
 +
Doing simple arithmetic in TeX can be extremely difficult. With Lua, simple arithmetic becomes trivial. For example, if you want a macro to find the cosine of an angle (in degrees), you can write
 
<texcode>
 
<texcode>
 
\def\COSINE#1%
 
\def\COSINE#1%
   {\ctxlua(context(math.cos(#1*2*pi/360))}
+
   {\ctxlua{context(math.cos(#1*2*math.pi/360))}}
 
</texcode>
 
</texcode>
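
As a hedged variation on the same idea, the macro below rounds the result to three decimal places. The name <code>\ROUNDEDCOSINE</code> is hypothetical and chosen only for this sketch; it uses Lua's standard <code>math.rad</code> for the conversion from degrees to radians.

<texcode>
% Hypothetical variant of \COSINE that prints three decimal places.
% \letterpercent stands in for a literal percent sign, which would
% otherwise start a TeX comment inside \ctxlua.
\def\ROUNDEDCOSINE#1%
   {\ctxlua{context("\letterpercent.3f", math.cos(math.rad(#1)))}}

$\cos 60^\circ \approx \ROUNDEDCOSINE{60}$
</texcode>

For an angle of 60 degrees this should typeset 0.500.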
  
Line 167: Line 293:
 
$\pi = \ctxlua{context(math.pi)}$  
 
$\pi = \ctxlua{context(math.pi)}$  
 
</texcode>
 
</texcode>
or if you want less precision (notice the percent sign is escaped)
+
or, if you want less precision:
 
<texcode>  
 
<texcode>  
$\pi = \ctxlua{context("\%.6f", math.pi)}$
+
$\pi = \ctxlua{context("\letterpercent.6f", math.pi)}$
 +
</texcode>
 +
Notice that the percent sign is written as <code>\letterpercent</code>, because a bare <code>%</code> would start a TeX comment inside <code>\ctxlua</code>.
 +
 
 +
=== mathexpr with LMTX ===
 +
 
 +
In LMTX there is a new way to typeset calculated expressions, using <code>\mathexpr</code> and related commands (see [https://github.com/contextgarden/context-mirror/blob/7fd782dace8f90e7e032ca8f449f8ca4eada450b/doc/context/sources/general/manuals/math/math-fun.tex math-fun]).
 +
 
 +
Some examples are:
 +
 
 +
<texcode>
 +
$ \pi = \mathexpr[.40N]{pi}            $
 +
$ \pi = \mathexpr[.80N]{sqrt(11)}      $
 +
$ \pi = \decimalexpr[.80N]{sqrt(11)}  $
 +
$ \pi = \decimalexpr{sqrt(11)}        $
 +
$ c = \complexexpr{123 + new(456,789)} $
 
</texcode>
 
</texcode>
  
 +
== Loops without worrying about expansion ==
  
= Loops without worrying about expansion =
+
''This example demonstrates using Lua to write a quasi-repetitive piece of ConTeXt code.''
  
Loops in TeX are tricky because macro assignments and macro expansion interact in strange ways. For example, suppose we want to typeset a table showing the sum of the roll of two dice and want the output to look like this
+
Loops in TeX are tricky, because macro assignments and macro expansion interact in strange ways. For example, suppose we want to typeset a table showing the sum of the roll of two dice and want the output to look like this:
 
<context source="yes">
 
<context source="yes">
 
\setupcolors[state=start]
 
\setupcolors[state=start]
Line 200: Line 342:
 
</context>
 
</context>
  
The tedious (but  faster!) way to achieve this is to simply type the whole table by hand.  
+
This is easy in LuaTeX. Once a Lua instance starts, TeX does not see anything until the Lua instance exits. So, we can write the loop in Lua, and simply print the values that we would have typed to the TeX stream. When the control is passed to TeX, TeX sees the input as if we had typed it by hand. This is the Lua code for the above table:
 
 
It is however natural to want to write this table as a loop, and compute the values. A first ConTeXt implementation using the recursion level might be:
 
  
 
<texcode>
 
<texcode>
\bTABLE 
+
\setupcolors[state=start]
\bTR 
+
\setupTABLE[each][each][width=2em,height=2em,align={middle,middle}]  
  \bTD $(+)$ \eTD 
+
\setupTABLE[r][1][background=color,backgroundcolor=gray]  
  \dorecurse{6} 
+
\setupTABLE[c][1][background=color,backgroundcolor=gray]
  {\bTD \recurselevel \eTD} 
 
  \eTR 
 
\dorecurse{6} 
 
{\bTR 
 
    \bTD \recurselevel \eTD 
 
    \edef\firstrecurselevel{\recurselevel} 
 
  \dorecurse{6} 
 
  {\bTD
 
  \the\numexpr\firstrecurselevel+\recurselevel
 
  \eTD}%
 
  \eTR} 
 
\eTABLE
 
</texcode>
 
 
 
However, this does not work as expected, yielding all zeros.
 
 
 
A natural table stores the contents of all the cells, before typesetting it. But it does not expand the contents of its cell before storing them. So, at the time the table is actually typeset, TeX has already finished the <code>\dorecurse</code> and <code>\recurselevel</code> is set to 0.
 
 
 
The solution is to place <code>\expandafter</code> at the correct location(s) to coax TeX into expanding the <code>\recurselevel</code> macro before the natural table stores the cell contents. The difficult part is figuring out the exact location of <code>\expandafter</code>s. Here is a solution that works:
 
 
 
<texcode>
 
\bTABLE 
 
\bTR 
 
  \bTD $(+)$ \eTD 
 
  \dorecurse{6}   
 
  {\expandafter \bTD \recurselevel \eTD}  
 
  \eTR 
 
\dorecurse{6} 
 
{\bTR 
 
\edef\firstrecurselevel{\recurselevel} 
 
\expandafter\bTD \recurselevel \eTD 
 
\dorecurse{6} 
 
{\expandafter\bTD
 
  \the\numexpr\firstrecurselevel+\recurselevel
 
  \relax
 
  \eTD} 
 
\eTR} 
 
\eTABLE 
 
</texcode>
 
  
We only needed to add three <code>\expandafter</code>s to make the naive loop work. Nevertheless, finding the right location of <code>\expandafter</code> can be frustrating, especially for a non-expert.
 
 
By contrast, in luaTeX writing loops is easy. Once a lua instance starts, TeX does not see anything until the lua instance exits. So, we can write the loop in lua, and simply print the values that we would have typed to the TeX stream. When the control is passed to TeX, TeX sees the input as if we had typed it by hand. Consequently, macro expansion is no longer an issue. For example, we can get the above table by:
 
<texcode>
 
 
\startluacode
 
\startluacode
 
context.bTABLE()
 
context.bTABLE()
Line 273: Line 370:
 
</texcode>
 
</texcode>
  
The lua functions such as <code>context.bTABLE()</code> and <code>context.bTR()</code> are  just abbreviations for running <code>context ("\\bTABLE")</code>, <code>context("\\bTR")</code>, etc. See the [http://www.pragma-ade.com/general/manuals/cld-mkiv.pdf ConTeXt lua document] manual for more details about such functions. The rest of the code is a simple nested for-loop that computes the sum of two dice. We do not need to worry about macro expansion at all!
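
As a smaller sketch of the same idea, the loop below typesets a single table row of square numbers. It uses only <code>context.bTABLE()</code>-style calls of the kind described above; the choice of squares is just for illustration.

<texcode>
\startluacode
context.bTABLE()
    context.bTR()
    for i = 1, 6 do
        -- each iteration emits one table cell containing i squared
        context.bTD()
        context(i * i)
        context.eTD()
    end
    context.eTR()
context.eTABLE()
\stopluacode
</texcode>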
+
== Parsing input without exploding your head ==
  
= Parsing input without exploding your head =
+
''This example demonstrates parsing simple ASCII notation with Lua's lpeg parser.''
  
In order to get around the weird rules of macro expansion, writing a parser in TeX involves a lot of macro jugglery and catcode trickery. It is a black art, one of the biggest mysteries of TeX for ordinary users.
+
As an example, let's consider typesetting chemical molecules in TeX. Normally, molecules should be typeset in text mode rather than math mode.
 +
If we want
 +
:H<sub>3</sub>SO<sub>4</sub><sup>+</sup>,
 +
we must type
 +
:<code>H\low{3}SO\lohi{4}{\textplus}</code>,
 +
but we'd much rather type
 +
:<code>\molecule{H_3SO_4^+}</code>.
 +
 
 +
So, we need a function that can take a string like that, parse it, and turn it into the appropriate TeX code. LuaTeX includes a general parser based on PEG (parsing expression grammar) called [http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html lpeg], and it makes writing little parsers positively joyful. (Once you've got the knack of it, at least.) For example, the above <code>\molecule</code> macro can be written as follows.
  
As an example, let's consider typesetting chemical molecules in TeX. Normally, molecules should be typeset in text mode rather than math mode. For example, <context>H\low{2}SO\lohi{4}{--}</context> can be input as <code>H\low{2}SO\lohi{4}{--}</code>. Typing so much markup can be cumbersome. Ideally, we want a macro such that we type <code>\molecule{H_2SO_4^-}</code> and the macro translates this into <code>H\low{2}SO\lohi{4}{--}</code>. Such a macro can be written in TeX as follows.
 
 
<texcode>
 
<texcode>
\newbox\chemlowbox
+
\startluacode
\def\chemlow#1%  
+
 
  {\setbox\chemlowbox
+
-- we will put our molecule function in the thirddata namespace.
          \hbox{{\switchtobodyfont[small]#1}}}  
+
userdata = userdata or { }
 +
 
 +
-- The formatting functions into which the captured
 +
-- superscript/subscript blocks will be fed
 +
local formatters = { }
 +
 
 +
function formatters.low(one)
 +
    return string.format("\\low{%s}", one)
 +
end
 +
 
 +
function formatters.high(one)
 +
    return string.format("\\high{%s}", one)
 +
end
 +
 
 +
function formatters.lowhigh(one, two)
 +
    return string.format("\\lohi{%s}{%s}", one, two)
 +
end
 +
 
 +
function formatters.highlow(one, two)
 +
    return string.format("\\lohi{%s}{%s}", two, one) -- captured as (high, low); \\lohi takes {low}{high}
 +
end
 +
 
 +
-- These are the characters we may encounter
 +
-- The `/` means we want to expand + and - to \textplus and \textminus, respectively;
 +
-- this substitution is not instant, but will take place inside the first
 +
-- surrounding lpeg.Cs() call.
 +
local plus        = lpeg.P("+") / "\\textplus "
 +
local minus        = lpeg.P("-") / "\\textminus "
 +
local character    = lpeg.R("az", "AZ", "09") -- R is for 'range'
 +
local subscript    = lpeg.P("_")              -- P is simply for 'pattern'
 +
local superscript  = lpeg.P("^")
 +
local leftbrace    = lpeg.P("{")
 +
local rightbrace  = lpeg.P("}")
 +
 
 +
-- a ^ or _ affects either a single character, or a brace-delimited
 +
-- block. Whichever it is, call it `content`.
 +
local single    = character + plus + minus
 +
local multiple  = leftbrace * single^1 * rightbrace
 +
local content  = single + multiple
 +
 
 +
-- These are our top-level elements: non-special text, of course, and
 +
-- blocks of superscript/subscript/both.
 +
-- lpeg.Cs(content) does two things:
 +
-- (1) not all matches go into the `/ function` construction; only
 +
--    *captures* go in. The C in Cs stands for Capture. This way,
 +
--    the superscript/subscript mark gets discarded.
 +
-- (2) it expands plus/minus before they go into the formatter. The
 +
--    s in Cs stands for 'substitute in the replacement values, if any'
 +
local text    = single^1
 +
local low    = subscript * lpeg.Cs(content)
 +
                / formatters.low
 +
local high    = superscript * lpeg.Cs(content)
 +
                / formatters.high
 +
local lowhigh = subscript  * lpeg.Cs(content) *
 +
                superscript * lpeg.Cs(content)
 +
                / formatters.lowhigh
 +
local highlow = superscript * lpeg.Cs(content) *
 +
                subscript * lpeg.Cs(content)
 +
                / formatters.highlow
 +
 
 +
-- Finally, the root element: 'moleculepattern'
 +
local moleculepattern = lpeg.Cs((lowhigh + highlow + low + high + text)^0)
  
\def\chemhigh#1%
+
function thirddata.molecule(string)
   {\ifvoid\chemlowbox
+
    -- * `:match` returns the matched string. Our pattern
      \high{{\switchtobodyfont[small]#1}}%
+
    --   `moleculepattern` should match the entire input string. Any
  \else
+
    --  *performed* substitutions are retained. (`.Cs()` performs a
      \lohi{\box\chemlowbox}
+
    --  previously defined substitution.)
          {{\switchtobodyfont[small]#1}}
+
    -- * `context()` inserts the resulting string into the stream, ready for
  \fi}
+
    --  TeX to evaluate.
 +
    context(moleculepattern:match(string))
 +
end
  
\def\finishchem%
+
\stopluacode
  {\ifvoid\chemlowbox\else
 
      \low{\box\chemlowbox}
 
  \fi}
 
  
\unexpanded\def\molecule%
+
\def\molecule#1{\ctxlua{thirddata.molecule("#1")}}
  {\bgroup
 
  \catcode`\_=\active \uccode`\~=`\_
 
      \uppercase{\let~\chemlow}%
 
  \catcode`\^=\active \uccode`\~=`\^
 
      \uppercase{\let~\chemhigh}%
 
  \dostepwiserecurse {65}{90}{1}  
 
  {\catcode \recurselevel = \active
 
    \uccode`\~=\recurselevel
 
        \uppercase{\edef~{\noexpand\finishchem
 
    \rawcharacter{\recurselevel}}}}%
 
    \catcode`\-=\active \uccode`\~=`\-
 
        \uppercase{\def~{--}}%
 
    \domolecule }%
 
  
\def\domolecule#1{#1\finishchem\egroup}
+
\starttext
 +
    \molecule{Hg^+}, \molecule{SO_4^{2-}}
 +
\stoptext
 
</texcode>
 
</texcode>
  
This monstrosity is a typical TeX parser. Appropriate characters need to be made active; occasionally, <code>\lccode</code> and <code>\uccode</code> need to be set; signaling tricks are needed (for instance, checking if <code>\chemlowbox</code> is void); and then magic happens (or so it seems to a flabbergasted user). More sophisticated parsers involve creating finite state automata, which look even more monstrous.
+
Quite terse and readable by parser standards, isn't it?
  
With luaTeX, things are different.  luaTeX includes a general parser based on PEG (parsing expression grammar) called [http://www.inf.puc-rio.br/roberto/lpeg/lpeg.html lpeg]. This makes writing parsers in TeX much more comprehensible. For example, the above <code>\molecule</code> macro can be written as
+
== Manipulating verbatim text ==
 +
 
 +
''This example demonstrates defining a custom \start...\stop buffer that gets processed through Lua in its entirety.''
 +
 
 +
Suppose we want to write an environment <code>\startdedentedtyping</code> ... <code>\stopdedentedtyping</code> that removes the indentation of the first line from every line. Thus, the output of ...
 +
 
 +
<texcode>
 +
\startdedentedtyping
 +
    #include &lt;stdio.h&gt;
 +
    void main()
 +
    {
 +
        print("Hello world \n") ;
 +
    }
 +
\stopdedentedtyping
 +
</texcode>
 +
... should be the same as the output of ...
 +
 
 +
<texcode>
 +
\starttyping
 +
#include &lt;stdio.h&gt;
 +
void main()
 +
{
 +
    print("Hello world \n") ;
 +
}
 +
\stoptyping
 +
</texcode>
 +
... even though the leading whitespace is different.
 +
 
 +
Defining an environment in TeX that removes the leading spaces but leaves
 +
other spaces untouched is complicated. On the other hand, once we capture the
 +
contents of the environment, removing the leading indent or ''dedenting'' the
 +
content in Lua is easy. Here is a Lua function that uses simple string
 +
substitutions.
  
 
<texcode>
 
<texcode>
 
\startluacode
 
\startluacode
userdata = userdata or {}
+
  -- Initialize a userdata name space to keep our own functions in.
 +
  -- That way, we won't interfere with anything ConTeXt keeps in
 +
  -- the global name space.
 +
  userdata = userdata or {}
 +
 
 +
  function userdata.dedentedtyping(content)
 +
    local lines    = string.splitlines(content)
 +
    local indent  = string.match(lines[1], '^ +') or ''
 +
    local pattern  = '^' .. indent
 +
    for i=1,#lines do
 +
      lines[i] = string.gsub(lines[i],pattern,"")
 +
    end
 +
 
 +
    content = table.concat(lines,'\n')
 +
 
 +
    tex.sprint("\\starttyping\n" .. content .. "\\stoptyping\n")
 +
 
 +
    -- The typing environment looks for an explicit \type{\stoptyping}. So,
 +
    --    context.starttyping() context(content) context.stoptyping()
 +
    -- does not work. But
 +
    --    context.starttyping() context(content) tex.sprint("\\stoptyping")
 +
    -- does.
 +
  end
 +
\stopluacode
 +
</texcode>
  
local lowercase  = lpeg.R("az")
+
Here is the code for defining the <code>\startdedentedtyping...\stopdedentedtyping</code> pair:
local uppercase  = lpeg.R("AZ")
 
local backslash  = lpeg.P("\\")
 
local csname      = backslash * lpeg.P(1)
 
                  * (1-backslash)^0
 
local plus        = lpeg.P("+") / "\\textplus "
 
local minus      = lpeg.P("-") / "\\textminus "
 
local digit      = lpeg.R("09")
 
local sign        = plus + minus
 
local cardinal    = digit^1
 
local integer    = sign^0 * cardinal
 
local leftbrace  = lpeg.P("{")
 
local rightbrace  = lpeg.P("}")
 
local nobrace    = 1 - (leftbrace + rightbrace)
 
local nested      = lpeg.P {leftbrace
 
                  * (csname + sign + nobrace
 
                  + lpeg.V(1))^0  * rightbrace}
 
local any        = lpeg.P(1)
 
  
local subscript   = lpeg.P("_")
+
<texcode>
local superscript = lpeg.P("^")
+
% Create an environment that stores everything
local somescript  = subscript + superscript
+
% between \startdedentedtyping and \stopdedentedtyping
 +
% in a buffer named 'dedentedtyping'.
 +
\def\startdedentedtyping
 +
   {\dostartbuffer
 +
    [dedentedtyping]
 +
    [startdedentedtyping]
 +
    [stopdedentedtyping]}
  
local content    = lpeg.Cs(csname + nested
+
% On closing the dedentedtyping environment, call the LuaTeX
                          + sign + any)
+
% function dedentedtyping(), and pass it the contents of
 +
% the buffer called 'dedentedtyping'
 +
\def\stopdedentedtyping
 +
  {\ctxlua
 +
    {userdata.dedentedtyping(buffers.getcontent('dedentedtyping'))}}
 +
</texcode>
  
local lowhigh    = lpeg.Cc("\\lohi{%s}{%s}")
+
That's all. Finally, we will go into a little more detail on how TeX and Lua communicate with each other.
                  * subscript  * content
 
                  * superscript * content
 
                  / string.format
 
local highlow    = lpeg.Cc("\\hilo{%s}{%s}")
 
                  * superscript * content
 
                  * subscript  * content
 
                  / string.format
 
local low        = lpeg.Cc("\\low{%s}")
 
                  * subscript  * content     
 
                  / string.format
 
local high      = lpeg.Cc("\\high{%s}")
 
                  * superscript * content
 
                  / string.format
 
local justtext  = (1 - somescript)^1
 
local parser    = lpeg.Cs((csname + lowhigh
 
                        + highlow + low
 
                        + high + sign + any)^0)
 
  
userdata.moleculeparser = parser
+
== Other examples ==
  
function userdata.molecule(str)
+
* [[Calculations_in_Lua|Calculations in Lua]] (warning date 2012)
    return parser:match(str)
+
* [[LPeg|Writing a parser with LPeg]] (Lua Parsing Expression Grammars)
end
+
* [[Random|Random numbers]] in ConTeXt and MetaPost
\stopluacode
+
* [[SQL|An example with SQL database]]
 +
* [[Pascal's Triangle]]
 +
 
 +
= In detail: the interaction between TeX and Lua =
 +
 
 +
To a first approximation, the interaction between TeX and Lua is straightforward. When TeX (i.e., the LuaTeX engine) starts, it loads the input file in memory and processes it token by token. When TeX encounters <code>\directlua</code>, it stops reading the file in memory, <em>fully expands the argument of <code>\directlua</code></em>, and passes the control to a Lua instance. The Lua instance, which runs with a few preloaded libraries, processes the expanded arguments of <code>\directlua</code>. This Lua instance has a special output stream which can be accessed using <code>tex.print(...)</code>. The function <code>tex.print(...)</code> is just like the Lua function <code>print(...)</code> except that <code>tex.print(...)</code> prints to a <em>TeX stream</em> rather than to the standard output. When the Lua instance finishes processing its input, it passes the contents of the <em>TeX stream</em> back to TeX.<ref>The output of <code>tex.print(...)</code> is buffered and not passed to TeX until the Lua instance has stopped.</ref> TeX then inserts the contents of the <em>TeX stream</em> at the current location of the file that it was reading; expands the contents of the <em>TeX stream</em>; and continues. If TeX encounters another <code>\directlua</code>, the above process is repeated.
 +
 
 +
As an exercise, imagine what happens when the following input is processed by LuaTeX. The answer is in the footnotes. <ref>In this example, two different kinds of quotations are used to avoid escaping quotes. Escaping quotes inside  <code>\directlua</code> is tricky. The above was a contrived example; if you ever need to escape quotes, you can use the <code>\startluacode ... \stopluacode</code> syntax.</ref>
  
\def\molecule#1%
+
<texcode>
   {\ctxlua{userdata.molecule("#1")}}
+
\directlua%
</texcode>  
+
   {tex.print("Depth 1
 +
          \\directlua{tex.print('Depth 2')}")}
 +
</texcode>
  
This is more verbose than the TeX solution, but is easier to read and write. With a proper parser, I do not have to use tricks to check if either one or both <code>_</code> and <code>^</code> are present. More importantly, anyone (once they know the lpeg syntax) can read the parser and easily understand what it does. This is in contrast to the implementation based on TeX macro juggling, which requires you to run a TeX interpreter in your head to understand it.
+
For more on this, see the [http://wiki.luatex.org/index.php/Writing_Lua_in_TeX] article on the [http://wiki.luatex.org/index.php/Main_Page LuaTeX wiki].
  
= Conclusion =
+
= Notes =
 +
<references />
  
luaTeX is removing many TeX barriers: using system fonts, reading and writing Unicode files, typesetting non-Latin languages, among others. However, the biggest feature of luaTeX is the ability to use a high-level programming language to program TeX. This can potentially lower the learning curve for programming TeX.
+
{{note | This article is originally based on [https://www.tug.org/members/TUGboat/tb30-2/tb95mahajan-luatex.pdf this TugBoat article ]. Feel free to modify it.}}
  
In this article, I have mentioned only one aspect of programming TeX: macros that manipulate their input and output some text to the main TeX stream. Many other kinds of manipulations are possible: luaTeX provides access to TeX boxes, token lists, dimensions, glues, catcodes, direction parameters, math parameters, etc. The details can be found in the [http://www.luatex.org/documentation.html luaTeX  manual].
+
[[Category:Programming and Databases]]

Latest revision as of 13:23, 29 May 2021

Calling Lua from TeX

The interweaving of ConTeXt and Lua consists of two elements: first you tell TeX that you're starting some Lua code; then, once inside Lua, you need to use the appropriate functions to put things into the TeX stream.

There are two main ways to execute Lua code in a ConTeXt document: the command \ctxlua, and the environment \startluacode...\stopluacode. Both are wrappers around the LuaTeX primitive \directlua, which you should never need to use. In general, you will define a function inside a \startluacode block, and then define a TeX command that calls the function using \ctxlua, especially because \ctxlua has a few idiosyncrasies.

The main thing about Lua code in a TeX document is this: the code is expanded by TeX before Lua gets to it. This means that all the Lua code, even the comments, must be valid TeX! A string like "\undefined" will cause an immediate failure.

Calling a bit of Lua inline: \ctxlua

The command \ctxlua is for short inline snippets of Lua, such as

$2 + 5 \neq \ctxlua{context(3+5)}$, but is equal to \ctxlua{context(2+5)}.
This is \ctxlua{context(string.upper("absolutely"))} true.

\ctxlua operates under the normal TeX catcodes (category codes). This means the following two things for the Lua code inside:

  • all newlines get treated as spaces
  • special TeX characters like &, #, $, {, }, etc., need to be escaped.

In addition, the warning above still holds. All the Lua code, even the comments, must be valid TeX.

Some code to illustrate the newline problem:

\ctxlua
  {-- A Lua comment
   tex.print("This is not printed")}
\ctxlua
  {% A Tex comment
   tex.print("This is printed")}

The problem with special TeX characters. (#t is Lua for 'the length of array t'.)

% This doesn't work:
%\ctxlua
%  {local t = {1,2,3,4}
%   tex.print("length " .. #t)}
\ctxlua
  {local t = {1,2,3,4}
   tex.print("length " .. \string#t)}
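
When a snippet needs several special characters, one way to avoid the escaping altogether is to move the code into a \startluacode block (described below) and only call it from \ctxlua. A minimal sketch; the function name userdata.printlength is made up for this illustration:

```tex
\startluacode
    userdata = userdata or {}

    -- hypothetical helper: the length operator needs no escaping here
    function userdata.printlength()
        local t = {1, 2, 3, 4}
        context("length " .. #t)
    end
\stopluacode

\ctxlua{userdata.printlength()}
```

The \ctxlua call itself now contains no special characters, so the normal TeX catcodes do no harm.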


Calling a Lua function with \cldcontext and getting the result

One can execute Lua code from within TeX and get the result back in TeX by using \cldcontext. Thus, if myfunction is a function of a variable x defined in Lua, \cldcontext{myfunction(5)} typesets the value of myfunction(5) in TeX. This is equivalent to \ctxlua{context(myfunction(5))}.
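
As a small illustration of \cldcontext, the following sketch defines a function and typesets its result. The name userdata.square is an assumption made for this example, not something ConTeXt provides:

```tex
\startluacode
    userdata = userdata or {}

    -- hypothetical example function
    function userdata.square(x)
        return x * x
    end
\stopluacode

The square of 5 is \cldcontext{userdata.square(5)}.
```

With these definitions, the sentence should come out as "The square of 5 is 25."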


A larger Lua block: \startluacode...\stopluacode

Inside the \startluacode...\stopluacode environment, newlines and special characters behave normally. This solves the catcode problem that \ctxlua suffers from. Apart from these special characters, the main warning remains in force: all the Lua code, even the comments, must be valid TeX.

\startluacode
    -- The unknown command \undefined will cause this entire block to fail.

    -- Print a countdown '10, 8, ..., 0!'
    -- `..` is Lua for string concatenation
    for i = 10, 2, -2 do
        context(i .. ", ")
    end
    context("0!")

    -- \\par is equivalent to a blank line in the input
    -- (Notice the escaped backslash: TeX won't mind the above comment.)
    context.par()

    -- Look! we can use # and $ with impunity!
    context("Unless we print them, then we must \\#\\$\\& print the escape characters, too.")
\stopluacode

Putting Lua code in an external file

You can put your Lua code in an external file (with the .lua extension) and include it with the require command:

\startluacode
-- include the file my-lua-lib.lua
require("my-lua-lib")
\stopluacode
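
For illustration, the external file could look like the sketch below. The file contents and the function userdata.double are hypothetical; the userdata namespace is explained in the next section.

```lua
-- my-lua-lib.lua (hypothetical contents)

-- keep everything inside the userdata namespace
userdata = userdata or {}

-- hypothetical helper: double a number
function userdata.double(x)
    return 2 * x
end
```

After the require call above, \ctxlua{context(userdata.double(21))} would then typeset 42.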

Namespaces

It is a good habit to put your custom-defined functions in their own namespace. The traditional namespace for this is userdata:

\startluacode
    -- if userdata doesn't exist yet, create it
    userdata = userdata or {}
    -- define a shorter synonym
    u = userdata

    -- create my custom function inside the userdata namespace
    function u.myfunction()
        -- do stuff
    end        
\stopluacode

The full list of canonical namespaces, taken from luat-ini.lua:

userdata      = userdata      or { } -- for users (e.g. functions etc)
thirddata     = thirddata     or { } -- only for third party modules
moduledata    = moduledata    or { } -- only for development team
documentdata  = documentdata  or { } -- for users (e.g. raw data)
parametersets = parametersets or { } -- experimental for team

If your module, environment, or document is going to be used by other people, you should create your own subnamespaces within these tables.

moduledata['mymodule'] = { }
mm = moduledata.mymodule
function mm.mainfunction()
    -- do stuff
end

Calling TeX from Lua

This is a topic in its own right and has its own dedicated pages.

Putting stuff in your TeX document from Lua

Simple printing: context(), tex.print(), and tex.sprint()

Use context(...) for most things. It is equivalent to tex.print(string.format(...)), so

\startluacode
name = "Jane"
date = "today"
context("Hello %s, how are you %s?", name, date)
-- Becomes 'Hello Jane, how are you today?'
\stopluacode

More primitively, you have tex.print() and tex.sprint(). Either one can take as an argument either a number of strings, or an array of strings, and will then insert the strings into the TeX stream. The only difference is that tex.print() treats each string as a separate input line, while tex.sprint() doesn't. So the following lines

\ctxlua{tex.print("a", "b")}
\ctxlua{tex.print({"a", "b"})}

are both interpreted by TeX as

a
b

but when we use tex.sprint instead, either of the following

\ctxlua{tex.sprint("a", "b")}
\ctxlua{tex.sprint({"a", "b"})}

will be read by TeX as

ab

without any space in between.

Context commands

Most commands that you would type with a backslash in plain ConTeXt, you can access from Lua with context.command. Unadorned strings end up in TeX as arguments in curly braces; Lua tables end up in TeX as parameter blocks in square brackets. The following two pieces of code are equivalent:

\startluacode
    context.chapter({"first"}, "Some title")
    context.startcolumns({n = 3, rule = "on"})
        context("Hello one")
    context.column()
        context("Hello two")
    context.column()
        context("Hello three")
    context.stopcolumns()
\stopluacode
    \chapter[first]{Some title}
    \startcolumns[n=3, rule=on]
        Hello one
    \column
        Hello two
    \column
        Hello three
    \stopcolumns

For a fuller account of the context.commands, see the ConTeXt Lua document manual. It is old, but most of it still applies.

One final note: arguments can also be specified in the form of nested functions. Because LuaTeX evaluates the deepest-nested argument first, this may cause the context() calls to be evaluated in the wrong order. For more on this, see the article on ConTeXt Lua documents, and also, again, the CLD manual.
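
A sketch of the usual way around the ordering issue, based on the behaviour described in the CLD manual (please check it against that manual): a Lua function passed as an argument is only called when that argument is written to TeX, so the nested command ends up inside the outer one.

```tex
\startluacode
    -- The inner call is wrapped in a function, so it runs while the
    -- argument of framed is being written, not before.
    context.framed(function() context.bold("important") end)
\stopluacode
```

This is intended to be equivalent to typing \framed{\bold{important}} by hand.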

Passing arguments and buffers: ConTeXt commands that hook into Lua

Making \command{arg1}{arg2} hook into Lua

First, define a Lua function:

\startluacode
    -- remember, using the userdata namespace prevents conflicts
    userdata = userdata or {}

    function userdata.surroundwithdashes(str)
        context("--" .. str .. "--")
    end
\stopluacode

Then define the TeX command that expands to a \ctxlua call:

\def\surroundwd#1%
    {\ctxlua{userdata.surroundwithdashes([==[#1]==])}}

NB: quoting with [==[#1]==] (long strings) works just like "#1" in most cases, but in addition it is robust against #1 containing the quotation mark ", which would terminate the Lua string prematurely. Inside \unprotect ... \protect the macros \!!bs and \!!es are at your disposal. They are equivalent to [===[ and ]===] and, being single tokens to TeX, are parsed faster. (See luat-ini.mkiv.)
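
The difference between the two quoting styles can be seen in the following sketch. The macro \fragilewd is a hypothetical counterexample introduced only for comparison:

```tex
\startluacode
    userdata = userdata or {}

    function userdata.surroundwithdashes(str)
        context("--" .. str .. "--")
    end
\stopluacode

% Fragile variant: a " inside #1 ends the Lua string too early.
\def\fragilewd#1%
    {\ctxlua{userdata.surroundwithdashes("#1")}}

% Robust variant, as defined above.
\def\surroundwd#1%
    {\ctxlua{userdata.surroundwithdashes([==[#1]==])}}

\surroundwd{a "quoted" word}   % works
%\fragilewd{a "quoted" word}   % Lua syntax error
```

In the fragile variant, Lua receives userdata.surroundwithdashes("a "quoted" word"), which is not a valid expression.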

Making \startenv...\stopenv hook into Lua

The first job is, as ever, to have the Lua function at the ready

\startluacode
    userdata = userdata or {}

    function userdata.verynarrow(buffer)
        -- equivalent to \startnarrower[10em]
        context.startnarrower({"10em"})
            context(buffer)
        context.stopnarrower()
    end
\stopluacode

Next, we define the start command of our custom buffer:

\def\startverynarrow%
  {\dostartbuffer
    [verynarrow]      % buffer name
    [startverynarrow] % command where buffer starts
    [stopverynarrow]} % command where buffer ends
                      % also: command invoked when buffer stops

Lastly, we define the \stopverynarrow command such that it passes the recently completed buffer to our verynarrow Lua function:

\def\stopverynarrow
  {\ctxlua
     {userdata.verynarrow(buffers.getcontent('verynarrow'))}}
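
With the three definitions above in place, the environment could be used as in this sketch (the sample text is, of course, arbitrary):

```tex
Text before the narrow block.

\startverynarrow
This paragraph is collected into the buffer named verynarrow and
passed to the Lua function, which typesets it with wider margins.
\stopverynarrow

Text after the narrow block.
```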

And that's it! The rest of this article will consist of examples.

Examples

Arithmetic without using an abacus

This example demonstrates writing simple commands that invoke \ctxlua.

Doing simple arithmetic in TeX can be extremely difficult. With Lua, simple arithmetic becomes trivial. For example, if you want a macro to find the cosine of an angle (in degrees), you can write

\def\COSINE#1%
  {\ctxlua{context(math.cos(#1*2*math.pi/360))}}

The built-in math.cos function assumes that the argument is specified in radians, so we convert from degrees to radians on the fly. If you want to type the value of $\pi$ in an article, you can simply say

$\pi = \ctxlua{context(math.pi)}$ 

or, if you want less precision:

 
$\pi = \ctxlua{context("\letterpercent.6f", math.pi)}$

Notice that the percent sign is written as \letterpercent, because a literal % would start a TeX comment inside \ctxlua.
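
The two ideas can be combined. The following sketch defines a hypothetical \ROUNDEDCOSINE macro (the name is an assumption for this example) that prints the cosine with four decimals; math.rad is the standard Lua function for converting degrees to radians.

```tex
% hypothetical variant of \COSINE that prints four decimals
\def\ROUNDEDCOSINE#1%
  {\ctxlua{context("\letterpercent.4f", math.cos(math.rad(#1)))}}

$\cos 60^\circ \approx \ROUNDEDCOSINE{60}$
```

Formatting the result also hides floating-point noise in the last digits of the unformatted value.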

mathexpr with LMTX

In LMTX there is a new way to use calculated expressions with mathexpr through (math-fun).

Some examples are:

$ \pi = \mathexpr[.40N]{pi}            $
$ \pi = \mathexpr[.80N]{sqrt(11)}      $
$ \pi = \decimalexpr[.80N]{sqrt(11)}   $
$ \pi = \decimalexpr{sqrt(11)}         $
$ c = \complexexpr{123 + new(456,789)} $

Loops without worrying about expansion

This example demonstrates using Lua to write a quasi-repetitive piece of ConTeXt code.

Loops in TeX are tricky, because macro assignments and macro expansion interact in strange ways. For example, suppose we want to typeset a table showing the sum of the roll of two dice and want the output to look like this:

\setupcolors[state=start]
\setupTABLE[each][each][width=2em,height=2em,align={middle,middle}]  
\setupTABLE[r][1][background=color,backgroundcolor=gray]  
\setupTABLE[c][1][background=color,backgroundcolor=gray]

\bTABLE  
  \bTR \bTD $(+)$ \eTD \bTD 1 \eTD \bTD 2 \eTD 
       \bTD 3 \eTD \bTD 4  \eTD \bTD 5  \eTD \bTD 6  \eTD \eTR  
  \bTR \bTD 1     \eTD \bTD 2 \eTD \bTD 3 \eTD 
       \bTD 4 \eTD \bTD 5  \eTD \bTD 6  \eTD \bTD 7  \eTD \eTR  
  \bTR \bTD 2     \eTD \bTD 3 \eTD \bTD 4 \eTD 
       \bTD 5 \eTD \bTD 6  \eTD \bTD 7  \eTD \bTD 8  \eTD \eTR  
  \bTR \bTD 3     \eTD \bTD 4 \eTD \bTD 5 \eTD
       \bTD 6 \eTD \bTD 7  \eTD \bTD 8  \eTD \bTD 9  \eTD \eTR  
  \bTR \bTD 4     \eTD \bTD 5 \eTD \bTD 6 \eTD
       \bTD 7 \eTD \bTD 8  \eTD \bTD 9  \eTD \bTD 10 \eTD \eTR  
  \bTR \bTD 5     \eTD \bTD 6 \eTD \bTD 7 \eTD 
       \bTD 8 \eTD \bTD 9  \eTD \bTD 10 \eTD \bTD 11 \eTD \eTR  
  \bTR \bTD 6     \eTD \bTD 7 \eTD \bTD 8 \eTD 
       \bTD 9 \eTD \bTD 10 \eTD \bTD 11 \eTD \bTD 12 \eTD \eTR  
\eTABLE

This is easy in LuaTeX. Once a Lua instance starts, TeX does not see anything until the Lua instance exits. So, we can write the loop in Lua, and simply print the values that we would have typed to the TeX stream. When the control is passed to TeX, TeX sees the input as if we had typed it by hand. This is the Lua code for the above table:

\setupcolors[state=start]
\setupTABLE[each][each][width=2em,height=2em,align={middle,middle}]  
\setupTABLE[r][1][background=color,backgroundcolor=gray]  
\setupTABLE[c][1][background=color,backgroundcolor=gray]

\startluacode
context.bTABLE()
  context.bTR()
    context.bTD() context("$(+)$") context.eTD()
    for j=1,6 do
      context.bTD() context(j) context.eTD()
    end
  context.eTR()
  for i=1,6 do
    context.bTR()
    context.bTD() context(i) context.eTD()
    for j=1,6 do
      context.bTD() context(i+j) context.eTD()
    end
    context.eTR()
  end
context.eTABLE()
\stopluacode
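
Because the loop lives in Lua, it is easy to turn it into a reusable command. The sketch below wraps the same loop in a function with the table size as a parameter; the names userdata.sumtable and \sumtable are hypothetical.

```tex
\startluacode
    userdata = userdata or {}

    -- hypothetical helper: typeset an n-by-n addition table
    function userdata.sumtable(n)
        context.bTABLE()
        context.bTR()
        context.bTD() context("$(+)$") context.eTD()
        for j = 1, n do
            context.bTD() context(j) context.eTD()
        end
        context.eTR()
        for i = 1, n do
            context.bTR()
            context.bTD() context(i) context.eTD()
            for j = 1, n do
                context.bTD() context(i + j) context.eTD()
            end
            context.eTR()
        end
        context.eTABLE()
    end
\stopluacode

\def\sumtable#1{\ctxlua{userdata.sumtable(#1)}}

\sumtable{6}
```

Here \sumtable{6} should reproduce the dice table, and other sizes come for free.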

Parsing input without exploding your head

This example demonstrates parsing simple ASCII notation with Lua's lpeg parser.

As an example, let's consider typesetting chemical molecules in TeX. Normally, molecules should be typeset in text mode rather than math mode. If we want

H₃SO₄⁺,

we must type

H\low{3}SO\lohi{4}{\textplus},

but we'd much rather type

\molecule{H_3SO_4^+}.

So, we need a function that can take a string like that, parse it, and turn it into the appropriate TeX code. LuaTeX includes a general parser based on PEG (parsing expression grammar) called lpeg, and it makes writing little parsers positively joyful. (Once you've got the knack of it, at least.) For example, the above \molecule macro can be written as follows.

\startluacode

-- we will put our molecule function in the userdata namespace.
userdata = userdata or { }

-- The formatting functions into which the captured
-- superscript/subscript blocks will be fed
local formatters = { }

function formatters.low(one)
    return string.format("\\low{%s}", one)
end

function formatters.high(one)
    return string.format("\\high{%s}", one)
end

function formatters.lowhigh(one, two)
    return string.format("\\lohi{%s}{%s}", one, two)
end

function formatters.highlow(one, two)
    -- captures arrive as (high, low), but lohi takes the low part first
    return string.format("\\lohi{%s}{%s}", two, one)
end

-- These are the characters we may encounter
-- The `/` means we want to expand + and - to \textplus and \textminus, respectively;
-- this substitution is not instant, but will take place inside the first
-- surrounding lpeg.Cs() call.
local plus         = lpeg.P("+") / "\\textplus "
local minus        = lpeg.P("-") / "\\textminus "
local character    = lpeg.R("az", "AZ", "09") -- R is for 'range'
local subscript    = lpeg.P("_")              -- P is simply for 'pattern'
local superscript  = lpeg.P("^")
local leftbrace    = lpeg.P("{")
local rightbrace   = lpeg.P("}")

-- a ^ or _ affects either a single character, or a brace-delimited
-- block. Whichever it is, call it `content`.
local single    = character + plus + minus
local multiple  = leftbrace * single^1 * rightbrace
local content   = single + multiple

-- These are our top-level elements: non-special text, of course, and
-- blocks of superscript/subscript/both.
-- lpeg.Cs(content) does two things:
-- (1) not all matches go into the `/ function` construction; only
--     *captures* go in. The C in Cs stands for Capture. This way, 
--     the superscript/subscript mark gets discarded.
-- (2) it expands plus/minus before they go into the formatter. The
--     s in Cs stands for 'substitute in the replacement values, if any'
local text    = single^1
local low     = subscript * lpeg.Cs(content) 
                / formatters.low
local high    = superscript * lpeg.Cs(content) 
                / formatters.high
local lowhigh = subscript   * lpeg.Cs(content) * 
                superscript * lpeg.Cs(content) 
                / formatters.lowhigh
local highlow = superscript * lpeg.Cs(content) * 
                subscript * lpeg.Cs(content) 
                / formatters.highlow

-- Finally, the root element: 'moleculepattern'
local moleculepattern = lpeg.Cs((lowhigh + highlow + low + high + text)^0)

function userdata.molecule(str)
    -- * `:match` returns the matched string. Our pattern
    --   `moleculepattern` should match the entire input string. Any
    --   *performed* substitutions are retained. (`.Cs()` performs a
    --   previously defined substitution.)
    -- * `context()` inserts the resulting string into the stream, ready for
    --   TeX to evaluate.
    context(moleculepattern:match(str))
end

\stopluacode

\def\molecule#1{\ctxlua{userdata.molecule("#1")}}

\starttext
    \molecule{Hg^+}, \molecule{SO_4^{2-}}
\stoptext

Quite terse and readable by parser standards, isn't it?
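
If the interplay between `/` and lpeg.Cs() is still unclear, it may help to look at it in isolation. The following is a minimal sketch with made-up variable names, using only the lpeg functions that already appear in the parser above:

```tex
\startluacode
    -- replace every minus sign, copy everything else unchanged
    local minus   = lpeg.P("-") / "\\textminus "
    local any     = lpeg.P(1)
    local replace = lpeg.Cs((minus + any)^0)

    context(replace:match("a-b-c"))
\stopluacode
```

Here `replace:match("a-b-c")` returns the string `a\textminus b\textminus c`, which context() then hands to TeX. Without the surrounding lpeg.Cs(), the replacement values would be returned as separate captures instead of being substituted into the matched text.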

Manipulating verbatim text

This example demonstrates defining a custom \start...\stop buffer that gets processed through Lua in its entirety.

Suppose we want to write an environment \startdedentedtyping ... \stopdedentedtyping that removes the indentation of the first line from every line. Thus, the output of ...

\startdedentedtyping
    #include <stdio.h>
    void main()
    {
        print("Hello world \n") ;
    }
\stopdedentedtyping

... should be the same as the output of ...

\starttyping
#include <stdio.h>
void main()
{
    print("Hello world \n") ;
}
\stoptyping

... even though the leading whitespace is different.

Defining an environment in TeX that removes the leading spaces but leaves other spaces untouched is complicated. On the other hand, once we capture the contents of the environment, removing the leading indent or dedenting the content in Lua is easy. Here is a Lua function that uses simple string substitutions.

\startluacode
  -- Initialize a userdata name space to keep our own functions in.
  -- That way, we won't interfere with anything ConTeXt keeps in 
  -- the global name space.
  userdata = userdata or {}

  function userdata.dedentedtyping(content)
    local lines    = string.splitlines(content)
    local indent   = string.match(lines[1], '^ +') or ''
    local pattern  = '^' .. indent
    for i=1,#lines do
      lines[i] = string.gsub(lines[i],pattern,"")
    end

    content = table.concat(lines,'\n')

    tex.sprint("\\starttyping\n" .. content .. "\\stoptyping\n")

    -- The typing environment looks for an explicit \type{\stoptyping}. So,
    --     context.starttyping() context(content) context.stoptyping()
    -- does not work. But
    --     context.starttyping() context(content) tex.sprint("\\stoptyping")
    -- does.
  end
\stopluacode
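
The dedenting step can also be tried outside ConTeXt. The following plain-Lua sketch is an assumption-laden variant of the function above: it replaces the ConTeXt helper string.splitlines with a gmatch loop, returns the result instead of printing it to TeX, and strips the indent as a literal prefix rather than as a Lua pattern. The name userdata.dedent is hypothetical.

```lua
userdata = userdata or {}

-- Plain-Lua sketch of the dedenting step (no ConTeXt helpers needed).
function userdata.dedent(content)
    local lines = {}
    -- Split on newlines; empty lines are kept.
    for line in (content .. "\n"):gmatch("(.-)\n") do
        lines[#lines + 1] = line
    end
    -- Indentation of the first line (spaces only, as in the original).
    local indent = (lines[1] or ""):match("^ +") or ""
    for i = 1, #lines do
        -- Strip the indent only where the line really starts with it.
        if lines[i]:sub(1, #indent) == indent then
            lines[i] = lines[i]:sub(#indent + 1)
        end
    end
    return table.concat(lines, "\n")
end

print(userdata.dedent("    a\n      b\n    c"))
```

Lines that are indented less than the first line are left untouched, which matches the behaviour of the pattern-based version.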

Here is the code for defining the \startdedentedtyping...\stopdedentedtyping pair:

% Create an environment that stores everything 
% between \startdedentedtyping and \stopdedentedtyping 
% in a buffer named 'dedentedtyping'.
\def\startdedentedtyping
  {\dostartbuffer
    [dedentedtyping]
    [startdedentedtyping]
    [stopdedentedtyping]}

% On closing the dedentedtyping environment, call the Lua
% function userdata.dedentedtyping(), and pass it the contents of 
% the buffer called 'dedentedtyping'
\def\stopdedentedtyping
  {\ctxlua
     {userdata.dedentedtyping(buffers.getcontent('dedentedtyping'))}}

That's all. Finally, we will go into a little more detail on how TeX and Lua communicate with each other.

Other examples

  • Calculations in Lua (warning: dated 2012)
  • Writing a parser with LPeg (Lua Parsing Expression Grammars)
  • Random numbers in ConTeXt and MetaPost
  • An example with SQL database
  • Pascal's Triangle

In detail: the interaction between TeX and Lua

To a first approximation, the interaction between TeX and Lua is straightforward. When TeX (i.e., the LuaTeX engine) starts, it loads the input file in memory and processes it token by token. When TeX encounters \directlua, it stops reading the file in memory, fully expands the argument of \directlua, and passes the control to a Lua instance. The Lua instance, which runs with a few preloaded libraries, processes the expanded arguments of \directlua. This Lua instance has a special output stream which can be accessed using tex.print(...). The function tex.print(...) is just like the Lua function print(...) except that tex.print(...) prints to a TeX stream rather than to the standard output. When the Lua instance finishes processing its input, it passes the contents of the TeX stream back to TeX.[1] TeX then inserts the contents of the TeX stream at the current location of the file that it was reading; expands the contents of the TeX stream; and continues. If TeX encounters another \directlua, the above process is repeated.
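
The fact that TeX expands the argument before Lua sees it can be put to use. In the following sketch, the macros \myname and \myvalue are hypothetical names chosen for this example:

```tex
\def\myname{Jane}
\def\myvalue{21}

% Lua receives the already expanded code: context("Hello Jane")
\ctxlua{context("Hello \myname")}

% Lua receives: context(2 * 21)
\ctxlua{context(2 * \myvalue)}
```

Lua never sees the macros themselves, only their expansions, so the two calls typeset "Hello Jane" and "42". The same mechanism is what makes an undefined or unexpandable command inside Lua code a problem.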

As an exercise, imagine what happens when the following input is processed by LuaTeX. The answer is in the footnotes. [2]

\directlua%
  {tex.print("Depth 1 
           \\directlua{tex.print('Depth 2')}")}

For more on this, see the "Writing Lua in TeX" article (http://wiki.luatex.org/index.php/Writing_Lua_in_TeX) on the LuaTeX wiki.

Notes

  1. The output of tex.print(...) is buffered and not passed to TeX until the Lua instance has stopped.
  2. In this example, two different kinds of quotations are used to avoid escaping quotes. Escaping quotes inside \directlua is tricky. The above was a contrived example; if you ever need to escape quotes, you can use the \startluacode ... \stopluacode syntax.

NOTE: This article is originally based on this TUGboat article (https://www.tug.org/members/TUGboat/tb30-2/tb95mahajan-luatex.pdf). Feel free to modify it.