5,262 bytes removed ,  09:06, 22 September 2015
Corrected typo in URL to lpeg
{{note | This is a wikified version of [https://www.tug.org/members/TUGboat/tb30-2/tb95mahajan-luatex.pdf this TUGboat article]. Feel free to modify it. }}

In this article, I explain how to use Lua to write macros in [[LuaTeX]]. I give some examples of macros that are complicated in [[PdfTeX]], but can be defined easily using Lua in LuaTeX. These examples include macros that do arithmetic on their arguments, use loops, and parse their arguments.

= Calling Lua from TeX =

The interweaving of ConTeXt and Lua consists of two elements: first you tell TeX that you're starting some Lua code; then, once inside Lua, you need to use the appropriate functions to put things into the TeX stream.
There are two main ways to execute Lua code in a ConTeXt document: the command <code>\ctxlua</code>, and the environment <code>\startluacode...\stopluacode</code>. Both are wrappers around the LuaTeX primitive <code>\directlua</code>, which you should never need to use. In general, you will define a function inside a <code>\startluacode</code> block, and then define a TeX command that calls the function using <code>\ctxlua</code>, especially because <code>\ctxlua</code> has a few idiosyncrasies.

The main thing about Lua code in a TeX document is this: the code is expanded by TeX ''before'' Lua gets to it. '''This means that all the Lua code, even the comments, must be valid TeX!''' A string like "\undefined" will cause an immediate failure.

== Calling a bit of Lua inline: \ctxlua ==

The command <code>\ctxlua</code> is for short inline snippets of Lua, such as

<texcode>
$2 + 5 \neq \ctxlua{context(3+5)}$, but is equal to \ctxlua{context(2+5)}.
This is \ctxlua{context(string.upper("absolutely"))} true.
</texcode>

<code>\ctxlua</code> operates under the normal TeX catcodes (category codes). This means the following two things for the Lua code inside:
* all newlines get treated as spaces
* special TeX characters like &, #, $, {, }, etc., need to be escaped.

In addition, the warning above still holds. All the Lua code, even the comments, must be valid TeX.

Some code to illustrate the newline problem:

<texcode>
\ctxlua
  {-- A Lua comment
   tex.print("This is not printed")}
\ctxlua
  {% A TeX comment
   tex.print("This is printed")}
</texcode>

The problem with special TeX characters. (<code>#t</code> is Lua for 'the length of array <code>t</code>'.)

<texcode>
% This doesn't work:
%\ctxlua
%  {local t = {1,2,3,4}
%   tex.print("length " .. #t)}
\ctxlua
  {local t = {1,2,3,4}
   tex.print("length " .. \string#t)}
</texcode>

== A larger Lua block: \startluacode...\stopluacode ==

Inside the <code>\startluacode...\stopluacode</code> environment, newlines and special characters behave normally. This solves the catcode problem that <code>\ctxlua</code> suffers from. Apart from these special characters, the main warning remains in force: all the Lua code, even the comments, must be valid TeX.

<texcode>
\startluacode
  -- The unknown command \undefined will cause this entire block to fail.

  -- Print a countdown '10, 8, ..., 0!'
  -- `..` is Lua for string concatenation
  for i = 10, 2, -2 do
    context(i .. ", ")
  end
  context("0!")

  -- \\par is equivalent to a blank line in the input
  -- (Notice the escaped backslash: TeX won't mind the above comment.)
  context.par()

  -- Look! we can use # and $ with impunity!
  context("Unless we print them, then we must \\#\\$\\& print the escape characters, too.")
\stopluacode
</texcode>

= Introduction =

As its name suggests, [[LuaTeX]] adds Lua, a programming language, to TeX, the typesetter. This allows us to program TeX in a high-level programming language. For example, consider a TeX macro that divides two numbers. Such a macro is provided by the <tt>fp</tt> package and also by the <tt>pgfmath</tt> library of the <tt>TikZ</tt> package. The following comment is from the <tt>fp</tt> package:

<texcode>
\def\FP@div#1#2.#3.#4\relax#5.#6.#7\relax{%
  % [...] algorithmic idea (for x>0, y>0)
  % - %determine \FP@shift such that
  %     y*10^\FP@shift < 100000000
  %                    <=y*10^(\FP@shift+1)
  % - %determine \FP@shift' such that
  %     x*10^\FP@shift' < 100000000
  %                    <=x*10^(\FP@shift+1)
  % - x=x*\FP@shift'
  % - y=y*\FP@shift
  % - \FP@shift=\FP@shift-\FP@shift'
  % - res=0
  % - while y>0 %fixed-point representation!
  % -   \FP@times=0
  % -   while x>y
  % -     \FP@times=\FP@times+1
  % -     x=x-y
  % -   end
  % -   y=y/10
  % -   res=10*res+\FP@times/1000000000
  % - end
  % - %shift the result according to \FP@shift
</texcode>

The <tt>pgfmath</tt> library implements the macro in a similar way, but limits the number of shifts that it does. These macros highlight the state of affairs in writing TeX macros. Even simple things like multiplying two numbers are hard; you either have to work extremely hard to circumvent the programming limitations of TeX, or, more frequently, hope that someone else has done the hard work for you. In LuaTeX, such a function can be written using the <code>/</code> operator (I will explain the details later):

<texcode>
\def\DIVIDE#1#2{\directlua{tex.print(#1/#2)}}
</texcode>

Thus, with LuaTeX ordinary users can write simple macros; and, perhaps more importantly, can read and understand macros written by TeX wizards.

Since the LuaTeX project started it has been actively supported by ConTeXt.<ref>Not surprising, as two of LuaTeX's main developers (Taco Hoekwater and Hans Hagen) are also the main ConTeXt developers.</ref> These days, the various <em>How do I write such a macro</em> questions on the ConTeXt mailing list are answered by a solution that uses Lua. I present a few such examples in this article. I have deliberately avoided examples about [[Fonts in LuaTeX | fonts]] and non-Latin languages. There is already quite a bit of documentation about them. In this article, I highlight how to use LuaTeX to write macros that require some <em>flow control</em>: randomized outputs, loops, and parsing.

= Interaction between TeX and lua =

To a first approximation, the interaction between TeX and Lua is straightforward. When TeX (i.e., the LuaTeX engine) starts, it loads the input file in memory and processes it token by token. When TeX encounters <code>\directlua</code>, it stops reading the file in memory, <em>fully expands the argument of <code>\directlua</code></em>, and passes the control to a Lua instance. The Lua instance, which runs with a few preloaded libraries, processes the expanded arguments of <code>\directlua</code>. This Lua instance has a special output stream which can be accessed using <code>tex.print(...)</code>. The function <code>tex.print(...)</code> is just like the Lua function <code>print(...)</code> except that <code>tex.print(...)</code> prints to a <em>TeX stream</em> rather than to the standard output. When the Lua instance finishes processing its input, it passes the contents of the <em>TeX stream</em> back to TeX.<ref>The output of <code>tex.print(...)</code> is buffered and not passed to TeX until the Lua instance has stopped.</ref> TeX then inserts the contents of the <em>TeX stream</em> at the current location of the file that it was reading; expands the contents of the <em>TeX stream</em>; and continues. If TeX encounters another <code>\directlua</code>, the above process is repeated.

As an exercise, imagine what happens when the following input is processed by LuaTeX.<ref>In this example, I used two different kinds of quotations to avoid escaping quotes. Escaping quotes inside <code>\directlua</code> is tricky. The above was a contrived example; if you ever need to escape quotes, you can use the <code>\startluacode ... \stopluacode</code> syntax explained later.</ref>

<texcode>
\directlua%
  {tex.print("Depth 1
     \\directlua{tex.print('Depth 2')}")}
</texcode>
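The buffering described above can be pictured with a small plain-Lua toy model. This is only a sketch under stated assumptions: <code>run_directlua</code> and <code>handed_back</code> are hypothetical names invented for the illustration, and the stand-in <code>tex.print</code> merely mimics the behaviour described in the text; it is not how LuaTeX is implemented.

```lua
-- Toy model only: these names are illustrative assumptions, not LuaTeX internals.
function run_directlua(chunk)
  local stream = {}                      -- output collected while Lua runs
  local tex = {}
  function tex.print(...)
    for _, s in ipairs({ ... }) do
      stream[#stream + 1] = tostring(s)  -- buffered, not yet seen by TeX
    end
  end
  chunk(tex)                             -- the whole chunk runs first
  return table.concat(stream, "\n")      -- only now is the stream handed back
end

handed_back = run_directlua(function(tex)
  tex.print("Depth 1")
  -- To Lua this is just a string; TeX would expand the inner
  -- \directlua only when it re-reads the stream.
  tex.print("\\directlua{tex.print('Depth 2')}")
end)

print(handed_back)
```

In this model the nested call is ordinary text while Lua is running; it would only start a second Lua run once TeX reads the returned stream.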
On top of these LuaTeX primitives, ConTeXt provides a higher level interface. There are two ways to call Lua from ConTeXt. The first is a macro <code>\ctxlua</code> (read as ConTeXt Lua), which is similar to <code>\directlua</code>. (Aside: It is possible to run the Lua instance under different name spaces. <code>\ctxlua</code> is the default name space; other name spaces are explained later.) <code>\ctxlua</code> is good for calling small snippets of Lua. The argument of <code>\ctxlua</code> is parsed under normal TeX catcodes (category codes), so the end of line character has the same catcode as a space. This can lead to surprises. For example, if you try to use a Lua comment, everything after the comment gets ignored.

<texcode>
\ctxlua
  {-- A lua comment
   tex.print("This is not printed")}
</texcode>

This can be avoided by using a TeX comment instead of a Lua comment. However, working under normal TeX catcodes poses a bigger problem: special TeX characters like &, #, $, {, }, etc., need to be escaped. For example, # has to be escaped with <code>\string</code> to be used in <code>\ctxlua</code>.

<texcode>
\ctxlua
  {local t = {1,2,3,4}
   tex.print("length " .. \string#t)}
</texcode>

As the argument of <code>\ctxlua</code> is fully expanded, escaping characters can sometimes be tricky. To circumvent this problem, ConTeXt defines an environment called <code>\startluacode ... \stopluacode</code>. This sets the catcodes to what one would expect in Lua. Basically only <code>\</code> has its usual TeX meaning, the catcode of everything else is set to other. So, for all practical purposes, we can forget about catcodes inside <code>\startluacode ... \stopluacode</code>. The above two examples can be written as

<texcode>
\startluacode
  -- A lua comment
  tex.print("This is printed.")
  local t = {1,2,3,4}
  tex.print("length " .. #t)
\stopluacode
</texcode>

This environment is meant for moderately sized code snippets. For longer Lua code, it is more convenient to write the code in a separate Lua file and then load it using Lua's <code>dofile(...)</code> function.

ConTeXt also provides a Lua function to conveniently write to the TeX stream. The function is called <code>context(...)</code> and it is equivalent to <code>tex.print(string.format(...))</code>.

Using the above, it is easy to define TeX macros that pass control to Lua, do some processing in Lua, and then pass the result back to TeX. For example, a macro to convert a decimal number to hexadecimal can be written simply, by asking Lua to do the conversion.

<texcode>
\def\TOHEX#1{\ctxlua{context("\%X",#1)}}
\TOHEX{35}
</texcode>

The percent sign had to be escaped because <code>\ctxlua</code> assumes TeX catcodes. Sometimes, escaping arguments can be difficult; instead, it can be easier to define a Lua function inside <code>\startluacode ... \stopluacode</code> and call it using <code>\ctxlua</code>. For example, a macro that takes a comma separated list of strings and prints a random item can be written as

<texcode>
\startluacode
  userdata = userdata or {}
  math.randomseed( os.time() )
  function userdata.random(...)
    -- collect the variable arguments in a table
    -- (the implicit `arg` table is not available in current Lua versions)
    local arg = {...}
    context(arg[math.random(1, #arg)])
  end
\stopluacode

\def\CHOOSERANDOM#1%
  {\ctxlua{userdata.random(#1)}}

\CHOOSERANDOM{"one", "two", "three"}
</texcode>

I could have written a wrapper so that the function takes a list of words and chooses a random word among them. For an example of such a conversion, see the <em>sorting a list of tokens</em> page on the [http://luatex.bluwiki.com/go/Sort_a_token_list luaTeX wiki].

In the above, I created a name space called <code>userdata</code> and defined the function <code>random</code> in that name space. Using a name space avoids clashes with the Lua functions defined in LuaTeX and ConTeXt.

In order to avoid name clashes, ConTeXt also defines independent name spaces of Lua instances. They are

{|
|-
| '''user'''
| a private user instance
|-
| '''third'''
| third party module instance
|-
| '''module'''
| ConTeXt module instance
|-
| '''isolated'''
| an isolated instance
|}

Thus, for example, instead of <code>\ctxlua</code> and <code>\startluacode ... \stopluacode</code>, the <code>user</code> instance can be accessed via the macros <code>\usercode</code> and <code>\startusercode ... \stopusercode</code>. In instances other than <code>isolated</code>, all the Lua functions defined by ConTeXt (but not the inbuilt Lua functions) are stored in a <code>global</code> name space. In the <code>isolated</code> instance, all Lua functions defined by ConTeXt are hidden and cannot be accessed. Using these instances, we could write the above <code>\CHOOSERANDOM</code> macro as follows

<texcode>
\startusercode
  math.randomseed( global.os.time() )
  function random(...)
    local arg = {...}
    global.context(arg[math.random(1, #arg)])
  end
\stopusercode

\def\CHOOSERANDOM#1%
  {\usercode{random(#1)}}
</texcode>

Since I defined the function <code>random</code> in the <code>user</code> instance of Lua, I did not bother to use a separate name space for the function. The Lua functions <code>os.time</code>, which is defined by a LuaTeX library, and <code>context</code>, which is defined by ConTeXt, needed to be accessed through a <code>global</code> name space. On the other hand, the <code>math.randomseed</code> function, which is part of Lua, could be accessed as is.

A separate Lua instance also makes debugging slightly easier. With <code>\ctxlua</code> the error message starts with

<texcode>
! LuaTeX error &lt;main ctx instance&gt;:
</texcode>

With <code>\usercode</code> the error message starts with

<texcode>
! LuaTeX error &lt;private user instance&gt;:
</texcode>

This makes it easier to narrow down the source of error.

Normally, it is best to define your Lua functions in the <code>user</code> name space. If you are writing a module, then define your Lua functions in the <code>third</code> instance and in a name space which is the name of your module. In this article, I will simply use the default Lua instance, but take care to define all my Lua functions in a <code>userdata</code> name space.

== Putting Lua code in an external file ==

You can put your Lua code in an external file (with the <code>.lua</code> extension) and include it with the <code>require</code> command:

<texcode>
\startluacode
  -- include the file my-lua-lib.lua
  require("my-lua-lib")
\stopluacode
</texcode>

== Namespaces ==

It is a good habit to put your custom-defined functions in their own namespace. The traditional namespace for this is <code>userdata</code>:

<texcode>
\startluacode
  -- if userdata doesn't exist yet, create it
  userdata = userdata or {}
  -- define a shorter synonym
  u = userdata

  -- create my custom function inside the userdata namespace
  function u.myfunction()
    -- do stuff
  end
\stopluacode
</texcode>

The full list of canonical namespaces, taken from [http://minimals.contextgarden.net/current/context/alpha/tex/context/base/luat-ini.lua luat-ini.lua]:

<pre>
userdata      = userdata      or { } -- for users (e.g. functions etc)
thirddata     = thirddata     or { } -- only for third party modules
moduledata    = moduledata    or { } -- only for development team
documentdata  = documentdata  or { } -- for users (e.g. raw data)
parametersets = parametersets or { } -- experimental for team
</pre>

If your module, environment, or document is going to be used by other people, you should create your own subnamespaces within these tables.

<pre>
moduledata['mymodule'] = { }
mm = moduledata.mymodule
function mm.mainfunction()
  -- do stuff
end
</pre>

= Putting stuff in your TeX document from Lua =

== Simple printing: context(), tex.print(), and tex.sprint() ==

Use <code>context(...)</code> for most things. It is equivalent to <code>tex.print(string.format(...))</code>, so

<texcode>
\startluacode
  name = "Jane"
  date = "today"
  context("Hello %s, how are you %s?", name, date)
  -- Becomes 'Hello Jane, how are you today?'
\stopluacode
</texcode>

More primitively, you have <code>tex.print()</code> and <code>tex.sprint()</code>. Either one can take as an argument either a number of strings, or an array of strings, and it will then insert the strings into the TeX stream. The only difference is that <code>tex.print()</code> treats each string as a separate input line, while <code>tex.sprint()</code> doesn't. So the following lines

<texcode>
\ctxlua{tex.print("a", "b")}
\ctxlua{tex.print({"a", "b"})}
</texcode>

are both interpreted by TeX as

<texcode>
a
b
</texcode>

but when we use <code>tex.sprint</code> instead, either of the following

<texcode>
\ctxlua{tex.sprint("a", "b")}
\ctxlua{tex.sprint({"a", "b"})}
</texcode>

will be read by TeX as

<texcode>
ab
</texcode>

without any space in between.

== Context commands ==

Most commands that you would type with a backslash in plain ConTeXt, you can access from Lua with <code>context.<em>command</em></code>. Unadorned strings end up in TeX as arguments in curly braces; Lua tables end up in TeX as parameter blocks in square brackets. The following two pieces of code are equivalent:

<texcode>
\startluacode
  context.chapter({"first"}, "Some title")
  context.startcolumns({n = 3, rule = "on"})
  context("Hello one")
  context.column()
  context("Hello two")
  context.column()
  context("Hello three")
  context.stopcolumns()
\stopluacode
</texcode>

<texcode>
\chapter[first]{Some title}
\startcolumns[n=3, rule=on]
Hello one
\column
Hello two
\column
Hello three
\stopcolumns
</texcode>

For a fuller account of the context.commands, see the [http://www.pragma-ade.com/general/manuals/cld-mkiv.pdf ConTeXt Lua document] manual. It is old, but most of it still applies.

One final note: arguments can also be specified in the form of nested functions. Because LuaTeX evaluates the deepest-nested argument first, this may cause the <code>context()</code> calls to be evaluated in the wrong order. For more on this, see the article on [[cld|ConTeXt Lua documents]], and also, again, the [http://www.pragma-ade.com/general/manuals/cld-mkiv.pdf CLD manual].

= Passing arguments and buffers: ConTeXt commands that hook into Lua =

== Making \command{arg1}{arg2} hook into Lua ==

First, define a Lua function:

<texcode>
\startluacode
  -- remember, using the userdata namespace prevents conflicts
  userdata = userdata or {}

  function userdata.surroundwithdashes(str)
    context("--" .. str .. "--")
  end
\stopluacode
</texcode>

Then define the TeX command that expands to a <code>\ctxlua</code> call:

<texcode>
\def\surroundwd#1%
  {\ctxlua{userdata.surroundwithdashes([==[#1]==])}}
</texcode>

''NB'': quoting with <code>[==[#1]==]</code> ([http://www.lua.org/manual/5.2/manual.html#3.1 long strings]) works just like <code>"#1"</code> in most cases, but in addition it is robust against <code>#1</code> containing the quotation mark <code>"</code>, which would terminate the Lua string prematurely. Inside <code>\unprotect ... \protect</code> the macros <code>\!!bs</code> and <code>\!!es</code> are at your disposal. They are equivalent to <code>[===[</code> and <code>]===]</code> and, being single tokens to TeX, are parsed faster. (See [http://repo.or.cz/w/context.git/blob/refs/heads/origin:/tex/context/base/luat-ini.mkiv#l174 <code>luat-ini.mkiv</code>].)

== Making \startenv...\stopenv hook into Lua ==

The first job is, as ever, to have the Lua function at the ready:

<texcode>
\startluacode
  userdata = userdata or {}

  function userdata.verynarrow(buffer)
    -- equivalent to \startnarrower[10em]
    context.startnarrower({"10em"})
    context(buffer)
    context.stopnarrower()
  end
\stopluacode
</texcode>

Next, we define the start command of our custom buffer:

<texcode>
\def\startverynarrow%
  {\dostartbuffer
     [verynarrow]      % buffer name
     [startverynarrow] % command where buffer starts
     [stopverynarrow]} % command where buffer ends
                       % also: command invoked when buffer stops
</texcode>

Lastly, we define the <code>\stopverynarrow</code> command such that it passes the recently completed buffer to our <code>verynarrow</code> Lua function:

<texcode>
\def\stopverynarrow
  {\ctxlua
     {userdata.verynarrow(buffers.getcontent('verynarrow'))}}
</texcode>

And that's it! The rest of this article will consist of examples.

= Examples =

Now that we have some idea of how to work with LuaTeX, let's look at some examples.

== Arithmetic without using an abacus ==

''This example demonstrates writing simple commands that invoke \ctxlua.''
Doing simple arithmetic in TeX can be extremely difficult, as illustrated by the division macro in the introduction. With Lua, simple arithmetic becomes trivial. For example, if you want a macro to find the cosine of an angle (in degrees), you can write
<texcode>
\def\COSINE#1%
  {\ctxlua{context(math.cos(#1*2*math.pi/360))}}
</texcode>
Lua's built-in <code>math.cos</code> function takes its argument in radians, so the macro converts from degrees. If you want to typeset the value of π, you can write
<texcode>
$\pi = \ctxlua{context(math.pi)}$
</texcode>
or, if you want less precision (notice that the percent sign is escaped):
<texcode>
$\pi = \ctxlua{context("\%.6f", math.pi)}$
</texcode>

== Loops without worrying about expansion ==

''This example demonstrates using Lua to write a quasi-repetitive piece of ConTeXt code.''

Loops in TeX are tricky, because macro assignments and macro expansion interact in strange ways. For example, suppose we want to typeset a table showing the sum of the roll of two dice and want the output to look like this:
<context source="yes">
\setupcolors[state=start]
</context>
The tedious (but faster!) way to achieve this is to simply type the whole table by hand.

It is however natural to want to write this table as a loop, and compute the values. A first ConTeXt implementation using the recursion level might be:
<texcode>
\bTABLE
  \bTR
    \bTD $(+)$ \eTD
    \dorecurse{6}
      {\bTD \recurselevel \eTD}
  \eTR
  \dorecurse{6}
    {\bTR
       \bTD \recurselevel \eTD
       \edef\firstrecurselevel{\recurselevel}
       \dorecurse{6}
         {\bTD \the\numexpr\firstrecurselevel+\recurselevel \eTD}%
     \eTR}
\eTABLE
</texcode>

However, this does not work as expected, yielding all zeros. A natural table stores the contents of all the cells before typesetting it, but it does not expand the contents of its cells before storing them. So, at the time the table is actually typeset, TeX has already finished the <code>\dorecurse</code> and <code>\recurselevel</code> is set to 0. The solution is to place <code>\expandafter</code> at the correct location(s) to coax TeX into expanding the <code>\recurselevel</code> macro before the natural table stores the cell contents. The difficult part is figuring out the exact location of the <code>\expandafter</code>s. Here is a solution that works:
<texcode>
\bTABLE
  \bTR
    \bTD $(+)$ \eTD
    \dorecurse{6}
      {\expandafter \bTD \recurselevel \eTD}
  \eTR
  \dorecurse{6}
    {\bTR
       \edef\firstrecurselevel{\recurselevel}
       \expandafter\bTD \recurselevel \eTD
       \dorecurse{6}
         {\expandafter\bTD
            \the\numexpr\firstrecurselevel+\recurselevel
            \relax
          \eTD}
     \eTR}
\eTABLE
</texcode>

We only needed to add three <code>\expandafter</code>s to make the naive loop work. Nevertheless, finding the right location of <code>\expandafter</code> can be frustrating, especially for a non-expert.

By contrast, writing loops in LuaTeX is easy. Once a Lua instance starts, TeX does not see anything until the Lua instance exits. So, we can write the loop in Lua, and simply print the values that we would have typed to the TeX stream. When the control is passed to TeX, TeX sees the input as if we had typed it by hand. Consequently, macro expansion is no longer an issue. This is the Lua code for the above table:
<texcode>
\setupcolors[state=start]
\setupTABLE[each][each][width=2em,height=2em,align={middle,middle}]
\setupTABLE[r][1][background=color,backgroundcolor=gray]
\setupTABLE[c][1][background=color,backgroundcolor=gray]

\startluacode
  context.bTABLE()
</texcode>
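The listing above shows only the start of the Lua block. As a hedged sketch of how the whole nested loop might look, the plain-Lua version below builds the same sequence of table commands. The <code>context</code> object defined here is a stand-in invented for this illustration so that the code runs outside ConTeXt, and <code>dice_table</code> is a hypothetical name; inside <code>\startluacode</code> you would drop the stand-in and let the real <code>context</code> write to the TeX stream.

```lua
-- Stand-in for ConTeXt's `context` (an assumption for illustration only):
-- context("text") records the text, context.name() records \name.
local out = {}
local context = setmetatable({}, {
  __call  = function(_, s) out[#out + 1] = tostring(s) end,
  __index = function(_, name)
    return function() out[#out + 1] = "\\" .. name end
  end,
})

context.bTABLE()
  -- header row: the corner cell, then the faces of the second die
  context.bTR()
    context.bTD() context("$(+)$") context.eTD()
    for j = 1, 6 do
      context.bTD() context(j) context.eTD()
    end
  context.eTR()
  -- one row per face of the first die
  for i = 1, 6 do
    context.bTR()
      context.bTD() context(i) context.eTD()
      for j = 1, 6 do
        context.bTD() context(i + j) context.eTD()
      end
    context.eTR()
  end
context.eTABLE()

-- everything that would have been sent to TeX, as one string
dice_table = table.concat(out, " ")
print(dice_table)
```

Because the sums are computed in Lua before anything reaches TeX, no <code>\expandafter</code> is needed anywhere.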
The Lua functions such as <code>context.bTABLE()</code> and <code>context.bTR()</code> are just abbreviations for running <code>context("\\bTABLE")</code>, <code>context("\\bTR")</code>, etc. See the [http://www.pragma-ade.com/general/manuals/cld-mkiv.pdf ConTeXt Lua document] manual for more details about such functions. The rest of the code is a simple nested for-loop that computes the sum of two dice. We do not need to worry about macro expansion at all!

== Parsing input without exploding your head ==

''This example demonstrates parsing simple ASCII notation with Lua's lpeg parser.''

As an example, let's consider typesetting chemical molecules in TeX. Normally, molecules should be typeset in text mode rather than math mode. If we want
:H<sub>3</sub>SO<sub>4</sub><sup>+</sup>,
we must type
:<code>H\low{3}SO\lohi{4}{\textplus}</code>,
but we'd much rather type
:<code>\molecule{H_3SO_4^+}</code>.

So, we need a function that can take a string like that, parse it, and turn it into the appropriate TeX code. LuaTeX includes a general parser based on PEG (parsing expression grammar) called [http://www.inf.puc-rio.br/~roberto/lpeg/lpeg.html lpeg], and it makes writing little parsers positively joyful. (Once you've got the knack of it, at least.) For example, the above <code>\molecule</code> macro can be written as follows.
<texcode>
\startluacode

-- we will put our molecule function in the userdata namespace.
userdata = userdata or { }

-- The formatting functions into which the captured
-- superscript/subscript blocks will be fed
local formatters = { }

function formatters.low(one)
  return string.format("\\low{%s}", one)
end

function formatters.high(one)
  return string.format("\\high{%s}", one)
end

function formatters.lowhigh(one, two)
  return string.format("\\lohi{%s}{%s}", one, two)
end

-- here the superscript is captured first, so the arguments are swapped
function formatters.highlow(one, two)
  return string.format("\\lohi{%s}{%s}", two, one)
end

-- These are the characters we may encounter
-- The `/` means we want to expand + and - to \textplus and \textminus;
-- this substitution is not instant, but will take place inside the first
-- surrounding lpeg.Cs() call.
local plus        = lpeg.P("+") / "\\textplus "
local minus       = lpeg.P("-") / "\\textminus "
local character   = lpeg.R("az", "AZ", "09") -- R is for 'range'
local subscript   = lpeg.P("_")               -- P is simply for 'pattern'
local superscript = lpeg.P("^")
local leftbrace   = lpeg.P("{")
local rightbrace  = lpeg.P("}")

-- a ^ or _ affects either a single character, or a brace-delimited
-- block. Whichever it is, call it `content`.
local single      = character + plus + minus
local multiple    = leftbrace * single^1 * rightbrace
local content     = single + multiple

-- These are our top-level elements: non-special text, of course, and
-- blocks of superscript/subscript/both.
-- lpeg.Cs(content) does two things:
-- (1) not all matches go into the `/ function` construction; only
--     *captures* go in. The C in Cs stands for Capture. This way,
--     the superscript/subscript mark gets discarded.
-- (2) it expands plus/minus before they go into the formatter. The
--     s in Cs stands for 'substitute in the replacement values, if any'
local text    = single^1
local low     = subscript   * lpeg.Cs(content) / formatters.low
local high    = superscript * lpeg.Cs(content) / formatters.high
local lowhigh = subscript   * lpeg.Cs(content) * superscript * lpeg.Cs(content) / formatters.lowhigh
local highlow = superscript * lpeg.Cs(content) * subscript   * lpeg.Cs(content) / formatters.highlow

-- Finally, the root element: 'moleculepattern'
local moleculepattern = lpeg.Cs((lowhigh + highlow + low + high + text)^0)

function userdata.molecule(str)
  -- * `:match` returns the matched string. Our pattern
  --   `moleculepattern` should match the entire input string. Any
  --   *performed* substitutions are retained. (`.Cs()` performs a
  --   previously defined substitution.)
  -- * `context()` inserts the resulting string into the stream, ready for
  --   TeX to evaluate.
  context(moleculepattern:match(str))
end

\stopluacode

\def\molecule#1{\ctxlua{userdata.molecule("#1")}}

\starttext
  \molecule{Hg^+}, \molecule{SO_4^{2-}}
\stoptext
</texcode>

Quite terse and readable by parser standards, isn't it?

== Getting text into LuaTeX verbatim ==

This section gives example code for passing text to a call to a LuaTeX function verbatim. The first subsection shows how to define a macro <code>\Dedent{...}</code> that processes its argument in LuaTeX; the second subsection shows how to define an environment <code>\startDedent ... \stopDedent</code> that does the same. First, however, here is the LuaTeX <code>dedent()</code> function.

<texcode>
\startluacode
  -- Keep our own functions out of the global namespace.
  thirddata = thirddata or {}

  thirddata.dedent = function(code)
    -- Finds out by how much the first line is indented
    -- and dedents the entire block by that much.
    -- N.B. This function does not handle tabs.

    -- First some debugging stuff: dump the input to a text file
    -- to check that we got what we wanted.
    -- Feel free to delete these three lines.
    local logfile = io.open("debug.txt", "w")
    logfile:write(code)
    logfile:close()

    -- Now for the meat of it.
    -- How many leading spaces in the first line?
    local lead = string.match(code, '^ +') or ''
    -- remove lead from every line
    code = string.gsub(code, '^' .. lead, '')
    code = string.gsub(code, '\n' .. lead, '\n')

    -- print the resulting starttyping environment
    -- (context.starttyping() doesn't seem to work)
    tex.sprint("\\starttyping" .. "\n" .. code .. "\\stoptyping" .. "\n")
  end
\stopluacode
</texcode>

=== A macro that passes its argument to LuaTeX verbatim ===

<!-- wikify \obeylines, \obeyspaces, \begingroup, \normalunexpanded -->
This code uses the two-macro pattern. The first macro begins a grouping with <code>\begingroup</code>, initialises <code>\obeylines</code> and <code>\obeyspaces</code>, and finally calls <code>\doDedent</code>. The <code>\doDedent</code> macro, after performing the call to LuaTeX, then closes the grouping, thereby turning off the no-longer-needed <code>\obeylines</code> and <code>\obeyspaces</code>. Macros' contents get expanded before they are evaluated, so if this were all one single macro its argument <code>#1</code> would get expanded the normal way before the <code>\obey...</code> could take effect.

<texcode>
%% This code is due to Wolfgang Schuster on the ntg-ConTeXt mailing list.
\unprotect
% Two-macro structure to define \Dedent{...}
\def\Dedent{\begingroup\obeylines\obeyspaces\doDedent}
\def\doDedent#1{\directlua{thirddata.dedent(\!!bs\normalunexpanded{#1}\!!es, 4)}
  \endgroup}
\protect

% Try it out
\starttext
\Dedent{One
   andra moi ennepe}
\stoptext
</texcode>

=== An environment that passes its contents to LuaTeX verbatim ===
There are two ways to define a start/stop environment so that it passes its contents to LuaTeX verbatim. The first resembles the two-macro structure of the previous subsection. The second makes use of <code>\dostartbuffer[</code><i>name</i><code>][</code><i>bufferstart</i><code>][</code><i>bufferstop</i><code>]</code>, which creates a pair of buffer commands that write their contents verbatim to a named buffer. This buffer can then be accessed in LuaTeX via <code>buffers.content(</code><i>name</i><code>)</code>. Examples of both are provided below.

<!-- [[Inside_ConTeXt#Passing_verbatim_text_as_macro_parameter|Inside Context]] article on this wiki. Luckily, however, ConTeXt provides the -->

First, a definition using the two-macro pattern seen above. The cleaner named-buffer pattern follows it.
<texcode>
% This code, too, is due to Wolfgang Schuster on the ntg-ConTeXt mailing list.
\unprotect
\def\startDedentA
  {\begingroup
   \obeylines\obeyspaces%
   \dostartDedentA}

\def\dostartDedentA#1\stopDedentA
  {\ctxlua{thirddata.dedent(\!!bs\normalunexpanded{#1}\!!es, 5)}%
   \endgroup}
\protect

% Try it out:
\starttext
  \startDedentA
    andra moi ennepe
  \stopDedentA
\stoptext
</texcode>

<texcode>
%% Define a buffer \startDedentB ... \stopDedentB that feeds its contents
%% to LuaTeX for processing.
% Tell TeX what strings the dedentBuffer starts and ends with
\def\startDedentB
  {\dostartbuffer[dedentBuffer][startDedentB][stopDedentB]}
% Make the ending command (which will also signal the buffer's end)
% call LuaTeX to read out the buffer and dedent its contents.
\def\stopDedentB
  {\directlua{
     local code = buffers.content("dedentBuffer")
     thirddata.dedent(code)}}
% Try it out:
\starttext
  \startDedentB
    andra moi ennepe
  \stopDedentB
\stoptext
</texcode>

Don't worry about this code re-using the same named buffer each time: each <code>\stopDedentB</code> immediately uses the <code>dedentBuffer</code>'s contents, and each <code>\startDedentB</code> flushes the buffer's contents, so you can use as many <code>\startDedentB</code> blocks as you want.

== Manipulating verbatim text ==

''This example demonstrates defining a custom \start...\stop buffer that gets processed through Lua in its entirety.''

Suppose we want to write an environment <code>\startdedentedtyping</code> ... <code>\stopdedentedtyping</code> that removes the indentation of the first line from every line. Thus, the output of ...
<texcode>
\startdedentedtyping
  #include &lt;stdio.h&gt;
  void main()
  {
      print("Hello world \n");
  }
\stopdedentedtyping
</texcode>
... should be the same as the output of ...
<texcode>
\starttyping
#include &lt;stdio.h&gt;
void main()
{
    print("Hello world \n");
}
\stoptyping
</texcode>
... even though the leading whitespace is different.

Defining an environment in TeX that removes the leading spaces but leaves other spaces untouched is complicated. On the other hand, once we capture the contents of the environment, removing the leading indent or ''dedenting'' the content in Lua is easy. Here is a Lua function that uses simple string substitutions.
= Parsing input without exploding your head =
 
In order to get around the weird rules of macro expansion, writing a parser in TeX involves a lot of macro jugglery and catcode trickery. It is a black art, one of the biggest mysteries of TeX for ordinary users.
 
As an example, let's consider typesetting chemical molecules in TeX. Normally, molecules should be typeset in text mode rather than math mode. For example, <context>H\low{2}SO\lohi{4}{--}</context>, can be input as <code>H\low{2}SO\lohi{4}{--}</code>. Typing so much markup can be cumbersome. Ideally, we want a macro such that we type <code>\molecule{H_2SO_4^-}</code> and the macro translates this into <code>H\low{2}SO\lohi{4}{--}</code>. Such a macro can be written in TeX as follows.
<texcode>
\newbox\chemlowbox startluacode\def\chemlow#1% -- Initialize a userdata name space to keep our own functions in. -- That way, we won't interfere with anything ConTeXt keeps in {\setbox\chemlowbox-- the global name space. \hbox{ userdata = userdata or {\switchtobodyfont[small]#1}}}
\def\chemhigh#1% function userdata.dedentedtyping(content) local lines = string.splitlines(content) local indent {\ifvoid\chemlowbox \high{{\switchtobodyfont= string.match(lines[small1], '^ +') or '' local pattern = '^' .. indent for i=1,#1}}% \else lines do \lohi{\box\chemlowbox} {{\switchtobodyfontlines[i] = string.gsub(lines[smalli]#1}},pattern,"") \fi} end
content = table.concat(lines,'\def\finishchem% {\ifvoid\chemlowbox\else \low{\box\chemlowbox} \fi} n')
\unexpanded\def\molecule% {\bgroup \catcode`\_=\active \uccode`\~=`\_ \uppercase{\let~\chemlow}% \catcode`\^=\active \uccode`\~=`\^ \uppercase{\let~\chemhigh}% \dostepwiserecurse {65}{90}{1} {\catcode \recurselevel = \active \uccode`\~=\recurselevel \uppercase{\edef~{\noexpand\finishchem \rawcharacter{\recurselevel}}}}% tex.sprint("\catcode`\-=starttyping\active n" .. content .. "\uccode`\~=`\- \uppercase{\def~{--}}% stoptyping\domolecule }% n")
-- The typing environment looks for an explicit \deftype{\stoptyping}. So, -- context.starttyping() context(content) context.stoptyping() -- does not work. But -- context.starttyping() context(content) tex.sprint("\domolecule#1{#1\finishchemstoptyping") -- does. end\egroup}stopluacode
</texcode>
This monstrosity Here is a typical TeX parser. Appropriate characters need to be made active; occasionally, <code>\lccode</code> and <code>\uccode</the code> need to be set; signaling tricks are needed (for instance, checking if defining the <code>\chemlowbox</code> is void); and then magic happens (or so it seems to a flabbergasted user). More sophisticated parsers involve creating finite state automata, which look even more monstrous. With luaTeX, things are different. luaTeX includes a general parser based on PEG (parsing expression grammar) called [http://www.inf.puc-rio.br/roberto/lpeg/lpegstartdedentedtyping.html lpeg]. This makes writing parsers in TeX much more comprehensible. For example, the above <code>\moleculestopdedentedtyping</code> macro can be written aspair:
<texcode>
% Create an environment that stores everything % between \startdedentedtyping and \stopdedentedtyping % in a buffer named 'dedentedtyping'.\def\startluacodestartdedentedtypinguserdata = userdata or {\dostartbuffer [dedentedtyping] [startdedentedtyping] [stopdedentedtyping]}
local lowercase = lpeg.R("az")local uppercase = lpeg.R("AZ")local backslash = lpeg.P("\\")% On closing the dedentedtyping environment, call the LuaTeXlocal csname = backslash * lpeg.P% function dedentedtyping(1) , and pass it the contents of * (1-backslash)^0% the buffer called 'dedentedtyping'local plus = lpeg.P("+") / "\\textplus "local minus = lpeg.P("-") / "\def\textminus "local digit = lpeg.R("09")local sign = plus + minuslocal cardinal = digit^1stopdedentedtypinglocal integer = sign^0 * cardinallocal leftbrace = lpeg.P("{")local rightbrace = lpeg.P("}")local nobrace = 1 - (leftbrace + rightbrace)\ctxlualocal nested = lpeg{userdata.P {leftbrace * dedentedtyping(csname + sign + nobrace + lpegbuffers.Vgetcontent(1'dedentedtyping'))^0 * rightbrace}}local any = lpeg.P(1)</texcode>
local subscript = lpegThat's all.P("_")local superscript = lpegFinally, we will go into a little more detail on how TeX and Lua communicate with each other.P("^")local somescript = subscript + superscript
local content = lpeg.Cs(csname + nested + sign + any)In detail: the interaction between TeX and Lua =
local lowhigh = lpegTo a first approximation, the interaction between TeX and Lua is straightforward.CcWhen TeX ("i.e., the LuaTeX engine) starts, it loads the input file in memory and processes it token by token. When TeX encounters <code>\directlua</code>, it stops reading the file in memory, <em>fully expands the argument of <code>\directlua</code></em>, and passes the control to a Lua instance. The Lua instance, which runs with a few preloaded libraries, processes the expanded arguments of <code>\lohi{%s}{%s}") * subscript * content * superscript * content directlua</ stringcode>.formatlocal highlow = lpegThis Lua instance has a special output stream which can be accessed using <code>tex.Ccprint("\\hilo{%s}{%s}"...) * superscript * content * subscript * content </ stringcode>.formatlocal low = lpegThe function <code>tex.Ccprint("\\low{%s}"...) * subscript * content </ stringcode> is just like the Lua function <code>print(..formatlocal high = lpeg.Cc("\\high{%s}") * superscript * content </ stringcode> except that <code>tex.formatlocal justtext = print(1 - somescript...)^1local parser = lpeg</code> prints to a <em>TeX stream</em> rather than to the standard output. When the Lua instance finishes processing its input, it passes the contents of the <em>TeX stream</em> back to TeX.<ref>The output of <code>tex.Csprint((csname + lowhigh + highlow + low + high + sign + any)^0...)</code> is buffered and not passed to TeX until the Lua instance has stopped.</ref> TeX then inserts the contents of the <em>TeX stream</em> at the current location of the file that it was reading; expands the contents of the <em>TeX stream</em>; and continues. If TeX encounters another <code>\directlua</code>, the above process is repeated.
userdataAs an exercise, imagine what happens when the following input is processed by LuaTeX.moleculeparser = parser The answer is in the footnotes. <ref>In this example, two different kinds of quotations are used to avoid escaping quotes. Escaping quotes inside <code>\directlua</code> is tricky. The above was a contrived example; if you ever need to escape quotes, you can use the <code>\startluacode ... \stopluacode</code> syntax.</ref>
function userdata<texcode>\directlua% {tex.moleculeprint(str)"Depth 1 return parser:match \\directlua{tex.print(str'Depth 2')}")}end\stopluacode</texcode>
\def\molecule#1% {\ctxlua{userdataFor more on this, see the [http://wiki.luatex.molecule("#1")}}<org/texcode> index.php/Writing_Lua_in_TeX] article on the [http://wiki.luatex.org/index.php/Main_Page LuaTeX wiki].
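To see how the building blocks of such a parser combine, it may help to try a much smaller grammar outside of ConTeXt. The following is only a sketch under stated assumptions, not the parser above: it handles a single character after <code>_</code> or <code>^</code> and ignores braces and control sequences. It assumes that <code>lpeg</code> is either available as a global (as in LuaTeX, for example when run with <code>texlua</code>) or can be loaded with <code>require("lpeg")</code> in a plain Lua installation.

```lua
-- Assumption: LuaTeX provides lpeg as a global; plain Lua needs the lpeg module installed.
local lpeg = lpeg or require("lpeg")

-- A pattern divided by a string replaces the matched text with that string.
local plus  = lpeg.P("+") / "\\textplus "
local minus = lpeg.P("-") / "\\textminus "
local sign  = plus + minus
local any   = lpeg.P(1)

-- Reduced "content": one sign (rewritten) or one arbitrary character.
local content = lpeg.Cs(sign + any)

-- lpeg.Cc supplies the format string as a constant capture; string.format
-- then receives that format string followed by the captured content.
local low  = lpeg.Cc("\\low{%s}")  * lpeg.P("_") * content / string.format
local high = lpeg.Cc("\\high{%s}") * lpeg.P("^") * content / string.format

-- lpeg.Cs rebuilds the input, substituting every capture made along the way.
local parser = lpeg.Cs((low + high + sign + any)^0)

print(parser:match("H_2O"))  --> H\low{2}O
print(parser:match("Na^+"))  --> Na\high{\textplus }
```

The ordered choice <code>low + high + sign + any</code> tries the most specific alternative first and falls back to copying a single character, which is the same strategy the full parser uses.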
The lpeg version is more verbose than the TeX solution, but is easier to read and write. With a proper parser, I do not have to use tricks to check if either one or both of <code>_</code> and <code>^</code> are present. More importantly, anyone (once they know the lpeg syntax) can read the parser and easily understand what it does. This is in contrast to the implementation based on TeX macro jugglery, which requires you to implement a TeX interpreter in your head to understand it.

= Conclusion =

LuaTeX is removing many TeX barriers: using system fonts, reading and writing Unicode files, typesetting non-Latin languages, among others. However, the biggest feature of LuaTeX is the ability to use a high-level programming language to program TeX. This can potentially lower the learning curve for programming TeX.

In this article, I have mentioned only one aspect of programming TeX: macros that manipulate their input and output some text to the main TeX stream. Many other kinds of manipulations are possible: LuaTeX provides access to TeX boxes, token lists, dimensions, glues, catcodes, direction parameters, math parameters, etc. The details can be found in the [http://www.luatex.org/documentation.html LuaTeX manual].

= Notes =
<references />

{{note | This article is originally based on [https://www.tug.org/members/TUGboat/tb30-2/tb95mahajan-luatex.pdf this TugBoat article]. Feel free to modify it.}}

[[Category:Lua]]
[[Category:LuaTeX]]