{{todo|This page documents the situation that will become active in a few days/a week when the newly developed wiki extension will be ported over from the test wiki. This page is written in preparation. Soon, this todo block will be deleted.
 
We have a heatwave right now, and even though there are some things in the extension that I still want to improve on, it is too hot in the actual office to do any programming. Instead, I am on the couch in front of a fan.
 
--[[User:Taco|Taco]] ([[User talk:Taco|talk]]) 19:48, 10 August 2020 (CEST)}}
== An extension for editing <code>/Command</code> subpages ==
The ConTeXtXML extension is a new wiki feature specifically designed for editing the ConTeXt command reference pages (the ones that live under the <code>/Command/</code> URL).
It does this by intercepting the creation of new wiki pages below <code>/Command/</code>, and using a ContentHandler extension to maintain those pages. The content model of those pages is <code>contextxml</code>, a special XML format for documenting ConTeXt commands that is based on the interface XML files by Wolfgang Schuster.
The extension registers handlers for several MediaWiki hooks:

=== <code>ContentHandlerDefaultModelFor</code> ===

Sets the content model to <code>contextxml</code> if the wiki page title starts with <code>/Command</code>.
 
=== <code>PageContentSave</code> ===

On save, this hook stores <code>contextxml</code> pages both in a designated location on the hard disk and in the wiki database.
=== <code>ArticleAfterFetchContentObject</code> ===

This fills the edit area for newly created <code>/Command</code> pages from the file on the hard disk.
 
=== <code>EditPageNoSuchSection</code> ===

Error hook that is triggered when the user tries to edit a section that is generated from wiki code instead of from the XML data. This is an error because it is quite hard to extract the right block of text in that case while still keeping track of where it sits in relation to the XML data.
 
=== <code>EditPage::showEditForm:fields</code> ===

Prints a simple help message at the top of the edit field for <code>/Command</code> pages.
 
== Generating the wikitext code for page views and previews ==
== Implementation notes ==
 
=== Command disk files ===
 
The extension has three types of data files on the filesystem:
 
* XML files for command definitions
* Verification tables for command definitions
* Wiki text files for instance pages
 
Generally, the file names follow the logic of the wiki page title, except with the prefix <code>cmd-</code> instead of <code>/Command</code>.
 
The file extension for the XML files is <code>.xml</code>, the extension for the verification-table Lua dump is <code>-test.lua</code>, and the extension for instance pages (redirects) is <code>.wiki</code>.
 
However, in order to appease case-insensitive (but case-preserving) file systems, all uppercase letters in the filename are prefixed with a <code>^</code> character. A simple example: <code>Command/WEEKDAY</code> is stored on disk as <code>cmd-^W^E^E^K^D^A^Y.xml</code>, and its verification table is stored in <code>cmd-^W^E^E^K^D^A^Y-test.lua</code>.
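
In Lua, the whole naming scheme boils down to two <code>gsub</code> calls; here is a minimal sketch (the function name <code>disk_name</code> is illustrative, not the extension's actual code):

<pre>
-- Map a wiki page title to its on-disk file name: replace the
-- /Command prefix with cmd-, escape every uppercase letter with ^,
-- and append the suffix for the requested file type.
local function disk_name(title, suffix)
  local name = title:gsub("^/?Command/", "cmd-")
  name = name:gsub("%u", "^%0")   -- %0 is the whole matched letter
  return name .. suffix
end

print(disk_name("Command/WEEKDAY", ".xml"))      --> cmd-^W^E^E^K^D^A^Y.xml
print(disk_name("Command/WEEKDAY", "-test.lua")) --> cmd-^W^E^E^K^D^A^Y-test.lua
</pre>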
 
 
=== XML parser ===
 
The extension uses a handwritten simple XML parser in pure Lua. The parser is expat-style and the implementation is based on <code>string.find()</code> and <code>string.sub()</code>. The advantage of this approach is that it can handle bad XML input by throwing an appropriate (and understandable) error. Neither the LPeg-based Lua parser from the 13th ConTeXt meeting nor the ConTeXt built-in parser allows for that: both assume well-formed XML as input.
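
As a rough illustration of the approach (a simplified sketch, not the extension's actual code; the <code>handlers</code> callback table is an assumption), the core loop of such an expat-style parser over <code>string.find()</code> and <code>string.sub()</code> could look like this:

<pre>
-- Scan the input with string.find()/string.sub() and report events to
-- caller-supplied handlers (text, starttag, endtag) instead of building
-- a document tree. Malformed input produces an error with a byte offset.
local function parse(xml, handlers)
  local pos = 1
  while pos <= #xml do
    local lt = string.find(xml, "<", pos, true)
    if not lt then
      handlers.text(string.sub(xml, pos))   -- trailing character data
      break
    end
    if lt > pos then
      handlers.text(string.sub(xml, pos, lt - 1))
    end
    local gt = string.find(xml, ">", lt, true)
    if not gt then
      error(string.format("unclosed tag starting at byte %d", lt))
    end
    local tag = string.sub(xml, lt + 1, gt - 1)
    if string.sub(tag, 1, 1) == "/" then
      handlers.endtag(string.sub(tag, 2))
    else
      local empty = string.sub(tag, -1) == "/"
      if empty then tag = string.sub(tag, 1, -2) end
      -- attributes are passed on as a raw string in this sketch
      local name, attributes = string.match(tag, "^(%S+)%s*(.-)$")
      handlers.starttag(name, attributes)
      if empty then handlers.endtag(name) end
    end
    pos = gt + 1
  end
end
</pre>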
 
A tailored parser also allowed for easy extension to deal with the CDATA issue mentioned below.
 
But the main motivation for a private dedicated parser written in Lua is that we want to check not only the well-formedness of the XML, but also its adherence to a set of extra rules:
# The documentation should not modify the argument structure of the command’s formal specification, only add explanations to it. Theoretically, each of the 3900+ formal specifications has its own private XML Schema.
# The documentation should be easily parseable by an external system, meaning that the use of wiki code and HTML tags needs to be governed.
These additional rules made using the DOM-based parser in PHP unwieldy, for me. I am sure a good PHP programmer could implement these extra checks, but not me; at least not in a reasonable amount of time. But I knew how to tackle both requirements using Lua, and could write an implementation quite quickly and effortlessly.
 
The first point is handled like this:
 
* When a fresh set of ‘virgin’ XML files is created from <code>context-en.xml</code>, each separate file is parsed using a set of functions that create a Lua table representing the ‘virginal’ parse tree of the XML file. This Lua table is dumped to disk and distributed along with the XML file.
 
* When a wiki user presses the ‘Save’ button in the page editor, their edited XML is parsed using a slightly different set of functions from the ones used for viewing. The functions in this set skip all documentation content while building the parse tree. The two Lua tables representing the parse trees are then compared, as sketched below. They should be identical. If not, an error is raised and the save action is aborted with a user-visible error message.
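
The comparison itself can be a plain recursive deep-equality test over the two tables. A minimal sketch, assuming both parse trees are ordinary nested Lua tables (the names <code>trees_equal</code>, <code>virgin_tree</code>, and <code>edited_tree</code> are illustrative):

<pre>
-- Compare the 'virgin' parse tree with the parse tree of the edited
-- revision; any difference means the argument structure was modified.
local function trees_equal(a, b)
  if type(a) ~= type(b) then return false end
  if type(a) ~= "table" then return a == b end
  for k, v in pairs(a) do
    if not trees_equal(v, b[k]) then return false end
  end
  for k in pairs(b) do
    if a[k] == nil then return false end  -- key present only in b
  end
  return true
end

-- Hypothetical use on save:
-- if not trees_equal(virgin_tree, edited_tree) then
--   error("the formal specification of the command was modified")
-- end
</pre>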
 
The second point is taken care of during that same XML parse step of the user page revision. It uses a combination of a tag lookup table and plain string matching to make sure the user followed the rules (as explained in [[Help:Command]]).
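
For instance, the tag lookup could be as simple as the following sketch (the table contents and function name are illustrative; the authoritative rules are the ones in [[Help:Command]]):

<pre>
-- Lookup table of tags that are allowed inside documentation content;
-- the actual set is defined by the rules in Help:Command.
local allowed = { texcode = true, xmlcode = true, context = true,
                  code = true, em = true, tt = true }

local function check_starttag(name, position)
  if not allowed[name] then
    error(string.format("tag <%s> is not allowed at byte %d", name, position))
  end
end
</pre>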
=== About those extension tags ===
The special tags <code><nowiki><texcode></nowiki></code>, <code><nowiki><xmlcode></nowiki></code>, and <code><nowiki><context></nowiki></code> on our wiki are handled by an extension (<code>context</code>) written a long time ago by Patrick Gundlach. That extension converts the parsed XML output from MediaWiki into HTML code that looks 'right'. In normal wiki pages this works, because the MediaWiki parser is quite forgiving (more like an HTML browser than an XML parser) and makes some recovery attempts itself when a user types in something that is not quite well-formed HTML/XML.
For example, in a normal wiki page you do not need to properly quote the attributes of <code><nowiki><context></nowiki></code>. And the structure within <code><nowiki><xmlcode></nowiki></code> does not have to be properly nested.
But it also sometimes backfires. If you use an XML tag name inside a <code><nowiki><context source="yes"></nowiki></code> call or within <code><nowiki><texcode></nowiki></code>, it will not be displayed in the verbatim display section of the page (but it will be seen by ConTeXt while processing the <code><nowiki><context></nowiki></code>).
To solve this question between 'is it data?' and 'is it markup?' in a standalone XML file, you would wrap a CDATA section around things like the content of <code><nowiki><xmlcode></nowiki></code>. But unfortunately that is something that either the MediaWiki parser, the <code>context</code> extension, or the HTML browser does not understand (I don't know which is the exact problem).
For now, within the ConTeXtXML XML parser, I decided to treat the content of <code><nowiki><texcode></nowiki></code>, <code><nowiki><xmlcode></nowiki></code>, and <code><nowiki><context></nowiki></code> 'as if' they were SGML elements with content model CDATA. That means that the generated XML files on disk that make use of this feature are not actually well-formed. For example, this content of <code><nowiki><xmlcode></nowiki></code>:
<pre>
<xmlcode>
<document>
</document>
</xmlcode>
</pre>
should actually be this:
<pre>
<xmlcode><![CDATA[
<document>
</document>
]]></xmlcode>
</pre>
but then it could not be displayed on the wiki properly, or (with some internal patching by ConTeXtXML) there would be a constant difference between the XML version on disk and the wiki database version of a page (resulting in endless 'This revision is outdated' messages). So, I think this is the best solution for now.
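
In the parser, this SGML-style treatment amounts to scanning straight to the literal end tag whenever one of these three elements is opened, instead of looking for nested markup. A minimal sketch (the function name <code>read_cdata</code> is illustrative):

<pre>
-- Elements whose content is treated 'as if' declared CDATA: the
-- parser does not look for nested markup inside them.
local cdata_elements = { texcode = true, xmlcode = true, context = true }

-- Return the raw content of such an element plus the position just
-- past its end tag; a missing end tag is a reportable user error.
local function read_cdata(xml, name, pos)
  local s, e = string.find(xml, "</" .. name .. ">", pos, true)
  if not s then
    error(string.format("missing </%s>", name))
  end
  return string.sub(xml, pos, s - 1), e + 1
end
</pre>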
