Unicode blocks in ConTeXt

Revision as of 16:09, 4 November 2017

ConTeXt has special names for all Unicode blocks. These names can be used to specify ranges of code points in the setups of several commands.

This article uses some basic terms, such as ''character'', ''code point'', and ''assigned code point'', from the Unicode Standard<ref name="Unicode">The Unicode Consortium, ''The Unicode Standard'', Version 10.0.0, The Unicode Consortium, Mountain View, CA, USA, 2017, http://www.unicode.org/versions/Unicode10.0.0/, Retrieved 2017-11-03.</ref>. For brief descriptions of these terms, see the Unicode glossary<ref name="Unicode-glossary">The Unicode Consortium, ''Glossary'', http://www.unicode.org/glossary/, Retrieved 2017-11-03.</ref>.

== Unicode blocks ==

A Unicode block, or simply a block, is any of the subsets of the Unicode code space that are listed in the file {{code|Blocks.txt}}<ref name="Blocks">The Unicode Consortium, ''Blocks.txt'', ftp://www.unicode.org/Public/UNIDATA/Blocks.txt, Retrieved 2017-11-03.</ref> of the Unicode Character Database. The Unicode code space is the set of all code points, that is, the set of all integers from 0 to the integer whose hexadecimal representation is 10FFFF.

The main properties of blocks are described in the Unicode Standard<ref name="Unicode"/> (Section 3.4, paragraph D10b). Every block is an interval of code points, and distinct blocks are disjoint from each other. In particular, the blocks form a partition of a subset of the Unicode code space.

A block starts at a code point that is a multiple of 16. The number of code points in each block is also a multiple of 16. Thus, the hexadecimal representation of the first code point in a block is of the form ''pqrs''0, and that of the last code point in it is of the form ''tuvw''F, where ''p'', ''q'', ''r'', ''s'', ''t'', ''u'', ''v'', and ''w'' are hexadecimal digits.
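The alignment rule above can be checked mechanically. The following is a minimal sketch in Python, not part of ConTeXt; the function name {{code|is_block_aligned}} is hypothetical, and the two ranges used are those of the blocks {{code|Telugu}} and {{code|Syriac Supplement}} as listed in {{code|Blocks.txt}}.

```python
def is_block_aligned(first: int, last: int) -> bool:
    """Check the shape described above for the range first..last.

    The first code point must be a multiple of 16, and the number of
    code points in the range must also be a multiple of 16.
    """
    if first > last:
        return False
    starts_on_multiple_of_16 = first % 16 == 0
    size_is_multiple_of_16 = (last - first + 1) % 16 == 0
    return starts_on_multiple_of_16 and size_is_multiple_of_16


# Telugu (0C00..0C7F) and Syriac Supplement (0860..086F).
for first, last in [(0x0C00, 0x0C7F), (0x0860, 0x086F)]:
    # The last hexadecimal digits are 0 and F when the range is aligned.
    print(f"{first:04X}..{last:04X}", is_block_aligned(first, last))
```

When both conditions hold, the last hexadecimal digit of the first code point is 0 and that of the last code point is F, which is the ''pqrs''0 and ''tuvw''F pattern.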

The Unicode Standard gives every block a unique name that describes the common semantic nature of its code points. These names are case insensitive, and the hyphens, spaces, and underscores in them are insignificant. For example, one can refer to the block whose Unicode name is {{code|Myanmar Extended-A}} as {{code|myanmarextendeda}}, {{code|MyanmarExtendedA}}, or {{code|myanmar_extended_a}}. ConTeXt chooses the first of these alternative styles for the names of blocks, as described below.
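The matching rule for block names can be illustrated with a short Python sketch. It implements only the rule as stated in this article (ignore case, hyphens, spaces, and underscores), which may be simpler than the loose matching rules of the Unicode Standard itself; the function names {{code|loose_block_name}} and {{code|same_block_name}} are hypothetical.

```python
def loose_block_name(name: str) -> str:
    """Lowercase a block name and drop hyphens, spaces, and underscores."""
    return "".join(ch for ch in name.lower() if ch not in "- _")


def same_block_name(a: str, b: str) -> bool:
    """Return True if two spellings refer to the same block name."""
    return loose_block_name(a) == loose_block_name(b)


spellings = ["Myanmar Extended-A", "MyanmarExtendedA", "myanmar_extended_a"]
for spelling in spellings:
    print(spelling, "->", loose_block_name(spelling))
```

All three spellings reduce to the same string, {{code|myanmarextendeda}}.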

The number of code points in a block varies. Some, such as the block named {{code|Syriac Supplement}}, have just 16 code points, and some others, such as the block named {{code|CJK Unified Ideographs Extension B}} with 42720 elements, have thousands of code points.
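The sizes quoted above can be recomputed from the block ranges. The following Python sketch parses a three-line excerpt written in the {{code|First..Last; Name}} line format of {{code|Blocks.txt}}; the excerpt, the regular expression, and the function names are illustrative assumptions, not an official parser.

```python
import re

# Excerpt in the line format of Blocks.txt: "First..Last; Block Name".
SAMPLE = """\
# Illustrative excerpt, not the complete file
0860..086F; Syriac Supplement
0C00..0C7F; Telugu
20000..2A6DF; CJK Unified Ideographs Extension B
"""

LINE = re.compile(r"^([0-9A-F]+)\.\.([0-9A-F]+);\s*(.+)$")


def parse_blocks(text):
    """Return a list of (first, last, name) tuples, skipping comments."""
    blocks = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = LINE.match(line)
        if match:
            first = int(match.group(1), 16)
            last = int(match.group(2), 16)
            blocks.append((first, last, match.group(3)))
    return blocks


def block_size(first, last):
    """Number of code points in the interval first..last, inclusive."""
    return last - first + 1


for first, last, name in parse_blocks(SAMPLE):
    print(f"{name}: {block_size(first, last)} code points")
```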

Every assigned code point belongs to some block, but there are blocks which contain unassigned code points too; for example, the block named {{code|Telugu}} contains the unassigned code point 0C50. On the other hand, there are some code points, necessarily unassigned, which do not belong to any block; the code point 0870 is one such. Thus, the set of all assigned code points is a proper subset of the union of all the blocks, and the union of all the blocks is a proper subset of the Unicode code space.
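The two examples above can be reproduced with a simple lookup. The Python sketch below uses a three-entry table that is only an excerpt of the ranges in {{code|Blocks.txt}} for Unicode 10.0, the version cited in this article; the table and the function name {{code|block_of}} are illustrative assumptions, and later versions of the standard may define blocks that cover code points left uncovered here.

```python
from typing import Optional

# Excerpt of block ranges (Unicode 10.0); not the complete list.
BLOCKS = [
    (0x0860, 0x086F, "Syriac Supplement"),
    (0x08A0, 0x08FF, "Arabic Extended-A"),
    (0x0C00, 0x0C7F, "Telugu"),
]


def block_of(code_point: int) -> Optional[str]:
    """Return the name of the block containing code_point, or None."""
    for first, last, name in BLOCKS:
        if first <= code_point <= last:
            return name
    return None


# 0C50 lies inside Telugu; 0870 falls in the gap between
# Syriac Supplement and Arabic Extended-A in this excerpt.
for cp in (0x0C50, 0x0870):
    print(f"{cp:04X}", block_of(cp))
```

Whether a code point is assigned is a separate question that this lookup does not answer; it only reports block membership.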

== ConTeXt names of Unicode blocks ==

ConTeXt has its own names for all the Unicode blocks. Most of them are obtained by converting the Unicode name of the block to lower case and removing the hyphens and spaces in the name. The article entitled [[List of Unicode blocks]] contains a table of Unicode blocks, their ConTeXt names, and links to more information about them.

== An example usage of Unicode blocks in ConTeXt ==

\stoptext

</context>

The verses in the above example are from the Wikipedia article on the poem ''Jabberwocky''<ref>Wikipedia contributors, ''Jabberwocky'', Wikipedia: The Free Encyclopaedia, 2017-11-03, 07:58 UTC, https://en.wikipedia.org/w/index.php?title=Jabberwocky&oldid=808507152, Retrieved 2017-11-03.</ref> by Lewis Carroll.

== Another example ==

\starttext

Look at this bestiary of mathematical script letters:

\startformula

so {{code|context}} is indeed, and as expected, taking some of the glyphs from the fallback font, which, in this case, is provided by the local operating system.

== See also ==

* [[List of Unicode blocks]]

* {{cmd|definefontfallback}} — manual page with more information on the ConTeXt names of blocks, and their usage.

* {{src|char-ini.lua}} — source file containing the definitions of the ConTeXt names of Unicode blocks.

== References ==

Raghu

