ConTeXt has special names for all Unicode blocks. These names can be used to specify ranges of code points in the setups of several commands.
 
This article uses some basic terms, such as ''character'', ''code point'', and ''assigned code point'', from the Unicode Standard<ref name="Unicode">The Unicode Consortium, ''The Unicode Standard'', Version 10.0.0, The Unicode Consortium, Mountain View, CA, USA, 2016, http://www.unicode.org/versions/Unicode10.0.0/, Retrieved 2017-11-03.</ref>. For brief descriptions of these terms, see the Unicode glossary<ref name="Unicode-glossary">The Unicode Consortium, ''Glossary'', http://www.unicode.org/glossary/, Retrieved 2017-11-03.</ref>.
== Unicode blocks ==
A Unicode block, or, simply, a block, is any one of the subsets of the Unicode code space that are listed in the file {{code|Blocks.txt}}<ref name="Blocks">The Unicode Consortium, ''Blocks.txt'', ftp://www.unicode.org/Public/UNIDATA/Blocks.txt, Retrieved 2017-11-03.</ref> of the Unicode Character Database. The Unicode code space is the set of all code points, that is, the set of all integers from 0 to the integer whose hexadecimal representation is 10FFFF.
The main properties of blocks are described in the Standard<ref name="Unicode"/> (Section 3.4, paragraph D10b). Every block is an interval of code points, and distinct blocks are disjoint from each other. In particular, the blocks form a partition of a subset of the Unicode code space.
A block starts at a code point that is a multiple of 16. The number of code points in each block is also a multiple of 16. Thus, the hexadecimal representation of the first code point in a block ends in the digit 0, and that of the last code point in it ends in the digit F.
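The alignment property can be checked mechanically. The following Python sketch is an illustration only, not part of ConTeXt: the helper names {{code|parse_block_line}} and {{code|is_aligned}} are hypothetical, and the sample entries are typed in here in the {{code|Start..End; Name}} format of {{code|Blocks.txt}} rather than read from the file.

```python
# Sample entries in the "Start..End; Name" format of Blocks.txt.
# Only a handful of blocks are listed, for illustration.
SAMPLE_LINES = [
    "0000..007F; Basic Latin",
    "0860..086F; Syriac Supplement",
    "0C00..0C7F; Telugu",
    "AA60..AA7F; Myanmar Extended-A",
    "20000..2A6DF; CJK Unified Ideographs Extension B",
]


def parse_block_line(line):
    """Hypothetical helper: split one entry into (first, last, name)."""
    code_range, name = line.split(";", 1)
    first, last = code_range.strip().split("..")
    return int(first, 16), int(last, 16), name.strip()


def is_aligned(first, last):
    """True if the block starts at a multiple of 16 and its size is a multiple of 16."""
    size = last - first + 1
    return first % 16 == 0 and size % 16 == 0


BLOCKS = [parse_block_line(line) for line in SAMPLE_LINES]

for first, last, name in BLOCKS:
    # The first code point ends in the hex digit 0, the last one in F.
    print(f"{first:04X}..{last:04X}  aligned={is_aligned(first, last)}  {name}")
```

Every sample entry is reported as aligned. Running the same check over the complete {{code|Blocks.txt}} would require reading the file and skipping its comment lines, which this sketch does not do.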
The Unicode Standard gives every block a unique name that describes the common semantic nature of its code points. These names are case insensitive, and the hyphens, spaces, and underscores in them are insignificant. For example, one can refer to the block whose Unicode name is {{code|Myanmar Extended-A}} as {{code|myanmarextendeda}}, {{code|MyanmarExtendedA}}, or {{code|myanmar_extended_a}}. ConTeXt chooses the first of these alternative styles for the names of blocks, as described below.
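The matching rule for block names can be sketched in a few lines of Python. This is an illustration under the assumption that lower-casing and dropping hyphens, spaces, and underscores is all that is needed; the function name {{code|normalize_block_name}} is hypothetical, and the code is not taken from ConTeXt.

```python
def normalize_block_name(name):
    """Lower-case a block name and drop hyphens, spaces, and underscores."""
    return "".join(ch for ch in name.lower() if ch not in "- _")


variants = [
    "Myanmar Extended-A",
    "myanmarextendeda",
    "MyanmarExtendedA",
    "myanmar_extended_a",
]

for variant in variants:
    # Every spelling reduces to the same key, "myanmarextendeda".
    print(variant, "->", normalize_block_name(variant))
```

Two spellings refer to the same block exactly when they normalize to the same string.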
The number of code points in a block varies. Some, such as the block named {{code|Syriac Supplement}}, have just 16 code points, and some others, such as the block named {{code|CJK Unified Ideographs Extension B}} with 42720 elements, have thousands of code points. Every assigned code point belongs to some block, but there are blocks which contain unassigned code points too; for example, the block named {{code|Telugu}} contains the unassigned code point 0C50. On the other hand, there are some code points, necessarily unassigned, which do not belong to any block; in version 10.0 of the Standard, the code point 0870 is one such. Thus, the set of all assigned code points is a proper subset of the union of all the blocks, and the union of all the blocks is a proper subset of the Unicode code space.
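The block sizes quoted above follow directly from the block boundaries. The Python sketch below is illustrative only: it hard-codes the ranges of the three blocks named in the text, and the helper names {{code|block_size}} and {{code|find_block}} are hypothetical.

```python
# The three blocks named in the text, as (first, last, name).
SAMPLE_BLOCKS = [
    (0x0860, 0x086F, "Syriac Supplement"),
    (0x0C00, 0x0C7F, "Telugu"),
    (0x20000, 0x2A6DF, "CJK Unified Ideographs Extension B"),
]


def block_size(first, last):
    """Number of code points in the closed interval [first, last]."""
    return last - first + 1


def find_block(code_point, blocks=SAMPLE_BLOCKS):
    """Return the name of the sample block containing code_point, or None."""
    for first, last, name in blocks:
        if first <= code_point <= last:
            return name
    return None


for first, last, name in SAMPLE_BLOCKS:
    print(name, block_size(first, last))

print(find_block(0x0C50))  # inside the Telugu block, although unassigned
print(find_block(0x0870))  # None: not in any of the three sample blocks
```

Since only three blocks are listed, a result of {{code|None}} shows only that the code point lies outside these samples. Deciding whether a code point lies outside every block requires the full list from {{code|Blocks.txt}}.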
== ConTeXt names of Unicode blocks ==
ConTeXt has its own names for all the Unicode blocks. These names are defined in the source file {{src|char-ini.lua}}. Most of them are obtained by converting the Unicode name of the block to lower case and removing the hyphens and spaces in the name. The article entitled [[List of Unicode blocks]] contains a table of Unicode blocks, their ConTeXt names, and links to more information about them.
== An example usage of Unicode blocks in ConTeXt ==
\stoptext
</context>
 
The verses in the above example are from the Wikipedia article on the poem ''Jabberwocky''<ref>Wikipedia contributors, ''Jabberwocky'', Wikipedia: The Free Encyclopaedia, 2017-11-03, 07:58 UTC, https://en.wikipedia.org/w/index.php?title=Jabberwocky&oldid=808507152, Retrieved 2017-11-03.</ref> by Lewis Carroll.
== Another example ==
so {{code|context}} is indeed, and as expected, taking some of the glyphs from the fallback font, which, in this case, is provided by the local operating system.
 
== See also ==
 
* [[List of Unicode blocks]]
 
* {{cmd|definefontfallback}} — manual page with more information on the ConTeXt names of blocks, and their usage.
 
* {{src|char-ini.lua}} — source file containing the definitions of the ConTeXt names of Unicode blocks.
 
== References ==
 
<references/>