Changes

4,662 bytes added , 14:55, 8 June 2020


A '''Unicode block''' is an interval of code points which represent characters that are semantically related to each other. For example, there is a Unicode block for characters from the Devanagari script, which is used by several Indian languages. Another Unicode block corresponds to characters that denote mathematical operators, such as those that indicate the union and the intersection of sets.

ConTeXt has special names for all Unicode blocks. These names can be used to specify ranges of code points in the setups of several commands.

This article uses some basic terms, such as ''character'', ''code point'', and ''assigned code point'', from the Unicode Standard<ref name="Unicode">The Unicode Consortium, ''The Unicode Standard'', Version 10.0.0, The Unicode Consortium, Mountain View, CA, USA, 2016, http://www.unicode.org/versions/Unicode10.0.0/, Retrieved 2017-11-03.</ref>. For brief descriptions of these terms, see the Unicode glossary<ref name="Unicode-glossary">The Unicode Consortium, ''Glossary'', http://www.unicode.org/glossary/, Retrieved 2017-11-03.</ref>.

== Unicode blocks ==

A Unicode block, or simply a block, is any of the subsets of the Unicode code space that are listed in the file {{code|Blocks.txt}}<ref name="Blocks">The Unicode Consortium, ''Blocks.txt'', ftp://www.unicode.org/Public/UNIDATA/Blocks.txt, Retrieved 2017-11-03.</ref> of the Unicode Character Database. The Unicode code space is the set of all code points, that is, the set of all integers from 0 to the integer whose hexadecimal representation is 10FFFF. The main properties of blocks are described in the Unicode Standard<ref name="Unicode"/> (Section 3.4, paragraph D10b). Every block is an interval of code points, and distinct blocks are disjoint from each other. Thus, the blocks form a partition of a subset of the Unicode code space. A block starts at a code point that is a multiple of 16, and the number of code points in each block is also a multiple of 16. Consequently, the hexadecimal representation of the first code point in a block always ends in the digit 0, and that of the last code point in it always ends in the digit F.
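The properties listed above can be checked mechanically. The following Python sketch parses lines in the format of {{code|Blocks.txt}} (first and last code point in hexadecimal, separated by two dots, then a semicolon and the block name) and verifies that every block lies in the code space, starts at a multiple of 16, has a size that is a multiple of 16, and does not overlap the previous block. The sample lines are copied from {{code|Blocks.txt}} (version 10.0.0); the function names {{code|parse_blocks}} and {{code|check_blocks}} are hypothetical, and reading the full file is left out.

```python
# Parse lines in the Blocks.txt format and check the structural
# properties of Unicode blocks described above.

CODE_SPACE_MAX = 0x10FFFF  # last code point of the Unicode code space

# A few sample lines in the Blocks.txt format ("first..last; name").
SAMPLE = """\
# Comment lines start with '#'.
0000..007F; Basic Latin
0080..00FF; Latin-1 Supplement
0C00..0C7F; Telugu
20000..2A6DF; CJK Unified Ideographs Extension B
"""


def parse_blocks(text):
    """Return a list of (first, last, name) tuples, sorted by first."""
    blocks = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        span, name = line.split(";", 1)
        first, last = span.split("..")
        blocks.append((int(first, 16), int(last, 16), name.strip()))
    return sorted(blocks)


def check_blocks(blocks):
    """Raise ValueError if a block violates one of the properties above."""
    previous_last = -1
    for first, last, name in blocks:
        if not 0 <= first <= last <= CODE_SPACE_MAX:
            raise ValueError(f"{name}: outside the code space")
        if first % 16 != 0:
            raise ValueError(f"{name}: does not start at a multiple of 16")
        if (last - first + 1) % 16 != 0:
            raise ValueError(f"{name}: size is not a multiple of 16")
        if first <= previous_last:
            raise ValueError(f"{name}: overlaps the previous block")
        previous_last = last


blocks = parse_blocks(SAMPLE)
check_blocks(blocks)  # passes silently for well-formed data
print(len(blocks), "blocks checked")
```

Since the blocks are sorted by their first code point, checking each block only against its predecessor is enough to establish that all of them are pairwise disjoint.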

The Unicode Standard gives every block a unique name that describes the common semantic nature of its code points. These names are case insensitive, and the hyphens, spaces, and underscores in them are insignificant. For example, one can refer to the block whose Unicode name is {{code|Myanmar Extended-A}} as {{code|myanmarextendeda}}, {{code|MyanmarExtendedA}}, or {{code|myanmar_extended_a}}. ConTeXt chooses the first of these alternative styles for the names of blocks, as described below.
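The matching rule stated above can be sketched in a few lines of Python: two names denote the same block if they agree after ignoring case, hyphens, spaces, and underscores. This is only an illustration of the rule as described in this article; the helper names {{code|loose_key}} and {{code|same_block_name}} are hypothetical.

```python
import re


def loose_key(name):
    """Normalize a block name: drop hyphens, whitespace, underscores; ignore case."""
    return re.sub(r"[-\s_]", "", name).lower()


def same_block_name(a, b):
    """True if a and b refer to the same block under the rule above."""
    return loose_key(a) == loose_key(b)


for alias in ["myanmarextendeda", "MyanmarExtendedA", "myanmar_extended_a"]:
    print(alias, same_block_name("Myanmar Extended-A", alias))  # True each time
```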

The number of code points in a block varies. Some, such as the block named {{code|Syriac Supplement}}, have just 16 code points, and some others, such as the block named {{code|CJK Unified Ideographs Extension B}} with 42720 elements, have thousands of code points.
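The sizes quoted above follow from the ranges listed in {{code|Blocks.txt}} (version 10.0.0): {{code|Syriac Supplement}} is 0860..086F and {{code|CJK Unified Ideographs Extension B}} is 20000..2A6DF. Since both ends of a range belong to the block, its size is the last code point minus the first plus one, as in this small Python sketch (the helper name {{code|block_size}} is hypothetical):

```python
# Ranges copied from Blocks.txt, version 10.0.0.
BLOCKS = {
    "Syriac Supplement": (0x0860, 0x086F),
    "CJK Unified Ideographs Extension B": (0x20000, 0x2A6DF),
}


def block_size(first, last):
    """Number of code points in the closed interval [first, last]."""
    return last - first + 1


for name, (first, last) in BLOCKS.items():
    print(f"{name}: {block_size(first, last)} code points")
# Syriac Supplement: 16 code points
# CJK Unified Ideographs Extension B: 42720 code points
```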

Every assigned code point belongs to some block, but there are blocks which contain unassigned code points too; for example, the block named {{code|Telugu}} contains the unassigned code point 0C50. On the other hand, there are some code points, necessarily unassigned, that do not belong to any block; in version 10.0.0 of the Unicode Standard, the code point 0870 is one such. Thus, the set of all assigned code points is a proper subset of the union of all the blocks, and the union of all the blocks is a proper subset of the Unicode code space.
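To make the distinction concrete, the following Python sketch looks up the block that contains a given code point by binary search over a sorted list of blocks, and returns {{code|None}} for a code point outside every block. The three ranges are copied from {{code|Blocks.txt}} version 10.0.0, where the gap 0870..089F between {{code|Syriac Supplement}} and {{code|Arabic Extended-A}} belongs to no block; later versions of the standard (14.0 onwards) assign that gap to the block {{code|Arabic Extended-B}}, so the result for 0870 depends on the data used. The function name {{code|block_of}} is hypothetical.

```python
import bisect

# Sample of blocks from Blocks.txt, version 10.0.0, sorted by first code point.
BLOCKS = [
    (0x0860, 0x086F, "Syriac Supplement"),
    (0x08A0, 0x08FF, "Arabic Extended-A"),
    (0x0C00, 0x0C7F, "Telugu"),
]
FIRSTS = [first for first, _, _ in BLOCKS]


def block_of(cp):
    """Return the name of the block containing code point cp, or None."""
    i = bisect.bisect_right(FIRSTS, cp) - 1  # last block starting at or before cp
    if i >= 0:
        first, last, name = BLOCKS[i]
        if first <= cp <= last:
            return name
    return None


print(block_of(0x0C50))  # Telugu (an unassigned code point inside a block)
print(block_of(0x0870))  # None (in no block, with the 10.0.0 data above)
```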

== ConTeXt names of Unicode blocks ==

ConTeXt has its own names for all the Unicode blocks. Most of them are obtained by converting the Unicode name of the block to lower case and removing the hyphens and spaces in the name. The article [[List of Unicode blocks]] contains a table of Unicode blocks, their ConTeXt names, and links to more information about them.
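The usual naming rule can be sketched in Python as follows. This only reproduces the rule for ''most'' blocks, as stated above; the names actually used by ConTeXt are defined in {{src|char-ini.lua}}, and the table in [[List of Unicode blocks]] should be consulted for the exceptions. The function name {{code|context_name}} is hypothetical. The first and last examples are the names used in the documents on this page.

```python
def context_name(unicode_name):
    """Usual ConTeXt block name: lower case, hyphens and spaces removed."""
    return unicode_name.lower().replace("-", "").replace(" ", "")


for name in ["Cyrillic", "Myanmar Extended-A", "Mathematical Alphanumeric Symbols"]:
    print(f"{name!r} -> {context_name(name)!r}")
# 'Cyrillic' -> 'cyrillic'
# 'Myanmar Extended-A' -> 'myanmarextendeda'
# 'Mathematical Alphanumeric Symbols' -> 'mathematicalalphanumericsymbols'
```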

== An example usage of Unicode blocks in ConTeXt ==

A typical use of Unicode blocks is in the definition of '''fallback''' fonts to provide glyphs for certain characters. Sometimes, when writing a document in ConTeXt, one needs to typeset special symbols that are not available in the base font of the document. In such a situation, one can specify a fallback font to provide these missing symbols.

For example, in the following document, the base font [[TeX Gyre - Old Content|TeX Gyre Pagella]] does not have the glyphs for Cyrillic characters, whose code points are in the Unicode block {{code|Cyrillic}}. The document uses the {{cmd|definefallbackfamily}} command to get the glyphs for this block from the {{code|DejaVu Serif}} font. The ConTeXt name of the block is supplied as the value of the key {{code|range}} in the last setup of the command.

<context source=yes text="Here is an image showing the relevant part of the PDF file obtained by running context on a file containing this document:">
\definefallbackfamily [mainface] [rm] [DejaVu Serif] [range=cyrillic]

\definefontfamily [mainface] [rm] [TeX Gyre Pagella]

\setupbodyfont [mainface]

\starttext

\startlines

\stoplines

\rightaligned{--- Lewis Carroll, Jabberwocky}

\startlines

\stoplines

\rightaligned {--- Дина Григорьевна Орловская, Бармаглот}

\stoptext

</context>

The verses in the above example are from the Wikipedia article on the poem ''Jabberwocky''<ref>Wikipedia contributors, ''Jabberwocky'', Wikipedia: The Free Encyclopaedia, 2017-11-03, 07:58 UTC, https://en.wikipedia.org/w/index.php?title=Jabberwocky&oldid=808507152, Retrieved 2017-11-03.</ref> by Lewis Carroll.
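The key {{code|1=range=cyrillic}} in the example above names only the block {{code|Cyrillic}}, whose range in {{code|Blocks.txt}} is 0400..04FF; neighbouring blocks such as {{code|Cyrillic Supplement}} (0500..052F) have names of their own. As a rough check of which characters such a range covers, the following Python sketch tests whether every letter of a string lies in that block, using strings from the attribution lines of the example. The helper name {{code|letters_in_block}} is hypothetical.

```python
CYRILLIC = (0x0400, 0x04FF)  # block "Cyrillic" in Blocks.txt


def letters_in_block(text, block):
    """True if every alphabetic character of text lies in the block."""
    first, last = block
    return all(first <= ord(ch) <= last for ch in text if ch.isalpha())


print(letters_in_block("Бармаглот", CYRILLIC))         # True
print(letters_in_block("Дина Григорьевна", CYRILLIC))  # True
print(letters_in_block("Jabberwocky", CYRILLIC))       # False
```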

== Another example ==

A different application of fallback fonts arises when one wants to replace the existing glyphs for some characters in the base font with glyphs for those characters from another font. This situation differs from the one in the previous example. There, the base font did not contain glyphs for the characters of interest, and the fallback font provided the missing glyphs. Here, the base font does contain glyphs for the characters in question, but the author of the document, perhaps for aesthetic reasons, wants to replace those glyphs with glyphs from another font. In such a case, the latter font can be specified as a fallback font.

For example, the following document uses the [[TeX Gyre - Old Content|{{code|pagella}}]] typescript to provide the base font, and uses the {{code|STIX General Regular}} font for mathematical script letters, which lie in the Unicode block {{code|Mathematical Alphanumeric Symbols}}. Instead of {{cmd|definefallbackfamily}}, which was used in the previous example, this document uses the command {{cmd|definefontfallback}}. The ConTeXt name of the block is supplied as the third setup of this command. The last setup, {{code|1=force=yes}}, ensures that the glyphs of the relevant characters are taken from the fallback font, overriding any glyphs that the base font may have for these characters.

<texcode>

\usetypescript [pagella]

\definefontfallback [mathscript] [STIXGeneralRegular] [mathematicalalphanumericsymbols] [force=yes]

\definefontsynonym [MathRoman] [pagella] [fallbacks=mathscript]

\setupbodyfont [pagella]

\starttext

Look at this bestiary of mathematical script letters:

\startformula

𝒜, 𝒞, 𝒟, 𝒢, 𝒥, 𝒦, 𝒪, 𝒫, 𝒬, 𝒮, 𝒯, 𝒰, 𝒱, 𝒲, 𝒳, 𝒴, 𝒵

\stopformula

\stoptext

</texcode>

Here is an image showing the relevant part of the PDF file obtained by running {{code|context}} on a file containing this document:

[[File:unicode-blocks-in-context-example.png]]

The log file resulting from that run of {{code|context}} says

<pre>

system > 13: filename=/usr/share/fonts/opentype/stix/STIXGeneral-Regular.otf ...

</pre>

so, as expected, {{code|context}} takes some of the glyphs from the fallback font, which in this case is provided by the local operating system.
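The following Python sketch, which uses only the standard {{code|unicodedata}} module, prints the code points and Unicode names of the first few letters of the bestiary and confirms that all of them lie in the block {{code|Mathematical Alphanumeric Symbols}} (1D400..1D7FF in {{code|Blocks.txt}}). It also illustrates why some script letters, such as script B, are missing from the list: that letter was already encoded as U+212C in the block {{code|Letterlike Symbols}}, and the corresponding code point inside {{code|Mathematical Alphanumeric Symbols}} is left unassigned.

```python
import unicodedata

MATH_ALPHANUMERIC = (0x1D400, 0x1D7FF)  # block "Mathematical Alphanumeric Symbols"

letters = "𝒜𝒞𝒟𝒢𝒥𝒦𝒪𝒫𝒬𝒮𝒯𝒰𝒱𝒲𝒳𝒴𝒵"  # the letters of the example document

for ch in letters[:3]:
    print(f"U+{ord(ch):04X}", unicodedata.name(ch))
# U+1D49C MATHEMATICAL SCRIPT CAPITAL A
# U+1D49E MATHEMATICAL SCRIPT CAPITAL C
# U+1D49F MATHEMATICAL SCRIPT CAPITAL D

# Every letter of the example lies in the block.
first, last = MATH_ALPHANUMERIC
print(all(first <= ord(ch) <= last for ch in letters))  # True

# Script capital B lies outside the block, in Letterlike Symbols.
print(f"U+{ord('ℬ'):04X}", unicodedata.name("ℬ"))  # U+212C SCRIPT CAPITAL B
```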

== See also ==

* [[List of Unicode blocks]]

* {{cmd|definefontfallback}} — manual page with more information on the ConTeXt names of blocks, and their usage.

* {{src|char-ini.lua}} — source file containing the definitions of the ConTeXt names of Unicode blocks.

== References ==

[[Category:Fonts]]

[[Category:Languages]]

Garulfo


Unicode blocks in ConTeXt

Revision as of 14:55, 8 June 2020