Difference between revisions of "Unicode blocks in ConTeXt"

(78 intermediate revisions by 3 users not shown)
A ''Unicode block'' is an interval of code points which represent characters that are semantically related to each other.  For example, there is a Unicode block for characters from the Devanagari script which is used by several Indian languages.  Another Unicode block corresponds to characters which denote mathematical operators, such as those that indicate the union and the intersection of sets.
+
A '''Unicode block''' is an interval of code points which represent characters that are semantically related to each other.  For example, there is a Unicode block for characters from the Devanagari script which is used by several Indian languages.  Another Unicode block corresponds to characters which denote mathematical operators, such as those that indicate the union and the intersection of sets.
  
 
ConTeXt has special names for all Unicode blocks.  These names can be used to specify ranges of code points in the setups of several commands.
 
 +
 +
This article uses some basic terms, such as ''character'', ''code point'', and ''assigned code point'', from the Unicode Standard<ref name="Unicode">The Unicode Consortium, ''The Unicode Standard'', Version 10.0.0, The Unicode Consortium, Mountain View, CA, USA, 2016, http://www.unicode.org/versions/Unicode10.0.0/, Retrieved 2017-11-03.</ref>.  For brief descriptions of these terms, see the Unicode glossary<ref name="Unicode-glossary">The Unicode Consortium, ''Glossary'', http://www.unicode.org/glossary/, Retrieved 2017-11-03.</ref>.
  
 
== Unicode blocks ==
 
  
A '''Unicode block''' is an organisational unit of the Unicode code space.  The Unicode code space is the set of all integers from 0 to 0x10FFFF.
+
A Unicode block, or, simply, a block, is any of the subsets of the Unicode code space that are listed in the file {{code|Blocks.txt}}<ref name="Blocks">The Unicode Consortium, ''Blocks.txt'', ftp://www.unicode.org/Public/UNIDATA/Blocks.txt, Retrieved 2017-11-03.</ref> of the Unicode Character Database.  The Unicode code space is the set of all code points, that is, the set of all integers from 0 to the integer whose hexadecimal representation is 10FFFF.
  
Every block is an interval of code points. Different blocks are disjoint from each other. In particular, the blocks form a partition of the set of all Unicode code points.  Every block has a unique name that describes the common semantic nature of its code points.
+
The main properties of blocks are described in the Unicode Standard<ref name="Unicode"/> (Section 3.4, paragraph D10b).  Every block is an interval of code points, and distinct blocks are disjoint from each other. In particular, the blocks form a partition of a subset of the Unicode code space.
  
A code block starts at a code point that is a multiple of 16.  The number of code points in each block is also a multiple of 16.  Thus, the first code point in a block is of the form 0x''pqrs''0, and the last code point in it is of the form 0x''tuvw''F.
+
A block starts at a code point that is a multiple of 16.  The number of code points in each block is also a multiple of 16.  Thus, the hexadecimal representation of the first code point in a block is of the form ''pqrs''0, and that of the last code point in it is of the form ''tuvw''F, where ''p'', ''q'', ''r'', ''s'', ''t'', ''u'', ''v'', and ''w'' are hexadecimal digits.
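The alignment rule above can be checked mechanically.  The following Python sketch is only an illustration: the dictionary {{code|SAMPLE_BLOCKS}} and the function {{code|is_well_aligned}} are hypothetical names, and the three ranges are copied from {{code|Blocks.txt}}.

```python
# A few blocks as (first, last) code points, copied from Blocks.txt.
# SAMPLE_BLOCKS and is_well_aligned are illustrative names, not part of ConTeXt.
SAMPLE_BLOCKS = {
    "Latin Extended-B": (0x0180, 0x024F),
    "Cyrillic": (0x0400, 0x04FF),
    "Syriac Supplement": (0x0860, 0x086F),
}


def is_well_aligned(first: int, last: int) -> bool:
    """Return True if the block starts at a multiple of 16
    and contains a multiple of 16 code points."""
    size = last - first + 1
    return first % 16 == 0 and size % 16 == 0


for name, (first, last) in SAMPLE_BLOCKS.items():
    # The hexadecimal forms end in 0 and F respectively when the rule holds.
    print(f"{name}: {first:04X}..{last:04X}", is_well_aligned(first, last))
```

A range that starts one code point too late, such as 0401 to 0410, fails the check although it also contains 16 code points.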
  
The number of code points in a block varies.  Some have just 16 code points, and some others have thousands of code points.
+
The Unicode Standard gives every block a unique name that describes the common semantic nature of its code points.  These names are case-insensitive, and the hyphens, spaces, and underscores in them are insignificant.  For example, one can refer to the block whose Unicode name is {{code|Myanmar Extended-A}} as {{code|myanmarextendeda}}, {{code|MyanmarExtendedA}}, or {{code|myanmar_extended_a}}.  ConTeXt chooses the first of these alternative styles for the names of blocks, as described below.
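The matching rule described above can be sketched in a few lines of Python.  This is a simplified illustration that follows only the description in this article; the function names {{code|loose_key}} and {{code|same_block_name}} are hypothetical, and the full matching rules of the Unicode Standard may cover further cases.

```python
def loose_key(name: str) -> str:
    """Fold case and drop hyphens, spaces, and underscores."""
    return "".join(ch for ch in name.lower() if ch not in "- _")


def same_block_name(a: str, b: str) -> bool:
    """Return True if two spellings refer to the same block name."""
    return loose_key(a) == loose_key(b)


# The three spellings mentioned in the text all match the Unicode name.
for spelling in ("myanmarextendeda", "MyanmarExtendedA", "myanmar_extended_a"):
    print(spelling, same_block_name("Myanmar Extended-A", spelling))
```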
 +
 
 +
The number of code points in a block varies.  Some blocks, such as the one named {{code|Syriac Supplement}}, have just 16 code points, whereas others have thousands; for example, the block named {{code|CJK Unified Ideographs Extension B}} has 42720 code points.
 +
 
 +
Every assigned code point belongs to some block, but there are blocks which contain unassigned code points too; for example, the block named {{code|Telugu}} contains the unassigned code point 0C50.  On the other hand, there are some code points, necessarily unassigned, which do not belong to any block; the code point 0870 is one such.  Thus, the set of all assigned code points is a proper subset of the union of all the blocks, and the union of all the blocks is a proper subset of the Unicode code space.
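The relation between code points and blocks can be made concrete with a small lookup.  The Python sketch below is an illustration under stated assumptions: the list {{code|BLOCKS}} is only an excerpt of four blocks with the ranges given in the Unicode version cited in this article, and the function name {{code|block_of}} is hypothetical.  Later versions of the Unicode Standard may allocate further blocks.

```python
from bisect import bisect_right

# Excerpt of the block list, sorted by first code point: (first, last, name).
# Only four blocks are included here; a full table would come from Blocks.txt.
BLOCKS = [
    (0x0840, 0x085F, "Mandaic"),
    (0x0860, 0x086F, "Syriac Supplement"),
    (0x08A0, 0x08FF, "Arabic Extended-A"),
    (0x0C00, 0x0C7F, "Telugu"),
]
STARTS = [first for first, _, _ in BLOCKS]


def block_of(code_point: int):
    """Return the name of the block in BLOCKS containing code_point, or None."""
    index = bisect_right(STARTS, code_point) - 1
    if index < 0:
        return None
    _first, last, name = BLOCKS[index]
    return name if code_point <= last else None


print(block_of(0x0C50))  # inside the Telugu block, although unassigned
print(block_of(0x0870))  # falls in the gap after Syriac Supplement
```

Because blocks are disjoint intervals, a binary search on the starting code points is enough; the final comparison against the last code point detects the gaps between blocks.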
  
 
== ConTeXt names of Unicode blocks ==
 
  
ConTeXt has its own names for all the Unicode blocks.  These names are defined in the source file {{src|char-ini.lua}}.  Most of them are obtained by converting the Unicode name of the block to the lower case, and removing the spaces in the name.
+
ConTeXt has its own names for all the Unicode blocks.  Most of them are obtained by converting the Unicode name of the block to lower case and removing the hyphens and spaces in the name.  The article entitled [[List of Unicode blocks]] contains a table of Unicode blocks, their ConTeXt names, and links to more information about them.
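The naming rule can be sketched as follows.  This Python fragment is an illustration, not the code that ConTeXt itself uses: the names {{code|context_name}} and {{code|CONTEXT_NAME_EXCEPTIONS}} are hypothetical, and the single exception shown is the entry for {{code|Latin-1 Supplement}} in the list of blocks, whose ConTeXt name also drops the digit.

```python
# Names that do not follow the general rule, as recorded in the list of blocks.
# This table is illustrative and is not claimed to be complete.
CONTEXT_NAME_EXCEPTIONS = {
    "Latin-1 Supplement": "latinsupplement",
}


def context_name(unicode_name: str) -> str:
    """Derive the ConTeXt name of a block from its Unicode name."""
    if unicode_name in CONTEXT_NAME_EXCEPTIONS:
        return CONTEXT_NAME_EXCEPTIONS[unicode_name]
    # General rule: lower case, with hyphens and spaces removed.
    return unicode_name.lower().replace("-", "").replace(" ", "")


for name in ("Myanmar Extended-A", "Phags-pa", "Latin-1 Supplement"):
    print(name, "->", context_name(name))
```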

== An example usage of Unicode blocks in ConTeXt ==

A typical use of Unicode blocks is in the definition of '''fallback''' fonts to provide glyphs for certain characters. Sometimes, when writing a document in ConTeXt, one needs to typeset special symbols that are not available in the base font of the document.  In such a situation, one can specify a fallback font to provide these missing symbols.

For example, in the following document, the base font [[TeX Gyre - Old Content|TeX Gyre Pagella]] does not have glyphs for Cyrillic characters, whose code points are in the Unicode block {{code|Cyrillic}}. The document uses the {{cmd|definefallbackfamily}} command to get the glyphs for this block from the {{code|DejaVu Serif}} font.  The ConTeXt name of the block is supplied as the value of the key {{code|range}} in the last setup of the command.

<context source=yes text="Here is an image showing the relevant part of the PDF file obtained by running context on a file containing this document:">
% Use DejaVu Serif for the code points in the Cyrillic block.
% The fallback is declared before the font family that it supplements.
\definefallbackfamily [mainface] [rm] [DejaVu Serif] [range=cyrillic]

\definefontfamily     [mainface] [rm] [TeX Gyre Pagella]

\setupbodyfont        [mainface]

\starttext

\startlines
’Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves,
And the mome raths outgrabe.
\stoplines

\rightaligned {— Lewis Carroll, Jabberwocky}

\startlines
Варкалось. Хливкие шорьки
Пырялись по наве,
И хрюкотали зелюки,
Как мюмзики в мове.
\stoplines

\rightaligned {— Дина Григорьевна Орловская, Бармаглот}

\stoptext
</context>

The verses in the above example are from the Wikipedia article on the poem ''Jabberwocky''<ref>Wikipedia contributors, ''Jabberwocky'', Wikipedia: The Free Encyclopaedia, 2017-11-03, 07:58 UTC, https://en.wikipedia.org/w/index.php?title=Jabberwocky&oldid=808507152, Retrieved 2017-11-03.</ref> by Lewis Carroll.

== Another example ==

A different application of fallback fonts arises when one wants to replace the existing glyphs for some characters in the base font with glyphs for those characters from another font.  This situation is different from the one in the previous example.  There, the base font did not contain glyphs for the characters of interest, and the fallback font provided the missing glyphs.  Here, the base font does contain glyphs for the characters in question, but, perhaps for aesthetic reasons, the author of the document wants to replace those glyphs with glyphs from another font.  In such a case, the latter font can be specified as a fallback font.

For example, the following document uses the [[TeX Gyre - Old Content|{{code|pagella}}]] typescript to provide the base font, and uses the {{code|STIX General Regular}} font for mathematical script letters, which lie in the Unicode block {{code|Mathematical Alphanumeric Symbols}}.  Instead of {{cmd|definefallbackfamily}}, which was used in the previous example, this document uses the command {{cmd|definefontfallback}}.  The ConTeXt name of the block is supplied as the third setup of this command.  The last setup {{code|1=force=yes}} ensures that the glyphs of the relevant characters are taken from the fallback font, overriding the glyphs that may exist in the base font for these characters.

<texcode>
\usetypescript      [pagella]

% Take the code points of the block from STIX General Regular;
% force=yes overrides glyphs that the base font may already have.
\definefontfallback [mathscript] [STIXGeneralRegular] [mathematicalalphanumericsymbols] [force=yes]

% Define MathRoman with mathscript as its fallback.
\definefontsynonym  [MathRoman]  [pagella]            [fallbacks=mathscript]

\setupbodyfont      [pagella]

\starttext

Look at this bestiary of mathematical script letters:

\startformula
𝒜, 𝒞, 𝒟, 𝒢, 𝒥, 𝒦, 𝒪, 𝒫, 𝒬, 𝒮, 𝒯, 𝒰, 𝒱, 𝒲, 𝒳, 𝒴, 𝒵
\stopformula

\stoptext
</texcode>

Here is an image showing the relevant part of the PDF file obtained by running {{code|context}} on a file containing this document:

[[File:unicode-blocks-in-context-example.png]]

The log file resulting from that run of {{code|context}} says

<pre>
system > 13: filename=/usr/share/fonts/opentype/stix/STIXGeneral-Regular.otf ...
</pre>

so {{code|context}} is, as expected, taking some of the glyphs from the fallback font, which, in this case, is provided by the local operating system.

== See also ==

* [[List of Unicode blocks]]
  
== The list of blocks ==
+
* {{cmd|definefontfallback}} — manual page with more information on the ConTeXt names of blocks, and their usage.
  
The following table lists all the Unicode blocks.  Each row of the table describes a block.  The first cell in the row is the interval of code points in that block. The second cell is the Unicode name of the block.  The third cell is the ConTeXt name of the block.  The last cell is a link to the current code chart of the block at the Unicode Web site.  This chart contains glyphs for the graphic characters whose code points are in the block, and additional information, such as alternative names of some of the characters.
+
* {{src|char-ini.lua}} — source file containing the definitions of the ConTeXt names of Unicode blocks.
  
The order of the blocks in this list is different from that in the file {{src|char-ini.lua}}.  The blocks are ordered here numerically by their starting code points, whereas they are ordered in that file alphabetically by their ConTeXt names.  The official list of the blocks is available [ftp://www.unicode.org/Public/UNIDATA/Blocks.txt at the Unicode Web site].
+
== References ==
  
{| class="wikitable"
+
<references/>
|-
 
!Block
 
!Unicode name
 
!ConTeXt name
 
!Chart
 
|-
 
|0000–007F
 
|Basic Latin
 
|basiclatin
 
|[http://www.unicode.org/charts/PDF/U0000.pdf U0000.pdf]
 
|-
 
|0080–00FF
 
|Latin-1 Supplement
 
|latinsupplement
 
|[http://www.unicode.org/charts/PDF/U0080.pdf U0080.pdf]
 
|-
 
|0100–017F
 
|Latin Extended-A
 
|latinextendeda
 
|[http://www.unicode.org/charts/PDF/U0100.pdf U0100.pdf]
 
|-
 
|0180–024F
 
|Latin Extended-B
 
|latinextendedb
 
|[http://www.unicode.org/charts/PDF/U0180.pdf U0180.pdf]
 
|-
 
|0250–02AF
 
|IPA Extensions
 
|ipaextensions
 
|[http://www.unicode.org/charts/PDF/U0250.pdf U0250.pdf]
 
|-
 
|02B0–02FF
 
|Spacing Modifier Letters
 
|spacingmodifierletters
 
|[http://www.unicode.org/charts/PDF/U02B0.pdf U02B0.pdf]
 
|-
 
|0300–036F
 
|Combining Diacritical Marks
 
|combiningdiacriticalmarks
 
|[http://www.unicode.org/charts/PDF/U0300.pdf U0300.pdf]
 
|-
 
|0370–03FF
 
|Greek and Coptic
 
|greekandcoptic
 
|[http://www.unicode.org/charts/PDF/U0370.pdf U0370.pdf]
 
|-
 
|0400–04FF
 
|Cyrillic
 
|cyrillic
 
|[http://www.unicode.org/charts/PDF/U0400.pdf U0400.pdf]
 
|-
 
|0500–052F
 
|Cyrillic Supplement
 
|cyrillicsupplement
 
|[http://www.unicode.org/charts/PDF/U0500.pdf U0500.pdf]
 
|-
 
|0530–058F
 
|Armenian
 
|armenian
 
|[http://www.unicode.org/charts/PDF/U0530.pdf U0530.pdf]
 
|-
 
|0590–05FF
 
|Hebrew
 
|hebrew
 
|[http://www.unicode.org/charts/PDF/U0590.pdf U0590.pdf]
 
|-
 
|0600–06FF
 
|Arabic
 
|arabic
 
|[http://www.unicode.org/charts/PDF/U0600.pdf U0600.pdf]
 
|-
 
|0700–074F
 
|Syriac
 
|syriac
 
|[http://www.unicode.org/charts/PDF/U0700.pdf U0700.pdf]
 
|-
 
|0750–077F
 
|Arabic Supplement
 
|arabicsupplement
 
|[http://www.unicode.org/charts/PDF/U0750.pdf U0750.pdf]
 
|-
 
|0780–07BF
 
|Thaana
 
|thaana
 
|[http://www.unicode.org/charts/PDF/U0780.pdf U0780.pdf]
 
|-
 
|07C0–07FF
 
|NKo
 
|nko
 
|[http://www.unicode.org/charts/PDF/U07C0.pdf U07C0.pdf]
 
|-
 
|0800–083F
 
|Samaritan
 
|samaritan
 
|[http://www.unicode.org/charts/PDF/U0800.pdf U0800.pdf]
 
|-
 
|0840–085F
 
|Mandaic
 
|mandaic
 
|[http://www.unicode.org/charts/PDF/U0840.pdf U0840.pdf]
 
|-
 
|0860–086F
 
|Syriac Supplement
 
|syriacsupplement
 
|[http://www.unicode.org/charts/PDF/U0860.pdf U0860.pdf]
 
|-
 
|08A0–08FF
 
|Arabic Extended-A
 
|arabicextendeda
 
|[http://www.unicode.org/charts/PDF/U08A0.pdf U08A0.pdf]
 
|-
 
|0900–097F
 
|Devanagari
 
|devanagari
 
|[http://www.unicode.org/charts/PDF/U0900.pdf U0900.pdf]
 
|-
 
|0980–09FF
 
|Bengali
 
|bengali
 
|[http://www.unicode.org/charts/PDF/U0980.pdf U0980.pdf]
 
|-
 
|0A00–0A7F
 
|Gurmukhi
 
|gurmukhi
 
|[http://www.unicode.org/charts/PDF/U0A00.pdf U0A00.pdf]
 
|-
 
|0A80–0AFF
 
|Gujarati
 
|gujarati
 
|[http://www.unicode.org/charts/PDF/U0A80.pdf U0A80.pdf]
 
|-
 
|0B00–0B7F
 
|Oriya
 
|oriya
 
|[http://www.unicode.org/charts/PDF/U0B00.pdf U0B00.pdf]
 
|-
 
|0B80–0BFF
 
|Tamil
 
|tamil
 
|[http://www.unicode.org/charts/PDF/U0B80.pdf U0B80.pdf]
 
|-
 
|0C00–0C7F
 
|Telugu
 
|telugu
 
|[http://www.unicode.org/charts/PDF/U0C00.pdf U0C00.pdf]
 
|-
 
|0C80–0CFF
 
|Kannada
 
|kannada
 
|[http://www.unicode.org/charts/PDF/U0C80.pdf U0C80.pdf]
 
|-
 
|0D00–0D7F
 
|Malayalam
 
|malayalam
 
|[http://www.unicode.org/charts/PDF/U0D00.pdf U0D00.pdf]
 
|-
 
|0D80–0DFF
 
|Sinhala
 
|sinhala
 
|[http://www.unicode.org/charts/PDF/U0D80.pdf U0D80.pdf]
 
|-
 
|0E00–0E7F
 
|Thai
 
|thai
 
|[http://www.unicode.org/charts/PDF/U0E00.pdf U0E00.pdf]
 
|-
 
|0E80–0EFF
 
|Lao
 
|lao
 
|[http://www.unicode.org/charts/PDF/U0E80.pdf U0E80.pdf]
 
|-
 
|0F00–0FFF
 
|Tibetan
 
|tibetan
 
|[http://www.unicode.org/charts/PDF/U0F00.pdf U0F00.pdf]
 
|-
 
|1000–109F
 
|Myanmar
 
|myanmar
 
|[http://www.unicode.org/charts/PDF/U1000.pdf U1000.pdf]
 
|-
 
|10A0–10FF
 
|Georgian
 
|georgian
 
|[http://www.unicode.org/charts/PDF/U10A0.pdf U10A0.pdf]
 
|-
 
|1100–11FF
 
|Hangul Jamo
 
|hanguljamo
 
|[http://www.unicode.org/charts/PDF/U1100.pdf U1100.pdf]
 
|-
 
|1200–137F
 
|Ethiopic
 
|ethiopic
 
|[http://www.unicode.org/charts/PDF/U1200.pdf U1200.pdf]
 
|-
 
|1380–139F
 
|Ethiopic Supplement
 
|ethiopicsupplement
 
|[http://www.unicode.org/charts/PDF/U1380.pdf U1380.pdf]
 
|-
 
|13A0–13FF
 
|Cherokee
 
|cherokee
 
|[http://www.unicode.org/charts/PDF/U13A0.pdf U13A0.pdf]
 
|-
 
|1400–167F
 
|Unified Canadian Aboriginal Syllabics
 
|unifiedcanadianaboriginalsyllabics
 
|[http://www.unicode.org/charts/PDF/U1400.pdf U1400.pdf]
 
|-
 
|1680–169F
 
|Ogham
 
|ogham
 
|[http://www.unicode.org/charts/PDF/U1680.pdf U1680.pdf]
 
|-
 
|16A0–16FF
 
|Runic
 
|runic
 
|[http://www.unicode.org/charts/PDF/U16A0.pdf U16A0.pdf]
 
|-
 
|1700–171F
 
|Tagalog
 
|tagalog
 
|[http://www.unicode.org/charts/PDF/U1700.pdf U1700.pdf]
 
|-
 
|1720–173F
 
|Hanunoo
 
|hanunoo
 
|[http://www.unicode.org/charts/PDF/U1720.pdf U1720.pdf]
 
|-
 
|1740–175F
 
|Buhid
 
|buhid
 
|[http://www.unicode.org/charts/PDF/U1740.pdf U1740.pdf]
 
|-
 
|1760–177F
 
|Tagbanwa
 
|tagbanwa
 
|[http://www.unicode.org/charts/PDF/U1760.pdf U1760.pdf]
 
|-
 
|1780–17FF
 
|Khmer
 
|khmer
 
|[http://www.unicode.org/charts/PDF/U1780.pdf U1780.pdf]
 
|-
 
|1800–18AF
 
|Mongolian
 
|mongolian
 
|[http://www.unicode.org/charts/PDF/U1800.pdf U1800.pdf]
 
|-
 
|18B0–18FF
 
|Unified Canadian Aboriginal Syllabics Extended
 
|unifiedcanadianaboriginalsyllabicsextended
 
|[http://www.unicode.org/charts/PDF/U18B0.pdf U18B0.pdf]
 
|-
 
|1900–194F
 
|Limbu
 
|limbu
 
|[http://www.unicode.org/charts/PDF/U1900.pdf U1900.pdf]
 
|-
 
|1950–197F
 
|Tai Le
 
|taile
 
|[http://www.unicode.org/charts/PDF/U1950.pdf U1950.pdf]
 
|-
 
|1980–19DF
 
|New Tai Lue
 
|newtailue
 
|[http://www.unicode.org/charts/PDF/U1980.pdf U1980.pdf]
 
|-
 
|19E0–19FF
 
|Khmer Symbols
 
|khmersymbols
 
|[http://www.unicode.org/charts/PDF/U19E0.pdf U19E0.pdf]
 
|-
 
|1A00–1A1F
 
|Buginese
 
|buginese
 
|[http://www.unicode.org/charts/PDF/U1A00.pdf U1A00.pdf]
 
|-
 
|1A20–1AAF
 
|Tai Tham
 
|taitham
 
|[http://www.unicode.org/charts/PDF/U1A20.pdf U1A20.pdf]
 
|-
 
|1AB0–1AFF
 
|Combining Diacritical Marks Extended
 
|combiningdiacriticalmarksextended
 
|[http://www.unicode.org/charts/PDF/U1AB0.pdf U1AB0.pdf]
 
|-
 
|1B00–1B7F
 
|Balinese
 
|balinese
 
|[http://www.unicode.org/charts/PDF/U1B00.pdf U1B00.pdf]
 
|-
 
|1B80–1BBF
 
|Sundanese
 
|sundanese
 
|[http://www.unicode.org/charts/PDF/U1B80.pdf U1B80.pdf]
 
|-
 
|1BC0–1BFF
 
|Batak
 
|batak
 
|[http://www.unicode.org/charts/PDF/U1BC0.pdf U1BC0.pdf]
 
|-
 
|1C00–1C4F
 
|Lepcha
 
|lepcha
 
|[http://www.unicode.org/charts/PDF/U1C00.pdf U1C00.pdf]
 
|-
 
|1C50–1C7F
 
|Ol Chiki
 
|olchiki
 
|[http://www.unicode.org/charts/PDF/U1C50.pdf U1C50.pdf]
 
|-
 
|1C80–1C8F
 
|Cyrillic Extended-C
 
|cyrillicextendedc
 
|[http://www.unicode.org/charts/PDF/U1C80.pdf U1C80.pdf]
 
|-
 
|1CC0–1CCF
 
|Sundanese Supplement
 
|sundanesesupplement
 
|[http://www.unicode.org/charts/PDF/U1CC0.pdf U1CC0.pdf]
 
|-
 
|1CD0–1CFF
 
|Vedic Extensions
 
|vedicextensions
 
|[http://www.unicode.org/charts/PDF/U1CD0.pdf U1CD0.pdf]
 
|-
 
|1D00–1D7F
 
|Phonetic Extensions
 
|phoneticextensions
 
|[http://www.unicode.org/charts/PDF/U1D00.pdf U1D00.pdf]
 
|-
 
|1D80–1DBF
 
|Phonetic Extensions Supplement
 
|phoneticextensionssupplement
 
|[http://www.unicode.org/charts/PDF/U1D80.pdf U1D80.pdf]
 
|-
 
|1DC0–1DFF
 
|Combining Diacritical Marks Supplement
 
|combiningdiacriticalmarkssupplement
 
|[http://www.unicode.org/charts/PDF/U1DC0.pdf U1DC0.pdf]
 
|-
 
|1E00–1EFF
 
|Latin Extended Additional
 
|latinextendedadditional
 
|[http://www.unicode.org/charts/PDF/U1E00.pdf U1E00.pdf]
 
|-
 
|1F00–1FFF
 
|Greek Extended
 
|greekextended
 
|[http://www.unicode.org/charts/PDF/U1F00.pdf U1F00.pdf]
 
|-
 
|2000–206F
 
|General Punctuation
 
|generalpunctuation
 
|[http://www.unicode.org/charts/PDF/U2000.pdf U2000.pdf]
 
|-
 
|2070–209F
 
|Superscripts and Subscripts
 
|superscriptsandsubscripts
 
|[http://www.unicode.org/charts/PDF/U2070.pdf U2070.pdf]
 
|-
 
|20A0–20CF
 
|Currency Symbols
 
|currencysymbols
 
|[http://www.unicode.org/charts/PDF/U20A0.pdf U20A0.pdf]
 
|-
 
|20D0–20FF
 
|Combining Diacritical Marks for Symbols
 
|combiningdiacriticalmarksforsymbols
 
|[http://www.unicode.org/charts/PDF/U20D0.pdf U20D0.pdf]
 
|-
 
|2100–214F
 
|Letterlike Symbols
 
|letterlikesymbols
 
|[http://www.unicode.org/charts/PDF/U2100.pdf U2100.pdf]
 
|-
 
|2150–218F
 
|Number Forms
 
|numberforms
 
|[http://www.unicode.org/charts/PDF/U2150.pdf U2150.pdf]
 
|-
 
|2190–21FF
 
|Arrows
 
|arrows
 
|[http://www.unicode.org/charts/PDF/U2190.pdf U2190.pdf]
 
|-
 
|2200–22FF
 
|Mathematical Operators
 
|mathematicaloperators
 
|[http://www.unicode.org/charts/PDF/U2200.pdf U2200.pdf]
 
|-
 
|2300–23FF
 
|Miscellaneous Technical
 
|miscellaneoustechnical
 
|[http://www.unicode.org/charts/PDF/U2300.pdf U2300.pdf]
 
|-
 
|2400–243F
 
|Control Pictures
 
|controlpictures
 
|[http://www.unicode.org/charts/PDF/U2400.pdf U2400.pdf]
 
|-
 
|2440–245F
 
|Optical Character Recognition
 
|opticalcharacterrecognition
 
|[http://www.unicode.org/charts/PDF/U2440.pdf U2440.pdf]
 
|-
 
|2460–24FF
 
|Enclosed Alphanumerics
 
|enclosedalphanumerics
 
|[http://www.unicode.org/charts/PDF/U2460.pdf U2460.pdf]
 
|-
 
|2500–257F
 
|Box Drawing
 
|boxdrawing
 
|[http://www.unicode.org/charts/PDF/U2500.pdf U2500.pdf]
 
|-
 
|2580–259F
 
|Block Elements
 
|blockelements
 
|[http://www.unicode.org/charts/PDF/U2580.pdf U2580.pdf]
 
|-
 
|25A0–25FF
 
|Geometric Shapes
 
|geometricshapes
 
|[http://www.unicode.org/charts/PDF/U25A0.pdf U25A0.pdf]
 
|-
 
|2600–26FF
 
|Miscellaneous Symbols
 
|miscellaneoussymbols
 
|[http://www.unicode.org/charts/PDF/U2600.pdf U2600.pdf]
 
|-
 
|2700–27BF
 
|Dingbats
 
|dingbats
 
|[http://www.unicode.org/charts/PDF/U2700.pdf U2700.pdf]
 
|-
 
|27C0–27EF
 
|Miscellaneous Mathematical Symbols-A
 
|miscellaneousmathematicalsymbolsa
 
|[http://www.unicode.org/charts/PDF/U27C0.pdf U27C0.pdf]
 
|-
 
|27F0–27FF
 
|Supplemental Arrows-A
 
|supplementalarrowsa
 
|[http://www.unicode.org/charts/PDF/U27F0.pdf U27F0.pdf]
 
|-
 
|2800–28FF
 
|Braille Patterns
 
|braillepatterns
 
|[http://www.unicode.org/charts/PDF/U2800.pdf U2800.pdf]
 
|-
 
|2900–297F
 
|Supplemental Arrows-B
 
|supplementalarrowsb
 
|[http://www.unicode.org/charts/PDF/U2900.pdf U2900.pdf]
 
|-
 
|2980–29FF
 
|Miscellaneous Mathematical Symbols-B
 
|miscellaneousmathematicalsymbolsb
 
|[http://www.unicode.org/charts/PDF/U2980.pdf U2980.pdf]
 
|-
 
|2A00–2AFF
 
|Supplemental Mathematical Operators
 
|supplementalmathematicaloperators
 
|[http://www.unicode.org/charts/PDF/U2A00.pdf U2A00.pdf]
 
|-
 
|2B00–2BFF
 
|Miscellaneous Symbols and Arrows
 
|miscellaneoussymbolsandarrows
 
|[http://www.unicode.org/charts/PDF/U2B00.pdf U2B00.pdf]
 
|-
 
|2C00–2C5F
 
|Glagolitic
 
|glagolitic
 
|[http://www.unicode.org/charts/PDF/U2C00.pdf U2C00.pdf]
 
|-
 
|2C60–2C7F
 
|Latin Extended-C
 
|latinextendedc
 
|[http://www.unicode.org/charts/PDF/U2C60.pdf U2C60.pdf]
 
|-
 
|2C80–2CFF
 
|Coptic
 
|coptic
 
|[http://www.unicode.org/charts/PDF/U2C80.pdf U2C80.pdf]
 
|-
 
|2D00–2D2F
 
|Georgian Supplement
 
|georgiansupplement
 
|[http://www.unicode.org/charts/PDF/U2D00.pdf U2D00.pdf]
 
|-
 
|2D30–2D7F
 
|Tifinagh
 
|tifinagh
 
|[http://www.unicode.org/charts/PDF/U2D30.pdf U2D30.pdf]
 
|-
 
|2D80–2DDF
 
|Ethiopic Extended
 
|ethiopicextended
 
|[http://www.unicode.org/charts/PDF/U2D80.pdf U2D80.pdf]
 
|-
 
|2DE0–2DFF
 
|Cyrillic Extended-A
 
|cyrillicextendeda
 
|[http://www.unicode.org/charts/PDF/U2DE0.pdf U2DE0.pdf]
 
|-
 
|2E00–2E7F
 
|Supplemental Punctuation
 
|supplementalpunctuation
 
|[http://www.unicode.org/charts/PDF/U2E00.pdf U2E00.pdf]
 
|-
 
|2E80–2EFF
 
|CJK Radicals Supplement
 
|cjkradicalssupplement
 
|[http://www.unicode.org/charts/PDF/U2E80.pdf U2E80.pdf]
 
|-
 
|2F00–2FDF
 
|Kangxi Radicals
 
|kangxiradicals
 
|[http://www.unicode.org/charts/PDF/U2F00.pdf U2F00.pdf]
 
|-
 
|2FF0–2FFF
 
|Ideographic Description Characters
 
|ideographicdescriptioncharacters
 
|[http://www.unicode.org/charts/PDF/U2FF0.pdf U2FF0.pdf]
 
|-
 
|3000–303F
 
|CJK Symbols and Punctuation
 
|cjksymbolsandpunctuation
 
|[http://www.unicode.org/charts/PDF/U3000.pdf U3000.pdf]
 
|-
 
|3040–309F
 
|Hiragana
 
|hiragana
 
|[http://www.unicode.org/charts/PDF/U3040.pdf U3040.pdf]
 
|-
 
|30A0–30FF
 
|Katakana
 
|katakana
 
|[http://www.unicode.org/charts/PDF/U30A0.pdf U30A0.pdf]
 
|-
 
|3100–312F
 
|Bopomofo
 
|bopomofo
 
|[http://www.unicode.org/charts/PDF/U3100.pdf U3100.pdf]
 
|-
 
|3130–318F
 
|Hangul Compatibility Jamo
 
|hangulcompatibilityjamo
 
|[http://www.unicode.org/charts/PDF/U3130.pdf U3130.pdf]
 
|-
 
|3190–319F
 
|Kanbun
 
|kanbun
 
|[http://www.unicode.org/charts/PDF/U3190.pdf U3190.pdf]
 
|-
 
|31A0–31BF
 
|Bopomofo Extended
 
|bopomofoextended
 
|[http://www.unicode.org/charts/PDF/U31A0.pdf U31A0.pdf]
 
|-
 
|31C0–31EF
 
|CJK Strokes
 
|cjkstrokes
 
|[http://www.unicode.org/charts/PDF/U31C0.pdf U31C0.pdf]
 
|-
 
|31F0–31FF
 
|Katakana Phonetic Extensions
 
|katakanaphoneticextensions
 
|[http://www.unicode.org/charts/PDF/U31F0.pdf U31F0.pdf]
 
|-
 
|3200–32FF
 
|Enclosed CJK Letters and Months
 
|enclosedcjklettersandmonths
 
|[http://www.unicode.org/charts/PDF/U3200.pdf U3200.pdf]
 
|-
 
|3300–33FF
 
|CJK Compatibility
 
|cjkcompatibility
 
|[http://www.unicode.org/charts/PDF/U3300.pdf U3300.pdf]
 
|-
 
|3400–4DBF
 
|CJK Unified Ideographs Extension A
 
|cjkunifiedideographsextensiona
 
|[http://www.unicode.org/charts/PDF/U3400.pdf U3400.pdf]
 
|-
 
|4DC0–4DFF
 
|Yijing Hexagram Symbols
 
|yijinghexagramsymbols
 
|[http://www.unicode.org/charts/PDF/U4DC0.pdf U4DC0.pdf]
 
|-
 
|4E00–9FFF
 
|CJK Unified Ideographs
 
|cjkunifiedideographs
 
|[http://www.unicode.org/charts/PDF/U4E00.pdf U4E00.pdf]
 
|-
 
|A000–A48F
 
|Yi Syllables
 
|yisyllables
 
|[http://www.unicode.org/charts/PDF/UA000.pdf UA000.pdf]
 
|-
 
|A490–A4CF
 
|Yi Radicals
 
|yiradicals
 
|[http://www.unicode.org/charts/PDF/UA490.pdf UA490.pdf]
 
|-
 
|A4D0–A4FF
 
|Lisu
 
|lisu
 
|[http://www.unicode.org/charts/PDF/UA4D0.pdf UA4D0.pdf]
 
|-
 
|A500–A63F
 
|Vai
 
|vai
 
|[http://www.unicode.org/charts/PDF/UA500.pdf UA500.pdf]
 
|-
 
|A640–A69F
 
|Cyrillic Extended-B
 
|cyrillicextendedb
 
|[http://www.unicode.org/charts/PDF/UA640.pdf UA640.pdf]
 
|-
 
|A6A0–A6FF
 
|Bamum
 
|bamum
 
|[http://www.unicode.org/charts/PDF/UA6A0.pdf UA6A0.pdf]
 
|-
 
|A700–A71F
 
|Modifier Tone Letters
 
|modifiertoneletters
 
|[http://www.unicode.org/charts/PDF/UA700.pdf UA700.pdf]
 
|-
 
|A720–A7FF
 
|Latin Extended-D
 
|latinextendedd
 
|[http://www.unicode.org/charts/PDF/UA720.pdf UA720.pdf]
 
|-
 
|A800–A82F
 
|Syloti Nagri
 
|sylotinagri
 
|[http://www.unicode.org/charts/PDF/UA800.pdf UA800.pdf]
 
|-
 
|A830–A83F
 
|Common Indic Number Forms
 
|commonindicnumberforms
 
|[http://www.unicode.org/charts/PDF/UA830.pdf UA830.pdf]
 
|-
 
|A840–A87F
 
|Phags-pa
 
|phagspa
 
|[http://www.unicode.org/charts/PDF/UA840.pdf UA840.pdf]
 
|-
 
|A880–A8DF
 
|Saurashtra
 
|saurashtra
 
|[http://www.unicode.org/charts/PDF/UA880.pdf UA880.pdf]
 
|-
 
|A8E0–A8FF
 
|Devanagari Extended
 
|devanagariextended
 
|[http://www.unicode.org/charts/PDF/UA8E0.pdf UA8E0.pdf]
 
|-
 
|A900–A92F
 
|Kayah Li
 
|kayahli
 
|[http://www.unicode.org/charts/PDF/UA900.pdf UA900.pdf]
 
|-
 
|A930–A95F
 
|Rejang
 
|rejang
 
|[http://www.unicode.org/charts/PDF/UA930.pdf UA930.pdf]
 
|-
 
|A960–A97F
 
|Hangul Jamo Extended-A
 
|hanguljamoextendeda
 
|[http://www.unicode.org/charts/PDF/UA960.pdf UA960.pdf]
 
|-
 
|A980–A9DF
 
|Javanese
 
|javanese
 
|[http://www.unicode.org/charts/PDF/UA980.pdf UA980.pdf]
 
|-
 
|A9E0–A9FF
 
|Myanmar Extended-B
 
|myanmarextendedb
 
|[http://www.unicode.org/charts/PDF/UA9E0.pdf UA9E0.pdf]
 
|-
 
|AA00–AA5F
 
|Cham
 
|cham
 
|[http://www.unicode.org/charts/PDF/UAA00.pdf UAA00.pdf]
 
|-
 
|AA60–AA7F
 
|Myanmar Extended-A
 
|myanmarextendeda
 
|[http://www.unicode.org/charts/PDF/UAA60.pdf UAA60.pdf]
 
|-
 
|AA80–AADF
 
|Tai Viet
 
|taiviet
 
|[http://www.unicode.org/charts/PDF/UAA80.pdf UAA80.pdf]
 
|-
 
|AAE0–AAFF
 
|Meetei Mayek Extensions
 
|meeteimayekextensions
 
|[http://www.unicode.org/charts/PDF/UAAE0.pdf UAAE0.pdf]
 
|-
 
|AB00–AB2F
 
|Ethiopic Extended-A
 
|ethiopicextendeda
 
|[http://www.unicode.org/charts/PDF/UAB00.pdf UAB00.pdf]
 
|-
 
|AB30–AB6F
 
|Latin Extended-E
 
|latinextendede
 
|[http://www.unicode.org/charts/PDF/UAB30.pdf UAB30.pdf]
 
|-
 
|AB70–ABBF
 
|Cherokee Supplement
 
|cherokeesupplement
 
|[http://www.unicode.org/charts/PDF/UAB70.pdf UAB70.pdf]
 
|-
 
|ABC0–ABFF
 
|Meetei Mayek
 
|meeteimayek
 
|[http://www.unicode.org/charts/PDF/UABC0.pdf UABC0.pdf]
 
|-
 
|AC00–D7AF
 
|Hangul Syllables
 
|hangulsyllables
 
|[http://www.unicode.org/charts/PDF/UAC00.pdf UAC00.pdf]
 
|-
 
|D7B0–D7FF
 
|Hangul Jamo Extended-B
 
|hanguljamoextendedb
 
|[http://www.unicode.org/charts/PDF/UD7B0.pdf UD7B0.pdf]
 
|-
 
|D800–DB7F
 
|High Surrogates
 
|highsurrogates
 
|[http://www.unicode.org/charts/PDF/UD800.pdf UD800.pdf]
 
|-
 
|DB80–DBFF
 
|High Private Use Surrogates
 
|highprivateusesurrogates
 
|[http://www.unicode.org/charts/PDF/UDB80.pdf UDB80.pdf]
 
|-
 
|DC00–DFFF
 
|Low Surrogates
 
|lowsurrogates
 
|[http://www.unicode.org/charts/PDF/UDC00.pdf UDC00.pdf]
 
|-
 
|E000–F8FF
 
|Private Use Area
 
|privateusearea
 
|[http://www.unicode.org/charts/PDF/UE000.pdf UE000.pdf]
 
|-
 
|F900–FAFF
 
|CJK Compatibility Ideographs
 
|cjkcompatibilityideographs
 
|[http://www.unicode.org/charts/PDF/UF900.pdf UF900.pdf]
 
|-
 
|FB00–FB4F
 
|Alphabetic Presentation Forms
 
|alphabeticpresentationforms
 
|[http://www.unicode.org/charts/PDF/UFB00.pdf UFB00.pdf]
 
|-
 
|FB50–FDFF
 
|Arabic Presentation Forms-A
 
|arabicpresentationformsa
 
|[http://www.unicode.org/charts/PDF/UFB50.pdf UFB50.pdf]
 
|-
 
|FE00–FE0F
 
|Variation Selectors
 
|variationselectors
 
|[http://www.unicode.org/charts/PDF/UFE00.pdf UFE00.pdf]
 
|-
 
|FE10–FE1F
 
|Vertical Forms
 
|verticalforms
 
|[http://www.unicode.org/charts/PDF/UFE10.pdf UFE10.pdf]
 
|-
 
|FE20–FE2F
 
|Combining Half Marks
 
|combininghalfmarks
 
|[http://www.unicode.org/charts/PDF/UFE20.pdf UFE20.pdf]
 
|-
 
|FE30–FE4F
 
|CJK Compatibility Forms
 
|cjkcompatibilityforms
 
|[http://www.unicode.org/charts/PDF/UFE30.pdf UFE30.pdf]
 
|-
 
|FE50–FE6F
 
|Small Form Variants
 
|smallformvariants
 
|[http://www.unicode.org/charts/PDF/UFE50.pdf UFE50.pdf]
 
|-
 
|FE70–FEFF
 
|Arabic Presentation Forms-B
 
|arabicpresentationformsb
 
|[http://www.unicode.org/charts/PDF/UFE70.pdf UFE70.pdf]
 
|-
 
|FF00–FFEF
 
|Halfwidth and Fullwidth Forms
 
|halfwidthandfullwidthforms
 
|[http://www.unicode.org/charts/PDF/UFF00.pdf UFF00.pdf]
 
|-
 
|FFF0–FFFF
 
|Specials
 
|specials
 
|[http://www.unicode.org/charts/PDF/UFFF0.pdf UFFF0.pdf]
 
|-
 
|10000–1007F
 
|Linear B Syllabary
 
|linearbsyllabary
 
|[http://www.unicode.org/charts/PDF/U10000.pdf U10000.pdf]
 
|-
 
|10080–100FF
 
|Linear B Ideograms
 
|linearbideograms
 
|[http://www.unicode.org/charts/PDF/U10080.pdf U10080.pdf]
 
|-
 
|10100–1013F
 
|Aegean Numbers
 
|aegeannumbers
 
|[http://www.unicode.org/charts/PDF/U10100.pdf U10100.pdf]
 
|-
 
|10140–1018F
 
|Ancient Greek Numbers
 
|ancientgreeknumbers
 
|[http://www.unicode.org/charts/PDF/U10140.pdf U10140.pdf]
 
|-
 
|10190–101CF
 
|Ancient Symbols
 
|ancientsymbols
 
|[http://www.unicode.org/charts/PDF/U10190.pdf U10190.pdf]
 
|-
 
|101D0–101FF
 
|Phaistos Disc
 
|phaistosdisc
 
|[http://www.unicode.org/charts/PDF/U101D0.pdf U101D0.pdf]
 
|-
 
|10280–1029F
 
|Lycian
 
|lycian
 
|[http://www.unicode.org/charts/PDF/U10280.pdf U10280.pdf]
 
|-
 
|102A0–102DF
 
|Carian
 
|carian
 
|[http://www.unicode.org/charts/PDF/U102A0.pdf U102A0.pdf]
 
|-
 
|102E0–102FF
 
|Coptic Epact Numbers
 
|copticepactnumbers
 
|[http://www.unicode.org/charts/PDF/U102E0.pdf U102E0.pdf]
 
|-
 
|10300–1032F
 
|Old Italic
 
|olditalic
 
|[http://www.unicode.org/charts/PDF/U10300.pdf U10300.pdf]
 
|-
 
|10330–1034F
 
|Gothic
 
|gothic
 
|[http://www.unicode.org/charts/PDF/U10330.pdf U10330.pdf]
 
|-
 
|10350–1037F
 
|Old Permic
 
|oldpermic
 
|[http://www.unicode.org/charts/PDF/U10350.pdf U10350.pdf]
 
|-
 
|10380–1039F
 
|Ugaritic
 
|ugaritic
 
|[http://www.unicode.org/charts/PDF/U10380.pdf U10380.pdf]
 
|-
 
|103A0–103DF
 
|Old Persian
 
|oldpersian
 
|[http://www.unicode.org/charts/PDF/U103A0.pdf U103A0.pdf]
 
|-
 
|10400–1044F
 
|Deseret
 
|deseret
 
|[http://www.unicode.org/charts/PDF/U10400.pdf U10400.pdf]
 
|-
 
|10450–1047F
 
|Shavian
 
|shavian
 
|[http://www.unicode.org/charts/PDF/U10450.pdf U10450.pdf]
 
|-
 
|10480–104AF
 
|Osmanya
 
|osmanya
 
|[http://www.unicode.org/charts/PDF/U10480.pdf U10480.pdf]
 
|-
 
|104B0–104FF
 
|Osage
 
|osage
 
|[http://www.unicode.org/charts/PDF/U104B0.pdf U104B0.pdf]
 
|-
 
|10500–1052F
 
|Elbasan
 
|elbasan
 
|[http://www.unicode.org/charts/PDF/U10500.pdf U10500.pdf]
 
|-
 
|10530–1056F
 
|Caucasian Albanian
 
|caucasianalbanian
 
|[http://www.unicode.org/charts/PDF/U10530.pdf U10530.pdf]
 
|-
 
|10600–1077F
 
|Linear A
 
|lineara
 
|[http://www.unicode.org/charts/PDF/U10600.pdf U10600.pdf]
 
|-
 
|10800–1083F
 
|Cypriot Syllabary
 
|cypriotsyllabary
 
|[http://www.unicode.org/charts/PDF/U10800.pdf U10800.pdf]
 
|-
 
|10840–1085F
 
|Imperial Aramaic
 
|imperialaramaic
 
|[http://www.unicode.org/charts/PDF/U10840.pdf U10840.pdf]
 
|-
 
|10860–1087F
 
|Palmyrene
 
|palmyrene
 
|[http://www.unicode.org/charts/PDF/U10860.pdf U10860.pdf]
 
|-
 
|10880–108AF
 
|Nabataean
 
|nabataean
 
|[http://www.unicode.org/charts/PDF/U10880.pdf U10880.pdf]
 
|-
 
|108E0–108FF
 
|Hatran
 
|hatran
 
|[http://www.unicode.org/charts/PDF/U108E0.pdf U108E0.pdf]
 
|-
 
|10900–1091F
 
|Phoenician
 
|phoenician
 
|[http://www.unicode.org/charts/PDF/U10900.pdf U10900.pdf]
 
|-
 
|10920–1093F
 
|Lydian
 
|lydian
 
|[http://www.unicode.org/charts/PDF/U10920.pdf U10920.pdf]
 
|-
 
|10980–1099F
 
|Meroitic Hieroglyphs
 
|meroitichieroglyphs
 
|[http://www.unicode.org/charts/PDF/U10980.pdf U10980.pdf]
 
|-
 
|109A0–109FF
 
|Meroitic Cursive
 
|meroiticcursive
 
|[http://www.unicode.org/charts/PDF/U109A0.pdf U109A0.pdf]
 
|-
 
|10A00–10A5F
 
|Kharoshthi
 
|kharoshthi
 
|[http://www.unicode.org/charts/PDF/U10A00.pdf U10A00.pdf]
 
|-
 
|10A60–10A7F
 
|Old South Arabian
 
|oldsoutharabian
 
|[http://www.unicode.org/charts/PDF/U10A60.pdf U10A60.pdf]
 
|-
 
|10A80–10A9F
 
|Old North Arabian
 
|oldnortharabian
 
|[http://www.unicode.org/charts/PDF/U10A80.pdf U10A80.pdf]
 
|-
 
|10AC0–10AFF
 
|Manichaean
 
|manichaean
 
|[http://www.unicode.org/charts/PDF/U10AC0.pdf U10AC0.pdf]
 
|-
 
|10B00–10B3F
 
|Avestan
 
|avestan
 
|[http://www.unicode.org/charts/PDF/U10B00.pdf U10B00.pdf]
 
|-
 
|10B40–10B5F
 
|Inscriptional Parthian
 
|inscriptionalparthian
 
|[http://www.unicode.org/charts/PDF/U10B40.pdf U10B40.pdf]
 
|-
 
|10B60–10B7F
 
|Inscriptional Pahlavi
 
|inscriptionalpahlavi
 
|[http://www.unicode.org/charts/PDF/U10B60.pdf U10B60.pdf]
 
|-
 
|10B80–10BAF
 
|Psalter Pahlavi
 
|psalterpahlavi
 
|[http://www.unicode.org/charts/PDF/U10B80.pdf U10B80.pdf]
 
|-
 
|10C00–10C4F
 
|Old Turkic
 
|oldturkic
 
|[http://www.unicode.org/charts/PDF/U10C00.pdf U10C00.pdf]
 
|-
 
|10C80–10CFF
 
|Old Hungarian
 
|oldhungarian
 
|[http://www.unicode.org/charts/PDF/U10C80.pdf U10C80.pdf]
 
|-
 
|10E60–10E7F
 
|Rumi Numeral Symbols
 
|ruminumeralsymbols
 
|[http://www.unicode.org/charts/PDF/U10E60.pdf U10E60.pdf]
 
|-
 
|11000–1107F
 
|Brahmi
 
|brahmi
 
|[http://www.unicode.org/charts/PDF/U11000.pdf U11000.pdf]
 
|-
 
|11080–110CF
 
|Kaithi
 
|kaithi
 
|[http://www.unicode.org/charts/PDF/U11080.pdf U11080.pdf]
 
|-
 
|110D0–110FF
 
|Sora Sompeng
 
|sorasompeng
 
|[http://www.unicode.org/charts/PDF/U110D0.pdf U110D0.pdf]
 
|-
 
|11100–1114F
 
|Chakma
 
|chakma
 
|[http://www.unicode.org/charts/PDF/U11100.pdf U11100.pdf]
 
|-
 
|11150–1117F
 
|Mahajani
 
|mahajani
 
|[http://www.unicode.org/charts/PDF/U11150.pdf U11150.pdf]
 
|-
 
|11180–111DF
 
|Sharada
 
|sharada
 
|[http://www.unicode.org/charts/PDF/U11180.pdf U11180.pdf]
 
|-
 
|111E0–111FF
 
|Sinhala Archaic Numbers
 
|sinhalaarchaicnumbers
 
|[http://www.unicode.org/charts/PDF/U111E0.pdf U111E0.pdf]
 
|-
 
|11200–1124F
 
|Khojki
 
|khojki
 
|[http://www.unicode.org/charts/PDF/U11200.pdf U11200.pdf]
 
|-
 
|11280–112AF
 
|Multani
 
|multani
 
|[http://www.unicode.org/charts/PDF/U11280.pdf U11280.pdf]
 
|-
 
|112B0–112FF
 
|Khudawadi
 
|khudawadi
 
|[http://www.unicode.org/charts/PDF/U112B0.pdf U112B0.pdf]
 
|-
 
|11300–1137F
 
|Grantha
 
|grantha
 
|[http://www.unicode.org/charts/PDF/U11300.pdf U11300.pdf]
 
|-
 
|11400–1147F
 
|Newa
 
|newa
 
|[http://www.unicode.org/charts/PDF/U11400.pdf U11400.pdf]
 
|-
 
|11480–114DF
 
|Tirhuta
 
|tirhuta
 
|[http://www.unicode.org/charts/PDF/U11480.pdf U11480.pdf]
 
|-
 
|11580–115FF
 
|Siddham
 
|siddham
 
|[http://www.unicode.org/charts/PDF/U11580.pdf U11580.pdf]
 
|-
 
|11600–1165F
 
|Modi
 
|modi
 
|[http://www.unicode.org/charts/PDF/U11600.pdf U11600.pdf]
 
|-
 
|11660–1167F
 
|Mongolian Supplement
 
|mongoliansupplement
 
|[http://www.unicode.org/charts/PDF/U11660.pdf U11660.pdf]
 
|-
 
|11680–116CF
 
|Takri
 
|takri
 
|[http://www.unicode.org/charts/PDF/U11680.pdf U11680.pdf]
 
|-
 
|11700–1173F
 
|Ahom
 
|ahom
 
|[http://www.unicode.org/charts/PDF/U11700.pdf U11700.pdf]
 
|-
 
|118A0–118FF
 
|Warang Citi
 
|warangciti
 
|[http://www.unicode.org/charts/PDF/U118A0.pdf U118A0.pdf]
 
|-
 
|11A00–11A4F
 
|Zanabazar Square
 
|zanabazarsquare
 
|[http://www.unicode.org/charts/PDF/U11A00.pdf U11A00.pdf]
 
|-
 
|11A50–11AAF
 
|Soyombo
 
|soyombo
 
|[http://www.unicode.org/charts/PDF/U11A50.pdf U11A50.pdf]
 
|-
 
|11AC0–11AFF
 
|Pau Cin Hau
 
|paucinhau
 
|[http://www.unicode.org/charts/PDF/U11AC0.pdf U11AC0.pdf]
 
|-
 
|11C00–11C6F
 
|Bhaiksuki
 
|bhaiksuki
 
|[http://www.unicode.org/charts/PDF/U11C00.pdf U11C00.pdf]
 
|-
 
|11C70–11CBF
 
|Marchen
 
|marchen
 
|[http://www.unicode.org/charts/PDF/U11C70.pdf U11C70.pdf]
 
|-
 
|11D00–11D5F
 
|Masaram Gondi
 
|masaramgondi
 
|[http://www.unicode.org/charts/PDF/U11D00.pdf U11D00.pdf]
 
|-
 
|12000–123FF
 
|Cuneiform
 
|cuneiform
 
|[http://www.unicode.org/charts/PDF/U12000.pdf U12000.pdf]
 
|-
 
|12400–1247F
 
|Cuneiform Numbers and Punctuation
 
|cuneiformnumbersandpunctuation
 
|[http://www.unicode.org/charts/PDF/U12400.pdf U12400.pdf]
 
|-
 
|12480–1254F
 
|Early Dynastic Cuneiform
 
|earlydynasticcuneiform
 
|[http://www.unicode.org/charts/PDF/U12480.pdf U12480.pdf]
 
|-
 
|13000–1342F
 
|Egyptian Hieroglyphs
 
|egyptianhieroglyphs
 
|[http://www.unicode.org/charts/PDF/U13000.pdf U13000.pdf]
 
|-
 
|14400–1467F
 
|Anatolian Hieroglyphs
 
|anatolianhieroglyphs
 
|[http://www.unicode.org/charts/PDF/U14400.pdf U14400.pdf]
 
|-
 
|16800–16A3F
 
|Bamum Supplement
 
|bamumsupplement
 
|[http://www.unicode.org/charts/PDF/U16800.pdf U16800.pdf]
 
|-
 
|16A40–16A6F
 
|Mro
 
|mro
 
|[http://www.unicode.org/charts/PDF/U16A40.pdf U16A40.pdf]
 
|-
 
|16AD0–16AFF
 
|Bassa Vah
 
|bassavah
 
|[http://www.unicode.org/charts/PDF/U16AD0.pdf U16AD0.pdf]
 
|-
 
|16B00–16B8F
 
|Pahawh Hmong
 
|pahawhhmong
 
|[http://www.unicode.org/charts/PDF/U16B00.pdf U16B00.pdf]
 
|-
 
|16F00–16F9F
 
|Miao
 
|miao
 
|[http://www.unicode.org/charts/PDF/U16F00.pdf U16F00.pdf]
 
|-
 
|16FE0–16FFF
 
|Ideographic Symbols and Punctuation
 
|ideographicsymbolsandpunctuation
 
|[http://www.unicode.org/charts/PDF/U16FE0.pdf U16FE0.pdf]
 
|-
 
|17000–187FF
 
|Tangut
 
|tangut
 
|[http://www.unicode.org/charts/PDF/U17000.pdf U17000.pdf]
 
|-
 
|18800–18AFF
 
|Tangut Components
 
|tangutcomponents
 
|[http://www.unicode.org/charts/PDF/U18800.pdf U18800.pdf]
 
|-
 
|1B000–1B0FF
 
|Kana Supplement
 
|kanasupplement
 
|[http://www.unicode.org/charts/PDF/U1B000.pdf U1B000.pdf]
 
|-
 
|1B100–1B12F
 
|Kana Extended-A
 
|kanaextendeda
 
|[http://www.unicode.org/charts/PDF/U1B100.pdf U1B100.pdf]
 
|-
 
|1B170–1B2FF
 
|Nushu
 
|nushu
 
|[http://www.unicode.org/charts/PDF/U1B170.pdf U1B170.pdf]
 
|-
 
|1BC00–1BC9F
 
|Duployan
 
|duployan
 
|[http://www.unicode.org/charts/PDF/U1BC00.pdf U1BC00.pdf]
 
|-
 
|1BCA0–1BCAF
 
|Shorthand Format Controls
 
|shorthandformatcontrols
 
|[http://www.unicode.org/charts/PDF/U1BCA0.pdf U1BCA0.pdf]
 
|-
 
|1D000–1D0FF
 
|Byzantine Musical Symbols
 
|byzantinemusicalsymbols
 
|[http://www.unicode.org/charts/PDF/U1D000.pdf U1D000.pdf]
 
|-
 
|1D100–1D1FF
 
|Musical Symbols
 
|musicalsymbols
 
|[http://www.unicode.org/charts/PDF/U1D100.pdf U1D100.pdf]
 
|-
 
|1D200–1D24F
 
|Ancient Greek Musical Notation
 
|ancientgreekmusicalnotation
 
|[http://www.unicode.org/charts/PDF/U1D200.pdf U1D200.pdf]
 
|-
 
|1D300–1D35F
 
|Tai Xuan Jing Symbols
 
|taixuanjingsymbols
 
|[http://www.unicode.org/charts/PDF/U1D300.pdf U1D300.pdf]
 
|-
 
|1D360–1D37F
 
|Counting Rod Numerals
 
|countingrodnumerals
 
|[http://www.unicode.org/charts/PDF/U1D360.pdf U1D360.pdf]
 
|-
 
|1D400–1D7FF
 
|Mathematical Alphanumeric Symbols
 
|mathematicalalphanumericsymbols
 
|[http://www.unicode.org/charts/PDF/U1D400.pdf U1D400.pdf]
 
|-
 
|1D800–1DAAF
 
|Sutton SignWriting
 
|suttonsignwriting
 
|[http://www.unicode.org/charts/PDF/U1D800.pdf U1D800.pdf]
 
|-
 
|1E000–1E02F
 
|Glagolitic Supplement
 
|glagoliticsupplement
 
|[http://www.unicode.org/charts/PDF/U1E000.pdf U1E000.pdf]
 
|-
 
|1E800–1E8DF
 
|Mende Kikakui
 
|mendekikakui
 
|[http://www.unicode.org/charts/PDF/U1E800.pdf U1E800.pdf]
 
|-
 
|1E900–1E95F
 
|Adlam
 
|adlam
 
|[http://www.unicode.org/charts/PDF/U1E900.pdf U1E900.pdf]
 
|-
 
|1EE00–1EEFF
 
|Arabic Mathematical Alphabetic Symbols
 
|arabicmathematicalalphabeticsymbols
 
|[http://www.unicode.org/charts/PDF/U1EE00.pdf U1EE00.pdf]
 
|-
 
|1F000–1F02F
 
|Mahjong Tiles
 
|mahjongtiles
 
|[http://www.unicode.org/charts/PDF/U1F000.pdf U1F000.pdf]
 
|-
 
|1F030–1F09F
 
|Domino Tiles
 
|dominotiles
 
|[http://www.unicode.org/charts/PDF/U1F030.pdf U1F030.pdf]
 
|-
 
|1F0A0–1F0FF
 
|Playing Cards
 
|playingcards
 
|[http://www.unicode.org/charts/PDF/U1F0A0.pdf U1F0A0.pdf]
 
|-
 
|1F100–1F1FF
 
|Enclosed Alphanumeric Supplement
 
|enclosedalphanumericsupplement
 
|[http://www.unicode.org/charts/PDF/U1F100.pdf U1F100.pdf]
 
|-
 
|1F200–1F2FF
 
|Enclosed Ideographic Supplement
 
|enclosedideographicsupplement
 
|[http://www.unicode.org/charts/PDF/U1F200.pdf U1F200.pdf]
 
|-
 
|1F300–1F5FF
 
|Miscellaneous Symbols and Pictographs
 
|miscellaneoussymbolsandpictographs
 
|[http://www.unicode.org/charts/PDF/U1F300.pdf U1F300.pdf]
 
|-
 
|1F600–1F64F
 
|Emoticons
 
|emoticons
 
|[http://www.unicode.org/charts/PDF/U1F600.pdf U1F600.pdf]
 
|-
 
|1F650–1F67F
 
|Ornamental Dingbats
 
|ornamentaldingbats
 
|[http://www.unicode.org/charts/PDF/U1F650.pdf U1F650.pdf]
 
|-
 
|1F680–1F6FF
 
|Transport and Map Symbols
 
|transportandmapsymbols
 
|[http://www.unicode.org/charts/PDF/U1F680.pdf U1F680.pdf]
 
|-
 
|1F700–1F77F
 
|Alchemical Symbols
 
|alchemicalsymbols
 
|[http://www.unicode.org/charts/PDF/U1F700.pdf U1F700.pdf]
 
|-
 
|1F780–1F7FF
 
|Geometric Shapes Extended
 
|geometricshapesextended
 
|[http://www.unicode.org/charts/PDF/U1F780.pdf U1F780.pdf]
 
|-
 
|1F800–1F8FF
 
|Supplemental Arrows-C
 
|supplementalarrowsc
 
|[http://www.unicode.org/charts/PDF/U1F800.pdf U1F800.pdf]
 
|-
 
|1F900–1F9FF
 
|Supplemental Symbols and Pictographs
 
|supplementalsymbolsandpictographs
 
|[http://www.unicode.org/charts/PDF/U1F900.pdf U1F900.pdf]
 
|-
 
|20000–2A6DF
 
|CJK Unified Ideographs Extension B
 
|cjkunifiedideographsextensionb
 
|[http://www.unicode.org/charts/PDF/U20000.pdf U20000.pdf]
 
|-
 
|2A700–2B73F
 
|CJK Unified Ideographs Extension C
 
|cjkunifiedideographsextensionc
 
|[http://www.unicode.org/charts/PDF/U2A700.pdf U2A700.pdf]
 
|-
 
|2B740–2B81F
 
|CJK Unified Ideographs Extension D
 
|cjkunifiedideographsextensiond
 
|[http://www.unicode.org/charts/PDF/U2B740.pdf U2B740.pdf]
 
|-
 
|2B820–2CEAF
 
|CJK Unified Ideographs Extension E
 
|cjkunifiedideographsextensione
 
|[http://www.unicode.org/charts/PDF/U2B820.pdf U2B820.pdf]
 
|-
 
|2CEB0–2EBEF
 
|CJK Unified Ideographs Extension F
 
|cjkunifiedideographsextensionf
 
|[http://www.unicode.org/charts/PDF/U2CEB0.pdf U2CEB0.pdf]
 
|-
 
|2F800–2FA1F
 
|CJK Compatibility Ideographs Supplement
 
|cjkcompatibilityideographssupplement
 
|[http://www.unicode.org/charts/PDF/U2F800.pdf U2F800.pdf]
 
|-
 
|E0000–E007F
 
|Tags
 
|tags
 
|[http://www.unicode.org/charts/PDF/UE0000.pdf UE0000.pdf]
 
|-
 
|E0100–E01EF
 
|Variation Selectors Supplement
 
|variationselectorssupplement
 
|[http://www.unicode.org/charts/PDF/UE0100.pdf UE0100.pdf]
 
|-
 
|F0000–FFFFF
 
|Supplementary Private Use Area-A
 
|supplementaryprivateuseareaa
 
|[http://www.unicode.org/charts/PDF/UF0000.pdf UF0000.pdf]
 
|-
 
|100000–10FFFF
 
|Supplementary Private Use Area-B
 
|supplementaryprivateuseareab
 
|[http://www.unicode.org/charts/PDF/U100000.pdf U100000.pdf]
 
|-
 
|}
 
  
== Usage of the blocks in ConTeXt ==
+
[[Category:Fonts]]
 +
[[Category:Languages]]

Latest revision as of 14:55, 8 June 2020

A Unicode block is an interval of code points which represent characters that are semantically related to each other. For example, there is a Unicode block for characters from the Devanagari script which is used by several Indian languages. Another Unicode block corresponds to characters which denote mathematical operators, such as those that indicate the union and the intersection of sets.

ConTeXt has special names for all Unicode blocks. These names can be used to specify ranges of code points in the setups of several commands.

This article uses some basic terms, such as character, code point, and assigned code point, from the Unicode Standard[1]. For brief descriptions of these terms, see the Unicode glossary[2].

Unicode blocks

A Unicode block, or simply a block, is any of the subsets of the Unicode code space that are listed in the file Blocks.txt[3] of the Unicode Character Database. The Unicode code space is the set of all code points, that is, the set of all integers from 0 to the integer whose hexadecimal representation is 10FFFF.

The main properties of blocks are described in the Unicode Standard[1] (Section 3.4, paragraph D10b). Every block is an interval of code points, and distinct blocks are disjoint from each other. In particular, the blocks form a partition of a subset of the Unicode code space.

A block starts at a code point that is a multiple of 16, and the number of code points in each block is also a multiple of 16. Thus, the hexadecimal representation of the first code point in a block ends in the digit 0, and that of the last code point in it ends in the digit F.
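
The alignment rule can be checked for a few blocks with a short piece of Lua run from within ConTeXt. This is only a sketch: the table blocks and its three entries are illustrative, with ranges copied from the list of blocks, and nothing here is part of the ConTeXt block machinery itself.

```tex
\starttext
\startluacode
-- Illustrative data: { Unicode name, first code point, last code point }.
local blocks = {
    { "Specials", 0xFFF0,  0xFFFF  },
    { "Gothic",   0x10330, 0x1034F },
    { "Tags",     0xE0000, 0xE007F },
}
for _, block in ipairs(blocks) do
    local name, first, last = block[1], block[2], block[3]
    -- A block must start on a multiple of 16 and end just before one.
    local aligned = (first % 16 == 0) and ((last + 1) % 16 == 0)
    context(string.format("%s: %s", name, tostring(aligned)))
    context.par()
end
\stopluacode
\stoptext
```

Each of the three blocks should be reported as aligned, since every first code point ends in 0 and every last code point ends in F.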

The Unicode Standard gives every block a unique name that describes the common semantic nature of its code points. These names are case-insensitive, and the hyphens, spaces, and underscores in them are insignificant. For example, one can refer to the block whose Unicode name is Myanmar Extended-A as myanmarextendeda, MyanmarExtendedA, or myanmar_extended_a. ConTeXt chooses the first of these alternative styles for the names of blocks, as described below.
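
The matching rule for names can be sketched as a small normalisation step in Lua. The function normalizedname is a hypothetical helper written for this illustration, not something ConTeXt provides; it lowercases a name and drops the insignificant characters.

```tex
\starttext
\startluacode
-- Hypothetical helper, not a ConTeXt function: convert a block name to
-- lower case and remove spaces, hyphens, and underscores.
local function normalizedname(name)
    -- the outer parentheses discard the substitution count from gsub
    return (string.gsub(string.lower(name), "[%s%-_]", ""))
end

context(normalizedname("Myanmar Extended-A"))
context.par()
context(normalizedname("myanmar_extended_a"))
\stopluacode
\stoptext
```

Both calls typeset myanmarextendeda, which is the style that ConTeXt uses for this block.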

The number of code points in a block varies. Some blocks, such as Syriac Supplement, have just 16 code points, while others have tens of thousands; CJK Unified Ideographs Extension B, for example, has 42720.
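
Since a block is an interval, its size is the last code point minus the first, plus one. As a minimal sketch, the following computes the size of CJK Unified Ideographs Extension B from its range 20000–2A6DF as given in the list of blocks; the variable names are illustrative.

```tex
\starttext
\startluacode
-- Range of CJK Unified Ideographs Extension B, from the list of blocks.
local first, last = 0x20000, 0x2A6DF
-- Both ends belong to the block, hence the + 1.
context(string.format("%d", last - first + 1))
\stopluacode
\stoptext
```

This typesets 42720.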

Every assigned code point belongs to some block, but there are blocks which contain unassigned code points too; for example, the block named Telugu contains the unassigned code point 0C50. On the other hand, there are some code points, necessarily unassigned, which do not belong to any block; the code point 0870 is one such. Thus, the set of all assigned code points is a proper subset of the union of all the blocks, and the union of all the blocks is a proper subset of the Unicode code space.

ConTeXt names of Unicode blocks

ConTeXt has its own names for all the Unicode blocks. Most of them are obtained by converting the Unicode name of the block to lower case and removing the hyphens and spaces in it. The article entitled List of Unicode blocks contains a table of Unicode blocks, their ConTeXt names, and links to more information about them.

An example usage of Unicode blocks in ConTeXt

A typical use of Unicode blocks is in the definition of fallback fonts to provide glyphs for certain characters. Sometimes, when writing a document in ConTeXt, one needs to typeset special symbols that are not available in the base font of the document. In such a situation, one can specify a fallback font to provide these missing symbols.

For example, in the following document, the base font TeX Gyre Pagella does not have the glyphs for Cyrillic characters, whose code points are in the Unicode block Cyrillic. The document uses the \definefallbackfamily command to get the glyphs for this block from the DejaVu Serif font. The ConTeXt name of the block is supplied as the value of the key range in the last setup of the command.

\definefallbackfamily [mainface] [rm] [DejaVu Serif] [range=cyrillic]

\definefontfamily     [mainface] [rm] [TeX Gyre Pagella]

\setupbodyfont        [mainface]

\starttext

\startlines
’Twas brillig, and the slithy toves
Did gyre and gimble in the wabe;
All mimsy were the borogoves,
And the mome raths outgrabe.
\stoplines

\rightaligned {— Lewis Carroll, Jabberwocky}

\startlines
Варкалось. Хливкие шорьки
Пырялись по наве,
И хрюкотали зелюки,
Как мюмзики в мове. 
\stoplines

\rightaligned {— Дина Григорьевна Орловская, Бармаглот}

\stoptext

[Image: the relevant part of the PDF file obtained by running context on a file containing this document.]

The verses in the above example are from the Wikipedia article on the poem Jabberwocky[4] by Lewis Carroll.
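
A fallback may need to cover more than one block. The following sketch is a variant of the setup above and rests on two assumptions that should be checked against the ConTeXt sources: that the range key accepts a comma-separated list of block names, and that cyrillicsupplement is the ConTeXt name of the block Cyrillic Supplement.

```tex
% Assumption: range takes a comma-separated list of ConTeXt block names.
% Assumption: cyrillicsupplement names the block Cyrillic Supplement.
\definefallbackfamily [mainface] [rm] [DejaVu Serif] [range={cyrillic, cyrillicsupplement}]

\definefontfamily     [mainface] [rm] [TeX Gyre Pagella]

\setupbodyfont        [mainface]

\starttext

Latin text comes from the base font; Варкалось comes from the fallback font.

\stoptext
```

As in the original example, the fallback family is defined before the base family, so only the order and the range value matter here.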

Another example

A different application of fallback fonts arises when one wants to replace the existing glyphs for some characters in the base font with glyphs for those characters from another font. This situation is different from the one in the previous example. There, the base font did not contain glyphs for the characters of interest, and the fallback font provided the missing glyphs. Here, the base font does contain glyphs for the characters in question, but, perhaps for aesthetic reasons, the author of the document wants to replace those glyphs with glyphs from another font. In such a case, the latter font can be specified as a fallback font.

For example, the following document uses the pagella typescript to provide the base font, and uses the STIX General Regular font for mathematical script letters, which lie in the Unicode block Mathematical Alphanumeric Symbols. Instead of \definefallbackfamily which was used in the previous example, this document uses the command \definefontfallback. The ConTeXt name of the block is supplied as the third setup of this command. The last setup force=yes ensures that the glyphs of the relevant characters are replaced from the fallback font, overriding the glyphs that may exist in the base font for these characters.

\usetypescript      [pagella]

\definefontfallback [mathscript] [STIXGeneralRegular] [mathematicalalphanumericsymbols] [force=yes]

\definefontsynonym  [MathRoman]  [pagella]            [fallbacks=mathscript]

\setupbodyfont      [pagella]

\starttext

Look at this bestiary of mathematical script letters:

\startformula
𝒜, 𝒞, 𝒟, 𝒢, 𝒥, 𝒦, 𝒪, 𝒫, 𝒬, 𝒮, 𝒯, 𝒰, 𝒱, 𝒲, 𝒳, 𝒴, 𝒵
\stopformula

\stoptext

[Image: unicode-blocks-in-context-example.png, the relevant part of the PDF file obtained by running context on a file containing this document.]

The log file resulting from that run of context says

system > 13: filename=/usr/share/fonts/opentype/stix/STIXGeneral-Regular.otf ...

so context is indeed taking some of the glyphs from the fallback font, as expected. In this case, the font file is provided by the local operating system.

See also

  • \definefontfallback — manual page with more information on the ConTeXt names of blocks, and their usage.
  • char-ini.lua — source file containing the definitions of the ConTeXt names of Unicode blocks.

References

  1. The Unicode Consortium, The Unicode Standard, Version 10.0.0, The Unicode Consortium, Mountain View, CA, USA, 2017, http://www.unicode.org/versions/Unicode10.0.0/, Retrieved 2017-11-03.
  2. The Unicode Consortium, Glossary, http://www.unicode.org/glossary/, Retrieved 2017-11-03.
  3. The Unicode Consortium, Blocks.txt, ftp://www.unicode.org/Public/UNIDATA/Blocks.txt, Retrieved 2017-11-03.
  4. Wikipedia contributors, Jabberwocky, Wikipedia: The Free Encyclopaedia, 2017-11-03, 07:58 UTC, https://en.wikipedia.org/w/index.php?title=Jabberwocky&oldid=808507152, Retrieved 2017-11-03.