http://fun.contextgarden.net/encodingtable/enctable.rb?ec,texnansi,8r,8a
{{todo|I hope that the content of this section will soon move to a page of its own with a more comprehensive overview of different encodings.}}
=== A note about the ec encoding ===
The ec encoding is also known as '''cork''' or '''T1''' (<code>\usepackage[T1]{fontenc}</code> in LaTeX). Its predecessor, '''dc''', should no longer be used. Some of the glyph names in ec are old and deprecated; '''tex256''' uses the same set of glyphs, but with glyph names that are compatible with Adobe's. See also [ftp://tug.ctan.org/pub/tex-archive/info/fontname/tex256.enc tex256.enc] and the [http://partners.adobe.com/public/developer/en/opentype/aglfn13.txt Adobe Glyph List].
=== Searching for non-ASCII characters in Adobe Reader ===
Some characters (<code>\ccaron</code>, 'č', is one example) are not properly recognized by Adobe (Acrobat) Reader, especially by older versions, when searching or copying text from PDF documents. To help Adobe Reader recognize the glyphs and treat them properly, add this line to your source:
<texcode>
\input enco-pfr
</texcode>
At the time of writing, only '''il2''' and '''ec''' are supported, but support for other encodings can be added.
See also:
You can find the output/font encodings in the <tt>enco-*.tex</tt> files.
See [http://czyborra.com/charsets/iso8859.html ISO 8859] for ISO standards.
==Typesetting in UTF-8==
Use <texcode>\enableregime[utf]</texcode> in order to typeset Unicode (UTF-8) input under ConTeXt.
Unfortunately, you must save your UTF-8 encoded files ''without'' a BOM (byte order mark), because ConTeXt (or pdfTeX) does not skip it and typesets its bytes as characters instead.
==Using non-ASCII characters==
As a TeX/LaTeX user, you were probably told to type accents in the following way (the example is taken from the TeXbook, page 24):
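A minimal sketch of a complete source file, assuming the MkII-style regime mechanism described on this page; the sample words are only illustrations, and the file has to be saved as UTF-8 without a BOM:
<texcode>
% Save this file as UTF-8, without a byte order mark.
\enableregime[utf]
\starttext
Čeština, français, Größe
\stoptext
</texcode>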
<texcode>
Once upon a time, in a distant
</texcode>
===How do I know which glyph name to use?===
* use <code>\showcharacters</code>
* [http://partners.adobe.com/public/developer/en/opentype/aglfn13.txt Adobe glyph list]
* browse the ConTeXt [[source:enco-acc.tex|source]]
* ask someone to put the list of the available glyphs on the Wiki :-) <b>(or simply volunteer for that!)</b>
{{todo|list of the available glyphs}}
==How does it work?==
'''Robert Ermers''' and '''[[User:adam|Adam]]''' provided a helpful explanation of how characters are constructed in LaTeX and ConTeXt (in a discussion on the mailing list):
You know that all characters in a font have a number. If you type <code>a</code>, the font mechanism makes sure that you see an <context>a</context>. In reality, the font shows you the character that sits at the numerical position of <code>a</code>. In the Dingbats font, for example, the character at that position is not an <context>a</context>, but a symbol.
===In LaTeX===
The combination <code>\"{a}</code> can mean two things:
* in most fonts: show the character at a given numerical position, which means that there is one single character <context>\"{a}</context>.
* in some other fonts, <code>\"{a}</code> means: combine <code>"</code> with <code>a</code> and make an <context>\"{a}</context>. That is, <code>"</code> is combined with the character at the numerical position of <code>a</code>. TeX does this very well and thus constructs very acceptable diacritical signs like <code>\"{q}</code>, <code>\d{o}</code>, <code>\v{o}</code>, which do not exist in regular fonts.
If you have a font which contains <context>\"{q}</context> (<code>\"{q}</code>), <context>\d{o}</context> (<code>\d{o}</code>) or some other special characters, you may instruct TeX not to create the character, but rather to show the contents of a given numerical position in that font. That's what the .enc and .fd files under LaTeX are for.
That's also the reason there are, or used to be, special fonts for Polish, Czech and other languages: they contain predefined characters at a single numerical position, e.g. <code>\v{s}</code> and <code>\v{c}</code>, so that TeX does not have to build them anew from two signs.
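The constructed case can be tried directly in plain TeX, where it rests on the <code>\accent</code> primitive. This is a small sketch, assuming the default Computer Modern font, in which the dieresis sits at position <code>"7F</code>:
<texcode>
% plain TeX; assumes the default Computer Modern font (cmr10)
% \accent"7F puts the glyph at position "7F (the dieresis) over the next character
{\accent"7F a}  % built from two glyphs: the dieresis and the letter a
\"a             % plain TeX's \" macro expands to the same \accent construction
\bye
</texcode>
Both lines should typeset the same ä; with a font that has the letter at a position of its own, no such construction is needed.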
===In ConTeXt===
The combination <code>\"{a}</code> means one thing: <code>\adiaeresis</code> (see [[source:enco-acc.tex|enco-acc]]). This <code>\adiaeresis</code> can mean one of two things, depending on the encoding:
* Numerical position, or
* The fallback case (defined in [[source:enco-def.tex|enco-def]]), where a diaeresis/umlaut is placed atop an <context>a</context> glyph. Hyphenation implications as Hans described.
The interesting/helpful thing about ConTeXt is that internally, that glyph is given a consistent name, no matter how it is input or output. So, if you type <code>ä</code> in your given input regime, and that encoding is properly set, that numerical <code>ä</code> (e.g., character <code>#228</code> in the windows regime) is mapped to <code>\adiaeresis</code>.
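A minimal sketch of this naming consistency, assuming the UTF-8 regime and a file saved as UTF-8 without a BOM. The three input forms below should all be mapped to the same internal <code>\adiaeresis</code> and hence give the same output:
<texcode>
\enableregime[utf]
\starttext
% accent command, glyph name, and direct input
\"{a} \adiaeresis{} ä
\stoptext
</texcode>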
Wanna know what happens in '''UTF-8'''? Here's a 'simplified' explanation:
In a UTF-8 byte stream, the character <context>\"{a}</context> is represented by two bytes: <code>0xC3</code>, <code>0xA4</code>. The first byte triggers a conversion of both bytes into two different bytes, the actual Unicode number, <code>0x00 0xE4</code> (or: <code>0, 228</code>). ConTeXt then looks into its internal hashes (in this case, the [[source:unic-000.tex|unic-000]] vector), looks at the 228<sup>th</sup> element, and sees that it's <code>\adiaeresis</code>. Things then proceed as normal. :)
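The arithmetic behind that conversion can be checked in TeX itself. This is only an illustration of the two-byte UTF-8 rule, not ConTeXt's actual implementation, and it assumes an engine with e-TeX's <code>\numexpr</code> (pdfTeX, for instance):
<texcode>
% Two-byte UTF-8 has the bit pattern 110xxxxx 10yyyyyy, so
% code point = (first byte - "C0) * 64 + (second byte - "80)
\number\numexpr("C3-"C0)*64+("A4-"80)\relax % typesets 228
\bye
</texcode>
The result, 228 (<code>0xE4</code>), is the position that is then looked up in the vector.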
(It's also interesting to note that for PostScript and TrueType fonts, that number -> name -> number (glyph) mapping happens yet again in the driver. But all that is outside of TeX proper, so to say any more would be confusing.)
==External links==
* [http://en.wikipedia.org/wiki/Alphabets_derived_from_the_Latin Alphabets derived from the Latin] (to be moved to a better place/another page)
* [http://www.eki.ee/letter/ Letter database]: languages, character sets, names etc.
[[Category:Fonts]]
[[Category:International]]
