Arabic and Hebrew

Latest revision as of 12:20, 8 June 2020



TODO: overview page for the use of middle-eastern scripts (See: To-Do List)


Arabic

This is an example environment for typesetting Arabic documents in Mark IV (ConTeXt with LuaTeX). It won't work at all in Mark II (with either pdfTeX or XeTeX).

Save it as "ara-sty.tex" and use "\environment ara-sty" in your document.

\startenvironment ara-sty

\mainlanguage[arabic]

% Font setup

\definefontfeature
   [arabic]
   [mode=node,language=dflt,script=arab,
    init=yes,medi=yes,fina=yes,isol=yes,
    liga=yes,dlig=yes,rlig=yes,clig=yes,
    mark=yes,mkmk=yes,kern=yes,curs=yes]

\starttypescript [serif] [arabic]
 \definefontsynonym [Arabic-Light]       [name:arabtype] [features=arabic]
 \definefontsynonym [Arabic-Bold]        [name:arabtype] [features=arabic]
 \definefontsynonym [Arabic-Italic]      [name:arabtype] [features=arabic]
 \definefontsynonym [Arabic-Bold-Italic] [name:arabtype] [features=arabic]
\stoptypescript

\starttypescript [serif] [arabic] [name]
 \usetypescript[serif][fallback]
 \definefontsynonym [Serif]           [Arabic-Light]       [features=arabic]
 \definefontsynonym [SerifItalic]     [Arabic-Italic]      [features=arabic]
 \definefontsynonym [SerifBold]       [Arabic-Bold]        [features=arabic]
 \definefontsynonym [SerifBoldItalic] [Arabic-Bold-Italic] [features=arabic]
\stoptypescript

\starttypescript [Arabic]
  \definetypeface [Arabic] [rm] [serif] [arabic] [default] 
\stoptypescript 

\def\ArabicGlobalDir {\pagedir TRT\bodydir TRT\pardir TRT\textdir TRT}
\def\ArabicParDir    {\textdir TRT\pardir TRT}
\def\ArabicTextDir   {\textdir TRT}
\def\LatinParDir     {\textdir TLT\pardir TLT}
\def\LatinTextDir    {\textdir TLT}
\def\LatinGlobalDir  {\pagedir TLT\bodydir TLT\pardir TLT\textdir TLT}

\define\setarabic
  {\ArabicGlobalDir%
   \usetypescript[Arabic]%
   \setupbodyfont[Arabic,20pt]}

\definestartstop
  [arabicpar]
  [commands=\Arabic\ArabicParDir]

\define[1]\RT
  {{\Arabic\ArabicTextDir#1}}

\define\setlatin
  {\LatinGlobalDir%
   \usetypescript[lm]%
   \setupbodyfont[lm,20pt]}

\definestartstop
  [latinpar]
  [commands=\Arabic\LatinParDir]

\define[1]\LT
  {{\LatinTextDir#1}}

\setcharactermirroring[1]

\stopenvironment

Description

The environment is explained piece by piece below:

\mainlanguage[arabic]

Sets the main language to Arabic, so that translatable titles are translated to Arabic.

\definefontfeature
   [arabic]
   [mode=node,language=dflt,script=arab,
    init=yes,medi=yes,fina=yes,isol=yes,
    liga=yes,dlig=yes,rlig=yes,clig=yes,
    mark=yes,mkmk=yes,kern=yes,curs=yes]

Here we define OpenType font features needed to render Arabic properly.

\starttypescript [serif] [arabic]
 \definefontsynonym [Arabic-Light]       [name:arabtype] [features=arabic]
 \definefontsynonym [Arabic-Bold]        [name:arabtype] [features=arabic]
 \definefontsynonym [Arabic-Italic]      [name:arabtype] [features=arabic]
 \definefontsynonym [Arabic-Bold-Italic] [name:arabtype] [features=arabic]
 \stoptypescript

\starttypescript [serif] [arabic] [name]
 \usetypescript[serif][fallback]
 \definefontsynonym [Serif]           [Arabic-Light]       [features=arabic]
 \definefontsynonym [SerifItalic]     [Arabic-Italic]      [features=arabic]
 \definefontsynonym [SerifBold]       [Arabic-Bold]        [features=arabic]
 \definefontsynonym [SerifBoldItalic] [Arabic-Bold-Italic] [features=arabic]
\stoptypescript

\starttypescript [Arabic]
  \definetypeface [Arabic] [rm] [serif] [arabic] [default] 
\stoptypescript 

Then we define the "Arabic" typescript, using a font named "arabtype". Since this font has only a regular weight, bold and italic are set to use the same font.

\def\ArabicGlobalDir {\pagedir TRT\bodydir TRT\pardir TRT\textdir TRT}
\def\ArabicParDir    {\textdir TRT\pardir TRT}
\def\ArabicTextDir   {\textdir TRT}
\def\LatinParDir     {\textdir TLT\pardir TLT}
\def\LatinTextDir    {\textdir TLT}
\def\LatinGlobalDir  {\pagedir TLT\bodydir TLT\pardir TLT\textdir TLT}

Here we define some directional commands that are used in the following parts.

\define\setarabic
  {\ArabicGlobalDir%
   \usetypescript[Arabic]%
   \setupbodyfont[Arabic,20pt]}

\definestartstop
  [arabicpar]
  [commands=\Arabic\ArabicParDir]

\define[1]\RT
  {{\Arabic\ArabicTextDir#1}}

Here we define the "arabicpar" environment for Arabic paragraphs in a Latin context, the "\RT" command for short Arabic phrases, and the "\setarabic" command, which sets the main document direction and font to Arabic.

\define\setlatin
  {\LatinGlobalDir%
   \usetypescript[lm]%
   \setupbodyfont[lm,20pt]}

\definestartstop
  [latinpar]
  [commands=\Arabic\LatinParDir]

\define[1]\LT
  {{\LatinTextDir#1}}

These are the Latin counterparts: "latinpar", "\LT" and "\setlatin".

\setcharactermirroring[1]

This enables mirroring of BiDi-mirrored characters, like () and []. It also enables "implicit bidi", so that you don't need to explicitly specify the direction of individual Arabic sentences inside a Latin context and vice versa.

Using it

Now, let's try a "Hello World" example:

% engine=luatex

\environment ara-sty

\starttext
\setarabic

أهلا بالعالم!
\stoptext
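
As a further sketch, Latin text can be embedded in an Arabic document with the "\LT" command from the environment above. This assumes that ara-sty.tex loads as shown and that the chosen font (here arabtype) also contains Latin glyphs; neither assumption is verified here.

```tex
% engine=luatex

\environment ara-sty

\starttext
\setarabic

% \LT sets its argument left-to-right inside the right-to-left paragraph
أهلا \LT{ConTeXt} بالعالم!
\stoptext
```

With implicit bidi enabled by \setcharactermirroring[1], the explicit \LT may turn out to be unnecessary for short runs; it is shown only to illustrate the command.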

Hebrew

(Example by Rik Kabel on the mailing list, 2017-12-15)

Depending on the font, correct niqqud placement requires some combination (sometimes all) of the following font features: lang, ccmp, and script.

\definefontfeature [hebrew] [oldstyle] [
  lang=heb,
  ccmp=yes,
  script=hebr,
]
\starttext
\definedfont[name:EzraSIL*hebrew at 72pt]
\setupalign[r2l]
טְרוֹפוֹתִי
\stoptext

The same settings work well for Narkisim. For David_CLM you only need the script setting.

Correction by Hans, 2017-12-16:

A font feature set named hebrew is already predefined:

\definefontfeature [hebrewoldstyle] [oldstyle] [...]

Whether you have to set a language depends on the font (often dflt is fine).

This is the predefined set:

\definefontfeature
 [semitic-complete]
 [mode=node,analyze=yes,language=dflt,ccmp=yes,
  autoscript=position,autolanguage=position,
  init=yes,medi=yes,fina=yes,isol=yes,
  mark=yes,mkmk=yes,kern=yes,curs=yes,
  liga=yes,dlig=yes,rlig=yes,clig=yes,calt=yes]

\definefontfeature
 [semitic-simple]
 [mode=node,analyze=yes,language=dflt,ccmp=yes,
  autoscript=position,autolanguage=position,
  init=yes,medi=yes,fina=yes,isol=yes,
  mark=yes,mkmk=yes,kern=yes,curs=yes,
  rlig=yes,calt=yes]

\definefontfeature
 [arabic]
 [semitic-complete]
 [script=arab]

\definefontfeature
 [syriac]
 [arabic]
 [fin2=yes,fin3=yes,med2=yes]

\definefontfeature
 [hebrew]
 [semitic-complete]
 [script=hebr]

\definefontfeature
 [simplearabic]
 [semitic-simple]
 [script=arab]

\definefontfeature
 [simplehebrew]
 [semitic-simple]
 [script=hebr]
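
Because hebrew is predefined with script=hebr and ccmp=yes (see the set above), Rik's example can presumably be reduced to the following sketch, which defines no feature set of its own. Whether this is sufficient for a particular font is an assumption; some fonts may still need an explicit language.

```tex
\starttext
% "hebrew" is the predefined feature set, so no \definefontfeature is needed
\definedfont[name:EzraSIL*hebrew at 72pt]
\setupalign[r2l]
טְרוֹפוֹתִי
\stoptext
```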

Update by Joey McCollum, 2020-04-30:

It is a known issue that Unicode normalization (a process that sorts combining marks by their Unicode combining classes to ensure that Unicode string searches and comparisons are not hindered by different orderings of these marks) reorders niqqud and other points in a way that is not typographically or linguistically intuitive. Consider the following example:

%Set up minimal font features:
\definefontfeature[minimal][default][
    ccmp=yes,
    script=hebr
]
%Set up the main font:
\definefontfamily[hebrew] [rm] [SBL Hebrew] [features=minimal]
\setupbodyfont[hebrew]
%Set up right-to-left alignment:
\setupalign[r2l]
\starttext
    %Normalized Unicode mark order:
    בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃
\stoptext

(To compare this to a non-normalized version of the same text, copy and paste the text of Genesis 1:1 from https://tanach.us/; it cannot be included in the example above because, alas, the Wiki would apply Unicode normalization to it!)

When only the minimal set of OpenType features needed to render Hebrew points is enabled, this normalized sample text fails to typeset several points. But if you copy and paste the un-normalized text from the link above into the same example, it is typeset completely and correctly. This is because most Hebrew fonts anticipate a particular ordering of certain classes of characters in their substitution tables, and Unicode normalization reverses this ordering in some cases. This happens most often when the shin dot and sin dot (which Unicode assigns to combining classes 24 and 25) and/or the dagesh (which Unicode places in combining class 21) co-occur with vowels (which are variously assigned combining classes between 10 and 20): the fonts expect the shin/sin dot to occur first, then the dagesh, then the vowel, but Unicode normalization sorts these characters in the opposite order.

This is why, in the normalized text, the first letter of בְּרֵאשִׁ֖ית does not have its vowel typeset and the shin in הַשָּׁמַ֖יִם lacks both the dagesh and the vowel that it should have. The cantillation marks, meanwhile, are all placed correctly, because Unicode normalization and the font's substitution tables both order these marks after all of the other classes.

A number of typesetting engines, including Microsoft's Uniscribe and Xe(La)TeX, address this discrepancy by performing an on-the-fly re-sorting of the marks into a more intuitive order to ensure that they are typeset correctly. This way, the input text does not have to be reordered manually, and the Hebrew text is rendered as expected. At the time of this edit, the latest version of ConTeXt will also do this whenever the standard Hebrew featureset (i.e., features=hebrew) is enabled.
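
To try the built-in reordering, the earlier example can be switched from the minimal feature set to the predefined hebrew set. This is a sketch that assumes a sufficiently recent ConTeXt and an installed SBL Hebrew font; the family name hebrewface is chosen here only for illustration.

```tex
% Set up the main font with the predefined Hebrew feature set:
\definefontfamily[hebrewface] [rm] [SBL Hebrew] [features=hebrew]
\setupbodyfont[hebrewface]
% Set up right-to-left alignment:
\setupalign[r2l]
\starttext
    % Normalized Unicode mark order, as in the example above:
    בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃
\stoptext
```

If the points are still missing with this setup, the installed ConTeXt version may predate the reordering support described above.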