Changes

Arabic and Hebrew (view source)

Revision as of 20:06, 30 April 2020

2,837 bytes added , 20:06, 30 April 2020

[script=hebr]

</texcode>

Update by Joey McCollum, 2020-04-30:

It is a known issue that Unicode normalization (a process that sorts combining marks by their Unicode combining classes to ensure that Unicode string searches and comparisons are not hindered by different orderings of these marks) reorders niqqud and other points in a way that is not typographically or linguistically intuitive. Consider the following example:

<texcode>
% Set up minimal font features:
\definefontfeature[minimal][default][
    ccmp=yes,
    script=hebr
]

% Set up the main font:
\definefontfamily[hebrew] [rm] [SBL Hebrew] [features=minimal]
\setupbodyfont[hebrew]

% Set up right-to-left alignment:
\setupalign[r2l]

\starttext

%Normalized Unicode mark order:

בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃

%Typographically recommended mark order:

בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃

\stoptext

</texcode>

When only the minimal set of OpenType features needed to render Hebrew points is enabled, the first sample text (which has been normalized) fails to typeset several points that the second sample text (which has been left in its original, un-normalized order) renders correctly. This is because most Hebrew fonts anticipate a particular ordering of certain classes of characters in their substitution tables, and Unicode normalization reverses this ordering in some cases. This happens most often when the shin dot and sin dot (which Unicode assigns to combining classes 24 and 25, respectively) and/or the dagesh (combining class 21) co-occur with vowels (which are variously assigned combining classes between 10 and 20): the fonts expect the shin/sin dot to occur first, then the dagesh, then the vowel, but Unicode normalization sorts these characters in the opposite order. This is why the first letter of בְּרֵאשִׁ֖ית in the normalized text does not have its vowel typeset, and why the shin in הַשָּׁמַ֖יִם in the normalized text lacks both the dagesh and the vowel that it should have. The cantillation marks, meanwhile, are all placed correctly, because Unicode normalization and the font's substitution tables both order these marks after all of the other classes.
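
Because the two orderings look identical in most editors, it can help to enter a single cluster by code point. The following is a minimal sketch, not part of the original report: it assumes the same SBL Hebrew setup as above and uses <code>\utfchar</code> for explicit input. It typesets a shin carrying a qamats (combining class 18), a dagesh (21) and a shin dot (24), first in normalized order and then in the order the font is described as expecting. How each line actually renders depends on the font and the ConTeXt version.

<texcode>
% Sketch only: same minimal feature set as in the example above.
\definefontfeature[minimal][default][
    ccmp=yes,
    script=hebr
]
\definefontfamily[hebrew] [rm] [SBL Hebrew] [features=minimal]
\setupbodyfont[hebrew]
\setupalign[r2l]

\starttext

% Normalized order: shin U+05E9, qamats U+05B8, dagesh U+05BC, shin dot U+05C1
\utfchar{"05E9}\utfchar{"05B8}\utfchar{"05BC}\utfchar{"05C1}

% Order the font expects: shin U+05E9, shin dot U+05C1, dagesh U+05BC, qamats U+05B8
\utfchar{"05E9}\utfchar{"05C1}\utfchar{"05BC}\utfchar{"05B8}

\stoptext
</texcode>

If the behaviour described above applies, the first line should lose some of its marks while the second keeps all three.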

A number of typesetting engines, including Microsoft's Uniscribe and Xe(La)TeX, address this discrepancy by performing an on-the-fly re-sorting of the marks into a more intuitive order to ensure that they are typeset correctly. This way, the input text does not have to be reordered manually, and the Hebrew text is rendered as expected. At the time of this edit, the latest version of ConTeXt will also do this whenever the standard Hebrew featureset (i.e., features=hebrew) is enabled.
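
As a hedged illustration of that last point, the setup might be changed as follows. This sketch assumes that the installed ConTeXt version provides the <code>hebrew</code> feature set and that it performs the re-sorting described above. The font name is carried over from the earlier example, and the sample cluster (a shin with qamats, dagesh and shin dot in normalized order) is chosen only for illustration.

<texcode>
% Sketch only: use the standard Hebrew feature set instead of a minimal one.
\definefontfamily[hebrew] [rm] [SBL Hebrew] [features=hebrew]
\setupbodyfont[hebrew]
\setupalign[r2l]

\starttext

% Normalized input: shin U+05E9, qamats U+05B8, dagesh U+05BC, shin dot U+05C1.
% With features=hebrew the marks are expected to be re-sorted before shaping.
\utfchar{"05E9}\utfchar{"05B8}\utfchar{"05BC}\utfchar{"05C1}

\stoptext
</texcode>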

[[Category:International]]

[[Category:Fonts]]
