Changes

Arabic and Hebrew

Revision as of 20:14, 30 April 2020

95 bytes added, 20:14, 30 April 2020

→‎Hebrew

\starttext

%Normalized Unicode mark order:

בְּרֵאשִׁ֖ית בָּרָ֣א אֱלֹהִ֑ים אֵ֥ת הַשָּׁמַ֖יִם וְאֵ֥ת הָאָֽרֶץ׃

\stoptext

</texcode>

(To compare this to a non-normalized version of the same text, copy and paste the text of Genesis 1:1 from https://tanach.us/; it cannot be included in the example above because, alas, the Wiki would apply Unicode normalization to it!) When only the minimal set of OpenType features needed to render Hebrew points correctly is enabled, this normalized sample text will fail to typeset several points. But if you copy and paste the un-normalized text from the link above into the same example, you will find that it is typeset completely and correctly.

This is because most Hebrew fonts anticipate a particular ordering of certain classes of characters in their substitution tables, and Unicode normalization reverses this ordering in some cases. This happens most often when the shin dot and sin dot (which Unicode assigns to combining classes 24 and 25, respectively) and/or the dagesh (which Unicode places in combining class 21) co-occur with vowels (which are variously assigned combining classes between 10 and 20): the fonts expect the shin/sin dot to occur first, then the dagesh, then the vowel, but Unicode normalization sorts marks by ascending combining class, which yields the opposite order. This is why the first letter of בְּרֵאשִׁ֖ית in the normalized text does not have its vowel typeset, and why the shin in הַשָּׁמַ֖יִם in the normalized text lacks both the dagesh and the vowel that it should have. The cantillation marks, meanwhile, are all placed correctly, because Unicode normalization and the font's substitution tables both order these marks after all of the other classes.
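The reordering can be observed outside of ConTeXt. The following is a minimal Python sketch using the standard `unicodedata` module; the variable and function names are illustrative only. It builds a shin carrying a shin dot, a dagesh, and a qamats in the order that fonts are described above as expecting, and then applies normalization (NFC is assumed here; the source does not say which form the Wiki applies).

```python
import unicodedata

# A shin with shin dot, dagesh, and qamats, entered in the order that
# most Hebrew fonts expect: base letter, shin dot, dagesh, vowel.
SHIN, SHIN_DOT, DAGESH, QAMATS = "\u05E9", "\u05C1", "\u05BC", "\u05B8"
font_order = SHIN + SHIN_DOT + DAGESH + QAMATS


def combining_classes(text):
    """Return (code point, canonical combining class) pairs for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.combining(ch)) for ch in text]


# Normalization sorts each run of marks by ascending combining class.
# NFC is an assumption; NFD produces the same mark order for this input.
normalized = unicodedata.normalize("NFC", font_order)

print("font order:", combining_classes(font_order))
print("normalized:", combining_classes(normalized))
```

After normalization the vowel comes first, then the dagesh, then the shin dot, which is the reverse of the order in which the marks were entered.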

A number of typesetting engines, including Microsoft's Uniscribe and Xe(La)TeX, address this discrepancy by performing an on-the-fly re-sorting of the marks into a more intuitive order to ensure that they are typeset correctly. This way, the input text does not have to be reordered manually, and the Hebrew text is rendered as expected. At the time of this edit, the latest version of ConTeXt will also do this whenever the standard Hebrew featureset (i.e., features=hebrew) is enabled.
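To illustrate the idea of re-sorting, here is a minimal Python sketch. It is not the algorithm used by Uniscribe, Xe(La)TeX, or ConTeXt; the function names `font_priority` and `reorder_marks` are hypothetical, and the sketch assumes that only the shin/sin dot and the dagesh need to be moved ahead of the other marks, with everything else left in its existing relative order.

```python
import unicodedata


def font_priority(ch):
    """Rank a combining mark by the order the fonts are assumed to expect."""
    ccc = unicodedata.combining(ch)
    if ccc in (24, 25):  # shin dot, sin dot
        return 0
    if ccc == 21:        # dagesh
        return 1
    return 2             # vowels, cantillation marks, and everything else


def reorder_marks(text):
    """Re-sort each run of combining marks that follows a base character."""
    out, run = [], []
    for ch in text:
        if unicodedata.combining(ch):
            run.append(ch)
        else:
            # sorted() is stable, so marks of equal priority keep their order.
            out.extend(sorted(run, key=font_priority))
            run = []
            out.append(ch)
    out.extend(sorted(run, key=font_priority))
    return "".join(out)


# Normalized input: shin, qamats, dagesh, shin dot.
print([f"U+{ord(c):04X}" for c in reorder_marks("\u05E9\u05B8\u05BC\u05C1")])
```

Because the sort is stable and only three priority levels are used, vowels and cantillation marks stay in the order that normalization gave them; only the shin/sin dot and the dagesh move forward.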

JoeyMcCollum
