On 15/06/2018 15:53, Nathan Willis wrote:
It seems like this is what is used (the same regexps being used for all
scripts in HarfBuzz's Indic shaper):
matra_group = z{0,3}.M.N?.(H | forced_rakar)?;
[...]
halant_or_matra_group = (final_halant_group | (H.ZWJ)? matra_group{0,4});
... and that only permits four matras (total) per syllable.
I vaguely recall seeing a commit message or comment or something
indicating that this limit was there to maintain compatibility with how
Uniscribe matches syllables, but I searched around and couldn't find it
today. It was something along the lines of the Microsoft docs saying
"one matra for each type [L,R,T,B] is permitted," but it isn't clear
whether that's justified by orthography at all or is just a practical
concession that they made for some reason.
Others with more Uniscribe knowledge may know.
Indeed, the spec at
https://docs.microsoft.com/en-us/typography/script-development/devanagari#analyze-the-text
says "matra (up to one of each type: pre-, above-, below- or post- base)".
However, I'm not sure it's a good idea to enforce this restriction.
While "normal" spelling may abide by it, in casual writing people
sometimes like to use repeated matras, just as an English speaker might
write "Helloooooooo!"
E.g. see https://www.xossip.com/showthread.php?t=1498145, where the
writer uses a number of "stretched-out" spellings (search in the page
for आाााााााााााााह, for example).
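To illustrate what a four-matra cap does to such a spelling, here is a rough sketch in Python. It is a toy model and not HarfBuzz's actual grammar: the name syllable_lengths and the two character classes are assumptions made for illustration, and the real machine also handles nuktas, halants, joiners and more. It only shows that a bounded repeat such as {0,4} splits a stretched-out run of matras into several clusters, while a larger bound keeps it together.

```python
import re

# Toy model, NOT HarfBuzz's real syllable machine: one base character
# (independent vowel or consonant) followed by at most `max_matras` matras.
BASE = "[\u0904-\u0939]"    # Devanagari independent vowels and consonants
MATRA = "[\u093E-\u094C]"   # Devanagari dependent vowel signs (matras)


def syllable_lengths(text, max_matras=4):
    """Return the length, in code points, of each cluster found in `text`."""
    pattern = re.compile(f"{BASE}{MATRA}{{0,{max_matras}}}")
    lengths = []
    pos = 0
    while pos < len(text):
        m = pattern.match(text, pos)
        # A character that cannot start a syllable becomes its own cluster.
        end = m.end() if m else pos + 1
        lengths.append(end - pos)
        pos = end
    return lengths


# U+0906 (AA) followed by six U+093E (matra AA), then U+0939 (HA).
stretched = "\u0906" + "\u093E" * 6 + "\u0939"

print(syllable_lengths(stretched))                 # cap of four matras
print(syllable_lengths(stretched, max_matras=16))  # relaxed cap
```

With the cap of four, the fifth and sixth matras fall outside the first cluster and end up as stray clusters of their own; in a real shaper, stray matras of that kind are usually where the dotted circle shows up. With the relaxed cap the whole run stays in one cluster.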
JK