On 10/12/15 06:51, Behdad Esfahbod wrote:
Hi Jonathan,

Sorry for the delay.  I've been thinking about this for multiple days.
Initially my dislike for this proposal was on several principles:

- Using glyph-props in hb-buffer is a layering violation,

Yes, I figured that would be distasteful.


- Since we are in cluster-level=1 anyway, why include marks forward?

If we're ligating two bases in a sequence such as

  <baseA.0, baseB.1, mark.1>

and don't include marks forward, we'd end up with

  <ligAB.0, mark.1>

which splits the mark from the base to which it's applied. Including marks forward avoids this.
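
The difference can be sketched with a toy model of the buffer. This is not HarfBuzz code: `Glyph`, `ligate`, and the `include_marks_forward` flag are hypothetical names used only for illustration, and the merge rule is reduced to the behaviour described above.

```python
from dataclasses import dataclass, replace


@dataclass
class Glyph:
    name: str
    cluster: int
    is_mark: bool = False


def ligate(glyphs, start, end, lig_name, include_marks_forward):
    """Replace glyphs[start:end] with one ligature glyph (toy model).

    The ligature takes the lowest cluster value in the range. When
    include_marks_forward is set, marks that directly follow the range and
    share the cluster of its last glyph are moved into the ligature's cluster.
    """
    cluster = min(g.cluster for g in glyphs[start:end])
    last_cluster = glyphs[end - 1].cluster
    tail = [replace(g) for g in glyphs[end:]]  # copies; the input is untouched
    if include_marks_forward:
        for g in tail:
            if not (g.is_mark and g.cluster == last_cluster):
                break
            g.cluster = cluster
    return glyphs[:start] + [Glyph(lig_name, cluster)] + tail


def show(glyphs):
    return "<" + ", ".join(f"{g.name}.{g.cluster}" for g in glyphs) + ">"


buf = [Glyph("baseA", 0), Glyph("baseB", 1), Glyph("mark", 1, is_mark=True)]
print(show(ligate(buf, 0, 2, "ligAB", include_marks_forward=False)))
print(show(ligate(buf, 0, 2, "ligAB", include_marks_forward=True)))
```

Without the forward extension the mark keeps cluster 1 while its base has moved to cluster 0; with it, the mark follows its base.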


- Why extend backward?  One can equally easily build a font that ligates
forward, instead of backward, and you will have the same problem,

Hmm. I'm not sure I am visualizing the problem scenario you have in mind here.

AFAICS, extending backward can only be relevant when reordering has happened (so that there's a lower cluster value somewhere within the start..end range than the current cluster value of the start glyph -- e.g. because start is a left-matra that we've just moved to the front of the syllable).

(Aside: maybe it would be a useful micro-optimization to distinguish two versions of merge_clusters; one that is used when the shaper (e.g. Indic or USE) has reordered things, and does the scan-for-minimum and extend-backwards stuff, and a simpler method for use when ligating, which doesn't need to do that. This version wouldn't need to do the start-of-buffer and continue-in-outbuf check, either.)
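
A rough sketch of what such a split might look like, working on a plain list of cluster values. Both function names are hypothetical, and the "full" version only approximates the behaviour discussed in this thread (scan for the minimum, extend forward and backward); it leaves out the start-of-buffer and out-buffer handling entirely.

```python
def merge_clusters_full(clusters, start, end):
    """General merge for reordered input: scan for the minimum cluster in
    [start, end), then extend the range in both directions over neighbours
    that share a cluster value with the edge of the range."""
    out = list(clusters)
    if end - start < 2:
        return out
    cluster = min(out[start:end])
    while end < len(out) and out[end - 1] == out[end]:
        end += 1
    while start > 0 and out[start - 1] == out[start]:
        start -= 1
    for i in range(start, end):
        out[i] = cluster
    return out


def merge_clusters_simple(clusters, start, end):
    """Ligation-only merge: assumes clusters are non-decreasing (nothing has
    been reordered), so the start glyph already holds the minimum and no
    backward extension is needed."""
    out = list(clusters)
    if end - start < 2:
        return out
    cluster = out[start]
    while end < len(out) and out[end - 1] == out[end]:
        end += 1
    for i in range(start, end):
        out[i] = cluster
    return out


# Monotonic input: both versions agree.
print(merge_clusters_full([0, 1, 1, 2], 0, 2))    # [0, 0, 0, 2]
print(merge_clusters_simple([0, 1, 1, 2], 0, 2))  # [0, 0, 0, 2]

# Reordered input: only the full version finds the lower value and
# propagates it backward.
print(merge_clusters_full([2, 2, 1, 3], 1, 3))    # [1, 1, 1, 3]
print(merge_clusters_simple([2, 2, 1, 3], 1, 3))  # [2, 2, 2, 3]
```

The last call shows why the simple version would only be safe where the shaper has not reordered anything.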


- So this becomes more about not merging clusters at all, which is indeed
cluster-level=3.  The problem is, if we do that, it's not clear to me, or I
suppose to anyone, what the cluster values mean anymore.

Currently, there's a systematic description for what the cluster values mean:
"these glyphs represent those characters and we don't know anything more
granular."  With the suggested patch, the cluster values don't mean anything
anymore, because a glyph from one cluster has leaked into another cluster
and we're not telling the client about it.

Yeah, I agree this makes the meaning of "cluster" less well-defined. Though it's not clear to me how far this is really a problem...


BTW, I see Uniscribe returns a different result (equally "wrong" as HarfBuzz's):

$ hb-unicode-encode 20,633,627,644 | hb-shape.exe JNN.ttf
[lam.l=3+1107|blank=3+1|sa.l=0+1094|space=0+1]

$ hb-unicode-encode 20,633,627,644 | hb-shape.exe JNN.ttf --shaper=uniscribe
[lam.l=3+1107|blank=3+1|sa.l=2+1094|space=0+1]

I suppose when the ligature for sa.l multiplied, Uniscribe assumed that it has
decomposed to its original components.
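
To make the difference between the two results easy to spot, here is a small helper that parses the two output lines quoted above and lists the glyphs whose cluster values differ. The `parse` function and the regular expression are illustrative only; they handle just the simple `name=cluster+advance` form seen here, not offsets or other output options.

```python
import re

# One item looks like "sa.l=0+1094": glyph name, cluster, advance.
ITEM = re.compile(r"(?P<name>[^=|\[\]]+)=(?P<cluster>\d+)\+(?P<advance>-?\d+)")


def parse(line):
    """Return a list of (glyph name, cluster, advance) tuples."""
    return [(m["name"], int(m["cluster"]), int(m["advance"]))
            for m in ITEM.finditer(line)]


harfbuzz = "[lam.l=3+1107|blank=3+1|sa.l=0+1094|space=0+1]"
uniscribe = "[lam.l=3+1107|blank=3+1|sa.l=2+1094|space=0+1]"

# (glyph name, HarfBuzz cluster, Uniscribe cluster) where the two disagree
diff = [(a[0], a[1], b[1])
        for a, b in zip(parse(harfbuzz), parse(uniscribe))
        if a[1] != b[1]]
print(diff)  # [('sa.l', 0, 2)]
```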


Anyway, today I found a use-case that will definitely go wrong with your
suggested patch.  Imagine another Nastaliq font, that initially decomposes
each letter to a body, and a connector, in that order.  In a following lookup,
the connectors might ligate with the body glyph after them.  With your
suggested patch, we end up allocating the letter bodies to the cluster of
their previous letter, which is clearly wrong and will result in incorrect
cursoring.

So the scenario runs something like

  <letterA.0, letterB.1>

  <bodyA.0, connectorA.0, bodyB.1, connectorB.1>

  <bodyA.0, joinedBodyB.0, connectorB.1>

Yes, that's not ideal. :( Though whether it'll result in worse user experience than the current

  <bodyA.0, joinedBodyB.0, connectorB.0>

may be hard to say.
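
One way to compare the two outcomes is to look at them the way a client would, as runs of glyphs per cluster. The helper below is a hypothetical illustration; glyph names and cluster values are taken from the sequences above.

```python
from itertools import groupby


def clusters_seen_by_client(glyphs):
    """Group consecutive (name, cluster) pairs by cluster value."""
    return [(cluster, [name for name, _ in run])
            for cluster, run in groupby(glyphs, key=lambda g: g[1])]


with_patch = [("bodyA", 0), ("joinedBodyB", 0), ("connectorB", 1)]
current = [("bodyA", 0), ("joinedBodyB", 0), ("connectorB", 0)]

print(clusters_seen_by_client(with_patch))
# [(0, ['bodyA', 'joinedBodyB']), (1, ['connectorB'])]
print(clusters_seen_by_client(current))
# [(0, ['bodyA', 'joinedBodyB', 'connectorB'])]
```

With the patch, the client sees two clusters but the body of letter B sits in the first one; with the current behaviour, it sees a single cluster covering both letters.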


I think we might need to look outside the lookups for clues as to what's
actually going on, so we can distinguish these two legitimate cases from
each other.  E.g., when a glyph multiplies into many, we want to know which
components represent the main body of the letters and which are "side
components".  Looking at glyph advance widths is a good heuristic, but is
undesirable during substitution.  How about, we look at the GDEF class of
Component=4?  That's currently not used for anything AFAIK.  I don't know what
the implementation will look like, but it's definitely possible to tell, e.g.,
JNN developers, to give the blank glyph a GDEF class of Component...

WDYT?

Without having tried to think it through in detail, that sounds like a promising idea. Worth hacking up an implementation to test, maybe?
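
As a starting point for such a hack, here is a minimal sketch of the classification step only: given the glyphs produced by a multiplication, separate the "main body" from the "side components" by GDEF glyph class. The class numbers follow the OpenType GDEF glyph class definitions (1 to 4); the `GDEF_CLASSES` table, the `split_multiplied` helper, and the pairing of `lam.l` with `blank` are assumptions made up for illustration. How the cluster logic should then use this information is left open, as in the discussion above.

```python
from enum import IntEnum


class GlyphClass(IntEnum):
    BASE = 1
    LIGATURE = 2
    MARK = 3
    COMPONENT = 4


# Hypothetical class assignments, as a font developer might set them.
GDEF_CLASSES = {
    "lam.l": GlyphClass.BASE,
    "sa.l": GlyphClass.BASE,
    "blank": GlyphClass.COMPONENT,
}


def split_multiplied(names, gdef=GDEF_CLASSES):
    """Split the output of a multiplication into (main body, side components)."""
    main = [n for n in names if gdef.get(n) != GlyphClass.COMPONENT]
    side = [n for n in names if gdef.get(n) == GlyphClass.COMPONENT]
    return main, side


print(split_multiplied(["lam.l", "blank"]))  # (['lam.l'], ['blank'])
```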

JK

behdad

On 15-11-30 08:30 AM, Jonathan Kew wrote:
Hey Behdad,

I'm wondering if we can make merge_clusters a little more conservative....?

Here's the scenario:

Assume we start with two independent base glyphs with distinct cluster numbers:

   <glyphA.0, glyphB.1>

Then a MultipleSubst lookup expands glyphB to two parts, which both inherit
glyphB's cluster value:

   <glyphA.0, glyphB1.1, glyphB2.1>

Next, a LigatureSubst lookup combines glyphA with glyphB1. Currently, because
merge_clusters extends its target range to include any following glyphs that
share the same cluster value as the last one in the range, we'll get:

   <glyphAB1.0, glyphB2.0>

which ISTM is less than ideal. It's not clear to me that there's any totally
"right" result here, but what would seem more useful to me, at least, would be
to leave glyphB2's cluster untouched:

   <glyphAB1.0, glyphB2.1>

(In particular, this would resolve
https://bugzilla.mozilla.org/show_bug.cgi?id=1212668.)

I assume we'd still want to extend the end in merge_clusters when the
following glyph(s) are marks, so could we do something like the attached?
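
The attachment is not reproduced here, so the following is only a toy model of the idea as described in this message, not the patch itself. Glyphs are `(name, cluster, is_mark)` tuples; `multiply`, `ligate`, and the `marks_only` flag are hypothetical names.

```python
def multiply(glyphs, index, parts):
    """MultipleSubst: every part inherits the cluster of the original glyph."""
    _, cluster, is_mark = glyphs[index]
    return (glyphs[:index]
            + [(p, cluster, is_mark) for p in parts]
            + glyphs[index + 1:])


def ligate(glyphs, start, end, lig_name, marks_only):
    """LigatureSubst over glyphs[start:end], followed by a cluster merge.

    marks_only=False models the current behaviour: the merge extends over any
    following glyph that shares the cluster of the last glyph in the range.
    marks_only=True models the more conservative idea: extend over marks only.
    """
    cluster = min(c for _, c, _ in glyphs[start:end])
    last_cluster = glyphs[end - 1][1]
    ext = end
    while (ext < len(glyphs)
           and glyphs[ext][1] == last_cluster
           and (glyphs[ext][2] or not marks_only)):
        ext += 1
    merged = [(n, cluster, m) for n, _, m in glyphs[end:ext]]
    return glyphs[:start] + [(lig_name, cluster, False)] + merged + glyphs[ext:]


def show(glyphs):
    return "<" + ", ".join(f"{n}.{c}" for n, c, _ in glyphs) + ">"


buf = [("glyphA", 0, False), ("glyphB", 1, False)]
expanded = multiply(buf, 1, ["glyphB1", "glyphB2"])
print(show(expanded))                                           # <glyphA.0, glyphB1.1, glyphB2.1>
print(show(ligate(expanded, 0, 2, "glyphAB1", marks_only=False)))  # <glyphAB1.0, glyphB2.0>
print(show(ligate(expanded, 0, 2, "glyphAB1", marks_only=True)))   # <glyphAB1.0, glyphB2.1>
```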

JK

