Also things like generating PDFs that have better extractable text.

On Fri, Nov 2, 2018 at 5:00 PM Behdad Esfahbod <[email protected]> wrote:

> On Fri, Nov 2, 2018 at 4:48 PM Nathan Willis <[email protected]>
> wrote:
>
>> Hiyo. I'm revisiting the 'clusters' chapter in the User Manual, to make
>> it more consistent with the rest and hopefully easier to understand.
>> Rereading it has raised some questions....
>>
>> 1) The opening sentence says "a cluster is a sequence of code points
>> [...]"
>> ...which might be true for the initial buffer contents, but all the
>> interesting stuff happens after replacing them with glyphs. So that will
>> certainly need some changing. I know it's a can of worms to open, but what
>> if we said "characters" here? Explaining the relationship between code
>> points, characters, and glyphs can be tricky, but then again explaining
>> clusters to new readers is already difficult....
>>
>
> Right. "code points" by itself doesn't mean anything. They might be
> Unicode code points, aka characters, or glyph code points, aka glyphs.
>
> A cluster refers to a sequence of characters and their corresponding
> glyphs. Or the other way around, depending on your taste.
>
>> 2) "Most clients will use UTF-8, UTF-16, or UTF-32 indices, but the actual
>> number does not matter" ... is "indices" here referring to the buffer
>> contents (code points)?
>>
>
> Refers to position in the text that was passed to hb_buffer_add_utf8/16/32.
>
>> 3) "Moreover, it is not required for the cluster values to be
>> monotonically increasing. Most of HarfBuzz's tests are performed on
>> monotonically increasing cluster numbers but, there is no such assumption
>> in the code itself."
>>
>
> Some of that sentence is implementation details and can go. The first
> sentence is enough.
>
>> ... This is the big one. The examples that follow in subsequent
>> subsections hinge on the fact that the cluster values need to be
>> monotonically increasing. Keeping them monotonic & increasing is given as
>> the reason that clusters get merged when reordering (levels 0 and 1). So
>> this sentence sticks out. I'm not sure how to resolve that discrepancy; can
>> anyone explain how both of those pieces are supposed to fit together?
>>
>
> There's two things:
>
> 1. Whether or not the input clusters are monotonic,
>
> 2. Whether buffer cluster-level is set to any of the monotonic enum
> values.
>
> The promise is that *if* both of those are true, then the output cluster
> values are monotonic. If any of the above is false, there's no guarantee.
>
>> Finally, I am adding a short "why your software cares about clusters"
>> paragraph to the beginning. I've got cursor positioning, coloring
>> diacritics, and line breaking in mind; anything else worth mentioning?
>>
>
> Text selection. Is same as positioning but still.
>
>> Thanks,
>> Nate
>> --
>> nathan.p.willis
>> [email protected] <http://identi.ca/n8>
>>
>
> --
> behdad
> http://behdad.org/
>

--
behdad
http://behdad.org/
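
To make the two cluster points in this thread concrete, here is a minimal Python sketch. It does not call HarfBuzz. The helper names (`utf8_offsets`, `is_monotonic`, `output_monotonic_guaranteed`) and the string labels for the cluster levels are hypothetical, chosen only for illustration. The first helper assumes that the cluster value assigned by hb_buffer_add_utf8 is the position of each character in the UTF-8 text, read here as a byte offset. The last helper restates the promise above as a plain boolean check.

```python
# Illustrative sketch only: no HarfBuzz calls, all names are hypothetical.

# Labels standing in for the monotonic cluster-level enum values
# (cluster levels 0 and 1); the strings themselves are an assumption.
MONOTONE_LEVELS = {"MONOTONE_GRAPHEMES", "MONOTONE_CHARACTERS"}


def utf8_offsets(text):
    """Return the UTF-8 byte offset at which each character starts.

    Assumption: this mirrors the "position in the text" used as the
    initial cluster value when UTF-8 text is added to a buffer.
    """
    offsets = []
    pos = 0
    for ch in text:
        offsets.append(pos)
        pos += len(ch.encode("utf-8"))
    return offsets


def is_monotonic(clusters):
    """True if the values never decrease, or never increase."""
    pairs = list(zip(clusters, clusters[1:]))
    return all(a <= b for a, b in pairs) or all(a >= b for a, b in pairs)


def output_monotonic_guaranteed(input_clusters, cluster_level):
    """Both conditions must hold; otherwise nothing is promised."""
    return is_monotonic(input_clusters) and cluster_level in MONOTONE_LEVELS


clusters = utf8_offsets("h\u00e9llo")  # U+00E9 takes two bytes in UTF-8
print(clusters)
print(output_monotonic_guaranteed(clusters, "MONOTONE_GRAPHEMES"))
print(output_monotonic_guaranteed([3, 0, 1], "MONOTONE_GRAPHEMES"))
```

Note that a `False` result here only means "no guarantee"; the shaped output may still happen to be monotonic.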
_______________________________________________
HarfBuzz mailing list
[email protected]
https://lists.freedesktop.org/mailman/listinfo/harfbuzz
