Hi Martin, others,
On 15-12-21 01:59 AM, Martin Hosken wrote:
> Dear Behdad,
>
>> buf = hb.buffer_create ()
>> +class Debugger(object):
>> +    def message (self, buf, font, msg, data, _x_what_is_this):
>> +        print(msg)
>> +        return True
>> +debugger = Debugger()
>> +hb.buffer_set_message_func (buf, debugger.message, 1, 0)
>> hb.buffer_add_utf8 (buf, text.encode('utf-8'), 0, -1)
>> hb.buffer_guess_segment_properties (buf)
>
> Yippee. At last, a debug interface :) (Behdad reminds me that I have been
> asking this once per year for the last 4 years!). Thank you.
>
> OK. Now to make a great debug interface!
>
> There are two ways of doing a debug interface: Event driven and One shot.
> There are probably more, but those are the only two that come to mind now.
> One shot sends all the information needed to give all the debug information
> for a debug point in its message. This allows the debugger not to have to
> keep state, but just record the results and pass them on. Event driven sends,
> well, events to the debugger and requires the debugger to keep state.
>
> While one shot seems more inviting and is more in line with what Graphite
> does, I think for harfbuzz, I would recommend an event based debugger, where
> you send debug events at the start and end of every lookup, at recursion,
> during initial reordering and shaping, at dotted circle insertion, etc. and
> have an enum of events and let the debugger work out what it wants to do with
> that information.
Agreed about stateful.
> So, I would add an enum to the debug message to give a debug message event
> type.
My current thinking is that everything is transferred as a text API in
one-line messages. The client can transform that to an enum if desired.
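A minimal sketch of that client-side transformation, assuming a made-up
message syntax of the form "start lookup N" / "end lookup N". The syntax, the
Event enum and the classify() helper are all hypothetical; the real message
format still has to be documented.

```python
import enum


class Event(enum.Enum):
    # Hypothetical event kinds; the values are the assumed message prefixes.
    START_LOOKUP = "start lookup"
    END_LOOKUP = "end lookup"
    UNKNOWN = "unknown"


def classify(msg):
    """Map a one-line text message to (Event, rest-of-line)."""
    for event in (Event.START_LOOKUP, Event.END_LOOKUP):
        prefix = event.value
        # Match the prefix only on a word boundary.
        if msg == prefix or msg.startswith(prefix + " "):
            return event, msg[len(prefix):].strip()
    # Anything not recognized is passed through untouched.
    return Event.UNKNOWN, msg
```

A client that prefers enums can call classify(msg) inside its message
callback and switch on the result, while the API itself stays text-only.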
> One big question that always needs to be answered in the debugger is: where
> are we? Where in the buffer are we now processing. This is the idx field of
> the buffer. I don't think this is exposed in the public buffer interface. So
> it either needs to be exposed or passed as part of the debug message.
I'm unsure about this one. We don't expose the out_buf part of the buffer, so
calling client code in the middle of a transformation pass is harmful
currently. Exposing all of that, on the other hand, leaks a lot of the buffer
design, which I'd like to avoid right now. Indeed, we might end up changing the
buffer internals to accommodate the lookup direction proposal.
So, for now, no callbacks in the middle of a pass. I understand that's far
from ideal, but at least we are now answering the big question: which lookup
did what.
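To illustrate how "which lookup did what" can be answered from pass-boundary
callbacks alone, here is a rough sketch. It assumes a hypothetical
"start lookup N" / "end lookup N" message syntax, and it assumes the client
copies the buffer's glyph IDs into a list each time its callback fires; how
that snapshot is read from the buffer is left out.

```python
def lookups_that_changed(trace):
    """Return (lookup_index, before, after) for lookups that altered glyphs.

    trace is a list of (message, glyph_ids) pairs in callback order, where
    glyph_ids is a snapshot taken when the message arrived.
    """
    changed = []
    starts = {}  # lookup index (as text) -> snapshot at "start lookup"
    for msg, glyphs in trace:
        words = msg.split()
        if len(words) != 3 or words[1] != "lookup":
            continue  # not a lookup boundary in the assumed syntax
        if words[0] == "start":
            starts[words[2]] = list(glyphs)
        elif words[0] == "end" and words[2] in starts:
            before = starts.pop(words[2])
            if before != list(glyphs):
                changed.append((int(words[2]), before, list(glyphs)))
    return changed
```

This only compares glyph IDs before and after each lookup; it cannot say what
happened at an individual buffer position within the pass.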
> I suggest that rather than relying on a message to give the lookup number,
> that the lookup number be passed as a separate parameter (or in a struct or
> whatever). The lookup number can be overloaded based on event type. So we
> could have a starting high level phase event type and use the lookup to say
> whether that is initial shaping, GSUB, GPOS, etc. for example. Or we could
> have different event types for each one. That's up to you.
While for regular C APIs I fully agree with you, for this I'd rather we keep
it as a simple string. Enums and tagged-union types are a headache for
language bindings and even serialization, whereas with 5 lines of code I could
get a debugger going from Python. We just need to document the message
syntax completely, and it will include all the info that the enum-and-struct
approach does; performance is definitely not a problem here.
Plus, with a message-based API, clients can handle unknown messages to a
certain degree (eg, printing them out).
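A small sketch of a stateful client that degrades gracefully on messages it
does not understand. The "start lookup N" / "end lookup N" syntax is an
assumption for illustration, the TraceDebugger class is hypothetical, and the
callback is simplified to take only the message text rather than the full
argument list from the patch.

```python
class TraceDebugger(object):
    def __init__(self):
        self.open_lookups = []  # stack of lookups started but not yet ended
        self.unknown = []       # messages this client did not recognize

    def message(self, msg):
        words = msg.split()
        if len(words) == 3 and words[:2] == ["start", "lookup"]:
            self.open_lookups.append(words[2])
        elif len(words) == 3 and words[:2] == ["end", "lookup"]:
            # Only pop when the end matches the innermost open lookup.
            if self.open_lookups and self.open_lookups[-1] == words[2]:
                self.open_lookups.pop()
        else:
            # Unknown message: keep it and print it instead of failing.
            self.unknown.append(msg)
            print("unhandled: " + msg)
        return True  # same return value as the callback in the patch
```

A client written this way keeps working when new message kinds are added
later; it simply prints what it cannot interpret.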
> I think we need to send a message each shaper pause when the pause occurs.
Yes, one at the beginning, another at the end.
> For GPOS we need to be passing parameters like the two points in an
> attachment or the actual calculated offset in a pair or single adjustment.
> When doing class-based activities, we should be passing the class values
> involved or perhaps pointers (or offsets) to the data structures involved so
> that a debugger can cross-reference that back to source code.
GPOS is more friendly since the buffer structure is fully exposed. Though,
deferred attachments won't be exposed.
> What does that look like now:
>
> debug_message(type, buf, idx, lkupidx, void *aptr, void *bptr, msg, ...)
>
> where aptr and bptr are defined by type and lkupidx and may point to things
> like an attachment point record or a lookup record in a class based
> contextual lookup or somesuch. They may also point to debugger specific data
> structures (perhaps for an attachment point one needs a pointer to the ap
> record and 2 floats for the resolved x,y coordinates).
That's definitely one thing I *don't* want.
> You know, if we get this right, we should be able to drop the msg, ... since
> debuggers really don't want to have to parse textual messages. Yes they are
> easy for a quick trace, but not for a real debugger. But it's welcome to stay
> to make such tracing programs' lives easier, but it shouldn't contain
> anything that isn't in the other parameters. If it does, then we need a way
> to pass it outside the message.
Right. But I really don't want to add 35-and-growing different structs to
HarfBuzz, just for debugging, either. Since debuggers can recover whatever
structs they want from the message, and this is a side API I'd like to keep to
a minimum in HarfBuzz, the message API wins IMO.
> And yes, while I'm trying to define what the kitchen sink is, I'm also trying
> to keep this lightweight.
>
> I know the moment I hit send, I'll think of things I've forgotten!
lol.
I'm probably going to add shape_plan to the list of arguments. After that, if
I make a release, the API is here to stay... So, speak very loudly if you think
for whatever reason this is not workable, i.e., that there are things that
cannot be done using a message. I can't think of any.
Cheers,
behdad