Re: commit: abi: UTF8String class

Dom Lachowicz Mon, 22 Apr 2002 11:17:33 -0700

On Sun, 2002-04-21 at 22:07, Andrew Dunbar wrote:

Andrew has asked at least twice for me to reply, so I feel compelled to
now :)


I view many separate issues here, as Paul's elephant metaphor tries to
address. I feel that things should be broken up into much smaller parts
and then addressed separately. This thread has gotten *far* too long
already, and is impossible to follow. I've done my best to catch up
after this weekend's barrage.

IMO, we can look at the whole problem as a combination and interaction
of the component parts. Many of these parts, in my mind, have very clean
boundaries. Some, however, may not.

Several (but surely not all) of these important components are: 

Piece Table
GUI (dialogs, buttons, menus)
GUI (layout classes, "canvas" representation of the document)
Interaction with outside libraries/programs (eg: spell-checking code)
Disk/File representation of the document (and any limitations of the
respective XML parsers we use)

Of course there is a ripple/trickle effect throughout the code, but from
what I see, these all seem fairly self-contained. We need to list what
the design goals are for each component, and then map them to each
other. For instance we might want this (or we might not...):

-----------------------------------

GUI: input arrives in whatever format the toolkit defines (GTK+: UTF-8,
Win32: UCS-2?). Use iconv() to convert that to the backend's encoding.
When we need strings/data from the backend, convert them to whatever
format the front-end needs.
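
As a rough sketch of that GUI boundary (not existing AbiWord code), a
thin wrapper around iconv() might look like the following. The helper
name convert_encoding is hypothetical, and the encoding names "UTF-8"
and "UCS-2LE" are the ones glibc's iconv accepts; other platforms may
spell them differently.

```cpp
#include <iconv.h>

#include <cerrno>
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: convert the bytes in `input` from encoding `from`
// to encoding `to` using POSIX iconv(). Throws on invalid or truncated input.
std::string convert_encoding(const std::string& input,
                             const char* from, const char* to) {
    iconv_t cd = iconv_open(to, from);  // note the order: (tocode, fromcode)
    if (cd == (iconv_t)-1) {
        throw std::runtime_error("iconv_open failed");
    }

    std::string out;
    char buf[256];
    // iconv() takes a non-const pointer but does not modify the input bytes.
    char* in = const_cast<char*>(input.data());
    std::size_t in_left = input.size();

    while (in_left > 0) {
        char* out_ptr = buf;
        std::size_t out_left = sizeof buf;
        std::size_t rc = iconv(cd, &in, &in_left, &out_ptr, &out_left);
        out.append(buf, sizeof buf - out_left);
        // E2BIG only means the scratch buffer filled up; loop and continue.
        if (rc == (std::size_t)-1 && errno != E2BIG) {
            iconv_close(cd);
            throw std::runtime_error("iconv failed: bad or truncated input");
        }
    }

    iconv_close(cd);
    return out;
}
```

A GTK+ front-end would then call something like
convert_encoding(text, "UTF-8", "UCS-2LE") on the way in and the reverse
on the way out, so the backend only ever sees one encoding.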

Spell-checking code: aspell/pspell can support various encodings, which
might be a big win in the long run if we use it. As it stands, we
already map UCS-2 to something that ispell can understand (through
hash-encoding files). Defining an XXX->ispell conversion isn't that
tough.

Disk/file: Do we want this to be the same encoding as in the PT? Do we
want to store this in a user's native locale? Do we want to store this
in UTF-8 everywhere and be done with it? There are darn good reasons for
all of these choices. Let's argue out their merits.

Piece Table: Do we use fixed or variable width encodings? I'd probably
be in favor of fixed-width encodings (UCS-2 or a 32-bit encoding such
as UCS-4). Is processing UTF-8 computationally intensive? Probably a
little, but nothing outrageous.
Will picking a fixed-width encoding just screw us over down the road?
Beats me.
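
To make the fixed vs. variable width trade-off concrete, here is a
minimal sketch (again, not AbiWord code; both function names are made up
for illustration). Decoding UTF-8 into 32-bit code points is a short
loop, but finding the n-th character in a UTF-8 buffer needs a scan,
whereas in a fixed-width buffer it is plain indexing. The decoder is
deliberately simple and assumes it does not need to reject overlong
forms or surrogates.

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Decode UTF-8 bytes into fixed-width 32-bit code points.
std::vector<std::uint32_t> utf8_to_utf32(const std::string& s) {
    std::vector<std::uint32_t> out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::uint32_t cp;
        std::size_t extra;  // number of continuation bytes that follow
        if (lead < 0x80)                { cp = lead;        extra = 0; }
        else if ((lead & 0xE0) == 0xC0) { cp = lead & 0x1F; extra = 1; }
        else if ((lead & 0xF0) == 0xE0) { cp = lead & 0x0F; extra = 2; }
        else if ((lead & 0xF8) == 0xF0) { cp = lead & 0x07; extra = 3; }
        else throw std::runtime_error("invalid UTF-8 lead byte");

        if (extra > s.size() - i - 1) {
            throw std::runtime_error("truncated UTF-8 sequence");
        }
        for (std::size_t k = 1; k <= extra; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c & 0xC0) != 0x80) {
                throw std::runtime_error("invalid UTF-8 continuation byte");
            }
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(cp);
        i += extra + 1;
    }
    return out;
}

// Byte offset at which the n-th character (0-based) starts in a UTF-8
// string. This is a linear scan; with a fixed-width buffer the same
// lookup is just buffer[n]. Returns s.size() if n is past the end.
std::size_t utf8_byte_offset(const std::string& s, std::size_t n) {
    for (std::size_t i = 0; i < s.size(); ++i) {
        // Any byte that is not 10xxxxxx starts a new character.
        if ((static_cast<unsigned char>(s[i]) & 0xC0) != 0x80) {
            if (n == 0) return i;
            --n;
        }
    }
    return s.size();
}
```

The flip side is memory: the fixed-width vector spends four bytes on
every character, including plain ASCII.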

-----------------------------------

But then there are theoretical arguments against the separation
proposed above:

* ) Computational overhead/"slowness"
* ) Need to keep track of and use other bits of data (encodings for
various string types, if applicable)
* ) Increased memory usage
* ) ....

-----------------------------------

Find the boundaries and different components. Find out what works best
for each of them, and then make sure that they can work well with each
other. This is either one giant impossible problem, or many smaller
solvable ones.

Please split this thread into separate threads and discuss them
individually. What are the concerns that we see? What problems are we
trying to solve and how will choosing XYZ (best) solve them for us? What
new problems does that solution create? We'll see that there's probably
a best fit somewhere.

Cheers,
Dom
