On Sun, 2002-04-21 at 22:07, Andrew Dunbar wrote:

Andrew has asked at least twice for me to reply, so I feel compelled to now :)
I see many separate issues here, as Paul's elephant metaphor tries to address. I feel that things should be broken up into much smaller parts and then addressed separately. This thread has gotten *far* too long already, and is impossible to follow. I've done my best to catch up after this weekend's barrage.

IMO, we can look at the whole problem as a combination and interaction of the component parts. Many of these parts, in my mind, have very clean boundaries. Some, however, may not. Several (but surely not all) of these important components are:

  Piece Table
  GUI (dialogs, buttons, menus)
  GUI (layout classes, "canvas" representation of the document)
  Interaction with outside libraries/programs (eg: spell-checking code)
  Disk/File representation of the document (and any limitations of the respective XML parsers we use)

Of course there is a ripple/trickle effect throughout the code, but from what I see, these all seem fairly self-contained. We need to list what the design goals are for each component, and then map them to each other. For instance we might want this (or we might not...):

-----------------------------------

GUI: input comes in whatever format the toolkit defines (GTK+: UTF-8, Win32: UCS-2?). Use iconv() to convert that for the backend. For the times when we need strings/data from the backend, convert it to whatever format the front-end needs.

Spell-checking code: aspell/pspell can support various encodings, which might be a big win in the long run if we use it. As it stands, we already map UCS-2 to something that ispell can understand (through hash-encoding files). Defining an XXX->ispell conversion isn't that tough.

Disk/file: Do we want this to be the same encoding as in the PT? Do we want to store this in a user's native locale? Do we want to store this in UTF-8 everywhere and be done with it? There are darn good reasons for all of these choices. Let's argue out their merits.

Piece Table: Do we use fixed or variable width encodings? I'd probably be in favor of fixed-width encodings (UCS-2, or 32-bit UCS-4). Is processing UTF-8 computationally intensive? Probably a little, but nothing outrageous. Will picking a fixed-width encoding just screw us over down the road? Beats me.

-----------------------------------

But then there are theoretical arguments against the separation proposed above:

* ) Computational overhead/"slowness"
* ) Need to keep track of and use other bits of data (encodings for various string types, if applicable)
* ) Increased memory usage
* ) ....

-----------------------------------

Find the boundaries and the different components. Find out what works best for each of them, and then make sure that they can work well with each other. This is either one giant impossible problem, or many smaller solvable ones.

Please split this thread into separate threads and discuss them individually. What are the concerns that we see? What problems are we trying to solve, and how will choosing XYZ (best) solve them for us? What new problems does that solution create? We'll see that there's probably a best fit somewhere.

Cheers,
Dom
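
To make the fixed- versus variable-width trade-off for the Piece Table a bit more concrete, here is a rough sketch in Python (not AbiWord code; the function names are made up for illustration). It measures how many bytes each character takes in UTF-8, UTF-16 and UTF-32, and contrasts looking up the n-th character by plain offset arithmetic with scanning a UTF-8 buffer. Note that Python's "utf-16-le" codec stands in for UCS-2 here; true UCS-2 cannot represent characters outside the Basic Multilingual Plane at all, whereas UTF-16 falls back to 4-byte surrogate pairs for them.

```python
def encoded_widths(text):
    """Map each encoding to the sorted byte widths its characters need."""
    encodings = ("utf-8", "utf-16-le", "utf-32-le")
    return {
        enc: sorted({len(ch.encode(enc)) for ch in text})
        for enc in encodings
    }


def nth_char_utf32(buf, n):
    """Fixed width: the n-th character always starts at byte offset 4 * n."""
    return buf[4 * n:4 * n + 4].decode("utf-32-le")


def nth_char_utf8(buf, n):
    """Variable width: scan from the start, counting non-continuation bytes."""
    index = -1
    start = None
    for pos, byte in enumerate(buf):
        if byte & 0xC0 != 0x80:  # not a 10xxxxxx continuation byte
            index += 1
            if index == n:
                start = pos
            elif index == n + 1:
                return buf[start:pos].decode("utf-8")
    if start is None:
        raise IndexError("character index out of range")
    return buf[start:].decode("utf-8")


if __name__ == "__main__":
    sample = "a\u00e9\u20ac"  # 'a', e-acute, euro sign
    print(encoded_widths(sample))
    print(nth_char_utf32(sample.encode("utf-32-le"), 2))
    print(nth_char_utf8(sample.encode("utf-8"), 2))
```

The scan in nth_char_utf8 is the "probably a little, but nothing outrageous" cost mentioned above; the 4 bytes per character in UTF-32 is the memory side of the same trade.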
