> GUI: input comes in in whatever format the toolkit defines (GTK+ ->
> UTF-8, Win32: UCS-2?). iconv() to convert that back to the backend.
> For the times when we need strings/data from the backend, convert it
> to whatever format the front-end needs.
This seems to be the most logical approach. Most of the GUI strings are static; they need only one translation, at load time. The interaction between the GUI and the backend is virtually nil, save keyboard input, and considering human typing speeds, conversion to a different encoding is not a real performance issue.

> Disk/file: Do we want this to be the same encoding as in the PT? Do we
> want to store this in a user's native locale? Do we want to store this
> in UTF-8 everywhere and be done with it? There are darn good reasons
> for all of these choices. Let's argue out their merits.

Currently we are not consistent: the Win32 build uses UTF-8, the Linux build the encoding of the current locale. One advantage of using the locale encoding is file size, for unless you use only characters from basic ASCII, UTF-8 needs at least two bytes for each character outside that range. The other advantage of using the locale encoding is that the user can view, search, etc. the raw files. This is quite important to a number of users, and I think we should retain it. What I would like to see, however, is for the user to be able to change the default to an encoding of his or her choice.

> Piece Table: Do we use fixed or variable width encodings? I'd probably
> be in favor of fixed-width encodings (UCS-2 or 32). Is processing
> UTF-8 computationally intensive? Probably a little, but nothing
> outrageous. Will picking a fixed-width encoding just screw us over
> down the road? Beats me.

I strongly favour a fixed-width encoding, because it is so much easier to handle. A variable-width encoding would save some memory for some users and waste it for others. Picking a fixed-width encoding will not limit us in any way in the future, and neither will a variable-width one -- they are just encodings, that's all.

Tomas
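[Editorial note: the "easier to handle" point above can be illustrated with a short sketch. With a fixed-width encoding such as UCS-4, finding the i-th character in a piece-table buffer is plain array indexing; with UTF-8 you must walk the bytes and skip continuation bytes. The helper names below are hypothetical, not AbiWord code.]

```c
#include <stdint.h>
#include <stddef.h>

/* Fixed width (UCS-4): O(1) random access. */
uint32_t char_at_ucs4(const uint32_t *buf, size_t i)
{
    return buf[i];
}

/* Variable width (UTF-8): O(n) -- scan forward, skipping
 * continuation bytes, which have the bit pattern 10xxxxxx. */
const char *char_at_utf8(const char *buf, size_t i)
{
    const unsigned char *p = (const unsigned char *)buf;
    while (i > 0) {
        p++;                                 /* leave current start byte */
        while ((*p & 0xC0) == 0x80)
            p++;                             /* skip continuation bytes */
        i--;
    }
    return (const char *)p;
}
```

For example, in the UTF-8 string "na\xc3\xafve" ("naive" with i-diaeresis, where the third character occupies two bytes), the fourth character starts at byte offset 4, not 3 -- exactly the bookkeeping a fixed-width piece table avoids.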
