> GUI: input comes in in whatever format the toolkit defines (GTK+ ->
> UTF-8, Win32: UCS-2?). iconv() to convert that back to the backend.
> For the times when we need strings/data from the backend, convert it
> to whatever format the front-end needs.
This seems to be the most logical approach. Most of the GUI strings are static; they need only one translation, at load time. The interaction between the GUI and the backend is virtually nil, save keyboard input, and considering human typing speeds, conversion to a different encoding is not a real performance issue.

> Disk/file: Do we want this to be the same encoding as in the PT? Do we
> want to store this in a user's native locale? Do we want to store this
> in UTF-8 everywhere and be done with it? There are darn good reasons
> for all of these choices. Let's argue out their merits.

Currently we are not consistent: the Win32 build uses UTF-8, the Linux build the encoding of the current locale. One advantage of using the locale encoding is file size, for unless you use only characters from basic ASCII, UTF-8 needs at least two bytes for each character outside that range. The other advantage of using the locale encoding is that the user can view, search, etc. the raw files. This is quite important to a number of users, and I think we should retain it. What I would like to see, however, is for the user to be able to change the default to an encoding of his or her choice.

> Piece Table: Do we use fixed or variable width encodings? I'd probably
> be in favor of fixed-width encodings (UCS-2 or 32). Is processing
> UTF-8 computationally intensive? Probably a little, but nothing
> outrageous. Will picking a fixed-width encoding just screw us over
> down the road? Beats me.

I strongly favour a fixed-width encoding, because it is so much easier to handle. A variable-width encoding would save some memory for some users and waste it for others. Picking a fixed-width encoding will not limit us in any way in the future, and neither will a variable-width one -- they are just encodings, that's all.

Tomas
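[Editorial note: the "easier to handle" point above can be illustrated with a short sketch. With a fixed-width encoding such as UCS-4, finding the i-th character in a piece-table buffer is plain array indexing; with UTF-8 you must walk the bytes and skip continuation bytes. The helper names below are hypothetical, not AbiWord code.]

```c
#include <stdint.h>
#include <stddef.h>

/* Fixed width (UCS-4): O(1) random access. */
uint32_t char_at_ucs4(const uint32_t *buf, size_t i)
{
    return buf[i];
}

/* Variable width (UTF-8): O(n) -- scan forward, skipping
 * continuation bytes, which have the bit pattern 10xxxxxx. */
const char *char_at_utf8(const char *buf, size_t i)
{
    const unsigned char *p = (const unsigned char *)buf;
    while (i > 0) {
        p++;                                 /* leave current start byte */
        while ((*p & 0xC0) == 0x80)
            p++;                             /* skip continuation bytes */
        i--;
    }
    return (const char *)p;
}
```

For example, in the UTF-8 string "na\xc3\xafve" ("naive" with i-diaeresis, where the third character occupies two bytes), the fourth character starts at byte offset 4, not 3 -- exactly the bookkeeping a fixed-width piece table avoids.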
