[Adding [EMAIL PROTECTED] to the list of recipients.]

On Tue, Sep 27, 2005 at 02:20:27PM +0200, Agustin Martin wrote:
> On Wed, Sep 21, 2005 at 04:32:06PM -0700, Anton Zinoviev wrote:
> >
> > Changes:
> >  bgoffice (3.0-5) unstable; urgency=low
> ...
> >    * Files /etc/emacs21/site-start.d/90{aspell-bg,ibulgarian}.el to
> >      codepage-setup cp1251. It is still not clear to me how to support
> >      spelling of Bulgarian UTF-8 texts in Emacs.
>
> This should be internally handled by most {x}emacs if
> buffer-file-coding-system is set to the encoding instead to
> 'undecided' or equivalent. Notably xemacs21-nomule does not support
> that. ispell.el will recode that UTF-8 to the encoding declared by
> the dictionary when sending strings and the other way back when
> receiving them. That should be transparent to the user, unless the
> original UTF-8 has characters that cannot be recoded to the single
> byte encoding, leading to misalignment errors (like in #205516).
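To make the recoding step described above concrete, here is a minimal Python sketch. It is not what ispell.el does internally; the function names are hypothetical and the sample words are my own. It only shows that text can make the round trip through a single-byte encoding when every character exists there, and that the conversion fails otherwise.

```python
# Sketch only: mimics the recoding step described above, not ispell.el itself.
# Function names and sample words are illustrative assumptions.

def recode_for_dictionary(text: str, dict_encoding: str) -> bytes:
    """Encode buffer text in the single-byte encoding the dictionary declares."""
    return text.encode(dict_encoding)  # raises UnicodeEncodeError if impossible


def recodable(text: str, dict_encoding: str) -> bool:
    """Return True if every character of text exists in dict_encoding."""
    try:
        recode_for_dictionary(text, dict_encoding)
    except UnicodeEncodeError:
        return False
    return True


word = "дума"                         # a Cyrillic word
sent = recode_for_dictionary(word, "cp1251")
received = sent.decode("cp1251")      # the way back is lossless here

print(len(word.encode("utf-8")), len(sent))  # byte lengths differ: 8 4
print(received == word)                      # True
print(recodable("żółw", "cp1251"))           # False: Polish letters are not in cp1251
```

The differing byte lengths are a reminder that positions counted in the UTF-8 text and in the recoded text are not interchangeable.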
For me this works only for 8-bit coding systems. :-( For UTF-8 encoded
buffers "M-x ispell-buffer" works only on words that do not contain
non-Latin1 letters. The other words (i.e. all of them for a non-Latin
language) are simply skipped. (I can observe this because the Bulgarian
dictionary for aspell accepts both Bulgarian and English words, an
advantage of Bulgarian being a non-Latin language.)

There is also another weird problem I'd like to ask about. I found it to
be reproducible for all non-ISO-8859-1 dictionaries for aspell, for
example aspell-pl (Latin2) and aspell-bg (Cyrillic). I have the
following setup in my ~/.emacs:

  (custom-set-variables
   '(ispell-program-name "aspell")
   '(ispell-dictionary "bulgarian"))  ; or "polish"

Then I load a file and do "M-x ispell-buffer". The result is

  Ispell misalignment: word `ZP' point 169; probably incompatible versions

However, if I manually select the Bulgarian (resp. Polish) language with
"M-x ispell-change-dictionary" there is no problem (that is, for 8-bit
coding systems). Ispell works fine with a default dictionary; only
aspell requires manual setting of the dictionary for every buffer.

I have not set up a language environment for Emacs. I work in a UTF-8
locale and when I want to open a non-UTF-8 document I use "C-x RET c
coding_system C-x C-f".

> >    * Add entries for different Emacs versions in ibulgarian.info-ispell and
> >      aspell-bg.info-aspell. Thanks to Ivan Raikov, closes: #321040.
>
> Seems that xemacs21 also does not support cp1251. The summary seems to be
>
> emacs20:  nothing
> emacs21:  cp1251
> emacs22:  cp1251, windows-1251
> xemacs21: windows-1251
>
> I would forget emacs20, that was not even shipped with sarge (and whose
> iso-8859-1 entry was wrong), and concentrate in leaving only the cp1251
> entry, that also matches aspell.

The package language-env used to trick Emacs20 into believing that the
user works with ISO 8859-1 while it sets up a CP1251 font. That's why
there is an iso-8859-1 entry for a Cyrillic language.
But you are right - Emacs20 is not important any more.

> The only problem is (emacs20 discarded)
> with xemacs21, and seems to be easily fixable defining cp1251 as an alias to
> windows-1251 for xemacs.

I can add that in an initialization file.

> I have seen another problem in the ispell entry name. While all utf-8
> entries I tried displayed as raw chars in my latin1 environment when used
> in a debconf prompt, showing all chars, the bulgarian entry seems to only
> show the first char (as a 3 byte UTF-8 char) and nothing of the remaining
> chars.

There are only 2 byte UTF-8 chars there but the fourth byte is \212 and
is not part of ISO 8859-1.

> I do not have a clear position regarding this last, when the use of utf8
> was introduced in policy seemed that all utf8 chars were to be displayed as
> multibyte chains in single byte encodings, leaving in the worst case the
> english translation readable. But this case confuses me, we should probably
> suggest trying first some sort of 7bit 'native' transliteration when possible
> instead of directly suggesting the use of UTF8, or at least using something
> like
>
> 7bit_transliteration [UTF-8_native_name] (english translation)
>
> when utf8 is used. I hope that would at least make the 7bit_transliteration
> readable in the worst case, when something in the utf8 string confuses
> whiptail (but I did not check that). This seems to not affect readline or
> gnome frontends. Another possibility would be to leave things as they
> currently are, expecting utf8 support be improved in the meantime.
>
> What do you think?

I think the best solution is to insert somewhere the command

  iconv -c -futf-8 -t`locale charmap`

Anton Zinoviev
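P.S. To illustrate what the iconv call above would do to an entry name, here is a rough Python approximation. It is a sketch under assumptions: the function name is hypothetical, the entry name "български (Bulgarian)" is only an assumed example, and Python's errors="ignore" stands in for the -c option (drop characters the target charset cannot represent).

```python
# Rough approximation of: iconv -c -f utf-8 -t <charset>
# The function name and the sample entry name are illustrative assumptions.
import locale


def drop_unrepresentable(utf8_bytes: bytes, target_charset: str) -> bytes:
    """Recode UTF-8 bytes, silently dropping characters the target lacks."""
    text = utf8_bytes.decode("utf-8")  # input is expected to be valid UTF-8
    return text.encode(target_charset, errors="ignore")


def current_charmap() -> str:
    """Stand-in for `locale charmap`: the encoding Python reports for the locale."""
    return locale.getpreferredencoding(False)


name = "български (Bulgarian)".encode("utf-8")   # assumed entry name

# For this assumed name the fourth UTF-8 byte is 0x8A, i.e. octal 212.
print(oct(name[3]))

# In a Latin-1 locale only the ASCII part would survive.
print(drop_unrepresentable(name, "iso-8859-1"))

# In a cp1251 locale nothing is lost.
print(drop_unrepresentable(name, "cp1251").decode("cp1251"))
```

In real use the target would come from current_charmap() rather than being hard-coded; the two fixed charsets are there only to make the effect visible.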