[issue16343] PyUnicode_FromFormatV() doesn't support utf-8 text
New submission from Mariano Reingart: Working on an internationalization proposal <http://python.org.ar/pyar/TracebackInternationalizationProposal>, I got stuck at #9769: multibyte encodings (like UTF-8) are not supported by PyUnicode_FromFormatV(). Besides my proposal, I think UTF-8 should be supported for consistency with the other Unicode functions, like PyUnicode_FromString() or even unicode_fromformat_arg(). Attached is a patch that:
- enhances the iterator to detect multibyte sequences, with sanity checks on start and continuation bytes
- replaces unicode_write_cstr with PyUnicode_DecodeUTF8Stateful
- adds tests
Hope it helps. This is my first patch for CPython and my C skills are a bit rusty, so excuse any newbie glitches.
-- components: Interpreter Core, Unicode files: pyunicode_fromformat_utf8.patch keywords: patch messages: 173996 nosy: ezio.melotti, reingart priority: normal severity: normal status: open title: PyUnicode_FromFormatV() doesn't support utf-8 text type: enhancement versions: Python 3.4 Added file: http://bugs.python.org/file27755/pyunicode_fromformat_utf8.patch ___ Python tracker <http://bugs.python.org/issue16343> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
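A rough Python model of the scanning described in the patch may help: walk the literal part of the format string byte by byte, use the lead byte to find the expected UTF-8 sequence length, check that the following bytes are continuation bytes (0x80 to 0xBF), and decode the validated run as UTF-8 instead of writing it as ASCII. This is only an illustration under those assumptions, not the C code in pyunicode_fromformat_utf8.patch; utf8_seq_len and split_literal are hypothetical names. Because every byte of a multibyte UTF-8 sequence is 0x80 or above, a bytewise search for '%' can never stop in the middle of a character.

```python
from typing import Tuple


def utf8_seq_len(lead: int) -> int:
    """Expected UTF-8 sequence length for a lead byte, or 0 if the byte
    cannot start a sequence (continuation byte, C0/C1, or above F4)."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    return 0


def split_literal(fmt: bytes, start: int) -> Tuple[str, int]:
    """Consume literal text from fmt[start:] up to the next '%' (or the end),
    checking start and continuation bytes; return (text, next_index)."""
    i = start
    while i < len(fmt) and fmt[i] != ord("%"):
        n = utf8_seq_len(fmt[i])
        if n == 0:
            raise ValueError(f"invalid UTF-8 start byte 0x{fmt[i]:02x} at {i}")
        tail = fmt[i + 1:i + n]
        if len(tail) != n - 1 or any(not 0x80 <= b <= 0xBF for b in tail):
            raise ValueError(f"truncated or invalid UTF-8 sequence at {i}")
        i += n
    # Decode the validated run as UTF-8 rather than ASCII. The checks above
    # are coarse: decode() still rejects cases they miss, such as encoded
    # surrogates, raising UnicodeDecodeError (a ValueError subclass).
    return fmt[start:i].decode("utf-8"), i
```

For example, split_literal("¡Hola %s".encode("utf-8"), 0) returns ("¡Hola ", 7): the inverted exclamation mark takes two bytes, so the '%' sits at byte offset 7.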
[issue16344] Traceback Internationalization Proposal
New submission from Mariano Reingart: I'm opening this ticket to organize patches for a proposal of GETTEXT-based message translation for exceptions/tracebacks, as described in: <http://python.org.ar/pyar/TracebackInternationalizationProposal> This requires the patch in issue #16343. Attached is a patch for a proof of concept; it includes:
- pyi18n.h: header for the Py_GETTEXT macro definition
- Locale/es.po: sample Spanish message translation file
- test_i18n.py: basic tests
- errors.c: patched PyErr_SetString(), PyErr_Format()
- traceback.c: patched tb_displayline(), PyTraceBack_Print()
- pythonrun.c: patched _Py_InitializeEx_Private()
- site.py: patched main() and added seti18n()
-- components: Interpreter Core, Unicode files: traceback_internationalization_proposal.patch keywords: patch messages: 173998 nosy: ezio.melotti, reingart priority: normal severity: normal status: open title: Traceback Internationalization Proposal type: enhancement versions: Python 3.4 Added file: http://bugs.python.org/file27756/traceback_internationalization_proposal.patch ___ Python tracker <http://bugs.python.org/issue16344> ___
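The key ordering in a gettext-aware PyErr_Format() is to translate the format string (the msgid) before interpolating the arguments, so the catalog key never depends on runtime values. A minimal Python sketch of that idea, assuming an in-memory catalog in place of a compiled Locale/es.mo file; DictTranslations, format_error and the Spanish string are hypothetical and not taken from the patch:

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical in-memory catalog standing in for a compiled .mo file."""

    def __init__(self, messages):
        super().__init__()
        self._messages = dict(messages)

    def gettext(self, message):
        # Like gettext, fall back to the untranslated msgid when it is missing.
        return self._messages.get(message, message)


def format_error(translations, fmt, *args):
    """Translate the format string first, then apply %-interpolation."""
    return translations.gettext(fmt) % args


es = DictTranslations({"name '%s' is not defined": "el nombre '%s' no está definido"})
print(format_error(es, "name '%s' is not defined", "foo"))
print(format_error(gettext.NullTranslations(), "name '%s' is not defined", "foo"))
```

The first call prints el nombre 'foo' no está definido and the second prints name 'foo' is not defined. With real catalogs, gettext.translation(domain, localedir, languages, fallback=True) returns a NullTranslations object when no .mo file is found, which gives the same untranslated fallback as the second call.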
[issue16344] Traceback Internationalization Proposal
Mariano Reingart added the comment: This was discussed on python-ideas two years ago (I've resurrected the thread there). Sadly, I didn't have time for this before, but since we have a CPython sprint at PyCon Argentina 2012 in 15 days, maybe it would be a good idea to discuss this again. Sorry if I've made any mistakes; this is my second patch here, and my C skills are rusty, as I mentioned in the other issue.
[issue16344] Traceback Internationalization Proposal
Mariano Reingart added the comment: BTW, I've written a draft PEP for this (attached); the online version is at <http://python.org.ar/pyar/TracebackInternationalizationProposal>. Just let me know if it has to be uploaded or discussed elsewhere. -- Added file: http://bugs.python.org/file27757/pep_i18n_traceback.txt
[issue16343] PyUnicode_FromFormatV() doesn't support utf-8 text
Mariano Reingart added the comment: I thought #9769 was closed (in fact, that patch was already applied). Currently, PyUnicode_FromFormatV() doesn't handle non-ASCII text at all. Maybe I misread the part of the comments telling me to open a new issue; sorry for that.
[issue9769] PyUnicode_FromFormatV() doesn't handle non-ascii text correctly
Mariano Reingart added the comment: (moved from issue #16343) Working on an internationalization proposal <http://python.org.ar/pyar/TracebackInternationalizationProposal> (issue #16344), I got stuck at this problem (#9769): multibyte encodings (like UTF-8) are not supported by PyUnicode_FromFormatV(). Besides my proposal, I think UTF-8 should be supported for consistency with the other Unicode functions, like PyUnicode_FromString() or even unicode_fromformat_arg(). Attached is a patch that:
- enhances the iterator to detect multibyte sequences, with sanity checks on start and continuation bytes
- replaces unicode_write_cstr with PyUnicode_DecodeUTF8Stateful
- adds tests
Hope it helps. This is my first patch for CPython and my C skills are a bit rusty, so excuse any newbie glitches.
-- nosy: +reingart Added file: http://bugs.python.org/file27771/pyunicode_fromformat_utf8.patch ___ Python tracker <http://bugs.python.org/issue9769> ___
[issue16344] Traceback Internationalization Proposal
Mariano Reingart added the comment: "serious" developers? Sorry, but I think that is an unfortunate phrase that goes against the Python Diversity Statement. What about young pupils? What about non-programmers (e.g. accountants)? In some places (like public schools in my country), English is not taught formally until university. And I don't think non-English speakers are just a subset of users. Come on, English is not even the top native tongue (that is Mandarin Chinese). English may be one of the most spoken languages, but even by that metric it only reaches about 1/7th of the world population. Other languages like Spanish or Portuguese are also rising. http://en.wikipedia.org/wiki/Language#Linguistic_diversity

BTW, as the draft says, Python is the odd one out here, as other error messages are already translated (including the OS ones, even inside Python!):

    C:\Python32>python
    Python 3.2.2 (default, Sep 4 2011, 09:51:08) [MSC v.1500 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import os
    >>> os.listdir("J:\\")
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    WindowsError: [Error 3] El sistema no puede encontrar la ruta especificada: 'J:\\*.*'

PostgreSQL already translates error messages too:

    C:\Program Files\PostgreSQL\9.0\bin>psql
    psql (9.0.3)
    Digite «help» para obtener ayuda.

    Mariano=> SELECT * FROM nowhere;
    ERROR: no existe la relación «nowhere»
    LÍNEA 1: SELECT * FROM nowhere;
                           ^

And Bash too:

    reingart@desktop:~/cpython$ ls /nowhere
    ls: no se puede acceder a /nowhere: No existe el archivo o el directorio

Of course, there is no need to translate keywords or libraries (just as SQL statements and bash commands are not translated, only messages are). I don't see why this would cause confusion; on the contrary, I think Python would become more consistent with other tools and thus easier to use.
The mechanism to restore the language is the common one (used by almost every other application that supports i18n):

    >>> locale.setlocale(locale.LC_MESSAGES, "C")

It should not be difficult for "serious" programmers to handle that :-) If that is a concern, a command-line parameter, an environment variable or a shortcut in the locale module could be implemented. Anyway, people will not necessarily face the localized version by default, and if, for example, a teacher has to jump to a student's machine, they could surely use it, as the messages will probably be in the spoken language of the country (BTW, most of the operating system components will probably be localized too, not only Python). For advanced users or logging, it could be disabled entirely! Finally, you're right that translation is not an easy job, and this proposal (traceback internationalization) is just the tip of the iceberg (even more work will be needed in other areas to get full localization). If PostgreSQL and other tools can do it, why can't Python?
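The opt-out described above could be checked roughly as follows. This is a sketch under stated assumptions: PYTHONNOI18N is a made-up name for the environment variable the message suggests (it is not an existing CPython option), and treating a "C" or "POSIX" LC_MESSAGES locale as "show untranslated messages" mirrors how C gettext treats that locale, not anything specified in the patch.

```python
import locale
import os

# Hypothetical opt-out variable invented for this sketch; not a CPython option.
DISABLE_VAR = "PYTHONNOI18N"


def messages_enabled(environ=None):
    """Decide whether translated interpreter messages should be shown."""
    environ = os.environ if environ is None else environ
    if environ.get(DISABLE_VAR):
        return False
    lc_messages = getattr(locale, "LC_MESSAGES", None)
    if lc_messages is None:
        # Some platforms (e.g. Windows) do not define LC_MESSAGES.
        return True
    # Calling setlocale() without a locale only queries the current setting.
    return locale.setlocale(lc_messages) not in ("C", "POSIX")


# Apply the "restore the untranslated messages" call quoted above, then check.
if hasattr(locale, "LC_MESSAGES"):
    locale.setlocale(locale.LC_MESSAGES, "C")
    print(messages_enabled({}))  # prints False on POSIX systems
```

Passing the environment as a parameter keeps the check testable without touching os.environ; a real implementation would presumably run once at interpreter startup.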
[issue16344] Traceback Internationalization Proposal
Mariano Reingart added the comment: Sorry for taking so long to reply, and for this long follow-up...

> Antoine Pitrou added the comment:
> I think the PEP should be proposed on python-dev or python-ideas.
> Also, it's probably better if the PEP is encoded in utf-8, not
> latin-1.

OK, I'll update and polish it, encode it in UTF-8 and send it to python-dev. It was already discussed on python-ideas (maybe not in detail), but it seems that no one has more to add there, or they are busy with the Async API :-).

> Terry J. Reedy added the comment:
> I am sympathetic with non-English speakers wanting a native-language
> translation.

"Sympathetic" as in a kind of compassion? That may be the right meaning here. I just read that almost every one of you objects, arguing that you'd hesitate because you could receive a message in a foreign language that you could not understand. Well, that is what is already happening to non-English speakers of the Python language, IMHO. And it is not just frustrating; sometimes it is also a waste of time because of the distractions and delays it produces.

> But I think the interpreter should *always* emit the standard message
> and that any translation should be an addition, not a replacement.
> This would maintain discoverability and help people learn the English
> version, not hinder it.

I'll explore the alternatives for showing both messages (original and translated), but I think that would be more confusing.
I do not think that it hinders the meaning, it just translates it, and I haven't seen any other language or tool that shows both messages, but I'll investigate more (maybe the exception name, which is untranslated, plus an error code like in PostgreSQL would help discoverability). Learning English by showing both messages may be an interesting experiment, but for me it's like traditional education's focus on "memorizing" things instead of understanding them, and depending on the context, it can lead to good results or to misleading repetition.

> The real question to me is how deep in the interpreter such support
> should go. Third party shells can (and sometimes do) intercept
> tracebacks and reformat (and translate) as they wish. But there would
> be advantages and disadvantages in adding the translation sooner.

About the excepthook approach: it doesn't work very reliably because you don't have the original unformatted message, so you have to interpolate the results to find the correct translation. Besides being slower and potentially error-prone, the main problem is not technical but "social", as it could lead to duplicated translation effort, segregation and a proliferation of custom tools, aggravated by the fact that in some scenarios the excepthook is not honoured: http://bugs.python.org/issue12643 (just an example). You can take a look at one of my attempts to translate using interpolation (my algorithm is a kind of brute-force "guessing" using regular expressions, just to test the idea): http://code.google.com/p/pydiversity/source/browse/__diversity__/__init__.py I think that approach, "left in the wild" (and/or "do it yourself"), is not only more complex but could also be more dangerous than having unified translation resources, where all messages are listed, a common infrastructure is used and general rules are agreed upon.
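The brute-force "guessing" mentioned above can be illustrated with a short sketch. It is not the pydiversity code linked in the message: the catalog and its Spanish strings are hypothetical, and only plain %s placeholders are handled. It shows why working from the already-formatted text is fragile: every template has to be tried against every message, the arguments must be recovered by pattern matching, and a message that fits two templates is silently resolved in favour of whichever comes first.

```python
import re

# Hypothetical catalog: English msgid -> Spanish translation.
CATALOG = {
    "name '%s' is not defined": "el nombre '%s' no está definido",
    "division by zero": "división por cero",
}


def template_to_regex(msgid):
    """Compile a msgid into a regex with one lazy group per plain %s."""
    return re.compile("(.*?)".join(re.escape(part) for part in msgid.split("%s")))


def guess_translation(formatted, catalog=CATALOG):
    """Recover the msgid from an already-formatted message by trying every
    template, then re-apply the captured arguments to the translation."""
    for msgid, translated in catalog.items():
        match = template_to_regex(msgid).fullmatch(formatted)
        if match:
            return translated % match.groups()
    return formatted  # nothing matched: leave the message untranslated


print(guess_translation("name 'foo' is not defined"))
```

This prints el nombre 'foo' no está definido. A sys.excepthook wrapper would call guess_translation(str(exc_value)) for each uncaught exception; when the msgid is available at raise time, as in the patched PyErr_Format(), none of this matching is needed.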
> Ezio Melotti added the comment:
> There are two solutions to this problem:
> 1) adapt the language to the users;
> 2) teach the users English;
>
> While the first (i.e. what you are proposing) works as a short term
> solution, I believe the second is a much better long term solution,
> because IMHO users will anyway have to learn English sooner or later.

Teaching users English may be an altruistic goal in the long term, but for many teachers (like me) it is a barrier right now that can tip the balance toward other "more friendly" languages. Anyway, and don't get me wrong, forcing novice users to learn a second language, besides being likely impractical, may sound rude at best, or ethnocentric or even neocolonialist in some contexts (if we want to go further...). Education takes a lot of resources; I don't think it would happen just by showing some English messages (BTW, English may be one of the most difficult languages to learn as a second tongue, depending on the part of the world you live in... at least in my country you can only achieve an acceptable level after 6 to 9 years, depending on your age and other socio-economic factors). IMHO, a message like "we can help you in your first programming steps with Python localized for your language, but please consider learning English to better communicate with the international community" would be more encouraging.

> I've seen buildbots
[issue16344] Traceback Internationalization Proposal
Mariano Reingart added the comment: Just for the record, I've presented a "CPython Internationalization proposal" for this year's Google Summer of Code program: http://www.google-melange.com/gsoc/proposal/public/google/gsoc2015/reingart/5634387206995968 Indeed, that was my third attempt to move it forward (you can look there for the implications, schedule, etc.). Anyway, it didn't get accepted (and I have no more feedback from GSoC than that), so I will not be able to focus on this and finish it in 3 months as planned, but I'll do my best. Currently I'm cloning the CPython repository under my GitHub account to work on it (as I would have done if the GSoC project had been approved): https://github.com/reingart/python/ It was exported using hg-git, so it can easily be updated or take contributions back with Mercurial, just using GitHub to publish it.

BTW, the PEP has been hanging around since 2010 (see the file attached in 2012, for example); now I've uploaded it to GitHub so it can be edited collaboratively: https://github.com/reingart/python/wiki/PEP-i18n I will reorganize / rebase the patch and update the PEP ASAP.

PS: Yes, it seems that "the days when all programmers must learn English" are fading away. Visual Basic 5 was internationalized around 20 years ago (indeed, I learned VB as one of my first "real" languages, as it had complete Spanish translations for errors and the built-in F1 online help, on a CD-ROM in those days). The first Logo programming language my father brought home around 30 years ago was also in Spanish, IIRC (for a Spectrum TK90). Even gcc and bash are internationalized nowadays :-)