[issue16343] PyUnicode_FromFormatV() doesn't support utf-8 text

2012-10-27 Thread Mariano Reingart

New submission from Mariano Reingart:

While working on an internationalization proposal 
<http://python.org.ar/pyar/TracebackInternationalizationProposal>,
I got stuck on #9769: multibyte encodings (like UTF-8) are not supported 
by PyUnicode_FromFormatV().

Besides my proposal, I think UTF-8 should be supported for consistency with the 
other Unicode functions, like PyUnicode_FromString() or even 
unicode_fromformat_arg().

Attached is a patch that:
- enhances the iterator to detect multibyte sequences, with sanity checks on 
start and continuation bytes
- replaces unicode_write_cstr with PyUnicode_DecodeUTF8Stateful
- adds tests
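The start and continuation byte checks listed in the patch summary can be sketched in Python over a bytes format string. This is a hypothetical illustration of the idea, not the patch itself: `utf8_seq_len` and `decode_literal` are invented helper names, and the final `bytes.decode()` call merely stands in for PyUnicode_DecodeUTF8Stateful() in C.

```python
def utf8_seq_len(buf: bytes, i: int) -> int:
    """Return the length of the UTF-8 sequence that starts at buf[i].

    Hypothetical helper: it checks the start byte and that the expected
    number of continuation bytes (10xxxxxx) follow. It does not catch every
    overlong form or surrogate; the real decoder still does that.
    """
    lead = buf[i]
    if lead < 0x80:             # 0xxxxxxx: ASCII, single byte
        n = 1
    elif 0xC2 <= lead <= 0xDF:  # 110xxxxx: 2-byte start (C0/C1 are always overlong)
        n = 2
    elif 0xE0 <= lead <= 0xEF:  # 1110xxxx: 3-byte start
        n = 3
    elif 0xF0 <= lead <= 0xF4:  # 11110xxx: 4-byte start (up to U+10FFFF)
        n = 4
    else:                       # stray continuation byte or invalid start byte
        raise ValueError(f"invalid start byte 0x{lead:02x} at offset {i}")
    if i + n > len(buf):
        raise ValueError(f"truncated sequence at offset {i}")
    for j in range(i + 1, i + n):
        if (buf[j] & 0xC0) != 0x80:  # continuation bytes must be 10xxxxxx
            raise ValueError(f"invalid continuation byte 0x{buf[j]:02x} at offset {j}")
    return n


def decode_literal(buf: bytes) -> str:
    """Validate the sequence structure, then decode the literal text as UTF-8."""
    i = 0
    while i < len(buf):
        i += utf8_seq_len(buf, i)
    # Stand-in for PyUnicode_DecodeUTF8Stateful() in the C implementation.
    return buf.decode("utf-8")
```

Because ASCII bytes never occur inside a UTF-8 multibyte sequence, an iterator can keep scanning for `%` directives byte by byte; only the literal text between directives needs the multibyte handling.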

I hope it helps. This is my first patch for CPython and my C skills are a bit 
rusty, so please excuse any newbie glitches.

--
components: Interpreter Core, Unicode
files: pyunicode_fromformat_utf8.patch
keywords: patch
messages: 173996
nosy: ezio.melotti, reingart
priority: normal
severity: normal
status: open
title: PyUnicode_FromFormatV() doesn't support utf-8 text
type: enhancement
versions: Python 3.4
Added file: http://bugs.python.org/file27755/pyunicode_fromformat_utf8.patch

___
Python tracker 
<http://bugs.python.org/issue16343>
___
___
Python-bugs-list mailing list
Unsubscribe: 
http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue16344] Traceback Internationalization Proposal

2012-10-27 Thread Mariano Reingart

New submission from Mariano Reingart:

I'm opening this ticket to organize patches for a proposal of GETTEXT-based 
translation of exception/traceback messages, as described in:

<http://python.org.ar/pyar/TracebackInternationalizationProposal>

This requires the patch in issue #16343

Attached is a proof-of-concept patch, which includes:

- pyi18n.h: header for Py_GETTEXT macro definition
- Locale/es.po: sample Spanish language messages translation file
- test_i18n.py: basic tests
- errors.c: patched PyErr_SetString(), PyErr_Format()
- traceback.c: patched tb_displayline(), PyTraceBack_Print()
- pythonrun.c: patched _Py_InitializeEx_Private()
- site.py: patched main() and added seti18n()

--
components: Interpreter Core, Unicode
files: traceback_internationalization_proposal.patch
keywords: patch
messages: 173998
nosy: ezio.melotti, reingart
priority: normal
severity: normal
status: open
title: Traceback Internationalization Proposal
type: enhancement
versions: Python 3.4
Added file: 
http://bugs.python.org/file27756/traceback_internationalization_proposal.patch

___
Python tracker 
<http://bugs.python.org/issue16344>
___



[issue16344] Traceback Internationalization Proposal

2012-10-28 Thread Mariano Reingart

Mariano Reingart added the comment:

This was discussed on python-ideas two years ago (I've resurrected the 
thread there).

Sadly I didn't have time for this before, but since we have a CPython sprint 
at PyCon Argentina 2012 in 15 days, maybe it would be a good idea to discuss 
this again.

Sorry if I've made any mistakes; this is my second patch here, and my C skills 
are rusty, as I mentioned in the other issue.

--




[issue16344] Traceback Internationalization Proposal

2012-10-28 Thread Mariano Reingart

Mariano Reingart added the comment:

BTW, I've written a draft PEP for this (attached); the online version is at 
<http://python.org.ar/pyar/TracebackInternationalizationProposal>

Just let me know if it has to be uploaded/discussed elsewhere.

--
Added file: http://bugs.python.org/file27757/pep_i18n_traceback.txt




[issue16343] PyUnicode_FromFormatV() doesn't support utf-8 text

2012-10-28 Thread Mariano Reingart

Mariano Reingart added the comment:

I thought #9769 was closed (in fact, that patch was already applied).
Currently, PyUnicode_FromFormatV() doesn't handle non-ASCII text at all.
Maybe I misread the comment asking to open a new issue; sorry for that.

--




[issue9769] PyUnicode_FromFormatV() doesn't handle non-ascii text correctly

2012-10-28 Thread Mariano Reingart

Mariano Reingart added the comment:

(moved from issue #16343)

While working on an internationalization proposal 
<http://python.org.ar/pyar/TracebackInternationalizationProposal> (issue #16344),
I got stuck on this problem (#9769): multibyte encodings (like UTF-8) are 
not supported by PyUnicode_FromFormatV().

Besides my proposal, I think UTF-8 should be supported for consistency with the 
other Unicode functions, like PyUnicode_FromString() or even 
unicode_fromformat_arg().

Attached is a patch that:
- enhances the iterator to detect multibyte sequences, with sanity checks on 
start and continuation bytes
- replaces unicode_write_cstr with PyUnicode_DecodeUTF8Stateful
- adds tests

I hope it helps. This is my first patch for CPython and my C skills are a bit 
rusty, so please excuse any newbie glitches.

--
nosy: +reingart
Added file: http://bugs.python.org/file27771/pyunicode_fromformat_utf8.patch

___
Python tracker 
<http://bugs.python.org/issue9769>
___



[issue16344] Traceback Internationalization Proposal

2012-10-28 Thread Mariano Reingart

Mariano Reingart added the comment:

"Serious" developers? Sorry, but I think that is an unfortunate phrase that goes 
against the Python Diversity Statement.
What about young pupils? 
What about non-programmers (e.g. accountants)? 
In some places (like the public schools in my country), English is not formally 
taught until university.

And I don't think non-English speakers are just a subset of users.
Come on, English is not even the top native tongue (that is Mandarin Chinese). 
English may be one of the most spoken languages, but even by that metric it 
only reaches about 1/7th of the world population. Other languages like Spanish 
or Portuguese are also rising.

http://en.wikipedia.org/wiki/Language#Linguistic_diversity

BTW, as the draft says, Python is the offender here, as other error messages 
are already translated (including the OS ones, even inside Python!):

C:\Python32>python
Python 3.2.2 (default, Sep  4 2011, 09:51:08) [MSC v.1500 32 bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import os
>>> os.listdir("J:\\")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
WindowsError: [Error 3] El sistema no puede encontrar la ruta especificada: 'J:\\*.*'

(In English: "[Error 3] The system cannot find the path specified".)

PostgreSQL already translates error messages too:

C:\Program Files\PostgreSQL\9.0\bin>psql
psql (9.0.3)
Digite «help» para obtener ayuda.

Mariano=> SELECT * FROM nowhere;
ERROR:  no existe la relación «nowhere»
LÍNEA 1: SELECT * FROM nowhere;
                       ^

(In English: 'Type "help" for help.', 'ERROR: relation "nowhere" does not exist', 'LINE 1: ...'.)

And Bash too:

reingart@desktop:~/cpython$ ls /nowhere
ls: no se puede acceder a /nowhere: No existe el archivo o el directorio

(In English: "ls: cannot access /nowhere: No such file or directory".)


Of course, there is no need to translate keywords or libraries (SQL statements 
and bash commands are not translated either, only the messages are). I don't 
see why this would cause confusion; on the contrary, I think Python would become 
more consistent with other tools and thus easier to use.

The mechanism to restore the language is the usual one (used by almost every 
other application that supports i18n):
>>> locale.setlocale(locale.LC_MESSAGES, "C")
It should not be difficult for "serious" programmers to handle that :-)
If that is a concern, a command-line option, an environment variable, or a 
shortcut in the locale module could be added.
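Under the proposal, getting the English messages back would just be the standard locale call quoted above. A minimal sketch of scoping that switch to a block follows; `untranslated_messages` is a hypothetical helper name, and `locale.LC_MESSAGES` only exists on POSIX platforms (not on Windows).

```python
import locale
from contextlib import contextmanager


@contextmanager
def untranslated_messages():
    """Temporarily switch the LC_MESSAGES category to the "C" locale.

    Hypothetical helper. setlocale() changes process-wide state, so this
    is not thread-safe.
    """
    previous = locale.setlocale(locale.LC_MESSAGES)  # no second argument: query only
    locale.setlocale(locale.LC_MESSAGES, "C")
    try:
        yield
    finally:
        # Restore whatever setting was active before entering the block.
        locale.setlocale(locale.LC_MESSAGES, previous)
```

Restoring in `finally` keeps the user's language setting intact even if the code inside the block raises.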

Anyway, people will not necessarily be faced with the localized version by 
default. And if, for example, a teacher has to sit down at a student's machine, 
they could surely use it, as the messages will probably be in the spoken 
language of the country (BTW, most of the operating system components will 
probably be localized too, not only Python).
For advanced users or logging, it could be disabled entirely!

Finally, you're correct that translation is not an easy job, and this 
proposal (traceback internationalization) is just the tip of the iceberg (even 
more work will be needed in other areas to get full localization).
If PostgreSQL and other tools can do it, why can't Python?

--




[issue16344] Traceback Internationalization Proposal

2012-10-30 Thread Mariano Reingart

Mariano Reingart added the comment:

Sorry for taking so long to reply, and for this long follow-up...

> Antoine Pitrou added the comment:

> I think the PEP should be proposed on python-dev or python-ideas.
> Also, it's probably better if the PEP is encoded in utf-8, not 
> latin-1.

OK, I'll update and polish it, encode it in UTF-8, and send it to python-dev.
It was already discussed on python-ideas (maybe not in detail), but it seems 
that no one has more to add there, or they are busy with the async API :-).


> Terry J. Reedy added the comment:

> I am sympathetic with non-English speakers wanting a native-language
> translation. 

"Sympathetic" as in a kind of compassion?
That may be the right meaning here. 
I just read that almost all of you object, saying you would hesitate because 
you could receive a message in a foreign language that you couldn't understand.
Well, IMHO that is exactly what already happens to non-English speakers of the 
Python language. 
And it is not just frustrating; sometimes it is also a waste of time because of 
the distractions and delays it causes.

> But I think the interpreter should *always* emit the standard message
> and that any translation should be an addition, not a replacement.
> This would maintain discoverability and help people learn the English
> version, not hinder it.

I'll explore alternatives that show both messages (original and translated), 
but I think that would be more confusing.
I do not think that translation hinders the meaning; it just translates it. I 
haven't seen any other language or tool that shows both messages, but I'll 
investigate more (maybe the exception name, which stays untranslated, plus an 
error code like in PostgreSQL would help discoverability).

Learning English by showing both messages may be an interesting experiment, 
but to me it's like traditional education focusing on "memorizing" things 
instead of understanding them, and depending on the context it can lead to good 
results or to misleading repetition.

> The real question to me is how deep in the interpreter such support
> should go. Third party shells can (and sometimes do) intercept
> tracebacks and reformat (and translate) as they wish. But there would
> be advantages and disadvantages in adding the translation sooner.

The excepthook approach doesn't work very reliably, because you don't have the 
original unformatted message, so you have to work backwards from the formatted 
result to find the correct translation. 
Besides being slower and error-prone, the main problem is not technical but 
"social": it could lead to duplicated translation effort, fragmentation, and a 
proliferation of custom tools, made worse by the fact that in some scenarios 
the excepthook is not honoured:

http://bugs.python.org/issue12643 (just an example)

You can take a look at one of my attempts to translate by interpolation (my 
algorithm is a kind of brute-force "guessing" using regular expressions, just 
to test the idea):

http://code.google.com/p/pydiversity/source/browse/__diversity__/__init__.py
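The reverse-matching problem described above can be sketched in Python: without the original msgid, every catalog template has to be turned into a regular expression and tried against the already-formatted text. This is not the code from the linked pydiversity repository; the two-entry `CATALOG` and the helper names are hypothetical, and only `%s` placeholders are handled.

```python
import re

# Hypothetical two-entry catalog: English msgid -> Spanish msgstr.
CATALOG = {
    "name '%s' is not defined": "el nombre '%s' no está definido",
    "division by zero": "división por cero",
}


def _template_to_regex(msgid: str) -> re.Pattern:
    # Escape the literal text around each %s and turn every placeholder
    # into a non-greedy capture group, anchored at both ends.
    parts = [re.escape(p) for p in msgid.split("%s")]
    return re.compile("^" + "(.+?)".join(parts) + "$")


_PATTERNS = [(_template_to_regex(k), v) for k, v in CATALOG.items()]


def translate_formatted(message: str) -> str:
    """Guess which template produced `message` and re-interpolate its arguments.

    Returns the message unchanged when no template matches.
    """
    for pattern, msgstr in _PATTERNS:
        m = pattern.match(message)
        if m:
            return msgstr % m.groups()
    return message
```

Every message costs a scan over all templates, and when two templates can match the same text the first one in the catalog wins, which is the kind of guessing that makes this approach fragile compared with translating the template before formatting.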

I think that approach, "left in the wild" (and/or "do it yourself"), is not 
only more complex but could also be more dangerous than having unified 
translation resources, where all messages are listed, a common infrastructure 
is used, and general rules are agreed upon.


> Ezio Melotti added the comment:

> There are two solutions to this problem:
> 1) adapt the language to the users;
> 2) teach the users English;
>
> While the first (i.e. what you are proposing) works as a short term 
> solution, I believe the second is a much better long term solution, 
> because IMHO users will anyway have to learn English sooner or later.

Teaching users English may be an altruistic goal in the long term, but for many 
teachers (like me) it is a barrier right now that can tip the balance toward 
other "more friendly languages".

Anyway, and don't get me wrong, forcing novice users to learn a second 
language, besides likely being impractical, may sound rude at the least, or 
ethnocentric, or even like neocolonialism in some contexts (if we want to go 
further...). 
Education takes a lot of resources; I don't think it would happen just by 
showing some English messages (BTW, English may be one of the most difficult 
languages to learn as a second tongue, depending on the part of the world you 
live in... at least in my country you can only achieve an acceptable level after 
6 to 9 years, depending on your age and other socio-economic factors).

IMHO, a message like "we can help you take your first programming steps with 
Python localized for your language, but please consider learning English to 
better communicate with the international community" would be more encouraging.

> I've seen buildbots 

[issue16344] Traceback Internationalization Proposal

2015-05-03 Thread Mariano Reingart

Mariano Reingart added the comment:

Just for the record, I've submitted a "CPython Internationalization proposal" 
to this year's Google Summer of Code program:

http://www.google-melange.com/gsoc/proposal/public/google/gsoc2015/reingart/5634387206995968

Indeed, that was my third attempt to move this forward (you can look there for 
the implications, schedule, etc.).

Anyway, it wasn't accepted (and I have no further feedback from GSoC than 
that), so I will not be able to focus on this and finish it in 3 months as 
planned, but I'll do my best.

Currently I'm cloning the CPython repository under my GitHub account to work on 
it (as I would have done if the GSoC project had been approved):

https://github.com/reingart/python/

It was exported using hg-git, so it can easily be updated and contributions can 
be pulled back into Mercurial; GitHub is just used to publish it.

BTW, the PEP has been around since 2010 (see the file attached in 2012, for 
example); I've now uploaded it to GitHub so it can be edited collaboratively:

https://github.com/reingart/python/wiki/PEP-i18n

I will re-organize / re-base the patch and update the PEP ASAP.

PS: Yes, it seems that "the days when all programmers must learn English" are 
fading away. Visual Basic 5 was internationalized around 20 years ago (indeed, 
I learned VB as one of my first "real" languages because it had complete 
Spanish translations of its errors and built-in F1 online help, on a CD-ROM in 
those days). 
The first "Logo" programming language my father brought home around 30 years 
ago was also in Spanish, IIRC (for a Spectrum TK90). Even gcc and bash are 
internationalized nowadays :-)

--
