[Python-Dev] PyCapsule_Import semantics, relative imports, module names etc.

2015-07-24 Thread John Dennis
While porting several existing CPython extension modules that form a
package to be compatible with both 2.7 and 3.x, the existing PyCObject_*
API was replaced with PyCapsule_*. This raised some issues on which the
existing CPython docs are silent. I'd like clarification on a few of
them and wish to raise some questions.


1. Should an extension module name as provided in PyModule_Create (Py3) 
or Py_InitModule3 (Py2) be fully package qualified or just the module 
name? I believe it's just the module name (see item 5 below). Yes/No?


2. PyCapsule_Import does not adhere to the general import semantics. The 
module name must be fully qualified; relative imports are not supported.


3. PyCapsule_Import requires the package (e.g. __init__.py) to import
*all* of its submodules which utilize the PyCapsule mechanism,
preventing lazy on-demand loading. This is because PyCapsule_Import only
imports the top-level module (e.g. the package). From there it iterates
over each of the remaining names in the module path, looking each one up
as an attribute of its parent. However, the parent module (e.g. its
globals) will not contain an attribute for the submodule unless the
submodule has already been loaded. If the submodule has not been loaded
into the parent, PyCapsule_Import raises an error instead of trying to
load the submodule. The only apparent solution is for the package to
load every possible submodule, whether required or not, just to avoid a
loading error. The inability to load modules on demand seems like a
design flaw and a change in semantics from the prior use of
PyImport_ImportModule in combination with PyCObject. [One of the nice
features of normal import loading is that it sets the submodule name in
the parent; the fact that this step is omitted is what causes
PyCapsule_Import to fail unless all submodules are unconditionally
loaded.] Shouldn't PyCapsule_Import utilize PyImport_ImportModule?
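
The lookup described in item 3 can be imitated in pure Python. This is
only a sketch of the behaviour as described above, not the CPython
source; the package name capsule_demo_pkg, the submodule sub and the
attribute C_API are hypothetical stand-ins created in a temporary
directory.

```python
import importlib
import os
import sys
import tempfile


def capsule_style_lookup(name):
    """Import only the first component of a dotted name, then resolve
    the remaining components with plain attribute access."""
    parts = name.split(".")
    obj = importlib.import_module(parts[0])
    for part in parts[1:]:
        obj = getattr(obj, part)  # attribute access never triggers an import
    return obj


# Build a throwaway package (hypothetical names) with an empty __init__.py,
# so the package does not import its submodule on its own.
root = tempfile.mkdtemp()
pkg_dir = os.path.join(root, "capsule_demo_pkg")
os.mkdir(pkg_dir)
open(os.path.join(pkg_dir, "__init__.py"), "w").close()
with open(os.path.join(pkg_dir, "sub.py"), "w") as f:
    f.write("C_API = 'stand-in for a capsule'\n")
sys.path.insert(0, root)
importlib.invalidate_caches()

# The submodule has not been loaded yet, so the parent has no 'sub' attribute.
try:
    capsule_style_lookup("capsule_demo_pkg.sub.C_API")
    lazy_lookup_worked = True
except AttributeError:
    lazy_lookup_worked = False

# A normal import binds 'sub' in the parent package; the lookup then succeeds.
importlib.import_module("capsule_demo_pkg.sub")
api = capsule_style_lookup("capsule_demo_pkg.sub.C_API")
```

In this sketch the first lookup fails and the second succeeds only
because an ordinary import ran in between, which is the step the
package's __init__.py currently has to perform for every submodule.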


4. Relative imports seem much more useful for cooperating submodules in 
a package as opposed to fully qualified package names. Being able to 
import a C_API from the current package (the package I'm a member of) 
seems much more elegant and robust for cooperating modules but this 
semantic isn't supported (in fact the leading-dot syntax completely
confuses PyCapsule_Import; the docs should clarify this).
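
One possible way to express the relative form, sketched in Python
rather than C: resolve the leading dots against the importing module's
package first, and pass only the resulting absolute name on. The names
pkg, inner, sub, other and _C_API are hypothetical, and the helper
absolute_capsule_name is an illustration, not an existing API;
importlib.util.resolve_name is the standard-library function doing the
work (Python 3).

```python
from importlib.util import resolve_name


def absolute_capsule_name(relative_name, package):
    """Turn a leading-dot capsule name into a fully qualified one,
    using the same rules as a relative import inside 'package'."""
    return resolve_name(relative_name, package)


# A sibling submodule of the current package.
sibling = absolute_capsule_name(".sub._C_API", "pkg")

# Two dots climb one level above the current package.
cousin = absolute_capsule_name("..other._C_API", "pkg.inner")
```

With this approach the fully qualified name is computed at run time
from wherever the package happens to be installed, instead of being
fixed at compile time.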


5. The requirement that a module specify its name as unqualified when
it is initializing, but then use a fully qualified package name for
PyCapsule_New, both of which occur inside the same initialization
function, seems like an odd inconsistency (a documentation clarification
would help here). Also, depending on your point of view, package names
could be considered a deployment/packaging decision: a module obtains
its fully qualified name by virtue of its position in the filesystem,
something the module will not be aware of at compile time. That is
another reason why relative imports make sense. Note the identical
comment about _Py_PackageContext in modsupport.c (Py2) and
moduleobject.c (Py3) describing how a module obtains its fully qualified
package name (see item 1).


Thanks!

--
John
___
Python-Dev mailing list
Python-Dev@python.org
https://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
https://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


[Python-Dev] Unicode <--> UTF-8 in CPython extension modules

2008-02-22 Thread John Dennis
I've uncovered what seems to me to be a problem with Python Unicode
string objects passed to extension modules. Or perhaps it's revealing
a misunderstanding on my part :-) So I would like to get some
clarification.

Extension modules written in C receive strings from python via the
PyArg_ParseTuple family. Most extension modules use the 's' or 's#'
format parameter.

Many C libraries in Linux use the UTF-8 encoding.

The 's' format when passed a Unicode object will encode the string
according to the default encoding which is immutably set to 'ascii' in
site.py. Thus a C library expecting UTF-8 which uses the 's' format in
PyArg_ParseTuple will get an encoding error when passed a Unicode
string which contains any code points outside the ascii range.
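
The failure can be imitated at the Python level. This is only an
analogy, assuming 's' behaves like an explicit encode with the default
'ascii' codec as described above, and that an 'es' conversion given
"utf-8" behaves like the second call; the sample string is arbitrary.

```python
# A Unicode string with one code point outside the ASCII range.
text = u"caf\u00e9"

# What the 's' format effectively attempts with an 'ascii' default encoding.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError:
    ascii_ok = False

# What a binding gets when it names the encoding the C library expects.
utf8_bytes = text.encode("utf-8")
```

The ASCII attempt raises an encoding error, while the explicit UTF-8
encode yields the two-byte sequence for the accented character.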

Now my questions:

* Is the use of the 's' or 's#' format parameter in an extension
   binding expecting UTF-8 fundamentally broken and not expected to
   work?  Instead should the binding be using a format conversion which
   specifies the desired encoding, e.g. 'es' or 'es#'?

* The extension modules could successfully use the 's' or 's#' format
   conversion in a UTF-8 environment if the default encoding was
   UTF-8. Changing the default encoding to UTF-8 would in one easy
   stroke "fix" most extension modules, right? Why is the default
   encoding 'ascii' in UTF-8 environments and why is the default
   encoding prohibited from being changed from ascii?

* Did Python 2.5 introduce anything which now makes this issue visible
   whereas before it was masked by some other behavior?

Summary:

Python programs which use Unicode string objects for their i18n and
which "link" to C libraries expecting UTF-8, but whose CPython
binding only uses the 's' or 's#' formats, seem to often
fail with encoding errors. However, I have yet to see a CPython
binding which explicitly defines its encoding requirements. This
suggests to me that I either do not understand the issue in its
entirety, or many CPython bindings in Linux UTF-8 environments are
broken with respect to their i18n handling and the problem is currently
not addressed.

-- 
John Dennis <[EMAIL PROTECTED]>


Re: [Python-Dev] Unicode <--> UTF-8 in CPython extension modules

2008-02-22 Thread John Dennis
Colin Walters wrote:
> On Fri, Feb 22, 2008 at 4:23 PM, John Dennis <[EMAIL PROTECTED]> wrote:
> 
>>  Python programs which use Unicode string objects for their i18n and
>>  which "link" to C libraries expecting UTF-8 but which have a CPython
>>  binding which only uses 's' or 's#' formats programs seem to often
>>  fail with encoding errors.
> 
> One thing to be aware of is that PyGTK+ actually sets the Python
> Unicode object encoding to UTF-8.
> 
> http://bugzilla.gnome.org/show_bug.cgi?id=132040
> 
> I mention this because PyGTK is a very popular library related to
> Python and Linux.  So currently if you "import gtk", then libraries
> which are using UTF-8 (as you say, the vast majority) will work with
> Python unicode objects unmodified.

Thank you Colin, your input was very helpful. The fact that PyGTK's i18n
handling worked was the counterexample which made me doubt my analysis
was correct, but I can see from the Gnome bug report and Martin's
subsequent comment that the analysis was sound. It had perplexed me
enormously why i18n handling worked in some circumstances but failed in
others. Apparently it was a side effect of importing gtk, a problem
exacerbated when either the sequence of imports or the complete set of
imports was not taken into account.

I am aware of other python bindings (libxml2 is one example) which share 
the same mistake of not using the 'es' family of format conversions when 
the underlying library is UTF-8. At least I now understand why 
incorrectly coded bindings in some circumstances produced correct 
results when logic dictated they shouldn't.

-- 
John Dennis <[EMAIL PROTECTED]>