Re: [Python-Dev] accept the wheel PEPs 425, 426, 427

2012-10-24 Thread Ronald Oussoren

On 18 Oct, 2012, at 19:29, Daniel Holth  wrote:

> I'd like to submit the Wheel PEPs 425 (filename metadata), 426
> (Metadata 1.3), and 427 (wheel itself) for acceptance. The format has
> been stable since May and we are preparing a patch to support it in
> pip, but we need to earn consensus before including it in the most
> widely used installer.

PEP 425: 

* "The version is py_version_nodot. CPython gets away with no dot, but if one 
is needed the underscore _ is used instead"

   I don't particularly like replacing dots by underscores. That is needed 
because you use the dot character in compressed tag sets, but why not use a 
comma to separate items in the compressed tag set?

* "The platform tag is simply distutils.util.get_platform() with all hyphens - 
and periods . replaced with underscore _."

   Why the replacement?  The need for replacement could be avoided by using a
different separator between elements of a tag (for example "~" or "+"), and
furthermore the platform tag is at a known location, and hence the use of
hyphens in the platform tag is harmless (use
"python_tag, abi_tag, platform_tag = tag.split('-', 2)" to split the tag into
its elements).

* "compressed tag sets"

   Using "," instead of "." to separate elements of the tag set takes away the
need to replace dots in tag elements, and seems more natural to me (you'd also
use a comma to separate the elements when you write them down in prose or
Python code).
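
A minimal sketch of the two suggestions above, assuming hypothetical tag values ("cp33-cp33m-linux-x86_64" and "py2.7,py3.3-none-any" are made up for illustration). The comma-separated form is the proposal in this message, not what the PEP draft specifies.

```python
# Hypothetical tag whose platform part still contains a hyphen.
tag = "cp33-cp33m-linux-x86_64"

# maxsplit=2 stops after the second hyphen, so the platform tag stays whole.
python_tag, abi_tag, platform_tag = tag.split("-", 2)
print(python_tag, abi_tag, platform_tag)

# Hypothetical comma-separated compressed tag set: dots can stay in the
# individual elements because "." is no longer the set separator.
compressed = "py2.7,py3.3-none-any"
pythons, abis, platforms = compressed.split("-", 2)
expanded = [
    (py, abi, plat)
    for py in pythons.split(",")
    for abi in abis.split(",")
    for plat in platforms.split(",")
]
print(expanded)
```

The split only works because the platform tag is the last element; a hyphen in either of the first two elements would still be ambiguous.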

Ronald
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Fwd: Python 3.3 can't sort memoryviews as they're unorderable

2012-10-24 Thread Nick Coghlan
(Oops, originally replied only to Mark)

Is a 3x3 array greater or less than a 2x4 array or another 3x3 array?

The contents of a 1D memory view may be sortable, but the "logical
structure" part isn't, and neither is any multi-dimensional view.
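
A small sketch of that distinction, assuming Python 3.3 or later: the views themselves support equality but not ordering, while the contents of a 1-D view can be sorted like any other sequence of integers.

```python
view_a = memoryview(b"cab")
view_b = memoryview(b"abc")

# Equality between views is defined...
print(view_a == view_b)

# ...but ordering is not, so sorting a list of views raises TypeError.
try:
    sorted([view_a, view_b])
    ordering_supported = True
except TypeError:
    ordering_supported = False
print(ordering_supported)

# The contents of a 1-D byte view are plain integers and sort fine.
sorted_contents = sorted(view_a)
print(sorted_contents)
```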

I'm surprised by the lack of inheritance support though - is that a
regression from 3.2? If yes, that's definitely a bug to be fixed in a 3.3
maintenance release, otherwise it's probably a feature request for 3.4.

Cheers,
Nick.

--
Sent from my phone, thus the relative brevity :)


Re: [Python-Dev] accept the wheel PEPs 425, 426, 427

2012-10-24 Thread Ronald Oussoren

On 18 Oct, 2012, at 19:29, Daniel Holth  wrote:

> I'd like to submit the Wheel PEPs 425 (filename metadata), 426
> (Metadata 1.3), and 427 (wheel itself) for acceptance. The format has
> been stable since May and we are preparing a patch to support it in
> pip, but we need to earn consensus before including it in the most
> widely used installer.

PEP 427:

* The installation section mentions that .py files should be compiled to 
.pyc/.pyo files, and that "Uninstallers should be smart enough to remove .pyc 
even if it is not mentioned in RECORD.". 

   Wouldn't it be better to add the compiled files to the RECORD file? That 
would break the digital signature, but I'm not sure if verifying the signature 
post-installation is useful (or if it's even
   intended to work). 

* Why is urlsafe_b64encode_nopad used to encode the hash in the record file, 
instead of the normal hex encoding that's directly supported by the hash module 
and system tools?

* The way to specify the required public key in package requirements is ugly 
(it looks like an abuse of setuptools' extras mechanism). Is there really no 
nicer way to specify this?

* As was noted before, there is no threat model for the signature feature, which 
makes it hard to evaluate the feature.  In particular, what is the advantage 
of this over PGP signatures of wheels? (PyPI already supports detached 
signatures, and such signatures are used more widely in the OSS world.)

* RECORD.p7s is not described at all. I'm assuming this is intended to be an 
X.509 signature of RECORD in PKCS7 format. Why PKCS7 and not PEM? The latter 
seems to be easier to work with.

Ronald


Re: [Python-Dev] accept the wheel PEPs 425, 426, 427

2012-10-24 Thread Daniel Holth
On Wed, Oct 24, 2012 at 7:28 AM, Ronald Oussoren  wrote:
>
> On 18 Oct, 2012, at 19:29, Daniel Holth  wrote:
>
>> I'd like to submit the Wheel PEPs 425 (filename metadata), 426
>> (Metadata 1.3), and 427 (wheel itself) for acceptance. The format has
>> been stable since May and we are preparing a patch to support it in
>> pip, but we need to earn consensus before including it in the most
>> widely used installer.
>
> PEP 427:
>
> * The installation section mentions that .py files should be compiled to 
> .pyc/.pyo files, and that "Uninstallers should be smart enough to remove .pyc 
> even if it is not mentioned in RECORD.".
>
>Wouldn't it be better to add the compiled files to the RECORD file? That 
> would break the digital signature, but I'm not sure if verifying the 
> signature post-installation is useful (or if it's even
>intended to work).

The trouble with mentioning .pyc files in RECORD is that someone can
install Python 3.4, and suddenly you have additional .pyc files,
approximately __pycache__/pyfile.cp34.pyc. So you should remove more
than what you installed anyway.
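
A rough sketch of what "remove more than what you installed" could look like, assuming the PEP 3147 `__pycache__` layout. The helper name `bytecode_files` is hypothetical and is not part of pip or the wheel tooling.

```python
import glob
import os
import tempfile


def bytecode_files(source_path):
    """List compiled files that may sit next to source_path (illustrative helper)."""
    directory, filename = os.path.split(source_path)
    stem = os.path.splitext(filename)[0]
    # Legacy layout: pyfile.pyc / pyfile.pyo beside the source file.
    candidates = [source_path + "c", source_path + "o"]
    # PEP 3147 layout: one file per interpreter tag under __pycache__.
    pattern = os.path.join(
        glob.escape(directory), "__pycache__", glob.escape(stem) + ".*.py[co]"
    )
    candidates.extend(glob.glob(pattern))
    return sorted(path for path in candidates if os.path.exists(path))


with tempfile.TemporaryDirectory() as root:
    source = os.path.join(root, "pyfile.py")
    cache_dir = os.path.join(root, "__pycache__")
    os.mkdir(cache_dir)
    # Simulate bytecode written by two different interpreters after installation.
    for name in ("pyfile.cpython-33.pyc", "pyfile.cpython-34.pyc"):
        open(os.path.join(cache_dir, name), "wb").close()
    open(source, "w").close()
    names = [os.path.basename(path) for path in bytecode_files(source)]

print(names)
```

An uninstaller built this way discovers the compiled files at removal time, so it does not depend on RECORD having listed them.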

You can't verify the signature post-installation. #!python and RECORD
have been rewritten at this point.

> * Why is urlsafe_b64encode_nopad used to encode the hash in the record file, 
> instead of the normal hex encoding that's directly supported by the hash 
> module and system tools?

It's nice and small. The encoder is just
base64.urlsafe_b64encode(digest).rstrip('=')
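
For comparison with hex, here is a sketch of that encoder written for Python 3, where `urlsafe_b64encode` returns bytes and the padding must be stripped as `b"="`. The decoder and the sample input are additions for illustration only.

```python
import base64
import hashlib


def urlsafe_b64encode_nopad(data):
    """URL-safe base64 with the trailing '=' padding removed."""
    return base64.urlsafe_b64encode(data).rstrip(b"=")


def urlsafe_b64decode_nopad(text):
    """Inverse operation: restore the padding before decoding."""
    padding = b"=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


digest = hashlib.sha256(b"example file contents").digest()
encoded = urlsafe_b64encode_nopad(digest)
hex_form = hashlib.sha256(b"example file contents").hexdigest()

# A SHA-256 digest takes 43 characters in this form versus 64 in hex.
print(len(encoded), len(hex_form))
```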

> * The way to specify the required public key in package requirements in ugly 
> (it looks like an abuse of setuptools' extras mechanism). Is there really no 
> nicer way to specify this?
>
> * As was noted before there is no threat model for the signature feature, 
> which makes it hard to evaluate if the feature.  In particular, what is the 
> advantage of this over PGP signatures of wheels? (PyPI already supports 
> detached signatures, and such signatures are used more widely in the OSS 
> world)
>
> * RECORD.p7s is not described at all. I'm assuming this is intented to be a 
> X.509 signature of RECORD in pkcs7 format. Why PKCS7 and not PEM? The latter 
> seems to be easier to work with.

I am very confused about the idea that
not-downloading-the-archive-you-expected (pypi accounts getting
hacked, man-in-the-middle attacks, simply using the wrong index) is an
unrealistic threat.

It might help to think of the wheel signing scheme as a more powerful
version of the current #md5=digest instead of comparing it to PGP or
TLS. An md5 sum verifies the integrity of a single archive, the wheel
signing key verifies the integrity of any number of archives. Like the
archive digest, wheel just explains how to attach the signature to the
archive. A system for [automatically] trusting any particular key
could be built separately.

Wheel's signing scheme is similar to jarsigner. The big advantage over
PGP is that the signatures are attached and less likely to get lost. PyPI
still supports detached signatures, even on wheel files, but they are
unpopular. Wheel gives you an additional, different option.

Since the signature is over the unpacked contents, you can also change
the compression algorithm in the zipfile or append another signature
without invalidating the existing signature.
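
To illustrate why a signature over the unpacked contents survives recompression, the sketch below hashes member names and uncompressed data from two archives that differ only in compression. This is a simplified stand-in, not the wheel signing procedure itself; `contents_digest` and `build_archive` are hypothetical helpers.

```python
import hashlib
import io
import zipfile


def contents_digest(archive_bytes):
    """Hash member names and uncompressed data, ignoring how they were stored."""
    digest = hashlib.sha256()
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        for name in sorted(archive.namelist()):
            digest.update(name.encode("utf-8"))
            digest.update(archive.read(name))
    return digest.hexdigest()


def build_archive(compression):
    """Build an in-memory zip with one member, using the given compression."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=compression) as archive:
        archive.writestr("pkg/__init__.py", "print('hello')\n" * 50)
    return buffer.getvalue()


stored = build_archive(zipfile.ZIP_STORED)
deflated = build_archive(zipfile.ZIP_DEFLATED)

# The raw archives differ, but the digest over the contents does not.
print(stored != deflated, contents_digest(stored) == contents_digest(deflated))
```

A digest of the archive file itself, such as the #md5= fragment, would change as soon as the compression changed.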

The simplified certificate model is inspired by SPKI/SDSI
(http://world.std.com/~cme/html/spki.html), Convergence
(http://convergence.io/), TACK (http://tack.io), and the general
discussion about the brokenness of the certificate authority system.
You get the raw public key without a claim that it represents anything
or anyone.

PKCS7 is the format that a US government user would be required to use
with their smartcard-based system.

I like the packagename[algorithm=key] syntax even though it started as
a hack. It fits into the existing pip requirements.txt syntax
perfectly, unlike packagename[extra]#algorithm=key, and it reads like
array indexing.


Re: [Python-Dev] accept the wheel PEPs 425, 426, 427

2012-10-24 Thread Daniel Holth
On Wed, Oct 24, 2012 at 7:04 AM, Ronald Oussoren  wrote:
>
> On 18 Oct, 2012, at 19:29, Daniel Holth  wrote:
>
>> I'd like to submit the Wheel PEPs 425 (filename metadata), 426
>> (Metadata 1.3), and 427 (wheel itself) for acceptance. The format has
>> been stable since May and we are preparing a patch to support it in
>> pip, but we need to earn consensus before including it in the most
>> widely used installer.
>
> PEP 425:
>
> * "The version is py_version_nodot. CPython gets away with no dot, but if one 
> is needed the underscore _ is used instead"
>
>I don't particularly like replacing dots by underscores. That needed 
> because you use the dot character in compressed tag sets, but why not use a 
> comma to separate items in the compressed tag set?

> * "The platform tag is simply distutils.util.get_platform() with all hyphens 
> - and periods . replaced with underscore _."
>
>Why the replacement?  The need for replacement could be avoided by using a 
> different separator between elements of a tag (for example "~" or "+"), and 
> furthermore the platform tag is at a know
>location, and hence the use of hyphens in the platform tag is harmless 
> (use "python_tag, abi_tag, platform_tag = tag.split('-', 2)" to split the tag 
> into its elements.

This is based on the longstanding convention of folding - and _
(hyphen and underscore) in built distribution filenames and using - to
separate parts.
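
A short sketch of the folding described in the quoted PEP text, where hyphens and periods in the platform string become underscores. `sysconfig.get_platform()` is used here as a stand-in for `distutils.util.get_platform()`, and the helper name `normalize_platform` is made up for this example.

```python
import re
import sysconfig


def normalize_platform(platform):
    """Replace hyphens and periods with underscores, as the quoted PEP text says."""
    return re.sub(r"[-.]", "_", platform)


# Fixed sample value, then whatever the running interpreter reports.
print(normalize_platform("macosx-10.6-intel"))
print(normalize_platform(sysconfig.get_platform()))
```

After folding, "-" is free to act as the separator between the parts of the filename.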

> * "compressed tag sets"
>
>Using '"," instead of "." to separate elements of the tag set takes away 
> the need to replace dots in tag elements, and seems more natural to me (you'd 
> also use comma to separate the elements
>when you write them down in prose or python code.

I kind of like the comma.

The + might transform into a space in URLs?


Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-24 Thread Larry Hastings

On 10/23/2012 09:29 AM, Georg Brandl wrote:
> Especially since you're suggesting a huge number of new files, I question the
> argument of better navigability.


FWIW I'm -1 on it too.  I don't see what the big deal is with "large" 
source files.  If you have difficulty finding your way around 
unicodeobject.c, that seems more like a tooling issue to me, not a 
source code structural issue.



/arry


[Python-Dev] Some joker is trying to unsubscribe me

2012-10-24 Thread Guido van Rossum
I've received three messages in the past hour from mailman at
python.org notifying me of various attempts to receive a password
reminder or to remove me from python-dev. I hope they don't succeed.

-- 
--Guido van Rossum (python.org/~guido)


Re: [Python-Dev] Python Bug Day in October

2012-10-24 Thread Maciej Szulik

On 10/24/2012 03:19 AM, Éric Araujo wrote:
> Hello,
>
> On 12/10/2012 13:50, Petri Lehtinen wrote:
>> There are two and a half weeks left, but I've not seen any announcements
>> yet!
>
> Indeed, work and other commitments took over, so we (Montréal-Python)
> decided to move the bug day instead of announcing it late.  The date
> that would work for us is November 3rd.
>
> Brian, is it okay for Boston?
> Maciej, what about your group?
> Committers, who could join on IRC?
>
> Sorry for the false start.



Eric,
We have a meeting tomorrow and I'll talk to the guys, but because we're
only just starting the Silesian Python Group, there won't be much interest
yet. I'll try to work on that ;) Maybe some time in the future I could
organize this kind of event and invite all of you to join us.
Nonetheless I'll try to join both events on IRC, this Saturday and the
next.

Maciej


Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-24 Thread Nick Coghlan
On Oct 25, 2012 2:06 AM, "Larry Hastings" wrote:
>
> On 10/23/2012 09:29 AM, Georg Brandl wrote:
>>
>> Especially since you're suggesting a huge number of new files, I
>> question the argument of better navigability.
>
>
> FWIW I'm -1 on it too.  I don't see what the big deal is with "large"
> source files.  If you have difficulty finding your way around
> unicodeobject.c, that seems more like a tooling issue to me, not a
> source code structural issue.

OK, I need to weigh in after seeing this kind of reply. Large source files
are discouraged in general because they're a code smell that points
strongly towards a *lack of modularity* within a *complex piece of
functionality*.

Breaking such files up into separately compiled modules serves two purposes:
1. It proves that the code *isn't* a tangled monolithic mess;
2. It enlists the compilation toolchain's assistance in ensuring that
remains the case in the future.

I find complaints about the ease of searching within the file to be
misguided and irrelevant, as I can just as easily reply with "if searching
across multiple files is hard for you, use better tools, like grep, or
'Find in Files'".

Note that I also consider the "pro" argument about better navigability
inaccurate - the real gain is in *modularity*, making it clear to readers
which parts can be understood and worked on separately from each other.

We are not special snowflakes - good software engineering practice is
advisable for us as well, so a big +1 from me for breaking up the
monstrosity that is unicodeobject.c and lowering the barrier to entry for
hacking on the individual pieces. This should come with a large block
comment in unicodeobject.c explaining how the pieces are put back together
again.

However, -1 on the "faux modularity" idea of breaking up the files on disk
but still exposing them to the compiler and linker as a monolithic block.
That would be completely missing the point of why large source
files are bad.

Regards,
Nick.

--
Sent from my phone, thus the relative brevity :)


Re: [Python-Dev] Some joker is trying to unsubscribe me

2012-10-24 Thread Thomas Wouters
On Wed, Oct 24, 2012 at 9:19 PM, Guido van Rossum  wrote:

> I've received three messages in the past hour from mailman at
> python.org notifying me of various attempts to receive a password
> reminder or to remove me from python-dev. I hope they don't succeed.


Are you asking us to CC you on all messages? I'm sure it could be arranged
:>

-- 
Thomas Wouters 

Hi! I'm a .signature virus! copy me into your .signature file to help me
spread!


Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-24 Thread Barry Warsaw
On Oct 25, 2012, at 08:15 AM, Nick Coghlan wrote:

>OK, I need to weigh in after seeing this kind of reply. Large source files
>are discouraged in general because they're a code smell that points
>strongly towards a *lack of modularity* within a *complex piece of
>functionality*.

Modularity is good, and the file system structure of the project should
reflect that, but to be effective, it needs to be obvious.  It's pretty
obvious what's generally in intobject.c.  I've worked with code bases where
there's no rhyme nor reason as to what you'd find in a particular file, and
this really hurts.

It hurts even with good tools.  Remember that sometimes you don't even know
what you're looking for, so search tools may not be very useful.  For example,
sometimes you want to understand how all the pieces fit together, what the
holistic view of the subsystem is, or where the "entry points" are.  Search
tools are not very good at this, and if it's a subsystem you only interact
with occasionally, having a file system organization that makes things easier
to remember what you learned the last time you were there helps enormously.

Another point: rather than large files (or maybe in addition to them), large
functions can also be painful to navigate.  So just splitting a file into
subfiles may not be the only modularity improvement you can make.

While I'm personally -0 about splitting up unicodeobject.c, if the folks
advocating for it go ahead with it, I just ask that you do it very carefully,
with an eye toward the casual and newbie reader of our code base.

Cheers,
-Barry


Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-24 Thread Nick Coghlan
On Thu, Oct 25, 2012 at 8:37 AM, Barry Warsaw  wrote:
> On Oct 25, 2012, at 08:15 AM, Nick Coghlan wrote:
>
>>OK, I need to weigh in after seeing this kind of reply. Large source files
>>are discouraged in general because they're a code smell that points
>>strongly towards a *lack of modularity* within a *complex piece of
>>functionality*.
>
> Modularity is good, and the file system structure of the project should
> reflect that, but to be effective, it needs to be obvious.  It's pretty
> obvious what's generally in intobject.c.  I've worked with code bases where
> there's no rhyme nor reason as to what you'd find in a particular file, and
> this really hurts.
>
> It hurts even with good tools.  Remember that sometimes you don't even know
> what you're looking for, so search tools may not be very useful.  For example,
> sometimes you want to understand how all the pieces fit together, what the
> holistic view of the subsystem is, or where the "entry points" are.  Search
> tools are not very good at this, and if it's a subsystem you only interact
> with occasionally, having a file system organization that makes things easier
> to remember what you learned the last time you were there helps enormously.

And if we were talking in the abstract, I think these would be
reasonable concerns to bring up. However, Victor's proposed division
*is* logical (especially if he goes down the path of a separate
subdirectory which will better support easy searching across all of
the unicode object related files), and I conditioned my +1 with the
requirement that a road map be provided in a leading block comment in
unicodeobject.c.

speed.python.org is also making progress, and once that is up and
running (which will happen well before any Python 3.4 release) it will
be possible to compare the numbers between 3.3 and trunk to help
determine the validity of any concerns regarding optimisations that
can be performed within a module but not across modules.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-24 Thread Stephen J. Turnbull
Nick Coghlan writes:

 > OK, I need to weigh in after seeing this kind of reply. Large source files
 > are discouraged in general because they're a code smell that points
 > strongly towards a *lack of modularity* within a *complex piece of
 > functionality*.

Sure, but large numbers of tiny source files are also a code smell,
the smell of purist adherence to the literal principle of modularity
without application of judgment.

If you want to argue that the pragmatic point of view nevertheless is
to break up the file, I can see that, but I think Victor is going too
far.  (Full disclosure dept.: the call graph of the Emacs equivalents
is isomorphic to the Dungeon of Zork, so I may be a bit biased.)  You
really should speak to the question of "how many" and "what partition".

 > the real gain is in *modularity*, making it clear to readers which
 > parts can be understood and worked on separately from each other.

Yeah, so which do you think they are?  It seems to me that there are
three modules to be carved out of unicodeobject.c:

1.  The internal object management that is not exposed to Python:
allocation, deallocation, and PEP 393 transformations.

2.  The public interface to Python implementation: methods and
properties, including operators.

3.  Interaction with the outside world: codec implementations.  But
conceptually, these really don't have anything to do with internal
implementation of Unicode objects.  They're just functions that
convert bytes to Unicode and vice versa.  In principle they can be
written in terms of ord(), chr(), and bytes().  On the other hand,
they're rather repetitive: "When you've seen one codec
implementation, you've seen them all."  I see no harm in grouping
them in one file, and possibly a gain from proximity: casual
passers-by might see refactorings that reduce redundancy.
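
As a sketch of the "in principle" claim in item 3, here is a Latin-1 codec written only with ord(), chr(), and bytes(). It is illustrative and is not how CPython implements its codecs; the function names are made up for this example.

```python
def latin1_encode(text):
    """Encode str to bytes; every code point must fit in a single byte."""
    return bytes(ord(char) for char in text)


def latin1_decode(data):
    """Decode bytes to str by mapping each byte to the same code point."""
    return "".join(chr(byte) for byte in data)


sample = "caf\u00e9"
encoded = latin1_encode(sample)
print(encoded == sample.encode("latin-1"), latin1_decode(encoded) == sample)
```

The functions never touch the internal representation of the string object, which is the sense in which codecs are conceptually separate from it.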

I'm not sure what to do with the charmap stuff.  In current CPython
head it seems incoherent to me: there's an IO codec, but there's also
unicode-to-unicode stuff (PyUnicode_Translate).  I haven't had time to
look at Victor's reorganization to see what he actually did with it,
but in terms of modularity, it seems to me that refactoring this stuff
would be a real win, as opposed to splitting the files, which is a
presentational improvement for the rest of the code, which is already
pretty modular.

As for Victor's proposal itself:

  1176 Objects/unicodecharmap.c
  1678 Objects/unicodecodecs.c
  1362 Objects/unicodeformat.c
   253 Objects/unicodeimpl.h
   733 Objects/unicodelegacy.c
  1836 Objects/unicodenew.c
  2777 Objects/unicodeobject.c
  2421 Objects/unicodeoperators.c
  1235 Objects/unicodeoscodecs.c
  1288 Objects/unicodeutfcodecs.c

As Victor himself admits, "unicodelegacy" and "unicodenew" are not
descriptive of what they contain.  In I18N discussions, "legacy" is
usually a deprecatory reference to non-Unicode encodings, and I would
tend to guess this file contains codecs from the name.  A better name
might be "unicodedeprecated" (if what he really means is deprecated
APIs).

I don't understand why splitting out "unicodeoperators" is a great
idea; it's done nowhere else in CPython.  If that makes sense, why not
split out "unicodemethods" (for methods normally invoked explicitly
rather than by syntax) too?  N.B. For bytes, the corresponding file is
spelled "bytes_methods".

"unicodecodecs" vs "unicodeutfcodecs": Say what?  I would forever be
looking in the wrong one.

"unicodeoscodecs" suggests to me that these codecs are only usable on
some OSes.  If so, shouldn't the relevant OS be in the name?  If not,
the name is basically misleading IMO.

Why are any of these codecs here in unicodeobjectland in the first
place?  Sure, they're needed so that Python can find its own stuff,
but in principle *any* codec could be needed.  Is it just an heuristic
that the codecs needed for 99% of the world are here, and other codecs
live in separate modules?

Steve


Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-24 Thread Nick Coghlan
On Thu, Oct 25, 2012 at 2:22 PM, Stephen J. Turnbull  wrote:
> Nick Coghlan writes:
>
>  > OK, I need to weigh in after seeing this kind of reply. Large source files
>  > are discouraged in general because they're a code smell that points
>  > strongly towards a *lack of modularity* within a *complex piece of
>  > functionality*.
>
> Sure, but large numbers of tiny source files are also a code smell,
> the smell of purist adherence to the literal principle of modularity
> without application of judgment.

Absolutely. The classic example of this is Java's unfortunate
insistence on only-one-public-top-level-class-per-file. Bleh.

> If you want to argue that the pragmatic point of view nevertheless is
> to break up the file, I can see that, but I think Victor is going too
> far.  (Full disclosure dept.: the call graph of the Emacs equivalents
> is isomorphic to the Dungeon of Zork, so I may be a bit biased.)  You
> really should speak to the question of "how many" and "what partition".

Yes, I agree I was too hasty in calling the specifics of Victor's
current proposal a good idea. What raised my ire was the raft of
replies objecting to the refactoring *in principle* for completely
specious reasons like being able to search within a single file
instead of having to use tools that can search across multiple files.

unicodeobject.c is too big, and should be restructured to make any
natural modularity explicit, and provide an easier path for users that
want to understand how the unicode implementation works.

>  > the real gain is in *modularity*, making it clear to readers which
>  > parts can be understood and worked on separately from each other.
>
> Yeah, so which do you think they are?  It seems to me that there are
> three modules to be carved out of unicodeobject.c:
>
> 1.  The internal object management that is not exposed to Python:
> allocation, deallocation, and PEP 393 transformations.
>
> 2.  The public interface to Python implementation: methods and
> properties, including operators.
>
> 3.  Interaction with the outside world: codec implementations.  But
> conceptually, these really don't have anything to do with internal
> implementation of Unicode objects.  They're just functions that
> convert bytes to Unicode and vice versa.  In principle they can be
> written in terms of ord(), chr(), and bytes().  On the other hand,
> they're rather repetitive: "When you've seen one codec
> implementation, you've seen them all."  I see no harm in grouping
> them in one file, and possibly a gain from proximity: casual
> passers-by might see refactorings that reduce redundancy.

I suspect you and Victor are in a much better position to thrash out
the details than I am. What bothered me was the trend in the discussion
to treat the question as "split or don't split?" rather than "how should
we split it?", when a file that large should already contain some natural
splitting points if the implementation isn't a tangled monolithic
mess.

> Why are any of these codecs here in unicodeobjectland in the first
> place?  Sure, they're needed so that Python can find its own stuff,
> but in principle *any* codec could be needed.  Is it just an heuristic
> that the codecs needed for 99% of the world are here, and other codecs
> live in separate modules?

I believe it's a combination of history and whether or not they're
needed by the interpreter during the bootstrapping process before the
encodings namespace is importable.

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia


Re: [Python-Dev] Split unicodeobject.c into subfiles

2012-10-24 Thread M.-A. Lemburg
On 25.10.2012 08:42, Nick Coghlan wrote:
>> Why are any of these codecs here in unicodeobjectland in the first
>> place?  Sure, they're needed so that Python can find its own stuff,
>> but in principle *any* codec could be needed.  Is it just an heuristic
>> that the codecs needed for 99% of the world are here, and other codecs
>> live in separate modules?
> 
> I believe it's a combination of history and whether or not they're
> needed by the interpreter during the bootstrapping process before the
> encodings namespace is importable.

They are in unicodeobject.c so that the compilers can inline the
code in the various other places where they are used in the Unicode
implementation directly as necessary and because the codecs use
a lot of functions from the Unicode API (obviously), so the other
direction of inlining (Unicode API in codecs) is needed as well.

BTW: When discussing compiler optimizations, please remember that
there are more compilers out there than just GCC and also the fact
that not everyone is using the latest and greatest version of it.
Link time inlining will usually not be as efficient as compile time
optimization and we need every bit of performance we can get
for Unicode in Python 3.

-- 
Marc-Andre Lemburg
eGenix.com