Re: [Python-Dev] Fast Implementation for ZIP decryption

2009-08-30 Thread Martin v. Löwis
> Does it sound worthy enough to create a patch for and integrate into
> python itself?

Probably not, given that people think that the algorithm itself is
fairly useless.

Regards,
Martin
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] Fast Implementation for ZIP decryption

2009-08-30 Thread Steven D'Aprano
On Sun, 30 Aug 2009 06:55:33 pm Martin v. Löwis wrote:
> > Does it sound worthy enough to create a patch for and integrate
> > into python itself?
>
> Probably not, given that people think that the algorithm itself is
> fairly useless.

I would think that for most people, the threat model isn't "the CIA is 
reading my files" but "my little brother or nosey co-worker is reading 
my files", and for that, zip encryption with a good password is 
probably perfectly adequate. E.g. OpenOffice uses it for 
password-protected documents.

Given that Python already supports ZIP decryption (as it should), are 
there any reasons to prefer the current pure-Python implementation over 
a faster version?


-- 
Steven D'Aprano


Re: [Python-Dev] Mercurial migration: help needed

2009-08-30 Thread Martin Geisler
Mark Hammond writes:

> 1) I've stalled on the 'none:' patch I promised to resurrect.  While
> doing this, I re-discovered that the tests for win32text appear to
> check win32 line endings are used by win32text on *all* platforms, not
> just Windows.

I think it is only Patrick Mezard who knows how to run (parts of) the
test suite on Windows.

> I asked for advice from Dirkjan who referred me to the mercurial-devel
> list, but my request of slightly over a week ago remains unanswered
> (http://selenic.com/pipermail/mercurial-devel/2009-August/014873.html)
> - maybe I just need to be more patient...

Oh no, that's usually the wrong tactic :-) I've been too busy for real
Mercurial work the last couple of weeks, but you should not feel bad
about poking us if you don't get a reply. Or come to the IRC channel
(#mercurial on irc.freenode.net) where Dirkjan (djc) and myself (mg)
hang out when it's daytime in Europe.

> Further, Martin's comments in this thread indicate he believes a new
> extension will be necessary rather than 'fixing' win32text.  If this
> is the direction we take, it may mean the none: patch, which targets
> the implementation of win32text, is no longer necessary anyway.

I suggested a new extension for two reasons:

* I'm using Linux, and I mentally skip over all extensions that mention
  "win32"... I guess others do the same, and in this case it's really a
  shame since converting EOL markers is a cross-platform problem: if
  someone creates a repository on Windows, I might find it nice to
  translate the EOL markers into LF on my machine.

  As far as I know, all my tools work correctly with CRLF EOL markers,
  but I can see the usefulness of such an extension when adding new
  files (which would default to LF unless I take care).

* A new extension will not have to deal with backwards compatibility
  issues. That would let us clean up the strange names: I think
  "cleverencode:" and "cleverdecode:" are quite poor names that convey
  little meaning (and what's with the colon?). We could instead use the
  same names as Subversion: "native", "CRLF" and "LF".

  The new extension could be named 'convert-eol' or something like that.

> 2) These same recent discussions about an entirely new extension and
> no clear indication of our expectations regarding what the tool
> actually enforces means I'm not sure how to make a start on the more
> general issue.

It would be a folly to require all files in all changesets to use the
right EOL markers -- people will be making mistakes offline. The
important thing is that they fix them before pushing to a public server.

So the extension should do that: either abort commits with the wrong EOL
markers or do as Subversion and automatically convert the file in the
working copy.
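
The two behaviours mentioned here (refuse content with the wrong EOL markers, or convert it) can be sketched on raw bytes in a few lines of Python. This is only an illustration: the names `find_wrong_eols` and `convert_eols` are made up for this sketch and are not part of Mercurial or of any existing extension, and treating a lone CR as an EOL marker is an assumption.

```python
import re

# Style names follow the Subversion-like names used in this thread.
EOL_BYTES = {"LF": b"\n", "CRLF": b"\r\n"}

# CRLF is listed first so that a CRLF pair is never split into CR and LF.
_ANY_EOL = re.compile(rb"\r\n|\r|\n")


def find_wrong_eols(data, style):
    """Return the 1-based numbers of lines whose EOL marker is not `style`."""
    wanted = EOL_BYTES[style]
    return [
        lineno
        for lineno, match in enumerate(_ANY_EOL.finditer(data), start=1)
        if match.group() != wanted
    ]


def convert_eols(data, style):
    """Rewrite every EOL marker in `data` to the one named by `style`."""
    wanted = EOL_BYTES[style]
    return _ANY_EOL.sub(lambda match: wanted, data)


mixed = b"one\r\ntwo\nthree\r\n"
print(find_wrong_eols(mixed, "LF"))   # lines a strict check would reject
print(convert_eols(mixed, "LF"))      # what an automatic conversion would keep
```

A strict mode would abort when `find_wrong_eols` returns a non-empty list; a Subversion-like mode would call `convert_eols` instead.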

> I also fear that should I try to make a start on this, it will still
> wind up fruitless - eg, it seems any work targeting win32text
> specifically would have been wasted, so I'd really like to see a
> consensus on what needs to be done before attempting to start it.

As I understand it, what is lacking is a way for win32text to read the
encode/decode settings from a versioned file called /.hgeol. With that,
you could just enable the extension and be done with it, instead of
configuring it in every clone. The /.hgeol file should contain two
sections:

  [repository]
  native = LF

  [patterns]
  Windows.txt = CRLF
  Unix.txt = LF
  Tools/buildbot/** = CRLF
  **.txt = native
  **.py = native
  **.dsp = CRLF

The [repository] setting controls what native is translated into upon
commit. The [patterns] section can be translated into safe [decode] /
[encode] settings by the extension:

  [encode]
  Windows.txt = to-crlf
  Unix.txt = to-lf
  Tools/buildbot/** = to-crlf
  **.txt = to-lf
  **.py = to-lf
  **.dsp = to-crlf

  [decode]
  Windows.txt = to-crlf
  Unix.txt = to-lf
  Tools/buildbot/** = to-crlf
  **.txt = to-native
  **.py = to-native
  **.dsp = to-crlf

where to-crlf, to-lf, to-native are filters installed by the extension.
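
The translation from [patterns] to [encode] / [decode] is mechanical, so a rough Python sketch may help. It assumes the patterns have already been parsed into (glob, style) pairs; the function name `translate_patterns` is hypothetical, and only the filter names to-lf, to-crlf and to-native come from the proposal above.

```python
def translate_patterns(patterns, repo_native):
    """Derive (encode, decode) filter tables from .hgeol-style patterns.

    `patterns` is an ordered list of (glob, style) pairs where style is
    "LF", "CRLF" or "native"; `repo_native` is the [repository] setting.
    """
    to_filter = {"LF": "to-lf", "CRLF": "to-crlf"}
    encode, decode = {}, {}
    for glob, style in patterns:
        if style == "native":
            # Commit: store the file in the repository's native format.
            encode[glob] = to_filter[repo_native]
            # Checkout: use whatever the local platform prefers.
            decode[glob] = "to-native"
        else:
            # Files with a fixed style look the same on both sides.
            encode[glob] = decode[glob] = to_filter[style]
    return encode, decode


patterns = [
    ("Windows.txt", "CRLF"),
    ("Unix.txt", "LF"),
    ("Tools/buildbot/**", "CRLF"),
    ("**.txt", "native"),
    ("**.py", "native"),
    ("**.dsp", "CRLF"),
]
encode, decode = translate_patterns(patterns, repo_native="LF")
for glob, _ in patterns:
    print(f"{glob:20} encode={encode[glob]:8} decode={decode[glob]}")
```

Plain dicts keep the pattern order here, which matters if earlier, more specific patterns are meant to win over later globs.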

I guess your 'none' encode/decode filter patch would be needed if the
Unix.txt file were to be stored unchanged in the repository? Instead I
imagine that the extension will convert a modified Unix.txt to LF EOL
markers automatically (Subversion behaves like that, as far as I can
tell from a bit of testing).

That way the repository will contain most files in the format specified
as native for it, but selected files are stored using whatever EOLs they
like. The result is that someone who has not enabled the extension will
get correct files from a checkout. Had we stored the *.dsp files with LF
EOLs in the repository (like Subversion does), then using the extension
would be mandatory for everybody.

-- 
Martin Geisler

VIFF (Virtual Ideal Functionality Framework) brings easy and efficient
SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/.



Re: [Python-Dev] Fast Implementation for ZIP decryption

2009-08-30 Thread exarkun

On 12:59 pm, st...@pearwood.info wrote:
> On Sun, 30 Aug 2009 06:55:33 pm Martin v. Löwis wrote:
>> > Does it sound worthy enough to create a patch for and integrate
>> > into python itself?
>>
>> Probably not, given that people think that the algorithm itself is
>> fairly useless.
>
> I would think that for most people, the threat model isn't "the CIA is
> reading my files" but "my little brother or nosey co-worker is reading
> my files", and for that, zip encryption with a good password is
> probably perfectly adequate. E.g. OpenOffice uses it for
> password-protected documents.
>
> Given that Python already supports ZIP decryption (as it should), are
> there any reasons to prefer the current pure-Python implementation over
> a faster version?


Given that the use case is "protect my biology homework from my little 
brother", how fast does the implementation really need to be?  Is 
speeding it up from 0.1 seconds to 0.001 seconds worth the potential new 
problems that come with more C code (more code to maintain, less 
portability to other runtimes, potential for interpreter crashes or even 
arbitrary code execution vulnerabilities from specially crafted files)?


Jean-Paul


Re: [Python-Dev] Fast Implementation for ZIP decryption

2009-08-30 Thread Shashank Singh
Just to give you an idea of the speed-up:

A 3.3 MB zip file extracted using the current all-Python implementation on
my machine (Windows XP, 1.67 GHz, 1.5 GB RAM) takes approximately 38 seconds.

The same file, when extracted using the C implementation, takes 0.4 seconds.

--shashank

On Sun, Aug 30, 2009 at 6:35 PM,  wrote:

> On 12:59 pm, st...@pearwood.info wrote:
>
>> On Sun, 30 Aug 2009 06:55:33 pm Martin v. Löwis wrote:
>>
>>> > Does it sound worthy enough to create a patch for and integrate
>>> > into python itself?
>>>
>>> Probably not, given that people think that the algorithm itself is
>>> fairly useless.
>>>
>>
>> I would think that for most people, the threat model isn't "the CIA is
>> reading my files" but "my little brother or nosey co-worker is reading
>> my files", and for that, zip encryption with a good password is
>> probably perfectly adequate. E.g. OpenOffice uses it for
>> password-protected documents.
>>
>> Given that Python already supports ZIP decryption (as it should), are
>> there any reasons to prefer the current pure-Python implementation over
>> a faster version?
>>
>
> Given that the use case is "protect my biology homework from my little
> brother", how fast does the implementation really need to be?  Is speeding
> it up from 0.1 seconds to 0.001 seconds worth the potential new problems
> that come with more C code (more code to maintain, less portability to other
> runtimes, potential for interpreter crashes or even arbitrary code execution
> vulnerabilities from specially crafted files)?
>
> Jean-Paul


Re: [Python-Dev] Fast Implementation for ZIP decryption

2009-08-30 Thread Ludvig Ericson

On 30 aug 2009, at 16:34, Shashank Singh wrote:

> just to give you an idea of the speed up:
>
> a 3.3 mb zip file extracted using the current all-python
> implementation on my machine (win xp 1.67Ghz 1.5GB)
> takes approximately 38 seconds.
>
> the same file when extracted using c implementation takes 0.4 seconds.


If this matters to the users of the API, then likely they'd search for  
alternatives -- no need for it to go into the standard library just  
because it replaces functionality, or am I misunderstanding?


- Ludvig Ericson 



Re: [Python-Dev] Mercurial migration: help needed

2009-08-30 Thread Martin v. Löwis
> I suggested a new extension for two reasons:
> 
> * I'm using Linux, and I mentally skip over all extensions that mention
>   "win32"... I guess others do the same, and in this case it's really a
>   shame since converting EOL markers is a cross-platform problem: if
>   someone creates a repository on Windows, I might find it nice to
>   translate the EOL markers into LF on my machine.
> 
>   As far as I know, all my tools work correctly with CRLF EOL markers,
>   but I can see the usefulness of such an extension when adding new
>   files (which would default to LF unless I take care).
> 
> * A new extension will not have to deal with backwards compatibility
>   issues. That would let us clean up the strange names: I think
>   "cleverencode:" and "cleverdecode:" are quite poor names that convey
>   little meaning (and what's with the colon?). We could instead use the
>   same names as Subversion: "native", "CRLF" and "LF".
> 
>   The new extension could be named 'convert-eol' or something like that.

Thanks for the confirmation - this is also why I think a new extension
would be best. FWIW, in Python, most files would be declared native,
some CRLF, none LF.

>> 2) These same recent discussions about an entirely new extension and
>> no clear indication of our expectations regarding what the tool
>> actually enforces means I'm not sure how to make a start on the more
>> general issue.
> 
> It would be a folly to require all files in all changesets to use the
> right EOL markers -- people will be making mistakes offline. The
> important thing is that they fix them before pushing to a public server.
> 
> So the extension should do that: either abort commits with the wrong EOL
> markers or do as Subversion and automatically convert the file in the
> working copy.

Maybe I misunderstand: when people use the extension, they cannot
possibly make mistakes, right? Because the commit that gets aborted
is already the local commit, right?

Of course, it may still be that not all people use the extension.
I think this is of concern to Mark (and he would like hg to refuse
operation at all if the extension isn't used), but not to me: I would
like this to be a feature of hg eventually, in which case I don't need
to worry whether hg enforces presence of certain extensions.

If people make commits that break the eol style, we could well
refuse to accept them on the server, telling people that they should
have used the extension (or that they should have been more careful
if they don't use the extension).

I think subversion's behavior wrt. incorrect eol-style is more subtle.
In some cases, it will complain about inconsistencies, rather than
fixing them automatically.

Regards,
Martin



Re: [Python-Dev] Mercurial migration: help needed

2009-08-30 Thread Martin Geisler
"Martin v. Löwis" writes:

>> So the extension should do that: either abort commits with the wrong
>> EOL markers or do as Subversion and automatically convert the file in
>> the working copy.
>
> Maybe I misunderstand: when people use the extension, they cannot
> possibly make mistakes, right? Because the commit that gets aborted is
> already the local commit, right?
>
> Of course, it may still be that not all people use the extension.

Exactly, when people use the extension, they won't be able to make bad
commits.

> I think this is of concern to Mark (and he would like hg to refuse
> operation at all if the extension isn't used), but not to me: I would
> like this to be a feature of hg eventually, in which case I don't need
> to worry whether hg enforces presence of certain extensions.

Yes, that would be nice for the future. I don't know if the other
Mercurial developers will see this as a big controversy -- Mercurial has
so far made very sure to never mutate your files behind your back.
Expansion of keywords (like $Id$) is also implemented as an extension.

> If people make commits that break the eol style, we could well refuse
> to accept them on the server, telling people that they should have
> used the extension (or that they should have been more careful if they
> don't use the extension).

Indeed. Their work will not be lost -- one can always take the final
file, convert the line-endings, copy it into a fresh clone and commit
that. With more work one could even salvage the intermediate commits,
but that is probably not necessary.

> I think subversion's behavior wrt. incorrect eol-style is more subtle.
> In some cases, it will complain about inconsistencies, rather than
> fixing them automatically.

Okay --- I don't have much experience with the svn:eol-style, except
that I've read about it in the manual.

-- 
Martin Geisler

VIFF (Virtual Ideal Functionality Framework) brings easy and efficient
SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/.




Re: [Python-Dev] Fast Implementation for ZIP decryption

2009-08-30 Thread Nick Coghlan
exar...@twistedmatrix.com wrote:
> Given that the use case is "protect my biology homework from my little
> brother", how fast does the implementation really need to be?  Is
> speeding it up from 0.1 seconds to 0.001 seconds worth the potential new
> problems that come with more C code (more code to maintain, less
> portability to other runtimes, potential for interpreter crashes or even
> arbitrary code execution vulnerabilities from specially crafted files)?

Also, if the use case is just protecting stuff from a sibling or your
children, use an archiving program to zip/extract it :)

So -1 here as well. Any added C code has a real cost for the reasons
Jean-Paul listed, so it should only be used in cases where there's a
major practical benefit to the speed-up. Faster execution of a
problematic algorithm that is already well implemented by plenty of
other applications doesn't qualify in my book (even if the speedup is by
a couple of orders of magnitude).

Cheers,
Nick.

-- 
Nick Coghlan   |   ncogh...@gmail.com   |   Brisbane, Australia
---


[Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

2009-08-30 Thread Brett Cannon
I am going through and running the entire test suite using importlib
to ferret out incompatibilities. I have found a bunch, although all
rather minor (raising a different exception typically; not even sure
they are worth backporting as anyone reliant on the old exceptions
might get a nasty surprise in the next micro release), and now I am
down to my last failing test suite: test_import.

Ignoring the execution bit problem (http://bugs.python.org/issue6526
but I have no clue why this is happening), I am bumping up against
TestPycRewriting.test_incorrect_code_name. Turns out that import
resets co_filename on a code object to __file__ before exec'ing it to
create a module's namespace in order to ignore the file name passed
into compile() for the filename argument. Now I can't change
co_filename from Python as it's a read-only attribute and thus can't
match this functionality in importlib w/o creating some custom code to
allow me to specify the co_filename somewhere (marshal.loads() or some
new function).
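
For readers who have not run into this, here is a small demonstration of the two facts above: the name passed to compile() ends up in co_filename, and the attribute cannot be assigned from Python. The paths are made up, and the exact exception type is an assumption that may vary between Python versions, so the sketch catches both candidates.

```python
# The file name given to compile() is recorded on the code object.
code = compile("x = 1", "/made/up/original.py", "exec")
print(code.co_filename)

# Assigning to the attribute from Python is rejected.
try:
    code.co_filename = "/made/up/elsewhere.py"
except (AttributeError, TypeError) as exc:
    assignment_error = exc
else:
    assignment_error = None
print(type(assignment_error).__name__, assignment_error)
```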

My question is how important is this functionality? Do I really need
to go through and add an argument to marshal.loads or some new
function just to set co_filename to something that someone explicitly
set in a .pyc file? Or can I let this go and have this be the one
place where builtins.__import__ and importlib.__import__ differ and
just not worry about it?

-Brett


Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

2009-08-30 Thread Robert Collins
On Sun, 2009-08-30 at 16:28 -0700, Brett Cannon wrote:
> 
> 
> My question is how important is this functionality? Do I really need
> to go through and add an argument to marshal.loads or some new
> function just to set co_filename to something that someone explicitly
> set in a .pyc file? Or I can let this go and have this be the one
> place where builtins.__import__ and importlib.__import__ differ and
> just not worry about it? 

Just to be clear, this would show up if I:
 - had a python tree
 - built and ran stuff from it
 - symlinked to that tree from somewhere else
 - ran stuff from that somewhere else

because the pyc is already on disk?

That's been an invaluable 'wtf' debugging tool at various times, because
the odd provenance of the path in the pyc makes it extremely clear that
what is being loaded isn't what one had thought was being loaded.

OTOH, always showing the path that the pyc was *actually found at* would
fix the weirdness that occurs when you mv a python tree from one place
to another.

-Rob




Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

2009-08-30 Thread Brett Cannon
On Sun, Aug 30, 2009 at 17:13, Robert Collins wrote:
> On Sun, 2009-08-30 at 16:28 -0700, Brett Cannon wrote:
>>
>>
>> My question is how important is this functionality? Do I really need
>> to go through and add an argument to marshal.loads or some new
>> function just to set co_filename to something that someone explicitly
>> set in a .pyc file? Or I can let this go and have this be the one
>> place where builtins.__import__ and importlib.__import__ differ and
>> just not worry about it?
>
> Just to be clear, this would show up if I:
> had a python tree
> built and run stuff from it
> symlinked to that tree from somewhere else
> ran stuff from that somewhere else

Right; the code object would think it was loaded from the original
location it was created at instead of where it actually is. Now why
someone would want to move their .pyc files around instead of
recompiling I don't know short of not wanting to send someone source.

-Brett


Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

2009-08-30 Thread Guido van Rossum
On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon wrote:
> I am going through and running the entire test suite using importlib
> to ferret out incompatibilities. I have found a bunch, although all
> rather minor (raising a different exception typically; not even sure
> they are worth backporting as anyone reliant on the old exceptions
> might get a nasty surprise in the next micro release), and now I am
> down to my last failing test suite: test_import.
>
> Ignoring the execution bit problem (http://bugs.python.org/issue6526
> but I have no clue why this is happening), I am bumping up against
> TestPycRewriting.test_incorrect_code_name. Turns out that import
> resets co_filename on a code object to __file__ before exec'ing it to
> create a module's namespace in order to ignore the file name passed
> into compile() for the filename argument. Now I can't change
> co_filename from Python as it's a read-only attribute and thus can't
> match this functionality in importlib w/o creating some custom code to
> allow me to specify the co_filename somewhere (marshal.loads() or some
> new function).
>
> My question is how important is this functionality? Do I really need
> to go through and add an argument to marshal.loads or some new
> function just to set co_filename to something that someone explicitly
> set in a .pyc file? Or I can let this go and have this be the one
> place where builtins.__import__ and importlib.__import__ differ and
> just not worry about it?

ISTR that Bill Janssen once mentioned a file replication mechanism
whereby there were two names for each file: the "canonical" name on a
replicated read-only filesystem, and the longer "writable" name on a
unique master copy. He ended up with the filenames in the .pyc files
being pretty bogus (since not everyone had access to the writable
filesystem). So setting co_filename to match __file__ (i.e. the name
under which the module is being imported) would be a nice service in
this case.

In general this would happen whenever you pre-compile a bunch of .py
files to .pyc/.pyo and then copy the lot to a different location. Not
a completely unlikely scenario.
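
The copy scenario can be imitated without touching the filesystem, since the body of a .pyc is essentially a marshalled code object. The sketch below uses made-up paths and a hand-built namespace in place of a real import; it only shows that the name recorded at compile time survives marshal unchanged.

```python
import marshal

source = "def where():\n    return where.__code__.co_filename\n"

# Pretend this happens on the machine that pre-compiles the files.
code = compile(source, "/build/host/pkg/mod.py", "exec")
payload = marshal.dumps(code)      # roughly what ends up inside a .pyc

# Pretend this happens after the files were copied somewhere else.
loaded = marshal.loads(payload)
namespace = {"__file__": "/deployed/pkg/mod.py"}
exec(loaded, namespace)

print(namespace["__file__"])       # where the module is being loaded from
print(namespace["where"]())        # what the code object still believes
```

Unless something rewrites co_filename after loading, the second line printed is the build-time path rather than the path in __file__.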

(I was going to comment on the execution bit issue but I realized I'm
not even sure if you're talking about import.c or not. :-)

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

2009-08-30 Thread Guido van Rossum
On Sun, Aug 30, 2009 at 5:23 PM, Brett Cannon wrote:
> Right; the code object would think it was loaded from the original
> location it was created at instead of where it actually is. Now why
> someone would want to move their .pyc files around instead of
> recompiling I don't know short of not wanting to send someone source.

I already mentioned replication; it could also just be a matter of
downloading a tarball with .py and .pyc files.

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)


Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

2009-08-30 Thread Brett Cannon
On Sun, Aug 30, 2009 at 17:24, Guido van Rossum wrote:
> On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon wrote:
>> I am going through and running the entire test suite using importlib
>> to ferret out incompatibilities. I have found a bunch, although all
>> rather minor (raising a different exception typically; not even sure
>> they are worth backporting as anyone reliant on the old exceptions
>> might get a nasty surprise in the next micro release), and now I am
>> down to my last failing test suite: test_import.
>>
>> Ignoring the execution bit problem (http://bugs.python.org/issue6526
>> but I have no clue why this is happening), I am bumping up against
>> TestPycRewriting.test_incorrect_code_name. Turns out that import
>> resets co_filename on a code object to __file__ before exec'ing it to
>> create a module's namespace in order to ignore the file name passed
>> into compile() for the filename argument. Now I can't change
>> co_filename from Python as it's a read-only attribute and thus can't
>> match this functionality in importlib w/o creating some custom code to
>> allow me to specify the co_filename somewhere (marshal.loads() or some
>> new function).
>>
>> My question is how important is this functionality? Do I really need
>> to go through and add an argument to marshal.loads or some new
>> function just to set co_filename to something that someone explicitly
>> set in a .pyc file? Or I can let this go and have this be the one
>> place where builtins.__import__ and importlib.__import__ differ and
>> just not worry about it?
>
> ISTR that Bill Janssen once mentioned a file replication mechanism
> whereby there were two names for each file: the "canonical" name on a
> replicated read-only filesystem, and the longer "writable" name on a
> unique master copy. He ended up with the filenames in the .pyc files
> being pretty bogus (since not everyone had access to the writable
> filesystem). So setting co_filename to match __file__ (i.e. the name
> under which the module is being imported) would be a nice service in
> this case.
>
> In general this would happen whenever you pre-compile a bunch of .py
> files to .pyc/.pyo and then copy the lot to a different location. Not
> a completely unlikely scenario.
>

Well, to get this level of compatibility I am going to need to add
some magical API somewhere then to overwrite a code object's "file"
location. Blah.

I will either add an argument to marshal.loads to specify an
overriding file path or add an imp.exec that takes a file path
argument to override the code object with.

> (I was going to comment on the execution bit issue but I realized I'm
> not even sure if you're talking about import.c or not. :-)

So it turns out a bunch of execution/write bit stuff has come up in
Python 2.7 and importlib has been ignoring it. =) Importlib has simply
been opening up the bytecode files with 'wb' and writing out the file.
But test_import tests that no execution bit gets set or that a write
bit gets added if the source file lacks it. I guess I can use
posix.chmod and posix.stat to copy the source file's read and write
bits and always mask out the execution bits. I hate this low-level
file permission stuff.

-Brett


Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

2009-08-30 Thread Guido van Rossum
On Sun, Aug 30, 2009 at 5:34 PM, Brett Cannon wrote:
> On Sun, Aug 30, 2009 at 17:24, Guido van Rossum wrote:
>> On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon wrote:
>>> I am going through and running the entire test suite using importlib
>>> to ferret out incompatibilities. I have found a bunch, although all
>>> rather minor (raising a different exception typically; not even sure
>>> they are worth backporting as anyone reliant on the old exceptions
>>> might get a nasty surprise in the next micro release), and now I am
>>> down to my last failing test suite: test_import.
>>>
>>> Ignoring the execution bit problem (http://bugs.python.org/issue6526
>>> but I have no clue why this is happening), I am bumping up against
>>> TestPycRewriting.test_incorrect_code_name. Turns out that import
>>> resets co_filename on a code object to __file__ before exec'ing it to
>>> create a module's namespace in order to ignore the file name passed
>>> into compile() for the filename argument. Now I can't change
>>> co_filename from Python as it's a read-only attribute and thus can't
>>> match this functionality in importlib w/o creating some custom code to
>>> allow me to specify the co_filename somewhere (marshal.loads() or some
>>> new function).
>>>
>>> My question is how important is this functionality? Do I really need
>>> to go through and add an argument to marshal.loads or some new
>>> function just to set co_filename to something that someone explicitly
>>> set in a .pyc file? Or I can let this go and have this be the one
>>> place where builtins.__import__ and importlib.__import__ differ and
>>> just not worry about it?
>>
>> ISTR that Bill Janssen once mentioned a file replication mechanism
>> whereby there were two names for each file: the "canonical" name on a
>> replicated read-only filesystem, and the longer "writable" name on a
>> unique master copy. He ended up with the filenames in the .pyc files
>> being pretty bogus (since not everyone had access to the writable
>> filesystem). So setting co_filename to match __file__ (i.e. the name
>> under which the module is being imported) would be a nice service in
>> this case.
>>
>> In general this would happen whenever you pre-compile a bunch of .py
>> files to .pyc/.pyo and then copy the lot to a different location. Not
>> a completely unlikely scenario.

> Well, to get this level of compatibility I am going to need to add
> some magical API somewhere then to overwrite a code object's "file"
> location. Blah.

Agreed, no fun. Unfortunately for core Python it really pays to go the
extra mile...

> I will either add an argument to marshal.loads to specify an
> overriding file path or add an imp.exec that takes a file path
> argument to override the code object with.

Remember, there are many code objects created from one pyc file.
Adding it to marshal.load*() makes sense because then it's usable for
other purposes too, and that attacks the issue from the root. (in
import.c it's done by update_compiled_module() right after
read_compiled_module(), which is a thin wrapper around marshal.load())
I'm not sure how imp.exec would make sure that introspection of the
loaded code objects always gets the right thing.
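
The point about nested code objects can be sketched as follows. This is
an illustration, not the import.c implementation. It assumes Python 3.8
or later, where code objects have a replace() method (which did not exist
when this thread was written), and the helper name rewrite_co_filename is
hypothetical. It walks co_consts recursively so that every code object
compiled from the module reports the new path.

```python
import types


def rewrite_co_filename(code, new_path):
    """Return a copy of `code` whose co_filename, and that of every
    nested code object found in co_consts, is `new_path`.

    Hypothetical helper for illustration; relies on CodeType.replace(),
    which is available in Python 3.8+.
    """
    new_consts = tuple(
        rewrite_co_filename(const, new_path)
        if isinstance(const, types.CodeType) else const
        for const in code.co_consts
    )
    return code.replace(co_filename=new_path, co_consts=new_consts)


# Compile under one path, then pretend the module was moved elsewhere.
source = "def f():\n    def g():\n        pass\n    return g\n"
original = compile(source, "/build/area/mod.py", "exec")
relocated = rewrite_co_filename(original, "/install/area/mod.py")

namespace = {}
exec(relocated, namespace)
inner = namespace["f"]()  # the nested function g
```

Rewriting only the top-level code object would leave f and g still
reporting the build-time path, which is why the walk has to be recursive.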

>> (I was going to comment on the execution bit issue but I realized I'm
>> not even sure if you're talking about import.c or not. :-)
>
> So it turns out a bunch of execution/write bit stuff has come up in
> Python 2.7 and importlib has been ignoring it. =) Importlib has simply
> been opening up the bytecode files with 'wb' and writing out the file.
> But test_import tests that no execution bit gets set or that a write
> bit gets added if the source file lacks it. I guess I can use
> posix.chmod and posix.stat to copy the source file's read and write
> bits and always mask out the execution bits. I hate this low-level
> file permission stuff.

It's no fun -- see the layers of #ifdefs in open_exclusive() in
import.c. (Though I think you won't need to worry about VMS. :-) But
it's somewhat important to get it right from a security POV. I would
use os.open() and wrap an io.BufferedWriter around it.
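
The os.open() suggestion can be sketched roughly as follows. This is an
illustration under stated assumptions rather than what importlib does:
the function name write_bytecode is hypothetical, the mode is derived
from the source file with all execute bits masked out, and the owner
write bit is added as one reading of what test_import checks. The
process umask can still clear further bits when the file is created.

```python
import io
import os
import stat


def write_bytecode(source_path, bytecode_path, data):
    """Write `data` to `bytecode_path` with permissions derived from
    `source_path`: read/write bits copied, execute bits dropped.

    Hypothetical helper. The mode only takes effect when the file is
    newly created, and the umask may remove further bits.
    """
    source_mode = stat.S_IMODE(os.stat(source_path).st_mode)
    # Keep only the read/write bits and make sure the owner can write.
    mode = (source_mode & 0o666) | stat.S_IWUSR
    flags = os.O_WRONLY | os.O_CREAT | os.O_TRUNC
    flags |= getattr(os, "O_BINARY", 0)  # O_BINARY only exists on Windows
    fd = os.open(bytecode_path, flags, mode)
    # FileIO takes ownership of the descriptor and closes it on exit.
    with io.BufferedWriter(io.FileIO(fd, "w")) as stream:
        stream.write(data)
```

Passing the mode to os.open() means the file never exists with looser
permissions, which a chmod after a plain open(..., 'wb') cannot promise.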

-- 
--Guido van Rossum (home page: http://www.python.org/~guido/)
___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: 
http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com


Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

2009-08-30 Thread Brett Cannon
On Sun, Aug 30, 2009 at 19:34, Guido van Rossum wrote:
> On Sun, Aug 30, 2009 at 5:34 PM, Brett Cannon wrote:
>> On Sun, Aug 30, 2009 at 17:24, Guido van Rossum wrote:
>>> On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon wrote:
>>>> I am going through and running the entire test suite using importlib
>>>> to ferret out incompatibilities. I have found a bunch, although all
>>>> rather minor (raising a different exception typically; not even sure
>>>> they are worth backporting as anyone reliant on the old exceptions
>>>> might get a nasty surprise in the next micro release), and now I am
>>>> down to my last failing test suite: test_import.
>>>>
>>>> Ignoring the execution bit problem (http://bugs.python.org/issue6526
>>>> but I have no clue why this is happening), I am bumping up against
>>>> TestPycRewriting.test_incorrect_code_name. Turns out that import
>>>> resets co_filename on a code object to __file__ before exec'ing it to
>>>> create a module's namespace in order to ignore the file name passed
>>>> into compile() for the filename argument. Now I can't change
>>>> co_filename from Python as it's a read-only attribute and thus can't
>>>> match this functionality in importlib w/o creating some custom code to
>>>> allow me to specify the co_filename somewhere (marshal.loads() or some
>>>> new function).
>>>>
>>>> My question is how important is this functionality? Do I really need
>>>> to go through and add an argument to marshal.loads or some new
>>>> function just to set co_filename to something that someone explicitly
>>>> set in a .pyc file? Or I can let this go and have this be the one
>>>> place where builtins.__import__ and importlib.__import__ differ and
>>>> just not worry about it?
>>>
>>> ISTR that Bill Janssen once mentioned a file replication mechanism
>>> whereby there were two names for each file: the "canonical" name on a
>>> replicated read-only filesystem, and the longer "writable" name on a
>>> unique master copy. He ended up with the filenames in the .pyc files
>>> being pretty bogus (since not everyone had access to the writable
>>> filesystem). So setting co_filename to match __file__ (i.e. the name
>>> under which the module is being imported) would be a nice service in
>>> this case.
>>>
>>> In general this would happen whenever you pre-compile a bunch of .py
>>> files to .pyc/.pyo and then copy the lot to a different location. Not
>>> a completely unlikely scenario.
>
>> Well, to get this level of compatibility I am going to need to add
>> some magical API somewhere then to overwrite a code object's "file"
>> location. Blah.
>
> Agreed, no fun. Unfortunately for core Python it really pays to go the
> extra mile...
>

Definitely, which is why I will do it, just not tonight as I am tired
of compatibility fixing for now. =)

>> I will either add an argument to marshal.loads to specify an
>> overriding file path or add an imp.exec that takes a file path
>> argument to override the code object with.
>
> Remember, there are many code objects created from one pyc file.
> Adding it to marshal.load*() makes sense because then it's usable for
> other purposes too, and that attacks the issue from the root.

That was my thinking.

> (in
> import.c it's done by update_compiled_module() right after
> read_compiled_module(), which is a thin wrapper around marshal.load())
> I'm not sure how imp.exec would make sure that introspection of the
> loaded code objects always gets the right thing.
>

Basically it would be imp.exec(module, code, path) and it would tweak
the code object before execution based on introspecting what the
module had set for __file__. But might as well add the support to
marshal.

>>> (I was going to comment on the execution bit issue but I realized I'm
>>> not even sure if you're talking about import.c or not. :-)
>>
>> So it turns out a bunch of execution/write bit stuff has come up in
>> Python 2.7 and importlib has been ignoring it. =) Importlib has simply
>> been opening up the bytecode files with 'wb' and writing out the file.
>> But test_import tests that no execution bit gets set or that a write
>> bit gets added if the source file lacks it. I guess I can use
>> posix.chmod and posix.stat to copy the source file's read and write
>> bits and always mask out the execution bits. I hate this low-level
>> file permission stuff.
>
> It's no fun -- see the layers of #ifdefs in open_exclusive() in
> import.c. (Though I think you won't need to worry about VMS. :-) But
> it's somewhat important to get it right from a security POV. I would
> use os.open() and wrap an io.BufferedWriter around it.

I will have to see what of that is implemented in C or in Python. I
have always tried to keep importlib from depending on pure-Python
modules, for bootstrapping reasons: that keeps open the possibility of
using importlib as the implementation of import. But maybe I should not
be worrying about that right now and should instead do what keeps the
code simple.

-Brett

Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

2009-08-30 Thread Benjamin Peterson
2009/8/30 Brett Cannon :
> On Sun, Aug 30, 2009 at 19:34, Guido van Rossum wrote:
>> On Sun, Aug 30, 2009 at 5:34 PM, Brett Cannon wrote:
>>> On Sun, Aug 30, 2009 at 17:24, Guido van Rossum wrote:
>>>> (I was going to comment on the execution bit issue but I realized I'm
>>>> not even sure if you're talking about import.c or not. :-)
>>>
>>> So it turns out a bunch of execution/write bit stuff has come up in
>>> Python 2.7 and importlib has been ignoring it. =) Importlib has simply
>>> been opening up the bytecode files with 'wb' and writing out the file.
>>> But test_import tests that no execution bit gets set or that a write
>>> bit gets added if the source file lacks it. I guess I can use
>>> posix.chmod and posix.stat to copy the source file's read and write
>>> bits and always mask out the execution bits. I hate this low-level
>>> file permission stuff.
>>
>> It's no fun -- see the layers of #ifdefs in open_exclusive() in
>> import.c. (Though I think you won't need to worry about VMS. :-) But
>> it's somewhat important to get it right from a security POV. I would
>> use os.open() and wrap an io.BufferedWriter around it.
>
> I will have to see what of that is implemented in C or in Python. I
> have always tried to keep all pure Python code out of importlib for
> bootstrapping reasons in order to keep the possibility of using
> importlib as the implementation of import. But maybe I should not be
> worrying about that right at the moment and instead do what keeps the
> code simple.

You can use the C implementation of io, _io, which has a full
buffering implementation. Of course, that also makes it a bit
harder for other implementations which may wish to use importlib
because the io library would have to be completely implemented...



-- 
Regards,
Benjamin


Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

2009-08-30 Thread Glyph Lefkowitz
On Sun, Aug 30, 2009 at 8:26 PM, Guido van Rossum  wrote:

> On Sun, Aug 30, 2009 at 5:23 PM, Brett Cannon wrote:
> > Right; the code object would think it was loaded from the original
> > location it was created at instead of where it actually is. Now why
> > someone would want to move their .pyc files around instead of
> > recompiling I don't know short of not wanting to send someone source.
>
> I already mentioned replication; it could also just be a matter of
> downloading a tarball with .py and .pyc files.


Also, if you're using Python in an embedded context, bytecode compilation
(or even filesystem access!) can be prohibitively slow, so an uncompressed
.zip file full of compiled .pyc files is really the way to go.

I did this a long time ago on an XScale machine, but recent inspection of
the Android Python scripting stuff shows a similar style of deployment (cf.
/data/data/com.google.ase/python/lib/python26.zip).


Re: [Python-Dev] Fast Implementation for ZIP decryption

2009-08-30 Thread Collin Winter
On Sun, Aug 30, 2009 at 7:34 AM, Shashank
Singh wrote:
> Just to give you an idea of the speedup:
>
> A 3.3 MB zip file extracted using the current all-Python implementation on
> my machine (Win XP, 1.67 GHz, 1.5 GB)
> takes approximately 38 seconds.
>
> The same file when extracted using the C implementation takes 0.4 seconds.

Are there any applications/frameworks which have zip files on their
critical path, where this kind of (admittedly impressive) speedup
would be beneficial? What was the motivation for writing the C
version?

Collin Winter

> On Sun, Aug 30, 2009 at 6:35 PM,  wrote:
>>
>> On 12:59 pm, st...@pearwood.info wrote:
>>>
>>> On Sun, 30 Aug 2009 06:55:33 pm Martin v. Löwis wrote:

>>>> > Does it sound worthy enough to create a patch for and integrate
>>>> > into python itself?
>>>>
>>>> Probably not, given that people think that the algorithm itself is
>>>> fairly useless.
>>>
>>> I would think that for most people, the threat model isn't "the CIA is
>>> reading my files" but "my little brother or nosey co-worker is reading
>>> my files", and for that, zip encryption with a good password is
>>> probably perfectly adequate. E.g. OpenOffice uses it for
>>> password-protected documents.
>>>
>>> Given that Python already supports ZIP decryption (as it should), are
>>> there any reasons to prefer the current pure-Python implementation over
>>> a faster version?
>>
>> Given that the use case is "protect my biology homework from my little
>> brother", how fast does the implementation really need to be?  Is speeding
>> it up from 0.1 seconds to 0.001 seconds worth the potential new problems
>> that come with more C code (more code to maintain, less portability to other
>> runtimes, potential for interpreter crashes or even arbitrary code execution
>> vulnerabilities from specially crafted files)?
>>
>> Jean-Paul


Re: [Python-Dev] Fast Implementation for ZIP decryption

2009-08-30 Thread Jeroen Ruigrok van der Werven
-On [20090831 06:29], Collin Winter (coll...@gmail.com) wrote:
>Are there any applications/frameworks which have zip files on their
>critical path, where this kind of (admittedly impressive) speedup
>would be beneficial? What was the motivation for writing the C
>version?

Would zipped eggs count? For example, SQLAlchemy runs in the 5 MB range.

-- 
Jeroen Ruigrok van der Werven  / asmodai
イェルーン ラウフロック ヴァン デル ウェルヴェン
http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B
All for one, one for all...


Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?

2009-08-30 Thread Brett Cannon
On Sun, Aug 30, 2009 at 19:51, Benjamin Peterson wrote:
> 2009/8/30 Brett Cannon :
>> On Sun, Aug 30, 2009 at 19:34, Guido van Rossum wrote:
>>> On Sun, Aug 30, 2009 at 5:34 PM, Brett Cannon wrote:
>>>> On Sun, Aug 30, 2009 at 17:24, Guido van Rossum wrote:
>>>>> (I was going to comment on the execution bit issue but I realized I'm
>>>>> not even sure if you're talking about import.c or not. :-)
>>>>
>>>> So it turns out a bunch of execution/write bit stuff has come up in
>>>> Python 2.7 and importlib has been ignoring it. =) Importlib has simply
>>>> been opening up the bytecode files with 'wb' and writing out the file.
>>>> But test_import tests that no execution bit gets set or that a write
>>>> bit gets added if the source file lacks it. I guess I can use
>>>> posix.chmod and posix.stat to copy the source file's read and write
>>>> bits and always mask out the execution bits. I hate this low-level
>>>> file permission stuff.
>>>
>>> It's no fun -- see the layers of #ifdefs in open_exclusive() in
>>> import.c. (Though I think you won't need to worry about VMS. :-) But
>>> it's somewhat important to get it right from a security POV. I would
>>> use os.open() and wrap an io.BufferedWriter around it.
>>
>> I will have to see what of that is implemented in C or in Python. I
>> have always tried to keep all pure Python code out of importlib for
>> bootstrapping reasons in order to keep the possibility of using
>> importlib as the implementation of import. But maybe I should not be
>> worrying about that right at the moment and instead do what keeps the
>> code simple.
>
> You can use the C implementation of io, _io, which has a full
> buffering implementation. Of course, that also makes it a bit
> harder for other implementations which may wish to use importlib
> because the io library would have to be completely implemented...

True. I guess it's a question of whether keeping importlib easy to
maintain and minimally reliant on C-specific modules is more or less
important than trying to bootstrap it into CPython as the
implementation of __import__ at some point.

-Brett


Re: [Python-Dev] Fast Implementation for ZIP decryption

2009-08-30 Thread Gregory P. Smith
On Sun, Aug 30, 2009 at 10:40 PM, Jeroen Ruigrok van der Werven <
asmo...@in-nomine.org> wrote:

> -On [20090831 06:29], Collin Winter (coll...@gmail.com) wrote:
> >Are there any applications/frameworks which have zip files on their
> >critical path, where this kind of (admittedly impressive) speedup
> >would be beneficial? What was the motivation for writing the C
> >version?
>
> Would zipped eggs count? For example, SQLAlchemy runs in the 5 MB range.
>

Unless someone's also pushing for being able to import and execute code from
scrambled zip files, no that doesn't matter.

The C code for this should be trivially tiny.  See the zipfile._ZipDecrypter
class; it's got ~25 lines of actual code in it.  It is not worth arguing
about.  I'll commit this if you post it as a patch in a tracker issue.
Please make sure your patch includes the following:

* A unittest that compares the C version of the descrambler to the Python
version of the descrambler using a variety of inputs and outputs that
exercise the boundary conditions.

* Conditional import code in the zipfile module itself so that the module
works even if the C module isn't available.
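
The two requirements above might be sketched as follows. The C module
name _zipdecrypt and its decrypt function are hypothetical placeholders
(no such module is named in this thread), and the pure-Python fallback is
a from-memory sketch of the traditional PKZIP stream cipher rather than a
copy of the zipfile code. The encrypt method exists only so that the
example can round-trip data.

```python
import zlib

try:
    # Hypothetical C accelerator; both names are assumptions.
    from _zipdecrypt import decrypt as c_decrypt
except ImportError:
    c_decrypt = None


def _crc32_byte(byte, crc):
    """Raw CRC-32 table step for one byte (no pre/post inversion)."""
    return zlib.crc32(bytes([byte]), crc ^ 0xFFFFFFFF) ^ 0xFFFFFFFF


class PyZipDecrypter:
    """Pure-Python fallback for the traditional ZIP stream cipher."""

    def __init__(self, password):
        self.key0, self.key1, self.key2 = 0x12345678, 0x23456789, 0x34567890
        for byte in password:
            self._update_keys(byte)

    def _update_keys(self, byte):
        self.key0 = _crc32_byte(byte, self.key0)
        self.key1 = (self.key1 + (self.key0 & 0xFF)) & 0xFFFFFFFF
        self.key1 = (self.key1 * 134775813 + 1) & 0xFFFFFFFF
        self.key2 = _crc32_byte(self.key1 >> 24, self.key2)

    def _stream_byte(self):
        k = self.key2 | 2
        return ((k * (k ^ 1)) >> 8) & 0xFF

    def decrypt(self, data):
        out = bytearray()
        for byte in data:
            plain = byte ^ self._stream_byte()
            self._update_keys(plain)  # keys advance on the plaintext byte
            out.append(plain)
        return bytes(out)

    def encrypt(self, data):
        """Inverse of decrypt(); included only for the round-trip check."""
        out = bytearray()
        for byte in data:
            out.append(byte ^ self._stream_byte())
            self._update_keys(byte)
        return bytes(out)


def decrypt_bytes(password, data):
    """Use the C version when present, else fall back to pure Python."""
    if c_decrypt is not None:
        return c_decrypt(password, data)
    return PyZipDecrypter(password).decrypt(data)


secret = PyZipDecrypter(b"hunter2").encrypt(b"biology homework")
recovered = decrypt_bytes(b"hunter2", secret)
```

A unit test of the kind requested would feed the same passwords and
payloads (empty input, a single byte, long runs) to both implementations
and assert that the outputs are identical.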

-Greg