Re: [Python-Dev] Fast Implementation for ZIP decryption
> Does it sound worthy enough to create a patch for and integrate into
> python itself?

Probably not, given that people think that the algorithm itself is fairly useless.

Regards,
Martin

___
Python-Dev mailing list
Python-Dev@python.org
http://mail.python.org/mailman/listinfo/python-dev
Unsubscribe: http://mail.python.org/mailman/options/python-dev/archive%40mail-archive.com
Re: [Python-Dev] Fast Implementation for ZIP decryption
On Sun, 30 Aug 2009 06:55:33 pm Martin v. Löwis wrote:
> > Does it sound worthy enough to create a patch for and integrate
> > into python itself?
>
> Probably not, given that people think that the algorithm itself is
> fairly useless.

I would think that for most people, the threat model isn't "the CIA is reading my files" but "my little brother or nosey co-worker is reading my files", and for that, zip encryption with a good password is probably perfectly adequate. E.g. OpenOffice uses it for password-protected documents.

Given that Python already supports ZIP decryption (as it should), are there any reasons to prefer the current pure-Python implementation over a faster version?

--
Steven D'Aprano
Re: [Python-Dev] Mercurial migration: help needed
Mark Hammond writes:

> 1) I've stalled on the 'none:' patch I promised to resurrect. While
> doing this, I re-discovered that the tests for win32text appear to
> check win32 line endings are used by win32text on *all* platforms, not
> just Windows.

I think it is only Patrick Mezard who knows how to run (parts of) the test suite on Windows.

> I asked for advice from Dirkjan who referred me to the mercurial-devel
> list, but my request of slightly over a week ago remains unanswered
> (http://selenic.com/pipermail/mercurial-devel/2009-August/014873.html)
> - maybe I just need to be more patient...

Oh no, that's usually the wrong tactic :-)

I've been too busy for real Mercurial work the last couple of weeks, but you should not feel bad about poking us if you don't get a reply. Or come to the IRC channel (#mercurial on irc.freenode.net) where Dirkjan (djc) and myself (mg) hang out when it's daytime in Europe.

> Further, Martin's comments in this thread indicate he believes a new
> extension will be necessary rather than 'fixing' win32text. If this
> is the direction we take, it may mean the none: patch, which targets
> the implementation of win32text, is no longer necessary anyway.

I suggested a new extension for two reasons:

* I'm using Linux, and I mentally skip over all extensions that mention "win32"... I guess others do the same, and in this case it's really a shame since converting EOL markers is a cross-platform problem: if someone creates a repository on Windows, I might find it nice to translate the EOL markers into LF on my machine.

  As far as I know, all my tools work correctly with CRLF EOL markers, but I can see the usefulness of such an extension when adding new files (which would default to LF unless I take care).

* A new extension will not have to deal with backwards compatibility issues. That would let us clean up the strange names: I think "cleverencode:" and "cleverdecode:" quite poor names that convey little meaning (and what's with the colon?).
  We could instead use the same names as Subversion: "native", "CRLF" and "LF".

The new extension could be named 'convert-eol' or something like that.

> 2) These same recent discussions about an entirely new extension and
> no clear indication of our expectations regarding what the tool
> actually enforces means I'm not sure how to make a start on the more
> general issue.

It would be a folly to require all files in all changesets to use the right EOL markers -- people will be making mistakes offline. The important thing is that they fix them before pushing to a public server.

So the extension should do that: either abort commits with the wrong EOL markers or do as Subversion and automatically convert the file in the working copy.

> I also fear that should I try to make a start on this, it will still
> wind up fruitless - eg, it seems any work targeting win32text
> specifically would have been wasted, so I'd really like to see a
> consensus on what needs to be done before attempting to start it.

As I understand it, what is lacking is for win32text to read the encode/decode settings from a versioned file called /.hgeol. This means that you can just enable the extension and be done with it, instead of configuring it in every clone.

The /.hgeol file should contain two sections:

  [repository]
  native = LF

  [patterns]
  Windows.txt = CRLF
  Unix.txt = LF
  Tools/buildbot/** = CRLF
  **.txt = native
  **.py = native
  **.dsp = CRLF

The [repository] setting controls what native is translated into upon commit. The [patterns] section can be translated into safe [decode] / [encode] settings by the extension:

  [encode]
  Windows.txt = to-crlf
  Unix.txt = to-lf
  Tools/buildbot/** = to-crlf
  **.txt = to-lf
  **.py = to-lf
  **.dsp = to-crlf

  [decode]
  Windows.txt = to-crlf
  Unix.txt = to-lf
  Tools/buildbot/** = to-crlf
  **.txt = to-native
  **.py = to-native
  **.dsp = to-crlf

where to-crlf, to-lf, to-native are filters installed by the extension.
I guess your 'none' encode/decode filter patch would be needed if the Unix.txt file were to be stored unchanged in the repository? Instead I imagine that the extension will convert a modified Unix.txt to LF EOL markers automatically (Subversion behaves like that, as far as I can tell from a bit of testing).

That way the repository will contain most files in the format specified as native for it, but selected files are stored using whatever EOLs they like. The result is that someone who has not enabled the extension will get correct files from a checkout. Had we stored the *.dsp files with LF EOLs in the repository (like Subversion does), then using the extension would be mandatory for everybody.

--
Martin Geisler

VIFF (Virtual Ideal Functionality Framework) brings easy and efficient SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/.
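The translation from [patterns] to [encode] / [decode] described in this message can be sketched in a few lines of Python. This is only an illustration of the mapping shown in the example tables, not code from any Mercurial extension: the function name `patterns_to_filters` and the use of plain dicts in place of a parsed /.hgeol file are assumptions made for the sketch.

```python
# Hypothetical helper: maps the proposed [patterns] section onto the
# [encode]/[decode] filter names used in the message (to-crlf, to-lf,
# to-native). Plain dicts stand in for a parsed /.hgeol file.

FIXED_FILTERS = {"CRLF": "to-crlf", "LF": "to-lf"}


def patterns_to_filters(patterns, repo_native):
    """Return (encode, decode) dicts for the given pattern -> style mapping.

    repo_native is the [repository] native setting ("LF" or "CRLF"); it
    decides what "native" files are converted to when they are committed.
    """
    encode, decode = {}, {}
    for pattern, style in patterns.items():
        if style == "native":
            # Stored in the repository's native style, checked out in the
            # platform's own style.
            encode[pattern] = FIXED_FILTERS[repo_native]
            decode[pattern] = "to-native"
        else:
            # Fixed-style files use the same filter in both directions.
            encode[pattern] = decode[pattern] = FIXED_FILTERS[style]
    return encode, decode


example_patterns = {
    "Windows.txt": "CRLF",
    "Unix.txt": "LF",
    "Tools/buildbot/**": "CRLF",
    "**.txt": "native",
    "**.py": "native",
    "**.dsp": "CRLF",
}
example_encode, example_decode = patterns_to_filters(example_patterns, "LF")
```

With `native = LF`, the two resulting dicts match the [encode] and [decode] tables in the message.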
Re: [Python-Dev] Fast Implementation for ZIP decryption
On 12:59 pm, st...@pearwood.info wrote:
> On Sun, 30 Aug 2009 06:55:33 pm Martin v. Löwis wrote:
>> > Does it sound worthy enough to create a patch for and integrate
>> > into python itself?
>>
>> Probably not, given that people think that the algorithm itself is
>> fairly useless.
>
> I would think that for most people, the threat model isn't "the CIA is
> reading my files" but "my little brother or nosey co-worker is reading
> my files", and for that, zip encryption with a good password is
> probably perfectly adequate. E.g. OpenOffice uses it for
> password-protected documents.
>
> Given that Python already supports ZIP decryption (as it should), are
> there any reasons to prefer the current pure-Python implementation over
> a faster version?

Given that the use case is "protect my biology homework from my little brother", how fast does the implementation really need to be? Is speeding it up from 0.1 seconds to 0.001 seconds worth the potential new problems that come with more C code (more code to maintain, less portability to other runtimes, potential for interpreter crashes or even arbitrary code execution vulnerabilities from specially crafted files)?

Jean-Paul
Re: [Python-Dev] Fast Implementation for ZIP decryption
Just to give you an idea of the speedup: a 3.3 MB zip file extracted using the current all-Python implementation on my machine (Win XP, 1.67 GHz, 1.5 GB) takes approximately 38 seconds. The same file extracted using the C implementation takes 0.4 seconds.

--shashank

On Sun, Aug 30, 2009 at 6:35 PM, wrote:
> On 12:59 pm, st...@pearwood.info wrote:
>
>> On Sun, 30 Aug 2009 06:55:33 pm Martin v. Löwis wrote:
>>
>>> > Does it sound worthy enough to create a patch for and integrate
>>> > into python itself?
>>>
>>> Probably not, given that people think that the algorithm itself is
>>> fairly useless.
>>
>> I would think that for most people, the threat model isn't "the CIA is
>> reading my files" but "my little brother or nosey co-worker is reading
>> my files", and for that, zip encryption with a good password is
>> probably perfectly adequate. E.g. OpenOffice uses it for
>> password-protected documents.
>>
>> Given that Python already supports ZIP decryption (as it should), are
>> there any reasons to prefer the current pure-Python implementation over
>> a faster version?
>
> Given that the use case is "protect my biology homework from my little
> brother", how fast does the implementation really need to be? Is speeding
> it up from 0.1 seconds to 0.001 seconds worth the potential new problems
> that come with more C code (more code to maintain, less portability to other
> runtimes, potential for interpreter crashes or even arbitrary code execution
> vulnerabilities from specially crafted files)?
>
> Jean-Paul
Re: [Python-Dev] Fast Implementation for ZIP decryption
On 30 aug 2009, at 16:34, Shashank Singh wrote:
> just to give you an idea of the speed up:
>
> a 3.3 mb zip file extracted using the current all-python
> implementation on my machine (win xp 1.67Ghz 1.5GB) takes
> approximately 38 seconds.
>
> the same file when extracted using c implementation takes 0.4 seconds.

If this matters to the users of the API, then likely they'd search for alternatives -- no need for it to go into the standard library just because it replaces functionality, or am I misunderstanding?

- Ludvig Ericson
Re: [Python-Dev] Mercurial migration: help needed
> I suggested a new extension for two reasons:
>
> * I'm using Linux, and I mentally skip over all extensions that mention
>   "win32"... I guess others do the same, and in this case it's really a
>   shame since converting EOL markers is a cross-platform problem: if
>   someone creates a repository on Windows, I might find it nice to
>   translate the EOL markers into LF on my machine.
>
>   As far as I know, all my tools work correctly with CRLF EOL markers,
>   but I can see the usefulness of such an extension when adding new
>   files (which would default to LF unless I take care).
>
> * A new extension will not have to deal with backwards compatibility
>   issues. That would let us clean up the strange names: I think
>   "cleverencode:" and "cleverdecode:" quite poor names that convey
>   little meaning (and what's with the colon?). We could instead use the
>   same names as Subversion: "native", "CRLF" and "LF".
>
> The new extension could be named 'convert-eol' or something like that.

Thanks for the confirmation - this is also why I think a new extension would be best. FWIW, in Python, most files would be declared native, some CRLF, none LF.

>> 2) These same recent discussions about an entirely new extension and
>> no clear indication of our expectations regarding what the tool
>> actually enforces means I'm not sure how to make a start on the more
>> general issue.
>
> It would be a folly to require all files in all changesets to use the
> right EOL markers -- people will be making mistakes offline. The
> important thing is that they fix them before pushing to a public server.
>
> So the extension should do that: either abort commits with the wrong EOL
> markers or do as Subversion and automatically convert the file in the
> working copy.

Maybe I misunderstand: when people use the extension, they cannot possibly make mistakes, right? Because the commit that gets aborted is already the local commit, right?

Of course, it may still be that not all people use the extension.
I think this is of concern to Mark (and he would like hg to refuse operation at all if the extension isn't used), but not to me: I would like this to be a feature of hg eventually, in which case I don't need to worry whether hg enforces presence of certain extensions.

If people make commits that break the eol style, we could well refuse to accept them on the server, telling people that they should have used the extension (or that they should have been more careful if they don't use the extension).

I think subversion's behavior wrt. incorrect eol-style is more subtle. In some cases, it will complain about inconsistencies, rather than fixing them automatically.

Regards,
Martin
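The kind of check a server (or a local commit hook) would need in order to refuse changes with the wrong EOL style can be sketched independently of Mercurial. The sketch below deliberately uses no Mercurial API; `eol_style` and `violates` are hypothetical helper names, and how they would be wired into a hook is left open.

```python
# Hypothetical helpers for classifying the line endings in a file's bytes.


def eol_style(data):
    """Return "LF", "CRLF", "CR", "mixed", or "none" for a bytes object."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf  # bare LF, not part of a CRLF pair
    cr = data.count(b"\r") - crlf  # bare CR, not part of a CRLF pair
    present = [name for name, n in (("CRLF", crlf), ("LF", lf), ("CR", cr)) if n]
    if not present:
        return "none"
    if len(present) > 1:
        return "mixed"
    return present[0]


def violates(data, required):
    """True if the data contains line endings other than the required style."""
    return eol_style(data) not in ("none", required)
```

For example, `violates(b"a\r\nb\n", "LF")` is true because the content mixes CRLF and LF, which is the sort of inconsistency Subversion is said above to complain about rather than fix.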
Re: [Python-Dev] Mercurial migration: help needed
"Martin v. Löwis" writes:

>> So the extension should do that: either abort commits with the wrong
>> EOL markers or do as Subversion and automatically convert the file in
>> the working copy.
>
> Maybe I misunderstand: when people use the extension, they cannot
> possibly make mistakes, right? Because the commit that gets aborted is
> already the local commit, right?
>
> Of course, it may still be that not all people use the extension.

Exactly, when people use the extension, they won't be able to make bad commits.

> I think this is of concern to Mark (and he would like hg to refuse
> operation at all if the extension isn't used), but not to me: I would
> like this to be a feature of hg eventually, in which case I don't need
> to worry whether hg enforces presence of certain extensions.

Yes, that would be nice for the future. I don't know if the other Mercurial developers will see this as a big controversy -- Mercurial has so far made very sure to never mutate your files behind your back. Expansion of keywords (like $Id$) is also implemented as an extension.

> If people make commits that break the eol style, we could well refuse
> to accept them on the server, telling people that they should have
> used the extension (or that they should have been more careful if they
> don't use the extension).

Indeed. Their work will not be lost -- one can always take the final file, convert the line-endings, copy it into a fresh clone and commit that. With more work one could even salvage the intermediate commits, but that is probably not necessary.

> I think subversion's behavior wrt. incorrect eol-style is more subtle.
> In some cases, it will complain about inconsistencies, rather than
> fixing them automatically.

Okay --- I don't have much experience with the svn:eol-style, except that I've read about it in the manual.

--
Martin Geisler

VIFF (Virtual Ideal Functionality Framework) brings easy and efficient SMPC (Secure Multiparty Computation) to Python. See: http://viff.dk/.
Re: [Python-Dev] Fast Implementation for ZIP decryption
exar...@twistedmatrix.com wrote:
> Given that the use case is "protect my biology homework from my little
> brother", how fast does the implementation really need to be? Is
> speeding it up from 0.1 seconds to 0.001 seconds worth the potential new
> problems that come with more C code (more code to maintain, less
> portability to other runtimes, potential for interpreter crashes or even
> arbitrary code execution vulnerabilities from specially crafted files)?

Also, if the use case is just protecting stuff from a sibling or your children, use an archiving program to zip/extract it :)

So -1 here as well. Any added C code has a real cost for the reasons Jean-Paul listed, so it should only be used in cases where there's a major practical benefit to the speed-up. Faster execution of a problematic algorithm that is already well implemented by plenty of other applications doesn't qualify in my book (even if the speedup is by a couple of orders of magnitude).

Cheers,
Nick.

--
Nick Coghlan | ncogh...@gmail.com | Brisbane, Australia
[Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?
I am going through and running the entire test suite using importlib to ferret out incompatibilities. I have found a bunch, although all rather minor (raising a different exception typically; not even sure they are worth backporting as anyone reliant on the old exceptions might get a nasty surprise in the next micro release), and now I am down to my last failing test suite: test_import.

Ignoring the execution bit problem (http://bugs.python.org/issue6526 but I have no clue why this is happening), I am bumping up against TestPycRewriting.test_incorrect_code_name. Turns out that import resets co_filename on a code object to __file__ before exec'ing it to create a module's namespace in order to ignore the file name passed into compile() for the filename argument. Now I can't change co_filename from Python as it's a read-only attribute and thus can't match this functionality in importlib w/o creating some custom code to allow me to specify the co_filename somewhere (marshal.loads() or some new function).

My question is how important is this functionality? Do I really need to go through and add an argument to marshal.loads or some new function just to set co_filename to something that someone explicitly set in a .pyc file? Or I can let this go and have this be the one place where builtins.__import__ and importlib.__import__ differ and just not worry about it?

-Brett
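To make the problem concrete, here is a small sketch of what "resetting co_filename" involves. It relies on `CodeType.replace()`, which exists only in Python 3.8 and later and was therefore not available when this message was written, so it illustrates the idea rather than any option open at the time. The helper name `with_filename` and the two paths are invented for the example.

```python
import types


def with_filename(code, filename):
    """Return a copy of a code object, and of every code object nested in
    its constants, with co_filename set to the given filename."""
    new_consts = tuple(
        with_filename(const, filename) if isinstance(const, types.CodeType) else const
        for const in code.co_consts
    )
    return code.replace(co_filename=filename, co_consts=new_consts)


# Compile under one (hypothetical) path, as a stale .pyc would record it...
source = "def f():\n    return 1\n"
stale = compile(source, "/build/tree/mod.py", "exec")

# ...then rewrite it to the path the module is actually being loaded from.
fresh = with_filename(stale, "/install/tree/mod.py")
```

Assigning to `stale.co_filename` directly fails because the attribute is read-only, which is why a new code object has to be built. The recursion matters because a module's code object contains one further code object per function or class body, each carrying its own co_filename.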
Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?
On Sun, 2009-08-30 at 16:28 -0700, Brett Cannon wrote:
> My question is how important is this functionality? Do I really need
> to go through and add an argument to marshal.loads or some new
> function just to set co_filename to something that someone explicitly
> set in a .pyc file? Or I can let this go and have this be the one
> place where builtins.__import__ and importlib.__import__ differ and
> just not worry about it?

Just to be clear, this would show up if I:
 had a python tree
 built and run stuff from it
 symlinked to that tree from somewhere else
 ran stuff from that somewhere else
- because the pyc is already on disk?

That's been an invaluable 'wtf' debugging tool at various times, because the odd provenance of the path in the pyc makes it extremely clear that what is being loaded isn't what one had thought was being loaded.

OTOH, always showing the path that the pyc was *actually found at* would fix the weirdness that occurs when you mv a python tree from one place to another.

-Rob
Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?
On Sun, Aug 30, 2009 at 17:13, Robert Collins wrote:
> On Sun, 2009-08-30 at 16:28 -0700, Brett Cannon wrote:
>> My question is how important is this functionality? Do I really need
>> to go through and add an argument to marshal.loads or some new
>> function just to set co_filename to something that someone explicitly
>> set in a .pyc file? Or I can let this go and have this be the one
>> place where builtins.__import__ and importlib.__import__ differ and
>> just not worry about it?
>
> Just to be clear, this would show up if I:
>  had a python tree
>  built and run stuff from it
>  symlinked to that tree from somewhere else
>  ran stuff from that somewhere else

Right; the code object would think it was loaded from the original location it was created at instead of where it actually is. Now why someone would want to move their .pyc files around instead of recompiling I don't know short of not wanting to send someone source.

-Brett
Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?
On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon wrote: > I am going through and running the entire test suite using importlib > to ferret out incompatibilities. I have found a bunch, although all > rather minor (raising a different exception typically; not even sure > they are worth backporting as anyone reliant on the old exceptions > might get a nasty surprise in the next micro release), and now I am > down to my last failing test suite: test_import. > > Ignoring the execution bit problem (http://bugs.python.org/issue6526 > but I have no clue why this is happening), I am bumping up against > TestPycRewriting.test_incorrect_code_name. Turns out that import > resets co_filename on a code object to __file__ before exec'ing it to > create a module's namespace in order to ignore the file name passed > into compile() for the filename argument. Now I can't change > co_filename from Python as it's a read-only attribute and thus can't > match this functionality in importlib w/o creating some custom code to > allow me to specify the co_filename somewhere (marshal.loads() or some > new function). > > My question is how important is this functionality? Do I really need > to go through and add an argument to marshal.loads or some new > function just to set co_filename to something that someone explicitly > set in a .pyc file? Or I can let this go and have this be the one > place where builtins.__import__ and importlib.__import__ differ and > just not worry about it? ISTR that Bill Janssen once mentioned a file replication mechanism whereby there were two names for each file: the "canonical" name on a replicated read-only filesystem, and the longer "writable" name on a unique master copy. He ended up with the filenames in the .pyc files being pretty bogus (since not everyone had access to the writable filesystem). So setting co_filename to match __file__ (i.e. the name under which the module is being imported) would be a nice service in this case. 
In general this would happen whenever you pre-compile a bunch of .py files to .pyc/.pyo and then copy the lot to a different location. Not a completely unlikely scenario.

(I was going to comment on the execution bit issue but I realized I'm not even sure if you're talking about import.c or not. :-)

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?
On Sun, Aug 30, 2009 at 5:23 PM, Brett Cannon wrote:
> Right; the code object would think it was loaded from the original
> location it was created at instead of where it actually is. Now why
> someone would want to move their .pyc files around instead of
> recompiling I don't know short of not wanting to send someone source.

I already mentioned replication; it could also just be a matter of downloading a tarball with .py and .pyc files.

--
--Guido van Rossum (home page: http://www.python.org/~guido/)
Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?
On Sun, Aug 30, 2009 at 17:24, Guido van Rossum wrote: > On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon wrote: >> I am going through and running the entire test suite using importlib >> to ferret out incompatibilities. I have found a bunch, although all >> rather minor (raising a different exception typically; not even sure >> they are worth backporting as anyone reliant on the old exceptions >> might get a nasty surprise in the next micro release), and now I am >> down to my last failing test suite: test_import. >> >> Ignoring the execution bit problem (http://bugs.python.org/issue6526 >> but I have no clue why this is happening), I am bumping up against >> TestPycRewriting.test_incorrect_code_name. Turns out that import >> resets co_filename on a code object to __file__ before exec'ing it to >> create a module's namespace in order to ignore the file name passed >> into compile() for the filename argument. Now I can't change >> co_filename from Python as it's a read-only attribute and thus can't >> match this functionality in importlib w/o creating some custom code to >> allow me to specify the co_filename somewhere (marshal.loads() or some >> new function). >> >> My question is how important is this functionality? Do I really need >> to go through and add an argument to marshal.loads or some new >> function just to set co_filename to something that someone explicitly >> set in a .pyc file? Or I can let this go and have this be the one >> place where builtins.__import__ and importlib.__import__ differ and >> just not worry about it? > > ISTR that Bill Janssen once mentioned a file replication mechanism > whereby there were two names for each file: the "canonical" name on a > replicated read-only filesystem, and the longer "writable" name on a > unique master copy. He ended up with the filenames in the .pyc files > being pretty bogus (since not everyone had access to the writable > filesystem). So setting co_filename to match __file__ (i.e. 
> the name under which the module is being imported) would be a nice
> service in this case.
>
> In general this would happen whenever you pre-compile a bunch of .py
> files to .pyc/.pyo and then copy the lot to a different location. Not
> a completely unlikely scenario.

Well, to get this level of compatibility I am going to need to add some magical API somewhere then to overwrite a code object's "file" location. Blah.

I will either add an argument to marshal.loads to specify an overriding file path or add an imp.exec that takes a file path argument to override the code object with.

> (I was going to comment on the execution bit issue but I realized I'm
> not even sure if you're talking about import.c or not. :-)

So it turns out a bunch of execution/write bit stuff has come up in Python 2.7 and importlib has been ignoring it. =) Importlib has simply been opening up the bytecode files with 'wb' and writing out the file. But test_import tests that no execution bit gets set and that no write bit gets added if the source file lacks it. I guess I can use posix.chmod and posix.stat to copy the source file's read and write bits and always mask out the execution bits. I hate this low-level file permission stuff.

-Brett
Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?
On Sun, Aug 30, 2009 at 5:34 PM, Brett Cannon wrote: > On Sun, Aug 30, 2009 at 17:24, Guido van Rossum wrote: >> On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon wrote: >>> I am going through and running the entire test suite using importlib >>> to ferret out incompatibilities. I have found a bunch, although all >>> rather minor (raising a different exception typically; not even sure >>> they are worth backporting as anyone reliant on the old exceptions >>> might get a nasty surprise in the next micro release), and now I am >>> down to my last failing test suite: test_import. >>> >>> Ignoring the execution bit problem (http://bugs.python.org/issue6526 >>> but I have no clue why this is happening), I am bumping up against >>> TestPycRewriting.test_incorrect_code_name. Turns out that import >>> resets co_filename on a code object to __file__ before exec'ing it to >>> create a module's namespace in order to ignore the file name passed >>> into compile() for the filename argument. Now I can't change >>> co_filename from Python as it's a read-only attribute and thus can't >>> match this functionality in importlib w/o creating some custom code to >>> allow me to specify the co_filename somewhere (marshal.loads() or some >>> new function). >>> >>> My question is how important is this functionality? Do I really need >>> to go through and add an argument to marshal.loads or some new >>> function just to set co_filename to something that someone explicitly >>> set in a .pyc file? Or I can let this go and have this be the one >>> place where builtins.__import__ and importlib.__import__ differ and >>> just not worry about it? >> >> ISTR that Bill Janssen once mentioned a file replication mechanism >> whereby there were two names for each file: the "canonical" name on a >> replicated read-only filesystem, and the longer "writable" name on a >> unique master copy. 
He ended up with the filenames in the .pyc files >> being pretty bogus (since not everyone had access to the writable >> filesystem). So setting co_filename to match __file__ (i.e. the name >> under which the module is being imported) would be a nice service in >> this case. >> >> In general this would happen whenever you pre-compile a bunch of .py >> files to .pyc/.pyo and then copy the lot to a different location. Not >> a completely unlikely scenario. > Well, to get this level of compatibility I am going to need to add > some magical API somewhere then to overwrite a code object's "file" > location. Blah. Agreed, no fun. Unfortunately for core Python it really pays to go the extra mile... > I will either add an argument to marshal.loads to specify an > overriding file path or add an imp.exec that takes a file path > argument to override the code object with. Remember, there are many code objects created from one pyc file. Adding it to marshal.load*() makes sense because then it's usable for other purposes too, and that attacks the issue from the root. (in import.c it's done by update_compiled_module() right after read_compiled_module(), which is a thin wrapper around marshal.load()) I'm not sure how imp.exec would make sure that introspection of the loaded code objects always gets the right thing. >> (I was going to comment on the execution bit issue but I realized I'm >> not even sure if you're talking about import.c or not. :-) > > So it turns out a bunch of execution/write bit stuff has come up in > Python 2.7 and importlib has been ignoring it. =) Importlib has simply > been opening up the bytecode files with 'wb' and writing out the file. > But test_import tests that no execution bit get set or that a write > bit gets added if the source file lacks it. I guess I can use > posix.chmod and posix.stat to copy the source file's read and write > bits and always mask out the execution bits. I hate this low-level > file permission stuff. 
It's no fun -- see the layers of #ifdefs in open_exclusive() in import.c. (Though I think you won't need to worry about VMS. :-) But it's somewhat important to get it right from a security POV. I would use os.open() and wrap an io.BufferedWriter around it. -- --Guido van Rossum (home page: http://www.python.org/~guido/)
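To illustrate the point that one .pyc file yields many code objects, here is a minimal sketch of rewriting co_filename across a whole tree of them. It assumes a modern Python (3.8 or later), where code objects have a replace() method; nothing like that existed when this thread was written, and the helper name with_filename and the two paths are made up for the example. It is not the marshal-level change discussed above, only a picture of what that change has to accomplish.

```python
import types


def with_filename(code, filename):
    """Return a copy of `code` whose co_filename, and that of every nested
    code object found in co_consts, is set to `filename`."""
    new_consts = tuple(
        with_filename(const, filename) if isinstance(const, types.CodeType) else const
        for const in code.co_consts
    )
    return code.replace(co_filename=filename, co_consts=new_consts)


# Compile under a "stale" path, as if the .pyc had been built elsewhere.
source = "def f():\n    return (lambda: 1)\n"
stale = compile(source, "/build/host/mod.py", "exec")

# Rewrite every code object to the path the module is really imported from.
fresh = with_filename(stale, "/install/site-packages/mod.py")

namespace = {}
exec(fresh, namespace)
```

The recursion matters: updating only the module-level code object would leave f and the lambda inside it still reporting the old path in tracebacks.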
Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?
On Sun, Aug 30, 2009 at 19:34, Guido van Rossum wrote: > On Sun, Aug 30, 2009 at 5:34 PM, Brett Cannon wrote: >> On Sun, Aug 30, 2009 at 17:24, Guido van Rossum wrote: >>> On Sun, Aug 30, 2009 at 4:28 PM, Brett Cannon wrote: I am going through and running the entire test suite using importlib to ferret out incompatibilities. I have found a bunch, although all rather minor (raising a different exception typically; not even sure they are worth backporting as anyone reliant on the old exceptions might get a nasty surprise in the next micro release), and now I am down to my last failing test suite: test_import. Ignoring the execution bit problem (http://bugs.python.org/issue6526 but I have no clue why this is happening), I am bumping up against TestPycRewriting.test_incorrect_code_name. Turns out that import resets co_filename on a code object to __file__ before exec'ing it to create a module's namespace in order to ignore the file name passed into compile() for the filename argument. Now I can't change co_filename from Python as it's a read-only attribute and thus can't match this functionality in importlib w/o creating some custom code to allow me to specify the co_filename somewhere (marshal.loads() or some new function). My question is how important is this functionality? Do I really need to go through and add an argument to marshal.loads or some new function just to set co_filename to something that someone explicitly set in a .pyc file? Or I can let this go and have this be the one place where builtins.__import__ and importlib.__import__ differ and just not worry about it? >>> >>> ISTR that Bill Janssen once mentioned a file replication mechanism >>> whereby there were two names for each file: the "canonical" name on a >>> replicated read-only filesystem, and the longer "writable" name on a >>> unique master copy. He ended up with the filenames in the .pyc files >>> being pretty bogus (since not everyone had access to the writable >>> filesystem). 
So setting co_filename to match __file__ (i.e. the name >>> under which the module is being imported) would be a nice service in >>> this case. >>> >>> In general this would happen whenever you pre-compile a bunch of .py >>> files to .pyc/.pyo and then copy the lot to a different location. Not >>> a completely unlikely scenario. > >> Well, to get this level of compatibility I am going to need to add >> some magical API somewhere then to overwrite a code object's "file" >> location. Blah. > > Agreed, no fun. Unfortunately for core Python it really pays to go the > extra mile... > Definitely, which is why I will do it, just not tonight as I am tired of compatibility fixing for now. =) >> I will either add an argument to marshal.loads to specify an >> overriding file path or add an imp.exec that takes a file path >> argument to override the code object with. > > Remember, there are many code objects created from one pyc file. > Adding it to marshal.load*() makes sense because then it's usable for > other purposes too, and that attacks the issue from the root. That was my thinking. > (in > import.c it's done by update_compiled_module() right after > read_compiled_module(), which is a thin wrapper around marshal.load()) > I'm not sure how imp.exec would make sure that introspection of the > loaded code objects always gets the right thing. > Basically it would be imp.exec(module, code, path) and it would tweak the code object before execution based on introspecting what the module had set for __file__. But might as well add the support to marshal. >>> (I was going to comment on the execution bit issue but I realized I'm >>> not even sure if you're talking about import.c or not. :-) >> >> So it turns out a bunch of execution/write bit stuff has come up in >> Python 2.7 and importlib has been ignoring it. =) Importlib has simply >> been opening up the bytecode files with 'wb' and writing out the file. 
>> But test_import tests that no execution bit get set or that a write >> bit gets added if the source file lacks it. I guess I can use >> posix.chmod and posix.stat to copy the source file's read and write >> bits and always mask out the execution bits. I hate this low-level >> file permission stuff. > > It's no fun -- see the layers of #ifdefs in open_exclusive() in > import.c. (Though I think you won't need to worry about VMS. :-) But > it's somewhat important to get it right from a security POV. I would > use os.open() and wrap an io.BufferedWriter around it. I will have to see what of that is implemented in C or in Python. I have always tried to keep all pure Python code out of importlib for bootstrapping reasons in order to keep the possibility of using importlib as the implementation of import. But maybe I should not be worrying about that right at the moment and instead do what keeps the code simple. -Brett
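A rough sketch of the os.open() approach suggested above, assuming the goal is simply "copy the source file's read and write bits, never set an execute bit". The function name open_bytecode_for_writing is hypothetical, and this is not what importlib or import.c actually does; it only shows the shape of the idea. Note that the process umask can still remove bits from the mode passed to os.open().

```python
import os
import stat


def open_bytecode_for_writing(source_path, bytecode_path):
    """Open bytecode_path for binary writing, created with the source file's
    read/write permission bits and with all execute bits masked out."""
    source_mode = stat.S_IMODE(os.stat(source_path).st_mode)
    mode = source_mode & 0o666  # keep r/w bits for user/group/other, drop x bits

    # The mode argument only applies when the file is created, so remove any
    # existing file first and insist on creating a new one.
    try:
        os.unlink(bytecode_path)
    except FileNotFoundError:
        pass

    flags = os.O_WRONLY | os.O_CREAT | os.O_EXCL
    flags |= getattr(os, "O_BINARY", 0)  # O_BINARY only exists on Windows
    fd = os.open(bytecode_path, flags, mode)

    # In "wb" mode os.fdopen() wraps the descriptor in a buffered writer.
    return os.fdopen(fd, "wb")
```

Used as a context manager (with open_bytecode_for_writing(src, pyc) as f: f.write(data)), the descriptor is closed when the block exits.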
Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?
2009/8/30 Brett Cannon : > On Sun, Aug 30, 2009 at 19:34, Guido van Rossum wrote: >> On Sun, Aug 30, 2009 at 5:34 PM, Brett Cannon wrote: >>> On Sun, Aug 30, 2009 at 17:24, Guido van Rossum wrote: (I was going to comment on the execution bit issue but I realized I'm not even sure if you're talking about import.c or not. :-) >>> >>> So it turns out a bunch of execution/write bit stuff has come up in >>> Python 2.7 and importlib has been ignoring it. =) Importlib has simply >>> been opening up the bytecode files with 'wb' and writing out the file. >>> But test_import tests that no execution bit get set or that a write >>> bit gets added if the source file lacks it. I guess I can use >>> posix.chmod and posix.stat to copy the source file's read and write >>> bits and always mask out the execution bits. I hate this low-level >>> file permission stuff. >> >> It's no fun -- see the layers of #ifdefs in open_exclusive() in >> import.c. (Though I think you won't need to worry about VMS. :-) But >> it's somewhat important to get it right from a security POV. I would >> use os.open() and wrap an io.BufferedWriter around it. > > I will have to see what of that is implemented in C or in Python. I > have always tried to keep all pure Python code out of importlib for > bootstrapping reasons in order to keep the possibility of using > importlib as the implementation of import. But maybe I should not be > worrying about that right at the moment and instead do what keeps the > code simple. You can use the C implementation of io, _io, which has a full buffering implementation. Of course, that also makes it a bit harder for other implementations which may wish to use importlib because the io library would have to be completely implemented... -- Regards, Benjamin
Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?
On Sun, Aug 30, 2009 at 8:26 PM, Guido van Rossum wrote: > On Sun, Aug 30, 2009 at 5:23 PM, Brett Cannon wrote: > > Right; the code object would think it was loaded from the original > > location it was created at instead of where it actually is. Now why > > someone would want to move their .pyc files around instead of > > recompiling I don't know short of not wanting to send someone source. > > I already mentioned replication; it could also just be a matter of > downloading a tarball with .py and .pyc files. Also, if you're using Python in an embedded context, bytecode compilation (or even filesystem access!) can be prohibitively slow, so an uncompressed .zip file full of compiled .pyc files is really the way to go. I did this a long time ago on an XScale machine, but recent inspection of the Android Python scripting stuff shows a similar style of deployment (c.f. /data/data/com.google.ase/python/lib/python26.zip).
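For readers who have not built such an archive, the following is a small sketch of the "uncompressed .zip of .pyc files" deployment using the standard zipfile.PyZipFile class. The module name hello_mod, the archive name bundle.zip and the temporary directory are invented for the example; a real embedded deployment would build the archive once on a development machine and ship only the .zip.

```python
import os
import sys
import tempfile
import zipfile

# A throwaway source file standing in for a real package.
workdir = tempfile.mkdtemp()
source_path = os.path.join(workdir, "hello_mod.py")
with open(source_path, "w") as f:
    f.write("GREETING = 'hi'\n")

# ZIP_STORED means the members are stored without compression.
archive_path = os.path.join(workdir, "bundle.zip")
with zipfile.PyZipFile(archive_path, "w", compression=zipfile.ZIP_STORED) as bundle:
    bundle.writepy(source_path)  # compiles if needed, stores only the .pyc

with zipfile.ZipFile(archive_path) as bundle:
    names = bundle.namelist()
    stored = all(info.compress_type == zipfile.ZIP_STORED for info in bundle.infolist())

# Putting the archive on sys.path lets zipimport load the bytecode directly.
sys.path.insert(0, archive_path)
import hello_mod
```

Because the archive holds no .py files, the modules are loaded from bytecode that was compiled at some other path, which is exactly the situation where a stale co_filename shows up.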
Re: [Python-Dev] Fast Implementation for ZIP decryption
On Sun, Aug 30, 2009 at 7:34 AM, Shashank Singh wrote: > just to give you an idea of the speed up: > > a 3.3 mb zip file extracted using the current all-python implementation on > my machine (win xp 1.67Ghz 1.5GB) > takes approximately 38 seconds. > > the same file when extracted using c implementation takes 0.4 seconds. Are there any applications/frameworks which have zip files on their critical path, where this kind of (admittedly impressive) speedup would be beneficial? What was the motivation for writing the C version? Collin Winter > On Sun, Aug 30, 2009 at 6:35 PM, wrote: >> >> On 12:59 pm, st...@pearwood.info wrote: >>> >>> On Sun, 30 Aug 2009 06:55:33 pm Martin v. Löwis wrote: > Does it sound worthy enough to create a patch for and integrate > into python itself? Probably not, given that people think that the algorithm itself is fairly useless. >>> >>> I would think that for most people, the threat model isn't "the CIA is >>> reading my files" but "my little brother or nosey co-worker is reading >>> my files", and for that, zip encryption with a good password is >>> probably perfectly adequate. E.g. OpenOffice uses it for >>> password-protected documents. >>> >>> Given that Python already supports ZIP decryption (as it should), are >>> there any reasons to prefer the current pure-Python implementation over >>> a faster version? >> >> Given that the use case is "protect my biology homework from my little >> brother", how fast does the implementation really need to be? Is speeding >> it up from 0.1 seconds to 0.001 seconds worth the potential new problems >> that come with more C code (more code to maintain, less portability to other >> runtimes, potential for interpreter crashes or even arbitrary code execution >> vulnerabilities from specially crafted files)? 
>> >> Jean-Paul
Re: [Python-Dev] Fast Implementation for ZIP decryption
-On [20090831 06:29], Collin Winter (coll...@gmail.com) wrote: >Are there any applications/frameworks which have zip files on their >critical path, where this kind of (admittedly impressive) speedup >would be beneficial? What was the motivation for writing the C >version? Would zipped eggs count? For example, SQLAlchemy runs in the 5 MB range. -- Jeroen Ruigrok van der Werven / asmodai イェルーン ラウフロック ヴァン デル ウェルヴェン http://www.in-nomine.org/ | http://www.rangaku.org/ | GPG: 2EAC625B All for one, one for all...
Re: [Python-Dev] how important is setting co_filename for a module being imported to what __file__ is set to?
On Sun, Aug 30, 2009 at 19:51, Benjamin Peterson wrote: > 2009/8/30 Brett Cannon : >> On Sun, Aug 30, 2009 at 19:34, Guido van Rossum wrote: >>> On Sun, Aug 30, 2009 at 5:34 PM, Brett Cannon wrote: On Sun, Aug 30, 2009 at 17:24, Guido van Rossum wrote: > (I was going to comment on the execution bit issue but I realized I'm > not even sure if you're talking about import.c or not. :-) So it turns out a bunch of execution/write bit stuff has come up in Python 2.7 and importlib has been ignoring it. =) Importlib has simply been opening up the bytecode files with 'wb' and writing out the file. But test_import tests that no execution bit get set or that a write bit gets added if the source file lacks it. I guess I can use posix.chmod and posix.stat to copy the source file's read and write bits and always mask out the execution bits. I hate this low-level file permission stuff. >>> >>> It's no fun -- see the layers of #ifdefs in open_exclusive() in >>> import.c. (Though I think you won't need to worry about VMS. :-) But >>> it's somewhat important to get it right from a security POV. I would >>> use os.open() and wrap an io.BufferedWriter around it. >> >> I will have to see what of that is implemented in C or in Python. I >> have always tried to keep all pure Python code out of importlib for >> bootstrapping reasons in order to keep the possibility of using >> importlib as the implementation of import. But maybe I should not be >> worrying about that right at the moment and instead do what keeps the >> code simple. > > You can use the C implementation of io, _io, which has a full > buffering implementation. Of course, that also makes it a better > harder for other implementations which may wish to use importlib > because the io library would have to be completely implemented... True. 
I guess it's a question of whether making importlib easier to maintain and as minimally reliant on C-specific modules is more/less important than trying to bootstrap it in for CPython for __import__ at some point. -Brett
Re: [Python-Dev] Fast Implementation for ZIP decryption
On Sun, Aug 30, 2009 at 10:40 PM, Jeroen Ruigrok van der Werven < asmo...@in-nomine.org> wrote: > -On [20090831 06:29], Collin Winter (coll...@gmail.com) wrote: > >Are there any applications/frameworks which have zip files on their > >critical path, where this kind of (admittedly impressive) speedup > >would be beneficial? What was the motivation for writing the C > >version? > > Would zipped eggs count? For example, SQLAlchemy runs in the 5 MB range. >
Unless someone's also pushing for being able to import and execute code from scrambled zip files, no, that doesn't matter. The C code for this should be trivially tiny. See the zipfile._ZipDecrypter class; it's got ~25 lines of actual code in it. It is not worth arguing about. I'll commit this if you post it as a patch in a tracker issue. Please make sure your patch includes the following:
* A unittest that compares the C version of the descrambler to the Python version of the descrambler using a variety of inputs and outputs that exercise any boundary condition.
* Conditional import code in the zipfile module itself so that the module works even if the C module isn't available.
-Greg
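The two requests above can be sketched roughly as follows. This is an illustration, not the patch under discussion: the C module name _zipdecrypt and the class names PyDescrambler and Descrambler are hypothetical, and the pure-Python class is a from-scratch rendering of the traditional PKWARE stream cipher as I understand it, not a copy of the zipfile code. An encrypt() method is included only so the example can round-trip its own data.

```python
def _make_crc_table():
    """Build the standard CRC-32 lookup table (polynomial 0xEDB88320)."""
    table = []
    for i in range(256):
        crc = i
        for _ in range(8):
            crc = (crc >> 1) ^ 0xEDB88320 if crc & 1 else crc >> 1
        table.append(crc)
    return table


_CRC_TABLE = _make_crc_table()


class PyDescrambler:
    """Pure-Python traditional PKWARE ("ZipCrypto") stream cipher."""

    def __init__(self, password):
        self.key0, self.key1, self.key2 = 0x12345678, 0x23456789, 0x34567890
        for byte in password:
            self._update_keys(byte)

    @staticmethod
    def _crc32(byte, crc):
        # One raw CRC-32 step, without the usual pre/post inversion.
        return (crc >> 8) ^ _CRC_TABLE[(crc ^ byte) & 0xFF]

    def _update_keys(self, byte):
        # The key state is always advanced with the *plaintext* byte.
        self.key0 = self._crc32(byte, self.key0)
        self.key1 = (self.key1 + (self.key0 & 0xFF)) & 0xFFFFFFFF
        self.key1 = (self.key1 * 134775813 + 1) & 0xFFFFFFFF
        self.key2 = self._crc32(self.key1 >> 24, self.key2)

    def _stream_byte(self):
        k = self.key2 | 2
        return ((k * (k ^ 1)) >> 8) & 0xFF

    def decrypt(self, data):
        out = bytearray()
        for c in data:
            p = c ^ self._stream_byte()
            self._update_keys(p)
            out.append(p)
        return bytes(out)

    def encrypt(self, data):
        out = bytearray()
        for p in data:
            out.append(p ^ self._stream_byte())
            self._update_keys(p)
        return bytes(out)


# Conditional import: prefer the (hypothetical) C accelerator, fall back to
# the pure-Python class so the module keeps working without it.
try:
    from _zipdecrypt import Descrambler  # hypothetical C module name
except ImportError:
    Descrambler = PyDescrambler

message = b"protect my biology homework" * 3
scrambled = PyDescrambler(b"secret").encrypt(message)
recovered = Descrambler(b"secret").decrypt(scrambled)
```

A unittest along the lines requested would run the same inputs (empty data, single bytes, long buffers, non-ASCII passwords) through both implementations and assert that the outputs are identical.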