Re: [Python-ideas] Enhancing Zipapp

2020-01-08 Thread Abdur-Rahmaan Janhangeer
Yours,

Abdur-Rahmaan Janhangeer
pythonmembers.club | github
Mauritius


On Wed, Jan 8, 2020 at 1:32 AM Brett Cannon  wrote:
>
>
> This would be a packaging detail so not something to be specified in the
> stdlib.


Yes, the module opening the zip will look for it

>> - [  ] Signing mechanism
>>
>> Mechanisms can be added to detect the integrity of the app. App hash can be
>> used to check if the app has been modified and per-file hash can be used to
>> detect what part has been modified. This can be further enhanced if needed.
>>
>> - [  ] Protecting meta data
>>
>> Metadata are not protected by basic signing. There are existing ways to
>> protect metadata and beyond [7]
>
>
> This can be tricky because people want signing in specific ways that vary
> from OS to OS, case by case. So unless there's a built-in signing mechanism
> the flexibility required here is huge.


Let's say we have a simple project

folder/
    file.py
    __main__.py

The first step is to include in the info file the file names and hashes:

file.py: 5f22669f6f0ea1cc7e5af7c59712115bcf312e1ceaf7b2b005af259b610cf2da
__main__.py: e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855

Then by reading the info file and hashing the actual file and comparing,
we can see which file was modified if any.
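
The check described above could be sketched roughly as follows. This is only an illustration: it assumes SHA-256 hex digests (the value listed for __main__.py above is the SHA-256 of an empty file), an info file named "infofile" with one "name: digest" line per file, and the helper names are made up.

```python
import hashlib
import zipfile

INFO_NAME = "infofile"  # assumed name of the info file inside the archive


def build_info(files: dict) -> str:
    """Return one 'name: sha256-hex' line per file (files maps name -> bytes)."""
    return "\n".join(
        f"{name}: {hashlib.sha256(data).hexdigest()}"
        for name, data in sorted(files.items())
    )


def modified_files(archive: zipfile.ZipFile) -> list:
    """Return names that are missing or no longer match the info file."""
    changed = []
    present = set(archive.namelist())
    for line in archive.read(INFO_NAME).decode("utf-8").splitlines():
        name, _, expected = line.partition(": ")
        if name not in present:
            changed.append(name)
        elif hashlib.sha256(archive.read(name)).hexdigest() != expected:
            changed.append(name)
    return changed
```

An empty list from modified_files() means every listed file still matches its recorded hash; it says nothing about whether the info file itself was altered.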

But now, a malicious program might try to modify the info file
and modify the hash. One way to protect even the metadata is
to hash the entire content

folder/
    file.py # we can add those in a folder if needed
    __main__.py
    infofile

Then after zipping it, we hash the zipfile, then append the hash to the zip
binary:

[zipfile binary][hash value]

We can have a zip file and yet another file stating the hash value but
to maintain a single file structure, the one described above is best.

Then when opening the zip file, we start reading up to the hash value. The
hash value becomes the checking signature of the zipfile.
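
A minimal sketch of the [zipfile binary][hash value] layout, assuming a raw 32-byte SHA-256 digest as the trailer (the function names are hypothetical). Note that an unkeyed digest like this only detects accidental corruption: anyone able to rewrite the file can recompute the trailer as well.

```python
import hashlib
import io
import zipfile

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def seal(zip_bytes: bytes) -> bytes:
    """Append the SHA-256 digest of the archive to the archive itself."""
    return zip_bytes + hashlib.sha256(zip_bytes).digest()


def open_sealed(blob: bytes) -> zipfile.ZipFile:
    """Split off the trailing digest, check it, then open the zip payload."""
    payload, trailer = blob[:-DIGEST_SIZE], blob[-DIGEST_SIZE:]
    if hashlib.sha256(payload).digest() != trailer:
        raise ValueError("archive does not match its trailing hash")
    return zipfile.ZipFile(io.BytesIO(payload))
```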

This forms a base on which more signing mechanisms can be added, like
author keys.

Since zipfiles are the same across OSes, this kind of approach supposedly
doesn't pose a problem.

> Install the wheels where? You can't do that globally. And you also have
> to worry about the security of doing the install implicitly. And now the
> user suddenly has stuff on their file system they may not have asked for as
> a side-effect which may upset some people who are tight on disk space
> (remember that Python runs on some low-powered machines).

Yes, global folders also defeat the spirit.

Using the wheel-included zip (A), we can generate another zip file (B) with
the packages installed. That generated zip file is then executed.
Zip format A solves the problem of cross-platforming.
Normal solutions up to now use solution B, where you can't share
your zips across OSes.

As for space, it's a bit the same as with venvs. Zip format B is the
equivalent of packages installed in a venv.

Venv usage can be a hint as to when to use it.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: [Python-ideas] Re: Enhancing Zipapp

2020-01-08 Thread Abdur-Rahmaan Janhangeer
On Wed, 8 Jan 2020, 11:09 Christopher Barker,  wrote:

>
> But a thought on that -- you may be able to accomplish something similar
> with conda, "conda constructor", and "conda run". -- or a new tool built
> from those. The idea is that the first time you ran your "app", it would
> install its dependencies, and then use them in an isolated environment. But
> if the multiple apps had the same dependencies, they would share them, so
> you wouldn't get major bloat on the host machine.
>

I guess it's time to dig more into anaconda, been
putting it off, will do.

> but a wheel is just as big as the installed package (at least a zipped
> version) -- it's essentially the package compressed into a tarball.
>

I really hope C extensions would become redundant someday
in Python, which would make Python development real
Python dev.

The proposal at hand is maybe the best solution to a
hard nut case that most if not all solutions preferred to avoid

> But: "Unlike “conventional” zipapps, shiv packs a site-packages style
> directory of your tool’s dependencies into the resulting binary, and then
> at bootstrap time extracts it into a ~/.shiv cache directory."
>

Maybe we can have a PYZ directory where the
packages for each app are extracted then it's not
a global dump but more specific

Why not that route? It would be nice to comment
on what is wrong with Shiv's mode of execution



Re: [Python-ideas] Re: Enhancing Zipapp

2020-01-08 Thread Abdur-Rahmaan Janhangeer
On Wed, 8 Jan 2020, 02:15 Barry,  wrote:

>
> Have a look at this write up about the horror that is zip file name
> handling.
>
> https://marcosc.com/2008/12/zip-files-and-encoding-i-hate-you/
>
> This has been a pain point at work.
>

Since zipapp did not touch the subject, I won't either,
unless we can clearly come up with a solution.
If you can work out a solution for Python that
would be great!


looking for git with a solution - merge many pdfs to 1 pdf (no matter what language)

2020-01-08 Thread alon . najman
hi
looking for git with a solution - merge many pdfs to 1 pdf (no matter what 
language)


Re: looking for git with a solution - merge many pdfs to 1 pdf (no matter what language)

2020-01-08 Thread Pieter van Oostrum
[email protected] writes:

> hi
> looking for git with a solution - merge many pdfs to 1 pdf (no matter what 
> language)

There is a clone of pdftk on github: https://github.com/ericmason/pdftk

Another possibility is mupdf: http://git.ghostscript.com/?p=mupdf.git
-- 
Pieter van Oostrum
www: http://pieter.vanoostrum.org/
PGP key: [8DAE142BE17999C4]


Re: [Python-ideas] Re: Enhancing Zipapp

2020-01-08 Thread Christopher Barker
On Wed, Jan 8, 2020 at 1:49 AM Abdur-Rahmaan Janhangeer <
[email protected]> wrote:

>> Have a look at this write up about the horror that is zip file name
>> handling.
>>
>> https://marcosc.com/2008/12/zip-files-and-encoding-i-hate-you/
>>
>> This has been a pain point at work.
>>
>
I'm pretty sure this is a non-issue for this use-case. If you need to open
zip files created by arbitrary other systems, or create zip files that can
be opened by arbitrary other systems, then it's a big mess. But that isn't
the case here.

-CHB


-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython


Re: [Python-ideas] Re: Enhancing Zipapp

2020-01-08 Thread Christopher Barker
On Wed, Jan 8, 2020 at 1:24 AM Abdur-Rahmaan Janhangeer <
[email protected]> wrote:

>> But a thought on that -- you may be able to accomplish something similar
>> with conda, "conda constructor", and "conda run". -- or a new tool built
>> from those. The idea is that the first time you ran your "app", it would
>> install its dependencies, and then use them in an isolated environment. But
>> if the multiple apps had the same dependencies, they would share them, so
>> you wouldn't get major bloat on the host machine.
>>
>
> I guess it's time to dig more into anaconda, been
> putting it off, will do.
>

to be clear -- you want to look at "conda", not "Anaconda" -- conda is a
package manager, Anaconda is a distribution created with the conda package
manager.


>> but a wheel is just as big as the installed package (at least a zipped
>> version) -- it's essentially the package compressed into a tarball.
>>
>
> I really hope C extensions would become redundant someday
> in Python, which would make Python development real
> Python dev.
>

That's not going to completely happen. Which does not mean that a solution
that doesn't support them isn't still useful for a lot. But it would be
interesting to see how many commonly used packages on PyPI rely on C
extensions (other than the SciPy Stack).


>> But: "Unlike “conventional” zipapps, shiv packs a site-packages style
>> directory of your tool’s dependencies into the resulting binary, and then
>> at bootstrap time extracts it into a ~/.shiv cache directory."
>>
>
> Maybe we can have a PYZ directory where the
> packages for each app are extracted then it's not
> a global dump but more specific
>

I'm not sure how that differs from a .shiv directory, which is not global.
But a way to share packages in the "central place for packages" would be
nice. -- maybe how conda does it with hard links?

-CHB


-- 
Christopher Barker, PhD

Python Language Consulting
  - Teaching
  - Scientific Software Development
  - Desktop GUI and Web Development
  - wxPython, numpy, scipy, Cython


Re: [Python-ideas] Re: Enhancing Zipapp

2020-01-08 Thread Andrew Barnert via Python-list
On Jan 8, 2020, at 01:09, Abdur-Rahmaan Janhangeer  wrote:
> 
> But now, a malicious program might try to modify the info file
> and modify the hash. One way to protect even the metadata is
> to hash the entire content
> 
> folder/
>     file.py # we can add those in a folder if needed
>     __main__.py
>     infofile
> 
> Then after zipping it, we hash the zipfile then append the hash to the zip 
> binary
> 
> [zipfile binary][hash value]

How does this solve the problem? A malicious program that could modify the hash 
inside the info file could even more easily modify the hash at the end of the 
zip.

Existing systems deal with this by recognizing that you can’t prevent anyone 
from hashing anything they want, so you either have to store the hashes in a 
trusted central repo, or (more commonly–there are multiple advantages) sign 
them with a trustable key. If a malicious app modified the program and modified 
the hash, it’s going to be a valid hash; there’s nothing you can do about that. 
But it won’t be the hash in the repo, or it’ll be signed by the untrusted 
author of the malicious program rather than the trusted author of the app, and 
that’s why you don’t let it run. And this works just as well for hashes 
embedded inside an info file inside the zip as for hashes appended to the zip.
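
The distinction can be illustrated with a keyed digest. The sketch below uses HMAC from the standard library purely as a stand-in for a real public-key signature (with HMAC the verifier must hold the same secret as the signer, which a real code-signing scheme avoids); the key values and function names are made up for illustration.

```python
import hashlib
import hmac


def sign(payload: bytes, key: bytes) -> bytes:
    """Return a keyed SHA-256 tag; it cannot be recomputed without the key."""
    return hmac.new(key, payload, hashlib.sha256).digest()


def verify(payload: bytes, tag: bytes, key: bytes) -> bool:
    """Check the tag against the payload using a constant-time comparison."""
    return hmac.compare_digest(sign(payload, key), tag)
```

A modified program can carry a perfectly valid plain hash, or a tag made with the attacker's own key, but neither verifies against the trusted author's key.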

And there are advantages to putting the hash inside. For example, if you want 
to allow downstream packagers or automated systems to add distribution info 
(this is important if you want to be able to pass a second code signing 
requirement, e.g., Apple’s, as well as the zipapp one), you just have a list of 
escape patterns that say which files are allowed to be unhashed. Anything that 
appears in the info file must match its hash or the archive is invalid. 
Anything that doesn’t appear in the info file but does match the escape 
patterns is fine, but if it doesn’t match the escape patterns, the archive is 
invalid. So now downstream distributors can add extra files that match the 
escape patterns. (The escape patterns can be configurable—you just need them to 
be specified by something inside the hash. But you definitely want a default 
that works 99% of the time, because if developers and packagers have to think 
it through in every case instead of only in exceptional cases, they’re going to 
get it wrong, and nobody will have any idea who to trust to get it right.)
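
The validation rule described above could look something like this. It is a sketch only: it assumes SHA-256 hex digests, shell-style escape patterns matched with fnmatch, and that the info file itself is excluded from the file listing; the function name and the example pattern in the usage note are hypothetical.

```python
import fnmatch
import hashlib


def validate(files: dict, hashes: dict, escape_patterns: list) -> list:
    """Return a list of problems; an empty list means the archive is valid.

    files maps archive member names to their bytes, hashes maps names to the
    expected SHA-256 hex digests taken from the (signed) info file.
    """
    problems = []
    # Anything listed in the info file must be present and match its hash.
    for name, expected in hashes.items():
        if name not in files:
            problems.append(f"missing: {name}")
        elif hashlib.sha256(files[name]).hexdigest() != expected:
            problems.append(f"modified: {name}")
    # Anything not listed must match one of the escape patterns.
    for name in files:
        if name in hashes:
            continue
        if not any(fnmatch.fnmatchcase(name, p) for p in escape_patterns):
            problems.append(f"unexpected: {name}")
    return problems
```

With a pattern such as "_CodeSignature/*", a downstream packager could add files under that directory without invalidating the archive, while any other unlisted file would be rejected.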




Re: PyInstaller needs Funding by your Company

2020-01-08 Thread songbird
Christian Gollwitzer wrote:
> Am 07.01.20 um 15:09 schrieb Hartmut Goebel:
>> Maintaining PyInstaller at a proper level requires about 4 to 5 days per
>> month. Which means about 4,000 to 5,000 € per month and about 50,000 to
>> 60,000 € per year.
>
> these numbers sound odd to me. 4000€ - 5000€ per month or equivalently 
> 60,000€ per year is the level of academic full-time jobs in Germany, 
> i.e. that would be 4-5 days per week, not per month.

  it is the demand of a volunteer to be paid.

  if people want to pay him that's their business, but
i think a larger company may just instead fork their
own copy or fund an internal developer to track this
project IF it is that important to them.

  he may resign or limit his participation in the future
as with any other volunteer effort.

  it is GPL code.


  songbird


Re: [Python-ideas] Re: Enhancing Zipapp

2020-01-08 Thread Andrew Barnert via Python-list
On Jan 8, 2020, at 01:09, Abdur-Rahmaan Janhangeer  wrote:
> 
> Using the wheel-included zip (A), we can generate another zip file (B) with
> the packages installed. That generated zip file is then executed.

But that generated zip B doesn’t have a trustable hash on it, so how can you 
execute it?

If you keep this all hidden inside the zipapp system, where malicious programs 
can’t find and modify the generated zips, then I suppose that’s fine. But at 
that point, why not just install the wheels inside zip A into an auto-generated 
only-for-zip-A venv cache directory or something, and then just run zip A as-is 
against that venv?
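
One way to picture an "only-for-zip-A" cache directory is to derive its name from the archive's content hash, so a modified archive never reuses another archive's environment. This is a sketch under assumptions: the cache root and naming scheme are invented here, and the environment itself could then be created with something like venv.create(path, with_pip=True).

```python
import hashlib
from pathlib import Path


def cache_dir_for(zipapp: Path, cache_root: Path) -> Path:
    """Return a per-archive cache directory keyed by the archive's SHA-256."""
    digest = hashlib.sha256(zipapp.read_bytes()).hexdigest()
    return cache_root / f"{zipapp.stem}-{digest[:16]}"
```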

> Zip format A solves the problem of cross-platforming.
> Normal solutions up to now use solution B, where you can't share
> your zips across OSes.

You can still only share zips across OSs if you bundle in a wheel for each 
extension library for every possible platform. For in-house deployments where 
you only care about two platforms (your dev boxes and your deployment cluster 
boxes), that’s fine, but for a publicly released app that’s supposed to work 
“everywhere”, you pretty much have to download and redistribute every wheel on 
PyPI for every dependency, which could make your app pretty big, and require 
pretty frequent updates, and it still only lets you run on systems that have 
wheels for all your dependencies.

If you’re already doing an effective “install” step in building zip B out of 
zip A, why not make that step just use a requirements file and download the 
dependencies from PyPI? You could still run zip B without being online, just 
not zip A.

Maybe you could optionally include wheels and they’d serve as a micro-repo 
sitting in front of PyPI, so when your dependencies are small you can 
distribute a version that works for 95% of your potential users without needing 
to do anything fancy but it still works for the other 5% if they can reach PyPI.

(But maybe it would be simpler to just use the zip B as a cache in the first 
place. If I download Spam.zipapp for Win64 3.9, that’s a popular enough 
platform that you probably have a zip B version ready to go and just ship me 
that, so it works immediately. Now, if I copy that file to my Mac instead of 
downloading it fresh, oops, wrong wheels, so it downloads the right ones off 
PyPI and builds a new zipapp for my platform—and it still runs, it just takes a 
bit longer the first time. I’m not sure this is a good idea, but I’m not sure 
trying to include every wheel for every platform is a good idea either…)

But there’s a bigger problem than just distribution. Some extension modules are 
only extension modules for speed, like numpy. But many are there to interface 
with C libraries. If my app depends on PortAudio, distributing the extension 
module as wheels is easy, but it doesn’t do any good unless you have the C 
library installed and configured on your system. Which you probably don’t if 
you’re on Windows or Mac. A package manager like Homebrew or Choco can take 
care of that by just making my app’s package depend on the PortAudio package 
(and maybe even conda can?), but I don’t see how zipapps with wheels in, or 
anything else self-contained, can. And if most such packages eventually migrate 
to binding from Python (using cffi or ctypes) rather than from C (using an 
extension module), that actually makes your problem harder rather than easier, 
because now you can’t even tell from outside the code that there are external 
dependencies; you can distribute a single zipapp that works everywhere, but 
only in the sense that it starts running and quickly fails with an exception 
for most users.
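
An app that binds to C libraries through ctypes could at least probe for them up front and fail with a clear message instead of a late exception. A minimal sketch, assuming the app knows the names of the libraries it needs (the helper name is hypothetical, and the result depends entirely on the machine it runs on):

```python
import ctypes.util


def missing_libraries(names):
    """Return the library names that ctypes cannot locate on this system."""
    return [name for name in names if ctypes.util.find_library(name) is None]
```

Calling missing_libraries(["portaudio"]) at startup would tell the user what to install, though it still cannot install it for them.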




Re: [Python-ideas] Re: Enhancing Zipapp

2020-01-08 Thread Rhodri James

On 08/01/2020 18:08, many people wrote lots of stuff...

Folks, could we pick one list and have the discussion there, rather than 
on both python-list and python-ideas?  Getting *four* copies of Andrew's 
emails is a tad distracting :-)


--
Rhodri James *-* Kynesim Ltd


Re: [Python-ideas] Re: Enhancing Zipapp

2020-01-08 Thread Barry Scott



> On 8 Jan 2020, at 16:02, Christopher Barker  wrote:
> 
> On Wed, Jan 8, 2020 at 1:49 AM Abdur-Rahmaan Janhangeer wrote:
> Have a look at this write up about the horror that is zip file name handling.
> 
> https://marcosc.com/2008/12/zip-files-and-encoding-i-hate-you/ 
> 
> 
> This has been a pain point at work.
> 
> I'm pretty sure this is a non-issue for this use-case. If you need to open 
> zip files created by arbitrary other systems, or create zip files that can be 
> opened by arbitrary other systems, then it's a big mess. But that isn't the 
> case here.

One claim is that because it's a zip you can use any of the existing tools.
But this encoding issue means that it's likely that you have to use
zipapp-aware tools.

Also can we stop cross posting to 2 lists please.

Pick one and keep the thread on it please.

Barry

> 
> -CHB
> 
> 
> -- 
> Christopher Barker, PhD
> 
> Python Language Consulting
>   - Teaching
>   - Scientific Software Development
>   - Desktop GUI and Web Development
>   - wxPython, numpy, scipy, Cython



Coding technique: distinguish using type or abc?

2020-01-08 Thread DL Neil via Python-list
Do you prefer to use isinstance() with type() or to refer to 
collections.abc?



This team, producing base statistical analyses for (lesser capable) 
user-coders to utilise with their own experimental 'control code', faces 
situations where a list-parameter is often only one element long. As is 
so often the way, amongst the 'clients' there are a couple of 
strong-minded (am not allowed to call them "awkward", or otherwise!) 
user-coder-analysts, who demand that entry of a single-element not 
require them to surround it with "unnecessary" square-brackets. Despite 
complaining, we realise that this is actually quite a good practice, and 
will likely save us (as well as 'them') from interface mistakes.


Such single elements appear in both string and numeric formats, but for 
simplicity (here) let's ignore numerics...


The problem rearing its ugly head, is when the string single-element 
becomes input to a for-loop. If we loop around a list, then each element 
is handled individually (as desired). Whereas if the element is a 
string, then each character is treated as if it were a list-element (not desired)!



In Code Review, I noticed that two 'solutions' have been coded.

1 using type()s to distinguish:

    from typing import Union

    def format_as_list( parameter:Union[ str, list ] )->list:
        if isinstance( parameter, str ):
            parameter_list = [ parameter ]
        elif isinstance( parameter, list ):
            parameter_list = parameter
        else:
            raise NotImplementedError
        return parameter_list

2 using abstract base classes from PSL.collections to distinguish:

    import collections.abc as abc
    from typing import Union

    def is_list_not_string( parameter:Union[ str, list ] ) -> bool:
        return isinstance( parameter, abc.MutableSequence )

    def format_as_list( parameter:str )->list:
        if is_list_not_string( parameter ):
            return parameter
        else:
            return [ parameter, ]

(ignoring implicit assumption/error!)
NB I've simplified the code and attempted to harmonise the varNMs 
between snippets.


With our preference for EAFP, I went looking for a way to utilise an 
exception by way of distinguishing between the input-types - but 
couldn't see how, without major artifice (false/purposeless construct 
which would confuse the next reader of the code). That said, I'm 
wondering if using (or appearing to use) tuples and *args might solve 
the problem - but will have to dig back through the code-base...
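
For what it's worth, the *args idea might look like the sketch below. The function name and its body are invented for illustration; the point is only that each positional argument arrives as one element of a tuple, so a lone string is never iterated character by character.

```python
def run_analysis(*series_names: str) -> list:
    """Hypothetical entry point: every positional argument is one series name."""
    return [name.strip().lower() for name in series_names]
```

A user-coder can then write run_analysis("Height") with no brackets, or run_analysis("Height", "Weight") for several. The cost is that a caller who already holds a list must unpack it, as in run_analysis(*names).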



Meantime, faced with such a challenge, would you recommend utilising one 
of these ideas over the other, or perhaps some other solution?


Are there perhaps circumstances where you would use one solution, and 
others the other?


--
Regards,
=dn


Re: Coding technique: distinguish using type or abc?

2020-01-08 Thread Rob Gaddi

On 1/8/20 1:40 PM, DL Neil wrote:

> Do you prefer to use isinstance() with type() or to refer to collections.abc?
>
> This team, producing base statistical analyses for (lesser capable) user-coders
> to utilise with their own experimental 'control code', faces situations where a
> list-parameter is often only one element long. As is so often the way, amongst
> the 'clients' there are a couple of strong-minded (am not allowed to call them
> "awkward", or otherwise!) user-coder-analysts, who demand that entry of a
> single-element not require them to surround it with "unnecessary"
> square-brackets. Despite complaining, we realise that this is actually quite a
> good practice, and will likely save us (as well as 'them') from interface
> mistakes.
>
> Such single elements appear in both string and numeric formats, but for
> simplicity (here) let's ignore numerics...
>
> The problem rearing its ugly head, is when the string single-element becomes
> input to a for-loop. If we loop around a list, then each element is handled
> individually (as desired). Whereas if the element is a string, then each
> character is treated as if it were a list-element (not desired)!
>
> In Code Review, I noticed that two 'solutions' have been coded.
>
> 1 using type()s to distinguish:
>
>     from typing import Union
>
>     def format_as_list( parameter:Union[ str, list ] )->list:
>         if isinstance( parameter, str ):
>             parameter_list = [ parameter ]
>         elif isinstance( parameter, list ):
>             parameter_list = parameter
>         else:
>             raise NotImplementedError
>         return parameter_list
>
> 2 using abstract base classes from PSL.collections to distinguish:
>
>     import collections.abc as abc
>     from typing import Union
>
>     def is_list_not_string( parameter:Union[ str, list ] ) -> bool:
>         return isinstance( parameter, abc.MutableSequence )
>
>     def format_as_list( parameter:str )->list:
>         if is_list_not_string( parameter ):
>             return parameter
>         else:
>             return [ parameter, ]
>
> (ignoring implicit assumption/error!)
> NB I've simplified the code and attempted to harmonise the varNMs between
> snippets.
>
> With our preference for EAFP, I went looking for a way to utilise an exception
> by way of distinguishing between the input-types - but couldn't see how,
> without major artifice (false/purposeless construct which would confuse the
> next reader of the code). That said, I'm wondering if using (or appearing to
> use) tuples and *args might solve the problem - but will have to dig back
> through the code-base...
>
> Meantime, faced with such a challenge, would you recommend utilising one of
> these ideas over the other, or perhaps some other solution?
>
> Are there perhaps circumstances where you would use one solution, and others
> the other?




I try to avoid making assumptions, so I wind up with a lot of

if isinstance(parameter, str):
  plist = [parameter]
else:
  try:
    plist = list(parameter)
  except TypeError:
    plist = [parameter]

Any iterable gets listified unless it's a string, which gets treated the same 
way a non-iterable does.  EAFP.
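
Wrapped up as a function (the name as_list is just a placeholder), the same logic can be exercised on a few kinds of input:

```python
def as_list(parameter):
    """Listify any iterable except str; wrap strings and non-iterables."""
    if isinstance(parameter, str):
        return [parameter]
    try:
        return list(parameter)
    except TypeError:  # not iterable, e.g. an int
        return [parameter]
```

So as_list("abc") gives ["abc"], as_list(("a", "b")) gives ["a", "b"], and as_list(3) gives [3]. One caveat: bytes is iterable but is not a str, so as_list(b"ab") comes back as [97, 98] unless bytes is added to the isinstance check.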
