Re: Is there a character that never appears in the output of zlib.compress?
On 28Jan2020 23:09, Peng Yu wrote:
> I'd like to tell what part is zlib.compress data in an input stream.
> One way is to use some characters that never appear in zlib.compress
> output to denote the boundary. Are there such characters? Thanks.
If you mean: is there a byte which never appears, then apparently not:
[~]fleet*1> python3 testzlib.py
where testzlib.py contains this code:
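from random import randint
import sys
from zlib import compress
# every byte value not yet seen in any compressed output
unseen = set(range(256))
while unseen:
    sys.stdout.write('.')
    sys.stdout.flush()
    # compress a fresh block of 256 random bytes
    block = bytes(randint(0,255) for _ in range(256))
    cdata = compress(block)
    # tick off each byte value that occurs in the compressed data
    for c in cdata:
        unseen.discard(c)
# the loop only ends once all 256 byte values have been seen
sys.stdout.write('\n')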
--
https://mail.python.org/mailman/listinfo/python-list
Re: Is there a character that never appears in the output of zlib.compress?
On 1/29/20 12:09 AM, Peng Yu wrote:
> Hi, I'd like to tell what part is zlib.compress data in an input
> stream. One way is to use some characters that never appear in
> zlib.compress output to denote the boundary. Are there such
> characters? Thanks.

A compression routine that avoids one byte value would be less
efficient at compression than one that uses all the values.

An alternative might be to precede the compressed data with a byte
count of how much data will follow (as well as whatever file code you
use to indicate that the next data IS compressed data).

A second method would be to take some byte value (like FF) and,
wherever it occurs in the compressed data, replace it with a doubled
value FF FF, and then add a single FF to the end.

--
Richard Damon
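The first alternative above, a marker followed by a byte count ahead of
the compressed data, might be sketched roughly as follows. The one-byte
marker b"Z", the 4-byte big-endian length field, and the names frame()
and unframe() are assumptions made for illustration; nothing in the
thread prescribes them.

```python
import struct
import zlib

MARKER = b"Z"  # assumed "compressed data follows" code, chosen for this sketch


def frame(payload):
    """Compress payload and prefix it with the marker and a byte count."""
    cdata = zlib.compress(payload)
    return MARKER + struct.pack(">I", len(cdata)) + cdata


def unframe(stream, offset=0):
    """Decode one framed block at offset; return (payload, end_offset)."""
    if stream[offset:offset + 1] != MARKER:
        raise ValueError("no compressed block at this offset")
    (count,) = struct.unpack_from(">I", stream, offset + 1)
    start = offset + 1 + 4
    end = start + count
    if end > len(stream):
        raise ValueError("stream ends before the announced byte count")
    return zlib.decompress(stream[start:end]), end


# Compressed data embedded between two uncompressed parts of a stream.
header = b"plain header"
stream = header + frame(b"hello world " * 20) + b"plain trailer"
payload, end = unframe(stream, len(header))
print(len(payload), stream[end:])
```

Because the reader knows the length in advance, it never has to scan
the compressed bytes for a terminator, so no byte value needs to be
reserved or escaped.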
Python 3.8.1
Hi,

I installed Python 3.8.1 on my computer. I have other versions, 2.6
and 3.7. With 2.6 and 3.7 I had no problem:

python2
Python 2.6.6 (r266:84292, Jun 20 2019, 14:14:55)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-23)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy as np
>>>

python3.7
Python 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import numpy
>>> import numpy as np

But if I type:

python3.8

I get these error messages:

import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'numpy'
>>> Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'matplotlib'

Please, what can I do to solve these errors?

Thanks,
Conrado
Re: Python 3.8.1
On 1/29/2020 8:13 AM, J Conrado wrote:
> I installed Python 3.8.1 on my computer. I have other versions, 2.6
> and 3.7. With 2.6 and 3.7 I had no problem:
>
> python2
> Python 2.6.6 (r266:84292, Jun 20 2019, 14:14:55)
> [GCC 4.4.7 20120313 (Red Hat 4.4.7-23)] on linux2
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import numpy as np

No matplotlib import here.

> python3.7
> Python 3.7.3 | packaged by conda-forge | (default, Jul 1 2019, 21:52:21)
> [GCC 7.3.0] :: Anaconda, Inc. on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import numpy
> >>> import numpy as np

No matplotlib import here, though Anaconda may include it.

> But if I type:
>
> python3.8
>
> I get these error messages:
>
> import numpy as np
> import matplotlib.pyplot as plt
> from mpl_toolkits.basemap import Basemap
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ModuleNotFoundError: No module named 'numpy'
> >>> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> ModuleNotFoundError: No module named 'matplotlib'
>
> Please, what can I do to solve these errors?

Install matplotlib into your 3.8 install. Possibly

$ python3.8 -m pip install matplotlib

--
Terry Jan Reedy
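Each interpreter keeps its own set of installed packages, so it can
help to ask a given interpreter directly what it can see. The following
is a small sketch for that purpose; the helper name missing_modules()
is made up for this example, and the module names are simply the ones
from the report above.

```python
import importlib.util
import sys


def missing_modules(names):
    """Return the top-level module names this interpreter cannot find."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Which interpreter is running, and which of the wanted modules it lacks.
print("interpreter:", sys.executable)
print("missing:", missing_modules(["numpy", "matplotlib"]))
```

Saved under some name such as check.py (a hypothetical file name) and
run once with python3.7 and once with python3.8, it shows which install
still needs the packages.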
Re: on sorting things
Tony Flury via Python-list wrote:

> On 20/12/2019 18:59, Peter Otten wrote:
>> Chris Angelico wrote:
>>
>>> On Sat, Dec 21, 2019 at 5:03 AM Peter Otten <[email protected]> wrote:
>>>> PS: If you are sorting files by size and checksum as part of a
>>>> deduplication effort consider using dict-s instead:
>>> Yeah, I'd agree if that's the purpose. But let's say the point is to
>>> have a guaranteed-stable ordering of files that are primarily to be
>>> sorted by file size - in order to ensure that two files are in the
>>> same order every time you refresh the view, they get sorted by their
>>> checksums.
>> One thing that struck me about Eli's example is that it features two
>> key functions rather than a complex comparison.
>>
>> If sort() would accept a sequence of key functions each function
>> could be used to sort slices that compare equal when using the
>> previous key.
>
> You don't need a sequence of key functions: the sort algorithm used in
> Python (tim-sort) is stable - which means if two items (A & B) are in
> a given order in the sequence before the sort starts, and A & B
> compare equal during the sort, then after the sort A & B retain their
> ordering.

Thank you for explaining that ;)

> So if you want to sort by file size as the primary and then by
> checksum if file sizes are equal - you sort by checksum first, and
> then by file size: this guarantees that the items will always be in
> file size order - and if file sizes are equal then they will be
> ordered by checksum.
>
> The rule to remember - is sort in the reverse order of criteria.

The idea behind the "sequence of key functions" suggestion is that you
only calculate keys[n] where keys[:n] all compared equal.

Example: you have 1000 files and among those there are five pairs with
equal size. Let's assume calculating the size is instantaneous whereas
calculating the checksum takes one second.

Then you spend 10 seconds on calculating the keys with my proposal
versus 1000 seconds with a naive reliance on stable sorting.
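sort() does not accept a sequence of key functions, but the effect can
be approximated with a key object that evaluates its key functions
lazily. What follows is only a sketch of that idea, not code from the
thread: the names LazyKey and lazy_sorted are invented here, and the
"files" are stand-in (name, size) tuples whose name doubles as the
checksum.

```python
class LazyKey:
    """Sort key that evaluates key functions only as far as needed."""

    def __init__(self, item, keyfuncs):
        self.item = item
        self.keyfuncs = keyfuncs
        self.values = []  # cache of the keys computed so far

    def key(self, n):
        # Compute and cache keys 0..n the first time they are needed.
        while len(self.values) <= n:
            func = self.keyfuncs[len(self.values)]
            self.values.append(func(self.item))
        return self.values[n]

    def __lt__(self, other):
        # sorted() and list.sort() only use "<" on the key objects.
        for n in range(len(self.keyfuncs)):
            mine, theirs = self.key(n), other.key(n)
            if mine != theirs:
                return mine < theirs
        return False  # all keys equal


def lazy_sorted(items, *keyfuncs):
    return sorted(items, key=lambda item: LazyKey(item, keyfuncs))


# Stand-ins for files: (name, size) pairs.
files = [("a", 3), ("b", 1), ("c", 3), ("d", 2)]
checksummed = []  # records which entries needed the expensive key


def size(entry):
    return entry[1]


def checksum(entry):
    checksummed.append(entry[0])
    return entry[0]


ordered = lazy_sorted(files, size, checksum)
print(ordered)
print(sorted(checksummed))
```

Only the two entries that share a size ever have their checksum
computed, and the cache ensures each is computed once.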
Suggestions on mechanism or existing code - maintain persistence of file download history
Hi all

I'm almost embarrassed to ask this as it's "so simple", but thought
I'd give it a go...

I want to be able to use a simple 'download manager' which I was going
to write (in Python), but then wondered if there was something suitable
already out there. I haven't found it, but thought people here might
have some ideas for existing work, or approaches.

The situation is this - I have a long list of file URLs and want to
download these as a 'background task'. I want this process to be
'crudely persistent' - you can CTRL-C out, and next time you run things
it will pick up where it left off.

The download part is not difficult. It is the persistence bit I am
thinking about. It is not easy to tell the name of the downloaded file
from the URL.

I could have a file with all the URLs listed and work through each line
in turn. But then I would have to rewrite the file (say, with the
previously-successful lines commented out) as I go.

I also thought of having the actual URLs as filenames (of zero length)
in a 'source' directory. The process would then look at each filename
in turn, and download the appropriate URL. Then the 'filename file'
would either be moved to a 'done' directory, or perhaps renamed to
something that the process wouldn't subsequently pick up.

But I would have thought that some utility to do this kind of thing
exists already. Any pointers? Or any comments on the above suggested
methods?

Thanks
J^n
Re: Suggestions on mechanism or existing code - maintain persistence of file download history
On Thu, Jan 30, 2020 at 7:06 AM jkn wrote:
>
> Hi all
> I'm almost embarrassed to ask this as it's "so simple", but thought
> I'd give it a go...

Hey, nothing wrong with that!

> I want to be able to use a simple 'download manager' which I was
> going to write (in Python), but then wondered if there was something
> suitable already out there. I haven't found it, but thought people
> here might have some ideas for existing work, or approaches.
>
> The situation is this - I have a long list of file URLs and want to
> download these as a 'background task'. I want this process to be
> 'crudely persistent' - you can CTRL-C out, and next time you run
> things it will pick up where it left off.

A decent project. I've done this before but in restricted ways.

> The download part is not difficult. It is the persistence bit I am
> thinking about. It is not easy to tell the name of the downloaded
> file from the URL.
>
> I could have a file with all the URLs listed and work through each
> line in turn. But then I would have to rewrite the file (say, with
> the previously-successful lines commented out) as I go.

Hmm. The easiest way would be to have something from the URL in the
file name. For instance, you could hash the URL and put the first few
digits of the hash in the file name, so
http://some.domain.example/some/path/filename.html might get saved
into "a39321604c - filename.html". That way, if you want to know if
it's been downloaded already, you just hash the URL and see if any
file begins with those digits.

Would that kind of idea work?

ChrisA
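A rough sketch of the hash-prefix idea, under some assumptions the
message does not settle: SHA-256 as the hash, ten hex digits as the
prefix, and the helper names url_tag(), target_path() and
is_downloaded(). The prefix "a39321604c" in the message is an
illustration, so the digits produced here will differ.

```python
import hashlib
import tempfile
from pathlib import Path


def url_tag(url, digits=10):
    """First few hex digits of a hash of the URL (SHA-256 assumed here)."""
    return hashlib.sha256(url.encode("utf-8")).hexdigest()[:digits]


def target_path(url, filename, directory):
    """Where to save the download: '<tag> - <filename>'."""
    return Path(directory) / f"{url_tag(url)} - {filename}"


def is_downloaded(url, directory):
    """True if any file in directory begins with this URL's tag."""
    return any(Path(directory).glob(url_tag(url) + " - *"))


# Demonstration in a scratch directory, with a stand-in for the download.
downloads = tempfile.mkdtemp()
url = "http://some.domain.example/some/path/filename.html"
before = is_downloaded(url, downloads)
target_path(url, "filename.html", downloads).write_text("stand-in content")
after = is_downloaded(url, downloads)
print(before, after)
```

The state lives entirely in the file names, so there is no separate
list to rewrite or corrupt.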
Re: Suggestions on mechanism or existing code - maintain persistence of file download history
On 2020-01-29 20:00, jkn wrote:
> Hi all
> I'm almost embarrassed to ask this as it's "so simple", but thought
> I'd give it a go...
>
> I want to be able to use a simple 'download manager' which I was
> going to write (in Python), but then wondered if there was something
> suitable already out there. I haven't found it, but thought people
> here might have some ideas for existing work, or approaches.
>
> The situation is this - I have a long list of file URLs and want to
> download these as a 'background task'. I want this process to be
> 'crudely persistent' - you can CTRL-C out, and next time you run
> things it will pick up where it left off.
>
> The download part is not difficult. It is the persistence bit I am
> thinking about. It is not easy to tell the name of the downloaded
> file from the URL.
>
> I could have a file with all the URLs listed and work through each
> line in turn. But then I would have to rewrite the file (say, with
> the previously-successful lines commented out) as I go.

Why comment out the lines yourself when the download manager could do
it for you?

Load the list from disk.

For each uncommented line:

    Download the file.

    Comment out the line.

    Write the list back to disk.

> I also thought of having the actual URLs as filenames (of zero
> length) in a 'source' directory. The process would then look at each
> filename in turn, and download the appropriate URL. Then the
> 'filename file' would either be moved to a 'done' directory, or
> perhaps renamed to something that the process wouldn't subsequently
> pick up.
>
> But I would have thought that some utility to do this kind of thing
> exists already. Any pointers? Or any comments on the above suggested
> methods?
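The steps above could look something like this. It is a sketch only:
the "# " comment prefix, the function names, and the download callable
passed in are all assumptions, and writing through a temporary file
followed by os.replace() is an addition made here so that an interrupt
during the write is less likely to leave a half-written list.

```python
import os
import tempfile


def save_lines(path, lines):
    """Write the list to a temporary file, then swap it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.writelines(line + "\n" for line in lines)
    os.replace(tmp, path)


def process_list(path, download):
    """Download every uncommented URL, commenting each out when done."""
    with open(path, encoding="utf-8") as f:
        lines = f.read().splitlines()
    for i, line in enumerate(lines):
        if not line.strip() or line.startswith("#"):
            continue  # blank, or already downloaded on an earlier run
        download(line.strip())
        lines[i] = "# " + line
        save_lines(path, lines)  # persist progress after every download


# Demonstration with a stand-in "download" that just records the URL.
workdir = tempfile.mkdtemp()
list_path = os.path.join(workdir, "urls.txt")
save_lines(list_path, ["http://a.example/1", "http://a.example/2"])
fetched = []
process_list(list_path, fetched.append)
process_list(list_path, fetched.append)  # second run finds nothing to do
print(fetched)
```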
Re: Suggestions on mechanism or existing code - maintain persistence of file download history
On Thu, Jan 30, 2020 at 7:49 AM MRAB wrote:
>
> On 2020-01-29 20:00, jkn wrote:
> > I could have a file with all the URLs listed and work through each
> > line in turn. But then I would have to rewrite the file (say, with
> > the previously-successful lines commented out) as I go.
> >
> Why comment out the lines yourself when the download manager could do
> it for you?
>
> Load the list from disk.
>
> For each uncommented line:
>
>     Download the file.
>
>     Comment out the line.
>
>     Write the list back to disk.
>

Isn't that exactly what the OP was talking about? It involves
rewriting the file at every step, with the consequent risks of
trampling on other changes, corruption on error, etc, etc, etc.

ChrisA
Re: Suggestions on mechanism or existing code - maintain persistence of file download history
On Thu, 30 Jan 2020 07:26:36 +1100
Chris Angelico wrote:

> On Thu, Jan 30, 2020 at 7:06 AM jkn wrote:
> > The situation is this - I have a long list of file URLs and want to
> > download these as a 'background task'. I want this process to be
> > 'crudely persistent' - you can CTRL-C out, and next time you run
> > things it will pick up where it left off.
> A decent project. I've done this before but in restricted ways.
> > The download part is not difficult. It is the persistence bit I am
> > thinking about. It is not easy to tell the name of the downloaded
> > file from the URL.

Where do the names of the downloaded files come from now, and why
can't that same algorithm be used later to determine the existence of
the file? How much control do you have over this algorithm (which
leads to what ChrisA suggested)?

> > I could have a file with all the URLs listed and work through each
> > line in turn. But then I would have to rewrite the file (say, with
> > the previously-successful lines commented out) as I go.

Files have that problem. Other solutions, e.g., a sqlite3 database,
don't. Also, a database might give you a place to store other
information about the URL, such as the name of the associated file.

> Hmm. The easiest way would be to have something from the URL in the
> file name. For instance, you could hash the URL and put the first few
> digits of the hash in the file name, so
> http://some.domain.example/some/path/filename.html might get saved
> into "a39321604c - filename.html". That way, if you want to know if
> it's been downloaded already, you just hash the URL and see if any
> file begins with those digits.
> Would that kind of idea work?

Dan

--
“Atoms are not things.” – Werner Heisenberg
Dan Sommers, http://www.tombstonezero.net/dan
Re: Suggestions on mechanism or existing code - maintain persistence of file download history
On Wednesday, January 29, 2020 at 8:27:03 PM UTC, Chris Angelico wrote:
> On Thu, Jan 30, 2020 at 7:06 AM jkn wrote:
> >
> > Hi all
> > I'm almost embarrassed to ask this as it's "so simple", but thought
> > I'd give it a go...
>
> Hey, nothing wrong with that!
>
> > I want to be able to use a simple 'download manager' which I was
> > going to write (in Python), but then wondered if there was
> > something suitable already out there. I haven't found it, but
> > thought people here might have some ideas for existing work, or
> > approaches.
> >
> > The situation is this - I have a long list of file URLs and want to
> > download these as a 'background task'. I want this process to be
> > 'crudely persistent' - you can CTRL-C out, and next time you run
> > things it will pick up where it left off.
>
> A decent project. I've done this before but in restricted ways.
>
> > The download part is not difficult. It is the persistence bit I am
> > thinking about. It is not easy to tell the name of the downloaded
> > file from the URL.
> >
> > I could have a file with all the URLs listed and work through each
> > line in turn. But then I would have to rewrite the file (say, with
> > the previously-successful lines commented out) as I go.
> >
>
> Hmm. The easiest way would be to have something from the URL in the
> file name. For instance, you could hash the URL and put the first few
> digits of the hash in the file name, so
> http://some.domain.example/some/path/filename.html might get saved
> into "a39321604c - filename.html". That way, if you want to know if
> it's been downloaded already, you just hash the URL and see if any
> file begins with those digits.
>
> Would that kind of idea work?
>
> ChrisA

Hi Chris

Thanks for the idea. I should perhaps have said more clearly that it
is not easy (though perhaps not impossible) to infer the name of the
downloaded data from the URL - it is not a 'simple' file URL, more of
a tag.

However I guess your scheme would work if I just hashed the URL and
created a marker file - "a39321604c.downloaded" once downloaded. The
downloaded content would be separately (and somewhat opaquely) named,
but that doesn't matter.

MRAB's scheme does have the disadvantages to me that Chris has pointed
out.

Jon N
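The marker-file variant might be sketched like this. The choice of
SHA-256, the ten-digit prefix, the state directory, and the function
names are assumptions for the example, and the download callable is a
stand-in that simulates one CTRL-C so the resume behaviour can be seen.

```python
import hashlib
import tempfile
from pathlib import Path


def marker_path(url, state_dir):
    """Path of the '<hash>.downloaded' marker for this URL."""
    digest = hashlib.sha256(url.encode("utf-8")).hexdigest()[:10]
    return Path(state_dir) / (digest + ".downloaded")


def run(urls, state_dir, download):
    """Download each URL that has no marker yet, then create its marker."""
    for url in urls:
        marker = marker_path(url, state_dir)
        if marker.exists():
            continue  # finished on a previous run
        download(url)
        marker.touch()  # only reached if the download did not raise


# Demonstration: the stand-in download is "interrupted" once on tag:two.
state = tempfile.mkdtemp()
urls = ["tag:one", "tag:two", "tag:three"]
attempts, fetched = [], []


def interrupted_once(url):
    attempts.append(url)
    if url == "tag:two" and attempts.count(url) == 1:
        raise KeyboardInterrupt
    fetched.append(url)


try:
    run(urls, state, interrupted_once)
except KeyboardInterrupt:
    pass
run(urls, state, interrupted_once)  # picks up where it left off
print(attempts, fetched)
```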
Re: Suggestions on mechanism or existing code - maintain persistence of file download history
On 30/01/20 10:38 AM, jkn wrote:
> On Wednesday, January 29, 2020 at 8:27:03 PM UTC, Chris Angelico wrote:
>> On Thu, Jan 30, 2020 at 7:06 AM jkn wrote:
>>> I want to be able to use a simple 'download manager' which I was
>>> going to write (in Python), but then wondered if there was
>>> something suitable already out there. I haven't found it, but
>>> thought people here might have some ideas for existing work, or
>>> approaches.
>>>
>>> The situation is this - I have a long list of file URLs and want to
>>> download these as a 'background task'. I want this process to be
>>> 'crudely persistent' - you can CTRL-C out, and next time you run
>>> things it will pick up where it left off.
>>
>> A decent project. I've done this before but in restricted ways.
>>
>>> The download part is not difficult. It is the persistence bit I am
>>> thinking about. It is not easy to tell the name of the downloaded
>>> file from the URL.
>>>
>>> I could have a file with all the URLs listed and work through each
>>> line in turn. But then I would have to rewrite the file (say, with
>>> the previously-successful lines commented out) as I go.
...
> Thanks for the idea. I should perhaps have said more clearly that it
> is not easy (though perhaps not impossible) to infer the name of the
> downloaded data from the URL - it is not a 'simple' file URL, more of
> a tag.
>
> However I guess your scheme would work if I just hashed the URL and
> created a marker file - "a39321604c.downloaded" once downloaded. The
> downloaded content would be separately (and somewhat opaquely) named,
> but that doesn't matter.
>
> MRAB's scheme does have the disadvantages to me that Chris has
> pointed out.

Accordingly, +1 to @Dan's suggestion of a database*:

- it can be structured to act as a queue, for URLs yet to be downloaded
- when downloading starts, the pertinent row can be updated to include
  the fileNM in use (a separate field from the URL)
- when the download is complete, further update the row with a suitable
  'flag'
- as long as each write/update is commit-ed, the system will be
  interrupt-able (^c).

Upon resumption, query the DB looking for entries without
completion-flags, and re-start/resume the download process.

If a downloaded file is (later) found to be corrupt, either add the
details to the queue again, or remove the 'flag' from the original
entry.

This method could also be extended/complicated to work if you (are
smart about) implement multiple retrieval threads...

* NB I don't use SQLite (in favor of going 'full-fat') and thus cannot
vouch for its behavior under load/queuing mechanism/concurrent
accesses... but I'm biased and probably think/write SQL more readily
than Python - oops!

--
Regards =dn
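The queue described above might be sketched with the standard sqlite3
module as follows. The table and column names, the helper names, and
the two callables passed to run() (one that picks a file name, one that
performs the download) are hypothetical, chosen only to make the steps
concrete.

```python
import sqlite3


def open_queue(path):
    """Open (or create) the download queue database."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS downloads ("
        " url TEXT PRIMARY KEY,"
        " filename TEXT,"
        " done INTEGER NOT NULL DEFAULT 0)"
    )
    db.commit()
    return db


def enqueue(db, urls):
    """Add URLs to the queue; URLs already present are left untouched."""
    db.executemany(
        "INSERT OR IGNORE INTO downloads (url) VALUES (?)",
        [(url,) for url in urls],
    )
    db.commit()


def pending(db):
    """URLs without a completion flag, in insertion order."""
    rows = db.execute("SELECT url FROM downloads WHERE done = 0 ORDER BY rowid")
    return [row[0] for row in rows]


def run(db, download, name_for):
    for url in pending(db):
        filename = name_for(url)
        # Record the file name in use before the download starts.
        db.execute("UPDATE downloads SET filename = ? WHERE url = ?",
                   (filename, url))
        db.commit()
        download(url, filename)
        # Flag completion; committing each step keeps the state resumable.
        db.execute("UPDATE downloads SET done = 1 WHERE url = ?", (url,))
        db.commit()


# Demonstration with an in-memory database and a stand-in download.
db = open_queue(":memory:")
enqueue(db, ["tag:one", "tag:two"])
saved = []
run(db,
    download=lambda url, filename: saved.append((url, filename)),
    name_for=lambda url: url.replace(":", "_") + ".bin")
print(saved, pending(db))
```

Re-queuing a corrupt file then amounts to setting done back to 0 for
its row.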
Re: Suggestions on mechanism or existing code - maintain persistence of file download history
On Thu, Jan 30, 2020 at 8:59 AM DL Neil via Python-list wrote:
> * NB I don't use SQLite (in favor of going 'full-fat') and thus cannot
> vouch for its behavior under load/queuing mechanism/concurrent
> accesses... but I'm biased and probably think/write SQL more readily
> than Python - oops!

I don't use SQLite either, and I always have a PostgreSQL database
around that I can use. That said, though, I believe SQLite is fine in
terms of reliability; the reason it's a bad choice for concurrency is
that it uses large-scale locks to ensure safety, which means that
multiple writers will block against each other. But that's fine for
this use-case.

So my recommendations would be:

1) Something stateless, or where the state is intrinsic to the
   downloaded files
2) Or failing that, use a database rather than a flat file for your
   state.

ChrisA
PyQt5 QLineEditor help!!!
Hi guys

I just started to learn PyQt5 and was wondering if, like in kivy, we
can delete the text in a textbox after taking the input. That is, I
want to make the textbox blank after the text is read.

Also, can you suggest a way to connect a cancel button with a function
so that when the cancel button is clicked it exits a window?

Thank you in advance.
Help on dictionaries...
Hey, I was thinking about how I can save a dictionary in Python
(obviously) so that when the script is rerun it automatically loads
the dictionary.
Re: Help on dictionaries...
On 1/29/20 6:14 PM, Souvik Dutta wrote:
> Hey I was thinking how I can save a dictionary in python(obviously) so
> that the script is rerun it automatically loads the dictionary.

You could use the pickle module for that. See the python.org
documentation on pickle.

Alternatively you could use a json library to write the dict to disk.
I think this might be preferable to pickle in many situations.

Or serialize the data yourself to a file.
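As a minimal sketch of the json route, assuming the dictionary holds
only strings, numbers, lists and nested dicts with string keys: the
helper names and the file name settings.json are invented for the
example.

```python
import json
import os
import tempfile


def save_dict(path, data):
    """Write the dictionary to disk as JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f, indent=2)


def load_dict(path):
    """Load the dictionary, or start empty on the very first run."""
    try:
        with open(path, encoding="utf-8") as f:
            return json.load(f)
    except FileNotFoundError:
        return {}


# Demonstration in a scratch directory.
path = os.path.join(tempfile.mkdtemp(), "settings.json")
first_run = load_dict(path)            # nothing saved yet
save_dict(path, {"name": "Souvik", "runs": 1})
second_run = load_dict(path)           # what a rerun of the script would see
print(first_run, second_run)
```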
Re: Help on dictionaries...
On 30/01/20 2:14 PM, Souvik Dutta wrote:
> Hey I was thinking how I can save a dictionary in python(obviously) so
> that the script is rerun it automatically loads the dictionary.

Perhaps a YAML or JSON file (which follow a very similar format and
structure to Python dicts), or a 'NoSQL' database such as MongoDB.

--
Regards =dn
Re: Help on dictionaries...
On 2020-01-30 06:44, Souvik Dutta wrote:
> Hey I was thinking how I can save a dictionary in python(obviously)
> so that the script is rerun it automatically loads the dictionary.
This is almost exactly what the "dbm" (nee "anydbm") module does, but
persisting the dictionary out to the disk:
import dbm
from sys import argv
with dbm.open("my_cache", "c") as db:
    if len(argv) > 1:
        key = argv[1]
        if key in db:
            print("Found it:", db[key])
        else:
            print("Not found. Adding")
            if len(argv) > 2:
                value = argv[2]
            else:
                value = key
            db[key] = value
    else:
        print("There are %i items in the cache" % len(db))
The resulting "db" acts like a dictionary, but persists.
If you really must have the results as a "real" dict, you can do the
conversion:
real_dict = dict(db)
-tkc
Re: PyQt5 QLineEditor help!!!
On 1/29/20 6:11 PM, Souvik Dutta wrote:
> Hi guys I just started to learn PyQt5 and was wondering if like kivy
> we can delete the text in a textbox after taking the input. That is I
> want to make the textbox blank after the text is read. Also can you
> suggest a way to connect a cancel button with a function so that when
> the cancel button is clicked it exits a window. Thank you in advance.

How do you know when the input is done (taken)? If it's from pressing
Enter, then you'll probably have to capture that keystroke somehow (is
there a signal defined in QWidget that might do that?). If it's from a
button press that activates something, then from the button click
callback handler you would call clear() on the QLineEdit instance.

You can definitely close a window on a cancel button click. If you
just want to hide the window for displaying in the future, call
setVisible(False) on the window object. If you want to destroy the
window object, you would probably call the "destroy()" method of the
window, and then delete the python reference to it (either let it go
out of scope, or use del on it).
Re: Help on dictionaries...
Thank you all.

On Thu, Jan 30, 2020, 7:25 AM DL Neil via Python-list <[email protected]> wrote:
> On 30/01/20 2:14 PM, Souvik Dutta wrote:
> > Hey I was thinking how I can save a dictionary in python(obviously)
> > so that the script is rerun it automatically loads the dictionary.
>
> Perhaps a YAML or JSON file (which follow a very similar format and
> structure to Python dicts), or a 'NoSQL' database such as MongoDB.
>
> --
> Regards =dn
Re: on sorting things
On 20Dec2019 08:23, Chris Angelico wrote:
> On Fri, Dec 20, 2019 at 8:06 AM Eli the Bearded <*@eli.users.panix.com> wrote:
>> Consider a sort that first compares file size and if the same number
>> of bytes, then compares file checksum. Any decently scaled real
>> world implementation would memoize the checksum for speed, but only
>> work it out for files that do not have a unique file size. The key
>> method requires it worked out in advance for everything.
>>
>> But I see the key method handles the memoization under the hood for
>> you, so those simpler, more common sorts of sort get an easy to see
>> benefit.
>
> I guess that's a strange situation that might actually need this kind
> of optimization, but if you really do have that situation, you can
> make a magical key that behaves the way you want.
> [... example implementation ...]

The classic situation matching Eli's criteria is comparing file trees
for equivalent files, for backup or synchronisation or hard linking
purposes; I've a script which does exactly what he describes in terms
of comparison (size, then checksum, but I checksum a short prefix
before doing a full file checksum, so even more fiddly).

However, my example above isn't very amenable to sorts, because you
never bother looking at checksums at all for files of different sizes.
OTOH, I do sort the files by size before processing the checksum
phases, letting one sync/reclaim the big files first for example - a
policy choice.

Cheers,
Cameron Simpson
Re: Help on dictionaries...
On 2020-01-30 01:51, Michael Torrie wrote:
> On 1/29/20 6:14 PM, Souvik Dutta wrote:
>> Hey I was thinking how I can save a dictionary in python(obviously)
>> so that the script is rerun it automatically loads the dictionary.
>
> You could use the pickle module for that. See the python.org
> documentation on pickle.
>
> Alternatively you could use a json library to write the dict to disk.
> I think this might be preferable to pickle in many situations.
>
> Or serialize the data yourself to a file.

JSON itself supports only a limited range of types, so additional work
is needed, especially when loading, if the dict contains any other
types beyond those.
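A short illustration of that limitation, using a made-up dictionary
with an integer key and a tuple value. The repair step at the end is
just one possible approach for this particular data, not a general
solution.

```python
import json

original = {1: "one", "point": (2, 3)}

# Round trip through JSON text, as saving and reloading a file would do.
restored = json.loads(json.dumps(original))
print(restored)  # the int key is now a string and the tuple is a list

# Extra work on loading: convert the values back for this known layout.
repaired = {
    (int(key) if key.isdigit() else key):
        (tuple(value) if isinstance(value, list) else value)
    for key, value in restored.items()
}
print(repaired == original)
```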
Re: Help on dictionaries...
How do I connect it with my dictionary?

On Thu, Jan 30, 2020, 7:03 AM Tim Chase wrote:
> On 2020-01-30 06:44, Souvik Dutta wrote:
> > Hey I was thinking how I can save a dictionary in python(obviously)
> > so that the script is rerun it automatically loads the dictionary.
>
> This is almost exactly what the "dbm" (nee "anydbm") module does, but
> persisting the dictionary out to the disk:
>
> import dbm
> from sys import argv
> with dbm.open("my_cache", "c") as db:
>     if len(argv) > 1:
>         key = argv[1]
>         if key in db:
>             print("Found it:", db[key])
>         else:
>             print("Not found. Adding")
>             if len(argv) > 2:
>                 value = argv[2]
>             else:
>                 value = key
>             db[key] = value
>     else:
>         print("There are %i items in the cache" % len(db))
>
> The resulting "db" acts like a dictionary, but persists.
>
> If you really must have the results as a "real" dict, you can do the
> conversion:
>
> real_dict = dict(db)
>
> -tkc
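One way an existing dictionary might be connected to dbm is to copy it
in when saving and copy it back out when the script starts. The sketch
below assumes the dictionary maps strings to strings (dbm stores keys
and values as bytes), and the helper names and the cache location are
invented for the example.

```python
import dbm
import os
import tempfile


def save_dict(path, data):
    """Copy a str -> str dictionary into a dbm file."""
    with dbm.open(path, "c") as db:
        for key, value in data.items():
            db[key.encode("utf-8")] = value.encode("utf-8")


def load_dict(path):
    """Read the dbm file back into an ordinary dictionary of strings."""
    with dbm.open(path, "r") as db:
        return {key.decode("utf-8"): db[key].decode("utf-8")
                for key in db.keys()}


# Demonstration in a scratch directory.
path = os.path.join(tempfile.mkdtemp(), "my_cache")
my_dict = {"spam": "eggs", "answer": "42"}
save_dict(path, my_dict)
reloaded = load_dict(path)  # what the script would do on its next run
print(reloaded)
```

Values that are not strings would need converting first, for example
with str() or json.dumps(), and converting back after loading.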
Was: Dynamic Data type assignment
Further thoughts on the OP's point:-

On 29/01/20 4:51 PM, sushma ms wrote:
...
> But why can't we make output of input also dynamic data assignment.
...
> when i'm assigning value dynamically and when we comparing in "if"
> loop it is throwing compiler error. It should not throw error it
> should assign and act as int why it is thinking as string.

NB am not disputing the facts:
WebRef: https://docs.python.org/3/library/functions.html#input

Coincidentally, not long after this list-conversation, I was asked to
take a look at a command-line program(me) which was not accepting
arguments per spec. First, I dived into the PSL to refresh my memory.
There we find all manner of 'goodies' for formatting the cmdLN I/P,
selecting its type, defaults, etc, etc.

Why do we have this at the cmdLN and yet not have something similar
for input?

Perhaps there is a small library available (that I've never gone
looking to find) which wraps input() and facilitates the input of
int[egers], for example?

What the OP asks is not really 'out there', we used FORMAT to type
INPUT data in FORTRAN, back in the ?good old days!
(don't over-excite me or I'll threaten you with my walking-stick...)

--
Regards =dn
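Such a wrapper is small enough to write by hand. The sketch below is
one possible shape, not an existing library: the name ask(), its
parameters, and the retry limit are all invented here, and the demo
feeds canned answers instead of reading the keyboard.

```python
def ask(prompt, convert=int, reader=input, retries=3):
    """Prompt until the reply converts cleanly, then return the value."""
    for _ in range(retries):
        text = reader(prompt)
        try:
            return convert(text.strip())
        except ValueError:
            print(f"{text!r} is not a valid {convert.__name__}; try again")
    raise ValueError(f"no valid {convert.__name__} after {retries} attempts")


# Demonstration with canned replies; pass nothing for reader to use input().
replies = iter(["forty-two", " 42 ", "2.5"])
count = ask("How many? ", int, reader=lambda prompt: next(replies))
ratio = ask("Ratio? ", float, reader=lambda prompt: next(replies))
print(count + 1, ratio)
```

The returned value is already an int (or float), so it can be compared
in an "if" without a further conversion.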
