Re: Compression of random binary data
[email protected] wrote:
> Compress this:
> 4135124325
> Bin to dec...still very large
> 0110 0000 1101 01100101

Wait right there! You're cheating by dropping off leading 0 bits.

The maximum value of a 10-digit decimal number is 9,999,999,999, which in hex is 2540be3ff. That's 34 bits. That's in line with the theoretical number of bits needed:

log2(10) * 10 = 33.219

So the binary version of your number above is really

00 0110 0000 1101 01100101

You may think you can get away without storing or transmitting those leading 0 bits, because the decoder can always pad out the data as needed. But to do that, the decoder needs to know *how many* bits to pad out to. That information somehow needs to be part of the encoded data.

You need to imagine you're sending the data to someone over a wire. The only thing you can send along the wire are ones and zeroes. You can't convey any information by timing, or turning off the power, or anything like that. How is the receiver going to know when he's got the whole message?

There are only two possibilities. Either you decide in advance that all messages will be exactly the same length, which in this case means always sending exactly 34 bits. Or you add some extra bits to the message -- prepend the length in binary, or add an end-of-message code at the end, or something like that. Whatever you do, you'll find that *on average* you will need *at least* 34 bits to be able to represent all possible 10-digit decimal numbers. Some might be shorter, but then others will be longer, and the average won't be less than 34.

> New compression method:
> 11000101 11000111 0100
> A full byte less than bin.

You need to be *very* careful about what you're claiming here. Are you saying that your algorithm compresses *all possible* sequences of 10 decimal digits to 3 bytes or less? Or can some of them come out longer?

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list
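A quick sanity check of those figures, as a minimal Python sketch:

import math

# bits needed to distinguish all 10**10 ten-digit decimal numbers
bits = 10 * math.log2(10)
print(bits)             # 33.219...
print(math.ceil(bits))  # 34 -- the fixed-length figure quoted above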
Re: Compression of random binary data
Greg, you're very smart, but you are missing a big key. I'm not padding, you are still thinking inside the box, and will never solve this by doing so. Yes! At least you see my accomplishment, this will compress any random file. -- https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On 23.10.17 at 12:13, Marko Rauhamaa wrote:
> Thomas Jollans:
>> On 2017-10-23 11:32, [email protected] wrote:
>>> According to this website. This is an uncompressable stream.
>>> https://en.m.wikipedia.org/wiki/Incompressible_string
>>> 12344321
>>
>> No, it's not. According to that article, that string is incompressible
>> by a particular algorithm. I can see no more general claims.
>
> Here's a compression algorithm that manages to compress that string
> into a 0-bit string:
>
> * If the original string is 12344321 (whatever that means), return the
>   empty bit string.
>
> * Otherwise, prepend a don't-care bit to the original string and
>   return the result of the concatenation.

...and that's why there is the "Kolmogorov complexity". You need to append the decompression program to the data to show how much you really saved, which will turn out to be nothing compared to the "trivial decompressor"

print "12344321"

Christian
--
https://mail.python.org/mailman/listinfo/python-list
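That two-rule scheme is short enough to write down; a minimal Python sketch (the string literal is the one from the thread):

def compress(s):
    # special-case the single favoured string...
    if s == "12344321":
        return ""
    # ...and pay one extra "don't-care" bit for everything else
    return "0" + s

def decompress(c):
    if c == "":
        return "12344321"
    return c[1:]

assert decompress(compress("12344321")) == "12344321"
assert decompress(compress("01010101")) == "01010101"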
Re: Compression of random binary data
No leading zeroes are being dropped off... wish this board had an edit button.
--
https://mail.python.org/mailman/listinfo/python-list
[ANN] Nuclio: A scalable, open source, real-time processing platform
Hi,

Just wanted to share a project I'm working on. It's a super-fast serverless platform that supports Python handlers as well.

Check out more at https://www.iguazio.com/nuclio-new-serverless-superhero/
Code at https://github.com/nuclio/nuclio/

Happy hacking,
--
Miki
--
https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
[email protected] wrote:
> My 8 year old can decode this back into base 10,

Keep in mind that your 8 year old has more information than just the 32 bits you wrote down -- he can also see that there *are* 32 bits and no more. That's hidden information that you're not counting.

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
Paul Moore wrote:
> But that's not "compression", that's simply using a better encoding.
> In the technical sense, "compression" is about looking at redundancies
> that go beyond the case of how effectively you pack data into the
> bytes available.

There may be a difference in the way the terms are used, but I don't think there's any fundamental difference. Compression is about finding clever ways to make the encoding better. Either way, the information-theoretic limits on the number of bits needed are the same.

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On 24 October 2017 at 09:43, Gregory Ewing wrote:
> Paul Moore wrote:
>> But that's not "compression", that's simply using a better encoding.
>> In the technical sense, "compression" is about looking at redundancies
>> that go beyond the case of how effectively you pack data into the
>> bytes available.
>
> There may be a difference in the way the terms are used, but
> I don't think there's any fundamental difference. Compression
> is about finding clever ways to make the encoding better.

Agreed - I was trying (probably futilely, given the way this thread has gone...) to make a distinction between the purely local properties that are typically considered in "how you encode the data" and the detection of more global patterns, which is where what are typically referred to as "compression" algorithms get their power. But sadly, I don't think the OP is actually interested in understanding the background, so the distinction wasn't really worth making :-(

> Either way, the information-theoretic limits on the number
> of bits needed are the same.

Precisely.

Paul
--
https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
[email protected] writes:
> Finally figured out how to turn this into a random binary compression
> program. Since my transform can compress more than dec to binary. Then
> i took a random binary stream,

Forget random data. For one thing it's hard to define, but more importantly no one cares about it. By its very nature, random data is not interesting. What people want is a reversible compression algorithm that works on *arbitrary data* -- i.e. on *any* file at all, no matter how structured and *non-random* it is.

For example, run the complete works of Shakespeare through your program. The result is very much not random data, but that's the sort of data people want to compress. If you can compress the output of your compressor you have made a good start. Of course what you really want to be able to do is to compress the output that results from compressing your compressed output. And, of course, you should not stop there. Since you can compress *any* data (not just the boring random stuff) you can keep going -- compressing the compressed output again and again until you end up with a zero-length file.

Then you publish in a major journal. Post the link to the journal article when you are done.

--
Ben.
--
https://mail.python.org/mailman/listinfo/python-list
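The "compress your compressor's output" test is easy to try with a real compressor; a minimal sketch, with zlib standing in for the hypothetical algorithm:

import os
import zlib

data = os.urandom(100000)           # incompressible by construction
once = zlib.compress(data, 9)
twice = zlib.compress(once, 9)

# a real compressor cannot shrink its own output indefinitely; on
# random input each pass only adds framing overhead
print(len(data), len(once), len(twice))   # e.g. 100000 100023 100046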
Re: Compression of random binary data
On Tue, 24 Oct 2017 05:20 pm, Gregory Ewing wrote:
> [email protected] wrote:
>> I did that quite a while ago. 352,954 kb.
>
> Are you sure? Does that include the size of all the
> code, lookup tables, etc. needed to decompress it?
>
> But even if you have, you haven't disproved the theorem about
> compressing random data. All you have is a program that
> compresses *that particular* sequence of a million digits.
>
> To disprove the theorem, you would need to exhibit an
> algorithm that can compress *any* sequence of a million
> digits to less than 415,241 bytes.

Indeed -- but let's give Dancerswithnumbers his due. *IF* he is right (a very big "if" indeed) about being able to compress the Rand Corporation "Million Random Digits" in binary form, as given, that alone would be an impressive trick.

Compressing the digits in text form is not impressive in the least. As Ben Bacarisse pointed out, most of us will probably already have half a dozen programs that do that.

--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure enough, things got worse.
--
https://mail.python.org/mailman/listinfo/python-list
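Where the 415,241-byte figure comes from, as a quick back-of-the-envelope check:

import math

# information content of one million uniformly random decimal digits
bits = 1000000 * math.log2(10)
print(bits)       # ~3321928.1 bits
print(bits / 8)   # ~415241.0 bytes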
Re: Compression of random binary data
On 24 October 2017 at 11:23, Ben Bacarisse wrote:
> For example, run the complete works of Shakespeare through your program.
> The result is very much not random data, but that's the sort of data
> people want to compress. If you can compress the output of your
> compressor you have made a good start. Of course what you really want
> to be able to do is to compress the output that results from compressing
> your compressed output. And, of course, you should not stop there. Since
> you can compress *any* data (not just the boring random stuff) you can
> keep going -- compressing the compressed output again and again until
> you end up with a zero-length file.

Oh, and just for fun, if you are able to guarantee compressing arbitrary data, then:

1. Take a document you want to compress.
2. Compress it using your magic algorithm. The result is smaller.
3. Compress the compressed data. The result is still smaller.
4. Repeat until you hit 0 bytes.

Congratulations - apparently you have a reversible algorithm that compresses every data set to an empty file. (Caveat - there's actually "hidden data" here, as you need to know how many compressions it takes to hit 0 bytes. Because you decrease the size every time, though, that number must be no greater than the size of the original file.)

Paul
--
https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
Paul Moore writes:
> On 24 October 2017 at 11:23, Ben Bacarisse wrote:
>> For example, run the complete works of Shakespeare through your program.
>> [...]
>> you end up with a zero-length file.
>
> Oh, and just for fun, if you are able to guarantee compressing
> arbitrary data, then

It's a small point, but you are replying to a post of mine and saying "you". That could make people think that /I/ am claiming to have a perfect compression algorithm.

> 1. Take a document you want to compress.
> 2. Compress it using your magic algorithm. The result is smaller.
> 3. Compress the compressed data. The result is still smaller.
> 4. Repeat until you hit 0 bytes.

Isn't this just repeating what I said? I must not have written it clearly enough.

--
Ben.
--
https://mail.python.org/mailman/listinfo/python-list
Re: choice of web-framework
On Tue, Oct 24, 2017 at 6:57 AM, Chris Warrick wrote:
> On 23 October 2017 at 21:37, John Black wrote:
>> Chris, thanks for all this detailed information. I am confused though
>> with your database recommendation. You say you teach SQLAlchemy but
>> generally use PostgreSQL yourself. I can maybe guess why there seems to
>> be this contradiction. Perhaps PostgreSQL is better but too advanced for
>> the class you are teaching? Can you clarify on which you think is the
>> better choice? Thanks.
>
> Different Chris, but I’ll answer. Those are two very different things.
>
> PostgreSQL is a database server. It talks SQL to clients, stores data,
> retrieves it when asked. The usual stuff a database server does.
> Alternatives: SQLite, MySQL, MS SQL, Oracle DB, …
>
> SQLAlchemy is an ORM: an object-relational mapper, and also a database
> toolkit. SQLAlchemy can abstract multiple database servers/engines
> (PostgreSQL, SQLite, MySQL, etc.) and work with them from the same
> codebase. It can also hide SQL from you and instead give you Python
> classes. If you use an ORM like SQLAlchemy, you get database support
> without writing a single line of SQL on your own. But you still need a
> database engine — PostgreSQL can be one of them. But you can deploy
> the same code to different DB engines, and it will just work™
> (assuming you didn’t use any DB-specific features). Alternatives:
> Django ORM.
>
> psycopg2 is an example of a PostgreSQL client library for Python. It
> implements the Python DB-API and lets you use it to talk to a
> PostgreSQL server. When using psycopg2, you’re responsible for writing
> your own SQL statements for the server to execute. In that approach,
> you’re stuck with PostgreSQL and psycopg2 unless you rewrite your code
> to be compatible with the other database/library. Alternatives (other
> DBs): sqlite3, mysqlclient. There are also other PostgreSQL libraries
> available.
>
Thanks, namesake :)
The above is correct and mostly accurate. It IS possible to switch out
your back end fairly easily, though, even with psycopg2; there's a
standard API that most Python database packages follow. As long as you
stick to standard SQL (no PostgreSQL extensions) and the standard API
(no psycopg2 extensions), switching databases is as simple as changing
your "import psycopg2" into "import cx_oracle" or something. (And,
most likely, changing your database credentials.)
The point of an ORM is to make your databasing code look and feel like
Python code, rather than manually crafting SQL statements everywhere.
Here's how a simple database operation looks in SQLAlchemy:
def spammify(id):
    person = session.query(Person).get(id)
    person.style = "spam"
    session.commit()
Here's the equivalent using psycopg2:
def spammify(id):
    with db, db.cursor() as cur:
        # psycopg2 expects the query parameters as a sequence
        cur.execute("update people set style='spam' where id=%s", (id,))
With SQLAlchemy, you ask for a particular record, and you get back an
object. That object has attributes for all the person's information,
and you can both read and write those attributes. Then you commit when
you're done. Without SQLAlchemy, you use another language (SQL),
embedded within your Python code.
The choice is mostly one of style and preference. But if you don't
currently have a preference, I would recommend using an ORM.
(There are other ORMs than SQLAlchemy, of course; I can't recall the
exact syntax for Django's off the top of my head, but it's going to be
broadly similar to this.)
ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On 24 October 2017 at 12:04, Ben Bacarisse wrote:
> Paul Moore writes:
>> On 24 October 2017 at 11:23, Ben Bacarisse wrote:
>>> For example, run the complete works of Shakespeare through your program.
>>> [...]
>>> you end up with a zero-length file.
>>
>> Oh, and just for fun, if you are able to guarantee compressing
>> arbitrary data, then
>
> It's a small point, but you are replying to a post of mine and saying
> "you". That could make people think that /I/ am claiming to have a perfect
> compression algorithm.

Sorry. I intended the meaning "If one is able to..." but I was unclear. My bad.

>> 1. Take a document you want to compress.
>> 2. Compress it using your magic algorithm. The result is smaller.
>> 3. Compress the compressed data. The result is still smaller.
>> 4. Repeat until you hit 0 bytes.
>
> Isn't this just repeating what I said? I must not have written it
> clearly enough.

More accurately, I didn't read it carefully enough. Again sorry. However, I guess it serves as an example of a compression algorithm - we can trivially compress the content of our two posts into a single post with just as much information content, by deleting my post :-)

Paul
--
https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On Tue, 24 Oct 2017 09:23 pm, Ben Bacarisse wrote:
> Forget random data. For one thing it's hard to define,

That bit is true.

> but more importantly no one cares about it.

But that's wrong. For instance:

- Encrypted data looks very much like random noise. With more and more data traversing the internet in encrypted form, the ability to compress random noise would be worth billions.

- Compressed data looks somewhat like random noise (with a bit of structure). The more it is compressed, the more random it looks. If you could compress random noise, you could take already compressed data, and compress it again, saving even more space.

- Many multimedia formats (images, sound, video) are compressed using dedicated encoders. The better the encoder, and the more it compresses the data (whether lossy or not), the harder it is to compress it further. If you could compress random noise, you could compress JPGs, MP3s, h265-encoded MKVs, etc, saving even more storage and transmission costs.

And most importantly:

- Random data is a superset of the arbitrary structured data you mention below. If we could compress random data, then we could compress any data at all, no matter how much or little structure it contained.

This is why the ability to compress random data (if it were possible, which it is not) is interesting. It's not because people want to be able to compress last night's lottery numbers, or tables of random digits.

> By its very nature, random data is
> not interesting. What people want is a reversible compression algorithm
> that works on *arbitrary data* -- i.e. on *any* file at all, no matter
> how structured and *non-random* it is.

In a sense you are right. Compressing randomly generated data would be a parlour trick and not specifically very useful. But if you had such an algorithm, that would change the face of the world. It would be as revolutionary and paradigm breaking as a perpetual motion machine, or discovery of a new continent the size of China in the middle of the Atlantic, or that π actually does equal 22/7 exactly. And just as impossible.

> For example, run the complete works of Shakespeare through your program.
> [...]
> you end up with a zero-length file.

Indeed. That proof by contradiction is yet another reason we know we can't compress random data -- that is to say, *arbitrary* data. If we had a compression program which could guarantee to ALWAYS shrink ANY file by at least one bit, then we could just apply it over and over again, shrinking the compressed file again and again, until we were left with a zero-byte file:

original.dat = 110 MB
original.dat.zip.zip.zip.zip.zip.zip.zip = 0 MB

And then reverse the process, to turn an empty file back into the original. But given an empty file, how do you distinguish the empty file you get from 'music.mp3' and the identical empty file you get from 'movie.avi'? Obviously you cannot. So we know that the only way to *guarantee* to shrink every possible file is if the compression is lossy.

> Then you publish in a major journal. Post the link to the journal
> article when you are done.

These days there are plenty of predatory journals which will be happy to take Dancerswithnumber's money in return for publishing it in a junk journal.

https://en.wikipedia.org/wiki/Predatory_open_access_publishing

--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure enough, things got worse.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On Tue, 24 Oct 2017 06:46 pm, [email protected] wrote:
> Greg, you're very smart, but you are missing a big key. I'm not padding,
> you are still thinking inside the box, and will never solve this by doing
> so. Yes! At least you see my accomplishment, this will compress any random
> file.

Talk is cheap.

--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure enough, things got worse.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Installing tkinter on FreeBSD
On 2017-10-23, Thomas Jollans wrote:
> On 24/10/17 00:16, Dick Holmes wrote:
>> I am trying to use tkinter on a FreeBSD system but the installed
>> versions of Python (2.7 and 3.6) don't have tkinter configured. I tried
>> to download the source (no binaries available for FreeBSD).

What version of FreeBSD is that? On 11.1 I get:

$ pkg search tkinter
py27-tkinter-2.7.14_6          Python bindings to the Tk widget set (Python 2.7)
py34-tkinter-3.4.7_6           Python bindings to the Tk widget set (Python 3.4)
py35-tkinter-3.5.4_6           Python bindings to the Tk widget set (Python 3.5)
py36-tkinter-3.6.2_6           Python bindings to the Tk widget set (Python 3.6)
pypy-tkinter-5.8.0             PyPy bindings to the Tk widget set

and for sure installing py36-tkinter-3.6.2_6 works fine.

Stephan
--
https://mail.python.org/mailman/listinfo/python-list
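Once the package is installed, a trivial check that the binding is actually usable:

# confirm the Tk bindings import and report their version
import tkinter
print(tkinter.TkVersion)   # e.g. 8.6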
Re: Compression of random binary data
Steve D'Aprano writes:
> On Tue, 24 Oct 2017 09:23 pm, Ben Bacarisse wrote:
>
>> Forget random data. For one thing it's hard to define,
>
> That bit is true.
>
>> but more importantly no one cares about it.
>
> But that's wrong.

All generalisations are false. I was being hyperbolic.

> For instance:
>
> - Encrypted data looks very much like random noise. With more and more data
> traversing the internet in encrypted form, the ability to compress random
> noise would be worth billions.
>
> - Compressed data looks somewhat like random noise (with a bit of structure).
> The more it is compressed, the more random it looks. If you could compress
> random noise, you could take already compressed data, and compress it again,
> saving even more space.
>
> - Many multimedia formats (images, sound, video) are compressed using
> dedicated encoders. The better the encoder, the more it compresses the data
> (whether lossy or not) the harder it is to compress it further. If you could
> compress random noise, you could compress JPGs, MP3s, h265-encoded MKVs,
> etc, saving even more storage and transmission costs.

But these are not random data. We care about these because they are highly structured, non-random data.

> And most importantly:
>
> - Random data is a superset of the arbitrary structured data you mention
> below. If we could compress random data, then we could compress any data
> at all, no matter how much or little structure it contained.

Yes, that's part of my point. Arbitrary data includes random data, but it avoids arguments about what random means.

> This is why the ability to compress random data (if it were possible, which
> it is not) is interesting. It's not because people want to be able to
> compress last night's lottery numbers, or tables of random digits.

The trouble is a pedagogic one. Saying "you can't compress random data" inevitably leads (though, again, this is just my experience) to endless attempts to define random data. My preferred way out of that is to talk about algorithmic complexity, but for your average "I've got a perfect compression algorithm" poster, that is a step too far. I think "arbitrary data" (thereby including the results of compression by said algorithm) is the best way to make progress.

>> Then you publish in a major journal. Post the link to the journal
>> article when you are done.
>
> These days there are plenty of predatory journals which will be happy to
> take Dancerswithnumber's money in return for publishing it in a junk
> journal.

Sure, but you usually get a huge advantage -- a text to criticise. Your average Usenet crank will keep changing what they say to avoid being pinned down. Plus you get to note the fact that the journal is junk.

--
Ben.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
Steve D'Aprano writes:
> On Tue, 24 Oct 2017 06:46 pm, [email protected] wrote:
>
>> Greg, you're very smart, but you are missing a big key. I'm not padding,
>> you are still thinking inside the box, and will never solve this by doing
>> so. Yes! At least you see my accomplishment, this will compress any random
>> file.
>
> Talk is cheap.

But highly prized. Most Usenet cranks only want to be talked to (they see it as being taken seriously, no matter how rude the respondents are), so for the cost of something cheap (a little babbling) they get an endless stream of highly prized attention.

--
Ben.
--
https://mail.python.org/mailman/listinfo/python-list
Re: choice of web-framework
On Tue, Oct 24, 2017 at 4:14 AM, Chris Angelico wrote:
>
> (There are other ORMs than SQLAlchemy, of course; I can't recall the
> exact syntax for Django's off the top of my head, but it's going to be
> broadly similar to this.)
>
> ChrisA
> --
> https://mail.python.org/mailman/listinfo/python-list
>
I can help with that:
## Defining a model:
class Thing(models.Model):
    """
    This is the "schema" for the `thing` table. The pk field is created
    automatically and is called `id` by default. This table will have
    four columns: `id`, `foo`, `baz`, and `score`.
    """
    foo = models.CharField(
        max_length=140,
        blank=False
    )
    baz = models.CharField(
        max_length=140,
        blank=True
    )
    score = models.IntegerField()
## Create an object:
new_thing = Thing.objects.create(foo="bar", baz="foo")
## Get a list of objects:
Thing.objects.all()
## Filter a list of objects:
Thing.objects.filter(foo="bar")
## Modify an object:
thing = Thing.objects.get(id=1)
thing.foo = "baz"
thing.save()
## Perform an aggregation:
from django.db.models import Avg

data = Thing.objects.aggregate(avg=Avg("score"))
print(data)
# {"avg": 50}
## Django basic view (called a controller in most other frameworks) and template:
def person_list(request):
    """
    Get a collection of `User` objects from the database.
    """
    people = User.objects.filter(is_active=True).order_by("date_joined")
    return render(
        request,
        "person/list.html",
        context={"people": people}
    )
Then, in `templates/person/list.html`:
{% extends 'base.html' %}

{% block content %}
  {% for person in people %}
    {{ person.first_name }} {{ person.last_name }}
  {% endfor %}
{% endblock %}
Alternatives to Django's ORM and SQLAlchemy include but are not limited to:
- Peewee: https://github.com/coleifer/peewee
- PonyORM: https://ponyorm.com/
--
https://mail.python.org/mailman/listinfo/python-list
Re: choice of web-framework
In article , [email protected] says...
> On Tue, Oct 24, 2017 at 6:57 AM, Chris Warrick wrote:
> > On 23 October 2017 at 21:37, John Black wrote:
> >> Chris, thanks for all this detailed information. I am confused though
> >> with your database recommendation. [...]
> >
> > Different Chris, but I'll answer. Those are two very different things.
> >
> > PostgreSQL is a database server. It talks SQL to clients, stores data,
> > retrieves it when asked. [...]
> >
> > SQLAlchemy is an ORM: an object-relational mapper, and also a database
> > toolkit. [...]
> >
> > psycopg2 is an example of a PostgreSQL client library for Python. [...]
>
> Thanks, namesake :)
>
> The above is correct and mostly accurate. [...]
>
> The point of an ORM is to make your databasing code look and feel like
> Python code, rather than manually crafting SQL statements everywhere.
> Here's how a simple database operation looks in SQLAlchemy:

Thank you Chris and Chris!

John Black
--
https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
Steve D'Aprano writes:
> But given an empty file, how do you distinguish the empty file you get
> from 'music.mp3' and the identical empty file you get from 'movie.avi'?

That's simple enough: of course one empty file would be "music.mp3.zip.zip.zip", while the other would be "movie.avi.zip.zip.zip.zip.zip"... some sort of https://en.wikipedia.org/wiki/Water_memory applied to file system entries :-)

ciao, lele.
--
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
[email protected] | -- Fortunato Depero, 1929.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On 24/10/2017 16:40, Lele Gaifax wrote:
> Steve D'Aprano writes:
>> But given an empty file, how do you distinguish the empty file you get
>> from 'music.mp3' and the identical empty file you get from 'movie.avi'?
>
> That's simple enough: of course one empty file would be
> "music.mp3.zip.zip.zip", while the other would be

I swear this looks like the lyrics of something or another... "Music MP3 - zip - zip - zip"

TJG
--
https://mail.python.org/mailman/listinfo/python-list
Re: right list for SIGABRT python binary question ?
On 22.10.2017 22:15, Karsten Hilbert wrote:
> On Sat, Oct 21, 2017 at 07:10:31PM +0200, M.-A. Lemburg wrote:
>
>>> Running a debug build of py27 gave me a first lead: this
>>> Debian system (Testing, upgraded all the way from various
>>> releases ago) carries an incompatible mxDateTime which I'll
>>> take care of.
>>>
>>> *** You don't have the (right) mxDateTime binaries installed !
>>> Traceback (most recent call last):
>>>   File "./bootstrap_gm_db_system.py", line 87, in <module>
>>>     from Gnumed.pycommon import gmCfg2, gmPsql, gmPG2, gmTools, gmI18N
>>>   File "/home/ncq/Projekte/gm-git/gnumed/gnumed/Gnumed/pycommon/gmPG2.py", line 34, in <module>
>>>     from Gnumed.pycommon import gmDateTime
>>>   File "/home/ncq/Projekte/gm-git/gnumed/gnumed/Gnumed/pycommon/gmDateTime.py", line 52, in <module>
>>>     import mx.DateTime as mxDT
>>>   File "/usr/lib/python2.7/dist-packages/mx/DateTime/__init__.py", line 8, in <module>
>>>     from DateTime import *
>>>   File "/usr/lib/python2.7/dist-packages/mx/DateTime/DateTime.py", line 9, in <module>
>>>     from mxDateTime import *
>>>   File "/usr/lib/python2.7/dist-packages/mx/DateTime/mxDateTime/__init__.py", line 13, in <module>
>>>     raise ImportError, why
>>> ImportError: /usr/lib/python2.7/dist-packages/mx/DateTime/mxDateTime/mxDateTime.so: undefined symbol: Py_InitModule4
>>
>> This error suggests that you have 32- and 64-bit versions of
>> Python and mxDateTime mixed in your installation.
>>
>> Py_InitModule4 is only available in the 32-bit build of
>> Python. With the 64-bit build, it's called Py_InitModule4_64.
>>
>> Since you're getting the same error from faulthandler,
>> this is where I'd start to investigate.
>>
>> "nm" will list all exported and required symbols. As a first step,
>> you should probably check the python binary for its symbols and
>> see whether it exports Py_InitModule* symbols.
>
> Thanks for your input !
>
> The python2.7-dbg build is 32 bits:
>
> root@hermes:~# nm /usr/bin/python2.7-dbg | grep Py_InitM
> 00155b9f T Py_InitModule4TraceRefs
>
> python2.7-dbg:
>   Installiert: 2.7.14-2
>   Installationskandidat: 2.7.14-2
>   Versionstabelle:
>  *** 2.7.14-2 500
>         500 http://httpredir.debian.org/debian unstable/main i386 Packages
>         100 /var/lib/dpkg/status
>      2.7.13-2 990
>         500 http://httpredir.debian.org/debian stretch/main i386 Packages
>         990 http://httpredir.debian.org/debian buster/main i386 Packages
>
> The python2.7 build (no -dbg) does not have symbols.
>
> mxDateTime really should be 32 bits, too:
>
> python-egenix-mxdatetime:
>   Installiert: 3.2.9-1
>   Installationskandidat: 3.2.9-1
>   Versionstabelle:
>  *** 3.2.9-1 990
>         500 http://httpredir.debian.org/debian stretch/main i386 Packages
>         990 http://httpredir.debian.org/debian buster/main i386 Packages
>         500 http://httpredir.debian.org/debian unstable/main i386 Packages
>         100 /var/lib/dpkg/status
>
> Let me check the .so file:
>
> root@hermes:~# nm /usr/lib/python2.7/dist-packages/mx/DateTime/mxDateTime/mxDateTime_d.so | grep Py_InitM
>          U Py_InitModule4TraceRefs
>
> It seems it is - hm ...

Could you check whether you have similar import errors with other modules that have C extensions? E.g. lxml.

What you're seeing appears to be a compilation problem with Python 2.7.14 on Debian. The executable doesn't appear to export its symbols to the .so files, or only some of them.

--
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts
>>> Python Projects, Coaching and Consulting ... http://www.egenix.com/
>>> Python Database Interfaces ... http://products.egenix.com/
>>> Plone/Zope Database Interfaces ... http://zope.egenix.com/

: Try our mxODBC.Connect Python Database Interface for free ! ::

eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
Registered at Amtsgericht Duesseldorf: HRB 46611
http://www.egenix.com/company/contact/
--
https://mail.python.org/mailman/listinfo/python-list
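As an aside, the bitness of a Python build can also be checked from inside the interpreter itself, complementing the nm approach:

import struct
import sys

# pointer size in bits: 32 on a 32-bit build, 64 on a 64-bit build
print(struct.calcsize("P") * 8)
print(sys.maxsize > 2**32)   # True only on 64-bit builds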
Re: Compression of random binary data
On 2017-10-23 04:21, Steve D'Aprano wrote:
> On Mon, 23 Oct 2017 02:29 pm, Stefan Ram wrote:
>
> If the probability of certain codes (either single codes, or sequences of
> codes) are non-equal, then you can take advantage of that by encoding the
> common cases into a short representation, and the uncommon and rare cases
> into a longer representation. As you say:
>
>> Otherwise, if ( 0, 0 ) is much more frequent,
>> we can encode ( 0, 0 ) by "0" and
>>
>> ( 0, 1 ) by "101",
>> ( 1, 0 ) by "110", and
>> ( 1, 1 ) by "111".
>>
>> And we could then use /less/ than two bits on the
>> average.
>
> That's incorrect. On average you use 2.5 bits.
>
> (1*1 bit + 3*3 bits divide by four possible outcomes, makes 2.5 bits.)

I disagree. If the distribution is not equal, then the average needs to take the different probabilities into account. Let's assume that (0, 0) has a probability of 75 %, (0, 1) a probability of 15 % and (1, 0) and (1, 1) a probability of 5 % each. Then the average length is

0.75 * 1 bit + 0.15 * 3 bits + 0.05 * 3 bits + 0.05 * 3 bits = 1.5 bits.

hp

--
_  | Peter J. Holzer| Fluch der elektronischen Textverarbeitung:
|_|_) || Man feilt solange an seinen Text um, bis
| | | [email protected] | die Satzbestandteile des Satzes nicht mehr
__/ | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel
--
https://mail.python.org/mailman/listinfo/python-list
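Checking that arithmetic, and comparing it against the entropy lower bound, as a minimal sketch:

import math

probs   = {'00': 0.75, '01': 0.15, '10': 0.05, '11': 0.05}
lengths = {'00': 1,    '01': 3,    '10': 3,    '11': 3}

# weighted average code length
avg = sum(p * lengths[s] for s, p in probs.items())
print(avg)   # 1.5 bits per pair, below the 2 bits of the plain encoding

# Shannon entropy: no lossless code for this source can average less
H = -sum(p * math.log2(p) for p in probs.values())
print(H)     # ~1.15 bits per pair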
Re: Compression of random binary data
On Tue, 24 Oct 2017 14:51:37 +1100, Steve D'Aprano wrote:
> On Tue, 24 Oct 2017 01:27 pm, [email protected] wrote:
>> Yes! Decode reverse is easy..sorry so excited i could shout.
>
> Then this should be easy for you:
>
> http://marknelson.us/2012/10/09/the-random-compression-challenge-turns-ten/
>
> All you need to do is compress this file:
>
> http://marknelson.us/attachments/million-digit-challenge/AMillionRandomDigits.bin
>
> to less than 415241 bytes, and you can win $100.

Then, on Mon, 23 Oct 2017 21:13:00 -0700 (PDT), danceswithnumbers wrote:
> I did that quite a while ago.

But 352,954 kb > 415241 bytes, by several orders of magnitude; so you didn't "do that". (Or are we using the European decimal point?)

If you're claiming 352,954 *bytes*, not kb, I invite you to explain why you have not collected Mark Nelson's $100 prize, and untold fame and glory; failing which, your credibility will evaporate.

--
To email me, substitute nowhere->runbox, invalid->com.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On Tue, Oct 24, 2017 at 12:20 AM, Gregory Ewing wrote:
> [email protected] wrote:
>> I did that quite a while ago. 352,954 kb.
>
> Are you sure? Does that include the size of all the
> code, lookup tables, etc. needed to decompress it?

My bet is that danceswithnumbers does indeed have a file of that size which is in some way derived from the million random digits, but without any means of losslessly "decompressing" it (thus making it junk data).
--
https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On Wed, 25 Oct 2017 02:40 am, Lele Gaifax wrote:
> Steve D'Aprano writes:
>
>> But given an empty file, how do you distinguish the empty file you get
>> from 'music.mp3' and the identical empty file you get from 'movie.avi'?
>
> That's simple enough: of course one empty file would be
> "music.mp3.zip.zip.zip", while the other would be
> "movie.avi.zip.zip.zip.zip.zip"... some sort of
> https://en.wikipedia.org/wiki/Water_memory applied to file system entries
> :-)

Does that mean if I name an empty file

serenity2-by-joss-whedon.avi.zip.zip.zip.zip.zip

Dancerswithnumbers' magic algorithm will recreate the movie from some alternative universe where it actually exists?

Awesome.

--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure enough, things got worse.
--
https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On Wed, 25 Oct 2017 07:09 am, Peter J. Holzer wrote:
> On 2017-10-23 04:21, Steve D'Aprano wrote:
>> On Mon, 23 Oct 2017 02:29 pm, Stefan Ram wrote:
>>
>> If the probability of certain codes (either single codes, or sequences of
>> codes) are non-equal, then you can take advantage of that by encoding the
>> common cases into a short representation, and the uncommon and rare cases
>> into a longer representation. As you say:
>>
>>> Otherwise, if ( 0, 0 ) is much more frequent,
>>> we can encode ( 0, 0 ) by "0" and
>>>
>>> ( 0, 1 ) by "101",
>>> ( 1, 0 ) by "110", and
>>> ( 1, 1 ) by "111".
>>>
>>> And we could then use /less/ than two bits on the
>>> average.
>>
>> That's incorrect. On average you use 2.5 bits.
>>
>> (1*1 bit + 3*3 bits divide by four possible outcomes, makes 2.5 bits.)
>
> I disagree. If the distribution is not equal, then the average needs to
> take the different probabilities into account.

I think I would call that the *weighted* average rather than the average. Regardless of what we call it, of course both you and Stefan are right in how to calculate it, and such a variable-length scheme can be used to compress the data.

E.g. given the sequence 00000011, which would take 8 bits in the obvious encoding, we can encode it as "000111" which takes only 6 bits. But the cost of this encoding scheme is that *some* bit sequences expand, e.g. the 8 bit sequence 00011011 is encoded as "0101110111" which requires 10 bits.

The end result is that averaged over all possible bit sequences (of a certain size), this encoding scheme requires MORE space than the obvious 0/1 bits. But in practice we don't care much, because the data sequences we care about are usually not "all possible bit sequences", but a heavily restricted subset where there are lots of 00 pairs and fewer 01, 10, and 11 pairs.

--
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure enough, things got worse.
--
https://mail.python.org/mailman/listinfo/python-list
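The scheme is mechanical enough to test; a minimal sketch of the pair encoder:

CODE = {'00': '0', '01': '101', '10': '110', '11': '111'}

def encode(bits):
    # consume the input two bits at a time
    return ''.join(CODE[bits[i:i+2]] for i in range(0, len(bits), 2))

print(encode('00000011'))   # '000111' -- 6 bits, shorter than 8
print(encode('00011011'))   # '0101110111' -- 10 bits, longer than 8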
Re: Compression of random binary data
On Wed, Oct 25, 2017 at 9:11 AM, Steve D'Aprano wrote:
> On Wed, 25 Oct 2017 02:40 am, Lele Gaifax wrote:
> [...]
> Does that mean if I name an empty file
>
> serenity2-by-joss-whedon.avi.zip.zip.zip.zip.zip
>
> Dancerswithnumbers' magic algorithm will recreate the movie from some
> alternative universe where it actually exists?
>
> Awesome.

Yes, but then you'd get

dmca-takedown-request.pdf.zip.zip.zip.zip.zip.zip.zip

which would also be empty.

ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
Re: Compression of random binary data
On 10/24/17 6:30 PM, Steve D'Aprano wrote:
> On Wed, 25 Oct 2017 07:09 am, Peter J. Holzer wrote:
> [...]
> The end result is that averaged over all possible bit sequences (of a
> certain size), this encoding scheme requires MORE space than the obvious
> 0/1 bits. But in practice we don't care much, because the data sequences
> we care about are usually not "all possible bit sequences", but a heavily
> restricted subset where there are lots of 00 pairs and fewer 01, 10, and
> 11 pairs.

My understanding of the 'Random Data Compressibility' challenge is that it requires that the compression take ANY/ALL strings of up to N bits, and generate an output stream no longer than the input stream, and sometimes less. It admits that given non-uniformly distributed data, it is possible to compress some patterns (the common ones) and expand others (the uncommon ones) to lower the net average length. What it says can't be done is to have a compression method that compresses EVERY input pattern. That is where the 'Pigeon Hole' principle comes into play, which the people who claim to be able to compress random data like to ignore or just attempt to say doesn't apply.
--
https://mail.python.org/mailman/listinfo/python-list
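The pigeonhole argument can be spelled out by brute force for small N; a toy sketch:

from itertools import product

n = 4
inputs = [''.join(bits) for bits in product('01', repeat=n)]

# every binary string strictly shorter than n bits (including the
# empty string): 1 + 2 + 4 + 8 = 15 of them
shorter = [''.join(bits) for k in range(n) for bits in product('01', repeat=k)]

# 16 inputs but only 15 candidate outputs: any scheme that shrinks
# every 4-bit string must send two inputs to the same output, so it
# cannot be losslessly reversed
print(len(inputs), len(shorter))   # 16 15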
h5py.File() gives error message
Dear list,
The following Python code gives an error message
# Python code starts here:
import numpy as np
import h5py
train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
# Python code ends
The error message:
train_dataset = h5py.File('train_catvnoncat.h5', "r")
Traceback (most recent call last):
File "", line 1, in
File "/Users/M/anaconda/lib/python3.6/site-packages/h5py/_hl/files.py",
line 269, in __init__
fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
File "/Users/M/anaconda/lib/python3.6/site-packages/h5py/_hl/files.py",
line 99, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name =
'train_catvnoncat.h5', errno = 2, error message = 'No such file or
directory', flags = 0, o_flags = 0)
My directory is correct, and the dataset folder with file is there.
Why error message? Is it h5py.File() or is it my file? Everything seems
pretty simple, what's going on?
Thank you!
--
https://mail.python.org/mailman/listinfo/python-list
Re: h5py.File() gives error message
On 10/24/2017 10:58 AM, C W wrote:
Dear list,
The following Python code gives an error message
# Python code starts here:
import numpy as np
import h5py
train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")
# Python code ends
The error message:
train_dataset = h5py.File('train_catvnoncat.h5', "r")
Traceback (most recent call last):
File "", line 1, in
File "/Users/M/anaconda/lib/python3.6/site-packages/h5py/_hl/files.py",
line 269, in __init__
fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
File "/Users/M/anaconda/lib/python3.6/site-packages/h5py/_hl/files.py",
line 99, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name =
'train_catvnoncat.h5', errno = 2, error message = 'No such file or
directory', flags = 0, o_flags = 0)
My directory is correct, and the dataset folder with file is there.
Why error message? Is it h5py.File() or is it my file? Everything seems
pretty simple, what's going on?
Thank you!
Be 100% sure your directory is correct. Try it again with an absolute
path to the file. Windows makes it far too easy for the working
directory of a program to be other than what you think it is.
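A quick way to act on that advice (the paths mirror the original post and are illustrative):

import os

# see where the interpreter is actually running, and whether the
# relative path resolves from there
print(os.getcwd())
print(os.path.exists('datasets/train_catvnoncat.h5'))

# build an absolute path anchored to the script's own location
# (__file__ is only defined when run as a script, not in the REPL)
base = os.path.dirname(os.path.abspath(__file__))
path = os.path.join(base, 'datasets', 'train_catvnoncat.h5')

import h5py
train_dataset = h5py.File(path, 'r')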
--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order. See above to fix.
--
https://mail.python.org/mailman/listinfo/python-list
Objects with __name__ attribute
Hi,

I know two Python objects which have an intrinsic name: classes and functions.

>>> def f(): pass
...
>>> f.__name__
'f'
>>> g = f
>>> g.__name__
'f'

>>> class Test: pass
...
>>> Test.__name__
'Test'
>>> Test2 = Test
>>> Test2.__name__
'Test'

Are there other objects with a __name__ attribute, and what is it used for?

Regards
--
https://mail.python.org/mailman/listinfo/python-list
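A few other objects carry one (the attribute shows up in reprs, tracebacks, pickling, and functools.wraps); some quick examples:

import math

print(math.__name__)       # modules: 'math'
print(len.__name__)        # built-in functions: 'len'
print(str.join.__name__)   # methods: 'join'

def gen():
    yield 1

print(gen().__name__)      # generator objects (Python 3.5+): 'gen'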
