Re: Compression of random binary data

2017-10-24 Thread Gregory Ewing

[email protected] wrote:


Compress this:

4135124325

Bin to dec...still very large
0110
0000
1101
01100101


Wait right there! You're cheating by dropping off leading
0 bits. The maximum value of a 10-digit decimal number is
9999999999, which in hex is 2540be3ff. That's 34 bits.
That's in line with the theoretical number of bits needed:

log2(10) * 10 = 33.219

So the binary version of your number above is really

00
  0110
  0000
  1101
  01100101

You may think you can get away without storing or
transmitting those leading 0 bits, because the decoder
can always pad out the data as needed.

But to do that, the decoder needs to know *how many*
bits to pad out to. That information somehow needs to
be part of the encoded data.

You need to imagine you're sending the data to someone
over a wire. The only thing you can send along the wire
are ones and zeroes. You can't convey any information
by timing, or turning off the power, or anything like
that. How is the receiver going to know when he's got
the whole message?

There are only two possibilities. Either you decide
in advance that all messages will be exactly the same
length, which in this case means always sending
exactly 34 bits. Or you add some extra bits to the
message -- prepend the length in binary, or add an
end-of-message code at the end, or something like
that.

Whatever you do, you'll find that *on average* you
will need *at least* 34 bits to be able to represent
all possible 10-digit decimal numbers. Some might
be shorter, but then others will be longer, and
the average won't be less than 34.
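Greg's arithmetic here is easy to check; a minimal Python sketch using the figures from the post above:

```python
import math

# Every 10-digit decimal number lies in 0 .. 9999999999, so a fixed-length
# binary code needs enough bits for the largest value.
largest = 10 ** 10 - 1               # 9999999999 == 0x2540BE3FF
print(largest.bit_length())          # 34
print(round(math.log2(10) * 10, 3))  # 33.219, the information-theoretic bound
```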


New compression method:

11000101
11000111
0100

A full byte less than bin.


You need to be *very* careful about what you're claiming here.
Are you saying that your algorithm compresses *all possible*
sequences of 10 decimal digits to 3 bytes or less? Or can
some of them come out longer?

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread danceswithnumbers
Greg, you're  very smart, but you are missing a big key. I'm not padding, you 
are still thinking inside the box, and will never solve this by doing so. Yes! 
At least you see my accomplishment, this will compress any random file.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread Christian Gollwitzer

On 23.10.17 at 12:13, Marko Rauhamaa wrote:

Thomas Jollans :


On 2017-10-23 11:32, [email protected] wrote:

According to this website. This is an uncompressable stream.

https://en.m.wikipedia.org/wiki/Incompressible_string

12344321


No, it's not. According to that article, that string is incompressible
by a particular algorithm. I can see no more general claims.


Here's a compression algorithm that manages to compress that string into
a 0-bit string:

  * If the original string is 12344321 (whatever that means),
return the empty bit string.

  * Otherwise, prepend a don't-care bit to the original string and return
the result of the concatenation.



...and that's why there is the "Kolmogorov complexity". You need to
append the decompression program to the data to show how much you really
saved, which will turn out to be nothing compared to the "trivial
decompressor":


print "12344321"

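Christian's point is easy to quantify: once the decompressor's own length is charged to the scheme, the special-case "compressor" saves nothing. A minimal sketch:

```python
data = "12344321"                    # the "incompressible" string, 8 bytes
decompressor = 'print("12344321")'   # the trivial decompressor from above

# The zero-bit "compressed" output is free, but the program that expands
# it is longer than the data it reproduces.
print(len(data), len(decompressor))  # 8 17
```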
Christian
--
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread danceswithnumbers
No leading zeroes are being dropped off... wish this board had an edit button.
-- 
https://mail.python.org/mailman/listinfo/python-list


[ANN] Nuclio: A scalable, open source, real-time processing platform

2017-10-24 Thread Miki Tebeka
Hi,

Just wanted to share a project I'm working on. It's a super fast serverless platform that 
supports Python handlers as well.

Check out more at https://www.iguazio.com/nuclio-new-serverless-superhero/
Code at https://github.com/nuclio/nuclio/

Happy hacking,
--
Miki
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread Gregory Ewing

[email protected] wrote:


My 8 year old can decode this back into base 10,


Keep in mind that your 8 year old has more information
than just the 32 bits you wrote down -- he can also
see that there *are* 32 bits and no more. That's
hidden information that you're not counting.

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread Gregory Ewing

Paul Moore wrote:

But that's not "compression", that's simply using a better encoding.
In the technical sense, "compression" is about looking at redundancies
that go beyond the case of how effectively you pack data into the
bytes available.


There may be a difference in the way the terms are used, but
I don't think there's any fundamental difference. Compression
is about finding clever ways to make the encoding better.

Either way, the information-theoretic limits on the number
of bits needed are the same.

--
Greg
--
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread Paul Moore
On 24 October 2017 at 09:43, Gregory Ewing  wrote:
> Paul Moore wrote:
>>
>> But that's not "compression", that's simply using a better encoding.
>> In the technical sense, "compression" is about looking at redundancies
>> that go beyond the case of how effectively you pack data into the
>> bytes available.
>
>
> There may be a difference in the way the terms are used, but
> I don't think there's any fundamental difference. Compression
> is about finding clever ways to make the encoding better.

Agreed - I was trying (probably futilely, given the way this thread
has gone...) to make a distinction between purely local properties
that are typically considered in "how you encode the data" and the
detection of more global patterns, which is where what are typically
referred to as "compression" algorithms get their power. But sadly, I
don't think the OP is actually interested in understanding the
background, so the distinction wasn't really worth making :-(

> Either way, the information-theoretic limits on the number
> of bits needed are the same.

Precisely.
Paul
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread Ben Bacarisse
[email protected] writes:

> Finally figured out how to turn this into a random binary compression
> program. Since my transform can compress more than dec to binary. Then
> i took a random binary stream,

Forget random data.  For one thing it's hard to define, but more
importantly no one cares about it.  By its very nature, random data is
not interesting.  What people want is a reversible compression algorithm
that works on *arbitrary data* -- i.e. on *any* file at all, no matter
how structured and *non-random* it is.

For example, run the complete works of Shakespeare through your program.
The result is very much not random data, but that's the sort of data
people want to compress.  If you can compress the output of your
compressor you have made a good start.  Of course what you really want
to be able to do is to compress the output that results from compressing
your compressed output.  And, of course, you should not stop there.  Since
you can compress *any* data (not just the boring random stuff) you can
keep going -- compressing the compressed output again and again until
you end up with a zero-length file.

Then you publish in a major journal.  Post the link to the journal
article when you are done.


-- 
Ben.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread Steve D'Aprano
On Tue, 24 Oct 2017 05:20 pm, Gregory Ewing wrote:

> [email protected] wrote:
>> I did that quite a while ago. 352,954 kb.
> 
> Are you sure? Does that include the size of all the
> code, lookup tables, etc. needed to decompress it?
> 
> But even if you have, you haven't disproved the theorem about
> compressing random data. All you have is a program that
> compresses *that particular* sequence of a million digits.
> 
> To disprove the theorem, you would need to exhibit an
> algorithm that can compress *any* sequence of a million
> digits to less than 415,241 bytes.

Indeed -- but let's give Dancerswithnumbers his due. *IF* he is right (a very
big "if" indeed) about being able to compress the Rand Corporation "Million
Random Digits" in binary form, as given, that alone would be an impressive
trick.

Compressing the digits in text form is not impressive in the least. As Ben
Bacarisse pointed out, most of us will probably already have half a dozen
programs that do that.
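A minimal Python sketch of that kind of text-form packing, using the 10-digit example from upthread (note the fixed 10-digit width must be agreed in advance, which is exactly the hidden length information discussed earlier):

```python
digits = "4135124325"   # 10 decimal digits: 10 bytes as ASCII text

# Pack the value into ceil(34 / 8) = 5 binary bytes...
packed = int(digits).to_bytes(5, "big")
print(len(digits), "->", len(packed))   # 10 -> 5

# ...and reverse it, zero-padding back to the agreed 10-digit width.
restored = str(int.from_bytes(packed, "big")).zfill(10)
assert restored == digits
```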

 

-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread Paul Moore
On 24 October 2017 at 11:23, Ben Bacarisse  wrote:
> For example, run the complete works of Shakespeare through your program.
> The result is very much not random data, but that's the sort of data
> people want to compress.  If you can compress the output of your
> compressor you have made a good start.  Of course what you really want
> to be able to do is to compress the output that results from compressing
> your compressed output.  And, of course, you should not stop there.  Since
> you can compress *any* data (not just the boring random stuff) you can
> keep going -- compressing the compressed output again and again until
> you end up with a zero-length file.

Oh, and just for fun, if you are able to guarantee compressing
arbitrary data, then

1. Take a document you want to compress.
2. Compress it using your magic algorithm. The result is smaller.
3. Compress the compressed data. The result is still smaller.
4. Repeat until you hit 0 bytes.

Congratulations - apparently you have a reversible algorithm that
compresses every data set to an empty file. (Caveat - there's actually
"hidden data" here, as you need to know how many compressions it takes
to hit 0 bytes. Because you decrease the size every time, though, that
number must be no greater than the size of the original file).
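The counting argument underneath this can be checked mechanically; a minimal sketch:

```python
# Pigeonhole check: for any length n there are 2**n distinct inputs but
# only 2**n - 1 bit strings that are strictly shorter, so any map from
# n-bit inputs to shorter outputs must send two inputs to the same output.
n = 12
inputs = 2 ** n
shorter = sum(2 ** k for k in range(n))  # 2**0 + 2**1 + ... + 2**(n-1)
print(inputs, shorter)                   # 4096 4095
```

One collision is enough: the "decompressor" cannot tell the two colliding inputs apart, so no lossless compressor shrinks every input.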

Paul
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread Ben Bacarisse
Paul Moore  writes:

> On 24 October 2017 at 11:23, Ben Bacarisse  wrote:
>> For example, run the complete works of Shakespeare through your program.
>> The result is very much not random data, but that's the sort of data
>> people want to compress.  If you can compress the output of your
>> compressor you have made a good start.  Of course what you really want
>> to be able to do is to compress the output that results from compressing
>> your compressed output.  And, of course, you should not stop there.  Since
>> you can compress *any* data (not just the boring random stuff) you can
>> keep going -- compressing the compressed output again and again until
>> you end up with a zero-length file.
>
> Oh, and just for fun, if you are able to guarantee compressing
> arbitrary data, then

It's a small point, but you are replying to a post of mine and saying
"you".  That could make people think that /I/ am claiming to have a perfect
compression algorithm.

> 1. Take a document you want to compress.
> 2. Compress it using your magic algorithm. The result is smaller.
> 3. Compress the compressed data. The result is still smaller.
> 4. Repeat until you hit 0 bytes.

Isn't this just repeating what I said?  I must not have written it
clearly enough.


-- 
Ben.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: choice of web-framework

2017-10-24 Thread Chris Angelico
On Tue, Oct 24, 2017 at 6:57 AM, Chris Warrick  wrote:
> On 23 October 2017 at 21:37, John Black  wrote:
>> Chris, thanks for all this detailed information.  I am confused though
>> with your database recommendation.  You say you teach SQLAlchemy but
>> generally use PostgreSQL yourself.  I can maybe guess why there seems to
>> be this contradiction.  Perhaps PostgreSQL is better but too advanced for
>> the class you are teaching?  Can you clarify on which you think is the
>> better choice?  Thanks.
>
> Different Chris, but I’ll answer. Those are two very different things.
>
> PostgreSQL is a database server. It talks SQL to clients, stores data,
> retrieves it when asked. The usual stuff a database server does.
> Alternatives: SQLite, MySQL, MS SQL, Oracle DB, …
>
> SQLAlchemy is an ORM: an object-relational mapper, and also a database
> toolkit. SQLAlchemy can abstract multiple database servers/engines
> (PostgreSQL, SQLite, MySQL, etc.) and work with them from the same
> codebase. It can also hide SQL from you and instead give you Python
> classes. If you use an ORM like SQLAlchemy, you get database support
> without writing a single line of SQL on your own. But you still need a
> database engine — PostgreSQL can be one of them. But you can deploy
> the same code to different DB engines, and it will just work™
> (assuming you didn’t use any DB-specific features). Alternatives:
> Django ORM.
>
> psycopg2 is an example of a PostgreSQL client library for Python. It
> implements the Python DB-API and lets you use it to talk to a
> PostgreSQL server. When using psycopg2, you’re responsible for writing
> your own SQL statements for the server to execute. In that approach,
> you’re stuck with PostgreSQL and psycopg2 unless you rewrite your code
> to be compatible with the other database/library. Alternatives (other
> DBs): sqlite3, mysqlclient. There are also other PostgreSQL libraries
> available.
>

Thanks, namesake :)

The above is correct and mostly accurate. It IS possible to switch out
your back end fairly easily, though, even with psycopg2; there's a
standard API that most Python database packages follow. As long as you
stick to standard SQL (no PostgreSQL extensions) and the standard API
(no psycopg2 extensions), switching databases is as simple as changing
your "import psycopg2" into "import cx_oracle" or something. (And,
most likely, changing your database credentials.)

The point of an ORM is to make your databasing code look and feel like
Python code, rather than manually crafting SQL statements everywhere.
Here's how a simple database operation looks in SQLAlchemy:

def spammify(id):
    person = session.query(Person).get(id)
    person.style = "spam"
    session.commit()

Here's the equivalent using psycopg2:

def spammify(id):
    with db, db.cursor() as cur:
        cur.execute("update people set style='spam' where id=%s", (id,))

With SQLAlchemy, you ask for a particular record, and you get back an
object. That object has attributes for all the person's information,
and you can both read and write those attributes. Then you commit when
you're done. Without SQLAlchemy, you use another language (SQL),
embedded within your Python code.

The choice is mostly one of style and preference. But if you don't
currently have a preference, I would recommend using an ORM.

(There are other ORMs than SQLAlchemy, of course; I can't recall the
exact syntax for Django's off the top of my head, but it's going to be
broadly similar to this.)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread Paul Moore
On 24 October 2017 at 12:04, Ben Bacarisse  wrote:
> Paul Moore  writes:
>
>> On 24 October 2017 at 11:23, Ben Bacarisse  wrote:
>>> For example, run the complete works of Shakespeare through your program.
>>> The result is very much not random data, but that's the sort of data
>>> people want to compress.  If you can compress the output of your
>>> compressor you have made a good start.  Of course what you really want
>>> to be able to do is to compress the output that results from compressing
>>> your compressed output.  And, of course, you should not stop there.  Since
>>> you can compress *any* data (not just the boring random stuff) you can
>>> keep going -- compressing the compressed output again and again until
>>> you end up with a zero-length file.
>>
>> Oh, and just for fun, if you are able to guarantee compressing
>> arbitrary data, then
>
> It's a small point, but you are replying to a post of mine and saying
> "you".  That could make people think that /I/ am claiming to have a perfect
> compression algorithm.

Sorry. I intended the meaning "If one is able to..." but I was unclear. My bad.

>> 1. Take a document you want to compress.
>> 2. Compress it using your magic algorithm. The result is smaller.
>> 3. Compress the compressed data. The result is still smaller.
>> 4. Repeat until you hit 0 bytes.
>
> Isn't this just repeating what I said?  I must not have written it
> clearly enough.

More accurately, I didn't read it carefully enough. Again sorry.

However, I guess it serves as an example of a compression algorithm -
we can trivially compress the content of our two posts into a single
post with just as much information content, by deleting my post :-)

Paul
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread Steve D'Aprano
On Tue, 24 Oct 2017 09:23 pm, Ben Bacarisse wrote:

> Forget random data.  For one thing it's hard to define, 

That bit is true.

> but more importantly no one cares about it.

But that's wrong.

For instance:

- Encrypted data looks very much like random noise. With more and more data
  traversing the internet in encrypted form, the ability to compress random
  noise would be worth billions.

- Compressed data looks somewhat like random noise (with a bit of structure).
  The more it is compressed, the more random it looks. If you could compress
  random noise, you could take already compressed data, and compress it again,
  saving even more space.

- Many multimedia formats (images, sound, video) are compressed using
  dedicated encoders. The better the encoder, the more it compresses the data
  (whether lossy or not) the harder it is to compress it further. If you could
  compress random noise, you could compress JPGs, MP3s, h265-encoded MKVs,
  etc, saving even more storage and transmission costs.

And most importantly:

- Random data is a superset of the arbitrary structured data you mention
  below. If we could compress random data, then we could compress any data
  at all, no matter how much or little structure it contained.

This is why the ability to compress random data (if it were possible, which it
is not) is interesting. It's not because people want to be able to compress
last night's lottery numbers, or tables of random digits.


> By its very nature, random data is 
> not interesting.  What people want is a reversible compression algorithm
> that works on *arbitrary data* -- i.e. on *any* file at all, no matter
> how structured and *non-random* it is.

In a sense you are right. Compressing randomly generated data would be a
parlour trick and not specifically very useful. But if you had such an
algorithm, that would change the face of the world.

It would be as revolutionary and paradigm breaking as a perpetual motion
machine, or discovery of a new continent the size of China in the middle of
the Atlantic, or that π actually does equal 22/7 exactly.

And just as impossible.


> For example, run the complete works of Shakespeare through your program.
> The result is very much not random data, but that's the sort of data
> people want to compress.  If you can compress the output of your
> compressor you have made a good start.  Of course what you really want
> to be able to do is to compress the output that results from compressing
> your compressed output.  And, of course, you should not stop there. Since 
> you can compress *any* data (not just the boring random stuff) you can
> keep going -- compressing the compressed output again and again until
> you end up with a zero-length file.

Indeed.

That proof by contradiction is yet another reason we know we can't compress
random data -- that is to say, *arbitrary* data. If we had a compression
program which could guarantee to ALWAYS shrink ANY file by at least one bit,
then we could just apply it over and over again, shrinking the compressed
file again and again, until we were left with a zero-byte file:

original.dat = 110 MB
original.dat.zip.zip.zip.zip.zip.zip.zip = 0 MB

And then reverse the process, to turn an empty file back into the original.

But given an empty file, how do you distinguish the empty file you get
from 'music.mp3' and the identical empty file you get from 'movie.avi'?

Obviously you cannot. So we know that the only way to *guarantee* to shrink
every possible file is if the compression is lossy.


> Then you publish in a major journal.  Post the link to the journal
> article when you are done.

These days there are plenty of predatory journals which will be happy to take
Dancerswithnumber's money in return for publishing it in a junk journal.

https://en.wikipedia.org/wiki/Predatory_open_access_publishing



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread Steve D'Aprano
On Tue, 24 Oct 2017 06:46 pm, [email protected] wrote:

> Greg, you're  very smart, but you are missing a big key. I'm not padding,
> you are still thinking inside the box, and will never solve this by doing
> so. Yes! At least you see my accomplishment, this will compress any random
> file.

Talk is cheap.



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Installing tkinter on FreeBSD

2017-10-24 Thread Stephan Houben
On 2017-10-23, Thomas Jollans wrote:
> On 24/10/17 00:16, Dick Holmes wrote:
>> I am trying to use tkinter on a FreeBSD system but the installed 
>> versions of Python (2.7 and 3.6) don't have tkinter configured. I tried 
>> to download the source (no binaries available for FreeBSD).

What version of FreeBSD is that? 
On 11.1 I get:

$ pkg search tkinter
py27-tkinter-2.7.14_6  Python bindings to the Tk widget set (Python 2.7)
py34-tkinter-3.4.7_6   Python bindings to the Tk widget set (Python 3.4)
py35-tkinter-3.5.4_6   Python bindings to the Tk widget set (Python 3.5)
py36-tkinter-3.6.2_6   Python bindings to the Tk widget set (Python 3.6)
pypy-tkinter-5.8.0 PyPy bindings to the Tk widget set

and for sure installing py36-tkinter-3.6.2_6 works fine.

Stephan
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread Ben Bacarisse
Steve D'Aprano  writes:

> On Tue, 24 Oct 2017 09:23 pm, Ben Bacarisse wrote:
>
>> Forget random data.  For one thing it's hard to define, 
>
> That bit is true.
>
>> but more importantly no one cares about it.
>
> But that's wrong.

All generalisations are false.  I was being hyperbolic.

> For instance:
>
> - Encrypted data looks very much like random noise. With more and more data
>   traversing the internet in encrypted form, the ability to compress random
>   noise would be worth billions.
>
> - Compressed data looks somewhat like random noise (with a bit of structure).
>   The more it is compressed, the more random it looks. If you could compress
>   random noise, you could take already compressed data, and compress it again,
>   saving even more space.
>
> - Many multimedia formats (images, sound, video) are compressed using
>   dedicated encoders. The better the encoder, the more it compresses the data
>   (whether lossy or not) the harder it is to compress it further. If you could
>   compress random noise, you could compress JPGs, MP3s, h265-encoded MKVs,
>   etc, saving even more storage and transmission costs.

But these are not random data.  We care about these because they are are
highly structured, non-random data.

> And most importantly:
>
> - Random data is a superset of the arbitrary structured data you mention
>   below. If we could compress random data, then we could compress any data
>   at all, no matter how much or little structure it contained.

Yes, that's part of my point.  Arbitrary data includes random data but
it avoids arguments about what random means.

> This is why the ability to compress random data (if it were possible, which it
> is not) is interesting. Its not because people want to be able to compress
> last night's lottery numbers, or tables of random digits.

The trouble is a pedagogic one.  Saying "you can't compress random data"
inevitably leads (though, again, this is just my experience) to endless
attempts to define random data.  My preferred way out of that is to talk
about algorithmic complexity but for your average "I've got a perfect
compression algorithm" poster, that is a step too far.

I think "arbitrary data" (thereby including the results of compression
by said algorithm) is the best way to make progress.


>> Then you publish in a major journal.  Post the link to the journal
>> article when you are done.
>
> These days there are plenty of predatory journals which will be happy to take
> Dancerswithnumber's money in return for publishing it in a junk
> journal.

Sure, but you usually get a huge advantage -- a text to criticise.  Your
average Usenet crank will keep changing what they say to avoid being
pinned down.  Plus you get to note the fact that the journal is junk.

-- 
Ben.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread Ben Bacarisse
Steve D'Aprano  writes:

> On Tue, 24 Oct 2017 06:46 pm, [email protected] wrote:
>
>> Greg, you're  very smart, but you are missing a big key. I'm not padding,
>> you are still thinking inside the box, and will never solve this by doing
>> so. Yes! At least you see my accomplishment, this will compress any random
>> file.
>
> Talk is cheap.

But highly prized.  Most Usenet cranks only want to be talked to (they
see it as being taken seriously, no matter how rude the respondents are)
so for the cost of something cheap (a little babbling) they get an
endless stream of highly prized attention.

-- 
Ben.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: choice of web-framework

2017-10-24 Thread justin walters
On Tue, Oct 24, 2017 at 4:14 AM, Chris Angelico  wrote:

>
> (There are other ORMs than SQLAlchemy, of course; I can't recall the
> exact syntax for Django's off the top of my head, but it's going to be
> broadly similar to this.)
>
> ChrisA
> --
> https://mail.python.org/mailman/listinfo/python-list
>


I can help with that:

## Defining a model:

class Thing(models.Model):
    """
    This is the "schema" for the `thing` table. The pk field is created
    automatically and is called `id` by default. This table will have
    four columns: `id`, `foo`, `baz`, and `score`.
    """
    foo = models.CharField(
        max_length=140,
        blank=False
    )
    baz = models.CharField(
        max_length=140,
        blank=True
    )
    score = models.IntegerField()

## Create an object:

new_thing = Thing.objects.create(foo="bar", baz="foo")

## Get a list of objects:

Thing.objects.all()

## Filter a list of objects:

Thing.objects.filter(foo="bar")

## Modify an object:

thing = Thing.objects.get(id=1)
thing.foo = "baz"
thing.save()

## Perform an aggregation:

data = Thing.objects.aggregate(avg=Avg("score"))
print(data)
>>> {"avg": 50}

## Django basic view(called controllers in other frameworks normally) and
template:

def person_list(request):
    """
    Get a collection of `User` objects from the database.
    """
    people = User.objects.filter(is_active=True).order_by("date_joined")
    return render(
        request,
        "person/list.html",
        context={"people": people}
    )


Then, in `templates/person/list.html`:

{% extends 'base.html' %}

{% block content %}

{% for person in people %}

{{person.first_name}} {{person.last_name}}

{% endfor %}

{% endblock %}


Alternatives to Django's ORM and SQLAlchemy include but are not limited to:

- Peewee: https://github.com/coleifer/peewee
- PonyORM: https://ponyorm.com/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: choice of web-framework

2017-10-24 Thread John Black
In article , 
[email protected] says...
> 
> On Tue, Oct 24, 2017 at 6:57 AM, Chris Warrick  wrote:
> > On 23 October 2017 at 21:37, John Black  wrote:
> >> Chris, thanks for all this detailed information.  I am confused though
> >> with your database recommendation.  You say you teach SQLAlchemy but
> >> generally use PostgreSQL yourself.  I can maybe guess why there seems to
> >> be this contradiction.  Perhaps PostgreSQL is better but too advanced for
> >> the class you are teaching?  Can you clarify on which you think is the
> >> better choice?  Thanks.
> >
> > Different Chris, but I'll answer. Those are two very different things.
> >
> > PostgreSQL is a database server. It talks SQL to clients, stores data,
> > retrieves it when asked. The usual stuff a database server does.
> > Alternatives: SQLite, MySQL, MS SQL, Oracle DB, …
> >
> > SQLAlchemy is an ORM: an object-relational mapper, and also a database
> > toolkit. SQLAlchemy can abstract multiple database servers/engines
> > (PostgreSQL, SQLite, MySQL, etc.) and work with them from the same
> > codebase. It can also hide SQL from you and instead give you Python
> > classes. If you use an ORM like SQLAlchemy, you get database support
> > without writing a single line of SQL on your own. But you still need a
> > database engine — PostgreSQL can be one of them. But you can deploy
> > the same code to different DB engines, and it will just work™
> > (assuming you didn't use any DB-specific features). Alternatives:
> > Django ORM.
> >
> > psycopg2 is an example of a PostgreSQL client library for Python. It
> > implements the Python DB-API and lets you use it to talk to a
> > PostgreSQL server. When using psycopg2, you're responsible for writing
> > your own SQL statements for the server to execute. In that approach,
> > you're stuck with PostgreSQL and psycopg2 unless you rewrite your code
> > to be compatible with the other database/library. Alternatives (other
> > DBs): sqlite3, mysqlclient. There are also other PostgreSQL libraries
> > available.
> >
> 
> Thanks, namesake :)
> 
> The above is correct and mostly accurate. It IS possible to switch out
> your back end fairly easily, though, even with psycopg2; there's a
> standard API that most Python database packages follow. As long as you
> stick to standard SQL (no PostgreSQL extensions) and the standard API
> (no psycopg2 extensions), switching databases is as simple as changing
> your "import psycopg2" into "import cx_oracle" or something. (And,
> most likely, changing your database credentials.)
> 
> The point of an ORM is to make your databasing code look and feel like
> Python code, rather than manually crafting SQL statements everywhere.
> Here's how a simple database operation looks in SQLAlchemy:

Thank you Chris and Chris!

John Black
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread Lele Gaifax
Steve D'Aprano  writes:

> But given an empty file, how do you distinguish the empty file you get
> from 'music.mp3' and the identical empty file you get from 'movie.avi'?

That's simple enough: of course one empty file would be
"music.mp3.zip.zip.zip", while the other would be
"movie.avi.zip.zip.zip.zip.zip"... some sort of
https://en.wikipedia.org/wiki/Water_memory applied to file system entries :-)

ciao, lele.
-- 
nickname: Lele Gaifax | Quando vivrò di quello che ho pensato ieri
real: Emanuele Gaifas | comincerò ad aver paura di chi mi copia.
[email protected]  | -- Fortunato Depero, 1929.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread Tim Golden

On 24/10/2017 16:40, Lele Gaifax wrote:

Steve D'Aprano  writes:


But given an empty file, how do you distinguish the empty file you get
from 'music.mp3' and the identical empty file you get from 'movie.avi'?


That's simple enough: of course one empty file would be
"music.mp3.zip.zip.zip", while the other would be


I swear this looks like the lyrics of something or another...

"Music MP3 - zip - zip - zip"

TJG
--
https://mail.python.org/mailman/listinfo/python-list


Re: right list for SIGABRT python binary question ?

2017-10-24 Thread M.-A. Lemburg


On 22.10.2017 22:15, Karsten Hilbert wrote:
> On Sat, Oct 21, 2017 at 07:10:31PM +0200, M.-A. Lemburg wrote:
> 
>>> Running a debug build of py27 gave me a first lead: this
>>> Debian system (Testing, upgraded all the way from various
>>> releases ago) carries an incompatible mxDateTime which I'll
>>> take care of.
>>>
>>> *** You don't have the (right) mxDateTime binaries installed !
>>> Traceback (most recent call last):
>>>   File "./bootstrap_gm_db_system.py", line 87, in 
>>> from Gnumed.pycommon import gmCfg2, gmPsql, gmPG2, gmTools, gmI18N
>>>   File 
>>> "/home/ncq/Projekte/gm-git/gnumed/gnumed/Gnumed/pycommon/gmPG2.py", line 
>>> 34, in 
>>> from Gnumed.pycommon import gmDateTime
>>>   File 
>>> "/home/ncq/Projekte/gm-git/gnumed/gnumed/Gnumed/pycommon/gmDateTime.py", 
>>> line 52, in 
>>> import mx.DateTime as mxDT
>>>   File "/usr/lib/python2.7/dist-packages/mx/DateTime/__init__.py", line 
>>> 8, in 
>>> from DateTime import *
>>>   File "/usr/lib/python2.7/dist-packages/mx/DateTime/DateTime.py", line 
>>> 9, in 
>>> from mxDateTime import *
>>>   File 
>>> "/usr/lib/python2.7/dist-packages/mx/DateTime/mxDateTime/__init__.py", line 
>>> 13, in 
>>> raise ImportError, why
>>> ImportError: 
>>> /usr/lib/python2.7/dist-packages/mx/DateTime/mxDateTime/mxDateTime.so: 
>>> undefined symbol: Py_InitModule4
>>
>> This error suggests that you have 32- and 64-bit versions of
>> Python and mxDateTime mixed in your installation.
>>
>> Py_InitModule4 is only available in the 32-bit build of
>> Python. With the 64-bit build, it's called Py_InitModule4_64.
>>
>> Since you're getting the same error from faulthandler,
>> this is where I'd start to investigate.
>>
>> "nm" will list all exported and required symbols. As first step,
>> you should probably check the python binary for its symbols and
>> see whether it exports Py_InitModule* symbols.
> 
> Thanks for your input !
> 
> The python2.7-dbg build is 32 bits:
> 
>   root@hermes:~# nm /usr/bin/python2.7-dbg | grep Py_InitM
>   00155b9f T Py_InitModule4TraceRefs
> 
> 
>   python2.7-dbg:
> Installiert:   2.7.14-2
> Installationskandidat: 2.7.14-2
> Versionstabelle:
>*** 2.7.14-2 500
>   500 http://httpredir.debian.org/debian unstable/main i386 
> Packages
>   100 /var/lib/dpkg/status
>2.7.13-2 990
>   500 http://httpredir.debian.org/debian stretch/main i386 
> Packages
>   990 http://httpredir.debian.org/debian buster/main i386 Packages
> 
> The python2.7 build (no -dbg) does not have symbols.
> 
> mxDateTime really should be 32 bits, too:
> 
>   python-egenix-mxdatetime:
> Installiert:   3.2.9-1
> Installationskandidat: 3.2.9-1
> Versionstabelle:
>*** 3.2.9-1 990
>   500 http://httpredir.debian.org/debian stretch/main i386 
> Packages
>   990 http://httpredir.debian.org/debian buster/main i386 Packages
>   500 http://httpredir.debian.org/debian unstable/main i386 
> Packages
>   100 /var/lib/dpkg/status
> 
> Let me check the .so file:
> 
>   root@hermes:~# nm 
> /usr/lib/python2.7/dist-packages/mx/DateTime/mxDateTime/mxDateTime_d.so  | 
> grep Py_InitM
>U Py_InitModule4TraceRefs
> 
> It seems it is - hm ...

Could you check whether you have similar import errors with
other modules that have C extensions ? E.g. lxml.

What you're seeing appears to be a compilation problem
with Python 2.7.14 on Debian. The executable doesn't appear
to export its symbols to the .so files, or only some of them.
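One way to probe the symbol question from Python itself is via ctypes and dlopen(NULL). This is a sketch assuming a POSIX system, not something from the thread:

```python
import ctypes

def process_exports(symbol):
    """Return True if the current process (including its loaded shared
    libraries) exports the given C symbol; POSIX-only sketch."""
    try:
        ctypes.CDLL(None)[symbol]
        return True
    except (AttributeError, OSError):
        return False

# On a normally built 32-bit Python 2.7 one would expect
# process_exports("Py_InitModule4") to be True; Python 3 dropped it.
```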

-- 
Marc-Andre Lemburg
eGenix.com

Professional Python Services directly from the Experts
>>> Python Projects, Coaching and Consulting ...  http://www.egenix.com/
>>> Python Database Interfaces ...   http://products.egenix.com/
>>> Plone/Zope Database Interfaces ...   http://zope.egenix.com/


: Try our mxODBC.Connect Python Database Interface for free ! ::

   eGenix.com Software, Skills and Services GmbH  Pastor-Loeh-Str.48
D-40764 Langenfeld, Germany. CEO Dipl.-Math. Marc-Andre Lemburg
   Registered at Amtsgericht Duesseldorf: HRB 46611
   http://www.egenix.com/company/contact/
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread Peter J. Holzer
On 2017-10-23 04:21, Steve D'Aprano  wrote:
> On Mon, 23 Oct 2017 02:29 pm, Stefan Ram wrote:
>>
> If the probability of certain codes (either single codes, or sequences of
> codes) are non-equal, then you can take advantage of that by encoding the
> common cases into a short representation, and the uncommon and rare cases
> into a longer representation. As you say:
>
>
>>   Otherwise, if ( 0, 0 ) is much more frequent,
>>   we can encode ( 0, 0 ) by "0" and
>> 
>> ( 0, 1 ) by "101",
>> ( 1, 0 ) by "110", and
>> ( 1, 1 ) by "111".
>> 
>>   And we could then use /less/ than two bits on the
>>   average. 
>
> That's incorrect. On average you use 2.5 bits.
>
> (1*1 bit + 3*3 bits divide by four possible outcomes, makes 2.5 bits.)

I disagree. If the distribution is not equal, then the average needs to
take the different probabilities into account. 

Let's assume that (0, 0) has a probability of 75 %, (0, 1) a probability
of 15 % and (1, 0) and (1, 1) a probability of 5 % each.

Then the average length is

0.75 * 1 bit + 0.15 * 3 bits + 0.05 * 3 bits + 0.05 * 3 bits = 1.5 bits.
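A couple of lines of Python make the weighted-average calculation explicit (the distribution here is illustrative, chosen to sum to 1):

```python
# Stefan's variable-length code assigns 1 bit to (0, 0) and 3 bits to
# each of the other three pairs.  With a skewed distribution, the
# weighted average length drops below 2 bits per pair.
code_lengths = {(0, 0): 1, (0, 1): 3, (1, 0): 3, (1, 1): 3}
probs = {(0, 0): 0.75, (0, 1): 0.15, (1, 0): 0.05, (1, 1): 0.05}

avg_bits = sum(probs[pair] * code_lengths[pair] for pair in probs)
print(avg_bits)  # ~1.5 bits per pair
```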

hp


-- 
   _  | Peter J. Holzer| Fluch der elektronischen Textverarbeitung:
|_|_) || Man feilt solange an seinen Text um, bis
| |   | [email protected] | die Satzbestandteile des Satzes nicht mehr
__/   | http://www.hjp.at/ | zusammenpaßt. -- Ralph Babel
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread Peter Pearson
On Tue, 24 Oct 2017 14:51:37 +1100, Steve D'Aprano wrote:
  On Tue, 24 Oct 2017 01:27 pm, [email protected] wrote:
  > Yes! Decode reverse is easy..sorry so excited i could shout.

  Then this should be easy for you:

  http://marknelson.us/2012/10/09/the-random-compression-challenge-turns-ten/

  All you need to do is compress this file:

  
http://marknelson.us/attachments/million-digit-challenge/AMillionRandomDigits.bin

  to less than 415241 bytes, and you can win $100.

Then, on Mon, 23 Oct 2017 21:13:00 -0700 (PDT), danceswithnumbers wrote:
> I did that quite a while ago. 


But 352,954 kb > 415241 bytes, by several orders of magnitude; so
you didn't "do that".  (Or are we using the European decimal point?)

If you're claiming 352,954 *bytes*, not kb, I invite you to explain
why you have not collected Mark Nelson's $100 prize, and untold fame
and glory; failing which, your credibility will evaporate.

-- 
To email me, substitute nowhere->runbox, invalid->com.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread Ian Kelly
On Tue, Oct 24, 2017 at 12:20 AM, Gregory Ewing
 wrote:
> [email protected] wrote:
>>
>> I did that quite a while ago. 352,954 kb.
>
>
> Are you sure? Does that include the size of all the
> code, lookup tables, etc. needed to decompress it?

My bet is that danceswithnumbers does indeed have a file of that size
which is in some way derived from the million random digits, but
without any means of losslessly "decompressing" it (thus making it
junk data).
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread Steve D'Aprano
On Wed, 25 Oct 2017 02:40 am, Lele Gaifax wrote:

> Steve D'Aprano  writes:
> 
>> But given an empty file, how do you distinguish the empty file you get
>> from 'music.mp3' and the identical empty file you get from 'movie.avi'?
> 
> That's simple enough: of course one empty file would be
> "music.mp3.zip.zip.zip", while the other would be
> "movie.avi.zip.zip.zip.zip.zip"... some sort of
> https://en.wikipedia.org/wiki/Water_memory applied to file system entries
> :-)


Does that mean if I name an empty file 

serenity2-by-joss-whedon.avi.zip.zip.zip.zip.zip

Dancerswithnumbers' magic algorithm will recreate the movie from some
alternative universe where it actually exists?

Awesome.


-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread Steve D'Aprano
On Wed, 25 Oct 2017 07:09 am, Peter J. Holzer wrote:

> On 2017-10-23 04:21, Steve D'Aprano  wrote:
>> On Mon, 23 Oct 2017 02:29 pm, Stefan Ram wrote:
>>>
>> If the probability of certain codes (either single codes, or sequences of
>> codes) are non-equal, then you can take advantage of that by encoding the
>> common cases into a short representation, and the uncommon and rare cases
>> into a longer representation. As you say:
>>
>>
>>>   Otherwise, if ( 0, 0 ) is much more frequent,
>>>   we can encode ( 0, 0 ) by "0" and
>>> 
>>> ( 0, 1 ) by "101",
>>> ( 1, 0 ) by "110", and
>>> ( 1, 1 ) by "111".
>>> 
>>>   And we could then use /less/ than two bits on the
>>>   average.
>>
>> That's incorrect. On average you use 2.5 bits.
>>
>> (1*1 bit + 3*3 bits divide by four possible outcomes, makes 2.5 bits.)
> 
> I disagree. If the distribution is not equal, then the average needs to
> take the different probabilities into account.

I think I would call that the *weighted* average rather than the average.

Regardless of what we call it, of course both you and Stefan are right in how
to calculate it, and such a variable-length scheme can be used to compress
the data.

E.g. given the sequence 00000011 which would take 8 bits in the obvious
encoding, we can encode it as "000111" which takes only 6 bits.

But the cost of this encoding scheme is that *some* bit sequences expand, e.g.
the 8 bit sequence 11111100 is encoded as "1111111110" which requires 10
bits.

The end result is that averaged over all possible bit sequences (of a certain
size), this encoding scheme requires MORE space than the obvious 0/1 bits.

But in practice we don't care much, because the data sequences we care about
are usually not "all possible bit sequences", but a heavily restricted subset
where there are lots of 00 pairs and fewer 01, 10, and 11 pairs.
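Both effects are easy to demonstrate with a few lines implementing the pair code quoted above:

```python
# Variable-length pair code from the thread:
# "00" -> "0", the other three pairs -> 3-bit codes.
CODE = {"00": "0", "01": "101", "10": "110", "11": "111"}

def encode(bits):
    """Encode a bit string of even length, two bits at a time."""
    return "".join(CODE[bits[i:i + 2]] for i in range(0, len(bits), 2))

compressed = encode("00000011")  # mostly zeros: 8 bits in, 6 bits out
expanded = encode("11111100")    # mostly ones:  8 bits in, 10 bits out
```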



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread Chris Angelico
On Wed, Oct 25, 2017 at 9:11 AM, Steve D'Aprano
 wrote:
> On Wed, 25 Oct 2017 02:40 am, Lele Gaifax wrote:
>
>> Steve D'Aprano  writes:
>>
>>> But given an empty file, how do you distinguish the empty file you get
>>> from 'music.mp3' and the identical empty file you get from 'movie.avi'?
>>
>> That's simple enough: of course one empty file would be
>> "music.mp3.zip.zip.zip", while the other would be
>> "movie.avi.zip.zip.zip.zip.zip"... some sort of
>> https://en.wikipedia.org/wiki/Water_memory applied to file system entries
>> :-)
>
>
> Does that mean if I name an empty file
>
> serenity2-by-joss-whedon.avi.zip.zip.zip.zip.zip
>
> Dancerswithnumbers' magic algorithm will recreate the movie from some
> alternative universe where it actually exists?
>
> Awesome.

Yes, but then you'd get
dmca-takedown-request.pdf.zip.zip.zip.zip.zip.zip.zip which would also
be empty.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Compression of random binary data

2017-10-24 Thread Richard Damon

On 10/24/17 6:30 PM, Steve D'Aprano wrote:

On Wed, 25 Oct 2017 07:09 am, Peter J. Holzer wrote:


On 2017-10-23 04:21, Steve D'Aprano  wrote:

On Mon, 23 Oct 2017 02:29 pm, Stefan Ram wrote:



If the probability of certain codes (either single codes, or sequences of
codes) are non-equal, then you can take advantage of that by encoding the
common cases into a short representation, and the uncommon and rare cases
into a longer representation. As you say:



   Otherwise, if ( 0, 0 ) is much more frequent,
   we can encode ( 0, 0 ) by "0" and

( 0, 1 ) by "101",
( 1, 0 ) by "110", and
( 1, 1 ) by "111".

   And we could then use /less/ than two bits on the
   average.


That's incorrect. On average you use 2.5 bits.

(1*1 bit + 3*3 bits divide by four possible outcomes, makes 2.5 bits.)


I disagree. If the distribution is not equal, then the average needs to
take the different probabilities into account.


I think I would call that the *weighted* average rather than the average.

Regardless of what we call it, of course both you and Stefan are right in how
to calculate it, and such a variable-length scheme can be used to compress
the data.

E.g. given the sequence 00000011 which would take 8 bits in the obvious
encoding, we can encode it as "000111" which takes only 6 bits.

But the cost of this encoding scheme is that *some* bit sequences expand, e.g.
the 8 bit sequence 11111100 is encoded as "1111111110" which requires 10
bits.

The end result is that averaged over all possible bit sequences (of a certain
size), this encoding scheme requires MORE space than the obvious 0/1 bits.

But in practice we don't care much, because the data sequences we care about
are usually not "all possible bit sequences", but a heavily restricted subset
where there are lots of 00 pairs and fewer 01, 10, and 11 pairs.



My understanding of the 'Random Data Compressibility' challenge is 
that it requires that the compression take ANY/ALL strings of up to N 
bits, and generate an output stream no longer than the input stream, and 
sometimes less. It admits that given non-uniformly distributed data, it is 
possible to compress some patterns, the common ones, and expand others, 
the uncommon ones, to lower the net average length. What it says can't 
be done is to have a compression method that compresses EVERY input 
pattern. That is where the 'Pigeon Hole' principle comes into play, which 
the people who claim to be able to compress random data like to ignore 
or just attempt to say doesn't apply.
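The pigeonhole argument can be verified by simple counting (a sketch, not part of the challenge itself):

```python
# There are 2**n distinct n-bit inputs, but only 2**n - 1 distinct
# outputs of length strictly less than n (summing over lengths 0..n-1),
# so no lossless encoder can map every n-bit input to a shorter output.
n = 16
inputs = 2 ** n
shorter_outputs = sum(2 ** k for k in range(n))
print(inputs, shorter_outputs)  # 65536 65535
```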


--
https://mail.python.org/mailman/listinfo/python-list


h5py.File() gives error message

2017-10-24 Thread C W
Dear list,

The following Python code gives an error message

# Python code starts here:
import numpy as np
import h5py
train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")

# Python code ends

The error message:

train_dataset = h5py.File('train_catvnoncat.h5', "r")
Traceback (most recent call last):
  File "", line 1, in 
  File "/Users/M/anaconda/lib/python3.6/site-packages/h5py/_hl/files.py",
line 269, in __init__
fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
  File "/Users/M/anaconda/lib/python3.6/site-packages/h5py/_hl/files.py",
line 99, in make_fid
fid = h5f.open(name, flags, fapl=fapl)
  File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
  File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
  File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name =
'train_catvnoncat.h5', errno = 2, error message = 'No such file or
directory', flags = 0, o_flags = 0)

My directory is correct, and the dataset folder with file is there.

Why the error message? Is it h5py.File() or is it my file? Everything
seems pretty simple, what's going on?

Thank you!
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: h5py.File() gives error message

2017-10-24 Thread Rob Gaddi

On 10/24/2017 10:58 AM, C W wrote:

Dear list,

The following Python code gives an error message

# Python code starts here:
import numpy as np
import h5py
train_dataset = h5py.File('datasets/train_catvnoncat.h5', "r")

# Python code ends

The error message:

train_dataset = h5py.File('train_catvnoncat.h5', "r")
Traceback (most recent call last):
   File "", line 1, in 
   File "/Users/M/anaconda/lib/python3.6/site-packages/h5py/_hl/files.py",
line 269, in __init__
 fid = make_fid(name, mode, userblock_size, fapl, swmr=swmr)
   File "/Users/M/anaconda/lib/python3.6/site-packages/h5py/_hl/files.py",
line 99, in make_fid
 fid = h5f.open(name, flags, fapl=fapl)
   File "h5py/_objects.pyx", line 54, in h5py._objects.with_phil.wrapper
   File "h5py/_objects.pyx", line 55, in h5py._objects.with_phil.wrapper
   File "h5py/h5f.pyx", line 78, in h5py.h5f.open
OSError: Unable to open file (unable to open file: name =
'train_catvnoncat.h5', errno = 2, error message = 'No such file or
directory', flags = 0, o_flags = 0)

My directory is correct, and the dataset folder with file is there.

Why error message? Is it h5py.File() or is it my file? Everything seems
pretty simple, what's going on?

Thank you!



Be 100% sure your directory is correct.  Try it again with an absolute 
path to the file.  Windows makes it far too easy for the working 
directory of a program to be other than what you think it is.
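One quick way to check (a sketch; the relative path mirrors the one in the question):

```python
import os

# h5py reports errno 2 (no such file), so print where the interpreter
# is actually looking before calling h5py.File().
print("cwd:", os.getcwd())
path = os.path.abspath(os.path.join("datasets", "train_catvnoncat.h5"))
print("looking for:", path, "exists:", os.path.exists(path))
```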


--
Rob Gaddi, Highland Technology -- www.highlandtechnology.com
Email address domain is currently out of order.  See above to fix.
--
https://mail.python.org/mailman/listinfo/python-list


Objects with __name__ attribute

2017-10-24 Thread ast

Hi,

I know two kinds of Python objects which have an intrinsic
name: classes and functions.


>>> def f():
...     pass
...
>>> f.__name__
'f'
>>> g = f
>>> g.__name__
'f'

>>> class Test:
...     pass
...
>>> Test.__name__
'Test'
>>> Test2 = Test
>>> Test2.__name__
'Test'

Are there other objects with a __name__ attribute,
and what is it used for?

Regards



--
https://mail.python.org/mailman/listinfo/python-list