Re: Mapping with contiguous ranges of keys

2016-12-16 Thread Terry Reedy

On 12/15/2016 4:30 PM, Thomas Nyberg wrote:

On 12/15/2016 12:48 PM, Terry Reedy wrote:

On 12/15/2016 12:27 PM, Thomas Nyberg wrote:


I haven't dealt with a data structure exactly like this, but it's
basically a sparse array.


A sparse array has at least half missing values.  This one has none on
the defined domain, but contiguous duplicates.



I'm sorry for devolving into semantics, but there certainly isn't a
single definition of "sparse array" out there. For example, the
definition in wikipedia (https://en.wikipedia.org/wiki/Sparse_array)
doesn't agree with you:


I think it does ;-).


"In computer science, a sparse array is an array in which most of the
elements have the default value (usually 0 or null)."


Let's devolve to a memory-based language like C. A physical array 
consisting of sequential memory locations must have *some* bit pattern 
in every byte and hence in every multibyte block.  If I remember 
correctly, C malloc initializes bytes to all 0 bits, which is also an 
int value of 0.  If there is no meaningful value, there must be a 
default value that means 'missing'.  0 may mean 'missing'. Null values 
are by definition non-values.  Python uses None as a Null.  If the 
example had been populated largely with Nones, I would have called it 
'sparse'.


> Personally my usage of sparse arrays in scipy has _always_ had all
> defined values it's just that the default value was 0. I never deal
> with "missing" values.

Lucky you.  In statistics and analysis of real, experimental data, 
missing values are often a possibility.  They are always a nuisance, 
sometimes a major one.  When the data are represented by arrays, rather 
than by some compacted form, some value has to be chosen to represent 
the absence of data.


--
Terry Jan Reedy

--
https://mail.python.org/mailman/listinfo/python-list


Re: Best attack order for groups of numbers trying to destroy each other, given a victory chance for number to number attack.

2016-12-16 Thread skybuck2000
Hi, thanks for your reply, Dennis.

You seem to have two main questions:

1. Is the attack order sequential in time or is it positional ?

The answer to this question is kinda:

Both (which is kinda funny, since you did not think of the possibility that 
it could be both; you also did this with the other issue, see below ;)).

It indicates which object should be attacked first. And of course objects have a 
position in the real world, so this also implies position to some degree.

2. Do the numbers mean types or instances ?

Again the answer to this question is kinda:

Both (^<- which is the really funny one, because the original model, where there 
were only 4 types and 7 instances, was still too large to compute within 
reasonable time. So the number of instances has been brought back to the number 
of types, to represent each type exactly once, so as to at least keep the model 
somewhat realistic and include all types, which is kinda important. I am 
particularly interested in type 4, so it has to be in the model :) Type 4 can 
attack the other 3 types with ease, which gives you some hint at what kind of 
type it might be ;)).

(So there are 4 types and 4 instances, which happen to overlap each other, so 
you can consider them the same thing, type=instance (to simplify the program)).

These were your two most important questions; however, you seem to have some 
additional (sub)questions which I will also try to answer.


3. One of your questions was: why are the indexes called "index1, index2" and so 
forth? This is simply because there are 8 objects. Each object needs its own 
index to create a brute-force algorithm which can create all combinations of 
the object attack patterns. The code for this is pretty simple/straightforward, 
so I will mention it below. I will also rename these indexes as you requested, 
to give them more meaning, Pascal style:

for FriendlyIndex1 := 1 to 24 do
for FriendlyIndex2 := 1 to 24 do
for FriendlyIndex3 := 1 to 24 do
for FriendlyIndex4 := 1 to 24 do
for EnemyIndex1 := 1 to 24 do
for EnemyIndex2 := 1 to 24 do
for EnemyIndex3 := 1 to 24 do
for EnemyIndex4 := 1 to 24 do
// ComputeCombat( 
FriendlyIndex1,FriendlyIndex2,FriendlyIndex3,FriendlyIndex4, 
EnemyIndex1,EnemyIndex2,EnemyIndex3,EnemyIndex4)

So these indexes are simply 8 integer variables used to generate all possible 
attack orders (permutation numbers).

Each index can then be used to look up the actual permutation and used in 
combat...

So this iteration code is a big help to make sure to produce all combinations; 
it's simple, it's effective, and it's clear what it's doing.

ComputeCombat could be a routine, or simply replaced with actual code to prevent 
"variable sharing issues". My suggestion, in case you ever do try to write code 
for it, is to keep everything "global", or simply inside a single routine, so 
that parameter hell or whatever doesn't occur. Keep it as simple as possible 
for a first version; later it can be enhanced with nice Python features. 
Of course, if you think the problem is simple enough to try using more advanced 
features at first, you are welcome to try that as well, but I would find it very 
funny if that fails... so when it does fail, fall back to ultra-simple code 
:) Perhaps Python doesn't even have ultra-simple data structures, which might 
actually complicate your capability of solving this problem, which would be 
somewhat of an interesting conclusion regarding the Python language as a whole! 
Consider yourself challenged by me to solve it and prove that Python is not a 
bad language! LOL :) Yes, a little bit trollish, but for good reason. Eventually I 
do suspect Python might be of some help... at least you mentioned it can 
generate permutations... but that is a sub-problem in this case...
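As an aside, the eight nested loops above can be sketched in Python with itertools.product; here is a hedged illustration, with compute_combat as a hypothetical stand-in for the ComputeCombat routine:

```python
from itertools import product

def compute_combat(*indexes):
    # Hypothetical placeholder: would look up each permutation by its
    # index and simulate the battle.
    return indexes

# One lazy iterator over all 24**8 index combinations, equivalent to
# the eight nested for-loops in the Pascal-style code above.
all_orders = product(range(1, 25), repeat=8)
first = next(all_orders)   # (1, 1, 1, 1, 1, 1, 1, 1)
```

Note that the full space is 24**8 (roughly 1.1e11) combinations, so actually exhausting it is impractical; the sketch only shows the construction.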

4. There was one more somewhat interesting sub-question: your question seems to 
be about the attack/victory table; you seem to wonder about it being 
symmetrical/asymmetrical, and why it would not be symmetrical?

I think I can explain this issue for you. The reason why it's not symmetrical is 
that tanks, for example, have different armor thickness depending on their angle. 
So let's say Tank X sneaks up on Tank Y and Tank X manages to hit the Tank X in 
the back... then Tank X's victory chance is much higher than if Tank X was 
sneaked up on by Tank Y... in that case Tank X's victory chance would be much 
lower.

I think this is the main reason why you are seeing an asymmetrical victory 
chance table. I hope that clears it up for you ;)

I think I have resolved all your confusion; you should now have enough 
information to write a computer program that could solve the problem. Solving 
the problem entirely would not be necessary for you to share any possible 
Python enhancements, pseudo code, your own techniques, or (computing efficiency) 
ideas you would have come up with. :)

Re: Best attack order for groups of numbers trying to destroy each other, given a victory chance for number to number attack.

2016-12-16 Thread skybuck2000
A nice Chinese-like saying might have come out of this; perhaps it even already 
exists:

"When you confused, the answer might be fused !" :)

Sounds like TZen Su ! That dude from Art of War or something like that :)


I will take this chance to correct a little, pretty obvious typo:

(*) Tank X changed to Tank Y

So let's say Tank X sneaks up on Tank Y and Tank X manages to hit the (*) Tank 
Y in the back... then Tank X's victory chance is much higher than if Tank X was 
sneaked up on by Tank Y... in that case Tank X's victory chance would be much 
lower.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Mapping with contiguous ranges of keys

2016-12-16 Thread Steve D'Aprano
On Fri, 16 Dec 2016 04:06 am, Steve D'Aprano wrote:

> I have some key:value data where the keys often are found in contiguous
> ranges with identical values.
[...]


Thank you to everyone who gave suggestions!

I have experimented with two solutions, and hope somebody might be able to
suggest some improvements. Attached is the test code I ran, suggestions for
improving the performance will be appreciated.

I decided on these two data structures:

(1) The naive solution: don't worry about duplicate values, or the fact that
keys come in contiguous groups, and just store each individual key and
value in a dict; this is faster but uses more memory.

(2) The clever solution: use a pair of lists, one holding the starting value
of each group of keys, and the other holding the common values. This saves
a lot of memory, but is slower.

A concrete example might help. Suppose I have 15 keys in five groups:

D = {0: 10,
 1: 20, 2: 20,
 3: 30, 4: 30, 5: 30,
 6: 40, 7: 40, 8: 40, 9: 40,
 10: 50, 11: 50, 12: 50, 13: 50, 14: 50}

(Remember, in reality I could have as many as a million or two keys. This is
just a simple toy example.)

Instead of using a dict, I also set up a pair of lists:

L = [0, 1, 3, 6, 10, 15]  # starting value of each group
V = [10, 20, 30, 40, 50]  # value associated with each group

Note that the first list has one extra item, namely the number one past the
final group.

I can do a key look-up using either of these:

D[key]

V[bisect.bisect_right(L, key) - 1]
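Put together as a runnable sketch, using only the toy data above:

```python
import bisect

# Toy data from the example above: 15 keys in five groups.
D = {0: 10, 1: 20, 2: 20, 3: 30, 4: 30, 5: 30,
     6: 40, 7: 40, 8: 40, 9: 40,
     10: 50, 11: 50, 12: 50, 13: 50, 14: 50}

L = [0, 1, 3, 6, 10, 15]  # starting key of each group, plus one past the end
V = [10, 20, 30, 40, 50]  # value shared by each group

def lookup(key):
    # bisect_right returns the index just after the last group start <= key,
    # so subtracting 1 selects that group's value.
    return V[bisect.bisect_right(L, key) - 1]

# The two representations agree on every key.
assert all(lookup(k) == D[k] for k in D)
```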


I tested the memory consumption and speed of these two solutions with
(approximately) one million keys. I found:

- the dict solution uses a lot more memory, about 24 bytes per key, compared
to the pair of lists solution, which is about 0.3 bytes per key;

- but on my computer, dict lookups are about 3-4 times faster.


Any suggestions for improving the speed of the binary search version, or the
memory consumption of the dict?

By the way: the particular pattern of groups in the sample code (groups of
size 1, 2, 3, ... up to 50, then repeating from 1, 2, 3, ... again) is just
a demo. In my real data, the sizes of the groups are all over the place, in
an unpredictable pattern.



Thanks in advance.


-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: OT - "Soft" ESC key on the new MacBook Pro

2016-12-16 Thread Matt Wheeler
On Wed, 14 Dec 2016, 01:06 Skip Montanaro wrote:

I know this isn't a Python-specific question, but i got zero useful
responses from the help-gnu-emacs list for some reason. I think this
expression should not evaluate to the empty set:

set(python_programmers) & set(emacs_users) & set(new_macbookpro_owners)


So I fit 2 of these, but am a vim user rather than emacs.

Hopefully there are a few people out there who've had an opportunity to try
the new ESC-less Pros with the Touch Bar for an extended period of time.
Does the lack of a physical ESC key create problems for people, especially
Emacs users?


No. I think as long as you're happy with the keyboard, the esc key is fine.
I happen to like short-travel keyboards, and while the lack of physical
feedback from the touch bar when hitting esc was slightly jarring initially,
I got used to it within a day.
That short travel is worth bearing in mind in general though; I know a lot
of people say they don't like it.

Just to head off a couple suggestions people might be inclined to make...
Yes, I know I can use C-[ or the Alt key instead of ESC. I can remap other
keys like Caps Lock or the back tick. I can also buy some other laptop with
a true ESC key, or buy a 13-inch MBP sans Touch Bar. I do plan to try out


The 13" without touch bar only has 2 usb-c ports, which I know is a whole
other discussion but I've found the 2 ports on a side of the 15" a bit too
cramped for some of my accessories.

Emacs on a Touch-Bar-equipped MacBook Pro at the Apple Store, but a few
minutes horsing around in the din of holiday shopping isn't the same as an
extended test drive in my usual environment.

So, for those of you who've tried it, does the lack of a physical ESC key
create problems?


So... Not for me, but obviously with the caveats above. Giving it a try in
an Apple store is definitely a good idea :)

-- 

--
Matt Wheeler
http://funkyh.at
-- 
https://mail.python.org/mailman/listinfo/python-list


Last call for the Call For Proposals of PythonFOSDEM 2017

2016-12-16 Thread Stephane Wirtel via Python-list
Hello, this weekend is the last two days of the Call For Proposals for 
PythonFOSDEM 2017. We have received a lot of topics, but if you want to 
become a speaker and you have a very cool topic to submit, please 
don't hesitate to send us your proposal.


Deadline is 2016-12-18.

Stephane


Call For Proposals
==================

This is the official call for sessions for the Python devroom at FOSDEM 2017.

FOSDEM is the Free and Open source Software Developers' European Meeting, a free
and non-commercial two-day weekend event that offers open source contributors a
place to meet, share ideas and collaborate.

It's the biggest event in Europe, with 5000+ hackers and 400+ speakers.

For this edition, Python will be represented by its community.
If you want to talk with a lot of Python users, it's the place to be!

Important dates
===============

* Submission deadlines: 2016-12-18
* Acceptance notifications: 2016-12-23

Practical
=========

* The duration of talks will be 30 minutes, including the presentation and
questions and answers.
* Presentations may be recorded and streamed; sending your proposal implies
giving permission to be recorded.
* A mailing list for the Python devroom is available for discussions about
devroom organisation. You can register at this address:
https://lists.fosdem.org/listinfo/python-devroom

How to submit
=============

All submissions are made in the Pentabarf event planning tool at
https://penta.fosdem.org/submission/FOSDEM17

When submitting your talk in Pentabarf, make sure to select the Python devroom
as the Track.

Of course, if you already have a user account, please reuse it.

Questions
=========

Any questions, please send an email to info AT python-fosdem DOT org

Thank you for submitting your sessions and see you soon in Brussels to talk
about Python.

If you want to stay informed about this edition, you can follow our Twitter
account @PythonFOSDEM.

* FOSDEM 2017: https://fosdem.org/2017
* Python Devroom: http://python-fosdem.org
* Twitter: https://twitter.com/PythonFOSDEM


Stephane

--
Stéphane Wirtel - http://wirtel.be - @matrixise
--
https://mail.python.org/mailman/listinfo/python-list


Re: Reading python list as a newsgroup (was ...)

2016-12-16 Thread Grant Edwards
On 2016-12-16, Dennis Lee Bieber  wrote:
> On Thu, 15 Dec 2016 13:34:32 -0500, Terry Reedy  declaimed 
> the following:
>>
>>If you want to read python-list as a news group, you might try 
>>news.gmane.org.  About once a year, it goes down for a few hours to a 
>>day, but has otherwise been dependable.  I found it when my ISP dropped 
>>newsgroup access around a decade ago.
>
> And I found it when the spam on comp.lang.python got overly
> annoying.

I didn't notice much spam on c.l.p (but then again, I filter out all
posts from google-groups).  The problem on c.l.p that caused me to
switch to gmane's NNTP server was the number of broken references.
They both have references that break in various scenarios, but my
tests indicated that it was worse on c.l.p.  [Yes, I actually wrote a
Python program that used an NNTP client library to check all the
reference values in a large sample of posts from both sources.]
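The checking program itself isn't shown; as a hedged sketch of the idea (the NNTP fetching, e.g. via the stdlib nntplib, is omitted, and the header layout here is a simplifying assumption):

```python
# Hypothetical sketch of the reference check described above: given the
# headers of a sample of posts, count References message-ids that don't
# resolve to any known article in the sample.

def broken_references(posts):
    """posts: list of dicts with 'message-id' and 'references' headers."""
    known = {p["message-id"] for p in posts}
    broken = 0
    for p in posts:
        # References is a whitespace-separated list of ancestor message-ids.
        for ref in p.get("references", "").split():
            if ref not in known:
                broken += 1
    return broken

sample = [
    {"message-id": "<a@x>", "references": ""},
    {"message-id": "<b@x>", "references": "<a@x>"},
    {"message-id": "<c@x>", "references": "<a@x> <missing@x>"},
]
```

A real run would of course only count a reference as broken after trying to fetch the referenced article from the server, not just from the sample.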

-- 
Grant Edwards               grant.b.edwards        Yow! I wonder if I could
                            at                     ever get started in the
                            gmail.com              credit world?

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Mapping with contiguous ranges of keys

2016-12-16 Thread Thomas Nyberg

On 12/15/2016 11:57 PM, Terry Reedy wrote:

On 12/15/2016 4:30 PM, Thomas Nyberg wrote:

On 12/15/2016 12:48 PM, Terry Reedy wrote:

A sparse array has at least half missing values.  This one has none on
the defined domain, but contiguous duplicates.



I'm sorry for devolving into semantics, but there certainly isn't a
single definition of "sparse array" out there. For example, the
definition in wikipedia (https://en.wikipedia.org/wiki/Sparse_array)
doesn't agree with you:


I think it does ;-).


"In computer science, a sparse array is an array in which most of the
elements have the default value (usually 0 or null)."


...

>

Personally my usage of sparse arrays in scipy has _always_ had all
defined values it's just that the default value was 0. I never deal
with "missing" values.


Lucky you.  In statistics and analysis of real, experimental data,
missing values are often a possibility.  They are always a nuisance,
sometimes a major one.  When the data are represented by arrays, rather
than by some compacted form, some value has to be chosen to represent
the absence of data.



I thought that the definitions were different because yours said "at 
least half" and the other said "most", in addition to your saying 
"missing values" where the other said "default value". Ignoring the first 
minor point: if by "missing value" you basically mean "default value", 
then yes, I agree that the definition is the same. Also, I didn't mean 
that I was "lucky" to never have dealt with missing values (strictly 
speaking I have, I just do it rarely); my point was that I deal with 
sparse matrices/arrays/structures quite often, but I rarely deal with 
"missing values" like nulls/nans/Nones. But that point is moot given 
that you apparently meant the same thing by "missing value" as I did 
by "default value", so I don't think we disagree here.


Taking a step back, the real point of sparse anything is: can you 
represent it in a space/speed-efficient manner which is "stable" under 
whatever operations you care about. I.e., if you have a default value of 
0, then you have the property that 0 + x = x and 0 * x = 0, which means 
that sparse matrices with default value 0 are stable under addition and 
multiplication. If you have a default value of nan (or None or null) you 
usually have the convention that nan * x = nan (possibly depending on the 
sign of x) and nan + x = nan, which means that a sparse matrix with nans 
as default is also stable under addition and multiplication. If you 
chose a default value of (say) 1, you would run into issues with the 
stability of these operations. It wouldn't mean it's not "sparse", but 
it would mean that the operations you care about might not work as well.


The extension to the thread at hand is just that now you have multiple 
default values, and the fact that they are not assigned at random, but 
are instead runs of constant values, means that you can put a sparse 
structure on this (with a suitable definition of "sparse").


> Let's devolve to a memory-based language like C. A physical array
> consisting of sequential memory locations must have *some* bit pattern
> in every byte and hence in every multibyte block.  If I remember
> correctly, C malloc initializes bytes to all 0 bits, which is also an
> int value of 0.  If there is no meaningful value, there must be a
> default value that means 'missing'.  0 may mean 'missing'. Null values
> are by definition non-values.  Python uses None as a Null.  If the
> example had been populated largely with Nones, I would have called it
> 'sparse'.

It's not really super pertinent to this discussion, but malloc does not 
guarantee that the values are zeroed. That is guaranteed by calloc though:


http://man7.org/linux/man-pages/man3/malloc.3.html
http://man7.org/linux/man-pages/man3/calloc.3p.html

Also, the point of a sparse representation is that your default value 
wouldn't exist in memory anywhere; instead, the operations understand 
that it exists implicitly. For example, a sparse matrix with mostly 0s 
might* be represented by rows of the form


i,j,v

where i and j are the row and column indices and v is the value at the 
position. So in this representation we would have as many rows as we 
would non-default values.


*I say might because there are usually more compact forms than this. 
This isn't the internal representation of scipy.sparse.csr_matrix, for 
example.
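A minimal sketch of that (i, j, v) triplet representation, in pure Python rather than scipy's actual format:

```python
# Only non-default entries are stored as (i, j, v) triplets; every other
# position implicitly holds the default value.
def to_triplets(dense, default=0):
    return [(i, j, v)
            for i, row in enumerate(dense)
            for j, v in enumerate(row)
            if v != default]

def at(triplets, i, j, default=0):
    # Linear scan for illustration; real formats index this efficiently.
    for ti, tj, v in triplets:
        if (ti, tj) == (i, j):
            return v
    return default

m = [[0, 0, 3],
     [0, 5, 0]]
t = to_triplets(m)   # [(0, 2, 3), (1, 1, 5)]
```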


Regardless, I think I wrote too much to say, basically, that I don't think 
we're really disagreeing, except possibly slightly on perspective.


Cheers,
Thomas
--
https://mail.python.org/mailman/listinfo/python-list


Re: Reading python list as a newsgroup (was ...)

2016-12-16 Thread D'Arcy Cain

On 2016-12-16 10:11 AM, Grant Edwards wrote:

I didn't notice much spam on c.l.p (but then again, I filter out all
posts from google-groups).  The problem on c.l.p that caused me to


Yes, blocking GG was the biggest improvement to my reading of this list 
(a mailing list in my case).  That and a few judicious PLONKs and it is 
now useful.


All I need to do now is figure out how to filter out the almost daily 
Windows installer questions.


--
D'Arcy J.M. Cain
System Administrator, Vex.Net
http://www.Vex.Net/ IM:[email protected]
VoIP: sip:[email protected]
--
https://mail.python.org/mailman/listinfo/python-list


Re: Reading python list as a newsgroup (was ...)

2016-12-16 Thread Tim Golden

On 16/12/2016 16:12, D'Arcy Cain wrote:

On 2016-12-16 10:11 AM, Grant Edwards wrote:

I didn't notice much spam on c.l.p (but then again, I filter out all
posts from google-groups).  The problem on c.l.p that caused me to


Yes, blocking GG was the biggest improvement to my reading this list
(mailing list in my case).  That and a few judicious PLONKs and it is
now useful.

All I need to do now is figure out how to filter out the almost daily
Windows installer questions.



The ever-so-slight irony is that Mailman decided it didn't like this 
post and held it for moderation! (Something to do with the headers; I 
didn't bother to check since I recognised the sender).


TJG
--
https://mail.python.org/mailman/listinfo/python-list


Re: Mapping with contiguous ranges of keys

2016-12-16 Thread mbg1708
On Thursday, 15 December 2016 17:06:39 UTC, Steve D'Aprano  wrote:
> I have some key:value data where the keys often are found in contiguous
> ranges with identical values. For example:
> 
> {1: "foo",
>  2: "foo",
>  3: "foo",
>  # same for keys 4 through 99
>  100: "foo",
>  101: "bar",
>  102: "bar",
>  103: "foobar",
>  104: "bar",
>  105: "foo", 
> }
> ...
> -- 
> Steve

All the answers seem to rely on in-memory solutions.  But isn't the problem a 
classic data-design problem (cf. Codd) with two tables?

CREATE TABLE keys   (id    INTEGER NOT NULL PRIMARY KEY,
                     kkey  INTEGER,
                     UNIQUE (kkey) );
## eg id = 999, kkey = 101
CREATE TABLE values (id    INTEGER NOT NULL PRIMARY KEY,
                     k_id  INTEGER,
                     value VARCHAR,
                     UNIQUE (k_id, value),
                     FOREIGN KEY (k_id) REFERENCES keys(id));
## eg k_id = 999, value = "bar"

For example, Python/SQLite can parse the list of key:value pairs.  The key is 
looked up in keys -- and a keys row is added if the key is new.  The keys id 
is saved.

k_id--value pair is looked up -- and a row is added if the pair is new.

Some of this can be simplified by relying on SQL to handle non-UNIQUE errors.
This approach may be slower than in-memory processing, but it has almost no 
database size limits.
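A hedged sketch of this design with the stdlib sqlite3 module; note that "values" is an SQL keyword, so the table name has to be quoted, and INSERT OR IGNORE leans on the UNIQUE constraints as suggested:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE keys (id   INTEGER NOT NULL PRIMARY KEY,
                       kkey INTEGER,
                       UNIQUE (kkey));
    CREATE TABLE "values" (id    INTEGER NOT NULL PRIMARY KEY,
                           k_id  INTEGER,
                           value VARCHAR,
                           UNIQUE (k_id, value),
                           FOREIGN KEY (k_id) REFERENCES keys(id));
""")

def store(key, value):
    # INSERT OR IGNORE lets the UNIQUE constraints absorb duplicates.
    con.execute("INSERT OR IGNORE INTO keys (kkey) VALUES (?)", (key,))
    (k_id,) = con.execute("SELECT id FROM keys WHERE kkey = ?",
                          (key,)).fetchone()
    con.execute('INSERT OR IGNORE INTO "values" (k_id, value) VALUES (?, ?)',
                (k_id, value))

for k, v in [(101, "bar"), (102, "bar"), (101, "bar")]:  # duplicate ignored
    store(k, v)

rows = con.execute('SELECT COUNT(*) FROM "values"').fetchone()[0]
```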

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Mapping with contiguous ranges of keys

2016-12-16 Thread Chris Angelico
On Fri, Dec 16, 2016 at 4:06 AM, Steve D'Aprano wrote:
> I have about a million or two keys, with a few hundred or perhaps a few
> thousand distinct values. The size of each contiguous group of keys with
> the same value can vary from 1 to perhaps a hundred or so.

Going right back to the beginning here: I think that "a million or
two" is a perfectly doable figure for a straightforward list or dict.
You get immediately-available lookups in a straightforward way, at
the cost of maybe 16MB of memory (if you use the same strings for the
values, the cost is just the pointers).

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Mapping with contiguous ranges of keys

2016-12-16 Thread John Pote


On 16/12/2016 14:27, Steve D'Aprano wrote:

(2) The clever solution: use a pair of lists, one holding the starting value
of each group of keys, and the other holding the common values. This saves
a lot of memory, but is slower.

A concrete example might help. Suppose I have 15 keys in five groups:

D = {0: 10,
  1: 20, 2: 20,
  3: 30, 4: 30, 5: 30,
  6: 40, 7: 40, 8: 40, 9: 40,
  10: 50, 11: 50, 12: 50, 13: 50, 14: 50}

(Remember, in reality I could have as many as a million or two keys. This is
just a simple toy example.)

Instead of using a dict, I also set up a pair of lists:

L = [0, 1, 3, 6, 10, 15]  # starting value of each group
V = [10, 20, 30, 40, 50]  # value associated with each group

I misread the problem (I skipped the comment about keys 4 - 99) and 
assumed there might be gaps between the contiguous blocks, so I thought of 
the list structure

[  ( (firstKeyN, lastKeyN), "value" ),
...
]

At the cost of more memory, keeping the first and last key numbers in tuples 
in the L list would mean there is only one L lookup, at the expense of 
two additional tuple lookups. Are small tuple lookups quicker than one 
large list lookup? If not, stick with the simple L and V you show above.


I've never used any form of tree structure; perhaps someone else could 
comment on the performance of balanced trees as compared to simple 
lists. Would the insertion cost be too high in keeping the tree balanced?


As to speeding up access with only the L and V lists, the binary search must 
be optimal, unless specialised knowledge about the distribution of the 
sizes of the contiguous groups is made use of. But you say there is no 
pattern to the group sizes.


Another thought is to have a smaller pre-index list, and search it to find 
the range of L indexes the key is in. If the pre-index list had 1000 
entries, then each entry would cover 1/1000 of the L list, which narrows 
the binary search space in L considerably. The cost of something like 
this is keeping the pre-index list up to date when new keys are added, 
plus the extra time to code and test it.

preList struct [ (firstKeyN, lastKeyN), (firstLindex, lastLindex),
...
  ]
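A hedged sketch of that pre-index idea on toy data (the chunk size and the lists here are assumptions for illustration only):

```python
import bisect

# Toy group-start list; a coarse list holding every STEP-th starting key
# narrows the binary search to one small slice of L.
L = [0, 1, 3, 6, 10, 15, 21, 28, 36, 45, 55, 66]
STEP = 4
pre = L[::STEP]                                     # coarse pre-index

def find_group(key):
    block = bisect.bisect_right(pre, key) - 1       # which slice of L
    lo = block * STEP
    hi = min(lo + STEP + 1, len(L))
    return bisect.bisect_right(L, key, lo, hi) - 1  # narrowed search

# Matches a plain full-width bisect for every key in range.
assert all(find_group(k) == bisect.bisect_right(L, k) - 1
           for k in range(L[-1]))
```

For a million-entry L this replaces one log2(1e6) ≈ 20-step search with two much shorter ones, so the win is in cache locality rather than comparison count.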

Reminds me of Jackson's first two rules of optimisation: 1 - don't do 
it; 2 - don't do it yet.

Thanks for an interesting problem.

Note that the first list has one extra item, namely the number one past the
final group.

I can do a key look-up using either of these:

 D[key]

 V[bisect.bisect_right(L, key) - 1]


I tested the memory consumption and speed of these two solutions with
(approximately) one million keys. I found:

- the dict solution uses a lot more memory, about 24 bytes per key, compared
to the pair of lists solution, which is about 0.3 bytes per key;

- but on my computer, dict lookups are about 3-4 times faster.


Any suggestions for improving the speed of the binary search version, or the
memory consumption of the dict?

By the way: the particular pattern of groups in the sample code (groups of
size 1, 2, 3, ... up to 50, then repeating from 1, 2, 3, ... again) is just
a demo. In my real data, the sizes of the groups are all over the place, in
an unpredictable pattern.



Thanks in advance.




--
https://mail.python.org/mailman/listinfo/python-list


Cache memory and its effect on list searching

2016-12-16 Thread sohcahtoa82
Alternatively...why you should definitely use binary searches:

Python 3.5.2+ (default, Aug 30 2016, 19:08:42) 
[GCC 6.2.0 20160822] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hashlib
>>> import timeit
>>> hashes = [hashlib.md5(bytes(str(i), "utf-8")).hexdigest() for i in range(1000)]
>>> sortedhashes = sorted(hashes)
>>> timeit.timeit('"x" in hashes', globals={'hashes': hashes}, number=10)
1.9478233020054176
>>> timeit.timeit('"x" in hashes', globals={'hashes': sortedhashes}, number=10)
18.001392804995703

I thought this was curious behavior.  I created a list of random-looking 
strings, then made a sorted copy.  I then found that using "in" to see if a 
string exists in the sorted list took over 9 times as long!

At first, I thought since both lists are the same size, and the 'in' test is a 
linear search, shouldn't they take the same amount of time?  Even if there was 
some trickery with branch prediction happening, that would have benefited the 
sorted list.  Then I remembered how lists work in Python.  The original list is 
going to be mostly contiguous in memory, making the memory cache quite 
effective.  When I create the sorted copy, I'm creating a list of references to 
strings that are all over the place in memory, causing tons of cache misses.

Of course, the best solution was to implement a binary search.  That turned the 
membership check into a 300-700 microsecond operation.
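The binary search itself isn't shown in the post; a hedged sketch using the stdlib bisect module on the sorted copy:

```python
import bisect
import hashlib

# Same setup as above: hex digests of the decimal strings 0..999.
hashes = [hashlib.md5(str(i).encode("utf-8")).hexdigest() for i in range(1000)]
sortedhashes = sorted(hashes)

def contains(sorted_list, item):
    # bisect_left finds the insertion point; the item is present only if
    # it actually sits at that position.
    i = bisect.bisect_left(sorted_list, item)
    return i < len(sorted_list) and sorted_list[i] == item

present = contains(sortedhashes, hashes[0])
absent = contains(sortedhashes, "x")
```

(As Chris points out in the follow-up, a set is still the right tool for plain membership tests; bisect pays off when you also need ordered neighbours or ranges.)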
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Cache memory and its effect on list searching

2016-12-16 Thread Chris Angelico
On Sat, Dec 17, 2016 at 1:20 PM, sohcahtoa82 wrote:
> I thought this was curious behavior.  I created a list of random-looking 
> strings, then made a sorted copy.  I then found that using "in" to see if a 
> string exists in the sorted list took over 9 times as long!
>

My numbers replicate yours (though my computer's faster). But my
conclusion is different:

Python 3.7.0a0 (default:ecd218c41cd4, Dec 16 2016, 03:08:47)
[GCC 6.2.0 20161027] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import hashlib
>>> import timeit
>>> hashes = [hashlib.md5(bytes(str(i), "utf-8")).hexdigest() for i in range(1000)]
>>> sortedhashes = sorted(hashes)
>>> timeit.timeit('"x" in hashes', globals={'hashes': hashes}, number=10)
0.8167107938788831
>>> timeit.timeit('"x" in hashes', globals={'hashes': sortedhashes}, number=10)
5.029693723190576
>>> timeit.timeit('"x" in hashes', globals={'hashes': hashes}, number=10)
0.855183657258749
>>> timeit.timeit('"x" in hashes', globals={'hashes': sortedhashes}, number=10)
5.0585526106879115
>>> sethashes = set(hashes)
>>> timeit.timeit('"x" in hashes', globals={'hashes': sethashes}, number=10)
6.13601878285408e-06

You want containment checks? Use a set :)

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


building numpy arrays with regular structure

2016-12-16 Thread Seb
Hello,

Is there an easier way to write a numpy array with a regular structure?
For example, an array with [0, 1] along the diagonal of one of the array
dimensions, and zeros elsewhere:

zz = np.array([[[0, 1], [0, 0], [0, 0]],
               [[0, 0], [0, 1], [0, 0]],
               [[0, 0], [0, 0], [0, 1]]])

This one is not so big, but if it were, there must be a way to code this
properly.

Thanks,

-- 
Seb

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Reading python list as a newsgroup (was ...)

2016-12-16 Thread D'Arcy Cain

On 2016-12-16 08:16 PM, Dennis Lee Bieber wrote:

Unfortunately, my client can only "pre filter" on subject and author; I
could kill all @gmail.*, but could not focus on just Google Groups
submissions...


I don't know what client you use but perhaps you can adapt this procmail 
recipe.


:0 Hir
* ^List-Id:.*python-list.python.org
* ^From:.*@gmail.com
* ^Newsgroups:.*
/dev/null


--
D'Arcy J.M. Cain
System Administrator, Vex.Net
http://www.Vex.Net/ IM:[email protected]
VoIP: sip:[email protected]
--
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a way to insert hooks into a native dictionary type to see when a query arrives and what's looked up?

2016-12-16 Thread Veek M
Steven D'Aprano wrote:

> On Wednesday 14 December 2016 17:11, Veek M wrote:
> 
>> I know that with user classes one can define getattr, setattr to
>> handle dictionary lookup. Is there a way to hook into the native
>> dict() type and see in real time what's being queried.
> 
> Not easily, and maybe not at all.
> 
> There are two obvious ways to do this:
> 
> (1) monkey-patch the object's __dict__, and the class __dict__.
> 
> Unfortunately, Python doesn't support monkey-patching built-ins.
> 
> https://en.wikipedia.org/wiki/Monkey_patch
> 
> Or perhaps I should say, *fortunately* Python doesn't support it.
> 
> http://www.virtuouscode.com/2008/02/23/why-monkeypatching-is-destroying-ruby/
> 
> (2) Alternatively, you could make a dict subclass, and replace the
> class and instance __dict__ with your own.
> 
> Unfortunately, you cannot replace the __dict__ of a class:
> 
> py> class X:  # the class you want to hook into
> ... pass
> ...
> py> class MyDict(dict):  # my custom dict
> ... def __getitem__(self, key):
> ... print(key)
> ... return super().__getitem__(key)
> ...
> py> d = MyDict()
> py> d.update(X.__dict__)
> py> X.__dict__ = d
> Traceback (most recent call last):
>   File "<stdin>", line 1, in <module>
> AttributeError: attribute '__dict__' of 'type' objects is not writable
> 
> 
> You can replace the instance dict, but Python won't call your
> __getitem__ method:
> 
> py> instance = X()
> py> instance.__dict__ = MyDict()
> py> instance.a = 999
> py> instance.a
> 999
> 
> So the short answer is, No.
> 
> You might be able to create a completely new metaclass that supports
> this, but it would be a lot of work, and I'm not even sure that it
> would be successful.
> 
> 
> 
>> I wanted to check if when one does:
>> 
>> x.sin()
>> 
>> if the x.__dict__ was queried or if the Foo.__dict__ was queried..
> 
> The easiest way to do that is something like this:
> 
> 
> py> class Test:
> ... def sin(self):
> ... return 999
> ...
> py> x = Test()
> py> x.sin
> <bound method Test.sin of <__main__.Test object at 0x...>>
> py> x.sin()
> 999
> py> x.sin = "surprise!"
> py> x.sin
> 'surprise!'
> 
> 
> 
> So now you know: an instance attribute will shadow the class
> attribute.
> 
> (Actually, that's not *completely* true. It depends on whether x.sin
> is a descriptor or not, and if so, what kind of descriptor.)
> 
> 

heh
If it walks like a duck and talks like a duck, it’s a duck, right? So if 
this duck is not giving you the noise that you want, you’ve got to just 
punch that duck until it returns what you expect. -Patrick Ewing on 
Monkey/Duck patching in RailsConf 2007
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Cache memory and its effect on list searching

2016-12-16 Thread sohcahtoa82
On Friday, December 16, 2016 at 6:27:24 PM UTC-8, Chris Angelico wrote:
> On Sat, Dec 17, 2016 at 1:20 PM,   wrote:
> > I thought this was curious behavior.  I created a list of random-looking 
> > strings, then made a sorted copy.  I then found that using "in" to see if a 
> > string exists in the sorted list took over 9 times as long!
> >
> 
> My numbers replicate yours (though my computer's faster). But my
> conclusion is different:
> 
> Python 3.7.0a0 (default:ecd218c41cd4, Dec 16 2016, 03:08:47)
> [GCC 6.2.0 20161027] on linux
> Type "help", "copyright", "credits" or "license" for more information.
> >>> import hashlib
> >>> import timeit
> >>> hashes = [hashlib.md5(bytes(str(i), "utf-8")).hexdigest() for i in range(1000)]
> >>> sortedhashes = sorted(hashes)
> >>> timeit.timeit('"x" in hashes', globals={'hashes': hashes}, number=10)
> 0.8167107938788831
> >>> timeit.timeit('"x" in hashes', globals={'hashes': sortedhashes}, number=10)
> 5.029693723190576
> >>> timeit.timeit('"x" in hashes', globals={'hashes': hashes}, number=10)
> 0.855183657258749
> >>> timeit.timeit('"x" in hashes', globals={'hashes': sortedhashes}, number=10)
> 5.0585526106879115
> >>> sethashes = set(hashes)
> >>> timeit.timeit('"x" in hashes', globals={'hashes': sethashes}, number=10)
> 6.13601878285408e-06
> 
> You want containment checks? Use a set :)
> 
> ChrisA

Ah, I forgot about the optimizations of a set.  The only time I ever find 
myself using set is when I want to remove all the duplicates in a list.  I 
convert it to a set and then back.

For fun, I ran an experiment:

### BEGIN listsearch.py
import hashlib 
import timeit
import sys

from bisect import bisect_left

def bisect_search(a, x, lo=0, hi=None):  # can't use a to specify default for hi
    # From http://stackoverflow.com/questions/212358/binary-search-bisection-in-python
    hi = hi if hi is not None else len(a)  # hi defaults to len(a)
    pos = bisect_left(a, x, lo, hi)        # find insertion position
    return (pos if pos != hi and a[pos] == x else -1)  # don't walk off the end

def bin_search(haystack, needle):
    start = 0
    end = len(haystack) - 1
    while True:
        if start > end:
            return False
        middle = (start + end) // 2
        if needle < haystack[middle]:
            end = middle - 1
            continue
        elif needle > haystack[middle]:
            start = middle + 1
            continue
        elif needle == haystack[middle]:
            return True

print('Python version: ', sys.version_info)
print('building hashes...')
hashes = [hashlib.md5(bytes(str(i), "utf-8")).hexdigest() for i in range(1000)]
print('sorting...')
sortedhashes = sorted(hashes)
print('creating set...')
sethashes = set(hashes)
print('Unsorted list:', timeit.timeit('"x" in hashes', globals={'hashes': hashes}, number=10))
print('Sorted:', timeit.timeit('"x" in hashes', globals={'hashes': sortedhashes}, number=10))
print('set:', timeit.timeit('"x" in hashes', globals={'hashes': sethashes}, number=10))
print('binary search:', timeit.timeit('binsearch(hashes, "x")', globals={'hashes': sortedhashes, 'binsearch': bin_search}, number=10))
print('binary search with bisect:', timeit.timeit('binsearch(hashes, "x")', globals={'hashes': sortedhashes, 'binsearch': bisect_search}, number=10))
### END listsearch.py

> python3 listsearch.py
Python version:  sys.version_info(major=3, minor=5, micro=2, 
releaselevel='final', serial=0)
building hashes...
sorting...
creating set...
Unsorted list: 1.7384763684627569
Sorted: 9.248799958145042
set: 1.4614161294446149e-06
binary search: 0.00010902164328108199
binary search with bisect: 1.7829276782066472e-05

Yup.  set is definitely the way to go!
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Is there a way to insert hooks into a native dictionary type to see when a query arrives and what's looked up?

2016-12-16 Thread Chris Angelico
On Sat, Dec 17, 2016 at 3:42 PM, Veek M  wrote:
> If it walks like a duck and talks like a duck, it’s a duck, right? So if
> this duck is not giving you the noise that you want, you’ve got to just
> punch that duck until it returns what you expect. -Patrick Ewing on
> Monkey/Duck patching in RailsConf 2007

https://www.youtube.com/watch?v=PFn_agxmatg

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Cache memory and its effect on list searching

2016-12-16 Thread Chris Angelico
On Sat, Dec 17, 2016 at 3:44 PM,   wrote:
>> python3 listsearch.py
> Python version:  sys.version_info(major=3, minor=5, micro=2, 
> releaselevel='final', serial=0)
> building hashes...
> sorting...
> creating set...
> Unsorted list: 1.7384763684627569
> Sorted: 9.248799958145042
> set: 1.4614161294446149e-06
> binary search: 0.00010902164328108199
> binary search with bisect: 1.7829276782066472e-05
>
> Yup.  set is definitely the way to go!

More than that: the lists are searched in linear time, the binary
search runs in logarithmic time, but the set lookup is constant time.
Doesn't matter how big your set is, you can test for membership with
one hash lookup.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list


[RELEASE] Python 3.6.0rc2 is now available

2016-12-16 Thread Ned Deily
On behalf of the Python development community and the Python 3.6 release
team, I would like to announce the availability of Python 3.6.0rc2. 3.6.0rc2
is the second release candidate for Python 3.6, the next major release of
Python.

Code for 3.6.0 is now frozen.  3.6.0rc2 is the same code base as the first
release candidate, 3.6.0rc1, with the addition of fixes for a couple of
critical problems and with some documentation additions and updates.
Assuming no further release critical problems are found prior to the 3.6.0
final release date, now planned for 2016-12-23, the 3.6.0 final release
will be the same code base as this 3.6.0rc2.  Maintenance releases for the
3.6 series will follow at regular intervals starting in the first quarter
of 2017.

Among the major new features in Python 3.6 are:

* PEP 468 - Preserving the order of **kwargs in a function
* PEP 487 - Simpler customization of class creation
* PEP 495 - Local Time Disambiguation
* PEP 498 - Literal String Formatting
* PEP 506 - Adding A Secrets Module To The Standard Library
* PEP 509 - Add a private version to dict
* PEP 515 - Underscores in Numeric Literals
* PEP 519 - Adding a file system path protocol
* PEP 520 - Preserving Class Attribute Definition Order
* PEP 523 - Adding a frame evaluation API to CPython
* PEP 524 - Make os.urandom() blocking on Linux (during system startup)
* PEP 525 - Asynchronous Generators (provisional)
* PEP 526 - Syntax for Variable Annotations (provisional)
* PEP 528 - Change Windows console encoding to UTF-8
* PEP 529 - Change Windows filesystem encoding to UTF-8
* PEP 530 - Asynchronous Comprehensions

Please see "What’s New In Python 3.6" for more information:

https://docs.python.org/3.6/whatsnew/3.6.html

You can find Python 3.6.0rc2 here:

https://www.python.org/downloads/release/python-360rc2/

Note that 3.6.0rc2 is still a preview release and thus its use is not
recommended for production environments.

More information about the release schedule can be found here:

https://www.python.org/dev/peps/pep-0494/

--
  Ned Deily
  [email protected] -- []

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Cache memory and its effect on list searching

2016-12-16 Thread Steve D'Aprano
On Sat, 17 Dec 2016 03:55 pm, Chris Angelico wrote:

> More than that: the lists are searched in linear time, the binary
> search runs in logarithmic time, but the set lookup is constant time.
> Doesn't matter how big your set is, you can test for membership with
> one hash lookup.

To be pedantic: it is constant time on average.

Best case is one hash and one equality test. Worst case is if all the set
elements have colliding hashes, in which case it degenerates to a linear
search.

There is a class of denial of service attacks where the attacker can specify
the keys in a dict in such a way that lookups collide, and can push new
colliding items into the dictionary (or set) faster than Python can perform
the lookups. That's why Python now has hash randomisation:

http://bugs.python.org/issue13703
https://www.python.org/dev/peps/pep-0456/


But outside of contrived or hostile examples, where elements of the set are
designed to collide, typically you would expect hash collisions to be rare,
and long chains of colliding elements even rarer.  For random elements,
most will require only a single hash and a single equality test, a small
number might require two tests, three would be even rarer, and so forth.

So strictly speaking, and I realise I'm being exceedingly pedantic here, a
*sufficiently large* dict or set MUST have colliding elements. How large
is "sufficiently large"? It's at least 2**31, more than two billion, so
while you are right that *in practice* set/dict lookups require only a
single hash + equality, in principle (and sometimes in practice too)
collisions can be significant.

Nevertheless, I mention this only for completeness. In practice, you almost
never need to care about hash collisions except for the most pathological
cases.



-- 
Steve
“Cheer up,” they said, “things could be worse.” So I cheered up, and sure
enough, things got worse.

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: Cache memory and its effect on list searching

2016-12-16 Thread Chris Angelico
On Sat, Dec 17, 2016 at 5:12 PM, Steve D'Aprano
 wrote:
> On Sat, 17 Dec 2016 03:55 pm, Chris Angelico wrote:
>
>> More than that: the lists are searched in linear time, the binary
>> search runs in logarithmic time, but the set lookup is constant time.
>> Doesn't matter how big your set is, you can test for membership with
>> one hash lookup.
>
> To be pedantic: it is constant time on average.

True. And technically, you could say that a linear or binary search
has a best-case of constant time (if you find it right at the
beginning or middle of the list). I was talking about the average.

> But outside of contrived or hostile examples, where elements of the set are
> designed to collide, typically you would expect hash collisions to be rare,
> and long chains of colliding elements even rarer.  For random elements,
> most will require only a single hash and a single equality test, a small
> number might require two tests, three would be even rarer, and so forth.

Yep. Normal case, it's going to be one or two tests - and that doesn't
depend on the length of the list. The degenerate case does, but the
normal case doesn't.

> So strictly speaking, and I realise I'm being exceedingly pedantic here, a
> *sufficiently large* dict or set MUST have colliding elements. How large
> is "sufficiently large"? Its at least 2**31, more than two billion, so
> while you are right that *in practice* set/dict lookups require only a
> single hash + equality, in principle (and sometimes in practice too)
> collisions can be significant.

That assumes the hash has a finite size. If you have enough memory to
store that many unique objects, you can probably afford to use a
hashing algorithm that allows enough room. However, the chances of not
having collisions depend on the capacity. And the chances of having no
collisions at all are pretty low... as a rule of thumb, I estimate on
a 50-50 chance of a collision when capacity is the square of usage. So
if you have a thousand entries but capacity for a million, you have a
50% chance of having at least one collision. (The numbers aren't quite
right, but it's a good rule of thumb.)
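The rule of thumb above is the birthday problem; the standard
exponential approximation (a sketch of my own, not from the thread)
makes it concrete:

```python
import math

def p_collision(n, capacity):
    """Approximate probability of at least one collision when n items
    are spread uniformly over `capacity` buckets (birthday problem)."""
    return 1 - math.exp(-n * (n - 1) / (2 * capacity))

# A thousand entries with capacity a million (capacity = usage**2):
# roughly a coin flip, about 0.39 by this approximation, which is
# close to the 50-50 rule of thumb.
p = p_collision(1000, 1_000_000)
```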

> Nevertheless, I mention this only for completeness. In practice, you almost
> never need to care about hash collisions except for the most pathological
> cases.

Indeed. That's why hash tables are so prevalent, despite their
appalling worst-case.

ChrisA
-- 
https://mail.python.org/mailman/listinfo/python-list