RE: The Zen of D.E.K.

2023-01-14 Thread avi.e.gross
I can appreciate a beautiful piece of code but I can also appreciate another
piece of code that does things in another pleasing way so there is quite a
bit of subjectivity here.

And, in yet another computer language, the implementation of what seems to
be the same algorithm is somewhat jarring as it does not quite fit the
environment.

Some people consider the symmetry of a language that ends an IF statement
with FI to be sort of pleasing. Others feel that way about matched opposing
braces and yet others like having the same symbol, such as an unadorned
double quote or slash, mark both the beginning and the end.

It goes way deeper than that but I think there is plenty of subjectivity in
what people find pleasing. Some adore it if an algorithm is a very curt and
mysterious one-liner while others like when code is lined up just so on
multiple lines, perhaps using a nice color scheme in their editor. Some
adore copious detailed comments while others find they get in the way.

Efficiency is another matter but again has some subjectivity and variations.
The same algorithm can be much more efficient in one language/implementation
than another, yet less efficient in other respects. If an algorithm must sort
a billion items, the algorithm may dominate the resources used. But to sort a
small number of items, invoking and loading an external module that has a
faster method than the built-in way may be much slower overall if it is used
only once, because of the loading overhead.

In the real world, there are other candidates for what is in some sense
better to do. One example is how fast it can be designed and implemented and
another might be if it tends to generate fewer bugs and glitches. A big one
is if it saves the company money in creating and maintaining it or at
runtime. And, of course, a good implementation of an algorithm is one that
others, perhaps less extremely educated than you, can later read and
understand well enough to modify, or perhaps port to another language that
does things differently from the one you wrote it in.

Efficiency keeps being relative as languages evolve. A change in the
interpreter may add features that end up making the feature you chose
slow down a bit. Replacing some functionality with a version written in a
compiled language like C can often speed it up. Changing an algorithm from
using a list to a numpy array can make a dramatic difference even as the
skeleton of the algorithm remains the same in terms of aesthetics.

Amusingly, I have been reading about ideas of Aesthetics and sort of beauty
by Mathematicians and Physicists in how it guides them in their work. Knuth
and others in C.S. are arguably doing similar things.

-Original Message-
From: Python-list  On
Behalf Of Ethan Furman
Sent: Friday, January 13, 2023 1:00 PM
To: [email protected]
Subject: Re: The Zen of D.E.K.

On 1/13/23 09:06, Stefan Ram wrote:

 >"Beautiful is better than ugly." - The Zen of Python
 >
 >This says nothing. You have to sacrifice something that
 >really has /value/!
 >
 >"[A]esthetics are more important than efficiency." - Donald E. Knuth

[okay, falling for the troll bait]

Those two things do not say the same thing; further, in Python at least, and
depending on the situation, aesthetics may /not/ be more important than
efficiency.

--
~Ethan~
--
https://mail.python.org/mailman/listinfo/python-list



RE: To clarify how Python handles two equal objects

2023-01-14 Thread Jen Kris via Python-list
Avi, 

Your comments go farther afield than my original question, but you made some 
interesting additional points.  For example, I sometimes work with the C API 
and sys.getrefcount may be helpful in deciding when to INCREF and DECREF.  But 
that’s another issue. 

The situation I described in my original post is limited to a case such as x = 
y where both "x" and "y" are arrays – whether they are lists in Python, or from 
the array module – and the question in a compiled C extension is whether the 
assignment can be done simply by "x" taking the pointer to "y" rather than 
moving all the data from "y" into the memory buffer for "x" which, for a wide 
array, would be much more time consuming than just moving a pointer.  The other 
advantage to doing it that way is if, as in my case, we perform a math 
operation on any element in "x" then Python expects the same change to be 
reflected in "y."  If I don’t use the same pointers then I would have to 
perform that operation twice – once for "x" and once  for "y" – in addition to 
the expense of moving all the data. 

The answers I got from this post confirmed that I can use the pointer if "y" 
is not re-defined to something else during the lifespan of "x."  If it is then 
"x" has to be restored to its original pointer.  I did it that way, and 
helpfully the compiler did not overrule me. 
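
A minimal sketch of the behaviour described above, seen from the Python side
rather than from inside a C extension. The names and values are illustrative
assumptions. It shows that after x = y a change made through either name is
visible through the other, and that rebinding "y" later leaves "x" attached
to the original object.

```python
from array import array

y = array("d", [1.0, 2.0, 3.0])
x = y                      # no data is copied; "x" and "y" name one object
print(x is y)              # True

x[1] += 5                  # change made through "x" ...
print(y[1])                # ... is visible through "y": 7.0

y = array("d", [9.0])      # rebinding "y" does not touch the object "x" names
print(x is y)              # False
print(list(x))             # [1.0, 7.0, 3.0]
```

The same lines behave identically if the array is replaced by a plain list.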


Jan 13, 2023, 18:41 by [email protected]:

> Jen,
>
> This may not be on target but I was wondering about your needs in this 
> category. Are all your data in a form where all in a cluster are the same 
> object type, such as floating point?
>
> Python has features designed to allow you to get multiple views on such 
> objects such as memoryview that can be used to say see an array as a matrix 
> of n rows by m columns, or m x n, or any other combo. And of course the 
> fuller numpy package has quite a few features.
>
> However, as you note, there is no guarantee that any reference to the data 
> may not shift away from it unless you build fairly convoluted logic or data 
> structures such as having an object that arranges to do something when you 
> try to remove it, such as tinkering with the __del__ method as well as 
> whatever method is used to try to set it to a new value. I guess that might 
> make sense for something like asynchronous programming including when setting 
> locks so multiple things cannot overlap when being done.
>
> Anyway, some of the packages like numpy are optimized in many ways but if you 
> want to pass a subset of sorts to make processing faster, I suspect you could 
> do things like pass a memoryview but it might not be faster than what you 
> build albeit probably more reliable and portable.
>
> I note another odd idea that others may have mentioned, with caution.
>
> If you load the sys module, you can CAREFULLY use code like this.
>
> import sys
> a="Something Unique"
> sys.getrefcount(a)
> 2
>
> Note if a==1 you will get some huge number of references and this is 
> meaningless. The 2 above is because asking about how many references also 
> references it.
>
> So save whatever number you have and see what happens when you make a second 
> reference or a third, and what happens if you delete or alter a reference:
>
> a="Something Unique"
> sys.getrefcount(a)
> 2
> b = a
> sys.getrefcount(a)
> 3
> sys.getrefcount(b)
> 3
> c = b
> d = a
> sys.getrefcount(a)
> 5
> sys.getrefcount(d)
> 5
> del(a)
> sys.getrefcount(d)
> 4
> b = "something else"
> sys.getrefcount(d)
> 3
>
> So, in theory, you could carefully write your code to CHECK the reference 
> count had not changed but there remain edge cases where a removed reference 
> is replaced by yet another new reference and you would have no idea.
>
> Avi
>
>
> -Original Message-
> From: Python-list  On 
> Behalf Of Jen Kris via Python-list
> Sent: Wednesday, January 11, 2023 1:29 PM
> To: Roel Schroeven 
> Cc: [email protected]
> Subject: Re: To clarify how Python handles two equal objects
>
> Thanks for your comments.  After all, I asked for clarity so it’s not 
> pedantic to be precise, and you’re helping to clarify. 
>
> Going back to my original post,
>
> mx1 = [ [ 1, 2, 3 ], [ 4, 5, 6 ], [ 7, 8, 9 ] ]
> arr1 = mx1[2]
>
> Now if I write "arr1[1] += 5" then both arr1 and mx1[2][1] will be changed 
> because while they are different names, they are assigned the same memory 
> location (pointer).  Similarly, if I write "mx1[2][1] += 5" then again both 
> names will be updated. 
>
> That’s what I meant by "an operation on one is an operation on the other."  
> To be more precise, an operation on one name will be reflected in the other 
> name.  The difference is in the names,  not the pointers.  Each name has the 
> same pointer in my example, but operations can be done in Python using either 
> name. 
>
>
>
>
> Jan 11, 2023, 09:13 by [email protected]:
>
>> On 11/01/2023 at 16:33, Jen Kris via Python-list wrote:
>>
>>> Yes, I did understand that.  In your example, "a" and "b" ar

Re: To clarify how Python handles two equal objects

2023-01-14 Thread Chris Angelico
On Sun, 15 Jan 2023 at 10:32, Jen Kris via Python-list
 wrote:
> The situation I described in my original post is limited to a case such as x 
> = y ... the assignment can be done simply by "x" taking the pointer to "y" 
> rather than moving all the data from "y" into the memory buffer for "x"
>

It's not simply whether it *can* be done. It, in fact, *MUST* be done
that way. The ONLY meaning of "x = y" is that you now have a name "x"
which refers to whatever object is currently found under the name "y".
This is not an optimization, it is a fundamental of Python's object
model. This is true regardless of what kind of object this is; every
object must behave this way.
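
To illustrate the point that this holds for every kind of object, here is a
small sketch. The sample objects and the function name greet are arbitrary
choices for the illustration. It also shows that getting a separate object
takes an explicit request, here via copy.deepcopy.

```python
import copy

def greet():
    return "hello"

# "x = y" binds a second name to the same object, whatever its type.
for y in (42, "text", (1, 2), [1, 2], {"k": "v"}, greet, None):
    x = y
    assert x is y          # same object, nothing was copied

# A distinct object only appears when one is asked for explicitly.
y = [[1, 2], [3, 4]]
x = copy.deepcopy(y)
print(x == y)   # True: equal values
print(x is y)   # False: two distinct objects
```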

ChrisA


Re: To clarify how Python handles two equal objects

2023-01-14 Thread Jen Kris via Python-list
Yes, in fact I asked my original question – "I discovered something about 
Python array handling that I would like to clarify" -- because I saw that 
Python did it that way.  



Jan 14, 2023, 15:51 by [email protected]:

> On Sun, 15 Jan 2023 at 10:32, Jen Kris via Python-list
>  wrote:
>
>> The situation I described in my original post is limited to a case such as x 
>> = y ... the assignment can be done simply by "x" taking the pointer to "y" 
>> rather than moving all the data from "y" into the memory buffer for "x"
>>
>
> It's not simply whether it *can* be done. It, in fact, *MUST* be done
> that way. The ONLY meaning of "x = y" is that you now have a name "x"
> which refers to whatever object is currently found under the name "y".
> This is not an optimization, it is a fundamental of Python's object
> model. This is true regardless of what kind of object this is; every
> object must behave this way.
>
> ChrisA
>



Re: To clarify how Python handles two equal objects

2023-01-14 Thread Chris Angelico
On Sun, 15 Jan 2023 at 11:38, Jen Kris  wrote:
>
> Yes, in fact I asked my original question – "I discovered something about 
> Python array handling that I would like to clarify" -- because I saw that 
> Python did it that way.
>

Yep. This is not specific to arrays; it is true of all Python objects.
Also, I suspect you're still thinking about things backwards, and am
trying to lead you to a completely different way of thinking that
actually does align with Python's object model.

ChrisA


Re: To clarify how Python handles two equal objects

2023-01-14 Thread Roel Schroeven



Chris Angelico wrote on 15/01/2023 at 1:41:

> On Sun, 15 Jan 2023 at 11:38, Jen Kris  wrote:
>>
>> Yes, in fact I asked my original question – "I discovered something about
>> Python array handling that I would like to clarify" -- because I saw that
>> Python did it that way.
>>
>
> Yep. This is not specific to arrays; it is true of all Python objects.
> Also, I suspect you're still thinking about things backwards, and am
> trying to lead you to a completely different way of thinking that
> actually does align with Python's object model.

Indeed, I also still have the impression that Jen is thinking in terms 
of variables that are possibly aliased such as you can have in a 
language like C, instead of objects with one or more names like we have 
in Python. Jen, in the Python model you really have to think of the 
objects largely independently of the names that are or are not 
referencing the objects.


--
"Ever since I learned about confirmation bias, I've been seeing
it everywhere."
-- Jon Ronson



Re: To clarify how Python handles two equal objects

2023-01-14 Thread Frank Millman

On 2023-01-15 4:36 AM, Roel Schroeven wrote:



> Chris Angelico wrote on 15/01/2023 at 1:41:
>
>> On Sun, 15 Jan 2023 at 11:38, Jen Kris  wrote:
>>>
>>> Yes, in fact I asked my original question – "I discovered something
>>> about Python array handling that I would like to clarify" -- because I
>>> saw that Python did it that way.
>>>
>>
>> Yep. This is not specific to arrays; it is true of all Python objects.
>> Also, I suspect you're still thinking about things backwards, and am
>> trying to lead you to a completely different way of thinking that
>> actually does align with Python's object model.
>
> Indeed, I also still have the impression that Jen is thinking in terms
> of variables that are possibly aliased such as you can have in a
> language like C, instead of objects with one or more names like we have
> in Python. Jen, in the Python model you really have to think of the
> objects largely independently of the names that are or are not
> referencing the objects.




My 'aha' moment came when I understood that a python object has only 
three properties - a type, an id, and a value. It does *not* have a name.
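
A short sketch of that idea, with made-up variable names: the type, the id
and the value belong to the object, while names are just bindings that can
be added and removed without changing it.

```python
values = [10, 20, 30]
also_values = values                    # a second name; the object is unchanged

print(type(values))                     # the object's type
print(id(values) == id(also_values))    # True: one identity, two names
print(values)                           # the object's value

del values                              # removes a name, not the object
print(also_values)                      # [10, 20, 30]
```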


Frank Millman



Fast lookup of bulky "table"

2023-01-14 Thread Dino



Hello, I have built a PoC service in Python Flask for my work, and - now 
that the point is made - I need to make it a little more performant (to 
be honest, chances are that someone else will pick up from where I left 
off, and implement the same service from scratch in a different language 
(GoLang? .Net? Java?) but I am digressing).


Anyway, my Flask service initializes by loading a big "table" of 100k 
rows and 40 columns or so (memory footprint: order of 300 MB) and then 
accepts queries through a REST endpoint. Columns are strings, enums, and 
numbers. Once initialized, the table is read only. The endpoint will 
parse the query and match it against column values (equality, 
inequality, greater than, etc.) Finally, it will return a (JSON) list of 
all rows that satisfy all conditions in the query.
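
For readers who want something concrete, the matching step described above
could look roughly like the following. This is only a sketch under
assumptions: the row layout, the column names, the operator names and the
run_query function are all invented for illustration and are not taken from
the actual service.

```python
import operator

# Hypothetical miniature table; the real one has ~100k rows and ~40 columns.
ROWS = [
    {"name": "alpha", "kind": "A", "size": 10},
    {"name": "beta",  "kind": "B", "size": 25},
    {"name": "gamma", "kind": "A", "size": 40},
]

# Map the comparison names a query might use onto functions.
OPS = {
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "lt": operator.lt,
}

def run_query(rows, conditions):
    """Return every row satisfying all (column, op, value) conditions."""
    return [
        row for row in rows
        if all(OPS[op](row[col], value) for col, op, value in conditions)
    ]

result = run_query(ROWS, [("kind", "eq", "A"), ("size", "gt", 15)])
print([row["name"] for row in result])   # ['gamma']
```

A scan like this touches every row for every query, which is consistent with
query time growing with table size.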


As you can imagine, this is not very performant in its current form, but 
performance was not the point of the PoC - at least initially.


Before I deliver the PoC to a more experienced software architect who 
will look at my code, though, I wouldn't mind looking a bit less lame 
and doing something about performance in my own code first, possibly by 
bringing the average time for queries down from where it is now (order 
of 1 to 4 seconds per query on my laptop) to 1 or 2 milliseconds on 
average.


To be honest, I was already able to bring the time down to a handful of 
microseconds thanks to a rudimentary cache that will associate the 
"signature" of a query to its result, and serve it the next time the 
same query is received, but this may not be good enough: 1) queries 
might be many and very different from one another each time, AND 2) I am 
not sure the server will have a ton of RAM if/when this thing - or 
whatever is derived from it - is placed into production.


How can I make my queries generally more performant, ideally also in 
case of a new query?


Here's what I have been considering:

1. making my cache more "modular", i.e. cache the result of certain 
(wide) queries. When a complex query comes in, I may be able to restrict 
my search to a subset of the rows (as determined by a previously cached 
partial query). This should keep the memory footprint under control.


2. loading my data into a numpy.array and using numpy.array operations to 
slice and dice my data.


3. loading my data into sqlite3 and using SELECT statements to query my table. 
I have never used sqlite, plus there's some extra complexity as 
comparing certain columns requires custom logic, but I wonder if this 
architecture would also work well when dealing with a 300 MB database.


4. Other ideas?
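
As a rough sketch of option 3, the standard-library sqlite3 module can hold
the table in an in-memory database and expose custom comparison logic to SQL
through create_function. The table name, columns, index and the same_initial
function below are invented for illustration; whether this is fast enough for
the real data would have to be measured.

```python
import sqlite3

conn = sqlite3.connect(":memory:")          # in-memory database, filled once at startup
conn.row_factory = sqlite3.Row
conn.execute("CREATE TABLE items (name TEXT, kind TEXT, size INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("alpha", "A", 10), ("beta", "B", 25), ("gamma", "A", 40)],
)
# An index on a frequently queried column lets SQLite avoid a full scan.
conn.execute("CREATE INDEX idx_items_kind ON items (kind)")

# Custom comparison logic can be registered as a SQL function.
def same_initial(a, b):
    return int(a[:1].lower() == b[:1].lower())

conn.create_function("same_initial", 2, same_initial)

rows = conn.execute(
    "SELECT name FROM items WHERE kind = ? AND size > ? AND same_initial(name, ?)",
    ("A", 5, "Apple"),
).fetchall()
print([r["name"] for r in rows])   # ['alpha']
```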

Hopefully I made sense. Thank you for your attention

Dino