Re: Comparing sequences with range objects

2022-04-08 Thread Antoon Pardon




On 8/04/2022 at 08:24, Peter J. Holzer wrote:

On 2022-04-07 17:16:41 +0200, Antoon Pardon wrote:

On 7/04/2022 at 16:08, Joel Goldstick wrote:

On Thu, Apr 7, 2022 at 7:19 AM Antoon Pardon   wrote:

I am working with a list of data from which I have to weed out duplicates.
At the moment I keep for each entry a container with the other entries
that are still possible duplicates.

[...]

Sorry, I wasn't clear. The data contains information about persons. But not
all records need to be complete. So a person can occur multiple times in
the list, while the records are all different because they are missing
different bits.

So all records with the same firstname can be duplicates. But if I have
a record in which the firstname is missing, it can at that point be
a duplicate of all other records.

There are two problems. The first is how to establish identity.
The second is how to weed out identical objects. In your first mail
you only asked about the second, but that's easy.

The first is really hard. Not only may information be missing, no single
piece of information is unique or immutable. Two people may have
the same name (I know about several other "Peter Holzer"s), a single
person might change their name (when I was younger I went by my middle
name - how would you know that "Peter Holzer" and "Hansi Holzer" are the
same person?), they will move (= change their address), change jobs,
etc. Unless you have a unique immutable identifier that's enforced by
some authority (like a social security number[1]), I don't think there
is a chance to do that reliably in a program (although with enough data,
a heuristic may be good enough).
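The "easy" second problem, dropping records that are exact copies of each other, can be sketched with a dict keyed on a hashable form of each record, keeping the first record seen for each key (dicts preserve insertion order, guaranteed since Python 3.7). This assumes each record is a dict with string keys and hashable values; `dedupe_exact` is a hypothetical helper, not code from the thread.

```python
def dedupe_exact(records):
    """Hypothetical helper: drop exactly equal records, keeping first occurrences in order."""
    first_seen = {}
    for rec in records:
        # Sorting the items makes field order irrelevant; values must be hashable.
        key = tuple(sorted(rec.items()))
        first_seen.setdefault(key, rec)
    return list(first_seen.values())


rows = [
    {"firstname": "Peter", "lastname": "Holzer"},
    {"lastname": "Holzer", "firstname": "Peter"},  # same record, different field order
    {"firstname": None, "lastname": "Holzer"},     # incomplete, so not an exact copy
]
print(dedupe_exact(rows))
```

The incomplete third record survives: exact deduplication says nothing about records that merely might describe the same person, which is the hard first problem.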


Yes, I know all that. That is why I keep a bucket of possible duplicates
per "identifying" field that is examined and use some heuristics at the
end of all the comparing instead of starting to weed out the duplicates
at the moment something differs.
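A rough sketch of one such per-field bucket, assuming records are dicts and a missing field is stored as None (both assumptions; `candidate_buckets` is a hypothetical name). A record whose field is missing gets every other index as a candidate, which is where the memory cost discussed in this message comes from.

```python
from collections import defaultdict


def candidate_buckets(records, field):
    """Hypothetical helper: for each record, the set of other record indexes
    that may be duplicates judged by one field. None matches every record."""
    by_value = defaultdict(set)  # field value -> indexes having that value
    missing = set()              # indexes where the field is missing
    for i, rec in enumerate(records):
        value = rec.get(field)
        if value is None:
            missing.add(i)
        else:
            by_value[value].add(i)

    everyone = set(range(len(records)))
    buckets = []
    for i, rec in enumerate(records):
        value = rec.get(field)
        if value is None:
            # Unusable field: conceptually a duplicate of all other records.
            buckets.append(everyone - {i})
        else:
            # Same value, or no value at all, can still be a duplicate.
            buckets.append((by_value[value] | missing) - {i})
    return buckets


people = [
    {"firstname": "Peter", "lastname": "Holzer"},
    {"firstname": None, "lastname": "Holzer"},
    {"firstname": "Peter", "lastname": None},
    {"firstname": "Anna", "lastname": "Holzer"},
]
print(candidate_buckets(people, "firstname"))
# [{1, 2}, {0, 2, 3}, {0, 1}, {1}]
```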

The problem is that when an identifying field is judged to be unusable,
the bucket associated with it should conceptually contain all other
records (which in this case are the indexes into the population list).
But that would eat a lot of memory. So I want some object that behaves as
if it were an immutable list of all these indexes without actually containing
them. A range object almost works, the only problem being that it is not
comparable with a list.

--
Antoon Pardon.
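In Python 3 a range only compares equal to another range, so range(3) == [0, 1, 2] is False. One way around that, sketched here under the assumption that element-wise equality with any sequence is what's wanted, is a thin collections.abc.Sequence wrapper that holds just a range object but compares like a list. `LazyIndexes` is a hypothetical class name, not a standard library type.

```python
from collections.abc import Sequence


class LazyIndexes(Sequence):
    """Hypothetical wrapper: immutable, list-like view of a range that compares
    equal to any sequence with the same elements (unlike a plain range)."""

    def __init__(self, *args):
        self._r = range(*args)  # constant memory, whatever the length

    def __len__(self):
        return len(self._r)

    def __getitem__(self, index):
        result = self._r[index]
        if isinstance(result, range):  # slicing a range gives a range
            return LazyIndexes(result.start, result.stop, result.step)
        return result

    def __iter__(self):
        return iter(self._r)

    def __contains__(self, value):
        return value in self._r  # constant time for ints, no scan needed

    def __eq__(self, other):
        if isinstance(other, LazyIndexes):
            return self._r == other._r
        if isinstance(other, Sequence):  # lists, tuples, ranges, ...
            return len(self) == len(other) and all(
                a == b for a, b in zip(self, other)
            )
        return NotImplemented

    __hash__ = None  # equality is by value, so opt out of hashing, like list

    def __repr__(self):
        r = self._r
        return f"LazyIndexes({r.start}, {r.stop}, {r.step})"


everyone = LazyIndexes(5)
print(range(5) == [0, 1, 2, 3, 4])   # False: a range only equals a range
print(everyone == [0, 1, 2, 3, 4])   # True
print([0, 1, 2, 3, 4] == everyone)   # True: list defers to LazyIndexes.__eq__
print(everyone[1:3], 3 in everyone)  # LazyIndexes(1, 3, 1) True
```

The reversed comparison works because list.__eq__ returns NotImplemented for non-lists, so Python falls back to the wrapper's __eq__.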

--
https://mail.python.org/mailman/listinfo/python-list


the downloaded version is not being detected

2022-04-08 Thread Putsala Bhavani Maha Laxmi
I am writing about a problem I ran into while using Python through an
IDE. The IDE does not detect this version of Python even though it is already
downloaded, and a suggestion box to install Python appears.

I kindly request you to look into this issue.

Thanking you




Re: Comparing sequences with range objects

2022-04-08 Thread duncan smith

On 08/04/2022 08:21, Antoon Pardon wrote:



Yes, I know all that. That is why I keep a bucket of possible duplicates
per "identifying" field that is examined and use some heuristics at the
end of all the comparing instead of starting to weed out the duplicates
at the moment something differs.

The problem is that when an identifying field is judged to be unusable,
the bucket associated with it should conceptually contain all other
records (which in this case are the indexes into the population list).
But that would eat a lot of memory. So I want some object that behaves as
if it were an immutable list of all these indexes without actually containing
them. A range object almost works, the only problem being that it is not
comparable with a list.



Is there any reason why you can't use ints? Just set the relevant bits.

Duncan
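Duncan's suggestion can be sketched as follows: a Python int has arbitrary precision, so it works as a bitset where bit i set means "record i is a possible duplicate". The "all other records" bucket then costs roughly one bit per record instead of one list slot per index, and intersection, union and equality are single operators. `bitset_from` and `all_others` are hypothetical helper names.

```python
def bitset_from(indexes):
    """Hypothetical helper: build an int with bit i set for every index i."""
    bits = 0
    for i in indexes:
        bits |= 1 << i
    return bits


def all_others(n, i):
    """Hypothetical helper: every index in range(n) except i.
    Set all n low bits, then clear bit i."""
    return ((1 << n) - 1) & ~(1 << i)


n = 6
unusable = all_others(n, 2)         # record 2's field is unusable
same_name = bitset_from([0, 3, 5])  # candidates according to another field
print(bin(unusable))                # 0b111011
print(bin(unusable & same_name))    # 0b101001 (intersection)
print(unusable == bitset_from([0, 1, 3, 4, 5]))  # True
```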


nptyping 2.0.0 has been released

2022-04-08 Thread Ramon Hagenaars
Hello all,


It fills me with joy to announce that nptyping 2.0.0 is released!

nptyping allows type hinting NumPy arrays with support for dynamic type
checking.

The most notable changes are:
* Complete rewrite that extends numpy.typing and is MyPy-friendly;
* "Shape Expressions" that allow for a rich notation of NumPy array shapes;
* Dropped support for Python 3.5 and 3.6.

For the full history:

https://github.com/ramonhagenaars/nptyping/blob/master/HISTORY.md


Links


Source is on GitHub:

https://github.com/ramonhagenaars/nptyping

Available on PyPI:

https://pypi.org/project/nptyping/

Documentation:

https://github.com/ramonhagenaars/nptyping/blob/master/USERDOCS.md

License (MIT):

https://github.com/ramonhagenaars/nptyping/blob/master/LICENSE


-- 
Ramon Hagenaars


Issues

2022-04-08 Thread Stevenson, John B via Python-list
Hello,

As a quick disclaimer, I apologize if you have received this message from me
multiple times. I've been having technical difficulties reaching this mailing
list. Thank you.

I'm trying to install Python on a computer so that I can use it for various
tasks for my job, like mapping and programming. However, the installer doesn't
seem to put the necessary files in the right place for me to run Python commands
from a command prompt. I can open the Python app just fine, but I cannot use it in
the terminal, which also breaks pip and prevents me from doing my work. What can
I do to fix this? The error I get is "'python' is not recognized as an internal
or external command, operable program or batch file." Thank you.

John B. (Jack) Stevenson


Re: Issues

2022-04-08 Thread MRAB

On 2022-04-08 20:35, Stevenson, John B via Python-list wrote:



Try the Python Launcher by typing "py" instead of "python".
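If "py" does start an interpreter, a few lines run inside it can show which executable that is and whether "python" and "pip" are found on the PATH that process sees, which is what the "'python' is not recognized" error is about. This is a diagnostic sketch using only the standard library; `describe_interpreter` is a hypothetical name. Running "py -m pip ..." is also a way to use pip without relying on PATH.

```python
import shutil
import sys
import sysconfig


def describe_interpreter():
    """Hypothetical helper: report the running interpreter and PATH lookups."""
    return {
        "executable": sys.executable,               # full path of this Python
        "version": sys.version.split()[0],
        "python_on_path": shutil.which("python"),   # None if not found on PATH
        "pip_on_path": shutil.which("pip"),
        "scripts_dir": sysconfig.get_path("scripts"),  # where pip's launcher lives
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If python_on_path is None, the folder containing the executable (and the scripts folder) is not on PATH; re-running the installer with its option to add Python to PATH, or adding those folders by hand, is the usual remedy.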


Re: the downloaded version is not being detected

2022-04-08 Thread MRAB

On 2022-04-08 17:28, Putsala Bhavani Maha Laxmi wrote:



"An IDE platform"? Which IDE platform? Which OS?


Re: Comparing sequences with range objects

2022-04-08 Thread Antoon Pardon



On 8/04/2022 at 16:28, duncan smith wrote:

On 08/04/2022 08:21, Antoon Pardon wrote:


Yes, I know all that. That is why I keep a bucket of possible duplicates
per "identifying" field that is examined and use some heuristics at the
end of all the comparing instead of starting to weed out the duplicates
at the moment something differs.

The problem is that when an identifying field is judged to be unusable,
the bucket associated with it should conceptually contain all other
records (which in this case are the indexes into the population list).
But that would eat a lot of memory. So I want some object that behaves as
if it were an immutable list of all these indexes without actually containing
them. A range object almost works, the only problem being that it is not
comparable with a list.



Is there any reason why you can't use ints? Just set the relevant bits.


Well, my first thought is that a bitset makes it less obvious how to calculate
the size of the set or iterate over its elements. But it is an idea
worth exploring.

--
Antoon.
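Both operations mentioned here are short with a Python int. The size is the population count: int.bit_count() exists from Python 3.10 on, and bin(x).count("1") works on older versions. Iteration can repeatedly isolate the lowest set bit with x & -x. A sketch assuming non-negative bitsets; `bitset_size` and `bitset_iter` are hypothetical names.

```python
def bitset_size(bits):
    """Hypothetical helper: number of elements (set bits) in a non-negative int."""
    if hasattr(bits, "bit_count"):  # Python 3.10+
        return bits.bit_count()
    return bin(bits).count("1")     # fallback for older versions


def bitset_iter(bits):
    """Hypothetical helper: yield indexes of set bits in ascending order."""
    while bits:
        lowest = bits & -bits          # isolates the lowest set bit
        yield lowest.bit_length() - 1  # index of that bit
        bits ^= lowest                 # clear it and continue


bucket = 0b101001                  # indexes 0, 3 and 5
print(bitset_size(bucket))         # 3
print(list(bitset_iter(bucket)))   # [0, 3, 5]
```

Each loop step builds a new int, so for a very large, nearly full bitset, scanning bin(bits) once as a string can be faster than this bit-by-bit loop.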