Re: need some kind of "coherence index" for a group of strings
On 03/11/16 16:18, Fillmore wrote:
>
> Hi there, apologies for the generic question. Here is my problem: let's
> say that I have a list of lists of strings.
>
> list1: # strings are sort of similar to one another
>
> my_nice_string_blabla
> my_nice_string_blqbli
> my_nice_string_bl0bla
> my_nice_string_aru
>
> list2: # strings are mostly different from one another
>
> my_nice_string_blabla
> some_other_string
> yet_another_unrelated string
> wow_totally_different_from_others_too
>
> I would like an algorithm that can look at the strings and determine
> that strings in list1 are sort of similar to one another, while the
> strings in list2 are all different.
> Ideally, it would be nice to have some kind of 'coherence index' that I
> can exploit to separate lists given a certain threshold.
>
> I was about to concoct something using Levenshtein distance, but then I
> figured that it would be expensive to compute and I may be reinventing
> the wheel.
>
> Thanks in advance to python masters that may have suggestions...

https://pypi.python.org/pypi/jellyfish/

Duncan

--
https://mail.python.org/mailman/listinfo/python-list
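A minimal sketch of one way to compute such a "coherence index" using only the standard library (jellyfish's jaro_winkler, from the link above, would slot in the same way); the mean-of-pairwise-similarities scheme and any threshold you pick are illustrative assumptions, not something from the thread, and note the pairwise loop is O(n^2) in the list length:

from difflib import SequenceMatcher
from itertools import combinations

def coherence_index(strings):
    """Average pairwise similarity ratio, in [0, 1]."""
    pairs = list(combinations(strings, 2))
    if not pairs:
        return 1.0  # zero or one string is trivially coherent
    total = sum(SequenceMatcher(None, a, b).ratio() for a, b in pairs)
    return total / len(pairs)

list1 = ["my_nice_string_blabla", "my_nice_string_blqbli",
         "my_nice_string_bl0bla", "my_nice_string_aru"]
list2 = ["my_nice_string_blabla", "some_other_string",
         "yet_another_unrelated string",
         "wow_totally_different_from_others_too"]

print(coherence_index(list1))  # high: the strings share a long prefix
print(coherence_index(list2))  # low: mostly unrelated strings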
Re: data interpolation
On Thursday, November 3, 2016 at 11:08:34 AM UTC+1, Heli wrote:
> Hi,
>
> I have a question about data interpolation using python. I have a big
> ascii file containing data in the following format, around 200M points:
>
> id, xcoordinate, ycoordinate, zcoordinate
>
> Then I have a second file containing data in the following format (2M
> values):
>
> id, xcoordinate, ycoordinate, zcoordinate, value1, value2, value3, ..., valueN
>
> I would need to get values for the x,y,z coordinates of file 1 from the
> values of file 2.
>
> I don't know whether my data in files 1 and 2 is from a structured or
> unstructured grid source. I was wondering which interpolation module,
> either from scipy or scikit-learn, you would recommend.
>
> I would also appreciate it if you could recommend some sample
> examples/references.
>
> Thanks in advance for your help,

Take a look at the scipy.spatial.KDTree class:

https://docs.scipy.org/doc/scipy-0.14.0/reference/generated/scipy.spatial.KDTree.html

Given your example, you would build the tree (using the coordinates) from
the second file. Subsequently, you can use one of the query methods for
every point in your first file. From there it is up to you how to
transfer (interpolate) the values.

Marco

--
https://mail.python.org/mailman/listinfo/python-list
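A minimal sketch of the approach Marco describes, with the coordinates and values assumed to be already parsed into NumPy arrays (stand-in random data below); the choice of k=4 neighbours and inverse-distance weighting are illustrative assumptions, and cKDTree is the faster C implementation of the same interface:

import numpy as np
from scipy.spatial import cKDTree

# Stand-ins for the parsed files: pts2/vals2 from file 2, pts1 from file 1.
# (With 200M query points, process pts1 in chunks to bound memory.)
pts2 = np.random.rand(2000, 3)
vals2 = np.random.rand(2000, 5)
pts1 = np.random.rand(10000, 3)

tree = cKDTree(pts2)               # build once, over the smaller file
dist, idx = tree.query(pts1, k=4)  # 4 nearest neighbours per query point

# Inverse-distance weights; the epsilon guards against a zero distance
# when a query point coincides exactly with a data point.
w = 1.0 / (dist + 1e-12)
w /= w.sum(axis=1, keepdims=True)
interpolated = np.einsum('ij,ijk->ik', w, vals2[idx])
print(interpolated.shape)          # (10000, 5)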
Re: Pre-pep discussion material: in-place equivalents to map and filter
> If slice assignment is done as I hope, it will optimize remaining
> memory operations.

Bad news.
http://stackoverflow.com/questions/4948293/python-slice-assignment-memory-usage/4948508#4948508

> If you want something like C++ move semantics, use C++.

I don't see anything like this in my proposal. If any in-place operation
is "C++ semantics", how do you explain that Python already has a bunch of
in-place operations with even less justification for their existence than
map or filter, such as sort/sorted and reverse/reversed? That is
especially troubling since the optimisation in a sort operation is likely
to be less significant than in a linear algorithm.

2016-11-04 1:03 GMT+01:00 Terry Reedy:
> On 11/3/2016 2:56 AM, [email protected] wrote:
>
>> lst = [ item for item in lst if predicate(item) ]
>> lst = [ f(item) for item in lst ]
>>
>> Both these expressions feature redundancy: lst occurs twice and item at
>> least twice. Additionally, the readability is hurt, because one has to
>> dive through the semantics of the comprehension to truly understand that
>> I am filtering the list or remapping its values.
>
> ...
>
>> Language support for making these operations in-place could improve
>> their efficiency through reduced use of memory.
>
> We already have that: slice assignment with an iterator.
>
> lst[:] = (item for item in lst if predicate(item))
> lst[:] = map(f, lst)  # iterator in 3.x.
>
> To save memory, stop using unneeded temporary lists and use iterators
> instead. If slice assignment is done as I hope, it will optimize
> remaining memory operations. (But I have not read the code.) It should
> overwrite existing slots until either a) the iterator is exhausted or
> b) existing memory is used up. When lst is both source and destination,
> only case a) can happen. When it does, the list can be finalized with
> its new contents.
>
> As for timings:
>
> from timeit import Timer
> setup = """data = list(range(1))
> def func(x):
>     return x
> """
> t1a = Timer('data[:] = [func(a) for a in data]', setup=setup)
> t1b = Timer('data[:] = (func(a) for a in data)', setup=setup)
> t2a = Timer('data[:] = list(map(func, data))', setup=setup)
> t2b = Timer('data[:] = map(func, data)', setup=setup)
>
> print('t1a', min(t1a.repeat(number=500, repeat=7)))
> print('t1b', min(t1b.repeat(number=500, repeat=7)))
> print('t2a', min(t2a.repeat(number=500, repeat=7)))
> print('t2b', min(t2b.repeat(number=500, repeat=7)))
> #
> t1a 0.5675313005414555
> t1b 0.7034254675598604
> t2a 0.518128598520
> t2b 0.5196112759726024
>
> If f does more work, the % difference among these will decrease.
>
> --
> Terry Jan Reedy
>
> --
> https://mail.python.org/mailman/listinfo/python-list

--
https://mail.python.org/mailman/listinfo/python-list
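For comparison with the slice-assignment idiom above, here is a minimal sketch of a filter that is genuinely in-place, compacting survivors to the front of the list and trimming the tail; the two-pointer scheme is my own illustration, not something proposed in the thread:

def filter_inplace(lst, predicate):
    """Keep only items satisfying predicate, reusing lst's own slots."""
    write = 0
    for item in lst:
        if predicate(item):
            lst[write] = item
            write += 1
    del lst[write:]  # drop the leftover tail in one step

data = list(range(10))
filter_inplace(data, lambda x: x % 3 == 1)
print(data)  # [1, 4, 7]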
Re: Pre-pep discussion material: in-place equivalents to map and filter
On Fri, 4 Nov 2016 08:34 am, Chris Angelico wrote:
[...]
> List comps themselves involve one function call (zero in Py2). What
> you do inside the expression is your business. Do you agree that list
> comps don't have the overhead of opening and closing files?
/tongue firmly in cheek
I'd like to see you run a Python script containing a list comprehension
without opening and closing the .py file.
:-P
Okay, I see what you are getting at now: in CPython 3, list comprehensions
are implemented in such a way that the list comprehension requires a
minimum of one function call, while list(map(func, iterable)) requires a
minimum of N+2 function calls: a call to list, a call to map, and a call
to func for each of N values.
That's *technically* true, but it is an implementation detail. I'm sure that
some day PyPy or Nuitka or maybe even CPython itself will start in-lining
at least some functions (if PyPy doesn't already do so). That would be an
obvious optimization to apply to map() and filter().
As far as the speed of map() versus list comps, my micro-benchmarks show
that at least on my computer, map() can be marginally but consistently
faster than a list comp once you equalise the cost of function calls, that
is, if the list comp explicitly calls a function. I don't think that speed
difference is significant, so let's just call them "equally fast" when
comparing similar cases:
[func(obj) for obj in iterable]
map(func, iterable)
But of course you're right that list comprehensions give you the opportunity
to avoid that function call -- at least in CPython. I already agreed with
that, but to emphasise what I've already agreed, I'll say it again :-)
If you can manually in-line func() as a single expression inside the list
comprehension:
[(spam + len(obj.eggs))*2 for obj in iterable]
then you can expect to save the cost of N function calls, which may be
significant. As I said earlier, that's why we have list comps.
(But on the other hand, if the expression is expensive enough, the cost of
an extra function call may be utterly insignificant.)
The point that I am trying to make is that none of these facts justifies the
claim that map() performs "especially bad" compared to list comprehensions.
According to my tests, *at worst* map() will be a bit better than half as
fast as a list comprehension, and at best just as fast if not slightly
faster.
>> Here's some timing results using 3.5 on my computer. For simplicity, so
>> folks can replicate the test themselves, here's the timing code:
>>
>>
>> from timeit import Timer
>> setup = """data = list(range(1))
>> def func(x):  # simulate some calculation
>>     return {x+1: x**2}
>> """
>> t1 = Timer('[func(a) for a in data]', setup=setup)
>> t2 = Timer('list(map(func, data))', setup=setup)
>
> This is very different from the original example, about which the OP
> said that map performs badly, and you doubted it.
I didn't read the OP as making a specific claim about these two *specific*
map and filter examples:
lst = map (lambda x: x*5, lst)
lst = filter (lambda x: x%3 == 1, lst)
I read these as mere examples of a general claim that map and
filter "perform especially bad in CPython compared to a comprehension".
But just for the exercise, I repeated my benchmarks with these specific
examples, comparing:
list(map(lambda x: x*5, data))
[x*5 for x in data]
and
list(filter(lambda x: x%3 == 1, data))
[x for x in data if x%3 == 1]
and again got a roughly factor of two performance difference, with the list
comp being faster. I don't think that justifies the claim of "especially
bad", which to me implies something much worse. If you're going to describe
a mere factor of two as "especially bad", what words do we have left for
something that is ten thousand times slower?
As the wisest man in the universe once said, hyperbole is the most terrible,
awful crime against humanity.
*wink*
[...]
> Thing is, this is extremely common. How often do you actually use a
> comprehension with something that is absolutely exactly a function
> call on the element in question?
"This" being something that can be in-lined in the body of the list comp.
Sure. I cheerfully acknowledge that list comps where you can write an
in-line expression are very common. That's the beauty of list comps!
[...]
> But this conclusion I agree with. There is a performance difference,
> but it is not overly significant. Compared to the *actual work*
> involved in the task (going through one list and doing some operation
> on each operation), the difference between map and a comprehension is
> generally going to be negligible.
I wouldn't go quite so far as to say "negligible" -- a factor of two speed
up on a large list is not something to be sneezed at. But I think we're
converging on agreement: list comps and map/filter typically have
comparable performance, and as the cost of the work done increases, the
extra overhead of a function call becomes less and less significant.
Re: Pre-pep discussion material: in-place equivalents to map and filter
On Sat, Nov 5, 2016 at 11:42 AM, Steve D'Aprano wrote:
> On Fri, 4 Nov 2016 08:34 am, Chris Angelico wrote:
>
> [...]
>> List comps themselves involve one function call (zero in Py2). What
>> you do inside the expression is your business. Do you agree that list
>> comps don't have the overhead of opening and closing files?
>
> /tongue firmly in cheek
>
> I'd like to see you run a Python script containing a list comprehension
> without opening and closing the .py file.
>
> :-P

You got me! Let's call it a night.

-- Kristoff (nearly my namesake)

> Okay, I see what you are getting at now: in CPython 3, list comprehensions
> are implemented in such a way that the list comprehension requires a
> minimum of one function call, while list(map(func, iterable)) requires a
> minimum of N+2 function calls: a call to list, a call to map, and a call
> to func for each of N values.
>
> That's *technically* true, but it is an implementation detail. I'm sure
> that some day PyPy or Nuitka or maybe even CPython itself will start
> in-lining at least some functions (if PyPy doesn't already do so). That
> would be an obvious optimization to apply to map() and filter().

Mmmm, interesting. The fact still remains that map depends on some kind
of "object representation" of a block of code (since it's being passed
to some other function), where a comprehension can be implemented as an
actual expression. So either Python-the-language needs a way to pass
around lightweight blocks of code, or the interpreter (for some instance
of 'the interpreter') needs to recognize the function calls and optimize
them away, or list comps will always have an inherent advantage over map.

> As far as the speed of map() versus list comps, my micro-benchmarks show
> that at least on my computer, map() can be marginally but consistently
> faster than a list comp once you equalise the cost of function calls,
> that is, if the list comp explicitly calls a function. I don't think
> that speed difference is significant, so let's just call them "equally
> fast" when comparing similar cases:
>
> [func(obj) for obj in iterable]
>
> map(func, iterable)

Right, which is why a lot of style guides recommend against the first
form, *in this specific instance*. Using map with a lambda function, or
a comprehension with nothing but a function call, is rightly called out
in code review.

> I didn't read the OP as making a specific claim about these two
> *specific* map and filter examples:
>
> lst = map (lambda x: x*5, lst)
> lst = filter (lambda x: x%3 == 1, lst)
>
> I read these as mere examples of a general claim that map and
> filter "perform especially bad in CPython compared to a comprehension".
> ... I don't think that justifies the claim of "especially
> bad", which to me implies something much worse. If you're going to
> describe a mere factor of two as "especially bad", what words do we have
> left for something that is ten thousand times slower?
>
> As the wisest man in the universe once said, hyperbole is the most
> terrible, awful crime against humanity.
>
> *wink*

Ah, now we get to the point where we disagreed. I was responding to a
misunderstanding of your position - I thought you disagreed that the
performance difference could even be significant, but you were arguing
against the "especially bad". Gotcha. In that case, I believe we're in
agreement; even a two-to-one difference isn't "especially bad" here, and
that would be an extreme case.

>> But this conclusion I agree with. There is a performance difference,
>> but it is not overly significant. Compared to the *actual work*
>> involved in the task (going through one list and doing some operation
>> on each operation), the difference between map and a comprehension is
>> generally going to be negligible.
>
> I wouldn't go quite so far as to say "negligible" -- a factor of two
> speed up on a large list is not something to be sneezed at. But I think
> we're converging on agreement: list comps and map/filter typically have
> comparable performance, and as the cost of the work done increases, the
> extra overhead of a function call becomes less and less significant.

I said negligible because the factor of two disappears almost completely
when you add a large constant factor to it. What's the performance of
these?

def lie(n):
    """it's not a fib, honest"""
    if n < 2: return 3
    return lie(n-1) + lie(n-2)

mapped = list(map(lie, range(50)))
comprehended = [lie(x) for x in range(50)]
badly_mapped = list(map(lambda x: lie(x), range(50)))

By any reasonable metric, I would expect all three to have extremely
comparable performance. The difference between map's best case (passing
an existing function), a list comp, and map's worst case (wrapping the
function in a useless lambda function) might be two to one, but it's
negligible compared to the surpassing cost of those massively-nested
lies. Oh, what a tangled web we weave...

ChrisA

--
https://mail.python.org/mailman/listinfo/python-list
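To put rough numbers on Chris's example, here is a minimal timeit sketch; the argument range is deliberately smaller than his range(50) (an assumption of mine, since the doubly-recursive lie() is exponential and range(50) would take ages), and exact figures will vary by machine:

from timeit import timeit

def lie(n):
    """it's not a fib, honest"""
    if n < 2: return 3
    return lie(n-1) + lie(n-2)

N = 22  # keep the recursion affordable
for label, stmt in [
    ('mapped      ', 'list(map(lie, range(N)))'),
    ('comprehended', '[lie(x) for x in range(N)]'),
    ('badly_mapped', 'list(map(lambda x: lie(x), range(N)))'),
]:
    print(label, timeit(stmt, number=20, globals=globals()))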
Re: Pre-pep discussion material: in-place equivalents to map and filter
> I don't think that justifies the claim of "especially
> bad", which to me implies something much worse.

Quicksort built its popularity by performing better than mergesort and
heapsort by "a mere factor of two". It became the reference sorting
algorithm even though its worst-case complexity is worse than that of its
competitors. You can take issue with my calling a factor of two
especially bad, and you are right that it is hyperbole in the general
case, but I am still right in the specific context of a frequently used
linear algorithm.

--
https://mail.python.org/mailman/listinfo/python-list
How to pass C++ function pointer argument in embedded python environment?
Hi, all,

Background: I have a Python API like this:

def a(arg1, arg2, progress_callback=None)

Obviously the argument progress_callback is a callback function. In
Python, I can define a function like this:

def pro_call(prog_arg1, prog_arg2):
    # do something with arg1 & arg2

and then I just need to make a call like the following and it works:

a(arg1, arg2, progress_callback=pro_call)

Question: I don't know how to do the same as the former example in an
embedded Python environment, i.e. how to pass a C++ function as
progress_callback from the C++ side.

Thank you for reading, and thanks for any response!

--
https://mail.python.org/mailman/listinfo/python-list
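No reply appears in this digest, but one common approach, sketched below in C++ embedding code since the question concerns the C++ side, is to wrap the native function in a PyCFunction and pass it as the keyword argument. The module name mymod, the two-long callback signature, and the string arguments are all assumptions for illustration, and most error checking is omitted:

#include <Python.h>
#include <cstdio>

// The native progress handler, using the METH_VARARGS calling convention.
static PyObject *progress_cb(PyObject *, PyObject *args)
{
    long a = 0, b = 0;
    if (!PyArg_ParseTuple(args, "ll", &a, &b))
        return nullptr;                        // propagate the Python error
    std::printf("progress: %ld %ld\n", a, b);  // real C++ work goes here
    Py_RETURN_NONE;
}

static PyMethodDef progress_def = {
    "progress_cb", progress_cb, METH_VARARGS, "C++ progress callback"
};

int main()
{
    Py_Initialize();

    PyObject *mod = PyImport_ImportModule("mymod");   // assumed module name
    PyObject *func = PyObject_GetAttrString(mod, "a");

    // Wrap the C++ function so Python sees an ordinary callable.
    PyObject *cb = PyCFunction_New(&progress_def, nullptr);

    PyObject *args = Py_BuildValue("(ss)", "arg1", "arg2");
    PyObject *kwargs = Py_BuildValue("{s:O}", "progress_callback", cb);
    PyObject *result = PyObject_Call(func, args, kwargs);

    Py_XDECREF(result);
    Py_DECREF(kwargs);
    Py_DECREF(args);
    Py_DECREF(cb);
    Py_DECREF(func);
    Py_DECREF(mod);
    Py_Finalize();
    return 0;
}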
