General Purpose Pipeline library?
A pipeline can be described as a sequence of functions that are applied to an
input, with each subsequent function getting the output of the preceding
function:
out = f6(f5(f4(f3(f2(f1(in))))))
However, this isn't very readable and it does not support conditionals.
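For comparison, the same chain can be written as a fold over a list of
functions. A minimal sketch (run_pipeline and in_value are made-up names,
not from any library):

import functools

def run_pipeline(functions, value):
    # Left fold: feed the value through each function in turn.
    return functools.reduce(lambda acc, f: f(acc), functions, value)

# out = run_pipeline([f1, f2, f3, f4, f5, f6], in_value)

This reads top to bottom, but it still doesn't help with conditionals or
with functions that take more than one argument.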
TensorFlow has tensor-focused pipelines:
fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
I have some code which allows me to mimic this, but with an implied parameter.
import functools

def executePipeline(steps, collection_funcs=[map, filter, reduce]):
    results = None
    for step in steps:
        func = step[0]
        params = step[1]
        if func in collection_funcs:
            # map/filter/reduce: bind the extra arguments to the worker
            # function, then apply it to the running results.
            print func, params[0]
            results = func(functools.partial(params[0], *params[1:]), results)
        else:
            print func
            if results is None:
                results = func(*params)
            else:
                # Pass the running results in as the last positional argument.
                results = func(*(params + (results,)))
    return results
executePipeline([
    (read_rows, (in_file,)),
    (map, (lower_row, field)),
    (stash_rows, ('stashed_file',)),
    (map, (lemmatize_row, field)),
    (vectorize_rows, (field, min_count)),
    (evaluate_rows, (weights, None)),
    (recombine_rows, ('stashed_file',)),
    (write_rows, (out_file,)),
])
Which gets me close, but I can't control where rows gets passed in. In the
above code, it is always the last parameter.
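One possible workaround (just a sketch, not something from a library) would
be a placeholder object that marks where the running result should be
substituted:

PIPE = object()  # hypothetical sentinel marking where the previous result goes

def run_step(func, params, results):
    # Substitute the running results wherever the sentinel appears;
    # otherwise fall back to appending them, as executePipeline does now.
    if any(p is PIPE for p in params):
        args = tuple(results if p is PIPE else p for p in params)
    else:
        args = params + (results,)
    return func(*args)

With that, a step like (evaluate_rows, (weights, PIPE)) could receive the
rows as its second argument rather than its last.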
I feel like I'm reinventing a wheel here. I was wondering if there's already
something that exists?
--
https://mail.python.org/mailman/listinfo/python-list
Re: General Purpose Pipeline library?
> I feel like I'm reinventing a wheel here. I was wondering if there's
> already something that exists?

I've wondered from time-to-time about using shell pipeline notation within
Python. Maybe the grapevine package could be a starting point?

I realize that's probably not precisely what you're looking for, but maybe
it will give you some ideas. (I've never used it, just stumbled on it with
a bit of poking around.)

Skip

--
https://mail.python.org/mailman/listinfo/python-list
Re: General Purpose Pipeline library?
On Nov 20, 2017 10:50 AM, "Jason" wrote:
>
> a pipeline can be described as a sequence of functions that are applied
to an input with each subsequent function getting the output of the
preceding function:
>
> out = f6(f5(f4(f3(f2(f1(in))))))
>
> However this isn't very readable and does not support conditionals.
>
> TensorFlow has tensor-focused pipelines:
> fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu,
scope='fc1')
> fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu,
scope='fc2')
> out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
>
> I have some code which allows me to mimic this, but with an implied
parameter.
>
> def executePipeline(steps, collection_funcs = [map, filter, reduce]):
> results = None
> for step in steps:
> func = step[0]
> params = step[1]
> if func in collection_funcs:
> print func, params[0]
> results = func(functools.partial(params[0],
*params[1:]), results)
> else:
> print func
> if results is None:
> results = func(*params)
> else:
> results = func(*(params+(results,)))
> return results
>
> executePipeline( [
> (read_rows, (in_file,)),
> (map, (lower_row, field)),
> (stash_rows, ('stashed_file', )),
> (map, (lemmatize_row, field)),
> (vectorize_rows, (field, min_count,)),
> (evaluate_rows, (weights, None)),
> (recombine_rows, ('stashed_file', )),
> (write_rows, (out_file,))
> ]
> )
>
> Which gets me close, but I can't control where rows gets passed in. In
the above code, it is always the last parameter.
>
> I feel like I'm reinventing a wheel here. I was wondering if there's
already something that exists?
IBM has had for a very long time a program called Pipelines which runs on
IBM mainframes. It does what you want.
A number of attempts have been made to create cross-platform versions of
this marvelous program.
A long time ago I started, but never completed, an open-source Python
version. If you are interested in taking a look at this, let me know.
--
https://mail.python.org/mailman/listinfo/python-list
__hash__ and ordered vs. unordered collections
Suppose we're implementing an immutable collection type that comes in unordered
and ordered flavors. Let's call them MyColl and MyOrderedColl.
We implement __eq__ such that MyColl(some_elements) ==
MyOrderedColl(other_elements) iff set(some_elements) == set(other_elements).
But MyOrderedColl(some_elements) == MyOrderedColl(other_elements) iff
list(some_elements) == list(other_elements).
This works just like dict and collections.OrderedDict, in other words.
Since our collection types are immutable, let's say we want to implement
__hash__.
We must ensure that our __hash__ results are consistent with __eq__. That is,
we make sure that if MyColl(some_elements) == MyOrderedColl(other_elements),
then hash(MyColl(some_elements)) == hash(MyOrderedColl(other_elements)).
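For concreteness, here's a rough sketch of the sort of implementation I have
in mind (not the real code; it stores the elements in a tuple and, like the
definition above, ignores duplicates):

class MyColl:
    def __init__(self, elements):
        self._elements = tuple(elements)

    def __eq__(self, other):
        if isinstance(other, MyColl):  # covers MyOrderedColl too
            return set(self._elements) == set(other._elements)
        return NotImplemented

    def __hash__(self):
        # Order-insensitive, so it agrees with __eq__ across both flavors.
        return hash(frozenset(self._elements))


class MyOrderedColl(MyColl):
    def __eq__(self, other):
        if isinstance(other, MyOrderedColl):
            return list(self._elements) == list(other._elements)
        if isinstance(other, MyColl):
            return set(self._elements) == set(other._elements)
        return NotImplemented

    def __hash__(self):
        # Must stay order-insensitive, or it could disagree with MyColl's
        # hash for objects that compare equal.
        return hash(frozenset(self._elements))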
Now for the question: Is this useful? I ask because this leads to the following
behavior:
>>> unordered = MyColl([1, 2, 3])
>>> ordered = MyOrderedColl([3, 2, 1])
>>> s = {ordered, unordered}
>>> len(s)
1
>>> s = {ordered}
>>> unordered in s
True
>>> # etc.
In other words, sets and mappings can't tell unordered and ordered apart;
they're treated like the same values.
This is a bit reminiscent of:
>>> s = {1.0}
>>> True in s
True
>>> d = {1: int, 1.0: float, True: bool}
>>> len(d)
1
>>> # etc.
The first time I encountered this was a bit of an "aha", but to be clear, I
think this behavior is totally right.
However, I'm less confident that this kind of behavior is useful for MyColl and
MyOrderedColl. Could anyone who feels more certain one way or the other please
explain the rationale and possibly even give some real-world examples?
Thanks!
Josh
--
https://mail.python.org/mailman/listinfo/python-list
Re: Is there something like head() and str() of R in python?
On Sunday, November 19, 2017 at 2:05:12 PM UTC-5, Peng Yu wrote:
> Hi, R has the functions head() and str() to show the brief content of
> an object. Is there something similar in python for this purpose?
>
> For example, I want to inspect the content of the variable "train".
> What is the best way to do so? Thanks.
>
> $ cat demo.py
> from __future__ import division, print_function, absolute_import
>
> import tflearn
> from tflearn.data_utils import to_categorical, pad_sequences
> from tflearn.datasets import imdb
>
> # IMDB Dataset loading
> train, test, _ = imdb.load_data(path='imdb.pkl', n_words=1,
>                                 valid_portion=0.1)
>
> # https://raw.githubusercontent.com/llSourcell/How_to_do_Sentiment_Analysis/master/demo.py
>
> --
> Regards,
> Peng

Python is very good at giving you a string representation of any object.
However, such capabilities do fall short every now and then. That is why,
when defining your own classes, you should also override the __str__() and
__repr__() methods so you can get a better-suited string representation of
such objects.

You can read more at:
https://stackoverflow.com/questions/12448175/confused-about-str-in-python
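For example, a minimal sketch (the Dataset class here is invented just for
illustration):

class Dataset:
    def __init__(self, rows):
        self.rows = list(rows)

    def __repr__(self):
        # Show a short "head" of the data instead of the default
        # <__main__.Dataset object at 0x...>.
        return 'Dataset(%d rows, first=%r)' % (len(self.rows), self.rows[:3])

# >>> Dataset(range(100))
# Dataset(100 rows, first=[0, 1, 2])

--
https://mail.python.org/mailman/listinfo/python-list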
Re: How to Generate dynamic HTML Report using Python
Your thoughts on scope are interesting, if unorthodox. There is a problem
with your deleting names after use, which is why we rarely delete names:
deleting a name does not necessarily or immediately destroy an object. This
can lead to great confusion for programmers coming from a RAII language
like C++.

All del does is delete a name/object binding, which is the exact same thing
as reassigning the name to a new object. Thus I don't see how using del as
you do is at all useful, either for programming correctness or
understanding. In fact I think it might even be harmful. Far more useful to
teach people to use context managers when appropriate, for example when
working with your sql connection object.

On 11/20/2017 07:50 AM, Stefan Ram wrote:
> I am posting to a Usenet newsgroup. I am not aware of any
> "Python-List mailing list".

As far as I'm concerned, this list is primarily a mailing list, hosted by
Mailman at python.org, and is mirrored to Usenet via a gateway as a service
by python.org. Granted, this is just a matter of perspective.

> I am posting specifically to the Usenet, because I am aware
> of it's rules and I like it and wish to support it.

What rules are these? I'm curious what news reader you are using, as your
posts are, well, unique. You've set headers that most do not, and your post
bodies are all base64 encoded. Your quotes and block (un)indents are very
odd also. A very curious setup.

You also have this header set:

> X-Copyright: (C) Copyright 2017 Stefan Ram. All rights reserved.
> Distribution through any means other than regular usenet
> channels is forbidden. It is forbidden to publish this
> article in the world wide web. It is forbidden to change
> URIs of this article into links. It is forbidden to remove
> this notice or to transfer the body without this notice.

Looks to me like the mailing list needs to block your messages, lest
python.org be in violation of your copyright.
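Coming back to the context-manager point above, here is a minimal sketch
using sqlite3 (the database name is made up):

import sqlite3
from contextlib import closing

# closing() guarantees the connection is closed when the block exits,
# without relying on del or on garbage-collection timing.
with closing(sqlite3.connect('example.db')) as conn:
    with conn:  # commits on success, rolls back on an exception
        conn.execute('CREATE TABLE IF NOT EXISTS t (x INTEGER)')
        conn.execute('INSERT INTO t VALUES (?)', (1,))

--
https://mail.python.org/mailman/listinfo/python-list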
Re: __hash__ and ordered vs. unordered collections
On Tue, Nov 21, 2017 at 4:47 AM, Josh B. wrote:
> Now for the question: Is this useful? I ask because this leads to the
> following behavior:
>
> >>> unordered = MyColl([1, 2, 3])
> >>> ordered = MyOrderedColl([3, 2, 1])
> >>> s = {ordered, unordered}
> >>> len(s)
> 1
> >>> s = {ordered}
> >>> unordered in s
> True
> >>> # etc.
>
> In other words, sets and mappings can't tell unordered and ordered apart;
> they're treated like the same values.
>
> However, I'm less confident that this kind of behavior is useful for MyColl
> and MyOrderedColl. Could anyone who feels more certain one way or the other
> please explain the rationale and possibly even give some real-world examples?
>
This isn't a consequence of __hash__, it's a consequence of __eq__.
You have declared that MyColl and MyOrderedColl are equal, therefore
only one of them stays in the set.
But what you have is the strangeness of non-transitive equality, which
is likely to cause problems.
>>> unordered = MyColl([1, 2, 3])
>>> ordered1 = MyOrderedColl([3, 2, 1])
>>> ordered2 = MyOrderedColl([2, 1, 3])
unordered is equal to each of the others, but they're not equal to
each other. So if you put them into a set, you'll get results that
depend on order. Here's a simpler form of non-transitive equality:
>>> class Oddity(int):
... def __eq__(self, other):
... if other - 5 <= self <= other + 5:
... return True
... return int(self) == other
... def __hash__(self):
... return 1
...
>>> x, y, z = Oddity(5), Oddity(10), Oddity(15)
>>> x == y, y == z, x == z
(True, True, False)
>>> {x, y, z}
{5, 15}
>>> {y, x, z}
{10}
Setting __hash__ to a constant value is safe (but inefficient); it's
all based on how __eq__ works. So the question is: are you willing to
accept the bizarre behaviour of non-transitive equality?
ChrisA
--
https://mail.python.org/mailman/listinfo/python-list
Re: __hash__ and ordered vs. unordered collections
On 2017-11-20 17:47, Josh B. wrote:
> Suppose we're implementing an immutable collection type that comes in unordered
> and ordered flavors. Let's call them MyColl and MyOrderedColl.
> We implement __eq__ such that MyColl(some_elements) ==
> MyOrderedColl(other_elements) iff set(some_elements) == set(other_elements).
What if there are duplicate elements?
Should that be MyColl(some_elements) == MyOrderedColl(other_elements)
iff len(some_elements) == len(other_elements) and set(some_elements) ==
set(other_elements)?
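For example, set() alone can't tell these apart:

>>> set([1, 1, 2]) == set([1, 2])
True

(A full multiset comparison, e.g. collections.Counter(some_elements) ==
collections.Counter(other_elements), would be stricter still.)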
> But MyOrderedColl(some_elements) == MyOrderedColl(other_elements) iff
> list(some_elements) == list(other_elements).
> This works just like dict and collections.OrderedDict, in other words.
> Since our collection types are immutable, let's say we want to implement
> __hash__.
> We must ensure that our __hash__ results are consistent with __eq__. That is,
> we make sure that if MyColl(some_elements) == MyOrderedColl(other_elements),
> then hash(MyColl(some_elements)) == hash(MyOrderedColl(other_elements)).
> Now for the question: Is this useful? I ask because this leads to the following
> behavior:
> >>> unordered = MyColl([1, 2, 3])
> >>> ordered = MyOrderedColl([3, 2, 1])
> >>> s = {ordered, unordered}
> >>> len(s)
> 1
> >>> s = {ordered}
> >>> unordered in s
> True
> >>> # etc.
> In other words, sets and mappings can't tell unordered and ordered apart;
> they're treated like the same values.
> This is a bit reminiscent of:
> >>> s = {1.0}
> >>> True in s
> True
> >>> d = {1: int, 1.0: float, True: bool}
> >>> len(d)
> 1
> >>> # etc.
> The first time I encountered this was a bit of an "aha", but to be clear, I
> think this behavior is totally right.
> However, I'm less confident that this kind of behavior is useful for MyColl and
> MyOrderedColl. Could anyone who feels more certain one way or the other please
> explain the rationale and possibly even give some real-world examples?
If MyColl(some_elements) == MyOrderedColl(other_elements), then
len({MyColl(some_elements), MyOrderedColl(other_elements)}) == 1 seems
right.
As for which one is in the set:
>>> {1, 1.0}
{1}
>>> {1.0, 1}
{1.0}
So if MyColl(some_elements) == MyOrderedColl(other_elements), then
{MyColl(some_elements), MyOrderedColl(other_elements)} ==
{MyColl(some_elements)}.
--
https://mail.python.org/mailman/listinfo/python-list
Re: __hash__ and ordered vs. unordered collections
On Monday, November 20, 2017 at 1:55:26 PM UTC-5, Chris Angelico wrote:
> But what you have is the strangeness of non-transitive equality, which
> is likely to cause problems.

But this is exactly how Python's built-in dict and OrderedDict behave:

>>> od = OrderedDict([(1, 0), (2, 0), (3, 0)])
>>> od2 = OrderedDict([(3, 0), (2, 0), (1, 0)])
>>> ud = dict(od)
>>> od == ud
True
>>> od2 == ud
True
>>> od == od2
False

Given that, it would seem wrong for our MyOrderedColl.__eq__ to not behave
similarly.

Or are you suggesting that OrderedDict.__eq__ should not have been
implemented this way in the first place?

> So the question is: are you willing to
> accept the bizarre behaviour of non-transitive equality?

Forget what I'm personally willing to do :)
The question here actually is to tease out what Python's existing design is
telling us to do.

If it helps, substitute "frozenset" for "MyColl" and "FrozenOrderedSet" for
"MyOrderedColl". How would you implement their __eq__ methods? What would
be the correct design for our hypothetical frozen(ordered)set library? What
would be more useful, intuitive, and usable for our users?

Thanks very much for the good examples and for helping me clarify the
question!

--
https://mail.python.org/mailman/listinfo/python-list
Re: __hash__ and ordered vs. unordered collections
On Monday, November 20, 2017 at 2:31:40 PM UTC-5, MRAB wrote:
> What if there are duplicate elements?
>
> Should that be MyColl(some_elements) == MyOrderedColl(other_elements)
> iff len(some_elements) == len(other_elements) and set(some_elements) ==
> set(other_elements)?

Yes, that's what I meant. Thanks for catching :)

Please let me know if you have any thoughts on how you would design our
hypothetical frozen(ordered)set library to behave in these cases.

--
https://mail.python.org/mailman/listinfo/python-list
Re: How to Generate dynamic HTML Report using Python
On Tue, Nov 21, 2017 at 5:47 AM, Michael Torrie wrote:
> You also have this header set:
>> X-Copyright: (C) Copyright 2017 Stefan Ram. All rights reserved.
>> Distribution through any means other than regular usenet
>> channels is forbidden. It is forbidden to publish this
>> article in the world wide web. It is forbidden to change
>> URIs of this article into links. It is forbidden to remove
>> this notice or to transfer the body without this notice.
>
> Looks to me like the mailing list needs to block your messages, lest
> python.org be in violation of your copyright.

Is that kind of copyright notice even enforceable?

Personally, if I saw a header like that, I'd plonk the person, because
anyone who says "please don't read my messages in any way other than the
way I've stipulated" might as well be saying "please don't read my
messages". It's not worth the hassle.

ChrisA

--
https://mail.python.org/mailman/listinfo/python-list
Re: __hash__ and ordered vs. unordered collections
On Tue, Nov 21, 2017 at 6:50 AM, Josh B. wrote:
> On Monday, November 20, 2017 at 1:55:26 PM UTC-5, Chris Angelico wrote:
>> But what you have is the strangeness of non-transitive equality, which
>> is likely to cause problems.
>
> But this is exactly how Python's built-in dict and OrderedDict behave:
>
> >>> od = OrderedDict([(1, 0), (2, 0), (3, 0)])
> >>> od2 = OrderedDict([(3, 0), (2, 0), (1, 0)])
> >>> ud = dict(od)
> >>> od == ud
> True
> >>> od2 == ud
> True
> >>> od == od2
> False
>
> Given that, it would seem wrong for our MyOrderedColl.__eq__ to not behave
> similarly.
>
> Or are you suggesting that OrderedDict.__eq__ should not have been
> implemented this way in the first place?
>
>> So the question is: are you willing to
>> accept the bizarre behaviour of non-transitive equality?
>
> Forget what I'm personally willing to do :)
> The question here actually is to tease out what Python's existing design is
> telling us to do.
>
> If it helps, substitute "frozenset" for "MyColl" and "FrozenOrderedSet" for
> "MyOrderedColl". How would you implement their __eq__ methods? What would be
> the correct design for our hypothetical frozen(ordered)set library? What
> would be more useful, intuitive, and usable for our users?
>
> Thanks very much for the good examples and for helping me clarify the
> question!

What I'm saying is that non-transitive equality can cause a lot of
confusion in sets/dicts; since OrderedDict and dict are unhashable, they
won't themselves be problematic, and Python doesn't have a built-in
FrozenOrderedSet. So there isn't really a precedent here, and it's up to
you to decide how you want to deal with this.

Basically, you're going to have to accept one of two situations:

* Either your class doesn't behave the same way dict and OD do
* Or your class, when put into a set, depends on ordering.

Neither is perfect. You have to take your pick between them.

ChrisA

--
https://mail.python.org/mailman/listinfo/python-list
Re: General Purpose Pipeline library?
[email protected] (Stefan Ram):
> Jason writes:
>> I feel like I'm reinventing a wheel here. I was wondering if
>> there's already something that exists?
>
> Why do you want this?

Some time back Steven D'Aprano demonstrated how the | operator can be
defined to create pipelines in Python. As a hobby project, I started
developing the idea further into a Python-based shell (I call it "snake").
I kinda proved to myself that it is very much doable and left it at that.

For example:

$ ./snake
>>> ls()
notes.txt .git snake notes.txt~
>>> ls() | cat()
notes.txt .git snake notes.txt~
>>> ls() | grep(lambda x: "n" in x)
notes.txt snake notes.txt~
>>> sleep(5)
>>> sleep(5).bg()
29766
>>>
[29766] Done: sleep(5)
>>> X("/bin/echo hello")
hello
>>> X("/bin/seq 20") | grep(lambda x: "2" in x)
2 12 20
>>>

So snake is just a regular Python REPL with some predefined things that
implement a full-fledged Unix shell, courtesy of the amazingly complete
Linux system call support by Python. The pipelines relay byte streams,
line sequences or JSON arrays. The pipeline processors are either
generators or processes.

> Can't you just map what you want to do to plain-old Python?

The above is plain Python, but it might be more pythonesque to do the
pipelining using the dot notation:

    feed(dataset).f().g().h().output()
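The | trick itself is small. A bare-bones sketch (illustration only, not
the actual snake code):

class Pipe:
    # Wraps a function that consumes an iterable; __ror__ lets it sit on
    # the right-hand side of |, receiving whatever was on the left.
    def __init__(self, func):
        self.func = func

    def __ror__(self, upstream):
        return self.func(upstream)

def grep(pred):
    return Pipe(lambda lines: (line for line in lines if pred(line)))

def cat():
    return Pipe(list)

# >>> ['notes.txt', '.git', 'snake'] | grep(lambda x: 'n' in x) | cat()
# ['notes.txt', 'snake']

Marko

--
https://mail.python.org/mailman/listinfo/python-list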
Re: "help( pi )"
On 20Nov2017 10:49, Greg Ewing wrote:
> Cameron Simpson wrote:
>> Unless one had a misfortune and wanted another docstring.
>
> Good point. I guess having differing docstrings should make otherwise
> equal objects ineligible for merging.

[...example...]

> I think setting the docstring of an existing immutable object would
> have to be disallowed -- you need to create a new object if you want it
> to have a distinct docstring, e.g.
>
>     MAX_BUFSIZE = int(8192, __doc__ =
>         'Size of the hardware buffer used for I/O on this device.')

Which is painful and elaborate. In my original post I had written:

  Now, I accept that the "CPython coalesces some values to shared
  singletons" thing is an issue, but the language doesn't require it,
  and one could change implementations such that applying a docstring
  to an object _removed_ it from the magic-shared-singleton pool,
  avoiding conflicts with other uses of the same value by coincidence.

hoping for automatic arrangement of that.

Cheers,
Cameron Simpson (formerly [email protected])

--
https://mail.python.org/mailman/listinfo/python-list
Re: General Purpose Pipeline library?
On 20/11/17 15:48, Jason wrote:
> a pipeline can be described as a sequence of functions that are applied to an
> input with each subsequent function getting the output of the preceding
> function:
>
> out = f6(f5(f4(f3(f2(f1(in))))))
>
> However this isn't very readable and does not support conditionals.
>
> TensorFlow has tensor-focused pipelines:
> fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu,
> scope='fc1')
> fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu,
> scope='fc2')
> out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
>
> I have some code which allows me to mimic this, but with an implied parameter.
>
> def executePipeline(steps, collection_funcs = [map, filter, reduce]):
> results = None
> for step in steps:
> func = step[0]
> params = step[1]
> if func in collection_funcs:
> print func, params[0]
> results = func(functools.partial(params[0],
> *params[1:]), results)
> else:
> print func
> if results is None:
> results = func(*params)
> else:
> results = func(*(params+(results,)))
> return results
>
> executePipeline( [
> (read_rows, (in_file,)),
> (map, (lower_row, field)),
> (stash_rows, ('stashed_file', )),
> (map, (lemmatize_row, field)),
> (vectorize_rows, (field, min_count,)),
> (evaluate_rows, (weights, None)),
> (recombine_rows, ('stashed_file', )),
> (write_rows, (out_file,))
> ]
> )
>
> Which gets me close, but I can't control where rows gets passed in. In the
> above code, it is always the last parameter.
>
> I feel like I'm reinventing a wheel here. I was wondering if there's already
> something that exists?
>
Maybe Kamaelia?
http://www.kamaelia.org/Home.html
Duncan
--
https://mail.python.org/mailman/listinfo/python-list
