General Purpose Pipeline library?

2017-11-20 Thread Jason
a pipeline can be described as a sequence of functions that are applied to an 
input with each subsequent function getting the output of the preceding 
function:

out = f6(f5(f4(f3(f2(f1(in))))))

However this isn't very readable and does not support conditionals.

Tensorflow has tensor-focused pipelines:
fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')

I have some code which allows me to mimic this, but with an implied parameter.

import functools

def executePipeline(steps, collection_funcs = [map, filter, reduce]):
    results = None
    for step in steps:
        func = step[0]
        params = step[1]
        if func in collection_funcs:
            print func, params[0]
            results = func(functools.partial(params[0], *params[1:]), results)
        else:
            print func
            if results is None:
                results = func(*params)
            else:
                results = func(*(params + (results,)))
    return results

executePipeline([
    (read_rows, (in_file,)),
    (map, (lower_row, field)),
    (stash_rows, ('stashed_file',)),
    (map, (lemmatize_row, field)),
    (vectorize_rows, (field, min_count)),
    (evaluate_rows, (weights, None)),
    (recombine_rows, ('stashed_file',)),
    (write_rows, (out_file,)),
])

Which gets me close, but I can't control where rows gets passed in. In the 
above code, it is always the last parameter.
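One way to get that control is a placeholder sentinel that marks where the
running result should be injected; a rough sketch (the step functions below
are illustrative stand-ins, not the real row-processing code):

```python
RESULT = object()  # sentinel marking where the running result goes

def execute_pipeline(steps):
    results = None
    for func, params in steps:
        if RESULT in params:
            # Substitute the running result at the marked position.
            args = tuple(results if p is RESULT else p for p in params)
        else:
            # Fall back to the old behavior: append the result last.
            args = params + (results,) if results is not None else params
        results = func(*args)
    return results

# Usage with illustrative stand-in steps:
double_all = lambda rows: [r * 2 for r in rows]
prepend = lambda header, rows: [header] + rows
out = execute_pipeline([
    (list, (range(3),)),
    (double_all, ()),              # result appended as last arg by default
    (prepend, ("head", RESULT)),   # RESULT pins the position explicitly
])
# out == ["head", 0, 2, 4]
```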

I feel like I'm reinventing a wheel here.  I was wondering if there's already 
something that exists?

-- 
https://mail.python.org/mailman/listinfo/python-list


Re: General Purpose Pipeline library?

2017-11-20 Thread Skip Montanaro
> I feel like I'm reinventing a wheel here.  I was wondering if there's already 
> something that exists?

I've wondered from time-to-time about using shell pipeline notation
within Python. Maybe the grapevine package could be a starting point?
I realize that's probably not precisely what you're looking for, but
maybe it will give you some ideas. (I've never used it, just stumbled
on it with a bit of poking around.)

Skip


Re: General Purpose Pipeline library?

2017-11-20 Thread Bob Gailer
On Nov 20, 2017 10:50 AM, "Jason"  wrote:
>
> a pipeline can be described as a sequence of functions that are applied
> to an input with each subsequent function getting the output of the
> preceding function:
>
> out = f6(f5(f4(f3(f2(f1(in))))))
>
> However this isn't very readable and does not support conditionals.
>
> Tensorflow has tensor-focused pipelines:
> fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, scope='fc1')
> fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, scope='fc2')
> out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
>
> I have some code which allows me to mimic this, but with an implied
> parameter.
>
> def executePipeline(steps, collection_funcs = [map, filter, reduce]):
>     results = None
>     for step in steps:
>         func = step[0]
>         params = step[1]
>         if func in collection_funcs:
>             print func, params[0]
>             results = func(functools.partial(params[0], *params[1:]), results)
>         else:
>             print func
>             if results is None:
>                 results = func(*params)
>             else:
>                 results = func(*(params + (results,)))
>     return results
>
> executePipeline([
>     (read_rows, (in_file,)),
>     (map, (lower_row, field)),
>     (stash_rows, ('stashed_file',)),
>     (map, (lemmatize_row, field)),
>     (vectorize_rows, (field, min_count)),
>     (evaluate_rows, (weights, None)),
>     (recombine_rows, ('stashed_file',)),
>     (write_rows, (out_file,)),
> ])
>
> Which gets me close, but I can't control where rows gets passed in. In
> the above code, it is always the last parameter.
>
> I feel like I'm reinventing a wheel here.  I was wondering if there's
> already something that exists?

IBM has had for a very long time a program called Pipelines which runs on
IBM mainframes. It does what you want.

A number of attempts have been made to create cross-platform versions of
this marvelous program.

A long time ago I started but never completed an open source python
version. If you are interested in taking a look at this let me know.


__hash__ and ordered vs. unordered collections

2017-11-20 Thread Josh B.
Suppose we're implementing an immutable collection type that comes in unordered 
and ordered flavors. Let's call them MyColl and MyOrderedColl.

We implement __eq__ such that MyColl(some_elements) == 
MyOrderedColl(other_elements) iff set(some_elements) == set(other_elements).

But MyOrderedColl(some_elements) == MyOrderedColl(other_elements) iff 
list(some_elements) == list(other_elements).

This works just like dict and collections.OrderedDict, in other words.

Since our collection types are immutable, let's say we want to implement 
__hash__.

We must ensure that our __hash__ results are consistent with __eq__. That is, 
we make sure that if MyColl(some_elements) == MyOrderedColl(other_elements), 
then hash(MyColl(some_elements)) == hash(MyOrderedColl(other_elements)).

Now for the question: Is this useful? I ask because this leads to the following 
behavior:

>>> unordered = MyColl([1, 2, 3])
>>> ordered = MyOrderedColl([3, 2, 1])
>>> s = {ordered, unordered}
>>> len(s)
1
>>> s = {ordered}
>>> unordered in s
True
>>> # etc.

In other words, sets and mappings can't tell unordered and ordered apart; 
they're treated like the same values.

This is a bit reminiscent of:

>>> s = {1.0}
>>> True in s
True
>>> d = {1: int, 1.0: float, True: bool}
>>> len(d)
1
>>> # etc.

The first time I encountered this was a bit of an "aha", but to be clear, I 
think this behavior is totally right.

However, I'm less confident that this kind of behavior is useful for MyColl and 
MyOrderedColl. Could anyone who feels more certain one way or the other please 
explain the rationale and possibly even give some real-world examples?

Thanks!

Josh


Re: Is there something like head() and str() of R in python?

2017-11-20 Thread Mario R. Osorio
On Sunday, November 19, 2017 at 2:05:12 PM UTC-5, Peng Yu wrote:
> Hi, R has the functions head() and str() to show the brief content of
> an object. Is there something similar in python for this purpose?
> 
> For example, I want to inspect the content of the variable "train".
> What is the best way to do so? Thanks.
> 
> $ cat demo.py
> from __future__ import division, print_function, absolute_import
> 
> import tflearn
> from tflearn.data_utils import to_categorical, pad_sequences
> from tflearn.datasets import imdb
> 
> # IMDB Dataset loading
> train, test, _ = imdb.load_data(path='imdb.pkl', n_words=1,
> valid_portion=0.1)
> 
> # 
> https://raw.githubusercontent.com/llSourcell/How_to_do_Sentiment_Analysis/master/demo.py
> 
> -- 
> Regards,
> Peng

Python is very good at giving you a string representation of any object. 
However, such capabilities do fall short every now and then. That is why, when 
defining your own classes, you may also want to override the __str__() and 
__repr__() methods so you can get a better-suited string representation of 
such objects.

You can read more at: 
https://stackoverflow.com/questions/12448175/confused-about-str-in-python
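For instance, a class can give itself a head()-like preview (a minimal
illustration; the Dataset class is hypothetical, not part of tflearn):

```python
class Dataset:
    """Hypothetical container that previews itself like R's head()/str()."""
    def __init__(self, rows):
        self.rows = list(rows)

    def __repr__(self):
        # Show only the first few rows instead of the default <... at 0x...>.
        head = ", ".join(repr(r) for r in self.rows[:3])
        return "Dataset(%d rows: [%s, ...])" % (len(self.rows), head)

train = Dataset(range(100))
print(repr(train))  # Dataset(100 rows: [0, 1, 2, ...])
```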


Re: How to Generate dynamic HTML Report using Python

2017-11-20 Thread Michael Torrie
Your thoughts on scope are interesting, if unorthodox.  There is a
problem with your deleting names after use, which is why we rarely
delete names.  The problem is that deleting a name does not
necessarily or immediately destroy an object.  This can lead to great
confusion for programmers coming from a RAII language like C++.  All del
does is delete a name/object binding.  Which is the exact same thing as
reassigning the name to a new object.

Thus I don't see how using del as you do is at all useful, either for
programming correctness or understanding.  In fact I think it might even
be harmful.

Far more useful to teach people to use context handlers when
appropriate.  For example, when working with your sql connection object.
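For example (a sketch using the stdlib's sqlite3; note that `with conn:`
scopes only the transaction, so contextlib.closing handles the close):

```python
import sqlite3
from contextlib import closing

# The connection is closed deterministically when the outer block exits --
# no need to `del conn` afterwards.
with closing(sqlite3.connect(":memory:")) as conn:
    with conn:  # transaction: commits on success, rolls back on error
        conn.execute("CREATE TABLE t (x INTEGER)")
        conn.execute("INSERT INTO t VALUES (1)")
    rows = conn.execute("SELECT x FROM t").fetchall()
# rows == [(1,)]
```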

On 11/20/2017 07:50 AM, Stefan Ram wrote:
>   I am posting to a Usenet newsgroup. I am not aware of any
>   "Python-List mailing list".

As far as I'm concerned, this list is primarily a mailing list, hosted
by Mailman at python.org, and is mirrored to Usenet via a gateway as a
service by python.org.  Granted, this is just a matter of perspective.

>   I am posting specifically to the Usenet, because I am aware
>   of its rules and I like it and wish to support it. 

What rules are these?

I'm curious what news reader you are using as your posts are, well,
unique.  You've set headers that most do not, and your post bodies are
all base64 encoded.  Your quotes and block (un)indents are very odd
also.  A very curious setup.

You also have this header set:
> X-Copyright: (C) Copyright 2017 Stefan Ram. All rights reserved.
> Distribution through any means other than regular usenet
> channels is forbidden. It is forbidden to publish this
> article in the world wide web. It is forbidden to change
> URIs of this article into links. It is forbidden to remove
> this notice or to transfer the body without this notice.

Looks to me like the mailing list needs to block your messages, lest
python.org be in violation of your copyright.


Re: __hash__ and ordered vs. unordered collections

2017-11-20 Thread Chris Angelico
On Tue, Nov 21, 2017 at 4:47 AM, Josh B.  wrote:
> Now for the question: Is this useful? I ask because this leads to the 
> following behavior:
>
> >>> unordered = MyColl([1, 2, 3])
> >>> ordered = MyOrderedColl([3, 2, 1])
> >>> s = {ordered, unordered}
> >>> len(s)
> 1
> >>> s = {ordered}
> >>> unordered in s
> True
> >>> # etc.
>
> In other words, sets and mappings can't tell unordered and ordered apart; 
> they're treated like the same values.
>
> However, I'm less confident that this kind of behavior is useful for MyColl 
> and MyOrderedColl. Could anyone who feels more certain one way or the other 
> please explain the rationale and possibly even give some real-world examples?
>

This isn't a consequence of __hash__, it's a consequence of __eq__.
You have declared that MyColl and MyOrderedColl are equal, therefore
only one of them stays in the set.

But what you have is the strangeness of non-transitive equality, which
is likely to cause problems.

>>> unordered = MyColl([1, 2, 3])
>>> ordered1 = MyOrderedColl([3, 2, 1])
>>> ordered2 = MyOrderedColl([2, 1, 3])

unordered is equal to each of the others, but they're not equal to
each other. So if you put them into a set, you'll get results that
depend on order. Here's a simpler form of non-transitive equality:

>>> class Oddity(int):
... def __eq__(self, other):
... if other - 5 <= self <= other + 5:
... return True
... return int(self) == other
... def __hash__(self):
... return 1
...
>>> x, y, z = Oddity(5), Oddity(10), Oddity(15)
>>> x == y, y == z, x == z
(True, True, False)
>>> {x, y, z}
{5, 15}
>>> {y, x, z}
{10}

Setting __hash__ to a constant value is safe (but inefficient); it's
all based on how __eq__ works. So the question is: are you willing to
accept the bizarre behaviour of non-transitive equality?

ChrisA


Re: __hash__ and ordered vs. unordered collections

2017-11-20 Thread MRAB

On 2017-11-20 17:47, Josh B. wrote:
> Suppose we're implementing an immutable collection type that comes in
> unordered and ordered flavors. Let's call them MyColl and MyOrderedColl.
>
> We implement __eq__ such that MyColl(some_elements) ==
> MyOrderedColl(other_elements) iff set(some_elements) == set(other_elements).

What if there are duplicate elements?

Should that be MyColl(some_elements) == MyOrderedColl(other_elements)
iff len(some_elements) == len(other_elements) and set(some_elements) ==
set(other_elements)?

> But MyOrderedColl(some_elements) == MyOrderedColl(other_elements) iff
> list(some_elements) == list(other_elements).
>
> This works just like dict and collections.OrderedDict, in other words.
>
> Since our collection types are immutable, let's say we want to implement
> __hash__.
>
> We must ensure that our __hash__ results are consistent with __eq__. That
> is, we make sure that if MyColl(some_elements) ==
> MyOrderedColl(other_elements), then hash(MyColl(some_elements)) ==
> hash(MyOrderedColl(other_elements)).
>
> Now for the question: Is this useful? I ask because this leads to the
> following behavior:
>
> >>> unordered = MyColl([1, 2, 3])
> >>> ordered = MyOrderedColl([3, 2, 1])
> >>> s = {ordered, unordered}
> >>> len(s)
> 1
> >>> s = {ordered}
> >>> unordered in s
> True
> >>> # etc.
>
> In other words, sets and mappings can't tell unordered and ordered apart;
> they're treated like the same values.
>
> This is a bit reminiscent of:
>
> >>> s = {1.0}
> >>> True in s
> True
> >>> d = {1: int, 1.0: float, True: bool}
> >>> len(d)
> 1
> >>> # etc.
>
> The first time I encountered this was a bit of an "aha", but to be clear,
> I think this behavior is totally right.
>
> However, I'm less confident that this kind of behavior is useful for
> MyColl and MyOrderedColl. Could anyone who feels more certain one way or
> the other please explain the rationale and possibly even give some
> real-world examples?

If MyColl(some_elements) == MyOrderedColl(other_elements), then 
len({MyColl(some_elements), MyOrderedColl(other_elements)}) == 1 seems 
right.


As for which one is in the set:

>>> {1, 1.0}
{1}
>>> {1.0, 1}
{1.0}

So if MyColl(some_elements) == MyOrderedColl(other_elements), then 
{MyColl(some_elements), MyOrderedColl(other_elements)} == 
{MyColl(some_elements)}.



Re: __hash__ and ordered vs. unordered collections

2017-11-20 Thread Josh B.
On Monday, November 20, 2017 at 1:55:26 PM UTC-5, Chris Angelico wrote:
> But what you have is the strangeness of non-transitive equality, which
> is likely to cause problems.

But this is exactly how Python's built-in dict and OrderedDict behave:

>>> od = OrderedDict([(1, 0), (2, 0), (3, 0)])
>>> od2 = OrderedDict([(3, 0), (2, 0), (1, 0)])
>>> ud = dict(od)
>>> od == ud
True
>>> od2 == ud
True
>>> od == od2
False


Given that, it would seem wrong for our MyOrderedColl.__eq__ to not behave 
similarly.

Or are you suggesting that OrderedDict.__eq__ should not have been implemented 
this way in the first place?


> So the question is: are you willing to
> accept the bizarre behaviour of non-transitive equality?

Forget what I'm personally willing to do :)
The question here actually is to tease out what Python's existing design is 
telling us to do.

If it helps, substitute "frozenset" for "MyColl" and "FrozenOrderedSet" for 
"MyOrderedColl". How would you implement their __eq__ methods? What would be 
the correct design for our hypothetical frozen(ordered)set library? What would 
be more useful, intuitive, and usable for our users?

Thanks very much for the good examples and for helping me clarify the question!


Re: __hash__ and ordered vs. unordered collections

2017-11-20 Thread Josh B.
On Monday, November 20, 2017 at 2:31:40 PM UTC-5, MRAB wrote:
> What if there are duplicate elements?
> 
> Should that be MyColl(some_elements) == MyOrderedColl(other_elements) 
> iff len(some_elements) == len(other_elements) and set(some_elements) == 
> set(other_elements)?

Yes, that's what I meant. Thanks for catching :)

Please let me know if you have any thoughts on how you would design our 
hypothetical frozen(ordered)set library to behave in these cases.


Re: How to Generate dynamic HTML Report using Python

2017-11-20 Thread Chris Angelico
On Tue, Nov 21, 2017 at 5:47 AM, Michael Torrie  wrote:
> You also have this header set:
>> X-Copyright: (C) Copyright 2017 Stefan Ram. All rights reserved.
>> Distribution through any means other than regular usenet
>> channels is forbidden. It is forbidden to publish this
>> article in the world wide web. It is forbidden to change
>> URIs of this article into links. It is forbidden to remove
>> this notice or to transfer the body without this notice.
>
> Looks to me like the mailing list needs to block your messages, lest
> python.org be in violation of your copyright.

Is that kind of copyright notice even enforceable? Personally, if I
saw a header like that, I'd plonk the person, because anyone who says
"please don't read my messages in any way other than the way I've
stipulated" might as well be saying "please don't read my messages".
It's not worth the hassle.

ChrisA


Re: __hash__ and ordered vs. unordered collections

2017-11-20 Thread Chris Angelico
On Tue, Nov 21, 2017 at 6:50 AM, Josh B.  wrote:
> On Monday, November 20, 2017 at 1:55:26 PM UTC-5, Chris Angelico wrote:
>> But what you have is the strangeness of non-transitive equality, which
>> is likely to cause problems.
>
> But this is exactly how Python's built-in dict and OrderedDict behave:
>
> >>> od = OrderedDict([(1, 0), (2, 0), (3, 0)])
> >>> od2 = OrderedDict([(3, 0), (2, 0), (1, 0)])
> >>> ud = dict(od)
> >>> od == ud
> True
> >>> od2 == ud
> True
> >>> od == od2
> False
>
>
> Given that, it would seem wrong for our MyOrderedColl.__eq__ to not behave 
> similarly.
>
> Or are you suggesting that OrderedDict.__eq__ should not have been 
> implemented this way in the first place?
>
>
>> So the question is: are you willing to
>> accept the bizarre behaviour of non-transitive equality?
>
> Forget what I'm personally willing to do :)
> The question here actually is to tease out what Python's existing design is 
> telling us to do.
>
> If it helps, substitute "frozenset" for "MyColl" and "FrozenOrderedSet" for 
> "MyOrderedColl". How would you implement their __eq__ methods? What would be 
> the correct design for our hypothetical frozen(ordered)set library? What 
> would be more useful, intuitive, and usable for our users?
>
> Thanks very much for the good examples and for helping me clarify the 
> question!
>

What I'm saying is that non-transitive equality can cause a lot of
confusion in sets/dicts; since OrderedDict and dict are unhashable,
they won't themselves be problematic, and Python doesn't have a
built-in FrozenOrderedSet. So there isn't really a precedent here, and
it's up to you to decide how you want to deal with this.

Basically, you're going to have to accept one of two situations:
* Either your class doesn't behave the same way dict and OD do
* Or your class, when put into a set, depends on ordering.

Neither is perfect. You have to take your pick between them.

ChrisA


Re: General Purpose Pipeline library?

2017-11-20 Thread Marko Rauhamaa
[email protected] (Stefan Ram):

> Jason  writes:
>>I feel like I'm reinventing a wheel here.  I was wondering if
>>there's already something that exists?
>
>   Why do you want this?

Some time back Steven D'Aprano demonstrated how the | operator can be
defined to create pipelines in Python. As a hobby project, I started
developing the idea further into a Python-based shell (I call it
"snake"). I kinda proved to myself that it is very much doable and left
it at that.

For example:

   $ ./snake 
   >>> ls()
   notes.txt
   .git
   snake
   notes.txt~
   >>> ls() | cat()
   notes.txt
   .git
   snake
   notes.txt~
   >>> ls() | grep(lambda x: "n" in x)
   notes.txt
   snake
   notes.txt~
   >>> sleep(5)
   >>> sleep(5).bg()
   29766
   >>> 
   [29766] Done: sleep(5)
   >>> X("/bin/echo hello")
   hello
   >>> X("/bin/seq 20") | grep(lambda x: "2" in x)
   2
   12
   20
   >>> 

So snake is just a regular Python REPL with some predefined things that
implement a full-fledged Unix shell, courtesy of the amazingly complete
Linux system call support by Python.

The pipelines relay byte streams, line sequences or JSON arrays. The
pipeline processors are either generators or processes.
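That `|` chaining can be sketched in pure Python with `__ror__` (a toy
illustration of the technique, not snake's actual implementation):

```python
class Pipe:
    """Right-hand pipeline stage: `iterable | stage` feeds the iterable in."""
    def __init__(self, func, *args):
        self.func, self.args = func, args

    def __ror__(self, iterable):
        # Invoked for `iterable | self`, since list has no matching __or__.
        return self.func(iterable, *self.args)

    def __call__(self, *args):
        # Binding extra arguments returns a new, configured stage.
        return Pipe(self.func, *args)

@Pipe
def grep(lines, pred):
    return [line for line in lines if pred(line)]

print(["notes.txt", ".git", "snake"] | grep(lambda x: "n" in x))
# ['notes.txt', 'snake']
```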

>  Can't you just map what you want to do to plain-old Python?

The above is plain Python, but it might be more pythonesque to do the
pipelining using the dot notation:

   feed(dataset).f().g().h().output()


Marko


Re: "help( pi )"

2017-11-20 Thread Cameron Simpson

On 20Nov2017 10:49, Greg Ewing  wrote:
> Cameron Simpson wrote:
>> Unless one had a misfortune and wanted another docstring.
>
> Good point. I guess having differing docstrings should make
> otherwise equal objects ineligible for merging.

[...example...]

> I think setting the docstring of an existing immutable object
> would have to be disallowed -- you need to create a new object
> if you want it to have a distinct docstring, e.g.
>
> MAX_BUFSIZE = int(8192, __doc__ = 'Size of the hardware buffer used
> for I/O on this device.')

Which is painful and elaborate. In my original post I had written:

  Now, I accept that the "CPython coalesces some values to shared singletons" 
  thing is an issue, but the language doesn't require it, and one could change 
  implementations such that applying a docstring to an object _removed_ it from 
  the magic-shared-singleton pool, avoiding conflicts with other uses of the 
  same value by coincidence.

hoping for automatic arrangement of that.

Cheers,
Cameron Simpson  (formerly [email protected])


Re: General Purpose Pipeline library?

2017-11-20 Thread duncan smith
On 20/11/17 15:48, Jason wrote:
> a pipeline can be described as a sequence of functions that are applied to an 
> input with each subsequent function getting the output of the preceding 
> function:
> 
> out = f6(f5(f4(f3(f2(f1(in))))))
> 
> However this isn't very readable and does not support conditionals.
> 
> Tensorflow has tensor-focused pipelines:
> fc1 = layers.fully_connected(x, 256, activation_fn=tf.nn.relu, 
> scope='fc1')
> fc2 = layers.fully_connected(fc1, 256, activation_fn=tf.nn.relu, 
> scope='fc2')
> out = layers.fully_connected(fc2, 10, activation_fn=None, scope='out')
> 
> I have some code which allows me to mimic this, but with an implied parameter.
> 
> def executePipeline(steps, collection_funcs = [map, filter, reduce]):
>   results = None
>   for step in steps:
>   func = step[0]
>   params = step[1]
>   if func in collection_funcs:
>   print func, params[0]
>   results = func(functools.partial(params[0], 
> *params[1:]), results)
>   else:
>   print func
>   if results is None:
>   results = func(*params)
>   else:
>   results = func(*(params+(results,)))
>   return results
> 
> executePipeline( [
>   (read_rows, (in_file,)),
>   (map, (lower_row, field)),
>   (stash_rows, ('stashed_file', )),
>   (map, (lemmatize_row, field)),
>   (vectorize_rows, (field, min_count,)),
>   (evaluate_rows, (weights, None)),
>   (recombine_rows, ('stashed_file', )),
>   (write_rows, (out_file,))
>   ]
> )
> 
> Which gets me close, but I can't control where rows gets passed in. In the 
> above code, it is always the last parameter.
> 
> I feel like I'm reinventing a wheel here.  I was wondering if there's already 
> something that exists?
> 

Maybe Kamaelia?

http://www.kamaelia.org/Home.html

Duncan