Re: [Tutor] look back comprehensively
On 12/24/18 5:45 PM, Avi Gross wrote:
> As for the UNIX tools, one nice thing about them was using them in a
> pipeline where each step made some modification and often that merely
> allowed the next step to modify that. The solution did not depend on one
> tool doing everything.

I know we're wandering off topic here, but I miss the days when this philosophy was more prevalent: "do one thing well" and be prepared to pass your results on in a way that a different tool could potentially consume, doing its one thing well, and so on if needed. Of course equivalents of those old UNIX tools are still with us, mostly thanks to the GNU umbrella of projects, but so many current tools have grown so many capabilities that they can no longer interact with other tools in any sane way. "Pipes and filters" seems destined to be consigned to the dustbin of tech history. I'll shut up now...

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
https://mail.python.org/mailman/listinfo/tutor
[Tutor] decomposing a problem
[Long enough that some should neither read nor comment on.]

Mats raised an issue that I think does relate to how to tutor people in python. The issue is learning how to take a PROBLEM that looks massive and find ways to look at it as a series of steps, where each step can be easily solved using available tools and techniques OR can recursively be decomposed into smaller parts that can.

Many people learn to program without first learning how to write down several levels of requirements that spell out how each part of the overall result needs to look and, finally, how each part will be developed and tested. I worked in organizations with a division of labor meant to put this waterfall method in place. At times I would write higher-level architecture documents followed by Systems Engineering documents and Developer documents and Unit Test and System Test and even Field Support. The goal was to move from abstract to concrete so that the actual development was mainly writing fairly small functions, often used multiple times, and gluing them together.

I looked back at the kind of tools used in UNIX and realized how limited they were relative to what is easily done in languages like python, especially given the huge tool set you can import. The support for passing the output of one program to another made it easy to build pipelines. You can do that in python too but rarely need to. And I claim there are many easy ways to do things even better in python.

Many UNIX tools were simple filters. One would read a file or two and pass some of the lines, perhaps altered, to the standard output. The next process in the pipeline would often do the same, with a twist, and sometimes new lines might even be added. Simple tools like cat and grep and sed loosely fit the filter analogy: they worked on a line at a time, mostly. The more flexible tools like AWK and Perl are frankly more like Python than the simple tools.
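For comparison, here is a minimal sketch of a UNIX-style pipeline in Python using generators, where each stage consumes lines and yields lines the way a filter reads stdin and writes stdout. The stage names cat, grep and upper are hypothetical, chosen only to echo the shell commands, and an in-memory StringIO stands in for a real file:

```python
import io

# Each stage is a generator: it consumes an iterable of lines and yields
# lines, much like a UNIX filter reading stdin and writing stdout.

def cat(lines):
    """Pass lines through, dropping the trailing newline."""
    for line in lines:
        yield line.rstrip("\n")

def grep(pattern, lines):
    """Keep only lines containing a fixed string, like `grep -F pattern`."""
    for line in lines:
        if pattern in line:
            yield line

def upper(lines):
    """Transform each line, like `tr a-z A-Z`."""
    for line in lines:
        yield line.upper()

# An in-memory "file" so the sketch runs without touching the disk.
sample = io.StringIO("this is one\nskip me\nand this too\n")

# Similar in spirit to: cat sample | grep this | tr a-z A-Z
result = list(upper(grep("this", cat(sample))))
print(result)  # ['THIS IS ONE', 'AND THIS TOO']
```

Like a shell pipeline, the stages are lazy: nothing is read until list() starts pulling lines through, so the same code works on files too large to load at once.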
So if you had a similar task to do in python, is there really much difference? I claim not so much. Python has quite a few ways to do a filter. One simple one is a list comprehension and its relatives. Other variations are the map and filter functions and even reduce. Among other things, they can accept a list of lines of text and apply changes to them, keep just a subset, or even calculate a result from them.

Let me be concrete. You have a set of lines to process. You want to find all lines that pass through a gauntlet, perhaps with changes along the way. So assume you read an entire file (all at once at THIS point) into a list of lines:

    stuff = open(...).readlines()

Condition 1 might be to keep only lines that have some word or pattern in them. You might have used sed or grep in the UNIX shell to specify a fixed string or pattern to search for. So in python, what might you do? Since stuff is a list, something like a list comprehension can handle many such needs:

    stuff2 = [some_function(line) for line in stuff if some_condition(line)]

For a fixed string like "this", the condition might be:

    "this" in line

Or it might be that the line ends with some phrase. Or it might be a regular expression search. Or it might be that the length is long enough or the number of words short enough. Every such condition can be one of the same things used in a UNIX pipeline, or a brand new idea not available there, like whether a line translates into a set of numbers that are all prime! And the function applied to what is kept can transform it to uppercase, or replace it with something else looked up in a dictionary, and so on.

You might even be able to apply multiple filters in each step. Python allows phrases like line.strip().upper() and conditions like:

    this or (that and not something_else)

The point is that a single line like the list comprehension above may already do what a pipeline of 8 simple commands in UNIX did, and more.
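To make the gauntlet concrete, here is a small sketch on a few invented sample lines (stuff is a hypothetical in-memory list rather than a file). The helper keep() is a hypothetical stand-in for a condition of the form "this or (that and not something_else)", and the last part shows the same filtering with filter() and map(), plus a single summary value computed with reduce():

```python
from functools import reduce

stuff = ["this line stays\n", "that one goes\n", "  and this one too  \n", "short\n"]

# Keep lines containing "this", then strip and uppercase the survivors.
stuff2 = [line.strip().upper() for line in stuff if "this" in line]
print(stuff2)  # ['THIS LINE STAYS', 'AND THIS ONE TOO']

# A compound condition: this or (that and not something_else).
def keep(line):
    return "this" in line or ("that" in line and "goes" not in line)

stuff3 = [line.strip() for line in stuff if keep(line)]
print(stuff3)  # ['this line stays', 'and this one too']

# The same filter written with filter() and map() ...
stuff4 = list(map(str.strip, filter(keep, stuff)))
# ... and reduce() folding the kept lines into one number: total word count.
total_words = reduce(lambda acc, line: acc + len(line.split()), stuff4, 0)
print(stuff4 == stuff3, total_words)  # True 7
```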
Some of the other things UNIX tools did might involve taking a line, breaking it into chunks at a comma or tab or space, and then keeping just the third and fifth and eighth but in reverse order. We sometimes used commands like cut or very brief AWK scripts to do that. Again, this can be trivial to do in python. Built into character strings are methods that let you split a line like the above into a list of fields on a separator, and you can then rearrange and even rejoin them. In the above list comprehension method, if you are expecting eight comma-separated regions:

    >>> line1 = "f1,f2,f3,f4,f5,f6,f7,f8"
    >>> line2 = "g1,g2,g3,g4,g5,g6,g7,g8"
    >>> lines = [line1, line2]
    >>> splitsville = [line.split(',') for line in lines]
    >>> splitsville
    [['f1', 'f2', 'f3', 'f4', 'f5', 'f6', 'f7', 'f8'], ['g1', 'g2', 'g3', 'g4', 'g5', 'g6', 'g7', 'g8']]
    >>> items8_5_3 = [(h8, h5, h3) for (h1,h2,h3,h4,h5,h6,h7,h8) in splitsville]
    >>> items8_5_3
    [('f8', 'f5', 'f3'), ('g8', 'g5', 'g3')]

Or if you want them back as character with an und
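A variation on the field-picking example, sketched with operator.itemgetter from the standard library. Unlike unpacking into exactly eight names, itemgetter only needs the line to have at least eight fields (fewer raises IndexError). Rejoining with a comma is just one illustrative choice of separator:

```python
from operator import itemgetter

lines = ["f1,f2,f3,f4,f5,f6,f7,f8", "g1,g2,g3,g4,g5,g6,g7,g8"]

# itemgetter(7, 4, 2) picks the 8th, 5th and 3rd fields (0-based indexes)
# and returns them as a tuple, in the order given.
pick_8_5_3 = itemgetter(7, 4, 2)
items8_5_3 = [pick_8_5_3(line.split(",")) for line in lines]
print(items8_5_3)  # [('f8', 'f5', 'f3'), ('g8', 'g5', 'g3')]

# Rejoin the chosen fields into a line again, roughly what `cut` would emit.
rejoined = [",".join(fields) for fields in items8_5_3]
print(rejoined)  # ['f8,f5,f3', 'g8,g5,g3']
```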
Re: [Tutor] decomposing a problem
On 26/12/2018 00:00, Avi Gross wrote:
> great. Many things in python can be made to fit and some need work. Dumb
> example is that sorting something internally returns None and not the object
> itself.

This is one of my few complaints about Python. In Smalltalk the default return value from any method is self. In Python it is None.

self allows chaining of methods, None does not. Introducing features like reversed() and sorted() partially addresses the issue but leads to inconsistent and ugly syntax.

Smalltalk uses this technique so much it has its own code layout idiom (Pythonised as follows):

    object
       .method1()
       .method2()
       .method3()
       .lastone()

We can do this with some methods but not all. And of course methods that return a different type of value require careful handling (e.g. an index() call in the middle of a set of list operations means the subsequent methods are being called on an int, not a list, which if handled correctly can be confusing and if not handled correctly produces errors!). (The idiomatic way says don't chain with methods not returning self!)

In practice I (and the Smalltalk community) don't find that an issue in real world usage, but it may have been why Guido chose not to do it that way. But I still curse the decision every time I hit it!

But as I said, it's about the only thing in Python I dislike... a small price to pay.

-- 
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
http://www.amazon.com/author/alan_gauld
Follow my photo-blog on Flickr at:
http://www.flickr.com/photos/alangauldphotos
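In real Python, the layout above only parses when the whole chain sits inside parentheses (or uses backslash continuations); a line beginning with .method1() on its own is a syntax error. A small sketch using str methods, which always return a new str and so chain cleanly, followed by the list.sort() case that does not:

```python
# Inside parentheses, a chained expression may span several lines, which
# approximates the layout above. It only works while each call returns an
# object that has the next method (str methods return str).
text = "  Hello, World  "
result = (text
          .strip()
          .lower()
          .replace(",", "")
          .split())
print(result)  # ['hello', 'world']

# list.sort() sorts in place and returns None, so nothing can follow it.
numbers = [3, 1, 2]
print(numbers.sort())  # None
print(numbers)         # [1, 2, 3]
```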
Re: [Tutor] decomposing a problem
On Wed, Dec 26, 2018 at 01:06:04AM +, Alan Gauld via Tutor wrote:
> In Smalltalk the default return value from
> any method is self. In Python it is None.
>
> self allows chaining of methods, None does not.

You might be interested in this simple recipe for retrofitting method chaining onto any class:

http://code.activestate.com/recipes/578770-method-chaining-or-cascading/

-- 
Steve
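The linked recipe's code is not reproduced here. As a rough, hypothetical illustration of the same general idea, this wrapper class (Chain, with an unwrap() method, both invented names) forwards attribute lookups to the wrapped object and substitutes itself whenever a method returns None:

```python
class Chain:
    """Hypothetical wrapper: methods that return None return the wrapper.

    Not the code from the linked recipe; just one way to get the effect.
    """

    def __init__(self, obj):
        self._obj = obj

    def __getattr__(self, name):
        # Only called for names not found on Chain itself (e.g. append, sort).
        attr = getattr(self._obj, name)
        if not callable(attr):
            return attr

        def method(*args, **kwargs):
            result = attr(*args, **kwargs)
            # In-place methods such as list.sort() return None; hand back the
            # wrapper so another call can follow. Other results pass through.
            return self if result is None else result

        return method

    def unwrap(self):
        """Return the underlying object."""
        return self._obj


data = Chain([3, 1, 2])
print(data.append(0).sort().unwrap())  # [0, 1, 2, 3]
```

One caveat of this design: a method that legitimately returns None as a meaningful value becomes indistinguishable from an in-place mutator.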
Re: [Tutor] decomposing a problem
On 26Dec2018 01:06, Alan Gauld wrote:
>On 26/12/2018 00:00, Avi Gross wrote:
>> great. Many things in python can be made to fit and some need work. Dumb
>> example is that sorting something internally returns None and not the
>> object itself.
>
>This is one of my few complaints about Python.
>In Smalltalk the default return value from any method is self. In Python it is None.
>self allows chaining of methods, None does not.
[...]
>Smalltalk uses this technique so much it has its own code layout idiom
>(Pythonised as follows):
>
>object
>   .method1()
>   .method2()
>   .method3()
>   .lastone()

While I see your point, the Python distinction is that methods returning values tend to return _independent_ values; the original object is not normally semantically changed. As you know.

To take the builtin sorted() example, let us suppose object is a collection, such as a list. I would not want:

    object.sort()

to return the list because that method has a side effect on object.

By contrast, I'd be happy with a:

    object.sorted()

method returning a new list because it hasn't changed object, and it returns a nice chaining-capable object for continued use.

But that way lies a suite of doubled methods for most classes: one to apply some operation to an object, modifying it, and its partner to produce a new object (normally of the same type) being a copy of the first object with the operation applied.

To me it is the side effect on the original object which weighs against modification methods returning self.

Here's a shiny counter example for chaining:

    thread1:
      print(object.sort())
    thread2:
      print(object.sort(reverse=True))

The above employs composable methods. And they conflict.

When methods return a copy the above operation is, loosely speaking, safe:

    thread1:
      print(sorted(object))
    thread2:
      print(sorted(object, reverse=True))

Cheers,
Cameron Simpson
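The distinction Cameron draws, shown with the real built-ins: sorted() returns a new list and leaves its argument alone, while list.sort() reorders in place and returns None, which signals that a side effect happened:

```python
original = [3, 1, 2]

# sorted() builds and returns a new list; the original is left alone.
new_list = sorted(original, reverse=True)
print(new_list, original)  # [3, 2, 1] [3, 1, 2]

# list.sort() reorders the list in place and returns None.
result = original.sort()
print(result, original)    # None [1, 2, 3]
```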
Re: [Tutor] decomposing a problem
Alan,

Your thoughts were helpful and gave me a hint. Just an idea. What if you sub-classed an object type like list with a name like chainable_list? For most things it would be left alone. But if you isolated specific named methods like sort() and reverse(), you could override them with the same name or a new name. If you override the method, you need to call list.sort() with whatever arguments you had passed and then return this. If you choose a new name, call this.sort() and then return this.

I tried it and it seems to work fine when I use a new name:

    """Module to create a version of list that is more chainable"""

    class chainable_list(list):
        """Same as list but sort() can now be chained"""
        def chainsort(this, *args, **kwargs):
            this.sort(*args, **kwargs)
            return this

Here it is on a list of ints:

    >>> testink = chainable_list([3,5,1,7])
    >>> testink
    [3, 5, 1, 7]
    >>> testink.chainsort()
    [1, 3, 5, 7]
    >>> testink.chainsort(reverse=True)
    [7, 5, 3, 1]

Here it is on a list of strings, which sort differently unless coerced into ints, to show that keyword arguments are passed along:

    >>> testink = chainable_list(["3","15","1","7"])
    >>> testink.chainsort()
    ['1', '15', '3', '7']
    >>> testink.chainsort(reverse=True)
    ['7', '3', '15', '1']
    >>> testink.chainsort(key=int,reverse=True)
    ['15', '7', '3', '1']

I then tested the second method, using the same name but asking the original list sort to do the work:

    """Module to create a version of list that is more chainable"""

    class chainable_list(list):
        """Same as list but sort() can now be chained"""
        def sort(this, *args, **kwargs):
            list.sort(this, *args, **kwargs)
            return this

    >>> testink = chainable_list(["3","15","1","7"])
    >>> testink.sort()
    ['1', '15', '3', '7']
    >>> testink.sort().sort(reverse=true)
    Traceback (most recent call last):
      File "", line 1, in
        testink.sort().sort(reverse=true)
    NameError: name 'true' is not defined
    >>> testink.sort().sort(reverse=True)
    ['7', '3', '15', '1']
    >>> testink.sort().sort(reverse=True).sort(key=int)
    ['1', '3', '7', '15']

Again, it works fine. So if someone did something similar to many of the methods that now return None, you could use the new class when needed. This seems too simple, so it must have been done. Obviously not in the standard distribution, but perhaps elsewhere.

And, no, I do not expect a method like pop() to suddenly return the list with a member dropped, but it would be nice to fix some like this one:

    >>> testink.remove('7')
    >>> testink
    ['1', '3', '15']

Meanwhile, I hear Beethoven is decomp..., well never mind! It was probably Liszt!

-Original Message-
From: Tutor On Behalf Of Alan Gauld via Tutor
Sent: Tuesday, December 25, 2018 8:06 PM
To: tutor@python.org
Subject: Re: [Tutor] decomposing a problem
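Extending the chainable_list idea to the other in-place list methods, such as remove(), can be sketched without writing each override by hand. ChainableList, _CHAINED and _make_chained are hypothetical names. This version uses self, the conventional name, and leaves methods like pop() and index() alone because they return meaningful values:

```python
# Names of list methods that mutate in place and normally return None.
_CHAINED = ("append", "extend", "insert", "remove", "reverse", "sort", "clear")


class ChainableList(list):
    """A list whose in-place mutators return self, so calls can be chained.

    Only the methods named in _CHAINED are changed; pop(), index() and the
    rest still return what list returns.
    """


def _make_chained(name):
    base_method = getattr(list, name)

    def chained(self, *args, **kwargs):
        base_method(self, *args, **kwargs)  # do the normal in-place work
        return self                         # then hand back the list itself

    chained.__name__ = name
    chained.__doc__ = base_method.__doc__
    return chained


# Attach a chaining wrapper for each mutating method to the subclass.
for _name in _CHAINED:
    setattr(ChainableList, _name, _make_chained(_name))


items = ChainableList(["3", "15", "1", "7"])
print(items.sort(key=int).remove("7").reverse())  # ['15', '3', '1']
```

The factory function matters: defining chained directly inside the for loop would make every wrapper see the last value of the loop variable, a classic late-binding closure bug.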
Re: [Tutor] decomposing a problem
On Tue, Dec 25, 2018 at 10:25:50PM -0500, Avi Gross wrote:
> class chainable_list(list):
>     """Same as list but sort() can now be chained"""
>     def chainsort(this, *args, **kwargs):
>         this.sort(*args, **kwargs)
>         return this

In Python, it is traditional to use "self" rather than "this" as the instance parameter.

Using "this" is not an error, but you can expect a lot of strange looks. Like a Scotsman in a kilt wandering down the middle of Main Street, Pleasantville USA.

-- 
Steve
Re: [Tutor] decomposing a problem
Mike,

Excellent advice.

I find that many people are fairly uncomfortable with abstraction and tend to resist a pure top down approach by diving into whatever solutions they may envision. For example, suppose you say things like: create a data structure that can hold as many kinds of information as will be needed. The data should be viewable in several ways, and adding a new item should be fast even if the number of items grows large ...

Some will have stopped reading (or creating) and will jump to deciding they need a dictionary. Others may want a deque. Some may insist they need a new class. But wait: if you continue reading or designing, it may become clear that some choices are not optimal. Heck, it may turn out some design elements are contradictory.

As someone asked on another python list: is there a better way to get a random key for a dictionary? Well, not easily without expanding all keys into a list of perhaps huge length, followed by a search of much of that list to get the nth index. So maybe a plain dictionary does not make that easy or efficient. Do you give up that need, or use some other data structure that makes it fast? Perhaps you need a hybrid data structure. One weird idea is to use the dictionary but, every time you add a new key/value pair, also store a second pair that looks like "findkey666": key, so that a random key of the first kind can be found in constant time by picking a random number up to half the number of items, concatenating it to "findkey" and looking up the value, which is a key.

When you try to work bottom up with students, some see no point as they are missing the big picture. During graduate school I worked writing PASCAL code for a company making flexible manufacturing systems, and my job often was to read a man page describing some function that did something minor. I often had no clue why it was needed or where it would be used.
I was sometimes told it had to FIT into a certain amount of memory because of the overlay technique used, and if it compiled to something larger, I was asked to break the function down into multiple functions that were called alternately. Sometimes an entire section had to be redesigned because it had to fit into the same footprint as another. That was the limit of the big picture. A shadow!

What I found works for me, in teaching, is a combination. You give them just enough of the top-down view for motivation. Then you say that we need to figure out what kinds of things might be needed to support the functionality. This includes modules to import as well as objects or functions to build. But that too can be hard unless you move back into the middle and explain a bit about the subunit you are building, so you know what kind of support it needs closer to the bottom.

I admit that my personal style is the wrong one for most people. I do top down and bottom up simultaneously, and also jump into the middle to look both ways, to try to make sure the parts will meet fairly seamlessly. It does not always work. How often have we seen a project where some function is designed with three arguments, and much later you find out that some uses of the function have and need only two, while others need additional arguments, perhaps to pass along to yet another function that it conditionally invokes? It may turn out that the bottom up approach, starting from one corner, assumed the function would easily meet multiple needs when the needs elsewhere are not identical enough. If they keep demanding one function to master all, you can end up with fairly awful spaghetti code.

Of course python is not a compiled language like C/C++ and PASCAL and many others. It is often fairly easy in python to accept a variable number of arguments, or for the same function to accept multiple types and do something reasonable for each.

One thing I warn people about is mission creep.
When asked to do something, try not to add lots of nice features, at least until you have developed and tested the main event. I have seen many projects that felt the need to add every feature they could imagine, as long as there remained keys on the keyboard that did not yet invoke some command, even if no customer ever asked for it or would ever use it. Amazing how often these projects took too long and came to market too late to catch on ...

Some of the people asking questions here do not even tell us much about what is needed, let alone their initial design plan. It can take multiple interactions back and forth, and I wonder how many give up long before that, as they just want an ANSWER.

In case you wonder, I am reliably told the answer to life, the universe and everything is 2*21.

-Original Message-
From: Mike Mossey
Sent: Tuesday, December 25, 2018 9:49 PM
To: Avi Gross
Subject: Re: [Tutor] decomposing a problem

> On Dec 25, 2018, at 4:00 PM, Avi Gross wrote:
>
> [Long enough that some should nei
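A sketch of the kind of hybrid structure mentioned in this message. It swaps in a different, commonly used layout rather than the "findkey" entries described: a separate list of keys plus a dict mapping each key to its position in that list, so a random key is one list index away and a deletion just moves the last key into the gap. RandomKeyDict and its methods are hypothetical names, and only the handful of operations needed for the illustration are implemented:

```python
import random


class RandomKeyDict:
    """Hypothetical dict-like store that can pick a random key in O(1) time.

    Keeps three structures in step: the key -> value mapping, a list of keys
    for random indexing, and a key -> position map so deletion is also O(1).
    """

    def __init__(self):
        self._values = {}  # key -> value
        self._keys = []    # every key, in arbitrary order
        self._pos = {}     # key -> index of that key in self._keys

    def __setitem__(self, key, value):
        if key not in self._values:          # new key: record its position
            self._pos[key] = len(self._keys)
            self._keys.append(key)
        self._values[key] = value

    def __getitem__(self, key):
        return self._values[key]

    def __delitem__(self, key):
        del self._values[key]                # raises KeyError if missing
        index = self._pos.pop(key)
        last = self._keys.pop()              # remove the final slot
        if index < len(self._keys):          # deleted key was not in that slot:
            self._keys[index] = last         # move the final key into the gap
            self._pos[last] = index

    def __len__(self):
        return len(self._keys)

    def random_key(self):
        return random.choice(self._keys)     # list indexing is constant time


store = RandomKeyDict()
for name in ["ant", "bee", "cat"]:
    store[name] = len(name)
del store["bee"]
print(len(store), store.random_key() in ("ant", "cat"))  # 2 True
```

Whether the extra bookkeeping is worth it depends on how often random keys are needed and how large the dictionary really is.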
Re: [Tutor] decomposing a problem
[REAL SUBJECT: What's this?]

Steven,

I am afraid you are right. I was not selfish enough about this.

I have done object-oriented programming in many other languages and I am afraid today it showed. Think C++ or Java. Part of me continues to think in every language I ever used, including human languages. Since the name of this parameter is only a suggestion, it was not enforced by the interpreter and I was not reminded.

Be happy I even used an English word and not something like idempotent or eponymous.

P.S. Just to confuse the issue, some in JavaScript confusingly use both this and self near each other.

P.P.S. Please pardon my puns, especially the ones you did not notice.

-Original Message-
From: Tutor On Behalf Of Steven D'Aprano
Sent: Tuesday, December 25, 2018 11:39 PM
To: tutor@python.org
Subject: Re: [Tutor] decomposing a problem
Re: [Tutor] decomposing a problem
On Tue, Dec 25, 2018 at 11:56:21PM -0500, Avi Gross wrote:
> I find that many people are fairly uncomfortable with abstraction and
> tend to resist a pure top down approach by diving to any solutions
> they may envision.

https://blog.codinghorror.com/it-came-from-planet-architecture/

> As someone asked on another python list,
> is there a better way to get a random key for a dictionary. Well, not
> easily without expanding all keys into a list of perhaps huge length.

Define "better". What do you value? Time, space, simplicity or something else?

One of the most harmful things to value is "cleverness" for its own sake. Some people tend to value a "clever" solution even when it wastes time and space and is over-complex, and therefore hard to maintain or debug. Even when the practical response to the "clever" solution is "YAGNI".

What counts as "huge"? To me, picking a random key from a list of 100 keys is "huge". Copy out 100 keys to a list by hand and then pick one? What a PITA that would be.

But to your computer, chances are that ten million keys is "small". One hundred million might be pushing "largish". A billion, or perhaps ten billion, could be "large". Fifty, a hundred, maybe even a thousand billion (a trillion) would be "huge".

Unless you expect to be handling at least a billion keys, there's probably no justification for anything more complex than:

    random.choice(list(mydict.keys()))

Chances are that it will be faster *and* use less memory than any clever solution you come up with, and even if it does use more memory, it uses it for a few milliseconds, only when needed, unlike a more complex solution that inflates the size of the data structure all the time, whether you need it or not.

Of course there may be use-cases where we really do need a more complex, clever solution, and are willing to trade off space for time (or sometimes time for space). But chances are YAGNI.

> Followed by a search of much of that list to get the nth index.

That's incorrect.
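A runnable version of the simple approach, assuming a hypothetical dictionary named inventory with 100,000 keys. random.choice() needs an indexable sequence, which is why the keys are copied into a temporary list first:

```python
import random

inventory = {f"key{i}": i for i in range(100_000)}

# Copy the keys into a temporary list and pick one uniformly at random.
# The list is discarded as soon as the expression finishes.
key = random.choice(list(inventory))
print(key in inventory)  # True
```

Iterating a dict yields its keys, so list(inventory) is equivalent to list(inventory.keys()).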
Despite the name, Python lists aren't linked lists[1] where you have to traverse N items to get to the Nth item. They're arrays, where indexing requires constant time.

[...]

> If they keep demanding one function to master all, you can end up with
> fairly awful spaghetti code.

https://en.wikipedia.org/wiki/God_object

[1] Technically speaking, this is not a requirement of the language, only a "quality of implementation" question. A Python interpreter could offer built-in lists using linked lists under the hood, with O(N) indexing. But all the major implementations (CPython, Stackless, PyPy, Jython, IronPython, Cython, Nuitka, even (I think) MicroPython) use arrays as the list implementation. Given how simple arrays are, I think it is fair to assume that any reasonable Python interpreter will do the same.

-- 
Steve
Re: [Tutor] decomposing a problem
Steven showed a more abstract solution than the one I tried, but Cameron makes some good points on whether it might not be a great idea to chain some side-effect operations.

I have seen languages where everything seems to be immutable. Python does this in places, as with tuples. The idea is that every change results in a new copy of things, sort of. But in such a world, there is no real concept of making a change internally. The name pointing to the object is all that remains the same; the underlying object is a new copy if any changes are needed.

So if I made a deep copy of "this" and returned that, then what would happen to the original? Would it still be changed for anything else that cared?

When I am doing a pipeline, I really may not care about the original. At every point I care about propagating my changes. " Hello World ".lower() becomes " hello world " at that point. If I then add a .rstrip() and a .lstrip(), each produces a new string without some whitespace. I don't care if the original is affected. If I then add a request to get just part of the string, again, the original is intact. The only time it is changed is when I assign the result back to the original variable, and even then, anything else also pointing to it is unchanged.

Python has plenty of operations of multiple kinds. Some return a shallow copy and some a deep copy and some an altered copy and some an altered original and some change NOTHING whatsoever. Some make subtle changes in the parent class, for example, which may later impact the child but are not stored in the child.

And efficiency is also a concern. Returning something when it is not needed is not efficient and can mean having to do something to suppress it. Brief digression to make the point. R defaults to returning the last evaluated item in a function with no explicit return statement. Python returns None. So in R, sometimes a line typed at the console generates a print of the returned value.
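The string pipeline described above, as a short sketch. Each str method returns a new string, so the original object is never modified; the tuple lines show the same point for rebinding a name, where another name still pointing at the old object sees no change:

```python
original = "  Hello World  "

# Each str method returns a new string, so the calls chain naturally and
# the object bound to `original` is never modified.
result = original.lower().rstrip().lstrip()[:5]
print(repr(result))    # 'hello'
print(repr(original))  # '  Hello World  '

# Tuples are immutable too: "changing" one builds a new tuple and rebinds
# the name, while other names still refer to the old object.
point = (1, 2)
alias = point
point = point + (3,)
print(point, alias)    # (1, 2, 3) (1, 2)
```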
In some cases, the automatic print generates a graph, as with a ggplot object. So some functions take care to mark the returned value as invisible. It is there if you ask for it by saving the result to a variable, but does not otherwise print by default.

So I can easily see why the design of some features is to not do more than you have to. If the goal is to change the current object, you can simply show the darn object afterwards, right? Well, no, not easily when using a pipeline method.

Still, how much would it hurt to allow a keyword option on those methods people WANT to call in a pipeline when it makes sense? Why not let me say object.sort(key=int, reverse=True, displayCopy=True) or something like that? If that messes up threads, fine. Don't use them there.

The side effect issue is not to be taken lightly. I believe that may be similar to why there is no ++ operator. But they are adding := which arguably is also a side effect.

-Original Message-
From: Tutor On Behalf Of Cameron Simpson
Sent: Tuesday, December 25, 2018 8:44 PM
To: tutor@python.org
Subject: Re: [Tutor] decomposing a problem
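For reference, := is the assignment expression from PEP 572, available from Python 3.8 on. A small sketch of the side effect in question, in the filter style discussed earlier in the thread: the condition binds m as it is evaluated, and the output expression then reuses it. The sample lines and the id= pattern are invented for illustration:

```python
import re

lines = ["id=42", "no match here", "id=7"]

# The walrus operator binds `m` as a side effect of evaluating the condition,
# so the match object can be reused in the output expression.
ids = [int(m.group(1)) for line in lines if (m := re.match(r"id=(\d+)", line))]
print(ids)  # [42, 7]
```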