Re: [Tutor] Error in Game
On 7/2/2013 4:22 PM, Jack Little wrote:
> I know the code is correct

As Joel said, how could it be, since you do not get the desired results?

When posting questions, tell us:
- what version of Python?
- what operating system?
- what you use to edit (write) your code
- what you do to run your code

Also copy and paste the execution.

> , but it doesn't send the player to the shop. Here is the code:
>
> def lvl3_2():
>     print "You beat level 3!"
>     print "Congratulations!"
>     print "You have liberated the Bristol Channel!"
>     print "[Y] to go to the shop or [N] to advance."
>     final1=raw_input(">>")
>     if final1.lower()=="y":
>         shop2()
>     elif final1.lower()=="n":
>         lvl4()

It is a good idea to add an else clause to handle the case where the user's entry does not match the if or elif tests.

It is not a good idea to use recursion to navigate a game structure. It is better to have each function return to a main program, and have the main program determine the next step and invoke it.

> Help?

Since we are volunteers, the more you tell us the easier it is for us to do that.

--
Bob Gailer
919-636-4239
Chapel Hill NC
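For reference, a minimal sketch of the "return to a main loop" structure Bob describes. The level and shop names echo the original post; the dispatch table and everything else are hypothetical:

    # Each scene function returns the *name* of the next scene instead of
    # calling it directly, so there is no recursion.
    def lvl3_2():
        print "You beat level 3!"
        print "[Y] to go to the shop or [N] to advance."
        choice = raw_input(">>").lower()
        if choice == "y":
            return "shop2"      # tell the main loop where to go next
        elif choice == "n":
            return "lvl4"
        else:
            print "Please enter Y or N."
            return "lvl3_2"     # replay this scene instead of recursing

    def shop2():
        print "Welcome to the shop."
        return "lvl4"

    def lvl4():
        print "Level 4 begins..."
        return None             # None ends the game

    scenes = {"lvl3_2": lvl3_2, "shop2": shop2, "lvl4": lvl4}

    current = "lvl3_2"
    while current is not None:  # the main loop decides what runs next
        current = scenes[current]()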
Re: [Tutor] memory consumption
On 04/07/13 04:17, Andre' Walker-Loud wrote:
> Hi All, I wrote some code that is running out of memory.

How do you know? What are the symptoms? Do you get an exception? Computer crashes? Something else?

> It involves a set of three nested loops, manipulating a data file (array) of dimension ~ 300 x 256 x 1 x 2.

Is it a data file, or an array? They're different things.

> It uses some third party software, but my guess is I am just not aware of how to use proper memory management and it is not the 3rd party software that is the culprit.

As a general rule, you shouldn't need to worry about such things, at least 99% of the time.

> Memory management is new to me, and so I am looking for some general guidance. I had assumed that reusing a variable name in a loop would automatically flush the memory by just overwriting it. But this is probably wrong. Below is a very generic version of what I am doing. I hope there is something obvious I am doing wrong or not doing which I can do to dump the memory in each cycle of the innermost loop. Hopefully, what I have below is meaningful enough, but again, I am new to this, so we shall see.

Completely non-meaningful.

> # generic code skeleton
> # import a class I wrote to utilize the 3rd party software
> import my_class

Looking at the context here, "my_class" is a misleading name, since it's actually a module, not a class.

> # instantiate the function do_stuff
> my_func = my_class.do_stuff()

This is getting confusing. Either you've oversimplified your pseudo-code, or you're using words in ways that do not agree with standard terminology. Or both. You don't instantiate functions, you instantiate a class, which gives you an instance (an object), not a function. So I'm lost here -- I have no idea what my_class is (possibly a module?), or do_stuff (possibly a class?), or my_func (possibly an instance?).

> # I am manipulating a data array of size ~ 300 x 256 x 1 x 2
> data = my_data  # my_data is imported just once and has the size above

Where, and how, is my_data imported from? What is it? You say it is "a data array" (what sort of data array?) of size 300 x 256 x 1 x 2 -- that's a four-dimensional array with 153600 entries. What sort of entries? Is that 153600 bytes (about 150K), or 153600 64-bit floats (about 1.2 MB)? Or 153600 data structures, each one holding 1 MB of data (about 153 GB)?

> # instantiate a 3d array of size 20 x 10 x 10 and fill it with all zeros
> my_array = numpy.zeros([20,10,10])

At last, we finally see something concrete! A numpy array. Is this the same sort of array used above?

> # loop over parameters and fill array with desired output
> for i in range(loop_1):
>     for j in range(loop_2):
>         for k in range(loop_3):

How big are loop_1, loop_2 and loop_3? You should consider using xrange() rather than range(). If the number is very large, xrange will be more memory efficient.

> # create tmp_data that has a shape which is the same as data, except the first
> # dimension can range from 1 - 1024 instead of being fixed at 300
> ''' Is the next line where I am causing memory problems? '''
> tmp_data = my_class.chop_data(data,i,j,k)

How can we possibly tell if chop_data is causing memory problems when you don't show us what chop_data does?

> my_func(tmp_data)
> my_func.third_party_function()

Again, no idea what they do.

> my_array([i,j,k]) = my_func.results() # this is just a floating point number
> ''' should I do something to flush tmp_data? '''

No. Python will automatically garbage collect it as needed.

Well, that's not quite true. It depends on what tmp_data actually is. So, *probably* no. But without seeing the code behind tmp_data, I cannot be sure.

--
Steven
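To make the range()/xrange() point concrete, here is a small Python 2 sketch (not from the original post; the printed sizes are rough and platform-dependent):

    import sys

    n = 10 ** 7
    big_list = range(n)       # a real list: its pointer array alone is tens of MB
                              # on a 64-bit build, plus the int objects themselves
    lazy = xrange(n)          # a small, constant-size object that yields indices

    print sys.getsizeof(big_list)   # large
    print sys.getsizeof(lazy)       # a few dozen bytes

    total = 0
    for i in xrange(n):       # iterates without ever materialising a big list
        total += i
    print total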
Re: [Tutor] memory consumption
On 07/03/2013 02:17 PM, Andre' Walker-Loud wrote:
> Hi All, I wrote some code that is running out of memory.

And you know this how? What OS are you using, and specifically how is it telling you that you've run out of memory? And while you're at it, what version of Python? And are the OS and Python 32 and 32 bit, or 64 and 64, or mixed?

> It involves a set of three nested loops, manipulating a data file (array) of dimension ~ 300 x 256 x 1 x 2. It uses some third party software, but my guess is I am just not aware of how to use proper memory management and it is not the 3rd party software that is the culprit.

In particular you're using numpy, and I don't know its quirks. So I'll just speak of Python in general, and let someone else address numpy.

> Memory management is new to me, and so I am looking for some general guidance. I had assumed that reusing a variable name in a loop would automatically flush the memory by just overwriting it.

It could be useful to learn how Python memory is manipulated. To start with, the 'variable' doesn't take a noticeable amount of space. It's the object it's bound to that might take up lots of space, directly or indirectly. When you bind a new object to it, you free up the last one, unless something else is also bound to it.

By "indirectly", I refer to something like a list, which is one object, but which generally is bound to dozens or millions of others, and any of those may be bound to lots of others. Unbinding the list will usually free up all that stuff.

The other thing that can happen is an object may indirectly be bound to itself. Trivial example:

>>> mylist = [1,2]
>>> mylist.append(mylist)
>>> mylist
[1, 2, [...]]
>>>

Fortunately for us, the repr() display of mylist doesn't descend infinitely into the guts of the elements, or it would be still printing next week (or until the printing logic ran out of memory). Anyway, once you have such a binding loop, the simple memory-freeing logic (refcount) has to defer to the slower and less frequently run gc (garbage collector).

> But this is probably wrong. Below is a very generic version of what I am doing. I hope there is something obvious I am doing wrong or not doing which I can do to dump the memory in each cycle of the innermost loop. Hopefully, what I have below is meaningful enough, but again, I am new to this, so we shall see.
>
> # generic code skeleton
> # import a class I wrote to utilize the 3rd party software
> import my_class
>
> # instantiate the function do_stuff
> my_func = my_class.do_stuff()

So this is a class-static method which returns a callable object? One with methods of its own?

> # I am manipulating a data array of size ~ 300 x 256 x 1 x 2
> data = my_data  # my_data is imported just once and has the size above
>
> # instantiate a 3d array of size 20 x 10 x 10 and fill it with all zeros
> my_array = numpy.zeros([20,10,10])
>
> # loop over parameters and fill array with desired output
> for i in range(loop_1):
>     for j in range(loop_2):
>         for k in range(loop_3):
>             # create tmp_data that has a shape which is the same as data,
>             # except the first dimension can range from 1 - 1024 instead
>             # of being fixed at 300
>             ''' Is the next line where I am causing memory problems? '''

Hard to tell. Is chop_data() a trivial function you could have posted? It's a class method, not an instance method. Is it keeping references to the data it's returning? Perhaps for caching purposes?

>             tmp_data = my_class.chop_data(data,i,j,k)
>             my_func(tmp_data)
>             my_func.third_party_function()
>             my_array([i,j,k]) = my_func.results() # this is just a floating point number
>             ''' should I do something to flush tmp_data? '''

You don't show us any code that would cause me to suspect tmp_data.

You leave out so much that it's hard to know what parts to ask you to post. If data is a numpy array, and my_class.chop_data is a class method, perhaps you could post that class method.

Do you have a tool for your OS that lets you examine memory usage dynamically? If you do, sometimes it's instructive to watch while a program is running to see what the dynamics are.

Note that Python, like nearly any other program written with the C library, will not necessarily free memory all the way back to the OS at any particular moment in time. If you (a C programmer) were to malloc() a megabyte block and immediately free it, you might not see the free externally, but new allocations would instead be carved out of that freed block. Those specifics vary with OS and with C compiler, and may very well vary with the size of the block. Thus individual blocks over a certain size may be allocated directly from the OS and freed immediately when done, while smaller blocks are coalesced in the library and reused over and over.
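As an illustration of the refcount-versus-cycle distinction described above, a small Python 2 sketch (the data is purely hypothetical):

    import gc

    # Rebinding a name frees the old object immediately (refcounting),
    # as long as nothing else refers to it.
    data = range(10 ** 6)       # a large list
    data = range(10)            # the million-entry list is freed right here

    # A reference cycle is different: refcounts never reach zero, so the
    # memory waits for the cyclic garbage collector.
    a = []
    a.append(a)                 # the list refers to itself
    del a                       # the name is gone, but the cycle keeps the list alive
    unreachable = gc.collect()  # the collector finds and frees the cycle
    print "objects collected:", unreachable   # typically >= 1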
Re: [Tutor] memory consumption
On 03/07/13 19:17, Andre' Walker-Loud wrote:

Your terminology is all mixed up and therefore does not make sense. We definitely need to know more about the my_class module and do_stuff.

> # generic code skeleton
> # import a class I wrote to utilize the 3rd party software
> import my_class

This imports a module, which may contain some code that you wrote, but...

> # instantiate the function do_stuff
> my_func = my_class.do_stuff()

You don't instantiate functions, you call them. You are setting my_func to be the return value of do_stuff(). What is that return value? What does my_func actually refer to?

> my_array = numpy.zeros([20,10,10])
>
> # loop over parameters and fill array with desired output
> for i in range(loop_1):
>     for j in range(loop_2):
>         for k in range(loop_3):
>             # create tmp_data that has a shape which is the same as data,
>             # except the first dimension can range from 1 - 1024 instead
>             # of being fixed at 300
>             ''' Is the next line where I am causing memory problems? '''
>             tmp_data = my_class.chop_data(data,i,j,k)

Again we must guess what the chop_data function is returning. Some sample data would be useful here.

>             my_func(tmp_data)

Here you call a function but do not store any return values. Or are you using global variables somewhere?

>             my_func.third_party_function()

But now you are accessing an attribute of my_func. What is my_func? Is it a function or an object? We cannot begin to guess what is going on without knowing that.

>             my_array([i,j,k]) = my_func.results() # this is just a floating point number
>             ''' should I do something to flush tmp_data? '''

No idea, you haven't begun to give us enough information.

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
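To illustrate the distinction Alan is drawing between instantiating a class and calling a function, a hypothetical Python 2 sketch (none of these names come from the original post):

    class DoStuff(object):
        """Instantiating this class gives an object with its own methods."""
        def __call__(self, data):
            self.last = sum(data)
        def results(self):
            return self.last

    my_func = DoStuff()        # an instance: callable, and it has .results()
    my_func([1, 2, 3])
    print my_func.results()    # 6

    def do_stuff():
        """A plain function: calling it only gives you the return value."""
        return 42

    value = do_stuff()         # value is just 42; it has no .results() method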
[Tutor] Cleaning up output
I've written my first program to take a given directory and look in all directories below it for duplicate files (duplicate being defined as having the same MD5 hash, which I know isn't a perfect solution, but for what I'm doing is good enough).

My problem now is that my output file is a rather confusing jumble of paths and I'm not sure of the best way to make it more user readable. My gut reaction would be to go through and list by first directory, but is there a logical way to do it so that all the groupings that have files in the same two directories would be grouped together? So I'm thinking I'd have:

First File Dir /some/directory/
  Duplicate directories:
    some/other/directory/
      Original file 1, duplicate file 1
      Original file 2, duplicate file 2
    some/third directory/
      Original file 3, duplicate file 3

and so forth, where the original file would be the file name in the first directory, so that side of each grouping is always the same.

I fear I'm not explaining this well, but I'm hoping someone can either ask questions to help get out of my head what I'm trying to do, or can decipher this enough to help me. Here's a git repo of my code if it helps: https://github.com/CyberCowboy/FindDuplicates
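One possible reading of the grouping being asked for, as a hedged Python 2 sketch (the paths and the group_by_directory helper are invented for illustration, not taken from the linked repo): given groups of identical files, regroup them by the set of directories each group spans, so groups that involve the same two directories are reported together.

    import os
    from collections import defaultdict

    def group_by_directory(dupe_groups):
        """Regroup duplicate-file groups by the set of directories they span.

        dupe_groups: list of lists of paths, each inner list being one set
        of identical files (e.g. the values of a hash -> paths dict).
        """
        by_dirs = defaultdict(list)
        for paths in dupe_groups:
            dirs = tuple(sorted(set(os.path.dirname(p) for p in paths)))
            by_dirs[dirs].append(sorted(paths))
        return by_dirs

    # Hypothetical input: two duplicate pairs spanning the same two
    # directories, and one pair spanning a third directory.
    groups = [
        ["/some/directory/a.txt", "/some/other/directory/a.txt"],
        ["/some/directory/b.txt", "/some/other/directory/b.txt"],
        ["/some/directory/c.txt", "/some/third directory/c.txt"],
    ]

    for dirs, pairs in sorted(group_by_directory(groups).items()):
        print "Directories:", ", ".join(dirs)
        for paths in pairs:
            print "   ", ", ".join(os.path.basename(p) for p in paths)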
Re: [Tutor] Cleaning up output
On 07/03/2013 03:51 PM, bja...@jamesgang.dyndns.org wrote:
> I've written my first program to take a given directory and look in all directories below it for duplicate files (duplicate being defined as having the same MD5 hash, which I know isn't a perfect solution, but for what I'm doing is good enough).

This is a great first project for learning Python. It's a utility which doesn't write any data to the disk (other than the result file), and therefore bugs won't cause much havoc. Trust me, you will have bugs; we all do. One of the things experience teaches you is how to isolate the damage that bugs do before they're discovered.

> My problem now is that my output file is a rather confusing jumble of paths and I'm not sure the best way to make it more user readable. My gut reaction would be to go through and list by first directory, but is there a logical way to do it so that all the groupings that have files in the same two directories would be grouped together?

I've run into the same "presentation problem" with my own similar utilities. Be assured, there's no one "right answer."

First question: have you considered what you want when there are MORE than two copies of one of those files? When you know what you'd like to see if there are four identical files, you might have a better idea what you should do even for two. Additionally, consider that two identical files may be in the same directory, with different names. Anyway, if you can explain why you want a particular grouping, we might better understand how to accomplish it.

> So I'm thinking I'd have:
>
> First File Dir /some/directory/
>   Duplicate directories:
>     some/other/directory/
>       Original file 1, duplicate file 1
>       Original file 2, duplicate file 2
>     some/third directory/
>       Original file 3, duplicate file 3

At present, this First File Dir could be any of the directories involved; without some effort, os.walk doesn't promise you any order of processing. But if you want them to appear in sorted order, you can do sorts at key points inside your os.walk code, and they'll at least come out in an order that's recognizable. (Some OS's may also sort things they feed to os.walk, but you'd do better not to count on it.) You also could sort each list in itervalues() of hashdict, after the dict is fully populated.

Even with sorting, you run into the problem that there may be duplicates between some/other/directory and some/third/directory that are not in /some/directory. So in the sample you show above, they won't be listed with the ones that are in /some/directory.

> and so forth, where the Original file would be the file name in the First File Dir, so that all the ones are the same there. I fear I'm not explaining this well but I'm hoping someone can either ask questions to help get out of my head what I'm trying to do or can decipher this enough to help me. Here's a git repo of my code if it helps: https://github.com/CyberCowboy/FindDuplicates

At 40 lines, you should have just included it. It's usually much better to include the code inline if you want any comments on it. Think of what the archives are going to show in a year, when you've removed that repo or thoroughly updated it. Somebody at that time will not be able to make sense of comments directed at the current version of the code. BTW, thanks for posting as text, since that'll mean that when you do post code, it shouldn't get mangled.

So I'll comment on the code. You never call the dupe() function, so presumably this is a module intended to be used from some place else. But if that's the case, I would have expected it to be factored better, at least to separate the input processing from the output file formatting. That way you could re-use the dups logic and provide a new save formatting without duplicating anything. The first function could return the hashdict, and the second one could analyze it to produce a particular formatted output.

The hashdict and dups variables should be initialized within the function, since they are not going to be used outside. Avoid non-const globals. And of course once you factor it, dups will be in the second function only.

You do have an if __name__ == "__main__": line, but it's inside the function. Probably you meant it to be at the left margin. And importing inside a conditional is seldom a good idea, though it doesn't matter here since you're not using the import. Normally you want all your imports at the top, so they're easy to spot. You also probably want a call to dupe() inside the conditional. And perhaps some parsing of argv to get rootdir.

You don't mention your OS, but many OS's have symbolic links or the equivalent. There's no code here to handle that possibility. Symlinks are a pain to do right. You could just add it to your docs, that no symlink is allowed under the rootdir.

Your open() call has no mode switch. If you want the MD5 to be "correct", it should open the file in binary mode ('rb').
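A rough sketch of the factoring described above (the names dupe, hashdict and dups echo the discussion; the implementation itself is hypothetical, and it reads each file whole, which is fine only for a sketch): one function builds the hash-to-paths dict, a second formats the report, and the __main__ guard at the left margin wires them together.

    import hashlib
    import os
    import sys

    def find_dupes(rootdir):
        """Build and return the hash -> list-of-paths dict (no globals)."""
        hashdict = {}
        for dirpath, dirnames, filenames in os.walk(rootdir):
            for name in filenames:
                path = os.path.join(dirpath, name)
                with open(path, 'rb') as f:       # binary mode, so the MD5 is byte-exact
                    digest = hashlib.md5(f.read()).hexdigest()
                hashdict.setdefault(digest, []).append(path)
        return hashdict

    def format_report(hashdict):
        """Separate concern: turn the dict into the text of the report."""
        lines = []
        for digest, paths in sorted(hashdict.items()):
            if len(paths) > 1:                    # only report real duplicates
                lines.append(digest)
                lines.extend("    " + p for p in sorted(paths))
        return "\n".join(lines)

    if __name__ == "__main__":                    # at the left margin
        rootdir = sys.argv[1] if len(sys.argv) > 1 else "."
        print format_report(find_dupes(rootdir))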
Re: [Tutor] memory consumption
On 03/07/13 20:50, Andre' Walker-Loud wrote:
>> # loop over parameters and fill array with desired output
>> for i in range(loop_1):
>>     for j in range(loop_2):
>>         for k in range(loop_3):
>>
>> How big are loop_1, loop_2, loop_3?
>
> The sizes of the loops are not big
>
> len(loop_1) = 20
> len(loop_2) = 10
> len(loop_3) = 10

This is confusing. The fact that you are getting values for the len() of these variables suggests they are some kind of collection? But you are using them in range(), which expects numbers.

What kind of things are loop_1 etc.? What happens at the >>> prompt if you try:

>>> print range(loop_1)

--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
Re: [Tutor] memory consumption
On 04/07/13 08:11, Andre' Walker-Loud wrote:
> Yes, I was being sloppy. My later post clarified what I meant. The loops are really lists, and I was really using enumerate() to get both the iter and the element.
>
> loop_2 = [1,2,4,8,16,32,64,128,256,512,1024]
> for i,n in enumerate(loop_2):
>     ...

Please be careful about portraying yourself as less experienced/more naive than you really are. Otherwise we end up wasting both our time and yours telling you to do things that you're already doing.

Have you googled for "Python memory leak os-x"? When I do, I find a link to this numpy bug: https://github.com/numpy/numpy/issues/2969

--
Steven
Re: [Tutor] memory consumption
On 04/07/13 09:24, Oscar Benjamin wrote:
> On 3 July 2013 23:37, Andre' Walker-Loud wrote:
>> Hi Oscar,
>
> Hi Andre',
>
> (your name shows in my email client with an apostrophe ' after it; I'm not sure if I'm supposed to include that when I write it).

I expect that it's meant to be André, since that is the "correct" spelling even in English (however, a lot of non-French Andrés drop the accent).

--
Steven