Re: [Tutor] Unexpected result of division by negative integer, was: (no subject)
Jason Barrett wrote:

[Jason, please use "reply all" instead of sending a private mail. Your answer will then appear on the mailing list and other readers get a chance to offer an alternative explanation.]

>> In python, why does 17/-10= -2? Shouldn't it be -1?
>
>> http://docs.python.org/faq/programming.html#why-does-22-10-return-3

> I just started programming so all this is new to me. I don't really
> understand the explanation. I'm a novice at programming.

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
[Tutor] Octal confusion, please explain why. Python 3.2
Hello, first off I am using Python 3.2 on Linux Mint 12 64-bit.

I am confused as to why I cannot successfully compare a variable that was created as an octal to a variable that is converted to an octal in an if statement, yet print shows that they are the same octal value. I think it is because they are passed around within Python as an integer value. Is this correct, and if so, why?

Code example 1: This code works because both are explicitly converted to octals.

    import os

    PATH = '/home/wolfrage/Documents'
    DIR_PERMISSIONS = 0o40740
    for dirname, dirnames, filenames in os.walk(PATH):
        for subdirname in dirnames:
            # List each subdirectory in this directory.
            full_path = os.path.join(dirname, subdirname)
            print(full_path)
            stat_info = os.stat(full_path)
            uid = stat_info.st_uid
            gid = stat_info.st_gid
            #TODO: Group and User ID Check
            #TODO: Directories need the right Masks, and then read, write, and
            # execute permissions for User(Owner), Group and Others
            #TODO: UMask is separate from permissions, so handle it separately.
            if oct(stat_info.st_mode) != oct(DIR_PERMISSIONS):
                #TODO: Fix User Read Permissions
                print(str(oct(stat_info.st_mode)) + ' Vs ' + str(oct(DIR_PERMISSIONS)))
                os.chmod(full_path, DIR_PERMISSIONS)
                print(subdirname + ' has bad user permissions.')

Code example 2: This code does not work because we do not use an explicit conversion.

    import os

    PATH = '/home/wolfrage/Documents'
    DIR_PERMISSIONS = 0o40740
    for dirname, dirnames, filenames in os.walk(PATH):
        for subdirname in dirnames:
            # List each subdirectory in this directory.
            full_path = os.path.join(dirname, subdirname)
            print(full_path)
            stat_info = os.stat(full_path)
            uid = stat_info.st_uid
            gid = stat_info.st_gid
            #TODO: Group and User ID Check
            #TODO: Directories need the right Masks, and then read, write, and
            # execute permissions for User(Owner), Group and Others
            #TODO: UMask is separate from permissions, so handle it separately.
            if oct(stat_info.st_mode) != DIR_PERMISSIONS:
                #TODO: Fix User Read Permissions
                print(str(oct(stat_info.st_mode)) + ' Vs ' + str(DIR_PERMISSIONS))
                # The above print statement shows that DIR_PERMISSIONS is printed as an
                # integer. But why does Python convert an explicitly created octal to an
                # integer?
                os.chmod(full_path, DIR_PERMISSIONS)
                print(subdirname + ' has bad user permissions.')
Re: [Tutor] Joining all strings in stringList into one string
On 05/30/2012 06:21 PM, Akeria Timothy wrote:
> Hello all,
>
> I am working on learning Python(on my own) and ran into an exercise
> that I figured out but I wanted to know if there was a different way
> to write the code? I know he wanted a different answer for the body
> because we haven't gotten to the ' '.join() command yet.
>
> This is what I have:
>
> def joinStrings(stringList):
>     string = []
>     for string in stringList:

# Here you never used the string variable so why have a for statement?

>         print ''.join(stringList)

#Another version might look like this:

def join_strings2(string_list):
    final_string = ''
    for string in string_list:
        final_string += string
    print(final_string)
    return final_string

# Tested in Python 3.2
join_strings2(['1', '2', '3', '4', '5'])

> def main():
>     print joinStrings(['very', 'hot', 'day'])
>     print joinStrings(['this', 'is', 'it'])
>     print joinStrings(['1', '2', '3', '4', '5'])
>
> main()
>
> thanks all
Re: [Tutor] Unexpected result of division by negative integer, was: (no subject)
On 02/06/12 09:52, Peter Otten wrote:

> I just started programming so all this is new to me. I don't really
> understand the explanation. I'm a novice at programming.

OK, here is an attempt to explain the explanation!

Original question: Why does 17 / -10 => -2 instead of -1?

Python FAQ response:

> It’s primarily driven by the desire that i % j have the same
> sign as j.

The % operator returns the "remainder" in integer division. It turns out that in programming we can often use the remainder to perform programming "tricks", especially when dealing with cyclic data - such as the days of the week.

For example, if you want to find out which day of the week it is 53 days from now, it is (today's number + 53) % 7.

Now for those tricks to work we want the % operator to work as described in the FAQ above.

> If you want that, and also want:
>
> i == (i // j) * j + (i % j)
>
> then integer division has to return the floor.

This is just saying that the integer division (//) result times the original divisor, plus the remainder, should equal the starting number. Thus:

17/10 => 1 remainder 7
So (17 // 10) => 1 and (17 % 10) => 7
So 1 x 10 + 7 => 17.

And to achieve those two aims (a remainder with the same sign as the divisor, and the sum returning the original value) we must define integer division to return the "floor" of the division result.

The floor of a real number is the largest integer that is not greater than it, that is, the nearest integer at or *below* its real value. Thus:

floor(5.5) -> 5
floor(-5.5) -> -6

17/10 -> 1.7
floor(1.7) -> 1

-17/10 -> -1.7 (and likewise 17/-10 -> -1.7)
floor(-1.7) -> -2

So X // Y == floor(X / Y)

HTH.
--
Alan G
Author of the Learn to Program web site
http://www.alan-g.me.uk/
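The two properties from the FAQ can be checked directly in the interpreter. The following is only a minimal sketch in Python 3; the helper name check_division_identity and the choice of Monday as day 0 are assumptions made for illustration, not something from the thread.

```python
import math

def check_division_identity(i, j):
    """Return (quotient, remainder) after verifying the FAQ's properties."""
    q, r = i // j, i % j
    # The remainder is zero or has the same sign as the divisor j.
    assert r == 0 or (r > 0) == (j > 0)
    # Quotient times divisor plus remainder gives back the starting number.
    assert i == q * j + r
    # Integer division is the floor of the true quotient.
    assert q == math.floor(i / j)
    return q, r

print(check_division_identity(17, 10))    # (1, 7)
print(check_division_identity(17, -10))   # (-2, -3)
print(check_division_identity(-17, 10))   # (-2, 3)

# Cyclic data: with Monday as 0, which weekday is 53 days after a Friday (4)?
today = 4
print((today + 53) % 7)                   # 1, i.e. a Tuesday
```

Because the remainder always takes the sign of the divisor, the weekday calculation stays in the range 0 to 6 even when counting backwards with a negative number of days.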
Re: [Tutor] Octal confusion, please explain why. Python 3.2
Jordan wrote:

> Hello, first off I am using Python 3.2 on Linux Mint 12 64-bit.
> I am confused as to why I can not successfully compare a variable that
> was created as an octal to a variable that is converted to an octal in a
> if statement yet print yields that they are the same octal value.

Because the oct() function returns a string, not a number.

py> oct(63)
'0o77'

In Python, strings are never equal to numbers:

py> 42 == '42'
False

Octal literals are just regular integers that you type differently:

py> 0o77
63

"Octals" aren't a different type or object; the difference between the numbers 0o77 and 63 is cosmetic only. 0o77 is just the base 8 version of the base 10 number 63 or the binary number 0b111111.

Because octal notation is only used for input, there's no need to convert numbers to octal strings to compare them:

py> 0o77 == 63 # no different to "63 == 63"
True

The same applies for hex and bin syntax too:

py> 0x1ff == 511
True

--
Steven
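Applied to the permission check from the original question, this means the mode can be compared as a plain integer. The sketch below is an illustration only: the helper has_expected_mode is a hypothetical name, and the mode values are passed in directly rather than read from os.stat() so that the example runs anywhere.

```python
import stat

DIR_PERMISSIONS = 0o40740   # an ordinary int, merely typed in octal

# oct() builds a string; the literal itself is a number.
print(type(oct(DIR_PERMISSIONS)).__name__)   # str
print(type(DIR_PERMISSIONS).__name__)        # int
print(DIR_PERMISSIONS == 16864)              # True

def has_expected_mode(st_mode, expected=DIR_PERMISSIONS):
    """Compare a mode (as found in os.stat().st_mode) with the expected int."""
    return st_mode == expected

print(has_expected_mode(0o40740))            # True
print(has_expected_mode(0o40755))            # False

# stat.S_IMODE() keeps only the permission bits and drops the file-type bits.
print(oct(stat.S_IMODE(0o40740)))            # 0o740
```

In the loop from the question, the test would then read `if stat_info.st_mode != DIR_PERMISSIONS:`, with oct() reserved for the print call.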
Re: [Tutor] Octal confusion, please explain why. Python 3.2
Thank you for the detailed answer. Now I understand that each number format is an integer, just with a different base and cosmetic appearance.

On 06/02/2012 01:51 PM, Steven D'Aprano wrote:
> Jordan wrote:
>> Hello, first off I am using Python 3.2 on Linux Mint 12 64-bit.
>> I am confused as to why I can not successfully compare a variable that
>> was created as an octal to a variable that is converted to an octal in a
>> if statement yet print yields that they are the same octal value.
>
> Because the oct() function returns a string, not a number.

Ahh, I see. That makes sense; next time I will read the documentation on the functions that I am using.

> py> oct(63)
> '0o77'
>
> In Python, strings are never equal to numbers:
>
> py> 42 == '42'
> False
>
> Octal literals are just regular integers that you type differently:
>
> py> 0o77
> 63
>
> "Octals" aren't a different type or object; the difference between the
> numbers 0o77 and 63 is cosmetic only. 0o77 is just the base 8 version
> of the base 10 number 63 or the binary number 0b111111.
>
> Because octal notation is only used for input, there's no need to
> convert numbers to octal strings to compare them:
>
> py> 0o77 == 63 # no different to "63 == 63"
> True
>
> The same applies for hex and bin syntax too:
>
> py> 0x1ff == 511
> True
Re: [Tutor] Octal confusion, please explain why. Python 3.2
On 02/06/12 12:29, Jordan wrote:

> think it is because they are transported around within python as a
> integer value, is this correct and if so why?

Steven has already answered your issue, but just to emphasise this point:

Python stores all data as binary values in memory. Octal, decimal and hex are all representations of that raw data. Python interprets some binary data as strings, some as integers, some as booleans etc., and then tries to display these objects in a suitable representation. (We can use the struct module to define the way Python interprets binary data input.)

For integers the default is base 10, but we can exercise some control over the format by using functions such as oct(), hex() etc. But these functions only change the representation; they do not change the underlying data in any way.

HTH,
Alan G.
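A short sketch of the same point, using only built-ins: one integer is shown in several representations and then recovered from the text forms. The value 511 is taken from the earlier hex example; the variable name n is arbitrary.

```python
n = 511

# Different textual representations of the same integer.
print(oct(n), hex(n), bin(n))             # 0o777 0x1ff 0b111111111
print(format(n, 'o'), format(n, 'x'))     # 777 1ff

# Converting the text back gives the same underlying value.
print(int('777', 8) == int('1ff', 16) == n)   # True

# Base 0 tells int() to honour the literal prefix in the string.
print(int('0o777', 0))                        # 511
```

None of these calls modify n; each one builds a new string (or, for int(), a new number) from it.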
Re: [Tutor] Division by negative numbers [was: no subject]
Jason Barrett wrote:

> In python, why does 17/-10= -2? Shouldn't it be -1?

Integer division with negative numbers can be done in two different ways.

(1) 17/-10 gives -1, with remainder 7.

Proof: -1 times -10 is 10, plus 7 gives 17.

(2) 17/-10 gives -2, with remainder -3.

Proof: -2 times -10 is 20, plus -3 gives 17.

[Aside: technically, integer division with positive numbers also has the same choice: 17 = 1*10+7 or 2*10-3.]

Every language has to choose whether to support the first or the second version. Python happens to choose the second. Other languages might make the same choice, or the other. For example, versions of C before the C99 standard leave that decision up to the compiler (C99 and later always use the first version). Different pre-C99 compilers may therefore give different results, which is not very useful in my opinion.

Ruby behaves like Python:

steve@orac:/home/steve$ irb
irb(main):001:0> 17/10
=> 1
irb(main):002:0> 17/-10
=> -2

Other languages may be different.

P.S. In future, please try to include a meaningful subject line in your posts.

--
Steven
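The two conventions can be put side by side in Python. This is a sketch only: floor_divmod simply wraps the built-in divmod(), while trunc_divmod is a hypothetical helper written here to imitate version (1), where the quotient is rounded toward zero.

```python
def floor_divmod(a, b):
    """Version (2), Python's choice: quotient rounds down,
    remainder takes the sign of the divisor."""
    return divmod(a, b)

def trunc_divmod(a, b):
    """Version (1): quotient rounds toward zero,
    remainder takes the sign of the dividend."""
    q = abs(a) // abs(b)
    if (a < 0) != (b < 0):
        q = -q
    return q, a - q * b

print(floor_divmod(17, -10))   # (-2, -3)
print(trunc_divmod(17, -10))   # (-1, 7)

# Both satisfy a == q * b + r; they differ only in which q they pick.
for q, r in (floor_divmod(17, -10), trunc_divmod(17, -10)):
    print(q * -10 + r)         # 17 both times
```

trunc_divmod works on integers throughout, so it avoids the rounding problems that converting large values to float would introduce.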
Re: [Tutor] Joining all strings in stringList into one string
Jordan wrote:

> #Another version might look like this:
>
> def join_strings2(string_list):
>     final_string = ''
>     for string in string_list:
>         final_string += string
>     print(final_string)
>     return final_string

Please don't do that. This risks becoming slow. REALLY slow. Painfully slow. Like, 10 minutes versus 3 seconds slow. Seriously.

The reason for this is quite technical, and the reason why you might not notice is even more complicated, but the short version is this:

Never build up a long string by repeated concatenation of short strings. Always build up a list of substrings first, then use the join method to assemble them into one long string.

Why repeated concatenation is slow:

Suppose you want to concatenate two strings, "hello" and "world", to make a new string "helloworld". What happens?

Firstly, Python has to count the number of characters needed, which fortunately is fast in Python, so we can ignore it. In this case, we need 5+5 = 10 characters.

Secondly, Python sets aside enough memory for those 10 characters, plus a little bit of overhead:

    ----------

Then it copies the characters from "hello" into the new area:

    hello-----

followed by the characters of "world":

    helloworld

and now it is done. Simple, right? Concatenating two strings is pretty fast. You can't get much faster.

Ah, but what happens if you do it *repeatedly*? Suppose we have SIX strings we want to concatenate, in a loop:

    words = ['hello', 'world', 'foo', 'bar', 'spam', 'ham']
    result = ''
    for word in words:
        result = result + word

How much work does Python have to do?

Step one: add '' + 'hello', giving result 'hello'
Python needs to copy 0+5 = 5 characters.

Step two: add 'hello' + 'world', giving result 'helloworld'
Python needs to copy 5+5 = 10 characters, as shown above.

Step three: add 'helloworld' + 'foo', giving 'helloworldfoo'
Python needs to copy 10+3 = 13 characters.

Step four: add 'helloworldfoo' + 'bar', giving 'helloworldfoobar'
Python needs to copy 13+3 = 16 characters.

Step five: add 'helloworldfoobar' + 'spam', giving 'helloworldfoobarspam'
Python needs to copy 16+4 = 20 characters.

Step six: add 'helloworldfoobarspam' + 'ham', giving 'helloworldfoobarspamham'
Python needs to copy 20+3 = 23 characters.

So in total, Python has to copy 5+10+13+16+20+23 = 87 characters, just to build up a 23 character string. And as the number of loops increases, the amount of extra work needed just keeps expanding. Even though a single string concatenation is fast, repeated concatenation is painfully SLOW.

In comparison, ''.join(words) only copies each substring once: it counts out that it needs 23 characters, allocates space for 23 characters, then copies each substring into the right place instead of making a whole lot of temporary strings and redundant copying.

So, join() is much faster than repeated concatenation. But you may never have noticed. Why not?

Well, for starters, for small enough pieces of data, everything is fast. The difference between copying 87 characters (the slow way) and 23 characters (the fast way) is trivial.

But more importantly, some years ago (Python 2.4, about 8 years ago?) the Python developers found a really neat trick that they can do to optimize string concatenation so it doesn't need to repeatedly copy characters over and over and over again. I won't go into details, but the thing is, this trick works well enough that repeated concatenation is about as fast as the join method MOST of the time.

Except when it fails. Because it is a trick, it doesn't always work. And when it does fail, your repeated string concatenation code will suddenly go from running in 0.1 milliseconds to a full second or two; or worse, from 20 seconds to over an hour. (Potentially; the actual slow-down depends on the speed of your computer, your operating system, how much memory you have, etc.)

Because this is a cunning trick, it doesn't always work, and when it doesn't work, you have slow code and no hint as to why.

What can cause it to fail?

- Old versions of Python, before 2.4, will be slow.

- Other implementations of Python, such as Jython and IronPython, will not have the trick, and so will be slow.

- The trick is highly dependent on internal details of the memory management of Python and the way it interacts with the operating system. So what's fast under Linux may be slow under Windows, or the other way around.

- The trick is highly dependent on specific circumstances to do with the substrings being added. Without going into details, if those circumstances are violated, you will have slow code.

- The trick only works when you are adding strings to the end of the new string, not if you are building it up from the beginning.

So even though your function works, you can't rely on it being fast.

--
Steven
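The character counts in the walkthrough above can be reproduced with a few lines of Python. This is a sketch under the same simplifying assumption as the explanation (every + builds a brand-new string, with no optimization); the function names chars_copied_by_concatenation and join_strings are made up for the example.

```python
def chars_copied_by_concatenation(words):
    """Count characters copied if every '+' builds a brand-new string."""
    total = 0
    length = 0
    for word in words:
        length += len(word)
        total += length      # the whole new string is written out each time
    return total

def join_strings(string_list):
    """Assemble the pieces with str.join, which copies each piece once."""
    return ''.join(string_list)

words = ['hello', 'world', 'foo', 'bar', 'spam', 'ham']
print(chars_copied_by_concatenation(words))   # 87
print(len(join_strings(words)))               # 23

# The gap widens quickly: 1000 five-character pieces.
many = ['hello'] * 1000
print(chars_copied_by_concatenation(many))    # 2502500
print(len(join_strings(many)))                # 5000
```

The first count grows roughly with the square of the number of pieces, while the joined length grows linearly, which is why the difference is invisible for six words and painful for many thousands.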
[Tutor] How to identify clusters of similar files
Hi,

I want to use difflib to compare a lot (tens of thousands) of text files. I know that many files are quite similar as they are subsequent versions of the same document (a primitive kind of version control). What would be a good approach to cluster the files based on their likeness? I want to be able to say something like: the number of files could be reduced by a factor of ten when the number of (near-)duplicates is taken into account.

So let's say I have ten versions of a txt file: 'file0.txt', 'file1.txt', 'file2.txt', 'file3.txt', 'file4.txt', 'file5.txt', 'file6.txt', 'file7.txt', 'file8.txt', 'file9.txt'. How could I say, to some degree of certainty, that they are related (I can't rely on the file names, I'm afraid)? file0 may be very similar to file1, but no longer to file9. But their likeness is "chained". The situation is easier with perfectly identical files.

The crude code below illustrates what I'd like to do, but it's too simplistic. I'd appreciate some thoughts or references to theoretical approaches to this kind of stuff.

    import difflib, glob, os

    path = "/home/aj/Destkop/someDir"
    extension = ".txt"
    cut_off = 0.95
    clusters = {}  # maps each file to the files that resemble it
    allTheFiles = sorted(glob.glob(os.path.join(path, "*" + extension)))
    for f_a in allTheFiles:
        for f_b in allTheFiles:
            file_a = open(f_a).readlines()
            file_b = open(f_b).readlines()
            if f_a != f_b:
                likeness = difflib.SequenceMatcher(lambda x: x == " ", file_a, file_b).ratio()
                if likeness >= cut_off:
                    try:
                        clusters[f_a].append(f_b)
                    except KeyError:
                        clusters[f_a] = [f_b]

Thank you in advance!

Regards,
Albert-Jan

~~
All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?
~~
Re: [Tutor] Joining all strings in stringList into one string
Thanks for the excellent and very informative feedback. I guess I just did not consider the memory side of things and did not think about just how much extra copying was having to occur. Additionally, I found this supporting link:
http://wiki.python.org/moin/PythonSpeed/PerformanceTips#String_Concatenation

On 06/02/2012 04:29 PM, Steven D'Aprano wrote:
> Jordan wrote:
>
>> #Another version might look like this:
>>
>> def join_strings2(string_list):
>>     final_string = ''
>>     for string in string_list:
>>         final_string += string
>>     print(final_string)
>>     return final_string
>
> Please don't do that. This risks becoming slow. REALLY slow. Painfully
> slow. Like, 10 minutes versus 3 seconds slow. Seriously.
>
> The reason for this is quite technical, and the reason why you might
> not notice is even more complicated, but the short version is this:
>
> Never build up a long string by repeated concatenation of short strings.
> Always build up a list of substrings first, then use the join method to
> assemble them into one long string.
Re: [Tutor] How to identify clusters of similar files
Albert-Jan Roskam wrote:

> Hi, I want to use difflib to compare a lot (tens of thousands) of text
> files. I know that many files are quite similar as they are subsequent
> versions of the same document (a primitive kind of version control).
> What would be a good approach to cluster the files based on their
> likeness?

You have already identified the basic tool: difflib. But your question is not really about Python, it is more about the algorithm used for clustering data according to goodness of fit. That's a hard problem, and you should consider asking it on the main Python mailing list or newsgroup too.

Some search terms to get you started:

biopython
nltk (the Natural Language Tool Kit)
unrooted phylogram

Good luck!

--
Steven
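One simple way to capture the "chained" likeness from the question is to treat every pair of texts whose difflib ratio reaches the cut-off as linked, and then take the connected groups as clusters. The sketch below is an assumption about what might suit the problem, not a method proposed in the thread: the function cluster_texts, the in-memory dictionary of texts and the cut-off of 0.85 are all chosen purely for illustration.

```python
import difflib
from itertools import combinations

def cluster_texts(texts, cut_off=0.95):
    """Group texts into clusters where likeness may be "chained".

    texts maps a name to its content. Two names land in the same
    cluster if a chain of pairs with ratio >= cut_off connects them.
    """
    parent = {name: name for name in texts}

    def find(name):
        # Follow parent links to the representative of the cluster.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(sorted(texts), 2):
        ratio = difflib.SequenceMatcher(None, texts[a], texts[b]).ratio()
        if ratio >= cut_off:
            parent[find(a)] = find(b)   # merge the two clusters

    clusters = {}
    for name in texts:
        clusters.setdefault(find(name), []).append(name)
    return sorted(sorted(group) for group in clusters.values())

texts = {
    'v0': 'abcdefghij',
    'v1': 'abcdefghiX',   # ratio 0.9 with v0
    'v2': 'abcdefghXX',   # ratio 0.9 with v1, but only 0.8 with v0
    'other': '0123456789',
}
print(cluster_texts(texts, cut_off=0.85))
# [['other'], ['v0', 'v1', 'v2']]
```

Here v0 and v2 fall below the cut-off when compared directly, yet end up together because both are close to v1. Note that this still compares every pair, which for tens of thousands of files means hundreds of millions of comparisons; SequenceMatcher's cheaper quick_ratio() and real_quick_ratio() give upper bounds on ratio() and might serve as a first filter.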