[Tutor] Plural words to Singular
Has anyone come across a good-quality program to turn plural words into singular words? We don't want to use a stemmer.

Thanks.
Dinesh

___
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
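One common alternative to a stemmer is a rule-based singularizer. It first checks a table of irregular plurals, then strips known suffixes. The Python 3 sketch below assumes English input. The word lists and rules are hypothetical and far from complete, so it will still get many words wrong (for example "buses" becomes "buse").

```python
# Illustrative irregular plurals; a real tool needs a much larger table.
IRREGULAR = {
    "children": "child", "people": "person", "men": "man", "women": "woman",
    "mice": "mouse", "geese": "goose", "feet": "foot", "teeth": "tooth",
}
# Words that end in "s" but are already singular (also illustrative).
UNCHANGED = {"news", "series", "species", "sheep", "fish"}

# Suffix rules tried in order; the first match wins.
RULES = [
    ("ies", "y"),    # cities -> city
    ("sses", "ss"),  # classes -> class
    ("ches", "ch"),  # churches -> church
    ("shes", "sh"),  # dishes -> dish
    ("xes", "x"),    # boxes -> box
    ("s", ""),       # cats -> cat
]


def singularize(word):
    """Return a best-guess singular form of an English word (rule-based sketch)."""
    lower = word.lower()
    if lower in IRREGULAR:
        return IRREGULAR[lower]
    # Leave known singulars and common singular endings alone: glass, status, analysis.
    if lower in UNCHANGED or lower.endswith(("ss", "us", "is")):
        return word
    for suffix, replacement in RULES:
        if lower.endswith(suffix) and len(lower) > len(suffix):
            return word[: len(word) - len(suffix)] + replacement
    return word


words = ["cities", "classes", "churches", "boxes", "cats", "children", "glass", "dog"]
singulars = [singularize(w) for w in words]
print(singulars)  # ['city', 'class', 'church', 'box', 'cat', 'child', 'glass', 'dog']
```

Order matters in RULES. The more specific suffixes must come before the bare "s", or "classes" would become "classe".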
[Tutor] pickling codecs
I use codecs to keep a consistent Unicode/UTF-8 encoding when reading and writing files. Should codecs also be used with pickle/unpickle? For example, the standard syntax is:

    import pickle

    # pickle object
    f = open(filename, 'wb')
    pickle.dump(obj, f, 2)

    # unpickle object
    f = open(filename, 'rb')
    obj = pickle.load(f)

or should it be:

    import codecs
    import pickle

    # pickle object
    f = codecs.open(filename, 'wb', 'utf-8')
    pickle.dump(obj, f, 2)

    # unpickle object
    f = codecs.open(filename, 'rb', 'utf-8')
    obj = pickle.load(f)
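Pickle protocol 2 writes bytes, not text. The file should therefore be opened in plain binary mode and not wrapped in a text codec. Any Unicode strings inside the object are serialized by pickle itself and come back unchanged. The Python 3 sketch below shows this; the file name and sample data are made up for illustration.

```python
import os
import pickle
import tempfile


def save(obj, filename):
    # Binary mode: pickle writes raw bytes, so no text codec is involved.
    with open(filename, "wb") as f:
        pickle.dump(obj, f, 2)


def load(filename):
    with open(filename, "rb") as f:
        return pickle.load(f)


path = os.path.join(tempfile.mkdtemp(), "data.pkl")
data = {"name": "caf\u00e9", "quote": "\u201chello\u201d"}
save(data, path)
restored = load(path)
print(restored == data)  # True
```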
Re: [Tutor] Picking up citations
Kent

The citation without the name is perfect (and this appears to be how most citation parsers work). There are two issues in the test run:

1. The parallel citation 422 U.S. 490, 499 n. 10, 95 S.Ct. 2197, 2205 n. 10, 45 L.Ed.2d 343 (1975) is resolved as:

    422 U.S. 490 (1975)
    499 n. 10 (1975)
    95 S.Ct. 2197 (1975)
    2205 n. 10 (1975)
    45 L.Ed.2d 343 (1975)

instead of as:

    422 U.S. 490, 499 n. 10 (1975)
    95 S.Ct. 2197, 2205 n. 10 (1975)
    45 L.Ed.2d 343 (1975)

i.e. parsing the second page references should pick up all alphanumeric characters between the commas.

2. It doesn't parse the last citation, i.e. 463 U.S. 29, 43, 103 S.Ct. 2856, 2867, 77 L.Ed.2d 443 (1983). I tested it on another sample text and it missed the last citation too.

Thanks!
Dinesh

From: Kent Johnson
Sent: Tuesday, February 10, 2009 4:01 AM
To: Dinesh B Vadhia
Cc: tutor@python.org
Subject: Re: [Tutor] Picking up citations

On Mon, Feb 9, 2009 at 12:51 PM, Dinesh B Vadhia wrote:
> Kent / Emmanuel
>
> Below are the results using the PLY parser and regex versions on the
> attached 'sierra' data, which I think covers the common formats. Here are
> some "fully unparsed" citations that were missed by the programs:
>
> Smith v. Wisconsin Dept. of Agriculture, 23 F.3d 1134, 1141 (7th Cir.1994)
>
> Indemnified Capital Investments, S.A. v. R.J. O'Brien & Assoc., Inc., 12
> F.3d 1406, 1409 (7th Cir.1993).
>
> Hunt v. Washington Apple Advertising Commn., 432 U.S. 333, 343, 97 S.Ct.
> 2434, 2441, 53 L.Ed.2d 383 (1977)
>
> Idaho Conservation League v. Mumma, 956 F.2d 1508, 1517-18 (9th Cir.1992)

A few issues here:
S.A. - this is hard to allow while still filtering out sentences
R.J. O'Brien, etc. - loosening up the rules for the second name can allow these
1517-18 - allow page ranges

The name issues are getting to be too much for me. Attached is a PLY version that just pulls out the citation without the name; at one point you indicated that would work for you.
Kent
Re: [Tutor] Picking up citations
I'm guessing that '499 n. 10' is a page reference, i.e. page 499, note (footnote) 10. Legal citations are all a mystery; they even have their own citation manual, The Bluebook (http://www.legalbluebook.com/)!

Dinesh

From: Kent Johnson
Sent: Tuesday, February 10, 2009 10:57 AM
To: Dinesh B Vadhia
Cc: tutor@python.org
Subject: Re: [Tutor] Picking up citations

On Tue, Feb 10, 2009 at 12:42 PM, Dinesh B Vadhia wrote:
> Kent
>
> The citation without the name is perfect (and this appears to be how most
> citation parsers work). There are two issues in the test run:
>
> 1. The parallel citation 422 U.S. 490, 499 n. 10, 95 S.Ct. 2197, 2205 n.
> 10, 45 L.Ed.2d 343 (1975) is resolved as:
>
> 422 U.S. 490 (1975)
> 499 n. 10 (1975)
> 95 S.Ct. 2197 (1975)
> 2205 n. 10 (1975)
> 45 L.Ed.2d 343 (1975)
>
> instead of as:
>
> 422 U.S. 490, 499 n. 10 (1975)
> 95 S.Ct. 2197, 2205 n. 10 (1975)
> 45 L.Ed.2d 343 (1975)
>
> ie. parsing the second page references should pick up all alphanumeric chars
> between the commas.

So 499 n. 10 is a page reference? I can't pick up all alphanumeric chars between commas; that would include a second reference.

Kent
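Kent's objection is the crux of the problem. A comma by itself can't tell pinpoint pages apart from the next parallel citation. One way around this with a plain regular expression (this is not Kent's PLY grammar) uses the fact that a new citation's volume number is always followed by a capitalized reporter abbreviation. So a number after a comma is accepted as a pinpoint only when no such abbreviation follows it.

The Python 3 sketch below makes these assumptions:
- Reporter abbreviations contain no spaces. This holds for U.S., S.Ct., L.Ed.2d and F.3d, but not for something like "F. Supp. 2d".
- A pinpoint is a page, a page range such as 1517-18, or a footnote such as 499 n. 10.

```python
import re

# volume, reporter, first page, then zero or more ", pinpoint" groups.
# A pinpoint is a page, a page range (1517-18) or a footnote (499 n. 10). The
# negative lookahead rejects a number followed by " <Capital letter>" (that is
# the volume of the next parallel citation) and stops \d+ from backtracking
# into a shorter number.
CITATION = re.compile(
    r"(\d+) ([A-Z][A-Za-z0-9.]*) (\d+)"
    r"((?:, \d+(?:-\d+)?(?: n\. ?\d+)?(?!\d| [A-Z]))*)"
)
# Year inside the closing parentheses, e.g. "(1975)" or "(9th Cir.1992)".
YEAR = re.compile(r"\(([^()]*?)(\d{4})\)")


def split_parallel(text):
    """Split a parallel citation into one string per reporter, each with the year."""
    year_match = YEAR.search(text)
    year = f" ({year_match.group(2)})" if year_match else ""
    return [m.group(0) + year for m in CITATION.finditer(text)]


for cite in split_parallel(
    "422 U.S. 490, 499 n. 10, 95 S.Ct. 2197, 2205 n. 10, 45 L.Ed.2d 343 (1975)"
):
    print(cite)
# 422 U.S. 490, 499 n. 10 (1975)
# 95 S.Ct. 2197, 2205 n. 10 (1975)
# 45 L.Ed.2d 343 (1975)
```

The same pattern also handles the "463 U.S. 29, 43, 103 S.Ct. ..." citation that was being missed, and the 1517-18 page range.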
Re: [Tutor] Picking up citations
You're probably right, Paul. But my assumption is that the originators of legal documents pay a little more attention to getting citations correct and in the right format than, say, Joe Bloggs does when completing an address block.

I think that Kent has reached the end of his commendable effort. I'll test out the latest version in anger over the coming weeks on large numbers of legal documents.

Dinesh

Date: Tue, 10 Feb 2009 14:29:20 -0600
From: "Paul McGuire"
Subject: Re: [Tutor] Picking up citations

Dinesh and Kent -

I've been lurking along as you run this problem to ground. The syntax you are working on looks very slippery, and reminds me of some of the issues I had writing a generic street address parser with pyparsing (http://pyparsing.wikispaces.com/file/view/streetAddressParser.py). Mailing list companies spend beaucoup $$$ trying to parse addresses in order to filter duplicates, to group by zip code, street, neighborhood, etc., and this citation format looks similarly scary.

Congratulations on getting to a 95% solution using PLY.

-- Paul
[Tutor] Removing control characters
I want a regex to remove control characters and other characters outside printable ASCII (< chr(32) or > chr(126)) from strings, i.e.

    line = re.sub(r"[^a-z0-9-';.]", " ", line)  # replace all chars NOT A-Z, a-z, 0-9, [-';.] with " "

1. What is the best way to include all the required chars rather than list them all within the r""?
2. How do you handle the inclusion of the quotation mark "?

Cheers
Dinesh
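The character class in the question has no A-Z, so upper-case letters would be replaced as well, despite what the comment says. Below is a minimal Python 3 sketch of two forms:
- One replaces everything outside printable ASCII using a hex range, as Kent suggests in his reply.
- The other keeps an explicit whitelist that includes the double quote.

The function names are made up for illustration.

```python
import re

# Anything outside printable ASCII (space 0x20 through tilde 0x7e) becomes a space.
NON_PRINTABLE = re.compile(r"[^\x20-\x7e]")

# Whitelist version: letters in both cases, digits, - ' ; . and the double quote.
# '-' is escaped so it can't form a range; '"' needs no escape inside a '...' literal.
NOT_ALLOWED = re.compile(r'[^A-Za-z0-9\-\';."]')


def strip_non_printable(line):
    return NON_PRINTABLE.sub(" ", line)


def keep_whitelist(line):
    return NOT_ALLOWED.sub(" ", line)


sample = 'Tab\there, "quoted" text; caf\u00e9.'
print(repr(strip_non_printable(sample)))  # 'Tab here, "quoted" text; caf .'
print(repr(keep_whitelist(sample)))       # 'Tab here  "quoted" text; caf .'
```

The whitelist also blanks out commas and spaces, because they are not in the class.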
Re: [Tutor] Removing control characters
At the bottom of http://code.activestate.com/recipes/303342/ there are list comprehensions for string manipulation, i.e.

    import string
    str = 'Chris Perkins : 224-7992'
    set = '0123456789'
    r = '$'

    # 1) Keeping only a given set of characters.
    print ''.join([c for c in str if c in set])
    > '2247992'

    # 2) Deleting a given set of characters.
    print ''.join([c for c in str if c not in set])
    > 'Chris Perkins : -'

The missing one is:

    # 3) Replacing a set of characters with a single character, i.e.
    for c in str:
        if c in set:
            string.replace(c, r)

to give

    > 'Chris Perkins : $$$-$$$$'

My solution is:

    print ''.join[string.replace(c, r) for c in str if c in set]

But this returns a syntax error. Any idea why?

Ta!
Dinesh

From: Kent Johnson
Sent: Thursday, February 19, 2009 8:03 AM
To: Dinesh B Vadhia
Cc: tutor@python.org
Subject: Re: [Tutor] Removing control characters

On Thu, Feb 19, 2009 at 10:14 AM, Dinesh B Vadhia wrote:
> I want a regex to remove control characters (< chr(32) and > chr(126)) from
> strings ie.
>
> line = re.sub(r"[^a-z0-9-';.]", " ", line) # replace all chars NOT A-Z,
> a-z, 0-9, [-';.] with " "
>
> 1. What is the best way to include all the required chars rather than list
> them all within the r"" ?

You have to list either the chars you want, as you have done, or the ones you don't want. You could use r'[\x00-\x1f\x7f-\xff]' or r'[^\x20-\x7e]'

> 2. How do you handle the inclusion of the quotation mark " ?

Use \", that works even in a raw string.

By the way, string.translate() is likely to be faster for this purpose than re.sub(). This recipe might help:
http://code.activestate.com/recipes/303342/

Kent
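That last line has two problems:
- join is indexed with square brackets instead of being called with parentheses. This is the syntax error.
- The trailing "if c in set" filter would keep only the matching characters rather than replace them.

The Python 3 sketch below implements the third recipe with a conditional expression instead of a filter. The function and variable names are illustrative; they avoid shadowing the built-ins str and set.

```python
def replace_chars(text, chars, replacement):
    """Return text with every character found in `chars` swapped for `replacement`."""
    # The conditional expression keeps every character; there is no trailing
    # `if` clause, so nothing is filtered out.
    return "".join([replacement if c in chars else c for c in text])


phone = "Chris Perkins : 224-7992"
digits = "0123456789"
masked = replace_chars(phone, digits, "$")
print(masked)  # Chris Perkins : $$$-$$$$
```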
Re: [Tutor] Removing control characters
Okay, here is a combination of Mark's suggestions and yours:

    >>> # string of all chars
    >>> a = ''.join([chr(n) for n in range(256)])
    >>> a
    '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
    >>> # string of wanted chars
    >>> b = ''.join([n for n in a if ord(n) >= 32 and ord(n) <= 126])
    >>> b
    ' !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~'
    >>> # string of unwanted chars
    >>> c = ''.join([n for n in a if ord(n) < 32 or ord(n) > 126])
    >>> c
    '\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'
    >>> # the string to process
    >>> s = "Product Concepts\xe2\x80\x94Hard candy with an innovative twist, Internet Archive: Wayback Machine. [online] Mar. 25, 2004. Retrieved from the Internet http://www.confectionery-innovations.com>."
    >>> # replace unwanted chars in string s with " "
    >>> t = "".join([(" " if n in c else n) for n in s if n not in c])
    >>> t
    'Product ConceptsHard candy with an innovative twist, Internet Archive: Wayback Machine. [online] Mar. 25, 2004. Retrieved from the Internet http://www.confectionery-innovations.com>.'

This last bit doesn't work, i.e. replacing the unwanted chars with " " (e.g. 'ConceptsHard'). What's missing?

Dinesh

From: Kent Johnson
Sent: Thursday, February 19, 2009 12:36 PM
To: Dinesh B Vadhia
Cc: tutor@python.org
Subject: Re: [Tutor] Removing control characters

On Thu, Feb 19, 2009 at 2:25 PM, Dinesh B Vadhia wrote:
> # 3) Replacing a set of characters with a single character ie.
>
> for c in str:
>     if c in set:
>         string.replace (c, r)
>
> to give
>
>> 'Chris Perkins : $$$-$$$$'
>
> My solution is:
>
> print ''.join[string.replace(c, r) for c in str if c in set]

With the syntax corrected this will not do what you want; the "if c in set" filters the characters in the result, so the result will contain only the replacement characters. You would need something like

    ''.join([(r if c in set else c) for c in str])

Note that both 'set' and 'str' are built-in names and therefore poor choices for variable names.

Kent
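The same problem is at work here. The trailing "if n not in c" filter drops the unwanted characters before the conditional expression ever sees them, so they vanish instead of becoming spaces. Removing that clause fixes it. For the speed Kent mentions, str.translate can do the same job with one precomputed table.

The Python 3 sketch below shows both fixes; the names are illustrative. Note that in Python 3 the literal "\xe2\x80\x94" is three separate characters, each of them outside printable ASCII.

```python
# Every character outside printable ASCII (32..126) in the range 0..255,
# built the same way as `c` in the session above.
unwanted = "".join(chr(n) for n in range(256) if n < 32 or n > 126)


def blank_with_join(s):
    # No trailing `if` filter: each character is either kept or replaced.
    return "".join([" " if ch in unwanted else ch for ch in s])


# str.maketrans with a dict maps characters to replacement strings;
# translate() then applies the whole table in a single call.
BLANK_TABLE = str.maketrans({ch: " " for ch in unwanted})


def blank_with_translate(s):
    return s.translate(BLANK_TABLE)


sample = "Product Concepts\xe2\x80\x94Hard candy"
print(repr(blank_with_join(sample)))       # 'Product Concepts   Hard candy'
print(repr(blank_with_translate(sample)))  # 'Product Concepts   Hard candy'
```

Characters above U+00FF (for example a real em dash, U+2014) are not in this table, so they pass through unchanged. Extend the range if the input can contain them.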
[Tutor] Standardizing on Unicode and utf8
We want to standardize on Unicode and UTF-8 and would like to clarify and verify their use, to minimize encode()/decode()'ing:

1. Python source files: use the header

    # -*- coding: utf8 -*-

2. Reading files: in most cases we don't know the source encoding of the files being read. Do we have to decode('utf8') after reading from a file?

3. Writing files: we will always write files in UTF-8. Do we have to encode('utf8') before writing to a file?

Is there anything else that we have to consider?

Cheers
Dinesh
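The usual pattern has three parts:
- Decode once at input.
- Work with Unicode strings internally.
- Encode once at output.

The coding header only describes the source file itself, not the data the program reads. In the Python 3 sketch below, open() does the decoding and encoding when it is given an encoding; in Python 2, io.open or codecs.open plays that role.

The input encoding is often unknown, so the sketch assumes UTF-8 and uses errors="replace". Undecodable bytes then become U+FFFD instead of raising an error. That choice is an assumption, and it silently loses data from files that are really in another encoding.

```python
import os
import tempfile


def read_text(path, encoding="utf-8"):
    # Decode once, at the boundary; bytes that are not valid in `encoding`
    # become U+FFFD instead of raising UnicodeDecodeError.
    with open(path, encoding=encoding, errors="replace") as f:
        return f.read()


def write_text(path, text):
    # Encode once, at the boundary: always UTF-8 on output.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


folder = tempfile.mkdtemp()
good = os.path.join(folder, "good.txt")
write_text(good, "caf\u00e9 \u201cquoted\u201d")
round_trip = read_text(good)

bad = os.path.join(folder, "bad.txt")
with open(bad, "wb") as f:
    f.write(b"abc\xff")  # 0xff never appears in valid UTF-8
recovered = read_text(bad)
```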
[Tutor] Sorting large numbers of co-ordinate pairs
I have a large number (> 1bn) of integer co-ordinates (i, j). The i are ordered and the j unordered. I want to create (j, i) with j ordered and i unordered, i.e. from:

    ...
    6940, 22886
    6940, 38277
    6940, 43788
    ...

to:

    ...
    38277, 567
    38277, 90023
    38277, 6940
    ...

I've tried the dictionary route and it works perfectly for small sets of co-ordinate pairs, but not for large sets, as it hits memory capacity. Any ideas how I could do this?

Dinesh
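A common way to sort more data than fits in memory is an external merge sort:
1. Swap each pair.
2. Sort fixed-size chunks in memory.
3. Write each sorted chunk (a "run") to disk.
4. Merge the runs.

The Python 3 sketch below uses only the standard library; heapq.merge does the k-way merge. The text file format and chunk size are assumptions. For a billion pairs, a binary format or the operating system's sort utility would likely be faster. With 10 million pairs per chunk, a billion pairs gives about 100 run files open at once during the merge.

```python
import heapq
import itertools
import os
import tempfile


def _write_run(chunk, directory):
    # Sort one in-memory chunk of (j, i) pairs and spill it to a temporary file.
    chunk.sort()
    fd, path = tempfile.mkstemp(suffix=".run", dir=directory)
    with os.fdopen(fd, "w") as f:
        f.writelines(f"{j} {i}\n" for j, i in chunk)
    return path


def _read_run(f):
    # Stream one sorted run back as integer pairs.
    for line in f:
        j, i = line.split()
        yield int(j), int(i)


def swap_and_sort(pairs, chunk_size=10_000_000, directory=None):
    """Yield (j, i) for every (i, j) in `pairs`, ordered by j (ties by i).

    Only `chunk_size` pairs are held in memory while sorting; the sorted
    runs are then merged lazily with heapq.merge.
    """
    pairs = iter(pairs)
    runs = []
    while True:
        chunk = [(j, i) for i, j in itertools.islice(pairs, chunk_size)]
        if not chunk:
            break
        runs.append(_write_run(chunk, directory))
    files = [open(path) for path in runs]
    try:
        yield from heapq.merge(*(_read_run(f) for f in files))
    finally:
        for f in files:
            f.close()
        for path in runs:
            os.remove(path)


sample = [(6940, 22886), (6940, 38277), (6940, 43788), (567, 38277), (90023, 38277)]
result = list(swap_and_sort(sample, chunk_size=2))
print(result)
```

Ties on j come out ordered by i, which is stricter than required but harmless.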
[Tutor] 32-bit libraries on 64-bit Windows
Does anyone know if 32-bit Python libraries will work with 64-bit Python under 64-bit Windows? For example, will 32-bit NumPy or SciPy work under 64-bit Python?

Cheers ...
Dinesh
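Pure-Python modules run on either build. Compiled extension modules, such as the .pyd files inside NumPy and SciPy, are native DLLs, and a 64-bit Windows process cannot load 32-bit DLLs. Those packages must therefore be built for the same bitness as the interpreter. A quick way to check which interpreter is running:

```python
import platform
import struct

# Pointer size of the running interpreter, in bits: 32 or 64.
bits = struct.calcsize("P") * 8
arch = platform.architecture()[0]  # a string such as "64bit"
print(bits, arch)
```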
[Tutor] parse text for paragraphs/sections
Hi! I want to parse text and pick up sections. For example, from the text:

    t = """abc DEF ghi jkl MNO pqr"""

... pick up all text between the tags and and replace it with another piece of text. I tried

    t = re.sub(r"\[A-Za-z0-9]\", "DBV", t)

... but it doesn't work. How do you do this with re?

Thanks
Dinesh
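The tag names in the original post were lost from the archive, so the Python 3 sketch below uses a hypothetical <SEC>...</SEC> pair. The class [A-Za-z0-9] matches exactly one letter or digit, with no spaces or newlines, so it cannot span a whole section. A non-greedy .*? combined with re.DOTALL matches everything between each opening tag and the nearest closing tag, across line breaks.

```python
import re

# <SEC> and </SEC> are placeholder tags; substitute the real ones.
# .*? is non-greedy, so each match stops at the nearest closing tag instead of
# running to the last one; re.DOTALL lets '.' match newlines inside a section.
SECTION = re.compile(r"<SEC>.*?</SEC>", re.DOTALL)


def replace_sections(text, replacement):
    """Replace each tagged section, tags included, with `replacement`."""
    return SECTION.sub(replacement, text)


t = """abc <SEC>DEF
ghi</SEC> jkl <SEC>MNO</SEC> pqr"""
result = replace_sections(t, "DBV")
print(result)  # abc DBV jkl DBV pqr
```

To keep the tags and replace only their contents, capture the tags and put them back: re.sub(r"(<SEC>).*?(</SEC>)", r"\1DBV\2", t, flags=re.DOTALL).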
Re: [Tutor] PDF to text conversion
Hi Robert

I don't have an answer, but you can have my sympathy. I've been looking for a good-quality PDF-to-text converter for months and haven't turned up anything useful. I've tried many free programs, which are poor. I too wanted a Python-only solution and tried pyPdf, but that didn't work. Just today I downloaded a trial version of a so-called top-notch converter and it produced unfaithful text. Not sure what the answer is!

Dinesh

Date: Tue, 21 Apr 2009 13:44:16 -0400
From: Robert Berman
Subject: Re: [Tutor] PDF to text conversion
To: Emad Nawfal
Cc: tutor@python.org

Hello Emad,

I have seriously looked at the documentation associated with pyPDF. This seems to have the page as its smallest element of work, and what I need is a line-by-line process to go from .PDF format to text. I don't think pyPDF will meet my needs, but thank you for bringing it to my attention.

Thanks,

Robert Berman
Re: [Tutor] PDF to text conversion
The best converter so far is pdftotext from http://www.glyphandcog.com/, who maintain an open-source project at http://www.foolabs.com/xpdf/. It's not a Python library, but you can call pdftotext from within Python using os.system(). I used the pdftotext -layout option and that gave the best result.

hth.
dinesh

Date: Tue, 21 Apr 2009 18:37:39 -0400
From: Robert Berman
Subject: Re: [Tutor] PDF to text conversion
To: tutor@python.org

First, thanks to everyone who contributed to this thread. I have a number of possible solutions and a number of paths to pursue to determine which avenue I should take to resolve this remaining issue.

I did try the itools library and, while everything installed nicely, most of the tests failed, so I am not particularly overjoyed with the results.

Thank you Dinesh for the vote of sympathy. I do appreciate it.

I did use Adobe Reader to convert the history PDF file into a text file and it did seem to do it faithfully. So now I will work out a parsing function to extract my data and send it to an SQLite database.

I am thrilled both with the number of suggestions I have received from this group and the quality of the suggestions.

Thanks again,

Robert Berman
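os.system() only returns an exit status. The subprocess module can capture pdftotext's output directly, with no temporary text file. The Python 3.7+ sketch below relies on these assumptions:
- pdftotext is on the PATH.
- "-" as the output file name sends the text to stdout.
- "-enc UTF-8" selects UTF-8 output.

Check `pdftotext -help` for the options your version supports.

```python
import subprocess


def pdftotext_command(pdf_path, layout=True):
    # Argument list for the pdftotext tool; options go before the file names,
    # and "-" as the output file sends the text to stdout.
    cmd = ["pdftotext"]
    if layout:
        cmd.append("-layout")  # keep the physical page layout
    cmd += ["-enc", "UTF-8", pdf_path, "-"]
    return cmd


def pdf_to_text(pdf_path, layout=True):
    """Run pdftotext and return its output as a string."""
    result = subprocess.run(
        pdftotext_command(pdf_path, layout),
        capture_output=True,
        check=True,  # raise CalledProcessError if pdftotext fails
    )
    return result.stdout.decode("utf-8", errors="replace")
```

Passing a list rather than a single string avoids shell quoting problems with file names that contain spaces.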
[Tutor] finding mismatched or unpaired html tags
I'm processing tens of thousands of HTML files. A few of them contain mismatched tags, and ElementTree throws the error:

    "Unexpected error opening J:/F2/663/blahblah.html: mismatched tag: line 124, column 8"

I now want to scan each file and simply identify each mismatched or unpaired tag (by line number) in each file. I've read the ElementTree docs and cannot see anything obvious on how to do this. I know this is a common problem but I'm feeling a bit clueless here. Any ideas?

Dinesh
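ElementTree's ParseError carries the location as err.position, a (line, column) tuple. Catching it gives the line number directly, and the offending line can then be read back out of the file. The underlying expat parser stops at the first error, so this reports one problem per file; fix it and rescan to find the next. A minimal Python 3 sketch, with a made-up demo file:

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def first_parse_error(path):
    """Return (line, column, message, source_line) for the first XML error, or None."""
    try:
        ET.parse(path)
    except ET.ParseError as err:
        line, column = err.position  # as reported by expat
        with open(path, encoding="utf-8", errors="replace") as f:
            lines = f.read().splitlines()
        source = lines[line - 1] if 0 < line <= len(lines) else ""
        return line, column, str(err), source
    return None


def scan(paths):
    for path in paths:
        problem = first_parse_error(path)
        if problem:
            line, column, message, source = problem
            print(f"{path}: {message}\n    {source.strip()}")


# Demo on a small file with a mismatched tag on line 3.
demo = os.path.join(tempfile.mkdtemp(), "demo.html")
with open(demo, "w", encoding="utf-8") as f:
    f.write("<html>\n<body>\n<p>text</b>\n</body>\n</html>\n")
problem = first_parse_error(demo)
scan([demo])
```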
Re: [Tutor] finding mismatched or unpaired html tags
A.T. / Marty

I'd prefer that the HTML parser didn't replace the missing tags, as I want to know where and what the problems are. Also, the source HTML documents were generated by another computer, i.e. they are not web page documents. My sense is that it is only a few files out of tens of thousands.

Cheers ...
Dinesh

Date: Tue, 28 Apr 2009 08:54:33 -0500
From: Martin Walsh
Subject: Re: [Tutor] finding mismatched or unpaired html tags
To: "tutor@python.org"

A.T.Hofkamp wrote:
> Dinesh B Vadhia wrote:
>> I'm processing tens of thousands of html files and a few of them
>> contain mismatched tags and ElementTree throws the error:
>>
>> "Unexpected error opening J:/F2/663/blahblah.html: mismatched tag:
>> line 124, column 8"
>>
>> I now want to scan each file and simply identify each mismatched or
>> unpaired tags (by line number) in each file. I've read the ElementTree
>> docs and cannot see anything obvious how to do this. I know this is a
>> common problem but feeling a bit clueless here - any ideas?
>
> Don't use elementTree, use BeautifulSoup instead.
>
> elementTree expects perfect input, typically generated by another computer.
> BeautifulSoup is designed to handle your everyday HTML page, filled with
> errors of all possible kinds.

But it also modifies the source HTML by default, adding closing tags, etc. Important to know, I suppose, if you intend to rewrite the HTML files you parse with BeautifulSoup.

Also, unless you're running Python 3.0 or greater, use the 3.0.x series of BeautifulSoup; otherwise you may run into the same issue.
http://www.crummy.com/software/BeautifulSoup/3.1-problems.html

HTH,
Marty
Re: [Tutor] finding mismatched or unpaired html tags
This is the error and traceback:

    Unexpected error opening J:/F2/html: mismatched tag: line 124, column 8

    Traceback (most recent call last):
      File "C:\py", line 492, in <module>
        raw = extractText(xhtmlfile)
      File "C:\py", line 334, in extractText
        tree = make_tree(xhtmlfile)
      File "py", line 169, in make_tree
        return tree
    UnboundLocalError: local variable 'tree' referenced before assignment

Here is line 124, col 8, and I cannot see any obvious missing/mismatched tags:

    "As to the present time I am unable physical and mentally to secure all this information at present."

Dinesh

From: Kent Johnson
Sent: Tuesday, April 28, 2009 7:13 AM
To: Dinesh B Vadhia
Cc: tutor@python.org
Subject: Re: [Tutor] finding mismatched or unpaired html tags

On Tue, Apr 28, 2009 at 8:54 AM, Dinesh B Vadhia wrote:
> I'm processing tens of thousands of html files and a few of them contain
> mismatched tags and ElementTree throws the error:
>
> "Unexpected error opening J:/F2/663/blahblah.html: mismatched tag: line 124,
> column 8"
>
> I now want to scan each file and simply identify each mismatched or unpaired
> tags (by line number) in each file. I've read the ElementTree docs and
> cannot see anything obvious how to do this. I know this is a common problem
> but feeling a bit clueless here - any ideas?

It seems like the exception gives you the line number. What kind of exception is raised? The exception object may contain the line and column in a more accessible form, so you could catch the exception, get the line number, then read that line out of the file and show it.

Kent
Re: [Tutor] finding mismatched or unpaired html tags
Found the mismatched tag on line 94:

    "My Name in Nelma Lois Thornton-S.S. No. sjn-yz-yokv/p>"

should be:

    "My Name in Nelma Lois Thornton-S.S. No. sjn-yz-yokv</p>"

I'll run all the HTML files through a simple script to identify the mismatches using etree.

Thanks.
Dinesh

From: Kent Johnson
Sent: Tuesday, April 28, 2009 8:17 AM
To: Dinesh B Vadhia
Cc: tutor@python.org
Subject: Re: [Tutor] finding mismatched or unpaired html tags

On Tue, Apr 28, 2009 at 10:41 AM, Dinesh B Vadhia wrote:
> This is the error and traceback:
>
> Unexpected error opening J:/F2/html: mismatched tag: line 124, column 8
>
> Traceback (most recent call last):
>   File "C:\py", line 492, in <module>
>     raw = extractText(xhtmlfile)
>   File "C:\py", line 334, in extractText
>     tree = make_tree(xhtmlfile)
>   File "py", line 169, in make_tree
>     return tree
> UnboundLocalError: local variable 'tree' referenced before assignment

This is inconsistent. The exception in the stack trace is from a coding error in extractText. It looks like maybe extractText is catching exceptions and printing them, and a bug in the exception handling is causing the UnboundLocalError.

> Here is line 124, col 8 and I cannot see any obvious missing/mismatched
> tags:
>
> "As to the present time I am unable physical and mentally to secure all
> this information at present."

If you look at a few more lines, do you see anything untoward? Perhaps there is a missing before the , for example? I don't think is allowed inside every tag.

Kent
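Kent's diagnosis can be illustrated with a hypothetical make_tree(); the real one was not posted. If its except clause only prints the error, the name tree is never bound, and the resulting UnboundLocalError hides the useful ParseError. The Python 3 sketch below reports the error and then re-raises it, so the caller can decide to skip or log the file.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def make_tree(path):
    # Hypothetical reconstruction: the original make_tree() was not posted.
    # If the except branch only prints, `tree` is never assigned and
    # `return tree` raises UnboundLocalError, hiding the real ParseError.
    # Reporting and then re-raising keeps the real error visible.
    try:
        tree = ET.parse(path)
    except ET.ParseError as err:
        print(f"Unexpected error opening {path}: {err}")
        raise
    return tree


folder = tempfile.mkdtemp()
good = os.path.join(folder, "good.xhtml")
with open(good, "w", encoding="utf-8") as f:
    f.write("<html><body><p>ok</p></body></html>")
bad = os.path.join(folder, "bad.xhtml")
with open(bad, "w", encoding="utf-8") as f:
    f.write("<html><body><p>text</b></body></html>")

good_root = make_tree(good).getroot().tag
try:
    make_tree(bad)
    outcome = "parsed"
except ET.ParseError as err:
    outcome = f"ParseError at {err.position}"
print(good_root, outcome)
```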
Re: [Tutor] finding mismatched or unpaired html tags
Stefan / Alan et al

Thank you for all the advice and links. A simple script using etree is scanning 500K+ XHTML files, and 2 files with mismatched tags have been found so far, which can be fixed manually. I'll definitely look into "tidy", as it sounds pretty cool.

Because we are running data processing programs on a 64-bit Windows box (yes, I know, I know ...) using 64-bit Python, we can only use pure-Python libraries. I believe that lxml uses C libraries.

Again, thanks to everyone. A terrific community as usual!

Date: Tue, 28 Apr 2009 19:39:17 +0200
From: Stefan Behnel
Subject: Re: [Tutor] finding mismatched or unpaired html tags
To: tutor@python.org

A.T.Hofkamp wrote:
> Dinesh B Vadhia wrote:
>> I'm processing tens of thousands of html files and a few of them
>> contain mismatched tags and ElementTree throws the error:
>>
>> "Unexpected error opening J:/F2/663/blahblah.html: mismatched tag:
>> line 124, column 8"
>>
>> I now want to scan each file and simply identify each mismatched or
>> unpaired tags (by line number) in each file. I've read the ElementTree
>> docs and cannot see anything obvious how to do this. I know this is a
>> common problem but feeling a bit clueless here - any ideas?
>
> Don't use elementTree, use BeautifulSoup instead.

Actually, now that the code is there anyway, the OP might be happier with lxml.html. It's a lot faster than BeautifulSoup, uses less memory, and often parses broken HTML better. It's also more user-friendly for many HTML tasks.

http://codespeak.net/lxml/lxmlhtml.html

This might also be worth a read:

http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/

Stefan
Re: [Tutor] finding mismatched or unpaired html tags
Lie / Alan

re: If the source document was generated by a computer, and it produces invalid markup, shouldn't that be considered a bug in the producing program?

Yes, absolutely, but we don't have access to the producing program, only the produced XHTML files.

Dinesh

Date: Wed, 29 Apr 2009 08:35:16 +0100
From: "Alan Gauld"
Subject: Re: [Tutor] finding mismatched or unpaired html tags
To: tutor@python.org

"Lie Ryan" wrote
>> documents were generated by another computer ie. they are not web page
>> documents.
>
> If the source document was generated by a computer, and it produces
> invalid markup, shouldn't that be considered a bug in the producing

ElementTree parses XML; the source docs are HTML. Valid HTML may not be valid XML, so the source could be correct even though it doesn't parse properly in ElementTree.

OTOH you could be right! :-)

Alan G.
[Tutor] Dictionary, integer, compression
This could be a question for the comp.lang.python list, but I'll try here first:

Say you have a dictionary of integers. Are the integers stored in a compressed integer format or as integers? That is, are integers encoded before being stored in the dictionary and then decoded when read?

Dinesh
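In CPython a dict stores references to full int objects, and nothing is compressed. sys.getsizeof shows the per-object overhead. The standard array module is one way to hold large runs of integers compactly as raw machine values. Exact byte counts vary by Python version and platform, so the Python 3 sketch below only compares them.

```python
import sys
from array import array

values = list(range(100_000, 101_000))

# A dict maps keys to references to full int objects, each with its own object header.
d = {i: v for i, v in enumerate(values)}
per_int_object = sys.getsizeof(values[0])

# array('q') stores each value as a raw signed long long (8 bytes on common
# platforms), with no per-item object header.
packed = array("q", values)
per_packed_item = packed.itemsize

print(per_int_object, per_packed_item)
```

An array gives up the dict's key lookup. It suits cases where the keys are simply positions 0..n-1.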
Re: [Tutor] Dictionary, integer, compression
Alan

I want to perform test runs on my local machine with very large numbers of integers stored in a dictionary. As the Python dictionary is a built-in type, I thought that for very large dictionaries there could be compression. Done correctly, integer compression wouldn't hurt performance and could even enhance it. Weird, I know! I'll check in with the comp.lang.python lot.

Dinesh

Date: Wed, 29 Apr 2009 17:35:53 +0100
From: "Alan Gauld"
Subject: Re: [Tutor] Dictionary, integer, compression
To: tutor@python.org

"Dinesh B Vadhia" wrote
> Say, you have a dictionary of integers, are the integers stored
> in a compressed integer format or as integers ie. are integers
> encoded before being stored in the dictionary and then
> decoded when read?

I can't think of any reason to compress them; I imagine they are stored as integers. But given the way Python handles integers, with arbitrarily long numbers etc., it may well be more complex than a simple integer (i.e. a 4-byte number). But any form of compression would be likely to hit performance, so I doubt that they would be compressed.

Is there anything that made you think they might be?

HTH

-- 
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/
[Tutor] reading nested folders in gzip files
The structure of the gzip files is:

    gzip archive
        folderA
            folderB
                list of folderC's
                    each folderC contains the target files

Within the archive, I want to open the gzip archive, open folderA, open folderB, get the list of target files in each folderC, and extract each file in folderC individually. I've used gzip before but cannot see how to move from folderA to folderB within the archive. Any ideas?

Dinesh
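A gzip file on its own compresses a single stream and has no notion of folders. An archive with a directory tree inside is almost always a tar file that was then gzipped (.tar.gz or .tgz). Assuming that is the case here, the standard tarfile module can read it directly, and the "folders" are just path prefixes in the member names. In the Python 3 sketch below, the prefix and the demo archive are made up to mirror the layout above.

```python
import io
import os
import tarfile
import tempfile


def iter_target_files(archive_path, prefix="folderA/folderB/"):
    """Yield (member_name, contents) for every regular file below `prefix`."""
    with tarfile.open(archive_path, "r:gz") as tar:
        for member in tar.getmembers():
            if member.isfile() and member.name.startswith(prefix):
                # extractfile() returns a file-like object; nothing is written to disk.
                yield member.name, tar.extractfile(member).read()


# Demo: build a small .tar.gz with the layout described above.
archive = os.path.join(tempfile.mkdtemp(), "sample.tar.gz")
with tarfile.open(archive, "w:gz") as tar:
    for name, data in [
        ("folderA/folderB/folderC1/a.txt", b"one"),
        ("folderA/folderB/folderC2/b.txt", b"two"),
        ("folderA/readme.txt", b"not a target"),
    ]:
        info = tarfile.TarInfo(name)
        info.size = len(data)
        tar.addfile(info, io.BytesIO(data))

targets = dict(iter_target_files(archive))
for name in sorted(targets):
    print(name)
```

If it turns out to be a single .gz file containing one big stream, gzip.open is the right tool, and there are no folders to navigate.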
[Tutor] unicode, utf-8 problem again
Hi! I'm processing a large number of XML files that are all declared as UTF-8 encoded in the XML header.

My Python environment has been set for 'utf-8' through site.py. Additionally, the top of each program/module has the declaration:

    # -*- coding: utf-8 -*-

But I still get this error:

    Traceback (most recent call last):
    ...
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 76: ordinal not in range(128)

What am I missing?

Dinesh
[Tutor] Fw: unicode, utf-8 problem again
I forgot to add that I'm using ElementTree to process the XML files and don't (usually) have any problems with that. Plus, the workaround that works is to encode each ElementTree output, i.e.:

    thisxmlline = thisxmlline.encode('utf8')

But this seems odd to me: isn't it already being processed as UTF-8?

Dinesh

From: Dinesh B Vadhia
Sent: Thursday, June 04, 2009 6:47 AM
To: tutor@python.org
Subject: unicode, utf-8 problem again

Hi! I'm processing a large number of xml files that are all declared as utf-8 encoded in the header ie.

My Python environment has been set for 'utf-8' through site.py. Additionally, the top of each program/module has the declaration:

    # -*- coding: utf-8 -*-

But, I still get this error:

    Traceback (most recent call last):
    ...
    UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 76: ordinal not in range(128)

What am I missing?

Dinesh
Re: [Tutor] unicode, utf-8 problem again
Okay, I get it now ... reading/writing files with the codecs module and the 'utf-8' option fixes it. Thanks!

From: Christian Witts
Sent: Thursday, June 04, 2009 7:05 AM
To: Dinesh B Vadhia
Cc: tutor@python.org
Subject: Re: [Tutor] unicode, utf-8 problem again

Dinesh B Vadhia wrote:
> Hi! I'm processing a large number of xml files that are all declared
> as utf-8 encoded in the header ie.
>
> My Python environment has been set for 'utf-8' through site.py.
> Additionally, the top of each program/module has the declaration:
>
> # -*- coding: utf-8 -*-
>
> But, I still get this error:
>
> Traceback (most recent call last):
> ...
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in
> position 76: ordinal not in range(128)
>
> What am I missing?
>
> Dinesh

Hi,

Take a read through http://evanjones.ca/python-utf8.html, which will give you insight into how you should be reading and processing your files.

As for the encoding line "# -*- coding: utf-8 -*-", that actually declares the character encoding of your script, not of the data it will be working with.

-- 
Kind Regards,
Christian Witts
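The error comes from the output side. Some code (a print, a write to a plain file, or str() in Python 2) encodes a Unicode string with the default ASCII codec, and u'\u201c', a left curly quote, has no ASCII form. ElementTree hands back Unicode text, so UTF-8 has to be named again wherever that text leaves the program.

The Python 3 sketch below reproduces the error and then fixes it. In Python 2, codecs.open or io.open with encoding='utf-8' plays the role of open()'s encoding argument.

```python
import os
import tempfile

text = "He said \u201chello\u201d"  # contains U+201C / U+201D curly quotes

# Encoding with ASCII reproduces the error from the post.
try:
    text.encode("ascii")
except UnicodeEncodeError as err:
    error_message = str(err)
print(error_message)

# Naming UTF-8 at the output boundary fixes it; the coding: line in the
# source file plays no part here.
path = os.path.join(tempfile.mkdtemp(), "out.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write(text)
with open(path, "rb") as f:
    raw = f.read()
```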
Re: [Tutor] unicode, utf-8 problem again
That was very useful, thanks! Hopefully I'm "all Unicode" now.

From: wesley chun
Sent: Thursday, June 04, 2009 10:45 AM
To: Dinesh B Vadhia ; tutor@python.org
Subject: Re: [Tutor] unicode, utf-8 problem again

>> But, I still get this error:
>> Traceback (most recent call last):
>> ...
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in
>> position 76: ordinal not in range(128)
>> What am I missing?
>
> Take a read through http://evanjones.ca/python-utf8.html which will give you
> insight as to how you should be reading and processing your files.

in a similar vein, i wrote a shorter blog post a while ago that focuses specifically on string processing: http://wesc.livejournal.com/1743.html ... in it, i also describe the correct way of thinking about strings in these contexts: the difference between a string that represents data vs. a "string" which is made up of various bytes, as in binary files.

hope this helps!
-- 
wesley
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
"Core Python Programming", Prentice Hall, (c)2007,2001
"Python Fundamentals", Prentice Hall, (c)2009
http://corepython.com

wesley.j.chun :: wescpy-at-gmail.com
python training and technical consulting
cyberweb.consulting : silicon valley, ca
http://cyberwebconsulting.com
[Tutor] string pickling and sqlite blob'ing
I want to pickle (very long) strings and save them in an SQLite db. The plan is to use pickle dumps() to turn a string into a pickle object and store it in SQLite. After reading the string back from the SQLite db, use pickle loads() to turn it back into the original string.

- Is this a good approach for storing very long strings?
- Are the pickled strings stored in the SQLite db as a STRING or BLOB?

Cheers.
Dinesh
Re: [Tutor] string pickling and sqlite blob'ing
Hi Vince

That's terrific! Once a string is compressed with gzip.zlib, does it make a difference whether it is stored in a TEXT or BLOB column?

Dinesh

From: vince spicer
Sent: Wednesday, June 24, 2009 10:49 AM
To: Dinesh B Vadhia
Cc: tutor@python.org
Subject: Re: [Tutor] string pickling and sqlite blob'ing

Pickle is more for storing complex objects (arrays, dict, etc.). Pickling a string makes it bigger.

I have stored large text chunks in text and/or blob columns, compressed with gzip.zlib.compress and extracted with gzip.zlib.decompress.

Comparison:

    import cPickle as Pickle
    import gzip

    x = "asdfasdfasdfasdfasdfasdfasdfasdfasdf"

    print len(x)
    >> 36

    print len(Pickle.dumps(x))
    >> 44

    print len(gzip.zlib.compress(x))
    >> 14

Vince

On Wed, Jun 24, 2009 at 11:17 AM, Dinesh B Vadhia wrote:
> I want to pickle (very long) strings and save them in a sqlite db. The plan
> is to use pickle dumps() to turn a string into a pickle object and store it
> in sqlite. After reading the string back from the sqlite db, use pickle
> loads() to turn back into original string.
>
> - Is this a good approach for storing very long strings?
>
> - Are the pickle'd strings stored in the sqlite db as a STRING or BLOB?
>
> Cheers.
>
> Dinesh
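Compressed data is arbitrary bytes, not text, so a BLOB column is the natural fit. A TEXT column is meant for valid text, and compressed bytes generally are not valid UTF-8; that is one plausible route to a "Could not decode to UTF-8" error like the one later in this thread.

The Python 3 sketch below uses the zlib module directly. gzip.zlib only worked because Python 2's gzip module happened to import zlib. The table and column names are made up.

```python
import sqlite3
import zlib

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body BLOB)")


def store(conn, text):
    # Encode to UTF-8 bytes first, then compress; bytes parameters are stored as BLOBs.
    blob = zlib.compress(text.encode("utf-8"))
    return conn.execute("INSERT INTO docs (body) VALUES (?)", (blob,)).lastrowid


def fetch(conn, doc_id):
    (blob,) = conn.execute("SELECT body FROM docs WHERE id = ?", (doc_id,)).fetchone()
    return zlib.decompress(blob).decode("utf-8")


original = "asdf" * 10000
doc_id = store(conn, original)
text = fetch(conn, doc_id)
stored_type, stored_size = conn.execute(
    "SELECT typeof(body), length(body) FROM docs WHERE id = ?", (doc_id,)
).fetchone()
print(stored_type, stored_size, len(original))
```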
Re: [Tutor] string pickling and sqlite blob'ing
Alan
On a machine with 6 GB of RAM, storing very long strings in sqlite caused a "sqlite3.OperationalError: Could not decode to UTF-8 column 'j' with text" error, which has been resolved. That fix then caused a memory error when reading some of the strings back from the db. Hence, I'm trying to work out what the problem is and looking for alternative solutions. It is strange that I can insert a long string into sqlite but get a memory error when selecting it. Splitting the strings into smaller chunks is the obvious solution, but I need to sort out the above first, since the post-processing after the select is on the entire string.
Dinesh

From: "Alan Gauld"
Date: Thu, 25 Jun 2009 00:44:22 +0100
To: tutor@python.org
Subject: Re: [Tutor] string pickling and sqlite blob'ing

"Dinesh B Vadhia" wrote
> I want to pickle (very long) strings and save them in a sqlite db.

Why? Why not just store the string in the database? If that turns out to be a problem, then think about other options, like splitting it into chunks, say? But until you know you have a problem, don't try to solve it!

> - Is this a good approach for storing very long strings?

Probably not.

> - Are the pickle'd strings stored in the sqlite db as a STRING or BLOB?

They could be stored either way; that's up to how you define your tables and write your SQL. In general I expect databases to handle very large quantities of data either as blobs or as references to a file. Is this a valid approach? Write the long string (assuming it's many MB in size) into a text file and store that with a unique name. Then store the filename in the database. But first check that you can't store it in the database directly or in chunks.
-- 
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/
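Alan's chunking suggestion can be sketched as follows, assuming a hypothetical chunks table keyed by (doc_id, seq). Each piece is a separate row, so a caller can iterate over the chunks without holding the whole string, and only join them when the full value is genuinely needed. The chunk size is an arbitrary, tunable choice.

```python
import sqlite3

CHUNK_SIZE = 1 << 20  # characters per row; an arbitrary default

def store_chunked(conn, doc_id, text, chunk_size=CHUNK_SIZE):
    # Split the text into fixed-size pieces, each stored with its sequence number.
    rows = ((doc_id, seq, text[start:start + chunk_size])
            for seq, start in enumerate(range(0, len(text), chunk_size)))
    conn.executemany("INSERT INTO chunks (doc_id, seq, body) VALUES (?, ?, ?)", rows)

def iter_chunks(conn, doc_id):
    # Stream the pieces back in order so they can be processed one at a time.
    cur = conn.execute("SELECT body FROM chunks WHERE doc_id = ? ORDER BY seq", (doc_id,))
    for (body,) in cur:
        yield body

def load_chunked(conn, doc_id):
    # Reassemble the full string only when the whole value is really needed.
    return "".join(iter_chunks(conn, doc_id))

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE chunks (doc_id TEXT, seq INTEGER, body TEXT, PRIMARY KEY (doc_id, seq))"
)
text = "abc" * 10                      # 30 characters
store_chunked(conn, "doc1", text, chunk_size=7)
print(load_chunked(conn, "doc1") == text)   # True
```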
[Tutor] array and int
Say you create an array('i') for signed integers (which take a minimum of 2 bytes). A calculation results in an integer that is larger than the range of an 'i'. Normally, Python will convert an 'i' to a 4-byte 'l' integer. But does the same apply to an array, i.e. does Python dynamically adjust from array('i') to array('l')?
Before anyone suggests it, I would be using NumPy for arrays, but there isn't a 64-bit version available under Windows that works.
Dinesh
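As far as the standard array module goes, the answer is no: an array's typecode is fixed when it is created, and storing a value outside its range raises OverflowError rather than widening the array. (Plain Python integers are what grow automatically.) A small Python 3 sketch; the 'q' typecode used for the wider array is available from Python 3.3.

```python
from array import array

a = array("i", [1, 2, 3])
print(a.itemsize)          # bytes per item for a C int: at least 2, commonly 4

try:
    a.append(2 ** 40)      # far outside the range of a C int
except OverflowError as exc:
    print("OverflowError:", exc)

print(a.typecode, len(a))  # still 'i' with 3 items: the array was not widened

# Pick a wide enough typecode up front instead. 'q' is a signed long long
# (at least 8 bytes).
b = array("q", a.tolist())
b.append(2 ** 40)
print(b[-1])               # 1099511627776
```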
[Tutor] list comprehension problem
I'm suffering from brain failure (or, most likely, just being brainless!) and need help to create a list comprehension for this problem.
d is a list of integers:
d = [0, 8, 4, 4, 4, 7, 2, 5, 1, 1, 5, 11, 11, 1, 6, 3, 5, 6, 11, 1]
I want to create a new list that adds the current number to the prior number, where the prior number is the accumulation of the previous numbers, i.e.
dd = [0, 8, 12, 16, 20, 27, 29, 34, 35, 36, 41, 52, 63, 64, 70, 73, 78, 84, 95, 96]
A brute-force solution that works is:
>>> dd = []
>>> y = 0
>>> for x in d:
...     y += x
...     dd.append(y)
Is there a list comprehension solution?
Dinesh
Re: [Tutor] list comprehension problem
d = [0, 8, 4, 4, 4, 7, 2, 5, 1, 1, 5, 11, 11, 1, 6, 3, 5, 6, 11, 1]
and we want:
[0, 8, 12, 16, 20, 27, 29, 34, 35, 36, 41, 52, 63, 64, 70, 73, 78, 84, 95, 96]
dd = [ sum(d[:j]) for j in range(len(d)) ][1:]
gives:
[0, 8, 12, 16, 20, 27, 29, 34, 35, 36, 41, 52, 63, 64, 70, 73, 78, 84, 95]
i.e. the final total, 96, is missing.
Dinesh

From: Emile van Sebille
Date: Fri, 03 Jul 2009 12:22:30 -0700
To: tutor@python.org
Subject: Re: [Tutor] list comprehension problem

On 7/3/2009 12:09 PM Dinesh B Vadhia said...
> Want to create a new list that adds the current number and the prior
> number, where the prior number is the accumulation of the previous
> numbers ie.

[ sum(d[:j]) for j in range(len(d)) ][1:]

Emile
Re: [Tutor] list comprehension problem
Thanks Emile / Kent.
The problem I see with this solution is that at each stage it re-sums the slice d[:j] instead of keeping a running total, which the for-loop method does, i.e.
>>> dd = []
>>> y = 0
>>> for x in d:
...     y += x
...     dd.append(y)
As the lists of integers get larger (mine are in the thousands of integers per list), the list comprehension solution will get slower. Do you agree?
Dinesh

From: Kent Johnson
Sent: Friday, July 03, 2009 1:21 PM
To: Dinesh B Vadhia
Cc: tutor@python.org
Subject: Re: [Tutor] list comprehension problem

On Fri, Jul 3, 2009 at 3:49 PM, Dinesh B Vadhia wrote:
> dd = [ sum(d[:j]) for j in range(len(d)) ][1:]
>
> gives:
>
> [0, 8, 12, 16, 20, 27, 29, 34, 35, 36, 41, 52, 63, 64, 70, 73, 78, 84, 95]

In [9]: [ sum(d[:j+1]) for j in range(len(d)) ]
Out[9]: [0, 8, 12, 16, 20, 27, 29, 34, 35, 36, 41, 52, 63, 64, 70, 73, 78, 84, 95, 96]

Kent
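Dinesh's concern holds in principle: the comprehension re-sums a slice for every element, which is O(n^2) overall, while the loop does O(n) additions. Since Python 3.2, itertools.accumulate produces the running total directly. A short sketch comparing it with Kent's corrected comprehension:

```python
from itertools import accumulate

d = [0, 8, 4, 4, 4, 7, 2, 5, 1, 1, 5, 11, 11, 1, 6, 3, 5, 6, 11, 1]

# O(n): accumulate yields the running total, like the loop's y += x.
dd = list(accumulate(d))

# O(n**2): re-sums the slice d[:j+1] for every j, as in the comprehension.
dd_slow = [sum(d[:j + 1]) for j in range(len(d))]

print(dd == dd_slow)  # True
print(dd[-1])         # 96
```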
[Tutor] large strings and garbage collection
This was discussed in a previous post, but I didn't see a solution. Say you have:
for i in veryLongListOfStringValues:
    s += i
As per a previous post (http://thread.gmane.org/gmane.comp.python.tutor/54029/focus=54139), quoting verbatim: "... the following happens inside the python interpreter:
1. get a reference to the current value of s.
2. get a reference to the string value i.
3. compute the new value += i, store it in memory, and make a reference to it.
4. drop the old reference of s (thus free-ing "abc")
5. give s a reference to the newly computed value.
After step 3 and before step 4, the old value of s is still referenced by s, and the new value is referenced internally (so step 5 can be performed). In other words, both the old and the new value are in memory at the same time after step 3 and before step 4, and both are referenced (that is, they cannot be garbage collected). ..."
As s gets very large, how do you deal with this situation to avoid a memory error, or what I think will be a general slowing down of the system if the for-loop is repeated a large number of times?
Dinesh
Re: [Tutor] large strings and garbage collection
Join with a generator expression is what was needed. Terrific!

From: Rich Lovely
Sent: Friday, July 17, 2009 4:19 PM
To: Dinesh B Vadhia
Cc: tutor@python.org
Subject: Re: [Tutor] large strings and garbage collection

2009/7/17 Dinesh B Vadhia:
> As s gets very large, how do you deal with this situation to avoid a memory
> error or what I think will be a general slowing down of the system if the
> for-loop is repeated a large number of times.

If all you are doing is concatenating a list of strings, use the str.join() method, which is designed for the job:
>>> listOfStrings
['And', 'now', 'for', 'something', 'completely', 'different.']
>>> print " ".join(listOfStrings)
And now for something completely different.
>>> print "_".join(listOfStrings)
And_now_for_something_completely_different.
If you need to perform other operations first, you can pass a generator expression as the argument, for example:
>>> " ".join((s.upper() if n%2 else s.lower()) for n, s in
...          enumerate(listOfStrings))
'and NOW for SOMETHING completely DIFFERENT.'
Hope that helps you.
-- 
Rich "Roadie Rich" Lovely
There are 10 types of people in the world: those who know binary, those who do not, and those who are off by one.
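A Python 3 sketch of the patterns discussed: join a list (or generator) once instead of repeated +=, or write into an io.StringIO buffer when the pieces arrive one at a time. CPython can sometimes optimize s += i in place, but that is an implementation detail, so join remains the portable choice.

```python
import io

words = ["And", "now", "for", "something", "completely", "different."]

# str.join computes the total length once and copies each piece a single time.
joined = " ".join(words)

# If pieces are produced incrementally, collect them in a list and join at the end.
parts = []
for n, w in enumerate(words):
    parts.append(w.upper() if n % 2 else w.lower())
alternating = " ".join(parts)

# io.StringIO is another buffer that avoids building a new string on every append.
buf = io.StringIO()
for w in words:
    buf.write(w)
concatenated = buf.getvalue()

print(joined)        # And now for something completely different.
print(alternating)   # and NOW for SOMETHING completely DIFFERENT.
print(concatenated)  # Andnowforsomethingcompletelydifferent.
```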
[Tutor] python interpreter vs bat file
During recent program testing, I ran a few Python programs from a Windows XP batch file, and one of the programs failed with a memory error. If I run the same set of programs from the Python interpreter, no memory error occurs. Any idea why this might be?
Dinesh
Re: [Tutor] python interpreter vs bat file
Not much more information is available. I have a batch file (e.g. 'test.bat') with entries:
python "program a.py"
python "program b.py"
python "program c.py"
python "program e.py"
...
One of the programs (e.g. 'program c.py') fails with a memory error when performing a pickle.dump:
Traceback (most recent call last):
...
  File "py", line 176, in pickleObject
    pickle.dump(self, f, 2)
MemoryError
When the programs are run in the same order from the Python interpreter, there are no memory errors. This has happened before, and it seems odd behavior.
Dinesh

From: Jeff Johnson
Sent: Saturday, July 18, 2009 3:24 PM
To: Dinesh B Vadhia
Cc: tutor@python.org
Subject: Re: [Tutor] python interpreter vs bat file

Need more information. Python works on Windows as well as anything else. Maybe even better.

Dinesh B Vadhia wrote:
> During recent program testing, I ran a few Python programs from a
> Windows XP batch file which causes a memory error for one of the
> programs.

Jeff
Jeff Johnson
j...@dcsoftware.com
Phoenix Python User Group - sunpigg...@googlegroups.com
Re: [Tutor] python interpreter vs bat file
1. Run Python programs with a batch file
The Python programs run from a Windows XP batch file (test.bat) in a CMD window opened from Windows Explorer. All of the programs execute successfully except one, which stops with a memory error; the batch file then continues to execute the other Python programs (as it should).
2. Run Python programs with the Python interpreter
Fire up the Python interpreter, open the .py program, Run.
When the program that fails in 1. is run on its own as in 2., it works.
Dinesh

From: "Alan Gauld"
Date: Sun, 19 Jul 2009 07:18:08 +0100
To: tutor@python.org
Subject: Re: [Tutor] python interpreter vs bat file

"Dinesh B Vadhia" wrote
> When the programs are run in the same order from the
> Python interpreter there are no memory errors.

Can you elaborate on how you run the programs? It looks like an environmental issue, so we need to know exactly what you are doing. How do you run the bat file? How do you run the programs "from the Python interpreter"? Are you using Windows Explorer or a CMD window? Or the Start->Run dialog, etc.? Which folders are you starting from in each case?

> This has happened before and it seems odd behavior.

So how did you fix it before? I've never seen or heard of this before.
-- 
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/
Re: [Tutor] python interpreter vs bat file
Hi Dave
Sorry, I didn't mean to be obtuse. Here is more info:
1. Run Python programs with a batch file
- OS (correction): Windows Vista SP2, 64-bit
- Python 2.5.4, 64-bit (AMD64)
- The Python programs run from a Windows batch file (test.bat) in a CMD window opened from Windows Explorer. All of the programs execute successfully except one, which stops with a memory error, but the batch file continues to execute the other Python programs (as it should).
2. Run Python programs with the Python interpreter
- Start IDLE, File/Open the .py program, Run/Run Module.
- When the program that fails in 1. is run on its own with IDLE, it works.
Bob Gailer suggested running the Python programs individually in CMD, one after the other. This is sensible, but my test programs run for days and the full suite of programs takes longer. The programs are memory intensive (the 64-bit machine has 8 GB of RAM). Hence, it is not easy to test this scenario right now.
It seems to me as if Windows is not freeing up memory between Python invocations in the batch file, but I can't be sure. I said earlier that this has happened before; the fix, as now, is to run the program individually with IDLE.
HTH,
Dinesh

From: Dave Kuhlman
Date: Sun, 19 Jul 2009 11:56:15 -0700
To: tutor@python.org
Subject: Re: [Tutor] python interpreter vs bat file

On Sun, Jul 19, 2009 at 05:40:41AM -0700, Dinesh B Vadhia wrote:
>2. Run Python Programs with Python Interpreter
>
>Fire up Python Interpreter, open .py program, Run.

Dinesh, please tell us how you did this. Did you type "python" at a command prompt and then see the ">>>" prompt? If so, how did you "open .py program, Run"? Or did you start IDLE (or some other IDE), then click File-->Open, then run with the Run-->Run Module menu item?
You have been asked several times for more information. You really need to read: http://catb.org/~esr/faqs/smart-questions.html
There are people on this list who are very generous with their time. It's a valuable resource. Please don't waste it. I don't mean to be rude. But you will help us all, yourself included, if you think carefully when asking a question.
- Dave
Re: [Tutor] python interpreter vs bat file
Running time in CMD and IDLE
- Program running time is about the same in both CMD and IDLE.
- The programs take a long time to run NOT because of runaway processes that are using up memory.
- The programs are being optimized with each successive generation to reduce resources and time, but the limitations boil down to Python for-loops (within functions) and sorts (probably the subject of another note to Tutor).
IDLE masking program errors
- Could be, but ...
- The programs work under IDLE and return the correct results.
- At this point I decided to run the programs from a batch file.
Batch file method
- Except for one program, all the other programs work using the batch file method.
- The program with the error is run under IDLE, its output is combined at the end with the output of the batch file programs, and correct results are returned.
Program memory use
- The program with the memory error uses a lot of memory, but its data structures should fit into available memory, as they do when it is run with IDLE.
Use of DOS Start command
- I'll try out the /I, /B and /WAIT options in the next run and will let you know what happens.
Thanks.
Dinesh

From: "Alan Gauld"
Date: Sun, 19 Jul 2009 23:22:47 +0100
To: tutor@python.org
Subject: Re: [Tutor] python interpreter vs bat file

"Dinesh B Vadhia" wrote
> Bob Gailer suggested running the Python programs individually
> in CMD one after the other. This is sensible but my test programs
> run for days and the full suite of programs take longer.

OK, but it can't take longer than in IDLE? Or even in the bat file. So you can start the program running and then iconify it. The reason this is important is that IDLE catches some errors that the normal Python interpreter does not. So IDLE may be masking a real problem in your code. However...

> The programs are memory intensive (the 64-bit machine
> has 8gb ram). Hence, it is not easy to test this scenario
> right now.

Have you checked in Task Manager how much RAM the Python programs use up? They should be visible in the Processes tab. If it is a lot, then maybe we can rewrite the code to use less memory (or maybe leak less memory).

> It seems to me as if Windows is not freeing up memory
> between Python invocations in the batch file but can't be
> sure.

Windows should free up the memory, but it might depend on how you run the programs. In your earlier post you said the bat file contained lines like
python foo.py
python bar.py
You could try using the start command instead, as in:
start foo.py
You might want to explore the /I, /B and /WAIT options; start gives you a lot more control over the execution environment. Notice you don't need the 'python', because start uses the file association.
HTH,
-- 
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/

From: "Alan Gauld"
Date: Sun, 19 Jul 2009 23:36:03 +0100
To: tutor@python.org
Subject: Re: [Tutor] hitting a wall (not a collision detection question :P)

"Michael" wrote
> ...everything up to functions vs. methods and the basics of classes
> and OOP. This is where I'm hitting a wall. It's at this point that all the
> books go off in different directions

OK, first thing is don't worry about it, you are far from alone. Many, many programmers (even long-term pros) find the transition from functions to objects really hard to adjust to. Not surprising, since it does require a new way of thinking about program structure. Eventually the OOP way will become second nature; in fact you might even find it hard to think about ordinary functions after a while! But it can take a while.

> and I'm not sure a) what I'm learning, b) why I'm learning it,
> and c) how this is going to help me get to my goals.

It might be good to throw us some specific questions and we can try to answer them. General questions tend to produce vague answers! You can try my tutorial on OOP to see if that helps. Follow it up with the case study to see OOP in action.

> I'm not really even understanding much of what these books
> are talking about at this point anyway.

Again, anything you are unsure about, tell us and we can try to explain. That is what this list is really good at, because there are people with many different perspectives who have all gone through the same learning curve. Someone likely has the same way of thinking about it as you do!

> It's like a few chapters after "Classes and OOP" were torn out of all of
> them.

:-)

> So, I'm just wondering what
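On the batch-file question earlier in this message: each python line in test.bat already starts a separate process, and, as Alan notes, Windows should free a process's memory when it exits, so the cause of the MemoryError is not settled here. If it helps to experiment, a minimal Python 3.5+ driver can run the same programs one after another and record their exit codes. It is a sketch, not a diagnosis; the helper names are hypothetical.

```python
import subprocess
import sys

def python_command(script):
    # sys.executable is the interpreter running this driver, so each child uses
    # that same interpreter rather than whichever "python" is first on PATH.
    return [sys.executable, script]

def run_in_sequence(commands):
    """Run each command as a separate child process, one after another.

    Each child exits before the next one starts. Returns the list of exit
    codes, in order, so a failing program can be spotted afterwards.
    """
    codes = []
    for cmd in commands:
        completed = subprocess.run(cmd)   # blocks until the child finishes
        codes.append(completed.returncode)
    return codes
```

Usage with the hypothetical script names from the batch file: run_in_sequence([python_command("program a.py"), python_command("program b.py"), python_command("program c.py")]).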
[Tutor] Inverted Index
Hello! Does anyone know of any example/cookbook code for implementing inverted indexes?
Cheers
Dinesh
Re: [Tutor] Inverted Index
Sure! To create an inverted index of a very large matrix (M x N, with M != N and M > 10m rows). Most times the matrix will be sparse, but sometimes it won't be. Most times the matrix will consist of 0's and 1's, but sometimes it won't.
Hope that helps.
Dinesh

From: Kent Johnson
Sent: Wednesday, October 31, 2007 7:48 AM
To: Dinesh B Vadhia
Cc: tutor@python.org
Subject: Re: [Tutor] Inverted Index

Dinesh B Vadhia wrote:
> Hello! Anyone know of any example/cookbook code for implementing
> inverted indexes?

Can you say more about what you are trying to do?
Maybe PyLucene is interesting:
http://mail.python.org/pipermail/tutor/2006-April/046116.html
http://pylucene.osafoundation.org/
Kent
Re: [Tutor] Inverted Index
A NumPy matrix (because we have to perform a matrix dot-product multiplication before creating the inverted index).
Thank-you!

From: Kent Johnson
Sent: Wednesday, October 31, 2007 8:16 AM
To: Dinesh B Vadhia
Cc: tutor@python.org
Subject: Re: [Tutor] Inverted Index

Dinesh B Vadhia wrote:
> Sure! To create an inverted index of a very large matrix (M x N with
> M<>N and M>10m rows). Most times the matrix will be sparse but
> sometimes it won't be. Most times the matrix will consist of 0's and
> 1's but sometimes it won't.

How is the matrix represented? Is it in a numpy array? a dict? or...
Kent
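A minimal sketch of one common reading of "inverted index" for a matrix: map each column to the rows where it is non-zero, keeping the value so that 0/1 and weighted matrices are handled alike. The column-to-rows orientation is an assumption, and plain Python lists stand in for the NumPy matrix. For a very large sparse matrix, SciPy's scipy.sparse.csc_matrix already stores essentially this structure in its indptr and indices arrays, which may be far cheaper than a Python loop.

```python
from collections import defaultdict

def inverted_index(rows):
    """Map each column index to a list of (row index, value) pairs.

    `rows` is any iterable of equal-length sequences; only non-zero
    entries are indexed, so all-zero columns do not appear at all.
    """
    index = defaultdict(list)
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value:                      # skip zeros
                index[j].append((i, value))
    return dict(index)

M = [
    [1, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 0, 0, 2],
]
idx = inverted_index(M)
print(idx[2])    # [(0, 1), (1, 1)]
print(idx[3])    # [(1, 1), (2, 2)]
print(1 in idx)  # False: column 1 is all zeros
```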
[Tutor] dictionary append
Hello! I'm creating a dictionary called keywords that has multiple entries, each with a variable-length list of values, e.g.
keywords[1] = [1, 4, 6, 3]
keywords[2] = [67, 2]
keywords[3] = [2, 8, 5, 66, 3, 23]
etc. The keys and respective values (both are integers) are read in from a file. For each key, the values are appended until the next key. Here is the code:
>>> keywords = {}
>>> with open("x.txt", "r") as f:
        k = 0
        for line in f.readlines():
            keywords[k], second = map(int, line.split())
            keywords[k].append(second)
            if keywords[k] != k:
                k = k + 1
Traceback (most recent call last):
  File "", line 5, in
    keywords[k].append(second)
AttributeError: 'int' object has no attribute 'append'
Any idea why I get this error?
Dinesh
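The error comes from the first line of the loop body: keywords[k], second = map(int, line.split()) stores an int in keywords[k], and the next line then calls .append on that int. A sketch that unpacks into plain names and lets collections.defaultdict(list) create each list on demand. It assumes each line of x.txt holds one "key value" pair of integers, which is a guess about the file format.

```python
import io
from collections import defaultdict

def read_keywords(lines):
    """Group "key value" integer pairs into {key: [value, ...]}."""
    keywords = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue                      # tolerate blank lines
        # Unpack into plain names first; assigning straight into keywords[k]
        # is what replaced the list with an int in the original loop.
        key, value = map(int, line.split())
        keywords[key].append(value)
    return dict(keywords)

sample = io.StringIO("1 1\n1 4\n1 6\n1 3\n2 67\n2 2\n")
print(read_keywords(sample))   # {1: [1, 4, 6, 3], 2: [67, 2]}
```

With the real file, the call would be: with open("x.txt") as f: keywords = read_keywords(f).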
[Tutor] Elegant argument index sort
I'm sorting a 1-d (NumPy) matrix array (a) and want the index results (b). This is what I have:
b = a.argsort(0)
b = b+1
The one (1) is added to b so that there isn't a zero index element. Is there a more elegant way to do this?
Dinesh
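In NumPy the two lines collapse into one expression, b = a.argsort(0) + 1. To make the intent concrete without NumPy, here is a pure-Python sketch of a 1-based argsort. Note that sorted() is stable, whereas NumPy's default sort kind is not guaranteed to be; recent NumPy releases accept kind='stable' in argsort if tie order matters.

```python
def argsort_1based(values):
    """Return the 1-based positions that would sort `values`.

    sorted() is stable, so tied values keep their original order.
    """
    return [i + 1 for i in sorted(range(len(values)), key=values.__getitem__)]

a = [30, 10, 20, 10]
print(argsort_1based(a))   # [2, 4, 3, 1]
```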
[Tutor] From Numpy Import *
Hello! The standard Python practice for importing modules is, for example:
import sys
import os
etc.
In NumPy (and SciPy) the 'book' suggests using:
from numpy import *
from scipy import *
However, when I instead use 'import numpy', it causes all sorts of errors in my existing code. What do you suggest?
Dinesh
Re: [Tutor] From Numpy Import *
Thank-you! It is important for us to avoid potential code conflicts, so we'll standardize on the 'import numpy' syntax.
On a related note: we are using both NumPy and SciPy. Consider the example y = Ax, where A is a sparse matrix. If A is a SciPy object, do y and x also have to be SciPy objects, or can they be NumPy objects?
Dinesh

From: Michael H. Goldwasser
Sent: Wednesday, November 07, 2007 5:37 PM
To: Dinesh B Vadhia
Cc: tutor@python.org
Subject: [Tutor] From Numpy Import *

On Wednesday November 7, 2007, Dinesh B Vadhia wrote:
> However, when I instead use 'import numpy' it causes all sorts of errors
> in my existing code.

The issue is the following. The numpy module includes many definitions, for example one named array. When you use the syntax
from numpy import *
that takes all definitions from the module and places them into your current namespace. At this point, it would be fine to use a command such as
values = array([1.0, 2.0, 3.0])
which instantiates a (numpy) array. If you instead use the syntax
import numpy
this brings the module as a whole into your namespace, but to access definitions from that module you have to give a qualified name, for example
values = numpy.array([1.0, 2.0, 3.0])
You cannot simply use the word array as in the first scenario. This would explain why your existing code no longer works after the change.

> What do you suggest?

The advantage of the "from numpy import *" syntax is mostly convenience. However, the better style is "import numpy", precisely because it does not automatically introduce many other definitions into your current namespace. If you were using some other package that also defined an "array" and you then used "from numpy import *", the new definition would override the other one. The use of qualified names helps to avoid these collisions and makes clear where those definitions come from.
With regards,
Michael
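The collision Michael describes is easy to reproduce without NumPy: the standard-library modules math and cmath both define sqrt, and a second wildcard import silently replaces the first. A short sketch (for NumPy itself, the widely used qualified form is import numpy as np, then np.array(...)):

```python
import math            # qualified access: math.sqrt always means the real-valued version

from math import *     # puts math's names (sqrt, pi, ...) into this namespace
from cmath import *    # cmath also defines sqrt and silently replaces the one above

print(sqrt(-1))        # 1j: the unqualified name now refers to cmath.sqrt
print(math.sqrt(4))    # 2.0: the qualified name is unaffected

try:
    math.sqrt(-1)      # the real-valued version rejects negative input
except ValueError as exc:
    print("math.sqrt(-1) ->", exc)
```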
[Tutor] global is bad but ...
Consider a data structure (say, an array) that is operated on by a bunch of functions, e.g.
def function_A():
    global array_G
    # do stuff with array_G
    return
def function_B():
    global array_G
    # do stuff with array_G
    return
def function_C():
    global array_G
    # do stuff with array_G
    return
The described way is to place the 'global' statement in line 1 of each function. On the other hand, wiser heads say that the use of 'global' is bad and that reworking the code into classes and objects is better.
What do you think and suggest?
Dinesh
Re: [Tutor] global is bad but ...
Alan/Jim:
It's good to hear some pragmatic advice.
This particular module has 8 small functions that share common data (structures, primarily arrays and vectors). I tried passing array_G as a parameter, but that doesn't work because everything in the function remains local and I cannot get back the altered data (unless you know better?). The 'global' route works a treat so far.
Dinesh

From: "Alan Gauld"
Date: Tue, 13 Nov 2007 23:11:49 -
To: tutor@python.org
Subject: Re: [Tutor] global is bad but ...

"Dinesh B Vadhia" wrote
> Consider a data structure (say, an array) that is operated
> on by a bunch of functions eg.
>
> def function_A
> global array_G
> def function_B
> global array_G
> etc...
> On the other hand, wiser heads say that the use of 'global'
> is bad and that reworking the code into classes and objects
> is better.

Rather than answer your question directly, can I ask: do you know *why* wiser heads say global is bad? What problems does using global introduce? What problems does it solve?

> What do you think and suggest?

I think it's better to understand issues and make informed choices rather than following the rules of others. I suggest you consider whether global is bad in this case and what other solutions might be used instead. Then make an informed choice. If, having researched the subject, you don't understand why global is (sometimes) bad, ask for more info here.
HTH (a little),
-- 
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld
Re: [Tutor] global is bad but ... okay
Kent et al
I reworked the code to pass parameters (mainly arrays) to the functions. It works and performs faster. Thank-you all very much for the insights.
Dinesh

From: Kent Johnson
Sent: Wednesday, November 14, 2007 4:53 AM
To: Dinesh B Vadhia
Cc: tutor@python.org
Subject: Re: [Tutor] global is bad but ...

Dinesh B Vadhia wrote:
> This particular module has 8 small functions that share common data
> (structures, primarily in arrays and vectors). I tried passing array_G
> as a parameter but that doesn't work because everything in the function
> remains local and I cannot get back the altered data (unless you know
> better?).

That sounds like a good candidate for a class with array_G as an instance attribute and your 8 small functions as methods.
If you pass the array as a parameter, you can change the passed parameter in place and the changes will be seen by other clients. Re-assigning the parameter will have only a local effect. For example:
This function mutates the list passed in, so changes are visible externally:
In [23]: def in_place(lst):
   ....:     lst[0] = 1
In [24]: a = [3,4,5]
In [25]: in_place(a)
In [26]: a
Out[26]: [1, 4, 5]
This function assigns a new value to the local name, so changes are not visible externally:
In [27]: def reassign(lst):
   ....:     lst = []
In [28]: reassign(a)
In [29]: a
Out[29]: [1, 4, 5]
This function replaces the contents of the list with a new list. This is a mutating function, so the changes are visible externally:
In [30]: def replace(lst):
   ....:     lst[:] = [1,2,3]
In [31]: replace(a)
In [32]: a
Out[32]: [1, 2, 3]

> The 'global' route works a treat so far.

Yes, globals work and they appear to be a simple solution; that is why they are used at all! They also
- increase coupling
- hinder testing and reuse
- obscure the relationship between pieces of code
which leads experienced developers to conclude that in general globals are a bad idea and should be strenuously avoided.
Kent
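Kent's first suggestion, the shared array as an instance attribute with the functions as methods, might look like this minimal sketch. The class and method names are hypothetical stand-ins for function_A/B/C, and a plain list stands in for the NumPy array; both mutating methods change self.data in place, which is why no global statement is needed.

```python
class ArrayProcessor:
    """Holds the shared array as an attribute instead of a module-level global."""

    def __init__(self, data):
        self.data = data

    def scale(self, factor):
        # Mutates the shared state element by element.
        for i, x in enumerate(self.data):
            self.data[i] = x * factor

    def shift(self, offset):
        # Slice assignment replaces the contents but keeps the same list object.
        self.data[:] = [x + offset for x in self.data]

    def total(self):
        return sum(self.data)


proc = ArrayProcessor([1, 2, 3])
proc.scale(10)
proc.shift(1)
print(proc.data)     # [11, 21, 31]
print(proc.total())  # 63
```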
[Tutor] Web programming
Hi! I want to create (for testing purposes) a straightforward web application consisting of a client that makes simple queries to a backend, which returns data from a database (initially pysqlite3). That's it, really! I don't need a professional web server (e.g. Apache) per se. Are the Python urlparse, urllib, urllib2, httplib, BaseHTTPServer, SimpleHTTPServer, etc. modules sufficient for the task? The number of queries per second will initially be low, in the tens per second.
Dinesh
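For a low-traffic test setup, the standard library is probably enough. In Python 3 the modules listed were reorganized: BaseHTTPServer became http.server, and the urllib2/urlparse functionality lives in urllib.request and urllib.parse. A minimal sketch of a JSON query backend over sqlite3; the items table, its rows, and the /items?name=... URL scheme are made-up examples, and http.server is intended for testing rather than production.

```python
import json
import sqlite3
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, quote, urlparse
from urllib.request import urlopen

# In-memory demo database (hypothetical table and rows). HTTPServer handles one
# request at a time in its own thread, hence check_same_thread=False.
DB = sqlite3.connect(":memory:", check_same_thread=False)
DB.execute("CREATE TABLE items (name TEXT, price REAL)")
DB.executemany("INSERT INTO items VALUES (?, ?)", [("apple", 0.5), ("pear", 0.75)])

class QueryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # e.g. GET /items?name=apple
        params = parse_qs(urlparse(self.path).query)
        name = params.get("name", [""])[0]
        rows = DB.execute("SELECT name, price FROM items WHERE name = ?", (name,)).fetchall()
        body = json.dumps(rows).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the demo quiet

def start_server(port=0):
    # port=0 asks the OS for a free port; server.server_address reports it.
    server = HTTPServer(("127.0.0.1", port), QueryHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server

def query(server, name):
    # The client side: a plain HTTP GET, decoded from JSON.
    host, port = server.server_address
    with urlopen(f"http://{host}:{port}/items?name={quote(name)}") as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: server = start_server(); query(server, "apple") returns [["apple", 0.5]] (JSON turns each row tuple into a list); call server.shutdown() when done.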
[Tutor] error binding parameter 1
Hello! Can anyone see what the problem with this code snippet is?
Dinesh
image_filename = str(dir_list[i])
image_file = dir_path + image_filename
image_blob = open(image_file, 'rb')
[L40] cursor.execute("Insert into image_table values (?, ?)", (image_filename, image_blob))
Traceback (most recent call last):
  File "C:\storage management.py", line 40, in
    cursor.execute("Insert into image_table values (?, ?)", (image_filename, image_blob))
InterfaceError: Error binding parameter 1 - probably unsupported type.
Re: [Tutor] error binding parameter 1
Yes, it should be:

    image_blob = open(image_file, 'rb').read()

Thank you!

- Original Message - From: bob gailer To: Dinesh B Vadhia Cc: tutor@python.org Sent: Saturday, November 24, 2007 5:55 PM Subject: Re: [Tutor] error binding parameter 1

Dinesh B Vadhia wrote:
> Hello! Can anyone see what the problem with this code snippet is?
>
> Dinesh
>
>     image_filename = str(dir_list[i])
>     image_file = dir_path + image_filename
>     image_blob = open(image_file, 'rb')

Should that be image_blob = open(image_file, 'rb').read()?

>     [L40] cursor.execute("Insert into image_table values (?, ?)", (image_filename, image_blob))
>
>     Traceback (most recent call last):
>       File "C:\storage management.py", line 40, in
>         cursor.execute("Insert into image_table values (?, ?)", (image_filename, image_blob))
>     InterfaceError: Error binding parameter 1 - probably unsupported type.
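The fix works because sqlite3 can bind a byte string but not an open file object. A minimal Python 3 sketch of the same idea, assuming an image_table with a TEXT and a BLOB column (the store_image() helper and the throwaway file are hypothetical); in Python 2 the bytes were commonly wrapped in sqlite3.Binary() to mark them explicitly as a BLOB.

```python
import os
import sqlite3
import tempfile


def store_image(con, filename, path):
    """Read the file's bytes and store them in the BLOB column."""
    with open(path, "rb") as f:
        data = f.read()  # bytes, not the open file object
    con.execute("INSERT INTO image_table VALUES (?, ?)", (filename, data))
    con.commit()


con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE image_table (name TEXT, image BLOB)")

# Demo with a throwaway file standing in for a real JPEG.
fd, path = tempfile.mkstemp(suffix=".jpg")
with os.fdopen(fd, "wb") as f:
    f.write(b"\xff\xd8\xff fake jpeg bytes")
store_image(con, os.path.basename(path), path)
os.remove(path)

name, blob = con.execute("SELECT name, image FROM image_table").fetchone()
print(name.endswith(".jpg"), blob[:3])  # True b'\xff\xd8\xff'
```

Using a with block also closes the file promptly, which the one-liner open(...).read() leaves to the garbage collector.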
[Tutor] Displaying images on a web page
I want to display a fixed number of same-size (JPEG) images on a web page. The images displayed will change on user input. I can use PIL to write the code, but has anyone come across open source code that already does this? Thank you, Dinesh
[Tutor] A faster x in S
For some significant data pre-processing we have to perform the following simple process: is the integer x in a list of 13K sorted integers? That's it, except this has to be done more than 100 million times with different x's (multiple times). Yep, a real pain! I've put the 13K integers in a list S and am using the 'x in S' test. I was wondering if there is anything faster? Dinesh
Re: [Tutor] A faster x in S
I used the s.intersection(t) method of the set type as it was the most appropriate. The performance was phenomenal. Thank you! Dinesh

- Original Message - From: bob gailer To: Dinesh B Vadhia Cc: tutor@python.org Sent: Tuesday, January 15, 2008 2:03 PM Subject: Re: [Tutor] A faster x in S

Dinesh B Vadhia wrote:
> For some significant data pre-processing we have to perform the following simple process:
>
> Is the integer x in a list of 13K sorted integers. That's it except this has to be done >100m times with different x's (multiple times). Yep, a real pain!
>
> I've put the 13K integers in a list S and am using the is 'x in S' function.
>
> I was wondering if there is anything faster?

I agree with Kent.

    >>> l = range(13000)
    >>> s = set(l)
    >>> d = dict(enumerate(l))
    >>> import time
    >>> def f(lookupVal, times, values):
    ...     st = time.time()
    ...     for i in range(times):
    ...         z = lookupVal in values
    ...     return time.time() - st
    ...
    >>> f(6499, 1000, l)
    0.3126376037598
    >>> f(6499, 1000000, s)
    0.3123623962402

So set is 1000 times faster than list!

    >>> f(6499, 1000000, d)
    0.31300020217895508

And dict is (as expected) about the same as set. So 100,000,000 lookups should take about 30 seconds. Not bad, eh?

Let's explore another angle. What range are the integers in (min and max)?

Bob
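Bob's timings reflect that a set (or dict) uses hashing, so a membership test costs about the same regardless of size, whereas x in list scans element by element. Since the 13K integers are already sorted, a binary search with the bisect module is another option. A small Python 3 sketch comparing the two, using made-up even integers in place of the real data:

```python
import bisect

sorted_ints = list(range(0, 26000, 2))  # 13,000 sorted stand-in integers
lookup_set = set(sorted_ints)           # hash-based: average O(1) membership


def in_sorted(seq, x):
    """Binary-search membership test on a sorted list: O(log n)."""
    i = bisect.bisect_left(seq, x)
    return i < len(seq) and seq[i] == x


queries = [0, 1, 6498, 25998, 26000]
print([x in lookup_set for x in queries])            # [True, False, True, True, False]
print([in_sorted(sorted_ints, x) for x in queries])  # same answers
# Bulk form, as used in the thread: which query values are present at all?
print(sorted(lookup_set.intersection(queries)))      # [0, 6498, 25998]
```

The set is usually the fastest for repeated single lookups; bisect avoids building a second structure and keeps the data in order, which matters if neighbouring values are also needed.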
[Tutor] An -1.#IND error
After a matrix*vector multiplication (i.e. b = Ax, with A, x and b all floats), the b vector elements are all "-1.#IND". What does this mean? By the way, there are no divisions in the program, i.e. no divide by zeros. Dinesh
Re: [Tutor] An -1.#IND error
Luke: This is literally the core of the code:

    A = scipy.asmatrix(scipy.zeros((M, N), float))
    q = scipy.asmatrix(scipy.zeros((N, 1)), float)
    b = scipy.asmatrix(scipy.zeros((1, N)), float)
    # populate A
    # x is a vector of valid floats (I've checked)
    # calculate b as:
    b = A * x

After the matrix multiplication, the b vector elements are all "-1.#IND"s. Note that there are no divisions by zero in the program. Cheers Dinesh

- Original Message - From: Luke Paireepinart To: Dinesh B Vadhia Cc: tutor@python.org Sent: Saturday, January 26, 2008 11:12 PM Subject: Re: [Tutor] An -1.#IND error

Dinesh B Vadhia wrote:
> After a matrix*vector multiplication (ie. b = Ax, with A, x and b all floats), the b vector elements are all "-1.#IND". What does this mean? Btw, they are no divisions in the program eg. no divide by zeros.

A code sample would be _much_ more helpful here. Please include one that exhibits the problem.
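As far as I know, the Microsoft C runtime of that era printed an indeterminate NaN (not-a-number) as -1.#IND, so the result vector is full of NaNs. A NaN does not need a division: an overflow to infinity followed by inf - inf or 0 * inf produces one, and a single NaN anywhere in x then spreads to every element of b = Ax. The pure-Python sketch below illustrates both effects (matvec() is a hypothetical helper); with NumPy/SciPy arrays, the equivalent check would be numpy.isnan or numpy.isfinite on A and x before multiplying.

```python
import math

big = 1e308
overflow = big * 10            # exceeds the float range: becomes inf, no error raised
print(overflow)                # inf
nan1 = overflow - overflow     # inf - inf is undefined: nan
nan2 = 0.0 * overflow          # 0 * inf is undefined: nan
print(math.isnan(nan1), math.isnan(nan2))  # True True


def matvec(A, x):
    """Plain-Python matrix-vector product b = A x."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


A = [[1.0, 2.0], [3.0, 4.0]]
x = [float("nan"), 1.0]        # a single NaN in x ...
b = matvec(A, x)
print([math.isnan(v) for v in b])  # ... makes every entry of b NaN: [True, True]
```

Because every output element touches every element of x, an all-NaN b usually points at a NaN or infinity in x, or a NaN in every row of A, rather than at the multiplication itself.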
[Tutor] matrix-vector multiplication errors
I've posted this on the SciPy forum, but maybe there are answers on Tutor too. I'm performing a standard SciPy matrix*vector multiplication, b = Ax (but not using the sparse module), with different sizes of A as follows. Assuming 8 bytes per float:

1. matrix A with M=10,000 and N=15,000 is approximately 1.2 GB
2. matrix A with M=10,000 and N=5,000 is approximately 400 MB
3. matrix A with M=10,000 and N=1,000 is approximately 80 MB

The Python/SciPy matrix initialization statements are:

    A = scipy.asmatrix(scipy.empty((I,J), dtype=int))
    x = scipy.asmatrix(scipy.empty((J,1), dtype=float))
    b = scipy.asmatrix(scipy.empty((I,1), dtype=float))

I'm using a Windows XP SP2 PC with 2 GB RAM. Both matrices 1. and 2. fail with INDeterminate values in b. Matrix 3. works perfectly. As I have 2 GB of RAM, why are matrices 1. and 2. failing? The odd thing is that Python doesn't return any error messages with 1. and 2., but we know the results are garbage (literally!) Cheers! Dinesh
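The size estimates follow from rows × columns × bytes per element. The hypothetical helper below reproduces the arithmetic in both decimal megabytes and binary mebibytes. Two caveats, offered as things to check rather than as a diagnosis: A is created with dtype=int, so its element size is not necessarily the 8 bytes assumed above; and scipy.empty (like numpy.empty) returns uninitialised memory, so any float element that is not explicitly filled holds arbitrary bits, which can include NaN or infinity patterns.

```python
def matrix_bytes(rows, cols, itemsize=8):
    """Bytes needed for a dense rows x cols array of fixed-size items."""
    return rows * cols * itemsize


for n in (15_000, 5_000, 1_000):
    nbytes = matrix_bytes(10_000, n)
    # Decimal megabytes (10**6 bytes) and binary mebibytes (2**20 bytes).
    print(f"M=10,000 N={n:,}: {nbytes / 1e6:,.0f} MB ({nbytes / 2**20:,.0f} MiB)")
```

Allocating with zeros() instead of empty() costs a little time but guarantees that unfilled entries are 0.0, which makes garbage results much easier to rule out.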
[Tutor] List Box for Web
I know this isn't the right forum to ask, but I'll try as someone might know. For my web application, I need a list box with a search capability. An example is the Python documentation (hit the F1 key under Windows from IDLE), specifically the Index list, i.e. context-sensitive search through a list of phrases, but for use on a web page. Does anyone know of any open source UI widgets with such a capability? Any help/pointers appreciated. Dinesh
[Tutor] Bag of Words and libbow
Has anyone come across Python modules/libraries to perform "Bag of Words" text analysis, or an interface to the libbow C library? Thank you! Dinesh
Re: [Tutor] Bag of Words and libbow
Andre, I had a quick look at NLTK, which is an NLP library suite, whereas libbow is for statistical text analysis. Cheers Dinesh

Message: 3 Date: Mon, 10 Mar 2008 08:24:23 +0100 From: Andre Halama <[EMAIL PROTECTED]> Subject: Re: [Tutor] Bag of Words and libbow To: tutor@python.org

Dinesh B Vadhia wrote:

Hi,

| Has anyone come across Python modules/libraries to perform "Bag of Words" text analysis or an interface to the libbow C library? Thank-you!

did you already have a look at NLTK (http://nltk.sourceforge.net/index.php/Main_Page)?

HTH,
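A basic bag-of-words representation is just a word count per document, which the standard library can produce without libbow or NLTK. A minimal Python 3 sketch; the tokenising regular expression and the sample documents are assumptions, and this is not an interface to libbow.

```python
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text, split it into word tokens and count them."""
    tokens = re.findall(r"[a-z0-9']+", text.lower())
    return Counter(tokens)


docs = ["The quick brown fox", "the lazy dog and the quick cat"]
bags = [bag_of_words(d) for d in docs]
print(bags[1].most_common(1))  # [('the', 2)]

# A shared, sorted vocabulary turns each bag into a fixed-length count vector.
vocab = sorted(set().union(*bags))
vectors = [[bag[word] for word in vocab] for bag in bags]
print(vocab)
print(vectors)
```

The count vectors are the usual input for statistical methods such as naive Bayes classification; stop-word removal, stemming or TF-IDF weighting would be layered on top.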
[Tutor] Working with Python Objects
I've avoided it as long as possible, but I've reached a stage where I have to start using Python objects! The primary reason is that the web framework uses objects, and the second is to eliminate a few globals. Here is example pseudo code followed by the question (one of many, I suspect!):

    class A:
        constantA = 9
        def OneOfA:
            a =

    class B:
        variableB = "quick brown fox"
        def OneOfB:
            b =
            c = b * a    # the 'a' from def OneOfA in class A

Question: 1) How do I access the 'a' from function (method) OneOfA in class A so that it can be used by functions (methods) in class B? Cheers Dinesh
Re: [Tutor] Working with Python Objects
Alan/Greg, I've combined your code fragments and added a function call too, to determine how 'a' is passed between objects and classes:

    def addNumbers(i, j):
        k = i + j
        return k

    class A:
        def oneA(self):
            z = 2
            self.a = self.a * z

    class B:
        def oneB(self):
            inA = A()    # instance of class A
            y = 5
            b = y * inA.a
            c = addNumbers(y, b)

Is this correct? Dinesh

    class A:
        constantA = 9
        def OneOfA:
            a =

    class B:
        variableB = "quick brown fox"
        def OneOfB:
            b =
            c = b * a    # the 'a' from def OneOfA in class A

> Question:
> 1) how do I access the 'a' from function (method) OneOfA in class A so that it can be used by functions (methods) in class B?

You don't and shouldn't try to. In this case, because the attribute only exists inside the method, it is local, so it dies when the method completes. So first of all you need to make it part of class A. We do that by tagging it as an attribute of self, which should be the first parameter of every method.

But one of the concepts of OOP is to think in terms of the objects, not the attributes inside them. So your question should probably be: how do I access objects of class A inside methods of class B?

The answer is by passing an instance into the method as a parameter. You can then manipulate the instance of A by sending messages to it. In Python you can access the instance values of an object by sending a message with the same name as the attribute; in other OOP languages you would need to provide an accessor method.

But it is very important conceptually that you try to get away from thinking about accessing attributes of another object inside methods. Access the objects. Methods should only be manipulating the attributes of their own class. To do otherwise is to break the reusability of your classes.
So rewriting your pseudo code:

    class A:
        constantA = 9
        def OneOfA(self):          # add self as first parameter
            self.a =               # use 'self' to tag 'a' as an attribute

    class B:
        variableB = "quick brown fox"
        def OneOfB(self, anA):     # add self and the instance of A
            b =
            c = b * anA.a          # the 'a' from the instance anA

This way OneOfB() only works with attributes local to it, defined as instance variables, or passed in as arguments. Which is as it should be!

Real OOP purists don't like direct attribute access, but in Python it's an accepted idiom, and frankly there is little value in writing an accessor method that simply returns the value if you can access it directly. The thing you really should try to avoid, though, is modifying the attributes directly from another class. Normally you can write a more meaningful method that will do that for you.

-- 
Alan Gauld
Author of the Learn to Program web site
Temporarily at: http://uk.geocities.com/[EMAIL PROTECTED]/
Normally: http://www.freenetpages.co.uk/hp/alan.gauld
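Alan's rewritten pseudo code can be made runnable as below. The starting value of a, the multiplier z and the value y are made-up placeholders, because the original leaves them blank. The follow-up version earlier in the thread would raise AttributeError, since nothing assigns self.a before oneA() or oneB() reads it; giving A an __init__ fixes that, and B receives an existing A instance instead of creating its own.

```python
class A:
    constantA = 9

    def __init__(self, a=2):
        # 'a' lives on the instance, so it survives after any method returns.
        self.a = a

    def OneOfA(self, z=2):
        # A's own method is the one that changes A's attribute.
        self.a = self.a * z


class B:
    variableB = "quick brown fox"

    def OneOfB(self, anA, y=5):
        # B is handed an existing A instance and only reads from it.
        b = y
        c = b * anA.a
        return c


inA = A()                # inA.a == 2
inA.OneOfA()             # inA.a == 4
print(B().OneOfB(inA))   # 20
```

Reading anA.a directly is the accepted Python idiom Alan mentions; any change to a stays inside A's own methods.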
[Tutor] Python to C++
Say, because of performance, you might want to rewrite/convert Python code to C++. What is the best way (or best practice) to do this with respect to the tools available? Dinesh
Re: [Tutor] Python to C++
Thank you for all the suggestions for converting to C/C++, which will be followed up. Can we interface Python to a C++ library, and if so, how? Dinesh

Date: Thu, 20 Mar 2008 17:21:52 - From: "Alan Gauld" <[EMAIL PROTECTED]> Subject: Re: [Tutor] Python to C++ To: tutor@python.org

"Dinesh B Vadhia" <[EMAIL PROTECTED]> wrote
> Say because of performance, you might want to re-write/convert Python code to C++. What is the best way (or best practice) to do this wrt the tools available?

It may be obvious, but it's worth noting that optimised Python may be faster than a badly written C port. So first make sure you have squeezed the best performance out of Python.

Secondly, only rewrite the bits that need it: use the profiler to identify the bottlenecks in your Python code and move those to a separate module to reduce conversion effort.

After that, the advice already given re Pyrex/Psyco etc. is all good. You might also find SWIG a useful alternative if you decide to rewrite the slow functions by hand. SWIG will help wrap those functions so that the remaining Python code can access them.

Alan G.
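Alan's "profile first" advice can be followed with the standard cProfile and pstats modules. In this Python 3 sketch, slow_lookup() and fast_lookup() are hypothetical functions; the report, sorted by cumulative time, shows which one dominates before any C/C++ rewrite is considered.

```python
import cProfile
import io
import pstats


def slow_lookup(items, queries):
    # Deliberately linear: 'q in list' scans the list on every query.
    return [q for q in queries if q in items]


def fast_lookup(items, queries):
    # Same result, but membership is tested against a set.
    lookup = set(items)
    return [q for q in queries if q in lookup]


def main():
    items = list(range(5_000))
    queries = list(range(0, 10_000, 7))
    assert slow_lookup(items, queries) == fast_lookup(items, queries)


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
report = out.getvalue()
print(report)  # slow_lookup should account for most of the cumulative time
```

In this toy case the hot spot is fixed in Python itself; only functions that remain slow after that kind of change are candidates for Pyrex, SWIG or a hand-written extension.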
[Tutor] from __future__ import division
I spent fruitless hours trying to get a (normal) division x/y to work and then saw that you have to declare:

    from __future__ import division

at the top of a module file. What is this all about? Dinesh
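In Python 2, / between two integers performs floor division, so 7/2 gives 3; the __future__ import switches / to true division for that module only, and // remains available for explicit floor division. Python 3 behaves as if the import were always present, so the line is harmless there. A quick sketch, runnable under Python 3 as written:

```python
from __future__ import division

print(7 / 2)    # 3.5  (true division; Python 2 without the import gives 3)
print(7 // 2)   # 3    (floor division, same on both versions)
print(-7 // 2)  # -4   (floors toward negative infinity, not toward zero)
print(7 / 2.0)  # 3.5  (the old Python 2 workaround: make one operand a float)
```

The import must appear at the top of each module that needs it, because it changes how that module's source is compiled.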
[Tutor] Google App Engine
Hi! Google announced an app server that allows applications/services developed in pure Python to use their infrastructure. This may be of use to many on this list. Further details can be found at: http://appengine.google.com/ The SDK includes a modified Python 2.5.2 and Django 0.96.1, WebOb 0.9 and PyYAML 3.05. As an aside, does anyone here have experience of WebOb, and specifically, is it a mini web framework (like webpy)? Cheers Dinesh
[Tutor] List comprehensions
Here is a for loop operating on a list of string items:

    data = ["string 1", "string 2", "string 3", "string 4", "string 5", "string 6", "string 7", "string 8", "string 9", "string 10", "string 11"]
    result = ""
    for item in data:
        result = item + "\n"
        print result

I want to replace the for loop with a list comprehension (or whatever) to improve performance (as the data list will be >10,000). At each stage of the for loop I want to print the result, i.e.

    [print (item + "\n") for item in data]

But this doesn't work, as the inclusion of the print causes an invalid syntax error. Any thoughts? Dinesh
Re: [Tutor] List comprehensions
Sorry, let's start again. Here is a for loop operating on a list of string items:

    data = ["string 1", "string 2", "string 3", "string 4", "string 5", "string 6", "string 7", "string 8", "string 9", "string 10", "string 11"]
    result = ""
    for item in data:
        result = item
        print result

I want to replace the for loop with another structure to improve performance (as the data list will contain >10,000 string items). At each iteration of the for loop the result is printed (in fact, the result is sent from the server to a browser one result line at a time). The for loop will be called continuously, and this is another reason to look for a potentially better structure, preferably a built-in. Hope this makes sense! Thank you. Dinesh

- Original Message - From: Kent Johnson To: Dinesh B Vadhia Cc: tutor@python.org Sent: Wednesday, April 09, 2008 12:40 PM Subject: Re: [Tutor] List comprehensions

Dinesh B Vadhia wrote:
> Here is a for loop operating on a list of string items:
>
> data = ["string 1", "string 2", "string 3", "string 4", "string 5", "string 6", "string 7", "string 8", "string 9", "string 10", "string 11"]
>
> result = ""
> for item in data:
>     result = item + "\n"
>     print result

I'm not sure what your goal is here. Do you mean to be accumulating all the values in data into result? Your sample code does not do that.

> I want to replace the for loop with a List Comrehension (or whatever) to improve performance (as the data list will be >10,000]. At each stage of the for loop I want to print the result ie.
>
> [print (item + "\n") for item in data]
>
> But, this doesn't work as the inclusion of the print causes an invalid syntax error.

You can't include a statement in a list comprehension. Anyway, the time taken to print will swamp any advantage you get from the list comp.

If you just want to print the items, a simple loop will do it:

    for item in data:
        print item + '\n'

Note this will double-space the output since print already adds a newline.
If you want to create a string with all the items followed by newlines, the classic way to do this is to build a list and then join it. To do it with the print included, try

    result = []
    for item in data:
        newItem = item + '\n'
        print newItem
        result.append(newItem)
    result = ''.join(result)

Kent
Re: [Tutor] List comprehensions
Kent, I'm using a JavaScript autocomplete plugin for an online web application/service. Each time a user inputs a character, the character is sent to the backend Python program, which searches for the character in a list of >10,000 string items. Once it finds the character, the backend returns that string and N other adjacent string items, where N can vary from 20 to 150. Each string item is sent back to the JS in a separate print statement. Hence the for loop.

Now, N = 20 to 150 is not a lot (for a for loop), but this process is performed each time the user enters a character. Plus, there will be thousands (possibly more) of users at a time. There is also the searching of the >10,000 string items using the entered character. All of this adds up in terms of performance.

I haven't done any profiling yet as we are still building the system, but it seemed sensible that replacing the for loop with a built-in would help. Maybe not? Hope that helps. Dinesh

- Original Message - From: Kent Johnson To: Dinesh B Vadhia Cc: tutor@python.org Sent: Wednesday, April 09, 2008 1:48 PM Subject: Re: [Tutor] List comprehensions

Dinesh B Vadhia wrote:
> Here is a for loop operating on a list of string items:
>
> data = ["string 1", "string 2", "string 3", "string 4", "string 5", "string 6", "string 7", "string 8", "string 9", "string 10", "string 11"]
>
> result = ""
> for item in data:
>     result = item
>     print result
>
> I want to replace the for loop with another structure to improve performance (as the data list will contain >10,000 string items]. At each iteration of the for loop the result is printed (in fact, the result is sent from the server to a browser one result line at a time)

Any savings you have from optimizing this loop will be completely swamped by the network time. Why do you think this is a bottleneck?
You could use

    [ sys.stdout.write(some operation on item) for item in data ]

but I consider this bad style and I seriously doubt you will see any difference in performance.

> The for loop will be called continuously and this is another reason to look for a potentially better structure preferably a built-in.

What do you mean 'called continuously'?

Kent
Re: [Tutor] Searching through large number of string items
The 10,000 string items are sorted.

The way the autocomplete works is that when a user enters a char, e.g. 'f', the 'f' is sent to the server, which returns strings containing the char 'f'. You can limit the number of items sent back to the browser (say, to between 15 and 100). The string items containing 'f' are displayed. The user can then enter another char, e.g. 'a', to make 'fa'. The autocomplete plugin will search its cache to find all items containing 'fa' but may need to go back to the server to collect others. And so on. Equally, the user could backspace over the 'f' and enter 'k'. The 'k' will be sent to the server to find strings containing 'k', and so on.

One way to solve this is with linear search, which, as you rightly pointed out, has horrible performance (and it has!). I'll try the binary search and let you know. I'll also look at the trie structure.

An alternative is to create an in-memory SQLite database of the string items. Any thoughts on that? Dinesh

- Original Message - From: Kent Johnson To: Dinesh B Vadhia Cc: tutor@python.org Sent: Thursday, April 10, 2008 5:20 AM Subject: Re: [Tutor] List comprehensions

Dinesh B Vadhia wrote:
> Kent
>
> I'm using a Javascript autocomplete plugin for an online web application/service. Each time a user inputs a character, the character is sent to the backend Python program which searches for the character in a list of >10,000 string items. Once it finds the character, the backend will return that string and N other adjacent string items where N can vary from 20 to 150. Each string item is sent back to the JS in separate print statements. Hence, the for loop.

Ok, this sounds a little closer to a real spec. What kind of search are you doing? Do you really just search for individual characters, or are you looking for the entire string entered so far as a prefix? Is the list of 10,000 items sorted? Can it be?
You need to look at your real problem and find an appropriate data structure, rather than showing us what you think is the solution and asking how to make it faster.

For example, if what you have is a sorted list of strings and you want to find the first string that starts with a given prefix and return the N adjacent strings, you could use the bisect module to do a binary search rather than a linear search. Binary search of 10,000 items will take 13-14 comparisons to find the correct location. Your linear search will take an average of 5,000 comparisons.

You might also want to use a trie structure, though I'm not sure if that will let you find adjacent items.
http://www.cs.mcgill.ca/~cs251/OldCourses/1997/topic7/
http://jtauber.com/blog/2005/02/10/updated_python_trie_implementation/

> I haven't done any profiling yet as we are still building the system but it seemed sensible that replacing the for loop with a built-in would help. Maybe not?

Not. An algorithm with poor "big O" performance should be *replaced*, not optimized.

Kent
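Kent's bisect suggestion, for the case of a prefix search over a sorted list, might look like the following Python 3 sketch. prefix_matches() and the sample titles are hypothetical. Because all strings sharing a prefix are contiguous in sorted order, one binary search finds the first match and a short forward scan collects the rest. It does not help when the match can occur anywhere in the string rather than at the start.

```python
import bisect


def prefix_matches(sorted_items, prefix, limit=20):
    """Return up to `limit` items starting with `prefix` from a sorted list.

    bisect_left finds the first position where `prefix` could be inserted;
    every string starting with `prefix` sorts at or after that point and the
    matches are contiguous, so we scan forward until the prefix stops matching.
    """
    start = bisect.bisect_left(sorted_items, prefix)
    result = []
    for item in sorted_items[start:start + limit]:
        if not item.startswith(prefix):
            break
        result.append(item)
    return result


titles = sorted(["fable", "face", "fact", "factor", "fade", "kale", "kayak"])
print(prefix_matches(titles, "fa"))            # ['fable', 'face', 'fact', 'factor', 'fade']
print(prefix_matches(titles, "fac", limit=2))  # ['face', 'fact']
```

The cost is about log2(n) comparisons to locate the start, plus at most `limit` comparisons for the scan, instead of touching every item.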
Re: [Tutor] Searching through large number of string items
Ignore the 'adjacent items' remark. The rest is correct, i.e. looking for all strings containing a substring x.

- Original Message - From: Kent Johnson To: Dinesh B Vadhia Cc: tutor@python.org Sent: Thursday, April 10, 2008 6:32 AM Subject: Re: [Tutor] Searching through large number of string items

Dinesh B Vadhia wrote:
> The 10,000 string items are sorted.
>
> The way the autocomplete works is that when a user enters a char eg. 'f', the 'f' is sent to the server and returns strings with the char 'f'.

If it is all strings containing 'f' (not all strings starting with 'f'), then the binary search will not work. A database might work better for that.

You can get all strings containing some substring x with

    [ item for item in list if x in item ]

Of course that is back to linear search.

You mentioned before that you want to also show adjacent items? I don't know how to do that with a database either.

Kent
[Tutor] SQLite LIKE question
I'm reading a text file into an in-memory pysqlite table. When I do a SELECT on the table, I get a 'u' in front of each returned row, e.g.

    (u'QB VII',)
    (u'Quackser Fortune Has a Cousin in the Bronx',)

I've checked the data being INSERTed into the table and it has no 'u'.

The second problem is that I'm using the LIKE operator to match a pattern against a string but am getting garbage results. For example, looking for the characters q='dog' in each string, the SELECT statement is as follows:

    for row in con.execute("SELECT FROM WHERE LIKE '%q%' limit 25"):
        print row

This doesn't work and I've tried other combinations without luck! Any thoughts on the correct syntax for the LIKE? Dinesh
[Tutor] Fw: SQLite LIKE question
Try again: I'm using the LIKE operator to match a pattern against a string using this SELECT statement:

    for row in con.execute("SELECT FROM WHERE LIKE '%q%' limit 25"):

where , , are placeholders! With q="dog" as a test example, I've tried '$q%', '%q%', '%q' and 'q%', and none of them return what I expect, i.e. all strings with the characters "dog" in them. Cheers! Dinesh

- Original Message - From: Dinesh B Vadhia To: tutor@python.org Sent: Thursday, April 10, 2008 3:24 PM Subject: SQLite LIKE question

I'm reading a text file into an in-memory pysqlite table. When I do a SELECT on the table, I get a 'u' in front of each returned row eg.

    (u'QB VII',)
    (u'Quackser Fortune Has a Cousin in the Bronx',)

I've checked the data being INSERT'ed into the table and it has no 'u'.

The second problem is that I'm using the LIKE operator to match a pattern against a string but am getting garbage results. For example, looking for the characters q='dog' in each string the SELECT statement is as follows:

    for row in con.execute("SELECT FROM WHERE LIKE '%q%' limit 25"):
        print row

This doesn't work and I've tried other combinations without luck! Any thoughts on the correct syntax for the LIKE?

Dinesh
Re: [Tutor] SQLite LIKE question
Okay, I've got this now:

    con = sqlite3.connect(":memory:")
    cur = con.cursor()
    cur.execute("""CREATE TABLE db.table(col.a integer, col.b text)""")
    con.executemany("""INSERT INTO db.table(col.a, col.b) VALUES (?, ?)""", m)
    con.commit()
    for row in con.execute("""SELECT col.a, col.b FROM db.table"""):
        print row
    # when run, all rows are printed correctly but as unicode strings
    q = "dog"
    for row in con.execute("""SELECT col.b FROM db.table WHERE col.b LIKE ? LIMIT 25""", q):
        print row

... and I get the following error:

    Traceback (most recent call last):
      for row in con.execute("SELECT col.b FROM db.table WHERE col.b LIKE ? LIMIT 25", q):
    ProgrammingError: Incorrect number of bindings supplied. The current statement uses 1, and there are 3 supplied.

As Python/pysqlite stores the items in db.table as unicode strings, I've also run the code with q=u"dog" but get the same error. Same with putting q as a tuple, i.e. (q), in the SELECT statement. By the way, there are 73 instances of the substring 'dog' in db.table. Cheers Dinesh
[Tutor] Old School
I belong to the Old School, where getting my head around OO is just one big pain. I write software by modularization, executed as a set of functions, and it works (some call this functional programming!). Whenever I review Python books (e.g. Lutz's excellent Programming Python, 3rd ed.), the code is laid out with defs followed by classes (with their own defs), which is as it should be. But the defs on their own (i.e. not in classes) are all of the form:

    def abc(self):
        return

or

    def xyz(self, ):
        return

I don't use 'self' in my defs - should I? If so, why? Thanks! Dinesh
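self is only needed for methods, i.e. functions defined inside a class: Python passes the instance as the first argument automatically, and self is just the conventional name for that parameter. A module-level def, as used in a procedural style, takes only its own parameters. A small Python 3 sketch with a hypothetical Rectangle class:

```python
def area(width, height):
    # A plain module-level function: no 'self', just its own parameters.
    return width * height


class Rectangle:
    def __init__(self, width, height):
        self.width = width
        self.height = height

    def area(self):
        # A method: Python passes the instance in as the first argument,
        # conventionally named 'self', so the method can reach its data.
        return self.width * self.height


r = Rectangle(3, 4)
print(area(3, 4))          # 12
print(r.area())            # 12
print(Rectangle.area(r))   # 12, the same call with 'self' passed explicitly
```

So functions that are not inside a class should keep working without self; adding it there would only create an unused first parameter.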
[Tutor] pysqlite and functions
I'm using a pysqlite SELECT statement within a def function, and it's not working because (I suspect) the pysqlite variables are not being declared correctly to be used within a def function, or the def function is not set up correctly. Here is the code followed by the errors:

    con = sqlite3.connect(":memory:")    # create database/table in memory
    cur = con.cursor()    # note: can use the nonstandard execute, executemany to avoid using Cursor object
    query = "CREATE TABLE db.table(field.a INTEGER, field.b TEXT)"
    cur.execute(query)
    query = "INSERT INTO db.table(field.a, field.b) VALUES (?, ?)", data
    cur.executemany(query)

    def getResult(q, limit):
        query = "SELECT field.b FROM db.table WHERE field.b LIKE '%s' LIMIT '%s'" %(q, limit)
        for row in cur.execute(query):
            print row
        return

    # main program
    ..
    q =
    limit =
    getResult(q, limit)    # call getResult with parameters q and limit
    ..

The error received is:

    Traceback (most recent call last):
      for row in cur.execute(query):
    NameError: global name 'cur' is not defined

Some notes:

1. The code works perfectly outside of a def function, but I need to have it working within a def.
2. Clearly, everything inside getResult is private unless declared otherwise. As a quick and dirty way to force it to work I declared

       global con, curs, db.table

   but that results in the same error.
3. Moving con and cur into the def statement results in the error:

       Traceback (most recent call last):
         for row in cur.execute(query):
       OperationalError: no such table: db.table

4. getResult is not seeing con, curs and db.table even when they are declared as global.
5. I wonder if this is something specific to pysqlite.

Cheers! Dinesh
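The snippet does not show why cur is invisible inside getResult() (it may, for example, be created in a different module or inside another function), so rather than guess, here is a Python 3 sketch that sidesteps the question by passing the connection in as a parameter. It also binds the LIKE pattern and the LIMIT with ? placeholders instead of % string formatting, and passes the rows to executemany() as its second argument. The table name titles and columns a and b are placeholders, since dotted names such as db.table and field.a do not work as plain table or column names in SQLite.

```python
import sqlite3


def make_connection(rows):
    """Create and fill an in-memory table (names are illustrative placeholders)."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE titles (a INTEGER, b TEXT)")
    con.executemany("INSERT INTO titles (a, b) VALUES (?, ?)", rows)
    con.commit()
    return con


def get_result(con, q, limit):
    """Everything the function needs arrives as a parameter, so no globals."""
    cur = con.execute(
        "SELECT b FROM titles WHERE b LIKE ? ORDER BY a LIMIT ?",
        ("%" + q + "%", limit))  # bound parameters, not string formatting
    return [b for (b,) in cur]


con = make_connection([(1, "Hot Dog"), (2, "Dog Day Afternoon"), (3, "Jaws")])
print(get_result(con, "dog", 10))  # ['Hot Dog', 'Dog Day Afternoon']
```

Note that SQLite's LIKE is case-insensitive for ASCII letters by default, which is why "dog" also matches "Dog Day Afternoon".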
Re: [Tutor] SQLite LIKE question
Guys, I got it to work. The problem was to use pysqlite to search (in memory) a large number (>10,000) of string items containing the substring q (and to do it continuously with different q's). The solution was to enclose the substring q in % signs, i.e. '%q%'. The performance is excellent. The code is in my recent post (Subject: pysqlite and functions), with a new problem, i.e. the code works as-is but not within a def function. Dinesh

Date: Fri, 11 Apr 2008 13:20:12 +0100 From: Tim Golden <[EMAIL PROTECTED]> Subject: Re: [Tutor] SQLite LIKE question Cc: tutor@python.org

Dinesh B Vadhia wrote:
> Okay, I've got this now:
>
>> con = sqlite3.connect(":memory:")
>> cur = con.cursor()
>> cur.execute("""CREATE TABLE db.table(col.a integer, col.b text)""")
>> con.executemany("""INSERT INTO db.table(col.a, col.b) VALUES (?, ?)""", m)
>> con.commit()
>
>> for row in con.execute("""SELECT col.a, col.b FROM db.table"""):
>>     print row
>> # when run, all rows are printed correctly but as unicode strings
>> q = "dog"
>> for row in con.execute("""SELECT col.b FROM db.table WHERE col.b LIKE ? LIMIT 25""", q):
>>     print row
>
> .. And, I get the following error:
>
> Traceback (most recent call last):
>   for row in con.execute("SELECT col.b FROM db.table WHERE col.b LIKE ? LIMIT 25", q):
> ProgrammingError: Incorrect number of bindings supplied. The current statement uses 1, and there are 3 supplied.

Whenever you see this in a DB-API context, you can bet your socks that you're passing a single item (such as a string, q) rather than a list or tuple of items. Try passing [q] as the second parameter to that .execute function and see what happens!

TJG
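Tim's diagnosis can be reproduced directly: execute() treats its second argument as a sequence of parameters, so the string "dog" counts as three. A minimal Python 3 sketch with a made-up items table, showing the error and the working form with a one-element tuple and the % wildcards inside the bound value:

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (b TEXT)")
con.executemany("INSERT INTO items VALUES (?)",
                [("dog house",), ("hotdog",), ("cat",)])

q = "dog"
sql = "SELECT b FROM items WHERE b LIKE ? ORDER BY b LIMIT 25"

try:
    con.execute(sql, q)  # a bare string is a sequence of 3 characters: 3 bindings
    bad_call_failed = False
except sqlite3.ProgrammingError as exc:
    bad_call_failed = True
    print("error:", exc)

# (q) is just q in parentheses; (q,) and [q] are one-element sequences.
pattern = "%" + q + "%"  # the wildcards go in the bound value, not in the SQL text
rows = con.execute(sql, (pattern,)).fetchall()
print(rows)  # [('dog house',), ('hotdog',)]
```

Building the pattern in Python and binding it also keeps quoting problems and SQL injection out of the query text.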
[Tutor] in-memory pysqlite databases
Say you have already created a pysqlite database "testDB". In a Python program, you connect to the database as:

    con = sqlite3.connect("testDB")
    cur = con.cursor()

To use a database in memory (i.e. all the 'testDB' tables are held in memory), the pysqlite documentation says the declaration is:

    con = sqlite3.connect(":memory:")
    cur = con.cursor()

But this can't be right, as you're not telling Python/pysqlite which database to keep in memory. I've tried

    con = sqlite3.connect("testDB", ":memory:")
    cur = con.cursor()

but that didn't work. Any ideas? Dinesh
Re: [Tutor] in-memory pysqlite databases
Bob, an in-memory database that is empty to start, loaded with data, and goes away when the connection goes away is exactly what I'm after. The code and the program for an in-memory database work perfectly. However, a web version using webpy doesn't work; the error message is that it cannot find the database table. After reading your note, it hit me that an execution thread is created by pysqlite and another thread by webpy, and hence webpy is not seeing the table. What a pain! Dinesh

- Original Message - From: bob gailer To: Dinesh B Vadhia Cc: tutor@python.org Sent: Saturday, April 12, 2008 11:25 AM Subject: Re: [Tutor] in-memory pysqlite databases

Dinesh B Vadhia wrote:
> Say, you have already created a pysqlite database "testDB". In a Python program, you connect to the database as:
>
>     con = sqlite3.connect("testDB")
>     cur = con.cursor()
>
> To use a database in memory (ie. all the 'testDB' tables are held in memory) the pysqlite documentation says the declaration is:
>
>     con = sqlite3.connect(":memory:")
>     cur = con.cursor()
>
> But, this can't be right as you're not telling Python/pysqlite which database to keep in memory.

The documentation says "Creating an in-memory database". That means (to me) a new database that is memory resident and as a consequence is empty to start and goes away when the connection goes away.

I don't see any easy way to load a file-based db into a memory-based one. Seems like you'd need to create all the tables in memory, then run select cursors to retrieve from the file-based db and insert the rows into the memory-based db tables.

Why do you want it in memory?

[snip]

-- 
Bob Gailer
919-636-4239
Chapel Hill, NC
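For Python 3.7 and later, the sqlite3 module does offer a direct way to copy a file-based database into memory: Connection.backup(). A sketch under that assumption (on older versions, Connection.iterdump() fed to executescript() on the memory connection is an alternative); the load_into_memory() helper and the temporary testDB file are illustrative only.

```python
import os
import sqlite3
import tempfile


def load_into_memory(path):
    """Copy an on-disk SQLite database into a new in-memory database.

    Relies on sqlite3.Connection.backup(), added in Python 3.7.
    """
    disk = sqlite3.connect(path)
    mem = sqlite3.connect(":memory:")
    disk.backup(mem)  # copies the whole database: tables, indexes and data
    disk.close()
    return mem


# Demo: a throwaway file database standing in for "testDB".
path = os.path.join(tempfile.mkdtemp(), "testDB")
disk = sqlite3.connect(path)
disk.execute("CREATE TABLE t (name TEXT)")
disk.executemany("INSERT INTO t VALUES (?)", [("alpha",), ("beta",)])
disk.commit()
disk.close()

mem = load_into_memory(path)
print(mem.execute("SELECT name FROM t ORDER BY name").fetchall())
```

The copy is private to the returned connection, so it still disappears when that connection is closed.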
Re: [Tutor] in-memory pysqlite databases
Why do you say: "Now you didn't mention webpy before, that makes a big difference!"?

As an aside, it really is a huge pain in the neck that, in general, standard Python works (and works wonderfully), but as soon as you include external libraries (e.g. NumPy, SciPy, webpy, and probably other web frameworks) things start to fall apart (badly!). And, from my experience with Python so far, it is not my incompetence (well, not most of the time!).

Dinesh

..

Date: Sat, 12 Apr 2008 23:23:30 +0100
From: "Alan Gauld"
Subject: Re: [Tutor] in-memory pysqlite databases
To: tutor@python.org

"Dinesh B Vadhia" wrote
> However, a web version using webpy doesn't work

Now you didn't mention webpy before, that makes a big difference!

> an execution thread is created by pysqlite and
> another thread by webpy and hence webpy is not
> seeing the table.

Almost certainly the case, but if you are using the web you can almost certainly afford to use a file-based SQLite database, and that way the data can be shared. The network delays will far outweigh the slowdown of moving to the file-based database.

-- Alan Gauld Author of the Learn to Program web site http://www.freenetpages.co.uk/hp/alan.gauld
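Alan's diagnosis can be observed directly. Every sqlite3.connect(":memory:") call creates a separate, private database, so a table created through one connection is invisible to another.

The sketch below contrasts that with SQLite's named in-memory database using a shared cache, which several connections in the same process can open. It is written for Python 3.4 or later, which is needed for uri=True. The name demo_memdb is arbitrary.

This is not a drop-in webpy fix. Each thread still needs its own connection, because the sqlite3 module refuses cross-thread use of a connection by default. And if the web server runs several processes, only a file-based database such as Alan suggests is shared between them.

```python
import sqlite3

# Each plain ":memory:" connection gets its own private, empty database.
a = sqlite3.connect(":memory:")
a.execute("CREATE TABLE t (x INTEGER)")
b = sqlite3.connect(":memory:")
try:
    b.execute("SELECT * FROM t")
    table_visible = True
except sqlite3.OperationalError:  # "no such table: t"
    table_visible = False

# A named in-memory database with a shared cache is visible to every
# connection in this process that opens the same URI (needs uri=True).
URI = "file:demo_memdb?mode=memory&cache=shared"
c1 = sqlite3.connect(URI, uri=True)
c1.execute("CREATE TABLE t (x INTEGER)")
c1.execute("INSERT INTO t VALUES (42)")
c1.commit()
c2 = sqlite3.connect(URI, uri=True)  # c1 must stay open, or the database vanishes
rows = c2.execute("SELECT x FROM t").fetchall()

print(table_visible)  # False
print(rows)           # [(42,)]
```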
Re: [Tutor] in-memory pysqlite databases
Alan

Your last paragraph is the gist of my note, i.e. it's the documentation, documentation, documentation. In addition to Python, we use NumPy/SciPy/webpy on the server (all of them Python libraries written in Python and/or C) and have faced no end of problems with these libraries. We also use HTML/CSS/JavaScript/jQuery in the browser and so far we've had zero problems. Of course, these tools are fully documented, including the dead-tree type!

Cheers
Dinesh
[Tutor] encode unicode strings from pysqlite
Here is a program that SELECTs from a pysqlite database table and encodes the returned unicode strings:

    import sys
    import os
    import sqlite3

    con = sqlite3.connect("testDB.db")
    cur = con.cursor()

    a = u'99 Cycling Swords'
    b = a.encode('utf-8')
    print b

    q = '%wor%'
    limit = 25
    query = "SELECT fieldB FROM testDB WHERE fieldB LIKE '%s' LIMIT '%s'" %(q, limit)
    for row in cur.execute(query):
        r = str(row)
        print r.encode('utf-8')

The print b results in:

    99 Cycling Swords

... which is what I want. But the print r.encode('utf-8') leaves the strings as unicode strings, e.g. u'99 Cycling Swords'.

Any ideas what might be going on?

Dinesh
Re: [Tutor] encode unicode strings from pysqlite
Hi Kent,

The row[0].encode('utf-8') works perfectly within a standalone program. But it didn't work within webpy until I realized that maybe webpy is storing the row as a dictionary (which it does), and that you have to get the string by the key (i.e. 'fieldB'). That worked, and webpy also encodes the unicode string at the same time. Here are the details:

    # standard Python: testDB.py
    con = sqlite3.connect("testDB.db")
    cur = con.cursor()
    query = "SELECT fieldB FROM testDB WHERE fieldB LIKE '%s' LIMIT '%s'" %(q, limit)
    for row in cur.execute(query):    # row is a tuple
        print row[0].encode('utf-8')  # works perfectly!

    # webpy: testDB2.py
    web.config.db_parameters = dict(dbn='sqlite', db="testDB.db")
    for row in web.select('testDB', what='fieldB', where='fieldB LIKE $q', limit=limit, vars={'q':q}):
        r = row['fieldB']  # get the encoded unicode through the dict key
        print r            # works perfectly!

----- Original Message -----
From: Kent Johnson
To: Dinesh B Vadhia
Cc: tutor@python.org
Sent: Monday, April 14, 2008 3:42 AM
Subject: Re: [Tutor] encode unicode strings from pysqlite

Dinesh B Vadhia wrote:
> Here is a program that SELECT's from a pysqlite database table and
> encode's the returned unicode strings:
> query = "SELECT fieldB FROM testDB WHERE fieldB LIKE '%s' LIMIT '%s'"
> %(q, limit)
> for row in cur.execute(query):

Here row is a list containing a single unicode string. When you convert a list to a string, it converts the list elements to strings using the repr() function. The repr() of a unicode string includes the u'' as part of the result.

    In [64]: row = [u'99 Cycling Swords']
    In [65]: str(row)
    Out[65]: "[u'99 Cycling Swords']"

Notice that the above is a string that includes u' as part of the string. What you need to do is pick out the actual data and encode just that to a string.

    In [62]: row[0].encode('utf-8')
    Out[62]: '99 Cycling Swords'

Kent
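Here is Kent's point restated for current Python. A row is a sequence (the sqlite3 module returns tuples by default), so str(row) gives the repr of the whole sequence, while row[0] is the column value itself.

The sketch below assumes Python 3, where sqlite3 already returns str, so no .encode('utf-8') is needed for printing. It also passes q and limit as ? parameters instead of formatting them into the SQL string, which avoids quoting problems. The table contents are made up to match the thread.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE testDB (fieldB TEXT)")
con.executemany("INSERT INTO testDB VALUES (?)",
                [("99 Cycling Swords",), ("Nothing to see",)])

q = "%wor%"
limit = 25
# sqlite3 binds q and limit to the ? placeholders; no manual quoting needed.
cur = con.execute("SELECT fieldB FROM testDB WHERE fieldB LIKE ? LIMIT ?",
                  (q, limit))
rows = cur.fetchall()

print(str(rows[0]))                # ('99 Cycling Swords',)  <- repr of the tuple
matches = [row[0] for row in rows]
print(matches)                     # ['99 Cycling Swords']   <- just the values
```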
[Tutor] Loading and using large sparse matrices under Windows
Hi! Does anyone on this list have experience of using the SciPy sparse matrix library for loading and using very large datasets (more than 20,000 rows by more than 1 million columns of integers) under Windows? I'm using a recent SciPy svn build that supports (sparse) integer matrices, but it still causes the pythonw.exe program to abort for the larger datasets. I have ample RAM to create, load and use the matrices. I posted a note on the SciPy list but thought I'd try here too, as you always get a response! Plus, I need a solution to the problem pdq. Thanks! Dinesh
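Before blaming the library, it can help to estimate how much memory the matrices need. The sketch below is plain arithmetic under two assumptions: 4-byte integer values and 4-byte indices, in compressed sparse row (CSR) layout. CSR stores one value and one column index per non-zero, plus one row pointer per row. The non-zero count is a made-up figure, so substitute your own.

Two further points may matter. Building a sparse matrix (for example converting from COO to CSR) temporarily holds extra copies. A 32-bit Windows process also normally gets only 2 GB of user address space however much RAM is installed, which is one possible reason for an abort.

```python
def dense_bytes(n_rows, n_cols, itemsize=4):
    """Bytes for a dense n_rows x n_cols array of itemsize-byte integers."""
    return n_rows * n_cols * itemsize


def csr_bytes(n_rows, nnz, data_itemsize=4, index_itemsize=4):
    """Approximate bytes for a CSR matrix: values + column indices + row pointers."""
    return nnz * (data_itemsize + index_itemsize) + (n_rows + 1) * index_itemsize


GIB = 1024 ** 3
n_rows, n_cols = 20_000, 1_000_000
nnz = 100_000_000  # hypothetical number of stored non-zeros

print("dense: %.1f GiB" % (dense_bytes(n_rows, n_cols) / GIB))  # dense: 74.5 GiB
print("CSR:   %.2f GiB" % (csr_bytes(n_rows, nnz) / GIB))
```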
[Tutor] Equivalent 'case' statement
Is there an equivalent to the C/C++ 'case' (or 'switch') statement in Python? Dinesh
Re: [Tutor] Equivalent 'case' statement
The dictionary of functions was the way to go and does perform much faster than if/elifs. Thank you!

----- Original Message -----
From: inhahe
To: Dinesh B Vadhia
Cc: tutor@python.org
Sent: Thursday, May 22, 2008 4:15 PM
Subject: Re: [Tutor] Equivalent 'case' statement

No, but you can

a) use elifs

    if c == 1:
        do this
    elif c == 2:
        do this
    elif c == 3:
        do this

b) make a dictionary of functions (this is faster)

    def case1():
        do this
    def case2():
        do that
    def case3():
        do the other

    cases = {1: case1, 2: case2, 3: case3}
    cases[c]()

If your functions are one expression you could use lambdas:

    cases = {
        1: lambda: x*2,
        2: lambda: y**2,
        3: lambda: sys.stdout.write("hi\n"),
    }
    cases[c]()

Your functions and lambdas can also take parameters, of course.

On Thu, May 22, 2008 at 5:53 PM, Dinesh B Vadhia wrote:
> Is there an equivalent to the C/C++ 'case' (or 'switch') statement in
> Python?
>
> Dinesh
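Below is a runnable version of inhahe's option (b), written for Python 3. The handler functions and the default value are illustrative only. dict.get() supplies the equivalent of C's default: label, and unlike C there is no fall-through between cases. Python 3.10 and later also have a match statement.

```python
import sys


def case1(x):
    return x * 2


def case2(x):
    return x ** 2


def case3(x):
    sys.stdout.write("hi\n")


# Map each "case" value to the function that handles it.
cases = {1: case1, 2: case2, 3: case3}


def switch(c, x, default=None):
    """Call the handler registered for c, or return default if there is none."""
    handler = cases.get(c)
    if handler is None:
        return default  # plays the role of C's "default:" label
    return handler(x)


print(switch(1, 5))                # 10
print(switch(2, 5))                # 25
print(switch(9, 5, default="?"))   # ?
```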
[Tutor] finding special character string
A text document has special character strings defined as "." + "set of characters" + ".". For example, ".sup." or ".quadbond." or ".degree." etc. The length of the characters between the opening "." and closing "." is variable. Assuming that you don't know beforehand all possible special character strings, how do you find all such character strings in the text document? Dinesh
Re: [Tutor] finding special character string
Thank you Kent, it works a treat!

----- Original Message -----
From: Kent Johnson
To: Dinesh B Vadhia
Cc: tutor@python.org
Sent: Sunday, June 01, 2008 4:25 AM
Subject: Re: [Tutor] finding special character string

On Sun, Jun 1, 2008 at 6:48 AM, Dinesh B Vadhia wrote:
> A text document has special character strings defined as "." + "set of
> characters" + ".". For example, ".sup." or ".quadbond." or ".degree." etc.
> The length of the characters between the opening "." and closing "." is
> variable.
>
> Assuming that you don't know beforehand all possible special character
> strings, how do you find all such character strings in the text document?

Assuming the strings are non-overlapping, i.e. the closing "." of one string is not the opening "." of another, you can find them all with

    import re
    re.findall(r'\..*?\.', text)

Kent
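Here is a quick check of Kent's pattern. The lazy .*? stops at the first closing period, so each match is as short as possible. The second call shows the weakness raised later in the thread: an ordinary sentence break also looks like a "special string". The sample sentences are invented.

```python
import re

PATTERN = r'\..*?\.'

print(re.findall(PATTERN, "the angle is 90 .degree. and x .sup. 2"))
# ['.degree.', '.sup.']
print(re.findall(PATTERN, "End of one. Start of two."))
# ['. Start of two.']
```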
Re: [Tutor] finding special character string
Yes, I'm happy because I found a non-regex way to solve the problem (see below). No, I'm not a student or worn out, but I wish I was back at college and partying! Yes, this is an interesting problem, and here is the requirement:

- A text document contains special words that start and end with a period ("."). The word between the start and end periods contains no punctuation or spaces, except a hyphen in some special words.
- Examples of special words include ".thrfore.", ".because.", ".music-sharp.", ".music-flat.", ".dbd.", ".vertline.", ".uparw.", ".hoarfrost." etc.
- In most cases, the special words have a space (" ") before and after.
- In some cases, a special word will be followed by one or two other special words, e.g. ".dbd..vertline." or ".music-flat..dbd..vertline."
- In some cases, a special word will be followed by an ordinary word (with or without punctuation), e.g. ".music-flat.mozart" or ".vertline.isn't"
- A special word followed by an ordinary word (with or without punctuation) could be the end of a sentence and hence have a full stop ("."), e.g. ".music-flat.mozart." or ".vertline.isn't."
- The number of characters in a special word, excluding the two periods, is greater than 1.
- Find and remove all special words from the text document (processing one line at a time).

How did I solve it? I found a list of all the special words, created a set of special words, and then checked whether each word in the text belonged to the set. If we assume that the list of special words doesn't exist, then the problem is interesting to solve in itself.

Cheers!
Dinesh

Date: Sun, 1 Jun 2008 21:56:26 -0400
From: "Kent Johnson"
Subject: Re: [Tutor] finding special character string
To: "Marilyn Davis"
Cc: tutor@python.org

On Sun, Jun 1, 2008 at 9:41 PM, Marilyn Davis wrote:
> Yeh, we need a better spec. I was wondering if the stuff between the text
> ought not include white space, or even a word boundary. A character class
> might be better, if we knew.

Hmm, yes, my regex will find many ordinary sentences in plain text.

> Anyhow, I think we wore out the student. :^)

He went away happy after my first reply.

Kent
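Following Marilyn's suggestion of a character class, here is a sketch that does not need the list of special words. The pattern is an assumption derived from Dinesh's spec: a period, a letter, at least one more letter or hyphen, then a closing period.

Because the class excludes spaces and periods, the pattern behaves as follows:
- Sentence breaks no longer match.
- Adjacent words such as ".dbd..vertline." split correctly.
- ".music-flat.mozart." keeps "mozart.".

Text like "www.example.com" would still lose ".example.", so it is worth checking against real data.

```python
import re

# Assumed pattern: "." + letter + one or more letters/hyphens + "."
SPECIAL = re.compile(r'\.[A-Za-z][A-Za-z-]+\.')


def strip_special(line):
    """Remove special words from one line and collapse leftover whitespace."""
    return ' '.join(SPECIAL.sub('', line).split())


print(SPECIAL.findall("a .thrfore. b .music-sharp. c .dbd..vertline. d"))
# ['.thrfore.', '.music-sharp.', '.dbd.', '.vertline.']
print(strip_special("the .dbd..vertline. rule .music-flat.mozart. ok"))
# the rule mozart. ok
print(strip_special(".vertline.isn't."))
# isn't.
```

The split/join step also drops the line's trailing newline, so add it back when writing the line out.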
[Tutor] zip and rar files
Does the Python zipfile module work on rar archives? If not, does a similar module exist for rar archives? Dinesh
Re: [Tutor] zip and rar files
The zipfile module does not work on rar archives; it only reads zip archives.

----- Original Message -----
From: Dinesh B Vadhia
To: tutor@python.org
Sent: Saturday, June 07, 2008 8:27 AM
Subject: zip and rar files

Does the Python zipfile module work on rar archives? If not, does a similar module exist for rar archives?

Dinesh
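To check which format a file is in before opening it, zipfile.is_zipfile() can be used. zipfile.ZipFile raises zipfile.BadZipFile for data that is not a zip archive.

The sketch below (Python 3) builds a small zip in memory and a fake file that merely starts with the RAR signature bytes. Real rar archives need a third-party package, for example rarfile on PyPI. It is not part of the standard library and, as far as I know, relies on an external tool such as unrar to decompress data.

```python
import io
import zipfile

# A real (tiny) zip archive built in memory.
zip_buf = io.BytesIO()
with zipfile.ZipFile(zip_buf, "w") as zf:
    zf.writestr("hello.txt", "hello")

# Not a real archive: just the RAR signature followed by padding.
rar_like = io.BytesIO(b"Rar!\x1a\x07\x00" + b"\x00" * 64)

print(zipfile.is_zipfile(zip_buf))   # True
print(zipfile.is_zipfile(rar_like))  # False

error = None
try:
    zipfile.ZipFile(rar_like)
except zipfile.BadZipFile as exc:
    error = exc
print("not a zip:", error)
```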
[Tutor] Extracting text from XML document
I want to extract text from XML (and SGML) documents. I found one program by Paul Prescod (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/65128) from 2001. Does anyone know of any programs that are more recent? Cheers Dinesh
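For well-formed XML, the standard library's xml.etree.ElementTree can do this directly: Element.itertext() yields every text and tail string in document order. The sketch below (Python 3) joins the non-blank pieces with single spaces; that joining rule is a choice, not something the module imposes. SGML that is not well-formed XML will not parse this way.

```python
import xml.etree.ElementTree as ET


def xml_text(xml_string):
    """Return the text content of an XML document, tags removed."""
    root = ET.fromstring(xml_string)
    # itertext() walks the tree in document order, yielding text and tails.
    pieces = (t.strip() for t in root.itertext())
    return " ".join(p for p in pieces if p)


doc = "<doc><title>Hello</title><p>big <b>bold</b> world</p></doc>"
print(xml_text(doc))  # Hello big bold world
```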
[Tutor] endless processing through for loop
I have a program with 2 for loops like this (in pseudocode):

    fw = open('newLine.txt', 'w')
    for i in xrange(0, 700000, 1):
        read a file fname from folder
        for line in open(fname, 'r'):
            do some simple string processing on line
            fw.write(newline)
    fw.close()

That's it. Very simple, but after i reaches about 550,000 the program begins to crawl. As an example, the loop up to 550,000 takes about an hour. From 550,000 to 580,000 takes an additional 4 hours.

Any ideas about what could be going on?

Dinesh
Re: [Tutor] endless processing through for loop
There is no thrashing of the disk, as I have more than 2 GB of RAM and I'm not keeping the file contents in memory. One line is read at a time, some simple string processing is done, and then the modified line is written out.

From: Kent Johnson
Sent: Sunday, June 22, 2008 5:39 PM
To: Dinesh B Vadhia
Cc: tutor@python.org
Subject: Re: [Tutor] endless processing through for loop

On Sun, Jun 22, 2008 at 8:13 PM, Dinesh B Vadhia wrote:
> That's it. Very simple but after i reaches about 550,000 the program begins
> to crawl. As an example, the loops to 550,000 takes about an hour. From
> 550,000 to 580,000 takes an additional 4 hours.
>
> Any ideas about what could be going on?

What happens to memory use? Does it start to thrash the disk? Are you somehow keeping the file contents in memory for all the files you read?

Kent
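The thread does not establish the cause, so rather than guess, one option is to measure. The sketch below (Python 3) is a runnable form of the pseudocode. Each input file is opened in a with block, so it is closed as soon as it has been read. The time taken by each batch of files is printed, which shows whether the slowdown is gradual or starts at particular files. The function name, the transform argument and the batch size are illustrative.

```python
import os
import tempfile
import time


def process_files(paths, out_path, transform, report_every=10000):
    """Write transform(line) for every line of every file in paths to out_path.

    Prints the elapsed time for each batch of report_every files.
    Returns the number of files processed.
    """
    count = 0
    batch_start = time.perf_counter()
    with open(out_path, "w") as fw:
        for count, fname in enumerate(paths, 1):
            with open(fname) as fr:  # closed at the end of this block
                for line in fr:
                    fw.write(transform(line))
            if count % report_every == 0:
                now = time.perf_counter()
                print("%d files, last batch took %.1f s" % (count, now - batch_start))
                batch_start = now
    return count


# Tiny demo with three temporary input files.
folder = tempfile.mkdtemp()
paths = []
for k in range(3):
    p = os.path.join(folder, "in%d.txt" % k)
    with open(p, "w") as f:
        f.write("line %d\n" % k)
    paths.append(p)

out_path = os.path.join(folder, "newLine.txt")
n = process_files(paths, out_path, str.upper, report_every=2)
```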
[Tutor] removing whole numbers from text
I want to remove whole numbers from text but retain numbers attached to words. All whole numbers to be removed have a leading and trailing space. For example, in "the cow jumped-20 feet high30er than the lazy 20 timing fox who couldn't keep up the 865 meter race." remove the whole numbers 20 and 865, but keep the 20 in jumped-20 and the 30 in high30er. What is the best way to do this using re? Dinesh
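One way with re, assuming (as stated) that every number to remove has a space on both sides, is to match a space followed by digits. The lookahead (?= ) requires a following space without consuming it, so the remaining words stay single-spaced.

A tempting alternative, \b\d+\b, would be wrong here, because the hyphen in "jumped-20" counts as a word boundary. Numbers at the very start or end of a line, or directly before punctuation, are not matched by this sketch.

```python
import re

text = ("the cow jumped-20 feet high30er than the lazy 20 timing fox "
        "who couldn't keep up the 865 meter race.")

# " \d+" needs a space before the digits; "(?= )" needs one after.
cleaned = re.sub(r' \d+(?= )', '', text)
print(cleaned)
# the cow jumped-20 feet high30er than the lazy timing fox who couldn't keep up the meter race.
```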
[Tutor] array and dictionary
Hi! Say I've got a numpy array/matrix of the form:

    [[1 6 1 2 3]
     [4 5 4 7 0]
     [2 0 8 0 2]
     [8 2 6 3 0]
     [0 7 0 3 5]
     [8 0 3 0 6]
     [8 0 0 2 2]
     [3 1 0 4 0]
     [5 0 8 0 0]
     [2 1 0 5 6]]

And I want to create a dictionary of rows (as the keys) mapped to lists of non-zero numbers in that row, i.e.

    dictionary_non-zeros = {
        0: [1 6 1 2 3]
        1: [4 5 4 7]
        2: [2 8 2]
        ...
        9: [2 1 5 6]
    }

How do I do this? Thanks! Dinesh
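One numpy way is a boolean mask per row. This sketch is written for Python 3 and assumes A is a numpy.ndarray rather than a numpy.matrix, whose rows behave differently. It uses the first three rows of the example:

```python
import numpy as np

A = np.array([[1, 6, 1, 2, 3],
              [4, 5, 4, 7, 0],
              [2, 0, 8, 0, 2]])

# row != 0 is a boolean mask; row[mask] keeps only the non-zero entries.
non_zeros = {i: row[row != 0].tolist() for i, row in enumerate(A)}
print(non_zeros)  # {0: [1, 6, 1, 2, 3], 1: [4, 5, 4, 7], 2: [2, 8, 2]}
```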
Re: [Tutor] array and dictionary
Alan

Thanks, but I've been a bit daft and described the wrong problem, which is easy to solve the long way. Starting again ... Given a (numpy) array, how do you create a dictionary of lists where the list contains the column indexes of non-zero elements and the dictionary key is the row index? The easy way is 2 for loops, i.e.

    import numpy
    from collections import defaultdict

    A = numpy.array([[1, 6, 1, 2, 3],
                     [4, 5, 4, 7, 0],
                     [2, 0, 8, 0, 2],
                     [0, 0, 0, 3, 7]])

    nz = defaultdict(list)  # row index -> column indexes of non-zeros
    I, J = A.shape
    for i in xrange(I):
        for j in xrange(J):
            if A[i, j] != 0:
                nz[i].append(j)

I want to find a faster/more efficient way to do this without using the 2 for loops. Thanks! Btw, I posted this on the numpy list too, to make sure that there aren't any numpy functions that would help.

Dinesh

Date: Sun, 21 Sep 2008 09:15:00 +0100
From: "Alan Gauld"
Subject: Re: [Tutor] array and dictionary
To: tutor@python.org

"Dinesh B Vadhia" wrote
> Hi! Say, I've got a numpy array/matrix of the form:
>
> [[1 6 1 2 3]
> [4 5 4 7 0]...
> [2 1 0 5 6]]
>
> I want to create a dictionary of rows (as the keys) mapped
> to lists of non-zero numbers in that row

Caveat: I don't know about numpy arrays, but assuming they act like Python lists, you can get the non-zeros with a comprehension

    nz = [n for n in row if n != 0]

You can get the row and index using enumerate

    for n, r in enumerate(arr):

So to create a dictionary, combine the elements, something like:

    d = {}
    for n, r in enumerate(arr):
        d[n] = [v for v in r if v != 0]

I'm sure you could do it all in one line if you really wanted to! Also the new any() function might be usable too. All untested.

HTH,
-- Alan Gauld Author of the Learn to Program web site http://www.freenetpages.co.uk/hp/alan.gauld
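One way to drop both Python loops is sketched below in Python 3 with numpy. numpy.nonzero() returns the row and column indices of all non-zero entries in row-major order, so the column indices only need to be cut into per-row groups.

The grouping via bincount and split is one choice among several. {i: np.flatnonzero(r).tolist() for i, r in enumerate(A)} is simpler but keeps a Python loop over rows. Unlike the defaultdict version, all-zero rows get an empty list here.

```python
import numpy as np

A = np.array([[1, 6, 1, 2, 3],
              [4, 5, 4, 7, 0],
              [2, 0, 8, 0, 2],
              [0, 0, 0, 3, 7]])

rows, cols = np.nonzero(A)                        # sorted by row, then column
counts = np.bincount(rows, minlength=A.shape[0])  # non-zeros per row
groups = np.split(cols, np.cumsum(counts)[:-1])   # one array of columns per row
nz_cols = {i: g.tolist() for i, g in enumerate(groups)}
print(nz_cols)  # {0: [0, 1, 2, 3, 4], 1: [0, 1, 2, 3], 2: [0, 2, 4], 3: [3, 4]}
```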