from:"Dinesh B Vadhia"

[Tutor] Plural words to Singular

2010-08-31 Thread Dinesh B Vadhia

Has anyone come across a quality program to turn plural words to singular 
words?  We don't want to use a stemmer.  Thanks.

Dinesh
___
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor

[Tutor] pickling codecs

2010-09-08 Thread Dinesh B Vadhia

I use codecs to retain consistent unicode/utf-8 encoding and decoding for 
reading/writing to files.  Should the codecs be applied when using the 
pickle/unpickle function?  For example, the standard syntax is:

# pickle object
f = open(object, 'wb')
pickle.dump(object, f, 2)
 
# unpickle object
f = open(object, 'rb')
object= pickle.load(f)
 
or should it be:

# pickle object
f = codecs.open(object, 'wb', 'utf-8')
pickle.dump(object, f, 2)


# unpickle object
f = codecs.open(object, 'rb', 'utf-8')
object= pickle.load(f)
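
A pickle stream is bytes rather than text, so the first form (a plain file opened in binary mode) is the usual choice; a text codec wrapper has nothing to encode or decode there. A minimal sketch, assuming Python 3 (the thread itself is Python 2); the sample object and the file name are made up for illustration:

```python
import os
import pickle
import tempfile

# Hypothetical sample object containing non-ASCII text.
record = {"name": u"caf\u00e9", "tags": [u"\u201cquoted\u201d"]}

# Hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "data.pkl")

# pickle object: binary mode, no text codec involved
with open(path, "wb") as f:
    pickle.dump(record, f, 2)

# unpickle object
with open(path, "rb") as f:
    restored = pickle.load(f)
```

The unicode strings inside the object survive the round trip because pickle records them itself; the file only ever sees bytes.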


Re: [Tutor] Picking up citations

2009-02-10 Thread Dinesh B Vadhia

Kent

The citation without the name is perfect (and this appears to be how most 
citation parsers work).  There are two issues in the test run:

1.  The parallel citation 422 U.S. 490, 499 n. 10, 95 S.Ct. 2197, 2205 n. 10, 
45 L.Ed.2d 343 (1975) is resolved as:

422 U.S. 490 (1975)
499 n. 10 (1975)
95 S.Ct. 2197 (1975)
2205 n. 10 (1975)
45 L.Ed.2d 343 (1975)

instead of as:

422 U.S. 490, 499 n. 10 (1975)
95 S.Ct. 2197, 2205 n. 10 (1975)
45 L.Ed.2d 343 (1975)

ie. parsing the second page references should pick up all alphanumeric chars 
between the commas.

2. It doesn't parse the last citation ie. 463 U.S. 29, 43, 103 S.Ct. 2856, 
2867, 77 L.Ed.2d 443 (1983).  I tested it on another sample text and it missed 
the last citation too.
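
The grouping asked for in point 1 can be sketched with a regular expression instead of PLY. This is only an illustration under stated assumptions: the reporter list is limited to the abbreviations seen in this thread, the names REPORTER, PIN, CITE, YEAR and split_parallel are made up, and it is not Kent's parser. Written for Python 3.

```python
import re

# Reporter abbreviations seen in this thread; a real list would be much longer.
REPORTER = r"(?:U\.S\.|S\.Ct\.|L\.Ed\.2d|F\.[23]d)"
# A pin cite: a page or page range, optionally followed by a footnote ("n. 10").
# The lookaheads reject a number that is really the volume of the next citation.
PIN = r"\d+(?:-\d+)?(?:\s+n\.\s*\d+)?(?![\d-])(?!\s+[A-Z])"
CITE = re.compile(r"(\d+)\s+(%s)\s+(\d+)(?:,\s+(%s))?" % (REPORTER, PIN))
YEAR = re.compile(r"\([^()]*?(\d{4})\)")


def split_parallel(text):
    """Split one parallel citation into 'vol reporter page[, pin] (year)' strings."""
    year = YEAR.search(text)
    suffix = " (%s)" % year.group(1) if year else ""
    results = []
    for vol, reporter, page, pin in CITE.findall(text):
        cite = "%s %s %s" % (vol, reporter, page)
        if pin:
            cite += ", " + pin
        results.append(cite + suffix)
    return results


first = split_parallel(
    "422 U.S. 490, 499 n. 10, 95 S.Ct. 2197, 2205 n. 10, 45 L.Ed.2d 343 (1975)")
second = split_parallel(
    "463 U.S. 29, 43, 103 S.Ct. 2856, 2867, 77 L.Ed.2d 443 (1983)")
```

The pin cite is attached to the preceding citation only when the number after the comma is not itself followed by a reporter abbreviation, which avoids swallowing the next reference.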

Thanks!

Dinesh


 
From: Kent Johnson 
Sent: Tuesday, February 10, 2009 4:01 AM
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Subject: Re: [Tutor] Picking up citations


On Mon, Feb 9, 2009 at 12:51 PM, Dinesh B Vadhia
 wrote:
> Kent /Emmanuel
>
> Below are the results using the PLY parser and Regex versions on the
> attached 'sierra' data which I think covers the common formats.  Here are
> some 'fully unparsed" citations that were missed by the programs:
>
> Smith v. Wisconsin Dept. of Agriculture, 23 F.3d 1134, 1141 (7th Cir.1994)
>
> Indemnified Capital Investments, S.A. v. R.J. O'Brien & Assoc., Inc., 12
> F.3d 1406, 1409 (7th Cir.1993).
>
> Hunt v. Washington Apple Advertising Commn., 432 U.S. 333, 343, 97 S.Ct.
> 2434, 2441, 53 L.Ed.2d 383 (1977)
>
> Idaho Conservation League v. Mumma, 956 F.2d 1508, 1517-18 (9th Cir.1992)

A few issues here:
S.A. - this is hard, to allow this while filtering out sentences
R.J. O'Brien, etc. - Loosening up the rules for the second name can allow these
1517-18 - allow page ranges

The name issues are getting to be too much for me. Attached is a PLY
version that just pulls out the citation without the name; at one
point you indicated that would work for you.

Kent

Re: [Tutor] Picking up citations

2009-02-10 Thread Dinesh B Vadhia

I'm guessing that  '499 n. 10' is a page reference ie. page 499, point number 
10.  Legal citations are all a mystery - they even have their own citation 
bluebook (http://www.legalbluebook.com/) !

Dinesh




From: Kent Johnson 
Sent: Tuesday, February 10, 2009 10:57 AM
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Subject: Re: [Tutor] Picking up citations


On Tue, Feb 10, 2009 at 12:42 PM, Dinesh B Vadhia
 wrote:
> Kent
>
> The citation without the name is perfect (and this appears to be how most
> citation parsers work).  There are two issues in the test run:
>
> 1.  The parallel citation 422 U.S. 490, 499 n. 10, 95 S.Ct. 2197, 2205 n.
> 10, 45 L.Ed.2d 343 (1975) is resolved as:
>
> 422 U.S. 490 (1975)
> 499 n. 10 (1975)
> 95 S.Ct. 2197 (1975)
> 2205 n. 10 (1975)
> 45 L.Ed.2d 343 (1975)
>
> instead of as:
>
> 422 U.S. 490, 499 n. 10 (1975)
> 95 S.Ct. 2197, 2205 n. 10 (1975)
> 45 L.Ed.2d 343 (1975)
>
> ie. parsing the second page references should pick up all alphanumeric chars
> between the commas.

So 499 n. 10 is a page reference? I can't pick up all alphanumeric
chars between commas, that would include a second reference.

Kent

Re: [Tutor] Picking up citations

2009-02-10 Thread Dinesh B Vadhia

You're probably right, Paul.  But my assumption is that the originators of 
legal documents pay a little more attention to getting the citation correct and 
in the right format than, say, Joe Bloggs does when completing an address block.

I think that Kent has reached the end of his commendable effort.  I'll test out 
the latest version in anger over the coming weeks on large numbers of legal 
documents.

Dinesh





Message: 2
Date: Tue, 10 Feb 2009 14:29:20 -0600
From: "Paul McGuire" 
Subject: Re: [Tutor] Picking up citations
To: 
Message-ID: <0a8f5cca89bf4b08becd3c4b86f18...@awa2>
Content-Type: text/plain; charset="us-ascii"

Dinesh and Kent -

I've been lurking along as you run this problem to ground.  The syntax you
are working on looks very slippery, and reminds me of some of the issues I
had writing a generic street address parser with pyparsing
(http://pyparsing.wikispaces.com/file/view/streetAddressParser.py).  Mailing
list companies spend beaucoup $$$ trying to parse addresses in order to
filter duplicates, to group by zip code, street, neighborhood, etc., and
this citation format looks similarly scary.  

Congratulations on getting to a 95% solution using PLY.

-- Paul




[Tutor] Removing control characters

2009-02-19 Thread Dinesh B Vadhia

I want a regex to remove control characters (< chr(32) or > chr(126)) from 
strings, i.e.

# replace all chars NOT A-Z, a-z, 0-9, [-';.] with " "
line = re.sub(r"[^A-Za-z0-9-';.]", " ", line)

1.  What is the best way to include all the required chars rather than list 
them all within the r"" ?
2.  How do you handle the inclusion of the quotation mark " ?

Cheers

Dinesh


Re: [Tutor] Removing control characters

2009-02-19 Thread Dinesh B Vadhia

At the bottom of the link http://code.activestate.com/recipes/303342/ there are 
list comprehensions for string manipulation ie.

import string

str = 'Chris Perkins : 224-7992'
set = '0123456789'
r = '$'

# 1) Keeping only a given set of characters.

print  ''.join([c for c in str if c in set])

> '2247992'

# 2) Deleting a given set of characters.

print  ''.join([c for c in str if c not in set])

> 'Chris Perkins : -'

The missing one is

# 3) Replacing a set of characters with a single character ie.

for c in str:
    if c in set:
        string.replace (c, r)

to give

> 'Chris Perkins : $$$-$$$$'

My solution is:

print ''.join[string.replace(c, r) for c in str if c in set]

But, this returns a syntax error.  Any idea why?

Ta!

Dinesh

From: Kent Johnson 
Sent: Thursday, February 19, 2009 8:03 AM
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Subject: Re: [Tutor] Removing control characters

On Thu, Feb 19, 2009 at 10:14 AM, Dinesh B Vadhia
 wrote:
> I want a regex to remove control characters (< chr(32) and > chr(126)) from
> strings ie.
>
> line = re.sub(r"[^a-z0-9-';.]", " ", line)   # replace all chars NOT A-Z,
> a-z, 0-9, [-';.] with " "
>
> 1.  What is the best way to include all the required chars rather than list
> them all within the r"" ?

You have to list either the chars you want, as you have done, or the
ones you don't want. You could use
r'[\x00-\x1f\x7f-\xff]' or
r'[^\x20-\x7e]'

> 2.  How do you handle the inclusion of the quotation mark " ?

Use \", that works even in a raw string.

By the way string.translate() is likely to be faster for this purpose
than re.sub(). This recipe might help:
http://code.activestate.com/recipes/303342/
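
Both suggestions can be sketched side by side. This assumes Python 3 (the thread is Python 2, where string.translate() takes a 256-character table instead of a dict); the function names and the sample string are made up for illustration.

```python
import re

# Anything outside the printable ASCII range 0x20-0x7e.
NON_PRINTABLE = re.compile(r"[^\x20-\x7e]")


def clean_with_re(line, replacement=" "):
    """Replace every non-printable-ASCII character with `replacement`."""
    return NON_PRINTABLE.sub(replacement, line)


# Translation table for str.translate: maps each unwanted code point below 256
# to a space. Characters above 255 are left alone by this particular table.
TABLE = {n: " " for n in range(256) if n < 32 or n > 126}


def clean_with_translate(line):
    """Replace unwanted characters below code point 256 with a space."""
    return line.translate(TABLE)


sample = "Concepts\x00Hard\tcandy\x7f"  # made-up input
```

Whether translate is actually faster than re.sub depends on the data, so it is worth timing both on real input before choosing.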

Kent

Re: [Tutor] Removing control characters

2009-02-19 Thread Dinesh B Vadhia

Okay, here is a combination of Mark's suggestions and yours:

> # string of all chars
> a = ''.join([chr(n) for n in range(256)])
> a
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'

> # string of wanted chars
> b = ''.join([n for n in a if ord(n) >= 32 and ord(n) <= 126])
> b
' !"#$%&\'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~'

> # string of unwanted chars > ord(126)
> c = ''.join([n for n in a if ord(n) < 32 or ord(n) > 126])
> c
'\x00\x01\x02\x03\x04\x05\x06\x07\x08\t\n\x0b\x0c\r\x0e\x0f\x10\x11\x12\x13\x14\x15\x16\x17\x18\x19\x1a\x1b\x1c\x1d\x1e\x1f\x7f\x80\x81\x82\x83\x84\x85\x86\x87\x88\x89\x8a\x8b\x8c\x8d\x8e\x8f\x90\x91\x92\x93\x94\x95\x96\x97\x98\x99\x9a\x9b\x9c\x9d\x9e\x9f\xa0\xa1\xa2\xa3\xa4\xa5\xa6\xa7\xa8\xa9\xaa\xab\xac\xad\xae\xaf\xb0\xb1\xb2\xb3\xb4\xb5\xb6\xb7\xb8\xb9\xba\xbb\xbc\xbd\xbe\xbf\xc0\xc1\xc2\xc3\xc4\xc5\xc6\xc7\xc8\xc9\xca\xcb\xcc\xcd\xce\xcf\xd0\xd1\xd2\xd3\xd4\xd5\xd6\xd7\xd8\xd9\xda\xdb\xdc\xdd\xde\xdf\xe0\xe1\xe2\xe3\xe4\xe5\xe6\xe7\xe8\xe9\xea\xeb\xec\xed\xee\xef\xf0\xf1\xf2\xf3\xf4\xf5\xf6\xf7\xf8\xf9\xfa\xfb\xfc\xfd\xfe\xff'

> # the string to process
> s = "Product Concepts\xe2\x80\x94Hard candy with an innovative twist, 
> Internet Archive: Wayback Machine. [online] Mar. 25, 2004. Retrieved from the 
> Internet http://www.confectionery-innovations.com>."

> # replace unwanted chars in string s with " "
> t = "".join([(" " if n in c else n) for n in s if n not in c])
> t
'Product ConceptsHard candy with an innovative twist, Internet Archive: Wayback 
Machine. [online] Mar. 25, 2004. Retrieved from the Internet http://www.confectionery-innovations.com>.'

This last bit doesn't work ie. replacing the unwanted chars with " " - eg. 
'ConceptsHard'.  What's missing?
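
The likely cause is the trailing "if n not in c" filter: it drops the unwanted characters before the conditional expression ever sees them, so there is nothing left to replace. A minimal sketch of the difference, assuming Python 3 and a made-up sample string (the names unwanted, filtered and replaced are illustrative):

```python
# Characters to blank out: everything outside printable ASCII (code points < 256 only).
unwanted = set(chr(n) for n in range(256) if n < 32 or n > 126)

s = "Product Concepts\x94Hard candy"  # made-up sample with one unwanted character

# The trailing filter removes the unwanted characters, so none are replaced.
filtered = "".join([(" " if ch in unwanted else ch) for ch in s if ch not in unwanted])

# Without the filter, every character is kept and the unwanted ones become spaces.
replaced = "".join([(" " if ch in unwanted else ch) for ch in s])
```

In the Python 2 string above the dash is three bytes (\xe2\x80\x94), so dropping the filter there would produce three spaces rather than one.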

Dinesh



From: Kent Johnson 
Sent: Thursday, February 19, 2009 12:36 PM
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Subject: Re: [Tutor] Removing control characters


On Thu, Feb 19, 2009 at 2:25 PM, Dinesh B Vadhia
 wrote:

> # 3) Replacing a set of characters with a single character ie.
>
> for c in str:
>     if c in set:
>         string.replace (c, r)
>
> to give
>
>> 'Chris Perkins : $$$-$$$$'
> My solution is:
>
> print ''.join[string.replace(c, r) for c in str if c in set]

With the syntax corrected this will not do what you want; the "if c in
set" filters the characters in the result, so the result will contain
only the replacement characters. You would need something like
''.join([ (r if c in set else c) for c in str])

Note that both 'set' and 'str' are built-in names and therefore poor
choices for variable names.
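
Putting the three operations together with names that do not shadow built-ins gives something like the sketch below (Python 3; the variable names are illustrative). The syntax error in the original attempt most likely comes from ''.join[...], which subscripts the method instead of calling it; join needs parentheses around its argument.

```python
text = "Chris Perkins : 224-7992"
digits = set("0123456789")
mark = "$"

# 1) Keeping only a given set of characters.
kept = "".join([c for c in text if c in digits])

# 2) Deleting a given set of characters.
deleted = "".join([c for c in text if c not in digits])

# 3) Replacing a set of characters with a single character.
replaced = "".join([(mark if c in digits else c) for c in text])
```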

Kent

[Tutor] Standardizing on Unicode and utf8

2009-02-20 Thread Dinesh B Vadhia

We want to standardize on unicode and utf8 and would like to clarify and verify 
their use to minimize encode()/decode()'ing:

1.  Python source files 
Use the header: # -*- coding: utf8 -*-

2.  Reading files
In most cases, we don't know the source encoding of the files being read.  Do 
we have to decode('utf8') after reading from file?

3. Writing files
We will always write to files in utf8.  Do we have to encode('utf8') before 
writing to file?

Is there anything else that we have to consider?
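
For points 2 and 3, one common arrangement is to let the file object do the decoding and encoding, so that the program only handles unicode text in between. A minimal sketch using io.open (available in Python 2.6+ and 3), with a made-up sample string and file name. It assumes the input really is UTF-8; a file in an unknown encoding can raise UnicodeDecodeError when read this way.

```python
import io
import os
import tempfile

text = u"\u201cHard candy\u201d with an innovative twist"  # made-up sample

# Hypothetical file name inside a fresh temporary directory.
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

# Writing: the file object encodes unicode text to UTF-8 bytes.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(text)

# Reading: the file object decodes UTF-8 bytes back to unicode text.
with io.open(path, "r", encoding="utf-8") as f:
    round_trip = f.read()

# Raw bytes on disk, for comparison.
with io.open(path, "rb") as f:
    raw = f.read()
```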

Cheers

Dinesh


[Tutor] Sorting large numbers of co-ordinate pairs

2009-03-12 Thread Dinesh B Vadhia

Have a large number (> 1bn) of integer co-ordinates (i, j).  The i are ordered 
and the j unordered.

I want to create (j, i) with j ordered and i unordered ie.

from:

...
6940, 22886
6940, 38277
6940, 43788
...

to:
...
38277, 567
38277, 90023
38277, 6940
...

I've tried the dictionary route and it works perfectly for small set of 
co-ordinate pairs but not for large sets as it hits memory capacity.

Any ideas how I could do this?
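
One approach that avoids holding everything in memory is an external merge sort: sort manageable chunks, write each to a temporary file, then merge the sorted files. The sketch below is an illustration only (Python 3; the function names, the text file format and the chunk size are assumptions), not a tuned solution for a billion pairs.

```python
import heapq
import os
import tempfile
from itertools import islice


def _write_run(pairs):
    """Write one sorted chunk to a temporary file, one 'j,i' pair per line."""
    fd, path = tempfile.mkstemp(suffix=".run", text=True)
    with os.fdopen(fd, "w") as f:
        for j, i in pairs:
            f.write("%d,%d\n" % (j, i))
    return path


def _read_run(path):
    """Yield (j, i) tuples back from a run file, in the order they were written."""
    with open(path) as f:
        for line in f:
            j, i = line.split(",")
            yield int(j), int(i)


def swap_and_sort(pairs, chunk_size=1000000):
    """Yield (j, i) ordered by j, given an iterable of (i, j) pairs."""
    pairs = iter(pairs)
    runs = []
    try:
        while True:
            # Swap each pair and sort one chunk that fits in memory.
            chunk = sorted((j, i) for i, j in islice(pairs, chunk_size))
            if not chunk:
                break
            runs.append(_write_run(chunk))
        # heapq.merge reads the sorted runs lazily, one pair at a time.
        for pair in heapq.merge(*[_read_run(p) for p in runs]):
            yield pair
    finally:
        for p in runs:
            os.remove(p)
```

Only one chunk is in memory at a time during the first pass, and the merge pass holds one pair per run file.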

Dinesh

[Tutor] 32-bit libaries on 64-bit Windows

2009-03-16 Thread Dinesh B Vadhia

Does anyone know if 32-bit Python libraries will work with 64-bit Python under 
64-bit Windows?  For example, will 32-bit Numpy or Scipy work under 64-bit 
Python?  Cheers ...
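
Compiled extension modules have to match the interpreter's architecture, so a 32-bit build of a C extension will not load in a 64-bit Python; pure-Python libraries are unaffected. A small sketch for checking which interpreter is running (the variable names are illustrative):

```python
import struct
import sys

# Size of a C pointer in bits: 32 on a 32-bit interpreter, 64 on a 64-bit one.
pointer_bits = struct.calcsize("P") * 8

# sys.maxsize exceeds 2**32 only on a 64-bit build.
is_64bit = sys.maxsize > 2 ** 32
```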

Dinesh



[Tutor] parse text for paragraghs/sections

2009-04-20 Thread Dinesh B Vadhia

Hi!  I want to parse text and pickup sections.  For example, from the text:

t = """abc  DEF ghi jkl  MNO pqr"""

... pickup all text between the tags  and  and replace with another 
piece of text.

I tried 

t = re.sub(r"\[A-Za-z0-9]\", "DBV", t)

... but it doesn't work.

How do you do this with re?  
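
The tag names in the example were lost when the message was archived, so the sketch below uses a hypothetical <sec> ... </sec> pair. It shows one way to replace everything between an opening and a closing tag with re.sub, using a non-greedy match (Python 3):

```python
import re

t = "abc <sec>DEF</sec> ghi jkl <sec>MNO</sec> pqr"  # <sec> is a hypothetical tag name

# .*? is non-greedy, so each match stops at the nearest closing tag;
# re.DOTALL lets a section span line breaks.
SECTION = re.compile(r"<sec>.*?</sec>", re.DOTALL)

result = SECTION.sub("DBV", t)
```

A character class such as [A-Za-z0-9] matches exactly one character, which is one reason the attempt above would not match a whole section.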

Thanks

Dinesh

Re: [Tutor] PDF to text conversion

2009-04-21 Thread Dinesh B Vadhia

Hi Robert

I don't have an answer but you have my sympathy.  I've been looking for a 
quality pdf to text converter for months and not turned up anything useful.  
I've tried many free programs, which are poor.  I too wanted a Python-only 
solution and tried pyPdf but that didn't work.  Just today I downloaded a trial 
version of a so-called top-notch converter and it produced unfaithful text.   
Not sure what the answer is!

Dinesh




Message: 5
Date: Tue, 21 Apr 2009 13:44:16 -0400
From: Robert Berman 
Subject: Re: [Tutor] PDF to text conversion
To: "Emad Nawfal ( )" 
Cc: tutor@python.org
Message-ID: <49ee05f0.3080...@cfl.rr.com>
Content-Type: text/plain; charset=windows-1256; format=flowed

Hello Emad,

I have seriously looked at the documentation associated with pyPDF. This 
seems to have the page as its smallest element of work, and what i need 
is a line by line process to go from .PDF format to Text. I don't think 
pyPDF will meet my needs but thank you for bringing it to my attention.

Thanks,


Robert Berman

Re: [Tutor] PDF to text conversion

2009-04-22 Thread Dinesh B Vadhia

The best converter so far is pdftotext from http://www.glyphandcog.com/ who 
maintain an open source project at http://www.foolabs.com/xpdf/.

It's not a Python library but you can call pdftotext from within Python using 
os.system().  I used the pdftotext -layout option and that gave the best 
result.  hth.
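
A sketch of the same call using the subprocess module instead of os.system(), which avoids building a shell command string by hand. It assumes pdftotext is installed and on the PATH; the helper names are made up for illustration.

```python
import subprocess


def build_command(pdf_path, txt_path):
    """Argument list for pdftotext with the -layout option mentioned above."""
    return ["pdftotext", "-layout", pdf_path, txt_path]


def pdf_to_text(pdf_path, txt_path):
    """Run pdftotext; raises CalledProcessError if it exits with an error."""
    subprocess.check_call(build_command(pdf_path, txt_path))
```

Passing the arguments as a list means file names containing spaces need no extra quoting.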

dinesh




Message: 4
Date: Tue, 21 Apr 2009 18:37:39 -0400
From: Robert Berman 
Subject: Re: [Tutor] PDF to text conversion
To: tutor@python.org
Message-ID: <49ee4ab3.4040...@cfl.rr.com>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

First, thanks to everyone who contributed to this thread. I have a 
number of possible solutions and a number of paths to pursue to 
determine which avenue I should take to resolve this remaining issue. I 
did try the itools library and while everything installed nicely, most 
of the tests failed so I am not particularly overjoyed with the results.

Thank you Dinesh for the vote of sympathy. I do appreciate it.

I did use Adobe Reader to convert the history PDF file into a text file 
and it did seem to do it faithfully. So now I will work out a parsing 
function to extract my data and send it to a SQLLITE database.

I am thrilled both with the number of suggestions I have received from 
this group and the quality of the suggestions.

Thanks again,

Robert Berman


[Tutor] finding mismatched or unpaired html tags

2009-04-28 Thread Dinesh B Vadhia

I'm processing tens of thousands of html files and a few of them contain 
mismatched tags and ElementTree throws the error:

"Unexpected error opening J:/F2/663/blahblah.html: mismatched tag: line 124, 
column 8"

I now want to scan each file and simply identify each mismatched or unpaired 
tags (by line number) in each file.  I've read the ElementTree docs and cannot 
see anything obvious how to do this.  I know this is a common problem but 
feeling a bit clueless here - any ideas?

Dinesh

Re: [Tutor] finding mismatched or unpaired html tags

2009-04-28 Thread Dinesh B Vadhia

A.T. / Marty

I'd prefer that the html parser didn't replace the missing tags as I want to 
know where and what the problems are.  Also, the source html documents were 
generated by another computer ie. they are not web page documents.  My sense is 
that it is only a few files out of tens of thousands.  Cheers ...

Dinesh

Message: 7
Date: Tue, 28 Apr 2009 08:54:33 -0500
From: Martin Walsh 
Subject: Re: [Tutor] finding mismatched or unpaired html tags
To: "tutor@python.org" 
Message-ID: <49f70a99.3050...@mwalsh.org>
Content-Type: text/plain; charset=us-ascii

A.T.Hofkamp wrote:
> Dinesh B Vadhia wrote:
>> I'm processing tens of thousands of html files and a few of them
>> contain mismatched tags and ElementTree throws the error:
>>
>> "Unexpected error opening J:/F2/663/blahblah.html: mismatched tag:
>> line 124, column 8"
>>
>> I now want to scan each file and simply identify each mismatched or
>> unpaired
> tags (by line number) in each file. I've read the ElementTree docs and
> cannot
> see anything obvious how to do this. I know this is a common problem but
> feeling a bit clueless here - any ideas?
>>
> 
> Don't use elementTree, use BeautifulSoup instead.
> 
> elementTree expects perfect input, typically generated by another computer.
> BeautifulSoup is designed to handle your everyday HTML page, filled with
> errors of all possible kinds.

But it also modifies the source html by default, adding closing tags,
etc. Important to know, I suppose, if you intend to re-write the html
files you parse with BeautifulSoup.

Also, unless you're running python 3.0 or greater, use the 3.0.x series
of BeautifulSoup -- otherwise you may run into the same issue.

http://www.crummy.com/software/BeautifulSoup/3.1-problems.html

HTH,
Marty


Re: [Tutor] finding mismatched or unpaired html tags

2009-04-28 Thread Dinesh B Vadhia

This is the error and traceback:

Unexpected error opening J:/F2/html: mismatched tag: line 124, column 8

Traceback (most recent call last):
  File "C:\py", line 492, in 
raw = extractText(xhtmlfile)
  File "C:\py", line 334, in extractText
tree = make_tree(xhtmlfile)
  File "py", line 169, in make_tree
return tree
UnboundLocalError: local variable 'tree' referenced before assignment
 

Here is line 124, col 8 and I cannot see any obvious missing/mismatched tags:

"As to the present time I am unable physical and mentally to secure all this 
information at present."

Dinesh




From: Kent Johnson 
Sent: Tuesday, April 28, 2009 7:13 AM
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Subject: Re: [Tutor] finding mismatched or unpaired html tags


On Tue, Apr 28, 2009 at 8:54 AM, Dinesh B Vadhia
 wrote:
> I'm processing tens of thousands of html files and a few of them contain
> mismatched tags and ElementTree throws the error:
>
> "Unexpected error opening J:/F2/663/blahblah.html: mismatched tag: line 124,
> column 8"
>
> I now want to scan each file and simply identify each mismatched or unpaired
> tags (by line number) in each file.  I've read the ElementTree docs and
> cannot see anything obvious how to do this.  I know this is a common problem
> but feeling a bit clueless here - any ideas?

It seems like the exception gives you the line number. What kind of
exception is raised? The exception object may contain the line and
column in a more accessible form, so you could catch the exception,
get the line number, then read that line out of the file and show it.
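
That idea can be sketched as follows, assuming a Python version where xml.etree.ElementTree raises ParseError with a position attribute (3.2 or later; the thread predates this). The function name and the sample document are made up.

```python
import xml.etree.ElementTree as ET


def find_parse_error(text):
    """Return (line, column, source line, message) for the first parse error, or None."""
    try:
        ET.fromstring(text)
    except ET.ParseError as err:
        line, column = err.position  # 1-based line, 0-based column
        return line, column, text.splitlines()[line - 1], str(err)
    return None


broken = "<html>\n<body>\n<p>My Name/p>\n</body>\n</html>"  # made-up document
report = find_parse_error(broken)
```

Note that the reported line is where the parser noticed the mismatch (the closing tag that did not fit), which can be well after the line containing the actual mistake.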

Kent

Re: [Tutor] finding mismatched or unpaired html tags

2009-04-28 Thread Dinesh B Vadhia

Found the mismatched tag on line 94:

"My Name in Nelma Lois Thornton-S.S. No. sjn-yz-yokv/p>"

should be:

"My Name in Nelma Lois Thornton-S.S. No. sjn-yz-yokv</p>"

I'll run all the html files through a simple script to identify the mismatches 
using etree.  Thanks.

Dinesh

From: Kent Johnson 
Sent: Tuesday, April 28, 2009 8:17 AM
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Subject: Re: [Tutor] finding mismatched or unpaired html tags

On Tue, Apr 28, 2009 at 10:41 AM, Dinesh B Vadhia
 wrote:
> This is the error and traceback:
>
> Unexpected error opening J:/F2/html: mismatched tag: line 124, column 8
>
> Traceback (most recent call last):
>   File "C:\py", line 492, in 
> raw = extractText(xhtmlfile)
>   File "C:\py", line 334, in extractText
> tree = make_tree(xhtmlfile)
>   File "py", line 169, in make_tree
> return tree
> UnboundLocalError: local variable 'tree' referenced before assignment

This is inconsistent. The exception in the stack trace is from a
coding error in extractText. It looks like maybe ExtractText is
catching exceptions and printing them, and a bug in the exception
handling is causing the UnboundLocalError
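
The actual make_tree is not shown in the thread, so the following is only a guess at the pattern being described: a function that prints the parse error and then returns a variable that was never assigned. Both function bodies are hypothetical (Python 3):

```python
import xml.etree.ElementTree as ET


def make_tree_buggy(text):
    """Hypothetical reconstruction of the bug described above."""
    try:
        tree = ET.fromstring(text)
    except ET.ParseError as err:
        print("Unexpected error: %s" % err)
    return tree  # 'tree' was never assigned if parsing failed


def make_tree_fixed(text):
    """Return the parsed tree, or None when the document cannot be parsed."""
    try:
        return ET.fromstring(text)
    except ET.ParseError as err:
        print("Unexpected error: %s" % err)
        return None
```

With the fixed version the caller has to check for None, but the original parse error is no longer hidden behind an UnboundLocalError.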

> Here is line 124, col 8 and I cannot see any obvious missing/mismatched
> tags:
>
> "As to the present time I am unable physical and mentally to secure all
> this information at present."

If you look at a few more lines do you see anything untoward? Perhaps
there is a missing  before the , for example? I don't think 
is allowed inside every tag.

Kent

Re: [Tutor] finding mismatched or unpaired html tags

2009-04-28 Thread Dinesh B Vadhia

Stefan / Alan et al

Thank you for all the advice and links.  A simple script using etree is 
scanning 500K+ xhtml files, and 2 files with mismatched tags have been found so 
far, which can be fixed manually.  I'll definitely look into "tidy" as it sounds 
pretty cool.  Because we are running data processing programs on a 64-bit 
Windows box (yes, I know, I know ...) using 64-bit Python, we can only use 
pure-Python libraries.  I believe that lxml uses C libraries.  Again, thanks to 
everyone - a terrific community as usual!

Message: 5
Date: Tue, 28 Apr 2009 19:39:17 +0200
From: Stefan Behnel 
Subject: Re: [Tutor] finding mismatched or unpaired html tags
To: tutor@python.org
Message-ID: 
Content-Type: text/plain; charset=ISO-8859-1

A.T.Hofkamp wrote:
> Dinesh B Vadhia wrote:
>> I'm processing tens of thousands of html files and a few of them
>> contain mismatched tags and ElementTree throws the error:
>>
>> "Unexpected error opening J:/F2/663/blahblah.html: mismatched tag:
>> line 124, column 8"
>>
>> I now want to scan each file and simply identify each mismatched or
>> unpaired
> tags (by line number) in each file. I've read the ElementTree docs and
> cannot
> see anything obvious how to do this. I know this is a common problem but
> feeling a bit clueless here - any ideas?
> 
> Don't use elementTree, use BeautifulSoup instead.

Actually, now that the code is there anyway, the OP might be happier with
lxml.html. It's a lot faster than BeautifulSoup, uses less memory, and
often parses broken HTML better. It's also more user friendly for many HTML
tasks.

http://codespeak.net/lxml/lxmlhtml.html

This might also be worth a read:

http://blog.ianbicking.org/2008/12/10/lxml-an-underappreciated-web-scraping-library/

Stefan

Re: [Tutor] finding mismatched or unpaired html tags

2009-04-29 Thread Dinesh B Vadhia

Lie / Alan

re: If the source document was generated by a computer, and it produces invalid 
markup, shouldn't that be considered a bug in the producing program?

Yes, absolutely but we don't have access to the producing program only the 
produced xhtml files.


Dinesh



Message: 7
Date: Wed, 29 Apr 2009 08:35:16 +0100
From: "Alan Gauld" 
Subject: Re: [Tutor] finding mismatched or unpaired html tags
To: tutor@python.org
Message-ID: 
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
reply-type=response


"Lie Ryan"  wrote

>> documents were generated by another computer ie. they are not web page 
>> documents.  
> 
> If the source document was generated by a computer, and it produces 
> invalid markup, shouldn't that be considered a bug in the producing 

ElementTree parses xml, the source docs are html.
Valid html may not be valid xml so the source could be correct 
even though it doesn't parse properly in ElementTree.

OTOH you could be right! :-)

Alan G.

[Tutor] Dictionary, integer, compression

2009-04-29 Thread Dinesh B Vadhia

This could be a question for the comp.lang.python list but I'll try here first:

Say, you have a dictionary of integers, are the integers stored in a compressed 
integer format or as integers ie. are integers encoded before being stored in 
the dictionary and then decoded when read?

Dinesh





Re: [Tutor] Dictionary, integer, compression

2009-04-29 Thread Dinesh B Vadhia

Alan

I want to perform test runs on my local machine with very large numbers of 
integers stored in a dictionary.  As the Python dictionary is a built-in 
type I thought that for very large dictionaries there could be compression. 
 Done correctly, integer compression wouldn't affect performance but could 
enhance it.  Weird, I know!  I'll check in with the comp.lang.python lot.

Dinesh




Message: 3
Date: Wed, 29 Apr 2009 17:35:53 +0100
From: "Alan Gauld" 
Subject: Re: [Tutor] Dictionary, integer, compression
To: tutor@python.org
Message-ID: 
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
reply-type=original


"Dinesh B Vadhia"  wrote

> Say, you have a dictionary of integers, are the integers stored 
> in a compressed integer format or as integers ie. are integers 
> encoded before being stored in the dictionary and then 
> decoded when read?

I can't think of any reason to compress them, I imagine they 
are stored as integers. But given the way Python handles 
integers with arbitrarily long numbers etc it may well be more 
complex than a simple integer (ie 4 byte number). But any 
form of compression would be likely to hit performance 
so I doubt that they would be compressed.
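
This can be checked empirically. The sketch below is specific to CPython and Python 3, and the exact byte counts vary by version and platform; it shows that each integer is a full object whose size grows with its magnitude, and that the array module is one way to pack many integers tightly when memory is the concern.

```python
import sys
from array import array

small = sys.getsizeof(1)          # size of a small int object, in bytes
large = sys.getsizeof(10 ** 100)  # a big int needs a larger object

d = {n: n * n for n in range(1000)}
dict_bytes = sys.getsizeof(d)     # the hash table only, not the int objects it refers to

# If memory is the concern, array stores raw 8-byte values instead of objects.
packed = array("q", range(1000))
packed_bytes = packed.itemsize * len(packed)
```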

Is there anything that made you think they might be?

HTH


-- 
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/

 

[Tutor] reading nested folders in gzip files

2009-05-18 Thread Dinesh B Vadhia

The structure of the gzip files is:

gzip archive
    folderA
        folderB
            list of folderC's
                each folderC contains the target files

I want to open the gzip archive, open folderA, open folderB, get the list of 
target files in folderC, and extract each file in folderC individually.

I've used gzip before but cannot see how to move from folderA to folder B 
within the archive.  Any ideas?
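
The gzip format compresses a single stream and has no notion of folders, so an archive like this is most likely a gzipped tar file (.tar.gz). If that assumption holds, the tarfile module can walk the members by path. A minimal sketch for Python 3; the function name and the demo file names are made up.

```python
import io
import os
import tarfile
import tempfile


def iter_files(archive_path, prefix):
    """Yield (member name, contents) for each regular file under `prefix`."""
    with tarfile.open(archive_path, "r:gz") as archive:
        for member in archive:
            if member.isfile() and member.name.startswith(prefix):
                yield member.name, archive.extractfile(member).read()


# Build a tiny archive with the layout described above, purely for demonstration.
demo_path = os.path.join(tempfile.mkdtemp(), "demo.tar.gz")
with tarfile.open(demo_path, "w:gz") as archive:
    for name in ("folderA/folderB/folderC1/one.txt",
                 "folderA/folderB/folderC2/two.txt"):
        data = name.encode("utf-8")
        info = tarfile.TarInfo(name)
        info.size = len(data)
        archive.addfile(info, io.BytesIO(data))

found = sorted(iter_files(demo_path, "folderA/folderB/"))
```

There is no need to "open" each folder in turn: member names already carry the full path inside the archive.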

Dinesh



[Tutor] unicode, utf-8 problem again

2009-06-04 Thread Dinesh B Vadhia

Hi!  I'm processing a large number of xml files that are all declared as utf-8 
encoded in the header ie.



My Python environment has been set for 'utf-8' through site.py.  Additionally, 
the top of each program/module has the declaration:

# -*- coding: utf-8 -*-

But, I still get this error:

Traceback (most recent call last):
...
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 
76: ordinal not in range(128)

What am I missing?

Dinesh



[Tutor] Fw: unicode, utf-8 problem again

2009-06-04 Thread Dinesh B Vadhia

I forgot to add that I'm using elementtree to process the xml files and don't 
(usually) have any problems with that.  Plus, the workaround that works is to 
encode each elementtree output ie.:

thisxmlline = thisxmlline.encode('utf8')

But, this seems odd to me as isn't it already being processed as utf-8?

Dinesh



From: Dinesh B Vadhia 
Sent: Thursday, June 04, 2009 6:47 AM
To: tutor@python.org 
Subject: unicode, utf-8 problem again


Hi!  I'm processing a large number of xml files that are all declared as utf-8 
encoded in the header ie.



My Python environment has been set for 'utf-8' through site.py.  Additionally, 
the top of each program/module has the declaration:

# -*- coding: utf-8 -*-

But, I still get this error:

Traceback (most recent call last):
...
UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in position 
76: ordinal not in range(128)

What am I missing?

Dinesh



Re: [Tutor] unicode, utf-8 problem again

2009-06-04 Thread Dinesh B Vadhia

Okay, I get it now ... reading/writing files with the codecs module and the 
'utf-8' option fixes it.   Thanks!  

From: Christian Witts 
Sent: Thursday, June 04, 2009 7:05 AM
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Subject: Re: [Tutor] unicode, utf-8 problem again

Dinesh B Vadhia wrote:
> Hi!  I'm processing a large number of xml files that are all declared 
> as utf-8 encoded in the header ie.
>  
> 
>  
> My Python environment has been set for 'utf-8' through site.py.  
> Additionally, the top of each program/module has the declaration:
>  
> # -*- coding: utf-8 -*-
>  
> But, I still get this error:
>  
> Traceback (most recent call last):
> ...
> UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in 
> position 76: ordinal not in range(128)
>  
> What am I missing?
>  
> Dinesh
>  
>  
>  
> 
>
>   
Hi,

Take a read through http://evanjones.ca/python-utf8.html which will give 
you insight as to how you should be reading and processing your files.
As for the encoding line "# -*- coding: utf-8 -*-", that is actually to 
declare the character encoding of your script and not of potential data 
it will be working with.
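
The error in the traceback can be reproduced in isolation: it appears whenever unicode text containing a non-ASCII character is encoded with the ascii codec, which Python 2 does implicitly when such text is written to a plain file. A minimal sketch that runs on Python 2 and 3 (the variable names are illustrative):

```python
text = u"\u201cquoted\u201d"  # U+201C is the character named in the traceback

try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True  # 'ascii' codec can't encode character u'\u201c'

# Encoding explicitly as UTF-8 succeeds and produces three bytes per quote mark.
utf8_bytes = text.encode("utf-8")
```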

-- 
Kind Regards,
Christian Witts


Re: [Tutor] unicode, utf-8 problem again

2009-06-04 Thread Dinesh B Vadhia

That was very useful - thanks!  Hopefully, I'm  "all Unicode" now.

From: wesley chun 
Sent: Thursday, June 04, 2009 10:45 AM
To: Dinesh B Vadhia ; tutor@python.org 
Subject: Re: [Tutor] unicode, utf-8 problem again

>>  But, I still get this error:
>>  Traceback (most recent call last):
>> ...
>> UnicodeEncodeError: 'ascii' codec can't encode character u'\u201c' in
>> position 76: ordinal not in range(128)
>>  What am I missing?
>
> Take a read through http://evanjones.ca/python-utf8.html which will give you
> insight as to how you should be reading and processing your files.

in a similar vein, i wrote a shorter blog post awhile ago that focuses
specifically on string processing:
http://wesc.livejournal.com/1743.html ... in it, i also describe the
correct way of thinking about strings in these contexts... the
difference between a string that represents data vs. a "string" which
is made up of various bytes, as in binary files.

hope this helps!
-- wesley
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
"Core Python Programming", Prentice Hall, (c)2007,2001
"Python Fundamentals", Prentice Hall, (c)2009
http://corepython.com

wesley.j.chun :: wescpy-at-gmail.com
python training and technical consulting
cyberweb.consulting : silicon valley, ca
http://cyberwebconsulting.com

[Tutor] string pickling and sqlite blob'ing

2009-06-24 Thread Dinesh B Vadhia

I want to pickle (very long) strings and save them in a sqlite db.  The plan is 
to use pickle dumps() to turn a string into a pickle object and store it in 
sqlite.  After reading the string back from the sqlite db, use pickle loads() 
to turn back into original string.  

- Is this a good approach for storing very long strings?  

- Are the pickle'd strings stored in the sqlite db as a STRING or BLOB? 

Cheers.

Dinesh


Re: [Tutor] string pickling and sqlite blob'ing

2009-06-24 Thread Dinesh B Vadhia

Hi Vince

That's terrific!  Once a string is compressed with gzip.zlib does it make a 
difference whether it is stored in a TEXT or BLOB column?

Dinesh

From: vince spicer 
Sent: Wednesday, June 24, 2009 10:49 AM
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Subject: Re: [Tutor] string pickling and sqlite blob'ing

Pickle is more for storing complex objects (arrays, dict, etc). pickling a 
string makes it bigger.

I have stored large text chunks in text and/or blob columns compressed with 
gzip.zlib.compress and extracted with gzip.zlib.decompress

Comparison:

import cPickle as Pickle
import gzip

x = "asdfasdfasdfasdfasdfasdfasdfasdfasdf"

print len(x)
>> 36

print len(Pickle.dumps(x))
>> 44

print len(gzip.zlib.compress(x))
>> 14

Vince
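
A sketch of the compress-then-store idea using sqlite3 and zlib from the standard library, written for Python 3. The table and column names are invented for the example. Compressed output is arbitrary binary data, so the sketch assumes a BLOB column rather than TEXT.

```python
import sqlite3
import zlib


def store_text(conn, key, text):
    # Compress the UTF-8 bytes; the result is binary, so bind it as a BLOB.
    blob = zlib.compress(text.encode("utf-8"))
    conn.execute("INSERT INTO docs (id, body) VALUES (?, ?)",
                 (key, sqlite3.Binary(blob)))


def load_text(conn, key):
    # Fetch the BLOB, decompress it and decode back to text.
    row = conn.execute("SELECT body FROM docs WHERE id = ?", (key,)).fetchone()
    return zlib.decompress(bytes(row[0])).decode("utf-8")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, body BLOB)")

long_text = u"asdf" * 100000
store_text(conn, 1, long_text)
restored = load_text(conn, 1)
print(restored == long_text)
```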

On Wed, Jun 24, 2009 at 11:17 AM, Dinesh B Vadhia  
wrote:

I want to pickle (very long) strings and save them in a sqlite db.  The plan is 
to use pickle dumps() to turn a string into a pickle object and store it in 
sqlite.  After reading the string back from the sqlite db, use pickle loads() 
to turn back into original string.  

- Is this a good approach for storing very long strings?  

- Are the pickle'd strings stored in the sqlite db as a STRING or BLOB? 

Cheers.

Dinesh



Re: [Tutor] string pickling and sqlite blob'ing

2009-06-25 Thread Dinesh B Vadhia

Alan

On a machine with 6gb of ram, storing very long strings in sqlite caused a 
"sqlite3.OperationalError: Could not decode to UTF-8 column 'j' with text" 
which has been resolved.  This fix then caused a memory error when reading some 
of the strings back from the db.  Hence, I'm trying to work out what the 
problem is and looking for alternative solutions.  It is strange that I can 
insert a long string into sqlite but a memory error is caused when selecting 
it.  Splitting the strings into smaller chunks is the obvious solution but I 
need to sort out the above first since the post-processing after the select is 
on the entire string.

Dinesh




Message: 3
Date: Thu, 25 Jun 2009 00:44:22 +0100
From: "Alan Gauld" 
To: tutor@python.org
Subject: Re: [Tutor] string pickling and sqlite blob'ing
Message-ID: 
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
reply-type=original


"Dinesh B Vadhia"  wrote 

> I want to pickle (very long) strings and save them in a sqlite db.  

Why?
Why not just store the string in the database?
If that turns out to be a problem then think about other 
options - like splitting it into chunks say?
But until you know you have a problem don't try to 
solve it!

> - Is this a good approach for storing very long strings?  

Probably not.

> - Are the pickle'd strings stored in the sqlite db as a STRING or BLOB? 

They could be stored either way, that's up to how you define 
your tables and write your SQL.

In general I expect databases to handle very large quantities of data
either as blobs or as references to a file. Is this a valid approach? 
Write the long string (assuming it's many MB in size) into a text 
file and store that with a unique name. Then store the filename 
in the database.

But first check that you can't store it in the database directly or 
in chunks.


-- 
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/

[Tutor] array and int

2009-06-26 Thread Dinesh B Vadhia

Say, you create an array['i'] for signed integers (which take a minimum 2 
bytes).  A calculation results in an integer that is larger than the range of 
an 'i'.  Normally, Python will convert an 'i' to a 4-byte 'l' integer.  But, 
does the same apply for an array ie. does Python dynamically adjust from 
array['i'] to array['l']?

Before anyone suggests it, I would be using Numpy for arrays but there isn't a 
64-bit version available under Windows that works.

Dinesh
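
A small sketch of what the standard array module does here, written for Python 3.3+ (the 'q' typecode is assumed to be available). An array keeps its typecode: an out-of-range value raises OverflowError instead of widening the array, so any widening has to be done explicitly.

```python
from array import array

counts = array("i", [1, 2, 3])

# A value far outside the range of a C int is rejected, not promoted.
try:
    counts[0] = 2 ** 70
    promoted = True
except OverflowError:
    promoted = False

print(promoted)          # False
print(counts.typecode)   # i

# Widening is explicit: build a new array with a larger typecode.
wider = array("q", list(counts))   # 'q' is a signed integer of at least 8 bytes
wider[0] = 2 ** 40
print(wider[0] == 2 ** 40)         # True
```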

[Tutor] list comprehension problem

2009-07-03 Thread Dinesh B Vadhia

I'm suffering from brain failure (or most likely just being brain less!) and 
need help to create a list comprehension for this problem:

d is a list of integers: d = [0, 8, 4, 4, 4, 7, 2, 5, 1, 1, 5, 11, 11, 1, 6, 3, 
5, 6, 11, 1]

Want to create a new list that adds the current number and the prior number, 
where the prior number is the accumulation of the previous numbers ie.

dd = [0, 8, 12, 16, 20, 27, 29, 34, 35, 36, 41, 52, 63, 64, 70, 73, 78, 84, 95, 
96]

A brute force solution which works is:

>>> dd = []
>>> y = 0
>>> for x in d:
...     y += x
...     dd.append(y)

Is there a list comprehension solution?

Dinesh


Re: [Tutor] list comprehension problem

2009-07-03 Thread Dinesh B Vadhia

d = [0, 8, 4, 4, 4, 7, 2, 5, 1, 1, 5, 11, 11, 1, 6, 3, 5, 6, 11, 1]

and we want:

[0, 8, 12, 16, 20, 27, 29, 34, 35, 36, 41, 52, 63, 64, 70, 73, 78, 84, 95, 96]


dd = [ sum(d[:j]) for j in range(len(d)) ][1:]

gives:

[0, 8, 12, 16, 20, 27, 29, 34, 35, 36, 41, 52, 63, 64, 70, 73, 78, 84, 95]

Dinesh




Message: 6
Date: Fri, 03 Jul 2009 12:22:30 -0700
From: Emile van Sebille 
To: tutor@python.org
Subject: Re: [Tutor] list comprehension problem
Message-ID: 
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

On 7/3/2009 12:09 PM Dinesh B Vadhia said...
> I'm suffering from brain failure (or most likely just being brain less!) 
> and need help to create a list comprehension for this problem:
>  
> d is a list of integers: d = [0, 8, 4, 4, 4, 7, 2, 5, 1, 1, 5, 11, 11, 
> 1, 6, 3, 5, 6, 11, 1]
>  
> Want to create a new list that adds the current number and the prior 
> number, where the prior number is the accumulation of the previous 
> numbers ie.

[ sum(d[:j]) for j in range(len(d)) ][1:]

Emile

>  
> dd = [0, 8, 12, 16, 20, 27, 29, 34, 35, 36, 41, 52, 63, 64, 70, 73, 78, 
> 84, 95, 96]
>  
> A brute force solution which works is:
>  
>  >>> dd = []
>  >>> y = 0
>  >>> for x in d:
>  ...     y += x
>  ...     dd.append(y)
>  
> Is there a list comprehension solution?
>  
> Dinesh

Re: [Tutor] list comprehension problem

2009-07-03 Thread Dinesh B Vadhia

Thanks Emile / Kent.

The problem I see with this solution is that at each stage it is re-summing the 
j's instead of retaining a running total which the 'for-loop' method does ie.

>>> dd = []
>>> y = 0
>>> for x in d:
...     y += x
...     dd.append(y)

As the lists of integers get larger (mine are in the thousands of integers per 
list) the list comprehension solution will get slower.  Do you agree?

Dinesh
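
The difference between the two approaches can be sketched as follows. This assumes Python 3.2+, where itertools.accumulate exists (it was not available when this thread was written); the function names are illustrative.

```python
from itertools import accumulate

d = [0, 8, 4, 4, 4, 7, 2, 5, 1, 1, 5, 11, 11, 1, 6, 3, 5, 6, 11, 1]


def running_total_loop(values):
    # One pass: carry the total forward instead of re-summing a slice.
    total = 0
    result = []
    for x in values:
        total += x
        result.append(total)
    return result


def running_total_slices(values):
    # The list-comprehension form: re-sums values[:j + 1] for every j,
    # so the work grows roughly with the square of the list length.
    return [sum(values[:j + 1]) for j in range(len(values))]


dd = list(accumulate(d))  # single pass, like the loop
print(dd == running_total_loop(d) == running_total_slices(d))
```

All three give the same list; only the slice version repeats work, which is why it slows down as the lists grow.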



From: Kent Johnson 
Sent: Friday, July 03, 2009 1:21 PM
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Subject: Re: [Tutor] list comprehension problem


On Fri, Jul 3, 2009 at 3:49 PM, Dinesh B
Vadhia wrote:
> d = [0, 8, 4, 4, 4, 7, 2, 5, 1, 1, 5, 11, 11, 1, 6, 3, 5, 6, 11, 1]
>
> and we want:
>
> [0, 8, 12, 16, 20, 27, 29, 34, 35, 36, 41, 52, 63, 64, 70, 73, 78, 84, 95,
> 96]
> dd = [ sum(d[:j]) for j in range(len(d)) ][1:]
>
> gives:
>
> [0, 8, 12, 16, 20, 27, 29, 34, 35, 36, 41, 52, 63, 64, 70, 73, 78, 84, 95]

In [9]: [ sum(d[:j+1]) for j in range(len(d)) ]
Out[9]: [0, 8, 12, 16, 20, 27, 29, 34, 35, 36, 41, 52, 63, 64, 70, 73,
78, 84, 95, 96]

Kent

[Tutor] large strings and garbage collection

2009-07-17 Thread Dinesh B Vadhia

This was discussed in a previous post but I didn't see a solution.  Say, you 
have 

for i in veryLongListOfStringValues:
    s += i

As per previous post 
(http://thread.gmane.org/gmane.comp.python.tutor/54029/focus=54139), (quoting 
verbatim) "... the following happens inside the python interpreter:

1. get a reference to the current value of s.
2. get a reference to the string value i.
3. compute the new value += i, store it in memory, and make a reference to it.
4. drop the old reference of s (thus free-ing "abc")
5. give s a reference to the newly computed value.

After step 3 and before step 4, the old value of s is still referenced by s, 
and the new value is referenced internally (so step 5 can be performed). In 
other words, both the old and the new value are in memory at the same time 
after step 3 and before step 4, and both are referenced (that is, they cannot 
be garbage collected). ... "

As s gets very large, how do you deal with this situation to avoid a memory 
error or what I think will be a general slowing down of the system if the 
for-loop is repeated a large number of times.

Dinesh

Re: [Tutor] large strings and garbage collection

2009-07-17 Thread Dinesh B Vadhia

join with generator expression is what was needed.  terrific!



From: Rich Lovely 
Sent: Friday, July 17, 2009 4:19 PM
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Subject: Re: [Tutor] large strings and garbage collection


2009/7/17 Dinesh B Vadhia :
> This was discussed in a previous post but I didn't see a solution.  Say, you
> have
>
> for i in veryLongListOfStringValues:
>     s += i
>
> As per previous post
> (http://thread.gmane.org/gmane.comp.python.tutor/54029/focus=54139),
> (quoting verbatim) "... the following happens inside the python interpreter:
>
> 1. get a reference to the current value of s.
> 2. get a reference to the string value i.
> 3. compute the new value += i, store it in memory, and make a reference to
> it.
> 4. drop the old reference of s (thus free-ing "abc")
> 5. give s a reference to the newly computed value.
>
> After step 3 and before step 4, the old value of s is still referenced by s,
> and the new value is referenced internally (so step 5 can be performed). In
> other words, both the old and the new value are in memory at the same time
> after step 3 and before step 4, and both are referenced (that is, they
> cannot be garbage collected). ... "
>
> As s gets very large, how do you deal with this situation to avoid a memory
> error or what I think will be a general slowing down of the system if the
> for-loop is repeated a large number of times.
>
> Dinesh
>
> ___
> Tutor maillist  -  Tutor@python.org
> http://mail.python.org/mailman/listinfo/tutor
>
>

If all you are doing is concatenating a list of strings, use the
str.join() method, which is designed for the job:

>>> listOfStrings
['And', 'now', 'for', 'something', 'completely', 'different.']
>>> print " ".join(listOfStrings)
And now for something completely different.
>>> print "_".join(listOfStrings)
And_now_for_something_completely_different.

If you need to perform other operations first, you can pass a
generator expression as the argument, for example:

>>> " ".join((s.upper() if n%2 else s.lower()) for n, s in
...          enumerate(listOfStrings))
'and NOW for SOMETHING completely DIFFERENT.'


Hope that helps you.
-- 
Rich "Roadie Rich" Lovely
There are 10 types of people in the world: those who know binary,
those who do not, and those who are off by one.

[Tutor] python interpreter vs bat file

2009-07-18 Thread Dinesh B Vadhia

During recent program testing, I ran a few Python programs from a Windows XP 
batch file which causes a memory error for one of the programs.  If I run the 
same set of programs from the Python interpreter no memory error occurs.  Any 
idea why this might be?

Dinesh

Re: [Tutor] python interpreter vs bat file

2009-07-18 Thread Dinesh B Vadhia

Not much more information available.  Have a batch file (eg. 'test.bat') with 
entries:

python "program a.py"
python "program b.py"
python "program c.py"
python "program e.py"
...

One of the programs (eg. 'program c.py') fails with a memory error when 
performing a pickle.dump:

Traceback (most recent call last):
  ...
File "py", line 176, in pickleObject
pickle.dump(self, f, 2)
MemoryError

When the programs are run in the same order from the Python interpreter there 
are no memory errors.  This has happened before and it seems odd behavior.

Dinesh

From: Jeff Johnson 
Sent: Saturday, July 18, 2009 3:24 PM
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Subject: Re: [Tutor] python interpreter vs bat file

Need more information.  Python works on Windows as well as anything 
else.  Maybe even better.

Dinesh B Vadhia wrote:
> During recent program testing, I ran a few Python programs from a 
> Windows XP batch file which causes a memory error for one of the 
> programs.  If I run the same set of programs from the Python interpreter 
> no memory error occurs.  Any idea why this might be?
>  
> Dinesh

Jeff

Jeff Johnson
j...@dcsoftware.com
Phoenix Python User Group - sunpigg...@googlegroups.com

Re: [Tutor] python interpreter vs bat file

2009-07-19 Thread Dinesh B Vadhia

1.  Run Python Programs with Batch file
Python programs are run from a Windows XP batch file (test.bat) in a CMD window 
initiated from Windows Explorer.  All programs execute successfully except one, 
which stops with a memory error; the batch file then continues to execute the 
other Python programs (as it should).

2.  Run Python Programs with Python Interpreter
Fire up Python Interpreter, open .py program, Run.  

When the program with the memory error in 1. is run independently as in 2. it 
works.

Dinesh





Message: 4
Date: Sun, 19 Jul 2009 07:18:08 +0100
From: "Alan Gauld" 
To: tutor@python.org
Subject: Re: [Tutor] python interpreter vs bat file
Message-ID: 
Content-Type: text/plain; format=flowed; charset="Windows-1252";
reply-type=original

"Dinesh B Vadhia"  wrote

> Not much more information available.  
> Have a batch file (eg. 'test.bat') with entries:
>
> python "program a.py"
> python "program b.py"
> python "program c.py"
>
> One of the programs (eg. 'program c.py') fails with a 
> memory error when performing a pickle.dump:
>
> Traceback (most recent call last):
>  ...
> File "py", line 176, in pickleObject
> pickle.dump(self, f, 2)
> MemoryError
> 
> When the programs are run in the same order from the 
> Python interpreter there are no memory errors.  

Can you elaborate on how you run the programs. It looks like 
an environmental issue so we need to know exactly what 
you are doing.

How do you run the bat file?
How do you run the programs "from the Python interpreter"

Are you using Windows Explorer or a CMD window? 
or the Start->Run dialog etc?

Which folders are you starting from in each case?

> This has happened before and it seems odd behavior.

So how did you fix it before? 
I've never seen or heard of this before.

-- 
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/


Re: [Tutor] python interpreter vs bat file

2009-07-19 Thread Dinesh B Vadhia

Hi Dave

Sorry, I wasn't being obtuse.  Here is more info:

1.  Run Python Programs with Batch file
- OS (correction): Windows 64-bit Vista SP2
- Python 2.5.4 64 bit (AMD64)
- The Python programs are run from a Windows batch file (test.bat) in a CMD 
window initiated from Windows Explorer.  All programs execute successfully 
except one, which stops with a memory error; the batch file then continues to 
execute the other Python programs (as it should).

2.  Run Python Programs with Python Interpreter
- Start Idle, File/Open .py program, Run/Run Module
- When the program with the memory error in 1. is run independently with Idle 
it works.

Bob Gailer suggested running the Python programs individually in CMD one after 
the other.  This is sensible but my test programs run for days and the full 
suite of programs take longer.  The programs are memory intensive (the 64-bit 
machine has 8gb ram).  Hence, it is not easy to test this scenario right now.

It seems to me as if Windows is not freeing up memory between Python 
invocations in the batch file but can't be sure.  I said earlier that this has 
happened before but the fix, as now, is to run the program individually with 
Idle.  Hth ...

Dinesh

Message: 5
Date: Sun, 19 Jul 2009 11:56:15 -0700
From: Dave Kuhlman 
To: tutor@python.org
Subject: Re: [Tutor] python interpreter vs bat file
Message-ID: <20090719185615.ga5...@cutter.rexx.com>
Content-Type: text/plain; charset=us-ascii

On Sun, Jul 19, 2009 at 05:40:41AM -0700, Dinesh B Vadhia wrote:
> 
>1.  Run Python Programs with Batch file
> 
>Python programs run from a Windows XP batch file (test.bat) in a CMD
>window initiated from Windows Explorer.  All programs except one
>execute successfully which stops with a memory error but batch file
>continues to execute other Python programs (as it should).
> 
> 
> 
>2.  Run Python Programs with Python Interpreter
> 
>Fire up Python Interpreter, open .py program, Run.
> 

Dinesh -

Please tell us how you did this.  Did you type "python" at a
command prompt and then see the ">>>" prompt?  If so how did you
"open .py program, Run"? Or, did you start Idle (or some other IDE)
then click File-->Open, then run with the Run-->RunModule menu
item?

You have been asked several times for more information.  You really
need to read:

http://catb.org/~esr/faqs/smart-questions.html

There are people on this list who are very generous with their
time.  It's a valuable resource.  Please don't waste it.

I don't mean to be rude.  But, you will help us all, yourself
included, if you think carefully when asking a question.

- Dave


Re: [Tutor] python interpreter vs bat file

2009-07-20 Thread Dinesh B Vadhia

Running time in CMD and IDLE
- Program running time is about the same in both CMD and IDLE
- The programs take a long time to run NOT because of runaway processes that 
are using up memory
- The programs are being optimized with each successive generation to reduce 
resources and time but the limitations boil down to Python for-loops (within 
functions) and sorts (probably the subject of another note to Tutor).

IDLE masking program errors
- Could be but ...
- The programs work under IDLE and return the correct results
- At this point I decided to run the programs from a batch file

Batch file method
- Except for one program, all other programs work using the batch file method.  
- The program with the error is run under IDLE and combined at the end with the 
output of the batch file programs and correct results are returned.

Program memory use
- The program with the memory error uses a lot of memory but the data 
structures should fit into available memory as it does when run with IDLE

Use of DOS Start command
- I'll try out the /I, /B and /WAIT commands in the next run and will let you 
know what happens.  Thanks.

Dinesh




Message: 1
Date: Sun, 19 Jul 2009 23:22:47 +0100
From: "Alan Gauld" 
To: tutor@python.org
Subject: Re: [Tutor] python interpreter vs bat file
Message-ID: 
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
reply-type=original


"Dinesh B Vadhia"  wrote 

> Bob Gailer suggested running the Python programs individually 
> in CMD one after the other.  This is sensible but my test programs 
> run for days and the full suite of programs take longer.  

OK, But it can't take longer than in IDLE? Or even in the bat file.
So you can start the program running and then iconify it.

The reason this is important is that IDLE catches some errors 
that the normal python interpreter does not. So IDLE may be 
masking a real problem in your code. However...

> The programs are memory intensive (the 64-bit machine 
> has 8gb ram).  Hence, it is not easy to test this scenario 
> right now.

Have you checked in Task Manager how much RAM the python 
programs use up - they should be visible in the process tab.

If it is a lot then maybe we can rewrite the code to use less 
memory (Or maybe leak less memory).

> It seems to me as if Windows is not freeing up memory 
> between Python invocations in the batch file but can't be 
> sure.  

Windows should free up the memory, but it might depend on 
how you run the programs. In your earlier post you said the 
bat file contained lines like

python foo.py
python bar.py

You could try using the start command instead, as in:

start foo.py

You might want to explore the /I, /B and /WAIT options

start gives you a lot more control over the execution environment.

Notice you don't need the 'python' because start uses the file 
association.

HTH,


-- 
Alan Gauld
Author of the Learn to Program web site
http://www.alan-g.me.uk/



--

Message: 2
Date: Sun, 19 Jul 2009 23:36:03 +0100
From: "Alan Gauld" 
To: tutor@python.org
Subject: Re: [Tutor] hitting a wall (not a collision detection
question :P)
Message-ID: 
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
reply-type=original

"Michael"  wrote
> ...everything up to functions vs. methods and the basics of classes
> and OOP. This is where I'm hitting a wall. It's at this point that all the
> books go off in different directions

OK, First thing is don't worry about it, you are far from alone.
Many, Many programmers (even long term pros) find the transition
from functions to objects really hard to adjust to. Not surprising,
since it does require a new way of thinking about program
structure. Eventually the OOP way will become second nature,
in fact you might even find it hard to think about ordinary functions
after a while! But it can take a while.

> and I'm not sure a) what I'm learning, b) why I'm learning it,
> and c) how this is going to help me get to my goals.

It might be good to throw us some specific questions and we can
try to answer them. General questions tend to produce vague
answers!

You can try my tutorial on OOP to see if that helps. Follow it
up with the case study to see OOP in action.

> I'm not really even understanding much of what these books
> are talking about at this point anyway.

Again, anything you are unsure about tell us and we can try to
explain. That is what this list is really good at because there
are many different perspectives who have all gone through
the same learning curve. Someone likely has the same way of
thinking about it as you do!

> It's like a few chapters after "Classes and OOP" were torn out of all of 
> them.

:-)

> So, I'm just wondering what

[Tutor] Inverted Index

2007-10-31 Thread Dinesh B Vadhia

Hello!  Anyone know of any example/cookbook code for implementing inverted 
indexes?

Cheers

Dinesh

Re: [Tutor] Inverted Index

2007-10-31 Thread Dinesh B Vadhia

Sure!  To create an inverted index of a very large matrix (M x N with M<>N and 
M>10m rows).  Most times the matrix will be sparse but sometimes it won't be.  
Most times the matrix will consist of 0's and 1's but sometimes it won't.  

Hope that helps.

Dinesh
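
One possible reading of "inverted index of a matrix", sketched in plain Python: map each column to the rows that hold a non-zero value in it. The representation (a list of row lists) and the function name are assumptions made for the example, not something stated in the thread, and a matrix with more than 10m rows would need a more compact structure.

```python
from collections import defaultdict


def build_inverted_index(rows):
    """Map each column index to the (row, value) pairs that are non-zero."""
    index = defaultdict(list)
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value != 0:
                index[j].append((i, value))
    return index


matrix = [
    [1, 0, 0, 1],
    [0, 0, 1, 1],
    [0, 3, 0, 1],
]
index = build_inverted_index(matrix)
print(index[3])   # rows with a non-zero entry in column 3
print(index[1])   # the single non-zero entry in column 1
```

Storing the value alongside the row number covers the case where the matrix is not purely 0's and 1's.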

----- Original Message -----
From: Kent Johnson 
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Sent: Wednesday, October 31, 2007 7:48 AM
Subject: Re: [Tutor] Inverted Index

Dinesh B Vadhia wrote:
> Hello!  Anyone know of any example/cookbook code for implementing 
> inverted indexes?

Can you say more about what you are trying to do?

Maybe PyLucene is interesting:
http://mail.python.org/pipermail/tutor/2006-April/046116.html
http://pylucene.osafoundation.org/

Kent

Re: [Tutor] Inverted Index

2007-10-31 Thread Dinesh B Vadhia

A NumPy matrix (because we have to perform a dot matrix multiplication prior to 
creating an inverted index).

Thank-you!


----- Original Message -----
From: Kent Johnson 
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Sent: Wednesday, October 31, 2007 8:16 AM
Subject: Re: [Tutor] Inverted Index


Dinesh B Vadhia wrote:
> Sure!  To create an inverted index of a very large matrix (M x N with 
> M<>N and M>10m rows).  Most times the matrix will be sparse but 
> sometimes it won't be.  Most times the matrix will consist of 0's and 
> 1's but sometimes it won't. 

How is the matrix represented? Is it in a numpy array? a dict? or...

Kent

>  
> Hope that helps.
>  
> Dinesh
>  
>  
> ----- Original Message -----
> From: Kent Johnson
> To: Dinesh B Vadhia
> Cc: tutor@python.org
> Sent: Wednesday, October 31, 2007 7:48 AM
> Subject: Re: [Tutor] Inverted Index
> 
> Dinesh B Vadhia wrote:
>  > Hello!  Anyone know of any example/cookbook code for implementing
>  > inverted indexes?
> 
> Can you say more about what you are trying to do?
> 
> Maybe PyLucene is interesting:
> http://mail.python.org/pipermail/tutor/2006-April/046116.html
> http://pylucene.osafoundation.org/
> 
> Kent


[Tutor] dictionary append

2007-11-01 Thread Dinesh B Vadhia

Hello!  I'm creating a dictionary called keywords that has multiple entries 
each with a variable list of values eg.

keywords[1] = [1, 4, 6, 3]
keywords[2] = [67,2]
keywords[3] = [2, 8, 5, 66, 3, 23]
etc.

The keys and respective values (both are integers) are read in from a file.  
For each key, the value is append'ed until the next key.  Here is the code.

.
>>> keywords = {}
>>> with open("x.txt", "r") as f:
...     k = 0
...     for line in f.readlines():
...         keywords[k], second = map(int, line.split())
...         keywords[k].append(second)
...         if keywords[k] != k:
...             k = k + 1
...
Traceback (most recent call last):
  File "", line 5, in 
    keywords[k].append(second)
AttributeError: 'int' object has no attribute 'append'
.

Any idea why I get this error?

Dinesh
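
The traceback follows from the unpacking line: keywords[k], second = map(int, line.split()) binds an int to keywords[k], and an int has no append method. A sketch of one way to collect the values, assuming each line of the file holds a "key value" pair of integers (the file layout is not shown in the post, so the sample lines are invented):

```python
from collections import defaultdict


def read_keywords(lines):
    # Each key maps to a list; defaultdict creates the list on first use.
    keywords = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        key, value = map(int, line.split())
        keywords[key].append(value)
    return keywords


sample_lines = ["1 1", "1 4", "1 6", "1 3", "2 67", "2 2"]
keywords = read_keywords(sample_lines)
print(keywords[1])
print(keywords[2])
```

An open file object can be passed in place of sample_lines, since the function only iterates over lines.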



[Tutor] Elegant argument index sort

2007-11-07 Thread Dinesh B Vadhia

I'm sorting a 1-d (NumPy) matrix array (a) and wanting the index results (b).  
This is what I have:

b = a.argsort(0)
b = b+1

The one (1) is added to b so that there isn't a zero index element.  Is there a 
more elegant way to do this?

Dinesh

[Tutor] From Numpy Import *

2007-11-07 Thread Dinesh B Vadhia

Hello!  The standard Python practice for importing modules is, for example:

import sys
import os
etc.

In NumPy (and SciPy) the 'book' suggests using:

from numpy import *
from scipy import *

However, when I instead use 'import numpy' it causes all sorts of errors in my 
existing code.

What do you suggest?

Dinesh
___
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor

Re: [Tutor] From Numpy Import *

2007-11-08 Thread Dinesh B Vadhia

Thank-you!  It is important for us to avoid potential code conflicts and so 
we'll standardize on the import  syntax.

On a related note: 
We are using both NumPy and SciPy.  Consider the example y = Ax where A is a 
sparse matrix.  If A is qualified as a scipy object then do y and x also have 
to be scipy objects or can they be numpy objects?

Dinesh

----- Original Message -----
From: Michael H. Goldwasser 
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Sent: Wednesday, November 07, 2007 5:37 PM
Subject: [Tutor] From Numpy Import *

On Wednesday November 7, 2007, Dinesh B Vadhia wrote: 

>Hello!  The standard Python practice for importing modules is, for example:
>
>import sys
>import os
>etc.
>
>In NumPy (and SciPy) the 'book' suggests using:
>
>from numpy import *
>from scipy import *
>
>However, when I instead use 'import numpy' it causes all sorts of errors 
> in my existing code.

The issue is the following.  The numpy module includes many definitions, for
example a class named array.   When you use the syntax,

   from numpy import *

That takes all definitions from the module and places them into your
current namespace.  At this point, it would be fine to use a command
such as 

  values = array([1.0, 2.0, 3.0])

which instantiates a (numpy) array.

If you instead use the syntax

   import numpy

this brings that module as a whole into your namespace, but to
access definitions from that module you have to give a qualified
name, for example as

  values = numpy.array([1.0, 2.0, 3.0])

You cannot simply use the word array as in the first scenario.  This
would explain why your existing code would no longer work with the
change.

>What do you suggest?

The advantage of the "from numpy import *" syntax is mostly
convenience.   However, the better style is "import numpy" precisely
because it does not automatically introduce many other definitions
into your current namespace.

If you were using some other package that also defined an "array" and
then you were to use the "from numpy import *", the new definition
would override the other definition.  The use of qualified names helps
to avoid these collisions and makes clear where those definitions are
coming from.
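
The collision can be sketched without NumPy by using two standard-library modules that both define sqrt, math and cmath, as stand-ins (that substitution is an assumption made so the example runs anywhere). The star imports are executed inside a separate dictionary so they do not touch the example's own names.

```python
import math
import cmath

# Qualified names: both sqrt functions stay available and distinct.
print(math.sqrt(4))     # 2.0
print(cmath.sqrt(-4))   # 2j

# Star imports: the later import silently replaces the earlier sqrt.
namespace = {}
exec("from math import *", namespace)
first_sqrt = namespace["sqrt"]
exec("from cmath import *", namespace)
second_sqrt = namespace["sqrt"]

print(first_sqrt is math.sqrt)     # True
print(second_sqrt is cmath.sqrt)   # True: math's sqrt has been shadowed
```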

With regard,
Michael


[Tutor] global is bad but ...

2007-11-13 Thread Dinesh B Vadhia

Consider a data structure (say, an array) that is operated on by a bunch of 
functions eg.

def function_A():
    global array_G
    # do stuff with array_G
    return

def function_B():
    global array_G
    # do stuff with array_G
    return

def function_C():
    global array_G
    # do stuff with array_G
    return

The described way is to place the statement 'global' in line 1 of each 
function.  On the other hand, wiser heads say that the use of 'global' is bad 
and that reworking the code into classes and objects is better.

What do you think and suggest?

Dinesh
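
For comparison, a sketch of the class-based alternative mentioned above. The class name, method names and operations are all hypothetical; the only point is that the shared data lives on the object instead of in a module-level global.

```python
class Workspace(object):
    """Hypothetical container for the data the functions share."""

    def __init__(self, values):
        self.array_g = list(values)

    def scale(self, factor):
        # Plays the role of function_A: modifies the shared data.
        self.array_g = [x * factor for x in self.array_g]

    def shift(self, offset):
        # Plays the role of function_B: also modifies the shared data.
        self.array_g = [x + offset for x in self.array_g]

    def total(self):
        # Plays the role of function_C: only reads the shared data.
        return sum(self.array_g)


ws = Workspace([1, 2, 3])
ws.scale(10)
ws.shift(1)
print(ws.array_g)   # [11, 21, 31]
print(ws.total())   # 63
```

Two Workspace objects can exist side by side with separate data, which a single global cannot offer.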

Re: [Tutor] global is bad but ...

2007-11-13 Thread Dinesh B Vadhia

Alan/Jim:

It's good to hear some pragmatic advice.  

This particular module has 8 small functions that share common data 
(structures, primarily in arrays and vectors).  I tried passing array_G as a 
parameter but that doesn't work because everything in the function remains 
local and I cannot get back the altered data (unless you know better?). 

The 'global' route works a treat so far.

Dinesh

...
Date: Tue, 13 Nov 2007 23:11:49 -
From: "Alan Gauld" <[EMAIL PROTECTED]>
Subject: Re: [Tutor] global is bad but ...
To: tutor@python.org

"Dinesh B Vadhia" <[EMAIL PROTECTED]> wrote

> Consider a data structure (say, an array) that is operated 
> on by a bunch of functions eg.
>
> def function_A
> global array_G

> def function_B
> global array_G

> etc...

> On the other hand, wiser heads say that the use of 'global' 
> is bad and that reworking the code into classes and objects 
> is better.

Rather than answer your question directly can I ask, do 
you know *why* wiser heads say global is bad? What 
problems does using global introduce? What problems 
does it solve?

> What do you think and suggest?

I think it's better to understand issues and make informed 
choices rather than following the rules of others.

I suggest you consider whether global is bad in this case 
and what other solutions might be used instead. Then make 
an informed choice. If, having researched the subject you 
don't understand why global is (sometimes) bad ask for 
more info here.

HTH (a little),

-- 
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld




Re: [Tutor] global is bad but ... okay

2007-11-14 Thread Dinesh B Vadhia

Kent et al

I reworked the code to pass parameters (mainly arrays)  to the functions.  It 
works and performs faster.  Thank-you all very much for the insights.

Dinesh


- Original Message - 
From: Kent Johnson 
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Sent: Wednesday, November 14, 2007 4:53 AM
Subject: Re: [Tutor] global is bad but ...


Dinesh B Vadhia wrote:
> Alan/Jim:
>  
> It's good to hear some pragmatic advice. 
>  
> This particular module has 8 small functions that share common data 
> (structures, primarily in arrays and vectors).  I tried passing array_G 
> as a parameter but that doesn't work because everything in the function 
> remains local and I cannot get back the altered data (unless you know 
> better?). 

That sounds like a good candidate for a class with array_G as an 
instance attribute and your 8 small functions as methods.
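
A rough sketch of that idea, assuming Python 3. The class name, method
names and sample data are hypothetical; only the shape (shared data held
on self, former functions turned into methods) reflects the suggestion:

```python
class ArrayWorker(object):
    """Holds the shared data that used to be the global array_G."""

    def __init__(self, values):
        self.array_g = list(values)

    def scale(self, factor):
        # Rebinding self.array_g is fine: the new list is still
        # reachable through the instance by every other method.
        self.array_g = [v * factor for v in self.array_g]

    def shift(self, offset):
        # In-place modification works too.
        for i in range(len(self.array_g)):
            self.array_g[i] += offset

    def total(self):
        return sum(self.array_g)


worker = ArrayWorker([1, 2, 3])
worker.scale(2)    # [2, 4, 6]
worker.shift(1)    # [3, 5, 7]
grand_total = worker.total()
```

No 'global' statement is needed because each method reaches the data
through self.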

If you pass the array as a parameter, you can change the passed 
parameter in place and changes will be seen by other clients. 
Re-assigning the parameter will have only local effect. For example:

This function mutates the list passed in, so changes are visible externally:
In [23]: def in_place(lst):
   ....:     lst[0] = 1
   ....:
   ....:
In [24]: a = [3,4,5]
In [25]: in_place(a)
In [26]: a
Out[26]: [1, 4, 5]

This function assigns a new value to the local name, changes are not 
visible externally:
In [27]: def reassign(lst):
   ....:     lst = []
   ....:
   ....:
In [28]: reassign(a)
In [29]: a
Out[29]: [1, 4, 5]

This function replaces the contents of the list with a new list. This is 
a mutating function so the changes are visible externally.
In [30]: def replace(lst):
   ....:     lst[:] = [1,2,3]
   ....:
   ....:
In [31]: replace(a)
In [32]: a
Out[32]: [1, 2, 3]

> The 'global' route works a treat so far.

Yes, globals work and they appear to be a simple solution, that is why 
they are used at all! They also
- increase coupling
- hinder testing and reuse
- obscure the relationship between pieces of code

which leads experienced developers to conclude that in general globals 
are a bad idea and should be strenuously avoided.

Kent


[Tutor] Web programming

2007-11-17 Thread Dinesh B Vadhia

Hi!  I want to create (for testing purposes) a straightforward web application 
consisting of a client that makes simple queries to a backend which returns 
data from a database (initially pysqlite3).  That's it - really!   I don't need 
a professional web server (eg. Apache) per se.

Are the Python urlparse, urllib, urllib2, httplib, BaseHTTPServer, 
SimpleHTTPServer etc. modules sufficient for the task?  The number of queries 
per second will initially be low, in the tens per second.
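
For a sense of scale, here is a minimal sketch of such a test backend
using only the standard library.  It assumes Python 3, where
BaseHTTPServer lives in http.server and urlparse in urllib.parse; the
table, its rows and the "name" query parameter are made up for
illustration:

```python
import sqlite3
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import urlparse, parse_qs

# Hypothetical table and data, for illustration only.
db = sqlite3.connect(":memory:", check_same_thread=False)
db.execute("CREATE TABLE items (name TEXT, price REAL)")
db.executemany("INSERT INTO items VALUES (?, ?)",
               [("apple", 0.5), ("pear", 0.75)])


def lookup(name):
    """Return the price stored for name, or None if there is no row."""
    row = db.execute("SELECT price FROM items WHERE name = ?",
                     (name,)).fetchone()
    return None if row is None else row[0]


class QueryHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Expects requests such as: GET /?name=apple
        params = parse_qs(urlparse(self.path).query)
        name = params.get("name", [""])[0]
        price = lookup(name)
        body = ("not found" if price is None else str(price)).encode("utf-8")
        self.send_response(404 if price is None else 200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8000):
    """Run the test server until interrupted (not called here)."""
    HTTPServer(("localhost", port), QueryHandler).serve_forever()
```

Calling serve() and opening http://localhost:8000/?name=apple in a
browser should return the stored price.  This is a testing aid only,
not something to expose to real traffic.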

Dinesh




[Tutor] error binding parameter 1

2007-11-24 Thread Dinesh B Vadhia

Hello!  Can anyone see what the problem with this code snippet is?

Dinesh




image_filename = str(dir_list[i])
image_file = dir_path + image_filename
image_blob = open(image_file, 'rb')
[L40]   cursor.execute("Insert into image_table values (?, ?)", 
(image_filename, image_blob))

Traceback (most recent call last):
  File "C:\storage management.py", line 40, in 
cursor.execute("Insert into image_table values (?, ?)", (image_filename, 
image_blob))
InterfaceError: Error binding parameter 1 - probably unsupported type.

Re: [Tutor] error binding parameter 1

2007-11-24 Thread Dinesh B Vadhia

Yes, it should be: image_blob = open(image_file, 'rb').read()

Thank-you!
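
A self-contained sketch of the corrected pattern, assuming Python 3 and
the standard sqlite3 module.  The temporary file stands in for a real
image and the column names are invented; under Python 2 the bytes would
typically be wrapped in sqlite3.Binary before binding:

```python
import os
import sqlite3
import tempfile

# Create a small stand-in "image" file so the example can run anywhere.
fd, image_file = tempfile.mkstemp(suffix=".jpg")
os.write(fd, b"\xff\xd8\xff\xe0 fake jpeg bytes")
os.close(fd)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE image_table (name TEXT, data BLOB)")

# Bind the file's contents (bytes), not the open file object.
with open(image_file, "rb") as f:
    image_blob = f.read()

con.execute("INSERT INTO image_table VALUES (?, ?)",
            (os.path.basename(image_file), image_blob))

stored = con.execute("SELECT data FROM image_table").fetchone()[0]
os.remove(image_file)
```

Passing the file object itself is what triggers the "Error binding
parameter 1" message, since sqlite3 has no idea how to store it.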


- Original Message - 
From: bob gailer 
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Sent: Saturday, November 24, 2007 5:55 PM
Subject: Re: [Tutor] error binding parameter 1


Dinesh B Vadhia wrote:
> Hello!  Can anyone see what the problem with this code snippet is?
>  
> Dinesh
>  
> 
> image_filename = str(dir_list[i])
> image_file = dir_path + image_filename
> image_blob = open(image_file, 'rb')
Should that be
image_blob = open(image_file, 'rb').read()?
> [L40]   cursor.execute("Insert into image_table values (?, ?)", 
> (image_filename, image_blob))
>  
> Traceback (most recent call last):
>   File "C:\storage management.py", line 40, in 
> cursor.execute("Insert into image_table values (?, ?)", 
> (image_filename, image_blob))
> InterfaceError: Error binding parameter 1 - probably unsupported type.
> 
>


[Tutor] Displaying images on a web page

2008-01-01 Thread Dinesh B Vadhia

I want to display a fixed number of same-size (jpeg) images on a web page.  The 
images displayed will change on user input.

I can use PIL to write the code but has anyone come across open source code 
that already does this?  Thank-you

Dinesh

[Tutor] A faster x in S

2008-01-15 Thread Dinesh B Vadhia

For some significant data pre-processing we have to perform the following 
simple process:

Is the integer x in a list of 13K sorted integers.  That's it except this has 
to be done >100m times with different x's (multiple times).  Yep, a real pain!  

I've put the 13K integers in a list S and am using the 'x in S' test.

I was wondering if there is anything faster?

Dinesh

Re: [Tutor] A faster x in S

2008-01-16 Thread Dinesh B Vadhia

I used the s.intersection(t) function in the set type as it was the most 
appropriate.  The performance was phenomenal.  Thank-you!

Dinesh


- Original Message - 
From: bob gailer 
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Sent: Tuesday, January 15, 2008 2:03 PM
Subject: Re: [Tutor] A faster x in S


Dinesh B Vadhia wrote:
> For some significant data pre-processing we have to perform the 
> following simple process:
>  
> Is the integer x in a list of 13K sorted integers.  That's it except 
> this has to be done >100m times with different x's (multiple times).  
> Yep, a real pain! 
>  
> I've put the 13K integers in a list S and am using the is 'x in S' 
> function.
>  
> I was wondering if there is anything faster?
I agree with Kent.

 >>> l = range(13000)
 >>> s = set(l)
 >>> d = dict(enumerate(l))
 >>> import time
 >>> def f(lookupVal, times, values):
 ...     st = time.time()
 ...     for i in range(times):
 ...         z = lookupVal in values
 ...     return time.time() - st
 ...
 >>> f(6499, 1000, l)
0.3126376037598
 >>> f(6499, 1000000, s)
0.3123623962402

So set is 1000 times faster than list!

 >>> f(6499, 1000000, d)
0.31300020217895508

And dict is (as expected) about the same as set.

So 100,000,000 lookups should take about 30 seconds. Not bad, eh?
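
Since the 13K integers are already sorted, a binary search with the
bisect module is another option that avoids building a second data
structure.  A small sketch, assuming Python 3; the data and the helper
name in_sorted_list are invented for the example:

```python
import bisect

# Stand-in data: 13,000 sorted integers (every third number).
S = list(range(0, 39000, 3))
S_set = set(S)


def in_sorted_list(sorted_values, x):
    """Membership test by binary search on an already sorted list."""
    i = bisect.bisect_left(sorted_values, x)
    return i < len(sorted_values) and sorted_values[i] == x


queries = [0, 1, 6499, 6501, 38997, 39000]
by_set = [x in S_set for x in queries]
by_bisect = [in_sorted_list(S, x) for x in queries]
```

Both approaches give the same answers.  The set lookup is usually the
faster of the two, so the bisect version is mainly of interest when
memory is tight or the sort order is needed for something else.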

Let's explore another angle. What range are the integers in (min and max)?

Bob

[Tutor] An -1.#IND error

2008-01-26 Thread Dinesh B Vadhia

After a matrix*vector multiplication (ie. b = Ax, with A, x and b all floats), 
the b vector elements are all "-1.#IND".  What does this mean?  Btw, there are 
no divisions in the program, e.g. no divides by zero.

Dinesh


Re: [Tutor] An -1.#IND error

2008-01-27 Thread Dinesh B Vadhia

Luke:

This is literally the core of the code:

A = scipy.asmatrix(scipy.zeros((M, N), float))
q = scipy.asmatrix(scipy.zeros((N, 1)), float)
b = scipy.asmatrix(scipy.zeros((1, N)), float)

# populate A
# x is a vector of valid floats (I've checked)
# calculate b as:

b = A * x

After the matrix multiplication, the b vector elements are all "-1.#IND" 's.  
Note that there are no divisions by zero in the program.
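
As far as I know, "-1.#IND" is how the Microsoft C runtime on Windows
prints an indeterminate value, i.e. a NaN (not a number).  A NaN does
not need a division to appear.  The sketch below uses plain Python
lists rather than scipy, with made-up data, to show one way it can
creep into a matrix*vector product:

```python
import math

inf = float("inf")


def matvec(A, x):
    """Plain-Python matrix*vector product, row by row."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# One entry of A is infinite (e.g. from an earlier overflow or from
# reading uninitialised memory); x contains only valid floats.
A = [[inf, 1.0],
     [1.0, 2.0]]
x = [0.0, 3.0]

b = matvec(A, x)   # inf * 0.0 is NaN, and NaN spreads through the sum
```

Checking the inputs with math.isnan and math.isinf (or the scipy/numpy
equivalents) before multiplying is one way to narrow down where the bad
value first appears.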


Cheers

Dinesh



- Original Message - 
From: Luke Paireepinart 
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Sent: Saturday, January 26, 2008 11:12 PM
Subject: Re: [Tutor] An -1.#IND error


Dinesh B Vadhia wrote:
> After a matrix*vector multiplication (ie. b = Ax, with A, x and b all 
> floats), the b vector elements are all "-1.#IND".  What does this 
> mean?  Btw, they are no divisions in the program eg. no divide by zeros.
A code sample would be _much_ more helpful here.
Please include one that exhibits the problem.
>  
> Dinesh
>  
>  
> 
>


[Tutor] matrix-vector multiplication errors

2008-02-01 Thread Dinesh B Vadhia

I've posted this on the Scipy forum but maybe there are answers on Tutor too.  
I'm performing a standard Scipy matrix* vector multiplication, b=Ax , (but not 
using the sparse module) with different sizes of A as follows:  


Assuming 8 bytes per float, then:
1. matrix A with M=10,000 and N=15,000 is of approximate size: 1.2 GB
2. matrix A with M=10,000 and N=5,000 is of approximate size: 390 MB
3. matrix A with M=10,000 and N=1,000 is of approximate size: 78 MB

The Python/Scipy matrix initialization statements are:
> A = scipy.asmatrix(scipy.empty((I,J), dtype=int))
> x = scipy.asmatrix(scipy.empty((J,1), dtype=float))
> b = scipy.asmatrix(scipy.empty((I,1), dtype=float))

I'm using a Windows XP SP2 PC with 2 GB of RAM.

Both matrices 1. and 2. fail with INDeterminate values in b.  Matrix 3. works 
perfectly.  As I have 2 GB of RAM, why are matrices 1. and 2. failing?

The odd thing is that Python doesn't return any error messages with 1. and 2. 
but we know the results are garbage (literally!)
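
The size estimates above can be reproduced with a few lines of
arithmetic (a sketch assuming Python 3; the function name is
hypothetical).  Separately, it may be worth noting that empty()
allocates an array without initialising its entries, so any element
that is never assigned holds whatever happened to be in memory:

```python
def matrix_bytes(rows, cols, bytes_per_item=8):
    """Memory needed for a dense rows x cols matrix of 8-byte items."""
    return rows * cols * bytes_per_item


# M = 10,000 rows with the three column counts from the post.
sizes = {n: matrix_bytes(10000, n) for n in (15000, 5000, 1000)}

# The same figures expressed in MiB (1 MiB = 1024 * 1024 bytes).
size_in_mib = {n: b / (1024.0 ** 2) for n, b in sizes.items()}
```

The largest case needs 1.2 billion bytes for A alone, which is a large
fraction of the address space a 32-bit Windows process can use.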

Cheers!

Dinesh

[Tutor] List Box for Web

2008-02-26 Thread Dinesh B Vadhia

I know this isn't the right forum to ask but I'll try as someone might know.

For my web application, I need a list box with a search capability.  An example 
is the Python documentation (hit the F1 key under Windows from IDLE) and 
specifically the Index list ie. context-sensitive search through a list of 
phrases, but for use on a web page. 

Does anyone know if there are any open source UI widgets for such a capability?

Any help/pointers appreciated. 

Dinesh

[Tutor] Bag of Words and libbow

2008-03-09 Thread Dinesh B Vadhia

Has anyone come across Python modules/libraries to perform "Bag of Words" text 
analysis or an interface to the libbow C library?  Thank-you!

Dinesh

Re: [Tutor] Bag of Words and libbow

2008-03-10 Thread Dinesh B Vadhia

Andre

I had a quick look at NLTK which is an NLP library suite whereas libbow is for 
statistical text analysis.  Cheers

Dinesh



Message: 3
Date: Mon, 10 Mar 2008 08:24:23 +0100
From: Andre Halama <[EMAIL PROTECTED]>
Subject: Re: [Tutor] Bag of Words and libbow
To: tutor@python.org

Dinesh B Vadhia schrieb:

Hi,

| Has anyone come across Python modules/libraries to perform "Bag of
| Words" text analysis or an interface to the libbow C library?  Thank-you!

did you already have a look at NLTK
(http://nltk.sourceforge.net/index.php/Main_Page)?

HTH,


[Tutor] Working with Python Objects

2008-03-14 Thread Dinesh B Vadhia

I've avoided it as long as possible but I've reached a stage where I have to 
start using Python objects!  The primary reason is that the web framework uses 
objects and the second is to eliminate a few globals.  Here is example pseudo 
code followed by the question (one of many I suspect!):

class A:
constantA = 9
def OneOfA:

a = 

class B:
variableB = "quick brown fox"
def OneOfB:

b = 
c = b * a# the 'a' from def OneOfA in class A

Question:
1) how do I access the 'a' from function (method) OneOfA in class A so that it 
can be used by functions (methods) in class B?

Cheers

Dinesh



Re: [Tutor] Working with Python Objects

2008-03-15 Thread Dinesh B Vadhia

Alan/Greg

I've combined your code fragments and added a function call too, to determine 
how 'a' is passed between objects and classes:

def addNumbers(i, j):
    k = i + j
    return k

class A:
    def oneA(self):
        z = 2
        self.a = self.a * z

class B:
    def oneB(self):
        inA = A() # instance of class A
        y = 5
        b = y * inA.a
        c = addNumbers(y, b)

Is this correct?

Dinesh


class A:
constantA = 9
def OneOfA:

a = 

class B:
variableB = "quick brown fox"
def OneOfB:

b = 
c = b * a# the 'a' from def OneOfA in class A
--
> Question:
> 1) how do I access the 'a' from function (method) OneOfA in
> class A so that it can be used by functions (methods) in class B?

You don't and shouldn't try to. In this case because the attribute
only exists inside the method, it is local, so dies when the
method completes. So first of all you need to make it part
of the class A. We do that by tagging it as an attribute of
self, which should be the first parameter of every method.

But one of the concepts of OOP is to think in terms of the
objects, not the attributes inside them. So your question
should probably be: How do I access objects of class A
inside methods of class B?

The answer is by passing an instance into the method as a
parameter. You can then manipulate the instance of A by
sending messages to it. In Python you can access the
instance values of an object by sending a message with
the same name as the attribute - in other OOP languages
you would need to provide an accessor method.

But it is very important conceptually that you try to get away
from thinking about accessing attributes of another object
inside methods. Access the objects. Methods should only
be manipulating the attributes of their own class. To do
otherwise is to break the reusability of your classes.

So re writing your pseudo code:

class A:
    constantA = 9
    def OneOfA(self):   # add self as first parameter

        self.a =        # use 'self' to tag 'a' as an attribute

class B:
    variableB = "quick brown fox"
    def OneOfB(self, anA):   # add self and the instance of A

        b = 
        c = b * anA.a   # the 'a' from the instance anA

This way OneOfB() only works with attributes local to it
or defined as instance variables or passed in as arguments.
Which is as it should be!
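
A runnable version of the same idea, assuming Python 3.  The values
assigned to a and b are invented, since the pseudo code leaves them
blank, and the __init__ method is an addition so that 'a' exists before
it is first used:

```python
class A(object):
    constantA = 9

    def __init__(self):
        self.a = 0            # give every instance an 'a' from the start

    def one_of_a(self):
        # 'self.a' survives after the method returns; a bare 'a' would not.
        self.a = self.constantA * 2


class B(object):
    variableB = "quick brown fox"

    def one_of_b(self, an_a):
        b = 5                 # illustrative value
        return b * an_a.a     # read the attribute from the instance passed in


an_a = A()
an_a.one_of_a()
c = B().one_of_b(an_a)
```

The instance of A is created once by the caller and handed to B's
method, so B never needs to know how 'a' was computed.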

Real OOP purists don't like direct attribute access but
in Python it's an accepted idiom and frankly there is little
value in writing an accessor method that simply returns
the value if you can access it directly. The thing you
really should try to avoid though is modifying the attributes
directly from another class. Normally you can write a
more meaningful method that will do that for you.

-- 
Alan Gauld
Author of the Learn to Program web site
Temporarily at:
http://uk.geocities.com/[EMAIL PROTECTED]/
Normally:
http://www.freenetpages.co.uk/hp/alan.gauld 

[Tutor] Python to C++

2008-03-19 Thread Dinesh B Vadhia

Say because of performance, you might want to re-write/convert Python code to 
C++.  What is the best way (or best practice) to do this wrt the tools 
available?

Dinesh

Re: [Tutor] Python to C++

2008-03-21 Thread Dinesh B Vadhia

Thank-you for all the suggestions for converting to C/C++ which will be 
followed up.  

Can we interface Python to a C++ library and if so how?

Dinesh

Date: Thu, 20 Mar 2008 17:21:52 -
From: "Alan Gauld" <[EMAIL PROTECTED]>
Subject: Re: [Tutor] Python to C++
To: tutor@python.org

"Dinesh B Vadhia" <[EMAIL PROTECTED]> wrote

> Say because of performance, you might want to re-write/convert 
> Python code to C++.  What is the best way (or best practice) 
> to do this wrt the tools available?

It may be obvious but it's worth noting that optimised Python may 
be faster than a badly written C port. So first make sure you have 
squeezed the best performance out of Python.

Secondly only rewrite the bits that need it so use the profiler to 
identify the bottlenecks in your Python code and move those 
to a separate module to reduce conversion effort.
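
As an illustration of that profiling step, here is a small sketch using
the standard cProfile and pstats modules (Python 3 assumed).  The
functions slow_sum and main are made-up stand-ins for real code:

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately naive loop that should show up in the profile."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def main():
    return [slow_sum(20000) for _ in range(20)]


profiler = cProfile.Profile()
profiler.enable()
main()
profiler.disable()

# Collect the five most expensive entries, by cumulative time, as text.
stream = io.StringIO()
pstats.Stats(profiler, stream=stream).sort_stats("cumulative").print_stats(5)
report = stream.getvalue()
```

Printing report lists the functions that dominate the run time.  Those
are the candidates for moving into a separate module and rewriting.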

After that the advice already given re Pyrex/Psyco etc is all good.

You might also find SWIG a useful alternative if you decide 
to rewrite the slow functions by hand. SWIG will help wrap 
those functions so that the remaining Python code can 
access them.

Alan G.


[Tutor] from future import division

2008-03-23 Thread Dinesh B Vadhia

I spent fruitless hours trying to get a (normal) division x/y to work and then 
saw that you have to declare:

> from __future__ import division

.. at the top of a module file.  What is this all about?
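
For reference, a short sketch of the two division operators involved.
It is written for Python 3, where "/" already behaves the way it does
in Python 2 after the __future__ import; the variable names are only
illustrative:

```python
# In Python 2 without "from __future__ import division", 7 / 2 between
# two ints gives 3.  With the import (or in Python 3) "/" is true
# division and "//" is the explicit floor division operator.
true_quotient = 7 / 2       # 3.5
floor_quotient = 7 // 2     # 3
negative_floor = -7 // 2    # -4: floor division rounds towards minus infinity
```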

Dinesh


[Tutor] Google App Engine

2008-04-08 Thread Dinesh B Vadhia

Hi!  Google announced an app server that allows pure Python developed 
applications/services to use their infrastructure.  This may be of use to many 
on this list.  Further details can be found at: http://appengine.google.com/ 

The SDK includes a modified Python 2.5.2 and Django 0.96.1, WebOb 0.9 and PyYAML 
3.05.

As an aside, does anyone here have experience of WebOb and specifically is it a 
mini web framework (like webpy)?  

Cheers

Dinesh

[Tutor] List comprehensions

2008-04-09 Thread Dinesh B Vadhia

Here is a for loop operating on a list of string items:

data = ["string 1", "string 2", "string 3", "string 4", "string 5", "string 6", 
"string 7", "string 8", "string 9", "string 10", "string 11"]

result = ""
for item in data:
    result = item + "\n"
    print result

I want to replace the for loop with a List Comprehension (or whatever) to 
improve performance (as the data list will be >10,000).  At each stage of the 
for loop I want to print the result ie.

[print (item + "\n")  for item in data]

But, this doesn't work as the inclusion of the print causes an invalid syntax 
error.

Any thoughts?

Dinesh

Re: [Tutor] List comprehensions

2008-04-09 Thread Dinesh B Vadhia

Sorry, let's start again.

Here is a for loop operating on a list of string items:

data = ["string 1", "string 2", "string 3", "string 4", "string 5", "string 6", 
"string 7", "string 8", "string 9", "string 10", "string 11"]

result = ""
for item in data:
    result = item
    print result

I want to replace the for loop with another structure to improve performance 
(as the data list will contain >10,000 string items).  At each iteration of the 
for loop the result is printed (in fact, the result is sent from the server to 
a browser one result line at a time).

The for loop will be called continuously and this is another reason to look for 
a potentially better structure preferably a built-in.

Hope this makes sense!  Thank-you.

Dinesh

- Original Message - 
From: Kent Johnson 
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Sent: Wednesday, April 09, 2008 12:40 PM
Subject: Re: [Tutor] List comprehensions

Dinesh B Vadhia wrote:
> Here is a for loop operating on a list of string items:
>  
> data = ["string 1", "string 2", "string 3", "string 4", "string 5", 
> "string 6", "string 7", "string 8", "string 9", "string 10", "string 11"]
>  
> result = ""
> for item in data:
> result = item + "\n"
> print result

I'm not sure what your goal is here. Do you mean to be accumulating all 
the values in data into result? Your sample code does not do that.

> I want to replace the for loop with a List Comrehension (or whatever) to 
> improve performance (as the data list will be >10,000].  At each stage 
> of the for loop I want to print the result ie.
>  
> [print (item + "\n")  for item in data]
>  
> But, this doesn't work as the inclusion of the print causes an invalid 
> syntax error.

You can't include a statement in a list comprehension. Anyway the time 
taken to print will swamp any advantage you get from the list comp.

If you just want to print the items, a simple loop will do it:

for item in data:
   print item + '\n'

Note this will double-space the output since print already adds a newline.

If you want to create a string with all the items with following 
newlines, the classic way to do this is to build a list and then join 
it. To do it with the print included, try

result = []
for item in data:
   newItem = item + '\n'
   print newItem
   result.append(newItem)
result = ''.join(result)
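
If the printing is not needed, the build-a-list-then-join idiom
collapses to a couple of lines.  A sketch assuming Python 3, with the
sample data regenerated rather than typed out:

```python
data = ["string %d" % i for i in range(1, 12)]

# Build all the pieces first, then join once at the end.
lines = [item + "\n" for item in data]
result = "".join(lines)
```

The same string can be sent to the client in one write instead of one
write per item.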

Kent

Re: [Tutor] List comprehensions

2008-04-09 Thread Dinesh B Vadhia

Kent

I'm using a Javascript autocomplete plugin for an online web 
application/service.  Each time a user inputs a character, the character is 
sent to the backend Python program which searches for the character in a list 
of >10,000 string items.  Once it finds the character, the backend will return 
that string and N other adjacent string items where N can vary from 20 to 150.  
Each string item is sent back to the JS in separate print statements.  Hence, 
the for loop.

Now, N = 20 to 150 is not a lot (for a for loop) but this process is performed 
each time the user enters a character.  Plus, there will be thousands (possibly 
more) users at a time.  There is also the searching of the >10,000 string items 
using the entered character.  All of this adds up in terms of performance.

I haven't done any profiling yet as we are still building the system but it 
seemed sensible that replacing the for loop with a built-in would help.  Maybe 
not?

Hope that helps.

Dinesh

- Original Message - 
From: Kent Johnson 
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Sent: Wednesday, April 09, 2008 1:48 PM
Subject: Re: [Tutor] List comprehensions

Dinesh B Vadhia wrote:
> Here is a for loop operating on a list of string items:
>  
> data = ["string 1", "string 2", "string 3", "string 4", "string 5", 
> "string 6", "string 7", "string 8", "string 9", "string 10", "string 11"]
>  
> result = ""
> for item in data:
> result =  item
> print result
>  
> I want to replace the for loop with another structure to improve 
> performance (as the data list will contain >10,000 string items].  At 
> each iteration of the for loop the result is printed (in fact, the 
> result is sent from the server to a browser one result line at a time)

Any savings you have from optimizing this loop will be completely 
swamped by the network time. Why do you think this is a bottleneck?

You could use
[ sys.stdout.write(some operation on item) for item in data ]

but I consider this bad style and I seriously doubt you will see any 
difference in performance.

> The for loop will be called continuously and this is another reason to 
> look for a potentially better structure preferably a built-in.

What do you mean 'called continuously'?

Kent

Re: [Tutor] Searching through large number of string items

2008-04-10 Thread Dinesh B Vadhia

The 10,000 string items are sorted.

The way the autocomplete works is that when a user enters a char eg. 'f', the 
'f' is sent to the server and returns strings with the char 'f'.  You can limit 
the number of items sent back to the browser (say, limit to between 15 and 
100).  The string items containing 'f' are displayed.  The user can then enter 
another char eg. 'a' to make 'fa'.  The autocomplete plugin will search the 
cache to find all items containing 'fa' but may need to go back to the server 
to collect others.  And, so on.  Equally, the user could backspace the 'f' and 
enter 'k'.  The 'k' will be sent to the server to find strings containing 'k', 
and so on.

One way to solve this is with linear search which as you rightly pointed out 
has horrible performance (and it has!).  I'll try the binary search and let you 
know.  I'll also look at the trie structure.

An alternative is to create an in-memory SQLite database of the string items.  
Any thoughts on that?

Dinesh

- Original Message - 
From: Kent Johnson 
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Sent: Thursday, April 10, 2008 5:20 AM
Subject: Re: [Tutor] List comprehensions

Dinesh B Vadhia wrote:
> Kent
>  
> I'm using a Javascript autocomplete plugin for an online web 
> application/service.  Each time a user inputs a character, the character 
> is sent to the backend Python program which searches for the character 
> in a list of >10,000 string items.  Once it finds the character, the 
> backend will return that string and N other adjacent string items where 
> N can vary from 20 to 150.  Each string item is sent back to the JS in 
> separate print statements.  Hence, the for loop.

Ok, this sounds a little closer to a real spec. What kind of search are 
you doing? Do you really just search for individual characters or are 
you looking for the entire string entered so far as a prefix? Is the 
list of 10,000 items sorted? Can it be?

You need to look at your real problem and find an appropriate data 
structure, rather than showing us what you think is the solution and 
asking how to make it faster.

For example, if what you have is a sorted list of strings and you want to 
find the first string that starts with a given prefix and return the N 
adjacent strings, you could use the bisect module to do a binary search 
rather than a linear search. Binary search of 10,000 items will take 
13-14 comparisons to find the correct location. Your linear search will 
take an average of 5,000 comparisons.
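
A sketch of that prefix lookup with the bisect module, assuming
Python 3.  The function name, the limit parameter and the sample words
are hypothetical:

```python
import bisect


def prefix_matches(sorted_items, prefix, limit=20):
    """Return up to 'limit' items that start with 'prefix'.

    sorted_items must already be sorted; bisect_left finds the first
    position where the prefix could be inserted, which is also where
    the first match (if any) sits.
    """
    start = bisect.bisect_left(sorted_items, prefix)
    matches = []
    for item in sorted_items[start:start + limit]:
        if not item.startswith(prefix):
            break
        matches.append(item)
    return matches


titles = sorted(["fable", "fact", "fad", "far", "fat", "kale", "keep"])
fa_items = prefix_matches(titles, "fa", limit=3)
k_items = prefix_matches(titles, "k")
none_items = prefix_matches(titles, "z")
```

This only helps for "starts with" queries.  A "contains" query cannot
use the sort order in this way.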

You might also want to use a trie structure though I'm not sure if that 
will let you find adjacent items.
http://www.cs.mcgill.ca/~cs251/OldCourses/1997/topic7/
http://jtauber.com/blog/2005/02/10/updated_python_trie_implementation/

> I haven't done any profiling yet as we are still building the system but 
> it seemed sensible that replacing the for loop with a built-in would 
> help.  Maybe not?

Not. An algorithm with poor "big O" performance should be *replaced*, 
not optimized.

Kent


Re: [Tutor] Searching through large number of string items

2008-04-10 Thread Dinesh B Vadhia

Ignore the 'adjacent items' remark.   The rest is correct ie. looking for all 
strings containing a substring x.

- Original Message - 
From: Kent Johnson 
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Sent: Thursday, April 10, 2008 6:32 AM
Subject: Re: [Tutor] Searching through large number of string items

Dinesh B Vadhia wrote:
> The 10,000 string items are sorted.
>  
> The way the autocomplete works is that when a user enters a char eg. 
> 'f', the 'f' is sent to the server and returns strings with the char 
> 'f'. 

If it is all strings containing 'f' (not all strings starting with 'f') 
then the binary search will not work. A database might work better for that.

You can get all strings containing some substring x with
[ item for item in list if x in item ]

Of course that is back to linear search. You mentioned before that you 
want to also show adjacent items? I don't know how to do that with a 
database either.

Kent

[Tutor] SQLite LIKE question

2008-04-10 Thread Dinesh B Vadhia

I'm reading a text file into an in-memory pysqlite table.  When I do a SELECT 
on the table, I get a 'u' in front of each returned row eg.

> (u'QB VII',)
> (u'Quackser Fortune Has a Cousin in the Bronx',)

I've checked the data being INSERT'ed into the table and it has no 'u'.

The second problem is that I'm using the LIKE operator to match a pattern 
against a string but am getting garbage results.  For example, looking for the 
characters q='dog' in each string the SELECT statement is as follows:

for row in con.execute("SELECT  FROM  WHERE  LIKE '%q%' 
limit 25"):
print row

This doesn't work and I've tried other combinations without luck!  Any thoughts 
on the correct syntax for the LIKE?

Dinesh

[Tutor] Fw: SQLite LIKE question

2008-04-11 Thread Dinesh B Vadhia

Try again:  
I'm using the LIKE operator to match a pattern against a string using this 
SELECT statement:

for row in con.execute("SELECT  FROM  WHERE  LIKE '%q%' 
limit 25"):


.. where , ,  are placeholders!

With q="dog" as a test example, I've tried '$q%', '%q%', '%q' and 'q%' and none 
of them return what I expect ie. all strings with the characters "dog" in them.

Cheers!

Dinesh


- Original Message - 
From: Dinesh B Vadhia 
To: tutor@python.org 
Sent: Thursday, April 10, 2008 3:24 PM
Subject: SQLite LIKE question


I'm reading a text file into an in-memory pysqlite table.  When I do a SELECT 
on the table, I get a 'u' in front of each returned row eg.

> (u'QB VII',)
> (u'Quackser Fortune Has a Cousin in the Bronx',)

I've checked the data being INSERT'ed into the table and it has no 'u'.

The second problem is that I'm using the LIKE operator to match a pattern 
against a string but am getting garbage results.  For example, looking for the 
characters q='dog' in each string the SELECT statement is as follows:

for row in con.execute("SELECT  FROM  WHERE  LIKE '%q%' 
limit 25"):
print row

This doesn't work and I've tried other combinations without luck!  Any thoughts 
on the correct syntax for the LIKE?

Dinesh

Re: [Tutor] SQLite LIKE question

2008-04-11 Thread Dinesh B Vadhia

Okay, I've got this now:

> con = sqlite3.connect(":memory:")
> cur = con.cursor()
> cur.execute("""CREATE TABLE db.table(col.a integer, col.b text)""")
> con.executemany("""INSERT INTO db.table(col.a, col.b) VALUES (?, ?)""", m)
> con.commit()

> for row in con.execute("""SELECT col.a, col.b FROM db.table"""):
>     print row
> # when run, all rows are printed correctly but as unicode strings
> q = "dog"
> for row in con.execute("""SELECT col.b FROM db.table WHERE col.b LIKE ? LIMIT 25""", q):
>     print row

.. And, I get the following error:

Traceback (most recent call last):
for row in con.execute("SELECT col.b FROM db.table WHERE col.b LIKE ? LIMIT 
25", q):
ProgrammingError: Incorrect number of bindings supplied. The current 
statement uses 1, and there are 3 supplied.

As Python/pysqlite stores the items in the db.table as unicode strings, I've 
also run the code with q=u"dog" but get the same error. Same with putting the q 
as a tuple ie. (q) in the Select statement.  Btw, there are 73 instances of the 
substring 'dog' in db.table.  

Cheers

Dinesh

[Tutor] Old School

2008-04-11 Thread Dinesh B Vadhia

I belong to the Old School where getting my head around OO is just one big 
pain.  I write software by modularization executed as a set of functions - and 
it works (some call this functional programming!).  Whenever I review Python 
books (eg. Lutz's excellent Programming Python, 3ed) the code is laid out with 
Def's followed by Classes (with their own Def's) which is as it should be.  
But, the Def's on their own (ie. not in Classes) are all of the form:

> def abc(self):

return 

or,

> def xyz(self, ):

return 

I don't use 'self' in my def's - should I?  If so, why?

Thanks!

Dinesh


[Tutor] pysqlite and functions

2008-04-12 Thread Dinesh B Vadhia

I'm using a pysqlite select statement within a def function and it's not 
working because (I suspect) the pysqlite variables are not being declared 
correctly to be used within a def function or the def function is not set up 
correctly.  Here is the code followed by the errors:

 code 

con = sqlite3.connect(":memory:")  # create database/table in memory
cur = con.cursor()  # note: can use the nonstandard execute, executemany to avoid using Cursor object
query = "CREATE TABLE db.table(field.a INTEGER, field.b TEXT)"
cur.execute(query)
query = "INSERT INTO db.table(field.a, field.b) VALUES (?, ?)"
cur.executemany(query, data)

def getResult(q, limit):
    query = "SELECT field.b FROM db.table WHERE field.b LIKE '%s' LIMIT '%s'" % (q, limit)
    for row in cur.execute(query):
        print row
    return

# main program

..
q = 
limit = 
getResult(q, limit)  # call getResult with parameters q and limit
..


 end code 

The error received is:

Traceback (most recent call last):

for row in cur.execute(query):
NameError: global name 'cur' is not defined

Some notes:

1.  The code works perfectly outside of a def function but I need to have it 
working within a def.

2. Clearly, everything inside getResults is private unless declared otherwise.  
As a quick and dirty to force it to work I declared 

> global con, curs, db.table

.. but that results in the same error

3. Moving con and cur into the def statement results in the error:

Traceback (most recent call last):

for row in cur.execute(query):
OperationalError: no such table: db.table

4. The def getResults is not seeing con, curs and db.table even when declared 
as global. 

5. I wonder if this is something specific to pysqlite.
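
A minimal sketch of one way to structure this, written for Python 3 rather than the Python 2 used in the post.  The table name `items` and its columns `a` and `b` are hypothetical stand-ins for db.table, field.a and field.b.  A name assigned at module level can be read inside a function without any `global` declaration, so one thing worth checking is the spelling: the note above declares `curs` while the code uses `cur`.  Passing the cursor in as an argument sidesteps the question entirely.  Calling sqlite3.connect(":memory:") again inside the function opens a second, empty in-memory database, which would be consistent with the "no such table" error in note 3.

```python
import sqlite3

def get_result(cur, q, limit):
    """Return up to `limit` values of column b that contain the substring q."""
    # Both the LIKE pattern and the LIMIT are bound as parameters.
    query = "SELECT b FROM items WHERE b LIKE ? ORDER BY a LIMIT ?"
    return [row[0] for row in cur.execute(query, ("%" + q + "%", limit))]

con = sqlite3.connect(":memory:")   # one connection, created once
cur = con.cursor()
cur.execute("CREATE TABLE items (a INTEGER, b TEXT)")
cur.executemany("INSERT INTO items (a, b) VALUES (?, ?)",
                [(1, "bulldog"), (2, "parrot"), (3, "dog days")])

result = get_result(cur, "dog", 25)  # the same cursor is handed to the function
print(result)
```

The function returns the matches instead of printing them, which makes it easier to reuse from a web handler or a test.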

Cheers!

Dinesh

Re: [Tutor] SQLite LIKE question

2008-04-12 Thread Dinesh B Vadhia

Guys, I got it to work.  The problem was to use pysqlite to search (in memory) 
a large number (>10,000) of string items containing the substring q (and to do 
it continuously with different q's).  The solution was to encase the substring q 
in % ie. '%q%'.  The performance is excellent.  

The code is in my recent post (Subject: pysqlite and functions) with a new 
problem ie. the code works as-is but not within a def function.

Dinesh

..
Date: Fri, 11 Apr 2008 13:20:12 +0100
From: Tim Golden <[EMAIL PROTECTED]>
Subject: Re: [Tutor] SQLite LIKE question
Cc: tutor@python.org
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Dinesh B Vadhia wrote:
> Okay, I've got this now:
> 
>> con = sqlite3.connect(":memory:")
>> cur = con.cursor()
>> cur.execute("""CREATE TABLE db.table(col.a integer, col.b text)""")
>> con.executemany("""INSERT INTO db.table(col.a, col.b) VALUES (?, ?)""", m)
>> con.commit()
> 
>> for row in con.execute("""SELECT col.a, col.b FROM db.table"""):
>>     print row
>> # when run, all rows are printed correctly but as unicode strings
>> q = "dog"
>> for row in con.execute("""SELECT col.b FROM db.table WHERE col.b LIKE ? LIMIT 25""", q):
>>     print row
> 
> .. And, I get the following error:
> 
> Traceback (most recent call last):
> for row in con.execute("SELECT col.b FROM db.table WHERE col.b LIKE ? 
> LIMIT 25", q):
> ProgrammingError: Incorrect number of bindings supplied. The current 
> statement uses 1, and there are 3 supplied.

Whenever you see this in a dbapi context, you can bet your socks
that you're passing a single item (such as a string, q) rather than
a list or tuple of items. Try passing [q] as the second parameter
to that .execute function and see what happens!

TJG
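
A short sketch of the fix described above, in Python 3 syntax.  The table name `items` and the sample rows are made up for illustration.  The string "dog" is itself a sequence of three characters, which is why the error reports 3 bindings supplied.  Note also that (q) is just q in parentheses; a one-element tuple needs the trailing comma, (q,).  The % wildcards go into the bound value, not into the SQL text.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE items (a INTEGER, b TEXT)")
con.executemany("INSERT INTO items (a, b) VALUES (?, ?)",
                [(1, "hot dog"), (2, "cat"), (3, "dogma")])

q = "dog"
pattern = "%" + q + "%"   # wildcards are part of the parameter value

# The second argument is a one-element tuple, so exactly one binding is supplied.
rows = con.execute("SELECT b FROM items WHERE b LIKE ? ORDER BY a LIMIT 25",
                   (pattern,)).fetchall()
matches = [row[0] for row in rows]
print(matches)
```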


[Tutor] in-memory pysqlite databases

2008-04-12 Thread Dinesh B Vadhia

Say, you have already created a pysqlite database "testDB".  In a Python 
program, you connect to the database as:

> con = sqlite3.connect("testDB")
> cur = con.cursor()

To use a database in memory (ie. all the 'testDB' tables are held in memory) 
the pysqlite documentation says the declaration is:

> con = sqlite3.connect(":memory:")
> cur = con.cursor()

But, this can't be right as you're not telling Python/pysqlite which database 
to keep in memory.  I've tried ...

> con = sqlite3.connect("testDB", ":memory:")
> cur = con.cursor()

.. but that didn't work.  Any ideas?

Dinesh

Re: [Tutor] in-memory pysqlite databases

2008-04-12 Thread Dinesh B Vadhia

Bob

An in-memory database that is empty to start, loaded with data, and goes away 
when the connection goes away is exactly what I'm after.  The code and the 
program for an in-memory database works perfectly.  

However, a web version using webpy doesn't work - the error message is that it 
cannot find the database table.  After reading your note, it hit me that an 
execution thread is created by pysqlite and another thread by webpy and hence 
webpy is not seeing the table.  What a pain!

Dinesh

- Original Message - 
From: bob gailer 
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Sent: Saturday, April 12, 2008 11:25 AM
Subject: Re: [Tutor] in-memory pysqlite databases

Dinesh B Vadhia wrote: 
  Say, you have already created a pysqlite database "testDB".  In a Python 
program, you connect to the database as:

  > con = sqlite3.connect("testDB")
  > cur = con.cursor()

  To use a database in memory (ie. all the 'testDB' tables are held in memory) 
the pysqlite documentation says the declaration is:

  > con = sqlite3.connect(":memory:")
  > cur = con.cursor()

  But, this can't be right as you're not telling Python/pysqlite which database 
to keep in memory. 

The documentation says "Creating an in-memory database". That means (to me) a 
new database that is memory resident and as a consequence is empty to start and 
goes away when the connection goes away.

I don't see any easy way to load a file-based db into a memory-based one. Seems 
like you'd need to create all the tables in memory, then run select cursors to 
retrieve from the file-based db and insert the rows into the memory-based db 
tables
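
For readers on a much later Python than the one in this thread: the standard sqlite3 module gained a Connection.backup() method in Python 3.7, which copies a whole database from one connection to another.  The sketch below assumes such a version; the file name, the table `items` and its single row are invented for the demonstration.

```python
import os
import sqlite3
import tempfile

# Build a small file-based database to stand in for "testDB".
path = os.path.join(tempfile.mkdtemp(), "testDB")
disk = sqlite3.connect(path)
disk.execute("CREATE TABLE items (a INTEGER, b TEXT)")
disk.execute("INSERT INTO items VALUES (1, 'QB VII')")
disk.commit()

# Copy every table from the file-based database into a new in-memory one.
mem = sqlite3.connect(":memory:")
disk.backup(mem)
disk.close()

# The in-memory copy stays usable after the file connection is closed.
titles = [row[0] for row in mem.execute("SELECT b FROM items")]
print(titles)
```

On older versions, the table-by-table copy described above remains the approach.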

Why do you want it in memory?

[snip]

-- 
Bob Gailer
919-636-4239 Chapel Hill, NC

Re: [Tutor] in-memory pysqlite databases

2008-04-13 Thread Dinesh B Vadhia

Why do you say: "Now you didn't mention webpy before, that makes a big 
difference!" ?

 
As an aside, it really is a huge pain in the neck that, in general, standard 
Python works (and works wonderfully) but as soon as you include external 
libraries (eg. Numpy, Scipy, webpy - and probably other web frameworks etc. 
etc.) things start to fall apart (badly!).  And, from my experience with Python 
so far, it is not because of my incompetence (well, not most of the time!).
 

Dinesh

..
Date: Sat, 12 Apr 2008 23:23:30 +0100
From: "Alan Gauld" <[EMAIL PROTECTED]>
Subject: Re: [Tutor] in-memory pysqlite databases
To: tutor@python.org
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; format=flowed; charset="Windows-1252";
reply-type=original


"Dinesh B Vadhia" <[EMAIL PROTECTED]> wrote 

> However, a web version using webpy doesn't work 

Now you didn't mention webpy before, that makes a big 
difference!

> an execution thread is created by pysqlite and 
> another thread by webpy and hence webpy is not 
> seeing the table.  

Almost certainly the case but if you are using the web 
you can almost certainly afford to use a file based 
SQLite database and that way the data can be shared. 
The network delays will more than overcome the 
slowdown of moving to the file based database.


-- 
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld



Re: [Tutor] in-memory pysqlite databases

2008-04-13 Thread Dinesh B Vadhia

Alan

Your last paragraph is the gist of my note ie. it's the documentation, 
documentation, documentation.

In addition to Python, we use Numpy/Scipy/webpy at the server - all of them 
Python libraries written in Python and/or C - and have faced no end of problems 
with these libraries.

We also use HTML/CSS/JavaScript/JQuery at the browser and so far we've had zero 
problems.  Of course, these tools are fully documented including the dead tree 
type!

Cheers

Dinesh

[Tutor] encode unicode strings from pysqlite

2008-04-14 Thread Dinesh B Vadhia

Here is a program that SELECT's from a pysqlite database table and encode's the 
returned unicode strings:

import sys
import os
import sqlite3

con = sqlite3.connect("testDB.db")
cur = con.cursor()

a = u'99 Cycling Swords'
b = a.encode('utf-8')
print b

q = '%wor%'
limit = 25
query = "SELECT fieldB FROM testDB WHERE fieldB LIKE '%s' LIMIT '%s'" % (q, limit)
for row in cur.execute(query):
    r = str(row)
    print r.encode('utf-8')


The print b results in: 99 Cycling Swords ... which is what I want.

But, the print r.encode('utf-8') leaves the strings as unicode strings eg. u'99 
Cycling Swords'

Any ideas what might be going on?

Dinesh





Re: [Tutor] encode unicode strings from pysqlite

2008-04-14 Thread Dinesh B Vadhia

Hi! Kent.  The row[0].encode('utf-8') works perfectly within a standalone 
program.  But didn't work within webpy until I realized that maybe webpy is 
storing the row as a dictionary (which it does) and that you have to get the 
string by the key (ie. 'fieldB').  That worked and also webpy encodes the 
unicode string at the same time.  Here are the details:

# standard Python: testDB.py
con = sqlite3.connect("testDB.db")
cur = con.cursor()
query = "SELECT fieldB FROM testDB WHERE fieldB LIKE '%s' LIMIT '%s'" % (q, limit)
for row in cur.execute(query):    # row is a list
    print row[0].encode('utf-8')  # works perfectly!

# webpy: testDB2.py
web.config.db_parameters = dict(dbn='sqlite', db="testDB.db")
for row in web.select('testDB',
                      what='fieldB',
                      where='fieldB LIKE $q',
                      limit=limit,
                      vars={'q': q}):
    r = row['fieldB']  # get encode'd unicode through dict key value
    print r            # works perfectly!

- Original Message - 
From: Kent Johnson 
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Sent: Monday, April 14, 2008 3:42 AM
Subject: Re: [Tutor] encode unicode strings from pysqlite

Dinesh B Vadhia wrote:
> Here is a program that SELECT's from a pysqlite database table and 
> encode's the returned unicode strings:

> query = "SELECT fieldB FROM testDB WHERE fieldB LIKE '%s' LIMIT '%s'" 
> %(q, limit)
> for row in cur.execute(query):

Here row is a list containing a single unicode string. When you convert 
a list to a string, it converts the list elements to strings using the 
repr() function. The repr() of a unicode string includes the u'' as part 
of the result.

In [64]: row = [u'99 Cycling Swords']
In [65]: str(row)
Out[65]: "[u'99 Cycling Swords']"

Notice that the above is a string that includes u' as part of the string.

What you need to do is pick out the actual data and encode just that to 
a string.
In [62]: row[0].encode('utf-8')
Out[62]: '99 Cycling Swords'
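
The same distinction, sketched for Python 3, where text strings no longer show a u prefix but str() of a whole row still gives the repr of the container rather than the data.  The row value is copied from the example above; the variable names are only illustrative.

```python
row = ("99 Cycling Swords",)     # shape of a one-column row from sqlite3

as_text = str(row)               # repr of the whole tuple: quotes, comma, parentheses
value = row[0]                   # the actual column value
encoded = value.encode("utf-8")  # bytes, ready to be written to a file or socket

print(as_text)
print(value)
```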

Kent

[Tutor] Loading and using large sparse matrices under Windows

2008-04-27 Thread Dinesh B Vadhia

Hi!  Does anyone on this list have experience of using the Scipy Sparse matrix 
library for loading and using very large datasets (>20,000 rows x >1m columns 
of integers) under Windows?

I'm using a recent Scipy svn that supports (sparse) integer matrices but it 
still causes the pythonw.exe program to abort for the larger datasets.  I have 
ample RAM to create, load and use the matrices.

I posted a note on the Scipy list but thought I'd try here too as you always 
get a response!  Plus, I need a solution to the problem pdq.  Thanks!

Dinesh

[Tutor] Equivalent 'case' statement

2008-05-22 Thread Dinesh B Vadhia

Is there an equivalent to the C/C++ 'case' (or 'switch') statement in Python?

Dinesh

Re: [Tutor] Equivalent 'case' statement

2008-05-24 Thread Dinesh B Vadhia

The dictionary of functions was the way to go and does perform much faster than 
if/elif's.  Thank-you!  


- Original Message - 
From: inhahe 
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Sent: Thursday, May 22, 2008 4:15 PM
Subject: Re: [Tutor] Equivalent 'case' statement


no, but you can
a) use elifs
if c==1:
  do this
elif c==2:
  do this
elif c==3:
  do this

b) make a dictionary of functions (this is faster)

def case1(): do this
def case2(): do that
def case3(): do the other

cases = {1: case1, 2: case2, 3: case3}

cases[c]()

if your functions are one expression you could use lambdas

cases = {
    1: lambda: x*2,
    2: lambda: y**2,
    3: lambda: sys.stdout.write("hi\n"),
}

cases[c]()

your functions and lambdas can also take parameters of course
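
A runnable sketch of the dictionary-of-functions idea with parameters and a default branch.  The function names `double`, `square`, `unknown` and `dispatch` are invented for the example; dict.get() plays the role of the `default:` label in a C switch.

```python
def double(x):
    return x * 2

def square(x):
    return x ** 2

def unknown(x):
    # Fallback, comparable to `default:` in a C switch statement.
    raise ValueError("no case for this key")

cases = {1: double, 2: square}

def dispatch(c, x):
    # Look up the handler for c, fall back to `unknown`, then call it with x.
    return cases.get(c, unknown)(x)

results = [dispatch(1, 5), dispatch(2, 5)]
print(results)
```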




On Thu, May 22, 2008 at 5:53 PM, Dinesh B Vadhia
<[EMAIL PROTECTED]> wrote:
> Is there an equivalent to the C/C++ 'case' (or 'switch') statement in
> Python?
>
> Dinesh
>

[Tutor] finding special character string

2008-06-01 Thread Dinesh B Vadhia

A text document has special character strings defined as "." + "set of 
characters" + ".".  For example, ".sup." or ".quadbond." or ".degree." etc.  
The length of the characters between the opening "." and closing "." is 
variable.

Assuming that you don't know beforehand all possible special character strings, 
how do you find all such character strings in the text document?

Dinesh

Re: [Tutor] finding special character string

2008-06-01 Thread Dinesh B Vadhia

Thank-you Kent - it works a treat!


- Original Message - 
From: Kent Johnson 
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Sent: Sunday, June 01, 2008 4:25 AM
Subject: Re: [Tutor] finding special character string


On Sun, Jun 1, 2008 at 6:48 AM, Dinesh B Vadhia
<[EMAIL PROTECTED]> wrote:
> A text document has special character strings defined as "." + "set of
> characters" + ".".  For example, ".sup." or ".quadbond." or ".degree." etc.
> The length of the characters between the opening "." and closing "." is
> variable.
>
> Assuming that you don't know beforehand all possible special character
> strings, how do you find all such character strings in the text document?

Assuming the strings are non-overlapping, i.e. the closing "." of one
string is not the opening "." of another, you can find them all with
  import re
  re.findall(r'\..*?\.', text)

Kent

Re: [Tutor] finding special character string

2008-06-03 Thread Dinesh B Vadhia

Yes, I'm happy because I found a non-regex way to solve the problem (see below).

No, I'm not a student or worn out but wish I was back at college and partying!

Yes, this is an interesting problem and here is the requirement:

- A text document contains special words that start and end with a period 
("."), the word between the start and end periods contain no punctuation or 
spaces except a hyphen in some special words.
- Examples of special words include ".thrfore.", ".because.", ".music-sharp.", 
".music-flat.", ".dbd.", ".vertline.", ".uparw.", ".hoarfrost." etc.
- In most cases, the special words have a space (" ") before and after.
- In some cases, a special word will be followed by one or two other special 
words eg. ".dbd..vertline." or ".music-flat..dbd..vertline."
- In some cases, a special word will be followed by an ordinary word (with or 
without punctuation) eg. ".music-flat.mozart" or ".vertline.isn't"
- A special word followed by an ordinary word (with or without punctuation) 
could be the end of a sentence and hence have a full-stop (".") eg. 
".music-flat.mozart." or ".vertline.isn't."
- The number of characters in a special word excluding the two periods is > 1
- Find and remove all special words from the text document (by processing one 
line at a time)

How did I solve it?  I found a list of all the special words, created a set of 
special words and then checked if each word in the text belonged to the set of 
special words.  If we assume that the list of special words doesn't exist then 
the problem is interesting in itself to solve.
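
For the case where the list of special words does not exist, the requirement above can be approximated with a regular expression.  This is only a sketch under stated assumptions: special words are taken to contain ASCII letters and internal hyphens only, and the opening period must not directly follow a letter or digit (so the full stop in "mozart." is not mistaken for the start of a special word).  The names `SPECIAL` and `remove_special_words` are made up here.

```python
import re

# Opening ".", at least two characters made of letters with optional internal
# hyphens, then a closing ".".  The lookbehind rejects a "." that directly
# follows a letter or digit, i.e. one that ends an ordinary word.
SPECIAL = re.compile(r"(?<![A-Za-z0-9])\.[A-Za-z][A-Za-z-]*[A-Za-z]\.")

def remove_special_words(line):
    """Remove every special word from one line of text."""
    return SPECIAL.sub("", line)

sample = "a note .music-flat.mozart. then .dbd..vertline. and more."
found = SPECIAL.findall(sample)
cleaned = remove_special_words(sample)
print(found)
print(cleaned)
```

Removal can leave doubled spaces behind, so a further whitespace clean-up may be wanted.  Text such as " .com." would also match, so the pattern should be checked against real data before being trusted.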

Cheers!

Dinesh




Date: Sun, 1 Jun 2008 21:56:26 -0400
From: "Kent Johnson" <[EMAIL PROTECTED]>
Subject: Re: [Tutor] finding special character string
To: "Marilyn Davis" <[EMAIL PROTECTED]>
Cc: tutor@python.org
Message-ID:
<[EMAIL PROTECTED]>
Content-Type: text/plain; charset=ISO-8859-1

On Sun, Jun 1, 2008 at 9:41 PM, Marilyn Davis <[EMAIL PROTECTED]> wrote:

> Yeh, we need a better spec. I was wondering if the stuff between the text
> ought not include white space, or even a word boundary.  A character class
> might be better, if we knew.

Hmm, yes, my regex will find many ordinary sentences in plain text.

> Anyhow, I think we wore out the student. :^)

He went away happy after my first reply.

Kent



[Tutor] zip and rar files

2008-06-07 Thread Dinesh B Vadhia

Does the Python zipfile module work on rar archives?  If not, does a similar 
module exist for rar archives?

Dinesh

Re: [Tutor] zip and rar files

2008-06-08 Thread Dinesh B Vadhia

The zipfile module does work on rar zip archives.

- Original Message - 
From: Dinesh B Vadhia 
To: tutor@python.org 
Sent: Saturday, June 07, 2008 8:27 AM
Subject: zip and rar files

Does the Python zipfile module work on rar archives?  If not, does a similar 
module exist for rar archives?

Dinesh

[Tutor] Extracting text from XML document

2008-06-08 Thread Dinesh B Vadhia

I want to extract text from XML (and SGML) documents.  I found one program by 
Paul Prescod (http://aspn.activestate.com/ASPN/Cookbook/Python/Recipe/65128) 
from 2001.  Does anyone know of any programs that are more recent?  Cheers

Dinesh

[Tutor] endless processing through for loop

2008-06-22 Thread Dinesh B Vadhia

I have a program with 2 for loops like this (in pseudocode):

fw = open('newLine.txt', 'w')
for i in xrange(0, 700000, 1):
    read a file fname from folder
    for line in open(fname, 'r'):
        do some simple string processing on line
        fw.write(newline)
fw.close()

That's it.  Very simple but after i reaches about 550,000 the program begins to 
crawl.  As an example, the loops to 550,000 takes about an hour.  From 550,000 
to 580,000 takes an additional 4 hours.

Any ideas about what could be going on?
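
The thread does not establish what caused the slowdown, so the following is not a diagnosis.  It is only a Python 3 sketch of the same loop with every input file closed as soon as it has been read, which rules out one possible source of resource build-up.  The helper names and the upper-casing step are placeholders for the real "simple string processing".

```python
import os
import tempfile

def process_line(line):
    # Placeholder for the real string processing.
    return line.strip().upper() + "\n"

def process_folder(folder, out_path):
    """Process every file in `folder` line by line into one output file."""
    count = 0
    with open(out_path, "w") as fw:
        for name in sorted(os.listdir(folder)):
            # The `with` block closes each input file before the next is opened.
            with open(os.path.join(folder, name), "r") as fr:
                for line in fr:
                    fw.write(process_line(line))
                    count += 1
    return count

# Small demonstration with three temporary input files.
folder = tempfile.mkdtemp()
for i in range(3):
    with open(os.path.join(folder, "f%d.txt" % i), "w") as f:
        f.write("line %d\n" % i)
out_path = os.path.join(tempfile.mkdtemp(), "newLine.txt")
written = process_folder(folder, out_path)
print(written)
```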

Dinesh



Re: [Tutor] endless processing through for loop

2008-06-22 Thread Dinesh B Vadhia

There is no thrashing of disk as I have > 2gb RAM and I'm not keeping the file 
contents in memory.  One line is read at a time, some simple string processing 
and then writing out the modified line.

From: Kent Johnson 
Sent: Sunday, June 22, 2008 5:39 PM
To: Dinesh B Vadhia 
Cc: tutor@python.org 
Subject: Re: [Tutor] endless processing through for loop

On Sun, Jun 22, 2008 at 8:13 PM, Dinesh B Vadhia
<[EMAIL PROTECTED]> wrote:
> That's it.  Very simple but after i reaches about 550,000 the program begins
> to crawl.  As an example, the loops to 550,000 takes about an hour.  From
> 550,000 to 580,000 takes an additional 4 hours.
>
> Any ideas about what could be going on?

What happens to memory use? Does it start to thrash the disk? Are you
somehow keeping the file contents in memory for all the files you
read?

Kent

[Tutor] removing whole numbers from text

2008-08-02 Thread Dinesh B Vadhia

I want to remove whole numbers from text but retain numbers attached to words.  
All whole numbers to be removed have a leading and trailing space.

For example, in "the cow jumped-20 feet high30er than the lazy 20 timing fox 
who couldn't keep up the 865 meter race." remove the whole numbers 20 and 865 
but keep the 20 in jumped-20 and the 30 in high30er.

What is the best way to do this using re?
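
One possible approach, sketched in Python 3 and relying on the stated rule that every whole number to be removed has a space on both sides.  The pattern consumes the leading space and the digits and uses a lookahead for the trailing space, so that space is left in place and two numbers in a row are both caught.  The name `WHOLE_NUMBER` is just for the example.

```python
import re

# A space, one or more digits, and (without consuming it) a following space.
WHOLE_NUMBER = re.compile(r" \d+(?= )")

text = ("the cow jumped-20 feet high30er than the lazy 20 timing fox "
        "who couldn't keep up the 865 meter race.")
cleaned = WHOLE_NUMBER.sub("", text)
print(cleaned)
```

Numbers at the very start or end of the text, or followed by punctuation, are deliberately left alone because they do not meet the space-on-both-sides rule.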

Dinesh






[Tutor] array and dictionary

2008-09-20 Thread Dinesh B Vadhia

Hi!  Say, I've got a numpy array/matrix of the form:

[[1 6 1 2 3]
 [4 5 4 7 0]
 [2 0 8 0 2]
 [8 2 6 3 0]
 [0 7 0 3 5]
 [8 0 3 0 6]
 [8 0 0 2 2]
 [3 1 0 4 0]
 [5 0 8 0 0]
 [2 1 0 5 6]]

And, I want to create a dictionary of rows (as the keys) mapped to lists of 
non-zero numbers in that row ie.

dictionary_non_zeros = {
    0: [1, 6, 1, 2, 3],
    1: [4, 5, 4, 7],
    2: [2, 8, 2],
    ...
    9: [2, 1, 5, 6],
}

How do I do this?

Thanks!

Dinesh

Re: [Tutor] array and dictionary

2008-09-21 Thread Dinesh B Vadhia

Alan

Thanks but I've been a bit daft and described the wrong problem which is easy 
to solve the long way.  Starting again ...

Given a (numpy) array, how do you create a dictionary of lists where each list 
contains the column indexes of the non-zero elements and the dictionary key is 
the row index?  The easy way is 2 for loops ie.

import numpy
from collections import defaultdict

A = numpy.array([[1, 6, 1, 2, 3],
                 [4, 5, 4, 7, 0],
                 [2, 0, 8, 0, 2],
                 [0, 0, 0, 3, 7]])

d = defaultdict(list)   # row index -> column indexes of non-zero elements
I = A.shape[0]
J = A.shape[1]
for i in xrange(0, I, 1):
    for j in xrange(0, J, 1):
        if A[i,j] > 0:
            d[i].append(j)

I want to find a faster/efficient way to do this without using the 2 for loops. 
 Thanks!
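
One candidate, sketched in Python 3: numpy.nonzero() returns the row and column indexes of all non-zero elements in row-major order, which leaves a single pass over the non-zero entries instead of a nested loop over every cell.  It tests for non-zero rather than > 0, which is equivalent only when the array has no negative values.  The variable names are illustrative, and whether it is faster in practice would need timing on the real data.

```python
from collections import defaultdict

import numpy as np

A = np.array([[1, 6, 1, 2, 3],
              [4, 5, 4, 7, 0],
              [2, 0, 8, 0, 2],
              [0, 0, 0, 3, 7]])

# Parallel arrays of row and column indexes, one pair per non-zero element.
rows, cols = np.nonzero(A)

nonzero_cols = defaultdict(list)   # row index -> column indexes of non-zeros
for i, j in zip(rows.tolist(), cols.tolist()):
    nonzero_cols[i].append(j)

print(dict(nonzero_cols))
```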

Btw, I posted this on the numpy list too to make sure that there aren't any 
numpy functions that would help.

Dinesh




Message: 5
Date: Sun, 21 Sep 2008 09:15:00 +0100
From: "Alan Gauld" <[EMAIL PROTECTED]>
Subject: Re: [Tutor] array and dictionary
To: tutor@python.org
Message-ID: <[EMAIL PROTECTED]>
Content-Type: text/plain; format=flowed; charset="iso-8859-1";
reply-type=original

"Dinesh B Vadhia" <[EMAIL PROTECTED]> wrote

> Hi!  Say, I've got a numpy array/matrix of the form:
>
> [[1 6 1 2 3]
>  [4 5 4 7 0]...
>  [2 1 0 5 6]]
> 
> I want to create a dictionary of rows (as the keys) mapped 
> to lists of non-zero numbers in that row

Caveat, I don't know about numpy arrays. But assuming they 
act like Python lists

You can get the non zeros with a comprehension

nz = [n for n in row if n != 0]

you can get the row and index using enumerate

for n,r in enumerate(arr):

So to create a dictionary, combine the elements something like:

d = {}
for n,r in enumerate(arr):
    d[n] = [v for v in r if v != 0]

I'm sure you could do it all in one line if you really wanted to!
Also the new any() function might be usable too.

All untested

HTH,

-- 
Alan Gauld
Author of the Learn to Program web site
http://www.freenetpages.co.uk/hp/alan.gauld
