Python/Django Developer

2008-10-24 Thread harvey


My client in Jersey City, NJ 07302 is looking for a Python Developer. Below
is the job description:


Job Summary:

This is a programming position in the technical department of Advance
Internet, working on application development, application integration,
automated testing and deployment of applications, publishing structure and
unit testing in various development environments.  

Job Functions: 

   Develop extensible online applications
   Integrate vendor code into web tier
   Write and Maintain Unit Tests
   Develop and Maintain Automated Testing Frameworks
   Perform Load/Performance Testing
   Maintain & Enhance Build & Deployment Scripts
   Liaise with programmers (internal & external)
   Ability to set development standards
   Oversee incorporation of applications into web tier
   Assess stability of existing applications
   Coordinate conversion from legacy systems

Supervisory Responsibilities:

   None

Required Knowledge, Skills and Abilities:
Candidate needs to be aggressive in learning new things as well as taking
responsibility for work product and meeting deadlines with minimal
supervision.  They need to have worked in an online environment and have
published applications that have withstood live deployment. 
   Open source familiarity
   Django framework
   Python
   Other frameworks
   At least 2 standard templating languages such as Velocity, PHP, JSP
   Knowledge of quality control methods and philosophy
   Linux command line proficiency
   ANT/Maven, Build, Make
   Project management experience
   Excellent written and oral communication skills

Desired Skills/Experience:

   Moveable Type application knowledge
   Developing for a clustered server environment
   Ability to read/understand C 
   OO Perl

--
http://mail.python.org/mailman/listinfo/python-list

Error with math.sqrt

2017-01-07 Thread Jack Harvey

I'm starting out with Python 3.5.  My current frustration is with:


>>> math.sqrt(25)
Traceback (most recent call last):
  File "", line 1, in 
math.sqrt(25)
NameError: name 'math' is not defined
>>>


Advice?
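
The traceback says the name `math` was never bound in the session. A minimal sketch of the usual fix, assuming nothing else is wrong with the setup, is to import the module before calling it:

```python
import math  # binds the name 'math' so math.sqrt can be looked up

root = math.sqrt(25)
print(root)
```
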


Jack
-- 
https://mail.python.org/mailman/listinfo/python-list

variable attribute name

2014-10-27 Thread Harvey Greenberg

I want to let the name of an attribute be the string value of a variable.  Here 
is some code:

class Object(object): pass
A = Object()
s = 'attr'
A. = 1

The last line denotes the variable value by  (not a python form).  What I 
want is to have A.attr = 1, but 'attr' determined by the value of s.  Please 
advise.
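
One way to do this, sketched here with the names from the question, is the built-in `setattr`, which takes the attribute name as a string; `getattr` reads it back the same way:

```python
class Object(object):
    pass

A = Object()
s = 'attr'
setattr(A, s, 1)            # same effect as A.attr = 1, with the name taken from s
print(A.attr)
print(getattr(A, s))        # look the attribute up by the string again
```
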
-- 
https://mail.python.org/mailman/listinfo/python-list

how to read list from file

2013-10-05 Thread Harvey Greenberg

I am looping as for L in file.readlines(), where file is csv.

L is a list of 3 items, eg, [{'a':1, 'b':2}, [1,2,3], 10] Note that the first 
item is a dir and 2nd is a list, so parsing with split doesn't work.  Is there 
a way to convert L, which is a string, to the list of 3 items I want?

-- 
https://mail.python.org/mailman/listinfo/python-list

Re: how to read list from file

2013-10-06 Thread Harvey Greenberg

On Saturday, October 5, 2013 7:24:39 PM UTC-6, Tim Chase wrote:
> On 2013-10-05 18:08, Harvey Greenberg wrote:
> > I am looping as for L in file.readlines(), where file is csv.
> >
> > L is a list of 3 items, eg, [{'a':1, 'b':2}, [1,2,3], 10] Note that
> > the first item is a dir and 2nd is a list, so parsing with split
> > doesn't work.  Is there a way to convert L, which is a string, to
> > the list of 3 items I want?
>
> sounds like you want ast.literal_eval():
>
>   Python 2.7.3 (default, Jan  2 2013, 13:56:14)
>   [GCC 4.7.2] on linux2
>   Type "help", "copyright", "credits" or "license" for more
>   information.
>   >>> s = "[{'a':1, 'b':2}, [1,2,3], 10]"
>   >>> import ast
>   >>> print repr(ast.literal_eval(s))
>   [{'a': 1, 'b': 2}, [1, 2, 3], 10]
>
> -tkc

that didn't work.  printing it looks like the list because it's the input, but 
try printing len(repr(ast.literal_eval(s))).  It should give 3, but it gives 72 
(number of chars).
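
For what it's worth, `repr()` turns the parsed list back into a string, so `len(repr(...))` counts characters. A small sketch of the difference, using the same sample string as the quoted session:

```python
import ast

s = "[{'a':1, 'b':2}, [1,2,3], 10]"
items = ast.literal_eval(s)      # a real list, not a string
print(len(items))                # number of items in the list
print(type(repr(items)))         # repr() gives a str again
```
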
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: how to read list from file

2013-10-06 Thread Harvey Greenberg

On Sunday, October 6, 2013 10:41:33 AM UTC-6, Harvey Greenberg wrote:
> On Saturday, October 5, 2013 7:24:39 PM UTC-6, Tim Chase wrote:
> > On 2013-10-05 18:08, Harvey Greenberg wrote:
> > > I am looping as for L in file.readlines(), where file is csv.
> > >
> > > L is a list of 3 items, eg, [{'a':1, 'b':2}, [1,2,3], 10] Note that
> > > the first item is a dir and 2nd is a list, so parsing with split
> > > doesn't work.  Is there a way to convert L, which is a string, to
> > > the list of 3 items I want?
> >
> > sounds like you want ast.literal_eval():
> >
> >   Python 2.7.3 (default, Jan  2 2013, 13:56:14)
> >   [GCC 4.7.2] on linux2
> >   Type "help", "copyright", "credits" or "license" for more
> >   information.
> >   >>> s = "[{'a':1, 'b':2}, [1,2,3], 10]"
> >   >>> import ast
> >   >>> print repr(ast.literal_eval(s))
> >   [{'a': 1, 'b': 2}, [1, 2, 3], 10]
> >
> > -tkc
>
> that didn't work.  printing it looks like the list because it's the input,
> but try printing len(repr(ast.literal_eval(s))).  It should give 3, but it
> gives 72 (number of chars).

None of the responses worked; after import json, I used:

  for line in inputFile.readlines():
      L = json.loads(line.replace("",""))
      print L, len(L)

I get an error.  I probably misunderstood how to implement these suggestions, but 
I wrote a list to a csv file whose members have lists.  I now want to read them 
(in another program) and end up with the original list.
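
A round trip along these lines might look like the sketch below. It assumes the writing program can be changed to emit one JSON document per line (JSON needs double-quoted strings, which a plain `str()` of a Python list does not produce); `io.StringIO` stands in for the real file and the sample row is made up:

```python
import io
import json

rows = [[{'a': 1, 'b': 2}, [1, 2, 3], 10]]

# Writer side: one JSON document per line.
buf = io.StringIO()
for row in rows:
    buf.write(json.dumps(row) + '\n')

# Reader side: parse each line back into a list.
buf.seek(0)
restored = [json.loads(line) for line in buf]
print(restored[0], len(restored[0]))
```
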
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: how to read list from file

2013-10-06 Thread Harvey Greenberg

On Saturday, October 5, 2013 7:08:08 PM UTC-6, Harvey Greenberg wrote:
> I am looping as for L in file.readlines(), where file is csv.
> 
> 
> 
> L is a list of 3 items, eg, [{'a':1, 'b':2}, [1,2,3], 10] Note that the first 
> item is a dir and 2nd is a list, so parsing with split doesn't work.  Is 
> there a way to convert L, which is a string, to the list of 3 items I want?

Yay It worked.  Thanks!
-- 
https://mail.python.org/mailman/listinfo/python-list

Re: Insert characters into string based on re ?

2006-10-13 Thread harvey . thomas


Matt wrote:
> I am attempting to reformat a string, inserting newlines before certain
> phrases. For example, in formatting SQL, I want to start a new line at
> each JOIN condition. Noting that strings are immutable, I thought it
> best to split the string at the key points, then join with '\n'.
>
> Regexps can seem the best way to identify the points in the string
> ('LEFT.*JOIN' to cover 'LEFT OUTER JOIN' and 'LEFT JOIN'), since I need
> to identify multiple locations in the string. However, the re.split
> method returns the list without the split phrases, and re.findall does
> not seem useful for this operation.
>
> Suggestions?

I think that re.sub is a more appropriate method rather than split and
join

trivial example (non SQL):

>>> import re
>>> addnlre = re.compile('LEFT\s.*?\s*JOIN|RIGHT\s.*?\s*JOIN',
...                      re.DOTALL + re.IGNORECASE).sub
>>> addnlre(lambda x: x.group() + '\n', '... LEFT JOIN x RIGHT OUTER join y')
'... LEFT JOIN\n x RIGHT OUTER join\n y'
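
The original question asked for the newline before each JOIN phrase rather than after it. A sketch of that variant, reusing the same pattern with a made-up SQL string, just moves the '\n' to the front of the replacement:

```python
import re

# Same alternation as above; the sample SQL is invented for illustration.
join_re = re.compile(r'LEFT\s.*?\s*JOIN|RIGHT\s.*?\s*JOIN',
                     re.DOTALL | re.IGNORECASE)

sql = 'SELECT * FROM a LEFT JOIN b ON x RIGHT OUTER join c ON y'
formatted = join_re.sub(lambda m: '\n' + m.group(), sql)
print(formatted)
```
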

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Need a Regular expression to remove a char for Unicode text

2006-10-13 Thread harvey . thomas


శ్రీనివాస wrote:
> Hai friends,
> Can any one tell me how can i remove a character from a Unicode text.
> కల్‌&హార is a Telugu word in Unicode. Here i want to
> remove '&' but not replace with a zero width char. And one more thing,
> if any whitespaces are there before and after '&' char, the text should
> be kept as it is. Please tell me how can i workout this with regular
> expressions.
>
> Thanks and regards
> Srinivasa Raju Datla

Don't know anything about Telugu, but is this the approach you want?

>>> x=u'\xfe\xff & \xfe\xff \xfe\xff&\xfe\xff'
>>> noampre = re.compile(r'(?<!\s)&(?!\s)').sub
>>> noampre('', x)
u'\xfe\xff & \xfe\xff \xfe\xff\xfe\xff'

The regular expression has negative look behind and look ahead
assertions to check that there is no whitespace surrounding the '&'
character. Each match found is then replaced with the empty string.
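
The same idea as a self-contained sketch: a negative lookbehind `(?<!\s)` and a negative lookahead `(?!\s)` around the '&'. The sample text (two Telugu letters) is only an illustration, not the poster's data:

```python
import re

# '&' with no whitespace on either side is removed; ' & ' is left alone.
no_amp = re.compile(r'(?<!\s)&(?!\s)')

x = '\u0c15 & \u0c32 \u0c15&\u0c32'
result = no_amp.sub('', x)
print(repr(result))
```
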

-- 
http://mail.python.org/mailman/listinfo/python-list

efficiency question

2006-06-30 Thread David Harvey

Hi,

Suppose I write

if x in ("abc", "def", "xyz"):
    doStuff()

elif x in ("pqr", "tuv", "123"):
    doOtherStuff()

elif ...

etc.

When is python building the tuples? Does it need to build the tuple  
every time it comes through this code? Or does it somehow recognise  
that they are constant and cache them?

In other words, is anything gained (efficiency-wise) by first putting  
say

tuple1 = ("abc", "def", "xyz")
tuple2 = ("pqr", "tuv", "123")

somewhere where I know it's executed once, and then writing

if x in tuple1:
    doStuff()

elif x in tuple2:
    doOtherStuff()

elif ...



(The tuples I have in mind are of course much longer than three  
elements)

Many thanks

David

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: efficiency question

2006-06-30 Thread David Harvey

Fredrik Lundh wrote:

 > when in doubt, ask the compiler:

MTD wrote:

 > >>> dis.dis(cod)


Thanks so much guys! Python just gets cooler every day!

David
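
For anyone reading the archive later, here is a sketch of the "ask the compiler" check. The function name `classify` is made up; on the CPython versions I know of, a tuple of string literals is stored once as a constant of the code object rather than rebuilt on every pass, but that is an implementation detail and may differ elsewhere:

```python
import dis

def classify(x):
    if x in ("abc", "def", "xyz"):
        return "first"
    elif x in ("pqr", "tuv", "123"):
        return "second"
    return "other"

dis.dis(classify)                       # look for a LOAD_CONST of the whole tuple
consts = classify.__code__.co_consts    # constants stored with the code object
print(consts)
```
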

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: xml.dom.minidom: how to preserve CRLF's inside CDATA?

2007-05-22 Thread harvey . thomas

On May 22, 2:45 pm, "sim.sim" <[EMAIL PROTECTED]> wrote:
> Hi all.
> i'm faced to trouble using minidom:
>
> #i have a string (xml) within CDATA section, and the section includes
> "\r\n":
> iInStr = '\n\n'
>
> #After i create DOM-object, i get the value of "Data" without "\r\n"
>
> from xml.dom import minidom
> iDoc = minidom.parseString(iInStr)
> iDoc.childNodes[0].childNodes[0].data # it gives u'BEGIN:VCALENDAR
> \nEND:VCALENDAR\n'
>
> according to http://www.w3.org/TR/REC-xml/#sec-line-ends
>
> it looks normal, but another part of the documentation says that "only
> the CDEnd string is recognized as markup":
> http://www.w3.org/TR/REC-xml/#sec-cdata-sect
>
> so parser must (IMHO) give the value of CDATA-section "as is" (neither
> both of parts of the document do not contradicts to each other).
>
> How to get the value of CDATA-section with preserved all symbols
> within? (perhaps use another parser - which one?)
>
> Many thanks for any help.

You will lose the \r characters. From the document you referred to
"""
This section defines some symbols used widely in the grammar.

S (white space) consists of one or more space (#x20) characters,
carriage returns, line feeds, or tabs.

White Space
[3] S ::= (#x20 | #x9 | #xD | #xA)+

Note:

The presence of #xD in the above production is maintained purely for
backward compatibility with the First Edition. As explained in 2.11
End-of-Line Handling, all #xD characters literally present in an XML
document are either removed or replaced by #xA characters before any
other processing is done. The only way to get a #xD character to match
this production is to use a character reference in an entity value
literal.
"""

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: just a bug (was: xml.dom.minidom: how to preserve CRLF's inside CDATA?)

2007-05-25 Thread harvey . thomas

On May 25, 12:03 pm, "sim.sim" <[EMAIL PROTECTED]> wrote:
> On 25 May, 12:45, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
>
> > In <[EMAIL PROTECTED]>, sim.sim wrote:
> > > Below the code that tries to parse a well-formed xml, but it fails
> > > with error message:
> > > "not well-formed (invalid token): line 3, column 85"
>
> > How did you verified that it is well formed?  `xmllint` barf on it too.
>
> you can try to write iMessage to file and open it using Mozilla
> Firefox (web-browser)
>
> > > The "problem" within CDATA-section: it consists a part of utf-8
> > > encoded string which was split (widely used for memory limited
> > > devices).
>
> > > When minidom parses the xml-string, it fails because it tries to convert
> > > into unicode the data within CDATA-section, instead of just to return the
> > > value of the section "as is". The conversion contradicts the
> > > specification http://www.w3.org/TR/REC-xml/#sec-cdata-sect
>
> > An XML document contains unicode characters, so does the CDATA section.
> > CDATA is not meant to put arbitrary bytes into a document.  It must
> > contain valid characters of this type
> > http://www.w3.org/TR/REC-xml/#NT-Char (linked from the grammar of CDATA in
> > your link above).
>
> > Ciao,
> > Marc 'BlackJack' Rintsch
>
> my CDATA-section contains only symbols in the range specified for
> Char:
> Char ::=   #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
> [#x1-#x10]
>
> filter(lambda x: ord(x) not in range(0x20, 0xD7FF), iMessage)

You need to explicitly convert the string of UTF8 encoded bytes to a
Unicode string before parsing e.g.
unicodestring = unicode(encodedbytes, 'utf8')

Unless I messed up copying and pasting, your original string had an
erroneous byte immediately before ]]>. With that corrected I was able
to process the string correctly - the CDATA marked section consists
entirely of spaces and Cyrillic characters. As I noted earlier you
will lose \r characters as part of the basic XML processing.
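
A sketch of both points in Python 3 terms, where `bytes.decode('utf-8')` plays the role of `unicode(encodedbytes, 'utf8')`. The document below is invented (it is not the poster's message), and the stray byte is placed just before `]]>` only to mimic the situation described:

```python
from xml.dom import minidom

# Made-up document: Cyrillic text and a CRLF inside a CDATA section.
encoded = '<Data><![CDATA[Привет\r\nмир]]></Data>'.encode('utf-8')

text = encoded.decode('utf-8')          # bytes -> str before parsing
doc = minidom.parseString(text)
data = ''.join(node.data for node in doc.documentElement.childNodes)
print(repr(data))                       # the \r has gone, only \n remains

# A stray lead byte right before ]]> makes the bytes invalid UTF-8.
broken = encoded.replace(b']]>', b'\xd0]]>')
try:
    broken.decode('utf-8')
    decoded_ok = True
except UnicodeDecodeError:
    decoded_ok = False
print(decoded_ok)
```
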

HTH

Harvey

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: XML Parsing

2007-03-28 Thread harvey . thomas

On Mar 28, 10:51 am, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > I want to parse this XML file:
>
> > 
>
> > 
>
> > 
> > filename
> > 
> > Hello
> > 
> > 
>
> > 
> > filename2
> > 
> > Hello2
> > 
> > 
>
> > 
>
> > This XML will be in a file called filecreate.xml
>
> > As you might have guessed, I want to create files from this XML file
> > contents, so how can I do this?
> > What modules should I use? What options do I have? Where can I find
> > tutorials? Will I be able to put
> > this on the internet (on a googlepages server)?
>
> > Thanks in advance to everyone who helps me.
> > And yes I have used Google but I am unsure what to use.
>
> The above file is not valid XML. It misses a xmlns:text namespace
> declaration. So you won't be able to parse it regardless of what parser you
> use.
>
> Diez

The example is valid well-formed XML. It is permitted to use the ":"
character in element names. Whether one should in a non namespace
context is a different matter.
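
A sketch of the distinction using the expat bindings in the standard library. The element names `doc` and `text:file` are hypothetical, since the XML in the quoted post did not survive the archive. A parser created without namespace processing accepts the colon as part of the name, while a namespace-aware parser rejects the undeclared prefix:

```python
from xml.parsers import expat

data = '<doc><text:file>Hello</text:file></doc>'   # hypothetical sample

# Plain XML 1.0 parsing: ':' is just another name character.
names = []
parser = expat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: names.append(name)
parser.Parse(data, True)
print(names)

# Namespace-aware parsing: the prefix 'text' is not declared.
ns_parser = expat.ParserCreate(namespace_separator=' ')
try:
    ns_parser.Parse(data, True)
    ns_error = None
except expat.ExpatError as exc:
    ns_error = str(exc)
print(ns_error)
```
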

Harvey

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: re.sub and empty groups

2007-01-16 Thread harvey . thomas


Hugo Ferreira wrote:

> Hi!
>
> I'm trying to do a search-replace in places where some groups are
> optional... Here's an example:
>
> >> re.match(r"Image:([^\|]+)(?:\|(.*))?", "Image:ola").groups()
> ('ola', None)
>
> >> re.match(r"Image:([^\|]+)(?:\|(.*))?", "Image:ola|").groups()
> ('ola', '')
>
> >> re.match(r"Image:([^\|]+)(?:\|(.*))?", "Image:ola|ole").groups()
> ('ola', 'ole')
>
> The second and third results are right, but not the first one, where
> it should be equal to the second (i.e., it should be an empty string
> instead of None). This is because I want to use re.sub() and when the
> group is None, it blows up with a stack trace...
>
> Maybe I'm not getting the essence of groups and non-grouping groups.
> Someone care to explain (and, give the correct solution :)) ?
>
> Thanks in advance,
>
> Hugo Ferreira
>
> --
> GPG Fingerprint: B0D7 1249 447D F5BB 22C5  5B9B 078C 2615 504B 7B85

From the documentation:
groups( [default])
Return a tuple containing all the subgroups of the match, from 1 up to
however many groups are in the pattern. The default argument is used
for groups that did not participate in the match; it defaults to None.

Your second group is optional and does not take part in the match in
your first example. You can, however, still use this regular expression
if you use groups('') rather than groups().

A better way probably is to use a simplified regular expression

re.match(r"Image:([^\|]+)\|?(.*)", "Image:ola").groups()

i.e. match the text "Image:" followed by at least one character not
matching "|" followed by an optional "|" followed by any remaining
characters.
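
Both suggestions side by side, as a small sketch using the patterns from this thread:

```python
import re

original = r"Image:([^\|]+)(?:\|(.*))?"
simplified = r"Image:([^\|]+)\|?(.*)"

# groups('') substitutes '' for groups that did not take part in the match.
with_default = re.match(original, "Image:ola").groups('')

# With the simplified pattern the second group always participates.
plain = re.match(simplified, "Image:ola").groups()
both = re.match(simplified, "Image:ola|ole").groups()
print(with_default, plain, both)
```
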

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: One more regular expressions question

2007-01-18 Thread harvey . thomas

Victor Polukcht wrote:

> My pattern now is:
>
> (?P<var1>[^(]+)(?P<var2>\d+)\)\s+(?P<var3>\d+)
>
> And i expect to get:
>
> var1 = "Unassigned Number "
> var2 = "1"
> var3 = "32"
>
> I'm sure my regexp is incorrect, but can't understand where exactly.
>
> Regex.debug shows that even the first block is incorrect.
>
> Thanks in advance.
>
> On Jan 18, 1:15 pm, Roberto Bonvallet <[EMAIL PROTECTED]>
> wrote:
> > Victor Polukcht wrote:
> > > My actual problem is i can't get how to include space, comma, slash.
> >
> > Post here what you have written already, so we can tell you what the
> > problem is.
> >
> > --
> > Roberto Bonvallet

You are missing \( after the first group. The RE should be:

'(?P<var1>[^(]+)\((?P<var2>\d+)\)\s+(?P<var3>\d+)'
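
A quick check of the corrected expression. The group names var1..var3 follow the values the poster expects, and the input line is my guess at what the data looks like, so treat both as assumptions:

```python
import re

pattern = re.compile(r'(?P<var1>[^(]+)\((?P<var2>\d+)\)\s+(?P<var3>\d+)')

# Hypothetical input line built from the expected values in the question.
m = pattern.match('Unassigned Number (1) 32')
groups = m.groupdict()
print(groups)
```
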

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Match 2 words in a line of file

2007-01-19 Thread harvey . thomas


Rickard Lindberg wrote:

> I see two potential problems with the non regex solutions.
>
> 1) Consider a line: "foo (bar)". When you split it you will only get
> two strings, as split by default only splits the string on white space
> characters. Thus "'bar' in words" will return false, even though bar is
> a word in that line.
>
> 2) If you have a line something like this: "foobar hello" then "'foo'
> in line" will return true, even though foo is not a word (it is part of
> a word).

Here's a solution using re.split:

import re
import StringIO

wordsplit = re.compile('\W+').split
def matchlines(fh, w1, w2):
    w1 = w1.lower()
    w2 = w2.lower()
    for line in fh:
        words = [x.lower() for x in wordsplit(line)]
        if w1 in words and w2 in words:
            print line.rstrip()

test = """1st line of text (not matched)
2nd line of words (not matched)
3rd line (Word test) should match (case insensitivity)
4th line simple test of word's (matches)
5th line simple test of words not found (plural words)
6th line tests produce strange words (no match - plural)
7th line "word test" should find this
"""
matchlines(StringIO.StringIO(test), 'test', 'word')
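
The same approach as a Python 3 sketch, returning the matching lines instead of printing them. The function name `matching_lines` and the shortened sample text are mine:

```python
import io
import re

wordsplit = re.compile(r'\W+').split

def matching_lines(fh, w1, w2):
    """Return the lines of fh that contain both words, ignoring case."""
    w1, w2 = w1.lower(), w2.lower()
    found = []
    for line in fh:
        words = {w.lower() for w in wordsplit(line)}
        if w1 in words and w2 in words:
            found.append(line.rstrip())
    return found

sample = io.StringIO('3rd line (Word test) should match\n'
                     '5th line simple test of words not found\n'
                     '7th line "word test" should find this\n')
hits = matching_lines(sample, 'test', 'word')
print(hits)
```
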

-- 
http://mail.python.org/mailman/listinfo/python-list

SMTPLIB & email.MIMEText : Certain characters in the body stop mail from arriving. Why?

2007-11-29 Thread West, Harvey

 

Hello

Sending mail with certain characters in the body causes mail never to
arrive. Why?

e.g. if body text has a fullstop "." mail never arrives.

I'm using python 4.2 on windows.

Harvey

#
import smtplib
from email.MIMEText import MIMEText


def mail(serverURL=None, sender='', to='', subject='', text=''):
    COMMASPACE = ', '
    to = COMMASPACE.join(to)

    msg = MIMEText(text)
    msg['Subject'] = subject
    msg['From'] = sender
    msg['To'] = to

    mailServer = smtplib.SMTP(serverURL, 25)
    mailServer.sendmail(sender, to, msg.as_string())
    mailServer.quit()

    print msg.as_string()
#

Output

Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: "Information from ost-cs-emma"
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]

Some text
Some more text in body

-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Matching XML Tag Contents with Regex

2007-12-11 Thread harvey . thomas

On Dec 11, 4:05 pm, Chris <[EMAIL PROTECTED]> wrote:
> I'm trying to find the contents of an XML tag. Nothing fancy. I don't
> care about parsing child tags or anything. I just want to get the raw
> text. Here's my script:
>
> import re
>
> data = """
> 
> 
> 
> here's some text!
> 
> 
> here's some text!
> 
> 
> here's some text!
> 
> 
> """
>
> tagName = 'div'
> pattern = re.compile('<%(tagName)s\s[^>]*>[.\n\r\w\s\d\D\S\W]*[^(%
> (tagName)s)]*' % dict(tagName=tagName))
>
> matches = pattern.finditer(data)
> for m in matches:
> contents = data[m.start():m.end()]
> print repr(contents)
> assert tagName not in contents
>
> The problem I'm running into is that the [^%(tagName)s]* portion of my
> regex is being ignored, so only one match is being returned, starting
> at the first  and ending at the end of the text, when it should
> end at the first . For this example, it should return three
> matches, one for each div.
>
> Is what I'm trying to do possible with Python's Regex library? Is
> there an error in my Regex?
>
> Thanks,
> Chris

print re.findall(r'<%s(?=[\s/>])[^>]*>' % 'div', r)

["", "", ""]

HTH

Harvey
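
To spell that out with data that survives the archive: the sample string below is invented, and the second pattern (a non-greedy group up to the closing tag) is my extension, not part of the reply above. It does not handle nested tags of the same name.

```python
import re

data = '<div class="a">x</div><divider/><div>y</div>'   # made-up sample
tag_name = 'div'

# Opening tags only; the lookahead stops 'div' from matching 'divider'.
open_tags = re.findall(r'<%s(?=[\s/>])[^>]*>' % tag_name, data)

# Contents between an opening tag and the next closing tag.
contents = re.findall(r'<%s(?=[\s/>])[^>]*>(.*?)</%s>' % (tag_name, tag_name),
                      data, re.DOTALL)
print(open_tags, contents)
```
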
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Issue with regular expressions

2008-04-29 Thread harvey . thomas

On Apr 29, 2:46 pm, Julien <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm fairly new in Python and I haven't used the regular expressions
> enough to be able to achieve what I want.
> I'd like to select terms in a string, so I can then do a search in my
> database.
>
> query = '   "  some words"  with and "without    quotes   "  '
> p = re.compile(magic_regular_expression)   $ <--- the magic happens
> m = p.match(query)
>
> I'd like m.groups() to return:
> ('some words', 'with', 'and', 'without quotes')
>
> Is that achievable with a single regular expression, and if so, what
> would it be?
>
> Any help would be much appreciated.
>
> Thanks!!
>
> Julien

You can't do it simply and completely with regular expressions alone
because of the requirement to strip the quotes and normalize
whitespace, but it's not too hard to write a function to do it. Viz:

import re

wordre = re.compile('"[^"]+"|[a-zA-Z]+').findall
def findwords(src):
    ret = []
    for x in wordre(src):
        if x[0] == '"':
            #strip off the quotes and normalise spaces
            ret.append(' '.join(x[1:-1].split()))
        else:
            ret.append(x)
    return ret

query = '   "  Some words"  with and "without    quotes   "  '
print findwords(query)

Running this gives
['Some words', 'with', 'and', 'without quotes']
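
A different route, swapped in here as an alternative and not what the function above does: `shlex.split` from the standard library already understands double-quoted phrases, so only the whitespace normalisation is left. A sketch using the query from the original question:

```python
import shlex

query = '   "  some words"  with and "without    quotes   "  '

# shlex.split keeps each quoted phrase as one token (quotes removed);
# split()/join() then collapses the runs of spaces inside each token.
terms = [' '.join(token.split()) for token in shlex.split(query)]
print(terms)
```
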

HTH

Harvey
--
http://mail.python.org/mailman/listinfo/python-list

Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread Vilya Harvey

2009/7/4 Andre Engels :
> On Sat, Jul 4, 2009 at 9:33 AM, mclovin wrote:
>> Currently I need to find the most common elements in thousands of
>> arrays within one large array (around 2 million instances with ~70k
>> unique elements)
>>
>> so I set up a dictionary to handle the counting so when I am
>> iterating  I up the count on the corresponding dictionary element. I
>> then iterate through the dictionary and find the 25 most common
>> elements.
>>
>> the elements are initially held in a array within an array. so I am
>> just trying to find the most common elements between all the arrays
>> contained in one large array.
>> my current code looks something like this:
>> d = {}
>> for arr in my_array:
>> -for i in arr:
>> #elements are numpy integers and thus are not accepted as dictionary
>> keys
>> ---d[int(i)]=d.get(int(i),0)+1
>>
>> then I filter things down. but with my algorithm that only takes about
>> 1 sec so I dont need to show it here since that isnt the problem.
>>
>>
>> But there has to be something better. I have to do this many many
>> times and it seems silly to iterate through 2 million things just to
>> get 25. The element IDs are integers and are currently being held in
>> numpy arrays in a larger array. this ID is what makes up the key to
>> the dictionary.
>>
>>  It currently takes about 5 seconds to accomplish this with my current
>> algorithm.
>>
>> So does anyone know the best solution or algorithm? I think the trick
>> lies in matrix intersections but I do not know.
>
> There's no better algorithm for the general case. No method of
> checking the matrices using less than 200-x look-ups will ensure
> you that there's not a new value with x occurrences lurking somewhere.

Try flattening the arrays into a single large array & sorting it. Then
you can just iterate over the large array counting as you go; you only
ever have to insert into the dict once for each value and there's no
lookups in the dict. I don't know numpy, so there's probably a more
efficient way to write this, but this should show what I'm talking
about:

big_arr = sorted(reduce(list.__add__, my_array, []))
counts = {}
last_val = big_arr[0]
count = 0
for val in big_arr:
    if val == last_val:
        count += 1
    else:
        counts[last_val] = count
        count = 1    # the new value has already been seen once
        last_val = val
counts[last_val] = count    # to get the count for the last value.

If flattening the arrays isn't practical, you may still get some
improvements by sorting them individually and applying the same
principle to each of them:

counts = {}
for arr in my_array:
    sorted_arr = sorted(arr)
    last_val = sorted_arr[0]
    count = 0
    for val in sorted_arr:
        if val == last_val:
            count += 1
        else:
            counts[last_val] = counts.get(last_val, 0) + count
            count = 1    # the new value has already been seen once
            last_val = val
    counts[last_val] = counts.get(last_val, 0) + count
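
For comparison, a sketch of plain counting without any sorting, using `collections.Counter` (which arrived in later Python versions than the ones current when this was posted). The tiny nested list is a stand-in for the real data, and with numpy arrays the elements may still need converting with `int()` as in the original post:

```python
from collections import Counter
from itertools import chain

my_array = [[3, 1, 3], [2, 3, 1], [1, 3]]   # stand-in for the real arrays

# One pass over every element, then pick the most frequent ones.
counts = Counter(chain.from_iterable(my_array))
top = counts.most_common(2)                 # the poster wants the top 25
print(top)
```
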

Hope that helps...

Vil.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Reversible Debugging

2009-07-04 Thread Vilya Harvey

2009/7/4 Patrick Sabin :
> If someone has another idea of taking a snapshot let me know. Using VMWare
> is not a
> very elegant way in my opinion.

Someone implemented the same idea for Java a while ago. They called it
"omniscient debugging"; you can find details at
http://www.lambdacs.com/debugger/
and a paper about it at
http://www.lambdacs.com/debugger/AADEBUG_Mar_03.pdf

Another more recent paper on the topic is
http://scg.unibe.ch/archive/papers/Lien08bBackInTimeDebugging.pdf

I haven't read either of these papers myself, but maybe they'll give
you some ideas.

Vil.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: finding most common elements between thousands of multiple arrays.

2009-07-04 Thread Vilya Harvey

2009/7/4 Steven D'Aprano :
> On Sat, 04 Jul 2009 13:42:06 +, Steven D'Aprano wrote:
>
>> On Sat, 04 Jul 2009 10:55:44 +0100, Vilya Harvey wrote:
>>
>>> 2009/7/4 Andre Engels :
>>>> On Sat, Jul 4, 2009 at 9:33 AM, mclovin wrote:
>>>>> Currently I need to find the most common elements in thousands of
>>>>> arrays within one large array (around 2 million instances with ~70k
>>>>> unique elements)
>> ...
>>>> There's no better algorithm for the general case. No method of
>>>> checking the matrices using less than 200-x look-ups will ensure
>>>> you that there's not a new value with x occurrences lurking somewhere.
>>>
>>> Try flattening the arrays into a single large array & sorting it. Then
>>> you can just iterate over the large array counting as you go; you only
>>> ever have to insert into the dict once for each value and there's no
>>> lookups in the dict.
>>
>> You're suggesting to do a whole bunch of work copying 2,000,000 pointers
>> into a single array, then a whole bunch of more work sorting that second
>> array (which is O(N*log N) on average), and then finally iterate over
>> the second array. Sure, that last step will on average involve fewer
>> than O(N) steps,
>
> Er what?
>
> Ignore that last comment -- I don't know what I was thinking. You still
> have to iterate over all N elements, sorted or not.
>
>> but to get to that point you've already done more work
>> than just iterating over the array-of-arrays in the first place.
>
> What it does buy you though, as you pointed out, is reducing the number
> of explicit dict lookups and writes. However, dict lookups and writes are
> very fast, fast enough that they're used throughout Python. A line like:
>
> count += 1
>
> actually is a dict lookup and write.

I did some tests, just to be sure, and you're absolutely right: just
creating the flattened list took several hundred (!) times as long as
iterating through all the lists in place. Live and learn...
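
For the record, the in-place counting approach can be sketched roughly as
below. This is only an illustration, not the code I timed: the names
`arrays` and `most_common_elements` and the sample data are made up, and
collections.Counter is swapped in for a hand-written dict.

```python
from collections import Counter

def most_common_elements(arrays, n):
    """Count every element across all inner arrays without flattening them."""
    counts = Counter()
    for inner in arrays:
        # Counter.update does one dict lookup and write per element.
        counts.update(inner)
    return counts.most_common(n)

# Hypothetical sample data: a few small arrays inside one large list.
arrays = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(most_common_elements(arrays, 1))
```

No second copy of the data is built, so the only extra memory is the
dict of counts, one entry per unique element.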

Vil.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Why is my code faster with append() in a loop than with a large list?

2009-07-06 Thread Vilya Harvey

2009/7/6 Xavier Ho :
> Why is version B of the code faster than version A? (Only three lines
> different)

Here's a guess:

As the number you're testing gets larger, version A is creating a very
big list. I'm not sure exactly how much overhead each list entry has
in python, but I guess it's at least 8 bytes: a 32-bit reference for
each list entry, and 32 bits to hold the int value (assuming a 32-bit
version of python). The solution you're looking for is a large 8 digit
number; let's say 80,000,000, for the sake of easy calculation. That
means, as you get close to the solution, you'll be trying to allocate
almost 640 MB of memory for every number you're checking. That's going
to make the garbage collector work extremely hard. Also, depending on
how much memory your computer has free, you'll probably start hitting
virtual memory too, which will slow you down even further. Finally,
the reduce step has to process all 80,000,000 elements, which is
clearly going to take a while.

Version B creates a list which is only as long as the largest prime
factor, so at worst the list size will be approx. sqrt(80,000,000),
which is approx. 8900 elements or approx. 72 KB of memory - a much
more manageable size.
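
The back-of-envelope numbers above can be checked with a few lines. This
is just the arithmetic, under the same guessed assumption of 8 bytes per
list entry (the real per-entry overhead in python may well be higher);
the variable names are made up for the example.

```python
import math

BYTES_PER_ENTRY = 8   # assumed: 4-byte reference + 4-byte int value
n = 80000000          # the round number used for easy calculation

# Version A: one list entry for every number up to n.
version_a_bytes = n * BYTES_PER_ENTRY

# Version B: list length bounded by roughly sqrt(n).
version_b_entries = int(math.sqrt(n))
version_b_bytes = version_b_entries * BYTES_PER_ENTRY

print(version_a_bytes / 1e6, "MB for version A")
print(version_b_entries, "entries for version B")
print(version_b_bytes / 1e3, "KB for version B")
```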

Hope that helps,

Vil.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: count

2009-07-08 Thread Vilya Harvey

2009/7/8 Dhananjay :
> I wanted to sort column 2 in ascending order and I read whole file in array
> "data" and did the following:
>
> data.sort(key = lambda fields:(fields[2]))
>
> I have sorted column 2, however I want to count the numbers in the column 2.
> i.e. I want to know, for example, how many repeats of say '3' (first row,
> 2nd column in above data) are there in column 2.

One thing: indexes in Python start from 0, so the second column has an
index of 1 not 2. In other words, it should be data.sort(key = lambda
fields: fields[1]) instead.

With that out of the way, the following will print out a count of each
unique item in the second column:

from itertools import groupby
for x, g in groupby([fields[1] for fields in data]):
    print x, len(tuple(g))
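
A self-contained variant of the same idea, written for Python 3, might
look like the sketch below. The rows in `data` are invented for the
example; the only thing taken from the thread is that the second column
(index 1) is the one being sorted and counted.

```python
from itertools import groupby

# Hypothetical rows; the second column (index 1) is the one being counted.
data = [["a", 3, "x"], ["b", 1, "y"], ["c", 3, "z"], ["d", 2, "w"]]

# groupby only merges adjacent equal keys, so sort on the same column first.
data.sort(key=lambda fields: fields[1])

counts = {key: sum(1 for _ in group)
          for key, group in groupby(data, key=lambda fields: fields[1])}
print(counts)
```

The sort matters: on unsorted data groupby would report the same value
several times, once for each separate run.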

Hope that helps,
Vil.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Question about generators

2009-07-12 Thread Vilya Harvey

2009/7/12 Cameron Pulsford :
> My question is, is it possible to combine those two loops? The primes
> generator I wrote finds all primes up to n, except for 2, 3 and 5, so I must
> check those explicitly. Is there anyway to concatenate the hard coded list
> of [2,3,5] and the generator I wrote so that I don't need two for loops that
> do the same thing?

itertools.chain([2, 3, 5], primes) is what you're looking for, I think.
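
Here's a minimal sketch of how that fits together. The generator below,
`primes_above_five`, is only a hypothetical stand-in for yours (simple
trial division, nothing clever); the point is just the chain() call.

```python
from itertools import chain

def primes_above_five(n):
    """Hypothetical stand-in generator: yields the primes p with 5 < p <= n."""
    for candidate in range(7, n + 1, 2):
        # Trial division by odd numbers up to the square root.
        if all(candidate % d for d in range(3, int(candidate ** 0.5) + 1, 2)):
            yield candidate

# chain() yields the hard-coded values first, then drains the generator,
# so a single for loop covers both.
all_primes = list(chain([2, 3, 5], primes_above_five(30)))
print(all_primes)
```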

Vil.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Best Way to Handle All Exceptions

2009-07-13 Thread Vilya Harvey

2009/7/13 seldan24 :
> Thank you both for your input.  I want to make sure I get started on
> the right track.  For this particular script, I should have included
> that I would take the exception contents, and pass those to the
> logging module.  For this particular script, all exceptions are fatal
> and I would want them to be.  I just wanted a way to catch them and
> log them prior to program termination.

The logging module has a function specifically to handle this case:

try:
    # do something
except:
    logging.exception("Uh oh...")

The exception() method will automatically add details of the exception
to the log message it creates; as per the docs, you should only call
it from within an exception handler.
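
Since you want every exception to stay fatal, one way to arrange it is
to log and then re-raise. A rough sketch, assuming the default logging
configuration is good enough; `risky` and `main` are made-up names
standing in for your script's real work:

```python
import logging

logging.basicConfig(level=logging.ERROR)

def risky():
    """Hypothetical stand-in for the script's real work."""
    return 1 / 0

def main():
    try:
        risky()
    except Exception:
        # Logs the message at ERROR level together with the traceback.
        logging.exception("Uh oh...")
        # Re-raise so the failure still terminates the program.
        raise
```

Calling main() here writes the message and the traceback to the log and
then lets the ZeroDivisionError propagate as usual.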

Hope that helps,

Vil.
-- 
http://mail.python.org/mailman/listinfo/python-list

Re: Memory error due to big input file

2009-07-13 Thread Vilya Harvey

2009/7/13 Aaron Scott :
>> BTW, you should derive all your classes from something.  If nothing
>> else, use object.
>>   class textfile(object):
>
> Just out of curiosity... why is that? I've been coding in Python for
> a long time, and I never derive my base classes. What's the advantage
> to deriving them?

class Foo:

uses the old object model.

class Foo(object):

uses the new object model.

See http://docs.python.org/reference/datamodel.html (specifically
section 3.3) for details of the differences.
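
Note that the distinction only exists in Python 2. In Python 3 both
spellings give you a new-style class, which the following small check
illustrates (Foo and Bar are just example names):

```python
class Foo:
    pass

class Bar(object):
    pass

# In Python 3 both classes derive from object.
print(Foo.__mro__)
print(Bar.__mro__)

# type() of an instance is its class, as in the new object model.
print(type(Foo()) is Foo)
```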

Vil.
-- 
http://mail.python.org/mailman/listinfo/python-list

type hinting backward compatibility with python 3.0 to 3.4

2017-05-19 Thread Edward Ned Harvey (python)

I think it's great that for built-in types such as int and str, backward 
compatibility of type hinting annotations is baked into python 3.0 to 3.4. In 
fact, I *thought* python 3.0 to 3.4 would *ignore* annotations, but it 
doesn't...

I'm struggling to create something backward compatible that requires the
'typing' module. For example, the following program is good in python 3.5, but
line 11 fails in python 3.4:

 1 import sys
 2
 3 if sys.version_info[0] < 3:
 4     raise RuntimeError("Must use at least python version 3")
 5
 6 # The 'typing' module, useful for type hints, was introduced in python 3.5
 7 if sys.version_info[1] >= 5:
 8     from typing import Optional
 9
10
11 def divider(x: int, y: int) -> Optional[float]:
12     if y == 0:
13         return None
14     return x / y
15
16 print("22 / 7 = " + str(divider(22, 7)))
17 print("8 / 0 = " + str(divider(8, 0)))

When I run this program in python 3.4, I get this:
Traceback (most recent call last):
  File "./ned.py", line 11, in <module>
    def divider(x: int, y: int) -> Optional[float]:
NameError: name 'Optional' is not defined
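
The NameError arises because the annotation expression is evaluated when
the def statement runs. One possible workaround, sketched here as an
idea rather than a recommendation, is to quote the annotation so it is
stored as a plain string and never looked up:

```python
def divider(x: int, y: int) -> "Optional[float]":
    # The quoted annotation is kept as a string and is not evaluated when
    # the function is defined, so 'Optional' does not have to be imported.
    if y == 0:
        return None
    return x / y

print(divider.__annotations__["return"])
print("22 / 7 = " + str(divider(22, 7)))
print("8 / 0 = " + str(divider(8, 0)))
```

Type checkers understand quoted annotations as forward references, but
for them to resolve the name, 'Optional' still has to be imported
somewhere they can see it.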
-- 
https://mail.python.org/mailman/listinfo/python-list

RE: type hinting backward compatibility with python 3.0 to 3.4

2017-05-19 Thread Edward Ned Harvey (python)

This pattern seems to work:

import sys

if sys.version_info[0] < 3:
    raise RuntimeError("Must use at least python version 3")

# The 'typing' module, useful for type hints, was introduced in python 3.5
if sys.version_info[1] >= 5:
    from typing import Optional
    optional_float = Optional[float]
else:
    optional_float = object

def divider(x: int, y: int) -> optional_float:
    if y == 0:
        return None
    return x / y

print("3 / 0 = " + str(divider(3,0)))
print("22 / 7 = " + str(divider(22,7)))
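
A variation on the same pattern, offered only as a sketch, is to test
for the module itself rather than the version number. `OptionalFloat`
is a made-up alias, and falling back to `object` is the same
placeholder choice as above:

```python
try:
    from typing import Optional
    OptionalFloat = Optional[float]
except ImportError:
    # Assumed fallback for interpreters that do not ship 'typing'.
    OptionalFloat = object

def divider(x: int, y: int) -> OptionalFloat:
    if y == 0:
        return None
    return x / y

print("3 / 0 = " + str(divider(3, 0)))
print("22 / 7 = " + str(divider(22, 7)))
```

This avoids hard-coding the minor version, so it also picks up 'typing'
when it has been installed separately on an older interpreter.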

-- 
https://mail.python.org/mailman/listinfo/python-list
