Python/Django Developer
My client in Jersey City, NJ 07302 is looking for a Python Developer. Below is the job description:

Job Summary: This is a programming position in the technical department of Advance Internet, working on application development, application integration, automated testing and deployment of applications, publishing structure and unit testing in various development environments.

Job Functions:
- Develop extensible online applications
- Integrate vendor code into web tier
- Write and Maintain Unit Tests
- Develop and Maintain Automated Testing Frameworks
- Perform Load/Performance Testing
- Maintain & Enhance Build & Deployment Scripts
- Liaise with programmers (internal & external)
- Ability to set development standards
- Oversee incorporation of applications into web tier
- Assess stability of existing applications
- Coordinate conversion from legacy systems

Supervisory Responsibilities: None

Required Knowledge, Skills and Abilities:
Candidate needs to be aggressive in learning new things as well as taking responsibility for work product and meeting deadlines with minimal supervision. They need to have worked in an online environment and have published applications that have withstood live deployment.
- Open source familiarity
- Django framework
- Python
- Other frameworks
- At least 2 standard templating languages such as Velocity, PHP, JSP
- Knowledge of quality control methods and philosophy
- Linux command line proficiency
- ANT/Maven, Build, Make
- Project management experience
- Excellent written and oral communication skills

Desired Skills/Experience:
- Movable Type application knowledge
- Developing for a clustered server environment
- Ability to read/understand C, OO Perl

--
View this message in context: http://www.nabble.com/Python-Django-Developer-tp20155587p20155587.html
Sent from the Python - python-list mailing list archive at Nabble.com.

--
http://mail.python.org/mailman/listinfo/python-list
Error with math.sqrt
I'm starting out with Python 3.5. My current frustration is with:

>>> math.sqrt(25)
Traceback (most recent call last):
  File "", line 1, in
    math.sqrt(25)
NameError: name 'math' is not defined
>>>

Advice?
Jack
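The NameError means the name math has not been bound in the current session: a standard-library module has to be imported before its functions can be used. A minimal sketch:

```python
import math  # bind the module name before calling math.sqrt

root = math.sqrt(25)
print(root)  # 5.0

# Alternatively, import just the function you need.
from math import sqrt
print(sqrt(16))  # 4.0
```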
variable attribute name
I want to let the name of an attribute be the string value of a variable.
Here is some code:

class Object(object):
    pass

A = Object()
s = 'attr'
A. = 1

The last line denotes the variable value by (not a python form). What I
want is to have A.attr = 1, but 'attr' determined by the value of s.
Please advise.
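The built-in setattr() covers this case: setattr(obj, name, value) does the same as writing obj.name = value, but takes the attribute name as a string, and getattr() reads it back. A minimal sketch reusing the names from the post:

```python
class Object(object):
    pass

A = Object()
s = 'attr'

# Equivalent to A.attr = 1, with the attribute name taken from s.
setattr(A, s, 1)
print(A.attr)  # 1

# Reading an attribute whose name is held in a variable.
print(getattr(A, s))  # 1
```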
how to read list from file
I am looping as for L in file.readlines(), where file is csv.
L is a list of 3 items, eg, [{'a':1, 'b':2}, [1,2,3], 10] Note that the first
item is a dict and 2nd is a list, so parsing with split doesn't work. Is there
a way to convert L, which is a string, to the list of 3 items I want?
Re: how to read list from file
On Saturday, October 5, 2013 7:24:39 PM UTC-6, Tim Chase wrote:
> On 2013-10-05 18:08, Harvey Greenberg wrote:
> > I am looping as for L in file.readlines(), where file is csv.
> >
> > L is a list of 3 items, eg, [{'a':1, 'b':2}, [1,2,3], 10] Note that
> > the first item is a dict and 2nd is a list, so parsing with split
> > doesn't work. Is there a way to convert L, which is a string, to
> > the list of 3 items I want?
>
> sounds like you want ast.literal_eval():
>
> Python 2.7.3 (default, Jan 2 2013, 13:56:14)
> [GCC 4.7.2] on linux2
> Type "help", "copyright", "credits" or "license" for more
> information.
> >>> s = "[{'a':1, 'b':2}, [1,2,3], 10]"
> >>> import ast
> >>> print repr(ast.literal_eval(s))
> [{'a': 1, 'b': 2}, [1, 2, 3], 10]
>
> -tkc
That didn't work. Printing it looks like the list because it's the input, but
try printing len(repr(ast.literal_eval(s))). It should give 3, but it gives 72
(number of chars).
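The character count comes from measuring repr(...), which turns the parsed list back into a string; len() of the parsed object itself is 3. A minimal Python 3 sketch using the s value from the thread:

```python
import ast

s = "[{'a':1, 'b':2}, [1,2,3], 10]"
L = ast.literal_eval(s)  # evaluates Python literals only, never arbitrary code

print(type(L), len(L))   # <class 'list'> 3
print(len(repr(L)))      # length of the string form, not of the list

first, second, third = L
print(first['b'], second[2], third)  # 2 3 10
```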
Re: how to read list from file
On Sunday, October 6, 2013 10:41:33 AM UTC-6, Harvey Greenberg wrote:
> On Saturday, October 5, 2013 7:24:39 PM UTC-6, Tim Chase wrote:
> > On 2013-10-05 18:08, Harvey Greenberg wrote:
> > > I am looping as for L in file.readlines(), where file is csv.
> > >
> > > L is a list of 3 items, eg, [{'a':1, 'b':2}, [1,2,3], 10] Note that
> > > the first item is a dict and 2nd is a list, so parsing with split
> > > doesn't work. Is there a way to convert L, which is a string, to
> > > the list of 3 items I want?
> >
> > sounds like you want ast.literal_eval():
> >
> > Python 2.7.3 (default, Jan 2 2013, 13:56:14)
> > [GCC 4.7.2] on linux2
> > Type "help", "copyright", "credits" or "license" for more
> > information.
> > >>> s = "[{'a':1, 'b':2}, [1,2,3], 10]"
> > >>> import ast
> > >>> print repr(ast.literal_eval(s))
> > [{'a': 1, 'b': 2}, [1, 2, 3], 10]
> >
> > -tkc
>
> that didn't work. printing it looks like the list because it's the input,
> but try printing len(repr(ast.literal_eval(s))). It should give 3, but it
> gives 72 (number of chars).
None of the responses worked; after import json, I used:

for line in inputFile.readlines():
    L = json.loads(line.replace("",""))
    print L, len(L)

I get an error. I probably misunderstood how to implement these suggestions, but
I wrote a list to a csv file whose members have lists. I now want to read them
(in another program) and end up with the original list.
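A likely reason json.loads fails is that lines written with str() or repr() use single-quoted strings, which JSON does not accept. If you control both programs, one option is to write each record with json.dumps and read it back with json.loads. A Python 3 sketch, with made-up records and io.StringIO standing in for the real file (note that JSON turns tuples into lists and only allows string dictionary keys):

```python
import io
import json

# Hypothetical records shaped like the ones in the question.
records = [
    [{'a': 1, 'b': 2}, [1, 2, 3], 10],
    [{'a': 3, 'b': 4}, [4, 5], 20],
]

# Writer: one JSON document per line.
buf = io.StringIO()
for rec in records:
    buf.write(json.dumps(rec) + "\n")

# Reader (e.g. in another program): parse each line back into a list.
buf.seek(0)
loaded = [json.loads(line) for line in buf]
print(loaded == records)  # True
```

Lines that were already written with repr() can instead be read back one at a time with ast.literal_eval, as suggested earlier in the thread.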
Re: how to read list from file
On Saturday, October 5, 2013 7:08:08 PM UTC-6, Harvey Greenberg wrote:
> I am looping as for L in file.readlines(), where file is csv.
>
> L is a list of 3 items, eg, [{'a':1, 'b':2}, [1,2,3], 10] Note that the first
> item is a dict and 2nd is a list, so parsing with split doesn't work. Is
> there a way to convert L, which is a string, to the list of 3 items I want?
Yay, it worked. Thanks!
Re: Insert characters into string based on re ?
Matt wrote:
> I am attempting to reformat a string, inserting newlines before certain
> phrases. For example, in formatting SQL, I want to start a new line at
> each JOIN condition. Noting that strings are immutable, I thought it
> best to split the string at the key points, then join with '\n'.
>
> Regexps can seem the best way to identify the points in the string
> ('LEFT.*JOIN' to cover 'LEFT OUTER JOIN' and 'LEFT JOIN'), since I need
> to identify multiple locations in the string. However, the re.split
> method returns the list without the split phrases, and re.findall does
> not seem useful for this operation.
>
> Suggestions?
I think that re.sub is a more appropriate method than split and
join.
Trivial example (non-SQL):
>>> import re
>>> addnlre = re.compile(r'LEFT\s.*?\s*JOIN|RIGHT\s.*?\s*JOIN', re.DOTALL |
...                      re.IGNORECASE).sub
>>> addnlre(lambda x: x.group() + '\n', '... LEFT JOIN x RIGHT OUTER join y')
'... LEFT JOIN\n x RIGHT OUTER join\n y'
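Matt asked for the newline before each phrase; the same re.sub idea handles that by emitting the newline ahead of the matched text. A Python 3 sketch, assuming only LEFT/RIGHT [OUTER] JOIN need breaking (the pattern and function name are illustrative):

```python
import re

# LEFT/RIGHT JOIN with an optional OUTER keyword, matched case-insensitively;
# \b keeps a match from starting or ending inside a longer word.
JOIN_RE = re.compile(r'\b(?:LEFT|RIGHT)\s+(?:OUTER\s+)?JOIN\b', re.IGNORECASE)

def break_before_joins(sql):
    """Return sql with a newline inserted before each LEFT/RIGHT [OUTER] JOIN."""
    # \g<0> re-inserts the whole matched phrase after the newline.
    return JOIN_RE.sub(r'\n\g<0>', sql)

print(break_before_joins("SELECT * FROM a LEFT JOIN b ON a.id = b.id "
                         "RIGHT OUTER join c ON b.id = c.id"))
```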
Re: Need a Regular expression to remove a char for Unicode text
శ్రీనివాస wrote:
> Hi friends,
> Can anyone tell me how I can remove a character from a Unicode text?
> కల్&హార is a Telugu word in Unicode. Here I want to
> remove '&' but not replace it with a zero-width char. And one more thing,
> if there is any whitespace before or after the '&' char, the text should
> be kept as it is. Please tell me how I can work this out with regular
> expressions.
>
> Thanks and regards
> Srinivasa Raju Datla
Don't know anything about Telugu, but is this the approach you want?
>>> x=u'\xfe\xff & \xfe\xff \xfe\xff&\xfe\xff'
>>> noampre = re.compile('(?>> noampre('', x)
u'\xfe\xff & \xfe\xff \xfe\xff\xfe\xff'
The regular expression has negative look-behind and look-ahead
assertions to check that there is no whitespace surrounding the '&'
character. Each match found is then replaced with the empty string.
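Only the start of the pattern, '(?', survived in the archived post. A minimal Python 3 sketch of the approach described above, assuming the pattern (?<!\s)&(?!\s) (negative look-behind and look-ahead for whitespace); in Python 3, \s in a str pattern already matches Unicode whitespace:

```python
import re

# Remove '&' only when it is neither preceded nor followed by whitespace.
noamp = re.compile(r'(?<!\s)&(?!\s)').sub

print(noamp('', 'కల్&హార'))  # the '&' is removed
print(noamp('', 'a & b'))     # unchanged: whitespace around '&'
```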
efficiency question
Hi,
Suppose I write

    if x in ("abc", "def", "xyz"):
        doStuff()
    elif x in ("pqr", "tuv", "123"):
        doOtherStuff()
    elif ...
    etc.

When is python building the tuples? Does it need to build the tuple
every time it comes through this code? Or does it somehow recognise
that they are constant and cache them?
In other words, is anything gained (efficiency-wise) by first putting
say

    tuple1 = ("abc", "def", "xyz")
    tuple2 = ("pqr", "tuv", "123")

somewhere where I know it's executed once, and then writing

    if x in tuple1:
        doStuff()
    elif x in tuple2:
        doOtherStuff()
    elif ...

(The tuples I have in mind are of course much longer than three
elements)
Many thanks
David
Re: efficiency question
Fredrik Lundh wrote:
> when in doubt, ask the compiler:
MTD wrote:
> >>> dis.dis(cod)

Thanks so much guys! Python just gets cooler every day!

David
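The same check in CPython 3: this sketch (function name and return values are illustrative) shows the tuple literal compiled to a single constant, so nothing is rebuilt on each call. Exact bytecode differs between CPython versions. For the much longer collections David mentions, a frozenset gives hashed membership tests instead of a linear scan:

```python
import dis

def classify(x):
    # Tuples made only of constants are folded into one constant at compile time.
    if x in ("abc", "def", "xyz"):
        return "stuff"
    elif x in ("pqr", "tuv", "123"):
        return "other stuff"
    return None

# Look for LOAD_CONST ('abc', 'def', 'xyz') in the listing.
dis.dis(classify)

# The folded tuples are stored on the function's code object.
print(("abc", "def", "xyz") in classify.__code__.co_consts)

# Long collections: a frozenset does average O(1) hashed lookups.
GROUP1 = frozenset({"abc", "def", "xyz"})
print("def" in GROUP1)
```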
Re: xml.dom.minidom: how to preserve CRLF's inside CDATA?
On May 22, 2:45 pm, "sim.sim" <[EMAIL PROTECTED]> wrote:
> Hi all.
> I'm having trouble using minidom:
>
> # I have a string (xml) with a CDATA section, and the section includes
> # "\r\n":
> iInStr = '\n\n'
>
> # After I create the DOM object, I get the value of "Data" without "\r\n"
>
> from xml.dom import minidom
> iDoc = minidom.parseString(iInStr)
> iDoc.childNodes[0].childNodes[0].data # it gives u'BEGIN:VCALENDAR\nEND:VCALENDAR\n'
>
> according to http://www.w3.org/TR/REC-xml/#sec-line-ends
> it looks normal, but another part of the documentation says that "only
> the CDEnd string is recognized as markup":
> http://www.w3.org/TR/REC-xml/#sec-cdata-sect
>
> so the parser must (IMHO) give the value of the CDATA section "as is"
> (the two parts of the document do not contradict each other).
>
> How can I get the value of the CDATA section with all symbols within
> preserved? (Perhaps use another parser - which one?)
>
> Many thanks for any help.

You will lose the \r characters. From the document you referred to:

"""
This section defines some symbols used widely in the grammar.
S (white space) consists of one or more space (#x20) characters,
carriage returns, line feeds, or tabs.

White Space
[3] S ::= (#x20 | #x9 | #xD | #xA)+

Note: The presence of #xD in the above production is maintained purely
for backward compatibility with the First Edition. As explained in 2.11
End-of-Line Handling, all #xD characters literally present in an XML
document are either removed or replaced by #xA characters before any
other processing is done. The only way to get a #xD character to match
this production is to use a character reference in an entity value
literal.
"""
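The line-ending normalization is easy to observe with Python 3's minidom. In this sketch the element name Data and its content are hypothetical (the original string was lost from the archive): a literal CRLF inside CDATA comes back as LF, while a &#13; character reference in ordinary element content comes back as \r. Character references are not recognized inside CDATA, so that route only works outside it:

```python
from xml.dom import minidom

def text_of(element):
    # Join all text/CDATA children; a parser may deliver character data in pieces.
    return "".join(node.data for node in element.childNodes)

# Literal CRLF inside CDATA: normalized to LF before the application sees it.
doc1 = minidom.parseString("<Data><![CDATA[BEGIN:VCALENDAR\r\nEND:VCALENDAR\r\n]]></Data>")
print(repr(text_of(doc1.documentElement)))  # 'BEGIN:VCALENDAR\nEND:VCALENDAR\n'

# Character reference in ordinary content: the carriage return survives.
doc2 = minidom.parseString("<Data>BEGIN:VCALENDAR&#13;\nEND:VCALENDAR&#13;\n</Data>")
print(repr(text_of(doc2.documentElement)))  # 'BEGIN:VCALENDAR\r\nEND:VCALENDAR\r\n'
```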
Re: just a bug (was: xml.dom.minidom: how to preserve CRLF's inside CDATA?)
On May 25, 12:03 pm, "sim.sim" <[EMAIL PROTECTED]> wrote:
> On 25 May, 12:45, Marc 'BlackJack' Rintsch <[EMAIL PROTECTED]> wrote:
> > In <[EMAIL PROTECTED]>, sim.sim wrote:
> > > Below is the code that tries to parse a well-formed xml, but it fails
> > > with error message:
> > > "not well-formed (invalid token): line 3, column 85"
> >
> > How did you verify that it is well formed? `xmllint` barfs on it too.
>
> you can try to write iMessage to file and open it using Mozilla
> Firefox (web-browser)
>
> > > The "problem" is within the CDATA section: it contains part of a utf-8
> > > encoded string which was split (widely used for memory-limited
> > > devices).
> >
> > > When minidom parses the xml-string, it fails because it tries to convert
> > > the data within the CDATA section into unicode, instead of just returning
> > > the value of the section "as is". The conversion contradicts the
> > > specification http://www.w3.org/TR/REC-xml/#sec-cdata-sect
> >
> > An XML document contains unicode characters, so does the CDATA section.
> > CDATA is not meant to put arbitrary bytes into a document. It must
> > contain valid characters of this
> > type http://www.w3.org/TR/REC-xml/#NT-Char (linked from the grammar of
> > CDATA in your link above).
> >
> > Ciao,
> > Marc 'BlackJack' Rintsch
>
> my CDATA section contains only symbols in the range specified for
> Char:
> Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
> [#x10000-#x10FFFF]
>
> filter(lambda x: ord(x) not in range(0x20, 0xD7FF), iMessage)

You need to explicitly convert the string of UTF-8 encoded bytes to a
Unicode string before parsing, e.g.

unicodestring = unicode(encodedbytes, 'utf8')

Unless I messed up copying and pasting, your original string had an
erroneous byte immediately before ]]>. With that corrected I was able to
process the string correctly - the CDATA marked section consists entirely
of spaces and Cyrillic characters.

As I noted earlier, you will lose \r characters as part of the basic XML
processing.

HTH
Harvey
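The unicode(encodedbytes, 'utf8') call above is Python 2; in Python 3 the equivalent is bytes.decode. This sketch (the sample text is hypothetical) shows why a UTF-8 string cut in the middle of a multi-byte character cannot go into CDATA: the truncated bytes are not valid UTF-8, so strict decoding fails before any XML parsing happens:

```python
from xml.dom import minidom

good = "Привет".encode("utf-8")  # each Cyrillic letter takes 2 bytes in UTF-8
truncated = good[:-1]            # cut in the middle of the last character

def to_text(encoded):
    """Decode UTF-8 bytes strictly; raises UnicodeDecodeError on invalid input."""
    return encoded.decode("utf-8")

print(to_text(good))
try:
    to_text(truncated)
except UnicodeDecodeError as exc:
    print("cannot decode:", exc.reason)

# Properly decoded text can then be placed in a CDATA section and parsed.
doc = minidom.parseString("<Data><![CDATA[" + to_text(good) + "]]></Data>")
print(doc.documentElement.firstChild.data)
```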
Re: XML Parsing
On Mar 28, 10:51 am, "Diez B. Roggisch" <[EMAIL PROTECTED]> wrote:
> [EMAIL PROTECTED] wrote:
> > I want to parse this XML file:
> >
> > filename
> > Hello
> >
> > filename2
> > Hello2
> >
> > This XML will be in a file called filecreate.xml
> >
> > As you might have guessed, I want to create files from this XML file
> > contents, so how can I do this?
> > What modules should I use? What options do I have? Where can I find
> > tutorials? Will I be able to put
> > this on the internet (on a googlepages server)?
> >
> > Thanks in advance to everyone who helps me.
> > And yes I have used Google but I am unsure what to use.
>
> The above file is not valid XML. It misses a xmlns:text namespace
> declaration. So you won't be able to parse it regardless of what parser you
> use.
>
> Diez

The example is valid well-formed XML. It is permitted to use the ":"
character in element names. Whether one should in a non-namespace context
is a different matter.

Harvey
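Both points can be checked from Python 3. The sample document below is hypothetical (the original XML was lost from the archive): plain expat without namespace processing accepts the prefixed names, while a namespace-aware parser such as ElementTree reports an unbound prefix until xmlns:text is declared (the namespace URI used here is an arbitrary example):

```python
import xml.parsers.expat
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the poster's file.
doc = "<text:file><text:name>filename</text:name></text:file>"

# 1. Expat without namespace processing: ':' is just part of the element name.
names = []
parser = xml.parsers.expat.ParserCreate()
parser.StartElementHandler = lambda name, attrs: names.append(name)
parser.Parse(doc, True)
print(names)  # ['text:file', 'text:name']

# 2. A namespace-aware parser rejects the undeclared prefix.
try:
    ET.fromstring(doc)
except ET.ParseError as exc:
    print("ElementTree:", exc)

# 3. Declaring the prefix makes it acceptable to namespace-aware parsers too.
declared = ('<text:file xmlns:text="urn:example:text">'
            '<text:name>filename</text:name></text:file>')
root = ET.fromstring(declared)
print(root.tag)  # {urn:example:text}file
```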
Re: re.sub and empty groups
Hugo Ferreira wrote:
> Hi!
>
> I'm trying to do a search-replace in places where some groups are
> optional... Here's an example:
>
> >> re.match(r"Image:([^\|]+)(?:\|(.*))?", "Image:ola").groups()
> ('ola', None)
>
> >> re.match(r"Image:([^\|]+)(?:\|(.*))?", "Image:ola|").groups()
> ('ola', '')
>
> >> re.match(r"Image:([^\|]+)(?:\|(.*))?", "Image:ola|ole").groups()
> ('ola', 'ole')
>
> The second and third results are right, but not the first one, where
> it should be equal to the second (i.e., it should be an empty string
> instead of None). This is because I want to use re.sub() and when the
> group is None, it blows up with a stack trace...
>
> Maybe I'm not getting the essence of groups and non-grouping groups.
> Someone care to explain (and, give the correct solution :)) ?
>
> Thanks in advance,
>
> Hugo Ferreira
>
> --
> GPG Fingerprint: B0D7 1249 447D F5BB 22C5 5B9B 078C 2615 504B 7B85
From the documentation:
groups( [default])
Return a tuple containing all the subgroups of the match, from 1 up to
however many groups are in the pattern. The default argument is used
for groups that did not participate in the match; it defaults to None.
Your second group is optional and does not take part in the match in
your first example. You can, however, still use this regular expression
if you use groups('') rather than groups().
A better way is probably to use a simplified regular expression:
re.match(r"Image:([^\|]+)\|?(.*)", "Image:ola").groups()
i.e. match the text "Image:" followed by at least one character not
matching "|" followed by an optional "|" followed by any remaining
characters.
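The behaviour is easy to check in Python 3. This sketch compares groups() and groups('') for the original pattern with the simplified one, and shows a replacement function that supplies the default itself (the to_link function and its output format are illustrative). Since Python 3.5, re.sub replaces an unmatched group in a template string with an empty string instead of raising an error, so the stack trace described in the question applies to older versions:

```python
import re

OPTIONAL = re.compile(r"Image:([^|]+)(?:\|(.*))?")
SIMPLE = re.compile(r"Image:([^|]+)\|?(.*)")

for text in ("Image:ola", "Image:ola|", "Image:ola|ole"):
    m = OPTIONAL.match(text)
    # groups('') substitutes '' for a group that did not take part in the match.
    print(m.groups(), m.groups(''), SIMPLE.match(text).groups())

def to_link(m):
    # m.group(2) is None when the optional part is absent.
    name, caption = m.group(1), m.group(2) or ''
    return '[%s|%s]' % (name, caption)

print(OPTIONAL.sub(to_link, "Image:ola"))   # [ola|]
print(OPTIONAL.sub(r"\1/\2", "Image:ola"))  # ola/  (Python 3.5+)
```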
Re: One more regular expressions question
Victor Polukcht wrote:
> My pattern now is:
>
> (?P[^(]+)(?P\d+)\)\s+(?P\d+)
>
> And I expect to get:
>
> var1 = "Unassigned Number "
> var2 = "1"
> var3 = "32"
>
> I'm sure my regexp is incorrect, but can't understand where exactly.
>
> Regex.debug shows that even the first block is incorrect.
>
> Thanks in advance.
>
> On Jan 18, 1:15 pm, Roberto Bonvallet <[EMAIL PROTECTED]>
> wrote:
> > Victor Polukcht wrote:
> > > My actual problem is I can't get how to include space, comma, slash.
> >
> > Post here what you have written already, so we can tell you what the
> > problem is.
> >
> > --
> > Roberto Bonvallet

You are missing \( after the first group. The RE should be:

'(?P[^(]+)\((?P\d+)\)\s+(?P\d+)'
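A quick Python 3 check of the corrected expression. The group names var1/var2/var3 and the input line are assumptions (the original group names and data did not survive in the archive), chosen to match the expected values:

```python
import re

# Assumed group names, matching the variables in the question.
LINE_RE = re.compile(r'(?P<var1>[^(]+)\((?P<var2>\d+)\)\s+(?P<var3>\d+)')

m = LINE_RE.match("Unassigned Number (1)   32")  # hypothetical input line
for name in ("var1", "var2", "var3"):
    print(name, "=", repr(m.group(name)))
```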
Re: Match 2 words in a line of file
Rickard Lindberg wrote:
> I see two potential problems with the non regex solutions.
>
> 1) Consider a line: "foo (bar)". When you split it you will only get
> two strings, as split by default only splits the string on white space
> characters. Thus "'bar' in words" will return false, even though bar is
> a word in that line.
>
> 2) If you have a line something like this: "foobar hello" then "'foo'
> in line" will return true, even though foo is not a word (it is part of
> a word).
Here's a solution using re.split:

import re
import StringIO

wordsplit = re.compile(r'\W+').split

def matchlines(fh, w1, w2):
    w1 = w1.lower()
    w2 = w2.lower()
    for line in fh:
        words = [x.lower() for x in wordsplit(line)]
        if w1 in words and w2 in words:
            print line.rstrip()

test = """1st line of text (not matched)
2nd line of words (not matched)
3rd line (Word test) should match (case insensitivity)
4th line simple test of word's (matches)
5th line simple test of words not found (plural words)
6th line tests produce strange words (no match - plural)
7th line "word test" should find this
"""

matchlines(StringIO.StringIO(test), 'test', 'word')
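A Python 3 version of the same re.split approach, written as a generator (the function name is illustrative) and using a set subset test for "both words present":

```python
import io
import re

split_words = re.compile(r"\W+").split

def matching_lines(fh, w1, w2):
    """Yield lines containing both w1 and w2 as whole words, ignoring case."""
    wanted = {w1.lower(), w2.lower()}
    for line in fh:
        # split() yields '' when a line starts or ends with a non-word character.
        words = {w.lower() for w in split_words(line) if w}
        if wanted <= words:  # both words present
            yield line.rstrip()

test = """1st line of text (not matched)
2nd line of words (not matched)
3rd line (Word test) should match (case insensitivity)
4th line simple test of word's (matches)
5th line simple test of words not found (plural words)
6th line tests produce strange words (no match - plural)
7th line "word test" should find this
"""

for line in matching_lines(io.StringIO(test), 'test', 'word'):
    print(line)
```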
SMTPLIB & email.MIMEText : Certain charaters in the body stop mail from arriving. Why?
Hello,

Sending mail with certain characters in the body causes the mail never to
arrive. Why? E.g. if the body text has a full stop "." the mail never arrives.
I'm using python 4.2 on windows.

Harvey

#
import smtplib
from email.MIMEText import MIMEText

def mail(serverURL=None, sender='', to='', subject='', text=''):
    COMMASPACE = ', '
    msg = MIMEText(text)
    msg['Subject'] = subject
    msg['From'] = sender
    msg['To'] = COMMASPACE.join(to)  # header: one comma-separated string
    mailServer = smtplib.SMTP(serverURL, 25)
    # sendmail() takes the recipient list itself, not the joined header string
    mailServer.sendmail(sender, to, msg.as_string())
    mailServer.quit()
    print msg.as_string()

# Output
Content-Type: text/plain; charset="us-ascii"
MIME-Version: 1.0
Content-Transfer-Encoding: 7bit
Subject: "Information from ost-cs-emma"
From: [EMAIL PROTECTED]
To: [EMAIL PROTECTED]

Some text
Some more text in body
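For comparison, a Python 3 sketch of the same helper using email.message.EmailMessage and SMTP.send_message. The addresses, subject and host are placeholders, and this is not a diagnosis of the lost mail; it keeps the list of addresses separate from the comma-joined To: header and lets set_content() choose the charset and transfer encoding:

```python
import smtplib
from email.message import EmailMessage

def build_message(sender, recipients, subject, text):
    """Build a plain-text message; recipients is a list of addresses."""
    msg = EmailMessage()
    msg["Subject"] = subject
    msg["From"] = sender
    msg["To"] = ", ".join(recipients)  # the header is one comma-separated string
    msg.set_content(text)              # picks charset and transfer encoding
    return msg

def send(msg, host="localhost", port=25):
    """Send msg; send_message() takes the envelope recipients from To/Cc/Bcc."""
    with smtplib.SMTP(host, port) as server:
        server.send_message(msg)

msg = build_message("me@example.com", ["a@example.com", "b@example.com"],
                    "Information", "Some text.\n.\nSome more text in body.")
print(msg.as_string())
```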
Re: Matching XML Tag Contents with Regex
On Dec 11, 4:05 pm, Chris <[EMAIL PROTECTED]> wrote:
> I'm trying to find the contents of an XML tag. Nothing fancy. I don't
> care about parsing child tags or anything. I just want to get the raw
> text. Here's my script:
>
> import re
>
> data = """
>
>
>
> here's some text!
>
>
> here's some text!
>
>
> here's some text!
>
>
> """
>
> tagName = 'div'
> pattern = re.compile('<%(tagName)s\s[^>]*>[.\n\r\w\s\d\D\S\W]*[^(%(tagName)s)]*' % dict(tagName=tagName))
>
> matches = pattern.finditer(data)
> for m in matches:
> contents = data[m.start():m.end()]
> print repr(contents)
> assert tagName not in contents
>
> The problem I'm running into is that the [^%(tagName)s]* portion of my
> regex is being ignored, so only one match is being returned, starting
> at the first and ending at the end of the text, when it should
> end at the first . For this example, it should return three
> matches, one for each div.
>
> Is what I'm trying to do possible with Python's Regex library? Is
> there an error in my Regex?
>
> Thanks,
> Chris
print re.findall(r'<%s(?=[\s/>])[^>]*>' % 'div', r)
["", "", ""]
HTH
Harvey
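The original pattern fails because [...] is a character class: [^(%(tagName)s)]* excludes the individual characters d, i, v, ( and ), not the string "div", and the greedy [.\n\r\w\s\d\D\S\W]* before it can match any character at all, so a match runs to the end of the text. A Python 3 sketch using a non-greedy group instead (the sample data and function name are hypothetical); it does not handle nested tags, for which an XML/HTML parser is the better tool:

```python
import re

# Hypothetical sample data: the tags in the original post were lost in the archive.
data = """
<div class="a">
here's some text!
</div>
<div class="b">
here's some text!
</div>
<div class="c">
here's some text!
</div>
"""

def tag_contents(tag, text):
    """Return the raw contents of each <tag ...>...</tag> pair (no nesting)."""
    name = re.escape(tag)
    # (?=[\s/>]) stops "<div" from matching "<divider"; the non-greedy (.*?)
    # makes each match end at the first closing tag after its opening tag.
    pattern = re.compile(r'<%s(?=[\s/>])[^>]*>(.*?)</%s>' % (name, name), re.DOTALL)
    return pattern.findall(text)

for contents in tag_contents('div', data):
    print(repr(contents))
```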
Re: Issue with regular expressions
On Apr 29, 2:46 pm, Julien <[EMAIL PROTECTED]> wrote:
> Hi,
>
> I'm fairly new in Python and I haven't used the regular expressions
> enough to be able to achieve what I want.
> I'd like to select terms in a string, so I can then do a search in my
> database.
>
> query = ' " some words" with and "without quotes " '
> p = re.compile(magic_regular_expression) $ <--- the magic happens
> m = p.match(query)
>
> I'd like m.groups() to return:
> ('some words', 'with', 'and', 'without quotes')
>
> Is that achievable with a single regular expression, and if so, what
> would it be?
>
> Any help would be much appreciated.
>
> Thanks!!
>
> Julien
You can't do it simply and completely with regular expressions alone
because of the requirement to strip the quotes and normalize
whitespace, but it's not too hard to write a function to do it. Viz:

import re

wordre = re.compile('"[^"]+"|[a-zA-Z]+').findall

def findwords(src):
    ret = []
    for x in wordre(src):
        if x[0] == '"':
            # strip off the quotes and normalise spaces
            ret.append(' '.join(x[1:-1].split()))
        else:
            ret.append(x)
    return ret

query = ' " Some words" with and "without  quotes " '
print findwords(query)
Running this gives
['Some words', 'with', 'and', 'without quotes']
HTH
Harvey
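A Python 3 sketch of the same idea in which one alternation does the quote stripping: group 1 captures the text between double quotes, group 2 a bare run of non-space characters. The pattern and function name are illustrative, and unlike the function above it treats any non-space run, not just letters, as a term:

```python
import re

# Either a double-quoted phrase (group 1) or a run of non-space characters (group 2).
TERM_RE = re.compile(r'"([^"]*)"|(\S+)')

def find_terms(query):
    """Split a search query into bare words and quoted phrases."""
    terms = []
    for quoted, bare in TERM_RE.findall(query):
        # At most one group is non-empty; normalize whitespace inside phrases.
        term = bare or " ".join(quoted.split())
        if term:  # skip empty phrases such as ""
            terms.append(term)
    return terms

print(find_terms(' " some words" with and "without quotes " '))
# ['some words', 'with', 'and', 'without quotes']
```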
Re: finding most common elements between thousands of multiple arrays.
2009/7/4 Andre Engels :
> On Sat, Jul 4, 2009 at 9:33 AM, mclovin wrote:
>> Currently I need to find the most common elements in thousands of
>> arrays within one large array (around 2 million instances with ~70k
>> unique elements)
>>
>> so I set up a dictionary to handle the counting so when I am
>> iterating I up the count on the corresponding dictionary element. I
>> then iterate through the dictionary and find the 25 most common
>> elements.
>>
>> the elements are initially held in an array within an array. so I am
>> just trying to find the most common elements between all the arrays
>> contained in one large array.
>> my current code looks something like this:
>> d = {}
>> for arr in my_array:
>> -for i in arr:
>> #elements are numpy integers and thus are not accepted as dictionary
>> keys
>> ---d[int(i)]=d.get(int(i),0)+1
>>
>> then I filter things down. but with my algorithm that only takes about
>> 1 sec so I don't need to show it here since that isn't the problem.
>>
>>
>> But there has to be something better. I have to do this many many
>> times and it seems silly to iterate through 2 million things just to
>> get 25. The element IDs are integers and are currently being held in
>> numpy arrays in a larger array. this ID is what makes up the key to
>> the dictionary.
>>
>> It currently takes about 5 seconds to accomplish this with my current
>> algorithm.
>>
>> So does anyone know the best solution or algorithm? I think the trick
>> lies in matrix intersections but I do not know.
>
> There's no better algorithm for the general case. No method of
> checking the matrices using less than 200-x look-ups will ensure
> you that there's not a new value with x occurrences lurking somewhere.
Try flattening the arrays into a single large array & sorting it. Then
you can just iterate over the large array counting as you go; you only
ever have to insert into the dict once for each value and there are no
lookups in the dict. I don't know numpy, so there's probably a more
efficient way to write this, but this should show what I'm talking
about:

big_arr = sorted(reduce(list.__add__, my_array, []))
counts = {}
last_val = big_arr[0]
count = 0
for val in big_arr:
    if val == last_val:
        count += 1
    else:
        counts[last_val] = count
        count = 1  # the current value is the first of its run
        last_val = val
counts[last_val] = count  # to get the count for the last value.
If flattening the arrays isn't practical, you may still get some
improvements by sorting them individually and applying the same
principle to each of them:

counts = {}
for arr in my_array:
    sorted_arr = sorted(arr)
    last_val = sorted_arr[0]
    count = 0
    for val in sorted_arr:
        if val == last_val:
            count += 1
        else:
            counts[last_val] = counts.get(last_val, 0) + count
            count = 1  # the current value is the first of its run
            last_val = val
    counts[last_val] = counts.get(last_val, 0) + count
Hope that helps...
Vil.
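The standard library already packages the dictionary-counting approach: collections.Counter tallies the values and most_common(n) returns the top n. A sketch (the function name is hypothetical); it still visits every element once, which, per the reply above, no general method can avoid:

```python
from collections import Counter
from itertools import chain

def most_common_elements(arrays, n=25):
    """Return the n most common values across all inner arrays as (value, count) pairs."""
    # chain.from_iterable walks every inner array without building a flat copy;
    # int() mirrors the conversion in the original loop.
    counts = Counter(int(x) for x in chain.from_iterable(arrays))
    return counts.most_common(n)

print(most_common_elements([[1, 2, 3], [2, 3], [3]], n=2))  # [(3, 3), (2, 2)]
```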
Re: Reversible Debugging
2009/7/4 Patrick Sabin :
> If someone has another idea of taking a snapshot let me know. Using VMWare
> is not a very elegant way in my opinion.

Someone implemented the same idea for Java a while ago. They called it
"omniscient debugging"; you can find details at
http://www.lambdacs.com/debugger/
and a paper about it at
http://www.lambdacs.com/debugger/AADEBUG_Mar_03.pdf

Another more recent paper on the topic is
http://scg.unibe.ch/archive/papers/Lien08bBackInTimeDebugging.pdf

I haven't read either of these papers myself, but maybe they'll give
you some ideas.

Vil.
Re: finding most common elements between thousands of multiple arrays.
2009/7/4 Steven D'Aprano :
> On Sat, 04 Jul 2009 13:42:06 +, Steven D'Aprano wrote:
>
>> On Sat, 04 Jul 2009 10:55:44 +0100, Vilya Harvey wrote:
>>
>>> 2009/7/4 Andre Engels :
>>>> On Sat, Jul 4, 2009 at 9:33 AM, mclovin wrote:
>>>>> Currently I need to find the most common elements in thousands of
>>>>> arrays within one large array (around 2 million instances with ~70k
>>>>> unique elements)
>> ...
>>>> There's no better algorithm for the general case. No method of
>>>> checking the matrices using less than 200-x look-ups will ensure
>>>> you that there's not a new value with x occurrences lurking somewhere.
>>>
>>> Try flattening the arrays into a single large array & sorting it. Then
>>> you can just iterate over the large array counting as you go; you only
>>> ever have to insert into the dict once for each value and there are no
>>> lookups in the dict.
>>
>> You're suggesting to do a whole bunch of work copying 2,000,000 pointers
>> into a single array, then a whole bunch of more work sorting that second
>> array (which is O(N*log N) on average), and then finally iterate over
>> the second array. Sure, that last step will on average involve fewer
>> than O(N) steps,
>
> Er what?
>
> Ignore that last comment -- I don't know what I was thinking. You still
> have to iterate over all N elements, sorted or not.
>
>> but to get to that point you've already done more work
>> than just iterating over the array-of-arrays in the first place.
>
> What it does buy you though, as you pointed out, is reducing the number
> of explicit dict lookups and writes. However, dict lookups and writes are
> very fast, fast enough that they're used throughout Python. A line like:
>
> count += 1
>
> actually is a dict lookup and write.

I did some tests, just to be sure, and you're absolutely right: just
creating the flattened list took several hundred (!) times as long as
iterating through all the lists in place.

Live and learn...

Vil.
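Part of the slowdown has a simple cause: reduce(list.__add__, ...) builds a brand-new list at every step, copying everything accumulated so far, so flattening this way costs quadratic copying. itertools.chain.from_iterable yields the elements in a single pass. A small Python 3 sketch comparing the two (reduce lives in functools in Python 3; the data sizes are arbitrary and timings will vary by machine):

```python
from functools import reduce
from itertools import chain
import timeit

arrays = [list(range(i, i + 20)) for i in range(1000)]  # 20,000 elements in total

def flatten_reduce(arrs):
    # Each list.__add__ copies the whole accumulated list: quadratic overall.
    return reduce(list.__add__, arrs, [])

def flatten_chain(arrs):
    # One pass over the inner lists: linear overall.
    return list(chain.from_iterable(arrs))

print(flatten_reduce(arrays) == flatten_chain(arrays))  # True
for fn in (flatten_reduce, flatten_chain):
    print(fn.__name__, timeit.timeit(lambda: fn(arrays), number=1))
```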
Re: Why is my code faster with append() in a loop than with a large list?
2009/7/6 Xavier Ho :
> Why is version B of the code faster than version A? (Only three lines
> different)

Here's a guess:

As the number you're testing gets larger, version A is creating a very big
list. I'm not sure exactly how much overhead each list entry has in Python,
but I guess it's at least 8 bytes: a 32-bit reference for each list entry,
and 32 bits to hold the int value (assuming a 32-bit version of Python).
The solution you're looking for is a large 8-digit number; let's say
80,000,000, for the sake of easy calculation. That means, as you get close
to the solution, you'll be trying to allocate almost 640 MB of memory for
every number you're checking. That's going to make the garbage collector
work extremely hard. Also, depending on how much memory your computer has
free, you'll probably start hitting virtual memory too, which will slow
you down even further. Finally, the reduce step has to process all
80,000,000 elements, which is clearly going to take a while.

Version B creates a list which is only as long as the largest prime
factor, so at worst the list size will be approx. sqrt(80,000,000), which
is approx. 8900 elements or approx. 72 KB of memory - a much more
manageable size.

Hope that helps,
Vil.
Re: count
2009/7/8 Dhananjay :
> I wanted to sort column 2 in assending order and I read whole file in array
> "data" and did the following:
>
> data.sort(key = lambda fields:(fields[2]))
>
> I have sorted column 2, however I want to count the numbers in the column 2.
> i.e. I want to know, for example, how many repeates of say '3' (first row,
> 2nd column in above data) are there in column 2.

One thing: indexes in Python start from 0, so the second column has an index of 1, not 2. In other words, it should be

data.sort(key=lambda fields: fields[1])

instead.

With that out of the way, the following will print out a count of each unique item in the second column. Note that groupby only groups consecutive equal items, so this relies on data having been sorted on that same column first:

from itertools import groupby
for x, g in groupby([fields[1] for fields in data]):
    print x, len(tuple(g))

Hope that helps,

Vil.
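A Python 3 sketch of the same counting, using hypothetical sample rows since the poster's file isn't shown. It runs the sort-then-`groupby` approach from the reply next to `collections.Counter`, an alternative not mentioned in the reply that does not need the data to be sorted.

```python
from collections import Counter
from itertools import groupby

# Hypothetical sample rows; column 2 (index 1) holds the values to count.
data = [
    ["a", "3", "x"],
    ["b", "1", "y"],
    ["c", "3", "z"],
    ["d", "2", "x"],
]

# groupby only merges *consecutive* equal keys, so sort on the same column first.
data.sort(key=lambda fields: fields[1])
groupby_counts = {key: len(list(group)) for key, group in groupby(fields[1] for fields in data)}

# Counter produces the same tallies without needing the data sorted.
counter_counts = Counter(fields[1] for fields in data)

print(groupby_counts)                 # {'1': 1, '2': 1, '3': 2}
print(counter_counts.most_common(1))  # [('3', 2)]
```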
Re: Question about generators
2009/7/12 Cameron Pulsford :
> My question is, is it possible to combine those two loops? The primes
> generator I wrote finds all primes up to n, except for 2, 3 and 5, so I must
> check those explicitly. Is there anyway to concatenate the hard coded list
> of [2,3,5] and the generator I wrote so that I don't need two for loops that
> do the same thing?

itertools.chain([2, 3, 5], primes) is what you're looking for, I think.

Vil.
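Cameron's generator isn't shown, so `primes_above_5` below is a hypothetical stand-in that, like his, skips 2, 3 and 5. The point is the last loop: `itertools.chain` exhausts the hard-coded list and then continues with the generator, so one loop covers both.

```python
from itertools import chain


def primes_above_5(n):
    """Hypothetical stand-in for the poster's generator: yields primes 7..n."""
    for candidate in range(7, n + 1, 2):  # odd numbers only
        if candidate % 3 == 0 or candidate % 5 == 0:
            continue
        # Trial division by odd numbers from 7 up to sqrt(candidate).
        d = 7
        is_prime = True
        while d * d <= candidate:
            if candidate % d == 0:
                is_prime = False
                break
            d += 2
        if is_prime:
            yield candidate


# chain() yields everything from the list first, then everything from the generator.
for p in chain([2, 3, 5], primes_above_5(30)):
    print(p, end=" ")
print()
# prints: 2 3 5 7 11 13 17 19 23 29
```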
Re: Best Way to Handle All Exceptions
2009/7/13 seldan24 :
> Thank you both for your input. I want to make sure I get started on
> the right track. For this particular script, I should have included
> that I would take the exception contents, and pass those to the
> logging module. For this particular script, all exceptions are fatal
> and I would want them to be. I just wanted a way to catch them and
> log them prior to program termination.
The logging module has a function specifically to handle this case:

import logging

try:
    pass  # do something
except:
    logging.exception("Uh oh...")
    raise  # re-raise so the exception is still fatal once it has been logged

The logging.exception() function will automatically add details of the exception
to the log message it creates; as per the docs, you should only call
it from within an exception handler.
Hope that helps,
Vil.
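A minimal sketch of the "log it, then let it kill the program" pattern seldan24 describes. The wrapper name `run_and_log` and the `broken` function are hypothetical; `logging.exception()` logs at ERROR level and appends the current traceback, and the bare `raise` keeps the exception fatal.

```python
import io
import logging


def run_and_log(func):
    """Call func(); if it raises, log the full traceback and re-raise."""
    try:
        return func()
    except Exception:
        # logging.exception() logs at ERROR level and includes the traceback.
        logging.exception("Fatal error in %s", func.__name__)
        raise


if __name__ == "__main__":
    # Send log output to an in-memory buffer so it can be inspected.
    buffer = io.StringIO()
    logging.basicConfig(stream=buffer, level=logging.ERROR)

    def broken():
        return 1 / 0

    try:
        run_and_log(broken)
    except ZeroDivisionError:
        pass  # in a real script you would let this propagate and terminate

    print(buffer.getvalue())
```

Catching `Exception` rather than using a bare `except:` lets `KeyboardInterrupt` and `SystemExit` pass through unlogged; use a bare `except:` with `raise` if those should be logged too.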
Re: Memory error due to big input file
2009/7/13 Aaron Scott :
>> BTW, you should derive all your classes from something. If nothing
>> else, use object.
>> class textfile(object):
>
> Just out of curiousity... why is that? I've been coding in Python for
> a long time, and I never derive my base classes. What's the advantage
> to deriving them?

class Foo: uses the old object model.

class Foo(object): uses the new object model.

See http://docs.python.org/reference/datamodel.html (specifically section 3.3) for details of the differences. (This distinction is a Python 2 one; in Python 3 every class is new-style, so the two forms are equivalent.)

Vil.
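A small Python 3 sketch of the point above; the class names are hypothetical. Both spellings end up inheriting from `object` in Python 3. The property setter is one example of a feature that, in Python 2, only behaves as expected on new-style classes (on a classic class the assignment would simply create an instance attribute and bypass the setter).

```python
class Implicit:            # in Python 3 this already inherits from object
    pass


class Explicit(object):    # the spelling Python 2 needs for a new-style class
    pass


class Temperature(object):
    """Uses a property setter, which in Python 2 needs a new-style class."""

    def __init__(self):
        self._celsius = 0.0

    @property
    def celsius(self):
        return self._celsius

    @celsius.setter
    def celsius(self, value):
        self._celsius = float(value)  # normalise whatever we're given to a float


print(Implicit.__mro__[1] is object, Explicit.__mro__[1] is object)  # True True
t = Temperature()
t.celsius = "21"
print(t.celsius)  # 21.0
```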
type hinting backward compatibility with python 3.0 to 3.4
I think it's great that for built-in types such as int and str, backward
compatibility of type hinting annotations is baked into python 3.0 to 3.4. In
fact, I *thought* python 3.0 to 3.4 would *ignore* annotations, but it
doesn't...
I'm struggling to create something backward compatible that requires the
'typing' module. For example, the following program is good in python 3.5, but
line 11 raises a NameError in python 3.4:
1 import sys
2
3 if sys.version_info[0] < 3:
4     raise RuntimeError("Must use at least python version 3")
5
6 # The 'typing' module, useful for type hints, was introduced in python 3.5
7 if sys.version_info[1] >= 5:
8     from typing import Optional
9
10
11 def divider(x: int, y: int) -> Optional[float]:
12     if y == 0:
13         return None
14     return x / y
15
16 print("22 / 7 = " + str(divider(22, 7)))
17 print("8 / 0 = " + str(divider(8, 0)))
18
When I run this program in python 3.4, I get this:
Traceback (most recent call last):
  File "./ned.py", line 11, in <module>
    def divider(x: int, y: int) -> Optional[float]:
NameError: name 'Optional' is not defined
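The NameError above comes from how annotations behave in the Python 3 versions discussed here: they are not ignored but evaluated when the `def` statement runs, and the resulting objects are stored on the function. They are never enforced, though. A minimal sketch, using a simplified `divider` without the `Optional` return type:

```python
def divider(x: int, y: int) -> float:
    # The annotations are stored on the function but never checked at call time.
    return x / y


# The evaluated annotation objects end up in the function's __annotations__ dict.
print(divider.__annotations__)
# {'x': <class 'int'>, 'y': <class 'int'>, 'return': <class 'float'>}

# Nothing enforces them: floats are accepted despite the "int" annotations.
print(divider(7.5, 2.5))  # 3.0
```

So a name used in an annotation must exist when the function is defined, which is why `Optional` has to be importable (or replaced) on 3.0 to 3.4.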
RE: type hinting backward compatibility with python 3.0 to 3.4
This pattern seems to work:
import sys

if sys.version_info[0] < 3:
    raise RuntimeError("Must use at least python version 3")

# The 'typing' module, useful for type hints, was introduced in python 3.5
if sys.version_info >= (3, 5):
    from typing import Optional
    optional_float = Optional[float]
else:
    optional_float = object

def divider(x: int, y: int) -> optional_float:
    if y == 0:
        return None
    return x / y

print("3 / 0 = " + str(divider(3, 0)))
print("22 / 7 = " + str(divider(22, 7)))
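Another option, not from this thread: quote the annotation. The interpreter stores a string annotation as-is instead of evaluating it, so the definition works on 3.0 to 3.4 even where `Optional` is undefined, while type checkers still read it as `Optional[float]` (PEP 484 calls these forward references). A minimal sketch:

```python
# Import Optional where it exists (3.5+); older versions simply skip it.
try:
    from typing import Optional  # only needed by type checkers here
except ImportError:
    pass


# The return annotation is a plain string, so nothing is looked up at def time.
def divider(x: int, y: int) -> "Optional[float]":
    if y == 0:
        return None
    return x / y


print("22 / 7 = " + str(divider(22, 7)))
print("8 / 0 = " + str(divider(8, 0)))  # 8 / 0 = None
```

The trade-off is that the quoted type is no longer a live object, so tools that inspect annotations at runtime need `typing.get_type_hints()` (3.5+) to resolve it.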
