Hello,
The following code returns 'abc123abc45abc789jk'. How do I revise the pattern so
that the return value will be 'abc789jk'? In other words, I want to find the
pattern 'abc' that is closest to 'jk'. Here the strings '123', '45' and '789' are
just examples. They are actually quite different in
On Mon, 27 Apr 2009 23:29:13 -0400,
Dan Liang wrote:
> Hi Bob, Shantanoo, Kent, and tutors,
>
> Thank you Bob, Shantanoo, Kent for all the nice feedback. Exception
> handling, the concept of states in cs, and the use of the for loop with
> offset helped a lot. Here is the code I now ha
David wrote:
Norman Khine wrote:
On Mon, Apr 27, 2009 at 12:07 AM, Sander Sweers wrote:
Here is another one for fun; you run it like this:
python countdown.py 10
#!/usr/bin/env python
import sys
from time import sleep
times = int(sys.argv[1]) # The argument given on the command line
def countdow
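The preview cuts off at `def countdow`, so the original function body is not available. Below is a minimal sketch of what such a countdown script could look like. It is written for Python 3 because it uses the print function. The names `countdown`, `delay` and `out` are hypothetical and do not come from the original message. The `out` parameter exists only to make the function easy to test.

```python
import sys
from time import sleep


def countdown(times, delay=1.0, out=None):
    """Print times, times-1, ..., 1, pausing `delay` seconds after each number."""
    if out is None:
        out = sys.stdout
    for remaining in range(times, 0, -1):
        print(remaining, file=out)
        sleep(delay)


if __name__ == "__main__":
    # The number to count down from is the first command-line argument.
    if len(sys.argv) != 2:
        print("usage: python countdown.py SECONDS")
    else:
        countdown(int(sys.argv[1]))
```

Running `python countdown.py 10` would print 10 down to 1, one number per second.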
Denis, this mail was very comprehensive, and went a long way toward driving
it all home for me.
There are several different concepts involved in this simple problem that I
had, and you guys explaining them has really expanded my Pythonic horizons,
especially the explanations
on the argv mod
> Hello,
>
> The following code returns 'abc123abc45abc789jk'. How do I revise the pattern so
> that the return value will be 'abc789jk'? In other words, I want to find the
> pattern 'abc' that is closest to 'jk'. Here the strings '123', '45' and '789' are
> just examples. They are actually q
On 28 April 2009 11:16, Andre Engels wrote:
> 2009/4/28 Marek spociń...@go2.pl,Poland :
> >> Hello,
> >>
> >> The following code returns 'abc123abc45abc789jk'. How do I revise the pattern so
> >> that the return value will be 'abc789jk'? In other words, I want to find the
> >>
On Tue, 28 Apr 2009 11:06:16 +0200,
Marek spociń...@go2.pl, Poland wrote:
> > Hello,
> >
> > The following code returns 'abc123abc45abc789jk'. How do I revise the
> > pattern so that the return value will be 'abc789jk'? In other words, I
> > want to find the pattern 'abc' that is clos
Andre Engels gmail.com> writes:
>
> 2009/4/28 Marek Spociński go2.pl,Poland 10g.pl>:
> > I suggest using r'abc.+?jk' instead.
> >
>
> That was my first idea too, but it does not work for this case,
> because Python will still try to _start_ the match as soon as
> possible.
Yeah, I tried t
2009/4/28 Marek spociń...@go2.pl,Poland :
>> import re
>> s = 'abc123abc45abc789jk'
>> p = r'abc.+jk'
>> lst = re.findall(p, s)
>> print lst[0]
>
> I suggest using r'abc.+?jk' instead.
>
> the additional ? makes the preceding '.+' non-greedy, so instead of matching
> as long a string as it can it m
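Andre's objection can be checked directly. This sketch shows that the non-greedy `?` only changes where a match ends, not where it starts. The regex engine reports the leftmost match, so both patterns still begin at the first 'abc'. The second string, `s2`, is an invented example with two 'jk's. It is there only to show a case where greedy and lazy actually differ.

```python
import re

s = 'abc123abc45abc789jk'
greedy = re.findall(r'abc.+jk', s)   # '.+' grabs as much as it can
lazy = re.findall(r'abc.+?jk', s)    # '.+?' grabs as little as it can
# Both matches start at index 0: laziness cannot move the start of a match.
print(greedy)  # ['abc123abc45abc789jk']
print(lazy)    # ['abc123abc45abc789jk']

# Hypothetical string with two 'jk's, where the end point does differ.
s2 = 'abc1jk2jk'
print(re.findall(r'abc.+jk', s2))   # ['abc1jk2jk']
print(re.findall(r'abc.+?jk', s2))  # ['abc1jk']
```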
On Tue, Apr 28, 2009 at 4:03 AM, Kelie wrote:
> Hello,
>
> The following code returns 'abc123abc45abc789jk'. How do I revise the pattern so
> that the return value will be 'abc789jk'? In other words, I want to find the
> pattern 'abc' that is closest to 'jk'. Here the strings '123', '45' and '78
2009/4/28 Marek spociń...@go2.pl,Poland :
>> Hello,
>>
>> The following code returns 'abc123abc45abc789jk'. How do I revise the pattern so
>> that the return value will be 'abc789jk'? In other words, I want to find the
>> pattern 'abc' that is closest to 'jk'. Here the strings '123', '45' and '7
spir free.fr> writes:
> To avoid that, use non-grouping parens (?:...). This also avoids the need for
> parens around the whole format:
> p = Pattern(r'abc(?:(?!abc).)+jk')
> print p.findall(s)
> ['abc789jk']
>
> Denis
This one works! Thank you Denis. I'll try it out on the actual much longer
(m
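Here is a self-contained version of Denis's pattern. The quoted lines call `Pattern(...)` without defining it. This sketch assumes `Pattern` is simply `re.compile` under another name. The second `findall` call shows the problem that non-capturing parentheses avoid. When a pattern contains a capturing group, `findall` returns what the group captured instead of the whole match.

```python
import re

s = 'abc123abc45abc789jk'

# (?!abc) is a negative lookahead: it succeeds only if 'abc' does NOT start
# at the current position. Combined with '.' inside a non-capturing group,
# it means "any character that does not begin another 'abc'", so a match can
# never run across a second 'abc' on its way to 'jk'.
p = re.compile(r'abc(?:(?!abc).)+jk')
print(p.findall(s))  # ['abc789jk']

# With a capturing group instead, findall returns only what the group
# captured on its last repetition, not the whole matched text.
print(re.findall(r'abc((?!abc).)+jk', s))
```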
I'm processing tens of thousands of html files and a few of them contain
mismatched tags and ElementTree throws the error:
"Unexpected error opening J:/F2/663/blahblah.html: mismatched tag: line 124, column 8"
I now want to scan each file and simply identify each mismatched or unpaired
tags (b
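One way to scan many files is to let ElementTree attempt each one and collect the failures. This is a minimal sketch. It relies on `xml.etree.ElementTree.ParseError`, whose `position` attribute holds the (line, column) where the parser stopped. The function name `find_parse_errors` and the glob pattern are hypothetical. Note that expat reports only the first error in each file. It will also reject HTML entities such as `&nbsp;` that XML does not define.

```python
import glob
import xml.etree.ElementTree as ET


def find_parse_errors(sources):
    """Try to parse each source; report the ones ElementTree rejects.

    `sources` may be file names or binary file objects. Returns a list of
    (source, line, column, message) tuples, one per source that failed.
    """
    errors = []
    for source in sources:
        try:
            ET.parse(source)
        except ET.ParseError as exc:
            line, column = exc.position  # where the parser gave up
            errors.append((source, line, column, str(exc)))
    return errors


if __name__ == "__main__":
    # Hypothetical glob modelled on the path in the error message above.
    for src, line, col, msg in find_parse_errors(glob.glob("J:/F2/*/*.html")):
        print("%s: line %d, column %d: %s" % (src, line, col, msg))
```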
Dinesh B Vadhia wrote:
I'm processing tens of thousands of html files and a few of them contain
mismatched tags and ElementTree throws the error:
"Unexpected error opening J:/F2/663/blahblah.html: mismatched tag: line 124, column 8"
I now want to scan each file and simply identify each mismat
A.T.Hofkamp wrote:
> Dinesh B Vadhia wrote:
>> I'm processing tens of thousands of html files and a few of them
>> contain mismatched tags and ElementTree throws the error:
>>
>> "Unexpected error opening J:/F2/663/blahblah.html: mismatched tag:
>> line 124, column 8"
>>
>> I now want to scan each
A.T. / Marty
I'd prefer that the HTML parser didn't replace the missing tags, as I want to
know where and what the problems are. Also, the source HTML documents were
generated by another computer, i.e. they are not web page documents. My sense is
that it is only a few files out of tens of thousa
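The requirement here is to report problems without repairing anything. One approach is a small stack-based checker built on the standard library's `html.parser.HTMLParser`, which reports events but never inserts missing tags. This is a sketch under stated assumptions, not a validator. The class `TagBalanceChecker`, the helper `check_tags`, and the void-element list are all hypothetical. The parser lowercases tag names, and `getpos()` gives the (line, offset) of each tag.

```python
from html.parser import HTMLParser

# HTML elements that never take a closing tag (assumed list; extend as needed).
VOID_ELEMENTS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
                 "link", "meta", "param", "source", "track", "wbr"}


class TagBalanceChecker(HTMLParser):
    """Hypothetical helper: records unmatched tags instead of repairing them."""

    def __init__(self):
        super().__init__()
        self.stack = []     # (tag, (line, column)) for every element still open
        self.problems = []  # human-readable descriptions of what went wrong

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append((tag, self.getpos()))

    def handle_startendtag(self, tag, attrs):
        pass  # <tag/> closes itself, so there is nothing to track

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        line, column = self.getpos()
        # Search the open elements from the innermost outwards.
        for i in range(len(self.stack) - 1, -1, -1):
            if self.stack[i][0] == tag:
                # Anything opened after the matching tag was never closed.
                for open_tag, (ol, oc) in self.stack[i + 1:]:
                    self.problems.append(
                        "unclosed <%s> opened at line %d, column %d" % (open_tag, ol, oc))
                del self.stack[i:]
                return
        self.problems.append("stray </%s> at line %d, column %d" % (tag, line, column))


def check_tags(text):
    """Return a list of tag-balance problems found in `text`."""
    checker = TagBalanceChecker()
    checker.feed(text)
    checker.close()
    for tag, (line, column) in checker.stack:
        checker.problems.append(
            "unclosed <%s> opened at line %d, column %d" % (tag, line, column))
    return checker.problems
```

Unlike an XML parser, this keeps going after the first problem, so one pass can list every mismatch in a file.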
On Tue, Apr 28, 2009 at 8:54 AM, Dinesh B Vadhia
wrote:
> I'm processing tens of thousands of html files and a few of them contain
> mismatched tags and ElementTree throws the error:
>
> "Unexpected error opening J:/F2/663/blahblah.html: mismatched tag: line 124,
> column 8"
>
> I now want to scan
Dinesh B Vadhia wrote:
> A.T. / Marty
>
> I'd prefer that the HTML parser didn't replace the missing tags, as I
> want to know where and what the problems are. Also, the source HTML
> documents were generated by another computer, i.e. they are not web page
> documents. My sense is that it is only
This is the error and traceback:
Unexpected error opening J:/F2/html: mismatched tag: line 124, column 8
Traceback (most recent call last):
File "C:\py", line 492, in
raw = extractText(xhtmlfile)
File "C:\py", line 334, in extractText
tree = make_tree(xhtmlfile)
File ".
On Tue, 28 Apr 2009 07:41:36 -0700,
"Dinesh B Vadhia" wrote:
> This is the error and traceback:
>
> Unexpected error opening J:/F2/html: mismatched tag: line 124, column 8
>
> Traceback (most recent call last):
> File "C:\py", line 492, in
> raw = extractText(xhtmlfile)
On Tue, Apr 28, 2009 at 10:41 AM, Dinesh B Vadhia
wrote:
> This is the error and traceback:
>
> Unexpected error opening J:/F2/html: mismatched tag: line 124, column 8
>
> Traceback (most recent call last):
> File "C:\py", line 492, in
> raw = extractText(xhtmlfile)
> File "C:\...
Found the mismatched tag on line 94:
"My Name in Nelma Lois Thornton-S.S. No. sjn-yz-yokv/p>"
should be:
"My Name in Nelma Lois Thornton-S.S. No. sjn-yz-yokv</p>"
I'll run all the html files through a simple script to identify the mismatches
using etree. Thanks.
Dinesh
From: Kent Johnson
Sen
"Dinesh B Vadhia" wrote
I'm processing tens of thousands of html files and a few of them contain
mismatched tags and ElementTree throws the error:
"Unexpected error opening J:/F2/663/blahblah.html: mismatched tag: line
124, column 8"
IMHO the best way to cleanse HTML files is to use tidy.
I
A.T.Hofkamp wrote:
> Dinesh B Vadhia wrote:
>> I'm processing tens of thousands of html files and a few of them
>> contain mismatched tags and ElementTree throws the error:
>>
>> "Unexpected error opening J:/F2/663/blahblah.html: mismatched tag:
>> line 124, column 8"
>>
>> I now want to scan each
I am getting information from .txt files and posting it into fields on a
web site. I need to break up long single strings into lines of around 80
characters, because when I enter the info into the form on the website, its
fields error out on such a long string.
Here is a samp
First, to grab the output of an external command, try:
import commands  # Python 2 only; Python 3 has subprocess.getoutput instead
USE = commands.getoutput('grep USE /tmp/comprookie2000/emege_info.txt | head -n1 | cut -d\\" -f2')
Then you can wrap the string:
import textwrap
Lines = textwrap.wrap(USE, 80) # returns a list
So in short:
import commands, textwrap
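The shell pipeline can also be replaced with plain Python, which avoids the Python-2-only `commands` module altogether. This is a rough sketch. It assumes the wanted value sits between the first two double quotes on the first line containing 'USE', which is what `grep USE | head -n1 | cut -d'"' -f2` extracts. The helpers `first_use_value` and `wrap_for_form` are hypothetical names. `textwrap.wrap` breaks only at whitespace, so a single word longer than the width is split by default.

```python
import textwrap


def first_use_value(lines):
    """Rough stand-in for: grep USE | head -n1 | cut -d'"' -f2.

    Returns the text between the first and second double quote on the first
    line containing 'USE', or None if no line contains 'USE'.
    """
    for line in lines:
        if "USE" in line:
            fields = line.split('"')
            # cut -f2 is 1-based, so field 2 is index 1; like cut without -s,
            # a line with no quote at all is returned whole.
            return fields[1] if len(fields) > 1 else line.rstrip("\n")
    return None


def wrap_for_form(text, width=80):
    """Break one long string into newline-separated lines of at most `width` chars."""
    return "\n".join(textwrap.wrap(text, width))
```

For example, `wrap_for_form(first_use_value(open(path)))` would read a file such as the one in the message and return text ready to paste into a form field.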
Stefan / Alan et al,
Thank you for all the advice and links. A simple script using etree is
scanning 500K+ xhtml files, and 2 files with mismatched tags have been found so
far, which can be fixed manually. I'll definitely look into "tidy" as it sounds
pretty cool. Because we are running data
vince spicer wrote:
first, grabbing output from an external command try:
import commands
USE = commands.getoutput('grep USE /tmp/comprookie2000/emege_info.txt |head -n1|cut -d\\"-f2')
then you can wrap strings,
import textwrap
Lines = textwrap.wrap(USE, 80) # return a list
so in short:
Hi,
following the example from
http://docs.python.org/3.0/howto/regex.html
If I execute the following code on the python shell (3.1a1):
>>> import re
>>> p = re.compile('ab*')
>>> p
I get the msg:
<_sre.SRE_Pattern object at 0x013A3440>
instead of the msg from the example:
Why do I get an SRE_
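What the shell prints for a compiled pattern is just its repr. That text has changed across Python versions: older interpreters show something like `<_sre.SRE_Pattern object at 0x...>`, and newer ones show `re.compile('ab*')`. The object behaves the same either way. This short sketch checks behaviour instead of the printed form.

```python
import re

p = re.compile('ab*')

print(repr(p))                   # exact text depends on the Python version
print(p.pattern)                 # the source string: ab*
print(p.match('abbbc').group())  # 'abbb': an 'a' followed by as many 'b's as possible
print(p.match('xyz'))            # None: match() only tries at the start of the string
```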
Emilio Casbas wrote:
Hi,
following the example from
http://docs.python.org/3.0/howto/regex.html
...from version 3.0 docs...
If I execute the following code on the python shell (3.1a1):
import re
p = re.compile('ab*')
p
I get the msg:
<_sre.SRE_Pattern object at 0x013A3440>
... is the
Emile van Sebille wrote:
Emilio Casbas wrote:
Hi,
following the example from
http://docs.python.org/3.0/howto/regex.html
...from version 3.0 docs...
If I execute the following code on the python shell (3.1a1):
import re
p = re.compile('ab*')
p
I get the msg:
<_sre.SRE_Pattern object at
Dinesh B Vadhia wrote:
A.T. / Marty
I'd prefer that the HTML parser didn't replace the missing tags, as I
want to know where and what the problems are. Also, the source HTML
documents were generated by another computer, i.e. they are not web page
documents.
If the source document was gener
David wrote:
> I am getting information from .txt files and posting them in fields on a
> web site. I need to break up single strings so they are around 80
> characters then a new line because when I enter the info to the form on
> the website it has fields and it errors out with such a long string
David wrote:
> vince spicer wrote:
>> first, grabbing output from an external command try:
>>
>> import commands
>>
>> USE = commands.getoutput('grep USE /tmp/comprookie2000/emege_info.txt |head -n1|cut -d\\"-f2')
>>
>> then you can wrap strings,
>>
>> import textwrap
>>
>> Lines = textwrap.wr
Hi there,
I am new to Python, and I've run into a problem:
I have an application named canola. It was written for Python 2.5 and runs
normally under Python 2.5,
but under Python 2.6 it fails with:
Traceback (most recent call last):
File "/usr/lib/python2.6/site-packages/