Reading files, splitting on a delimiter and newlines.

2007-07-25 Thread chrispwd
Hello,

I have a situation where I have a file that contains text similar to:

myValue1 = contents of value1
myValue2 = contents of value2 but
with a new line here
myValue3 = contents of value3

My first approach was to open the file, read it with readlines, and
split each line on the "=" delimiter into a key/value pair (to be
stored in a dict).

After processing a couple of files I noticed it's possible for a
newline to be present in a value, as shown in myValue2.
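A minimal Python 3 sketch of that first approach shows where it breaks. It assumes the file contents are already in a string, and `naive_parse` and `SAMPLE` are illustrative names, not part of the original code. The continuation line contains no "=", so unpacking the result of split() fails with a ValueError.

```python
SAMPLE = """\
myValue1 = contents of value1
myValue2 = contents of value2 but
with a new line here
myValue3 = contents of value3
"""


def naive_parse(text):
    """Split every line on the first '=' and store the key/value pair."""
    record = {}
    for line in text.splitlines():
        # Unpacking fails when the line has no '=' (a continuation line).
        key, value = line.split("=", 1)
        record[key.strip()] = value.strip()
    return record


try:
    naive_parse(SAMPLE)
except ValueError as exc:
    print("naive parse failed:", exc)
```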

In this case it's not an option to simply remove the newlines from a
"multi-line" value, as the value data needs to stay intact.

I'm a bit confused as to how to go about getting this to work.

Any suggestions on an approach would be greatly appreciated!

-- 
http://mail.python.org/mailman/listinfo/python-list


Re: Reading files, splitting on a delimiter and newlines.

2007-07-25 Thread chrispwd
On Jul 25, 7:56 pm, "[EMAIL PROTECTED]"
<[EMAIL PROTECTED]> wrote:
> On Jul 25, 8:46 am, [EMAIL PROTECTED] wrote:
>
> > Hello,
>
> > I have a situation where I have a file that contains text similar to:
>
> > myValue1 = contents of value1
> > myValue2 = contents of value2 but
> > with a new line here
> > myValue3 = contents of value3
>
> > My first approach was to open the file, read it with readlines, and
> > split each line on the "=" delimiter into a key/value pair (to be
> > stored in a dict).
>
> > After processing a couple of files I noticed it's possible for a
> > newline to be present in a value, as shown in myValue2.
>
> > In this case it's not an option to simply remove the newlines from a
> > "multi-line" value, as the value data needs to stay intact.
>
> > I'm a bit confused as to how to go about getting this to work.
>
> > Any suggestions on an approach would be greatly appreciated!
>
> Check the length of the list returned from split; this allows
> you to append to the previously extracted value if need be.
>
> import StringIO
> import pprint
>
> buf = """\
> myValue1 = contents of value1
> myValue2 = contents of value2 but
> with a new line here
> myValue3 = contents of value3
> """
>
> mockfile = StringIO.StringIO(buf)
>
> record = dict()
>
> for line in mockfile:
>     # split on the first '=' only, so an '=' inside a value is kept
>     kvpair = line.split('=', 1)
>     if len(kvpair) == 2:
>         key, value = kvpair
>         record[key] = value
>     else:
>         # no '=' on this line: it continues the previous value
>         record[key] += line
>
> pprint.pprint(record)
>
> # strip() keys and values to remove surrounding spaces and newlines if needed ...
>
> --
> Hope this helps,
> Steven

Great, thank you! That was the logic I was looking for.
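For Python 3, where the StringIO module became io.StringIO, the same append-to-the-previous-key logic can be sketched as below. This is an adaptation, not the code from the reply: the function name `parse_records` is hypothetical, keys are stripped, only the final newline of each value is removed, and a continuation line that appears before any key raises a ValueError instead of a NameError.

```python
import io
import pprint


def parse_records(fileobj):
    """Parse 'key = value' lines; a line without '=' continues the previous value."""
    record = {}
    key = None
    for line in fileobj:
        kvpair = line.split("=", 1)  # split on the first '=' only
        if len(kvpair) == 2:
            key = kvpair[0].strip()
            record[key] = kvpair[1].lstrip()  # drop the space after '='
        elif key is None:
            raise ValueError("continuation line before any key: %r" % line)
        else:
            record[key] += line  # keep the embedded newline intact
    # Remove only the final newline of each value; inner newlines survive.
    return {k: v.rstrip("\n") for k, v in record.items()}


buf = """\
myValue1 = contents of value1
myValue2 = contents of value2 but
with a new line here
myValue3 = contents of value3
"""

pprint.pprint(parse_records(io.StringIO(buf)))
```

Note that this format is inherently ambiguous: a continuation line that itself contains "=" would be read as a new key, so the sketch only works if such lines cannot occur in the input.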



Buffering HTML as HTMLParser reads it?

2007-08-01 Thread chrispwd
Hello,

I am working on a project where I'm using Python to parse HTML pages,
transforming the data between certain tags. Currently the HTMLParser
class is being used for this. In a nutshell, it's pretty simple: I feed
the contents of the HTML page to HTMLParser and override the
appropriate handle_ method to process the extracted data. In that
method, I take the data found and transform it into another string
based on some logic.
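The setup described above might look roughly like this in Python 3, where the module is html.parser rather than HTMLParser. The `transform` function is a hypothetical stand-in for the real logic. Because handle_data only receives the text between tags, the markup itself is never collected, which is exactly the problem raised next.

```python
from html.parser import HTMLParser


def transform(text):
    """Hypothetical stand-in for the real string transformation."""
    return text.upper()


class DataTransformer(HTMLParser):
    """Collects the transformed text found between tags."""

    def __init__(self):
        super().__init__()
        self.results = []

    def handle_data(self, data):
        # Called with the text between tags; the tags themselves go elsewhere.
        self.results.append(transform(data))


parser = DataTransformer()
parser.feed("<p>hello <b>world</b></p>")
parser.close()
print(parser.results)
```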

Now, what I would like to do here is take that transformed string and
put it "back into" the HTML document. Has anybody ever implemented
something like this with HTMLParser?

I'm thinking: could I somehow have HTMLParser append everything it
reads, except the data inside tags, to some kind of buffer? That way
the HTML contents would be read into the buffer, and in my own handle_
overrides I could append the transformed data to that same buffer.
Once the page has been parsed, I would ideally be able to print the
contents of the buffer and get HTML identical to the original except
for the string transformations.

I also need to make sure that all newlines, tags, spacing, etc. are
kept intact; this is a requirement for other reasons.
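One way to get that buffer is to override every HTMLParser callback and write each piece back out, transforming only the text data. This is a sketch for Python 3's html.parser, with a hypothetical `transform` function and class name. Start tags are copied verbatim via get_starttag_text(), and convert_charrefs=False keeps entities out of handle_data so they can be re-emitted. The output is close to, but not always byte-identical with, the input: end tags are rebuilt in lowercase, and an entity written without its trailing ";" gets one.

```python
from html.parser import HTMLParser


def transform(text):
    """Hypothetical stand-in for the real string transformation."""
    return text.upper()


class EchoingTransformer(HTMLParser):
    """Re-emits all markup it sees, transforming only the text between tags."""

    def __init__(self):
        # convert_charrefs=False sends &amp; etc. to handle_entityref/charref
        # instead of decoding them into handle_data.
        super().__init__(convert_charrefs=False)
        self.buffer = []
        self.in_raw_text = False  # inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        # Raw text of the tag: original case, quoting and whitespace kept.
        self.buffer.append(self.get_starttag_text())
        if tag in ("script", "style"):
            self.in_raw_text = True

    def handle_startendtag(self, tag, attrs):
        self.buffer.append(self.get_starttag_text())  # e.g. <br/>

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.in_raw_text = False
        # The raw end tag is not available, so it is rebuilt (lowercased).
        self.buffer.append("</%s>" % tag)

    def handle_data(self, data):
        # Leave script/style bodies alone; transform ordinary text.
        self.buffer.append(data if self.in_raw_text else transform(data))

    def handle_entityref(self, name):
        self.buffer.append("&%s;" % name)

    def handle_charref(self, name):
        self.buffer.append("&#%s;" % name)

    def handle_comment(self, data):
        self.buffer.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.buffer.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.buffer.append("<?%s>" % data)

    def unknown_decl(self, data):
        self.buffer.append("<![%s]>" % data)

    def output(self):
        return "".join(self.buffer)


page = '<!DOCTYPE html>\n<P class="x">Fish &amp; chips<br/>\n<!-- note --></P>\n'
parser = EchoingTransformer()
parser.feed(page)
parser.close()
print(parser.output())
```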

Thanks!



Re: Buffering HTML as HTMLParser reads it?

2007-08-05 Thread chrispwd
On Aug 1, 4:08 pm, Paul McGuire <[EMAIL PROTECTED]> wrote:
> On Aug 1, 1:31 pm, [EMAIL PROTECTED] wrote:
> 
> > I'm thinking: could I somehow have HTMLParser append everything it
> > reads, except the data inside tags, to some kind of buffer? That way
> > the HTML contents would be read into the buffer, and in my own
> > handle_ overrides I could append the transformed data to that same
> > buffer. Once the page has been parsed, I would ideally be able to
> > print the contents of the buffer and get HTML identical to the
> > original except for the string transformations.
>
> > I also need to make sure that all newlines, tags, spacing, etc. are
> > kept intact; this is a requirement for other reasons.
>
> > Thanks!
>
> What you describe is almost exactly how pyparsing implements
> transformString.  See below:
>
> from pyparsing import *
>
> boldStart,boldEnd = makeHTMLTags("B")
>
> # convert the <B> and </B> tags into their replacement text
> boldStart.setParseAction(replaceWith(''))
> boldEnd.setParseAction(replaceWith(''))
> converter = boldStart | boldEnd
>
> html = "<B>Display this in bold</B>"
> print converter.transformString(html)
>
> Prints:
>
> Display this in bold
>
> All text not matched by a pattern in the converter is left as-is.  (My
> CSS style/form may not be up to date, but I hope you get the idea.)
>
> -- Paul
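The idea behind transformString (scan the input, replace whatever matches a pattern, pass everything else through unchanged) can be approximated with the standard library's re.sub and a replacement function. This is a swapped-in technique, not pyparsing; the `<span class="bold">` replacement is hypothetical, and a regular expression does not understand comments or script blocks the way an HTML parser does.

```python
import re

# Matches <B>, <b class="x">, </B>, ... but not <body> or <br>.
BOLD_TAG = re.compile(r"<(/?)b\b[^>]*>", re.IGNORECASE)


def replace_bold(match):
    """Hypothetical replacements: opening tag -> <span class="bold">, closing -> </span>."""
    return "</span>" if match.group(1) else '<span class="bold">'


html = "<B>Display this in bold</B>"
# Text that does not match the pattern is copied to the result unchanged.
print(BOLD_TAG.sub(replace_bold, html))
```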

Hello,

Sorry for the delay in replying, and thank you for the info. However, I
think either I am misunderstanding your post or it's not the solution
I'm looking for.

How does this fit into what I'm looking to do with HTMLParser?

Thanks!
