Reading files, splitting on a delimiter and newlines.
Hello,

I have a situation where I have a file that contains text similar to:

myValue1 = contents of value1
myValue2 = contents of value2 but
with a new line here
myValue3 = contents of value3

My first approach was to open the file and use readlines to split each line on the "=" delimiter into a key/value pair (to be stored in a dict).

After processing a couple of files I noticed it's possible for a newline to be present in the value, as shown in myValue2.

In this case it's not an option to simply remove the newlines from a "multi-line" value, as the value data needs to stay intact.

I'm a bit confused as to how to go about getting this to work.

Any suggestions on an approach would be greatly appreciated!

--
http://mail.python.org/mailman/listinfo/python-list
Re: Reading files, splitting on a delimiter and newlines.
On Jul 25, 7:56 pm, "[EMAIL PROTECTED]"
<[EMAIL PROTECTED]> wrote:
> On Jul 25, 8:46 am, [EMAIL PROTECTED] wrote:
>
>
>
> > Hello,
>
> > I have a situation where I have a file that contains text similar to:
>
> > myValue1 = contents of value1
> > myValue2 = contents of value2 but
> > with a new line here
> > myValue3 = contents of value3
>
> > My first approach was to open the file, use readlines to split the
> > lines on the "=" delimiter into a key/value pair (to be stored in a
> > dict).
>
> > After processing a couple of files I noticed it's possible that a newline
> > can be present in the value as shown in myValue2.
>
> > In this case it's not an option to say remove the newlines if it's a
> > "multi line" value as the value data needs to stay intact.
>
> > I'm a bit confused as to how to go about getting this to work.
>
> > Any suggestions on an approach would be greatly appreciated!
>
> Check the length of the list returned from split; this allows
> you to append to the previously extracted value if need be.
>
> import io
> import pprint
>
> buf = """\
> myValue1 = contents of value1
> myValue2 = contents of value2 but
> with a new line here
> myValue3 = contents of value3
> """
>
> mockfile = io.StringIO(buf)
>
> record = dict()
>
> for line in mockfile:
>     # maxsplit=1 so that any '=' inside a value is left alone
>     kvpair = line.split('=', 1)
>     if len(kvpair) == 2:
>         key, value = kvpair
>         record[key] = value
>     else:
>         # no '=' on this line: it continues the previous value
>         record[key] += line
>
> pprint.pprint(record)
>
> # strip() keys, and rstrip() values to remove trailing newlines, if needed ...
>
> --
> Hope this helps,
> Steven
Great, thank you! That was the logic I was looking for.
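
For reference, the suggestion above could be wrapped up roughly like this in Python 3. This is only a sketch under a few assumptions: parse_record is a hypothetical helper name, any line without "=" is treated as a continuation of the previous value, keys have surrounding whitespace stripped, and values are kept verbatim (including the text right after "=" and every newline).

```python
import io


def parse_record(lines):
    """Collect 'key = value' lines into a dict, appending continuation lines.

    Assumption: a line without '=' continues the previous value, so a
    continuation line that itself contains '=' would be misread as a new key.
    """
    record = {}
    key = None
    for line in lines:
        parts = line.split("=", 1)  # split on the first '=' only
        if len(parts) == 2:
            key = parts[0].strip()
            record[key] = parts[1]
        elif key is not None:
            record[key] += line  # keep the embedded newline intact
        # lines before the first key are ignored in this sketch
    return record


sample = io.StringIO(
    "myValue1 = contents of value1\n"
    "myValue2 = contents of value2 but\n"
    "with a new line here\n"
    "myValue3 = contents of value3\n"
)
record = parse_record(sample)
print(record["myValue2"])
```

Because the function only iterates over lines, an open file object can be passed in place of the StringIO sample. If continuation lines may themselves contain "=", the file format would need some other way to mark where a new key starts.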
Buffering HTML as HTMLParser reads it?
Hello,

I am working on a project where I'm using Python to parse HTML pages, transforming data between certain tags. Currently the HTMLParser class is being used for this.

In a nutshell, it's pretty simple: I'm feeding the contents of the HTML page to HTMLParser, then I am overriding the appropriate handle_ method to handle the extracted data. In that method, I take the found data and transform it into another string based on some logic.

Now, what I would like to do here is take that transformed string and put it "back into" the HTML document. Has anybody ever implemented something like this with HTMLParser?

I'm thinking maybe somehow have HTMLParser append each character it reads, except for data inside tags, to some kind of buffer? This way I can have the HTML contents read into a buffer, then when I do my own handle_ overrides, I can also append to that buffer with the transformed data. Once the HTML page is finished parsing, ideally I would be able to print the contents of the buffer and the HTML would be identical except for the string transformations.

I also need to make sure that all newlines, tags, spacing, etc. are kept intact; this part is a requirement for other reasons.

Thanks!
Re: Buffering HTML as HTMLParser reads it?
On Aug 1, 4:08 pm, Paul McGuire <[EMAIL PROTECTED]> wrote:
> On Aug 1, 1:31 pm, [EMAIL PROTECTED] wrote:
>
>
>
>
> > I'm thinking maybe somehow have HTMLParser append each character it
> > reads except for data inside tags in some kind of buffer? This way I
> > can have the HTML contents read into a buffer, then when I do my own
> > handle_ overrides, I can also append to that buffer with the
> > transformed data. Once the HTML page is finished parsing, ideally I
> > would be able to print the contents of the buffer and the HTML would
> > be identical except for the string transformations.
>
> > I also need to make sure that all newlines, tags, spacing, etc. are
> > kept intact -- this part is a requirement for other reasons.
>
> > Thanks!
>
> What you describe is almost exactly how pyparsing implements
> transformString. See below:
>
> from pyparsing import *
>
> boldStart,boldEnd = makeHTMLTags("B")
>
> # convert <B> to <span style="font-weight: bold"> and </B> to </span>
> boldStart.setParseAction(replaceWith('<span style="font-weight: bold">'))
> boldEnd.setParseAction(replaceWith('</span>'))
> converter = boldStart | boldEnd
>
> html = "Display this in <b>bold</b>"
> print(converter.transformString(html))
>
> Prints:
>
> Display this in <span style="font-weight: bold">bold</span>
>
> All text not matched by a pattern in the converter is left as-is. (My
> CSS style/form may not be up to date, but I hope you get the idea.)
>
> -- Paul
Hello,
Sorry for the delay in replying, and thank you for the info. However, I
think either I am misunderstanding your post or it's not the solution
I'm looking for.
How does this fit into what I'm looking to do with HTMLParser?
Thanks!
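
For what it's worth, the buffering idea described in this thread can be tried directly with the standard library's HTMLParser. The following is a rough Python 3 sketch, not a tested solution for arbitrary pages: EchoingParser, transform and result are hypothetical names, and it assumes that only text nodes need transforming. Start tags are copied verbatim via get_starttag_text(), but end tags, comments and declarations are rebuilt from what the parser reports, so unusual markup (for example an end tag written in upper case, or an entity without a trailing semicolon) may not round-trip exactly.

```python
from html.parser import HTMLParser


class EchoingParser(HTMLParser):
    """Rebuild the document in a buffer while transforming text nodes."""

    def __init__(self, transform):
        # convert_charrefs=False keeps entities such as &amp; as written
        super().__init__(convert_charrefs=False)
        self.transform = transform
        self.buffer = []

    def handle_starttag(self, tag, attrs):
        # get_starttag_text() returns the start tag exactly as it appeared
        self.buffer.append(self.get_starttag_text())

    def handle_startendtag(self, tag, attrs):
        self.buffer.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        # the raw end tag is not exposed, so its original case is lost
        self.buffer.append("</%s>" % tag)

    def handle_data(self, data):
        # the only place where the document is changed
        self.buffer.append(self.transform(data))

    def handle_entityref(self, name):
        self.buffer.append("&%s;" % name)

    def handle_charref(self, name):
        self.buffer.append("&#%s;" % name)

    def handle_comment(self, data):
        self.buffer.append("<!--%s-->" % data)

    def handle_decl(self, decl):
        self.buffer.append("<!%s>" % decl)

    def handle_pi(self, data):
        self.buffer.append("<?%s>" % data)

    def result(self):
        return "".join(self.buffer)


source = '<p class="x">Hello,\n  <b>world</b> &amp; more</p>\n'
parser = EchoingParser(str.upper)
parser.feed(source)
parser.close()
rebuilt = parser.result()
print(rebuilt)
```

Passing a transform that returns its argument unchanged is a quick way to check how faithfully a given page survives the round trip before any real transformation is applied.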
