Data Manipulation - Rows to Columns
Hello All,
I have a text file with marked-up data that I need to convert into a
tab-separated text file. The structure of the input file is shown below
(see file 1), followed by the desired output file (see file 2).
I am a complete novice with Python and would appreciate any tips you may
be able to provide.
Best,
Tess
file 1:
TABLE black blue red CHAIR yellow black red TABLE white gray pink
file 2 (tab separated):
TABLE black bluered CHAIR yellow black red TABLE white graypink
--
http://mail.python.org/mailman/listinfo/python-list
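The markup of file 1 is not visible in the message as archived, so the following is only a rough sketch of the row-building step, not a parser for the real input. It assumes the values arrive as whitespace-separated tokens and that an all-uppercase token (TABLE, CHAIR) starts a new output row; the function names `tokens_to_rows` and `to_tsv` are made up for illustration.

```python
def tokens_to_rows(text):
    """Group whitespace-separated tokens into rows.

    Assumption: an all-uppercase token (e.g. TABLE) starts a new row and
    every following token belongs to that row.
    """
    rows = []
    for token in text.split():
        if token.isupper() or not rows:
            rows.append([token])      # start a new row
        else:
            rows[-1].append(token)    # extend the current row
    return rows


def to_tsv(rows):
    """Join each row with tabs and the rows with newlines."""
    return "\n".join("\t".join(row) for row in rows)


if __name__ == "__main__":
    sample = "TABLE black blue red CHAIR yellow black red TABLE white gray pink"
    print(to_tsv(tokens_to_rows(sample)))
```

With real marked-up input, the tokenizing step would be replaced by whatever extracts the values from the markup; the grouping and tab-joining would stay the same.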
Beautiful Soup Looping Extraction Question
Hello All,
I have a Beautiful Soup question and I'd appreciate any guidance the
forum can provide.
Let's say I have a file that looks like file.html, pasted below.
My goal is to extract all elements where the following is true:
<p align="left"> and <div align="center">.
The lines should be ordered in the same order as they appear in the
file - therefore the output file would look like output.txt below.
I experimented with something similar to this code:
for i in soup.findAll('p', align="left"):
    print i
for i in soup.findAll('p', align="center"):
    print i
I get something like this:
P4
P3
P1
div4b
div3b
div2b
div2a
Any guidance would be greatly appreciated.
Best,
Ira
##begin: file.html
P1
P2
div2a
div2b
P3
div3a
div3b
div3c
P4
div4a
div4b
##end: file.html
===begin: output.txt===
P1
div2a
div2b
P3
div3b
P4
div4b
===end: output.txt===
Re: Beautiful Soup Looping Extraction Question
Paul - thanks for the input, it's interesting to see how pyparsing
handles it.
Anyhow, a simple regex took care of the issue in BS:
for i in soup.findAll(re.compile('^p|^div'), align=re.compile('^center|^left')):
    print i
Thanks again!
T
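It may be worth checking which tag/align pairs the two patterns accept together. The sketch below uses only the `re` module and assumes the patterns are applied separately to the tag name and to the align value; the helper name `accepted` is made up for illustration.

```python
import re
from itertools import product

# The two patterns from the findAll call above.
tag_pattern = re.compile('^p|^div')
align_pattern = re.compile('^center|^left')


def accepted(tag, align):
    """Return True if the tag name and the align value both match."""
    return bool(tag_pattern.search(tag)) and bool(align_pattern.search(align))


# Every combination of the two tags and the two align values is accepted.
pairs = [pair for pair in product(['p', 'div'], ['left', 'center'])
         if accepted(*pair)]
print(pairs)

# '^p' only anchors the start, so longer tag names such as 'pre' also pass.
print(accepted('pre', 'left'))
```

So if only some of the pairs are wanted, the patterns alone will not tell them apart, and tag names that merely start with "p" or "div" get through as well.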
Re: Beautiful Soup Looping Extraction Question
Paul - you are very right. I am back to the drawing board.
Tess
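One way to keep document order while matching exact tag/align pairs is to walk the file once and test each element against a set of wanted pairs. The sketch below uses the standard library's `html.parser` rather than Beautiful Soup so that it runs without extra packages. The sample HTML is hypothetical: the tags and align values are assumptions, chosen only so that the labels line up with file.html and output.txt above. The pairs in `WANTED` are likewise an assumption about which elements are wanted.

```python
from html.parser import HTMLParser

# Hypothetical sample: tags and align values are assumed, not taken from
# the original file.html.
SAMPLE_HTML = """
<p align="left">P1</p>
<p align="center">P2</p>
<div align="center">div2a</div>
<div align="center">div2b</div>
<p align="left">P3</p>
<div align="left">div3a</div>
<div align="center">div3b</div>
<div align="right">div3c</div>
<p align="left">P4</p>
<div align="left">div4a</div>
<div align="center">div4b</div>
"""

# Exact (tag, align) pairs to keep (assumed).
WANTED = {("p", "left"), ("div", "center")}


class AlignedTextCollector(HTMLParser):
    """Collect the text of wanted elements in document order.

    Handles flat, non-nested elements only.
    """

    def __init__(self):
        super().__init__()
        self.results = []        # text of each wanted element, in order
        self._capturing = None   # tag name being captured, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        align = dict(attrs).get("align")
        if self._capturing is None and (tag, align) in WANTED:
            self._capturing = tag
            self._buffer = []

    def handle_data(self, data):
        if self._capturing is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._capturing:
            self.results.append("".join(self._buffer).strip())
            self._capturing = None


def extract_in_order(html):
    parser = AlignedTextCollector()
    parser.feed(html)
    parser.close()
    return parser.results


if __name__ == "__main__":
    for text in extract_in_order(SAMPLE_HTML):
        print(text)
```

Because the check is a set lookup on the pair, a p with align="center" or a div with align="left" is skipped, and the results come out in the order the elements appear in the file.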
