Re: How to extract certain set of lines from PDF
Thank you for your response. If it is a Word file, how could I do this?

On Tuesday, March 19, 2013 7:16:00 PM UTC+5:30, Joel Goldstick wrote:
> On Tue, Mar 19, 2013 at 9:16 AM, wrote:
> > Hello,
> >
> > I need to extract certain set of lines from PDF
> > Ex:-
> > IF(..)
> > ..
> > ..
> > IF(.)
> > ...
> > ...
> > ENDIF
> > ENDIF
> >
> > I need to copy entire lines from first "IF" till last "ENDIF" and extract
> > them to a separate row of an Excel sheet whenever a new occurrence of this
> > kind of IF block is found.
> > --
> > http://mail.python.org/mailman/listinfo/python-list
>
> You might start with this: http://knowah.github.com/PyPDF2/
>
> I've never had to read pdf files, but it looks like there are several
> libraries to choose from
>
> --
> Joel Goldstick
> http://joelgoldstick.com
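Whichever library ends up reading the PDF or Word file, the IF ... ENDIF part of the question can be handled on plain text lines. The following is only a minimal sketch under stated assumptions: the text has already been extracted into a list of lines, an IF block starts on a line beginning with "IF" and closes on a line beginning with "ENDIF", and a CSV file is an acceptable stand-in for an Excel sheet. The names extract_if_blocks and blocks_to_csv are hypothetical, not part of any library.

```python
import csv
import io


def extract_if_blocks(lines):
    """Return each outermost IF ... ENDIF block as a list of lines."""
    blocks = []
    current = []
    depth = 0  # number of IFs currently open
    for line in lines:
        token = line.strip().upper()
        if token.startswith("ENDIF"):
            if depth > 0:
                current.append(line)
                depth -= 1
                if depth == 0:
                    # The outermost IF just closed: the block is complete.
                    blocks.append(current)
                    current = []
        elif token.startswith("IF"):
            # Assumption: any line starting with "IF" opens a block.
            depth += 1
            current.append(line)
        elif depth > 0:
            # Ordinary line inside an open block.
            current.append(line)
    return blocks


def blocks_to_csv(blocks):
    """Return CSV text with one block per row (Excel can open CSV files)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer)
    for block in blocks:
        writer.writerow(["\n".join(line.rstrip("\n") for line in block)])
    return buffer.getvalue()
```

The depth counter is what makes nested IFs work: a block is only emitted when the counter returns to zero, so an inner ENDIF does not end the outer block. Lines outside any IF block are ignored, and an IF that is never closed is silently dropped.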
Need help in extracting lines from word using python
I'm currently trying to extract some data between two lines of an input file
using Python. The input file is set up such that there is a line -START- where I
need the next 10 lines if and only if the -END- condition occurs before
the next -START-. The -START- line occurs many times before the -END-. Here's a
general example of what I mean:
blah
blah
-START-
10 lines I DONT need
blah
-START-
10 lines I need
blah
blah
-END-
blah
blah
-START-
10 lines I dont need
blah
-START-
and so on and so forth
So far I have only been able to get the -START- + 10 lines for every iteration,
but I am at a total loss when it comes to specifying the condition to only write
if the -END- condition comes before another -START- condition. I'm a bit of a
newb, so any help will be greatly appreciated.
Here's the code I have for printing the -START- + 10 lines:
in = open('input.log')
out = open('output.txt', 'a')
lines = in.readlines()
for i, line in enumerate(lines):
    if (line.find('START')) > -1:
        out.write(line)
        out.write(lines[i + 1])
        out.write(lines[i + 2])
        out.write(lines[i + 3])
        out.write(lines[i + 4])
        out.write(lines[i + 5])
        out.write(lines[i + 6])
        out.write(lines[i + 7])
        out.write(lines[i + 8])
        out.write(lines[i + 9])
        out.write(lines[i + 10])
Re: Need help in extracting lines from word using python
Thanks, Steven.
On Tuesday, March 19, 2013 8:11:22 PM UTC+5:30, Steven D'Aprano wrote:
> On Tue, 19 Mar 2013 07:20:57 -0700, razinzamada wrote:
>
> > I'm currently trying to extract some data between 2 lines of an input
> > file using Python. the infile is set up such that there is a line
> > -START- where I need the next 10 lines of code if and only if the -END-
> > condition occurs before the next -START-. The -START- line occurs many
> > times before the -END-. Heres a general example of what I mean:
> >
> > blah
> > blah
> > -START-
> > 10 lines I DONT need
> > blah
> > -START-
> > 10 lines I need
> > blah
> > blah
> > -END-
> > blah
> > blah
> > -START-
> > 10 lines I dont need
> > blah
> > -START-
> >
> > and so on and so forth
>
> [...]
>
> > heres the code I have for printing the -START- + 10 lines:
> >
> > in = open('input.log')
>
> No it is not. "in" is a reserved word in Python, that code cannot
> possibly work, it will give a SyntaxError.
>
> Try this code. Untested but it should do what you want.
>
> infile = open('input.log')
> outfile = open('output.txt', 'a')
> # Accumulate lines between START and END lines, ignoring everything else.
> accum = []  # Defined up front so an END before any START is harmless.
> collect = False # Initially we start by ignoring lines.
> for line in infile:
>     if '-START-' in line:
>         # Ignore any lines already seen, and start collecting.
>         accum = []
>         collect = True
>     elif '-END-' in line:
>         # Write the first ten accumulated lines.
>         outfile.writelines(accum[:10])
>         # Clear the accumulated lines.
>         accum = []
>         # and stop collecting until the next START line
>         collect = False
>     elif collect:
>         accum.append(line)
>
> outfile.close()
> infile.close()
>
> --
> Steven
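
The same START/END state machine can also be written as a generator, which keeps the file handling separate from the matching logic and makes it easy to try on a short list of lines. This is a minimal sketch, not Steven's code: the function name first_lines_before_end and its count, start and end parameters are hypothetical, and markers are matched by simple substring search as in the code above.

```python
def first_lines_before_end(lines, count=10, start="-START-", end="-END-"):
    """Yield up to `count` lines that follow the last START before each END."""
    accum = []          # lines seen since the most recent START
    collecting = False  # True only between a START and the next END
    for line in lines:
        if start in line:
            # A new START discards whatever an earlier START collected.
            accum = []
            collecting = True
        elif end in line:
            # END reached: emit the first `count` collected lines.
            yield from accum[:count]
            accum = []
            collecting = False
        elif collecting:
            accum.append(line)
```

To use it on real files, something like `with open('input.log') as infile, open('output.txt', 'a') as outfile: outfile.writelines(first_lines_before_end(infile))` should work, and the `with` statement closes both files even if an error occurs. Lines after a final START with no matching END are never written.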
Re: Need help in extracting lines from word using python
Thanks, Dave.

On Tuesday, March 19, 2013 8:24:24 PM UTC+5:30, Dave Angel wrote:
> On 03/19/2013 10:20 AM, [email protected] wrote:
> > I'm currently trying to extract some data between 2 lines of an input file
>
> Your subject line says "from word". I'm only guessing that you might
> mean Microsoft Word, a proprietary program that does not, by default,
> save text files. The following code and description assumes a text
> file, so there's a contradiction.
>
> > using Python. the infile is set up such that there is a line -START- where
> > I need the next 10 lines of code if and only if the -END- condition occurs
> > before the next -START-. The -START- line occurs many times before the
> > -END-. Heres a general example of what I mean:
>
> In other words, you want to scan for -END-, then go backwards to -START-
> and use the first ten of the lines between? Try coding it that way, and
> perhaps it'll be easier.
>
> You also need to consider (and specify behavior for) the possibility
> that start and end are less than 10 lines apart.
>
> > blah
> > blah
> > -START-
> > 10 lines I DONT need
> > blah
> > -START-
> > 10 lines I need
> > blah
> > blah
> > -END-
> > blah
> > blah
> > -START-
> > 10 lines I dont need
> > blah
> > -START-
> >
> > and so on and so forth
> >
> > so far I have only been able to get the -START- + 10 lines for every
> > iteration, but am at a total loss when it comes to specifying the condition
> > to only write if the -END- condition comes before another -START-
> > condition. I'm a bit of a newb, so any help will be greatly appreciated.
> >
> > heres the code I have for printing the -START- + 10 lines:
> >
> > in = open('input.log')
> > out = open('output.txt', 'a')
> >
> > lines = in.readlines()
> > for i, line in enumerate(lines):
> >     if (line.find('START')) > -1:
> >         out.write(line)
> >         out.write(lines[i + 1])
> >         out.write(lines[i + 2])
> >         out.write(lines[i + 3])
> >         out.write(lines[i + 4])
> >         out.write(lines[i + 5])
> >         out.write(lines[i + 6])
> >         out.write(lines[i + 7])
> >         out.write(lines[i + 8])
> >         out.write(lines[i + 9])
> >         out.write(lines[i + 10])
>
> or just out.writelines(lines[i:i+11]) to write out all 11 of them.
>
> --
> DaveA
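
Dave's suggestion of scanning for -END- and then walking backwards to the nearest -START- could be sketched roughly as follows. This is an illustration under assumptions, not code from the thread: the whole file is assumed to fit in memory as a list of lines, the function name lines_before_each_end and its parameters are hypothetical, and the behaviour chosen for the "less than 10 lines apart" case is simply to return however many lines there are.

```python
def lines_before_each_end(lines, count=10, start="-START-", end="-END-"):
    """For each END, walk back to the nearest START and keep what follows it."""
    results = []
    for end_index, line in enumerate(lines):
        if end not in line:
            continue
        # Walk backwards from the END line looking for the nearest START.
        start_index = end_index - 1
        while start_index >= 0 and start not in lines[start_index]:
            if end in lines[start_index]:
                # An earlier END comes first, so this END has no START of its own.
                start_index = -1
                break
            start_index -= 1
        if start_index < 0:
            continue
        # Lines strictly between START and END; the slice copes with short blocks.
        block = lines[start_index + 1:end_index]
        results.append(block[:count])
    return results
```

Each entry in the returned list is one block, so the caller can decide whether to write the blocks back to back or separate them. Because slicing never raises on a short list, a block with fewer than `count` lines between START and END is returned as-is rather than causing an IndexError, which is one possible answer to the question of what to do when the markers are close together.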
