Re: How to extract certain set of lines from PDF

2013-03-19 Thread razinzamada
Thank you for your response. In case it's a Word file, how could I do this?
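If "Word file" means a .docx file, one stdlib-only option is to open it as a ZIP archive. A .docx is a ZIP package whose main body text normally lives in word/document.xml. The minimal sketch below makes several assumptions. It handles a .docx only, not the older binary .doc format. It reads only document.xml, so headers and footers are skipped. The function name docx_lines is hypothetical. The third-party python-docx package is a common alternative.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML, the XML dialect inside a .docx package.
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def docx_lines(path):
    """Return the text of each paragraph in a .docx file as a list of strings.

    Hypothetical helper: assumes a .docx (Office Open XML) file and reads
    only word/document.xml, so headers and footers are not included.
    """
    with zipfile.ZipFile(path) as package:
        root = ET.fromstring(package.read("word/document.xml"))
    lines = []
    # Each <w:p> element is one paragraph. Its visible text may be split
    # across several <w:t> runs, which are joined back together here.
    for para in root.iter(W_NS + "p"):
        text = "".join(node.text or "" for node in para.iter(W_NS + "t"))
        lines.append(text)
    return lines
```

The returned list can then be fed to any of the line-based approaches discussed in this thread, in place of the lines read from a text file.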

On Tuesday, March 19, 2013 7:16:00 PM UTC+5:30, Joel Goldstick wrote:
> On Tue, Mar 19, 2013 at 9:16 AM,   wrote:
> 
> Hello,
>
> I need to extract a certain set of lines from a PDF.
>
> Example:
>
> IF(..)
> ..
> ..
>    IF(.)
>    ...
>    ...
>    ENDIF
> ENDIF
>
> I need to copy the entire lines from the first "IF" to the last "ENDIF" and
> extract each such block to a separate row of an Excel sheet, whenever a new
> occurrence of this kind of IF loop is found.
>
> --
> http://mail.python.org/mailman/listinfo/python-list
>
> You might start with this: http://knowah.github.com/PyPDF2/
>
> I've never had to read PDF files, but it looks like there are several
> libraries to choose from.
>
> --
> Joel Goldstick
> http://joelgoldstick.com
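The PDF question quoted above becomes a depth-counting problem once the text has been pulled out of the PDF, for example with one of the libraries Joel mentions. The following is a sketch under several assumptions. It expects one source line per string. It treats a block as opening with a line that starts with "IF" and closing with a line that reads exactly "ENDIF". It writes a CSV file, which Excel opens directly, instead of a native .xlsx. The names outer_if_blocks and write_blocks_csv are hypothetical.

```python
import csv


def outer_if_blocks(lines):
    """Yield each outermost IF ... ENDIF block as a list of lines.

    A depth counter tracks nesting, so an inner IF/ENDIF pair stays inside
    the block of the outer IF that encloses it.
    """
    depth = 0
    block = []
    for raw in lines:
        line = raw.strip()
        if line.startswith("IF"):
            depth += 1
        if depth > 0:
            block.append(raw.rstrip("\n"))
        if line == "ENDIF" and depth > 0:
            depth -= 1
            if depth == 0:
                # The matching ENDIF of the outermost IF closes the block.
                yield block
                block = []


def write_blocks_csv(blocks, path):
    """Write one block per CSV row, keeping each whole block in one cell."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for block in blocks:
            writer.writerow(["\n".join(block)])
```

A stray ENDIF outside any block is ignored. An IF with no closing ENDIF is never emitted, which is one possible policy among several.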


Need help in extracting lines from word using python

2013-03-19 Thread razinzamada
I'm currently trying to extract some data between 2 lines of an input file
using Python. The infile is set up such that there is a line -START- where I
need the next 10 lines of code if and only if the -END- condition occurs before
the next -START-. The -START- line occurs many times before the -END-. Here's a
general example of what I mean:

blah
blah
-START-
10 lines I DONT need
blah
-START-
10 lines I need
blah
blah
-END-
blah
blah
-START-
10 lines I dont need
blah
-START-

 and so on and so forth

So far I have only been able to get the -START- + 10 lines for every iteration,
but I am at a total loss when it comes to specifying the condition to write only
if the -END- condition comes before another -START- condition. I'm a bit of a
newb, so any help will be greatly appreciated.

Here's the code I have for printing the -START- + 10 lines:

in = open('input.log')
out = open('output.txt', 'a')

lines = in.readlines()
for i, line in enumerate(lines):
    if (line.find('START')) > -1:
        out.write(line)
        out.write(lines[i + 1])
        out.write(lines[i + 2])
        out.write(lines[i + 3])
        out.write(lines[i + 4])
        out.write(lines[i + 5])
        out.write(lines[i + 6])
        out.write(lines[i + 7])
        out.write(lines[i + 8])
        out.write(lines[i + 9])
        out.write(lines[i + 10])
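The eleven write() calls above can be collapsed with a list slice. A slice also avoids an IndexError when a START line falls within the last ten lines of the file. This sketch keeps the original behaviour: every START is copied, whether or not an END follows. It renames `in`, which is a reserved word, and uses with-blocks so both files get closed. The function names and defaults are hypothetical.

```python
def start_plus_ten(lines, marker="START", count=10):
    """Return every line containing `marker` plus the `count` lines after it."""
    out = []
    for i, line in enumerate(lines):
        if marker in line:  # same test as line.find('START') > -1
            # Slicing never raises IndexError; near the end it returns fewer lines.
            out.extend(lines[i:i + count + 1])
    return out


def copy_start_blocks(in_path="input.log", out_path="output.txt"):
    # Same file names and append mode ('a') as the original snippet.
    with open(in_path) as infile, open(out_path, "a") as outfile:
        outfile.writelines(start_plus_ten(infile.readlines()))
```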


Re: Need help in extracting lines from word using python

2013-03-19 Thread razinzamada
Thanks, Steven.

On Tuesday, March 19, 2013 8:11:22 PM UTC+5:30, Steven D'Aprano wrote:
> On Tue, 19 Mar 2013 07:20:57 -0700, razinzamada wrote:
>
> > I'm currently trying to extract some data between 2 lines of an input
> > file using Python. The infile is set up such that there is a line
> > -START- where I need the next 10 lines of code if and only if the -END-
> > condition occurs before the next -START-. The -START- line occurs many
> > times before the -END-. Here's a general example of what I mean:
> >
> > blah
> > blah
> > -START-
> > 10 lines I DONT need
> > blah
> > -START-
> > 10 lines I need
> > blah
> > blah
> > -END-
> > blah
> > blah
> > -START-
> > 10 lines I dont need
> > blah
> > -START-
> >
> >  and so on and so forth
>
> [...]
>
> > Here's the code I have for printing the -START- + 10 lines:
> >
> > in = open('input.log')
>
> No, it is not. "in" is a reserved word in Python, so that code cannot
> possibly work; it will give a SyntaxError.
>
> Try this code. Untested, but it should do what you want.
>
> infile = open('input.log')
> outfile = open('output.txt', 'a')
>
> # Accumulate lines between START and END lines, ignoring everything else.
> collect = False  # Initially we start by ignoring lines.
> accum = []  # Defined up front so an -END- before any -START- is harmless.
> for line in infile:
>     if '-START-' in line:
>         # Ignore any lines already seen, and start collecting.
>         accum = []
>         collect = True
>     elif '-END-' in line:
>         # Write the first ten accumulated lines.
>         outfile.writelines(accum[:10])
>         # Clear the accumulated lines.
>         accum = []
>         # and stop collecting until the next START line
>         collect = False
>     elif collect:
>         accum.append(line)
>
> outfile.close()
> infile.close()
>
> --
> Steven
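Steven's loop can also be wrapped in a function that returns the blocks instead of writing them, which makes it easy to try on the sample from the question. This is a restatement of his approach, not his exact code, and blocks_closed_by_end is a hypothetical name. It differs from his loop in one respect: an -END- with no open -START- is skipped, instead of writing an empty list.

```python
def blocks_closed_by_end(lines, count=10):
    """Return the first `count` lines of every -START- ... -END- block.

    A -START- discards anything collected so far, so a -START- followed by
    another -START- (with no -END- in between) contributes nothing.
    """
    result = []
    accum = []
    collect = False
    for line in lines:
        if "-START-" in line:
            accum = []        # forget lines from an unfinished block
            collect = True
        elif "-END-" in line:
            if collect:
                result.append(accum[:count])
            accum = []
            collect = False   # ignore lines until the next -START-
        elif collect:
            accum.append(line)
    return result
```

To produce the output file, pass the open input file as `lines` and call outfile.writelines(block) for each block returned.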



Re: Need help in extracting lines from word using python

2013-03-19 Thread razinzamada
Thanks, Dave.

On Tuesday, March 19, 2013 8:24:24 PM UTC+5:30, Dave Angel wrote:
> On 03/19/2013 10:20 AM, razinzamada wrote:
> > I'm currently trying to extract some data between 2 lines of an input file
>
> Your subject line says "from word".  I'm only guessing that you might
> mean Microsoft Word, a proprietary program that does not, by default,
> save text files.  The following code and description assume a text
> file, so there's a contradiction.
>
> > using Python. The infile is set up such that there is a line -START- where
> > I need the next 10 lines of code if and only if the -END- condition occurs
> > before the next -START-. The -START- line occurs many times before the
> > -END-. Here's a general example of what I mean:
> >
>
> In other words, you want to scan for -END-, then go backwards to -START-
> and use the first ten of the lines between?  Try coding it that way, and
> perhaps it'll be easier.
>
> You also need to consider (and specify behavior for) the possibility
> that start and end are less than 10 lines apart.
>
> > blah
> > blah
> > -START-
> > 10 lines I DONT need
> > blah
> > -START-
> > 10 lines I need
> > blah
> > blah
> > -END-
> > blah
> > blah
> > -START-
> > 10 lines I dont need
> > blah
> > -START-
> >
> >  and so on and so forth
> >
> > So far I have only been able to get the -START- + 10 lines for every
> > iteration, but I am at a total loss when it comes to specifying the condition
> > to write only if the -END- condition comes before another -START-
> > condition. I'm a bit of a newb, so any help will be greatly appreciated.
> >
> > Here's the code I have for printing the -START- + 10 lines:
> >
> > in = open('input.log')
> > out = open('output.txt', 'a')
> >
> > lines = in.readlines()
> > for i, line in enumerate(lines):
> >     if (line.find('START')) > -1:
> >         out.write(line)
> >         out.write(lines[i + 1])
> >         out.write(lines[i + 2])
> >         out.write(lines[i + 3])
> >         out.write(lines[i + 4])
> >         out.write(lines[i + 5])
> >         out.write(lines[i + 6])
> >         out.write(lines[i + 7])
> >         out.write(lines[i + 8])
> >         out.write(lines[i + 9])
> >         out.write(lines[i + 10])
>
> Or just out.writelines(lines[i:i+11]) to write out all 11 of them.
>
> --
> DaveA
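Dave's suggestion (find each -END-, then walk backwards to the nearest -START-) can be sketched as follows. It needs the whole file as a list, for example from readlines(). The function name and the require_full flag are hypothetical. The flag picks one of two possible policies for blocks shorter than ten lines, a case the thread leaves open: return the short block, or skip it.

```python
def blocks_by_backward_scan(lines, count=10, require_full=False):
    """For each -END-, walk back to the nearest -START- and take up to
    `count` of the lines between them.

    The backward walk stops at a previous -END-, so an -END- with no
    -START- since the last -END- is ignored.
    """
    result = []
    for end, line in enumerate(lines):
        if "-END-" not in line:
            continue
        start = None
        # Walk backwards from the line just before this -END-.
        for j in range(end - 1, -1, -1):
            if "-START-" in lines[j]:
                start = j
                break
            if "-END-" in lines[j]:
                break  # reached the previous block without finding a START
        if start is None:
            continue
        body = lines[start + 1:end]
        if require_full and len(body) < count:
            continue  # policy choice: skip blocks shorter than `count`
        result.append(body[:count])
    return result
```

Each backward walk stops at the previous -END-, so no line is scanned backwards more than once and the total work stays linear in the length of the file.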
