On 02/01/2013 03:09 PM, Scurvy Scott wrote:
Hey all how're things?
I'm hoping for some guidance on a problem I'm trying to work through.
I know this has been previously covered on this list but I'm hoping it
won't bother you guys to run through it again.
The basic program I'm attempting to create is like this:
I want to read from a large, very large file.
I want to find a certain string.
If it finds the string, I would like to select the 15-20 characters
preceding and following the string and then output that new
string to a new file, along with the number of the line the string was
located on within the file.
Why not just use grep?
It seems fairly straightforward, but I'm wondering if y'all can point
me in a direction that would help me accomplish this.
Firstly, I know I can read a file and search for the string with the
code below (a portion of it was found on Stack Overflow and is not
mine; the rest is my own):
First, you probably want to do something to quit when you get your first
match. If you do want to continue finding matches, then you'd have to
move that open() of the new file to before the loop. Currently, each
open() in 'w' mode throws out any earlier contents, so only the latest
match would end up in the file.
The linenum is easy, using enumerate.
    with open('largeFile', 'r') as inF:
        for linenum, line in enumerate(inF):
            myString = "The String"
This should be moved to a location before the loop; it's a waste to
reassign it every time through the loop.
            # compare against the variable, not the literal 'myString'
            if myString in line:
                f = open('thenewfile', 'w')
                f.write(myString)
                f.close()
                break  # quit upon first match
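If you do want every match rather than just the first, one way to arrange it is sketched below. This is only an illustration, not the original poster's code: the function name find_matches and its parameters are made up for the example, and the output format (line number, colon, matched string) is an assumption. The output file is opened once, before the loop, so earlier matches are not thrown away.

```python
def find_matches(in_path, out_path, target):
    """Write one 'linenum: target' line per matching input line.

    Returns the number of matching lines. Hypothetical helper for
    illustration only.
    """
    count = 0
    # Open the output once, outside the loop, so 'w' mode truncates
    # the file a single time instead of on every match.
    with open(in_path, 'r') as in_f, open(out_path, 'w') as out_f:
        # Passing 1 as the start makes the line numbers 1-based.
        for linenum, line in enumerate(in_f, 1):
            if target in line:
                out_f.write("%d: %s\n" % (linenum, target))
                count += 1
    return count
```

Because the file is read one line at a time, this should cope with a very large input without loading it all into memory.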
I guess what I'm looking for then is tips on
A) my stated goal of also writing the 15-20 characters before and after
myString to the new file, and
B) finding the line number and writing that to the file as well.
Any information you can give me or pointers would be awesome, thanks in advance.
I'm on Ubuntu 12.10 running LXDE and working with Python 2.7
About giving the 15 characters before and after the match:
Is it sufficient to truncate that spec at the line boundaries? What I
mean is that if the match occurs at column 10, do you really need the
last 5 characters of the previous line? Likewise, if it occurs near
the end of the line, do you need some from the next line(s)?
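If truncating at the line boundaries is acceptable, the extraction can be done with plain string slicing. A minimal sketch, assuming only the first occurrence on a line matters; the name context_in_line and the width parameter are invented for this example:

```python
def context_in_line(line, target, width=15):
    """Return target plus up to `width` characters on each side.

    The context never extends past the current line. Returns None
    when target does not occur in the line.
    """
    pos = line.find(target)
    if pos == -1:
        return None
    # Clamp at the start of the line; a negative index would wrap around.
    start = max(0, pos - width)
    # Slicing past the end of a string is harmless, so no clamp is needed.
    end = pos + len(target) + width
    # Drop the trailing newline so it is not counted as context.
    return line.rstrip('\n')[start:end]
```

For a match at column 2, for example, only the 2 characters actually present before it are returned.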
If you never need to show more than the current line, then you can parse
the line (write a separate function). If you have to go 15 characters
earlier in the file, then consider using file.seek
http://docs.python.org/2/library/stdtypes.html?highlight=seek#file.seek
The catch to that is that it messes up the position in the file, so if
you do want multiple matches, you'll need to use file.tell to save and
restore the location to continue reading lines.
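A rough sketch of that save-and-restore pattern follows. It rests on several assumptions that are not in the original post: the file is opened in binary mode so that offsets are plain byte counts, the target is passed as a byte string, only the first match on each line is reported, and the helper name contexts_across_lines is made up. It uses readline() rather than "for line in f", because iterating over a file object reads ahead and makes tell() unreliable (or disabled).

```python
def contexts_across_lines(path, target, width=15):
    """Return a list of (line_number, context) for lines containing target.

    target must be a byte string. The context may cross line
    boundaries, so it can contain newline characters.
    """
    results = []
    with open(path, 'rb') as f:
        linenum = 0
        while True:
            line_start = f.tell()      # offset where this line begins
            line = f.readline()
            if not line:               # empty result means end of file
                break
            linenum += 1
            pos = line.find(target)
            if pos == -1:
                continue
            resume = f.tell()          # save where the next line starts
            match_at = line_start + pos
            start = max(0, match_at - width)
            f.seek(start)              # jump back, possibly into earlier lines
            context = f.read(match_at - start + len(target) + width)
            f.seek(resume)             # restore so reading continues normally
            results.append((linenum, context))
    return results
```

It would be called with something like contexts_across_lines('largeFile', b"The String"). Whether the newlines inside the context should be kept or replaced is one of those things that depends on what you really want.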
Lots of other options, but it all depends on what you REALLY want.
--
DaveA