Re: [Tutor] simple text replace

Dave Angel Mon, 27 Jul 2009 03:39:03 -0700

Albert-Jan Roskam wrote:

Hi!


Did you consider using a regex?

import re
re.sub("python\s", "snake ", "python is cool, pythonprogramming...")

Cheers!!
Albert-Jan


--- On Mon, 7/27/09, Dave Angel <da...@ieee.org> wrote:

From: Dave Angel <da...@ieee.org>
Subject: Re: [Tutor] simple text replace
To: "j booth" <j8o...@gmail.com>
Cc: tutor@python.org
Date: Monday, July 27, 2009, 12:41 AM
j booth wrote:

Hello,

I am scanning a text file and replacing words with alternatives. My
difficulty is that all occurrences are replaced (even if they are part of
another word!)..

This is an example of what I have been using:

    for line in fileinput.FileInput("test_file.txt", inplace=1):
        line = line.replace(original, new)
        print line,
    fileinput.close()

original and new are variables that have string values from functions..
original finds each word in a text file and new is a manipulated
replacement. Essentially, I would like to replace only the occurrence that
is currently selected-- not the rest. for example:

python is great, but my python knowledge is limited! regardless, I enjoy
pythonprogramming

returns something like:

snake is great, but my snake knowledge is limited! regardless, I enjoy
snakeprogramming

thanks so much!

Not sure what you mean by "currently selected": you're
processing a line at a time, and there are multiple
legitimate occurrences of the word in the line.

The trick is to define what you mean by "word."  replace() has
no such notion. So we want to write a function such as:

given three strings, line, inword, and outword.  Find
all occurrences of inword in the line, and replace all of
them with outword.  The definition of word is a group
of alphabetic characters (a-z perhaps) that is surrounded by
non-alphabetic characters.

The approach that I'd use is to prepare a translated copy
of the line as follows:  Replace each non-alphabetic character
with a space.  Also insert a space at the beginning and one at
the end.  Now, take the inword, and similarly add spaces at
begin and end.

Now search this modified line for all occurrences of this
modified inword, and make a list of the indices where it is
found.  In your example line, there would be 2 items in
the list.

Now, using the original line, use that list of indices to
substitute the outword in the appropriate places.  Use
slices to do it, preferably from right to left, so the
indices will work even though the string is changing.
(The easiest way to do right to left is to reverse() the
list.)
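
A minimal sketch of the steps described above, assuming Python 3 and an
inword made only of alphabetic characters. The function name replace_word
and the use of str.isalpha() to decide what counts as alphabetic are
illustrative choices, not something specified in the message; matching is
case-sensitive.

```python
def replace_word(line, inword, outword):
    """Replace whole-word occurrences of inword in line with outword.

    Assumption: a "word" is a run of alphabetic characters bounded by
    non-alphabetic characters or by the ends of the line.
    """
    # Translated copy: every non-alphabetic character becomes a space,
    # and one extra space is added at each end.
    shadow = " " + "".join(c if c.isalpha() else " " for c in line) + " "
    target = " " + inword + " "

    # Collect the index of every match in the translated copy.
    indices = []
    pos = shadow.find(target)
    while pos != -1:
        indices.append(pos)
        # Advance by one so two matches sharing a separator are both found.
        pos = shadow.find(target, pos + 1)

    # A match at index pos in the padded copy starts with the leading
    # space, so the word itself begins at index pos in the original line.
    # Work right to left so earlier indices stay valid as the line changes.
    for pos in reversed(indices):
        line = line[:pos] + outword + line[pos + len(inword):]
    return line


example = ("python is great, but my python knowledge is limited! "
           "regardless, I enjoy pythonprogramming")
print(replace_word(example, "python", "snake"))
# snake is great, but my snake knowledge is limited! regardless, I enjoy pythonprogramming
```

Because the search runs on the translated copy while the slicing runs on
the original, punctuation and spacing in the original line are preserved.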

DaveA

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor

(Please don't top-post on this list. The message then appears out of
order. Append new responses to end, or inline when appropriate)

Yes, a regex would make a lot of sense here. But a person should not
take on regular expressions till they have lots of experience with the
rest of the language. Besides, it's pretty easy to have subtle bugs,
even with such a simple case. For example your re string would
erroneously convert the word "newpython", and miss the last two
occurrences of the real word "python" near the end of the string.


import re

print re.sub("python\s", "snake ", "python is cool, pythonprogramming... newpython becomes python, or python")

Output: snake is cool, pythonprogramming... newsnake becomes python, or python


(gives the wrong answer, in three places)
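
For comparison, a sketch of how the regex could be anchored on word
boundaries instead of trailing whitespace, assuming Python 3. Note that
\b treats letters, digits and underscore as word characters, which is
slightly broader than an alphabetic-only definition of "word"; the
variable names here are illustrative.

```python
import re

text = ("python is cool, pythonprogramming... "
        "newpython becomes python, or python")

# \b matches only where a word character meets a non-word character
# (or the start/end of the string). re.escape() keeps any regex
# metacharacters in the search word from being interpreted.
pattern = re.compile(r"\b" + re.escape("python") + r"\b")
result = pattern.sub("snake", text)
print(result)
# snake is cool, pythonprogramming... newpython becomes snake, or snake
```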

DaveA
