Hello arun,
Thinking about it, I believe this one is reasonably solid.
I've added a 'txt0' with only two repeats, in case the pattern fails on a shorter run.
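txt0 <- "my name name is micky"
txt1 <- "my name name name is micky"
txt2 <- "my name name name name is micky"
# A word plus its trailing whitespace, followed by one or more copies of itself
pat <- "(\\w+\\s)\\1+"
# Keep a single copy of the repeated word; each call returns "my name is micky"
gsub(pat, "\\1", txt0)
gsub(pat, "\\1", txt1)
gsub(pat, "\\1", txt2)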
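Two caveats with a pattern of this shape: without a word boundary the group can start in the middle of a word (in "this is is" it can match the "is " inside "this "), and a repeat at the very end of a string, with no trailing whitespace, is not caught. Below is a minimal sketch of a stricter variant, assuming PCRE via perl = TRUE. The names pat2, dedup and dedup_file are hypothetical and not from this thread, and the file helper assumes one sentence per line.

```r
# Hypothetical helper (not from the thread): collapse runs of a repeated word.
# \\b anchors the word at word boundaries; (?:\\s+\\1\\b)+ consumes the repeats.
pat2 <- "\\b(\\w+)(?:\\s+\\1\\b)+"
dedup <- function(x) gsub(pat2, "\\1", x, perl = TRUE)

dedup("my name name name name is micky")  # "my name is micky"
dedup("this is is micky micky")           # "this is micky"

# Hypothetical helper for the text-file case: read lines, clean, write back out.
# Assumption: repeats never span a line break.
dedup_file <- function(infile, outfile) {
  lines <- readLines(infile)
  cleaned <- dedup(lines)        # gsub is vectorised over lines
  writeLines(cleaned, outfile)
  invisible(cleaned)
}
# Usage (file names are placeholders): dedup_file("input.txt", "output.txt")
```

The replacement is "\\1" rather than "", so one copy of the word always survives regardless of how many repeats were matched.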
Rui Barradas
On 14-06-2012 17:32, arun wrote:
Hi Carlos,
Thanks for your suggestions. I saw Rui's reply about the same problem using rle. It
looks very solid. I was trying to replicate the same thing with "gsub", but it
was not working that way.
For example,
txt1<-"my name name name is micky"
gsub("\\b(\\w+)\\b(\\s+)\\1\\2","",txt1)
[1] "my name is micky"
txt2<-"my name name name name is micky"
gsub("\\b(\\w+)\\b(\\s+)\\1\\2","",txt2)
[1] "my is micky"
I still think there must be a way in gsub to make it more general.
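The rle-based reply mentioned above is not quoted in this thread, so the following is only a minimal sketch of what such an approach might look like, not Rui's actual code. The function name collapse_rle is hypothetical, and it assumes words are separated by whitespace (punctuation stays attached to its word, and runs of whitespace are normalised to a single space).

```r
# Hypothetical sketch of an rle()-based approach (not the code from the thread).
collapse_rle <- function(x) {
  words <- strsplit(x, "\\s+")[[1]]         # split the sentence into words
  paste(rle(words)$values, collapse = " ")  # keep one value per run of equal words
}

collapse_rle("my name name name is micky")       # "my name is micky"
collapse_rle("my name name name name is micky")  # "my name is micky"
```

Because rle() works on runs of any length, it does not depend on whether the number of repeats is odd or even, which is where the gsub call with an empty replacement goes wrong.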
A.K.
________________________________
From: Carlos Ortega <c...@qualityexcellence.es>
To: arun <smartpink...@yahoo.com>
Sent: Thursday, June 14, 2012 12:11 PM
Subject: Re: [R] need help
Hi,
The way to make it very general and independent of the string "name" is as
follows:
a) For every sentence, make a "table()" and get the word with the highest number
of occurrences.
b) With that word, you can follow the procedure I gave you.
This sequence is not free of possible errors: it can capture prepositions (they
appear many times in every sentence), so you will have to make the algorithm a
little more complex, for example by getting the length of every word as well as
its number of occurrences, and changing only those words with the highest number
of occurrences and the greatest length. But again, that is not free of possible errors.
Although regex offers a lot of possibilities, no doubt, I prefer the simplicity
stringr offers.
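Steps a) and b) might be sketched roughly as below. The earlier procedure referred to in b) is not quoted in this thread, so a base-R gsub call stands in for it here as an assumption; the variable names are illustrative, and the word is assumed to contain no regex metacharacters.

```r
txt <- "my name name name is micky"

# a) Count every word and pick the most frequent one.
words  <- strsplit(txt, "\\s+")[[1]]
counts <- table(words)
top    <- names(counts)[which.max(counts)]  # "name"

# b) Collapse consecutive repeats of that word only (stand-in for the
#    procedure mentioned above, which is not shown in this thread).
pat    <- sprintf("\\b(%s)(?:\\s+\\1\\b)+", top)
result <- gsub(pat, "\\1", txt, perl = TRUE)
result  # "my name is micky"
```

Note that table() counts all occurrences, consecutive or not, so a frequent word that is never doubled is simply left unchanged by step b).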
Regards,
Carlos Ortega
www.qualityexcellence.es
2012/6/14 arun <smartpink...@yahoo.com>
Hi,
For the example you gave, the regex below works:
txt1<-"my name name name is micky"
gsub("\\b(\\w+)\\b(\\s+)\\1\\2","",txt1)
[1] "my name is micky"
But the expression is not a general one.
A.K.
----- Original Message -----
From: shilpa rai <raishilpa....@gmail.com>
To: r-help@r-project.org
Cc:
Sent: Wednesday, June 13, 2012 6:56 AM
Subject: [R] need help
Hello,
Could you help me solve the following problem?
I want to replace identical consecutive words in a sentence with a single word.
For example: my name name name is micky
so I want the output to look like this: my name is micky
I need this solution for a text file.
Can you tell me the code for it?
Thanking you in anticipation.
--
Shilpa Rai
MSc.(2011-2013)
Applied Statistics and Informatics
Indian Institute of Technology,Bombay
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.