On Dec 8, 2009, at 8:46 PM, Lynn Wang wrote:



Hi all,

I have two sets:

dig<-c("DAVID ADAMS","PIERS AKERMAN","SHERYLE BAGWELL","JULIAN BAJKOWSKI","CANDIDA BAKER")

import<-c("by DAVID ADAMS","piersAKERMAN","SHERYLE BagWEL","JULIAN BAJKOWSKI with ","Cand BAKER","smith green")


I want to get the following result from "import" after comparing the two sets:

result<-c("by DAVID ADAMS","piersAKERMAN","JULIAN BAJKOWSKI with ")

> sapply(dig, function(x) grep(x, import) ) >0
     DAVID ADAMS    PIERS AKERMAN  SHERYLE BAGWELL JULIAN BAJKOWSKI    CANDIDA BAKER
            TRUE               NA               NA             TRUE               NA

# Not exactly, so you need a partial-match function that is more flexible. Unfortunately, the Levenshtein function in MiscPsycho is not vectorized:


> import<-c("by DAVID ADAMS","piersAKERMAN","SHERYLE BagWEL","JULIAN BAJKOWSKI with ","Cand BAKER","smith green")
> dig<-c("DAVID ADAMS","PIERS AKERMAN","SHERYLE BAGWELL","JULIAN BAJKOWSKI","CANDIDA BAKER")
> library(MiscPsycho)
> word.pairs <- expand.grid(dig,import)
> wordpairs <- lapply(word.pairs,  as.character)
> wp2 <- data.frame(dig= wordpairs[[1]], import=wordpairs[[2]], stringsAsFactors=FALSE)
> wp2$distnc <- apply(wp2, 1, function(x) stringMatch( x[1], x[2] ) )
>  wp2[wp2$distnc >.7, ]
                dig                 import    distnc
1       DAVID ADAMS         by DAVID ADAMS 0.7142857
7     PIERS AKERMAN           piersAKERMAN 0.9230769
13  SHERYLE BAGWELL         SHERYLE BagWEL 0.9333333
19 JULIAN BAJKOWSKI JULIAN BAJKOWSKI with  0.7272727
25    CANDIDA BAKER             Cand BAKER 0.7692308


(I think you missed a couple of obvious matches that ought to be in the list)
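For what it's worth, the pairwise loop above can probably be avoided. The following is only a sketch, assuming an R version recent enough to ship adist() in the utils package; it is not the MiscPsycho method, and the normalisation (one minus edit distance over the longer string's length) and the 0.7 cutoff are my assumptions, so the scores need not agree exactly with stringMatch().

```r
dig <- c("DAVID ADAMS", "PIERS AKERMAN", "SHERYLE BAGWELL",
         "JULIAN BAJKOWSKI", "CANDIDA BAKER")
import <- c("by DAVID ADAMS", "piersAKERMAN", "SHERYLE BagWEL",
            "JULIAN BAJKOWSKI with ", "Cand BAKER", "smith green")

# adist() returns a length(dig) x length(import) matrix of edit distances
# in a single vectorized call.
d <- adist(dig, import, ignore.case = TRUE)

# Assumed normalisation: divide by the longer string of each pair so that
# the similarity lies between 0 and 1.
longest <- outer(nchar(dig), nchar(import), pmax)
sim <- 1 - d / longest
dimnames(sim) <- list(dig, import)

# Best score for each element of import against any element of dig.
best <- apply(sim, 2, max)
fuzzy <- import[best > 0.7]
fuzzy
```

On the example data this should keep the same five candidates as the table above and drop "smith green"; raising the cutoff trades the misspelled names for fewer false positives.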

--
David



I created a "partialmatch" function as follows, but cannot get the right result.

partialmatch<- function(x, y) as.vector(y[regexpr(as.character(x), as.character(y), ignore.case = TRUE)>0])

result<-partialmatch(dig,import)


[1] "by DAVID ADAMS"
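
A likely reason only one element comes back: regexpr() uses just the first element of a vector pattern (with a warning), so only "DAVID ADAMS" is ever searched for. A minimal sketch of one way around that, looping over the patterns and ignoring both case and whitespace; the name partialmatch2 and the whitespace-stripping step are my assumptions, not something from the original function:

```r
dig <- c("DAVID ADAMS", "PIERS AKERMAN", "SHERYLE BAGWELL",
         "JULIAN BAJKOWSKI", "CANDIDA BAKER")
import <- c("by DAVID ADAMS", "piersAKERMAN", "SHERYLE BagWEL",
            "JULIAN BAJKOWSKI with ", "Cand BAKER", "smith green")

# Hypothetical helper: keep the elements of y that contain any element of x,
# comparing upper-cased strings with all whitespace removed.
partialmatch2 <- function(x, y) {
  squash <- function(s) toupper(gsub("[[:space:]]+", "", as.character(s)))
  xs <- squash(x)
  ys <- squash(y)
  hit <- logical(length(ys))
  # One grepl() call per pattern; fixed = TRUE treats each name as literal text.
  for (p in xs) hit <- hit | grepl(p, ys, fixed = TRUE)
  y[hit]
}

result <- partialmatch2(dig, import)
result
```

On the example data this returns "by DAVID ADAMS", "piersAKERMAN" and "JULIAN BAJKOWSKI with ". It only finds names that appear complete inside an element of import, so misspellings such as "SHERYLE BagWEL" still need a fuzzy comparison.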



Thanks,

lynn



______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
