On Tue, Aug 24, 2010 at 11:25 AM, Martin Morgan <mtmor...@fhcrc.org> wrote:
> On 08/24/2010 07:27 AM, Doran, Harold wrote:
>> There is the stringMatch function in the MiscPsycho package.
>>
>>> stringMatch('Hadley', 'Hadley Wickham', normalize = 'no')
>> [1] 8
>>> stringMatch('Hadley', 'Hadley Wickham', normalize = 'yes')
>> [1] 0.4285714
>>
>> It uses Levenshtein distance to tell you how much they differ by, either
>> normalized or not. So, the first result above tells you the first string
>> differs from the second string by 8 insertions/deletions/substitutions.
>> The second result normalizes the comparison such that 1 denotes perfect
>> agreement and 0 denotes no agreement.
>>
>> Examples of an exact match are below.
>>
>>> stringMatch('Hadley Wickham', 'Hadley Wickham', normalize = 'yes')
>> [1] 1
>>> stringMatch('Hadley Wickham', 'Hadley Wickham', normalize = 'n')
>> [1] 0
>
> You're probably looking for something lighter weight, but Bioconductor
> Biostrings has pairwiseAlignment.
>
>> library(Biostrings)
>> pairwiseAlignment("Hadley Wickham", "Hadley Hamwick")
> Global PairwiseAlignedFixedSubject (1 of 1)
> pattern: [1] Hadley W---ick
> subject: [1] Hadley Hamwick
> score: 29.5102
>
>> pairwiseAlignment("Hadley Hamwick", "Hadley Wickham")
> Global PairwiseAlignedFixedSubject (1 of 1)
> pattern: [1] Hadley Hamwick
> subject: [1] Hadley W---ick
> score: 29.5102
>
>> aln <- pairwiseAlignment("Hadley Hamwick", "Haderley Hamwich")
>> consensusMatrix(aln)["-",]
> [1] 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
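
For anyone who wants the lighter-weight route without an extra package, the stringMatch numbers quoted above can be reproduced with adist() from base R's utils package. This is only a sketch: lev_similarity is a hypothetical helper name, and the normalization (one minus the distance divided by the length of the longer string) is an assumption that happens to match the quoted output, not necessarily how MiscPsycho computes it.

```r
# Raw Levenshtein (edit) distance using base R's utils::adist().
# adist() returns a matrix, so as.vector() is used to get a plain number.
as.vector(adist("Hadley", "Hadley Wickham"))
# [1] 8

# Hypothetical helper: rescale the distance to a similarity where
# 1 means identical strings. Assumed formula: 1 - distance / longer length.
lev_similarity <- function(a, b) {
  longest <- max(nchar(a), nchar(b))
  if (longest == 0) return(1)  # two empty strings are treated as identical
  1 - as.vector(adist(a, b)) / longest
}

lev_similarity("Hadley", "Hadley Wickham")          # 1 - 8/14
# [1] 0.4285714
lev_similarity("Hadley Wickham", "Hadley Wickham")  # exact match
# [1] 1
```

Dividing by the longer string's length keeps the result between 0 and 1, since the edit distance can never exceed that length.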
Thanks all for the suggestions!

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/