Re: [R] Comparing/diffing strings

Hadley Wickham Tue, 24 Aug 2010 17:36:30 -0700

On Tue, Aug 24, 2010 at 11:25 AM, Martin Morgan <mtmor...@fhcrc.org> wrote:
> On 08/24/2010 07:27 AM, Doran, Harold wrote:
>> There is the stringMatch function in the MiscPsycho package.
>>
>>> stringMatch('Hadley', 'Hadley Wickham', normalize = 'no')
>> [1] 8
>>> stringMatch('Hadley', 'Hadley Wickham', normalize = 'yes')
>> [1] 0.4285714
>>
>> It uses Levenshtein distance to tell you how much they differ, either
>> normalized or not. The first call above shows that the first string differs
>> from the second by 8 insertions/deletions/substitutions. The second call
>> normalizes the comparison so that 1 denotes perfect agreement and values
>> below 1 denote imperfect agreement.
>>
>> Examples of an exact match are below.
>>
>>> stringMatch('Hadley Wickham', 'Hadley Wickham', normalize = 'yes')
>> [1] 1
>>> stringMatch('Hadley Wickham', 'Hadley Wickham', normalize = 'n')
>> [1] 0
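
For readers without MiscPsycho installed, the numbers quoted above can be reproduced with a minimal sketch in base R. It assumes `utils::adist` (generalized Levenshtein distance) is available, and it assumes the normalization is one minus the distance divided by the length of the longer string. That formula is inferred from the quoted output (1 - 8/14 = 0.4285714), not taken from the MiscPsycho source, and the helper names `lev_distance` and `lev_similarity` are hypothetical.

```r
# Levenshtein distance via base R's utils::adist.
# Assumption: the thread uses MiscPsycho::stringMatch; adist is a stand-in.
lev_distance <- function(a, b) {
  as.integer(utils::adist(a, b)[1, 1])
}

# Assumed normalization: 1 - distance / length of the longer string,
# so 1 means identical strings and smaller values mean weaker agreement.
lev_similarity <- function(a, b) {
  longest <- max(nchar(a), nchar(b))
  if (longest == 0) return(1)  # two empty strings agree perfectly
  1 - lev_distance(a, b) / longest
}

lev_distance("Hadley", "Hadley Wickham")            # 8
lev_similarity("Hadley", "Hadley Wickham")          # 0.4285714
lev_similarity("Hadley Wickham", "Hadley Wickham")  # 1
```

Dividing by the longer string's length keeps the score between 0 and 1, because the Levenshtein distance can never exceed that length.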
>
> You're probably looking for something lighter weight, but Bioconductor
> Biostrings has pairwiseAlignment.
>
>> library(Biostrings)
>> pairwiseAlignment("Hadley Wickham", "Hadley Hamwick")
> Global PairwiseAlignedFixedSubject (1 of 1)
> pattern: [1] Hadley W---ick
> subject: [1] Hadley Hamwick
> score: 29.5102
>
>> pairwiseAlignment("Hadley Hamwick", "Hadley Wickham")
> Global PairwiseAlignedFixedSubject (1 of 1)
> pattern: [1] Hadley Hamwick
> subject: [1] Hadley W---ick
> score: 29.5102
>
>> aln <- pairwiseAlignment("Hadley Hamwick", "Haderley Hamwich")
>> consensusMatrix(aln)["-",]
>  [1] 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
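
The "-" row of the consensus matrix marks, per alignment column, whether a gap was inserted. The same idea can be sketched in base R on an already-aligned string. The aligned pattern below ("Had--ley Hamwick") is an assumption chosen to be consistent with the 16-column output above; it is not printed anywhere in the thread, and `gap_indicator` is a hypothetical helper rather than a Biostrings function.

```r
# Return a 0/1 vector with 1 wherever an aligned string has a gap ("-").
gap_indicator <- function(aligned) {
  chars <- strsplit(aligned, "", fixed = TRUE)[[1]]
  as.integer(chars == "-")
}

# Assumed aligned form of "Hadley Hamwick" against "Haderley Hamwich".
aligned_pattern <- "Had--ley Hamwick"

gaps <- gap_indicator(aligned_pattern)
gaps              # 0 0 0 1 1 0 0 0 0 0 0 0 0 0 0 0
sum(gaps)         # 2 gap columns
which(gaps == 1)  # positions 4 and 5
```

This sketch cannot tell an alignment gap from a literal hyphen in the input, so it only suits strings that contain no "-" of their own.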


Thanks all for the suggestions!

Hadley

-- 
Assistant Professor / Dobelman Family Junior Chair
Department of Statistics / Rice University
http://had.co.nz/

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
