[R] Matching a period in grep...

2008-08-05 Thread Alec.Zwart
Hi folks, 

Can anyone enlighten me as to why I get the following when I search for
".csv" at the end of a string?

> grep("\.csv$","Blah.csv",value=TRUE)

[1] "Blah.csv"
Warning messages:
1: '\.' is an unrecognized escape in a character string 
2: unrecognized escape removed from "[\.]csv$"


The R reference for regular expressions says

"Any metacharacter with special meaning may be quoted by preceding it
 with a backslash. [...] The metacharacters in EREs are . \ | ( ) [ { ^
$ * + ?" 

Am I missing something here?  If "\." is not the right way to match a
period, can anyone tell me what is?  I can't find anything on this in
the R reference...
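
For what it's worth, the first warning above comes from R's string
parser, not from the regular expression engine: inside an R string
literal the backslash is itself an escape character, so the regex \.
has to be typed as "\\.". A minimal sketch of that, plus a bracket
expression that avoids backslashes altogether (the vector name `files`
and its contents are made up for illustration):

```r
# Illustrative file names (an assumption, not from the original post)
files <- c("Blah.csv", "Blahxcsv", "Blah.csv.bak")

# The regex engine must receive \. ; in an R string literal that is "\\."
escaped <- grep("\\.csv$", files, value = TRUE)

# A bracket expression matches a literal period without any backslash
bracketed <- grep("[.]csv$", files, value = TRUE)

# "\\." really is two characters: one backslash and one period
len <- nchar("\\.")

print(escaped)    # "Blah.csv"
print(bracketed)  # "Blah.csv"
print(len)        # 2
```

Either form should match only names that end in a literal ".csv".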

I'm using R 2.6 on Windows XP

Thanks, 

Alec Zwart
CMIS CSIRO
[EMAIL PROTECTED] 

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] One-to-one matching?

2008-06-22 Thread Alec.Zwart
Hi folks, 

Can anyone suggest an efficient way to do "matching without
replacement", or "one-to-one matching"?  pmatch() doesn't quite provide
what I need...

For example, 

lookupTable <- c("a","b","c","d","e","f")
matchSample <- c("a","a","b","d")
##Normal match() behaviour:
match(matchSample,lookupTable)
[1] 1 1 2 4

My problem here is that both "a"s in matchSample are matched to the same
"a" in the lookup table.  I need the elements of the lookup table to be
excluded from the table as they are matched, so that no match can be
found for the second "a".  

Function pmatch() comes close to what I need:

pmatch(matchSample,lookupTable)
[1] 1 NA 2 4

Yep!  However, pmatch() incorporates partial matching, which I
definitely don't want:

lookupTable <- c("a","b","c","d","e","af")
matchSample <- c("a","a","b","d")
pmatch(matchSample,lookupTable)
[1] 1 6 2 4
## i.e. the second "a" partially matches "af" - I don't want this.

Of course, when identical items ARE duplicated in both sample and lookup
table, I need the matching to reflect this:

lookupTable <- c("a","a","c","d","e","f")
matchSample <- c("a","a","c","d")
##Normal match() behaviour
match(matchSample,lookupTable)
[1] 1 1 3 4

No good - pmatch() is better:

lookupTable <- c("a","a","c","d","e","f")
matchSample <- c("a","a","c","d")
pmatch(matchSample,lookupTable)
[1] 1 2 3 4

...but we still have the partial matching issue...

And of course, as per the usual behaviour of match(), sample elements
missing from the lookup table should return NA:

matchSample <- c("a","frog","e","d") ; print(matchSample)
match(matchSample,lookupTable)

Is there a nifty way to get what I'm after without resorting to a for
loop? (my code's already got too blasted many of those...)
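
To pin down the behaviour being asked for, here is a brute-force
reference version. It does use a for loop, so it is only meant as a
specification to check faster solutions against, not as the answer.
The name `matchOnce` is hypothetical, and the sketch assumes character
vectors with no NA values:

```r
# Hypothetical helper: exact one-to-one matching by brute force.
# Each table element can be matched at most once.
matchOnce <- function(x, table) {
  used <- logical(length(table))        # table entries already consumed
  out <- rep(NA_integer_, length(x))    # NA where no match is found
  for (i in seq_along(x)) {
    hit <- which(!used & table == x[i]) # unused exact matches only
    if (length(hit) > 0) {
      out[i] <- hit[1]                  # take the first free match
      used[hit[1]] <- TRUE              # and remove it from the pool
    }
  }
  out
}

r1 <- matchOnce(c("a","a","b","d"), c("a","b","c","d","e","f"))
r2 <- matchOnce(c("a","a","c","d"), c("a","a","c","d","e","f"))
r3 <- matchOnce(c("a","frog","e","d"), c("a","a","c","d","e","f"))
print(r1)  # 1 NA 2 4
print(r2)  # 1 2 3 4
print(r3)  # 1 NA 5 4
```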

Thanks, 

Alec Zwart
CMIS CSIRO
[EMAIL PROTECTED]



Re: [R] One-to-one matching?

2008-06-23 Thread Alec.Zwart
My thanks to Gabor Grothendieck, Charles C. Berry and Moshe Olshansky
for their suggested solutions.

The upshot is that a nice one-line solution to my one-to-one exact
matching problem is the Grothendieck-Berry collaboration:

   match(make.unique(matchSample), make.unique(lookupTable))

I've settled on this particular solution as it appears to be the fastest
of the three possibilities given, although Moshe's solution comes a
close second :-)
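
For the record, a small sketch of that one-liner run against the
examples from my original post. The wrapper name `oneToOne` is just a
convenience I've made up here. One caveat I believe applies:
make.unique() works by appending suffixes such as ".1", so data that
already contain strings of that form can produce a false match, as the
last call shows.

```r
# Hypothetical wrapper around the one-liner; expects character vectors
oneToOne <- function(x, table) match(make.unique(x), make.unique(table))

# Second "a" finds no partner, and is not partially matched either
m1 <- oneToOne(c("a","a","b","d"), c("a","b","c","d","e","f"))

# Duplicates present on both sides are paired off in order
m2 <- oneToOne(c("a","a","c","d"), c("a","a","c","d","e","f"))

# Elements missing from the table give NA, as with match()
m3 <- oneToOne(c("a","frog","e","d"), c("a","a","c","d","e","f"))

# Caveat: the second "a" becomes "a.1" and collides with a real "a.1"
m4 <- oneToOne(c("a","a"), "a.1")

print(m1)  # 1 NA 2 4
print(m2)  # 1 2 3 4
print(m3)  # 1 NA 5 4
print(m4)  # NA 1
```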

Many thanks...

Alec 

