Re: [R] split strings

Allan Engelhardt Tue, 26 May 2009 23:37:39 -0700

Immaterial, yes, but it is always good to test :) and your solution *is* faster, and it is even faster if you can assume byte strings:

> strings = sprintf('f:/foo/bar//%s.tif', replicate(1000,paste(sample(letters, 10), collapse='')))

> library(rbenchmark)
> benchmark(columns=c('test', 'elapsed'), replications=1000, order=NULL,
  'one-pass, perl'=sub('.*//(.*)[.]tif$', '\\1', strings, perl=TRUE),
  'two-pass, perl'=sub('.tif$', '', basename(strings), perl=TRUE),
  'one-pass, no perl'=sub('.*//(.*)[.]tif$', '\\1', strings, perl=FALSE),
  'two-pass, no perl'=sub('.tif$', '', basename(strings), perl=FALSE),
  'fixed'=sub(".tif", "", basename(strings), fixed=TRUE),
  'fixed, bytes'=sub(".tif", "", basename(strings), fixed=TRUE, useBytes=TRUE))


              test elapsed
1    one-pass, perl   2.946
2    two-pass, perl   3.858
3 one-pass, no perl  15.884
4 two-pass, no perl   3.788
5             fixed   2.264
6      fixed, bytes   1.813
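
Before reading too much into the timings, it may be worth checking that all six variants actually return the same thing. A minimal sketch, assuming pure-ASCII file names that end in a single .tif (the names `results`, `reference` and `all_same` are illustrative, not from the thread):

```r
# Small sample of strings built the same way as in the benchmark above.
set.seed(1)
strings <- sprintf('f:/foo/bar//%s.tif',
                   replicate(5, paste(sample(letters, 10), collapse='')))

# Run each benchmarked variant once and keep the result.
results <- list(
  one_pass_perl    = sub('.*//(.*)[.]tif$', '\\1', strings, perl=TRUE),
  two_pass_perl    = sub('.tif$', '', basename(strings), perl=TRUE),
  one_pass_no_perl = sub('.*//(.*)[.]tif$', '\\1', strings, perl=FALSE),
  two_pass_no_perl = sub('.tif$', '', basename(strings), perl=FALSE),
  fixed            = sub(".tif", "", basename(strings), fixed=TRUE),
  fixed_bytes      = sub(".tif", "", basename(strings), fixed=TRUE, useBytes=TRUE)
)

# Compare every result against the first one.
reference <- results[[1]]
all_same <- all(vapply(results, identical, logical(1), reference))
all_same
```

If `all_same` comes back FALSE on real data, the variants are not interchangeable for those inputs and the timing comparison is moot.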

Allan

Gabor Grothendieck wrote:

Although speed is really immaterial here this is likely
to be faster than all shown so far:

sub(".tif", "", basename(metr_list), fixed = TRUE)

It does not allow file names with .tif in the middle
of them since it will delete the first occurrence rather
than the last but such a situation is highly unlikely.
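
To make the caveat concrete, here is a small sketch with a made-up file name that contains .tif twice. The path is hypothetical; the anchored regex and tools::file_path_sans_ext() are shown only as two possible ways to strip the last extension instead of the first match:

```r
# Hypothetical path with ".tif" both in the middle and at the end.
x <- "f:/foo/bar//scan.tif.v2.tif"

# fixed=TRUE removes the FIRST literal ".tif" it finds.
first_removed <- sub(".tif", "", basename(x), fixed = TRUE)

# Anchoring the pattern at the end removes only the trailing extension.
last_removed_regex <- sub("[.]tif$", "", basename(x))

# file_path_sans_ext() from the tools package also strips the final extension.
last_removed_tools <- tools::file_path_sans_ext(basename(x))

first_removed        # "scan.v2.tif"
last_removed_regex   # "scan.tif.v2"
last_removed_tools   # "scan.tif.v2"
```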


On Tue, May 26, 2009 at 4:24 PM, Wacek Kusnierczyk
<waclaw.marcin.kusnierc...@idi.ntnu.no> wrote:

Monica Pisica wrote:

Hi everybody,

Thank you for the suggestions and especially the explanation Waclaw provided 
for his code. Maybe one day i will be able to wrap my head around this.

Thanks again,

you're welcome.  note that if efficiency is an issue, you'd better have
perl=TRUE there:

   output = sub('.*//(.*)[.]tif$', '\\1', input, perl=TRUE)

with perl=TRUE, the one-pass solution is somewhat faster than the
two-pass solution of gabor's -- which, however, is probably easier to
understand;  with perl=FALSE (the default), the performance drops:

   strings = sprintf(
       'f:/foo/bar//%s.tif',
       replicate(1000, paste(sample(letters, 10), collapse='')))
   library(rbenchmark)
   benchmark(columns=c('test', 'elapsed'), replications=1000, order=NULL,
      'one-pass, perl'=sub('.*//(.*)[.]tif$', '\\1', strings, perl=TRUE),
      'two-pass, perl'=sub('.tif$', '', basename(strings), perl=TRUE),
      'one-pass, no perl'=sub('.*//(.*)[.]tif$', '\\1', strings, perl=FALSE),
      'two-pass, no perl'=sub('.tif$', '', basename(strings), perl=FALSE))
   # 1    one-pass, perl   3.391
   # 2    two-pass, perl   4.944
   # 3 one-pass, no perl  18.836
   # 4 two-pass, no perl   5.191
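
If rbenchmark is not installed, roughly the same comparison can be sketched with base R's system.time(). This is only an illustration: `time_it` is a helper made up for this example, the replication count is kept small, and the absolute numbers will differ by machine and R version:

```r
# Same kind of test data as in the benchmark above.
strings <- sprintf('f:/foo/bar//%s.tif',
                   replicate(1000, paste(sample(letters, 10), collapse='')))

# Hypothetical helper: run f() n times and return the elapsed seconds.
time_it <- function(f, n = 20) {
  system.time(for (i in seq_len(n)) f())[['elapsed']]
}

timings <- c(
  'one-pass, perl'    = time_it(function() sub('.*//(.*)[.]tif$', '\\1', strings, perl=TRUE)),
  'one-pass, no perl' = time_it(function() sub('.*//(.*)[.]tif$', '\\1', strings, perl=FALSE))
)
print(timings)
```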

vQ

Monica

----------------------------------------

Date: Tue, 26 May 2009 15:46:21 +0200
From: waclaw.marcin.kusnierc...@idi.ntnu.no
To: pisican...@hotmail.com
CC: r-help@r-project.org
Subject: Re: [R] split strings

Monica Pisica wrote:

Hi everybody,

I have a vector of characters and i would like to extract certain parts. My 
vector is named metr_list:

[1] "F:/Naval_Live_Oaks/2005/data//BE.tif"
[2] "F:/Naval_Live_Oaks/2005/data//CH.tif"
[3] "F:/Naval_Live_Oaks/2005/data//CRR.tif"
[4] "F:/Naval_Live_Oaks/2005/data//HOME.tif"

And i would like to extract BE, CH, CRR, and HOME in a different vector named 
"names.id"

one way that seems reasonable is to use sub:

output = sub('.*//(.*)[.]tif$', '\\1', input)

which says 'from each string remember the substring between the
rightmost two slashes and a .tif extension, exclusive, and replace the
whole thing with the captured part'. if the pattern does not match, you
get the original input:

sub('.*//(.*)[.]tif$', '\\1', 'f:/foo/bar//buz.tif')
# buz
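
Applied to the vector from the original question, this could look as follows. The last call uses a made-up path without '//' and without a .tif extension, just to illustrate that a non-matching input is returned unchanged:

```r
# The four paths quoted in the question.
metr_list <- c("F:/Naval_Live_Oaks/2005/data//BE.tif",
               "F:/Naval_Live_Oaks/2005/data//CH.tif",
               "F:/Naval_Live_Oaks/2005/data//CRR.tif",
               "F:/Naval_Live_Oaks/2005/data//HOME.tif")

# Keep only the part between the last '//' and the trailing '.tif'.
names.id <- sub('.*//(.*)[.]tif$', '\\1', metr_list)
names.id
# "BE"   "CH"   "CRR"  "HOME"

# Hypothetical input that does not match the pattern: it comes back as is.
unchanged <- sub('.*//(.*)[.]tif$', '\\1', 'f:/foo/bar/buz.png')
unchanged
# "f:/foo/bar/buz.png"
```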

vQ

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

