Re: [R] Decomposing a List

William Dunlap Fri, 26 Apr 2013 10:10:26 -0700

You might add vapply() to your repertoire, as it is quicker than sapply() but
also does some error checking on your input data.  E.g., your f2 returns
a matrix whose columns are the elements of the list l, and you assume that
each element of l contains 2 character strings.
    f2 <- function(l) matrix(unlist(l), nrow = 2)
Here is a function based on vapply() that returns the same thing but also
verifies that each element of l is really a 2-long character vector.
   f2v <- function (l) vapply(l, function(x) x, FUN.VALUE = character(2))
Here is a function to generate datasets of various sizes:
   makeL <- function(n) {
     strsplit(paste(sample(LETTERS, n, replace = TRUE),
                    sample(1:10, n, replace = TRUE), sep = "+"),
              "+", fixed = TRUE)
   }
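
A quick way to see what makeL() produces is to call it with a small n. This is only a sketch: the seed value and the name `small` are assumptions added for illustration, not part of the original post. Note that "+" is a regular-expression metacharacter, which is why the split is done as a fixed string.

```r
# Generator from the post, written with full argument names
makeL <- function(n) {
  strsplit(paste(sample(LETTERS, n, replace = TRUE),
                 sample(1:10, n, replace = TRUE), sep = "+"),
           "+", fixed = TRUE)   # fixed = TRUE: treat "+" literally, not as a regex
}

set.seed(1)          # arbitrary seed, only for reproducibility
small <- makeL(5)    # a list of 5 elements
str(small)           # each element should be a character vector of length 2

# Confirm the shape that f2 and f2v both rely on
all(vapply(small, length, FUN.VALUE = integer(1)) == 2L)
```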


Timing the functions on a million-long list I get
  > l <- makeL(n=10^6)
  > system.time( r2 <- f2(l) )
     user  system elapsed
    0.088   0.000   0.090
  > system.time( r2v <- f2v(l) )
     user  system elapsed
     0.92    0.00    0.92 
   > identical(r2, r2v)
   [1] TRUE
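
On a list small enough to inspect by eye, the same agreement can be checked directly, and the two vectors asked for in the original question are simply the rows of the result. The data and the names L, V1 and V2 below follow the original question and are illustrative only.

```r
# Hand-made list standing in for strsplit() output (illustrative data)
L <- list(c("A1", "B1"), c("A2", "B2"), c("A3", "B3"))

f2  <- function(l) matrix(unlist(l), nrow = 2)
f2v <- function(l) vapply(l, function(x) x, FUN.VALUE = character(2))

m <- f2v(L)       # 2 x 3 character matrix, one column per list element
V1 <- m[1, ]      # first string of each pair
V2 <- m[2, ]      # second string of each pair

identical(f2(L), m)   # both approaches build the same matrix
```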
In this timing, vapply() is ten times slower than unlist() but three times
faster than sapply(x,function(x)x).  However, when you give it data that
doesn't meet your expectations, which is common when using strsplit(),
f2v tells you about the problem and f2 gives you an incorrect result:
  > l[[10]] <- c("a","b","c","d")
  > system.time( r2v <- f2v(l) )
  Error in vapply(l, function(x) x, FUN.VALUE = character(2)) :
    values must be length 2,
   but FUN(X[[10]]) result is length 4
  Timing stopped at: 0.004 0 0.002
  > system.time( rv <- f2(l) )
     user  system elapsed
    0.088   0.008   0.095
  > dim(rv) # you will have alignment problems later
  [1]       2 1000001
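
One possible middle ground, not proposed in the thread, is to keep the fast unlist() route but check the element lengths first. The sketch below assumes a hypothetical helper named f2checked and a tiny illustrative list; in R 3.2.0 and later, lengths(l) could replace the vapply() call used for the check.

```r
# Illustrative data: the third element has 4 strings instead of 2
l <- list(c("A", "1"), c("B", "2"), c("a", "b", "c", "d"), c("C", "3"))

# Hypothetical helper (not from the thread): validate lengths, then unlist
f2checked <- function(l, k = 2L) {
  len <- vapply(l, length, FUN.VALUE = integer(1))
  bad <- which(len != k)
  if (length(bad) > 0L) {
    stop("elements not of length ", k, ": ", paste(bad, collapse = ", "))
  }
  matrix(unlist(l), nrow = k)
}

# Capture the error message instead of stopping the script
res <- tryCatch(f2checked(l), error = function(e) conditionMessage(e))
res

# Without the check, the extra strings silently shift the later columns
m <- matrix(unlist(l), nrow = 2)
dim(m)   # one more column than the list has elements
```

The check costs an extra pass over the list, so whether it is worthwhile depends on how much the input can be trusted.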

Bill Dunlap
Spotfire, TIBCO Software
wdunlap tibco.com


> -----Original Message-----
> From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Bert Gunter
> Sent: Thursday, April 25, 2013 7:54 AM
> To: ted.hard...@wlandres.net
> Cc: R mailing list
> Subject: Re: [R] Decomposing a List
> 
> Well, what you really want to do is convert the list to a matrix, and
> it can be done directly and considerably faster than with the
> (implicit) looping of sapply:
> 
> f1 <- function(l)sapply(l,"[",1)
> f2 <- function(l)matrix(unlist(l),nr=2)
> l <- strsplit(paste(sample(LETTERS,1e6,rep=TRUE),sample(1:10,1e6,rep=TRUE),sep="+"),
>               "+",fix=TRUE)
> 
> ## Then you get these results:
> 
> > system.time(x1 <- f1(l))
>    user  system elapsed
>    1.92    0.01    1.95
> > system.time(x2 <- f2(l))
>    user  system elapsed
>    0.06    0.02    0.08
> > system.time(x2 <- f2(l)[1,])
>    user  system elapsed
>     0.1     0.0     0.1
> > identical(x1,x2)
> [1] TRUE
> 
> 
> Cheers,
> Bert
> 
> 
> 
> 
> 
> 
> On Thu, Apr 25, 2013 at 3:32 AM, Ted Harding <ted.hard...@wlandres.net> wrote:
> > Thanks, Jorge, that seems to work beautifully!
> > (Now to try to understand why ... but that's for later).
> > Ted.
> >
> > On 25-Apr-2013 10:21:29 Jorge I Velez wrote:
> >> Dear Dr. Harding,
> >>
> >> Try
> >>
> >> sapply(L, "[", 1)
> >> sapply(L, "[", 2)
> >>
> >> HTH,
> >> Jorge.-
> >>
> >>
> >>
> >> On Thu, Apr 25, 2013 at 8:16 PM, Ted Harding <ted.hard...@wlandres.net> wrote:
> >>
> >>> Greetings!
> >>> For some reason I am not managing to work out how to do this
> >>> (in principle) simple task!
> >>>
> >>> As a result of applying strsplit() to a vector of character strings,
> >>> I have a long list L (N elements), where each element is a vector
> >>> of two character strings, like:
> >>>
> >>>   L[1] = c("A1","B1")
> >>>   L[2] = c("A2","B2")
> >>>   L[3] = c("A3","B3")
> >>>   [etc.]
> >>>
> >>> From L, I wish to obtain (as directly as possible, e.g. avoiding
> >>> a loop) two vectors each of length N where one contains the strings
> >>> that are first in the pair, and the other contains the strings
> >>> which are second, i.e. from L (as above) I would want to extract:
> >>>
> >>>   V1 = c("A1","A2","A3",...)
> >>>   V2 = c("B1","B2","B3",...)
> >>>
> >>> Suggestions?
> >>>
> >>> With thanks,
> >>> Ted.
> >>>
> >>> -------------------------------------------------
> >>> E-Mail: (Ted Harding) <ted.hard...@wlandres.net>
> >>> Date: 25-Apr-2013  Time: 11:16:46
> >>> This message was sent by XFMail
> >>>
> >>> ______________________________________________
> >>> R-help@r-project.org mailing list
> >>> https://stat.ethz.ch/mailman/listinfo/r-help
> >>> PLEASE do read the posting guide
> >>> http://www.R-project.org/posting-guide.html
> >>> and provide commented, minimal, self-contained, reproducible code.
> >>>
> >
> > -------------------------------------------------
> > E-Mail: (Ted Harding) <ted.hard...@wlandres.net>
> > Date: 25-Apr-2013  Time: 11:31:57
> > This message was sent by XFMail
> >
> 
> 
> 
> --
> 
> Bert Gunter
> Genentech Nonclinical Biostatistics
> 
> Internal Contact Info:
> Phone: 467-7374
> Website:
> http://pharmadevelopment.roche.com/index/pdb/pdb-functional-groups/pdb-biostatistics/pdb-ncb-home.htm
> 

