You can study this yourself using the system.time() function (note the
lowercase "s": R is case sensitive): just wrap any block of code in
system.time() and R will time it for you.
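
As a minimal sketch of what that looks like, assuming some made-up data in
place of your files (the data frame temp below is simulated, and the 1000
repetitions are only there because a single t.test is too fast to time
reliably):

```r
# Simulated stand-in for one of the files: 20 rows, 5 numeric columns
set.seed(1)
temp <- as.data.frame(matrix(rnorm(100), nrow = 20))

# Wrap the code to be timed in system.time(); repeat it so the
# elapsed time is large enough to measure
timing <- system.time(
  for (r in 1:1000) {
    p <- t.test(temp[, 2], temp[, 3], var.equal = TRUE)$p.value
  }
)

print(timing)  # user, system and elapsed times, in seconds
```

Running the same thing with the Example1 style inside the loop should let
you compare the two directly on your own machine.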

Offhand, I'd guess example2 may be ever so slightly quicker since it doesn't
have to create colA and colB, but not to a degree that would be noticeable
for reasonably sized data. More importantly, you should probably notice that
the examples give different output: one puts just the p.value of the t.test
in tt_pvalue while the other puts the entire t.test object. You probably
meant

Example2:
tt_pvalue [ i ] <- t.test ( temp[ , j ], temp[ , k ],
var.equal=TRUE)$p.value

If you are a beginner, I'd strongly suggest you wait the extra 3.2
milliseconds and use code like example one: it will be easier to debug.

In your second block of code, you wind up t-testing a column against itself
whenever j equals k, and because tt_pvalue is indexed only by i, each new
p.value overwrites the previous one, so only the last test for each file
survives. Is this actual code or are you more interested in how something
would be vectorized? If the first, write back and I'll talk to you about
storing the results and doing the tests in a logical manner.
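
For orientation only, here is one rough way the pairing and storage could be
organized for a single file. This is a sketch under assumptions, not
necessarily what you need: the data are simulated, the names cols, pairs,
pvals and results are made up for illustration, and it assumes you want each
pair of columns from 2 onward tested exactly once.

```r
# Simulated stand-in for one file: column 1 is skipped, as in the
# original loops, and columns 2 to 4 are compared
set.seed(42)
temp <- as.data.frame(matrix(rnorm(60), nrow = 15))

cols  <- 2:ncol(temp)
pairs <- combn(cols, 2)  # each column of `pairs` is one (j, k) with j < k

# One p-value per pair; no column is ever tested against itself
pvals <- apply(pairs, 2, function(jk) {
  t.test(temp[, jk[1]], temp[, jk[2]], var.equal = TRUE)$p.value
})

# Keep the column indices next to each p-value so nothing is overwritten
results <- data.frame(j = pairs[1, ], k = pairs[2, ], p.value = pvals)
print(results)
```

The point of the data frame is simply that every test gets its own row, so
you can always tell which two columns a given p.value came from.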

If you are only interested from a coding efficiency point of view, the first
for loop over all the files is probably best replaced by

L <- lapply(files_to_test, read.table, header=TRUE, sep="\t")

This will create a list object L containing all of the file information, one
data frame per file. List objects are basically R's way of sticking any
combination of objects together in one big "super-object" that can contain
anything. (I'm sure the code experts will want to correct me, but for a
beginner I think that gives sufficient intuition.)
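
To make that concrete, here is a small self-contained sketch. It writes two
made-up tab-separated files to a temporary directory purely so there is
something to read; tmp_dir, the file names and the column names are all
invented for the example, and in real use files_to_test would be your own
vector of file paths.

```r
# Create two small tab-separated files with made-up data
tmp_dir <- tempfile("ttest_demo")
dir.create(tmp_dir)
for (n in 1:2) {
  d <- data.frame(id = 1:5, a = rnorm(5), b = rnorm(5))
  write.table(d, file.path(tmp_dir, paste0("file", n, ".txt")),
              sep = "\t", row.names = FALSE, quote = FALSE)
}
files_to_test <- list.files(tmp_dir, full.names = TRUE)

# Read every file in one go; the result is a list of data frames
L <- lapply(files_to_test, read.table, header = TRUE, sep = "\t")

length(L)    # one element per file
str(L[[1]])  # double brackets pull out a single data frame

# The same idea applies a function to every file at once
ncols <- sapply(L, ncol)
```

Note the double brackets: L[[1]] is the first data frame itself, while L[1]
is a list of length one that contains it.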

Once you have everything in R you have a wealth of opportunities depending
on what you want to do: there's an open thread started by J. Bouldin on how
to do things columnwise over different objects most efficiently in R right
now that will hopefully get some good answers. Let me know if there's a
specific thing you want to wind up doing and I'll try to give you a hand: if
it's just a theoretical interest, keep an eye on the other thread.

Hope this helps,

Michael Weylandt


On Thu, Aug 4, 2011 at 11:19 PM, Matt Curcio <matt.curcio...@gmail.com> wrote:

> Greetings all,
> I am curious to know if either of these two sets of code is more efficient?
>
> Example1:
>  ## t-test ##
> colA <- temp [ , j ]
> colB <- temp [ , k ]
> ttr <- t.test ( colA, colB, var.equal=TRUE)
> tt_pvalue [ i ] <- ttr$p.value
>
> or
> Example2:
> tt_pvalue [ i ] <- t.test ( temp[ , j ], temp[ , k ], var.equal=TRUE)
> -------------
> I have three loops, i, j, k.
> One to test the all of <i> files in a directory.  One to tease out
> column <j> and compare it by means of t-test to column <k> in each of
> the files.
> ---------------
> for ( i in 1:num_files ) {
>   temp <- read.table ( files_to_test [ i ], header=TRUE, sep="\t")
>   num_cols <- ncol ( temp )
>   ## Define Columns To Compare ##
>   for ( j in 2 : num_cols ) {
>      for ( k in 3 : num_cols ) {
>          ## t-test ##
>          colA <- temp [ , j ]
>          colB <- temp [ , k ]
>          ttr <- t.test ( colA, colB, var.equal=TRUE)
>          tt_pvalue [ i ] <- ttr$p.value
>      }
>   }
> }
> --------------------------------
> I am a novice writer of code and am interested to hear if there are
> any (dis)advantages to one way or the other.
> M
>
>
> Matt Curcio
> M: 401-316-5358
> E: matt.curcio...@gmail.com
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide
> http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
