Thank you, Greg, for the insights! I agree with you that the small speed gain from length() is not worth the loss in readability, and I'll change my length() calls to ncol().
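For the record, a minimal sketch of the swap, assuming a data frame shaped like what read.csv() returns. The toy data and the names `df` and `m` are made up for illustration. It also shows the matrix case, where the two functions stop agreeing:

```r
# Toy data frame standing in for the result of read.csv() (hypothetical data)
df <- data.frame(id = 1:3,
                 value = c(2.5, 3.1, 4.8),
                 label = c("a", "b", "c"))

# On a plain data frame, both calls count the columns
length(df)  # 3
ncol(df)    # 3

# On a matrix they differ: length() counts cells, ncol() counts columns
m <- as.matrix(df[, c("id", "value")])
length(m)   # 6 (3 rows x 2 columns)
ncol(m)     # 2
```

So the switch should be safe as long as the object really is a data frame, and ncol() keeps giving the intended answer if it ever becomes a matrix.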
Best,
Ivan

--
Dr. Ivan Calandra
TraCEr, laboratory for Traceology and Controlled Experiments
MONREPOS Archaeological Research Centre and
Museum for Human Behavioural Evolution
Schloss Monrepos
56567 Neuwied, Germany
+49 (0) 2631 9772-243
https://www.researchgate.net/profile/Ivan_Calandra

On 03/04/2020 17:45, Greg Snow wrote:
> As others have pointed out, ncol calls the length function, so you are
> pretty safe in terms of getting the same result when applied to the
> results of functions like read.csv (there will be a big difference if
> you ever apply those functions to a matrix or some other data
> structure).
>
> One thing that I have not seen yet is a comparison on timing, so here goes:
>
>> library(microbenchmark)
>> microbenchmark(
> +   length = length(iris),
> +   ncol = ncol(iris)
> + )
> Unit: nanoseconds
>    expr  min   lq mean median   uq   max neval
>  length  700  750  869    800  800  7400   100
>    ncol 2400 2500 2981   2600 2700 31900   100
>
> So ncol takes about 3 times as long to run as length on the iris data
> frame (5 columns); you can rerun the above code with data frames closer
> to the size that you will be using to see if that makes any difference.
> But also notice that the units are nanoseconds, so the median time for
> ncol to run is less than the time it takes light to travel a kilometer
> in a vacuum, or about the time it takes light to go 1/3 of a mile
> through a fiber optic cable (en.wikipedia.org/wiki/Microsecond). If
> this is used as part of a simulation or other repeated procedure and
> it is done one million times then you will add about 2 seconds to the
> overall run. If this is just part of code where length/ncol will be
> called fewer than 10 times then nobody is going to notice.
>
> So the trade-off of moving from length to ncol is a slight decrease in
> speed for an increase of readability. I think that I would go with
> the readability myself.
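Following up on the suggestion to rerun the comparison on data frames closer to real size, here is a rough sketch using only base R (system.time() in place of microbenchmark, so no extra package is needed). The data frame `wide`, its dimensions, and the call count are invented for illustration. Timings vary by machine and include loop overhead, so no results are shown:

```r
# Hypothetical wide data frame: 1000 rows, 500 numeric columns
wide <- as.data.frame(matrix(rnorm(1000 * 500), nrow = 1000))

n_calls <- 1e5  # number of repeated calls to time

# Elapsed wall-clock seconds for n_calls calls of each function
t_length <- system.time(for (i in seq_len(n_calls)) length(wide))[["elapsed"]]
t_ncol   <- system.time(for (i in seq_len(n_calls)) ncol(wide))[["elapsed"]]

# Approximate per-call cost in nanoseconds (loop overhead included)
per_call_ns <- c(length = t_length, ncol = t_ncol) / n_calls * 1e9
per_call_ns
```

This is coarser than microbenchmark, which reports a distribution per call, but it should be enough to see whether the gap changes with the number of columns.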
>
> On Tue, Mar 31, 2020 at 8:11 AM Ivan Calandra <calan...@rgzm.de> wrote:
>> Thanks Ivan for the answer.
>>
>> So it confirms my first thought that these two functions are equivalent
>> when applied to a "simple" data.frame.
>>
>> The reason I was asking is because I have gotten used to using length()
>> in my scripts. It works perfectly and I understand it easily. But to be
>> honest, ncol() is more intuitive to most users (especially the novice)
>> so I was thinking about switching to using this function instead (all my
>> data.frames are created from read.csv() or similar functions so there
>> should not be any issue). But before doing that, I want to be sure that
>> it is not going to create unexpected results.
>>
>> Thank you,
>> Ivan
>>
>> --
>> Dr. Ivan Calandra
>> TraCEr, laboratory for Traceology and Controlled Experiments
>> MONREPOS Archaeological Research Centre and
>> Museum for Human Behavioural Evolution
>> Schloss Monrepos
>> 56567 Neuwied, Germany
>> +49 (0) 2631 9772-243
>> https://www.researchgate.net/profile/Ivan_Calandra
>>
>> On 31/03/2020 16:00, Ivan Krylov wrote:
>>> On Tue, 31 Mar 2020 14:47:54 +0200
>>> Ivan Calandra <calan...@rgzm.de> wrote:
>>>
>>>> On a simple data.frame (i.e. each element is a vector), ncol() and
>>>> length() will give the same result.
>>>> Are they just equivalent on such objects, or are there differences in
>>>> some cases?
>>>
>>> I am not aware of any exceptions to ncol(dataframe)==length(dataframe)
>>> (in fact, ncol(x) is dim(x)[2L] and ?dim says that dim(dataframe)
>>> returns c(length(attr(dataframe, 'row.names')), length(dataframe))), but
>>> watch out for AsIs columns which can have columns of their own:
>>>
>>> x <- data.frame(I(volcano))
>>> dim(x)
>>> # [1] 87 1
>>> length(x)
>>> # [1] 1
>>> dim(x[,1])
>>> # [1] 87 61
>>>
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.