[R] Splitting dataset for Tuning Parameter with Cross Validation
Hi,

My question might be a little general. I have a number of values to select for the complexity parameters of a classifier, e.g. C and gamma in an SVM with an RBF kernel. The selection is based on which values give the smallest cross-validation error.

I wonder: is the randomized splitting of the available dataset into folds done only once for all those candidate parameter values, or once for each candidate? And why?

Thanks and regards!

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] Splitting dataset for Tuning Parameter with Cross Validation
Seems to me that if we split once for all choices, the bias will be big, and if we split once for each choice of parameters, the variance will be big.

In LibSVM, for each choice of (C, gamma), the search script grid.py calls svm_cross_validation(), which does a random split of the dataset. So it seems to me LibSVM uses the second method.

As to the first one, I came across it in Ch. 7, Section 10 of "The Elements of Statistical Learning" by Hastie et al., where it says to first split the dataset, then evaluate the validation error CV(alpha), varying the complexity parameter alpha to find the value giving the smallest validation error. There the splitting appears to be done once for all choices of the complexity parameter.

Thanks!

--- On Sun, 7/12/09, Tim wrote:
> From: Tim
> Subject: [R] Splitting dataset for Tuning Parameter with Cross Validation
> To: r-h...@stat.math.ethz.ch
> Date: Sunday, July 12, 2009, 6:58 PM
>
> Hi,
> My question might be a little general.
>
> I have a number of values to select for the complexity
> parameters in some classifier, e.g. the C and gamma in SVM
> with RBF kernel. The selection is based on which values give
> the smallest cross validation error.
>
> I wonder if the randomized splitting of the available
> dataset into folds is done only once for all those choices
> for the parameter values, or once for each choice? And why?
>
> Thanks and regards!
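To make the two strategies concrete, here is a small sketch I've added (not from the thread; the toy data and the ridge-style shrunken-slope estimator standing in for a "complexity parameter" are made up):

```r
set.seed(1)
n <- 100
x <- runif(n); y <- x + rnorm(n, sd = 0.2)
k <- 5  # number of folds

# k-fold CV error for one candidate value of lambda, given a fold assignment
cv_error <- function(lambda, fold_id) {
  errs <- numeric(k)
  for (f in seq_len(k)) {
    train <- fold_id != f
    # ridge-like shrunken slope: lambda is the complexity parameter
    beta <- sum(x[train] * y[train]) / (sum(x[train]^2) + lambda)
    errs[f] <- mean((y[!train] - beta * x[!train])^2)
  }
  mean(errs)
}

lambdas <- c(0, 0.1, 1, 10)

## Strategy 1: one random split shared by all candidate values
folds <- sample(rep(seq_len(k), length.out = n))
cv1 <- sapply(lambdas, cv_error, fold_id = folds)

## Strategy 2: a fresh random split for each candidate value
cv2 <- sapply(lambdas, function(l)
  cv_error(l, sample(rep(seq_len(k), length.out = n))))
```

With a shared split (cv1), all candidates are compared on exactly the same folds, so differences between them are not confounded with fold-assignment noise; with per-candidate splits (cv2), each error estimate carries its own extra Monte Carlo variability.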
[R] Get (feature, threshold) from Output of rpart() for Stump Tree
Hi,

I have a question regarding how to get some partial information from the output of rpart(), which could be used as the first argument to predict(). For example, in my code I try to learn a stump tree (decision tree of depth 2):

fit <- rpart(y ~ bx, weights = w/mean(w), control = cntrl)
print(fit)
btest[1,] <- predict(fit, newdata = data.frame(bx))

I found that "fit" is of mode "list" and length 12. If I print(fit), I get as output:

n= 124

node), split, n, deviance, yval
      * denotes terminal node

1) root 124 61.54839 0.7096774
  2) bx.21< 13.5 41 40.39024 0.1219512 *
  3) bx.21>=13.5 83  0.0 1.000 *

I don't want the whole output of print(fit), only two pieces of information in it: the "21" in "bx.21", which I believe to be the feature ID of the stump tree, and 13.5, which I believe to be the threshold on that feature. If I am able to get these two out, then I will be able to further process them or write them into a file.

Any hint? Thanks and regards!

-Tim
Re: [R] Get (feature, threshold) from Output of rpart() for Stump Tree
Thank you so much! It seems that fit$splits[1,] does not contain the feature ID:

> fit$splits[1,]
    count      ncat   improve     index       adj
  124.000    -1.000 0.3437644    13.500     0.000

However, help(rpart.object) says: "splits: a matrix describing the splits. The row label is the name of the split variable, ..."

I try to get the row label of fit$splits[1,] by:

> names(fit$splits[1,])
[1] "count"   "ncat"    "improve" "index"   "adj"

However, it has no feature ID. Is this the correct way to get the row label of fit$splits[1,]?

Regards,
- Tim

--- On Fri, 5/8/09, Terry Therneau wrote:
From: Terry Therneau
Subject: Re: Get (feature, threshold) from Output of rpart() for Stump Tree
To: "Tim"
Cc: r-help@r-project.org
Date: Friday, May 8, 2009, 8:05 AM

--- begin included message ---
Hi, I have a question regarding how to get some partial information from the output of rpart, which could be used as the first argument to predict. For example, in my code, I try to learn a stump tree (decision tree of depth 2):

fit <- rpart(y ~ bx, weights = w/mean(w), control = cntrl)
--- end inclusion ---

1. For stump trees, you can use the maxdepth option in rpart.control to get a small tree. You also might want to set maxsurrogate=0 for speed.

2. Try help(rpart.object) for more information on what is contained in the returned rpart object. In your case fit$splits[1,] would contain all that you need.

Terry T.
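A follow-up sketch I've added (the toy data are made up): subsetting with fit$splits[1, ] drops the dimnames, which is why names() showed only the column labels. Keep the matrix form and read the row label from it:

```r
library(rpart)

set.seed(1)
bx <- matrix(rnorm(400), ncol = 4)   # toy feature matrix
y  <- as.integer(bx[, 2] > 0.3)      # response driven by column 2
fit <- rpart(y ~ bx,
             control = rpart.control(maxdepth = 1, maxsurrogate = 0))

# the row label of the splits matrix is the split variable's name:
feature   <- rownames(fit$splits)[1]   # e.g. something like "bx.2"
threshold <- fit$splits[1, "index"]    # the cut point on that feature
```

The feature ID can then be recovered from the name, e.g. with `sub("^bx\\.", "", feature)`, and both values written to a file.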
[R] ROCR: auc and logarithm plot
Hi,

I am quite new to R and I have two questions regarding ROCR.

1. I have tried to understand how to extract the area-under-curve value by looking at the ROCR documentation and googling. Still, I am not sure if I am doing the right thing. Here is my code; is "auc1" the AUC value?

pred1 <- prediction(resp1, label1)
perf1 <- performance(pred1, "tpr", "fpr")
plot(perf1, type="l", col=1)
auc1 <- performance(pred1, "auc")
auc1 <- auc1@y.values[[2]]

2. I have to compare two models that have very close ROCs. I'd like to have a more distinguishable plot of the ROCs. So is it possible to have a logarithmic FP axis, which might separate them well? Or to zoom in on the part close to the upper-left corner of the ROC plot? Or any other way to make the ROCs more separate?

Thanks and regards!

--Tim
Re: [R] ROCR: auc and logarithm plot
Thanks Tobias! A new question: if I want to draw an average ROC from cross-validation, how do I make the bar color the same as the line color? Here is my code:

plot(perf2, avg="threshold", lty=2, col=2, spread.estimate="stddev", barcol=2)

Even though I specify barcol=2, the bars are still drawn in black, the default, instead of red (2).

--Tim

--- On Tue, 5/12/09, Tobias Sing wrote:
From: Tobias Sing
Subject: Re: [R] ROCR: auc and logarithm plot
To: timlee...@yahoo.com, r-help@r-project.org
Date: Tuesday, May 12, 2009, 5:54 AM

> 1. I have tried to understand how to extract area-under-curve value by
> looking at the ROCR document and googling. Still I am not sure if I am
> doing the right thing. Here is my code, is "auc1" the auc value?
>
> pred1 <- prediction(resp1,label1)
> perf1 <- performance(pred1,"tpr","fpr")
> plot( perf1, type="l",col=1 )
> auc1 <- performance(pred1,"auc")
> auc1 <- auc1@y.values[[2]]

If you have only one set of predictions and matching class labels, it would be in auc1@y.values[[1]]. If you have multiple sets (as from cross-validation or bootstrapping), the AUCs would be in auc1@y.values[[1]], auc1@y.values[[2]], etc. You can collect all of them, for example, by unlist(auc1@y.values). Btw, you can use str(auc1) to see the structure of objects.

> 2. I have to compare two models that have very close ROCs. I'd like to
> have a more distinguishable plot of the ROCs. So is it possible to have a
> logarithm FP axis which might probably separate them well? Or zoom in the
> part close to the upper-left corner of the ROC plot? Or any other ways to
> make the ROCs more separate?

To "zoom in" on a specific part:

plot(perf1, xlim=c(0,0.2), ylim=c(0.7,1))
plot(perf2, add=TRUE, lty=2, col='red')

If you want logarithmic axes (though I wouldn't personally do this for a ROC plot), you can set up an empty canvas and add ROC curves to it:

plot(1,1, log='x', xlim=c(0.001,1), ylim=c(0,1), type='n')
plot(perf, add=TRUE)

You can adjust all components of the performance plots.
See ?plot.performance and the examples in this slide deck: http://rocr.bioinf.mpi-sb.mpg.de/ROCR_Talk_Tobias_Sing.ppt

Hope that helps,
Tobias
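Putting the pieces of the thread together, a self-contained sketch I've added (the simulated scores and labels are made up):

```r
library(ROCR)

set.seed(42)
scores <- c(rnorm(50, mean = 1), rnorm(50, mean = 0))  # toy classifier scores
labels <- c(rep(1, 50), rep(0, 50))                    # matching class labels

pred <- prediction(scores, labels)
perf <- performance(pred, "tpr", "fpr")
plot(perf, xlim = c(0, 0.2), ylim = c(0.7, 1))         # zoomed-in ROC

# one AUC per set of predictions; with a single set, one number:
auc <- unlist(performance(pred, "auc")@y.values)
```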
[R] Keeping row names when using as.data.frame.matrix
# question: I have the following data set:

Date <- c("9/7/2010","9/7/2010","9/7/2010","9/7/2010","9/7/2010","9/7/2010","9/8/2010")
EstimatedQuantity <- c(3535,2772,3279,3411,3484,3274,3305)
ScowNo <- c("4001","3002","4002","BR 8","4002","BR 8","4001")
dataset <- data.frame(EstimatedQuantity, Date, ScowNo)

# I'm trying to convert the data set into a contingency table and then
# back into a regular data frame:
xtabdata <- as.data.frame.matrix(xtabs(EstimatedQuantity ~ Date + ScowNo, data = dataset),
                                 row.names = dataset$Date, optional = F)

# I'm trying to keep the row names (in xtabdata) as the dates,
# but the row names keep coming up as integers.
# How can I preserve the row names as dates when
# the table is converted back to a data frame?

--
View this message in context: http://r.789695.n4.nabble.com/Keeping-row-names-when-using-as-data-frame-matrix-tp4667344.html
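For what it's worth, a sketch of one likely fix (my addition, not from the thread): as.data.frame.matrix() already carries over the dimnames of the xtabs result, so the row.names argument (a length-7 vector with duplicated dates, while the table has one row per distinct date) can simply be dropped:

```r
Date <- c("9/7/2010","9/7/2010","9/7/2010","9/7/2010","9/7/2010","9/7/2010","9/8/2010")
EstimatedQuantity <- c(3535,2772,3279,3411,3484,3274,3305)
ScowNo <- c("4001","3002","4002","BR 8","4002","BR 8","4001")
dataset <- data.frame(EstimatedQuantity, Date, ScowNo)

tab <- xtabs(EstimatedQuantity ~ Date + ScowNo, data = dataset)
xtabdata <- as.data.frame.matrix(tab)  # no row.names argument
rownames(xtabdata)                     # the dates: "9/7/2010" "9/8/2010"
```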
[R] need help reshaping table using aggregate
I am trying to learn how to reshape my data set. I am new to R, so please bear with me. Basically, I have the following data set:

site <- c("A","A","B","B")
bug <- c("spider","grasshopper","ladybug","stinkbug")
count <- c(2,4,6,8)
myf <- data.frame(site, bug, count)
myf

  site         bug count
1    A      spider     2
2    A grasshopper     4
3    B     ladybug     6
4    B    stinkbug     8

This means that in site A, I found 2 spiders and 4 grasshoppers. In site B, I found 6 ladybugs and 8 stinkbugs. I would like to change the data frame to aggregate the site column and make the bugs columns, so it is arranged like this:

  site spider grasshopper ladybug stinkbug
1    A      2           4       0        0
2    B      0           0       6        8

--
View this message in context: http://r.789695.n4.nabble.com/need-help-reshaping-table-using-aggregate-tp4634014.html
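One way to get there (a sketch I've added; the thread as archived doesn't contain the answer) is the xtabs() idiom, which fills absent site/bug combinations with zeros:

```r
site <- c("A","A","B","B")
bug <- c("spider","grasshopper","ladybug","stinkbug")
count <- c(2,4,6,8)
myf <- data.frame(site, bug, count)

# cross-tabulate counts by site and bug; missing combinations become 0
wide <- as.data.frame.matrix(xtabs(count ~ site + bug, data = myf))
wide <- data.frame(site = rownames(wide), wide, row.names = NULL)
# note: the bug columns come out in alphabetical order
# (grasshopper, ladybug, spider, stinkbug)
```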
[R] Installation: not creating necessary directories
I have tried installing R on a web server on which I have a user account but not root access. I checked, and the Perl, Fortran, etc. prerequisites all seem in order. Configuring R with:

% ./configure --with-x=no

works fine without errors. When I try "make check", however, I soon get an error as it cannot find files which should have been made. Most of the files seem to have come through, though.

...
collecting examples for package 'base' ...
make[5]: Entering directory `/home/USERACCOUNT/mybin/R-2.8.0/src/library'
/bin/sh: ../../bin/R: No such file or directory
make[5]: *** [Rdfiles] Error 127
make[5]: Leaving directory `/home/USERACCOUNT/mybin/R-2.8.0/src/library'
file ../../library/base/R-ex cannot be opened at ../../share/perl/massage-Examples.pl line 136.
...

Is there any advice on what might be happening or on what I might need to do?

Thanks,
Tim
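For reference, ../../bin/R is only created by a full `make`, so `make check` can only run after the build itself has completed; the usual sequence for a non-root, user-local build looks like this (a hedged sketch; the --prefix path is hypothetical):

```shell
cd ~/mybin/R-2.8.0
./configure --with-x=no --prefix="$HOME/local"
make                 # builds src and bin/R, which "make check" needs
make check           # only meaningful after a successful full build
make install         # installs under $HOME/local; no root required
export PATH="$HOME/local/bin:$PATH"
```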
[R] CRAN I-MR / Xbar-R / Xbar-S control chart package ?
Hi,

I've had a quick look through the package list and, unless I've missed something, I can't seem to find anything that will do I-MR / Xbar-R / Xbar-S control charts.

Assuming there is something out there, can anyone point me in the right direction?

Thanks!

Tim
Re: [R] CRAN I-MR / Xbar-R / Xbar-S control chart package ?
Interesting, thanks for that. I came across qcc, but my quick scan of the docs suggested it only did xbar charts. Maybe I need to re-read the docs; I guess it does the individuals plot versions (I-MR) too.

Tim

On 8 July 2017 at 20:53, Rui Barradas wrote:
> Hello,
>
> I have no experience with I-MR charts, but a google search found package qcc.
> Maybe it's what you're looking for.
>
> Hope this helps,
>
> Rui Barradas
>
> On 08-07-2017 09:07, Tim Smith wrote:
>>
>> Hi,
>>
>> I've had a quick look through the package list, and unless I've missed
>> something, I can't seem to find anything that will do I-MR / Xbar-R /
>> Xbar-S control charts ?
>>
>> Assuming there is something out there, can anyone point me in the
>> right direction ?
>>
>> Thanks !
>>
>> Tim
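For the record, qcc covers the subgroup charts as well as individuals; a minimal sketch I've added (made-up data):

```r
library(qcc)

set.seed(7)
x <- matrix(rnorm(100, mean = 10), ncol = 5)  # 20 subgroups of size 5

qcc(x, type = "xbar")   # Xbar chart
qcc(x, type = "R")      # R chart
qcc(x, type = "S")      # S chart

# individuals chart (the "I" in I-MR) for ungrouped observations:
qcc(rnorm(30, mean = 10), type = "xbar.one")
```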
[R] Interesting behavior of lm() with small, problematic data sets
I've recently come across the following results reported by the lm() function when applied to a particular type of admittedly difficult data. When working with small data sets (for instance 3 points) that have the same response for different values of the predictor, the resulting slope estimate is a reasonable approximation of the expected 0.0, but the p-value of that slope estimate is a surprising value. A reproducible example is included below, along with the output of the summary of results.

# example code
x <- c(1,2,3)
y <- c(1,1,1)
# above results in the {(1,1), (2,1), (3,1)} data set to regress
new.rez <- lm(y ~ x)   # regress constant y on changing x
summary(new.rez)       # display results of regression
# end of example code

Results:

Call:
lm(formula = y ~ x)

Residuals:
         1          2          3
 5.906e-17 -1.181e-16  5.906e-17

Coefficients:
              Estimate Std. Error    t value Pr(>|t|)
(Intercept)  1.000e+00  2.210e-16  4.525e+15   <2e-16 ***
x           -1.772e-16  1.023e-16 -1.732e+00    0.333
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 1.447e-16 on 1 degrees of freedom
Multiple R-squared: 0.7794, Adjusted R-squared: 0.5589
F-statistic: 3.534 on 1 and 1 DF, p-value: 0.3112

Warning message:
In summary.lm(new.rez) : essentially perfect fit: summary may be unreliable

There is a warning that the summary may be unreliable due to the essentially perfect fit, but a p-value of 0.3112 doesn't seem reasonable. As a side note, the various R-squared values seem odd too.

Tim Glover
Senior Scientist II (Geochemistry, Statistics), Americas - Environment & Infrastructure, Amec Foster Wheeler
271 Mill Road, Chelmsford, Massachusetts, USA 01824-4105
T +01 978 692 9090  D +01 978 392 5383  M +01 850 445 5039
tim.glo...@amecfw.com  amecfw.com
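An explanatory sketch I've added (not part of the original post): with a constant response, both the slope estimate and its standard error are at the scale of floating-point rounding error, so their ratio (the t statistic) and hence the p-value are numerical noise rather than meaningful quantities, which is exactly what the "essentially perfect fit" warning is flagging:

```r
x <- c(1, 2, 3)
y <- c(1, 1, 1)
s <- summary(lm(y ~ x))

coef(s)["x", "Estimate"]     # on the order of 1e-16: pure rounding error
coef(s)["x", "Std. Error"]   # also on the order of 1e-16
.Machine$double.eps          # ~2.2e-16, the machine precision, for scale
```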
[R] rtmvnorm {tmvtnorm} seems broken
General linear constraints don't seem to work. I get an error message if I have more constraint equations than variables. E.g. executing the following code

print(R.version)
library('tmvtnorm')
cat('tmvtnorm version ')
print(packageVersion('tmvtnorm'))

## Let's constrain our sample to the dwarfed hypercube of dimension p.
p <- 3 # dimension
mean <- rep(0,p)
sigma <- diag(p)
## a <= Dx <= b
a <- c(rep(0,p), -Inf)
b <- c(rep(1,p), 2)
D <- rbind(diag(p), rep(1,p))
cat('mean is\n'); print(mean)
cat('a is\n'); print(a)
cat('b is\n'); print(b)
cat('D is\n'); print(D)
X <- rtmvnorm(n=1000, mean, sigma, D=D, lower=a, upper=b, algorithm="gibbsR")

produces the following output:

platform       x86_64-w64-mingw32
arch           x86_64
os             mingw32
system         x86_64, mingw32
status
major          3
minor          1.0
year           2014
month          04
day            10
svn rev        65387
language       R
version.string R version 3.1.0 (2014-04-10)
nickname       Spring Dance
tmvtnorm version [1] '1.4.9'
mean is
[1] 0 0 0
a is
[1]    0    0    0 -Inf
b is
[1] 1 1 1 2
D is
     [,1] [,2] [,3]
[1,]    1    0    0
[2,]    0    1    0
[3,]    0    0    1
[4,]    1    1    1
Error in checkTmvArgs(mean, sigma, lower, upper) (from rtmvnorm-test.R#18) :
  mean, lower and upper must have the same length

That error message is not appropriate when a matrix of linear constraints is passed in. I emailed the package maintainer on the 3rd but received only an automatic out-of-office reply.
[R] read.table with missing data and consecutive delimiters
All,

Assume we have data in an ASCII file that looks like:

Var1$Var2$Var3$Var4
1$2$3$4
2$$5
$$$6

When I execute

read.table('test.dat', header=TRUE, sep='$')

I, of course, receive the following error:

Error in scan(file, what, nmax, sep, dec, quote, skip, nlines, na.strings, :
  line 2 did not have 4 elements

When I set fill=TRUE, e.g.

read.table('test.dat', header=TRUE, sep='$', fill=TRUE)

I get:

  Var1 Var2 Var3 Var4
1    1    2    3    4
2    2   NA    5   NA
3   NA   NA   NA    6

What I need is:

  Var1 Var2 Var3 Var4
1    1    2    3    4
2    2   NA   NA    5
3   NA   NA   NA    6

What am I missing?

Thanks,
Tim
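read.table() cannot infer which field a short line omits, so fill=TRUE always pads on the right. If the intended rule is that the last value of a short row belongs in the last column, one workaround (my sketch, with that assumption made explicit) is to parse the lines manually:

```r
# assumption: in a row with fewer than 4 fields, the final value
# belongs in the final column, and the gap sits just before it
lines  <- c("1$2$3$4", "2$$5", "$$$6")   # data rows of test.dat
fields <- strsplit(lines, "\\$")

to_row <- function(f, ncol = 4) {
  f[f == ""] <- NA
  out <- rep(NA_character_, ncol)
  if (length(f) < ncol) {
    out[seq_len(length(f) - 1)] <- f[seq_len(length(f) - 1)]
    out[ncol] <- f[length(f)]            # short row: last value -> last column
  } else {
    out <- f
  }
  out
}

m  <- t(vapply(fields, to_row, character(4)))
df <- data.frame(apply(m, 2, as.numeric))
names(df) <- c("Var1", "Var2", "Var3", "Var4")
```

With the three data rows above, the second row comes out as 2, NA, NA, 5, matching the desired layout.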
[R] problem with labeling plots, possibly in font defaults
Hello,

I am using R 3.1.1 on a (four-year-old) MacBook running OS X 10.9.4. I just tried making and labeling a plot as follows:

> x <- rnorm(10)
> y <- rnorm(10)
> plot(x,y)
> title(main="random points")

which produces a scatter plot of the random points, but without the title and without any numbers along the axes. If I then run

> par(family="sans")
> plot(x,y,main="plot title")

the plot has the title and the numbers on the axes (also, 'x' and 'y' appear as default labels for the axes).

I do not know what is going on, but maybe there is some problem in the default font settings (I don't know if that could be an R issue or an issue specific to my Mac)? This is clearly not a big problem (at least for now), since I can put labels on plots by running par(), but if it is indicative of a larger underlying problem (or if there is a simple fix) I would like to know.

Thank you!
tb
[R] How can I overwrite a method in R?
How can I create an improved version of a method in R, and have it be used?

Short version: I think plot.histogram has a bug, and I'd like to try a version with a fix. But when I call hist(), my fixed version doesn't get used.

Long version: hist() calls plot(), which calls plot.histogram(), which fails to pass ... when it calls plot.window(). As a result, hist() ignores xaxs and yaxs arguments. I'd like to make my own copy of plot.histogram that passes ... to plot.window().

If I just make my own copy of plot.histogram, plot() ignores it, because my version is not part of the same graphics package that plot belongs to.

If I copy hist, hist.default, and plot, the copies inherit the same environments as the originals, and behave the same.

If I also change the environment of each to .GlobalEnv, hist.default fails in a .Call because it cannot find C_BinCount.
Re: [R] How can I overwrite a method in R?
Thank you Duncan, Brian, Hadley, and Lin.

In Lin's suggestion, I believe the latter two statements should be reversed, so that the environment is set before the function is placed into the graphics namespace:

source('plot.histogram.R')
environment(plot.histogram) <- asNamespace('graphics')
assignInNamespace('plot.histogram', plot.histogram, ns='graphics')

The middle statement could also be

environment(plot.histogram) <- environment(graphics:::plot.histogram)

The point is to ensure that the replacement version has the same environment as the original. Having tested this, I will now submit a bug report :-)

On Thu, Oct 9, 2014 at 9:11 AM, C Lin wrote:
> I posted a similar question a while ago. Search for "modify function in a
> package". In your case, the following should work.
>
> source('plot.histogram.R')
> assignInNamespace('plot.histogram', plot.histogram, ns='graphics')
> environment(plot.histogram) <- asNamespace('graphics')
>
> Assuming you have your own plot.histogram function inside
> "plot.histogram.R" and the plot.histogram function you are trying to
> overwrite is in the graphics package.
>
> Lin
>
> > From: h.wick...@gmail.com
> > Date: Thu, 9 Oct 2014 07:00:31 -0500
> > To: timhesterb...@gmail.com
> > CC: r-h...@stat.math.ethz.ch
> > Subject: Re: [R] How can I overwrite a method in R?
> >
> > This is usually ill-advised, but I think it's the right solution for
> > your problem:
> >
> > assignInNamespace("plot.histogram", function(...) plot(1:10), "graphics")
> > hist(1:10)
> >
> > Hadley
> >
> > On Thu, Oct 9, 2014 at 1:14 AM, Tim Hesterberg wrote:
> >> How can I create an improved version of a method in R, and have it be used?
> >>
> >> Short version:
> >> I think plot.histogram has a bug, and I'd like to try a version with a fix.
> >> But when I call hist(), my fixed version doesn't get used.
> >>
> >> Long version:
> >> hist() calls plot() which calls plot.histogram() which fails to pass ...
> >> when it calls plot.window().
> >> As a result hist() ignores xaxs and yaxs arguments.
> >> I'd like to make my own copy of plot.histogram that passes ... to
> >> plot.window().
> >>
> >> If I just make my own copy of plot.histogram, plot() ignores it, because
> >> my version is not part of the same graphics package that plot belongs to.
> >>
> >> If I copy hist, hist.default and plot, the copies inherit the same
> >> environments as the originals, and behave the same.
> >>
> >> If I also change the environment of each to .GlobalEnv, hist.default
> >> fails in a .Call because it cannot find C_BinCount.
> >
> > --
> > http://had.co.nz/

--
Tim Hesterberg  http://www.timhesterberg.net
(resampling, water bottle rockets, computers to Costa Rica,
hot shower = 2650 light bulbs, ...)

Help your students understand statistics:
Mathematical Statistics with Resampling and R, Chihara & Hesterberg
http://www.timhesterberg.net/bootstrap/
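To summarize the working recipe in one place (a sketch I've added; my_plot_histogram and its body are hypothetical stand-ins for the fixed copy):

```r
# my_plot_histogram stands in for your patched copy of plot.histogram
my_plot_histogram <- function(x, ...) {
  # ... patched body that forwards ... through to plot.window() ...
}

# first give the replacement the namespace environment of the original,
# so it can see graphics internals (like the original's .Call targets):
environment(my_plot_histogram) <- asNamespace("graphics")

# then install it into the namespace, so internal callers (plot dispatch
# from hist()) pick it up:
assignInNamespace("plot.histogram", my_plot_histogram, ns = "graphics")
```

The order matters: set the environment before the assignment, so that the copy that lands in the namespace already closes over the right environment.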
[R] big datasets for R
Another source of large datasets is the Public Data Sets on AWS:
http://aws.amazon.com/public-data-sets/

Tim Hoolihan
@thoolihan
http://linkedin.com/in/timhoolihan

On Oct 28, 2014, at 7:00 AM, r-help-requ...@r-project.org wrote:
> Message: 2
> Date: Mon, 27 Oct 2014 13:37:12 +0100
> From: Qiong Cai
> To: r-help@r-project.org
> Subject: [R] big datasets for R
>
> Hi,
>
> Could anyone please tell me where I can find very big datasets for R? I'd
> like to do some benchmarking on R by stressing R a lot.
>
> Thanks
> Qiong
[R] Download showing as exploit
Hi,

I work for the NHS, and our IT service has been unable to download R, as its anti-virus software says the download contains an exploit. Is this normal? Is there a way around this?

Kind regards,
Tim Kingston
[R] Problem installing/loading packages from Africa
I'm very new to R and I live in Mali, west Africa. I'm on OS X 10.7.5. I downloaded and installed R 3.2.1, then RStudio 0.99.893. I ran through the free Microsoft DataCamp intro to R course, then started another free course through edX, for Data and Statistics for Life Sciences, using R. The first prompt in the course is to install.packages("swirl"). Copied below are the various error messages I get when trying to install or load any package.

My best guess is that the problems I'm having are due to being in west Africa, with unreliable, weak connections and no CRAN mirror closer than Italy or Spain (as far as I know). I checked into common package errors on the RStudio page, but I'm not confident enough in my computing to get into internet proxies and some of the other suggested troubleshooting. Any insight would be very helpful. See the error messages below.

Thanks,
Tim

When attempting to install, the print-out I get in the RStudio console is some length of:

> install.packages("swirl")
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:--  0:00:10 --:--:--     0
[... many similar lines of curl download-progress output ...]
100  207k  100  207k    0     0  10363      0  0:00:20  0:00:20 --:--:-- 23922

The downloaded binary packages are in
/var/folders/yb/7z339kn56mdbwx92ydmsqswcgn/T//RtmpBzQ16u/downloaded_packages

For other packages, ggplot2 for example, a similar printout can run for hundreds and hundreds of lines. So RStudio tells me that the packages are available to load, BUT:

- When loading any package, an error comes up saying package 'name of package' was built under R version 3.2.5, but the info on swirl says it should work on any version of R above 3.0.2. I get similar errors for other packages (treemap, ggplot2, etc.).

- Or sometimes I'll get this:

Error : .onAttach failed in attachNamespace() for 'swirl', details:
  call: stri_c(..., sep = sep, collapse = collapse, ignore_null = TRUE)
  error: object 'C_stri_join' not found
In addition: Warning message:
package ‘swirl’ was built under R version 3.2.5
Error: package or namespace load failed for ‘swirl’

Any suggestions?
[R] [R-pkgs] CRAN updates: rpg and odeintr
rpg is a package for working with PostgreSQL: https://github.com/thk686/rpg

odeintr is a package for integrating differential equations: https://github.com/thk686/odeintr

Cheers,
THK
http://www.keittlab.org/

___
R-packages mailing list
r-packa...@r-project.org
https://stat.ethz.ch/mailman/listinfo/r-packages
Re: [R] Coefficients of Logistic Regression from bootstrap - how to get them?
I'll address the question of whether you can use the bootstrap to improve estimates, and whether you can use the bootstrap to "virtually increase the size of the sample".

Short answer - no, with some exceptions (bagging / random forests).

Longer answer: Suppose you have data (x1, ..., xn) and a statistic ThetaHat, that you take a number of bootstrap samples (all of size n), and let ThetaHatBar be the average of the bootstrap statistics from those samples. Is ThetaHatBar better than ThetaHat? Usually not. Usually it is worse. You have not collected any new data; you are just using the existing data in a different way, which is usually harmful:

* If the statistic is the sample mean, all this does is add some noise to the estimate.
* If the statistic is nonlinear, this gives an estimate that has roughly double the bias, without improving the variance.

What are the exceptions? The prime example is tree models (random forests): taking bootstrap averages helps smooth out the discontinuities in tree models. For a simple example, suppose that a simple linear regression model really holds:

y = beta x + epsilon

but that you fit a tree model; the tree model predictions are a step function. If you bootstrap the data, the boundaries of the step function will differ from one sample to another, so the average over the bootstrap samples smears out the steps, getting closer to the smooth linear relationship.

Aside from such exceptions, the bootstrap is used for inference (bias, standard error, confidence intervals), not improving on ThetaHat.

Tim Hesterberg

>Hi Doran,
>
>Maybe I am wrong, but I think bootstrap is a general resampling method which
>can be used for different purposes... Usually it works well when you do not
>have a representative sample set (maybe with a limited number of samples).
>Therefore, I am positive with Michal...
> >P.S., overfitting, in my opinion, is used to depict when you got a model >which is quite specific for the training dataset but cannot be generalized >with new samples.. > >Thanks, > >--Jerry >2008/7/21 Doran, Harold <[EMAIL PROTECTED]>: > >> > I used bootstrap to virtually increase the size of my >> > dataset, it should result in estimates more close to that >> > from the population - isn't it the purpose of bootstrap? >> >> No, not really. The bootstrap is a resampling method for variance >> estimation. It is often used when there is not an easy way, or a closed >> form expression, for estimating the sampling variance of a statistic.
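Hesterberg's sample-mean point is easy to check numerically. A minimal base-R sketch (my own illustration, not from the thread): averaging B bootstrap replicates of the mean just reproduces the sample mean plus simulation noise, while the spread of the replicates is the genuinely useful quantity (a bootstrap standard error).

```r
set.seed(42)
x <- rnorm(50, mean = 10)          # the observed sample
theta.hat <- mean(x)               # the ordinary estimate

B <- 2000
boot.means <- replicate(B, mean(sample(x, replace = TRUE)))
theta.hat.bar <- mean(boot.means)  # the "bootstrap-improved" estimate?

theta.hat.bar - theta.hat          # essentially zero: nothing was gained
sd(boot.means)                     # what the bootstrap is really for: a SE
```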
[R] Vista problem -- can't type commands at prompt
Hi All - I recently moved to Vista and reinstalled R. I am able to run R as I typically do (R.exe from the command prompt), and it works well. However, if I switch windows to, say, Firefox or Excel or anything else, when I return to the R prompt it no longer works. I am able to use the up and down arrow keys to access previous commands, but no other keystroke has any effect. As long as I leave the R window active, it continues to work as expected. I have tried the batchfiles "el R" method, which doesn't work. I have disabled UAC and am an admin on this machine. Thanks for any help you might be able to provide. tim -- Tim Calkins 0406 753 997
[R] Question about type conversion in read.table with columns that contain "+" and "-" in R > 2.7
Somewhere between R versions 2.6 and 2.7 the behaviour of the function type.convert, and therefore also of read.table, read.csv, etc., changed (see below):

In 2.6 and before:
> type.convert(c("+", "-", "+"))
[1] + - +
Levels: + -

In 2.7 and later:
> type.convert(c("+", "-", "+"))
[1] 0 0 0

Apparently, the character strings "+" and "-" are now interpreted as numeric and no longer as factors or character strings. I have quite a number of files with columns that contain "+" or "-" and would like to convert these to characters or factors, without having to specify the individual column types manually. Is there any way to still do so in a new version of R? Many thanks and best wishes, Tim
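One workaround (my own suggestion, not a reply from the list): bypass type.convert for the affected columns by forcing them to character with colClasses, then re-create factors afterwards if needed. The two-column layout below is hypothetical.

```r
# Hypothetical two-column file: one integer column, one "+"/"-" column
tmp <- textConnection("a b
1 +
2 -
3 +")
d <- read.table(tmp, header = TRUE,
                colClasses = c("integer", "character"))
close(tmp)

d$b                  # stays "+" "-" "+" -- never passed to type.convert
d$b <- factor(d$b)   # turn back into a factor if that is what you need
```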
[R] Counting the number of non-NA values per day
I have a long dataframe ("pollution") that contains a column of hourly date information ("date") and a column of pollution measurements ("pol"). I have been happily calculating daily means and daily maximums using the aggregate function:

DMEANpollution = aggregate(pollution["pol"], format(pollution["date"],"%Y-%j"), mean, na.rm = TRUE)
DMAXpollution = aggregate(pollution["pol"], format(pollution["date"],"%Y-%j"), max, na.rm = TRUE)

However, I also need to count the number of valid measurements for each day to check that the mean and max are likely to be valid (for example, I need at least 18 hourly measurements to calculate a valid daily mean). Try as I might I have not found a simple way of doing this. Can anybody help please? Many thanks, Tim. -- Dr Tim Chatterton Senior Research Fellow Air Quality Management Resource Centre Faculty of Environment and Technology University of the West of England Frenchay Campus Bristol BS16 1QY Tel: 0117 328 2929 Fax: 0117 328 3360 Email: tim.chatter...@uwe.ac.uk
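One way to get the count (my own sketch, not a reply from the thread): reuse the same aggregate() pattern, with sum(!is.na(x)) as the summary function. The toy data below stand in for the real pollution frame.

```r
# Toy stand-in for the real data: 3 days of hourly values, last 12 missing
pollution <- data.frame(
  date = seq(as.POSIXct("2009-01-01 00:00", tz = "UTC"),
             by = "hour", length.out = 72),
  pol  = c(rnorm(60), rep(NA, 12))
)

# Same grouping as for the daily mean/max, but count the non-NA values
Npollution <- aggregate(pollution["pol"],
                        format(pollution["date"], "%Y-%j"),
                        function(x) sum(!is.na(x)))
Npollution$pol        # 24 24 12 valid hours on the three days
Npollution$pol >= 18  # days with enough data for a valid daily mean
```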
[R] REMOVE ME
This mailing list is too intrusive. Remove my name.
[R] Finding minimum of time subset
Dear List, I have a data frame of data taken every few seconds. I would like to subset the data to retain only the data taken on the quarter hour, and as close to the quarter hour as possible. So far I have figured out how to subset the data to the quarter hour, but not how to keep only the minimum time for each quarter hour. For example:

mytime<-c("12:00:00","12:00:05","12:15:05","12:15:06","12:20:00","12:30:01","12:45:01","13:00:00","13:15:02")
subtime<-grep(pattern="[[:digit:]]+[[:punct:]]00[[:punct:]][[:digit:]]+|[[:digit:]]+[[:punct:]]15[[:punct:]][[:digit:]]+|[[:digit:]]+[[:punct:]]30[[:punct:]][[:digit:]]+|[[:digit:]]+[[:punct:]]45[[:punct:]][[:digit:]]+",mytime)
mytime[subtime]
[1] "12:00:00" "12:00:05" "12:15:05" "12:15:06" "12:30:01" "12:45:01" "13:00:00" "13:15:02"

This gives me the data taken at quarter-hour intervals (it removes 12:20:00) but I am still left with multiple values at the quarter hours. I would like to obtain:

"12:00:00" "12:15:05" "12:30:01" "12:45:01" "13:00:00" "13:15:02"

Thanks! Tim Tim Clark Department of Zoology University of Hawaii
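A compact alternative (my own sketch, not one of the replies): with the times sorted ascending, assign each observation to its quarter-hour bin and keep the first observation in each bin.

```r
mytime <- c("12:00:00","12:00:05","12:15:05","12:15:06","12:20:00",
            "12:30:01","12:45:01","13:00:00","13:15:02")
tm <- sort(as.POSIXct(mytime, format = "%H:%M:%S", tz = "UTC"))

bin  <- trunc(as.numeric(tm) / 900)  # 900 s = 15 min; one id per quarter hour
keep <- !duplicated(bin)             # first (earliest) time in each bin
format(tm[keep], "%H:%M:%S")
# "12:00:00" "12:15:05" "12:30:01" "12:45:01" "13:00:00" "13:15:02"
```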
Re: [R] Finding minimum of time subset
Jim, That works great! However, would you please explain what the '[' and the 1 do in the sapply function? I understand that you are cutting x by quarter, then creating a list of x that is split based on those cuts. I just don't understand what "[" means in this context, or what the number one at the end does. Thanks for your help, Tim Tim Clark Department of Zoology University of Hawaii --- On Fri, 8/14/09, jim holtman wrote:
> From: jim holtman
> Subject: Re: [R] Finding minimum of time subset
> To: "Tim Clark"
> Cc: r-help@r-project.org
> Date: Friday, August 14, 2009, 6:18 AM
> Here is one way to do it:
>
> > mytime<-c("12:00:00","12:00:05","12:15:05","12:15:06","12:20:00","12:30:01","12:45:01","13:00:00","13:15:02")
> > # you might want a date on your data
> > x <- as.POSIXct(mytime, format="%H:%M:%S")
> > # create quarter hour intervals for the data range
> > quarter <- seq(trunc(min(x), 'days'), trunc(max(x) + 86400, 'days'), by='15 min') # add 86400 to add a day for truncation
> > # cut the data by quarter hours and then take the first value in each group
> > x.s <- sapply(split(x, cut(x, breaks=quarter), drop=TRUE), '[', 1)
> > # lost the 'class' for some reason; put it back
> > class(x.s) <- c("POSIXt", "POSIXct")
> > # the answer
> > x.s
>       2009-08-14 12:00:00       2009-08-14 12:15:00       2009-08-14 12:30:00
> "2009-08-14 12:00:00 EDT" "2009-08-14 12:15:05 EDT" "2009-08-14 12:30:01 EDT"
>       2009-08-14 12:45:00       2009-08-14 13:00:00       2009-08-14 13:15:00
> "2009-08-14 12:45:01 EDT" "2009-08-14 13:00:00 EDT" "2009-08-14 13:15:02 EDT"
>
> On Thu, Aug 13, 2009 at 4:10 PM, Tim Clark wrote:
> > Dear List,
> >
> > I have a data frame of data taken every few seconds. I would like to subset the data to retain only the data taken on the quarter hour, and as close to the quarter hour as possible. So far I have figured out how to subset the data to the quarter hour, but not how to keep only the minimum time for each quarter hour.
> >
> > For example:
> > mytime<-c("12:00:00","12:00:05","12:15:05","12:15:06","12:20:00","12:30:01","12:45:01","13:00:00","13:15:02")
> > subtime<-grep(pattern="[[:digit:]]+[[:punct:]]00[[:punct:]][[:digit:]]+|[[:digit:]]+[[:punct:]]15[[:punct:]][[:digit:]]+|[[:digit:]]+[[:punct:]]30[[:punct:]][[:digit:]]+|[[:digit:]]+[[:punct:]]45[[:punct:]][[:digit:]]+",mytime)
> > mytime[subtime]
> >
> > [1] "12:00:00" "12:00:05" "12:15:05" "12:15:06" "12:30:01" "12:45:01" "13:00:00" "13:15:02"
> >
> > This gives me the data taken at quarter hour intervals (removes 12:20:00) but I am still left with multiple values at the quarter hours.
> >
> > I would like to obtain:
> >
> > "12:00:00" "12:15:05" "12:30:01" "12:45:01" "13:00:00" "13:15:02"
> >
> > Thanks!
> >
> > Tim
> >
> > Tim Clark
> > Department of Zoology
> > University of Hawaii
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
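Jim's solution from the quoted mail, collected here into a directly runnable script:

```r
mytime <- c("12:00:00","12:00:05","12:15:05","12:15:06","12:20:00",
            "12:30:01","12:45:01","13:00:00","13:15:02")
# you might want a date on your data
x <- as.POSIXct(mytime, format = "%H:%M:%S")

# quarter-hour breaks covering the data range; adding 86400 s (one day)
# before truncating to 'days' keeps the last observation inside the grid
quarter <- seq(trunc(min(x), 'days'), trunc(max(x) + 86400, 'days'),
               by = '15 min')

# cut into quarter-hour groups and take the first value of each group
x.s <- sapply(split(x, cut(x, breaks = quarter), drop = TRUE), '[', 1)
class(x.s) <- c("POSIXt", "POSIXct")  # sapply drops the class; restore it

format(x.s, "%H:%M:%S")
```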
Re: [R] Finding minimum of time subset
Jim, Got it! Thanks for the explanation and the example. Always nice to learn new tricks in R. Aloha, Tim Tim Clark Department of Zoology University of Hawaii --- On Fri, 8/14/09, jim holtman wrote:
> From: jim holtman
> Subject: Re: [R] Finding minimum of time subset
> To: "Tim Clark"
> Cc: r-help@r-project.org
> Date: Friday, August 14, 2009, 7:51 AM
> sapply(mylist, '[', 1)
>
> is equivalent to
>
> sapply(mylist, function(x) x[1]) # select just the first element
>
> "[" is a function that is called with an object and an index. Using
> it the way I did in the email was a shorthand way of doing it. Here
> is an example:
>
> > x <- list(1,2,3)
> > x[1]
> [[1]]
> [1] 1
>
> > `[`(x, 1)
> [[1]]
> [1] 1
>
> Notice the function call `[`(x, 1). This is what is being done in the
> sapply, passing the 1 as the second parameter.
>
> On Fri, Aug 14, 2009 at 1:30 PM, Tim Clark wrote:
> > Jim,
> >
> > That works great! However, would you please explain what the '[' and the 1 do in the sapply function? I understand that you are cutting x by quarter, then creating a list of x that is split based on those cuts. I just don't understand what "[" means in this context, or what the number one at the end does.
> >
> > Thanks for you help,
> >
> > Tim
> >
> > Tim Clark
> > Department of Zoology
> > University of Hawaii
Re: [R] Finding minimum of time subset
Thanks for everyone's help and for the alternate ways of doing this. I am always amazed at how many solutions this list comes up with for things I get stuck on! It really helps us non-programmers learn R! Aloha, Tim Tim Clark Department of Zoology University of Hawaii --- On Fri, 8/14/09, Henrique Dallazuanna wrote:
> From: Henrique Dallazuanna
> Subject: Re: [R] Finding minimum of time subset
> To: "Tim Clark"
> Cc: r-help@r-project.org
> Date: Friday, August 14, 2009, 7:19 AM
> Try this also:
>
> times <- as.POSIXlt(mytime, format = "%H:%M:%S")
> subTimes <- times[times[['min']] %in% c(0,15,30,45)]
> format(subTimes[!duplicated(format(subTimes, "%H:%M"))], "%H:%M:%S")
>
> On Thu, Aug 13, 2009 at 5:10 PM, Tim Clark wrote:
> > Dear List,
> >
> > I have a data frame of data taken every few seconds. I would like to subset the data to retain only the data taken on the quarter hour, and as close to the quarter hour as possible. So far I have figured out how to subset the data to the quarter hour, but not how to keep only the minimum time for each quarter hour.
> >
> > For example:
> > mytime<-c("12:00:00","12:00:05","12:15:05","12:15:06","12:20:00","12:30:01","12:45:01","13:00:00","13:15:02")
> > subtime<-grep(pattern="[[:digit:]]+[[:punct:]]00[[:punct:]][[:digit:]]+|[[:digit:]]+[[:punct:]]15[[:punct:]][[:digit:]]+|[[:digit:]]+[[:punct:]]30[[:punct:]][[:digit:]]+|[[:digit:]]+[[:punct:]]45[[:punct:]][[:digit:]]+",mytime)
> > mytime[subtime]
> >
> > [1] "12:00:00" "12:00:05" "12:15:05" "12:15:06" "12:30:01" "12:45:01" "13:00:00" "13:15:02"
> >
> > This gives me the data taken at quarter hour intervals (removes 12:20:00) but I am still left with multiple values at the quarter hours.
> >
> > I would like to obtain:
> >
> > "12:00:00" "12:15:05" "12:30:01" "12:45:01" "13:00:00" "13:15:02"
> >
> > Thanks!
> >
> > Tim
> >
> > Tim Clark
> > Department of Zoology
> > University of Hawaii
>
> --
> Henrique Dallazuanna
> Curitiba-Paraná-Brasil
> 25° 25' 40" S 49° 16' 22" O
[R] Trying to rename spatial pts data frame slot that isn't a slot()
Dear List, I am analyzing the home range area of fish and seem to have lost the individuals' ID names during my manipulations, and can't find out how to rename them. I calculated the MCP of the fish using mcp() in Adehabitat. MCPs were converted to a spatial polygons data frame and exported to qGIS for manipulation. At this point the ID names were lost. I brought the manipulated shapefiles back into R, but can't figure out how to rename the individuals.

#Calculate MCP and save as a shapefile
my.mcp<-mcp(xy, id=id, percent=100)
spol<-area2spol(my.mcp)
spdf <- SpatialPolygonsDataFrame(spol,
  data=data.frame(getSpPPolygonsLabptSlots(spol),
    row.names=getSpPPolygonsIDSlots(spol)), match.ID = TRUE)
writeOGR(spdf, dsn=mcp.dir, layer="All Mantas MCP", driver="ESRI Shapefile")

#Read shapefile manipulated in qGIS
mymcp<-readOGR(dsn=mcp.dir,layer="All mantas MCP land differenc")

My spatial polygons data frame has a number of "slots", including one that contained the original names, called slot "ID". However, I cannot access this slot using slot() or slotNames().

> slotNames(mymcp)
[1] "data" "polygons" "plotOrder" "bbox" "proj4string"

What am I missing here? Is slot "ID" not a slot? Can I export the IDs with the shapefiles to qGIS? Can I rename the IDs when I bring them back into R? When is a slot not a slot()? Thanks, Tim Tim Clark Department of Zoology University of Hawaii
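For what it's worth, the "ID" in question is not a top-level slot: in the sp package, each element of the "polygons" slot is a Polygons object with its own "ID" slot, and spChFIDs() is the supported way to rename them. A small self-contained sketch (my own, assuming sp is installed; the object and ID names are made up):

```r
library(sp)

# Two toy polygons standing in for the fish MCPs
p1 <- Polygons(list(Polygon(cbind(c(0,1,1,0), c(0,0,1,1)))), ID = "1")
p2 <- Polygons(list(Polygon(cbind(c(2,3,3,2), c(0,0,1,1)))), ID = "2")
sp.obj <- SpatialPolygons(list(p1, p2))

# The per-polygon IDs live one level down, inside slot "polygons"
sapply(slot(sp.obj, "polygons"), slot, "ID")   # "1" "2"

# Rename them with spChFIDs()
sp.obj <- spChFIDs(sp.obj, c("manta01", "manta02"))
sapply(slot(sp.obj, "polygons"), slot, "ID")   # "manta01" "manta02"
```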
[R] Zoomable graphs with multiple plots
Hi folks, I was wondering if anyone could confirm/deny whether there exists any kind of package to facilitate zoomable graphs with multiple plots (e.g., plot(..) and then points(..)). I've tried zoom from IDPmisc, and iplot from the iplot and iplot extreme packages, but as far as I can tell, neither can handle the task. Does anyone know anything else that might work? Or generally know different? Cheers, Tim.
Re: [R] Zoomable graphs with multiple plots
This is what I've done: just capture two identify() points and then replot. If the two points run right-to-left, I zoom out. Works quite well. Still, can't wait for iplot xtreme. Cheers, Tim. On Thu, Sep 3, 2009 at 5:03 AM, Jim Lemon wrote:
> Tim Shephard wrote:
>>
>> Hi folks,
>>
>> I was wondering if anyone could confirm/deny whether there exists any
>> kind of package to facilitate zoomable graphs with multiple plots (eg,
>> plot(..) and then points(..)). I've tried zoom from IDPmisc, and
>> iplot from the iplot and iplot extreme packages, but as far I can
>> tell, neither can handle the task.
>>
>> Does anyone know anything else that might work? Or generally know
>> different?
>>
> Hi Tim,
> zoomInPlot in the plotrix package just plots the same data twice with
> different limits. You could use the same strategy, except instead of passing
> the coordinates to the function, write a similar function that accepts a
> list of plotting commands to be evaluated in each plot.
>
> Jim
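The click-twice-and-replot idea can be sketched like this (my own minimal version, not the poster's actual function; it uses locator() for the clicks, which is one way to capture them):

```r
# 'draw' must redraw the whole figure (plot() plus any points() calls)
# and pass extra arguments such as xlim/ylim through to plot().
zoom_replot <- function(draw) {
  draw()                        # full view
  p <- locator(2)               # user clicks two corners
  if (p$x[1] > p$x[2]) {
    draw()                      # right-to-left clicks: zoom back out
  } else {
    draw(xlim = range(p$x), ylim = range(p$y))  # zoom in
  }
}

# Example use (interactive session):
# zoom_replot(function(...) {
#   plot(pressure, type = "l", ...)
#   points(pressure, pch = 16)
# })
```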
Re: [R] Zoomable graphs with multiple plots
On Thu, Sep 3, 2009 at 1:25 PM, Jim Porzak wrote:
> Tim,
>
> I've had success (& user acceptance) simply plotting to a .pdf &
> passing zoom functionality to Acrobat, or whatever.
>
> Worked especially well with a large US map with a lot of fine print annotation.
>
> Of course, it will not replot axes more appropriate for the zoom level.
>
That's clever. It solves another problem too: my function shuts down R until I'm done.
[R] Amazon SimpleDB and R
As far as I know there isn't anything available for this, but I thought I'd check before working up something of my own. Is there a way to query Amazon SimpleDB and import the data results directly into R? Cheers, Tim.
[R] ROCR.plot methods, cross validation averaging
Dear R-help and ROCR developers (Tobias Sing and Oliver Sander) - I think my first question is generic and could apply to many methods, which is why I'm directing this initially to R-help as well as Tobias and Oliver.

Question 1. The plot function in ROCR will average your cross validation data if asked. I'd like to use that averaged data to find a "best" cutoff, but I can't figure out how to grab the actual data that get plotted. A simple redirect of the plot (such as test <- plot(mydata)) doesn't do it.

Question 2. I am asking ROCR to average lists with varying lengths for each list entry. See my example below. None of the ROCR examples have data structured in this manner. Can anyone speak to whether the averaging methods in ROCR allow for this? If I can't easily grab the data as desired from Question 1, can someone help me figure out how to average the lists, by threshold, similarly?

Question 3. If my cross validation data happen to have a list entry whose length = 2, ROCR errors out. Please see the second part of my example. Any suggestions?

#reproducible examples exemplifying my questions
##part one##
library(ROCR)
data(ROCR.xval)
# set up data so it looks more like my real data
sampSize <- c(4, 55, 20, 75, 350, 250, 6, 120, 200, 25)
testSet <- ROCR.xval
# do the extraction
for (i in 1:length(ROCR.xval[[1]])){
  y <- sample(c(1:350), sampSize[i])
  testSet$predictions[[i]] <- ROCR.xval$predictions[[i]][y]
  testSet$labels[[i]] <- ROCR.xval$labels[[i]][y]
}
# now massage the data using ROCR, set up for a ROC plot
# if it errors out here, run the above sample again.
pred <- prediction(testSet$predictions, testSet$labels)
perf <- performance(pred, "tpr", "fpr")
# create the ROC plot, averaging by cutoff value
plot(perf, avg="threshold")
# check out the structure of the data
str(perf)
# note the ragged edges of the list and that I assume averaging,
# whether it be vertical, horizontal, or threshold, somehow
# accounts for this?
## part two ##
# add a list entry with only two values
perf@x.values[[1]] <- c(0,1)
perf@y.values[[1]] <- c(0,1)
perf@alpha.values[[1]] <- c(Inf,0)
plot(perf, avg="threshold")
## output results in an error with this message
# Error in if (from == to) rep.int(from, length.out) else as.vector(c(from, :
#   missing value where TRUE/FALSE needed

Thanks in advance for your help Tim Howard New York Natural Heritage Program
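The threshold averaging of Question 2 can be reproduced outside ROCR (a sketch of the general idea, my own code, not ROCR's internals): treat each fold's fpr and tpr as functions of the cutoff, interpolate every fold onto one common grid of cutoffs, and average across folds. The lists xs, ys, and alphas stand in for perf@x.values, perf@y.values, and perf@alpha.values; ragged lengths and length-2 folds are both handled.

```r
avg_by_threshold <- function(xs, ys, alphas, n = 101) {
  # common grid over the finite cutoffs seen in any fold
  fin  <- unlist(lapply(alphas, function(a) a[is.finite(a)]))
  grid <- seq(min(fin), max(fin), length.out = n)
  interp <- function(vals, i) {
    ok <- is.finite(alphas[[i]])   # drop the Inf sentinel cutoff
    approx(alphas[[i]][ok], vals[[i]][ok], xout = grid, rule = 2)$y
  }
  fx <- sapply(seq_along(xs), function(i) interp(xs, i))
  fy <- sapply(seq_along(ys), function(i) interp(ys, i))
  data.frame(cutoff = grid, fpr = rowMeans(fx), tpr = rowMeans(fy))
}

# Tiny made-up example: two folds of different lengths
xs <- list(c(0, 0.5, 1), c(0, 0.25, 1))
ys <- list(c(0, 0.7, 1), c(0, 0.6, 1))
alphas <- list(c(Inf, 0.5, 0), c(Inf, 0.4, 0))
avg <- avg_by_threshold(xs, ys, alphas, n = 5)
avg   # averaged ROC coordinates at 5 common cutoffs
```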
[R] Find each time a value changes
Dear List, I am trying to find each time a value changes in a dataset. The numbers are variables for day vs. night values, so what I am really getting is the daily sunrise and sunset. A simplified example is the following:

x<-seq(1:100)
y1<-rep(1,10)
y2<-rep(2,10)
y<-c(y1,y2,y1,y1,y1,y2,y1,y2,y1,y2)
xy<-cbind(x,y)

I would like to know each time the numbers change. The correct answer should be: x = 1, 11, 21, 51, 61, 71, 81, 91. I would appreciate any help or suggestions. It seems like it should be simple but I'm stuck! Thanks, Tim Tim Clark Department of Zoology University of Hawaii
Re: [R] Find each time a value changes
Thanks everyone! I would have been banging my head around for quite a while and still wouldn't have come up with either solution. The function rle() is a good one to know! Aloha, Tim Tim Clark Department of Zoology University of Hawaii --- On Wed, 2/10/10, Ben Tupper wrote:
> From: Ben Tupper
> Subject: Re: [R] Find each time a value changes
> To: r-help@r-project.org
> Cc: "Tim Clark"
> Date: Wednesday, February 10, 2010, 4:16 PM
> Hi,
>
> On Feb 10, 2010, at 8:58 PM, Tim Clark wrote:
>
> > Dear List,
> >
> > I am trying to find each time a value changes in a dataset. The
> > numbers are variables for day vs. night values, so what I am really
> > getting is the daily sunrise and sunset.
> >
> > A simplified example is the following:
> >
> > x<-seq(1:100)
> > y1<-rep(1,10)
> > y2<-rep(2,10)
> > y<-c(y1,y2,y1,y1,y1,y2,y1,y2,y1,y2)
> > xy<-cbind(x,y)
> >
> > I would like to know each time the numbers change.
> > Correct answer should be:
> > x=1,11,21,51,61,71,81,91
> >
> I think this gets close...
>
> which(diff(y) != 0)
> [1] 10 20 50 60 70 80 90
>
> You'll need to fiddle to get exactly what you want.
>
> Cheers,
> Ben
>
> > I would appreciate any help or suggestions. It seems like it should
> > be simple but I'm stuck!
> >
> > Thanks,
> >
> > Tim
> >
> > Tim Clark
> > Department of Zoology
> > University of Hawaii
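Ben's which(diff(y) != 0) gives the positions just before each change; the "fiddling" he leaves to the reader is to shift by one and prepend the first index (my own completion, using the data from the original post):

```r
y1 <- rep(1, 10)
y2 <- rep(2, 10)
y  <- c(y1, y2, y1, y1, y1, y2, y1, y2, y1, y2)

changes <- c(1, which(diff(y) != 0) + 1)  # index where each new run starts
changes
# 1 11 21 51 61 71 81 91
```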
Re: [R] Find each time a value changes
It was brought to my attention that the rle() answer to this question was not posted. The following gives the correct answer once the last value is deleted.

x<-seq(1:100)
y1<-rep(1,10)
y2<-rep(2,10)
y<-c(y1,y2,y1,y1,y1,y2,y1,y2,y1,y2)
xy<-cbind(x,y)
print(xy)
print(str(xy))
# SEE WHAT RLE GIVES
test <- rle(xy[,2])
print(str(test))
# USE JIM'S TRICK OF CUMULATIVE SUMMING
# TO GET THE LOCATIONS
result <- cumsum(c(1, rle(xy[,2])$lengths))
result <- head(result, -1)  # delete the last value, as noted above

Tim Clark Department of Zoology University of Hawaii
[R] Formatting question for separate polygons
Dear List, I am trying to plot several separate polygons on a graph. I have figured out how to do it manually, but have too much data to use such a tedious method. I would appreciate your help. I have made a simple example to illustrate the problem. How can I get x into the proper format (x1)?

#Sample data
x<-c(1,2,3,4,5,6)
y<-c(1,2,2,1)

#I need to format the data like this
x1<-c(1,1,2,2,NA,3,3,4,4,NA,5,5,6,6,NA)
y1<-rep(c(1,2,2,1,NA),length(x)/2)

#Final plot
plot(c(1,6), 1:2, type="n")
polygon(x1,y1,density=c(40))

Thanks, Tim Tim Clark Department of Zoology University of Hawaii
Re: [R] Formatting question for separate polygons
Thanks Uwe! And Peter for the correction. I would never have come up with that! Tim Tim Clark Department of Zoology University of Hawaii --- On Fri, 2/12/10, Uwe Ligges wrote:
> From: Uwe Ligges
> Subject: Re: [R] Formatting question for separate polygons
> To: "Peter Ehlers"
> Cc: "Tim Clark", r-help@r-project.org
> Date: Friday, February 12, 2010, 1:57 AM
>
> On 12.02.2010 12:54, Peter Ehlers wrote:
> > Nice, Uwe.
> > Small correction: make that nrow=4:
> >
> > x1 <- as.numeric(rbind(matrix(rep(x, each=2), nrow=4), NA))
> >
> Whoops, thanks!
>
> Uwe
>
> > -Peter Ehlers
> >
> > Uwe Ligges wrote:
> >>
> >> On 11.02.2010 22:38, Tim Clark wrote:
> >>> Dear List,
> >>>
> >>> I am trying to plot several separate polygons on a graph. I have
> >>> figured out how to do it by manually, but have too much data to use
> >>> such a tedious method. I would appreciate your help. I have made a
> >>> simple example to illustrate the problem. How can I get x into the
> >>> proper format (x1)?
> >>>
> >>> #Sample data
> >>> x<-c(1,2,3,4,5,6)
> >>> y<-c(1,2,2,1)
> >>>
> >>> #I need to format the data like this
> >>> x1<-c(1,1,2,2,NA,3,3,4,4,NA,5,5,6,6,NA)
> >>> y1<-rep(c(1,2,2,1,NA),length(x)/2)
> >>
> >> x1 <- as.numeric(rbind(matrix(rep(x, each=2), nrow=2), NA))
> >>
> >> Uwe Ligges
> >>
> >>> #Final plot
> >>> plot(c(1,6), 1:2, type="n")
> >>> polygon(x1,y1,density=c(40))
> >>>
> >>> Thanks,
> >>>
> >>> Tim
> >>>
> >>> Tim Clark
> >>> Department of Zoology
> >>> University of Hawaii
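Uwe's one-liner with Peter's nrow=4 correction, assembled from the thread and checked against the hand-built x1:

```r
x <- c(1, 2, 3, 4, 5, 6)

# rep(x, each=2) doubles each x; matrix(nrow=4) groups them in fours
# (one polygon per column); rbind(..., NA) appends the NA separator,
# and as.numeric() reads the matrix back out column by column
x1 <- as.numeric(rbind(matrix(rep(x, each = 2), nrow = 4), NA))
x1
# 1 1 2 2 NA 3 3 4 4 NA 5 5 6 6 NA

y1 <- rep(c(1, 2, 2, 1, NA), length(x)/2)
plot(c(1, 6), 1:2, type = "n")
polygon(x1, y1, density = 40)
```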
[R] Non-monotonic spline using splinefun(method = "monoH.FC")
Hi, In my version of R, the stats package splinefun code for fitting a Fritsch and Carlson monotonic spline does not appear to guarantee a monotonic result. If two adjoining sections both have over/undershoot, the way the resulting adjustment of alpha and beta is performed can give modified values which still do not satisfy the required constraints. I do not think this is due to finite precision arithmetic. Is this a known bug? I have had a look through the bug database but couldn't find anything. Below is an example created to demonstrate this:

###
# Create the following data.
# This is constructed so that there are two adjoining sections which have to be adjusted
x <- 1:8
y <- c(-12, -10, 3.5, 4.45, 4.5, 140, 142, 142)
# Now run the splinefun() function
FailMonSpline <- splinefun(x, y, method = "mono")
# In theory this should be monotonic increasing, but the required conditions are not satisfied
# Check values of alpha and beta for this curve
m <- FailMonSpline(x, deriv = 1)
nx <- length(x)
n1 <- nx - 1L
dy <- y[-1] - y[-nx]
dx <- x[-1] - x[-nx]
Sx <- dy/dx
alpha <- m[-nx]/Sx
beta <- m[-1]/Sx
a2b3 <- 2 * alpha + beta - 3
ab23 <- alpha + 2 * beta - 3
ok <- (a2b3 > 0 & ab23 > 0)
ok <- ok & (alpha * (a2b3 + ab23) < a2b3^2)
# If the curve is monotonic then all ok should be FALSE; however, this is not the case
ok
# Alternatively the non-monotonicity can easily be seen by plotting the region between 4 and 5
t <- seq(4, 5, length = 200)
plot(t, FailMonSpline(t), type = "l")

The version of R I am running is:
platform x86_64-suse-linux-gnu
arch x86_64
os linux-gnu
system x86_64, linux-gnu
status
major 2
minor 8.1
year 2008
month 12
day 22
svn rev 47281
language R
version.string R version 2.8.1 (2008-12-22)
Re: [R] Non-monotonic spline using splinefun(method = "monoH.FC")
Hi,

Thanks for the reply, but I think you are confusing monotonic with strictly increasing/decreasing. I also just used the y-value of the last knot as a simple example, as that is not the bit where it goes wrong. It will still produce a non-monotonic spline if you use, for example,

x <- 1:8
y <- c(-12, -10, 3.5, 4.45, 4.5, 140, 142, 143)

I am pretty sure that it's a bug in the way that the alphas and betas are modified in the code itself, which does not guarantee (if there are two overlapping sections which need their alphas and betas modifying) that after modification they satisfy the constraints explained in the original Fritsch and Carlson paper. The original paper is quite vague about how to deal with multiple sections which need modifying. Should one do it in order (in which case one might get a different result if one entered the data in the opposite direction, and moving one knot would no longer guarantee that the curve changes in only a finite region)? Or possibly shrink the coefficients twice (which would create a flatter spline than necessary, but would give a finite effect and the same curve if the data were entered in the opposite direction)?

Tim

David Winsemius wrote:
>
> On Feb 15, 2010, at 10:59 AM, Tim Heaton wrote:
>
>> Hi,
>>
>> In my version of R, the stats package splinefun code for fitting a
>> Fritsch and Carlson monotonic spline does not appear to guarantee a
>> monotonic result. If two adjoining sections both have over/undershoot
>> the way the resulting adjustment of alpha and beta is performed can give
>> modified values which still do not satisfy the required constraints. I
>> do not think this is due to finite precision arithmetic. Is this a known
>> bug? Have had a look through the bug database but couldn't find anything.
>
> The help page says that the resulting function will be "monotone
> iff the data are."
>
> y[7] < y[8] # FALSE
> FailMonSpline(7) < FailMonSpline(8) # FALSE, ..., as promised.
>
>> [rest of the original example and session information snipped]
>
> David Winsemius, MD
> Heritage Laboratories
> West Hartford, CT
[R] r help date format changes with c() vs. rbind()
Dear List,

I am having a problem with dates and I would like to understand what is going on. Below is an example. I can produce a date/time using as.POSIXct, but when I try to combine two as.POSIXct objects I keep getting strange results. I thought I was using the wrong origin, but according to structure(0, class = "Date") I am not (see below). In my example, a is a simple date/time object; b combines it using rbind(); c converts b back to a date/time object using as.POSIXct and gives the incorrect time; and d combines a using c() and gives the correct time. Why doesn't c give me the correct answer?

Thanks,
Tim

> a <- as.POSIXct("2000-01-01 12:00:00")
> a
[1] "2000-01-01 12:00:00 HST"
> b <- rbind(a, a)
> b
       [,1]
a 946764000
a 946764000
> c <- as.POSIXct(b, origin = "1970-01-01")
> c
[1] "2000-01-01 22:00:00 HST" "2000-01-01 22:00:00 HST"
> d <- c(a, a)
> d
[1] "2000-01-01 12:00:00 HST" "2000-01-01 12:00:00 HST"
> structure(0, class = "Date")
[1] "1970-01-01"

Tim Clark
Department of Zoology
University of Hawaii
Re: [R] r help date format changes with c() vs. rbind()
Glad to know it isn't just me! I couldn't use Phil's data.frame method, since my real problem went from a POSIXct object to a large matrix (where I used rbind) and then back to POSIXct. Jim's function worked great for converting the final product back to the proper date. Thanks!

Tim

Tim Clark
Department of Zoology
University of Hawaii

--- On Fri, 2/19/10, jim holtman wrote:

> From: jim holtman
> Subject: Re: [R] r help date format changes with c() vs. rbind()
> To: "Tim Clark"
> Cc: r-help@r-project.org
> Date: Friday, February 19, 2010, 12:19 PM
>
> I have used the following function to convert to POSIXct from a
> numeric without any problems:
>
> unix2POSIXct <- function(time)
>     structure(time, class = c("POSIXt", "POSIXct"))
>
> > unix2POSIXct(946764000)
> [1] "2000-01-01 17:00:00 EST"
>
> On Fri, Feb 19, 2010 at 4:07 PM, Tim Clark wrote:
> > [original question snipped]
>
> --
> Jim Holtman
> Cincinnati, OH
> +1 513 646 9390
>
> What is the problem that you are trying to solve?
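What appears to happen in the original example: rbind() coerces the POSIXct objects to a plain numeric matrix (seconds since the epoch), and as.POSIXct(b, origin = "1970-01-01") then parses the origin string in the *local* time zone rather than UTC, shifting the result by the zone offset (10 hours for HST). Jim's structure() trick works because it simply re-attaches the class to the raw seconds. A minimal sketch; "Pacific/Honolulu" (i.e. HST) is set here only to reproduce the example:

```r
Sys.setenv(TZ = "Pacific/Honolulu")    # match the example's HST time zone
a <- as.POSIXct("2000-01-01 12:00:00")
b <- rbind(a, a)                       # coerces to a plain numeric matrix
is.numeric(b)                          # TRUE: the POSIXct class is gone
# Fix 1 (Jim's approach): re-attach the class to the raw seconds
d <- structure(as.vector(b), class = c("POSIXct", "POSIXt"))
format(d)                              # both elements print as 12:00:00
# Fix 2: pass an origin that is itself a POSIXct anchored in UTC
e <- as.POSIXct(as.vector(b), origin = as.POSIXct("1970-01-01", tz = "UTC"))
```

Both fixes recover the original 12:00 local time; the buggy call differed by exactly the HST offset from UTC.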
Re: [R] (quite possibly OT) Re: how to make R running on a Linux server display a plot on a Windows machine
One additional option on the Linux-on-the-local-machine option described below: consider running Microsoft's free virtual machine software with a copy of Linux inside it. You then have the advantages of dual-boot with BOTH operating systems available via a toggle key sequence.

Tim Glover
Senior Environmental Scientist - Geochemistry, Geoscience Department, Atlanta Area
MACTEC Engineering and Consulting, Inc., Kennesaw, Georgia, USA
Office 770-421-3310 Fax 770-421-3486 Email ntglo...@mactec.com Web www.mactec.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Emmanuel Charpentier
Sent: Monday, February 22, 2010 4:07 PM
To: r-h...@stat.math.ethz.ch
Subject: [R] (quite possibly OT) Re: how to make R running on a Linux server display a plot on a Windows machine

Dear Xin,

On Monday, 22 February 2010 at 09:53 -0800, xin wei wrote:
> hi, Kevin and K.Elo:
> thank you for the suggestion. Can you be more specific on these? (like how
> exactly get into x-switch or man ssh). I am totally ignorant about linux and
> SSH:( Memory limitation forces me to switch from windows to Linux
> cluster.

\begin{SemiOffTopic}
While you may not believe it now, that is a reason for rejoicing rather than grieving :-) Unix is a bit like sex: incomprehensible and possibly frightening to the young/ignorant, extremely enjoyable and almost indispensable to the mature person ... and a source of severe withdrawal syndrome if/when it becomes unavailable (some employers have strange ideas about the set of tools their employees are allowed to use)! You've been warned...
\end{SemiOffTopic}

Your problem seems to have (almost) nothing to do with R *per* *se*, but with your ability to use it on a *remote* computer. This problem should be discussed with your *local* friendly help. To the best of my knowledge, neither the R help list nor its "members" are subsidized by psu.edu's sponsors (Penn State, right?)
to provide basic computer-use tutoring to its students; consequently, this list is *definitely* *NOT* the right place for this. But ...

Notwithstanding my own advice, I'll take the time to try to answer you, in the (futile?) hope that future beginning students, after *reading* the Posting Guide, following its advice and searching the list, will find this answer, thus avoiding overloading our reading bandwidth *again*... That's also why I rephrased the "subject" of your post.

I suppose that the simplest solution (namely fitting "enough" memory in your present Windows machine) is deemed impossible. The fun begins...

Now, to work with R, you need only a *text* connection to your server. It is enough to use functions creating graphs ... as files that you can later display on your Windows machine. That's what ssh does: terminal emulation (plus the ability to copy files back and forth via scp (which you *will* need), plus a way to create so-called tunnels and redirections... but that's a horse of an entirely different color (an elephant, actually :)).

If you want "real-time" graphics displayed by the R interpreter, you need, indeed, to use the "-X" switch to ssh. But that is *not* *enough*: your Windows machine *must* be fitted with software accepting the commands emitted by the server's R interpreter and obeying them to actually create the image; that is something called "an X server" (yes, server: in this case, your Windows machine offers a "displaying service" that your R interpreter uses for displaying your graph, thus becoming a client of your server). Installing such a beast under Windows is (was?) neither easy nor (usually) cheap. There *are* free (in both meanings of this word) X server implementations for Windows (most notoriously Cygwin/X and Xming), but, as far as I know, none of them is "easy" to install for the uninitiated: to do this, you must understand what you are doing, which implies (partially) mastering the X window system, which is ...
complicated, to put it mildly. You'd better seek *informed* help around you on this one. I am aware of a handful of commercial implementations claiming to be "easy to install", but cannot offer any opinion of them: the price tag is enough to give me pause...

Another option (to be discussed with your server's manager) is to display on your Windows machine the image of a "virtual" X session started on the server. Such a solution, which has a couple of implementations (variants of VNC, variants of RDP), might be quite preferable if your network connection is slow/unreliable: X eats bandwidth like there's no tomorrow... I find VNC quite useful on the limited-bandwidth connections that I use almost daily.

But, it may well be that the *simplest* solution would be to install Linux on your own machine (dual boot for a first time...): X is the
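The file-based route mentioned above (which needs no X server at all) can be sketched in a few lines of R run on the server over a plain ssh text session; the filename is illustrative:

```r
# Render the plot to a file on the server; no display is needed.
pdf("myplot.pdf")                      # illustrative filename
plot(1:10, (1:10)^2, type = "b",
     main = "Rendered server-side, viewed locally")
dev.off()
file.exists("myplot.pdf")              # then fetch it with scp to the PC
```

After the file is written, something like `scp user@server:myplot.pdf .` (run from the local machine) retrieves it for viewing; this avoids the whole X-server question at the cost of not being interactive.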
Re: [R] Plotting 15 million points
Have you considered taking a random subset and plotting that? I'd bet you can get a really good impression of the distribution with a few hundred thousand points at most.

Tim Glover
Senior Environmental Scientist - Geochemistry, Geoscience Department, Atlanta Area
MACTEC Engineering and Consulting, Inc., Kennesaw, Georgia, USA
Office 770-421-3310 Fax 770-421-3486 Email ntglo...@mactec.com Web www.mactec.com

-----Original Message-----
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of Abhishek Pratap
Sent: Thursday, February 25, 2010 6:12 PM
To: r-help@r-project.org
Subject: [R] Plotting 15 million points

Hi All,

I have a vector of about 15 million numbers which I would like to plot. The goal is to see the distribution. I tried the usual steps:

1. Histogram: never completes; my window freezes, with or without a log base 10 transform.
2. Density: I first calculated the kernel density and then plotted it, which worked. It would be nice to superimpose the histogram with the density, but as of now I am not able to get this data as a histogram. I tried ggplot2, which also hangs.

Any efficient methods to play with > 10 million numbers in a vector?

Thanks,
-Abhi
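The subsetting suggestion above can be sketched as follows; the rnorm() vector here just stands in for the real 15-million-point data, and it also gives the superimposed histogram-plus-density the original poster asked for:

```r
set.seed(1)
x <- rnorm(2e6)               # stand-in for the full multi-million-point vector
sub <- sample(x, 1e5)         # 100k points: fast, and the shape is preserved
hist(sub, breaks = 100, freq = FALSE,
     main = "Histogram of a 100k random subsample")
lines(density(sub), lwd = 2)  # kernel density overlaid on the same axes
```

Because the subsample is random, its summary statistics track the full vector closely, so the plotted distribution is a faithful picture at a tiny fraction of the rendering cost.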
[R] reading data from web data sources
Hullo,

I'm trying to read some time series data of meteorological records that are available on the web (e.g. http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat). I'd like to be able to read the digital data directly into R. However, I cannot work out the right function and set of parameters to use. It could be that the only practical route is to write a parser, possibly in some other language, reformat the files, and then read these into R.

As far as I can tell, the informal grammar of the file is: [ ]+ and the are of the form: [ ] 12. Readings for days in months where a day does not exist have special values. Missing values have a different special value. And then I've got the problem of iterating over all relevant files to get a whole time series.

Is there a way to read this type of file into R? I've read all of the examples that I can find, but cannot work out how to do it. I don't think that read.table can handle the separate sections of data representing each year. read.ftable can maybe be coerced to parse the data, but I cannot see how, after reading the documentation and experimenting with the parameters.

I'm using R 2.10.1 on OS X 10.5.8 and 2.10.0 on Fedora 10.

Any help/suggestions would be greatly appreciated. I can see that this type of issue is likely to grow in importance, and I'd also like to give the data owners suggestions on how to reformat their data so that it is easier to consume by machines, while being easy to read for humans. The early records are a serious machine-parsing challenge, as they are tiff images of old notebooks ;-)

tia
Tim

Tim Coote t...@coote.org vincit veritas
Re: [R] reading data from web data sources
Thanks, Gabor. My take-away from this and Phil's post is that I'm going to have to construct some code to do the parsing, rather than use a standard function.

I'm afraid that neither approach works yet: Gabor's has an off-by-one error (days start on the 2nd, not the 1st), and the years get messed up around the 29th day. I think the na.omit(DF) line is throwing out the baby with the bathwater. It's interesting that this approach is based on read.table; I'd assumed that I'd need read.ftable, which I couldn't understand the documentation for. What is it that's removing the -999 and -888 values in this code? They seem to be gone, but I cannot see why.

Phil's reads in the data, but interleaves rows with just a year and all other values as NA.

Tim

On 27 Feb 2010, at 17:33, Gabor Grothendieck wrote:

Mark Leeds pointed out to me that the code wrapped around in the post, so it may not be obvious that the regular expression in the grep contains a space: "[^ 0-9.]"

On Sat, Feb 27, 2010 at 7:15 AM, Gabor Grothendieck wrote:

Try this. First we read the raw lines into R, using grep to remove any lines containing a character that is not a number or space. Then we look for the year lines and repeat them down V1 using cumsum. Finally we omit the year lines.

myURL <- "http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat"
raw.lines <- readLines(myURL)
DF <- read.table(textConnection(raw.lines[!grepl("[^ 0-9.]", raw.lines)]),
                 fill = TRUE)
DF$V1 <- DF[cumsum(is.na(DF[[2]])), 1]
DF <- na.omit(DF)
head(DF)

On Sat, Feb 27, 2010 at 6:32 AM, Tim Coote wrote:

Hullo
I'm trying to read some time series data of meteorological records that are available on the web (e.g. http://climate.arm.ac.uk/calibrated/soil/dsoil100_cal_1910-1919.dat). I'd like to be able to read the digital data directly into R. However, I cannot work out the right function and set of parameters to use.
[rest of the original message snipped]

Tim Coote t...@coote.org vincit veritas
Re: [R] Best Hardware & OS For Large Data Sets
Is it possible to run a Linux guest VM on the Wintel box so that you can run the 64-bit code? I used to do this on XP (but not for R).

On 27 Feb 2010, at 20:03, David Winsemius wrote:

On Feb 27, 2010, at 12:47 PM, J. Daniel wrote:

Greetings,

I am acquiring a new computer in order to conduct data analysis. I currently have a 32-bit Vista OS with 3 GB of RAM, and I consistently run into memory allocation problems. I will likely be required to run Windows 7 on the new system, but have flexibility as far as hardware goes. Can people recommend the best hardware to minimize memory allocation problems? I am leaning towards dual core on a 64-bit system with 8 GB of RAM. Given the Windows constraint, is there anything I am missing here?

Perhaps the fact that the stable CRAN version of R for (any) Windows is 32-bit? It would expand your memory space somewhat, but not as much as you might naively expect. (There was a recent announcement that an experimental version of 64-bit R is available (even with an installer), and there are vendors who will supply a 64-bit Windows version for an un-announced price. The fact that there was, as of January, no support for binary packages seems to be a bit of a constraint on who would be able to "step up" to full 64-bit R capabilities on Win64. I'm guessing from your failure to mention potential software constraints that you are not among that more capable group, as I am also not.)

https://stat.ethz.ch/pipermail/r-devel/2010-January/056301.html
https://stat.ethz.ch/pipermail/r-devel/2010-January/056411.html

I know that Windows limits the RAM that a single application can access. Does this fact over-ride many hardware considerations? Any way around this?
Thanks,
JD

--
David Winsemius, MD
Heritage Laboratories
West Hartford, CT

Tim Coote t...@coote.org +44 (0)7866 479 760
Re: [R] turn character string into unevaluated R object
fortune('parse')

But if you have a vector of file names, you can create a blank list and read.table each file into the list. I generally find that if I'm reading a bunch of files in at the same time, they are probably related, and I will end up coming back and putting them all into a list anyway.

file.names <- c('name 1', 'name 2', 'name 3')
my.list <- list()
for (i in file.names) {
    temp <- read.table(paste(i, '.txt', sep = ''))
    # assign temp to i if desired
    assign(i, temp)
    # or put it in the list
    my.list[[i]] <- temp
}

On Wed, Mar 3, 2010 at 10:15 AM, Liviu Andronic wrote:
> On 3/2/10, carol white wrote:
> > How to turn a character string into an unevaluated R object? I want to load some
>
> I'm not sure if this is what you're looking for:
>
> > as.name("iris")
> iris
> > parse(text="iris")
> expression(iris)
> attr(,"srcfile")
>
> > head(eval(as.name("iris")))
>   Sepal.Length Sepal.Width Petal.Length Petal.Width Species
> 1          5.1         3.5          1.4         0.2  setosa
> 2          4.9         3.0          1.4         0.2  setosa
> 3          4.7         3.2          1.3         0.2  setosa
> 4          4.6         3.1          1.5         0.2  setosa
> 5          5.0         3.6          1.4         0.2  setosa
> 6          5.4         3.9          1.7         0.4  setosa
>
> > head(eval(parse(text="iris")))
> [same output as above]
>
> Liviu

--
Tim Calkins
0406 753 997
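As a variant of the loop above, lapply() over a named vector builds the whole list in one call. A hedged sketch: the three files here are created in a temporary directory purely for illustration (the real file names would come from the user's own vector):

```r
# Create three small illustrative files (stand-ins for the real ones)
paths <- file.path(tempdir(), paste0("name", 1:3, ".txt"))
for (p in paths) write.table(data.frame(x = 1:3, y = (1:3)^2), p)

# Read them all into one named list in a single call
my.list <- lapply(setNames(paths, basename(paths)), read.table)
names(my.list)            # list elements are named after the files
str(my.list[["name1.txt"]])
```

This gives the same result as the loop-plus-list approach, without scattering objects into the global environment via assign().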
[R] ggplot2 rose diagram
Dear R gurus,

Consider this plot:

library(ggplot2)
dat <- sample(1:8, 100, replace = TRUE)
smp <- ggplot(data.frame(dat), aes(x = factor(dat), fill = factor(dat))) +
    geom_bar(width = 1)
smp + coord_polar()

Q1. How do I change the font size and weight of the bar labels (1, 2, 3, ...)? I've been wallowing in the 'Themes' structure and I just can't figure out the correct place to change the definitions. Along these same lines, what does 'strip' mean when referring to strip text?

Q2. How can I move the legend defining bar height into the plot, so that it overlays the lines it refers to?

Consider the same figure using CircStats:

library(CircStats)
dat.rad <- (dat * ((2*pi)/8)) - (2*pi)/16
rose.diag(dat.rad, bins = 8)  # note the origin is to the right rather than on top

Q3. The key difference is that CircStats uses an area-based calculation for the size of each slice, which makes for a different presentation than ggplot2. Any suggestions on how to use this calculation method in the ggplot framework?

Thanks in advance for your help.

Tim Howard
New York Natural Heritage Program
[R] Question on passing in parameter to Cox hazard
Hi,

I want to fit a Cox model using a matrix. The following lines illustrate what I want to do:

library(survival)
dat <- matrix(rnorm(30), ncol = 3, dimnames = list(1:10, letters[1:3]))
Survival <- rexp(10)
Status <- ifelse(runif(10) < .7, 1, 0)
mat <- as.data.frame(cbind(dat, Survival, Status))
cmod <- coxph(Surv(Survival, Status) ~ a + b + c, mat)

This works fine. However, I need to change the code so that the column headers (a + b + c) are passed into the coxph function on the fly. What string/object do I need to generate so the function works? I am trying:

# For example
chead <- "a+b+c"
cmod <- coxph(Surv(Survival, Status) ~ chead, mat)

but this gives an error since I'm passing in a string. Can I change chead to something so that the code works?

Many thanks.
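One way to do what the question asks, sketched with the question's own simulated data: paste the string into a complete formula and convert it with as.formula(), which coxph() accepts like any hand-written formula.

```r
library(survival)
set.seed(1)
dat <- matrix(rnorm(30), ncol = 3, dimnames = list(1:10, letters[1:3]))
Survival <- rexp(10)
Status <- ifelse(runif(10) < .7, 1, 0)
mat <- as.data.frame(cbind(dat, Survival, Status))

chead <- "a+b+c"                       # column headers assembled on the fly
f <- as.formula(paste("Surv(Survival, Status) ~", chead))
cmod <- coxph(f, data = mat)           # fits just like the hard-coded version
```

The same pattern works for any model function that takes a formula; only the string on the right-hand side needs to change.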
Re: [R] ggplot2 rose diagram
To answer two of my own questions, to get them into the archives (I am slowly getting the hang of ggplot):

Q1. Use "opts(axis.text.x = theme_text(size = xx))" to change the font size of the bar labels:

library(ggplot2)
set.seed(5)
dat <- sample(1:8, 100, replace = TRUE)
smp <- ggplot(data.frame(dat), aes(x = factor(dat), fill = factor(dat))) +
    geom_bar(width = 1) +
    opts(axis.text.x = theme_text(size = 18))
smp + coord_polar()

Q3. Calculate the frequencies yourself and use stat = "identity" inside the aes call:

L <- table(dat)
L.df <- data.frame(L)
L.df <- cbind(L.df, "SQRrelFreq" = sqrt(L.df[,2] / sum(L.df[,2])))
smp2 <- ggplot(L.df, aes(x = dat, y = SQRrelFreq, stat = "identity", fill = dat)) +
    geom_bar(width = 1) +
    opts(axis.text.x = theme_text(size = 18))
smp2 + coord_polar()

Cheers,
Tim

>>> Tim Howard 3/9/2010 9:25 AM >>>
[original question snipped]

Tim Howard
Re: [R] ggplot2 rose diagram
Hadley, Thanks for chiming in. By Q2 I was trying to refer to the Y-axis labels. For the polar plot, the Y-axis labels reside left of the panel. I was looking for a way to get the Y-axis labels to radiate out from the center so it would be clear which line each label refers to. I still can't find any reference to moving the y-axis labels (for any plot type) in any of your documentation. It's probably my failure For Q3, can you speak to whether the square-root transformation of counts for the y-axis provides the same function as square-root of the frequencies e.g. sqrt(countInBin/totalCount) . My goal is for area of the slice to correlate with number of records in each bin (rather than area expanding at a faster rate). Thanks, Tim >>> hadley wickham 3/10/2010 9:14 AM >>> For Q2 you can use opts(legend.position = c(0.9, 0.9)). For Q3, you can also use scale_y_sqrt(). Hadley On Wed, Mar 10, 2010 at 2:05 PM, Tim Howard wrote: > To answer two of my own questions to get them into the archives (I am slowly > getting the hang of ggplot): > > Q1. use "opts(axis.text.x = theme_text(size=xx))" to change font size of the > bar labels: > > library(ggplot2) > set.seed(5) > dat <- sample(1:8,100,replace=TRUE) > smp <- ggplot(data.frame(dat), aes(x=factor(dat),fill=factor(dat))) + >geom_bar(width=1) + >opts(axis.text.x = > theme_text(size = 18)) > smp + coord_polar() > > Q3. 
calculate the frequencies themselves and use stat="identity" inside the > aes call: > > L <- table(dat) > L.df <- data.frame(L) > L.df <- cbind(L.df, "SQRrelFreq" = sqrt(L.df[,2]/sum(L.df[,2]))) > smp2 <- ggplot(L.df, aes(x=dat,y=SQRrelFreq, stat="identity", fill=dat)) + > geom_bar(width=1) + >opts(axis.text.x = > theme_text(size = 18)) > smp2 + coord_polar() > > Cheers, > Tim > >>>> Tim Howard 3/9/2010 9:25 AM >>> > Dear R gurus - > > consider this plot: > > library(ggplot2) > dat <- sample(1:8,100,replace=TRUE) > smp <- ggplot(data.frame(dat), aes(x=factor(dat),fill=factor(dat))) + > geom_bar(width=1) > smp + coord_polar() > > > Q1. How do I change the font size and weight of bar labels (1,2,3...)? I've > been wallowing in the 'Themes' structure and I just can't figure out the > correct place to change the definitions. Along these same lines, what does > 'strip' mean when referring to strip text? > > Q2. How can I move the legend defining bar height into the plot, so that it > overlays the lines they refer to? > > > Consider the same figure using Circstats: > > library(CircStats) > dat.rad <- (dat*((2*pi)/8)) -(2*pi)/16 > rose.diag(dat.rad, bins = 8) #note the origin is to the right rather than on > top > > Q3. The key difference is that CircStats uses an area-based calculation for > the size of each slice, which makes for a different presentation than > ggplot2. Any suggestions on how to use this calculation method in the ggplot > framework? > > Thanks in advance for your help. > Tim Howard > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
> -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
Re: [R] ggplot2 rose diagram
Got it. Thanks again so much for the help. Best, Tim >>> hadley wickham 3/10/2010 2:46 PM >>> > By Q2 I was trying to refer to the Y-axis labels. For the polar plot, the > Y-axis labels reside left of the panel. I was looking for a way to get the > Y-axis labels to radiate out from the center so it would be clear which line > each label refers to. I still can't find any reference to moving the y-axis > labels (for any plot type) in any of your documentation. It's probably my > failure Ah ok, there's not currently any way to do that. You'd be best off just adding the text and markings yourself with geom_text and geom_line. > For Q3, can you speak to whether the square-root transformation of counts for > the y-axis provides the same function as square-root of the frequencies e.g. > sqrt(countInBin/totalCount) . My goal is for area of the slice to correlate > with number of records in each bin (rather than area expanding at a faster > rate). It should produce a plot that looks the same as your explicit transformation, so yes. Hadley -- Assistant Professor / Dobelman Family Junior Chair Department of Statistics / Rice University http://had.co.nz/
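[Editor's note] The resolved thread above boils down to a short recipe; this is a hedged sketch combining the thread's pieces (note that `opts()` has since been replaced by `theme()` in newer ggplot2 releases, so the axis-text call may need adjusting on a current install):

```r
library(ggplot2)
set.seed(5)
dat <- sample(1:8, 100, replace = TRUE)

# rose diagram with a square-root radius scale, so slice *area* tracks
# the count in each bin rather than growing with the square of the count
smp <- ggplot(data.frame(dat), aes(x = factor(dat), fill = factor(dat))) +
  geom_bar(width = 1) +
  scale_y_sqrt()
smp + coord_polar()
```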
[R] heatmap.2 - ColSideColors question
Hi, I wanted to make more than one side color bar. For example, I can make one side color bar (col1) with the following code: --- library(gplots) mat <- matrix(sample(1:100,40),nrow=5) class1 <- c(rep(0,4),rep(1,4)) col1 <- ifelse(class1 == 0,"blue","red") class2 <- c(rep(1,3),rep(2,5)) col2 <- ifelse(class2 == 1,"yellow","green") heatmap.2(mat,col=greenred(75),ColSideColors=col1,trace="none", dendrogram = "column",labCol = NULL) --- How can I modify the code so that both col1 & col2 are displayed in the heatmap? thanks!
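[Editor's note] One commonly suggested workaround — an assumption on my part, not an answer from this thread — is the heatmap.plus package, whose `heatmap.plus()` accepts a *matrix* of column-side colors, one column per annotation bar:

```r
# sketch, assuming the CRAN package heatmap.plus is installed;
# heatmap.2() in gplots accepts only a single color vector here
library(heatmap.plus)
mat  <- matrix(sample(1:100, 40), nrow = 5)
col1 <- ifelse(c(rep(0, 4), rep(1, 4)) == 0, "blue", "red")
col2 <- ifelse(c(rep(1, 3), rep(2, 5)) == 1, "yellow", "green")
heatmap.plus(mat, ColSideColors = cbind(col1, col2))
```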
[R] abline on heatmap
Hi, Is there a way I can draw an abline on a heatmap? I try the abline function, but don't get the line. My sample code is: library(gplots) mat <- matrix(sample(1:100,40),nrow=5) heatmap.2(mat,col=greenred(75),trace="none", dendrogram = "column",labCol = NULL) abline(h=5,v=4) thanks!
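[Editor's note] `heatmap()`/`heatmap.2()` call `layout()` internally, so an `abline()` issued afterwards lands on the wrong panel. Both functions expose an `add.expr` argument that is evaluated right after the image is drawn; a sketch (line positions here are arbitrary examples):

```r
library(gplots)
mat <- matrix(sample(1:100, 40), nrow = 5)
# add.expr is evaluated inside the image panel, so its coordinates are
# (column index, row index) of the matrix *after* dendrogram reordering
heatmap.2(mat, col = greenred(75), trace = "none", dendrogram = "column",
          add.expr = abline(h = 2.5, v = 3.5, lwd = 2))
```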
[R] Calculating distance between spatial points
Dear List, I am trying to determine the speed an animal is traveling on each leg of a track. My data is in longitude and latitude, so I am using the package rgdal to convert it into a spatial points data frame and transform it to UTM. I would then like to find the difference between successive longitudes and latitudes, find the euclidean distance between points, and compute the speed of the animal on each leg. My problem is that once I convert the lat and long into a spatial points data frame I cannot access the lat and long individually. As far as I know I need to convert them in order to transform the lat and long to UTM. Is there a way I can call each variable separately in the sp dataframe? My code with example data is below. Any suggestions would be appreciated. library(rgdal) date.diff<-c(20,30,10,30) Long<-c(-156.0540 ,-156.0541 ,-156.0550 ,-156.0640) Lat<-c(19.73733,19.73734,19.73743,19.73833) SP<-data.frame(Long,Lat) SP<-SpatialPoints(SP,proj4string=CRS("+proj=longlat +ellps=WGS84")) SP.utm<-spTransform(SP, CRS("+proj=utm +zone=4 +ellps=WGS84")) long.diff<-diff(SP.utm$Long) lat.diff<-diff(SP.utm$Lat) d=(long.diff^2+lat.diff^2)^.5 speed=d/date.diff Aloha, Tim Tim Clark Department of Zoology University of Hawaii
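[Editor's note] `coordinates()` from sp extracts the coordinate matrix from a `SpatialPoints` object, which sidesteps the `$Long`/`$Lat` access problem. A sketch continuing the example above (note that `diff()` yields one value per leg — three legs for four points — so the time gaps must be aligned per leg):

```r
library(sp)
xy  <- coordinates(SP.utm)                       # two columns: easting, northing
leg <- sqrt(diff(xy[, 1])^2 + diff(xy[, 2])^2)   # euclidean length of each leg
speed <- leg / date.diff[seq_along(leg)]         # assumes per-leg time gaps
```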
[R] How should I denormalise a data frame list of lists column?
Hi, I have a data frame where one column is a list of lists. I would like to subset the data frame based on membership of the lists in that column and be able to 'denormalise' the data frame so that a row is duplicated for each of its list elements. Example code follows: # The data is read in in this form with the c2 list values in single strings which I then split to give lists: > f1 <- data.frame(c1=0:2, c2=c("A,B,C", "A,E", "F,G,H")) > f1$Split <- strsplit(as.character(f1$c2), ",") > f1 c1c2 Split 1 0 A,B,C A, B, C 2 1 A,EA, E 3 2 F,G,H F, G, H # So f1$Split is the list of lists column I want to denormalise or use as the subject for subsetting # f2 is data to use to select subsets from f1 > f2 <- data.frame(c1=LETTERS[0:8], c2=c("Apples", "Badger","Camel","Dog","Elephants","Fish","Goat","Horse")) > f2 c1 c2 1 AApple 2 B Badger 3 CCamel 4 D Dog 5 E Elephant 6 F Fish 7 G Goat 8 HHorse # I was able to find which rows of f2 are represented in the f1 lists (not entirely sure if this is the best way to do this): > f3 <- f2[f2$c1 %in% unlist(f1$Split),] > f3 c1 c2 1 AApple 2 B Badger 3 CCamel 5 E Elephant 6 F Fish 7 G Goat 8 HHorse # Note that 'D' is missing from f3 because it is not in any of the f1$Split lists # f4 is a subset of f3 and I want to find the rows of f1 where f1$Split contains any of f4$c1: > f4 <- f3[c(1,3),] > f4 c1c2 1 A Apple 3 C Camel # I tried this and it didn't work, presumably because it's trying to match against each list object rather than the list elements, but unlist doesn't do the trick here because I need the individual rows, I need to unlist on a row by row basis. > f1[f1$Split %in% f4$c1,] [1] c1c2Split <0 rows> (or 0-length row.names) > f1[f4$c1 %in% f1$Split,] [1] c1c2Split <0 rows> (or 0-length row.names) > f1[match(f4$c1, f1$Split),] c1 c2 Split NA NA NULL NA.1 NA NULL I also looked at reshape which I don't think helps. 
I thought I might be able to create a new data frame with the f1$Split denormalised and use that, but couldn't find a way to do this; the result I'd want there is something like:

> f1_denorm
  c1    c2   Split SplitDenorm
1  0 A,B,C A, B, C           A
2  0 A,B,C A, B, C           B
3  0 A,B,C A, B, C           C
4  1   A,E    A, E           A
5  1   A,E    A, E           E
6  2 F,G,H F, G, H           F
7  2 F,G,H F, G, H           G
8  2 F,G,H F, G, H           H

I thought perhaps for loops would be the next thing to try, but there must be a better way! Thanks for any help. Tim
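[Editor's note] A loop-free way to build the denormalised frame (a sketch, not a reply from the archive): repeat each row once per element of its split list, then bind the unlisted values alongside.

```r
# number of split elements per row drives how often each row repeats
n <- vapply(f1$Split, length, integer(1))
f1_denorm <- data.frame(f1[rep(seq_len(nrow(f1)), n), c("c1", "c2")],
                        SplitDenorm = unlist(f1$Split),
                        row.names = NULL)
# membership subsetting then reduces to a plain vector test:
f1_denorm[f1_denorm$SplitDenorm %in% f4$c1, ]
```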
[R] Getting the month out of my date as a number not characters
I have a data frame (hf) that is all set up and the dates are working fine - however I need to extract the months and hours (2 separate columns) as numbers - however they are coming out as characters. I have tried both the following: hf50$hour= hf50$date hf50$hour=format(hf50["hour"],"%H") and hf$month <- as.POSIXct(strptime(hf$date, format = "%m")) but they are still coming out as characters. Any ideas please? Thanks, Tim.
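[Editor's note] `format()` always returns character, whatever the format string; the usual fix is to wrap it in `as.integer()`. A sketch, assuming `hf$date` is already a POSIXct column:

```r
# format() gives "07", "18", ... as strings; as.integer() finishes the job
hf$month <- as.integer(format(hf$date, "%m"))
hf$hour  <- as.integer(format(hf$date, "%H"))
```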
Re: [R] Collinearity in Linear Multiple Regression
Actually, the CI index and VIF are just a start. It is best to look at what they call a matrix of "variance proportions" (found in SAS and a few other places...)--which hardly anyone understands (including the SAS folks). It is a matrix of estimates of what the variances of the regression coefficients would be if you could figure them out in the first place. It shows which factors dominate over others IN THE PARTICULAR SETUP you are analyzing. The matrix is often calculated using eigenvalues, but is best done with Singular Value Decomposition techniques (you don't have to have a square matrix, and you maintain better precision). Analysts will say that it can display an unstable system -- which is correct, but they generally say that, if it's true, you have bad data and should throw it out--or collect more. I suggest care, because it may be illustrating the nature of the system you are studying. The only decent reference that I know of is a little book (hard to read) that I can't remember off the top of my head. Have to look it up. Timothy E. Paysen, PhD Research Forester (ret.) From: John Sorkin To: Alex Roy ; r-help@r-project.org Sent: Tuesday, July 21, 2009 4:19:11 AM Subject: Re: [R] Collinearity in Linear Multiple Regression I suggest you start by doing some reading about Condition index (CI) and variance inflation factor (VIF). Once you have reviewed the theory, a search of search.r-project.org (under the help menu in a windows-based R installation) for VIF will help you obtain values for VIF, c.f. http://finzi.psych.upenn.edu/R/library/HH/html/vif.html John John David Sorkin M.D., Ph.D.
Chief, Biostatistics and Informatics University of Maryland School of Medicine Division of Gerontology Baltimore VA Medical Center 10 North Greene Street GRECC (BT/18/GR) Baltimore, MD 21201-1524 (Phone) 410-605-7119 (Fax) 410-605-7913 (Please call phone number above prior to faxing) >>> Alex Roy 7/21/2009 7:01 AM >>> Dear all, How can I test for collinearity in the predictor data set for multiple linear regression. Thanks Alex [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Confidentiality Statement: This email message, including any attachments, is for th...{{dropped:11}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
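[Editor's note] A minimal sketch of the VIF check suggested above, using `vif()` from the car package (the linked `HH::vif` behaves similarly); the data here are made up to manufacture a collinear pair:

```r
library(car)
set.seed(42)
d <- data.frame(x1 = rnorm(100))
d$x2 <- d$x1 + rnorm(100, sd = 0.1)   # nearly collinear with x1
d$x3 <- rnorm(100)
d$y  <- d$x1 + d$x3 + rnorm(100)
# large VIFs (conventionally > 5 or 10) flag the collinear predictors
vif(lm(y ~ x1 + x2 + x3, data = d))
```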
[R] Duplicated date values aren't duplicates
Dear list, I just had a function (as.ltraj in Adehabitat) give me the following error: "Error in as.ltraj(xy, id, date = da) : non unique dates for a given burst" I checked my dates and got the following: > dupes<-mydata$DateTime[duplicated(mydata$DateTime)] > dupes [1] (07/30/02 00:00:00) (08/06/03 17:45:00) Is there a reason different dates would come up as duplicate values? I would prefer not to have to delete them if I don't have to. Any suggestions on how to get R to realize they are different? Thanks, Tim Tim Clark Department of Zoology University of Hawaii
Re: [R] Duplicated date values aren't duplicates
Don and Jim, Thanks! I got it! Duplicated is only returning one of the two duplicated dates (the second date). It all makes sense now! Tim Tim Clark Department of Zoology University of Hawaii --- On Fri, 7/24/09, Don MacQueen wrote: > From: Don MacQueen > Subject: Re: [R] Duplicated date values aren't duplicates > To: "Tim Clark" , r-help@r-project.org > Date: Friday, July 24, 2009, 4:00 AM > Look at results of > > table( mydata$DateTime ) > > and I think you will see that some are duplicated. > Specifically, the > two in your dupes object. > > -Don > > At 5:50 PM -0700 7/23/09, Tim Clark wrote: > >Dear list, > > > >I just had a function (as.ltraj in Adehabitat) give me > the following error: > > > >"Error in as.ltraj(xy, id, date = da) : non unique > dates for a given burst" > > > >I checked my dates and got the following: > > > > > > dupes<-mydata$DateTime[duplicated(mydata$DateTime)] > >> dupes > >[1] (07/30/02 00:00:00) (08/06/03 17:45:00) > > > >Is there a reason different dates would come up as > duplicate values? > >I would prefer not to have to delete them if I don't > have to. Any > >suggestions on how to get R to realize they are > different? > > > >Thanks, > > > >Tim > > > > > > > >Tim Clark > >Department of Zoology > >University of Hawaii > > > >__ > >R-help@r-project.org > mailing list > >https://*stat.ethz.ch/mailman/listinfo/r-help > >PLEASE do read the posting guide > >http://*www.*R-project.org/posting-guide.html > >and provide commented, minimal, self-contained, > reproducible code. > > > -- > -- > Don MacQueen > Environmental Protection Department > Lawrence Livermore National Laboratory > Livermore, CA, USA > 925-423-1062 > -- > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
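[Editor's note] The behaviour that caused the confusion is easy to demonstrate: `duplicated()` flags only the second and later occurrences, so indexing with it returns just the repeats, not the pairs.

```r
x <- c("a", "b", "b", "c", "c")
duplicated(x)              # FALSE FALSE TRUE FALSE TRUE
x[duplicated(x)]           # "b" "c"  (the repeats only)
x[x %in% x[duplicated(x)]] # "b" "b" "c" "c"  (every copy of a repeated value)
```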
[R] Maximizing values in subsetted dataframe
Dear List, I am trying to sub-sample some data by taking a data point every x minutes. The data contains missing values, and I would like to take the sub-sample that maximizes the number of valid points in the sample. I.e. minimizes the number of NA's in the data set. For example, given the following: da<-seq(Sys.time(),by=1,length.out=10) x<-c(1,2,NA,4,NA,6,NA,8,9,10) mydata<-data.frame(da,x) If I wanted to take a subsample every 2 seconds, I would have the following two possible answers: answer1: 2,4,NA,8 answer2: 1,NA,NA,7 I would like a function that would choose between these and obtain the one with the fewest missing values. In my real dataset I have multiple variables collected every second and I would like to subsample it every 5, 10, and 15 minutes. I appreciate your help. Tim Tim Clark Department of Zoology University of Hawaii
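[Editor's note] One way to choose among the possible phase offsets (a sketch, not a tested utility from the thread): score every offset by its NA count and keep the winner.

```r
best_subsample <- function(x, every) {
  # candidate index sets, one per starting offset 1..every
  idx <- lapply(seq_len(every), function(o) seq(o, length(x), by = every))
  nas <- vapply(idx, function(i) sum(is.na(x[i])), integer(1))
  x[idx[[which.min(nas)]]]   # subsample with the fewest missing values
}
x <- c(1, 2, NA, 4, NA, 6, NA, 8, 9, 10)
best_subsample(x, 2)
```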
[R] creating MS Access query objects using RODBC
Hi - I'm trying to use R to create an MS Access query object. In particular, I would like to pass a given sql statement to a variety of Access files and have that sql statement saved as an Access Query in each db. Is this possible using R? I'm aware that I could use RODBC sqlQuery and write sql to make a table or that I could run the sql, extract it to R, and then use sqlSave to save the dataframe as a table in the db. thanks in advance, -- Tim Calkins 0406 753 997
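[Editor's note] The Jet engine does understand some DDL for stored queries, but whether it accepts it over ODBC depends on the driver and Access version — treat this purely as an untested sketch (file and table names are hypothetical):

```r
library(RODBC)
con <- odbcConnectAccess("mydb.mdb")   # hypothetical file name
# CREATE VIEW stores a simple SELECT as a query object in the database;
# Jet's ODBC driver may reject it, in which case ADO/DAO is needed instead
sqlQuery(con, "CREATE VIEW qrySales AS SELECT * FROM Sales WHERE amt > 0")
odbcClose(con)
```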
[R] xts off by one confusion or error
Hullo I may have missed something blindingly obvious here. I'm using xts to handle some timeseries data. I've got daily measurements for 100 years. If I try to reduce the error rate by taking means of each month, I'm getting what at first sight appears to be conflicting information. Here's a small subset to show the problem: A small set of data: > vv x 2010-02-01 6.1 2010-02-02 6.1 2010-02-03 6.0 2010-02-04 6.0 2010-02-05 6.0 2010-02-06 6.1 2010-02-07 6.1 2010-02-08 6.1 2010-02-09 6.1 2010-02-10 6.2 Aggregate: > aggregate (vv, as.yearmon (index (vv)), mean) Feb 2010 6.08 That's fine. But if I explicitly convert to xts (which the answer ought to be, so this should be a noop), the values shift back by one month: > xts (aggregate (vv, as.yearmon (index (vv)), mean)) x Jan 2010 6.08 Just to confirm the classes: > class (aggregate (vv, as.yearmon (index (vv)), mean)) [1] "zoo" > class (vv) [1] "xts" "zoo" And to confirm that as.yearmon is returning the right month: > as.yearmon (index (vv)) [1] "Feb 2010" "Feb 2010" "Feb 2010" "Feb 2010" "Feb 2010" "Feb 2010" [7] "Feb 2010" "Feb 2010" "Feb 2010" "Feb 2010" This run was on a stock Fedora 10 build: > version _ platform i386-redhat-linux-gnu arch i386 os linux-gnu system i386, linux-gnu status major 2 minor 10.0 year 2009 month 10 day26 svn rev50208 language R version.string R version 2.10.0 (2009-10-26) And from installed.packages (): xtsNA NA "GPL-3""2.10.0" zooNA NA "GPL-2""2.10.0" Any help gratefully received. Tim __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] xts off by one confusion or error
I find the following even more confusing as I thought that xts was a subclass of zoo and I'd expected that the conversion would have been more transparent > aggregate (vv, as.yearmon(index(vv)), mean) Feb 2010 6.08 > xts (aggregate (vv, as.yearmon(index(vv)), mean)) x Jan 2010 6.08 > zoo (aggregate (vv, as.yearmon(index(vv)), mean)) x Feb 2010 6.08 On 8 Apr 2010, at 15:53, Tim Coote wrote: Hullo I may have missed something blindingly obvious here. I'm using xts to handle some timeseries data. I've got daily measurements for 100 years. If I try to reduce the error rate by taking means of each month, I'm getting what at first sight appears to be conflicting information. Here's a small subset to show the problem: A small set of data: > vv x 2010-02-01 6.1 2010-02-02 6.1 2010-02-03 6.0 2010-02-04 6.0 2010-02-05 6.0 2010-02-06 6.1 2010-02-07 6.1 2010-02-08 6.1 2010-02-09 6.1 2010-02-10 6.2 Aggregate: > aggregate (vv, as.yearmon (index (vv)), mean) Feb 2010 6.08 That's fine. But if I explicitly convert to xts (which the answer ought to be, so this should be a noop), the values shift back by one month: > xts (aggregate (vv, as.yearmon (index (vv)), mean)) x Jan 2010 6.08 Just to confirm the classes: > class (aggregate (vv, as.yearmon (index (vv)), mean)) [1] "zoo" > class (vv) [1] "xts" "zoo" And to confirm that as.yearmon is returning the right month: > as.yearmon (index (vv)) [1] "Feb 2010" "Feb 2010" "Feb 2010" "Feb 2010" "Feb 2010" "Feb 2010" [7] "Feb 2010" "Feb 2010" "Feb 2010" "Feb 2010" This run was on a stock Fedora 10 build: > version _ platform i386-redhat-linux-gnu arch i386 os linux-gnu system i386, linux-gnu status major 2 minor 10.0 year 2009 month 10 day26 svn rev50208 language R version.string R version 2.10.0 (2009-10-26) And from installed.packages (): xtsNA NA "GPL-3""2.10.0" zooNA NA "GPL-2""2.10.0" Any help gratefully received. 
Tim
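[Editor's note] A way to stay inside xts for the whole computation (an alternative sketch, not a reply from the thread): `apply.monthly()` returns an xts object directly, so the zoo-to-xts round trip that shifts the index never happens.

```r
library(xts)
days <- seq(as.Date("2010-02-01"), by = "day", length.out = 10)
vv <- xts(c(6.1, 6.1, 6.0, 6.0, 6.0, 6.1, 6.1, 6.1, 6.1, 6.2),
          order.by = days)
# endpoints are month-ends; the result keeps class "xts"
apply.monthly(vv, mean)
```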
[R] Best subset of models for glm.nb()
Dear List, I am looking for a function that will find the best subset of negative binomial models. I have a large data set with 15 variables that I am interested in. I want an easy way to run all possible models and find a subset of the "best" models that I can then look at in more detail. I have found two functions that seem to provide what I am looking for, but am not sure which one (if either) is appropriate. glmulti() in package glmulti does an exhaustive search of all models and gives a number of candidate models to choose from based on your choice of Information Criterion. This seems to be exactly what I am after, but I found nothing about it on this list, which makes me think there is some reason no one is using it. gl1ce() in package lasso2 uses the least absolute shrinkage and selection operator (lasso) to do something. I found it at another thread: http://tolstoy.newcastle.edu.au/R/help/05/03/0121.html I did not understand the paper it was based on, and want to know if it even does what I am interested in before investing a lot of time in trying to understand it. Yes, I have read about the problems with stepwise algorithms and am looking for a valid alternative for narrowing down models when you have a lot of data and a large number of variables you're interested in. Any thoughts on either of these methods? Or should I be doing something else? Thanks for your help, Tim Tim Clark Department of Zoology University of Hawaii
[R] Periodic regression - lunar percent cover
Dear List, I am trying to include a lunar variable in a model and am having problems figuring out the correct way to include it. I want to convert the percent lunar illumination (fraction of moon showing) to a combination of sin and cos variables to account for the periodic nature of the lunar cycle. Would someone let me know if I am doing this correctly? I have included the first 20 values from my dataset as an example. Y is count data and lp is the lunar percent cover. The lunar period is 29.53. y<-c(1, 3, 0, 0, 0, 0, 2, 4, 0, 1, 0, 5, 3, 2, 4, 2, 0, 1, 3, 5) lp<-c(0.80, 0.88, 0.62, 0.19, 0.21, 0.01, 0.70, 1.00, 0.88, 0.04, 0.70, 0.93, 0.23, 0.99, 0.19, 0.79, 1.00, 0.03, 0.01, 0.00) g1<-glm(y~cos((2*pi*lp)/29.530589)+sin((2*pi*lp)/29.530589)) Thanks, Tim Tim Clark Department of Zoology University of Hawaii
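[Editor's note — my own reading, not a list answer] `lp` is an illumination *fraction* in [0, 1], not a position in the 29.53-day cycle, so dividing it by 29.530589 compresses everything into a sliver of the circle. If the day-of-cycle were available (call it `lday`, a hypothetical variable here), the harmonic terms would look like:

```r
period <- 29.530589
# lday: hypothetical day within the current lunar cycle, 0 <= lday < period
set.seed(1)
lday <- runif(20, 0, period)
y <- c(1, 3, 0, 0, 0, 0, 2, 4, 0, 1, 0, 5, 3, 2, 4, 2, 0, 1, 3, 5)
g1 <- glm(y ~ cos(2 * pi * lday / period) + sin(2 * pi * lday / period),
          family = poisson)
```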
[R] Estimating theta for negative binomial model
Dear List, I am trying to do model averaging for a negative binomial model using the package AICcmodavg. I need to use glm() since the package does not accept glm.nb() models. I can get glm() to work if I first run glm.nb and take theta from that model, but is there a simpler way to estimate theta for the glm model? The two models are: mod.nb<-glm.nb(mantas~site,data=mydata) mod.glm<-glm(mantas~site,data=mydata, family=negative.binomial(mod.nb$theta)) How else can I get theta for the family=negative.binomial(theta=???) Thanks! Tim Tim Clark Department of Zoology University of Hawaii
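[Editor's note] MASS also exports `theta.ml()`, which estimates theta by maximum likelihood from a response and fitted means, so a quick Poisson fit can seed it without a full `glm.nb()` run — a sketch using the thread's variable names:

```r
library(MASS)
# rough means from a Poisson fit, then ML estimate of theta from them
fit0 <- glm(mantas ~ site, data = mydata, family = poisson)
th   <- theta.ml(mydata$mantas, fitted(fit0))
mod  <- glm(mantas ~ site, data = mydata,
            family = negative.binomial(th))
```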
[R] pipe data from plot(). was: ROCR.plot methods, cross validation averaging
All, I'm trying again with a slightly more generic version of my first question. I can extract the plotted values from hist(), boxplot(), and even plot.randomForest(). Observe: # get some data dat <- rnorm(100) # grab histogram data hdat <- hist(dat) hdat #provides details of the hist output #grab boxplot data bdat <- boxplot(dat) bdat #provides details of the boxplot output # the same works for randomForest library(randomForest) data(mtcars) RFdat <- plot(randomForest(mpg ~ ., mtcars, keep.forest=FALSE, ntree=100), log="y") RFdat ##But, I can't use this method in ROCR library(ROCR) data(ROCR.xval) RCdat <- plot(perf, avg="threshold") RCdat ## output: NULL Does anyone have any tricks for piping or extracting these data? Or, perhaps for steering me in another direction? Thanks, Tim From: "Tim Howard" Subject: [R] ROCR.plot methods, cross validation averaging To: , , Message-ID: <4aba1079.6d16.00d...@gw.dec.state.ny.us> Content-Type: text/plain; charset=US-ASCII Dear R-help and ROCR developers (Tobias Sing and Oliver Sander) - I think my first question is generic and could apply to many methods, which is why I'm directing this initially to R-help as well as Tobias and Oliver. Question 1. The plot function in ROCR will average your cross validation data if asked. I'd like to use that averaged data to find a "best" cutoff but I can't figure out how to grab the actual data that get plotted. A simple redirect of the plot (such as test <- plot(mydata)) doesn't do it. Question 2. I am asking ROCR to average lists with varying lengths for each list entry. See my example below. None of the ROCR examples have data structured in this manner. Can anyone speak to whether the averaging methods in ROCR allow for this? If I can't easily grab the data as desired from Question 1, can someone help me figure out how to average the lists, by threshold, similarly? Question 3. If my cross validation data happen to have a list entry whose length = 2, ROCR errors out. 
Please see the second part of my example. Any suggestions?

#reproducible examples exemplifying my questions
##part one##
library(ROCR)
data(ROCR.xval)
# set up data so it looks more like my real data
sampSize <- c(4, 55, 20, 75, 350, 250, 6, 120, 200, 25)
testSet <- ROCR.xval
# do the extraction
for (i in 1:length(ROCR.xval[[1]])){
y <- sample(c(1:350),sampSize[i])
testSet$predictions[[i]] <- ROCR.xval$predictions[[i]][y]
testSet$labels[[i]] <- ROCR.xval$labels[[i]][y]
}
# now massage the data using ROCR, set up for a ROC plot
# if it errors out here, run the above sample again.
pred <- prediction(testSet$predictions, testSet$labels)
perf <- performance(pred,"tpr","fpr")
# create the ROC plot, averaging by cutoff value
plot(perf, avg="threshold")
# check out the structure of the data
str(perf)
# note the ragged edges of the list and that I assume averaging
# whether it be vertical, horizontal, or threshold, somehow
# accounts for this?
## part two ##
# add a list entry with only two values
perf@x.values[[1]] <- c(0,1)
perf@y.values[[1]] <- c(0,1)
perf@alpha.values[[1]] <- c(Inf,0)
plot(perf, avg="threshold")
##output results in an error with this message
# Error in if (from == to) rep.int(from, length.out) else as.vector(c(from, :
# missing value where TRUE/FALSE needed

Thanks in advance for your help Tim Howard New York Natural Heritage Program
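[Editor's note] `plot()` on a ROCR performance object returns NULL by design; the per-fold curves never leave the object's slots. A rough, untested sketch of pulling them out and threshold-averaging by interpolation (finite cutoffs only):

```r
library(ROCR)
data(ROCR.xval)
pred <- prediction(ROCR.xval$predictions, ROCR.xval$labels)
perf <- performance(pred, "tpr", "fpr")

# common grid of finite cutoffs across all folds
u   <- unlist(perf@alpha.values)
thr <- sort(unique(u[is.finite(u)]))
avg_at <- function(vals, alphas) {
  rowMeans(mapply(function(v, a) {
    ok <- is.finite(a)
    approx(a[ok], v[ok], xout = thr, rule = 2)$y
  }, vals, alphas))
}
fpr <- avg_at(perf@x.values, perf@alpha.values)
tpr <- avg_at(perf@y.values, perf@alpha.values)
plot(fpr, tpr, type = "l")   # should resemble plot(perf, avg = "threshold")
```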
Re: [R] pipe data from plot(). was: ROCR.plot methods, cross validation averaging
Whoops, sorry. Here is the full set with the missing lines:

library(ROCR)
data(ROCR.xval)
pred <- prediction(ROCR.xval$predictions, ROCR.xval$labels)
perf <- performance(pred, "tpr", "fpr")
RCdat <- plot(perf, avg="threshold")
RCdat

Thanks.
Tim

>>> David Winsemius 9/24/2009 9:25 AM >>>
On Sep 24, 2009, at 9:09 AM, Tim Howard wrote:

> ## But, I can't use this method in ROCR
> library(ROCR)
> data(ROCR.xval)
> RCdat <- plot(perf, avg="threshold")

That code throws an object not found error. Perhaps you defined perf earlier?

David

> [remainder of the original message, quoted in full; snipped]

David Winsemius, MD
Heritage Laboratories
West Hartford, CT
Re: [R] pipe data from plot(). was: ROCR.plot methods, cross validation averaging
David,

Thank you for your reply. Yes, I can access the y-values slot with perf@y.values but, note that in the cross-validation example (ROCR.xval), the plot function averages across the list of ten vectors in the y-values slot. I might be able to create a function to average across these ten vectors, but, since the plot function already does it for me, I thought it most efficient to get the values from the function. The compounding factor is that the averaging needs to incorporate some kind of complex (to me at least) equalization based on the third slot (alpha.values). I don't know how to average vectors (especially uneven-length vectors) that align using the alpha values (suggestions here welcome!). Again, the plot function does this for me... if I could just get those values.

Tobias,

Your suggestion to change the plot.performance function is a good one. I'll see if I can get in there and tweak it.

Thanks to both of you for the help.
Tim

>>> David Winsemius 9/24/2009 9:43 AM >>>
On Sep 24, 2009, at 9:09 AM, Tim Howard wrote:

> Does anyone have any tricks for piping or extracting these data?
> Or, perhaps for steering me in another direction?

After looking at the examples in ROCR, my guess is that you really ought to examine the perf object itself. It's an S4 object, so access to the internals is a bit different. In the example performance object I just created, the y-values slot would be obtainable with:

perf@y.values

There is also help from:

?"plot-methods"

-- David

> [remainder of the original message, quoted in full; snipped]
Re: [R] pipe data from plot(). was: ROCR.plot methods, cross validation averaging
Yes, that's exactly what I am after. Thank you for clarifying my problem for me! I'll try to dive into the plot.performance function.

Best,
Tim

>>> Tobias Sing 9/24/2009 9:57 AM >>>
Tim,

if I understand correctly, you are trying to get the numerical values of averaged cross-validation curves. Unfortunately the plot function of ROCR does not return anything in the current version (it's a good suggestion to change this). If you want a quick fix, you could change the plot.performance function of ROCR to return the values you wanted.

Kind regards,
Tobias

On Thu, Sep 24, 2009 at 3:09 PM, Tim Howard wrote:
> [original message, quoted in full; snipped]
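For anyone wanting the averaged numbers without patching ROCR's plot.performance, the kind of "equalization based on alpha.values" discussed above can be approximated by hand: interpolate each fold's x and y values onto a common grid of cutoffs, then average across folds. Below is a minimal sketch (not ROCR's exact algorithm) using plain lists in place of the S4 slots; the function name and the toy fold data are my own invention for illustration.

```r
# Hand-rolled threshold averaging: each fold i has cutoffs alpha.values[[i]]
# with matching x.values[[i]] and y.values[[i]] (as in a ROCR performance
# object). Interpolate every fold onto one shared cutoff grid, then average.
avg_by_threshold <- function(x.values, y.values, alpha.values) {
  # common cutoff grid: every cutoff seen in any fold, high to low
  grid <- sort(unique(unlist(alpha.values)), decreasing = TRUE)
  interp <- function(vals) {
    # one column per fold; rule = 2 clamps beyond a fold's own cutoff range
    sapply(seq_along(vals), function(i)
      approx(alpha.values[[i]], vals[[i]], xout = grid, rule = 2)$y)
  }
  list(alpha = grid,
       x = rowMeans(interp(x.values)),   # averaged x (e.g. fpr) per cutoff
       y = rowMeans(interp(y.values)))   # averaged y (e.g. tpr) per cutoff
}

# two toy "folds" with different, finite cutoffs
xs <- list(c(0, 0.5, 1),  c(0, 0.25, 1))
ys <- list(c(0, 0.8, 1),  c(0, 0.6,  1))
as <- list(c(0.9, 0.5, 0.1), c(0.8, 0.5, 0.2))
res <- avg_by_threshold(xs, ys, as)
```

The averaged curve `cbind(res$x, res$y)` is then ordinary data: you can scan it for the cutoff `res$alpha` that best trades off tpr against fpr.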
[R] col headers in read.table()
Hi,

I was trying to read in a file test.txt, which has the following tab-separated data:

	norm	norm	norm	class	class	class
a	1	2	3	4	5	6
b	3	4	5	6	7	8
c	5	6	7	8	9	10

In my R code, I do the following:

> mat <- read.table('test.txt', header=T, row.names=1, sep='\t')
> mat
  norm norm.1 norm.2 class class.1 class.2
a    1      2      3     4       5       6
b    3      4      5     6       7       8
c    5      6      7     8       9      10

What do I need to do so that I don't get 'norm.1', 'norm.2' etc., but just 'norm', 'norm', ..., i.e. without the numbers?

thanks,
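The numbered names come from read.table's default check.names = TRUE, which runs the headers through make.names() to make them unique; turning it off keeps the duplicated headers verbatim. A small self-contained sketch (textConnection stands in for test.txt; the layout is my assumption from the post, with the header row one field shorter than the data rows so the first column becomes row names):

```r
# Stand-in for test.txt: duplicated column headers, tab-separated
txt <- "norm\tnorm\tnorm\tclass\tclass\tclass
a\t1\t2\t3\t4\t5\t6
b\t3\t4\t5\t6\t7\t8
c\t5\t6\t7\t8\t9\t10"

# default: check.names = TRUE mangles duplicates via make.names()
mangled <- read.table(textConnection(txt), header = TRUE, sep = "\t")
names(mangled)  # "norm" "norm.1" "norm.2" "class" "class.1" "class.2"

# check.names = FALSE keeps the repeated headers as-is
mat <- read.table(textConnection(txt), header = TRUE, sep = "\t",
                  check.names = FALSE)
names(mat)      # "norm" "norm" "norm" "class" "class" "class"
```

Duplicated names are legal in a data frame, but note that `mat$norm` will then only ever find the first match; index by position when the distinction matters.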
[R] Data formatting for matplot
Dear List,

I am wanting to produce a multiple line plot, and know I can do it with matplot but can't get my data in the format I need. I have a dataframe with three columns: individual ID, x, and y. I have tried split() but it gives me a list of matrices, which is closer but not quite what I need. For example:

id <- rep(seq(1,5,1), length.out=100)
x <- rnorm(100,5,1)
y <- rnorm(100,20,5)

mydat <- data.frame(id,x,y)
split.dat <- split(mydat[,2:3], mydat[,1])

I would appreciate your help in either how to get this into a format acceptable to matplot or other options for creating a multiple line plot.

Thanks,

Tim

Tim Clark
Department of Zoology
University of Hawaii
Re: [R] Data formatting for matplot
Henrique,

Thanks for the suggestion. I think I may not understand matplot() because the graph did not come out like it should have. Gabor suggested:

library(lattice)
xyplot(y ~ x, mydat, groups = id)

Which gave what I was looking for. Is there a way to get matplot() to give the same graph? I don't have to use matplot(), but would like to understand its use.

Thanks,

Tim

Tim Clark
Department of Zoology
University of Hawaii

--- On Sun, 9/27/09, Henrique Dallazuanna wrote:

> You can try this:
>
> matplot(do.call(cbind, split.dat))
>
> [original question, quoted in full; snipped]

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
Re: [R] Data formatting for matplot
Thanks for everyone's help. It is great to have a number of options that result in the same graph.

Aloha,

Tim

Tim Clark
Department of Zoology
University of Hawaii

--- On Mon, 9/28/09, Henrique Dallazuanna wrote:

> Tim,
>
> With Gabor's examples, I understand this.
>
> You can get a similar graph with plot:
>
> with(mydat, plot(x, y, col = id))
>
> [earlier messages, quoted in full; snipped]

--
Henrique Dallazuanna
Curitiba-Paraná-Brasil
25° 25' 40" S 49° 16' 22" O
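To close the loop on the matplot route: matplot wants one column per series, so the list that split() returns can be bound into x and y matrices with do.call(cbind, ...). A sketch under the assumption (true for the example data) that every id occurs equally often; ragged groups would need NA-padding instead:

```r
# Long id/x/y data, as in the original example
id <- rep(seq(1, 5, 1), length.out = 100)
x  <- rnorm(100, 5, 1)
y  <- rnorm(100, 20, 5)

# One column per id: works because each id occurs exactly 20 times
xm <- do.call(cbind, split(x, id))  # 20 x 5 matrix of x values
ym <- do.call(cbind, split(y, id))  # matching 20 x 5 matrix of y values

# matplot pairs column i of xm with column i of ym
matplot(xm, ym, type = "p", pch = 19, col = 1:5)
```

Note that with random x values the "lines" of a type = "l" plot would zig-zag; sorting each column of xm (and reordering ym to match) gives sensible line plots.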
[R] xyplot help - colors and break in plot
Dear List,

I am new to lattice plots, and am having problems with getting my plot to do what I want. Specifically:

1. I would like the legend to have the same symbols as the plot. I tried simpleKey but can't seem to get it to work with auto.key. Right now my plot has dots (pch=19) and my legend shows circles.

2. I have nine groups but xyplot seems to only be using seven colors, so two groups have the same color. How do I get a range of nine colors?

3. I have one group whose y range is much greater than all the others. I would like to split the plot somehow so that the bottom part shows ylim=c(0,200) and the top shows ylim=c(450,550). Is this possible?

What I have so far is:

library(lattice)
xyplot(m.dp.area$Area.km2 ~ m.dp.area$DataPoint, m.dp.area, groups = m.dp.area$Manta,
       main = "Cumulative area of 100% MCP",
       xlab = "Data Point",
       ylab = "MCP Area",
       ylim = c(0,150),
       scales = list(tck = c(1, 0)),  # removes tics on top and right axis
       pch = 19, cex = .4,
       auto.key = list(title = "Mantas", x = .05, y = .95,
                       corner = c(0,1), border = TRUE))  # legend

Thanks,

Tim

Tim Clark
Department of Zoology
University of Hawaii
Re: [R] xyplot help - colors and break in plot
Felix,

Thanks, that did the trick! Lattice is a lot less intuitive than basic plotting! Also, another person suggested using gap.plot from the plotrix package to put a break in the graph. I am surprised lattice doesn't have something similar, since it seems like a common problem when you have data that groups in clusters separated by a large range.

Aloha,

Tim

Tim Clark
Department of Zoology
University of Hawaii

--- On Mon, 9/28/09, Felix Andrews wrote:

2009/9/29 Tim Clark:
> 1. I would like the legend to have the same symbols as the plot. [...]

Rather than the pch = 19 argument, use

par.settings = simpleTheme(pch = 19, cex = .4)

> 2. I have nine groups but xyplot seems to only be using seven colors [...]

Yes, in the default theme, there are seven colours: see

trellis.par.get("superpose.symbol")

You can change the set of colours yourself by modifying that list (via trellis.par.set). An easier option is to use one of the predefined ColorBrewer palettes, with custom.theme() from the latticeExtra package, or just simpleTheme(). See ?brewer.pal (RColorBrewer package). You will see there are a few qualitative color palettes with 9 or more colours, e.g.

brewer.pal(9, "Set1")
brewer.pal(12, "Set3")

> 3. I have one group whose y range is much greater than all the others. [...]

Yes... in the absence of a reproducible example, maybe something like

xyplot(Area.km2 ~ DataPoint | (Area.km2 > 200), m.dp.area,
       groups = Manta,
       scales = list(y = "free"))

or

AreaRange <- shingle(Area.km2, rbind(c(0,200), c(450,550)))
xyplot(Area.km2 ~ DataPoint | AreaRange, m.dp.area,
       groups = Manta, scales = list(y = "free"))

> [remainder of the original message, quoted in full; snipped]

--
Felix Andrews / 安福立
Postdoctoral Fellow
Integrated Catchment Assessment and Management (iCAM) Centre
Fenner School of Environment and Society [Bldg 48a]
The Australian National University
Canberra ACT 0200 Australia
M: +61 410 400 963
T: +61 2 6125 1670
E: felix.andr...@anu.edu.au
CRICOS Provider No. 00120C
--
http://www.neurofractal.org/felix/
[R] bwplot scales in alphabetical order
Dear List,

I know this has been covered before, but I don't seem to be able to get it right. I am constructing a boxplot in lattice and can't get the scales in the correct alphabetical order. I have already read that this is due to the way factors are treated, and I have to redefine the levels of the factors. However, I have failed. As a simple example:

library(lattice)
id <- rep(letters[1:9], each=20)
x <- rep(seq(1:10), each=18)
y <- rnorm(180,50,20)

# reverse alphabetical order
bwplot(y~x|id, horizontal=FALSE)

# alphabetical order reading right to left
id <- factor(id, levels = sort(id, decreasing = TRUE))
bwplot(y~x|id, horizontal=FALSE)

It appears that bwplot plots scales from the bottom left to the top right. If so my factor levels would need to be levels=c(7,8,9,4,5,6,1,2,3). I tried that but can't seem to get the factor function to work.

# Did not work!
id <- factor(id, levels=c(7,8,9,4,5,6,1,2,3), lables=letters[1:9])

Your help would be greatly appreciated.

Tim

Tim Clark
Department of Zoology
University of Hawaii
Re: [R] bwplot scales in alphabetical order
Peter,

Thanks, that did it!

Tim

Tim Clark
Department of Zoology
University of Hawaii

--- On Wed, 9/30/09, Peter Ehlers wrote:

Tim,

Add the argument as.table=TRUE to your call:

bwplot(y~x|id, horizontal=FALSE, as.table=TRUE)

Peter Ehlers

Tim Clark wrote:
> [original message, quoted in full; snipped]
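A footnote on why the factor() attempt in the question failed, independent of the as.table fix: levels= must name the actual level values, not their positions, and the keyword is spelled labels, not lables. A base-R sketch of the two behaviours (level values and ordering chosen here just for illustration):

```r
id <- rep(letters[1:9], each = 20)

# Numeric positions are coerced to "7","8",... which match none of the
# letter values, so every element silently becomes NA:
bad <- factor(id, levels = c(7, 8, 9, 4, 5, 6, 1, 2, 3))
all(is.na(bad))  # TRUE

# Name the letters themselves to impose an explicit panel order
# (bottom-left-first, matching lattice's default layout direction):
ok <- factor(id, levels = c("g","h","i","d","e","f","a","b","c"))
levels(ok)  # "g" "h" "i" "d" "e" "f" "a" "b" "c"
```

With either this reordering or as.table=TRUE, `bwplot(y ~ x | ok, horizontal = FALSE)` lays the panels out alphabetically.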
[R] Paste a character to an object
Dear List,

I can't seem to get a simple paste function to work like I need. I have an object I need to call, but it ends in a character string. The object is a list of home range values for a range of percent isopleths. I need to loop through a vector of percent values, so I need to paste the percent as a character on the end of the object variable. I have no idea why the percent is in character form, and I can't use a simple index value (homerange[[1]]$polygons[100]) because there are a variable number of isopleths that are calculated and [100] will not always correspond to "100". So I am stuck.

What I want is:

homerange[[1]]$polygons$"100"

What I need is something like the following, but that works:

percent <- c("100","75","50")
p = 1
paste(homerange[[1]]$polygons$, percent[p], sep="")

Thanks for the help,

Tim

Tim Clark
Department of Zoology
University of Hawaii
Re: [R] Paste a character to an object
David,

Thanks, that helps me in making an example of what I am trying to do. Given the following example, I would like to run through a for loop and obtain a vector of the data only for the 100, 75, and 50 percent values. Is there a way to get this to work, either using paste as in the example below or some other method?

homerange <- list()
homerange[[1]] <- "test"
homerange[[1]]$polygons <- "test2"
homerange[[1]]$polygons$`100` <- rnorm(20,10,1)
homerange[[1]]$polygons$`90` <- rnorm(20,10,1)
homerange[[1]]$polygons$`75` <- rnorm(20,10,1)
homerange[[1]]$polygons$`50` <- rnorm(20,10,1)

xx <- c()
percent <- c("100","75","50")
for (i in 1:length(percent))
{
  x <- paste(homerange[[1]]$polygons$, percent[i])  # This does not work!!!
  xx <- rbind(x, xx)
}

The x <- paste(...) in this loop does not work, and that is what I am stuck on. The result should be a vector of the values for the "100", "75", and "50" levels, but not the "90" level.

Aloha,

Tim

Tim Clark
Department of Zoology
University of Hawaii

--- On Sat, 10/3/09, David Winsemius wrote:

On Oct 3, 2009, at 10:26 PM, Tim Clark wrote:
> [original message, quoted in full; snipped]

Not a reproducible example, but here is some code that shows that it is possible to construct names that would otherwise be invalid due to having numerals as a first character by using back-quotes:

> percent <- c("100","75","50")
> p = 1
> paste(homerange[[1]]$polygons$, percent[p], sep="")
Error: syntax error
> homerange <- list()
> homerange[[1]] <- "test"
> homerange[[1]]$polygons <- "test2"
Warning message:
In homerange[[1]]$polygons <- "test2" : Coercing LHS to a list
> homerange
[[1]]
[[1]][[1]]
[1] "test"

[[1]]$polygons
[1] "test2"

> homerange[[1]]$polygons$`100` <- percent[1]
Warning message:
In homerange[[1]]$polygons$`100` <- percent[1] : Coercing LHS to a list
> homerange[[1]]$polygons$`100`
[1] "100"

--David Winsemius
Re: [R] Paste a character to an object
David, Thanks! You just gave me the answer. All I had to do was:

xx<-c()
for (i in c('100', '75', '50') )
{
x<-homerange[[1]]$polygons[[i]] ; xx<-rbind(x,xx)
}
xx

I didn't know you could use characters as index values in a for loop, or that you could use characters in double brackets instead of using the $ symbol. homerange[[1]]$polygons[['100']] is the same as homerange[[1]]$polygons$`100`. The list is actually the output of the NNCH function in Adehabitat. I thought about changing the function first, but looked at the code and couldn't figure it out. I knew there had to be an easier way. I greatly appreciate all your help, Tim Tim Clark Department of Zoology University of Hawaii --- On Sat, 10/3/09, David Winsemius wrote: > From: David Winsemius > Subject: Re: [R] Paste a character to an object > To: "Tim Clark" > Cc: r-help@r-project.org > Date: Saturday, October 3, 2009, 5:43 PM > > On Oct 3, 2009, at 11:14 PM, Tim Clark wrote: > > > David, > > > > Thanks, that helps me in making an example of what I > am trying to do. Given the following example, I would > like to run through a for loop and obtain a vector of the > data only for the 100, 75, and 50 percent values. Is > there a way to get this to work, either using paste as in > the example below or some other method? > > > > homerange <- list() > > homerange[[1]] <- "test" > > homerange[[1]]$polygons <- "test2" > > homerange[[1]]$polygons$`100` <- rnorm(20,10,1) > > homerange[[1]]$polygons$`90` <- rnorm(20,10,1) > > homerange[[1]]$polygons$`75` <- rnorm(20,10,1) > > homerange[[1]]$polygons$`50` <- rnorm(20,10,1) > > > > xx<-c() > > percent<-c("100","75","50") > > for (i in 1:length(percent)) > > { > > x<-paste(homerange[[1]]$polygons$ > , percent[i]) #This does not work!!! > > > ^?^ > And why _would_ you expect an expression ending in a "$" to > be acceptable to the parser? You did not put quotes around > it so the interpreter tried to evaluate it.
> You are probably looking for the capabilities of the functions get and assign, which take a string variable and either get the object named by a string or assign a value to an object so named.
>
> But why are you intent on causing yourself all this pain? (Not to mention asking questions I cannot answer.) Working with expressions involving backquotes is a recipe for hair-pulling and frustration for us normal mortals. Why not call your lists "p100", "p90", "p75", "p50"? Then everything is simple:
>
> > xx<-c()
> > percent<-c(100, 75, 50)
> > for (i in c("p100", "p75", "p50") )
> + {
> + x<-homerange[[1]]$polygons[[i]] ; xx<-rbind(x,xx) # could have simplified this
> + }
> > xx
>        [,1]     [,2]     [,3]      [,4]     [,5]      [,6]      [,7]      [,8]     [,9]
> x  9.660935 10.46526 10.75813  8.866064  9.967950  9.987941 10.757160 10.180826 9.992162
> x 11.674645 10.51753 10.88061 10.515120  9.440838 11.460845 12.033612  9.318392 9.592026
> x 10.057021 10.14339 10.29757  9.164233  8.977280  9.733971  9.965002  9.693649 9.430043
>       [,10]     [,11]     [,12]     [,13]     [,14]     [,15]     [,16]     [,17]    [,18]
> x  11.78904 9.437353 11.910747 10.996167 11.631264  9.386944  9.602160 10.498921  9.09349
> x   9.11036 9.546378 11.030323  9.715164  9.500268 11.762440  9.101104  9.610251 10.56210
> x   9.62574 12.738020 9.146863 10.497626 10.485520 11.644503 10.303581 11.340263 11.34873
>       [,19]     [,20]
> x 10.146955  9.640136
> x  9.334912 10.101603
> x  8.710609 11.265633
>
> > The x<-paste(...) in this function does not work, and that is what I am stuck on. The result should be a vector of the values for the "100", "75", and "50" levels, but not the "90" level.
> > > > Aloha, > > > > Tim Clark > > Department of Zoology > > University of Hawaii > > > > > > --- On Sat, 10/3/09, David Winsemius > wrote: > > > >> From: David Winsemius > >> Subject: Re: [R] Paste a character to an object > >> To: "Tim Clark" > >> Cc: r-help@r-project.org > >> Date: Saturday, October 3, 2009, 4:45 PM > >> > >> On Oct 3, 2009, at 10:26 PM, Tim Clark wrote: > >> > >>> Dea
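For the archives, the pattern that resolved this thread — indexing a list by a character name with `[[` instead of trying to paste onto `$` — can be sketched on made-up data (the `polygons` values below are stand-ins, not real NNCH output):

```r
# Hypothetical stand-in for the NNCH output: a list of polygon values
# keyed by isopleth percent, where the names are character strings.
homerange <- list(list(polygons = list(`100` = c(1, 2),
                                       `90`  = c(3, 4),
                                       `75`  = c(5, 6),
                                       `50`  = c(7, 8))))

# `$` needs a literal name, but `[[` accepts a character variable,
# so we can loop over the percents we want and skip the rest ("90").
percent <- c("100", "75", "50")
xx <- NULL
for (p in percent) {
  xx <- rbind(xx, homerange[[1]]$polygons[[p]])
}
rownames(xx) <- percent
xx
```

`homerange[[1]]$polygons[["100"]]` and `homerange[[1]]$polygons$`100`` are equivalent; the `[[` form is what makes a computed (character) index possible.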
[R] Satellite ocean color palette?
Dear List, Is there a color palette available similar to what is used in satellite ocean color imagery? I.e. a gradient with blue on one end and red on the other, with yellow in the middle? I have tried topo.colors(n) but that comes out more yellow on the end. I am looking for something similar to what is found on the CoastWatch web page: http://oceanwatch.pifsc.noaa.gov/imagery/GA2009281_2009282_sst_2D_eddy.jpg Thanks! Tim Tim Clark Department of Zoology University of Hawaii
Re: [R] Satellite ocean color palette?
Thanks! The colorRampPalette() did just what I need. Tim Tim Clark Department of Zoology University of Hawaii --- On Fri, 10/9/09, Barry Rowlingson wrote: > From: Barry Rowlingson > Subject: Re: [R] Satellite ocean color palette? > To: "Tim Clark" > Cc: r-help@r-project.org > Date: Friday, October 9, 2009, 9:06 AM > On Fri, Oct 9, 2009 at 7:51 PM, Tim > Clark > wrote: > > Dear List, > > > > Is there a color palette avaliable similar to what is > used in satellite ocean color imagery? I.e. a gradient > with blue on one end and red on the other, with yellow in > the middle? I have tried topo.colors(n) but that comes out > more yellow on the end. I am looking for something similar > to what is found on the CoastWatch web page: > > > > http://oceanwatch.pifsc.noaa.gov/imagery/GA2009281_2009282_sst_2D_eddy.jpg > > > > Thanks! > > You could build one yourself with the colorRamp function: > > satRampP = > colorRampPalette(c("black","blue","cyan","yellow","orange","red","black")) > > that looks roughly like the one in the jpg, but I'm not > sure about > the black at the far end...anyway, let's see: > > image(matrix(seq(0,1,len=100),100,1),col=satRampP(100)) > > Or you could try my colour schemes package: > > https://r-forge.r-project.org/projects/colourscheme/ > > Barry > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
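A minimal sketch of the approach Barry suggested; the colour stops here are guesses at the CoastWatch ramp, not an official palette:

```r
# Build an SST-style palette: blue -> cyan -> yellow -> orange -> red.
# The colour stops are approximate; tweak them to match the imagery.
sstPalette <- colorRampPalette(c("darkblue", "blue", "cyan",
                                 "yellow", "orange", "red"))

# Preview the ramp with 100 interpolated colours.
image(matrix(seq(0, 1, length.out = 100), ncol = 1),
      col = sstPalette(100), axes = FALSE)
```

colorRampPalette() returns a function, so sstPalette(n) yields n colours for any n — handy as the `col` argument of image(), filled.contour(), and similar plots.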
[R] Bezier interpolation
Dear List, I am trying to interpolate animal tracking data using Bezier curves. I need a function similar to spline() or approx() but that has a Bezier method. I have tried xspline() but it does not allow you to set the number of points to interpolate between a given interval (n points between min(x) and max(x)). Mark Hindell asked the same question in 2006 (http://tolstoy.newcastle.edu.au/R/e2/help/06/12/7034.html). I contacted him and he never found a workable function. Has one been developed since then? Thanks, Tim Tim Clark Department of Zoology University of Hawaii
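In the absence of a ready-made spline()-style Bezier interpolator, evaluating a Bezier curve at n evenly spaced parameter values is short to write by hand. This sketch uses the Bernstein-polynomial form; bezier_points() is a made-up helper, not a package function:

```r
# Evaluate a Bezier curve defined by control points (x, y) at n
# parameter values t in [0, 1], using Bernstein basis polynomials.
bezier_points <- function(x, y, n = 100) {
  d <- length(x) - 1                  # curve degree
  t <- seq(0, 1, length.out = n)
  # Bernstein basis matrix: n rows (t values) by d+1 columns
  B <- sapply(0:d, function(k) choose(d, k) * t^k * (1 - t)^(d - k))
  list(x = as.vector(B %*% x), y = as.vector(B %*% y))
}

# Example: a cubic Bezier through four control points
ctrl_x <- c(0, 1, 2, 3)
ctrl_y <- c(0, 2, -1, 1)
bz <- bezier_points(ctrl_x, ctrl_y, n = 50)
```

The curve starts at the first control point and ends at the last, which is why the endpoint values are exact; intermediate control points only attract the curve. For many control points a de Casteljau evaluation is numerically steadier, but for tracking fixes of modest degree this form should suffice.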
[R] Time Series methods
Hello, I have a quick question about time series methodology. If I want to display a boxplot of time series data, sorted by period, I can type:

boxplot(data ~ cycle(data))

where data is of class "ts". Is there a similar method for calculating, say, the median value of each time step within the series? (So for a monthly data set, calculate the median for all Januarys, all Februarys, all Marches, etc.) Thanks, Tim
Re: [R] Time Series methods
Sure, but it seems like the point of the time series class is so you don't have to create a factor based on the period of sampling. Is there any way of imposing calculations like median on the time series class, or is Kjetil's suggestion the only approach? Thanks! On Sun, Nov 1, 2009 at 4:09 PM, Kjetil Halvorsen < kjetilbrinchmannhalvor...@gmail.com> wrote: > introduce a factor variable with the months and then use tapply? > > Kjetil > > On Sun, Nov 1, 2009 at 9:07 PM, Tim Bean wrote: > > Hello, I have a quick question about time series methodology. If I want > to > > display a boxplot of time series data, sorted by period, I can type: > > > > boxplot(data ~ cycle(data)); > > > > where data is of class "ts" > > > > Is there a similar method for calculating, say, the median value of each > > time step within the series? (So for a monthly data set, calculate median > > for all Januarys, all Februarys, all Marchs, etc.) > > > > Thanks, > > Tim
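Following Kjetil's suggestion, no hand-built factor is actually needed: cycle() itself supplies the period labels, exactly as in the boxplot formula, so the per-month medians fall out of one tapply() call. A sketch on simulated monthly data:

```r
# Simulated monthly series: 4 years of monthly observations
data <- ts(rnorm(48), start = c(2000, 1), frequency = 12)

# cycle(data) labels each observation with its position in the period
# (1..12 for monthly data), so tapply can group by month directly.
monthly.medians <- tapply(data, cycle(data), median)
monthly.medians   # one median per month, named "1".."12"
```

The same pattern gives any per-period summary — swap median for mean, sd, or quantile.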
[R] Discontinuous graph
Hi, I wanted to make a graph with the following table (2 rows, 3 columns):

  a b c
x 1 3 5
y 5 8 6

The first column represents the start coordinate, and the second column contains the end coordinate for the x-axis. The third column contains the y-axis coordinate. For example, the first row in the matrix above represents the points (1,5), (2,5), (3,5). How would I go about making a discontinuous graph? thanks!
[R] Lattice plot
Hi, I was trying to get a graph in lattice with the following data frame (7 rows, 5 cols):

  chr start1 end1 meth positive
1   1     10   20  1.5        y
2   2     12   18 -0.7        n
3   3     22   34  2.0        y
4   1     35   70  3.0        y
5   1    120  140 -1.3        n
6   1    180  190  0.2        y
7   2    220  300  0.4        y

I wanted the panels to be organized by 'chr' - which is ok. Further, I wanted the lines to be discontinuous. For example, in the first row, the x co-ordinate starts with a value of 10 (2nd column) and ends with a value of 20 (3rd column). The corresponding y value for this range of x values is 1.5 (4th column). Similarly, for the same panel (i.e. chr=1), the fourth row would have x co-ordinate range from 35 to 70 with a y co-ordinate of 3. If it were only one panel, a similar result could be achieved for the data x2:

> x2
  chr start1 end1 meth positive
1   1     10   20  1.5        y
4   1     35   70  3.0        y
5   1    120  140 -1.3        n
6   1    180  190  0.2        y

## Code courtesy of BAPTISTE AUGUIE
library(ggplot2)
ggplot(data=x2) + geom_segment(aes(x=start1, xend=end1, y=meth, yend=meth))

Can I get lattice to do a similar graph for the panels? thanks!
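One way the panelled lattice version could work — a sketch, not a tested answer from the thread — is to pass the end coordinates through to a custom panel function and draw each interval with panel.segments(), using `subscripts` to pick out the rows belonging to the current panel:

```r
library(lattice)

# Rebuild the example data frame from the post
d <- data.frame(chr      = c(1, 2, 3, 1, 1, 1, 2),
                start1   = c(10, 12, 22, 35, 120, 180, 220),
                end1     = c(20, 18, 34, 70, 140, 190, 300),
                meth     = c(1.5, -0.7, 2.0, 3.0, -1.3, 0.2, 0.4),
                positive = c("y", "n", "y", "y", "n", "y", "y"))

# Unrecognised arguments to xyplot (here end1) are passed on to the
# panel function; subscripts indexes the rows shown in each panel.
xyplot(meth ~ start1 | factor(chr), data = d, end1 = d$end1,
       xlim = range(d$start1, d$end1),
       panel = function(x, y, end1, subscripts, ...) {
         panel.segments(x0 = x, y0 = y, x1 = end1[subscripts], y1 = y)
       })
```

Setting `xlim` from both start1 and end1 keeps segments from running off the panel, since lattice computes default limits from the x variable (start1) alone.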
Re: [R] Discontinuous graph
Thanks Baptiste. That is exactly what I needed. However, now I also need to know how I can achieve this using the lattice package, since I think I will have to make several panels. I've just rephrased the problem and put up another post. Hopefully, this will avoid some confusion. best regards, From: baptiste auguie To: r Sent: Mon, November 16, 2009 1:31:28 PM Subject: Re: [R] Discontinuous graph Hi, An alternative with ggplot2, library(ggplot2) ggplot(data=coords) + geom_segment(aes(x=a, xend=b, y=c, yend=c)) HTH, baptiste 2009/11/16 David Winsemius : > > On Nov 16, 2009, at 12:40 PM, Tim Smith wrote: > >> Hi, >> I wanted to make a graph with the following table (2 rows, 3 columns): >> a b c >> x 1 3 5 >> y 5 8 6 >> The first column represents the start cordinate, and the second column >> contains the end cordinate for the x-axis. The third column contains the >> y-axis co-ordinate. For example, the first row in the matrix above >> represents the points (1,5),(2,5), (3,5). How would I go about making a >> discontinuous graph ? >> >> thanks! > > coords <- read.table(textConnection("a b c > x 1 3 5 > y 5 8 6"), header=TRUE) > > plot(NULL, NULL, xlim = c(min(coords$a)-.5, max(coords$b)+.5), > ylim=c(min(coords$c)-.5, max(coords$c)+.5) ) > apply(coords, 1, function(x) segments(x0=x[1],y0= x[3], x1= x[2], y1=x[3]) > ) > > -- > > David Winsemius, MD > Heritage Laboratories > West Hartford, CT > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. 
[R] Adding columns to lower level of list
Dear List, I have very little experience with lists and am having some very basic problems. I don't know how to add columns to the lower levels of a list, or how to take something from the upper level and add it as a column to the lower level. I am analyzing animal movement data in the package Adehabitat. I have a list of animal movements called "cut.ltr" (class ltraj) that has been divided into a series of "bursts" - i.e. movements with no gaps in time over a given threshold. I would like to:

1. Add the speed to each item in the list, and also the burst. I can calculate speed as: sp<-lapply(cut.ltr,function(l){l$dist/l$dt}) This creates a list of the correct size, but I don't know how to add it to my original list, i.e. add a column called "speed" to the lower levels of the list.

2. Add the burst to each lower level of the list. It is in the upper level, but I don't know how to access it. I have tried attribute(), attr(), cut.ltr$"burst", and several other creative guesses.

The first five items in the upper level are below - cut.ltr[1:5], along with head(cut.ltr[[1]]). I would like my final result to have two more columns in cut.ltr[[1]]: one with speed, and the second with burst. Thanks in advance for your help. Tim

> cut.ltr[1:5]
*** List of class ltraj ***
Type of the traject: Type II (time recorded)
Irregular traject.
Variable time lag between two locs

Characteristics of the bursts:
       id     burst nb.reloc NAs          date.begin            date.end
1 Abigail Abigail.1       47   0 2003-05-31 13:29:59 2003-06-01 00:59:56
2 Abigail Abigail.2      288   0 2003-06-18 17:28:11 2003-06-21 17:14:59
3 Abigail Abigail.3       10   0 2003-08-03 23:33:00 2003-08-04 01:43:58
4 Abigail Abigail.4       43   0 2003-08-04 08:15:25 2003-08-04 18:59:58
5 Abigail Abigail.5       78   0 2003-08-05 00:44:19 2003-08-05 20:15:00

> head(cut.ltr[[1]])
         x       y                date         dx       dy     dist  dt       R2n abs.angle   rel.angle
1 809189.8 2189722 2003-05-31 13:29:59   81.87136 315.3389 325.7937 901       0.0  1.316775          NA
2 809271.6 2190037 2003-05-31 13:45:00   13.00097 258.7351 259.0616 901  106141.5  1.520590  0.20381526
3 809284.6 2190296 2003-05-31 14:00:01  250.52656 669.2065 714.5634 898  338561.8  1.212584 -0.30800666
4 809535.2 2190965 2003-05-31 14:14:59 -171.14372 791.1522 809.4516 902 1665046.9  1.783836  0.57125215
5 809364.0 2191756 2003-05-31 14:30:01  302.26979 707.0157 768.9202 900 4169281.4  1.166785 -0.61705039
6 809666.3 2192463 2003-05-31 14:45:01  284.40962 725.2169 778.9919 900 7742615.6  1.197057  0.03027109

Tim Clark Department of Zoology University of Hawaii
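Both additions can be made in one lapply() that modifies each burst and returns it. This is a sketch assuming, as adehabitat's ltraj objects do, that each list element carries its burst name as an attribute; the data here are faked plain data frames, not a real ltraj:

```r
# Fake two bursts with the columns the question uses; real ltraj
# elements come from adehabitat and carry a "burst" attribute.
b1 <- data.frame(dist = c(325.8, 259.1), dt = c(901, 901))
attr(b1, "burst") <- "Abigail.1"
b2 <- data.frame(dist = c(714.6, 809.5), dt = c(898, 902))
attr(b2, "burst") <- "Abigail.2"
cut.ltr <- list(b1, b2)

# Compute speed inside each element and copy the upper-level
# "burst" attribute down as a column, returning the modified frame.
cut.ltr <- lapply(cut.ltr, function(l) {
  l$speed <- l$dist / l$dt
  l$burst <- attr(l, "burst")
  l
})
cut.ltr[[1]]
```

One caveat: the result of lapply() is an ordinary list of data frames, no longer of class ltraj, so adehabitat functions that expect an ltraj won't accept it afterwards.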
[R] Removing objects from a list based on nrow
Dear List, I have a list containing data frames of various numbers of rows. I need to remove any data frame that has less than 3 rows. For example:

df1<-data.frame(letter=c("A","B","C","D","E"),number=c(1,2,3,4,5))
df2<-data.frame(letter=c("A","B"),number=c(1,2))
df3<-data.frame(letter=c("A","B","C","D","E"),number=c(1,2,3,4,5))
df4<-data.frame(letter=c("A","B","C","D","E"),number=c(1,2,3,4,5))
lst<-list(df1,df2,df3,df4)

How can I determine that the second object (df2) has less than 3 rows and remove it from the list? Thanks! Tim Tim Clark Department of Zoology University of Hawaii
Re: [R] Removing objects from a list based on nrow
Linlin, Thanks! That works great! Tim Tim Clark Department of Zoology University of Hawaii --- On Sat, 11/28/09, Linlin Yan wrote: > From: Linlin Yan > Subject: Re: [R] Removing objects from a list based on nrow > To: "Tim Clark" > Cc: r-help@r-project.org > Date: Saturday, November 28, 2009, 10:43 PM > Try these: > sapply(lst, nrow) # get row numbers > which(sapply(lst, nrow) < 3) # get the index of rows > which has less than 3 rows > lst <- lst[-which(sapply(lst, nrow) < 3)] # remove > the rows from the list > > On Sun, Nov 29, 2009 at 4:36 PM, Tim Clark > wrote: > > Dear List, > > > > I have a list containing data frames of various > numbers of rows. I need to remove any data frame that has > less than 3 rows. For example: > > > > > df1<-data.frame(letter=c("A","B","C","D","E"),number=c(1,2,3,4,5)) > > df2<-data.frame(letter=c("A","B"),number=c(1,2)) > > > df3<-data.frame(letter=c("A","B","C","D","E"),number=c(1,2,3,4,5)) > > > df4<-data.frame(letter=c("A","B","C","D","E"),number=c(1,2,3,4,5)) > > > > lst<-list(df1,df2,df3,df4) > > > > How can I determine that the second object (df2) has > less than 3 rows and remove it from the list? > > > > Thanks! > > > > Tim > > > > > > > > > > Tim Clark > > Department of Zoology > > University of Hawaii > > > > __ > > R-help@r-project.org > mailing list > > https://stat.ethz.ch/mailman/listinfo/r-help > > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > > and provide commented, minimal, self-contained, > reproducible code. > > > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
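A footnote for the archives: the `lst[-which(...)]` form has a trap — if no element has fewer than 3 rows, which() returns integer(0) and `lst[-integer(0)]` empties the whole list. Logical subsetting (or Filter) sidesteps this:

```r
# Rebuild the example list from the question
df1 <- data.frame(letter = c("A","B","C","D","E"), number = 1:5)
df2 <- data.frame(letter = c("A","B"),             number = 1:2)
lst <- list(df1, df2, df1, df1)

# Logical subsetting keeps exactly the elements with >= 3 rows and
# behaves correctly even when nothing needs removing.
lst2 <- lst[sapply(lst, nrow) >= 3]

# Equivalent functional form:
lst3 <- Filter(function(d) nrow(d) >= 3, lst)
```

Both forms leave the list untouched when every data frame already has 3 or more rows, which makes them safer inside scripts that run on varying data.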