[R] party for prediction [REPOST]
Apologies for re-posting, my original message seems to have been overlooked by the moderators.

-- Forwarded message --
From: Ed
Date: 11 October 2012 19:03
Subject: party for prediction
To: R-help@r-project.org

Hi there

I'm experiencing some problems using the party package (specifically mob) for prediction. I have a real scalar y I want to predict from a real valued vector x and an integral vector z. mob seemed the ideal choice from the documentation.

The first problem I had was at some nodes in a partitioning tree, the components of x may be extremely highly correlated or effectively constant (that is x are not independent for all choices of components of z). When the resulting fit is fed into predict() the result is NA - this is not the same behaviour as models returned by say lm which ignore missing coefficients. I have fixed this by defining my own statsModel (myLinearModel - imaginative) which also ignores such coefficients when predicting.

The second problem I have is that I get "Cholesky not positive definite" errors at some nodes. I guess this is because of numerical error and degeneracy in the covariance matrix? Any thoughts on how to avoid having this happen would be welcome; it is ignorable though for now.

The third and really big problem I have is that when I apply mob to large datasets (say hundreds of thousands of elements) I get a "logical subscript too long" error inside mob_fit_fluctests. It's caught in a try(), and mob just gives up and treats the node as terminal. This is really hurting me though; with 1% of my data I can get a good fit and a worthwhile tree, but with the whole dataset I get a very stunted tree with a pretty useless prediction ability.

I guess what I really want to know is:
(a) has anyone else had this problem, and if so how did they overcome it?
(b) is there any way to get a line or stack trace out of a try() without source modification?
(c) failing all of that, does anyone know of an alternative to mob that does the same thing; for better or worse I'm now committed to recursive partitioning over linear models, as per mob?
(d) failing all of this, does anyone have a link to a way to rebuild, or locally modify, an R package (preferably windows, but anything would do)?

Sorry for the length of this post. If I should RTFM, please point me at any relevant manual by all means. I've spent a few days on this as you can maybe tell, but I'm far from being an R expert.

Thanks for any help you can give.

Best wishes,

Ed

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
Re: [R] party for prediction [REPOST]
Sorry, my mistake, I didn't get a notification or see it sent. Thanks for clearing that up.

Best wishes
Ed

On 12 October 2012 16:58, David Winsemius wrote:
>
> On Oct 12, 2012, at 1:37 AM, Ed wrote:
>
>> Apologies for re-posting, my original message seems to have been
>> overlooked by the moderators.
>
> No. Your original post _was_ forwarded to the list. On my machine it appeared
> at October 11, 2012 11:03:08 AM PDT. No one responded. It seems possible
> that its lack of data or code is the reason for that state of affairs.
>
> --
> David.
>
> David Winsemius, MD
> Alameda, CA, USA
Re: [R] party for prediction [REPOST]
such a strategy helps here. I've considered using rpart() to partition into cells of constant gradient, then fitting linear models myself to the cells. This is my next thought. I'm pretty sure partitioning over linear regression is the way forward for the data we have. I tried mars and glm but there are good reasons to think they're less reasonable, even though the fit wasn't particularly poor. I'm not particularly wedded to party's approach except that it looked like it immediately returned what we needed, and with some degree of "optimality" into the bargain.

>> (b) is there any way to get a line or stack trace out of a try()
>> without source modification?
>
> Not sure, I don't know any off the top of my head.

I guess I really will have to bite the bullet and try to figure out how to install modified libraries. Thanks.

>> (c) failing all of that, does anyone know of an alternative to mob
>> that does the same thing; for better or worse I'm now committed to
>> recursive partitioning over linear models, as per mob?
>
> If your partitioning variables are particularly simple (e.g., all binary)
> you could exploit that and it may be easier to write a custom function for
> your particular data. Then likelihood-ratio tests (rather than LM-type
> tests) would also be easier to apply in case of unidentified parameters.
>
> But if there are partitioning variables with different measurement scales,
> then this will not be that simple...

Unfortunately each partitioning variable is essentially a state indicator, taking values say 0,...,R where R is different for each component. I'm not a stats expert either; I've spent some time with the party manuals and papers, but I wouldn't be confident of implementing something like it in the time available to me (though if I have to I will, but that wouldn't be a good situation to be in).

>> (d) failing all of this, does anyone have a link to a way to rebuild, or
>> locally modify, an R package (preferably windows, but anything would do)?
>
> Have a look at the "Writing R Extensions" manual and the R for Windows FAQ.

Will do. Thank you very much for your responses, I really appreciate it.

Best wishes,
Ed
Re: [R] party for prediction [REPOST]
This was an exceptionally helpful answer, I can only thank you again. I have plenty of avenues ahead, where before I was worried I was getting trapped in a dead end. If all else fails, the idea of using anova is brilliant.

Thank you!
Ed

On 14 October 2012 18:36, Achim Zeileis wrote:
> On Sun, 14 Oct 2012, Ed wrote:
>
>> First up, thanks hugely for your response. I've been beating my head
>> against this!
>>
>> On 14 October 2012 16:51, Achim Zeileis wrote:
>>>
>>> I'm not sure what you mean by "integral vector". If you want to apply the
>>> approach to hundreds of thousands of observations, I guess that these are
>>> categorical (maybe even binary?) but maybe not...
>>
>> I'm sorry I can't go into the details of the data, I would if I could.
>> z are categorical variables represented as integers, mostly ordered,
>> but not all. I've tried fitting them as integers, as well as ordered,
>> but I don't think it made a huge difference.
>
> The tests performed for categorical partitioning variables are rather
> different from the tests for numerical partitioning variables. If all of the
> variables are categorical, this may not be immediately obvious, but the
> factor coding should be more appropriate (especially if the number of levels
> is small or moderate).
>
>>> If I recall correctly, we kept linearModel as simple as we did to save as
>>> much time as possible. This can be particularly important when one of the
>>> partitioning variables has many possible splits and the linearModel has to
>>> be fitted thousands of times.
>>
>> I can appreciate that, but maybe having an alternative linearModel which
>> will predict when the fit is degenerate would be worth including? I'm happy
>> to contribute what I have, although it's pretty obvious stuff (and probably
>> done suboptimally since I'm not much of an R coder at this point). For me at
>> least, even with huge datasets, the speed of party is quite good; it's
>> getting a better result that's the problem.
>
> As I explained in my last e-mail, in your situation this does not solve the
> problem completely because the subsequent tests are also not adapted to
> this. Setting the empirical estimating functions to zero for non-identified
> coefficients might alleviate the problem but is not really a clean solution.
>
>>> Also, mob() assesses the stability of all coefficients of the model in all
>>> nodes during partitioning. If any of the coefficients is not identified,
>>> this would have to be excluded from all subsequent parameter stability tests
>>> in that node (and its child nodes). This is currently not provided for in
>>> mob().
>>
>> Would pretending the coefficients were fit at 0 fool mob into doing
>> something moderately meaningful here?
>
> The coefficients are not looked at during fitting, only the estfun(). This
> would have to be set to 0.
>
>> If not, I would try to hack the code, but I'm honestly at something of
>> a loss as to how to modify it and feed the results back into my
>> interpreter. I have bytecode installed; I downloaded the source, but I
>> haven't squared the circle of modifying the source and installing the
>> result. I will check out the docs on writing extensions you suggest.
>
> Writing (or modifying) R packages and installing them under Windows is
> pretty standard and well documented. The pointers I gave you should
> hopefully get you started.
>
>>>> The second problem I have is that I get "Cholesky not positive definite"
>>>> errors at some nodes. I guess this is because of numerical error and
>>>> degeneracy in the covariance matrix? Any thoughts on how to avoid having
>>>> this happen would be welcome; it is ignorable though for now.
>>>
>>> This comes from the parameter stability tests and might be a result of an
>>> unidentified (or close to unidentified) model fit.
>>
>> This is a great help to know. I improved my results quite considerably
>> with aggressive scaling of everything (scaling the response and all
>> the predictors to lie between 0 and 1). That deepened my tree by a
>> factor of two or so (say depth 3 to 7) and improved the quality of fit
>> substantially. Is there any way I can engage a more numerically robust
>> Cholesky in mob?
>
> No, I don't think that this is conceivable with the way this is implemented
> at the moment. Instead o
[R] Changing a for loop to a function using sapply
Apparently there are one or more concepts that I do not fully understand from the descriptions of a function and the apply material. I have been reading the mail from this forum and have learned much but, in this case, what I have been reading here and from the manual isn't enough.

The following code produces what I want with the for loop. From what I have read from this forum, a for loop is not necessarily the best path, so I tried to create a function to do the same work.

Using the following 64 bit version on Windows 7 Dell laptop

R version 2.15.1 (2012-06-22) -- "Roasted Marshmallows"
Copyright (C) 2012 The R Foundation for Statistical Computing
ISBN 3-900051-07-0
Platform: x86_64-pc-mingw32/x64 (64-bit)

below is the part that works

# The following lines create a string of nucleotides and uses a for loop to create multiple strings.
# random.string replicate something based on rs sampling criteria.
random.string <- rep(NA, rs<-sample(3:18,1,replace = TRUE))

# The randomizeString function uses members of DNAnucleotides list to sample 3 at a time
# placing the results in "a".
randomizeString <- function(x) {
DNAnucleotides<- c("a","c","g","t")
a <-sample(DNAnucleotides,3, replace = TRUE)
return(a)
}

# The following paste output uses random.string to indicate the number of times the function
# randomizeString selects a triplet from the list DNAnucleotides to create a text string
# of a sequence of nucleotides.
# collapse = "" removes the quotes from the triplets to produce one long string when the string
# is printed by paste.
paste(c(sapply(random.string, randomizeString, simplify = TRUE), ""), collapse = "")

# The for loop uses the paste output to create multiple random length nucleotide strings
# which can be printed to a file.
for(i in 1:20) DNA[i]<-paste(c(sapply(rep(NA, rs<-sample(3:21,1,replace = TRUE)) , randomizeString, simplify = TRUE), ""), collapse = "")
DNA

Rowname <- c(1:20) # provides row numbers to be used with the sequences produced
Arrow<- rep(">",20) # provides a list of ">" to be used to separate the row numbers and sequences

# DNAout uses a for loop to combine the vectors to create one string vector of sequences.
DNAout<-class(character)
for(j in 1:20)DNAout[j]<- paste(Rowname[j]," ",Arrow[j]," ",DNA[j], collapse = "" )
DNAout

Here is what I have tried in attempts to create a function to replicate the results of the for loop above. This one comes close.

#This repeats the above script without the comments.
##
options(stringsAsFactors=FALSE)
DNA<-class(character)
randomizeString <- function(x) {
DNAnucleotides <- c("A","C","G","T")
a <-sample(DNAnucleotides, 3, replace = TRUE)
return(a)
}
for(i in 1:20) DNA[i]<-paste(c(sapply(rep(NA, rs<-sample(3:18,1,replace = TRUE)) , randomizeString, simplify = TRUE), ""), collapse = "")
DNA
Rowname <- c(1:20)
Arrow<- rep(">",20)
DNAout<-class(character)
for(j in 1:20)DNAout[j]<- paste(Rowname[j]," ",Arrow[j]," ",DNA[j], collapse = "" )
DNAout
###

##The following works partially
DNAoutc <- class("character")
DNAoutc <- function(x,y,z){sapply(x, paste(x," ",y," ",z,"\n", collapse = ""))}
DNAoutc(Rowname,Arrow,DNA)

Error in get(as.character(FUN), mode = "function", envir = envir) :
object '1 > ACAAACAATGAGGTCCGCCGGATGAAGCTG
2 > CAAACCTCGTGCAAAGGTGCTTCATGGTAAATCCGTTTAGCCGGGAAAGT
3 > TACATCGAAGCTCGTTGAAG
4 > CGTCAACATGAACAAATGACATCCAGACGCACGCTGTAA
5 > CATTTAACCCTTGGTGTGATG
6 > AAGTATGAGTGGGCCTTGGGTTCTGGCTCCCACGCGTTGTGC
7 > AGTTCCCGCAAACTGATACTGATCAGCACTTAGAGACCGCCACTATCAGTT
8 > AATAATGCATGCTAGGCAGCCCGCTCGACCATTAGGGATAGAGCT
9 > GACATCAAGTCATAGGTT
10 > CAGAACAATATACACGTT
11 > CGCAACCATCTACACTGCGTT
12 > GTGAACTGAGGTATGACCGGTGGATAATAACGGGACC
13 > TAGCAACATGAGTGCCTCAGGTTGTCGTTCAATAAACTCGGGAAG
14 > GCGATGATCCGCTTATAGCATGGACAAAGCAACGTTCTGTCGTCGGATTC
15 > AGCATGTTAGCAATTTG
16 > ACTAGTTCTGCCGTCATTTCAATG
17 > ATTCTTCCCTTG
18 > CATCTCGATTCTTTCTTACAATGT
19 > ATAGATACCTTGGTCAAATAATCGTTTCAAGGT
20 > TGGATAATAGCGGATAC
' of mode 'function' was not found

My other attempts all give errors, and I cannot figure out what I am missing to correct them. Below are a few of the failed attempts.

# mode(DNAoutf)<-("function")
DNAoutf <- sapply(x,function(x,y,z){paste(x," ",y," ",z,"\n", collapse = "" )})
DNAoutf(Rowname,Arrow,DNA)

> mode(DNAoutf)<-("function")
Error in mode(DNAoutf) <- ("function") : object 'DNAoutf' not found
> DNAoutf<- function(x,y,z) {sapply(x,y,z),paste(x," ",y," ",z,"\n", collapse = "" ))}
Error: unexpected ',' in "DNAoutf<- function(x,y,z) {sapply(x,y,z),"
> DNAoutf(R
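A sketch of one way to drop both loops, offered as a guess at what the post is after rather than a definitive answer. sapply() expects a function as its second argument, so handing it the string returned by paste() is what triggers the "of mode 'function' was not found" error above. paste() itself is vectorised, so row numbers, arrows and sequences can be combined in a single call. The helper name random_dna and the variable names are hypothetical.

```r
set.seed(1)  # only so the sketch is repeatable

# Hypothetical helper: one random sequence made of n_triplets * 3 nucleotides.
random_dna <- function(n_triplets) {
  paste(sample(c("A", "C", "G", "T"), 3 * n_triplets, replace = TRUE),
        collapse = "")
}

n_seq <- 20
lens  <- sample(3:18, n_seq, replace = TRUE)     # number of triplets per sequence
DNA   <- vapply(lens, random_dna, character(1))  # one string per requested length

# paste() is vectorised: element i of each argument is combined into line i.
DNAout <- paste(seq_len(n_seq), ">", DNA)
writeLines(DNAout)
```

vapply() is used instead of sapply() only because it checks that every call returns a single string; sapply() would work the same way here.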
[R] maximum likelihood using nlm to estimate 4 variables
Hi

I need help. I am new to R and am having problems estimating the parameters of a 3-stage constrained function. I have constructed the code below, and my data are two columns, R_j and R_m (sample given below). R_j and R_m represent the dependent and independent variables respectively. The parameters al_j, au_j, b_j, and sigma_j need to be estimated, and I have no initial estimates for them.

llik=function(R_j,R_m) {
LF=if(R_j< 0)sum[ln(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+al_j-b_j*R_m))^2] +
if(R_j> 0)sum[ln(1/(2*pi*(sigma_j^2)))-(1/(2*(sigma_j^2))*(R_j+au_j-b_j*R_m))^2] +
if(R_j==0)sum[(ln(%pnorm((au_j-b_j*R_m)/sigma_j)-%pnorm((al_j-b_j*R_m)/sigma_j)))]
}
est.nlm = nlm(llik,0) #not sure what to put for the 4 initial estimates so I just put 0
est.nlm$estimate

Sample Data

R_j       R_m
0.002     0.026567295
0.003     -0.009798475
0.05      0.008497274
-0.01     0.012464578
-0.0009   0.002896023
0.09      0.000879473
0.01      -0.003194435
0.0006    0.010281122

I would appreciate help modifying my code to get my estimates, or a pointer to a better method.

Thank you in advance

Edward
Student: Institute of Actuaries

--
View this message in context: http://r.789695.n4.nabble.com/maximum-likelihood-using-nlm-to-estimate-4-variables-tp3629290p3629290.html
Sent from the R help mailing list archive at Nabble.com.
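A hedged sketch of how the call to nlm() could be set up, assuming the intended model is a three-regime regression with normal errors as the formulas above suggest. nlm() minimises, so the function returns the negative log-likelihood; it takes all four parameters as one vector in its first argument, and the start value needs one entry per parameter. The sketch uses dnorm(..., log = TRUE) for the two normal regimes, which is an assumption about the intent and is not term-for-term what the post wrote. The name negloglik, the log-sigma parametrisation and the start values are all illustrative choices.

```r
# Sample data from the post (column breaks inferred where values had run together).
R_j <- c(0.002, 0.003, 0.05, -0.01, -0.0009, 0.09, 0.01, 0.0006)
R_m <- c(0.026567295, -0.009798475, 0.008497274, 0.012464578,
         0.002896023, 0.000879473, -0.003194435, 0.010281122)

# Negative log-likelihood; par = c(al_j, au_j, b_j, log(sigma_j)).
negloglik <- function(par, R_j, R_m) {
  al <- par[1]; au <- par[2]; b <- par[3]
  sigma <- exp(par[4])                 # log scale keeps sigma positive
  mu <- b * R_m
  neg <- R_j < 0; pos <- R_j > 0; zero <- R_j == 0
  ll <- numeric(length(R_j))
  ll[neg] <- dnorm(R_j[neg] + al, mean = mu[neg], sd = sigma, log = TRUE)
  ll[pos] <- dnorm(R_j[pos] + au, mean = mu[pos], sd = sigma, log = TRUE)
  p0 <- pnorm((au - mu[zero]) / sigma) - pnorm((al - mu[zero]) / sigma)
  ll[zero] <- log(pmax(p0, .Machine$double.eps))  # guard against log(0)
  -sum(ll)
}

# Illustrative start values only; extra arguments are passed through to negloglik.
start <- c(0, 0, 1, log(sd(R_j)))
fit <- nlm(negloglik, start, R_j = R_j, R_m = R_m)
fit$estimate   # al_j, au_j, b_j, log(sigma_j)
```

With only eight observations and no zero returns in the sample, the two thresholds are likely to be poorly determined, so the estimates from this toy run should not be read as meaningful.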
[R] A list of data frames and a list of colnames.
I have a list of file names, and a list of data frames contained in those files.

mynames <- list.files()
mydata <- lapply(mynames, read.delim)

Every file contains two columns.

> colnames(mydata[[1]])
[1] "Name"     "NumReads"
> colnames(mydata[[2]])
[1] "Name"     "NumReads"

I can set the colnames easily enough with a for loop.

for (i in seq_along(mynames)) {
    colnames(mydata[[i]])[2] <- mynames[i]
}

Is there a nicer way to do this?
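One loop-free option, sketched under the assumption that mydata and mynames line up one-to-one as in the post. Map() walks the two lists in parallel and returns a new list of renamed data frames. The two small data frames and the file names below are stand-ins, not the poster's data.

```r
# Stand-in data: two small data frames shaped like the files described above.
mynames <- c("sampleA.txt", "sampleB.txt")   # hypothetical file names
mydata <- list(
  data.frame(Name = c("g1", "g2"), NumReads = c(10, 20)),
  data.frame(Name = c("g1", "g2"), NumReads = c(30, 40))
)

# Each call receives one data frame and its matching name,
# renames the second column, and returns the modified data frame.
mydata <- Map(function(df, nm) {
  colnames(df)[2] <- nm
  df
}, mydata, mynames)

lapply(mydata, colnames)
```

Whether this is nicer than the for loop is a matter of taste; the loop in the post is already correct.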
[R] as.data.frame doesn't set col.names
Why doesn't this work?

> samples$geno <- as.data.frame(sapply(yo, toupper), col.names="geno")
> samples
                      quant_samples   age sapply(yo, toupper)
E11.5 F20het BA40 E11.5 F20het BA40 E11.5              F20HET
E11.5 F20het BA45 E11.5 F20het BA45 E11.5              F20HET
Re: [R] as.data.frame doesn't set col.names
Wait. Now I'm really confused.

> head(samples)
                          quant_samples   age sapply(yo, toupper)
E11.5 F20het BA40     E11.5 F20het BA40 E11.5              F20HET
E11.5 F20het BA45     E11.5 F20het BA45 E11.5              F20HET
E11.5 F20het BB84     E11.5 F20het BB84 E11.5              F20HET
E11.5 F9.20DKO KTr3 E11.5 F9.20DKO KTr3 E11.5            F9.20DKO
E11.5 F9.20DKO PEd2 E11.5 F9.20DKO PEd2 E11.5            F9.20DKO
E11.5 F9.20DKO j0J1 E11.5 F9.20DKO j0J1 E11.5            F9.20DKO
> colnames(samples)
[1] "quant_samples" "age"           "geno"

Really, really confused.

On Tue, Oct 24, 2017 at 12:58 PM, Ed Siefker wrote:
> Why doesn't this work?
>
>> samples$geno <- as.data.frame(sapply(yo, toupper), col.names="geno")
>> samples
>                       quant_samples   age sapply(yo, toupper)
> E11.5 F20het BA40 E11.5 F20het BA40 E11.5              F20HET
> E11.5 F20het BA45 E11.5 F20het BA45 E11.5              F20HET
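One possible reading of the output above, offered as a sketch rather than a confirmed diagnosis. Assigning a data frame with `$<-` stores the whole data frame as a single nested column, so the outer name is "geno" while printing shows the name of the column inside it. col.names is an argument of the list method of as.data.frame(), which would explain why it has no visible effect on a character vector. The toy data below are invented.

```r
# Invented stand-in for the poster's data.
samples <- data.frame(quant_samples = c("s1", "s2"),
                      age = c("E11.5", "E11.5"),
                      stringsAsFactors = FALSE)
yo <- c("F20het", "F9.20dko")

# Assigning a data frame stores it as one nested column called "geno".
nested <- samples
nested$geno <- as.data.frame(toupper(yo))
colnames(nested)            # the outer column name
is.data.frame(nested$geno)  # the column itself is a data frame

# Assigning the character vector directly gives an ordinary column.
samples$geno <- toupper(yo)   # toupper() is already vectorised, no sapply() needed
str(samples)
```

If this reading is right, dropping as.data.frame() from the assignment is enough to get a plain "geno" column.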
[R] googlesheets gs_reshape_cellfeed()
I have a google spreadsheet with a column of hyperlinks I want the URL from. The googlesheets package can return this information with gs_read_cellfeed(), but it needs to be reshaped with gs_reshape_cellfeed(). Problem is, gs_reshape_cellfeed() returns the 'value' of the cells, not the 'input_value', making it exactly like gs_read().

How do I extract input_value from a cell feed in a convenient format? I want a data frame that looks exactly like the output of gs_read(), except returning 'input_value' instead of 'value'.
[R] ggplot inside function doesn't plot
I have a function:

myplot <- function (X) {
    d <- plotCounts(dds2, gene=X, intgroup="condition", returnData=TRUE)
    png(paste("img/", X, ".png", sep=""))
    ggplot(d, aes(x=condition, y=count, color=condition)) +
        geom_point(position=position_jitter(w=0.1,h=0)) +
        scale_y_log10(breaks=c(25,100,400)) +
        ggtitle(X) +
        theme(plot.title = element_text(hjust = 0.5))

    dev.off()
}

'd' is a dataframe

                              count      condition
E11.5 F20HET BA40_quant    955.9788   E11.5 F20HET
E11.5 F20HET BA45_quant    796.2863   E11.5 F20HET
E11.5 F20HET BB84_quant    745.0340   E11.5 F20HET
E11.5 F9.20DKO YEH3_quant  334.2994 E11.5 F9.20DKO
E11.5 F9.20DKO fkm1_quant  313.7307 E11.5 F9.20DKO
E11.5 F9.20DKO zzE2_quant  349.3313 E11.5 F9.20DKO

If I set X="Etv5" and paste the contents of the function into R, I get 'img/Etv5.png'.
If I run myplot(X), I get nothing.

> X
[1] "Etv5"
> list.files("img")
character(0)
> myplot(X)
null device
          1
> list.files("img")
character(0)
> d <- plotCounts(dds2, gene=X, intgroup="condition", returnData=TRUE)
> png(paste("img/", X, ".png", sep=""))
> ggplot(d, aes(x=condition, y=count, color=condition)) +
+ geom_point(position=position_jitter(w=0.1,h=0)) +
+ scale_y_log10(breaks=c(25,100,400)) +
+ ggtitle(X) +
+ theme(plot.title = element_text(hjust = 0.5))
> dev.off()
null device
          1
> list.files("img")
[1] "Etv5.png"

Why doesn't my function work?
Re: [R] ggplot inside function doesn't plot
I don't really understand. I mean, I understand the solution is print(ggplot(...)). But why is that required in a function and not at the console? Shouldn't I be able to rely on what I do at the console working in a script? Is this inconsistent behavior by design?

On Thu, Nov 2, 2017 at 11:54 AM, David Winsemius wrote:
>
>> On Nov 2, 2017, at 9:27 AM, Ed Siefker wrote:
>>
>> Why doesn't my function work?
>
> `ggplot` creates an object. You need to print it when used inside a function.
> Inside a function (in a more restricted environment) there is no
> parse-eval-print-loop.
>
> David Winsemius
> Alameda, CA, USA
>
> 'Any technology distinguishable from magic is insufficiently advanced.'
> -Gehm's Corollary to Clarke's Third Law
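A minimal sketch of the fix described above, assuming ggplot2 is installed and using the built-in mtcars data in place of the plotCounts() output. At the console, the value of a top-level expression is auto-printed, and for a ggplot object printing is what draws it; inside a function body nothing is auto-printed, so the plot has to be printed explicitly while the device is open. The helper name save_scatter and its argument are hypothetical.

```r
library(ggplot2)

# Hypothetical helper: draws a scatter plot of mtcars into a PNG file.
save_scatter <- function(path) {
  p <- ggplot(mtcars, aes(x = wt, y = mpg)) +
    geom_point() +
    ggtitle("mpg vs wt")
  png(path)
  on.exit(dev.off())   # close the device even if print() fails
  print(p)             # explicit print: this is what renders the plot
  invisible(path)
}

out <- save_scatter(file.path(tempdir(), "scatter.png"))
file.exists(out)
```

The same applies to scripts run with source(), which does not auto-print unless asked to, so an explicit print() is the portable choice.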
[R] drc, ggplot2, and gridExtra
I have dose response data I have analyzed with the 'drc' package. Using plot() works great. I want to arrange my plots and source data on a single page. I think 'gridExtra' is the usual package for this.

I could use plot() and par(mfrow=...), but then I can't put the source data table on the page.

gridExtra provides grid.table() which makes nice graphical tables. It doesn't work with par(mfrow=...), but has the function grid.arrange() instead.

Unfortunately, grid.arrange() doesn't accept plot(). It does work with qplot() from 'ggplot2'. Unfortunately, qplot() doesn't know how to deal with data of class drc.

I'm at a loss on how to proceed here. Any thoughts?
[R] Exporting to text files
I have dose response data analyzed with the package 'drc'. 'summary(mymodel)' prints my kinetic parameters. I want that text in an ASCII text file. I want to get exactly what I would get if I copied and pasted from the terminal window.

I've read the documentation on data export to text files here:
https://cran.r-project.org/doc/manuals/r-release/R-data.html#Export-to-text-files

write() does not work.

> summary(mymodel)

Model fitted: Michaelis-Menten (2 parms)

Parameter estimates:

              Estimate Std. Error t-value  p-value
d:(Intercept)  213.435     67.094  3.1811 0.009801 **
e:(Intercept)   94.493     59.579  1.5860 0.143820
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error:

 22.03492 (10 degrees of freedom)

> write(summary(mymodel), "kinetics.txt")
Error in cat(x, file = file, sep = c(rep.int(sep, ncolumns - 1), "\n"), :
  argument 1 (type 'list') cannot be handled by 'cat'

If I try to unlist(mymodel):

> write(unlist(summary(mymodel)), "kinetics.txt")

I get the following contents of "kinetics.txt":

485.537711262143 4501.62443636671 3821.31920509004 3821.31920509004 3549.67055527084
213.435401944579 94.4931993582911 67.0941460663053 59.5791117361684 3.18113299681396
1.58601222147673 0.00980057624097692 0.143819823442402 MM.2() continuous
10 4.63571040101587 3.93514151059103 3.93514151059103 3.65540149913749
Michaelis-Menten 2 22.0349202690217 10

How do I get the output of 'summary(mymodel)' verbatim? Why doesn't it work the way I think it does? What documentation should I read to understand what's going on here?
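A sketch of one way to get the printed form into a file. summary() returns a list-like object, and what appears at the console is produced by its print method; write() hands the object to cat(), which only handles atomic vectors, hence the error above. capture.output() records what printing would show. The model here is an lm() fit on the built-in cars data, used as a stand-in because the drc fit from the post is not available; the file path is likewise illustrative.

```r
# Stand-in model: lm() on a built-in data set (the drc fit is not reproduced here).
fit <- lm(dist ~ speed, data = cars)

out_file <- file.path(tempdir(), "kinetics.txt")

# capture.output() evaluates the expression, prints the result as the
# console would, and sends the printed lines to `file`.
capture.output(summary(fit), file = out_file)

# An equivalent route with sink(): divert output, print, then restore.
# sink(out_file); print(summary(fit)); sink()

readLines(out_file)
```

The help pages ?capture.output, ?sink and ?print are the places to read about the distinction between an object and its printed representation.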
[R] How to alpha entire plot?
I have two chromatograms I want plotted on the same axes. I would like the plots to be transparent, so the first chart is not obscured.

I have tried adjustcolor(..., alpha.f=0.3). The problem is that my chromatogram is so dense with datapoints that they overlap and the entire graph just ends up a solid color. The second histogram still obscures the first.

Consider this example:

col1 <- adjustcolor("red", alpha.f=0.3)
col2 <- adjustcolor("blue", alpha.f=0.3)
EU <- data.frame(EuStockMarkets)
with(EU, plot(DAX, CAC, col=col2, type="h", ylim=c(0,6000)))
par(new=TRUE)
with(EU, plot(DAX, FTSE, col=col1, type="h", ylim=c(0,6000)))

The density of the red plot around 2000 completely obscures the blue plot behind it.

What I would like to do is plot both plots in solid colors, then alpha the entire thing, and then overlay them. Or some other method that achieves a comparable result.

Thanks
[R] Boxplot, formula interface, and labels.
I have data I'd like to plot using the formula interface to boxplot. I call boxplot like so:

with(mydata, boxplot(count ~ geno * tissue))

I get a boxplot with x axis labels like "wt.kidney". I would like to change the '.' to a newline. Where is this separator configured?

Thanks,
-Ed
Re: [R] Boxplot, formula interface, and labels.
Another way to think of this problem. If I could get my hands on the vector of names boxplot() is creating, I could use gsub() to replace '.' with '\n'.

Is there something I could run before boxplot() that would give me that vector of names which I could then pass to boxplot()?

On Thu, Sep 28, 2017 at 11:40 AM, Ed Siefker wrote:
> I have data I'd like to plot using the formula interface to boxplot.
> I call boxplot like so:
>
> with(mydata, boxplot(count ~ geno * tissue))
>
> I get a boxplot with x axis labels like "wt.kidney". I would like
> to change the '.' to a newline. Where is this separator configured?
>
> Thanks,
> -Ed
[R] Fwd: Boxplot, formula interface, and labels.
I knew I was making harder than it needed to be. I see it now in ?boxplot Thanks! On Thu, Sep 28, 2017 at 12:30 PM, David L Carlson wrote: > Just change the separator: > > data(Titanic) > Titanic.df <- as.data.frame(Titanic) > boxplot(Freq~Class*Sex, Titanic.df, cex.axis=.6, sep="\n") > > See attached .png. > > > David L Carlson > Department of Anthropology > Texas A&M University > College Station, TX 77843-4352 > > > -Original Message- > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Ista Zahn > Sent: Thursday, September 28, 2017 12:27 PM > To: Ed Siefker > Cc: r-help > Subject: Re: [R] Boxplot, formula interface, and labels. > > mybp <- boxplot(count ~ geno * tissue, data = mydata, plot = FALSE) > mybp$names <- gsub("\\.", "\n", mybp$names) > bxp(mybp) > > See ?boxplot for details. > > Best, > Ista > > On Thu, Sep 28, 2017 at 12:40 PM, Ed Siefker wrote: >> I have data I'd like to plot using the formula interface to boxplot. >> I call boxplot like so: >> >> with(mydata, boxplot(count ~ geno * tissue)) >> >> I get a boxplot with x axis labels like "wt.kidney". I would like to >> change the '.' to a newline. Where is this separator configured? >> >> Thanks, >> -Ed >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > __ > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. 
[R] data.matrix output is not numeric
I have a data frame full of integer values. I need a matrix full of numeric values. ?data.matrix reads:

  Return the matrix obtained by converting all the variables in a data frame to numeric mode and then binding them together as the columns of a matrix.

This does not work.

> test.df <- data.frame(a=as.integer(c(1,2,3)), b=as.integer(c(4,5,6)))
> class(test.df[[1,1]])
[1] "integer"
> class(data.matrix(test.df)[[1]])
[1] "integer"

What's going on here?
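A note on the behaviour described above: in R, "numeric mode" covers both integer and double storage, so an all-integer data frame yields an integer matrix that still passes is.numeric(). The following is a minimal sketch, assuming that double storage is what is actually needed; the name m.dbl is only illustrative.

```r
test.df <- data.frame(a = as.integer(c(1, 2, 3)), b = as.integer(c(4, 5, 6)))

m <- data.matrix(test.df)
is.numeric(m)   # integer storage still counts as numeric
typeof(m)       # storage type of the matrix

# Change the storage type in place; dim and dimnames are kept.
m.dbl <- m
storage.mode(m.dbl) <- "double"
typeof(m.dbl)
```

Using storage.mode<- rather than as.numeric() keeps the matrix shape; as.numeric() would drop the dim attribute.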
[R] reading data into nested frames
I have many data files named like this: E11.5-021415-dko-1-1-masked-bottom-area.tsv E11.5-021415-dko-1-1-masked-top-area.tsv E11.5-021415-dko-1-2-masked-bottom-area.tsv E11.5-021415-dko-1-2-masked-top-area.tsv E11.5-021415-dko-1-3-masked-bottom-area.tsv E11.5-021415-dko-1-3-masked-top-area.tsv age-date-genotype-num-slicenum-filler-position-data An individual sample is an age-date-geno-num, each sample has two parts, and is composed of around 10 slices. Each row of the tsv is an area which will be summed for the total area. What I want is a dataframe, with a row for each sample and a column for bottom and top. Under bottom and top, I want each element to be a dataframe with a row for each slice and a column for the area. So I can lapply over this list of files, use strsplit to pull out the slice num and put the area into the correct row of a dataframe easily enough. But I have a line for every datapoint, not sample, and there would be a dataframe for each area. How can I merge all the data for the slices into one data frame? Does this make sense? Thanks -Ed __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
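One possible way to approach this, sketched under assumptions: keep everything in a single "long" data frame with one row per file, and only reshape per sample when needed, rather than nesting data frames inside a data frame. The file names are taken from the post; the total.area values are made up and stand in for whatever summing each file's rows would give.

```r
# File names copied from the post; the per-file totals are hypothetical
# stand-ins for something like sum(read.delim(f, header = FALSE)[[1]]).
files <- c("E11.5-021415-dko-1-1-masked-bottom-area.tsv",
           "E11.5-021415-dko-1-1-masked-top-area.tsv",
           "E11.5-021415-dko-1-2-masked-bottom-area.tsv",
           "E11.5-021415-dko-1-2-masked-top-area.tsv",
           "E11.5-021415-dko-1-3-masked-bottom-area.tsv",
           "E11.5-021415-dko-1-3-masked-top-area.tsv")
total.area <- c(10, 11, 20, 21, 30, 31)   # hypothetical values

# Split each name on "-": age, date, genotype, num, slice, filler, position, data
parts <- do.call(rbind, strsplit(sub("\\.tsv$", "", files), "-"))

long <- data.frame(sample   = apply(parts[, 1:4], 1, paste, collapse = "-"),
                   slice    = as.integer(parts[, 5]),
                   position = parts[, 7],
                   area     = total.area,
                   stringsAsFactors = FALSE)

# One row per slice, one column per position, for a single sample
one <- long[long$sample == "E11.5-021415-dko-1", ]
by.slice <- tapply(one$area, list(slice = one$slice, position = one$position), sum)
```

With the long layout, split(long, long$sample) gives one data frame per sample if a list of per-sample tables is still wanted.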
[R] merging dataframes in a list
I have a list of data as follows.

> list(data.frame(name="sample1", red=20), data.frame(name="sample1", green=15), data.frame(name="sample2", red=10), data.frame(name="sample2", green=30))
[[1]]
     name red
1 sample1  20

[[2]]
     name green
1 sample1    15

[[3]]
     name red
1 sample2  10

[[4]]
     name green
1 sample2    30

I would like to massage this into a data frame like this:

     name red green
1 sample1  20    15
2 sample2  10    30

I'm imagining I can use aggregate(mylist, by=samplenames, merge), right? But how do I get the list of samplenames? How do I subset each dataframe inside the list?
Re: [R] merging dataframes in a list
I manually constructed the list of sample names and tried the aggregate call I mentioned. Merge works when called manually, but not when using aggregate.

> mylist <- list(data.frame(name="sample1", red=20), data.frame(name="sample1", green=15), data.frame(name="sample2", red=10), data.frame(name="sample2", green=30))
> names <- list("sample1", "sample1", "sample2", "sample2")
> merge(mylist[1], mylist[2])
     name red green
1 sample1  20    15
> merge(mylist[3], mylist[4])
     name red green
1 sample2  10    30
> aggregate(mylist, by=as.list(names), merge)
Error in as.data.frame(y) : argument "y" is missing, with no default

What's the right way to do this?

On Fri, Jun 3, 2016 at 1:20 PM, Ed Siefker wrote:
> I have a list of data as follows.
>
>> list(data.frame(name="sample1", red=20), data.frame(name="sample1", green=15), data.frame(name="sample2", red=10), data.frame(name="sample2", green=30))
> [[1]]
>      name red
> 1 sample1  20
>
> [[2]]
>      name green
> 1 sample1    15
>
> [[3]]
>      name red
> 1 sample2  10
>
> [[4]]
>      name green
> 1 sample2    30
>
> I would like to massage this into a data frame like this:
>
>      name red green
> 1 sample1  20    15
> 2 sample2  10    30
>
> I'm imagining I can use aggregate(mylist, by=samplenames, merge)
> right? But how do I get the list of samplenames? How do I subset
> each dataframe inside the list?
Re: [R] merging dataframes in a list
aggregate isn't really what I want. Maybe tapply? I still can't get it to work. > length(mylist) [1] 4 > length(names) [1] 4 > tapply(mylist, names, merge) Error in tapply(mylist, names, merge) : arguments must have same length I guess because a list isn't an atomic data type. What function will do the same on lists? lapply doesn't have a 'by' argument. On Fri, Jun 3, 2016 at 1:41 PM, Ed Siefker wrote: > I manually constructed the list of sample names and tried the > aggregate call I mentioned. > Merge works when called manually, but not when using aggregate. > >> mylist <- list(data.frame(name="sample1", red=20), >> data.frame(name="sample1", green=15), data.frame(name="sample2", red=10), >> data.frame(na me="sample2", green=30)) >> names <- list("sample1", "sample1", "sample2", "sample2") >> merge(mylist[1], mylist[2]) > name red green > 1 sample1 2015 >> merge(mylist[3], mylist[4]) > name red green > 1 sample2 1030 >> aggregate(mylist, by=as.list(names), merge) > Error in as.data.frame(y) : argument "y" is missing, with no default > > What's the right way to do this? > > On Fri, Jun 3, 2016 at 1:20 PM, Ed Siefker wrote: >> I have a list of data as follows. >> >>> list(data.frame(name="sample1", red=20), data.frame(name="sample1", >>> green=15), data.frame(name="sample2", red=10), data.frame(name="sample 2", >>> green=30)) >> [[1]] >> name red >> 1 sample1 20 >> >> [[2]] >> name green >> 1 sample115 >> >> [[3]] >> name red >> 1 sample2 10 >> >> [[4]] >> name green >> 1 sample230 >> >> >> I would like to massage this into a data frame like this: >> >> name red green >> 1 sample1 2015 >> 2 sample2 1030 >> >> >> I'm imagining I can use aggregate(mylist, by=samplenames, merge) >> right? But how do I get the list of samplenames? How do I subset >> each dataframe inside the list? 
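On the question of a 'by'-style function for lists: split() accepts a list, so one option is to split the list by sample name and merge within each group. The tapply() error is probably because a list passed as INDEX is treated as several grouping factors, each of length 1. The sketch below is one way to do it in base R, not the only one; sample.names, merged and result are illustrative names.

```r
mylist <- list(data.frame(name = "sample1", red = 20),
               data.frame(name = "sample1", green = 15),
               data.frame(name = "sample2", red = 10),
               data.frame(name = "sample2", green = 30))

# Pull the sample name out of each data frame instead of typing it by hand
sample.names <- vapply(mylist, function(d) as.character(d$name[1]), character(1))

# Group the list by sample, merge the pieces of each group on the shared
# "name" column, then stack the one-row results
merged <- lapply(split(mylist, sample.names), function(g) Reduce(merge, g))
result <- do.call(rbind, merged)
rownames(result) <- NULL
result
```

Reduce(merge, g) also copes with more than two data frames per sample, as long as they share only the key column.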
Re: [R] merging dataframes in a list
Thanks, ldply got me a data frame straight away. But it filled empty spaces with NA and merge no longer works. > ldply(mylist) name red green 1 sample1 20NA 2 sample1 NA15 3 sample2 10NA 4 sample2 NA30 > mydf <- ldply(mylist) > merge(mydf[1,],mydf[2,]) [1] name red green <0 rows> (or 0-length row.names) > merge(mydf[1,],mydf[2,], by=1) name red.x green.x red.y green.y 1 sample120 NANA 15 How do I merge dataframes with NA? On Fri, Jun 3, 2016 at 2:17 PM, Ulrik Stervbo wrote: > You can use ldply in the plyr package to bind all the data.frames together > (a regular loop will also work). Afterwards you can summarise using ddply > > Hope this helps > Ulrik > > > Ed Siefker schrieb am Fr., 3. Juni 2016 21:10: >> >> aggregate isn't really what I want. Maybe tapply? I still can't get >> it to work. >> >> > length(mylist) >> [1] 4 >> > length(names) >> [1] 4 >> > tapply(mylist, names, merge) >> Error in tapply(mylist, names, merge) : arguments must have same length >> >> I guess because a list isn't an atomic data type. What function will >> do the same on lists? lapply doesn't have a 'by' argument. >> >> On Fri, Jun 3, 2016 at 1:41 PM, Ed Siefker wrote: >> > I manually constructed the list of sample names and tried the >> > aggregate call I mentioned. >> > Merge works when called manually, but not when using aggregate. >> > >> >> mylist <- list(data.frame(name="sample1", red=20), >> >> data.frame(name="sample1", green=15), data.frame(name="sample2", red=10), >> >> data.frame(na me="sample2", green=30)) >> >> names <- list("sample1", "sample1", "sample2", "sample2") >> >> merge(mylist[1], mylist[2]) >> > name red green >> > 1 sample1 2015 >> >> merge(mylist[3], mylist[4]) >> > name red green >> > 1 sample2 1030 >> >> aggregate(mylist, by=as.list(names), merge) >> > Error in as.data.frame(y) : argument "y" is missing, with no default >> > >> > What's the right way to do this? >> > >> > On Fri, Jun 3, 2016 at 1:20 PM, Ed Siefker wrote: >> >> I have a list of data as follows. 
>> >> >> >>> list(data.frame(name="sample1", red=20), data.frame(name="sample1", >> >>> green=15), data.frame(name="sample2", red=10), data.frame(name="sample >> >>> 2", >> >>> green=30)) >> >> [[1]] >> >> name red >> >> 1 sample1 20 >> >> >> >> [[2]] >> >> name green >> >> 1 sample115 >> >> >> >> [[3]] >> >> name red >> >> 1 sample2 10 >> >> >> >> [[4]] >> >> name green >> >> 1 sample230 >> >> >> >> >> >> I would like to massage this into a data frame like this: >> >> >> >> name red green >> >> 1 sample1 2015 >> >> 2 sample2 1030 >> >> >> >> >> >> I'm imagining I can use aggregate(mylist, by=samplenames, merge) >> >> right? But how do I get the list of samplenames? How do I subset >> >> each dataframe inside the list? >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] metafor - code for analysing geometric means
Dear All

I have tried very hard to work out what to do with putting logged data into metafor; the paper says: 'geometric mean antibody concentrations (GMCs) or opsonophagocytic activity titres (geometric mean titres [GMT]) were calculated with 95% CIs by taking the antilog of the mean of the log concentration or titre transformations.'

Does this look right if I take the reported mean, upper and lower bound of the CI, and the number?

m<-log(mean)
ub<-log(upper bound)
lb<-log(lower bound)
diff<-ub-lb
SE<-diff/3.92
SD<-SE*(sqrt(n))

Then put m, SD and n for each group into metafor as normal. Or is there a better way? I am afraid I didn't understand how to do it on a log scale.

Thank you
Edward

Edward Purssell PhD
Senior Lecturer
Florence Nightingale Faculty of Nursing and Midwifery
King's College London
James Clerk Maxwell Building
57 Waterloo Road
London SE1 8WA
Telephone 020 7848 3021
Mobile 07782 374217
email edward.purss...@kcl.ac.uk
https://www.researchgate.net/profile/Edward_Purssell

From: Viechtbauer Wolfgang (STAT)
Sent: 14 November 2014 10:40
To: Michael Dewey; Purssell, Ed; r-help@r-project.org
Subject: RE: [R] metafor - code for analysing geometric means

With "geometric mean 1 CI /3.92", I assume you mean "(upper bound - lower bound) / 3.92". Two things:

1) That will give you the SE of the mean, not the SD of the observations (which is what you need as input).

2) Probably the CI for the geometric mean was calculated on the log-scale (as Michael hinted at). Check if log(upper bound) and log(lower bound) are (within rounding error) symmetric around log(geometric mean). Then (log(upper bound) - log(lower bound)) / 3.92 * sqrt(n) will give you the SD of the log of the values used to compute the geometric mean. Then you could use log(geometric mean) and that SD as input. But this would give you the difference of the log-transformed geometric means. Not sure if this is what you want to analyze.

Two more articles that may be helpful here:

Friedrich, J.
O., Adhikari, N. K., & Beyene, J. (2012). Ratio of geometric means to analyze continuous outcomes in meta-analysis: Comparison to mean differences and ratio of arithmetic means using empiric data and simulation. Statistics in Medicine, 31(17), 1857-1886. Souverein, O. W., Dullemeijer, C., van 't Veer, P., & van der Voet, H. (2012). Transformations of summary statistics as input in meta-analysis for linear dose-response models on a logarithmic scale: A methodology developed within EURRECA. BMC Medical Research Methodology, 12(57). Best, Wolfgang > -Original Message- > From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] > On Behalf Of Michael Dewey > Sent: Thursday, November 13, 2014 12:36 > To: Purssell, Ed; r-help@r-project.org > Subject: Re: [R] metafor - code for analysing geometric means > > On 13/11/2014 11:00, Purssell, Ed wrote: > > ?Dear All > > > > I have some data expressed in geometric means and 95% confidence > intervals. Can I code them in metafor as: > > > > rma(m1i=geometric mean 1, m2i=geometric mean 2, sd1i=geometric mean 1 > CI /3.92, sd2i=geometric mean 2 CI/3.92...etc, measure="MD") > > Would it not be better to work on the log scale? > > > All of the studies use geometric means. > > > > Thanks! > > > > Edward > > -- > Michael > http://www.dewey.myzen.co.uk __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
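The back-calculation discussed in this thread can be sketched in a few lines of base R. The numbers are made up for illustration, and the 3.92 divisor (2 x 1.96) assumes the reported interval is a normal-approximation 95% CI built on the log scale; if the paper used a t quantile, the result is only approximate.

```r
# Hypothetical reported values for one group: geometric mean, 95% CI, sample size
gm <- 2.5; lb <- 2.0; ub <- 3.125; n <- 40

m.log <- log(gm)

# If the CI was built on the log scale, it is symmetric around log(gm)
symmetric <- isTRUE(all.equal(log(ub) - m.log, m.log - log(lb)))

# CI width = 2 * 1.96 standard errors, so SE = width / 3.92 (normal approximation)
se.log <- (log(ub) - log(lb)) / (2 * qnorm(0.975))
sd.log <- se.log * sqrt(n)   # SD of the log-transformed observations
```

m.log, sd.log and n would then play the role of the mean, SD and sample size for that group; the resulting mean difference is on the log scale, that is, a log ratio of geometric means.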
[R] inverse of which()
Given a vector of booleans, which() will return the indices that are TRUE. Given a vector of indices, how can I get a vector of booleans? My intent is to do logical operations on the output of grep(). Maybe there's a better way to do this? Thanks -Ed
Re: [R] inverse of which()
That's exactly what I want! Thanks!
-Ed

On Wed, Feb 27, 2019 at 5:14 PM David L Carlson wrote:
> I'm not sure I completely understand your question. Would using grepl() instead of grep() let you do what you want?
>
> David L Carlson
> Department of Anthropology
> Texas A&M University
> College Station, TX 77843-4352
>
> -----Original Message-----
> From: R-help On Behalf Of Ed Siefker
> Sent: Wednesday, February 27, 2019 5:03 PM
> To: r-help
> Subject: [R] inverse of which()
>
> Given a vector of booleans, which() will return indices that are TRUE.
>
> Given a vector of indices, how can I get a vector of booleans?
>
> My intent is to do logical operations on the output of grep(). Maybe
> there's a better way to do this?
>
> Thanks
> -Ed
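For the record, both routes from this thread can be shown side by side: turning indices back into a logical vector, and skipping the indices altogether with grepl(). The vector x and the patterns are made up for illustration.

```r
x <- c("apple", "banana", "cherry", "mango")

idx      <- grep("an", x)            # integer indices of the matches
from.idx <- seq_along(x) %in% idx    # inverse of which(): indices -> logical
direct   <- grepl("an", x)           # logical vector straight from the pattern

# Logical operations then work element-wise, e.g. contains "an" but not "go"
keep <- direct & !grepl("go", x)
x[keep]
```

seq_along(x) %in% idx needs the length of the original vector, which is the piece of information which() throws away.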
[R] Using compute.es and metafor together
Dear All For mathematically challenged people such as myself; is it ok to use the compute.es package to calculate effect sizes and then import the effect sizes d and variances of d into metafor, coding these as yi and vi respectively and then running the meta-analysis? This seems easier because compute.es offers a lot of ways of calculating d and its variance using similar codes. Thanks Edward Edward Purssell PhD Senior Lecturer Florence Nightingale Faculty of Nursing and Midwifery King's College London James Clerk Maxwell Building 57 Waterloo Road London SE1 8WA Telephone 020 7848 3021 Mobile 07782 374217 email edward.purss...@kcl.ac.uk https://www.researchgate.net/profile/Edward_Purssell [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] metafor - code for analysing geometric means
Dear All

I have some data expressed in geometric means and 95% confidence intervals. Can I code them in metafor as:

rma(m1i=geometric mean 1, m2i=geometric mean 2, sd1i=geometric mean 1 CI /3.92, sd2i=geometric mean 2 CI/3.92...etc, measure="MD")

All of the studies use geometric means.

Thanks!

Edward
[R] plot changes usr?
I'm trying to plot() over an existing plot() like this:

> attach(mtcars)
> plot(mpg, hp)
> par(new=TRUE)
> par("usr")
[1]   9.46  34.84  40.68 346.32
> plot(mpg, hp, col="red", axes=FALSE, xlim=par("usr")[1:2], ylim=par("usr")[3:4], xlab="", ylab="")
> par("usr")
[1]   8.4448  35.8552  28.4544 358.5456

For some reason "usr" is changing, and so it's not plotting over the existing data in the right place.
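A likely explanation: with the default axis style (xaxs = "r", yaxs = "r") each plot() call widens the requested limits by 4% at both ends, so feeding the already widened "usr" values back in as xlim/ylim widens them a second time. The numbers in the post are consistent with that. A minimal sketch of one fix, asking for "internal" axis style on the overlay so the limits are used exactly; pdf(NULL) is only there so the example runs without opening a window.

```r
pdf(NULL)                               # throw-away graphics device

plot(mtcars$mpg, mtcars$hp)
usr1 <- par("usr")                      # data range extended by 4% per side

par(new = TRUE)
plot(mtcars$mpg, mtcars$hp, col = "red", axes = FALSE,
     xlim = usr1[1:2], ylim = usr1[3:4], xlab = "", ylab = "",
     xaxs = "i", yaxs = "i")            # use the limits as given, no extra 4%
usr2 <- par("usr")

invisible(dev.off())
```

If the goal is just to add red points to the same axes, points(mpg, hp, col = "red") avoids the second coordinate setup entirely.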
[R] problem with formula argument to randomForest
The randomForest function generates an error whenever I supply it with a formula using the function, I() to inhibit interpretation. When I do so, I always get an error like this one: Error in unique(c("AsIs", oldClass(x))) : object 'Age' not found Is this because of: 1. a restriction for the randomForest function that I have not seen documented; 2. a deficiency / error in randomForest; or 3. an error in my calling sequence? I am including a very simple example to demonstrate the problem. Simply using I() generates the error. This is not a meaningful use of I(), but is very simple. My Interest is for I( / ) . I also demonstrate that the usage of I() in a formula works just fine for another discrimination function, lda. The sample code is included after my signature, along with line-by-line output. Thanks in advance ! Ed Komp ITTC Lab, University of Kansas === > library(rpart) > library(MASS) > library(randomForest) randomForest 4.6-12 Type rfNews() to see new features/changes/bug fixes. > formula <- as.formula('Kyphosis ~ Age + Number + Start') > formula Kyphosis ~ Age + Number + Start > formulaWithI <- as.formula('Kyphosis ~ I(Age) + Number + Start') > formulaWithI Kyphosis ~ I(Age) + Number + Start > fit <- randomForest(formula, data=kyphosis) > fitWithI <- randomForest(formulaWithI, data=kyphosis) Error in unique(c("AsIs", oldClass(x))) : object 'Age' not found > > fit <- lda(formula, data = kyphosis) > fitWithI <- lda(formula, data = kyphosis) > fitWithI Call: lda(formula, data = kyphosis) Prior probabilities of groups: absent present 0.7901235 0.2098765 Group means: Age Number Start absent 79.89062 3.75 12.609375 present 97.82353 5.176471 7.294118 Coefficients of linear discriminants: LD1 Age 0.005910971 Number 0.291501797 Start -0.170496626 > > sessionInfo() R version 3.2.2 (2015-08-14) Platform: x86_64-apple-darwin13.4.0 (64-bit) Running under: OS X 10.11 (El Capitan) locale: [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8 attached base 
packages: [1] stats graphics grDevices utils datasets methods base other attached packages: [1] randomForest_4.6-12 MASS_7.3-44 rpart_4.1-10 __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] aggregate and the $ operator
Aggregate does the right thing with column names when passing it numerical coordinates. Given a dataframe like this:

  Nuclei Positive Nuclei Slide
1    133              96    A1
2     96              70    A1
3     62              52    A2
4     60              50    A2

I can call 'aggregate' like this:

> aggregate(example[1], by=example[3], sum)
  Slide Nuclei
1    A1    229
2    A2    122

But that means I have to keep track of which column is which number. If I try it the easy way, it doesn't keep track of column names and it forces me to coerce the 'by' to a list.

> aggregate(example$Nuclei, by=list(example$Slide), sum)
  Group.1   x
1      A1 229
2      A2 122

Is there a better way to do this? Thanks
-Ed
Re: [R] aggregate and the $ operator
So that's how that works! Thanks. On Fri, Jan 22, 2016 at 1:32 PM, Joe Ceradini wrote: > Does this do what you want? > > aggregate(Nuclei ~ Slide, example, sum) > > On Fri, Jan 22, 2016 at 12:20 PM, Ed Siefker wrote: >> >> Aggregate does the right thing with column names when passing it >> numerical coordinates. >> Given a dataframe like this: >> >> Nuclei Positive Nuclei Slide >> 1133 96A1 >> 2 96 70A1 >> 3 62 52A2 >> 4 60 50A2 >> >> I can call 'aggregate' like this: >> >> > aggregate(example[1], by=example[3], sum) >> Slide Nuclei >> 1A1229 >> 2A2122 >> >> But that means I have to keep track of which column is which number. >> If I try it the >> easy way, it doesn't keep track of column names and it forces me to >> coerce the 'by' >> to a list. >> >> > aggregate(example$Nuclei, by=list(example$Slide), sum) >> Group.1 x >> 1 A1 229 >> 2 A2 122 >> >> Is there a better way to do this? Thanks >> -Ed >> >> __ >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> and provide commented, minimal, self-contained, reproducible code. > > > > > -- > Cooperative Fish and Wildlife Research Unit > Zoology and Physiology Dept. > University of Wyoming > joecerad...@gmail.com / 914.707.8506 > wyocoopunit.org > __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
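To round off the thread, here is a small sketch of two name-based alternatives to numeric column positions. The data frame is retyped from the original post, with the second column written as Positive.Nuclei (an assumed syntactic spelling of "Positive Nuclei").

```r
# Data retyped from the post; "Positive.Nuclei" is an assumed column spelling
example <- data.frame(Nuclei          = c(133, 96, 62, 60),
                      Positive.Nuclei = c(96, 70, 52, 50),
                      Slide           = c("A1", "A1", "A2", "A2"))

# Single-bracket indexing by name returns one-column data frames, which are
# lists, so the names carry through and no list() wrapper is needed
by.name <- aggregate(example["Nuclei"], by = example["Slide"], FUN = sum)

# Formula interface; cbind() aggregates several columns at once
by.formula <- aggregate(cbind(Nuclei, Positive.Nuclei) ~ Slide,
                        data = example, FUN = sum)
```

The difference from example$Nuclei is that $ returns a bare vector, which has lost its column name by the time aggregate() sees it.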
[R] lists and rownames
I'm doing some string manipulation on a vector of file names, and noticed something curious. When I strsplit the vector, I get a list of character vectors. The list is numbered, as lists are. When I cast that list as a data frame with 'as.data.frame()', the resulting columns have names derived from the original filenames. Example code is below. My question is, where are these names stored in the list? Are there methods that can access this from the list? Is there a way to preserve them verbatim? Thanks -Ed > example.names [1] "con1-1-masked-bottom-green.tsv" "con1-1-masked-bottom-red.tsv" [3] "con1-1-masked-top-green.tsv""con1-1-masked-top-red.tsv" > example.list <- strsplit(example.names, "-") > example.list [[1]] [1] "con1" "1" "masked""bottom""green.tsv" [[2]] [1] "con1""1" "masked" "bottom" "red.tsv" [[3]] [1] "con1" "1" "masked""top" "green.tsv" [[4]] [1] "con1""1" "masked" "top" "red.tsv" > example.df <- as.data.frame(example.list) > example.df c..con11maskedbottomgreen.tsv.. 1con1 2 1 3 masked 4 bottom 5 green.tsv c..con11maskedbottomred.tsv.. 1 con1 2 1 3masked 4bottom 5 red.tsv c..con11maskedtopgreen.tsv.. 1 con1 21 3 masked 4 top 5green.tsv c..con11maskedtopred.tsv.. 1 con1 2 1 3 masked 4top 5red.tsv __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
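As far as I can tell, the names are not stored in the list at all: strsplit() on an unnamed vector returns an unnamed list, and the column names seen after as.data.frame() appear to be generated from the contents of each element and then made syntactic. A minimal sketch, assuming the goal is to keep the original file names attached verbatim:

```r
example.names <- c("con1-1-masked-bottom-green.tsv", "con1-1-masked-bottom-red.tsv",
                   "con1-1-masked-top-green.tsv",    "con1-1-masked-top-red.tsv")
example.list <- strsplit(example.names, "-")

names(example.list)    # NULL: no names are stored in the list

# Attach the file names explicitly, then bind the pieces row-wise
names(example.list) <- example.names
parts <- do.call(rbind, example.list)   # character matrix, one row per file
rownames(parts)                         # the original file names, verbatim
```

Row names of a matrix are not passed through make.names(), which is why they survive unchanged here.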
[R] T tests on multiple groups
I have a data set with observations on groups with multiple variables. Let's call them GENO and AGE. I have control and test genotypes and two different ages. It is only meaningful to compare control and test within the same age. I'd like to get the p value for each group compared back to control of the appropriate age. T-test requires that the grouping factor has exactly two levels. How can I do this efficiently? I was hoping something like ttest(OBS ~ GENO * AGE, mydata) would work. Is there something I can do with tapply() or aggregate() to do this? I'd like to end up with a table that looks like this:

GENO     Age  OBS   p.val
control  10   1.1   1
control  10   0.9   1
control  20   2.1   1
control  20   1.9   1
A        10   11    0.01224066
A        10   9     0.01224066
A        20   21    0.003102783
A        20   19    0.003102783
B        10   4     0.057714305
B        10   6     0.057714305
B        20   14    0.005923285
B        20   16    0.005923285
AB       10   1     0.698488655
AB       10   1.1   0.698488655
AB       20   2     0.552786405
AB       20   2.2   0.552786405
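One way to get such a table, sketched in base R: split the data by age, run t.test() for each genotype against that age's controls, and look the p-values back up per row. The data are retyped from the table in the post; var.equal = TRUE is an assumption, chosen because the pooled-variance test appears to reproduce the posted p-values (about 0.0122 for A at age 10). The helper name p.vs.control is illustrative.

```r
mydata <- data.frame(
  GENO = rep(c("control", "A", "B", "AB"), each = 4),
  AGE  = rep(rep(c(10, 20), each = 2), times = 4),
  OBS  = c(1.1, 0.9, 2.1, 1.9,  11, 9, 21, 19,  4, 6, 14, 16,  1, 1.1, 2, 2.2),
  stringsAsFactors = FALSE
)

# p-value of each genotype against the control group, within one age stratum
p.vs.control <- function(d) {
  ctrl <- d$OBS[d$GENO == "control"]
  sapply(split(d$OBS, d$GENO),
         function(x) t.test(x, ctrl, var.equal = TRUE)$p.value)
}

# One named vector of p-values per age, then look up each row's value
p.by.age <- lapply(split(mydata, mydata$AGE), p.vs.control)
mydata$p.val <- unname(mapply(function(g, a) p.by.age[[as.character(a)]][[g]],
                              mydata$GENO, mydata$AGE))
```

Dropping var.equal = TRUE gives Welch tests instead, and with several comparisons per age an adjustment such as p.adjust() may be worth considering.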
[R] Average over data sets
Hello,

I have a number of files output1.dat, output2.dat, ... , output20.dat, each of which monitors several variables over a fixed number of timepoints. From this I want to create a data frame which contains the mean value between all files, for each timepoint and each variable. The code below works, but it seems like I should be able to do the second part without a for loop. I played with sapply(myList, mean), but that seems to take the mean between time points and files, rather than just between files.

#Number of files to calculate mean value between
numberOfRuns = 20;
myList = list();
for (i in 1:numberOfRuns) {
  #Read in file
  fileName = paste("output", i, ".dat", sep="");
  myData = read.table(fileName, header=TRUE);
  #Append data frame to list
  myList[[i]] = myData;
}

#Create variable to store data means
myAverage = myList[[1]]/numberOfRuns;
for (i in 2:numberOfRuns) {
  myAverage = myAverage + myList[[i]]/numberOfRuns;
}

Is a list of data frames a sensible structure to store this or should I use an array? Any pointers gratefully received.

Ed Long
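The second loop can be replaced by Reduce(), which adds the data frames element-wise. The sketch below uses simulated data frames in place of the read.table() results, so the sizes (3 runs, 5 timepoints, 2 variables) are assumptions; it also shows the array alternative asked about.

```r
set.seed(1)
# Stand-in for the list built by read.table(): 3 runs, 5 timepoints, 2 variables
myList <- lapply(1:3, function(i) data.frame(x = rnorm(5), y = runif(5)))

# Element-wise sum of all data frames, divided by the number of runs
myAverage <- Reduce(`+`, myList) / length(myList)

# Array alternative: stack the runs along a third dimension and average over it
arr <- array(unlist(lapply(myList, as.matrix)),
             dim = c(dim(myList[[1]]), length(myList)))
myAverage2 <- rowMeans(arr, dims = 2)
```

The array form makes other summaries easy too, for example apply(arr, c(1, 2), sd) for the between-run standard deviation.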
[R] cygwin clipboard
I'd like to be able to access the windows clipboard from R under Cygwin. But... > read.table(file="clipboard") Error in file(file, "rt") : cannot open the connection In addition: Warning message: In file(file, "rt") : unable to contact X11 display > Is this supported in any way? Thanks -Ed __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] why is this a factor?
I have a table, and I want a new column to add some annotations to. But it ends up as a factor instead of characters, and won't let me add arbitrary text.

> data(iris)
> iris<-data.frame(iris,annot=c(""))
> iris[1,"annot"]<-"annotation"
Warning message:
In `[<-.factor`(`*tmp*`, iseq, value = "annotation") :
  invalid factor level, NAs generated
> class(iris[,"annot"])
[1] "factor"
> class(c(""))
[1] "character"

Why is c("") a character, but when I add it to a data frame it's a factor? What am I missing? Is there a better way to add a new column to a data frame?
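The conversion comes from data.frame() itself: in R versions before 4.0.0 its stringsAsFactors argument defaulted to TRUE, so character input became a factor. Two ways around it are sketched below; iris1 and iris2 are illustrative names used so the built-in data set is not overwritten.

```r
data(iris)

# Option 1: switch off factor conversion when building the data frame
iris1 <- data.frame(iris, annot = "", stringsAsFactors = FALSE)
iris1[1, "annot"] <- "annotation"

# Option 2: assign the new column directly; a character vector stays character
iris2 <- iris
iris2$annot <- ""
iris2[1, "annot"] <- "annotation"

class(iris1$annot)
class(iris2$annot)
```

Existing factor columns such as Species are left untouched by either option.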
[R] Metafor - why use escalc?
Dear All As you can specify the data directly to rma.uni via n1i, m1i, sd1i, etc in Metafor, why would you ever want to use escalc to calculate yi and vi? Aren't these just intermediate steps to the final pooled effect size which is calculated by rma.uni; or is there some advantage to calculating yi and vi separately using escalc? Thanks Ed [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] metafor combining escalc effect-sizes
Dear All I have a question about combining effect sizes generated by escalc in metafor. I realise these may be stupid things to do; but they are deliberately so to explain what I mean - I don't intend doing this! I have 3 studies; each of which has a different measure of effect/presents the data differently, so I use escalc to calculate the effect size of each and combine them into a data-frame: es1<-escalc(measure="MD", m1i=10 , m2i=5 , n1i=12 , n2i=12, sd1i=2, sd2i=2) es2<-escalc(measure="RR", ai=10 , bi=5 , ci=12 , di=12) es3<-escalc(measure="RR", ai=10 , ci=5 , n1i=15 , n2i=12) es4<-rbind(es1, es2, es3) # combines the 3 effect sizes into a data frame attach(es4) # makes the data frame available to R es5<-rma(yi, vi, data=es4) # running the meta analysis here gives the error message Error in rma(yi, vi, data = es4) : Length of yi and ni vectors are not the same. But if I save this as a .csv and open it in R using read.csv("E:/es5.csv", etc) i get a data frame that looks like this: yi vi 1 5. 0.6667 2 0.6931 0.4667 3 0.4700 0.1500 I can run it using rma(yi, vi, data=es4) I have three questions. 1. Can escalc be used in this way to calculate each study effect size indvidually and then rbinding them into a data-frame (assuming that it is a sensible thing to do, which I realise the above probably isn't)? 2. What is the meaning of the error message: Error in rma(yi, vi, data = es4) : Length of yi and ni vectors are not the same. 3. Is it right to save it as a .csv, open it and re-run it as I have done? Thanks very much, and to Wolfgang thanks for a great programme! I am using it in my MSc teaching here for healthcare students. Edward [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Java requested System.exit(130)
I'm used to using ctrl-C to end operations without killing R. But I've used xlsx in this session, which loads Java, which apparently intercepts the ctrl-C. Accordingly, I hit ctrl-C, R died, and I lost a lot of work.

I did some looking, and found a thread (http://comments.gmane.org/gmane.comp.lang.r.rosuda.devel/1368) that says: "Yes, at least on Sun JVMs you need to add -Xrs java option so the JVM doesn't steal SIGINT from R (see archives)."

So, how do I actually do that? I'm not running java from the command line, I'm using "library(xlsx)". How do I tell R to pass that option to the JVM?

Thanks
-Ed
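One commonly used route, sketched here under the assumption that xlsx starts its JVM through rJava and that rJava reads the java.parameters option at start-up: set the option in a fresh session before the package is loaded. Setting it after the JVM is already running is not expected to have any effect.

```r
# Run this first in a new R session, before anything loads rJava/xlsx.
# "-Xrs" is the JVM flag quoted in the thread above.
options(java.parameters = "-Xrs")

# Then load the package as usual (left commented so the sketch runs
# even where xlsx is not installed):
# library(xlsx)

getOption("java.parameters")
```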
Re: [R] R licensing query
Unfortunately this is how things work in the real world. I suspect the reason so many people keep getting in trouble for taking classified information home is that they cannot get any work done on the office computer due to things like this. Many of the places I've worked have not permitted me to install Vim on my computer. I had to use the MS Visual C++ editor or MS Notepad for all text editing. Since I usually get paid by the hour, it just costs them more and increases my income, but I still find it incredibly annoying.

-EdK

Ed Keith
e_...@yahoo.com
Blog: edkeith.blogspot.com

--- On Thu, 6/17/10, Frank E Harrell Jr wrote:
> From: Frank E Harrell Jr
> Date: Thursday, June 17, 2010, 12:11 PM
>
> Pardon my english but you're working for idiots. I'd look elsewhere if
> there are other options. IT departments should be here to help get
> things done, not to help prevent good work from being done.
>
> Frank
>
> On 06/17/2010 04:28 AM, McAllister, Gina wrote:
> > I have recently started a new job at an NHS hospital in Scotland. Since
> > I took up this post 6 months ago I have had an ongoing dispute with the
> > IT security dept. who refuse to install R on my computer. I previously
> > worked in another branch of the NHS where R was widely used and yet
> > there is nothing I can say which will persuade the IT dept here to even
> > visit the website! With some help from our head of department, they
> > have now agreed to install R but only if they receive an email from 'R'
> > ensuring that it is licensed for commercial use, is compatible with
> > Windows XP and will not affect the networked computer system here. My
> > only other option for data analysis is Excel, we have no money for
> > S-plus or any other stats programme. Can anyone suggest anything or
> > send me a suitable email?
> >
> > Many thanks,
> > Georgina
>
> --
> Frank E Harrell Jr   Professor and Chairman   School of Medicine
>                      Department of Biostatistics   Vanderbilt University
Re: [R] F# vs. R
It's been a long time since I used Fortran, and I have only dabbled in F#, but I do not think translating Fortran (or R) to F# will be easy. F# is basically a functional language (like ML) and calls for a very different mindset from Fortran (or R).

-EdK

Ed Keith
e_...@yahoo.com
Blog: edkeith.blogspot.com

--- On Thu, 7/8/10, rkevinbur...@charter.net wrote:
> From: rkevinbur...@charter.net
> Subject: Re: [R] F# vs. R
> To: r-help@r-project.org, "Patrick Burns" , serg...@gmail.com
> Date: Thursday, July 8, 2010, 10:16 AM
>
> True, porting old C and Fortran code to C# or F# would be a pain and
> probably riddled with errors but it is not too soon to start looking to
> see if there is a better way. There have been numerous ports of LAPACK,
> BLAS, etc. to C#. Maybe they could be leveraged.
>
> Maybe just allowing packages to be written in C# or F# would be helpful.
> And remember there is Mono.
>
> Just my 2 cents.
>
> Patrick Burns wrote:
> > I'd like to hear answers to this as well.
> > A language doesn't have to be a complete
> > replacement to be useful.
> >
> > F# seems to have some nice features.
> >
> > Pat
> >
> > On 07/07/2010 17:54, Sergey Goriatchev wrote:
> > > Hello, Marc
> > >
> > > No, I do not want to validate Cox PH. :-)
> > > I do use R daily, though right now I do not use the statistical part
> > > that much.
> > >
> > > I just generally wonder if any R-user tried F# and his/her opinions.
> > >
> > > Regards,
> > > Sergey
> > >
> > > On Wed, Jul 7, 2010 at 17:56, Marc Schwartz wrote:
> > >> On Jul 7, 2010, at 10:31 AM, Sergey Goriatchev wrote:
> > >>
> > >>> Hello, everyone
> > >>>
> > >>> F# is now public. Compiled code should run faster than R.
> > >>>
> > >>> Anyone has opinion on F# vs. R? Just curious
> > >>>
> > >>> Best,
> > >>> S
> > >>
> > >> The key time critical parts of R are written in compiled C and FORTRAN.
> > >>
> > >> Of course, if you want to take the time to code and validate a Cox PH
> > >> or mixed effects model in F# and then run them against R's coxph() or
> > >> lme()/lmer() functions to test the timing, feel free... :-)
> > >>
> > >> So unless there is a pre-existing library of statistical and related
> > >> functionality for F#, perhaps you need to reconsider your query.
> > >>
> > >> Regards,
> > >>
> > >> Marc Schwartz
> >
> > --
> > Patrick Burns
> > pbu...@pburns.seanet.com
> > http://www.burns-stat.com
> > (home of 'Some hints for the R beginner'
> > and 'The R Inferno')
[R] Using feather.plot to try and generate a stick plot of current velocity data (and having issues)
Hello All,

I am attempting to use the feather.plot function from the plotrix package to graph current velocity data, as I have speed and direction. I let "r" be the first 10 rows of current speed data and "theta" be the first 10 rows of directional data in radians. I tried this with 10 measurements, but keep getting the following error message:

> feather.plot(r,theta,1:10,yref=0,use.arrows=FALSE, fp.type="m")
Error in segments(xpos, yref, xpos + x, y, ...) : invalid third argument

My goal was to troubleshoot this smaller data set and see if I could ramp up to a few thousand entries to basically generate a stick plot of current flow data. I am curious whether I am doing anything obviously wrong with my arguments or whether I should be using an entirely different function.

Thanks for any guidance
Eddie Hughes
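A stick plot can also be sketched with base graphics alone, which may help in isolating the problem. The sketch below is not the plotrix implementation. It uses made-up data, measures theta counter-clockwise from the x-axis (an assumption; compass bearings would need converting), and first checks that r and theta are plain numeric vectors, since passing a one-column data frame is one possible (unconfirmed) source of an "invalid argument" error in segments().

```r
set.seed(1)
r     <- runif(10, 0.1, 1)      # made-up current speeds
theta <- runif(10, 0, 2 * pi)   # made-up directions in radians

# If the values come from a data frame, extract columns with [[ ]] or
# as.numeric() so that these checks pass.
stopifnot(is.numeric(r), is.numeric(theta), length(r) == length(theta))

xpos <- seq_along(r)            # one stick per observation
dx   <- r * cos(theta)          # horizontal component of each stick
dy   <- r * sin(theta)          # vertical component of each stick

plot(NA, xlim = range(c(xpos, xpos + dx)), ylim = range(c(0, dy)),
     xlab = "observation", ylab = "velocity component")
abline(h = 0, col = "lightgray")     # reference line
segments(xpos, 0, xpos + dx, dy)     # sticks start on the reference line
```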
[R] Alphabetical sequence of data along the x-axis in a box plot
Hello All,

I noticed that when I generated some boxplots, the data were presented in alphabetical order along the x-axis (the data in this case were the four quadrants of a sample area (NE, NW, SE, SW), which was my first column of data). Is there a way to have R plot the data in a different order? I imagine you could use a dummy variable, but didn't know if there might be a simple argument that will address this.

Thanks for any guidance,
Eddie Hughes
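boxplot() draws the boxes in the order of the factor levels, and those default to alphabetical order. A minimal sketch with made-up values: set the levels explicitly before plotting.

```r
quadrant <- c("NE", "NW", "SE", "SW", "NE", "NW", "SE", "SW")
value    <- c(3, 5, 2, 8, 4, 6, 1, 7)   # made-up measurements

# The order given in `levels` is the order used along the x-axis.
quadrant <- factor(quadrant, levels = c("NW", "NE", "SW", "SE"))

boxplot(value ~ quadrant)
levels(quadrant)
```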
[R] generating skewed random numbers
This is not exactly an R-specific question, but I think the people on this list can probably help. I'm working on a simulation. In the model I have the first three moments of the distributions of the variables. I know how to generate a random number from a distribution given the first two moments, assuming the third moment is 0. But I do not know how to generate a number drawn from a distribution with a nonzero third moment. If someone could point me to a good reference I would appreciate it.

Thank you in advance,
-EdK

Ed Keith
e_...@yahoo.com
Blog: edkeith.blogspot.com
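Three moments do not determine a distribution, so any generator has to pick a family. One simple choice, sketched here as an assumption rather than a recommendation, is a shifted gamma (reflected for negative skew). It takes the third moment to be given as skewness, i.e. the standardized third central moment, and rskew is a hypothetical helper name.

```r
# Hypothetical helper: n draws with the requested mean, sd and skewness,
# taken from a shifted (and, for negative skew, reflected) gamma.
rskew <- function(n, mean, sd, skew) {
  if (skew == 0) return(rnorm(n, mean, sd))
  shape <- 4 / skew^2          # gamma skewness is 2 / sqrt(shape)
  scale <- sd / sqrt(shape)    # gamma variance is shape * scale^2
  z <- rgamma(n, shape = shape, scale = scale) - shape * scale  # centred
  mean + sign(skew) * z        # reflect for negative skew, then shift
}

set.seed(42)
x <- rskew(1e5, mean = 10, sd = 2, skew = 1)
moments <- c(mean = mean(x), sd = sd(x),
             skew = mean((x - mean(x))^3) / sd(x)^3)
print(round(moments, 2))
```

Other families (skew-normal, lognormal, Johnson or Pearson systems) can match the same three moments with different tail behaviour, so the choice should follow from what the simulation needs.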
[R] Display a DataFrame in a data grid
Hi, all;

I'm new to R. I have been a SAS developer for over 20 years. Whenever I create a new table (you call them data.frame objects) or modify an existing one, I like to open the table in a grid with horizontal and vertical sliders so that I can scan across the table and (especially) look at all four corners. If I made a gross error, it often shows up when I look at the corners of the table. I just can't seem to find how to invoke such a display. Can anybody help me here?

Ed

Ed Heaton
Project Manager, Sr. SAS Developer
Data and Analytic Solutions, Inc.
3057 Nutley Street, #602
Fairfax, VA 22031
Office: 301-520-7414
Fax: 703-991-8182
ehea...@dasconsultants.com
www.dasconsultants.com
CMMI ML-2, SBA 8(a) & SDB, WBE (WBENC), MBE (VA & MD)
[R] Thanks for the help with displaying a data frame.
Thanks to Michael Weylandt and Josh Wiley for pointing me to the View() function. It worked like a charm - once I learned that R is case-sensitive. I told you I am new to R!

Ed

Ed Heaton
10318 Yearling Drive
Rockville, MD 20850-3517
Voice: (301) 424-8186
Mobile: (301) 520-7414
Fax: (301) 424-8187
eMail: e...@heaton.name
URL: http://ed.heaton.name
[R] Getting data from an *.RData file into a data.frame object.
Hi, all.

I'm new to R. I've been a SAS programmer for 20 years. I seem to be having trouble with the most basic task - bringing a table in an *.RData file into a data.frame object. Here's how I created the *.RData file.

library(RODBC)
db <- odbcConnect("***")
df <- sqlQuery( db , "select * from schema.table where year(someDate)=2006" )
save( df , file="C:/Documents and Settings/userName/My Documents/table2006.RData" )
dim(df)
remove(df)
odbcClose(db)
remove(db)
detach("package:RODBC")

Next, I moved that data file (table2006.RData) to another workstation - not at the client site. Now, I need to get that data file into a data.frame object. I know this should be simple, but I can't seem to find out how to do that. I tried the following. First, after opening R without doing anything, RGui used 35,008 KB of memory. I submitted the following.

> debt2006 <- load("T:/R.Data/table2006.RData")

Memory used by RGui jumped to 191,512 KB. So, it looks like the data loaded. However, debt2005 is of type character instead of data.frame.

> ls()
[1] "debt2005"
> class(debt2005)
[1] "character"
>

Help, please.

Ed

Ed Heaton
Project Manager, Sr. SAS Developer
Data and Analytic Solutions, Inc.
10318 Yearling Drive
Rockville, MD 20850
Office: 301-520-7414
ehea...@dasconsultants.com
www.dasconsultants.com
CMMI ML-2, SBA 8(a) & SDB, WBE (WBENC), MBE (VA & MD)
e...@heaton.name (Re: http://www.r-project.org/posting-guide.html)
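The behaviour described is how load() works: it restores the saved objects under their original names (here df) and returns only a character vector of those names. A small self-contained sketch, using a temporary file and a toy data frame in place of the real table:

```r
df <- data.frame(a = 1:3)               # stand-in for the saved table
f  <- tempfile(fileext = ".RData")
save(df, file = f)
rm(df)

loaded <- load(f)       # recreates `df`; the return value is just "df"
debt   <- get(loaded[1])  # fetch the restored object by its name
class(debt)

# Alternative: load into a separate environment to avoid overwriting
# anything in the workspace, then pull the object out by name.
e     <- new.env()
nm    <- load(f, envir = e)
debt2 <- e[[nm[1]]]
```

For a single object, saveRDS() and readRDS() avoid the issue, because readRDS() returns the object itself and lets the caller choose the name.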
[R] Writing non-graphic (text) output to PDF
Hi, friends.

I keep coming to you because I'm so new to R and can't seem to figure out some simple things. Sorry. Consider the following code. I want to load a table and write out the structure to a PDF document. I just can't seem to manage writing non-graphic output to PDF. Any help? I've tried several functions, but nothing worked. All I get is the title.

# **
# Load the DEBT table.
debt <- readRDS("T:/R.Data/Debt.rData")
dim(debt)
# Open the debt.pdf file for graphics output.
pdf( file=paste( "R:/DAS/DMS/FedDebt"
               ,"DataDiscovery"
               ,"DistributionAnalysis"
               ,"Report"
               ,"Debt.pdf"
               ,sep="/" ) )
# ==
# Write the debt structure to the output PDF.
plot.new()
title("DEBT")
str(debt)
# ==
dev.off() # Turn off the PDF device.
# ** End of Program

Ed

Ed Heaton
Project Manager, Sr. SAS Developer
Data and Analytic Solutions, Inc.
10318 Yearling Drive
Rockville, MD 20850
Office: 301-520-7414
ehea...@dasconsultants.com
www.dasconsultants.com
CMMI ML-2, SBA 8(a) & SDB, WBE (WBENC), MBE (VA & MD)
e...@heaton.name (Re: http://www.r-project.org/posting-guide.html)
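str() prints to the console, not to the graphics device, which would explain why only the title reaches the PDF. One base-R workaround, sketched with a toy data frame and a temporary file in place of the real table and path: capture the printed text and draw it on the page with text().

```r
debt <- data.frame(id = 1:3, amount = c(10.5, 20, 7.25))  # stand-in data

# Capture what str() would print, one element per output line.
lines <- capture.output(str(debt))

out <- tempfile(fileext = ".pdf")   # replace with the real report path
pdf(file = out)
plot.new()
title("DEBT")
# Draw the captured text from the top-left corner in a monospaced font.
text(0, 1, paste(lines, collapse = "\n"), adj = c(0, 1), family = "mono")
dev.off()
```

Long output will run off the page with this approach, so it would need to be split across several plot.new() pages or written with a smaller cex.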
[R] Why do I have a column called row.names?
I'm trying to read in a tab separated table with read.delim(). I don't particularly care what the row names are. My data file looks like this: start stopSymbol Insert sequence Clone End Pair FISH 203048 67173930ABC8-43024000D23TI:993812543 TI:993834585 255176 87869359ABC8-43034700N15TI:995224581 TI:995237913 1022033 1060472 ABC27-1253C21 TI:2094436044 TI:2094696079 1022033 1061172 ABC23-1388A1TI:2120730727 TI:2121592459 I have to do something with row.names because my first column has duplicate entries. So I read in the file like this: > BACS<-read.delim("testdata.txt", row.names=NULL, fill=TRUE) > head(BACS) row.namesstart stop Symbol Insert.sequence Clone.End.Pair 1203048 67173930 ABC8-43024000D23 NATI:993812543 TI:993834585 2255176 87869359 ABC8-43034700N15 NATI:995224581 TI:995237913 3 1022033 1060472ABC27-1253C21 NA TI:2094436044 TI:2094696079 4 1022033 1061172 ABC23-1388A1 NA TI:2120730727 TI:2121592459 FISH 1 NA 2 NA 3 NA 4 NA Why is there a column named "row.names"? I've tried a few different ways of invoking this, but I always get the first column named row.names, and the rest of the columns shifted by one. Obviously I could fix this by using row.names<-, but I'd like to understand why this happens. Any insight? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Why do I have a column called row.names?
I did read that, and I still don't understand why I have a column called row.names. I used "row.names = NULL" in order to get numbered row names, which was successful: > row.names(BACS) [1] "1" "2" "3" "4" I don't see what this has to do with an extraneous column name. Can you be more explicit as to what exactly I'm supposed to take away from this segment of the help file? Thanks. On Mon, Jun 4, 2012 at 1:05 PM, David L Carlson wrote: > Try help("read.delim") - always a good strategy before using a function for > the first time: > > In it, you will find: "Using row.names = NULL forces row numbering. Missing > or NULL row.names generate row names that are considered to be 'automatic' > (and not preserved by as.matrix)." > > -- > David L Carlson > Associate Professor of Anthropology > Texas A&M University > College Station, TX 77843-4352 > > >> -Original Message- >> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r- >> project.org] On Behalf Of Ed Siefker >> Sent: Monday, June 04, 2012 12:47 PM >> To: r-help@r-project.org >> Subject: [R] Why do I have a column called row.names? >> >> I'm trying to read in a tab separated table with read.delim(). >> I don't particularly care what the row names are. >> My data file looks like this: >> >> >> start stop Symbol Insert sequence Clone End Pair FISH >> 203048 67173930 ABC8-43024000D23 TI:993812543 >> TI:993834585 >> 255176 87869359 ABC8-43034700N15 TI:995224581 >> TI:995237913 >> 1022033 1060472 ABC27-1253C21 TI:2094436044 TI:2094696079 >> 1022033 1061172 ABC23-1388A1 TI:2120730727 TI:2121592459 >> >> >> >> I have to do something with row.names because my first column has >> duplicate entries. 
So I read in the file like this: >> >> > BACS<-read.delim("testdata.txt", row.names=NULL, fill=TRUE) >> > head(BACS) >> row.names start stop Symbol Insert.sequence >> Clone.End.Pair >> 1 203048 67173930 ABC8-43024000D23 NA TI:993812543 >> TI:993834585 >> 2 255176 87869359 ABC8-43034700N15 NA TI:995224581 >> TI:995237913 >> 3 1022033 1060472 ABC27-1253C21 NA TI:2094436044 >> TI:2094696079 >> 4 1022033 1061172 ABC23-1388A1 NA TI:2120730727 >> TI:2121592459 >> FISH >> 1 NA >> 2 NA >> 3 NA >> 4 NA >> >> >> Why is there a column named "row.names"? I've tried a few different >> ways of invoking this, but I always get the first column named >> row.names, >> and the rest of the columns shifted by one. >> >> Obviously I could fix this by using row.names<-, but I'd like to >> understand >> why this happens. Any insight? >> >> __ >> R-help@r-project.org mailing list >> https://stat.ethz.ch/mailman/listinfo/r-help >> PLEASE do read the posting guide http://www.R-project.org/posting- >> guide.html >> and provide commented, minimal, self-contained, reproducible code. > __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
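One likely explanation, offered as an assumption because the original file is not available: each data line has one more field than the header line, for example because of a trailing tab. read.table() then treats the first field as row names, and with row.names=NULL it keeps that field as a column called "row.names", so every header name is shifted by one. The toy input below reproduces the symptom and shows one way around it.

```r
# Header has 2 fields; each data line ends in a tab, i.e. has 3 fields.
txt <- "start\tstop\n203048\t67173930\t\n1022033\t1060472\t\n1022033\t1061172\t\n"

bad <- read.delim(text = txt, row.names = NULL)
names(bad)   # first column is named "row.names", the others are shifted

# Possible fix: strip the trailing tab from every line before parsing.
# For a real file the lines would come from readLines("testdata.txt").
lines <- strsplit(txt, "\n", fixed = TRUE)[[1]]
clean <- sub("\t$", "", lines)
good  <- read.delim(text = clean)
names(good)
```

count.fields("testdata.txt", sep = "\t") is a quick way to check whether the header and data lines of the real file disagree in this way.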
Re: [R] Lavaan Package - How to Extract Residuals in Data Values
Dear Emily, The lavaan package is typically used to fit models with latent variables, and these models are typically fit to the covariance matrix (and not necessarily to the raw data). Thus, it is usually not straightforward to get data residuals from the fitted models. In your case, it appears that all variables are observed, so you could use "meanstructure=TRUE" within the sem() command to get the intercept for your regression. Then I believe the residuals could be obtained manually. I also wonder whether your specified model is really what you want. I believe that, if you estimate error in all your variables and also specify some covariances between independent variables, the model will be unidentified. It appears that you are handling this by fixing b1 to be zero, but then you are effectively excluding LOG_SR_A_D from the model. I wonder whether you can get by with a simple regression model as estimated by lm(). Ed -- Ed Merkle, PhD Assistant Professor Department of Psychological Sciences University of Missouri Columbia, MO, USA 65211 On 7/9/12 1:25 PM, r-help-requ...@r-project.org wrote: Date: Mon, 9 Jul 2012 11:41:33 -0400 From: Emily Zimmerman To:r-help@r-project.org Subject: [R] Lavaan Package - How to Extract Residuals in Data Values Message-ID: Content-Type: text/plain Hello R Community, I am using the Lavaan package in R 2.15.0 to analyze data collected from 1200 lakes across North America. My dataset includes 3 continuous independent variables (LOG_NTL, LOG_PTL, and LOG_SR_A_D) and 1 continuous dependent variable (BIOVOL) . I have successfully constructed structural equation models using the Lavaan package (example included below with code), but I have not been able to figure out how to extract the residuals in the data values themselves (the unexplained values) of my dependent variable, BIOVOL. For the last step of my analysis, I would like to plot the residuals for BIOVOL against one of the independent variables to see the relationship. 
I understand how to get the residuals for the covariance matrix, but I do not know how to get the residuals in the data values themselves for BIOVOL. Does anyone know how to extract residuals for data values themselves in the Lavaan package? Here is the code I am using to construct my model and the model that I am trying to get the residuals for: #Specify the model >model2BIOVre <- 'BIOVOL ~ LOG_NTL + LOG_PTL + b1*LOG_SR_A_D + LOG_NTL ~~ LOG_PTL + LOG_NTL ~~ LOG_SR_A_D + b1 == 0' #Fit the model with the sem function >fit <- sem(model2BIOVre, data=lakes, fixed.x=FALSE, estimator="MLM") #Summarize model >summary(fit, fit.measures=TRUE, standardize=TRUE, rsq=TRUE) Here is where I am stumped...I have read the package manuals, and tutorials located at lavaan.urgent.be, as well as some by James Grace. I have also tried to manipulate some other codes, but I can't get it. I may have missed something as I am relatively new to R, but it is not clear to me how to do this. Any help would be very much appreciated. Thank you, Emily Zimmerman __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
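If a plain regression is enough, as suggested in the reply above, the casewise residuals come directly from lm(). The sketch below uses simulated stand-in data (lakes_sim is made up, with only two of the predictors named in the post), so it shows the mechanics rather than the actual lake analysis.

```r
set.seed(1)
n <- 50
lakes_sim <- data.frame(LOG_NTL = rnorm(n), LOG_PTL = rnorm(n))  # made up
lakes_sim$BIOVOL <- 1 + 0.5 * lakes_sim$LOG_NTL +
  0.3 * lakes_sim$LOG_PTL + rnorm(n, sd = 0.2)

fit_lm <- lm(BIOVOL ~ LOG_NTL + LOG_PTL, data = lakes_sim)
res    <- residuals(fit_lm)        # one residual per lake

# Residuals against one of the predictors, as asked for in the post.
plot(lakes_sim$LOG_NTL, res, xlab = "LOG_NTL", ylab = "BIOVOL residual")
abline(h = 0, lty = 2)
```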
Re: [R] In rpart, how is "improve" calculated? (in the "class" case)
Tal, For the Gini criterion, the "improve" value can be calculated as a weighted sum of the improvement in impurity. Continuing with your original code: # for "gini" impurity_root<- gini(prop.table(table(y))) impurity_l<- gini(prop.table(table(obs_0))) impurity_R<-gini(prop.table(table(obs_1))) # (13 and 7 are sample sizes in respective nodes) 13*(impurity_root - impurity_l) + 7*(impurity_root - impurity_R) [1] 5.384615 This does not appear to extend immediately to the information criterion, however. I'm not sure about the 6.84. Ed On 6/14/11 5:00 AM, r-help-requ...@r-project.org wrote: -- Message: 4 Date: Mon, 13 Jun 2011 15:47:26 +0300 From: Tal Galili To:r-help@r-project.org Subject: [R] In rpart, how is "improve" calculated? (in the "class" case) Message-ID: Content-Type: text/plain Hi all, I apologies in advance if I am missing something very simple here, but since I failed at resolving this myself, I'm sending this question to the list. I would appreciate any help in understanding how the rpart function is (exactly) computing the "improve" (which is given in fit$split), and how it differs when using the split='information' vs split='gini' parameters. According to the help in rpart.object: "improve, which is the improvement in deviance given by this split" From what I understand, that would mean that the "improve" value should not be different when using different "split" switches. Since it is different, then I suspect that it is reflecting the impurity measure somehow, but I can't seem to understand how exactly. Bellow is some simple R code showing the result for a simple classification tree, with what the function outputs, and what I would have expected to see if "improve" were to simply reflect the change in impurity. set.seed(1324) y<- sample(c(0,1), 20, T) x<- y x[1:5]<- 0 require(rpart) fit<- rpart(y~x, method = "class", parms=list(split='information')) fit$split[,3] # why is improve here 6.84 ? 
fit<- rpart(y~x, method = "class", parms=list(split='gini')) fit$split[,3] # why is improve here 5.38 ? # Here is what I thought it should have been: # for "information" entropy<- function(p) { if(any(p==1)) return(0) # works for the case when y has only 0 and 1 categories... -sum(p*log(p,2)) } gini<- function(p) {sum(p*(1-p))} obs_1<- y[x>.5] obs_0<- y[x<.5] n_l<- sum(x>.5) n_R<- sum(x<.5) n<- length(x) # for entropy (information) impurity_root<- entropy(prop.table(table(y))) impurity_l<- entropy(prop.table(table(obs_0))) impurity_R<-entropy(prop.table(table(obs_1))) # shouldn't this have been "improve" ?? impurity_root - ((n_l/n)*impurity_l + (n_R/n)*impurity_R) # 0.7272 # for "gini" impurity_root<- gini(prop.table(table(y))) impurity_l<- gini(prop.table(table(obs_0))) impurity_R<-gini(prop.table(table(obs_1))) impurity_root - ((n_l/n)*impurity_l + (n_R/n)*impurity_R) # 0.3757 Thanks upfront, Tal Contact Details:--- Contact me:tal.gal...@gmail.com | 972-52-7275845 Read me:www.talgalili.com (Hebrew) |www.biostatistics.co.il (Hebrew) | www.r-statistics.com (English) -- -- *** Note new email address *** Ed Merkle, PhD Assistant Professor Department of Psychological Sciences (starting August 2011) University of Missouri Columbia, MO, USA 65211 __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
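The weighted form given in the reply can be checked by hand without rpart. The data below are hypothetical: they are constructed to have the same node sizes (13 and 7) and to reproduce the 5.384615 reported above, and are not the draws from the original set.seed() call. Using natural logarithms for the entropy is also an assumption. With it, the same weighted form comes out near 6.84, which suggests, but does not prove, that the information criterion is computed the same way.

```r
gini    <- function(p) sum(p * (1 - p))
entropy <- function(p) {        # natural log: an assumption about rpart
  p <- p[p > 0]
  -sum(p * log(p))
}

# Hypothetical data: 20 cases, x equals y except that x[1:5] is set to 0.
y <- c(1, 1, 1, 0, 0, rep(1, 7), rep(0, 8))
x <- y
x[1:5] <- 0

props <- function(v) as.vector(prop.table(table(v)))

# improve = n * I(root) - n_left * I(left) - n_right * I(right)
improve <- function(impurity) {
  left  <- y[x < 0.5]           # 13 cases
  right <- y[x > 0.5]           # 7 cases
  length(y) * impurity(props(y)) -
    length(left) * impurity(props(left)) -
    length(right) * impurity(props(right))
}

imp_gini <- improve(gini)       # 5.384615
imp_info <- improve(entropy)    # about 6.84
print(c(gini = imp_gini, information = imp_info))
```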
[R] logical to vector?
I am trying to use the coXpress function from the coXpress package. This function requires numerical vectors indicating which columns are in which group. The problem is, I can only figure out how to get a logical structure, not a numerical one. In other words, coXpress wants something like:

1:3

I have something like:

TRUE TRUE TRUE FALSE FALSE

Can I convert one into the other easily?
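In base R, which() turns a logical vector into the integer positions of its TRUE elements. A small sketch with the values from the post (the names g1 and g2 are just for illustration):

```r
in_group1 <- c(TRUE, TRUE, TRUE, FALSE, FALSE)

g1 <- which(in_group1)    # positions that are TRUE:  1 2 3
g2 <- which(!in_group1)   # positions that are FALSE: 4 5
```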
[R] Rserve as a proxy
Is there a simple way to use Rserve/RSclient as a proxy to transparently send requests from a local instance of R to a remote instance? It seems like this would be doable by wrapping each call that doesn't refer to a local path inside RSeval. Is this harder than it seems? Does this already exist somewhere?
[R] subsetting by cell value with a list
I would like to subset my data frame by matching all rows that have any value from a list of values. I can get it to work if I have exactly one value, but I'm not sure how to do it with a list of values. This works and gives me exactly one line:

my.df[ which( my.df$IDX==17), ]

I would like to do something like this:

my.df[ which( my.df$IDX==c(17, 42)), ]

Obviously that won't work, but I hope the meaning is clear. What's the right way to express this?
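The usual base-R idiom for "equal to any of these values" is %in%. Comparing with == against c(17, 42) recycles the shorter vector along the column, so it tests row 1 against 17, row 2 against 42, and so on, which is rarely what is wanted. A sketch with a made-up data frame:

```r
my.df  <- data.frame(IDX = c(5, 17, 23, 42, 17), val = letters[1:5])
wanted <- c(17, 42)

# TRUE for every row whose IDX appears anywhere in `wanted`.
hits <- my.df[my.df$IDX %in% wanted, ]
hits
```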
[R] argument names inside a function?
Is there a way I can get the names of the arguments passed to a function from within a function?
Re: [R] argument names inside a function?
Thanks, deparse(substitute()) does exactly what I want.

On Sat, Mar 24, 2012 at 4:20 PM, R. Michael Weylandt wrote:
> Can you be a little more concrete?
>
> If you want the form of the expression given (rather than its value),
> deparse(substitute()) will work:
>
> fnc1 <- function(x){ deparse(substitute(x))}
>
> fnc1(3) # "3"
>
> fnc1(x) # "x"
>
> fnc1(x + 4) # "x + 4"
>
> If you are passing them through the ... argument, you can coerce that
> to a list and use the names() attribute.
>
> If you want to reconstruct the exact call (e.g., for a modelling
> function), match.call() will do it.
>
> Hope this helps,
> Michael
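The three approaches mentioned in the quoted reply can be put side by side. The function names below are made up for illustration.

```r
# 1. The expression supplied for one argument, as text.
arg_expr <- function(x) deparse(substitute(x))

# 2. The names used for arguments passed through `...`.
dot_names <- function(...) names(list(...))

# 3. The whole call, with argument names filled in.
show_call <- function(a, b = 2) match.call()

e1 <- arg_expr(height + 4)               # "height + 4" (never evaluated)
e2 <- dot_names(first = 1, second = 2)   # "first" "second"
e3 <- show_call(1, b = 3)                # show_call(a = 1, b = 3)
```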
[R] avoiding for loops
I have data that looks like this:

> df1
  group id
1   red  A
2   red  B
3   red  C
4  blue  D
5  blue  E
6  blue  F

I want a list of the groups containing vectors with the ids. I am avoiding subset(), as it is only recommended for interactive use. Here's what I have so far:

df1 <- data.frame(group=c("red", "red", "red", "blue", "blue", "blue"),
                  id=c("A", "B", "C", "D", "E", "F"))
groups <- levels(df1$group)
byid <- lapply(groups, "==", df1$group)
groupIDX <- lapply(byid, which)

> groupIDX
[[1]]
[1] 4 5 6

[[2]]
[1] 1 2 3

This gives me a list of the indices for each group. I want to subset df1 based on this list. If I want just one group I can do this:

> df1[groupIDX[[1]],]$id
[1] D E F

What sort of statement should I use if I want a result like:

[[1]]
[1] D E F
Levels: A B C D E F

[[2]]
[1] A B C
Levels: A B C D E F

So far, I've used a for loop. Can I express this with apply statements?

groupIDs <- vector("list", length(groupIDX))
for (i in seq_along(groupIDX)) {
  groupIDs[[i]] <- df1[groupIDX[[i]],]$id
}
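In base R, split() does this in one step, and lapply() over the index list gives the same result as the for loop. The sketch builds the columns as factors explicitly, because recent versions of R no longer convert strings to factors by default.

```r
df1 <- data.frame(
  group = factor(c("red", "red", "red", "blue", "blue", "blue")),
  id    = factor(c("A", "B", "C", "D", "E", "F"))
)

# One list element per level of `group`, each holding that group's ids.
groupIDs <- split(df1$id, df1$group)

# The same result from an index list, without a for loop.
groupIDX  <- lapply(levels(df1$group), function(g) which(df1$group == g))
groupIDs2 <- lapply(groupIDX, function(i) df1$id[i])
```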
[R] lapply and paste
I have a list of suffixes I want to turn into file names with extensions.

suff <- c("C1", "C2", "C3")
> paste("filename_", suff[[1]], ".ext", sep="")
[1] "filename_C1.ext"

How do I use lapply() on that call to paste()? What's the right way to do this:

filenames <- lapply(suff, paste, ...)

? Can I have lapply() reorder the arguments to FUN?
Re: [R] lapply and paste
Thank you, I was confused about that. What exactly is lapply for then, if R handles this kind of thing automatically? Are there functions that are not "vectorized"?

On Wed, Mar 28, 2012 at 1:37 PM, R. Michael Weylandt wrote:
> I think you're confused about the need for lapply -- paste is
> vectorized so this
>
> paste("filename_", suff, ".ext", sep = "")
>
> will work. But if you want to use lapply (for whatever reason) try this:
>
> lapply(suff, function(x) paste("filename_", x, ".ext", sep = ""))
>
> Michael
>
> On Wed, Mar 28, 2012 at 2:31 PM, Ed Siefker wrote:
>> I have a list of suffixes I want to turn into file names with extensions.
>>
>> suff<- c("C1", "C2", "C3")
>> paste("filename_", suff[[1]], ".ext", sep="")
>> [1] "filename_C1.ext"
>>
>> How do I use lapply() on that call to paste()?
>> What's the right way to do this:
>>
>> filenames <- lapply(suff, paste, ...)
>>
>> ?
>>
>> Can I have lapply() reorder the arguments to FUN?
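To illustrate the follow-up question: many functions are not vectorized, typically because their body makes a decision about a single value with if(). For those, lapply(), sapply() or vapply() apply the function element by element. A small sketch (describe is a made-up function):

```r
suff <- c("C1", "C2", "C3")

# paste() is vectorized: one call handles the whole vector.
filenames <- paste("filename_", suff, ".ext", sep = "")

# if() inspects a single value, so this function works on one element.
describe <- function(s) if (substr(s, 2, 2) == "1") "first" else "other"

# vapply() calls it once per element and checks each result's type.
labels <- vapply(suff, describe, character(1))
labels
```

lapply() is also the natural tool when each result is not a single value, for example reading one data frame per file name.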
[R] Sys.setlocale() and text()
Dear HelpeRs, I have a question about the Sys.setlocale() command and plotting. I am running Windows XP, with R 2.6.1. My default locale is English_United States.1252. I am trying to add a lowercase sigma to a plot using the following code: Sys.setlocale("LC_CTYPE","greek") plot(1:10,1:10) text(4,3,"\xF3") For R 2.6.1, this code gives me the glyph from my default (1252) instead of from the 1253 codes. For an older version of R (2.3.0) on the same computer, this code gives me the lowercase sigma that I wanted. I have been unable to pinpoint what has changed. Thanks for the help, and I apologize if I am missing something obvious. -- Ed Merkle, PhD Assistant Professor Dept. of Psychology Wichita State University Wichita, KS 67260
Re: [R] Sys.setlocale() and text()
Thanks very much for the response. I think I left out an important detail, however. I want my lowercase sigma to be displayed in a specific font from the Rdevga file (my project involves fonts). So far as I know, quote() does not allow me to select a font. Thus, I am specifically interested in the text() command and reasons why my example code performs differently in R 2.3.0 vs 2.6.1. Thanks, Ed Gabor Grothendieck wrote: > Try this: > > plot(1:10, main = quote(sigma ^ 2)) > > > On Dec 11, 2007 10:09 PM, Ed Merkle <[EMAIL PROTECTED]> wrote: >> Dear HelpeRs, >> >> I have a question about the Sys.setlocale() command and plotting. I am >> running Windows XP, with R 2.6.1. My default locale is English_United >> States.1252. >> >> I am trying to add a lowercase sigma to a plot using the following code: >> >> Sys.setlocale("LC_CTYPE","greek") >> plot(1:10,1:10) >> text(4,3,"\xF3") >> >> >> For R 2.6.1, this code gives me the glyph from my default (1252) instead >> of from the 1253 codes. For an older version of R (2.3.0) on the same >> computer, this code gives me the lowercase sigma that I wanted. I have >> been unable to pinpoint what has changed. Thanks for the help, and I >> apologize if I am missing something obvious. >> >> >> -- >> Ed Merkle, PhD >> Assistant Professor >> Dept. of Psychology >> Wichita State University >> Wichita, KS 67260
Re: [R] text vector clustering
Srinivas, I don't know of a clustering algorithm, but you might check out agrep() from the base package and stringMatch() from the MiscPsycho package. These can help to identify similar text sequences, and it may be possible to group similar names by applying these commands repeatedly. Ed -- Ed Merkle, PhD Assistant Professor Dept. of Psychology Wichita State University Wichita, KS 67260 Date: Thu, 22 Jan 2009 16:33:03 +0530 From: srinivasa raghavan Subject: [R] text vector clustering To: r-help@r-project.org Content-Type: text/plain Hi, I am a new user of R, using R 2.8.1 on Windows 2003. I have a csv file with a single column that contains 30,000 student names. There were typos when these names were entered. The actual list of names has fewer than 1000 entries; however, we don't have that list for keyword search. I am interested in grouping/clustering the names that are similar letter for letter. Is there any text clustering algorithm in R that can group similar names into segments that match exactly, 90%, 80%, etc.? Thanks in advance, regards, srinivas statistical analyst.
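As a rough illustration of the agrep() suggestion above, the sketch below first looks up near matches of one name, then groups all names at once. The grouping step uses adist() with hclust(), which is a swapped-in technique and not something proposed in the thread; adist() is also only available in newer R versions than the 2.8.1 mentioned in the question. The six names and the cut height of 4 edits are made up for the example.

```r
# Made-up sample of names containing typos (assumption, not real data).
names_raw <- c("John Smith", "Jon Smith", "John Smyth",
               "Mary Jones", "Mary Jonse", "Marie Jones")

# agrep(): approximate matches of a single pattern. max.distance = 0.2
# allows edits of up to about 20% of the pattern length.
near_john <- agrep("John Smith", names_raw, max.distance = 0.2, value = TRUE)
print(near_john)

# Pairwise edit (Levenshtein) distances between all names.
d <- adist(names_raw)
rownames(d) <- names_raw

# Hierarchical clustering on the distances, then cut the tree so that
# groups whose average distance is above 4 edits stay separate.
hc <- hclust(as.dist(d), method = "average")
groups <- cutree(hc, h = 4)

# One list element per group of similar names.
print(split(names_raw, groups))
```

With 30,000 names the full distance matrix becomes large (roughly 30,000 x 30,000 entries), so in practice it may be worth collapsing exact duplicates with unique() first and tuning the cut height on a sample.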
Re: [R] Couple of Questions about Classification trees
The issue with the sample size is that there are so many measurements in comparison to the number of meats. Aside from that, you should check out the rpart package. Its commands are similar to the tree package, but there are more options for the plots. I don't know immediately how to display misclassification rates, but the text.rpart command can display numbers of incorrectly- and correctly-classified observations in each node. Ed -- Ed Merkle, PhD Assistant Professor Dept. of Psychology Wichita State University Wichita, KS, USA 67260 Date: Wed, 11 Mar 2009 13:53:46 -0700 (PDT) From: Jen_mp3 Subject: Re: [R] Couple of Questions about Classification trees To: r-help@r-project.org Message-ID: <22464302.p...@talk.nabble.com> Content-Type: text/plain; charset=us-ascii Okay, perhaps I should've been more clear about the data. I'm actually working on spectroscopic measurements from food authenticity testing. I have five different types of meat: 55 of chicken, 55 of turkey, 55 of pork, 34 of beef and 32 of lamb - 231 in total. On each of these 231 meats, 1024 spectroscopic measurements were taken. Matrix of 231 by 1024. But the questions I want answered are which of the 1024 measurements are important for predicting meat type and which of the different types of meat are incorrectly classified - i.e., can we tell the difference between chicken and turkey. So to carry out a multivariate analysis on the data I've split it into two. A training data set and a test data set - half and half, although I think the larger half (55 goes into 27 and 28) went into the test data set, which explains the inequalities in the row numbers. By the way, 1024 is standard - can't change that. Can't change the 231 either. So I created a new row with the meat types for each row. 
I end up with the following R code: library(tree) meat.tree <- tree(meat.type~., data=train) Using tree.cv (or cv.tree), the lowest misclassification rate is at 5, so I cut the number of nodes down to 5 using prune.tree: prunedtree <- prune.tree(meat.tree, best = 5, method = "misclass") Then I want to use predict.tree and the test data set. predicttree <- predict.tree(prunedtree, data = test) I already said what it produces. Again, how would I display the misclassification rate at each node on the diagram? I know about misclass.tree(prunedtree, detail = TRUE) but that doesn't actually display them on the classification tree - it just gives a bunch of numbers of the worksheet and it just wouldn't look very neat if I had to add them later. -- View this message in context: http://www.nabble.com/Couple-of-Questions-about-Classification-trees-tp22461673p22464302.html Sent from the R help mailing list archive at Nabble.com.
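The rpart suggestion above might look something like the sketch below. It is only an illustration under stated assumptions: the built-in iris data stands in for the 231 x 1024 spectroscopy matrix, `Species` stands in for `meat.type`, and the half-and-half split is random. One detail worth checking against the quoted code: the predict methods for both rpart and tree take the held-out data through an argument named `newdata`, so a `data = test` argument is, as far as I know, not used for prediction.

```r
library(rpart)

# Stand-in data: iris replaces the meat spectra (assumption for illustration).
set.seed(1)
idx   <- sample(nrow(iris), nrow(iris) / 2)
train <- iris[idx, ]
test  <- iris[-idx, ]

# Classification tree fitted on the training half.
fit <- rpart(Species ~ ., data = train, method = "class")

# Predict on the held-out half; note the argument name `newdata`.
pred <- predict(fit, newdata = test, type = "class")

# Which classes get confused with which, and the overall error rate.
confusion     <- table(observed = test$Species, predicted = pred)
misclass_rate <- mean(pred != test$Species)
print(confusion)
print(misclass_rate)

# Draw the tree; use.n = TRUE adds the per-class counts to each node label.
plot(fit, margin = 0.1)
text(fit, use.n = TRUE)
```

The confusion table answers the "can we tell chicken from turkey" question directly, while the per-node counts on the plot give the raw numbers from which a per-node misclassification rate can be read off.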
[R] RWeb Server
Hello, I am looking for tutorials on setting up R on a Windows 2008 server for the purpose of making calls from web pages (e.g. SharePoint) or report engines (e.g. SSRS) to embed inline dynamically rendered R content. I found the link to RWeb (http://www.math.montana.edu/Rweb/) which I was hoping would get me started in the right direction, but apparently the links are broken and the contact is no longer available. Other than that I have not been able to find related support. Is there any help you can offer to get me going? Thank you! Ed Ed Wiebe, Manager Enterprise Architecture, Enterprise Information Services California Department of Corrections and Rehabilitation 1900 Birkmont Drive Rancho Cordova, CA 95742 1-916-358-1866 Desk 1-916-358-2019 Fax ed.wi...@cdcr.ca.gov "The key to successfully doing something is in successfully understanding what you're doing." - Thomas Erl
Re: [R] Best 64-bit Linux distro for R?
On Sun, Feb 8, 2009 at 3:54 PM, Dirk Eddelbuettel wrote: > To differentiate the then-different > chips of AMD from Intel's Itanium ia64 line, the 'amd64' name was > introduced. These days ia64 is ancient history and "we're all amd64 users". > > By the way, if you decide to go with Ubuntu or Debian, the r-sig-debian list > is there to help. > > Hth, Dirk You will also see the "amd64" architecture referred to by the name "x86_64". It is yet another name for the same architecture. As far as the choice of distro is concerned, I've had total success with R on all the major distros on my 64-bit machine. But I would definitely give the nod to a Debian-based distro like Ubuntu because of the large existing base of R packages in the Debian / Ubuntu repositories. -- M. Edward (Ed) Borasky I've never met a happy clam. In fact, most of them were pretty steamed.
Re: [R] installing R on Ubuntu
On Mon, Feb 9, 2009 at 4:51 AM, Neil Shephard wrote: > > The perceived "difficulty" of installing R under whatever flavour of > GNU/Linux in this thread stems from being unfamiliar with the process of the > package management of the flavour of GNU/Linux you use (and in part by the > various distros not having the most recent version of R in their > repositories). > > People who say "why can't it be as easy as downloading a self-installing > binary and running that" are trying to fit a round peg (their experience and > understanding of how applications install in M$-windows) in a square hole > (or triangular, hexagonal, or whatever depending on the distribution of > GNU/Linux). This is true. However, for the most common Linux distros -- Debian, Red Hat Enterprise / CentOS / Scientific Linux / Fedora, openSUSE and Ubuntu -- you can install the most recent R compiled for your distro from http:///bin/linux/ In addition, most of the distros have third-party repositories where you can find the latest version of R. In short, if you have an x86 or x86_64/amd64 system running almost any Linux, you can find a pre-compiled R. R is a popular package, and it's pretty easy to find even for Power PC or some of the obscure architectures. > > There are pros and cons to each of the GNU/Linux flavours and it's really a > matter of deciding which you like/have invested time in learning. > > Irrespective, it's still simple to install R from source under GNU/Linux... > > 1) Download source tar-ball > 2) Extract and cd to the directory > 3) ./configure --prefix=/where/you/want/R/to/go (optionally setting the > install path at this stage) > 4) make > 5) make install > > ...all documented in the FAQ at > http://cran.r-project.org/doc/FAQ/R-FAQ.html#How-can-R-be-installed-_0028Unix_0029 Many Linux distros do *not* install the development tools by default, and which ones live in which packages varies by distro. Fedora in particular is extremely stripped when you install from the LiveCD. 
You have to install gcc, make and a couple of other things just to install VMware Tools, for example, when running Fedora as a VMware guest. For building R from source and installing R packages, you'll also need to install gfortran. And many libraries with external dependencies, like Rgraphviz, will require not only the package itself (graphviz) but also the C headers, which may have the name "graphviz-devel" on some distros and some other name on other distros. > > This might not be as clean as using the native package management, but does > mean that you'll have the latest version installed. > > Neil > > (Addendum - I've tried several different distros, starting with RedHat 7.3, > then various versions of Slackware 8 through to 9 before settling on Gentoo, > all were easy to install R in). I just recently switched from Gentoo to openSUSE. Gentoo usually had the latest R source in their repository within a day or so of it coming out of the R Project release cycle. To get it, all you needed to do was put the package name in the "/etc/portage/package.keywords" file. And Gentoo, since it is almost all compiled from source, by nature *does* have all the development tools installed and installs all the headers when it installs packages. -- M. Edward (Ed) Borasky I've never met a happy clam. In fact, most of them were pretty steamed.
[R] Scale
I would like to get horizontal numbers on both axes, X and Y. I got horizontal numbers only on the Y axis when adding las=2. How can I obtain a horizontal orientation for the numbers on the X axis scale as well (now they are vertical)? Here is my code: plot(survfit(Y~addicts$clinic), fun="cloglog", las=2)
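A note on the question above, offered as a sketch and not a tested answer for the poster's data: in base graphics `las = 2` requests labels perpendicular to each axis, which is why only the Y axis numbers come out horizontal, whereas `las = 1` requests horizontal labels on both axes. The example uses made-up x and y values; the commented-out line shows where the same argument would presumably go in the original survfit call, since `las = 2` evidently reached the plot there.

```r
# Made-up data, only to show the effect of the las setting.
x <- 1:10
y <- x^2

label_style <- 1  # 0: parallel to axis, 1: always horizontal,
                  # 2: perpendicular to axis, 3: always vertical

plot(x, y, las = label_style)

# Presumed equivalent for the call in the post (not run here):
# plot(survfit(Y ~ addicts$clinic), fun = "cloglog", las = 1)
```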
[R] rounding
Hi, Round(0.55,1)=0.5 Round(2.55,1)=2.6 Can this be right? Thanks, Ed
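Assuming the post refers to R's round() (lower-case), one way to look into this is to print the values as they are actually stored. Neither 0.55 nor 2.55 has an exact binary floating-point representation, so each is held as a nearby number that is slightly above or below the decimal typed in, and the direction in which a "half" rounds depends on that and on the rounding algorithm, which may differ between R versions. A minimal sketch:

```r
x <- c(0.55, 2.55)

# Show the stored doubles to 20 decimal places instead of the short
# default printout.
stored <- sprintf("%.20f", x)
print(stored)

# What round() returns in the R version that is running this.
print(round(x, 1))

# If decimal "half up" behaviour is required regardless of representation,
# one common workaround is to add a tiny tolerance before rounding
# (the tolerance 1e-9 is an arbitrary choice for illustration).
half_up <- floor(x * 10 + 0.5 + 1e-9) / 10
print(half_up)
```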