[R] Help request: Parsing docx files for key words and appending to a spreadsheet

2023-12-29 Thread Andy
ted to UTF-8 plain text, would that make the task easier? I am not a confident coder, and am really only just getting my head around R so appreciate a steep learning curve ahead, but of course, I don't know what I don't know, so any pointers in the right d

Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2023-12-29 Thread Andy
n filetype %in% c("docx") && grepl("^([fh]ttp)", file) :'length = 38' in coercion to 'logical(1)' ## And so I am going around in circles and not at all clear on how I can make progress. I am sure that there must be a way, but the sugg
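
A coercion error of this kind usually means a function written for a single file received a whole vector of paths (38 of them here, going by the message). A minimal sketch of calling a reader once per file, assuming a hypothetical single-file reader `read_one()` and made-up file names; neither comes from the thread:

```r
# Hypothetical single-file reader: a stand-in for the real docx reader.
# Like many readers, it assumes `file` is ONE path, not a vector.
read_one <- function(file) {
  stopifnot(is.character(file), length(file) == 1)
  data.frame(file = basename(file), n_chars = nchar(file))
}

# Hypothetical file names, for illustration only.
files <- c("article_01.docx", "article_02.docx", "article_03.docx")

# Call the reader once per file, then stack the per-file results.
results <- do.call(rbind, lapply(files, read_one))
results
```

In practice `files` would more likely come from something like `list.files(pattern = "\\.docx$", full.names = TRUE)`.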

Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2023-12-29 Thread Andy
en manipulate it. To be more specific, we might need an example of the DF [...] On Fri, Dec 29, 2023 at 10:14 AM Andy wrote: [...] I'd like to be able to accomplish the following: (1) Append the title, the month, the author, the number of words, and page number(s) to a spreadsheet

Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2023-12-30 Thread Andy
Thanks Ivan and Calum I continue to appreciate your support. Calum, I entered the code snippet you provided, and it returns 'file missing'. Looking at this, while the object 'full_filename' exists, what is happening is that the path from getwd() is being appended to the title of the article, b
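
The symptom described here (folder and title glued together with nothing in between) is what `paste0()` produces; `file.path()` adds the separator. A small sketch, using a temporary folder and a hypothetical article title rather than the poster's real files:

```r
# Hypothetical article title; the real names come from the poster's folder.
article <- "Some article title.docx"
folder  <- tempdir()   # stand-in for getwd()

# paste0() glues the strings together with no separator ...
bad_path <- paste0(folder, article)
# ... whereas file.path() inserts a path separator between the parts.
full_filename <- file.path(folder, article)

file.create(full_filename)
file.exists(bad_path)        # the path without a separator points nowhere
file.exists(full_filename)   # the properly built path is found
```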

Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2023-12-30 Thread Andy
ion and page number Length Byline Subject (only if the threshold of coverage for a specific subject is >=50% is reached (e.g. Greenwashing (51%)) - if not, enter 'nil' and move onto the next article in the folder This is the ambition. I am clearly a long way short of that though. Man
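
The Subject rule described above (keep a subject only when its coverage is at least 50%, otherwise enter 'nil') can be sketched with base string functions. The layout of the subject line and the helper name `extract_subjects()` are assumptions made for illustration; the real files may be laid out differently:

```r
# Hypothetical "Subject" line; the real layout may differ.
subject_line <- "Subject: GREENWASHING (51%); CLIMATE CHANGE (45%); RETAIL (90%)"

# Hypothetical helper: keep subjects whose coverage meets the threshold.
extract_subjects <- function(line, threshold = 50) {
  # Drop the leading "Subject:" label, then split on semicolons.
  body  <- sub("^Subject:[[:space:]]*", "", line)
  items <- trimws(strsplit(body, ";", fixed = TRUE)[[1]])
  # Pull the percentage out of each "NAME (NN%)" item; NA if absent.
  pct   <- suppressWarnings(as.numeric(sub("^.*\\(([0-9]+)%\\)$", "\\1", items)))
  keep  <- items[!is.na(pct) & pct >= threshold]
  if (length(keep) == 0) "nil" else paste(keep, collapse = "; ")
}

extract_subjects(subject_line)
extract_subjects("Subject: PACKAGING (10%)")
```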

Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2023-12-30 Thread Andy
Hi Eric Thanks for that. That seems to fix one problem (the lack of a separator), but introduces a new one when I complete the function Calum proposed:Error in docx_summary() : argument "x" is missing, with no default The whole code so far looks like this: # Load libraries library(tcltk) libr

Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2023-12-30 Thread Andy
t line   # which summarises it.   # the result is saved in a data frame object   # called content which we shall show some   # heading into from   head(content) } Results in this error now:Error in x$doc_obj : $ operator is invalid for atomic vectors Thank you. On 30/12/2023 12:12, Andy
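
The "$ operator is invalid for atomic vectors" message can be reproduced in base R whenever `$` is applied to a plain character vector, which suggests the function was handed a file name where it expected a document object. If `docx_summary()` here is the one from the officer package, the assumption would be that it wants the object returned by that package's own reader, not the path. A base-R sketch of the error itself, with a hypothetical file name:

```r
path <- "article.docx"   # a plain character vector (hypothetical file name)

# `$` works on lists and similar recursive objects, not on atomic vectors,
# so applying it to a character string reproduces the message.
msg <- tryCatch(path$doc_obj, error = function(e) conditionMessage(e))
msg

# A list, by contrast, supports `$`:
doc <- list(doc_obj = "parsed content would live here")
doc$doc_obj
```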

Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2024-01-04 Thread Andy
ction and append part. If I can get it to work for one of these fields, I suspect that I can repeat the basic syntax to extract and append the remaining fields. Therefore, if someone can either suggest a syntax or point me to a useful tutorial, that would be splendid. Thank you in anticipation.

Re: [R] Help request: Parsing docx files for key words and appending to a spreadsheet

2024-01-06 Thread Andy
f the major sticking points I kept bumping up against. Thank you so much for this. All the best Andy On 05/01/2024 13:59, Howard, Tim G (DEC) wrote: Here's a simplified version of how I would do it, using `textreadr` but otherwise base functions. I haven't done it all, but have a

[R] Multivariate P-GARCH Model

2012-06-27 Thread andy
Hi, I am trying to estimate a multivariate P-GARCH model for two factors x&y. I have selected p-garch to study the leverage effects. Is there any toolkit in R that can help me do this? Thanks, Andy -- View this message in context: http://r.789695.n4.nabble.com/Multivariate-P-GARCH-M

Re: [R] DCC-GARCH model

2012-06-28 Thread andy
Hello Marcin, did you get the answer to your questions. I have the same questions and would appreciate your help if you found the answers. Thanks, Ankur -- View this message in context: http://r.789695.n4.nabble.com/DCC-GARCH-model-tp3524387p4634776.html Sent from the R help mailing list archi

[R] Importing an Excel spreadsheet

2008-03-20 Thread andy
ems to be the same as any other row. I really don't want to have to manually re-enter the data (some 98 rows x 26 columns). Can someone advise me on what I am overlooking here please. Thanks Andy -- "If they can get you asking the wrong questions, they don't have to worr

Re: [R] Importing an Excel spreadsheet

2008-03-20 Thread andy
Bryan K Woods wrote: > If you open the spreadsheet in Excel you can then do "Save as..." and > select type CSV (comma-delimited text). Once you have the data in CSV > format, you can use the R function read.csv to import the data. > > Cheers, > Bryan > > andy wr

Re: [R] Importing an Excel spreadsheet [SOLVED]

2008-03-21 Thread andy
sheet. Plus I am using GNU/Linux, not Windows so some approaches won't work. I think this is now sorted. Many thanks Andy -- "If they can get you asking the wrong questions, they don't have to worry about the answers." - Thomas Pynchon, "Gravity's Rainbow"

[R] Random Sample - data frame

2009-01-27 Thread Andy
I don't seem to be able to get the results that I want. # Example: name <- c("andy", "kevin", "lindsay", "karen") age <- c(29, 37, 26, 31) gender <- c("M", "M", "F", "F") people <- data.frame(name, age

[R] word stemming for corpus linguistics

2016-07-26 Thread Andy Wolfe
sis using the tm package as part of the whole text mining process? I appreciate any help. Thanks. Andy [[alternative HTML version deleted]] __ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mail

Re: [R] word stemming for corpus linguistics

2016-07-26 Thread Andy Wolfe
on on that process, and whether that is applied before or after the text is transformed into a DTM because searching on-line hasn't (yet) thrown anything back. Thanks. Andy On 26/07/16 08:50, Paul Johnston wrote: Suggest look at http://www.inside-r.org/packages/cran/tm/docs/stemDocumen

Re: [R] word stemming for corpus linguistics

2016-07-26 Thread Andy Wolfe
ne until I come across a better (read, more elegant) solution. Best Andy On 26/07/16 14:05, Paul Johnston wrote: Hi I use the tm_map() with stemDocument used as an argument Looking at a particular file before stemming writeLines(as.character(data_mined_volatile[[1]])) ## The European

[R] Most appropriate function for the following optimisation issue?

2015-10-20 Thread Andy Yuan
Hello Please could you help me to select the most appropriate/fastest function to use for the following constrained optimisation issue? Objective function: Min: Sum( (X[i] - S[i] )^2) Subject to constraint : Sum (B[i] x X[i]) =0 where i=1..n and S[i] and B[i] are real numbers Need to
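
For the problem exactly as far as it is visible here (one linear equality constraint, no bounds), a Lagrange multiplier argument gives a closed form, X = S - B * sum(B*S) / sum(B^2), so no optimiser may be needed. The message is cut off, so any further constraints would change this. A sketch with made-up `S` and `B`:

```r
set.seed(1)
n <- 5
S <- rnorm(n)   # made-up targets
B <- rnorm(n)   # made-up constraint coefficients (not all zero)

# Orthogonal projection of S onto the hyperplane sum(B * X) == 0
X <- S - B * sum(B * S) / sum(B^2)

sum(B * X)       # zero up to floating-point error
sum((X - S)^2)   # the minimised objective
```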

[R] HELP - as.numeric changing column data

2016-01-06 Thread Andy Schneider
Hi - I'm trying to plot some data and having a lot of trouble! I have a simple dataset consisting of two columns - income_per_capita and mass_beauty_value. When I read the data in and plot it, I get the attached plot Mass Beauty Non-Numeric:

Re: [R] Random Forest classification

2016-04-18 Thread Liaw, Andy
This is explained in the "Details" section of the help page for partialPlot. Best Andy > -Original Message- > From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Jesús Para > Fernández > Sent: Tuesday, April 12, 2016 1:17 AM > To: r-help@r-project.

Re: [R] randomForest outlier

2008-07-16 Thread Liaw, Andy
asure. Please see the "value" section of ?outlier to see how this measure is computed. Andy From: Birgitle > > Still the same question: > > > Birgitle wrote: > > > > I try to use ?randomForest to find variables that are the > most important > > to d

Re: [R] confusion matrix in randomForest

2008-07-21 Thread Liaw, Andy
randomForest predictions are based on votes of individual trees, thus have little to do with error rates of individual trees. Andy > -Original Message- > From: [EMAIL PROTECTED] > [mailto:[EMAIL PROTECTED] On Behalf Of Miklos Kiss > Sent: Saturday, July 19, 2008 10:47 PM >

Re: [R] equivalent R functions for Numerical Recipes fitxy and fitexy ?

2008-07-31 Thread Liaw, Andy
Not a direct answer to your questions, but for error-in-variables problems, there are newer technologies than what is in NR. For example: install.packages("simex") library(simex) example(simex) Andy From: Marc Fischer > > Dear Folks, > > We need to fit the model y

Re: [R] Department of Redundancy Department.

2008-08-15 Thread Liaw, Andy
& (e1 == e2) } > > X <- structure(NA, class="MaybeNA") > > is.logical(X) > [1] TRUE > > (X == TRUE) > [1] FALSE > > Ta da ;) But anything compared to NA should result in NA! Andy > Henrik > > PS. It might be worth mentioning base::isTRUE() w

Re: [R] Test of Homogeneity of Variances

2008-08-22 Thread Liaw, Andy
't have the basics, you're just going to get more and more puzzled every step of the way. Just a frank suggestion. Best, Andy From: Daren Tan > > I am testing whether the sample variances are equal. When > p-value < 0.05 (alpha), should accept null hypothesis (sample >

[R] Polychoric and tetrachoric correlation

2008-09-01 Thread Andy Fugard
true, how can one estimate 95% confidence intervals for the correlations? My guess would be mat = hetcor(dataframe) mat$correlation - (1.96 * mat$std.errors) mat$correlation + (1.96 * mat$std.errors) Thanks, Andy -- Andy Fugard, Postgraduate Research Student Psychology (Room S6), The

Re: [R] Polychoric and tetrachoric correlation

2008-09-03 Thread Andy Fugard
Dear John, Yes, that's great - thanks! Andy John Fox wrote: Dear Andy, Yes, the tetrachoric correlation is a special case of the polychoric correlation when both factors are dichotomous. The 95-percent confidence interval that you suggest might be adequate if the sample si

Re: [R] Derivative of nonparametric curve

2009-09-09 Thread Liaw, Andy
e > > ?smooth.spline > and > ?predict.smooth.spline Since sm.regression() (from the sm package, I presume) uses kernel methods, a kernel-based estimator of derivatives is available in the KernSmooth package. Andy > cheers

[R] Barplot+Table

2009-09-11 Thread Andy Choens
I am trying to automate a report that my company does every couple of years for the state of Maine. In the past we have used SPSS to run the data and then used a complicated Excel template to make the tables/graphics which we then imported into Word. Since there are 256 tables/graphics for this re

Re: [R] Barplot+Table

2009-09-11 Thread Andy Choens
On Friday 11 September 2009 02:47:32 pm Henrique Dallazuanna wrote: > Try the textplot function in the gplots package: Thank you. That definitely gives me a direction to pursue. It doesn't look like there is an easy way to make things line up though, which is unfortunate but I'm sure it's possib

Re: [R] Random Forest

2010-02-16 Thread Liaw, Andy
> something wrong? Try to follow the posting guide (link in the footer of the message) and you may just get the help you're looking for. Please help us to help you! Andy > Thanks > > -- > View this message in context: > http://n4.nabble.com/Random-Forest-tp1557464p1557464

Re: [R] Alternatives to linear regression with multiple variables

2010-02-22 Thread Liaw, Andy
lf(X1, X2, X3), data=mydata) R> plot(fit) Andy > -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On Behalf Of Guy Green > Sent: Monday, February 22, 2010 7:47 AM > To: r-help@r-project.org > Subject: [R] Alternatives to

Re: [R] Random Forest prediction questions

2010-03-01 Thread Liaw, Andy
mean? Not sure why you're debugging that portion of the code. That is just to dimension the array passed back from C into a matrix. What is "t1"? > 2. how can i drop a tree from the forest? Look at the $forest component of the randomForest object, and subset the dimension tha

Re: [R] Random Forest

2010-03-01 Thread Liaw, Andy
mple is clear: R> iris.rf = randomForest(Species~., iris, ntree=148) R> iris.p = predict(iris.rf, iris, predict.all=TRUE) R> str(iris.p$individual) chr [1:150, 1:148] "setosa" "setosa" "setosa" "setosa" "setosa" ... - attr(*, "dimnames"

Re: [R] Thougt I understood factors but??

2010-03-01 Thread Liaw, Andy
: R> a [1] 3 3 3 2 2 2 1 1 1 Levels: 3 2 1 R> as.numeric(a) [1] 1 1 1 2 2 2 3 3 3 R> as.numeric(as.character(a)) [1] 3 3 3 2 2 2 1 1 1 Andy > > Levels: 3 2 1 > >> a<-gl(3,3,9) > >> factor(a,levels=3:1) > > That is the right way IMO to safely change the order

Re: [R] ANOVA "Types" and Regression models: the same?

2010-03-02 Thread Liaw, Andy
igh time to retire the archaic concept of the different types of sums of squares. IMHO they are the biggest red herrings in Statistics. Best, Andy From: Ravi Kulkarni > > Hello, > I think I am beginning to understand what is involved in > the so-called > "Type-I, II, ..."

Re: [R] Gradient Boosting Trees with correlated predictors in gbm

2010-03-02 Thread Liaw, Andy
In most implementations of boosting, and for that matter, single tree, the first variable wins when there are ties. In randomForest the variables are sampled, and thus not tested in the same order from one node to the next, thus the variables are more likely to "share the glory".

Re: [R] scientific (statistical) foundation for Y-RANDOMIZATION in regression analysis

2010-03-08 Thread Liaw, Andy
ampled data, including feature selections, etc. Andy From: Damjan Krstajic > > Dear all, > > I am a statistician doing research in QSAR, building > regression models where the dependent variable is a numerical > expression of some chemical activity and input variables are >

Re: [R] How can I understand this sentence, and express it by means of Mathematical approach?

2010-03-08 Thread Liaw, Andy
If your ultimate interest is in real scientific progress, I'd suggest that you ignore that sentence (and any conclusion drawn subsequent to it). Cheers, Andy From: bbslover > > This topic refer to independent variables reduction, as we > know ,a lot of > method can do wit

Re: [R] Is there an equivalence of lm's "anova" for an rpart object ?

2010-03-08 Thread Liaw, Andy
One way to do it (no p-values) is explained in the original CART book. You basically add up all the "improvement" (in fit$split[, "improve"]) due to each splitting variable. Andy From: Tal Galili > > Simple example: > > # Classification Tree with rpart >

Re: [R] Random Forest

2010-03-10 Thread Liaw, Andy
Thanks for providing the code that allows me to reproduce the problem. It looks like the prediction routine for some reason returns "0" as prediction for some trees, thus causing the problem observed. I'll look into it. Andy From: Dror > > Hi, > Thank you for y

Re: [R] Robust estimation of variance components for a nested design

2010-03-11 Thread Liaw, Andy
I believe Pinheiro et al published a paper in JCGS a few years back on the subject, modeling the random effects with t distributions. No software was publicly available, as far as I know. Andy From: S Ellison > Sent: Thursday, March 11, 2010 9:56 AM > To: r-help@r-project.org > Su

Re: [R] Regarding variable importance in the randomForest package

2010-03-16 Thread Liaw, Andy
Seems like you're new to R as well? The first argument should contain only the predictor variables, but you used the entire data frame that contains the response. Andy > -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org

Re: [R] Equivalent to Matlab's "Ans"

2009-06-30 Thread Liaw, Andy
Something like this? R> mean(rnorm(100)) [1] -0.0095774 R> .Last.value [1] -0.0095774 Andy > -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On Behalf Of Stephane > Sent: Tuesday, June 30, 2009 2:07 PM > To:

Re: [R] Testing memory limits in R??

2009-07-06 Thread Andy Zhu
check Memory in R: ?Memory --- On Mon, 7/6/09, Scott Zentz wrote: From: Scott Zentz Subject: [R] Testing memory limits in R?? To: r-help@r-project.org Date: Monday, July 6, 2009, 3:52 PM Hello Everyone,    We have recently purchased a server which has 64GB of memory running a 64bit OS and

Re: [R] #INCLUDE

2009-07-08 Thread Andy Zhu
source(_external_file_name) --- On Wed, 7/8/09, Idgarad wrote: From: Idgarad Subject: [R] #INCLUDE To: r-help@r-project.org Date: Wednesday, July 8, 2009, 11:16 AM What is R's equivalent to a C-like #include to incorporate external files. I have a 2k line function that is generated and need to
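
A minimal sketch of the `source()` suggestion, using a temporary file and a made-up function `add_one()` in place of the generated 2k-line function:

```r
# Write a small helper file (stand-in for the generated function file).
helper_file <- tempfile(fileext = ".R")
writeLines("add_one <- function(x) x + 1", helper_file)

# source() evaluates the file, so add_one() becomes available afterwards.
source(helper_file)
add_one(41)
```

Unlike a C `#include`, `source()` runs the file at the moment it is called, so it belongs before the first use of whatever the file defines.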

Re: [R] Linear Regression Problem

2009-07-14 Thread Liaw, Andy
For the coefficient to be equal to the correlation, you need to scale y as well. You can get the correlations by something like the following and then back-calculate the coefficients from there. R> x = matrix(rnorm(100*4e4), 100, 4e4) R> y = rnorm(100) R> rxy = cor(x, cbind(
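
The point about scaling y as well can be checked on a single predictor: once both variables are standardised, the least-squares slope equals the correlation. A sketch on simulated data (the numbers are made up, not from the thread):

```r
set.seed(42)
x <- rnorm(100)
y <- 0.5 * x + rnorm(100)

# Standardise both variables by hand
zx <- (x - mean(x)) / sd(x)
zy <- (y - mean(y)) / sd(y)

# Slope with only x standardised: not the correlation in general
coef(lm(y ~ zx))["zx"]

# Slope with both standardised: matches cor(x, y)
slope <- unname(coef(lm(zy ~ zx))["zx"])
all.equal(slope, cor(x, y))
```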

Re: [R] randomForest - what is a 'good' pseudo r-squared?

2009-07-21 Thread Liaw, Andy
;100%. You may want to check the distribution of the response (or residuals) to see if a transformation is appropriate. Tree-based methods (of which random forests is one) can be sensitive to heteroscedasticity. Best, Andy From: lara harrup (IAH-P) > > Hi all > > I have been trying to us

Re: [R] cannot allocate a vector with 1920165909 length

2009-07-28 Thread Andy Zhu
Out of memory? How large is your physical memory? --- On Tue, 7/28/09, zhijie zhang wrote: From: zhijie zhang Subject: [R] cannot allocate a vector with 1920165909 length To: r-h...@stat.math.ethz.ch Date: Tuesday, July 28, 2009, 9:49 PM Dear Rusers, The error for the  following was that it c

[R] All sub-summands of a vector

2010-04-02 Thread Andy Rominger
And then the sum of x[1:2], x[2:3], etc. And then...so on. The result would be: 1 2 3 4 2 5 7 6 9 10 I can do this with for loops (code below) but for long vectors (10^6 elements) looping takes more time than I'd like. Any suggestions? Thanks very much in advance-- Andy # calculate sums of
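
One loop-free way to get every run of consecutive sums is through cumulative sums: the sum of a window is the difference of two entries of `cumsum(x)`. This is a sketch of that idea, not the poster's truncated code; the helper name `window_sums()` is hypothetical:

```r
x <- 1:4

# Sums of all windows of length k, using differences of cumulative sums
window_sums <- function(x, k) {
  cs <- cumsum(x)
  n  <- length(x)
  cs[k:n] - c(0, cs[seq_len(n - k)])
}

# Window lengths 1, 2, ..., length(x), concatenated in that order
all_sums <- unlist(lapply(seq_along(x), function(k) window_sums(x, k)))
all_sums
```

Each window length costs one vectorised subtraction, but the total number of sums still grows as n(n+1)/2, so for 10^6 elements the full result would not fit in memory whichever method is used.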

Re: [R] All sub-summands of a vector

2010-04-02 Thread Andy Rominger
Great, thanks for your help. I tried: x <- 1:1 y <- lapply(1:1,function(t){t*runmean(x,t,alg="fast",endrule="trim")}) and it worked in about 90 sec. Thanks again, Andy On Fri, Apr 2, 2010 at 3:43 PM, Gabor Grothendieck wrote: > There is also rollmean in t

Re: [R] sample size > 20K? Was: fitness of regression tree: how to measure???

2010-04-05 Thread Liaw, Andy
statisticians such as John Tukey, condemned the sampling procedure. Tukey was perhaps the most vocal critic, saying, "A random selection of three people would have been better than a group of 300 chosen by Mr. Kinsey." Andy From: Frank E Harrell Jr > > Good comments Bert. Just

Re: [R] Question on implementing Random Forests scoring

2010-04-09 Thread Liaw, Andy
e source code of the package. Your C code just need to wrap around that. Andy Notice: This e-mail message, together with any attachme...{{dropped:10}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read t

Re: [R] Help with Partial dependence bar graph

2010-04-21 Thread Liaw, Andy
Store the returned value of partialPlot() in an object and do your own barplot. Read the "Value" section in the help page for partialPlot. Andy From: Daudi Jjingo > > Hello, > > I need to draw a partial dependence bar graph. > My the my predictor vectors ar

Re: [R] Question on: Random Forest Variable Importance for Regression Problems

2010-04-28 Thread Liaw, Andy
I would have thought that the help page for importance() is an (the?) obvious place to look... If that description is not clear, please let me know which part isn't clear to you. Andy From: Mareike Lies > > I am trying to use the package RandomForest performing regression. >

Re: [R] Curve Fitting/Regression with Multiple Observations

2010-04-30 Thread Liaw, Andy
st is very high, and the signal to noise ratio is rather low. This has the tendency of burning out those who started with good intentions to help. Andy From: Kyeong Soo (Joseph) Kim > > Dear Keith, > > Thanks for the suggestion and taking your time to respond to it. > > But,

Re: [R] Barplot+Table

2009-09-23 Thread Andy Choens
Marc Schwartz wrote: >Using the data that is in the online plot rather than the above, here >is a first go. Note that I am not drawing the background grid in the >barplot or the lines for table below it. These could be added if you >really need them. Note: I snipped out the syntax from Marc'

Re: [R] how to visualize gini coefficient in each node in RF?

2009-09-29 Thread Liaw, Andy
No. The forest object is too large as is. I didn't think it's worth the extra memory to store them. They were never kept even in the Fortran/C code. Andy From: Chrysanthi A. > Sent: Monday, September 28, 2009 5:20 PM > To: r-help@r-project.org > Subject: [R] how to visuali

Re: [R] how to visualize gini coefficient in each node in RF?

2009-09-30 Thread Liaw, Andy
You can try to hunt for it in the findbestsplit Fortran subroutine. It uses something that's equivalent (but easier to compute), not exactly identical. Breiman uses whatever computational shortcuts he could find in his code. Best, Andy

Re: [R] Ubuntu, Revolutions, R

2009-10-05 Thread Andy Choens
request is that this is the sort of change / improvement that is unlikely to make it into the "New Features" publications produced by Canonical, since R users are obviously a tiny minority of Ubuntu users. I think it's going to be important for Canonical and its partners (

Re: [R] Random Forest - partial dependence plot

2009-10-20 Thread Liaw, Andy
ape of that trend that is "important". You may interpret the relative range of these plots from different predictor variables, but not the absolute range. Hope that helps. Andy > -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-pro

Re: [R] "interactions" feature in RF?

2009-10-22 Thread Liaw, Andy
That has not yet been implemented in the R version of the package. Best, Andy > -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On Behalf Of Chrysanthi A. > Sent: Thursday, October 22, 2009 6:40 AM > To: r-help@r-project.or

Re: [R] violin - like plots for bivariate data

2009-11-16 Thread Liaw, Andy
sounds like bivariate density contours may be what you're looking for. Andy From: Eric Nord > > I'm attempting to produce something like a violin plot to > display how y > changes with x for members of different groups (My specific > case is how > floral area c

Re: [R] Installing RandomForest on SuSe Linux - warnings

2009-12-07 Thread Liaw, Andy
Those are the same warnings I get when I test the package (before submitting to CRAN) and have been that way for a long time. They stemmed from conditional allocation of arrays in C. gcc -wall seems to always pick on that. As far as I know, they are harmless. Andy > -Original Mess

Re: [R] RandomForest - getTree status code

2009-12-07 Thread Liaw, Andy
Is that the entire tree? If so there's a problem. The node status is defined as follows in rf.h of the source code: #define NODE_TERMINAL -1 #define NODE_TOSPLIT -2 #define NODE_INTERIOR -3 i.e., "-3" means "non-terminal" node. Andy > -Original Message--

Re: [R] coefficients of each local polynomial from locfit

2009-12-08 Thread Liaw, Andy
I believe the prediction is done on some sort of grid, then interpolated to fill in the rest. This is, however, purely for computational reasons, and not for any theoretical reasons. The formal definition of local polynomials is to do a weighted fit of a polynomial at each point. Andy

Re: [R] different randomForest performance for same data

2009-12-15 Thread Liaw, Andy
You need to be _extremely_ careful when assigning levels of factors. Look at this example: R> x1 = factor(c("a", "b", "c")) R> x2 = factor(c("a", "c", "c")) R> x3 = x2 R> levels(x3) <- levels(x1) R> x3 [1
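
The example in this message is cut off, so here is a self-contained sketch of the same pitfall: assigning to `levels()` renames the existing levels by position, while re-creating the factor with `factor(x, levels = ...)` keeps the values intact.

```r
x1 <- factor(c("a", "b", "c"))
x2 <- factor(c("a", "c", "c"))

# Assigning levels RENAMES the existing levels by position:
# x2 has levels "a", "c"; giving it c("a", "b", "c") turns every "c" into "b".
x3 <- x2
levels(x3) <- levels(x1)
as.character(x3)   # "a" "b" "b"

# Re-creating the factor keeps the values and only widens the level set.
x4 <- factor(x2, levels = levels(x1))
as.character(x4)   # "a" "c" "c"
```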

Re: [R] Error while using rfImpute

2009-05-08 Thread Liaw, Andy
Try re-starting R, load the randomForest package, and then run example(rfImpute) and see if that works. Can you post your sessionInfo() output? Andy From: cosmos science > > Dear Administrator, > > I am using linux (suse 10.2). While attempting rfImpute, I am > getting the >

[R] Import Visual FoxPro (.dbf)

2009-05-11 Thread Andy Choens
0.8.30-1 Ubuntu:9.04 (32-bit) Thanks for any thoughts. --andy -- This is the price and the promise of citizenship. - Barack Obama

Re: [R] pair matching

2009-05-12 Thread Liaw, Andy
If the matching need not be one-to-one, then you can just compute the Euclidean distances between the two vectors, then in each row (or column, which ever corresponds to the shorter vector) and find the smallest. This should be fairly easy to do. Andy From: Thomas S. Dye > > Given two n
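
A sketch of the distance-matrix approach described above, for the case where matches need not be one-to-one. The two vectors are made-up values; for plain numeric vectors the Euclidean distance reduces to the absolute difference:

```r
a <- c(1.0, 4.2, 9.5)              # shorter vector (made-up values)
b <- c(0.8, 2.0, 4.0, 7.0, 10.0)   # longer vector (made-up values)

# |a[i] - b[j]| for every pair; rows index a, columns index b
d <- abs(outer(a, b, "-"))

# For each element of a, the position of the closest element of b
nearest <- apply(d, 1, which.min)

data.frame(a = a,
           match = b[nearest],
           distance = d[cbind(seq_along(a), nearest)])
```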

Re: [R] questions on rpart (tree changes when rearrange the order of covariates?!)

2009-05-13 Thread Liaw, Andy
I've gotten some pointers from Terry Therneau about where in the code to check. I may try to implement breaking ties at random (as I've done in randomForest). No promises, though... Andy > > > > > > > Does anyone know how rpart deal with ties? > > > >

Re: [R] read multiple large files into one dataframe

2009-05-13 Thread Liaw, Andy
fun with the data even when you manage to get it into R in one piece. Andy From: SYKES, Jennifer > > Hello > > > > Apologies if this is a simple question, I have searched the help and > have not managed to work out a solution. > > Does anybody know an efficient m

Re: [R] Importing data into R and combining 2 files

2009-05-14 Thread Andy Choens
On Thu, 2009-05-14 at 10:30 -0700, Sunita22 wrote: > Hello > > I have to import 2 txt files into R. 1 file contains the data and the other > contains the header, column headings, datatypes and labels for the data. > This is your first complicating factor. > I have 2 problems: > > 1) my data fi

Re: [R] Using sample to create Training and Test sets

2009-05-15 Thread Liaw, Andy
Here's one possibility: idx <- sample(nrow(acc)) training <- acc[idx[1:400], ] testset <- acc[-idx[1:400], ] Andy From: Chris Arthur > > Forgive the newbie question, I want to select random rows from my > data.frame to create a test set (which I can do) but then I want

Re: [R] Simulation from a multivariate normal distribution

2009-05-18 Thread Liaw, Andy
Check out the help page for replicate(). Andy From: barbara.r...@uniroma1.it > > I must to create an array with dimensions 120x8x500. Better I > have to make 500 simulations of 8 series of return from a multivariate > normal distribution. there's the command "mvrnorm"
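
A sketch of the `replicate()` suggestion for the 120x8x500 case, using `mvrnorm()` from the MASS package. The mean vector and covariance matrix are placeholders, since the poster's values are not shown:

```r
library(MASS)   # provides mvrnorm()

set.seed(123)
mu    <- rep(0, 8)   # assumed mean returns (placeholder)
Sigma <- diag(8)     # assumed covariance matrix (placeholder)

# Each call yields a 120 x 8 matrix; replicate() stacks 500 of them.
sims <- replicate(500, mvrnorm(n = 120, mu = mu, Sigma = Sigma))
dim(sims)
```

`sims[, , k]` is then the k-th simulated 120 x 8 set of returns.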

Re: [R] How to google for R stuff?

2009-05-21 Thread Andy Choens
> You are very picky. When I enter > > R residuals > > into Google, 8 out of the first 10 hits are for R topics. Isn't that > good enough for you? > > I think this is true of most Google searches: the letter R most often > means the R project. Although it does not appear to be a factor wit

[R] degree symbol using X11 on OSX

2009-05-26 Thread Andy Jacobson
cript (i.e. with a degree symbol) when I use dev.copy to write it to an eps file. I'm using R version 2.8.1 Patched (2009-01-19 r47650) on an intel Mac, fully updated OS X 10.5.7. Help appreciated, Andy -- Andy Jacobson andy.jacob...@noaa.gov NOAA Earth System Research Lab Global Moni

Re: [R] Constrained fits: y~a+b*x-c*x^2, with a,b,c >=0

2009-05-27 Thread Liaw, Andy
There's also the "nnls" (non-negative least squares) package on CRAN that might be useful, although I'm puzzled by the negative sign in front of c in Alex's post... Cheers, Andy From: Berwin A Turlach > > G'day Alex, > > On Wed, 27 May 2009 11:51:39 +0200

Re: [R] Heatmap

2009-06-08 Thread Liaw, Andy
Couldn't you get that just by giving heatmap() the transpose of your data? > -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On Behalf Of Alex Roy > Sent: Monday, June 08, 2009 9:32 AM > To: r-help@r-project.org > Subject: [R] Heatmap > >

Re: [R] Random Forest % Variation vs Psuedo-R^2?

2009-06-08 Thread Liaw, Andy
estimate of MSE. HTH, Andy > -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On Behalf Of Ryan Harrigan > Sent: Sunday, June 07, 2009 9:38 PM > To: r-help@r-project.org > Subject: [R] Random Forest % Variation vs Psuedo-R^2

Re: [R] Problem in 'Apply' function: does anybody have other solution

2009-06-17 Thread Liaw, Andy
Could it be that the "problematic" data came from csv files with quotes? What does str() on those data say? Recall that apply() will coerce the object to a matrix (if it's not), which means everything needs to be the same type, so if even just one column is read into R as non-numeric, the entire r
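
The coercion described here is easy to reproduce: a single text column is enough to make `apply()` see every cell as character. A sketch with a made-up two-column data frame:

```r
# One column arrived as text, e.g. because the CSV had quoted numbers.
df <- data.frame(a = c(1, 2, 3), b = c("4", "5", "6"),
                 stringsAsFactors = FALSE)
str(df)

# apply() converts the data frame to a matrix first,
# so every cell in every row becomes character.
row_types <- apply(df, 1, class)
row_types

# Converting the offending column restores numeric behaviour.
df$b <- as.numeric(df$b)
row_sums <- apply(df, 1, sum)
row_sums
```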

[R] where/what is i? for loop (black?) magic

2009-06-17 Thread Liaw, Andy
). This is the only logical explanation I can come up with given the behavior observed above. Can anyone confirm/deny this? If this is true, one thing to consider is not to use a large object to loop over (e.g., columns of a very large data frame). Andy

Re: [R] where/what is i? for loop (black?) magic

2009-06-18 Thread Liaw, Andy
From: Duncan Murdoch > > Liaw, Andy wrote: > > A colleague and I were trying to understand all the > possible things one > > can do with for loops in R, and found some surprises. I think we've > > done sufficient detective work to have a good guess as to > wh

[R] FW: Can I estimate strength and correlation of Random Forest in R package " randomForest"?

2009-06-19 Thread Liaw, Andy
Didn't realize the message was cc'ed to R-help. Here's my reply... ____ From: Liaw, Andy Sent: Thursday, June 18, 2009 11:35 AM To: 'Li GUO' Subject: RE: Can I estimate strength and correlation of Random Forest in R package " ran

Re: [R] Do we have to control for block in block designs if it is insignificant?

2009-03-24 Thread Liaw, Andy
The short answer is "no" (meaning to leave the blocks in the model). As Frank Harrell said, you've spent your degrees of freedom. Go home and be happy. Best, Andy From: J S > Sent: Tuesday, March 24, 2009 9:49 AM > To: r-help@r-project.org > Subject: [R] Do we have

Re: [R] Random Forest Variable Importance

2009-03-27 Thread Liaw, Andy
Read ?importance, especially the "scale" argument. Andy > -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] On Behalf Of Li GUO > Sent: Friday, March 27, 2009 1:24 PM > To: r-help@r-project.org > Subject

Re: [R] Concern with randomForest

2009-04-07 Thread Liaw, Andy
this statistic would be estimating a very small number (or zero), and can come out negative. I would interpret any negative pseudo-R^2 as indication of very poor model. Andy From: Ryan Harrigan > Hi all, > When running a randomForest run using the following command: > > forestpl
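
A negative value is possible because of how the statistic is built. Assuming the pseudo R-squared is defined as 1 - MSE / Var(y), it drops below zero whenever the predictions do worse than simply predicting the mean of y. A sketch with deliberately bad, made-up predictions:

```r
set.seed(7)
y    <- rnorm(50)   # made-up response
pred <- -y          # deliberately bad predictions: the sign is always wrong

mse       <- mean((y - pred)^2)
pseudo_r2 <- 1 - mse / var(y)   # assumed definition: 1 - MSE / Var(y)
pseudo_r2                       # below zero: worse than predicting mean(y)
```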

Re: [R] help with random forest package

2009-04-08 Thread Liaw, Andy
e varUsed() and getTree() functions. Andy From: Chrysanthi A. > Hello, > > I am a phd student in Bioinformatics and I am using the Random Forest > package in order to classify my data, but I have some questions. > Is there a function in order to visualize the trees, so as to >

Re: [R] help with random forest package

2009-04-08 Thread Liaw, Andy
, just the underlying representation of the tree). Andy From: Chrysanthi A. [mailto:chrys...@gmail.com] Sent: Wednesday, April 08, 2009 2:56 PM To: Liaw, Andy Cc: r-help@r-project.org Subject: Re: [R] help with random fore

Re: [R] Random Forests Variable Importance Question

2009-04-13 Thread Liaw, Andy
ith two classes) are the analogous measures that address each of the two classes specifically, rather than over all of the data. Andy From: Paul Fisch > > I am trying to use the random forests package for classification in R. > > The Variable Importance Measures listed are: &

Re: [R] Re : Running random forest using different training and testing schemes

2009-04-13 Thread Liaw, Andy
estimates are very close to what you'd get from CV, without all the work. Andy From: Chrysanthi A. > > Hi Pierre, > > Thanks a lot for your help.. > So, using that script, I just separate my data in two parts, > right? For > using as training set the 70 % of the da

Re: [R] help with random forest package

2009-04-13 Thread Liaw, Andy
". You can take 1 - OOB error rate as the estimate of prediction accuracy (if you have not selected variables, e.g., using variable importance, in building the final RF model). Andy From: Chrysanthi A. [mailto:chrys...@gmail.com] Sent: F

Re: [R] help with random forest package

2009-04-13 Thread Liaw, Andy
edictions, only predictions from trees for which the case is out-of-bag are counted. That's why you may get odd-ball vote fractions even when you grow 100 trees and expect the votes to be in seq(0, 1, by=0.01).] 100% - 2.34% = 97.66%, not 76.6% (I can only assume you had a t

Re: [R] Random Forests: Question about R^2

2009-04-13 Thread Liaw, Andy
hope there's no question about how the pseudo R^2 is computed on a test set? If you understand how that's done, I assume the confusion is only how the OOB MSE is formed. Best, Andy From: Dimitri Liakhovitski > > Dear Random Forests gurus, > > I have a question about R^2 pr

Re: [R] Random Forests: Question about R^2

2009-04-13 Thread Liaw, Andy
Apologies: that should have been sum(residual^2)! > -Original Message- > From: Dimitri Liakhovitski [mailto:ld7...@gmail.com] > Sent: Monday, April 13, 2009 4:35 PM > To: Liaw, Andy > Cc: R-Help List > Subject: Re: [R] Random Forests: Question about R^2 > > And

[R] Creating a list of database names for merge.rec

2009-04-16 Thread Andy Barenberg
form of > c("dbn5", "dbn6",) > where to make the merge.rec function work I need it to be of the form > dbn5,dbn6, without the quotation marks. any easier way of doing this? Thanks Andy -- Andy Barenberg University of Mass. Amherst - Economics 924 Th
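
One way to go from a character vector of object names to the objects themselves is `mget()`, followed by `do.call()` to pass them as separate arguments. In this sketch the two data frames are toy stand-ins and base `merge()` is used only as a placeholder for `merge.rec`, whose signature is not shown in the message:

```r
# Two toy objects standing in for the real databases
dbn5 <- data.frame(id = 1:3, x = c(10, 20, 30))
dbn6 <- data.frame(id = 1:3, y = c("a", "b", "c"))

db_names <- c("dbn5", "dbn6")   # names held as strings

# mget() looks the names up and returns the objects themselves as a list
db_list <- mget(db_names, envir = environment())

# do.call() passes the list elements as separate arguments;
# base merge() is only a stand-in for merge.rec here.
merged <- do.call(merge, unname(db_list))
merged
```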

Re: [R] Random Forests: Question about R^2

2009-04-21 Thread Liaw, Andy
Just one small correction: in #3 it should be squared residuals. Yes, the function returns a vector of r^2 with length=ntree, with the k-th element being the r^2 for the forest consisting of the first k trees. Cheers, Andy From: Dimitri Liakhovitski > > I would like to summarize. Wou

Re: [R] Random Forests: Predictor importance for Regression Trees

2009-04-21 Thread Liaw, Andy
Yes, you've got it! Cheers, Andy From: Behalf Of Dimitri > > Hello! > > I think I am relatively clear on how predictor importance (the first > one) is calculated by Random Forests for a Classification tree: > > Importance of predictor P1 when the response varia

Re: [R] help with random forest package

2009-04-28 Thread Liaw, Andy
xpect anyone to predict data "manually" like this. predict.randomForest() does all this for you. As to individual tree predictions, predict.randomForest() has an option "predict.all" that you can use. To get the OOB votes, though, you will also need to
