[R] Time Series prediction

2013-05-27 Thread Giovanni Azua
Hello, I would like to use a parametric TS model and predictor as benchmark to compare against other ML methods I'm employing. I currently build a simple e.g. ARIMA model using the convenient auto.arima function like this: library(forecast) df <- read.table("/Users/bravegag/data/myts.dat") # btw

Re: [R] Text mining

2013-01-26 Thread Giovanni Azua
Hi Steve, IMO this problem does not need a classifier but rather a database and a simple query. I would just build a database with all city names including the geo information, and then say whether it is north or south exactly. If there was such a "rule" (which I doubt) I would expect it to have

Re: [R] use subset to trim data but include last per category

2012-09-09 Thread Giovanni Azua
Hello, This solves my problem in a horribly inelegant way that works: df <- data.frame(n=newInput$n, iter=newInput$iter, Error=newInput$Error, Duality_Gap=newInput$Duality, Runtime=newInput$Acc) df_last <- aggregate(x=df$iter, by=list(df$n), FUN=max) names(df_last)[names(df_last)=="Group.1"] <-
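A minimal sketch of one way to trim the data while always keeping the last row of each category. It assumes a toy data frame that reuses the column names n and iter from the thread; the values are made up, and ave() is swapped in here in place of the aggregate() call shown in the post:

```r
# Toy data standing in for the thread's df (values are invented).
df <- data.frame(n     = factor(rep(c("1000", "2000"), each = 5)),
                 iter  = rep(seq(10, 50, by = 10), times = 2),
                 Error = c(5, 4, 3, 2, 1, 9, 7, 5, 3, 1))

# Largest iter within each level of n, repeated for every row of that level.
last_iter <- ave(df$iter, df$n, FUN = max)

# Keep every second iteration, plus the final row of each category.
trimmed <- subset(df, iter %% 20 == 0 | iter == last_iter)
```

Because ave() returns a vector aligned with the rows of df, no merge back onto the original data frame is needed.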

Re: [R] use subset to trim data but include last per category

2012-09-09 Thread Giovanni Azua

[R] use subset to trim data but include last per category

2012-09-09 Thread Giovanni Azua
Hello, I bumped into the following funny use-case. I have too much data for a given plot. I have the following data frame df: > str(df) 'data.frame': 5015 obs. of 5 variables: $ n : Factor w/ 5 levels "1000","2000",..: 1 1 1 1 1 1 1 1 1 1 ... $ iter : int 10 20 30 40 50 60

[R] Grid package: how to customize cell spacing?

2012-09-08 Thread Giovanni Azua
Hello, I am using the recipe below to place plots side by side: http://wiki.stdout.org/rcookbook/Graphs/Multiple%20graphs%20on%20one%20page%20%28ggplot2%29/ How can I reduce or customize the horizontal spacing between the grid cells? I have researched the Grid package but can't find the way to

Re: [R] simplest way (set of functions) to parse a file

2012-08-27 Thread Giovanni Azua
00 > 16 80Step1 0.0 > 17 81Step1 0.0 > 18 102Step2 0.72146 > 19 204Step2 0.000230161 > 20 64 10Step2 0.003956920 > 21 64 20Step2 0.004390998 > 22 64 30Step2 0.004326610 > 23 64 40S

[R] simplest way (set of functions) to parse a file

2012-08-27 Thread Giovanni Azua
Hello, What would be the best set of R functions to parse and transform a file? My file looks as shown below. I would like to plot this data and I need to parse it into a single data frame that sorts of "transposes the data" with the following structure: > df <- data.frame(n=c(1,1,2,2),iter=c(

[R] [slightly OT] le: will a new point shift the solution question

2012-03-23 Thread Giovanni Azua
Hello, Is there an R function that given a linear regression solution for a data set will answer in the most efficient way whether a new data point shifts the solution or not? or whether the new solution would differ by less than some error. I need this in the context of an iterative method an

Re: [R] glm predict issue

2011-12-26 Thread Giovanni Azua
, Ben Bolker wrote: > Giovanni Azua writes: > >> >> Hello, >> >> I have tried reading the documentation and googling for the answer but > reviewing the online matches I end up >> more confused than before. >> >> My problem is appa

[R] glm predict issue

2011-12-26 Thread Giovanni Azua
Hello, I have tried reading the documentation and googling for the answer but reviewing the online matches I end up more confused than before. My problem is apparently simple. I fit a glm model (2^k experiment), and then I would like to predict the response variable (Throughput) for unseen fact

Re: [R] data frame and cumulative sum

2011-12-07 Thread Giovanni Azua
> a modicum of effort > > Michael > > On Wed, Dec 7, 2011 at 5:13 PM, Giovanni Azua wrote: >> Hello, >> >> I have a data frame that looks like this (containing interarrival times): >> >>> str(df) >> 'data.frame': 18233

[R] data frame and cumulative sum

2011-12-07 Thread Giovanni Azua
Hello, I have a data frame that looks like this (containing interarrival times): > str(df) 'data.frame': 18233 obs. of 1 variable: $ Interarrival: int 135 806 117 4 14 1 9 104 169 0 ... > head(df) Interarrival 1 135 2 806 3 117 4 4 5 14 6
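A minimal sketch, assuming the goal is a running total of the interarrival times (the question text is cut off, so this is a guess at the intent). Only the first ten values shown by str(df) are used, and the column name Arrival is hypothetical:

```r
# First ten interarrival times as printed by str(df) in the post.
df <- data.frame(Interarrival = c(135, 806, 117, 4, 14, 1, 9, 104, 169, 0))

# cumsum() turns the gaps between events into absolute arrival times.
df$Arrival <- cumsum(df$Interarrival)
```

The new column has the same length as the original one, so it can be stored directly in the data frame.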

[R] R-latex syntax highlighting?

2011-11-23 Thread Giovanni Azua
Hello, Can anyone provide or point me to a good setup for the listings latex package that would produce nice R-syntax highlighting? I am using an example I found in internet for setting up listings like this: \lstset{ language=R, basicstyle=\scriptsize\ttfamily, commentstyle=\ttfamily\color{gray

Re: [R] [OT] 1 vs 2-way anova technical question

2011-11-22 Thread Giovanni Azua
On Nov 22, 2011, at 3:52 PM, Liviu Andronic wrote: > On Tue, Nov 22, 2011 at 2:09 PM, Giovanni Azua wrote: >> Mr. Gunter did not read/understand my problem, and there were no useful tips >> but only ad hominem attacks. By your side-taking I suspect you are in the >> sam

Re: [R] [OT] 1 vs 2-way anova technical question

2011-11-22 Thread Giovanni Azua
On Nov 22, 2011, at 10:35 AM, Joshua Wiley wrote: > It is true the way you use general lists is not our business, but the > R-help list is a community and there are community rules. One of I meant that my use of the lists is not of __his__ business I wasn't referring to you nor other people in

Re: [R] [OT] 1 vs 2-way anova technical question

2011-11-21 Thread Giovanni Azua
On Nov 21, 2011, at 8:31 PM, Bert Gunter wrote: > we disagree is that I think data analysts with limited statistical > backgrounds should consult with local statisticians instead of trying > to muddle through on their own thru lists like this. This is not meant I think that people lacking reading

Re: [R] [OT] 1 vs 2-way anova technical question

2011-11-21 Thread Giovanni Azua
Hello Rob, Thank you for your suggestions. I tried glm too without success. Anyhow I include all the information just in case someone with good knowledge can give me a hand with this. I take log of the response variable because: - its values span across multiple orders of magnitudes - the diag

Re: [R] [OT] 1 vs 2-way anova technical question

2011-11-21 Thread Giovanni Azua
that > point 4 is a widely shared problem among posters here. > > Cheers, > Bert > > On Mon, Nov 21, 2011 at 5:02 AM, Giovanni Azua wrote: >> Hello, >> >> Couple of clarifications: >> - A,B,C,D are factors and I am also interested in possible interactio

Re: [R] [OT] 1 vs 2-way anova technical question

2011-11-21 Thread Giovanni Azua
ir 2-way interactions ... Thanks in advance, Best regards, Giovanni On Nov 21, 2011, at 12:04 PM, Giovanni Azua wrote: > Hello, > > I know there is plenty of people in this group who can give me a good answer > :) > > I have a 2^k model where k=4 like this: > Model 1) R~A*B*C*D >

[R] [OT] 1 vs 2-way anova technical question

2011-11-21 Thread Giovanni Azua
Hello, I know there is plenty of people in this group who can give me a good answer :) I have a 2^k model where k=4 like this: Model 1) R~A*B*C*D If I use the "*" in R among all elements it means to me to explore all interactions and include them in the model i.e. I think this would be the so

[R] anova to pca

2011-11-20 Thread Giovanni Azua
Hello, I would like to reinforce my anova results using PCA i.e. which factor are most important because they explain most of the variance (i.e. signal) of my 2^k*r experiment. However, I get the following error while trying to run PCA: > throughput.prcomp <- > prcomp(~No_databases+Partitionin

[R] aov how to get the SST?

2011-11-17 Thread Giovanni Azua
Hello, I currently run aov in the following way: > throughput.aov <- > aov(log(Throughput)~No_databases+Partitioning+No_middlewares+Queue_size,data=throughput) > summary(throughput.aov) Df Sum Sq Mean Sq F value Pr(>F) No_databases 1 184.68 184.675 136.6945 < 2.2e-16
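A minimal sketch of one way to obtain the total sum of squares (SST) from an aov fit, by adding up the "Sum Sq" column of the summary table. The data frame d and the factors A and B are made-up stand-ins for the throughput data, which is not available here:

```r
set.seed(1)
# Hypothetical balanced two-factor data.
d <- data.frame(A = gl(2, 8), B = gl(2, 4, 16), y = rnorm(16))

fit <- aov(y ~ A + B, data = d)

# summary() returns a list whose first element is the ANOVA table.
ss  <- summary(fit)[[1]][["Sum Sq"]]

# Factor sums of squares plus the residual sum of squares give SST.
sst <- sum(ss)

# Cross-check against the definition of SST.
sst_direct <- sum((d$y - mean(d$y))^2)
```

For a model with an intercept the two numbers agree, so ss / sst gives the fraction of variation attributed to each term.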

[R] boxplot strange behavior

2011-11-16 Thread Giovanni Azua
Hello, I generate box plots from my data like this: qplot(x=xxx,y=column,data=data,geom="boxplot") + xlab("xxx") + ylab(ylabel) + theme_bw() + scale_y_log10() + geom_jitter(alpha=I(1/10)) The problem is that I see lot of points above the maximum at the same level as some outliers. It looks ver

[R] aov output question

2011-11-14 Thread Giovanni Azua
Hello, I currently get anova results out of the aov function (see below) I use the model.tables and I believe it gives me back the model parameters of the fit (betas), however I don't see the intercept (beta_0) and don't understand what the "rep" output means and there is no description in the

Re: [R] 2^k*r (with replications) experimental design question

2011-11-13 Thread Giovanni Azua
le to designate the > replicates and use it as a blocking factor in the ANOVA. If you want > to treat the replicates as a random rather than a fixed factor, then > look into the nlme or lme4 packages. > > HTH, > Dennis > > On Sun, Nov 13, 2011 at 4:33 PM, Giovanni Azua wrot

[R] 2^k*r (with replications) experimental design question

2011-11-13 Thread Giovanni Azua
Hello, I have one replication (r=1 of the 2^k*r) of a 2^k experimental design in the context of performance analysis i.e. my response variables are Throughput and Response Time. I use the "aov" function and the results look ok: > str(throughput) 'data.frame': 286 obs. of 7 variables: $ Time

[R] issue plotting TukeyHSD

2011-11-13 Thread Giovanni Azua
Hello, When I try to use TukeyHSD in the following way it shows the confidence interval corresponding to the last factor only. > throughput.aov <- > aov(Throughput~No_databases+Partitioning+No_middlewares+Queue_size,data=throughput) plot(TukeyHSD(throughput.aov)) # I expected here to see the c

Re: [R] 2^k experiment generator

2011-11-13 Thread Giovanni Azua
Never mind, found it, it is the expand.grid function. On Nov 13, 2011, at 3:25 PM, Giovanni Azua wrote: > Hello, > > While looking for info on 2^k experimental design and anova I remember I saw > somewhere there was a function to generate all the experiments. I can't fin
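A minimal sketch of the expand.grid approach mentioned in this reply, assuming four two-level factors with the hypothetical names A to D coded as -1/+1:

```r
# Every combination of the factor levels: one row per experiment.
design <- expand.grid(A = c(-1, 1),
                      B = c(-1, 1),
                      C = c(-1, 1),
                      D = c(-1, 1))

n_runs <- nrow(design)   # 2^4 experiments
```

The first argument varies fastest, so the rows come out in the standard order usually used for 2^k design tables.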

Re: [R] dev.new() within a loop

2011-11-13 Thread Giovanni Azua
On Nov 13, 2011, at 3:23 PM, David Winsemius wrote: >>> Please read both my comments and the FAQ more carefully . You are >>> inadequately considering the information that has been offered to you. >>> >> Ok you wanted to make sure I have to read the FAQ well I didn't have to :) >> Googling usin

[R] 2^k experiment generator

2011-11-13 Thread Giovanni Azua
Hello, While looking for info on 2^k experimental design and anova I remember I saw somewhere there was a function to generate all the experiments. I can't find the function anymore can anyone suggest? The function takes as input the factors and levels and generates all the experiments. I kno

Re: [R] dev.new() within a loop

2011-11-13 Thread Giovanni Azua
Hello David, On Nov 13, 2011, at 5:20 AM, David Winsemius wrote: >> However, when executing plot_raw which invokes dev.new(..) all windows come >> out blank whereas if I execute each file outside of a loop then I can see >> the plots properly. > > Perhaps ...(you did not say what package this p

[R] dev.new() within a loop

2011-11-12 Thread Giovanni Azua
Hello, I have a loop where I iterate performance data files within a folder, parse and plot them in one shot (see below). However, when executing plot_raw which invokes dev.new(..) all windows come out blank whereas if I execute each file outside of a loop then I can see the plots properly. Wh

[R] 2^k*r experimental design and anova

2011-11-10 Thread Giovanni Azua
Hello, Can anyone point me to an online tutorial or book containing the easiest way to do ANOVA over the result data from a 2^k*r experiment. It is not clear to me if I can pass the raw data corresponding to each experiment or just the summarized data i.e. mean, sse, std, etc. I would like to

[R] binning runtimes

2011-10-24 Thread Giovanni Azua
Hello, Suppose I have the dataset shown below. The amount of observations is too massive to get a nice geom_point and smoother on top. What I would like to do is to bin the data first. The data is indexed by Time (minutes from 1 to 120 i.e. two hours of System benchmarking). Option 1) group th

[R] code review: is it too much to ask?

2011-10-23 Thread Giovanni Azua
, Giovanni # = # Advanced Systems Lab # Milestone 1 # Author: Giovanni Azua # Date: 22 October 2011 # =

[R] summarizing a data frame i.e. count -> group by

2011-10-23 Thread Giovanni Azua
Hello, This is one problem at the time :) I have a data frame df that looks like this: time partitioning_mode workload runtime 1 1 sharding query 607 2 1 sharding query 85 3 1 sharding query 52 4 1 sharding query
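A minimal sketch of a "count ... group by" on a data frame shaped like the one in the post, using aggregate() with length as the summary function. The first three rows are taken from the snippet; the last two are invented so that more than one group exists:

```r
df <- data.frame(time              = c(1, 1, 1, 2, 2),
                 partitioning_mode = "sharding",
                 workload          = c("query", "query", "query", "query", "refresh"),
                 runtime           = c(607, 85, 52, 70, 300))

# One row per (time, partitioning_mode, workload) group with its row count.
counts <- aggregate(runtime ~ time + partitioning_mode + workload,
                    data = df, FUN = length)
names(counts)[names(counts) == "runtime"] <- "count"
```

Replacing length with mean or median in the same call would summarize the runtimes per group instead of counting them.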

Re: [R] unfold list (variable number of columns) into a data frame

2011-10-23 Thread Giovanni Azua
Hi Dennis, Thank you very nice :) Best regards, Giovanni On Oct 23, 2011, at 6:55 PM, Dennis Murphy wrote: > Hi: > > Here's one approach: > > # Function to process a list component into a data frame > ff <- function(x) { > data.frame(time = x[1], partitioning_mode = x[2], workload = x[3],

[R] unfold list (variable number of columns) into a data frame

2011-10-23 Thread Giovanni Azua
Hello, I used R a lot one year ago and now I am a bit rusty :) I have my raw data which correspond to the list of runtimes per minute (minute "1" "2" "3" in two database modes "sharding" and "query" and two workload types "query" and "refresh") and as a list of char arrays that looks like this:

Re: [R] issue loading doBy library

2011-10-23 Thread Giovanni Azua
Hi Josh, Thank you for your feedback, after lot of trial and error the problem is finally solved. To solve this problem, I tried in this order: 1) uninstalling the two packages "Matrix" and "lme4" and reinstalling them. 2) uninstalling doBy and reinstalling it with and without 1) 3) upgrading t

[R] issue loading doBy library

2011-10-22 Thread Giovanni Azua
Hello, How can I fix this? I have the latest version of R 2.13.2 and I use Mac OS X 10.7.2 > library(doBy) Loading required package: lme4 Error in dyn.load(file, DLLpath = DLLpath, ...) : function 'cholmod_l_start' not provided by package 'Matrix' Error: package 'lme4' could not be loaded > lib

[R] update and rebuild all?

2010-08-24 Thread Giovanni Azua
Hello, I upgraded my Mac R version to the newest 2.11.1, then I ran the option to update all packages but there was an error related to fetching one of those and the process stopped. I retried updating all packages but nothing happens. Although all my course project scripts work perfectly is th

Re: [R] plot for linear discriminant

2010-05-16 Thread Giovanni Azua
Hello Hadley, Thank you very much for your help! I have just received your book btw :) On May 16, 2010, at 6:16 PM, Hadley Wickham wrote: >Hi Giovanni, > >Have a look at the classifly package for an alternative approach that >works for all classification algorithms. If you provided a small >repr

[R] abline limit constrain x-range how?

2010-05-15 Thread Giovanni Azua
Hello, I managed to "linearize" my LDA decision boundaries now I would like to call abline three times but be able to specify the exact x range. I was reading the doc but it doesn't seem to support this use-case? are there alternatives. The reason why I use abline is because I first call plot t

[R] plot for linear discriminant

2010-05-15 Thread Giovanni Azua
Hello, I have a labelled dataset with three classes. I have computed manually the LDA hyperplane that separate the classes from each other i.e. \hat{\delta}_j(x)=x^Tb_j + c_j where b_j \in \mathbb{R}^p and c_j \in \mathbb{R} my concrete b_j looks like e.g. b_j <- rbind(1,2) c_j <- 3 How can I

Re: [R] plot formula 'x' is missing?

2010-05-14 Thread Giovanni Azua
> Hi Giovanni, > > curve(1/(1+exp(5.0993-0.1084*x)), 0, 100) > > HTH, > Jorge > > > On Sat, May 15, 2010 at 12:43 AM, Giovanni Azua wrote: > Hello, > > I'd like to plot the logistic function for a specific model like this: > > > plot(formula=y~1/(1
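A minimal sketch built around the curve() call suggested in this reply. The helper name logistic is hypothetical; the coefficients are the ones quoted in the thread:

```r
# The fitted logistic function from the thread, wrapped in a helper.
logistic <- function(x) 1 / (1 + exp(5.0993 - 0.1084 * x))

# curve() evaluates the expression on a grid of x values and draws it.
curve(logistic(x), from = 0, to = 100, xlab = "x", ylab = "logistic(x)")

# The same function written with the built-in logistic CDF.
same <- all.equal(logistic(0:100), plogis(0.1084 * (0:100) - 5.0993))
```

curve() takes an expression in x rather than a model formula, which is why no data argument is needed.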

[R] plot formula 'x' is missing?

2010-05-14 Thread Giovanni Azua
Hello, I'd like to plot the logistic function for a specific model like this: > plot(formula=y~1/(1+exp(5.0993-0.1084*x)),data=data.frame(x=seq(0,100,length.out=1000))) Error in is.function(x) : 'x' is missing However, I get the 'x' is missing error above and don't know how to fix it ... Can a

Re: [R] plot with no default axis labels

2010-05-14 Thread Giovanni Azua
Hello Jim, Very nice example! thank you! Best regards, Giovanni On May 14, 2010, at 11:50 AM, Jim Lemon wrote: > On 05/14/2010 07:31 PM, Giovanni Azua wrote: >> Hello, >> >> I could not find an easy way to have the plot function not display the >> default x and y-a

Re: [R] plot with no default axis labels

2010-05-14 Thread Giovanni Azua
Hello, I found the answer here: http://www.statmethods.net/advgraphs/axes.html basically plot(...,axes=FALSE,...) ## avoids default axis labels Best regards, Giovanni On May 14, 2010, at 11:31 AM, Giovanni Azua wrote: > Hello, > > I could not find an easy way to have the plot fun
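A minimal sketch of the axes=FALSE approach described above, followed by explicit axis() calls. The data and the tick positions are made up for illustration:

```r
x <- 1:10
y <- x^2

# Tick positions for the points of interest (hypothetical choice).
x_ticks <- c(1, 5, 10)
y_ticks <- x_ticks^2

plot(x, y, axes = FALSE)   # draw the points without default axes
axis(1, at = x_ticks)      # x-axis labelled only at the chosen points
axis(2, at = y_ticks)      # y-axis likewise
box()                      # restore the frame that axes = FALSE drops
```

Passing labels= to axis() as well would allow custom text at each tick instead of the numeric value.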

[R] plot with no default axis labels

2010-05-14 Thread Giovanni Azua
Hello, I could not find an easy way to have the plot function not display the default x and y-axis labels, I would like to customize it to show only points of interest ... I would like to: 1- call plot that show no x-axis and y-axis labels 2- call axis specifying the exact points of interest fo

Re: [R] ggplot2's geom_errorbar legend

2010-05-02 Thread Giovanni Azua
Hello Ista, On May 1, 2010, at 8:37 PM, Ista Zahn wrote: > Hi Giovanni, > A reproducible example would help. Also, since I think this will be > tricky, it might be a good idea to post it to the ggplot2 mailing list > (you can register at http://had.co.nz/ggplot2/ ). > > Best, > Ista First, thank

[R] cbind and automatic type conversion

2010-05-01 Thread Giovanni Azua
Hello, I have three method types and 100 generalization errors for each, all in the range [0.65,0.81]. I would like to make a stacked histogram plot using ggplot2 with this data ... Therefore I need a data frame of the form e.g. Method GE -- --

Re: [R] closest match in R to c-like struct?

2010-05-01 Thread Giovanni Azua
On May 1, 2010, at 6:48 PM, steven mosher wrote: > I was talking with another guy on the list about this very topic. > > A simple example would help. > > first a sample C struct, and then how one would do the equivalent in R. > > In the end i suppose one want to do a an 'array' of these structs

Re: [R] closest match in R to c-like struct?

2010-05-01 Thread Giovanni Azua
On May 1, 2010, at 5:04 PM, (Ted Harding) wrote: > Well, 'list' must be pretty close! The main difference would be > that in C the structure type would be declared first, and then > applied to create an object with that structure, whereas an R > lists are created straight off. If you want to set u
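A minimal sketch of using a list as a struct, as suggested in this reply. The function make_run and its field names are hypothetical; the point is only that fields can differ in type and length:

```r
# Hypothetical "constructor" playing the role of a C struct declaration.
make_run <- function(method, errors, note = NA_character_) {
  list(method = method, errors = errors, note = note)
}

# Fields may have different lengths and types, unlike data.frame columns.
run1 <- make_run("classic", c(0.71, 0.69, 0.74))
run2 <- make_run("bootstrap", c(0.66, 0.70), note = "B = 100")

# An "array of structs" is simply a list of such lists.
runs     <- list(run1, run2)
methods  <- vapply(runs, function(r) r$method, character(1))
n_errors <- lengths(lapply(runs, `[[`, "errors"))
```

Fields are read and written with $ or [[ ]], for example run1$note <- "baseline".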

[R] bootstrap generalization error

2010-05-01 Thread Giovanni Azua
Hello, I use the following function "bootstrapge" to calculate (and compare) the generalization error of several bootstrap implementations: ## ## Calculates and returns a coefficient corresponding to the generalization ## error. The formula for the bootstrap generalization error is: ## $N^{-1}\

[R] closest match in R to c-like struct?

2010-05-01 Thread Giovanni Azua
Hello, What would be in R the closest match to a c-struct? e.g. data.frame requires all elements to be of the same length ... or is there a way to circumvent this? TIA, Best regards, Giovanni

[R] ggplot2's geom_errorbar legend

2010-05-01 Thread Giovanni Azua
Hello, I create a simple ggplot that only shows a straight line. I then add three datasets of CI using the geom_errorbar function. The problem is that I can't find any way to have the legend showing up ... I need to show what each color of the CIs corresponds to i.e. which method. Can anyone

Re: [R] apply question

2010-04-30 Thread Giovanni Azua
Hello David, On Apr 30, 2010, at 11:00 PM, David Winsemius wrote: > Note: Loops may be just as fast or faster than apply calls. > How come!? is this true also for other similar functions: lapply, tapply and sapply? Then the only advantage of these above is only syntactic sugar? >> >> indices

[R] apply question

2010-04-30 Thread Giovanni Azua
Hello, I have a bootstrap implementation loop that I would like to replace by a faster "batch operator". The loop is as follows: for (b in 1:B) { indices <- sample(1:N, size=N, replace=TRUE) # sample n elements with replacement theta_star[b,] = statistic(data,indices) # exe
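A minimal sketch of the same bootstrap loop written with replicate(). The data and the statistic function (column means) are stand-ins, since the originals are not shown; as noted in the follow-up, this is mostly a change in notation and not necessarily faster than the loop:

```r
set.seed(42)
data <- data.frame(x = rnorm(50), y = rnorm(50))   # made-up data
N <- nrow(data)
B <- 200

# Hypothetical statistic: column means of the resampled rows.
statistic <- function(data, indices) colMeans(data[indices, ])

# replicate() returns one column per resample; transpose to B rows.
theta_star <- t(replicate(B,
                          statistic(data, sample.int(N, N, replace = TRUE))))
```

Each row of theta_star then corresponds to one bootstrap replicate, matching theta_star[b,] in the loop version.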

[R] ggplot2 legend how?

2010-04-30 Thread Giovanni Azua
Hello, I have just ordered the "ggplot2: Elegant Graphics for Data Analysis (Use R)" but while it arrives :) can anyone please show me how to setup and add a simple legend to a ggplot? This is my use case, I need a legend showing CI "Classic", "Own bootstrap", "R bootstrap": library(ggplot2)

Re: [R] SOLVED plotting multiple CIs

2010-04-30 Thread Giovanni Azua
Hello, After installing gfortran from http://r.research.att.com/gfortran-4.2.3.dmg it finally works! see below. Thank you all. @Ista Zahn: Looks fantastic! :) thank you so much! ... is there a way to have a small circle on the true value? Best regards, Giovanni > install.packages("Hmisc", d

Re: [R] plotting multiple CIs

2010-04-30 Thread Giovanni Azua
Hello David, On Apr 30, 2010, at 6:00 PM, David Winsemius wrote: > Looks like you do not have the RTools bundle and perhaps not the XCode > framework either? > > I am not suggesting that you do so, since it appears you are not conversant > with compiling source code packages. If I am wrong abo

Re: [R] plotting multiple CIs

2010-04-30 Thread Giovanni Azua
Hello Zahn, On Apr 30, 2010, at 4:35 PM, Ista Zahn wrote: > Hi Giovanni, > I think the ggplot2 package might help you out here. Do you want > something like this? Thank you for your suggestion however I could not give it a try since landed in the same issue being reported about the Hmisc packag

[R] plotting multiple CIs

2010-04-30 Thread Giovanni Azua
Hello, I need to plot multiple confidence intervals for the same model parameter e.g. so for the same value of the parameter in point x_1 I would like to see four different confidence intervals so that I can compare the accuracy e.g. boot basic vs normal vs my own vs classic lm CI etc. I li

Re: [R] function pointer question

2010-04-26 Thread Giovanni Azua
Hello Jan, On Apr 26, 2010, at 8:56 AM, Jan van der Laan wrote: > You can use the '...' for that, as in: > > loocv <- function(data, fnc, ...) { > n <- length(data.x) > score <- 0 > for (i in 1:n) { > x_i <- data.x[-i] > y_i <- data.y[-i] > yhat <- fnc(x=x_i,y=y_i, ...) > score <- score +
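A minimal runnable sketch of the '...' idea from this reply. It is not the code from the thread: it assumes data is a data frame with columns x and y, that fnc predicts at a held-out point x0, and that the score is the mean squared prediction error. The fitter fit_poly and its degree argument are hypothetical:

```r
# Leave-one-out cross-validation; extra arguments are forwarded to fnc.
loocv <- function(data, fnc, ...) {
  n <- nrow(data)
  score <- 0
  for (i in seq_len(n)) {
    # Fit without observation i, then predict at the held-out x value.
    yhat <- fnc(x = data$x[-i], y = data$y[-i], x0 = data$x[i], ...)
    score <- score + (data$y[i] - yhat)^2
  }
  score / n
}

# Hypothetical fitter: polynomial regression evaluated at x0.
fit_poly <- function(x, y, x0, degree = 1) {
  coefs <- coef(lm(y ~ poly(x, degree, raw = TRUE)))
  sum(coefs * x0^(0:degree))
}

set.seed(1)
d <- data.frame(x = 1:20)
d$y <- 2 * d$x + 1 + rnorm(20)

cv_linear    <- loocv(d, fit_poly)               # uses the default degree
cv_quadratic <- loocv(d, fit_poly, degree = 2)   # degree passed through ...
```

Functions are ordinary values in R, so passing fit_poly by name is all that is needed; no pointer syntax is involved.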

Re: [R] function pointer question

2010-04-25 Thread Giovanni Azua
gards, Giovanni On Apr 26, 2010, at 1:38 AM, Giovanni Azua wrote: > Hello, > > I have the following function that receives a "function pointer" formal > parameter name "fnc": > > loocv <- function(data, fnc) { > n <- length(data.x) > score <

[R] function pointer question

2010-04-25 Thread Giovanni Azua
Hello, I have the following function that receives a "function pointer" formal parameter name "fnc": loocv <- function(data, fnc) { n <- length(data.x) score <- 0 for (i in 1:n) { x_i <- data.x[-i] y_i <- data.y[-i] yhat <- fnc(x=x_i,y=y_i) score <- score + (y_i - yhat)^2

Re: [R] interpreting acf plot

2010-04-17 Thread Giovanni Azua
Hello Denis, (1) I appreciate your feedback, however, I feel I have all the right to ask a specific question related R namely what's the interpretation of the acf function plot. I gave away the information that it is a homework because many times people before helping ask what's the context for

[R] interpreting acf plot

2010-04-17 Thread Giovanni Azua
, Giovanni # = # Computational Statistics # Series 4 # Author: Giovanni Azua # Date: 16 April 2010 # = rm(list=ls())

Re: [R] create space for a matrix

2010-03-28 Thread Giovanni Azua Garcia
Hi Leo, see the matrix function e.g. m <- matrix(0, nrow=1, ncol=3) then you can use functions like rbind or cbind to create bigger ones. I am a newbie so double check everything :) HTH, Best regards, Giovanni On Mar 29, 2010, at 8:37 AM, leobon wrote: > > Hello all, > I want to creat a sp
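A minimal sketch of the matrix(), rbind() and cbind() calls mentioned in this reply; the values appended are arbitrary:

```r
# Start with a 1 x 3 matrix of zeros.
m <- matrix(0, nrow = 1, ncol = 3)

# Append a row, then a column (its length must match the row count).
m <- rbind(m, c(1, 2, 3))
m <- cbind(m, c(9, 9))

# If the final size is known in advance, allocate it once and fill it in.
full <- matrix(NA_real_, nrow = 2, ncol = 4)
full[1, ] <- c(0, 0, 0, 9)
full[2, ] <- c(1, 2, 3, 9)
```

Growing a matrix with rbind() inside a long loop copies it on every step, so preallocation is usually the safer habit for large results.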

[R] data fitting and confidence band

2010-03-27 Thread Giovanni Azua Garcia
Hello, I am fitting data using different methods e.g. Local Polynomial and Smoothing splines. The data is generated out of a true function model with added normally distributed noise. I would like to know "how often the confidence band for all points simultaneously contain all true values". I