Hello,
I would like to use a parametric TS model and predictor as benchmark to
compare against other ML methods I'm employing. I currently build a simple
e.g. ARIMA model using the convenient auto.arima function like this:
library(forecast)
df <- read.table("/Users/bravegag/data/myts.dat")
# btw
Hi Steve,
IMO this problem does not need a classifier but rather a database and a
simple query. I would just build a database with all city names including
the geo information, and then say whether it is north or south exactly.
If there was such a "rule" (which I doubt) I would expect it to have
Hello,
This solves my problem in a horribly inelegant way that works:
df <- data.frame(n=newInput$n, iter=newInput$iter, Error=newInput$Error,
Duality_Gap=newInput$Duality, Runtime=newInput$Acc)
df_last <- aggregate(x=df$iter, by=list(df$n), FUN=max)
names(df_last)[names(df_last)=="Group.1"] <-
Hello,
I bumped into the following funny use-case. I have too much data for a given
plot. I have the following data frame df:
> str(df)
'data.frame': 5015 obs. of 5 variables:
$ n : Factor w/ 5 levels "1000","2000",..: 1 1 1 1 1 1 1 1 1 1 ...
$ iter : int 10 20 30 40 50 60
Hello,
I am using the recipe below to place plots side by side:
http://wiki.stdout.org/rcookbook/Graphs/Multiple%20graphs%20on%20one%20page%20%28ggplot2%29/
How can I reduce or customize the horizontal spacing between the grid cells? I
have researched the Grid package but can't find the way to
00
> 16 80Step1 0.0
> 17 81Step1 0.0
> 18 102Step2 0.72146
> 19 204Step2 0.000230161
> 20 64 10Step2 0.003956920
> 21 64 20Step2 0.004390998
> 22 64 30Step2 0.004326610
> 23 64 40S
Hello,
What would be the best set of R functions to parse and transform a file?
My file looks as shown below. I would like to plot this data and I need to
parse it into a single data frame that sort of "transposes the data" with the
following structure:
> df <- data.frame(n=c(1,1,2,2),iter=c(
Hello,
Is there an R function that given a linear regression solution for a data set
will answer in the most efficient way whether a new data point shifts the
solution or not? or whether the new solution would differ by less than some
error.
I need this in the context of an iterative method an
, Ben Bolker wrote:
> Giovanni Azua gmail.com> writes:
>
>>
>> Hello,
>>
>> I have tried reading the documentation and googling for the answer but
> reviewing the online matches I end up
>> more confused than before.
>>
>> My problem is appa
Hello,
I have tried reading the documentation and googling for the answer but
reviewing the online matches I end up more confused than before.
My problem is apparently simple. I fit a glm model (2^k experiment), and then I
would like to predict the response variable (Throughput) for unseen fact
> a modicum of effort
>
> Michael
>
> On Wed, Dec 7, 2011 at 5:13 PM, Giovanni Azua wrote:
>> Hello,
>>
>> I have a data frame that looks like this (containing interarrival times):
>>
>>> str(df)
>> 'data.frame': 18233
Hello,
I have a data frame that looks like this (containing interarrival times):
> str(df)
'data.frame': 18233 obs. of 1 variable:
$ Interarrival: int 135 806 117 4 14 1 9 104 169 0 ...
> head(df)
Interarrival
1 135
2 806
3 117
4 4
5 14
6
Hello,
Can anyone provide or point me to a good setup for the listings latex package
that would produce nice R-syntax highlighting?
I am using an example I found on the internet for setting up listings like this:
\lstset{
language=R,
basicstyle=\scriptsize\ttfamily,
commentstyle=\ttfamily\color{gray
On Nov 22, 2011, at 3:52 PM, Liviu Andronic wrote:
> On Tue, Nov 22, 2011 at 2:09 PM, Giovanni Azua wrote:
>> Mr. Gunter did not read/understand my problem, and there were no useful tips
>> but only ad hominem attacks. By your side-taking I suspect you are in the
>> sam
On Nov 22, 2011, at 10:35 AM, Joshua Wiley wrote:
> It is true the way you use general lists is not our business, but the
> R-help list is a community and there are community rules. One of
I meant that my use of the lists is none of __his__ business; I wasn't referring
to you or other people in
On Nov 21, 2011, at 8:31 PM, Bert Gunter wrote:
> we disagree is that I think data analysts with limited statistical
> backgrounds should consult with local statisticians instead of trying
> to muddle through on their own thru lists like this. This is not meant
I think that people lacking reading
Hello Rob,
Thank you for your suggestions. I tried glm too without success. Anyhow I
include all the information just in case someone with good knowledge can give
me a hand with this. I take log of the response variable because:
- its values span across multiple orders of magnitudes
- the diag
that
> point 4 is a widely shared problem among posters here.
>
> Cheers,
> Bert
>
> On Mon, Nov 21, 2011 at 5:02 AM, Giovanni Azua wrote:
>> Hello,
>>
>> Couple of clarifications:
>> - A,B,C,D are factors and I am also interested in possible interactio
ir 2-way interactions ...
Thanks in advance,
Best regards,
Giovanni
On Nov 21, 2011, at 12:04 PM, Giovanni Azua wrote:
> Hello,
>
> I know there are plenty of people in this group who can give me a good answer
> :)
>
> I have a 2^k model where k=4 like this:
> Model 1) R~A*B*C*D
>
Hello,
I know there are plenty of people in this group who can give me a good answer :)
I have a 2^k model where k=4 like this:
Model 1) R~A*B*C*D
If I use the "*" in R among all elements it means to me to explore all
interactions and include them in the model i.e. I think this would be the so
Hello,
I would like to reinforce my anova results using PCA i.e. which factors are most
important because they explain most of the variance (i.e. signal) of my 2^k*r
experiment. However, I get the following error while trying to run PCA:
> throughput.prcomp <-
> prcomp(~No_databases+Partitionin
Hello,
I currently run aov in the following way:
> throughput.aov <-
> aov(log(Throughput)~No_databases+Partitioning+No_middlewares+Queue_size,data=throughput)
> summary(throughput.aov)
Df Sum Sq Mean Sq F value Pr(>F)
No_databases 1 184.68 184.675 136.6945 < 2.2e-16
Hello,
I generate box plots from my data like this:
qplot(x=xxx,y=column,data=data,geom="boxplot") + xlab("xxx") + ylab(ylabel) +
theme_bw() + scale_y_log10() + geom_jitter(alpha=I(1/10))
The problem is that I see a lot of points above the maximum at the same level as
some outliers. It looks ver
Hello,
I currently get anova results out of the aov function (see below) I use the
model.tables and I believe it gives me back the model parameters of the fit
(betas), however I don't see the intercept (beta_0) and don't understand what
the "rep" output means and there is no description in the
le to designate the
> replicates and use it as a blocking factor in the ANOVA. If you want
> to treat the replicates as a random rather than a fixed factor, then
> look into the nlme or lme4 packages.
>
> HTH,
> Dennis
>
> On Sun, Nov 13, 2011 at 4:33 PM, Giovanni Azua wrot
Hello,
I have one replication (r=1 of the 2^k*r) of a 2^k experimental design in the
context of performance analysis i.e. my response variables are Throughput and
Response Time. I use the "aov" function and the results look ok:
> str(throughput)
'data.frame': 286 obs. of 7 variables:
$ Time
Hello,
When I try to use TukeyHSD in the following way it shows the confidence
interval corresponding to the last factor only.
> throughput.aov <-
> aov(Throughput~No_databases+Partitioning+No_middlewares+Queue_size,data=throughput)
plot(TukeyHSD(throughput.aov)) # I expected here to see the c
Never mind, found it, it is the expand.grid function.
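For the archive, a minimal sketch of how expand.grid can enumerate the runs of
a 2^k design. The factor names A to D and the -1/+1 level coding are
assumptions for illustration, not taken from the original experiment:

```r
# Hypothetical factors A..D, each at two coded levels (-1 = low, +1 = high)
design <- expand.grid(A = c(-1, 1),
                      B = c(-1, 1),
                      C = c(-1, 1),
                      D = c(-1, 1))

nrow(design)   # one row per factor-level combination (2^4 runs)
head(design)   # first few runs; the first factor varies fastest
```

Passing factor() vectors or character level names instead of -1/+1 works the
same way if labelled levels are preferred.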
On Nov 13, 2011, at 3:25 PM, Giovanni Azua wrote:
> Hello,
>
> While looking for info on 2^k experimental design and anova I remember I saw
> somewhere there was a function to generate all the experiments. I can't fin
On Nov 13, 2011, at 3:23 PM, David Winsemius wrote:
>>> Please read both my comments and the FAQ more carefully . You are
>>> inadequately considering the information that has been offered to you.
>>>
>> Ok you wanted to make sure I have to read the FAQ well I didn't have to :)
>> Googling usin
Hello,
While looking for info on 2^k experimental design and anova I remember I saw
somewhere there was a function to generate all the experiments. I can't find
the function anymore can anyone suggest?
The function takes as input the factors and levels and generates all the
experiments. I kno
Hello David,
On Nov 13, 2011, at 5:20 AM, David Winsemius wrote:
>> However, when executing plot_raw which invokes dev.new(..) all windows come
>> out blank whereas if I execute each file outside of a loop then I can see
>> the plots properly.
>
> Perhaps ...(you did not say what package this p
Hello,
I have a loop where I iterate performance data files within a folder, parse and
plot them in one shot (see below).
However, when executing plot_raw which invokes dev.new(..) all windows come out
blank whereas if I execute each file outside of a loop then I can see the plots
properly. Wh
Hello,
Can anyone point me to an online tutorial or book containing the easiest way to
do ANOVA over the result data from a 2^k*r experiment. It is not clear to me if
I can pass the raw data corresponding to each experiment or just the summarized
data i.e. mean, sse, std, etc.
I would like to
Hello,
Suppose I have the dataset shown below. The number of observations is too
massive to get a nice geom_point and smoother on top. What I would like to do
is to bin the data first. The data is indexed by Time (minutes from 1 to 120
i.e. two hours of System benchmarking).
Option 1) group th
,
Giovanni
#
=
# Advanced Systems Lab
# Milestone 1
# Author: Giovanni Azua
# Date: 22 October 2011
#
=
Hello,
This is one problem at a time :)
I have a data frame df that looks like this:
time partitioning_mode workload runtime
1 1 sharding query 607
2 1 sharding query 85
3 1 sharding query 52
4 1 sharding query
Hi Dennis,
Thank you very nice :)
Best regards,
Giovanni
On Oct 23, 2011, at 6:55 PM, Dennis Murphy wrote:
> Hi:
>
> Here's one approach:
>
> # Function to process a list component into a data frame
> ff <- function(x) {
> data.frame(time = x[1], partitioning_mode = x[2], workload = x[3],
Hello,
I used R a lot one year ago and now I am a bit rusty :)
I have my raw data which correspond to the list of runtimes per minute (minute
"1" "2" "3" in two database modes "sharding" and "query" and two workload types
"query" and "refresh") and as a list of char arrays that looks like this:
Hi Josh,
Thank you for your feedback, after a lot of trial and error the problem is
finally solved.
To solve this problem, I tried in this order:
1) uninstalling the two packages "Matrix" and "lme4" and reinstalling them.
2) uninstalling doBy and reinstalling it with and without 1)
3) upgrading t
Hello,
How can I fix this? I have the latest version of R 2.13.2 and I use Mac OS X
10.7.2
> library(doBy)
Loading required package: lme4
Error in dyn.load(file, DLLpath = DLLpath, ...) :
function 'cholmod_l_start' not provided by package 'Matrix'
Error: package 'lme4' could not be loaded
> lib
Hello,
I upgraded my Mac R version to the newest 2.11.1, then I ran the option to
update all packages but there was an error related to fetching one of those and
the process stopped. I retried updating all packages but nothing happens.
Although all my course project scripts work perfectly is th
Hello Hadley,
Thank you very much for your help! I have just received your book btw :)
On May 16, 2010, at 6:16 PM, Hadley Wickham wrote:
>Hi Giovanni,
>
>Have a look at the classifly package for an alternative approach that
>works for all classification algorithms. If you provided a small
>repr
Hello,
I managed to "linearize" my LDA decision boundaries now I would like to call
abline three times but be able to specify the exact x range. I was reading the
doc but it doesn't seem to support this use case. Are there alternatives? The
reason why I use abline is because I first call plot t
Hello,
I have a labelled dataset with three classes. I have computed manually the LDA
hyperplanes that separate the classes from each other i.e.
\hat{\delta}_j(x)=x^Tb_j + c_j where b_j \in \mathbb{R}^p and c_j \in \mathbb{R}
my concrete b_j looks like e.g.
b_j <- rbind(1,2)
c_j <- 3
How can I
> Hi Giovanni,
>
> curve(1/(1+exp(5.0993-0.1084*x)), 0, 100)
>
> HTH,
> Jorge
>
>
> On Sat, May 15, 2010 at 12:43 AM, Giovanni Azua wrote:
> Hello,
>
> I'd like to plot the logistic function for a specific model like this:
>
> > plot(formula=y~1/(1
Hello,
I'd like to plot the logistic function for a specific model like this:
> plot(formula=y~1/(1+exp(5.0993-0.1084*x)),data=data.frame(x=seq(0,100,length.out=1000)))
Error in is.function(x) : 'x' is missing
However, I get the 'x' is missing error above and don't know how to fix it ...
Can a
Hello Jim,
Very nice example! thank you!
Best regards,
Giovanni
On May 14, 2010, at 11:50 AM, Jim Lemon wrote:
> On 05/14/2010 07:31 PM, Giovanni Azua wrote:
>> Hello,
>>
>> I could not find an easy way to have the plot function not display the
>> default x and y-a
Hello,
I found the answer here:
http://www.statmethods.net/advgraphs/axes.html
basically plot(...,axes=FALSE,...) ## avoids default axis labels
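A minimal sketch of the two-step approach, assuming base graphics. The data
and the tick positions below are made up for illustration:

```r
# Illustrative data; x, y and the tick positions are assumptions for this sketch
x <- seq(0, 100, by = 10)
y <- x^2
ticks_x <- c(0, 50, 100)
ticks_y <- c(0, 2500, 10000)

plot(x, y, axes = FALSE, xlab = "x", ylab = "y")  # draw points without default axes
axis(side = 1, at = ticks_x)                      # x-axis ticks only at chosen points
axis(side = 2, at = ticks_y)                      # y-axis ticks only at chosen points
box()                                             # redraw the frame that axes = FALSE removed
```

axis() also accepts a labels argument if the tick text should differ from the
tick positions.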
Best regards,
Giovanni
On May 14, 2010, at 11:31 AM, Giovanni Azua wrote:
> Hello,
>
> I could not find an easy way to have the plot fun
Hello,
I could not find an easy way to have the plot function not display the default
x and y-axis labels, I would like to customize it to show only points of
interest ... I would like to:
1- call plot that shows no x-axis and y-axis labels
2- call axis specifying the exact points of interest fo
Hello Ista,
On May 1, 2010, at 8:37 PM, Ista Zahn wrote:
> Hi Giovanni,
> A reproducible example would help. Also, since I think this will be
> tricky, it might be a good idea to post it to the ggplot2 mailing list
> (you can register at http://had.co.nz/ggplot2/ ).
>
> Best,
> Ista
First, thank
Hello,
I have three method types and 100 generalization errors for each, all in the
range [0.65,0.81]. I would like to make a stacked histogram plot using ggplot2
with this data ...
Therefore I need a data frame of the form e.g.
Method GE
-- --
On May 1, 2010, at 6:48 PM, steven mosher wrote:
> I was talking with another guy on the list about this very topic.
>
> A simple example would help.
>
> first a sample C struct, and then how one would do the equivalent in R.
>
> In the end I suppose one wants to do an 'array' of these structs
On May 1, 2010, at 5:04 PM, (Ted Harding) wrote:
> Well, 'list' must be pretty close! The main difference would be
> that in C the structure type would be declared first, and then
> applied to create an object with that structure, whereas R
> lists are created straight off. If you want to set u
Hello,
I use the following function "bootstrapge" to calculate (and compare) the
generalization error of several bootstrap implementations:
##
## Calculates and returns a coefficient corresponding to the generalization
## error. The formula for the bootstrap generalization error is:
## $N^{-1}\
Hello,
What would be in R the closest match to a c-struct? e.g. data.frame requires
all elements to be of the same length ... or is there a way to circumvent this?
TIA,
Best regards,
Giovanni
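For reference, a minimal sketch of a list used as a struct-like record. Unlike
a data.frame, its fields may differ in both type and length. The field names
are hypothetical:

```r
# A list can hold fields of different types and lengths, much like a C struct.
# The field names here are hypothetical.
point <- list(id = 1L, label = "p1", coords = c(0.5, 1.5, 2.5))

point$coords          # access a field by name
point$label <- "p2"   # modify a field in place

# An "array of structs" can be sketched as a list of such lists
points <- list(point, list(id = 2L, label = "p3", coords = c(1, 2)))
ids <- vapply(points, function(p) p$id, integer(1))
```

Nothing enforces that every element has the same fields, so a small
constructor function is a common way to keep the records consistent.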
__
R-help@r-project.org mailing list
https://stat.ethz.ch/ma
Hello,
I create a simple ggplot that only shows a straight line. I then add three
datasets of CI using the geom_errorbar function. The problem is that I can't
find any way to have the legend showing up ... I need to show what each color
of the CIs corresponds to i.e. which method.
Can anyone
Hello David,
On Apr 30, 2010, at 11:00 PM, David Winsemius wrote:
> Note: Loops may be just as fast or faster than apply calls.
>
How come!? Is this also true for other similar functions: lapply, tapply and
sapply?
Then the only advantage of the above is syntactic sugar?
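A rough sketch of the comparison, assuming the statistic is the sample mean of
made-up data. Both versions do the same work per replicate, so the difference
is mostly in how the iteration is written. Actual timings depend on the
statistic and the R version, so system.time on the real code is the only
reliable check:

```r
set.seed(1)
x <- rnorm(50)      # made-up data
B <- 200            # number of bootstrap replicates
N <- length(x)

# Draw all resampling indices up front so both versions use the same samples
idx <- replicate(B, sample.int(N, size = N, replace = TRUE), simplify = FALSE)

# Explicit loop with a preallocated result vector
theta_loop <- numeric(B)
for (b in seq_len(B)) {
  theta_loop[b] <- mean(x[idx[[b]]])
}

# Same computation written with vapply
theta_apply <- vapply(idx, function(i) mean(x[i]), numeric(1))

identical(theta_loop, theta_apply)
```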
>>
>> indices
Hello,
I have a bootstrap implementation loop that I would like to replace by a faster
"batch operator".
The loop is as follows:
for (b in 1:B) {
indices <- sample(1:N, size=N, replace=TRUE) # sample n elements with
replacement
theta_star[b,] = statistic(data,indices) # exe
Hello,
I have just ordered the "ggplot2: Elegant Graphics for Data Analysis (Use R)"
but while it arrives :) can anyone please show me how to setup and add a simple
legend to a ggplot?
This is my use case, I need a legend showing CI "Classic", "Own bootstrap", "R
bootstrap":
library(ggplot2)
Hello,
After installing gfortran from http://r.research.att.com/gfortran-4.2.3.dmg it
finally works! see below.
Thank you all.
@Ista Zahn: Looks fantastic! :) thank you so much! ... is there a way to have a
small circle on the true value?
Best regards,
Giovanni
> install.packages("Hmisc", d
Hello David,
On Apr 30, 2010, at 6:00 PM, David Winsemius wrote:
> Looks like you do not have the RTools bundle and perhaps not the XCode
> framework either?
>
> I am not suggesting that you do so, since it appears you are not conversant
> with compiling source code packages. If I am wrong abo
Hello Zahn,
On Apr 30, 2010, at 4:35 PM, Ista Zahn wrote:
> Hi Giovanni,
> I think the ggplot2 package might help you out here. Do you want
> something like this?
Thank you for your suggestion however I could not give it a try since I landed in
the same issue being reported about the Hmisc packag
Hello,
I need to plot multiple confidence intervals for the same model parameter e.g.
so for the same value of the parameter in point x_1 I would like to see four
different confidence intervals so that I can compare the accuracy e.g. boot
basic vs normal vs my own vs classic lm CI etc.
I li
Hello Jan,
On Apr 26, 2010, at 8:56 AM, Jan van der Laan wrote:
> You can use the '...' for that, as in:
>
> loocv <- function(data, fnc, ...) {
> n <- length(data.x)
> score <- 0
> for (i in 1:n) {
> x_i <- data.x[-i]
> y_i <- data.y[-i]
> yhat <- fnc(x=x_i,y=y_i, ...)
> score <- score +
gards,
Giovanni
On Apr 26, 2010, at 1:38 AM, Giovanni Azua wrote:
> Hello,
>
> I have the following function that receives a "function pointer" formal
> parameter named "fnc":
>
> loocv <- function(data, fnc) {
> n <- length(data.x)
> score <
Hello,
I have the following function that receives a "function pointer" formal
parameter named "fnc":
loocv <- function(data, fnc) {
n <- length(data.x)
score <- 0
for (i in 1:n) {
x_i <- data.x[-i]
y_i <- data.y[-i]
yhat <- fnc(x=x_i,y=y_i)
score <- score + (y_i - yhat)^2
Hello Denis,
(1) I appreciate your feedback, however, I feel I have every right to ask a
specific question related to R, namely what's the interpretation of the acf
function plot. I gave away the information that it is a homework because many
times people before helping ask what's the context for
,
Giovanni
#
=
# Computational Statistics
# Series 4
# Author: Giovanni Azua
# Date: 16 April 2010
#
=
rm(list=ls())
Hi Leo,
see the matrix function e.g.
m <- matrix(0, nrow=1, ncol=3)
then you can use functions like rbind or cbind to create bigger ones.
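A small sketch of growing the matrix with rbind and cbind. The values are
arbitrary:

```r
m <- matrix(0, nrow = 1, ncol = 3)   # 1 x 3 matrix of zeros

m2 <- rbind(m, c(1, 2, 3))           # append a row    -> 2 x 3
m3 <- cbind(m2, c(9, 9))             # append a column -> 2 x 4

dim(m2)
dim(m3)
```

Growing a matrix repeatedly inside a loop copies it each time, so when the
final size is known it is usually better to allocate it once with matrix()
and fill it in.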
I am a newbie so double check everything :)
HTH,
Best regards,
Giovanni
On Mar 29, 2010, at 8:37 AM, leobon wrote:
>
> Hello all,
> I want to create a sp
Hello,
I am fitting data using different methods e.g. Local Polynomial and Smoothing
splines. The data is generated out of a true function model with added normally
distributed noise.
I would like to know "how often the confidence band for all points
simultaneously contain all true values". I