[R] data.frame, converting row data to columns
I have a data frame something like:

    name wrist nLevel       emot
1   4094  3.34      1 frustrated
2   4094  3.94      1 frustrated
3   4094    NA      1 frustrated
4   4094  3.51      1 frustrated
5   4094  3.81      1 frustrated
6   4101  2.62      4    excited
7   4094  2.65      1 frustrated
8   4101    NA      4    excited
9   4101  0.24      4    excited
10  4101  0.23      4    excited

I am trying to change it to this:

name nLevel       emot   w1   w2   w3   w4   w5   w6
4094      1 frustrated 3.34 3.94   NA 3.51 3.81 2.65
4101      4    excited 2.62   NA 0.24 0.23   NA   NA

The nLevel and emot values never vary within a name, so there can be one row per name. But I need the wrist measurements to be in the same row. The number of wrist measurements varies from name to name, so the shorter rows can be filled in with NAs. I really just need help with reshaping the data frame.

I think I had some success with melt:

    x = melt.data.frame(bsub, id.vars=c("name","nLevel","emot"), measure.vars=c("wrist"))

But I can't figure out the cast to get the wrist values into the rows.

Thanks
ds

David H. Shanabrook
dhsha...@acad.umass.edu
256-1019 (home)

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
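One possible way to get the layout asked for above, sketched in base R rather than with the melt/cast route the poster started on. It assumes the data frame is called bsub, as in the melt call, and that the order of the wrist values within each name is simply their row order; the helper column obs is hypothetical and only exists to number the measurements.

```r
# Rebuild the example data from the post.
bsub <- data.frame(
  name   = c(4094, 4094, 4094, 4094, 4094, 4101, 4094, 4101, 4101, 4101),
  wrist  = c(3.34, 3.94, NA, 3.51, 3.81, 2.62, 2.65, NA, 0.24, 0.23),
  nLevel = c(1, 1, 1, 1, 1, 4, 1, 4, 4, 4),
  emot   = c(rep("frustrated", 5), "excited", "frustrated", rep("excited", 3))
)

# Number the measurements within each name (1, 2, 3, ... in row order).
# 'obs' is a helper column introduced for this sketch.
bsub$obs <- ave(seq_along(bsub$name), bsub$name, FUN = seq_along)

# Spread the wrist values into one column per measurement number.
# Names with fewer measurements are padded with NA automatically.
wide <- reshape(bsub,
                idvar     = c("name", "nLevel", "emot"),
                timevar   = "obs",
                v.names   = "wrist",
                direction = "wide",
                sep       = "")

# Rename wrist1..wrist6 to w1..w6 and drop the leftover row names.
names(wide) <- sub("^wrist", "w", names(wide))
rownames(wide) <- NULL
print(wide)
```

Because nLevel and emot are constant within a name, listing them in idvar keeps them as ordinary columns in the one-row-per-name result.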
[R] question on lm or glm matrix of coeficients X test data terms
Hi,

Is there an easy way to get the calculated weights in a regression equation? For example, if my model has 2 variables, 1 and 2, with coefficients .05 and .6, how can I get the computed values for a test dataset for each coefficient?

data
var1,var2
10,100

So I want to get .5, 60 back in a vector.

This is a one-row example, but I would want to get a matrix of multiplied-out coefficients and terms for use in comparing the contribution of each variable to the final score, as in a scorecard using logistic regression.

Please advise.

thanks
Dhruv
Re: [R] question on lm or glm matrix of coeficients X test data terms
thanks Jorge. I appreciate your quick help.

Will this work if I have 20 columns of data but my regression only has 5 variables?

I am looking for something generic where I can give it my model and test data and get back a vector of the multiplied coefficients (with no hard coding). When predict is called with an input model and data, R must be multiplying all the coefficients by the variables and summing the result, but is there a way to get the components of the regression terms stored in a matrix before they are added?

The idea is to build n models with various terms and, after producing a prediction, list the top 3 variables that had the biggest impact for that particular set of predictor values. For example, if I build a model to predict default of loans, I would then need to list the top factors in the model that can be used to explain why the loan is risky. With 10-16 variables, which may or may not be present for each case, a different 2 or 3 variables may have led to the prediction each time.

Dhruv

--- On Mon 07/07, Jorge Ivan Velez <[EMAIL PROTECTED]> wrote:
Date: Mon, 7 Jul 2008 20:12:53 -0400
Subject: Re: [R] question on lm or glm matrix of coeficients X test data terms

Dear Dhruv,

Try also:

    # data set
    set.seed(123)
    X=matrix(rpois(10,10),ncol=2)

    # Function to estimate your outcome
    outcome=function(x,betas){
      if(length(x)!=length(betas)) stop("x and betas are of different length!")
      y=x*betas
      y
    }

    # outcome for beta1=0.05 and beta2=0.6
    t(apply(X,1,outcome,betas=c(0.05,0.6)))

    # outcome for beta1=5 and beta2=6
    t(apply(X,1,outcome,betas=c(5,6)))

HTH,
Jorge
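A minimal sketch of the "give it my model and test data" request, under stated assumptions: it rebuilds the design matrix the same way predict.lm does and multiplies each column by its coefficient. The function name term_products, the toy data, and the column unused are hypothetical; the response is noise-free only so that the fit reproduces the .05 and .6 from the original question.

```r
# Hypothetical training data; 'unused' stands in for columns the model ignores.
train <- data.frame(
  var1   = c(1, 4, 2, 8, 5, 7),
  var2   = c(10, 3, 25, 7, 40, 12),
  unused = c(9, 9, 1, 3, 2, 6)
)
# Noise-free response, so lm() recovers the coefficients 0.05 and 0.6.
train$score <- 0.05 * train$var1 + 0.6 * train$var2
fit <- lm(score ~ var1 + var2, data = train)

# Coefficient-times-value for every model column, one row per test case.
term_products <- function(model, newdata) {
  tt <- delete.response(terms(model))                  # formula without the response
  mf <- model.frame(tt, newdata, xlev = model$xlevels) # keeps factor levels consistent
  X  <- model.matrix(tt, mf, contrasts.arg = model$contrasts)
  beta <- coef(model)
  sweep(X[, names(beta), drop = FALSE], 2, beta, `*`)
}

# Test data may carry extra columns; only var1 and var2 are used.
test <- data.frame(var1 = 10, var2 = 100, unused = 42)
contrib <- term_products(fit, test)
print(round(contrib, 6))
```

The row sums of this matrix equal the linear predictor, so for a glm the products are on the link scale rather than the response scale. Coefficients that lm reports as NA (aliased terms) would give NA products and would need to be handled separately.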
Re: [R] question on lm or glm matrix of coeficients X test data terms
thanks Jorge. I appreciate your multiple improvements.

This still involves hard-coding the coefficients. I wonder if this is what glm and lm are doing. For example, with

    m<-lm(K~a+b,data=data)

m$coefficients would have 0 for all variables except a and b, and then R must be multiplying the weights the same way as your function does.

I will try to use your code with the coefficients from the model, see if that works, and report back what I find tomorrow. Then, if I can add code to return the names of the columns with the 3 highest resulting values, I should be done.

thanks a lot Jorge.

regards,
Dhruv

--- On Mon 07/07, Jorge Ivan Velez <[EMAIL PROTECTED]> wrote:
Date: Mon, 7 Jul 2008 21:42:54 -0400
Subject: Re: [R] question on lm or glm matrix of coeficients X test data terms

That's R: you come up with solutions every time. I hope I'm not bothering you with this. Try also:

    # data set (10 rows, 10 columns)
    set.seed(123)
    X=matrix(rpois(100,10),ncol=10)

    # Function to estimate your outcome
    outcome=function(x,betas){
      if(length(x)!=length(betas)) stop("x and beta have different lengths!")
      y=x*betas
      sum(y)
    }

    # let's assume that you want to include x1, x4, x7 and x9 only
    # by using beta1=0.5, beta4=0.6, beta7=-0.1, beta9=0.3
    betas=c(0.5,0,0,0.6,0,0,-0.1,0,0.3,0)

    # Results
    apply(X,1,outcome, betas=betas)

HTH,
Jorge

On Mon, Jul 7, 2008 at 9:31 PM, Jorge Ivan Velez <[EMAIL PROTECTED]> wrote:

Sorry, I forgot to take the sum over the rows:

    # data set (10 rows, 10 columns)
    set.seed(123)
    X=matrix(rpois(100,10),ncol=10)

    # Function to estimate your outcome
    outcome=function(x,betas){
      if(length(x)!=length(betas)) stop("x and beta have different lengths!")
      y=x*betas
      y
    }

    # let's assume that you want to include x1, x4, x7 and x9 only
    # by using beta1=0.5, beta4=0.6, beta7=-0.1, beta9=0.3
    betas=c(0.5,0,0,0.6,0,0,-0.1,0,0.3,0)

    # Results
    apply(t(apply(X,1,outcome, betas=betas)),1,sum)

HTH,
Jorge

On Mon, Jul 7, 2008 at 9:23 PM, Jorge Ivan Velez <[EMAIL PROTECTED]> wrote:

Dear Dhruv,

It's me again. I've been thinking about it a little bit. If you want to include/exclude variables to estimate your outcome, you could try something like this:

    # data set (10 rows, 10 columns)
    set.seed(123)
    X=matrix(rpois(100,10),ncol=10)

    # Function to estimate your outcome
    outcome=function(x,betas){
      if(length(x)!=length(betas)) stop("x and beta have different lengths!")
      y=x*betas
      y
    }

    # let's assume that you want to include x1, x4, x7 and x9 only
    # by using beta1=0.5, beta4=0.6, beta7=-0.1, beta9=0.3
    betas=c(0.5,0,0,0.6,0,0,-0.1,0,0.3,0)

    # Results
    t(apply(X,1,outcome, betas=betas))

HTH,
Jorge

On Mon, Jul 7, 2008 at 9:11 PM, Jorge Ivan Velez <[EMAIL PROTECTED]> wrote:

Dear Dhruv,

The short answer is no, because the function I built doesn't work for more variables than coefficients (see the "stop" I introduced). You would need some modifications, such as padding the missing coefficients with 1 or 0. For example:

    # data set (10 rows, 10 columns)
    set.seed(123)
    X=matrix(rpois(100,10),ncol=10)
    X

    # Function to estimate your outcome
    outcome=function(x,betas,val){
      k=length(x)
      nb=length(betas)
      # pad the missing coefficients with val
      if(length(x)!=length(betas)) betas=c(betas, rep(val,k-nb))
      y=x*betas
      y
    }

    # beta1=1, beta2=2, the rest is equal to zero
    t(apply(X,1,outcome,betas=c(1,2),val=0))

    # beta1=1, beta2=2, the rest is equal to 1
    t(apply(X,1,outcome,betas=c(1,2),val=1))

HTH,
Jorge
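A hedged sketch of the plan described in this message: take the coefficients from a fitted model and feed them to an outcome()-style function without typing them in by hand. One caveat worth stating: coef(m) holds only the terms that are in the model (plus the intercept), not a zero for every other column, so the zeros have to be filled in explicitly. The data, the column names K, a, b, x1..x7, and the vector name predictors are all made up for illustration, and the approach assumes plain numeric main effects (no factors or interactions).

```r
# Hypothetical data: response K, model variables a and b, seven unused columns.
set.seed(123)
dat <- as.data.frame(matrix(rpois(100, 10), ncol = 10))
names(dat) <- c("K", "a", "b", paste0("x", 1:7))

m <- lm(K ~ a + b, data = dat)

# Start with a zero weight for every predictor column, then overwrite the
# entries whose names appear in the fitted coefficients.
predictors <- setdiff(names(dat), "K")
betas <- setNames(numeric(length(predictors)), predictors)
shared <- intersect(names(coef(m)), predictors)
betas[shared] <- coef(m)[shared]

# Same element-wise product as in the thread's outcome() function.
outcome <- function(x, betas) {
  if (length(x) != length(betas)) stop("x and betas have different lengths!")
  x * betas
}

# One row per case, one column per predictor; unused columns contribute 0.
contrib <- t(apply(as.matrix(dat[predictors]), 1, outcome, betas = betas))
print(head(round(contrib, 3)))
```

Adding the intercept to the row sums of contrib reproduces the fitted values, which is a quick way to check that the weights were aligned with the right columns.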
Re: [R] question on lm or glm matrix of coeficients X test data terms
Hi,

I found some of what I was looking for. Using the following, I can get a matrix of the regression coefficients multiplied out by the variable data:

    g <- predict(comodel, type='terms', data4)
    m <- cbind(data4, g)

What remains is how to pick, for each data row, the 3-4 columns with the highest values. I need to get the column names of the top 3 coefficient/variable products from this matrix. I could loop through each row, pick the 3 highest products, and then get the column names for those 3. Is there an easy way to do this with an R function?

thanks
Dhruv
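A small illustration of what predict(..., type = "terms") returns, since it is easy to read the result as plain coefficient-times-value. For an lm with an intercept, each term is centered: the column for a is the coefficient of a times (a minus the training mean of a), and the amount removed by centering is stored in the "constant" attribute. The names comodel and data4 follow the message above; the data itself is simulated and hypothetical.

```r
# Hypothetical training data and model.
set.seed(1)
train <- data.frame(a = rnorm(50), b = rnorm(50), unused = rnorm(50))
train$K <- 1 + 0.5 * train$a - 2 * train$b + rnorm(50, sd = 0.1)
comodel <- lm(K ~ a + b, data = train)

# Hypothetical test rows; the extra column is ignored by predict().
data4 <- data.frame(a = c(0, 1, 2), b = c(1, 0, -1), unused = 0)

# One column per model term, one row per test case.
g <- predict(comodel, newdata = data4, type = "terms")
print(g)

# The terms are centered, so the prediction is rebuilt by adding back
# the stored constant to the row sums.
rebuilt <- rowSums(g) + attr(g, "constant")

# The column for 'a' is coef * (value - training mean), not coef * value.
centered_a <- coef(comodel)[["a"]] * (data4$a - mean(train$a))
```

Centering shifts every value in a column by the same amount, so the ranking of rows within a column is unchanged, but the ranking of columns within a row can differ from the one obtained with raw coefficient-times-value products. For a glm, the terms are reported on the scale of the linear predictor.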
Re: [R] question on lm or glm matrix of coeficients X test data terms
thanks Jorge. This is great!

regards,
Dhruv

--- On Tue 07/08, Jorge Ivan Velez <[EMAIL PROTECTED]> wrote:
Date: Tue, 8 Jul 2008 20:45:06 -0400
Subject: Re: [R] question on lm or glm matrix of coeficients X test data terms

Hi Dhruv,

Thanks for the data. Here is what you need so far:

    # Data set
    yourdata=structure(c(0.024575733, 0.775009533, 0.216823408, 0.413676529,
      0.270053406, 0.579946123, 0.634013362, 0.928518128,
      0.405825012, 0.862204203, 0.856558209, 0.187865722,
      0.818774004, 0.918802224, 0.469496189, 0.240583922,
      0.390818789, 0.767969261, 0.13339806, 0.986023924,
      0.442655239, 0.437441939, 0.313678293, 0.952285599,
      0.528433974, 0.328609537, 0.84584467, 0.608194527,
      0.96139021, 0.485592658, 0.251827955, 0.289777559),
      .Dim = c(4L, 8L),
      .Dimnames = list(NULL, c("A", "B", "C", "D", "E", "F", "A:B", "G:H")))

    # Column names used inside ftopk (not defined in the original post)
    cnames=colnames(yourdata)

    # Function to select the top k values (names)
    ftopk=function(x,top=3){
      res=cnames[order(x, decreasing = TRUE)][1:top]
      paste(res,collapse=";",sep="")
    }

    # Application of the function to each row, keeping the top 3 columns
    topk=apply(yourdata,1,ftopk,top=3)

    # Result
    data.frame(yourdata,topk)

           A         B         C         D         E         F       A.B       G.H      topk
1 0.02457573 0.2700534 0.4058250 0.8187740 0.3908188 0.4426552 0.5284340 0.9613902 G:H;D;A:B
2 0.77500953 0.5799461 0.8622042 0.9188022 0.7679693 0.4374419 0.3286095 0.4855927     D;C;A
3 0.21682341 0.6340134 0.8565582 0.4694962 0.1333981 0.3136783 0.8458447 0.2518280   C;A:B;B
4 0.41367653 0.9285181 0.1878657 0.2405839 0.9860239 0.9522856 0.6081945 0.2897776     E;F;B

HTH,
Jorge

On Tue, Jul 8, 2008 at 8:19 PM, DS <[EMAIL PROTECTED]> wrote:

Hi Jorge,

I am attaching some sample data that looks like the coefficient matrix. In the spreadsheet, for each row I have listed the column names I would want to extract (the ones with the highest values in the row).

Hope this helps.

thanks
regards,
Dhruv

--- On Tue 07/08, Jorge Ivan Velez <[EMAIL PROTECTED]> wrote:
Date: Tue, 8 Jul 2008 19:36:52 -0400
Subject: Re: [R] question on lm or glm matrix of coeficients X test data terms

Dear Dhruv,

Could you please send me part of your data set m? Just 10-20 rows, so I'll have an idea of what you have and what you'd like. I hope you don't mind.

Thanks a lot,
Jorge
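A variation on the ftopk idea from this thread, sketched under assumptions rather than taken from it: the function below reads the column names from the matrix it is given instead of from a global variable, returns one column per rank instead of a pasted string, and can optionally rank by absolute value, which may matter when some contributions are negative. The name top_terms, the by_abs argument, and the toy matrix are hypothetical.

```r
# Return the names of the k largest entries in each row of a named matrix.
top_terms <- function(contrib, k = 3, by_abs = FALSE) {
  stopifnot(is.matrix(contrib), !is.null(colnames(contrib)), k <= ncol(contrib))
  score <- if (by_abs) abs(contrib) else contrib
  picked <- apply(score, 1, function(r) {
    colnames(contrib)[order(r, decreasing = TRUE)[seq_len(k)]]
  })
  # apply() gives k x nrow (or a plain vector when k = 1); normalise, then flip.
  out <- t(matrix(picked, nrow = k))
  colnames(out) <- paste0("top", seq_len(k))
  out
}

# Toy contribution matrix: 2 cases, 4 terms.
contrib <- matrix(c(0.2, -0.9,  0.5, 0.1,
                    0.7,  0.3, -0.1, 0.4),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(NULL, c("a", "b", "c", "d")))

print(top_terms(contrib, k = 2))                 # largest signed values
print(top_terms(contrib, k = 2, by_abs = TRUE))  # largest magnitudes
```

Whether signed or absolute ranking is the right notion of "biggest impact" depends on the scorecard; for explaining why a loan looks risky, the signed version (largest push toward default) may be what is wanted.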
[R] r format questions
Hi,

1) I have noticed that when I use the aggregate function it prints row numbers in the results. For example, aggregating by product:

  group.1    Aggregate
1 ProductA  1000400.00
2 ProductB 23232323.00
3 Missing    232323.00

Is there a way to suppress the numbers in front of the aggregate output? I checked, and they don't look like columns when I do a summary, so I can't -1 them away.

2) Is there an easy way to take my aggregate matrix and format the sum with $ and commas? For example, instead of 10000 it should show $10,000.00.

I am trying to create a report and am piping the aggregate into an xtable and feeding it to R2HTML.

thanks
Dhruv
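Both points can be sketched in base R. The numbers in front of the output are the data frame's row names, not a column, which is why they cannot be dropped with -1; print() has a row.names argument for hiding them. For the currency format, formatC() can add the thousands separator and fixed decimals. The data and the helper name format_dollars are hypothetical.

```r
# Hypothetical sales data matching the totals in the post.
sales <- data.frame(
  product = c("ProductA", "ProductA", "ProductB", "Missing"),
  amount  = c(1000000, 400, 23232323, 232323)
)

# Sum the amounts per product.
totals <- aggregate(amount ~ product, data = sales, FUN = sum)

# Format numbers as dollars with commas and two decimals.
# Note: the result is character, so do this only after all arithmetic is done.
format_dollars <- function(x) {
  paste0("$", formatC(x, format = "f", digits = 2, big.mark = ","))
}
totals$amount <- format_dollars(totals$amount)

# The leading 1, 2, 3 are row names; hide them when printing.
print(totals, row.names = FALSE)
```

When the table goes through xtable, the equivalent switch is, as far as I know, the include.rownames = FALSE argument of its print method; that part is not exercised in the sketch above.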
[R] design question on piping multiple data sets from 1 file into R
Hi,

I have 8 separate queries that I use to get time series information, each dealing with a different set of time series. I run the queries, save the output as a csv file, and then format the data into graphs in Excel.

I wanted to know if there is an elegant and clean way to read in 1 csv file but read the separate matrices, which sit on different rows, into separate R data objects.

If this is easy, then I can read the 8 datasets in the csv file into 8 R objects and pipe them to time series objects for graphs.

thanks
Dhruv
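One way this could work, assuming a file layout the post does not spell out: each query's output starts with its own header row, and the blocks are separated by a blank line (or a line holding only commas, as spreadsheets often write). The function name read_blocks and the demo file contents are hypothetical.

```r
# Split a multi-table CSV into a list of data frames, one per block.
read_blocks <- function(path) {
  lines <- readLines(path)
  # A separator is a line that is empty or contains only commas/whitespace.
  is_sep <- grepl("^[,[:space:]]*$", lines)
  # Every separator starts a new block id; consecutive separators are harmless.
  block_id <- cumsum(is_sep)
  blocks <- split(lines[!is_sep], block_id[!is_sep])
  lapply(unname(blocks), function(b) {
    read.csv(text = paste(b, collapse = "\n"))
  })
}

# Demo with a temporary file holding two small tables.
tmp <- tempfile(fileext = ".csv")
writeLines(c("date,sales",
             "2008-01,10",
             "2008-02,12",
             "",
             "date,visits",
             "2008-01,100",
             "2008-02,90",
             "2008-03,95"), tmp)

res <- read_blocks(tmp)

# Each element is an ordinary data frame; turn a column into a monthly series.
sales_ts <- ts(res[[1]]$sales, start = c(2008, 1), frequency = 12)
print(sales_ts)
```

If the blocks are instead marked by a title row or sit at fixed row positions, the skip and nrows arguments of read.csv may be a simpler route than splitting on separator lines.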