[R] Package: waveslim Error: object of this type is not subsettable
Hi everyone, I am using the wavelets and waveslim packages in R to find the wavelet variance. Here is the code:

return.modwt <- modwt(X, filter="la8", n.levels=5, boundary="periodic", fast=TRUE)
return.modwt.var <- wave.variance(return.modwt, type="nongaussian")

where X is a univariate time series. I am expecting a matrix with 5 rows (one per level) and 3 columns (variance, upper bound, lower bound), but I am getting the message: "no method for coercing this S4 class to a vector". What does this mean? I have installed waveslim version 1.6.4. Thanks in advance!
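One possible diagnosis (my speculation, not confirmed in the thread): the filter= and fast= arguments match wavelets::modwt(), not waveslim::modwt() (which takes wf=). If both packages are loaded, wavelets' S4 "modwt" object may be what wave.variance() receives, which would explain the S4 coercion error. A minimal sketch using waveslim only, with a simulated stand-in series:

library(waveslim)
set.seed(1)
X <- rnorm(512)                                    # stand-in univariate series
X.modwt <- modwt(X, wf = "la8", n.levels = 5, boundary = "periodic")
X.modwt.bw <- brick.wall(X.modwt, wf = "la8")      # remove boundary coefficients
X.var <- wave.variance(X.modwt.bw, type = "nongaussian")  # type as in the post
X.var                                              # variance with lower/upper bounds per level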
[R] NEED HELP : Association in single DTM
I have free-text data in a single text document. I create a corpus, and then a document-term matrix out of it. I can create a word cloud too. But when I do word association on the same DTM using findAssocs(), it always returns numeric(0). For example:

findAssocs(dtm, "king", 0.1)

I read on Stack Overflow that this is because I have a single document. What is the workaround for this?
Re: [R] NEED HELP : Association in single DTM
Hi Boris, In that case, if I have a lot of free-text data (let us assume part of an election speech) in one single text document, and I want to find the association of the top 3 most frequently occurring words with the other words in the speech, what method do I adopt?

On Wed, Nov 15, 2017 at 7:08 PM, Boris Steipe wrote:
> If you consider the definition of a DTM, and that findAssocs() computes
> associations between words as correlations across documents(!), you will
> realize that you can't get what you want from a single document. Indeed, what
> kind of an "association" would you even be looking for?
>
> B.
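One workaround (my suggestion, not something stated in the thread) is to split the single document into pseudo-documents, e.g. one per sentence, so findAssocs() has cross-document variation to correlate. A sketch with tm, where speech_text is a hypothetical stand-in for the real speech:

library(tm)
speech_text <- "The king spoke. The king promised reform. Reform needs votes. Votes need people."
sentences <- unlist(strsplit(speech_text, "(?<=[.!?])\\s+", perl = TRUE))
corp <- VCorpus(VectorSource(sentences))          # one pseudo-document per sentence
dtm  <- DocumentTermMatrix(corp)
freq <- sort(colSums(as.matrix(dtm)), decreasing = TRUE)
top3 <- names(freq)[1:3]                          # three most frequent terms
findAssocs(dtm, top3, 0.2)                        # associations are now computable

How to chunk (sentences, paragraphs, fixed-size windows) is a modelling choice; finer chunks give more "documents" but noisier correlations.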
[R] 10 Minute Time Window Aggregate By Multiple Grouping Variables
Hello All, I have 1M rows of time-stamped information about delivery trucks and their trip-related information from a simulation. Detailed info on the column names and attributes for clarity:

1. id: alphanumeric factor/character
2. datetime.of.trip.start: POSIXct, yyyy-mm-dd hh:mm:ss
3. datetime.of.trip.end: POSIXct, yyyy-mm-dd hh:mm:ss
4. trip.distance: numeric, miles
5. trip.fuel.consumed: numeric, gallons

Close to 2500 trucks, simulated for a year with varying trip behavior. This is what I am trying to accomplish:

1. First, I want to create 10-minute intervals from a chosen start date and end date of the simulation, e.g. 2011-03-30 00:00:00 to 2012-04-01 00:00:00 (approximately 8760 hours * six 10-minute blocks per hour = roughly 52K unique timestamps). If it works, I will look into 15-minute, hourly, and so on. This will be representative of a start.time.index; I want to recreate the same time window with the same frequency for a column named end.time.index, offset by 10 minutes.

2. Go to the raw data, inspect datetime.of.trip.start and datetime.of.trip.end, get the time span, match it with the derived 10-minute intervals (start and end), and distribute columns 4 and 5 (the numeric variables) equally among the 10-minute indices. Single-row example:
- datetime.of.trip.start: 2017-01-01 00:00:00
- datetime.of.trip.end: 2017-01-01 01:00:00
- trip.distance = 60 miles
- trip.fuel.consumed = 6 gallons
Interpretation of the raw data: on January 1, 2017, from midnight to 1 am, the delivery truck traveled 60 miles using 6 gallons of fuel. From index 1 (00:00:00 to 00:10:00) to index 6 (00:50:00 to 01:00:00), the 60 miles and 6 gallons are equally distributed.

3. I want to do this over the entire raw data and get the aggregate sum of the variables of interest (distance and fuel, for example) by the 10-minute start and end intervals across all ids, wherever there is a 10-minute start/end overlap derived from the raw data.

4. Repeat the same with some grouping criteria from the DateTime or a suitable index, for example day of week or truck id.

So far, I have created the 10-minute HH:MM ids and am exploring foverlaps in data.table, period.apply from xts, and converting the time indices into numerics to explore ddply %>% mutate %>% group_by options. I am trying to get a more streamlined version that would work as a time-series object, which would help me get descriptive statistics (aggregates by grouping criteria for sums and means) and do some plotting and time-series analysis (RMA, smoothing, mixed regression models). I got stuck and totally lost when working with zoo and xts: I was creating the index and working on period.apply, but was not sure how to work with two POSIXct column variables (start and end). I have the desired index and the cleaned data; I am looking for a more elegant data.table solution to complete the 10-minute aggregate. Thanks!

# simple illustration of the data
id <- c("A1","A1","A1","A1","A1","A1")
start.time.index <- as.POSIXct(c("2017-01-01 00:00:00","2017-01-01 00:10:00","2017-01-01 00:20:00",
                                 "2017-01-01 00:30:00","2017-01-01 00:40:00","2017-01-01 00:50:00"),
                               format="%Y-%m-%d %H:%M:%S")
end.time.index <- as.POSIXct(c("2017-01-01 00:10:00","2017-01-01 00:20:00","2017-01-01 00:30:00",
                               "2017-01-01 00:40:00","2017-01-01 00:50:00","2017-01-01 01:00:00"),
                             format="%Y-%m-%d %H:%M:%S")
trip.miles <- c(10,10,10,10,10,10)
trip.fuel <- c(1,1,1,1,1,1) # total of 6 gallons in 1 hour, equally divided into 6 bins of a 10-minute window
desired.data <- data.frame(id, start.time.index, end.time.index, trip.miles, trip.fuel)
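A sketch of the foverlaps route (my suggestion; the grid covers only the example day here, and the trip is the single-row example from the post — extend the seq() and plug in the real table for the full problem). It allocates each trip's miles and fuel to the 10-minute bins in proportion to the overlap:

library(data.table)

# 10-minute grid (the "index" table)
grid <- data.table(start = seq(as.POSIXct("2017-01-01 00:00:00"),
                               as.POSIXct("2017-01-01 23:50:00"), by = "10 min"))
grid[, end := start + 600]

# the single-row trip example from the post
trips <- data.table(id = "A1",
                    start = as.POSIXct("2017-01-01 00:00:00"),
                    end   = as.POSIXct("2017-01-01 01:00:00"),
                    trip.miles = 60, trip.fuel = 6)

setkey(grid, start, end)
ov <- foverlaps(trips, grid, type = "any", nomatch = 0L)

# share of each trip's duration falling inside each 10-minute bin
# (foverlaps prefixes the trip's clashing columns with "i.")
ov[, frac := as.numeric(pmin(end, i.end) - pmax(start, i.start), units = "secs") /
             as.numeric(i.end - i.start, units = "secs")]

agg <- ov[, .(miles = sum(trip.miles * frac),
              fuel  = sum(trip.fuel  * frac)),
          by = .(start, end)]     # add id (or weekday) to `by` for grouped versions
agg                               # six bins of 10 miles and 1 gallon each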
[R] foverlaps data.table error
Hello All, I have 2 tables.

dt1:

start                    end                      kwh10min
2013-04-01 00:00:54 UTC  2013-04-01 01:00:10 UTC  0.05
2013-04-01 00:40:26 UTC  2013-04-01 00:50:00 UTC  0.1
2013-04-01 02:13:20 UTC  2013-04-01 04:53:42 UTC  0.15
2013-04-02 02:22:00 UTC  2013-04-01 04:33:12 UTC  0.2
2013-04-01 02:26:23 UTC  2013-04-01 04:05:12 UTC  0.25
2013-04-01 02:42:47 UTC  2013-04-01 04:34:33 UTC  0.3
2013-04-01 02:53:12 UTC  2013-04-03 05:27:05 UTC  0.35
2013-04-02 02:54:08 UTC  2013-04-02 05:31:15 UTC  0.4
2013-04-03 02:57:16 UTC  2013-04-03 05:29:32 UTC  0.45

dt2: start and end are 10-minute interval blocks spanning 2013-04-01 00:00:00 to 2013-04-04.

I want to map column 3 of dt1 onto dt2 wherever the start and end times fall within the 10-minute blocks, and keep appending the columns. Ideally the output should be:

4/1/2013 0:00  4/1/2013 0:10  0.05  0
4/1/2013 0:10  4/1/2013 0:20  0.05  0
4/1/2013 0:20  4/1/2013 0:30  0.05  0
4/1/2013 0:30  4/1/2013 0:40  0.05  0
4/1/2013 0:40  4/1/2013 0:50  0.05  0.01
4/1/2013 0:50  4/1/2013 1:00  0.05  0.01

I tried:

setkey(dums, start, end)
setkey(map, start, end)
foverlaps(map, dums, type="within", nomatch=0L)

I keep getting the error:

Error in foverlaps(map, dums, type = "within", nomatch = 0L) :
  All entries in column start should be <= corresponding entries in column end in data.table 'y'
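The error message is literal: the keyed table 'y' (dums) has a row whose start is after its end. In the dt1 shown above, the 4th row is exactly such a case (start 2013-04-02 02:22:00, end 2013-04-01 04:33:12). A sketch for locating and repairing such rows (whether to swap or drop them is a judgment about the data, so the swap below is only one option):

library(data.table)
dums[start > end]                               # shows the offending interval(s)
dums[start > end, c("start", "end") := .(end, start)]  # or: dums <- dums[start <= end]
setkey(dums, start, end)
foverlaps(map, dums, type = "within", nomatch = 0L)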
[R] jfindClass class not found - Please Help
> library(RJDBC)
Loading required package: DBI
Loading required package: rJava
> jcc = JDBC("com/ibm/db2/jcc/DB2Driver", "/opt/ibm/db2/java/lib/db2jcc4.jar")
Error in .jfindClass(as.character(driverClass)[1]) : class not found
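Two things worth checking (my suggestions, not from the thread): RJDBC's documented examples pass the driver class in dot notation rather than the slash form, and the jar path must actually exist and be readable. A sketch, with the connection URL and credentials purely hypothetical:

library(RJDBC)
file.exists("/opt/ibm/db2/java/lib/db2jcc4.jar")   # verify the jar is really there
drv <- JDBC(driverClass = "com.ibm.db2.jcc.DB2Driver",
            classPath  = "/opt/ibm/db2/java/lib/db2jcc4.jar")
con <- dbConnect(drv, "jdbc:db2://dbhost:50000/MYDB", "user", "password")  # hypothetical URL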
[R] A request
Hello there! Could somebody please go through the question at http://stats.stackexchange.com/questions/268323/string-kernels-in-r? In short, I need references for the algorithms used for the string kernels in the kernlab package in R. Thank you.

Regards,
Rahul
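While tracking down the references, the kernels themselves can be experimented with through kernlab's stringdot(); a minimal sketch (the strings are arbitrary examples):

library(kernlab)
sk <- stringdot(type = "spectrum", length = 3, normalized = TRUE)  # 3-gram spectrum kernel
sk("statistics", "statistical")   # kernel value between two similar strings
sk("statistics", "economics")     # lower value for dissimilar strings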
[R] Mosaic plot using ggplot2
Hello, I would like to know whether mosaic plots are supported by ggplot2. If so, can someone point me to a couple of examples or give me any pointers? Cheers, Rahul
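As far as I know, ggplot2 itself ships no mosaic geom; base R's mosaicplot() works out of the box, and the separate ggmosaic extension adds geom_mosaic() for ggplot2. A sketch following ggmosaic's documented aesthetics (an assumption that the package is installed):

# base graphics
mosaicplot(Titanic, main = "Titanic survival", color = TRUE)

# ggplot2 via the ggmosaic extension
library(ggplot2)
library(ggmosaic)
ggplot(as.data.frame(Titanic)) +
  geom_mosaic(aes(weight = Freq, x = product(Class), fill = Survived))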
[R] R- NLP on R but ....
I'll appreciate help with the following problem: I have solved many nonlinear programming problems with nonlinear constraints, and Rdonlp is working well, but I am unable to get INTEGER values with nonlinear constraints in Rdonlp. Is it possible to obtain integer values of the parameters in any R package with nonlinear constraints?

Rahul
[R] Help with the Error Message in R "Error in 1:nchid : result would be too long a vector"
Hello everyone, I am using mlogit to analyse my choice experiment data. I have 3 alternatives for each individual, and for each individual I have 9 questions. I have responses from 516 individuals, so it is a panel of 9*516 observations. I have arranged the data in long format (it contains 100 columns indicating different variables and identifiers). In mlogit I tried the following command:

mldata <- mlogit.data(mydata, shape = "long", alt.var = "Alt_name", choice = "Choice_binary", id.var = "IND")

It is giving me the following error message:

Error in 1:nchid : result would be too long a vector

Could you please help me with this? I don't think the data is too big: 13932 rows * 100 columns. I faced no issue in Excel. I am stuck due to this issue. Thanks in advance.

--
Best Regards,
Rahul Chakraborty
Research Fellow
National Institute of Public Finance and Policy
New Delhi- 110067
Re: [R] Help with the Error Message in R "Error in 1:nchid : result would be too long a vector"
[truncated, garbled summary(mydata) output: numeric summaries of the Likert-total variables (EV_risk_tot, EV_risk_avg, EV_price, EV_awareness_tot, EV_awareness_avg, EV_awareness_median, Lost_env, Investment_trust, Lottery1) plus several character columns (Time1, Lottery2, Time2), each over 13932 rows]

Yes, I have many Likert items and many dummy variables. How do I solve this issue?

Best regards,

On Tue, Sep 22, 2020 at 1:45 AM David Winsemius wrote:
> If you had included the output of summary(mydata) we might be more capable
> of giving a fact-based answer, but I'm guessing that you have a lot of
> categorical variables with multiple levels and some sort of combinatoric
> explosion is resulting in too many levels of a constructed factor.
>
> --
> David.
Re: [R] Help with the Error Message in R "Error in 1:nchid : result would be too long a vector"
[continuation of the garbled summary(mydata) output: dummy and Likert variables such as HH_cars, PPC_morethan10, Daily_travel_medium, Daily_travel_long, Garage_y, DL_y, Own_accom, Freerider_tot, Satisfaction_tot, Political_view, WTP_env_tot, Warmglow_tot, Standout, Acceptance_new, Climate_perception, Env_pref, Tech_leader, Social_motivation_tot, EV_risk_tot, EV_awareness_tot]

On Tue, Sep 22, 2020 at 2:07 AM Rahul Chakraborty wrote:
> Hello,
>
> Here is the result of summary(mydata):
>
> [garbled, truncated summary output for the identifier columns (IND, Block, QES, STR, ALT, ALT_name, ASC, Choice, Choice_binary), the alternative attributes (Price, Refuel_availability, Registration_charges, Running_cost, Market_share, Friends_share, Refuel_time, Emission), and the demographic dummies (Sex, Age2, Age3, Age4, ...)]
Re: [R] Help with the Error Message in R "Error in 1:nchid : result would be too long a vector"
Hello David and everyone, I am really sorry for not abiding by the specific guidelines in my prior communications. I have tried to convert the present email to plain-text format (at least it is showing me so in my Gmail client). I have also converted the xlsx file into a csv format with a .txt extension.

So, my problem is that I need to run a panel mixed logit regression for a choice model. There are 3 alternatives, 9 questions for each individual, and 516 individuals in the data. I have created a csv file in long format from the survey questionnaire. Apart from the alternative-specific variables I have many individual-specific variables, and most of these are dummies (dummy coded). I will use subsets of these in my alternative model specifications. So, in my data I have 100 columns with 13932 rows (3*9*516). After reading the csv file and creating a dataframe 'mydata' I used the following command for mlogit:

mldata1 <- mlogit.data(mydata, shape = "long", alt.var = "Alt_name", choice = "Choice_binary", id.var = "IND")

It gives me the same error message: Error in 1:nchid : result would be too long a vector.

The attached file (csv file with .txt extension) is an example of 2 individuals, each with 3 questions. I have also reduced the number of columns to 57, so there are 18 rows. But if I use the same command on my new data I still get the same error message. Can anyone please help me out with this? Because of this error I am stuck at the dataframe level.

Thanks in advance.

Regards,
Rahul Chakraborty

On Tue, Sep 22, 2020 at 4:50 AM David Winsemius wrote:
>
> @Rahul;
>
> You need to learn to post in plain text, and attachments may not be xls
> or xlsx. They need to be text files. And even if they are comma-separated
> files and text, they still need to be named with a .txt extension.
>
> I'm the only one who got the xlsx file. I got the error regardless of
> how many columns I omitted, so my guess was possibly incorrect. But I did
> RTFM. See ?mlogit.data. The mlogit.data function is deprecated and you
> are told to use the dfidx function. Trying that, you now get an error
> saying: "the two indexes don't define unique observations".
>
> > sum(duplicated(dfrm[, 1:2]))
> [1] 12
> > length(dfrm[, 1])
> [1] 18
>
> So of your 18 lines in the example file, most of them appear to be
> duplicated in their first two columns, and apparently that is not allowed
> by dfidx.
>
> Caveat: I'm not a user of the mlogit package, so I'm just reading the
> manual and possibly coming up with informed speculation.
>
> Please read the Posting Guide. You have been warned. Repeated violations
> of the policies laid down in that hallowed document will possibly result
> in postings being ignored.
[attached example file, CSV with .txt extension:]

IND,QES,STR,ALT_name,Choice_binary,Price,Refuel_availability,Registration_charges,Running_cost,Market_share,Friends_share,Refuel_time,Emission,Sex,Age2,Age3,Age4,Edu_PG,Edu_Oth,Occu_Pvt,Occu_Pub,Occu_SE,Location_metro,Location_majorcity,Ahm,Ben,Chen,NCR,Hyd,Kol,Mum,MajCity,HH_size,Children,IG2,IG3,IG4,HH_cars,PPC_morethan10,Daily_travel_medium,Daily_travel_long,Garage_y,DL_y,Own_accom,Freerider_tot,Satisfaction_tot,Political_view,WTP_env_tot,Warmglow_tot,Standout,Acceptance_new,Climate_perception,Env_pref,Tech_leader,Social_motivation_tot,EV_risk_tot,EV_awareness_tot
1,1,101,Hybrid,1,11,0.8,0.08,268,0.25,0.15,5,0.75,1,1,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,0,0,2,3,3,8,4,3,2,5,1,1,12,6,3
1,1,101,Electric,0,10,0.5,0.02,115,0.25,0,30,0,1,1,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,0,0,2,3,3,8,4,3,2,5,1,1,12,6,3
1,1,101,Petrol,0,10,1,0.08,383,0.5,0.85,5,1,1,1,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,0,0,2,3,3,8,4,3,2,5,1,1,12,6,3
1,2,102,Hybrid,1,10,0.8,0.08,230,0.25,0,5,0.75,1,1,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,0,0,2,3,3,8,4,3,2,5,1,1,12,6,3
1,2,102,Electric,0,12,0.5,0.04,153,0.05,0.15,30,0,1,1,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,0,0,2,3,3,8,4,3,2,5,1,1,12,6,3
1,2,102,Petrol,0,10,1,0.08,383,0.7,0.85,5,1,1,1,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,0,0,2,3,3,8,4,3,2,5,1,1,12,6,3
1,3,103,Hybrid,1,9,1,0.06,307,0.15,0.3,5,0.75,1,1,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,0,0,2,3,3,8,4,3,2,5,1,1,12,6,3
1,3,103,Electric,0,11,0.25,0.02,115,0.25,0,30,0,1,1,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,0,0,2,3,3,8,4,3,2,5,1,1,12,6,3
1,3,103,Petrol,0,10,1,0.08,383,0.6,0.7,5,1,1,1,0,0,1,0,1,0,0,1,0,0,0,0,0,0,0,1,0,1,0,0,0,1,0,1,0,0,0,0,0,2,3,3,8,4,3,2,5,1,1,12,6,3
2,1,201,Hybrid,0,9,0.8,0.06,268,0.25,0.3,5,0.75,1,1,0,0,1,0,1,0,0,1,0,0,0,0,0,0,1,0,0,3,0,1,0,0,0,0,0,0,0,0,1,2,6,4,8,4,2,2,5,2,3,10,9,12
2,1,201,Electric,1,12,0.75,0,115,0.15,0.15,30,0,1,1,0,0,1,0,1,0,0,1,0,0,0,0,0,0,1,0,0,3,0,1,0,0,0, [line truncated in the archive]
Re: [R] Help with the Error Message in R "Error in 1:nchid : result would be too long a vector"
David, My apologies with the first one. I was checking different tutorials on mlogit where they were using mlogit.data, so I ended up using it.

I am not getting what you are saying by the "duplicates in the first two columns". See, my first column is IND, which identifies my individuals; the second column is QES, which identifies the question number each individual faces; the 3rd column is a stratification code that can be ignored. Columns 6-13 are alternative-specific variables and the rest are individual-specific. So the first 3 rows indicate the 1st question faced by the 1st individual, containing 3 alternatives, and so on. So I have already arranged the data in long format. Here, I could not get what "duplicate in the first two columns" means.

And I am really sorry that there was an error in my code, as Rui has pointed out. The correct code is:

mldata1 <- dfidx(mydata, shape = "long", alt.var = "ALT_name", choice = "Choice_binary", id.var = "IND")

It still shows the error: "the two indexes don't define unique observations". It would be really helpful if you could kindly help.

Regards,

On Tue, Sep 22, 2020 at 8:46 PM David Winsemius wrote:
>
> You were told two things about your code:
>
> 1) mlogit.data is deprecated by the package authors, so use dfidx.
>
> 2) dfidx does not allow duplicate ids in the first two columns.
>
> Which one of those are you asserting is not accurate?
>
> --
> David.

--
Rahul Chakraborty
Research Fellow
National Institute of Public Finance and Policy
New Delhi- 110067
Re: [R] Help with the Error Message in R "Error in 1:nchid : result would be too long a vector"
Hello Rui, Thanks a lot for your response. But I will surely say that the data I attached is in long format, as it has 18 rows (3 alternatives * 3 questions * 2 individuals). Had it been in wide format it would have had 6 rows (3 questions * 2 individuals). But anyway, thanks.

Best,
Rahul

On Wed, Sep 23, 2020 at 3:23 AM Rui Barradas wrote:
>
> Hello,
>
> Please keep this on the list so that others can give their contribution.
>
> If you have reshaped your data, can you post the code you ran to reshape
> it? Right now we only have the original attachment, in wide format, not
> the long-format data.
>
> Rui Barradas
>
> Às 21:55 de 22/09/20, Rahul Chakraborty escreveu:
> > Hi,
> >
> > Thank you so much for your reply. Yes, thank you for pointing that out;
> > I apologise for that error in the variable name. However, my data is in
> > long format. My first column is IND, which identifies my individuals;
> > the second column is QES, which identifies the question number each
> > individual faces; the 3rd column is a stratification code that can be
> > ignored. Columns 6-13 are alternative-specific variables and the rest
> > are individual-specific. So I have already arranged the data in long
> > format. With that in mind, if I use shape="long" it still gives me an
> > error.
> >
> > Best regards,
> >
> > On Tue, Sep 22, 2020 at 11:00 PM Rui Barradas wrote:
> >>
> >> Hello,
> >>
> >> I apologize if the rest of the quotes prior to David's email are
> >> missing; for some reason today my mail client is not including them.
> >>
> >> As for the question, there are two other problems:
> >>
> >> 1) Alt_name is misspelled; it should be ALT_name.
> >>
> >> 2) The data is in wide, not long, format.
> >>
> >> A 3rd problem is that ?dfidx says:
> >>
> >> alt.var
> >> the name of the variable that contains the alternative index (for a
> >> long data.frame only) or the name under which the alternative index
> >> will be stored (the default name is alt)
> >>
> >> So if shape = "wide", alt.var is not needed. But I am not a user of
> >> package mlogit, I'm just guessing.
> >>
> >> The following seems to fix it (it doesn't throw errors):
> >>
> >> mldata1 <- dfidx(mydata, shape = "wide",
> >>                  #alt.var = "ALT_name",
> >>                  choice = "Choice_binary",
> >>                  id.var = "IND")
> >>
> >> Hope this helps,
> >>
> >> Rui Barradas

--
Rahul Chakraborty
Research Fellow
National Institute of Public Finance and Policy
New Delhi- 110067
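For what it is worth, the "two indexes don't define unique observations" error can also arise when the pair (individual, alternative) is not unique because each individual answers several questions. The dfidx documentation's list form of idx allows a choice-situation id nested in the individual id. A sketch (my suggestion, not from the thread; the chid column is constructed here and is hypothetical):

library(dfidx)
# one unique id per choice situation (individual x question)
mydata$chid <- paste(mydata$IND, mydata$QES, sep = ".")
mldata1 <- dfidx(mydata,
                 idx = list(c("chid", "IND"), "ALT_name"),
                 choice = "Choice_binary")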
[R] Designing a Fractional Factorial Design in R
Dear all, Presently I am working on designing a questionnaire for my discrete choice experiment. I intend to use R for the fractional factorial design. I have the following objective: the respondent has to choose one out of 4 objects. Each of the 4 objects is classified by 5 different attributes; however, the levels are not the same under each object. For example, the table below displays the first three attributes and the corresponding level values:

            Object1        Object2          Object3        Object4
Attribute1  100            80, 100, 120     120, 140, 160  120, 140, 160
Attribute2  100, 80, 120   100, 80, 60, 40  80, 60, 40     75, 50, 25
Attribute3  100            100              100, 75        75, 50, 20, 10

In this scenario, as you can see, the number and values of the levels for each attribute may vary across the objects. Given this scenario, which package should I use to implement a fractional factorial design? Any help would be highly appreciated. Thanking you,

--
Regards,
Rahul Chakraborty
Research Fellow
National Institute of Public Finance and Policy
New Delhi- 110067
[R] Problem in generating an "Orthogonal fractional design"
Dear all, Presently I am working on designing a questionnaire for my discrete choice experiment. I want to generate an orthogonal fractional factorial design for the following problem: the respondent has to choose one out of 4 objects (X1, X2, X3, X4). Each of the 4 objects is classified by 10 different attributes; however, the levels are not the same under each object. The table below displays the situation (levels allowed for each attribute under each object):

Attribute  No. of levels  X1    X2       X3       X4
A          5              1     1,2,3    3,4,5    3,4,5
B          4              1     1        1,2      3,4
C          4              1     1        2,4      3,4
D          5              1     1,2,3    1,2,3    1,4,5
E          5              1,2   2,3      3,4      5
F          2              1     1        1,2      1,2
G          2              1     1        1,2      2
H          2              1     1        1,2      1,2
I          4              1     2,3,4    2,3,4    2,3,4
J          3              1     2,3      2,3      2,3
X          4              1     2        3        4

The last row denotes the 4 objects. Now I want to generate the choice sets for my questionnaire. I would like to use an orthogonal fractional factorial design. I kept the row with X in order to sort out the redundant combinations from the choice sets. I have the following questions:

1. How does one decide on the number of runs to choose for a fractional factorial design? I used AlgDesign to generate the full factorial, which consists of 0.768 million combinations, so I need a modest number of runs — but how many should I target? I do not see any document that explains how to choose the number of trials/experimental runs; the papers I am following only say that they used N runs instead of the full factorial.

2. Out of the 0.768 million combinations in the full factorial, many will be redundant. For example, I don't want rows where X=X1 and A is 2, 3, 4, or 5. There are many other such cases that I don't want in my design; I have coded all levels for each attribute, which is why they appear in the full factorial. How do I generate an orthogonal fractional factorial that does not contain such redundant combinations? I included the X attribute with the purpose of dropping those combinations conditioned upon specific values of X and the other factors. Should I execute that first and then generate the fractional factorial using optFederov on the remaining data in the dataframe?

I would be highly obliged if you can kindly help me in this regard. I am a student of Economics, so I do not have a very deep understanding of the statistical procedure behind such algorithms; my question might sound extremely naive, for which I am sorry.

--
Regards,
Rahul Chakraborty
Research Fellow
National Institute of Public Finance and Policy
New Delhi- 110067
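A sketch of the filter-then-optFederov route the post describes, using AlgDesign (the attribute names follow the table; the nTrials value and the single filtering rule shown are placeholders — the real design needs all the redundancy rules, a parameter-count check, and possibly a pre-sampled candidate set, since optFederov over hundreds of thousands of rows is slow):

library(AlgDesign)
# full factorial over the level counts in the table (A..J plus the object indicator X)
full <- gen.factorial(levels = c(5, 4, 4, 5, 5, 2, 2, 2, 4, 3, 4),
                      varNames = c("A","B","C","D","E","F","G","H","I","J","X"),
                      factors = "all")
# drop redundant rows, e.g. object X1 only allows A = 1 (add the other rules likewise)
full <- full[!(full$X == "1" & full$A != "1"), ]
# D-optimal fraction; nTrials must exceed the number of model parameters
des <- optFederov(~ ., data = full, nTrials = 40)
head(des$design)

As a rule of thumb, the minimum nTrials for a main-effects model is 1 + sum(levels - 1); for the table above that is 30, which is one reason 40 is used as the placeholder here.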
[R] significance testing for the difference in the ratio of means
I have a question regarding significance testing for the difference in a ratio of means. The data consist of a control and a test group, each with and without treatment. I am interested in testing whether the treatment has a significantly different effect (say, in terms of fold-activation) on the test group compared to the control.

The form of the data, with arbitrary n and not assuming equal variance:

m1 = mean of (control group), n = 7
m2 = mean of (control group w/ treatment), n = 10
m3 = mean of (test group), n = 8
m4 = mean of (test group w/ treatment), n = 9

H0: m2/m1 = m4/m3
restated, H0: m2/m1 - m4/m3 = 0

Method 1: Fieller's intervals. Use Fieller's theorem, available in R as part of the mratios package. This is a promising way to compute standard errors/confidence intervals for each of the two ratios, but it will not yield p-values for significance testing. Significance by non-overlap of confidence intervals is too stringent a test and will lead to frequent type II errors.

Method 2: Bootstrap. Abandoning an analytical solution, we try a numerical one. I can repeatedly (1,000 or 10,000 times) draw with-replacement samples of size 7, 10, 8, 9 from m1, m2, m3, m4 respectively. Each iteration, I can compute the ratios m2/m1 and m4/m3 as well as their difference. Standard deviations of the m2/m1 and m4/m3 bootstrap distributions can give me standard errors for these two ratios. Then I can test where "0" falls on the third distribution, the distribution of the difference of the ratios. If 0 falls on one of the tails, beyond the 2.5th or 97.5th percentile, I can declare a significant difference in the two ratios. My question here is whether I can correctly report the percentile location of "0" as the p-value.

Method 3: Permutation test. I understand the best way to obtain a p-value for the significance test would be to resample under the null hypothesis. However, as I am comparing a ratio of means, I do not have individual observations to randomize between the groups. The best I can think to do is create an exhaustive list of all (7x10) = 70 possible observations for m2/m1 from the data, then create a similar list of all (8x9) = 72 possible observations for m4/m3. Pool all (70+72) = 142 observations and repeatedly randomly assign them to two groups of sizes 70 and 72 to represent the two ratios, and compute the difference in means. This distribution could represent the distribution under the null hypothesis, and I could then measure where my observed value falls to compute the p-value. This, however, makes me uncomfortable, as it seems to treat the data as a "mean of ratios" rather than a "ratio of means".

Method 4: Combination of bootstrap and permutation test. Sample with replacement samples of size 7, 10, 8, 9 from m1, m2, m3, m4 respectively, as in Method 2 above. Calculate the two ratios (m2/m1 and m4/m3) for these 4 samples and record them in a list. Repeat this an arbitrary (B) number of times, recording the two ratios each time; hence if B = 10, we will have 20 ratio observations. Then proceed with permutation testing on these 20 ratio observations by repeatedly randomizing them into two equal groups of 10 and computing the difference in means of the two groups, as in Method 3 above. This could potentially yield a distribution under the null hypothesis, and p-values could be obtained by locating the observed value on this distribution. I am unsure of appropriate values for B, or whether this method is valid at all.
Another complication would be the concern for multiple comparisons if I wished to include additional test groups (m5 = testgroup2; m6 = testgroup2 w/ treatment; m7 = testgroup3; m8 = testgroup3 w/ treatment; etc.) and how that might be appropriately handled.

Method 2 seems the most intuitive to me. Bootstrapping this way will likely yield appropriate standard errors for the two ratios. However, I am very much interested in appropriate p-values for the comparison, and I am not sure whether locating "0" on the bootstrap distribution of the difference of ratios is appropriate.

Thank you in advance for your suggestions.
-Rahul
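For concreteness, a minimal sketch of Method 2 as described above, with simulated stand-in data (g1..g4 and all numbers are hypothetical). Whether 2*min(p, 1-p) is a defensible two-sided p-value here is exactly the statistical question being raised, so treat this as illustration only:

set.seed(1)
g1 <- rnorm(7,  1.0, 0.2)   # control
g2 <- rnorm(10, 2.0, 0.4)   # control + treatment
g3 <- rnorm(8,  1.5, 0.3)   # test
g4 <- rnorm(9,  3.0, 0.6)   # test + treatment

B <- 10000
boot.diff <- replicate(B, {
  r1 <- mean(sample(g2, replace = TRUE)) / mean(sample(g1, replace = TRUE))
  r2 <- mean(sample(g4, replace = TRUE)) / mean(sample(g3, replace = TRUE))
  r1 - r2                            # bootstrap draw of m2/m1 - m4/m3
})
sd(boot.diff)                        # bootstrap SE of the difference of ratios
p0 <- mean(boot.diff < 0)            # percentile location of 0
2 * min(p0, 1 - p0)                  # one common two-sided bootstrap p-value form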
Re: [R] significance testing for the difference in the ratio of means
My apologies if my request is off-topic, and for my admittedly half-baked understanding of the topic. I'm afraid trying to talk with the "local statistical help", and trying to post on several general statistical forums to look for proper guidance, has not yielded any response, much less any helpful ones. I turned to this forum in desperation because 1) I will be using R to implement the chosen strategy and 2) looking through the archives of this forum seemed promising, especially because of past helpful posts such as this: https://stat.ethz.ch/pipermail/r-help/2009-April/194843.html

Perhaps you can suggest a resource which would cover the applicable "standard methodology" and perhaps its implementation in R? I would truly appreciate any guidance.

My protocols/design: each observation within the 4 groups represents a recording of a continuous variable (whole-cell current from 1 cell in electrophysiology measurements). The data for each group appear roughly normal (albeit with small n, from 7 to 10 per group). The variance is not equal among the groups because it seems to vary with the mean, i.e., larger currents mean larger absolute variance. There is no explicit randomization involved, as these observations are merely measurements of whole-cell currents for cells receiving an identical experimental treatment. I am interested in comparing the "fold-activation" effect of the treatment for control cells versus test-group cells, which have differing baseline pre-treatment current values.

Best,
Rahul

On Fri, Jun 14, 2013 at 7:13 PM, Bert Gunter wrote:
> Sigh...
>
> (Again!) These are primarily statistical, not R, issues. I would urge
> that you seek local statistical help. You appear to be approaching
> this with a good deal of semi-informed adhoc-ery. Standard methodology
> should be applicable, but it would be presumptuous and ill-advised of
> me to offer specifics remotely without understanding in detail the
> goals of your research, the nature of your design (e.g. protocols,
> randomization?), and the behavior of your data (what do appropriate
> plots tell you??)
>
> Others may be bolder. Proceed at your own risk.
>
> Cheers,
> Bert
[R] Help on reading multiple files in R
Hi, How can I read multiple files (in a loop-like manner) within a single R script? For example, I need to run the same code for different datasets (here, a list of companies), and since the individual files are quite large, appending the files into one file is not a desirable option. Can this be done through a macro or an SQL-like command?

Thanks and Regards,
Rahul S Menon
Research Associate
ISB, Hyderabad
Ph-040-2318 7288
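A common base-R pattern (my sketch; the directory name and CSV format are assumptions) is to list the files and loop over them, or to lapply and keep one result per company:

# loop version: process each file, never holding more than one in memory
files <- list.files("company_data", pattern = "\\.csv$", full.names = TRUE)
for (f in files) {
  dat <- read.csv(f)
  # ... run the per-company analysis on 'dat' here ...
}

# list version: keep one result object per company
results <- lapply(files, function(f) {
  dat <- read.csv(f)
  summary(dat)          # placeholder for the real analysis
})
names(results) <- basename(files)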
[R] Daily to Monthly Cumulative Returns
I want to do daily, weekly, and monthly regressions between investment-grade credit spreads (dependent variable) and Treasuries (independent variable). My starting point is daily spread data and daily prices for US Treasuries.

Should I convert the US prices into log returns, i.e. log(Pt/Pt-1), or simple daily returns (Pt/Pt-1 - 1), for this analysis? What about credit spreads? A credit spread is like a return, so should I take log(spread) or simply use the spread?

Lastly, to aggregate from daily to weekly or monthly, what functions should I use? Do I aggregate the log returns or the simple returns? Do I take a simple sum or a cumulative aggregation, and how would I do it in R?

I am inclined to use log returns for everything, as regressions and correlation calculations all assume normality. But then to aggregate I would need to use the original returns (or spreads), aggregate cumulatively, and then take the log?
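One convenient property worth noting: log returns are additive over time, so a monthly log return is just the sum of the daily log returns within that month, and exp() converts it back to a simple return. A sketch with xts (the price series here is a simulated placeholder):

library(xts)
set.seed(1)
dates  <- seq(as.Date("2013-01-01"), by = "day", length.out = 250)
prices <- xts(100 * cumprod(1 + rnorm(250, 0, 0.005)), order.by = dates)

logret <- diff(log(prices))[-1]            # daily log returns
monthly.log    <- apply.monthly(logret, sum)  # log returns simply add within a month
monthly.simple <- exp(monthly.log) - 1        # equivalent simple monthly returns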
[R] Fitting / Distributions
I have the following: a sequence of security returns and their probabilities, e.g. security returns / probability of event:

10% return with 50% probability
-5% return with 10% probability
3% return with 10% probability
15% return with 10% probability

I can calculate the mean and the standard deviation of the above, i.e. E[X] and the square root of E[X^2] - E^2[X], where E[X] is the probability-weighted sum of the returns. Given that I have the mean and the standard deviation, I can model the distribution as normal and do further analysis. But is there a way of better fitting a distribution, to take the tails into account? How would I fit the distribution and model (graph) the probability distribution in R? And how would I do a Monte Carlo, for use in simulation, that would generate the appropriate distribution?
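A sketch of the moment calculation and a Monte Carlo that samples directly from the stated discrete distribution, with no normality assumption (note the listed probabilities sum to 0.8, so renormalizing them is my assumption):

returns <- c(0.10, -0.05, 0.03, 0.15)
probs   <- c(0.50, 0.10, 0.10, 0.10)
probs   <- probs / sum(probs)            # listed probabilities sum to 0.8; renormalize

m <- sum(probs * returns)                # E[X]
s <- sqrt(sum(probs * returns^2) - m^2)  # sqrt(E[X^2] - E^2[X])

set.seed(1)
sim <- sample(returns, 10000, replace = TRUE, prob = probs)  # Monte Carlo draws
hist(sim, breaks = 50, main = "Simulated return distribution")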
[R] Removing Bad Data
I created a couple of timeSeries objects; when I was merging them, I got an error. Looking at the data, I see that one of the time series has:

06/30/2007   0.0028   0.0183   0.0122   0.0042   0.0095       -
07/31/2007  -0.0111   0.0255   0.0096  -0.0069  -0.0024   0.0043
08/31/2007  -0.0108  -0.0237  -0.0062  -0.0138  -0.0173  -0.0065
09/30/2007   0.0197   0.0477   0.0410   0.0331   0.0114   0.0322

The "-" (first row, last column) is getting picked up from Excel and is not a number. How do I check for this and replace it with zero?
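One way (a sketch; the file name and column name are hypothetical) is to declare "-" as an NA marker at import time and then zero out the NAs:

dat <- read.csv("returns.csv", na.strings = c("-", "NA", ""))  # "-" becomes NA on import
dat[is.na(dat)] <- 0

# or, for a column that already arrived as character/factor:
x <- as.numeric(as.character(dat$col6))   # non-numbers become NA (with a warning)
x[is.na(x)] <- 0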
[R] Package for Jump detection
Dear All, Are there any packages in R to carry out the jump-detection test, and to find the jump sizes and their times of occurrence, on high-frequency data (5-minute intervals) using the non-parametric approach suggested by Lee and Mykland in their paper "Jumps in Financial Markets: A New Nonparametric Test and Jump Dynamics"?

Regards and thanks in advance,
Rahul
[R] Neuralnet Error
I require some help in debugging this code:

library(neuralnet)
ir <- read.table(file="iris_data.txt", header=TRUE, row.names=NULL)
ir1 <- data.frame(ir[1:100, 2:6])
ir2 <- data.frame(ifelse(ir1$Species=="setosa", 1, ifelse(ir1$Species=="versicolor", 0, "")))
colnames(ir2) <- "Output"
ir3 <- data.frame(rbind(ir1[1:4], ir2))
# rownames(ir3) <- c("SL","SW","PL","PW","Output")
print(ir3)
n <- neuralnet(Output ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
               data=ir3, err.fct="sse", hidden=2, linear.output=FALSE)

Output:

Error in neurons[[i]] %*% weights[[i]] :
  requires numeric/complex matrix/vector arguments

Any assistance is appreciated.
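Two likely culprits here (my reading, not a confirmed diagnosis): the ifelse() with an empty-string branch makes Output a character column, which is exactly the kind of non-numeric input the matrix-multiply error complains about; and combining the 4-column frame with the 1-column frame wants cbind(), not rbind(). A sketch of a fully numeric construction using the built-in iris data instead of the text file:

library(neuralnet)
ir1 <- iris[1:100, ]                                  # setosa and versicolor only
ir1$Output <- ifelse(ir1$Species == "setosa", 1, 0)   # numeric 0/1, no "" branch
ir3 <- ir1[, c("Sepal.Length", "Sepal.Width", "Petal.Length", "Petal.Width", "Output")]
set.seed(1)
n <- neuralnet(Output ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width,
               data = ir3, err.fct = "sse", hidden = 2, linear.output = FALSE)
plot(n)   # inspect the fitted network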
[R] High Frequency Data
Is there any way to analyse high-frequency data in R, e.g. cleaning, manipulation, and volatility estimation? I know there are packages like RTAQ and Realized for analysing high-frequency data, but they are only set up for NYSE stocks and a well-defined data format. Please help.
[R] convert function in RTAQ package for high frequency data analysis
Hello Everyone, I am trying to convert a txt file into RData format using the convert function in the RTAQ package. The txt file looks like:

2010-07-01 08:04:28 SBUX Q 24.9500 100 T 0 0
2010-07-01 08:04:28 SBUX Q 24.9500 100 T 0 0
2010-07-01 08:04:28 SBUX Q 24.9600 300 T 0 0

The code I am using is:

> convert(from="2010-07-01", to="2010-07-01",
          datasource="C:\\workdirectory\\TAQdata",
          datadestination="C:\\workdirectory\\datadestination",
          trades=T, quotes=F, ticker="SBUX", dir=F,
          format="%Y-%m-%d %H:%M:%S")
Warning message:
In read.table(file = file, header = header, sep = sep, quote = quote, :
  incomplete final line found by readTableHeader on 'C:\workdirectory\TAQdata/2010-07-01/SBUX_trades.txt'
> sbux.t <- TAQLoad(tickers="SBUX", from="2010-07-01", to="2010-07-01",
                    trades=T, quotes=F,
                    datasource="C:\\workdirectory\\datadestination")
> head(sbux.t, 3)

I am getting the following output:

             SYMBOL EX PRICE     SIZE  COND CORR G127
"2010-07-01" "SBUX"    "24.9500" "100" "1"  "0"  "0"
"2010-07-01" "SBUX"    "24.9500" "100" "1"  "0"  "0"
"2010-07-01" "SBUX"    "24.9600" "300" "1"  "0"  "0"
Warning message:
timezone of object (GMT) is different than current timezone ().

But my desired output is:

                    SYMBOL EX  PRICE     SIZE   COND CORR G127
2010-07-01 08:04:28 "SBUX" "Q" "24.9500" " 100" "T"  "0"  "0"
2010-07-01 08:04:28 "SBUX" "Q" "24.9500" " 100" "T"  "0"  "0"
2010-07-01 08:04:28 "SBUX" "Q" "24.9600" " 300" "T"  "0"  "0"

Can someone please help? Thanks
Re: [R] R reference Books
I found "The R Book" by Michael J. Crawley quite satisfying for the purpose of learning R. On May 28, 10:25 pm, "Neil Gupta" <[EMAIL PROTECTED]> wrote: > Hi I am still fairly new to R but picking it up quickly. I have some > problems manipulating data in tables. I was wondering if anyone new any good > resources such as an R manual. Some of the intro pdfs I viewed do not show > how...much appreciated. > > [[alternative HTML version deleted]] > > __ > [EMAIL PROTECTED] mailing listhttps://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guidehttp://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Oja median
Hi, Can we get the code for calculating the Oja median for multivariate data?

Thanks and Regards,
Rahul Agarwal
Analyst, Equities Quantitative Research
UBS_ISC, Hyderabad
On Net: 19 533 6363
Re: [R] Oja median
Hi Richard, Thanks for the code, but I already have code for bivariate data. I want it for data that is multivariate, with dimension greater than two — i.e., trivariate or higher. Can I get a more generalised form of the Oja median code?

Thanks and Regards,
Rahul Agarwal
Analyst, Equities Quantitative Research
UBS_ISC, Hyderabad
On Net: 19 533 6363

-----Original Message-----
From: [EMAIL PROTECTED]
Sent: Thursday, September 18, 2008 3:02 PM
Subject: Re: [R] Oja median

> Can we get the code for calculating Oja median for multivariate data

RSiteSearch("oja median") returns a link to this R-help post with code:
http://finzi.psych.upenn.edu/R/Rhelp02a/archive/12781.html

Regards,
Richie.

Mathematical Sciences Unit
HSL
Re: [R] Oja median
Thanks a lot, Martin. I would definitely like to know more about calculating the multivariate median. It would be better if I can get some method where I can calculate the median using functions in R. Will wait for your reply.

Thanks,
Rahul Agarwal
Analyst, Equities Quantitative Research
UBS_ISC, Hyderabad
On Net: 19 533 6363

-----Original Message-----
From: Martin Maechler
Sent: Thursday, September 18, 2008 5:47 PM
Subject: Re: [R] Oja median

On Thu, 18 Sep 2008 05:20:56 -0400, Rahul Agarwal wrote:

> Hi,
> Can we get the code for calculating Oja median for multivariate data

Excuse me, but must you really? The Oja median has (finite) breakdown point 2/n, i.e., it is not robust in any reasonable sense, and it is quite expensive to compute, etc. Rather use a better robust method, typically (starting with) cov.rob() from the recommended (hence part of every correct R installation) MASS package. There are (somewhat) better methods available for robust "location + scatter" [aka "(mu, Sigma)"] in the packages 'rrcov', 'robustbase' and/or 'robust', e.g., robustbase::covMcd.

If you need to know more, please divert to the SIG (special interest group) mailing list R-SIG-robust, to which I CC this.

Martin Maechler, ETH Zurich
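To make Martin's suggestion concrete, a minimal sketch of the robust multivariate location estimates he names (the data here are simulated placeholders; these are robust location estimates, not the Oja median itself):

library(MASS)
library(robustbase)
set.seed(1)
X <- matrix(rnorm(300), ncol = 3)   # trivariate sample
cov.rob(X)$center                   # robust location via MVE (MASS)
covMcd(X)$center                    # robust location via MCD (robustbase)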
Re: [R] Oja median
Hi, Could anyone help me to code Tukey halfspace depth and directional quantile regression in R? Rahul Agarwal Analyst Equities Quantitative Research UBS_ISC, Hyderabad On Net: 19 533 6363 -Original Message- From: roger [mailto:[EMAIL PROTECTED]] Sent: Friday, September 19, 2008 5:15 AM To: Agarwal, Rahul-A Cc: [EMAIL PROTECTED]; r-help@r-project.org; [EMAIL PROTECTED] Subject: Re: [R] Oja median If you had followed the thread of the link that Richard reported, you would have seen an implementation of the general d-dimensional version. Of course this isn't very speedy in higher dimensions, but that is the nature of the beast, I'm afraid. On Sep 18, 2008, at 4:44 AM, <[EMAIL PROTECTED]> wrote: > Hi Richard, > > Thanks for the code, but I already have code for bivariate > data... I want it for multivariate data, i.e. when the dimension is > greater than two (trivariate or higher). Can I get a more generalised > form of the Oja median code? > > Thanks and Regards > > > Rahul Agarwal > Analyst > Equities Quantitative Research > UBS_ISC, Hyderabad > On Net: 19 533 6363 > > > > > -Original Message- > From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] > Sent: Thursday, September 18, 2008 3:02 PM > To: Agarwal, Rahul-A > Cc: r-help@r-project.org; [EMAIL PROTECTED] > Subject: Re: [R] Oja median > >> Can we get the code for calculating the Oja median for multivariate data? > > RSiteSearch("oja median") returns a link to this R-help post with code > http://finzi.psych.upenn.edu/R/Rhelp02a/archive/12781.html > > Regards, > Richie. > > Mathematical Sciences Unit > HSL > > -- > ATTENTION: > > This message contains privileged and confidential inform...{{dropped: > 20}} > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
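For completeness, a brute-force sketch of the general d-dimensional Oja median written straight from the definition: the Oja median of n points in R^p minimises the total volume of the simplices formed by a candidate point together with each p-subset of the observations. The function names are mine, not the linked implementation, and as noted above the combinatorial cost grows quickly with n and p:

# Oja objective: total volume of the simplices spanned by mu and each
# p-subset of the rows of X (simplex volume = |det| / p!)
oja.objective <- function(mu, X) {
  p <- ncol(X)
  combs <- combn(nrow(X), p)                  # all p-subsets of observations
  vols <- apply(combs, 2, function(idx)
    abs(det(sweep(X[idx, , drop = FALSE], 2, mu))))
  sum(vols) / factorial(p)
}

oja.median.general <- function(X)
  optim(colMeans(X), oja.objective, X = X)$par

# The square example discussed later in this thread; the answer should be (2, 2)
oja.median.general(cbind(c(1, 3, 1, 3), c(1, 1, 3, 3)))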
Re: [R] Reading Data
Hi, I don't want to ignore the date row. See, basically my first row is the date row and the first column is the stock names. Now, using the if loop, I have to find the prices of stocks corresponding to a date. I hope the problem is clear to you now. For example:

Stocks 30-Jan-08 28-Feb-08 31-Mar-08 30-Apr-08
a      1.00      3.00      7.00      3.00
b      2.00      4.00      4.00      7.00
c      3.00      8.00      655.00    3.00
d      4.00      23.00     4.00      5.00
e      5.00      78.00     6.00      5.00

Rahul Agarwal Analyst Equities Quantitative Research UBS_ISC, Hyderabad On Net: 19 533 6363 -Original Message- From: Gustaf Rydevik [mailto:[EMAIL PROTECTED]] Sent: Tuesday, October 07, 2008 2:19 PM To: Agarwal, Rahul-A Cc: [EMAIL PROTECTED]; r-help@r-project.org Subject: Re: [R] Reading Data On Tue, Oct 7, 2008 at 10:36 AM, <[EMAIL PROTECTED]> wrote: > > Hi, > I have data in which the first row is in date format and the first > column is in text format, and all the remaining entries are numeric. > Whenever I try to read the data using read.table, the whole of > my data is converted into text format. > > Please suggest what I should do: using the numeric data, which > are prices, I need to calculate the return, but if these prices are not > numeric then calculating the return will be a problem. > > regards > > Rahul Agarwal > Analyst > Equities Quantitative Research > UBS_ISC, Hyderabad > On Net: 19 533 6363 > Hi, A single column in a data frame can't contain mixed formats. In the absence of example data, I would guess one of the following could work:

1) read.table("data.txt", skip=1, header=T)  ## If you have headers
2) read.table("data.txt", header=T)          ## If the date row is supposed to be variable names
3) read.table("data.txt", skip=1)            ## If there are no headers, and you want to ignore the date

regards, Gustaf -- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address: Essingetorget 40, 112 66 Stockholm, SE skype: gustaf_rydevik [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reading Data
Hi, What if I need to take the dates and stock names from these tables? I mean, I need to read in this table and then use an if condition to extract the data from FOO:

Identifier weight Start_Date End_Date
a          6.76   31Jan06    31Jan07
g          2.86   28Feb06    28Feb07
e          22.94  31Mar06    30Mar07
c          30.05  28Apr06    30Apr07
t          20.55  31May06    31May07

Rahul Agarwal Analyst Equities Quantitative Research UBS_ISC, Hyderabad On Net: 19 533 6363 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Philipp Pagel Sent: Tuesday, October 07, 2008 2:52 PM To: r-help@r-project.org Subject: Re: [R] Reading Data

> For example
>
> Stocks 30-Jan-08 28-Feb-08 31-Mar-08 30-Apr-08
> a      1.00      3.00      7.00      3.00
> b      2.00      4.00      4.00      7.00
> c      3.00      8.00      655.00    3.00
> d      4.00      23.00     4.00      5.00
> e      5.00      78.00     6.00      5.00

OK - this may be what you want:

> foo <- read.table('q.tbl', header=T, check.names=F, row.names=1)
> str(foo)
'data.frame': 5 obs. of 4 variables:
 $ 30-Jan-08: num 1 2 3 4 5
 $ 28-Feb-08: num 3 4 8 23 78
 $ 31-Mar-08: num 7 4 655 4 6
 $ 30-Apr-08: num 3 7 3 5 5
> foo
  30-Jan-08 28-Feb-08 31-Mar-08 30-Apr-08
a         1         3         7         3
b         2         4         4         7
c         3         8       655         3
d         4        23         4         5
e         5        78         6         5
> foo['31-Mar-08']
  31-Mar-08
a         7
b         4
c       655
d         4
e         6
> foo['d', '31-Mar-08']
[1] 4

Maybe row.names is not what you want - but the example should get you going. cu Philipp -- Dr. Philipp Pagel Lehrstuhl für Genomorientierte Bioinformatik Technische Universität München Wissenschaftszentrum Weihenstephan 85350 Freising, Germany http://mips.gsf.de/staff/pagel __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reading Data
Bang on!! Thanks for the help. Rahul Agarwal Analyst Equities Quantitative Research UBS_ISC, Hyderabad On Net: 19 533 6363 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Philipp Pagel Sent: Tuesday, October 07, 2008 2:52 PM To: r-help@r-project.org Subject: Re: [R] Reading Data

> For example
>
> Stocks 30-Jan-08 28-Feb-08 31-Mar-08 30-Apr-08
> a      1.00      3.00      7.00      3.00
> b      2.00      4.00      4.00      7.00
> c      3.00      8.00      655.00    3.00
> d      4.00      23.00     4.00      5.00
> e      5.00      78.00     6.00      5.00

OK - this may be what you want:

> foo <- read.table('q.tbl', header=T, check.names=F, row.names=1)
> str(foo)
'data.frame': 5 obs. of 4 variables:
 $ 30-Jan-08: num 1 2 3 4 5
 $ 28-Feb-08: num 3 4 8 23 78
 $ 31-Mar-08: num 7 4 655 4 6
 $ 30-Apr-08: num 3 7 3 5 5
> foo
  30-Jan-08 28-Feb-08 31-Mar-08 30-Apr-08
a         1         3         7         3
b         2         4         4         7
c         3         8       655         3
d         4        23         4         5
e         5        78         6         5
> foo['31-Mar-08']
  31-Mar-08
a         7
b         4
c       655
d         4
e         6
> foo['d', '31-Mar-08']
[1] 4

Maybe row.names is not what you want - but the example should get you going. cu Philipp -- Dr. Philipp Pagel Lehrstuhl für Genomorientierte Bioinformatik Technische Universität München Wissenschaftszentrum Weihenstephan 85350 Freising, Germany http://mips.gsf.de/staff/pagel __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Reading Data
Hi, I have data in which the first row is in date format and the first column is in text format, and all the remaining entries are numeric. Whenever I try to read the data using read.table, the whole of my data is converted into text format. Please suggest what I should do: using the numeric data, which are prices, I need to calculate the return, but if these prices are not numeric then calculating the return will be a problem. regards Rahul Agarwal Analyst Equities Quantitative Research UBS_ISC, Hyderabad On Net: 19 533 6363 -Original Message- From: Martin Maechler [mailto:[EMAIL PROTECTED]] Sent: Thursday, September 18, 2008 5:47 PM To: Agarwal, Rahul-A Cc: [EMAIL PROTECTED]; r-help@r-project.org Subject: Re: [R] Oja median >>>>> <[EMAIL PROTECTED]> >>>>> on Thu, 18 Sep 2008 05:20:56 -0400 writes: > Hi, > Can we get the code for calculating the Oja median for > multivariate data? Excuse me, but must you really? The Oja median has (finite) breakdown point 2/n, i.e., is not robust in any reasonable sense, and is quite expensive to compute, etc. etc. Rather use a better robust method, typically (start with) cov.rob() from the recommended (hence part of every correct R installation) MASS package. There are (somewhat) better methods available for robust "location+scatter" [aka "(mu, Sigma)"] from packages 'rrcov', 'robustbase' and/or 'robust', e.g., robustbase::covMcd. If you need to know more, please divert to the SIG (special interest group) mailing list R-SIG-robust, to which I CC this. Martin Maechler, ETH Zurich __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Reading Data
Why am I getting this message?

data=read.table("H:/Rahul/london/david/rexcel/price1.txt", header=T, check.names=F, row.names=1)
Error in read.table("H:/Rahul/london/david/rexcel/price1.txt", header = T, : duplicate 'row.names' are not allowed

Rahul Agarwal Analyst Equities Quantitative Research UBS_ISC, Hyderabad On Net: 19 533 6363 -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Philipp Pagel Sent: Tuesday, October 07, 2008 2:52 PM To: r-help@r-project.org Subject: Re: [R] Reading Data

> For example
>
> Stocks 30-Jan-08 28-Feb-08 31-Mar-08 30-Apr-08
> a      1.00      3.00      7.00      3.00
> b      2.00      4.00      4.00      7.00
> c      3.00      8.00      655.00    3.00
> d      4.00      23.00     4.00      5.00
> e      5.00      78.00     6.00      5.00

OK - this may be what you want:

> foo <- read.table('q.tbl', header=T, check.names=F, row.names=1)
> str(foo)
'data.frame': 5 obs. of 4 variables:
 $ 30-Jan-08: num 1 2 3 4 5
 $ 28-Feb-08: num 3 4 8 23 78
 $ 31-Mar-08: num 7 4 655 4 6
 $ 30-Apr-08: num 3 7 3 5 5
> foo
  30-Jan-08 28-Feb-08 31-Mar-08 30-Apr-08
a         1         3         7         3
b         2         4         4         7
c         3         8       655         3
d         4        23         4         5
e         5        78         6         5
> foo['31-Mar-08']
  31-Mar-08
a         7
b         4
c       655
d         4
e         6
> foo['d', '31-Mar-08']
[1] 4

Maybe row.names is not what you want - but the example should get you going. cu Philipp -- Dr. Philipp Pagel Lehrstuhl für Genomorientierte Bioinformatik Technische Universität München Wissenschaftszentrum Weihenstephan 85350 Freising, Germany http://mips.gsf.de/staff/pagel __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
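That error means the first column of price1.txt contains repeated stock names, and row names must be unique. A minimal sketch of two workarounds; beyond the file path from the post, everything here is illustrative:

# Read without row names and inspect which names repeat
data <- read.table("H:/Rahul/london/david/rexcel/price1.txt",
                   header = TRUE, check.names = FALSE)
data[duplicated(data[[1]]), 1]   # the offending duplicate names
# Or make the names unique and keep the row-name convenience
rownames(data) <- make.unique(as.character(data[[1]]))
data[[1]] <- NULL                # drop the now-redundant name column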
[R] FW: Reading Data
Rahul Agarwal Analyst Equities Quantitative Research UBS_ISC, Hyderabad On Net: 19 533 6363 Hi, let me explain the problem. We have a database which is in this format:

Stocks 30-Jan-08 28-Feb-08 31-Mar-08 30-Apr-08
a      1.00      3.00      7.00      3.00
b      2.00      4.00      4.00      7.00
c      3.00      8.00      655.00    3.00
d      4.00      23.00     4.00      5.00
e      5.00      78.00     6.00      5.00

and we have a query which is in this format:

Identifier weight Start_Date End_Date
a          6.76   31-Jan-06  31-Jan-07
e          2.86   28-Feb-06  28-Feb-07
f          22.94  31-Mar-06  30-Mar-07
y          30.05  28-Apr-06  30-Apr-07
h          20.55  31-May-06  31-May-07
d          6.76
f          2.86
r          22.94

Okay, now my task is to calculate returns for all the identifiers for the respective start and end dates from table 1. The start date and end date columns have the same length, as do the weight and identifier columns. I hope everything is clear now. Let me also send you the code that I have written; in my code I have a problem with the date format and also with the stock names.

data = read.table("H:/Rahul/london/david/rexcel/price.txt")
query = read.table("H:/Rahul/london/david/rexcel/prac.txt", header=TRUE)
data = as.matrix(data)
instrument = data[, 1]
date = data[1, ]
query = as.matrix(query)
q_ins = query[, 1]
wt = query[, 2]
q_sd = query[, 3]
q_ed = query[, 4]

returns = function(I, SD, ED) {
  p = rep(0, 2)
  for (i in 2:length(instrument)) {
    if (instrument[i] == I) {
      for (j in 2:length(date)) {
        if (date[j] == SD) p[1] = data[i, j]
      }
      for (j in 2:length(date)) {
        if (date[j] == ED) p[2] = data[i, j]
      }
    }
    returns = (p[2] / p[1]) - 1
  }
  # print(p)
  # print(I)
  return(returns)
}

## The original Funda ##
matrix_ret = matrix(0, length(q_ins), length(q_sd))
for (i in 1:length(q_sd)) {
  for (j in 1:length(q_ins)) {
    matrix_ret[j, i] = returns(q_ins[j], q_sd[i], q_ed[i])
  }
}

# Removing NA from the matrix
matrix_ret1 = sapply(X = matrix_ret, FUN = function(x) ifelse(is.na(x), 0.00, x))
matrix_ret = matrix(matrix_ret1, length(q_ins), length(q_sd))

wt_ret = matrix(0, length(q_sd), 1)
for (i in 1:length(q_sd)) {
  for (j in 1:length(q_ins)) {
    wt_ret[i] = wt_ret[i] + (wt[j] * matrix_ret[j, i])
  }
}
result = cbind(q_ed, wt_ret)

Rahul Agarwal Analyst Equities Quantitative Research UBS_ISC, Hyderabad On Net: 19 533 6363 -Original Message- From: Gustaf Rydevik [mailto:[EMAIL PROTECTED]] Sent: Tuesday, October 07, 2008 2:19 PM To: Agarwal, Rahul-A Cc: [EMAIL PROTECTED]; r-help@r-project.org Subject: Re: [R] Reading Data On Tue, Oct 7, 2008 at 10:36 AM, <[EMAIL PROTECTED]> wrote: > > Hi, > I have data in which the first row is in date format and the first > column is in text format, and all the remaining entries are numeric. > Whenever I try to read the data using read.table, the whole of > my data is converted into text format. > > Please suggest what I should do: using the numeric data, which > are prices, I need to calculate the return, but if these prices are not > numeric then calculating the return will be a problem. > > regards > > Rahul Agarwal > Analyst > Equities Quantitative Research > UBS_ISC, Hyderabad > On Net: 19 533 6363 > Hi, A single column in a data frame can't contain mixed formats. In the absence of example data, I would guess one of the following could work:

1) read.table("data.txt", skip=1, header=T)  ## If you have headers
2) read.table("data.txt", header=T)          ## If the date row is supposed to be variable names
3) read.table("data.txt", skip=1)            ## If there are no headers, and you want to ignore the date

regards, Gustaf -- Gustaf Rydevik, M.Sci. tel: +46(0)703 051 451 address: Essingetorget 40, 112 66 Stockholm, SE skype: gustaf_rydevik [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
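The root of the date/name trouble in the code above: read.table() without header=TRUE turns the date row into data, so every column (and the as.matrix() result) becomes character, and p[2]/p[1] then fails with a non-numeric-argument error. A minimal sketch of the same lookup done by name instead of by nested loops, assuming the price file is read as in Philipp's example (stocks as row names, dates as column names); the function name is mine:

# Prices with stocks as row names and dates as column names, all numeric
data <- read.table("H:/Rahul/london/david/rexcel/price.txt",
                   header = TRUE, check.names = FALSE, row.names = 1)
# Return of one stock between two dates, looked up by name
ret <- function(id, sd, ed) as.numeric(data[id, ed] / data[id, sd] - 1)
ret("a", "30-Jan-08", "30-Apr-08")   # dates must match the column names exactly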
Re: [R] Oja median
I have tried the code. The first function (which computes the Oja median) has a slight problem: if you try four points that form a square centred at (2,2), oja.median(cbind(c(1,3,1,3),c(1,1,3,3))) gives (-2,-2) instead of (2,2). Probably just a sign switch somewhere. The second function seems to get the Oja median right, but I don't understand what it does when computing quantiles other than the median. It seems to me (but I might be missing something!) that the ordering of observations in the computation of the cofactors, i.e. how the signs alternate, should depend on an angle theta that is given as an input. In Koenker's book (p. 274) this is obtained by sorting the bivariate observations y(i), i=1,...,n, according to the quantity y(i,1)cos(theta)+y(i,2)sin(theta). In the p>2 case I would expect to provide p-1 angles to determine a direction in p dimensions. If I have understood correctly, the code just follows the order in which the observations appear in the original data matrix. Rahul Agarwal __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
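A small sketch of the ordering just described, sorting bivariate observations by their projection onto a direction theta, as in Koenker (p. 274); the data and names are illustrative:

set.seed(7)
y <- cbind(rnorm(10), rnorm(10))      # 10 bivariate observations
theta <- pi / 4                       # a chosen direction
proj <- y[, 1] * cos(theta) + y[, 2] * sin(theta)
y_ordered <- y[order(proj), ]         # observations sorted along direction theta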
[R] Problem in Cart
Hi, Can someone help me with this project? I would like to initiate a project using CART. For example, for the NIFTY 50 stocks the first node could be cheap or expensive based on PE (price/earnings), with two subsequent nodes for earnings certainty and return on assets. Can anyone tell me how to go ahead with this project? I believe Prof. Ripley may have a say on it. Rahul Agarwal Visit our website at http://www.ubs.com This message contains confidential information and is in...{{dropped:16}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Problem in Cart
Hi, Sorry for the confusion, but I am looking to use R for regression trees. My query is stated below, and I am not able to understand how I can use the tree library in this case. Rahul Agarwal Analyst Equities Quantitative Research UBS_ISC, Hyderabad On Net: 19 533 6363 -Original Message- From: Bert Gunter [mailto:[EMAIL PROTECTED]] Sent: Wednesday, October 22, 2008 7:08 PM To: Agarwal, Rahul-A; R-help@r-project.org Subject: RE: [R] Problem in Cart CART is a commercial package and not part of R. R has several packages that do various kinds of regression and classification trees. Try: RSiteSearch("Classification Tree", restr="func") -- Bert Gunter -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of [EMAIL PROTECTED] Sent: Wednesday, October 22, 2008 2:13 AM To: R-help@r-project.org Subject: [R] Problem in Cart Hi, Can someone help me with this project? I would like to initiate a project using CART. For example, for the NIFTY 50 stocks the first node could be cheap or expensive based on PE (price/earnings), with two subsequent nodes for earnings certainty and return on assets. Can anyone tell me how to go ahead with this project? I believe Prof. Ripley may have a say on it. Rahul Agarwal Visit our website at http://www.ubs.com This message contains confidential information and is in...{{dropped:28}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
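One way to get going in R is the rpart package (method = "anova" grows a regression tree). A minimal sketch on simulated data; all variable names, coefficients, and data values are invented purely for illustration, not real NIFTY fundamentals:

library(rpart)
set.seed(1)
# Hypothetical fundamentals for 50 stocks
nifty <- data.frame(
  pe        = runif(50, 5, 40),       # price/earnings ratio
  earn_cert = runif(50, 0, 1),        # earnings-certainty score
  roa       = runif(50, -0.05, 0.25)  # return on assets
)
# Simulated forward return driven by the three fundamentals plus noise
nifty$fwd_ret <- -0.002 * nifty$pe + 0.05 * nifty$earn_cert +
  0.5 * nifty$roa + rnorm(50, sd = 0.02)
fit <- rpart(fwd_ret ~ pe + earn_cert + roa, data = nifty,
             method = "anova", control = rpart.control(minsplit = 10))
print(fit)                          # text view of the splits
plot(fit); text(fit, use.n = TRUE)  # visualise (needs at least one split)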
Re: [R] R D COM Excel Add-Ins
Also, I would like to add that in this add-in the rput command never works. Rahul Agarwal -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of David Seres Sent: Friday, October 24, 2008 2:46 PM To: r-help@r-project.org Subject: [R] R D COM Excel Add-Ins Hello all! I have a question regarding the package RDCOMClient. I want to start an Excel file with R, and it works flawlessly except for the fact that add-ins are not loaded. Can someone please explain to me how to load one? Does it work with ex$AddIns$Invoke? Greetings, David [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. Visit our website at http://www.ubs.com This message contains confidential information and is in...{{dropped:16}} __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
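On the add-in question: in Excel's COM object model the AddIns collection has an Add method and each AddIn object an Installed property, so something along the following lines may work from RDCOMClient. The path is hypothetical and this sketch is untested; the exact accessor syntax is an assumption:

library(RDCOMClient)
xl <- COMCreate("Excel.Application")   # start Excel through COM
xl[["Visible"]] <- TRUE
# Hypothetical add-in path; AddIns$Add registers the add-in, and
# setting Installed to TRUE is what actually loads it
ai <- xl[["AddIns"]]$Add("C:/addins/myaddin.xla")
ai[["Installed"]] <- TRUE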
[R] Dataframe with unequal rows
I have a data file with rows of unequal length, with fields separated by commas. I have to read the data first and then calculate the number of commas in each row. How can I do that? Regards Rahul __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
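A minimal sketch, assuming the data sit in a text file (the file name "ragged.txt" is made up): count.fields() reports the number of fields per line, and read.table(fill = TRUE) pads the short rows:

# Fields per line as read.table would see them; commas = fields - 1
nf <- count.fields("ragged.txt", sep = ",")
n_commas <- nf - 1
# The same count taken directly from the raw lines
lines <- readLines("ragged.txt")
n_commas2 <- nchar(lines) - nchar(gsub(",", "", lines, fixed = TRUE))
# Read the ragged file, padding short rows with NA (read.table guesses the
# column count from the first rows, so pass col.names if the longest row comes late)
df <- read.table("ragged.txt", sep = ",", fill = TRUE, header = FALSE)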
Re: [R] R in the NY Times
I believe R as a package has everything; people with little programming knowledge can handle it quite easily. Moreover, even someone with no programming knowledge can learn R without much effort. I also believe that if people in the corporate world start using R instead of other complex, very expensive software, then in this job market we can save many jobs and help many people. -Original Message- From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On Behalf Of ohri2...@gmail.com Sent: Friday, January 09, 2009 12:58 AM To: Louis Bajuk-Yorgan Cc: r-help@r-project.org Subject: Re: [R] R in the NY Times Yes, I think R as a package can really learn from SAS and SPSS in making the GUI more user friendly, even at the risk of dumbing down some complexity. Also, as a consultant I know that selling software requires a lot of marketing follow-ups, which is why R has lagged behind in actual implementation and marketing (who will go on site at a client and implement?), despite being more robust and of course helping companies save costs in these critical times. If you market R more and even get a 10% share of the commercial market, imagine how many jobs you save by cutting down the software costs of the employers. Ajay www.decisionstats.com On 1/8/09, Louis Bajuk-Yorgan wrote: > > As the product manager for S+, I'd like to comment as well. I think > the burgeoning interest in R demonstrates that there's demand for > analytics to solve real, business-critical problems in a broad > spectrum of companies and roles, and that some of the incumbent > analytics offerings, in particular SAS and SPSS, don't sufficiently > meet the growing need for analytics in many major companies. > > S+ (now TIBCO Spotfire S+) is of course a commercial software package > based on the S language, which was a forerunner of R as mentioned in > the article, and has been widely adopted. It is currently used in a > wide variety of areas, including Life Sciences, Financial Services, > and Utilities, for applications such as speeding the analysis of > clinical trial data, optimizing portfolios, and assessing potential > sites for building wind farms. > > I welcome, respect, and appreciate the vitality, creativity, and sheer > productivity of the R community, and the high quality of statistical > methods the community creates. And, because of the close historical > ties between the two products, it is generally easy to port most R > statistics into the commercial S+ environment, and we have worked to > make that easier in recent releases. > > Once in S+, these analytic methods can be incorporated into intuitive > tools for business decision makers and deployed to automated > environments, using visual workflows, web-based applications (using > standard web services), Spotfire Guided Applications for dynamic > visual analysis, and scalable, event-driven architectures using > TIBCO's IT infrastructure. S+ also provides some unique offerings, > such as the ability to flexibly and efficiently analyze very large data sets. > > In this way, I feel companies can maximize the value of their analytic > investments to make rapid business decisions, whether those analytics > are developed in R or S+. > > Regards, > Lou Bajuk-Yorgan > Sr.
Director, Product Management > TIBCO Spotfire Division > lba...@tibco.com > > -Original Message- > From: r-help-boun...@r-project.org > [mailto:r-help-boun...@r-project.org] > On Behalf Of Douglas Bates > Sent: Wednesday, January 07, 2009 12:58 PM > To: marc_schwa...@comcast.net > Cc: r-help@r-project.org > Subject: Re: [R] R in the NY Times > > On Wed, Jan 7, 2009 at 8:50 AM, Marc Schwartz > wrote: >> on 01/07/2009 08:44 AM Kevin E. Thorpe wrote: >>> Zaslavsky, Alan M. wrote: This article is accompanied by nice pictures of Robert and Ross. Data Analysts Captivated by Power of R http://www.nytimes.com/2009/01/07/technology/business-computing/07p r ogram.html January 7, 2009 Data Analysts Captivated by R's Power By ASHLEE VANCE SAS says it has noticed R's rising popularity at universities, despite educational discounts on its own software, but it dismisses the technology as being of interest to a limited set of people working on very hard tasks. "I think it addresses a niche market for high-end data analysts that > want free, readily available code," said Anne H. Milley, director of > technology product marketing at SAS. She adds, "We have customers who build engines for aircraft. I am happy they are not using freeware when I get on a jet." >>> >>> Thanks for posting. Does anyone else find the statement by SAS to >>> be > >>> humourous yet arrogant and short-sighted? >>> >>> Kevin > >> It is an ignorant comment by a marketing person who has been spoon >> fed > >> her lines...it is also a comment being made from a very d
[R] Debug command-how to use
I am getting this error; could anyone tell me why?

if(debug) cat("rahul")
Error in if (debug) cat("rahul") : argument is not interpretable as logical

__ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Debug command-how to use
Thanks a lot... I realised where I was going wrong: if I declare debug = F then my problem is solved. -Original Message- From: jim holtman [mailto:[EMAIL PROTECTED]] Sent: Wednesday, November 12, 2008 9:12 AM To: Agarwal, Rahul-A Cc: r-help@r-project.org Subject: Re: [R] Debug command-how to use What is 'debug' defined as? Include at least the assignment or 'str(debug)'. If you have not assigned anything to it, then 'debug' is a function in the basic set of R functions and may be giving you a message like:

> if (debug) 1
Error in if (debug) 1 : argument is not interpretable as logical
> str(debug)   # here is what it is defined as
function (fun)

On Tue, Nov 11, 2008 at 9:50 PM, <[EMAIL PROTECTED]> wrote: > I am getting this error; could anyone tell me why? > if(debug) cat("rahul") > Error in if (debug) cat("rahul") : > argument is not interpretable as logical > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. > -- Jim Holtman Cincinnati, OH +1 513 646 9390 What is the problem that you are trying to solve? __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
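The underlying point: the condition in if() must be a single logical value (or something coercible to one), and an unassigned name like debug resolves to the function base::debug, which is not. A tiny sketch of the flag idiom, using a name that avoids masking the base function:

verbose <- FALSE            # a plain logical flag; set TRUE to switch tracing on
if (verbose) cat("rahul\n")
# debug <- FALSE also works, but that name shadows the function base::debug()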
Re: [R] Big O, little o
Please refer to An Introduction to Probability and Statistics by Rohatgi and Saleh. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]] On Behalf Of Sohail Sent: Wednesday, November 12, 2008 5:57 PM To: R-help@r-project.org Subject: [R] Big O, little o Sorry for misusing this forum. Please, can anyone refer me to a book or other source for understanding the concepts of big O and little o in probability? Regards, Sohail Chand [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
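For quick reference alongside the book pointer, the standard definitions are:

$X_n = o_p(a_n)$ means $X_n / a_n \to 0$ in probability: for every $\varepsilon > 0$, $P(|X_n / a_n| > \varepsilon) \to 0$ as $n \to \infty$.

$X_n = O_p(a_n)$ means $X_n / a_n$ is bounded in probability (tight): for every $\varepsilon > 0$ there exists an $M_\varepsilon$ such that $P(|X_n / a_n| > M_\varepsilon) < \varepsilon$ for all sufficiently large $n$.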
[R] REXcel problem
> Hi, > > I am trying to run code in Excel using VBA from R, using > RExcel, but every time I am getting this error: Error - 2147220502 in module Recel.Rserver Error running expression Eval(parse(text="setwd(\"H:\\ > I am using R-2.8.0 and R-2.7. > Please help. > > > > << OLE Object: Picture (Device Independent Bitmap) >> > [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
[R] Oja median
Hi Roger, As we know, the Oja median has (finite) breakdown point 2/n, i.e., it is not robust in any reasonable sense, and is quite expensive to compute. So do we have some better methodology to compute a multivariate median? Rahul Agarwal Analyst Equities Quantitative Research UBS_ISC, Hyderabad On Net: 19 533 6363 [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
Re: [R] Oja median
Apologies, Roger, if you find anything wrong with the mail. Thanks for the reference. Is there anything else I can look at, say something in R itself? Regards and thanks once again Rahul Agarwal Analyst Equities Quantitative Research UBS_ISC, Hyderabad On Net: 19 533 6363 -Original Message- From: roger koenker [mailto:[EMAIL PROTECTED]] Sent: Wednesday, November 19, 2008 10:49 PM To: Agarwal, Rahul-A Cc: r-help; [EMAIL PROTECTED] Subject: Re: [R] Oja median Cross posting is sociopathic. The Oja median _is_ robust in several reasonable senses, but admittedly not from a breakdown perspective. There are several other options: for a good review see, e.g., "Multivariate Median", A. Niinimaa and H. Oja, Encyclopedia of Statistical Sciences.

url: www.econ.uiuc.edu/~roger    Roger Koenker
email: [EMAIL PROTECTED]        Department of Economics
vox: 217-333-4558               University of Illinois
fax: 217-244-6678               Champaign, IL 61820

On Nov 19, 2008, at 5:40 AM, <[EMAIL PROTECTED]> wrote: > > Hi Roger, > > As we know, the Oja median has (finite) breakdown point 2/n, i.e., > it is not robust in any reasonable sense, and is quite expensive to > compute. So do we have some better methodology to compute a > multivariate median? > > Rahul Agarwal > Analyst > Equities Quantitative Research > UBS_ISC, Hyderabad > On Net: 19 533 6363 > > [[alternative HTML version deleted]] > > __ > R-help@r-project.org mailing list > https://stat.ethz.ch/mailman/listinfo/r-help > PLEASE do read the posting guide > http://www.R-project.org/posting-guide.html > and provide commented, minimal, self-contained, reproducible code. __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.
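One alternative computable with base R alone is the spatial (L1) median: the point minimising the summed Euclidean distances to the observations, much cheaper than the Oja median and with high breakdown. A minimal sketch; the function name is mine, and contributed packages offer polished implementations:

spatial.median <- function(X) {
  # objective: summed Euclidean distance from candidate mu to every row of X
  obj <- function(mu) sum(sqrt(rowSums(sweep(X, 2, mu)^2)))
  optim(colMeans(X), obj)$par   # Nelder-Mead from the coordinate-wise mean
}
set.seed(42)
X <- matrix(rnorm(300), ncol = 3)   # 100 trivariate observations
spatial.median(X)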
[R] Looking for Financial case study/scenario - Integration R with Highly powerful Database (SAP HANA)
Hi, I am working on a highly powerful analytical database system, SAP HANA. It can crunch a million records in a matter of microseconds and supply the data (in the form of a table/data frame) to any R algorithm/function. I am looking for specific scenarios/use cases/KPIs in the Finance/Insurance sector that bring together the data-crunching power of SAP HANA and the functions available in R. Thanks & Regards, Rahul Rajagopalan Nair [[alternative HTML version deleted]] __ R-help@r-project.org mailing list https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.