Using quantiles does not imply an assumption of normality, unless you drag that assumption in separately. Please go review statistics again, offlist, and come back when you need help with R.
-- 
Sent from my phone. Please excuse my brevity.
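To illustrate the point: quantile breaks depend only on the rank order of the values, not on the shape of their distribution. The following is a minimal sketch on simulated, right-skewed data (an assumption for illustration, not Hemant's data); the variable names and the labels "low", "mid" and "high" are hypothetical.

```r
# Simulated, strongly right-skewed values (illustrative only, not data from this thread)
set.seed(42)
x <- rexp(300, rate = 0.2)

# Tercile breaks: minimum, 1/3 quantile, 2/3 quantile, maximum
breaks <- quantile(x, probs = c(0, 1/3, 2/3, 1))

# Assign each value to one of 3 segments;
# include.lowest = TRUE keeps the minimum in the first bin
segment <- cut(x, breaks = breaks, include.lowest = TRUE,
               labels = c("low", "mid", "high"))

# Counts per segment: the groups are equal-sized despite the skew
counts <- table(segment)
print(counts)
```

One caveat: with heavily tied data (for example a Frequency column dominated by 1s) two breaks can coincide, in which case cut() stops with an error about non-unique breaks and the breaks have to be chosen some other way.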
On October 22, 2017 10:02:57 PM PDT, Hemant Sain <hemantsai...@gmail.com> wrote:
>hello,
>I'm confused what you guys are talking about.
>i just want to set ideal threshold values for my RFM scores which can be
>done using Quantiles but i don't want to use quantiles because my data is
>not normally distributed so it will lead to wrong ranges of breaks. to fix
>this problem I'm looking for an approach which can define the ideal range
>to breaks to categorize RFM scores into 3 segments.
>that's all i want.
>Thanks
>
>On 14 October 2017 at 04:24, Jim Lemon <drjimle...@gmail.com> wrote:
>
>> Hemant's problem is that the indicators are not distributed uniformly.
>> With a uniform distribution, categorization gives a reasonably optimal
>> separation of cases. One approach would be to drop categorization and
>> calculate the overall score as the mean of the standardized indicator
>> scores. Whether this is an option I do not know. I did offer an
>> "eyeball" set of breaks in a previous email, but apparently this was
>> not sufficient.
>>
>> Jim
>>
>> On Sat, Oct 14, 2017 at 4:27 AM, David Winsemius <dwinsem...@comcast.net> wrote:
>> >
>> >> On Oct 13, 2017, at 2:51 AM, PIKAL Petr <petr.pi...@precheza.cz> wrote:
>> >>
>> >> Hi
>> >>
>> >> You expect us to solve your problem but you ignore advice already received.
>> >>
>> >> Your data are unreadable, use dput(yourdata) instead.
>> >> see ?dput
>> >>
>> >>> test<-read.table("clipboard", heade=T)
>> >> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
>> >>   line 115 did not have 6 elements
>> >
>> > I didn't have such a problem: (illustrated with a more minimal example)
>> >
>> > dat <- scan( what=list("",1,"",1L,1L,1),
>> > text="194849 6.99 8/22/2017 9 5 9.996
>> > 194978 14.78 8/28/2017 3 15 16.308
>> > 198614 18.44 7/31/2017 31 1 18.44
>> > 234569 34.99 8/20/2017 11 8 13.5075
>> > 252686 7.99 7/31/2017 31 2 7.99
>> > 291719 21.26 8/25/2017 6 2 15.67
>> > 291787 46.1 8/31/2017 0 2 32.57
>> > 292630 24.34 7/31/2017 31 1 24.34
>> > 295204 21.86 7/18/2017 44 1 21.86
>> > 295989 8.98 8/20/2017 11 2 14.095
>> > 298883 14.38 8/24/2017 7 2 11.185
>> > 308824 10.77 7/31/2017 31 1 10.77")
>> >
>> > names(dat) <- c("user_id", "subtotal_amount", "created_at", "Recency", "Frequency", "Monetary")
>> > dat <- data.frame(dat,stringsAsFactors=FALSE)
>> >
>> > I suspect read.table would also have worked for me, but I was expecting difficulties based on Petr's posting.
>> >
>> > #And ended up with this result (on the original copied data):
>> > > str(dat)
>> > 'data.frame': 500 obs. of 6 variables:
>> >  $ user_id        : chr "194849" "194978" "198614" "234569" ...
>> >  $ subtotal_amount: num 6.99 14.78 18.44 34.99 7.99 ...
>> >  $ created_at     : chr "8/22/2017" "8/28/2017" "7/31/2017" "8/20/2017" ...
>> >  $ Recency        : int 9 3 31 11 31 6 0 31 44 11 ...
>> >  $ Frequency      : int 5 15 1 8 2 2 2 1 1 2 ...
>> >  $ Monetary       : num 10 16.31 18.44 13.51 7.99 ...
>> >
>> > ... but the following criticism seems, well, _critical_ (as in essential for one to address if a reasonable proposal is to be offered.)
>> >
>> >> What is „ideal interval“ can you define it? Should it be such to provide equal number of observations?
>> >
>> > That is the crucial question for you to answer, Hemant. Read the ?quantile help page if your answer is "yes" or even "maybe".
>> >>
>> >> Or maybe you could normalise your values and use quartile method.
>> >
>> > Well, maybe not so much on that last one, Petr. Normalization should not affect the classification based on quartiles. It doesn't change the ordering of the values.
>> >
>> > --
>> > David.
>> >
>> >>
>> >> Cheers
>> >> Petr
>> >>
>> >> From: Hemant Sain [mailto:hemantsai...@gmail.com]
>> >> Sent: Friday, October 13, 2017 8:51 AM
>> >> To: PIKAL Petr <petr.pi...@precheza.cz>
>> >> Cc: r-help mailing list <r-help@r-project.org>
>> >> Subject: Re: [R] How to define proper breaks in RFM analysis
>> >>
>> >> Hey,
>> >> i want to define 3 ideal breaks (bin) for each variable one of those variables is attached in the previous email,
>> >> i don't want to consider quartile method because quartile is not working ideally for that data set because data distribution is non normal.
>> >> so i want you to suggest another method so that i can define 3 breaks with the ideal interval for Recency, frequency and monetary to calculate RFM score.
>> >> i'm again attaching you some of the data set.
>> >> please look into it and help me with the R code.
>> >> Thanks
>> >>
>> >> Data
>> >>
>> >> user_id  subtotal_amount  created_at  Recency  Frequency  Monetary
>> >> 194849   6.99             8/22/2017
>> >>
>> > snipped
>> >
>> >>
>> >> On 13 October 2017 at 10:35, PIKAL Petr <petr.pi...@precheza.cz> wrote:
>> >> Hi
>> >>
>> >> Your statement about attaching data is problematic. We cannot do much with it. Instead use output from dput(yourdata) to show us what exactly your data look like.
>> >>
>> >> We also do not know how you want to split your data. It would be nice if you could also show what the bins should be with the respective data. Unless you provide this information you probably would not get any sensible answer.
>> >>
>> >> Cheers
>> >> Petr
>> >>
>> >>
>> >>> -----Original Message-----
>> >>> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Hemant Sain
>> >>> Sent: Thursday, October 12, 2017 10:18 AM
>> >>> To: r-help mailing list <r-help@r-project.org>
>> >>> Subject: [R] How to define proper breaks in RFM analysis
>> >>>
>> >>> Hello,
>> >>> I'm working on RFM analysis and i wanted to define my own breaks but my
>> >>> frequency distribution is not normally distributed so when I'm using quartile its
>> >>> not giving the optimal results.
>> >>> so I'm looking for a better approach where i can define breaks dynamically
>> >>> because after visualization i can do it easily but i want to apply this model so
>> >>> that it can automatically define the breaks according to data set.
>> >>> I'm attaching sample data for reference.
>> >>>
>> >>> Thanks
>> >>>
>> >>> *Freq*
>> >>> 5
>> >>> 15
>> >>> 1
>> > snipped
>> >> .
>> >>
>> >> [[alternative HTML version deleted]]
>> >>
>> >> ______________________________________________
>> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >
>> > David Winsemius
>> > Alameda, CA, USA
>> >
>> > 'Any technology distinguishable from magic is insufficiently advanced.' -Gehm's Corollary to Clarke's Third Law
>
>--
>hemantsain.com
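
David's remark that normalization does not change a quantile-based classification can be checked directly. A minimal sketch, assuming the twelve Frequency values from the sample rows quoted in this thread and using scale() as one possible standardization; the helper name tercile() is hypothetical.

```r
# Frequency values from the 12 sample rows quoted in the thread
freq <- c(5, 15, 1, 8, 2, 2, 2, 1, 1, 2, 2, 1)

# Hypothetical helper: integer codes 1..3 from tercile breaks
tercile <- function(x) {
  breaks <- quantile(x, probs = c(0, 1/3, 2/3, 1))
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

# Classify the raw values and the standardized (z-scored) values
raw_class <- tercile(freq)
std_class <- tercile(as.numeric(scale(freq)))

# Standardizing is an increasing linear transformation, so the ranks
# and therefore the tercile membership are unchanged
print(identical(raw_class, std_class))
```

The same reasoning applies to any strictly increasing transformation, because it preserves the ordering of the values on which the quantiles are based.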