Hemant's problem is that the indicators are not distributed uniformly. With a uniform distribution, categorization gives a reasonably optimal separation of cases. One approach would be to drop categorization and calculate the overall score as the mean of the standardized indicator scores. Whether this is an option I do not know. I did offer an "eyeball" set of breaks in a previous email, but apparently this was not sufficient.
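To make the "mean of the standardized indicator scores" idea concrete, here is a minimal sketch. It uses only the first six rows of the data quoted further down in this thread. The column names ending in `_z` and the choice to negate Recency (so that a more recent purchase scores higher) are my own assumptions, not something Hemant specified.

```r
# Toy RFM data: the first six rows of the sample quoted later in the thread
rfm <- data.frame(
  Recency   = c(9, 3, 31, 11, 31, 6),
  Frequency = c(5, 15, 1, 8, 2, 2),
  Monetary  = c(9.996, 16.308, 18.44, 13.5075, 7.99, 15.67)
)

# Standardize each indicator to mean 0 and sd 1.
# Assumption: Recency is negated so that "more recent" gives a higher score.
z <- data.frame(
  recency_z   = -as.numeric(scale(rfm$Recency)),
  frequency_z =  as.numeric(scale(rfm$Frequency)),
  monetary_z  =  as.numeric(scale(rfm$Monetary))
)

# Overall score: unweighted mean of the three standardized indicators
rfm$score <- rowMeans(z)

# Customers ordered from highest to lowest overall score
rfm[order(-rfm$score), ]
```

The equal weighting is only a starting point. If one indicator matters more for the business question, a weighted mean of the same z-scores would be the obvious variation.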
Jim

On Sat, Oct 14, 2017 at 4:27 AM, David Winsemius <dwinsem...@comcast.net> wrote:
>
>> On Oct 13, 2017, at 2:51 AM, PIKAL Petr <petr.pi...@precheza.cz> wrote:
>>
>> Hi
>>
>> You expect us to solve your problem but you ignore advice already received.
>>
>> Your data are unreadable, use dput(yourdata) instead. See ?dput
>>
>>> test<-read.table("clipboard", heade=T)
>> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
>>   line 115 did not have 6 elements
>
> I didn't have such a problem: (illustrated with a more minimal example)
>
> dat <- scan( what=list("",1,"",1L,1L,1),
> text="194849 6.99 8/22/2017 9 5 9.996
> 194978 14.78 8/28/2017 3 15 16.308
> 198614 18.44 7/31/2017 31 1 18.44
> 234569 34.99 8/20/2017 11 8 13.5075
> 252686 7.99 7/31/2017 31 2 7.99
> 291719 21.26 8/25/2017 6 2 15.67
> 291787 46.1 8/31/2017 0 2 32.57
> 292630 24.34 7/31/2017 31 1 24.34
> 295204 21.86 7/18/2017 44 1 21.86
> 295989 8.98 8/20/2017 11 2 14.095
> 298883 14.38 8/24/2017 7 2 11.185
> 308824 10.77 7/31/2017 31 1 10.77")
>
> names(dat) <- c("user_id", "subtotal_amount", "created_at", "Recency",
> "Frequency", "Monetary")
> dat <- data.frame(dat,stringsAsFactors=FALSE)
>
> I suspect read.table would also have worked for me, but I was expecting
> difficulties based on Petr's posting.
>
>
> #And ended up with this result (on the original copied data):
>> str(dat)
> 'data.frame': 500 obs. of 6 variables:
> $ user_id : chr "194849" "194978" "198614" "234569" ...
> $ subtotal_amount: num 6.99 14.78 18.44 34.99 7.99 ...
> $ created_at : chr "8/22/2017" "8/28/2017" "7/31/2017" "8/20/2017" ...
> $ Recency : int 9 3 31 11 31 6 0 31 44 11 ...
> $ Frequency : int 5 15 1 8 2 2 2 1 1 2 ...
> $ Monetary : num 10 16.31 18.44 13.51 7.99 ...
>
> ... but the following criticism seems, well, _critical_ (as in essential for
> one to address if a reasonable proposal is to be offered.)
>
>
>> What is „ideal interval“? Can you define it? Should it be such as to provide
>> an equal number of observations?
>
> That is the crucial question for you to answer, Hemant. Read the ?quantile
> help page if your answer is "yes" or even "maybe".
>>
>> Or maybe you could normalise your values and use quartile method.
>
> Well, maybe not so much on that last one, Petr. Normalization should not
> affect the classification based on quartiles. It doesn't change the ordering
> of the values.
>
> --
> David.
>
>>
>> Cheers
>> Petr
>>
>> From: Hemant Sain [mailto:hemantsai...@gmail.com]
>> Sent: Friday, October 13, 2017 8:51 AM
>> To: PIKAL Petr <petr.pi...@precheza.cz>
>> Cc: r-help mailing list <r-help@r-project.org>
>> Subject: Re: [R] How to define proper breaks in RFM analysis
>>
>> Hey,
>> I want to define 3 ideal breaks (bins) for each variable; one of those
>> variables is attached in the previous email.
>> I don't want to consider the quartile method because it is not working
>> ideally for that data set, because the data distribution is non-normal.
>> So I want you to suggest another method so that I can define 3 breaks with
>> the ideal interval for Recency, Frequency and Monetary to calculate the RFM
>> score.
>> I'm again attaching some of the data set.
>> Please look into it and help me with the R code.
>> Thanks
>>
>>
>> Data
>>
>> user_id subtotal_amount created_at Recency Frequency Monetary
>> 194849 6.99 8/22/2017
>>
> snipped
>
>>
>>
>> On 13 October 2017 at 10:35, PIKAL Petr <petr.pi...@precheza.cz> wrote:
>> Hi
>>
>> Your statement about attaching data is problematic. We cannot do much with
>> it. Instead use output from dput(yourdata) to show us what exactly your data
>> look like.
>>
>> We also do not know how you want to split your data. It would be nice if
>> you could also show what the bins should be with the respective data. Unless
>> you provide this information you probably will not get any sensible answer.
>>
>> Cheers
>> Petr
>>
>>
>>> -----Original Message-----
>>> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Hemant Sain
>>> Sent: Thursday, October 12, 2017 10:18 AM
>>> To: r-help mailing list <r-help@r-project.org>
>>> Subject: [R] How to define proper breaks in RFM analysis
>>>
>>> Hello,
>>> I'm working on RFM analysis and I wanted to define my own breaks, but my
>>> frequency distribution is not normally distributed, so when I'm using
>>> quartiles it's not giving the optimal results.
>>> So I'm looking for a better approach where I can define breaks dynamically,
>>> because after visualization I can do it easily, but I want to apply this
>>> model so that it can automatically define the breaks according to the
>>> data set.
>>> I'm attaching sample data for reference.
>>>
>>> Thanks
>>>
>>> *Freq*
>>> 5
>>> 15
>>> 1
> snipped
>> .
>>
>> [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>
> 'Any technology distinguishable from magic is insufficiently advanced.'
> -Gehm's Corollary to Clarke's Third Law
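On David's point in the quoted thread that normalization does not change a quantile-based classification, a small sketch may help. It takes the Frequency column of the twelve rows David posted, cuts it at tertiles (three bins, as Hemant asked for), and repeats the exercise on the standardized values. The helper name `tertile_bins` is made up for this illustration.

```r
# Frequency values from the twelve sample rows quoted in the thread
freq <- c(5, 15, 1, 8, 2, 2, 2, 1, 1, 2, 2, 1)

# Hypothetical helper: cut x into three bins at its own tertiles
tertile_bins <- function(x) {
  breaks <- quantile(x, probs = c(0, 1/3, 2/3, 1))
  cut(x, breaks = breaks, include.lowest = TRUE, labels = FALSE)
}

raw_bins    <- tertile_bins(freq)
scaled_bins <- tertile_bins(as.numeric(scale(freq)))

# Standardizing is a monotone transformation, so the bin membership
# should be the same before and after
identical(raw_bins, scaled_bins)

# Number of customers per bin
table(raw_bins)
```

One caveat that bears on the original complaint: with heavily tied, skewed counts, `quantile()` can return duplicated break points, and `cut()` then stops with an error saying the breaks are not unique. That does not happen with these twelve values, but it is one way the quantile method can fail on data like this.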