Hemant's problem is that the indicators are not distributed uniformly.
With a uniform distribution, categorization gives a reasonably optimal
separation of cases. One approach would be to drop categorization and
calculate the overall score as the mean of the standardized indicator
scores. Whether this is an option I do not know. I did offer an
"eyeball" set of breaks in a previous email, but apparently this was
not sufficient.
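
A minimal sketch of the uncategorized score, assuming the three indicator columns from the 12-row example David quoted below. The sign flip on Recency is my assumption (fewer days since the last purchase counted as better), and the names rfm, z and score are only illustrative:

```r
# Indicator columns copied from the 12-row example quoted below
rfm <- data.frame(
  Recency   = c(9, 3, 31, 11, 31, 6, 0, 31, 44, 11, 7, 31),
  Frequency = c(5, 15, 1, 8, 2, 2, 2, 1, 1, 2, 2, 1),
  Monetary  = c(9.996, 16.308, 18.44, 13.5075, 7.99, 15.67,
                32.57, 24.34, 21.86, 14.095, 11.185, 10.77)
)

# Standardize each indicator to mean 0 and standard deviation 1
z <- as.data.frame(scale(rfm))

# Assumption: a smaller Recency is better, so reverse its sign
z$Recency <- -z$Recency

# Overall score = mean of the standardized indicators, no binning involved
rfm$score <- rowMeans(z)

# Customers ordered from highest to lowest score
rfm[order(-rfm$score), ]
```

Whether equal weighting of the three indicators is appropriate is a separate question; a weighted mean would work the same way.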

Jim

On Sat, Oct 14, 2017 at 4:27 AM, David Winsemius <dwinsem...@comcast.net> wrote:
>
>> On Oct 13, 2017, at 2:51 AM, PIKAL Petr <petr.pi...@precheza.cz> wrote:
>>
>> Hi
>>
>> You expect us to solve your problem but you ignore advice already received.
>>
>> Your data are unreadable, use dput(yourdata) instead. see ?dput
>>
>>> test<-read.table("clipboard", header=TRUE)
>> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
>>  line 115 did not have 6 elements
>
> I didn't have such a problem: (illustrated with a more minimal example)
>
> dat <-  scan( what=list("",1,"",1L,1L,1),
>              text="194849 6.99 8/22/2017 9 5 9.996
> 194978 14.78 8/28/2017 3 15 16.308
> 198614 18.44 7/31/2017 31 1 18.44
> 234569 34.99 8/20/2017 11 8 13.5075
> 252686 7.99 7/31/2017 31 2 7.99
> 291719 21.26 8/25/2017 6 2 15.67
> 291787 46.1 8/31/2017 0 2 32.57
> 292630 24.34 7/31/2017 31 1 24.34
> 295204 21.86 7/18/2017 44 1 21.86
> 295989 8.98 8/20/2017 11 2 14.095
> 298883 14.38 8/24/2017 7 2 11.185
> 308824 10.77 7/31/2017 31 1 10.77")
>
> names(dat) <- c("user_id", "subtotal_amount", "created_at", "Recency", 
> "Frequency", "Monetary")
> dat <- data.frame(dat,stringsAsFactors=FALSE)
>
> I suspect read.table would also have worked for me, but I was expecting 
> difficulties based on Petr's posting.
>
>
> #And ended up with this result (on the original copied data):
>> str(dat)
> 'data.frame':   500 obs. of  6 variables:
>  $ user_id        : chr  "194849" "194978" "198614" "234569" ...
>  $ subtotal_amount: num  6.99 14.78 18.44 34.99 7.99 ...
>  $ created_at     : chr  "8/22/2017" "8/28/2017" "7/31/2017" "8/20/2017" ...
>  $ Recency        : int  9 3 31 11 31 6 0 31 44 11 ...
>  $ Frequency      : int  5 15 1 8 2 2 2 1 1 2 ...
>  $ Monetary       : num  10 16.31 18.44 13.51 7.99 ...
>
> ...  but the following criticism seems, well, _critical_ (as in essential for 
> one to address if a reasonable proposal is to be offered.)
>
>
>> What is an „ideal interval“? Can you define it? Should it be such as to 
>> provide an equal number of observations?
>
> That is the crucial question for you to answer, Hemant. Read the ?quantile 
> help page if your answer is "yes" or even "maybe".
>>
>> Or maybe you could normalise your values and use quartile method.
>
> Well, maybe not so much on that last one, Petr. Normalization should not 
> affect the classification based on quartiles. It doesn't change the ordering 
> of the values.
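
To illustrate David's point, here is a rough sketch using the Monetary values from his 12-row example. The helper tercile_bin is hypothetical (not from any package), and three quantile-based bins are assumed because Hemant asked for three breaks:

```r
# Monetary values from the 12-row example quoted above
monetary <- c(9.996, 16.308, 18.44, 13.5075, 7.99, 15.67,
              32.57, 24.34, 21.86, 14.095, 11.185, 10.77)

# Hypothetical helper: assign each value to one of three quantile-based bins
tercile_bin <- function(x) {
  breaks <- quantile(x, probs = c(0, 1/3, 2/3, 1))
  as.integer(cut(x, breaks = breaks, include.lowest = TRUE))
}

bins_raw <- tercile_bin(monetary)                     # raw values
bins_z   <- tercile_bin(as.numeric(scale(monetary)))  # standardized values

table(bins_raw)               # bin sizes on the raw scale
identical(bins_raw, bins_z)   # compares the two classifications
```

With heavily tied data such as Frequency, the quantile breaks may coincide, and cut() will then stop with an error about non-unique breaks, which may be part of why the quartile approach looked unsatisfactory.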
>
> --
> David.
>
>>
>> Cheers
>> Petr
>>
>> From: Hemant Sain [mailto:hemantsai...@gmail.com]
>> Sent: Friday, October 13, 2017 8:51 AM
>> To: PIKAL Petr <petr.pi...@precheza.cz>
>> Cc: r-help mailing list <r-help@r-project.org>
>> Subject: Re: [R] How to define proper breaks in RFM analysis
>>
>> Hey,
>> I want to define 3 ideal breaks (bins) for each variable; one of those 
>> variables was attached in the previous email.
>> I don't want to use the quartile method because it does not work well 
>> for this data set: the data distribution is non-normal.
>> So I'd like you to suggest another method so that I can define 3 breaks with 
>> the ideal interval for Recency, Frequency and Monetary to calculate the RFM 
>> score.
>> I'm again attaching some of the data set.
>> Please look into it and help me with the R code.
>> Thanks
>>
>>
>>
>> Data
>>
>> user_id  subtotal_amount  created_at  Recency  Frequency  Monetary
>> 194849   6.99             8/22/2017
>>
> snipped
>
>>
>>
>> On 13 October 2017 at 10:35, PIKAL Petr <petr.pi...@precheza.cz> wrote:
>> Hi
>>
>> Your statement about attaching data is problematic. We cannot do much with 
>> it. Instead use output from dput(yourdata) to show us what exactly your data 
>> look like.
>>
>> We also do not know how you want to split your data. It would be nice if 
>> you could also show what the bins should be, with the respective data. Unless 
>> you provide this information you probably will not get any sensible answer.
>>
>> Cheers
>> Petr
>>
>>
>>> -----Original Message-----
>>> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Hemant Sain
>>> Sent: Thursday, October 12, 2017 10:18 AM
>>> To: r-help mailing list <r-help@r-project.org>
>>> Subject: [R] How to define proper breaks in RFM analysis
>>>
>>> Hello,
>>> I'm working on RFM analysis and I want to define my own breaks, but my
>>> frequency variable is not normally distributed, so quartiles do not
>>> give optimal results.
>>> I'm looking for a better approach where I can define breaks dynamically.
>>> After visualization I can do it easily by hand, but I want to apply this
>>> model so that it automatically defines the breaks according to the data set.
>>> I'm attaching sample data for reference.
>>>
>>> Thanks
>>>
>>>                           *Freq*
>>> 5
>>> 15
>>> 1
> snipped
>> .
>>
>>       [[alternative HTML version deleted]]
>>
>> ______________________________________________
>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> https://stat.ethz.ch/mailman/listinfo/r-help
>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> and provide commented, minimal, self-contained, reproducible code.
>
> David Winsemius
> Alameda, CA, USA
>
> 'Any technology distinguishable from magic is insufficiently advanced.'   
> -Gehm's Corollary to Clarke's Third Law
>
