Re: [R] How to define proper breaks in RFM analysis

Jeff Newmiller Mon, 23 Oct 2017 00:41:55 -0700

Using quantiles does not imply assumption of normality, unless you drag that 
assumption in separately. Please go review statistics again, offlist, and come 
back when you need help with R.
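
To illustrate the point about quantiles, here is a minimal sketch in R. It uses only the Monetary column from the 12-row sample quoted further down. The helper name quantile_bins is made up for this illustration and is not from any package. The sketch cuts the values at their empirical terciles, then repeats the cut after standardizing and after a log transform, to check whether the grouping changes.

```r
# Monetary values from the 12-row sample quoted below (right-skewed).
monetary <- c(9.996, 16.308, 18.44, 13.5075, 7.99, 15.67,
              32.57, 24.34, 21.86, 14.095, 11.185, 10.77)

# Hypothetical helper: cut x into n groups at its empirical quantiles.
# No distributional assumption is involved; only the ordering of x matters.
quantile_bins <- function(x, n = 3) {
  breaks <- quantile(x, probs = seq(0, 1, length.out = n + 1))
  cut(x, breaks = breaks, labels = seq_len(n), include.lowest = TRUE)
}

score_raw    <- quantile_bins(monetary)
score_scaled <- quantile_bins(as.numeric(scale(monetary)))  # centred, unit sd
score_log    <- quantile_bins(log(monetary))                # monotone transform

table(score_raw)                    # group sizes
identical(score_raw, score_scaled)  # same classification after standardizing?
identical(score_raw, score_log)     # same classification after log?
```

On this sample each tercile holds four customers, and the grouping is the same for the raw, standardized and log-transformed values. One caveat: with heavily tied data (such as the Frequency column) two quantile breaks can coincide, and cut() then stops with a "'breaks' are not unique" error, so the breaks would need to be de-duplicated or chosen another way.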
-- 
Sent from my phone. Please excuse my brevity.


On October 22, 2017 10:02:57 PM PDT, Hemant Sain <hemantsai...@gmail.com> wrote:
>Hello,
>I'm confused about what you are all talking about.
>I just want to set ideal threshold values for my RFM scores. This can be
>done using quantiles, but I don't want to use quantiles because my data
>are not normally distributed, so quantiles will lead to the wrong ranges
>for the breaks. To fix this problem I'm looking for an approach that can
>define the ideal ranges of breaks to categorize RFM scores into 3
>segments.
>That's all I want.
>Thanks
>
>
>On 14 October 2017 at 04:24, Jim Lemon <drjimle...@gmail.com> wrote:
>
>> Hemant's problem is that the indicators are not distributed uniformly.
>> With a uniform distribution, categorization gives a reasonably optimal
>> separation of cases. One approach would be to drop categorization and
>> calculate the overall score as the mean of the standardized indicator
>> scores. Whether this is an option I do not know. I did offer an
>> "eyeball" set of breaks in a previous email, but apparently this was
>> not sufficient.
>>
>> Jim
>>
>> On Sat, Oct 14, 2017 at 4:27 AM, David Winsemius <dwinsem...@comcast.net> wrote:
>> >
>> >> On Oct 13, 2017, at 2:51 AM, PIKAL Petr <petr.pi...@precheza.cz> wrote:
>> >>
>> >> Hi
>> >>
>> >> You expect us to solve your problem but you ignore advice already received.
>> >>
>> >> Your data are unreadable, use dput(yourdata) instead. see ?dput
>> >>
>> >>> test<-read.table("clipboard", heade=T)
>> >> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
>> >>  line 115 did not have 6 elements
>> >
>> > I didn't have such a problem: (illustrated with a more minimal example)
>> >
>> > dat <-  scan( what=list("",1,"",1L,1L,1),
>> >              text="194849 6.99 8/22/2017 9 5 9.996
>> > 194978 14.78 8/28/2017 3 15 16.308
>> > 198614 18.44 7/31/2017 31 1 18.44
>> > 234569 34.99 8/20/2017 11 8 13.5075
>> > 252686 7.99 7/31/2017 31 2 7.99
>> > 291719 21.26 8/25/2017 6 2 15.67
>> > 291787 46.1 8/31/2017 0 2 32.57
>> > 292630 24.34 7/31/2017 31 1 24.34
>> > 295204 21.86 7/18/2017 44 1 21.86
>> > 295989 8.98 8/20/2017 11 2 14.095
>> > 298883 14.38 8/24/2017 7 2 11.185
>> > 308824 10.77 7/31/2017 31 1 10.77")
>> >
>> > names(dat) <- c("user_id", "subtotal_amount", "created_at",
>> >                 "Recency", "Frequency", "Monetary")
>> > dat <- data.frame(dat, stringsAsFactors=FALSE)
>> >
>> > I suspect read.table would also have worked for me, but I was expecting
>> > difficulties based on Petr's posting.
>> >
>> >
>> > #And ended up with this result (on the original copied data):
>> >> str(dat)
>> > 'data.frame':   500 obs. of  6 variables:
>> >  $ user_id        : chr  "194849" "194978" "198614" "234569" ...
>> >  $ subtotal_amount: num  6.99 14.78 18.44 34.99 7.99 ...
>> >  $ created_at     : chr  "8/22/2017" "8/28/2017" "7/31/2017" "8/20/2017" ...
>> >  $ Recency        : int  9 3 31 11 31 6 0 31 44 11 ...
>> >  $ Frequency      : int  5 15 1 8 2 2 2 1 1 2 ...
>> >  $ Monetary       : num  10 16.31 18.44 13.51 7.99 ...
>> >
>> > ...  but the following criticism seems, well, _critical_ (as in
>> > essential for one to address if a reasonable proposal is to be offered.)
>> >
>> >
>> >> What is an "ideal interval"? Can you define it? Should it be chosen
>> >> to provide an equal number of observations?
>> >
>> > That is the crucial question for you to answer, Hemant. Read the
>> > ?quantile help page if your answer is "yes" or even "maybe".
>> >>
>> >> Or maybe you could normalise your values and use quartile method.
>> >
>> > Well, maybe not so much on that last one, Petr. Normalization should not
>> > affect the classification based on quartiles. It doesn't change the
>> > ordering of the values.
>> >
>> > --
>> > David.
>> >
>> >>
>> >> Cheers
>> >> Petr
>> >>
>> >> From: Hemant Sain [mailto:hemantsai...@gmail.com]
>> >> Sent: Friday, October 13, 2017 8:51 AM
>> >> To: PIKAL Petr <petr.pi...@precheza.cz>
>> >> Cc: r-help mailing list <r-help@r-project.org>
>> >> Subject: Re: [R] How to define proper breaks in RFM analysis
>> >>
>> >> Hey,
>> >> I want to define 3 ideal breaks (bins) for each variable; one of those
>> >> variables is attached in the previous email.
>> >> I don't want to consider the quartile method because it does not work
>> >> well for this data set: the data distribution is non-normal.
>> >> So I want you to suggest another method so that I can define 3 breaks
>> >> with the ideal interval for Recency, Frequency and Monetary to calculate
>> >> the RFM score.
>> >> I'm attaching some of the data set again.
>> >> Please look into it and help me with the R code.
>> >> Thanks
>> >>
>> >>
>> >>
>> >> Data
>> >>
>> >> user_id  subtotal_amount  created_at  Recency  Frequency  Monetary
>> >> 194849   6.99             8/22/2017
>> >>
>> > snipped
>> >
>> >>
>> >>
>> >> On 13 October 2017 at 10:35, PIKAL Petr <petr.pi...@precheza.cz> wrote:
>> >> Hi
>> >>
>> >> Your statement about attaching data is problematic. We cannot do much
>> >> with it. Instead use output from dput(yourdata) to show us exactly what
>> >> your data look like.
>> >>
>> >> We also do not know how you want to split your data. It would be nice
>> >> if you could also show what the bins should be for the respective data.
>> >> Unless you provide this information you probably will not get any
>> >> sensible answer.
>> >>
>> >> Cheers
>> >> Petr
>> >>
>> >>
>> >>> -----Original Message-----
>> >>> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Hemant Sain
>> >>> Sent: Thursday, October 12, 2017 10:18 AM
>> >>> To: r-help mailing list <r-help@r-project.org>
>> >>> Subject: [R] How to define proper breaks in RFM analysis
>> >>>
>> >>> Hello,
>> >>> I'm working on RFM analysis and I want to define my own breaks, but my
>> >>> frequency data are not normally distributed, so when I use quartiles
>> >>> the results are not optimal.
>> >>> So I'm looking for a better approach where I can define breaks
>> >>> dynamically. After visualization I can do it easily by hand, but I
>> >>> want to apply a model that automatically defines the breaks according
>> >>> to the data set.
>> >>> I'm attaching sample data for reference.
>> >>>
>> >>> Thanks
>> >>>
>> >>>                           *Freq*
>> >>> 5
>> >>> 15
>> >>> 1
>> > snipped
>> >> .
>> >>
>> >>       [[alternative HTML version deleted]]
>> >>
>> >> ______________________________________________
>> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >
>> > David Winsemius
>> > Alameda, CA, USA
>> >
>> > 'Any technology distinguishable from magic is insufficiently advanced.'
>> >  -Gehm's Corollary to Clarke's Third Law
>> >
>>
>
>
>
>-- 
>hemantsain.com
>

