Using quantiles does not imply assumption of normality, unless you drag that 
assumption in separately. Please go review statistics again, offlist, and come 
back when you need help with R.
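
For what it is worth, here is a minimal sketch of that point in R. It uses simulated right-skewed values rather than Hemant's data, and the names freq, score and score_z are made up for illustration. Quantile breaks depend only on the ranks of the values, so three roughly equal-sized segments come out however skewed the data are, and a strictly increasing rescaling such as standardization gives the same segments.

```r
# Simulated, strongly right-skewed "Frequency" values (hypothetical data,
# not the data from this thread).
set.seed(42)
freq <- rexp(300, rate = 0.2)

# Tertile break points. Nothing here assumes a normal distribution:
# quantile() only sorts the values and picks cut points by position.
brks  <- quantile(freq, probs = c(0, 1/3, 2/3, 1))
score <- cut(freq, breaks = brks, labels = 1:3, include.lowest = TRUE)
table(score)   # group sizes are equal up to ties

# Standardizing first (a strictly increasing transform) keeps the ordering,
# so the quantile-based segments do not change.
z       <- as.numeric(scale(freq))
brks_z  <- quantile(z, probs = c(0, 1/3, 2/3, 1))
score_z <- cut(z, breaks = brks_z, labels = 1:3, include.lowest = TRUE)
identical(score, score_z)
```

Whether equal-sized segments are the "ideal" ones is a separate question that only the analyst can answer. If they are not, the breaks argument of cut() accepts any hand-chosen cut points instead.
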
-- 
Sent from my phone. Please excuse my brevity.

On October 22, 2017 10:02:57 PM PDT, Hemant Sain <hemantsai...@gmail.com> wrote:
>Hello,
>I'm confused about what you guys are talking about.
>I just want to set ideal threshold values for my RFM scores, which can be
>done using quantiles, but I don't want to use quantiles because my data is
>not normally distributed, so it will lead to wrong ranges of breaks. To fix
>this problem, I'm looking for an approach which can define the ideal range
>of breaks to categorize RFM scores into 3 segments.
>That's all I want.
>Thanks
>
>
>On 14 October 2017 at 04:24, Jim Lemon <drjimle...@gmail.com> wrote:
>
>> Hemant's problem is that the indicators are not distributed uniformly.
>> With a uniform distribution, categorization gives a reasonably optimal
>> separation of cases. One approach would be to drop categorization and
>> calculate the overall score as the mean of the standardized indicator
>> scores. Whether this is an option I do not know. I did offer an
>> "eyeball" set of breaks in a previous email, but apparently this was
>> not sufficient.
>>
>> Jim
>>
>> On Sat, Oct 14, 2017 at 4:27 AM, David Winsemius <dwinsem...@comcast.net> wrote:
>> >
>> >> On Oct 13, 2017, at 2:51 AM, PIKAL Petr <petr.pi...@precheza.cz> wrote:
>> >>
>> >> Hi
>> >>
>> >> You expect us to solve your problem but you ignore advice already received.
>> >>
>> >> Your data are unreadable, use dput(yourdata) instead. see ?dput
>> >>
>> >>> test <- read.table("clipboard", header = TRUE)
>> >> Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec,  :
>> >>  line 115 did not have 6 elements
>> >
>> > I didn't have such a problem: (illustrated with a more minimal example)
>> >
>> > dat <-  scan( what=list("",1,"",1L,1L,1),
>> >              text="194849 6.99 8/22/2017 9 5 9.996
>> > 194978 14.78 8/28/2017 3 15 16.308
>> > 198614 18.44 7/31/2017 31 1 18.44
>> > 234569 34.99 8/20/2017 11 8 13.5075
>> > 252686 7.99 7/31/2017 31 2 7.99
>> > 291719 21.26 8/25/2017 6 2 15.67
>> > 291787 46.1 8/31/2017 0 2 32.57
>> > 292630 24.34 7/31/2017 31 1 24.34
>> > 295204 21.86 7/18/2017 44 1 21.86
>> > 295989 8.98 8/20/2017 11 2 14.095
>> > 298883 14.38 8/24/2017 7 2 11.185
>> > 308824 10.77 7/31/2017 31 1 10.77")
>> >
>> > names(dat) <- c("user_id", "subtotal_amount", "created_at",
>> >                 "Recency", "Frequency", "Monetary")
>> > dat <- data.frame(dat,stringsAsFactors=FALSE)
>> >
>> > I suspect read.table would also have worked for me, but I was expecting
>> > difficulties based on Petr's posting.
>> >
>> >
>> > #And ended up with this result (on the original copied data):
>> >> str(dat)
>> > 'data.frame':   500 obs. of  6 variables:
>> >  $ user_id        : chr  "194849" "194978" "198614" "234569" ...
>> >  $ subtotal_amount: num  6.99 14.78 18.44 34.99 7.99 ...
>> >  $ created_at     : chr  "8/22/2017" "8/28/2017" "7/31/2017" "8/20/2017" ...
>> >  $ Recency        : int  9 3 31 11 31 6 0 31 44 11 ...
>> >  $ Frequency      : int  5 15 1 8 2 2 2 1 1 2 ...
>> >  $ Monetary       : num  10 16.31 18.44 13.51 7.99 ...
>> >
>> > ...  but the following criticism seems, well, _critical_ (as in
>> > essential for one to address if a reasonable proposal is to be offered.)
>> >
>> >
>> >> What is an „ideal interval“? Can you define it? Should it be such as to
>> >> provide an equal number of observations?
>> >
>> > That is the crucial question for you to answer, Hemant. Read the
>> > ?quantile help page if your answer is "yes" or even "maybe".
>> >>
>> >> Or maybe you could normalise your values and use quartile method.
>> >
>> > Well, maybe not so much on that last one, Petr. Normalization should not
>> > affect the classification based on quartiles. It doesn't change the
>> > ordering of the values.
>> >
>> > --
>> > David.
>> >
>> >>
>> >> Cheers
>> >> Petr
>> >>
>> >> From: Hemant Sain [mailto:hemantsai...@gmail.com]
>> >> Sent: Friday, October 13, 2017 8:51 AM
>> >> To: PIKAL Petr <petr.pi...@precheza.cz>
>> >> Cc: r-help mailing list <r-help@r-project.org>
>> >> Subject: Re: [R] How to define proper breaks in RFM analysis
>> >>
>> >> Hey,
>> >> I want to define 3 ideal breaks (bins) for each variable; one of those
>> >> variables is attached in the previous email.
>> >> I don't want to consider the quartile method because it is not
>> >> working ideally for that data set, as the data distribution is non-normal.
>> >> So I want you to suggest another method so that I can define 3 breaks
>> >> with the ideal interval for Recency, Frequency and Monetary to calculate
>> >> the RFM score.
>> >> I'm again attaching some of the data set.
>> >> Please look into it and help me with the R code.
>> >> Thanks
>> >>
>> >>
>> >>
>> >> Data
>> >>
>> >> user_id  subtotal_amount  created_at  Recency  Frequency  Monetary
>> >> 194849   6.99             8/22/2017
>> >>
>> > snipped
>> >
>> >>
>> >>
>> >> On 13 October 2017 at 10:35, PIKAL Petr <petr.pi...@precheza.cz> wrote:
>> >> Hi
>> >>
>> >> Your statement about attaching data is problematic. We cannot do much
>> >> with it. Instead, use output from dput(yourdata) to show us what exactly
>> >> your data look like.
>> >>
>> >> We also do not know how you want to split your data. It would be
>> >> nice if you could also show what the bins should be for the respective data.
>> >> Unless you provide this information, you probably will not get any sensible
>> >> answer.
>> >>
>> >> Cheers
>> >> Petr
>> >>
>> >>
>> >>> -----Original Message-----
>> >>> From: R-help [mailto:r-help-boun...@r-project.org] On Behalf Of Hemant Sain
>> >>> Sent: Thursday, October 12, 2017 10:18 AM
>> >>> To: r-help mailing list <r-help@r-project.org>
>> >>> Subject: [R] How to define proper breaks in RFM analysis
>> >>>
>> >>> Hello,
>> >>> I'm working on RFM analysis and I wanted to define my own breaks, but my
>> >>> frequency distribution is not normally distributed, so when I'm using
>> >>> quartiles it's not giving the optimal results.
>> >>> So I'm looking for a better approach where I can define breaks dynamically,
>> >>> because after visualization I can do it easily, but I want to apply this
>> >>> model so that it can automatically define the breaks according to the data set.
>> >>> I'm attaching sample data for reference.
>> >>>
>> >>> Thanks
>> >>>
>> >>>                           *Freq*
>> >>> 5
>> >>> 15
>> >>> 1
>> > snipped
>> >> .
>> >>
>> >>       [[alternative HTML version deleted]]
>> >>
>> >> ______________________________________________
>> >> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> >> https://stat.ethz.ch/mailman/listinfo/r-help
>> >> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> >> and provide commented, minimal, self-contained, reproducible code.
>> >
>> > David Winsemius
>> > Alameda, CA, USA
>> >
>> > 'Any technology distinguishable from magic is insufficiently advanced.'
>> >  -Gehm's Corollary to Clarke's Third Law
>> >
>>
>
>
>
>-- 
>hemantsain.com

