Re: [R] Problem with data distribution

Neha gupta Thu, 17 Feb 2022 11:56:32 -0800

Dear John, thanks a lot for the detailed answer.

Yes, I am not an expert in the R language; when a problem comes up, I google
it or post it on these forums. (I have only a little experience with ML in
R.)

On Thu, Feb 17, 2022 at 8:21 PM John Fox <j...@mcmaster.ca> wrote:

> Dear Neha Gupta,
>
> On 2022-02-17 1:54 p.m., Neha gupta wrote:
> > Hello everyone
> >
> > I have a dataset with output variable "bug" having the following values (at
> > the bottom of this email). My advisor asked me to provide the distribution
> > of bugs with 0 values and bugs with more than 0 values.
> >
> > data = readARFF("synapse.arff")
> > data2 = readARFF("synapse.arff")
> > data$bug
> > library(tidyverse)
> > data %>%
> >    filter(bug == 0)
> > data2 %>%
> >    filter(bug >= 1)
> > boxplot(data2$bug, data$bug, range=0)
> >
> > But both plots are exactly the same. How is that possible? What am I
> > doing wrong?
>
> As it turns out, you're doing several things wrong.
>
> First, you're not using pipes and filter() correctly. That is, you don't
> do anything with the filtered versions of the data sets. You're
> apparently under the incorrect impression that filtering modifies the
> original data set.
>
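Because filter() (like base R's subset()) returns a new data frame rather than changing its input, the result has to be assigned before it can be plotted. A minimal base-R sketch, assuming a data frame `data` with a numeric `bug` column; only the first eight posted values are used, and the names `zeros` and `nonzero` are illustrative:

```r
# First eight values of the posted bug column, for illustration
data <- data.frame(bug = c(0, 1, 0, 0, 0, 1, 2, 0))

# subset() (like dplyr::filter()) returns a NEW data frame; if the result
# is not assigned, it is printed and discarded, and `data` is unchanged
subset(data, bug == 0)
nrow(data)  # still 8 rows

# Assign the filtered results so that they can be used afterwards
zeros   <- subset(data, bug == 0)
nonzero <- subset(data, bug >= 1)

# With dplyr loaded, the equivalent assignments would be
#   zeros   <- data %>% filter(bug == 0)
#   nonzero <- data %>% filter(bug >= 1)
# and the two groups could then be compared with
#   boxplot(zeros$bug, nonzero$bug, range = 0)
```
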
> Second, you're greatly complicating a simple problem. You don't need to
> read the data twice and keep two versions of the data set. As well,
> processing the data with pipes and filter() is entirely unnecessary. The
> following code works:
>
>     with(data, boxplot(bug[bug == 0], bug[bug >= 1], range=0))
>
> Third, and most fundamentally, the parallel boxplots you're apparently
> trying to construct don't really make sense. The first "boxplot" is just
> a horizontal line at 0 and so conveys no information. Why not just plot
> the nonzero values if that's what you're interested in?
>
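One way to follow that suggestion, sketched in base R under the assumption that `data` has a numeric `bug` column (again using only the first eight posted values): summarize the zeros by counting them, and plot only the nonzero values. The names `zero_vs_nonzero` and `nonzero_bugs` are illustrative.

```r
data <- data.frame(bug = c(0, 1, 0, 0, 0, 1, 2, 0))

# Count zero versus nonzero observations; for the zeros a count says
# everything a boxplot could (it would just be a flat line at 0)
zero_vs_nonzero <- table(ifelse(data$bug == 0, "zero", "nonzero"))
print(zero_vs_nonzero)

# Distribution of the nonzero values only; range = 0 extends the
# whiskers to the minimum and maximum
nonzero_bugs <- with(data, bug[bug > 0])
boxplot(nonzero_bugs, range = 0, ylab = "bug (nonzero values only)")
```
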
> Fourth, you didn't share your data in a convenient form. I was able to
> reconstruct them via
>
>    bug <- scan()
>    0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0
>    0 4 1 0
>    0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0
>    0 0 0 0
>    1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0
>    7 0 0 1
>    0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0
>    0 1 0 0
>    0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0
>    0 0 0 1
>    0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
>
>    data <- data.frame(bug)
>
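Two base-R tools help with sharing data in a convenient form. When code is run non-interactively (for example with source() or Rscript), scan() with no file argument may not read the lines that follow it, so passing the values through text= is more robust; and dput() prints R code that recreates an object exactly, which can be pasted into a post. A minimal sketch using only the first eight posted values:

```r
# Pass the values as a string so the code also works in a script
bug  <- scan(text = "0 1 0 0 0 1 2 0")
data <- data.frame(bug)

# Print code that recreates `data`; this text can be pasted into a post
dput(data)

# Round trip: evaluating the dput() output gives back an identical object
recreated <- eval(parse(text = capture.output(dput(data))))
stopifnot(identical(recreated, data))
```
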
> Finally, it's better to post to the list in plain-text email rather than
> HTML (as the posting guide suggests).
>
> I hope this helps,
>   John
>
> >
> >
> > data$bug
> >   [1] 0 1 0 0 0 1 2 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 2 0 0 0 0 1 0 0 0 0 0 0 4 1 0
> >  [40] 0 1 0 0 0 0 0 0 1 0 3 2 0 0 0 0 3 0 0 0 0 2 0 0 0 1 0 0 0 0 1 1 1 0 0 0 0 0 0
> >  [79] 1 1 2 1 0 1 0 0 0 2 2 1 1 0 0 0 0 0 0 1 0 0 1 0 0 1 0 0 5 0 0 0 0 0 0 7 0 0 1
> > [118] 0 1 1 0 2 0 3 0 1 0 0 1 0 0 0 0 0 1 1 0 0 0 0 1 0 3 2 1 1 0 0 0 0 0 0 0 1 0 0
> > [157] 0 0 0 0 0 0 0 0 0 1 0 1 0 0 3 0 0 1 0 1 3 0 0 0 0 0 0 0 0 1 0 4 1 1 0 0 0 0 1
> > [196] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 3 0 1 0 0 0 0 0
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
> --
> John Fox, Professor Emeritus
> McMaster University
> Hamilton, Ontario, Canada
> web: https://socialsciences.mcmaster.ca/jfox/
>
>
