Re: [R] Need ideas on how to show spikes in my data and how to code it in R

Daniel Folkinshteyn Mon, 23 Jun 2008 13:11:25 -0700

on 06/23/2008 03:40 PM Thomas Frööjd said the following:

1.       Shift the mean and std on the reference dataset to the mean
and std of my clinic birth weight data.

to shift the mean by any distance, just add or subtract that distance from each observation (e.g., to move the mean from m1 to m2, add (m2 - m1) to each observation).


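to change the stddev from, say, s1 to s2, multiply each observation's deviation from the mean by s2/s1. (multiplying the raw observations by s2/s1 also gives stddev s2, but it scales the mean by the same factor, so do that before shifting the mean, not after.)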

instead of shifting the ref dataset to the mean/sdev of the other dataset, it might be more intuitive to transform both to mean=0, sdev=1.
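
a minimal sketch of both options, in case it helps. the data here are simulated purely for illustration (not real birth weights), and rescale_to is a hypothetical helper name, not something from base R:

```r
# synthetic weights in grams, made up for illustration only
set.seed(1)
ref    <- rnorm(20000, mean = 3500, sd = 500)   # stand-in for the reference data
clinic <- rnorm(3000,  mean = 3400, sd = 550)   # stand-in for the clinic data

# hypothetical helper: move x to a target mean (m2) and stddev (s2)
# deviations from the mean are scaled first, then the new mean is added
rescale_to <- function(x, m2, s2) {
  (x - mean(x)) * (s2 / sd(x)) + m2
}

ref_shifted <- rescale_to(ref, mean(clinic), sd(clinic))

# alternative: standardize both datasets to mean 0, sdev 1
ref_z    <- as.numeric(scale(ref))
clinic_z <- as.numeric(scale(clinic))
```

scale() returns a one-column matrix, hence the as.numeric() around it.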

2.       Scale the data so they can be plotted on the same axis. The
reference dataset has around 20 000 observations and my data from the
clinic only around 3000 so I have to fix this otherwise the plot of
the reference dataset will be much bigger in the graph.

if you do a density plot (see ?density in R), it will automatically be scaled. if you want the histogram scaled too, then after calculating the histogram frequencies for the reference data, multiply them by the ratio of the number of obs in your data to the number of obs in the reference data (i.e.: NOBS_yourdata / NOBS_refdata).

but i'd say, you might do better to just work with a density plot and set the appropriate bandwidth parameter, rather than working with a histogram, for presentational purposes.
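
if you do go the histogram route, the count scaling could look roughly like this. again the data are simulated for illustration, and the 100 g bin width is an assumption taken from how the reference data are described as being recorded:

```r
# synthetic weights in grams, rounded to 100 g; made up for illustration only
set.seed(2)
ref    <- round(rnorm(20000, mean = 3500, sd = 500), -2)
clinic <- round(rnorm(3000,  mean = 3500, sd = 500), -2)

# shared 100 g bins, centered on the recorded values
breaks <- seq(min(ref, clinic) - 50, max(ref, clinic) + 50, by = 100)

h_ref    <- hist(ref,    breaks = breaks, plot = FALSE)
h_clinic <- hist(clinic, breaks = breaks, plot = FALSE)

# scale reference counts down to the size of the clinic sample
ratio      <- length(clinic) / length(ref)
ref_scaled <- h_ref$counts * ratio

# clinic histogram with the scaled reference counts drawn as a line
plot(h_clinic, main = "clinic vs. scaled reference", xlab = "weight (g)")
lines(h_ref$mids, ref_scaled, lwd = 2)
```

after scaling, the reference counts sum to the number of clinic observations, so the two sit on the same y axis.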

3.       Plot both on the same graph. The reference dataset like a
density plot and my dataset as a histogram, that means weight bins on
the x axis and number of observations on y. It should be added that my
reference dataset isn't truly continuous but recorded at 100g
intervals. This means both datasets have the same grouping however
plotting both as histogram would probably make it harder to understand
for a person with little training in statistics. This means that the
reference dataset "density function" has to be smoothed somehow.

see ?density, set the appropriate bandwidth parameter to achieve your desired degree of smoothing.
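
to get a feel for the bandwidth, something along these lines might help. the data are simulated for illustration, and the bw values (20 and 100) are arbitrary picks to show the contrast, not recommendations:

```r
# synthetic reference weights in grams, recorded in 100 g steps (illustration only)
set.seed(3)
ref <- round(rnorm(20000, mean = 3500, sd = 500), -2)

# bw is the standard deviation of the (default gaussian) smoothing kernel
d_narrow <- density(ref, bw = 20)    # narrow: a peak shows up at every 100 g step
d_wide   <- density(ref, bw = 100)   # wide: the 100 g steps are smoothed away

# adjust multiplies the default bandwidth instead of setting it directly
d_adj <- density(ref, adjust = 2)

plot(d_narrow, main = "effect of bandwidth", xlab = "weight (g)")
lines(d_wide, lwd = 2)
```

either bw or adjust works; adjust is handy when you only want "a bit more" or "a bit less" smoothing than the default.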

I would be very thankful for help on any of those steps. Also if you
think this approach is wrong for some reason please tell me.

i think you'd have a much easier time of it (and also a better-looking and more informative plot), if you plot both as density on the same plot, and forgo the histogram overlay. your reference dataset will be a nice smooth curve, as long as you choose a wide enough bandwidth to avoid showing peaks every 100g, and your target dataset will have large peaks at 2, 2.5, 3, etc., which will look nicely salient. :)
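
a rough sketch of that two-density plot. everything here is simulated for illustration: the 30% share of clinic weights pushed to the nearest 500 g is an invented stand-in for the spikes, and the shared bw of 100 is just one choice that hides the 100 g steps while leaving the 500 g spikes visible:

```r
# synthetic data in grams, made up for illustration only
set.seed(4)
ref    <- round(rnorm(20000, mean = 3500, sd = 500), -2)
clinic <- round(rnorm(3000,  mean = 3500, sd = 500), -2)

# invented heaping: about 30% of clinic weights rounded to the nearest 500 g
heaped <- runif(length(clinic)) < 0.3
clinic[heaped] <- round(clinic[heaped] / 500) * 500

# same bandwidth for both, so the curves are directly comparable
d_ref    <- density(ref,    bw = 100)
d_clinic <- density(clinic, bw = 100)

plot(d_ref, ylim = range(0, d_ref$y, d_clinic$y), lwd = 2,
     main = "reference vs. clinic", xlab = "weight (g)")
lines(d_clinic, col = "red")
legend("topright", legend = c("reference", "clinic"),
       col = c("black", "red"), lwd = c(2, 1))
```

both curves integrate to roughly 1, so the different sample sizes need no extra scaling.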

also, unless the babies in both plots are from different species :), you probably don't need to transform the data to equalize means and variances.


-d

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
