on 06/23/2008 03:40 PM Thomas Frööjd said the following:
1.       Shift the mean and std on the reference dataset to the mean
and std of my clinic birth weight data.

to shift the mean by any distance, just add or subtract that distance from each observation (e.g., to move the mean from m1 to m2, add (m2 - m1) to each observation).

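to change the stddev from, say, s1 to s2, multiply each observation's deviation from the mean by s2/s1, i.e. replace x with m + (x - m) * s2/s1, where m is the current mean. (multiplying the raw observations by s2/s1 would rescale the mean as well, so either scale around the mean or do the scaling first and shift the mean afterwards.)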
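
a minimal sketch of both steps in R, on simulated data. the vectors `ref` and `clinic` and the helper `rescale_to` are made up for illustration and are not from the original post. the scaling is applied to deviations from the mean, so changing the stddev does not move the mean:

```r
# Hypothetical helper: linearly map x so that it ends up with mean m2 and sd s2.
rescale_to <- function(x, m2, s2) {
  m1 <- mean(x)
  s1 <- sd(x)
  # Scale the deviations from the old mean, then re-centre on the new mean.
  m2 + (x - m1) * (s2 / s1)
}

set.seed(1)
ref    <- rnorm(20000, mean = 3500, sd = 550)  # simulated reference weights (g)
clinic <- rnorm(3000,  mean = 3400, sd = 500)  # simulated clinic weights (g)

ref_shifted <- rescale_to(ref, mean(clinic), sd(clinic))

# Compare the moments of the transformed reference data with the clinic data.
c(mean = mean(ref_shifted), sd = sd(ref_shifted))
c(mean = mean(clinic),      sd = sd(clinic))
```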

instead of shifting the ref dataset to the mean/sdev of the other dataset, it might be more intuitive to transform both to mean=0, sdev=1.
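
standardizing both datasets could look roughly like this, using base R's `scale()`. the data are simulated and the variable names are assumptions for the sake of the example:

```r
set.seed(2)
ref    <- rnorm(20000, mean = 3500, sd = 550)  # simulated reference weights (g)
clinic <- rnorm(3000,  mean = 3400, sd = 500)  # simulated clinic weights (g)

# scale() centres on the mean and divides by the sd; it returns a one-column
# matrix, so as.numeric() is used to get a plain vector back.
ref_z    <- as.numeric(scale(ref))
clinic_z <- as.numeric(scale(clinic))

c(mean = mean(ref_z),    sd = sd(ref_z))
c(mean = mean(clinic_z), sd = sd(clinic_z))
```

note that after this both datasets have identical mean and stddev by construction, so only differences in shape remain visible.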

2.       Scale the data so they can be plotted on the same axis. The
reference dataset has around 20 000 observations and my data from the
clinic only around 3000 so I have to fix this otherwise the plot of
the reference dataset will be much bigger in the graph.

if you do a density plot (see ?density in R), it will automatically be scaled, since a density integrates to 1 regardless of the number of observations. if you want the histogram scaled too, then after calculating the histogram frequencies for the reference data, multiply them by the ratio of the number of observations in your data to the number of observations in the reference data (i.e., NOBS_yourdata / NOBS_refdata).
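
a sketch of the histogram rescaling, assuming simulated weights rounded to 100 g and a shared set of 100 g bins (the data, the bin edges and the variable names are all illustrative):

```r
set.seed(3)
ref    <- round(rnorm(20000, mean = 3500, sd = 550), -2)  # simulated, 100 g steps
clinic <- round(rnorm(3000,  mean = 3400, sd = 500), -2)  # simulated, 100 g steps

# Common 100 g bins centred on the recorded values.
breaks <- seq(min(ref, clinic) - 50, max(ref, clinic) + 50, by = 100)

h_ref    <- hist(ref,    breaks = breaks, plot = FALSE)
h_clinic <- hist(clinic, breaks = breaks, plot = FALSE)

# Scale the reference counts down to the clinic sample size.
ratio      <- length(clinic) / length(ref)
ref_scaled <- h_ref$counts * ratio

# Clinic data as a histogram, scaled reference counts as a line on top.
plot(h_clinic, main = "Clinic vs. scaled reference", xlab = "Weight (g)",
     ylim = c(0, max(h_clinic$counts, ref_scaled)))
lines(h_ref$mids, ref_scaled, col = "red")
```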

but i'd say that, for presentational purposes, you might do better to just work with a density plot and set an appropriate bandwidth, rather than working with a histogram.

3.       Plot both on the same graph. The reference dataset like a
density plot and my dataset as a histogram, that means weight bins on
the x axis and number of observations on y. It should be added that my
reference dataset isn't truly continuous but recorded at 100g
intervals. This means both datasets have the same grouping however
plotting both as histogram would probably make it harder to understand
for a person with little training in statistics. This means that the
reference dataset "density function" has to be smoothed somehow.

see ?density, and set the bandwidth argument (bw) to achieve your desired degree of smoothing.
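
to get a feel for the effect of the bandwidth, something like the following may help. the data are simulated and the bw values (20 and 150, in grams) are arbitrary illustrative choices, not recommendations:

```r
set.seed(4)
ref <- round(rnorm(20000, mean = 3500, sd = 550), -2)  # simulated, 100 g steps

# bw is the standard deviation of the smoothing kernel, in the units of the data.
d_rough  <- density(ref, bw = 20)   # narrow kernel: the 100 g grid shows through
d_smooth <- density(ref, bw = 150)  # wide kernel: the grid is smoothed away

plot(d_rough, main = "Effect of bandwidth", xlab = "Weight (g)", col = "grey")
lines(d_smooth, lwd = 2)
legend("topright", legend = c("bw = 20", "bw = 150"),
       col = c("grey", "black"), lwd = c(1, 2))
```

the `adjust` argument of `density()` is an alternative: it multiplies the default bandwidth, so `adjust = 2` means twice as much smoothing as the default.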

I would be very thankful for help on any of those steps. Also if you
think this approach is wrong for some reason please tell me.

i think you'd have a much easier time of it (and also a better-looking and more informative plot) if you plot both as densities on the same plot and forgo the histogram overlay. your reference dataset will be a nice smooth curve, as long as you choose a wide enough bandwidth to avoid showing peaks every 100g, and your target dataset will have large peaks at 2, 2.5, 3, etc., which will stand out nicely. :)
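
a rough sketch of such an overlay, again on simulated data. the variable names, the common bandwidth of 150 g and the plot labels are assumptions you would want to adapt to the real data:

```r
set.seed(5)
ref    <- round(rnorm(20000, mean = 3500, sd = 550), -2)  # simulated reference
clinic <- round(rnorm(3000,  mean = 3400, sd = 500), -2)  # simulated clinic

# Both curves integrate to 1, so the different sample sizes do not matter.
d_ref    <- density(ref,    bw = 150)
d_clinic <- density(clinic, bw = 150)

plot(d_ref, main = "Birth weight: reference vs. clinic", xlab = "Weight (g)",
     ylim = c(0, max(d_ref$y, d_clinic$y)))
lines(d_clinic, lty = 2)
legend("topright", legend = c("reference", "clinic"), lty = c(1, 2))
```

using a smaller bw for the clinic data than for the reference data is also possible if you want the clinic curve to show more detail.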

also, unless the babies in both plots are from different species :), you probably don't need to transform the data to equalize means and variances.

-d

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
