On Nov 27, 2011, at 12:15 AM, Jeffrey Joh wrote:


I'm trying to do the second case among Jim's suggestions. I used Bert's suggestion and it works great.

I would also like to ask if anyone is familiar with a package for making box-plots. I would like to bin my datapoints at defined X intervals and display a boxplot for each bin on the same chart.

Combining `cut` (to define the intervals) and `boxplot` should be fairly straightforward.
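
A minimal sketch of that combination, assuming a data frame with columns named x and y. The simulated data, the interval width of 10, and the column name "bin" are illustrative choices, not anything from the original question:

```r
# Simulated stand-in for the real data (assumption: columns named x and y)
set.seed(1)
xypoints <- data.frame(x = runif(10000, 0, 100))
xypoints$y <- 2 + 0.05 * xypoints$x + rnorm(10000)

# Bin x into intervals of width 10; include.lowest = TRUE keeps a value
# exactly equal to the lowest break in the first bin
breaks <- seq(0, 100, by = 10)
xypoints$bin <- cut(xypoints$x, breaks = breaks, include.lowest = TRUE)

# One box per bin, all on the same chart
boxplot(y ~ bin, data = xypoints, las = 2, xlab = "", ylab = "y")
```

Any x value outside the breaks becomes NA and is dropped from the plot, so the breaks should span the full range of x.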


In Stata, there is a tool for making these, and it varies the width of the boxplot based on the number of points in each plot.

We have a tool for that, too. Study `quantile` a bit, to automatically pick cutpoints that will divide into approximately equal groups.
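
As a rough illustration on simulated data (the variable names are made up for the example): the `varwidth` argument of `boxplot` draws each box with a width proportional to the square root of the number of points in its group, and `quantile` can supply cutpoints that give roughly equal group sizes.

```r
set.seed(2)
x <- rexp(10000)                        # skewed, so fixed-width bins have unequal counts
y <- sqrt(x) + rnorm(10000, sd = 0.3)

# Fixed-width bins: varwidth = TRUE scales each box by sqrt(n) of its bin
fixed_bins <- cut(x, breaks = pretty(x, n = 8), include.lowest = TRUE)
boxplot(y ~ fixed_bins, varwidth = TRUE, las = 2, xlab = "")

# Quantile cutpoints: ten bins holding roughly equal numbers of points
qbreaks <- quantile(x, probs = seq(0, 1, by = 0.1))
quant_bins <- cut(x, breaks = qbreaks, include.lowest = TRUE)
table(quant_bins)
boxplot(y ~ quant_bins, las = 2, xlab = "")
```

With quantile cutpoints the boxes all hold about the same number of points, so `varwidth` adds little there. It is most informative with fixed-width intervals.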

(I use the `cut2` function in the Hmisc package, because it is integrated with `rms` that I use all the time, and because its defaults for cut()-ting are more to my liking. It also has a "g=" parameter that automates the cut( ..., quantile(...)) processing.)
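
A small sketch of the "g=" route on simulated data, assuming Hmisc is installed. The base-R fallback is only there so the example still runs without it:

```r
set.seed(3)
x <- rexp(10000)
y <- sqrt(x) + rnorm(10000, sd = 0.3)

if (requireNamespace("Hmisc", quietly = TRUE)) {
  # g = 10 asks cut2 for ten quantile groups
  bins <- Hmisc::cut2(x, g = 10)
} else {
  # Base-R fallback with the same intent
  bins <- cut(x, quantile(x, seq(0, 1, by = 0.1)), include.lowest = TRUE)
}

boxplot(y ~ bins, las = 2, xlab = "")
```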



I am hoping there is a similar tool for R.

Thank you,
Jeffrey

----------------------------------------
Date: Tue, 22 Nov 2011 18:51:05 +1100
From: j...@bitwrit.com.au
To: johjeff...@hotmail.com
CC: r-help@r-project.org
Subject: Re: [R] Binned line plot

On 11/22/2011 04:29 PM, Jeffrey Joh wrote:

I have a scatter plot with 10000 points. I would like to add a line that bins every 50 points and connects the average of each bin. I'm looking for something similar to line type "m" in Stata.

With this dataset of 10000 points, I would also like to bin the data and make boxplots at certain intervals, so that I have a set of boxplots to represent each bin. I would also like the width of each box to be proportional to the number of points in each bin.

How can I make these plots? Is there a simple package to use?

Hi Jeffrey,
There are three possibilities that come to mind:

1) You want to bin the points based on their order in the data frame.

2) You want to bin the points based on the x or y values of the coordinates.

3) You want to bin the points based on the x _and_ y values of the
coordinates.

Number 1 is trivial and has already been answered (assume a two column
data frame of coordinates named "xypoints").

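# Case 1: average each run of 50 consecutive rows
# (10000 rows give 200 bins; columns are assumed to be named "x" and "y")
meanx<-rep(0,200)
meany<-rep(0,200)
for(index in 1:200) {
 # rows start to start+49 make up bin number "index"
 start<-1+50*(index-1)
 meanx[index]<-mean(xypoints[start:(start+49),"x"])
 meany[index]<-mean(xypoints[start:(start+49),"y"])
}
# join the bin averages with a line
plot(meanx,meany,type="l")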
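
The same binning can be sketched without an explicit loop by building a group index and using `tapply`. This is an alternative to the loop above, not a correction of it. The simulated data and the name "bin" are assumptions for illustration:

```r
set.seed(4)
xypoints <- data.frame(x = runif(10000), y = runif(10000))

# Group index: rows 1-50 get 1, rows 51-100 get 2, and so on
bin <- (seq_len(nrow(xypoints)) - 1) %/% 50 + 1

# Mean of each coordinate within each bin
meanx <- as.vector(tapply(xypoints$x, bin, mean))
meany <- as.vector(tapply(xypoints$y, bin, mean))

plot(meanx, meany, type = "l")
```

This form also copes with a row count that is not a multiple of 50: the last bin is simply smaller.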

Number 2 requires that you sort the pairs on whichever coordinate (x or y) you want to bin by, then apply the same process as 1 to the sorted pairs. Number 3
is somewhat more difficult.
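
For number 2, a minimal sketch that assumes the binning is on x, using simulated data. It sorts the rows by x, averages every 50 sorted points, and draws the averages over the scatter plot:

```r
set.seed(5)
xypoints <- data.frame(x = runif(10000, 0, 100))
xypoints$y <- sin(xypoints$x / 15) + rnorm(10000, sd = 0.5)

# Sort the pairs by x, then bin every 50 consecutive rows of the sorted data
sorted <- xypoints[order(xypoints$x), ]
bin <- (seq_len(nrow(sorted)) - 1) %/% 50 + 1
meanx <- as.vector(tapply(sorted$x, bin, mean))
meany <- as.vector(tapply(sorted$y, bin, mean))

# Scatter plot with the binned averages joined by a line
plot(xypoints$x, xypoints$y, pch = ".", col = "grey50")
lines(meanx, meany, col = "red", lwd = 2)
```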

I don't do this much, and some of the people who do map analysis will
probably come up with a much better method.

Find the most extreme point.
Find the 49 points closest to that point to constitute group 1.
Remove those points from the data frame.
Go back to the first step if there are any points left.

You will end up with 200 groups of points that are spatially grouped.
Get the centroids and plot as above.
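
The steps above might be sketched as follows. "Most extreme" is not defined in the description, so this assumes it means the remaining point farthest from the mean of the remaining points. The data are simulated and kept to 1000 points so it runs quickly:

```r
set.seed(6)
pts <- data.frame(x = rnorm(1000), y = rnorm(1000))
group_size <- 50
group <- integer(nrow(pts))        # group label for every point
remaining <- seq_len(nrow(pts))    # row numbers not yet assigned
g <- 0

while (length(remaining) > 0) {
  g <- g + 1
  rx <- pts$x[remaining]
  ry <- pts$y[remaining]
  # Assumption: "most extreme" = farthest from the centre of what is left
  d_centre <- (rx - mean(rx))^2 + (ry - mean(ry))^2
  seed <- which.max(d_centre)
  # The seed itself (distance 0) plus its 49 nearest neighbours form a group
  d_seed <- (rx - rx[seed])^2 + (ry - ry[seed])^2
  take <- order(d_seed)[seq_len(min(group_size, length(remaining)))]
  group[remaining[take]] <- g
  remaining <- remaining[-take]    # remove the grouped points
}

# Centroid of each group
centx <- as.vector(tapply(pts$x, group, mean))
centy <- as.vector(tapply(pts$y, group, mean))
plot(pts$x, pts$y, pch = ".", col = "grey50")
points(centx, centy, pch = 19, col = "red")
```

The centroids are drawn as points rather than a line here, because the order in which the groups are found has no natural left-to-right meaning.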

Another wild guess from

Jim
                        

David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
