Dear R users,

I am using UK census data on travel to work. The authorities have provided a breakdown for each area by mode (car, bicycle etc.) and by distance travelled (0 – 2 km, 2 – 5 km etc.). After processing, the data for Sheffield look like this (https://files.one.ubuntu.com/ej2VtVbJTEaelvMRlsocRg):
    dshef <- read.table("distmodesheff.csv", sep=",", header=TRUE)
    print(dshef)

         Dist  Tr Bici  Met  Pas  Foot   Bus   Car
    1     2 >  45  571  491 2125 16644  4469 13494
    2   2 – 5  80 1136 2540 4738  3659 17290 30212
    3  5 – 10 217  466 2335 3994  1041 12963 35221
    4 10 – 20 191   76  491 1333   332  2439 16322
    5 20 – 30 168    6   25  235    41   175  3711
    6 30 – 40  78    6    3  122    20    74  2179
    7 40 – 60 349    6   21  261    96   333  3501
    8    60 < 332   62  125  369   534   433  3276
    9   Other 148   40   79  905   388   622  6481

It's interesting to look at the different distributions of the different transport modes:

    attach(dshef)
    # one row per mode, one column per distance band
    rs <- rbind(Tr, Bici, Met, Pas, Foot, Bus, Car)
    barplot(rs, beside=TRUE, names.arg=Dist, col=rainbow(7), legend.text=TRUE)

http://r.789695.n4.nabble.com/file/n3758198/1.png

This is brilliant, and creates output similar to that of OO calc:

http://r.789695.n4.nabble.com/file/n3758198/egraphmini.jpg

However, as you can see, the pre-made categories (0 – 2 km etc.) are unevenly spaced bins within a continuous variable. This puts the analysis into histogram mode (with frequency represented by the area of each bar, not its height). What I am looking for, for the vector Car for example, is something like this:

    # expand the counts into one arbitrary representative value per bin
    n <- c(rep(1.5, Car[1]), rep(3, Car[2]), rep(7.7, Car[3]), rep(15, Car[4]),
           rep(25, Car[5]), rep(35, Car[6]), rep(50, Car[7]), rep(100, Car[8]))
    hist(n, breaks=c(0, 2, 5, 10, 20, 30, 40, 60, 200))

http://r.789695.n4.nabble.com/file/n3758198/2.png

This produces a histogram, but it's a tedious and ugly way of getting there. It also does not allow for trend-line analysis of the likely distribution of the continuous variable distance: lines(density(n)), for example, results in peaks around my arbitrary values.

Has anyone else encountered similar issues? I've searched high and low but can find no solution other than creating a barplot with variable widths:

http://r.789695.n4.nabble.com/Histogram-using-frequency-data-td827927.html

Any ideas about how to resolve this issue would be greatly appreciated.
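One possible way to avoid the rep() expansion, offered only as a sketch: assemble an object of class "histogram" by hand from the bin edges and the counts, then plot it. The counts below are the Car column from the table above (the "Other" row is left out), and the 200 km upper edge for the open-ended "60 <" bin is the same arbitrary limit used in the hist() call, so it is an assumption rather than something the census data provide. The names breaks, car, widths, dens and h are illustrative.

```r
# Bin edges in km; the last bin ("60 <") is closed at an assumed 200 km
breaks <- c(0, 2, 5, 10, 20, 30, 40, 60, 200)

# Car counts for the eight distance bins, copied from the Sheffield table
car <- c(13494, 30212, 35221, 16322, 3711, 2179, 3501, 3276)

widths <- diff(breaks)
# Height of each bar so that its area equals the share of commuters in the bin
dens <- car / (sum(car) * widths)

# Build the list that hist() would normally return
h <- structure(
  list(breaks   = breaks,
       counts   = car,
       density  = dens,
       mids     = breaks[-1] - widths / 2,
       xname    = "car distance to work (km)",
       equidist = FALSE),
  class = "histogram")

# freq = FALSE plots densities, so area (not height) carries the frequency
plot(h, freq = FALSE, xlab = "Distance (km)", main = "Car commuters, Sheffield")
```

This should draw the same bars as hist(n, breaks=...) above, without creating a vector with one element per commuter. It does not help with the density() problem: a smooth curve still needs some model of how distances are spread within each bin.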
Eventually I hope to model the distribution of distances travelled in order to estimate the mean distance within each bin.

Many thanks,

Robin