[R] Producing multiple analyses (histograms/kernel densities) of network timings between groups

Jack Challen Wed, 14 Aug 2013 12:16:46 -0700

(This is a repost from a little while ago. I assume my mail got silently 
bounced because I used some rather strange email routing. If it did get 
through, and I simply haven't seen it or a response, then please accept my 
apologies.)
Hi,


I'm new to R, and new to statistics. I'm *trying* to learn R, but I'm 
struggling with the R-intro, mainly (I think) due to the fact that I have no 
background in stats, and some of the language is unfamiliar to me (I started 
with C and Perl, mainly) so I might use the wrong terms. I think the "R in 
action" book might help, but recommendations are welcome.


I have a whole bunch of network timings (ICMP echos) between different groups 
of nodes using two different networks. I want to compare the timings between 
the groups and across networks, as I /believe/ that one network has much 
greater variability than the other. I want to prove this, one way or the other, 
and I think a graphical view of the ~20000 results would help. The initial 
histograms/kernel densities I've produced so far support that theory (i.e. they 
look a bit like the Normal distribution, but one network is much more 
"stretched out" and "bumpy"), but I've resorted to pre-processing that data in 
Perl in order to produce the graphs. I think R can be used to do all of this in 
one.

For each network, I have files like this:

===
RoomA RoomB 0.34
RoomC RoomA 0.12
RoomB RoomA 0.12
===

The columns are: From, To, and Time taken. There are 4 rooms in total.
The data's unsorted, and there will be multiple pairs (i.e. I haven't done 
de-duplication of pairings via the handshake algorithm, I just pinged 
everything from everything). There will be multiple entries for each pairing.

The graphs I think I want to produce are:

For "From RoomA", overlay each timing graph for every other room. That means 
there will be 4 kernel densities (well actually I'd take a histogram plotted as 
a line, as I think that's more appropriate, and I don't know what a kernel 
density is) on one graph.
I'd also like to do the above for "From RoomB", "From RoomC", and "From RoomD", 
so I'd end up with 4 graphs (all with the same xlim/ylim) each with 4 
lines plotted. I'd eventually like those produced as vector Postscript for 
inclusion in a report, but I think I can handle that with ?postscript() and 
?layout().
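One possible way to lay this out in base R is sketched below. It is only an illustration under stated assumptions: the timings are simulated with rnorm() rather than read from the real files, every room is assumed to have pinged every room (itself included), density() is used for the curves (a histogram drawn as a line could be substituted), and the output file name is hypothetical.

```r
# Simulated stand-in data: 100 timings for each directed pairing of 4 rooms
set.seed(42)
rooms <- c("RoomA", "RoomB", "RoomC", "RoomD")
timings <- expand.grid(From = rooms, To = rooms, rep = 1:100,
                       stringsAsFactors = FALSE)
timings$Time <- rnorm(nrow(timings), mean = 0.3, sd = 0.05)

# Estimate one density per pairing up front, so that common axis limits
# can be worked out before anything is drawn
dens <- lapply(split(timings, timings$From), function(d) {
  lapply(split(d$Time, d$To), density)
})
all_d <- unlist(dens, recursive = FALSE)   # flat list of 16 density objects
xlim <- range(sapply(all_d, function(d) range(d$x)))
ylim <- c(0, max(sapply(all_d, function(d) max(d$y))))

# Hypothetical output file; paper = "special" with onefile = FALSE gives EPS
out_file <- file.path(tempdir(), "ping_densities.eps")
postscript(out_file, width = 8, height = 8, horizontal = FALSE,
           onefile = FALSE, paper = "special")
layout(matrix(1:4, nrow = 2, byrow = TRUE))  # 2 x 2 grid, one panel per From

for (from in names(dens)) {
  # Empty panel with the shared limits, then one line per destination room
  plot(xlim, ylim, type = "n", main = paste("From", from),
       xlab = "Time", ylab = "Density")
  to_rooms <- names(dens[[from]])
  for (i in seq_along(to_rooms)) {
    lines(dens[[from]][[to_rooms[i]]], col = i, lty = i)
  }
  legend("topright", legend = to_rooms,
         col = seq_along(to_rooms), lty = seq_along(to_rooms))
}
invisible(dev.off())
```

Computing all the densities before plotting is what makes the shared xlim/ylim possible; drawing each panel with its own defaults would give four different scales.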

I've got as far as importing the data with
Foo <- read.table("eth_ping_timings.dat", col.names=c("From", "To", "Time"))
Then I can do "standard" simple operations on Foo$Time. "Factoring" (if that is 
indeed the term) is where I fall down. I simply don't know how to break out the 
pairings.
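Breaking out the pairings might look something like the sketch below, which uses subset(), split() and aggregate() from base R. The data frame is simulated so that the example runs on its own (the name Foo and the column names follow the read.table call above; the values are invented and every room is assumed to ping every room).

```r
# Simulated stand-in for the imported data: 50 timings per directed pairing
set.seed(1)
rooms <- c("RoomA", "RoomB", "RoomC", "RoomD")
pairs <- expand.grid(From = rooms, To = rooms, stringsAsFactors = FALSE)
Foo <- pairs[rep(seq_len(nrow(pairs)), each = 50), ]
Foo$Time <- rnorm(nrow(Foo), mean = 0.3, sd = 0.05)

# Keep only the rows that originate in RoomA
from_a <- subset(Foo, From == "RoomA")

# split() groups the Time values by destination, giving a named list
# with one numeric vector per To room
by_dest <- split(from_a$Time, from_a$To)
print(sapply(by_dest, sd))   # spread of timings from RoomA to each room

# The same grouping for every From/To pairing at once, as a data frame
sd_by_pair <- aggregate(Time ~ From + To, data = Foo, FUN = sd)
print(head(sd_by_pair))
```

The list returned by split() can be passed straight to lapply() with hist() or density(), while the aggregate() result gives one standard deviation per pairing for comparing variability between networks.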

Is R actually the way to go for this? I feel pretty confident I could cobble 
together some Perl which produces Postscript to describe the curves, but I 
suspect that once I produce these graphs, I will immediately think of 
other questions to ask, and R sounds like it's the proper tool to ask those 
questions.

cheers
jack


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
