(This is a repost from a little while ago. I assume my mail got silently
bounced because I used some rather strange email routing. If it did get
through, and I simply haven't seen it or a response, then please accept my
apologies.)
Hi,
I'm new to R, and new to statistics. I'm *trying* to learn R, but I'm
struggling with the R-intro, mainly (I think) because I have no background in
stats and some of the language is unfamiliar to me (I started with C and Perl,
mainly), so I might use the wrong terms. I think the "R in Action" book might
help, but recommendations are welcome.
I have a whole bunch of network timings (ICMP echos) between different groups
of nodes using two different networks. I want to compare the timings between
the groups and across networks, as I /believe/ that one network has much
greater variability than the other. I want to prove this, one way or the other,
and I think a graphical view of the ~20000 results would help. The initial
histograms/kernel densities I've produced so far support that theory (i.e. they
look a bit like the Normal distribution, but one network is much more
"stretched out" and "bumpy"), but I've resorted to pre-processing that data in
Perl in order to produce the graphs. I think R can be used to do all of this in
one.
For each network, I have files like this:
===
RoomA RoomB 0.34
RoomC RoomA 0.12
RoomB RoomA 0.12
===
The columns are: From, To, and Time taken. There are 4 rooms in total.
The data's unsorted, and both directions of each pairing are present (i.e. I
haven't done de-duplication of pairings via the handshake algorithm, I just
pinged everything from everything). There will also be multiple entries for
each pairing.
The graphs I think I want to produce are:
For "From RoomA", overlay each timing graph for every other room. That means
there will be 4 kernel densities (well actually I'd take a histogram plotted as
a line, as I think that's more appropriate, and I don't know what a kernel
density is) on one graph.
I'd also like to do the above for "From RoomB", "From RoomC", and "From RoomD",
so I'd end up with 4 graphs (all with the same xlim/ylim) each with 4
lines plotted. I'd eventually like those produced as vector Postscript for
inclusion in a report, but I think I can handle that with ?postscript() and
?layout()
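A rough sketch of how the four-panel layout described above might be put together, under some loud assumptions: the timings are simulated (made-up values, not real measurements), the bin count of 40 is arbitrary, and the output file name is hypothetical. Each panel draws a histogram as a line by joining the bin midpoints, with shared breaks and a shared y limit so the panels are comparable.

```r
# Simulated stand-in for the real timings; values are made up for illustration.
set.seed(42)
rooms <- c("RoomA", "RoomB", "RoomC", "RoomD")
grid <- expand.grid(From = rooms, To = rooms)
Foo <- grid[rep(seq_len(nrow(grid)), each = 50), ]
Foo$Time <- rlnorm(nrow(Foo), meanlog = -1.5, sdlog = 0.4)

# Shared bin edges so every curve uses the same x axis (40 bins is arbitrary).
breaks <- seq(0, max(Foo$Time), length.out = 41)

# One histogram object (computed, not drawn) per From/To pairing.
hists <- lapply(split(Foo, Foo$From), function(d) {
  lapply(split(d$Time, d$To), hist, breaks = breaks, plot = FALSE)
})

# Common y limit across all panels.
ymax <- max(sapply(hists, function(h) max(sapply(h, function(x) max(x$density)))))

# Hypothetical output file; these postscript() settings give an EPS-style file.
out_file <- file.path(tempdir(), "ping_timings.ps")
postscript(out_file, horizontal = FALSE, onefile = FALSE,
           paper = "special", width = 8, height = 8)
layout(matrix(1:4, nrow = 2, byrow = TRUE))
for (from in names(hists)) {
  # Empty plot with the shared limits, then one line per destination room.
  plot(range(breaks), c(0, ymax), type = "n",
       xlab = "Time", ylab = "Density", main = paste("From", from))
  for (i in seq_along(hists[[from]])) {
    h <- hists[[from]][[i]]
    lines(h$mids, h$density, col = i, lty = i)
  }
  legend("topright", legend = names(hists[[from]]),
         col = 1:4, lty = 1:4, bty = "n")
}
dev.off()
```

Swapping the hist() call for density() would give kernel density curves instead, at the cost of having to choose a common bandwidth.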
I've got as far as importing the data with
Foo <- read.table("eth_ping_timings.dat", col.names=c("From", "To", "Time"))
Then I can do "standard" simple operations on Foo$Time. "Factoring" (if that is
indeed the term) is where I fall down. I simply don't know how to break out the
pairings.
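For reference, a minimal sketch of what "breaking out the pairings" could look like with base R's interaction(), split() and tapply(). The data here is simulated (made-up values standing in for the real file) and the "Pair" column name is an assumption for illustration only.

```r
# Simulated stand-in for the real file; the values are made up for illustration.
set.seed(1)
rooms <- c("RoomA", "RoomB", "RoomC", "RoomD")
grid <- expand.grid(From = rooms, To = rooms)
Foo <- grid[rep(seq_len(nrow(grid)), each = 10), ]
Foo$Time <- round(rexp(nrow(Foo), rate = 5), 2)

# interaction() builds one factor level per From/To combination.
Foo$Pair <- interaction(Foo$From, Foo$To, sep = "->")

# split() returns a named list with one vector of timings per pairing.
by_pair <- split(Foo$Time, Foo$Pair)

# tapply() applies a summary to each group: rows are From, columns are To.
pair_sd <- tapply(Foo$Time, list(Foo$From, Foo$To), sd)
print(round(pair_sd, 3))
```

Each element of by_pair can then be passed to hist() or summary() on its own, e.g. summary(by_pair[["RoomA->RoomB"]]).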
Is R actually the way to go for this? I feel pretty confident I could cobble
together some Perl which produces Postscript to describe the curves, but I
suspect that once I produce these graphs, I will immediately think of
other questions to ask, and R sounds like it's the proper tool to ask those
questions.
cheers
jack