[R] Question about scatterplot in package car
I am getting an error message from scatterplot():

> library(car)
> scatterplot(Prestige$income~Prestige$type)
Error in Summary.factor(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, :
  range not meaningful for factors
In addition: Warning message:
In Ops.factor(x[floor(d)], x[ceiling(d)]) : + not meaningful for factors

The command does output the kind of graph that I want (boxplots). I just ran install.packages("car"), so I believe I have the latest version.

More generally, the reason I am trying to do this is that I want to generate a boxplot with case-number labels on the outlier points. I initially tried the base boxplot(), but the outlier values it returns do not include enough information to identify which cases those points came from. scatterplot() in car seems like it should be able to do the trick, but I haven't been able to get it to work. The following command works as expected:

scatterplot(Prestige$income~Prestige$prestige, id.n=4)

but this command does not label anything, and it also produces the same error:

scatterplot(Prestige$income~Prestige$type, id.n=4)

Furthermore, even if that did work, what I really want is for the points already identified as outliers to be labeled, and I have not been able to figure out how to do that. I am not sure whether that is because I am confused or because scatterplot() isn't working right!

Any help with how to do this would be greatly appreciated. Thanks in advance!

--
-dave

"Pseudo-colored pictures of a person's brain lighting up are undoubtedly more persuasive than a pattern of squiggles produced by a polygraph. That could be a big problem if the goal is to get to the truth." -Dr. Steven Hyman, Harvard

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
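One way to get case labels on the outliers without relying on scatterplot() is to use the statistics that base boxplot() already returns: the first and fifth rows of its stats matrix are the whisker ends, so the outliers are exactly the points beyond them, and their row numbers can be recovered and passed to text(). The sketch below is only an illustration under assumptions: label_box_outliers is a hypothetical helper (not part of car or graphics), and points are labeled by row index unless you supply labels. Recent versions of car also include a Boxplot() function meant for this; check ?Boxplot for the labeling arguments in your version.

```r
# Sketch: label boxplot outliers with case identifiers, base R only.
# label_box_outliers() is a hypothetical helper, not part of car or graphics.
label_box_outliers <- function(y, g, labels = seq_along(y), ...) {
  g <- factor(g)
  # boxplot() invisibly returns its statistics; bp$stats has one column per
  # box: lower whisker, lower hinge, median, upper hinge, upper whisker.
  bp <- boxplot(y ~ g, ...)
  out_idx <- integer(0)
  for (k in seq_along(levels(g))) {
    in_group <- which(g == levels(g)[k])
    lo <- bp$stats[1, k]
    hi <- bp$stats[5, k]
    # Outliers are exactly the points beyond the whisker ends.
    is_out <- y[in_group] < lo | y[in_group] > hi
    out_idx <- c(out_idx, in_group[which(is_out)])
  }
  # Boxes are drawn at x = 1, 2, ..., so the factor code is the x position.
  if (length(out_idx) > 0) {
    text(as.integer(g[out_idx]), y[out_idx], labels = labels[out_idx],
         pos = 4, cex = 0.8)
  }
  invisible(out_idx)
}
```

Because it reads the whisker ends from bp$stats, the labeling stays consistent if you pass a different range= through "...". With car loaded, something like with(Prestige, label_box_outliers(income, type, labels = rownames(Prestige))) should label the Prestige outliers by occupation name, though I have not checked that output here.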
[R] Group labels in lattice barchart
Hello,

I've been searching on the web for a few hours and seem to be stuck on this. The code pasted below generates a histogram of subject responses in four different conditions in an experiment. This version of the graph is one I'm using for internal consistency checking, so I've set it up to indicate the order of the responses, which is contained in the variable StimCount. The purpose is to be able to compare across the panels and see whether a similar outlier in two panels happened at about the same time.

Right now the stacked group bars are colored to indicate the StimCount number, which is OK, but I have been trying to figure out how to indicate it with a simple number on top of the relevant stacked bar segment. I haven't found any useful information on the web, except one response to someone asking a similar question on Stack Overflow, which stated that it is "easy enough if you know R" but that they can't post the answer due to licensing restrictions, which is pretty much an anti-helpful answer:
http://stackoverflow.com/questions/2147084/r-add-labels-to-lattice-barchart

In any case, as you can see from running the code below, I currently have something that puts numbers on the barchart, but with several failures:

1. The numbers are never higher than one bar segment; they don't go up with the stacked segments.
2. The labels are wrong: the bar segments seem to be labeled with the order of the segment, not with the underlying value of the grouping variable StimCount (as shown in the key at the bottom). I know this is because I have "label=groups" instead of "label=StimCount", but I can't for the life of me figure out how to get access to the actual value of StimCount inside my custom panel function. I have tried several approaches, such as label=StimCount or label=data$StimCount (adjusting the argument list as seemed appropriate), but nothing I have been able to think of has worked.
Just to clarify, what I want is for each bar segment to have a number that matches its color, as specified in the key at the bottom.

As an aside, the help for panel.barchart (like many other places) says "... extra arguments will be accepted but ignored". I understand what that means, but I have struggled to find any way of determining what the actual contents of the "..." will be, so that I know which inputs are available to my custom panel function. This seems like a critical piece of information which, for some reason, is well hidden...

Any help will be greatly appreciated! The following commands should produce an example plot when pasted directly into R, assuming you have a net connection:

library(lattice)
load(url('http://brainimaging.waisman.wisc.edu/~perlman/testdata.rdata'))
print(barchart(Count~Rating | RateType*Temperature, data=tf,
               groups=StimCount, stack=TRUE,
               scales=list(alternating=c(3)), ylim=c(0,11),
               par.settings=list(superpose.polygon=list(col=rainbow(10))),
               auto.key = list(points = FALSE, rectangles = TRUE,
                               space = "bottom", columns=5),
               panel=function(x, y, subscripts, groups, ...) {
                 panel.barchart(x, y, subscripts=subscripts, groups=groups, ...)
                 panel.text(x, y-0.5, label=groups, cex=1)
               }
))

Note: this does not work; I get an error message on the plot that says "argument "data" is missing":

myPanel <- function(x, y, data, ...) {
  panel.barchart(x, y, ...)
  panel.text(x, y-0.5, label=data$StimCount, cex=1)
}

-dave
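A lattice panel function never sees the original data frame, which is why a data argument comes up missing. What it does receive is subscripts, the row numbers of the data shown in that panel, so groups[subscripts] gives the StimCount value of each bar segment. The sketch below computes the midpoint of each stacked segment by sorting each bar's segments in group-level order and taking cumulative sums, which is my understanding of how panel.barchart(stack = TRUE) stacks positive heights. It assumes vertical bars and positive counts; the helper names and the toy data are hypothetical. To see what actually arrives in "...", a quick trick is to put print(names(list(...))) inside your panel function.

```r
library(lattice)

# Hypothetical helper: midpoints of stacked bar segments plus their group labels.
# Assumes vertical bars (horizontal = FALSE) and positive heights only.
stack_label_positions <- function(x, y, g) {
  x <- as.numeric(x)
  keep <- !is.na(y) & y > 0
  x <- x[keep]; y <- y[keep]; g <- g[keep]
  ord <- order(x, g)                        # bar by bar, segments in group-level order
  top <- ave(y[ord], x[ord], FUN = cumsum)  # upper edge of each segment
  data.frame(x = x[ord], y = top - y[ord] / 2,
             label = as.character(g[ord]), stringsAsFactors = FALSE)
}

# Panel function: draw the bars, then write each segment's group value on it.
panel_stack_labels <- function(x, y, subscripts, groups, ...) {
  panel.barchart(x, y, subscripts = subscripts, groups = groups, ...)
  pos <- stack_label_positions(x, y, groups[subscripts])
  panel.text(pos$x, pos$y, labels = pos$label, cex = 0.8)
}

# Hypothetical toy data standing in for tf.
toy <- data.frame(Rating = factor(c(1, 1, 2, 2, 2)),
                  Count = c(2, 3, 1, 4, 2),
                  StimCount = factor(c(1, 2, 1, 2, 3)))
p <- barchart(Count ~ Rating, data = toy, groups = StimCount, stack = TRUE,
              horizontal = FALSE, panel = panel_stack_labels)
```

With the real data, panel = panel_stack_labels should slot into the barchart() call above in place of the inline panel function, since it forwards "..." (including stack=TRUE) to panel.barchart.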
[R] Reshape question
I have a data frame in wide format. There are six variables that together represent a 3x2 design of two factors, Valence and Temperature, which I want to get into long format:

> head(dpts)
  File Subj Time Group PainNeg.hot PainNeg.warm SociNeg.hot SociNeg.warm Positiv.hot Positiv.warm Errors
1 WB101_1_1_dp.txt 101 1 MNP 30.70 13.75000 16.319048 35.1730.1833314.38 1
2 WB101_2_1_dp.txt 101 2 MNP 5.27 -79.6 -24.738095 -5.5023.95000 -14.70 0
3 WB102_1_1_dp.txt 102 1 MNP 50.75 -13.432145.185714 19.08 4.2 -8.03 1
4 WB102_2_1_dp.txt 102 2 MNP -41.38 9.32500 -9.845238 22.95 -14.3500040.93 0
5 WB103_1_1_dp.txt 103 1 MNP 25.27 -48.27500 48.726190 8.14166710.98333 -31.97 2
6 WB103_2_1_dp.txt 103 2 MNP 26.75 -13.289293.447619 -8.641667 -10.9 -27.416667 1

The following command does part of what I want:

dptsr<-reshape(dpts,
  varying=c('PainNeg.hot','PainNeg.warm','SociNeg.hot','SociNeg.warm','Positiv.hot','Positiv.warm'),
  v.names=c('Bias'), direction='long', timevar=c('Valence','Temperature'),
  times=c('PainNeg.hot','PainNeg.warm','SociNeg.hot','SociNeg.warm','Positiv.hot','Positiv.warm'),
  idvar=c('Subj','Time'))

But it doesn't break out the two factors:

> head(dptsr)
  File Subj Time Group Errors Valence Temperature Bias
101.1.PainNeg.hot WB101_1_1_dp.txt 101 1 MNP 1 PainNeg.hot PainNeg.hot 30.70
101.2.PainNeg.hot WB101_2_1_dp.txt 101 2 MNP 0 PainNeg.hot PainNeg.hot 5.27
102.1.PainNeg.hot WB102_1_1_dp.txt 102 1 MNP 1 PainNeg.hot PainNeg.hot 50.75
102.2.PainNeg.hot WB102_2_1_dp.txt 102 2 MNP 0 PainNeg.hot PainNeg.hot -41.38
103.1.PainNeg.hot WB103_1_1_dp.txt 103 1 MNP 2 PainNeg.hot PainNeg.hot 25.27
103.2.PainNeg.hot WB103_2_1_dp.txt 103 2 MNP 1 PainNeg.hot PainNeg.hot 26.75

So I did successfully create two factor variables, but they both contain the same values. Instead I want "Valence" to be (for example) "PainNeg" and "Temperature" to be "hot". Can anyone help me figure out how to get reshape to do this? I have never been able to make much sense of the reshape documentation...

Thanks!
-dave

--
A neuroscientist is at the video arcade, when someone makes him a $1000 bet on Pac-Man. He smiles, gets out his screwdriver and takes apart the Pac-Man game. Everyone says "What are you doing?" The neuroscientist says "Well, since we all know that Pac-Man is based on electric signals traveling through these circuits, obviously I can understand it better than the other guy by going straight to the source!"
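As far as I know, reshape() builds only a single time variable, so one approach is to let it create one column holding the original column names and then split those names at the dot into the two factors. A minimal sketch, using hypothetical toy data with the same column names as dpts; the intermediate column name Cond is my own choice:

```r
vars <- c("PainNeg.hot", "PainNeg.warm", "SociNeg.hot",
          "SociNeg.warm", "Positiv.hot", "Positiv.warm")

# Hypothetical toy data with the same layout as dpts.
dpts <- data.frame(Subj = c(101, 101), Time = c(1, 2), Group = "MNP",
                   PainNeg.hot = 1:2, PainNeg.warm = 3:4,
                   SociNeg.hot = 5:6, SociNeg.warm = 7:8,
                   Positiv.hot = 9:10, Positiv.warm = 11:12,
                   Errors = c(1, 0))

# One time variable ("Cond") holding the original column name...
dptsr <- reshape(dpts, varying = vars, v.names = "Bias",
                 timevar = "Cond", times = vars,
                 idvar = c("Subj", "Time"), direction = "long")

# ...then split e.g. "PainNeg.hot" into Valence = "PainNeg", Temperature = "hot".
cond <- as.character(dptsr$Cond)
dptsr$Valence <- factor(sub("\\..*$", "", cond),
                        levels = c("PainNeg", "SociNeg", "Positiv"))
dptsr$Temperature <- factor(sub("^.*\\.", "", cond), levels = c("hot", "warm"))
dptsr$Cond <- NULL
```

For the real data the reshape() call is the same as the one above except that timevar is a single name; the two sub() calls then do the splitting that reshape() itself does not.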
[R] by output into data frame
I could do this in various hacky ways, but what's the right way? I have a nice application of the by() function, which does what I want. The output looks like this:

> auc_stress
lab.samples.stress$subid: 2
  cortisol amylase
1   919.05  6834.8
---
lab.samples.stress$subid: 3
   cortisol  amylase
11   728.25 24422.05
etc.

What I want is a data frame roughly like this:

subid cortisol.auc amylase.auc
    2       919.05      6834.8
    3       728.25    24422.05
etc.

What is a nice way to make that happen? Here is the code and data that I am using, which should run directly if you copy and paste it:

sanity.check<-read.csv("http://brainimaging.waisman.wisc.edu/~perlman/testdata.csv", header=TRUE, sep = ",")
lab.samples <- subset(sanity.check,Sample!='before bed' & Sample!='morning after')
lab.samples$Sample<-factor(lab.samples$Sample)
lab.samples.stress<-subset(lab.samples,challenge=='stress')
lab.samples.control<-subset(lab.samples,challenge=='control')
auc_ground <- function(sub_df) {
  print(sub_df)
  auc<-sub_df[1,]*0
  timedif<-c(60,10,10,10,10,10,10)
  for (i in 1:(nrow(sub_df)-1) ) {
    print(c(i,i+1))
    #print(c(values[i],values[i+1]))
    pair_area<-(sub_df[i,]+sub_df[i+1,])*timedif[i]/2
    auc<-auc+pair_area
  }
  auc
}
auc_stress<-by(lab.samples.stress[c('cortisol','amylase')], lab.samples.stress$subid, auc_ground, simplify=T)
auc_control<-by(lab.samples.control[c('cortisol','amylase')], lab.samples.control$subid, auc_ground, simplify=T)

Thanks for your help!

P.S. Sorry if this question has been answered before; it is nearly impossible to get useful Google results on search terms like "by"... too common a word...

-dave
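Since auc_ground() treats cortisol and amylase independently, one alternative to by() is aggregate(), whose formula interface applies a function to each listed column within each group and returns a data frame directly. The sketch below assumes, as the original code does, that rows within each subject are already in time order and that the time gaps are 60 followed by 10s; auc_vec is a hypothetical single-column rewrite of auc_ground, and the toy data are made up.

```r
# Trapezoidal area under the curve for one numeric vector, using the same
# time gaps as auc_ground(): 60 before the second sample, then 10s.
# (Gives NA if a subject has more samples than timedif covers.)
auc_vec <- function(v, timedif = c(60, 10, 10, 10, 10, 10, 10)) {
  n <- length(v)
  if (n < 2) return(0)
  sum((v[-n] + v[-1]) * timedif[seq_len(n - 1)] / 2)
}

# Hypothetical toy data: two subjects, three samples each.
toy <- data.frame(subid = c(2, 2, 2, 3, 3, 3),
                  cortisol = c(1, 2, 3, 4, 5, 6),
                  amylase = c(10, 20, 30, 0, 0, 10))

# One row per subject, one column per measure.
auc_df <- aggregate(cbind(cortisol, amylase) ~ subid, data = toy, FUN = auc_vec)
names(auc_df)[-1] <- paste0(names(auc_df)[-1], ".auc")
auc_df
```

For the real data this would be aggregate(cbind(cortisol, amylase) ~ subid, data = lab.samples.stress, FUN = auc_vec). One caveat: the formula method drops rows with NA in any listed column by default (na.action = na.omit), which would silently shift the time gaps for that subject.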
[R] There must be a better way to do this
I made this rather cool plot which I am quite pleased with:
http://brainimaging.waisman.wisc.edu/~perlman/data/BeeswarmLinesDemo.pdf

However, I feel there must be a better way to do it than what I did. I'm attaching the code to create it, which downloads the data by http, so it should run for you if you have the current version of beeswarm installed (which was just updated today, incidentally). It might also work with a non-current version of beeswarm.

The problem is that I jumped through all kinds of hoops to:

a) get the subject numbers for each point associated with the point xy coordinates output by beeswarm. The order of the points is not the same as the order in the input file; they are shuffled in a way that I think depends on the input formula. The trick I used (ok, I hope you're sitting down when you read this) is to run beeswarm a second time with pwcol=Subj, so that the "col" column of the output becomes the subject numbers. I know, horrible. But I don't know how else to do it. I feel like there is probably some logic to the way the cases were reordered by the formula, but I don't know how to work with that.

b) get the lines() function to pair the xy coordinates properly. I did this by reshaping the whole thing into wide format, with separate columns for x.1 y.1 x.2 y.2, then adding a third pair of columns x.3 y.3 which is all NA, and then reshaping it back into long format. Then the lines() function automatically does the right thing, but I feel like that was a horrible hack and there must be a smarter way to do it.

Thanks very much in advance for any help!

-dave
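For part (b), one alternative to the reshape-and-NA trick is to pair the two time points by subject ID with match() and draw one segment per subject with segments(). Because the pairing goes through the IDs, it does not depend on the order in which the points come back from beeswarm. This sketch assumes you already have, for every plotted point, its x and y coordinates, its subject ID and its time point (for example from the pwcol trick in (a)); pair_segments and the toy data are hypothetical.

```r
# Hypothetical helper: for each subject present at both time points, return
# the coordinates of its point at `from` and its matching point at `to`.
pair_segments <- function(subj, x, y, time, from = 1, to = 2) {
  a <- which(time == from)
  b <- which(time == to)
  m <- match(subj[a], subj[b])   # position of each subject's `to` point
  ok <- !is.na(m)                # drop subjects missing at `to`
  cbind(x0 = x[a][ok], y0 = y[a][ok],
        x1 = x[b][m[ok]], y1 = y[b][m[ok]])
}

# Toy points whose subject order differs between the two time points.
pts <- data.frame(subj = c(3, 1, 2, 1, 2, 3),
                  x    = c(1, 1, 1, 2, 2, 2),
                  y    = c(10, 20, 30, 21, 31, 11),
                  time = c(1, 1, 1, 2, 2, 2))
p <- with(pts, pair_segments(subj, x, y, time))

plot(pts$x, pts$y, xlim = c(0.5, 2.5))
segments(p[, "x0"], p[, "y0"], p[, "x1"], p[, "y1"])
```

segments() takes all start and end points as vectors, so one call draws every subject's line and no NA separator rows are needed.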
Re: [R] There must be a better way to do this
Thanks, that is very helpful. I agree that my example plot was a bit cluttered, but this is what I actually wanted:
http://brainimaging.waisman.wisc.edu/~perlman/data/MNPT1T2_h_unp_raw.pdf

I just needed to get example code out quickly. You get better help when you have a self-contained demo of the question. :)

I have replaced my old horrible code with the nice concise segments() code. Thanks!

On May 9, 2012, at 3:55 AM, Jim Lemon wrote:

> On 05/09/2012 03:59 AM, David Perlman wrote:
>> I made this rather cool plot which I am quite pleased with:
>> http://brainimaging.waisman.wisc.edu/~perlman/data/BeeswarmLinesDemo.pdf
>> [...]
>
> Hi Dave,
> This plot looks like the offspring of a boxplot, a beeswarm plot and a
> bumpchart after a heavy night on the grog. Beauty is in the eye of the
> beholder, I guess.
>
> Let's see, first you plot the boxplots, then the beeswarm on the centerlines
> of the boxplots, then you want to add the lines. Okay, try this:
>
> paindat<-data.frame(
>  HEP1=sample(1:20,30,TRUE,
>   prob=c(seq(0,0.1,length.out=10),seq(0.1,0,length.out=10))),
>  HEP2=sample(1:20,30,TRUE,
>   prob=c(seq(0,0.1,length.out=10),seq(0.1,0,length.out=10))),
>  MBSR1=sample(1:20,30,TRUE,
>   prob=c(seq(0,0.1,length.out=10),seq(0.1,0,length.out=10))),
>  MBSR2=sample(1:20,30,TRUE,
>   prob=c(seq(0,0.1,length.out=10),seq(0.1,0,length.out=10))),
>  Wait1=sample(1:20,30,TRUE,
>   prob=c(seq(0,0.1,length.out=10),seq(0.1,0,length.out=10))),
>  Wait2=sample(1:20,30,TRUE,
>   prob=c(seq(0,0.1,length.out=10),seq(0.1,0,length.out=10))))
> boxplot(paindat,ylim=c(0,20),
>  col=c("pink","pink","lightgreen","lightgreen","lightblue","lightblue"))
> require(beeswarm)
> bsinfo<-beeswarm(paindat,add=TRUE)
> segments(bsinfo$x[bsinfo$x.orig=="HEP1"],bsinfo$y[bsinfo$x.orig=="HEP1"],
>  bsinfo$x[bsinfo$x.orig=="HEP2"],bsinfo$y[bsinfo$x.orig=="HEP2"])
> segments(bsinfo$x[bsinfo$x.orig=="MBSR1"],bsinfo$y[bsinfo$x.orig=="MBSR1"],
>  bsinfo$x[bsinfo$x.orig=="MBSR2"],bsinfo$y[bsinfo$x.orig=="MBSR2"])
> segments(bsinfo$x[bsinfo$x.orig=="Wait1"],bsinfo$y[bsinfo$x.orig=="Wait1"],
>  bsinfo$x[bsinfo$x.orig=="Wait2"],bsinfo$y[bsinfo$x.orig=="Wait2"])
>
> and let me say right here that the beeswarm function is a crackerjack piece
> of work.
>
> Jim

-dave
[R] List indexing question
Consider the following:

> x<-list(c(1,2,3),c(4,5,6))
> x[1]
[[1]]
[1] 1 2 3

> x[2]
[[1]]
[1] 4 5 6

So far that all seems reasonable. But now there's a problem. I'm used to Python, where chained indexing like x[1][0] (Python is 0-based) gives the value 4, so I expected x[2][1] to do the same in R. But I can't figure out how to do that in R:

> x[2][1]
[[1]]
[1] 4 5 6

> x[2,1]
Error in x[2, 1] : incorrect number of dimensions

I have no idea why x[2][1] returns the same thing as x[2]; that makes no sense to me at all. What is the proper syntax for what I'm trying to do?

Thanks!

-dave
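In R, single brackets on a list return a sub-list (still a list), while double brackets extract the element itself. So x[2] is a list of length one, and x[2][1] just takes the first element of that one-element list, which is the same list again. x[2,1] fails because a plain list has no dimensions. To reach the number you need [[ first:

```r
x <- list(c(1, 2, 3), c(4, 5, 6))

x[2]          # a list of length 1 whose only element is c(4, 5, 6)
x[2][1]       # first element of that list: the same one-element list
x[[2]]        # the numeric vector c(4, 5, 6) itself
x[[2]][1]     # 4
x[[c(2, 1)]]  # recursive indexing, equivalent to x[[2]][[1]]: also 4
```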