[R] Question about scatterplot in package car

2011-01-15 Thread David Perlman
I am getting an error message from scatterplot:

> library(car)
> scatterplot(Prestige$income~Prestige$type)
Error in Summary.factor(c(2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,  : 
  range not meaningful for factors
In addition: Warning message:
In Ops.factor(x[floor(d)], x[ceiling(d)]) : + not meaningful for factors
> 

The command does output the kind of graph that I want (boxplots).

I just did install.packages("car") so I believe I have the latest version.

More generally, the reason I am trying to do this is because I am trying to 
generate a boxplot with case number labels on the outlier points.  I initially 
tried to do it with the base boxplot() but the returned values of the outliers 
do not include enough information to actually identify which cases those points 
came from.  Scatterplot in car seems like it should be able to do the trick, 
but I haven't been able to figure out how to get it to work.

The following command works as expected:
scatterplot(Prestige$income~Prestige$prestige, id.n=4)

but this command does not label anything:
scatterplot(Prestige$income~Prestige$type, id.n=4)
and also produces the same error.

Furthermore, even if that did work, what I really want is for the points 
already identified as outliers to be labeled, but I have not been able to 
figure out how to do that.  I am not sure if that is because I am confused, or 
because scatterplot() isn't working right!

Any help with how to do this would be greatly appreciated.  Thanks in advance!
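
One way to get case labels on the outlier points without scatterplot() is to work out, group by group, which cases fall beyond the whiskers and then label them with text(). The sketch below is only an illustration under stated assumptions: it uses the built-in mtcars data (mpg by cyl) as a stand-in for Prestige so that it runs without any packages, and it relies on boxplot.stats() applying the same 1.5 x IQR rule that boxplot() uses by default.

```r
# Label boxplot outliers with their row names, using base graphics only.
# 'mtcars' is a stand-in for Prestige (assumption, so no packages are needed).
d <- mtcars
d$cyl <- factor(d$cyl)

# Flag every case that lies beyond the whiskers of its own group's box.
# ave() returns 0/1 here because the logical result is stored in a numeric vector.
is_out <- ave(d$mpg, d$cyl,
              FUN = function(v) v %in% boxplot.stats(v)$out) == 1

boxplot(mpg ~ cyl, data = d)
# A factor on the x axis is drawn at positions 1, 2, 3, ... in level order
text(as.integer(d$cyl)[is_out], d$mpg[is_out],
     labels = rownames(d)[is_out], pos = 4, cex = 0.7)
```

With Prestige the same pattern would presumably use income, type and the occupation row names; cases with a missing type would need to be dropped first.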

--
-dave
"Pseudo-colored pictures of a person's brain lighting up are 
undoubtedly more persuasive than a pattern of squiggles produced by a
polygraph.  That could be a big problem if the goal is to get to the
truth."  -Dr. Steven Hyman, Harvard

__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


[R] Group labels in lattice barchart

2011-03-22 Thread David Perlman
Hello, I've been searching on the web for a few hours and seem to be stuck on 
this.  The code pasted below generates a histogram of subject responses in four 
different conditions in an experiment.  This version of the graph is one I'm 
using for internal consistency checking, so I've set it up to indicate the 
order of the responses, which is contained in the variable StimCount.  The 
purpose of this is to be able to compare across the panels to see if a similar 
outlier in two panels happened at about the same time.  Right now the stacked 
group bars are colored to indicate this StimCount number, which is OK but I 
have been trying to figure out how to indicate that with a simple number on top 
of the relevant stacked bar segment.  I haven't found any useful information on 
the web, except one response to someone asking a similar question on Stack 
Overflow, which stated that it is "easy enough if you know R" but that they 
can't post the answer due to licensing restrictions, which is pretty much an 
anti-helpful answer.
http://stackoverflow.com/questions/2147084/r-add-labels-to-lattice-barchart

In any case, as you can see from running the code below, right now I have 
something that puts numbers on the barchart, but with several failures:
1. The numbers are never higher than one bar segment; they don't go up with the 
stacked segments.
2. The labels are wrong; the bar segments seem to be labeled with the order of 
the segment, not with the underlying value of the grouping variable StimCount 
(as is shown in the key at the bottom).  I know this is because I have 
"label=groups" instead of "label=StimCount", but I can't for the life of me 
figure out how to get access to the actual value of StimCount inside my custom 
panel function.  I have tried several different approaches, such as 
label=StimCount or label=data$StimCount (adjusting the argument list as seemed 
appropriate, too) but nothing I have been able to think of worked for that.

Just to clarify, what I want is for each bar segment to have a number that 
matches its color, as specified in the key at the bottom.


As an aside, in the help for panel.barchart (as in many other places) it says 
"... extra arguments will be accepted but ignored".  I understand what that 
means but I have struggled to figure out any way of determining what the actual 
contents of the "..." will be, so I know what inputs are available to me in my 
custom panel function.  This seems like a very critical piece of information 
which, for some reason, is kept well-hidden...
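
A low-tech way to see what lattice actually hands to a panel function is to capture the argument list from inside the panel itself. This is a minimal sketch, assuming the barley data that ships with lattice as a stand-in for tf; the variable name seen is made up for the illustration.

```r
library(lattice)

seen <- NULL   # hypothetical holder for the argument names
p <- barchart(yield ~ variety, data = barley, groups = year, stack = TRUE,
  panel = function(...) {
    seen <<- names(list(...))   # record what lattice passed to the panel
    panel.barchart(...)         # then draw as usual
  })
print(p)   # the panel function only runs when the plot is drawn
seen       # names of everything that arrived through "..."
```

Replacing names(list(...)) with str(list(...)) would also show the contents of each argument, not just its name.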

Any help will be greatly appreciated!  

The following commands should produce an example plot by pasting directly into 
R, assuming you have a net connection:

library(lattice)
load(url('http://brainimaging.waisman.wisc.edu/~perlman/testdata.rdata'))

print(barchart(Count~Rating | RateType*Temperature, data=tf, groups=StimCount,
stack=TRUE, scales=list(alternating=c(3)), ylim=c(0,11),
par.settings=list(superpose.polygon=list(col=rainbow(10))),
auto.key = list(points = FALSE, rectangles = TRUE, space = "bottom", columns=5),
panel=function(x, y, subscripts, groups, ...) {
  panel.barchart(x, y, subscripts=subscripts, groups=groups, ...) 
  panel.text(x, y-0.5, label=groups, cex=1)
}
))


Note: this does not work, I get an error message on the plot that says 
"argument "data" is missing":
myPanel <- function(x, y, data, ...) {
  panel.barchart(x, y,  ...) 
  panel.text(x, y-0.5, label=data$StimCount, cex=1)
}
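
For what it is worth, inside a panel function groups is the whole grouping vector and subscripts holds the row indices that belong to the current panel, so groups[subscripts] should give the StimCount value of each bar segment. The labels also have to be placed at cumulative heights, because y holds the height of each segment rather than its stacked position. The sketch below uses made-up data in place of testdata.rdata, and stack_mid is a hypothetical helper, not part of lattice; it assumes all counts are positive and that segments are stacked in group-level order within each bar.

```r
library(lattice)

# Made-up stand-in for 'tf': one Count per Rating x StimCount combination
set.seed(1)
tf <- expand.grid(Rating = factor(1:3), StimCount = factor(1:4))
tf$Count <- sample(1:3, nrow(tf), replace = TRUE)

# Vertical midpoint of each stacked segment (hypothetical helper).
# Segments are accumulated in group order within each x position.
stack_mid <- function(x, y, g) {
  mid <- numeric(length(y))
  ord <- order(x, g)
  top <- ave(y[ord], x[ord], FUN = cumsum)   # top edge of each segment
  mid[ord] <- top - y[ord] / 2
  mid
}

p <- barchart(Count ~ Rating, data = tf, groups = StimCount, stack = TRUE,
  panel = function(x, y, subscripts, groups, ...) {
    panel.barchart(x, y, subscripts = subscripts, groups = groups, ...)
    g <- groups[subscripts]   # group value for each row in this panel
    xn <- as.numeric(x)
    panel.text(xn, stack_mid(xn, y, as.integer(g)),
               labels = as.character(g), cex = 0.8)
  })
print(p)
```

The same panel function should carry over to a conditioned plot such as Count ~ Rating | RateType*Temperature, since subscripts is recomputed for every panel.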


--
-dave


[R] Reshape question

2012-03-06 Thread David Perlman
I have a data frame in wide format.  There are six variables that represent two 
factors in long format 3x2, Valence and Temperature:

> head(dpts)
              File Subj Time Group PainNeg.hot PainNeg.warm SociNeg.hot SociNeg.warm Positiv.hot Positiv.warm Errors
1 WB101_1_1_dp.txt  101    1   MNP   30.70 13.75000   16.319048 35.1730.1833314.38  1
2 WB101_2_1_dp.txt  101    2   MNP    5.27-79.6  -24.738095 -5.5023.95000   -14.70  0
3 WB102_1_1_dp.txt  102    1   MNP   50.75-13.432145.185714 19.08 4.2-8.03  1
4 WB102_2_1_dp.txt  102    2   MNP  -41.38  9.32500   -9.845238 22.95   -14.3500040.93  0
5 WB103_1_1_dp.txt  103    1   MNP   25.27-48.27500   48.726190 8.14166710.98333   -31.97  2
6 WB103_2_1_dp.txt  103    2   MNP   26.75-13.289293.447619 -8.641667   -10.9   -27.416667  1

The following command does part of what I want:

dptsr<-reshape(dpts,
varying=c('PainNeg.hot','PainNeg.warm','SociNeg.hot','SociNeg.warm','Positiv.hot','Positiv.warm'),
v.names=c('Bias'),direction='long',timevar=c('Valence','Temperature'),
times=c('PainNeg.hot','PainNeg.warm','SociNeg.hot','SociNeg.warm','Positiv.hot','Positiv.warm'),
idvar=c('Subj','Time'))

But it doesn't break out the two factors:

> head(dptsr)
                              File Subj Time Group Errors     Valence Temperature   Bias
101.1.PainNeg.hot WB101_1_1_dp.txt  101    1   MNP      1 PainNeg.hot PainNeg.hot  30.70
101.2.PainNeg.hot WB101_2_1_dp.txt  101    2   MNP      0 PainNeg.hot PainNeg.hot   5.27
102.1.PainNeg.hot WB102_1_1_dp.txt  102    1   MNP      1 PainNeg.hot PainNeg.hot  50.75
102.2.PainNeg.hot WB102_2_1_dp.txt  102    2   MNP      0 PainNeg.hot PainNeg.hot -41.38
103.1.PainNeg.hot WB103_1_1_dp.txt  103    1   MNP      2 PainNeg.hot PainNeg.hot  25.27
103.2.PainNeg.hot WB103_2_1_dp.txt  103    2   MNP      1 PainNeg.hot PainNeg.hot  26.75

So I did successfully create two factor variables, but they both contain the 
same values.  Instead I would want "Valence" to be (for example) "PainNeg" and 
"Temperature" to be "hot".

Can anyone help me figure out how to get reshape to do this?  I have never been 
able to make much sense out of the reshape documentation...

Thanks!
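
One workable pattern, sketched here on a made-up miniature of dpts, is to let reshape() stack the six columns under a single combined label and then split that label at the dot into the two factors. The column name Cond and the toy values are assumptions for the illustration; only the column naming scheme is taken from the post.

```r
# Made-up miniature of 'dpts' with the same column naming scheme
dpts <- data.frame(Subj = c(101, 101), Time = c(1, 2),
                   PainNeg.hot = c(30.70, 5.27), PainNeg.warm = c(1, 2),
                   SociNeg.hot = c(3, 4),        SociNeg.warm = c(5, 6),
                   Positiv.hot = c(7, 8),        Positiv.warm = c(9, 10))

# The six wide columns, in the order they appear in the data frame
wide <- grep("\\.(hot|warm)$", names(dpts), value = TRUE)

# Stack them into one 'Bias' column, labelled by the original column name
long <- reshape(dpts, direction = "long", varying = wide, v.names = "Bias",
                timevar = "Cond", times = wide, idvar = c("Subj", "Time"))

# Split the combined label at the dot into the two factors
long$Valence     <- factor(sub("\\..*$", "", long$Cond))
long$Temperature <- factor(sub("^.*\\.", "", long$Cond))
long$Cond <- NULL
head(long)
```

Doing the split after the reshape avoids asking reshape() to manage two time variables at once, which it does not appear to be designed for.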



-dave--
A neuroscientist is at the video arcade, when someone makes him a $1000 bet
on Pac-Man. He smiles, gets out his screwdriver and takes apart the Pac-Man
game. Everyone says "What are you doing?" The neuroscientist says "Well,
since we all know that Pac-Man is based on electric signals traveling
through these circuits, obviously I can understand it better than the other
guy by going straight to the source!"



[R] by output into data frame

2012-03-19 Thread David Perlman
I could do this in various hacky ways, but what's the right way?

I have a nice application of the by function, which does what I want.  The 
output looks like this:

> auc_stress
lab.samples.stress$subid: 2
  cortisol amylase
1   919.05  6834.8
---
 
lab.samples.stress$subid: 3
   cortisol  amylase
11   728.25 24422.05

etc.

What I want is a data frame roughly like this:

subid  cortisol.auc  amylase.auc
2      919.05        6834.8
3      728.25        24422.05

etc.

What is a nice way to make that happen?



Here is the code and data that I am using, which should run directly if you 
copy and paste it:


sanity.check <- read.csv("http://brainimaging.waisman.wisc.edu/~perlman/testdata.csv",
                         header=TRUE, sep=",")
# Keep only the samples taken in the lab
lab.samples <- subset(sanity.check, Sample!='before bed' & Sample!='morning after')
lab.samples$Sample <- factor(lab.samples$Sample)   # drop the unused levels
lab.samples.stress <- subset(lab.samples, challenge=='stress')
lab.samples.control <- subset(lab.samples, challenge=='control')

# Trapezoidal area under the curve, computed column by column for one subject
auc_ground <- function(sub_df) {
  print(sub_df)
  auc <- sub_df[1,]*0
  timedif <- c(60,10,10,10,10,10,10)   # time between successive samples
  for (i in seq_len(nrow(sub_df)-1)) {
    print(c(i,i+1))
    #print(c(values[i],values[i+1]))
    pair_area <- (sub_df[i,]+sub_df[i+1,])*timedif[i]/2
    auc <- auc+pair_area
  }
  auc
}

auc_stress <- by(lab.samples.stress[c('cortisol','amylase')],
                 lab.samples.stress$subid, auc_ground, simplify=TRUE)
auc_control <- by(lab.samples.control[c('cortisol','amylase')],
                  lab.samples.control$subid, auc_ground, simplify=TRUE)


Thanks for your help!

P.S. sorry if this question has been answered before, it is nearly impossible 
to get useful google results on search terms like "by"...  too common word...
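
Since each element of the by result is a one-row data frame, one common idiom is to bind the pieces with do.call(rbind, ...) and take the subject ids from the dimnames of the by object. The sketch below uses made-up data, and a plain column sum stands in for auc_ground; both are assumptions for the illustration.

```r
# Made-up stand-in for lab.samples.stress
df <- data.frame(subid    = rep(c(2, 3), each = 3),
                 cortisol = c(1, 2, 3, 4, 5, 6),
                 amylase  = c(10, 20, 30, 40, 50, 60))

# Any function returning a one-row data frame works here;
# colSums is only a placeholder for the real AUC calculation
res <- by(df[c("cortisol", "amylase")], df$subid,
          function(d) as.data.frame(t(colSums(d))))

out <- do.call(rbind, res)                       # one row per subject
out <- cbind(subid = dimnames(res)[[1]], out)    # ids come back as text
rownames(out) <- NULL
names(out)[-1] <- paste0(names(out)[-1], ".auc")
out
```

If numeric ids are wanted, as.numeric(as.character(out$subid)) converts them back.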


-dave--


[R] There must be a better way to do this

2012-05-08 Thread David Perlman
I made this rather cool plot which I am quite pleased with:
http://brainimaging.waisman.wisc.edu/~perlman/data/BeeswarmLinesDemo.pdf

However, I feel there must be a better way to do it than what I did.  I'm 
attaching the code to create it, which downloads the data by http so it should 
run for you if you have the current version of beeswarm installed (which was 
just updated today, incidentally).  It might also work with a non-current 
version of beeswarm.

The problem is that I jumped through all kinds of hoops to: 

a) get the subject numbers for each point associated with the point xy 
coordinates output by beeswarm.  The order of the points is not the same as the 
order in the input file; they are shuffled in a way that I think depends on the 
input formula.  The trick I used (ok, I hope you're sitting down when you read 
this) is to run beeswarm a second time with pwcol=Subj, so then the "col" 
column of the output becomes the subject numbers.  I know, horrible.  But I 
don't know how else to do it.  I feel like there is probably some logic to the 
way the cases were reordered by the formula, but I don't know how to work with 
that.

b) get the lines() function to pair the xy coordinates properly.  I did this by 
reshaping the whole thing into wide format, with separate columns for x.1 y.1 
x.2 y.2, and then add a third pair of columns x.3 y.3 which is all NA, and then 
reshaping it back into long format.  Then the lines() function automatically 
does the right thing, but I feel like that was a horrible hack and there must 
be a smarter way to do it.


Thanks very much in advance for any help!
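
On point (b), two base-graphics routes avoid the round trip through reshape, assuming the paired coordinates are already lined up subject by subject. segments() takes the start and end points directly, and the NA trick for lines() can be done by interleaving with rbind(). The coordinates below are made up for the illustration.

```r
# Made-up paired coordinates: row i of (x1, y1) pairs with row i of (x2, y2)
x1 <- c(1, 1, 1); y1 <- c(2, 5, 3)
x2 <- c(2, 2, 2); y2 <- c(4, 1, 3)

plot(c(x1, x2), c(y1, y2), xlab = "condition", ylab = "value")

# Route 1: one segment per pair, no reshaping needed
segments(x1, y1, x2, y2)

# Route 2: interleave start, end and NA so lines() breaks after each pair
xs <- as.vector(rbind(x1, x2, NA))
ys <- as.vector(rbind(y1, y2, NA))
lines(xs, ys, lty = 2)
```

rbind() stacks the three vectors as rows, and as.vector() then reads the matrix column by column, which gives the start, end, NA pattern for each pair in turn.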


-dave--


Re: [R] There must be a better way to do this

2012-05-09 Thread David Perlman
Thanks, that is very helpful.  I agree that my example plot was a bit 
cluttered, but this is what I actually wanted:
http://brainimaging.waisman.wisc.edu/~perlman/data/MNPT1T2_h_unp_raw.pdf
I just needed to get example code out quickly.  You get better help when you 
have a self-contained demo of the question.  :)

I have replaced my old horrible code with the nice concise segments code.  
Thanks!


On May 9, 2012, at 3:55 AM, Jim Lemon wrote:

> On 05/09/2012 03:59 AM, David Perlman wrote:
>> I made this rather cool plot which I am quite pleased with:
>> http://brainimaging.waisman.wisc.edu/~perlman/data/BeeswarmLinesDemo.pdf
>> 
>> However, I feel there must be a better way to do it than what I did.  I'm 
>> attaching the code to create it, which downloads the data by http so it 
>> should run for you if you have the current version of beeswarm installed 
>> (which was just updated today, incidentally).  It might also work with a 
>> non-current version of beeswarm.
>> 
>> The problem is that I jumped through all kinds of hoops to:
>> 
>> a) get the subject numbers for each point associated with the point xy 
>> coordinates output by beeswarm.  The order of the points is not the same as 
>> the order in the input file; they are shuffled in a way that I think depends 
>> on the input formula.  The trick I used (ok, I hope you're sitting down when 
>> you read this) is to run beeswarm a second time with pwcol=Subj, so then the 
>> "col" column of the output becomes the subject numbers.  I know, horrible.  
>> But I don't know how else to do it.  I feel like there is probably some 
>> logic to the way the cases were reordered by the formula, but I don't know 
>> how to work with that.
>> 
>> b) get the lines() function to pair the xy coordinates properly.  I did this 
>> by reshaping the whole thing into wide format, with separate columns for x.1 
>> y.1 x.2 y.2, and then add a third pair of columns x.3 y.3 which is all NA, 
>> and then reshaping it back into long format.  Then the lines() function 
>> automatically does the right thing, but I feel like that was a horrible hack 
>> and there must be a smarter way to do it.
>> 
>> 
> Hi Dave,
> This plot looks like the offspring of a boxplot, a beeswarm plot and a 
> bumpchart after a heavy night on the grog. Beauty is in the eye of the 
> beholder, I guess.
> 
> Let's see, first you plot the boxplots, then the beeswarm on the centerlines 
> of the boxplots, then you want to add the lines. Okay, try this:
> 
> paindat<-data.frame(
> HEP1=sample(1:20,30,TRUE,
> prob=c(seq(0,0.1,length.out=10),seq(0.1,0,length.out=10))),
> HEP2=sample(1:20,30,TRUE,
> prob=c(seq(0,0.1,length.out=10),seq(0.1,0,length.out=10))),
> MBSR1=sample(1:20,30,TRUE,
> prob=c(seq(0,0.1,length.out=10),seq(0.1,0,length.out=10))),
> MBSR2=sample(1:20,30,TRUE,
> prob=c(seq(0,0.1,length.out=10),seq(0.1,0,length.out=10))),
> Wait1=sample(1:20,30,TRUE,
> prob=c(seq(0,0.1,length.out=10),seq(0.1,0,length.out=10))),
> Wait2=sample(1:20,30,TRUE,
> prob=c(seq(0,0.1,length.out=10),seq(0.1,0,length.out=10))))
> boxplot(paindat,ylim=c(0,20),
> col=c("pink","pink","lightgreen","lightgreen","lightblue","lightblue"))
> require(beeswarm)
> bsinfo<-beeswarm(paindat,add=TRUE)
> segments(bsinfo$x[bsinfo$x.orig=="HEP1"],bsinfo$y[bsinfo$x.orig=="HEP1"],
> bsinfo$x[bsinfo$x.orig=="HEP2"],bsinfo$y[bsinfo$x.orig=="HEP2"])
> segments(bsinfo$x[bsinfo$x.orig=="MBSR1"],bsinfo$y[bsinfo$x.orig=="MBSR1"],
> bsinfo$x[bsinfo$x.orig=="MBSR2"],bsinfo$y[bsinfo$x.orig=="MBSR2"])
> segments(bsinfo$x[bsinfo$x.orig=="Wait1"],bsinfo$y[bsinfo$x.orig=="Wait1"],
> bsinfo$x[bsinfo$x.orig=="Wait2"],bsinfo$y[bsinfo$x.orig=="Wait2"])
> 
> and let me say right here that the beeswarm function is a crackerjack piece 
> of work.
> 
> Jim

-dave--


[R] List indexing question

2012-05-21 Thread David Perlman
Consider the following:
> x<-list(c(1,2,3),c(4,5,6))
> x[1]
[[1]]
[1] 1 2 3

> x[2]
[[1]]
[1] 4 5 6

So far that all seems reasonable.  But now there's a problem.  I'm used to 
python, where I would say x[2][1] and get the value 4.  But I can't figure out 
how to do that in R.

> x[2][1]
[[1]]
[1] 4 5 6

> x[2,1]
Error in x[2, 1] : incorrect number of dimensions

I have no idea why x[2][1] returns the same thing as x[2]; that makes no sense 
to me at all.

What is the proper syntax for what I'm trying to do?

Thanks!
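
For reference, the distinction at play is between single and double brackets: x[2] returns a sub-list (here a list of length one), while x[[2]] extracts the element itself. Note also that R indexes from 1, so the value 4 is the first element of the second vector. A short sketch using the same x as above:

```r
x <- list(c(1, 2, 3), c(4, 5, 6))

x[2]        # still a list, of length 1, holding the second vector
x[[2]]      # the second vector itself: 4 5 6
x[[2]][1]   # first element of that vector: 4

# x[2][1] takes the first element of a one-element list, still as a list,
# which is why it prints the same thing as x[2]
identical(x[2][1], x[2])   # TRUE
```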


-dave--