Re: [R] set dataframe field value from lookup table

David Winsemius Thu, 09 Dec 2010 09:19:13 -0800

Offlist comments. No reply needed. This is just for emphasis and clarification.


On Dec 9, 2010, at 11:19 AM, Jon Erik Ween wrote:

David
I see how findInterval is a more elegant way of doing 1). I'd need to change the indices in the lookup table, as
findInterval(36, c(0, 17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79,84, 89) )
[1] 6
should be 7, not 6. The age range for the 7th column is 35-44. But that's easy.

As you say, changing the output to agree with your expectations is "easy", but to be clear, R _is_ delivering the correct response to the question "which interval is 36 located in?": the 6th. Any ambiguity is due to your not formulating a good question (and my errors).
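To make the interval numbering concrete, here is a minimal sketch. By default findInterval() treats each interval as closed on the left, so it returns the number of break points that are less than or equal to the value. In current versions of R (3.3.0 and later) the left.open argument switches to intervals that are closed on the right. Which convention matches the published table depends on whether its header ages are upper or lower bounds of each bracket; the thread does not settle that, so treat the choice below as an assumption.

```r
breaks <- c(0, 17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79, 84, 89)

# Default: intervals are closed on the left, e.g. [34, 44), so the
# result is the number of breaks that are <= x.
idx_default <- findInterval(c(34, 36), breaks)
idx_default                                   # 6 6

# left.open = TRUE (R >= 3.3.0): intervals are (29, 34] and (34, 44],
# which fits a table whose header ages are upper bounds (an assumption).
idx_left_open <- findInterval(c(34, 36), breaks, left.open = TRUE)
idx_left_open                                 # 5 6
```

The two calls differ only for ages that sit exactly on a break point, which is where an off-by-one column is most likely to creep in.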


I can't see how findInterval will help me for 2), though.

"2)" was never very clear. I do think findInterval must be what is needed, but I am repeating my call for you to post a full example and more complete explanation to the list.

The standard score is integer and not a range.

Which is _not_ how statisticians usually think of a "z score". So it may need some further background or use of less misleading terminology. You are probably tasked with using a table handed to you that at one time was a "z-score" for <something> but has been recast in tabular form.


--
David.

So it maps 1 to 1. The real problem, though, is setting the value in the main dataframe (df) with the value from the lookup table based on the identified age and score indices.
My initial guess was:
df$DSTz <-DSTzlook[which(DSTzlook[,1]==df$Agetmp),which(DSTzlook[1,]==df$DSF+df$DSB)]
which could be rewritten:
df$DSTz <-DSTzlook[which(DSTzlook[,1]== findInterval(df$Age, c(0,17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79, 84,89))),which(DSTzlook[1,]==df$DSF+df$DSB)]
But it is the indirect referencing of the lookup in the main table that causes me trouble.
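One likely reason the expressions above do not work row by row: an expression such as m[rows, cols] with two index vectors returns the whole sub-block of every row crossed with every column, not one value per subject. Indexing with a two-column matrix built by cbind() picks exactly one cell per (row, column) pair. A minimal sketch on a made-up matrix (m, r and k are illustrative names, not from the thread):

```r
# Toy 3 x 4 lookup matrix; the values are arbitrary.
m <- matrix(1:12, nrow = 3, ncol = 4)

r <- c(1, 3, 2)   # row index for each subject
k <- c(4, 2, 2)   # column index for each subject

block <- m[r, k]          # 3 x 3 sub-block: every r crossed with every k
cells <- m[cbind(r, k)]   # one value per (r, k) pair

dim(block)                # 3 3
cells                     # 10 6 5
```

The same matrix-style index works on a numeric matrix made from the lookup table, provided the row and column indices have been computed as vectors of equal length.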
Jon

Soli Deo Gloria

Jon Erik Ween, MD, MS
Scientist, Kunin-Lunenfeld Applied Research Unit
Director, Stroke Clinic, Brain Health Clinic, Baycrest Centre
Assistant Professor, Dept. of Medicine, Div. of Neurology
   University of Toronto Faculty of Medicine

Kimel Family Building, 6th Floor, Room 644
Baycrest Centre
3560 Bathurst Street
Toronto, Ontario M6A 2E1
Canada

Phone: 416-785-2500 x3648
Fax: 416-785-2484
Email: jw...@klaru-baycrest.on.ca
Confidential: This communication and any attachment(s) may contain confidential or privileged information and is intended solely for the address(es) or the entity representing the recipient(s). If you have received this information in error, you are hereby advised to destroy the document and any attachment(s), make no copies of same and inform the sender immediately of the error. Any unauthorized use or disclosure of this information is strictly prohibited.
On 2010-12-09, at 11:06 AM, David Winsemius wrote:
On Dec 9, 2010, at 10:51 AM, Jon Erik Ween wrote:
Thanks David
What I am trying to do is set up a script that assigns z-scores to a large dataframe (2500x300, with Age in years and test scores as columns) from a published table of age-corrected standard scores on this cognitive test.
1) The age intervals in the lookup table are given and not my choice.
You may want to skip the intermediate translation to the row and column labels and just use the results of findInterval:
findInterval( 16, c(0, 17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79,84, 89) )
[1] 1
findInterval( 90, c(0, 17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79,84, 89) )
[1] 14

Those look like appropriate indices for the column argument
2) Sorry I didn't post an example table, it looks something like this ("Age" is in the first row, standard scores in the first column):
      17    19    24    29    34    44 ....
30   2.6   2.6   2.6   2.6   2.6   2.6
29   1.8   1.8   1.8   2.0   2.6   2.6
28   1.0   1.0   1.8   1.8   2.6   2.6
27   0.0   0.5   1.0   1.8   2.6   2.6
26  -0.5   0.0   0.0   1.0   1.8   2.6
.
.
.
.
So, if a subject (row) has age==29 and a standard score of 28, the value should be 1.8, etc.
Looks like a job for two findInterval indices to be used with "[ r , c ]".
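A minimal sketch of that two-index lookup, using only the fragment of the table posted above. Several things here are assumptions rather than facts from the thread: the table is held as a numeric matrix z, the header ages are read as upper bounds of each bracket (hence left.open = TRUE, which needs R 3.3.0 or later), the subjects in df are invented, and the score is matched exactly with match() because the standard score is an integer rather than a range.

```r
# Fragment of the posted table: rows = standard score, columns = age bracket.
scores <- 30:26
ages   <- c(17, 19, 24, 29, 34, 44)
z <- matrix(c( 2.6, 2.6, 2.6, 2.6, 2.6, 2.6,
               1.8, 1.8, 1.8, 2.0, 2.6, 2.6,
               1.0, 1.0, 1.8, 1.8, 2.6, 2.6,
               0.0, 0.5, 1.0, 1.8, 2.6, 2.6,
              -0.5, 0.0, 0.0, 1.0, 1.8, 2.6),
            nrow = 5, byrow = TRUE,
            dimnames = list(as.character(scores), as.character(ages)))

# Hypothetical subjects; the column names follow the thread.
df <- data.frame(Age = c(29, 36, 18), DSF = c(15, 14, 14), DSB = c(13, 12, 13))

r <- match(df$DSF + df$DSB, scores)                       # row: exact score
k <- findInterval(df$Age, c(0, ages), left.open = TRUE)   # column: age bracket

df$DSTz <- z[cbind(r, k)]   # one cell per subject
df$DSTz                     # 1.8 2.6 0.5
```

The first subject reproduces the case described above (age 29, score 28, value 1.8). Scores missing from the table come back as NA from match(), and ages beyond the last column give an out-of-range index, so both need handling before this is run on the full data.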
--
David.
Thanks


Jon

On 2010-12-09, at 10:33 AM, David Winsemius wrote:
On Dec 9, 2010, at 9:34 AM, Jon Erik Ween wrote:
Hi
This is (hopefully) a bit more cogent phrasing of a previous post. I'm trying to compute a z-score for rows in a large dataframe based on values in another dataframe. Here's the script (that does not work). 2 questions:
1) Anyone know of a more elegant way to calculate the "rounded"age value
than the nested ifelse's I've used?

2) how to reference the lookup table based on computed indices?

Thanks

Jon

# Define tables
DSTzlook <-
read.table("/Users/jween/Documents/ResearchProjects/ABC/data/DSTz.txt",
header=TRUE, sep="\t", na.strings="NA", dec=".", strip.white=TRUE)
df<-stroke

# Compute rounded age.
df$Agetmp <- ifelse(df$Age>=89,89,ifelse(df$Age>=84,84,ifelse(df$Age>=79,79,ifelse(df$Age>=74,74,ifelse(df$Age>=69,69,ifelse(df$Age>=64,64,ifelse(df$Age>=54,54,ifelse(df$Age>=44,44,ifelse(df$Age>=34,34,ifelse(df$Age>=29,29,ifelse(df$Age>=24,24,ifelse(df$Age>=19,19,17))))))))))))
Ew, painful. If you want categorized ages (what the above coding produces is not "rounded" in any sense of that word as I understand it), then why not use findInterval() as an index into the ages you want to label these cases with?
df$Agetmp <- c(17,19,24,29,34,44,54,64,69,74,79,84)[ # note Extract operation
        findInterval(runif(100,0,100), c(17,19,24,29,34,44,54,64,69,74,79,84,110) )
        ]  # close extraction
The other option, of course, and a more "honest" one in this instance would be
cut(vec, breaks=c(...), labels=c(...) )
(It's not clear to me why you are not picking midpoint ages within those brackets.)
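A minimal sketch comparing the two options on a few made-up ages. It follows the lower-bound reading used in the ifelse() chain (each label is the smallest age in its bracket), which is an assumption about the intended coding. Ages below 17 are left out on purpose: findInterval() returns 0 for them, and a 0 index is silently dropped by the extraction, so the result would come back shorter than the input.

```r
age   <- c(17, 18, 36, 50, 90)   # made-up ages, all >= 17
lower <- c(17, 19, 24, 29, 34, 44, 54, 64, 69, 74, 79, 84)

# findInterval() as an index into the labels.
lab_fi <- lower[findInterval(age, c(lower, 110))]

# cut() with right = FALSE uses the same left-closed intervals and
# returns a factor whose levels are the labels.
lab_cut <- cut(age, breaks = c(lower, 110),
               labels = as.character(lower), right = FALSE)

lab_fi                              # 17 17 34 44 84
as.numeric(as.character(lab_cut))   # 17 17 34 44 84
```

cut() returns NA for values outside the breaks rather than dropping them, which keeps the result the same length as the input.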
# Reference the lookup table based on computed indices
df$DSTz <- DSTzlook[which(DSTzlook[,1]==df$Agetmp),which(DSTzlook[1,]==df$DSF+df$DSB)]
I have not been able to figure out what you are trying to do here. Trying to use a 2d lookup looks promising as a way to emulate what an Excel user might attempt, but an example (as requested in the message at the bottom of every posting) would really be of great help in making this more concrete for those of us with insufficient abstractive abilities.
--
David.
# Cleanup
#rm(df)
#df$Agetmp<-NULL
--
View this message in context: 
http://r.789695.n4.nabble.com/set-dataframe-field-value-from-lookup-table-tp3080245p3080245.html
Sent from the R help mailing list archive at Nabble.com.
David Winsemius, MD
West Hartford, CT

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
