[R] please help ! label selected data points in huge number of data points potentially as high as 50, 000 !

Umesh Rosyara Sat, 05 Mar 2011 21:10:25 -0800

Dear All 
 
I am reposting because my problem is a real issue that I have been working
on. I know this might be simple for those who know how! Anyway, I need help!
 
Let me clarify my point. I have a huge number of data points plotted using
either the base plot function or xyplot in lattice (I prefer lattice). The
data look like this:
  name xvar           p
1   M1    1 0.107983837
2   M2   11 0.209125624
3   M3   21 0.163959428
4   M4   31 0.132469859
5   M5   41 0.086095130
6   M6   51 0.180822010
7   M7   61 0.246619925
8   M8   71 0.147363687
9   M9   81 0.162663127
........
5000 observations  
 
I need to plot xvar (x variable) against p (y variable) using either plot()
or xyplot(), and I want to print the name labels on the graph for those rows
that have a p value < 0.01 (meaning they are significant). With my limited R
knowledge I can use text(x, y, labels) to add the text manually, but I have
a huge number of data points (I provide just 5000 here, but potentially it
can go up to 50,000). So I want to display the names of only those
observations (rows) whose p value is below the threshold (0.01 in this
example; my earlier post, quoted below, used 0.05).
 
Here is my example dataset and what I have so far:
name <- c(paste ("M", 1:5000, sep = ""))
xvar <- seq(1, 50000, 10)
set.seed(134)
p <- rnorm(5000, 0.15,0.05)
dataf <- data.frame(name,xvar, p)
 
# using lattice (my first preference)
require(lattice) 
xyplot(p ~ xvar, dataf)
 
# I want to display names for the following observations that meet the
# requirement of p < 0.01.
which (dataf$p < 0.01) 
[1]  811  854 1636 1704 2148 2161 2244 3205 3268 4177 4564 4614 4639 4706
 
Thus significant observations are:
        name  xvar             p
811   M811  8101  0.0050637068
854   M854  8531 -0.0433901783
1636 M1636 16351 -0.0279014039
1704 M1704 17031  0.0029878335
2148 M2148 21471  0.0048898232
2161 M2161 21601 -0.0354130557
2244 M2244 22431  0.0003255200
3205 M3205 32041  0.0079758430
3268 M3268 32671  0.0012797145
4177 M4177 41761  0.0015487439
4564 M4564 45631  0.0024867152
4614 M4614 46131  0.0078381964
4639 M4639 46381 -0.0063151605
4706 M4706 47051  0.0032200517


I want the data point (8101, 0.0050637068) labelled with M811 in the plot,
and similarly for all of the above (the significant ones). I do not want to
label the rest of the 5000 points, which do not have a p value < 0.01. I know
I can add a label manually in base graphics with
text(8101, 0.0050637068, "M811") after plot():
 
plot (dataf$xvar,p)
text (8101, 0.0050637068, "M811")
text (8531, -0.0433901783, "M854")
 
I need more automation to deal with as many as 50,000 observations. In
reality, I do not know how many variables there will be.
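
One possible way to automate the labelling in lattice is a custom panel function that draws the points and then labels only those below the cutoff. This is a minimal sketch under assumptions, not a definitive solution: the names threshold, labs, and pl are illustrative, and the cutoff of 0.01 is the one used in the example above.

```r
library(lattice)

# Example data, as constructed above
name <- paste("M", 1:5000, sep = "")
xvar <- seq(1, 50000, 10)
set.seed(134)
p <- rnorm(5000, 0.15, 0.05)
dataf <- data.frame(name, xvar, p)

threshold <- 0.01                  # assumed cutoff for "significant"
labs <- as.character(dataf$name)   # labels come from the name column

pl <- xyplot(p ~ xvar, data = dataf,
             panel = function(x, y, subscripts, ...) {
               panel.xyplot(x, y, ...)      # draw all points
               keep <- y < threshold        # points to label in this panel
               # subscripts maps panel points back to rows of dataf
               panel.text(x[keep], y[keep],
                          labels = labs[subscripts][keep],
                          pos = 3, cex = 0.7)
             })
print(pl)   # lattice objects must be printed explicitly inside scripts
```

Because the selection is a logical vector computed from the data, the same call should work unchanged whether there are 5000 or 50,000 rows, and however many of them fall below the cutoff.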
 
Your help is highly appreciated. Thank you.
 
Best Regards
 
Umesh R 
 
 
 

  _____  

From: Umesh Rosyara [mailto:rosyar...@gmail.com] 
Sent: Saturday, March 05, 2011 12:30 PM
To: 'r-help@r-project.org'
Subject: displaying label meeting condition (i.e. significant, i..e p value
less than 005) in plot function 


Dear R users,
 
Here is my problem:
 
# example data
name <- c(paste ("M", 1:1000, sep = ""))
xvar <- seq(1, 10000, 10)
set.seed(134)
p <- rnorm(1000, 0.15,0.05)
dataf <- data.frame(name,xvar, p)
plot (dataf$xvar,p)
abline(h=0.05)
 
# I can find which observations have p less than 0.05
which (dataf$p < 0.05) 
 [1]  12  20  80 269 272 338 366 368 397 403 432 453 494 543 592 691 723 789 811
[20] 854 891 931 955
 
I want to display (label) the corresponding names on the plot above: that
is, M12 for the 12th observation, M20 for the 20th, and so on. Please note
that my real names are not in numerical sequence (they are arbitrary names);
I used sequential ones here only to create the example dataset easily.
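
In base graphics, one approach that may do this is to subset the data frame by the condition and pass the resulting columns to a single text() call, since text() accepts vectors of coordinates and labels. The sketch below assumes the 0.05 cutoff used in this message; threshold and sig are illustrative names. The labels are read from the name column rather than derived from the row number, so arbitrary names should work the same way.

```r
# Example data, as constructed above
name <- paste("M", 1:1000, sep = "")
xvar <- seq(1, 10000, 10)
set.seed(134)
p <- rnorm(1000, 0.15, 0.05)
dataf <- data.frame(name, xvar, p)

threshold <- 0.05                     # assumed cutoff
sig <- dataf[dataf$p < threshold, ]   # only the rows to be labelled

plot(dataf$xvar, dataf$p, xlab = "xvar", ylab = "p")
abline(h = threshold)
# One vectorized call labels every selected point; pos = 3 puts text above
text(sig$xvar, sig$p, labels = as.character(sig$name), pos = 3, cex = 0.7)
```

If many labels overlap, reducing cex or changing pos may help.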
 
Thanks in advance
 
Umesh R
 


______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
