Hi Jim,

Thanks so much for this information. I did not know this, as I am very new to R. Do you think it would be better to use !duplicated rather than unique?
Thanks in advance,
Greg

On Sun, Jun 12, 2016 at 7:06 PM, Jim Lemon <drjimle...@gmail.com> wrote:
> Hi Greg,
> You've got a problem that you don't seem to have identified. Your
> "reg" field in the "map" data frame can define at most 100000 unique
> values. This means that each value will be repeated about 270 times.
> Unless there are constraints you haven't mentioned, we would expect
> that in 135 cases for each value, the values in each "ref" row will be
> in the reverse order and the spans may overlap. I notice that you may
> have tried to get around this by sorting the "map" data frame, but
> then the order of the rows is different, and the number of rows
> "between" any two values changes. Apart from this, it is almost
> certain that the number of values of "p > 0.85" in the multiple runs
> between each set of "ref" values will be different. It is possible to
> perform both tasks that you mention, but only the second will yield a
> unique or tied value for all of the cases. So your result data frame
> will have an unspecified number of values for each row in "ref" for
> the first task.
>
> Jim
>
> On Mon, Jun 13, 2016 at 6:14 AM, greg holly <mak.hho...@gmail.com> wrote:
> > Dear all;
> >
> > I have two data sets (data=map and data=ref). A small part of each
> > data set is given below. Data map has more than 27 million rows and
> > data ref has about 560 rows. Basically, I need to run two different
> > tasks. My R code for these tasks is given below, but it does not
> > work properly.
> >
> > I sincerely appreciate your help.
> >
> > Regards,
> > Greg
> >
> > Task 1)
> >
> > For example, the first and second columns of row 1 in data ref are
> > 29220 and 63933. So I need to write R code that first looks at the
> > first row in ref (29220 and 63933), then sums the column "map$rate"
> > and gives the number of rows that are >0.85. Then it should do the
> > same for the second, third, ... rows in ref. At the end I would like
> > a table like the one given below (the results I need). Please note
> > that all values specified in the ref data file exist in the map$reg
> > column.
> >
> > Task 2)
> >
> > Again, for example, the first and second columns of row 1 in data
> > ref are 29220 and 63933. So I need to write R code that gives the
> > minimum map$p for the 29220-63933 interval in the map file, and then
> > does the same for the second, third, ... rows in ref.
> >
> > #my attempt for the first question
> > temp<-map[order(map$reg, map$p),]
> > count<-1
> > temp<-unique(temp$reg
> > for(i in 1:length(ref) {
> > for(j in 1:length(ref)
> > {
> > temp1<-if (temp[pos[i]==ref[ref$reg1,] & (temp[pos[j]==ref[ref$reg2,] & temp[cumsum(temp$rate) >0.70,])
> > count=count+1
> > }
> > }
> >
> > #my attempt for the second question
> > temp<-map[order(map$reg, map$p),]
> > count<-1
> > temp<-unique(temp$reg
> > for(i in 1:length(ref) {
> > for(j in 1:length(ref)
> > {
> > temp2<-if (temp[pos[i]==ref[ref$reg1,] & (temp[pos[j]==ref[ref$reg2,])
> > output<-temp2[temp2$p==min(temp2$p),]
> > }
> > }
> >
> > Data sets
> >
> > Data= map
> >
> > reg    p      rate
> > 10276  0.700  3.867e-18
> > 71608  0.830  4.542e-16
> > 29220  0.430  1.948e-15
> > 99542  0.220  1.084e-15
> > 26441  0.880  9.675e-14
> > 95082  0.090  7.349e-13
> > 36169  0.480  9.715e-13
> > 55572  0.500  9.071e-12
> > 65255  0.300  1.688e-11
> > 51960  0.970  1.163e-10
> > 55652  0.388  3.750e-10
> > 63933  0.250  9.128e-10
> > 35170  0.720  7.355e-09
> > 06491  0.370  1.634e-08
> > 85508  0.470  1.057e-07
> > 86666  0.580  7.862e-07
> > 04758  0.810  9.501e-07
> > 06169  0.440  1.104e-06
> > 63933  0.750  2.624e-06
> > 41838  0.960  8.119e-06
> >
> > data=ref
> >
> > reg1   reg2
> > 29220  63933
> > 26441  41838
> > 06169  10276
> > 74806  92643
> > 73732  82451
> > 86042  93502
> > 85508  95082
> >
> > the results I need
> >
> > reg1   reg2   n
> > 29220  63933  12
> > 26441  41838  78
> > 06169  10276  125
> > 74806  92643  11
> > 73732  82451  47
> > 86042  93502  98
> > 85508  95082  219
> >
> > [[alternative HTML version deleted]]
> >
> > ______________________________________________
> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> > https://stat.ethz.ch/mailman/listinfo/r-help
> > PLEASE do read the posting guide
> > http://www.R-project.org/posting-guide.html
> > and provide commented, minimal, self-contained, reproducible code.
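
For readers following the thread, here is a minimal sketch of one possible reading of the two tasks, together with the difference between unique and !duplicated that Greg asks about. It is not code from either poster. It assumes that an "interval" means the run of rows in map, in its original order, between the first occurrence of reg1 and the first occurrence of reg2, and that Task 1 counts rows with p > 0.85 (the reading Jim uses in his reply). The helper name span_summary is hypothetical, the rate column is left out because this reading does not use it, and only the 20 sample rows from the post are included, so some ref values are not found and give NA.

```r
# Sample rows copied from the post. reg is kept as character so that
# leading zeros (e.g. "06491") are not lost.
map <- data.frame(
  reg = c("10276", "71608", "29220", "99542", "26441", "95082", "36169",
          "55572", "65255", "51960", "55652", "63933", "35170", "06491",
          "85508", "86666", "04758", "06169", "63933", "41838"),
  p = c(0.700, 0.830, 0.430, 0.220, 0.880, 0.090, 0.480, 0.500, 0.300,
        0.970, 0.388, 0.250, 0.720, 0.370, 0.470, 0.580, 0.810, 0.440,
        0.750, 0.960),
  stringsAsFactors = FALSE
)
ref <- data.frame(
  reg1 = c("29220", "26441", "06169", "74806", "73732", "86042", "85508"),
  reg2 = c("63933", "41838", "10276", "92643", "82451", "93502", "95082"),
  stringsAsFactors = FALSE
)

# unique() on a single column returns a bare vector, so p is dropped.
# !duplicated() used as a row index keeps the whole first row per reg value.
reg_values <- unique(map$reg)
first_rows <- map[!duplicated(map$reg), ]

# ASSUMPTION: each interval runs between the FIRST occurrence of reg1 and
# the FIRST occurrence of reg2, in the original (unsorted) row order.
start <- match(ref$reg1, map$reg)
end   <- match(ref$reg2, map$reg)

# Hypothetical helper: count of p > cutoff and minimum p within one span.
# Returns NA for both when either reg value is not found in map.
span_summary <- function(i, j, p, cutoff = 0.85) {
  if (is.na(i) || is.na(j)) return(c(n = NA_real_, min_p = NA_real_))
  rows <- p[min(i, j):max(i, j)]  # min/max handles reg2 appearing before reg1
  c(n = sum(rows > cutoff), min_p = min(rows))
}

res <- t(mapply(span_summary, start, end, MoreArgs = list(p = map$p)))
result <- cbind(ref, res)
print(result)
```

Because each reg value is repeated many times in the full 27-million-row map, the choice of "first occurrence" here is arbitrary, which is the ambiguity Jim points out; a different choice of occurrence would give different counts for the first task.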