One additional comment. If you want 0 instead of NA when there is no match then the match statement should read:
match_list <- match( data_2$data1, data_2$data2, nomatch=0)

On Fri, Oct 13, 2017 at 7:39 AM, Eric Berger <ericjber...@gmail.com> wrote:

> Combining and completing the advice from Greg and Boris, the complete
> solution is two lines:
>
> data_2 <- read.csv("excel_data.csv", stringsAsFactors = FALSE)
> match_list <- match( data_2$data1, data_2$data2 )
>
> The vector match_list will have the matching position when it exists and
> NAs otherwise. Its length will be the same as the length of data_2$data1.
>
> You should get experience in reading the help information for R functions.
> In this case, type ?match to get information about the 'match' function.
>
> HTH,
> Eric
>
>
> On Fri, Oct 13, 2017 at 12:16 AM, Boris Steipe <boris.ste...@utoronto.ca>
> wrote:
>
>> It's generally a very good idea to examine the structure of data after
>> you have read it in. str(data_2) would have shown you that read.csv() turned
>> your strings into factors, and that's why the == operator no longer does
>> what you think it does.
>>
>> use ...
>>
>> data_2 <- read.csv("excel_data.csv", stringsAsFactors = FALSE)
>>
>> ... to turn this off. Also, the %in% operator will achieve more directly
>> what you are trying to do. No need for loops.
>>
>> B.
>>
>>
>> > On Oct 12, 2017, at 4:25 PM, Yasin Gocgun <yasing...@gmail.com> wrote:
>> >
>> > Hi,
>> >
>> > I have two columns that contain numbers along with letters (as shown below)
>> > and have different lengths. Each entry in the first column is likely to be
>> > found in the second column at most once.
>> >
>> > For each entry of the first column, if that entry is found in the second
>> > column, I would like to get the corresponding index. For instance, if the
>> > first entry of the first column is the 5th entry in the second column, I
>> > would like to keep this index 5.
>> >
>> > AST2017000005534 TUR2017000001428
>> > CTS2017000079930 CTS2017000071989
>> > CTS2017000079931 CTS2017000072015
>> >
>> > In a loop, when I use the following code to get those indices,
>> >
>> > data_2 = read.csv("excel_data.csv")
>> > column_1 = data_2$data1
>> > column_2 = data_2$data2
>> >
>> > match_list <- array(0,dim=c(310,1)); # 310 is the length of the first column
>> >
>> > for (indx in 1: 310){
>> >   for(indx2 in 1:713){ # 713 is the length of the second column
>> >     if(column_1[indx] == column_2[indx2] ){
>> >       match_list[indx,1] = indx2;
>> >       break;
>> >     }
>> >   }
>> > }
>> >
>> > R provides the following error:
>> >
>> > Error in Ops.factor(column_1[indx], column_2[indx2]) :
>> >   level sets of factors are different
>> >
>> > So can someone explain to me how I can resolve this issue?
>> >
>> > Thank you,
>> >
>> > Yasin
>> >
>> > [[alternative HTML version deleted]]
>> >
>> > ______________________________________________
>> > R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>> > https://stat.ethz.ch/mailman/listinfo/r-help
>> > PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>> > and provide commented, minimal, self-contained, reproducible code.
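
For anyone who wants to try the suggestions from this thread without the original file, here is a minimal sketch. It assumes two small character vectors in place of the columns of excel_data.csv. The vectors, and the names match_na, match_zero, is_present and msg, are made up for illustration and are not from the original data. The sketch shows match() with and without nomatch=0, the %in% operator, and the factor comparison that triggers the reported error.

```r
# Hypothetical stand-ins for data_2$data1 and data_2$data2 (not the real data).
column_1 <- c("AST2017000005534", "CTS2017000079930", "CTS2017000079931")
column_2 <- c("TUR2017000001428", "CTS2017000079930", "CTS2017000072015",
              "AST2017000005534")

# Position in column_2 of each entry of column_1; NA where there is no match.
match_na <- match(column_1, column_2)
print(match_na)

# Same lookup, but unmatched entries are reported as 0 instead of NA.
match_zero <- match(column_1, column_2, nomatch = 0)
print(match_zero)

# %in% only reports whether each entry is present, not where it is.
is_present <- column_1 %in% column_2
print(is_present)

# Comparing elements of two factors with different level sets reproduces
# the error from the original post; tryCatch() captures the message.
msg <- tryCatch(
  {
    factor(column_1)[1] == factor(column_2)[1]
    "no error"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

match() returns positions, while %in% only returns TRUE/FALSE, so match() is the one to use when the index itself is needed. Also note that since R 4.0.0, read.csv() defaults to stringsAsFactors = FALSE, so on current versions of R the strings are no longer converted to factors unless you ask for it.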