Sorry, you're right. The result line should be:
result.m[cbind(factor(result$fcell), factor(result$cellneigh))] <- result$distance

idcell <- data.frame(
  id = seq_len(5),
  fcell = sample(1:100, 5))

censDist <- expand.grid(fcell=seq_len(100), cellneigh=seq_len(100))
censDist$distance <- runif(nrow(censDist))

# assemble the non-symmetric distance matrix
result <- subset(censDist, fcell %in% idcell$fcell & cellneigh %in% idcell$fcell)
result.m <- matrix(NA, nrow=nrow(idcell), ncol=nrow(idcell))
result.m[cbind(factor(result$fcell), factor(result$cellneigh))] <- result$distance

It's just about instantaneous on the dataset you sent me:

system.time({
  result <- subset(censDist, f_cell %in% id_cell$f_cell & f_cell_neigh %in% id_cell$f_cell)
  result.m <- matrix(NA, nrow=nrow(id_cell), ncol=nrow(id_cell))
  result.m[cbind(factor(result$f_cell), factor(result$f_cell_neigh))] <- result$distance
})
   user  system elapsed
  0.361   0.007   0.368

Sarah

On Fri, May 13, 2016 at 10:36 AM, A M Lavezzi <mario.lave...@unipa.it> wrote:
> PLEASE IGNORE THE PREVIOUS EMAIL, IT WAS SENT BY MISTAKE
>
> Hello Sarah,
> thanks a lot for your advice.
>
> I followed your suggestions until the creation of "result".
>
> The allocation of the values of result$distance to the matrix result.m,
> however, does not seem to work: it produces a matrix with identical columns
> corresponding to the last values of result$distance. Maybe my description of
> the dataset was not clear enough.
>
> I produced the final matrix spat_dist with a loop, which I report below (it
> takes about 1 hour on my MacBook Pro):
>
> set_i = -1 # create a variable to store the i values already examined
>
> for(i in unique(result$id)){
>
>   set_i=c(set_i,i) # store the value of the i
>
>   set_neigh = result$id_neigh[result$id==i & !result$id_neigh %in% set_i]
>   # identify the locations connected to i.
>   # If the distance between i and j was examined before, don't look for
>   # the distance between j and i
>
>   for(j in set_neigh){
>     if(i!=j){
>       spat_dist[i,j] = result$distance[result$id==i & result$id_neigh==j]
>       spat_dist[j,i] = spat_dist[i,j]
>     }
>     else{
>       spat_dist[i,j]=0
>     }
>   }
> }
>
> It is not the most elegant and efficient solution in the world, that's for
> sure.
>
> I would be grateful if you could suggest an alternative instruction to:
>
> result.m[factor(result$fcell), factor(result$cellneigh)] <- result$distance
>
> so I will learn a faster procedure (I tried many times to modify this
> structure but I did not manage it). I don't want to abuse your time, so
> forget it if you are busy.
>
> Thank you so much anyway,
> Mario
>
> PS: I attach the data. Notice that the 1327 units in id_cell are firms,
> indexed by id, located in location f_cell. Different firms can be located in
> the same f_cell. With respect to your suggestion, I added two columns to
> "result" with the id of the firms.
>
> On Fri, May 13, 2016 at 3:26 PM, A M Lavezzi <mario.lave...@unipa.it> wrote:
>>
>> Hello Sarah,
>> thanks a lot for your advice.
>>
>> I followed your suggestions until the creation of "result".
>>
>> The allocation of the values of result$distance to the matrix result.m,
>> however, does not seem to work: it produces a matrix with identical columns
>> corresponding to the last values of result$distance. Maybe my description of
>> the dataset was not clear enough.
>>
>> I produced the final matrix with a loop, which I report below (it takes
>> about 1 hour on my MacBook Pro):
>>
>> set_i = -1 # create a variable to store the i values already examined
>>
>> for(i in unique(result$id)){
>>
>>   set_i=c(set_i,i) # store the value of the i
>>
>>   set_neigh = result$id_neigh[result$id==i & !result$id_neigh %in% set_i]
>>   # identify the locations connected to i.
>>   # Exclude those
>>
>>   for(j in set_neigh){
>>     if(i!=j){
>>       spat_dist[i,j] = result$distance[result$id==i & result$id_neigh==j]
>>       spat_dist[j,i] = spat_dist[i,j]
>>     }
>>     else{
>>       spat_dist[i,j]=0
>>     }
>>   }
>> }
>>
>> It is not the most elegant and efficient solution in the world, that's for
>> sure.
>>
>>
>> On Thu, May 12, 2016 at 2:51 PM, Sarah Goslee <sarah.gos...@gmail.com>
>> wrote:
>>>
>>> I don't see any reason why a loop is out of the question, and
>>> answering would have been much easier if you'd included the requested
>>> reproducible data, but what about this?
>>>
>>> This solution is robust to pairs from idcell being absent in censDist,
>>> and to the distance from A to B being different from the distance
>>> from B to A, but not to A-B appearing twice. If that's possible,
>>> you'll need to figure out how to manage it.
>>>
>>> # create some fake data
>>>
>>> idcell <- data.frame(
>>>   id = seq_len(5),
>>>   fcell = sample(1:100, 5))
>>>
>>> censDist <- expand.grid(fcell=seq_len(100), cellneigh=seq_len(100))
>>> censDist$distance <- runif(nrow(censDist))
>>>
>>> # assemble the non-symmetric distance matrix
>>> result <- subset(censDist, fcell %in% idcell$fcell & cellneigh %in%
>>> idcell$fcell)
>>> result.m <- matrix(NA, nrow=nrow(idcell), ncol=nrow(idcell))
>>> result.m[factor(result$fcell), factor(result$cellneigh)] <-
>>> result$distance
>>>
>>> Sarah
>>>
>>> On Thu, May 12, 2016 at 5:26 AM, A M Lavezzi <mario.lave...@unipa.it>
>>> wrote:
>>> > Hello,
>>> >
>>> > I have a sample of 1327 locations, each one identified by an id and a
>>> > numerical code.
>>> >
>>> > I need to build a spatial matrix, say, M, i.e. a 1327x1327 matrix
>>> > collecting distances among the locations.
>>> >
>>> > M(i,i) should be 0, M(i,j) should contain the distance between
>>> > locations i and j.
>>> >
>>> > I should use data organized in the following way:
>>> >
>>> > 1) id_cell contains the identifier (id) of each location (1...1327) and
>>> > the numerical code of the location (f_cell) (see head of id_cell below)
>>> >
>>> >> head(id_cell)
>>> >    id f_cell
>>> > 1   1   2120
>>> > 12  2    204
>>> > 22  3   2546
>>> > 24  4   1327
>>> > 34  5   1729
>>> > 43  6   2293
>>> >
>>> > 2) censDist contains, for each location identified by its numerical
>>> > code, the distance to other locations (censDist has 1.5 million rows).
>>> > The head(censDist) below, for example, reads like this:
>>> >
>>> > location 2924 has a distance to 2732 of 1309.7525
>>> > location 2924 has a distance to 2875 of 696.2891,
>>> > etc.
>>> >
>>> >> head(censDist)
>>> >   f_cell f_cell_neigh  distance
>>> > 1   2924         2732 1309.7525
>>> > 2   2924         2875  696.2891
>>> > 3   2924         2351 1346.0561
>>> > 4   2924         2350 1296.9804
>>> > 5   2924         2725 1278.1877
>>> > 6   2924         2721 1346.9126
>>> >
>>> >
>>> > Basically, for every location in id_cell I should pick up the distance
>>> > to other locations in id_cell from censDist, and allocate it in M.
>>> >
>>> > I have not come up with a satisfactory vectorization of this problem and
>>> > using a loop is out of the question.
>>> >
>>> > Thanks for your help
>>> > Mario
>>> >
>>> > [[alternative HTML version deleted]]

______________________________________________
R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.