Hi Phil,

Sorry it's in base R rather than the tidyverse environment you are using, but perhaps this will help:
taby <- table(df$y)   # counts each name over the whole column
ynames <- names(taby)
for (yval in seq_along(taby)) {
  if (taby[yval] > 1) {
    # print the new names, then number every occurrence of this name
    cat(paste(ynames[yval], 1:taby[yval], sep = ""), "\n")
    df$y[which(df$y == ynames[yval])] <- paste(ynames[yval], 1:taby[yval], sep = "")
  }
}

Jim

On Sun, Mar 29, 2020 at 12:19 PM <p...@philipsmith.ca> wrote:
>
> I have a problem involving inefficient coding. My code works, but in my
> actual application it takes a very long time to execute. I have included
> a reprex here that uses the same code, but with a much smaller-scale
> application.
>
> The data frame I am working with (df in my reprex) is in long form and I
> want to change it to wide form. My problem is that the pivot column,
> column 2 in my reprex, has some duplicate strings, so the pivot doesn't
> work well (df1 in my reprex). I want to find all the duplicates and tag
> them so they are no longer duplicates. My code succeeds (df3 in my
> reprex). But in the real application there can be over 100 "cases" and
> the for loops grind on far too long.
>
> I encounter this problem frequently in the datasets I use, so I am
> looking for a general solution that is as efficient as possible. Any
> help will be much appreciated.
>
> Philip
>
> ``` r
> library(tidyverse)
> df <- data.frame(time=c(1,1,1,1,1,1,2,2,2,2,2,2),
>   y=c("A","B","C","B","D","C","A","B","C","B","D","C"),
>   z=sample(1:100,12,replace=TRUE),stringsAsFactors=FALSE)
> df1 <- pivot_wider(df,id_cols=1,names_from=y,values_from=z)
> #> Warning: Values in `z` are not uniquely identified; output will contain list-cols.
> #> * Use `values_fn = list(z = list)` to suppress this warning.
> #> * Use `values_fn = list(z = length)` to identify where the duplicates arise
> #> * Use `values_fn = list(z = summary_fun)` to summarise duplicates
> fixcol <- function(dfm,cases,per,s,tag) {
>   # dfm is the data frame
>   # s is the target column number, containing character names
>   # tag is a string to be added to a duplicate name
>   # cases is the number of rows for a single time period
>   # per is the number of time periods
>   # all time periods must have the same number of rows
>   for (k in 1:per) {
>     for (i in (1+(k-1)*cases):(k*cases-1)) {
>       for (j in (i+1):(k*cases)) {
>         if (dfm[j,s]==dfm[i,s]) { # found a duplicate
>           dfm[j,s] <- paste0(dfm[i,s],tag) # fix the duplicate
>           dfm[j,s]
>         }
>       }
>     }
>   }
>   return(dfm)
> }
> df2 <- fixcol(df,6,2,2,"_dup")
> df3 <- pivot_wider(df2,id_cols=1,names_from=y,values_from=z)
> ```
>
> <sup>Created on 2020-03-28 by the [reprex
> package](https://reprex.tidyverse.org)
> (v0.3.0)</sup>
> ______________________________________________
> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
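A vectorized variant, offered as a sketch under stated assumptions: the table(df$y) loop counts each name over the whole column, so in this reprex A becomes A1 in period 1 and A2 in period 2, and the pivot would put them in separate columns. fixcol() instead compares names only within each period. The sketch below reproduces fixcol()'s naming (B, B_dup, B_dup_dup, ...) using base R's ave() to number each name's occurrences within its time period, so it needs neither the nested loops nor equal-sized periods. The helper name tag_dups and its id/key/tag arguments are hypothetical, and z is fixed at 1:12 instead of sample() so the result is repeatable.

```r
# Hypothetical helper (not from the thread): append `tag` to repeated
# names, counting repeats separately within each time period.
tag_dups <- function(dfm, id = "time", key = "y", tag = "_dup") {
  # Occurrence number (1, 2, 3, ...) of each name within its period.
  occ <- ave(seq_len(nrow(dfm)), dfm[[id]], dfm[[key]], FUN = seq_along)
  # First occurrence unchanged; the n-th occurrence gets `tag` n - 1 times,
  # mirroring the names fixcol() produces (B, B_dup, B_dup_dup, ...).
  dfm[[key]] <- paste0(dfm[[key]], strrep(tag, occ - 1))
  dfm
}

# Reprex data, with fixed z values instead of sample() (assumption).
df <- data.frame(time = rep(c(1, 2), each = 6),
                 y = rep(c("A", "B", "C", "B", "D", "C"), 2),
                 z = 1:12, stringsAsFactors = FALSE)
df2 <- tag_dups(df)
print(df2)
```

With the tags in place, `pivot_wider(df2, id_cols = 1, names_from = y, values_from = z)` should give one ordinary column per name instead of list-cols, because each (time, y) pair is now unique.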