Does this help? df4 <- ( df %>% group_by( time, y ) %>% mutate( lvl = seq.int( n() ) ) %>% ungroup() %>% mutate( y = ifelse( 1==lvl , y , paste( y, "dup" ) ) ) )
On March 28, 2020 6:18:51 PM PDT, p...@philipsmith.ca wrote: >I have a problem involving inefficient coding. My code works, but in my > >actual application it takes a very long time to execute. I have >included >a reprex here that uses the same code, but with a much smaller-scale >application. > >The data frame I am working with (df in my reprex) is in long form and >I >want to change it to wide form. My problem is that the pivot column, >column 2 in my reprex, has some duplicate strings, so the pivot doesn't > >work well (df1 in my reprex). I want to find all the duplicates and tag > >them so they are no longer duplicates. My code succeeds (df3 in my >reprex). But in the real application there can be over 100 "cases" and >the for loops grind on far too long. > >I encounter this problem frequently in the datasets I use, so I am >looking for a general solution that is as efficient as possible. Any >help will be much appreciated. > >Philip > >``` r >library(tidyverse) >df <- data.frame(time=c(1,1,1,1,1,1,2,2,2,2,2,2), > y=c("A","B","C","B","D","C","A","B","C","B","D","C"), > z=sample(1:100,12,replace=TRUE),stringsAsFactors=FALSE) >df1 <- pivot_wider(df,id_cols=1,names_from=y,values_from=z) >#> Warning: Values in `z` are not uniquely identified; output will >contain list-cols. >#> * Use `values_fn = list(z = list)` to suppress this warning. >#> * Use `values_fn = list(z = length)` to identify where the >duplicates >arise >#> * Use `values_fn = list(z = summary_fun)` to summarise duplicates >fixcol <- function(dfm,cases,per,s,tag) { > # dfm is the data frame > # s is the target column number, containing character names > # tag is a string to be added to a duplicate name > # cases is the number of rows for a single time period > # per is the number of time periods > # all time periods must have the same number of rows > for (k in 1:per) { > for (i in (1+(k-1)*cases):(k*cases-1)) { > for (j in (i+1):(k*cases)) { > if (dfm[j,s]==dfm[i,s]) { # found a duplicate > dfm[j,s] <- paste0(dfm[i,s],tag) # fix the duplicate > dfm[j,s] > } > } > } > } > return(dfm) >} >df2 <- fixcol(df,6,2,2,"_dup") >df3 <- pivot_wider(df2,id_cols=1,names_from=y,values_from=z) >``` > ><sup>Created on 2020-03-28 by the [reprex >package](https://reprex.tidyverse.org) (v0.3.0)</sup> -- Sent from my phone. Please excuse my brevity. ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.