On Fri, Jul 4, 2014 at 7:50 AM, João Azevedo Patrício <joao.patri...@gmx.pt> wrote:
> Hi,
>
> I've been trying to solve this issue but with no success.
>
> I have some data like this:
>
> 1  TC  WC
> 2  0   Instruments & Instrumentation; Nuclear Science & Technology; Physics, Particles & Fields; Spectroscopy
> 3  0   Nanoscience & Nanotechnology; Materials Science, Multidisciplinary; Physics, Applied
> 4  2   Physics, Nuclear; Physics, Particles & Fields
> 5  0   Chemistry, Inorganic & Nuclear
> 6  2   Chemistry, Physical; Materials Science, Multidisciplinary; Metallurgy & Metallurgical Engineering
>
> And I need to have this:
>
> 1  TC  WC
> 2  0   Instruments & Instrumentation
> 2  0   Nuclear Science & Technology
> 2  0   Physics, Particles & Fields
> 2  0   Spectroscopy
> 3  0   Nanoscience & Nanotechnology
> 3  0   Materials Science, Multidisciplinary
> 3  0   Physics, Applied
> 4  2   Physics, Nuclear
> 4  2   Physics, Particles & Fields
> 5  0   Chemistry, Inorganic & Nuclear
> 6  2   Chemistry, Physical
> 6  2   Materials Science, Multidisciplinary
> 6  2   Metallurgy & Metallurgical Engineering
>
> This means repeating the row for each element in WC while keeping the same
> value in TC. The goal is to check how many TC (sum) there are by WC, when WC
> is multiple.
>
> I've tried to separate the column using strsplit, but then I cannot keep
> track of TC.
>
> Thanks in advance.
> --
> João Azevedo Patrício

The best that I've come up with, which seems to give the desired result for the example data given:

    # Split WC at each semicolon and repeat TC once per resulting piece.
    splitAtSemiColon <- function(input) {
        # as.character() guards against WC having been read in as a factor
        z <- strsplit(as.character(input$WC), ';');
        result <- data.frame(TC=rep(input$TC, sapply(z, length)),
                             WC=unlist(z));
        return(result);
    }
    flatted.data <- splitAtSemiColon(original.data);

<transcript>
> print(original.data,right=FALSE)
  TC
1 0
2 0
3 2
4 0
5 2
  WC
1 Instruments & Instrumentation; Nuclear Science & Technology; Physics, Particles & Fields; Spectroscopy
2 Nanoscience & Nanotechnology; Materials Science, Multidisciplinary; Physics, Applied
3 Physics, Nuclear; Physics, Particles & Fields
4 Chemistry, Inorganic & Nuclear
5 Chemistry, Physical; Materials Science, Multidisciplinary; Metallurgy & Metallurgical Engineering
> print(splitAtSemiColon,right=FALSE);
function(x) {
    z=strsplit(x$WC,';');
    result3=data.frame(TC=rep(x$TC,sapply(z,length)),WC=unlist(z));
    return(result3);
}
> print(splitAtSemiColon(original.data),right=FALSE);
   TC WC
1  0  Instruments & Instrumentation
2  0  Nuclear Science & Technology
3  0  Physics, Particles & Fields
4  0  Spectroscopy
5  0  Nanoscience & Nanotechnology
6  0  Materials Science, Multidisciplinary
7  0  Physics, Applied
8  2  Physics, Nuclear
9  2  Physics, Particles & Fields
10 0  Chemistry, Inorganic & Nuclear
11 2  Chemistry, Physical
12 2  Materials Science, Multidisciplinary
13 2  Metallurgy & Metallurgical Engineering
</transcript>

Note that I still have a problem in that the WC data can have leading and/or trailing blanks due to the way that strsplit works. The easiest way to fix this is to use the str_trim() function from the stringr package (not base R's strtrim(), which truncates strings to a given width).

--
There is nothing more pleasant than traveling and meeting new people!
Genghis Khan

Maranatha! <><
John McKown
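
P.S. For what it's worth, here is a minimal sketch of how the blank trimming and the "sum of TC by WC" step from the original question might be put together. It uses only base R (gsub() for the trimming rather than stringr, and aggregate() for the sum). The helper name splitAndTrim and the three sample rows are my own assumptions for illustration; the rows are copied from the example data in the original post.

```r
# Sample rows copied from the original post
# (TC = count, WC = ';'-separated categories).
original.data <- data.frame(
  TC = c(0, 2, 2),
  WC = c(
    "Nanoscience & Nanotechnology; Materials Science, Multidisciplinary; Physics, Applied",
    "Physics, Nuclear; Physics, Particles & Fields",
    "Chemistry, Physical; Materials Science, Multidisciplinary; Metallurgy & Metallurgical Engineering"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper: one output row per category, with TC repeated.
splitAndTrim <- function(input) {
  parts <- strsplit(as.character(input$WC), ";", fixed = TRUE)
  data.frame(
    TC = rep(input$TC, sapply(parts, length)),
    # Strip the leading/trailing whitespace left behind by the split.
    WC = gsub("^\\s+|\\s+$", "", unlist(parts)),
    stringsAsFactors = FALSE
  )
}

flattened.data <- splitAndTrim(original.data)

# Sum TC within each distinct WC value.
tc.by.wc <- aggregate(TC ~ WC, data = flattened.data, FUN = sum)
print(tc.by.wc, right = FALSE)
```

Trimming before aggregating matters here: without it, " Materials Science, Multidisciplinary" (with a leading blank) would be treated as its own WC value, so categories that only ever appear after a semicolon would be grouped under the untrimmed spelling.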