Thank you so much Rui! On Sun, Feb 4, 2018 at 12:20 AM, Rui Barradas <ruipbarra...@sapo.pt> wrote:
> Hello, > > Please always cc the list. > > As for the question, I believe the following does it. > > a <- strsplit(mydata$ID, "[[:alpha:]]+") > b <- strsplit(mydata$ID, "[[:digit:]]+") > > a <- sapply(a, `[`, 1) > c <- sapply(a, `[`, 2) > b <- sapply(b, function(x) x[x != ""]) > > c2 <- sprintf("%010d", as.integer(c)) > > newID <- paste0(a, b, c2) > > > Hope this helps, > > Rui Barradas > > On 2/4/2018 2:01 AM, Val wrote: > >> Thank you so much again for your help! >> >> I have one more question related to this. >> >> 1. How do I further split this "358USA1540165 " into three parts. >> a) 358 >> b) USA >> c) 1540165 >> >> I want to add leading zeros to the third part like "0001540165" >> and then combine b and c to get this USA1540165 >> so USA1540165 changed to USA1540165 >> >> The other one is that the data set has several country codes and if I >> want to limit my data set to only certain country codes , how do I do that. >> >> Thank you again >> >> >> >> >> On Sat, Feb 3, 2018 at 1:05 PM, Rui Barradas <ruipbarra...@sapo.pt >> <mailto:ruipbarra...@sapo.pt>> wrote: >> >> Hello, >> >> As for the first question, instead of writing a xlsx file, maybe it >> is easier to write a csv file and then open it with Excel. >> >> tbl2 <- addmargins(tbl1) >> write.csv(tbl2, "tt1.csv") >> >> As for the second question, the following does it. >> >> inx <- apply(tbl1, 1, function(x) all(x != 0)) >> tbl1b <- addmargins(tbl1[inx, ]) >> tbl1b >> >> >> Hope this helps, >> >> Rui Barradas >> >> On 2/3/2018 4:42 PM, Val wrote: >> >> Thank you so much Rui. >> >> 1. How do I export this table to excel file? >> I used this >> tbl1 <- table(Country, IDNum) >> tbl2=addmargins(tbl1) >> write.xlsx(tbl2,"tt1.xlsx"),sheetName="summary", >> row.names=FALSE) >> The above did not give me that table. >> >> >> 2. I want select those unique Ids that do have records in all >> countries. >> From the above data set, this ID "FIN1540166" should be >> excluded from the summary table and the table looks like as follow >> >> IDNum Country 1 33 358 44 Sum CAN1540164 47 141 248 90 526 >> USA1540165 290 757 321 171 1539 Sum 337 898 569 261 2065 >> >> Thank you again >> >> >> On Fri, Feb 2, 2018 at 11:26 PM, Rui Barradas >> <ruipbarra...@sapo.pt <mailto:ruipbarra...@sapo.pt> >> <mailto:ruipbarra...@sapo.pt <mailto:ruipbarra...@sapo.pt>>> >> wrote: >> >> Hello, >> >> Thanks for the reproducible example. >> See if the following does what you want. >> >> IDNum <- sub("^(\\d+).*", "\\1", mydata$ID) >> Country <- sub("^\\d+(.*)", "\\1", mydata$ID) >> >> tbl1 <- table(Country, IDNum) >> addmargins(tbl1) >> >> tbl2 <- xtabs(Y ~ Country + IDNum, mydata) >> addmargins(tbl2) >> >> >> Hope this helps, >> >> Rui Barradas >> >> >> On 2/3/2018 3:00 AM, Val wrote: >> >> Hi all, >> >> I have a data set need to be summarized by unique ID >> (count and >> sum of a >> variable) >> A unique individual ID (country name Abbreviation >> followed by >> an integer >> numbers) may have observation in several countries. >> Then the ID was >> changed by adding the country code as a prefix and >> new ID was >> constructed >> or recorded like (country code, + the original unique >> ID Example >> original ID "CAN1540164" , if this ID has an >> observation in >> CANADA then >> the ID was changed to "1CAN1540164". From this new >> ID I >> want get out >> the country code get the original unique ID and >> summarize >> the data by >> unique ID and country code >> >> The data set look like >> mydata <- read.table(textConnection("GR ID iflag Y >> A 1CAN1540164 1 20 >> A 1CAN1540164 1 12 >> A 1CAN1540164 1 15 >> A 44CAN1540164 1 30 >> A 44CAN1540164 1 24 >> A 44CAN1540164 1 25 >> A 44CAN1540164 1 11 >> A 33CAN1540164 1 12 >> A 33CAN1540164 1 23 >> A 33CAN1540164 1 65 >> A 33CAN1540164 1 41 >> A 358CAN1540164 1 28 >> A 358CAN1540164 1 32 >> A 358CAN1540164 1 41 >> A 358CAN1540164 1 54 >> A 358CAN1540164 1 29 >> A 358CAN1540164 1 64 >> B 1USA1540165 1 125 >> B 1USA1540165 1 165 >> B 44USA1540165 1 171 >> B 33USA1540165 1 254 >> B 33USA1540165 1 241 >> B 33USA1540165 1 262 >> B 358USA1540165 1 321 >> C 358FIN1540166 1 225 "),header = TRUE >> ,stringsAsFactors = FALSE) >> >> From the above data there are three unique IDs and >> four >> country codes (1, >> 44, 33 and 358) >> >> I want the following two tables >> >> Table 1. count the unique ID by country code >> 1 44 33 358 TOT >> CAN1540164 3 4 4 6 17 >> USA1540165 2 1 3 1 7 >> FIN1540166 - - - 1 1 >> TOT 5 5 7 8 25 >> >> >> Table 2 Sum of Y variable by unique ID and country. code >> >> 1 44 33 >> 358 TOT >> CAN1540164 47 90 141 248 526 >> USA1540165 290 171 757 321 1539 >> FIN1540166 - - - 225 >> 225 >> TOT 337 261 898 794 >> 2290 >> >> >> How do I do it in R? >> >> The first step is to get the unique country codes >> unique ID >> by splitting >> the new ID >> >> Thank you in advance >> >> [[alternative HTML version deleted]] >> >> ______________________________________________ >> R-help@r-project.org <mailto:R-help@r-project.org> >> <mailto:R-help@r-project.org <mailto:R-help@r-project.org>> >> mailing list >> -- To UNSUBSCRIBE and more, see >> https://stat.ethz.ch/mailman/listinfo/r-help >> <https://stat.ethz.ch/mailman/listinfo/r-help> >> <https://stat.ethz.ch/mailman/listinfo/r-help >> <https://stat.ethz.ch/mailman/listinfo/r-help>> >> PLEASE do read the posting guide >> http://www.R-project.org/posting-guide.html >> <http://www.R-project.org/posting-guide.html> >> <http://www.R-project.org/posting-guide.html >> <http://www.R-project.org/posting-guide.html>> >> and provide commented, minimal, self-contained, >> reproducible code. >> >> >> >> [[alternative HTML version deleted]] ______________________________________________ R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see https://stat.ethz.ch/mailman/listinfo/r-help PLEASE do read the posting guide http://www.R-project.org/posting-guide.html and provide commented, minimal, self-contained, reproducible code.