Mark W. Miller wrote:
>
> I have a list of scientific names in a data set. I would like to split
> the names into genus, species and subspecies. Not all names include a
> subspecies. Could someone show me how to do this?
>
strsplit should work for your example (here aa is assumed to be your character vector of names):

parts <- strsplit(aa, " ")
data.frame(
  genus      = sapply(parts, "[", 1),
  species    = sapply(parts, "[", 2),
  subspecies = sapply(parts, "[", 3)  ## will be NA for missing subsp
)

However, scientific names are often pretty messy. I often have datasets like this:

x
[1] "Aquilegia caerulea James var. caerulea"
[2] "Aquilegia caerulea James var. ochroleuca Hook."
[3] "Aquilegia caerulea James var. pinetorum (Tidestrom) Payson ex Kearney & Peebles"
[4] "Aquilegia caerulea James"
[5] "Aquilegia chaplinei Standl."
[6] "Aquilegia chaplinei Standley ex Payson"
[7] "Aquilegia chrysantha Gray var. chrysantha"
[8] "Aquilegia chrysantha Gray"

So I first strip out the author names, using strsplit to break each name into words and grep to find the subspecies/variety abbreviations:

noauthor <- function(x) {
  ## split each name into a vector of separate words
  y <- strsplit(x, " ")
  sapply(y, function(words) {
    ## positions of rank abbreviations: var., ssp., var, f. (add others as needed)
    n <- grep("^var\\.$|^ssp\\.$|^var$|^f\\.$", words)
    ## paste together the first and second elements plus each rank
    ## abbreviation and the element after it; use sort in case the name
    ## includes both var. and ssp., which sometimes happens
    paste(words[sort(c(1:2, n, n + 1))], collapse = " ")
  })
}

noauthor(x[1:8])
[1] "Aquilegia caerulea var. caerulea"
[2] "Aquilegia caerulea var. ochroleuca"
[3] "Aquilegia caerulea var. pinetorum"
[4] "Aquilegia caerulea"
[5] "Aquilegia chaplinei"
[6] "Aquilegia chaplinei"
[7] "Aquilegia chrysantha var. chrysantha"
[8] "Aquilegia chrysantha"

Chris

--
View this message in context: http://old.nabble.com/splitting-scientific-names-into-genus%2C-species%2C-and-subspecies-tp26204666p26205654.html
Sent from the R help mailing list archive at Nabble.com.
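
To get from messy names like the ones above to the separate columns asked for in the original question, the same word-splitting idea can be taken one step further. The following is only a rough sketch, not part of the post above: split_names is a hypothetical helper name, and it assumes that the first two words are always genus and species and that the infraspecific epithet is the word directly after the first rank abbreviation (the same abbreviations that noauthor looks for). Hybrids and other unusual name formats would need extra handling.

```r
## Hypothetical helper (an illustration, not from the original post):
## split raw scientific names, with or without author names, into columns.
split_names <- function(x) {
  ## same rank abbreviations as in noauthor(); extend as needed
  rank_pattern <- "^(var\\.|ssp\\.|var|f\\.)$"
  parts <- strsplit(x, " ")
  rows <- lapply(parts, function(words) {
    ## position of the first rank abbreviation; NA if there is none
    n <- grep(rank_pattern, words)[1]
    data.frame(
      genus        = words[1],
      species      = words[2],
      rank         = if (is.na(n)) NA_character_ else words[n],
      infraspecies = if (is.na(n)) NA_character_ else words[n + 1],
      stringsAsFactors = FALSE
    )
  })
  ## stack the one-row data frames into a single data frame
  do.call(rbind, rows)
}

## Example call on two of the names shown above
split_names(c("Aquilegia caerulea James var. ochroleuca Hook.",
              "Aquilegia chaplinei Standl."))
```

Names without a rank abbreviation get NA in both the rank and infraspecies columns, which mirrors the NA behaviour of the simple strsplit approach.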