> On Apr 25, 2016, at 2:32 AM, Sunny Singha <sunnysingha.analyt...@gmail.com> wrote:
>
> Thank you Jim,
> The code did help me get what I needed.
> Also, I learnt that there are different types of dashes
> (en-dash/em-dash/hyphen), as explained on this site:
> http://www.punctuationmatters.com/hyphen-dash-n-dash-and-m-dash/
>
> I achieved it by executing the command below after going through this page
> on stackoverflow:
> http://stackoverflow.com/questions/9223795/how-to-correctly-deal-with-escaped-unicode-characters-in-r-e-g-the-em-dash
>
> splitends<-sapply(end,strsplit,"-|\u2013|,")
>
> where '\u2013' is, I guess, the Unicode escape for the en-dash
> character in the range values.
> I had scraped the HTML table from this web page:
> https://en.wikipedia.org/wiki/List_of_World_Heritage_in_Danger
> and the range values do have en-dash characters.
>
> For now the issue is resolved, but how does one find values similar
> to '\u2013' for other possible special characters that need to be
> specified in the regex?
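
One way to find out which escapes a scraped vector needs is to list the non-ASCII characters it contains together with their code points. A minimal sketch, assuming the strings are valid UTF-8; the helper name show_codepoints is made up for illustration, and the sample vector is a subset of the values quoted further down:

```r
# Hypothetical helper (not from the thread): report the code points of all
# non-ASCII characters found in a character vector.
show_codepoints <- function(x) {
  # split every element into single characters and de-duplicate them
  chars <- unique(unlist(strsplit(enc2utf8(x), "")))
  # utf8ToInt() maps one UTF-8 character to its integer code point
  cps <- vapply(chars, utf8ToInt, integer(1))
  # keep only characters outside the ASCII range
  sprintf("U+%04X", cps[cps > 127L])
}

end <- c("2001-", "1993\u20132007, 2010-", "1984\u20131992, 1996-")
show_codepoints(end)
# "U+2013"
```

Each code point reported this way can be written as a '\uXXXX' escape in the pattern. If the PCRE library behind your R build has Unicode property support, the class \p{Pd} (dash punctuation) used with perl = TRUE should cover the hyphen and the U+2010 to U+2015 dashes in one go, but treat that as something to check on your own platform.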

It's possible to target several Unicode characters at once using a regex
character class. A character class has its own range operator ('-'); R's
sequence operator ':' does not act as one there. In the example below the
':' is matched literally, so the class simply lists the two characters.

x <- "\"em\u2013dash\" \"em–dash\" \" em \u2016 dash\""
gsub('[\u2013:\u2016]', "", x)   # removes both U+2013 and U+2016
#[1] "\"emdash\" \"emdash\" \" em  dash\""

--
David.

>
> Regards,
> Sunny Singha.
>
>
> On Mon, Apr 25, 2016 at 12:39 PM, Jim Lemon <drjimle...@gmail.com> wrote:
>> Hi Sunny,
>> Try this:
>>
>> # notice that I have replaced the fancy hyphens with real hyphens
>> end<-c("2001-","1992-","2013-","2013-","2013-","2013-",
>>  "1993-2007","2010-","2012-","1984-1992","1996-","2015-")
>> splitends<-sapply(end,strsplit,"-")
>> last_bit<-function(x) return(x[length(x)])
>> sapply(splitends,last_bit)
>>
>> Jim
>>
>> On Mon, Apr 25, 2016 at 4:35 PM, Sunny Singha
>> <sunnysingha.analyt...@gmail.com> wrote:
>>> Hi,
>>> I have a char vector with year values. Some cells have a single year
>>> value like '2001-' and some have a range like 1996-2007.
>>> I need to remove the hyphen character '-' from all the values within the
>>> character vector named 'end'. After removing the hyphen I need to
>>> get the last number from the cells that hold year ranges, i.e. if the
>>> cell has the range 1996-2007, the code should return 2007.
>>>
>>> How could I get this done?
>>>
>>> Below are the values within this char vector:
>>>
>>>> end
>>>  [1] "2001-" "1992-" "2013-" "2013-" "2013-" "2013-"
>>>  [7] "2003-" "2010-" "2009-" "1986-" "2012-" "2003-"
>>> [13] "2005-" "2013-" "2003-" "2013-" "1993–2007, 2010-" "2012-"
>>> [19] "1984–1992, 1996-" "2015-" "2009-" "2000-" "2005-" "1997-"
>>> [25] "2012-" "1997-" "2002-" "2006-" "1992-" "2007-"
>>> [31] "1997-" "1982-" "2015-" "2015-" "2010-" "1996–2007, 2011-"
>>> [37] "2004-" "1999-" "2007-" "1996-" "2013-" "2012-"
>>> [43] "2012-" "2010-" "2011-" "1994-" "2014-"
>>>
>>> I tried the command below:
>>> gsub('[-|,]', '', end)
>>> This did remove the hyphen characters, but not from the cells holding
>>> year ranges. Below is the result after executing the command above.
>>> As you can see, the hyphen character is removed from the single values
>>> but not from the ranges. Please guide.
>>>
>>>> gsub('[-|,]', '', end)
>>>  [1] "2001" "1992" "2013" "2013" "2013" "2013" "2003"
>>>  [8] "2010" "2009" "1986" "2012" "2003" "2005" "2013"
>>> [15] "2003" "2013" "1993–2007 2010" "2012" "1984–1992 1996" "2015" "2009"
>>> [22] "2000" "2005" "1997" "2012" "1997" "2002" "2006"
>>> [29] "1992" "2007" "1997" "1982" "2015" "2015" "2010"
>>> [36] "1996–2007 2011" "2004" "1999" "2007" "1996" "2013" "2012"
>>> [43] "2012" "2010" "2011" "1994" "2014"
>>>
>>> Regards,
>>> Sunny Singha
>>>
>>> ______________________________________________
>>> R-help@r-project.org mailing list -- To UNSUBSCRIBE and more, see
>>> https://stat.ethz.ch/mailman/listinfo/r-help
>>> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
>>> and provide commented, minimal, self-contained, reproducible code.

David Winsemius
Alameda, CA, USA
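
For the original task (the last year in each cell), one alternative that sidesteps the dash variants altogether is to pull out the four-digit runs directly instead of splitting on separators. A minimal sketch, assuming every year is written as exactly four digits; last_year is a hypothetical helper name, and the sample vector is a subset of the values quoted above:

```r
# Hypothetical helper (not from the thread): return the last four-digit
# run found in each element, or NA when an element contains none.
last_year <- function(x) {
  # gregexpr() locates every four-digit run; regmatches() extracts them,
  # giving one character vector of matches per element of x
  years <- regmatches(x, gregexpr("[0-9]{4}", x))
  vapply(years,
         function(y) if (length(y)) y[length(y)] else NA_character_,
         character(1))
}

end <- c("2001-", "1993\u20132007, 2010-", "1984\u20131992, 1996-", "2015-")
last_year(end)
# [1] "2001" "2010" "1996" "2015"
```

Because only the digits are matched, it does not matter whether the separator is a hyphen, an en-dash, a comma, or a space, and the results carry no leading blanks.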