This is not an R-devel question, so please do not reply to this list. I would try

sapply(strsplit(as.character(loaded.topics$doc.id), "_"), function(xx) xx[1])

to get the MD part.
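Below is a small self-contained sketch of that suggestion, run on three hypothetical doc ids. The sample data frame and the column name `doc.id` are assumptions, not the poster's data. It wraps the column in `as.character()` because `read.table()` returned strings as factors by default before R 4.0.0, and `strsplit()` rejects non-character input. `sub()` is shown as a vectorised alternative that keeps only the text before the first underscore.

```r
# Hypothetical sample data; stringsAsFactors = TRUE mimics read.table()'s
# default before R 4.0.0, where text columns arrived as factors.
loaded.topics <- data.frame(
  doc.id = c("MD_2002_0014", "MD_2003_0101", "LB_2002_0007"),
  stringsAsFactors = TRUE
)

# Split each id on "_" and keep the first piece (the document type).
# as.character() is needed because strsplit() does not accept factors.
doc.type <- sapply(strsplit(as.character(loaded.topics$doc.id), "_"),
                   function(xx) xx[1])

# Vectorised alternative: delete everything from the first underscore on.
# sub() coerces a factor to character itself.
doc.type2 <- sub("_.*$", "", loaded.topics$doc.id)

print(doc.type)
```

Both versions return a plain character vector with one document type per row, which can be added back to the data frame as a new column.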
Kasper

On Mon, Feb 18, 2013 at 7:19 PM, bryan rasmussen <rasmussen.br...@gmail.com> wrote:
> I have a table with a structure like the following:
>
> lang | basic id | doc id       | topics |
> se   | 447157   | MD_2002_0014 | 12     |
>
> loaded.topics <- read.table("path to file", header=TRUE, sep="|",
>                             fileEncoding="utf-8")
>
> In that table the actual meaningful data (in this context) is the text
> before the first underscore in doc id, which is the document type (for
> example MD as above), and topics.
> However, topics can have more than one value in it; multiple values are
> comma separated. If there is no actual topic I have a 0, although I can
> also have an empty column if I want.
>
> So what I want is the best way to extract the meaningful data - the
> comma-separated values of each topics column and the actual document
> type - so that I can start to do reports of how many documents of type X
> have no topics, the median number of topics per document type, etc.
>
> Do I have to loop through the table and build up a new table with the
> info I want, or is there a smarter way to do it?
> If there is a smarter way, what is it?
>
> Thanks,
> Bryan Rasmussen
>
> ______________________________________________
> R-devel@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel
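The rest of the question (counting the comma-separated topics and reporting per document type) can be handled without an explicit loop over rows. The sketch below is one possible approach, not one proposed in the thread. It reads a few hypothetical sample rows with `read.table(text = ...)`, assumes the header turns into the column names `doc.id` and `topics`, and treats both "0" and an empty topics field as "no topics", as the question describes. The names `topic.list`, `by.type` and `report` are illustrative, and `trimws()` needs R 3.2.0 or later.

```r
# Hypothetical sample rows in the "|"-separated layout from the question.
# With a real file, pass "path to file" and fileEncoding = "utf-8" instead of text =.
txt <- "lang | basic id | doc id | topics
se | 447157 | MD_2002_0014 | 12
se | 447158 | MD_2002_0015 | 3,12,40
se | 447159 | LB_2002_0001 | 0
se | 447160 | LB_2002_0002 |
se | 447161 | MD_2003_0001 | 7,9"

# strip.white trims the spaces around each "|"; colClasses = "character"
# keeps topics as text (e.g. "3,12,40") and avoids factor conversion.
# fill = TRUE pads any row that is missing trailing fields.
loaded.topics <- read.table(text = txt, header = TRUE, sep = "|",
                            strip.white = TRUE, fill = TRUE,
                            colClasses = "character")

# Document type: everything before the first underscore in doc.id.
loaded.topics$doc.type <- sub("_.*$", "", loaded.topics$doc.id)

# Split the comma-separated topics; drop blanks, NAs and the "0" placeholder,
# so "0" and an empty field both count as "no topics".
topic.list <- lapply(strsplit(loaded.topics$topics, ",", fixed = TRUE),
                     function(x) {
                       x <- trimws(x)
                       x[!is.na(x) & x != "" & x != "0"]
                     })
loaded.topics$n.topics <- vapply(topic.list, length, integer(1))

# Summarise per document type: number of documents, documents with
# no topics, and median number of topics per document.
by.type <- split(loaded.topics$n.topics, loaded.topics$doc.type)
report <- data.frame(
  doc.type      = names(by.type),
  docs          = vapply(by.type, length, integer(1)),
  no.topics     = vapply(by.type, function(n) sum(n == 0), integer(1)),
  median.topics = vapply(by.type, median, numeric(1)),
  row.names = NULL, stringsAsFactors = FALSE
)
print(report)
```

For these sample rows, `report` lists 2 LB documents (both without topics) and 3 MD documents with a median of 2 topics. Keeping the split topics in a list column rather than widening the table avoids having to guess the maximum number of topics per document.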