Re: [Rd] best way to extract this meaningful data from a table

Kasper Daniel Hansen Mon, 18 Feb 2013 18:58:43 -0800

This is not an R-devel question, so please do not reply to this list.

I would try
  sapply(strsplit(as.character(loaded.topics$doc.id), "_"), function(xx) xx[1])
to get the MD part.
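
The same idea extends to the topics column, so no explicit loop over the
table is needed. What follows is only a minimal sketch under some
assumptions: the small data frame is made-up sample data in the shape of
the table in the question, the columns are assumed to be named doc.id and
topics (the names read.table actually produces may differ, depending on
the header line and surrounding whitespace), and "0" or an empty string
is taken to mean "no topic".

```r
# Hypothetical sample mirroring the table in the question (not the real file).
loaded.topics <- data.frame(
  lang     = c("se", "se", "se", "se"),
  basic.id = c(447157, 447158, 447159, 447160),
  doc.id   = c("MD_2002_0014", "MD_2002_0015", "PR_2003_0001", "PR_2003_0002"),
  topics   = c("12", "0", "3,7,9", "5,8"),
  stringsAsFactors = FALSE
)

# Document type: drop everything from the first underscore onwards.
doc.type <- sub("_.*$", "", trimws(as.character(loaded.topics$doc.id)))

# Split the comma-separated topics into one character vector per row.
topic.list <- strsplit(trimws(as.character(loaded.topics$topics)), ",",
                       fixed = TRUE)

# Number of real topics per document; "0" and empty entries do not count.
n.topics <- vapply(topic.list, function(tt) {
  tt <- trimws(tt)
  sum(nzchar(tt) & tt != "0")
}, integer(1))

# Per-type summaries: documents without topics, and median topic count.
no.topics.by.type <- tapply(n.topics == 0, doc.type, sum)
median.by.type    <- tapply(n.topics, doc.type, median)
```

Printing no.topics.by.type and median.by.type gives one value per
document type; further per-type reports can be built the same way by
swapping in another function for sum or median in the tapply() call.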


Kasper

On Mon, Feb 18, 2013 at 7:19 PM, bryan rasmussen
<[email protected]> wrote:
> I have a table with a structure like the following:
>
> lang | basic id | doc id | topics|
> se  | 447157 | MD_2002_0014 |12 |
>
> loaded.topics <- read.table("path to file", header=TRUE, sep="|",
> fileEncoding="utf-8")
>
> In that table the meaningful data (in this context) are the text
> before the first underscore in doc id, which is the document type (for
> example MD as above), and topics.
> However, topics can hold more than one value; multiple values are
> comma separated. If there is no actual topic I have a 0, although I can
> also have an empty column if I want.
>
> So what I want is the best way to extract the meaningful data - the
> comma separated values of each topics column and the actual document
> type so that I can start to do reports of how many documents of type X
> have no topics, median number of topics per document type etc.
>
> Do I have to loop through the table and build a new table up with the
> info I want, or is there a smarter way to do it?
> If there is a smarter way, what is it?
>
> Thanks,
> Bryan Rasmussen
>
> ______________________________________________
> [email protected] mailing list
> https://stat.ethz.ch/mailman/listinfo/r-devel

