I did this on the source files, which were semicolon-delimited (the semicolons
delimit the fields; I am not sure what character marks the start of a new tweet).
After loading the tm package
> txt <- system.file("texts", "txt", package = "tm")
> (twitter <- Corpus(DirSource(txt),
+ readerControl = list(language = "lat"
On Nov 1, 2009, at 8:24 AM, onyourmark wrote:
Hello. The "fields" are separated by a ';'. I think that the data is
"rectangular" in the sense that there are about 15 fields for each
row.
There either are 15 fields or there aren't. You can't make a dataframe
with an approximate number of
On 01/11/2009 7:43 AM, onyourmark wrote:
Hi. I have a huge list called twitter:
It's a list, but more importantly it's a VCorpus and a Corpus. You
should use the functions appropriate to those classes to extract the
strings making up the data, declare their encoding properly (or convert
the
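
For illustration only, here is a minimal base-R sketch of declaring or converting an encoding once the strings have been pulled out of the corpus. The vector x and the latin1 encoding are assumptions made for the example; the thread does not say what encoding the tweets are actually in.

```r
# Hypothetical strings standing in for text extracted from the corpus
x <- c("caf\xe9", "plain ascii")

# Declare the encoding the bytes are actually in (latin1 is assumed here)
Encoding(x) <- "latin1"

# Or convert explicitly to UTF-8
x_utf8 <- iconv(x, from = "latin1", to = "UTF-8")

# Check that the converted strings are valid UTF-8
all(validUTF8(x_utf8))
```

Declaring with Encoding<- only marks the strings and leaves the bytes unchanged, whereas iconv rewrites them, so which one is appropriate depends on whether the bytes are already in the encoding you want.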
Hello. The "fields" are separated by a ';'. I think that the data is
"rectangular" in the sense that there are about 15 fields for each row. Some
of the fields are empty. In the dput() display below, it seems that the rows
are delimited by ' " ' .
Any idea from this?
Here is the end of the output
Three suggestions:
-- drop the idea of using a dataframe. It's only appropriate when the
data is rectangular.
-- look at strsplit for separating at "@" characters.
-- post the output of dput() on your sample, since email is probably
not capable of rendering this data without creating distort
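
A rough sketch of the strsplit suggestion, which also checks whether every row has the same number of fields. The sample lines below are made up to resemble the ';'-separated layout described in the thread; they are not the poster's data.

```r
# Hypothetical sample lines in the same ';'-separated layout as the post
lines <- c(
  "11999;10:47:14;20;10;2009;ObamaLouverture;Some tweet text @someone",
  "12000;10:47:15;20;10;2009;AnotherUser;Short row"
)

# fixed = TRUE treats ';' as a literal string rather than a regular expression
fields <- strsplit(lines, ";", fixed = TRUE)

# Number of fields per row; the data is rectangular only if all counts agree
n_fields <- lengths(fields)
is_rectangular <- length(unique(n_fields)) == 1

# The same approach splits a string at "@" characters
at_parts <- strsplit(lines[1], "@", fixed = TRUE)[[1]]
```

One caveat: strsplit drops an empty field at the very end of a string, so rows whose last field is empty will appear one field short in n_fields.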
Hi. I have a huge list called twitter:
> dim(twitter)
NULL
> str(twitter)
List of 1
$ :Classes 'PlainTextDocument', 'TextDocument', 'character' atomic
[1:35575] 11999;10:47:14;20;10;2009;ObamaLouverture;Trails Mixed Lessons For
Governance From Campaigner-in-chief: President obama jumps campaig