Re: [R] text mining

Duncan Murdoch Mon, 30 May 2011 05:30:56 -0700

On 30/05/2011 6:17 AM, rgui wrote:

Hi,


I have a problem when indexing the corpus. I used the following syntax:

>  Setwd ("c :/....")
>  Library (tm)
>  Txt = Corpus (DirSource ("."); readerControl = list (language = "frensh"))

Capitalization is important in R, so when asking a question, please cut and paste what you actually did. In this case, it doesn't matter.
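
For reference, here is a minimal sketch of how the three lines would probably need to look in order to run. It assumes the tm package is installed; corpus_dir and the sample file n1.txt are hypothetical stand-ins for the real directory and documents, and the language value is simply the one from the question with its spelling corrected (recent tm documentation describes IETF-style tags such as "fr", so the accepted value may depend on the tm version).

```r
# Hypothetical stand-in for the real corpus directory: a temporary
# folder containing one small text file.
corpus_dir <- file.path(tempdir(), "corpus_demo")
dir.create(corpus_dir, showWarnings = FALSE)
writeLines("Bonjour le monde.", file.path(corpus_dir, "n1.txt"))

# R is case-sensitive: only the lower-case function name is defined.
has_lower <- exists("library")   # the real function
has_upper <- exists("Library")   # not defined in base R

# Build the corpus only if tm is available (assumption: tm is installed).
if (requireNamespace("tm", quietly = TRUE)) {
  library(tm)
  # Arguments are separated by commas, not semicolons.
  txt <- Corpus(DirSource(corpus_dir),
                readerControl = list(language = "french"))
}
```

The differences from the typed version are the lower-case function names and the comma before readerControl.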

An error message appears:

>>>  Warning messages:
1: In readLines(y, encoding = x$Encoding) :
   incomplete final line found on './n3.txt'
2: In readLines(y, encoding = x$Encoding) :
   incomplete final line found on './n32.

Those are warnings, not errors. readLines gives those warnings when the last line of the file stops abruptly, rather than having an end of line marker. On Unix systems this usually signals a problem with the file. Windows is more tolerant, so many editors don't bother to add the final marker.
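
To illustrate, here is a small base R sketch that reproduces the warning on a throwaway file and shows two possible ways to avoid it. The file and variable names are made up for the example; the warn argument of readLines is part of base R.

```r
# Create a small file whose last line has no end-of-line marker.
path <- tempfile(fileext = ".txt")
con <- file(path, open = "wb")
writeBin(charToRaw("first line\nsecond line"), con)
close(con)

# Reading it raises a warning, but both lines are still returned.
warned <- FALSE
lines_default <- withCallingHandlers(
  readLines(path),
  warning = function(w) {
    warned <<- TRUE                  # record that a warning occurred
    invokeRestart("muffleWarning")   # and keep it off the console
  }
)

# Option 1: silence the warning for this call only.
lines_quiet <- readLines(path, warn = FALSE)

# Option 2: append the missing end-of-line marker to the file itself.
cat("\n", file = path, append = TRUE)
lines_fixed <- readLines(path)
```

In all three cases the text that is read is the same, so the corpus should be built correctly despite the warnings.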

Another question:
  how can I read different document types (.pdf, .html, ...) using the
package "tm"?

I think you need to convert them to text first (by some tool outside of R), but I might be wrong.
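
As a rough sketch of that approach for PDF files, the conversion could be driven from R by calling an external program. This assumes the pdftotext command-line tool (from Xpdf or Poppler) is installed and on the PATH; txt_name and convert_pdf are hypothetical helper names, not part of tm. HTML files would need a similar conversion step with a suitable tool.

```r
# Derive the output name, e.g. "report.pdf" -> "report.txt".
txt_name <- function(pdf_path) {
  sub("\\.pdf$", ".txt", pdf_path, ignore.case = TRUE)
}

# Convert one PDF with the external 'pdftotext' program.
# Returns the path of the text file, or NA if the tool is not
# installed or the conversion fails.
convert_pdf <- function(pdf_path) {
  tool <- Sys.which("pdftotext")
  if (!nzchar(tool)) {
    return(NA_character_)
  }
  out <- txt_name(pdf_path)
  status <- system2(tool, args = c(shQuote(pdf_path), shQuote(out)))
  if (isTRUE(status == 0)) out else NA_character_
}

# Usage with a hypothetical directory "pdf_dir":
# pdfs <- list.files("pdf_dir", pattern = "\\.pdf$",
#                    full.names = TRUE, ignore.case = TRUE)
# txts <- vapply(pdfs, convert_pdf, character(1))
```

The resulting .txt files can then be read with DirSource in the same way as the other documents.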


Duncan Murdoch

Thanks very much for your help.



--
View this message in context: 
http://r.789695.n4.nabble.com/text-mining-tp3560367p3560367.html
Sent from the R help mailing list archive at Nabble.com.

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.

