Hi all, 

 

I tried to create a DocumentTermMatrix with the tm package, but I get this
error:

 

Error in tolower(txt) : 

  invalid input 'PROD Z LAHKO GNETNO MELJNO GLINO, ... in 'utf8towcs'

 

I followed the approach shown in

http://www.r-project.org/doc/Rnews/Rnews_2008-2.pdf (An Introduction to Text
Mining),

with this R code:

 

library(tm)

setwd("C:/Users/mpavlic/Desktop/temp")

# Build a corpus from every file in the working directory
tekst <- Corpus(DirSource("."))
# Warning message:
# In readLines(y, encoding = x$Encoding) :
#   incomplete final line found on './test.txt'

meta(tekst, "Heading", "local") <- c("test")
meta(tekst[[1]])
# Available meta data pairs are:
#   Author       :
#   DateTimeStamp: 2011-05-21 11:25:21
#   Description  :
#   Heading      : test
#   ID           : test.txt
#   Language     : en
#   Origin       :

test <- TermDocumentMatrix(tekst)
# Error in tolower(txt) :
#   invalid input 'PROD Z LAHKO GNETNO MELJNO GLINO, ... in 'utf8towcs'
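
My guess, which I have not confirmed, is that test.txt is not stored as UTF-8 but in a single-byte encoding such as Windows-1250, so the text is declared as UTF-8 while its bytes are not valid UTF-8. The sketch below uses base R only and rests on that assumption: the byte values are what the word "GRUŠČ" would look like in a Windows-1250 file, and the variable names are made up for illustration.

```r
# Bytes of "GRUŠČ" as they would be stored in a Windows-1250 file (assumption)
raw_bytes <- as.raw(c(0x47, 0x52, 0x55, 0x8A, 0xC8))

# Declare the string as UTF-8 even though the bytes are not valid UTF-8,
# which is roughly what reading the file with encoding = "UTF-8" would do
x <- rawToChar(raw_bytes)
Encoding(x) <- "UTF-8"
validUTF8(x)   # FALSE: 0x8A and 0xC8 do not form valid UTF-8 sequences here

# tolower() may fail on such input; capture the message instead of stopping
res <- tryCatch(tolower(x), error = function(e) conditionMessage(e))
print(res)

# Convert from the assumed source encoding to UTF-8 before any processing
fixed <- iconv(rawToChar(raw_bytes), from = "CP1250", to = "UTF-8")
validUTF8(fixed)   # TRUE
```

If the assumption holds, converting the text (or telling the reader the real encoding of the file) before building the matrix should avoid the error, but I do not know whether that is the intended way to do it in tm.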

 

 

Attached is a small sample (test.txt) that I worked with.

 

Any help would be appreciated,

m

 

 

PROD Z LAHKO GNETNO MELJNO GLINO, RJAV PROD IN LAHKO GNETNA MELJNA GLINA, GRUŠČ 
PEŠČENJAKA, RJAV SLABO VEZAN KONGLOMERAT, POROZEN, SIVORJAV PROD IN GRUŠČ  Z 
LAHKO GNETNO PEŠČENO,  MELJNO GLINO, RJAVA PROD Z LAHKO GNETNO PEŠČENO GLINO, 
RJAVA PROD DO r = 60 mm, PEŠČEN IN MELJAST, Z LAHKO GNETNO RJAVO GLINO GLINAST 
PROD DO r = 60 mm, PEŠČEN IN MELJAST, SREDNJE GOST, RJAV MALO PREPEREL MELJAST 
PROD DO r = 80. mm, S PRODNIKI DO r = 120 mm, PEŠČEN, VLAZEN, S PLASTMI SLABO 
VEZANEGA KONGLOMERATA, RAHEL DO SREDNJE GOST, RJAVOSIV PROD IN LARKO GNETNA 
PEŠČENA GLINA, RJAVA GLINAST PROD, RJAV ZELO PREPEREL MELJAST PROD Z LAHKO DO 
SREDNJE GNETNO MELJNO PEŠČENO GLINO, ZELO VLAŽEN, RJAV PREPEREL PROD Z RJAVO 
MELJNO GLINO PREPEREL GLINAST PROD DO r = 70mm, VLAŽEN  SREDEDNJE GOST