Hi all,
I tried to create a DocumentTermMatrix with the tm package, but I get this
error:
Error in tolower(txt) :
invalid input 'PROD Z LAHKO GNETNO MELJNO GLINO, ... in 'utf8towcs'
I followed the approach shown in
http://www.r-project.org/doc/Rnews/Rnews_2008-2.pdf (An Introduction to Text
Mining),
using this R code:
setwd("C:/Users/mpavlic/Desktop/temp")
tekst <- Corpus(DirSource("."))
>Warning message:
>In readLines(y, encoding = x$Encoding) :
>incomplete final line found on './test.txt'
meta(tekst, "Heading", "local") <- c("test")
meta(tekst[[1]])
>Available meta data pairs are:
>Author :
>DateTimeStamp: 2011-05-21 11:25:21
>Description :
>Heading : test
>ID : test.txt
>Language : en
>Origin :
test <- TermDocumentMatrix(tekst)
> Error in tolower(txt) :
> invalid input 'PROD Z LAHKO GNETNO MELJNO GLINO, ... in 'utf8towcs'
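One possible explanation, offered as a guess rather than a confirmed diagnosis: the message suggests tolower() is being handed bytes that are not valid UTF-8. The sketch below uses base R only (no tm) and a made-up six-byte string, on the assumption that test.txt was saved in Windows-1250 (CP1250), where byte 0xC8 is the letter "Č". It checks the string with validUTF8() (available in R 3.3.0 and later) and converts it with iconv() before lowercasing. The variable names are illustrative, and iconv() support for "CP1250" can depend on the platform.

```r
# Hypothetical sample: the word "PEČENO" as Windows-1250 bytes.
# Assumption: the real file uses CP1250, where 0xC8 encodes "Č".
raw_bytes <- as.raw(c(0x50, 0x45, 0xC8, 0x45, 0x4E, 0x4F))
txt <- rawToChar(raw_bytes)

# A lone 0xC8 byte followed by an ASCII letter is not valid UTF-8,
# which is the kind of input tolower() can reject in a UTF-8 session.
is_valid_before <- validUTF8(txt)

# Convert from the assumed source encoding to UTF-8 before lowercasing.
fixed <- iconv(txt, from = "CP1250", to = "UTF-8")
is_valid_after <- validUTF8(fixed)

# Lowercasing the converted string should no longer raise the error.
lowered <- tolower(fixed)
print(c(before = is_valid_before, after = is_valid_after))
```

If the assumption about the encoding holds, the same conversion would need to be applied to the file contents before the corpus is built.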
Attached is a small sample (test.txt) that I worked with.
Any help would be appreciated,
m
PROD Z LAHKO GNETNO MELJNO GLINO, RJAV PROD IN LAHKO GNETNA MELJNA GLINA, GRUÈ
PEÈENJAKA, RJAV SLABO VEZAN KONGLOMERAT, POROZEN, SIVORJAV PROD IN GRUÈ Z
LAHKO GNETNO PEÈENO, MELJNO GLINO, RJAVA PROD Z LAHKO GNETNO PEÈENO GLINO,
RJAVA PROD DO r = 60 mm, PEÈEN IN MELJAST, Z LAHKO GNETNO RJAVO GLINO GLINAST
PROD DO r = 60 mm, PEÈEN IN MELJAST, SREDNJE GOST, RJAV MALO PREPEREL MELJAST
PROD DO r = 80. mm, S PRODNIKI DO r = 120 mm, PEÈEN, VLAZEN, S PLASTMI SLABO
VEZANEGA KONGLOMERATA, RAHEL DO SREDNJE GOST, RJAVOSIV PROD IN LARKO GNETNA
PEÈENA GLINA, RJAVA GLINAST PROD, RJAV ZELO PREPEREL MELJAST PROD Z LAHKO DO
SREDNJE GNETNO MELJNO PEÈENO GLINO, ZELO VLAEN, RJAV PREPEREL PROD Z RJAVO
MELJNO GLINO PREPEREL GLINAST PROD DO r = 70mm, VLAEN SREDEDNJE GOST
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help