On Feb 29, 2012, at 6:00 PM, Mickael R problem wrote:
Hello everybody,
I am trying to work with the tm package, but I have a problem with some
special words in French. I think this is due to the way the PDF was
converted to text, but I'm not entirely sure.
Here is an example:
findFreqTerms(tdm1,30)
[33] "<U+F0A3>" "<U+FB01>n" "<U+FB01>nancement" "<U+FB01>nancier"
"<U+FB01>nancière" "<U+FB01>nancières" "<U+FB01>nanciers" "<U+FB01>xe"
Some French words are not read correctly by tm with the readPlain
reader. I tried using reader = readPDF, but it doesn't work, so I had to
convert the PDF to text first. Some words are then not recognized, so
when I use TermDocumentMatrix a word like "inflation" disappears. This
is a big problem for me. I have spent a lot of time on it; any ideas?
Thanks for your time.
You included no information about your platform, locale settings, or
encoding of the text.
?Encoding
?sessionInfo
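A minimal sketch of how the requested information could be gathered, and of one possible cleanup step. It is not part of the original reply. The vector txt is a hypothetical stand-in for the text extracted from the PDF. U+FB01 in the output above is the single-character "fi" ligature; that "inflation" goes missing because of the corresponding "fl" ligature (U+FB02) is an assumption, not something the post confirms.

```r
# Information a reply would need: R version, platform, and locale.
print(sessionInfo())
print(Sys.getlocale("LC_CTYPE"))

# Hypothetical sample standing in for the text extracted from the PDF.
# "\uFB01" is the "fi" ligature seen in the output above;
# "\uFB02" is the "fl" ligature (assumed to affect "inflation").
txt <- c("\uFB01nancement", "in\uFB02ation", "taux \uFB01xe")

# Declared encoding of each string
# (one of "unknown", "UTF-8", "latin1" or "bytes").
print(Encoding(txt))

# Convert to UTF-8, then map the ligatures back to plain letters
# before building the corpus and the term-document matrix.
txt <- enc2utf8(txt)
txt <- gsub("\uFB01", "fi", txt, fixed = TRUE)
txt <- gsub("\uFB02", "fl", txt, fixed = TRUE)
print(txt)
```

If the cleaned strings look right, the same two gsub() calls could be applied to the real text before it is passed to tm. Whether this fixes the missing terms depends on the actual encoding of the file, which is what the questions above are meant to establish.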
--
David Winsemius, MD
West Hartford, CT
______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.