"Try this before running removePunctuation():
corpus <- tm_map(corpus, function(x) gsub("[\'\U2019«»]", " ", x))"
It will replace quotation marks with a space, and that's enough to
separate them from the rest of the word.
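To see what that substitution does outside of tm, here is a minimal base R sketch. The sample strings and the helper name separate_quotes() are made up for illustration, and the characters are written as Unicode escapes so the example does not depend on the file encoding.

```r
# Hypothetical sample text: apostrophes and guillemets glue words together.
x <- c("l\u2019analyse", "d'une \u00ABcrise\u00BB")

# Replace straight and curly apostrophes and French guillemets with a space,
# so that the words on either side become separate tokens.
separate_quotes <- function(x) gsub("['\u2019\u00AB\u00BB]", " ", x)

cleaned <- separate_quotes(x)
print(cleaned)  # first element becomes "l analyse"
```

With tm, the same function would be passed to tm_map(). Recent tm versions (an assumption about your installation) expect a plain character function to be wrapped first, as in tm_map(corpus, content_transformer(separate_quotes)), which may be why a bare gsub() call appears to work on character vectors but not on a Corpus.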
I tried your solution. It works only on character vectors, not on a Corpus,
On Saturday 3 March 2012 at 16:56 -0800, Mickael R problem wrote:
> Hello everybody,
> I'm not giving up the fight, but it's hard. I have found a solution for the
> ligatures: a better converter, which translates PDF to plain text more
> precisely. But a new problem has occurred. Particularly in French,
Hello everybody,
I'm not giving up the fight, but it's hard. I have found a solution for the
ligatures: a better converter, which translates PDF to plain text more
precisely. But a new problem has occurred. Particularly in French, but it should be
the case in English too, I have a big problem ' " bracke
On Thursday 1 March 2012 at 07:07 -0800, Mickael R problem wrote:
> Hi Richard,
> clearly there is a problem with Latin ligatures, because the result of my
> findFreqTerms call gives me some words > "n"
>
> "nancement"
> >> "nancier" "nancière""nancières"
>
Hi Richard,
clearly there is a problem with Latin ligatures, because the result of my
findFreqTerms call gives me some words > "n"
"nancement"
>> "nancier" "nancière""nancières"
>> "nanciers""xe"
where U+FB01 is the code point for a Latin ligature. The problem
My computer runs Windows Vista 64-bit SP2. As for the question about encoding, I
don't understand it, sorry.
--
View this message in context:
http://r.789695.n4.nabble.com/TM-reader-with-text-tp4433394p4433526.html
Sent from the R help mailing list archive at Nabble.com.
Most, maybe all, of the example words you posted include ligatures.
With "financier", for example, the leading "fi" is rendered in PDF and in
most typesetting situations as a ligature, with a single complex character
representing the "fi" combination.
ﬁ ﬂ
I pasted the "fi" and "fl" ligature
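One way to deal with this, sketched here in base R only, is to expand the ligature characters back into plain letters before building the term-document matrix. The helper name expand_ligatures() is hypothetical, and the list covers only the five common Latin f-ligatures (U+FB00 to U+FB04); other characters in the converted text may need similar handling.

```r
# Latin f-ligature code points and their plain-letter equivalents.
ligatures    <- c("\uFB00", "\uFB01", "\uFB02", "\uFB03", "\uFB04")
replacements <- c("ff",     "fi",     "fl",     "ffi",    "ffl")

# Hypothetical helper: replace each ligature character with its letters.
expand_ligatures <- function(x) {
  for (i in seq_along(ligatures)) {
    x <- gsub(ligatures[i], replacements[i], x, fixed = TRUE)
  }
  x
}

# A word as it might come out of a PDF converter, starting with U+FB01.
word <- "\uFB01nancier"
fixed_word <- expand_ligatures(word)
print(fixed_word)  # "financier"

# Dropping the ligature instead reproduces the truncated terms reported above.
print(gsub("\uFB01", "", word, fixed = TRUE))  # "nancier"
```

The function works on character vectors, so it would need to be applied to the text of each document before or during the tm preprocessing steps.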
On Feb 29, 2012, at 6:00 PM, Mickael R problem wrote:
Hello everybody,
I am trying to work with tm, but I have a problem with some special words in
French. I think this is due to the way the PDF was converted to text, but I'm
not perfectly sure.
Here is an example:
findFreqTerms(tdm1,30)