Most, maybe all, of the example words you posted include ligatures.
With "financier", for example, the leading "fi" is rendered in PDF and in
most typesetting situations as a ligature: a single complex character
representing the "fi" combination.

ﬁ (U+FB01)   ﬂ (U+FB02)

I pasted the "fi" and "fl" ligatures in this email. I hope they get through.

I don't know the package you are using; I hope it has arguments that tell
it about ligatures.
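If it doesn't, here is a minimal sketch in R, assuming the PDF text has already been extracted into a UTF-8 character vector: replace the Unicode Latin ligatures (U+FB00 to U+FB04, which includes the U+FB01 "fi" in your findFreqTerms() output) with plain letters before building the term-document matrix. The helper name expand_ligatures is hypothetical, not part of tm, and the table only covers the common ff/fi/fl/ffi/ffl forms.

```r
# Hypothetical helper (not part of tm): expand Unicode Latin ligatures
# from the Alphabetic Presentation Forms block into plain letters.
expand_ligatures <- function(x) {
  lig_from <- c("\ufb00", "\ufb01", "\ufb02", "\ufb03", "\ufb04")
  lig_to   <- c("ff",     "fi",     "fl",     "ffi",    "ffl")
  for (i in seq_along(lig_from)) {
    # fixed = TRUE: match the ligature character literally, not as a regex
    x <- gsub(lig_from[i], lig_to[i], x, fixed = TRUE)
  }
  x
}

# Example: "<U+FB01>nancier" and "in<U+FB02>ation" become readable words again
words <- c("\ufb01nancier", "in\ufb02ation")
print(expand_ligatures(words))
# [1] "financier" "inflation"
```

In tm versions that provide content_transformer(), the same function could be applied to every document with tm_map(corpus, content_transformer(expand_ligatures)) before calling TermDocumentMatrix.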

Rich



On Wed, Feb 29, 2012 at 6:49 PM, David Winsemius <dwinsem...@comcast.net> wrote:

>
> On Feb 29, 2012, at 6:00 PM, Mickael R problem wrote:
>
>> Hello everybody,
>> I am working with tm, but I have a problem with some special words in
>> French. I think this is due to the way the PDF is converted to text, but
>> I'm not completely sure.
>> Here is an example:
>>
>> findFreqTerms(tdm1, 30)
>>   [33] "<U+F0A3>"            "<U+FB01>n"           "<U+FB01>nancement"
>>        "<U+FB01>nancier"     "<U+FB01>nancière"    "<U+FB01>nancières"
>>        "<U+FB01>nanciers"    "<U+FB01>xe"
>>
>> Some French words are not read correctly by tm with the readPlain reader.
>> I tried reader = readPDF, but it doesn't work, so I had to convert the
>> PDF to text first. Some words are not recognized, so when I use
>> TermDocumentMatrix a word like "inflation" disappears. This is a big
>> problem for me and I have spent a lot of time on it. Any ideas? Thanks
>> for your time.
>>
>
> You included no information about your platform, locale settings, or
> the encoding of the text.
>
> ?Encoding
> ?sessionInfo
>
> --
>
> David Winsemius, MD
> West Hartford, CT
>
>
> ______________________________________________
> R-help@r-project.org mailing list
> https://stat.ethz.ch/mailman/listinfo/r-help
> PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
> and provide commented, minimal, self-contained, reproducible code.
>
