Re: Inverse English and digits in Arabic Text

Alexandre Rafalovitch Tue, 08 Sep 2020 06:24:06 -0700

If you are uploading a PDF, then you must be doing it via Tika or via
an extract handler (which uses Tika under the covers).

Try getting a standalone Tika of the same version and see what it
outputs. Perhaps something in those specific PDF pages confuses Tika.
For example, if a different font was used for the English text, Adobe
may have encoded each letter individually and so broken the text
flow. PDF is a presentation format, not a content format. These
things happen.
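
To make that check concrete, here is a rough Python sketch, not a
definitive recipe. It assumes Java is installed and that you have
downloaded a tika-app jar matching the Tika version bundled with your
Solr; the jar and PDF paths are placeholders, and the function names are
made up for illustration. It runs Tika's plain-text output mode and then
groups the extracted characters into direction runs, so you can see
whether the English letters and digits are already reversed before Solr
ever sees them.

```python
import subprocess
import unicodedata


def tika_text_command(tika_jar, pdf_path):
    """Build the tika-app command line for plain-text output in UTF-8."""
    return ["java", "-jar", tika_jar, "--encoding=UTF-8", "--text", pdf_path]


def extract_text(tika_jar, pdf_path):
    """Run standalone Tika and return the extracted text.

    Needs Java and the tika-app jar on disk; both paths are placeholders.
    """
    result = subprocess.run(
        tika_text_command(tika_jar, pdf_path),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8", errors="replace")


def direction_runs(text):
    """Group consecutive characters by Unicode bidirectional class.

    Returns a list of (kind, substring) tuples in logical (stored) order,
    where kind is "RTL", "LTR", "DIGIT" or "OTHER".
    """
    runs = []
    for ch in text:
        bidi = unicodedata.bidirectional(ch)
        if bidi in ("R", "AL"):
            kind = "RTL"      # Hebrew / Arabic letters
        elif bidi == "L":
            kind = "LTR"      # Latin letters and other strong LTR characters
        elif bidi in ("EN", "AN"):
            kind = "DIGIT"    # European or Arabic-Indic digits
        else:
            kind = "OTHER"    # spaces, punctuation, controls
        if runs and runs[-1][0] == kind:
            runs[-1] = (kind, runs[-1][1] + ch)
        else:
            runs.append((kind, ch))
    return runs
```

Calling direction_runs(extract_text("tika-app.jar", "page.pdf")) and
reading the "LTR" and "DIGIT" runs shows the characters in stored order.
If an English word comes out spelled backwards there, the reversal
happened in extraction, not in Solr's analysis chain.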

Regards,
   Alex

On Tue, 8 Sep 2020 at 09:11, <ad...@ukr.net> wrote:
>
>
> Thank you for the support.
>
> I upload the PDF file page by page. In this case, left-to-right (LTR) or
> right-to-left (RTL) reading applies to the whole document, not to each
> specific text block (separately for Arabic and for English).
>
> I see the same behavior in the output of /select as well as /browse
> calls.
>
> I am almost sure the problem occurs during upload.
> <filter class="solr.ASCIIFoldingFilterFactory"/>
>
> But adding this to the
>   <analyzer type="index"> and later to the other analyzer does not change the
> result.
>
>
