Vlad created SOLR-14832:
---------------------------

             Summary: Inversion of English and numeric characters in Arabic 
documents
                 Key: SOLR-14832
                 URL: https://issues.apache.org/jira/browse/SOLR-14832
             Project: Solr
          Issue Type: Bug
      Security Level: Public (Default Security Level. Issues are Public)
    Affects Versions: 4.1
            Reporter: Vlad


Hi Support,

Please help me resolve an issue. I upload and index several documents in 
English and Arabic to Solr, using the following field type with Arabic 
analysis filters:

  <fieldType name="text" class="solr.TextField" positionIncrementGap="50">
    <analyzer type="index">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      <filter class="solr.ArabicNormalizationFilterFactory"/>
      <filter class="solr.ArabicStemFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
      <filter class="solr.ArabicNormalizationFilterFactory"/>
      <filter class="solr.ArabicStemFilterFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>
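
One way to check whether the analyzer chain above is what reverses the text is to run a sample value through Solr's field analysis handler and inspect the tokens after each filter. The sketch below only builds the request URL. It assumes the handler is registered at `/analysis/field` (as in the stock example solrconfig.xml); the base URL and the core name `collection1` are placeholders, not values from this report.

```python
from urllib.parse import urlencode


def field_analysis_url(base_url, core, field_type, sample):
    """Build a URL for Solr's field analysis handler.

    Assumption: the handler is registered at /analysis/field.
    base_url and core are placeholders for the real deployment.
    """
    params = urlencode({
        "analysis.fieldtype": field_type,   # fieldType name from the schema
        "analysis.fieldvalue": sample,      # text to run through the index analyzer
        "wt": "json",
    })
    return f"{base_url}/{core}/analysis/field?{params}"


# Hypothetical host and core name; "text" is the fieldType defined above.
url = field_analysis_url("http://localhost:8983/solr", "collection1",
                         "text", "www.apache.org")
print(url)
```

If the tokens returned for the sample are not reversed, the inversion is probably introduced before analysis, for example when the document text is extracted.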

 

There are two environments:

 # Local machine:
                - Solr version: 4.2
                - Windows version: 10

 # DEV env:
                - Solr version: 
                - Cloudera suite
                - Linux kernel version: 3.10.0-862

 

The issue appears when uploading documents:

 # Local machine:
                - Doc in English with English words only - ok (for example, 
"www.apache.org")
                - Doc in Arabic with some English words - ok (for example, 
"www.apache.org")

 # DEV env:
                - Doc in English with English words only - ok (for example, 
"www.apache.org")
                - Doc in Arabic with some English words - English text is 
inverted (for example, "gro.echapa.www"), which makes keyword search 
impossible.
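
To confirm that the stored text is a plain character-by-character reversal of the Latin and numeric runs, the original text can be compared with the indexed value. This is a minimal diagnostic sketch, not a fix: the function names and the sample strings are hypothetical, and only ASCII letters, digits and a few URL punctuation characters are treated as part of a run.

```python
import re

# A run of ASCII letters/digits, optionally joined by URL-style punctuation.
LATIN_RUN = re.compile(r"[A-Za-z0-9][A-Za-z0-9.:/_-]*[A-Za-z0-9]|[A-Za-z0-9]")


def latin_tokens(text):
    """Return the Latin/numeric runs found in text, ignoring Arabic characters."""
    return LATIN_RUN.findall(text)


def looks_inverted(source_text, indexed_text):
    """True if every non-palindromic Latin run of source_text appears
    reversed in indexed_text."""
    expected = {tok[::-1] for tok in latin_tokens(source_text)}
    # Palindromes and single characters read the same both ways, so skip them.
    informative = {tok for tok in expected if tok != tok[::-1]}
    found = set(latin_tokens(indexed_text))
    return bool(informative) and informative <= found


source = "Solr 4.2"      # hypothetical fragment of the original document
indexed = "2.4 rloS"     # hypothetical value read back from the index
print(looks_inverted(source, indexed))
print(looks_inverted(source, source))
```

Running this against a few documents from each environment would show whether only the DEV index contains reversed runs.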

 

Please advise whether this is fixable and, if so, how.



