[ https://issues.apache.org/jira/browse/SOLR-14832?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Vlad updated SOLR-14832:
------------------------
    Description:
Hi Support,

please help to resolve an issue. I upload/index several documents in English and in Arabic to SOLR. In addition, I use the following field type for the Arabic language:

<fieldType name="text" class="solr.TextField" positionIncrementGap="50">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.ArabicStemFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ArabicNormalizationFilterFactory"/>
    <filter class="solr.ArabicStemFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

There are two environments:
# Local machine:
- SOLR version: 4.2
- Windows version: 10

# DEV env:
- SOLR version: 4.1, as part of the Cloudera suite
- Linux kernel version: 3.10.0-862

The issue appears when uploading documents:
# Local machine:
- Doc in English with English words only - ok (for example, "[www.apache.org|http://www.apache.org/]")
- Doc in Arabic with some English words - ok (for example, "[www.apache.org|http://www.apache.org/]")

# DEV env:
- Doc in English with English words only - ok (for example, "[www.apache.org|http://www.apache.org/]")
- Doc in Arabic with some English words - the English text is inverted (for example, "gro.echapa.www"), which makes keyword search impossible.

Please advise whether this is fixable, and how.

Thank you in advance!
  was:
The same description as above, except that the DEV env entry gave no SOLR version number (it listed "SOLR version:" with no value, followed by "Cloudera suit") and the closing "Thank you in advance!" was absent.
> Inversion of English and numeric characters in Arabic documents
> ---------------------------------------------------------------
>
>                 Key: SOLR-14832
>                 URL: https://issues.apache.org/jira/browse/SOLR-14832
>             Project: Solr
>          Issue Type: Bug
>      Security Level: Public (Default Security Level. Issues are Public)
>    Affects Versions: 4.1
>            Reporter: Vlad
>            Priority: Major
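The inversion described in the report can be checked outside Solr, before looking at the analyzer chain. The following is a minimal sketch, not part of Solr: it assumes the symptom is a plain character-by-character reversal of each Latin/digit run, and the function name `find_reversed_keywords` and the `LATIN_RUN` pattern are hypothetical helpers introduced only for illustration. It reports which expected keywords occur in a piece of extracted text only in reversed form.

```python
import re

# Heuristic (an assumption, not a Solr rule): a "Latin run" is a sequence of
# ASCII letters/digits that may contain common URL punctuation.
LATIN_RUN = re.compile(r"[A-Za-z0-9][A-Za-z0-9./:_-]*")


def find_reversed_keywords(text, keywords):
    """Return the keywords that occur in `text` only in character-reversed form."""
    runs = set(LATIN_RUN.findall(text))
    reversed_hits = []
    for keyword in keywords:
        # Flag a keyword only when the forward form is missing
        # and its exact reversal is present.
        if keyword not in runs and keyword[::-1] in runs:
            reversed_hits.append(keyword)
    return reversed_hits


# "\u0645\u0648\u0642\u0639" is an Arabic word used as surrounding context.
ARABIC_WORD = "\u0645\u0648\u0642\u0639"

# Text as it would look when the English run is stored in the correct order.
sample_ok = ARABIC_WORD + " www.apache.org " + ARABIC_WORD

# Text with the English run stored reversed, as described for the DEV env.
sample_inverted = ARABIC_WORD + " " + "www.apache.org"[::-1] + " " + ARABIC_WORD

print(find_reversed_keywords(sample_ok, ["www.apache.org"]))
print(find_reversed_keywords(sample_inverted, ["www.apache.org"]))
```

Running such a check on the raw text sent to Solr in each environment, and again on the stored field value, would help narrow down whether the reversal happens before indexing or inside the analysis chain. A run with trailing sentence punctuation would not match exactly, so the pattern may need adjusting for real documents.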