+1

I was hoping to use this as a case for arguing for turning off an overly aggressive stemmer, but I checked your 10 docs and query, and David is right, of course -- if you change the default operator to AND, you get back only the one document you intended.
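In case it helps, here is a minimal sketch of setting the operator per request with the standard query parser's q.op parameter rather than changing the schema. The host, port, and core name ("mycore") are placeholders I made up, and build_select_url is just a hypothetical helper; only the q, q.op, wt, and indent parameters are real Solr request parameters.

```python
from urllib.parse import urlencode

# Hypothetical host and core name; adjust for your own installation.
SOLR_SELECT_URL = "http://localhost:8983/solr/mycore/select"


def build_select_url(query, default_operator="AND", base_url=SOLR_SELECT_URL):
    """Build a /select URL that sets the default operator for this request."""
    params = {
        "q": query,
        "q.op": default_operator,  # AND: every term in the query must match
        "wt": "json",
        "indent": "true",
    }
    # urlencode percent-encodes the Arabic text as UTF-8.
    return base_url + "?" + urlencode(params)


url = build_select_url("bizNameAr:(شرطة ازكي)")
print(url)
```

Passing the operator on the request keeps the OR default for any other client of the same core, which may or may not be what you want.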
I can still use this as a case for getting on my Unicode normalization soapbox and +1'ing your use of the ICUFoldingFilter. With no token filters, you get 4 results; when you add the ICUFoldingFilter, you get 8 results; and when you add in the Arabic stemmer, you get all 10. Not that you need this, but see slide 33 of [1], where we show 78 Unicode variants for "America" in ~800k docs in an Arabic script language. Without Unicode normalization, users might get 1/2 the documents back or far, far fewer...and they wouldn't even know what they were missing!

[1] https://github.com/tballison/share/blob/master/slides/TextProcessingAndAdvancedSearch_tallison_MITRE_201510_final_abbrev.pdf

-----Original Message-----
From: David Hastings [mailto:hastings.recurs...@gmail.com]
Sent: Wednesday, August 2, 2017 9:00 AM
To: solr-user@lucene.apache.org
Subject: Re: Arabic words search in solr

perhaps change your default operator to AND instead of OR if thats what you are expecting for a result

On Wed, Aug 2, 2017 at 8:57 AM, mohanmca01 <mohanmc...@gmail.com> wrote:

> Hi Phil Scadden,
>
> Thank you for your reply,
>
> we tried your suggested solution by removing hyphen while indexing,
> but it was getting wrong results. i was searching for "شرطة ازكي" and
> it was showing me the result that am looking for, plus irrelevant
> result which either have the first or second word that i have typed while
> searching.
>
> First word: شرطة
> Second Word: ازكي
>
> results that we are getting:
>
> {
>   "responseHeader": {
>     "status": 0,
>     "QTime": 3,
>     "params": {
>       "indent": "true",
>       "q": "bizNameAr:(شرطة ازكي)",
>       "_": "1501678260335",
>       "wt": "json"
>     }
>   },
>   "response": {
>     "numFound": 444,
>     "start": 0,
>     "docs": [
>       {
>         "id": "28107",
>         "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية - - مركز شرطة إزكي",
>         "_version_": 1574621132849414100
>       },
>       {
>         "id": "13937",
>         "bizNameAr": "مؤسسةا الازكي للتجارة والمقاولات",
>         "_version_": 1574621132197200000
>       },
>       {
>         "id": "15914",
>         "bizNameAr": "العلوي والازكي المتحدة ش.م.م",
>         "_version_": 1574621132344000500
>       },
>       {
>         "id": "20639",
>         "bizNameAr": "سحائب ازكي للتجارة",
>         "_version_": 1574621132574687200
>       },
>       {
>         "id": "25108",
>         "bizNameAr": "المستشفيات - - مستشفى إزكي",
>         "_version_": 1574621132737216500
>       },
>       {
>         "id": "27629",
>         "bizNameAr": "وزارة الداخلية - - - والي إزكي -",
>         "_version_": 1574621132833685500
>       },
>       {
>         "id": "36351",
>         "bizNameAr": "طوارئ الكهرباء - إزكي",
>         "_version_": 1574621133183910000
>       },
>       {
>         "id": "61235",
>         "bizNameAr": "اضواء ازكي للتجارة",
>         "_version_": 1574621133785792500
>       },
>       {
>         "id": "66821",
>         "bizNameAr": "أطلال إزكي للتجارة",
>         "_version_": 1574621133915816000
>       },
>       {
>         "id": "67011",
>         "bizNameAr": "بنك ظفار - فرع ازكي",
>         "_version_": 1574621133920010200
>       }
>     ]
>   }
> }
>
> Actually we expecting the below results only since it has both the
> words that we typed while searching:
>
> {
>   "id": "28107",
>   "bizNameAr": "شرطة عمان السلطانية - قيادة شرطة محافظة الداخلية - - مركز شرطة إزكي",
>   "_version_": 1574621132849414100
> },
>
> Configuration:
>
> In schema.xml we configured as below:
>
> <field name="bizNameAr" type="text_ar" indexed="true" stored="true"/>
>
> <fieldType name="text_ar" class="solr.TextField" positionIncrementGap="100">
>   <analyzer>
>     <tokenizer class="solr.StandardTokenizerFactory"/>
>     <filter class="solr.StopFilterFactory" ignoreCase="true" words="lang/stopwords_ar.txt" />
>     <filter class="solr.ArabicNormalizationFilterFactory"/>
>     <filter class="solr.ArabicStemFilterFactory"/>
>     <filter class="solr.ICUFoldingFilterFactory"/>
>     <filter class="solr.HyphenatedWordsFilterFactory"/>
>     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="ى" replacement="ئ"/>
>     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="ء" replacement=""/>
>   </analyzer>
> </fieldType>
>
> Thanks,
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Arabic-words-search-in-solr-tp4317733p4348774.html
> Sent from the Solr - User mailing list archive at Nabble.com.
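
P.S. To make the Unicode normalization point concrete, here is a rough sketch using only Python's standard unicodedata module. It is a stand-in I am assuming for illustration, not the ICUFoldingFilter itself (which does considerably more); fold is a hypothetical helper. It shows why the query term with a bare alef does not match the indexed term with alef-with-hamza-below until both are folded, which appears to be consistent with the jump from 4 to 8 results.

```python
import unicodedata


def fold(text):
    """Approximate Unicode folding: decompose, then drop combining marks."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


# The second query word as it appears in doc 28107: starts with U+0625
# (ARABIC LETTER ALEF WITH HAMZA BELOW).
indexed = "\u0625\u0632\u0643\u064a"  # إزكي
# The same word as typed in the query: starts with U+0627 (bare ALEF).
queried = "\u0627\u0632\u0643\u064a"  # ازكي

print("raw strings equal:   ", indexed == queried)
print("folded strings equal:", fold(indexed) == fold(queried))
```

U+0625 decomposes to U+0627 plus the combining hamza below (U+0655), so dropping combining marks leaves the bare alef in both forms.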