Thanks Erick, it's working now as expected.
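
For reference, a minimal sketch of the query analyzer with the change suggested in the quoted reply. It assumes the rest of the string_ci field type stays exactly as originally posted; the only edit is preserveOriginal="false" on the query-side ASCIIFoldingFilterFactory, while the index analyzer keeps preserveOriginal="true".

```xml
<!-- Query-side analyzer only; the index-side analyzer is assumed unchanged. -->
<analyzer type="query">
  <charFilter class="solr.HTMLStripCharFilterFactory"/>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <!-- Emit only the folded token at query time, so a wildcard term such as
       MöllerGruppen* analyzes to a single term instead of two. -->
  <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="false"/>
  <filter class="solr.TrimFilterFactory"/>
  <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" splitOnCaseChange="0" catenateWords="1"
          splitOnNumerics="0" stemEnglishPossessive="0" preserveOriginal="1"/>
</analyzer>
```

Since only the query chain changes, reloading the core should be enough to pick this up; the indexed terms already contain both the folded and the original forms.
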
Thanks and Regards,
Preeti Bhat

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Wednesday, June 28, 2017 9:20 PM
To: solr-user
Subject: Re: Using asterik(*) with unicode characters.

There's a long blog on wildcards here:
https://lucidworks.com/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/

The gist is that when you are analyzing a token, if the analysis chain
splits a token into more than one part then wildcards are impossible
to get right. So any "MultiTermAware" filter will barf if you ask it
to emit more than one token when doing wildcard searches. Filters that
are _not_ "MultiTermAware" are just skipped in the query analysis chain.

That leaves the question of why your query chain seems to emit two
tokens for MöllerGruppen but not MollerGruppen. I think it's because
you have preserveOriginal set to true in the query analysis chain
here:

<filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>

So this entry emits both MöllerGruppen and MollerGruppen for the input
MöllerGruppen, but not for MollerGruppen, since MollerGruppen doesn't
need any folding. This violates the constraint imposed by
ASCIIFoldingFilterFactory being "MultiTermAware", which means that if it
emits two tokens it barfs.

You do not need to set preserveOriginal="true" in your _query_ chain,
since your indexing chain puts both the folded and un-folded versions
in the index at the same position. So I think if you set
preserveOriginal to false (again, in the _query_ analysis chain; leave
it true in the index analysis chain) you'll be OK. Your queries will
also be somewhat faster.

Best,
Erick

On Wed, Jun 28, 2017 at 6:25 AM, Preeti Bhat <preeti.b...@shoregrp.com> wrote:
> Hi All,
>
> I have a requirement where the user can give a Unicode or ASCII character as
> input but expects the same result.
>
> For example: MöllerGruppen AS vs MollerGruppen AS should give the same result.
>
> I am able to get this done using
> <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>,
> but for some reason when I try MöllerGruppen* I get the message below.
>
> ""metadata":[
>     "error-class","org.apache.solr.common.SolrException",
>     "root-error-class","org.apache.solr.common.SolrException"],
>   "msg":"analyzer returned too many terms for multiTerm term: MöllerGruppen",
>   "code":400}}
> "
>
> It works for MollerGruppen* though.
>
> Could someone please advise on this?
>
> Below is the fieldType of this field.
>
> <fieldType name="string_ci" class="solr.TextField">
>   <analyzer type="index">
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
>     <filter class="solr.TrimFilterFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" splitOnCaseChange="0" catenateWords="1"
>             splitOnNumerics="0" stemEnglishPossessive="0" preserveOriginal="1"/>
>   </analyzer>
>   <analyzer type="query">
>     <charFilter class="solr.HTMLStripCharFilterFactory"/>
>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>     <filter class="solr.LowerCaseFilterFactory"/>
>     <filter class="solr.ASCIIFoldingFilterFactory" preserveOriginal="true"/>
>     <filter class="solr.TrimFilterFactory"/>
>     <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
>     <filter class="solr.WordDelimiterFilterFactory"
>             generateWordParts="1" splitOnCaseChange="0" catenateWords="1"
>             splitOnNumerics="0" stemEnglishPossessive="0" preserveOriginal="1"/>
>   </analyzer>
> </fieldType>
>
> Thanks and Regards,
> Preeti
>
> NOTICE TO RECIPIENTS: This communication may contain confidential and/or
> privileged information.
> If you are not the intended recipient (or have
> received this communication in error) please notify the sender and
> it-supp...@shoregrp.com immediately, and destroy this communication. Any
> unauthorized copying, disclosure or distribution of the material in this
> communication is strictly forbidden. Any views or opinions presented in this
> email are solely those of the author and do not necessarily represent those
> of the company. Finally, the recipient should check this email and any
> attachments for the presence of viruses. The company accepts no liability for
> any damage caused by any virus transmitted by this email.