Hi Emir,

The solution we want to implement is to show the top 100 best-matching technology names from our list of technology names. Whatever technology name the user types will first go to SQL Server, where an exact match (name == name) is attempted if possible; only the names that do not match exactly (spelling mistakes, jumbled words) will be searched in Solr.
With the setup below, if I query title:(Microsoft Ofice 365) I get the following results (note: why are the scores identical?):

{ "title":"Lync - Microsoft Office 365", "score":7.7472024 },
{ "title":"Microsoft Office 365", "score":7.7472024 }

When I query title:(Microsoft Ofice 365) OR title_ws:(Microsoft Ofice 365) I get:

{ "title":"RIM BlackBerry Enterprise Server (BES) for Microsoft Office 365 1.0", "title_ws":"RIM BlackBerry Enterprise Server (BES) for Microsoft Office 365 1.0", "score":3.9297152 },
{ "title":"Microsoft Office 365", "title_ws":"Microsoft Office 365", "score":3.1437721 }

When I query title:(Microsoft Ofice 365) OR title_ws:(Microsoft Ofice 365) qf=title_ws^1 I don't get any results.

The expected result is:

{ "title":"Microsoft Office 365", "title_ws":"Microsoft Office 365" },
{ "title":"Microsoft Office 365 1.0", "title_ws":"Microsoft Office 365 1.0" },
{ "title":"Microsoft Office 365 14.0", "title_ws":"Microsoft Office 365 14.0" },
{ "title":"Microsoft Office 365 14.3", "title_ws":"Microsoft Office 365 14.3" },
{ "title":"Microsoft Office 365 14.4", "title_ws":"Microsoft Office 365 14.4" }

Schema:

<fieldType name="txt_token_ng" class="solr.TextField" positionIncrementGap="0" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2"/>
  </analyzer>
</fieldType>

<field name="id" type="int" indexed="true" stored="true" required="true" multiValued="false" />
<field name="title" type="txt_token_ng" indexed="true" stored="true" multiValued="false"/>
<field name="manufacturername" type="txt_token_ng" indexed="true" stored="true" multiValued="false"/>
<field name="productname" type="txt_token_ng" indexed="true" stored="true" multiValued="false"/>
<field name="version" type="txt_token_ng" indexed="true" stored="true" multiValued="false"/>

<fieldType name="txt_token_ws" class="solr.TextField" positionIncrementGap="0" omitNorms="false">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="title_ws" type="txt_token_ws" indexed="true" stored="true" multiValued="false"/>
<field name="manufacturername_ws" type="txt_token_ws" indexed="true" stored="true" multiValued="false"/>
<field name="productname_ws" type="txt_token_ws" indexed="true" stored="true" multiValued="false"/>
<field name="version_ws" type="txt_token_ws" indexed="true" stored="true" multiValued="false"/>

<copyField source="title" dest="title_ws"/>
<copyField source="manufacturername" dest="manufacturername_ws"/>
<copyField source="productname" dest="productname_ws"/>
<copyField source="version" dest="version_ws"/>

Thanks
Rajesh

Corporate Executive Board India Private Limited. Registration No: U741040HR2004PTC035324. Registered office: 6th Floor, Tower B, DLF Building No.10 DLF Cyber City, Gurgaon, Haryana-122002, India.

This e-mail and/or its attachments are intended only for the use of the addressee(s) and may contain confidential and legally privileged information belonging to CEB and/or its subsidiaries, including CEB subsidiaries that offer SHL Talent Measurement products and services. If you have received this e-mail in error, please notify the sender and immediately, destroy all copies of this email and its attachments. The publication, copying, in whole or in part, or use or dissemination in any other way of this e-mail and attachments by anyone other than the intended person(s) is prohibited.
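On the qf attempt: qf is a request parameter read by the dismax and edismax query parsers; it has to be sent as its own parameter together with defType=edismax rather than written inside q. A minimal solrconfig.xml sketch, assuming Solr 5.x and a hypothetical handler name /techsearch, that lets edismax search both the 2-gram title field and the whitespace-tokenized title_ws copy while weighting whole-word matches higher. The boost values are illustrative starting points to tune, not tested numbers.

```xml
<!-- solrconfig.xml: sketch only; "/techsearch" is a hypothetical handler name -->
<requestHandler name="/techsearch" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- parse q with the extended DisMax parser so qf and pf are applied -->
    <str name="defType">edismax</str>
    <!-- search both fields; whole-word matches in title_ws weigh more than 2-gram matches in title -->
    <str name="qf">title_ws^10 title</str>
    <!-- extra boost when the query words appear close together as a phrase in title_ws -->
    <str name="pf">title_ws^20</str>
    <!-- top 100 candidates, as in the use case described above -->
    <str name="rows">100</str>
    <str name="fl">title,title_ws,score</str>
  </lst>
</requestHandler>
```

With a request such as /techsearch?q=Microsoft Ofice 365, a misspelled word can still match through the 2-gram title field, while correctly spelled words also match title_ws and pick up its higher weight. mm (minimum should match) is left at its default here; check that default for your Solr version and set it explicitly if too many or too few documents come back.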
-----Original Message-----
From: Emir Arnautovic [mailto:emir.arnauto...@sematext.com]
Sent: Monday, March 7, 2016 8:16 PM
To: solr-user@lucene.apache.org
Subject: Re: Text search NGram

Not sure I understood the question. What I meant is that you should try setting omitNorms="false" on your txt_token field type if you want to stick with an ngram-only solution:

<fieldType name="txt_token" class="solr.TextField" positionIncrementGap="0" omitNorms="false">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s+" replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="800"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s+" replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="800"/>
  </analyzer>
</fieldType>

and to add a new field type and field to keep a non-ngram version of the field. Something like:

<fieldType name="txt_token_simple" class="solr.TextField" positionIncrementGap="0">
  <analyzer type="index">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s+" replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s+" replacement=" "/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

and use copyField to copy to both fields, then query title:test OR title_simple:test.
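The suggestion above defines the txt_token_simple type but not the field that uses it. A minimal schema.xml sketch of the remaining pieces, assuming the companion field is named title_simple (the name used in the example query) and that it is only needed for matching, so it is not stored:

```xml
<!-- schema.xml sketch: non-ngram companion of "title", using the txt_token_simple type -->
<!-- stored="false" (assumption): used only for matching; the displayed value comes from "title" -->
<field name="title_simple" type="txt_token_simple" indexed="true" stored="false" multiValued="false"/>
<!-- copy the original title text into the companion field at index time -->
<copyField source="title" dest="title_simple"/>
```

After reindexing, whole-word matches can be weighted above n-gram matches with an ordinary query-time boost, for example title:(Microsoft Visual Studio) OR title_simple:(Microsoft Visual Studio)^10, where the factor 10 is an arbitrary starting point. Note that WhitespaceTokenizerFactory splits only on whitespace, so a title ending in "(2005)" produces the token "(2005)" in this field, and a query for 2005 will not match it there.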
Emir

On 07.03.2016 15:31, G, Rajesh wrote:
> Hi Emir,
>
> I have already applied
>
> <tokenizer class="solr.WhitespaceTokenizerFactory"/> and then
> <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="800"/>.
> Is this what you wanted me to have in my config?
>
> Thanks
> Rajesh
>
> -----Original Message-----
> From: G, Rajesh [mailto:r...@cebglobal.com]
> Sent: Monday, March 7, 2016 7:50 PM
> To: solr-user@lucene.apache.org
> Subject: RE: Text search NGram
>
> Hi Emir,
>
> Thanks for your email. Can you please help me understand what you mean
> by "e.g. boost if matching tokenized fields to make sure exact matches are
> ordered first"?
>
> -----Original Message-----
> From: Emir Arnautovic [mailto:emir.arnauto...@sematext.com]
> Sent: Monday, March 7, 2016 7:36 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Text search NGram
>
> Hi Rajesh,
> It is most likely related to norms; you can try setting omitNorms="true" and
> reindexing the content. Anyway, it is not common to use only ngrams for matching
> content; in such a case you can expect more unexpected ordering/results. You
> should combine ngram fields with normally tokenized fields (e.g. boost if
> matching tokenized fields to make sure exact matches are ordered first).
>
> Regards,
> Emir
>
> On 07.03.2016 11:44, G, Rajesh wrote:
>> Hi Team,
>>
>> We have the type below, and we have indexed the values "title": "Microsoft
>> Visual Studio 2006" and "title": "Microsoft Visual Studio 8.0.61205.56
>> (2005)".
>>
>> When I search for title:(Microsoft Visual AND Studio AND 2005) I get
>> Microsoft Visual Studio 2006 as the first record and Microsoft Visual
>> Studio 8.0.61205.56 (2005) as the second. I want Microsoft Visual Studio
>> 8.0.61205.56 (2005) listed first, since the user has searched for
>> Microsoft Visual Studio 2005. Can you please help?
>>
>> We are using NGram, so it takes care of misspelled or jumbled words (it
>> works as expected), e.g.
>>   searching Micrs Visual Studio gets Microsoft Visual Studio
>>   searching Visual Microsoft Studio gets Microsoft Visual Studio
>>
>> <fieldType name="txt_token" class="solr.TextField" positionIncrementGap="0">
>>   <analyzer type="index">
>>     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s+" replacement=" "/>
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="800"/>
>>   </analyzer>
>>   <analyzer type="query">
>>     <charFilter class="solr.PatternReplaceCharFilterFactory" pattern="\s+" replacement=" "/>
>>     <tokenizer class="solr.WhitespaceTokenizerFactory"/>
>>     <filter class="solr.LowerCaseFilterFactory"/>
>>     <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="800"/>
>>   </analyzer>
>> </fieldType>
>
> --
> Monitoring * Alerting * Anomaly Detection * Centralized Log Management
> Solr & Elasticsearch Support * http://sematext.com/