Re: Solr suggest is related to second letter, not to initial letter

Volkan Altan Tue, 17 Feb 2015 00:48:13 -0800

First of all thank you for your answer.

Example documents:
doc 1 suggest_field: galaxy samsung s5 phone
doc 2 suggest_field: shoe adidas 2 hiking

Example URL:
http://localhost:8983/solr/solr/suggest?q=galaxy+s

The result I expect is the one shown below: "galaxy shoe" is not supposed to
appear. Unfortunately, however, "galaxy shoe" currently does appear.


<lst name="collation">
<str name="collationQuery">galaxy samsung</str>
<int name="hits">0</int>
<lst name="misspellingsAndCorrections">
<str name="galaxy">galaxy</str>
<str name="samsung">samsung</str>
</lst>
</lst>
<lst name="collation">
<str name="collationQuery">galaxy s5</str>
<int name="hits">0</int>
<lst name="misspellingsAndCorrections">
<str name="galaxy">galaxy</str>
<str name="s5">s5</str>
</lst>
</lst>


I don't want to use KeywordTokenizer, because as long as the word combination
typed by the user exists in some document, I am able to get a result. I just
don't want "q=galaxy + samsung" to appear, because it is an inappropriate
suggestion and it doesn't work.
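For reference, here is a minimal sketch of option 2 from the reply quoted below (a shingle filter on top of a standard tokenizer). It assumes the `suggestions` field used by the suggester is redefined with a hypothetical `suggest_shingle` type and re-indexed, so that the dictionary holds adjacent word sequences such as "galaxy samsung" and "samsung s5", but never "galaxy shoe", because those two words are not adjacent in any document. The type name and shingle sizes are illustrative assumptions, not tested settings.

```xml
<!-- Hypothetical field type: indexes single words plus 2- and 3-word shingles,
     so the suggester dictionary only contains word sequences that are adjacent
     in some document (e.g. "galaxy samsung", but not "galaxy shoe") -->
<fieldType name="suggest_shingle" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-PunctuationToSpace.txt"/>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.TurkishLowerCaseFilterFactory"/>
        <!-- outputUnigrams="true" keeps single words so the first word can still be completed -->
        <filter class="solr.ShingleFilterFactory" minShingleSize="2" maxShingleSize="3" outputUnigrams="true"/>
    </analyzer>
    <analyzer type="query">
        <!-- keep the typed prefix as a single token -->
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.TurkishLowerCaseFilterFactory"/>
    </analyzer>
</fieldType>

<!-- Assumption: the existing "suggestions" field is switched to this type and re-indexed -->
<field name="suggestions" type="suggest_shingle" indexed="true" stored="false" multiValued="true"/>
```

This only helps if the whole typed prefix ("galaxy s") reaches the lookup as one token. Sending it in `spellcheck.q`, so that it is analyzed with the KeywordTokenizer-based `queryAnalyzerFieldType` (`suggest_term`), should achieve that, but the behaviour is worth verifying on the Solr version in use.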

Many thanks in advance!


My settings:

<searchComponent class="solr.SpellCheckComponent" name="suggest">
        <lst name="spellchecker">
            <str name="name">default</str>
            <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
            <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
            <str name="field">suggestions</str>
            <float name="threshold">0.00001</float>
            <str name="buildOnCommit">true</str>
        </lst>
        <str name="queryAnalyzerFieldType">suggest_term</str>
    </searchComponent>
    <!-- auto-complete -->
    <requestHandler name="/suggest" class="solr.SearchHandler">
        <lst name="defaults">
            <str name="spellcheck">true</str>
            <str name="spellcheck.build">false</str>
            <str name="spellcheck.dictionary">default</str>
            <str name="spellcheck.onlyMorePopular">true</str>
            <str name="spellcheck.count">10</str>
            <str name="spellcheck.collate">true</str>
            <str name="spellcheck.collateExtendedResults">true</str>
            <str name="spellcheck.maxCollations">10</str>
            <str name="spellcheck.maxCollationTries">100</str>
        </lst>
        <arr name="components">
            <str>suggest</str>
        </arr>
    </requestHandler>


<fieldType name="suggest_term" class="solr.TextField" positionIncrementGap="100">
            <analyzer type="index">
                <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-PunctuationToSpace.txt"/>
                <tokenizer class="solr.KeywordTokenizerFactory"/>
                <filter class="solr.TrimFilterFactory"/>
                <filter class="solr.LowerCaseFilterFactory" />
                <filter class="solr.TurkishLowerCaseFilterFactory"/>
                <filter class="solr.StandardFilterFactory"/>
                <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
            </analyzer>
            <analyzer type="query">
                <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-PunctuationToSpace.txt"/>
                <tokenizer class="solr.KeywordTokenizerFactory"/>
                <filter class="solr.TrimFilterFactory"/>
                <filter class="solr.StandardFilterFactory"/>
                <filter class="solr.ApostropheFilterFactory"/>
                <filter class="solr.TurkishLowerCaseFilterFactory"/>
                <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
            </analyzer>
</fieldType>
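Option 3 in the reply quoted below refers to Lucene's FreeTextSuggester. Assuming a Solr release that ships `solr.SuggestComponent` together with `FreeTextLookupFactory` (believed to be 4.7 or later; verify against the release in use), it may be reachable from configuration alone, roughly as sketched here. The component, handler and dictionary names, the `text_general` analyzer type and the n-gram order are illustrative assumptions. `DocumentDictionaryFactory` reads stored values, so `suggestions` would need `stored="true"`, and some versions may also require a `weightField`.

```xml
<!-- Hypothetical sketch; assumes a Solr release that includes solr.SuggestComponent
     and FreeTextLookupFactory. Names and values below are illustrative. -->
<searchComponent name="freetext_suggest" class="solr.SuggestComponent">
    <lst name="suggester">
        <str name="name">freetext</str>
        <str name="lookupImpl">FreeTextLookupFactory</str>
        <!-- builds the dictionary from stored field values, so "suggestions" must be stored -->
        <str name="dictionaryImpl">DocumentDictionaryFactory</str>
        <str name="field">suggestions</str>
        <!-- analyzer used to split the text into words for the n-gram model (assumed type) -->
        <str name="suggestFreeTextAnalyzerFieldType">text_general</str>
        <!-- trigram model: up to 2 previous words are used to predict the next one -->
        <str name="ngrams">3</str>
        <str name="buildOnCommit">true</str>
    </lst>
</searchComponent>

<requestHandler name="/freetext_suggest" class="solr.SearchHandler" startup="lazy">
    <lst name="defaults">
        <str name="suggest">true</str>
        <str name="suggest.dictionary">freetext</str>
        <str name="suggest.count">10</str>
    </lst>
    <arr name="components">
        <str>freetext_suggest</str>
    </arr>
</requestHandler>
```

Queried as `http://localhost:8983/solr/solr/freetext_suggest?suggest.q=galaxy+s`, completions of "s" that followed "galaxy" in the indexed text should rank first. Because the model backs off to shorter n-grams, unrelated words starting with "s" may still appear at lower scores, so this approach ranks suggestions rather than strictly filtering them.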


> On 16 Feb 2015, at 03:52, Michael Sokolov <msoko...@safaribooksonline.com> wrote:
> 
> StandardTokenizer splits your text into tokens, and the suggester suggests 
> tokens independently.  It sounds as if you want the suggestions to be based 
> on the entire text (not just the current word), and that only adjacent words 
> in the original should appear as suggestions.  Assuming that's what you are 
> after (it's a little hard to tell from your e-mail -- you might want to 
> clarify by providing a few examples of how you *do* want it to work instead of 
> just examples of how you *don't* want it to work), you have a couple of 
> choices:
> 
> 1) don't use StandardTokenizer, use KeywordTokenizer instead - this will 
> preserve the entire original text and suggest complete texts, rather than 
> words
> 2) maybe consider using a shingle filter along with standard tokenizer, so 
> that your tokens include multi-word shingles
> 3) Use a suggester with better support for a statistical language model, like 
> this one: 
> http://blog.mikemccandless.com/2014/01/finding-long-tail-suggestions-using.html,
> but to do this you will probably need to do some Java programming since it 
> isn't well integrated into Solr.
> 
> -Mike
> 
> On 2/14/2015 3:44 AM, Volkan Altan wrote:
>> Any idea?
>> 
>> 
>>> On 12 Feb 2015, at 11:12, Volkan Altan <volkanal...@gmail.com> wrote:
>>> 
>>> Hello Everyone,
>>> 
>>> All I want from the Solr suggester is that the suggestions for the second 
>>> word (typed after the first word) are actually related to the first word. 
>>> But at the moment, suggestions for the second word are generated 
>>> independently of the first word, just like the suggestions for the first 
>>> word.
>>> 
>>> 
>>> Example;
>>> http://localhost:8983/solr/solr/suggest?q=facet_suggest_data:"adidas+s"
>>> 
>>> adidas s
>>> 
>>> <response>
>>> <lst name="responseHeader">
>>> <int name="status">0</int>
>>> <int name="QTime">4</int>
>>> </lst>
>>> <lst name="spellcheck">
>>> <lst name="suggestions">
>>> <lst name="s">
>>> <int name="numFound">1</int>
>>> <int name="startOffset">27</int>
>>> <int name="endOffset">28</int>
>>> <arr name="suggestion">
>>> <str>samsung</str>
>>> </arr>
>>> </lst>
>>> <lst name="collation">
>>> <str name="collationQuery">facet_suggest_data:"adidas samsung"</str>
>>> <int name="hits">0</int>
>>> <lst name="misspellingsAndCorrections">
>>> <str name="adidas">adidas</str>
>>> <str name="s">samsung</str>
>>> </lst>
>>> </lst>
>>> </lst>
>>> </lst>
>>> </response>
>>> 
>>> 
>>> The terms "adidas" and "samsung" occur in separate documents; no single 
>>> document contains both of them.
>>> 
>>> How can I solve that problem?
>>> 
>>> 
>>> 
>>> schema.xml
>>> 
>>> <fieldType name="suggestions_type" class="solr.TextField" positionIncrementGap="100">
>>>             <analyzer type="index">
>>>                 <charFilter class="solr.HTMLStripCharFilterFactory"/>
>>>                 <tokenizer class="solr.StandardTokenizerFactory"/>
>>>                 <filter class="solr.ApostropheFilterFactory"/>
>>>                 <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="false"/>
>>>                 <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true" />
>>>                 <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>             </analyzer>
>>>             <analyzer type="query">
>>>                 <charFilter class="solr.HTMLStripCharFilterFactory"/>
>>>                 <tokenizer class="solr.StandardTokenizerFactory"/>
>>>                 <filter class="solr.ApostropheFilterFactory"/>
>>>                 <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
>>>             </analyzer>
>>>         </fieldType>
>>> 
>>> <field name="facet_suggest_data" type="suggestions_type" indexed="true" multiValued="true" stored="false" omitNorms="true"/>
>>> 
>>> 
>>> Best
>>> 
>> 
>
