StandardTokenizer splits your text into tokens, and the suggester
suggests tokens independently. It sounds as if you want the suggestions
to be based on the entire text (not just the current word), and that
only adjacent words in the original should appear as suggestions.
Assuming that's what you are after (it's a little hard to tell from your
e-mail -- you might want to clarify by providing a few examples of how
you *do* want it to work instead of just examples of how you *don't*
want it to work), you have a couple of choices:
1) Don't use StandardTokenizer; use KeywordTokenizer instead. This will
preserve the entire original text and suggest complete texts rather
than words.
2) Consider using a shingle filter along with StandardTokenizer,
so that your tokens include multi-word shingles.
3) Use a suggester with better support for a statistical language model,
like this one:
http://blog.mikemccandless.com/2014/01/finding-long-tail-suggestions-using.html
To do this you will probably need to do some Java programming, since
it isn't well integrated into Solr.
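For options 1 and 2, a rough schema.xml sketch might look something like
the following. The field type names (suggest_whole_value, suggest_shingles),
the lowercase filter, and the shingle sizes are assumptions for illustration,
not something taken from your setup; adjust them to your data and reindex
after changing the analysis chain.

```xml
<!-- Option 1 (sketch): KeywordTokenizer keeps each field value as a single
     token, so the suggester works on whole values instead of single words.
     The type name is hypothetical. -->
<fieldType name="suggest_whole_value" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<!-- Option 2 (sketch): StandardTokenizer plus a shingle filter, so the index
     also contains multi-word tokens built from adjacent words.
     The shingle sizes of 2 to 3 are an assumption, not a recommendation. -->
<fieldType name="suggest_shingles" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="2"
            maxShingleSize="3" outputUnigrams="true"/>
  </analyzer>
</fieldType>
```

With shingles, a two-word token only exists if those two words were adjacent
in some indexed value, so a word that never follows "adidas" in your data
would not appear in a shingle starting with "adidas".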
-Mike
On 2/14/2015 3:44 AM, Volkan Altan wrote:
Any idea?
On 12 Feb 2015, at 11:12, Volkan Altan <volkanal...@gmail.com> wrote:
Hello Everyone,
What I want from the Solr suggester is for the suggestions offered for the
second word to be related to the first word that has already been entered.
Instead, the second word is completed independently of the first, just as
the first word is.
Example:
http://localhost:8983/solr/solr/suggest?q=facet_suggest_data:"adidas+s"
<http://localhost:8983/solr/vitringez/suggest?q=facet_suggest_data:%22adidas+s%22>
adidas s
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">4</int>
</lst>
<lst name="spellcheck">
<lst name="suggestions">
<lst name="s">
<int name="numFound">1</int>
<int name="startOffset">27</int>
<int name="endOffset">28</int>
<arr name="suggestion">
<str>samsung</str>
</arr>
</lst>
<lst name="collation">
<str name="collationQuery">facet_suggest_data:"adidas samsung"</str>
<int name="hits">0</int>
<lst name="misspellingsAndCorrections">
<str name="adidas">adidas</str>
<str name="s">samsung</str>
</lst>
</lst>
</lst>
</lst>
</response>
The terms "Adidas" and "Samsung" occur only in separate documents. There is
no document in which both of them appear.
How can I solve that problem?
schema.xml
<fieldType name="suggestions_type" class="solr.TextField"
positionIncrementGap="100">
<analyzer type="index">
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.ApostropheFilterFactory"/>
<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
ignoreCase="true" expand="false"/>
<filter class="solr.StopFilterFactory" ignoreCase="true"
words="stopwords.txt" enablePositionIncrements="true" />
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
<analyzer type="query">
<charFilter class="solr.HTMLStripCharFilterFactory"/>
<tokenizer class="solr.StandardTokenizerFactory"/>
<filter class="solr.ApostropheFilterFactory"/>
<filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
</analyzer>
</fieldType>
<field name="facet_suggest_data" type="suggestions_type" indexed="true" multiValued="true"
stored="false" omitNorms="true"/>
Best