I've been indexing and reindexing here with shingles. I don't believe
it's the best approach: the results are interesting, but I don't think this
is what the suggester is meant for.

I tried

<fieldType name="textSuggestion" class="solr.TextField"
           positionIncrementGap="10" stored="false" multiValued="true">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ShingleFilterFactory" maxShingleSize="4"
            outputUnigrams="true" outputUnigramsIfNoShingles="false"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

but that puts multi-word (shingled) terms into the suggestions themselves.
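
To see why, here is a rough Python sketch of what a shingle filter with maxShingleSize="4" and outputUnigrams="true" emits for one tokenized value. It only imitates the filter's output as I understand it; the function name `shingles` is made up for illustration and this is not Solr code.

```python
def shingles(tokens, max_size=4, output_unigrams=True):
    """Return every run of 1..max_size adjacent tokens, joined by a space."""
    min_size = 1 if output_unigrams else 2
    out = []
    for start in range(len(tokens)):
        for size in range(min_size, max_size + 1):
            if start + size > len(tokens):
                break  # not enough tokens left for a longer shingle
            out.append(" ".join(tokens[start:start + size]))
    return out


if __name__ == "__main__":
    # Hypothetical field value, lowercased and split as the index analyzer would.
    for term in shingles("drivers nvidia geforce".split()):
        print(term)
```

Every one of those strings ends up as a term in the field, so a dictionary built from it contains "drivers nvidia" and "drivers nvidia geforce" next to the single words.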

If I query it with http://localhost:8983/solr/{mycore}/suggest/?q=dri I
get

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">1</int>
</lst>
<lst name="spellcheck">
<lst name="suggestions">
<lst name="dri">
<int name="numFound">6</int>
<int name="startOffset">0</int>
<int name="endOffset">3</int>
<arr name="suggestion">
<str>drivers</str>
<str>drivers nvidia</str>
<str>drivers intel</str>
<str>drivers nvidia geforce</str>
<str>drive</str>
<str>driver</str>
</arr>
</lst>
<str name="collation">drivers</str>
</lst>
</lst>
</response>

but when I start typing the second word,
http://localhost:8983/solr/{mycore}/suggest/?q=drivers%20n
it scrambles everything, returning a separate suggestion list for each token:

<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">0</int>
</lst>
<lst name="spellcheck">
<lst name="suggestions">
<lst name="drivers">
<int name="numFound">4</int>
<int name="startOffset">0</int>
<int name="endOffset">7</int>
<arr name="suggestion">
<str>drivers</str>
<str>drivers nvidia</str>
<str>drivers intel</str>
<str>drivers nvidia geforce</str>
</arr>
</lst>
<lst name="n">
<int name="numFound">10</int>
<int name="startOffset">8</int>
<int name="endOffset">9</int>
<arr name="suggestion">
<str>nvidia</str>
<str>net</str>
<str>nvidia geforce</str>
<str>network</str>
<str>new</str>
<str>n</str>
<str>ninja</str>
</arr>
</lst>
<str name="collation">drivers nvidia</str>
</lst>
</lst>
</response>
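
What seems to happen in the response above is that the input is split on whitespace and each piece is looked up as its own prefix. Here is a small Python sketch of the difference between that and looking up the whole input as one prefix. The term list is copied from the two responses above; the function names are hypothetical and none of this is Solr's actual lookup code.

```python
TERMS = [
    "drivers", "drivers nvidia", "drivers intel", "drivers nvidia geforce",
    "drive", "driver", "nvidia", "net", "nvidia geforce", "network",
    "new", "n", "ninja",
]


def suggest_per_token(query, terms=TERMS):
    """One suggestion list per whitespace-separated token (what I get now)."""
    return {tok: [t for t in terms if t.startswith(tok)]
            for tok in query.lower().split()}


def suggest_whole_input(query, terms=TERMS):
    """A single list, treating the whole input as one prefix."""
    prefix = " ".join(query.lower().split())
    if not prefix:
        return []
    return [t for t in terms if t.startswith(prefix)]


if __name__ == "__main__":
    print(suggest_per_token("drivers n"))
    print(suggest_whole_input("drivers n"))
```

With the shingled terms in the dictionary, the second function only returns entries that continue what was already typed, which is closer to what I expect from an autocomplete.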

Although the collation looks fine in this case, this is not exactly what the
suggester is supposed to do.
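
What the original question quoted below asks for is stricter still: only complete the last word with terms that actually co-occur with the words already typed. A minimal Python sketch of that idea over a toy in-memory corpus, assuming each document is just a set of lowercased words. The names `DOCS` and `suggest_with_hits` and the sample documents are invented for the example, and it says nothing about how to get this behaviour out of the Suggester itself.

```python
DOCS = [
    {"new", "york", "city", "guide"},
    {"happy", "new", "year"},
    {"yeti", "sighting", "report"},
]


def suggest_with_hits(query, docs=DOCS):
    """Complete the last word so that an AND query over all words has hits."""
    tokens = query.lower().split()
    if not tokens:
        return []
    *typed, prefix = tokens
    # Only documents containing every fully typed word can satisfy the AND query.
    matching = [doc for doc in docs if all(word in doc for word in typed)]
    # Candidate completions come from those documents only.
    candidates = sorted({
        word for doc in matching for word in doc
        if word.startswith(prefix) and word not in typed
    })
    return [" ".join(typed + [word]) for word in candidates]


if __name__ == "__main__":
    print(suggest_with_hits("new y"))
```

Here 'new y' can complete to 'new year' and 'new york' but never to 'new yeti', because no sample document contains both words.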

Any thoughts?

2011/8/17 Alexei Martchenko <ale...@superdownloads.com.br>

> I have the very same problem. I could copy and paste your message as
> mine. So far I've found that bigger dictionaries work better for me, and that
> controlling the threshold works much better than not indexing one or two fields.
> Of course I'm still polishing this.
>
> Right now I'm looking into shingles; are you using them?
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.ShingleFilterFactory
>
> How are your fields?
>
> 2011/8/17 Kuba Krzemień <krzemien.k...@gmail.com>
>
>> Hello, I am working on creating an auto-complete feature for my
>> platform, which indexes large amounts of text (title + contents) - there is
>> too much data for a dictionary. I am using the latest version of Solr (3.3)
>> and I am trying to take advantage of the Suggester functionality.
>> Unfortunately, so far the outcome isn't that great.
>>
>> The Suggester works only for single words or whole phrases (depending on the
>> tokenizer). With the first option, I am unable to suggest any combined
>> queries. For example, the suggestion for 'ne' will be 'new'. The suggestion for
>> 'new y' will be two separate lists, one for 'new' and one for 'y'. What's
>> worse, querying 'new AND y' gives the same results (also when using
>> collate), which means that the returned suggestion may give no results:
>> what makes sense separately often doesn't work combined. I need a way to
>> find only those suggestions that will return results in an AND query
>> (for example 'new AND york' or 'new AND year', as long as they give results
>> upon querying; 'new AND yeti' shouldn't be returned as a suggestion).
>>
>> When I use the second tokenizer and the suggestions return phrases, for
>> 'ne' I will get 'new york' and 'new year', but for 'new y' I will get
>> nothing. Also, for 'y' I will get nothing, so the issue remains.
>>
>> If someone has some experience working with the Suggester, or if someone
>> has created a well-working auto-suggester based on Solr, please help me.
>> I've been trying to find a solution for this for quite some time.
>>
>> Yours sincerely,
>> Jackob K
>>
>
>
>
> --
>
> *Alexei Martchenko* | *CEO* | Superdownloads
> ale...@superdownloads.com.br | ale...@martchenko.com.br | (11)
> 5083.1018/5080.3535/5080.3533
>
>


