Re: Your valuable suggestion on autocomplete

Vaijanath N. Rao Tue, 06 May 2008 00:14:09 -0700

Hi Rantjil Bould,

I would suggest you consider a Trie data structure, which is commonly used for auto-complete. Hitting Solr for every prefix looks like a time-consuming job, but I might be wrong. I have a Trie implementation and it works very fast (of course, it is an in-memory data structure, unlike the Solr index, which lies on disk).
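
To make the idea concrete, here is a minimal sketch of a prefix lookup over an in-memory Trie. It is not the implementation mentioned above; the names TrieNode, Trie, insert and complete, and the limit parameter, are assumptions made only for illustration.

```python
class TrieNode:
    """One node per character; is_term marks the end of a complete term."""

    def __init__(self):
        self.children = {}
        self.is_term = False


class Trie:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, term):
        # Walk down the tree, creating nodes for characters not seen yet.
        node = self.root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
        node.is_term = True

    def complete(self, prefix, limit=10):
        # Descend to the node that represents the prefix, if it exists.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Depth-first walk below the prefix node. Children are pushed in
        # reverse order so they are popped, and reported, alphabetically.
        results = []
        stack = [(node, prefix)]
        while stack and len(results) < limit:
            current, text = stack.pop()
            if current.is_term:
                results.append(text)
            for ch in sorted(current.children, reverse=True):
                stack.append((current.children[ch], text + ch))
        return results


trie = Trie()
for term in ["vlan", "vlanand", "vpn", "router"]:
    trie.insert(term)
print(trie.complete("vl"))
```

Reaching the prefix node costs one step per typed character, regardless of how many terms are stored; only the walk below that node depends on the number of matches, which is why limit is there.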


--Thanks and Regards
Vaijanath



Rantjil Bould wrote:

Hi Group,
I have already received some valuable suggestions from the group. Based
on those, I have come up with the following process to implement an
autocomplete-like feature in my system:
1- Index all the documents
2- Extract all terms using IndexReader's terms() method

I am getting terms like vl, vla, vlan, vlana, vlanan, vlanand. But I would
like to get only the complete term, i.e. vlanand. The field definition in
Solr is:

<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"></tokenizer>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"></filter>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"></filter>
    <filter class="solr.LowerCaseFilterFactory"></filter>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"></filter>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"></filter>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"></tokenizer>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"></filter>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"></filter>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"></filter>
    <filter class="solr.LowerCaseFilterFactory"></filter>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"></filter>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"></filter>
  </analyzer>
</fieldType>

I would appreciate your input on how to get only the complete terms.

3- For each term, extract the documents containing that term using the
termDocs() method
4- Create one more index with the fields term, frequency and docNo. This
index would be used for the autocomplete feature.
5- For every letter the user types in the search field, use an Ajax script
(like script.aculo.us or jQuery) to fetch all matching terms using a prefix
query.
6- Based on the search term selected by the user, keep track of the numbers
of the documents in which this term occurs.
7- For the next search term selection, use those document numbers to select
all terms, excluding the currently selected term.
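
Steps 3 to 7 can be sketched in memory as follows. This only illustrates the logic and does not use the Lucene or Solr API: build_postings and suggest are hypothetical names, terms are split on whitespace only, and "frequency" is assumed to mean the number of documents containing the term.

```python
from collections import defaultdict


def build_postings(docs):
    """Steps 3-4: map each term to the set of document numbers containing it."""
    postings = defaultdict(set)
    for doc_no, text in docs.items():
        for term in text.lower().split():
            postings[term].add(doc_no)
    return postings


def suggest(postings, prefix, selected=(), limit=10):
    """Steps 5-7: prefix lookup, narrowed by the terms already selected."""
    # Documents that contain every selected term (None means no restriction).
    allowed = None
    for term in selected:
        docs = postings.get(term, set())
        allowed = docs if allowed is None else allowed & docs

    candidates = []
    for term, docs in postings.items():
        if not term.startswith(prefix) or term in selected:
            continue
        matching = docs if allowed is None else docs & allowed
        if matching:
            candidates.append((term, len(matching)))

    # Most frequent terms first, ties broken alphabetically.
    candidates.sort(key=lambda item: (-item[1], item[0]))
    return candidates[:limit]


docs = {1: "vlan trunk config", 2: "vlan access port", 3: "vpn access"}
postings = build_postings(docs)
print(suggest(postings, "v"))
print(suggest(postings, "a", selected=("vlan",)))
```

The linear scan over all terms is only for brevity; in the process described above, the prefix query against the second index would take its place.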

This works, more or less. As I am new to Solr and also to Lucene, I would
like to know whether it can be improved.

- RB
