Hi Rantjil Bould,
I would suggest giving some thought to the Trie data structure, which is
commonly used for auto-complete. Hitting Solr for every prefix looks like a
time-consuming job, but I might be wrong. I have a Trie implementation and it
works very fast (of course, it is an in-memory data structure, unlike the Solr
index, which lies on disk).
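As an illustration only, a minimal in-memory trie in Java might look like the sketch below. It is not the implementation mentioned above: the class name PrefixTrie and its insert/complete methods are hypothetical, and it assumes the complete terms have already been extracted from the index.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Minimal in-memory trie for prefix completion (illustrative sketch, hypothetical names). */
class PrefixTrie {
    private static final class Node {
        // TreeMap keeps children sorted, so completions come out in lexicographic order.
        final Map<Character, Node> children = new TreeMap<>();
        boolean isTerm; // true if a complete term ends at this node
    }

    private final Node root = new Node();

    /** Adds one complete term to the trie. */
    void insert(String term) {
        Node node = root;
        for (char c : term.toCharArray()) {
            node = node.children.computeIfAbsent(c, k -> new Node());
        }
        node.isTerm = true;
    }

    /** Returns up to {@code limit} terms that start with {@code prefix}. */
    List<String> complete(String prefix, int limit) {
        List<String> out = new ArrayList<>();
        Node node = root;
        // Walk down to the node that represents the prefix, if it exists.
        for (char c : prefix.toCharArray()) {
            node = node.children.get(c);
            if (node == null) {
                return out; // no term has this prefix
            }
        }
        collect(node, new StringBuilder(prefix), out, limit);
        return out;
    }

    /** Depth-first walk below {@code node}, collecting complete terms. */
    private void collect(Node node, StringBuilder path, List<String> out, int limit) {
        if (out.size() >= limit) {
            return;
        }
        if (node.isTerm) {
            out.add(path.toString());
        }
        for (Map.Entry<Character, Node> e : node.children.entrySet()) {
            if (out.size() >= limit) {
                return;
            }
            path.append(e.getKey());
            collect(e.getValue(), path, out, limit);
            path.deleteCharAt(path.length() - 1); // backtrack
        }
    }
}
```

Each keystroke then costs a walk of at most prefix-length nodes plus the size of the returned list, with no round trip to the index.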
--Thanks and Regards
Vaijanath
Rantjil Bould wrote:
Hi Group,
I have already got some valuable suggestions from the group. Based
on those, I have come up with the following process to implement an
autocomplete-like feature in my system:
1- Index all the documents.
2- Extract all terms using IndexReader's terms() method.
I am getting terms like vl, vla, vlan, vlana, vlanan, vlanand, but I would like
to get only the complete (absolute) terms, i.e. vlanand. The field definition in Solr is:
<fieldType name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"></tokenizer>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"></filter>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"></filter>
    <filter class="solr.LowerCaseFilterFactory"></filter>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"></filter>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"></filter>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"></tokenizer>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"></filter>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt"></filter>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"></filter>
    <filter class="solr.LowerCaseFilterFactory"></filter>
    <filter class="solr.EnglishPorterFilterFactory"
            protected="protwords.txt"></filter>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"></filter>
  </analyzer>
</fieldType>
I would appreciate your input on how to get only the complete terms.
3- For each term, extract the documents containing that term using the termDocs()
method.
4- Create one more index with the fields term, frequency and docNo. This index
would be used for the autocomplete feature.
5- For each letter typed by the user in the search field, use an Ajax script (like
scriptaculous or jQuery) to fetch all matching terms using a prefix query.
6- Based on the search term selected by the user, keep track of the document numbers in
which this term occurs.
7- For the next search term selection, use those document numbers to select all terms,
excluding the currently selected term.
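The logic of steps 5 to 7 can be sketched in plain Java as below. This is only an in-memory stand-in for the second index, assuming the term-to-document mapping from steps 3 and 4 is already available; the class TermSuggester and its methods are hypothetical names, and no Lucene or Solr calls are used.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

/** In-memory stand-in for the autocomplete index (hypothetical sketch). */
class TermSuggester {
    // term -> document numbers containing it (the data gathered in steps 3 and 4)
    private final TreeMap<String, Set<Integer>> postings = new TreeMap<>();

    void add(String term, int docNo) {
        postings.computeIfAbsent(term, k -> new TreeSet<>()).add(docNo);
    }

    /** Step 6: document numbers that contain every already-selected term. */
    Set<Integer> docsFor(List<String> selected) {
        Set<Integer> docs = null;
        for (String term : selected) {
            Set<Integer> d = postings.getOrDefault(term, Collections.emptySet());
            if (docs == null) {
                docs = new HashSet<>(d);
            } else {
                docs.retainAll(d); // intersect with documents of the next term
            }
        }
        return docs == null ? Collections.emptySet() : docs;
    }

    /**
     * Steps 5 and 7: terms starting with {@code prefix}, restricted to the
     * documents of the selected terms and excluding the selected terms.
     */
    List<String> suggest(String prefix, List<String> selected) {
        // With nothing selected yet, every document is allowed.
        Set<Integer> allowed = selected.isEmpty() ? null : docsFor(selected);
        List<String> out = new ArrayList<>();
        // Sorted map: all terms with this prefix form one contiguous range.
        for (Map.Entry<String, Set<Integer>> e : postings.tailMap(prefix, true).entrySet()) {
            String term = e.getKey();
            if (!term.startsWith(prefix)) {
                break; // left the prefix range
            }
            if (selected.contains(term)) {
                continue; // step 7: skip terms that are already selected
            }
            if (allowed == null || !Collections.disjoint(allowed, e.getValue())) {
                out.add(term);
            }
        }
        return out;
    }
}
```

Here the document restriction is recomputed on every call for simplicity; a real version would probably cache the document set per selection.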
This somehow works. As I am new to Solr and also to Lucene, I would like to know
whether it can be improved.
- RB