Just FYI, we have also implemented a Trie approach (outside of Solr, even 
though our mail search uses Solr); it is live at the link in the signature.

You can try out the auto-completion in the comparison tool on the home 
page.

- nishant

www.reviewgist.com


 


----- Original Message ----
From: Vaijanath N. Rao <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, May 6, 2008 12:43:25 PM
Subject: Re: Your valuable suggestion on autocomplete

Hi Rantjil Bould,

I would suggest you consider the Trie data structure, which is commonly 
used for auto-complete. Hitting Solr for every prefix looks like a 
time-consuming job, but I might be wrong. I have a Trie implementation and 
it works very fast (of course, it is an in-memory data structure, unlike 
the Solr index, which lies on disk).
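
For illustration, a minimal in-memory trie could look like the sketch 
below. It is written in Python for brevity, and the names (TrieNode, Trie, 
insert, complete) and the limit parameter are made up for this example; it 
is not the implementation referred to above, only a rough picture of the 
idea.

```python
class TrieNode:
    def __init__(self):
        self.children = {}    # maps a character to the next TrieNode
        self.is_term = False  # True when a complete term ends at this node


class Trie:
    """Hypothetical in-memory trie for prefix completion."""

    def __init__(self):
        self.root = TrieNode()

    def insert(self, term):
        node = self.root
        for ch in term:
            node = node.children.setdefault(ch, TrieNode())
        node.is_term = True

    def complete(self, prefix, limit=10):
        # Walk down to the node that represents the prefix.
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        # Depth-first walk; children are pushed in reverse order so that
        # terms are popped (and returned) in lexicographic order.
        results = []
        stack = [(node, prefix)]
        while stack and len(results) < limit:
            current, text = stack.pop()
            if current.is_term:
                results.append(text)
            for ch in sorted(current.children, reverse=True):
                stack.append((current.children[ch], text + ch))
        return results


trie = Trie()
for term in ["vlan", "vlanand", "vpn", "router"]:
    trie.insert(term)
print(trie.complete("vl"))
```

A lookup only touches the nodes under the typed prefix, so its cost depends 
on the prefix length and the number of completions returned, not on the 
total number of terms.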

--Thanks and Regards
Vaijanath



Rantjil Bould wrote:
> Hi Group,
>              I have already got some valuable suggestions from the group. Based
> on those, I have come up with the following process to implement an
> autocomplete-like feature in my system:
> 1- Index all the documents
> 2- Extract all terms using IndexReader's terms() method
>
> I am getting terms like vl, vla, vlan, vlana, vlanan, vlanand, but I would like
> to get only the absolute (complete) term, i.e. vlanand. The field definition in Solr is
>
> <fieldType name="text" class="solr.TextField" positionIncrementGap="100">
>       <analyzer type="index">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"></tokenizer>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt" enablePositionIncrements="true"></filter>
>         <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="1"
> catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"></filter>
>         <filter class="solr.LowerCaseFilterFactory"></filter>
>         <filter class="solr.EnglishPorterFilterFactory"
> protected="protwords.txt"></filter>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"></filter>
>       </analyzer>
>       <analyzer type="query">
>         <tokenizer class="solr.WhitespaceTokenizerFactory"></tokenizer>
>         <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
> ignoreCase="true" expand="true"></filter>
>         <filter class="solr.StopFilterFactory" ignoreCase="true"
> words="stopwords.txt"></filter>
>         <filter class="solr.WordDelimiterFilterFactory"
> generateWordParts="1" generateNumberParts="1" catenateWords="0"
> catenateNumbers="0" catenateAll="0" splitOnCaseChange="1"></filter>
>         <filter class="solr.LowerCaseFilterFactory"></filter>
>         <filter class="solr.EnglishPorterFilterFactory"
> protected="protwords.txt"></filter>
>         <filter class="solr.RemoveDuplicatesTokenFilterFactory"></filter>
>       </analyzer>
>     </fieldType>
>
> I would appreciate your input on how to get only the absolute terms.
>
> 3- For each term, extract the documents containing that term using the
> termDocs() method
> 4- Create one more index with the fields term, frequency and docNo. This index
> would be used for the autocomplete feature.
> 5- For each letter typed by the user in the search field, use an Ajax script
> (like scriptaculous or jQuery) to fetch all matching terms using a prefix query.
> 6- Based on the search term selected by the user, keep track of the document
> numbers in which this term occurs.
> 7- For the next search term selection, use those document numbers to select
> all terms, excluding the currently selected term.
>
> This somehow works. As I am new to Solr and also to Lucene, I would like to
> know whether it can be improved.
>
> - RB
>
>  
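
Steps 3 to 7 of the quoted process can be pictured with the small sketch 
below. It uses plain Python dictionaries instead of a Lucene index, and the 
function names (build_term_index, prefix_terms) and the sample documents are 
hypothetical. "Frequency" is assumed here to mean the number of documents 
containing the term, which the original message does not specify.

```python
from bisect import bisect_left


def build_term_index(docs):
    """Steps 3-4: map each term to the set of document numbers containing it."""
    index = {}
    for doc_no, text in docs.items():
        for term in text.lower().split():
            index.setdefault(term, set()).add(doc_no)
    return index


def prefix_terms(index, prefix, allowed_docs=None, exclude=()):
    """Step 5 (and 7): return (term, frequency) pairs for terms with this prefix.

    When allowed_docs is given, only terms that occur in those documents are
    returned, and the frequency is counted within them.
    """
    terms = sorted(index)  # a real system would keep this list pre-sorted
    matches = []
    for term in terms[bisect_left(terms, prefix):]:
        if not term.startswith(prefix):
            break  # sorted order: no later term can match the prefix
        if term in exclude:
            continue
        doc_nos = index[term] if allowed_docs is None else index[term] & allowed_docs
        if doc_nos:
            matches.append((term, len(doc_nos)))
    return matches


docs = {1: "vlan switch config", 2: "vlan router", 3: "vpn router"}
index = build_term_index(docs)

print(prefix_terms(index, "v"))       # first term: no restriction
selected_docs = index["vlan"]         # step 6: user picked "vlan"
print(prefix_terms(index, "r", allowed_docs=selected_docs, exclude={"vlan"}))
```

Restricting the second lookup to the documents of the first selection is 
what keeps later suggestions consistent with the terms already chosen.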


      