At Netflix, we load the completion lexicon with movie titles, person
names, and a few aliases. Even then, we find a few misspellings in
our metadata (is it "NWA" or "N.W.A."?). Extracting terms from
documents will find a lot of misspellings.

You really do not want to rely on random users to correctly spell
things like Ratatouille and Koyaanisqatsi. Trust me.
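
A minimal sketch of a curated lexicon, not the Netflix code. It assumes punctuation-insensitive matching is wanted, so "NWA" and "N.W.A." land on the same entry. The helper names (normalize, build_lexicon) and the sample alias are illustrative.

```python
import re


def normalize(text):
    """Lowercase and drop everything except ASCII letters and digits.

    Assumption: titles are ASCII; accented titles would need real folding.
    """
    return re.sub(r"[^a-z0-9]", "", text.lower())


def build_lexicon(canonical_titles, aliases):
    """Map each normalized key to the one canonical spelling to display."""
    lexicon = {}
    for title in canonical_titles:
        lexicon[normalize(title)] = title
    for alias, title in aliases.items():
        lexicon[normalize(alias)] = title
    return lexicon


lexicon = build_lexicon(
    ["N.W.A.", "Ratatouille", "Koyaanisqatsi"],
    {"Koyaanisqatsi: Life Out of Balance": "Koyaanisqatsi"},  # sample alias
)

# Both spellings resolve to the single curated form.
print(lexicon[normalize("NWA")])
print(lexicon[normalize("N.W.A.")])
```

The point is that the display spelling comes from the curated list, never from whatever the user or a document happened to type.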

Autocomplete needs to be really fast, so we use a dedicated
in-memory index (RAMDirectory) in the front-end webapp and
also put an HTTP cache in the load balancer.
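
To illustrate why an in-memory lookup is cheap, here is a rough sketch of prefix completion over a sorted list. This is a stand-in using Python's bisect module, not Lucene's RAMDirectory; the PrefixIndex class and the sample titles are assumptions for the example.

```python
import bisect


class PrefixIndex:
    """In-memory prefix lookup over a sorted list of titles."""

    def __init__(self, titles):
        # Sort (lowercased key, display title) pairs once at load time.
        self._entries = sorted((t.lower(), t) for t in titles)
        self._keys = [key for key, _ in self._entries]

    def complete(self, prefix, limit=10):
        """Return up to `limit` titles whose lowercased form starts with prefix."""
        p = prefix.lower()
        # Binary search for the first key >= prefix; matches are contiguous.
        start = bisect.bisect_left(self._keys, p)
        results = []
        for key, title in self._entries[start:start + limit]:
            if not key.startswith(p):
                break
            results.append(title)
        return results


index = PrefixIndex(["Ratatouille", "Rain Man", "Raging Bull", "Koyaanisqatsi"])
print(index.complete("ra"))  # ['Raging Bull', 'Rain Man', 'Ratatouille']
```

In this sketch the same prefix always returns the same list until the lexicon is reloaded, which is what makes the responses easy for an HTTP cache to hold.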

We get at least 25 million autocomplete requests a day, more
than 10X the number of search requests. I would plan for
10-15X search traffic.
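
As a back-of-the-envelope check on those numbers, the arithmetic can be sketched as below. Only the 25 million/day figure and the 10-15X range come from the text above; the function names and the 2 million/day search volume are hypothetical, and these are daily averages, so peak load will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def average_rate(requests_per_day):
    """Average requests per second over a full day."""
    return requests_per_day / SECONDS_PER_DAY


def planned_autocomplete_rate(search_requests_per_day, multiplier):
    """Average autocomplete requests/second to plan for, given search volume."""
    return average_rate(search_requests_per_day * multiplier)


# 25 million autocomplete requests a day is roughly 289 per second on average.
avg = average_rate(25_000_000)
print(round(avg))  # 289

# Hypothetical site with 2 million searches a day, planned at 10X and 15X.
low = planned_autocomplete_rate(2_000_000, 10)
high = planned_autocomplete_rate(2_000_000, 15)
print(f"plan for {low:.0f} to {high:.0f} requests/second on average")
```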

wunder

On 12/19/08 10:45 AM, "Grant Ingersoll" <gsing...@apache.org> wrote:

> I'd add that you probably don't want just the query logs; people may
> search for things that aren't in the index, too.  Your call as to
> whether that is useful or not.  Also, have a look at the
> TermsComponent, as it will tell you the doc freq for terms.
> 
> On Dec 19, 2008, at 10:08 AM, roberto wrote:
> 
>> Erick,
>> 
>> Thanks, this sounds good. I'll try.
>> 
>> Mike,
>> 
>> Could you give more details about query logs?
>> 
>> Thanks
>> 
>> On Fri, Dec 19, 2008 at 12:02 AM, Mike Klaas <mike.kl...@gmail.com>
>> wrote:
>> 
>>> 
>>> On 18-Dec-08, at 10:53 AM, roberto wrote:
>>> 
>>>> Erick,
>>>> 
>>>> Thanks for the answer. Let me clarify: we would like to have a
>>>> combobox with the terms to guide the user in the search. I mean, if
>>>> I have thousands of documents and want to tell users how many
>>>> documents in the base contain a particular word, how can I do that?
>>>> 
>>> 
>>> Sounds like you want query autocomplete.  The best way to do this
>>> (including if you want the box filled with some queries), is to use
>>> the query logs, not the documents.
>>> 
>>> -Mike
>>> 
>> -- 
>> "Without love, we are birds with broken wings."
>> Morrie
> 
> --------------------------
> Grant Ingersoll
> 
> Lucene Helpful Hints:
> http://wiki.apache.org/lucene-java/BasicsOfPerformance
> http://wiki.apache.org/lucene-java/LuceneFAQ

